Everything You Need to Know about SQL for Learning Data Science: Part 1
Introduction
In these past few years, maybe you’ve heard about artificial intelligence, data science, machine learning in everywhere, like people talked about it a lot (i’ll talk differences among those topics later, insyaa Allah). Today, i’ll share to you about SQL in Data Science. Let’s get started!
Table of Contents
- What is Data Science?
- What Does SQL Actually Mean?
- Relational Database Management System (RDBMS)
- SQL with MySQL Commands
What Is Data Science?
Data science combines multiple fields, including programming, scientific methods, statistics, linear algebra, and data analysis, to extract value from data. In order to analyze the data from the database, we need to extract it. This is where SQL comes to the surface.
What Does SQL Actually Mean?
SQL stands for Sequential Query Language which is used to communicate with a database. According to Terence Shin’s article, SQL is the second most in-demand skills in 2021 after Python and the third top growing skills from 2019 to 2021. This is showing us how important SQL skill in Data Science field is. “Why did you start with SQL over Python? You just said that Python is the number one most in-demand skills in 2021.” For me, SQL is easier to learn than Python, a simple language to get started, a beginner-friendly. So, start with simple things.
Relational Database Management System (RDBMS)
Before we jump into SQL Commands, let’s start with RDBMS. RDBMS is a software system that enables users to define, create, maintain and access to the database based on the relational model. There are several types of RDBMS in this world, including:
- MySQL and MariaDB
- PostgreSQL
- SQLite
- Oracle Database
- Microsoft SQL Server
- IBM DB2
- Microsoft Azure SQL Database
There are a few differences SQL Commands on each RDBMS but not very significant, don’t worry too much about it. For this time, we’ll be focusing on MySQL Commands (because just MySQL Commands that i know for now).
SQL with MySQL Commands
Actually, there are many types of SQL Statements/Commands that i found, such as Data Definition Language (DDL), Data Manipulation Language (DML), Data Query Language (DQL), Data Control Language (DCL), and Transaction Control Language (TCL). The picture in below is the distribution of types of SQL Statements/Commands:
From many resources, DQL, DDL, and DML are enough for data science. On this opportunity, i will cover Data Query Language (DQL) first.
Table, Column, and Row
When we enter the world of relational databases, we can’t escape from Table, Column, and Row. Maybe, some of you know so well about those things. Let’s review again.
Look at the Employee table.
- A column is a vertical series of cells in a table
- A row is a horizontal series of cells in a table
SELECT and FROM
SELECT *
FROM table_name;
or
SELECT column1, column2, column3, ...
FROM table_name;
SELECT
and FROM
would be your best friend when you’re querying the table. Why? Because those two commands are essential things. You will need SELECT
and FROM
to retrieve data from the table.
SELECT
command is used to select data from database and followed by column(s) name. We can retrieve data that consists of all columns using asterisk (*).SELECT
is paired withFROM
statement.FROM
defines which table will be used.
For further understanding, look at the example:
SELECT *
FROM Employee;
We selected all columns from the Employee table (left) using that command. The result is on the right side. We can retrieve a column or even multiple columns:
SELECT Name
FROM Employee;
SELECT DISTINCT
SELECT DISTINCT column1, column2, column3, ...
FROM table_name;
SELECT DISTINCT
is simply addition of DISTINCT
command in SELECT
statement. When you have multiple values in a column, you can return only different values using DISTINCT
command together with SELECT
command. For example:
SELECT DISTINCT City
FROM Employee;
Afterwards, the result returns only different values.
WHERE
SELECT column1, column2, ...
FROM table_name
WHERE condition;
Sometimes, we need a specified condition in our query. WHERE
statement has a role in it. WHERE
is a SQL statement which is used to filter our records. The following SQL statement selects Name and Age column from Age greater than 30:
SELECT Name, Age
FROM Employee
WHERE Age > 30;
As we can see, the table returns only a specified condition.
ORDER BY
SELECT column1, column2, ...
FROM table_name
ORDER BY column1, column2, ... ASC|DESC;
Do you want to know, who is the oldest employee or the highest salary in your company? Here is the ORDER BY
command. ORDER BY
does an excellent job that is used to sort the records. There are two types of ordering, descending or ascending order. By default, ORDER BY
sorts the records in ascending order. The following SQL statement selects all employees from the Employee table and sorted by Age in descending order:
SELECT *
FROM Employee
ORDER BY Age DESC;
The table returns by Age in descending order. If we have multiple values in that column, the data that comes first will be returned first. In the table, John comes first before Ben, so John will be returned first.
Thanks for Reading!
Now that you understanding how basic SQL Commands work. Actually, i divide this series into parts. So, follow me and stay tuned!
Achmad Rifki
- Let’s connect with me on LinkedIn!
- Follow my Github account.