Everything You Need to Know about SQL for Learning Data Science: Part 1

Achmad Rifki Raihansyah Bagja
CodeX
Published in
5 min readSep 6, 2021

--

Photo by Martin Sanchez on Unsplash

Introduction

In these past few years, maybe you’ve heard about artificial intelligence, data science, machine learning in everywhere, like people talked about it a lot (i’ll talk differences among those topics later, insyaa Allah). Today, i’ll share to you about SQL in Data Science. Let’s get started!

Table of Contents

  1. What is Data Science?
  2. What Does SQL Actually Mean?
  3. Relational Database Management System (RDBMS)
  4. SQL with MySQL Commands

What Is Data Science?

Data science combines multiple fields, including programming, scientific methods, statistics, linear algebra, and data analysis, to extract value from data. In order to analyze the data from the database, we need to extract it. This is where SQL comes to the surface.

What Does SQL Actually Mean?

SQL stands for Sequential Query Language which is used to communicate with a database. According to Terence Shin’s article, SQL is the second most in-demand skills in 2021 after Python and the third top growing skills from 2019 to 2021. This is showing us how important SQL skill in Data Science field is. “Why did you start with SQL over Python? You just said that Python is the number one most in-demand skills in 2021.” For me, SQL is easier to learn than Python, a simple language to get started, a beginner-friendly. So, start with simple things.

Relational Database Management System (RDBMS)

Before we jump into SQL Commands, let’s start with RDBMS. RDBMS is a software system that enables users to define, create, maintain and access to the database based on the relational model. There are several types of RDBMS in this world, including:

  • MySQL and MariaDB
  • PostgreSQL
  • SQLite
  • Oracle Database
  • Microsoft SQL Server
  • IBM DB2
  • Microsoft Azure SQL Database

There are a few differences SQL Commands on each RDBMS but not very significant, don’t worry too much about it. For this time, we’ll be focusing on MySQL Commands (because just MySQL Commands that i know for now).

SQL with MySQL Commands

Actually, there are many types of SQL Statements/Commands that i found, such as Data Definition Language (DDL), Data Manipulation Language (DML), Data Query Language (DQL), Data Control Language (DCL), and Transaction Control Language (TCL). The picture in below is the distribution of types of SQL Statements/Commands:

Image created by Author

From many resources, DQL, DDL, and DML are enough for data science. On this opportunity, i will cover Data Query Language (DQL) first.

Table, Column, and Row

When we enter the world of relational databases, we can’t escape from Table, Column, and Row. Maybe, some of you know so well about those things. Let’s review again.

Image created by Author

Look at the Employee table.

Image created by Author
  • A column is a vertical series of cells in a table
  • A row is a horizontal series of cells in a table

SELECT and FROM

SELECT *
FROM table_name;

or

SELECT column1, column2, column3, ...
FROM table_name;

SELECT and FROM would be your best friend when you’re querying the table. Why? Because those two commands are essential things. You will need SELECT and FROM to retrieve data from the table.

  • SELECT command is used to select data from database and followed by column(s) name. We can retrieve data that consists of all columns using asterisk (*).
  • SELECT is paired with FROM statement. FROM defines which table will be used.

For further understanding, look at the example:

SELECT *
FROM Employee;
Image created by Author

We selected all columns from the Employee table (left) using that command. The result is on the right side. We can retrieve a column or even multiple columns:

SELECT Name
FROM Employee;
Image created by Author

SELECT DISTINCT

SELECT DISTINCT column1, column2, column3, ...
FROM table_name;

SELECT DISTINCT is simply addition of DISTINCT command in SELECT statement. When you have multiple values in a column, you can return only different values using DISTINCT command together with SELECT command. For example:

SELECT DISTINCT City
FROM Employee;

Afterwards, the result returns only different values.

WHERE

SELECT column1, column2, ...
FROM table_name
WHERE condition;

Sometimes, we need a specified condition in our query. WHERE statement has a role in it. WHERE is a SQL statement which is used to filter our records. The following SQL statement selects Name and Age column from Age greater than 30:

SELECT Name, Age
FROM Employee
WHERE Age > 30;

As we can see, the table returns only a specified condition.

ORDER BY

SELECT column1, column2, ...
FROM table_name
ORDER BY column1, column2, ... ASC|DESC;

Do you want to know, who is the oldest employee or the highest salary in your company? Here is the ORDER BY command. ORDER BY does an excellent job that is used to sort the records. There are two types of ordering, descending or ascending order. By default, ORDER BY sorts the records in ascending order. The following SQL statement selects all employees from the Employee table and sorted by Age in descending order:

SELECT *
FROM Employee
ORDER BY Age DESC;
Image created by Author

The table returns by Age in descending order. If we have multiple values in that column, the data that comes first will be returned first. In the table, John comes first before Ben, so John will be returned first.

Thanks for Reading!

Now that you understanding how basic SQL Commands work. Actually, i divide this series into parts. So, follow me and stay tuned!

Achmad Rifki

--

--

Achmad Rifki Raihansyah Bagja
CodeX

Student of Electrical Engineering. Data Scientist wanna be