SQL Basics 4: GROUP BY and ORDER BY Statements
SQL
Welcome to the final part of our SQL Basics series! In this blog post, we'll explore the power of GROUP BY and ORDER BY statements in SQL. Follow hands-on examples using our "EmployeeDemographics" and "EmployeeSalary" tables that we have created in previous exercises. Elevate your SQL skills today!
Hello everyone, this the fourth and last post of our SQL Basics series.
If you missed the last 3 posts, and want to start from the beginning to follow up with our exercises, don’t be shy and check out:
SQL Basics 1: Create Table and Add Data in SSMS
We created 2 tables:
EmployeeDemographics
EmployeeSalary
We will be using these tables to go over GROUP BY and ORDER BY examples today.
SQL Basics 2: SELECT + FROM Statements
Which we learned the basics of SELECT and FROM statements, that we will be constantly using for queries.
Where we learned how to filter data like pro and perform transformative operations using comparison, logical, IN, BETWEEN and LIKE operators as well as NULL and NOT NULL within our queries.
Now, let’s talk about GROUP BY and ORDER BY statements in SQL.
Structured Query Language (SQL) is a powerful tool for managing and retrieving data from databases. Two essential clauses in SQL are GROUP BY and ORDER BY, which allow us to organize and manipulate data in meaningful ways. In this blog post, we'll explore what these statements are, how they work, and how they can be effectively used to enhance data analysis and reporting.
Understanding GROUP BY in SQL:
The GROUP BY statement in SQL is used to group rows with the same values in one or more columns.
It is commonly used in combination with aggregate functions like SUM, AVG, COUNT, MIN, and MAX to perform calculations on the grouped data.
The basic syntax of GROUP BY is as follows:
SELECT: Specifies the columns to be included in the result set.
aggregate_function: Represents the desired aggregate function, e.g., SUM, AVG, COUNT, MIN, or MAX, to perform calculations on grouped data.
FROM: Specifies the table containing the data.
GROUP BY: Indicates the columns by which the data should be grouped.
The GROUP BY statement is like DISTINCT and SELECT statements, showing the unique values in a column.
The difference is that, the DISTINCT clause counts by a unique value. For example, DISTINCT will find that in our EmployeeDemographics table are two rooms: one male room and one female room.
On the other hand, GROUP BY counts the unique rooms, but also counts how many employees are in each room, like how many males are in the male room, and how many females are in the female room.
Confusing?
Let me talk in SQL language for you:
If we want to find a DISTINCT gender, the query is “SELECT DISTINCT (Gender) FROM EmployeeDemographics” -> Execute
The result brings 2 values. One for female and one value for male.
However, if we say the gender, then GROUP BY gender, “SELECT DISTINCT (Gender) FROM EmployeeDemographics GROUP BY Gender” -> Execute
Even though the result looks the same, what is really showing us is the group of all females in row 1, and the group of all males in row 2.
To better show this, let’s count how many employees are in each gender room.
To count each group, the query is “SELECT Gender, COUNT (Gender) FROM EmployeeDemographics GROUP BY Gender” -> Execute
As we can see now, when we count each employee in their respective room, there are 4 females and 5 males.
NOTE: I named the column as GenderCount by adding AS function.
OK… Let’s spice up a little!
Now, we want to see gender and age, counting how many employees have the same age in their respective male/female room, “SELECT Gender, Age, COUNT (Gender) FROM EmployeeDemographics GROUP BY Gender, Age” -> Execute
The result shows us that none of them have the same age. If in the table we had 2 or 3 male employees with age 59, then we would have 2 or 3 on row 7, column (No column name) – as I didn’t name the column like last exercise.
When you are working with a big data, you can put many criteria on SELECT, as long you also put on GROUP BY, if you want to group the criteria.
The COUNT () is the derived field/column. It is based off the criteria – in this case gender. It is technically not a real column in the dataset, it is one that we are creating based on our input. That is why we can name the column as we wish.
The COUNT () is the aggregate_function, which could be SUM, AVG, COUNT, MIN, or MAX, depending on what result you are looking for.
This query works if you SELECT and GROUP BY existing columns in the database. In this case, gender, and age.
With this same query, we can still add some more things:
Let’s say we want gender and age, counting how many employees have the same age in their respective male/female room (like before), but we want only employees under 55, “SELECT Gender, Age, COUNT (Gender) FROM EmployeeDemographics WHERE Age < 55 GROUP BY Gender, Age” -> Execute
As result, we get a list of employees with age under 55. If both females in their 40’s were 41, (No column name) in row 1, would count 2, instead of 1, and row 2 would become Male – 49 – 1.
Exploring ORDER BY in SQL:
The ORDER BY statement is used to sort the result set based on one or more columns.
It allows you to display data in a specified order, such as ascending (ASC) or descending (DESC).
The basic syntax of ORDER BY is as follows:
SELECT: Specifies the columns to be included in the result set.
FROM: Specifies the table containing the data.
ORDER BY: Indicates the columns by which the result set should be sorted, along with the sorting direction (ASC for ascending and DESC for descending). If not specified, ASC is assumed by default.
Often, it is useful to use both GROUP BY and ORDER BY in the same query to perform more advanced data analysis.
For our first example, let’s count how many employees in our list is under 55 per gender. “SELECT Gender, COUNT (Gender) AS CountGender FROM EmployeeDemographics WHERE Age < 55 GROUP BY Gender ORDER BY CountGender” -> Execute
To start, we named the new column as CountGender. By default, CountGender came out in sequence 1 – 4. That because by default, SQL has an ascending feature, which means smallest to largest.
If we want to change that: “SELECT Gender, COUNT (Gender) AS CountGender FROM EmployeeDemographics WHERE Age < 55 GROUP BY Gender ORDER BY CountGender DESC” -> Execute
Now we have 4 – 1 sequence.
Let’s do the same query but instead of ordering by CountGender, we do based on gender, which means that now we are looking at A to Z. Since we are using descending, Z to A. “SELECT Gender, COUNT (Gender) AS CountGender FROM EmployeeDemographics WHERE Age < 55 GROUP BY Gender ORDER BY Gender ASC” -> Execute
The result is male before female because we are ordering by gender descending, that brings male before female (Z to A).
If we change the order to ascending (default feature), “SELECT Gender, COUNT (Gender) AS CountGender FROM EmployeeDemographics WHERE Age < 55 GROUP BY Gender ORDER BY Gender DESC” -> Execute
Female goes first, as F comes before M in the alphabetical order.
Now, let’s look into the whole table and order by age. “SELECT * FROM EmployeeDemographics ORDER BY Age” -> Execute
The return is age smallest to largest, as ascending is the default feature.
If we do descending. “SELECT * FROM EmployeeDemographics ORDER BY Gender, Age DESC” -> Execute
The opposite happens. Largest to smallest.
We can also play around and order by using more than one criterion.
For example, we can order by gender and age. “SELECT * FROM EmployeeDemographics ORDER BY Age, Gender” -> Execute
The result comes under the age, but we also have it ordered by gender. Which means that, if we had a male and female with the same age, female would be first on the result table, in the case of ascending, and male first if we use descending feature.
NOTE: We can choose descending or ascending for each criterion just by placing ASC/DESC by it.
“SELECT * FROM EmployeeDemographics ORDER BY Age DESC, Gender ASC” -> Execute
Another thing you can do is to use numbers instead of column names.
By the default, EmployeeID is column 1, FirstName is column 2, LastName is column 3, Age is column 4, and gender is column 5.
“SELECT * FROM EmployeeDemographics ORDER BY 4 DESC, 5 ASC” -> Execute
Same exact result, but we used numbers to refer to the column instead of their column name.
In conclusion, GROUP BY and ORDER BY are fundamental SQL clauses that empower data analysts and developers to manipulate data effectively.
The GROUP BY statement allows for grouping data by one or more columns and applying aggregate functions to perform calculations. On the other hand, ORDER BY enables the sorting of result sets based on specified columns, providing more control over data presentation.
By understanding and effectively using these powerful SQL statements, you can gain valuable insights from your databases and streamline data reporting processes.
Congratulations!
We just finished our SQL Basics exercises. Keep practicing mastering the skill. Create new tables, think of new queries. When you feel ready, let’s move on to more advanced exercises. They will be a little more challenging but very useful for any analyst.
Thank you for reading and keep on learning and practicing. We got his!
If you like this content and want to learn more, you might also read:
SQL Fundamentals for Data Analysis: A Hands-On Learning Series
SQL Basics 1: How to Create Table and Add Data in SSMS
SQL Basics 2: SELECT + FROM Statements
SQL Basics 4: GROUP BY and ORDER BY Statements
SQL Intermediate 1: INNER and OUTER Joins
SQL Intermediate 2: UNION Operator
SQL Intermediate 3: CASE Statement
SQL Intermediate 4: HAVING Clause
SQL Intermediate 5: UPDATE and DELETE Data
SQL Intermediate 6: AS - Aliasing
SQL Intermediate 7: PARTITION BY Clause
SQL Advanced 1: CTEs - Common Table Expressions
SQL Advanced 3: Stored Procedures
SQL Advanced 4: SUBQUERIES - In Select, From and Where
SQL Advanced 5: String Functions - TRIM, LTRIM, RTRIM, REPLACE, SUBSTRING, UPPER, and LOWER