SQL Basics 4: GROUP BY and ORDER BY Statements

SQL

Vanessa Gottardi

5/4/20238 min read

how to use GROUP BY and ORDER BY statement in sql

Welcome to the final part of our SQL Basics series! In this blog post, we'll explore the power of GROUP BY and ORDER BY statements in SQL. Follow hands-on examples using our "EmployeeDemographics" and "EmployeeSalary" tables that we have created in previous exercises. Elevate your SQL skills today!

Hello everyone, this the fourth and last post of our SQL Basics series.

If you missed the last 3 posts, and want to start from the beginning to follow up with our exercises, don’t be shy and check out:

SQL Basics 1: Create Table and Add Data in SSMS

We created 2 tables:

EmployeeDemographics
EmployeeSalary

We will be using these tables to go over GROUP BY and ORDER BY examples today.

SQL Basics 2: SELECT + FROM Statements

Which we learned the basics of SELECT and FROM statements, that we will be constantly using for queries.

SQL Basics 3: WHERE Statement

Where we learned how to filter data like pro and perform transformative operations using comparison, logical, IN, BETWEEN and LIKE operators as well as NULL and NOT NULL within our queries.

Now, let’s talk about GROUP BY and ORDER BY statements in SQL.

Structured Query Language (SQL) is a powerful tool for managing and retrieving data from databases. Two essential clauses in SQL are GROUP BY and ORDER BY, which allow us to organize and manipulate data in meaningful ways. In this blog post, we'll explore what these statements are, how they work, and how they can be effectively used to enhance data analysis and reporting.

Understanding GROUP BY in SQL:

The GROUP BY statement in SQL is used to group rows with the same values in one or more columns.

It is commonly used in combination with aggregate functions like SUM, AVG, COUNT, MIN, and MAX to perform calculations on the grouped data.

The basic syntax of GROUP BY is as follows:

SELECT: Specifies the columns to be included in the result set.

aggregate_function: Represents the desired aggregate function, e.g., SUM, AVG, COUNT, MIN, or MAX, to perform calculations on grouped data.

FROM: Specifies the table containing the data.

GROUP BY: Indicates the columns by which the data should be grouped.

The GROUP BY statement is like DISTINCT and SELECT statements, showing the unique values in a column.

The difference is that, the DISTINCT clause counts by a unique value. For example, DISTINCT will find that in our EmployeeDemographics table are two rooms: one male room and one female room.

On the other hand, GROUP BY counts the unique rooms, but also counts how many employees are in each room, like how many males are in the male room, and how many females are in the female room.

Confusing?

Let me talk in SQL language for you:

If we want to find a DISTINCT gender, the query is “SELECT DISTINCT (Gender) FROM EmployeeDemographics” -> Execute

The result brings 2 values. One for female and one value for male.

However, if we say the gender, then GROUP BY gender, “SELECT DISTINCT (Gender) FROM EmployeeDemographics GROUP BY Gender” -> Execute

Even though the result looks the same, what is really showing us is the group of all females in row 1, and the group of all males in row 2.

To better show this, let’s count how many employees are in each gender room.

To count each group, the query is “SELECT Gender, COUNT (Gender) FROM EmployeeDemographics GROUP BY Gender” -> Execute

As we can see now, when we count each employee in their respective room, there are 4 females and 5 males.

NOTE: I named the column as GenderCount by adding AS function.

OK… Let’s spice up a little!

Now, we want to see gender and age, counting how many employees have the same age in their respective male/female room, “SELECT Gender, Age, COUNT (Gender) FROM EmployeeDemographics GROUP BY Gender, Age” -> Execute

The result shows us that none of them have the same age. If in the table we had 2 or 3 male employees with age 59, then we would have 2 or 3 on row 7, column (No column name) – as I didn’t name the column like last exercise.

When you are working with a big data, you can put many criteria on SELECT, as long you also put on GROUP BY, if you want to group the criteria.

The COUNT () is the derived field/column. It is based off the criteria – in this case gender. It is technically not a real column in the dataset, it is one that we are creating based on our input. That is why we can name the column as we wish.

The COUNT () is the aggregate_function, which could be SUM, AVG, COUNT, MIN, or MAX, depending on what result you are looking for.

This query works if you SELECT and GROUP BY existing columns in the database. In this case, gender, and age.

With this same query, we can still add some more things:

Let’s say we want gender and age, counting how many employees have the same age in their respective male/female room (like before), but we want only employees under 55, “SELECT Gender, Age, COUNT (Gender) FROM EmployeeDemographics WHERE Age < 55 GROUP BY Gender, Age” -> Execute

As result, we get a list of employees with age under 55. If both females in their 40’s were 41, (No column name) in row 1, would count 2, instead of 1, and row 2 would become Male – 49 – 1.

Exploring ORDER BY in SQL:

The ORDER BY statement is used to sort the result set based on one or more columns.

It allows you to display data in a specified order, such as ascending (ASC) or descending (DESC).

The basic syntax of ORDER BY is as follows:

SELECT: Specifies the columns to be included in the result set.

FROM: Specifies the table containing the data.

ORDER BY: Indicates the columns by which the result set should be sorted, along with the sorting direction (ASC for ascending and DESC for descending). If not specified, ASC is assumed by default.

Often, it is useful to use both GROUP BY and ORDER BY in the same query to perform more advanced data analysis.

For our first example, let’s count how many employees in our list is under 55 per gender. “SELECT Gender, COUNT (Gender) AS CountGender FROM EmployeeDemographics WHERE Age < 55 GROUP BY Gender ORDER BY CountGender” -> Execute

To start, we named the new column as CountGender. By default, CountGender came out in sequence 1 – 4. That because by default, SQL has an ascending feature, which means smallest to largest.

If we want to change that: “SELECT Gender, COUNT (Gender) AS CountGender FROM EmployeeDemographics WHERE Age < 55 GROUP BY Gender ORDER BY CountGender DESC” -> Execute

Now we have 4 – 1 sequence.

Let’s do the same query but instead of ordering by CountGender, we do based on gender, which means that now we are looking at A to Z. Since we are using descending, Z to A. “SELECT Gender, COUNT (Gender) AS CountGender FROM EmployeeDemographics WHERE Age < 55 GROUP BY Gender ORDER BY Gender ASC” -> Execute

The result is male before female because we are ordering by gender descending, that brings male before female (Z to A).

If we change the order to ascending (default feature), “SELECT Gender, COUNT (Gender) AS CountGender FROM EmployeeDemographics WHERE Age < 55 GROUP BY Gender ORDER BY Gender DESC” -> Execute

Female goes first, as F comes before M in the alphabetical order.

Now, let’s look into the whole table and order by age. “SELECT * FROM EmployeeDemographics ORDER BY Age” -> Execute

The return is age smallest to largest, as ascending is the default feature.

If we do descending. “SELECT * FROM EmployeeDemographics ORDER BY Gender, Age DESC” -> Execute

The opposite happens. Largest to smallest.

We can also play around and order by using more than one criterion.

For example, we can order by gender and age. “SELECT * FROM EmployeeDemographics ORDER BY Age, Gender” -> Execute

The result comes under the age, but we also have it ordered by gender. Which means that, if we had a male and female with the same age, female would be first on the result table, in the case of ascending, and male first if we use descending feature.

NOTE: We can choose descending or ascending for each criterion just by placing ASC/DESC by it.

“SELECT * FROM EmployeeDemographics ORDER BY Age DESC, Gender ASC” -> Execute

Another thing you can do is to use numbers instead of column names.

By the default, EmployeeID is column 1, FirstName is column 2, LastName is column 3, Age is column 4, and gender is column 5.

“SELECT * FROM EmployeeDemographics ORDER BY 4 DESC, 5 ASC” -> Execute

Same exact result, but we used numbers to refer to the column instead of their column name.

In conclusion, GROUP BY and ORDER BY are fundamental SQL clauses that empower data analysts and developers to manipulate data effectively.

The GROUP BY statement allows for grouping data by one or more columns and applying aggregate functions to perform calculations. On the other hand, ORDER BY enables the sorting of result sets based on specified columns, providing more control over data presentation.

By understanding and effectively using these powerful SQL statements, you can gain valuable insights from your databases and streamline data reporting processes.

Congratulations!

We just finished our SQL Basics exercises. Keep practicing mastering the skill. Create new tables, think of new queries. When you feel ready, let’s move on to more advanced exercises. They will be a little more challenging but very useful for any analyst.

Thank you for reading and keep on learning and practicing. We got his!

If you like this content and want to learn more, you might also read:

SQL Fundamentals for Data Analysis: A Hands-On Learning Series

SQL Basics 1: How to Create Table and Add Data in SSMS

SQL Basics 2: SELECT + FROM Statements

SQL Basics 3: WHERE Statement

SQL Basics 4: GROUP BY and ORDER BY Statements

SQL Intermediate 1: INNER and OUTER Joins

SQL Intermediate 2: UNION Operator

SQL Intermediate 3: CASE Statement

SQL Intermediate 4: HAVING Clause

SQL Intermediate 5: UPDATE and DELETE Data