SQL Advanced 1: CTEs - Common Table Expressions
SQL
Welcome to the first post of our SQL Advanced series! Today, we're diving into the world of Common Table Expressions (CTEs). Learn how CTEs streamline complex queries into manageable components and improve modularity, readability, and code reuse. Whether it's handling hierarchical structures or advanced analytics, CTEs are your gateway to unlocking efficient, elegant SQL queries.
Hello everyone and welcome to the first post of our SQL Advanced series. Today’s topic is CTEs – Common Table Expressions.
Structured Query Language (SQL) is the bedrock of data analysis, enabling us to query, manipulate, and extract valuable insights from databases. Among the many powerful features it offers, Common Table Expressions (CTEs) stand out as a tool that can greatly enhance the efficiency and readability of your SQL queries. In this blog post, we'll take a deep dive into what CTEs are, why they matter for data analysts, and how to use them effectively.
What are Common Table Expressions (CTEs) in SQL?
A Common Table Expression (CTE) is a temporary result set that we can define within a SQL query. It provides a way to break down complex queries into more manageable and readable components.
Essentially, a CTE is a named subquery that we can reference within the context of a larger query. It's like creating a mini-table on the fly that we can work with, which can simplify our overall query structure and make it easier to understand.
Why do CTEs matter for Data Analysts?
Modularity and Readability:
Large, complex queries can quickly become challenging to read and maintain. CTEs allow us to break down our query logic into smaller, more understandable pieces. This makes our queries more modular and easier to follow, especially when dealing with complicated joins or subqueries.
Avoiding Code Duplication:
CTEs help us avoid repeating the same subquery multiple times within a larger query. Instead, we define the subquery once as a CTE and then reference it wherever needed. This not only improves code maintainability but also reduces the risk of errors due to inconsistent subqueries.
Recursive Queries:
In certain scenarios, we might need to perform recursive queries, like crossing hierarchical data structures. CTEs are an essential tool for handling such situations, as they allow us to iteratively build and process results.
Window Functions:
CTEs can be used in conjunction with window functions to perform complex analytical tasks, such as ranking, aggregation, and running totals. This combination provides a powerful way to analyze data within specific partitions or windows.
CTEs Syntax in SQL:
To define a CTE, use the WITH keyword followed by the CTE name and the subquery.
If you like this content and want to learn more, you might also read:
SQL Basics 1: How to Create Table and Add Data in SSMS
SQL Basics 2: SELECT + FROM Statements
SQL Basics 4: GROUP BY and ORDER BY Statements
SQL Intermediate 1: INNER and OUTER Joins
SQL Intermediate 2: UNION Operator:
SQL Intermediate 3: CASE Statement
SQL Intermediate 4: HAVING Clause
SQL Intermediate 5: UPDATE and DELETE Data
SQL Intermediate 6: AS - Aliasing
SQL Intermediate 7: PARTITION BY Clause
SQL Advanced 1: CTEs - Common Table Expressions
SQL Advanced 3: Stored Procedures
SQL Advanced 4: SUBQUERIES - In Select, From and Where
SQL Advanced 5: String Functions - TRIM, LTRIM, RTRIM, REPLACE, SUBSTRING, UPPER, and LOWER
Now, let’s create a CTE using our tables ‘EmployeeDemographics’ and ‘EmployeeSalary’. To reference CTEs in SQL:
Once we've defined a CTE, we can reference it in the main query just like we would with a table.
WITH CTE_Employee AS (
SELECT FirstName, LastName, Gender, Salary,
COUNT (Gender) AS Count_Gender, AVG (Salary) AS AVG_Salary
FROM EmployeeDemographics AS Demo
JOIN EmployeeSalary AS Sal
ON Demo.EmployeeID = Sal.EmployeeID
GROUP BY FirstName, Lastname, Gender, Salary
)
SELECT *
FROM CTE_Employee
WITH CTE_Employee AS (...):
This line initiates the creation of a Common Table Expression (CTE) named CTE_Employee.
The WITH keyword is used to define the CTE, and the subsequent query inside the parentheses is the definition of the CTE itself.
SELECT FirstName, LastName, Gender, Salary, COUNT (Gender) AS Count_Gender, AVG (Salary) AS AVG_Salary FROM EmployeeDemographics AS Demo JOIN EmployeeSalary AS Sal ON Demo.EmployeeID = Sal.EmployeeID GROUP BY FirstName, LastName, Gender, Salary:
This is the definition of the CTE. It starts by selecting columns FirstName, LastName, Gender, and Salary from the tables ‘EmployeeDemographics’ (aliased as Demo) and ‘EmployeeSalary’ (aliased as Sal).
It then includes two aggregate functions:
COUNT (Gender) AS Count_Gender:
This counts the occurrences of each gender within the group.
AVG (Salary) AS AVG_Salary:
This calculates the average salary within the group.
GROUP BY:
Used to group the data based on the selected columns: FirstName, LastName, Gender, and Salary.
SELECT * FROM CTE_Employee:
This is the main query that references the CTE. It selects all columns (*) from the CTE named CTE_Employee.
This query is useful for analyzing the distribution of salaries and gender in our employee data. It gives us insights into how many employees of each gender fall within different salary ranges and provides an average salary figure for each group.
Multiple CTEs in SQL:
We can define multiple CTEs in the same query, making it easier to compartmentalize different aspects of our analysis, we want to perform the following tasks:
Calculate the average salary for each gender.
Find the number of employees per gender.
Determine the total number of employees.
We can achieve this using multiple CTEs:
--- Calculate the average salary per gender
WITH AvgSalaryPerGender AS (
SELECT Gender, AVG(Salary) AS AvgSalary
FROM EmployeeDemographics AS Demo
JOIN EmployeeSalary AS Sal ON Demo.EmployeeID = Sal.EmployeeID
GROUP BY Gender
),
--- Count employees per gender
EmployeeCountPerGender AS (
SELECT Gender, COUNT(*) AS EmployeeCount
FROM EmployeeDemographics
GROUP BY Gender
)
--- Calculate total employee count
SELECT
ASCPG.Gender,
ASCPG.AvgSalary,
ECPG.EmployeeCount,
SUM(ECPG.EmployeeCount) OVER () AS TotalEmployeeCount
FROM AvgSalaryPerGender AS ASCPG
JOIN EmployeeCountPerGender AS ECPG
ON ASCPG.Gender = ECPG.Gender
In this query:
The first CTE (AvgSalaryPerGender) calculates the average salary for each gender by joining the ‘EmployeeDemographics’ and ‘EmployeeSalary’ tables and grouping by gender.
The second CTE (EmployeeCountPerGender) calculates the number of employees per gender by grouping the ‘EmployeeDemographic’ table by gender.
The final query joins the two CTEs and adds a column to calculate the total employee count using a window function.
This example demonstrates how we can effectively use multiple CTEs in a single query to organize and process different aspects of our data analysis.
Recursive CTEs in SQL:
For this example, we do not have appropriated data, however, I want to give you an idea how to use it.
Imagine you have a list of tasks, and some of these tasks have subtasks. Recursive CTEs in SQL work like a loop that goes through this list and its sublists, helping you understand the hierarchical relationship among tasks. It's like looking at a family tree where you have parents and children, and sometimes even grandchildren.
How Does It Work?
Anchor Member (Starting Point): Think of this as where the loop begins. It's like the main list of tasks you start with, without any subtasks. This initial part of the query gives the base for the recursion to build on.
Recursive Member (Building the Loop): Now, think of each subtask as building on the previous task. The recursive member connects tasks with their subtasks and keeps going until there are no more subtasks left. It's like adding layers to your understanding of the tasks.
Why Use Recursive CTEs?
Recursive CTEs are like having a superpower to work with hierarchical data. They're perfect when you want to understand complex relationships, like organizational structures or family trees. They're especially handy for tasks that involve iterating through layers of data until you reach a point where there's no more nesting.
Common Table Expressions (CTEs) are a valuable tool in the SQL arsenal of any data analyst. They enhance the readability and maintainability of queries, allow for modular code organization, and enable advanced analytical tasks. By mastering CTEs, we'll be better equipped to tackle complex data analysis challenges and produce more efficient, elegant, and insightful SQL queries.