SQL Advanced 1: CTEs - Common Table Expressions

SQL

Vanessa Gottardi

5/13/20236 min read

CTEs in sql
CTEs in sql

Welcome to the first post of our SQL Advanced series! Today, we're diving into the world of Common Table Expressions (CTEs). Learn how CTEs streamline complex queries into manageable components and improve modularity, readability, and code reuse. Whether it's handling hierarchical structures or advanced analytics, CTEs are your gateway to unlocking efficient, elegant SQL queries.

Hello everyone and welcome to the first post of our SQL Advanced series. Today’s topic is CTEs – Common Table Expressions.

Structured Query Language (SQL) is the bedrock of data analysis, enabling us to query, manipulate, and extract valuable insights from databases. Among the many powerful features it offers, Common Table Expressions (CTEs) stand out as a tool that can greatly enhance the efficiency and readability of your SQL queries. In this blog post, we'll take a deep dive into what CTEs are, why they matter for data analysts, and how to use them effectively.

What are Common Table Expressions (CTEs) in SQL?

A Common Table Expression (CTE) is a temporary result set that we can define within a SQL query. It provides a way to break down complex queries into more manageable and readable components.

Essentially, a CTE is a named subquery that we can reference within the context of a larger query. It's like creating a mini-table on the fly that we can work with, which can simplify our overall query structure and make it easier to understand.

Why do CTEs matter for Data Analysts?

Modularity and Readability:

Large, complex queries can quickly become challenging to read and maintain. CTEs allow us to break down our query logic into smaller, more understandable pieces. This makes our queries more modular and easier to follow, especially when dealing with complicated joins or subqueries.

Avoiding Code Duplication:

CTEs help us avoid repeating the same subquery multiple times within a larger query. Instead, we define the subquery once as a CTE and then reference it wherever needed. This not only improves code maintainability but also reduces the risk of errors due to inconsistent subqueries.

Recursive Queries:

In certain scenarios, we might need to perform recursive queries, like crossing hierarchical data structures. CTEs are an essential tool for handling such situations, as they allow us to iteratively build and process results.

Window Functions:

CTEs can be used in conjunction with window functions to perform complex analytical tasks, such as ranking, aggregation, and running totals. This combination provides a powerful way to analyze data within specific partitions or windows.

CTEs Syntax in SQL:

To define a CTE, use the WITH keyword followed by the CTE name and the subquery.

CTE syntax in sql
CTE syntax in sql

Now, let’s create a CTE using our tables ‘EmployeeDemographics’ and ‘EmployeeSalary’. To reference CTEs in SQL:

Once we've defined a CTE, we can reference it in the main query just like we would with a table.

WITH CTE_Employee AS (

SELECT FirstName, LastName, Gender, Salary,

COUNT (Gender) AS Count_Gender, AVG (Salary) AS AVG_Salary

FROM EmployeeDemographics AS Demo

JOIN EmployeeSalary AS Sal

ON Demo.EmployeeID = Sal.EmployeeID

GROUP BY FirstName, Lastname, Gender, Salary

)

SELECT *

FROM CTE_Employee

how to use CTE in sql
how to use CTE in sql

WITH CTE_Employee AS (...):

This line initiates the creation of a Common Table Expression (CTE) named CTE_Employee.

The WITH keyword is used to define the CTE, and the subsequent query inside the parentheses is the definition of the CTE itself.

SELECT FirstName, LastName, Gender, Salary, COUNT (Gender) AS Count_Gender, AVG (Salary) AS AVG_Salary FROM EmployeeDemographics AS Demo JOIN EmployeeSalary AS Sal ON Demo.EmployeeID = Sal.EmployeeID GROUP BY FirstName, LastName, Gender, Salary:

This is the definition of the CTE. It starts by selecting columns FirstName, LastName, Gender, and Salary from the tables ‘EmployeeDemographics’ (aliased as Demo) and ‘EmployeeSalary’ (aliased as Sal).

It then includes two aggregate functions:

  • COUNT (Gender) AS Count_Gender:

This counts the occurrences of each gender within the group.

  • AVG (Salary) AS AVG_Salary:

This calculates the average salary within the group.

GROUP BY:

Used to group the data based on the selected columns: FirstName, LastName, Gender, and Salary.

SELECT * FROM CTE_Employee:

This is the main query that references the CTE. It selects all columns (*) from the CTE named CTE_Employee.

This query is useful for analyzing the distribution of salaries and gender in our employee data. It gives us insights into how many employees of each gender fall within different salary ranges and provides an average salary figure for each group.

Multiple CTEs in SQL:

We can define multiple CTEs in the same query, making it easier to compartmentalize different aspects of our analysis, we want to perform the following tasks:

  • Calculate the average salary for each gender.

  • Find the number of employees per gender.

  • Determine the total number of employees.

We can achieve this using multiple CTEs:

--- Calculate the average salary per gender

WITH AvgSalaryPerGender AS (

SELECT Gender, AVG(Salary) AS AvgSalary

FROM EmployeeDemographics AS Demo

JOIN EmployeeSalary AS Sal ON Demo.EmployeeID = Sal.EmployeeID

GROUP BY Gender

),

--- Count employees per gender

EmployeeCountPerGender AS (

SELECT Gender, COUNT(*) AS EmployeeCount

FROM EmployeeDemographics

GROUP BY Gender

)

--- Calculate total employee count

SELECT

ASCPG.Gender,

ASCPG.AvgSalary,

ECPG.EmployeeCount,

SUM(ECPG.EmployeeCount) OVER () AS TotalEmployeeCount

FROM AvgSalaryPerGender AS ASCPG

JOIN EmployeeCountPerGender AS ECPG

ON ASCPG.Gender = ECPG.Gender

how to use CTE in sql
how to use CTE in sql

In this query:

The first CTE (AvgSalaryPerGender) calculates the average salary for each gender by joining the ‘EmployeeDemographics’ and ‘EmployeeSalary’ tables and grouping by gender.

The second CTE (EmployeeCountPerGender) calculates the number of employees per gender by grouping the ‘EmployeeDemographic’ table by gender.

The final query joins the two CTEs and adds a column to calculate the total employee count using a window function.

This example demonstrates how we can effectively use multiple CTEs in a single query to organize and process different aspects of our data analysis.

Recursive CTEs in SQL:

For this example, we do not have appropriated data, however, I want to give you an idea how to use it.

Imagine you have a list of tasks, and some of these tasks have subtasks. Recursive CTEs in SQL work like a loop that goes through this list and its sublists, helping you understand the hierarchical relationship among tasks. It's like looking at a family tree where you have parents and children, and sometimes even grandchildren.

How Does It Work?

Anchor Member (Starting Point): Think of this as where the loop begins. It's like the main list of tasks you start with, without any subtasks. This initial part of the query gives the base for the recursion to build on.

Recursive Member (Building the Loop): Now, think of each subtask as building on the previous task. The recursive member connects tasks with their subtasks and keeps going until there are no more subtasks left. It's like adding layers to your understanding of the tasks.

Why Use Recursive CTEs?

Recursive CTEs are like having a superpower to work with hierarchical data. They're perfect when you want to understand complex relationships, like organizational structures or family trees. They're especially handy for tasks that involve iterating through layers of data until you reach a point where there's no more nesting.

Common Table Expressions (CTEs) are a valuable tool in the SQL arsenal of any data analyst. They enhance the readability and maintainability of queries, allow for modular code organization, and enable advanced analytical tasks. By mastering CTEs, we'll be better equipped to tackle complex data analysis challenges and produce more efficient, elegant, and insightful SQL queries.

Related Stories