SQL Intermediate 4: HAVING Clause
SQL
Discover the untapped potential of SQL's HAVING clause in our latest blog post! Learn how this often-overlooked feature empowers data analysts to extract valuable insights from grouped data. We'll demystify the HAVING clause's syntax, explore practical examples, and showcase how it revolutionizes post-group filtering, providing a deeper understanding of data trends and patterns. Unleash the true power of SQL with the HAVING clause and elevate your data analysis to new heights!
Welcome to the fourth post of our SQL Intermediate series!
If it is your first time here, and want to follow along with our exercises from the beginning, at the end of this post you will find links for the complete SQL series.
Hope you have been doing the exercises and learning as we go 😊
Today’s topic is HAVING Clause. Ready?
Structured Query Language (SQL) is the foundation of modern data management, empowering users to extract meaningful insights from vast databases. When it comes to analyzing aggregated data, the HAVING clause emerges as a powerful tool to fine-tune results further.
Unlike the WHERE clause, which filters individual rows, the HAVING clause filters data after grouping, allowing for complex conditions based on aggregated values. In this blog post, we will dive into the HAVING clause, explore its syntax, and demonstrate how it empowers data analysts to gain deeper insights through practical examples using tables we have created in previous exercise.
Don’t have the tables yet? Create them following these steps:
SQL Basics 1: How to Create Table and Add Data in SSMS
NOTE: Let’s add a few new employees in our ‘EmployeeDemographics’ and ‘EmployeeSalary’ tables for today’s exercise.
INSERT INTO EmployeeDemographics VALUES
(1015, 'Ana', 'Silva', 47, 'Female'),
(1016, 'Davis', 'Mick', 50, 'Male'),
(1017, 'Jim', 'Park', 51, 'Male'),
(1018, 'Pam’, 'Palmer', 54, 'Female')
INSERT INTO EmployeeSalary VALUES
(1015,'Analyst', 112000)
(1016, 'Analyst', 120000)
(1017, 'HR', 70000)
(1018, 'Secretary', 75000)
To better understand the use of HAVING clause, make sure you review:
SQL Basics 4: GROUP BY and ORDER BY Statements
The Purpose of the HAVING Clause in SQL:
The HAVING clause extends the capabilities of SQL's GROUP BY clause by enabling us to filter grouped data based on aggregated results.
It allows us to focus on subsets of data that meet specific conditions after applying aggregate functions like SUM, COUNT, AVG, MAX, MIN, etc. Essentially, the HAVING clause acts as a post-processing filter, allowing analysts to identify patterns and trends within the summarized data.
Syntax of the HAVING Clause:
SELECT: Used to retrieve specific columns or expressions from a table or multiple tables.
"column1" and "column2": Individual columns we want to include in the output result. These columns must be valid columns in the table specified in the FROM clause.
aggregate_function(column3): Represents an aggregate function applied to "column3." Aggregate functions perform calculations on a group of rows and return a single result. Common aggregate functions include:
SUM(column) - calculates the sum of values in "column."
COUNT(column) - counts the number of non-null values in "column."
AVG(column) - calculates the average of values in "column."
MAX(column) - retrieves the maximum value from "column."
MIN(column) - retrieves the minimum value from "column."
aggregate_result: Creates an alias for the aggregated value. The alias is an optional step but helps in giving a meaningful name to the computed result, which is useful for better understanding the output.
Examples of Using the HAVING Clause:
First, let’s count how many job titles we have in our tables ‘EmployeesDemographics’ and ‘EmployeeSalary’, and how many employees have the same job title. For that, we are using JOIN and GROUP BY, as learned from previous exercises.
“SELECT JobTitle, COUNT(JobTitle)
FROM EmployeeDemographics
JOIN EmployeeSalary ON EmployeeDemographics.EmployeeID = EmployeeSalary.EmployeeID
GROUP BY JobTitle” -> Execute
Output is a table with the job titles and how many employees have the same job title.
Next, let’s look at job titles that have more than one employee.
“SELECT JobTitle, COUNT(JobTitle)
FROM EmployeeDemographics
JOIN EmployeeSalary ON EmployeeDemographics.EmployeeID = EmployeeSalary.EmployeeID
WHERE COUNT (JobTitle) > 1
GROUP BY JobTitle” -> Execute
Writing this query, we get a message:
An aggregate may not appear in the WHERE clause unless it is in a subquery contained in a HAVING clause or a select list, and the column being aggregated is an outer reference.
It means that we cannot use this aggregate function in the WHERE statement, leading us to use a HAVING Clause.
“SELECT JobTitle, COUNT(JobTitle)
FROM EmployeeDemographics
JOIN EmployeeSalary ON EmployeeDemographics.EmployeeID = EmployeeSalary.EmployeeID
HAVING COUNT (JobTitle) > 1
GROUP BY JobTitle” -> Execute
Another message appears:
Incorrect syntax near the keyword 'GROUP'.
The reason is because this HAVING statement depends on the GROUP BY statement. The HAVING statement must go after the GROUP BY statement because we cannot look at the aggregate information before it’s aggregated in that GROUP BY statement.
Let’s fix it and run one more time:
“SELECT JobTitle, COUNT(JobTitle)
FROM EmployeeDemographics
JOIN EmployeeSalary ON EmployeeDemographics.EmployeeID = EmployeeSalary.EmployeeID
GROUP BY JobTitle
HAVING COUNT (JobTitle) > 1" -> Execute
Boom! Working perfectly 😊
Let's explore another example:
First, let’s check the average salary of all job titles.
“SELECT JobTitle, AVG(Salary)
FROM EmployeeDemographics
JOIN EmployeeSalary ON EmployeeDemographics.EmployeeID = EmployeeSalary.EmployeeID
GROUP BY JobTitle
ORDER BY AVG(Salary)” -> Execute
Now, let’s add the HAVING Clause saying that average of salary where it is greater than $70.000.
“SELECT JobTitle, COUNT(JobTitle)
FROM EmployeeDemographics
JOIN EmployeeSalary ON EmployeeDemographics.EmployeeID = EmployeeSalary.EmployeeID
GROUP BY JobTitle
HAVING AVG(Salary) > 70000
ORDER BY AVG(Salary)” -> Execute
The outcome is job titles that have average salary over $70.000.
Leveraging the HAVING Clause for Data Analysis:
The HAVING clause becomes invaluable in situations where we need to identify patterns and trends within summarized data. By specifying aggregate conditions, we can efficiently filter results and focus on subsets that are most relevant to our analysis.
Whether it's sales performance, customer behavior, or product insights, the HAVING clause equips us with the tools to dig deeper and make data-driven decisions.
The HAVING clause in SQL serves as a powerful filter, allowing us to scrutinize aggregated data and uncover valuable insights. By applying conditions on grouped information, we can identify trends, patterns, and anomalies that might go unnoticed in individual rows.
As data analysts, mastering the HAVING clause empowers us to pose complex queries and derive meaningful conclusions from large datasets. So, the next time you work with aggregated data, remember the HAVING clause—a mighty ally in your quest for data-driven excellence.
If you like this content and want to learn more, you might also read:
SQL Basics 1: How to Create Table and Add Data in SSMS
SQL Basics 2: SELECT + FROM Statements
SQL Basics 4: GROUP BY and ORDER BY Statements
SQL Intermediate 1: INNER and OUTER Joins
SQL Intermediate 2: UNION Operator:
SQL Intermediate 3: CASE Statement
SQL Intermediate 4: HAVING Clause
SQL Intermediate 5: UPDATE and DELETE Data
SQL Intermediate 6: AS - Aliasing
SQL Intermediate 7: PARTITION BY Clause
SQL Advanced 1: CTEs - Common Table Expressions
SQL Advanced 3: Stored Procedures
SQL Advanced 4: SUBQUERIES - In Select, From and Where
SQL Advanced 5: String Functions - TRIM, LTRIM, RTRIM, REPLACE, SUBSTRING, UPPER, and LOWER