SQL Intermediate 1: INNER and OUTER Joins
SQL
In the world of databases, SQL joins are crucial for combining data effectively. As organizations collect vast amounts of information, the ability to merge data from different sources becomes essential. In this blog post, we'll explore SQL joins, understanding the types like INNER, OUTER, LEFT, and RIGHT joins. By the end, you'll be skilled at merging data, gaining insights, and becoming a SQL expert. Let's dive into SQL joins to unlock their full potential for data-driven decisions.
Hello everyone and welcome to the first post of the SQL Intermediate series.
Today, we are starting with some intermediate concepts to take our skills up to the next level, starting with INNER and OUTER joins.
Hope you enjoyed and finished the exercises from our SQL Basics series - Links available at end of this post, which is everything you need to know to get started in the SQL world.
NOTE: To follow up with the next exercises, let's practice what we have learned so far, by adding some NULL values into our existing tables.
HOW TO INSERT NULL VALUES IN SQL TABLES?
To insert new rows into a table with NULL values is simple!
We use statement INSERT INTO table_name VALUES.
When filling the data for each table column, ensure the columns we want NULL values are not specified in the INSERT statement. By omitting the columns or using the keyword NULL for those columns, the database will automatically set those columns to NULL.
Need more details? Check exercise 2. INSERT DATA INTO TABLES here.
-> Add 2 new employees into ‘EmployeeDemographics’ table with NULL values:
“INSERT INTO EmployeeDemographics (EmployeeID, Age) VALUES (1010, 55)” -> Execute
“INSERT INTO EmployeeDemographics (FirstName, Gender) VALUES (‘Sarah’, ‘Female’)” -> Execute
-> Add 2 new employees into ‘EmployeeSalary’ table with NULL values:
“INSERT INTO EmployeeSalary (EmployeeID, Salary) VALUES (1011, 78000)” -> Execute
“INSERT INTO EmployeeSalary (JobTitle, Salary) VALUES ('HR', 55000)” -> Execute
After adding new employees, our tables should look like:
Let’s start…
SQL provides various powerful tools for joining tables, enabling us to combine data from multiple sources to gain deeper insights. Among these, inner and outer joins stand out as fundamental concepts for data retrieval and analysis. In this blog post, we will explore the key differences between inner and outer joins in SQL and understand how they bridge data gaps.
What are Joins in SQL?
In SQL, joins are used to combine rows from two or more tables based on related columns between them. Joins allow us to retrieve data that is spread across multiple tables, creating a unified view of the information.
Basically, we are combining two tables to create one output.
NOTE: To be able to join tables, they must have one shared column. For our exercises, EmployeeID is the common column in both tables to join these tables together to create one output.
The syntax for join tables is:
SELECT: Specify which columns or expressions we want to retrieve from the database.
FROM: Indicates the tables from which we want to retrieve data.
INNER JOIN: Combine rows from two or more tables based on a related condition.
ON: Specifies the condition for the join. It defines how the two tables should be connected.
INNER Joins:
Inner joins are the most used type of join in SQL.
They return only the matching rows from both tables based on the specified condition. Any rows that do not have matching values in both tables are excluded from the result set.
For example, let’s inner join tables ‘EmployeeDemographics’ and ‘EmployeeSalary’.
“SELECT * FROM EmployeeDemographics INNER JOIN EmployeeSalary ON EmployeeDemographics.EmployeeID = EmployeeSalary.EmployeeID” -> Execute
The result is the matching rows from both tables combined.
INNER join, or just join which by default is INNER join, pretty much only brings everything that is in common, overlapping between both tables. The values that are missing, are NULL fields in both tables.
Let me be clearer about it:
NOTE: Use diagram to visualize types of INNER and OUTER SQL joins.
INNER JOIN: Returns only the matching rows from both tables based on the specified condition. It filters out non-matching rows, leaving behind only the data that exists in both tables.
With our previous example, we have two tables, 'EmployeeDemographics' and 'EmployeeSalary', an INNER join between them gave us only the rows where all fields were full.
OUTER Joins:
Outer joins, on the other hand, include all the rows from one table and the matching rows from the other table. If no match is found for a row in one table, the result will still include that row with NULL values for columns from the other table.
There are three types of outer joins:
Full Outer Join
Left Outer Join (or LEFT JOIN)
Right Outer Join (or RIGHT JOIN)
Full OUTER Join:
A Full OUTER Join returns all rows from both tables, regardless of whether there is a match or not. When there is no match for a row in one table, the columns from the other table will contain NULL values.
Example: A full OUTER join will give us all ‘EmployeeDemographics’ and all ‘EmployeeSalary’ tables, including values that do not match.
“SELECT * FROM EmployeeDemographics FULL JOIN EmployeeSalary ON EmployeeDemographics.EmployeeID = EmployeeSalary.EmployeeID” -> Execute
As we can see, based on our diagram, the output is everything from table left/A and table right/B, regardless of if it has a match or not.
Left Join (or Left OUTER Join):
A Left Join returns all the rows from the left/A table and the matching rows from the right/B table. If there is no match for a row in the right/B table, the columns from the right/B table will contain NULL values.
Example: a left join on 'EmployeeDemographics' will take the left/A table and bring only what right/B table is overlapping. If it is not overlapping with left/A table, we do not want in our output.
NOTE: Left/A table is the first table we use – FROM table. Right/B table is the second table that we will be matching with FROM table.
“SELECT * FROM EmployeeDemographics LEFT JOIN EmployeeSalary ON EmployeeDemographics.EmployeeID = EmployeeSalary.EmployeeID” -> Execute
The output gives us the information from the ‘EmployeeDemographics’ table. If ‘EmployeeSalary’ has no EmployeeID, the EmployeeID columns will show NULL, like rows 10 and 11.
Right Join (or Right Outer Join):
A Right Join returns all the rows from the right/B table and the matching rows from the left/A table. If there is no match for a row in the left/A table, the columns from the left/A table will contain NULL values.
In simple words, the exact opposite of Left Join.
Example: A left join on 'EmployeeDemographics' will take the right/B table and bring only what left/A table is overlapping. If it is not overlapping with right/B table, we do not want in our output.
“SELECT * FROM EmployeeDemographics RIGHT JOIN EmployeeSalary ON EmployeeDemographics.EmployeeID = EmployeeSalary.EmployeeID” -> Execute
As result, we get everything from ‘EmployeeSalary’ table. If table ‘EmployeeDemographics’ does not match, fields get NULL, like rows 10 and 11.
In summary, INNER join selects only matching rows, FULL OUTER join combines all rows from both tables, LEFT join keeps all rows from the left table, and RIGHT join keeps all rows from the right table. Each type of join serves different purposes based on the relationships between the tables and the data you want to retrieve.
HINT: For these exercises, we used SELECT *. However, when we are working with real data, we probably want to select exactly columns we want our output. To do that, we need to specify what table we are retrieving the data together with the common column.
Let’s say we want to work with EmployeeID (common column), LastName and JobTitle, based on ‘EmployeeDemographics’ table:
“SELECT EmployeeDemographics.EmployeeID, LastName, JobTitle FROM EmployeeDemographics LEFT JOIN EmployeeSalary ON EmployeeDemographics.EmployeeID = EmployeeSalary.EmployeeID” -> Execute
Comparing INNER and OUTER Joins:
The primary distinction between INNER and OUTER joins lies in their result sets:
INNER joins return only the matching rows from both tables, excluding non-matching rows.
OUTER joins include all rows from at least one table and match rows from the other table where applicable, filling non-matching rows with NULL.
When to Use INNER or OUTER Joins:
Use INNER joins when you need to retrieve data that exists in both tables and when you want to filter out rows with no matches in either table. Inner joins are efficient and often used to fetch related data.
Alternatively, use OUTER joins when you want to include unmatched rows from one table while still linking them to the matching rows in the other table. OUTER joins are helpful when dealing with optional relationships between tables or when exploring data gaps.
Understanding the difference between INNER and OUTER joins in SQL is crucial for effective data retrieval and analysis. INNER joins help us find common data across tables, while OUTER joins enable us to bridge data gaps and include all rows, even if they don't have matches.
By leveraging the power of joins, SQL empowers us to unlock valuable insights from complex relational databases. Whether you're a beginner or an experienced SQL user, mastering these join types will undoubtedly enhance your data manipulation capabilities.
Thank you for reading and keep on learning and practicing. We got his!
If you like this content and want to learn more, you might also read:
SQL Fundamentals for Data Analysis: A Hands-On Learning Series
SQL Basics 1: How to Create Table and Add Data in SSMS
SQL Basics 2: SELECT + FROM Statements
SQL Basics 4: GROUP BY and ORDER BY Statements
SQL Intermediate 1: INNER and OUTER Joins
SQL Intermediate 2: UNION Operator
SQL Intermediate 3: CASE Statement
SQL Intermediate 4: HAVING Clause
SQL Intermediate 5: UPDATE and DELETE Data
SQL Intermediate 6: AS - Aliasing
SQL Intermediate 7: PARTITION BY Clause
SQL Advanced 1: CTEs - Common Table Expressions
SQL Advanced 3: Stored Procedures
SQL Advanced 4: SUBQUERIES - In Select, From and Where
SQL Advanced 5: String Functions - TRIM, LTRIM, RTRIM, REPLACE, SUBSTRING, UPPER, and LOWER