SQL Advanced 5: String Functions - TRIM, LTRIM, RTRIM, REPLACE, SUBSTRING, UPPER, and LOWER
SQL
Delve into the world of SQL's string functions with our blog post. We explore seven fundamental functions, such as TRIM, LTRIM, RTRIM, REPLACE, SUBSTRING, UPPER, and LOWER, that help clean, modify, and format text data. Learn how to eliminate extra spaces, capitalize first letters, and replace substrings. Join us as we unlock the potential of these tools for efficient data manipulation.
In the world of databases and structured data, SQL (Structured Query Language) is a powerful tool that allows us to interact with and manage data efficiently. One common task when working with data is manipulating strings, and SQL provides several handy string functions to help with that. In this blog post, we'll dive into seven fundamental string functions for beginners: TRIM, LTRIM, RTRIM, REPLACE, SUBSTRING, UPPER, and LOWER.
To do today’s exercise, let’s create a new table and insert data with some misspelling to demonstrate string functions:
CREATE TABLE Trim_Errors (
ID varchar(50),
FirstName varchar(50),
LastName varchar (50))
INSERT INTO Trim_Errors VALUES
(' 1001', 'Braddy', 'Smoth'),
('1002 ', 'Christian', 'Lopez'),
(' 1003 ', 'ANGelina', 'Miller - Smith')
SELECT *
FROM Trim_Errors
If you like this content and want to learn more, you might also read:
SQL Basics 1: How to Create Table and Add Data in SSMS
SQL Basics 2: SELECT + FROM Statements
SQL Basics 4: GROUP BY and ORDER BY Statements
SQL Intermediate 1: INNER and OUTER Joins
SQL Intermediate 2: UNION Operator
SQL Intermediate 3: CASE Statement
SQL Intermediate 4: HAVING Clause
SQL Intermediate 5: UPDATE and DELETE Data
SQL Intermediate 6: AS - Aliasing
SQL Intermediate 7: PARTITION BY Clause
SQL Advanced 1: CTEs - Common Table Expressions
SQL Advanced 3: Stored Procedures
SQL Advanced 4: SUBQUERIES - In Select, From and Where
SQL Advanced 5: String Functions - TRIM, LTRIM, RTRIM, REPLACE, SUBSTRING, UPPER, and LOWER
Comparing to our ‘EmployeeDemographics’ table – SQL Basics 1 – we find some discrepancies:
Spaces before ID number, correct first name is Brad, correct last name is Smith.
Spaces after ID number.
Spaces before and after ID number, we want Angelina instead of ANGelina, fix her last name to be only Miller.
Let's adjust them!
String Functions in SQL: TRIM
When dealing with text data, extra spaces at the beginning or end of a string can be problematic. The TRIM function comes to the rescue. It removes any leading or trailing spaces from a string, ensuring clean and consistent data.
Example: Let’s TRIM spaces from ID column.
SELECT ID, TRIM (ID) AS ID_TRIM
FROM Trim_Errors
As we can see, all spaces were deleted.
String Functions in SQL: LTRIM (Left TRIM)
Similar to TRIM, LTRIM is used to remove leading spaces from the beginning of a string. This function can be particularly useful when dealing with data that might have inconsistencies in spacing.
Example: Let’s LTRIM spaces.
SELECT ID, LTRIM (ID) AS ID_TRIM
FROM Trim_Errors
Here, only ID 1001 and 1003 were trimmed as they both had spaces before/left ID number. Note that ID 1003 still have spaces after/right ID number.
String Functions in SQL: RTRIM (Right TRIM)
Conversely, RTRIM eliminates trailing spaces from a string. It's essential for maintaining uniformity in your data.
Example: Let’s RTRIM spaces.
SELECT ID, RTRIM (ID) AS ID_TRIM
FROM Trim_Errors
String Functions in SQL: SUBSTRING (or SUBSTR)
Among the string functions in SQL, the SUBSTRING function stands out due to its ability to precisely extract specific portions of text data from a larger string. What makes SUBSTRING particularly unique is its flexible nature, allowing us to pinpoint where we want to begin and determine how many characters we want to retrieve.
Imagine a scenario where we have a long string, such as a sentence or a paragraph, and we're interested in extracting a specific segment from it. This is where SUBSTRING shines. We provide two crucial pieces of information:
Starting Point: We can specify the exact position from which we want the extraction to begin. This starting point can be specified using either a numeric index or even a string pattern.
Length: Additionally, we can define how many characters we want to retrieve from the starting point. This length can also be expressed using numeric values or string patterns.
This unique combination of starting point and length empowers us to extract exactly the data we need, making SUBSTRING a versatile tool for various tasks. Whether it's grabbing names from email addresses, abbreviating long descriptions, or capturing specific keywords, the SUBSTRING function lets us tailor our data extraction with precision.
Example: We want to fix Brad’s name from Braddy to Brad.
SELECT SUBSTRING (FirstName, 1, 4) AS Fixed_Name
FROM Trim_Errors
WHERE FirstName = 'Braddy'
In this case, only ID 1002 and 1003 were trimmed as they both had spaces after/right ID number. Note that ID 1003 still have spaces before/left ID number.
String Functions in SQL: REPLACE
The REPLACE function allows us to substitute occurrences of a specific substring within a larger string with another substring. This is great for cleaning up or modifying text data.
Example: Let’s fix Angelina’s last name to be only Miller. For that, we specify the part we want to change, then how we want it to be.
SELECT LastName, REPLACE (LastName, '- Smith', '') AS Fixed_LastName
FROM Trim_Errors
Example: Ok... Let’s spice up a little by querying to retrieve specific substrings of first name from our two tables ‘Trim_Errors’ and ‘EmployeeDemographics’, and then join them based on a common substring of the first four characters.
SELECT SUBSTRING(trim.FirstName, 1, 4) AS Trim_Name,
SUBSTRING(Demo.FirstName, 1, 4) AS Demo_Name
FROM Trim_Errors AS Trim
JOIN EmployeeDemographics AS Demo
ON SUBSTRING(trim.FirstName, 1, 4) = SUBSTRING(Demo.FirstName, 1, 4)
WHERE Trim.FirstName = 'Braddy'
Let's break down each part of the query:
SELECT Clause:
SUBSTRING(trim.FirstName, 1, 4) AS Trim_Name: This extracts the first four characters from the 'FirstName' column in the Trim_Errors table and gives it the alias 'Trim_Name'.
SUBSTRING(Demo.FirstName, 1, 4) AS Demo_Name: Similarly, this extracts the first four characters from the 'FirstName' column in the EmployeeDemographics table and assigns the alias 'Demo_Name'.
FROM Clause:
FROM Trim_Errors AS Trim: This defines an alias 'Trim' for the Trim_Errors table, which will be used to reference columns from this table throughout the query.
JOIN Clause:
JOIN EmployeeDemographics AS Demo: This also creates an alias 'Demo' for the EmployeeDemographics table, which will be used similarly.
ON SUBSTRING(trim.FirstName, 1, 4) = SUBSTRING(Demo.FirstName, 1, 4): This establishes a join between the 'Trim_Errors' table and the 'EmployeeDemographics' table. It matches rows where the first four characters of 'FirstName' in the Trim_Errors table are equal to the first four characters of 'FirstName' in the EmployeeDemographics table.
WHERE Clause:
WHERE Trim.FirstName = 'Braddy': This filters the results to only include rows where the 'FirstName' in the Trim_Errors table is 'Braddy'.
In summary, this query extracts the first four characters of 'FirstName' from two tables, 'Trim_Errors' and 'EmployeeDemographics', and then joins them based on those extracted substrings. The resulting output will display the first four characters of the names from both tables side by side for cases where the 'FirstName' in the Trim_Errors table is 'Braddy'.
Here are some scenarios when we might want to use a similar example:
Data Cleansing and Matching: If we have data from different sources that might not match exactly but have common substrings, we can use this kind of query to identify potential matches. For instance, if we're dealing with variations of names (e.g., 'Brad', 'Braddison', 'Braddy'), we can compare the first few characters to find potential duplicates or matches.
Data Analysis: Suppose we want to perform an analysis based on the first few characters of a name, such as understanding the distribution of names or trends within specific name segments. This query structure can help you aggregate and analyze data efficiently.
Fuzzy Matching: When dealing with potentially inconsistent or misspelled data, comparing substrings can be a way to perform a form of fuzzy matching. We're not looking for exact matches but rather for similarities in the beginning of strings. It is useful for columns like gender, last name, age, and date of birth as they will be the same for a specific person getting a high accuracy in matching people across tables.
Merging Data from Different Sources: If we have data from different databases or systems and want to merge it based on some commonalities, this query can help us identify and merge records that share similar prefixes.
Record Linkage: When working with databases containing personal information, such as customer databases, we might want to link records based on partial identifiers like names. This can be helpful for creating a more complete and accurate dataset.
Data Quality Assessment: This query can also be used for assessing data quality. We might find inconsistent data if we notice that records with similar first name substrings don't match as expected.
String Functions in SQL: UPPER
When we need to convert a string to uppercase, the UPPER function is your go-to choice. It ensures consistency in your data and is useful for performing case-insensitive comparisons.
Example: Let’s uppercase all first names.
SELECT FirstName, UPPER (FirstName)
FROM Trim_Errors
String Functions in SQL: LOWER
Conversely, the LOWER function converts a string to lowercase. This can be helpful for making data uniform and for cases where case sensitivity isn't necessary.
Example: Let’s lowercase all first names.
SELECT FirstName, LOWER (FirstName)
FROM Trim_Errors
How to Capitalize Only First latter in SQL:
To capitalize only the first letter of a string in SQL, we can use a combination of string functions, such as UPPER, LOWER, and SUBSTRING.
- Example: Let’s fix ANGelina name to Angelina.
SELECT UPPER(SUBSTRING(FirstName, 1, 1)) + LOWER(SUBSTRING(FirstName, 2, LEN(FirstName)))
FROM Trim_Errors
Let's break down what's happening in the query:
SUBSTRING(FirstName, 1, 1): This extracts the first character of the string in FirstName column.
UPPER(SUBSTRING(FirstName, 1, 1)): This converts the extracted first character to uppercase.
SUBSTRING(FirstName, 2, LEN(FirstName)): This extracts the rest of the string, starting from the second character.
LOWER(SUBSTRING(FirstName, 2, LEN(FirstName))): This converts the rest of the string to lowercase.
UPPER(...) + LOWER(...): Finally, the uppercase first letter and lowercase remaining letters are concatenated to form the capitalized string.
This query will produce a result where only the first letter of each string in the specified column is capitalized, while the rest of the string remains in lowercase.
SQL string functions like TRIM, LTRIM, RTRIM, REPLACE, SUBSTRING, UPPER, and LOWER play a crucial role in manipulating and transforming text data within your SQL queries. With these functions in your toolbox, you can clean, modify, and format text effectively, making your data more consistent and manageable. As you continue your SQL journey, remember that practice is key to mastering these functions and becoming a proficient SQL user. Happy querying!