Day - 6 of SQL for Data Science
SQL Joins are a crucial aspect of database management and querying. They enable us to combine data from multiple database tables, unleashing the true power of data integration and analysis. Mastering SQL Joins will undoubtedly elevate your data manipulation skills.
Use the ERD Diagram to understand and write queries :
What is SQL Joins?
SQL Joins are an essential aspect of working with databases and are used to combine data from multiple tables. They allow us to retrieve related data from different tables based on a specified condition, enabling powerful data analysis and insights.
There are several types of SQL Joins :
The most commonly used join, it returns only the matching rows from both tables. It filters out non-matching rows, leaving you with a result set containing only relevant data.
This join returns all the rows from the left table and the matching rows from the right table. If there is no match, it fills in the gaps with NULL values
Similar to the LEFT JOIN, but it returns all the rows from the right table and matching rows from the left table. Non-matching rows will have NULL values in the result.
Combining both LEFT and RIGHT joins, this returns all rows when there is a match in either of the tables. If there's no match, NULL values will fill in the gaps.
ON
With the addition of the JOIN statement to our toolkit, we will also be adding the ON statement.
We use ON clause to specify a JOIN condition which is a logical statement to combine the table in FROM and JOIN statements.
Example
Below we see an example of a query using a JOIN statement. Let's discuss what the different clauses of this query.
SELECT orders.*
FROM orders
JOIN accounts
ON orders.account_id = accounts.id;
We are able to pull data from two tables :
Above, we are only pulling data from the orders table since in the SELECT statement we only reference columns from the orders table.
The ON statement holds the two columns that get linked across the two tables.
Recommended by LinkedIn
Primary and Foreign Keys
A primary key is a unique column in a particular table. This is the first column in each of our tables. Here, those columns are all called id, but that doesn't necessarily have to be the name. It is common that the primary key is the first column in our tables in most databases.
Foreign Key
A foreign key is a column in one table that is a primary key in a different table. We can see in the Parch & Posey ERD that the foreign keys are:
Each of these is linked to the primary key of another table.
Note
Notice our SQL query has the two tables we would like to join - one in the FROM and the other in the JOIN. Then in the ON Statement, we will always have the PK equal to the FK.
ALIAS
When we JOIN tables together, it is nice to give each table an alias. Frequently an alias is just the first letter of the table name.
Example
FROM tablename AS t1
JOIN tablename2 AS t2
Aliases for Columns in Resulting Table
While aliasing tables is the most common use case. It can also be used to alias the columns selected to have the resulting table reflect a more readable name.
Example
Select t1.column1 aliasname, t2.column2 aliasname2
FROM tablename AS t1
JOIN tablename2 AS t2
Every JOIN we have done up to this point has been an INNER JOIN. That is, we have always pulled rows only if they exist as a match across two tables.
Our new JOINs allow us to pull rows that might only exist in one of the two tables. This will introduce a new data type called NULL.
LEFT and RIGHT JOIN
LEFT JOIN - pulls all the data that exists in both tables, as well as all of the rows from the table in the FROM even if they do not exist in the JOIN statement.
RIGHT JOIN - pulls all the data that exists in both tables, as well as all of the rows from the table in the JOIN even if they do not exist in the FROM statement.