Day - 7 of SQL for Data Science
SQL Aggregation
In the world of data management and analysis, SQL (Structured Query Language) plays a pivotal role. One of its most powerful features is aggregation, which allows us to summarize and condense vast amounts of data into meaningful insights.
In this article, we will dive deep into the concept of SQL aggregation, exploring its various functions and syntax.
Use the ERD Diagram to understand and write queries :
What is SQL Aggregation?
SQL aggregation refers to the process of transforming multiple rows of data into a single summary. Instead of working with individual records, we apply functions to groups of records to produce meaningful results. SQL provides several aggregation functions, such as COUNT, SUM, AVG, MIN, MAX, and more, that enable us to perform calculations across multiple rows efficiently.
NULLs in SQL
In SQL, NULL represents the absence of a value in a particular column of a row. It does not signify zero, empty, or any specific data type but rather indicates the lack of a meaningful value. Think of it as a placeholder for missing or unknown information.
Important Point
When identifying NULLs in a WHERE clause, we write IS NULL or IS NOT NULL. We don't use =, because NULL isn't considered a value in SQL. Rather, it is a property of the data.
Example
Write a query that returns all the accounts where the website is not present.
SELECT *
FROM accounts
WHERE website IS NULL;
Write a query that returns all the web_events where the channel is present.
SELECT *
FROM web_events
WHERE channel IS NOT NULL;
COUNT in SQL
In SQL, the COUNT function is used to count the number of rows that meet a specific condition or the total number of rows in a table. It is one of the most commonly used aggregate functions and provides valuable insights into the size and distribution of data.
The syntax for the COUNT function is straightforward :
SELECT COUNT(column_name)
FROM table_name
WHERE condition;
Let's explore two common use cases of the COUNT function :
1. Counting Total Rows
To count the total number of rows in a table, you can simply use the COUNT function without any specific column or condition.
SELECT COUNT(*) AS TotalRows FROM table_name;
The above query will return a single row containing the total number of rows in the specified table under the column alias "TotalRows."
2. Counting Rows Based on a Condition
To count the number of rows that meet a specific condition, you can use the COUNT function with a WHERE clause.
For example, let's say we have a table called "Employees," and we want to count the number of employees who belong to the "Marketing" department:
SELECT COUNT(*) AS MarketingEmployees FROM Employees WHERE Department = 'Marketing';
This query will return the count of employees who work in the "Marketing" department under the column alias "MarketingEmployees."
3. Handling NULL Values with COUNT
It's important to note that the COUNT function, by default, includes NULL values in its calculation. If you want to exclude NULL values, you can use the COUNT function with a specific column name that does not contain NULL values.
SELECT COUNT(column_name) FROM table_name WHERE column_name IS NOT NULL;
In the case of the "Employees" table, if we want to count the number of employees who have a valid age (i.e., age is not NULL), we can use :
SELECT COUNT(Age) AS ValidAgeEmployees FROM Employees WHERE Age IS NOT NULL;
This query will return the count of employees with a valid age (non-NULL) under the column alias "ValidAgeEmployees."
Recommended by LinkedIn
SUM in SQL
In SQL, the SUM function is an aggregate function that calculates the sum of numeric values in a column or an expression. It is commonly used to obtain the total of a set of numerical values stored in a database table.
The syntax for the SUM function is straightforward:
SELECT SUM(column_name) FROM table_name WHERE condition;
Unlike COUNT, you can only use SUM on numeric columns. However, SUM will ignore NULL values, as do the other aggregation functions.
Example
Write a query that returns the sum of all standard quality of paper from orders table where total_amt_usd is greaer than 1000.
SELECT SUM(standard_qty)
FROM orders
WHERE total_amt_usd > 1000;
Let's explore two common use cases of the SUM function same as COUNT :
1. Summing Column Values
To calculate the sum of values in a specific column, you can use the SUM function with the desired column name. For example, let's say we have a table called "Sales" that stores the sales amount for different products:
SELECT SUM(SalesAmount) AS TotalSales FROM Sales;
The above query will return a single row containing the sum of all sales amounts in the "Sales" table under the column alias "TotalSales."
2. Summing Values Based on a Condition
You can also use the SUM function with a WHERE clause to calculate the sum of values that meet specific criteria. For instance, if we want to calculate the total sales amount for a particular product with a specific product ID:
SELECT SUM(SalesAmount) AS TotalSalesForProductX FROM Sales WHERE ProductID = 'X';
This query will return the sum of sales amounts for all records in the "Sales" table where the "ProductID" is 'X'.
MIN and MAX in SQL
In SQL, the MIN and MAX functions are aggregate functions that allow you to find the minimum and maximum values in a column or an expression. These functions are useful for identifying the smallest and largest values within a dataset.
The syntax for using MIN and MAX is straightforward :
MIN Function
The MIN function returns the minimum value from a specified column or expression. Here's the basic syntax:
SELECT MIN(column_name) FROM table_name WHERE condition;
MAX Function
The MAX function, on the other hand, returns the maximum value from a specified column or expression. Here's the basic syntax:
SELECT MAX(column_name) FROM table_name WHERE condition;
AVG in SQL
In SQL, the AVG function is an aggregate function that calculates the average of numeric values in a column or an expression. It is commonly used to find the average value of a set of numerical data stored in a database table.
The syntax for the AVG function is straightforward:
SELECT AVG(column_name) FROM table_name WHERE condition;
Let's explore how the AVG function works with a practical example:
Example
Suppose we have a table called "TestScores" that stores the test scores of students.
To calculate the average test score from the "TestScores" table, we can use the following query:
SELECT AVG(TestScore) AS AverageScore FROM TestScores;
The above query will return a single row containing the average test score of all students under the column alias "AverageScore."