Day - 7 of SQL for Data Science

Day - 7 of SQL for Data Science

SQL Aggregation

In the world of data management and analysis, SQL (Structured Query Language) plays a pivotal role. One of its most powerful features is aggregation, which allows us to summarize and condense vast amounts of data into meaningful insights.

In this article, we will dive deep into the concept of SQL aggregation, exploring its various functions and syntax.

Use the ERD Diagram to understand and write queries :

No alt text provided for this image


What is SQL Aggregation?

SQL aggregation refers to the process of transforming multiple rows of data into a single summary. Instead of working with individual records, we apply functions to groups of records to produce meaningful results. SQL provides several aggregation functions, such as COUNT, SUM, AVG, MIN, MAX, and more, that enable us to perform calculations across multiple rows efficiently.

NULLs in SQL

In SQL, NULL represents the absence of a value in a particular column of a row. It does not signify zero, empty, or any specific data type but rather indicates the lack of a meaningful value. Think of it as a placeholder for missing or unknown information.

Important Point

When identifying NULLs in a WHERE clause, we write IS NULL or IS NOT NULL. We don't use =, because NULL isn't considered a value in SQL. Rather, it is a property of the data.

Example

Write a query that returns all the accounts where the website is not present.

SELECT *
FROM accounts
WHERE website IS NULL;        

Write a query that returns all the web_events where the channel is present.

SELECT *
FROM web_events
WHERE channel IS NOT NULL;        

COUNT in SQL

In SQL, the COUNT function is used to count the number of rows that meet a specific condition or the total number of rows in a table. It is one of the most commonly used aggregate functions and provides valuable insights into the size and distribution of data.

The syntax for the COUNT function is straightforward :

SELECT COUNT(column_name) 
FROM table_name 
WHERE condition;        

Let's explore two common use cases of the COUNT function :

1. Counting Total Rows

To count the total number of rows in a table, you can simply use the COUNT function without any specific column or condition.

SELECT COUNT(*) AS TotalRows FROM table_name;         

The above query will return a single row containing the total number of rows in the specified table under the column alias "TotalRows."

2. Counting Rows Based on a Condition

To count the number of rows that meet a specific condition, you can use the COUNT function with a WHERE clause.

For example, let's say we have a table called "Employees," and we want to count the number of employees who belong to the "Marketing" department:

SELECT COUNT(*) AS MarketingEmployees FROM Employees WHERE Department = 'Marketing';         

This query will return the count of employees who work in the "Marketing" department under the column alias "MarketingEmployees."

3. Handling NULL Values with COUNT

It's important to note that the COUNT function, by default, includes NULL values in its calculation. If you want to exclude NULL values, you can use the COUNT function with a specific column name that does not contain NULL values.

SELECT COUNT(column_name) FROM table_name WHERE column_name IS NOT NULL;         

In the case of the "Employees" table, if we want to count the number of employees who have a valid age (i.e., age is not NULL), we can use :

SELECT COUNT(Age) AS ValidAgeEmployees FROM Employees WHERE Age IS NOT NULL;         

This query will return the count of employees with a valid age (non-NULL) under the column alias "ValidAgeEmployees."

SUM in SQL

In SQL, the SUM function is an aggregate function that calculates the sum of numeric values in a column or an expression. It is commonly used to obtain the total of a set of numerical values stored in a database table.

The syntax for the SUM function is straightforward:

SELECT SUM(column_name) FROM table_name WHERE condition;        

Unlike COUNT, you can only use SUM on numeric columns. However, SUM will ignore NULL values, as do the other aggregation functions.

Example

Write a query that returns the sum of all standard quality of paper from orders table where total_amt_usd is greaer than 1000.

SELECT SUM(standard_qty)
FROM orders
WHERE total_amt_usd > 1000;        

Let's explore two common use cases of the SUM function same as COUNT :

1. Summing Column Values

To calculate the sum of values in a specific column, you can use the SUM function with the desired column name. For example, let's say we have a table called "Sales" that stores the sales amount for different products:

SELECT SUM(SalesAmount) AS TotalSales FROM Sales;         

The above query will return a single row containing the sum of all sales amounts in the "Sales" table under the column alias "TotalSales."

2. Summing Values Based on a Condition

You can also use the SUM function with a WHERE clause to calculate the sum of values that meet specific criteria. For instance, if we want to calculate the total sales amount for a particular product with a specific product ID:

SELECT SUM(SalesAmount) AS TotalSalesForProductX FROM Sales WHERE ProductID = 'X';         

This query will return the sum of sales amounts for all records in the "Sales" table where the "ProductID" is 'X'.

MIN and MAX in SQL

In SQL, the MIN and MAX functions are aggregate functions that allow you to find the minimum and maximum values in a column or an expression. These functions are useful for identifying the smallest and largest values within a dataset.

The syntax for using MIN and MAX is straightforward :

MIN Function

The MIN function returns the minimum value from a specified column or expression. Here's the basic syntax:

SELECT MIN(column_name) FROM table_name WHERE condition;        

MAX Function

The MAX function, on the other hand, returns the maximum value from a specified column or expression. Here's the basic syntax:

SELECT MAX(column_name) FROM table_name WHERE condition;        

AVG in SQL

In SQL, the AVG function is an aggregate function that calculates the average of numeric values in a column or an expression. It is commonly used to find the average value of a set of numerical data stored in a database table.

The syntax for the AVG function is straightforward:

SELECT AVG(column_name) FROM table_name WHERE condition;        

Let's explore how the AVG function works with a practical example:

Example

Suppose we have a table called "TestScores" that stores the test scores of students.

To calculate the average test score from the "TestScores" table, we can use the following query:

SELECT AVG(TestScore) AS AverageScore FROM TestScores;         

The above query will return a single row containing the average test score of all students under the column alias "AverageScore."


To view or add a comment, sign in

More articles by Mrityunjay Pathak

  • Bias and Variance and Its Trade Off

    There are various ways to evaluate a machine-learning model. Bias and Variance are one such way to help us in parameter…

  • Machine Learning Mathematics🔣

    Machine Learning is the field of study that gives computers the capability to learn without being explicitly…

  • How to Modify your GitHub Profile Readme File as your Portfolio

    What if you don't have a personal portfolio website? No worries! You can transform your GitHub README.md into a…

    4 Comments
  • Data Science Resources

    Are you starting your journey into the world of Data Science? Here's a curated list of top resources to master various…

  • 25 Python Sets Questions with Solution

    25 Python Sets Coding Questions along with Explanations for each. Let's get started ↓ Question 1: Write a Python…

  • 25 Python Tuple Questions with Solution

    25 Python Tuple Coding Questions along with Explanations for each. Let's get started ↓ Question 1: Find the length of a…

  • 25 Python Dictionary Questions and Solutions

    25 Python Dictionary Coding Questions along with Explanations for each. Let's get started ↓ Question 1: Create an empty…

  • 25 Python List Questions with Solution

    25 Python List Coding Questions along with Explanations for each. Let's get started ↓ Question: Given a list nums, find…

    2 Comments
  • 25 Python String Questions with Solution

    25 Python Strings Coding Questions along with Explanations for each. Let's get started ↓ Write a Python program to…

    3 Comments
  • 25 Python Loop Coding Questions

    25 Python Loop Coding Questions along with Explanations for each. Let's get started ↓ Print numbers from 1 to 10 using…

    3 Comments

Insights from the community

Others also viewed

Explore topics