Gaps & Islands: Number of Consecutive Days in SQL

Lasha Dolenjashvili

Senior Data Partner @ Tabby

Published Oct 8, 2023

In my latest post about Gaps & Islands problem, I promised to provide a real-world example that involves finding the number of consecutive login days in a dataset.

Problem Description:

Given a table CUSTOMER_LOGINS with the following columns:

customer_id: an identifier for each customer.
login_date: the date a customer logged in.

The task is to identify and group consecutive login dates for each customer, and for each such group, provide:

The customer_id of the customer.
The start date (interval_start_date) of the consecutive login period.
The end date (interval_end_date) of the consecutive login period.
The total number of consecutive days the customer logged in during that period (consecutive_days).

Step I: Create a Test Table

First, let's create a test table CUSTOMER_LOGINS.

-- Create the CUSTOMER_LOGINS table
CREATE TABLE CUSTOMER_LOGINS (
    customer_id INT,
    login_date DATE
);

-- Insert some sample data
INSERT INTO CUSTOMER_LOGINS (customer_id, login_date)
VALUES 
    (1, '2022-01-09'),
    (1, '2022-01-10'),
    (1, '2022-01-11'),
    (1, '2022-01-18'),
    (1, '2022-01-19'),
    (1, '2022-01-20'),
    (1, '2022-01-21');

SELECT * FROM CUSTOMER_LOGINS;

+-------------+------------+
| customer_id | login_date |
+-------------+------------+
|      1      | 2022-01-09 | 
|      1      | 2022-01-10 | 
|      1      | 2022-01-11 |
|      1      | 2022-01-18 |
|      1      | 2022-01-19 |
|      1      | 2022-01-20 |
|      1      | 2022-01-21 |
+-------------+------------+

Step II: Rank Dates

At first sight, it's not easy to guess how should we approach this problem. But if we think about the intermediary output, we probably need a column to group consecutive dates. Let's rank the date column and see what it gives us.

Step III: Subtract to Get Groups

If you think for a minute, you will notice that by subtracting rnk from login_date we get a unique group for consecutive dates.

WITH ranked_logins AS (
    SELECT
        customer_id
        , login_date
        , DENSE_RANK() OVER (
            PARTITION BY customer_id
            ORDER BY login_date
        ) AS rnk
    FROM
        CUSTOMER_LOGINS
)
SELECT
    customer_id
    , login_date
    -- Subract `rnk` from `login_date`
    , login_date - INTERVAL '1' DAY * rnk AS grp_date
FROM
    ranked_logins;

-- Intermediary Result
+-------------+------------+-------+------------+
| customer_id | login_date |  rnk  |  grp_date  |
+-------------+------------+-------+------------+
|      1      | 2022-01-09 |   1   | 2022-01-08 |
|      1      | 2022-01-10 |   2   | 2022-01-08 | 
|      1      | 2022-01-11 |   3   | 2022-01-08 |
|      1      | 2022-01-18 |   4   | 2022-01-14 |
|      1      | 2022-01-19 |   5   | 2022-01-14 |
|      1      | 2022-01-20 |   6   | 2022-01-14 |
|      1      | 2022-01-21 |   7   | 2022-01-14 |
+-------------+------------+-------+------------+

Step IV: MIN and MAX

The next step is to find the minimum and maximum dates for each grp_date, it will give us the interval_start_date and interval_end_date.

WITH ranked_logins AS (
    SELECT
        customer_id
        , login_date
        , DENSE_RANK() OVER (
            PARTITION BY customer_id
            ORDER BY login_date
        ) AS rnk
    FROM
        CUSTOMER_LOGINS
),
date_groups AS (
    SELECT
        customer_id
        , login_date
        , login_date - INTERVAL '1' DAY * rnk AS grp_date
    FROM
        ranked_logins
)
SELECT
    customer_id
    , MIN(login_date) AS interval_start_date
    , MAX(login_date) AS interval_end_date
FROM date_groups
GROUP BY customer_id, grp_date
ORDER BY customer_id, interval_start_date

-- Intermediary Result
+-------------+---------------------+-------------------+
| customer_id | interval_start_date | interval_end_date |
+-------------+---------------------+-------------------+
|      1      |      2022-01-09     |    2022-01-11     |
|      1      |      2022-01-18     |    2022-01-21     | 
+-------------+---------------------+-------------------+

Step (Final) V: Number of Consecutive Days

It is now obvious, we just need to count the days between interval_start_date and interval_end_date.

WITH ranked_logins AS (
    SELECT
        customer_id
        , login_date
        , DENSE_RANK() OVER (
            PARTITION BY customer_id
            ORDER BY login_date
        ) AS rnk
    FROM
        CUSTOMER_LOGINS
),
date_groups AS (
    SELECT
        customer_id
        , login_date
        , login_date - INTERVAL '1' DAY * rnk AS grp_date
    FROM
        ranked_logins
)
SELECT
    customer_id
    , MIN(login_date) AS interval_start_date
    , MAX(login_date) AS interval_end_date
    , 1 + DATEDIFF('day', 
            MIN(login_date), MAX(login_date)
          ) AS consecutive_days
FROM date_groups
GROUP BY customer_id, grp_date
ORDER BY customer_id, interval_start_date

-- Final Result
+---------+---------------------+-------------------+------------------+
| cust_id | interval_start_date | interval_end_date | consecutive_days |
+---------+---------------------+-------------------+------------------+
|    1    |      2022-01-09     |    2022-01-11     |        3         |
|    1    |      2022-01-18     |    2022-01-21     |        4         |
+---------+---------------------+-------------------+------------------+

It is not always obvious how should you approach the problem. 💡 Advice? Learn the basic structures of SQL and be creative!

🚀 Feel free to ask any questions and remember to 𝗞𝗲𝗲𝗽 𝗞𝗻𝗼𝘄𝗹𝗲𝗱𝗴𝗲 𝗙𝗹𝗼𝘄𝗶𝗻𝗴! 🔔 Engage, Share, & Follow | Lasha Dolenjashvili

Duhita Dharmadhikari

Business Intelligence Engineer | Analytics Engineer | Tableau, SQL, Python, Automation, Predictive Analytics | Building End-to-End Solutions

1mo

Same here, I love how you broke it down into five steps, made it super simple to grasp! Thank you

1 Reaction

D. Scott Toma

Quality Assurance | Software Testing | Incident/Problem Management | Data Governance

9mo

This is very helpful! I was doing online exercises, and encountered something very similar to this, and I was stuck. Your post got me to the answer

1 Reaction

See more comments

To view or add a comment, sign in

Gaps & Islands: Number of Consecutive Days in SQL

Lasha Dolenjashvili

Senior Data Partner @ Tabby

Problem Description:

Step I: Create a Test Table

Step II: Rank Dates

Recommended by LinkedIn

Step III: Subtract to Get Groups

Step IV: MIN and MAX

Step (Final) V: Number of Consecutive Days

More articles by Lasha Dolenjashvili

Insights from the community

Others also viewed

Unlocking the Power of Efficient SQL Queries

Use CatRegionMap to avoid hard-coding in Categories

How to Bypass Filter in SQL Injection Manually

🚀 Day 9 of 50: Database Challenge Begins! 🚀

SQL Practice question!!!

SQL >>VIEWS:

*Ungrouping the data in SQL

Current Out Queues from a Data Transfer

-- A Quick handy guide to generate SQL Trace --

How to do Insert, Update, and Delete operation against read-only views

Explore topics

Problem Description:

Step I: Create a Test Table

Step II: Rank Dates

Recommended by LinkedIn

Step III: Subtract to Get Groups

Step IV: MIN and MAX

Step (Final) V: Number of Consecutive Days

More articles by Lasha Dolenjashvili

Printing Your Machine's Specifications with Python

Excel Isn't Going Anywhere, So Let's Automate Parsing It

Building Document Parsing Pipelines with Python

Introduction to Network Analysis with Neo4j, AuraDB, and Python 🕸️

Generating 1 Billion Rows of Complex Synthetic Data 🚀

Exploring SQL without Window Functions (Part II) - Examples

SQL’s EXISTS and NOT EXISTS: A Comprehensive Guide

Exploring SQL without Window Functions (Part I)

SQL’s Order of Execution

Insights from the community

Others also viewed

Unlocking the Power of Efficient SQL Queries

Use CatRegionMap to avoid hard-coding in Categories

How to Bypass Filter in SQL Injection Manually

🚀 Day 9 of 50: Database Challenge Begins! 🚀

SQL Practice question!!!

SQL >>VIEWS:

*Ungrouping the data in SQL

Current Out Queues from a Data Transfer

-- A Quick handy guide to generate SQL Trace --

How to do Insert, Update, and Delete operation against read-only views

Explore topics