Gaps & Islands: Number of Consecutive Days in SQL

Gaps & Islands: Number of Consecutive Days in SQL

In my latest post about Gaps & Islands problem, I promised to provide a real-world example that involves finding the number of consecutive login days in a dataset.

Problem Description:

Given a table CUSTOMER_LOGINS with the following columns:

  • customer_id: an identifier for each customer.
  • login_date: the date a customer logged in.

The task is to identify and group consecutive login dates for each customer, and for each such group, provide:

  1. The customer_id of the customer.
  2. The start date (interval_start_date) of the consecutive login period.
  3. The end date (interval_end_date) of the consecutive login period.
  4. The total number of consecutive days the customer logged in during that period (consecutive_days).

Step I: Create a Test Table

First, let's create a test table CUSTOMER_LOGINS.

-- Create the CUSTOMER_LOGINS table
CREATE TABLE CUSTOMER_LOGINS (
    customer_id INT,
    login_date DATE
);

-- Insert some sample data
INSERT INTO CUSTOMER_LOGINS (customer_id, login_date)
VALUES 
    (1, '2022-01-09'),
    (1, '2022-01-10'),
    (1, '2022-01-11'),
    (1, '2022-01-18'),
    (1, '2022-01-19'),
    (1, '2022-01-20'),
    (1, '2022-01-21');

SELECT * FROM CUSTOMER_LOGINS;

+-------------+------------+
| customer_id | login_date |
+-------------+------------+
|      1      | 2022-01-09 | 
|      1      | 2022-01-10 | 
|      1      | 2022-01-11 |
|      1      | 2022-01-18 |
|      1      | 2022-01-19 |
|      1      | 2022-01-20 |
|      1      | 2022-01-21 |
+-------------+------------+        

Step II: Rank Dates

At first sight, it's not easy to guess how should we approach this problem. But if we think about the intermediary output, we probably need a column to group consecutive dates. Let's rank the date column and see what it gives us.

SELECT 
    customer_id
    , login_date
    , RANK() OVER (
        PARTITION BY customer_id
        ORDER BY login_date
      ) AS rnk
FROM CUSTOMER_LOGINS

-- Intermediary Result
+-------------+------------+-------+
| customer_id | login_date |  rnk  |
+-------------+------------+-------+
|      1      | 2022-01-09 |   1   |
|      1      | 2022-01-10 |   2   |
|      1      | 2022-01-11 |   3   |
|      1      | 2022-01-18 |   4   |
|      1      | 2022-01-19 |   5   |
|      1      | 2022-01-20 |   6   |
|      1      | 2022-01-21 |   7   |
+-------------+------------+-------+        

Step III: Subtract to Get Groups

If you think for a minute, you will notice that by subtracting rnk from login_date we get a unique group for consecutive dates.

WITH ranked_logins AS (
    SELECT
        customer_id
        , login_date
        , DENSE_RANK() OVER (
            PARTITION BY customer_id
            ORDER BY login_date
        ) AS rnk
    FROM
        CUSTOMER_LOGINS
)
SELECT
    customer_id
    , login_date
    -- Subract `rnk` from `login_date`
    , login_date - INTERVAL '1' DAY * rnk AS grp_date
FROM
    ranked_logins;

-- Intermediary Result
+-------------+------------+-------+------------+
| customer_id | login_date |  rnk  |  grp_date  |
+-------------+------------+-------+------------+
|      1      | 2022-01-09 |   1   | 2022-01-08 |
|      1      | 2022-01-10 |   2   | 2022-01-08 | 
|      1      | 2022-01-11 |   3   | 2022-01-08 |
|      1      | 2022-01-18 |   4   | 2022-01-14 |
|      1      | 2022-01-19 |   5   | 2022-01-14 |
|      1      | 2022-01-20 |   6   | 2022-01-14 |
|      1      | 2022-01-21 |   7   | 2022-01-14 |
+-------------+------------+-------+------------+        

Step IV: MIN and MAX

The next step is to find the minimum and maximum dates for each grp_date, it will give us the interval_start_date and interval_end_date.

WITH ranked_logins AS (
    SELECT
        customer_id
        , login_date
        , DENSE_RANK() OVER (
            PARTITION BY customer_id
            ORDER BY login_date
        ) AS rnk
    FROM
        CUSTOMER_LOGINS
),
date_groups AS (
    SELECT
        customer_id
        , login_date
        , login_date - INTERVAL '1' DAY * rnk AS grp_date
    FROM
        ranked_logins
)
SELECT
    customer_id
    , MIN(login_date) AS interval_start_date
    , MAX(login_date) AS interval_end_date
FROM date_groups
GROUP BY customer_id, grp_date
ORDER BY customer_id, interval_start_date

-- Intermediary Result
+-------------+---------------------+-------------------+
| customer_id | interval_start_date | interval_end_date |
+-------------+---------------------+-------------------+
|      1      |      2022-01-09     |    2022-01-11     |
|      1      |      2022-01-18     |    2022-01-21     | 
+-------------+---------------------+-------------------+        

Step (Final) V: Number of Consecutive Days

It is now obvious, we just need to count the days between interval_start_date and interval_end_date.

WITH ranked_logins AS (
    SELECT
        customer_id
        , login_date
        , DENSE_RANK() OVER (
            PARTITION BY customer_id
            ORDER BY login_date
        ) AS rnk
    FROM
        CUSTOMER_LOGINS
),
date_groups AS (
    SELECT
        customer_id
        , login_date
        , login_date - INTERVAL '1' DAY * rnk AS grp_date
    FROM
        ranked_logins
)
SELECT
    customer_id
    , MIN(login_date) AS interval_start_date
    , MAX(login_date) AS interval_end_date
    , 1 + DATEDIFF('day', 
            MIN(login_date), MAX(login_date)
          ) AS consecutive_days
FROM date_groups
GROUP BY customer_id, grp_date
ORDER BY customer_id, interval_start_date

-- Final Result
+---------+---------------------+-------------------+------------------+
| cust_id | interval_start_date | interval_end_date | consecutive_days |
+---------+---------------------+-------------------+------------------+
|    1    |      2022-01-09     |    2022-01-11     |        3         |
|    1    |      2022-01-18     |    2022-01-21     |        4         |
+---------+---------------------+-------------------+------------------+        

It is not always obvious how should you approach the problem. 💡 Advice? Learn the basic structures of SQL and be creative!


🚀 Feel free to ask any questions and remember to 𝗞𝗲𝗲𝗽 𝗞𝗻𝗼𝘄𝗹𝗲𝗱𝗴𝗲 𝗙𝗹𝗼𝘄𝗶𝗻𝗴! 🔔 Engage, Share, & Follow | Lasha Dolenjashvili

Duhita Dharmadhikari

Business Intelligence Engineer | Analytics Engineer | Tableau, SQL, Python, Automation, Predictive Analytics | Building End-to-End Solutions

1mo

Same here, I love how you broke it down into five steps, made it super simple to grasp! Thank you

D. Scott Toma

Quality Assurance | Software Testing | Incident/Problem Management | Data Governance

9mo

This is very helpful! I was doing online exercises, and encountered something very similar to this, and I was stuck. Your post got me to the answer

To view or add a comment, sign in

More articles by Lasha Dolenjashvili

Insights from the community

Others also viewed

Explore topics