Unraveling a Crazy SQL Query: Finding the Most Loyal Customer Step by Step

Unraveling a Crazy SQL Query: Finding the Most Loyal Customer Step by Step

SQL queries can sometimes feel like solving a puzzle, especially when you’re trying to dig deep into data to uncover something specific. In this article, we’re going to tackle a fun yet slightly wild SQL query scenario: finding the most loyal customer who has ordered the same item the most times in a single year (2024, in our case). We’ll break it down step by step, starting from scratch, making it easy to understand even if you’re new to SQL. By the end, you’ll see how we crafted a “crazy” query that’s both powerful and practical. Let’s dive in!


Step 1: Understand the Problem

Before writing any code, let’s define what we’re trying to achieve:

  • Goal: Identify the customer (or customers) who ordered a specific product the most times in 2024.
  • Data: We have a database with three tables:
  • Output: We want the customer’s name, the product’s name, and the number of times they ordered that product. If multiple customers tie for the maximum number of orders, we should include all of them.

This sounds like a real-world problem—think of a coffee shop wanting to find out who’s buying their signature latte the most or an e-commerce site tracking repeat purchases of a popular item.


Step 2: Sketch the Database Schema

To make sense of the query, let’s visualize the tables and their relationships:

  • customers:
  • products:
  • orders:

From this, we can see:

  • Alice (customer_id 1) ordered a Latte (product_id 101) three times and an Espresso (product_id 102) once.
  • Bob (customer_id 2) ordered a Latte once.
  • We need to count orders per customer and product in 2024 and find the highest count.


Step 3: Start with the Basics—Joining Tables

To get the customer name, product name, and order details, we need to combine the orders, customers, and products tables. Since orders links customers and products via customer_id and product_id, we’ll use JOIN operations.

Let’s start with a simple query to fetch all orders with customer and product details:

sql

SELECT 
    c.customer_name,
    p.product_name,
    o.order_date
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN products p ON o.product_id = p.product_id
WHERE YEAR(o.order_date) = 2024;        

Explanation:

  • FROM orders o: Start with the orders table (aliased as o for brevity).
  • JOIN customers c ON o.customer_id = c.customer_id: Link orders to customers using customer_id.
  • JOIN products p ON o.product_id = p.product_id: Link orders to products using product_id.
  • WHERE YEAR(o.order_date) = 2024: Filter for orders in 2024. The YEAR() function extracts the year from the order_date.
  • SELECT c.customer_name, p.product_name, o.order_date: Return the customer’s name, product’s name, and order date.

Sample Output (based on our example data):

customer_name | product_name | order_date
--------------+--------------+------------
Alice         | Latte        | 2024-01-05
Alice         | Latte        | 2024-02-10
Bob           | Latte        | 2024-03-15
Alice         | Espresso     | 2024-04-20
Alice         | Latte        | 2024-05-25        

This gives us a raw list of orders, but we need to count how many times each customer ordered each product.


Step 4: Group and Count Orders

To find out how many times each customer ordered each product, we need to group the data by customer and product and count the orders:

sql

SELECT 
    c.customer_name,
    p.product_name,
    COUNT(*) as order_count
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN products p ON o.product_id = p.product_id
WHERE YEAR(o.order_date) = 2024
GROUP BY c.customer_id, c.customer_name, p.product_id, p.product_name;        

Explanation:

  • GROUP BY c.customer_id, c.customer_name, p.product_id, p.product_name: Group the rows by both customer and product. We include customer_id and product_id to ensure uniqueness (in case two customers or products have the same name).
  • COUNT(*) as order_count: Count the number of orders in each group.
  • The WHERE and JOIN clauses remain the same to filter for 2024 and link the tables.

Sample Output:

customer_name | product_name | order_count
--------------+--------------+------------
Alice         | Latte        | 3
Alice         | Espresso     | 1
Bob           | Latte        | 1        

Now we can see that Alice ordered a Latte 3 times, an Espresso once, and Bob ordered a Latte once. But we only want the customer(s) with the highest order count (in this case, Alice’s 3 Lattes).


Step 5: Find the Maximum Order Count

To get only the customer-product pair with the most orders, we need to:

  1. Find the maximum order_count across all customer-product pairs.
  2. Filter our results to show only the rows where the order_count equals that maximum.

This is where the query starts getting “crazy” because we need a subquery to calculate the maximum order count. Let’s modify the query:

sql

SELECT 
    c.customer_name,
    p.product_name,
    COUNT(*) as order_count
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN products p ON o.product_id = p.product_id
WHERE YEAR(o.order_date) = 2024
GROUP BY c.customer_id, c.customer_name, p.product_id, p.product_name
HAVING COUNT(*) = (
    SELECT MAX(order_count)
    FROM (
        SELECT COUNT(*) as order_count
        FROM orders o2
        JOIN customers c2 ON o2.customer_id = c2.customer_id
        JOIN products p2 ON o2.product_id = p2.product_id
        WHERE YEAR(o2.order_date) = 2024
        GROUP BY c2.customer_id, p2.product_id
    ) max_orders
);        

Explanation:

  • Outer Query: Same as before, it groups orders by customer and product and counts them.
  • HAVING COUNT(*) = (...): The HAVING clause filters groups based on a condition. Here, we only want groups where the order_count equals the maximum order count.
  • Subquery:
  • Table Aliases (o2, c2, p2): We use different aliases in the subquery to avoid conflicts with the outer query’s aliases (o, c, p).
  • Result: The HAVING clause ensures we only return rows where the order_count is 3 (the maximum).

Sample Output:

customer_name | product_name | order_count
--------------+--------------+------------
Alice         | Latte        | 3        

This shows that Alice is our most loyal customer for Lattes, with 3 orders. If another customer also had 3 orders for a different product, they’d appear here too.


Step 6: Add a Touch of Polish

To make the output cleaner and ensure consistent sorting, let’s add an ORDER BY clause to sort by order_count (descending) and then by customer_name and product_name (alphabetically):

sql

SELECT 
    c.customer_name,
    p.product_name,
    COUNT(*) as order_count
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN products p ON o.product_id = p.product_id
WHERE YEAR(o.order_date) = 2024
GROUP BY c.customer_id, c.customer_name, p.product_id, p.product_name
HAVING COUNT(*) = (
    SELECT MAX(order_count)
    FROM (
        SELECT COUNT(*) as order_count
        FROM orders o2
        JOIN customers c2 ON o2.customer_id = c2.customer_id
        JOIN products p2 ON o2.product_id = p2.product_id
        WHERE YEAR(o2.order_date) = 2024
        GROUP BY c2.customer_id, p2.product_id
    ) max_orders
)
ORDER BY order_count DESC, customer_name, product_name;        

Explanation:

  • ORDER BY order_count DESC, customer_name, product_name: Sorts results by order_count (highest first), then alphabetically by customer_name and product_name for ties.
  • This ensures predictable output, especially if multiple customers tie for the maximum order count.


Step 7: Why Is This Query “Crazy”?

This query might seem straightforward now, but it’s “crazy” for a few reasons:

  • Nested Subquery: The subquery inside a subquery to find the maximum order count can be tricky to follow.
  • Multiple Joins: Combining three tables requires careful attention to relationships.
  • HAVING Clause: Less common than WHERE, it’s used here to filter aggregated results.
  • Alias Management: Using o2, c2, p2 in the subquery to avoid conflicts adds complexity.

Yet, it’s practical and solves a real-world problem—perfect for impressing your data team or debugging a business question!


Step 8: Testing and Debugging Tips

When crafting a query like this, you might hit some snags. Here’s how to debug:

  • Test Each Part: Run the inner subquery alone to ensure it returns the correct order_count values.
  • Check Joins: Verify that the JOIN conditions don’t exclude rows (e.g., use LEFT JOIN if some orders might lack customer data).
  • Validate the Year Filter: Ensure YEAR(o.order_date) works with your database’s date format.
  • Handle Ties: The query already accounts for multiple customers with the same max order count, but double-check the output.
  • Performance: For large datasets, consider indexing customer_id, product_id, and order_date to speed up joins and filtering.


Step 9: Real-World Applications

This query can be adapted for many scenarios:

  • E-commerce: Find customers who repeatedly buy the same product to offer them loyalty discounts.
  • Restaurants/Cafes: Identify who’s obsessed with a specific menu item for targeted promotions.
  • Inventory Management: Spot high-demand products for specific customers to optimize stock.

You could extend it by:

  • Adding more filters (e.g., specific product categories).
  • Calculating total spend (if the orders table includes price data).
  • Comparing across years to spot trends.


Final Query

Here’s the complete, polished query one more time:

sql

SELECT 
    c.customer_name,
    p.product_name,
    COUNT(*) as order_count
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN products p ON o.product_id = p.product_id
WHERE YEAR(o.order_date) = 2024
GROUP BY c.customer_id, c.customer_name, p.product_id, p.product_name
HAVING COUNT(*) = (
    SELECT MAX(order_count)
    FROM (
        SELECT COUNT(*) as order_count
        FROM orders o2
        JOIN customers c2 ON o2.customer_id = c2.customer_id
        JOIN products p2 ON o2.product_id = p2.product_id
        WHERE YEAR(o2.order_date) = 2024
        GROUP BY c2.customer_id, p2.product_id
    ) max_orders
)
ORDER BY order_count DESC, customer_name, product_name;        


To view or add a comment, sign in

More articles by Ekanadh Reddy

Insights from the community

Others also viewed

Explore topics