MAX_BY Magic

MAX_BY Magic

MAX_BY, a powerful yet under utilized SQL function, is a game changer for writing readable and intuitive SQL.

MAX_BY(return_column, max_column)         

returns the value from one column corresponding to the maximum in another.

Here's a simple example:

Imagine you're analyzing an e-commerce dataset and want to find the most effective marketing channel for each product category.

Instead of writing:

SELECT 
    c.category,
    m.channel
FROM 
    categories c
JOIN (
    SELECT 
        category_id,
        MAX_BY(channel, revenue) AS channel
    FROM 
        marketing_data
    GROUP BY 
        category_id
) m ON c.id = m.category_id        


We can write :

SELECT
    category,
    MAX_BY(channel, revenue) AS best_channel
FROM
    marketing_data
GROUP BY
    category        

Cleaner, faster(may outperform co-related subquery approach), and more intuitive!

Pro Tip:

  • Some databases offer extended functionality - Multiple returns:

`MAX_BY(channel, revenue, 3)` for top 3 channels

  • Can be used in window functions: MAX_BY(channel, revenue) OVER (PARTITION BY category)
  • MAX_BY can produce unpredictable results in cases of ties

If you see any opportunity to convert legacy codebases and rewrite SQL in different dialects, identify the patterns, and if you find an opportunity to use MAX_BY, future developers will thank you. And you got it - MIN_BY is no different!

Sudhanshu Bindal

Driving Data Solutions @HyugaLife | Ex-MamaEarth | Python, SQL | Power BI | Big Query | Apache Spark | Big Data | AWS

7mo

Very Informmative!

Like
Reply
Andrew Sitz

Data & AI | Databricks MVP & Champion

7mo

Capabilities such as max by and group by all have been interesting to see emerging as new standards. Nice post, Deepak Rout!

To view or add a comment, sign in

More articles by Deepak Rout

  • Custom Metadata in Delta Table History

    If you’ve ever lived in the pre-cloud ETL world, you’ll remember how ETL tools or stored procedures were the backbone…

    1 Comment
  • Do You Know What the RELY Option in a Primary Key Does on Databricks?

    If you're working with Databricks SQL and want to supercharge your query performance, it's time to explore the RELY…

  • Be Careful with NULL Handling During Database Migrations

    When migrating databases, especially when using window functions like ROW_NUMBER(), RANK(), or DENSE_RANK(), one subtle…

    1 Comment
  • YAML Engineers

    In the data engineering field, YAML files have become a beloved tool, much like in DevOps. Over the years, data…

  • Lost in translation

    I was scrolling through my LinkedIn feed the other day when I stumbled upon this hilarious (and kinda sad) interaction.…

  • The Modern Alternative to Makefiles

    Have you ever stared at a Makefile, feeling lost in a sea of colons, tabs, and cryptic syntax? You're not alone. Many…

  • Where Is Everyone ?

    Ever sent out a status update, shared code for review, or posted in a team channel only to be met with..

  • SQL Productivity Hack

    As a data consultant, I often find myself writing SQL queries to move data between tables with some transformations…

    1 Comment
  • "ASOF JOIN: Bridging the Time Gap in Data Analysis"

    You must have been hearing more and more about a new type of JOIN called . Modern databases are adding this feature to…

  • Value-Focused Framework

    As leaders in the digital age, we must look beyond technology and patterns to embrace a holistic view of architecture…

Insights from the community

Others also viewed

Explore topics