MAX_BY Magic
MAX_BY, a powerful yet under utilized SQL function, is a game changer for writing readable and intuitive SQL.
MAX_BY(return_column, max_column)
returns the value from one column corresponding to the maximum in another.
Here's a simple example:
Imagine you're analyzing an e-commerce dataset and want to find the most effective marketing channel for each product category.
Instead of writing:
SELECT
c.category,
m.channel
FROM
categories c
JOIN (
SELECT
category_id,
MAX_BY(channel, revenue) AS channel
FROM
marketing_data
GROUP BY
category_id
) m ON c.id = m.category_id
We can write :
SELECT
category,
MAX_BY(channel, revenue) AS best_channel
FROM
marketing_data
GROUP BY
category
Cleaner, faster(may outperform co-related subquery approach), and more intuitive!
Pro Tip:
`MAX_BY(channel, revenue, 3)` for top 3 channels
If you see any opportunity to convert legacy codebases and rewrite SQL in different dialects, identify the patterns, and if you find an opportunity to use MAX_BY, future developers will thank you. And you got it - MIN_BY is no different!
Driving Data Solutions @HyugaLife | Ex-MamaEarth | Python, SQL | Power BI | Big Query | Apache Spark | Big Data | AWS
7moVery Informmative!
Data & AI | Databricks MVP & Champion
7moCapabilities such as max by and group by all have been interesting to see emerging as new standards. Nice post, Deepak Rout!