Radical Speed for SQL Queries on Databricks: Photon Under the Hood

Alex Behm
Tech Lead, Databricks
Greg Rahn
Staff Product Manager, Databricks

Agenda
▪ Intro to Photon
▪ Recent Developments
▪ Up Next
▪ Summary

Observed Workload Trends
Businesses are moving faster, and as a result organizations spend less time on data modeling, leading to worse performance.
▪ Most columns don’t have "NOT NULL" constraints defined
▪ Strings are convenient but slower than specific types
▪ Data lifecycle: Raw → Bronze → Silver → Gold
Can we get both agility and performance?

"Just one more ask: SQL as a first-class citizen on Databricks"
-- Data [Analysts | Engineers | Scientists] everywhere

What is Photon?
Photon is a new 100% Apache Spark compatible query engine designed for speed and flexibility.
It’s built from the ground up to deliver the fastest performance on modern cloud hardware for all data use cases across data engineering, data science, machine learning, and data analytics.

Building the next generation query engine
• Re-architected for the fastest performance on real-world applications
  • Native C++ engine for faster queries
  • Custom built memory management to avoid JVM bottlenecks
  • Vectorized: memory, instruction, and data parallelism (SIMD)
• Works with your existing code and avoids vendor lock-in
  • 100% compatible with open source Spark DataFrame APIs and Spark SQL
  • Transparent operation to users - no need to invoke something new, it just works
• Optimizing for all data use cases and workloads
  • Today, supporting SQL and DataFrame workloads
  • Coming soon, Streaming, Data Science, and more

Why build a new execution engine?

Photon in the Databricks Lakehouse Platform
[Architecture diagram]
Client: Submit SQL Query
Spark Driver (JVM)
● Parsing
● Catalyst: Analysis/Planning/Optimization
● Scheduling
Spark Executors (Mixed JVM/Native)
● Execute Task
Delta Lake

Key Photon Characteristics
• Hybrid Photon/Spark Plans
  • Use Photon when possible, fall back to Spark for unsupported operations
  • Completely transparent to users
• Native code using off-heap memory
  • Natural access to memory and intrinsics (no fiddling with Java Unsafe)
  • No JVM GC, large heaps ok
  • No JVM JIT performance cliffs / limitations
• Fully integrated with Spark’s memory manager
• Prefers hash join over sort-merge join
• Rich per-operator performance metrics
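The "use Photon when possible, fall back to Spark" rule can be pictured with a small sketch. Everything here is hypothetical: the `SUPPORTED` set, the function name, and the simplifying assumption that a linear plan stays in Spark once it has left Photon are illustrative only and are not taken from Photon's planner.

```python
# Hypothetical sketch of hybrid Photon/Spark planning for a linear plan.
# SUPPORTED is an illustrative set, not Photon's real capability list.
SUPPORTED = {
    "FileScan",
    "Filter",
    "Project",
    "HashAggregate",
    "Exchange hashpartitioning",
}


def assign_engines(operators_bottom_up):
    """Label each operator (leaf first) as 'photon' or 'spark'.

    Assumption: execution starts in Photon at the scan and switches to
    Spark at the first unsupported operator, then stays in Spark.
    """
    labeled = []
    in_photon = True
    for op in operators_bottom_up:
        if in_photon and op not in SUPPORTED:
            in_photon = False  # fall back for this and all later operators
        labeled.append((op, "photon" if in_photon else "spark"))
    return labeled


plan = [
    "FileScan",
    "Filter",
    "Project",
    "HashAggregate",
    "Exchange hashpartitioning",
    "HashAggregate",
    "Exchange rangepartitioning",
    "Sort",
]
labeled = assign_engines(plan)
for op, engine in labeled:
    print(f"{engine:6} {op}")
```

With this illustrative set, the scan through the second aggregation are labeled `photon`, and the range exchange and sort are labeled `spark`.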

Development Focus Areas
1. Production Readiness
a. Goal: Resilience comparable to DBR → spilling support
b. Testing and hardening, real customer workloads
2. Query Coverage
a. Today: Basics like joins/aggregations/shuffle, common types and functions
b. In development: Nested types, built-in functions
c. Coming soon: Sort/Window
3. Performance
a. Analyze and optimize common usage patterns

Disclaimer: Microbenchmarks
Microbenchmarks do not necessarily reflect real-world end-to-end performance.
During Photon development we analyze and optimize performance with extensive microbenchmarks.
In the following slides, we share benchmark results that were run in controlled and narrowly scoped scenarios.

Resilience with Very Large Inputs
• Spilling for very large inputs
  • Write intermediate state to external storage to process inputs exceeding available memory
✅ Hash Shuffle
✅ Hash Aggregation
✅ Hash Join
2-5x Speedup

Example: Spilling Hash Join [1 of 4]
• Hash join has two phases: build and probe
• Build phase: insert records from one join input into the hash table
• Hash table has a fixed number of partitions
[Diagram: Partitioned Hash Table]

Example: Spilling Hash Join [2 of 4]
• When memory runs out, spill one partition to disk
• New records go to in-memory partitions or straight to disk
• Repeat until build is done

Example: Spilling Hash Join [3 of 4]
• Probe phase: process rows from other join input
• Emit results for probe rows matching in-memory build partitions
• Spill probe rows matching a spilled build partition
[Diagram: Build, Probe]

Example: Spilling Hash Join [4 of 4]
• For each spilled partition, repeat the same build/probe process
• Might spill again! Apply same algorithm recursively
[Diagram: Build ⨝ Probe]
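The four steps above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Photon's implementation: Python lists stand in for spill files, the memory budget is counted in rows, the largest partition is chosen as the spill victim, and `max_depth` is a made-up guard so that a single hot key cannot recurse forever. Role reversal is left out.

```python
from collections import defaultdict

NUM_PARTITIONS = 4  # fixed number of hash-table partitions (illustrative)


def part_of(key, depth):
    # Mix the recursion depth into the hash so that a spilled partition
    # is split differently when it is processed again.
    return hash((key, depth)) % NUM_PARTITIONS


def hash_join(build, probe, budget, depth=0, max_depth=8):
    """Inner-join lists of (key, value) rows, keeping at most `budget`
    build rows in memory. Returns (key, build_value, probe_value) rows."""
    in_mem = [defaultdict(list) for _ in range(NUM_PARTITIONS)]
    mem_rows = [0] * NUM_PARTITIONS
    spilled_build = {}  # partition id -> build rows "on disk"
    spilled_probe = {}  # partition id -> probe rows "on disk"
    unlimited = depth >= max_depth  # assumption: give up spilling here

    # Build phase: insert rows, spilling one partition when over budget.
    for key, val in build:
        p = part_of(key, depth)
        if p in spilled_build:
            spilled_build[p].append((key, val))  # straight to disk
            continue
        in_mem[p][key].append(val)
        mem_rows[p] += 1
        if not unlimited and sum(mem_rows) > budget:
            victim = max(range(NUM_PARTITIONS), key=lambda i: mem_rows[i])
            spilled_build[victim] = [
                (k, v) for k, vs in in_mem[victim].items() for v in vs
            ]
            spilled_probe[victim] = []
            in_mem[victim].clear()
            mem_rows[victim] = 0

    # Probe phase: emit matches for in-memory partitions, spill the rest.
    out = []
    for key, val in probe:
        p = part_of(key, depth)
        if p in spilled_build:
            spilled_probe[p].append((key, val))
        else:
            out.extend((key, b, val) for b in in_mem[p].get(key, ()))

    # Spilled partitions: repeat the same build/probe process recursively.
    for p, rows in spilled_build.items():
        out.extend(hash_join(rows, spilled_probe[p], budget, depth + 1, max_depth))
    return out


build = [(i % 50, f"b{i}") for i in range(200)]
probe = [(i % 70, f"p{i}") for i in range(300)]
result = hash_join(build, probe, budget=16)
```

Because a probe row is only ever compared against the build partition it hashes to, the spilled and in-memory paths produce the same set of matches; only the order of the output differs.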

Spilling Hash Join vs. Spilling Sort-Merge Join
• Photon converts Sort-Merge Joins to Hash Joins
• Sort-Merge Join
  • Buffer + sort both join inputs, increasing memory pressure
  • Spilling sort → write entire input to sorted runs
• Hash Join
  • Only buffer build input (typically the smaller input) in a hash table
  • Graceful degradation: Spill both inputs at the build-partition granularity
  • Role reversal: Swap build/probe when processing spilled partitions
Up to 5x Speedup

Hardening: How we test Photon
• Random queries and data
  • Using new open-source Spark random query generator
• Failure injection
  • Randomly trip error paths to ensure graceful query failure
• Spill injection
  • Randomly trigger spill events to simulate memory pressure
• Clang/LLVM C++ tools
  • Address Sanitizer
  • Undefined Behavior Sanitizer
• Combinations of the above
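As a rough illustration of combining random data with spill injection, the sketch below fuzzes a toy grouped count. The operator, the 10% spill probability, and all names are assumptions made for this example; it shows only the shape of such a test, not the actual Photon test harness.

```python
import random
from collections import Counter


def grouped_count(keys, should_spill):
    """Count keys, moving the in-memory table to a 'spilled run' whenever
    should_spill() returns True, then merging all runs at the end.
    Returns (counts, number_of_spills)."""
    table, runs = Counter(), []
    for k in keys:
        table[k] += 1
        if should_spill():  # injected spill event
            runs.append(table)
            table = Counter()
    runs.append(table)  # the final in-memory table
    merged = Counter()
    for run in runs:
        merged.update(run)
    return merged, len(runs) - 1


def fuzz(trials=50, seed=0):
    """Run random inputs with randomly injected spills and check each
    result against a reference computed without spilling."""
    rng = random.Random(seed)
    total_spills = 0
    for _ in range(trials):
        keys = [rng.randrange(10) for _ in range(rng.randrange(1, 200))]
        expected = Counter(keys)
        got, spills = grouped_count(keys, lambda: rng.random() < 0.1)
        assert got == expected, "spilling changed the query result"
        total_spills += spills
    return total_spills


print("injected spills:", fuzz())
```

The point of the pattern is that spills happen at arbitrary moments rather than only when memory is really exhausted, so rarely taken code paths get exercised on small inputs.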

Overview of Query Coverage
Data Types
✅ Byte/Short/Int/Long
✅ Boolean
✅ String/Binary
✅ Decimal
✅ Float/Double
✅ Date/Timestamp
✅ Struct
Coming soon: Array, Map

Operators
✅ Scan, Filter, Project
✅ Hash Aggregate/Join/Shuffle
✅ Nested-Loop Join
✅ Null-Aware Anti Join
✅ Union, Expand, ScalarSubquery
Coming soon: Sort, Window

Expressions
✅ Comparison / Logic
✅ Arithmetic / Math (most)
✅ Conditional (IF, CASE, etc.)
✅ String (common ones)
✅ Casts
✅ Aggregates (most common ones)
✅ Date/Timestamp (in progress)
Coming soon: UDFs, long tail

Expression Coverage for DATE/TIMESTAMP
• Many queries contain date/timestamp logic
• As of today: 95% coverage (100% very soon)
• Fast path for UTC timezone (default)
• Some expressions are very complicated to implement
  • Those individual functions run in Spark, while the rest of the operator/plan still runs in Photon

Microbenchmarks do not necessarily reflect speedups on end-to-end queries. Functions are optimized for the UTC timezone; your mileage may vary.

Nested/Complex Type Support
• ✅ Struct
• Array / Map, in active development
• Reading data and basic usage/functions work
• In progress: collect_list() / collect_set()
• Long tail of array expressions

Microbenchmarks do not necessarily reflect speedups on end-to-end queries; your mileage may vary.

Writing Delta/Parquet Data
• Currently supports all scalar types and Struct
• Array/Map in active development
• Can be turned on/off independently of Photon
  • spark.databricks.photon.parquetWriter.enabled = true
• Typical speedups: 2-4x
• Wider (>100 columns) tables can see even more gains

DML Support [DELETE / UPDATE / MERGE]
• The bulk of the work, such as joins and aggregations, runs in Photon
• Benefits from Photon Delta/Parquet writing capability
• Typical speedups: 2-3x
ANSI SQL Support
• Development in tandem with open-source Spark
• Fail queries on overflow or similar errors
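To make "fail queries on overflow" concrete, here is a small sketch contrasting silent 32-bit wraparound with a checked addition that raises. The function names are hypothetical and the code models only the arithmetic rule, not Spark's or Photon's implementation.

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1  # range of a 32-bit signed INT


def wrap_add(a, b):
    """Non-ANSI style: overflow wraps around silently (two's complement)."""
    return (a + b - INT_MIN) % 2**32 + INT_MIN


def ansi_add(a, b):
    """ANSI style: overflow is an error, so the query fails instead of
    returning a wrong answer."""
    result = a + b
    if not INT_MIN <= result <= INT_MAX:
        raise ArithmeticError("integer overflow")
    return result


print(wrap_add(INT_MAX, 1))  # -2147483648
print(ansi_add(1, 2))        # 3
try:
    ansi_add(INT_MAX, 1)
except ArithmeticError as err:
    print("query failed:", err)
```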

Current/Up Next Efforts in Photon
• Finishing nested type support, including writes
• Outstanding ANSI SQL behaviors
• Sort and Window operators
• Support for bucketed tables

Photon: Key Use Cases for Preview

SQL Data Engineering / ELT / ETL (June)
● Enable Photon via Workspace cluster
● Notebook or JAR
● Available on: AWS
● Not supported yet
  ○ UDFs
  ○ Streaming

Interactive SQL Analytics (June)
● Photon via Databricks SQL
● Redash
● Tableau
● Microsoft Power BI
● BYO Tool via ODBC / JDBC
● Available on: AWS, Azure
● Not supported yet
  ○ Sort
  ○ Window


SELECT
  vendor_id,
  SUM(trip_distance) as SumTripDistance,
  AVG(trip_distance) as AvgTripDistance
FROM abehm.nyc_yellow
WHERE passenger_count IN (1, 2, 4)
GROUP BY vendor_id
ORDER BY vendor_id

Plan without Photon:
Sort
+- Exchange rangepartitioning
   +- HashAggregate
      +- Exchange hashpartitioning
         +- HashAggregate
            +- Project
               +- Filter
                  +- ColumnarToRow
                     +- FileScan

Plan with Photon:
Sort
+- Exchange
   +- ColumnarToRow
      +- PhotonResultStage
         +- PhotonGroupingAgg
            +- PhotonShuffleExchangeSource
               +- PhotonShuffleMapStage
                  +- PhotonShuffleExchangeSink
                     +- PhotonGroupingAgg
                        +- PhotonProject
                           +- PhotonFilter
                              +- PhotonAdapter
                                 +- FileScan

Spark UI
● Yellow → Photon Nodes
● Blue → Spark Nodes
Metrics
● Photon nodes have rich metrics to help understand behavior and performance
● Easier than Spark where several nodes are squashed together

Customer Feedback
Test Date                 Average Query Response Time (seconds)   Reduction from Previous
June '20 (DBR v6.6)       7.8                                     -
December '20 (Photon)     6.2                                     21%
May '21 (Photon)          4.4                                     29%
44% reduction overall
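The percentages in the table follow directly from the response times. As a quick check, the sketch below recomputes them; the helper name `reduction` is ours, and the inputs are the three averages shown above.

```python
def reduction(before, after):
    """Percent reduction in response time, rounded to a whole percent."""
    return round(100 * (before - after) / before)


times = [("June '20", 7.8), ("December '20", 6.2), ("May '21", 4.4)]

# Reduction of each test relative to the one before it.
step = [reduction(a[1], b[1]) for a, b in zip(times, times[1:])]
# Reduction from the first test to the last.
overall = reduction(times[0][1], times[-1][1])

print(step)     # [21, 29]
print(overall)  # 44
```

Note that the two step reductions do not add up to the overall figure, because each one is measured against a different baseline.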

2.5x Avg query speedup
3.7x Power Test speedup

DEMO
"Demo" - just a walkthrough showing where users can turn on Photon in Databricks?
Note: From getting started to executing existing code/queries and monitoring Photon (Spark UI + Query execution on SQLA)

Logo slide with generalized perf observations
brought down merge latency by 2-3x

Related Talks
WEDNESDAY
• 03:50 PM (PT): Databricks SQL Analytics Deep Dive for the Data Analyst - Doug Bateman, Databricks
• 04:25 PM (PT): Radical Speed for SQL Queries on Databricks: Photon Under the Hood - Greg Rahn & Alex Behm, Databricks
• 04:25 PM (PT): Delivering Insights from 20M+ Smart Homes with 500M+ devices - Sameer Vaidya, Plume
THURSDAY
• 11:00 AM (PT): Getting Started with Databricks SQL Analytics - Simon Whiteley, Advancing Analytics
• 03:15 PM (PT): Building Lakehouses on Delta Lake and SQL Analytics - A Primer - Franco Patano, Databricks
FRIDAY
• 10:30 AM (PT): SQL Analytics Powering Telemetry Analysis at Comcast - Suraj Nesamani, Comcast & Molly Nagamuthu, Databricks

How to get started
In June
databricks.com/try

SQL> SELECT questions FROM audience;

Feedback
Your feedback is important to us.
Don’t forget to rate and review the sessions.
