RMIT Classification: Trusted
Steering Query Optimizers: A Practical Take on Big Data Workloads
Parimarjan Negi, Matteo Interlandi, Ryan Marcus, Mohammad Alizadeh, Tim Kraska, Marc Friedman, Alekh Jindal
Microsoft, MIT & Intel Labs
SIGMOD’21
Presented by Hai Lan
Outline
• Background
  • Optimizers & learned methods for optimizers
  • Bao (SIGMOD’21)
• Work to be shared
  • Scope optimizer & workload
  • Motivation & goal
  • Method
• Discussions
19/8/21 Group meeting 2
Background
Background – Optimizer
[Figure: Life of a query: Query -> Parser -> AST -> Query Rewrite -> AST’ -> Optimizer (Logical Opt. -> Log. Plan -> Physical Opt. -> Phy. Plan) -> Executor -> Results]
Two representative optimizer architectures: Volcano and Cascades.
Background – Keys in Optimizer
• Cardinality Estimation
  • A structure to store the table statistics, e.g., samples, histograms, sketches.
  • An estimation model over that structure, e.g., evaluating predicates on a sample, or the assumptions made when using histograms.
• Cost Model
  • Predefined parameters related to the physical operators and the running environment.
• Plan Enumeration (Join Order)
  • Especially hard for large join queries.
“The root of all evil, the Achilles Heel of query optimization, is the estimation of the size of intermediate results, known as cardinalities.” -- Guy Lohman
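To make the quote concrete, here is a minimal sketch (not taken from the paper) of a histogram-based estimator. It assumes an equi-width histogram whose values are uniform inside each bucket, and it combines predicates with the common independence assumption. The class and function names are hypothetical. On two perfectly correlated columns, the independence assumption underestimates the true result by a factor of 2.

```python
from typing import Sequence


class EquiWidthHistogram:
    """Equi-width histogram over one numeric column (hypothetical helper)."""

    def __init__(self, values: Sequence[float], lo: float, hi: float, buckets: int):
        self.lo, self.hi, self.buckets = lo, hi, buckets
        self.width = (hi - lo) / buckets
        self.counts = [0] * buckets
        for v in values:
            # Clamp so that v == hi still falls into the last bucket.
            idx = min(int((v - lo) / self.width), buckets - 1)
            self.counts[idx] += 1
        self.total = len(values)

    def selectivity_lt(self, x: float) -> float:
        """Estimate the fraction of rows with col < x, assuming uniformity inside a bucket."""
        if x <= self.lo:
            return 0.0
        if x >= self.hi:
            return 1.0
        pos = (x - self.lo) / self.width
        full = min(int(pos), self.buckets - 1)
        est = sum(self.counts[:full]) + self.counts[full] * (pos - full)
        return est / self.total


def estimate_conjunction(num_rows: int, selectivities: Sequence[float]) -> float:
    """Cardinality of a conjunction under the independence assumption: N * s1 * s2 * ..."""
    card = float(num_rows)
    for s in selectivities:
        card *= s
    return card


# Two perfectly correlated columns: a == b in every row.
rows = [(i, i) for i in range(100)]
hist_a = EquiWidthHistogram([a for a, _ in rows], 0, 100, 10)
hist_b = EquiWidthHistogram([b for _, b in rows], 0, 100, 10)

estimated = estimate_conjunction(len(rows), [hist_a.selectivity_lt(50), hist_b.selectivity_lt(50)])
actual = sum(1 for a, b in rows if a < 50 and b < 50)
print(estimated, actual)  # 25.0 50
```

Each per-column estimate is exact here (0.5). The error comes entirely from multiplying them as if the columns were independent. Errors like this compound across the joins of a large query.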
Background – Learned Methods in the Optimizer
• Cardinality estimation: learned models estimate the cardinality.
  • Query-driven [1, 2, 3]
  • Data-driven [4, 5, 6]
  • Hybrid [7]
• Cost model: learned models estimate the cost [8, 9].
• Plan enumeration: reinforcement learning methods obtain the join order [10, 11].
1. Andreas Kipf et al.: Learned Cardinalities: Estimating Correlated Joins with Deep Learning. CIDR 2019
2. Anshuman Dutt et al.: Selectivity Estimation for Range Predicates using Lightweight Models. Proc. VLDB Endow. 12(9): 1044-1057 (2019)
3. Chenggang Wu et al.: Towards a Learning Optimizer for Shared Clouds. Proc. VLDB Endow. 12(3): 210-222 (2018)
4. Zongheng Yang et al.: Deep Unsupervised Cardinality Estimation. Proc. VLDB Endow. 13(3): 279-292 (2019)
5. Benjamin Hilprecht et al.: DeepDB: Learn from Data, not from Queries! Proc. VLDB Endow. 13(7): 992-1005 (2020)
6. Rong Zhu et al.: FLAT: Fast, Lightweight and Accurate Method for Cardinality Estimation. Proc. VLDB Endow. 14(9): 1489-1502 (2021)
7. Peizhi Wu, Gao Cong: A Unified Deep Model of Learning from both Data and Queries for Cardinality Estimation. SIGMOD Conference 2021: 2009-2022
8. Ji Sun, Guoliang Li: An End-to-End Learning-based Cost Estimator. Proc. VLDB Endow. 13(3): 307-319 (2019)
9. Tarique Siddiqui et al.: Cost Models for Big Data Query Processing: Learning, Retrofitting, and Our Findings. SIGMOD Conference 2020: 99-113
10. Sanjay Krishnan et al.: Learning to Optimize Join Queries With Deep Reinforcement Learning. CoRR abs/1808.03196 (2018)
11. Xiang Yu et al.: Reinforcement Learning with Tree-LSTM for Join Order Selection. ICDE 2020: 1297-1308
Background – Bao [1] (Bandit Optimizer)
Motivations.
• Because cardinality estimates are inaccurate, the optimizer may select the wrong physical operators.
• Databases support hints [2] that specify which operators may be used.
Bao’s Work.
• It automatically and adaptively determines the right hint set to use for each incoming query.
• Instead of the optimizer's internal `cost`, users can specify the metric to optimize, e.g., the running time used in the paper.
• 48 (26) hint sets
1. Ryan Marcus, Parimarjan Negi, Hongzi Mao, Nesime Tatbul, Mohammad Alizadeh, Tim Kraska: Bao: Making Learned Query Optimization Practical. SIGMOD Conference 2021: 1275-1288
2. Here, a `hint` does not mean the same thing as a hint in TiDB or MySQL.
Background – Bao
Method.
• Train a predictive model for the chosen metric.
• When a query arrives, select the plan with the lowest predicted value of the metric.
Pros & Cons.
• Pros
  • Adapts to dynamic situations
  • Integrated into a real system
  • Short training time
• Cons
  • It cannot specify hints for sub-plans.
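The selection step in Bao's method can be sketched as follows, under stated assumptions. `plan_for` and `predict` are hypothetical callbacks that stand in for the optimizer and the learned model. The hint names mimic PostgreSQL's `enable_*` planner settings. The real Bao also uses a tree convolutional network and Thompson sampling to balance exploration and exploitation, and this sketch shows only the greedy argmin.

```python
from typing import Callable, Dict, List, Tuple

HintSet = Dict[str, bool]
Plan = Tuple[str, ...]  # toy stand-in for a physical plan


def bao_select(query: str,
               hint_sets: List[HintSet],
               plan_for: Callable[[str, HintSet], Plan],
               predict: Callable[[Plan], float]) -> Tuple[HintSet, Plan]:
    """Plan the query once per hint set; keep the plan with the lowest predicted metric."""
    if not hint_sets:
        raise ValueError("need at least one hint set")
    scored = []
    for hints in hint_sets:
        plan = plan_for(query, hints)          # the unmodified optimizer plans under these hints
        scored.append((predict(plan), hints, plan))
    _, best_hints, best_plan = min(scored, key=lambda item: item[0])
    return best_hints, best_plan


# Toy example: two hypothetical hint sets toggling PostgreSQL-style planner flags.
HINT_SETS = [
    {"enable_hashjoin": True, "enable_nestloop": True},
    {"enable_hashjoin": True, "enable_nestloop": False},
]


def toy_plan_for(query: str, hints: HintSet) -> Plan:
    # Pretend the optimizer, misled by a bad cardinality estimate, prefers nested loops when allowed.
    return ("NestLoop",) if hints["enable_nestloop"] else ("HashJoin",)


def toy_predict(plan: Plan) -> float:
    # Stand-in for a trained model of the metric (e.g., predicted runtime in seconds).
    return {"NestLoop": 120.0, "HashJoin": 30.0}[plan[0]]


hints, plan = bao_select("q1", HINT_SETS, toy_plan_for, toy_predict)
print(hints, plan)
```

Because the hints steer an unmodified optimizer, each candidate is a complete plan and the model only has to rank them.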
Steering the Scope Optimizer
Scope Overview
Scope Optimizer.
• Belongs to the Cascades family.
• 256 rules in total:
  • Required rules, e.g., EnforceExchange
  • Implementation rules, e.g., HashJoinImpl1
  • On-by-default rules, e.g., various rewrite rules
  • Off-by-default rules, e.g., CorrelatedJoinOnUnion
• Default rules: every rule except the off-by-default ones.
Workload in Scope.
• Recurrent jobs: the same template with different variables.
• A mix of short- and long-running jobs.
• 10% of jobs run for more than 5 min, yet consume 90% of containers.
• Metrics:
  • Runtime
  • CPU time
  • Total I/O time
Motivations & Goal
Motivations.
• Because cardinality estimates are inaccurate, the optimizer may apply the wrong rules.
• Hints can specify which rules to use.
Relationship with Bao.
• Can Bao be applied directly to Scope?
  • Hint -> Rule; Hint set -> Rule configuration
• However:
  • Many more rules (200+ vs. 6) -> far too many possible rule configurations.
  • Large workloads -> long running times and plans with hundreds of operator nodes.
Goal.
• Output an alternative rule configuration that is better for optimizing a particular job under a given metric.
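A rough count shows the size of the gap, assuming every hint or rule can be switched on or off independently. The 6 hints behind Bao admit at most $2^6$ combinations, and the Bao slide lists 48 hint sets. In contrast, 200+ toggleable rules give at least $2^{200}$ rule configurations:

$$2^{6} = 64 \qquad \text{vs.} \qquad 2^{200} \approx 1.6 \times 10^{60}$$

Exhaustively trying every rule configuration is therefore infeasible for Scope.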
Rule Signature & Job Span
Rule Signature.
• The rule signature is a bit vector specifying which rules directly contribute to the final query plan produced by the optimizer.
• The default rule signature is the rule signature of a query optimized with the default rule configuration.
Job Span.
• Given a job, its span contains all non-required rules that, if enabled or disabled, can affect the final query plan.
• The span is generated with heuristics.
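Here is a minimal sketch of both notions. It assumes that a rule configuration is the set of enabled rule names, and that a hypothetical `optimize(config)` callback returns the final plan together with the rules that contributed to it. `approximate_span` toggles one rule at a time. This is only a simple stand-in for the paper's span heuristics. The 4-rule catalogue reuses names from the Scope slide as labels, and `SomeRewriteRule` and the toy optimizer are made up.

```python
from typing import Callable, FrozenSet, List, Set, Tuple

# Hypothetical 4-rule catalogue (Scope has 256 rules); a rule's bit index is its list position.
RULES: List[str] = ["EnforceExchange", "HashJoinImpl1", "SomeRewriteRule", "CorrelatedJoinOnUnion"]
REQUIRED: Set[str] = {"EnforceExchange"}
OFF_BY_DEFAULT: Set[str] = {"CorrelatedJoinOnUnion"}

Config = FrozenSet[str]                                  # the set of enabled rules
Optimizer = Callable[[Config], Tuple[tuple, Set[str]]]   # -> (final plan, rules used in it)


def default_config() -> Config:
    return frozenset(r for r in RULES if r not in OFF_BY_DEFAULT)


def rule_signature(rules_used: Set[str]) -> int:
    """Bit i is set iff RULES[i] directly contributed to the final plan."""
    return sum(1 << i for i, rule in enumerate(RULES) if rule in rules_used)


def approximate_span(optimize: Optimizer) -> Set[str]:
    """Non-required rules whose individual toggle changes the final plan (one-at-a-time approximation)."""
    base_plan, _ = optimize(default_config())
    span = set()
    for rule in RULES:
        if rule in REQUIRED:
            continue
        flipped = default_config() ^ {rule}              # enable it if off, disable it if on
        plan, _ = optimize(flipped)
        if plan != base_plan:
            span.add(rule)
    return span


def toy_optimize(config: Config) -> Tuple[tuple, Set[str]]:
    """Toy optimizer: hash join if its implementation rule is on, plus an optional union rewrite."""
    used = {"EnforceExchange"}
    join = "MergeJoin"
    if "HashJoinImpl1" in config:
        join = "HashJoin"
        used.add("HashJoinImpl1")
    if "CorrelatedJoinOnUnion" in config:
        used.add("CorrelatedJoinOnUnion")
        return ("Union", join), used
    return (join,), used


_, used = toy_optimize(default_config())
print(bin(rule_signature(used)))               # default rule signature
print(sorted(approximate_span(toy_optimize)))
```

In this toy example, `SomeRewriteRule` is enabled but never changes the plan, so it falls outside the span. The search described next only needs to vary the rules inside the span.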
Which rules to try?
Randomized Configuration Search (M = 1000).
• Enable all the rules that are not in the span of the given job.
• For each rule category, independently sample a subset of rules from the job span. Disable these rules and enable all others; this gives a new rule configuration.
• If the rule configuration has not been seen before, add it to the candidate list. Repeat until M configurations are generated.
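The randomized search can be sketched as follows. This sketch makes several assumptions:
• A configuration is the frozen set of enabled rules.
• The job span is already split by rule category.
• The number of rules disabled per category is drawn uniformly.
• `max_attempts` is an added guard, not in the slides, because a small span may have fewer than M distinct configurations.
The rule names and span are hypothetical.

```python
import random
from typing import Dict, FrozenSet, Iterable, List, Set


def randomized_config_search(all_rules: Iterable[str],
                             span_by_category: Dict[str, Set[str]],
                             m: int = 1000,
                             seed: int = 0,
                             max_attempts: int = 100_000) -> List[FrozenSet[str]]:
    """Generate up to m distinct rule configurations by randomly disabling rules from the job span."""
    rng = random.Random(seed)
    rules = list(all_rules)
    seen: Set[FrozenSet[str]] = set()
    candidates: List[FrozenSet[str]] = []
    for _ in range(max_attempts):
        if len(candidates) >= m:
            break
        disabled: Set[str] = set()
        for category_rules in span_by_category.values():
            pool = sorted(category_rules)               # sorted for reproducibility
            k = rng.randint(0, len(pool))               # how many rules of this category to disable
            disabled.update(rng.sample(pool, k))
        # Rules outside the span, and unsampled span rules, stay enabled.
        config = frozenset(r for r in rules if r not in disabled)
        if config not in seen:                          # keep only unseen configurations
            seen.add(config)
            candidates.append(config)
    return candidates


ALL_RULES = ["A", "B", "C", "D", "E"]                    # hypothetical rule names
SPAN = {"rewrite": {"B", "C"}, "implementation": {"D"}}  # hypothetical job span
configs = randomized_config_search(ALL_RULES, SPAN, m=1000, max_attempts=10_000)
print(len(configs))  # only 2**3 = 8 distinct configurations exist for this span
```

Sampling per category, rather than over the whole span, keeps small categories from being drowned out by large ones.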
Which jobs to try?
Choose Jobs & Configurations to Execute.
• Select jobs:
  • Jobs whose recompiled plans have clearly lower costs under the default cost model.
  • Jobs with low cost but high runtime under the default configuration (i.e., the cost model is wrong).
• Select configurations:
  • Execute the 10 alternative rule configurations that are cheapest according to the cost model.
[Figure: Workload B, compared to the default configuration]
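The two job filters and the top-10 configuration choice can be sketched as follows. The `JobStats` record and every threshold (`cost_ratio`, `low_cost`, `high_runtime_s`) are hypothetical placeholders. The slides only say "clearly lower" and "low cost, high runtime".

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class JobStats:
    name: str
    default_cost: float                 # optimizer cost of the default plan
    default_runtime_s: float            # observed runtime of the default plan
    alt_costs: Dict[str, float] = field(default_factory=dict)  # config id -> cost of recompiled plan


def select_jobs(jobs: List[JobStats], cost_ratio: float = 0.5,
                low_cost: float = 100.0, high_runtime_s: float = 300.0) -> List[JobStats]:
    """Keep jobs that (a) have a clearly cheaper recompiled plan or (b) look cheap but run long."""
    chosen = []
    for job in jobs:
        clearly_cheaper = any(c < cost_ratio * job.default_cost for c in job.alt_costs.values())
        cost_model_wrong = job.default_cost < low_cost and job.default_runtime_s > high_runtime_s
        if clearly_cheaper or cost_model_wrong:
            chosen.append(job)
    return chosen


def cheapest_configs(job: JobStats, k: int = 10) -> List[str]:
    """The k alternative configurations with the lowest estimated cost; these get executed."""
    return sorted(job.alt_costs, key=job.alt_costs.__getitem__)[:k]


jobs = [
    JobStats("j1", 1000.0, 60.0, {"c1": 400.0, "c2": 900.0}),   # clearly cheaper alternative
    JobStats("j2", 50.0, 600.0, {"c1": 60.0}),                  # cheap estimate, long runtime
    JobStats("j3", 1000.0, 60.0, {"c1": 800.0}),                # neither
]
print([j.name for j in select_jobs(jobs)])  # ['j1', 'j2']
```

Executing only a handful of configurations per job bounds the experimentation cost. The second filter targets exactly the cases where the cost model cannot be trusted to rank them.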
Different metrics
Other metrics sometimes regress.
Not all metrics can be improved at the same time.
A potential direction is to adopt a separate model for each metric.
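One way to make this trade-off explicit is to compare a candidate configuration against the default configuration on every metric from the Scope slide, and to flag regressions. The metric keys and the numbers below are hypothetical.

```python
from typing import Dict, List

METRICS = ("runtime", "cpu_time", "total_io_time")


def relative_change(default: Dict[str, float], candidate: Dict[str, float]) -> Dict[str, float]:
    """(candidate - default) / default per metric; negative means the candidate is better."""
    return {m: (candidate[m] - default[m]) / default[m] for m in METRICS}


def regressions(default: Dict[str, float], candidate: Dict[str, float]) -> List[str]:
    """Metrics that get worse under the candidate configuration."""
    return [m for m, delta in relative_change(default, candidate).items() if delta > 0]


def improves_all(default: Dict[str, float], candidate: Dict[str, float]) -> bool:
    """True only if the candidate is strictly better on every metric."""
    return all(delta < 0 for delta in relative_change(default, candidate).values())


default = {"runtime": 100.0, "cpu_time": 50.0, "total_io_time": 20.0}
candidate = {"runtime": 80.0, "cpu_time": 60.0, "total_io_time": 20.0}
print(relative_change(default, candidate))  # runtime -20%, CPU time +20%, I/O unchanged
print(regressions(default, candidate), improves_all(default, candidate))
```

A configuration that wins on runtime can still lose on CPU time. Optimizing each metric with its own model makes that choice explicit instead of hiding it in a single score.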
Extrapolating to other jobs
Idea.
• Use the rule signature as the level of granularity at which the same set of rule configurations could be useful.
• Rule signature job group: the set of jobs whose default rule signatures map to the same bit vector.
Methods.
• Case 1: simply apply a previously seen rule configuration.
• Case 2: find a set of interesting configurations for each job group and use a model to choose one at compile time.
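Grouping jobs by their default rule signature, and then handling Case 1 (reusing a configuration already found for the group), can be sketched as follows. The job names, the integer bit-vector signatures, and the fallback to the default configuration are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List


def group_by_signature(default_signatures: Dict[str, int]) -> Dict[int, List[str]]:
    """Rule signature job groups: jobs whose default rule signatures are the same bit vector."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for job, signature in default_signatures.items():
        groups[signature].append(job)
    return dict(groups)


def config_for_new_job(signature: int,
                       best_config_by_group: Dict[int, FrozenSet[str]],
                       default_config: FrozenSet[str]) -> FrozenSet[str]:
    """Case 1: reuse a configuration previously found for this group; otherwise keep the default."""
    return best_config_by_group.get(signature, default_config)


groups = group_by_signature({"job_a": 0b0110, "job_b": 0b0110, "job_c": 0b1010})
print(groups)  # {6: ['job_a', 'job_b'], 10: ['job_c']}
```

The default rule signature is available right after the default compilation, so the group lookup adds essentially nothing to compile time.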
Learning Rule Configurations
Training Set.
• Select S rule signatures from the workload.
• Collect the jobs whose default rule signature maps to one of these rule signatures.
• Obtain K candidate configurations for each job group.
• Sample M jobs from all the jobs mapping to these job groups.
• Execute each of the K configurations for every sampled job, yielding (Job, RuleConf, Running Time) samples.
Learning Problem.
• Treat the samples in each job group as an independent learning problem.
• The goal is to select one of the K candidate configurations for a given query.
• Use supervised learning to estimate the running time of a query under a configuration.
Featurization.
• Job-level features, e.g., input cardinality, hash of the template.
• Rule configuration features, e.g., cost of the plan, bit vector of the RuleDiff.
• Query graph features, e.g., operator ids and costs.
Learned Models.
• For each job group, a fully connected neural network with one hidden layer of size 1024.
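A per-group model and the compile-time choice can be sketched in plain Python. Only the hidden-layer size (1024) comes from the slide. The following are assumptions for illustration:
• the tanh activation
• the initialization
• the squared-error loss
• single-sample SGD
• the feature vectors in the example
A real deployment would use a deep learning library.

```python
import math
import random
from typing import List, Sequence


class OneHiddenLayerRegressor:
    """Fully connected net: features -> hidden layer (tanh) -> predicted running time.

    Hidden size 1024 follows the slide; tanh, the init scheme and plain SGD are assumptions.
    """

    def __init__(self, n_features: int, hidden: int = 1024, seed: int = 0):
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0.0, 1.0 / math.sqrt(n_features)) for _ in range(n_features)]
                   for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [rng.gauss(0.0, 1.0 / math.sqrt(hidden)) for _ in range(hidden)]
        self.b2 = 0.0

    def _hidden(self, x: Sequence[float]) -> List[float]:
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.w1, self.b1)]

    def predict(self, x: Sequence[float]) -> float:
        h = self._hidden(x)
        return sum(w * hj for w, hj in zip(self.w2, h)) + self.b2

    def sgd_step(self, x: Sequence[float], y: float, lr: float = 1e-4) -> float:
        """One gradient step on (prediction - y)^2 for a single (features, runtime) sample."""
        h = self._hidden(x)
        pred = sum(w * hj for w, hj in zip(self.w2, h)) + self.b2
        d_pred = 2.0 * (pred - y)
        for j, hj in enumerate(h):
            d_pre = d_pred * self.w2[j] * (1.0 - hj * hj)   # backprop through tanh, using old w2[j]
            self.w2[j] -= lr * d_pred * hj
            self.b1[j] -= lr * d_pre
            row = self.w1[j]
            for i, xi in enumerate(x):
                row[i] -= lr * d_pre * xi
        self.b2 -= lr * d_pred
        return (pred - y) ** 2                                # loss before the update


def choose_config(model: OneHiddenLayerRegressor, candidate_features: List[List[float]]) -> int:
    """Index of the candidate rule configuration with the lowest predicted running time."""
    preds = [model.predict(f) for f in candidate_features]
    return min(range(len(preds)), key=preds.__getitem__)


# Hypothetical features per candidate configuration: [scaled input size, scaled plan cost, RuleDiff bit].
candidates = [[0.5, 1.2, 1.0], [0.5, 0.8, 0.0], [0.5, 2.0, 1.0]]
model = OneHiddenLayerRegressor(n_features=3)
for _ in range(5):
    model.sgd_step(candidates[1], y=0.3)   # pretend a 0.3 (scaled) runtime was observed for candidate 1
print(choose_config(model, candidates))
```

One small model per job group keeps each learning problem narrow: the model only has to rank the group's K candidate configurations, not predict runtimes for arbitrary jobs.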
Discussion
Future work.
• Methods to generate the job span and interesting rule configurations.
• Use feedback from the execution results to guide future iterations of the configuration search.
• Other configurable options in Scope.
Summary.
• Problem: how to choose the right rule configuration for an incoming query.
• Proposes the rule signature, the job span, and several heuristic algorithms to obtain candidate rule configurations.
• Adopts a learned model to choose the rule configuration for each job group.
Discussion.
• Papers.
• Methods.
• Model for each group.
• Parameters.
Q & A
PingCAP
 
Paper reading: HashKV and beyond
Paper reading: HashKV and beyondPaper reading: HashKV and beyond
Paper reading: HashKV and beyond
PingCAP
 
Paper Reading: Pessimistic Cardinality Estimation
Paper Reading: Pessimistic Cardinality EstimationPaper Reading: Pessimistic Cardinality Estimation
Paper Reading: Pessimistic Cardinality Estimation
PingCAP
 
Building a transactional key-value store that scales to 100+ nodes (percona l...
Building a transactional key-value store that scales to 100+ nodes (percona l...Building a transactional key-value store that scales to 100+ nodes (percona l...
Building a transactional key-value store that scales to 100+ nodes (percona l...
PingCAP
 
[Paper Reading] Efficient Query Processing with Optimistically Compressed Has...
[Paper Reading] Efficient Query Processing with Optimistically Compressed Has...[Paper Reading] Efficient Query Processing with Optimistically Compressed Has...
[Paper Reading] Efficient Query Processing with Optimistically Compressed Has...
PingCAP
 
[Paper Reading]KVSSD: Close integration of LSM trees and flash translation la...
[Paper Reading]KVSSD: Close integration of LSM trees and flash translation la...[Paper Reading]KVSSD: Close integration of LSM trees and flash translation la...
[Paper Reading]KVSSD: Close integration of LSM trees and flash translation la...
PingCAP
 
[Paper Reading]Chucky: A Succinct Cuckoo Filter for LSM-Tree
[Paper Reading]Chucky: A Succinct Cuckoo Filter for LSM-Tree[Paper Reading]Chucky: A Succinct Cuckoo Filter for LSM-Tree
[Paper Reading]Chucky: A Succinct Cuckoo Filter for LSM-Tree
PingCAP
 
[Paper Reading]The Bw-Tree: A B-tree for New Hardware Platforms
[Paper Reading]The Bw-Tree: A B-tree for New Hardware Platforms[Paper Reading]The Bw-Tree: A B-tree for New Hardware Platforms
[Paper Reading]The Bw-Tree: A B-tree for New Hardware Platforms
PingCAP
 
[Paper Reading] QAGen: Generating query-aware test databases
[Paper Reading] QAGen: Generating query-aware test databases[Paper Reading] QAGen: Generating query-aware test databases
[Paper Reading] QAGen: Generating query-aware test databases
PingCAP
 
[Paper Reading] Leases: An Efficient Fault-Tolerant Mechanism for Distribute...
[Paper Reading]  Leases: An Efficient Fault-Tolerant Mechanism for Distribute...[Paper Reading]  Leases: An Efficient Fault-Tolerant Mechanism for Distribute...
[Paper Reading] Leases: An Efficient Fault-Tolerant Mechanism for Distribute...
PingCAP
 
[Paperreading] Paxos made easy (by sen han)
[Paperreading]  Paxos made easy (by sen han)[Paperreading]  Paxos made easy (by sen han)
[Paperreading] Paxos made easy (by sen han)
PingCAP
 
[Paper Reading] Generalized Sub-Query Fusion for Eliminating Redundant I/O fr...
[Paper Reading] Generalized Sub-Query Fusion for Eliminating Redundant I/O fr...[Paper Reading] Generalized Sub-Query Fusion for Eliminating Redundant I/O fr...
[Paper Reading] Generalized Sub-Query Fusion for Eliminating Redundant I/O fr...
PingCAP
 
The Dark Side Of Go -- Go runtime related problems in TiDB in production
The Dark Side Of Go -- Go runtime related problems in TiDB  in productionThe Dark Side Of Go -- Go runtime related problems in TiDB  in production
The Dark Side Of Go -- Go runtime related problems in TiDB in production
PingCAP
 
TiDB DevCon 2020 Opening Keynote
TiDB DevCon 2020 Opening Keynote TiDB DevCon 2020 Opening Keynote
TiDB DevCon 2020 Opening Keynote
PingCAP
 
Finding Logic Bugs in Database Management Systems
Finding Logic Bugs in Database Management SystemsFinding Logic Bugs in Database Management Systems
Finding Logic Bugs in Database Management Systems
PingCAP
 
Chaos Practice in PingCAP
Chaos Practice in PingCAPChaos Practice in PingCAP
Chaos Practice in PingCAP
PingCAP
 
TiDB at PayPay
TiDB at PayPayTiDB at PayPay
TiDB at PayPay
PingCAP
 
Paper Reading: FPTree
Paper Reading: FPTreePaper Reading: FPTree
Paper Reading: FPTree
PingCAP
 
Paper Reading: Smooth Scan
Paper Reading: Smooth ScanPaper Reading: Smooth Scan
Paper Reading: Smooth Scan
PingCAP
 
Paper Reading: Flexible Paxos
Paper Reading: Flexible PaxosPaper Reading: Flexible Paxos
Paper Reading: Flexible Paxos
PingCAP
 
Paper reading: Cost-based Query Transformation in Oracle
Paper reading: Cost-based Query Transformation in OraclePaper reading: Cost-based Query Transformation in Oracle
Paper reading: Cost-based Query Transformation in Oracle
PingCAP
 
Paper reading: HashKV and beyond
Paper reading: HashKV and beyondPaper reading: HashKV and beyond
Paper reading: HashKV and beyond
PingCAP
 
Paper Reading: Pessimistic Cardinality Estimation
Paper Reading: Pessimistic Cardinality EstimationPaper Reading: Pessimistic Cardinality Estimation
Paper Reading: Pessimistic Cardinality Estimation
PingCAP
 
Building a transactional key-value store that scales to 100+ nodes (percona l...
Building a transactional key-value store that scales to 100+ nodes (percona l...Building a transactional key-value store that scales to 100+ nodes (percona l...
Building a transactional key-value store that scales to 100+ nodes (percona l...
PingCAP
 
Ad

Recently uploaded (20)

PM003_SERENE-CM-PM-Training Material-EAM Maintenance Notification.pptx
PM003_SERENE-CM-PM-Training Material-EAM Maintenance Notification.pptxPM003_SERENE-CM-PM-Training Material-EAM Maintenance Notification.pptx
PM003_SERENE-CM-PM-Training Material-EAM Maintenance Notification.pptx
afriyanrtanjung007
 
Mixed Methods Research.pptx education 201
Mixed Methods Research.pptx education 201Mixed Methods Research.pptx education 201
Mixed Methods Research.pptx education 201
GraceSolaa1
 
MLOps_with_SageMaker_Template_EN idioma inglés
MLOps_with_SageMaker_Template_EN idioma inglésMLOps_with_SageMaker_Template_EN idioma inglés
MLOps_with_SageMaker_Template_EN idioma inglés
FabianPierrePeaJacob
 
The-Future-is-Now-Information-Technology-Trends.pptx.pdf
The-Future-is-Now-Information-Technology-Trends.pptx.pdfThe-Future-is-Now-Information-Technology-Trends.pptx.pdf
The-Future-is-Now-Information-Technology-Trends.pptx.pdf
winnt04
 
Research presentations and statistics for computer science.pptx
Research presentations  and statistics for computer science.pptxResearch presentations  and statistics for computer science.pptx
Research presentations and statistics for computer science.pptx
vimbaimapfumo25
 
463.8-Bitcoin from university of illinois
463.8-Bitcoin from university of illinois463.8-Bitcoin from university of illinois
463.8-Bitcoin from university of illinois
8gqtkfzwbb
 
Nature and Characteristics of Research.pptx
Nature and Characteristics of Research.pptxNature and Characteristics of Research.pptx
Nature and Characteristics of Research.pptx
KyleEmperado
 
DEWDHDIEFHIFHIHGIERHFIHIM SC ID (2).pptx
DEWDHDIEFHIFHIHGIERHFIHIM SC ID (2).pptxDEWDHDIEFHIFHIHGIERHFIHIM SC ID (2).pptx
DEWDHDIEFHIFHIHGIERHFIHIM SC ID (2).pptx
f8jyv28tjr
 
Mathcad Sales Presentation software for use.PPTX
Mathcad Sales Presentation software for use.PPTXMathcad Sales Presentation software for use.PPTX
Mathcad Sales Presentation software for use.PPTX
ManojSharma311544
 
FT Partners Research - FinTech in Africa-2.pdf
FT Partners Research - FinTech in Africa-2.pdfFT Partners Research - FinTech in Africa-2.pdf
FT Partners Research - FinTech in Africa-2.pdf
Obinna8
 
PN_Junction_Diode_Typdbhghfned_Notes.pdf
PN_Junction_Diode_Typdbhghfned_Notes.pdfPN_Junction_Diode_Typdbhghfned_Notes.pdf
PN_Junction_Diode_Typdbhghfned_Notes.pdf
AryanGohil1
 
Faces of the Future The Impact of a Data Science Course in Kerala.pdf
Faces of the Future The Impact of a Data Science Course in Kerala.pdfFaces of the Future The Impact of a Data Science Course in Kerala.pdf
Faces of the Future The Impact of a Data Science Course in Kerala.pdf
jzyphoenix
 
Chapter VII RECURSION.pdf algor and data structure
Chapter VII RECURSION.pdf algor and data structureChapter VII RECURSION.pdf algor and data structure
Chapter VII RECURSION.pdf algor and data structure
benyakoubrania53
 
Urban models for professional practice 03
Urban models for professional practice 03Urban models for professional practice 03
Urban models for professional practice 03
DanisseLoiDapdap
 
Professional Certificate in Applied AI and Machine Learning
Professional Certificate in Applied AI and Machine LearningProfessional Certificate in Applied AI and Machine Learning
Professional Certificate in Applied AI and Machine Learning
Nafisur Ahmed
 
Group Presentation - Cyclic Redundancy Checks.pptx
Group Presentation - Cyclic Redundancy Checks.pptxGroup Presentation - Cyclic Redundancy Checks.pptx
Group Presentation - Cyclic Redundancy Checks.pptx
vimbaimapfumo25
 
Giới thiệu mô hình học nhiều tầng (deep learning models)
Giới thiệu mô hình học nhiều tầng (deep learning models)Giới thiệu mô hình học nhiều tầng (deep learning models)
Giới thiệu mô hình học nhiều tầng (deep learning models)
nkphat
 
web-roadmap developer file information..
web-roadmap developer file information..web-roadmap developer file information..
web-roadmap developer file information..
pandeyarush01
 
Time series analysis & forecasting-Day1.pptx
Time series analysis & forecasting-Day1.pptxTime series analysis & forecasting-Day1.pptx
Time series analysis & forecasting-Day1.pptx
AsmaaMahmoud89
 
chapter-6 (1).pdf immunology innate immunity
chapter-6 (1).pdf immunology innate immunitychapter-6 (1).pdf immunology innate immunity
chapter-6 (1).pdf immunology innate immunity
bedadadenbal50
 
PM003_SERENE-CM-PM-Training Material-EAM Maintenance Notification.pptx
PM003_SERENE-CM-PM-Training Material-EAM Maintenance Notification.pptxPM003_SERENE-CM-PM-Training Material-EAM Maintenance Notification.pptx
PM003_SERENE-CM-PM-Training Material-EAM Maintenance Notification.pptx
afriyanrtanjung007
 
Mixed Methods Research.pptx education 201
Mixed Methods Research.pptx education 201Mixed Methods Research.pptx education 201
Mixed Methods Research.pptx education 201
GraceSolaa1
 
MLOps_with_SageMaker_Template_EN idioma inglés
MLOps_with_SageMaker_Template_EN idioma inglésMLOps_with_SageMaker_Template_EN idioma inglés
MLOps_with_SageMaker_Template_EN idioma inglés
FabianPierrePeaJacob
 
The-Future-is-Now-Information-Technology-Trends.pptx.pdf
The-Future-is-Now-Information-Technology-Trends.pptx.pdfThe-Future-is-Now-Information-Technology-Trends.pptx.pdf
The-Future-is-Now-Information-Technology-Trends.pptx.pdf
winnt04
 
Research presentations and statistics for computer science.pptx
Research presentations  and statistics for computer science.pptxResearch presentations  and statistics for computer science.pptx
Research presentations and statistics for computer science.pptx
vimbaimapfumo25
 
463.8-Bitcoin from university of illinois
463.8-Bitcoin from university of illinois463.8-Bitcoin from university of illinois
463.8-Bitcoin from university of illinois
8gqtkfzwbb
 
Nature and Characteristics of Research.pptx
Nature and Characteristics of Research.pptxNature and Characteristics of Research.pptx
Nature and Characteristics of Research.pptx
KyleEmperado
 
DEWDHDIEFHIFHIHGIERHFIHIM SC ID (2).pptx
DEWDHDIEFHIFHIHGIERHFIHIM SC ID (2).pptxDEWDHDIEFHIFHIHGIERHFIHIM SC ID (2).pptx
DEWDHDIEFHIFHIHGIERHFIHIM SC ID (2).pptx
f8jyv28tjr
 
Mathcad Sales Presentation software for use.PPTX
Mathcad Sales Presentation software for use.PPTXMathcad Sales Presentation software for use.PPTX
Mathcad Sales Presentation software for use.PPTX
ManojSharma311544
 
FT Partners Research - FinTech in Africa-2.pdf
FT Partners Research - FinTech in Africa-2.pdfFT Partners Research - FinTech in Africa-2.pdf
FT Partners Research - FinTech in Africa-2.pdf
Obinna8
 
PN_Junction_Diode_Typdbhghfned_Notes.pdf
PN_Junction_Diode_Typdbhghfned_Notes.pdfPN_Junction_Diode_Typdbhghfned_Notes.pdf
PN_Junction_Diode_Typdbhghfned_Notes.pdf
AryanGohil1
 
Faces of the Future The Impact of a Data Science Course in Kerala.pdf
Faces of the Future The Impact of a Data Science Course in Kerala.pdfFaces of the Future The Impact of a Data Science Course in Kerala.pdf
Faces of the Future The Impact of a Data Science Course in Kerala.pdf
jzyphoenix
 
Chapter VII RECURSION.pdf algor and data structure
Chapter VII RECURSION.pdf algor and data structureChapter VII RECURSION.pdf algor and data structure
Chapter VII RECURSION.pdf algor and data structure
benyakoubrania53
 
Urban models for professional practice 03
Urban models for professional practice 03Urban models for professional practice 03
Urban models for professional practice 03
DanisseLoiDapdap
 
Professional Certificate in Applied AI and Machine Learning
Professional Certificate in Applied AI and Machine LearningProfessional Certificate in Applied AI and Machine Learning
Professional Certificate in Applied AI and Machine Learning
Nafisur Ahmed
 
Group Presentation - Cyclic Redundancy Checks.pptx
Group Presentation - Cyclic Redundancy Checks.pptxGroup Presentation - Cyclic Redundancy Checks.pptx
Group Presentation - Cyclic Redundancy Checks.pptx
vimbaimapfumo25
 
Giới thiệu mô hình học nhiều tầng (deep learning models)
Giới thiệu mô hình học nhiều tầng (deep learning models)Giới thiệu mô hình học nhiều tầng (deep learning models)
Giới thiệu mô hình học nhiều tầng (deep learning models)
nkphat
 
web-roadmap developer file information..
web-roadmap developer file information..web-roadmap developer file information..
web-roadmap developer file information..
pandeyarush01
 
Time series analysis & forecasting-Day1.pptx
Time series analysis & forecasting-Day1.pptxTime series analysis & forecasting-Day1.pptx
Time series analysis & forecasting-Day1.pptx
AsmaaMahmoud89
 
chapter-6 (1).pdf immunology innate immunity
chapter-6 (1).pdf immunology innate immunitychapter-6 (1).pdf immunology innate immunity
chapter-6 (1).pdf immunology innate immunity
bedadadenbal50
 

[Paper Reading] Steering Query Optimizers: A Practical Take on Big Data Workloads

  • 1. RMIT Classification: Trusted. Steering Query Optimizers: A Practical Take on Big Data Workloads. Parimarjan Negi, Matteo Interlandi, Ryan Marcus, Mohammad Alizadeh, Tim Kraska, Marc Friedman, Alekh Jindal (Microsoft, MIT & Intel Labs). SIGMOD’21. Presented by Hai Lan.
  • 2. Outline (19/8/21 Group meeting)
      • Background
          • Optimizers and learned methods for optimizers
          • Bao (SIGMOD’21)
      • Work to be shared
          • The Scope optimizer and its workload
          • Motivation & goal
          • Method
      • Discussion
  • 4-6. Background – Optimizer
      • Life of a query: Query → Parser → AST → Query Rewrite → AST' → Optimizer → Phy. Plan → Executor → Results.
      • Inside the optimizer, logical optimization produces a logical plan, and physical optimization turns it into a physical plan.
      • Two representative architectures: Volcano and Cascades.
  • 7-8. Background – Keys in Optimizer
      • Cardinality estimation: a structure that stores table statistics (e.g., samples, histograms, sketches) plus an estimation model (e.g., evaluating predicates on a sample, or the assumptions made when using a histogram).
      • Plan enumeration (join order): particularly challenging for large join queries.
      • Cost model: predefined parameters related to the physical operators and the running environment.
      • "The root of all evil, the Achilles Heel of query optimization, is the estimation of the size of intermediate results, known as cardinalities." -- Guy Lohman
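To make "assumptions when using a histogram" concrete, here is a toy sketch (not taken from any particular system): an equi-width histogram that assumes values are spread uniformly inside each bucket, and a conjunction estimate that multiplies per-predicate selectivities, i.e., the attribute-value independence assumption. The class and function names are hypothetical.

```python
class EquiWidthHistogram:
    """Toy equi-width histogram over a numeric column with values in [lo, hi)."""

    def __init__(self, values, lo, hi, num_buckets):
        self.lo, self.hi = lo, hi
        self.width = (hi - lo) / num_buckets
        self.counts = [0] * num_buckets
        for v in values:
            # Clamp so that a value equal to hi lands in the last bucket.
            idx = min(int((v - lo) / self.width), num_buckets - 1)
            self.counts[idx] += 1
        self.total = len(values)

    def selectivity_lt(self, x):
        """Estimate the fraction of rows with col < x.

        Uniformity assumption: a partially covered bucket contributes
        in proportion to the covered fraction of its width."""
        rows = 0.0
        for i, count in enumerate(self.counts):
            b_lo = self.lo + i * self.width
            b_hi = b_lo + self.width
            if x >= b_hi:
                rows += count  # whole bucket qualifies
            elif x > b_lo:
                rows += count * (x - b_lo) / self.width
        return rows / self.total


def estimate_conjunction(table_rows, selectivities):
    """Independence assumption: multiply the per-predicate selectivities."""
    est = float(table_rows)
    for s in selectivities:
        est *= s
    return est


# Usage: two columns of the same 100-row table, each holding the values 0..99.
ages = EquiWidthHistogram(list(range(100)), lo=0, hi=100, num_buckets=10)
scores = EquiWidthHistogram(list(range(100)), lo=0, hi=100, num_buckets=10)
s_age = ages.selectivity_lt(55)      # 5 full buckets (50 rows) + half of [50, 60) (5 rows)
s_score = scores.selectivity_lt(20)  # 2 full buckets (20 rows)
est = estimate_conjunction(ages.total, [s_age, s_score])  # about 11 rows
```

If the two columns are correlated in reality, the true count can be far from this product, which is exactly the kind of error the quote above points at.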
  • 9. Background – Learned methods in the optimizer
      • Cardinality estimation: learned models that estimate cardinalities, either query-driven [1, 2, 3], data-driven [4, 5, 6], or hybrid [7].
      • Cost model: learned models that estimate the cost [8, 9].
      • Plan enumeration: reinforcement learning methods that choose the join order [10, 11].
      References:
      1. Andreas Kipf et al. Learned Cardinalities: Estimating Correlated Joins with Deep Learning. CIDR 2019.
      2. Anshuman Dutt et al. Selectivity Estimation for Range Predicates using Lightweight Models. Proc. VLDB Endow. 12(9): 1044-1057 (2019).
      3. Chenggang Wu et al. Towards a Learning Optimizer for Shared Clouds. Proc. VLDB Endow. 12(3): 210-222 (2018).
      4. Zongheng Yang et al. Deep Unsupervised Cardinality Estimation. Proc. VLDB Endow. 13(3): 279-292 (2019).
      5. Benjamin Hilprecht et al. DeepDB: Learn from Data, not from Queries! Proc. VLDB Endow. 13(7): 992-1005 (2020).
      6. Rong Zhu et al. FLAT: Fast, Lightweight and Accurate Method for Cardinality Estimation. Proc. VLDB Endow. 14(9): 1489-1502 (2021).
      7. Peizhi Wu, Gao Cong. A Unified Deep Model of Learning from both Data and Queries for Cardinality Estimation. SIGMOD Conference 2021: 2009-2022.
      8. Ji Sun, Guoliang Li. An End-to-End Learning-based Cost Estimator. Proc. VLDB Endow. 13(3): 307-319 (2019).
      9. Tarique Siddiqui et al. Cost Models for Big Data Query Processing: Learning, Retrofitting, and Our Findings. SIGMOD Conference 2020: 99-113.
      10. Sanjay Krishnan et al. Learning to Optimize Join Queries With Deep Reinforcement Learning. CoRR abs/1808.03196 (2018).
      11. Xiang Yu, Guoliang Li, Chengliang Chai, Nan Tang. Reinforcement Learning with Tree-LSTM for Join Order Selection. ICDE 2020: 1297-1308.
  • 10-11. Background – Bao [1] (Bandit Optimizer)
      Motivations.
      • Because cardinality estimates are inaccurate, the optimizer may select the wrong physical operators.
      • Databases support hints [2] that enable or disable specific operator types.
      Bao's work.
      • It automatically and adaptively determines the right hint set to use for each incoming query; 48 hint sets (out of 2^6 possible combinations) are considered.
      • Instead of the optimizer's `cost`, users can specify the metric to optimize, e.g., the running time used in the paper.
      1. Ryan Marcus, Parimarjan Negi, Hongzi Mao, Nesime Tatbul, Mohammad Alizadeh, Tim Kraska. Bao: Making Learned Query Optimization Practical. SIGMOD Conference 2021: 1275-1288.
      2. Here, a `hint` is not the same as a hint in TiDB or MySQL.
  • 12-13. Background – Bao
      Method.
      • Train a predictive model for the chosen metric.
      • When a query arrives, plan it under each hint set and select the plan with the lowest predicted value of the metric.
      Pros & Cons.
      • Pros: adapts to dynamic situations; integrated with a real system; short training time.
      • Cons: a hint set applies to the whole query, so Bao cannot specify hints for a sub-plan.
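A minimal sketch of Bao's selection step as summarized above: plan the query once per hint set, predict the metric for each plan, and keep the lowest. The six flags are real PostgreSQL planner settings of the kind Bao toggles, but the feasibility filter (at least one join and one scan operator left enabled) is an assumption, and `plan_for` / `predict_metric` are hypothetical stand-ins. Real Bao pairs a tree convolutional neural network with Thompson sampling, which this sketch omits.

```python
from itertools import product

# PostgreSQL planner flags of the kind Bao toggles (assumed set).
FLAGS = ["enable_hashjoin", "enable_mergejoin", "enable_nestloop",
         "enable_indexscan", "enable_seqscan", "enable_indexonlyscan"]
JOIN_FLAGS, SCAN_FLAGS = FLAGS[:3], FLAGS[3:]


def all_hint_sets():
    """Yield every on/off assignment that leaves at least one join
    and at least one scan operator enabled (assumed feasibility rule)."""
    for bits in product([True, False], repeat=len(FLAGS)):
        hint_set = dict(zip(FLAGS, bits))
        if any(hint_set[f] for f in JOIN_FLAGS) and any(hint_set[f] for f in SCAN_FLAGS):
            yield hint_set


def choose_plan(query, hint_sets, plan_for, predict_metric):
    """Plan the query under each hint set; return (score, hint_set, plan)
    for the plan with the lowest predicted metric."""
    best = None
    for hint_set in hint_sets:
        plan = plan_for(query, hint_set)   # hypothetical optimizer call
        score = predict_metric(plan)       # hypothetical learned model
        if best is None or score < best[0]:
            best = (score, hint_set, plan)
    return best


# Toy usage: the "plan" is just the hint set, and the toy model rewards
# hash joins and index scans while penalizing every enabled operator a little.
def toy_plan_for(query, hint_set):
    return hint_set


def toy_predict(plan):
    return (5.0 - 2.0 * plan["enable_hashjoin"] - 1.0 * plan["enable_indexscan"]
            + 0.5 * sum(plan.values()))


best = choose_plan("SELECT ...", all_hint_sets(), toy_plan_for, toy_predict)
```

The assumed filter yields 49 combinations, close to but not the same as the 48 hint sets on the slide, so the exact set Bao uses may follow a different rule.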
  • 14. Steering the Scope Optimizer
  • 15-18. Scope Overview
      Scope optimizer.
      • Belongs to the Cascades family.
      • 256 rules in total:
          • Required rules, e.g., EnforceExchange
          • Implementation rules, e.g., HashJoinImpl1
          • On-by-default rules, e.g., various rewrite rules
          • Off-by-default rules, e.g., CorrelatedJoinOnUnion
      • Default rules: the rules that are enabled by default.
      Workload in Scope.
      • Recurrent jobs: the same template with different variables.
      • Short- and long-running jobs: 10% of jobs last over 5 minutes yet consume 90% of containers.
      • Metrics: runtime, CPU time, total I/O time.
  • 19-21. Motivations & Goal
      Motivations.
      • Because cardinality estimates are inaccurate, the optimizer may apply the wrong rules.
      • Hints can specify which rules to use.
      Relationship with Bao.
      • Can Bao be applied directly to Scope? Hint → rule; hint set → rule configuration.
      • However:
          • There are far more rules (200+ vs. 6), so there are too many possible rule configurations.
          • The workload is large: long running times and plans with hundreds of operator nodes.
      Goal.
      • Output an alternative rule configuration that is better for optimizing this particular job under a given metric.
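A back-of-the-envelope count shows why reusing Bao directly breaks down, under the simplifying assumption that every rule can be toggled independently (in practice required rules stay on and some combinations are infeasible):

```latex
\underbrace{2^{6} = 64}_{\text{Bao: 6 hints}}
\qquad \text{vs.} \qquad
\underbrace{2^{200} \approx 1.6 \times 10^{60}}_{\text{Scope: 200+ toggleable rules}}
```

Exhaustive enumeration is out of reach at that scale, which is why the paper samples rule configurations instead of enumerating them.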
  • 22. Rule Signature & Job Span
      Rule signature.
      • The rule signature is a bit vector specifying which rules directly contribute to the final query plan produced by the optimizer.
      • The default rule signature is the rule signature of a query optimized under the default rule configuration.
      Job span.
      • The span of a job contains all non-required rules that, if enabled or disabled, can affect the final query plan.
      • The span is generated with heuristics.
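A small sketch of the two definitions above. The bit-vector encoding follows the slide; the span heuristic here (flip one non-required rule at a time and keep it if the rule signature changes) is a hypothetical illustration, not the paper's heuristic, and `compile_signature` stands in for a real optimizer call.

```python
def to_bitvector(rule_ids):
    """Encode a set of rule ids as an int bit vector (bit i set = rule i used)."""
    bits = 0
    for r in rule_ids:
        bits |= 1 << r
    return bits


def probe_span(job, default_config, non_required_rules, compile_signature):
    """Hypothetical span heuristic: toggle each non-required rule on its own
    and keep it if the resulting rule signature differs from the default."""
    default_sig = compile_signature(job, default_config)
    span = set()
    for r in non_required_rules:
        config = dict(default_config)
        config[r] = not config[r]  # flip only this rule
        if compile_signature(job, config) != default_sig:
            span.add(r)
    return span


# Toy usage: rule 0 is required; this job can only ever fire rules 0, 1 and 3.
JOB_RULES = {0, 1, 3}


def toy_compile_signature(job_rules, config):
    """Stand-in for the optimizer: the rules that fire are the job's enabled rules."""
    return to_bitvector(r for r in job_rules if config[r])


default = {0: True, 1: True, 2: True, 3: True, 4: False}
default_sig = toy_compile_signature(JOB_RULES, default)  # bits 0, 1, 3 set
span = probe_span(JOB_RULES, default, [1, 2, 3, 4], toy_compile_signature)
```

One-at-a-time probing misses rules that only matter in combination, one reason a real span heuristic may need to be smarter than this.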
  • 23. Which rules to try? Randomized configuration search (M = 1000):
      • Enable all the rules that are not in the span of the given job.
      • For each rule category, independently sample a subset of rules from the job span. Disable these rules and enable all others; this gives a new rule configuration.
      • If the rule configuration has not been seen before, add it to the candidate list.
      • Repeat until M configurations are generated.
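The randomized search above can be sketched as follows. Each configuration is represented by the frozenset of span rules it disables, so every other rule (including everything outside the span) stays enabled. The subset-size distribution and the `max_tries` guard are assumptions: the guard stops the loop when the span is too small to yield M distinct configurations.

```python
import random


def random_config_search(span_by_category, m=1000, max_tries=100_000, seed=0):
    """Randomized configuration search.

    span_by_category: {category: [rule ids from this job's span]}
    Returns up to m distinct configurations, each a frozenset of disabled rules;
    all rules not in the set are enabled.
    """
    rng = random.Random(seed)
    seen = set()
    candidates = []
    for _ in range(max_tries):
        if len(candidates) >= m:
            break
        disabled = set()
        for rules in span_by_category.values():
            # Independently sample a subset of this category's span rules.
            k = rng.randint(0, len(rules))
            disabled.update(rng.sample(rules, k))
        config = frozenset(disabled)
        if config not in seen:  # keep only configurations not seen before
            seen.add(config)
            candidates.append(config)
    return candidates


# Usage with a hypothetical span of three rules: only 2^3 = 8 distinct
# configurations exist, so the search stops at 8 even though m = 1000.
toy_span = {"implementation": [11, 12], "rewrite": [40]}
configs = random_config_search(toy_span, m=1000)
```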
  • 24. Which jobs to try? Choose jobs & configurations to execute.
      Select jobs.
      • Jobs whose recompiled plans have clearly lower costs under the default cost model.
      • Jobs with low cost but high runtime under the default configuration (i.e., the cost model is wrong).
      Select configurations.
      • Execute the 10 cheapest alternative rule configurations (by the cost model).
      Figure: results on Workload B, compared to the default configuration.
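A sketch of the two selection steps on the "Which jobs to try?" slide. The job criteria mirror the slide, but the numeric thresholds (`cost_margin`, `runtime_per_cost_limit`) and the record fields are hypothetical; the configuration step simply keeps the 10 candidates with the lowest optimizer-estimated cost.

```python
import heapq


def select_jobs(jobs, cost_margin=0.8, runtime_per_cost_limit=10.0):
    """jobs: dicts with 'id', 'default_cost', 'best_alt_cost', 'default_runtime'.
    Thresholds are illustrative assumptions, not values from the paper."""
    selected = []
    for job in jobs:
        # Criterion 1: a recompiled plan is clearly cheaper under the default cost model.
        clearly_cheaper = job["best_alt_cost"] < cost_margin * job["default_cost"]
        # Criterion 2: low estimated cost but high runtime, i.e., the cost model looks wrong.
        cost_model_off = job["default_runtime"] / job["default_cost"] > runtime_per_cost_limit
        if clearly_cheaper or cost_model_off:
            selected.append(job["id"])
    return selected


def pick_configs(candidates, estimated_cost, k=10):
    """Keep the k candidate configurations with the lowest estimated cost."""
    return heapq.nsmallest(k, candidates, key=estimated_cost)


# Usage with made-up numbers.
jobs = [
    {"id": "a", "default_cost": 100, "best_alt_cost": 50, "default_runtime": 200},
    {"id": "b", "default_cost": 10, "best_alt_cost": 10, "default_runtime": 500},
    {"id": "c", "default_cost": 100, "best_alt_cost": 95, "default_runtime": 300},
]
chosen_jobs = select_jobs(jobs)
chosen_configs = pick_configs(["x", "y", "z"], {"x": 3, "y": 1, "z": 2}.get, k=2)
```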
  • 26-29. Different metrics
      • When one metric improves, other metrics sometimes regress.
      • Not all metrics can be improved together; one option is to adopt a separate model for each metric.
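The trade-off between metrics can be checked per job by comparing each alternative configuration with the default one metric at a time, as in this sketch. The metric names follow the Scope workload slide; the data layout and the "best per metric" policy are assumptions meant to illustrate the idea of one choice (or model) per metric.

```python
METRICS = ("runtime", "cpu_time", "io_time")


def compare_to_default(default, alternative):
    """Relative change per metric: negative = improvement, positive = regression."""
    return {m: (alternative[m] - default[m]) / default[m] for m in METRICS}


def best_config_per_metric(default, alternatives):
    """Pick the best alternative separately for each metric,
    or None when nothing beats the default on that metric."""
    best = {}
    for m in METRICS:
        name, stats = min(alternatives.items(), key=lambda kv: kv[1][m])
        best[m] = name if stats[m] < default[m] else None
    return best


# Usage with made-up measurements: A wins on runtime but regresses CPU time.
default = {"runtime": 100, "cpu_time": 50, "io_time": 20}
alternatives = {
    "A": {"runtime": 80, "cpu_time": 60, "io_time": 20},
    "B": {"runtime": 90, "cpu_time": 40, "io_time": 25},
}
delta_a = compare_to_default(default, alternatives["A"])
winners = best_config_per_metric(default, alternatives)
```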
  • 30. Extrapolating to other jobs
      Idea.
      • Use the rule signature as the level of granularity across which the same set of rule configurations could be useful.
      • Rule-signature job group: the set of jobs whose default rule signatures map to the same bit vector.
      Methods.
      • Case 1: simply apply a previously seen rule configuration.
      • Case 2: find a set of interesting configurations for each job group and adopt a model to choose one at compile time.
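Grouping jobs by default rule signature, and Case 1 (reusing a previously found configuration), can be sketched with a plain dictionary keyed by the signature bit vector. The function names and the `known_best` lookup table are hypothetical.

```python
from collections import defaultdict


def group_by_signature(jobs):
    """jobs: iterable of (job_id, default_signature), signature as an int bit vector."""
    groups = defaultdict(list)
    for job_id, sig in jobs:
        groups[sig].append(job_id)
    return dict(groups)


def config_for_new_job(signature, known_best):
    """Case 1: reuse the configuration previously found for this signature;
    None means no match, so the job keeps the default configuration."""
    return known_best.get(signature)


# Usage: j1 and j3 share a default rule signature, so they form one job group.
groups = group_by_signature([("j1", 0b101), ("j2", 0b011), ("j3", 0b101)])
reused = config_for_new_job(0b101, {0b101: "cfgA"})
```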
  • 31. Learning Rule Configurations
      Training set.
      • Select S rule signatures from the workload.
      • Collect the jobs whose default rule signatures map to these rule signatures.
      • Obtain K candidate configurations for each job group.
      • Sample M jobs from all the jobs mapping to these job groups.
      • Execute each of the K configurations for every sampled job, yielding (Job, RuleConf, Running Time) samples.
      Learning problem.
      • Treat the samples in each job group as an independent learning problem.
      • The goal is to select one of the K candidate configurations for a given query.
      • Use supervised learning to estimate the running time of a query under a configuration.
      Featurization.
      • Job-level features, e.g., input cardinality, hash of the template.
      • Rule configuration features, e.g., plan cost, the RuleDiff bit vector.
      • Query graph features, e.g., operator IDs and costs.
      Learned models.
      • For each job group, a fully connected neural network with one hidden layer of size 1024.
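A pure-Python sketch of the per-group model's shape and how it would be used at compile time: one fully connected hidden layer of 1024 units (ReLU is an assumption) predicts running time from job features concatenated with configuration features, and the candidate with the lowest prediction is chosen. The weights are random and untrained here; in practice they would be fit on the group's (Job, RuleConf, Running Time) samples. All names and feature values are hypothetical.

```python
import random


class OneHiddenLayerRegressor:
    """Fully connected network: features -> 1024 ReLU units -> predicted running time."""

    def __init__(self, num_features, hidden=1024, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0.0, 1.0 / num_features ** 0.5) for _ in range(num_features)]
                   for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [rng.gauss(0.0, 1.0 / hidden ** 0.5) for _ in range(hidden)]
        self.b2 = 0.0

    def predict(self, features):
        hidden = [max(0.0, sum(w * x for w, x in zip(row, features)) + b)
                  for row, b in zip(self.w1, self.b1)]
        return sum(w * h for w, h in zip(self.w2, hidden)) + self.b2


def choose_configuration(model, job_features, candidate_configs, featurize_config):
    """Score each of the K candidates for this job; return the lowest predicted time."""
    return min(candidate_configs,
               key=lambda c: model.predict(job_features + featurize_config(c)))


# Usage: 2 job features + 2 configuration features = 4 inputs (all hypothetical).
model = OneHiddenLayerRegressor(num_features=4)  # untrained, 1024 hidden units
job_features = [3.5, 0.2]
candidates = {"conf_a": [1.0, 0.0], "conf_b": [0.0, 1.0]}
picked = choose_configuration(model, job_features, list(candidates), candidates.get)
```

Training one small network per job group keeps each learning problem narrow, at the cost of needing enough executed samples in every group.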
  • 32. Learning Rule Configurations
  • 33. Discussion
      Summary.
      • Problem: how to choose the right rule configuration for an incoming query.
      • Proposes the rule signature, the job span, and several heuristic algorithms to obtain candidate rule configurations.
      • Adopts a learned model to choose the rule configuration for each job group.
      Future work.
      • Better methods to generate the job span and interesting rule configurations.
      • Use feedback from execution results to guide future iterations of the configuration search.
      • Other configurable options in Scope.
      Discussion points.
      • Papers.
      • Methods.
      • A model for each group.
      • Parameters.
  • 34. Q & A