SlideShare a Scribd company logo
Tuning ML Models:
Scaling, Workflows,
and Architecture
Joseph Bradley
Solutions Architect
About me
▪ Solutions Architect at Databricks (½ a year)
▪ Software Engineer at Databricks (5 ½ years)
▪ Apache Spark committer and PMC member
Global company with over 5,000 customers and 450+ partners
Original creators of popular data and machine learning open source projects
A unified data analytics platform for accelerating innovation across
data engineering, data science, and analytics
Hyperparameter
Model 1: “Dress”
Model 2: “Sneaker”
Why the difference?
Model 2 had better hyperparameter settings:
• Learning rate
• Model structure
• ...
What’s a hyperparameter?
• Statistical: assumptions about your
model/data
• Practical: inputs your ML library does not
learn from data
• Algorithmic: problem-dependent configs
Hyperparameter tuning
Expert knowledge
Not in this talk:
• Statistical best practices
• Overview of methods for tuning
In this talk:
• Data Science workflow best practices
• Tips for the big data and cloud computing space
Black-box tuning Ignore until needed
à See references at end!
Hyperparameter tuning is tough
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
1 2 3 4 5 6 7 8 9 10
# hyperparameters
(7 possible values each)
Fractioncoverage
Non-convex optimization
Curse of dimensionality
è Computational cost
Unintuitive hyperparameters
• Regularization
• Neural net structure
• and many more...
In this talk
Architectural patterns
Workflows
Tips and details
Getting started
In this talk
Architectural patterns
Workflows
Tips and details
Getting started
Single-machine vs. distributed training
• Single-machine training
• Distributed training
• Training 1 model per group (per customer, product, etc.)
Single-machine training
Tuning
Scale out via distributed
hyperparameter tuning
à Train 1 model per Spark task
Driver
Worker WorkerWorker
Tuning
Tools for this:
• Hyperopt + SparkTrials
• sklearn + joblibspark
• Pandas UDFs
Distributed training
Driver
Worker WorkerWorker
Training
Tuning
ML
ML
Scale out via parallel
tuning
à Train 1 model at a time, or
train 2+ models in parallel
Tools for this:
• Apache Spark ML
• Hyperopt
Training one model per group
Driver
Worker WorkerWorker
Distribute
over groups
Tuning Tuning Tuning
Scale out by distributing
over groups
à Train 1 group’s model per
Spark task
Tools for this:
• Apache Spark, Pandas UDFs
• Hyperopt
In this talk
Architectural patterns
Workflows
Tips and details
Getting started
Getting started
Start small
• Bias towards smaller models, fewer
iterations, etc.
• Small is cheaper, and it may suffice.
• Regardless, it gives a baseline.
Think before you tune
• Separate train/val/test sets.
• Use early stopping or smart tuning
wherever possible.
• Pick hyperparameters carefully.
• Pick ranges carefully.
Models vs. pipelines
Best practice: Set up full pipeline before tuning.
• At what point does the pipeline compute the metric you care about?
Tuning models vs. tuning pipelines
• Tuning featurization
Optimizing tuning for pipelines
• Cache intermediate results
Evaluating and iterating
Validation data and metrics
• Record many metrics on both training and validation data.
Tuning hyperparameters independently vs. jointly
• Using smarter hyperparameter search algorithms
Tracking and reproducibility
• Data, code, params, metrics, models, metadata
• Tip: Parametrize code to facilitate tracking
(Demo)
In this talk
Architectural patterns
Workflows
Tips and details
Getting started
Handling code
Getting code to workers
• Generally simple: Pandas UDFs or
integrations (Hyperopt, etc.)
• Debugging code serialization
• Errors are often in worker logs and
look like ML library bugs (e.g., “no
module named X...”)
• Tip: For Python, import libraries
within closures
Passing configs and credentials
• E.g., MLflow active runs and
credentials
Helpful resource:
Distributed Hyperopt best practices
and troubleshooting
Moving data
Single-machine ML
• Broadcast
• Load from blob storage
• Caching data
Distributed ML
• Caching data
Blob storage data prep
• Delta Lake format
• Petastorm and TFRecords
Helpful resources:
• Distributed Hyperopt best practices
and troubleshooting
• Prepare data for distributed
training
Configuring clusters
Single-machine ML
• Sharing machine resources
• Selecting machine types
Distributed ML
• Right-sizing clusters
• Sharing a cluster
Helpful resource:
Scaling Hyperopt to Tune Machine
Learning Models in Python
In this talk
Architectural patterns
Workflows
Tips and details
Getting started
Tools to know about
Apache Spark CrossValidator, TrainValidationSplit, Pandas UDFs
MLflow Tracking, Auto-logging
ML in Python Hyperopt SparkTrials & more (See last year’s SAIS talk)
Scikit-learn sklearn.model_selection, skopt, joblib+Spark
TensorFlow HParams, Keras Tuner, Model Optimization Toolkit
PyTorch Ax
Resources
Blog posts + example notebooks
• Hyperparameter Tuning with MLflow, Apache Spark MLlib and Hyperopt
Talks
• Best Practices for Hyperparameter Tuning with MLflow (SAIS 2019)
• Advanced Hyperparameter Optimization for Deep Learning with MLflow (SAIS 2019)
Project pages + docs
• Hyperopt docs and Github
• MLflow homepage and Github
Slides: tinyurl.com/sais2020-joseph
Notebook: tinyurl.com/sais2020-joseph-demo
Thanks!
Architectural patterns
Workflows
Tips and details
Getting started
Any questions?
Ad

More Related Content

What's hot (20)

The Feature Store in Hopsworks
The Feature Store in HopsworksThe Feature Store in Hopsworks
The Feature Store in Hopsworks
Jim Dowling
 
Scaling Machine Learning with Apache Spark
Scaling Machine Learning with Apache SparkScaling Machine Learning with Apache Spark
Scaling Machine Learning with Apache Spark
Databricks
 
Data Source API in Spark
Data Source API in SparkData Source API in Spark
Data Source API in Spark
Databricks
 
Apache Kafka With Spark Structured Streaming With Emma Liu, Nitin Saksena, Ra...
Apache Kafka With Spark Structured Streaming With Emma Liu, Nitin Saksena, Ra...Apache Kafka With Spark Structured Streaming With Emma Liu, Nitin Saksena, Ra...
Apache Kafka With Spark Structured Streaming With Emma Liu, Nitin Saksena, Ra...
HostedbyConfluent
 
Productionzing ML Model Using MLflow Model Serving
Productionzing ML Model Using MLflow Model ServingProductionzing ML Model Using MLflow Model Serving
Productionzing ML Model Using MLflow Model Serving
Databricks
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)
James Serra
 
Beyond the Spotify Model - Team Topologies - Tech.rocks - 2020-12-10 - Matthe...
Beyond the Spotify Model - Team Topologies - Tech.rocks - 2020-12-10 - Matthe...Beyond the Spotify Model - Team Topologies - Tech.rocks - 2020-12-10 - Matthe...
Beyond the Spotify Model - Team Topologies - Tech.rocks - 2020-12-10 - Matthe...
Matthew Skelton
 
Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh Architecture
Databricks
 
Apache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals ExplainedApache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals Explained
confluent
 
Serverless Kafka and Spark in a Multi-Cloud Lakehouse Architecture
Serverless Kafka and Spark in a Multi-Cloud Lakehouse ArchitectureServerless Kafka and Spark in a Multi-Cloud Lakehouse Architecture
Serverless Kafka and Spark in a Multi-Cloud Lakehouse Architecture
Kai Wähner
 
Lessons Learned: Understanding Pipeline Pricing in Azure Data Factory and Azu...
Lessons Learned: Understanding Pipeline Pricing in Azure Data Factory and Azu...Lessons Learned: Understanding Pipeline Pricing in Azure Data Factory and Azu...
Lessons Learned: Understanding Pipeline Pricing in Azure Data Factory and Azu...
Cathrine Wilhelmsen
 
Introduction to Dremio
Introduction to DremioIntroduction to Dremio
Introduction to Dremio
Dremio Corporation
 
Efficient Data Formats for Analytics with Parquet and Arrow
Efficient Data Formats for Analytics with Parquet and ArrowEfficient Data Formats for Analytics with Parquet and Arrow
Efficient Data Formats for Analytics with Parquet and Arrow
DataWorks Summit/Hadoop Summit
 
Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta Lake
Databricks
 
Dive into PySpark
Dive into PySparkDive into PySpark
Dive into PySpark
Mateusz Buśkiewicz
 
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and PitfallsRunning Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
Databricks
 
Introduction to Knowledge Graphs and Semantic AI
Introduction to Knowledge Graphs and Semantic AIIntroduction to Knowledge Graphs and Semantic AI
Introduction to Knowledge Graphs and Semantic AI
Semantic Web Company
 
Explore your prometheus data in grafana - Promcon 2018
Explore your prometheus data in grafana - Promcon 2018Explore your prometheus data in grafana - Promcon 2018
Explore your prometheus data in grafana - Promcon 2018
Grafana Labs
 
MLOps Using MLflow
MLOps Using MLflowMLOps Using MLflow
MLOps Using MLflow
Databricks
 
Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache iceberg
Alluxio, Inc.
 
The Feature Store in Hopsworks
The Feature Store in HopsworksThe Feature Store in Hopsworks
The Feature Store in Hopsworks
Jim Dowling
 
Scaling Machine Learning with Apache Spark
Scaling Machine Learning with Apache SparkScaling Machine Learning with Apache Spark
Scaling Machine Learning with Apache Spark
Databricks
 
Data Source API in Spark
Data Source API in SparkData Source API in Spark
Data Source API in Spark
Databricks
 
Apache Kafka With Spark Structured Streaming With Emma Liu, Nitin Saksena, Ra...
Apache Kafka With Spark Structured Streaming With Emma Liu, Nitin Saksena, Ra...Apache Kafka With Spark Structured Streaming With Emma Liu, Nitin Saksena, Ra...
Apache Kafka With Spark Structured Streaming With Emma Liu, Nitin Saksena, Ra...
HostedbyConfluent
 
Productionzing ML Model Using MLflow Model Serving
Productionzing ML Model Using MLflow Model ServingProductionzing ML Model Using MLflow Model Serving
Productionzing ML Model Using MLflow Model Serving
Databricks
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)
James Serra
 
Beyond the Spotify Model - Team Topologies - Tech.rocks - 2020-12-10 - Matthe...
Beyond the Spotify Model - Team Topologies - Tech.rocks - 2020-12-10 - Matthe...Beyond the Spotify Model - Team Topologies - Tech.rocks - 2020-12-10 - Matthe...
Beyond the Spotify Model - Team Topologies - Tech.rocks - 2020-12-10 - Matthe...
Matthew Skelton
 
Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh Architecture
Databricks
 
Apache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals ExplainedApache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals Explained
confluent
 
Serverless Kafka and Spark in a Multi-Cloud Lakehouse Architecture
Serverless Kafka and Spark in a Multi-Cloud Lakehouse ArchitectureServerless Kafka and Spark in a Multi-Cloud Lakehouse Architecture
Serverless Kafka and Spark in a Multi-Cloud Lakehouse Architecture
Kai Wähner
 
Lessons Learned: Understanding Pipeline Pricing in Azure Data Factory and Azu...
Lessons Learned: Understanding Pipeline Pricing in Azure Data Factory and Azu...Lessons Learned: Understanding Pipeline Pricing in Azure Data Factory and Azu...
Lessons Learned: Understanding Pipeline Pricing in Azure Data Factory and Azu...
Cathrine Wilhelmsen
 
Efficient Data Formats for Analytics with Parquet and Arrow
Efficient Data Formats for Analytics with Parquet and ArrowEfficient Data Formats for Analytics with Parquet and Arrow
Efficient Data Formats for Analytics with Parquet and Arrow
DataWorks Summit/Hadoop Summit
 
Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta Lake
Databricks
 
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and PitfallsRunning Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
Databricks
 
Introduction to Knowledge Graphs and Semantic AI
Introduction to Knowledge Graphs and Semantic AIIntroduction to Knowledge Graphs and Semantic AI
Introduction to Knowledge Graphs and Semantic AI
Semantic Web Company
 
Explore your prometheus data in grafana - Promcon 2018
Explore your prometheus data in grafana - Promcon 2018Explore your prometheus data in grafana - Promcon 2018
Explore your prometheus data in grafana - Promcon 2018
Grafana Labs
 
MLOps Using MLflow
MLOps Using MLflowMLOps Using MLflow
MLOps Using MLflow
Databricks
 
Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache iceberg
Alluxio, Inc.
 

Similar to Tuning ML Models: Scaling, Workflows, and Architecture (20)

Ideas spracklen-final
Ideas spracklen-finalIdeas spracklen-final
Ideas spracklen-final
supportlogic
 
Apache Spark MLlib
Apache Spark MLlib Apache Spark MLlib
Apache Spark MLlib
Zahra Eskandari
 
Best Practices for Hyperparameter Tuning with MLflow
Best Practices for Hyperparameter Tuning with MLflowBest Practices for Hyperparameter Tuning with MLflow
Best Practices for Hyperparameter Tuning with MLflow
Databricks
 
MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
 MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ... MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
Databricks
 
Combining Machine Learning frameworks with Apache Spark
Combining Machine Learning frameworks with Apache SparkCombining Machine Learning frameworks with Apache Spark
Combining Machine Learning frameworks with Apache Spark
DataWorks Summit/Hadoop Summit
 
Open, Secure & Transparent AI Pipelines
Open, Secure & Transparent AI PipelinesOpen, Secure & Transparent AI Pipelines
Open, Secure & Transparent AI Pipelines
Nick Pentreath
 
Combining Machine Learning Frameworks with Apache Spark
Combining Machine Learning Frameworks with Apache SparkCombining Machine Learning Frameworks with Apache Spark
Combining Machine Learning Frameworks with Apache Spark
Databricks
 
Python for Machine Learning_ A Comprehensive Overview.pptx
Python for Machine Learning_ A Comprehensive Overview.pptxPython for Machine Learning_ A Comprehensive Overview.pptx
Python for Machine Learning_ A Comprehensive Overview.pptx
KuldeepSinghBrar3
 
Apache Spark's MLlib's Past Trajectory and new Directions
Apache Spark's MLlib's Past Trajectory and new DirectionsApache Spark's MLlib's Past Trajectory and new Directions
Apache Spark's MLlib's Past Trajectory and new Directions
Databricks
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
DataWorks Summit
 
Deep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best PracticesDeep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best Practices
Jen Aman
 
Deep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best PracticesDeep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best Practices
Databricks
 
Deep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best PracticesDeep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best Practices
Jen Aman
 
Apache Spark MLlib's Past Trajectory and New Directions with Joseph Bradley
Apache Spark MLlib's Past Trajectory and New Directions with Joseph BradleyApache Spark MLlib's Past Trajectory and New Directions with Joseph Bradley
Apache Spark MLlib's Past Trajectory and New Directions with Joseph Bradley
Databricks
 
Automated Hyperparameter Tuning, Scaling and Tracking
Automated Hyperparameter Tuning, Scaling and TrackingAutomated Hyperparameter Tuning, Scaling and Tracking
Automated Hyperparameter Tuning, Scaling and Tracking
Databricks
 
Scalable Ensemble Machine Learning @ Harvard Health Policy Data Science Lab
Scalable Ensemble Machine Learning @ Harvard Health Policy Data Science LabScalable Ensemble Machine Learning @ Harvard Health Policy Data Science Lab
Scalable Ensemble Machine Learning @ Harvard Health Policy Data Science Lab
Sri Ambati
 
TensorFlow 101
TensorFlow 101TensorFlow 101
TensorFlow 101
Raghu Rajah
 
Leveraging NLP and Deep Learning for Document Recommendations in the Cloud
Leveraging NLP and Deep Learning for Document Recommendations in the CloudLeveraging NLP and Deep Learning for Document Recommendations in the Cloud
Leveraging NLP and Deep Learning for Document Recommendations in the Cloud
Databricks
 
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
Sonya Liberman
 
Pythonn-machine-learning-with-python.ppt
Pythonn-machine-learning-with-python.pptPythonn-machine-learning-with-python.ppt
Pythonn-machine-learning-with-python.ppt
drakesean662
 
Ideas spracklen-final
Ideas spracklen-finalIdeas spracklen-final
Ideas spracklen-final
supportlogic
 
Best Practices for Hyperparameter Tuning with MLflow
Best Practices for Hyperparameter Tuning with MLflowBest Practices for Hyperparameter Tuning with MLflow
Best Practices for Hyperparameter Tuning with MLflow
Databricks
 
MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
 MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ... MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
Databricks
 
Combining Machine Learning frameworks with Apache Spark
Combining Machine Learning frameworks with Apache SparkCombining Machine Learning frameworks with Apache Spark
Combining Machine Learning frameworks with Apache Spark
DataWorks Summit/Hadoop Summit
 
Open, Secure & Transparent AI Pipelines
Open, Secure & Transparent AI PipelinesOpen, Secure & Transparent AI Pipelines
Open, Secure & Transparent AI Pipelines
Nick Pentreath
 
Combining Machine Learning Frameworks with Apache Spark
Combining Machine Learning Frameworks with Apache SparkCombining Machine Learning Frameworks with Apache Spark
Combining Machine Learning Frameworks with Apache Spark
Databricks
 
Python for Machine Learning_ A Comprehensive Overview.pptx
Python for Machine Learning_ A Comprehensive Overview.pptxPython for Machine Learning_ A Comprehensive Overview.pptx
Python for Machine Learning_ A Comprehensive Overview.pptx
KuldeepSinghBrar3
 
Apache Spark's MLlib's Past Trajectory and new Directions
Apache Spark's MLlib's Past Trajectory and new DirectionsApache Spark's MLlib's Past Trajectory and new Directions
Apache Spark's MLlib's Past Trajectory and new Directions
Databricks
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
DataWorks Summit
 
Deep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best PracticesDeep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best Practices
Jen Aman
 
Deep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best PracticesDeep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best Practices
Databricks
 
Deep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best PracticesDeep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best Practices
Jen Aman
 
Apache Spark MLlib's Past Trajectory and New Directions with Joseph Bradley
Apache Spark MLlib's Past Trajectory and New Directions with Joseph BradleyApache Spark MLlib's Past Trajectory and New Directions with Joseph Bradley
Apache Spark MLlib's Past Trajectory and New Directions with Joseph Bradley
Databricks
 
Automated Hyperparameter Tuning, Scaling and Tracking
Automated Hyperparameter Tuning, Scaling and TrackingAutomated Hyperparameter Tuning, Scaling and Tracking
Automated Hyperparameter Tuning, Scaling and Tracking
Databricks
 
Scalable Ensemble Machine Learning @ Harvard Health Policy Data Science Lab
Scalable Ensemble Machine Learning @ Harvard Health Policy Data Science LabScalable Ensemble Machine Learning @ Harvard Health Policy Data Science Lab
Scalable Ensemble Machine Learning @ Harvard Health Policy Data Science Lab
Sri Ambati
 
Leveraging NLP and Deep Learning for Document Recommendations in the Cloud
Leveraging NLP and Deep Learning for Document Recommendations in the CloudLeveraging NLP and Deep Learning for Document Recommendations in the Cloud
Leveraging NLP and Deep Learning for Document Recommendations in the Cloud
Databricks
 
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
Sonya Liberman
 
Pythonn-machine-learning-with-python.ppt
Pythonn-machine-learning-with-python.pptPythonn-machine-learning-with-python.ppt
Pythonn-machine-learning-with-python.ppt
drakesean662
 
Ad

More from Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
Databricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
Databricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
Databricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
Databricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
Databricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
Databricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Databricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
Databricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Databricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
Databricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Databricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
Databricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
Databricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
Databricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
Databricks
 
DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
Databricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
Databricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
Databricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
Databricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
Databricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
Databricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Databricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
Databricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Databricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
Databricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Databricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
Databricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
Databricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
Databricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
Databricks
 
Ad

Recently uploaded (20)

Fundamentals of Data Analysis, its types, tools, algorithms
Fundamentals of Data Analysis, its types, tools, algorithmsFundamentals of Data Analysis, its types, tools, algorithms
Fundamentals of Data Analysis, its types, tools, algorithms
priyaiyerkbcsc
 
Agricultural_regionalisation_in_India(Final).pptx
Agricultural_regionalisation_in_India(Final).pptxAgricultural_regionalisation_in_India(Final).pptx
Agricultural_regionalisation_in_India(Final).pptx
mostafaahammed38
 
Process Mining as Enabler for Digital Transformations
Process Mining as Enabler for Digital TransformationsProcess Mining as Enabler for Digital Transformations
Process Mining as Enabler for Digital Transformations
Process mining Evangelist
 
Time series for yotube_1_data anlysis.pdf
Time series for yotube_1_data anlysis.pdfTime series for yotube_1_data anlysis.pdf
Time series for yotube_1_data anlysis.pdf
asmaamahmoudsaeed
 
Dynamics 365 Business Rules Dynamics Dynamics
Dynamics 365 Business Rules Dynamics DynamicsDynamics 365 Business Rules Dynamics Dynamics
Dynamics 365 Business Rules Dynamics Dynamics
heyoubro69
 
RAG Chatbot using AWS Bedrock and Streamlit Framework
RAG Chatbot using AWS Bedrock and Streamlit FrameworkRAG Chatbot using AWS Bedrock and Streamlit Framework
RAG Chatbot using AWS Bedrock and Streamlit Framework
apanneer
 
Analysis of Billboards hot 100 toop five hit makers on the chart.docx
Analysis of Billboards hot 100 toop five hit makers on the chart.docxAnalysis of Billboards hot 100 toop five hit makers on the chart.docx
Analysis of Billboards hot 100 toop five hit makers on the chart.docx
hershtara1
 
录取通知书加拿大TMU毕业证多伦多都会大学电子版毕业证成绩单
录取通知书加拿大TMU毕业证多伦多都会大学电子版毕业证成绩单录取通知书加拿大TMU毕业证多伦多都会大学电子版毕业证成绩单
录取通知书加拿大TMU毕业证多伦多都会大学电子版毕业证成绩单
Taqyea
 
文凭证书美国SDSU文凭圣地亚哥州立大学学生证学历认证查询
文凭证书美国SDSU文凭圣地亚哥州立大学学生证学历认证查询文凭证书美国SDSU文凭圣地亚哥州立大学学生证学历认证查询
文凭证书美国SDSU文凭圣地亚哥州立大学学生证学历认证查询
Taqyea
 
L1_Slides_Foundational Concepts_508.pptx
L1_Slides_Foundational Concepts_508.pptxL1_Slides_Foundational Concepts_508.pptx
L1_Slides_Foundational Concepts_508.pptx
38NoopurPatel
 
How to Set Up Process Mining in a Decentralized Organization?
How to Set Up Process Mining in a Decentralized Organization?How to Set Up Process Mining in a Decentralized Organization?
How to Set Up Process Mining in a Decentralized Organization?
Process mining Evangelist
 
How to regulate and control your it-outsourcing provider with process mining
How to regulate and control your it-outsourcing provider with process miningHow to regulate and control your it-outsourcing provider with process mining
How to regulate and control your it-outsourcing provider with process mining
Process mining Evangelist
 
Chapter 6-3 Introducingthe Concepts .pptx
Chapter 6-3 Introducingthe Concepts .pptxChapter 6-3 Introducingthe Concepts .pptx
Chapter 6-3 Introducingthe Concepts .pptx
PermissionTafadzwaCh
 
Transforming health care with ai powered
Transforming health care with ai poweredTransforming health care with ai powered
Transforming health care with ai powered
gowthamarvj
 
新西兰文凭奥克兰理工大学毕业证书AUT成绩单补办
新西兰文凭奥克兰理工大学毕业证书AUT成绩单补办新西兰文凭奥克兰理工大学毕业证书AUT成绩单补办
新西兰文凭奥克兰理工大学毕业证书AUT成绩单补办
Taqyea
 
Ann Naser Nabil- Data Scientist Portfolio.pdf
Ann Naser Nabil- Data Scientist Portfolio.pdfAnn Naser Nabil- Data Scientist Portfolio.pdf
Ann Naser Nabil- Data Scientist Portfolio.pdf
আন্ নাসের নাবিল
 
indonesia-gen-z-report-2024 Gen Z (born between 1997 and 2012) is currently t...
indonesia-gen-z-report-2024 Gen Z (born between 1997 and 2012) is currently t...indonesia-gen-z-report-2024 Gen Z (born between 1997 and 2012) is currently t...
indonesia-gen-z-report-2024 Gen Z (born between 1997 and 2012) is currently t...
disnakertransjabarda
 
Multi-tenant Data Pipeline Orchestration
Multi-tenant Data Pipeline OrchestrationMulti-tenant Data Pipeline Orchestration
Multi-tenant Data Pipeline Orchestration
Romi Kuntsman
 
problem solving.presentation slideshow bsc nursing
problem solving.presentation slideshow bsc nursingproblem solving.presentation slideshow bsc nursing
problem solving.presentation slideshow bsc nursing
vishnudathas123
 
Feature Engineering for Electronic Health Record Systems
Feature Engineering for Electronic Health Record SystemsFeature Engineering for Electronic Health Record Systems
Feature Engineering for Electronic Health Record Systems
Process mining Evangelist
 
Fundamentals of Data Analysis, its types, tools, algorithms
Fundamentals of Data Analysis, its types, tools, algorithmsFundamentals of Data Analysis, its types, tools, algorithms
Fundamentals of Data Analysis, its types, tools, algorithms
priyaiyerkbcsc
 
Agricultural_regionalisation_in_India(Final).pptx
Agricultural_regionalisation_in_India(Final).pptxAgricultural_regionalisation_in_India(Final).pptx
Agricultural_regionalisation_in_India(Final).pptx
mostafaahammed38
 
Process Mining as Enabler for Digital Transformations
Process Mining as Enabler for Digital TransformationsProcess Mining as Enabler for Digital Transformations
Process Mining as Enabler for Digital Transformations
Process mining Evangelist
 
Time series for yotube_1_data anlysis.pdf
Time series for yotube_1_data anlysis.pdfTime series for yotube_1_data anlysis.pdf
Time series for yotube_1_data anlysis.pdf
asmaamahmoudsaeed
 
Dynamics 365 Business Rules Dynamics Dynamics
Dynamics 365 Business Rules Dynamics DynamicsDynamics 365 Business Rules Dynamics Dynamics
Dynamics 365 Business Rules Dynamics Dynamics
heyoubro69
 
RAG Chatbot using AWS Bedrock and Streamlit Framework
RAG Chatbot using AWS Bedrock and Streamlit FrameworkRAG Chatbot using AWS Bedrock and Streamlit Framework
RAG Chatbot using AWS Bedrock and Streamlit Framework
apanneer
 
Analysis of Billboards hot 100 toop five hit makers on the chart.docx
Analysis of Billboards hot 100 toop five hit makers on the chart.docxAnalysis of Billboards hot 100 toop five hit makers on the chart.docx
Analysis of Billboards hot 100 toop five hit makers on the chart.docx
hershtara1
 
录取通知书加拿大TMU毕业证多伦多都会大学电子版毕业证成绩单
录取通知书加拿大TMU毕业证多伦多都会大学电子版毕业证成绩单录取通知书加拿大TMU毕业证多伦多都会大学电子版毕业证成绩单
录取通知书加拿大TMU毕业证多伦多都会大学电子版毕业证成绩单
Taqyea
 
文凭证书美国SDSU文凭圣地亚哥州立大学学生证学历认证查询
文凭证书美国SDSU文凭圣地亚哥州立大学学生证学历认证查询文凭证书美国SDSU文凭圣地亚哥州立大学学生证学历认证查询
文凭证书美国SDSU文凭圣地亚哥州立大学学生证学历认证查询
Taqyea
 
L1_Slides_Foundational Concepts_508.pptx
L1_Slides_Foundational Concepts_508.pptxL1_Slides_Foundational Concepts_508.pptx
L1_Slides_Foundational Concepts_508.pptx
38NoopurPatel
 
How to Set Up Process Mining in a Decentralized Organization?
How to Set Up Process Mining in a Decentralized Organization?How to Set Up Process Mining in a Decentralized Organization?
How to Set Up Process Mining in a Decentralized Organization?
Process mining Evangelist
 
How to regulate and control your it-outsourcing provider with process mining
How to regulate and control your it-outsourcing provider with process miningHow to regulate and control your it-outsourcing provider with process mining
How to regulate and control your it-outsourcing provider with process mining
Process mining Evangelist
 
Chapter 6-3 Introducingthe Concepts .pptx
Chapter 6-3 Introducingthe Concepts .pptxChapter 6-3 Introducingthe Concepts .pptx
Chapter 6-3 Introducingthe Concepts .pptx
PermissionTafadzwaCh
 
Transforming health care with ai powered
Transforming health care with ai poweredTransforming health care with ai powered
Transforming health care with ai powered
gowthamarvj
 
新西兰文凭奥克兰理工大学毕业证书AUT成绩单补办
新西兰文凭奥克兰理工大学毕业证书AUT成绩单补办新西兰文凭奥克兰理工大学毕业证书AUT成绩单补办
新西兰文凭奥克兰理工大学毕业证书AUT成绩单补办
Taqyea
 
indonesia-gen-z-report-2024 Gen Z (born between 1997 and 2012) is currently t...
indonesia-gen-z-report-2024 Gen Z (born between 1997 and 2012) is currently t...indonesia-gen-z-report-2024 Gen Z (born between 1997 and 2012) is currently t...
indonesia-gen-z-report-2024 Gen Z (born between 1997 and 2012) is currently t...
disnakertransjabarda
 
Multi-tenant Data Pipeline Orchestration
Multi-tenant Data Pipeline OrchestrationMulti-tenant Data Pipeline Orchestration
Multi-tenant Data Pipeline Orchestration
Romi Kuntsman
 
problem solving.presentation slideshow bsc nursing
problem solving.presentation slideshow bsc nursingproblem solving.presentation slideshow bsc nursing
problem solving.presentation slideshow bsc nursing
vishnudathas123
 
Feature Engineering for Electronic Health Record Systems
Feature Engineering for Electronic Health Record SystemsFeature Engineering for Electronic Health Record Systems
Feature Engineering for Electronic Health Record Systems
Process mining Evangelist
 

Tuning ML Models: Scaling, Workflows, and Architecture

  • 1. Tuning ML Models: Scaling, Workflows, and Architecture Joseph Bradley Solutions Architect
  • 2. About me ▪ Solutions Architect at Databricks (½ a year) ▪ Software Engineer at Databricks (5 ½ years) ▪ Apache Spark committer and PMC member
  • 3. Global company with over 5,000 customers and 450+ partners Original creators of popular data and machine learning open source projects A unified data analytics platform for accelerating innovation across data engineering, data science, and analytics
  • 4. Hyperparameter Model 1: “Dress” Model 2: “Sneaker” Why the difference? Model 2 had better hyperparameter settings: • Learning rate • Model structure • ... What’s a hyperparameter? • Statistical: assumptions about your model/data • Practical: inputs your ML library does not learn from data • Algorithmic: problem-dependent configs
  • 5. Hyperparameter tuning Expert knowledge Not in this talk: • Statistical best practices • Overview of methods for tuning In this talk: • Data Science workflow best practices • Tips for the big data and cloud computing space Black-box tuning Ignore until needed à See references at end!
  • 6. Hyperparameter tuning is tough 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 1 2 3 4 5 6 7 8 9 10 # hyperparameters (7 possible values each) Fractioncoverage Non-convex optimization Curse of dimensionality è Computational cost Unintuitive hyperparameters • Regularization • Neural net structure • and many more...
  • 7. In this talk Architectural patterns Workflows Tips and details Getting started
  • 8. In this talk Architectural patterns Workflows Tips and details Getting started
  • 9. Single-machine vs. distributed training • Single-machine training • Distributed training • Training 1 model per group (per customer, product, etc.)
  • 10. Single-machine training Tuning Scale out via distributed hyperparameter tuning à Train 1 model per Spark task Driver Worker WorkerWorker Tuning Tools for this: • Hyperopt + SparkTrials • sklearn + joblibspark • Pandas UDFs
  • 11. Distributed training Driver Worker WorkerWorker Training Tuning ML ML Scale out via parallel tuning à Train 1 model at a time, or train 2+ models in parallel Tools for this: • Apache Spark ML • Hyperopt
  • 12. Training one model per group Driver Worker WorkerWorker Distribute over groups Tuning Tuning Tuning Scale out by distributing over groups à Train 1 group’s model per Spark task Tools for this: • Apache Spark, Pandas UDFs • Hyperopt
  • 13. In this talk Architectural patterns Workflows Tips and details Getting started
  • 14. Getting started Start small • Bias towards smaller models, fewer iterations, etc. • Small is cheaper, and it may suffice. • Regardless, it gives a baseline. Think before you tune • Separate train/val/test sets. • Use early stopping or smart tuning wherever possible. • Pick hyperparameters carefully. • Pick ranges carefully.
  • 15. Models vs. pipelines Best practice: Set up full pipeline before tuning. • At what point does the pipeline compute the metric you care about? Tuning models vs. tuning pipelines • Tuning featurization Optimizing tuning for pipelines • Cache intermediate results
  • 16. Evaluating and iterating Validation data and metrics • Record many metrics on both training and validation data. Tuning hyperparameters independently vs. jointly • Using smarter hyperparameter search algorithms Tracking and reproducibility • Data, code, params, metrics, models, metadata • Tip: Parametrize code to facilitate tracking
  • 18. In this talk Architectural patterns Workflows Tips and details Getting started
  • 19. Handling code Getting code to workers • Generally simple: Pandas UDFs or integrations (Hyperopt, etc.) • Debugging code serialization • Errors are often in worker logs and look like ML library bugs (e.g., “no module named X...”) • Tip: For Python, import libraries within closures Passing configs and credentials • E.g., MLflow active runs and credentials Helpful resource: Distributed Hyperopt best practices and troubleshooting
  • 20. Moving data Single-machine ML • Broadcast • Load from blob storage • Caching data Distributed ML • Caching data Blob storage data prep • Delta Lake format • Petastorm and TFRecords Helpful resources: • Distributed Hyperopt best practices and troubleshooting • Prepare data for distributed training
  • 21. Configuring clusters Single-machine ML • Sharing machine resources • Selecting machine types Distributed ML • Right-sizing clusters • Sharing a cluster Helpful resource: Scaling Hyperopt to Tune Machine Learning Models in Python
  • 22. In this talk Architectural patterns Workflows Tips and details Getting started
  • 23. Tools to know about Apache Spark CrossValidator, TrainValidationSplit, Pandas UDFs MLflow Tracking, Auto-logging ML in Python Hyperopt SparkTrials & more (See last year’s SAIS talk) Scikit-learn sklearn.model_selection, skopt, joblib+Spark TensorFlow HParams, Keras Tuner, Model Optimization Toolkit PyTorch Ax
  • 24. Resources Blog posts + example notebooks • Hyperparameter Tuning with MLflow, Apache Spark MLlib and Hyperopt Talks • Best Practices for Hyperparameter Tuning with MLflow (SAIS 2019) • Advanced Hyperparameter Optimization for Deep Learning with MLflow (SAIS 2019) Project pages + docs • Hyperopt docs and Github • MLflow homepage and Github Slides: tinyurl.com/sais2020-joseph Notebook: tinyurl.com/sais2020-joseph-demo
  • 25. Thanks! Architectural patterns Workflows Tips and details Getting started Any questions?
  翻译: