Executive Briefing
Lessons learned managing data
science projects: Adopting a team
data science process
Managing Data Science Projects
Our strategy is to build best-in-class
platforms and productivity services for
an intelligent cloud and an
intelligent edge infused with artificial
intelligence (“AI”).
Microsoft Form 10-K 2016
Data Science
Toolbox of a Data Scientist
Do it like a Professional!
Understand the Decision Process
Tip #1
What is the business problem that
needs to be solved, independent of
the technology solution?
What is the decision or action that has to
be taken and that can be informed by
data?
Predictive Maintenance
Understanding the Decision Process
Key Decision
Should I service
this piece of
equipment?
Data Science Question
What is the probability
this equipment will fail
within the next X days?
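The link between the data science question and the key decision can be sketched in a few lines. This is only an illustration: it assumes a model has already produced a failure probability for the next X days, and it assumes a simple expected-cost rule (service when the expected cost of a failure exceeds the cost of a service visit). The function name and the cost figures are hypothetical and do not come from the deck.

```python
def should_service(p_fail: float, cost_service: float, cost_failure: float) -> bool:
    """Return True when the expected cost of a failure exceeds the service cost.

    p_fail is the model's answer to the data science question; the two costs
    are business inputs (hypothetical values in the example below).
    """
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be a probability in [0, 1]")
    return p_fail * cost_failure > cost_service


# Hypothetical numbers: 30% failure probability, $2,000 per service visit,
# $10,000 per unplanned failure.
decision = should_service(p_fail=0.30, cost_service=2_000, cost_failure=10_000)
print("Service this equipment?", decision)
```

The point of the sketch is that the probability alone is not the decision; the business costs decide where the threshold sits.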
Predictive Maintenance
Framing the Data Science Question Based on the Scenario
Business Scenario | Key Decision | Data Science Question
Energy Forecasting | Should I buy or sell energy contracts? | What will be the long/short term demand for energy in a region?
Customer Churn | Which customers should I prioritize to reduce churn? | What is the probability of churn within X days for each customer?
Personalized Marketing | What product should I offer first? | What is the probability that a customer will purchase each product?
Product Feedback | Which service/product needs attention? | What is the social media sentiment for each service/product?
Be obsessed with data
Tip #2
Being Obsessed with Data
Can only complete the process with the right data!
Bring in the people that know the data
Establish Performance Metrics
Tip #3
What is considered a
success for the
business?
How do you measure it?
1. Establish a qualitative objective
2. Translate it into a quantifiable metric
3. Quantify the improvement in the metric value that would be
useful (e.g., 10% fewer failures, a savings of $1MM/year)
4. Establish a baseline (e.g., current failure rate = 10% per year)
5. Establish how to measure the improvement in the metric with
the data science solution (e.g., 80% of the equipment
maintained based on the predictive model)
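The arithmetic behind these steps can be sketched as follows. The 10% baseline failure rate and the "10% fewer failures" target are the examples given above; reading the target as a relative reduction is an assumption, and the observed rate is a made-up value used only to show the comparison.

```python
def relative_improvement(baseline: float, observed: float) -> float:
    """Fractional reduction of a lower-is-better metric relative to its baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - observed) / baseline


baseline_rate = 0.10     # example above: current failure rate = 10% per year
target_reduction = 0.10  # example above: 10% fewer failures (assumed relative)
target_rate = baseline_rate * (1 - target_reduction)  # roughly 9% per year

observed_rate = 0.085    # hypothetical rate measured with the solution in place
improvement = relative_improvement(baseline_rate, observed_rate)
goal_met = improvement >= target_reduction

print(f"target rate: {target_rate:.3f}, improvement: {improvement:.0%}, met: {goal_met}")
```

Writing the baseline, target, and measurement down as explicit numbers before modeling starts makes it possible to tell later whether the project succeeded.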
Using Performance Metrics
Document
Success Metrics
using a template
Tips:
1. Data science team embedded within
the business
2. Allow exploring multiple problem
formulations to get to end metric goal
3. Past goal, go within set time period
4. Ensure reproducibility
Establish the E2E solution
Tip #4
1. Set up the end to end solution and
the metrics
2. Launch with a baseline/simple
model
3. Act on the recommendations of
the solution
4. Measure and iterate
Establishing an E2E solution helps with
buy-in from the business
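A baseline can be as simple as always predicting the most common outcome. The sketch below is one possible starting point, not the deck's method: the labels are invented, and accuracy is used only for brevity (a real project would use the success metric agreed with the business).

```python
from collections import Counter


def fit_majority_baseline(labels):
    """Return the most common label; the baseline 'model' always predicts it."""
    return Counter(labels).most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical labels: 1 = equipment failed, 0 = equipment did not fail.
train_labels = [0, 0, 0, 1, 0, 0, 1, 0]
test_labels = [0, 1, 0, 0]

majority = fit_majority_baseline(train_labels)
baseline_acc = accuracy(test_labels, [majority] * len(test_labels))
print(f"baseline predicts {majority}, accuracy = {baseline_acc:.2f}")
```

Any later model then has a concrete number to beat, and the end to end plumbing (data in, recommendation out, metric measured) is exercised from day one.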
Keep a Human in the Loop
Tip #5
• Empower ALL to perform like the BEST
• Automate repetitive human tasks
• Embed expert knowledge into the solution
• How to interpret the model?
• Importance of Features
• Bias in the model
• Interpreting predictions per instance
• What-if analysis
Users don’t trust black-box models
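Per-instance interpretation and what-if analysis can be illustrated with a small linear model. Everything in this sketch is hypothetical: the feature names, the weights, and the bias are invented, and the per-feature breakdown shown here only applies to models that are linear in their inputs.

```python
import math

# Hypothetical logistic-regression parameters for a failure model.
WEIGHTS = {"age_years": 0.4, "days_since_service": 0.01}
BIAS = -4.0


def predict_proba(features):
    """Failure probability from a logistic model with the weights above."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def contributions(features):
    """Per-feature contribution to the log-odds for a single instance."""
    return {name: WEIGHTS[name] * value for name, value in features.items()}


def what_if(features, name, new_value):
    """Change in predicted probability if one feature took a different value."""
    changed = dict(features)
    changed[name] = new_value
    return predict_proba(changed) - predict_proba(features)


machine = {"age_years": 5.0, "days_since_service": 200.0}
print("probability:", predict_proba(machine))
print("contributions:", contributions(machine))
print("effect of servicing today:", what_if(machine, "days_since_service", 0.0))
```

Showing a user which inputs drove a prediction, and how the prediction would move if an input changed, is one way to address distrust of black-box output.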
Data Science is a Team Sport
Learn and Educate
Tip #6
1. Learn from experiments
• Why?
• Both successes and failures
2. Share the learnings
3. Promote successful experiments to production
4. Move on to the next hypothesis to experiment
• Failure is a valid outcome of an
experiment
• Learn and refine the next experiment
Adopt a Process
Tip #7
A process specifies a detailed sequence of activities
necessary to perform specific business tasks.
It is used to standardize procedures and
establish best practices.
Microsoft’s Team Data Science Process
https://aka.ms/tdsp
Standard Project Lifecycle
Standardized Document
Templates, Project Structure
Shared, Distributed
Resources
Productivity Tools, Shared
Utilities
Cross-Industry Standard Process for Data Mining
(CRISP-DM)
Knowledge Discovery in Databases
(KDD)
• Data science virtual
machines (DSVMs) as the
fundamental development
platform on cloud
• Use Visual Studio Team
Services (VSTS)
• Work item tracking and scrum planning
• Git repositories
• Shared data science utilities
in Git repository
• Use cloud-based Azure
resources as needed
• Terminology:
• Feature: a project
• Story: a stage in the E2E
process of a DS project
• Tasks: specific
coding/documentation/other
activities that are needed
to complete a story
• Iteration: usually a 2-week
sprint
[Figure: model lifecycle diagram. Labels: App Developer, Data Scientist, IDE, App code, Source Control, CI/CD Pipelines, Cloud Services, Training Environment, Training + test code, Train & test model (CNTK/TF/scikit/Keras/...), Model Storage, Embed model, Test model + app, Apps, Edge Devices, App telemetry, A/B Testing, Data Lake, Continuous retraining. Stages: Publish, Code, Build & Test, Consume. Lifecycle Management: processes, templates, permissions. Example model output: cat 0.99218, feline 0.81242, puma 0.45456.]
Model Source Control
• Processes and procedures to make models
reproducible (from source control to data
retention policies)
• Make it easy to work on multiple models
(consistent process)
Model Validation
• Unit testing, functional testing and
performance testing
• Validation needs to be performed both in
isolation and when embedded in an
application
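The three kinds of tests named above might look like this for a model in isolation. The scoring function is a hypothetical stand-in for a trained model, and the latency budget is an arbitrary illustrative value; the same checks would be repeated against the application that embeds the model.

```python
import math
import time


def predict_failure_proba(age_years: float) -> float:
    """Hypothetical stand-in for a trained model's scoring function."""
    return 1.0 / (1.0 + math.exp(-(0.5 * age_years - 3.0)))


def test_output_is_probability():
    # Unit test: every output must be a valid probability.
    for age in (0.0, 1.0, 10.0, 50.0):
        assert 0.0 <= predict_failure_proba(age) <= 1.0


def test_older_equipment_is_riskier():
    # Functional test: the model should respect a known domain expectation.
    assert predict_failure_proba(10.0) > predict_failure_proba(1.0)


def test_latency_budget(max_seconds: float = 1.0):
    # Performance test: 1,000 predictions must fit in the (assumed) budget.
    start = time.perf_counter()
    for _ in range(1000):
        predict_failure_proba(5.0)
    assert time.perf_counter() - start < max_seconds


for check in (test_output_is_probability, test_older_equipment_is_riskier, test_latency_budget):
    check()
print("all model checks passed")
```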
Model Versioning & Storage
• Provide a consistent way to store & share
models, plus a way to track where models are
embedded / running
• Provide a consistent model format
• Provide traceability on where a model came
from (which data, which experiment, where’s
the code / notebook)
• Provide a way to track where a model is running
• Control who has access to what models
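Traceability can start with a small manifest stored next to each model. This is a sketch under stated assumptions: the field names, the version string, and the example values are hypothetical, and the deck does not prescribe any particular format.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ModelRecord:
    """Hypothetical manifest answering 'where did this model come from?'."""
    name: str
    version: str
    code_commit: str           # which code / notebook revision
    experiment_id: str         # which experiment
    training_data_sha256: str  # which data


def fingerprint_data(raw: bytes) -> str:
    """Content hash of the training data, so a later change is detectable."""
    return hashlib.sha256(raw).hexdigest()


def to_manifest(record: ModelRecord) -> str:
    """Serialize the record to JSON in a stable key order."""
    return json.dumps(asdict(record), sort_keys=True, indent=2)


record = ModelRecord(
    name="equipment-failure",
    version="1.0.0",
    code_commit="<git commit hash>",  # placeholder, filled in by the pipeline
    experiment_id="exp-001",
    training_data_sha256=fingerprint_data(b"sensor_id,age,failed\n1,5,0\n"),
)
manifest = to_manifest(record)
print(manifest)
```

Keeping the manifest in the same store as the model file gives every deployed model a consistent, queryable origin.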
Model Deployment
• Provide an efficient process to get a model built into an
application or service and leveraged to light up an end-user
scenario.
• Simplify the process to interact with the model (through
code generation, API specifications / interfaces or other methods)
• Support a variety of inferencing targets (cloud / app / edge)
(including FPGAs or dedicated frameworks like CoreML & WinML)
• Provide secrets / service endpoint management to remove
friction from configuring the release process.
Accumulate a toolbox of tricks
Tip #8
• Data Exploration
• RFM – User Behavior Modeling
• Hyperparameter tuning
• Auto Featurization
Note: Domain expertise is still
helpful
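As an example of a reusable toolbox item, RFM (recency, frequency, monetary value) features can be computed from a transaction log in a few lines. The data layout, the customer names, and the reference date below are hypothetical; the sketch only computes the raw RFM values and leaves any scoring or binning to the team's domain experts.

```python
from datetime import date


def rfm(transactions, today):
    """Compute recency, frequency and monetary value per customer.

    transactions: iterable of (customer_id, purchase_date, amount) tuples.
    """
    table = {}
    for customer, when, amount in transactions:
        row = table.setdefault(customer, {"last": when, "frequency": 0, "monetary": 0.0})
        row["last"] = max(row["last"], when)  # most recent purchase so far
        row["frequency"] += 1                 # number of purchases
        row["monetary"] += amount             # total spend
    return {
        customer: {
            "recency_days": (today - row["last"]).days,
            "frequency": row["frequency"],
            "monetary": row["monetary"],
        }
        for customer, row in table.items()
    }


# Hypothetical purchase log.
purchases = [
    ("alice", date(2024, 1, 5), 40.0),
    ("alice", date(2024, 3, 1), 60.0),
    ("bob", date(2024, 2, 10), 25.0),
]
scores = rfm(purchases, today=date(2024, 3, 31))
print(scores)
```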
Building an Org’s Toolbox
Continuous Learning
Tip #9
Lots of common sense… but not common
practice
Thank you!
Also thanks to Pavandeep Kalra, Jacob Spolstra, Wee Hyong
Tok, Richin Jain, Brandon Rohrer
Managing Data Science Projects

  • 1. Executive Briefing Lessons learned managing data science projects: Adopting a team data science process
  • 5. Our strategy is to build best-in-class platforms and productivity services for an intelligent cloud and an intelligent edge infused with artificial intelligence (“AI”). Microsoft Form 10-K 2016
  • 8. Toolbox of a Data Scientist
  • 9. Do it like a Professional!
  • 10. Understand the Decision Process Tip #1
  • 11. What is the business problem that needs to be solved, independent of the technology solution? What is the decision or action that has to be taken and that can be informed by data?
  • 13. Understanding the Decision Process. Key Decision: Should I service this piece of equipment? Data Science Question: What is the probability this equipment will fail within the next X days?
  • 15. Framing the Data Science Question based on the Scenario
       Business Scenario | Key Decision | Data Science Question
       Energy Forecasting | Should I buy or sell energy contracts? | What will be the long/short-term demand for energy in a region?
       Customer Churn | Which customers should I prioritize to reduce churn? | What is the probability of churn within X days for each customer?
       Personalized Marketing | What product should I offer first? | What is the probability that the customer will purchase each product?
       Product Feedback | Which service/product needs attention? | What is the social media sentiment for each service/product?
  • 16. Be obsessed with data Tip #2
  • 17. Being Obsessed with Data Can only complete the process with the right data!
  • 18. Bring in the people that know the data
  • 23. What is considered a success for the business?
  • 24. How do you measure it?
  • 25. Using Performance Metrics: Establish a qualitative objective. Translate it into a quantifiable metric. Quantify the improvement in the metric value that would be useful (e.g., 10% fewer failures → savings of $1MM/year). Establish a baseline (e.g., current failure rate = 10% per year). Establish how to measure the improvement in the metric with the data science solution (e.g., 80% of the equipment maintained based on the predictive model).
  • 27. Tips: 1. Data science team embedded within the business 2. Allow exploring multiple problem formulations to get to end metric goal 3. Past goal, go within set time period 4. Ensure reproducibility
  • 28. Establish the E2E solution Tip #4
  • 29. 1. Set up the end to end solution and the metrics 2. Launch with a baseline/simple model 3. Act on the recommendations of the solution 4. Measure and iterate
  • 30. Establishing an E2E solution helps with buy-in from the business
  • 31. Keep a Human in the Loop Tip #5
  • 32. • Empower ALL to perform like the BEST • Automate repetitive human tasks • Embed expert knowledge into the solution
  • 33. • How to interpret the model? • Importance of Features • Bias in the model • Interpreting predictions per instance • What-if analysis Users don’t trust black-box models
  • 34. Data Science is a Team Sport
  • 37. 1. Learn from experiments • Why? • Both successes and failures 2. Share the learnings 3. Promote successful experiments to production 4. Move on to the next hypothesis to experiment
  • 38. • Failure is a valid outcome of an experiment • Learn and refine the next experiment
  • 40. A process specifies a detailed sequence of activities necessary to perform specific business tasks. It is used to standardize procedures and establish best practices.
  • 41. Microsoft’s Team Data Science Process (https://aka.ms/tdsp): Standard Project Lifecycle; Standardized Document Templates, Project Structure; Shared, Distributed Resources; Productivity Tools, Shared Utilities
  • 43. Cross-Industry Standard Process for Data Mining (CRISP-DM) Knowledge Discovery in Databases (KDD)
  • 45. • Data science virtual machines (DSVMs) as the fundamental development platform on cloud • Use Visual Studio Team Services (VSTS) • Work item tracking and scrum planning • Git repositories • Shared data science utilities in Git repository • Use cloud-based Azure resources as needed
  • 47. • Terminology: • Feature: a project • Story: a stage in the E2E process of a DS project • Tasks: specific coding/documentation/other activities that are needed to complete a story • Iteration: usually a 2-week sprint
  • 50. [Diagram: Lifecycle Management (processes, templates, permissions). A data scientist works in an IDE and training environment, trains and tests models (CNTK/TF/scikit/Keras/…) on data from a data lake, and publishes them to model storage. An app developer works in an IDE with app code, source control, CI/CD pipelines, and cloud services, consumes and embeds the model, then builds and tests the model together with the app for apps and edge devices. App telemetry and A/B testing feed continuous retraining. The sample model output shown is a set of class scores: cat 0.99218, feline 0.81242, puma 0.45456.]
  • 51. Model Source Control • Processes and procedures to make models reproducible (from source control to data retention policies) • Make it easy to work on multiple models (consistent process)
  • 52. Model Validation • Unit testing, functional testing and performance testing • Validation needs to be performed both in isolation and when embedded in an application
  • 53. Model Versioning & Storage • Provide a consistent way to store & share models, plus a way to track where models are embedded / running • Provide a consistent model format • Provide traceability on where a model came from (which data, which experiment, where’s the code / notebook) • Provide a way to track where model is running • Control who has access to what models
  • 54. Model Deployment • Provide an efficient process to get a model built into an application or service and leveraged to light up an end-user scenario. • Simplify the process to interact with the model (through code generation, API specifications / interfaces or other methods) • Support a variety of inferencing targets (cloud / app / edge) (including FPGAs or dedicated frameworks like CoreML & WinML) • Provide secrets / service endpoint management to remove friction from configuring the release process.
  • 55. Accumulate a toolbox of tricks Tip #8
  • 56. Building an Org’s Toolbox • Data Exploration • RFM – User Behavior Modeling • Hyperparameter tuning • Auto Featurization Note: Domain expertise is still helpful
  • 59. Lots of common sense… but not common practice
  • 61. Thank you! Also thanks to Pavandeep Kalra, Jacob Spolstra, Wee Hyong Tok, Richin Jain, Brandon Rohrer
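
The performance-metric steps on slide 25 and the predictive-maintenance decision on slide 13 can be made concrete with a little arithmetic. The following is a minimal sketch, not the presenters' method. Only the 10% baseline failure rate and the 10% reduction target come from the slides; the fleet size, cost per failure, service cost, and the names `MetricPlan` and `should_service` are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricPlan:
    """Turns a qualitative objective into a quantified yearly target."""

    fleet_size: int               # hypothetical: number of machines in service
    cost_per_failure: float       # hypothetical: dollars lost per failure
    baseline_failure_rate: float  # from the slide: 10% of equipment per year
    target_reduction: float       # from the slide: 10% fewer failures

    def baseline_failures(self) -> float:
        # Expected failures per year before any data science solution.
        return self.fleet_size * self.baseline_failure_rate

    def failures_avoided(self) -> float:
        # Failures prevented per year if the target reduction is met.
        return self.baseline_failures() * self.target_reduction

    def annual_savings(self) -> float:
        # Dollar value of the improvement, the number the business cares about.
        return self.failures_avoided() * self.cost_per_failure


def should_service(failure_probability: float,
                   service_cost: float,
                   cost_per_failure: float) -> bool:
    """Key decision from slide 13, as a simple expected-cost rule (an assumption).

    Service the equipment when the expected loss from a failure within the
    prediction window exceeds the cost of servicing it now.
    """
    return failure_probability * cost_per_failure > service_cost


if __name__ == "__main__":
    plan = MetricPlan(
        fleet_size=10_000,
        cost_per_failure=10_000.0,
        baseline_failure_rate=0.10,
        target_reduction=0.10,
    )
    print(f"Baseline failures per year: {plan.baseline_failures():.0f}")
    print(f"Failures avoided per year:  {plan.failures_avoided():.0f}")
    print(f"Annual savings:             ${plan.annual_savings():,.0f}")
    print("Service now?", should_service(0.30, 2_000.0, plan.cost_per_failure))
```

With these illustrative inputs the savings land at roughly the $1MM/year figure quoted on the slide, which shows how a baseline plus a target reduction yields a number the business can sign off on. The expected-cost rule is one possible way to connect a predicted failure probability to the service decision; a real project would set the threshold together with the business owners.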
