DataOps
Data Science Empowerment through
DevOps, Cloud Computing and Building
your own Applications
Kelly O’Briant
Data Science Product Engineer
kelly@rladies.org
@kellrstats | @RLadiesDC
• R-Ladies Washington DC Chapter Founder
and Organizer
• R-Ladies Global unofficial “cloud expert”
• Publish a monthly series called .rprofile
on the rOpenSci blog
• Business Science University
course developer
My Talk Goal:
I want you to leave this conference so
excited, you go back to work and completely
ignore whatever project you’re supposed to
be working on because you’re so pumped up
about building a data product and you can’t
stop yourself from doing it.
Motivation
Why I talk about Data Science Empowerment
R-Ladies events
• How do I get a job as a data scientist/analyst/anything?
• What should I study/learn/do/produce to be a data scientist?
• Am I even a data scientist? Is what I do data science?
Why are data products empowering?
• I use data products to justify/prove to myself that I belong, that my
ideas are valid and to help me communicate with people who are bad
at listening (or when I’m bad at speaking)
Motivation
Traumatic Experiences!
Windows Lab, Linux Lab, Mac Lab
R-Ladies + International Women’s Day
Twitter Campaign
• Create a Twitter bot using R code
to tweet out a profile for every
woman in our Global speaker
directory
• Project collaboration through GitHub
• Docker linked to a local volume
• Twitter Application(s)
Deploy and Use H2O Machine Learning
Models in Production
• Build and validate a model in Python
working in a Jupyter Notebook with the
H2O machine learning API
• Package the model code as a POJO or
MOJO file
• Deploy the model to H2O.ai STEAM to
create an ML prediction service complete
with a REST API query URL
Create and Maintain a Personal Website
• Use the blogdown package in an
RStudio project to create the
framework for a Hugo static
website
• Create content for the site by
writing R Markdown files
• Compile and deploy the static site –
choose a hosting mechanism:
GitHub? Continuous Integration
with Netlify?
Why are you so into R?
• It’s great for Data Science
• The community at large is awesome
• The female community is awesome
• R integrates with other tech
• It’s growing really fast in cool ways
• I can use it to build cool stuff
#rstats
Worldwide organization
that promotes gender diversity
in the R community via meetups
and mentorship in a friendly and
safe environment
Back to the topic: DataOps
1. It usually takes a little DevOps to build a Data Product
2. Building more Data Products is empowering – good for your portfolio and soul
What is DevOps?
And why should Data-oriented people care about it?
DevOps is…
“A combination of cultural philosophies, practices
and tools that increases an organization’s ability to
deliver applications and services at high velocity.”
- AWS DevOps Blog
Deliver applications and services at high velocity
Do This – without pulling all
your hair out?
Deliver applications and services at high velocity
Do This – Super Effectively
Host your analysis
• Share
• Publish
• Collaborate
• Prove a point
• Serve a purpose
• Be reproducible
• Save the day
What is DataOps?
DataOps?
Anywhere you can put a little DevOps magic into your data science workflow
Kelly O'Briant - DataOps in the Cloud: How To Supercharge Data Science with a Hint of DevOps
Build More Data Products
So that you and others can use them to solve real problems
Try Shiny!
The Iris Dataset
Do Machine Learning!
So Hot Right Now
What Species
is this iris??
Credit: xkcd
1. Turn your ideas into R code
• Write functions to generate the
plots you’re envisioning
• Package: ggplot2
• Train and validate a machine
learning model to use
• Package: caret
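library(ggplot2)

# Histogram of one iris measurement (column name as a string), one panel per species
geom_hist_basic <- function(var) {
  # .data[[var]] replaces aes_string(), which is deprecated in current ggplot2
  ggplot(iris, aes(x = .data[[var]])) +
    geom_histogram() +
    facet_wrap(~ Species)
}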
predict_matrix(fit.knn, validation)
Confusion Matrix and Statistics
Prediction   setosa versicolor virginica
  setosa         10          0         0
  versicolor      0          8         1
  virginica       0          2         9
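The train-and-validate step above might look roughly like this with caret. It is a sketch under assumptions, not the code from the talk: the 80/20 split, the seed, the 10-fold cross-validation, and the names `training`, `validation`, and `cm` are illustrative. The slide's `predict_matrix()` helper is not shown in the deck, so this calls caret's `confusionMatrix()` directly instead.

```r
library(caret)

set.seed(42)  # assumed seed, only to make the split repeatable

# Hold out 20% of iris for validation (assumed split; stratified by Species)
in_train   <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
training   <- iris[in_train, ]
validation <- iris[-in_train, ]

# Fit a k-nearest-neighbours classifier, tuning k with 10-fold cross-validation
fit.knn <- train(
  Species ~ .,
  data      = training,
  method    = "knn",
  trControl = trainControl(method = "cv", number = 10)
)

# Compare predictions on the held-out rows with the true species
predictions <- predict(fit.knn, newdata = validation)
cm <- confusionMatrix(predictions, validation$Species)
print(cm$table)
```

The counts you get will depend on the seed and split, so they will not necessarily match the table on the slide.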
2. Turn your R code into an R Shiny app
• Client-side code: user interface and input elements (the fluidPage code)
• Server-side code: (reactive) R output elements (the serverFunction code)
shinyApp(ui = fluidPage, server = serverFunction)
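A minimal sketch of those two halves, reusing the iris histogram idea from step 1. The input id `var`, the output id `hist`, the bin count, and the object names `ui`, `server`, and `app` are assumptions for illustration. They stand in for the `fluidPage` and `serverFunction` placeholders on the slide, and `ui` is used as the variable name so that it does not shadow shiny's `fluidPage()` function.

```r
library(shiny)
library(ggplot2)

# Client side: a dropdown to choose which iris measurement to plot
ui <- fluidPage(
  titlePanel("Iris histograms"),
  selectInput("var", "Measurement:", choices = names(iris)[1:4]),
  plotOutput("hist")
)

# Server side: the plot is reactive, so it re-renders whenever input$var changes
server <- function(input, output) {
  output$hist <- renderPlot({
    ggplot(iris, aes(x = .data[[input$var]])) +
      geom_histogram(bins = 20) +
      facet_wrap(~ Species)
  })
}

# Combine the two halves into an app object
app <- shinyApp(ui = ui, server = server)
```

In an interactive session, `runApp(app)` should launch the app in a browser.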
Try Plumber!
Let’s Build a REST API with R
1. Write Functions in R
Expose Data or Model
Produce Analysis or Visualization
Data Agnostic
Perform Analysis on New Data
2. Create Plumber
API Endpoints
- GET
- POST
3. Host the Plumber
Script on a Server
- Create Plumber
router object
- Run in an R Session
4. Send Requests to
the Plumber Service
Through external (or
internal) Applications
- Jupyter Notebooks
- Web Apps
Demo Framework
• Docker image: RStudio Server, with an R session running the Plumber REST API
• My local file system (Plumber.R, Dockerfile), linked to the container as a local volume
• Applications and notebooks send requests to the API
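The four Plumber steps can be sketched in a single file. Everything specific here is an assumption for illustration rather than the demo from the talk: the helper `species_means()`, the `/means` path, the `var` parameter, and port 8000.

```r
# Plumber.R (hypothetical contents)

# Step 1: a plain R function. Mean of one iris measurement, per species.
species_means <- function(var = "Sepal.Length") {
  stopifnot(var %in% names(iris)[1:4])
  tapply(iris[[var]], iris$Species, mean)
}

# Step 2: the #* comments turn the function below into a GET endpoint.

#* Per-species means for one iris measurement
#* @param var Name of an iris measurement column
#* @get /means
function(var = "Sepal.Length") {
  as.list(species_means(var))
}

# Step 3: host it. From a separate R session, create the router and run it:
#   pr <- plumber::plumb("Plumber.R")
#   pr$run(host = "0.0.0.0", port = 8000)
```

For step 4, any HTTP client can then query the service, for example `curl "http://localhost:8000/means?var=Petal.Length"` from a terminal, or the equivalent request from a notebook or web app. Binding to `0.0.0.0` is one way to make the API reachable from outside a Docker container, provided the port is also published when the container starts.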
That’s it!
Now go build some sweet data products
Resources for Learning R
R-Ladies Global Meetups
• Get involved!
• More female speakers,
leaders, teachers, builders,
friends!
RLadies.org
@RLadiesGlobal
RStudio Webinars
• All of the talks
from rstudio::conf
2018 have just
been published
• Highly
recommend!
Resources for Learning Shiny Development
shiny.rstudio.com
Resources for Learning Plumber
www.rplumber.io
@TrestleJeff
on Twitter!
Note to self: Remember to give
out stickers
I have R-Ladies and R-Ladies Plumber Stickers!
I’m Kelly!
@kellrstats on Twitter
NhiV747372
 
AI ------------------------------ W1L2.pptx
AI ------------------------------ W1L2.pptxAI ------------------------------ W1L2.pptx
AI ------------------------------ W1L2.pptx
AyeshaJalil6
 
Agricultural_regionalisation_in_India(Final).pptx
Agricultural_regionalisation_in_India(Final).pptxAgricultural_regionalisation_in_India(Final).pptx
Agricultural_regionalisation_in_India(Final).pptx
mostafaahammed38
 
2-Raction quotient_١٠٠١٤٦.ppt of physical chemisstry
2-Raction quotient_١٠٠١٤٦.ppt of physical chemisstry2-Raction quotient_١٠٠١٤٦.ppt of physical chemisstry
2-Raction quotient_١٠٠١٤٦.ppt of physical chemisstry
bastakwyry
 
Feature Engineering for Electronic Health Record Systems
Feature Engineering for Electronic Health Record SystemsFeature Engineering for Electronic Health Record Systems
Feature Engineering for Electronic Health Record Systems
Process mining Evangelist
 
What is ETL? Difference between ETL and ELT?.pdf
What is ETL? Difference between ETL and ELT?.pdfWhat is ETL? Difference between ETL and ELT?.pdf
What is ETL? Difference between ETL and ELT?.pdf
SaikatBasu37
 
How to regulate and control your it-outsourcing provider with process mining
How to regulate and control your it-outsourcing provider with process miningHow to regulate and control your it-outsourcing provider with process mining
How to regulate and control your it-outsourcing provider with process mining
Process mining Evangelist
 
Controlling Financial Processes at a Municipality
Controlling Financial Processes at a MunicipalityControlling Financial Processes at a Municipality
Controlling Financial Processes at a Municipality
Process mining Evangelist
 

Kelly O'Briant - DataOps in the Cloud: How To Supercharge Data Science with a Hint of DevOps

  • 1. DataOps Data Science Empowerment through DevOps, Cloud Computing and Building your own Applications
  • 2. Kelly O’Briant Data Science Product Engineer kelly@rladies.org @kellrstats | @RLadiesDC • R-Ladies Washington DC Chapter Founder and Organizer • R-Ladies Global unofficial “cloud expert” • Publish a monthly series called .rprofile on the rOpenSci blog • Business Science University course developer
  • 3. My Talk Goal: I want you to leave this conference so excited, you go back to work and completely ignore whatever project you’re supposed to be working on because you’re so pumped up about building a data product and you can’t stop yourself from doing it.
  • 4. Motivation Why I talk about Data Science Empowerment R-Ladies events • How do I get a job as a data scientist/analyst/anything? • What should I study/learn/do/produce to be a data scientist? • Am I even a data scientist? Is what I do data science? Why are data products empowering? • I use data products to justify/prove to myself that I belong, that my ideas are valid and to help me communicate with people who are bad at listening (or when I’m bad at speaking)
  • 6. R-Ladies + International Women’s Day Twitter Campaign • Create a Twitter bot using R code to tweet out a profile for every woman in our Global speaker directory • Project collaboration through GitHub • Docker linked to a local volume • Twitter Application(s)
  • 7. Deploy and Use H2O Machine Learning Models in Production • Build and validate a model in Python working in a Jupyter Notebook with the H2O machine learning API • Package the model code as a POJO or MOJO file • Deploy the model to H2O.ai STEAM to create an ML prediction service complete with a REST API query URL
  • 8. Create and Maintain a Personal Website • Use the blogdown package in an RStudio project to create the framework for a Hugo static website • Create content for the site by writing Rmarkdown files • Compile and deploy the static site – choose a hosting mechanism: GitHub? Continuous Integration with Netlify?
  • 9. Why are you so into R? • It’s great for Data Science • The community at large is awesome • The female community is awesome • R integrates with other tech • It’s growing really fast in cool ways • I can use it to build cool stuff
  • 10. Why are you so into R? • It’s great for Data Science • The community at large is awesome • The female community is awesome • R integrates with other tech • It’s growing really fast in cool ways • I can use it to build cool stuff #rstats
  • 11. Why are you so into R? • It’s great for Data Science • The community at large is awesome • The female community is awesome • R integrates with other tech • It’s growing really fast in cool ways • I can use it to build cool stuff Worldwide organization that promotes gender diversity in the R community via meetups and mentorship in a friendly and safe environment
  • 15. Back to the topic: DataOps 1. It usually takes a little DevOps to build a Data Product 2. Building more Data Products is empowering – good for your portfolio and soul
  • 16. What is DevOps? And why should data-oriented people care about it? DevOps is… “A combination of cultural philosophies, practices and tools that increases an organization’s ability to deliver applications and services at high velocity.” - AWS DevOps Blog
  • 17. Deliver applications and services at high velocity Do This – without pulling all your hair out?
  • 18. Deliver applications and services at high velocity Do This – Super Effectively Host your analysis • Share • Publish • Collaborate • Prove a point • Serve a purpose • Be reproducible • Save the day
  • 19. What is DataOps? DataOps? Anywhere you can put a little DevOps magic into your data science workflow
  • 21. Build More Data Products So that you and others can use them to solve real problems
  • 24. Do Machine Learning! So Hot Right Now What Species is this iris?? Credit: xkcd
  • 25. 1. Turn your ideas into R code • Write functions to generate the plots you’re envisioning • Package: ggplot2 • Train and validate a machine learning model to use • Package: caret
      geom_hist_basic <- function(var){
        ggplot(iris, aes_string(x = var)) +
          geom_histogram() +
          facet_wrap(~ Species)
      }

      predict_matrix(fit.knn, validation)

      Confusion Matrix and Statistics
      Prediction   setosa versicolor virginica
        setosa         10          0         0
        versicolor      0          8         1
        virginica       0          2         9
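The slide shows a call to `predict_matrix(fit.knn, validation)` but not how `fit.knn` or `validation` were created. The sketch below is one plausible way to get there with caret, not the speaker's actual code: the 80/20 split, the seed, and the 10-fold cross-validation settings are all assumptions, and `confusionMatrix()` stands in for the speaker's own `predict_matrix` helper. Depending on the caret version, the e1071 package may also need to be installed.

```r
library(caret)

set.seed(7)  # assumed seed, only so the split is repeatable

# Hold out 20% of iris for validation (assumed split; the slide does not say)
idx        <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
training   <- iris[idx, ]
validation <- iris[-idx, ]

# Train a k-nearest-neighbours classifier with 10-fold cross-validation
fit.knn <- train(
  Species ~ .,
  data      = training,
  method    = "knn",
  trControl = trainControl(method = "cv", number = 10)
)

# Predict on the held-out rows and tabulate predictions against the truth
predictions <- predict(fit.knn, validation)
cm <- confusionMatrix(predictions, validation$Species)
print(cm$table)
```

With 50 flowers per species, an 80/20 stratified split leaves 10 of each in the validation set, which matches the column totals in the confusion matrix on the slide.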
  • 26. 2. Turn your R code into an R Shiny app Client Side Code: User Interface and Input Elements Server Side Code: (Reactive) R Output Elements shinyApp(ui = fluidPage, server = serverFunction) fluidPage Code serverFunction Code
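As a minimal sketch of the `shinyApp(ui = ..., server = ...)` structure on this slide, the app below wraps an iris histogram in a UI with one input. The input id `var`, the output id `hist`, and the bin count are illustrative choices, not taken from the talk. It uses the `.data` pronoun rather than the slide's `aes_string()`, which newer ggplot2 releases deprecate.

```r
library(shiny)
library(ggplot2)

# Client side: user interface and input elements
ui <- fluidPage(
  titlePanel("Iris histograms"),
  selectInput("var", "Measurement", choices = names(iris)[1:4]),
  plotOutput("hist")
)

# Server side: reactive R output elements
server <- function(input, output) {
  output$hist <- renderPlot({
    # Re-runs whenever input$var changes
    ggplot(iris, aes(x = .data[[input$var]])) +
      geom_histogram(bins = 30) +
      facet_wrap(~ Species)
  })
}

# Combine both halves into an app object
app <- shinyApp(ui = ui, server = server)

# To launch it locally from an interactive session:
# runApp(app)
```

Keeping the plotting logic in a plain function (like `geom_hist_basic` from the previous slide) and calling it inside `renderPlot()` would work just as well, and makes the same code reusable outside Shiny.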
  • 28. Let’s Build a REST API with R 1. Write Functions in R: Expose Data or Model; Produce Analysis or Visualization; Data Agnostic; Perform Analysis on New Data 2. Create Plumber API Endpoints: GET, POST 3. Host the Plumber Script on a Server: Create Plumber router object; Run in an R Session 4. Send Requests to the Plumber Service Through external (or internal) Applications: Jupyter Notebooks, Web Apps
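The four steps on this slide can be sketched roughly as follows, assuming plumber 1.0 or later (for the `pr()`, `pr_get()`, `pr_post()` and `pr_run()` helpers). The function names, the `/means` and `/summary` paths, and port 8000 are hypothetical choices for illustration. The talk's demo hosts a `Plumber.R` script instead, which this sketch does not try to reproduce.

```r
library(plumber)

# Step 1: write plain R functions
# Expose data: mean of each measurement for one iris species
species_means <- function(species = "setosa") {
  rows <- iris[iris$Species == species, 1:4]
  as.list(colMeans(rows))
}

# Data agnostic: summarise whatever numbers the caller sends
column_summary <- function(values) {
  values <- as.numeric(values)
  list(n = length(values), mean = mean(values), sd = sd(values))
}

# Step 2: create Plumber API endpoints on a router object
api <- pr()
api <- pr_get(api, "/means", function(species = "setosa") {
  species_means(species)
})
api <- pr_post(api, "/summary", function(req) {
  # Expects a JSON array of numbers as the request body
  column_summary(jsonlite::fromJSON(req$postBody))
})

# Step 3: host the router in an R session (blocks until stopped)
# pr_run(api, host = "0.0.0.0", port = 8000)

# Step 4: send requests from another application, for example with httr:
# httr::GET("http://localhost:8000/means", query = list(species = "virginica"))
# httr::POST("http://localhost:8000/summary", body = "[1, 2, 3]",
#            httr::content_type_json())
```

Because the endpoint handlers only delegate to ordinary functions, the analysis code can be tested on its own before any server is running.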
  • 29. Docker Image RStudio Server R Session Running Plumber REST API My Local File System - Plumber.R - Dockerfile Local Volume Link Applications & Notebooks Requests! Demo Framework
  • 30. That’s it! Now go build some sweet data products
  • 32. R-Ladies Global Meetups • Get involved! • More female speakers, leaders, teachers, builders, friends! RLadies.org @RLadiesGlobal
  • 33. RStudio Webinars • All of the talks from RStudio::conf 2018 have just been published • Highly recommend!
  • 34. Resources for Learning Shiny Development shiny.rstudio.com
  • 35. Resources for Learning Plumber www.rplumber.io @TrestleJeff on Twitter!
  • 36. Note to self: Remember to give out stickers I have R-Ladies and R-Ladies Plumber Stickers! I’m Kelly! @kellrstats on Twitter