TOP 5 LESSONS LEARNED IN DEPLOYING AI IN THE REAL WORLD
© 2018 PURE STORAGE INC.
QUESTION ON EVERYONE’S MIND:
WHY IS A STORAGE COMPANY TALKING
ABOUT AI?
EXPLOSION IN ARTIFICIAL INTELLIGENCE
FUELED BY PARALLEL COMPUTE, NEW ALGORITHMS, AND BIG DATA
NEW ALGORITHMS: Massively Parallel, Delivering Superhuman Accuracy
MODERN COMPUTE: Massively Parallel Architecture Driving Performance (GPU: Thousands of Cores)
BIG DATA: Data is the New Oil; 50 Zettabytes Created in 2020
TECHNOLOGIES OF THE BIG BANG: WHAT CUSTOMERS DEPLOY
Frameworks | GPU Server | Storage
DATA IS VITAL TO MACHINE LEARNING
OBSERVATION BY PROF. ANDREW NG, AI LUMINARY
“We don’t have better algorithms,
we just have more data”
PETER NORVIG
Engineering Director, Google
The AI “hierarchy of needs”
Credit: Monica Rogati
ML algorithms: linear & logistic regression, k-means clustering, decision trees, etc.
Validation: A/B testing, detecting model drift over time
Data preparation: cleaning, feature identification, exploration, etc.
Data acquisition: ingest, transformation, and representation of data for analysis
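The validation layer mentions detecting model drift over time. One common way to quantify drift in a single feature or score is the Population Stability Index (PSI). The sketch below is an illustration only: the deck does not say which drift metric is used, and the function name, bin count, and floor value are assumptions.

```python
import math
from typing import Sequence


def psi(expected: Sequence[float], actual: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index between a baseline sample and a recent sample.

    Bin edges come from the baseline range; values outside it are clipped
    into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the baseline is constant

    def proportions(values: Sequence[float]) -> list:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # A small floor (assumed value) avoids log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


if __name__ == "__main__":
    baseline = [i / 100 for i in range(100)]
    shifted = [x + 0.5 for x in baseline]
    print("no drift:", psi(baseline, baseline))
    print("shifted :", psi(baseline, shifted))
```

A frequently quoted rule of thumb treats PSI above roughly 0.25 as a significant shift, but any threshold should be tuned to the model and data at hand.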
TOP 5 LESSONS LEARNED
1. AI is a Data Pipeline
WHAT MOST THINK IS AI
NEW POSSIBILITIES
For Nearly Every Industry
FRAMEWORKS
To Get Started
GPU
The Engine
AI IS SO MUCH MORE
“Hidden Technical Debt in Machine Learning Systems”, Google NIPS 2015
COMPLEXITIES OF AI IN PRODUCTION
STAGE | WHAT HAPPENS | RUNS ON
INGEST | From sensors, machines, & user-generated | -
CLEAN & TRANSFORM | Label, anomaly detection, ETL, prep, stage | CPU Servers
EXPLORE | Quickly iterate to converge on models | GPU Server
TRAIN | Run for hours to days in production cluster | GPU Production Cluster
A COPY & TRANSFORM step sits between each pair of consecutive stages.
WIDE RANGE OF NEEDS IN AI PIPELINE
SIGNIFICANT CHALLENGE TO LEGACY STORAGE
STAGE | INGEST | CLEAN & TRANSFORM | EXPLORE | TRAIN
Source / Runs on | From sensors & machines | CPU Servers | GPU Server | GPU Production Cluster
Access Pattern | sequential | sequential or random | random | random
Access Type | write | read & write | read | read
File Size | mostly large | small to large | small to large | mostly small
Concurrency | high | high | low | high
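The table above contrasts sequential and random access across pipeline stages. A rough way to see the difference on a given system is to read the same file in both orders and compare throughput. This is a minimal sketch, not a storage benchmark: the function names, file size, and block size are assumptions, and on a small temporary file the operating system's page cache will hide most of the gap that a real dataset on real storage would show.

```python
import os
import random
import tempfile
import time


def read_throughput(path: str, block_size: int, offsets: list) -> float:
    """Read block_size bytes at each offset and return bytes per second."""
    start = time.perf_counter()
    total = 0
    with open(path, "rb", buffering=0) as f:
        for off in offsets:
            f.seek(off)
            total += len(f.read(block_size))
    elapsed = time.perf_counter() - start
    return total / max(elapsed, 1e-9)


def compare_access_patterns(file_size: int = 8 * 1024 * 1024,
                            block_size: int = 4096,
                            seed: int = 0) -> dict:
    """Write a scratch file, then read it in sequential and shuffled order."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(file_size))
        path = tmp.name
    try:
        sequential = list(range(0, file_size, block_size))
        shuffled = sequential[:]
        random.Random(seed).shuffle(shuffled)
        return {
            "sequential_bytes_per_s": read_throughput(path, block_size, sequential),
            "random_bytes_per_s": read_throughput(path, block_size, shuffled),
        }
    finally:
        os.remove(path)


if __name__ == "__main__":
    for name, value in compare_access_patterns().items():
        print(f"{name}: {value / 1e6:.1f} MB/s")
```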
TOP 5 LESSONS LEARNED
1. AI is a Data Pipeline
2. Don’t Throw Your Data into a Data Lake
DATA LAKE
OR DATA GRAVEYARD?
“We see customers creating big data graveyards, dumping everything into HDFS [Hadoop Distributed File System] and hoping to do something with it down the road. But then they just lose track of what’s there. The main challenge is not creating a data lake, but taking advantage of the opportunities it presents.”
PricewaterhouseCoopers
Technology Forecast, Issue 1, 2014
MODERN ANALYTICS WITH OLD DATA LAKE
SPRAWLING, COMPLEX SILOS & SLOW PERFORMANCE
Each App Locked into Physical Silos
Redundant Data Copies in Silos
Fixed Compute to Storage in Silo
Built for Large, Sequential Data
Optimized for Batch, Not Real-Time
STATIC DATA LAKE NO LONGER VIABLE
[Figure: HDFS data lake divided into separate silos]
TOP 5 LESSONS LEARNED
1. AI is a Data Pipeline
2. Don’t Throw Your Data into a Data Lake
3. Cloud or Not to Cloud?
IT DEPENDS
WHERE YOU ARE ON YOUR AI JOURNEY
STAGE | EXPLORATION | PRODUCTION
NEED | Start Immediately | Get New Products & Features to Market Faster than Competition
DON’T NEED | Bogged Down with Infrastructure | Bogged Down by Performance & Cost Inefficiencies
RECOMMENDATION | Cloud | On-Premises
TOP 5 LESSONS LEARNED
1. AI is a Data Pipeline
2. Don’t Throw Your Data into a Data Lake
3. Cloud or Not to Cloud?
4. Lies, Damn Lies, and Benchmarks
BENCHMARKS DO NOT REFLECT REALITY
ATTRIBUTE | IMAGENET | REAL-WORLD AUTONOMOUS CAR COMPANY
IMAGE SIZE | 100-200KB | 2-5MB
FILE SIZE | 150MB (Packed TFRecords) | 2-5MB
MODE OF TESTING | Synthetic (No I/O) | Read from Storage
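The image sizes above translate directly into storage bandwidth. The sketch below works through the arithmetic using the size ranges from the slide; the training rate of 1,000 images per second is a hypothetical figure chosen only to make the comparison concrete, and the function names are illustrative.

```python
def required_read_mb_per_s(images_per_second: float, image_size_kb: float) -> float:
    """Sustained read bandwidth in MB/s (1 MB = 1000 KB) needed to feed training."""
    return images_per_second * image_size_kb / 1000


def bandwidth_table(images_per_second: float) -> dict:
    """Low and high bandwidth estimates for the two image-size ranges on the slide."""
    size_ranges_kb = {
        "imagenet": (100, 200),        # 100-200KB per image
        "real_world": (2000, 5000),    # 2-5MB per image
    }
    return {
        name: tuple(required_read_mb_per_s(images_per_second, s) for s in sizes)
        for name, sizes in size_ranges_kb.items()
    }


if __name__ == "__main__":
    # 1000 images/s is an assumed rate, not a figure from the deck.
    for name, (low, high) in bandwidth_table(1000).items():
        print(f"{name}: {low:.0f}-{high:.0f} MB/s")
```

At that assumed rate the ImageNet-sized images need 100-200 MB/s while the real-world images need 2,000-5,000 MB/s, a gap of 10x to 50x that a synthetic, no-I/O benchmark never exercises.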
© 2018 PURE STORAGE INC.22
AI TRAINING SYSTEM
GOAL IS TO KEEP THE GPUs 100% BUSY
FULL TRAINING WORKFLOW
I/O | CPU: decode, scale | GPU: forward-propagation, evaluate, back-propagation, update
BENCHMARK SETUP
Setup #1 (GPU ONLY): Synthetic Data from System RAM into GPUs
Setup #2 (CPU + GPU): Real Image Data from System RAM Through CPU + GPU
Setup #3 (I/O + CPU + GPU): Real Image Data from FlashBlade into DGX-1
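Keeping the GPUs busy means the I/O and CPU stages have to run ahead of the training step rather than in lockstep with it. The sketch below shows that overlap with a background loader thread and a bounded queue. It is a generic prefetching pattern, not the benchmark harness behind the slide; `load_fn` and `train_fn` are hypothetical callables standing in for "read and decode a batch" and "run one training step".

```python
import queue
import threading
import time

_DONE = object()  # sentinel marking the end of the batch stream


def _producer(num_batches, load_fn, out_q):
    """Load batches in the background and hand them to the consumer."""
    for i in range(num_batches):
        out_q.put(load_fn(i))  # blocks when the prefetch queue is full
    out_q.put(_DONE)


def run_pipeline(num_batches, load_fn, train_fn, prefetch=4):
    """Overlap loading with training; return (results, consumer utilization)."""
    q = queue.Queue(maxsize=prefetch)
    worker = threading.Thread(
        target=_producer, args=(num_batches, load_fn, q), daemon=True
    )
    start = time.perf_counter()
    worker.start()

    results, busy = [], 0.0
    while True:
        batch = q.get()  # time spent waiting here is time the "GPU" sits idle
        if batch is _DONE:
            break
        t0 = time.perf_counter()
        results.append(train_fn(batch))
        busy += time.perf_counter() - t0

    worker.join()
    wall = time.perf_counter() - start
    return results, (busy / wall if wall > 0 else 0.0)


if __name__ == "__main__":
    def slow_load(i):
        time.sleep(0.02)  # stand-in for storage read plus decode
        return i

    def fake_step(batch):
        time.sleep(0.01)  # stand-in for one GPU training step
        return batch * 2

    _, utilization = run_pipeline(20, slow_load, fake_step)
    print(f"consumer busy {utilization:.0%} of the time")
```

With the stub timings in the usage example, loading is slower than the training step, so the reported utilization stays well below 100%: the same effect the three benchmark setups are designed to expose.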
© 2018 PURE STORAGE INC.23
NEAR-LINEAR SCALE DELIVERED
AIRI ENGINEERED FOR MAXIMUM PRODUCTIVITY AND OUT-OF-THE-BOX SCALE
DEEP LEARNING TRAINING: MULTI-NODE USING GPUDIRECT RDMA OVER ETHERNET
Comparing Synthetic Mode, Entire Data in DRAM, Entire Data in FlashBlade
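"Near-linear scale" can be checked by dividing measured throughput at N nodes by N times the single-node throughput. The sketch below computes that ratio. The throughput numbers in the usage example are hypothetical placeholders, since the chart values are not reproduced in this transcript, and the function name is illustrative.

```python
def scaling_efficiency(throughput_by_nodes: dict) -> dict:
    """Return throughput(N) / (N * throughput(1)) for each node count N.

    1.0 means perfectly linear scaling; values slightly below 1.0 are
    what "near-linear" usually refers to.
    """
    base = throughput_by_nodes[1]
    return {
        nodes: throughput / (nodes * base)
        for nodes, throughput in sorted(throughput_by_nodes.items())
    }


if __name__ == "__main__":
    # Hypothetical images/s measurements, not taken from the deck.
    measured = {1: 100.0, 2: 190.0, 4: 360.0}
    for nodes, eff in scaling_efficiency(measured).items():
        print(f"{nodes} node(s): {eff:.0%} of linear")
```

Running the same calculation for each of the three modes (synthetic, data in DRAM, data in FlashBlade) shows whether storage, rather than compute, is what limits scaling.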
© 2018 PURE STORAGE INC.24
TOP 5 LESSONS LEARNED
1. AI is a Data Pipeline
2. Don’t Throw Your Data into a Data Lake
3. Cloud or Not to Cloud?
4. Lies, Damn Lies, and Benchmarks
5. Ideal Data Platform is a Data Hub
IDEAL PLATFORM FOR MODERN ERA
DYNAMIC DATA HUB ARCHITECTED FOR REAL-TIME & ELASTIC DATA
DATA PIPELINE
DATA HUB
“TUNED FOR EVERYTHING”: Small, Random to Large, Sequential; Architected for the Unknown
REAL-TIME: Low Latency Performance for Instant Response
ALL-FLASH: Modern, Ultra-Fast Technology
PARALLEL: No Serial Bottlenecks for Max Throughput
ELASTIC: Grow Non-Disruptively with More App Clusters
SIMPLE: Focus More on Insights, Not Infrastructure
THE INDUSTRY’S FIRST COMPLETE AI-READY INFRASTRUCTURE
HARDWARE
NVIDIA® DGX-1™ | 4x DGX-1 Systems | 4 PFLOPS of DL Performance
PURE FLASHBLADE™ | 15x 17TB Blades | 1.5M IOPS
ARISTA | 2x 100Gb Ethernet Switches with RDMA
SOFTWARE
NVIDIA GPU CLOUD DEEP LEARNING STACK | NVIDIA Optimized Frameworks
AIRI SCALING TOOLKIT | Multi-node Training Made Simple
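The hardware figures above can be combined into a few derived numbers. The sketch below does only that arithmetic. Raw capacity is not the same as usable capacity, and the even split of IOPS across systems is an assumption made for illustration, not a claim from the deck.

```python
# Figures quoted on the slide.
DGX1_SYSTEMS = 4
TOTAL_DL_PFLOPS = 4
BLADES = 15
BLADE_CAPACITY_TB = 17
TOTAL_IOPS = 1_500_000


def airi_summary() -> dict:
    """Derive per-system and aggregate figures from the slide's numbers."""
    return {
        "raw_flash_tb": BLADES * BLADE_CAPACITY_TB,         # raw, before any overhead
        "pflops_per_dgx1": TOTAL_DL_PFLOPS / DGX1_SYSTEMS,
        "iops_per_dgx1": TOTAL_IOPS / DGX1_SYSTEMS,         # assumes an even split
    }


if __name__ == "__main__":
    for key, value in airi_summary().items():
        print(f"{key}: {value}")
```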
AI & MODERN ANALYTICS
POWERING ANALYTICS FOR WORLD’S LARGEST PUBLIC HEDGE FUND
AI CLEAN & LABEL: CPU Servers
AI EXPLORE: GPU Server
AI TRAIN: GPU Servers
SPARK: CPU Servers
MONGO: CPU Servers
“Our quants want to test a model, get the results, and then test another one, all day long. So a 10-20X improvement in performance is a game-changer when it comes to creating a time-to-market advantage for us.”
Gary Collier, co-CTO, Man AHL
ORCHESTRATION WITH OPENSHIFT (KUBERNETES)
Monitoring
Load balancing
Scheduling
Resource allocation
OPENSHIFT + PURE PROVIDE A RECIPE FOR OPERATIONS AT SCALE
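Scheduling and resource allocation in Kubernetes are driven by what each workload declares. The sketch below builds a Pod manifest that requests GPUs and mounts a shared dataset volume, then prints it as JSON, which `kubectl` and `oc` accept alongside YAML. The deck does not show any manifest, so everything here is an assumption: the pod name, image, and claim name are hypothetical, and the `nvidia.com/gpu` resource is only available on clusters running the NVIDIA device plugin.

```python
import json


def gpu_training_pod(name: str, image: str, gpus: int,
                     claim_name: str, mount_path: str = "/data") -> dict:
    """Build a Pod manifest requesting GPUs and mounting a shared dataset volume."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [{
                "name": "trainer",
                "image": image,
                # The scheduler places the pod only on a node with this many free GPUs.
                "resources": {"limits": {"nvidia.com/gpu": gpus}},
                "volumeMounts": [{"name": "dataset", "mountPath": mount_path}],
            }],
            "volumes": [{
                "name": "dataset",
                # Refers to an existing PersistentVolumeClaim backed by shared storage.
                "persistentVolumeClaim": {"claimName": claim_name},
            }],
        },
    }


if __name__ == "__main__":
    # All names below are placeholders for illustration.
    manifest = gpu_training_pod(
        "train-job", "example.com/trainer:latest", 8, "shared-dataset"
    )
    print(json.dumps(manifest, indent=2))
```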
TOP 5 LESSONS LEARNED
1. AI is a Data Pipeline
2. Don’t Throw Your Data into a Data Lake
3. Cloud or Not to Cloud?
4. Lies, Damn Lies, and Benchmarks
5. Ideal Data Platform is a Data Hub
Big Data LDN 2018: LESSONS LEARNED FROM DEPLOYING REAL-WORLD AI SYSTEMS
Big Data LDN 2018: CONSISTENT SECURITY, GOVERNANCE AND FLEXIBILITY FOR ALL WO...
Matt Stubbs
 
Big Data LDN 2018: MICROLISE: USING BIG DATA AND AI IN TRANSPORT AND LOGISTICS
Big Data LDN 2018: MICROLISE: USING BIG DATA AND AI IN TRANSPORT AND LOGISTICSBig Data LDN 2018: MICROLISE: USING BIG DATA AND AI IN TRANSPORT AND LOGISTICS
Big Data LDN 2018: MICROLISE: USING BIG DATA AND AI IN TRANSPORT AND LOGISTICS
Matt Stubbs
 
Big Data LDN 2018: EXPERIAN: MAXIMISE EVERY OPPORTUNITY IN THE BIG DATA UNIVERSE
Big Data LDN 2018: EXPERIAN: MAXIMISE EVERY OPPORTUNITY IN THE BIG DATA UNIVERSEBig Data LDN 2018: EXPERIAN: MAXIMISE EVERY OPPORTUNITY IN THE BIG DATA UNIVERSE
Big Data LDN 2018: EXPERIAN: MAXIMISE EVERY OPPORTUNITY IN THE BIG DATA UNIVERSE
Matt Stubbs
 
Big Data LDN 2018: A LOOK INSIDE APPLIED MACHINE LEARNING
Big Data LDN 2018: A LOOK INSIDE APPLIED MACHINE LEARNINGBig Data LDN 2018: A LOOK INSIDE APPLIED MACHINE LEARNING
Big Data LDN 2018: A LOOK INSIDE APPLIED MACHINE LEARNING
Matt Stubbs
 
Big Data LDN 2018: DEUTSCHE BANK: THE PATH TO AUTOMATION IN A HIGHLY REGULATE...
Big Data LDN 2018: DEUTSCHE BANK: THE PATH TO AUTOMATION IN A HIGHLY REGULATE...Big Data LDN 2018: DEUTSCHE BANK: THE PATH TO AUTOMATION IN A HIGHLY REGULATE...
Big Data LDN 2018: DEUTSCHE BANK: THE PATH TO AUTOMATION IN A HIGHLY REGULATE...
Matt Stubbs
 
Big Data LDN 2018: FROM PROLIFERATION TO PRODUCTIVITY: MACHINE LEARNING DATA ...
Big Data LDN 2018: FROM PROLIFERATION TO PRODUCTIVITY: MACHINE LEARNING DATA ...Big Data LDN 2018: FROM PROLIFERATION TO PRODUCTIVITY: MACHINE LEARNING DATA ...
Big Data LDN 2018: FROM PROLIFERATION TO PRODUCTIVITY: MACHINE LEARNING DATA ...
Matt Stubbs
 
Big Data LDN 2018: DATA APIS DON’T DISCRIMINATE
Big Data LDN 2018: DATA APIS DON’T DISCRIMINATEBig Data LDN 2018: DATA APIS DON’T DISCRIMINATE
Big Data LDN 2018: DATA APIS DON’T DISCRIMINATE
Matt Stubbs
 

Big Data LDN 2018: LESSONS LEARNED FROM DEPLOYING REAL-WORLD AI SYSTEMS

  • 1. TOP 5 LESSONS LEARNED IN DEPLOYING AI IN THE REAL WORLD
  • 2. © 2018 PURE STORAGE INC.2 QUESTION ON EVERYONE’S MIND: WHY IS A STORAGE COMPANY TALKING ABOUT AI?
  • 3. EXPLOSION IN ARTIFICIAL INTELLIGENCE: FUELED BY PARALLEL COMPUTE, NEW ALGORITHMS, AND BIG DATA. NEW ALGORITHMS: Massively Parallel, Delivering Superhuman Accuracy. MODERN COMPUTE: Massively Parallel Architecture Driving Performance (GPU: THOUSANDS OF CORES). BIG DATA: Data is the New Oil; 50 Zettabytes Created in 2020.
  • 4. TECHNOLOGIES OF THE BIG BANG: WHAT CUSTOMERS DEPLOY. FRAMEWORKS, GPU SERVER, STORAGE.
  • 5. DATA IS VITAL TO MACHINE LEARNING: OBSERVATION BY PROF. ANDREW NG, AI LUMINARY
  • 6. “We don’t have better algorithms, we just have more data” PETER NORVIG, Engineering Director, Google
  • 7. The AI “hierarchy of needs” (credit: Monica Rogati). ML algorithms: linear & logistic regression, k-means clustering, decision trees, etc. Validation: A/B testing, detecting model drift over time. Data preparation: cleaning, feature identification, exploration, etc. Data acquisition: ingest, transformation, and representation of data for analysis.
  • 8. TOP 5 LESSONS LEARNED: 1. AI is a Data Pipeline
  • 9. WHAT MOST THINK IS AI. NEW POSSIBILITIES: For Nearly Every Industry. FRAMEWORKS: To Get Started. GPU: The Engine.
  • 10. AI IS SO MUCH MORE. “Hidden Technical Debt in Machine Learning Systems”, Google NIPS 2015
  • 11. COMPLEXITIES OF AI IN PRODUCTION. INGEST: From sensors, machines, & user generated. CLEAN & TRANSFORM (CPU Servers): Label, anomaly detection, ETL, prep, stage. EXPLORE (GPU Server): Quickly iterate to converge on models. TRAIN (GPU Production Cluster): Run for hours to days in production cluster. COPY & TRANSFORM (shown three times).
  • 12. WIDE RANGE OF NEEDS IN AI PIPELINE: SIGNIFICANT CHALLENGE TO LEGACY STORAGE
      Stage          | INGEST (From sensors & machines) | CLEAN & TRANSFORM (CPU Servers) | EXPLORE (GPU Server) | TRAIN (GPU Production Cluster)
      Access Pattern | sequential                       | sequential or random            | random               | random
      Access Type    | write                            | read & write                    | read                 | read
      File Size      | mostly large                     | small to large                  | small to large       | mostly small
      Concurrency    | high                             | high                            | low                  | high
  • 13. TOP 5 LESSONS LEARNED: 1. AI is a Data Pipeline 2. Don’t Throw Your Data into Data Lake
  • 14. DATA LAKE OR DATA GRAVEYARD? “We see customers creating big data graveyards, dumping everything into HDFS [Hadoop Distributed File System] and hoping to do something with it down the road. But then they just lose track of what’s there. The main challenge is not creating a data lake, but taking advantage of the opportunities it presents.” PricewaterhouseCoopers Technology Forecast, Issue 1, 2014
  • 15. MODERN ANALYTICS WITH OLD DATA LAKE: SPRAWLING, COMPLEX SILOS & SLOW PERFORMANCE. Each App Locked into Physical Silos. Redundant Data Copies in Silos. Fixed Compute to Storage in Silo. Built for Large, Sequential Data. Optimized for Batch, Not Real-Time. STATIC DATA LAKE NO LONGER VIABLE. [Diagram: HDFS DATA LAKE surrounded by five SILOs]
  • 16. TOP 5 LESSONS LEARNED: 1. AI is a Data Pipeline 2. Don’t Throw Your Data into Data Lake 3. Cloud or Not to Cloud?
  • 17. IT DEPENDS WHERE YOU ARE ON YOUR AI JOURNEY
      Stage | EXPLORATION       | PRODUCTION
      NEED  | Start Immediately | Get New Products & Features to Market Faster than Competition
  • 18. IT DEPENDS WHERE YOU ARE ON YOUR AI JOURNEY
      Stage      | EXPLORATION                     | PRODUCTION
      NEED       | Start Immediately               | Get New Products & Features to Market Faster than Competition
      DON’T NEED | Bogged Down with Infrastructure | Bogged Down by Performance & Cost Inefficiencies
  • 19. IT DEPENDS WHERE YOU ARE ON YOUR AI JOURNEY
      Stage          | EXPLORATION                     | PRODUCTION
      NEED           | Start Immediately               | Get New Products & Features to Market Faster than Competition
      DON’T NEED     | Bogged Down with Infrastructure | Bogged Down by Performance & Cost Inefficiencies
      RECOMMENDATION | Cloud                           | On-Premises
  • 20. TOP 5 LESSONS LEARNED: 1. AI is a Data Pipeline 2. Don’t Throw Your Data into Data Lake 3. Cloud or Not to Cloud? 4. Lies, Damn Lies, and Benchmarks
  • 21. BENCHMARKS DO NOT REFLECT REALITY
      Attribute       | IMAGENET                 | REAL-WORLD AUTONOMOUS CAR COMPANY
      IMAGE SIZE      | 100-200KB                | 2-5MB
      FILE SIZE       | 150MB (Packed TFRecords) | 2-5MB
      MODE OF TESTING | Synthetic (No I/O)       | Read from Storage
  • 22. AI TRAINING SYSTEM: GOAL IS TO KEEP THE GPUs 100% BUSY. FULL TRAINING WORKFLOW spans I/O, CPU, and GPU; steps shown: decode, scale, evaluate, forward-propagation, update, back-propagation. BENCHMARK SETUP:
      Setup #1 (GPU ONLY): Synthetic Data from System RAM into GPUs
      Setup #2 (CPU + GPU): Real Image Data from System RAM Through CPU + GPU
      Setup #3 (I/O + CPU + GPU): Real Image Data from FlashBlade into DGX-1
  • 23. NEAR-LINEAR SCALE DELIVERED: AIRI ENGINEERED FOR MAXIMUM PRODUCTIVITY AND OUT-OF-THE-BOX SCALE. DEEP LEARNING TRAINING: MULTI-NODE USING GPUDIRECT RDMA OVER ETHERNET. Comparing Synthetic Mode, Entire Data in DRAM, Entire Data in FlashBlade.
  • 24. TOP 5 LESSONS LEARNED: 1. AI is a Data Pipeline 2. Don’t Throw Your Data into Data Lake 3. Cloud or Not to Cloud? 4. Lies, Damn Lies, and Benchmarks 5. Ideal Data Platform is a Data Hub
  • 25. IDEAL PLATFORM FOR MODERN ERA: DYNAMIC DATA HUB ARCHITECTED FOR REAL-TIME & ELASTIC DATA. DATA PIPELINE, DATA HUB. “TUNED FOR EVERYTHING”: Small, Random to Large, Seq.; Architected for the Unknown. REAL-TIME: Low Latency Performance for Instant Response. ALL-FLASH: Modern, Ultra-Fast Technology. PARALLEL: No Serial Bottlenecks for Max Throughput. ELASTIC: Grow Non-Disruptively with More App Clusters. SIMPLE: Focus More on Insights, Not Infrastructure.
  • 26. THE INDUSTRY’S FIRST COMPLETE AI-READY INFRASTRUCTURE. HARDWARE: NVIDIA® DGX-1™ | 4x DGX-1 Systems | 4 PFLOPS of DL Performance; PURE FLASHBLADE™ | 15x 17TB Blades | 1.5M IOPS; ARISTA | 2x 100Gb Ethernet Switches with RDMA. SOFTWARE: NVIDIA GPU CLOUD DEEP LEARNING STACK | NVIDIA Optimized Frameworks; AIRI SCALING TOOLKIT | Multi-node Training Made Simple.
  • 27. AI & MODERN ANALYTICS: POWERING ANALYTICS FOR WORLD’S LARGEST PUBLIC HEDGE FUND. [Diagram labels: AI CLEAN & LABEL (CPU Servers), AI EXPLORE (GPU Server), AI TRAIN (GPU Servers), SPARK (CPU Servers), MONGO (CPU Servers)] “Our quants want to test a model, get the results, and then test another one, all day long. So a 10-20X improvement in performance is a game-changer when it comes to creating a time-to-market advantage for us.” Gary Collier, co-CTO, Man AHL
  • 28. ORCHESTRATION WITH OPENSHIFT (KUBERNETES): Monitoring, Load balancing, Scheduling, Resource allocation. OPENSHIFT + PURE PROVIDE RECIPE FOR OPERATIONS AT SCALE.
  • 29. TOP 5 LESSONS LEARNED: 1. AI is a Data Pipeline 2. Don’t Throw Your Data into Data Lake 3. Cloud or Not to Cloud? 4. Lies, Damn Lies, and Benchmarks 5. Ideal Data Platform is a Data Hub
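
The contrast drawn in slides 21 and 22 between synthetic (no I/O) benchmarking and reading real files from storage can be illustrated with a small sketch. This is not the deck's benchmark: the function names, the file count, and the 2 MB file size (taken from the low end of the 2-5MB range on slide 21) are assumptions chosen for illustration, and there is no decode or GPU step. Freshly written files are also likely to be served from the operating system's page cache, so the "storage" figure here will tend to flatter real storage.

```python
import os
import tempfile
import time


def synthetic_batches(num_files, file_size):
    """Analogue of Setup #1: data is produced in memory, so no storage I/O occurs."""
    payload = bytes(file_size)  # one zero-filled buffer, reused for every sample
    for _ in range(num_files):
        yield payload


def storage_batches(paths):
    """Analogue of Setup #3: every sample is read back from the filesystem."""
    for path in paths:
        with open(path, "rb") as handle:
            yield handle.read()


def measure_throughput(batches):
    """Consume an iterable of byte strings; return (total_bytes, MB_per_second)."""
    start = time.perf_counter()
    total = sum(len(batch) for batch in batches)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero interval
    return total, total / elapsed / 1e6


def write_sample_files(directory, num_files, file_size):
    """Create hypothetical sample files of random bytes and return their paths."""
    paths = []
    for index in range(num_files):
        path = os.path.join(directory, f"sample_{index}.bin")
        with open(path, "wb") as handle:
            handle.write(os.urandom(file_size))
        paths.append(path)
    return paths


if __name__ == "__main__":
    num_files, file_size = 8, 2 * 1024 * 1024  # assumed values, not from the deck
    with tempfile.TemporaryDirectory() as directory:
        paths = write_sample_files(directory, num_files, file_size)
        runs = [
            ("synthetic", synthetic_batches(num_files, file_size)),
            ("storage", storage_batches(paths)),
        ]
        for name, batches in runs:
            total, rate = measure_throughput(batches)
            print(f"{name}: {total} bytes at {rate:.1f} MB/s")
```

Pointing `write_sample_files` at a directory on the storage system under test, and using a dataset larger than system RAM, would bring the second measurement closer to the "Read from Storage" mode the slide describes.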