Data Lakes – The Key to a Scalable Data Architecture
May 24th, 2017
Ben Sharma | CEO
ben@zaloni.com
Enabling the data-powered enterprise
•  Industry-leading enterprise data lake management, governance and self-service platform
•  Expert data lake professional services (Design, Implementation, Workshops, Training)
•  Solution-based packaged offerings to simplify implementation and reduce business risk
3 Zaloni Proprietary
Data lakes are central to the modern data architecture
Increased Agility, New Insights, Improved Scalability
•  Store all types of data in its raw format
•  Create Refined, Standardized, Trusted datasets for various use cases
•  Store data for longer periods of time to enable historical analysis
•  Query and access data using a variety of methods
•  Manage streaming and batch data in a converged platform
•  Provide shorter time-to-insight with proper management and governance
Data architecture modernization
[Diagram comparing a Traditional architecture (Sources, ETL, EDW; Data Discovery, Analytics, BI) with a Modern architecture (Various Sources, Streaming, Unstructured Data; Data Lake with Derived (Transformed) data and a Discovery Sandbox; EDW; Data Discovery, Analytics, BI, Data Science)]
Zaloni Confidential and Proprietary - Provided under NDA
Zaloni Big Data Maturity Model
Stages, from least to most value realized:
Stage: Ignore
Descriptor: Data Warehouse
Stage Today: 66% of market
Characteristics and Business Impact:
•  Emphasis on structured data
•  Limited ability to leverage data at scale
•  Business emphasis on retrospective reporting and analysis
•  Strong governance and security policies
•  Slow to accommodate business changes
Stage: Store
Descriptor: Data Swamp
Stage Today: 22% of market
Characteristics and Business Impact:
•  Hadoop on premises or in the Cloud
•  Limited visibility and usability of data
•  Limited corporate oversight & governance
•  Sandbox or Dev Environments
•  Ad hoc and incremental growth of big data applications
•  Ad hoc and exploratory insights for individual use cases
Stage: Manage
Descriptor: Managed Data Lake
Stage Today: 10% of market
Characteristics and Business Impact:
•  Acquire useful data from across the enterprise
•  Improved visibility and understanding via managed ingestion of data and metadata
•  Ensure security and privacy of sensitive data
•  Operationalize data at scale
•  Leverage enterprise governance & security policies
•  Scalable production data lake for new and improved business insights
Stage: Automate
Descriptor: Responsive Data Lake
Stage Today: 2% of market
Characteristics and Business Impact:
•  Self-Service Ingestion & Provisioning
•  360 View of Customer, Product, etc.
•  Enterprise Data Discovery
•  Operationalize analytical models into business fabric
•  Enables immediate data impact on business operations
Stage: Optimize
Descriptor: Self-Organizing Data Lake
Stage Today: 0% of market
Characteristics and Business Impact:
•  Self-improving data lake via machine learning algorithms
•  True democratization of big data and analytics
•  Intelligent data remediation and curation
•  Recommended Data Security and Governance policies
•  Lights out business operations optimized for business success
Data Lake Reference Architecture
Sources:
•  Sensors (or other time series data)
•  Relational Data Stores (OLTP/ODS/DW)
•  Logs (or other unstructured data)
•  Social and shared data
Data Lake zones:
Transient Landing Zone
•  Temporary store of source data
•  Consumers are IT, Data Stewards
•  Implemented in highly regulated industries
Raw Zone
•  Original source data ready for consumption
•  Consumers are ETL developers, data stewards, some data scientists
•  Single source of truth with history
Trusted Zone
•  Standardized on corporate governance/quality policies
•  Consumers are anyone with appropriate role-based access
•  Single version of truth
Refined Zone
•  Data required for LOB-specific views, transformed from existing certified data
•  Consumers are anyone with appropriate role-based access
Sandbox
•  Enables ad hoc, exploratory analytics, experimentation
•  Consumers are anyone with appropriate role-based access
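As a rough illustration of the zone layout on this slide, the sketch below models the zones as an enumeration and checks whether a dataset move follows an allowed path. The zone names come from the slide; the `ALLOWED_MOVES` table and the `can_move` helper are hypothetical, and the specific promotion order (landing to raw to trusted to refined, with the sandbox fed from the persistent zones) is an assumption made for illustration, not something the slide prescribes.

```python
from enum import Enum


class Zone(Enum):
    TRANSIENT_LANDING = "Transient Landing Zone"
    RAW = "Raw Zone"
    TRUSTED = "Trusted Zone"
    REFINED = "Refined Zone"
    SANDBOX = "Sandbox"


# Hypothetical promotion rules (an assumption, not taken from the slide).
ALLOWED_MOVES = {
    Zone.TRANSIENT_LANDING: {Zone.RAW},
    Zone.RAW: {Zone.TRUSTED, Zone.SANDBOX},
    Zone.TRUSTED: {Zone.REFINED, Zone.SANDBOX},
    Zone.REFINED: {Zone.SANDBOX},
    Zone.SANDBOX: set(),
}


def can_move(source: Zone, target: Zone) -> bool:
    """Return True if data may be copied or promoted from source to target."""
    return target in ALLOWED_MOVES[source]
```

Encoding the paths as data rather than code keeps the policy easy to review and change; a regulated deployment might, for example, drop the raw-to-sandbox path.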
Data Lake 360°: Zaloni’s integrated platform for data lakes
1. Enable the lake
•  Leverage the full power of a scale-out architecture with an actionable, scalable data lake
2. Govern the data
•  Improve data visibility, reliability and quality to reduce time-to-insight
•  Safeguard sensitive data and enable regulatory compliance
3. Engage the business
•  Foster a data-driven business through self-service data discovery and preparation
Data Governance
1.  Based on a foundation of metadata management
2.  Lightweight and distributed
3.  Hybrid – top-down and bottom-up approach
•  Centrally governed, critical data elements
•  Regionally governed, departmental data sets
•  Locally governed, data used in specific applications
Gartner Data and Analytics 2017
Metadata Management
•  Central to a well-managed data lake: provides visibility and reliability, and enables data governance
•  Capture and manage operational, technical and business metadata
•  Reduced time to insight for analytics
•  Types of metadata:
§  Where it resides, and how it was ingested
§  What it means, and how it should be interpreted
§  What governance policies apply to it
§  What it's worth, and how its value can be expressed
§  Who accesses and consumes it
§  Which downstream business processes consume it
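One way to picture the operational, technical and business metadata listed on this slide is as a single record per dataset. The sketch below is a minimal illustration only: the `DatasetMetadata` class, its field names, and the `missing_business_metadata` helper are all hypothetical and do not describe any particular product's metadata model.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetMetadata:
    # Technical metadata: where the data resides and how it is structured.
    location: str
    schema: dict[str, str]
    # Operational metadata: how and when it was ingested.
    ingested_by: str
    ingested_at: str
    # Business metadata: meaning, applicable policies, and consumers.
    description: str = ""
    governance_policies: list[str] = field(default_factory=list)
    consumers: list[str] = field(default_factory=list)


def missing_business_metadata(meta: DatasetMetadata) -> list[str]:
    """List the business fields that are still empty for a dataset."""
    gaps = []
    if not meta.description:
        gaps.append("description")
    if not meta.governance_policies:
        gaps.append("governance_policies")
    if not meta.consumers:
        gaps.append("consumers")
    return gaps
```

A check like `missing_business_metadata` could feed a stewardship report that flags datasets ingested without business context.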
Metadata Exchange Framework
1.  Metadata sharing is critical for an integrated approach
2.  Federated approach for metadata collection
3.  Two-way metadata exchange between the Data Lake and other Enterprise repositories
[Diagram: Metadata Exchange Framework, a two-way exchange between the Data Lake and the Enterprise Metadata repository]
Managed Ingestion
•  Ability to ingest vast amounts of data
•  Ability to handle a wide variety of formats (streaming, files, custom) and sources
•  Build in repeatability via automation to pick up incoming data and apply pre-defined processing
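The idea of picking up incoming data and applying pre-defined processing can be sketched as a small registry keyed by format. Everything here is hypothetical: the `PROCESSORS` registry, the `register` decorator, the two sample processors, and the choice to infer the format from the file extension are assumptions for illustration, not a description of how any ingestion product works.

```python
from typing import Callable

# Registry mapping a source format to its pre-defined processing step.
PROCESSORS: dict[str, Callable[[bytes], list[str]]] = {}


def register(fmt: str):
    """Decorator that registers a processing function for a format."""
    def decorator(func):
        PROCESSORS[fmt] = func
        return func
    return decorator


@register("csv")
def process_csv(payload: bytes) -> list[str]:
    # Keep non-empty lines as-is.
    return [line for line in payload.decode("utf-8").splitlines() if line]


@register("log")
def process_log(payload: bytes) -> list[str]:
    # Strip surrounding whitespace and drop blank lines.
    lines = payload.decode("utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def ingest(filename: str, payload: bytes) -> list[str]:
    """Apply the processing registered for the file's extension."""
    fmt = filename.rsplit(".", 1)[-1].lower()
    if fmt not in PROCESSORS:
        raise ValueError(f"no processing registered for format: {fmt}")
    return PROCESSORS[fmt](payload)
```

Because each format's handling is registered once, every later file of that format goes through the same steps, which is the repeatability the slide asks for.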
Data Lineage
•  See how data moves and how it is consumed in the data lake
•  Safeguard data and reduce risk, always knowing where data has come from, where it is, and how it is being used
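Knowing where data has come from amounts to walking a graph of dataset-to-dataset dependencies. The sketch below is a minimal, hypothetical illustration: the `LineageGraph` class and the dataset names in the usage example are invented for this purpose.

```python
from collections import defaultdict


class LineageGraph:
    """Records which datasets were derived from which sources."""

    def __init__(self):
        # Maps each dataset to the set of datasets it was derived from.
        self._parents = defaultdict(set)

    def record(self, source: str, target: str) -> None:
        """Note that `target` was produced from `source`."""
        self._parents[target].add(source)

    def upstream(self, dataset: str) -> set[str]:
        """Return every direct and indirect source of `dataset`."""
        seen = set()
        stack = [dataset]
        while stack:
            current = stack.pop()
            for parent in self._parents.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen
```

For example, after recording `crm_export` to `raw.customers` to `trusted.customers`, calling `upstream("trusted.customers")` returns both earlier datasets.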
Data Quality
•  Rules-based data validation
•  Integration with the managed data pipeline
•  Stats and metrics for reporting and actions
•  Automation, Remediation, Notifications
Future:
•  ML-based classification
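Rules-based validation with stats for reporting can be sketched as a list of named checks applied to each record. The rules, field names (`customer_id`, `amount`), and the shape of the returned summary are all assumptions chosen for illustration.

```python
from typing import Callable

Rule = tuple[str, Callable[[dict], bool]]

# Hypothetical rules; real rules would come from governance policies.
RULES: list[Rule] = [
    ("customer_id is present", lambda r: bool(r.get("customer_id"))),
    (
        "amount is non-negative",
        lambda r: isinstance(r.get("amount"), (int, float)) and r["amount"] >= 0,
    ),
]


def validate(records: list[dict], rules: list[Rule] = RULES) -> dict:
    """Apply every rule to every record and return summary metrics."""
    failures = {name: 0 for name, _ in rules}
    passed = 0
    for record in records:
        ok = True
        for name, check in rules:
            if not check(record):
                failures[name] += 1
                ok = False
        if ok:
            passed += 1
    return {"total": len(records), "passed": passed, "failures": failures}
```

The per-rule failure counts are the kind of metric that could drive a notification or a remediation step further down the pipeline.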
Data Security and Privacy
•  Secure infrastructure for data in motion and data at rest
•  Role-based access control for metadata and data
•  Mask or tokenize data before it is published in the lake for consumption
•  Audit, access logs, alerts and notifications
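To make the difference between masking and tokenizing concrete, here is a minimal sketch using only the Python standard library. The `mask` and `tokenize` helpers are hypothetical, and using a keyed HMAC as the token is one possible approach assumed here for illustration; it yields a one-way token, whereas vault-based tokenization schemes that allow authorized reversal work differently.

```python
import hashlib
import hmac


def mask(value: str, visible: int = 4) -> str:
    """Hide all but the last `visible` characters of a value."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


def tokenize(value: str, key: bytes) -> str:
    """Return a deterministic keyed token for a value.

    The same value and key always give the same token, so tokenized
    columns can still be joined, but the token cannot be reversed.
    """
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
```

Masking keeps part of the value readable for humans, while tokenizing preserves equality between records without exposing the value itself.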
Data Lifecycle Management
1.  Move data from Hot -> Warm -> Cold at the entity level, based on policies/SLAs
2.  Provide data management features to automate scheduling and orchestration of data movement between heterogeneous storage environments
3.  Across on-premises and cloud environments
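A policy-driven Hot -> Warm -> Cold decision can be as simple as comparing an entity's age against thresholds. In the sketch below the `POLICY` thresholds (30 and 365 days), the use of last-access date as the criterion, and the `tier_for` helper are all assumptions for illustration; real policies might key on SLAs, size, or business value instead.

```python
from datetime import date

# Hypothetical policy: maximum days since last access for each tier.
POLICY = {"hot": 30, "warm": 365}


def tier_for(last_accessed: date, today: date, policy: dict = POLICY) -> str:
    """Pick the storage tier for an entity from its last access date."""
    age_days = (today - last_accessed).days
    if age_days <= policy["hot"]:
        return "hot"
    if age_days <= policy["warm"]:
        return "warm"
    return "cold"
```

A scheduler could run this over every entity in the catalog and queue a move whenever the computed tier differs from the current one.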
Data Catalog
•  See what data is available across your enterprise
•  Contribute valuable business information to improve search and usage
•  Use a shopping cart experience to create a sandbox for ad hoc and exploratory analytics
Self-service Data Preparation
•  Blend data in the lake without a costly IT project
•  Perform interactive data-driven transformations
Considerations for a data lake in the cloud
CLOUD and HYBRID ENVIRONMENTS
•  How do you create a cloud-agnostic data lake platform?
•  How do you deploy a cost-effective compute layer?
§  Elastic compute layer
§  Batch and near real-time
•  How do you optimize storage?
§  Support polyglot persistence
§  Data Lifecycle Management
•  How do you optimize network connectivity between ground and cloud?
•  How do you meet enterprise security requirements?
Building your blueprint
1. Questions
Business Drivers AND Business Questions, e.g. Where is fraud occurring? How do I optimize inventory?
2. Inputs
Data, Use Cases, Platform (Subject Areas; Source System; Capabilities, Process; Ingest, Organize, Enrich, Explore)
3. Outcomes
Roadmap; Managed Data Lake; Analytics Strategy
New Buyer’s Guide on Data Lake Management and Governance
•  Zaloni and the industry analyst firm Enterprise Strategy Group collaborated on a guide to help you:
1.  Define evaluation criteria and compare common options
2.  Set up a successful proof of concept (PoC)
3.  Develop an implementation that is future-proofed
Download now at: resources.zaloni.com
Free eBooks to help you future-proof your data lake initiative
Download now at: resources.zaloni.com
DATA LAKE MANAGEMENT AND GOVERNANCE PLATFORM
SELF-SERVICE DATA PLATFORM
UiPath Automation Suite – Cas d'usage d'une NGO internationale basée à GenèveUiPath Automation Suite – Cas d'usage d'une NGO internationale basée à Genève
UiPath Automation Suite – Cas d'usage d'une NGO internationale basée à Genève
UiPathCommunity
 
Building the Customer Identity Community, Together.pdf
Building the Customer Identity Community, Together.pdfBuilding the Customer Identity Community, Together.pdf
Building the Customer Identity Community, Together.pdf
Cheryl Hung
 
fennec fox optimization algorithm for optimal solution
fennec fox optimization algorithm for optimal solutionfennec fox optimization algorithm for optimal solution
fennec fox optimization algorithm for optimal solution
shallal2
 
Slack like a pro: strategies for 10x engineering teams
Slack like a pro: strategies for 10x engineering teamsSlack like a pro: strategies for 10x engineering teams
Slack like a pro: strategies for 10x engineering teams
Nacho Cougil
 
Top-AI-Based-Tools-for-Game-Developers (1).pptx
Top-AI-Based-Tools-for-Game-Developers (1).pptxTop-AI-Based-Tools-for-Game-Developers (1).pptx
Top-AI-Based-Tools-for-Game-Developers (1).pptx
BR Softech
 
IT484 Cyber Forensics_Information Technology
IT484 Cyber Forensics_Information TechnologyIT484 Cyber Forensics_Information Technology
IT484 Cyber Forensics_Information Technology
SHEHABALYAMANI
 
Limecraft Webinar - 2025.3 release, featuring Content Delivery, Graphic Conte...
Limecraft Webinar - 2025.3 release, featuring Content Delivery, Graphic Conte...Limecraft Webinar - 2025.3 release, featuring Content Delivery, Graphic Conte...
Limecraft Webinar - 2025.3 release, featuring Content Delivery, Graphic Conte...
Maarten Verwaest
 
Enterprise Integration Is Dead! Long Live AI-Driven Integration with Apache C...
Enterprise Integration Is Dead! Long Live AI-Driven Integration with Apache C...Enterprise Integration Is Dead! Long Live AI-Driven Integration with Apache C...
Enterprise Integration Is Dead! Long Live AI-Driven Integration with Apache C...
Markus Eisele
 
AsyncAPI v3 : Streamlining Event-Driven API Design
AsyncAPI v3 : Streamlining Event-Driven API DesignAsyncAPI v3 : Streamlining Event-Driven API Design
AsyncAPI v3 : Streamlining Event-Driven API Design
leonid54
 
The No-Code Way to Build a Marketing Team with One AI Agent (Download the n8n...
The No-Code Way to Build a Marketing Team with One AI Agent (Download the n8n...The No-Code Way to Build a Marketing Team with One AI Agent (Download the n8n...
The No-Code Way to Build a Marketing Team with One AI Agent (Download the n8n...
SOFTTECHHUB
 
Optima Cyber - Maritime Cyber Security - MSSP Services - Manolis Sfakianakis ...
Optima Cyber - Maritime Cyber Security - MSSP Services - Manolis Sfakianakis ...Optima Cyber - Maritime Cyber Security - MSSP Services - Manolis Sfakianakis ...
Optima Cyber - Maritime Cyber Security - MSSP Services - Manolis Sfakianakis ...
Mike Mingos
 
Unlocking Generative AI in your Web Apps
Unlocking Generative AI in your Web AppsUnlocking Generative AI in your Web Apps
Unlocking Generative AI in your Web Apps
Maximiliano Firtman
 
Artificial_Intelligence_in_Everyday_Life.pptx
Artificial_Intelligence_in_Everyday_Life.pptxArtificial_Intelligence_in_Everyday_Life.pptx
Artificial_Intelligence_in_Everyday_Life.pptx
03ANMOLCHAURASIYA
 
Everything You Need to Know About Agentforce? (Put AI Agents to Work)
Everything You Need to Know About Agentforce? (Put AI Agents to Work)Everything You Need to Know About Agentforce? (Put AI Agents to Work)
Everything You Need to Know About Agentforce? (Put AI Agents to Work)
Cyntexa
 
Could Virtual Threads cast away the usage of Kotlin Coroutines - DevoxxUK2025
Could Virtual Threads cast away the usage of Kotlin Coroutines - DevoxxUK2025Could Virtual Threads cast away the usage of Kotlin Coroutines - DevoxxUK2025
Could Virtual Threads cast away the usage of Kotlin Coroutines - DevoxxUK2025
João Esperancinha
 
Integrating FME with Python: Tips, Demos, and Best Practices for Powerful Aut...
Integrating FME with Python: Tips, Demos, and Best Practices for Powerful Aut...Integrating FME with Python: Tips, Demos, and Best Practices for Powerful Aut...
Integrating FME with Python: Tips, Demos, and Best Practices for Powerful Aut...
Safe Software
 
Smart Investments Leveraging Agentic AI for Real Estate Success.pptx
Smart Investments Leveraging Agentic AI for Real Estate Success.pptxSmart Investments Leveraging Agentic AI for Real Estate Success.pptx
Smart Investments Leveraging Agentic AI for Real Estate Success.pptx
Seasia Infotech
 
AI Agents at Work: UiPath, Maestro & the Future of Documents
AI Agents at Work: UiPath, Maestro & the Future of DocumentsAI Agents at Work: UiPath, Maestro & the Future of Documents
AI Agents at Work: UiPath, Maestro & the Future of Documents
UiPathCommunity
 
UiPath Automation Suite – Cas d'usage d'une NGO internationale basée à Genève
UiPath Automation Suite – Cas d'usage d'une NGO internationale basée à GenèveUiPath Automation Suite – Cas d'usage d'une NGO internationale basée à Genève
UiPath Automation Suite – Cas d'usage d'une NGO internationale basée à Genève
UiPathCommunity
 
Building the Customer Identity Community, Together.pdf
Building the Customer Identity Community, Together.pdfBuilding the Customer Identity Community, Together.pdf
Building the Customer Identity Community, Together.pdf
Cheryl Hung
 

Data Lakes - The Key to a Scalable Data Architecture

  • 1. Data Lakes – The Key to a Scalable Data Architecture. May 24th, 2017. Ben Sharma | CEO | ben@zaloni.com
  • 2. Enabling the data-powered enterprise
      •  Industry-leading enterprise data lake management, governance and self-service platform
      •  Expert data lake professional services (Design, Implementation, Workshops, Training)
      •  Solution-based packaged offerings to simplify implementation and reduce business risk
  • 3. Data lakes are central to the modern data architecture
      Benefits: Increased Agility, New Insights, Improved Scalability
      •  Store all types of data in its raw format
      •  Create Refined, Standardized, Trusted datasets for various use cases
      •  Store data for longer periods of time to enable historical analysis
      •  Query and access data using a variety of methods
      •  Manage streaming and batch data in a converged platform
      •  Provide shorter time-to-insight with proper management and governance
  • 4. Data architecture modernization: Traditional vs. Modern
      [Diagram. Labels in the order extracted: Data Lake; Sources; ETL; EDW; Derived (Transformed); Discovery Sandbox; EDW; Streaming; Unstructured Data; Various Sources; Data Discovery, Analytics, BI, Data Science; Data Discovery, Analytics, BI]
  • 5. Zaloni Big Data Maturity Model
      Row labels: Stage, Characteristics, Descriptor, Stage Today, Business Impact. Axis label: Value Realized. Stages are listed here from Ignore to Optimize.
      Ignore: Data Warehouse (66% of market)
        •  Emphasis on structured data
        •  Limited ability to leverage data at scale
        •  Business emphasis on retrospective reporting and analysis
        •  Strong governance and security policies
        •  Slow to accommodate business changes
      Store: Data Swamp (22% of market)
        •  Hadoop on premises or in the Cloud
        •  Limited visibility and usability of data
        •  Limited corporate oversight & governance
        •  Sandbox or Dev Environments
        •  Ad hoc and incremental growth of big data applications
        •  Ad-hoc and exploratory insights for individual use cases
      Manage: Managed Data Lake (10% of market)
        •  Acquire useful data from across the enterprise
        •  Improved visibility and understanding via managed ingestion of data and metadata
        •  Ensure security and privacy of sensitive data
        •  Operationalize data at scale
        •  Leverage enterprise governance & security policies
        •  Scalable production data lake for new and improved business insights
      Automate: Responsive Data Lake (2% of market)
        •  Self-Service Ingestion & Provisioning
        •  360 View of Customer, Product, etc.
        •  Enterprise Data Discovery
        •  Operationalize analytical models into business fabric
        •  Enables immediate data impact on business operations
      Optimize: Self-Organizing Data Lake (0% of market)
        •  Self-improving data lake via machine learning algorithms
        •  True democratization of big data and analytics
        •  Intelligent data remediation and curation
        •  Recommended Data Security and Governance policies
        •  Lights out business operations optimized for business success
  • 6. Data Lake Reference Architecture
      Sources: Sensors (or other time series data); Relational Data Stores (OLTP/ODS/DW); Logs (or other unstructured data); Social and shared data
      Transient Landing Zone
        •  Temporary store of source data
        •  Consumers are IT, Data Stewards
        •  Implemented in highly regulated industries
      Raw Zone
        •  Original source data ready for consumption
        •  Consumers are ETL developers, data stewards, some data scientists
        •  Single source of truth with history
      Trusted Zone
        •  Standardized on corporate governance/quality policies
        •  Consumers are anyone with appropriate role-based access
        •  Single version of truth
      Refined Zone
        •  Data required for LOB-specific views, transformed from existing certified data
        •  Consumers are anyone with appropriate role-based access
      Sandbox
        •  Enables ad-hoc, exploratory analytics, experimentation
        •  Consumers are anyone with appropriate role-based access
  • 7. Data Lake 360°: Zaloni’s integrated platform for data lakes
      1. Enable the lake
        •  Leverage the full power of a scale-out architecture with an actionable, scalable data lake
      2. Govern the data
        •  Improve data visibility, reliability and quality to reduce time-to-insight
        •  Safeguard sensitive data and enable regulatory compliance
      3. Engage the business
        •  Foster a data-driven business through self-service data discovery and preparation
  • 8. Data Governance
      1.  Based on a foundation of metadata management
      2.  Lightweight and distributed
      3.  Hybrid: top down and bottom up approach
      Governance tiers (source: Gartner Data and Analytics 2017):
        •  Centrally governed, critical data elements
        •  Regionally governed, departmental data sets
        •  Locally governed, data used in specific applications
  • 9. Metadata Management
      •  Central to a well-managed data lake: provides visibility and reliability, and enables data governance
      •  Capture and manage operational, technical and business metadata
      •  Reduced time to insight for analytics
      •  Types of metadata:
        §  Where it resides, and how it was ingested
        §  What it means, and how it should be interpreted
        §  What governance policies apply to it
        §  What it's worth, and how its value can be expressed
        §  Who accesses and consumes it
        §  Which downstream business processes consume it
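The metadata types listed on this slide can be pictured as fields on a catalog entry. The following is a minimal sketch, not Zaloni's schema: the class name, the field names, the grouping of fields into technical, business and operational metadata, and the sample paths are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetMetadata:
    """Hypothetical catalog entry covering the metadata types named on the slide."""
    name: str
    # Technical metadata (assumed grouping): where the data resides, how it was ingested.
    location: str
    ingestion_method: str
    # Business metadata (assumed grouping): what the data means, which policies apply.
    description: str = ""
    governance_policies: list[str] = field(default_factory=list)
    # Operational metadata (assumed grouping): who consumes it, which processes use it.
    consumers: list[str] = field(default_factory=list)
    downstream_processes: list[str] = field(default_factory=list)


def find_by_policy(catalog: list[DatasetMetadata], policy: str) -> list[str]:
    """Return the names of datasets that a given governance policy applies to."""
    return [m.name for m in catalog if policy in m.governance_policies]


# Illustrative entries; names and paths are made up.
catalog = [
    DatasetMetadata("customers", "/raw/crm/customers", "batch",
                    governance_policies=["pii-masking"]),
    DatasetMetadata("clickstream", "/raw/web/clicks", "streaming"),
]
print(find_by_policy(catalog, "pii-masking"))
```

Keeping governance policies on the same record as location and consumers is one way to let a single lookup answer both "where is it" and "what rules apply"; a production catalog would persist this in a metadata store rather than in memory.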
  • 10. Metadata Exchange Framework
      1.  Metadata sharing is critical for an integrated approach
      2.  Federated approach for metadata collection
      3.  Two-way metadata exchange between the Data Lake and other Enterprise repositories
      [Diagram: Data Lake and Enterprise Metadata repository connected by a two-way exchange]
  • 11. Managed Ingestion
      •  Ability to ingest vast amounts of data
      •  Ability to handle a wide variety of formats (streaming, files, custom) and sources
      •  Build in repeatability via automation to pick up incoming data and apply pre-defined processing
  • 12. Data Lineage
      •  See how data moves and how it is consumed in the data lake
      •  Safeguard data and reduce risk, always knowing where data has come from, where it is, and how it is being used
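"Knowing where data has come from" amounts to walking a graph of source-to-derived relationships. The sketch below is an illustrative in-memory version under that assumption; the `LineageGraph` class and the dataset names are hypothetical, and a real platform would capture these edges automatically from its pipelines.

```python
from collections import defaultdict


class LineageGraph:
    """Hypothetical in-memory lineage store: each dataset maps to its direct sources."""

    def __init__(self) -> None:
        self._parents: dict[str, set[str]] = defaultdict(set)

    def record(self, source: str, derived: str) -> None:
        """Note that `derived` was produced from `source`."""
        self._parents[derived].add(source)

    def upstream(self, dataset: str) -> set[str]:
        """All datasets that `dataset` was directly or indirectly derived from."""
        seen: set[str] = set()
        stack = [dataset]
        while stack:
            # .get avoids creating empty entries for datasets with no recorded sources.
            for parent in self._parents.get(stack.pop(), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


graph = LineageGraph()
graph.record("raw.orders", "refined.orders")
graph.record("refined.orders", "trusted.sales")
graph.record("raw.customers", "trusted.sales")
print(sorted(graph.upstream("trusted.sales")))
```

The `seen` set keeps the traversal from revisiting a dataset, so the walk terminates even if a cycle is recorded by mistake.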
  • 13. Data Quality
      •  Rules-based data validation
      •  Integration with the managed data pipeline
      •  Stats and metrics for reporting and actions
      •  Automation, Remediation, Notifications
      Future:
      •  ML-based classification
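Rules-based validation with per-rule statistics can be sketched in a few lines. This is an illustration only: the rule names, the record layout (plain dicts with `id` and `amount` keys) and the `validate` helper are assumptions, not part of any Zaloni product API.

```python
from typing import Callable

# A rule is a name plus a predicate over one record (a plain dict in this sketch).
Rule = tuple[str, Callable[[dict], bool]]

# Hypothetical rules for a record with "id" and "amount" fields.
RULES: list[Rule] = [
    ("id_present", lambda r: r.get("id") is not None),
    ("amount_non_negative",
     lambda r: isinstance(r.get("amount"), (int, float)) and r["amount"] >= 0),
]


def validate(records: list[dict], rules: list[Rule]):
    """Split records into passed and failed, and count failures per rule."""
    passed, failed = [], []
    failures = {name: 0 for name, _ in rules}
    for record in records:
        broken = [name for name, check in rules if not check(record)]
        for name in broken:
            failures[name] += 1
        (failed if broken else passed).append(record)
    return passed, failed, failures


records = [
    {"id": 1, "amount": 10.0},
    {"id": None, "amount": 5},
    {"id": 3, "amount": -2},
]
passed, failed, failures = validate(records, RULES)
print(len(passed), len(failed), failures)
```

The failed list is what a remediation or notification step would act on, and the per-rule counts are the kind of metric a quality report would track over time.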
  • 14. Data Security and Privacy
      •  Secure infrastructure for data in motion and data at rest
      •  Role-based access control for the metadata and the data
      •  Mask or tokenize data before it is published in the lake for consumption
      •  Audit, access logs, alerts and notifications
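The difference between masking and tokenizing a field can be shown with a short sketch. Two assumptions are worth flagging: the tokenizer here is a keyed HMAC-SHA256 digest, which is a one-way stand-in rather than a reversible token vault, and the key, field names and helper functions are hypothetical.

```python
import hashlib
import hmac

# Hypothetical demo key; in practice this would come from a secret store.
SECRET_KEY = b"demo-key-not-for-production"


def mask(value: str, visible: int = 4) -> str:
    """Replace all but the last `visible` characters with '*'."""
    visible = max(visible, 0)
    if len(value) <= visible:
        return "*" * len(value)
    hidden = len(value) - visible
    return "*" * hidden + value[hidden:]


def tokenize(value: str, key: bytes = SECRET_KEY) -> str:
    """Deterministic keyed token: equal inputs give equal tokens, so joins still work."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def protect(record: dict, mask_fields: set[str], token_fields: set[str]) -> dict:
    """Return a copy of the record with sensitive fields masked or tokenized."""
    out = dict(record)
    for name in out:
        if name in mask_fields:
            out[name] = mask(str(out[name]))
        elif name in token_fields:
            out[name] = tokenize(str(out[name]))
    return out


row = {"name": "Ada", "card": "4111111111111111", "email": "ada@example.com"}
safe = protect(row, mask_fields={"card"}, token_fields={"email"})
print(safe["card"])
```

Masking keeps a human-readable hint of the value, while a deterministic token hides the value entirely yet still lets two datasets be joined on the protected column.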
  • 15. Data Lifecycle Management
      1.  Hot -> Warm -> Cold on an entity level based on policies/SLAs
      2.  Provide data management features to automate scheduling and orchestration of data movement between heterogeneous storage environments
      3.  Across on-premises and cloud environments
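A per-entity Hot -> Warm -> Cold policy reduces to comparing an entity's age against thresholds. The sketch below assumes the policy is expressed as days since last access; the `TierPolicy` class, the threshold values and the choice of last-access time as the trigger are illustrative assumptions, since the slide does not say how policies or SLAs are defined.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TierPolicy:
    """Hypothetical per-entity policy: age thresholds (in days) for each move."""
    warm_after_days: int
    cold_after_days: int


def storage_tier(last_accessed: date, today: date, policy: TierPolicy) -> str:
    """Pick hot, warm or cold from the time since the entity was last accessed."""
    age = (today - last_accessed).days
    if age >= policy.cold_after_days:
        return "cold"
    if age >= policy.warm_after_days:
        return "warm"
    return "hot"


# Illustrative thresholds: warm after 30 days, cold after a year.
policy = TierPolicy(warm_after_days=30, cold_after_days=365)
today = date(2017, 5, 24)
for accessed in (date(2017, 5, 20), date(2017, 3, 1), date(2015, 1, 1)):
    print(accessed, storage_tier(accessed, today, policy))
```

A scheduler would run a check like this per entity and hand any entity whose tier changed to the job that moves data between storage environments.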
  • 16. Data Catalog
      •  See what data is available across your enterprise
      •  Contribute valuable business information to improve search and usage
      •  Use a shopping cart experience to create a sandbox for ad-hoc and exploratory analytics
  • 17. Self-service Data Preparation
      •  Blend data in the lake without a costly IT project
      •  Perform interactive data-driven transformations
  • 18. Cloud and hybrid environments: considerations for a data lake in the cloud
      •  How do you create a cloud-agnostic data lake platform?
      •  How do you deploy a cost-effective compute layer?
        §  Elastic compute layer
        §  Batch and near real-time
      •  How do you optimize storage?
        §  Support polyglot persistence
        §  Data Lifecycle Management
      •  How do you optimize network connectivity from ground to cloud?
      •  How do you meet enterprise security requirements?
  • 19. Building your blueprint
      [Diagram with three columns: 1. Questions, 2. Inputs, 3. Outcomes. Labels in the order extracted: Business Drivers AND Business Questions (e.g. Where is fraud occurring? How do I optimize inventory?); Data; Use Cases; Platform; Subject Areas; Source System; Capabilities, Process; Ingest, Organize, Enrich, Explore; Roadmap; Managed Data Lake; Analytics Strategy]
  • 20. New Buyer’s Guide on Data Lake Management and Governance
      •  Zaloni and industry analyst firm Enterprise Strategy Group collaborated on a guide to help you:
        1.  Define evaluation criteria and compare common options
        2.  Set up a successful proof of concept (PoC)
        3.  Develop an implementation that is future-proofed
      Download now at: resources.zaloni.com
  • 21. Free eBooks to help you future-proof your data lake initiative. Download now at: resources.zaloni.com
  • 22. Data Lake Management and Governance Platform; Self-Service Data Platform