SlideShare a Scribd company logo
An adaptive and eventually self-healing framework for geo-
distributed real-time data ingestion
Angad Singh
InMobi
The problem domain
Scale
● 15 billion events per day (post filtering)
● 1.5+ billion users, 200 million per day
● 4 geographically distributed data centers (DCs)
● User’s request may land on non-local DC
Ingestion requirements
● multiple tenants, multiple schemas per tenant
● batch, stream, micro-batch and on-demand ingestion
● 20+ streams, 100+ data types
● need to ingest, transform, validate and aggregate this data
● need to ingest streaming data in real-time (<1 min) for ad-serving/targeting use
cases (strict SLA)
The problem domain
Usage/serving requirements
● need to pivot this data by user, activity type and other primary keys
● serve an aggregated view (profile) at the end in < 5ms p99 latency
● need both real-time serving of the view
● as well as batch summaries for analytics, inference algorithms, feedback loops
● need to be resilient to failure, absolutely no room for data loss/lag in ingestion
Data arrival, volume and velocity
● data may be received out of order, or duplicated
● data can arrive in periodic batches or real-time/streaming or once in a while
● data may arrive in bursts or trickle slowly in some streams (autoscale)
● user data may be received in any DC, but needs to be collectively available in a
single DC
The problem domain
Multi-tenancy
● Quotas
● Rate limiting/SLAs
● Isolation
Manageability
● need to be self-serve, flexible for specific changes in the flow, easily deployable
● may need online migration, reprocessing, etc. of data
● hassle-free schema evolution across the stack
● monitoring, visibility, operability aspects for all of the above
The architecture
Serving layer
(user store)
aerospike
cluster
API
dedup, aggregate, business rules
Ratelimiting/quotas
API
Ad serving
<5ms,99.95%success
notifications
pubsub
(kafka)
notification
listeners (storm)
periodic
dumps
streaming
offline snapshot
store (HDFS)
batch inference jobs
(MR/spark)
analytics engine
(cubes, lens)
real-time
enrichment
on user
engagement
Ingestion layer
globaldcglobaldc
offline snapshot
store (HDFS)
globaldc
localdc
localdc
localdc
upstreamingestionsources
batchsources
streamingsources
adaptors
localdc
adaptors
adaptors
(MR/storm)
localdclocaldc
routers
localdc
routers
routers
(MR/storm)
localdclocaldc
sinkssinks
sinks
(MR/storm)
remotedc
Ingestion Service
orchestrate/manage
remotedc remotedc
Architecture
DC1 (global)
DC2 (slave)
DC3 (slave)
adaptors
(MR)
adaptors
(storm)
adaptors
(storm)
routers
(MR)
routers
(storm)
routers
(storm)
User-Colo
Metadata
(aerospike)
User-Colo
Metadata
(aerospike)
User-Colo
Metadata
(aerospike)
custom
replication
(contains userid)
(contains userid)
(contains userid)
sinks
getColo(userid)
sinks
sinks
Kafka-data-replicator
(stormtopology)
global colo tagger
(storm)
tag not found
tagged
data
tag found
write tag
User Store
API
History
Profile
User Store
API
History
Profile
User Store
API
History
Profile
XDR
(profile)
Cross-DC
architecture
custom
replication
DC1 (global)
DC2 (slave)
DC3 (slave)
adaptors
(MR)
adaptors
(storm)
adaptors
(storm)
routers
(storm)
routers
(storm)
routers
(storm)
User-Colo
Metadata
(aerospike)
User-Colo
Metadata
(aerospike)
User-Colo
Metadata
(aerospike)
XDRXDR
(contains userid)
(contains userid)
(contains userid)
sinks
getColo(userid)
sinks
sinks
Kafka-data-replicator
(stormtopology)
global colo tagger
(storm)
tag not found
tagged
data
tag found
write tag
User Store
API
History
Profile
User Store
API
History
Profile
User Store
API
History
Profile
XDR
(profile)
Mapper
Comparison to
map-reducePartitioner Shuffler Reducer
The ingestion layer
An adaptive and eventually self healing framework for geo-distributed real-time data ingestion
Current Features
Business-agnostic APIs
● Built on simple RESTful APIs: Schema, Feed, Sink, Source, Flow, Driver, Data router, Adaptor
● Unified APIs for doing batch, streaming, micro-batch ingestion.
● Self-serve system which provides rule validation, metrics, etc. and makes the expression of sources,
sinks and flows easy with custom DSL.
Platform-agnostic Flow Execution
● Pluggable execution engine (storm, hadoop, spark) - provides a Driver API
● Uses falcon for batch scheduling, in-built scheduler for streaming drivers (storm, etc.)
Serialization support
● Pluggable schema serde support (thrift, avro)
Current Features
Schema management
● Schema is a first class citizen.
● Contract between source, sink and flow all based on and validated against schema
● Schema versioning and compatibility checks.
● Error-free schema evolution across data flows
● Clean abstractions to centrally manage all the schemas, data sources/feeds, sinks (key value store,
HDFS, etc.) and data flows (storm topologies, MR jobs) which are part of the ingestion pipelines
Manageability, operability
● All entities - schemas, sinks and flows - can be updated online without any downtime.
● Retries, error handling, metrics, orchestration hooks, etc. come standard
Out of the box support for
● Cross-colo flow chaining
● Data routing
● Transformation, validation, conversion
● All based on pluggable code
An adaptive and eventually self healing framework for geo-distributed real-time data ingestion
An adaptive and eventually self healing framework for geo-distributed real-time data ingestion
The problems we’ve seen
Storm
● as usual, lot of knobs to tune based on lot of metrics: workers, threads, tasks, acks, max
spout pending, buffer sizes, xmx, num slots, execute/process/ack latency, capacity, etc.
● debugging storm topology’s isn’t easy: threads, workers, shared logs, shuffling of data
between workers, netty, the ack system, etc.
● storm (0.9.x) doesn’t like heterogenous load: unbalanced distribution between supervisors.
heavy topologies can choke each other. rebalancing not fully resource aware (1.x tries to
solve this)
● no rolling upgrades, supervisor failures cause unrecoverable errors
● zookeeper issues: too many executors leads to worker heartbeat update failure to zk.
● storm-kafka issue: storm-kafka spout unaware of purging (earliestOffset update)
● storm-kafka issue: invisible data loss
● retries should done cautiously
● etc
Kafka
● topic deletion asynchronous, slow
● tuning num partitions manually
● bad consumers can cause excessive logging on brokers
Features under development
● Autoscaling flows - rebalance storm topology based on spout lag, priority and
current throughput (or bolt capacity) - runtime metrics or linear regression on
historical metrics
● Streaming and batch compaction / dedup of data based on domain specific
rules
● Automatic fallback from streaming to batch ingestion in case of huge
backlogs, for low priority ingestions
● Dynamic rerouting / sharding of data between DCs for load balancing cross-
DC flows
● Eventual self-correction of data based on validations on the aggregated view
(data received from multiple streams)
● Data lineage/auditing
● Backfill management
Ad

More Related Content

What's hot (19)

Monitoring Cassandra with graphite using Yammer Coda-Hale Library
Monitoring Cassandra with graphite using Yammer Coda-Hale LibraryMonitoring Cassandra with graphite using Yammer Coda-Hale Library
Monitoring Cassandra with graphite using Yammer Coda-Hale Library
Nader Ganayem
 
Windowing in apex
Windowing in apexWindowing in apex
Windowing in apex
Yogi Devendra Vyavahare
 
Devops kc
Devops kcDevops kc
Devops kc
Philip Thompson
 
Optimistic Algorithm and Concurrency Control Algorithm
Optimistic Algorithm and Concurrency Control AlgorithmOptimistic Algorithm and Concurrency Control Algorithm
Optimistic Algorithm and Concurrency Control Algorithm
Shounak Katyayan
 
Virtual Flink Forward 2020: Autoscaling Flink at Netflix - Timothy Farkas
Virtual Flink Forward 2020: Autoscaling Flink at Netflix - Timothy FarkasVirtual Flink Forward 2020: Autoscaling Flink at Netflix - Timothy Farkas
Virtual Flink Forward 2020: Autoscaling Flink at Netflix - Timothy Farkas
Flink Forward
 
Fault Tolerance and Processing Semantics in Apache Apex
Fault Tolerance and Processing Semantics in Apache ApexFault Tolerance and Processing Semantics in Apache Apex
Fault Tolerance and Processing Semantics in Apache Apex
Apache Apex Organizer
 
BigDataSpain 2016: Introduction to Apache Apex
BigDataSpain 2016: Introduction to Apache ApexBigDataSpain 2016: Introduction to Apache Apex
BigDataSpain 2016: Introduction to Apache Apex
Thomas Weise
 
Stateful stream processing with Apache Flink
Stateful stream processing with Apache FlinkStateful stream processing with Apache Flink
Stateful stream processing with Apache Flink
Knoldus Inc.
 
Apache Samza - New features in the upcoming Samza release 0.10.0
Apache Samza - New features in the upcoming Samza release 0.10.0Apache Samza - New features in the upcoming Samza release 0.10.0
Apache Samza - New features in the upcoming Samza release 0.10.0
Navina Ramesh
 
Stream processing - Apache flink
Stream processing - Apache flinkStream processing - Apache flink
Stream processing - Apache flink
Renato Guimaraes
 
An Introduction to Prometheus
An Introduction to PrometheusAn Introduction to Prometheus
An Introduction to Prometheus
Evgeny Shmarnev
 
Data-Oriented Programming with Clojure and Jackdaw (Charles Reese, Funding Ci...
Data-Oriented Programming with Clojure and Jackdaw (Charles Reese, Funding Ci...Data-Oriented Programming with Clojure and Jackdaw (Charles Reese, Funding Ci...
Data-Oriented Programming with Clojure and Jackdaw (Charles Reese, Funding Ci...
confluent
 
Monitoring and Alerting with InfluxDB 2.0 | Deniz Kusefoglu & Nate Isley | In...
Monitoring and Alerting with InfluxDB 2.0 | Deniz Kusefoglu & Nate Isley | In...Monitoring and Alerting with InfluxDB 2.0 | Deniz Kusefoglu & Nate Isley | In...
Monitoring and Alerting with InfluxDB 2.0 | Deniz Kusefoglu & Nate Isley | In...
InfluxData
 
From Startup to Mature Company: PostgreSQL Tips and techniques
From Startup to Mature Company:  PostgreSQL Tips and techniquesFrom Startup to Mature Company:  PostgreSQL Tips and techniques
From Startup to Mature Company: PostgreSQL Tips and techniques
John Ashmead
 
Flink Forward Berlin 2017: Stefan Richter - A look at Flink's internal data s...
Flink Forward Berlin 2017: Stefan Richter - A look at Flink's internal data s...Flink Forward Berlin 2017: Stefan Richter - A look at Flink's internal data s...
Flink Forward Berlin 2017: Stefan Richter - A look at Flink's internal data s...
Flink Forward
 
Distributed systems vs compositionality
Distributed systems vs compositionalityDistributed systems vs compositionality
Distributed systems vs compositionality
Roland Kuhn
 
Low latency stream processing with jet
Low latency stream processing with jetLow latency stream processing with jet
Low latency stream processing with jet
StreamNative
 
Harvesting the Power of Samza in LinkedIn's Feed
Harvesting the Power of Samza in LinkedIn's FeedHarvesting the Power of Samza in LinkedIn's Feed
Harvesting the Power of Samza in LinkedIn's Feed
Mohamed El-Geish
 
HBaseCon2017 HBase at Xiaomi
HBaseCon2017 HBase at XiaomiHBaseCon2017 HBase at Xiaomi
HBaseCon2017 HBase at Xiaomi
HBaseCon
 
Monitoring Cassandra with graphite using Yammer Coda-Hale Library
Monitoring Cassandra with graphite using Yammer Coda-Hale LibraryMonitoring Cassandra with graphite using Yammer Coda-Hale Library
Monitoring Cassandra with graphite using Yammer Coda-Hale Library
Nader Ganayem
 
Optimistic Algorithm and Concurrency Control Algorithm
Optimistic Algorithm and Concurrency Control AlgorithmOptimistic Algorithm and Concurrency Control Algorithm
Optimistic Algorithm and Concurrency Control Algorithm
Shounak Katyayan
 
Virtual Flink Forward 2020: Autoscaling Flink at Netflix - Timothy Farkas
Virtual Flink Forward 2020: Autoscaling Flink at Netflix - Timothy FarkasVirtual Flink Forward 2020: Autoscaling Flink at Netflix - Timothy Farkas
Virtual Flink Forward 2020: Autoscaling Flink at Netflix - Timothy Farkas
Flink Forward
 
Fault Tolerance and Processing Semantics in Apache Apex
Fault Tolerance and Processing Semantics in Apache ApexFault Tolerance and Processing Semantics in Apache Apex
Fault Tolerance and Processing Semantics in Apache Apex
Apache Apex Organizer
 
BigDataSpain 2016: Introduction to Apache Apex
BigDataSpain 2016: Introduction to Apache ApexBigDataSpain 2016: Introduction to Apache Apex
BigDataSpain 2016: Introduction to Apache Apex
Thomas Weise
 
Stateful stream processing with Apache Flink
Stateful stream processing with Apache FlinkStateful stream processing with Apache Flink
Stateful stream processing with Apache Flink
Knoldus Inc.
 
Apache Samza - New features in the upcoming Samza release 0.10.0
Apache Samza - New features in the upcoming Samza release 0.10.0Apache Samza - New features in the upcoming Samza release 0.10.0
Apache Samza - New features in the upcoming Samza release 0.10.0
Navina Ramesh
 
Stream processing - Apache flink
Stream processing - Apache flinkStream processing - Apache flink
Stream processing - Apache flink
Renato Guimaraes
 
An Introduction to Prometheus
An Introduction to PrometheusAn Introduction to Prometheus
An Introduction to Prometheus
Evgeny Shmarnev
 
Data-Oriented Programming with Clojure and Jackdaw (Charles Reese, Funding Ci...
Data-Oriented Programming with Clojure and Jackdaw (Charles Reese, Funding Ci...Data-Oriented Programming with Clojure and Jackdaw (Charles Reese, Funding Ci...
Data-Oriented Programming with Clojure and Jackdaw (Charles Reese, Funding Ci...
confluent
 
Monitoring and Alerting with InfluxDB 2.0 | Deniz Kusefoglu & Nate Isley | In...
Monitoring and Alerting with InfluxDB 2.0 | Deniz Kusefoglu & Nate Isley | In...Monitoring and Alerting with InfluxDB 2.0 | Deniz Kusefoglu & Nate Isley | In...
Monitoring and Alerting with InfluxDB 2.0 | Deniz Kusefoglu & Nate Isley | In...
InfluxData
 
From Startup to Mature Company: PostgreSQL Tips and techniques
From Startup to Mature Company:  PostgreSQL Tips and techniquesFrom Startup to Mature Company:  PostgreSQL Tips and techniques
From Startup to Mature Company: PostgreSQL Tips and techniques
John Ashmead
 
Flink Forward Berlin 2017: Stefan Richter - A look at Flink's internal data s...
Flink Forward Berlin 2017: Stefan Richter - A look at Flink's internal data s...Flink Forward Berlin 2017: Stefan Richter - A look at Flink's internal data s...
Flink Forward Berlin 2017: Stefan Richter - A look at Flink's internal data s...
Flink Forward
 
Distributed systems vs compositionality
Distributed systems vs compositionalityDistributed systems vs compositionality
Distributed systems vs compositionality
Roland Kuhn
 
Low latency stream processing with jet
Low latency stream processing with jetLow latency stream processing with jet
Low latency stream processing with jet
StreamNative
 
Harvesting the Power of Samza in LinkedIn's Feed
Harvesting the Power of Samza in LinkedIn's FeedHarvesting the Power of Samza in LinkedIn's Feed
Harvesting the Power of Samza in LinkedIn's Feed
Mohamed El-Geish
 
HBaseCon2017 HBase at Xiaomi
HBaseCon2017 HBase at XiaomiHBaseCon2017 HBase at Xiaomi
HBaseCon2017 HBase at Xiaomi
HBaseCon
 

Viewers also liked (12)

Planning and Optimizing Data Lake Architecture - Milos Milovanovic
 Planning and Optimizing Data Lake Architecture - Milos Milovanovic Planning and Optimizing Data Lake Architecture - Milos Milovanovic
Planning and Optimizing Data Lake Architecture - Milos Milovanovic
Institute of Contemporary Sciences
 
Designing a Real Time Data Ingestion Pipeline
Designing a Real Time Data Ingestion PipelineDesigning a Real Time Data Ingestion Pipeline
Designing a Real Time Data Ingestion Pipeline
DataScience
 
Creating a Modern Data Architecture
Creating a Modern Data ArchitectureCreating a Modern Data Architecture
Creating a Modern Data Architecture
Zaloni
 
Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...
Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...
Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...
Zaloni
 
10 Amazing Things To Do With a Hadoop-Based Data Lake
10 Amazing Things To Do With a Hadoop-Based Data Lake10 Amazing Things To Do With a Hadoop-Based Data Lake
10 Amazing Things To Do With a Hadoop-Based Data Lake
VMware Tanzu
 
Data Lake,beyond the Data Warehouse
Data Lake,beyond the Data WarehouseData Lake,beyond the Data Warehouse
Data Lake,beyond the Data Warehouse
Data Science Thailand
 
The Emerging Data Lake IT Strategy
The Emerging Data Lake IT StrategyThe Emerging Data Lake IT Strategy
The Emerging Data Lake IT Strategy
Thomas Kelly, PMP
 
Implementing a Data Lake with Enterprise Grade Data Governance
Implementing a Data Lake with Enterprise Grade Data GovernanceImplementing a Data Lake with Enterprise Grade Data Governance
Implementing a Data Lake with Enterprise Grade Data Governance
Hortonworks
 
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...
Hortonworks
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
James Serra
 
Hadoop data ingestion
Hadoop data ingestionHadoop data ingestion
Hadoop data ingestion
Vinod Nayal
 
Analysing Smart City Development in india
Analysing Smart City Development in indiaAnalysing Smart City Development in india
Analysing Smart City Development in india
Omkar Parishwad
 
Planning and Optimizing Data Lake Architecture - Milos Milovanovic
 Planning and Optimizing Data Lake Architecture - Milos Milovanovic Planning and Optimizing Data Lake Architecture - Milos Milovanovic
Planning and Optimizing Data Lake Architecture - Milos Milovanovic
Institute of Contemporary Sciences
 
Designing a Real Time Data Ingestion Pipeline
Designing a Real Time Data Ingestion PipelineDesigning a Real Time Data Ingestion Pipeline
Designing a Real Time Data Ingestion Pipeline
DataScience
 
Creating a Modern Data Architecture
Creating a Modern Data ArchitectureCreating a Modern Data Architecture
Creating a Modern Data Architecture
Zaloni
 
Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...
Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...
Building a Modern Data Architecture by Ben Sharma at Strata + Hadoop World Sa...
Zaloni
 
10 Amazing Things To Do With a Hadoop-Based Data Lake
10 Amazing Things To Do With a Hadoop-Based Data Lake10 Amazing Things To Do With a Hadoop-Based Data Lake
10 Amazing Things To Do With a Hadoop-Based Data Lake
VMware Tanzu
 
The Emerging Data Lake IT Strategy
The Emerging Data Lake IT StrategyThe Emerging Data Lake IT Strategy
The Emerging Data Lake IT Strategy
Thomas Kelly, PMP
 
Implementing a Data Lake with Enterprise Grade Data Governance
Implementing a Data Lake with Enterprise Grade Data GovernanceImplementing a Data Lake with Enterprise Grade Data Governance
Implementing a Data Lake with Enterprise Grade Data Governance
Hortonworks
 
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...
Hortonworks
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
James Serra
 
Hadoop data ingestion
Hadoop data ingestionHadoop data ingestion
Hadoop data ingestion
Vinod Nayal
 
Analysing Smart City Development in india
Analysing Smart City Development in indiaAnalysing Smart City Development in india
Analysing Smart City Development in india
Omkar Parishwad
 
Ad

Similar to An adaptive and eventually self healing framework for geo-distributed real-time data ingestion (20)

Real-time Stream Processing using Apache Apex
Real-time Stream Processing using Apache ApexReal-time Stream Processing using Apache Apex
Real-time Stream Processing using Apache Apex
Apache Apex
 
Will it Scale? The Secrets behind Scaling Stream Processing Applications
Will it Scale? The Secrets behind Scaling Stream Processing ApplicationsWill it Scale? The Secrets behind Scaling Stream Processing Applications
Will it Scale? The Secrets behind Scaling Stream Processing Applications
Navina Ramesh
 
Eranea's solution and technology for mainframe migration / transformation : d...
Eranea's solution and technology for mainframe migration / transformation : d...Eranea's solution and technology for mainframe migration / transformation : d...
Eranea's solution and technology for mainframe migration / transformation : d...
Eranea
 
Network visibility and control using industry standard sFlow telemetry
Network visibility and control using industry standard sFlow telemetryNetwork visibility and control using industry standard sFlow telemetry
Network visibility and control using industry standard sFlow telemetry
pphaal
 
Introduction to Apache Apex by Thomas Weise
Introduction to Apache Apex by Thomas WeiseIntroduction to Apache Apex by Thomas Weise
Introduction to Apache Apex by Thomas Weise
Big Data Spain
 
Stream Processing Overview
Stream Processing OverviewStream Processing Overview
Stream Processing Overview
Maycon Viana Bordin
 
M|18 Choosing the Right High Availability Strategy for You
M|18 Choosing the Right High Availability Strategy for YouM|18 Choosing the Right High Availability Strategy for You
M|18 Choosing the Right High Availability Strategy for You
MariaDB plc
 
3450 - Writing and optimising applications for performance in a hybrid messag...
3450 - Writing and optimising applications for performance in a hybrid messag...3450 - Writing and optimising applications for performance in a hybrid messag...
3450 - Writing and optimising applications for performance in a hybrid messag...
Timothy McCormick
 
Netflix Keystone Pipeline at Samza Meetup 10-13-2015
Netflix Keystone Pipeline at Samza Meetup 10-13-2015Netflix Keystone Pipeline at Samza Meetup 10-13-2015
Netflix Keystone Pipeline at Samza Meetup 10-13-2015
Monal Daxini
 
Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia...
Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia...Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia...
Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia...
DataWorks Summit
 
Apache Big Data EU 2016: Next Gen Big Data Analytics with Apache Apex
Apache Big Data EU 2016: Next Gen Big Data Analytics with Apache ApexApache Big Data EU 2016: Next Gen Big Data Analytics with Apache Apex
Apache Big Data EU 2016: Next Gen Big Data Analytics with Apache Apex
Apache Apex
 
Serverless London 2019 FaaS composition using Kafka and CloudEvents
Serverless London 2019   FaaS composition using Kafka and CloudEventsServerless London 2019   FaaS composition using Kafka and CloudEvents
Serverless London 2019 FaaS composition using Kafka and CloudEvents
Neil Avery
 
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformIntro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Apache Apex
 
The Enterprise IT Checklist for Docker Operations
The Enterprise IT Checklist for Docker Operations The Enterprise IT Checklist for Docker Operations
The Enterprise IT Checklist for Docker Operations
Nicola Kabar
 
Large volume data analysis on the Typesafe Reactive Platform - Big Data Scala...
Large volume data analysis on the Typesafe Reactive Platform - Big Data Scala...Large volume data analysis on the Typesafe Reactive Platform - Big Data Scala...
Large volume data analysis on the Typesafe Reactive Platform - Big Data Scala...
Martin Zapletal
 
Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?
Anton Nazaruk
 
High-Speed Reactive Microservices
High-Speed Reactive MicroservicesHigh-Speed Reactive Microservices
High-Speed Reactive Microservices
Rick Hightower
 
redGuardian DP100 large scale DDoS mitigation solution
redGuardian DP100 large scale DDoS mitigation solutionredGuardian DP100 large scale DDoS mitigation solution
redGuardian DP100 large scale DDoS mitigation solution
Redge Technologies
 
Designing apps for resiliency
Designing apps for resiliencyDesigning apps for resiliency
Designing apps for resiliency
Masashi Narumoto
 
Reactive mistakes - ScalaDays Chicago 2017
Reactive mistakes -  ScalaDays Chicago 2017Reactive mistakes -  ScalaDays Chicago 2017
Reactive mistakes - ScalaDays Chicago 2017
Petr Zapletal
 
Real-time Stream Processing using Apache Apex
Real-time Stream Processing using Apache ApexReal-time Stream Processing using Apache Apex
Real-time Stream Processing using Apache Apex
Apache Apex
 
Will it Scale? The Secrets behind Scaling Stream Processing Applications
Will it Scale? The Secrets behind Scaling Stream Processing ApplicationsWill it Scale? The Secrets behind Scaling Stream Processing Applications
Will it Scale? The Secrets behind Scaling Stream Processing Applications
Navina Ramesh
 
Eranea's solution and technology for mainframe migration / transformation : d...
Eranea's solution and technology for mainframe migration / transformation : d...Eranea's solution and technology for mainframe migration / transformation : d...
Eranea's solution and technology for mainframe migration / transformation : d...
Eranea
 
Network visibility and control using industry standard sFlow telemetry
Network visibility and control using industry standard sFlow telemetryNetwork visibility and control using industry standard sFlow telemetry
Network visibility and control using industry standard sFlow telemetry
pphaal
 
Introduction to Apache Apex by Thomas Weise
Introduction to Apache Apex by Thomas WeiseIntroduction to Apache Apex by Thomas Weise
Introduction to Apache Apex by Thomas Weise
Big Data Spain
 
M|18 Choosing the Right High Availability Strategy for You
M|18 Choosing the Right High Availability Strategy for YouM|18 Choosing the Right High Availability Strategy for You
M|18 Choosing the Right High Availability Strategy for You
MariaDB plc
 
3450 - Writing and optimising applications for performance in a hybrid messag...
3450 - Writing and optimising applications for performance in a hybrid messag...3450 - Writing and optimising applications for performance in a hybrid messag...
3450 - Writing and optimising applications for performance in a hybrid messag...
Timothy McCormick
 
Netflix Keystone Pipeline at Samza Meetup 10-13-2015
Netflix Keystone Pipeline at Samza Meetup 10-13-2015Netflix Keystone Pipeline at Samza Meetup 10-13-2015
Netflix Keystone Pipeline at Samza Meetup 10-13-2015
Monal Daxini
 
Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia...
Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia...Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia...
Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financia...
DataWorks Summit
 
Apache Big Data EU 2016: Next Gen Big Data Analytics with Apache Apex
Apache Big Data EU 2016: Next Gen Big Data Analytics with Apache ApexApache Big Data EU 2016: Next Gen Big Data Analytics with Apache Apex
Apache Big Data EU 2016: Next Gen Big Data Analytics with Apache Apex
Apache Apex
 
Serverless London 2019 FaaS composition using Kafka and CloudEvents
Serverless London 2019   FaaS composition using Kafka and CloudEventsServerless London 2019   FaaS composition using Kafka and CloudEvents
Serverless London 2019 FaaS composition using Kafka and CloudEvents
Neil Avery
 
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformIntro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Apache Apex
 
The Enterprise IT Checklist for Docker Operations
The Enterprise IT Checklist for Docker Operations The Enterprise IT Checklist for Docker Operations
The Enterprise IT Checklist for Docker Operations
Nicola Kabar
 
Large volume data analysis on the Typesafe Reactive Platform - Big Data Scala...
Large volume data analysis on the Typesafe Reactive Platform - Big Data Scala...Large volume data analysis on the Typesafe Reactive Platform - Big Data Scala...
Large volume data analysis on the Typesafe Reactive Platform - Big Data Scala...
Martin Zapletal
 
Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?
Anton Nazaruk
 
High-Speed Reactive Microservices
High-Speed Reactive MicroservicesHigh-Speed Reactive Microservices
High-Speed Reactive Microservices
Rick Hightower
 
redGuardian DP100 large scale DDoS mitigation solution
redGuardian DP100 large scale DDoS mitigation solutionredGuardian DP100 large scale DDoS mitigation solution
redGuardian DP100 large scale DDoS mitigation solution
Redge Technologies
 
Designing apps for resiliency
Designing apps for resiliencyDesigning apps for resiliency
Designing apps for resiliency
Masashi Narumoto
 
Reactive mistakes - ScalaDays Chicago 2017
Reactive mistakes -  ScalaDays Chicago 2017Reactive mistakes -  ScalaDays Chicago 2017
Reactive mistakes - ScalaDays Chicago 2017
Petr Zapletal
 
Ad

More from Angad Singh (7)

From Journeyman to Master
From Journeyman to MasterFrom Journeyman to Master
From Journeyman to Master
Angad Singh
 
OpenSolaris School OS Beginners Guide
OpenSolaris School OS Beginners GuideOpenSolaris School OS Beginners Guide
OpenSolaris School OS Beginners Guide
Angad Singh
 
NetBeans 6.5
NetBeans 6.5NetBeans 6.5
NetBeans 6.5
Angad Singh
 
Netbeans 6.1 Talk
Netbeans 6.1 TalkNetbeans 6.1 Talk
Netbeans 6.1 Talk
Angad Singh
 
Open Solaris 2008.05
Open Solaris 2008.05Open Solaris 2008.05
Open Solaris 2008.05
Angad Singh
 
Sun Spot Talk
Sun Spot TalkSun Spot Talk
Sun Spot Talk
Angad Singh
 
From Journeyman to Master
From Journeyman to MasterFrom Journeyman to Master
From Journeyman to Master
Angad Singh
 
OpenSolaris School OS Beginners Guide
OpenSolaris School OS Beginners GuideOpenSolaris School OS Beginners Guide
OpenSolaris School OS Beginners Guide
Angad Singh
 
Netbeans 6.1 Talk
Netbeans 6.1 TalkNetbeans 6.1 Talk
Netbeans 6.1 Talk
Angad Singh
 
Open Solaris 2008.05
Open Solaris 2008.05Open Solaris 2008.05
Open Solaris 2008.05
Angad Singh
 

Recently uploaded (20)

Integrating FME with Python: Tips, Demos, and Best Practices for Powerful Aut...
Integrating FME with Python: Tips, Demos, and Best Practices for Powerful Aut...Integrating FME with Python: Tips, Demos, and Best Practices for Powerful Aut...
Integrating FME with Python: Tips, Demos, and Best Practices for Powerful Aut...
Safe Software
 
Kit-Works Team Study_팀스터디_김한솔_nuqs_20250509.pdf
Kit-Works Team Study_팀스터디_김한솔_nuqs_20250509.pdfKit-Works Team Study_팀스터디_김한솔_nuqs_20250509.pdf
Kit-Works Team Study_팀스터디_김한솔_nuqs_20250509.pdf
Wonjun Hwang
 
Bepents tech services - a premier cybersecurity consulting firm
Bepents tech services - a premier cybersecurity consulting firmBepents tech services - a premier cybersecurity consulting firm
Bepents tech services - a premier cybersecurity consulting firm
Benard76
 
Jignesh Shah - The Innovator and Czar of Exchanges
Jignesh Shah - The Innovator and Czar of ExchangesJignesh Shah - The Innovator and Czar of Exchanges
Jignesh Shah - The Innovator and Czar of Exchanges
Jignesh Shah Innovator
 
Everything You Need to Know About Agentforce? (Put AI Agents to Work)
Everything You Need to Know About Agentforce? (Put AI Agents to Work)Everything You Need to Know About Agentforce? (Put AI Agents to Work)
Everything You Need to Know About Agentforce? (Put AI Agents to Work)
Cyntexa
 
The Future of Cisco Cloud Security: Innovations and AI Integration
The Future of Cisco Cloud Security: Innovations and AI IntegrationThe Future of Cisco Cloud Security: Innovations and AI Integration
The Future of Cisco Cloud Security: Innovations and AI Integration
Re-solution Data Ltd
 
Web and Graphics Designing Training in Rajpura
Web and Graphics Designing Training in RajpuraWeb and Graphics Designing Training in Rajpura
Web and Graphics Designing Training in Rajpura
Erginous Technology
 
UiPath Agentic Automation: Community Developer Opportunities
UiPath Agentic Automation: Community Developer OpportunitiesUiPath Agentic Automation: Community Developer Opportunities
UiPath Agentic Automation: Community Developer Opportunities
DianaGray10
 
Hybridize Functions: A Tool for Automatically Refactoring Imperative Deep Lea...
Hybridize Functions: A Tool for Automatically Refactoring Imperative Deep Lea...Hybridize Functions: A Tool for Automatically Refactoring Imperative Deep Lea...
Hybridize Functions: A Tool for Automatically Refactoring Imperative Deep Lea...
Raffi Khatchadourian
 
AI Agents at Work: UiPath, Maestro & the Future of Documents
AI Agents at Work: UiPath, Maestro & the Future of DocumentsAI Agents at Work: UiPath, Maestro & the Future of Documents
AI Agents at Work: UiPath, Maestro & the Future of Documents
UiPathCommunity
 
How analogue intelligence complements AI
How analogue intelligence complements AIHow analogue intelligence complements AI
How analogue intelligence complements AI
Paul Rowe
 
Kit-Works Team Study_아직도 Dockefile.pdf_김성호
Kit-Works Team Study_아직도 Dockefile.pdf_김성호Kit-Works Team Study_아직도 Dockefile.pdf_김성호
Kit-Works Team Study_아직도 Dockefile.pdf_김성호
Wonjun Hwang
 
Mastering Testing in the Modern F&B Landscape
Mastering Testing in the Modern F&B LandscapeMastering Testing in the Modern F&B Landscape
Mastering Testing in the Modern F&B Landscape
marketing943205
 
UiPath Agentic Automation: Community Developer Opportunities
UiPath Agentic Automation: Community Developer OpportunitiesUiPath Agentic Automation: Community Developer Opportunities
UiPath Agentic Automation: Community Developer Opportunities
DianaGray10
 
DevOpsDays SLC - Platform Engineers are Product Managers.pptx
DevOpsDays SLC - Platform Engineers are Product Managers.pptxDevOpsDays SLC - Platform Engineers are Product Managers.pptx
DevOpsDays SLC - Platform Engineers are Product Managers.pptx
Justin Reock
 
MINDCTI revenue release Quarter 1 2025 PR
MINDCTI revenue release Quarter 1 2025 PRMINDCTI revenue release Quarter 1 2025 PR
MINDCTI revenue release Quarter 1 2025 PR
MIND CTI
 
Automate Studio Training: Building Scripts for SAP Fiori and GUI for HTML.pdf
Automate Studio Training: Building Scripts for SAP Fiori and GUI for HTML.pdfAutomate Studio Training: Building Scripts for SAP Fiori and GUI for HTML.pdf
Automate Studio Training: Building Scripts for SAP Fiori and GUI for HTML.pdf
Precisely
 
Zilliz Cloud Monthly Technical Review: May 2025
Zilliz Cloud Monthly Technical Review: May 2025Zilliz Cloud Monthly Technical Review: May 2025
Zilliz Cloud Monthly Technical Review: May 2025
Zilliz
 
Does Pornify Allow NSFW? Everything You Should Know
Does Pornify Allow NSFW? Everything You Should KnowDoes Pornify Allow NSFW? Everything You Should Know
Does Pornify Allow NSFW? Everything You Should Know
Pornify CC
 
Cybersecurity Threat Vectors and Mitigation
Cybersecurity Threat Vectors and MitigationCybersecurity Threat Vectors and Mitigation
Cybersecurity Threat Vectors and Mitigation
VICTOR MAESTRE RAMIREZ
 
Integrating FME with Python: Tips, Demos, and Best Practices for Powerful Aut...
Integrating FME with Python: Tips, Demos, and Best Practices for Powerful Aut...Integrating FME with Python: Tips, Demos, and Best Practices for Powerful Aut...
Integrating FME with Python: Tips, Demos, and Best Practices for Powerful Aut...
Safe Software
 
Kit-Works Team Study_팀스터디_김한솔_nuqs_20250509.pdf
Kit-Works Team Study_팀스터디_김한솔_nuqs_20250509.pdfKit-Works Team Study_팀스터디_김한솔_nuqs_20250509.pdf
Kit-Works Team Study_팀스터디_김한솔_nuqs_20250509.pdf
Wonjun Hwang
 
Bepents tech services - a premier cybersecurity consulting firm
Bepents tech services - a premier cybersecurity consulting firmBepents tech services - a premier cybersecurity consulting firm
Bepents tech services - a premier cybersecurity consulting firm
Benard76
 
Jignesh Shah - The Innovator and Czar of Exchanges
Jignesh Shah - The Innovator and Czar of ExchangesJignesh Shah - The Innovator and Czar of Exchanges
Jignesh Shah - The Innovator and Czar of Exchanges
Jignesh Shah Innovator
 
Everything You Need to Know About Agentforce? (Put AI Agents to Work)
Everything You Need to Know About Agentforce? (Put AI Agents to Work)Everything You Need to Know About Agentforce? (Put AI Agents to Work)
Everything You Need to Know About Agentforce? (Put AI Agents to Work)
Cyntexa
 
The Future of Cisco Cloud Security: Innovations and AI Integration
The Future of Cisco Cloud Security: Innovations and AI IntegrationThe Future of Cisco Cloud Security: Innovations and AI Integration
The Future of Cisco Cloud Security: Innovations and AI Integration
Re-solution Data Ltd
 
Web and Graphics Designing Training in Rajpura
Web and Graphics Designing Training in RajpuraWeb and Graphics Designing Training in Rajpura
Web and Graphics Designing Training in Rajpura
Erginous Technology
 
UiPath Agentic Automation: Community Developer Opportunities
UiPath Agentic Automation: Community Developer OpportunitiesUiPath Agentic Automation: Community Developer Opportunities
UiPath Agentic Automation: Community Developer Opportunities
DianaGray10
 
Hybridize Functions: A Tool for Automatically Refactoring Imperative Deep Lea...
Hybridize Functions: A Tool for Automatically Refactoring Imperative Deep Lea...Hybridize Functions: A Tool for Automatically Refactoring Imperative Deep Lea...
Hybridize Functions: A Tool for Automatically Refactoring Imperative Deep Lea...
Raffi Khatchadourian
 
AI Agents at Work: UiPath, Maestro & the Future of Documents
AI Agents at Work: UiPath, Maestro & the Future of DocumentsAI Agents at Work: UiPath, Maestro & the Future of Documents
AI Agents at Work: UiPath, Maestro & the Future of Documents
UiPathCommunity
 
How analogue intelligence complements AI
How analogue intelligence complements AIHow analogue intelligence complements AI
How analogue intelligence complements AI
Paul Rowe
 
Kit-Works Team Study_아직도 Dockefile.pdf_김성호
Kit-Works Team Study_아직도 Dockefile.pdf_김성호Kit-Works Team Study_아직도 Dockefile.pdf_김성호
Kit-Works Team Study_아직도 Dockefile.pdf_김성호
Wonjun Hwang
 
Mastering Testing in the Modern F&B Landscape
Mastering Testing in the Modern F&B LandscapeMastering Testing in the Modern F&B Landscape
Mastering Testing in the Modern F&B Landscape
marketing943205
 
UiPath Agentic Automation: Community Developer Opportunities
UiPath Agentic Automation: Community Developer OpportunitiesUiPath Agentic Automation: Community Developer Opportunities
UiPath Agentic Automation: Community Developer Opportunities
DianaGray10
 
DevOpsDays SLC - Platform Engineers are Product Managers.pptx
DevOpsDays SLC - Platform Engineers are Product Managers.pptxDevOpsDays SLC - Platform Engineers are Product Managers.pptx
DevOpsDays SLC - Platform Engineers are Product Managers.pptx
Justin Reock
 
MINDCTI revenue release Quarter 1 2025 PR
MINDCTI revenue release Quarter 1 2025 PRMINDCTI revenue release Quarter 1 2025 PR
MINDCTI revenue release Quarter 1 2025 PR
MIND CTI
 
Automate Studio Training: Building Scripts for SAP Fiori and GUI for HTML.pdf
Automate Studio Training: Building Scripts for SAP Fiori and GUI for HTML.pdfAutomate Studio Training: Building Scripts for SAP Fiori and GUI for HTML.pdf
Automate Studio Training: Building Scripts for SAP Fiori and GUI for HTML.pdf
Precisely
 
Zilliz Cloud Monthly Technical Review: May 2025
Zilliz Cloud Monthly Technical Review: May 2025Zilliz Cloud Monthly Technical Review: May 2025
Zilliz Cloud Monthly Technical Review: May 2025
Zilliz
 
Does Pornify Allow NSFW? Everything You Should Know
Does Pornify Allow NSFW? Everything You Should KnowDoes Pornify Allow NSFW? Everything You Should Know
Does Pornify Allow NSFW? Everything You Should Know
Pornify CC
 
Cybersecurity Threat Vectors and Mitigation
Cybersecurity Threat Vectors and MitigationCybersecurity Threat Vectors and Mitigation
Cybersecurity Threat Vectors and Mitigation
VICTOR MAESTRE RAMIREZ
 

An adaptive and eventually self healing framework for geo-distributed real-time data ingestion

  • 1. An adaptive and eventually self-healing framework for geo- distributed real-time data ingestion Angad Singh InMobi
  • 2. The problem domain Scale ● 15 billion events per day (post filtering) ● 1.5+ billion users, 200 million per day ● 4 geographically distributed data centers (DCs) ● User’s request may land on non-local DC Ingestion requirements ● multiple tenants, multiple schemas per tenant ● batch, stream, micro-batch and on-demand ingestion ● 20+ streams, 100+ data types ● need to ingest, transform, validate and aggregate this data ● need to ingest streaming data in real-time (<1 min) for ad-serving/targeting use cases (strict SLA)
  • 3. The problem domain Usage/serving requirements ● need to pivot this data by user, activity type and other primary keys ● serve an aggregated view (profile) at the end in < 5ms p99 latency ● need both real-time serving of the view ● as well as batch summaries for analytics, inference algorithms, feedback loops ● need to be resilient to failure, absolutely no room for data loss/lag in ingestion Data arrival, volume and velocity ● data may be received out of order, or duplicated ● data can arrive in periodic batches or real-time/streaming or once in a while ● data may arrive in bursts or trickle slowly in some streams (autoscale) ● user data may be received in any DC, but needs to be collectively available in a single DC
  • 4. The problem domain Multi-tenancy ● Quotas ● Rate limiting/SLAs ● Isolation Manageability ● need to be self-serve, flexible for specific changes in the flow, easily deployable ● may need online migration, reprocessing, etc. of data ● hassle-free schema evolution across the stack ● monitoring, visibility, operability aspects for all of the above
  • 6. Serving layer (user store) aerospike cluster API dedup, aggregate, business rules Ratelimiting/quotas API Ad serving <5ms,99.95%success notifications pubsub (kafka) notification listeners (storm) periodic dumps streaming offline snapshot store (HDFS) batch inference jobs (MR/spark) analytics engine (cubes, lens) real-time enrichment on user engagement Ingestion layer globaldcglobaldc offline snapshot store (HDFS) globaldc localdc localdc localdc upstreamingestionsources batchsources streamingsources adaptors localdc adaptors adaptors (MR/storm) localdclocaldc routers localdc routers routers (MR/storm) localdclocaldc sinkssinks sinks (MR/storm) remotedc Ingestion Service orchestrate/manage remotedc remotedc Architecture
  • 7. DC1 (global) DC2 (slave) DC3 (slave) adaptors (MR) adaptors (storm) adaptors (storm) routers (MR) routers (storm) routers (storm) User-Colo Metadata (aerospike) User-Colo Metadata (aerospike) User-Colo Metadata (aerospike) custom replication (contains userid) (contains userid) (contains userid) sinks getColo(userid) sinks sinks Kafka-data-replicator (stormtopology) global colo tagger (storm) tag not found tagged data tag found write tag User Store API History Profile User Store API History Profile User Store API History Profile XDR (profile) Cross-DC architecture custom replication
  • 8. DC1 (global) DC2 (slave) DC3 (slave) adaptors (MR) adaptors (storm) adaptors (storm) routers (storm) routers (storm) routers (storm) User-Colo Metadata (aerospike) User-Colo Metadata (aerospike) User-Colo Metadata (aerospike) XDRXDR (contains userid) (contains userid) (contains userid) sinks getColo(userid) sinks sinks Kafka-data-replicator (stormtopology) global colo tagger (storm) tag not found tagged data tag found write tag User Store API History Profile User Store API History Profile User Store API History Profile XDR (profile) Mapper Comparison to map-reducePartitioner Shuffler Reducer
  • 11. Current Features Business-agnostic APIs ● Built on simple RESTful APIs: Schema, Feed, Sink, Source, Flow, Driver, Data router, Adaptor ● Unified APIs for doing batch, streaming, micro-batch ingestion. ● Self-serve system which provides rule validation, metrics, etc. and makes the expression of sources, sinks and flows easy with custom DSL. Platform-agnostic Flow Execution ● Pluggable execution engine (storm, hadoop, spark) - provides a Driver API ● Uses falcon for batch scheduling, in-built scheduler for streaming drivers (storm, etc.) Serialization support ● Pluggable schema serde support (thrift, avro)
  • 12. Current Features Schema management ● Schema is a first class citizen. ● Contract between source, sink and flow all based on and validated against schema ● Schema versioning and compatibility checks. ● Error-free schema evolution across data flows ● Clean abstractions to centrally manage all the schemas, data sources/feeds, sinks (key value store, HDFS, etc.) and data flows (storm topologies, MR jobs) which are part of the ingestion pipelines Manageability, operability ● All entities - schemas, sinks and flows - can be updated online without any downtime. ● Retries, error handling, metrics, orchestration hooks, etc. come standard Out of the box support for ● Cross-colo flow chaining ● Data routing ● Transformation, validation, conversion ● All based on pluggable code
  • 15. The problems we’ve seen Storm ● as usual, lot of knobs to tune based on lot of metrics: workers, threads, tasks, acks, max spout pending, buffer sizes, xmx, num slots, execute/process/ack latency, capacity, etc. ● debugging storm topology’s isn’t easy: threads, workers, shared logs, shuffling of data between workers, netty, the ack system, etc. ● storm (0.9.x) doesn’t like heterogenous load: unbalanced distribution between supervisors. heavy topologies can choke each other. rebalancing not fully resource aware (1.x tries to solve this) ● no rolling upgrades, supervisor failures cause unrecoverable errors ● zookeeper issues: too many executors leads to worker heartbeat update failure to zk. ● storm-kafka issue: storm-kafka spout unaware of purging (earliestOffset update) ● storm-kafka issue: invisible data loss ● retries should done cautiously ● etc Kafka ● topic deletion asynchronous, slow ● tuning num partitions manually ● bad consumers can cause excessive logging on brokers
  • 16. Features under development ● Autoscaling flows - rebalance storm topology based on spout lag, priority and current throughput (or bolt capacity) - runtime metrics or linear regression on historical metrics ● Streaming and batch compaction / dedup of data based on domain specific rules ● Automatic fallback from streaming to batch ingestion in case of huge backlogs, for low priority ingestions ● Dynamic rerouting / sharding of data between DCs for load balancing cross- DC flows ● Eventual self-correction of data based on validations on the aggregated view (data received from multiple streams) ● Data lineage/auditing ● Backfill management
  翻译: