Apache Flink Training: System Overview

Apache Flink® Training
System Overview

Apache Flink
A stream processor with many applications
[Figure: applications on top of the Apache Flink streaming dataflow runtime]

1 year of Flink - code
[Figure: the Flink code base in April 2014 compared with April 2015]

Flink Community
[Figure: number of unique contributor IDs by git commits, Aug 2010 to Jul 2015; y-axis from 0 to 120]
In the top 5 of Apache's big data projects after one year in the Apache Software Foundation

The Apache Way
• Flink is an Apache top-level project
  • Independent, non-profit organization
  • Community-driven open source software development approach
  • Public communication and open to new contributors

What is Apache Flink?
[Figure: the Flink stack]
Libraries and compatibility layers: Gelly, Table, ML, SAMOA, Hadoop M/R, Dataflow, Dataflow (WiP), MRQL, Cascading (WiP)
APIs: DataSet (Java/Scala/Python), DataStream (Java/Scala)
Runtime: Streaming dataflow runtime
Execution: Local, Remote, YARN, Tez, Embedded

Native workload support
Flink
• Streaming topologies
• Heavy batch jobs
• Machine Learning at scale
How can an engine natively support all these workloads?
And what does "native" mean?

E.g.: Non-native iterations
[Figure: a Client launches one job per iteration step (Step, Step, Step, ...)]
for (int i = 0; i < maxIterations; i++) {
// Execute MapReduce job
}
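The client-driven loop above can be made concrete with a small plain-Java sketch. It does not use any real MapReduce or Flink API: `runJob` is a hypothetical stand-in for submitting one batch job, and the halving step is an arbitrary placeholder computation. The point it illustrates is that every iteration is a separate job that reads and rewrites the complete intermediate result.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an engine without native iterations.
class ClientDrivenIteration {
    // Counts how many separate jobs the client had to submit.
    static int jobsSubmitted = 0;

    // One "job": reads the full input and materializes a full output.
    static List<Double> runJob(List<Double> input) {
        jobsSubmitted++;
        List<Double> output = new ArrayList<>();
        for (double x : input) {
            output.add(x / 2.0); // placeholder step function: halve each value
        }
        return output;
    }

    // The loop lives in the client, outside the engine: one job per step.
    static List<Double> iterate(List<Double> initial, int maxIterations) {
        List<Double> current = initial;
        for (int i = 0; i < maxIterations; i++) {
            current = runJob(current);
        }
        return current;
    }
}
```

With three iterations the client submits three jobs, which is the overhead a native (in-engine) iteration is meant to avoid.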

E.g.: Non-native streaming
[Figure: a stream discretizer cuts the stream into small batches and issues one job per batch (Job, Job, Job, Job)]
while (true) {
// get next few records
// issue batch job
}
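A minimal plain-Java sketch of the discretizer loop above, under stated assumptions: `runBatchJob` is a hypothetical batch job (here it just sums a batch), the batch boundary is a fixed record count rather than a time interval, and a finite iterator stands in for the unbounded stream so the example terminates.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class StreamDiscretizer {
    // Hypothetical batch job: sums the records of one micro-batch.
    static int runBatchJob(List<Integer> batch) {
        int sum = 0;
        for (int v : batch) {
            sum += v;
        }
        return sum;
    }

    // Cuts the stream into batches of batchSize records and issues one job per batch.
    static List<Integer> process(Iterator<Integer> stream, int batchSize) {
        List<Integer> results = new ArrayList<>();
        List<Integer> batch = new ArrayList<>();
        while (stream.hasNext()) {               // stands in for while (true)
            batch.add(stream.next());            // get next few records
            if (batch.size() == batchSize) {
                results.add(runBatchJob(batch)); // issue batch job
                batch = new ArrayList<>();
            }
        }
        if (!batch.isEmpty()) {
            results.add(runBatchJob(batch));     // flush the tail of the finite test stream
        }
        return results;
    }
}
```

No result for a record is available before its batch is full, so the batch size puts a floor on latency in this style of streaming.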

Native workload support
Flink
• Stream processing
• Batch processing
• Machine Learning at scale
• Graph Analysis
How can an engine natively support all these workloads?
And what does "native" mean?

Flink Engine
1. Execute everything as streams
2. Iterative (cyclic) dataflows
3. Mutable state
4. Operate on managed memory
5. Special code paths for batch
State + Computation

Flink stack
[Figure: libraries and compatibility layers (Gelly, Table, ML, SAMOA, Hadoop M/R, Dataflow, Dataflow (WiP), MRQL, Cascading (WiP)) on top of the streaming dataflow runtime]

Basic API Concept
How do I write a Flink program?
1. Bootstrap sources
2. Apply operations
3. Output to sink
DataStream API: Source → Data Stream → Operation → Data Stream → Sink
DataSet API: Source → Data Set → Operation → Data Set → Sink
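The three steps can be illustrated with a plain-Java analogy. This is not the Flink API: `ProgramSkeleton`, `run`, and `example` are hypothetical names, an in-memory list plays the role of the source, and a second list plays the role of the sink.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

class ProgramSkeleton {
    // Reads every record from the source, applies the operation, and hands the result to the sink.
    static <I, O> void run(Iterable<I> source, Function<I, O> operation, Consumer<O> sink) {
        for (I record : source) {
            sink.accept(operation.apply(record));
        }
    }

    static List<Integer> example() {
        // 1. Bootstrap a source (here: a fixed collection).
        List<String> source = Arrays.asList("flink", "stream", "batch");
        // 2. Define the operation to apply (here: map each word to its length).
        Function<String, Integer> operation = String::length;
        // 3. Define the sink (here: collect results into a list).
        List<Integer> collected = new ArrayList<>();
        Consumer<Integer> sink = collected::add;
        run(source, operation, sink);
        return collected;
    }
}
```

In a real Flink program the same three roles are filled by an execution environment's sources, the transformation operators, and output formats or sinks.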

Batch & Stream Processing
• DataStream API
  Example: Live Stock Feed
  Stock Feed (Name, Price): Microsoft 124, Google 516, Apple 235, …
  Operations on the feed:
  • Alert if Microsoft > 120
  • Write event to database
  • Sum every 10 seconds
  • Alert if sum > 10000
• DataSet API
  Example: Map/Reduce paradigm
  [Figure: a data set with columns b and h (rows 2 1, 3 5, 7 4, …) passes through Map and Reduce and yields a data set with column a (1, 2, …)]
Streaming & Batch
               Batch                   Streaming
Input          finite                  infinite
Data transfer  blocking or pipelined   pipelined
Latency        high                    low

Scaling out
[Figure: the Source → Data Set → Operation → Data Set → Sink pipeline replicated across many parallel instances]

Sources (selection)
Collection-based
• fromCollection
• fromElements
File-based
• TextInputFormat
• CsvInputFormat
Other
• SocketInputFormat
• KafkaInputFormat
• Databases

Sinks (selection)
File-based
• TextOutputFormat
• CsvOutputFormat
• PrintOutput
Others
• SocketOutputFormat
• KafkaOutputFormat
• Databases

Hadoop Integration
Out of the box
• Access HDFS
• YARN execution (covered later)
• Reuse data types (Writables)
With a thin wrapper
• Reuse Hadoop input and output formats
• Reuse functions like Map and Reduce

What’s the Lifecycle of a Program?

Architecture Overview
• Client
• Master (Job Manager)
• Worker (Task Manager)
[Figure: the Client talks to the Job Manager, which coordinates several Task Managers]

Client
• Optimize
• Construct job graph
• Pass job graph to job manager
• Retrieve job results
[Figure: the Client runs type extraction and the Optimizer on the program below and passes the result to the Job Manager]
    // Transitive closure; edges is a DataSet[Path] defined elsewhere
    case class Path(from: Long, to: Long)
    val tc = edges.iterate(10) {
      paths: DataSet[Path] =>
        val next = paths
          .join(edges)
          .where("to")
          .equalTo("from") {
            (path, edge) =>
              Path(path.from, edge.to)
          }
          .union(paths)
          .distinct()
        next
    }
[Figure: an optimized plan with the nodes Data Source orders.tbl, Filter, Map, Data Source lineitem.tbl, Join (Hybrid Hash; build HT / probe; hash-part [0] on both inputs), and GroupRed (sort; forward)]

Job Manager
• Parallelization: Create Execution Graph
• Scheduling: Assign tasks to task managers
• State: Supervise the execution
[Figure: the Job Manager holds the optimized plan (Data Source orders.tbl, Filter, Map, Data Source lineitem.tbl, Hybrid Hash Join, GroupRed) and distributes parallel copies of it to four Task Managers]

Task Manager
• Operations are split up into tasks depending on the specified parallelism
• Each parallel instance of an operation runs in a separate task slot
• The scheduler may run several tasks from different operators in one task slot
[Figure: three Task Managers, each providing task slots]
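The relationship between parallelism, tasks, and slots described on this slide can be sketched as follows. This is a simplified plain-Java model, not the scheduler's actual algorithm: it assumes every operator has the same parallelism and that parallel instance i of every operator is placed in slot i, which is one way several tasks from different operators can end up sharing a slot. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SlotAssignmentSketch {
    // Splits each operator into `parallelism` tasks and places instance i of every operator in slot i.
    static Map<Integer, List<String>> assign(List<String> operators, int parallelism) {
        Map<Integer, List<String>> slots = new LinkedHashMap<>();
        for (int slot = 0; slot < parallelism; slot++) {
            List<String> tasks = new ArrayList<>();
            for (String op : operators) {
                tasks.add(op + "[" + slot + "]"); // one parallel instance of this operator
            }
            slots.put(slot, tasks);
        }
        return slots;
    }
}
```

For the operators Source, Map, and Sink with parallelism 2, the model yields two slots, each holding one instance of all three operators; no slot ever holds two instances of the same operator.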

Ways to Run a Flink Program
[Figure: the Flink stack on top of the streaming dataflow runtime]

Local Execution
• Starts a local Flink cluster
• All processes run in the same JVM
• Behaves just like a regular cluster
• Very useful for developing and debugging
[Figure: one Job Manager and four Task Managers inside a single JVM]

Embedded Execution
• Runs operators on simple Java collections
• Lower overhead
• Does not use memory management
• Useful for testing and debugging

Remote Execution
• Submit a job remotely
• Monitor the status of a job
[Figure: the Client submits a job to the Job Manager of a cluster with four Task Managers]

YARN Execution
• Multi-user scenario
• Resource sharing
• Uses YARN containers to run a Flink cluster
• Easy to set up
[Figure: the Client talks to the Resource Manager of a YARN cluster; Node Managers host the Job Manager, the Task Managers, and another application]

Tez Execution
• Leverages Apache Tez’s runtime
• Built on top of YARN
• Good YARN citizen
• Fast path to elastic deployments
• Slower than native Flink

Flink compared to other projects

Batch & Streaming projects
Batch only
Streaming only
Hybrid

Batch comparison
[Table: three systems compared column by column; the column headers are not present in the text]
API                low-level           high-level           high-level
Data transfer      batch               batch                pipelined & batch
Memory management  disk-based          JVM-managed          active managed
Iterations         file system cached  in-memory cached     streamed
Fault tolerance    task level          task level           job level
Good at            massive scale out   data exploration     heavy backend & iterative jobs
Libraries          many external       built-in & external  evolving built-in & external

Streaming comparison
[Table: three systems compared column by column; the column headers are not present in the text]
Streaming        “true”            mini batches         “true”
API              low-level         high-level           high-level
Fault tolerance  tuple-level ACKs  RDD-based (lineage)  coarse checkpointing
State            not built-in      external             internal
Exactly once     at least once     exactly once         exactly once
Windowing        not built-in      restricted           flexible
Latency          low               medium               low
Throughput       medium            high                 high
