Continuous Apps before Streaming
(Diagram: over time, a scheduler starts one batch job per input file (file 1 -> Job 1, file 2 -> Job 2, file 3 -> Job 3); the jobs feed a serving layer.)
Continuous Apps with Lambda
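(Diagram: a scheduler still starts batch jobs per input file (file 1 -> Job 1, file 2 -> Job 2) that feed the serving layer, while a streaming job (job 4) runs alongside them.)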
Continuous Apps with Streaming
Continuous Data Sources
• Process a period of historic data
• Process latest data with low latency (tail of the log)
• Reprocess stream (historic data first, catches up with realtime data)
Continuous Data Sources
Stream of events in Apache Kafka partitions
Continuous Processing
Time
State
Enter Apache Flink
Apache Flink Stack
Libraries
Runtime
Distributed Streaming Data Flow
(Diagram: streaming dataflow with two parallel pipelines, Source [1] -> map() [1] -> keyBy()/window()/apply() [1] and Source [2] -> map() [2] -> keyBy()/window()/apply() [2], feeding Sink [1].)
What makes Flink flink?
Low latency
Make more sense of data
High Throughput
Exactly-once semantics for fault tolerance
Globally consistent savepoints
Flexible windows (time, count, session, roll-your-own)
(It's) About Time
Different Notions of Time
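(Diagram: Event Producer -> Message Queue (partition 1, partition 2) -> Flink Data Source -> Flink Window Operator. Each stage has its own notion of time.)
• Event Time: at the event producer
• Storage Ingestion Time: at the message queue
• Stream Processor Ingestion Time: at the Flink data source
• Window Processing Time: at the Flink window operator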
Event Time vs. Processing Time
(Diagram: episodes laid out on two axes. In processing-time order they appear as Episode IV, V, VI, I, II, III, VII; in event-time order they follow their episode numbers.)
Batch: Implicit Treatment of Time
(Diagram: incoming data is cut into 1h chunks; a batch job processes each chunk and feeds the serving layer.)
Time-driven: e.g. last X minutes
Data-driven: e.g. last X records
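The two window kinds can be sketched in plain Scala. This models the idea only and is not Flink's windowing API; the `Reading` type, the millisecond timestamps, and all function names are assumptions made for illustration.

```scala
object WindowKinds {
  // Hypothetical event type: a value observed at a timestamp in milliseconds.
  final case class Reading(timestampMs: Long, value: Double)

  // Time-driven: keep everything from the last `spanMs` milliseconds up to `nowMs`.
  def lastMillis(events: Seq[Reading], nowMs: Long, spanMs: Long): Seq[Reading] =
    events.filter(e => e.timestampMs > nowMs - spanMs && e.timestampMs <= nowMs)

  // Data-driven: keep the last `n` records, whatever their timestamps are.
  def lastRecords(events: Seq[Reading], n: Int): Seq[Reading] =
    events.takeRight(n)

  // Average of the values in a window, or None for an empty window.
  def average(events: Seq[Reading]): Option[Double] =
    if (events.isEmpty) None else Some(events.map(_.value).sum / events.size)
}
```

With readings at 0, 1, 4 and 5 minutes, `lastMillis(readings, 300000L, 300000L)` keeps the three readings inside the last five minutes, while `lastRecords(readings, 2)` keeps exactly two readings regardless of how far apart they are.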
Streaming: Windows
"Average over the last 5 minutes"
Event Time Windows
Event Time Windows put events back into their Event Time order, regardless of the order in which they arrive
Processing Time
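// Processing time (assumed set on the environment, e.g. TimeCharacteristic.ProcessingTime):
// windows follow the wall clock of the machine running the operator.
case class Event(id: String, measure: Double, timestamp: Long)

stream
  .keyBy("id")
  .timeWindow(Time.seconds(15), Time.seconds(5)) // 15 s windows, sliding every 5 s
  .sum("measure")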
Ingestion Time
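// Ingestion time (assumed set on the environment, e.g. TimeCharacteristic.IngestionTime):
// events are stamped with the source's clock when they enter Flink.
case class Event(id: String, measure: Double, timestamp: Long)

stream
  .keyBy("id")
  .timeWindow(Time.seconds(15), Time.seconds(5)) // 15 s windows, sliding every 5 s
  .sum("measure")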
Event Time
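// Event time: windows follow the timestamp carried by each event.
// tsStream is assumed to be the stream with timestamps assigned from the events,
// e.g. stream.assignAscendingTimestamps(_.timestamp)
case class Event(id: String, measure: Double, timestamp: Long)

tsStream
  .keyBy("id")
  .timeWindow(Time.seconds(15), Time.seconds(5)) // 15 s windows, sliding every 5 s
  .sum("measure")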
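A window of 15 seconds that slides every 5 seconds means each event belongs to three overlapping windows. The following plain-Scala sketch models that assignment under event time and sums `measure` per id and window. It is not Flink's implementation; `windowStarts` and `slidingSums` are hypothetical helper names, and timestamps are assumed to be non-negative milliseconds.

```scala
object SlidingWindowSketch {
  final case class Event(id: String, measure: Double, timestamp: Long)

  // Start times (ms) of all windows of length `sizeMs`, starting every `slideMs`,
  // that contain `timestamp`. Assumes timestamp >= 0 and slideMs > 0.
  def windowStarts(timestamp: Long, sizeMs: Long, slideMs: Long): List[Long] = {
    val lastStart = timestamp - (timestamp % slideMs)
    Iterator.iterate(lastStart)(_ - slideMs).takeWhile(_ > timestamp - sizeMs).toList
  }

  // Sum of `measure` per (id, window start), using each event's own timestamp.
  def slidingSums(events: Seq[Event], sizeMs: Long, slideMs: Long): Map[(String, Long), Double] =
    events
      .flatMap(e => windowStarts(e.timestamp, sizeMs, slideMs).map(start => ((e.id, start), e.measure)))
      .groupBy(_._1)
      .map { case (key, pairs) => key -> pairs.map(_._2).sum }
}
```

For an event at 12 s, `windowStarts(12000L, 15000L, 5000L)` yields the windows starting at 10 s, 5 s and 0 s. Because the assignment depends only on the event's timestamp, the result is the same no matter when or in which order the events are processed.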
The Power of Event Time
Batch Processors: Event-time in ingestion-time batches
• Stable across re-executions
• Wrong grouping at batch boundaries
Mix of data-driven and wall clock time
Event Time Progress: Watermarks
Stream (in order)
23 21 20 19 18 17 15 14 11 10 9 9 7
Watermarks: W(20), W(11)
Stream (out of order)
21 19 20 17 22 12 17 14 12 9 15 11 7
Watermarks: W(17), W(11)
(Legend: each number is an event timestamp; W(t) is a watermark.)
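A watermark W(t) states that no further events with a timestamp of t or lower are expected. The sketch below checks that property against an arrival sequence and also models one common way of generating watermarks, namely trailing the highest timestamp seen so far by a fixed delay. The generator strategy, the function names, and the assumption that the slide's streams are read right to left (oldest event first) are illustrative and not taken from the slides.

```scala
object WatermarkSketch {
  // True if a watermark W(t), emitted after the first `position` events have
  // arrived, is never contradicted: no later event has a timestamp <= t.
  def holds(arrivals: Seq[Long], position: Int, t: Long): Boolean =
    arrivals.drop(position).forall(_ > t)

  // Events that arrive at or below the running watermark, where the watermark
  // trails the highest timestamp seen so far by `maxDelay` (plus one tick).
  def lateEvents(arrivals: Seq[Long], maxDelay: Long): Seq[Long] = {
    var maxSeen: Option[Long] = None
    val late = Vector.newBuilder[Long]
    for (ts <- arrivals) {
      val watermark = maxSeen.map(_ - maxDelay - 1)
      if (watermark.exists(ts <= _)) late += ts
      maxSeen = Some(maxSeen.fold(ts)(m => math.max(m, ts)))
    }
    late.result()
  }
}
```

For the out-of-order stream read oldest first (7, 11, 15, 9, 12, ...), W(11) holds once the first four events have arrived, but not after only three, because the event with timestamp 9 is still outstanding. The largest disorder in that stream is 6 (15 arrives before 9), so a delay of 6 produces no late events while a smaller delay does.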
Bounding the Latency for Results
Triggering on combinations of Event Time and Processing Time
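One way to picture such a combined trigger: a window fires for good once the watermark says it is complete, and fires early if the wall clock shows that results have been held back too long. The sketch below is a plain-Scala model under those assumptions; the `Decision` type, `decide`, and its parameters are hypothetical and do not reproduce Flink's trigger interface.

```scala
object TriggerSketch {
  sealed trait Decision
  case object Wait extends Decision
  case object FireEarly extends Decision // processing time: waited long enough
  case object FireFinal extends Decision // event time: window is complete

  // windowEnd is exclusive, so the last timestamp in the window is windowEnd - 1.
  // firstSeenAt and now are wall-clock times; maxWait bounds the result latency.
  def decide(windowEnd: Long, watermark: Long, firstSeenAt: Long, now: Long, maxWait: Long): Decision =
    if (watermark >= windowEnd - 1) FireFinal
    else if (now - firstSeenAt >= maxWait) FireEarly
    else Wait
}
```

An early firing yields a preliminary result that a later final firing can replace, which is how latency stays bounded even when watermarks lag.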
Matters of State
Batch vs. Continuous
Batch Jobs
Continuous Programs
Continuous State
Sessions over time
No stateless point in time
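Why no point in time is stateless can be sketched with gap-based sessions: cutting the input at an arbitrary boundary, as a batch would, splits any session that is open at that moment. The code is a plain-Scala model; the gap rule (a new session starts when the gap to the previous event exceeds `gapMs`) and all names are assumptions for illustration.

```scala
object SessionSketch {
  // Groups timestamps into sessions: a new session starts whenever the gap
  // to the previous event is larger than `gapMs`.
  def sessions(timestamps: Seq[Long], gapMs: Long): List[List[Long]] =
    timestamps.sorted.foldLeft(List.empty[List[Long]]) {
      case (current :: done, ts) if ts - current.head <= gapMs => (ts :: current) :: done
      case (acc, ts) => List(ts) :: acc
    }.map(_.reverse).reverse

  // The same grouping when events before and after `cutMs` are processed
  // separately, with no state carried across the boundary.
  def sessionsWithCut(timestamps: Seq[Long], gapMs: Long, cutMs: Long): List[List[Long]] = {
    val (before, after) = timestamps.partition(_ < cutMs)
    sessions(before, gapMs) ++ sessions(after, gapMs)
  }
}
```

With timestamps 1, 2, 3, 10, 11, 20 and a gap of 2, the continuous view finds the session (10, 11); a cut at 11 breaks it into two sessions because the partial session was not carried over.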
Re-processing data (in batch)
(Diagram: Savepoint A and Savepoint B marked along the timeline.)
Re-processing data (continuous)
Draw savepoints at the times from which you will want to start new jobs (daily, hourly, …)
Reprocess by starting a new job from a savepoint
• Defines start position in stream (for example Kafka offsets)
• Initializes pending state (like partial sessions)
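The two roles of a savepoint listed above, a start position and the pending state, can be modeled in a few lines. This is a sketch with a hypothetical `Savepoint` type holding a log offset and running counts per key; it does not reflect Flink's savepoint format or API.

```scala
object SavepointSketch {
  // Hypothetical savepoint: a start position in the stream plus the state
  // accumulated up to that position (here: running counts per key).
  final case class Savepoint(offset: Int, counts: Map[String, Long])

  val empty: Savepoint = Savepoint(0, Map.empty)

  // Processes log records from the savepoint's offset up to (excluding) `until`
  // and returns a savepoint describing the job at that point.
  def run(log: Vector[String], from: Savepoint, until: Int): Savepoint = {
    val counts = log.slice(from.offset, until).foldLeft(from.counts) { (acc, key) =>
      acc.updated(key, acc.getOrElse(key, 0L) + 1L)
    }
    Savepoint(math.max(from.offset, math.min(until, log.length)), counts)
  }
}
```

A job resumed from a savepoint taken at offset 3 and run to offset 5 ends in the same state as a job that processed the log from the beginning, which is what makes a savepoint a valid starting point for a new job.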
Forking and Versioning Applications
(Diagram: App. A runs continuously and draws savepoints; App. B and App. C are each started from one of those savepoints.)
Conclusion
Wrap up
Streaming is the architecture for continuous processing
Upcoming Features
Dynamic Scaling, Resource Elasticity
Stream SQL
CEP enhancements
Incremental & asynchronous state snapshotting
Mesos support
More connectors, end-to-end exactly once
API enhancements (e.g., joins, slowly changing inputs)
Security (data encryption, Kerberos with Kafka)
Flink Forward 2016, Berlin
Submission deadline: June 30, 2016
Early bird deadline: July 15, 2016
www.flink-forward.org
We are hiring!
data-artisans.com/careers