Data Streaming in
Big Data Analysis
Docent lecture
Vincenzo Gulisano, Ph.D.
Vincenzo Gulisano Data streaming in Big Data analysis 1
Agenda
•Why data streaming?
•How does it work?
•Past, current and future research
•Conclusions
•Bibliography
Why data streaming?
... back in the year 2000
• Continuous processing of data streams
• Real-time fashion
... Store-then-process is not feasible
~ 2010
Financial applications
Sensor networks
ISPs
2017
Advanced Metering Infrastructures
Vehicular Networks
1. Billions of readings per day cannot be transferred continuously
2. The latency incurred while transferring data might undermine the utility of the analysis
3. It is not secure to concentrate all the data in a single place
4. Privacy can be leaked when giving away fine-grained data
What do we need then?
• Efficient one-pass analysis
• In memory
• Bounded resources
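The three requirements above can be illustrated with a running summary that reads each value once and keeps a constant amount of state. This is only a minimal sketch: the class name `RunningStats`, the helper `summarize` and the sample speeds are assumptions made for illustration, not part of the lecture.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """One-pass summary: constant memory no matter how many readings arrive."""
    count: int = 0
    mean: float = 0.0
    maximum: float = float("-inf")

    def update(self, value: float) -> None:
        # Incremental mean: a reading is discarded once it has been processed.
        self.count += 1
        self.mean += (value - self.mean) / self.count
        self.maximum = max(self.maximum, value)


def summarize(readings):
    """Consume an iterable (possibly unbounded in practice) exactly once."""
    stats = RunningStats()
    for value in readings:
        stats.update(value)
    return stats


# Hypothetical readings, replayed from a finite iterator for the example.
stats = summarize(iter([55.5, 70.3, 34.3]))
print(stats.count, stats.mean, stats.maximum)
```

The state is three numbers regardless of the stream length, which is what makes the analysis feasible in memory.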
How does it work?
DBMS vs. DSMS
[Figure: in a DBMS, (1) data is stored on disk, (2) a query is issued, and (3) query processing in main memory returns the query results. In a DSMS, a continuous query resides in main memory and data flows through query processing, which produces the query results.]
data stream: unbounded sequence of tuples sharing the same schema
Example: vehicles’ speed and position reports

Field          Type
vehicle id     text
time (secs)    text
speed (Km/h)   double
X coordinate   double
Y coordinate   double

Tuples, in time order:
A 8:00 55.5 X1 Y1
A 8:03 70.3 X2 Y2
A 8:07 34.3 X3 Y3
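As a rough illustration of the schema above, a tuple can be modelled as an immutable record and a stream as a generator. The names `PositionReport` and `position_reports` are hypothetical, and the numeric coordinates stand in for the symbolic X1..X3 / Y1..Y3 values of the slide.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class PositionReport:
    """One tuple of the stream; every tuple shares this schema."""
    vehicle_id: str
    time: str      # kept as text, as in the schema of the example
    speed: float   # Km/h
    x: float       # placeholder coordinate (assumption)
    y: float       # placeholder coordinate (assumption)


def position_reports() -> Iterator[PositionReport]:
    """Hypothetical source: a real stream is unbounded, here three tuples are replayed."""
    yield PositionReport("A", "8:00", 55.5, 1.0, 1.0)
    yield PositionReport("A", "8:03", 70.3, 2.0, 2.0)
    yield PositionReport("A", "8:07", 34.3, 3.0, 3.0)


for report in position_reports():
    print(report)
```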
continuous query: Directed Acyclic Graph (DAG) of streams and operators
[Figure: a DAG of operators (OP) connected by streams]
• source op (1+ out streams)
• sink op (1+ in streams)
• op (1+ in, 1+ out streams)
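One way to picture a continuous query is as a mapping from each operator to the operators whose output streams it reads. The sketch below uses Python's standard `graphlib` module (Python 3.9+) to check that such a graph is acyclic. The operator names and the helpers `execution_order` and `is_valid_query` are assumptions made for illustration.

```python
from graphlib import CycleError, TopologicalSorter

# Each operator maps to the set of operators whose output streams it reads.
# Operator names are hypothetical.
query = {
    "source": set(),                  # source op: only output streams
    "filter": {"source"},
    "aggregate": {"filter"},
    "join": {"filter", "aggregate"},
    "sink": {"join"},                 # sink op: only input streams
}


def execution_order(dag):
    """Return the operators so that every producer precedes its consumers."""
    return list(TopologicalSorter(dag).static_order())


def is_valid_query(dag):
    """A continuous query must be acyclic; graphlib raises CycleError otherwise."""
    try:
        execution_order(dag)
        return True
    except CycleError:
        return False


print(execution_order(query))
print(is_valid_query({"a": {"b"}, "b": {"a"}}))
```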
data streaming operators
• Stateless operators
• do not maintain any state
• one-by-one processing
• Stateful operators
• maintain a state that evolves with the tuples being processed
• produce output tuples that depend on multiple input tuples
stateless operators
• Filter: filter / route tuples based on one (or more) conditions
• Map: transform each tuple
• Union: merge multiple streams (with the same schema) into one
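The three stateless operators can be sketched as Python generators that handle one tuple at a time and keep no state between tuples. The function names, the dictionary-based tuples and the round-robin interleaving in the union are assumptions chosen for the example, not a description of a specific engine.

```python
def filter_op(stream, condition):
    """Filter: forward only the tuples that satisfy the condition."""
    for t in stream:
        if condition(t):
            yield t


def map_op(stream, transform):
    """Map: transform each tuple independently of all the others."""
    for t in stream:
        yield transform(t)


def union_op(*streams):
    """Union: merge streams sharing a schema (here interleaved round-robin)."""
    iterators = [iter(s) for s in streams]
    while iterators:
        for it in list(iterators):
            try:
                yield next(it)
            except StopIteration:
                iterators.remove(it)


# Hypothetical input streams with the same schema.
lane_1 = [{"vehicle_id": "A", "speed": 55.5}, {"vehicle_id": "A", "speed": 70.3}]
lane_2 = [{"vehicle_id": "B", "speed": 34.3}]

merged = union_op(lane_1, lane_2)
fast = filter_op(merged, lambda t: t["speed"] > 50.0)
in_ms = map_op(fast, lambda t: {**t, "speed_ms": t["speed"] / 3.6})
result = list(in_ms)
print(result)
```

Because the operators are chained lazily, each tuple flows through union, filter and map before the next one is read.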
stateful operators
• Aggregate: aggregate information from multiple tuples (e.g., compute average speed of the tuples in the last hour)
• Join: compare tuples coming from 2 streams given a certain predicate (e.g., given the last 5 tuples from each stream, join every pair reporting the same position)
Since streams are unbounded, windows (over time or tuples) are defined to bound the portion of tuples to aggregate or join
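A possible sketch of the two window types follows: a time-based window for the aggregate and a tuple-based window for the join. The class names `SlidingAverage` and `WindowJoin`, the numeric timestamps in seconds and the assumption that tuples arrive in timestamp order are illustrative choices, not taken from the lecture.

```python
from collections import defaultdict, deque


class SlidingAverage:
    """Time-based window: average speed per vehicle over the last `size` seconds."""

    def __init__(self, size):
        self.size = size
        self.windows = defaultdict(deque)  # vehicle id -> deque of (time, speed)

    def process(self, vehicle_id, time, speed):
        window = self.windows[vehicle_id]
        window.append((time, speed))
        # Evict tuples that fell out of the window; this keeps the state bounded.
        while window[0][0] <= time - self.size:
            window.popleft()
        return sum(s for _, s in window) / len(window)


class WindowJoin:
    """Tuple-based window: keep the last `size` tuples of each stream and
    match every pair that satisfies the predicate."""

    def __init__(self, size, predicate):
        self.predicate = predicate
        self.left = deque(maxlen=size)
        self.right = deque(maxlen=size)

    def process_left(self, t):
        self.left.append(t)
        return [(t, r) for r in self.right if self.predicate(t, r)]

    def process_right(self, t):
        self.right.append(t)
        return [(l, t) for l in self.left if self.predicate(l, t)]


avg = SlidingAverage(size=3600)  # the last hour
print(avg.process("A", 0, 55.5))
print(avg.process("A", 180, 70.3))
print(avg.process("A", 4000, 34.3))  # the two older tuples have expired

join = WindowJoin(size=5, predicate=lambda l, r: l["pos"] == r["pos"])
join.process_left({"id": "A", "pos": (1, 1)})
print(join.process_right({"id": "B", "pos": (1, 1)}))
```

Both operators hold at most one window of tuples, so their memory footprint does not grow with the length of the stream.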
sample query
For each vehicle, raise an alert if the speed of the latest report is more than twice its average speed in the last 30 days.
[Figure: query plan]
1. Aggregate: compute average speed for each vehicle during the last 30 days
   Input: vehicle id, time (secs), speed (Km/h), X coordinate, Y coordinate
   Output: vehicle id, time (secs), avg speed (Km/h)
2. Join: join on vehicle id
   Inputs: (vehicle id, time (secs), avg speed (Km/h)) and (vehicle id, time (secs), speed (Km/h))
   Output: vehicle id, time (secs), avg speed (Km/h), speed (Km/h)
3. Filter: check condition
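The sample query can be approximated in a few lines by fusing the Aggregate, Join and Filter steps into a single generator. This is a sketch under stated assumptions: timestamps are numeric seconds arriving in order, the average covers the reports that precede the latest one, and the name `speeding_alerts` and its parameters are hypothetical.

```python
from collections import defaultdict, deque

WINDOW = 30 * 24 * 3600  # 30 days, in seconds


def speeding_alerts(reports, window=WINDOW, factor=2.0):
    """Yield (vehicle_id, time, avg_speed, speed) for every report whose speed
    exceeds `factor` times the vehicle's average over the preceding window."""
    history = defaultdict(deque)  # vehicle id -> deque of (time, speed)
    for vehicle_id, time, speed in reports:
        past = history[vehicle_id]
        # Aggregate: evict expired tuples, then average what is left.
        while past and past[0][0] <= time - window:
            past.popleft()
        if past:
            avg_speed = sum(s for _, s in past) / len(past)
            # Join on vehicle id: pair the latest report with its average.
            joined = (vehicle_id, time, avg_speed, speed)
            # Filter: check the alert condition.
            if speed > factor * avg_speed:
                yield joined
        past.append((time, speed))


# Hypothetical reports: (vehicle id, time in seconds, speed in Km/h).
reports = [
    ("A", 0, 50.0),
    ("A", 60, 60.0),
    ("B", 90, 30.0),
    ("A", 120, 130.0),
    ("B", 150, 40.0),
]
alerts = list(speeding_alerts(reports))
print(alerts)
```

Recomputing the sum on every report keeps the sketch short. A production operator would more likely maintain a running sum per vehicle.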
Past, current and future research
Fault tolerance
Elasticity
Load balancing
Determinism
Parallel execution of streaming operators
Parallel execution of streaming applications
[Figure: the query OP1 -> OP2 is equivalent to several parallel OP1 -> OP2 instances]
1) How to route tuples?
2) Where to route tuples?
3) How to merge tuples?
4) How many instances to deploy per operator?
...
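The routing and merging questions above can be made concrete with one common choice: route by key so that all tuples of a vehicle reach the same instance, and merge the instances' outputs by timestamp so the result does not depend on how they interleave. The sketch below is one option among many, not the approach of any specific system. The function names and the use of CRC32 as the hash are assumptions.

```python
import heapq
import zlib


def route(key, instances):
    """Key-based routing: all tuples with the same key reach the same instance.
    crc32 is used because Python's built-in hash() of str varies across runs."""
    return zlib.crc32(key.encode("utf-8")) % instances


def partition(stream, instances):
    """Split a timestamp-sorted stream into one sub-stream per operator instance."""
    outputs = [[] for _ in range(instances)]
    for t in stream:
        outputs[route(t["vehicle_id"], instances)].append(t)
    return outputs


def merge(substreams):
    """Deterministic merge: timestamp-sorted sub-streams are combined into one
    timestamp-sorted stream, whatever the interleaving of the instances."""
    return list(heapq.merge(*substreams, key=lambda t: t["time"]))


# Hypothetical stream of six tuples from three vehicles.
stream = [{"vehicle_id": v, "time": i} for i, v in enumerate("ABCABC")]
parts = partition(stream, 3)
print(merge(parts) == stream)
```

Key-based routing keeps per-vehicle state on a single instance, which stateful operators such as the aggregate need. It does not by itself address load balancing when some keys are much more frequent than others.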
Distributed execution of streaming applications
Many-core systems / FPGAs
Synchronization / Data structures
Parallel joins
Parallel aggregates
Joins modeling
Security and privacy
IoT
Transportation sustainability
DDoS detection and mitigation
Intrusion detection
Data validation
Differentially private aggregation
Vehicular networks analysis
Urban mobility analysis
first the hardware, then the query
first the query, then the hardware
Conclusions
Millions of sensors
• Store information
• Iterate multiple times over data
• Think, do not rush through decisions
Do I really need to try surströmming? (... NO)
• "Hard-wired" routines
• Real-time decisions
• High-throughput / low-latency
Danger!!! Run!!! (surströmming can opened)
What traffic congestion patterns can I observe frequently?
Don’t overtake, car in opposite lane!
Bibliography
1. Zhou, Jiazhen, Rose Qingyang Hu, and Yi Qian. "Scalable distributed communication architectures to support advanced
metering infrastructure in smart grid." IEEE Transactions on Parallel and Distributed Systems 23.9 (2012): 1632-1642.
2. Gulisano, Vincenzo, et al. "BES: Differentially Private and Distributed Event Aggregation in Advanced Metering Infrastructures."
Proceedings of the 2nd ACM International Workshop on Cyber-Physical System Security. ACM, 2016.
3. Gulisano, Vincenzo, Magnus Almgren, and Marina Papatriantafilou. "Online and scalable data validation in advanced metering
infrastructures." IEEE PES Innovative Smart Grid Technologies, Europe. IEEE, 2014.
4. Gulisano, Vincenzo, Magnus Almgren, and Marina Papatriantafilou. "METIS: a two-tier intrusion detection system for advanced
metering infrastructures." International Conference on Security and Privacy in Communication Systems. Springer International
Publishing, 2014.
5. Yousefi, Saleh, Mahmoud Siadat Mousavi, and Mahmood Fathy. "Vehicular ad hoc networks (VANETs): challenges and
perspectives." 2006 6th International Conference on ITS Telecommunications. IEEE, 2006.
6. El Zarki, Magda, et al. "Security issues in a future vehicular network." European Wireless. Vol. 2. 2002.
7. Georgiadis, Giorgos, and Marina Papatriantafilou. "Dealing with storage without forecasts in smart grids: Problem
transformation and online scheduling algorithm." Proceedings of the 29th Annual ACM Symposium on Applied Computing.
ACM, 2014.
8. Fu, Zhang, et al. "Online temporal-spatial analysis for detection of critical events in Cyber-Physical Systems." Big Data (Big
Data), 2014 IEEE International Conference on. IEEE, 2014.
9. Arasu, Arvind, et al. "Linear road: a stream data management benchmark." Proceedings of the Thirtieth international
conference on Very large data bases-Volume 30. VLDB Endowment, 2004.
10. Lv, Yisheng, et al. "Traffic flow prediction with big data: a deep learning approach." IEEE Transactions on Intelligent
Transportation Systems 16.2 (2015): 865-873.
11. Grochocki, David, et al. "AMI threats, intrusion detection requirements and deployment recommendations." Smart Grid
Communications (SmartGridComm), 2012 IEEE Third International Conference on. IEEE, 2012.
12. Molina-Markham, Andrés, et al. "Private memoirs of a smart meter." Proceedings of the 2nd ACM workshop on embedded
sensing systems for energy-efficiency in building. ACM, 2010.
13. Gulisano, Vincenzo, et al. "Streamcloud: A large scale data streaming system." Distributed Computing Systems (ICDCS), 2010
IEEE 30th International Conference on. IEEE, 2010.
14. Stonebraker, Michael, Uğur Çetintemel, and Stan Zdonik. "The 8 requirements of real-time stream processing." ACM SIGMOD
Record 34.4 (2005): 42-47.
15. Bonomi, Flavio, et al. "Fog computing and its role in the internet of things." Proceedings of the first edition of the MCC
workshop on Mobile cloud computing. ACM, 2012.
16. Gulisano, Vincenzo Massimiliano. StreamCloud: An Elastic Parallel-Distributed Stream Processing Engine. Diss. Informatica,
2012.
17. Cardellini, Valeria, et al. "Optimal operator placement for distributed stream processing applications." Proceedings of the 10th
ACM International Conference on Distributed and Event-based Systems. ACM, 2016.
18. Costache, Stefania, et al. "Understanding the Data-Processing Challenges in Intelligent Vehicular Systems." Proceedings of the
2016 IEEE Intelligent Vehicles Symposium (IV16).
19. Giatrakos, Nikos, Antonios Deligiannakis, and Minos Garofalakis. "Scalable Approximate Query Tracking over Highly Distributed
Data Streams." Proceedings of the 2016 International Conference on Management of Data. ACM, 2016.
20. Gulisano, Vincenzo, et al. "Streamcloud: An elastic and scalable data streaming system." IEEE Transactions on Parallel and
Distributed Systems 23.12 (2012): 2351-2365.
21. Shah, Mehul A., et al. "Flux: An adaptive partitioning operator for continuous query systems." Data Engineering, 2003.
Proceedings. 19th International Conference on. IEEE, 2003.
22. Cederman, Daniel, et al. "Brief announcement: concurrent data structures for efficient streaming aggregation." Proceedings of
the 26th ACM symposium on Parallelism in algorithms and architectures. ACM, 2014.
23. Ji, Yuanzhen, et al. "Quality-driven processing of sliding window aggregates over out-of-order data streams." Proceedings of
the 9th ACM International Conference on Distributed Event-Based Systems. ACM, 2015.
24. Ji, Yuanzhen, et al. "Quality-driven disorder handling for concurrent windowed stream queries with shared operators."
Proceedings of the 10th ACM International Conference on Distributed and Event-based Systems. ACM, 2016.
25. Gulisano, Vincenzo, et al. "Scalejoin: A deterministic, disjoint-parallel and skew-resilient stream join." Big Data (Big Data), 2015
IEEE International Conference on. IEEE, 2015.
26. Ottenwälder, Beate, et al. "MigCEP: operator migration for mobility driven distributed complex event processing." Proceedings
of the 7th ACM international conference on Distributed event-based systems. ACM, 2013.
27. De Matteis, Tiziano, and Gabriele Mencagli. "Keep calm and react with foresight: strategies for low-latency and energy-efficient
elastic data stream processing." Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel
Programming. ACM, 2016.
28. Balazinska, Magdalena, et al. "Fault-tolerance in the Borealis distributed stream processing system." ACM Transactions on
Database Systems (TODS) 33.1 (2008): 3.
29. Castro Fernandez, Raul, et al. "Integrating scale out and fault tolerance in stream processing using operator state
management." Proceedings of the 2013 ACM SIGMOD international conference on Management of data. ACM, 2013.
30. Dwork, Cynthia. "Differential privacy: A survey of results." International Conference on Theory and Applications of Models of
Computation. Springer Berlin Heidelberg, 2008.
31. Dwork, Cynthia, et al. "Differential privacy under continual observation." Proceedings of the forty-second ACM symposium on
Theory of computing. ACM, 2010.
32. Kargl, Frank, Arik Friedman, and Roksana Boreli. "Differential privacy in intelligent transportation systems." Proceedings of the
sixth ACM conference on Security and privacy in wireless and mobile networks. ACM, 2013.
A CASE OF MULTINODULAR GOITRE,clinical presentation and management.pptxA CASE OF MULTINODULAR GOITRE,clinical presentation and management.pptx
A CASE OF MULTINODULAR GOITRE,clinical presentation and management.pptx
ANJALICHANDRASEKARAN
 
Seismic evidence of liquid water at the base of Mars' upper crust
Seismic evidence of liquid water at the base of Mars' upper crustSeismic evidence of liquid water at the base of Mars' upper crust
Seismic evidence of liquid water at the base of Mars' upper crust
Sérgio Sacani
 
AP 2024 Unit 1 Updated Chemistry of Life
AP 2024 Unit 1 Updated Chemistry of LifeAP 2024 Unit 1 Updated Chemistry of Life
AP 2024 Unit 1 Updated Chemistry of Life
mseileenlinden
 
Funakoshi_ZymoResearch_2024-2025_catalog
Funakoshi_ZymoResearch_2024-2025_catalogFunakoshi_ZymoResearch_2024-2025_catalog
Funakoshi_ZymoResearch_2024-2025_catalog
fu7koshi
 
Transgenic Mice in Cancer Research - Creative Biolabs
Transgenic Mice in Cancer Research - Creative BiolabsTransgenic Mice in Cancer Research - Creative Biolabs
Transgenic Mice in Cancer Research - Creative Biolabs
Creative-Biolabs
 
Study in Pink (forensic case study of Death)
Study in Pink (forensic case study of Death)Study in Pink (forensic case study of Death)
Study in Pink (forensic case study of Death)
memesologiesxd
 
Subject name: Introduction to psychology
Subject name: Introduction to psychologySubject name: Introduction to psychology
Subject name: Introduction to psychology
beebussy155
 
Chemistry of Warfare (Chemical weapons in warfare: An in-depth analysis of cl...
Chemistry of Warfare (Chemical weapons in warfare: An in-depth analysis of cl...Chemistry of Warfare (Chemical weapons in warfare: An in-depth analysis of cl...
Chemistry of Warfare (Chemical weapons in warfare: An in-depth analysis of cl...
Professional Content Writing's
 
Batteries and fuel cells for btech first year
Batteries and fuel cells for btech first yearBatteries and fuel cells for btech first year
Batteries and fuel cells for btech first year
MithilPillai1
 
Sleep_physiology_types_duration_underlying mech.
Sleep_physiology_types_duration_underlying mech.Sleep_physiology_types_duration_underlying mech.
Sleep_physiology_types_duration_underlying mech.
klynct
 
Euclid: The Story So far, a Departmental Colloquium at Maynooth University
Euclid: The Story So far, a Departmental Colloquium at Maynooth UniversityEuclid: The Story So far, a Departmental Colloquium at Maynooth University
Euclid: The Story So far, a Departmental Colloquium at Maynooth University
Peter Coles
 
ICAI OpenGov Lab: A Quick Introduction | AI for Open Government
ICAI OpenGov Lab: A Quick Introduction | AI for Open GovernmentICAI OpenGov Lab: A Quick Introduction | AI for Open Government
ICAI OpenGov Lab: A Quick Introduction | AI for Open Government
David Graus
 
Fatigue and its management in aviation medicine
Fatigue and its management in aviation medicineFatigue and its management in aviation medicine
Fatigue and its management in aviation medicine
ImranJewel2
 
An upper limit to the lifetime of stellar remnants from gravitational pair pr...
An upper limit to the lifetime of stellar remnants from gravitational pair pr...An upper limit to the lifetime of stellar remnants from gravitational pair pr...
An upper limit to the lifetime of stellar remnants from gravitational pair pr...
Sérgio Sacani
 
Astrobiological implications of the stability andreactivity of peptide nuclei...
Astrobiological implications of the stability andreactivity of peptide nuclei...Astrobiological implications of the stability andreactivity of peptide nuclei...
Astrobiological implications of the stability andreactivity of peptide nuclei...
Sérgio Sacani
 
External Application in Homoeopathy- Definition,Scope and Types.
External Application  in Homoeopathy- Definition,Scope and Types.External Application  in Homoeopathy- Definition,Scope and Types.
External Application in Homoeopathy- Definition,Scope and Types.
AdharshnaPatrick
 
Applications of Radioisotopes in Cancer Research.pptx
Applications of Radioisotopes in Cancer Research.pptxApplications of Radioisotopes in Cancer Research.pptx
Applications of Radioisotopes in Cancer Research.pptx
MahitaLaveti
 
Proprioceptors_ receptors of muscle_tendon
Proprioceptors_ receptors of muscle_tendonProprioceptors_ receptors of muscle_tendon
Proprioceptors_ receptors of muscle_tendon
klynct
 
Eric Schott- Environment, Animal and Human Health (3).pptx
Eric Schott- Environment, Animal and Human Health (3).pptxEric Schott- Environment, Animal and Human Health (3).pptx
Eric Schott- Environment, Animal and Human Health (3).pptx
ttalbert1
 
Preclinical Advances in Nuclear Neurology.pptx
Preclinical Advances in Nuclear Neurology.pptxPreclinical Advances in Nuclear Neurology.pptx
Preclinical Advances in Nuclear Neurology.pptx
MahitaLaveti
 
A CASE OF MULTINODULAR GOITRE,clinical presentation and management.pptx
A CASE OF MULTINODULAR GOITRE,clinical presentation and management.pptxA CASE OF MULTINODULAR GOITRE,clinical presentation and management.pptx
A CASE OF MULTINODULAR GOITRE,clinical presentation and management.pptx
ANJALICHANDRASEKARAN
 
Seismic evidence of liquid water at the base of Mars' upper crust
Seismic evidence of liquid water at the base of Mars' upper crustSeismic evidence of liquid water at the base of Mars' upper crust
Seismic evidence of liquid water at the base of Mars' upper crust
Sérgio Sacani
 
AP 2024 Unit 1 Updated Chemistry of Life
AP 2024 Unit 1 Updated Chemistry of LifeAP 2024 Unit 1 Updated Chemistry of Life
AP 2024 Unit 1 Updated Chemistry of Life
mseileenlinden
 
Funakoshi_ZymoResearch_2024-2025_catalog
Funakoshi_ZymoResearch_2024-2025_catalogFunakoshi_ZymoResearch_2024-2025_catalog
Funakoshi_ZymoResearch_2024-2025_catalog
fu7koshi
 

Data Streaming in Big Data Analysis

  • 1. Data Streaming in Big Data Analysis Docent lecture Vincenzo Gulisano, Ph.D. Vincenzo Gulisano Data streaming in Big Data analysis 1
  • 3. Agenda: Why data streaming? How does it work? Past, current and future research. Conclusions. Bibliography.
  • 4. Agenda (next section: Why data streaming?)
  • 5. ... back in the year 2000: financial applications, sensor networks, ISPs. Continuous processing of data streams, in a real-time fashion ... store-then-process is not feasible.
  • 6. ~ 2010
  • 7. 2017: Advanced Metering Infrastructures, Vehicular Networks. 1. Billions of readings per day cannot be transferred continuously. 2. The latency incurred while transferring data might undermine the utility of the analysis. 3. It is not secure to concentrate all the data in a single place. 4. Privacy can be leaked when giving away fine-grained data. What do we need then? Efficient one-pass analysis; in memory; bounded resources.
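
The three requirements on slide 7 (one pass, in memory, bounded resources) can be illustrated with a running mean. This is only a minimal sketch: the class name `RunningMean` and the sample readings are hypothetical and do not come from the lecture.

```python
from dataclasses import dataclass


@dataclass
class RunningMean:
    """One-pass mean with constant state: a counter and the current mean."""
    count: int = 0
    mean: float = 0.0

    def update(self, value: float) -> float:
        # Incremental update; the reading itself is not stored afterwards.
        self.count += 1
        self.mean += (value - self.mean) / self.count
        return self.mean


# Hypothetical meter readings, processed one at a time as they arrive.
readings = [4.0, 6.0, 8.0, 2.0]
rm = RunningMean()
for r in readings:
    rm.update(r)
```

The state stays the same size regardless of how many readings arrive, which is the property a store-then-process design lacks.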
  • 8. Agenda (next section: How does it work?)
  • 9. DBMS vs. DSMS. DBMS (disk and main memory): 1. Data, 2. Query, 3. Query results. DSMS (main memory only): a continuous query processes the incoming data and produces query results.
  • 10. data stream: unbounded sequence of tuples sharing the same schema. Example: vehicles' speed and position reports. Schema (field: type): vehicle id: text; time (secs): text; speed (km/h): double; X coordinate: double; Y coordinate: double. Sample tuples, in time order: (A, 8:00, 55.5, X1, Y1), (A, 8:03, 70.3, X2, Y2), (A, 8:07, 34.3, X3, Y3).
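
The schema on slide 10 could be written down as a typed record, with a generator standing in for the unbounded stream. This is a sketch under assumptions: the names `PositionReport` and `reports` are hypothetical, and the numeric coordinates replace the slide's symbolic X1..X3 / Y1..Y3.

```python
from typing import Iterator, NamedTuple


class PositionReport(NamedTuple):
    vehicle_id: str
    time: str         # the slide types this field as text, e.g. "8:00"
    speed_kmh: float
    x: float
    y: float


def reports() -> Iterator[PositionReport]:
    """Finite stand-in for an unbounded source; a real source never ends."""
    yield PositionReport("A", "8:00", 55.5, 1.0, 1.0)
    yield PositionReport("A", "8:03", 70.3, 2.0, 2.0)
    yield PositionReport("A", "8:07", 34.3, 3.0, 3.0)


sample = list(reports())
```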
  • 11. continuous query: Directed Acyclic Graph (DAG) of streams and operators. Operator kinds: source op (1+ out streams), stream op (1+ in, 1+ out streams), sink op (1+ in streams).
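
As a small illustration of the DAG view on slide 11, a query can be described by listing, for each operator, the operators that feed it. The operator names below are hypothetical; `graphlib` is part of the Python standard library (3.9+) and is used here only to check acyclicity and obtain a valid deployment order.

```python
from graphlib import TopologicalSorter

# Hypothetical query: each operator maps to the set of operators feeding it.
query = {
    "filter": {"source"},
    "aggregate": {"filter"},
    "join": {"filter", "aggregate"},
    "sink": {"join"},
}

# static_order() raises graphlib.CycleError if the graph is not acyclic.
order = list(TopologicalSorter(query).static_order())
```

In this example the source has only outgoing streams, the sink only incoming ones, and every other operator has both, matching the three operator kinds on the slide.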
  • 12. data streaming operators. Stateless operators: do not maintain any state; one-by-one processing. Stateful operators: maintain a state that evolves with the tuples being processed; produce output tuples that depend on multiple input tuples.
  • 13. stateless operators. Filter: filter / route tuples based on one (or more) conditions. Map: transform each tuple. Union: merge multiple streams (with the same schema) into one.
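
The three stateless operators on slide 13 map naturally onto Python generators, since each output depends on a single input tuple. This is a sketch only: the function names are hypothetical, tuples are simplified to (vehicle id, time in secs, speed in km/h), and merging the Union inputs in timestamp order is an assumption rather than something the slide states.

```python
from heapq import merge
from typing import Callable, Iterable, Iterator, Tuple

Report = Tuple[str, int, float]  # (vehicle id, time in secs, speed in km/h)


def filter_op(stream: Iterable[Report],
              predicate: Callable[[Report], bool]) -> Iterator[Report]:
    # Forward only the tuples that satisfy the condition.
    for t in stream:
        if predicate(t):
            yield t


def map_op(stream: Iterable[Report],
           fn: Callable[[Report], Report]) -> Iterator[Report]:
    # Transform each tuple independently of all others.
    for t in stream:
        yield fn(t)


def union_op(*streams: Iterable[Report]) -> Iterator[Report]:
    # Assumption: every input is already sorted by time, so a lazy
    # merge on the time field yields one time-sorted output stream.
    yield from merge(*streams, key=lambda t: t[1])


a = [("A", 0, 55.5), ("A", 180, 70.3)]
b = [("B", 60, 40.0), ("B", 420, 90.0)]

fast = filter_op(union_op(a, b), lambda t: t[2] > 50.0)
in_mps = list(map_op(fast, lambda t: (t[0], t[1], round(t[2] / 3.6, 2))))
```

None of the three functions keeps anything between tuples, which is what makes them stateless.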
  • 14. stateful operators. Aggregate: aggregate information from multiple tuples (e.g., compute the average speed of the tuples in the last hour). Join: compare tuples coming from 2 streams given a certain predicate (e.g., given the last 5 tuples from each stream, join every pair reporting the same position). Since streams are unbounded, windows (over time or tuples) are defined to bound the portion of tuples to aggregate or join.
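
A time-based sliding window, as used by the Aggregate example on slide 14, could be sketched as follows. Assumptions are labeled in the comments: tuples arrive in timestamp order, tuples are simplified to (vehicle id, time in secs, speed in km/h), and one result is emitted per input tuple. Stream processing engines commonly emit results per window advance instead, and the name `sliding_avg_speed` is hypothetical.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple

Report = Tuple[str, int, float]  # (vehicle id, time in secs, speed in km/h)


def sliding_avg_speed(stream: Iterable[Report],
                      window_secs: int) -> Iterator[Tuple[int, float]]:
    """Average speed over the last `window_secs` seconds (window_secs > 0)."""
    window: Deque[Report] = deque()  # operator state: tuples still in the window
    total = 0.0
    for report in stream:            # assumption: timestamp-ordered input
        _, now, speed = report
        window.append(report)
        total += speed
        # Evict tuples that fell out of the window (now - window_secs, now].
        while window[0][1] <= now - window_secs:
            total -= window.popleft()[2]
        yield now, total / len(window)


stream = [("A", 0, 50.0), ("A", 60, 70.0), ("A", 3600, 30.0)]
averages = list(sliding_avg_speed(stream, window_secs=3600))
```

The window is what keeps the state bounded: the deque only ever holds tuples from the last hour, no matter how long the stream runs.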
  • 15. sample query: for each vehicle, raise an alert if the speed of the latest report is more than 2 times higher than its average speed in the last 30 days. Sample input tuples, in time order: (A, 8:00, 55.5, X1, Y1), (A, 8:03, 70.3, X2, Y2), (A, 8:07, 34.3, X3, Y3).
  • 16. sample query (operators). Input schema: vehicle id, time (secs), speed (km/h), X coordinate, Y coordinate. Aggregate: compute average speed for each vehicle during the last 30 days; output schema: vehicle id, time (secs), avg speed (km/h). Join: join on vehicle id with the reports' vehicle id, time (secs), speed (km/h); output schema: vehicle id, time (secs), avg speed (km/h), speed (km/h). Filter: check condition.
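
The sample query on slides 15 and 16 can be sketched in a few lines. For brevity this version fuses the Aggregate, Join and Filter steps into one function instead of wiring three separate operators. Two assumptions are made that the slides do not settle: the average excludes the report being checked, and tuples arrive in timestamp order. The name `speed_alerts` and the test data are hypothetical.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, Iterator, Tuple

Report = Tuple[str, int, float]        # (vehicle id, time in secs, speed in km/h)
Alert = Tuple[str, int, float, float]  # (vehicle id, time, speed, avg speed)

THIRTY_DAYS = 30 * 24 * 3600


def speed_alerts(stream: Iterable[Report],
                 window_secs: int = THIRTY_DAYS,
                 factor: float = 2.0) -> Iterator[Alert]:
    windows: Dict[str, Deque[Tuple[int, float]]] = defaultdict(deque)
    totals: Dict[str, float] = defaultdict(float)
    for vid, now, speed in stream:
        win = windows[vid]
        # Aggregate: drop this vehicle's reports older than the window.
        while win and win[0][0] <= now - window_secs:
            totals[vid] -= win.popleft()[1]
        # Join on vehicle id, then Filter on the alert condition.
        if win:
            avg = totals[vid] / len(win)
            if speed > factor * avg:
                yield vid, now, speed, avg
        # Assumption: the current report joins the window only afterwards.
        win.append((now, speed))
        totals[vid] += speed


stream = [("A", 0, 50.0), ("B", 10, 30.0), ("A", 60, 70.0),
          ("A", 120, 130.0), ("B", 130, 90.0)]
alerts = list(speed_alerts(stream))
```

State is kept per vehicle id, which is also what would allow the work to be split across parallel instances by key.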
  • 18. Agenda (next section: Past, current and future research)
  • 19. Parallel execution of streaming applications. Topics: parallel execution of streaming operators, fault tolerance, elasticity, load balancing, determinism. Running a query OP1 → OP2 as several parallel OP1 → OP2 instances raises questions: 1) How to route tuples? 2) Where to route tuples? 3) How to merge tuples? 4) How many instances to deploy per operator? ...
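
One common answer to the routing questions on slide 19 is key-based (hash) partitioning: all tuples with the same key go to the same operator instance, so per-key state never has to be shared. The sketch below shows that general technique, not necessarily the approach taken in the work listed in the bibliography; the function names and sample data are hypothetical.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Report = Tuple[str, int, float]  # (vehicle id, time in secs, speed in km/h)


def route(key: str, num_instances: int) -> int:
    # crc32 is stable across runs, unlike Python's built-in hash() for str.
    return zlib.crc32(key.encode("utf-8")) % num_instances


def partition(stream: Iterable[Report],
              num_instances: int) -> Dict[int, List[Report]]:
    """Assign each tuple to an instance index based on its vehicle id."""
    out: Dict[int, List[Report]] = defaultdict(list)
    for report in stream:
        out[route(report[0], num_instances)].append(report)
    return dict(out)


stream = [("A", 0, 55.5), ("B", 10, 40.0), ("A", 180, 70.3), ("C", 200, 61.0)]
parts = partition(stream, num_instances=3)
```

Hash routing alone does not address load balancing under skewed keys or changing the number of instances at run time, which are among the topics the slide lists.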
  • 20. Parallel execution of streaming applications. Topics: parallel execution of streaming operators, fault tolerance, elasticity, load balancing, determinism. Applications: DDoS detection and mitigation, intrusion detection, data validation, differentially private aggregation, vehicular networks analysis, urban mobility analysis. Areas: security and privacy, IoT, transportation sustainability.
  • 21. Parallel execution of streaming applications. Topics: parallel execution of streaming operators, fault tolerance, elasticity, load balancing, determinism. Many-core systems / FPGAs: synchronization / data structures, parallel joins, parallel aggregates, joins modeling. Applications: DDoS detection and mitigation, intrusion detection, data validation, differentially private aggregation, vehicular networks analysis, urban mobility analysis. Areas: security and privacy, IoT, transportation sustainability.
  • 22. Parallel execution of streaming applications; distributed execution of streaming applications. Topics: parallel execution of streaming operators, fault tolerance, elasticity, load balancing, determinism. Many-core systems / FPGAs: synchronization / data structures. Applications: DDoS detection and mitigation, intrusion detection, data validation, differentially private aggregation, vehicular networks analysis, urban mobility analysis. Areas: security and privacy, transportation sustainability. Two approaches: first the hardware, then the query; first the query, then the hardware.
  • 23. Agenda (next section: Conclusions)
  • 24. Millions of sensors. Store information; iterate multiple times over data; think, do not rush through decisions: "Do I really need to try surströmming? (... NO)". "Hard-wired" routines; real-time decisions; high-throughput / low-latency: "Danger!!! Run!!! (surströmming can opened)".
  • 25. Millions of sensors. Store information; iterate multiple times over data; think, do not rush through decisions: "What traffic congestion patterns can I observe frequently?". "Hard-wired" routines; real-time decisions; high-throughput / low-latency: "Don't take over, car in opposite lane!".
  • 26. Agenda (next section: Bibliography)
  • 27. Bibliography
      1. Zhou, Jiazhen, Rose Qingyang Hu, and Yi Qian. "Scalable distributed communication architectures to support advanced metering infrastructure in smart grid." IEEE Transactions on Parallel and Distributed Systems 23.9 (2012): 1632-1642.
      2. Gulisano, Vincenzo, et al. "BES: Differentially Private and Distributed Event Aggregation in Advanced Metering Infrastructures." Proceedings of the 2nd ACM International Workshop on Cyber-Physical System Security. ACM, 2016.
      3. Gulisano, Vincenzo, Magnus Almgren, and Marina Papatriantafilou. "Online and scalable data validation in advanced metering infrastructures." IEEE PES Innovative Smart Grid Technologies, Europe. IEEE, 2014.
      4. Gulisano, Vincenzo, Magnus Almgren, and Marina Papatriantafilou. "METIS: a two-tier intrusion detection system for advanced metering infrastructures." International Conference on Security and Privacy in Communication Systems. Springer International Publishing, 2014.
      5. Yousefi, Saleh, Mahmoud Siadat Mousavi, and Mahmood Fathy. "Vehicular ad hoc networks (VANETs): challenges and perspectives." 2006 6th International Conference on ITS Telecommunications. IEEE, 2006.
      6. El Zarki, Magda, et al. "Security issues in a future vehicular network." European Wireless. Vol. 2. 2002.
      7. Georgiadis, Giorgos, and Marina Papatriantafilou. "Dealing with storage without forecasts in smart grids: Problem transformation and online scheduling algorithm." Proceedings of the 29th Annual ACM Symposium on Applied Computing. ACM, 2014.
      8. Fu, Zhang, et al. "Online temporal-spatial analysis for detection of critical events in Cyber-Physical Systems." Big Data (Big Data), 2014 IEEE International Conference on. IEEE, 2014.
  • 28. Bibliography
      9. Arasu, Arvind, et al. "Linear road: a stream data management benchmark." Proceedings of the Thirtieth international conference on Very large data bases-Volume 30. VLDB Endowment, 2004.
      10. Lv, Yisheng, et al. "Traffic flow prediction with big data: a deep learning approach." IEEE Transactions on Intelligent Transportation Systems 16.2 (2015): 865-873.
      11. Grochocki, David, et al. "AMI threats, intrusion detection requirements and deployment recommendations." Smart Grid Communications (SmartGridComm), 2012 IEEE Third International Conference on. IEEE, 2012.
      12. Molina-Markham, Andrés, et al. "Private memoirs of a smart meter." Proceedings of the 2nd ACM workshop on embedded sensing systems for energy-efficiency in building. ACM, 2010.
      13. Gulisano, Vincenzo, et al. "Streamcloud: A large scale data streaming system." Distributed Computing Systems (ICDCS), 2010 IEEE 30th International Conference on. IEEE, 2010.
      14. Stonebraker, Michael, Uǧur Çetintemel, and Stan Zdonik. "The 8 requirements of real-time stream processing." ACM SIGMOD Record 34.4 (2005): 42-47.
      15. Bonomi, Flavio, et al. "Fog computing and its role in the internet of things." Proceedings of the first edition of the MCC workshop on Mobile cloud computing. ACM, 2012.
  • 29. Bibliography
      16. Gulisano, Vincenzo Massimiliano. StreamCloud: An Elastic Parallel-Distributed Stream Processing Engine. Diss. Informatica, 2012.
      17. Cardellini, Valeria, et al. "Optimal operator placement for distributed stream processing applications." Proceedings of the 10th ACM International Conference on Distributed and Event-based Systems. ACM, 2016.
      18. Costache, Stefania, et al. "Understanding the Data-Processing Challenges in Intelligent Vehicular Systems." Proceedings of the 2016 IEEE Intelligent Vehicles Symposium (IV16).
      19. Giatrakos, Nikos, Antonios Deligiannakis, and Minos Garofalakis. "Scalable Approximate Query Tracking over Highly Distributed Data Streams." Proceedings of the 2016 International Conference on Management of Data. ACM, 2016.
      20. Gulisano, Vincenzo, et al. "Streamcloud: An elastic and scalable data streaming system." IEEE Transactions on Parallel and Distributed Systems 23.12 (2012): 2351-2365.
      21. Shah, Mehul A., et al. "Flux: An adaptive partitioning operator for continuous query systems." Data Engineering, 2003. Proceedings. 19th International Conference on. IEEE, 2003.
  • 30. Bibliography
      22. Cederman, Daniel, et al. "Brief announcement: concurrent data structures for efficient streaming aggregation." Proceedings of the 26th ACM symposium on Parallelism in algorithms and architectures. ACM, 2014.
      23. Ji, Yuanzhen, et al. "Quality-driven processing of sliding window aggregates over out-of-order data streams." Proceedings of the 9th ACM International Conference on Distributed Event-Based Systems. ACM, 2015.
      24. Ji, Yuanzhen, et al. "Quality-driven disorder handling for concurrent windowed stream queries with shared operators." Proceedings of the 10th ACM International Conference on Distributed and Event-based Systems. ACM, 2016.
      25. Gulisano, Vincenzo, et al. "Scalejoin: A deterministic, disjoint-parallel and skew-resilient stream join." Big Data (Big Data), 2015 IEEE International Conference on. IEEE, 2015.
      26. Ottenwälder, Beate, et al. "MigCEP: operator migration for mobility driven distributed complex event processing." Proceedings of the 7th ACM international conference on Distributed event-based systems. ACM, 2013.
      27. De Matteis, Tiziano, and Gabriele Mencagli. "Keep calm and react with foresight: strategies for low-latency and energy-efficient elastic data stream processing." Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. ACM, 2016.
      28. Balazinska, Magdalena, et al. "Fault-tolerance in the Borealis distributed stream processing system." ACM Transactions on Database Systems (TODS) 33.1 (2008): 3.
      29. Castro Fernandez, Raul, et al. "Integrating scale out and fault tolerance in stream processing using operator state management." Proceedings of the 2013 ACM SIGMOD international conference on Management of data. ACM, 2013.
  • 31. Bibliography
      30. Dwork, Cynthia. "Differential privacy: A survey of results." International Conference on Theory and Applications of Models of Computation. Springer Berlin Heidelberg, 2008.
      31. Dwork, Cynthia, et al. "Differential privacy under continual observation." Proceedings of the forty-second ACM symposium on Theory of computing. ACM, 2010.
      32. Kargl, Frank, Arik Friedman, and Roksana Boreli. "Differential privacy in intelligent transportation systems." Proceedings of the sixth ACM conference on Security and privacy in wireless and mobile networks. ACM, 2013.

Editor's Notes

  • #3: Before we start... questions and please notice
  • #25: making peace with DBs...