SlideShare a Scribd company logo
A New Algorithm Model for Massive-Scale
Streaming Graph Analysis
Chunxing Yin and Jason Riedy
Georgia Institute of Technology
ICIAM, 16 July 2019
Outline
Motivation and Applications
New Algorithm Model
Streaming Analysis
Closing
New Model for Streaming Graphs — ICIAM 2019 1/20
Motivation and Applications
(insert prefix here)-scale data analysis
Cyber-security Identify anomalies, malicious actors
Health care Finding outbreaks, population epidemiology
Social networks Advertising, searching, grouping
Intelligence Decisions at scale, regulating markets, smart &
sustainable cities
Systems biology Understanding interactions, drug design
Power grid Disruptions, conservation
Simulation Discrete events, cracking meshes
Changes are important. Cannot stop the world...
New Model for Streaming Graphs — ICIAM 2019 2/20
Potential Applications
• Social Networks
• Identify communities, influences, bridges, trends,
anomalies (trends before they happen)...
• Potential to help social sciences, city planning, and
others with large-scale data.
• Cybersecurity
• Determine if new connections can access a device or
represent new threat in < 5ms...
• Is the transfer by a virus / persistent threat?
• Bioinformatics, health
• Construct gene sequences, analyze protein
interactions, map brain interactions
• Credit fraud forensics ⇒ detection ⇒ monitoring
• Real-time integration of all the customer’s data
New Model for Streaming Graphs — ICIAM 2019 3/20
Streaming graph data
Network data rates:
• Gigabit ethernet: 81k – 1.5M packets per second
• Over 130 000 flows per second on 10 GigE (< 7.7 µs)
Person-level data rates:
• 500M posts per day on Twitter (6k / sec)1
• 3M posts per minute on Facebook (50k / sec)2
Should analyze only changes and not entire graph.
Throughput & latency trade off and expose different
levels of concurrency.
1
www.internetlivestats.com/twitter-statistics/
2
www.jeffbullas.com/2015/04/17/21-awesome-facebook-facts-and-statistics-you-need-to-check-out/
New Model for Streaming Graphs — ICIAM 2019 4/20
Streaming graph analysis
Terminology (not universal):
• Streaming changes into a massive, evolving graph
• Need to handle deletions as well as insertions
Previous STINGER performance results (x86-64):
Data ingest >2M upd/sec [Ediger, McColl, Poovey, Campbell, &
B 2014]
Clustering coefficients >100K upd/sec [Riedy, Meyerhenke,
B, E, & Mattson 2012]
Connected comp. >1M upd/sec [McColl, Green, & B 2013]
Community clustering >100K upd/sec∗
[R & B 2013]
PageRank Up to 40× latency improvement [R 2016]
New Model for Streaming Graphs — ICIAM 2019 5/20
New Algorithm Model
Starting incremental / streaming algorithms
• Incremental and
streaming algorithms
start somewhere.
• Initial, static
computation can take a
rather long time...
• During which the graph
cannot change?
• What about supporting
many simultaneous
analyses?
Data ingest rates, R-MAT into
R-MAT, scales 24 & 30
●
●
●
●
●
●
1e+02
1e+03
1e+04
1e+05
1e+06
1 10 100 1000 10000 1e+05
Batch size
Updaterate(upd/s)
platform ● Power8 Haswell Haswell−30
What can we run while the graph changes?
New Model for Streaming Graphs — ICIAM 2019 6/20
What if we don’t hold up changes?
When is an algorithm valid?
Analyze concurrently with the graph changes, and
produce a result correct for the starting graph and
some subset of concurrent changes.
• No locking beyond atomic operations.
• No versioned data structure.
• No stopping.
New Model for Streaming Graphs — ICIAM 2019 7/20
Sample of other execution models
• Put in a query, wait for sufficient data [Phillips, et al.
at Sandia]
• Different but very interesting model.
• Evolving: Sample, accurate w/high-prob.
• Difficult to generalize into graph results (e.g.
shortest path tree).
• Classical: dynamic algorithms, versioned data
• Can require drastically more storage, possibly a copy
of the graph per property, or more overhead for
techniques like read-copy-update.
Generally do not address the latency of computing the
“static” starting point.
New Model for Streaming Graphs — ICIAM 2019 8/20
Algorithm validity in our model: Example.
Can you compute degrees in an undirected graph (no self
loops) concurrently with changes?
Algorithm: Iterate over vertices, count the number of
neighbors.
1
Compute deg(v1)
1 0
Compute deg(v2)
delete edge
Cannot correspond to an undirected graph at all!
Valid for our model? No!
Not incorrect, just not valid for our model.
New Model for Streaming Graphs — ICIAM 2019 9/20
Algorithm validity in our model: Example.
Can you compute degrees in an undirected graph (no self
loops) concurrently with changes?
Algorithm: Iterate over edges, increment the degrees of
the endpoints.
1 1
Inc deg(v1), deg(v2)
1 1
(later...)
delete edge
Corresponds to the beginning graph plus a subset of
concurrent changes.
Valid for our model? Yes!
Undirected stored as directed: skip edges with v1 ≥ v2.
New Model for Streaming Graphs — ICIAM 2019 10/20
Algorithm validity in our model
s
w(e1) = 10
w(e2) = 5 → 1
∆ = 4
• What is valid?
• Typical BFS
• Shiloach-Vishkin connected components
• PageRank, Katz via Jacobi
• Making a copy! (Vertex-induced subgraph)
• What is invalid?
• Making a decision twice in implementations
• ∆-stepping SSSP: Decrease a weight below ∆
• Degree optimization: Cross threshold, miss vertex
• Applying old or different information
• Multiple searches: Betweenness centrality
• Labeling in S. Kahan’s components alg
New Model for Streaming Graphs — ICIAM 2019 11/20
Example: PageRank, Katz Centrality
PageRank
Distribution of rand. walks
(I − αD−1
AT
)x = 1/|V|
Katz Centrality
Count of number of walks
(I − αAT
)x = 1
A: row → col adjacency matrix
D: diagonal matrix of out-degrees
|V|: number of vertices, 1: all-1 vector
Both can be solved by Jacobi iteration, e.g. for Katz:
(I − αAT
)x = 1 ⇒ x(k+1)
= αAT
x(k)
+ 1
New Model for Streaming Graphs — ICIAM 2019 12/20
Jacobi can be valid for our model
Core loop of Jacobi iteration for Katz centrality:
while r(k)
≥ ϵ
1. x(k+1)
= αAT
x(k)
+ 1
2. r(k+1)
= 1−(I−αAT
)x(k+1)
3. k = k + 1
Except this is not valid. Residual r(k+1)
may use a different
graph / adjacency matrix A.
New Model for Streaming Graphs — ICIAM 2019 13/20
Jacobi can be valid for our model
Core loop of Jacobi iteration for Katz centrality:
do
1. x(k+1)
= αAT
x(k)
+ 1 and
r(k)
= 1 − (I − αAT
)x(k)
2. k = k + 1
until r(k−1)
< ε
Must use the same graph for all requirements.
Will need r(k−1)
later!
This also affects convergence speed.
New Model for Streaming Graphs — ICIAM 2019 13/20
Fun properties for one-shot queries
Due to Chunxing Yin3
, under sensible assumptions:
1. You can produce a single-change stream to
demonstrate invalidity.
2. Algorithms producing a subgraph of the input cannot
be guaranteed to run concurrently with changes and
always produce moment-in-time outputs.
3
Yin, Riedy, et al. A New Algorithmic Model for Graph Analysis of Streaming Data. 14th International Workshop on
Mining and Learning with Graphs (MLG), May 2018.
New Model for Streaming Graphs — ICIAM 2019 14/20
Fun properties for one-shot queries
Due to Chunxing Yin3
, under sensible assumptions:
1. You can produce a single-change stream to
demonstrate invalidity.
• Proof idea: Start with a graph that incorporates all
the visible changes, introduce the one change at the
right time.
2. Algorithms producing a subgraph of the input cannot
be guaranteed to run concurrently with changes and
always produce moment-in-time outputs.
3
Yin, Riedy, et al. A New Algorithmic Model for Graph Analysis of Streaming Data. 14th International Workshop on
Mining and Learning with Graphs (MLG), May 2018.
New Model for Streaming Graphs — ICIAM 2019 14/20
Fun properties for one-shot queries
Due to Chunxing Yin3
, under sensible assumptions:
1. You can produce a single-change stream to
demonstrate invalidity.
2. Algorithms producing a subgraph of the input cannot
be guaranteed to run concurrently with changes and
always produce moment-in-time outputs.
• Proof idea: Any time a snapshot result could happen,
delete then re-insert an edge from the output.
3
Yin, Riedy, et al. A New Algorithmic Model for Graph Analysis of Streaming Data. 14th International Workshop on
Mining and Learning with Graphs (MLG), May 2018.
New Model for Streaming Graphs — ICIAM 2019 14/20
Streaming Analysis
On to streaming...
Can we update graph metrics as new data arrives without
just re-running?
• Track what changed during the one-shot query.
• Update locally around those changes, while other
changes are occuring.
• If the update is valid, can repeat to follow a
streaming graph.
Initial
∆0
Upd. w/∆0
∆1
Upd. w/∆1
∆2
Examples: PageRank & Katz, iterative refinement.
Connected components, maintain a spanning tree.
New Model for Streaming Graphs — ICIAM 2019 15/20
Early results with PageRank
(Will explain the algorithm in a moment.)
Synchronous: Ingest delays will increase with # kernels.
Red dot: ingested batch. Blue dot: PR kernel begins.
Vertical: # of iterations
New Model for Streaming Graphs — ICIAM 2019 16/20
Early results with PageRank
(Will explain the algorithm in a moment.)
Concurrent: Constant-rate ingest!
Detects sudden structural change?
Red dot: ingested batch. Blue dot: PR kernel begins.
Vertical: # of iterations
New Model for Streaming Graphs — ICIAM 2019 16/20
Aside: Diameter Inside a “Batch”
New Model for Streaming Graphs — ICIAM 2019 17/20
Updating PageRank and Katz Centrality
Essentially, iterative refinement.
Algorithm 1 Update x and r with given ∆
Input: Graph Hi, previous solution xi, batch ∆i
Output: updated Katz centrality vector xi+1
1: ri = 1 − (I − αHi)xi ← saved from previous round
2: ˜ri+1 = ri + α∆ixi
3: ∆x = JACOBI(I − αHi+1,˜ri+1, tol)
4: xi+1 = xi + ∆x
5: ∆r = α∆ixi − (I − αHi+1)∆x ← saved from Jacobi
6: ri+1 = ri + ∆r
7: return xi+1
New Model for Streaming Graphs — ICIAM 2019 18/20
Closing
Open issues
Difficult problems: Updating triangle counts efficiently!
• Option: re-counting a region around changes,
stopping once counts do not change.
• Possibly incorrect on the region’s border,
but only at changes.
• Next run can fix those... A looser model?
Some algorithms essentially copy subgraphs.
• What are the size bounds?
• Can they characterize algorithms / properties?
• Can we formalize what needs kept for updating
results?
New Model for Streaming Graphs — ICIAM 2019 19/20
Closing
• Summary
• Analysis concurrent with graph change can work.
• But not all methods are valid.
• Avoid evaluating conditions or exploring the graph
more than once.
• Save information necessary for updates.
• Future work
• Track subgraphs / communities for “slow” analyses
• Develop more valid updating methods.
• Explore approximation results related to concurrent
analysis.
New Model for Streaming Graphs — ICIAM 2019 20/20
Ad

More Related Content

What's hot (10)

06 - HAMS implementation
06 - HAMS implementation06 - HAMS implementation
06 - HAMS implementation
HAMSproject
 
Graph-Powered Machine Learning
Graph-Powered Machine LearningGraph-Powered Machine Learning
Graph-Powered Machine Learning
Alessandro Negro
 
Sparklyr: Big Data enabler for R users
Sparklyr: Big Data enabler for R usersSparklyr: Big Data enabler for R users
Sparklyr: Big Data enabler for R users
ICTeam S.p.A.
 
Big Data LDN 2016: Data Warehouse Automation: Solve integration challenges, s...
Big Data LDN 2016: Data Warehouse Automation: Solve integration challenges, s...Big Data LDN 2016: Data Warehouse Automation: Solve integration challenges, s...
Big Data LDN 2016: Data Warehouse Automation: Solve integration challenges, s...
Matt Stubbs
 
A data science observatory based on RAMP - rapid analytics and model prototyping
A data science observatory based on RAMP - rapid analytics and model prototypingA data science observatory based on RAMP - rapid analytics and model prototyping
A data science observatory based on RAMP - rapid analytics and model prototyping
Akin Osman Kazakci
 
Denis Reznik Data driven future
Denis Reznik Data driven futureDenis Reznik Data driven future
Denis Reznik Data driven future
Аліна Шепшелей
 
Augmented reality meets computer vision data generation for driving scenes.
Augmented reality meets computer vision data generation for driving scenes.  Augmented reality meets computer vision data generation for driving scenes.
Augmented reality meets computer vision data generation for driving scenes.
Abdulrahman Kerim
 
Applocation of Numerical Methods
Applocation of Numerical MethodsApplocation of Numerical Methods
Applocation of Numerical Methods
Nahian Ahmed
 
Time Delayed Recurrent Neural Network for Multi-Step Prediction
Time Delayed Recurrent Neural Network for Multi-Step PredictionTime Delayed Recurrent Neural Network for Multi-Step Prediction
Time Delayed Recurrent Neural Network for Multi-Step Prediction
Kostas Hatalis, PhD
 
Probabilistic Forecasting: How and Why?
Probabilistic Forecasting: How and Why?Probabilistic Forecasting: How and Why?
Probabilistic Forecasting: How and Why?
Kostas Hatalis, PhD
 
06 - HAMS implementation
06 - HAMS implementation06 - HAMS implementation
06 - HAMS implementation
HAMSproject
 
Graph-Powered Machine Learning
Graph-Powered Machine LearningGraph-Powered Machine Learning
Graph-Powered Machine Learning
Alessandro Negro
 
Sparklyr: Big Data enabler for R users
Sparklyr: Big Data enabler for R usersSparklyr: Big Data enabler for R users
Sparklyr: Big Data enabler for R users
ICTeam S.p.A.
 
Big Data LDN 2016: Data Warehouse Automation: Solve integration challenges, s...
Big Data LDN 2016: Data Warehouse Automation: Solve integration challenges, s...Big Data LDN 2016: Data Warehouse Automation: Solve integration challenges, s...
Big Data LDN 2016: Data Warehouse Automation: Solve integration challenges, s...
Matt Stubbs
 
A data science observatory based on RAMP - rapid analytics and model prototyping
A data science observatory based on RAMP - rapid analytics and model prototypingA data science observatory based on RAMP - rapid analytics and model prototyping
A data science observatory based on RAMP - rapid analytics and model prototyping
Akin Osman Kazakci
 
Augmented reality meets computer vision data generation for driving scenes.
Augmented reality meets computer vision data generation for driving scenes.  Augmented reality meets computer vision data generation for driving scenes.
Augmented reality meets computer vision data generation for driving scenes.
Abdulrahman Kerim
 
Applocation of Numerical Methods
Applocation of Numerical MethodsApplocation of Numerical Methods
Applocation of Numerical Methods
Nahian Ahmed
 
Time Delayed Recurrent Neural Network for Multi-Step Prediction
Time Delayed Recurrent Neural Network for Multi-Step PredictionTime Delayed Recurrent Neural Network for Multi-Step Prediction
Time Delayed Recurrent Neural Network for Multi-Step Prediction
Kostas Hatalis, PhD
 
Probabilistic Forecasting: How and Why?
Probabilistic Forecasting: How and Why?Probabilistic Forecasting: How and Why?
Probabilistic Forecasting: How and Why?
Kostas Hatalis, PhD
 

Similar to ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis (20)

Graph Analysis: New Algorithm Models, New Architectures
Graph Analysis: New Algorithm Models, New ArchitecturesGraph Analysis: New Algorithm Models, New Architectures
Graph Analysis: New Algorithm Models, New Architectures
Jason Riedy
 
DyGraph: A Dynamic Graph Generator and Benchmark Suite : NOTES
DyGraph: A Dynamic Graph Generator and Benchmark Suite : NOTESDyGraph: A Dynamic Graph Generator and Benchmark Suite : NOTES
DyGraph: A Dynamic Graph Generator and Benchmark Suite : NOTES
Subhajit Sahu
 
Scalable and Efficient Algorithms for Analysis of Massive, Streaming Graphs
Scalable and Efficient Algorithms for Analysis of Massive, Streaming GraphsScalable and Efficient Algorithms for Analysis of Massive, Streaming Graphs
Scalable and Efficient Algorithms for Analysis of Massive, Streaming Graphs
Jason Riedy
 
STIC-D: algorithmic techniques for efficient parallel pagerank computation on...
STIC-D: algorithmic techniques for efficient parallel pagerank computation on...STIC-D: algorithmic techniques for efficient parallel pagerank computation on...
STIC-D: algorithmic techniques for efficient parallel pagerank computation on...
Subhajit Sahu
 
Large-scale Recommendation Systems on Just a PC
Large-scale Recommendation Systems on Just a PCLarge-scale Recommendation Systems on Just a PC
Large-scale Recommendation Systems on Just a PC
Aapo Kyrölä
 
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Spark Summit
 
A Performance Analysis of Self-* Evolutionary Algorithms on Networks with Cor...
A Performance Analysis of Self-* Evolutionary Algorithms on Networks with Cor...A Performance Analysis of Self-* Evolutionary Algorithms on Networks with Cor...
A Performance Analysis of Self-* Evolutionary Algorithms on Networks with Cor...
Rafael Nogueras
 
Graph Analysis Beyond Linear Algebra
Graph Analysis Beyond Linear AlgebraGraph Analysis Beyond Linear Algebra
Graph Analysis Beyond Linear Algebra
Jason Riedy
 
# Can we trust ai. the dilemma of model adjustment
# Can we trust ai. the dilemma of model adjustment# Can we trust ai. the dilemma of model adjustment
# Can we trust ai. the dilemma of model adjustment
Terence Huang
 
Fast Incremental Community Detection on Dynamic Graphs : NOTES
Fast Incremental Community Detection on Dynamic Graphs : NOTESFast Incremental Community Detection on Dynamic Graphs : NOTES
Fast Incremental Community Detection on Dynamic Graphs : NOTES
Subhajit Sahu
 
Big Stream Processing Systems, Big Graphs
Big Stream Processing Systems, Big GraphsBig Stream Processing Systems, Big Graphs
Big Stream Processing Systems, Big Graphs
Petr Novotný
 
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
Jen Aman
 
Fast Parallel Similarity Calculations with FPGA Hardware
Fast Parallel Similarity Calculations with FPGA HardwareFast Parallel Similarity Calculations with FPGA Hardware
Fast Parallel Similarity Calculations with FPGA Hardware
TigerGraph
 
Big learning 1.2
Big learning   1.2Big learning   1.2
Big learning 1.2
Mohit Garg
 
PL-4089, Accelerating and Evaluating OpenCL Graph Applications, by Shuai Che,...
PL-4089, Accelerating and Evaluating OpenCL Graph Applications, by Shuai Che,...PL-4089, Accelerating and Evaluating OpenCL Graph Applications, by Shuai Che,...
PL-4089, Accelerating and Evaluating OpenCL Graph Applications, by Shuai Che,...
AMD Developer Central
 
Dynamic Community Detection for Large-scale e-Commerce data with Spark Stream...
Dynamic Community Detection for Large-scale e-Commerce data with Spark Stream...Dynamic Community Detection for Large-scale e-Commerce data with Spark Stream...
Dynamic Community Detection for Large-scale e-Commerce data with Spark Stream...
Spark Summit
 
Trivento summercamp masterclass 9/9/2016
Trivento summercamp masterclass 9/9/2016Trivento summercamp masterclass 9/9/2016
Trivento summercamp masterclass 9/9/2016
Stavros Kontopoulos
 
STINGER: Multi-threaded Graph Streaming
STINGER: Multi-threaded Graph StreamingSTINGER: Multi-threaded Graph Streaming
STINGER: Multi-threaded Graph Streaming
Jason Riedy
 
20151130
2015113020151130
20151130
chen chao
 
Time-Evolving Graph Processing On Commodity Clusters
Time-Evolving Graph Processing On Commodity ClustersTime-Evolving Graph Processing On Commodity Clusters
Time-Evolving Graph Processing On Commodity Clusters
Jen Aman
 
Graph Analysis: New Algorithm Models, New Architectures
Graph Analysis: New Algorithm Models, New ArchitecturesGraph Analysis: New Algorithm Models, New Architectures
Graph Analysis: New Algorithm Models, New Architectures
Jason Riedy
 
DyGraph: A Dynamic Graph Generator and Benchmark Suite : NOTES
DyGraph: A Dynamic Graph Generator and Benchmark Suite : NOTESDyGraph: A Dynamic Graph Generator and Benchmark Suite : NOTES
DyGraph: A Dynamic Graph Generator and Benchmark Suite : NOTES
Subhajit Sahu
 
Scalable and Efficient Algorithms for Analysis of Massive, Streaming Graphs
Scalable and Efficient Algorithms for Analysis of Massive, Streaming GraphsScalable and Efficient Algorithms for Analysis of Massive, Streaming Graphs
Scalable and Efficient Algorithms for Analysis of Massive, Streaming Graphs
Jason Riedy
 
STIC-D: algorithmic techniques for efficient parallel pagerank computation on...
STIC-D: algorithmic techniques for efficient parallel pagerank computation on...STIC-D: algorithmic techniques for efficient parallel pagerank computation on...
STIC-D: algorithmic techniques for efficient parallel pagerank computation on...
Subhajit Sahu
 
Large-scale Recommendation Systems on Just a PC
Large-scale Recommendation Systems on Just a PCLarge-scale Recommendation Systems on Just a PC
Large-scale Recommendation Systems on Just a PC
Aapo Kyrölä
 
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Spark Summit
 
A Performance Analysis of Self-* Evolutionary Algorithms on Networks with Cor...
A Performance Analysis of Self-* Evolutionary Algorithms on Networks with Cor...A Performance Analysis of Self-* Evolutionary Algorithms on Networks with Cor...
A Performance Analysis of Self-* Evolutionary Algorithms on Networks with Cor...
Rafael Nogueras
 
Graph Analysis Beyond Linear Algebra
Graph Analysis Beyond Linear AlgebraGraph Analysis Beyond Linear Algebra
Graph Analysis Beyond Linear Algebra
Jason Riedy
 
# Can we trust ai. the dilemma of model adjustment
# Can we trust ai. the dilemma of model adjustment# Can we trust ai. the dilemma of model adjustment
# Can we trust ai. the dilemma of model adjustment
Terence Huang
 
Fast Incremental Community Detection on Dynamic Graphs : NOTES
Fast Incremental Community Detection on Dynamic Graphs : NOTESFast Incremental Community Detection on Dynamic Graphs : NOTES
Fast Incremental Community Detection on Dynamic Graphs : NOTES
Subhajit Sahu
 
Big Stream Processing Systems, Big Graphs
Big Stream Processing Systems, Big GraphsBig Stream Processing Systems, Big Graphs
Big Stream Processing Systems, Big Graphs
Petr Novotný
 
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
Jen Aman
 
Fast Parallel Similarity Calculations with FPGA Hardware
Fast Parallel Similarity Calculations with FPGA HardwareFast Parallel Similarity Calculations with FPGA Hardware
Fast Parallel Similarity Calculations with FPGA Hardware
TigerGraph
 
Big learning 1.2
Big learning   1.2Big learning   1.2
Big learning 1.2
Mohit Garg
 
PL-4089, Accelerating and Evaluating OpenCL Graph Applications, by Shuai Che,...
PL-4089, Accelerating and Evaluating OpenCL Graph Applications, by Shuai Che,...PL-4089, Accelerating and Evaluating OpenCL Graph Applications, by Shuai Che,...
PL-4089, Accelerating and Evaluating OpenCL Graph Applications, by Shuai Che,...
AMD Developer Central
 
Dynamic Community Detection for Large-scale e-Commerce data with Spark Stream...
Dynamic Community Detection for Large-scale e-Commerce data with Spark Stream...Dynamic Community Detection for Large-scale e-Commerce data with Spark Stream...
Dynamic Community Detection for Large-scale e-Commerce data with Spark Stream...
Spark Summit
 
Trivento summercamp masterclass 9/9/2016
Trivento summercamp masterclass 9/9/2016Trivento summercamp masterclass 9/9/2016
Trivento summercamp masterclass 9/9/2016
Stavros Kontopoulos
 
STINGER: Multi-threaded Graph Streaming
STINGER: Multi-threaded Graph StreamingSTINGER: Multi-threaded Graph Streaming
STINGER: Multi-threaded Graph Streaming
Jason Riedy
 
Time-Evolving Graph Processing On Commodity Clusters
Time-Evolving Graph Processing On Commodity ClustersTime-Evolving Graph Processing On Commodity Clusters
Time-Evolving Graph Processing On Commodity Clusters
Jen Aman
 
Ad

More from Jason Riedy (20)

Lucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoFLucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoF
Jason Riedy
 
LAGraph 2021-10-13
LAGraph 2021-10-13LAGraph 2021-10-13
LAGraph 2021-10-13
Jason Riedy
 
Lucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoFLucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoF
Jason Riedy
 
Graph analysis and novel architectures
Graph analysis and novel architecturesGraph analysis and novel architectures
Graph analysis and novel architectures
Jason Riedy
 
GraphBLAS and Emus
GraphBLAS and EmusGraphBLAS and Emus
GraphBLAS and Emus
Jason Riedy
 
Reproducible Linear Algebra from Application to Architecture
Reproducible Linear Algebra from Application to ArchitectureReproducible Linear Algebra from Application to Architecture
Reproducible Linear Algebra from Application to Architecture
Jason Riedy
 
PEARC19: Wrangling Rogues: A Case Study on Managing Experimental Post-Moore A...
PEARC19: Wrangling Rogues: A Case Study on Managing Experimental Post-Moore A...PEARC19: Wrangling Rogues: A Case Study on Managing Experimental Post-Moore A...
PEARC19: Wrangling Rogues: A Case Study on Managing Experimental Post-Moore A...
Jason Riedy
 
ICIAM 2019: Reproducible Linear Algebra from Application to Architecture
ICIAM 2019: Reproducible Linear Algebra from Application to ArchitectureICIAM 2019: Reproducible Linear Algebra from Application to Architecture
ICIAM 2019: Reproducible Linear Algebra from Application to Architecture
Jason Riedy
 
Novel Architectures for Applications in Data Science and Beyond
Novel Architectures for Applications in Data Science and BeyondNovel Architectures for Applications in Data Science and Beyond
Novel Architectures for Applications in Data Science and Beyond
Jason Riedy
 
Characterization of Emu Chick with Microbenchmarks
Characterization of Emu Chick with MicrobenchmarksCharacterization of Emu Chick with Microbenchmarks
Characterization of Emu Chick with Microbenchmarks
Jason Riedy
 
CRNCH 2018 Summit: Rogues Gallery Update
CRNCH 2018 Summit: Rogues Gallery UpdateCRNCH 2018 Summit: Rogues Gallery Update
CRNCH 2018 Summit: Rogues Gallery Update
Jason Riedy
 
Augmented Arithmetic Operations Proposed for IEEE-754 2018
Augmented Arithmetic Operations Proposed for IEEE-754 2018Augmented Arithmetic Operations Proposed for IEEE-754 2018
Augmented Arithmetic Operations Proposed for IEEE-754 2018
Jason Riedy
 
CRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing PlatformsCRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
Jason Riedy
 
CRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing PlatformsCRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
Jason Riedy
 
High-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming Graphs High-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming Graphs
Jason Riedy
 
High-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming GraphsHigh-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming Graphs
Jason Riedy
 
Updating PageRank for Streaming Graphs
Updating PageRank for Streaming GraphsUpdating PageRank for Streaming Graphs
Updating PageRank for Streaming Graphs
Jason Riedy
 
Network Challenge: Error and Sensitivity Analysis
Network Challenge: Error and Sensitivity AnalysisNetwork Challenge: Error and Sensitivity Analysis
Network Challenge: Error and Sensitivity Analysis
Jason Riedy
 
Graph Analysis Trends and Opportunities -- CMG Performance and Capacity 2014
Graph Analysis Trends and Opportunities -- CMG Performance and Capacity 2014Graph Analysis Trends and Opportunities -- CMG Performance and Capacity 2014
Graph Analysis Trends and Opportunities -- CMG Performance and Capacity 2014
Jason Riedy
 
STING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
STING: Spatio-Temporal Interaction Networks and Graphs for Intel PlatformsSTING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
STING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
Jason Riedy
 
Lucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoFLucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoF
Jason Riedy
 
LAGraph 2021-10-13
LAGraph 2021-10-13LAGraph 2021-10-13
LAGraph 2021-10-13
Jason Riedy
 
Lucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoFLucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoF
Jason Riedy
 
Graph analysis and novel architectures
Graph analysis and novel architecturesGraph analysis and novel architectures
Graph analysis and novel architectures
Jason Riedy
 
GraphBLAS and Emus
GraphBLAS and EmusGraphBLAS and Emus
GraphBLAS and Emus
Jason Riedy
 
Reproducible Linear Algebra from Application to Architecture
Reproducible Linear Algebra from Application to ArchitectureReproducible Linear Algebra from Application to Architecture
Reproducible Linear Algebra from Application to Architecture
Jason Riedy
 
PEARC19: Wrangling Rogues: A Case Study on Managing Experimental Post-Moore A...
PEARC19: Wrangling Rogues: A Case Study on Managing Experimental Post-Moore A...PEARC19: Wrangling Rogues: A Case Study on Managing Experimental Post-Moore A...
PEARC19: Wrangling Rogues: A Case Study on Managing Experimental Post-Moore A...
Jason Riedy
 
ICIAM 2019: Reproducible Linear Algebra from Application to Architecture
ICIAM 2019: Reproducible Linear Algebra from Application to ArchitectureICIAM 2019: Reproducible Linear Algebra from Application to Architecture
ICIAM 2019: Reproducible Linear Algebra from Application to Architecture
Jason Riedy
 
Novel Architectures for Applications in Data Science and Beyond
Novel Architectures for Applications in Data Science and BeyondNovel Architectures for Applications in Data Science and Beyond
Novel Architectures for Applications in Data Science and Beyond
Jason Riedy
 
Characterization of Emu Chick with Microbenchmarks
Characterization of Emu Chick with MicrobenchmarksCharacterization of Emu Chick with Microbenchmarks
Characterization of Emu Chick with Microbenchmarks
Jason Riedy
 
CRNCH 2018 Summit: Rogues Gallery Update
CRNCH 2018 Summit: Rogues Gallery UpdateCRNCH 2018 Summit: Rogues Gallery Update
CRNCH 2018 Summit: Rogues Gallery Update
Jason Riedy
 
Augmented Arithmetic Operations Proposed for IEEE-754 2018
Augmented Arithmetic Operations Proposed for IEEE-754 2018Augmented Arithmetic Operations Proposed for IEEE-754 2018
Augmented Arithmetic Operations Proposed for IEEE-754 2018
Jason Riedy
 
CRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing PlatformsCRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
Jason Riedy
 
CRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing PlatformsCRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
Jason Riedy
 
High-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming Graphs High-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming Graphs
Jason Riedy
 
High-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming GraphsHigh-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming Graphs
Jason Riedy
 
Updating PageRank for Streaming Graphs
Updating PageRank for Streaming GraphsUpdating PageRank for Streaming Graphs
Updating PageRank for Streaming Graphs
Jason Riedy
 
Network Challenge: Error and Sensitivity Analysis
Network Challenge: Error and Sensitivity AnalysisNetwork Challenge: Error and Sensitivity Analysis
Network Challenge: Error and Sensitivity Analysis
Jason Riedy
 
Graph Analysis Trends and Opportunities -- CMG Performance and Capacity 2014
Graph Analysis Trends and Opportunities -- CMG Performance and Capacity 2014Graph Analysis Trends and Opportunities -- CMG Performance and Capacity 2014
Graph Analysis Trends and Opportunities -- CMG Performance and Capacity 2014
Jason Riedy
 
STING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
STING: Spatio-Temporal Interaction Networks and Graphs for Intel PlatformsSTING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
STING: Spatio-Temporal Interaction Networks and Graphs for Intel Platforms
Jason Riedy
 
Ad

Recently uploaded (20)

AWS RDS Presentation to make concepts easy.pptx
AWS RDS Presentation to make concepts easy.pptxAWS RDS Presentation to make concepts easy.pptx
AWS RDS Presentation to make concepts easy.pptx
bharatkumarbhojwani
 
AI ------------------------------ W1L2.pptx
AI ------------------------------ W1L2.pptxAI ------------------------------ W1L2.pptx
AI ------------------------------ W1L2.pptx
AyeshaJalil6
 
indonesia-gen-z-report-2024 Gen Z (born between 1997 and 2012) is currently t...
indonesia-gen-z-report-2024 Gen Z (born between 1997 and 2012) is currently t...indonesia-gen-z-report-2024 Gen Z (born between 1997 and 2012) is currently t...
indonesia-gen-z-report-2024 Gen Z (born between 1997 and 2012) is currently t...
disnakertransjabarda
 
Day 1 MS Excel Basics #.pptxDay 1 MS Excel Basics #.pptxDay 1 MS Excel Basics...
Day 1 MS Excel Basics #.pptxDay 1 MS Excel Basics #.pptxDay 1 MS Excel Basics...Day 1 MS Excel Basics #.pptxDay 1 MS Excel Basics #.pptxDay 1 MS Excel Basics...
Day 1 MS Excel Basics #.pptxDay 1 MS Excel Basics #.pptxDay 1 MS Excel Basics...
Jayantilal Bhanushali
 
hersh's midterm project.pdf music retail and distribution
hersh's midterm project.pdf music retail and distributionhersh's midterm project.pdf music retail and distribution
hersh's midterm project.pdf music retail and distribution
hershtara1
 
Time series for yotube_1_data anlysis.pdf
Time series for yotube_1_data anlysis.pdfTime series for yotube_1_data anlysis.pdf
Time series for yotube_1_data anlysis.pdf
asmaamahmoudsaeed
 
report (maam dona subject).pptxhsgwiswhs
report (maam dona subject).pptxhsgwiswhsreport (maam dona subject).pptxhsgwiswhs
report (maam dona subject).pptxhsgwiswhs
AngelPinedaTaguinod
 
Feature Engineering for Electronic Health Record Systems
Feature Engineering for Electronic Health Record SystemsFeature Engineering for Electronic Health Record Systems
Feature Engineering for Electronic Health Record Systems
Process mining Evangelist
 
HershAggregator (2).pdf musicretaildistribution
HershAggregator (2).pdf musicretaildistributionHershAggregator (2).pdf musicretaildistribution
HershAggregator (2).pdf musicretaildistribution
hershtara1
 
presentacion.slideshare.informáticaJuridica..pptx
presentacion.slideshare.informáticaJuridica..pptxpresentacion.slideshare.informáticaJuridica..pptx
presentacion.slideshare.informáticaJuridica..pptx
GersonVillatoro4
 
TYPES OF SOFTWARE_ A Visual Guide.pdf CA SUVIDHA CHAPLOT
TYPES OF SOFTWARE_ A Visual Guide.pdf CA SUVIDHA CHAPLOTTYPES OF SOFTWARE_ A Visual Guide.pdf CA SUVIDHA CHAPLOT
TYPES OF SOFTWARE_ A Visual Guide.pdf CA SUVIDHA CHAPLOT
CA Suvidha Chaplot
 
L1_Slides_Foundational Concepts_508.pptx
L1_Slides_Foundational Concepts_508.pptxL1_Slides_Foundational Concepts_508.pptx
L1_Slides_Foundational Concepts_508.pptx
38NoopurPatel
 
Lesson 6-Interviewing in SHRM_updated.pdf
Lesson 6-Interviewing in SHRM_updated.pdfLesson 6-Interviewing in SHRM_updated.pdf
Lesson 6-Interviewing in SHRM_updated.pdf
hemelali11
 
Process Mining as Enabler for Digital Transformations
Process Mining as Enabler for Digital TransformationsProcess Mining as Enabler for Digital Transformations
Process Mining as Enabler for Digital Transformations
Process mining Evangelist
 
Analysis of Billboards hot 100 toop five hit makers on the chart.docx
Analysis of Billboards hot 100 toop five hit makers on the chart.docxAnalysis of Billboards hot 100 toop five hit makers on the chart.docx
Analysis of Billboards hot 100 toop five hit makers on the chart.docx
hershtara1
 
Lagos School of Programming Final Project Updated.pdf
Lagos School of Programming Final Project Updated.pdfLagos School of Programming Final Project Updated.pdf
Lagos School of Programming Final Project Updated.pdf
benuju2016
 
Mining a Global Trade Process with Data Science - Microsoft
Mining a Global Trade Process with Data Science - MicrosoftMining a Global Trade Process with Data Science - Microsoft
Mining a Global Trade Process with Data Science - Microsoft
Process mining Evangelist
 
How to Set Up Process Mining in a Decentralized Organization?
How to Set Up Process Mining in a Decentralized Organization?How to Set Up Process Mining in a Decentralized Organization?
How to Set Up Process Mining in a Decentralized Organization?
Process mining Evangelist
 
Storage Devices and the Mechanism of Data Storage in Audio and Visual Form
Storage Devices and the Mechanism of Data Storage in Audio and Visual FormStorage Devices and the Mechanism of Data Storage in Audio and Visual Form
Storage Devices and the Mechanism of Data Storage in Audio and Visual Form
Professional Content Writing's
 
Process Mining at Deutsche Bank - Journey
Process Mining at Deutsche Bank - JourneyProcess Mining at Deutsche Bank - Journey
Process Mining at Deutsche Bank - Journey
Process mining Evangelist
 
AWS RDS Presentation to make concepts easy.pptx
AWS RDS Presentation to make concepts easy.pptxAWS RDS Presentation to make concepts easy.pptx
AWS RDS Presentation to make concepts easy.pptx
bharatkumarbhojwani
 
AI ------------------------------ W1L2.pptx
AI ------------------------------ W1L2.pptxAI ------------------------------ W1L2.pptx
AI ------------------------------ W1L2.pptx
AyeshaJalil6
 
indonesia-gen-z-report-2024 Gen Z (born between 1997 and 2012) is currently t...
indonesia-gen-z-report-2024 Gen Z (born between 1997 and 2012) is currently t...indonesia-gen-z-report-2024 Gen Z (born between 1997 and 2012) is currently t...
indonesia-gen-z-report-2024 Gen Z (born between 1997 and 2012) is currently t...
disnakertransjabarda
 
Day 1 MS Excel Basics #.pptxDay 1 MS Excel Basics #.pptxDay 1 MS Excel Basics...
Day 1 MS Excel Basics #.pptxDay 1 MS Excel Basics #.pptxDay 1 MS Excel Basics...Day 1 MS Excel Basics #.pptxDay 1 MS Excel Basics #.pptxDay 1 MS Excel Basics...
Day 1 MS Excel Basics #.pptxDay 1 MS Excel Basics #.pptxDay 1 MS Excel Basics...
Jayantilal Bhanushali
 
hersh's midterm project.pdf music retail and distribution
hersh's midterm project.pdf music retail and distributionhersh's midterm project.pdf music retail and distribution
hersh's midterm project.pdf music retail and distribution
hershtara1
 
Time series for yotube_1_data anlysis.pdf
Time series for yotube_1_data anlysis.pdfTime series for yotube_1_data anlysis.pdf
Time series for yotube_1_data anlysis.pdf
asmaamahmoudsaeed
 
report (maam dona subject).pptxhsgwiswhs
report (maam dona subject).pptxhsgwiswhsreport (maam dona subject).pptxhsgwiswhs
report (maam dona subject).pptxhsgwiswhs
AngelPinedaTaguinod
 
Feature Engineering for Electronic Health Record Systems
Feature Engineering for Electronic Health Record SystemsFeature Engineering for Electronic Health Record Systems
Feature Engineering for Electronic Health Record Systems
Process mining Evangelist
 
HershAggregator (2).pdf musicretaildistribution
HershAggregator (2).pdf musicretaildistributionHershAggregator (2).pdf musicretaildistribution
HershAggregator (2).pdf musicretaildistribution
hershtara1
 
presentacion.slideshare.informáticaJuridica..pptx
presentacion.slideshare.informáticaJuridica..pptxpresentacion.slideshare.informáticaJuridica..pptx
presentacion.slideshare.informáticaJuridica..pptx
GersonVillatoro4
 
TYPES OF SOFTWARE_ A Visual Guide.pdf CA SUVIDHA CHAPLOT
TYPES OF SOFTWARE_ A Visual Guide.pdf CA SUVIDHA CHAPLOTTYPES OF SOFTWARE_ A Visual Guide.pdf CA SUVIDHA CHAPLOT
TYPES OF SOFTWARE_ A Visual Guide.pdf CA SUVIDHA CHAPLOT
CA Suvidha Chaplot
 
L1_Slides_Foundational Concepts_508.pptx
L1_Slides_Foundational Concepts_508.pptxL1_Slides_Foundational Concepts_508.pptx
L1_Slides_Foundational Concepts_508.pptx
38NoopurPatel
 
Lesson 6-Interviewing in SHRM_updated.pdf
Lesson 6-Interviewing in SHRM_updated.pdfLesson 6-Interviewing in SHRM_updated.pdf
Lesson 6-Interviewing in SHRM_updated.pdf
hemelali11
 
Process Mining as Enabler for Digital Transformations
Process Mining as Enabler for Digital TransformationsProcess Mining as Enabler for Digital Transformations
Process Mining as Enabler for Digital Transformations
Process mining Evangelist
 
Analysis of Billboards hot 100 toop five hit makers on the chart.docx
Analysis of Billboards hot 100 toop five hit makers on the chart.docxAnalysis of Billboards hot 100 toop five hit makers on the chart.docx
Analysis of Billboards hot 100 toop five hit makers on the chart.docx
hershtara1
 
Lagos School of Programming Final Project Updated.pdf
Lagos School of Programming Final Project Updated.pdfLagos School of Programming Final Project Updated.pdf
Lagos School of Programming Final Project Updated.pdf
benuju2016
 
Mining a Global Trade Process with Data Science - Microsoft
Mining a Global Trade Process with Data Science - MicrosoftMining a Global Trade Process with Data Science - Microsoft
Mining a Global Trade Process with Data Science - Microsoft
Process mining Evangelist
 
How to Set Up Process Mining in a Decentralized Organization?
How to Set Up Process Mining in a Decentralized Organization?How to Set Up Process Mining in a Decentralized Organization?
How to Set Up Process Mining in a Decentralized Organization?
Process mining Evangelist
 
Storage Devices and the Mechanism of Data Storage in Audio and Visual Form
Storage Devices and the Mechanism of Data Storage in Audio and Visual FormStorage Devices and the Mechanism of Data Storage in Audio and Visual Form
Storage Devices and the Mechanism of Data Storage in Audio and Visual Form
Professional Content Writing's
 

ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis

  • 1. A New Algorithm Model for Massive-Scale Streaming Graph Analysis Chunxing Yin and Jason Riedy Georgia Institute of Technology ICIAM, 16 July 2019
  • 2. Outline Motivation and Applications New Algorithm Model Streaming Analysis Closing New Model for Streaming Graphs — ICIAM 2019 1/20
  • 4. (insert prefix here)-scale data analysis Cyber-security Identify anomalies, malicious actors Health care Finding outbreaks, population epidemiology Social networks Advertising, searching, grouping Intelligence Decisions at scale, regulating markets, smart & sustainable cities Systems biology Understanding interactions, drug design Power grid Disruptions, conservation Simulation Discrete events, cracking meshes Changes are important. Cannot stop the world... New Model for Streaming Graphs — ICIAM 2019 2/20
  • 5. Potential Applications • Social Networks • Identify communities, influences, bridges, trends, anomalies (trends before they happen)... • Potential to help social sciences, city planning, and others with large-scale data. • Cybersecurity • Determine if new connections can access a device or represent new threat in < 5ms... • Is the transfer by a virus / persistent threat? • Bioinformatics, health • Construct gene sequences, analyze protein interactions, map brain interactions • Credit fraud forensics ⇒ detection ⇒ monitoring • Real-time integration of all the customer’s data New Model for Streaming Graphs — ICIAM 2019 3/20
  • 6. Streaming graph data Network data rates: • Gigabit ethernet: 81k – 1.5M packets per second • Over 130 000 flows per second on 10 GigE (< 7.7 µs) Person-level data rates: • 500M posts per day on Twitter (6k / sec)1 • 3M posts per minute on Facebook (50k / sec)2 Should analyze only changes and not entire graph. Throughput & latency trade off and expose different levels of concurrency. 1 www.internetlivestats.com/twitter-statistics/ 2 www.jeffbullas.com/2015/04/17/21-awesome-facebook-facts-and-statistics-you-need-to-check-out/ New Model for Streaming Graphs — ICIAM 2019 4/20
  • 7. Streaming graph analysis Terminology (not universal): • Streaming changes into a massive, evolving graph • Need to handle deletions as well as insertions Previous STINGER performance results (x86-64): Data ingest >2M upd/sec [Ediger, McColl, Poovey, Campbell, & B 2014] Clustering coefficients >100K upd/sec [Riedy, Meyerhenke, B, E, & Mattson 2012] Connected comp. >1M upd/sec [McColl, Green, & B 2013] Community clustering >100K upd/sec∗ [R & B 2013] PageRank Up to 40× latency improvement [R 2016] New Model for Streaming Graphs — ICIAM 2019 5/20
  • 9. Starting incremental / streaming algorithms • Incremental and streaming algorithms start somewhere. • Initial, static computation can take a rather long time... • During which the graph cannot change? • What about supporting many simultaneous analyses? Data ingest rates, R-MAT into R-MAT, scales 24 & 30 ● ● ● ● ● ● 1e+02 1e+03 1e+04 1e+05 1e+06 1 10 100 1000 10000 1e+05 Batch size Updaterate(upd/s) platform ● Power8 Haswell Haswell−30 What can we run while the graph changes? New Model for Streaming Graphs — ICIAM 2019 6/20
  • 10. What if we don’t hold up changes? When is an algorithm valid? Analyze concurrently with the graph changes, and produce a result correct for the starting graph and some subset of concurrent changes. • No locking beyond atomic operations. • No versioned data structure. • No stopping. New Model for Streaming Graphs — ICIAM 2019 7/20
  • 11. Sample of other execution models • Put in a query, wait for sufficient data [Phillips, et al. at Sandia] • Different but very interesting model. • Evolving: Sample, accurate w/high-prob. • Difficult to generalize into graph results (e.g. shortest path tree). • Classical: dynamic algorithms, versioned data • Can require drastically more storage, possibly a copy of the graph per property, or more overhead for techniques like read-copy-update. Generally do not address the latency of computing the “static” starting point. New Model for Streaming Graphs — ICIAM 2019 8/20
  • 12. Algorithm validity in our model: Example. Can you compute degrees in an undirected graph (no self loops) concurrently with changes? Algorithm: Iterate over vertices, count the number of neighbors. 1 Compute deg(v1) 1 0 Compute deg(v2) delete edge Cannot correspond to an undirected graph at all! Valid for our model? No! Not incorrect, just not valid for our model. New Model for Streaming Graphs — ICIAM 2019 9/20
  • 13. Algorithm validity in our model: Example. Can you compute degrees in an undirected graph (no self loops) concurrently with changes? Algorithm: Iterate over edges, increment the degrees of the endpoints. 1 1 Inc deg(v1), deg(v2) 1 1 (later...) delete edge Corresponds to the beginning graph plus a subset of concurrent changes. Valid for our model? Yes! Undirected stored as directed: skip edges with v1 ≥ v2. New Model for Streaming Graphs — ICIAM 2019 10/20
  • 14. Algorithm validity in our model s w(e1) = 10 w(e2) = 5 → 1 ∆ = 4 • What is valid? • Typical BFS • Shiloach-Vishkin connected components • PageRank, Katz via Jacobi • Making a copy! (Vertex-induced subgraph) • What is invalid? • Making a decision twice in implementations • ∆-stepping SSSP: Decrease a weight below ∆ • Degree optimization: Cross threshold, miss vertex • Applying old or different information • Multiple searches: Betweenness centrality • Labeling in S. Kahan’s components alg New Model for Streaming Graphs — ICIAM 2019 11/20
  • 15. Example: PageRank, Katz Centrality PageRank Distribution of rand. walks (I − αD−1 AT )x = 1/|V| Katz Centrality Count of number of walks (I − αAT )x = 1 A: row → col adjacency matrix D: diagonal matrix of out-degrees |V|: number of vertices, 1: all-1 vector Both can be solved by Jacobi iteration, e.g. for Katz: (I − αAT )x = 1 ⇒ x(k+1) = αAT x(k) + 1 New Model for Streaming Graphs — ICIAM 2019 12/20
  • 16. Jacobi can be valid for our model Core loop of Jacobi iteration for Katz centrality: while r(k) ≥ ϵ 1. x(k+1) = αAT x(k) + 1 2. r(k+1) = 1−(I−αAT )x(k+1) 3. k = k + 1 Except this is not valid. Residual r(k+1) may use a different graph / adjacency matrix A. New Model for Streaming Graphs — ICIAM 2019 13/20
  • 17. Jacobi can be valid for our model Core loop of Jacobi iteration for Katz centrality: do 1. x(k+1) = αAT x(k) + 1 and r(k) = 1 − (I − αAT )x(k) 2. k = k + 1 until r(k−1) < ε Must use the same graph for all requirements. Will need r(k−1) later! This also affects convergence speed. New Model for Streaming Graphs — ICIAM 2019 13/20
  • 18. Fun properties for one-shot queries Due to Chunxing Yin3 , under sensible assumptions: 1. You can produce a single-change stream to demonstrate invalidity. 2. Algorithms producing a subgraph of the input cannot be guaranteed to run concurrently with changes and always produce moment-in-time outputs. 3 Yin, Riedy, et al. A New Algorithmic Model for Graph Analysis of Streaming Data. 14th International Workshop on Mining and Learning with Graphs (MLG), May 2018. New Model for Streaming Graphs — ICIAM 2019 14/20
  • 19. Fun properties for one-shot queries Due to Chunxing Yin3 , under sensible assumptions: 1. You can produce a single-change stream to demonstrate invalidity. • Proof idea: Start with a graph that incorporates all the visible changes, introduce the one change at the right time. 2. Algorithms producing a subgraph of the input cannot be guaranteed to run concurrently with changes and always produce moment-in-time outputs. 3 Yin, Riedy, et al. A New Algorithmic Model for Graph Analysis of Streaming Data. 14th International Workshop on Mining and Learning with Graphs (MLG), May 2018. New Model for Streaming Graphs — ICIAM 2019 14/20
  • 20. Fun properties for one-shot queries Due to Chunxing Yin3 , under sensible assumptions: 1. You can produce a single-change stream to demonstrate invalidity. 2. Algorithms producing a subgraph of the input cannot be guaranteed to run concurrently with changes and always produce moment-in-time outputs. • Proof idea: Any time a snapshot result could happen, delete then re-insert an edge from the output. 3 Yin, Riedy, et al. A New Algorithmic Model for Graph Analysis of Streaming Data. 14th International Workshop on Mining and Learning with Graphs (MLG), May 2018. New Model for Streaming Graphs — ICIAM 2019 14/20
  • 22. On to streaming... Can we update graph metrics as new data arrives without just re-running? • Track what changed during the one-shot query. • Update locally around those changes, while other changes are occuring. • If the update is valid, can repeat to follow a streaming graph. Initial ∆0 Upd. w/∆0 ∆1 Upd. w/∆1 ∆2 Examples: PageRank & Katz, iterative refinement. Connected components, maintain a spanning tree. New Model for Streaming Graphs — ICIAM 2019 15/20
  • 23. Early results with PageRank (Will explain the algorithm in a moment.) Synchronous: Ingest delays will increase with # kernels. Red dot: ingested batch. Blue dot: PR kernel begins. Vertical: # of iterations New Model for Streaming Graphs — ICIAM 2019 16/20
  • 24. Early results with PageRank (Will explain the algorithm in a moment.) Concurrent: Constant-rate ingest! Detects sudden structural change? Red dot: ingested batch. Blue dot: PR kernel begins. Vertical: # of iterations New Model for Streaming Graphs — ICIAM 2019 16/20
  • 25. Aside: Diameter Inside a “Batch” New Model for Streaming Graphs — ICIAM 2019 17/20
  • 26. Updating PageRank and Katz Centrality Essentially, iterative refinement. Algorithm 1 Update x and r with given ∆ Input: Graph Hi, previous solution xi, batch ∆i Output: updated Katz centrality vector xi+1 1: ri = 1 − (I − αHi)xi ← saved from previous round 2: ˜ri+1 = ri + α∆ixi 3: ∆x = JACOBI(I − αHi+1,˜ri+1, tol) 4: xi+1 = xi + ∆x 5: ∆r = α∆ixi − (I − αHi+1)∆x ← saved from Jacobi 6: ri+1 = ri + ∆r 7: return xi+1 New Model for Streaming Graphs — ICIAM 2019 18/20
  • 28. Open issues Difficult problems: Updating triangle counts efficiently! • Option: re-counting a region around changes, stopping once counts do not change. • Possibly incorrect on the region’s border, but only at changes. • Next run can fix those... A looser model? Some algorithms essentially copy subgraphs. • What are the size bounds? • Can they characterize algorithms / properties? • Can we formalize what needs kept for updating results? New Model for Streaming Graphs — ICIAM 2019 19/20
  • 29. Closing • Summary • Analysis concurrent with graph change can work. • But not all methods are valid. • Avoid evaluating conditions or exploring the graph more than once. • Save information necessary for updates. • Future work • Track subgraphs / communities for “slow” analyses • Develop more valid updating methods. • Explore approximation results related to concurrent analysis. New Model for Streaming Graphs — ICIAM 2019 20/20
  翻译: