SlideShare a Scribd company logo
StructMatrix: large-scale visualization of graphs by
means of structure detection and dense matrices
Hugo Gualdron, Robson L. F. Cordeiro, Jose F Rodrigues-Jr
University of Sao Paulo
In collaboration with Carnegie Mellon University
(Prof. Christos Faloutsos, and PhD Danai Koutra)
Funding by research agency Fapesp (2013/03906-0, 2014/07879-0, 2015/18335)
In: The Fifth IEEE ICDM Workshop on Data Mining in Networks,
Atlantic City, NJ, USA - November, 2015
http://www.icmc.usp.br/pessoas/junio
Jose F Rodrigues-Jr (University of Sao Paulo) 1 / 20
Introduction
Motivation
Big Data!!!
A lot of information, much of it in the form of relationships;
Large-scale graphs: graphs generated by applications in which users
or entities are distributed along large geographical areas - even the
entire planet;
Social networks, recommendation networks, road nets, e-commerce,
computer networks, client-product logs, and many others.
Data analysis is the differential for industrial competition.
General Electric & Accenture.
Jose F Rodrigues-Jr (University of Sao Paulo) 2 / 20
Introduction
Problem
Such graphs are too big:
node-link visualization cannot handle even thousand-vertices graphs;
adjacency matrices are limited by the number of pixels of the screen;
in any case, the cardinality of the nodes prevents rationalization;
non-visual analytical techniques might produce way too many
patterns preventing human cognition.
Still, we want to characterize the structure of graphs for:
understanding the overall structure, and not only the
distribution-based analyses;
spotting outliers and trends that are not dominant;
requesting details on demand concerning subregions of the graph
topology.
Jose F Rodrigues-Jr (University of Sao Paulo) 3 / 20
Introduction
Problem
Layouts node-link and adjacency matrix
Node-link Adjacency matrix
Scalability:
Hundred nodes Thousand nodes
Jose F Rodrigues-Jr (University of Sao Paulo) 4 / 20
Introduction
Methodology overview
Assumptions:
graphs are made of recurrent simple structures (cliques, bi-partite
cores, stars, and chains);
such structures are more meaningful than sole nodes;
even at lower resolutions, the graph main properties are maintained in
a visualization.
Hypothesis: we reach more scalable and meaningful graph visualizations
with:
graph summarization by detecting recurrent structures of the graph;
dense adjacency matrices.
Jose F Rodrigues-Jr (University of Sao Paulo) 5 / 20
Methodology
Proposed method: StructMatrix
Our method has two parts:
1 An algorithm to detect substructures;
2 A dense adjacency matrix of the structures that were detected.
Jose F Rodrigues-Jr (University of Sao Paulo) 6 / 20
Methodology
1.Structure detection
Jose F Rodrigues-Jr (University of Sao Paulo) 7 / 20
Methodology
1.Structure detection
We designed a graph partitioning algorithm based on the fact that
real-world graphs obey to power-law distributions;
In such graphs: few nodes with very high degree and the majority of
nodes with low degree;
Kang and Faloutsos [1] demonstrated that the ordered removal of the
higher degree nodes leads to the removal of hubs from the giant CC,
creating satellite (much smaller) connected components;
This ordered removal lends to a structural scanning of the graph.
Jose F Rodrigues-Jr (University of Sao Paulo) 8 / 20
Methodology
1.Structure detection–Structure vocabulary
StructMatrix Vocabulary ψ
Jose F Rodrigues-Jr (University of Sao Paulo) 9 / 20
Methodology
1.Structure detection–Algorithm
1 If the queue has connected components, StructMatrix gets the first
element for processing.
Jose F Rodrigues-Jr (University of Sao Paulo) 10 / 20
Methodology
1.Structure detection–Algorithm
2 StructMatrix selects the vertices with higher degree (up to 1% of the
vertices) and removes their edges.
Jose F Rodrigues-Jr (University of Sao Paulo) 10 / 20
Methodology
1.Structure detection–Algorithm
2 We get a set of smaller connected subcomponents.
Jose F Rodrigues-Jr (University of Sao Paulo) 10 / 20
Methodology
1.Structure detection–Algorithm
3 We classify the subcomponents according to the vocabulary.
Jose F Rodrigues-Jr (University of Sao Paulo) 10 / 20
Methodology
1.Structure detection–Structure classification
α = n2
4 β = n(n−1)
2 = 0.2
Jose F Rodrigues-Jr (University of Sao Paulo) 10 / 20
Methodology
1.Structure detection–Algorithm
4 We store the classified subcomponents; the ones that were not
identified go to the queue waiting for a new round of shattering.
Jose F Rodrigues-Jr (University of Sao Paulo) 10 / 20
Methodology
1.Structure detection–Algorithm
5 We proceed to the next element in the queue.
Jose F Rodrigues-Jr (University of Sao Paulo) 10 / 20
Methodology
1.Structure detection–Structure detection results
Graph # Structures fs st ch nc fc nb fb
DBLP 160.885 76% 5% 2% 2% 15% <1% -
WWW-barabasi 15.652 32% 52% 5% 3% 2% 4% 2%
cit-HepPh 14.479 79% 13% 6% 1% <1% <1% <1%
Wikipedia-vote 1.706 65% 33% 2% - - <1% -
Epinions 8774 52% 31% 14% <1% <1% 2% <1%
Roadnet PA 51.175 23% 45% 27% - - 5% -
Roadnet CA 88.993 27% 39% 29% - - 4% -
Roadnet TX 62.614 25% 43% 28% - - 4% -
Jose F Rodrigues-Jr (University of Sao Paulo) 10 / 20
Methodology
1.Structure detection–Runtime
We compare to algorithm VoG (Koutra et al.[2]): better performance, and
bigger vocabulary.
Jose F Rodrigues-Jr (University of Sao Paulo) 10 / 20
Methodology
2.Visualization–Projection
After structure detection, we build an adjacency matrix
structure-to-structure whose edges’ weights indicate the number of
edges between the nodes of each structure;
Although smaller than the original matrix, for million-scale graphs,
the struct matrix is still too large to fit in the screen;
For this reason we create a dense matrix according to a straight
proportion (x, y) → (ρx , ρy ) for:
ρx = (Resx − 1) x−xmin
xmax −xmin
+ 1
2
ρy = (Rexy − 1) y−ymin
ymax −ymin
+ 1
2
(1)
where (x, y) are points of the original matrix and Resx , Resy are the
target resolutions; the more resolution, the more details are presented
– these parameters allow for interactive grasping of details.
Jose F Rodrigues-Jr (University of Sao Paulo) 11 / 20
Methodology
2.Visualization–Projection
Jose F Rodrigues-Jr (University of Sao Paulo) 12 / 20
Methodology
2.Visualization–Layout
We organize the matrix according to structure type, and to number of
edges – size of structures (number of nodes) is given by color.
Jose F Rodrigues-Jr (University of Sao Paulo) 13 / 20
Methodology
2.Visualization–Layout
We organize the matrix according to structure type, and to number of
edges – size of structures (number of nodes) is given by color.
Jose F Rodrigues-Jr (University of Sao Paulo) 13 / 20
Experiments
Experiments–Real datasets
Graph # Structures fs st ch nc fc nb fb
DBLP 160.885 76% 5% 2% 2% 15% <1% -
WWW-barabasi 15.652 32% 52% 5% 3% 2% 4% 2%
cit-HepPh 14.479 79% 13% 6% 1% <1% <1% <1%
Wikipedia-vote 1.706 65% 33% 2% - - <1% -
Epinions 8774 52% 31% 14% <1% <1% 2% <1%
Roadnet PA 51.175 23% 45% 27% - - 5% -
Roadnet CA 88.993 27% 39% 29% - - 4% -
Roadnet TX 62.614 25% 43% 28% - - 4% -
Jose F Rodrigues-Jr (University of Sao Paulo) 14 / 20
Experiments
Experiments–Real datasets–WWW-barabasi
WWW-barabasi: webpages and links between them.
Stars (st and fs) refer to webpages with many out links.
Most of the webpages have less than one thousand connections;
however, some present unusual thousand connections.
Jose F Rodrigues-Jr (University of Sao Paulo) 15 / 20
Experiments
Experiments–Real datasets–Road nets
Pennsylvania California Texas
The three road graphs have a similar structure – all U.S. roads;
There is a hierarchical connectivity: bigger to smaller cities;
Surprising grid-like (due to symmetry) structure: intersections refer to
hub cities, and lines refer to inter-city paths.
Jose F Rodrigues-Jr (University of Sao Paulo) 16 / 20
Experiments
Experiments–Real datasets–Road nets
Comparison: Structure-to-structure vs Node-to-node.
California (structure-to-structure) California (node-to-node)
Main differences:
1 The partitioning according to structures;
2 The ordering by number of edges to other structures;
3 There is a hierarchical connectivity: bigger to smaller cities;
4 Surprising grid-like structure: intersections refer to hub cities, and
lines refer to inter-city paths.
Jose F Rodrigues-Jr (University of Sao Paulo) 17 / 20
Experiments
Experiments–Real datasets–DBLP
Overall FC-FC zoom
DBLP is mainly characterized by false stars – possibly because
advisors have students, and students connect one to each other;
By zooming FC-FC, one can see outliers, for instance k3 = “The
Biomolecular Interaction Network Database and related tools 2005
update” 75 authors.
Jose F Rodrigues-Jr (University of Sao Paulo) 18 / 20
Conclusions
Contributions
Visualization technique: we introduce a processing and visualization
methodology that puts together algorithmic techniques and design in
order to reach large-scale visualizations;
Analytical scalability: our technique extends the most scalable
technique found in the literature; plus, it is engineered to plot millions
of edges in a matter of seconds;
Practical analysis: we show that large-scale graphs have well-defined
behaviors concerning the distribution of structures, their size, and
how they are related one to each other; finally, using a standard
laptop, our techniques allowed us to experiment in real, large-scale
graphs coming from domains of high impact, i.e., WWW, Wikipedia,
Roadnet, and DBLP.
Jose F Rodrigues-Jr (University of Sao Paulo) 18 / 20
References
U. Kang and C. Faloutsos, “Beyond ’caveman communities’: Hubs
and spokes for graph compression and mining,” in ICDM, 2011, pp.
300–309.
D. Koutra, U. Kang, J. Vreeken, and C. Faloutsos, “Vog:
Summarizing and understanding large graphs,” in SDM, 2014.
Jose F Rodrigues-Jr (University of Sao Paulo) 18 / 20
Ad

More Related Content

Viewers also liked (14)

Unveiling smoke in social images with the SmokeBlock approach
Unveiling smoke in social images with the SmokeBlock approachUnveiling smoke in social images with the SmokeBlock approach
Unveiling smoke in social images with the SmokeBlock approach
Universidade de São Paulo
 
6 7-metodologia depesquisaemcienciadacomputacao-escritadeartigocientifico-plagio
6 7-metodologia depesquisaemcienciadacomputacao-escritadeartigocientifico-plagio6 7-metodologia depesquisaemcienciadacomputacao-escritadeartigocientifico-plagio
6 7-metodologia depesquisaemcienciadacomputacao-escritadeartigocientifico-plagio
Universidade de São Paulo
 
On the Support of a Similarity-Enabled Relational Database Management System ...
On the Support of a Similarity-Enabled Relational Database Management System ...On the Support of a Similarity-Enabled Relational Database Management System ...
On the Support of a Similarity-Enabled Relational Database Management System ...
Universidade de São Paulo
 
Visualizing Networks
Visualizing NetworksVisualizing Networks
Visualizing Networks
freshdatabos
 
Supervised-Learning Link Recommendation in the DBLP co-authoring network
Supervised-Learning Link Recommendation in the DBLP co-authoring networkSupervised-Learning Link Recommendation in the DBLP co-authoring network
Supervised-Learning Link Recommendation in the DBLP co-authoring network
Universidade de São Paulo
 
Multimodal graph-based analysis over the DBLP repository: critical discoverie...
Multimodal graph-based analysis over the DBLP repository: critical discoverie...Multimodal graph-based analysis over the DBLP repository: critical discoverie...
Multimodal graph-based analysis over the DBLP repository: critical discoverie...
Universidade de São Paulo
 
Techniques for effective and efficient fire detection from social media images
Techniques for effective and efficient fire detection from social media imagesTechniques for effective and efficient fire detection from social media images
Techniques for effective and efficient fire detection from social media images
Universidade de São Paulo
 
Fire Detection on Unconstrained Videos Using Color-Aware Spatial Modeling and...
Fire Detection on Unconstrained Videos Using Color-Aware Spatial Modeling and...Fire Detection on Unconstrained Videos Using Color-Aware Spatial Modeling and...
Fire Detection on Unconstrained Videos Using Color-Aware Spatial Modeling and...
Universidade de São Paulo
 
Graph-based Relational Data Visualization
Graph-based RelationalData VisualizationGraph-based RelationalData Visualization
Graph-based Relational Data Visualization
Universidade de São Paulo
 
Fast Billion-scale Graph Computation Using a Bimodal Block Processing Model
Fast Billion-scale Graph Computation Using a Bimodal Block Processing ModelFast Billion-scale Graph Computation Using a Bimodal Block Processing Model
Fast Billion-scale Graph Computation Using a Bimodal Block Processing Model
Universidade de São Paulo
 
Vertex Centric Asynchronous Belief Propagation Algorithm for Large-Scale Graphs
Vertex Centric Asynchronous Belief Propagation Algorithm for Large-Scale GraphsVertex Centric Asynchronous Belief Propagation Algorithm for Large-Scale Graphs
Vertex Centric Asynchronous Belief Propagation Algorithm for Large-Scale Graphs
Universidade de São Paulo
 
Dawarehouse e OLAP
Dawarehouse e OLAPDawarehouse e OLAP
Dawarehouse e OLAP
Universidade de São Paulo
 
Complexidade de Algoritmos, Notação assintótica, Algoritmos polinomiais e in...
Complexidade de Algoritmos, Notação assintótica, Algoritmos polinomiais e in...Complexidade de Algoritmos, Notação assintótica, Algoritmos polinomiais e in...
Complexidade de Algoritmos, Notação assintótica, Algoritmos polinomiais e in...
Universidade de São Paulo
 
A Fast and Dirty Intro to NetworkX (and D3)
A Fast and Dirty Intro to NetworkX (and D3)A Fast and Dirty Intro to NetworkX (and D3)
A Fast and Dirty Intro to NetworkX (and D3)
Lynn Cherny
 
Unveiling smoke in social images with the SmokeBlock approach
Unveiling smoke in social images with the SmokeBlock approachUnveiling smoke in social images with the SmokeBlock approach
Unveiling smoke in social images with the SmokeBlock approach
Universidade de São Paulo
 
6 7-metodologia depesquisaemcienciadacomputacao-escritadeartigocientifico-plagio
6 7-metodologia depesquisaemcienciadacomputacao-escritadeartigocientifico-plagio6 7-metodologia depesquisaemcienciadacomputacao-escritadeartigocientifico-plagio
6 7-metodologia depesquisaemcienciadacomputacao-escritadeartigocientifico-plagio
Universidade de São Paulo
 
On the Support of a Similarity-Enabled Relational Database Management System ...
On the Support of a Similarity-Enabled Relational Database Management System ...On the Support of a Similarity-Enabled Relational Database Management System ...
On the Support of a Similarity-Enabled Relational Database Management System ...
Universidade de São Paulo
 
Visualizing Networks
Visualizing NetworksVisualizing Networks
Visualizing Networks
freshdatabos
 
Supervised-Learning Link Recommendation in the DBLP co-authoring network
Supervised-Learning Link Recommendation in the DBLP co-authoring networkSupervised-Learning Link Recommendation in the DBLP co-authoring network
Supervised-Learning Link Recommendation in the DBLP co-authoring network
Universidade de São Paulo
 
Multimodal graph-based analysis over the DBLP repository: critical discoverie...
Multimodal graph-based analysis over the DBLP repository: critical discoverie...Multimodal graph-based analysis over the DBLP repository: critical discoverie...
Multimodal graph-based analysis over the DBLP repository: critical discoverie...
Universidade de São Paulo
 
Techniques for effective and efficient fire detection from social media images
Techniques for effective and efficient fire detection from social media imagesTechniques for effective and efficient fire detection from social media images
Techniques for effective and efficient fire detection from social media images
Universidade de São Paulo
 
Fire Detection on Unconstrained Videos Using Color-Aware Spatial Modeling and...
Fire Detection on Unconstrained Videos Using Color-Aware Spatial Modeling and...Fire Detection on Unconstrained Videos Using Color-Aware Spatial Modeling and...
Fire Detection on Unconstrained Videos Using Color-Aware Spatial Modeling and...
Universidade de São Paulo
 
Fast Billion-scale Graph Computation Using a Bimodal Block Processing Model
Fast Billion-scale Graph Computation Using a Bimodal Block Processing ModelFast Billion-scale Graph Computation Using a Bimodal Block Processing Model
Fast Billion-scale Graph Computation Using a Bimodal Block Processing Model
Universidade de São Paulo
 
Vertex Centric Asynchronous Belief Propagation Algorithm for Large-Scale Graphs
Vertex Centric Asynchronous Belief Propagation Algorithm for Large-Scale GraphsVertex Centric Asynchronous Belief Propagation Algorithm for Large-Scale Graphs
Vertex Centric Asynchronous Belief Propagation Algorithm for Large-Scale Graphs
Universidade de São Paulo
 
Complexidade de Algoritmos, Notação assintótica, Algoritmos polinomiais e in...
Complexidade de Algoritmos, Notação assintótica, Algoritmos polinomiais e in...Complexidade de Algoritmos, Notação assintótica, Algoritmos polinomiais e in...
Complexidade de Algoritmos, Notação assintótica, Algoritmos polinomiais e in...
Universidade de São Paulo
 
A Fast and Dirty Intro to NetworkX (and D3)
A Fast and Dirty Intro to NetworkX (and D3)A Fast and Dirty Intro to NetworkX (and D3)
A Fast and Dirty Intro to NetworkX (and D3)
Lynn Cherny
 

Similar to StructMatrix: large-scale visualization of graphs by means of structure detection and dense matrices (20)

Traffic Outlier Detection by Density-Based Bounded Local Outlier Factors
Traffic Outlier Detection by Density-Based Bounded Local Outlier FactorsTraffic Outlier Detection by Density-Based Bounded Local Outlier Factors
Traffic Outlier Detection by Density-Based Bounded Local Outlier Factors
ITIIIndustries
 
Poster Abstracts
Poster AbstractsPoster Abstracts
Poster Abstracts
butest
 
NS-CUK Journal club: HELee, Review on "Graph embedding on biomedical networks...
NS-CUK Journal club: HELee, Review on "Graph embedding on biomedical networks...NS-CUK Journal club: HELee, Review on "Graph embedding on biomedical networks...
NS-CUK Journal club: HELee, Review on "Graph embedding on biomedical networks...
ssuser4b1f48
 
EffectiveOcclusion Handling for Fast Correlation Filter-based Trackers
EffectiveOcclusion Handling for Fast Correlation Filter-based TrackersEffectiveOcclusion Handling for Fast Correlation Filter-based Trackers
EffectiveOcclusion Handling for Fast Correlation Filter-based Trackers
EECJOURNAL
 
Object Detection with Computer Vision
Object Detection with Computer VisionObject Detection with Computer Vision
Object Detection with Computer Vision
IRJET Journal
 
A framework for outlier detection in
A framework for outlier detection inA framework for outlier detection in
A framework for outlier detection in
ijfcstjournal
 
A Novel Feature Selection with Annealing For Computer Vision And Big Data Lea...
A Novel Feature Selection with Annealing For Computer Vision And Big Data Lea...A Novel Feature Selection with Annealing For Computer Vision And Big Data Lea...
A Novel Feature Selection with Annealing For Computer Vision And Big Data Lea...
theijes
 
NRNB Annual Report 2012
NRNB Annual Report 2012NRNB Annual Report 2012
NRNB Annual Report 2012
Alexander Pico
 
Detecting outliers and anomalies in data streams
Detecting outliers and anomalies in data streamsDetecting outliers and anomalies in data streams
Detecting outliers and anomalies in data streams
fatimabenjelloun1
 
Sunbelt 2013 Presentation
Sunbelt 2013 PresentationSunbelt 2013 Presentation
Sunbelt 2013 Presentation
Juan David Cruz-Gómez
 
An improved particle filter tracking
An improved particle filter trackingAn improved particle filter tracking
An improved particle filter tracking
ijcsit
 
An information-theoretic, all-scales approach to comparing networks
An information-theoretic, all-scales approach to comparing networksAn information-theoretic, all-scales approach to comparing networks
An information-theoretic, all-scales approach to comparing networks
Jim Bagrow
 
LCF: A Temporal Approach to Link Prediction in Dynamic Social Networks
 LCF: A Temporal Approach to Link Prediction in Dynamic Social Networks LCF: A Temporal Approach to Link Prediction in Dynamic Social Networks
LCF: A Temporal Approach to Link Prediction in Dynamic Social Networks
IJCSIS Research Publications
 
Graphical Structure Learning accelerated with POWER9
Graphical Structure Learning accelerated with POWER9Graphical Structure Learning accelerated with POWER9
Graphical Structure Learning accelerated with POWER9
Ganesan Narayanasamy
 
Fault detection and diagnosis for non-Gaussian stochastic distribution system...
Fault detection and diagnosis for non-Gaussian stochastic distribution system...Fault detection and diagnosis for non-Gaussian stochastic distribution system...
Fault detection and diagnosis for non-Gaussian stochastic distribution system...
ISA Interchange
 
Top Cited Articles in Data Mining - International Journal of Data Mining & Kn...
Top Cited Articles in Data Mining - International Journal of Data Mining & Kn...Top Cited Articles in Data Mining - International Journal of Data Mining & Kn...
Top Cited Articles in Data Mining - International Journal of Data Mining & Kn...
IJDKP
 
Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...
Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...
Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...
IJEACS
 
Nguyen - Science of Information, Computation and Fusion - Spring Review 2013
Nguyen - Science of Information, Computation and Fusion - Spring Review 2013Nguyen - Science of Information, Computation and Fusion - Spring Review 2013
Nguyen - Science of Information, Computation and Fusion - Spring Review 2013
The Air Force Office of Scientific Research
 
accessible-streaming-algorithms
accessible-streaming-algorithmsaccessible-streaming-algorithms
accessible-streaming-algorithms
Farhan Zaki
 
Retinal Blood Vessels Exudates Classification For Detection Of Hemmorages Tha...
Retinal Blood Vessels Exudates Classification For Detection Of Hemmorages Tha...Retinal Blood Vessels Exudates Classification For Detection Of Hemmorages Tha...
Retinal Blood Vessels Exudates Classification For Detection Of Hemmorages Tha...
IJSRED
 
Traffic Outlier Detection by Density-Based Bounded Local Outlier Factors
Traffic Outlier Detection by Density-Based Bounded Local Outlier FactorsTraffic Outlier Detection by Density-Based Bounded Local Outlier Factors
Traffic Outlier Detection by Density-Based Bounded Local Outlier Factors
ITIIIndustries
 
Poster Abstracts
Poster AbstractsPoster Abstracts
Poster Abstracts
butest
 
NS-CUK Journal club: HELee, Review on "Graph embedding on biomedical networks...
NS-CUK Journal club: HELee, Review on "Graph embedding on biomedical networks...NS-CUK Journal club: HELee, Review on "Graph embedding on biomedical networks...
NS-CUK Journal club: HELee, Review on "Graph embedding on biomedical networks...
ssuser4b1f48
 
EffectiveOcclusion Handling for Fast Correlation Filter-based Trackers
EffectiveOcclusion Handling for Fast Correlation Filter-based TrackersEffectiveOcclusion Handling for Fast Correlation Filter-based Trackers
EffectiveOcclusion Handling for Fast Correlation Filter-based Trackers
EECJOURNAL
 
Object Detection with Computer Vision
Object Detection with Computer VisionObject Detection with Computer Vision
Object Detection with Computer Vision
IRJET Journal
 
A framework for outlier detection in
A framework for outlier detection inA framework for outlier detection in
A framework for outlier detection in
ijfcstjournal
 
A Novel Feature Selection with Annealing For Computer Vision And Big Data Lea...
A Novel Feature Selection with Annealing For Computer Vision And Big Data Lea...A Novel Feature Selection with Annealing For Computer Vision And Big Data Lea...
A Novel Feature Selection with Annealing For Computer Vision And Big Data Lea...
theijes
 
NRNB Annual Report 2012
NRNB Annual Report 2012NRNB Annual Report 2012
NRNB Annual Report 2012
Alexander Pico
 
Detecting outliers and anomalies in data streams
Detecting outliers and anomalies in data streamsDetecting outliers and anomalies in data streams
Detecting outliers and anomalies in data streams
fatimabenjelloun1
 
An improved particle filter tracking
An improved particle filter trackingAn improved particle filter tracking
An improved particle filter tracking
ijcsit
 
An information-theoretic, all-scales approach to comparing networks
An information-theoretic, all-scales approach to comparing networksAn information-theoretic, all-scales approach to comparing networks
An information-theoretic, all-scales approach to comparing networks
Jim Bagrow
 
LCF: A Temporal Approach to Link Prediction in Dynamic Social Networks
 LCF: A Temporal Approach to Link Prediction in Dynamic Social Networks LCF: A Temporal Approach to Link Prediction in Dynamic Social Networks
LCF: A Temporal Approach to Link Prediction in Dynamic Social Networks
IJCSIS Research Publications
 
Graphical Structure Learning accelerated with POWER9
Graphical Structure Learning accelerated with POWER9Graphical Structure Learning accelerated with POWER9
Graphical Structure Learning accelerated with POWER9
Ganesan Narayanasamy
 
Fault detection and diagnosis for non-Gaussian stochastic distribution system...
Fault detection and diagnosis for non-Gaussian stochastic distribution system...Fault detection and diagnosis for non-Gaussian stochastic distribution system...
Fault detection and diagnosis for non-Gaussian stochastic distribution system...
ISA Interchange
 
Top Cited Articles in Data Mining - International Journal of Data Mining & Kn...
Top Cited Articles in Data Mining - International Journal of Data Mining & Kn...Top Cited Articles in Data Mining - International Journal of Data Mining & Kn...
Top Cited Articles in Data Mining - International Journal of Data Mining & Kn...
IJDKP
 
Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...
Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...
Stacked Generalization of Random Forest and Decision Tree Techniques for Libr...
IJEACS
 
accessible-streaming-algorithms
accessible-streaming-algorithmsaccessible-streaming-algorithms
accessible-streaming-algorithms
Farhan Zaki
 
Retinal Blood Vessels Exudates Classification For Detection Of Hemmorages Tha...
Retinal Blood Vessels Exudates Classification For Detection Of Hemmorages Tha...Retinal Blood Vessels Exudates Classification For Detection Of Hemmorages Tha...
Retinal Blood Vessels Exudates Classification For Detection Of Hemmorages Tha...
IJSRED
 
Ad

More from Universidade de São Paulo (11)

A gentle introduction to Deep Learning
A gentle introduction to Deep LearningA gentle introduction to Deep Learning
A gentle introduction to Deep Learning
Universidade de São Paulo
 
Computação: carreira e mercado de trabalho
Computação: carreira e mercado de trabalhoComputação: carreira e mercado de trabalho
Computação: carreira e mercado de trabalho
Universidade de São Paulo
 
Introdução às ferramentas de Business Intelligence do ecossistema Hadoop
Introdução às ferramentas de Business Intelligence do ecossistema HadoopIntrodução às ferramentas de Business Intelligence do ecossistema Hadoop
Introdução às ferramentas de Business Intelligence do ecossistema Hadoop
Universidade de São Paulo
 
Metric s plat - a platform for quick development testing and visualization of...
Metric s plat - a platform for quick development testing and visualization of...Metric s plat - a platform for quick development testing and visualization of...
Metric s plat - a platform for quick development testing and visualization of...
Universidade de São Paulo
 
Hierarchical visual filtering pragmatic and epistemic actions for database vi...
Hierarchical visual filtering pragmatic and epistemic actions for database vi...Hierarchical visual filtering pragmatic and epistemic actions for database vi...
Hierarchical visual filtering pragmatic and epistemic actions for database vi...
Universidade de São Paulo
 
Java generics-basics
Java generics-basicsJava generics-basics
Java generics-basics
Universidade de São Paulo
 
Java collections-basic
Java collections-basicJava collections-basic
Java collections-basic
Universidade de São Paulo
 
Java network-sockets-etc
Java network-sockets-etcJava network-sockets-etc
Java network-sockets-etc
Universidade de São Paulo
 
Java streams
Java streamsJava streams
Java streams
Universidade de São Paulo
 
Infovis tutorial
Infovis tutorialInfovis tutorial
Infovis tutorial
Universidade de São Paulo
 
Java platform
Java platformJava platform
Java platform
Universidade de São Paulo
 
Ad

Recently uploaded (20)

AWS-Certified-ML-Engineer-Associate-Slides.pdf
AWS-Certified-ML-Engineer-Associate-Slides.pdfAWS-Certified-ML-Engineer-Associate-Slides.pdf
AWS-Certified-ML-Engineer-Associate-Slides.pdf
philsparkshome
 
RAG Chatbot using AWS Bedrock and Streamlit Framework
RAG Chatbot using AWS Bedrock and Streamlit FrameworkRAG Chatbot using AWS Bedrock and Streamlit Framework
RAG Chatbot using AWS Bedrock and Streamlit Framework
apanneer
 
Process Mining at Deutsche Bank - Journey
Process Mining at Deutsche Bank - JourneyProcess Mining at Deutsche Bank - Journey
Process Mining at Deutsche Bank - Journey
Process mining Evangelist
 
indonesia-gen-z-report-2024 Gen Z (born between 1997 and 2012) is currently t...
indonesia-gen-z-report-2024 Gen Z (born between 1997 and 2012) is currently t...indonesia-gen-z-report-2024 Gen Z (born between 1997 and 2012) is currently t...
indonesia-gen-z-report-2024 Gen Z (born between 1997 and 2012) is currently t...
disnakertransjabarda
 
Process Mining Machine Recoveries to Reduce Downtime
Process Mining Machine Recoveries to Reduce DowntimeProcess Mining Machine Recoveries to Reduce Downtime
Process Mining Machine Recoveries to Reduce Downtime
Process mining Evangelist
 
How to Set Up Process Mining in a Decentralized Organization?
How to Set Up Process Mining in a Decentralized Organization?How to Set Up Process Mining in a Decentralized Organization?
How to Set Up Process Mining in a Decentralized Organization?
Process mining Evangelist
 
2024 Digital Equity Accelerator Report.pdf
2024 Digital Equity Accelerator Report.pdf2024 Digital Equity Accelerator Report.pdf
2024 Digital Equity Accelerator Report.pdf
dominikamizerska1
 
report (maam dona subject).pptxhsgwiswhs
report (maam dona subject).pptxhsgwiswhsreport (maam dona subject).pptxhsgwiswhs
report (maam dona subject).pptxhsgwiswhs
AngelPinedaTaguinod
 
national income & related aggregates (1)(1).pptx
national income & related aggregates (1)(1).pptxnational income & related aggregates (1)(1).pptx
national income & related aggregates (1)(1).pptx
j2492618
 
Process Mining as Enabler for Digital Transformations
Process Mining as Enabler for Digital TransformationsProcess Mining as Enabler for Digital Transformations
Process Mining as Enabler for Digital Transformations
Process mining Evangelist
 
Z14_IBM__APL_by_Christian_Demmer_IBM.pdf
Z14_IBM__APL_by_Christian_Demmer_IBM.pdfZ14_IBM__APL_by_Christian_Demmer_IBM.pdf
Z14_IBM__APL_by_Christian_Demmer_IBM.pdf
Fariborz Seyedloo
 
Automation Platforms and Process Mining - success story
Automation Platforms and Process Mining - success storyAutomation Platforms and Process Mining - success story
Automation Platforms and Process Mining - success story
Process mining Evangelist
 
2-Raction quotient_١٠٠١٤٦.ppt of physical chemisstry
2-Raction quotient_١٠٠١٤٦.ppt of physical chemisstry2-Raction quotient_١٠٠١٤٦.ppt of physical chemisstry
2-Raction quotient_١٠٠١٤٦.ppt of physical chemisstry
bastakwyry
 
Introduction to systems thinking tools_Eng.pdf
Introduction to systems thinking tools_Eng.pdfIntroduction to systems thinking tools_Eng.pdf
Introduction to systems thinking tools_Eng.pdf
AbdurahmanAbd
 
What is ETL? Difference between ETL and ELT?.pdf
What is ETL? Difference between ETL and ELT?.pdfWhat is ETL? Difference between ETL and ELT?.pdf
What is ETL? Difference between ETL and ELT?.pdf
SaikatBasu37
 
Oral Malodor.pptx jsjshdhushehsidjjeiejdhfj
Oral Malodor.pptx jsjshdhushehsidjjeiejdhfjOral Malodor.pptx jsjshdhushehsidjjeiejdhfj
Oral Malodor.pptx jsjshdhushehsidjjeiejdhfj
maitripatel5301
 
Day 1 MS Excel Basics #.pptxDay 1 MS Excel Basics #.pptxDay 1 MS Excel Basics...
Day 1 MS Excel Basics #.pptxDay 1 MS Excel Basics #.pptxDay 1 MS Excel Basics...Day 1 MS Excel Basics #.pptxDay 1 MS Excel Basics #.pptxDay 1 MS Excel Basics...
Day 1 MS Excel Basics #.pptxDay 1 MS Excel Basics #.pptxDay 1 MS Excel Basics...
Jayantilal Bhanushali
 
Time series for yotube_1_data anlysis.pdf
Time series for yotube_1_data anlysis.pdfTime series for yotube_1_data anlysis.pdf
Time series for yotube_1_data anlysis.pdf
asmaamahmoudsaeed
 
录取通知书加拿大TMU毕业证多伦多都会大学电子版毕业证成绩单
录取通知书加拿大TMU毕业证多伦多都会大学电子版毕业证成绩单录取通知书加拿大TMU毕业证多伦多都会大学电子版毕业证成绩单
录取通知书加拿大TMU毕业证多伦多都会大学电子版毕业证成绩单
Taqyea
 
L1_Slides_Foundational Concepts_508.pptx
L1_Slides_Foundational Concepts_508.pptxL1_Slides_Foundational Concepts_508.pptx
L1_Slides_Foundational Concepts_508.pptx
38NoopurPatel
 
AWS-Certified-ML-Engineer-Associate-Slides.pdf
AWS-Certified-ML-Engineer-Associate-Slides.pdfAWS-Certified-ML-Engineer-Associate-Slides.pdf
AWS-Certified-ML-Engineer-Associate-Slides.pdf
philsparkshome
 
RAG Chatbot using AWS Bedrock and Streamlit Framework
RAG Chatbot using AWS Bedrock and Streamlit FrameworkRAG Chatbot using AWS Bedrock and Streamlit Framework
RAG Chatbot using AWS Bedrock and Streamlit Framework
apanneer
 
indonesia-gen-z-report-2024 Gen Z (born between 1997 and 2012) is currently t...
indonesia-gen-z-report-2024 Gen Z (born between 1997 and 2012) is currently t...indonesia-gen-z-report-2024 Gen Z (born between 1997 and 2012) is currently t...
indonesia-gen-z-report-2024 Gen Z (born between 1997 and 2012) is currently t...
disnakertransjabarda
 
Process Mining Machine Recoveries to Reduce Downtime
Process Mining Machine Recoveries to Reduce DowntimeProcess Mining Machine Recoveries to Reduce Downtime
Process Mining Machine Recoveries to Reduce Downtime
Process mining Evangelist
 
How to Set Up Process Mining in a Decentralized Organization?
How to Set Up Process Mining in a Decentralized Organization?How to Set Up Process Mining in a Decentralized Organization?
How to Set Up Process Mining in a Decentralized Organization?
Process mining Evangelist
 
2024 Digital Equity Accelerator Report.pdf
2024 Digital Equity Accelerator Report.pdf2024 Digital Equity Accelerator Report.pdf
2024 Digital Equity Accelerator Report.pdf
dominikamizerska1
 
report (maam dona subject).pptxhsgwiswhs
report (maam dona subject).pptxhsgwiswhsreport (maam dona subject).pptxhsgwiswhs
report (maam dona subject).pptxhsgwiswhs
AngelPinedaTaguinod
 
national income & related aggregates (1)(1).pptx
national income & related aggregates (1)(1).pptxnational income & related aggregates (1)(1).pptx
national income & related aggregates (1)(1).pptx
j2492618
 
Process Mining as Enabler for Digital Transformations
Process Mining as Enabler for Digital TransformationsProcess Mining as Enabler for Digital Transformations
Process Mining as Enabler for Digital Transformations
Process mining Evangelist
 
Z14_IBM__APL_by_Christian_Demmer_IBM.pdf
Z14_IBM__APL_by_Christian_Demmer_IBM.pdfZ14_IBM__APL_by_Christian_Demmer_IBM.pdf
Z14_IBM__APL_by_Christian_Demmer_IBM.pdf
Fariborz Seyedloo
 
Automation Platforms and Process Mining - success story
Automation Platforms and Process Mining - success storyAutomation Platforms and Process Mining - success story
Automation Platforms and Process Mining - success story
Process mining Evangelist
 
2-Raction quotient_١٠٠١٤٦.ppt of physical chemisstry
2-Raction quotient_١٠٠١٤٦.ppt of physical chemisstry2-Raction quotient_١٠٠١٤٦.ppt of physical chemisstry
2-Raction quotient_١٠٠١٤٦.ppt of physical chemisstry
bastakwyry
 
Introduction to systems thinking tools_Eng.pdf
Introduction to systems thinking tools_Eng.pdfIntroduction to systems thinking tools_Eng.pdf
Introduction to systems thinking tools_Eng.pdf
AbdurahmanAbd
 
What is ETL? Difference between ETL and ELT?.pdf
What is ETL? Difference between ETL and ELT?.pdfWhat is ETL? Difference between ETL and ELT?.pdf
What is ETL? Difference between ETL and ELT?.pdf
SaikatBasu37
 
Oral Malodor.pptx jsjshdhushehsidjjeiejdhfj
Oral Malodor.pptx jsjshdhushehsidjjeiejdhfjOral Malodor.pptx jsjshdhushehsidjjeiejdhfj
Oral Malodor.pptx jsjshdhushehsidjjeiejdhfj
maitripatel5301
 
Day 1 MS Excel Basics #.pptxDay 1 MS Excel Basics #.pptxDay 1 MS Excel Basics...
Day 1 MS Excel Basics #.pptxDay 1 MS Excel Basics #.pptxDay 1 MS Excel Basics...Day 1 MS Excel Basics #.pptxDay 1 MS Excel Basics #.pptxDay 1 MS Excel Basics...
Day 1 MS Excel Basics #.pptxDay 1 MS Excel Basics #.pptxDay 1 MS Excel Basics...
Jayantilal Bhanushali
 
Time series for yotube_1_data anlysis.pdf
Time series for yotube_1_data anlysis.pdfTime series for yotube_1_data anlysis.pdf
Time series for yotube_1_data anlysis.pdf
asmaamahmoudsaeed
 
录取通知书加拿大TMU毕业证多伦多都会大学电子版毕业证成绩单
录取通知书加拿大TMU毕业证多伦多都会大学电子版毕业证成绩单录取通知书加拿大TMU毕业证多伦多都会大学电子版毕业证成绩单
录取通知书加拿大TMU毕业证多伦多都会大学电子版毕业证成绩单
Taqyea
 
L1_Slides_Foundational Concepts_508.pptx
L1_Slides_Foundational Concepts_508.pptxL1_Slides_Foundational Concepts_508.pptx
L1_Slides_Foundational Concepts_508.pptx
38NoopurPatel
 

StructMatrix: large-scale visualization of graphs by means of structure detection and dense matrices

  • 1. StructMatrix: large-scale visualization of graphs by means of structure detection and dense matrices Hugo Gualdron, Robson L. F. Cordeiro, Jose F Rodrigues-Jr University of Sao Paulo In collaboration with Carnegie Mellon University (Prof. Christos Faloutsos, and PhD Danai Koutra) Funding by research agency Fapesp (2013/03906-0, 2014/07879-0, 2015/18335) In: The Fifth IEEE ICDM Workshop on Data Mining in Networks, Atlantic City, NJ, USA - November, 2015 http://www.icmc.usp.br/pessoas/junio Jose F Rodrigues-Jr (University of Sao Paulo) 1 / 20
  • 2. Introduction Motivation Big Data!!! A lot of information, much of it in the form of relationships; Large-scale graphs: graphs generated by applications in which users or entities are distributed along large geographical areas - even the entire planet; Social networks, recommendation networks, road nets, e-commerce, computer networks, client-product logs, and many others. Data analysis is the differential for industrial competition. General Electric & Accenture. Jose F Rodrigues-Jr (University of Sao Paulo) 2 / 20
  • 3. Introduction Problem Such graphs are too big: node-link visualization cannot handle even thousand-vertices graphs; adjacency matrices are limited by the number of pixels of the screen; in any case, the cardinality of the nodes prevents rationalization; non-visual analytical techniques might produce way too many patterns preventing human cognition. Still, we want to characterize the structure of graphs for: understanding the overall structure, and not only the distribution-based analyses; spotting outliers and trends that are not dominant; requesting details on demand concerning subregions of the graph topology. Jose F Rodrigues-Jr (University of Sao Paulo) 3 / 20
  • 4. Introduction Problem Layouts node-link and adjacency matrix Node-link Adjacency matrix Scalability: Hundred nodes Thousand nodes Jose F Rodrigues-Jr (University of Sao Paulo) 4 / 20
  • 5. Introduction Methodology overview Assumptions: graphs are made of recurrent simple structures (cliques, bi-partite cores, stars, and chains); such structures are more meaningful than sole nodes; even at lower resolutions, the graph main properties are maintained in a visualization. Hypothesis: we reach more scalable and meaningful graph visualizations with: graph summarization by detecting recurrent structures of the graph; dense adjacency matrices. Jose F Rodrigues-Jr (University of Sao Paulo) 5 / 20
  • 6. Methodology Proposed method: StructMatrix Our method has two parts: 1 An algorithm to detect substructures; 2 A dense adjacency matrix of the structures that were detected. Jose F Rodrigues-Jr (University of Sao Paulo) 6 / 20
  • 7. Methodology 1.Structure detection Jose F Rodrigues-Jr (University of Sao Paulo) 7 / 20
  • 8. Methodology 1.Structure detection We designed a graph partitioning algorithm based on the fact that real-world graphs obey to power-law distributions; In such graphs: few nodes with very high degree and the majority of nodes with low degree; Kang and Faloutsos [1] demonstrated that the ordered removal of the higher degree nodes leads to the removal of hubs from the giant CC, creating satellite (much smaller) connected components; This ordered removal lends to a structural scanning of the graph. Jose F Rodrigues-Jr (University of Sao Paulo) 8 / 20
  • 9. Methodology 1.Structure detection–Structure vocabulary StructMatrix Vocabulary ψ Jose F Rodrigues-Jr (University of Sao Paulo) 9 / 20
  • 10. Methodology 1.Structure detection–Algorithm 1 If the queue has connected components, StructMatrix gets the first element for processing. Jose F Rodrigues-Jr (University of Sao Paulo) 10 / 20
  • 11. Methodology 1.Structure detection–Algorithm 2 StructMatrix selects the vertices with higher degree (up to 1% of the vertices) and removes their edges. Jose F Rodrigues-Jr (University of Sao Paulo) 10 / 20
  • 12. Methodology 1.Structure detection–Algorithm 2 We get a set of smaller connected subcomponents. Jose F Rodrigues-Jr (University of Sao Paulo) 10 / 20
  • 13. Methodology 1.Structure detection–Algorithm 3 We classify the subcomponents according to the vocabulary. Jose F Rodrigues-Jr (University of Sao Paulo) 10 / 20
  • 14. Methodology 1.Structure detection–Structure classification α = n2 4 β = n(n−1) 2 = 0.2 Jose F Rodrigues-Jr (University of Sao Paulo) 10 / 20
  • 15. Methodology 1.Structure detection–Algorithm 4 We store the classified subcomponents; the ones that were not identified go to the queue waiting for a new round of shattering. Jose F Rodrigues-Jr (University of Sao Paulo) 10 / 20
  • 16. Methodology 1.Structure detection–Algorithm 5 We proceed to the next element in the queue. Jose F Rodrigues-Jr (University of Sao Paulo) 10 / 20
  • 17. Methodology 1.Structure detection–Structure detection results Graph # Structures fs st ch nc fc nb fb DBLP 160.885 76% 5% 2% 2% 15% <1% - WWW-barabasi 15.652 32% 52% 5% 3% 2% 4% 2% cit-HepPh 14.479 79% 13% 6% 1% <1% <1% <1% Wikipedia-vote 1.706 65% 33% 2% - - <1% - Epinions 8774 52% 31% 14% <1% <1% 2% <1% Roadnet PA 51.175 23% 45% 27% - - 5% - Roadnet CA 88.993 27% 39% 29% - - 4% - Roadnet TX 62.614 25% 43% 28% - - 4% - Jose F Rodrigues-Jr (University of Sao Paulo) 10 / 20
  • 18. Methodology 1.Structure detection–Runtime We compare to algorithm VoG (Koutra et al.[2]): better performance, and bigger vocabulary. Jose F Rodrigues-Jr (University of Sao Paulo) 10 / 20
  • 19. Methodology 2.Visualization–Projection After structure detection, we build an adjacency matrix structure-to-structure whose edges’ weights indicate the number of edges between the nodes of each structure; Although smaller than the original matrix, for million-scale graphs, the struct matrix is still too large to fit in the screen; For this reason we create a dense matrix according to a straight proportion (x, y) → (ρx , ρy ) for: ρx = (Resx − 1) x−xmin xmax −xmin + 1 2 ρy = (Rexy − 1) y−ymin ymax −ymin + 1 2 (1) where (x, y) are points of the original matrix and Resx , Resy are the target resolutions; the more resolution, the more details are presented – these parameters allow for interactive grasping of details. Jose F Rodrigues-Jr (University of Sao Paulo) 11 / 20
  • 21. Methodology 2.Visualization–Layout We organize the matrix according to structure type, and to number of edges – size of structures (number of nodes) is given by color. Jose F Rodrigues-Jr (University of Sao Paulo) 13 / 20
  • 22. Methodology 2.Visualization–Layout We organize the matrix according to structure type, and to number of edges – size of structures (number of nodes) is given by color. Jose F Rodrigues-Jr (University of Sao Paulo) 13 / 20
  • 23. Experiments Experiments–Real datasets Graph # Structures fs st ch nc fc nb fb DBLP 160.885 76% 5% 2% 2% 15% <1% - WWW-barabasi 15.652 32% 52% 5% 3% 2% 4% 2% cit-HepPh 14.479 79% 13% 6% 1% <1% <1% <1% Wikipedia-vote 1.706 65% 33% 2% - - <1% - Epinions 8774 52% 31% 14% <1% <1% 2% <1% Roadnet PA 51.175 23% 45% 27% - - 5% - Roadnet CA 88.993 27% 39% 29% - - 4% - Roadnet TX 62.614 25% 43% 28% - - 4% - Jose F Rodrigues-Jr (University of Sao Paulo) 14 / 20
  • 24. Experiments Experiments–Real datasets–WWW-barabasi WWW-barabasi: webpages and links between them. Stars (st and fs) refer to webpages with many out links. Most of the webpages have less than one thousand connections; however, some present unusual thousand connections. Jose F Rodrigues-Jr (University of Sao Paulo) 15 / 20
  • 25. Experiments Experiments–Real datasets–Road nets Pennsylvania California Texas The three road graphs have a similar structure – all U.S. roads; There is a hierarchical connectivity: bigger to smaller cities; Surprising grid-like (due to symmetry) structure: intersections refer to hub cities, and lines refer to inter-city paths. Jose F Rodrigues-Jr (University of Sao Paulo) 16 / 20
  • 26. Experiments Experiments–Real datasets–Road nets Comparison: Structure-to-structure vs Node-to-node. California (structure-to-structure) California (node-to-node) Main differences: 1 The partitioning according to structures; 2 The ordering by number of edges to other structures; 3 There is a hierarchical connectivity: bigger to smaller cities; 4 Surprising grid-like structure: intersections refer to hub cities, and lines refer to inter-city paths. Jose F Rodrigues-Jr (University of Sao Paulo) 17 / 20
  • 27. Experiments Experiments–Real datasets–DBLP Overall FC-FC zoom DBLP is mainly characterized by false stars – possibly because advisors have students, and students connect one to each other; By zooming FC-FC, one can see outliers, for instance k3 = “The Biomolecular Interaction Network Database and related tools 2005 update” 75 authors. Jose F Rodrigues-Jr (University of Sao Paulo) 18 / 20
  • 28. Conclusions Contributions Visualization technique: we introduce a processing and visualization methodology that puts together algorithmic techniques and design in order to reach large-scale visualizations; Analytical scalability: our technique extends the most scalable technique found in the literature; plus, it is engineered to plot millions of edges in a matter of seconds; Practical analysis: we show that large-scale graphs have well-defined behaviors concerning the distribution of structures, their size, and how they are related one to each other; finally, using a standard laptop, our techniques allowed us to experiment in real, large-scale graphs coming from domains of high impact, i.e., WWW, Wikipedia, Roadnet, and DBLP. Jose F Rodrigues-Jr (University of Sao Paulo) 18 / 20
  • 29. References U. Kang and C. Faloutsos, “Beyond ’caveman communities’: Hubs and spokes for graph compression and mining,” in ICDM, 2011, pp. 300–309. D. Koutra, U. Kang, J. Vreeken, and C. Faloutsos, “Vog: Summarizing and understanding large graphs,” in SDM, 2014. Jose F Rodrigues-Jr (University of Sao Paulo) 18 / 20
  翻译: