RDF2Vec: RDF Graph Embeddings for Data Mining
Petar Ristoski and Heiko Paulheim
11/7/2016
Introduction
[Figure: data mining pipeline over graph data. Stages shown: Linking; Exploration / Selection; Consolidation / Cleansing; Transformation; Data Mining; Visualization / Explanation]
Motivation
• Standard data mining algorithms require a propositional feature
vector representation
• Feature space: V = {v1, v2, …, vn}
• Each instance is represented as an n-dimensional feature vector
(v1, v2, …, vn), where for each vi with 1 ≤ i ≤ n:
– vi ∈ {true, false}, or vi ∈ {1, 0}
– vi ∈ ℝ
– vi ∈ S, where S is a finite set of symbols
Motivation
Name                 Person   Music Artist   Instrument   Genre
Trent Reznor         1        1              1            0
Wolfgang A. Mozart   1        1              1            1
Barack Obama         1        0              0            0
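The table can be read as three instances in a four-dimensional binary feature space. A minimal sketch of that mapping follows; the dictionary layout and the helper name `to_feature_vector` are assumptions made for illustration, while the values are copied from the table.

```python
# Feature space V = (Person, Music Artist, Instrument, Genre), as in the table.
FEATURES = ["Person", "Music Artist", "Instrument", "Genre"]

# Binary feature values copied from the table (layout is illustrative).
INSTANCES = {
    "Trent Reznor":       {"Person": 1, "Music Artist": 1, "Instrument": 1, "Genre": 0},
    "Wolfgang A. Mozart": {"Person": 1, "Music Artist": 1, "Instrument": 1, "Genre": 1},
    "Barack Obama":       {"Person": 1, "Music Artist": 0, "Instrument": 0, "Genre": 0},
}


def to_feature_vector(name):
    """Return the n-dimensional feature vector (v1, ..., vn) of one instance."""
    row = INSTANCES[name]
    return tuple(row[feature] for feature in FEATURES)


for name in INSTANCES:
    print(name, to_feature_vector(name))
```

Any standard learner that expects fixed-length vectors could consume these tuples directly.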
Vision
• Preserve the information given in the original graph
• Unsupervised
– task and dataset independent
• Compatible with traditional data mining algorithms and tools
• Efficient computation and application
– Low dimensional representation
RDF2VEC APPROACH
RDF2Vec
• Adaptation of neural language models
– Word2vec
– Latent representation of words based on a text corpus
• Convert RDF graphs into sequences of entities and relations (sentences)
– Graph Walks
– Weisfeiler-Lehman Subtree RDF Graph Kernels
• Train neural language model
– Each entity and relation is represented as an N-dimensional numerical vector
– Semantically similar entities appear closer together in the embedding space
• Use entity vectors in different ML tasks
Word2vec – Neural Language Model
• Two-layer neural net that converts raw text into vectors
– Each word is represented as a numerical vector
• Continuous Bag-of-Words (CBOW)
– Predict target words from source context words
– Tokyo is the capital of Japan
• Skip-gram
[1] Mikolov, Tomas, et al. "Distributed representations of words and phrases and their compositionality." NIPS, 2013.
[2] Rong, Xin. "word2vec parameter learning explained." 2014.
CBOW
[Figure: CBOW network for the example sentence; node labels: Capital, Japan, Tokyo]
Word Embedding
[Figure: embedding space with points for Japan, Russia, Germany, Austria, Berlin, Tokyo, Moscow, Vienna]
Tokyo = [f1, f2, f3, …, fn]
Japan = [f1, f2, f3, …, fn]
v(Japan) - v(Tokyo) + v(Berlin) ≈ v(Germany)
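The vector arithmetic above can be tried on toy data. The sketch below uses hand-made 2-dimensional vectors, invented so that every capital-to-country offset is roughly the same; they are not trained embeddings, and the helper names `cosine` and `analogy` are assumptions for illustration.

```python
import math

# Toy 2-dimensional vectors, invented for illustration only.
VECTORS = {
    "Tokyo":  (1.0, 1.0), "Japan":   (1.0, 3.0),
    "Berlin": (2.0, 1.1), "Germany": (2.0, 3.1),
    "Moscow": (3.0, 0.9), "Russia":  (3.0, 2.9),
    "Vienna": (4.0, 1.0), "Austria": (4.0, 3.0),
}


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def analogy(a, b, c):
    """Return the word closest to v(a) - v(b) + v(c), excluding a, b and c."""
    target = tuple(x - y + z for x, y, z in zip(VECTORS[a], VECTORS[b], VECTORS[c]))
    candidates = [word for word in VECTORS if word not in (a, b, c)]
    return max(candidates, key=lambda word: cosine(VECTORS[word], target))


print(analogy("Japan", "Tokyo", "Berlin"))
```

With trained embeddings the match is only approximate, which is why the nearest neighbour of the computed point is taken rather than an exact lookup.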
Word2vec – Neural Language Model
• Skip-gram
– Predict context words from the target word
– Tokyo is the capital of Japan
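The difference between CBOW and Skip-gram is easiest to see in the training examples each one derives from a sentence. The following is a minimal sketch that only builds those examples for the slide's sentence; the window size of 2 and the function names are assumptions, and no neural network is trained here.

```python
def cbow_examples(tokens, window=2):
    """CBOW: the surrounding context words are used to predict the target word."""
    examples = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        examples.append((context, target))
    return examples


def skipgram_examples(tokens, window=2):
    """Skip-gram: the target word is used to predict each context word."""
    pairs = []
    for i, target in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs


sentence = "Tokyo is the capital of Japan".split()

for context, target in cbow_examples(sentence):
    print("CBOW     ", context, "->", target)
for target, context_word in skipgram_examples(sentence):
    print("Skip-gram", target, "->", context_word)
```

CBOW yields one example per position, while Skip-gram yields one pair per (target, context word) combination, so it produces more training pairs from the same sentence.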
Skip-gram
[Figure: Skip-gram network for the example sentence; node labels: Capital, Japan, Tokyo]
RDF2vec
• Convert the graph into sequences of tokens (sentences)
– Graph walks
– Weisfeiler-Lehman Subtree RDF Graph Kernels
Graph Walks RDF2vec
• For each entity in the graph:
– Extract a subgraph with depth d
– Extract walks on the subgraph
– Build word2vec model
dbr:Trent_Reznor -> dbo:associatedBand -> dbr:Exotic_Birds -> dbo:bandMember -> dbr:Chris_Vrenna
dbr:Trent_Reznor -> dbo:genre -> dbr:Dark_ambient -> dbo:instrument -> dbr:Field_recording
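The walk extraction step can be sketched on the four triples that appear in the two example walks. This is a minimal illustration under stated assumptions: the triple list is reconstructed from those walks, the function names are hypothetical, and every walk of up to `depth` edges is enumerated exhaustively rather than sampled.

```python
from collections import defaultdict

# Triples reconstructed from the two example walks on the slide.
TRIPLES = [
    ("dbr:Trent_Reznor", "dbo:associatedBand", "dbr:Exotic_Birds"),
    ("dbr:Exotic_Birds", "dbo:bandMember", "dbr:Chris_Vrenna"),
    ("dbr:Trent_Reznor", "dbo:genre", "dbr:Dark_ambient"),
    ("dbr:Dark_ambient", "dbo:instrument", "dbr:Field_recording"),
]


def build_index(triples):
    """Map each subject to its outgoing (predicate, object) edges."""
    out_edges = defaultdict(list)
    for subject, predicate, obj in triples:
        out_edges[subject].append((predicate, obj))
    return out_edges


def extract_walks(out_edges, start, depth):
    """Enumerate walks from `start` that follow at most `depth` edges."""
    walks = []

    def extend(walk, remaining):
        edges = out_edges.get(walk[-1], [])
        if remaining == 0 or not edges:
            if len(walk) > 1:  # skip the trivial walk with no edges
                walks.append(walk)
            return
        for predicate, obj in edges:
            extend(walk + [predicate, obj], remaining - 1)

    extend([start], depth)
    return walks


for walk in extract_walks(build_index(TRIPLES), "dbr:Trent_Reznor", depth=2):
    print(" -> ".join(walk))
```

Each walk is a token sequence of alternating entities and relations, so the full set of walks can be handed to a word2vec implementation as its sentences.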
Random Walks RDF2vec
[Figure: random walk extraction and training; labels: V*S walks, V vectors]
Entity Embedding
[Figure: embedding space with points for dbr:Berlin, dbr:Tokyo, dbr:Moscow, dbr:Vienna, dbr:Japan, dbr:Russia, dbr:Germany, dbr:Austria]
dbr:Tokyo = [f1, f2, f3, …, fn]
dbr:Japan = [f1, f2, f3, …, fn]
Weisfeiler-Lehman Kernel
WL Kernel RDF2vec
• Construct sequences using random walks with depth d after each
iteration for each entity in the graph
• Graph G sequences after 1 iteration:
– 1->6->11; 1->6->11->13; 1->6->11->10 …
– 4->11->6; 4->11->13; 4->11->10; 4->11->10->8 …
– …
de Vries, Gerben KD. "A fast approximation of the Weisfeiler-Lehman graph kernel for RDF data." ECML, 2013.
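The relabeling idea behind a Weisfeiler-Lehman iteration can be sketched as follows. This is generic WL relabeling on an undirected toy graph, not the RDF-specific approximation cited above; the graph, the label names and the function name `wl_iteration` are invented for illustration.

```python
def wl_iteration(labels, neighbors):
    """One Weisfeiler-Lehman relabeling step.

    Each node's signature is its own label plus the sorted labels of its
    neighbors; nodes with identical signatures receive the same new label.
    """
    signatures = {
        node: (labels[node], tuple(sorted(labels[n] for n in neighbors[node])))
        for node in labels
    }
    compressed = {}  # signature -> short new label
    new_labels = {}
    for node in sorted(labels):
        signature = signatures[node]
        if signature not in compressed:
            compressed[signature] = f"L{len(compressed)}"
        new_labels[node] = compressed[signature]
    return new_labels


# Toy graph: nodes 1 and 2 share label "A", nodes 3 and 4 share label "B".
labels = {1: "A", 2: "A", 3: "B", 4: "B"}
neighbors = {1: [3], 2: [3, 4], 3: [1, 2], 4: [2]}

print(wl_iteration(labels, neighbors))
```

After one iteration, nodes 1 and 2 no longer share a label because their neighborhoods differ. Walks extracted over the relabeled graph therefore yield different token sequences than walks over the original graph, which is why sequences are generated after each iteration.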
WL Kernels RDF2vec
[Figure: WL kernel sequence extraction and training; labels: V*S*I sequences, V vectors]
EVALUATION
Evaluation Setup
• Datasets
– 3 domain-specific RDF datasets
– 2 large cross-domain RDF datasets with 5 evaluation datasets
• Tasks
– Classification: Naive Bayes, k-Nearest Neighbors (k=3), C4.5 decision tree
and Support Vector Machines.
– Regression: Linear Regression, M5Rules, and k-Nearest Neighbors (k=3).
• Baselines
– Features derived from incoming and outgoing relations and values
– Features derived from graph substructures: WL and Walk-Count Kernels
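To show how entity vectors plug into one of the listed classifiers, here is a minimal k-Nearest Neighbors sketch with k=3. The 2-dimensional vectors and the labels "low" and "high" are invented toy data, not values from the evaluation, and `knn_predict` is a hypothetical helper rather than the tooling used in the experiments.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Return the majority label among the k training vectors nearest to `query`.

    `train` is a list of (vector, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy entity vectors with class labels (illustrative only).
train = [
    ((0.0, 0.1), "low"), ((0.2, 0.0), "low"), ((0.1, 0.3), "low"),
    ((1.0, 1.1), "high"), ((0.9, 1.0), "high"), ((1.2, 0.8), "high"),
]

print(knn_predict(train, (0.1, 0.1)))
print(knn_predict(train, (1.0, 1.0)))
```

Because the embeddings are ordinary fixed-length numeric vectors, the same input works for any of the other listed learners.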
Domain Specific RDF Datasets
• Datasets
Dataset  Task     #statements  #instances  #walks  depth  #sequences (walks)  WL iter.  WL depth  #sequences (WL)
AIFB     C (c=4)  30K          176         all     10     360K                4         2         346K
BGS      C (c=2)  600K         146         all     10     2.4M                4         2         5.3M
MUTAG    C (c=2)  80K          340         all     10     168K                4         2         908K
• Results (accuracy)
– Best scores per dataset
Dataset  Baseline  Walks2vec  WL2vec (SG 500)
AIFB     92.68     89.55      93.41
BGS      91.05     78.10      96.18
MUTAG    94.29     82.06      96.33
Large Cross-Domain RDF Datasets
• Datasets
Dataset   #instances  depth  #sequences  Vector size  Model
DBpedia   5M          4/8    2.5B        200/500      CBOW/SG
Wikidata  17M         4      8.5B        200/500      CBOW/SG
• Evaluation datasets
Dataset            #instances  ML Task    Original Source
Cities             212         R/C (c=3)  Mercer
Metacritic Albums  1,600       R/C (c=3)  Metacritic
Metacritic Movies  2,000       R/C (c=3)  Metacritic
AAUP               960         R/C (c=3)  JSE
Forbes             1,585       R/C (c=3)  Forbes
Results: classification
• Accuracy Results
– Best scores only
Model, vector size, depth  Cities  Movies  Albums  AAUP   Forbes
Best Baseline              75.13   79.30   77.94   93.44  76.75
DB2vec CBOW 200 8          77.39   83.65   78.44   92.23  88.30
DB2vec CBOW 500 8          76.84   83.25   77.25   90.61  89.86
DB2vec SG 200 8            78.92   83.30   79.72   91.04  90.10
DB2vec SG 500 8            89.73   82.80   78.20   94.48  88.53
WD2vec CBOW 200 4          75.56   52.20   51.44   90.18  81.08
WD2vec CBOW 500 4          85.56   51.04   53.28   89.74  80.74
WD2vec SG 200 4            75.48   75.39   64.76   90.50  81.17
WD2vec SG 500 4            83.20   76.30   63.42   90.60  81.17
Results: regression
• RMSE Results
– Best scores only
Model, vector size, depth  Cities  Movies  Albums  AAUP  Forbes
Best Baseline              17.04   19.19   12.81   6.16  18.32
DB2vec CBOW 200 8          12.55   15.90   11.79   6.47  17.43
DB2vec CBOW 500 8          12.54   15.81   11.30   6.54  17.62
DB2vec SG 200 8            12.85   15.12   10.90   6.22  17.85
DB2vec SG 500 8            10.19   15.45   10.89   6.26  16.61
WD2vec CBOW 200 4          17.52   23.39   14.55   6.60  21.77
WD2vec CBOW 500 4          18.33   22.18   14.00   6.08  21.92
WD2vec SG 200 4            18.69   19.10   13.51   6.52  21.59
WD2vec SG 500 4            19.23   19.19   13.23   6.05  21.58
Results Summary
• RDF2vec outperforms all the baseline approaches
– Smaller feature vectors: more efficient training than the baseline
approaches
• WL kernel sequences capture the graph structure better than walks
– Not efficient on large graphs
– Large number of sequences produced – not scalable
• Increasing the depth of the paths increases the quality of the
embeddings
• The vector dimensionality doesn’t affect the performance
• Skip-gram models consistently outperform CBOW models
• DBpedia produces higher quality embeddings than Wikidata
Other Use-Cases
• Recommender systems
• Document modeling
– Document similarity
– Entity relatedness
• Alignment of knowledge bases
– DBpedia and Wikidata
• Knowledge base relation prediction and error detection
• Linking text and semi-structured knowledge to knowledge bases
Conclusion
• RDF2Vec: an approach for learning latent numerical representations
of entities in RDF graphs
• Preserves the graph information
• Compatible with all the traditional machine learning algorithms
• More efficient training of ML models
• Task and dataset independent approach
• Download the code and the models: http://data.dws.informatik.uni-mannheim.de/rdf2vec/
Micro-grooved zein macro-whiskers for large-scale proliferation and different...Micro-grooved zein macro-whiskers for large-scale proliferation and different...
Micro-grooved zein macro-whiskers for large-scale proliferation and different...
mdokmeci
 
Best SCIENCE Quiz IIT Bomaby Anurag sharma
Best SCIENCE Quiz IIT Bomaby Anurag sharmaBest SCIENCE Quiz IIT Bomaby Anurag sharma
Best SCIENCE Quiz IIT Bomaby Anurag sharma
sudhasharma297367
 
Components of the Human Circulatory System.pptx
Components of the Human  Circulatory System.pptxComponents of the Human  Circulatory System.pptx
Components of the Human Circulatory System.pptx
autumnstreaks
 
SULPHONAMIDES AND SULFONES Medicinal Chemistry III.ppt
SULPHONAMIDES AND SULFONES Medicinal Chemistry III.pptSULPHONAMIDES AND SULFONES Medicinal Chemistry III.ppt
SULPHONAMIDES AND SULFONES Medicinal Chemistry III.ppt
HRUTUJA WAGH
 
Transgenic Mice in Cancer Research - Creative Biolabs
Transgenic Mice in Cancer Research - Creative BiolabsTransgenic Mice in Cancer Research - Creative Biolabs
Transgenic Mice in Cancer Research - Creative Biolabs
Creative-Biolabs
 
ART.pdf. Agin Tom, clinical Psychology, Prajyoti Niketan College
ART.pdf. Agin Tom, clinical Psychology, Prajyoti Niketan CollegeART.pdf. Agin Tom, clinical Psychology, Prajyoti Niketan College
ART.pdf. Agin Tom, clinical Psychology, Prajyoti Niketan College
Agin Tom
 
Chaos and Psychology: Modeling the Human Mind through Nonlinear Dynamical Sys...
Chaos and Psychology: Modeling the Human Mind through Nonlinear Dynamical Sys...Chaos and Psychology: Modeling the Human Mind through Nonlinear Dynamical Sys...
Chaos and Psychology: Modeling the Human Mind through Nonlinear Dynamical Sys...
Helena Celeste Mata Rico
 
Preparation of Experimental Animals.pptx
Preparation of Experimental Animals.pptxPreparation of Experimental Animals.pptx
Preparation of Experimental Animals.pptx
klynct
 
Anti tubercular drug Medicinal Chemistry III
Anti tubercular drug Medicinal Chemistry  IIIAnti tubercular drug Medicinal Chemistry  III
Anti tubercular drug Medicinal Chemistry III
HRUTUJA WAGH
 
Antimalarial drug Medicinal Chemistry III
Antimalarial drug Medicinal Chemistry IIIAntimalarial drug Medicinal Chemistry III
Antimalarial drug Medicinal Chemistry III
HRUTUJA WAGH
 
Discrete choice experiments: Environmental Improvements to Airthrey Loch Lake...
Discrete choice experiments: Environmental Improvements to Airthrey Loch Lake...Discrete choice experiments: Environmental Improvements to Airthrey Loch Lake...
Discrete choice experiments: Environmental Improvements to Airthrey Loch Lake...
Professional Content Writing's
 
THE SENSORY ORGANS BY DR. SADAKAT BASHIR.pptx
THE SENSORY ORGANS BY DR. SADAKAT BASHIR.pptxTHE SENSORY ORGANS BY DR. SADAKAT BASHIR.pptx
THE SENSORY ORGANS BY DR. SADAKAT BASHIR.pptx
SadakatBashir
 
External Application in Homoeopathy- Definition,Scope and Types.
External Application  in Homoeopathy- Definition,Scope and Types.External Application  in Homoeopathy- Definition,Scope and Types.
External Application in Homoeopathy- Definition,Scope and Types.
AdharshnaPatrick
 
Macrolide and Miscellaneous Antibiotics.ppt
Macrolide and Miscellaneous Antibiotics.pptMacrolide and Miscellaneous Antibiotics.ppt
Macrolide and Miscellaneous Antibiotics.ppt
HRUTUJA WAGH
 
An upper limit to the lifetime of stellar remnants from gravitational pair pr...
An upper limit to the lifetime of stellar remnants from gravitational pair pr...An upper limit to the lifetime of stellar remnants from gravitational pair pr...
An upper limit to the lifetime of stellar remnants from gravitational pair pr...
Sérgio Sacani
 
Brief Presentation on Garment Washing.pdf
Brief Presentation on Garment Washing.pdfBrief Presentation on Garment Washing.pdf
Brief Presentation on Garment Washing.pdf
BharathKumar556689
 
The Link Between Subsurface Rheology and EjectaMobility: The Case of Small Ne...
The Link Between Subsurface Rheology and EjectaMobility: The Case of Small Ne...The Link Between Subsurface Rheology and EjectaMobility: The Case of Small Ne...
The Link Between Subsurface Rheology and EjectaMobility: The Case of Small Ne...
Sérgio Sacani
 
Chapter-10-Light-reflection-and-refraction.ppt
Chapter-10-Light-reflection-and-refraction.pptChapter-10-Light-reflection-and-refraction.ppt
Chapter-10-Light-reflection-and-refraction.ppt
uniyaladiti914
 

RDF2Vec: RDF Graph Embeddings for Data Mining

  • 1. RDF2Vec: RDF Graph Embeddings for Data Mining. Petar Ristoski and Heiko Paulheim
  • 2. 11/7/2016 2 Introduction Linking Exploration / Selection Consolidation / Cleansing Graph Data Transformation Data Mining Visualization / Explanation Ristoski, Paulheim
  • 3. Motivation • Standard data mining algorithms require a propositional feature vector representation • Feature space: V = {v1, v2, …, vn} • Each instance is represented as an n-dimensional feature vector (v1, v2, …, vn), where for each vi, 1 ≤ i ≤ n: – vi ∈ {true, false}, or vi ∈ {1, 0} – vi ∈ ℝ – vi ∈ S, where S is a finite set of symbols
  • 4. Motivation
      Name                 Person   Music Artist   Instrument   Genre
      Trent Reznor         1        1              1            0
      Wolfgang A. Mozart   1        1              1            1
      Barack Obama         1        0              0            0
  • 5. Vision • Preserve the information given in the original graph • Unsupervised – task and dataset independent • Compatible with traditional data mining algorithms and tools • Efficient computation and application – Low dimensional representation
  • 7. RDF2Vec • Adaptation of neural language models – Word2vec – Latent representation of words based on text corpus • Convert RDF graphs into sequences of entities and relations (sentences) – Graph Walks – Weisfeiler-Lehman Subtree RDF Graph Kernels • Train neural language model – Each entity and relation is represented as an N-dimensional numerical vector – Semantically similar entities appear closer in the embedded space • Use entity vectors in different ML tasks
  • 8. Word2vec – Neural Language Model • Two-layer neural net that converts raw text into vectors – Each word is represented as a numerical vector • Continuous Bag-of-Words (CBOW) – Predict target words from source context words – Tokyo is the capital of Japan • Skip-gram [1] Mikolov, Tomas, et al. "Distributed representations of words and phrases and their compositionality." NIPS, 2013. [2] Rong, Xin. "word2vec parameter learning explained." 2014.
  • 10. Word Embedding • Japan • Russia • Germany • Austria • Berlin • Tokyo • Moscow • Vienna Tokyo = [f1, f2, f3, …, fn] Japan = [f1, f2, f3, …, fn] v(Japan) - v(Tokyo) + v(Berlin) ≈ v(Germany)
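The vector arithmetic on this slide can be sketched with a toy example. The two-dimensional vectors below are hand-picked for illustration and are not trained embeddings; the `vectors` table and the `analogy` helper are hypothetical names, and Euclidean distance is an assumed choice of similarity measure.

```python
import math

# Hand-made toy vectors (an assumption, NOT trained embeddings).
# Axis 0 loosely encodes "which country", axis 1 "capital vs. country".
vectors = {
    "Japan":   [1.0, 0.0], "Tokyo":  [1.0, 1.0],
    "Germany": [2.0, 0.0], "Berlin": [2.0, 1.0],
    "Russia":  [3.0, 0.0], "Moscow": [3.0, 1.0],
    "Austria": [4.0, 0.0], "Vienna": [4.0, 1.0],
}


def analogy(a, b, c):
    """Return the word whose vector is nearest to v(a) - v(b) + v(c)."""
    query = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    # Exclude the three query words so they cannot be returned trivially.
    candidates = [w for w in vectors if w not in (a, b, c)]
    return min(candidates, key=lambda w: math.dist(query, vectors[w]))


result = analogy("Japan", "Tokyo", "Berlin")
print(result)
```

With real embeddings the match is only approximate, which is why the slide writes ≈ rather than =.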
  • 11. Word2vec – Neural Language Model • Two-layer neural net that converts raw text into vectors – Each word is represented as a numerical vector • Continuous Bag-of-Words (CBOW) – Predict target words from source context words – Tokyo is the capital of Japan • Skip-gram – Predict context words from the target word – Tokyo is the capital of Japan [1] Mikolov, Tomas, et al. "Distributed representations of words and phrases and their compositionality." NIPS, 2013. [2] Rong, Xin. "word2vec parameter learning explained." 2014.
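The difference between the two architectures shows up in the training examples each one derives from the same sentence. The sketch below uses the slide's example sentence; the function name `training_examples` and the window size of 2 are assumptions made for illustration.

```python
def training_examples(tokens, window=2):
    """Build CBOW and skip-gram training examples from one sentence.

    CBOW:      (list of context words) -> target word
    Skip-gram: target word -> one context word (one pair per context word)
    """
    cbow, skipgram = [], []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        cbow.append((context, target))
        skipgram.extend((target, c) for c in context)
    return cbow, skipgram


sentence = "Tokyo is the capital of Japan".split()
cbow, skipgram = training_examples(sentence, window=2)
print(cbow[3])        # context words around "capital", then the target
print(skipgram[:2])   # first two (target, context) pairs
```

CBOW yields one example per position, while skip-gram yields one example per (target, context) pair, so the same sentence produces more skip-gram observations.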
  • 13. RDF2vec • Convert the graph into sequences of tokens (sentences) – Graph walks – Weisfeiler-Lehman Subtree RDF Graph Kernels
  • 14. Graph Walks RDF2vec • For each entity in the graph: – Extract a subgraph with depth d – Extract walks on the subgraph – Build word2vec model
      dbr:Trent_Reznor -> dbo:associatedBand -> dbr:Exotic_Birds -> dbo:bandMember -> dbr:Chris_Vrenna
      dbr:Trent_Reznor -> dbo:genre -> dbr:Dark_ambient -> dbo:instrument -> dbr:Field_recording
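A minimal sketch of the walk extraction step, assuming the graph fits in memory as a dictionary from each entity to its outgoing (predicate, object) edges. The `graph` structure and the `walks_from` helper are hypothetical; the triples are taken from the two example walks on the slide.

```python
# Toy graph: subject -> list of (predicate, object) edges.
graph = {
    "dbr:Trent_Reznor": [
        ("dbo:associatedBand", "dbr:Exotic_Birds"),
        ("dbo:genre", "dbr:Dark_ambient"),
    ],
    "dbr:Exotic_Birds": [("dbo:bandMember", "dbr:Chris_Vrenna")],
    "dbr:Dark_ambient": [("dbo:instrument", "dbr:Field_recording")],
}


def walks_from(graph, start, depth):
    """Enumerate all walks from `start` with at most `depth` hops.

    A walk stops early when it reaches an entity without outgoing edges.
    Each walk is a token list: entity, predicate, entity, predicate, ...
    """
    walks = []

    def extend(walk, node, remaining):
        edges = graph.get(node, [])
        if remaining == 0 or not edges:
            walks.append(walk)
            return
        for predicate, obj in edges:
            extend(walk + [predicate, obj], obj, remaining - 1)

    extend([start], start, depth)
    return walks


# One "sentence" per walk; the corpus collects the walks of every entity.
corpus = [w for entity in graph for w in walks_from(graph, entity, depth=2)]
for walk in walks_from(graph, "dbr:Trent_Reznor", depth=2):
    print(" -> ".join(walk))
```

The resulting token lists play the role of sentences and can be passed to any word2vec implementation. Exhaustive enumeration is only feasible on small graphs, since the number of walks grows quickly with depth.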
  • 15. Random Walks RDF2vec V*S Walks V Vectors
  • 16. Entity Embedding • dbr:Berlin • dbr:Tokyo • dbr:Moscow • dbr:Vienna • dbr:Japan • dbr:Russia • dbr:Germany • dbr:Austria dbr:Tokyo = [f1, f2, f3, …, fn] dbr:Japan = [f1, f2, f3, …, fn]
  • 18. WL Kernel RDF2vec • Construct sequences using random walks with depth d after each iteration for each entity in the graph • Graph G sequences after 1 iteration: – 1->6->11; 1->6->11->13; 1->6->11->10 … – 4->11->6; 4->11->13; 4->11->10; 4->11->10->8 … – … de Vries, Gerben KD. "A fast approximation of the Weisfeiler-Lehman graph kernel for RDF data." ECML, 2013.
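The relabeling idea behind the Weisfeiler-Lehman sequences can be sketched as follows. This is plain WL relabeling over outgoing neighbours on a toy graph, assumed here for illustration only; it is not the RDF-specific approximation by de Vries cited on the slide, which also accounts for edge labels. The function `wl_relabel`, the toy graph, and the label naming scheme are all hypothetical.

```python
def wl_relabel(labels, neighbors, iteration):
    """One Weisfeiler-Lehman step.

    Each node's new label is a compressed name for the pair
    (its old label, the sorted old labels of its neighbours).
    """
    compressed = {}   # signature -> short new label
    new_labels = {}
    for node in sorted(labels):
        signature = (
            labels[node],
            tuple(sorted(labels[m] for m in neighbors[node])),
        )
        if signature not in compressed:
            compressed[signature] = f"wl{iteration}_{len(compressed)}"
        new_labels[node] = compressed[signature]
    return new_labels


# Toy graph: two persons, each linked to a band with a different genre.
neighbors = {"a": ["c"], "b": ["d"], "c": ["e"], "d": ["f"], "e": [], "f": []}
labels0 = {"a": "Person", "b": "Person", "c": "Band", "d": "Band",
           "e": "Rock", "f": "Jazz"}

labels1 = wl_relabel(labels0, neighbors, iteration=1)
labels2 = wl_relabel(labels1, neighbors, iteration=2)
print(labels1["a"] == labels1["b"])  # persons still look alike after 1 step
print(labels2["a"] == labels2["b"])  # they differ once genres propagate
```

Walks extracted over the relabeled graph after each iteration then carry progressively more neighbourhood structure in every token.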
  • 19. WL Kernels RDF2vec V*S*I sequences V Vectors
  • 21. Evaluation Setup • Datasets – 3 domain-specific RDF datasets – 2 large cross-domain RDF datasets with 5 evaluation datasets • Tasks – Classification: Naive Bayes, k-Nearest Neighbors (k=3), C4.5 decision tree and Support Vector Machines. – Regression: Linear Regression, M5Rules, and k-Nearest Neighbors (k=3). • Baselines – Features derived from incoming and outgoing relations and values – Features derived from graph substructures: WL and Walk-Count Kernels
  • 22. Domain Specific RDF Datasets
      • Datasets
        Dataset   Task      #statements   #instances   #walks   depth   #sequences (walks)   WL iter.   WL depth   #sequences (WL)
        AIFB      C (c=4)   30K           176          all      10      360K                 4          2          346K
        BGS       C (c=2)   600K          146          all      10      2.4M                 4          2          5.3M
        MUTAG     C (c=2)   80K           340          all      10      168K                 4          2          908K
      • Results (accuracy) – Best scores per dataset
        Dataset   Baseline   Walks2vec   WL2vec (SG 500)
        AIFB      92.68      89.55       93.41
        BGS       91.05      78.10       96.18
        MUTAG     94.29      82.06       96.33
  • 23. Large Cross-Domain RDF Datasets
      • Datasets
        Dataset    #instances   depth   #sequences   Vector size   Model
        DBpedia    5M           4/8     2.5B         200/500       CBOW/SG
        Wikidata   17M          4       8.5B         200/500       CBOW/SG
      • Evaluation datasets
        Dataset             #Instances   ML Task     Original Source
        Cities              212          R/C (c=3)   Mercer
        Metacritic Albums   1,600        R/C (c=3)   Metacritic
        Metacritic Movies   2,000        R/C (c=3)   Metacritic
        AAUP                960          R/C (c=3)   JSE
        Forbes              1,585        R/C (c=3)   Forbes
  • 24. Results: classification • Accuracy Results – Best scores only
      Model               Cities   Movies   Albums   AAUP    Forbes
      Best Baseline       75.13    79.30    77.94    93.44   76.75
      DB2vec CBOW 200 8   77.39    83.65    78.44    92.23   88.30
      DB2vec CBOW 500 8   76.84    83.25    77.25    90.61   89.86
      DB2vec SG 200 8     78.92    83.30    79.72    91.04   90.10
      DB2vec SG 500 8     89.73    82.80    78.20    94.48   88.53
      WD2vec CBOW 200 4   75.56    52.20    51.44    90.18   81.08
      WD2vec CBOW 500 4   85.56    51.04    53.28    89.74   80.74
      WD2vec SG 200 4     75.48    75.39    64.76    90.50   81.17
      WD2vec SG 500 4     83.20    76.30    63.42    90.60   81.17
  • 25. Results: regression • RMSE Results – Best scores only
      Model               Cities   Movies   Albums   AAUP   Forbes
      Best Baseline       17.04    19.19    12.81    6.16   18.32
      DB2vec CBOW 200 8   12.55    15.90    11.79    6.47   17.43
      DB2vec CBOW 500 8   12.54    15.81    11.30    6.54   17.62
      DB2vec SG 200 8     12.85    15.12    10.90    6.22   17.85
      DB2vec SG 500 8     10.19    15.45    10.89    6.26   16.61
      WD2vec CBOW 200 4   17.52    23.39    14.55    6.60   21.77
      WD2vec CBOW 500 4   18.33    22.18    14.00    6.08   21.92
      WD2vec SG 200 4     18.69    19.10    13.51    6.52   21.59
      WD2vec SG 500 4     19.23    19.19    13.23    6.05   21.58
  • 26. Results Summary • RDF2vec outperforms all the baseline approaches – Smaller feature vectors: more efficient training than the baseline approaches • WL kernel sequences capture the graph structure better than walks – Not efficient on large graphs – Large number of sequences produced, so not scalable • Increasing the depth of the paths increases the quality of the embeddings • The vector dimensionality doesn’t affect the performance • Skip-Gram models consistently outperform CBOW models • DBpedia produces higher quality embeddings than Wikidata
  • 27. Other Use-Cases • Recommender systems • Document modeling – Document similarity – Entity relatedness • Alignment of knowledge bases – DBpedia and Wikidata • Knowledge base relation prediction and error detection • Linking text and semi-structured knowledge to knowledge bases
  • 28. Conclusion • RDF2Vec: an approach for learning latent numerical representations of entities in RDF graphs • Preserves the graph information • Compatible with all the traditional machine learning algorithms • More efficient training of ML models • Task and dataset independent approach • Download the code and the models: http://data.dws.informatik.uni-mannheim.de/rdf2vec/

Editor's Notes

  • #3: The figure gives an overview of a general LOD-enabled knowledge discovery process. Given a set of local data (such as a relational database), the first step is to link the data to the corresponding LOD concepts from the chosen LOD dataset. After the links are set, outgoing links to external LOD datasets can be explored. In the next step, various techniques for data consolidation and cleansing are applied. Next, the collected data needs to be transformed so that it can be processed by arbitrary data analysis algorithms. After the data transformation is done, a suitable data mining algorithm is applied to the data. In the final step, the results of the data mining process are presented to the user.
  • #9: The skip-gram softmax is essentially logistic regression. It is an unsupervised method.
  • #10: The input layer receives the context words, and the output layer tries to predict the target word. For each input word, its vector is retrieved from the input→hidden matrix; these vectors are averaged, and the average forms the hidden layer. This implies that the link (activation) function of the hidden layer units is simply linear (i.e., it passes the weighted sum of its inputs directly to the next layer). Using the hidden→output weights, we can compute a score uj for each word in the vocabulary. The objective of the model is to maximize the average log probability, where the posterior probability is defined using the softmax function. With the softmax function we calculate the probability of the target given the context, compare it with the true value (1 or 0) to obtain the error, and backpropagate that error through the network using gradient descent. We take the input vectors as the embeddings, but one can also take the output vectors or the sum of both; empirical evaluation has shown that concatenation works better.
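The forward pass described in this note can be sketched in a few lines. This is a minimal illustration with hand-picked weights and plain Python lists; the names `w_in`, `w_out`, `cbow_forward`, and the tiny three-word vocabulary are assumptions, and no training step is shown.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cbow_forward(context_ids, w_in, w_out):
    """CBOW forward pass.

    w_in[i]  : input vector of word i (row of the input->hidden matrix)
    w_out[j] : output vector of word j (hidden->output weights)
    """
    dim = len(w_in[0])
    # Hidden layer: plain average of the context words' input vectors.
    h = [sum(w_in[i][k] for i in context_ids) / len(context_ids)
         for k in range(dim)]
    # Score u_j for every vocabulary word, then softmax over the scores.
    scores = [sum(h[k] * w_out[j][k] for k in range(dim))
              for j in range(len(w_out))]
    return h, softmax(scores)


# Hand-picked toy weights for a 3-word vocabulary and 2 hidden units.
w_in = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w_out = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]

h, probs = cbow_forward([0, 1], w_in, w_out)
loss = -math.log(probs[0])   # negative log probability of target word 0
print(h, probs, loss)
```

Training would adjust `w_in` and `w_out` by gradient descent on this loss; that step is omitted here.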
  • #11: If we represent words in a low-dimensional feature space, we expect semantically similar words to be close to each other.
  • #12: The skip-gram softmax is essentially logistic regression. It is an unsupervised method.
  • #13: The target word is now at the input layer, and the context words are at the output layer. The input is a one-hot encoded vector, which means the hidden layer h simply copies a row of the input→hidden weight matrix. The objective of the Skip-gram model is to maximize the average log probability, where vw and v′w are the “input” and “output” vector representations of w, and W is the number of words in the vocabulary. This formulation is impractical because the cost of computing ∇ log p(wO|wI) is proportional to W, which is often large (10^5–10^7 terms). Hierarchical softmax and negative sampling are cheaper alternatives. This inversion might seem like an arbitrary choice, but statistically it has the effect that CBOW smooths over a lot of the distributional information (by treating an entire context as one observation). For the most part, this turns out to be useful for smaller datasets. However, skip-gram treats each context-target pair as a new observation, and this tends to do better when we have larger datasets. We take the input vectors as the embeddings, but one can also take the output vectors or the sum of both; empirical evaluation has shown that concatenation works better.
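The note refers to the average log probability and the softmax without writing them out. For reference, they can be stated as in Mikolov et al. [1]; the symbols T (length of the training sequence), c (context size), k (number of negative samples), and P_n(w) (noise distribution) come from that paper and do not appear in the note itself.

```latex
% Skip-gram objective over a training sequence w_1, ..., w_T with context size c
\frac{1}{T} \sum_{t=1}^{T} \sum_{-c \le j \le c,\; j \ne 0} \log p(w_{t+j} \mid w_t)

% Full softmax over a vocabulary of W words
p(w_O \mid w_I) =
  \frac{\exp\left({v'_{w_O}}^{\top} v_{w_I}\right)}
       {\sum_{w=1}^{W} \exp\left({v'_{w}}^{\top} v_{w_I}\right)}

% Negative sampling replaces each \log p(w_O \mid w_I) term with
\log \sigma\left({v'_{w_O}}^{\top} v_{w_I}\right)
  + \sum_{i=1}^{k} \mathbb{E}_{w_i \sim P_n(w)}
    \left[ \log \sigma\left(-{v'_{w_i}}^{\top} v_{w_I}\right) \right]
```

The denominator of the softmax sums over all W words, which is the cost the note calls impractical; negative sampling only touches the target word and k sampled words per update.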
  • #17: If we represent words in a low-dimensional feature space, we expect semantically similar words to be close to each other.