A Comparison of Propositionalization Strategies 
for Creating Features from Linked Open Data 
9/29/2014 Petar Ristoski, Heiko Paulheim
Motivation 
• Many existing applications use LOD as background knowledge 
in data mining 
– Explaining data patterns and statistics: unemployment rate, 
inflation, energy savings, etc. 
– Content-based book and movie recommender systems 
– Classifying incident-related tweets 
– Gene classification 
– Prediction of car fuel consumption 
[Figure: pipeline from local data and LOD through the steps link, combine, cleanse, transform, analyze]
• Standard data mining algorithms require propositional feature 
vector representation 
• Feature space: V={v1,v2,…, vn} 
• Each instance is represented as an n-dimensional feature vector 
(v1,v2,…,vn), where for each i with 1 ≤ i ≤ n: 
– vi ∈ {true, false}, or vi ∈ {1,0} 
– vi ∈ ℝ 
– vi ∈ S, where S is a finite set of symbols 
Name | Person | Music Artist | Instrument | Genre
Trent Reznor | 1 | 1 | 1 | 0
Wolfgang A. Mozart | 1 | 1 | 1 | 1
Barack Obama | 1 | 0 | 0 | 0
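The table above can be read as a propositional feature vector per instance. The following is a minimal sketch of that mapping, assuming each instance is described by a set of attribute labels; the names `FEATURES`, `facts`, and `to_vector` are hypothetical and only illustrate the idea.

```python
# Feature space V, taken from the column headers of the example table.
FEATURES = ["Person", "Music Artist", "Instrument", "Genre"]

# Assumed input: for each instance, the set of attributes that hold for it
# (copied from the example table, not queried from a real LOD source).
facts = {
    "Trent Reznor": {"Person", "Music Artist", "Instrument"},
    "Wolfgang A. Mozart": {"Person", "Music Artist", "Instrument", "Genre"},
    "Barack Obama": {"Person"},
}

def to_vector(attributes, feature_space=FEATURES):
    """Map a set of attributes to a fixed-length 0/1 vector over the feature space."""
    return [1 if f in attributes else 0 for f in feature_space]

vectors = {name: to_vector(attrs) for name, attrs in facts.items()}
```

Every instance ends up with a vector of the same length n, which is what standard data mining algorithms expect.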
Related Work 
• LiDDM (Narasimha et al.) 
– A framework for Linked Data mining that captures data from the LOD 
cloud to extract hidden information 
• SPARQL-ML (Kiefer et al.) 
– An extension of SPARQL that supports data mining tasks for knowledge 
discovery in the Semantic Web 
• FeGeLOD (Paulheim et al.) 
– Unsupervised generation of data mining features from LOD 
• The resulting features are binary, or numerical aggregates using 
SPARQL COUNT constructs 
• No proper evaluation of the propositionalization strategy used 
PROPOSED STRATEGIES 
Strategies 
• Strategies for features derived from specific relations 
– r rdf:type C 
– r dcterms:subject S 
• Strategies for features derived from generic relations 
Strategies for Features Derived from Specific Relations 
1. Binary feature: 
– vi = 1 if C(r) 
– vi = 0 if ¬C(r) 
2. Relative count feature: 
– $v_i = \frac{1}{n}$, where r has relation to n objects 
3. TF-IDF feature: 
– $v_i = \frac{1}{n} \log \frac{N}{|\{r \mid C(r)\}|}$, where N is the total number of resources in the dataset, 
and $|\{r \mid C(r)\}|$ denotes the number of resources for which the specific 
relation to an object C exists 
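The three strategies for specific relations can be sketched as follows. This is an illustrative sketch, not the implementation behind the slides: the function name, the input layout (a dict from resource to its set of classes), the toy data, and the base-2 logarithm are all assumptions, since the slides do not state a log base.

```python
import math

def specific_relation_features(types_by_resource, log=math.log2):
    """Return binary, relative-count, and TF-IDF vectors per resource.

    types_by_resource maps a resource r to the set of classes C with C(r).
    The logarithm base is an assumption; pass another function to change it.
    """
    classes = sorted({c for ts in types_by_resource.values() for c in ts})
    N = len(types_by_resource)  # total number of resources in the dataset
    # df[c] = |{r | C(r)}|: number of resources that have class c
    df = {c: sum(c in ts for ts in types_by_resource.values()) for c in classes}

    features = {}
    for r, ts in types_by_resource.items():
        n = len(ts)  # r has the relation to n objects
        features[r] = {
            "binary": [1 if c in ts else 0 for c in classes],
            "relative": [1 / n if c in ts else 0.0 for c in classes],
            "tfidf": [(1 / n) * log(N / df[c]) if c in ts else 0.0 for c in classes],
        }
    return classes, features

# Hypothetical toy dataset with three resources.
toy = {
    "a": {"Person", "Artist"},
    "b": {"Person"},
    "c": {"Person", "Artist", "MilitaryPerson"},
}
classes, features = specific_relation_features(toy)
```

In this toy data every resource is a Person, so the TF-IDF weight of that class is zero for all of them, while the rarer classes keep a positive weight.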
Features Derived from Specific Relations: Binary vs TF-IDF 
[Figure: the types dbpedia:Person, dbpedia:Artist, dbpedia:MusicArtist, and dbpedia:MilitaryPerson, with the instances dbpedia:Kris_Kristofferson and dbpedia:Elvis_Presley plus 10 more music artists]
Binary: 
Name | Person | Artist | Music Artist | Military Person
Elvis Presley | 1 | 1 | 1 | 1
Kris Kristofferson | 1 | 1 | 1 | 1
Artist X | 1 | 1 | 1 | 0
TF-IDF: 
Name | Person | Artist | Music Artist | Military Person
Elvis Presley | 0 | 0 | 0 | 0.672
Kris Kristofferson | 0 | 0 | 0 | 0.672
Artist X | 0 | 0 | 0 | 0
Strategies 
• Strategies for features derived from specific relations 
– r rdf:type C → C(r) 
– r dcterms:subject S 
• Strategies for features derived from generic relations 
– describe how resource r is related to resource r′ 
– outgoing relation: p(r, r′) 
– incoming relation: p(r′, r) 
Strategies for Features Derived from Generic Relations 
1. Binary feature: 
– vi = 1 if p(r, r′) 
– vi = 0 if ¬p(r, r′) 
2. Count feature: 
– vi = n, where r is connected to n resources with relation p 
3. Relative count feature: 
– $v_i = \frac{n_p}{P}$, where P is the total number of outgoing relations for r, and $n_p$ is 
the number of relations of type p for r 
4. TF-IDF feature: 
– $v_i = \frac{n_p}{P} \log \frac{N}{|\{r \mid \exists r' : p(r, r')\}|}$, where N is the total number of resources in the 
dataset, and $|\{r \mid \exists r' : p(r, r')\}|$ denotes the number of resources for which 
p(r, r′) exists 
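The four strategies for generic relations can be sketched in the same way. The sketch below is an assumption-laden illustration: the function name and input layout (a dict from resource to per-relation counts) are hypothetical, the log base is assumed to be 2, and the counts 8, 5, 1, and 68 are taken from the deck's DBpedia example as best it can be read from the extracted text.

```python
import math

def generic_relation_features(counts_by_resource, log=math.log2):
    """counts_by_resource maps a resource r to a dict {relation p: n_p}."""
    # Only relations that occur at least once enter the feature space.
    props = sorted(
        {p for c in counts_by_resource.values() for p, n in c.items() if n > 0}
    )
    N = len(counts_by_resource)  # total number of resources in the dataset
    # df[p] = |{r | exists r': p(r, r')}|
    df = {p: sum(1 for c in counts_by_resource.values() if c.get(p, 0) > 0)
          for p in props}

    out = {}
    for r, counts in counts_by_resource.items():
        P = sum(counts.values())  # total number of relations of r
        row = {}
        for p in props:
            n_p = counts.get(p, 0)
            relative = n_p / P if P else 0.0
            row[p] = {
                "binary": 1 if n_p > 0 else 0,
                "count": n_p,
                "relative": relative,
                "tfidf": relative * log(N / df[p]),
            }
        out[r] = row
    return out

# Relation counts as read from the deck's example (an assumed reading).
example = {
    "Chester_Bennington": {"instrument": 8},
    "Anthony_Kiedis": {"instrument": 5, "author": 1},
    "Jules_Verne": {"author": 68},
}
feats = generic_relation_features(example)
```

For Anthony Kiedis the binary strategy yields 1 for both relations, whereas the relative count yields 5/6 and 1/6, so the weaker relation is no longer treated as equally important.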
Features Derived from Generic Relations: Binary vs Relative Count 
[Figure: relation counts per resource. dbpedia:Chester_Bennington: 8 × dbpedia:instrument; dbpedia:Anthony_Kiedis: 5 × dbpedia:instrument and 1 × dbpedia:author; dbpedia:Jules_Verne: 68 × dbpedia:author]
Binary: 
Name | instrument | author
Chester Bennington | 1 | 0
Anthony Kiedis | 1 | 1
Jules Verne | 0 | 1
Relative Count: 
Name | instrument | author
Chester Bennington | 1 | 0
Anthony Kiedis | 0.833 | 0.166
Jules Verne | 0 | 1
EVALUATION 
Evaluation 
• Comparative evaluation of the propositionalization strategies on 
three data mining tasks 
– Classification 
– Regression 
– Outlier Detection 
• Evaluated on six datasets, on five feature sets 
– types 
– categories 
– incoming relations 
– outgoing relations 
– incoming and outgoing relations 
Evaluation: Classification 
• Datasets: 
Dataset | # instances | # types | # categories | # rel in | # rel out | # rel in & rel out
Sports Tweets | 5,054 | 7,814 | 14,025 | 3,574 | 5,334 | 8,908
Cities | 212 | 721 | 999 | 1,304 | 1,081 | 2,385
• Methods: 
– Naïve Bayes 
– k-Nearest Neighbors (k=3) 
– C4.5 decision trees 
• Metrics for performance evaluation 
– Accuracy 
• Results calculated using stratified 10-fold cross validation 
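The evaluation protocol above (stratified 10-fold cross validation, accuracy as the metric) can be sketched in plain Python. This is only an illustration under stated assumptions: the helper names are hypothetical, a simple 3-nearest-neighbor vote stands in for the classifiers listed, accuracy is pooled over all folds, and the toy data is invented; it is not the tooling used for the reported results.

```python
import random
from collections import Counter, defaultdict

def stratified_folds(labels, k=10, seed=0):
    """Assign each index to one of k folds so class proportions stay similar."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    offset = 0
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Deal the members of each class round-robin across the folds.
        for j, i in enumerate(idxs):
            folds[(offset + j) % k].append(i)
        offset += len(idxs)
    return folds

def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training vectors (squared Euclidean)."""
    nearest = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

def cross_val_accuracy(X, y, k_folds=10, k_neighbors=3, seed=0):
    """Accuracy pooled over all held-out folds."""
    correct = 0
    for fold in stratified_folds(y, k_folds, seed):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        tX, ty = [X[i] for i in train], [y[i] for i in train]
        correct += sum(knn_predict(tX, ty, X[i], k_neighbors) == y[i] for i in fold)
    return correct / len(X)

# Invented, perfectly separable toy data: 10 instances per class.
X = [[1, 0]] * 10 + [[0, 1]] * 10
y = ["a"] * 10 + ["b"] * 10
folds = stratified_folds(y)
acc = cross_val_accuracy(X, y)
```

Stratification matters most for small datasets such as Cities, where a purely random split could leave a fold without any instance of a class.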
Results: accuracy (%) 
Features | Representation | Cities NB | Cities k-NN | Cities C4.5 | Cities Avg. | Sports Tweets NB | Sports Tweets k-NN | Sports Tweets C4.5 | Sports Tweets Avg.
types | Binary | 55.71 | 56.17 | 59.05 | 56.98 | 81 | 82.9 | 82.95 | 82.28
types | Relative Count | 57.1 | 49.61 | 55.22 | 53.98 | 80.96 | 81.44 | 81.88 | 81.43
types | TF-IDF | 57.1 | 48.7 | 54.7 | 53.50 | 82.13 | 82.47 | 82.64 | 82.41
categories | Binary | 55.74 | 49.98 | 56.17 | 53.96 | 82.26 | 76.56 | 71.98 | 76.93
categories | Relative Count | 59.52 | 44.35 | 58.96 | 54.28 | 90.76 | 84.09 | 80.86 | 85.24
categories | TF-IDF | 55.74 | 49.98 | 57.08 | 54.27 | 89.65 | 81.98 | 81.68 | 84.44
rel in | Binary | 60.41 | 58.46 | 60.35 | 59.74 | 83.12 | 83.63 | 84.65 | 83.80
rel in | Count | 56.69 | 31.1 | 59.37 | 49.05 | 83.25 | 85.11 | 85.4 | 84.59
rel in | Relative Count | 49.16 | 38.23 | 58.55 | 48.65 | 69.51 | 84.63 | 85.17 | 79.77
rel in | TF-IDF | 34.94 | 38.2 | 54.26 | 42.47 | 72.66 | 84.65 | 84.99 | 80.77
rel out | Binary | 47.62 | 60 | 56.71 | 54.78 | 80.67 | 82.36 | 84.41 | 82.48
rel out | Count | 49.96 | 55.24 | 58.59 | 54.60 | 79.95 | 83.35 | 85.01 | 82.77
rel out | Relative Count | 48.07 | 58.44 | 56.65 | 54.39 | 62.13 | 84.27 | 83.51 | 76.64
rel out | TF-IDF | 40.15 | 54.78 | 58.51 | 51.15 | 69.91 | 84.4 | 84.16 | 79.49
rel in & out | Binary | 59.44 | 58.57 | 56.47 | 58.16 | 86.17 | 85.14 | 86.45 | 85.92
rel in & out | Count | 56.13 | 54.26 | 60.82 | 57.07 | 86.03 | 86.01 | 87.16 | 86.40
rel in & out | Relative Count | 57.68 | 47.14 | 56.56 | 53.79 | 70.01 | 84.51 | 87.22 | 80.58
rel in & out | TF-IDF | 40.17 | 46.21 | 58.46 | 48.28 | 75.15 | 84.86 | 86.19 | 82.07
Evaluation: Regression 
• Datasets: 
Dataset | # instances | # types | # categories | # rel in | # rel out | # rel in & rel out
Auto MPG | 391 | 264 | 308 | 227 | 370 | 597
Cities | 212 | 721 | 999 | 1,304 | 1,081 | 2,385
• Methods: 
– Linear Regression 
– M5Rules 
– k-Nearest Neighbors (k=3) 
• Metrics for performance evaluation 
– RMSE 
• Results calculated using stratified 10-fold cross validation 
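For reference, RMSE is the square root of the mean squared difference between predicted and true values, so lower is better. A minimal sketch, with a hypothetical helper name and invented numbers:

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between two equally long sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Invented example: every prediction is off by exactly 2.
error = rmse([0, 0, 0, 0], [2, 2, 2, 2])
```

Because the errors are squared before averaging, a few large misses dominate the score, which is worth keeping in mind when comparing averages across learners.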
Results: RMSE 
Features | Representation | Auto MPG LR | Auto MPG M5 | Auto MPG k-NN | Auto MPG Avg. | Cities LR | Cities M5 | Cities k-NN | Cities Avg.
types | Binary | 3.952 | 3.056 | 3.63 | 3.546 | 24.303 | 18.793 | 22.164 | 21.753
types | Relative Count | 3.843 | 2.952 | 3.571 | 3.455 | 18.046 | 19.696 | 33.569 | 23.770
types | TF-IDF | 3.864 | 2.964 | 3.571 | 3.466 | 17.852 | 18.773 | 22.396 | 19.674
categories | Binary | 3.698 | 2.9 | 3.62 | 3.409 | 18.884 | 22.323 | 22.677 | 21.295
categories | Relative Count | 3.747 | 3 | 3.57 | 3.430 | 18.952 | 19.98 | 34.489 | 24.474
categories | TF-IDF | 3.782 | 2.9 | 3.56 | 3.416 | 19.02 | 22.323 | 23.189 | 21.511
rel in | Binary | 3.849 | 2.9 | 3.61 | 3.444 | 49.866 | 19.205 | 18.532 | 29.201
rel in | Count | 3.892 | 3 | 4.62 | 3.824 | 138.041 | 19.915 | 19.27 | 59.075
rel in | Relative Count | 3.976 | 2.9 | 3.57 | 3.488 | 122.365 | 22.335 | 18.877 | 54.526
rel in | TF-IDF | 4.109 | 2.8 | 3.57 | 3.508 | 122.921 | 21.947 | 18.568 | 54.479
rel out | Binary | 3.792 | 3.1 | 3.6 | 3.490 | 20.008 | 19.364 | 20.918 | 20.097
rel out | Count | 4.072 | 3 | 4.15 | 3.734 | 36.317 | 19.459 | 23.994 | 26.590
rel out | Relative Count | 4.095 | 2.9 | 3.57 | 3.536 | 43.22 | 21.961 | 21.472 | 28.884
rel out | TF-IDF | 4.135 | 3 | 3.57 | 3.572 | 28.845 | 20.852 | 22.212 | 23.970
rel in & out | Binary | 3.991 | 3.1 | 3.67 | 3.572 | 40.803 | 18.803 | 18.211 | 25.939
rel in & out | Count | 3.991 | 3.1 | 4.54 | 3.870 | 107.259 | 19.528 | 18.906 | 48.564
rel in & out | Relative Count | 3.922 | 3 | 3.57 | 3.493 | 103.102 | 22.091 | 19.608 | 48.267
rel in & out | TF-IDF | 3.982 | 3.01 | 3.57 | 3.523 | 115.373 | 20.623 | 19.702 | 51.899
Evaluation: Outlier Detection 
• Datasets: 
Dataset | # instances | # types | # rel in | # rel out | # rel in & rel out
DBpedia-Peel | 2,083 | 39 | 586 | 322 | 908
DBpedia-DBTropes | 4,228 | 128 | 912 | 2,155 | 3,067
• Methods: 
– k-NN Global Anomaly Score – GAS (k=25) 
– Local Outlier Factor – LOF (10<k<50) 
– Local Outlier Probability – LoOP (k=25) 
• Metrics for performance evaluation 
– Area under the ROC curve (AUC) 
• Results calculated on partial gold standard of 100 links 
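To make the setup above concrete, here is a small sketch of one outlier score and of the AUC metric. Both functions are illustrative assumptions: the global k-NN score is taken here as the mean distance to the k nearest neighbors (one common reading of GAS), AUC is computed as the fraction of outlier/inlier pairs ranked correctly, and the points and labels are invented.

```python
import math

def knn_global_anomaly_score(X, k):
    """Mean Euclidean distance from each point to its k nearest neighbors."""
    if not 0 < k < len(X):
        raise ValueError("k must be between 1 and len(X) - 1")
    scores = []
    for i, x in enumerate(X):
        dists = sorted(math.dist(x, other) for j, other in enumerate(X) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores

def roc_auc(scores, is_outlier):
    """Probability that a random outlier scores higher than a random inlier."""
    pos = [s for s, o in zip(scores, is_outlier) if o]
    neg = [s for s, o in zip(scores, is_outlier) if not o]
    if not pos or not neg:
        raise ValueError("need at least one outlier and one inlier")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Invented toy data: four clustered points and one far-away point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
labels = [False, False, False, False, True]
scores = knn_global_anomaly_score(points, k=2)
auc = roc_auc(scores, labels)
```

An AUC of 0.5 corresponds to a random ranking, so values below 0.5 in a results table mean the score ranks known outliers lower than regular instances.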
Results: AUC 
Features | Representation | DBpedia-Peel GAS | DBpedia-Peel LOF | DBpedia-Peel LoOP | DBpedia-Peel Avg. | DBpedia-DBTropes GAS | DBpedia-DBTropes LOF | DBpedia-DBTropes LoOP | DBpedia-DBTropes Avg.
types | Binary | 0.386 | 0.487 | 0.554 | 0.476 | 0.503 | 0.627 | 0.605 | 0.578
types | Relative Count | 0.385 | 0.398 | 0.595 | 0.459 | 0.503 | 0.385 | 0.314 | 0.401
types | TF-IDF | 0.386 | 0.504 | 0.602 | 0.497 | 0.503 | 0.672 | 0.417 | 0.531
rel in | Binary | 0.169 | 0.367 | 0.289 | 0.275 | 0.426 | 0.52 | 0.45 | 0.465
rel in | Count | 0.2 | 0.285 | 0.29 | 0.258 | 0.503 | 0.59 | 0.602 | 0.565
rel in | Relative Count | 0.293 | 0.496 | 0.452 | 0.414 | 0.589 | 0.555 | 0.493 | 0.546
rel in | TF-IDF | 0.14 | 0.354 | 0.317 | 0.270 | 0.509 | 0.519 | 0.568 | 0.532
rel out | Binary | 0.25 | 0.195 | 0.207 | 0.217 | 0.325 | 0.438 | 0.432 | 0.398
rel out | Count | 0.539 | 0.455 | 0.391 | 0.462 | 0.547 | 0.578 | 0.522 | 0.549
rel out | Relative Count | 0.542 | 0.544 | 0.391 | 0.492 | 0.618 | 0.601 | 0.513 | 0.577
rel out | TF-IDF | 0.116 | 0.396 | 0.24 | 0.251 | 0.322 | 0.629 | 0.472 | 0.474
rel in & out | Binary | 0.324 | 0.431 | 0.51 | 0.422 | 0.352 | 0.439 | 0.396 | 0.396
rel in & out | Count | 0.527 | 0.368 | 0.454 | 0.450 | 0.57 | 0.563 | 0.527 | 0.553
rel in & out | Relative Count | 0.603 | 0.744 | 0.616 | 0.654 | 0.667 | 0.672 | 0.657 | 0.665
rel in & out | TF-IDF | 0.202 | 0.667 | 0.484 | 0.451 | 0.481 | 0.462 | 0.5 | 0.481
Conclusion 
• The chosen propositionalization strategy matters 
• No general recommendation for a strategy 
– What is the data mining task? 
– What are the characteristics of the dataset? 
– Which algorithm is going to be used? 
Future Work 
• Conduct further experiments on more feature sets 
– Qualified incoming and outgoing relations 
– Combine features from multiple LOD sources 
• Conduct experiments on more data mining tasks 
– Clustering, recommender systems, etc. 
• More sophisticated strategies 
– Combination of statistical and semantic measures 
– Adaptation of weighting strategies used in text mining to overcome 
problems with erroneous data 
• Use the statistical measures for feature selection 
RapidMiner LOD Extension 
• Simple wiring of operators 
– Importing 
– Linking 
– Feature generation 
– Data consolidation 
– Feature selection 
– Visualization 
[Figure: pipeline from local data and LOD through the steps link, combine, cleanse, transform, analyze]
[Figure: extension components: Linking, Data Enrichment, Schema Matching & Data Fusion, Feature Selection, Data Analysis]
RapidMiner LOD Extension 
• Simple wiring of operators 
– Importing 
– Linking 
– Feature generation 
– Data fusion 
– Feature selection 
– Visualization 
• Try it out! 
– find “Linked Open Data” on the marketplace 
– Google Group: groups.google.com/forum/#!forum/rmlod 
9/29/2014 Ristoski, Paulheim 31
32 
A Comparison of Propositionalization Strategies 
for Creating Features from Linked Open Data 
9/29/2014 Petar Ristoski, Heiko Paulheim
Ad

More Related Content

Similar to A Comparison of Propositionalization Strategies for Creating Features from Linked Open Data (20)

DS2014: Feature selection in hierarchical feature spaces
DS2014: Feature selection in hierarchical feature spacesDS2014: Feature selection in hierarchical feature spaces
DS2014: Feature selection in hierarchical feature spaces
Petar Ristoski
 
Efficient blocking method for a large scale citation matching
Efficient blocking method for a large scale citation matchingEfficient blocking method for a large scale citation matching
Efficient blocking method for a large scale citation matching
Mateusz Fedoryszak
 
Recommender Systems and Linked Open Data
Recommender Systems and Linked Open DataRecommender Systems and Linked Open Data
Recommender Systems and Linked Open Data
Polytechnic University of Bari
 
Recommender Systems, Matrices and Graphs
Recommender Systems, Matrices and GraphsRecommender Systems, Matrices and Graphs
Recommender Systems, Matrices and Graphs
Roelof Pieters
 
Semantic Search and Result Presentation with Entity Cards
Semantic Search and Result Presentation with Entity CardsSemantic Search and Result Presentation with Entity Cards
Semantic Search and Result Presentation with Entity Cards
Faegheh Hasibi
 
Training in Analytics, R and Social Media Analytics
Training in Analytics, R and Social Media AnalyticsTraining in Analytics, R and Social Media Analytics
Training in Analytics, R and Social Media Analytics
Ajay Ohri
 
Visualising Multi-objective Data: From League Tables to Optimisers, and back
Visualising Multi-objective Data: From League Tables to Optimisers, and backVisualising Multi-objective Data: From League Tables to Optimisers, and back
Visualising Multi-objective Data: From League Tables to Optimisers, and back
djw213
 
Optimizing Set-Similarity Join and Search with Different Prefix Schemes
Optimizing Set-Similarity Join and Search with Different Prefix SchemesOptimizing Set-Similarity Join and Search with Different Prefix Schemes
Optimizing Set-Similarity Join and Search with Different Prefix Schemes
HPCC Systems
 
Focused Exploration of Geospatial Context on Linked Open Data
Focused Exploration of Geospatial Context on Linked Open DataFocused Exploration of Geospatial Context on Linked Open Data
Focused Exploration of Geospatial Context on Linked Open Data
Thomas Gottron
 
Focused Exploration of Geospatial Context on Linked Open Data
Focused Exploration of Geospatial Context on Linked Open DataFocused Exploration of Geospatial Context on Linked Open Data
Focused Exploration of Geospatial Context on Linked Open Data
REVEAL - Social Media Verification
 
Tutorial: Context In Recommender Systems
Tutorial: Context In Recommender SystemsTutorial: Context In Recommender Systems
Tutorial: Context In Recommender Systems
YONG ZHENG
 
Approximating Numeric Role Fillers via Predictive Clustering Trees for Know...
Approximating Numeric Role Fillers via Predictive Clustering Trees  for  Know...Approximating Numeric Role Fillers via Predictive Clustering Trees  for  Know...
Approximating Numeric Role Fillers via Predictive Clustering Trees for Know...
Giuseppe Rizzo
 
InstructionsFor this assignment, collect data exhibiting a relat.docx
InstructionsFor this assignment, collect data exhibiting a relat.docxInstructionsFor this assignment, collect data exhibiting a relat.docx
InstructionsFor this assignment, collect data exhibiting a relat.docx
dirkrplav
 
A Survey of Entity Ranking over RDF Graphs
A Survey of Entity Ranking over RDF GraphsA Survey of Entity Ranking over RDF Graphs
A Survey of Entity Ranking over RDF Graphs
Intelligent Search Systems and Semantic Technologies lab at ITIS KFU
 
Heterogeneous Defect Prediction (

ESEC/FSE 2015)
Heterogeneous Defect Prediction (

ESEC/FSE 2015)Heterogeneous Defect Prediction (

ESEC/FSE 2015)
Heterogeneous Defect Prediction (

ESEC/FSE 2015)
Sung Kim
 
What we got from the Predicting Red Hat Business Value competition
What we got from the Predicting Red Hat Business Value competitionWhat we got from the Predicting Red Hat Business Value competition
What we got from the Predicting Red Hat Business Value competition
Umaporn Kerdsaeng
 
Big Data Competition: maximizing your potential
 exampled with the 2014 Higgs...
Big Data Competition: maximizing your potential
 exampled with the 2014 Higgs...Big Data Competition: maximizing your potential
 exampled with the 2014 Higgs...
Big Data Competition: maximizing your potential
 exampled with the 2014 Higgs...
Cheng Chen
 
Reference Extraction from Wikipedia Infoboxes
Reference Extraction from Wikipedia InfoboxesReference Extraction from Wikipedia Infoboxes
Reference Extraction from Wikipedia Infoboxes
Włodzimierz Lewoniewski
 
Slides: Safeguarding Abila through Multiple Data Perspectives
Slides: Safeguarding Abila through Multiple Data PerspectivesSlides: Safeguarding Abila through Multiple Data Perspectives
Slides: Safeguarding Abila through Multiple Data Perspectives
Parang Saraf
 
MediaEval 2017 Retrieving Diverse Social Images Task (Overview)
MediaEval 2017 Retrieving Diverse Social Images Task (Overview)MediaEval 2017 Retrieving Diverse Social Images Task (Overview)
MediaEval 2017 Retrieving Diverse Social Images Task (Overview)
multimediaeval
 
DS2014: Feature selection in hierarchical feature spaces
DS2014: Feature selection in hierarchical feature spacesDS2014: Feature selection in hierarchical feature spaces
DS2014: Feature selection in hierarchical feature spaces
Petar Ristoski
 
Efficient blocking method for a large scale citation matching
Efficient blocking method for a large scale citation matchingEfficient blocking method for a large scale citation matching
Efficient blocking method for a large scale citation matching
Mateusz Fedoryszak
 
Recommender Systems, Matrices and Graphs
Recommender Systems, Matrices and GraphsRecommender Systems, Matrices and Graphs
Recommender Systems, Matrices and Graphs
Roelof Pieters
 
Semantic Search and Result Presentation with Entity Cards
Semantic Search and Result Presentation with Entity CardsSemantic Search and Result Presentation with Entity Cards
Semantic Search and Result Presentation with Entity Cards
Faegheh Hasibi
 
Training in Analytics, R and Social Media Analytics
Training in Analytics, R and Social Media AnalyticsTraining in Analytics, R and Social Media Analytics
Training in Analytics, R and Social Media Analytics
Ajay Ohri
 
Visualising Multi-objective Data: From League Tables to Optimisers, and back
Visualising Multi-objective Data: From League Tables to Optimisers, and backVisualising Multi-objective Data: From League Tables to Optimisers, and back
Visualising Multi-objective Data: From League Tables to Optimisers, and back
djw213
 
Optimizing Set-Similarity Join and Search with Different Prefix Schemes
Optimizing Set-Similarity Join and Search with Different Prefix SchemesOptimizing Set-Similarity Join and Search with Different Prefix Schemes
Optimizing Set-Similarity Join and Search with Different Prefix Schemes
HPCC Systems
 
Focused Exploration of Geospatial Context on Linked Open Data
Focused Exploration of Geospatial Context on Linked Open DataFocused Exploration of Geospatial Context on Linked Open Data
Focused Exploration of Geospatial Context on Linked Open Data
Thomas Gottron
 
Tutorial: Context In Recommender Systems
Tutorial: Context In Recommender SystemsTutorial: Context In Recommender Systems
Tutorial: Context In Recommender Systems
YONG ZHENG
 
Approximating Numeric Role Fillers via Predictive Clustering Trees for Know...
Approximating Numeric Role Fillers via Predictive Clustering Trees  for  Know...Approximating Numeric Role Fillers via Predictive Clustering Trees  for  Know...
Approximating Numeric Role Fillers via Predictive Clustering Trees for Know...
Giuseppe Rizzo
 
InstructionsFor this assignment, collect data exhibiting a relat.docx
InstructionsFor this assignment, collect data exhibiting a relat.docxInstructionsFor this assignment, collect data exhibiting a relat.docx
InstructionsFor this assignment, collect data exhibiting a relat.docx
dirkrplav
 
Heterogeneous Defect Prediction (

ESEC/FSE 2015)
Heterogeneous Defect Prediction (

ESEC/FSE 2015)Heterogeneous Defect Prediction (

ESEC/FSE 2015)
Heterogeneous Defect Prediction (

ESEC/FSE 2015)
Sung Kim
 
What we got from the Predicting Red Hat Business Value competition
What we got from the Predicting Red Hat Business Value competitionWhat we got from the Predicting Red Hat Business Value competition
What we got from the Predicting Red Hat Business Value competition
Umaporn Kerdsaeng
 
Big Data Competition: maximizing your potential
 exampled with the 2014 Higgs...
Big Data Competition: maximizing your potential
 exampled with the 2014 Higgs...Big Data Competition: maximizing your potential
 exampled with the 2014 Higgs...
Big Data Competition: maximizing your potential
 exampled with the 2014 Higgs...
Cheng Chen
 
Reference Extraction from Wikipedia Infoboxes
Reference Extraction from Wikipedia InfoboxesReference Extraction from Wikipedia Infoboxes
Reference Extraction from Wikipedia Infoboxes
Włodzimierz Lewoniewski
 
Slides: Safeguarding Abila through Multiple Data Perspectives
Slides: Safeguarding Abila through Multiple Data PerspectivesSlides: Safeguarding Abila through Multiple Data Perspectives
Slides: Safeguarding Abila through Multiple Data Perspectives
Parang Saraf
 
MediaEval 2017 Retrieving Diverse Social Images Task (Overview)
MediaEval 2017 Retrieving Diverse Social Images Task (Overview)MediaEval 2017 Retrieving Diverse Social Images Task (Overview)
MediaEval 2017 Retrieving Diverse Social Images Task (Overview)
multimediaeval
 

Recently uploaded (20)

Phytonematodes, Ecology, Biology and Managementpptx
Phytonematodes, Ecology, Biology and ManagementpptxPhytonematodes, Ecology, Biology and Managementpptx
Phytonematodes, Ecology, Biology and Managementpptx
Dr Showkat Ahmad Wani
 
Anti tubercular drug Medicinal Chemistry III
Anti tubercular drug Medicinal Chemistry  IIIAnti tubercular drug Medicinal Chemistry  III
Anti tubercular drug Medicinal Chemistry III
HRUTUJA WAGH
 
Biochemistry Lesson_Molecular Polarity.ppt
Biochemistry Lesson_Molecular Polarity.pptBiochemistry Lesson_Molecular Polarity.ppt
Biochemistry Lesson_Molecular Polarity.ppt
ErPri1
 
Chaos and Psychology: Modeling the Human Mind through Nonlinear Dynamical Sys...
Chaos and Psychology: Modeling the Human Mind through Nonlinear Dynamical Sys...Chaos and Psychology: Modeling the Human Mind through Nonlinear Dynamical Sys...
Chaos and Psychology: Modeling the Human Mind through Nonlinear Dynamical Sys...
Helena Celeste Mata Rico
 
Seismic evidence of liquid water at the base of Mars' upper crust
Seismic evidence of liquid water at the base of Mars' upper crustSeismic evidence of liquid water at the base of Mars' upper crust
Seismic evidence of liquid water at the base of Mars' upper crust
Sérgio Sacani
 
The Link Between Subsurface Rheology and EjectaMobility: The Case of Small Ne...
The Link Between Subsurface Rheology and EjectaMobility: The Case of Small Ne...The Link Between Subsurface Rheology and EjectaMobility: The Case of Small Ne...
The Link Between Subsurface Rheology and EjectaMobility: The Case of Small Ne...
Sérgio Sacani
 
ANTI URINARY TRACK INFECTION AGENT MC III
ANTI URINARY TRACK INFECTION AGENT MC IIIANTI URINARY TRACK INFECTION AGENT MC III
ANTI URINARY TRACK INFECTION AGENT MC III
HRUTUJA WAGH
 
university of arizona ~ favor's college candidate project.pptx
university of arizona ~ favor's college candidate project.pptxuniversity of arizona ~ favor's college candidate project.pptx
university of arizona ~ favor's college candidate project.pptx
favoranamelechi107
 
Electroencephalogram_ wave components_Aignificancr
Electroencephalogram_ wave components_AignificancrElectroencephalogram_ wave components_Aignificancr
Electroencephalogram_ wave components_Aignificancr
klynct
 
Meiosis Notes Slides biology powerpoint.pptx
Meiosis Notes Slides biology powerpoint.pptxMeiosis Notes Slides biology powerpoint.pptx
Meiosis Notes Slides biology powerpoint.pptx
sbates3
 
Antimalarial drug Medicinal Chemistry III
Antimalarial drug Medicinal Chemistry IIIAntimalarial drug Medicinal Chemistry III
Antimalarial drug Medicinal Chemistry III
HRUTUJA WAGH
 
Integration of AI and ML in Biotechnology
Integration of AI and ML in BiotechnologyIntegration of AI and ML in Biotechnology
Integration of AI and ML in Biotechnology
Sourabh Junawa
 
Macrolide and Miscellaneous Antibiotics.ppt
Macrolide and Miscellaneous Antibiotics.pptMacrolide and Miscellaneous Antibiotics.ppt
Macrolide and Miscellaneous Antibiotics.ppt
HRUTUJA WAGH
 
ART.pdf. Agin Tom, clinical Psychology, Prajyoti Niketan College
ART.pdf. Agin Tom, clinical Psychology, Prajyoti Niketan CollegeART.pdf. Agin Tom, clinical Psychology, Prajyoti Niketan College
ART.pdf. Agin Tom, clinical Psychology, Prajyoti Niketan College
Agin Tom
 
Discrete choice experiments: Environmental Improvements to Airthrey Loch Lake...
Discrete choice experiments: Environmental Improvements to Airthrey Loch Lake...Discrete choice experiments: Environmental Improvements to Airthrey Loch Lake...
Discrete choice experiments: Environmental Improvements to Airthrey Loch Lake...
Professional Content Writing's
 
External Application in Homoeopathy- Definition,Scope and Types.
External Application  in Homoeopathy- Definition,Scope and Types.External Application  in Homoeopathy- Definition,Scope and Types.
External Application in Homoeopathy- Definition,Scope and Types.
AdharshnaPatrick
 
Freud e sua Historia na Psicanalise Psic
Freud e sua Historia na Psicanalise PsicFreud e sua Historia na Psicanalise Psic
Freud e sua Historia na Psicanalise Psic
StefannyGoffi1
 
A CASE OF MULTINODULAR GOITRE,clinical presentation and management.pptx
A CASE OF MULTINODULAR GOITRE,clinical presentation and management.pptxA CASE OF MULTINODULAR GOITRE,clinical presentation and management.pptx
A CASE OF MULTINODULAR GOITRE,clinical presentation and management.pptx
ANJALICHANDRASEKARAN
 
SULPHONAMIDES AND SULFONES Medicinal Chemistry III.ppt
SULPHONAMIDES AND SULFONES Medicinal Chemistry III.pptSULPHONAMIDES AND SULFONES Medicinal Chemistry III.ppt
SULPHONAMIDES AND SULFONES Medicinal Chemistry III.ppt
HRUTUJA WAGH
 
Micro-grooved zein macro-whiskers for large-scale proliferation and different...
Micro-grooved zein macro-whiskers for large-scale proliferation and different...Micro-grooved zein macro-whiskers for large-scale proliferation and different...
Micro-grooved zein macro-whiskers for large-scale proliferation and different...
mdokmeci
 
Phytonematodes, Ecology, Biology and Managementpptx
Phytonematodes, Ecology, Biology and ManagementpptxPhytonematodes, Ecology, Biology and Managementpptx
Phytonematodes, Ecology, Biology and Managementpptx
Dr Showkat Ahmad Wani
 
Anti tubercular drug Medicinal Chemistry III
Anti tubercular drug Medicinal Chemistry  IIIAnti tubercular drug Medicinal Chemistry  III
Anti tubercular drug Medicinal Chemistry III
HRUTUJA WAGH
 
Biochemistry Lesson_Molecular Polarity.ppt
Biochemistry Lesson_Molecular Polarity.pptBiochemistry Lesson_Molecular Polarity.ppt
Biochemistry Lesson_Molecular Polarity.ppt
ErPri1
 
Chaos and Psychology: Modeling the Human Mind through Nonlinear Dynamical Sys...
Chaos and Psychology: Modeling the Human Mind through Nonlinear Dynamical Sys...Chaos and Psychology: Modeling the Human Mind through Nonlinear Dynamical Sys...
Chaos and Psychology: Modeling the Human Mind through Nonlinear Dynamical Sys...
Helena Celeste Mata Rico
 
Seismic evidence of liquid water at the base of Mars' upper crust
Seismic evidence of liquid water at the base of Mars' upper crustSeismic evidence of liquid water at the base of Mars' upper crust
Seismic evidence of liquid water at the base of Mars' upper crust
Sérgio Sacani
 
The Link Between Subsurface Rheology and EjectaMobility: The Case of Small Ne...
The Link Between Subsurface Rheology and EjectaMobility: The Case of Small Ne...The Link Between Subsurface Rheology and EjectaMobility: The Case of Small Ne...
The Link Between Subsurface Rheology and EjectaMobility: The Case of Small Ne...
Sérgio Sacani
 
ANTI URINARY TRACK INFECTION AGENT MC III
ANTI URINARY TRACK INFECTION AGENT MC IIIANTI URINARY TRACK INFECTION AGENT MC III
ANTI URINARY TRACK INFECTION AGENT MC III
HRUTUJA WAGH
 
university of arizona ~ favor's college candidate project.pptx
university of arizona ~ favor's college candidate project.pptxuniversity of arizona ~ favor's college candidate project.pptx
university of arizona ~ favor's college candidate project.pptx
favoranamelechi107
 
Electroencephalogram_ wave components_Aignificancr
Electroencephalogram_ wave components_AignificancrElectroencephalogram_ wave components_Aignificancr
Electroencephalogram_ wave components_Aignificancr
klynct
 
Meiosis Notes Slides biology powerpoint.pptx
Meiosis Notes Slides biology powerpoint.pptxMeiosis Notes Slides biology powerpoint.pptx
Meiosis Notes Slides biology powerpoint.pptx
sbates3
 
Antimalarial drug Medicinal Chemistry III
Antimalarial drug Medicinal Chemistry IIIAntimalarial drug Medicinal Chemistry III
Antimalarial drug Medicinal Chemistry III
HRUTUJA WAGH
 
Integration of AI and ML in Biotechnology
Integration of AI and ML in BiotechnologyIntegration of AI and ML in Biotechnology
Integration of AI and ML in Biotechnology
Sourabh Junawa
 
Macrolide and Miscellaneous Antibiotics.ppt
Macrolide and Miscellaneous Antibiotics.pptMacrolide and Miscellaneous Antibiotics.ppt
Macrolide and Miscellaneous Antibiotics.ppt
HRUTUJA WAGH
 
ART.pdf. Agin Tom, clinical Psychology, Prajyoti Niketan College
ART.pdf. Agin Tom, clinical Psychology, Prajyoti Niketan CollegeART.pdf. Agin Tom, clinical Psychology, Prajyoti Niketan College
ART.pdf. Agin Tom, clinical Psychology, Prajyoti Niketan College
Agin Tom
 
Discrete choice experiments: Environmental Improvements to Airthrey Loch Lake...
Discrete choice experiments: Environmental Improvements to Airthrey Loch Lake...Discrete choice experiments: Environmental Improvements to Airthrey Loch Lake...
Discrete choice experiments: Environmental Improvements to Airthrey Loch Lake...
Professional Content Writing's
 
External Application in Homoeopathy- Definition,Scope and Types.
External Application  in Homoeopathy- Definition,Scope and Types.External Application  in Homoeopathy- Definition,Scope and Types.
External Application in Homoeopathy- Definition,Scope and Types.
AdharshnaPatrick
 
Freud e sua Historia na Psicanalise Psic
Freud e sua Historia na Psicanalise PsicFreud e sua Historia na Psicanalise Psic
Freud e sua Historia na Psicanalise Psic
StefannyGoffi1
 
A CASE OF MULTINODULAR GOITRE,clinical presentation and management.pptx
A CASE OF MULTINODULAR GOITRE,clinical presentation and management.pptxA CASE OF MULTINODULAR GOITRE,clinical presentation and management.pptx
A CASE OF MULTINODULAR GOITRE,clinical presentation and management.pptx
ANJALICHANDRASEKARAN
 
SULPHONAMIDES AND SULFONES Medicinal Chemistry III.ppt
SULPHONAMIDES AND SULFONES Medicinal Chemistry III.pptSULPHONAMIDES AND SULFONES Medicinal Chemistry III.ppt
SULPHONAMIDES AND SULFONES Medicinal Chemistry III.ppt
HRUTUJA WAGH
 
Micro-grooved zein macro-whiskers for large-scale proliferation and different...
Micro-grooved zein macro-whiskers for large-scale proliferation and different...Micro-grooved zein macro-whiskers for large-scale proliferation and different...
Micro-grooved zein macro-whiskers for large-scale proliferation and different...
mdokmeci
 
Ad

A Comparison of Propositionalization Strategies for Creating Features from Linked Open Data

  • 1. 1 A Comparison of Propositionalization Strategies for Creating Features from Linked Open Data 9/29/2014 Petar Ristoski, Heiko Paulheim
  • 3. Motivation • Many existing applications use LOD as background knowledge in data mining – Explaining data patterns and statistics: unemployment rate, inflation, energy savings, etc … – Content-based book/movies recommendation system – Classifying incident related tweets – Gene classification – Prediction of car fuel consumption 9/29/2014 Ristoski, Paulheim 3
  • 4. Motivation Local LOD Data link combine cleanse transform analyze 9/29/2014 Ristoski, Paulheim 4
  • 5. Motivation • Standard data mining algorithms require propositional feature vector representation • Feature space: V={v1,v2,…, vn} • Each instance is represented as an n-dimensional feature vector (v1,v2,…,vn), where for each 1≤ vi ≤n : – vi ∈ {true, false}, or vi ∈ {1,0} – vi ∈ ℝ – vi ∈ S, where S is a finite set of symbols 9/29/2014 Ristoski, Paulheim 5
  • 6. Motivation
      Name                Person  Music Artist  Instrument  Genre
      Trent Reznor        1       1             1           0
      Wolfgang A. Mozart  1       1             1           1
      Barack Obama        1       0             0           0
  • 7. Related Work
      • LiDDM (Narasimha et al.)
          – a framework tool for Linked Data mining that captures data from the LOD cloud to extract hidden information
      • SPARQL-ML (Kiefer et al.)
          – an extension to SPARQL to support data mining tasks for knowledge discovery in the Semantic Web
      • FeGeLOD (Paulheim et al.)
          – unsupervised generation of data mining features from LOD
      • The resulting features are binary, or numerical aggregates using SPARQL COUNT constructs
      • No proper evaluation of the propositionalization strategy used
  • 8. PROPOSED STRATEGIES
  • 9. Strategies
      • Strategies for features derived from specific relations
          – r rdf:type C
          – r dcterms:subject S
      • Strategies for features derived from generic relations
  • 10. Strategies for Features Derived from Specific Relations
      1. Binary feature:
          – vi = 1 if C(r)
          – vi = 0 if ¬C(r)
      2. Relative count feature:
          – vi = 1/n, where r has the relation to n objects
      3. TF-IDF feature:
          – vi = (1/n) · log(N / |{r | C(r)}|), where N is the total number of resources in the dataset, and |{r | C(r)}| denotes the number of resources for which the specific relation to an object C exists
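The three strategies for specific relations can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the strategy labels, and the toy data are hypothetical, the logarithm is taken as the natural log (the slide does not state a base), and a feature is set to 0 whenever C(r) does not hold.

```python
import math


def specific_relation_features(classes_by_resource, strategy="binary"):
    """Build one feature per class C from a mapping resource -> set of classes.

    strategy is "binary", "relative_count" or "tfidf" (illustrative names,
    not taken from the slides).
    """
    if strategy not in {"binary", "relative_count", "tfidf"}:
        raise ValueError(f"unknown strategy: {strategy}")
    total_resources = len(classes_by_resource)  # N on the slide
    vocabulary = sorted(set().union(*classes_by_resource.values()))
    # |{r | C(r)}|: number of resources that have the relation to C
    resources_with = {
        c: sum(1 for classes in classes_by_resource.values() if c in classes)
        for c in vocabulary
    }
    features = {}
    for resource, classes in classes_by_resource.items():
        n = len(classes)  # r has the relation to n objects
        row = {}
        for c in vocabulary:
            if c not in classes:
                row[c] = 0.0  # assumption: 0 when C(r) does not hold
            elif strategy == "binary":
                row[c] = 1.0
            elif strategy == "relative_count":
                row[c] = 1.0 / n
            else:  # tfidf, natural log assumed
                row[c] = (1.0 / n) * math.log(total_resources / resources_with[c])
        features[resource] = row
    return features


# Hypothetical toy data with three resources only.
TOY_TYPES = {
    "Elvis Presley": {"Person", "Artist", "MusicArtist", "MilitaryPerson"},
    "Kris Kristofferson": {"Person", "Artist", "MusicArtist", "MilitaryPerson"},
    "Artist X": {"Person", "Artist", "MusicArtist"},
}

for name in ("binary", "relative_count", "tfidf"):
    print(name, specific_relation_features(TOY_TYPES, name)["Elvis Presley"])
```

With TF-IDF, a class shared by every resource gets weight 0, because log(N / N) = 0, while a rarer class keeps a positive weight. The absolute values depend on N and on the log base, so this toy run is not expected to reproduce the numbers in the slides.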
  • 11. Features Derived from Specific Relations: Binary vs TF-IDF
      [Figure: type graph with dbpedia:Person, dbpedia:Artist, dbpedia:MusicArtist, dbpedia:MilitaryPerson and the instances dbpedia:Kris_Kristofferson, dbpedia:Elvis_Presley, +10 Music Artists]
      Binary features:
      Name                Person  Artist  Music Artist  Military Person
      Elvis Presley       1       1       1             1
      Kris Kristofferson  1       1       1             1
      Artist X            1       1       1             0
  • 12. Features Derived from Specific Relations: Binary vs TF-IDF
      [Figure: type graph with dbpedia:Person, dbpedia:Artist, dbpedia:MusicArtist, dbpedia:MilitaryPerson and the instances dbpedia:Kris_Kristofferson, dbpedia:Elvis_Presley, +10 Music Artists]
      TF-IDF features:
      Name                Person  Artist  Music Artist  Military Person
      Elvis Presley       0       0       0             0.672
      Kris Kristofferson  0       0       0             0.672
      Artist X            0       0       0             0
  • 13. Strategies
      • Strategies for features derived from specific relations
          – r rdf:type C → C(r)
          – dcterms:subject
      • Strategies for features derived from relations as such
          – describe how resource r is related to resource r′
          – outgoing relation: p(r, r′)
          – incoming relation: p(r′, r)
  • 14. Strategies for Features Derived from Generic Relations
      1. Binary feature:
          – vi = 1 if ∃r′: p(r, r′)
          – vi = 0 if ¬∃r′: p(r, r′)
      2. Count feature:
          – vi = n, where r is connected to n resources with relation p
      3. Relative count feature:
          – vi = np / P, where P is the total number of outgoing relations for r, and np is the number of relations of type p for r
      4. TF-IDF feature:
          – vi = (np / P) · log(N / |{r | ∃r′: p(r, r′)}|), where N is the total number of resources in the dataset, and |{r | ∃r′: p(r, r′)}| denotes the number of resources for which p(r, r′) exists
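The four strategies for generic relations can be sketched the same way. Again this is an illustration, not the authors' implementation: the function name, the strategy labels, and the input layout (resource → relation → number of relations of that type) are assumptions, the natural log is assumed, and the counts in the toy data are hypothetical values chosen for the example.

```python
import math


def generic_relation_features(relation_counts, strategy="binary"):
    """Build one feature per relation p from resource -> {relation: n_p}.

    strategy is "binary", "count", "relative_count" or "tfidf"
    (illustrative names, not taken from the slides).
    """
    if strategy not in {"binary", "count", "relative_count", "tfidf"}:
        raise ValueError(f"unknown strategy: {strategy}")
    total_resources = len(relation_counts)  # N on the slide
    relations = sorted({p for counts in relation_counts.values() for p in counts})
    # |{r | exists r': p(r, r')}|: resources with at least one relation p
    resources_with = {
        p: sum(1 for counts in relation_counts.values() if counts.get(p, 0) > 0)
        for p in relations
    }
    features = {}
    for resource, counts in relation_counts.items():
        total = sum(counts.values())  # P: all relations of r considered here
        row = {}
        for p in relations:
            n_p = counts.get(p, 0)
            if n_p == 0:
                row[p] = 0.0
            elif strategy == "binary":
                row[p] = 1.0
            elif strategy == "count":
                row[p] = float(n_p)
            elif strategy == "relative_count":
                row[p] = n_p / total
            else:  # tfidf, natural log assumed
                row[p] = (n_p / total) * math.log(total_resources / resources_with[p])
        features[resource] = row
    return features


# Hypothetical counts of relations per resource.
TOY_RELATIONS = {
    "Chester Bennington": {"instrument": 8},
    "Anthony Kiedis": {"instrument": 5, "author": 1},
    "Jules Verne": {"author": 68},
}

for name in ("binary", "count", "relative_count", "tfidf"):
    print(name, generic_relation_features(TOY_RELATIONS, name)["Anthony Kiedis"])
```

The relative count separates a resource with five relations of one type and one of another (5/6 versus 1/6) from resources that only have a single relation type, a difference that the binary representation hides.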
  • 15. Features Derived from Generic Relations: Binary vs Relative Count
      [Figure: dbpedia:Chester_Bennington, dbpedia:Anthony_Kiedis and dbpedia:Jules_Verne with dbpedia:instrument and dbpedia:author relations; edge labels 8, 5, 1, 68]
      Binary features:
      Name                instrument  author
      Chester Bennington  1           0
      Anthony Kiedis      1           1
      Jules Verne         0           1
  • 16. Features Derived from Generic Relations: Binary vs Relative Count
      [Figure: dbpedia:Chester_Bennington, dbpedia:Anthony_Kiedis and dbpedia:Jules_Verne with dbpedia:instrument and dbpedia:author relations; edge labels 8, 5, 1, 68]
      Relative count features:
      Name                instrument  author
      Chester Bennington  1           0
      Anthony Kiedis      0.833       0.166
      Jules Verne         0           1
  • 18. Evaluation
      • Comparative evaluation of the propositionalization strategies on three data mining tasks
          – Classification
          – Regression
          – Outlier Detection
      • Evaluated on six datasets, on five feature sets
          – types
          – categories
          – incoming relations
          – outgoing relations
          – incoming and outgoing relations
  • 19. Evaluation: Classification
      • Datasets:
          Dataset        # instances  # types  # categories  # rel in  # rel out  # rel in & rel out
          Sports Tweets  5,054        7,814    14,025        3,574     5,334      8,908
          Cities         212          721      999           1,304     1,081      2,385
      • Methods:
          – Naïve Bayes
          – k-Nearest Neighbors (k=3)
          – C4.5 decision trees
      • Metrics for performance evaluation
          – Accuracy
      • Results calculated using stratified 10-fold cross validation
  • 20. Evaluation: Classification (accuracy)
                                  Cities                      Sports Tweets
      Features    Representation  NB     k-NN   C4.5   Avg.   NB     k-NN   C4.5   Avg.
      types       Binary          55.71  56.17  59.05  56.98  81     82.9   82.95  82.28
                  Relative Count  57.1   49.61  55.22  53.98  80.96  81.44  81.88  81.43
                  TF-IDF          57.1   48.7   54.7   53.50  82.13  82.47  82.64  82.41
      categories  Binary          55.74  49.98  56.17  53.96  82.26  76.56  71.98  76.93
                  Relative Count  59.52  44.35  58.96  54.28  90.76  84.09  80.86  85.24
                  TF-IDF          55.74  49.98  57.08  54.27  89.65  81.98  81.68  84.44
      rel in      Binary          60.41  58.46  60.35  59.74  83.12  83.63  84.65  83.80
                  Count           56.69  31.1   59.37  49.05  83.25  85.11  85.4   84.59
                  Relative Count  49.16  38.23  58.55  48.65  69.51  84.63  85.17  79.77
                  TF-IDF          34.94  38.2   54.26  42.47  72.66  84.65  84.99  80.77
      rel out     Binary          47.62  60     56.71  54.78  80.67  82.36  84.41  82.48
                  Count           49.96  55.24  58.59  54.60  79.95  83.35  85.01  82.77
                  Relative Count  48.07  58.44  56.65  54.39  62.13  84.27  83.51  76.64
                  TF-IDF          40.15  54.78  58.51  51.15  69.91  84.4   84.16  79.49
      r in & out  Binary          59.44  58.57  56.47  58.16  86.17  85.14  86.45  85.92
                  Count           56.13  54.26  60.82  57.07  86.03  86.01  87.16  86.40
                  Relative Count  57.68  47.14  56.56  53.79  70.01  84.51  87.22  80.58
                  TF-IDF          40.17  46.21  58.46  48.28  75.15  84.86  86.19  82.07
  • 21. Evaluation: Regression
      • Datasets:
          Dataset   # instances  # types  # categories  # rel in  # rel out  # rel in & rel out
          Auto MPG  391          264      308           227       370        597
          Cities    212          721      999           1,304     1,081      2,385
      • Methods:
          – Linear Regression
          – M5Rules
          – k-Nearest Neighbors (k=3)
      • Metrics for performance evaluation
          – RMSE
      • Results calculated using stratified 10-fold cross validation
  • 22. Evaluation: Regression (RMSE)
                                  Auto MPG                            Cities
      Features    Representation  LR       M5       k-NN     Avg.     LR       M5       k-NN     Avg.
      types       Binary          3.952    3.056    3.63     3.546    24.303   18.793   22.164   21.753
                  Relative Count  3.843    2.952    3.571    3.455    18.046   19.696   33.569   23.770
                  TF-IDF          3.864    2.964    3.571    3.466    17.852   18.773   22.396   19.674
      categories  Binary          3.698    2.9      3.62     3.409    18.884   22.323   22.677   21.295
                  Relative Count  3.747    3        3.57     3.430    18.952   19.98    34.489   24.474
                  TF-IDF          3.782    2.9      3.56     3.416    19.02    22.323   23.189   21.511
      rel in      Binary          3.849    2.9      3.61     3.444    49.866   19.205   18.532   29.201
                  Count           3.892    3        4.62     3.824    138.041  19.915   19.27    59.075
                  Relative Count  3.976    2.9      3.57     3.488    122.365  22.335   18.877   54.526
                  TF-IDF          4.109    2.8      3.57     3.508    122.921  21.947   18.568   54.479
      rel out     Binary          3.792    3.1      3.6      3.490    20.008   19.364   20.918   20.097
                  Count           4.072    3        4.15     3.734    36.317   19.459   23.994   26.590
                  Relative Count  4.095    2.9      3.57     3.536    43.22    21.961   21.472   28.884
                  TF-IDF          4.135    3        3.57     3.572    28.845   20.852   22.212   23.970
      r in & out  Binary          3.991    3.1      3.67     3.572    40.803   18.803   18.211   25.939
                  Count           3.991    3.1      4.54     3.870    107.259  19.528   18.906   48.564
                  Relative Count  3.922    3        3.57     3.493    103.102  22.091   19.608   48.267
                  TF-IDF          3.982    3.01     3.57     3.523    115.373  20.623   19.702   51.899
  • 23. Evaluation: Outlier Detection
      • Datasets:
          Dataset           # instances  # types  # rel in  # rel out  # rel in & rel out
          DBpedia-Peel      2,083        39       586       322        908
          DBpedia-DBTropes  4,228        128      912       2,155      3,067
      • Methods:
          – k-NN Global Anomaly Score, GAS (k=25)
          – Local Outlier Factor, LOF (10<k<50)
          – Local Outlier Probability, LoOP (k=25)
      • Metrics for performance evaluation
          – Area under the ROC curve (AUC)
      • Results calculated on partial gold standard of 100 links
  • 24. Evaluation: Outlier Detection (AUC)
                                  DBpedia-Peel                DBpedia-DBTropes
      Features    Representation  GAS    LOF    LoOP   Avg.   GAS    LOF    LoOP   Avg.
      types       Binary          0.386  0.487  0.554  0.476  0.503  0.627  0.605  0.578
                  Relative Count  0.385  0.398  0.595  0.459  0.503  0.385  0.314  0.401
                  TF-IDF          0.386  0.504  0.602  0.497  0.503  0.672  0.417  0.531
      r in        Binary          0.169  0.367  0.289  0.275  0.426  0.52   0.45   0.465
                  Count           0.2    0.285  0.29   0.258  0.503  0.59   0.602  0.565
                  Relative Count  0.293  0.496  0.452  0.414  0.589  0.555  0.493  0.546
                  TF-IDF          0.14   0.354  0.317  0.270  0.509  0.519  0.568  0.532
      r out       Binary          0.25   0.195  0.207  0.217  0.325  0.438  0.432  0.398
                  Count           0.539  0.455  0.391  0.462  0.547  0.578  0.522  0.549
                  Relative Count  0.542  0.544  0.391  0.492  0.618  0.601  0.513  0.577
                  TF-IDF          0.116  0.396  0.24   0.251  0.322  0.629  0.472  0.474
      r in & out  Binary          0.324  0.431  0.51   0.422  0.352  0.439  0.396  0.396
                  Count           0.527  0.368  0.454  0.450  0.57   0.563  0.527  0.553
                  Relative Count  0.603  0.744  0.616  0.654  0.667  0.672  0.657  0.665
                  TF-IDF          0.202  0.667  0.484  0.451  0.481  0.462  0.5    0.481
  • 25. Conclusion
      • The chosen propositionalization strategy matters
      • No general recommendation for a strategy
          – What is the data mining task?
          – What are the characteristics of the dataset?
          – Which algorithm is going to be used?
  • 26. Future Work
      • Conduct further experiments on more feature sets
          – Qualified incoming and outgoing relations
          – Combine features from multiple LOD sources
      • Conduct experiments on more data mining tasks
          – Clustering, recommender systems, etc.
      • More sophisticated strategies
          – Combination of statistical and semantic measures
          – Adaptation of weighting strategies used in text mining to overcome problems with erroneous data
      • Use the statistical measures for feature selection
  • 27. RapidMiner LOD Extension
      • Simple wiring of operators
          – Importing
          – Linking
          – Feature generation
          – Data consolidation
          – Feature selection
          – Visualization
  • 28. RapidMiner LOD Extension
      [Figure: pipeline over local data and LOD with the steps link, combine, cleanse, transform, analyze]
  • 29. RapidMiner LOD Extension
      [Figure: panels labelled Data Enrichment, Data Analysis, Linking, Feature Selection, Schema Matching & Data Fusion]
  • 30. RapidMiner LOD Extension
      • Simple wiring of operators
          – Importing
          – Linking
          – Feature generation
          – Data fusion
          – Feature selection
          – Visualization
      • Try it out!
          – find “Linked Open Data” on the marketplace
          – Google Group: groups.google.com/forum/#!forum/rmlod
  • 31. A Comparison of Propositionalization Strategies for Creating Features from Linked Open Data 9/29/2014 Petar Ristoski, Heiko Paulheim