Distributed Query Processing for
Federated RDF Data Management
Olaf Görlitz
07.11.2014
Olaf Görlitz: Distributed Query Processing for
Federated RDF Data Management
07.11.2014
Slide 2
The Linked Open Data Cloud
Use as one large database!
Slide 3
Life Science Scenario
Find drugs for nutritional supplementation
SELECT ?drug ?id ?title WHERE {
  ?drug drugbank:drugCategory category:micronutrient .
  ?drug drugbank:casRegistryNumber ?id .
  ?keggDrug rdf:type kegg:Drug .
  ?keggDrug bio2rdf:xRef ?id .
  ?keggDrug purl:title ?title .
}
Slide 4
Linked Data Querying Paradigms
Data Warehouse
Link Traversal
Federation
Slide 5
Linked Data Querying Paradigms
Requirements Data Warehouse Link Traversal Federation
Query Expressiveness
Schema Mapping
Data Freshness
Result Completeness
Scalability
Flexibility
Availability
Performance
Slide 6
Contributions
● Large Scale Information Retrieval: PINTS, Peer-to-Peer Statistics Management
– Görlitz, Sizov, Staab: PINTS: Peer-to-Peer Infrastructure for Tagging Systems. IPTPS'08
● RDF Federation & Query Optimization: SPLENDID, Distributed SPARQL Query Processing
– Görlitz, Staab: SPLENDID: SPARQL Endpoint Federation Exploiting VOID Descriptions. COLD'11
● Benchmarking RDF Federation Systems: SPLODGE, Linked Data Query Generation
– Görlitz, Thimm, Staab: SPLODGE: Systematic Generation of SPARQL Benchmark Queries for Linked Open Data. ISWC'12
Slide 7
SPLENDID Federation
Federated Databases vs. Federated RDF
● Relational Schema vs. Implicit Schema, Ontologies
● Specific Data Wrappers vs. SPARQL endpoints
● Rich Data Statistics vs. Limited Statistics (voiD)
Execute complex SPARQL queries over federated RDF data sources
Slide 8
SPLENDID Federation
SPARQL Query → Source Selection → Query Optimization → Query Execution
SELECT ?drug ?id ?title WHERE {
  ?drug drugbank:drugCategory category:micronutrient .
  ?drug drugbank:casRegistryNumber ?id .
  ?keggDrug bio2rdf:xRef ?id .
  ?keggDrug rdf:type kegg:Drug .
  ?keggDrug purl:title ?title .
}
Join operators: ⋈?drug, ⋈?id, ⋈?keggDrug, ⋈?keggDrug
Triple patterns:
?drug drugbank:drugCategory category:micronutrient
?drug drugbank:casRegistryNumber ?id
?keggDrug rdf:type kegg:Drug
?keggDrug bio2rdf:xRef ?id
?keggDrug purl:title ?title
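The four join operators follow from the variables that the triple patterns share. A minimal Python sketch that derives these join edges for the query on this slide (pattern order as in the query text); representing triple patterns as plain string tuples is an assumption for illustration, not SPLENDID's internal model:

```python
# Sketch: derive the join variables of the Life Science query.
# Triple patterns are plain (subject, predicate, object) strings; this
# representation is an assumption for illustration only.
from itertools import combinations

PATTERNS = [
    ("?drug", "drugbank:drugCategory", "category:micronutrient"),  # tp1
    ("?drug", "drugbank:casRegistryNumber", "?id"),                # tp2
    ("?keggDrug", "bio2rdf:xRef", "?id"),                          # tp3
    ("?keggDrug", "rdf:type", "kegg:Drug"),                        # tp4
    ("?keggDrug", "purl:title", "?title"),                         # tp5
]


def variables(pattern):
    """Return the variables (terms starting with '?') of one triple pattern."""
    return {term for term in pattern if term.startswith("?")}


def join_edges(patterns):
    """Return (i, j, shared) for every pair of patterns sharing at least one variable."""
    edges = []
    for i, j in combinations(range(len(patterns)), 2):
        shared = variables(patterns[i]) & variables(patterns[j])
        if shared:
            edges.append((i, j, sorted(shared)))
    return edges


if __name__ == "__main__":
    for i, j, shared in join_edges(PATTERNS):
        print(f"tp{i + 1} ⋈ tp{j + 1} on {', '.join(shared)}")
```

Running it lists five pattern pairs that share ?drug, ?id or ?keggDrug. Any join tree over the five patterns uses four of these edges, which matches the four operators ⋈?drug, ⋈?id, ⋈?keggDrug, ⋈?keggDrug.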
Slide 9
Source Selection Objectives
SPARQL Query → Source Selection → Query Optimization → Query Execution
Determine all relevant data sources
● DARQ: Explicit 'capabilities', Query restrictions (bound predicates)
● FedX: ASK queries + caching (many initial requests), Sub query aggregation
● SPLENDID: VoiD descriptions + ASK queries, Sub query aggregation
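A rough Python sketch of the two-step idea listed for SPLENDID (voiD descriptions + ASK queries). The predicate index, the endpoint names ep1 to ep3 and the ask callback are hypothetical stand-ins for real voiD statistics and SPARQL ASK requests; the rule of only issuing ASK queries for patterns with a bound subject or object is an assumption of this sketch:

```python
# Sketch: two-step source selection (voiD lookup, then ASK pruning).
# Assumptions: predicate_index stands in for voiD property statistics, and the
# ask callable stands in for sending a SPARQL ASK request to an endpoint.
from typing import Callable, Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


def ask_query(tp: Triple) -> str:
    """SPARQL ASK query text for a single triple pattern."""
    return "ASK { " + " ".join(tp) + " . }"


def select_sources(
    patterns: List[Triple],
    predicate_index: Dict[str, Set[str]],
    all_sources: Set[str],
    ask: Callable[[str, str], bool],
) -> Dict[Triple, Set[str]]:
    selected: Dict[Triple, Set[str]] = {}
    for tp in patterns:
        s, p, o = tp
        # Step 1: a bound predicate is looked up in the voiD-based index;
        # an unbound predicate gives no hint, so every source is a candidate.
        if is_var(p):
            candidates = set(all_sources)
        else:
            candidates = set(predicate_index.get(p, set()))
        # Step 2: with a bound subject or object, confirm each candidate via ASK.
        if not is_var(s) or not is_var(o):
            candidates = {src for src in candidates if ask(src, ask_query(tp))}
        selected[tp] = candidates
    return selected


if __name__ == "__main__":
    # Hypothetical toy federation with three endpoints; not real statistics.
    index = {"purl:title": {"ep1", "ep2", "ep3"}, "rdf:type": {"ep1", "ep2", "ep3"}}

    def stub_ask(source: str, query: str) -> bool:
        # Pretend that only ep1 contains instances of kegg:Drug.
        return source == "ep1" or "kegg:Drug" not in query

    query = [("?keggDrug", "rdf:type", "kegg:Drug"), ("?keggDrug", "purl:title", "?title")]
    for tp, srcs in select_sources(query, index, {"ep1", "ep2", "ep3"}, stub_ask).items():
        print(" ".join(tp), "->", sorted(srcs))
```

The demo keeps only ep1 for the rdf:type pattern and all three endpoints for the purl:title pattern, since no ASK query is sent when subject and object are both unbound.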
Slide 10
Source Selection Example
[Figure: voiD descriptions of the data sources]
SELECT ?drug ?title WHERE {
  ?drug drugbank:drugCategory category:micronutrient .
  ?drug drugbank:casRegistryNumber ?id .
  ?keggDrug rdf:type kegg:Drug .
  ?keggDrug bio2rdf:xRef ?id .
  ?keggDrug purl:title ?title .
}
→ KEGG, DBpedia, ChEBI
→ KEGG
→ DrugBank
SPARQL ASK
→ DrugBank, ChEBI
→ KEGG
Slide 11
Source Selection Result
Join operators: ⋈?drug, ⋈?id, ⋈?keggDrug, ⋈?keggDrug
Triple patterns:
?drug drugbank:drugCategory category:micronutrient
?drug drugbank:casRegistryNumber ?id
?keggDrug rdf:type kegg:Drug
?keggDrug bio2rdf:xRef ?id
?keggDrug purl:title ?title
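Once every pattern has its sources, patterns answerable by one and the same single source can be shipped together as one subquery ("sub query aggregation" on the source selection slide). A Python sketch under the assumption that only patterns with exactly one selected source are grouped; the source assignment in the demo is hypothetical:

```python
# Sketch: sub query aggregation after source selection.
# Assumption: patterns whose only relevant source is the same endpoint are
# merged into one SELECT subquery for that endpoint; all others stay separate.
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def aggregate_subqueries(selection: Dict[Triple, Set[str]]):
    groups: Dict[str, List[Triple]] = defaultdict(list)
    separate: List[Tuple[Triple, Set[str]]] = []
    for tp, sources in selection.items():
        if len(sources) == 1:
            (only_source,) = sources  # unpack the single element of the set
            groups[only_source].append(tp)
        else:
            separate.append((tp, sources))
    return dict(groups), separate


def to_sparql(patterns: List[Triple]) -> str:
    """Render a group of triple patterns as one SPARQL SELECT subquery."""
    body = " ".join(" ".join(tp) + " ." for tp in patterns)
    return "SELECT * WHERE { " + body + " }"


if __name__ == "__main__":
    # Hypothetical selection result, assumed for illustration only.
    selection = {
        ("?drug", "drugbank:drugCategory", "category:micronutrient"): {"DrugBank"},
        ("?drug", "drugbank:casRegistryNumber", "?id"): {"DrugBank"},
        ("?keggDrug", "rdf:type", "kegg:Drug"): {"KEGG"},
        ("?keggDrug", "bio2rdf:xRef", "?id"): {"KEGG"},
        ("?keggDrug", "purl:title", "?title"): {"KEGG", "ChEBI"},
    }
    groups, separate = aggregate_subqueries(selection)
    for source, patterns in groups.items():
        print(source, "<-", to_sparql(patterns))
    print("sent individually:", [" ".join(tp) for tp, _ in separate])
```

Grouping reduces the number of requests and lets the endpoint evaluate the local joins itself; patterns with several candidate sources still have to be sent to each of them.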
Slide 12
Query Optimization
SPARQL Query → Source Selection → Query Optimization → Query Execution
Find best (fastest) query execution plan
● DARQ: Dynamic Programming, Custom Statistics, Only bound predicates, Bind Join
● FedX: Join Order Heuristics, No Statistics, Join Chains, Bind Join
● SPLENDID: Dynamic Programming, Extended voiD statistics, Bind + Hash Join
Slide 13
Dynamic Programming
● Iterate over all possible execution plans (join operators: BindJoin, HashJoin)
● Compare their cost (execution time)
Join operators: ⋈?drug, ⋈?id, ⋈?keggDrug, ⋈?keggDrug
Triple patterns:
?drug drugbank:drugCategory category:micronutrient
?drug drugbank:casRegistryNumber ?id
?keggDrug rdf:type kegg:Drug
?keggDrug bio2rdf:xRef ?id
?keggDrug purl:title ?title
Cost model: $\mathrm{cost}_{send\text{-}query}$, $\mathrm{cost}_{receive\text{-}tuple}$, $\mathrm{card}(R(q_i))$
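The dynamic-programming idea can be sketched in Python as a search over subsets of triple patterns that keeps the cheapest left-deep plan per subset and picks a bind or hash join for each step. The per-step costs reuse the terms cost_send-query, cost_receive-tuple and card(R(q_i)); the constant values, the min() join-cardinality estimate and the exclusion of cross products are assumptions of this sketch, not SPLENDID's actual optimizer:

```python
# Sketch: dynamic programming over left-deep join orders with a network cost model.
# Assumptions (not from the slides): abstract cost units, join cardinality
# estimated as min(left, right), and cross products are never considered.
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

COST_SEND_QUERY = 10.0    # hypothetical value for cost_send-query
COST_RECEIVE_TUPLE = 0.1  # hypothetical value for cost_receive-tuple

Plan = Tuple[float, float, Tuple[str, ...]]  # (cost, estimated cardinality, steps)


def optimize(card: List[float], pattern_vars: List[Set[str]]) -> Plan:
    n = len(card)
    best: Dict[FrozenSet[int], Plan] = {}
    for i in range(n):
        # One request for the pattern plus transfer of all its result tuples.
        leaf_cost = COST_SEND_QUERY + card[i] * COST_RECEIVE_TUPLE
        best[frozenset([i])] = (leaf_cost, card[i], (f"tp{i + 1}",))
    for size in range(2, n + 1):
        for subset in combinations(range(n), size):
            key = frozenset(subset)
            for i in subset:
                rest = key - {i}
                if rest not in best:
                    continue  # the remaining patterns have no connected plan
                rest_vars = set().union(*(pattern_vars[j] for j in rest))
                if not rest_vars & pattern_vars[i]:
                    continue  # joining tp_i now would be a cross product
                cost_left, card_left, steps = best[rest]
                card_join = min(card_left, card[i])
                # Hash join: fetch tp_i completely with one request.
                hash_cost = COST_SEND_QUERY + card[i] * COST_RECEIVE_TUPLE
                # Bind join: one request per left tuple, only matches are returned.
                bind_cost = card_left * COST_SEND_QUERY + card_join * COST_RECEIVE_TUPLE
                op, step_cost = ("⋈B", bind_cost) if bind_cost < hash_cost else ("⋈H", hash_cost)
                candidate = (cost_left + step_cost, card_join, steps + (f"{op} tp{i + 1}",))
                if key not in best or candidate[0] < best[key][0]:
                    best[key] = candidate
    return best[frozenset(range(n))]


if __name__ == "__main__":
    # Hypothetical cardinalities for tp1..tp5 in the order listed on the slide.
    cards = [50.0, 5000.0, 8000.0, 10000.0, 10000.0]
    pattern_vars = [{"?drug"}, {"?drug", "?id"}, {"?keggDrug"}, {"?keggDrug", "?id"}, {"?keggDrug", "?title"}]
    cost, est, steps = optimize(cards, pattern_vars)
    print(" ".join(steps), f"(cost {cost:.1f}, est. cardinality {est:.0f})")
```

The function raises a KeyError if the patterns do not form a connected join graph, because no plan without cross products exists in that case.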
Slide 14
Cardinality Estimation
Join operators: ⋈?drug, ⋈?id, ⋈?keggDrug, ⋈?keggDrug
Triple patterns:
?drug drugbank:drugCategory category:micronutrient
?drug drugbank:casRegistryNumber ?id
?keggDrug rdf:type kegg:Drug
?keggDrug bio2rdf:xRef ?id
?keggDrug purl:title ?title
Slide 15
Cardinality Estimation (Triple Pattern)
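$$\mathrm{card}_d(s,p,o) = |d| \cdot \mathrm{sel}_d(s) \cdot \mathrm{sel}_d(p) \cdot \mathrm{sel}_d(o), \quad d \in D$$
Assuming independence of s, p, o, the voiD-based estimates per pattern shape (? = unbound) are:
$$\begin{aligned}
\mathrm{card}_d(?,?,?) &= \mathrm{voiD}_d{\to}|d| \\
\mathrm{card}_d(s,p,o) &= 1 \\
\mathrm{card}_d(?,p,?) &= \mathrm{voiD}_d{\to}p \\
\mathrm{card}_d(s,?,?) &= \frac{\mathrm{voiD}_d{\to}|d|}{\mathrm{voiD}_d{\to}|s|} \\
\mathrm{card}_d(?,?,o) &= \frac{\mathrm{voiD}_d{\to}|d|}{\mathrm{voiD}_d{\to}|o|} \\
\mathrm{card}_d(s,?,o) &= 1 \\
\mathrm{card}_d(s,p,?) &= \frac{\mathrm{voiD}_d{\to}p}{\mathrm{voiD}_d{\to}|sp|} \\
\mathrm{card}_d(?,p,o) &= \frac{\mathrm{voiD}_d{\to}p}{\mathrm{voiD}_d{\to}|op|} \\
\mathrm{card}_d(?,\mathit{rdf{:}type},T) &= \mathrm{voiD}_d{\to}T
\end{aligned}$$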
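The case table translates directly into a lookup function. A Python sketch, assuming the voiD figures are already loaded into a small container; the VoidStats class and its field names are hypothetical:

```python
# Sketch: triple-pattern cardinality estimates from voiD statistics.
# The VoidStats container and its field names are assumptions for illustration;
# real values would come from a dataset's voiD description.
from dataclasses import dataclass, field
from typing import Dict, Tuple

Triple = Tuple[str, str, str]


@dataclass
class VoidStats:
    triples: int                 # voiD_d -> |d|
    distinct_subjects: int       # voiD_d -> |s|
    distinct_objects: int        # voiD_d -> |o|
    pred_triples: Dict[str, int] = field(default_factory=dict)    # voiD_d -> p
    pred_subjects: Dict[str, int] = field(default_factory=dict)   # voiD_d -> |sp|
    pred_objects: Dict[str, int] = field(default_factory=dict)    # voiD_d -> |op|
    class_entities: Dict[str, int] = field(default_factory=dict)  # voiD_d -> T


def is_var(term: str) -> bool:
    return term.startswith("?")


def card(tp: Triple, d: VoidStats) -> float:
    s, p, o = tp
    bs, bp, bo = not is_var(s), not is_var(p), not is_var(o)
    if bp and p == "rdf:type" and bo and not bs:
        return d.class_entities.get(o, 0)          # (?, rdf:type, T)
    if bs and bo:
        return 1                                   # (s, p, o) and (s, ?, o)
    if not bp:
        if bs:
            return d.triples / d.distinct_subjects  # (s, ?, ?)
        if bo:
            return d.triples / d.distinct_objects   # (?, ?, o)
        return d.triples                            # (?, ?, ?)
    p_triples = d.pred_triples.get(p, 0)
    if p_triples == 0:
        return 0                                    # predicate unknown in this source
    if bs:
        return p_triples / d.pred_subjects[p]       # (s, p, ?)
    if bo:
        return p_triples / d.pred_objects[p]        # (?, p, o)
    return p_triples                                # (?, p, ?)
```

With the ChEBI figures from the voiD statistics slide (732744 triples, 50477 instances of chebi:Compound, 39555 bio:formula triples), the pattern ?c rdf:type chebi:Compound would be estimated at 50477 and ?c bio:formula ?f at 39555.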
Slide 16
Cardinality Estimation (Basic Graph Pattern)
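[Figure: Star pattern around ?keggDrug (terms kegg:Drug, rn:R01786, ?title; predicates rdf:type, purl:title, bio2rdf:xRef) and path pattern (terms ?drug, ?keggDrug, drugbank:Drug, kegg:Drug; predicates rdf:type, owl:sameAs)]
Star pattern (single source d):
$$\mathrm{card}^{*}_d(P_1 \bowtie P_2 \bowtie P_3) = \min\big(\mathrm{card}_d(P_1), \mathrm{card}_d(P_2)\big) \cdot \frac{\mathrm{voiD}_d{\to}p_3}{\mathrm{voiD}_d{\to}|sp_3|}$$
Path pattern (across sources d and d'):
$$\widetilde{\mathrm{card}}_{d,d'}(P_1 \bowtie P_2) = \mathrm{card}_d(P_1) \cdot \mathrm{card}_{d'}(P_2) \cdot \mathrm{sel}_{d,d'}(P_1 \bowtie P_2)$$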
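Both estimates are simple products once the per-pattern cardinalities are known. A Python sketch; the function and parameter names are hypothetical, and the cross-source selectivity sel_{d,d'} is taken as an input because the slide does not state how it is obtained:

```python
# Sketch: the star-pattern and path-pattern cardinality estimates.
# Inputs are plain numbers; sel_d_dprime is assumed to be supplied by the caller.


def star_card(card_p1: float, card_p2: float, p3_triples: float, p3_subjects: float) -> float:
    """card*_d(P1 ⋈ P2 ⋈ P3) = min(card_d(P1), card_d(P2)) * voiD_d->p3 / voiD_d->|sp3|.

    p3_triples / p3_subjects is the average number of p3 values per subject.
    """
    return min(card_p1, card_p2) * p3_triples / p3_subjects


def path_card(card_p1_in_d: float, card_p2_in_dprime: float, sel_d_dprime: float) -> float:
    """~card_{d,d'}(P1 ⋈ P2) = card_d(P1) * card_d'(P2) * sel_{d,d'}(P1 ⋈ P2)."""
    return card_p1_in_d * card_p2_in_dprime * sel_d_dprime
```

For instance, star_card(100, 40, 600, 300) gives 80.0: the more selective of the two constraining patterns bounds the number of subjects, and each subject contributes on average two p3 values.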
Slide 17
Query Optimization
SPARQL Query → Source Selection → Query Optimization → Query Execution
Join operators: ⋈?drug, ⋈B(?id) (bind join), ⋈?keggDrug, ⋈H(?keggDrug) (hash join)
Triple patterns:
?drug drugbank:drugCategory category:micronutrient
?drug drugbank:casRegistryNumber ?id
?keggDrug rdf:type kegg:Drug
?keggDrug bio2rdf:xRef ?id
?keggDrug purl:title ?title
Slide 18
Evaluation Methodology
Compare with state-of-the-art federation systems
– Use multiple linked datasets with representative characteristics
– Execute 'typical' SPARQL queries in a reproducible benchmark setup
FedBench
Slide 19
Evaluation Results
Slide 20
Conclusion
● Federation for Linked Open Data
– Database + Semantic Web technology
– Efficient Distributed Query Processing
– Extension of voiD statistics
● Query generation for Federation Benchmarks
● Efficient statistics management in P2P networks
Slide 21
Thank You
Slide 22
VoiD Descriptions/Statistics
● General Information
● Basic statistics: triples = 732744
● Type statistics: chebi:Compound = 50477
● Predicate statistics: bio:formula = 39555
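The statistics above map onto standard voiD terms (void:triples, void:classPartition with void:class and void:entities, void:propertyPartition with void:property and void:triples). A Python sketch that renders them as Turtle; the dataset IRI is hypothetical, and the chebi: and bio: prefixes are kept as on the slide, so their @prefix declarations still have to be added:

```python
# Sketch: emit basic, type and predicate statistics as a voiD description in Turtle.
# The dataset IRI is hypothetical; chebi: and bio: prefixes are used as on the
# slide and still need @prefix declarations (their IRIs are not given here).
from typing import Dict

VOID_PREFIX = "@prefix void: <http://rdfs.org/ns/void#> ."


def void_description(dataset_iri: str, triples: int,
                     class_entities: Dict[str, int],
                     property_triples: Dict[str, int]) -> str:
    statements = ["a void:Dataset", f"void:triples {triples}"]
    for cls, count in class_entities.items():
        # Type statistics: number of instances of a class.
        statements.append(f"void:classPartition [ void:class {cls} ; void:entities {count} ]")
    for prop, count in property_triples.items():
        # Predicate statistics: number of triples using a predicate.
        statements.append(f"void:propertyPartition [ void:property {prop} ; void:triples {count} ]")
    body = " ;\n    ".join(statements)
    return f"{VOID_PREFIX}\n\n<{dataset_iri}> {body} ."


if __name__ == "__main__":
    print(void_description("http://example.org/dataset/chebi", 732744,
                           {"chebi:Compound": 50477}, {"bio:formula": 39555}))
```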
Slide 23
VoiD statistics extension
Slide 24
State of the Art
Aspect | DARQ | AliBaba | FedX | SPLENDID
Statistics | ServiceDesc | – | – | VoiD
Source Selection | Statistics (predicates) | All sources | ASK queries | Statistics + ASK queries
Query Optimization | DynProg | Heuristics | Heuristics | DynProg
Query Execution | Bind join | Bind join | Bound Join + parallelization | Bind Join + Hash Join
Slide 25
SPARQL limitations
● Query protocol
● Only SPARQL endpoints
● Endpoint limitations
– SPARQL version
– Result size
– Data rate
– Availability
Slide 26
Join Implementation
Bind Join: R1 ⋈B R2    Hash Join: R1 ⋈H R2
Example relations:
?id  ?y      ?id  ?x
1    42      1    'A'
2    13      1    'G'
3    20      4    'A'
4    50      7    'A'
5    3       7    'C'
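The difference between the two operators can be reproduced on the example relations. A Python sketch, assuming the (?id, ?y) table is R1 and the (?id, ?x) table is R2; remote_lookup is a hypothetical stand-in for a SPARQL request with ?id bound, and bindings are not batched:

```python
# Sketch: bind join vs. hash join on the two example relations.
# remote_lookup simulates sending the right subquery with ?id already bound;
# no real endpoint is contacted.
from collections import defaultdict

R1 = [(1, 42), (2, 13), (3, 20), (4, 50), (5, 3)]        # (?id, ?y)
R2 = [(1, "A"), (1, "G"), (4, "A"), (7, "A"), (7, "C")]  # (?id, ?x)


def hash_join(left, right):
    """Both inputs fully fetched; hash the right input on ?id, probe with the left."""
    table = defaultdict(list)
    for key, x in right:
        table[key].append(x)
    return [(key, y, x) for key, y in left for x in table.get(key, [])]


def remote_lookup(key):
    """Simulated request: the right subquery evaluated with ?id bound to key."""
    return [x for k, x in R2 if k == key]


def bind_join(left, lookup):
    """One request per left tuple; only matching right tuples are transferred."""
    results, requests = [], 0
    for key, y in left:
        requests += 1
        results.extend((key, y, x) for x in lookup(key))
    return results, requests


if __name__ == "__main__":
    print(hash_join(R1, R2))
    print(bind_join(R1, remote_lookup))
```

Both joins return [(1, 42, 'A'), (1, 42, 'G'), (4, 50, 'A')]. The bind join needs five requests (one per R1 tuple) but transfers only the three matching R2 tuples, whereas the hash join transfers all of R2.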
Slide 27
Join Cost Model
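Bind Join: R(q1) ⋈B R(q2')    Hash Join: R(q1) ⋈H R(q2)
$$\mathrm{cost}_{\bowtie_B}(q_1, q_2) = |R(q_1)| \cdot \mathrm{cost}_{tuple} + |R(q_1)| \cdot \mathrm{cost}_{query} + |R(q_2')| \cdot \mathrm{cost}_{tuple}$$
$$\mathrm{cost}_{\bowtie_H}(q_1, q_2) = |R(q_1)| \cdot \mathrm{cost}_{tuple} + |R(q_2)| \cdot \mathrm{cost}_{tuple} + 2 \cdot \mathrm{cost}_{query}$$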
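The two formulas, written out in Python together with a helper that picks the cheaper operator. The default cost constants are hypothetical:

```python
# Sketch: the bind-join and hash-join cost formulas, plus a chooser that
# picks the cheaper operator. The default cost constants are hypothetical.


def cost_bind_join(r_q1: float, r_q2_bound: float, cost_tuple: float, cost_query: float) -> float:
    # |R(q1)|*cost_tuple + |R(q1)|*cost_query + |R(q2')|*cost_tuple
    return r_q1 * cost_tuple + r_q1 * cost_query + r_q2_bound * cost_tuple


def cost_hash_join(r_q1: float, r_q2: float, cost_tuple: float, cost_query: float) -> float:
    # |R(q1)|*cost_tuple + |R(q2)|*cost_tuple + 2*cost_query
    return r_q1 * cost_tuple + r_q2 * cost_tuple + 2 * cost_query


def choose_join(r_q1, r_q2, r_q2_bound, cost_tuple=1.0, cost_query=10.0):
    """Return ('bind' or 'hash', cost) for the cheaper of the two operators."""
    bind = cost_bind_join(r_q1, r_q2_bound, cost_tuple, cost_query)
    hash_ = cost_hash_join(r_q1, r_q2, cost_tuple, cost_query)
    return ("bind", bind) if bind < hash_ else ("hash", hash_)
```

Subtracting the shared |R(q1)|·cost_tuple term shows that the bind join wins when |R(q1)|·cost_query + |R(q2')|·cost_tuple < |R(q2)|·cost_tuple + 2·cost_query, i.e. when few left-hand bindings meet a large right-hand relation. With the default constants, choose_join(5, 5, 3) picks the hash join (30 vs. 58) and choose_join(5, 10000, 3) the bind join (58 vs. 10025).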
Slide 28
SPARQL Semi Join
Slide 29
SPLENDID Architecture
Slide 30
FedBench Datasets
● Cross Domain
● Life Science
● Linked Data
Slide 31
Data Source Selection: Requests
Slide 32
Conclusion
Linked Open Data voiD
Web-scale Query Processing
SPLENDID
Ad

More Related Content

What's hot (20)

ProteomeXchange Experience: PXD Identifiers and Release of Data on Acceptance...
ProteomeXchange Experience: PXD Identifiers and Release of Data on Acceptance...ProteomeXchange Experience: PXD Identifiers and Release of Data on Acceptance...
ProteomeXchange Experience: PXD Identifiers and Release of Data on Acceptance...
Juan Antonio Vizcaino
 
LDQL: A Query Language for the Web of Linked Data
LDQL: A Query Language for the Web of Linked DataLDQL: A Query Language for the Web of Linked Data
LDQL: A Query Language for the Web of Linked Data
Olaf Hartig
 
Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 3 (...
Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 3 (...Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 3 (...
Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 3 (...
Olaf Hartig
 
Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 1 (...
Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 1 (...Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 1 (...
Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 1 (...
Olaf Hartig
 
A Main Memory Index Structure to Query Linked Data
A Main Memory Index Structure to Query Linked DataA Main Memory Index Structure to Query Linked Data
A Main Memory Index Structure to Query Linked Data
Olaf Hartig
 
Property graph vs. RDF Triplestore comparison in 2020
Property graph vs. RDF Triplestore comparison in 2020Property graph vs. RDF Triplestore comparison in 2020
Property graph vs. RDF Triplestore comparison in 2020
Ontotext
 
Scaling the (evolving) web data –at low cost-
Scaling the (evolving) web data –at low cost-Scaling the (evolving) web data –at low cost-
Scaling the (evolving) web data –at low cost-
WU (Vienna University of Economics and Business)
 
Modelling context and statement-level metadata in knowledge graphs
Modelling context and statement-level metadata in knowledge graphsModelling context and statement-level metadata in knowledge graphs
Modelling context and statement-level metadata in knowledge graphs
Fabrizio Orlandi
 
HDF5 FastQuery
HDF5 FastQueryHDF5 FastQuery
HDF5 FastQuery
The HDF-EOS Tools and Information Center
 
Projection Indexes for HDF5 Datasets
Projection Indexes for HDF5 DatasetsProjection Indexes for HDF5 Datasets
Projection Indexes for HDF5 Datasets
The HDF-EOS Tools and Information Center
 
I Mapreduced a Neo store: Creating large Neo4j Databases with Hadoop
I Mapreduced a Neo store: Creating large Neo4j Databases with HadoopI Mapreduced a Neo store: Creating large Neo4j Databases with Hadoop
I Mapreduced a Neo store: Creating large Neo4j Databases with Hadoop
GoDataDriven
 
Behind the Scenes of KnetMiner: Towards Standardised and Interoperable Knowle...
Behind the Scenes of KnetMiner: Towards Standardised and Interoperable Knowle...Behind the Scenes of KnetMiner: Towards Standardised and Interoperable Knowle...
Behind the Scenes of KnetMiner: Towards Standardised and Interoperable Knowle...
Rothamsted Research, UK
 
Reasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven Recipes
Reasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven RecipesReasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven Recipes
Reasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven Recipes
Ontotext
 
Efficient RDF Interchange (ERI) Format for RDF Data Streams
Efficient RDF Interchange (ERI) Format for RDF Data StreamsEfficient RDF Interchange (ERI) Format for RDF Data Streams
Efficient RDF Interchange (ERI) Format for RDF Data Streams
WU (Vienna University of Economics and Business)
 
Hdf Augmentation: Interoperability in the Last Mile
Hdf Augmentation: Interoperability in the Last MileHdf Augmentation: Interoperability in the Last Mile
Hdf Augmentation: Interoperability in the Last Mile
Ted Habermann
 
RDB2RDF, an overview of R2RML and Direct Mapping
RDB2RDF, an overview of R2RML and Direct MappingRDB2RDF, an overview of R2RML and Direct Mapping
RDB2RDF, an overview of R2RML and Direct Mapping
Boris Villazón-Terrazas
 
Inference on the Semantic Web
Inference on the Semantic WebInference on the Semantic Web
Inference on the Semantic Web
Myungjin Lee
 
The HDF Product Designer – Interoperability in the First Mile
The HDF Product Designer – Interoperability in the First MileThe HDF Product Designer – Interoperability in the First Mile
The HDF Product Designer – Interoperability in the First Mile
Ted Habermann
 
HDF Tools Tutorial
HDF Tools TutorialHDF Tools Tutorial
HDF Tools Tutorial
The HDF-EOS Tools and Information Center
 
Introduction to HDF5 Data and Programming Models
Introduction to HDF5 Data and Programming ModelsIntroduction to HDF5 Data and Programming Models
Introduction to HDF5 Data and Programming Models
The HDF-EOS Tools and Information Center
 
ProteomeXchange Experience: PXD Identifiers and Release of Data on Acceptance...
ProteomeXchange Experience: PXD Identifiers and Release of Data on Acceptance...ProteomeXchange Experience: PXD Identifiers and Release of Data on Acceptance...
ProteomeXchange Experience: PXD Identifiers and Release of Data on Acceptance...
Juan Antonio Vizcaino
 
LDQL: A Query Language for the Web of Linked Data
LDQL: A Query Language for the Web of Linked DataLDQL: A Query Language for the Web of Linked Data
LDQL: A Query Language for the Web of Linked Data
Olaf Hartig
 
Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 3 (...
Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 3 (...Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 3 (...
Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 3 (...
Olaf Hartig
 
Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 1 (...
Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 1 (...Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 1 (...
Tutorial "An Introduction to SPARQL and Queries over Linked Data" Chapter 1 (...
Olaf Hartig
 
A Main Memory Index Structure to Query Linked Data
A Main Memory Index Structure to Query Linked DataA Main Memory Index Structure to Query Linked Data
A Main Memory Index Structure to Query Linked Data
Olaf Hartig
 
Property graph vs. RDF Triplestore comparison in 2020
Property graph vs. RDF Triplestore comparison in 2020Property graph vs. RDF Triplestore comparison in 2020
Property graph vs. RDF Triplestore comparison in 2020
Ontotext
 
Modelling context and statement-level metadata in knowledge graphs
Modelling context and statement-level metadata in knowledge graphsModelling context and statement-level metadata in knowledge graphs
Modelling context and statement-level metadata in knowledge graphs
Fabrizio Orlandi
 
I Mapreduced a Neo store: Creating large Neo4j Databases with Hadoop
I Mapreduced a Neo store: Creating large Neo4j Databases with HadoopI Mapreduced a Neo store: Creating large Neo4j Databases with Hadoop
I Mapreduced a Neo store: Creating large Neo4j Databases with Hadoop
GoDataDriven
 
Behind the Scenes of KnetMiner: Towards Standardised and Interoperable Knowle...
Behind the Scenes of KnetMiner: Towards Standardised and Interoperable Knowle...Behind the Scenes of KnetMiner: Towards Standardised and Interoperable Knowle...
Behind the Scenes of KnetMiner: Towards Standardised and Interoperable Knowle...
Rothamsted Research, UK
 
Reasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven Recipes
Reasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven RecipesReasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven Recipes
Reasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven Recipes
Ontotext
 
Hdf Augmentation: Interoperability in the Last Mile
Hdf Augmentation: Interoperability in the Last MileHdf Augmentation: Interoperability in the Last Mile
Hdf Augmentation: Interoperability in the Last Mile
Ted Habermann
 
RDB2RDF, an overview of R2RML and Direct Mapping
RDB2RDF, an overview of R2RML and Direct MappingRDB2RDF, an overview of R2RML and Direct Mapping
RDB2RDF, an overview of R2RML and Direct Mapping
Boris Villazón-Terrazas
 
Inference on the Semantic Web
Inference on the Semantic WebInference on the Semantic Web
Inference on the Semantic Web
Myungjin Lee
 
The HDF Product Designer – Interoperability in the First Mile
The HDF Product Designer – Interoperability in the First MileThe HDF Product Designer – Interoperability in the First Mile
The HDF Product Designer – Interoperability in the First Mile
Ted Habermann
 

Similar to Distributed Query Processing for Federated RDF Data Management (20)

2009 0807 Lod Gmod
2009 0807 Lod Gmod2009 0807 Lod Gmod
2009 0807 Lod Gmod
Jun Zhao
 
Tese phd
Tese phdTese phd
Tese phd
Rodrigo Senra
 
Linked Data for improved organization of research data
Linked Data  for improved organization  of research dataLinked Data  for improved organization  of research data
Linked Data for improved organization of research data
Samuel Lampa
 
Linked Open Data (LOD) part 2
Linked Open Data (LOD)  part 2Linked Open Data (LOD)  part 2
Linked Open Data (LOD) part 2
IPLODProject
 
Semantic Web talk TEMPLATE
Semantic Web talk TEMPLATESemantic Web talk TEMPLATE
Semantic Web talk TEMPLATE
Oleksiy Pylypenko
 
Querying Linked Data
Querying Linked DataQuerying Linked Data
Querying Linked Data
EUCLID project
 
Big Data Processing using Apache Spark and Clojure
Big Data Processing using Apache Spark and ClojureBig Data Processing using Apache Spark and Clojure
Big Data Processing using Apache Spark and Clojure
Dr. Christian Betz
 
Linking the world with Python and Semantics
Linking the world with Python and SemanticsLinking the world with Python and Semantics
Linking the world with Python and Semantics
Tatiana Al-Chueyr
 
Producing, publishing and consuming linked data - CSHALS 2013
Producing, publishing and consuming linked data - CSHALS 2013Producing, publishing and consuming linked data - CSHALS 2013
Producing, publishing and consuming linked data - CSHALS 2013
François Belleau
 
Transforming Your Data with GraphDB: GraphDB Fundamentals, Jan 2018
Transforming Your Data with GraphDB: GraphDB Fundamentals, Jan 2018Transforming Your Data with GraphDB: GraphDB Fundamentals, Jan 2018
Transforming Your Data with GraphDB: GraphDB Fundamentals, Jan 2018
Ontotext
 
Data Integration And Visualization
Data Integration And VisualizationData Integration And Visualization
Data Integration And Visualization
Ivan Ermilov
 
Accessing the Linked Open Data Cloud via ODBC
Accessing the Linked Open Data Cloud via ODBCAccessing the Linked Open Data Cloud via ODBC
Accessing the Linked Open Data Cloud via ODBC
Kingsley Uyi Idehen
 
LiveLinkedData - TransWebData - Nantes 2013
LiveLinkedData - TransWebData - Nantes 2013LiveLinkedData - TransWebData - Nantes 2013
LiveLinkedData - TransWebData - Nantes 2013
Luis Daniel Ibáñez
 
Visualize open data with Plone - eea.daviz PLOG 2013
Visualize open data with Plone - eea.daviz PLOG 2013Visualize open data with Plone - eea.daviz PLOG 2013
Visualize open data with Plone - eea.daviz PLOG 2013
Antonio De Marinis
 
Grails And The Semantic Web
Grails And The Semantic WebGrails And The Semantic Web
Grails And The Semantic Web
william_greenly
 
4th Natural Language Interface over the Web of Data (NLIWoD) workshop and QAL...
4th Natural Language Interface over the Web of Data (NLIWoD) workshop and QAL...4th Natural Language Interface over the Web of Data (NLIWoD) workshop and QAL...
4th Natural Language Interface over the Web of Data (NLIWoD) workshop and QAL...
Holistic Benchmarking of Big Linked Data
 
Publishing "5 star" data: the case for RDF
Publishing "5 star" data: the case for RDFPublishing "5 star" data: the case for RDF
Publishing "5 star" data: the case for RDF
PeterWinstanley1
 
Efficient source selection for sparql endpoint federation
Efficient source selection for sparql endpoint federationEfficient source selection for sparql endpoint federation
Efficient source selection for sparql endpoint federation
Muhammad Saleem
 
SPARTIQULATION - Verbalizing SPARQL queries
SPARTIQULATION - Verbalizing SPARQL queriesSPARTIQULATION - Verbalizing SPARQL queries
SPARTIQULATION - Verbalizing SPARQL queries
Basil Ell
 
Democratizing Big Semantic Data management
Democratizing Big Semantic Data managementDemocratizing Big Semantic Data management
Democratizing Big Semantic Data management
WU (Vienna University of Economics and Business)
 
2009 0807 Lod Gmod
2009 0807 Lod Gmod2009 0807 Lod Gmod
2009 0807 Lod Gmod
Jun Zhao
 
Linked Data for improved organization of research data
Linked Data  for improved organization  of research dataLinked Data  for improved organization  of research data
Linked Data for improved organization of research data
Samuel Lampa
 
Linked Open Data (LOD) part 2
Linked Open Data (LOD)  part 2Linked Open Data (LOD)  part 2
Linked Open Data (LOD) part 2
IPLODProject
 
Big Data Processing using Apache Spark and Clojure
Big Data Processing using Apache Spark and ClojureBig Data Processing using Apache Spark and Clojure
Big Data Processing using Apache Spark and Clojure
Dr. Christian Betz
 
Linking the world with Python and Semantics
Linking the world with Python and SemanticsLinking the world with Python and Semantics
Linking the world with Python and Semantics
Tatiana Al-Chueyr
 
Producing, publishing and consuming linked data - CSHALS 2013
Producing, publishing and consuming linked data - CSHALS 2013Producing, publishing and consuming linked data - CSHALS 2013
Producing, publishing and consuming linked data - CSHALS 2013
François Belleau
 
Transforming Your Data with GraphDB: GraphDB Fundamentals, Jan 2018
Transforming Your Data with GraphDB: GraphDB Fundamentals, Jan 2018Transforming Your Data with GraphDB: GraphDB Fundamentals, Jan 2018
Transforming Your Data with GraphDB: GraphDB Fundamentals, Jan 2018
Ontotext
 
Data Integration And Visualization
Data Integration And VisualizationData Integration And Visualization
Data Integration And Visualization
Ivan Ermilov
 
Accessing the Linked Open Data Cloud via ODBC
Accessing the Linked Open Data Cloud via ODBCAccessing the Linked Open Data Cloud via ODBC
Accessing the Linked Open Data Cloud via ODBC
Kingsley Uyi Idehen
 
LiveLinkedData - TransWebData - Nantes 2013
LiveLinkedData - TransWebData - Nantes 2013LiveLinkedData - TransWebData - Nantes 2013
LiveLinkedData - TransWebData - Nantes 2013
Luis Daniel Ibáñez
 
Visualize open data with Plone - eea.daviz PLOG 2013
Visualize open data with Plone - eea.daviz PLOG 2013Visualize open data with Plone - eea.daviz PLOG 2013
Visualize open data with Plone - eea.daviz PLOG 2013
Antonio De Marinis
 
Grails And The Semantic Web
Grails And The Semantic WebGrails And The Semantic Web
Grails And The Semantic Web
william_greenly
 
4th Natural Language Interface over the Web of Data (NLIWoD) workshop and QAL...
4th Natural Language Interface over the Web of Data (NLIWoD) workshop and QAL...4th Natural Language Interface over the Web of Data (NLIWoD) workshop and QAL...
4th Natural Language Interface over the Web of Data (NLIWoD) workshop and QAL...
Holistic Benchmarking of Big Linked Data
 
Publishing "5 star" data: the case for RDF
Publishing "5 star" data: the case for RDFPublishing "5 star" data: the case for RDF
Publishing "5 star" data: the case for RDF
PeterWinstanley1
 
Efficient source selection for sparql endpoint federation
Efficient source selection for sparql endpoint federationEfficient source selection for sparql endpoint federation
Efficient source selection for sparql endpoint federation
Muhammad Saleem
 
SPARTIQULATION - Verbalizing SPARQL queries
SPARTIQULATION - Verbalizing SPARQL queriesSPARTIQULATION - Verbalizing SPARQL queries
SPARTIQULATION - Verbalizing SPARQL queries
Basil Ell
 
Ad

Recently uploaded (13)

introduction to html and cssIntroHTML.ppt
introduction to html and cssIntroHTML.pptintroduction to html and cssIntroHTML.ppt
introduction to html and cssIntroHTML.ppt
SherifElGohary7
 
DEF CON 25 - Whitney-Merrill-and-Terrell-McSweeny-Tick-Tick-Boom-Tech-and-the...
DEF CON 25 - Whitney-Merrill-and-Terrell-McSweeny-Tick-Tick-Boom-Tech-and-the...DEF CON 25 - Whitney-Merrill-and-Terrell-McSweeny-Tick-Tick-Boom-Tech-and-the...
DEF CON 25 - Whitney-Merrill-and-Terrell-McSweeny-Tick-Tick-Boom-Tech-and-the...
werhkr1
 
Breaking Down the Latest Spectrum Internet Plans.pdf
Breaking Down the Latest Spectrum Internet Plans.pdfBreaking Down the Latest Spectrum Internet Plans.pdf
Breaking Down the Latest Spectrum Internet Plans.pdf
Internet Bundle Now
 
Java developer-friendly frontends: Build UIs without the JavaScript hassle- JCON
Java developer-friendly frontends: Build UIs without the JavaScript hassle- JCONJava developer-friendly frontends: Build UIs without the JavaScript hassle- JCON
Java developer-friendly frontends: Build UIs without the JavaScript hassle- JCON
Jago de Vreede
 
Paper: World Game (s) Great Redesign.pdf
Paper: World Game (s) Great Redesign.pdfPaper: World Game (s) Great Redesign.pdf
Paper: World Game (s) Great Redesign.pdf
Steven McGee
 
Presentation Mehdi Monitorama 2022 Cancer and Monitoring
Presentation Mehdi Monitorama 2022 Cancer and MonitoringPresentation Mehdi Monitorama 2022 Cancer and Monitoring
Presentation Mehdi Monitorama 2022 Cancer and Monitoring
mdaoudi
 
How to Install & Activate ListGrabber - eGrabber
How to Install & Activate ListGrabber - eGrabberHow to Install & Activate ListGrabber - eGrabber
How to Install & Activate ListGrabber - eGrabber
eGrabber
 
ProjectArtificial Intelligence Good or Evil.pptx
ProjectArtificial Intelligence Good or Evil.pptxProjectArtificial Intelligence Good or Evil.pptx
ProjectArtificial Intelligence Good or Evil.pptx
OlenaKotovska
 
GiacomoVacca - WebRTC - troubleshooting media negotiation.pdf
GiacomoVacca - WebRTC - troubleshooting media negotiation.pdfGiacomoVacca - WebRTC - troubleshooting media negotiation.pdf
GiacomoVacca - WebRTC - troubleshooting media negotiation.pdf
Giacomo Vacca
 
plataforma virtual E learning y sus características.pdf
plataforma virtual E learning y sus características.pdfplataforma virtual E learning y sus características.pdf
plataforma virtual E learning y sus características.pdf
valdiviesovaleriamis
 
IoT PPT introduction to internet of things
IoT PPT introduction to internet of thingsIoT PPT introduction to internet of things
IoT PPT introduction to internet of things
VaishnaviPatil3995
 
Cloud-to-cloud Migration presentation.pptx
Cloud-to-cloud Migration presentation.pptxCloud-to-cloud Migration presentation.pptx
Cloud-to-cloud Migration presentation.pptx
marketing140789
 
The Hidden Risks of Hiring Hackers to Change Grades: An Awareness Guide
The Hidden Risks of Hiring Hackers to Change Grades: An Awareness GuideThe Hidden Risks of Hiring Hackers to Change Grades: An Awareness Guide
The Hidden Risks of Hiring Hackers to Change Grades: An Awareness Guide
russellpeter1995
 
introduction to html and cssIntroHTML.ppt
introduction to html and cssIntroHTML.pptintroduction to html and cssIntroHTML.ppt
introduction to html and cssIntroHTML.ppt
SherifElGohary7
 
DEF CON 25 - Whitney-Merrill-and-Terrell-McSweeny-Tick-Tick-Boom-Tech-and-the...
DEF CON 25 - Whitney-Merrill-and-Terrell-McSweeny-Tick-Tick-Boom-Tech-and-the...DEF CON 25 - Whitney-Merrill-and-Terrell-McSweeny-Tick-Tick-Boom-Tech-and-the...
DEF CON 25 - Whitney-Merrill-and-Terrell-McSweeny-Tick-Tick-Boom-Tech-and-the...
werhkr1
 
Breaking Down the Latest Spectrum Internet Plans.pdf
Breaking Down the Latest Spectrum Internet Plans.pdfBreaking Down the Latest Spectrum Internet Plans.pdf
Breaking Down the Latest Spectrum Internet Plans.pdf
Internet Bundle Now
 
Java developer-friendly frontends: Build UIs without the JavaScript hassle- JCON
Java developer-friendly frontends: Build UIs without the JavaScript hassle- JCONJava developer-friendly frontends: Build UIs without the JavaScript hassle- JCON
Java developer-friendly frontends: Build UIs without the JavaScript hassle- JCON
Jago de Vreede
 
Paper: World Game (s) Great Redesign.pdf
Paper: World Game (s) Great Redesign.pdfPaper: World Game (s) Great Redesign.pdf
Paper: World Game (s) Great Redesign.pdf
Steven McGee
 
Presentation Mehdi Monitorama 2022 Cancer and Monitoring
Presentation Mehdi Monitorama 2022 Cancer and MonitoringPresentation Mehdi Monitorama 2022 Cancer and Monitoring
Presentation Mehdi Monitorama 2022 Cancer and Monitoring
mdaoudi
 
How to Install & Activate ListGrabber - eGrabber
How to Install & Activate ListGrabber - eGrabberHow to Install & Activate ListGrabber - eGrabber
How to Install & Activate ListGrabber - eGrabber
eGrabber
 
ProjectArtificial Intelligence Good or Evil.pptx
ProjectArtificial Intelligence Good or Evil.pptxProjectArtificial Intelligence Good or Evil.pptx
ProjectArtificial Intelligence Good or Evil.pptx
OlenaKotovska
 
GiacomoVacca - WebRTC - troubleshooting media negotiation.pdf
GiacomoVacca - WebRTC - troubleshooting media negotiation.pdfGiacomoVacca - WebRTC - troubleshooting media negotiation.pdf
GiacomoVacca - WebRTC - troubleshooting media negotiation.pdf
Giacomo Vacca
 
plataforma virtual E learning y sus características.pdf
plataforma virtual E learning y sus características.pdfplataforma virtual E learning y sus características.pdf
plataforma virtual E learning y sus características.pdf
valdiviesovaleriamis
 
IoT PPT introduction to internet of things
IoT PPT introduction to internet of thingsIoT PPT introduction to internet of things
IoT PPT introduction to internet of things
VaishnaviPatil3995
 
Cloud-to-cloud Migration presentation.pptx
Cloud-to-cloud Migration presentation.pptxCloud-to-cloud Migration presentation.pptx
Cloud-to-cloud Migration presentation.pptx
marketing140789
 
The Hidden Risks of Hiring Hackers to Change Grades: An Awareness Guide
The Hidden Risks of Hiring Hackers to Change Grades: An Awareness GuideThe Hidden Risks of Hiring Hackers to Change Grades: An Awareness Guide
The Hidden Risks of Hiring Hackers to Change Grades: An Awareness Guide
russellpeter1995
 
Ad

Distributed Query Processing for Federated RDF Data Management

  • 1. Distributed Query Processing for Federated RDF Data Management Olaf Görlitz 07.11.2014
  • 2. Olaf Görlitz: Distributed Query Processing for Federated RDF Data Management 07.11.2014 Slide 2 The Linked Open Data Cloud Use as one large database!
  • 3. Life Science Scenario
    Find drugs for nutritional supplementation:
      SELECT ?drug ?id ?title WHERE {
        ?drug drugbank:drugCategory category:micronutrient .
        ?drug drugbank:casRegistryNumber ?id .
        ?keggDrug rdf:type kegg:Drug .
        ?keggDrug bio2rdf:xRef ?id .
        ?keggDrug purl:title ?title .
      }
  • 4. Linked Data Querying Paradigms
    Data Warehouse, Link Traversal, Federation
  • 5. Linked Data Querying Paradigms
    Requirements compared across Data Warehouse, Link Traversal and Federation:
    Query Expressiveness, Schema Mapping, Data Freshness, Result Completeness, Scalability, Flexibility, Availability, Performance
  • 6. Contributions
    ● Large Scale Information Retrieval: PINTS, Peer-to-Peer Statistics Management
      Görlitz, Sizov, Staab: PINTS: Peer-to-Peer Infrastructure for Tagging Systems. IPTPS'08
    ● RDF Federation & Query Optimization: SPLENDID, Distributed SPARQL Query Processing
      Görlitz, Staab: SPLENDID: SPARQL Endpoint Federation Exploiting VOID Descriptions. COLD'11
    ● Benchmarking RDF Federation Systems: SPLODGE, Linked Data Query Generation
      Görlitz, Thimm, Staab: SPLODGE: Systematic Generation of SPARQL Benchmark Queries for Linked Open Data. ISWC'12
  • 7. SPLENDID Federation
    Federated Databases vs. Federated RDF:
    ● Relational Schema vs. Implicit Schema, Ontologies
    ● Specific Data Wrappers vs. SPARQL endpoints
    ● Rich Data Statistics vs. Limited Statistics (voiD)
    Goal: execute complex SPARQL queries over federated RDF data sources
  • 8. SPLENDID Federation
    Pipeline: SPARQL Query → Source Selection → Query Optimization → Query Execution
      SELECT ?drug ?id ?title WHERE {
        ?drug drugbank:drugCategory category:micronutrient .
        ?drug drugbank:casRegistryNumber ?id .
        ?keggDrug bio2rdf:xRef ?id .
        ?keggDrug rdf:type kegg:Drug .
        ?keggDrug purl:title ?title .
      }
    Query tree: joins ⋈?drug, ⋈?id, ⋈?keggDrug, ⋈?keggDrug over the five triple patterns of the query
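The join variables on slide 8 can be reproduced by joining the triple patterns left-deep, in the order the query lists them. At each step we record the variables the new pattern shares with the results so far. This Python sketch assumes that left-deep order, and the tree SPLENDID actually builds may be shaped differently. `variables` and `left_deep_join_vars` are hypothetical helpers, not SPLENDID code.

```python
# Triple patterns of the slide 8 query, in the order the query lists them.
QUERY = [
    ("?drug", "drugbank:drugCategory", "category:micronutrient"),
    ("?drug", "drugbank:casRegistryNumber", "?id"),
    ("?keggDrug", "bio2rdf:xRef", "?id"),
    ("?keggDrug", "rdf:type", "kegg:Drug"),
    ("?keggDrug", "purl:title", "?title"),
]


def variables(pattern):
    """Return the set of variables (terms starting with '?') in a triple pattern."""
    return {term for term in pattern if term.startswith("?")}


def left_deep_join_vars(patterns):
    """Join variables of a left-deep plan that adds the patterns in list order.

    An empty list at some step would mean a cross product with the results so far.
    """
    bound = variables(patterns[0])
    join_vars = []
    for pattern in patterns[1:]:
        join_vars.append(sorted(bound & variables(pattern)))
        bound |= variables(pattern)
    return join_vars


print(left_deep_join_vars(QUERY))
# [['?drug'], ['?id'], ['?keggDrug'], ['?keggDrug']]
```

With the patterns in this order, every join has a shared variable. Putting the rdf:type pattern directly after the first pattern would produce a cross product instead.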
  • 9. Source Selection
    Pipeline: SPARQL Query → Source Selection → Query Optimization → Query Execution
    Objective: determine all relevant data sources
    ● DARQ: explicit 'capabilities'; query restrictions (bound predicates)
    ● FedX: ASK queries + caching; many (initial) requests; sub-query aggregation
    ● SPLENDID: voiD descriptions + ASK queries; sub-query aggregation
  • 10. Source Selection Example
      SELECT ?drug ?title WHERE {
        ?drug drugbank:drugCategory category:micronutrient .
        ?drug drugbank:casRegistryNumber ?id .
        ?keggDrug rdf:type kegg:Drug .
        ?keggDrug bio2rdf:xRef ?id .
        ?keggDrug purl:title ?title .
      }
    Sources selected via voiD and SPARQL ASK, as annotated on the slide: → KEGG, DBpedia, ChEBI; → KEGG; → DrugBank; SPARQL ASK → DrugBank, ChEBI; → KEGG
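The two-step source selection on slides 9 and 10 can be sketched minimally as follows. First, voiD predicate lists prune the candidate sources for each triple pattern. Then an ASK query is sent, but only when the pattern has a bound subject or object and more than one candidate remains. Several parts of this sketch are hypothetical: the voiD contents, that ASK rule, and the helpers `select_sources`, `aggregate` and `fake_ask`. `fake_ask` stands in for a real `ASK { s p o }` request to an endpoint. The last step groups patterns that have exactly one source into a single subquery per source, in the spirit of the sub-query aggregation listed on slide 9.

```python
# Hypothetical voiD property partitions: the predicates each source uses.
VOID_PREDICATES = {
    "DrugBank": {"drugbank:drugCategory", "drugbank:casRegistryNumber", "rdf:type"},
    "KEGG": {"rdf:type", "bio2rdf:xRef", "purl:title"},
    "ChEBI": {"rdf:type", "purl:title"},
    "DBpedia": {"rdf:type", "purl:title"},
}

PATTERNS = [
    ("?drug", "drugbank:drugCategory", "category:micronutrient"),
    ("?drug", "drugbank:casRegistryNumber", "?id"),
    ("?keggDrug", "rdf:type", "kegg:Drug"),
    ("?keggDrug", "bio2rdf:xRef", "?id"),
    ("?keggDrug", "purl:title", "?title"),
]


def is_var(term):
    return term.startswith("?")


def select_sources(patterns, void_predicates, ask):
    """Return, per triple pattern, the list of sources that may have matches."""
    selected = []
    for s, p, o in patterns:
        if is_var(p):
            # Unbound predicate: voiD cannot narrow the sources down.
            candidates = list(void_predicates)
        else:
            candidates = [d for d, preds in void_predicates.items() if p in preds]
        # Bound subject or object: voiD says nothing about concrete values,
        # so ask each remaining candidate (skipped when only one is left).
        if len(candidates) > 1 and not (is_var(s) and is_var(o)):
            candidates = [d for d in candidates if ask(d, (s, p, o))]
        selected.append(candidates)
    return selected


def aggregate(patterns, sources):
    """Group patterns with exactly one source into one subquery per source."""
    groups, shared = {}, []
    for pattern, srcs in zip(patterns, sources):
        if len(srcs) == 1:
            groups.setdefault(srcs[0], []).append(pattern)
        else:
            shared.append((pattern, srcs))
    return groups, shared


def fake_ask(source, pattern):
    """Hypothetical stand-in for sending ASK { s p o } to the source's endpoint."""
    return (source, pattern[2]) in {("DrugBank", "category:micronutrient"),
                                    ("KEGG", "kegg:Drug")}


sources = select_sources(PATTERNS, VOID_PREDICATES, fake_ask)
groups, shared = aggregate(PATTERNS, sources)
```

With these illustrative statistics, the purl:title pattern keeps three candidate sources (KEGG, ChEBI, DBpedia). The other four patterns collapse into one DrugBank subquery and one KEGG subquery.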
  • 11. Source Selection Result
    Join tree: ⋈?drug, ⋈?id, ⋈?keggDrug, ⋈?keggDrug over the triple patterns
      ?drug drugbank:drugCategory category:micronutrient
      ?drug drugbank:casRegistryNumber ?id
      ?keggDrug rdf:type kegg:Drug
      ?keggDrug bio2rdf:xRef ?id
      ?keggDrug purl:title ?title
  • 12. Query Optimization
    Pipeline: SPARQL Query → Source Selection → Query Optimization → Query Execution
    Objective: find the best (fastest) query execution plan
    ● DARQ: dynamic programming; custom statistics; only bound predicates; bind join
    ● FedX: join order heuristics; no statistics; join chains; bind join
    ● SPLENDID: dynamic programming; extended voiD statistics; bind + hash join
  • 13. Dynamic Programming
    ● iterate over all possible execution plans
    ● compare their cost (execution time) using bind join and hash join
    Join tree: ⋈?drug, ⋈?id, ⋈?keggDrug, ⋈?keggDrug over the triple patterns
      ?drug drugbank:drugCategory category:micronutrient
      ?drug drugbank:casRegistryNumber ?id
      ?keggDrug rdf:type kegg:Drug
      ?keggDrug bio2rdf:xRef ?id
      ?keggDrug purl:title ?title
    Cost model inputs: cost_send-query (sending one query), cost_receive-tuple (receiving one result tuple), card(R(q_i)) (number of results of subquery q_i)
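Below is a Python sketch of the dynamic-programming search on slide 13. It rests on assumptions that are not taken from SPLENDID: only left-deep plans are enumerated, every join uses one uniform selectivity `sel`, and the step costs are the incremental parts of the bind and hash join formulas on slide 27. "Incremental" means the cost of producing the left input is already counted. The constants `COST_QUERY` and `COST_TUPLE` and all function names are hypothetical.

```python
from itertools import combinations

# Hypothetical relative costs: sending one request vs. receiving one tuple.
COST_QUERY = 100.0
COST_TUPLE = 1.0


def join_step(left_card, right_card, sel):
    """Incremental cost of joining one more triple pattern to a partial plan.

    Returns (cost, output_cardinality, operator) for the cheaper of a hash
    join (fetch the pattern once) and a bind join (one request per left tuple).
    """
    out = left_card * right_card * sel
    hash_cost = COST_QUERY + right_card * COST_TUPLE
    bind_cost = left_card * COST_QUERY + out * COST_TUPLE
    if bind_cost < hash_cost:
        return bind_cost, out, "B"
    return hash_cost, out, "H"


def order_cost(order, cards, sel):
    """Cost of the left-deep plan that joins the patterns in `order`."""
    card = cards[order[0]]
    cost = COST_QUERY + card * COST_TUPLE
    for k in order[1:]:
        step, card, _ = join_step(card, cards[k], sel)
        cost += step
    return cost


def best_plan(cards, sel):
    """Keep only the cheapest left-deep plan for every subset of patterns."""
    n = len(cards)
    best = {frozenset([i]): (COST_QUERY + c * COST_TUPLE, c, f"q{i}")
            for i, c in enumerate(cards)}
    for size in range(2, n + 1):
        for subset in combinations(range(n), size):
            s = frozenset(subset)
            for k in subset:  # k is the pattern joined last
                left_cost, left_card, left_plan = best[s - {k}]
                step, out, op = join_step(left_card, cards[k], sel)
                candidate = (left_cost + step, out, f"({left_plan} join_{op} q{k})")
                if s not in best or candidate[0] < best[s][0]:
                    best[s] = candidate
    return best[frozenset(range(n))]


# Hypothetical cardinality estimates for three triple patterns q0..q2.
cost, card, plan = best_plan([10, 1000, 50], sel=0.01)
print(plan, cost)
```

`order_cost` evaluates a single join order. `best_plan` reaches the same minimum without trying every permutation. This works because the cost of adding one pattern depends only on which patterns were joined before, not on the order in which they were joined.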
  • 14. Cardinality Estimation
    Join tree: ⋈?drug, ⋈?id, ⋈?keggDrug, ⋈?keggDrug over the triple patterns
      ?drug drugbank:drugCategory category:micronutrient
      ?drug drugbank:casRegistryNumber ?id
      ?keggDrug rdf:type kegg:Drug
      ?keggDrug bio2rdf:xRef ?id
      ?keggDrug purl:title ?title
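  • 15. Cardinality Estimation (Triple Pattern)
    Assuming independence of s, p and o, the cardinality of a triple pattern in data source d ∈ D is
    $$\mathrm{card}_d(s,p,o) = |d| \cdot \mathrm{sel}_d(s) \cdot \mathrm{sel}_d(p) \cdot \mathrm{sel}_d(o), \quad d \in D$$
    Estimates from the voiD statistics of d, where ? marks an unbound position:
    $$\begin{aligned}
    \mathrm{card}_d(?,?,?) &= \mathrm{voiD}_d \to |d| \\
    \mathrm{card}_d(s,?,?) &= \frac{\mathrm{voiD}_d \to |d|}{\mathrm{voiD}_d \to |s|} \\
    \mathrm{card}_d(?,?,o) &= \frac{\mathrm{voiD}_d \to |d|}{\mathrm{voiD}_d \to |o|} \\
    \mathrm{card}_d(s,?,o) &= 1 \\
    \mathrm{card}_d(?,p,?) &= \mathrm{voiD}_d \to p \\
    \mathrm{card}_d(s,p,?) &= \frac{\mathrm{voiD}_d \to p}{\mathrm{voiD}_d \to |sp|} \\
    \mathrm{card}_d(?,p,o) &= \frac{\mathrm{voiD}_d \to p}{\mathrm{voiD}_d \to |op|} \\
    \mathrm{card}_d(s,p,o) &= 1 \\
    \mathrm{card}_d(?,\mathrm{rdf{:}type},T) &= \mathrm{voiD}_d \to T
    \end{aligned}$$
    In this notation:
    – |d| is the number of triples in d
    – p is the number of triples with predicate p
    – |s| and |o| are the numbers of distinct subjects and objects
    – |sp| and |op| are the numbers of distinct subjects and objects of predicate p
    – T is the number of instances of class T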
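The case table on slide 15 maps directly onto a lookup over voiD statistics. In this Python sketch, three numbers are the ChEBI figures shown on slide 22: the total triple count (732744), the `chebi:Compound` instance count (50477) and the `bio:formula` triple count (39555). Every distinct-subject and distinct-object count is invented for illustration. The `VoidStats` field names are hypothetical, not voiD vocabulary terms, and returning 0 for a predicate the source does not use is an added case.

```python
from dataclasses import dataclass, field


@dataclass
class VoidStats:
    """voiD statistics of one data source d (field names are illustrative)."""
    triples: int                                            # voiD_d -> |d|
    distinct_subjects: int                                  # voiD_d -> |s|
    distinct_objects: int                                   # voiD_d -> |o|
    predicate_triples: dict = field(default_factory=dict)   # p -> voiD_d -> p
    predicate_subjects: dict = field(default_factory=dict)  # p -> voiD_d -> |sp|
    predicate_objects: dict = field(default_factory=dict)   # p -> voiD_d -> |op|
    class_instances: dict = field(default_factory=dict)     # T -> voiD_d -> T


def is_var(term):
    return term.startswith("?")


def card(pattern, v):
    """Estimated number of matches of a triple pattern in the source with stats v."""
    s, p, o = pattern
    if is_var(s) and p == "rdf:type" and not is_var(o):
        return v.class_instances.get(o, 0)         # card(?, rdf:type, T)
    if is_var(p):
        if is_var(s) and is_var(o):
            return v.triples                        # card(?, ?, ?)
        if is_var(o):
            return v.triples / v.distinct_subjects  # card(s, ?, ?)
        if is_var(s):
            return v.triples / v.distinct_objects   # card(?, ?, o)
        return 1                                    # card(s, ?, o)
    n = v.predicate_triples.get(p, 0)
    if n == 0:
        return 0                                    # predicate unused in d (added case)
    if is_var(s) and is_var(o):
        return n                                    # card(?, p, ?)
    if is_var(o):
        return n / v.predicate_subjects[p]          # card(s, p, ?)
    if is_var(s):
        return n / v.predicate_objects[p]           # card(?, p, o)
    return 1                                        # card(s, p, o)


# Triple, chebi:Compound and bio:formula counts as on slide 22;
# the distinct-value counts are made up.
CHEBI = VoidStats(
    triples=732744, distinct_subjects=60000, distinct_objects=300000,
    predicate_triples={"bio:formula": 39555},
    predicate_subjects={"bio:formula": 39000},
    predicate_objects={"bio:formula": 30000},
    class_instances={"chebi:Compound": 50477},
)

print(card(("?c", "rdf:type", "chebi:Compound"), CHEBI))  # 50477
```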
  • 16. Cardinality Estimation (Basic Graph Pattern)
    Star pattern around ?keggDrug: ?keggDrug rdf:type kegg:Drug; ?keggDrug bio2rdf:xRef rn:R01786; ?keggDrug purl:title ?title
    Path pattern: ?drug (rdf:type drugbank:Drug) linked via owl:sameAs to ?keggDrug (rdf:type kegg:Drug)
    $$\mathrm{card}^{*}_d(P_1 \bowtie P_2 \bowtie P_3) = \min\big(\mathrm{card}_d(P_1), \mathrm{card}_d(P_2)\big) \cdot \frac{\mathrm{voiD}_d \to p_3}{\mathrm{voiD}_d \to |sp_3|}$$
    $$\widetilde{\mathrm{card}}_{d,d'}(P_1 \bowtie P_2) = \mathrm{card}_d(P_1) \cdot \mathrm{card}_{d'}(P_2) \cdot \mathrm{sel}_{d,d'}(P_1 \bowtie P_2)$$
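Both estimates on slide 16 are single products. This Python sketch reads the star formula as follows, which is an assumption: P1 and P2 are the two restricting patterns of the star (rdf:type kegg:Drug and bio2rdf:xRef rn:R01786), and P3 is the purl:title pattern. The cardinalities, the purl:title statistics and the join selectivity below are made-up numbers, and the function names are hypothetical.

```python
def star_card(card_p1, card_p2, p3_triples, p3_subjects):
    """card*_d(P1 join P2 join P3) for three patterns sharing one subject.

    The more selective of P1 and P2 bounds the number of subjects; P3 then
    contributes p3_triples / p3_subjects values per subject on average.
    """
    return min(card_p1, card_p2) * p3_triples / p3_subjects


def path_card(card_p1, card_p2, join_sel):
    """card~_{d,d'}(P1 join P2) for P1 evaluated at source d and P2 at d'."""
    return card_p1 * card_p2 * join_sel


# Hypothetical star: 10000 kegg:Drug instances, 1 match for the
# bio2rdf:xRef rn:R01786 pattern, 20000 purl:title triples on 10000 subjects.
print(star_card(10000, 1, 20000, 10000))  # 2.0
# Hypothetical path: 200 and 300 matches, join selectivity 0.001.
print(path_card(200, 300, 0.001))
```

Taking the minimum in the star formula keeps the estimate anchored to the most selective bound pattern. The path formula instead needs a join selectivity between two different sources, which voiD alone does not provide.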
  • 17. Query Optimization
    Pipeline: SPARQL Query → Source Selection → Query Optimization → Query Execution
    Optimized plan: ⋈?drug, bind join ⋈B(?id), ⋈?keggDrug, hash join ⋈H(?keggDrug) over the triple patterns
      ?drug drugbank:drugCategory category:micronutrient
      ?drug drugbank:casRegistryNumber ?id
      ?keggDrug rdf:type kegg:Drug
      ?keggDrug bio2rdf:xRef ?id
      ?keggDrug purl:title ?title
  • 18. Evaluation Methodology
    Compare with state-of-the-art federation systems
    – use multiple linked datasets
    – with representative characteristics
    – execute 'typical' SPARQL queries
    – in a reproducible benchmark setup
    Benchmark: FedBench
  • 19. Evaluation Results
  • 20. Conclusion
    ● Federation for Linked Open Data
      – Database + Semantic Web technology
      – Efficient distributed query processing
      – Extension of voiD statistics
    ● Query generation for federation benchmarks
    ● Efficient statistics management in P2P networks
  • 21. Thank You
  • 22. VoiD Descriptions/Statistics
    ● General information
    ● Basic statistics, e.g. triples = 732744
    ● Type statistics, e.g. chebi:Compound = 50477
    ● Predicate statistics, e.g. bio:formula = 39555
  • 23. VoiD statistics extension
  • 24. State of the Art
                          DARQ                     AliBaba      FedX                          SPLENDID
      Statistics          ServiceDesc              –            –                             VoiD
      Source Selection    Statistics (predicates)  All sources  ASK queries                   Statistics + ASK queries
      Query Optimization  DynProg                  Heuristics   Heuristics                    DynProg
      Query Execution     Bind join                Bind join    Bound join + parallelization  Bind join + hash join
  • 25. SPARQL limitations
    ● Query protocol
    ● Only SPARQL endpoints
    ● Endpoint limitations
      – SPARQL version
      – Result size
      – Data rate
      – Availability
  • 26. Join Implementation
    Bind Join R1 ⋈B R2 and Hash Join R1 ⋈H R2, with the example relations
      ?id  ?y        ?id  ?x
      1    42        1    'A'
      2    13        1    'G'
      3    20        4    'A'
      4    50        7    'A'
      5    3         7    'C'
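The two operators on slide 26 can be simulated locally on the example relations. This Python sketch assumes the first table (?id, ?y) is the left input and the second (?id, ?x) is the right input. `fetch` is a hypothetical stand-in for a request to the remote endpoint with ?id bound. The bind join issues one such request per left tuple. The hash join fetches the right relation once and joins locally. Real systems often batch several bindings per request; that refinement is not shown here.

```python
from collections import defaultdict

# Example relations from slide 26 (which one is left/right is assumed).
LEFT = [{"?id": 1, "?y": 42}, {"?id": 2, "?y": 13}, {"?id": 3, "?y": 20},
        {"?id": 4, "?y": 50}, {"?id": 5, "?y": 3}]
RIGHT = [{"?id": 1, "?x": "A"}, {"?id": 1, "?x": "G"}, {"?id": 4, "?x": "A"},
         {"?id": 7, "?x": "A"}, {"?id": 7, "?x": "C"}]


def hash_join(left, right, var):
    """Both inputs fetched once: build a hash table on `right`, probe with `left`."""
    table = defaultdict(list)
    for right_row in right:
        table[right_row[var]].append(right_row)
    return [{**left_row, **right_row}
            for left_row in left
            for right_row in table.get(left_row[var], [])]


def bind_join(left, fetch, var):
    """Send one request per left tuple with `var` bound to that tuple's value."""
    results, requests = [], 0
    for left_row in left:
        requests += 1
        for right_row in fetch(var, left_row[var]):  # remote call in a real federation
            results.append({**left_row, **right_row})
    return results, requests


def fetch(var, value):
    """Hypothetical endpoint stand-in: rows of RIGHT matching the binding."""
    return [row for row in RIGHT if row[var] == value]


joined = hash_join(LEFT, RIGHT, "?id")
bound, n_requests = bind_join(LEFT, fetch, "?id")
print(joined)
print(n_requests)  # 5
```

Both joins return the same three rows: ?id = 1 twice and ?id = 4 once. The bind join needs five remote requests to get them, and that request count is what the cost model on slide 27 weighs against fetching the right relation in full.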
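  • 27. Join Cost Model
    Bind Join R(q1) ⋈B R(q2') and Hash Join R(q1) ⋈H R(q2):
    $$\mathrm{cost}_{\bowtie_B}(q_1,q_2) = |R(q_1)| \cdot \mathrm{cost}_{tuple} + |R(q_1)| \cdot \mathrm{cost}_{query} + |R(q_2')| \cdot \mathrm{cost}_{tuple}$$
    $$\mathrm{cost}_{\bowtie_H}(q_1,q_2) = |R(q_1)| \cdot \mathrm{cost}_{tuple} + |R(q_2)| \cdot \mathrm{cost}_{tuple} + 2 \cdot \mathrm{cost}_{query}$$
    R(q2') denotes the results of q2 evaluated with the join variables bound to values from R(q1).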
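The two formulas on slide 27 can be evaluated directly to pick the cheaper operator. In this Python sketch, the default constants are assumptions: one request is taken to cost as much as 100 received tuples. The parameter `r_q2_bound` stands for |R(q2')|, and the function names are hypothetical.

```python
def bind_join_cost(r_q1, r_q2_bound, cost_tuple, cost_query):
    """cost_B(q1, q2) = |R(q1)|*tuple + |R(q1)|*query + |R(q2')|*tuple."""
    return r_q1 * cost_tuple + r_q1 * cost_query + r_q2_bound * cost_tuple


def hash_join_cost(r_q1, r_q2, cost_tuple, cost_query):
    """cost_H(q1, q2) = |R(q1)|*tuple + |R(q2)|*tuple + 2*query."""
    return r_q1 * cost_tuple + r_q2 * cost_tuple + 2 * cost_query


def choose_join(r_q1, r_q2, r_q2_bound, cost_tuple=1, cost_query=100):
    """Return the cheaper operator and its cost (ties go to the hash join)."""
    bind = bind_join_cost(r_q1, r_q2_bound, cost_tuple, cost_query)
    hash_ = hash_join_cost(r_q1, r_q2, cost_tuple, cost_query)
    return ("bind", bind) if bind < hash_ else ("hash", hash_)


print(choose_join(10, 10000, 50))    # ('bind', 1060)
print(choose_join(1000, 500, 400))   # ('hash', 1700)
```

A small left input favours the bind join, because only a few requests are sent and the large R(q2) is never transferred. A large left input makes the per-tuple request cost dominate, so the hash join wins.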
  • 28. SPARQL Semi Join
  • 29. SPLENDID Architecture
  • 30. FedBench Datasets
    ● Cross Domain
    ● Life Science
    ● Linked Data
  • 31. Data Source Selection: Requests
  • 32. Conclusion
    Linked Open Data, voiD, Web-scale Query Processing, SPLENDID