Discovering Related Data Sources in Data Portals

Andreas Wagner, Peter Haase, Achim Rettinger, Holger Lamm

1st International Workshop on Semantic Statistics
Sydney, Oct 22, 2013
  
Potential of Open (Statistics) Data
[Figure: World Bank]

fluidOps Open Data Portal
•  Data collection
   •  Integration of major open data catalogs
   •  Automated provisioning of 10,000s of data sets
•  Portal for search and exploration of data sets
   •  Rich metadata based on open standards
   •  Both descriptive and structural metadata
•  Integrated querying across interlinked data sets
   •  Easy-to-use queries against multiple data sets
   •  Using federation technologies
•  Self-service UI
   •  Custom queries and visualizations
   •  Widgets, dashboarding, etc.

Finding Related Data Sets
•  Many information needs require analysis of multiple data sets
   •  Example: Compare and correlate GDP, population and public debt of countries over time
•  Task of finding related data sets
   •  Identify data sets that are similar, but complementary
   •  To support queries across multiple data sets, e.g. in the form of joins and unions
•  Inspiration: Finding related tables
   •  Entity complement: same attributes, complementing entities
   •  Schema complement: same entities, complementing attributes
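
As a rough illustration of the two complement notions above, the following Python sketch checks them on toy tables and shows the union and join they enable. The table layout (rows keyed by country and year), the function names and all values are placeholders assumed for this example; they are not part of the portal. The checks are also looser than the slide's wording: they only require some new entities, or some shared entities plus at least one new attribute.

```python
# Hypothetical toy tables keyed by (country, year).
# All numbers are placeholders, not real statistics.
gdp_a = {("DE", 2009): {"gdp": 1.0}, ("DE", 2010): {"gdp": 2.0}}
gdp_b = {("DE", 2011): {"gdp": 3.0}, ("FR", 2010): {"gdp": 4.0}}
debt = {("DE", 2009): {"debt": 5.0}, ("DE", 2010): {"debt": 6.0}}


def attributes(table):
    """All attribute names used by any row of the table."""
    return {name for row in table.values() for name in row}


def is_entity_complement(t1, t2):
    """Same attributes, and t2 contributes entities that t1 lacks."""
    return attributes(t1) == attributes(t2) and bool(set(t2) - set(t1))


def is_schema_complement(t1, t2):
    """Some shared entities, and t2 contributes attributes that t1 lacks."""
    return bool(set(t1) & set(t2)) and bool(attributes(t2) - attributes(t1))


def union(t1, t2):
    """Entity complement in action: stack rows, keeping t1's row on key clashes."""
    merged = dict(t1)
    for key, row in t2.items():
        merged.setdefault(key, row)
    return merged


def join(t1, t2):
    """Schema complement in action: widen the rows of shared entities."""
    return {key: {**t1[key], **t2[key]} for key in t1.keys() & t2.keys()}


more_rows = union(gdp_a, gdp_b)   # four GDP rows instead of two
wider_rows = join(gdp_a, debt)    # GDP and debt side by side
```

Here `gdp_b` is an entity complement of `gdp_a` (it can be unioned), while `debt` is a schema complement (it can be joined).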
  
Finding Related Data Sources via Related Entities
•  Data Model: Data source is a set of multiple RDF graphs
•  Intuition: if data sources contain similar entities, they are somehow related
•  Approach:
   1.  Entity Extraction
   2.  Entity Similarity
   3.  Entity Clustering
[Figure: Entities from Source 1, Source 2 and Source 3, grouped into Cluster 1 and Cluster 2. Label: "Related?!"]
  
Related Entities (2)
1.  Entity Extraction
    –  Sample over entities in data graphs in D
    –  For each entity crawl its surrounding sub-graph [1]
2.  Entity Similarity
    –  Define dissimilarity measure between two entities based on kernel functions
    –  Compare entity structure and literals via different kernels [2,3]
3.  Entity Clustering
    –  Apply k-means clustering to discover similar entities [4]
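
The three steps above can be sketched in a few lines of Python. This is a minimal illustration under strong assumptions, not the authors' implementation: the sub-graph crawl is reduced to one hop, the kernel is a plain set-intersection kernel rather than the graph kernels of [2], seeding is a deterministic farthest-first choice, and all URIs and function names are hypothetical.

```python
import math


def extract_features(triples, entity):
    """Step 1 (simplified): the 1-hop sub-graph as (predicate, object) pairs."""
    return frozenset((p, o) for s, p, o in triples if s == entity)


def kernel(a, b):
    """Step 2: intersection kernel, the number of shared (predicate, object) pairs."""
    return len(a & b)


def dissimilarity(a, b):
    """Distance induced by the kernel: sqrt(k(a,a) - 2 k(a,b) + k(b,b))."""
    return math.sqrt(kernel(a, a) - 2 * kernel(a, b) + kernel(b, b))


def kernel_kmeans(items, k, max_iter=20):
    """Step 3: kernel k-means; returns one cluster label per item."""
    n = len(items)
    K = [[kernel(a, b) for b in items] for a in items]

    def d2(i, j):
        return K[i][i] - 2 * K[i][j] + K[j][j]

    # Farthest-first seeding (an assumption made here for determinism).
    seeds = [0]
    while len(seeds) < k:
        seeds.append(max(range(n), key=lambda i: min(d2(i, s) for s in seeds)))
    labels = [min(range(k), key=lambda c: d2(i, seeds[c])) for i in range(n)]

    for _ in range(max_iter):
        members = [[i for i in range(n) if labels[i] == c] for c in range(k)]

        def dist(i, c):
            # Squared distance from item i to the centroid of cluster c,
            # computed from kernel values only.
            m = members[c]
            if not m:
                return float("inf")
            cross = sum(K[i][j] for j in m) / len(m)
            intra = sum(K[a][b] for a in m for b in m) / len(m) ** 2
            return K[i][i] - 2 * cross + intra

        new_labels = [min(range(k), key=lambda c: dist(i, c)) for i in range(n)]
        if new_labels == labels:
            break
        labels = new_labels
    return labels


# Hypothetical triples from two sources ("wb:" and "es:" prefixes are made up).
triples = [
    ("wb:DE", "rdf:type", "ex:Country"), ("wb:DE", "ex:partOf", "ex:EU"),
    ("es:DE", "rdf:type", "ex:Country"), ("es:DE", "ex:partOf", "ex:EU"),
    ("wb:GDP", "rdf:type", "ex:Indicator"), ("wb:GDP", "ex:unit", "ex:USD"),
    ("es:GDP", "rdf:type", "ex:Indicator"), ("es:GDP", "ex:unit", "ex:EUR"),
]
entities = ["wb:DE", "es:DE", "wb:GDP", "es:GDP"]
feats = [extract_features(triples, e) for e in entities]
labels = kernel_kmeans(feats, k=2)
```

In this toy run the two country entities end up in one cluster and the two indicator entities in the other, so both sources contribute to both clusters, which is the signal the intuition above relies on.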
  
Contextualization Score
•  Contextualization score for data source D'' given D': ec(D''|D') and sc(D''|D')
   •  Entity complement score
   •  Schema complement score
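
The definitions of ec and sc are not spelled out in this text. Purely as an illustration of what such scores could measure, the sketch below computes two simple cluster-based proxies. These are assumptions made for this example and not the authors' scores; the function name, the input layout and the premise that entity keys are already aligned across sources are all hypothetical.

```python
def complement_proxies(target, candidate):
    """Toy proxies for ec(D''|D') and sc(D''|D').

    Both arguments map an entity key to (cluster_id, set_of_attribute_names).
    `target` plays the role of D', `candidate` the role of D''.
    """
    if not candidate:
        return 0.0, 0.0
    target_clusters = {cluster for cluster, _ in target.values()}
    # Entity-complement proxy: candidate entities that are new to the target
    # but fall into a cluster the target also populates.
    new_entities = sum(
        1 for key, (cluster, _) in candidate.items()
        if cluster in target_clusters and key not in target
    )
    # Schema-complement proxy: candidate entities the target already has,
    # described with at least one attribute the target lacks for them.
    new_attributes = sum(
        1 for key, (_, attrs) in candidate.items()
        if key in target and attrs - target[key][1]
    )
    n = len(candidate)
    return new_entities / n, new_attributes / n


# Placeholder inputs: cluster ids and attribute names are made up.
d1 = {"DE": (0, {"gdp"}), "FR": (0, {"gdp"})}
d2 = {"IT": (0, {"gdp"}), "DE": (0, {"gdp", "debt"}), "X": (1, {"other"})}
ec, sc = complement_proxies(d1, d2)
```

With these inputs one of three candidate entities is new within a shared cluster and one of three adds an attribute to a known entity, so both proxies evaluate to one third.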
  
Search for Gross Domestic Product
  
Querying the Data Set

Visualizing the Results
  
Queries Across Related Data Sets
•  Query for GDP of Germany
•  Union of results from
   •  Worldbank: GDP (current US$) (up to 2010)
   •  Eurostat: GDP at Market Prices (including projected values until 2014)
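
One way to express such a union with standard SPARQL 1.1 is a query with one SERVICE block per source. The Python sketch below only assembles the query text. Every endpoint URL, property URI and resource URI in it is a made-up placeholder under example.org, and the portal itself may federate the sources differently.

```python
def build_union_query(sources):
    """Return a SPARQL 1.1 query that unions (year, value) rows per source."""
    blocks = []
    for src in sources:
        blocks.append(
            "  {\n"
            f"    SERVICE <{src['endpoint']}> {{\n"
            f"      ?obs <{src['area']}> <{src['germany']}> ;\n"
            f"           <{src['year']}> ?year ;\n"
            f"           <{src['value']}> ?value .\n"
            "    }\n"
            '    BIND("' + src["name"] + '" AS ?source)\n'
            "  }"
        )
    return (
        "SELECT ?source ?year ?value WHERE {\n"
        + "\n  UNION\n".join(blocks)
        + "\n}\nORDER BY ?year ?source"
    )


# Hypothetical source descriptions; none of these URIs are real.
sources = [
    {
        "name": "Worldbank",
        "endpoint": "http://example.org/worldbank/sparql",
        "area": "http://example.org/wb/refArea",
        "germany": "http://example.org/wb/country/DE",
        "year": "http://example.org/wb/refPeriod",
        "value": "http://example.org/wb/gdpCurrentUSD",
    },
    {
        "name": "Eurostat",
        "endpoint": "http://example.org/eurostat/sparql",
        "area": "http://example.org/es/geo",
        "germany": "http://example.org/es/geo/DE",
        "year": "http://example.org/es/time",
        "value": "http://example.org/es/gdpMarketPrices",
    },
]
query = build_union_query(sources)
print(query)
```

Keeping a `?source` column in the result matters here because the two series use different definitions (current US$ versus market prices) and cover different years, so the rows should be plotted side by side rather than merged into one value per year.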
  
Queries Across Related Data Sets
[Figure: chart regions labelled "Data from Worldbank" and "Data from Eurostat"]
  
Summary and Outlook
•  Techniques for finding related data sets
   –  Based on finding related entities
•  Implementation available in open data portal
•  Outlook
   –  Finding relevant related data sources for a given information need
   –  End user interfaces for formulating queries across data sets (see Optique project)
   –  Operators for combining data cubes
   –  Interactive visualization and exploration of combined data cubes (see OpenCube project)
  
References
[1] G. A. Grimnes, P. Edwards, and A. Preece. Instance based clustering of semantic web resources. In ESWC, 2008.
[2] U. Lösch, S. Bloehdorn, and A. Rettinger. Graph kernels for RDF data. In ESWC, 2012.
[3] J. Shawe-Taylor and N. Cristianini. Kernel Methods for Pattern Analysis. 2004.
[4] R. Zhang and A. Rudnicky. A large scale clustering scheme for kernel k-means. In Pattern Recognition, 2002.
RTP Over QUIC: An Interesting Opportunity Or Wasted Time?
Lorenzo Miniero
 
Building the Customer Identity Community, Together.pdf
Building the Customer Identity Community, Together.pdfBuilding the Customer Identity Community, Together.pdf
Building the Customer Identity Community, Together.pdf
Cheryl Hung
 
Harmonizing Multi-Agent Intelligence | Open Data Science Conference | Gary Ar...
Harmonizing Multi-Agent Intelligence | Open Data Science Conference | Gary Ar...Harmonizing Multi-Agent Intelligence | Open Data Science Conference | Gary Ar...
Harmonizing Multi-Agent Intelligence | Open Data Science Conference | Gary Ar...
Gary Arora
 
Why Slack Should Be Your Next Business Tool? (Tips to Make Most out of Slack)
Why Slack Should Be Your Next Business Tool? (Tips to Make Most out of Slack)Why Slack Should Be Your Next Business Tool? (Tips to Make Most out of Slack)
Why Slack Should Be Your Next Business Tool? (Tips to Make Most out of Slack)
Cyntexa
 
DevOpsDays SLC - Platform Engineers are Product Managers.pptx
DevOpsDays SLC - Platform Engineers are Product Managers.pptxDevOpsDays SLC - Platform Engineers are Product Managers.pptx
DevOpsDays SLC - Platform Engineers are Product Managers.pptx
Justin Reock
 
Top 5 Qualities to Look for in Salesforce Partners in 2025
Top 5 Qualities to Look for in Salesforce Partners in 2025Top 5 Qualities to Look for in Salesforce Partners in 2025
Top 5 Qualities to Look for in Salesforce Partners in 2025
Damco Salesforce Services
 
An Overview of Salesforce Health Cloud & How is it Transforming Patient Care
An Overview of Salesforce Health Cloud & How is it Transforming Patient CareAn Overview of Salesforce Health Cloud & How is it Transforming Patient Care
An Overview of Salesforce Health Cloud & How is it Transforming Patient Care
Cyntexa
 
Kit-Works Team Study_팀스터디_김한솔_nuqs_20250509.pdf
Kit-Works Team Study_팀스터디_김한솔_nuqs_20250509.pdfKit-Works Team Study_팀스터디_김한솔_nuqs_20250509.pdf
Kit-Works Team Study_팀스터디_김한솔_nuqs_20250509.pdf
Wonjun Hwang
 
fennec fox optimization algorithm for optimal solution
fennec fox optimization algorithm for optimal solutionfennec fox optimization algorithm for optimal solution
fennec fox optimization algorithm for optimal solution
shallal2
 
IT488 Wireless Sensor Networks_Information Technology
IT488 Wireless Sensor Networks_Information TechnologyIT488 Wireless Sensor Networks_Information Technology
IT488 Wireless Sensor Networks_Information Technology
SHEHABALYAMANI
 
May Patch Tuesday
May Patch TuesdayMay Patch Tuesday
May Patch Tuesday
Ivanti
 
AI x Accessibility UXPA by Stew Smith and Olivier Vroom
AI x Accessibility UXPA by Stew Smith and Olivier VroomAI x Accessibility UXPA by Stew Smith and Olivier Vroom
AI x Accessibility UXPA by Stew Smith and Olivier Vroom
UXPA Boston
 
Config 2025 presentation recap covering both days
Config 2025 presentation recap covering both daysConfig 2025 presentation recap covering both days
Config 2025 presentation recap covering both days
TrishAntoni1
 
論文紹介:"InfLoRA: Interference-Free Low-Rank Adaptation for Continual Learning" ...
論文紹介:"InfLoRA: Interference-Free Low-Rank Adaptation for Continual Learning" ...論文紹介:"InfLoRA: Interference-Free Low-Rank Adaptation for Continual Learning" ...
論文紹介:"InfLoRA: Interference-Free Low-Rank Adaptation for Continual Learning" ...
Toru Tamaki
 
MULTI-STAKEHOLDER CONSULTATION PROGRAM On Implementation of DNF 2.0 and Way F...
MULTI-STAKEHOLDER CONSULTATION PROGRAM On Implementation of DNF 2.0 and Way F...MULTI-STAKEHOLDER CONSULTATION PROGRAM On Implementation of DNF 2.0 and Way F...
MULTI-STAKEHOLDER CONSULTATION PROGRAM On Implementation of DNF 2.0 and Way F...
ICT Frame Magazine Pvt. Ltd.
 
Cybersecurity Tools and Technologies - Microsoft Certificate
Cybersecurity Tools and Technologies - Microsoft CertificateCybersecurity Tools and Technologies - Microsoft Certificate
Cybersecurity Tools and Technologies - Microsoft Certificate
VICTOR MAESTRE RAMIREZ
 
IT484 Cyber Forensics_Information Technology
IT484 Cyber Forensics_Information TechnologyIT484 Cyber Forensics_Information Technology
IT484 Cyber Forensics_Information Technology
SHEHABALYAMANI
 
Dark Dynamism: drones, dark factories and deurbanization
Dark Dynamism: drones, dark factories and deurbanizationDark Dynamism: drones, dark factories and deurbanization
Dark Dynamism: drones, dark factories and deurbanization
Jakub Šimek
 

Discovering Related Data Sources in Data Portals

  • 1. Discovering Related Data Sources in Data Portals
      Andreas Wagner, Peter Haase, Achim Rettinger, Holger Lamm
      1st International Workshop on Semantic Statistics
      Sydney, Oct 22, 2013
  • 2. Potential of Open (Statistics) Data
      [Image text: WORLD BANK]
  • 3. fluidOps Open Data Portal
      • Data collection
          – Integration of major open data catalogs
          – Automated provisioning of 10,000s of data sets
      • Portal for search and exploration of data sets
          – Rich metadata based on open standards
          – Both descriptive and structural metadata
      • Integrated querying across interlinked data sets
          – Easy-to-use queries against multiple data sets
          – Using federation technologies
      • Self-service UI
          – Custom queries and visualizations
          – Widgets, dashboarding, etc.
  • 5. Finding Related Data Sets
      • Many information needs require analysis of multiple data sets
          – Example: Compare and correlate GDP, population and public debt of countries over time
      • Task of finding related data sets
          – Identify data sets that are similar, but complementary
          – To support queries across multiple data sets, e.g. in the form of joins and unions
      • Inspiration: Finding related tables
          – Entity complement: same attributes, complementing entities
          – Schema complement: same entities, complementing attributes
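
The two complement notions on this slide can be made concrete with a small sketch. The Python below is an illustration only and not the authors' scoring method: the `DataSet` type, the Jaccard-based formulas, and the sample data sets are all assumptions introduced here. It treats a data set as a pair of sets (entities covered, attributes provided) and rewards overlap on one side combined with difference on the other.

```python
from typing import NamedTuple


class DataSet(NamedTuple):
    """Hypothetical summary of a data set: the entities and attributes it covers."""
    entities: frozenset
    attributes: frozenset


def jaccard(a, b):
    """Share of elements common to both sets (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def entity_complement(d1, d2):
    """Toy score: high when attributes match but entities differ."""
    return jaccard(d1.attributes, d2.attributes) * (1 - jaccard(d1.entities, d2.entities))


def schema_complement(d1, d2):
    """Toy score: high when entities match but attributes differ."""
    return jaccard(d1.entities, d2.entities) * (1 - jaccard(d1.attributes, d2.attributes))


# Made-up example data sets (names only, no real statistics).
europe_gdp = DataSet(frozenset({"Germany", "France"}), frozenset({"year", "gdp"}))
asia_gdp = DataSet(frozenset({"Japan", "India"}), frozenset({"year", "gdp"}))
europe_pop = DataSet(frozenset({"Germany", "France"}), frozenset({"year", "population"}))

print(entity_complement(europe_gdp, asia_gdp))    # same attributes, other entities
print(schema_complement(europe_gdp, europe_pop))  # same entities, other attributes
```

Under these assumptions, a pair with a high entity complement score is a natural candidate for a union (more rows of the same shape), while a pair with a high schema complement score is a candidate for a join (more columns for the same entities).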
  • 6. Finding Related Data Sources via Related Entities
      • Data Model: Data source is a set of multiple RDF graphs
      • Intuition: if data sources contain similar entities, they are somehow related
      • Approach:
          1. Entity Extraction
          2. Entity Similarity
          3. Entity Clustering
      [Figure labels: Source 1, Source 2, Source 3; Entities; Cluster 1, Cluster 2; Related?!]
  • 7. Related Entities (2)
      1. Entity Extraction
          – Sample over entities in data graphs in D
          – For each entity crawl its surrounding sub-graph [1]
      2. Entity Similarity
          – Define dissimilarity measure between two entities based on kernel functions
          – Compare entity structure and literals via different kernels [2,3]
      3. Entity Clustering
          – Apply k-means clustering to discover similar entities [4]
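
The three steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: each extracted entity is assumed to be already flattened into a set of (property, value) pairs, a simple set-intersection kernel stands in for the graph and literal kernels of [2,3], and the farthest-first seeding, the sample entities, and the source labels are all made up for the example. The dissimilarity is the kernel-induced squared distance k(a,a) - 2k(a,b) + k(b,b), and clustering is kernel k-means over the precomputed kernel matrix.

```python
def intersection_kernel(a, b):
    """Stand-in kernel: number of (property, value) pairs two entities share."""
    return len(a & b)


def kernel_matrix(entities, kernel):
    return [[kernel(a, b) for b in entities] for a in entities]


def kernel_distance(K, i, j):
    """Squared distance in the kernel's feature space."""
    return K[i][i] - 2 * K[i][j] + K[j][j]


def initial_labels(K, k):
    """Assumed seeding: start at entity 0, then add the entity farthest from all seeds."""
    n = len(K)
    seeds = [0]
    while len(seeds) < k:
        rest = [i for i in range(n) if i not in seeds]
        seeds.append(max(rest, key=lambda i: min(kernel_distance(K, i, s) for s in seeds)))
    return [min(range(k), key=lambda c: kernel_distance(K, i, seeds[c])) for i in range(n)]


def kernel_kmeans(K, k, max_iter=100):
    """Kernel k-means: reassign each entity to the nearest implicit centroid until stable."""
    n = len(K)
    labels = initial_labels(K, k)
    for _ in range(max_iter):
        members = [[i for i in range(n) if labels[i] == c] for c in range(k)]
        # Mean kernel value over all member pairs = squared norm of the centroid.
        within = [sum(K[j][l] for j in m for l in m) / len(m) ** 2 if m else 0.0
                  for m in members]
        new_labels = []
        for i in range(n):
            best, best_d = labels[i], float("inf")
            for c, m in enumerate(members):
                if not m:
                    continue  # skip empty clusters
                d = K[i][i] - 2.0 * sum(K[i][j] for j in m) / len(m) + within[c]
                if d < best_d:
                    best, best_d = c, d
            new_labels.append(best)
        if new_labels == labels:
            break
        labels = new_labels
    return labels


# Made-up entities; each is tagged with the (hypothetical) source it was sampled from.
entities = [
    frozenset({("type", "Country"), ("name", "Germany"), ("region", "Europe")}),
    frozenset({("type", "Country"), ("name", "France"), ("region", "Europe")}),
    frozenset({("type", "Indicator"), ("topic", "GDP"), ("unit", "USD")}),
    frozenset({("type", "Indicator"), ("topic", "GDP"), ("unit", "EUR")}),
]
sources = ["worldbank", "eurostat", "worldbank", "eurostat"]

labels = kernel_kmeans(kernel_matrix(entities, intersection_kernel), k=2)
cluster_sources = {}
for label, source in zip(labels, sources):
    cluster_sources.setdefault(label, set()).add(source)
print(labels, cluster_sources)
```

Sources that contribute entities to the same cluster are the candidates for being related, which is the intuition stated on slide 6.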
  • 8. Contextualization Score
      • Contextualization score for data source D'' given D': ec(D''|D') and sc(D''|D')
      • Entity complement score
      • Schema complement score
  • 10. Search for Gross Domestic Product
  • 14. Queries Across Related Data Sets
      • Query for GDP of Germany
      • Union of results from
          – Worldbank: GDP (current US$) (up to 2010)
          – Eurostat: GDP at Market Prices (including projected values until 2014)
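
A union over two result sets like the ones on this slide could look as follows. This is a sketch only: the `Observation` record, the `union_results` helper, the 2008 start year, and the zero placeholder values are assumptions for illustration and carry no real statistics. Each row keeps its source and indicator label so that differences in definition between the two series stay visible after merging.

```python
from typing import NamedTuple


class Observation(NamedTuple):
    source: str
    indicator: str
    country: str
    year: int
    value: float  # placeholder numbers below, not real statistics


def union_results(*result_sets):
    """Concatenate per-source results and order them by year, then source."""
    merged = [obs for results in result_sets for obs in results]
    return sorted(merged, key=lambda obs: (obs.year, obs.source))


worldbank = [
    Observation("Worldbank", "GDP (current US$)", "Germany", year, 0.0)
    for year in range(2008, 2011)  # series ends in 2010
]
eurostat = [
    Observation("Eurostat", "GDP at Market Prices", "Germany", year, 0.0)
    for year in range(2008, 2015)  # includes projected values until 2014
]

combined = union_results(worldbank, eurostat)
for obs in combined:
    print(obs.year, obs.source, obs.indicator)
```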
  • 15. Queries Across Related Data Sets
      [Figure labels: Data from Worldbank; Data from Eurostat]
  • 16. Summary and Outlook
      • Techniques for finding related data sets
          – Based on finding related entities
      • Implementation available in open data portal
      • Outlook
          – Finding relevant related data sources for a given information need
          – End user interfaces for formulating queries across data sets (see Optique project)
          – Operators for combining data cubes
          – Interactive visualization and exploration of combined data cubes (see OpenCube project)
  • 17. References
      [1] G. A. Grimnes, P. Edwards, and A. Preece. Instance based clustering of semantic web resources. In ESWC, 2008.
      [2] U. Lösch, S. Bloehdorn, and A. Rettinger. Graph kernels for RDF data. In ESWC, 2012.
      [3] J. Shawe-Taylor and N. Cristianini. Kernel Methods for Pattern Analysis. 2004.
      [4] R. Zhang and A. Rudnicky. A large scale clustering scheme for kernel k-means. In Pattern Recognition, 2002.