Publishing Web data
Master DSC & CPS²
Antoine Zimmermann
1
Publishing RDF on the Web
- Use case: I want to publish my personal profile in RDF, with my name, affiliation, interests, education, professional relationships, etc.
- Simple conceptual model but...
- what IRI should I use (for myself, my company, etc.)?
- what properties?
- where do I put the data?
- how do I make the data easily usable?
- ...
- See also: Best Practices for Publishing Linked Data – W3C Note 9 January 2014
2
Linked Data Principles
- Use URIs as names for things
- Use HTTP URIs so that people can look up those names
- When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)
- Include links to other URIs, so that they can discover more things.
See: Linked Data. Tim Berners-Lee’s design issues. July 2006 (revised June 2009)
See also: Tim Berners-Lee’s TED talk. Feb. 2009
3
Linked Data Principles
- Use IRIs as names for things
- Use HTTP IRIs so that people can look up those names
- When someone looks up an IRI, provide useful information, using the standards (RDF*, SPARQL)
- Include links to other IRIs, so that they can discover more things.
4
Data should go FAIR!
- To maximise (re)usability of data, they should be:
- Findable
- Accessible
- Interoperable
- Reusable
See: The FAIR Guiding principles for data management and stewardship
- There is a worldwide initiative supported by many governments and public agencies to push towards FAIR data: Go FAIR
- Linked Data and Semantic Web technologies provide a way to implement FAIR principles
5
Dereferencing
- Dereferencing: the operation of using an IRI as a URL to get whatever document can be accessed at that URL
- Corresponds to issuing an HTTP GET request with the URL stripped of any fragment identifier
- An IRI is dereferenceable if it can be used in an HTTP GET request to access a document
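The fragment-stripping step can be sketched with the Python standard library. This is a minimal sketch: the IRI `http://example.org/profile#me` and the default `Accept` value are assumptions chosen for illustration, and the request is only built, not sent.

```python
from urllib.parse import urldefrag
from urllib.request import Request


def dereference_request(iri: str, accept: str = "text/turtle") -> Request:
    """Build the HTTP GET request that dereferences `iri`."""
    # The fragment identifier is never sent to the server, so strip it first.
    url, _fragment = urldefrag(iri)
    return Request(url, headers={"Accept": accept}, method="GET")


# Hypothetical IRI naming a person inside a profile document.
req = dereference_request("http://example.org/profile#me")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would perform the actual lookup. Note that the IRI with `#me` and the document URL without it are two different names, which is what makes hash IRIs usable for things that are not documents.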
6
What do HTTP URIs identify?
- Rule of thumb:
- If a URL directly locates a document then the URL must identify the document
- How do we identify things that are not documents (physical objects, people, ideas, etc.)?
- Non-HTTP URIs? → breaks rule n°2 of Linked Data
- HTTP URIs that do not locate documents (e.g., the server returns 404)? → breaks rule n°3 of Linked Data
7
W3C Technical Architecture Group advice
- If the server returns 200 OK to an IRI lookup, then the IRI must denote an information resource (≈ a Web document)
- Otherwise, the IRI may denote anything
- Advice: to identify non-information resources, use either “hash IRIs” or [303-redirected] “slash IRIs”
Warning: this is a controversial decision of the TAG. Discussions on this issue have been occasionally showing up on mailing lists since 2002!
8
Slash IRIs (1)
- A slash IRI is an IRI with a ‘/’ followed by a local name: http://dbpedia.org/resource/Semantic_Web
- issue a GET request:
GET /resource/Semantic_Web HTTP/1.1
Host: dbpedia.org
Accept: text/html
- server replies:
HTTP/1.1 303 See Other
Location: http://dbpedia.org/page/Semantic_Web
- issue a new GET request:
GET /page/Semantic_Web HTTP/1.1
Host: dbpedia.org
Accept: text/html
- server replies:
HTTP/1.1 200 OK
9
Slash IRIs (2)
- issue a GET request:
GET /resource/Semantic_Web HTTP/1.1
Host: dbpedia.org
Accept: application/rdf+xml
- server replies:
HTTP/1.1 303 See Other
Location: http://dbpedia.org/data/Semantic_Web
- issue a new GET request:
GET /data/Semantic_Web HTTP/1.1
Host: dbpedia.org
Accept: application/rdf+xml
- server replies:
HTTP/1.1 200 OK
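The server-side decision behind these two exchanges can be sketched as a small function. This is a toy model, not DBpedia's actual implementation: the function name `respond`, the prefix table, and the 406 fallback are assumptions that only mirror the `/resource/`, `/page/` and `/data/` pattern shown on the slides.

```python
from typing import Dict, Tuple

# Hypothetical mapping from requested media type to document path prefix.
DOCUMENT_PREFIX = {
    "text/html": "/page/",
    "application/rdf+xml": "/data/",
}


def respond(path: str, accept: str) -> Tuple[int, Dict[str, str]]:
    """Return (status code, response headers) for a GET on `path`."""
    if path.startswith("/resource/"):
        local_name = path[len("/resource/"):]
        prefix = DOCUMENT_PREFIX.get(accept)
        if prefix is None:
            return 406, {}  # no document available in the requested format
        # The IRI names a thing, not a document: redirect instead of 200 OK.
        return 303, {"Location": "http://dbpedia.org" + prefix + local_name}
    if path.startswith(("/page/", "/data/")):
        return 200, {}  # the URL locates a document, so serve it directly
    return 404, {}


print(respond("/resource/Semantic_Web", "text/html"))
print(respond("/resource/Semantic_Web", "application/rdf+xml"))
```

The point of the design is that the server never answers 200 OK for the `/resource/` IRI itself, so that IRI stays free to denote the topic rather than a Web page about it.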
10
Means of publishing RDF
- Put RDF files online (in RDF/XML, Turtle, etc)
- Publish RDF along with web pages (RDFa & JSON-LD)
- Some CMS generate RDF automatically (e.g., Drupal 7+)
- You’ll see more about JSON-LD later
- Generate RDF from other existing formats
- Keep RDF inside database, but provide access via queries (SPARQL endpoints)
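As an illustration of publishing RDF along with a Web page, a personal profile can be serialised as JSON-LD and embedded in HTML. This is a minimal sketch using only the standard `json` module. The person IRIs and the name are placeholders, and the choice of FOAF properties is an assumption, not a recommendation from the course.

```python
import json

# All IRIs and values below are placeholders for illustration.
profile = {
    "@context": {"foaf": "http://xmlns.com/foaf/0.1/"},
    "@id": "http://example.org/people/alice#me",
    "@type": "foaf:Person",
    "foaf:name": "Alice Example",
    "foaf:topic_interest": {"@id": "http://dbpedia.org/resource/Semantic_Web"},
    "foaf:knows": {"@id": "http://example.org/people/bob#me"},
}

# Embedding this script element in a page publishes the RDF next to the HTML.
html_snippet = (
    '<script type="application/ld+json">\n'
    + json.dumps(profile, indent=2)
    + "\n</script>"
)
print(html_snippet)
```

The `@id` values are links in the Linked Data sense: a client that dereferences them can discover more about the interest or the acquaintance.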
12
Existing online RDF datasets
13
Finding existing vocabularies
- Reuse well known vocabularies (Dublin Core, schema.org, FOAF, SIOC, Good Relations, SKOS, voiD, etc.)
- Try an ontology / vocabulary search engine or repository:
- Search engines: FalconS 💀, SWSE 💀, Sindice (integrated in proprietary software), OU’s Watson 💀, Swoogle 💀, vocab.cc 💀
- Repositories: Linked Open Vocabularies, Ontology Design Patterns, prefix.cc, BioPortal (specialised in bio-medical ontologies), AgroPortal (specialised in agriculture-related ontologies), SchemaWeb 💀, Schemapedia 💀, Cupboard 💀, Knoodl, DERI vocabularies 💀, OWL Seek 💀, SchemaCache 💀
- Ask mailing lists, forums (semantic-web@w3.org, stackoverflow.com, Answers knowledge graph)
14
Build your own vocabulary
- Editors:
- Protégé, WebProtégé, NeOn TK, SWOOP, Neologism, TopBraid Composer (commercial software), PoolParty (commercial product), OWLGrEd, Fluent Editor, Semantic Turkey, VocBench, Vitro 💀, Knoodl 💀, Ontofly 💀, Altova OWL editor 💀, IBM integrated development TK 💀, Anzo for Excel 💀, Euler GUI
- Learn, evaluate:
- Link to other ontologies... more at http://www.w3.org/wiki/Ontology_Dowsing
15
Case 1: Build linked data from text
- Describe in RDF the following situation:
Marco is a student at Université Jean Monnet, studying in the Master 2 programme Web Intelligence. There, he follows the course Semantic Web, taught by Antoine. Marco is Italian but lives in Saint-Étienne, place Jean Jaurès, with his friends and flatmates Enrico and José. Marco is interested in Web technologies, theater and sci-fi literature. Enrico is interested in marijuana and reggae, and is an activist for worldwide peace. Antoine Zimmermann is associate professor at École des mines, with colleagues Victor Charpenay, Maxime Lefrançois, etc. École des mines is a higher education establishment depending on the Ministry of industry.
16
Case 2: Build linked data from existing data
- Translate the following tables to RDF:
TeamID | Name | Country | Coach |
FRA | XV de France | France | Laporte |
NZL | All Blacks | New Zealand | Henry |
ENG | XV of the Rose | England | Ashton |
… | … | … | … |
PlayerID | Name | TeamID | Position |
1 | Vincent Clerc | FRA | wing |
2 | Lionel Beauxis | FRA | flyhalf |
3 | Joe Rokocoko | NZL | wing |
… | … | … | … |
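One possible translation can be sketched in plain Python, emitting N-Triples. This is only a sketch of one design among several: the base IRI `http://example.org/rugby/`, the vocabulary IRIs, and the helper names are all assumptions, and only rows visible in the tables are used.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/rugby/"   # hypothetical base IRI
VOCAB = BASE + "vocab#"              # hypothetical vocabulary namespace

teams = [
    {"TeamID": "FRA", "Name": "XV de France", "Country": "France", "Coach": "Laporte"},
    {"TeamID": "NZL", "Name": "All Blacks", "Country": "New Zealand", "Coach": "Henry"},
]
players = [
    {"PlayerID": "1", "Name": "Vincent Clerc", "TeamID": "FRA", "Position": "wing"},
    {"PlayerID": "3", "Name": "Joe Rokocoko", "TeamID": "NZL", "Position": "wing"},
]


def iri(value):
    return "<" + value + ">"


def literal(text):
    # Escape backslashes and double quotes as N-Triples requires.
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def team_triples(row):
    # The primary key is used to mint the IRI of the row's subject.
    subject = iri(BASE + "team/" + row["TeamID"])
    yield subject, iri(RDF_TYPE), iri(VOCAB + "Team")
    for column in ("Name", "Country", "Coach"):
        yield subject, iri(VOCAB + column.lower()), literal(row[column])


def player_triples(row):
    subject = iri(BASE + "player/" + row["PlayerID"])
    yield subject, iri(RDF_TYPE), iri(VOCAB + "Player")
    yield subject, iri(VOCAB + "name"), literal(row["Name"])
    yield subject, iri(VOCAB + "position"), literal(row["Position"])
    # The foreign key becomes a link to the team resource, not a literal.
    yield subject, iri(VOCAB + "team"), iri(BASE + "team/" + row["TeamID"])


triples = [t for row in teams for t in team_triples(row)]
triples += [t for row in players for t in player_triples(row)]
ntriples = [" ".join(t) + " ." for t in triples]
print("\n".join(ntriples))
```

In this sketch countries, coaches and positions stay literals. Turning them into IRIs (for instance linking to an existing dataset) would arguably be closer to the Linked Data principles, and is a choice left open here.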
17
Case 3: UML to RDF vocabulary
Usually, these translations are appropriate:
- UML classes → RDF classes
- UML attributes → RDF properties with literals as range
- UML links → RDF properties
- UML generalization → rdfs:subClassOf
- Visibility and methods are normally not represented in RDF (it’s not a programming language)
- Cardinalities cannot be represented with RDFS. They can be in OWL (cf. future courses), but be careful!
- Note: in RDF, properties are not attached to classes. They are first class citizens.
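The translation rules above can be sketched as a small generator of RDFS in Turtle. The UML model (Person, Student, Course), the `ex:` namespace, and the function name `to_turtle` are hypothetical, chosen only to exercise each rule once.

```python
# A hypothetical UML model, written as plain Python data.
uml_classes = {"Person": None, "Student": "Person", "Course": None}  # class -> superclass
uml_attributes = [("Person", "name", "xsd:string")]   # (class, attribute, datatype)
uml_links = [("Student", "enrolledIn", "Course")]     # (source class, link, target class)


def to_turtle():
    lines = [
        "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
        "@prefix ex: <http://example.org/vocab#> .",
    ]
    for cls, parent in uml_classes.items():
        lines.append(f"ex:{cls} a rdfs:Class .")
        if parent is not None:
            # UML generalization becomes rdfs:subClassOf.
            lines.append(f"ex:{cls} rdfs:subClassOf ex:{parent} .")
    # Properties are declared on their own, not inside a class.
    for cls, attr, datatype in uml_attributes:
        lines.append(f"ex:{attr} a rdf:Property ; rdfs:domain ex:{cls} ; rdfs:range {datatype} .")
    for source, link, target in uml_links:
        lines.append(f"ex:{link} a rdf:Property ; rdfs:domain ex:{source} ; rdfs:range ex:{target} .")
    return "\n".join(lines)


print(to_turtle())
```

One caveat on this sketch: `rdfs:domain` and `rdfs:range` do not restrict usage the way UML typing does. They license inferences, so using `ex:name` on any resource leads a reasoner to conclude that the resource is an `ex:Person`.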
18
RDF files and RDF APIs
- RDF files (RDF/XML, Turtle, N-triples, etc.) can be read into memory with RDF APIs
- The in-memory model of an RDF graph can be manipulated with API methods
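To give an idea of what manipulating an in-memory model means, here is a toy graph class in plain Python. It is not any real library's API: the class, its method names, and the `ex:` terms are made up for this sketch. Actual RDF APIs (for example Apache Jena in Java or RDFLib in Python) also provide parsers, serialisers and proper term types, which are omitted here.

```python
class Graph:
    """A toy in-memory RDF graph: a set of (subject, predicate, object) tuples."""

    def __init__(self):
        self._triples = set()

    def add(self, s, p, o):
        # A graph is a set, so adding the same triple twice has no effect.
        self._triples.add((s, p, o))

    def triples(self, s=None, p=None, o=None):
        """Yield the triples matching the pattern; None acts as a wildcard."""
        for triple in self._triples:
            if all(want is None or want == got
                   for want, got in zip((s, p, o), triple)):
                yield triple

    def __len__(self):
        return len(self._triples)


g = Graph()
g.add("ex:marco", "ex:follows", "ex:semanticWeb")
g.add("ex:marco", "ex:livesWith", "ex:enrico")
g.add("ex:antoine", "ex:teaches", "ex:semanticWeb")
g.add("ex:marco", "ex:follows", "ex:semanticWeb")  # duplicate, ignored

# Everything that has ex:semanticWeb as object.
for triple in sorted(g.triples(o="ex:semanticWeb")):
    print(triple)
```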
19
Storing and managing RDF
- RDF files (RDF/XML, Turtle, N-triples, etc.) can be read into memory with RDF APIs
- Some triple stores scale up to trillions of RDF triples, given enough hardware: GraphDB, AllegroGraph, Virtuoso, Stardog, Amazon Neptune…
- Small capacity triple stores (good for quick development of simple Web apps): Jena Fuseki, Sesame, and others
20
Linked Data Platform
@base <http://example.com/ldp/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix ldp: <http://www.w3.org/ns/ldp#> .
<subfolder> a ldp:Container ;
    dct:created "2021-10-01T13:30:30+02:00"^^xsd:dateTime ;
    ldp:contains <this>, <that>, <it> .
21
Using a Linked Data Platform
- Interaction via HTTP requests
- GET requests access data
- POST requests create resources with an RDF graph and place them in a container
- PUT requests update specific resources
- some metadata is added in passing...
- See demo!
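The three kinds of requests can be sketched with the standard library. The requests below are only built, not sent. The container and resource URLs are hypothetical (they follow the `http://example.com/ldp/` base used on the previous slide), the Turtle payload is invented, and the `Slug` header is only a naming hint that a server may ignore.

```python
from urllib.request import Request

CONTAINER = "http://example.com/ldp/subfolder"   # hypothetical container URL
RESOURCE = "http://example.com/ldp/this"         # hypothetical contained resource
turtle = b'<> <http://purl.org/dc/terms/title> "A new resource" .'

# GET: read the RDF description of the container.
read = Request(CONTAINER, headers={"Accept": "text/turtle"}, method="GET")

# POST: create a new resource from an RDF graph inside the container.
create = Request(
    CONTAINER,
    data=turtle,
    headers={"Content-Type": "text/turtle", "Slug": "new"},
    method="POST",
)

# PUT: replace the state of one specific, already existing resource.
update = Request(
    RESOURCE,
    data=turtle,
    headers={"Content-Type": "text/turtle"},
    method="PUT",
)

for request in (read, create, update):
    print(request.get_method(), request.full_url)
```

Sending any of these with `urllib.request.urlopen` against a real platform would return the server's response, including whatever metadata the server adds on its own.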
22