Publishing Web data
Master DSC & CPS²
Antoine Zimmermann
1
Publishing RDF on the Web
- Use case: I want to publish my personal profile in RDF, with my name, affiliation, interests, education, professional relationships, etc.
- Simple conceptual model but...
- what IRI should I use (for myself, my company, etc.)?
- what properties?
- where do I put the data?
- how do I make the data easily usable?
- ...
- See also: Best Practices for Publishing Linked Data – W3C Note 9 January 2014
2
Linked Data Principles
- Use URIs as names for things
- Use HTTP URIs so that people can look up those names
- When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)
- Include links to other URIs, so that they can discover more things
See: Linked Data. Tim Berners-Lee’s design issues. July 2006 (revised June 2009)
See also: Tim Berners-Lee’s TED talk. Feb. 2009
3
Linked Data Principles
- Use IRIs as names for things
- Use HTTP IRIs so that people can look up those names
- When someone looks up an IRI, provide useful information, using the standards (RDF*, SPARQL)
- Include links to other IRIs, so that they can discover more things
4
Data should go FAIR!
- To maximise the (re)usability of data, they should be (see: The FAIR Guiding Principles for data management and stewardship):
- Findable
- Accessible
- Interoperable
- Reusable
- There is a worldwide initiative supported by many governments and public agencies to push towards FAIR data: Go FAIR
- Linked Data and Semantic Web technologies provide a way to implement FAIR principles
5
Dereferencing
- Dereferencing: the operation of using an IRI as a URL to retrieve whatever document is accessible at that URL
- Corresponds to issuing an HTTP GET request on the URL, stripped of any fragment identifier
- An IRI is dereferenceable if it can be used in an HTTP GET request to access a document
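The dereferencing step can be sketched with the Python standard library. This is only an illustration: the IRI `http://example.org/profile#me` and the helper name `dereference_request` are assumptions, and the request is built but not sent.

```python
from urllib.parse import urldefrag
from urllib.request import Request


def dereference_request(iri: str) -> Request:
    """Build (but do not send) the GET request used to dereference `iri`."""
    # The fragment identifier is never sent to the server, so strip it first.
    url, _fragment = urldefrag(iri)
    return Request(url, method="GET")


req = dereference_request("http://example.org/profile#me")
print(req.get_method(), req.full_url)  # GET http://example.org/profile
```

Passing the request to `urllib.request.urlopen` would perform the actual lookup. An IRI containing non-ASCII characters would first have to be percent-encoded, which this sketch does not do.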
6
What do HTTP URIs identify?
- Rule of thumb:
- If a URL directly locates a document then the URL must identify the document
- How do we identify things that are not documents (physical objects, people, ideas, etc.)?
- Non-HTTP URIs? → breaks rule n°2 of Linked Data
- HTTP URIs that do not locate documents (e.g., returning 404)? → breaks rule n°3 of Linked Data
7
W3C Technical Architecture Group advice
- If the server returns 200 OK to an IRI lookup, then the IRI must denote an information resource (≈ a Web document)
- Otherwise, the IRI may denote anything
- Advice: to identify non-information resources, use either “hash IRIs” or [303-redirected] “slash IRIs”
Warning: this decision of the TAG is controversial; discussions on the issue have been occasionally resurfacing on mailing lists since 2002!
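The rule of thumb above can be written down as a small function. This is a simplified reading of the advice, not an official algorithm; the function name `may_denote` and the returned labels are assumptions made for the illustration.

```python
from urllib.parse import urldefrag


def may_denote(iri: str, status: int) -> str:
    """Apply the rule of thumb to the status code obtained when looking up `iri`."""
    _url, fragment = urldefrag(iri)
    if fragment:
        # Hash IRI: the fragment is stripped before the request, so the
        # status code describes the document, not the hash IRI itself.
        return "anything"
    if status == 200:
        return "information resource"
    # 303 redirects, 404s, etc. leave the interpretation open.
    return "anything"


print(may_denote("http://dbpedia.org/page/Semantic_Web", 200))
print(may_denote("http://dbpedia.org/resource/Semantic_Web", 303))
```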
8
Slash IRIs (1)
- A slash IRI is an IRI with a ‘/’ followed by a local name:
http://dbpedia.org/resource/Semantic_Web
- issue a GET request:
GET /resource/Semantic_Web HTTP/1.1
Host: dbpedia.org
Accept: text/html
- server replies:
HTTP/1.1 303 See Other
Location: http://dbpedia.org/page/Semantic_Web
- issue a new GET request:
GET /page/Semantic_Web HTTP/1.1
Host: dbpedia.org
Accept: text/html
- server replies:
HTTP/1.1 200 OK
9
Slash IRIs (2)
- issue a GET request:
GET /resource/Semantic_Web HTTP/1.1
Host: dbpedia.org
Accept: application/rdf+xml
- server replies:
HTTP/1.1 303 See Other
Location: http://dbpedia.org/data/Semantic_Web
- issue a new GET request:
GET /data/Semantic_Web HTTP/1.1
Host: dbpedia.org
Accept: application/rdf+xml
- server replies:
HTTP/1.1 200 OK
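The two exchanges above suggest how a server might choose the redirect target from the Accept header. The sketch below only mimics the /resource/ → /page/ or /data/ pattern shown on the slides; the function name `see_other` and its naive header matching are assumptions and do not describe DBpedia's actual implementation.

```python
def see_other(path: str, accept: str) -> tuple[int, str]:
    """Return (status, Location) for a request on a /resource/ IRI."""
    prefix = "/resource/"
    if not path.startswith(prefix):
        raise ValueError("not a /resource/ IRI")
    local_name = path[len(prefix):]
    # Naive content negotiation: only look for the RDF/XML media type,
    # ignoring q-values and other RDF syntaxes.
    wants_rdf = "application/rdf+xml" in accept
    target = "/data/" if wants_rdf else "/page/"
    return 303, "http://dbpedia.org" + target + local_name


print(see_other("/resource/Semantic_Web", "text/html"))
print(see_other("/resource/Semantic_Web", "application/rdf+xml"))
```

The thing itself (/resource/…) never answers 200 OK; only the HTML page and the RDF document describing it do.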
10
Means of publishing RDF
- Put RDF files online (in RDF/XML, Turtle, etc.)
- Publish RDF along with web pages (RDFa & JSON-LD)
- Some CMS generate RDF automatically (e.g., Drupal 7+)
- You’ll see more about JSON-LD later
- Generate RDF from other existing formats
- Keep RDF inside database, but provide access via queries (SPARQL endpoints)
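As an illustration of publishing RDF along with a web page, the sketch below wraps a JSON-LD description in the script element that HTML pages use to embed it. The profile values, the IRI `http://example.org/profile#me` and the helper name `jsonld_script` are placeholders chosen for the example.

```python
import json

profile = {
    "@context": "https://schema.org",
    "@id": "http://example.org/profile#me",  # hypothetical hash IRI
    "@type": "Person",
    "name": "Jane Doe",  # placeholder data
    "affiliation": {"@type": "Organization", "name": "Example University"},
}


def jsonld_script(data: dict) -> str:
    """Wrap a JSON-LD document in the script element used to embed it in HTML."""
    return ('<script type="application/ld+json">\n'
            + json.dumps(data, indent=2)
            + "\n</script>")


print(jsonld_script(profile))
```

A real page would also need to escape any `</` sequence occurring inside string values, which this sketch leaves out.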
12
Existing online RDF datasets
13
Finding existing vocabularies
- Reuse well known vocabularies (Dublin Core, schema.org, FOAF, SIOC, Good Relations, SKOS, voiD, etc.)
- Try an ontology / vocabulary search engine or repository:
- Ask mailing lists, forums (semantic-web@w3.org, stackoverflow.com)
14
Build your own vocabulary
15
Case 1: Build linked data from text
- Describe in RDF the following situation:
The new iPhone 42, manufactured by Apple Inc., is sold by D. Adams Business Co. at $1499, but currently it is on sale at $1299 until 31st October 2025.
16
Case 2: Build linked data from existing data
- Translate the following tables to RDF:
| TeamID | Name | Country | Coach |
| FRA | XV de France | France | Laporte |
| NZL | All Blacks | New Zealand | Henry |
| ENG | XV of the Rose | England | Ashton |
| … | … | … | … |
| PlayerID | Name | TeamID | Position |
| 1 | Vincent Clerc | FRA | wing |
| 2 | Lionel Beauxis | FRA | flyhalf |
| 3 | Joe Rokocoko | NZL | wing |
| … | … | … | … |
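One possible way to approach such a translation, among many, is to mint an IRI per row, turn foreign keys into links and other cells into literals. The namespace `http://example.org/rugby/`, the property names and the helper `row_to_triples` are all assumptions made for this sketch; it emits N-Triples lines.

```python
EX = "http://example.org/rugby/"  # hypothetical namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

player = {"PlayerID": "1", "Name": "Vincent Clerc", "TeamID": "FRA", "Position": "wing"}


def row_to_triples(kind, key, row, foreign_keys=None):
    """Yield N-Triples lines for one table row."""
    foreign_keys = foreign_keys or {}
    subject = f"<{EX}{kind}/{row[key]}>"
    yield f"{subject} <{RDF_TYPE}> <{EX}{kind.capitalize()}> ."
    for column, value in row.items():
        if column == key:
            continue  # the key is already encoded in the subject IRI
        if column in foreign_keys:
            # A foreign key becomes a link to another resource, not a literal.
            prop, target = foreign_keys[column]
            yield f"{subject} <{EX}{prop}> <{EX}{target}/{value}> ."
        else:
            literal = value.replace("\\", "\\\\").replace('"', '\\"')
            yield f'{subject} <{EX}{column.lower()}> "{literal}" .'


for line in row_to_triples("player", "PlayerID", player, {"TeamID": ("team", "team")}):
    print(line)
```

Reusing terms from an existing vocabulary instead of minting `ex:` properties would usually be preferable.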
17
Case 3: UML to RDF vocabulary
Usually, these translations are appropriate:
- UML classes → RDF classes
- UML attributes → RDF properties with literals as range
- UML links → RDF properties
- generalization → rdfs:subClassOf
- Visibility and methods are normally not represented in RDF (it’s not a programming language)
- Cardinalities cannot be represented in RDFS; they can be in OWL (cf. future courses), but be careful!
- Note: in RDF, properties are not attached to classes. They are first class citizens.
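The mapping above can be sketched as a small generator of Turtle statements. The `UmlClass` structure, the `ex:` prefix and the example class are hypothetical; UML links, visibility and methods are left out.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class UmlClass:
    name: str
    parent: Optional[str] = None  # generalization target, if any
    attributes: dict = field(default_factory=dict)  # attribute name -> XSD datatype


def to_rdfs(cls: UmlClass) -> list:
    """Return Turtle statements (prefixes ex:, rdf:, rdfs:, xsd: assumed declared)."""
    lines = [f"ex:{cls.name} a rdfs:Class ."]
    if cls.parent:
        lines.append(f"ex:{cls.name} rdfs:subClassOf ex:{cls.parent} .")
    for attr, datatype in cls.attributes.items():
        # Each property is declared on its own; rdfs:domain only relates it to the class.
        lines.append(
            f"ex:{attr} a rdf:Property ; rdfs:domain ex:{cls.name} ; rdfs:range xsd:{datatype} ."
        )
    return lines


student = UmlClass("Student", parent="Person", attributes={"studentNumber": "string"})
print("\n".join(to_rdfs(student)))
```

The rdfs:domain statement is optional here: it does not restrict where the property may be used, it only lets a reasoner infer the class of the subject.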
18
RDF files and RDF APIs
- RDF files (RDF/XML, Turtle, N-triples, etc.) can be read into memory with RDF APIs
- The in-memory model of an RDF graph can be manipulated with API methods
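To give an idea of what an in-memory model looks like, here is a toy graph with pattern matching. It is not the API of any particular library: the class and method names are made up for the illustration, and real RDF APIs add parsing, serialisation, typed terms and much more.

```python
class Graph:
    """A toy in-memory RDF graph: a set of (subject, predicate, object) tuples."""

    def __init__(self):
        self._triples = set()

    def add(self, s, p, o):
        self._triples.add((s, p, o))

    def triples(self, s=None, p=None, o=None):
        """Yield triples matching the pattern; None acts as a wildcard."""
        for triple in self._triples:
            if all(want is None or want == got
                   for want, got in zip((s, p, o), triple)):
                yield triple


g = Graph()
g.add("ex:alice", "foaf:knows", "ex:bob")
g.add("ex:alice", "foaf:name", '"Alice"')
print(sorted(g.triples(s="ex:alice")))
```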
19
Storing and managing RDF
- RDF files (RDF/XML, Turtle, N-triples, etc.) can be read into memory with RDF APIs
- Some triple stores scale up to trillions of RDF triples, given enough hardware: GraphDB, AllegroGraph, Virtuoso, Stardog, Amazon Neptune…
- Small-capacity triple stores (good for quick development of simple Web apps): Jena Fuseki, Sesame (now RDF4J), and others
20
Linked Data Platform
@base <http://example.com/ldp/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix ldp: <http://www.w3.org/ns/ldp#> .
<subfolder> a ldp:Container ;
    dct:created "2021-10-01T13:30:30+02:00"^^xsd:dateTime ;
    ldp:contains <this>, <that>, <it> .
21
Using a Linked Data Platform
- Interaction via HTTP requests
- GET requests access data
- POST requests create resources with an RDF graph and place them in a container
- PUT requests update specific resources
- Some metadata is added in passing...
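The three kinds of requests can be sketched with the standard library, building the requests without sending them. The URLs reuse the `http://example.com/ldp/` base from the container example; the Turtle payload and the `Slug` hint are assumptions, and a real server decides the IRI of the created resource itself and may require further headers on PUT.

```python
from urllib.request import Request

CONTAINER = "http://example.com/ldp/subfolder"  # container from the example above
RESOURCE = "http://example.com/ldp/this"        # one of its contained resources

# Hypothetical payload: a relative IRI <> refers to the resource being written.
turtle = b'<> <http://purl.org/dc/terms/title> "A new resource" .'

# GET: read the container (or any resource) as Turtle.
read = Request(CONTAINER, headers={"Accept": "text/turtle"}, method="GET")

# POST: create a new resource inside the container; Slug is only a naming hint.
create = Request(
    CONTAINER,
    data=turtle,
    headers={"Content-Type": "text/turtle", "Slug": "new"},
    method="POST",
)

# PUT: replace the state of an existing resource.
update = Request(
    RESOURCE,
    data=turtle,
    headers={"Content-Type": "text/turtle"},
    method="PUT",
)

for request in (read, create, update):
    print(request.get_method(), request.full_url)
```

Each request would be sent with `urllib.request.urlopen`; after a successful POST, the IRI of the new resource is normally returned in the Location header of the response.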
22