Writing Web data in RDF

This session is about modelling data in graph form using RDF and the Turtle syntax.

Objectives

Quick Turtle tutorial

This is a mini tutorial on how to model knowledge in RDF, with a few tips on good and bad practices. You should be able to go through this section quickly. The practical work is given in the next section.

Basic relations

An RDF graph contains a set of node–arc–node relations. A simple graph like this one:

Daniel works for Google
A basic node–arc–node relation

can be encoded in Turtle as follows:

<Daniel> <worksFor> <Google> .

This forms a triple where we will call the first element of the triple its subject, the second element its predicate, and the third element its object. Note that there is a dot at the end. For a more complex graph like:

Daniel works for Google and Google has parent company Alphabet
Multiple node–arc–node relations

we can simply add more triples, separated by dots:

<Daniel> <worksFor> <Google> .
<Google> <hasParentCompany> <Alphabet> .

Note again that the dot separates the triples. When there are multiple arcs coming out of the same node, we can simplify the notation. The following graph:

Google has parent company Alphabet and has headquarter at the Googleplex
Multiple node-arc-node relations with the same subject

can be written like this:

<Google> <hasParentCompany> <Alphabet> .
<Google> <hasHeadquarter> <Googleplex> .

or, more concisely, like this:

<Google> <hasParentCompany> <Alphabet> ;
    <hasHeadquarter> <Googleplex> .

When the subject is the same, we can avoid repeating it by simply adding a semicolon between predicate–object pairs. When the series of predicate–object pairs is finished, we must add a dot. We can further simplify the notation when the subject and the predicate are the same:

Google has parent company Alphabet, was founded by Sergey Brin and Larry Page, and has headquarter at the Googleplex
Multiple node-arc-node relations with the same subject or same subject and predicate

can be written:

<Google> <hasParentCompany> <Alphabet> .
<Google> <hasHeadquarter> <Googleplex> .
<Google> <hasFounder> <LarryPage> .
<Google> <hasFounder> <SergeyBrin> .

or more concisely:

<Google> <hasParentCompany> <Alphabet> ;
    <hasHeadquarter> <Googleplex> ;
    <hasFounder> <LarryPage> ;
    <hasFounder> <SergeyBrin> .

or even more concisely:

<Google> <hasParentCompany> <Alphabet> ;
    <hasHeadquarter> <Googleplex> ;
    <hasFounder> <LarryPage>, <SergeyBrin> .

Note the comma that separates two objects for the same subject and predicate.

Remarks: (1) the order of the triples is not important; (2) there is no shortcut when the object is repeated with different subjects or predicates; (3) a graph is a set of triples, so a given triple cannot appear in it multiple times (i.e., if the subject, the predicate, and the object are the same, then the triple is the same), and writing it twice in a Turtle file adds nothing to the graph; (4) in Turtle, spaces cannot be used in node names or predicate labels.
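As a small illustration of remarks (2) and (3), here is a sketch that reuses the names from the examples above. The node <Alice> is hypothetical and only added so that two subjects share the same object.

```turtle
# Remark (2): <Google> is the object of both triples, but Turtle has no
# abbreviation for a shared object, so each triple is written in full.
# <Alice> is a hypothetical node used only for this illustration.
<Daniel> <worksFor> <Google> .
<Alice> <worksFor> <Google> .

# Remark (3): writing the same triple again is accepted by Turtle parsers,
# but the resulting graph still contains it only once.
<Daniel> <worksFor> <Google> .
```

Loaded into an RDF tool, this file should yield a graph with two triples, not three.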

Literals and datatypes

In general, nodes in RDF graphs represent things in the real world that we want to describe. These things can be concrete, physical entities (people, objects, etc.), or abstract things (concepts, ideas, legal entities, etc.). Most of these things cannot be fully encoded in a computer: only their (partial) description can be encoded. However, there are entities that can be fully represented and stored as data, such as integers, decimal numbers, character strings, dates. In this case, we use a different type of node to represent them, which we call “literals” because what they represent is literally what’s written. In graphical notation, they are often drawn as rectangles:

Larry Page’s name is “Lawrence Edward Page”
Larry Page’s name is, literally, “Lawrence Edward Page”

In Turtle, this is written:

<LarryPage> <name> "Lawrence Edward Page" .

A literal can have spaces in it. A literal can be of different types (number, string, date, etc.) and the set of literal types may be open, or even infinite in some applications. To make sure we interpret the value of a literal correctly, we must associate a datatype with it, as in the following example:

Larry Page’s name is “Lawrence Edward Page” and he was born on the 26th of March, 1973

In Turtle, this can be written as:

<LarryPage> <name> "Lawrence Edward Page" ;
    <birthdate> "1973-03-26"^^xsd:date .

The datatype xsd:date determines how we interpret the string 1973-03-26. Some standard datatypes can be written more concisely in Turtle: strings, integers, decimal numbers, and binary floating-point numbers. A standard datatype for dates exists, but there is no short notation for it in Turtle. The following example shows how integers and decimal numbers can be written, and also displays comments in Turtle notation:

# This is a comment, starting with '#' and ending at the end of the line
<LarryPage> <name> "Lawrence Edward Page" ; # Character string
    <numberOfChildren> 2 ; # Integer: just a sequence of digits
    <height> 1.7 . # Decimal: 2 sequences of digits separated by '.'
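The short notation for binary floating-point numbers is not shown above. As a sketch, a number written with an exponent is read as an xsd:double. Turtle also has a short form for booleans (true and false), which this tutorial does not otherwise cover. The predicate <isCofounder> is a hypothetical name chosen for the illustration; the wealth figure is the one used later in this section.

```turtle
<LarryPage> <wealthInDollars> 107.9e9 ; # Double: digits followed by an exponent (e or E)
    <isCofounder> true . # Boolean: the bare keyword true or false (hypothetical predicate)
```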

Other features of Turtle

In order to use IRIs stemming from different places, we define prefixes. In this session, you do not need to be much concerned about IRI namespaces, but you may want to use at least the standard XML Schema Datatypes (XSDs). For this, write this line at the beginning of your Turtle file:

PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

Then you can use the XSDs like this:

PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

<LarryPage> <name> "Lawrence Edward Page"^^xsd:string ; # Equivalent to "Lawrence Edward Page" without datatype
    <numberOfChildren> "2"^^xsd:integer ; # Equivalent to 2 without quotes and datatype
    <height> "1.7"^^xsd:decimal ; # Equivalent to 1.7 without quotes and datatype
    <birthdate> "1973-03-26"^^xsd:date ; # Uses ISO 8601 format
    <wikipediaPage> "https://en.wikipedia.org/wiki/Larry_Page"^^xsd:anyURI ;
    <wealthInDollars> "107.9e9"^^xsd:double . # Binary floating point double precision
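Prefixes are not limited to the XSD namespace: you can also declare one for your own IRIs. The following is a minimal sketch, assuming a made-up namespace http://example.org/ bound to the prefix ex: (both are placeholders, not names prescribed by this session).

```turtle
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX ex: <http://example.org/> # Hypothetical namespace, replace with your own

# ex:LarryPage is short for <http://example.org/LarryPage>
ex:LarryPage ex:name "Lawrence Edward Page" ;
    ex:birthdate "1973-03-26"^^xsd:date .
```

Turtle also accepts the older form of the declaration, written @prefix ex: <http://example.org/> . with a dot at the end.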

You can give a type to an entity by using the special predicate a, a Turtle shorthand for rdf:type that is more or less equivalent to the phrase “is a”:

<LarryPage> a <Person> .
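To make the meaning of a explicit, the sketch below writes the same typing triple in both forms and then combines typing with the abbreviations seen earlier. The type <Entrepreneur> is a hypothetical class added for the illustration.

```turtle
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

<LarryPage> a <Person> . # Shorthand form
<LarryPage> rdf:type <Person> . # Same triple, with the predicate spelled out

# An entity can have several types; the comma and semicolon work as usual.
<LarryPage> a <Person>, <Entrepreneur> ; # <Entrepreneur> is hypothetical
    <name> "Lawrence Edward Page" .
```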

Basic tips and good practices

Here is a set of tips that you need to keep in mind when making a knowledge graph:

Authoring data in RDF

You will be writing some RDF in the Turtle format. You can use the Turtle Editor that is available online, but text editors and IDEs often have syntax highlighting for Turtle, so you can also use your tool of choice.

In this session we want to describe product data and connect it to other kinds of data.

Describe a product model

  1. Consider the description of a hard disk drive model on Amazon.
  2. Define an IRI for the hard drive model and relate it to the IRI of the manufacturer that you need to define.
  3. You should be able to quickly build an RDF graph with all the basic characteristics of the product.

Save the code to a file with name YourFirstName-YourLastName-ex1.ttl.

Describe individual entities of this model

  1. Assume we want to keep track of the life cycle of an individual hard drive. The hard drives may be sent to different retail stores and then sold to different customers who make different use of the products.
  2. Assume further that there are 3 hard drives of the same model that are sold in different ways. Two are new products and one is a second-hand hard drive that was reconditioned after being returned by a previous customer.
  3. Extend your initial RDF graph to describe the 3 individual products.
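One possible starting point, given only as a sketch, is to keep the product model and each physical drive as separate nodes and to link every drive to its model. All names below (the ex: namespace, ex:DriveModelX, ex:SomeManufacturer, ex:drive1, and every predicate and class) are hypothetical placeholders; other modelling choices are equally valid.

```turtle
PREFIX ex: <http://example.org/> # Hypothetical namespace

# The model: what is common to all drives of this kind.
ex:DriveModelX a ex:ProductModel ;
    ex:hasManufacturer ex:SomeManufacturer .

# One individual drive: a distinct node that points to its model.
ex:drive1 a ex:Product ;
    ex:isInstanceOfModel ex:DriveModelX ;
    ex:condition "new" .
```

The two other drives would follow the same pattern with their own IRIs.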

Save the code to a file with name YourFirstName-YourLastName-ex2.ttl.

Describe the life cycles of different products

  1. One of the products is sold by Amazon at €89.90. One is sold by LDLC at €90.00. One is a second-hand product at €74.90.
  2. The product at Amazon has a 2-year warranty. The product at LDLC has a 2-year warranty with an optional 5-year warranty for an extra €9.90. The second-hand product has a 1-year warranty.

Save the code to a file with name YourFirstName-YourLastName-ex3.ttl.

Describe the life-cycle of the product

The second-hand hard drive was sold to Lucas on the 23rd of January 2023 for €119.00 and sent back on the 1st of February 2023 for reimbursement. The product was then sold to Yassine on the 30th of June 2023 for €74.90.

  1. Extend the RDF file you had before with information about this situation. You may have to introduce identifiers for different states, situations, or steps in a process. Assuming there are many situations like this, your data must make it possible to know which product was sold and returned, which product was sold at what price and when, etc.
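A single triple cannot carry a buyer, a date, and a price at the same time, which is why intermediate identifiers are suggested. One common way to do this, sketched here under assumptions, is to give each sale and each return its own node. The ex: namespace and all class, predicate, and node names are hypothetical; only the buyer, the dates, and the price come from the scenario above.

```turtle
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX ex: <http://example.org/> # Hypothetical namespace

# The first sale is a node of its own, so it can carry several properties.
ex:sale1 a ex:Sale ;
    ex:soldProduct ex:drive3 ;
    ex:buyer ex:Lucas ;
    ex:saleDate "2023-01-23"^^xsd:date ;
    ex:priceInEuros 119.00 .

# The return refers back to the sale it cancels.
ex:return1 a ex:Return ;
    ex:returnedSale ex:sale1 ;
    ex:returnDate "2023-02-01"^^xsd:date .
```

The later sale to Yassine can be described with a second sale node pointing to the same drive.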

Save the code to a file with name YourFirstName-YourLastName-ex4.ttl.