This session is about modelling data in graph form using RDF and the Turtle syntax.
This is a mini tutorial on how to model knowledge in RDF, with a few tips on good and bad practices. You should be able to go through this section quickly. The practical work is given in the next section.
An RDF graph contains a set of node–arc–node relations. A simple graph like this one:
can be encoded in Turtle as follows:
<Daniel> <worksFor> <Google> .
This forms a triple where we will call the first element of the triple its subject, the second element its predicate, and the third element its object. Note that there is a dot at the end. For a more complex graph like:
we can simply add more triples, separated by dots:
<Daniel> <worksFor> <Google> .
<Google> <hasParentCompany> <Alphabet> .
Note again that the dot separates the triples. When there are multiple arcs coming out of the same node, we can simplify the notation. The following graph:
can be written like this:
<Google> <hasParentCompany> <Alphabet> .
<Google> <hasHeadquarter> <Googleplex> .
or, more concisely, like this:
<Google> <hasParentCompany> <Alphabet> ;
    <hasHeadquarter> <Googleplex> .
When several triples share the same subject, we can avoid repeating it by placing a semicolon between the predicate–object pairs. When the series of predicate–object pairs is finished, we must add a dot. We can further simplify the notation when both the subject and the predicate are the same. The following graph:
can be written:
<Google> <hasParentCompany> <Alphabet> .
<Google> <hasHeadquarter> <Googleplex> .
<Google> <hasFounder> <LarryPage> .
<Google> <hasFounder> <SergeyBrin> .
or more concisely:
<Google> <hasParentCompany> <Alphabet> ;
    <hasHeadquarter> <Googleplex> ;
    <hasFounder> <LarryPage> ;
    <hasFounder> <SergeyBrin> .
or even more concisely:
<Google> <hasParentCompany> <Alphabet> ;
    <hasHeadquarter> <Googleplex> ;
    <hasFounder> <LarryPage>, <SergeyBrin> .
Note the comma that separates two objects for the same subject and predicate.
Remarks: (1) the order of the triples is not important; (2) there is no shortcut when the object is repeated with different subjects or predicates; (3) a given triple cannot appear multiple times (i.e., if the subject, the predicate, and the object are the same, then the triple is the same); (4) in Turtle, spaces cannot be used in node names or predicate labels.
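As a small illustration of remarks (1) and (2), the sketch below lists two triples that share the same object. The predicate <isFounderOf> is a hypothetical name introduced only for this example.

```turtle
# The object <Google> is shared, but the subjects and predicates differ,
# so neither ';' nor ',' applies: each triple is written in full.
<Daniel> <worksFor> <Google> .
<LarryPage> <isFounderOf> <Google> .   # <isFounderOf> is an assumed predicate name
```

Swapping the two lines would describe exactly the same graph.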
In general, nodes in RDF graphs represent things in the real world that we want to describe. These things can be concrete, physical entities (people, objects, etc.), or abstract things (concepts, ideas, legal entities, etc.). Most of these things cannot be fully encoded in a computer: only their (partial) description can be encoded. However, some entities can be fully represented and stored as data, such as integers, decimal numbers, character strings, and dates. In this case, we use a different type of node to represent them, which we call “literals” because what they represent is literally what’s written. In graphical notation, they are often drawn as rectangles:
In Turtle, this is written:
<LarryPage> <name> "Lawrence Edward Page" .
A literal can have spaces in it. A literal can be of different types (number, string, date, etc.) and the set of literal types may be open, or even infinite in some applications. To make sure we interpret the value of a literal correctly, we must associate a datatype with it, as in the following example:
In Turtle, this can be written as:
<LarryPage> <name> "Lawrence Edward Page" ;
    <birthdate> "1973-03-26"^^xsd:date .
The datatype xsd:date determines how we can interpret the string 1973-03-26. There exist standard datatypes that can be used more concisely in Turtle, for strings, integers, decimal numbers, and floating point binary numbers. A standard for dates exists but there is no short notation for it in Turtle. The following example shows how integers and decimal numbers can be written, and also displays comments in Turtle notation:
# This is a comment, starting with '#' and ending at the end of the line
<LarryPage> <name> "Lawrence Edward Page" ; # Character string
    <numberOfChildren> 2 ; # Integer: just a sequence of digits
    <height> 1.7 . # Decimal: 2 sequences of digits separated by '.'
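The short notation for floating point numbers is not shown above. As a sketch, a number written with an exponent is read as an xsd:double, while a number with a dot but no exponent remains a decimal. The predicate <wealthInDollars> and its value are the ones used further down in this section.

```turtle
# A number with an exponent ('e' or 'E') is a floating point double.
<LarryPage> <wealthInDollars> 107.9e9 .   # Same as "107.9e9"^^xsd:double
<LarryPage> <height> 1.7 .                # No exponent: still a decimal
```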
In order to use IRIs stemming from different places, we define prefixes. In this session, you do not need to be much concerned about IRI namespaces, but you may want to use at least the standard XML Schema Datatypes (XSDs). For this, write this line at the beginning of your Turtle file:
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
Then you can use the XSDs like this:
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
<LarryPage> <name> "Lawrence Edward Page"^^xsd:string ; # Equivalent to "Lawrence Edward Page" without datatype
    <numberOfChildren> "2"^^xsd:integer ; # Equivalent to 2 without quotes and datatype
    <height> "1.7"^^xsd:decimal ; # Equivalent to 1.7 without quotes and datatype
    <birthdate> "1973-03-26"^^xsd:date ; # Uses ISO 8601 format
    <wikipediaPage> "https://en.wikipedia.org/wiki/Larry_Page"^^xsd:anyURI ;
    <wealthInDollars> "107.9e9"^^xsd:double . # Binary floating point double precision
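Prefixes are not limited to datatypes. The sketch below assumes a made-up namespace, http://example.org/, bound to the prefix ex: purely for illustration; any namespace you control could be used instead.

```turtle
PREFIX ex: <http://example.org/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

# ex:LarryPage is short for <http://example.org/LarryPage>
ex:LarryPage ex:name "Lawrence Edward Page" ;
    ex:birthdate "1973-03-26"^^xsd:date .
```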
You can give a type to an entity by using the special predicate a, which is more or less equivalent to the phrase “is a”:
<LarryPage> a <Person> .
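For reference, a is the Turtle shorthand for the standard predicate rdf:type. The sketch below writes the same typing triple both ways; <Company> is an assumed class name that mirrors <Person>.

```turtle
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

<LarryPage> a <Person> .
<LarryPage> rdf:type <Person> .   # Same triple as the line above

# A type can be combined with other predicate-object pairs.
<Google> a <Company> ;            # <Company> is an assumed class name
    <hasFounder> <LarryPage> .
```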
Here is a set of tips to keep in mind when building a knowledge graph:
<Google> <hasParentCompany> <Alphabet> .
<Google> <hasHeadquarter> <Googleplex> .
is a graph with 3 nodes, not 4: both occurrences of <Google> denote the same node.
<Google> <hasParentCompany> <Alphabet>, <Alphabet> ;
    <hasParentCompany> <Alphabet> .
<Google> <hasParentCompany> <Alphabet> .
is a graph with only 1 triple: repeating a triple adds nothing to the graph.
Prefer decimal numbers (xsd:decimal) over xsd:double or xsd:float.
Do not use a generic name such as house to describe a single house. Use, for instance, house1, house2, etc. to distinguish the entities, or use a slash like this:
<house/1> a <House> .
<sale/152196> a <Sale> ;
    <soldBy> <JohnDoe> ;
    <boughtBy> <JaneDoe> ;
    <objectOfTransaction> <samsung/gs22ultra/sn456-997> ;
    <dateOfTransaction> "2022-03-11"^^xsd:date ;
    <priceInEuros> 499.95 .
A predicate name such as headquarter could mean “is the headquarter of” or “has headquarter”. Be consistent!
Avoid a bare has for a relation. “X has Y” could mean that X owns Y, or that X has the characteristic Y, or that X has the disease Y, etc. Be precise!
You will be writing some RDF in the Turtle format. You can use the Turtle Editor that is available online, but text editors and IDEs often have syntax highlighting for Turtle, so you can also use your tool of choice.
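Before starting, the tips on predicate names can be made concrete with a minimal sketch. The predicate <owns> is an illustrative choice, not a prescribed vocabulary; the other names come from the examples above.

```turtle
# Ambiguous: which node is the headquarter? What kind of "has"?
# <Google> <headquarter> <Googleplex> .
# <JaneDoe> <has> <house/1> .

# More precise: the direction and the nature of each relation are explicit.
<Google> <hasHeadquarter> <Googleplex> .
<JaneDoe> <owns> <house/1> .   # <owns> is an assumed predicate name
```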
In this session we want to describe product data and connect it to other kinds of data.
Save the code to a file named YourFirstName-YourLastName-ex1.ttl.
Save the code to a file named YourFirstName-YourLastName-ex2.ttl.
Save the code to a file named YourFirstName-YourLastName-ex3.ttl.
The second-hand hard drive was sold to Lucas on the 23rd of January 2023 for €119.00 and sent back on the 1st of February 2023 for reimbursement. The product was then sold to Yassine on the 30th of June 2023 for €74.90.
Save the code to a file named YourFirstName-YourLastName-ex4.ttl.