Semantic Web - Practical session 2

Quick Turtle tutorial

This is a mini tutorial on how to model knowledge in RDF, with a few tips on good and bad practices. You should be able to go through this section quickly. The practical work is given in the next section.

Basic relations

An RDF graph contains a set of node–arc–node relations. A simple graph like this one:

Figure: a basic node–arc–node relation (Daniel works for Google)

can be encoded in Turtle as follows:

<Daniel> <worksFor> <Google> .

This forms a triple: we call its first element the subject, its second element the predicate, and its third element the object. Note the dot at the end. For a more complex graph like:

Figure: multiple node–arc–node relations (Daniel works for Google, and Google has parent company Alphabet)

we can simply add more triples, separated by dots:

<Daniel> <worksFor> <Google> .
<Google> <hasParentCompany> <Alphabet> .

Note again that each triple ends with a dot. When several arcs come out of the same node, we can simplify the notation. The following graph:

Figure: multiple node–arc–node relations with the same subject (Google has parent company Alphabet and has its headquarters at the Googleplex)

can be written like this:

<Google> <hasParentCompany> <Alphabet> .
<Google> <hasHeadquarter> <Googleplex> .

or, more concisely, like this:

<Google> <hasParentCompany> <Alphabet> ;
    <hasHeadquarter> <Googleplex> .

When the subject is the same, we can avoid repeating it by separating the predicate–object pairs with semicolons. When the series of predicate–object pairs is finished, we must add a dot. We can simplify the notation further when both the subject and the predicate are the same:

Figure: multiple node–arc–node relations with the same subject, or the same subject and predicate (Google has parent company Alphabet, was founded by Sergey Brin and Larry Page, and has its headquarters at the Googleplex)

can be written:

<Google> <hasParentCompany> <Alphabet> .
<Google> <hasHeadquarter> <Googleplex> .
<Google> <hasFounder> <LarryPage> .
<Google> <hasFounder> <SergeyBrin> .

or more concisely:

<Google> <hasParentCompany> <Alphabet> ;
    <hasHeadquarter> <Googleplex> ;
    <hasFounder> <LarryPage> ;
    <hasFounder> <SergeyBrin> .

or even more concisely:

<Google> <hasParentCompany> <Alphabet> ;
    <hasHeadquarter> <Googleplex> ;
    <hasFounder> <LarryPage>, <SergeyBrin> .

Note the comma that separates two objects for the same subject and predicate.

Remarks: (1) the order of the triples is not important; (2) there is no shortcut when the object is repeated with different subjects or predicates; (3) a given triple cannot appear multiple times (i.e., if the subject, the predicate, and the object are the same, then the triple is the same); (4) in Turtle, spaces cannot be used in node names or predicate labels.
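A small sketch illustrating the four remarks, reusing the relative IRIs of the examples above. The `<cofounded>` predicate is a hypothetical name introduced only for this illustration.

```turtle
# (1) Order does not matter: swapping these two lines gives the same graph.
<Google> <hasHeadquarter> <Googleplex> .
<Google> <hasParentCompany> <Alphabet> .

# (2) No shortcut for a shared object: each subject is written in full.
#     (<cofounded> is a hypothetical predicate.)
<LarryPage> <cofounded> <Google> .
<SergeyBrin> <cofounded> <Google> .

# (3) Writing a triple again adds nothing: the graph still contains
#     a single <Google> <hasParentCompany> <Alphabet> triple.
<Google> <hasParentCompany> <Alphabet> .

# (4) No spaces in names: <Larry Page> is not allowed,
#     so a name such as <LarryPage> is used instead.
```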

Literals and datatypes

In general, nodes in RDF graphs represent things in the real world that we want to describe. These things can be concrete, physical entities (people, objects, etc.) or abstract things (concepts, ideas, legal entities, etc.). Most of these things cannot be fully encoded in a computer: only their (partial) description can. However, some entities can be fully represented and stored as data, such as integers, decimal numbers, character strings, and dates. We represent them with a different type of node, called a “literal”, because what it represents is literally what is written. In graphical notation, literals are often drawn as rectangles:

Figure: Larry Page’s name is, literally, “Lawrence Edward Page”

In Turtle, this is written:

<LarryPage> <name> "Lawrence Edward Page" .

A literal can contain spaces. A literal can be of different types (number, string, date, etc.), and the set of literal types may be open, or even infinite in some applications. To make sure we interpret the value of a literal correctly, we must associate a datatype with it, as in the following example:

Figure: Larry Page’s name is “Lawrence Edward Page” and he was born on the 26th of March, 1973

In Turtle, this can be written as:

<LarryPage> <name> "Lawrence Edward Page" ;
    <birthdate> "1973-03-26"^^xsd:date .

The datatype xsd:date determines how the string 1973-03-26 must be interpreted. Turtle has shorter notations for some standard datatypes: character strings (xsd:string), integers (xsd:integer), decimal numbers (xsd:decimal), and binary floating-point numbers (xsd:double). A standard datatype for dates exists, but Turtle has no short notation for it. The following example shows how integers and decimal numbers can be written, and also shows comments in Turtle notation:

# This is a comment, starting with '#' and ending at the end of the line
<LarryPage> <name> "Lawrence Edward Page" ; # Character string
    <numberOfChildren> 2 ; # Integer: just a sequence of digits
    <height> 1.7 . # Decimal: 2 sequences of digits separated by '.'
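The example above does not show the short form for binary floating-point numbers. In Turtle, a number written with an exponent is read as an xsd:double, whereas a date always needs its explicit datatype. A minimal sketch, assuming the xsd: prefix declaration explained in the next section; the wealth value is the one used later in this tutorial.

```turtle
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

<LarryPage> <numberOfChildren> 2 ;          # xsd:integer: digits only
    <height> 1.7 ;                          # xsd:decimal: digits with a '.'
    <wealthInDollars> 107.9e9 ;             # xsd:double: exponent marker 'e'
    <birthdate> "1973-03-26"^^xsd:date .    # no short form for dates
```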

Other features of Turtle

To use IRIs from different namespaces without writing them in full, we define prefixes. In this session, you do not need to be much concerned about IRI namespaces, but you may want to use at least the standard XML Schema Datatypes (XSDs). For this, write this line at the beginning of your Turtle file:

PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

Then you can use the XSDs like this:

PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

<LarryPage> <name> "Lawrence Edward Page"^^xsd:string ; # Equivalent to "Lawrence Edward Page" without datatype
    <numberOfChildren> "2"^^xsd:integer ; # Equivalent to 2 without quotes and datatype
    <height> "1.7"^^xsd:decimal ; # Equivalent to 1.7 without quotes and datatype
    <birthdate> "1973-03-26"^^xsd:date ; # Uses ISO 8601 format
    <wikipediaPage> "https://en.wikipedia.org/wiki/Larry_Page"^^xsd:anyURI ;
    <wealthInDollars> "107.9e9"^^xsd:double . # Binary floating point, double precision
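Prefixes are not limited to XSD. The sketch below also declares a prefix for your own entities; the namespace http://example.org/ is a placeholder assumption, to be replaced with one you control. It shows the older @prefix form as well, which Turtle also accepts and which, unlike PREFIX, ends with a dot.

```turtle
# SPARQL-style declaration (no final dot)
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
# Turtle-style declaration (ends with a dot); placeholder namespace
@prefix ex: <http://example.org/> .

# ex:LarryPage expands to <http://example.org/LarryPage>
ex:LarryPage ex:name "Lawrence Edward Page" ;
    ex:birthdate "1973-03-26"^^xsd:date .
```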

You can give a type to an entity by using the special predicate a, which is more or less equivalent to the phrase “is a”:

<LarryPage> a <Person> .
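The keyword a is shorthand for the predicate rdf:type from the standard RDF namespace. The sketch below writes the same triple both ways; since a triple cannot appear twice, the graph contains only one. The classes <Company> and <Organization> are hypothetical names used for illustration.

```turtle
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

<LarryPage> a <Person> .          # short form
<LarryPage> rdf:type <Person> .   # same triple, written in full

# An entity can have several types, separated by commas like any object list
<Google> a <Company>, <Organization> .
```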

Basic tips and good practices

Here is a set of tips to keep in mind when building a knowledge graph:

Nodes are entirely identified by their name. There cannot be two distinct nodes that have the same name. So, the following code:
```
<Google> <hasParentCompany> <Alphabet> .
<Google> <hasHeadquarter> <Googleplex> .
```
is a graph with 3 nodes, not 4.
Arc labels can be reused on many arcs, but the same arc label cannot appear twice between the same source and the same destination. That is:
```
<Google> <hasParentCompany> <Alphabet>, <Alphabet> ;
    <hasParentCompany> <Alphabet> .
<Google> <hasParentCompany> <Alphabet> .
```
is a graph with only 1 triple.
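Use standard datatypes for literals when available, and prefer decimal notation (implicit xsd:decimal) over xsd:double or xsd:float: decimals are exact, whereas binary floating-point numbers cannot represent values such as 1.7 exactly.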
Be careful how you name an entity: the same name always identifies the same entity. Avoid generic terms like house to describe a single house. Use, for instance, house1, house2, etc. to distinguish the entities, or use a path-like name with a slash:
```
<house/1> a <House> .
```
Some things may appear to be instances when in fact they are categories. For instance, “Samsung Galaxy S22 Ultra” may seem to be an instance of phone, but it is really a category: my damaged Samsung Galaxy S22 Ultra and your Samsung Galaxy S22 Ultra are two distinct instances of the same model. Do not confuse a product model with a single product.
With RDF graphs, you can only represent binary relations. To represent an arbitrary n-ary relation, you can introduce an intermediary node that denotes the relation and connect it to each component of the relation. For instance, a sale connects a seller, a buyer, a product or service, a date, and a price. It could be represented as follows:
```
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

<sale/152196> a <Sale> ;
    <soldBy> <JohnDoe> ;
    <boughtBy> <JaneDoe> ;
    <objectOfTransaction> <samsung/gs22ultra/sn456-997> ;
    <dateOfTransaction> "2022-03-11"^^xsd:date ;
    <priceInEuros> 499.95 .
```
Naming entities (nodes and arcs) is one of the most important tasks in knowledge representation. Bad naming conventions can make a knowledge model more ambiguous, more error-prone, harder to understand, and eventually not used at all. Make sure you establish naming conventions that you and your collaborators follow throughout your knowledge model. E.g., use CamelCase with a capital initial for classes, a lower-case initial for instances and relations, etc. Use verbs for relations everywhere, or nouns everywhere; or define a convention for where you use verbs and where you use nouns. A noun used as a relation, like headquarters, could mean “is the headquarters of” or “has headquarters”. Be consistent!
Name things and relations in a descriptive and unambiguous way. Avoid single generic words like has for a relation: “X has Y” could mean that X owns Y, that X has the characteristic Y, that X has the disease Y, etc. Be precise!
Make your descriptions as independent of context as possible. A number that denotes a price in euros must be distinguished from a number that is a price in dollars, an amount of objects, a measurement, etc. Likewise, a word that has two senses in different contexts is not specific enough as a name. Be specific!
Avoid describing things that change all the time. For instance, prefer the birthdate of a person over their age. Think about whether you may need to keep the history of states of affairs during the whole life cycle of your RDF database.
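The sketch below applies several of these tips at once: it separates a product model from individual products, uses relation names that carry their unit, and records a birthdate rather than an age. All class and relation names (<PhoneModel>, <Phone>, <hasModel>, <manufacturedBy>, <modelName>), the second serial number, and Jane Doe's birthdate are hypothetical illustrations, not a standard vocabulary or real data.

```turtle
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

# The model is a category shared by many phones...
<samsung/gs22ultra> a <PhoneModel> ;
    <manufacturedBy> <Samsung> ;
    <modelName> "Samsung Galaxy S22 Ultra" .

# ...while each physical phone is a distinct instance of that model.
<samsung/gs22ultra/sn456-997> a <Phone> ;
    <hasModel> <samsung/gs22ultra> .
<samsung/gs22ultra/sn123-001> a <Phone> ;   # hypothetical second phone
    <hasModel> <samsung/gs22ultra> .

# The relation name states the unit, instead of a bare <price>.
<sale/152196> <priceInEuros> 499.95 .

# A stable fact (birthdate, hypothetical value) rather than a changing one (age).
<JaneDoe> <birthdate> "1990-05-17"^^xsd:date .
```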

Authoring data in RDF

You will be writing some RDF in the Turtle format. You can use the Turtle Editor that is available online, but text editors and IDEs often provide syntax highlighting for Turtle, so you can also use your tool of choice.

In this session we want to describe product data and connect it to other kinds of data.

Describe a product model

Consider the description of a hard disk drive model on Amazon.
Define an IRI for the hard drive model and relate it to the IRI of the manufacturer that you need to define.
You should be able to quickly build an RDF graph with all the basic characteristics of the product.

Save the code to a file with name YourFirstName-YourLastName-ex1.ttl.

Describe individual entities of this model

Assume we want to keep track of the life cycle of an individual hard drive. The hard drives may be sent to different retail stores and then sold to different customers who use the products in different ways.
Assume further that there are 3 hard drives of the same model that are sold in different ways. Two are new products and one is a second-hand hard drive that was reconditioned after being returned by a previous customer.
Extend your initial RDF graph to describe the 3 individual products.

Save the code to a file with name YourFirstName-YourLastName-ex2.ttl.

Describe the life cycles of different products

One of the products is sold by Amazon at €89.90. One is sold by LDLC at €90.00. One is a second-hand product sold at €74.90.
The product at Amazon has a 2-year warranty. The product at LDLC has a 2-year warranty, with an optional 5-year warranty for an extra €9.90. The second-hand product has a 1-year warranty.

Save the code to a file with name YourFirstName-YourLastName-ex3.ttl.

Describe the life cycle of the product

The second-hand hard drive was sold to Lucas on the 23rd of January 2023 for €119.00 and sent back on the 1st of February 2023 for reimbursement. The product was then sold to Yassine on the 30th of June 2023 for €74.90.

Extend the RDF file you had before with information about this situation. You may have to introduce identifiers for different states, situations, or steps in a process. Assuming there are many situations like this, your data must make it possible to know which product was sold and returned, which product was sold at what price and when, etc.

Save the code to a file with name YourFirstName-YourLastName-ex4.ttl.

Writing Web data in RDF

Objectives