This session is about observing existing data from public knowledge graphs and modelling knowledge as graphs with RDF.
In this part, we will examine data from a well-known Linked Data provider, DBpedia. We will do most of the work in a Web browser. Do the following:
Start a Web browser.
dbo:birthDate. What kind of information does this property provide?
dbo:birthPlace. Move your mouse over the second link in the Value column. At the bottom left of the browser window, you should see the URL to which this link points. Write down this URL or memorise it well.
(http://dbpedia.org/page/England) Consider the property dbp:areaKm. What is the number in the Value column? What does the text between brackets represent? Take a look at dbp:gdpNominalPerCapita. What does the value formally represent? What is its type?
owl:sameAs (not to be confused with schema:sameAs) and look at the values there. You can see URIs that point to other domains. All of them contain RDF data.
curl to consume Linked Data
In this part, you will use cURL for getting data from the Web. If you are already familiar with cURL, you can jump to the next section.
If you do not have it already, download cURL and put it in a folder you will remember. On a Linux OS, cURL is available as a package in most distributions. Use your distribution’s package manager (such as apt or brew) to get it.
You may need to update the PATH variable in your system environment configuration. On MS Windows, you can use Windows key + R, then type SystemPropertiesAdvanced. Then click "Environment variables...". Then find the variable Path or PATH in the user or system variables. Edit it and add the path to the folder where you put cURL.
We will first learn the basics of cURL, then use it to understand how Linked Data principles and best practices are implemented.
In your Web browser, type http://mines-stetienne.fr in the address bar. Notice what is happening. We are going to compare this to what cURL does.
Run curl -V to check that cURL is working. If not, go back to the previous steps.
Run curl http://mines-stetienne.fr and look at the result. cURL displays the payload (that is, the “body”) of the HTTP response. In this case, it is an HTML document saying that the document was moved to https://mines-stetienne.fr.
Run curl https://mines-stetienne.fr and look at the result. It should be empty. We need to figure out what is happening.
Run curl -I https://mines-stetienne.fr and look at the result. -I makes cURL send a HEAD request, so that only the headers of the HTTP response are displayed, not the payload. We see that the resource with URI https://mines-stetienne.fr was found at another location and we see the location where we can find it.
Run curl https://www.mines-stetienne.fr/ and look at the result. This time, you get a web page. This is the HTML code of the page you see in your browser.
The HTTP response codes 301 Moved Permanently and 302 Found are commonly called “redirects”. Your browser directly displays the Web page because it is “following” the redirects. You can check that the URL in the address bar of your browser is https://www.mines-stetienne.fr/. The browser stops following redirects when it receives a 200 OK: this means that the resource you requested (namely, https://www.mines-stetienne.fr/) has been found and is this file, which is an information resource.
You can follow redirects with cURL using the option -L. Check this: curl -L http://mines-stetienne.fr. You can also see what the server responds at each step of the redirection chain by adding -I. You can get even more details about the requests and responses by further adding -v or --verbose.
We will use cURL and DBpedia to see how Linked Data can be accessed via HTTP.
http://dbpedia.org/resource/Tim_Berners-Lee. Use cURL and see which URIs must be requested, in order, to reach a final representation. You may have to use the option -k when requesting https URIs, depending on your system configuration.
Run curl -H "Accept: text/turtle" http://dbpedia.org/resource/Tim_Berners-Lee. Use -H "Accept: text/turtle" on all necessary requests to reach a 200 OK and get some data.
http://dbpedia.org/resource/Tim_Berners-Lee.
If the fourth Linked Data principle is applied, there should be links from one data set to another, so that we can follow links to discover more data.
A tool that can help you navigate through RDF data is RDF Browser, a Firefox extension that shows RDF in the browser whenever it is available by content negotiation. If you have Firefox, you can install and try this extension.
As an alternative, you can also use Postman. Postman is comparable to cURL, except that it has a graphical interface that facilitates navigation (among other things). All links that appear in a response body will be clickable. You will have to manually add Accept headers for content negotiation, though.
In DBpedia, Mines Saint-Étienne’s data are linked to many other data sets. Using the RDF Browser, Postman or cURL (in this order of preference), find a path starting from https://dbpedia.org/resource/%C3%89cole_nationale_sup%C3%A9rieure_des_mines_de_Saint-%C3%89tienne and leading to the University of São Paulo, then to the Technical University of Berlin, then to the University of St. Gallen. This can take a while if you go in the wrong direction, so do not spend too much time on it!
In terms of knowledge graph modelling, we start with a simple exercise that you can do on paper. Use the following facts:
Identify connections, things, and values. Depict things in circles. Depict values in rectangles. Depict connections using arrows. Draw the graph on a piece of paper.
This is a mini tutorial on how to model knowledge in RDF, with a few tips on good and bad practices. You should be able to go through this section quickly. The practical work is given in the next section.
An RDF graph contains a set of node–arc–node relations. A simple graph like this one:
can be encoded in Turtle as follows:
<Daniel> <worksFor> <Google> .
This forms a triple where we will call the first element of the triple its subject, the second element its predicate, and the third element its object. Note that there is a dot at the end. For a more complex graph like:
we can simply add more triples, separated by dots:
<Daniel> <worksFor> <Google> .
<Google> <hasParentCompany> <Alphabet> .
Note again that the dot separates the triples. When there are multiple arcs coming out of the same node, we can simplify the notation. The following graph:
can be written like this:
<Google> <hasParentCompany> <Alphabet> .
<Google> <hasHeadquarter> <Googleplex> .
or, more concisely, like this:
<Google> <hasParentCompany> <Alphabet> ;
    <hasHeadquarter> <Googleplex> .
When the subject is the same, we can avoid repeating it by adding a semicolon between predicate–object pairs. When the series of predicate–object pairs is finished, we must add a dot. We can further simplify the notation when the subject and the predicate are the same:
can be written:
<Google> <hasParentCompany> <Alphabet> .
<Google> <hasHeadquarter> <Googleplex> .
<Google> <hasFounder> <LarryPage> .
<Google> <hasFounder> <SergeyBrin> .
or more concisely:
<Google> <hasParentCompany> <Alphabet> ;
    <hasHeadquarter> <Googleplex> ;
    <hasFounder> <LarryPage> ;
    <hasFounder> <SergeyBrin> .
or even more concisely:
<Google> <hasParentCompany> <Alphabet> ;
    <hasHeadquarter> <Googleplex> ;
    <hasFounder> <LarryPage>, <SergeyBrin> .
Note the comma that separates two objects for the same subject and predicate.
Remarks: (1) the order of the triples is not important; (2) there is no shortcut when the object is repeated with different subjects or predicates; (3) a given triple cannot appear multiple times (i.e., if the subject, the predicate, and the object are the same, then the triple is the same); (4) in Turtle, spaces cannot be used in node names or predicate labels.
In general, nodes in RDF graphs represent things in the real world that we want to describe. These things can be concrete, physical entities (people, objects, etc.), or abstract things (concepts, ideas, legal entities, etc.). Most of these things cannot be fully encoded in a computer: only their (partial) description can be encoded. However, there are entities that can be fully represented and stored as data, such as integers, decimal numbers, character strings, and dates. In this case, we use a different type of node to represent them, which we call “literals” because what they represent is literally what’s written. In graphical notation, they are often drawn as rectangles:
In Turtle, this is written:
<LarryPage> <name> "Lawrence Edward Page" .
A literal can have spaces in it. A literal can be of different types (number, string, date, etc.) and the set of literal types may be open, or even infinite in some applications. To make sure we interpret the value of a literal correctly, we must associate a datatype to it, as in the following example:
In Turtle, this can be written as:
<LarryPage> <name> "Lawrence Edward Page" ;
    <birthdate> "1973-03-26"^^xsd:date .
The datatype xsd:date determines how we can interpret the string 1973-03-26. There exist standard datatypes that can be used more concisely in Turtle, for strings, integers, decimal numbers, and floating point binary numbers. A standard for dates exists but there is no short notation for it in Turtle. The following example shows how integers and decimal numbers can be written, and also displays comments in Turtle notation:
# This is a comment, starting with '#' and ending at the end of the line
<LarryPage> <name> "Lawrence Edward Page" ; # Character string
    <numberOfChildren> 2 ; # Integer: just a sequence of digits
    <height> 1.7 . # Decimal: 2 sequences of digits separated by '.'
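The short notation for binary floating point numbers is not shown in the example above. As a minimal sketch, the snippet below reuses the illustrative predicates of this tutorial and adds a hypothetical predicate <isCofounder> (an assumption made only for this illustration) to show that Turtle also has a short notation for booleans:

```turtle
# A number written with an exponent part is read as an xsd:double
<LarryPage> <wealthInDollars> 107.9e9 ; # Double: digits followed by an exponent
    <height> 1.7 ; # Decimal: no exponent
    <numberOfChildren> 2 ; # Integer
    <isCofounder> true . # Boolean: true or false (hypothetical predicate)
```

What distinguishes a double from a decimal in the short notation is the presence of the exponent.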
In order to use IRIs stemming from different places, we define prefixes. In this session, you do not need to be much concerned about IRI namespaces, but you may want to use at least the standard XML Schema Datatypes (XSDs). For this, write this line at the beginning of your Turtle file:
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
Then you can use the XSDs like this:
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
<LarryPage> <name> "Lawrence Edward Page"^^xsd:string ; # Equivalent to "Lawrence Edward Page" without datatype
    <numberOfChildren> "2"^^xsd:integer ; # Equivalent to 2 without quotes and datatype
    <height> "1.7"^^xsd:decimal ; # Equivalent to 1.7 without quotes and datatype
    <birthdate> "1973-03-26"^^xsd:date ; # Uses ISO 8601 format
    <wikipediaPage> "https://en.wikipedia.org/wiki/Larry_Page"^^xsd:anyURI ;
    <wealthInDollars> "107.9e9"^^xsd:double . # Binary floating point double precision
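Prefixes are not limited to datatypes: any namespace can be abbreviated in the same way. The following sketch assumes a hypothetical namespace http://example.org/ bound to the prefix ex: (both are chosen only for illustration and are not part of the exercises):

```turtle
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX ex: <http://example.org/> # Hypothetical namespace, for illustration only

# ex:LarryPage is shorthand for <http://example.org/LarryPage>
ex:LarryPage ex:name "Lawrence Edward Page" ;
    ex:birthdate "1973-03-26"^^xsd:date .
```

Note that prefixed names are written without angle brackets, whereas full or relative IRIs are always enclosed in them.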
You can give a type to an entity by using the special predicate a, which is more or less equivalent to the phrase “is a”:
<LarryPage> a <Person> .
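The keyword a is Turtle shorthand for the predicate rdf:type from the standard RDF namespace. As a small illustration, the two statements below express the same triple, so a graph containing both still holds a single triple:

```turtle
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

<LarryPage> a <Person> .
<LarryPage> rdf:type <Person> . # Same triple as the line above
```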
Here is a set of tips to keep in mind when making a knowledge graph:
<Google> <hasParentCompany> <Alphabet> .
<Google> <hasHeadquarter> <Googleplex> .
is a graph with 3 nodes, not 4.
<Google> <hasParentCompany> <Alphabet>, <Alphabet> ;
    <hasParentCompany> <Alphabet> .
<Google> <hasParentCompany> <Alphabet> .
is a graph with only 1 triple.
Prefer decimal numbers (xsd:decimal) over xsd:double or xsd:float.
Do not use a generic name such as house to describe a single house. Use, for instance, house1, house2, etc. to distinguish the entities, or use a slash like this:
<house/1> a <House> .
<sale/152196> a <Sale> ;
    <soldBy> <JohnDoe> ;
    <boughtBy> <JaneDoe> ;
    <objectOfTransaction> <samsung/gs22ultra/sn456-997> ;
    <dateOfTransaction> "2022-03-11"^^xsd:date ;
    <priceInEuros> 499.95 .
headquarter could mean “is the headquarter of” or “has headquarter”. Be consistent!
Avoid using just has for a relation. “X has Y” could mean that X owns Y, or that X has the characteristic Y, or that X has the disease Y, etc. Be precise!
You will be writing some RDF in the Turtle format. You can use the Turtle Editor that is available online, but text editors and IDEs often have syntax highlighting for Turtle, so you can also use your tool of choice.
Write the mini-description of a production line that you did before in an RDF file.
You can save the code to a file with the file extension .ttl.
rdfs:label. It is a 35-hour training that started on 24 July 2023 and ends on 28 July 2023. The Web page of the summer school is "https://ai4industry2023.sciencesconf.org/"^^xsd:anyURI
https://w3id.org/people/az/me).
rdf:type (equivalently, using the Turtle keyword a).