Observing real data in RDF

This session is about publishing structured data in RDF, following Linked Data principles and best practices. It starts by observing existing data on the Web, then moves on to modelling your own data, reusing known vocabularies and linking your data to other data sources.

Objectives

Observing existing data

In this part, we will examine data from a well-known Linked Data provider, DBpedia. We will do most of the work in a Web browser. Do the following:

Start a Web browser.

Open a blank text file that you will save as <yourfirstname>-<yourlastname>.txt. You will put answers to the questions asked below and send the file at the end of the session. Your answers should be very short.

  1. Using the address bar, go to http://dbpedia.org/page/Tim_Berners-Lee. What is this page describing? Write your answer in English (or in French if you prefer) in the text file.
  2. Observe the data available there. The Web page is an HTML document, but it shows RDF triples from the RDF database DBpedia, in an almost human-readable form. Try to figure out the triples that are shown there. Give 3 examples of RDF triples (each on a different line in your text file) observed on this page. Write them in the Turtle format.
  3. The Web page shows a table with two columns. The first column (with header Property) has values that are hyperlinks. Click on some of those links, then look specifically at what is shown for dbo:birthDate. What kind of information does this property provide? Write your answer in English (or French) in the text file.
  4. Go back to the previous page. Can you find the description of the entity in English? What is the property used to provide the description? Copy the description then give the property in your text file.
  5. Now, look at the second column in the table, with header Value. Some values are hyperlinks, some are not. What does it mean when the value is a hyperlink? Try to explain as concisely as possible in your text file.
  6. Consider the line where the Property is dbo:birthPlace. Hover your mouse over the second link in the Value column. At the bottom left of the browser window, you should see the URL to which this link points. Write this URL in your text file.
  7. Click on the link, then take a look at the address bar in your browser. Write down the address shown there in your text file and compare it to the link you saw just before. Why are they different? What does the address in the link represent, compared with the address to which you are redirected? Explain concisely in your text file.
  8. On that page (that is, http://dbpedia.org/page/England) consider the Property dbp:areaKm. What is the number in the Value column? What does the text between brackets represent? Take a look at dbp:gvaPerCapita. What does the value formally represent? What is its type? Write your short answers in the text file.
  9. In the header of the page, you can see “Formats”. Select the Turtle format and look at its content. You can also look at other RDF formats, in particular RDF/XML and JSON-LD.
  10. Tim Berners-Lee is also described in other RDF data sets on the Web. Find the property owl:sameAs (not to be confused with schema:sameAs) and look at the values there. You can see URIs that point to other domains. All of them contain RDF data. Find RDF files that describe Tim Berners-Lee at the Deutsche Nationalbibliothek and at the BBC. As in DBpedia, the data served by these organisations is usually displayed in HTML, but there are links to the RDF data.
  11. To introduce the next series of exercises, you will look at DBpedia prefixes. Remember that you can declare prefixes in Turtle. For instance, declaring
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    in the preamble of a file implies that any occurrence of xsd:something in the remainder of the file is syntactic sugar for <http://www.w3.org/2001/XMLSchema#something>. The URI associated with a prefix is called a namespace, as in XML. Look again at the Turtle document describing Tim Berners-Lee, which should start with a list of prefix declarations. What namespaces belong to DBpedia (namespace URIs that include dbpedia.org)? Write both prefixes and namespace URIs in your answer.
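
To make the prefix mechanism concrete, here is a small sketch that writes one triple twice, first with prefixed names and then with every IRI spelled out. The ex: namespace and the resource names are made up for illustration and do not come from DBpedia.

```turtle
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX ex:  <http://example.org/>   # hypothetical namespace, for illustration only

# With prefixed names:
ex:someone ex:bornOn "1990-01-01"^^xsd:date .

# The same triple with full IRIs:
<http://example.org/someone> <http://example.org/bornOn> "1990-01-01"^^<http://www.w3.org/2001/XMLSchema#date> .
```

A Turtle parser reads both statements as the same triple, so a graph loaded from this file contains it only once.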

Reusing RDF vocabularies

Now that you are a little familiar with DBpedia, consider the exercise from last session where you had to describe Mines Saint-Étienne using its Wikipedia infobox. A solution to the exercise was made available. Rewrite the RDF graph, using DBpedia IRIs as much as possible. Not all relative IRIs in the solution can be replaced by IRIs from DBpedia, but do so wherever possible.
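
As an illustration of the kind of substitution expected, the sketch below rewrites a single triple. The relative IRIs and the ex:locatedIn predicate are hypothetical and may not match the solution you were given. The DBpedia IRI for the city is an assumption that you should check in your browser before using it.

```turtle
PREFIX ex: <http://example.org/>   # hypothetical namespace

# Before: the city is identified by an IRI relative to the document
<MinesSaintEtienne> ex:locatedIn <SaintEtienne> .

# After: the city is identified by a DBpedia IRI (assumed, verify that it dereferences)
<MinesSaintEtienne> ex:locatedIn <http://dbpedia.org/resource/Saint-Étienne> .
```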

Transcribing a graphic view to Turtle

Write a Turtle document that encodes the same data as what is visualised in the following graph:

A graph describing part of the 2nd floor of the EMSE building at Espace Fauriel.

Save the code to a file with name YourFirstName-YourLastName-graph.ttl.

Authoring data in RDF

Now that you have seen how an existing Linked Data web site works, you will be editing and publishing your own RDF files. You will be describing your personal profile in order to build a distributed social network.

You will be writing some RDF in the Turtle format. Use the Turtle Editor that I showed in my presentation and make sure that you regularly save what you write in a file (you'll need it in the next lecture). You will make use of what is called the Friend Of A Friend vocabulary (FOAF) and therefore the RDF document that you make will be called your FOAF profile.

Your Turtle should start with the following prefix declarations:

PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX ex: <http://example.org/> # for predicates you couldn't find elsewhere
PREFIX profile: <#> # for resources you describe yourself
  1. Take a look at Tim Berners-Lee’s FOAF profile and Antoine Zimmermann’s FOAF profile. They respectively chose card:i and az:me to identify themselves. You can use profile:i or profile:me for your own description. Antoine Zimmermann's FOAF profile says az:me a foaf:Person (while Tim Berners-Lee's profile has card:i a :Person, but given the prefix declaration, they are both referring to the same IRI http://xmlns.com/foaf/0.1/Person). Remember that the keyword a is a shorthand for the IRI http://www.w3.org/1999/02/22-rdf-syntax-ns#type that could also be written rdf:type. All IRIs that are used in the object position of an rdf:type triple (that is, after the a keyword) are classes. Write that you are a person (in Turtle).
  2. To enrich your RDF profile, you can say what your topics of interest are, your past projects, etc. The example files (TimBL’s and AZ’s) provide useful property IRIs that you can use to describe yourself. Add triples to your RDF document to indicate that one of your interests is the Semantic Web. For this, you can use the IRI http://dbpedia.org/resource/Semantic_Web that DBpedia defines to identify the Semantic Web. You can add more topics of interest (use DBpedia Lookup to find resources).
  3. Add your full name, or your first name and last name, and possibly your nickname. Relate yourself to your university. Whenever possible, reuse the IRIs you find in Tim Berners-Lee and Antoine Zimmermann's profiles. Otherwise, you may invent new predicates prefixed with ex:, a catch-all prefix.
  4. Add more things about yourself, such as your address, your previous schools, your family members (real or fictitious), friends, etc.
  5. You can indicate that you know someone (or vice-versa), for instance using the property foaf:knows. Add a triple that would state that you know at least one of your classmates. What IRI did you choose to identify them? Why could it be a problem? You'll see in the next assignment how to resolve this problem with Linked Data.
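
The steps above can be sketched as follows for a fictitious person. The names "Alice Example" and "Bob Example" and the identifier profile:bob are hypothetical. The properties used (foaf:name, foaf:nick, foaf:topic_interest, foaf:knows) are one possible selection from the FOAF vocabulary, not the only valid one.

```turtle
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX profile: <#>

# A fictitious profile; replace every value with your own.
profile:me a foaf:Person ;
    foaf:name "Alice Example" ;
    foaf:nick "alice" ;
    foaf:topic_interest <http://dbpedia.org/resource/Semantic_Web> ;
    foaf:knows profile:bob .

# profile:bob expands to an IRI inside this very document.
profile:bob a foaf:Person ;
    foaf:name "Bob Example" .
```

The semicolons let several predicate-object pairs share the subject profile:me, which keeps the profile compact.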

Merging and connecting two graphs

In your file describing yourself, indicate that you are following the Semantic Web course and that this course has a session in room 214. Then add all the data from the file you wrote previously when transcribing the graphical view of a graph to Turtle.
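
A minimal sketch of the connecting triples is given below. The predicates ex:follows, ex:hasSession and ex:takesPlaceIn, as well as the IRIs ex:SemanticWebCourse, ex:session1 and ex:room214, are all invented for illustration. In particular, the room should be identified by whatever IRI you used for room 214 in your graph file.

```turtle
PREFIX ex: <http://example.org/>   # catch-all namespace, hypothetical terms
PREFIX profile: <#>

profile:me ex:follows ex:SemanticWebCourse .
ex:SemanticWebCourse ex:hasSession ex:session1 .

# Use the same room IRI as in your graph file, otherwise the two graphs stay disconnected.
ex:session1 ex:takesPlaceIn ex:room214 .
```

Because an RDF graph is a set of triples, merging the two files amounts to placing their triples in one document. The graphs connect only where they use exactly the same IRI.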