Linked Data: interacting with and publishing Web data

This session is about publishing structured data on the Web.

Objectives

Learning cURL

In this part, you will use a command line tool called cURL. It is a tool that every computer scientist should have and know how to use. If you are already familiar with cURL, you can jump to the next section.

If you do not have it already, download cURL and put it in a folder you will remember. On Linux, you can probably install it with a package manager such as apt.

You may need to update the PATH variable in your system environment configuration. On MS Windows, press Windows key + R, type SystemPropertiesAdvanced, then click the Environment Variables... button. Find the variable Path or PATH among the user or system variables, edit it, and add the path of the folder where you put cURL.
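On Linux or macOS, the equivalent of the Windows steps above can be sketched in a shell. The folder $HOME/tools/curl is a hypothetical location chosen for illustration; replace it with the folder you actually used.

```shell
# Hypothetical folder where the curl binary was placed; adjust to your setup.
CURL_DIR="$HOME/tools/curl"

# Append the folder to PATH for the current shell session only.
PATH="$PATH:$CURL_DIR"
export PATH

# Report whether a curl executable can now be found on PATH.
if command -v curl >/dev/null 2>&1; then
  echo "curl found at: $(command -v curl)"
else
  echo "curl not found; check the folder name"
fi
```

The change lasts only for the current terminal session. To keep it, the same PATH lines are typically added to a shell start-up file such as ~/.profile.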

We will first learn the basics of cURL, then use it to understand how Linked Data principles and best practices are implemented.

  1. Open a web browser and type http://www.emse.fr/ in the address bar. Notice what is happening. We are going to compare this to what cURL does.
  2. Open a command-line window (a terminal).
  3. Type curl to check that cURL is working. If not, go back to the previous steps.
  4. Type curl http://www.emse.fr/ and look at the result. cURL displays the payload (that is, the “body”) of the HTTP response. In this case, it is an HTML document saying that the document was moved to https://www.emse.fr/.
  5. Type curl https://www.emse.fr/ and look at the result. It should be empty. We need to figure out what is happening.
  6. Type curl -I https://www.emse.fr/ and look at the result. The option -I makes cURL send a HEAD request and display only the headers of the HTTP response, not the payload. We see that the resource with URI https://www.emse.fr/ was permanently moved, and we see the location where we can find it.
  7. Type curl http://www.mines-stetienne.fr/ and look at the result. Again, this has been moved. You can also check what is in the headers of the response with the option -I.
  8. Type curl https://www.mines-stetienne.fr/ and look at the result. This time, you get a web page. This is the HTML code of the page you see in your browser.
  9. The HTTP response codes 301 Moved Permanently and 302 Found are commonly called “redirects”. Your browser directly displays the Web page because it is “following” the redirects. You can check that the URL in the address bar of your browser is https://www.mines-stetienne.fr/. The browser stops following redirects when it receives a 200 OK: this means that the resource you requested (namely, https://www.mines-stetienne.fr/) has been found and that the returned file is the resource itself, which is an information resource.

    You can follow redirects with cURL using the option -L. Check this: curl -L http://www.emse.fr/. You can also see what the server responds at each step of the redirect chain by adding -I. You can get even more details about the requests and responses by further adding --verbose.
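The redirect chain explored in the steps above can be condensed into a few lines per hop. The following is a minimal sketch, not part of the original session: the function names summarize_headers and redirect_chain are made up for illustration, and the filter assumes that each response starts with a status line beginning with HTTP/.

```shell
# Keep only the status lines and Location headers from HTTP response headers
# read on standard input.
summarize_headers() {
  grep -iE '^(HTTP/|location:)'
}

# Send HEAD requests (-I), follow redirects (-L), hide the progress meter (-s),
# and print the status line (plus Location, if any) of every hop.
redirect_chain() {
  curl -sIL "$1" | summarize_headers
}

# Example call (needs network access):
# redirect_chain http://www.emse.fr/
```

Each 301 or 302 status line should be followed by the Location the server points to, and the last status line shows where the chain ends.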

Use cURL on Linked Data

We will use cURL and DBpedia to see how Linked Data can be accessed via HTTP.

  1. We want to get a representation of the resource identified by http://dbpedia.org/resource/Tim_Berners-Lee. Use cURL and see what URIs must be requested, in order, to reach a final representation. Depending on your system configuration, you may need the option -k (which disables certificate verification) when requesting https URIs. Take note of all the URIs in a text file that you will send (to antoine.zimmermann@emse.fr) at the end of the session.
  2. What is the format of the final response, after following the redirects? What is the response code? Note your answers in the text file.
  3. You can request a different format by changing the headers of your HTTP request. Type curl -H "Accept: text/turtle" http://dbpedia.org/resource/Tim_Berners-Lee. Use -H "Accept: text/turtle" on all necessary requests to reach a 200 OK and get some data. Write the list of URIs that have been requested along the way.
  4. Write a single command that gets the RDF/XML representation of the resource http://dbpedia.org/resource/Tim_Berners-Lee and write it in your answer file.
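To check the answers to the questions above, it can help to print only the status code and media type of the final response. The sketch below is an assumption-laden helper, not part of the original session: the function name final_format is hypothetical, and https://example.org/ is a placeholder URI. It relies on cURL's -w (write-out) option with the variables http_code and content_type.

```shell
# Print the status code and media type of the final response obtained for a
# URI ($2) when asking for a given media type ($1).
# -s: no progress meter, -L: follow redirects, -o /dev/null: discard the body,
# -w: print the selected variables once the transfer is complete.
final_format() {
  curl -sL -o /dev/null -H "Accept: $1" \
       -w '%{http_code} %{content_type}\n' "$2"
}

# Example call (needs network access; the URI is a placeholder):
# final_format "text/html" https://example.org/
```

Because -H is given once on the command line, the same Accept header is sent on every request of the redirect chain.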

Follow the links

If the fourth Linked Data principle is applied, there should be links from one data set to another, so that we can follow these links to discover more data.

A tool that can help you navigate through RDF data is RDF Browser, a Firefox extension that shows RDF in the browser whenever it is available by content negotiation. If you have Firefox, you can install and try this extension.

In DBpedia, Tim Berners-Lee’s data are linked to other data sets on other websites. In particular, you can look for the property owl:sameAs and follow external links. Difficult: Using these links and any additional information of any kind from the Web, find a path between Tim Berners-Lee on DBpedia and Antoine Zimmermann on any website. Write the URLs of the RDF documents you need to make the connection between the two people. DO NOT WASTE TOO MUCH TIME DOING THIS!

Write some RDF in Turtle

We are going to make RDF data that conform to Linked Data principles. We want the data to be exposed on a Linked Data Platform (see next section). This means that we will describe every entity of interest in a separate file, and link them to one another.

Create a new file in which you write, in Turtle, a simple description of yourself with your name, a short description, and anything you may want to add about yourself. If you need to introduce a new identified subject (a triple with a different subject IRI), you need to create a different file. To avoid deciding what URI you use to identify yourself, you can simply use the empty relative URI, like this:

@prefix ex: <http://example.com/> .
<>  ex:prop  ex:something .

After this, you can already start the next section to publish Linked Data, but in the end, we would like to interlink all the data from all the students in the class. To do this, we will add properties that describe who is sitting to your left, to your right, in front of you, and behind you. For this, use the properties http://example.com/hasRightNeighbour, http://example.com/hasLeftNeighbour, http://example.com/hasFrontNeighbour, and http://example.com/hasRearNeighbour. You can also say that you know someone else in the class. If there is time, you will investigate how to better describe the configuration of the classroom at a given time (see the last section).
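As an illustration of what such a file might look like, the sketch below writes a small Turtle document from the command line. Everything in it is an assumption made for the example: the file name me.ttl, the person "Alice Example", the relative IRIs <bob> and <carol> standing for classmates' resources, and the choice of the FOAF and RDFS vocabularies for the name, description, and "knows" link. Only the ex: neighbour properties come from the session itself.

```shell
# Write a small Turtle file; the quoted 'EOF' prevents the shell from
# expanding anything inside the document.
cat > me.ttl <<'EOF'
@prefix ex:   <http://example.com/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# <> is the empty relative IRI: it resolves to the URI of this document.
<>  a foaf:Person ;
    foaf:name "Alice Example" ;
    rdfs:comment "Student attending the Linked Data session." ;
    ex:hasLeftNeighbour  <bob> ;
    ex:hasRightNeighbour <carol> ;
    foaf:knows <bob> .
EOF
```

Relative IRIs such as <bob> are resolved against the URI from which the file is eventually served, so they only become meaningful links once the neighbours' resources exist under matching names.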

Publishing data on a Linked Data Platform

Your goal in this part is first to edit RDF in a Turtle file, then publish it on a platform, following Linked Data principles. You will link your data to other people’s data. There is a part that cannot be done from outside Mines Saint-Étienne. If you are doing this from home or somewhere else, skip the parts that deal with the Linked Data Platform.

You can use HTTP POST requests to upload data using cURL. This can be done by adding the option -X POST. You should also specify the format of the data you upload, using -H "Content-Type: THEFORMAT". For instance, to POST data in Turtle format, use: -H "Content-Type: text/turtle". You can submit the content of a file in a POST request using --data-binary @filename. The double dashes are necessary. Finally, provide the URI to which you submit the POST request.
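Put together, the options described above might be combined as in the following sketch. The function name post_turtle, the file me.ttl, and the URI http://example.org/container/ are placeholders chosen for illustration; the option -i only adds the response headers to the output.

```shell
# Send the Turtle file $1 to the URI $2 with an HTTP POST request.
# -i prints the response headers, so the status code is visible.
post_turtle() {
  curl -i -X POST \
       -H "Content-Type: text/turtle" \
       --data-binary "@$1" \
       "$2"
}

# Example call (file name and URI are placeholders):
# post_turtle me.ttl http://example.org/container/
```

Using --data-binary rather than -d keeps the line breaks of the file intact, which matters for Turtle comments that run to the end of a line.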

  1. Go to http://193.49.165.77:8083/.

    The resource you are seeing is a container; it contains other resources. Note the ldp:member triple. This container follows the Linked Data Platform (LDP) standard. It is one of the newest Semantic Web standards by the World Wide Web Consortium (W3C).

    To add new members to the container, the LDP standard prescribes sending a POST request to the container. The body of the request must include an RDF graph, in one of the known RDF formats (e.g. Turtle). Optionally, you can suggest a name for the created resource by adding a Slug header (e.g. Slug: some-resource-name).

  2. Use cURL to add the RDF graph you have designed (cf. previous exercise) as a member of the container. Make sure your cURL command includes the appropriate arguments to comply with the LDP standard. Add -i (or --include) to display the response’s headers.
  3. Look at the response status code.
    • If the response status code is 201, the operation has succeeded. Navigate to the created resource by looking up the corresponding Location header. Where else could you find a link to the created resource?
    • If the response status code is 400, your request was not appropriate. Try with different arguments. The server provides you with an error message (in RDF) that may help you understand the problem.
    • If the response status code is 500, something went wrong on the server’s side. Shout out your dire need for help to get the lecturer’s attention. (Use your acting skills to convey how desperate your situation is.)

    The LDP standard also specifies how to update and remove a resource from a container. To that end, send PUT and DELETE requests to the resource (to update and remove, respectively, using the -X option). The body of the PUT request must include an RDF graph. The existing RDF representation of the resource will then be replaced by the provided RDF graph.

  4. Use cURL to remove the resource you have created, then re-create it (with POST).
  5. Navigate back to the resource container: http://193.49.165.77:8083/. You should now see other member resources (those created by your classmates). Pick one and note its URI.
  6. Add a link (a triple) to your classmate’s resource in your RDF graph. Use cURL to update your resource online. Is the response status code equal to 204?
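The update and removal requests used in the steps above could be wrapped as in the sketch below. The function names put_turtle and delete_resource and the example URI are hypothetical. Some LDP servers may additionally expect an If-Match header carrying the resource's ETag on PUT; whether this particular server does is not stated here.

```shell
# Replace the representation of the resource at URI $2 with the Turtle file $1.
put_turtle() {
  curl -i -X PUT \
       -H "Content-Type: text/turtle" \
       --data-binary "@$1" \
       "$2"
}

# Remove the resource at URI $1 from its container.
delete_resource() {
  curl -i -X DELETE "$1"
}

# Example calls (file name and URI are placeholders):
# put_turtle me.ttl http://example.org/container/some-resource-name
# delete_resource http://example.org/container/some-resource-name
```

In both cases -i shows the status line, which is what the questions about response codes ask you to inspect.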

Model a classroom configuration

This is an exercise in graph-based data modelling. We would like to be able to describe, in the same graph, the configurations of the classrooms during different sessions. We must be able to answer questions such as: which seats have both a left and a right neighbour? Who was sitting next to whom in the Semantic Web session of the 1st of October 2021?

This part should be done with pen and paper, and can be done in groups.

Acknowledgement

Thanks to Victor Charpenay for the part on Linked Data Platforms.