Linked Data: interacting with and publishing Web data

This session is about publishing structured data on the Web.

Objectives

Learning cURL

In this part, you will use a command line tool called cURL. It is a tool that every computer scientist should have and know how to use. If you are already familiar with cURL, you can jump to the next section.

If you do not have it already, download cURL and put it in a folder you will remember. On Linux, cURL is available as a package in most distributions; use your distribution's package manager (such as apt) to install it. On macOS, you can use Homebrew (brew).

You may need to update the PATH variable in your system environment configuration. On MS Windows, press Windows key + R, type SystemPropertiesAdvanced, then click Environment Variables... and find the variable Path or PATH in the user or system variables. Edit it and add the path to the folder where you put cURL.

We will first learn the basics of cURL, then use it to understand how Linked Data principles and best practices are implemented.

  1. Open a web browser and type http://mines-stetienne.fr in the address bar. Notice what is happening. We are going to compare this to what cURL does.
  2. Open a command line tool window.
  3. Type curl -V to check that cURL is working. If not, go back to the previous steps.
  4. Type curl http://mines-stetienne.fr and look at the result. cURL displays the payload (that is, the “body”) of the HTTP response. In this case, it is an HTML document saying that the document was moved to https://mines-stetienne.fr.
  5. Type curl https://mines-stetienne.fr and look at the result. It should be empty. We need to figure out what is happening.
  6. Type curl -I https://mines-stetienne.fr and look at the result. The -I option makes cURL send a HEAD request and display only the headers of the HTTP response, not the payload. We see that the resource with URI https://mines-stetienne.fr was found at another location, and the Location header tells us where we can find it.
  7. Type curl https://www.mines-stetienne.fr/ and look at the result. This time, you get a web page. This is the HTML code of the page you see in your browser.
  8. The HTTP response codes 301 Moved Permanently and 302 Found are commonly called “redirects”. Your browser directly displays the Web page because it is “following” the redirects. You can check that the URL in the address bar of your browser is https://www.mines-stetienne.fr/. The browser stops following redirects when it receives a 200 OK: this means that the resource you requested (namely, https://www.mines-stetienne.fr/) has been found and that the returned file is the resource itself, which is an information resource.

    You can follow redirects with cURL, using the option -L. Check this: curl -L http://mines-stetienne.fr. You can also see what the server responds at each step of the redirection by adding -I. You can get even more details about the requests and responses by further adding -v or --verbose.
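
When -L and -I are combined, cURL prints one block of headers per response, which can be long. The sketch below filters such a header dump down to the status lines and Location headers, so that the chain of redirects is easier to read. The function name redirect_chain and the sample headers (for the hypothetical site example.org) are assumptions made for illustration; they are not output from a real server.

```shell
# redirect_chain: read HTTP response headers on standard input and keep
# only the status lines and the Location headers.
redirect_chain() {
  # Strip carriage returns first: HTTP header lines end with CR LF.
  tr -d '\r' | grep -iE '^(HTTP/|location:)'
}

# Hypothetical header dump, shaped like the output of `curl -sIL <url>`.
sample='HTTP/1.1 301 Moved Permanently
Location: https://example.org/
Content-Type: text/html

HTTP/1.1 200 OK
Content-Type: text/html'

printf '%s\n' "$sample" | redirect_chain
```

On a real URL, the same filter can be applied with curl -sIL <url> | redirect_chain, where -s hides the progress meter.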

Use cURL on Linked Data

We will use cURL and DBpedia to see how Linked Data can be accessed via HTTP.

  1. We want to get a representation of the resource identified by http://dbpedia.org/resource/Tim_Berners-Lee. Use cURL and see what URIs must be requested, in order, to reach a final representation. Depending on your system configuration, you may have to use the option -k (which skips TLS certificate verification) when requesting https URIs. Take note of all the URIs in a text file that you will send (to antoine.zimmermann@emse.fr) at the end of the session.
  2. What is the format of the final response, after following the redirects? What is the response code? Note your answers in the text file.
  3. You can request a different format by changing the headers of your HTTP request. Type curl -H "Accept: text/turtle" http://dbpedia.org/resource/Tim_Berners-Lee. Use -H "Accept: text/turtle" on all necessary requests to reach a 200 OK and get some data. Write the list of URIs that have been requested along the way.
  4. Write a single command that gets the RDF/XML representation of the resource http://dbpedia.org/resource/Tim_Berners-Lee and write it in your answer file.
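
The format of a response is announced by its Content-Type header, and its status by the first line of each header block. As a minimal sketch, the helper below reads a header dump and prints the status code and media type of the last response in it. The name final_response and the sample headers (for the hypothetical site example.org) are assumptions for illustration only.

```shell
# final_response: read HTTP response headers on standard input and print
# the status code and media type of the last response in the dump.
final_response() {
  tr -d '\r' | awk '
    /^HTTP\// { status = $2; ctype = "" }      # a new response starts here
    tolower($0) ~ /^content-type:/ {
      value = $0
      sub(/^[^:]*:[ \t]*/, "", value)          # drop the header name
      sub(/;.*/, "", value)                    # drop parameters such as charset
      ctype = value
    }
    END { print status, ctype }'
}

# Hypothetical header dump, shaped like the output of
# `curl -sIL -H "Accept: text/turtle" <url>`.
sample='HTTP/1.1 303 See Other
Location: http://example.org/data/thing
Content-Type: text/html

HTTP/1.1 200 OK
Content-Type: text/turtle; charset=UTF-8'

printf '%s\n' "$sample" | final_response
```

Comparing the printed media type with the one sent in the Accept header shows whether the server honoured the requested format.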

Follow the links

If the fourth Linked Data principle (include links to other URIs) is followed, there should be links from one data set to another, so that we can follow links to discover more data.

A tool that can help you navigate through RDF data is RDF Browser, a Firefox extension that shows RDF in the browser whenever it is available by content negotiation. If you have Firefox, you can install and try this extension.

As an alternative, you can also use Postman. Postman is comparable to cURL, except that it has a graphical interface that facilitates navigation (among other things). All links that appear in a response body will be clickable. You will have to manually add Accept headers for content negotiation, though.

In DBpedia, Mines Saint-Étienne's data are linked to many other data sets. Using the RDF Browser, Postman or cURL (in this order of preference), find a path starting from https://dbpedia.org/resource/%C3%89cole_nationale_sup%C3%A9rieure_des_mines_de_Saint-%C3%89tienne and leading to Jean Monnet University. Hint: the shortest path between the two goes through the DBpedia data set about Saint-Étienne (the city). In total, how many HTTP requests did you have to send? Write down the answer in the text file.

Publishing data on a Linked Data Platform

Your goal in this part is to publish the RDF data you have modeled in previous assignments. You will publish it on a platform, following Linked Data principles, so that you can link your data to other people's data. The prerequisite for this part is that you have an RDF graph written in a Turtle file (say, data.ttl).

To communicate with the platform, you will need cURL or Postman. The main cURL options you should know are the following:

-X to indicate the request method, i.e. GET, POST, PUT or DELETE,
-H to add headers, such as Accept, to the request,
-i to include the response headers in the output,
--data-binary @<filename> to specify the body of the request (beware of the 'double dash' and 'at' characters), and
-u <username>:<password> to specify the username and password for websites that are password-protected (replace <username> and <password> with the username and password given in class).
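
The options above can be combined in a single command. The following sketch wraps them in a small function and runs it in "dry run" mode, so that it prints the arguments instead of sending anything. The container URI http://example.org/container/, the credentials, the file name and the function name post_turtle are all placeholders chosen for illustration; replace them with your own values.

```shell
# Dry run: with CURL=echo the arguments are printed, not sent.
# Set CURL=curl to actually send the request.
CURL=echo

# Placeholder values, for illustration only.
container='http://example.org/container/'
credentials='username:password'

# post_turtle <file> <slug>: send a Turtle file as the body of a POST request.
post_turtle() {
  "$CURL" -i -X POST \
    -u "$credentials" \
    -H 'Content-Type: text/turtle' \
    -H "Slug: $2" \
    --data-binary "@$1" \
    "$container"
}

post_turtle data.ttl alice-data
```

Note that echo drops the quotes around each argument, so the printed line is a summary of what would be sent rather than a command to paste back into the shell.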

Postman's interface should be intuitive enough for those who have used cURL once. Once your communication tool is set up, you are ready to publish your RDF data.

  1. Go to http://193.49.165.77:3000/semweb/. Make sure you obtain an RDF representation of the resource and not an HTML representation.

    The resource you are seeing is a container; it contains other resources, in the same way a directory contains other files. Note the ldp:member triple. This container follows the Linked Data Platform (LDP) standard. It is one of the newest Semantic Web standards by the World Wide Web Consortium (W3C). After having read section 1 of the LDP primer, look at what resources are contained in the semweb container. List them in the text file.

    To add resources to a container, the LDP standard prescribes sending a POST request to it. The body of the request must include an RDF graph, in one of the known RDF formats (e.g. Turtle). Optionally, you can suggest a name for the created resource by adding a Slug header (e.g. Slug: <your name>-data).

  2. Use cURL to add data.ttl as a member of the container. Make sure your request includes the appropriate headers: Content-Type: text/turtle and (optionally) Slug: <your name>-data. Then, look at the response status code.
    • If the response status code is 201, the operation has succeeded. Navigate to the created resource by looking up the corresponding Location header.
    • If the response status code is 400, your request was not appropriate. Try with different arguments. The server provides you with an error message that may help you understand the problem.
    • If the response status code is 500, something went wrong on the server’s side. Use your acting skills to get the lecturer’s attention.
  3. Navigate back to the resource container: http://193.49.165.77:3000/semweb/. You should now see a link pointing to the resource you created (as an RDF triple with the ldp:member predicate).

    The LDP standard also specifies how to update and remove a resource from a container with PUT and DELETE requests, respectively. The body of the PUT request must include an RDF graph. The existing RDF representation of the resource will then be replaced by the provided RDF graph. The DELETE request has no body.

  4. Remove the resource you have created. What effect does it have on the container? Write down the answer in the text file.
  5. Linked Data allows anyone to publish data linking to your own data, as explained in the LDP primer. Your data should include an entity for the SemWeb lecture you have today and another entity for yourself. Can your classmates unambiguously identify these two entities with URIs? If so, what happens if other Web users try to dereference these URIs?

    The entities you have modeled should all be identified with URIs of the form http://193.49.165.77:3000/semweb/<your name>-<entity>, so that anyone discovering a link to your data can access it. Refactor data.ttl so that it only includes such URIs (for the entities you modeled yourself).

  6. Now that you have proper URIs in your data set, these URIs should all be dereferenceable: if someone sends a GET request (with RDF Browser, Postman or cURL) to http://193.49.165.77:3000/semweb/<your name>-<entity>, the platform should respond with triples about http://193.49.165.77:3000/semweb/<your name>-<entity>.

    Split data.ttl into several Turtle files: for each resource <your name>-<entity> that appears in data.ttl, there should be a file <entity>.ttl that includes triples whose subject is http://193.49.165.77:3000/semweb/<your name>-<entity>. Then, publish each of these files on the platform. Make sure you're not using the same URIs as your classmates! Use <your name> as a prefix.
  7. One of the resources you have created should correspond to a hard drive model, e.g. <your name>-barracuda-hdd. You'll now use the LDP feature that allows you to edit your data. Download a Turtle representation of <your name>-barracuda-hdd and edit the resulting file by adding a triple stating that it has been discontinued. Try to re-upload the edited Turtle file using a PUT request to <your name>-barracuda-hdd. Note that the requested URI should be the URI of the product model, not the container's URI. What response status do you get? Write down the answer in the text file.
  8. A Linked Data Platform may require that clients use the HTTP ETag mechanism to avoid conflicts. In practice, it means that all resources have an ETag, provided by the platform in the ETag response header. Send a GET request to <your name>-barracuda-hdd to learn that resource's ETag (<some ETag>). You can now re-send your PUT request and indicate to the platform that you wish to update the resource only if the ETag you provide matches the ETag the platform has. To that end, add the header If-Match: <some ETag> to your PUT request. What response status do you now get?
  9. As you work on the previous tasks, you should see that the container now links to resources uploaded by your classmates. What links (i.e. RDF triples) could you add between your resources and theirs? Add example links to their resources in your description of the product model, of products or any entity you have described so far. Make sure the URIs they chose for their resources are dereferenceable. If your browser is Firefox, use the RDF Browser extension in developer mode (accessible in the extension's settings, by clicking on the RDF Browser icon in the URL bar). In developer mode, RDF Browser automatically diagnoses Linked Data conformance of the resource under inspection.
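
The ETag mechanism used in the steps above can be sketched in two parts: reading the ETag from a set of response headers, then passing it back in an If-Match header. The example below runs in "dry run" mode and works on made-up data: the headers, the ETag value "abc123", the resource URI under example.org, the file name and the function names etag_of and put_if_match are all assumptions for illustration.

```shell
# Dry run: with CURL=echo the arguments are printed, not sent.
# Set CURL=curl to actually send the request.
CURL=echo

# etag_of: read HTTP response headers on standard input, print the ETag value.
etag_of() {
  tr -d '\r' | sed -n 's/^[Ee][Tt][Aa][Gg]:[[:space:]]*//p'
}

# Hypothetical headers, shaped like the output of `curl -sI <resource>`.
headers='HTTP/1.1 200 OK
Content-Type: text/turtle
ETag: "abc123"'

# The double quotes are part of the ETag and must be sent back unchanged.
etag=$(printf '%s\n' "$headers" | etag_of)

# put_if_match <file> <resource-uri> <etag>: replace the resource only if
# its current ETag on the server equals the one given.
put_if_match() {
  "$CURL" -i -X PUT \
    -H 'Content-Type: text/turtle' \
    -H "If-Match: $3" \
    --data-binary "@$1" \
    "$2"
}

put_if_match barracuda-hdd.ttl 'http://example.org/container/alice-barracuda-hdd' "$etag"
```

On a password-protected platform, the -u option would have to be added to the request as well.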

Acknowledgement

Thanks to Victor Charpenay for the part on Linked Data Platforms.