This session is about publishing structured data on the Web.
curl
In this part, you will use a command line tool called cURL. It is a tool that every computer scientist should have and know how to use. If you are already familiar with cURL, you can jump to the next section.
If you do not have it already, download cURL and put it in a folder you will remember. On Linux, cURL is available as a package in most distributions, and on macOS it is available through Homebrew. Use your package manager (such as apt or brew) to get it.
You may need to update the PATH variable in your system environment configuration. On MS Windows, press Windows key + R, then type SystemPropertiesAdvanced. Then click Environment Variables... and find the variable Path or PATH in the user or system variables. Edit it and add the path to the folder where you put cURL.
We will first learn the basics of cURL, then use it to understand how Linked Data principles and best practices are implemented.
1. Open your browser and type http://mines-stetienne.fr in the address bar. Notice what is happening. We are going to compare this to what cURL does.
2. Open a terminal and type curl -V to check that cURL is working. If not, go back to the previous steps.
3. Type curl http://mines-stetienne.fr and look at the result. cURL displays the payload (that is, the “body”) of the HTTP response. In this case, it is an HTML document saying that the document was moved to https://mines-stetienne.fr.
4. Type curl https://mines-stetienne.fr and look at the result. It should be empty. We need to figure out what is happening.
5. Type curl -I https://mines-stetienne.fr and look at the result. -I sends a HEAD request, so cURL displays only the headers of the HTTP response, not the payload. The response tells us that the resource with URI https://mines-stetienne.fr is available at another location, which is given in the Location header.
6. Type curl https://www.mines-stetienne.fr/ and look at the result. This time, you get a web page. This is the HTML code of the page you see in your browser.

The HTTP response codes 301 Moved Permanently and 302 Found are commonly called “redirects”. Your browser directly displays the Web page because it is “following” the redirects. You can check that the URL in the address bar of your browser is https://www.mines-stetienne.fr/. The browser stops following redirects when it gets a 200 OK: this means that the resource you requested (namely, https://www.mines-stetienne.fr/) has been found and is the returned document, which is an information resource.
You can follow redirects with cURL, using the option -L. Check this: curl -L http://mines-stetienne.fr. You can also see what the server responds at each step of the redirection by adding -I. You can get even more details about the requests and responses by further adding -v or --verbose.
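The options above can be combined to print one short line per step of a redirection. The following is only a sketch: the function names summarize_hops and redirect_chain are made up for this example, and the output depends on what the server answers at the time you run it.

```shell
# Keep only the status line and the Location header of each response.
# Reads raw HTTP headers on standard input and strips the carriage
# returns that HTTP puts at the end of each header line.
summarize_hops() {
  tr -d '\r' | grep -iE '^(HTTP/|location:)'
}

# Follow redirects (-L) with HEAD requests (-I), without progress output (-s).
redirect_chain() {
  curl -sIL "$1" | summarize_hops
}

# Example (needs network access):
# redirect_chain http://mines-stetienne.fr
```

Each redirect shows up as a 301 or 302 status line followed by its Location, and the chain ends with the final status line.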
We will use cURL and DBpedia to see how Linked Data can be accessed via HTTP.
1. Consider the resource http://dbpedia.org/resource/Tim_Berners-Lee. Use cURL and see what URIs must be requested, in order, to reach a final representation. You may have to use the option -k (which disables certificate verification) when requesting https URIs, depending on your system configuration. Take note of all the URIs in a text file that you will send (to antoine.zimmermann@emse.fr) at the end of the session.
2. Type curl -H "Accept: text/turtle" http://dbpedia.org/resource/Tim_Berners-Lee. Use -H "Accept: text/turtle" on all necessary requests to reach a 200 OK and get some data. Write the list of URIs that have been requested along the way.
3. http://dbpedia.org/resource/Tim_Berners-Lee and write it in your answer file.

If the fourth Linked Data principle is followed, there should be links from one data set to another, so that we can follow links to discover more data.
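To compare what a server returns for different Accept headers, it can help to print only the final status code, media type and URL rather than the whole body. The sketch below assumes a helper named negotiate (a name invented here). It uses cURL's -w option with the http_code, content_type and url_effective variables. Because it follows redirects with -L, it hides the intermediate URIs, so it is a way to check your answers rather than a replacement for the step-by-step exercise.

```shell
# Request a URI with a given Accept header, follow redirects, discard the
# body, and print the final status code, media type and URL.
# Usage: negotiate <media type> <URI>
negotiate() {
  accept=$1
  uri=$2
  curl -sL -o /dev/null -H "Accept: $accept" \
       -w '%{http_code} %{content_type} %{url_effective}\n' "$uri"
}

# Examples (need network access):
# negotiate text/turtle http://dbpedia.org/resource/Tim_Berners-Lee
# negotiate text/html   http://dbpedia.org/resource/Tim_Berners-Lee
```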
A tool that can help you navigate through RDF data is RDF Browser, a Firefox extension that shows RDF in the browser whenever it is available by content negotiation. If you have Firefox, you can install and try this extension.
As an alternative, you can also use Postman. Postman is comparable to cURL, except that it has a graphical interface that facilitates navigation (among other things). All links that appear in a response body will be clickable. You will have to manually add Accept headers for content negotiation, though.
In DBpedia, Mines Saint-Étienne's data are linked to many other data sets. Using the RDF Browser, Postman or cURL (in this order of preference), find a path starting from https://dbpedia.org/resource/%C3%89cole_nationale_sup%C3%A9rieure_des_mines_de_Saint-%C3%89tienne and leading to Jean Monnet University. Hint: the shortest path between the two goes through the DBpedia data set about Saint-Étienne (the city). In total, how many HTTP requests did you have to send? Write down the answer in the text file.
Your goal in this part is to publish the RDF data you have modeled in previous assignments. You will publish it on a platform, following Linked Data principles, such that you will be able to link your data to other people's data. The prerequisite for this part is that you have an RDF graph written in some Turtle file (say, data.ttl).
To communicate with the platform, you will need cURL or Postman. The main cURL options you should know are the following:
-X | to indicate the request method, i.e. GET, POST, PUT or DELETE
-H | to add headers, such as Accept, to the request
-i | to inspect headers included in the response
--data-binary @<filename> | to specify the body of the request (beware of the 'double dash' and 'at' characters)
-u <username>:<password> | to specify the username and password for websites that are password-protected (replace <username> and <password> with the username and password given in class)
Postman's interface should be intuitive enough for those who have used cURL once. Once your communication tool is set up, you are ready to publish your RDF data.
Send a GET request to http://193.49.165.77:3000/semweb/. Make sure you obtain an RDF representation of the resource and not an HTML representation.
The resource you are seeing is a container; it contains other resources, in the same way a directory contains other files. Note the ldp:member triple.
This container follows the Linked Data Platform (LDP) standard. It is one of the more recent Semantic Web standards of the World Wide Web Consortium (W3C), a Recommendation since 2015. After having read section 1 of the LDP primer, look at what resources are contained in the semweb container. List them in the text file.
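Retrieving the container as RDF could look like the sketch below. It assumes that the platform asks for credentials on every request and that they are stored in two environment variables, LDP_USER and LDP_PASS. Those names, and the function name get_turtle, are made up for this example.

```shell
# Container URL taken from the text above. LDP_USER and LDP_PASS are
# hypothetical environment variables holding the credentials given in class.
CONTAINER=http://193.49.165.77:3000/semweb/

# GET a resource as Turtle and show the response headers (-i) before the body.
# Usage: get_turtle <URI>
get_turtle() {
  curl -i -u "$LDP_USER:$LDP_PASS" -H "Accept: text/turtle" "$1"
}

# Example (needs network access and credentials):
# get_turtle "$CONTAINER"
```

If the response body is HTML rather than Turtle, the Accept header was probably not sent or not honored.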
To add resources to a container, the LDP standard prescribes sending a POST request to it.
The body of the request must include an RDF graph, in one of the known RDF formats (e.g. Turtle).
Optionally, you can suggest a name for the created resource by adding a Slug header (e.g. Slug: <your name>-data).
Publish data.ttl as a member of the container.
Make sure your request includes the appropriate headers: Content-Type: text/turtle and (optionally) Slug: <your name>-data.
Then, look at the response status code.
- If it is 201, the operation has succeeded. Navigate to the created resource by looking up the corresponding Location header.
- If it is 400, your request was not appropriate. Try with different arguments. The server provides you with an error message that may help you understand the problem.
- If it is 500, something went wrong on the server’s side. Use your acting skills to get the lecturer’s attention.
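The POST request described above can be sketched as follows. The function name publish_turtle and the environment variables LDP_USER and LDP_PASS are assumptions made for this example. The container URL and the headers come from the text.

```shell
# Container URL taken from the text above. LDP_USER and LDP_PASS are
# hypothetical environment variables holding the credentials given in class.
CONTAINER=http://193.49.165.77:3000/semweb/

# POST a Turtle file to the container and suggest a resource name via Slug.
# -i shows the response headers, including the status line and Location.
# Usage: publish_turtle <file> <slug>
publish_turtle() {
  curl -i -X POST -u "$LDP_USER:$LDP_PASS" \
       -H "Content-Type: text/turtle" \
       -H "Slug: $2" \
       --data-binary "@$1" \
       "$CONTAINER"
}

# Example (needs network access and credentials):
# publish_turtle data.ttl alice-data
```

The Slug is only a suggestion, so the URI that actually identifies the new resource is the one in the Location header of the response.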
Navigate back to the resource container: http://193.49.165.77:3000/semweb/. You should now see a link pointing to the resource you created (as an RDF triple with the ldp:member predicate).
The LDP standard also specifies how to update and remove a resource from a container with PUT and DELETE requests, respectively.
The body of the PUT request must include an RDF graph.
The existing RDF representation of the resource will then be replaced by the provided RDF graph.
The DELETE request has no body.
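As a sketch, the two requests could be wrapped as shown below. The function names replace_resource and delete_resource, and the environment variables LDP_USER and LDP_PASS, are invented for this example. Both requests target the URI of the resource itself, not the container.

```shell
# LDP_USER and LDP_PASS are hypothetical environment variables holding the
# credentials given in class.

# Replace the representation of an existing resource with a Turtle file.
# Usage: replace_resource <file> <resource URI>
replace_resource() {
  curl -i -X PUT -u "$LDP_USER:$LDP_PASS" \
       -H "Content-Type: text/turtle" \
       --data-binary "@$1" "$2"
}

# Remove a resource; a DELETE request carries no body.
# Usage: delete_resource <resource URI>
delete_resource() {
  curl -i -X DELETE -u "$LDP_USER:$LDP_PASS" "$1"
}
```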
Linked Data allows anyone to publish data linking to your own data, as explained in the LDP primer. Your data should include an entity for the SemWeb lecture you have today and another entity for yourself. Can your classmates unambiguously identify these two entities with URIs? If so, what happens if other Web users try to dereference these URIs?
The entities you have modeled should all be identified with URIs of the form http://193.49.165.77:3000/semweb/<your name>-<entity>, so that anyone discovering a link to your data can access it. Refactor data.ttl so that it only includes such URIs (for the entities you modeled yourself).
Now that you have proper URIs in your data set, these URIs should all be dereferenceable: if someone sends a GET request (with RDF Browser, Postman or cURL) to http://193.49.165.77:3000/semweb/<your name>-<entity>, the platform should respond with triples about http://193.49.165.77:3000/semweb/<your name>-<entity>.
Split data.ttl into several Turtle files: for each resource <your name>-<entity> that appears in data.ttl, there should be a file <entity>.ttl that includes triples whose subject is http://193.49.165.77:3000/semweb/<your name>-<entity>. Then, publish each of these files on the platform. Make sure you're not using the same URIs as your classmates! Use <your name> as a prefix.
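Once the files are split, publishing them one by one can be automated with a loop. The sketch below assumes that all <entity>.ttl files sit in one directory. The function name publish_all and the environment variables LDP_USER and LDP_PASS are invented for this example. It prints the status code returned for each file.

```shell
# Container URL taken from the text above. LDP_USER and LDP_PASS are
# hypothetical environment variables holding the credentials given in class.
CONTAINER=http://193.49.165.77:3000/semweb/

# POST every <entity>.ttl file of a directory, with Slug <your name>-<entity>.
# Usage: publish_all <directory> <your name>
publish_all() {
  for file in "$1"/*.ttl; do
    [ -e "$file" ] || continue            # skip if no .ttl file matches
    entity=$(basename "$file" .ttl)
    curl -s -o /dev/null -w "$2-$entity: %{http_code}\n" \
         -X POST -u "$LDP_USER:$LDP_PASS" \
         -H "Content-Type: text/turtle" \
         -H "Slug: $2-$entity" \
         --data-binary "@$file" "$CONTAINER"
  done
}

# Example (needs network access and credentials):
# publish_all ./entities alice
```

Since the Slug is only a suggestion, it is worth checking afterwards that each resource was created under the expected URI, for instance by looking at the container again.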
Consider the resource <your name>-barracuda-hdd. You'll now use the LDP feature that allows you to edit your data. Download a Turtle representation of <your name>-barracuda-hdd and edit the resulting file by adding a triple stating that it has been discontinued. Try to re-upload the edited Turtle file using a PUT request to <your name>-barracuda-hdd. Note that the requested URI should be the URI of the product model, not the container's URI. What response status do you get? Write down the answer in the text file.
Notice the ETag response header. Send a GET request to <your name>-barracuda-hdd to know that resource's ETag (<some ETag>). You can now re-send your PUT request and indicate to the platform that you wish to update the resource only if the ETag you provide matches the ETag the platform has. To that end, add the header If-Match: <some ETag> to your PUT request. What response status do you now get?
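The two steps above (read the ETag, then send a conditional PUT) can be chained. The following is a sketch under a few assumptions: the function names extract_etag and put_if_match and the environment variables LDP_USER and LDP_PASS are invented here, and the platform is assumed to return the ETag header on a plain GET. The options -D - and -o /dev/null make cURL print the response headers and discard the body.

```shell
# LDP_USER and LDP_PASS are hypothetical environment variables holding the
# credentials given in class.

# Print the value of the ETag header found in raw HTTP headers on stdin.
extract_etag() {
  tr -d '\r' | awk 'tolower($0) ~ /^etag:/ { sub(/^[^:]*:[ \t]*/, ""); print }'
}

# GET the current ETag of a resource, then PUT a Turtle file that is accepted
# only if the platform still holds that version (If-Match).
# Usage: put_if_match <file> <resource URI>
put_if_match() {
  etag=$(curl -s -D - -o /dev/null -u "$LDP_USER:$LDP_PASS" \
              -H "Accept: text/turtle" "$2" | extract_etag)
  curl -i -X PUT -u "$LDP_USER:$LDP_PASS" \
       -H "Content-Type: text/turtle" \
       -H "If-Match: $etag" \
       --data-binary "@$1" "$2"
}
```

If someone else modified the resource between the GET and the PUT, the ETags no longer match and the server should refuse the update instead of overwriting their changes.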
Thanks to Victor Charpenay for the part on Linked Data Platforms.