Semantic Web - Practical session 8

This session is about managing RDF data programmatically. We will set up an RDF database (also called a triplestore), convert existing non-RDF data into RDF programmatically, then load it into the triplestore.

Generate RDF with Apache Jena

These instructions assume that you are programming in Java, preferably with Eclipse, using the Apache Jena libraries. You may also use RDF4J in Java, RDFLib in Python, the Redland RDF libraries in C, dotNetRDF in C#, EasyRdf for PHP, N3.js for JavaScript, Ruby RDF for Ruby, the SWI-Prolog Semantic Web Library, etc.

The following steps should get you started with Apache Jena and Eclipse. With a different Java IDE, the only difference will be the initial settings for a Maven project. If you are using a different library, refer to its documentation.

Start Eclipse.
Go to File -> New -> Project....
Use the text box to search for "Maven Project". Select it and click Next >.
Check the box "Create a simple project (skip archetype selection)". Click Next >.
In the Artifact section, enter fr.emse.master as the Group Id and semweb as the Artifact Id. Click Finish.
Eclipse will generate a project with a folder structure and one file called pom.xml. Double click on this file.
You should see information about the file and at the bottom, tabs named "Overview", "Dependencies", "Dependency Hierarchy", "Effective POM", and "pom.xml". Click on "pom.xml".
Replace the existing code there with the one in this pom.xml. If you used a different groupId or artifactId, change it accordingly.
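If the linked pom.xml is not at hand, the part that matters is the Jena dependency. The fragment below is only a sketch of what such a declaration usually looks like, not the course's own file: the version number is an assumption (any recent release that still ships RDFConnectionFactory should do), and the rest of the pom.xml generated by Eclipse stays as it is.

```xml
<!-- Goes inside the <project> element of pom.xml -->
<dependencies>
  <dependency>
    <groupId>org.apache.jena</groupId>
    <artifactId>apache-jena-libs</artifactId>
    <type>pom</type>
    <!-- Assumed version: adjust to the one recommended for the course -->
    <version>4.10.0</version>
  </dependency>
</dependencies>
```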

Now you will generate RDF data from non-RDF sources. Read the Jena tutorial to familiarise yourself with the API and learn how to generate an RDF graph programmatically. Once you are done with the tutorial, follow the instructions below.

Download the TGV dataset or the TER dataset. These datasets are published as open data by SNCF and describe the national and regional train lines, stations and schedules.
Look at the file stops.txt in the dataset you downloaded. It describes the names and locations of train stations. The relevant columns are stop_id, stop_name, stop_lat and stop_lon.
To test your code quickly, use the file that contains only a sample of the data (200 lines). Working on the sample also avoids exhausting memory. Once your code works with the sample, try it with the complete dataset.
Use Jena to build an RDF graph that describes the names and locations of the train stations listed in stops.txt. Geolocated things can be described as instances (rdf:type) of the class http://www.w3.org/2003/01/geo/wgs84_pos#SpatialThing, usually abbreviated as geo:SpatialThing. The WGS84 Geo Positioning vocabulary also provides RDF properties for latitude (geo:lat) and longitude (geo:long). Generate an IRI for each stop based on its stop_id.
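Before any RDF can be built, the four columns have to be extracted from stops.txt. The following is a minimal sketch using only the Java standard library. It assumes that the file is comma-separated with a header row and that values may be enclosed in double quotes; the class StopsReader and its methods are hypothetical names chosen for this illustration, and "stops.txt" is an assumed default path.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: extracts the four relevant columns from the lines of stops.txt. */
final class StopsReader {

    /** The fields of one row that the exercise needs, kept as raw strings. */
    static final class Stop {
        final String id, name, lat, lon;
        Stop(String id, String name, String lat, String lon) {
            this.id = id; this.name = name; this.lat = lat; this.lon = lon;
        }
    }

    /** Splits one CSV line on commas, honouring double-quoted fields ("" is an escaped quote). */
    static List<String> splitCsv(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"' && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"'); // escaped quote inside a quoted field
                    i++;
                } else if (c == '"') {
                    inQuotes = false;    // closing quote
                } else {
                    current.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;         // opening quote
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    /** Locates the columns by name in the header line, then converts every following row. */
    static List<Stop> parse(List<String> lines) {
        String headerLine = lines.get(0);
        if (headerLine.startsWith("\uFEFF")) {
            headerLine = headerLine.substring(1); // drop a UTF-8 byte order mark if present
        }
        List<String> header = splitCsv(headerLine);
        int id = header.indexOf("stop_id");
        int name = header.indexOf("stop_name");
        int lat = header.indexOf("stop_lat");
        int lon = header.indexOf("stop_lon");
        if (id < 0 || name < 0 || lat < 0 || lon < 0) {
            throw new IllegalArgumentException("Missing expected column in header: " + header);
        }
        List<Stop> stops = new ArrayList<>();
        for (String line : lines.subList(1, lines.size())) {
            if (line.isBlank()) continue; // skip empty lines
            List<String> row = splitCsv(line);
            stops.add(new Stop(row.get(id), row.get(name), row.get(lat), row.get(lon)));
        }
        return stops;
    }

    public static void main(String[] args) throws Exception {
        Path file = Path.of(args.length > 0 ? args[0] : "stops.txt"); // assumed default path
        for (Stop s : parse(Files.readAllLines(file, StandardCharsets.UTF_8))) {
            System.out.println(s.id + " | " + s.name + " | " + s.lat + " | " + s.lon);
        }
    }
}
```

Looking columns up by name rather than by position keeps the code independent of the column order, which may differ between the TGV and TER files.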

From a single line of the file, the resulting RDF should be (in Turtle):

@prefix ex: <http://www.example.com/> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

ex:StopArea:OCE80194035  a  geo:SpatialThing;
	rdfs:label  "gare de Neustadt (Weinstr) Hbf"@fr;
	geo:lat  "49.35006155"^^xsd:decimal;
	geo:long "8.14067588"^^xsd:decimal .
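One possible way to obtain the graph above with the Jena Model API is sketched below. It requires the Jena libraries on the classpath. The class name StationGraph and the method describeStop are hypothetical, and the sketch assumes that the stop_id contains only characters that are legal in an IRI; identifiers with spaces or other special characters would need to be encoded first.

```java
import org.apache.jena.datatypes.xsd.XSDDatatype;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.vocabulary.RDF;
import org.apache.jena.vocabulary.RDFS;

/** Hypothetical example class: builds the description of a single stop. */
final class StationGraph {
    static final String EX = "http://www.example.com/";
    static final String GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#";

    /** Returns a new model holding the four triples that describe one stop. */
    static Model describeStop(String stopId, String name, String lat, String lon) {
        Model model = ModelFactory.createDefaultModel();
        // Prefixes only affect how the model is serialised, not its content
        model.setNsPrefix("ex", EX);
        model.setNsPrefix("geo", GEO);
        model.setNsPrefix("rdfs", RDFS.getURI());
        model.setNsPrefix("xsd", "http://www.w3.org/2001/XMLSchema#");

        Resource spatialThing = model.createResource(GEO + "SpatialThing");
        Property geoLat = model.createProperty(GEO + "lat");
        Property geoLong = model.createProperty(GEO + "long");

        model.createResource(EX + stopId)
             .addProperty(RDF.type, spatialThing)
             .addProperty(RDFS.label, name, "fr") // language-tagged literal
             .addProperty(geoLat, model.createTypedLiteral(lat, XSDDatatype.XSDdecimal))
             .addProperty(geoLong, model.createTypedLiteral(lon, XSDDatatype.XSDdecimal));
        return model;
    }

    public static void main(String[] args) {
        Model model = describeStop("StopArea:OCE80194035",
                "gare de Neustadt (Weinstr) Hbf", "49.35006155", "8.14067588");
        model.write(System.out, "TURTLE"); // serialise the graph as Turtle
    }
}
```

For the full dataset, the same property calls would typically be applied to one shared Model inside a loop over the rows, rather than creating a new model per stop.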

Setting up a triplestore

There are many triplestores. The simplest to set up is probably Fuseki.

Download the archive for Apache Jena Fuseki from the Jena download page.
In the archive, there is an executable: fuseki-server.bat for Windows systems, fuseki-server for Unix-based systems. Execute it. The server will be running in the background.
With your Web browser, go to http://localhost:3030. This interface allows you to manage your data.
Go to "manage datasets". Create a new dataset and make it persistent.
Upload an RDF file of your choice.

In the exercise of the first part, you can either generate all the data at once in one large Jena Model and serialise it as RDF, or fill in a triplestore little by little. To add data to a triplestore such as Jena Fuseki, you can load a model or send update queries like this:

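// Imports needed by this fragment (place them at the top of your class file)
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdfconnection.RDFConnection;
import org.apache.jena.rdfconnection.RDFConnectionFactory;

Model model = ModelFactory.createDefaultModel();
// ... build the model

// Replace "dataset" with the name of the dataset you created in Fuseki
String datasetURL = "http://localhost:3030/dataset";
String sparqlEndpoint = datasetURL + "/sparql";
String sparqlUpdate = datasetURL + "/update";
String graphStore = datasetURL + "/data";

// try-with-resources closes the connection once the block is left
try (RDFConnection conneg = RDFConnectionFactory.connect(sparqlEndpoint, sparqlUpdate, graphStore)) {
    conneg.load(model); // add the content of model to the triplestore
    conneg.update("INSERT DATA { <test> a <TestClass> }"); // add the triple to the triplestore
}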
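When the triplestore is filled little by little, the update strings have to be generated from the data. The sketch below uses only the Java standard library and shows one way to build an INSERT DATA request per batch of stops. The class StopUpdates, its methods and the batch size are hypothetical; each stop is assumed to be given as an array {stop_id, stop_name, stop_lat, stop_lon}, with a stop_id that is safe to use inside an IRI.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: turns stops into SPARQL INSERT DATA requests, one per batch. */
final class StopUpdates {
    static final String EX = "http://www.example.com/";
    static final String GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#";
    static final String RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label";
    static final String XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal";

    /** Escapes backslashes, double quotes and newlines for use in a SPARQL string literal. */
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
    }

    /** Triples for one stop, written with full IRIs so that no PREFIX declaration is needed. */
    static String triples(String id, String name, String lat, String lon) {
        return "<" + EX + id + "> a <" + GEO + "SpatialThing> ;\n"
             + "  <" + RDFS_LABEL + "> \"" + escape(name) + "\"@fr ;\n"
             + "  <" + GEO + "lat> \"" + escape(lat) + "\"^^<" + XSD_DECIMAL + "> ;\n"
             + "  <" + GEO + "long> \"" + escape(lon) + "\"^^<" + XSD_DECIMAL + "> .\n";
    }

    /** Groups the stops into batches and builds one INSERT DATA request per batch. */
    static List<String> insertRequests(List<String[]> stops, int batchSize) {
        List<String> requests = new ArrayList<>();
        for (int start = 0; start < stops.size(); start += batchSize) {
            int end = Math.min(start + batchSize, stops.size());
            StringBuilder body = new StringBuilder("INSERT DATA {\n");
            for (String[] s : stops.subList(start, end)) {
                body.append(triples(s[0], s[1], s[2], s[3]));
            }
            requests.add(body.append("}").toString());
        }
        return requests;
    }

    public static void main(String[] args) {
        List<String[]> stops = new ArrayList<>();
        stops.add(new String[] {"StopArea:OCE80194035",
                "gare de Neustadt (Weinstr) Hbf", "49.35006155", "8.14067588"});
        // Print the requests instead of sending them
        for (String request : insertRequests(stops, 100)) {
            System.out.println(request);
        }
    }
}
```

Each returned string can then be passed to conneg.update(...) as in the fragment above. Sending a few hundred stops per request keeps each request small without paying the HTTP overhead for every single stop.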

If you finish early, you can try to define a vocabulary for GTFS and transform all the SNCF data to RDF. Your vocabulary can also distinguish between train stations and coach stations, relate stations (StopArea) to more specific locations (StopPoint), etc.

Interacting with a Linked Data Platform

To interact programmatically with a Linked Data Platform as in practical session 3, you need to rely on an HTTP library in your programming language. You may use the Apache HTTP Client in Java (which is also a Jena dependency), urllib in Python, etc. Instead of using cURL, you send POST requests with the appropriate Turtle payload through the library's programming interface.

Write a program that reproduces the steps of practical session 3 (Publishing data on a Linked Data Platform).
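As a possible starting point, the sketch below sends a Turtle payload in a POST request. It uses the HTTP client built into the JDK (java.net.http, Java 11 or later) rather than the Apache HTTP Client, so that it runs without extra dependencies. The container URL, the Slug value and the class name LdpClient are assumptions made for this illustration; use the URLs and payloads from practical session 3 instead.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Hypothetical example class: creates a resource in an LDP container with a POST request. */
final class LdpClient {

    /** Builds a POST request carrying a Turtle document, with a Slug hint for the new resource name. */
    static HttpRequest createResource(String containerUrl, String slug, String turtle) {
        return HttpRequest.newBuilder(URI.create(containerUrl))
                .header("Content-Type", "text/turtle")
                .header("Slug", slug)
                .POST(HttpRequest.BodyPublishers.ofString(turtle))
                .build();
    }

    public static void main(String[] args) throws Exception {
        String container = "http://localhost:8080/ldp/"; // assumed URL of an LDP container
        String turtle = "<> a <http://www.w3.org/2003/01/geo/wgs84_pos#SpatialThing> .";

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(createResource(container, "station", turtle),
                      HttpResponse.BodyHandlers.ofString());

        // A successful creation is normally answered with 201 and a Location header
        System.out.println(response.statusCode());
        System.out.println(response.headers().firstValue("Location").orElse("(no Location header)"));
    }
}
```

Building the request in a separate method makes it possible to inspect the method and headers without contacting a server.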
