Semantic Web project

The Semantic Web project is a long, hands-on exercise in which you integrate everything covered in the first sessions into a single Web application. So that you can progress fast enough to cover everything, you are allowed to work in pairs.

Main objective

The objective of the project is to build a Knowledge Graph (KG) from a wiki, in the same way that DBpedia and YAGO were built from Wikipedia. As a starting point, you will use the Tolkien Gateway, an encyclopedia about J.R.R. Tolkien’s legendarium with 13k+ articles.

Requirements

Pedagogical objectives

Timeline

The project starts full time on Friday 19th December 2025, although some of the earlier practical sessions already provided building blocks for it. You will work on the project full time during four sessions: the morning and the afternoon of the 19th of December 2025, the afternoon of the 6th of January 2026 (after the exam), and the afternoon of the 9th of January 2026.

You must deliver all of your working files before the 15th of January 2026. All files must be sent strictly before the end of the day, Central European Time. Every extra minute will be penalised. You must additionally provide a written report explaining your choices, the functionalities, etc. before the end of the day on the 23rd of January 2026, Central European Time. Anything that arrives after this deadline will be rejected as if nothing had been delivered.

Delivery: Files must be committed to a code repository of your choice. You must send an email containing the link to the repository, with no attached document and with your teammate in CC. The report documenting your work must be put on the repository. The knowledge graph that you produce, serialised in Turtle, must also be put on the repository.

Resources

Tolkien Gateway wiki

Tolkien Gateway is a wiki powered by MediaWiki, the wiki engine originally developed for Wikipedia. Because of that, the wiki’s content can be accessed via the MediaWiki API. See the list of API calls available on the Tolkien Gateway. The main API calls to use in the project are query and parse (they are called actions in the documentation). With the query action, you can list pages and categories of the wiki. With the parse action, you can get the source of a given page or extract specific elements, such as links to other pages or images.
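To make the two actions concrete, here is a minimal sketch that builds the request parameters for listing pages (query) and for getting the source of a page (parse). The endpoint URL in API_URL is an assumption to check against the wiki, and the helper names are illustrative. The parameter names (list=allpages, aplimit, apcontinue, prop=wikitext) are standard MediaWiki API parameters. For the project itself, a client library remains the recommended route.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the API endpoint of the wiki; verify it before use.
API_URL = "https://tolkiengateway.net/w/api.php"


def query_allpages_params(limit=50, continue_from=None):
    """Parameters for the 'query' action, listing pages of the wiki."""
    params = {
        "action": "query",
        "list": "allpages",
        "aplimit": str(limit),
        "format": "json",
    }
    if continue_from is not None:
        # Value returned by the previous call, used to fetch the next batch.
        params["apcontinue"] = continue_from
    return params


def parse_wikitext_params(title):
    """Parameters for the 'parse' action, returning the wikitext of one page."""
    return {"action": "parse", "page": title, "prop": "wikitext", "format": "json"}


def build_url(params, api_url=API_URL):
    """Encode the parameters into a GET URL."""
    return api_url + "?" + urllib.parse.urlencode(params)


def call_api(params, user_agent, api_url=API_URL):
    """Send one request and decode the JSON answer (performs network access)."""
    request = urllib.request.Request(
        build_url(params, api_url), headers={"User-Agent": user_agent}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For instance, build_url(parse_wikitext_params("Gandalf")) gives the URL that asks for the source of the page titled "Gandalf", and call_api would send it with the User-Agent you provide.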

All MediaWiki-powered wikis use the same syntax to edit pages: wikitext. An introduction by example is available on MediaWiki. The most relevant feature of wikitext for the project is templating. A template is a page containing text that is meant to be reused in several places in the wiki. Invoking the template in a page inserts the template text into that page. Templates may have parameters. See the MediaWiki documentation on templates for more details.
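Since template invocations with parameters are a natural source of structured data, the sketch below shows one way to read the parameters of a single invocation of the form {{Name|key=value|...}}. It is a simplified, illustrative parser, not the MediaWiki algorithm: it only takes care not to split on pipes nested inside {{...}} or [[...]]. The template and parameter names in the sample are hypothetical. A dedicated wikitext parsing library (mwparserfromhell, for example) is likely to be more robust on real pages.

```python
def split_top_level(text, sep="|"):
    """Split on sep, ignoring separators nested inside {{...}} or [[...]]."""
    parts, current, depth, i = [], [], 0, 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in ("{{", "[["):
            depth += 1
            current.append(pair)
            i += 2
        elif pair in ("}}", "]]"):
            depth -= 1
            current.append(pair)
            i += 2
        elif text[i] == sep and depth == 0:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(text[i])
            i += 1
    parts.append("".join(current))
    return parts


def parse_template(invocation):
    """Return (template name, parameters) for one '{{...}}' invocation."""
    inner = invocation.strip()[2:-2]  # drop the outer braces
    name, *args = split_top_level(inner)
    params, position = {}, 0
    for arg in args:
        key, equals, value = arg.partition("=")
        if equals and "{{" not in key and "[[" not in key:
            params[key.strip()] = value.strip()  # named parameter
        else:
            position += 1  # unnamed parameters are numbered from 1
            params[str(position)] = arg.strip()
    return name.strip(), params


# Hypothetical infobox, for illustration only.
sample = """{{Infobox character
| name = Gandalf
| race = [[Maiar|Maia]]
| weapon = {{Glamdring}}
}}"""
template_name, template_params = parse_template(sample)
```

Here template_params maps "name", "race" and "weapon" to their raw wikitext values, which can then be turned into triples.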

Many client libraries exist to ease the development of bots accessing a MediaWiki API. See the list maintained on MediaWiki. Most of these libraries ensure that bots are “well-behaved”, which typically implies that they do not overload the server with many requests (throughput is limited) but also that they declare themselves as bots (by setting an appropriate User-Agent header in their HTTP requests). To avoid being banned by the wiki server, we strongly recommend that you use a client library to access the Tolkien Gateway.
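If you want to see what "well-behaved" means in code, the following sketch shows the two ingredients mentioned above: a descriptive User-Agent header and a throttle that enforces a minimum delay between requests. The class and function names, the header wording, and the one-second interval are assumptions chosen for illustration. Client libraries typically handle this for you.

```python
import time


def bot_headers(bot_name, contact):
    """HTTP headers that declare the client as a bot and say who runs it."""
    return {"User-Agent": f"{bot_name} (course project; contact: {contact})"}


class Throttle:
    """Ensure that at least min_interval seconds separate two requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock    # injectable, which makes the class easy to test
        self._sleep = sleep
        self._last = None      # time of the previous request, if any

    def wait(self):
        """Call this right before each request."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# Usage sketch: at most one request per second (an arbitrary, cautious value).
throttle = Throttle(min_interval=1.0)
headers = bot_headers("TolkienKGBot", "student@example.org")
```

Calling throttle.wait() before every HTTP request keeps the request rate bounded regardless of how fast the rest of the program runs.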

Integrating external data

The collectible card game Middle-earth: The Wizards (METW) has cards that represent characters, items, or places of Middle-earth. Some of them were invented by the card publisher, but others correspond to actual entities of Tolkien’s world. A data set of all METW cards is available in JSON. Use this data set to link wiki entities with the corresponding cards.
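One simple way to start the linking is to compare normalised names, so that differences in case, accents, or punctuation do not prevent a match. The sketch below assumes that each card is a JSON object with a "name" field; the actual structure of the data set may differ, so adapt the field name. Exact matching on normalised names is only a baseline: cards invented by the publisher should stay unlinked, and ambiguous names need a manual check.

```python
import re
import unicodedata


def normalize(name):
    """Lower-case, strip accents, and collapse punctuation to single spaces."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return re.sub(r"[^a-z0-9]+", " ", without_accents.casefold()).strip()


def link_cards(cards, wiki_titles, name_field="name"):
    """Map each card name to the wiki page title with the same normalised name.

    name_field is an assumption about the JSON structure of the card data.
    """
    index = {}
    for title in wiki_titles:
        index.setdefault(normalize(title), title)  # keep the first title seen
    links = {}
    for card in cards:
        card_name = card.get(name_field, "")
        key = normalize(card_name)
        if key and key in index:
            links[card_name] = index[key]
    return links


# Illustrative data, not taken from the real data set.
cards = [{"name": "Eowyn"}, {"name": "Gandalf"}, {"name": "Some Invented Card"}]
titles = ["Éowyn", "Gandalf", "Minas Tirith"]
links = link_cards(cards, titles)
```

In this toy example, links associates "Eowyn" with the page "Éowyn" and "Gandalf" with "Gandalf", while the invented card has no link.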

Additionally, you can use a data set of Lord of the Rings characters in CSV to enrich your data.

Finally, you may use another wiki about Tolkien’s legendarium to enrich your data further. In particular, this wiki is available in multiple languages (see the list and links at the bottom of the main page). You can add labels for the entities in your data in multiple languages.
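Multilingual labels are expressed in RDF as language-tagged literals. The sketch below writes the rdfs:label triples of one entity in Turtle using only the standard library. The entity IRI and the labels are made up for illustration, and the namespace is a placeholder. An RDF library would normally take care of serialisation and escaping for you.

```python
def turtle_literal(text, lang):
    """Return a language-tagged Turtle string literal with basic escaping."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"@{lang}'


def labels_to_turtle(entity_iri, labels):
    """Serialise {language tag: label} as rdfs:label triples of one entity."""
    objects = ",\n        ".join(
        turtle_literal(text, lang) for lang, text in sorted(labels.items())
    )
    return "\n".join(
        [
            "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
            "",
            f"<{entity_iri}> rdfs:label {objects} .",
        ]
    )


# Placeholder IRI and illustrative labels.
turtle = labels_to_turtle(
    "http://example.org/tolkien/Gandalf",
    {"en": "Gandalf the Grey", "fr": "Gandalf le Gris"},
)
print(turtle)
```

Each language contributes one object of the same rdfs:label predicate, so adding a language later only means adding one more literal.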

Technical guidelines

Here are steps you can carry out to develop your application. You will not be evaluated on each of these steps but on the end result; you are free to plan development in a different way.