UCIAD intends to use ontologies both as a way to achieve the integration of activity across various, possibly heterogeneous systems, and to benefit from their inference capabilities to support the flexible, customisable and expressive analysis of such activity data. Building an ontology that could be used as a conceptual model for all sorts of activity data is quite obviously a difficult task, which is going to be refined and iterated over the length of the project (and hopefully beyond the end of the project).
However, compared to other domains, the advantage of user activities is that there is a lot of data to look at. This might be seen as an issue (from a technical point of view, but also because it is quite overwhelming to get so much data), but in reality, this allows to apply a bottom-up approach to building our ontologies: modelling through characterising the data, rather than through expertise in the domain. It also gives us an insight into the scale of the tasks, and the need for adapted tools to support both the ontological definition of specific situations, and the ontology-based analysis of large amounts of traces of activity data.
Identifying concepts and their relations
The first step in building our ontology is to identify the key concepts, i.e., the key notions, that we need to tackle, bearing in mind that our ultimate goal is to understand activities. The main concepts we are considering are therefore the ones that support the concept of activity. Activities relate to users, but not only. We rely extensively on website logs as sources of activity data. In these cases, we can investigate requests both from human users and from robots automatically retrieving and crawling information from the websites. The server logs in question represent collections can be seen as traces of activities that these users/robots are realising on websites. We therefore need to model these other aspects, which correspond to actions that are realised by actors on particular resources. These are the three kinds of objects that, in the context of Web platforms, we want to model, so that they can be interpreted and classified in terms of activities. We therefore propose 4 ontologies to be used as the basis of the work in UCIAD:
- The Actor Ontology is an ontology representing different types of actors (human users vs robots), as well as the technical setting through which they realise online activities (computer and user agent).
- The Sitemap Ontology is an ontology to represent the organisation of webpages in collections and websites, and which is extensible to represent different types of webpages and websites.
- The Trace Ontology is an ontology to represent traces of activities, realised by particular agents on particular webpages. As we currently focus on HTTP server logs, this ontology contain specific sections related to traces as HTTP requests (e.g., methods as actions and HTTP response code). It is however extensible to other types of traces, such as specific logs for VLEs or search systems.
- The Activity Ontology is intended to define particular classes of activities into which traces can be classified, depending on their particular parameters (including actors and webpages). The type of activities to consider highly depends on the systems considered and to a certain extent on the user. The idea here is that specific versions of the ontology will be built that fit the specific needs of particular systems. We will then extract the generic and globally reusable part of these ontologies to provide a base for an overarching activity ontology. Ultimately, the idea in UCIAD is that individual users will be able to manipulate this ontology to include their specific view on their own activities.
Reusing existing ontology
When dealing with data and ontologies, reuse is generally seen as a good practice. Appart from saving time from not having to remodel things that have already been described elsewhere, it also helps anticipating on future needs for interoperability by choosing well established ontologies that are likely to have been employed elsewhere. We therefore investigated existing ontologies that could help us define the notions mentioned above. Here are the ontology we reused:
- The FOAF ontology is commonly used to describe people, their connections with other people, but also their connections with documents. We use FOAF in the Actor Ontology for human users, and on the Sitemap Ontology for Webpages (as Documents).
- The Time Ontology is a common ontology for representing time and temporal intervals. We use it in the Trace Ontology.
- The Action ontology defines different types of actions in a broad sense, and can be used as a basis for representing elements of the requests in the Trace Ontology, but also as a base typology for the Activity ontology. It itself imports a number of other ontologies, including its own notion of actors.
While not currently used in our base ontologies, other ontologies can be considered at a later stage, for example to model specific types of activities. These include the Online Presence Ontology (OPO), as well as the Semantically-Interlinked Online Communities ontology (SIOC).
Next: Using, refining, customizing
Ontology modelling is a never ending task. Elements constantly need to be corrected and added to cover more and more cases in a way as generic as possible. It is even more the case in UCIAD as the approach is to create the ontology depending on the data we need to treat. Therefore, as we will progressively be adding more data from different sources, including server logs from different types fo websites, activity logs from systems such as VLEs or video players, the ontologies will evolve to include these cases.
Going a step further, what we want to investigate is the user-centric analysis of activity data. The ontologies will be used to provide users with views and analysis mechanisms for the data that concern their own activities. It therefore seems a natural next step to make it possible for the users to extend the ontologies, to customize them, therefore creating their own view on their own data.