On Self-Tracking

May 18th, 2011

I have said it and repeated it numerous times: UCIAD is profoundly different from all the other JISC Activity Data projects at many different levels. One of these differences, which is at the basis of our main hypothesis, is that we consider activity data for the user’s own consumption, and to his/her own benefit. The team working in UCIAD has made this notion of user-centric personal information a guiding principle for research. With my colleague Matthew Rowe we recently described a major aspect of this research in a position paper for the W3C Workshop on Web Tracking and User Privacy: Self-tracking on the Web.

As described in the paper, entitled “Self-Tracking on the Web: Why and How”, self-tracking is “the activity of monitoring and analysing one’s own behaviour regarding personal information exchange and the consequences of such behaviour on their exposure, privacy and reputation”. We emphasize in this paper how existing tools and technologies to realise self-tracking on the Web are limited, especially in comparison with the tools and technologies used to track user activities and data to the benefit of organisations. The paper concluded that “achieving such a process of self-tracking can be very revealing to Web users, helping them reaching a better awareness of their own online behaviour, and a better understanding of the possible consequences of such behaviour on the exposure of their personal information. Such an approach appears to be crucially needed as the Web evolves to both a global information marketplace, and a major medium for all sorts of social interactions online. [...] We therefore argue that a more principled and comprehensive study of the activity of self-tracking on the Web and of the technological requirements for such an activity to take place should be conducted. This requires for both the social and conceptual models of the way personal information is exchanged on the Web to be related to the technological protocols that are used as mediums for instantiating these models. From a more concrete point of view, we believe that a new set of tools are to be created that will support users in monitoring their own activity on the Web”.

UCIAD can be seen as an experiment in this direction. Focusing on Web data related to the interaction between a user and an organisation, it is looking at the techniques, the models and the tools that are necessary to enable users to have a personalised view on their own data, i.e., the data generated by their own activity. More generally, it is also setting up generic models of online activity, i.e., the ontologies and the associated technological components, that can be reused in broader environments.

UCIAD ontologies: A starting point

March 23rd, 2011

UCIAD intends to use ontologies both as a way to achieve the integration of activity across various, possibly heterogeneous systems, and to benefit from their inference capabilities to support the flexible, customisable and expressive analysis of such activity data. Building an ontology that could be used as a conceptual model for all sorts of activity data is quite obviously a difficult task, which is going to be refined and iterated over the length of the project (and hopefully beyond the end of the project).

However, compared to other domains, the advantage of user activities is that there is a lot of data to look at. This might be seen as an issue (from a technical point of view, but also because it is quite overwhelming to get so much data), but in reality, this allows us to apply a bottom-up approach to building our ontologies: modelling through characterising the data, rather than through expertise in the domain. It also gives us an insight into the scale of the tasks, and the need for adapted tools to support both the ontological definition of specific situations, and the ontology-based analysis of large amounts of traces of activity data.

Identifying concepts and their relations

The first step in building our ontology is to identify the key concepts, i.e., the key notions, that we need to tackle, bearing in mind that our ultimate goal is to understand activities. The main concepts we are considering are therefore the ones that support the concept of activity. Activities relate to users, but not only to users. We rely extensively on website logs as sources of activity data. In these cases, we can investigate requests both from human users and from robots automatically retrieving and crawling information from the websites. The server logs in question can be seen as collections of traces of the activities that these users/robots are realising on websites. We therefore need to model these other aspects, which correspond to actions that are realised by actors on particular resources. Actors, actions and resources are the three kinds of objects that, in the context of Web platforms, we want to model, so that they can be interpreted and classified in terms of activities. We therefore propose 4 ontologies to be used as the basis of the work in UCIAD:

  • The Actor Ontology is an ontology representing different types of actors (human users vs robots), as well as the technical setting through which they realise online activities (computer and user agent).
  • The Sitemap Ontology is an ontology to represent the organisation of webpages in collections and websites, and which is extensible to represent different types of webpages and websites.
  • The Trace Ontology is an ontology to represent traces of activities, realised by particular agents on particular webpages. As we currently focus on HTTP server logs, this ontology contains specific sections related to traces as HTTP requests (e.g., methods as actions and HTTP response code). It is however extensible to other types of traces, such as specific logs for VLEs or search systems.
  • The Activity Ontology is intended to define particular classes of activities into which traces can be classified, depending on their particular parameters (including actors and webpages). The type of activities to consider highly depends on the systems considered and to a certain extent on the user. The idea here is that specific versions of the ontology will be built that fit the specific needs of particular systems. We will then extract the generic and globally reusable part of these ontologies to provide a base for an overarching activity ontology. Ultimately, the idea in UCIAD is that individual users will be able to manipulate this ontology to include their specific view on their own activities.
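To give a rough idea of how actors, actions and resources come together in a trace, here is a minimal Python sketch. All class and field names (Actor, Webpage, Trace, is_robot) and the robot-detection heuristic are assumptions made for illustration; they are not the terms defined in the UCIAD ontologies.

```python
from dataclasses import dataclass

# Hypothetical names throughout: these classes only mirror the concepts
# described above, they are not the actual UCIAD ontology terms.


@dataclass(frozen=True)
class Actor:
    computer: str      # e.g. the IP address the request came from
    user_agent: str    # e.g. the browser or crawler identification string

    def is_robot(self) -> bool:
        # Naive heuristic (an assumption): crawlers often name themselves.
        markers = ("bot", "crawler", "spider")
        return any(m in self.user_agent.lower() for m in markers)


@dataclass(frozen=True)
class Webpage:
    url: str
    website: str       # the website (collection of pages) it belongs to


@dataclass(frozen=True)
class Trace:
    actor: Actor
    action: str        # for HTTP logs, the request method (GET, POST, ...)
    page: Webpage
    response_code: int
    time: str          # ISO 8601 timestamp, kept as a string for simplicity


crawler = Actor("192.0.2.10", "ExampleBot/1.0")
visitor = Actor("192.0.2.20", "Mozilla/5.0 (X11; Linux x86_64)")
trace = Trace(visitor, "GET", Webpage("/courses", "www.example.org"), 200,
              "2011-03-23T10:15:00Z")
```

Classifying such traces into activities would then be the role of the Activity Ontology, which this sketch deliberately leaves out.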

Reusing existing ontologies

When dealing with data and ontologies, reuse is generally seen as a good practice. Apart from saving the time of remodelling things that have already been described elsewhere, it also helps anticipate future needs for interoperability, by choosing well established ontologies that are likely to have been employed elsewhere. We therefore investigated existing ontologies that could help us define the notions mentioned above. Here are the ontologies we reused:

  • The FOAF ontology is commonly used to describe people, their connections with other people, but also their connections with documents. We use FOAF in the Actor Ontology for human users, and in the Sitemap Ontology for Webpages (as Documents).
  • The Time Ontology is a common ontology for representing time and temporal intervals. We use it in the Trace Ontology.
  • The Action ontology defines different types of actions in a broad sense, and can be used as a basis for representing elements of the requests in the Trace Ontology, but also as a base typology for the Activity ontology. It itself imports a number of other ontologies, including its own notion of actors.

The graph below shows the dependencies between our ontologies and the ones they reuse.
[Figure: UCIAD ontologies dependencies]

While not currently used in our base ontologies, other ontologies can be considered at a later stage, for example to model specific types of activities. These include the Online Presence Ontology (OPO), as well as the Semantically-Interlinked Online Communities ontology (SIOC).
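To make the reuse of FOAF described above a little more concrete, the following sketch writes out two statements as N-Triples: a human user typed as a FOAF Person, and a webpage typed as a FOAF Document. The FOAF and RDF namespaces are the real ones; the `http://example.org/uciad/` namespace and the resource names are placeholders, not the identifiers actually used in UCIAD.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/uciad/"   # placeholder namespace, not UCIAD's real one


def triple(subject: str, predicate: str, obj: str) -> str:
    """Serialise one triple of IRIs as an N-Triples line."""
    return f"<{subject}> <{predicate}> <{obj}> ."


triples = [
    # A human user in the actor model is typed with the reused FOAF class.
    triple(EX + "user/alice", RDF_TYPE, FOAF + "Person"),
    # A webpage in the sitemap model is typed as a FOAF document.
    triple(EX + "page/home", RDF_TYPE, FOAF + "Document"),
]

print("\n".join(triples))
```

Because the types come from a widely used vocabulary, any tool that already understands FOAF could read this data without knowing anything specific about UCIAD.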

Next: Using, refining, customizing

Ontology modelling is a never-ending task. Elements constantly need to be corrected and added to cover more and more cases in a way as generic as possible. It is even more the case in UCIAD as the approach is to create the ontology depending on the data we need to treat. Therefore, as we will progressively be adding more data from different sources, including server logs from different types of websites, activity logs from systems such as VLEs or video players, the ontologies will evolve to include these cases.

Going a step further, what we want to investigate is the user-centric analysis of activity data. The ontologies will be used to provide users with views and analysis mechanisms for the data that concern their own activities. It therefore seems a natural next step to make it possible for the users to extend the ontologies, to customize them, therefore creating their own view on their own data.
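As a rough illustration of what such customisation could look like, the sketch below lets a user add a personal activity definition on top of shared ones. It uses plain Python predicates as a stand-in for ontology class definitions and reasoning, which is a simplification; the activity names, paths and the trace format are all made up for the example.

```python
from typing import Callable, Dict, List

Trace = Dict[str, str]                 # simplified trace: plain key/value pairs
Rule = Callable[[Trace], bool]

# Generic activity definitions shared by everyone (illustrative only).
base_rules: Dict[str, Rule] = {
    "Searching": lambda t: t["path"].startswith("/search"),
    "Downloading": lambda t: t["path"].endswith(".pdf"),
}


def classify(trace: Trace, rules: Dict[str, Rule]) -> List[str]:
    """Return the names of all activities whose rule matches the trace."""
    return [name for name, rule in rules.items() if rule(trace)]


# A user extends the shared definitions with a personal view of their data.
my_rules = dict(base_rules)
my_rules["Working on my course"] = lambda t: t["path"].startswith("/courses/x101")

trace = {"path": "/courses/x101/unit1.pdf", "method": "GET"}
print(classify(trace, base_rules))
print(classify(trace, my_rules))
```

The same trace is interpreted differently depending on whose definitions are applied, which is the point of letting users create their own view on their own data.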


March 14th, 2011

UCIAD is a relatively small, experimental project looking at how semantic technologies can help the user-centric integration, analysis and interpretation of activity data in a large organisation. As such, and as was also suggested to all the other projects in the JISC Activity Data programme, it relies on a central hypothesis that will hopefully be verified through the realisation and application of our software platform. But before we can express this hypothesis, we need to introduce a bit of background. In particular, we need to get back to what we mean by “user-centric”.

To put it simply, a user-centric approach is considered here in opposition to an organisation-centric approach. The most common way of considering activity data in large organisations at the moment is through consolidating visits to websites in analytics, giving statistics about the number of visits on a given website or webpage, and where these visits were coming from. We qualify this as an organisation-centric view as the central point of focus is the website managed by the organisation. By taking such a restricted perspective on the interpretation of activity data, a number of potentially interesting questions, that take the users concerned with the activity data as the focus point, cannot be answered. The analysis of the activity data can also only benefit the organisation, and not the user, as each user becomes aggregated in website-related statistics. We therefore express our main hypothesis as:

Hypothesis 1: Taking a user-centric point of view can enable different types of analysis of activity data, which are valuable to the organisation and the user.

In order to test this hypothesis, one actually needs to achieve such user-centric analysis of activity data. This implies a number of technical and technological challenges, namely, the need to integrate activity data across a variety of websites managed by an organisation, to consolidate this data beyond the “number of visits”, and to interpret them in terms of user activities.
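The difference between the two views can be sketched in a few lines of Python. The traces below are invented, and the sketch assumes that the hard part, recognising the same user across websites, has already been done.

```python
from collections import Counter, defaultdict

# Hypothetical, already-integrated traces: (user, website, page)
traces = [
    ("alice", "www.example.org", "/courses"),
    ("alice", "vle.example.org", "/forum"),
    ("bob", "www.example.org", "/courses"),
    ("alice", "www.example.org", "/search"),
]

# Organisation-centric view: users disappear into per-website counts.
visits_per_site = Counter(site for _, site, _ in traces)

# User-centric view: each user's traces are kept together across websites.
traces_per_user = defaultdict(list)
for user, site, page in traces:
    traces_per_user[user].append((site, page))

print(visits_per_site)
print(dict(traces_per_user))
```

The first structure can answer "how many visits did this website get?", while only the second can answer "what did this user do across the organisation's websites?".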

Ontologies are formal, machine processable conceptual models of a domain. Ontology technologies, especially associated with technologies from the semantic web, have proven useful in situations where a meaningful integration of large amounts of heterogeneous data needs to be realised, and to a certain extent, reasoned upon in a qualitative way, for interpretation and analysis. Our goal here is to investigate how ontologies and semantic technologies can support the user-centric analysis of activity data. In other words, our second hypothesis is:

Hypothesis 2: Ontologies and ontology-based reasoning can support the integration, consolidation and interpretation of activity data from multiple sources.

As described in our work plan (see previous blog post), our first task is therefore to build an ontology able to flexibly describe the traces of activities across multiple websites, the users of these websites and the connections between them. The idea is to use this ontology (or rather, this set of ontologies) as a basis for a pluggable software framework, capable of integrating data from heterogeneous logs, and to interpret such data as traces of high-level activities.

The ongoing definition of these ontologies can be followed on our code repository, and a presentation of UCIAD’s basic hypothesis at the JISC Activity Data Programme event is available on slideshare.

Project Plan

February 17th, 2011

UCIAD intends to realise something relatively ambitious (setting up a software infrastructure for the user-centric integration of activity data) within a rather short period of time. This stresses the importance of setting up a suitable work plan from the start of the project, ensuring that outputs are delivered and can be taken up as early as possible.

Aims, Objectives and Final Output(s) of the project

The overall aim of UCIAD is to investigate the use of ontologies and semantic technologies for integrating the different data about the interaction of a user with different systems and websites in an organization. More specifically, to achieve this aim we plan:

  1. To investigate and develop the ontological models needed to integrate user activity data. The objective here is to develop a set of ontologies that can be used to integrate logs and traces of activities existing in a variety of formats, depending on the originating system. Such ontologies will provide a common, meaningful and reusable activity data model for capturing user-centric activity data.
  2. To prototype a reusable, pluggable framework to integrate user activity data across different user facing systems within a large organization, relying on the developed ontological models. Such a framework will be based on semantic data management components available in KMi or externally (as open source software) to aggregate data coming from various systems. In order to accommodate an extensible variety of log formats and activity databases, it will implement a pluggable architecture, where plug-ins implementing a mapping between a particular source/format and our ontological model can be easily added to the framework.
  3. To test and scope the applicability of such a framework within realistic scenarios at The Open University. The UCIAD framework will be tested in a complete case study integrating logs from various systems at The Open University, in particular access and search logs from the main website, specific logs from the virtual learning environment, the linked open data platform, the seminar system, and websites and user-facing systems from various research projects at the Knowledge Media Institute.
  4. To demonstrate how the UCIAD activity data framework can benefit the users in their interaction with the organization. Initial requirements, components and guidelines on exploiting the framework to the benefit of the user, regarding in particular GUI issues, ownership and export of the data will be devised by the end of the project, ensuring short-term potential deployment of the results of the project.
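The pluggable architecture mentioned in the second objective could be sketched as follows: each plug-in maps records from one source format to a common trace model, and the framework only needs to know which plug-in to call. This is an illustrative assumption about the design, not the actual UCIAD code; the source names, record formats and the dictionary-based trace model are all hypothetical.

```python
from typing import Callable, Dict, Iterable, List

Trace = Dict[str, str]
Plugin = Callable[[str], Trace]        # maps one raw log record to a trace

PLUGINS: Dict[str, Plugin] = {}


def register(source: str) -> Callable[[Plugin], Plugin]:
    """Decorator that registers a mapping plug-in for a given log source."""
    def wrap(plugin: Plugin) -> Plugin:
        PLUGINS[source] = plugin
        return plugin
    return wrap


@register("csv-search-log")            # hypothetical source name and format
def parse_search_log(line: str) -> Trace:
    user, query = line.split(",", 1)
    return {"actor": user, "action": "search", "resource": query}


@register("tab-video-log")             # hypothetical source name and format
def parse_video_log(line: str) -> Trace:
    user, video = line.split("\t", 1)
    return {"actor": user, "action": "play", "resource": video}


def integrate(source: str, lines: Iterable[str]) -> List[Trace]:
    """Convert raw records from one source into the common trace model."""
    plugin = PLUGINS[source]
    return [plugin(line) for line in lines]
```

Supporting a new system then means writing one more small function, for example `integrate("csv-search-log", ["alice,linked data"])` would go through the search log plug-in without any change to the rest of the framework.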

Risk Analysis and Success Plan

Considering the ambitious goals of the project, the major risks relate to the maturity and robustness of semantic technologies, in particular their ability to handle very large amounts of user activity data across multiple websites, and to support the user-centric interpretation of this data. The team involved in the project has extensive experience in working with such technologies, in large scale projects.

The primary goal of UCIAD being the realisation of an open software platform relying on ontologies to integrate and interpret user activity data, the main success criteria include the successful, documented application of this platform on a large variety of websites at the Open University, and possibly outside. The outputs of the project will be released as open source, and we expect uptake from external organisations to take place towards the end, or after the project.


In order not to infringe the privacy-related expectations of users of the considered websites, the activity data considered as part of the project will be kept private. The ontologies to model and integrate such data will be made available under an open license (CC0), for reuse and extension by the community. Some technologies employed in the project have been developed by external organizations and are available as open source software. Code realized as part of UCIAD will also be released under an open source license (LGPL). The code will be made available through UCIAD’s repositories on GitHub. All documentation produced, including reports, blogs and system documentation will be made available under a Creative Commons license (CC-BY).

Project Team Relationships and End User Engagement

UCIAD is realised and managed at the Knowledge Media Institute (KMi), an 84-strong interdisciplinary research laboratory founded at The Open University in 1995. KMi has established itself as a world-class R&D centre at the leading edge of the Web, semantic, learning, and new media technologies. The research areas in KMi include cognitive sciences, new media technologies for learners, human computer interaction, Semantic Web and Web services, multimedia analysis and information retrieval.

The project team includes:

  • Dr. Mathieu d’Aquin is a Research Fellow working in the Semantic Web area at the Knowledge Media Institute. Dr. d’Aquin is leading the research and development around approaches to exploit semantic technologies and semantic data. Dr. d’Aquin has in particular been working on concrete solutions for the realization of applications producing and consuming linked data (see for example the JISC-funded LUCERO project which he is directing), and is currently leading the realization of the Open University’s linked data Web – Dr. d’Aquin is also involved in a research direction concerning the use of Semantic Web technologies for the purpose of personal information management.
  • Prof. Enrico Motta is Professor of Knowledge Technologies at KMi and a leading international scientist in the area of Semantic Technologies, with extensive experience of both fundamental and applied research. Professor Motta will act in the project as the chair of the steering group.
  • Salman Elahi is a research assistant at KMi, and a part time PhD Student working on aspects of user-centric identity and personal information management.
  • Stuart Brown is Web Developments and Online Communities manager at The Open University. He is in particular involved in the overall management of the Open University’s content management systems. Stuart Brown will act as a member of the UCIAD steering group, in charge of the liaison between the project team and the Open University’s online services.

Dissemination will be realised through a variety of channels (blog, twitter, etc.) as well as through direct engagement with the community (users and website developers at The Open University, other researchers and developers through seminars, conferences and dedicated workshops). Several aspects of evaluation will be considered. The ontologies and software framework developed as part of the project will be evaluated both formally (using ontology evaluation frameworks and software validation methods) and through usage in our case study. The overall outcome of the project will be evaluated based on adoption at The Open University and by external parties.

Projected Timeline, Workplan & Overall Project Methodology

Based on the aim and objectives described above, we divide the workplan of UCIAD into 5 workpackages:

WP1 – Ontologies as Semantic Models for Integrating User Activity Data: The goal of this workpackage is to produce the foundational data models for the project, by developing the ontologies to be used to integrate activity data from various sources. Here, we will employ ontology design methodologies developed in KMi, combining reuse of existing ontologies, data-driven modelling and knowledge engineering techniques.

Deliverables: A set of documented and reusable user activity data ontologies.

WP2 – Prototype Ontology Based Architecture for Cross-Organization User Activity Data: The goal of this workpackage is to prototype the architecture for aggregating user activity data based on the ontologies developed in WP1. This architecture will mostly consist of a semantic data management system (triple store, reasoner and query engine), and a plug-in based framework to realise the mapping between logs and activity databases and user activity ontologies.

Deliverables: An open-source, pluggable user activity data framework and

WP3 – Case Study using Multiple Sources of Activity Data: The goal of this workpackage is to deploy the architecture developed in WP2 in a concrete, realistic scenario. We will in particular set up the architecture with a set of plugins to aggregate data from several websites of The Open University and the Knowledge Media Institute (see list of considered systems and websites in Paragraph 14). Initial agreements with the administrators of the considered systems and websites at The Open University’s online services and Knowledge Media Institute have already been obtained.

Deliverables: A set of plugins for the relevant websites/systems (including for example a plugin for access logs of Apache Web servers), with documentation regarding the development of these plugins and the deployment of the UCIAD framework.
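As an indication of what the core of a plugin for Apache access logs might involve, here is a minimal Python sketch that extracts the fields of one line in Apache's "combined" log format. The function name, the sample line and the choice to return a plain dictionary are assumptions for illustration; mapping these fields onto the ontologies is not shown.

```python
import re
from typing import Dict, Optional

# Apache "combined" log format:
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line: str) -> Optional[Dict[str, str]]:
    """Return the fields of one access-log line, or None if it does not match."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None


# Invented sample line using documentation-only addresses and hosts.
sample = ('192.0.2.20 - - [23/Mar/2011:10:15:00 +0000] '
          '"GET /courses HTTP/1.1" 200 5120 '
          '"http://www.example.org/" "Mozilla/5.0"')
fields = parse_line(sample)
```

The host and user agent would feed the description of the actor, the method and response code the description of the action, and the path the description of the webpage.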

WP4 – User Centric Interfaces to Activity Data: The goal of this workpackage is to analyse the requirements and implement initial components for user interfaces to the UCIAD framework. In order to reduce development cost, we plan to reuse components of the open source Piwik web analytics engine, to provide user-centric, ontology-based analytics across organizational websites, instead of website-centric analytics.

Deliverables: An initial set of components (widgets) for a prototype graphical interface to the UCIAD framework.

WP5 – Dissemination and Project Evaluation: The goal of the project is to investigate and prototype a pluggable framework for user activity data. It is therefore essential for the project to engage with potential users and developers of this framework, to ensure adoption and further extension. We will realise this through extensive and frequent communication across a variety of channels (project website, blog, twitter, seminar and conferences). The evaluation of the results of the project will be realised through demonstrating in a realistic case study, the benefit and quality of the developed components (ontologies, architecture, plugins, interface).

Deliverables: Documented dissemination activities and user-based tests.

UCIAD project plan


Item                          Amount    Notes
Directly incurred Staff       £28,569   Includes research assistant and director of the project
Directly incurred non-staff   £4,000    Includes travel and equipment
Directly Allocated            £6,994    Includes staff and estates
Indirect Cost                 £31,614
Total                         £71,178
JISC contribution             £49,824
OU contribution               £21,353