Explaining user-centric activity data

July 5th, 2011

Today I attended the meeting of the JISC activity data programme, where all the projects in the programme came to discuss what they were doing and what the priorities should be for the coming year(s). As some might have realised, I am actually a bit critical of this sort of discussion. Not that I think the projects are doing the wrong things, just that there is a lot of catching up to do, and I think we might end up missing the next train (which I believe to be consumer data) while trying to catch up with the previous one (activity data-based recommender systems).

Anyway, I was trying to come up with a reasonable explanation of user-centric activity data (mostly by pointing to evidence of current trends in the industry, from energy providers showing users historical information on their own consumption to the Google Data Liberation Front and the mydata project) when the ongoing discussion drifted to the definition of simple things such as the notion of an event. Defining the concepts we are talking about is the major goal of our ontologies. However, the discussion made me realise that we also needed a simplified overview of the kind of data we are dealing with, and of what distinguishes the organisation-centric view from the user-centric view of activity data.

Indeed, looking at the figure above, we can summarise very simply what we are dealing with in terms of activity data. Activity data is a set of events (or the traces of these events) in which an action is performed on a resource (e.g., a webpage) by an actor (most often a user). That is a general view of what we mostly have to consider as raw activity data. However, looking at the raw collection of individual events isn’t going to give us much: to extract anything meaningful, we need to abstract the data into sets of events that are meaningful and whose distributions of characteristics can be interpreted.
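To make the actor/action/resource view concrete, here is a minimal sketch of what a raw event could look like as a data structure. The class name, field names and sample values are my own illustrative assumptions, not the vocabulary of our ontologies.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    """One raw trace: an actor performs an action on a resource."""
    actor: str        # most often a user identifier (hypothetical values below)
    action: str       # e.g. "view", "comment", "query"
    resource: str     # e.g. the URL of a webpage
    timestamp: datetime


# Hypothetical sample of raw activity data.
events = [
    Event("alice", "view", "http://example.org/blog/post-1", datetime(2011, 7, 5, 9, 0)),
    Event("alice", "comment", "http://example.org/blog/post-1", datetime(2011, 7, 5, 9, 5)),
    Event("bob", "view", "http://example.org/blog/post-1", datetime(2011, 7, 5, 10, 0)),
]

# On its own, the raw list says little; it only becomes interesting once grouped.
actors = {e.actor for e in events}
```

Everything that follows, whether organisation-centric or user-centric, is just a different way of grouping a list like this one.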

The figure above represents the most common way of abstracting activity data: what we call the organisation-centric view. The idea is to analyse large sets of events performed by aggregated sets of users. There can be a single set of users, as in the case of analytics systems that provide statistics on the actions of all visitors to a website, or the organisation can define sub-groups such as Students/Staff/External that are meaningful to the particular types of activities and analyses being considered. In this case, users stop existing individually in the abstracted activity data, as they only appear as part of the aggregated statistics for their groups.
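As a rough sketch of this organisation-centric abstraction, the snippet below counts actions per group of users. The event tuples, the group assignments and the function name are hypothetical, chosen only to illustrate the idea.

```python
from collections import Counter

# Hypothetical raw events: (actor, action, resource).
EVENTS = [
    ("alice", "view", "/library/catalogue"),
    ("alice", "search", "/library/catalogue"),
    ("bob", "view", "/library/catalogue"),
    ("carol", "view", "/library/opening-hours"),
    ("dave", "view", "/library/opening-hours"),
]

# Hypothetical mapping from users to the sub-groups the organisation cares about.
GROUP_OF = {"alice": "Students", "bob": "Staff", "carol": "Students"}


def organisation_view(events, group_of):
    """Aggregate events per (group, action); unknown actors count as External."""
    counts = Counter()
    for actor, action, _resource in events:
        counts[(group_of.get(actor, "External"), action)] += 1
    return counts


stats = organisation_view(EVENTS, GROUP_OF)
```

Note that the individual actors no longer appear in `stats`: only the groups do, which is exactly the property described above.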

User-centric activity data basically makes the abstraction the other way around (see above): aggregating traces of activities around a given user, interpreted according to meaningful sets of resources and events. The challenge in this case (apart from the scalability of the approach, which is going to be the topic of another blog post sometime) lies in how to define meaningful sets of resources and events. In the data we have been looking at, activities such as “commenting on a blog”, “searching a blog”, “querying linked data” or “using a web application” are clearly emerging, but the number and nature of the types of resources and events that can appear in the data depend largely on the system and the user. This is why we believe that using ontologies as a model to drive such abstractions is a good solution: it provides us with a flexible way to define types of resources (e.g., BlogPage, RSS feed, Linked Data endpoint) and the corresponding activities (e.g., commenting, querying, searching), and to automatically classify individual traces and resources into these types. The end result is the ability for individual users to visualise and analyse the distribution of their own activity data across these types and categories. Pushing it a step further, users should even be able to personalise the views, giving their own ontological definitions and obtaining data abstractions that are therefore more meaningful to them.
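Here is a deliberately simplified sketch of the user-centric abstraction. It replaces ontology-based classification with a few hard-coded URL rules, so it should be read as a stand-in for the idea rather than as how our ontologies or any reasoner actually work; the traces, URL patterns and function names are all assumptions made for the example.

```python
from collections import Counter, defaultdict

# Hypothetical traces: (actor, HTTP method, URL).
TRACES = [
    ("alice", "POST", "http://example.org/blog/post-1/comment"),
    ("alice", "GET", "http://example.org/blog/?s=ontology"),
    ("alice", "GET", "http://example.org/sparql?query=SELECT"),
    ("alice", "GET", "http://example.org/blog/?s=activity"),
    ("bob", "GET", "http://example.org/sparql?query=ASK"),
]


def classify_resource(url):
    """Assign a resource type using simple URL rules (stand-in for an ontology)."""
    if "/sparql" in url:
        return "LinkedDataEndpoint"
    if url.endswith("/feed"):
        return "RSSFeed"
    if "/blog/" in url:
        return "BlogPage"
    return "Resource"


def classify_activity(method, url):
    """Derive an activity type from the resource type and the request."""
    resource_type = classify_resource(url)
    if resource_type == "LinkedDataEndpoint":
        return "querying"
    if resource_type == "BlogPage" and method == "POST":
        return "commenting"
    if resource_type == "BlogPage" and "?s=" in url:
        return "searching"
    return "other"


def user_view(traces):
    """Aggregate traces around each user: one activity distribution per actor."""
    views = defaultdict(Counter)
    for actor, method, url in traces:
        views[actor][classify_activity(method, url)] += 1
    return views


views = user_view(TRACES)
```

Personalising the view would then amount to letting each user swap in their own classification rules, which is where ontological definitions are far more flexible than the hard-coded functions used here.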

Today a colleague forwarded me this article in French, where the author says (my translation): “What could I accomplish if I had at my disposal, in an exploitable form, the information regarding my pathways and communications? [...] Not only to control what others are doing with it, but to use it to my own benefit? Today, we tend to scratch our heads and ask: what would be the use of that?” And indeed, we don’t really know what this will allow in the future. However, as the author of the article suggests, that shouldn’t stop us from trying to find out, as long as we are convinced there is something there to explore.