July 29th, 2011

One of the major difficulties with the approach taken in UCIAD (to be discussed at more length in the “Wins and Fails” post in the next few days) is communicating its benefits. One reason is that, to be fully honest, the mechanisms and the whole perspective we are taking on activity data are still too ‘experimental’ for us to fully understand these benefits yet. The other is that the core of our approach is a focus on the benefits of activity data to the end-user and not, as would usually be the case, to the organisation. We therefore briefly come back here to what we have learnt about the advantages of our approach, first to end-users, and then the potential benefits to the organisation that derive from them. We summarise our view of what UCIAD has achieved in terms of benefits through a discussion of the success of the project, seen as an experiment towards ontology-based, user-centric activity data.

Benefits to the end-user

There have been a number of places where the potential benefits of user-centric data (or consumer data) have been discussed, generally under the label “giving their data back to the users”. These include in particular the popular article “Show Us the Data. (It’s Ours, After All.)” by Richard H. Thaler in the New York Times. As argued in one of our previous posts, giving a complete account of what end-users could do with such data is both unfeasible and undesirable. However, we can summarise the expected benefits, and their connections to the work done in UCIAD, in three different areas:

  • Know yourself… and be more efficient: As we briefly discussed in our post on self-tracking, there is currently a trend of individuals monitoring their own activities, statuses, etc. While some would criticise such an attitude as pure narcissism, monitoring oneself is recognised as an effective way to improve. In sport for example, monitoring performance in relation to other variables (health status, equipment used, etc.) is necessary to improve and achieve the best conditions, for the best results. Beyond sport, however, there are many areas where monitoring and understanding one’s own behaviour can help one be more efficient. There is a large gap between an athlete measuring his/her performance and a user monitoring his/her online activities. However, for a user to know how he/she searched websites, found and exploited resources on the Web or engaged with online communities can only have a positive effect on his/her effectiveness in carrying out these tasks in the future.
  • Exploit your own data yourself: Besides the passive monitoring of activities, consumer data has often been described as exploitable by individuals. Indeed, in the current situation, organisations collect large amounts of data about their users, which they exploit for their own benefit, often for commercial purposes. Such personal data and profiles are used and accessed by a large variety of agents, from the search engine that sends personalised results to the advertiser that targets you with specific products, but not by the users themselves. For users to have access to, control over and possibly ownership of their own data means that they could also exploit them, using them to build their own profiles that can be employed in communicating with other entities on the Web, on their own terms. More pragmatically, users can analyse their own data and build on top of them to extract information relevant to their own benefit. In UCIAD, we not only allow users to export their own data, but we do it using Semantic Web standards to ensure maximum reusability. Because the export relies on a customisable ontology, the exported data can be flexibly adapted to any kind of use that the user might come up with, not only the ones that we have thought of.
  • Combine and integrate your own data: While we are still far from such a situation at this stage, we can easily imagine that, with the explosion of the number of systems providing an “export your own data” feature, users will eventually be able to build their own personal knowledge base, feeding it with personal data collected from the many online systems they use. Again, such a scenario requires a certain level of standardisation in the data representation formats being used, for which Semantic Web technologies appear to be perfect candidates. A possibly less distant scenario is the one where users interacting with several organisations would export their activity data from the corresponding instances of the UCIAD platform. These data would naturally integrate to provide the user with the ability to monitor, analyse and exploit their activity data across numerous, originally disconnected organisations and websites.
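
To make the “export, then combine” scenario a little more concrete, here is a minimal sketch in Python. It is not the UCIAD export code: the `example.org` namespace, the property names and the `to_ntriples` helper are all made up for illustration, and it assumes that both organisations identify the user with the same IRI. The point it illustrates is simply that data expressed as RDF triples can be merged by taking the union of the exports.

```python
# Hypothetical vocabulary, NOT the actual UCIAD ontology.
EX = "http://example.org/activity#"
ALICE = "<http://example.org/people/alice>"  # assumed shared user IRI


def to_ntriples(site, trace_id, action, resource_url):
    """Serialise one activity trace as a set of N-Triples lines."""
    subject = f"<http://{site}/trace/{trace_id}>"
    return {
        f"{subject} <{EX}actor> {ALICE} .",
        f"{subject} <{EX}action> <{EX}{action}> .",
        f"{subject} <{EX}resource> <{resource_url}> .",
    }


# Two exports, as they might come from two separate organisations.
export_a = to_ntriples("site-a.example.org", "1", "Search",
                       "http://site-a.example.org/search")
export_b = to_ntriples("site-b.example.org", "1", "Comment",
                       "http://site-b.example.org/blog/post-12")

# Merging the two exports is a plain set union of their triples.
merged = export_a | export_b

# All traces about the user, across both organisations.
alice_traces = sorted(
    line.split()[0]
    for line in merged
    if line.split()[1] == f"<{EX}actor>" and line.split()[2] == ALICE
)
print(alice_traces)
```

In practice one would use an RDF library and a shared ontology rather than string formatting; the sketch only shows why a common triple-based format makes integration cheap.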

Benefits to the organisation

As explained earlier, one of the core aspects of UCIAD has been to focus on the benefits of collecting and flexibly interpreting activity data to the end-user. This does not mean that the organisation has no interest in considering the type of technology we have been developing, but simply that the benefits to the organisation mostly come as derived from providing benefits to the end-users of the organisation:

  • Transparency: In very simple terms, users are increasingly pushing organisations towards more accountability with respect to the data they collect about them. Deploying the UCIAD platform can be seen as a way for an institution to tell users “here is what we have about you in terms of activity data”.
  • Trust: In relation to the point above on transparency, providing collected data back to the users is a way to establish a stronger relationship with them: i.e., one where they can trust the organisation regarding the fair and transparent use of their activity data.
  • Leave data management to the user: Leaving the user in control of their own data can bring valuable benefits to the organisation. In particular, it means that the user can allow, or actively enable, the use of more data than would be possible if he/she were left out of the loop. For example, it makes it possible for users to bring and import data they have collected from other systems and organisations, so that the same data does not have to be collected again, and the new organisation does not have to start from scratch.

How do we measure success?

So, now that we have listed all the expected benefits of the approach taken in UCIAD, the natural next question is “have we managed to bring all these benefits to our institution?”. The plain and honest answer is: No.

From the start, we have considered UCIAD as being an experiment (and actually, a rather short one). What we wanted to demonstrate was that:

  1. These benefits are achievable
  2. Technology, such as linked data and ontologies, makes the approach feasible

The UCIAD platform demo, collecting log data from several webservers concerning around a dozen websites, interpreting this data in terms of user-activity, extracting the traces of activities around a given user and exposing the user to these traces in a meaningful way, provides an undeniable demonstration that the technical and technological mechanisms to achieve the UCIAD approach are applicable and effective.

We are currently demonstrating this platform to users of the Open University websites, and observing them engaging with it, and so with their own activity data. This activity will carry on for some time after the end of the project so that we can learn as much as possible from the current state of the platform. However, from these initial discussions, it appears clearly that users are interested in, and sometimes even fascinated by, the idea of obtaining and using their own activity data. They are, as has been happening for many systems outside UCIAD (e.g., Google, Facebook), very positive about such features being added to the websites of an organisation they spend so much time interacting with: their University. In many cases now, they are demanding it.

Explaining user-centric activity data

July 5th, 2011

I was today at the meeting of the JISC activity data programme, where all the projects in the programme came to discuss what they were doing, and what should be the priorities for the coming year(s). As some might have realised, I am actually a bit critical of this sort of discussion. Not that I think that the projects are doing the wrong things, just that there is a lot of catching up to do, and I think we might end up missing the next train (which I believe to be consumer data) while trying to catch up with the previous one (activity data-based recommender systems).

Anyway, I was trying to come up with a reasonable explanation of user-centric activity data (mostly based on showing evidence of the current trends in the industry, from energy providers showing users historic information on their own consumption to the Google Data Liberation Front and the mydata project) when the ongoing discussion drifted to the definition of simple things such as the notion of event. Defining the concepts we are talking about is the major goal of our ontologies. However, the discussion made me realise that we also needed a simplified overview of the kind of data we are dealing with, and of what makes the difference between the organisation-centric view and the user-centric view of activity data.

Indeed, looking at the figure above, we can summarise very simply what we are dealing with in terms of activity data. Activity data is a set of events (or the traces of these events) where an action is realised on a resource (e.g., a webpage) by an actor (most often a user). That is a general view of what we mostly have to consider as raw activity data. However, looking at the raw collection of individual events isn’t going to give us much: in order to extract anything meaningful, we need to abstract the data into sets of events that are meaningful, and whose distributions of characteristics can be interpreted.
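
As a rough illustration of this raw view, here is a minimal sketch in Python of an event as an (actor, action, resource) record. The `Event` class, its field names and the sample values are assumptions made for the example, not the UCIAD data model.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    """One trace of activity: an actor realises an action on a resource."""
    actor: str       # most often a user, here a hypothetical identifier
    action: str      # e.g. an HTTP method found in a server log
    resource: str    # e.g. the path of a webpage
    timestamp: datetime


# A tiny, made-up collection of raw events.
raw_events = [
    Event("alice", "GET", "/blog/post-12", datetime(2011, 7, 5, 10, 0)),
    Event("bob", "GET", "/blog/post-12", datetime(2011, 7, 5, 10, 2)),
    Event("alice", "POST", "/blog/post-12/comment", datetime(2011, 7, 5, 10, 5)),
]

# On its own, the raw list says little; it only becomes useful once grouped.
actors = sorted({event.actor for event in raw_events})
print(actors)
```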

The figure above represents the most common way of abstracting activity data: what we call the organisation-centric view. The idea is that large sets of events, realised by aggregated sets of users, are analysed together. There can be one set of users, as in the case of analytics systems that provide statistics regarding actions realised by all visitors of a website, or the organisation can define sub-groups such as Students/Staff/External that are meaningful to the particular types of activities and analyses being considered. In this case, users stop existing individually in the abstracted activity data, as they only manifest as part of the aggregated statistics for their groups.
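
A minimal sketch of this organisation-centric abstraction, in Python, could look as follows. The group mapping, the sample events and the `organisation_view` function are hypothetical; the sketch only shows how individual users disappear once events are counted per group.

```python
from collections import Counter

# Hypothetical mapping from users to organisation-defined groups.
GROUP_OF = {"alice": "Staff", "bob": "Students", "carol": "Students"}

# Made-up events as (actor, action, resource) tuples.
events = [
    ("alice", "view", "/blog/post-12"),
    ("bob", "view", "/blog/post-12"),
    ("carol", "search", "/blog/search"),
    ("bob", "search", "/blog/search"),
    ("dave", "view", "/data/sparql"),
]


def organisation_view(events, group_of):
    """Count actions per group; unknown actors are treated as External."""
    counts = Counter()
    for actor, action, _resource in events:
        group = group_of.get(actor, "External")
        counts[(group, action)] += 1
    return counts


org_counts = organisation_view(events, GROUP_OF)
for (group, action), n in sorted(org_counts.items()):
    print(group, action, n)
```

Note that nothing in `org_counts` refers to an individual user any more: only the groups remain.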

User-centric activity data basically makes the abstraction the other way around (see above): aggregating traces of activities around a given user, interpreted according to meaningful sets of resources and events. The challenge in this case (apart from the scalability of the approach, which is going to be the topic of another blog post sometime) is how to define meaningful sets of resources and events. In the data we have been looking at, activities such as “commenting on a blog”, “searching a blog”, “querying linked data” or “using a web application” are clearly emerging, but the number and nature of the types of resources and events that can appear in the data is largely dependent on the system and the user. This is why we believe that using ontologies as a model to drive such abstractions is a good solution: it provides us with a flexible way to define types of resources (e.g., BlogPage, RSS feed, Linked Data endpoint) and the corresponding activities (e.g., commenting, querying, searching), and to automatically classify individual traces and resources into these types. The end result is the ability for individual users to visualise and analyse the distribution of their own activity data across these types and categories. Pushing it a step further, users should even be able to personalise the views, giving their own ontological definitions and obtaining data abstractions that are therefore more meaningful to them.
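
The user-centric abstraction can be sketched in Python along the following lines. To be clear, this is not ontology-based reasoning: the simple rules in `ACTIVITY_TYPES` are a stand-in for ontological class definitions, and the type names, URL patterns and sample events are assumptions made for the example.

```python
from collections import Counter

# Stand-in for ontology class definitions: each activity type is described
# by a rule over (action, resource). Names and patterns are hypothetical.
ACTIVITY_TYPES = [
    ("CommentingOnBlog",
     lambda action, res: action == "POST"
     and res.startswith("/blog/") and res.endswith("/comment")),
    ("SearchingBlog",
     lambda action, res: action == "GET" and res.startswith("/blog/search")),
    ("QueryingLinkedData",
     lambda action, res: res.startswith("/sparql")),
]


def classify(action, resource):
    """Return the first activity type whose rule matches, else 'Other'."""
    for name, matches in ACTIVITY_TYPES:
        if matches(action, resource):
            return name
    return "Other"


def user_view(events, actor):
    """Distribution of activity types for the traces of one given user."""
    return Counter(
        classify(action, resource)
        for who, action, resource in events
        if who == actor
    )


events = [
    ("alice", "GET", "/blog/search?q=ontology"),
    ("alice", "POST", "/blog/post-12/comment"),
    ("alice", "GET", "/sparql?query=select"),
    ("bob", "GET", "/blog/search?q=uciad"),
    ("alice", "GET", "/blog/search?q=activity"),
    ("alice", "GET", "/about"),
]

alice_view = user_view(events, "alice")
print(alice_view.most_common())
```

Personalising the view then amounts to letting the user supply their own list of definitions in place of `ACTIVITY_TYPES`, which is the role the customisable ontology plays in the approach described above.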

A colleague forwarded me today this article in French, where the author says (my translation): “What could I accomplish if I had at my disposal, in an exploitable form, the information regarding my pathways and communications? [...] Not only to control what others are doing with it, but to use it to my own benefit? Today, we tend to scratch our heads and ask: what would be the use of that?”, and indeed we don’t really know what this will allow in the future. However, as the author of the article suggests, that shouldn’t stop us from trying to find out, as long as we are convinced there is something there to explore.