Wins and fails (lessons along the way)

August 3rd, 2011

If there is one thing I like about the JISC activity data programme in which UCIAD is involved, it is that the instructions were very clear: your project is a short experiment, to see what could/should be done in the area of activity data in the context of higher education organisations (or at least, this is what I heard). We have taken that to heart in UCIAD, starting from our two basic hypotheses: that a user-centric perspective on activity data is relevant, and that Semantic Web technologies, especially ontologies, provide the right technological basis to achieve such a perspective.

We have discussed in a number of previous posts what we got excited about, what showed us the feasibility, relevance and potential impact of our approach, as well as what unexpected issues we had to face and how some of our assumptions turned out to be wrong. Here, we want to give a quick summary of these “wins” and “fails”, starting of course with the wins, and looking at the two aspects corresponding to our two hypotheses: the user-centric view and the semantic technologies view.

Wins – What went right

  • On the user-centric view: Giving data back to the user, user-centric data and consumer data were already emerging trends when we started the project, but in the last few months they have clearly exploded as topics that organisations should take into account. The New York Times article “Show Us the Data. (It’s Ours, After All.)” in particular has generated a lot of discussion amongst consumer representatives and “data managers” in various organisations. The mydata project launched by the UK government is also a clear sign that the push for more transparency has to extend to private and public organisations dealing with people’s data. There have already been strong reactions from large companies such as Google, launching its own Data Liberation Front. Generally, users want (and will want more and more), and assume the right, to access their data and to use them to their own benefit. Even the single feature of exporting one’s own activity data is technically non-trivial, but it is of obvious relevance in the current climate, where a lot of emphasis is put on transparency while personal information can be distributed across many different and isolated systems. Beyond the general climate, we have also shown that activity data is not only relevant when aggregated at the level of an organisation, but can give a new perspective when individual users are kept visible in the data (see this post for an explanation of what we mean here). To put it simply, giving people a view on their activity data provides a way for them to reflect on it, and to become more efficient in these activities. It also gives them an opportunity to engage with the data and “customize” it, with added value for the organisation.
  • On semantic technologies: We have a lot of experience working with ontologies and semantic data, and were therefore confident that there was great potential here. However, this is probably the point on which most people external to the project would think we had the best chance to fail: we believed that we could apply semantic technologies, linked data-based approaches and (most horribly) ontology-based reasoning to the manipulation, processing and analysis of activity data. Carrying out the experiments, setting up the UCIAD platform with real, large-scale data, applying ontologies on top of these data and evolving these ontologies to create different views for the analysis of these data are, from my very personal point of view, the most interesting parts of the project. Ontologies have recently acquired a bad reputation, and mentioning them (especially in the context of activity data) now often leads to raised eyebrows and condescending looks. One thing that our experiments with UCIAD have shown is that working with ontologies not only has the advantage of introducing formality, shared vocabularies and semantics in our applications, but also represents a flexible and efficient way of introducing meaningful views into large amounts of raw, uninterpreted data. What ontologies bring to such an area is the ability to give definitions that form the basis for clustering and organising the data automatically. I can state what I mean by a “search activity” and magically see all the traces related to search activities being put together, to become explorable and queryable (see our post on reasoning). The nice thing about UCIAD is that this magic is actually implemented and working in the way we hypothesized it would. It is a fascinating thing to see raw data from log files being classified into meaningful categories of activities, resources and actors.
It is even more fascinating knowing that we defined these groups, by encoding their definitions in an ontology, and can add others as we see fit. Due to time constraints, we could only experiment a tiny bit with this process, but we see a very promising approach in the incremental definition of the ontology as an analysis process: looking at the data, thinking that it would make sense to have an activity category such as, for example, “commenting on a blog”, and simply adding it to see the data being automatically reorganised with this new definition.
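The idea of adding a definition and seeing the data reorganise itself can be illustrated with a minimal Python sketch. It is not the UCIAD implementation: plain predicate functions stand in for ontology class definitions, a simple loop stands in for the reasoner, and the trace fields (`path`, `method`, `query`) and category names are assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical traces, as they might look once extracted from web server logs.
TRACES = [
    {"path": "/search", "method": "GET", "query": "q=ontology"},
    {"path": "/blog/uciad/comment", "method": "POST", "query": ""},
    {"path": "/courses/m366", "method": "GET", "query": ""},
]

# Each activity category is a named definition: a condition a trace must satisfy.
# In an ontology these would be class definitions evaluated by a reasoner.
DEFINITIONS = {
    "SearchActivity": lambda t: t["method"] == "GET" and "q=" in t["query"],
}


def classify(traces, definitions):
    """Group trace paths under every category whose definition they satisfy."""
    groups = defaultdict(list)
    for trace in traces:
        matched = [name for name, test in definitions.items() if test(trace)]
        # Traces that fit no definition are kept visible rather than dropped.
        for name in matched or ["UnclassifiedActivity"]:
            groups[name].append(trace["path"])
    return dict(groups)


before = classify(TRACES, DEFINITIONS)

# Incremental analysis: add one definition and re-run; no trace is edited.
DEFINITIONS["BlogCommenting"] = (
    lambda t: t["method"] == "POST" and t["path"].endswith("/comment")
)
after = classify(TRACES, DEFINITIONS)
```

Comparing `before` and `after` shows the comment trace moving from the unclassified group into the new category purely because a definition was added. A real ontology reasoner does considerably more (class hierarchies, relations between activities, resources and actors), so this only conveys the workflow.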

Fails – What went wrong

  • On the user-centric view: Our biggest failure, in my opinion, has been that we didn’t manage to communicate appropriately on the notions, approaches and change of perspective that the user-centric view on activity data represents. There are many reasons for this I believe, one being that we assumed the benefits would be self-evident, while they clearly are not (see the post where we tried to get back to the basis of the issue). The notion of user-centric data or consumer data might be very trendy, but that does not mean that it is ready for wide adoption. There are many issues to be solved that go far beyond the purely technical aspects, and that simply come from the fact that activity data has never been looked at in this way before. We don’t really know what will happen in this space, what users would do with these data and how much interest this could generate for the organisation. There are many difficult questions that we could not really address in the scope of the project (including in particular the questions around data ownership and privacy). While this is enough to keep us excited, there is enormous work to be done before the approach we have been promoting in UCIAD can reach its potential and be widely adopted.
  • On Semantic Web technologies: While we are still excited about the added value that Semantic Web technologies can bring to the analysis of activity data, we were clearly over-optimistic regarding the maturity of some of the components we relied on, and their ability to handle the scale and complexity of the kind of data we are working with. This issue is summarised in our post on the technical aspect of UCIAD. The good news, however, is that things are evolving very quickly. It would be a lot easier to implement the UCIAD platform now than it was 6 months ago, as the tools and platforms to deal with semantic data are getting more robust every day. Also, the evolution of the technology should be followed by an evolution in the skills and ability of the community to adopt such technologies. Building UCIAD gave us a better understanding of what is feasible and what is required to set up a semantic platform for activity data. There is still much to do for such an approach to become feasible in a broader set of situations.

Licensing & reuse of software and data

July 31st, 2011

Deciding on licensing and data distribution is always challenging when talking about data that are intrinsically personal: activity data. Privacy issues are of course relevant here. We cannot distribute openly, or even on a proprietary basis, data that relate to users’ actions and personal data on our systems. Anonymisation approaches exist that are supposed to make users unidentifiable in the data. Such approaches, however, cannot be applied in UCIAD, for two main reasons:

  • Such anonymisation mechanisms are only guaranteed in very closed, controlled environments. In particular, they assume that it is possible to completely characterise the dataset, and that integration with other datasets will not happen. These are two assumptions that we cannot make about our data, as the data are always evolving (in ways that might make established parameters for anonymisation suddenly invalid) and are meant to be integrated with other data.
  • The whole principle of the project is to distribute the data to the user it concerns, which means that the user is at the centre of the data. Anonymising data related to one user while giving it back to this user of course makes no sense. More generally, anonymisation mechanisms are based on aggregating data into abstracted or averaged values so that individual users disappear. This is obviously in contradiction with the approach taken in UCIAD.

The issue with licensing data in UCIAD is actually even more complicated: what licence should apply to data exported for a particular user? The ownership of the data is not even clear in this case. The data are collected and delivered by our systems, but they are produced out of the activities of the user. We believe that this case calls for a particular type of licence, one that gives users control over the distribution of their own data without opening it completely. This is an area on which we will need to do additional work, possibly with useful results coming out of the mydata project.

Of course, despite this very complicated issue, more generic components of UCIAD can be openly distributed. These include the UCIAD ontologies, as well as the source code of the UCIAD platform, manipulating data according to these ontologies.

Benefits

July 29th, 2011

One of the major issues with the approach taken in UCIAD (which will be discussed at greater length in the “Wins and Fails” post in the next few days) is communicating its benefits. One reason is that, to be fully honest, the mechanisms and the whole perspective we are taking on activity data are still too ‘experimental’ for us to fully understand these benefits yet. The other is that at the core of our approach is a focus on the benefits of activity data to the end-user and not, as would usually be the case, to the organisation. We therefore quickly come back here to what we have learnt about the advantages of our approach, first to the end-users, and then deriving potential benefits to the organisation. We summarise our view on the achievements of UCIAD in terms of benefits through a discussion of the success of the project, seen as an experiment towards ontology-based, user-centric activity data.

Benefits to the end-user

There have been a number of places where the potential benefits of user-centric data (or consumer data) have been discussed, generally under the label of “giving back their data to the users”. These include in particular the popular article “Show Us the Data. (It’s Ours, After All.)” by Richard H. Thaler in the New York Times. As was argued in particular in one of our previous posts, giving a complete account of what end-users could do with such data is both unfeasible and undesirable. However, we can summarise the expected benefits, and their connections to the work done in UCIAD, in three different areas:

  • Know yourself… and be more efficient: As we briefly discussed in our post on self-tracking, there is currently a trend of individuals monitoring their own activities, statuses, etc. While some would criticise such an attitude as pure narcissism, the reality is that monitoring oneself has been recognised as an effective way to improve. In sport for example, monitoring performance in relation to other variables (health status, equipment used, etc.) is necessary to improve and achieve the best conditions, for the best results. Beyond sport, however, there are many areas where monitoring and understanding one’s own behaviour can help one be more efficient. There is a large gap between an athlete measuring his/her performance and a user monitoring his/her online activities. However, for a user to know how he/she searched websites, found and exploited resources on the Web or engaged with online communities can only have a positive effect on his/her effectiveness in carrying out these tasks in the future.
  • Exploit your own data yourself: Besides the passive monitoring of activities, consumer data has often been described as exploitable by individuals. Indeed, in the current situation, organisations collect large amounts of data about their users, which they exploit to their own benefit, often for commercial purposes. Such personal data and profiles are used and accessed by a large variety of agents, from the search engine that sends personalised results to the advertiser that targets you with specific products, but not by the user him/herself. For users to have access to, control over and possibly ownership of their own data means that they could also exploit them, using them to build their own profiles that can be employed in communicating with other entities on the Web, on their own terms. In a more directly pragmatic way, users can analyse their own data and build on top of them to extract relevant information to their own benefit. In UCIAD, we not only allow users to export their own data, but we do it using Semantic Web standards to ensure maximum reusability. Because the export relies on a customisable ontology, the exported data can be flexibly adapted to any kind of use that the user might come up with, not only the ones that we have thought of.
  • Combine and integrate your own data: While we are still far from such a situation at this stage, we can easily imagine that, with the explosion of the number of systems providing an “export your own data” feature, users will eventually be able to build their own personal knowledge base, feeding it with personal data collected from the many online systems they use. Again, such a scenario requires a certain level of standardisation in the data representation formats being used, for which Semantic Web technologies appear to be perfect candidates. A possibly less distant scenario is one where users interacting with several organisations would export their activity data from the corresponding instances of the UCIAD platform. These data would naturally integrate to provide users with the ability to monitor, analyse and exploit their activity data across numerous, originally disconnected organisations and websites.
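To make the “export your own data using Semantic Web standards” point concrete, here is a minimal sketch that writes one user’s traces as N-Triples, a simple line-based RDF format. It only illustrates the idea under stated assumptions: the namespaces, property names (`actor`, `onResource`, `atTime`) and trace fields are hypothetical and are not the actual UCIAD ontology or export code.

```python
# Hypothetical namespaces; the real UCIAD ontology URIs are not given here.
ONT = "http://example.org/uciad/ontology#"
DATA = "http://example.org/uciad/data/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"


def escape(value):
    """Escape a string for use inside an N-Triples literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def trace_to_ntriples(trace):
    """Serialise one activity trace as a list of N-Triples lines.

    Assumes ids and user names contain only characters that are safe in an IRI.
    """
    subject = f"<{DATA}trace/{trace['id']}>"
    return [
        f"{subject} <{RDF_TYPE}> <{ONT}{trace['activity']}> .",
        f"{subject} <{ONT}actor> <{DATA}user/{trace['user']}> .",
        f"{subject} <{ONT}onResource> <{trace['resource']}> .",
        f'{subject} <{ONT}atTime> "{escape(trace["time"])}"^^<{XSD_DATETIME}> .',
    ]


def export_user(traces, user):
    """Return the N-Triples document holding only the given user's traces."""
    lines = []
    for trace in traces:
        if trace["user"] == user:
            lines.extend(trace_to_ntriples(trace))
    return "".join(line + "\n" for line in lines)


# Made-up sample data for two users.
SAMPLE = [
    {"id": "t1", "user": "u42", "activity": "SearchActivity",
     "resource": "http://example.org/search", "time": "2011-07-29T10:15:32Z"},
    {"id": "t2", "user": "u7", "activity": "BlogCommenting",
     "resource": "http://example.org/blog/post-1", "time": "2011-07-29T10:16:05Z"},
]

exported = export_user(SAMPLE, "u42")
```

Because each line is a self-contained statement with global identifiers, exports produced by different systems can in principle be concatenated and loaded into the same RDF store, which is the integration scenario described in the last bullet above.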

Benefits to the organisation

As explained earlier, one of the core aspects of UCIAD has been to focus on the benefits of collecting and flexibly interpreting activity data to the end-user. This does not mean that the organisation has no interest in considering the type of technology we have been developing, but simply that the benefits to the organisation mostly come as derived from providing benefits to the end-users of the organisation:

  • Transparency: In very simple terms, users are more and more pushing organisations towards greater accountability with respect to the data they collect about them. Deploying the UCIAD platform can be seen as a way for an institution to tell users “here is what we have about you in terms of activity data”.
  • Trust: In relation to the point above on transparency, providing collected data back to users is a way to establish a stronger relationship with them: i.e., one where they can trust the organisation regarding the fair and transparent use of their activity data.
  • Leave data management to the user: Leaving the user in control of their own data can bring valuable benefits to the organisation. In particular, it means that the user can allow, or actively enable, the use of more data than what can be done when he/she is left out of the loop. It makes it possible for example for them to bring and import data they have collected from other systems and organisations, so that the same data does not have to be collected again, and the new organisation does not have to start from scratch.

How do we measure success?

So, now that we have listed all the expected benefits of the approach taken in UCIAD, the natural next question is “have we managed to bring all these benefits to our institution?”. The plain and honest answer is: No.

From the start, we have considered UCIAD as being an experiment (and actually, a rather short one). What we wanted to demonstrate was that:

  1. These benefits are achievable
  2. Technology, such as linked data and ontologies, makes the approach feasible

The UCIAD platform demo, collecting log data from several webservers concerning around a dozen websites, interpreting this data in terms of user-activity, extracting the traces of activities around a given user and exposing the user to these traces in a meaningful way, provides an undeniable demonstration that the technical and technological mechanisms to achieve the UCIAD approach are applicable and effective.
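The first steps of this pipeline (collecting log lines from several servers and extracting the traces around a given user) can be sketched as follows. This is an illustration only, not the platform’s code: it assumes Apache-style access log lines and assumes the user can be identified from the authenticated-user field, whereas the way UCIAD actually identifies users is not described here. Server names and log lines are made up.

```python
import re
from datetime import datetime

# Assumed Apache common/combined log layout: host ident user [time] "request" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)


def parse_line(line, server):
    """Turn one log line into a record, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["server"] = server
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    return record


def traces_for_user(logs_by_server, user):
    """Collect one user's records across all servers, in chronological order."""
    traces = []
    for server, lines in logs_by_server.items():
        for line in lines:
            record = parse_line(line, server)
            if record is not None and record["user"] == user:
                traces.append(record)
    return sorted(traces, key=lambda r: r["time"])


# Made-up log lines from two hypothetical servers.
LOGS = {
    "www": [
        '192.0.2.10 - alice [03/Aug/2011:10:15:32 +0100] "GET /search?q=ontology HTTP/1.1" 200 5120',
        '192.0.2.11 - bob [03/Aug/2011:10:16:00 +0100] "GET /courses HTTP/1.1" 200 2048',
        "this line is malformed and is skipped",
    ],
    "blogs": [
        '192.0.2.10 - alice [03/Aug/2011:10:14:00 +0100] "POST /blog/uciad/comment HTTP/1.1" 302 0',
    ],
}

alice_traces = traces_for_user(LOGS, "alice")
```

The resulting records are what an interpretation step would then classify into activities, resources and actors before presenting them to the user.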

We are currently demonstrating this platform to users of the Open University websites, and observing them as they engage with it, and so with their own activity data. This activity will carry on for some time after the end of the project so that we can learn as much as possible from the current state of the platform. However, from these initial discussions, it is already clear that users are interested in, and sometimes even fascinated by, the idea of obtaining and using their own activity data. They are, as has been happening for many systems outside UCIAD (e.g., Google, Facebook), very positive about such features being added to the websites of an organisation they spend so much time interacting with: their University. In many cases now, they are demanding it.

The mydata project

June 21st, 2011

Announcements have come out recently regarding new government projects around the slogan “Better choices, better deals”, which aim to support a better customer experience through transparent customer information. This is exciting, as it shows that the government, as well as businesses, are now realising that it is by giving customers (i.e., the users) control over information that we can build a better, more reliable and more transparent experience. At the core of the initiative is the mydata project, whose goal can be summarised by the sentence: “giving back customer data to customers”. To a large extent UCIAD can be seen as an experiment in this direction, proposing to deliver activity data to the users (i.e., customers) of large organisations. We certainly share the hypothesis that, as expressed by Nigel Shadbolt (chair of the MyData project), customers/users getting back their information can help make organisations/businesses “more accountable”, “more efficient” and able to build “new kinds of services”.

Of course, it is still unclear at this stage what the concrete outcomes of the mydata project will be. Great challenges have to be tackled both from a technological point of view (In what format should data be provided to customers? How to ensure reusability? How to deal with heterogeneity?) and from a societal point of view (What are the privacy/security implications? How to enforce “user-centric data provision” policies in businesses? How to spread the benefit equally amongst users?). We hope that our experience with UCIAD (and beyond, with the work building on UCIAD that we are planning to do) will contribute to such exciting new approaches to activity/customer data.