Final post – Putting things together (with a demo)

August 5th, 2011

Over the last 6 months we have been working on building the UCIAD platform, experimenting with large-scale activity data, reflecting on user-centric data and blogging our thoughts. While, as can be seen from the last few posts on this blog, there is quite some work we think should follow from this, it is nice to see things finally coming together, and to be able to show a bit of the UCIAD platform with have been talking about for some times. What better way to do this than with a video of the running platform, showing the different components in action. (Note: it is better to watch it in 720p – HD).

This video shows a user (me) registering to the UCIAD platform with some setting details and browsing his activity data as they appear on several Open University websites (mostly, an internal wiki system and the Open University’s linked data platform – This video therefore integrates in a working demo the different components we have been talking about here:

  • User management: As we can see here, as the user registers into the UCIAD platform, his current setting is automatically detected, and other settings (other browsers) that are likely to be his are also included. As the user registers, the settings are associated to his account and the activity data realised through these settings are extracted.
  • Extracting user-centric activity data: As described in the first part of the blog post on reasoning (previous link), the settings associated with the user are used to extract the activity data around this particular user, creating a sub-graph corresponding to his activity.
  • Ontologies to make sense of activity data: The ontologies are used in structuring the data according to a common schema and to provide a base to homogeneously query data coming from different systems. As discussed below, they can also be extended (specified) so that different categories of activities and resources can be represented, and reasoned upon.
  • Ontological reasoning for analysis: What the demo video shows clearly is how the activity data is organised according to different categories (traces, webpages, websites, settings, etc.) coming from the base ontologies, but also according to classes of activities, resources, etc. that have been specially added to cover the websites and the particular user in this case. Here, we extended the ontology in order to include definitions of activities relevant to the use of a wiki and a data platform. The powerful aspect of using ontologies here is that such classes can be added to the ontology for the system to automatically process them and organise the data according to them. Here, for example, we define “Executing a SPARQL Query” as an activity that takes place on a SPARQL endpoint with a “query” parameter, or “Checking Wiki Updates” as an activity on a Wiki page that is realised through an RSS client.
  • Browsing data according to ontologies: We haven’t described this components yet, but we rely on an homemade “browser” that we use in a number of projects and that can inspect ontology classes and members of these classes, generating graphs and simple stats.

Next steps

There are a lot of things to mention here, some of them we have already mentioned several times. An obvious one is the finalisation, distribution and deployment of the UCIAD platform. A particular element we want to get done at a short term is to investigate the use of the UCIAD platform with various users, to see what kind of extensions of the ontologies would be commonly useful, and generally to get some insight into the reaction of users when being exposed to their own data.

More generally, we think that there is a lot more work to do on both the aspects of user-centric activity data and on the use of ontologies for the analysis of such data, as described in particular in our Wins and Fails post. These includes aspects around the licensing, distribution and generally management of user-centric data (as mentioned in our post on licensing). Indeed, while “giving back data to the users” is already technically difficult, there is a lot of fuzziness currently around the issues of ownership of activity data. This also forces us to look a lot more carefully at the privacy challenges that such data can generate, that didn’t exist when these data were held and stayed on server logs.

Beyond UCIAD and the Open University

As discussed in our post on the benefits of UCIAD, the issues considered go largely beyond the Open University and even activity data. The issues around licensing in particular are to be considered more broadly, in the same way as the challenges around communicating on user-centric data.

We have been focusing mostly on the technical issues in UCIAD, providing in this way a base framework to start investigating these broader and more complex challenges.

Most significant lessons

To put it simply, the most significant lessons we learnt (as mentioned in the wins and fails post) are:

  • Both user-centric data and ontologies are complex notions, so don’t assume they are understood.
  • Activity data are massive and complex, beyond what can be handled by current semantic data infrastructures, without a bit of clever management.
  • There is a lot of potential in using ontologies and ontological engineering for the analysis and interpretation of raw data.

Wins and fails (lessons along the way)

August 3rd, 2011

If there is one thing I like about the JISC activity data programme in which UCIAD is involved is that the instructions were very clear: your project is a short experiment, to see what could/should be done in the area of activity data in the context of higher education organisations (or at least, this is what I heard). We have integrated that a lot in UCIAD, starting from our two basic hypothesis that a user-centric perspective on activity data is relevant, and that Semantic Web technologies, especially ontologies, provided the right technological basis to achieve such a perspective.

We have discussed in a number of previous posts what we got excited about, what showed us the feasibility, relevance and potential impact of our approach, as well as what unexpected issues we had to face and how some of our assumptions turned out to be wrong. Here, we wanted to give a quick summary of these “wins” and “fails”, starting of course from the wins, and looking at the two aspects corresponding to our two hypothesis: the user-centric view and the semantic technologies view.

Wins – What went right

  • On the user-centric view: Giving data back to the user, user-centric data and consumer data were already emerging trends when we started the project, but clearly exploded as topics that organisations should take into account in the last few months. The New York Times article “Show Us the Data. (It’s Ours, After All.)” has in particular generated a lot of discussions amongst consumer representatives and “data-managers” in various organisations. The mydata project launched by the UK government is also a clear sign that the push for more transparency has to extend to private and public organisations dealing with people’s data. There have already been strong reactions from large companies such as Google, launching its own Data Liberation Front. Generally, users (will more and more) want, and assume the right to access their data and to use them to their own benefits. Only considering the feature of exporting one’s own activity data is technically non-trivial, but of obvious relevance in the current climate where a lot of emphasis is put on transparency, while personal information can be distributed in many different and isolated systems. Beyond the general climate, we have also shown that activity data is not only relevant as aggregated at the level of an organisation, but can give a new perspective when individual users are kept visible in the data (see this post for an explanation of what we mean here). To put it simply, giving people a view on their activity data provides a way for them to reflect on it, and to become more efficient in these activities. It also give them an opportunity to engage with the data, “customize” it, with added-value for the organisation.
  • On Semantic Technologues We have a lot of experience working with ontologies and semantic data, and were therefore confident that there was a great potential here. However, this is probably the point on which most people external to the project would think we had the best chance to fail: we believed that we could apply semantic technologies, linked data-based approaches and (most horribly) ontology-based reasoning to the manipulation, processing and analysis of activity data. Realising the experiments, setting up the UCIAD platform with real, large scale data, applying ontologies on top of these data and evolving these ontologies to create different views for the analysis of these data are, from my very personal point-of-view, the most interesting part of the project. Ontologies have acquired recently a bad reputation, and mentioning them (especially in the context activity data) now often leads to raised eyebrows and condescending looks. One thing that our experiments with UCIAD have shown is that working with ontologies not only has the advantages of introducing formality, shared vocabularies and semantics in our applications, but also represents a flexible and efficient way of introducing meaningful views into large amounts of raw, uninterpreted data. What ontologies bring into such an area is the ability to give definitions that will be at the basis of clustering and organising the data automatically. I can tell what I mean by a “search activity” and magically see all the traces related to search activities being put together, to become explorable and queryable (see our post on reasoning). The nice thing about UCIAD, is that this magic is actually implemented and working in the way we hypothesized it would. It is a fascinating thing to see raw data from log files being classified into meaningful categories of activities, resources and actors. It is even more fascinating knowing that we defined these groups, through encoding these definitions in an ontology, and can add others as we see fit. Due to time constraints, we could only experiment a tiny bit with this process, but we see a very promising approach in the incremental definition of the ontology as an analysis process: looking at the data, thinking that it would make sense to have an activity categorie such as for example “commenting on a blog”, and simply adding it to see the data being automatically reorganised with this new definition.

Fails – What went wrong

  • On the user-centric view: Our biggest failure in my opinion has been that we didn’t manage to communicate appropriately on the notions, approaches and change of perspective that the user-centric view on activity data represents. There are many reasons for this I believe, one being that we have been assuming that the benefits would be self-evident, while they clearly are not (see the post where we tried to get back the basis of the issue). The notion of user-centric data or consumer data might be very trendy, it does not mean that it is ready for wide adoption. There are many issues that need to be solved that go far beyond the purely technical aspects, and that simply come from the fact that activity data has never been looked at in this way before. We don’t really know what will happen in this space, what users would do with these data and how much interest this could generate for the organisation. There are many difficult questions that we could not really address in the scope of the project (including in particular the questions around data ownership, and privacy). While this is enough to keep us excited, there is enormous work to be done before the approach we have been promoting in UCIAD could reach its potential, and be widely adopted.
  • On Semantic Web technologies: While we are still excited about the added-value that semantic web technologies can bring to the analysis of activity data, we have been clearly over-optimistic regarding the maturity of some components we have been relying on, and their ability to handle the scale and complexity of the kind of data we are working with. This issue is clearly summarised in our post on the technical aspect of UCIAD. The good news is however that things are evolving very quickly. It would be a lot easier to implement the UCIAD platform now than it was 6 months ago, as the tools and platforms to deal with semantic data are getting more robust everyday. Also, the evolution of the technology should be followed by an evolution in the skills and ability of the community to adopt such technologies. Realising UCIAD made us reach a better understanding of what was feasible and required to set up a semantic platform for activity data. There is still much to do for such an approach to become feasible in a broader set of situations.

The mydata project

June 21st, 2011

Announcements have come out recently regarding new projects from the government around the slogan “Better choices, better deals” to support better customer experience, through transparent customer information. This is exciting as it shows how the government, as well as businesses, are now realising that it is through giving control to information to the customers (i.e., the users) that we can build a better, more reliable and more transparent experience. At the core of the initiative is the mydata project which goal can be summarised by the sentence: “giving back customer data to customers”. To a large extent UCIAD can be seen as an experiment in this direction, proposing to deliver activity data to the users (i.e., customers) of large organisations. We certainly share the same hypothesis that, as expressed by Nigel Shadbolt (chair of the MyData project), customers/users getting back their information can help make organisations/businesses “more accountable”, “more efficient” and able to build “new kinds of services”.

Of course, it is still unclear at this stage what will be the concrete outcomes of the mydata project. Great challenges have to be tackled both from a technological point of view (in what format should data be provided to customers? How to ensure reusability? How to deal with heterogeneity?) and from the societal point of view (What are the privacy/security implications? How to enforce “user-centric data provision” policies in businesses? How to spread the benefit equally amongst users?). We hope that our experience with UCIAD (and beyond, with the work building on UCIAD we are planning to do) will contribute to such exciting new approaches to activity/customer data.