If there is one thing I like about the JISC activity data programme in which UCIAD is involved, it is that the instructions were very clear: your project is a short experiment, to see what could/should be done in the area of activity data in the context of higher education organisations (or at least, this is what I heard). We have taken that very much to heart in UCIAD, starting from our two basic hypotheses: that a user-centric perspective on activity data is relevant, and that Semantic Web technologies, especially ontologies, provide the right technological basis to achieve such a perspective.
We have discussed in a number of previous posts what we got excited about, what showed us the feasibility, relevance and potential impact of our approach, as well as what unexpected issues we had to face and how some of our assumptions turned out to be wrong. Here, we want to give a quick summary of these “wins” and “fails”, starting of course with the wins, and looking at the two aspects corresponding to our two hypotheses: the user-centric view and the semantic technologies view.
Wins – What went right
- On the user-centric view: Giving data back to the user, user-centric data and consumer data were already emerging trends when we started the project, but in the last few months they have clearly exploded as topics that organisations should take into account. The New York Times article “Show Us the Data. (It’s Ours, After All.)” has in particular generated a lot of discussion amongst consumer representatives and “data-managers” in various organisations. The mydata project launched by the UK government is also a clear sign that the push for more transparency has to extend to private and public organisations dealing with people’s data. There have already been strong reactions from large companies such as Google, which launched its own Data Liberation Front. Generally, users increasingly want, and assume they have the right, to access their data and to use it for their own benefit. Even the single feature of exporting one’s own activity data is technically non-trivial, but it is of obvious relevance in the current climate, where a lot of emphasis is put on transparency while personal information can be distributed across many different and isolated systems. Beyond the general climate, we have also shown that activity data is not only relevant when aggregated at the level of an organisation, but can give a new perspective when individual users are kept visible in the data (see this post for an explanation of what we mean here). To put it simply, giving people a view on their activity data provides a way for them to reflect on it, and to become more efficient in these activities. It also gives them an opportunity to engage with the data and “customize” it, with added value for the organisation.
- On Semantic Web technologies: We have a lot of experience working with ontologies and semantic data, and were therefore confident that there was great potential here. However, this is probably the point on which most people external to the project would have thought we had the best chance to fail: we believed that we could apply semantic technologies, linked data-based approaches and (most horribly) ontology-based reasoning to the manipulation, processing and analysis of activity data. Carrying out the experiments, setting up the UCIAD platform with real, large-scale data, applying ontologies on top of these data and evolving these ontologies to create different views for the analysis of these data have been, from my very personal point of view, the most interesting part of the project. Ontologies have recently acquired a bad reputation, and mentioning them (especially in the context of activity data) now often leads to raised eyebrows and condescending looks. One thing that our experiments with UCIAD have shown is that working with ontologies not only has the advantage of introducing formality, shared vocabularies and semantics into our applications, but also represents a flexible and efficient way of introducing meaningful views into large amounts of raw, uninterpreted data. What ontologies bring to such an area is the ability to give definitions that form the basis for clustering and organising the data automatically. I can state what I mean by a “search activity” and magically see all the traces related to search activities being put together, to become explorable and queryable (see our post on reasoning). The nice thing about UCIAD is that this magic is actually implemented and working in the way we hypothesized it would. It is a fascinating thing to see raw data from log files being classified into meaningful categories of activities, resources and actors.
It is even more fascinating knowing that we defined these groups ourselves, by encoding their definitions in an ontology, and that we can add others as we see fit. Due to time constraints, we could only experiment a tiny bit with this process, but we see a very promising approach in the incremental definition of the ontology as an analysis process: looking at the data, thinking that it would make sense to have an activity category such as, for example, “commenting on a blog”, and simply adding it to see the data being automatically reorganised with this new definition.
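To make the idea of definition-driven classification a little more concrete, here is a minimal sketch in plain Python. It is not the UCIAD ontology, and it does not use an ontology reasoner: each category is simply a named membership condition, loosely analogous to a defined class. The `Trace` fields, the `classify` helper, the category names and the URL patterns are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Trace:
    """One simplified log entry (hypothetical fields, for illustration only)."""
    actor: str
    method: str
    url: str


# A definition is a membership condition over traces, loosely analogous
# to a defined class in an ontology. This is an assumption of the sketch,
# not the way UCIAD encodes its categories.
Definition = Callable[[Trace], bool]

definitions: Dict[str, Definition] = {
    "SearchActivity": lambda t: t.method == "GET" and "/search" in t.url,
}


def classify(traces: List[Trace],
             defs: Dict[str, Definition]) -> Dict[str, List[Trace]]:
    """Group traces under every category whose definition they satisfy."""
    groups: Dict[str, List[Trace]] = {name: [] for name in defs}
    for trace in traces:
        for name, matches in defs.items():
            if matches(trace):
                groups[name].append(trace)
    return groups


traces = [
    Trace("alice", "GET", "/search?q=ontology"),
    Trace("alice", "POST", "/blog/42/comments"),
    Trace("bob", "GET", "/blog/42"),
]

# First pass: only search activities are defined.
before = classify(traces, definitions)

# Incremental step: add a "commenting on a blog" category and re-classify.
# The raw traces are untouched; only the set of definitions has grown.
definitions["BlogCommenting"] = (
    lambda t: t.method == "POST"
    and "/blog/" in t.url
    and t.url.endswith("/comments")
)
after = classify(traces, definitions)
```

The point of the sketch is the last step: adding one definition reorganises the same raw data without touching it. A real ontology-based setup would express the definitions declaratively and let a reasoner infer the memberships, which is a good deal more expressive than the predicates used here.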
Fails – What went wrong
- On the user-centric view: Our biggest failure, in my opinion, has been that we didn’t manage to communicate appropriately on the notions, approaches and change of perspective that the user-centric view on activity data represents. There are many reasons for this, I believe, one being that we assumed the benefits would be self-evident, while they clearly are not (see the post where we tried to get back to the basics of the issue). The notion of user-centric data or consumer data might be very trendy, but that does not mean it is ready for wide adoption. There are many issues to be solved that go far beyond the purely technical aspects, and that simply come from the fact that activity data has never been looked at in this way before. We don’t really know what will happen in this space, what users would do with these data and how much interest this could generate for the organisation. There are many difficult questions that we could not really address in the scope of the project (including in particular the questions around data ownership and privacy). While this is enough to keep us excited, there is an enormous amount of work to be done before the approach we have been promoting in UCIAD can reach its potential and be widely adopted.
- On Semantic Web technologies: While we are still excited about the added value that Semantic Web technologies can bring to the analysis of activity data, we have clearly been over-optimistic regarding the maturity of some of the components we have been relying on, and their ability to handle the scale and complexity of the kind of data we are working with. This issue is clearly summarised in our post on the technical aspect of UCIAD. The good news, however, is that things are evolving very quickly. It would be a lot easier to implement the UCIAD platform now than it was 6 months ago, as the tools and platforms to deal with semantic data are getting more robust every day. Also, the evolution of the technology should be followed by an evolution in the skills and ability of the community to adopt such technologies. Building UCIAD gave us a better understanding of what is feasible and what is required to set up a semantic platform for activity data. There is still much to do for such an approach to become feasible in a broader set of situations.