In the previous post, we explained to some extent our motivations for looking at a user-centric approach to activity data, and especially what we expect the benefits to users to be. We also briefly sketched some specific aspects of identifying and processing user-specific information in our post on the reasoning processes employed in UCIAD. Here, we return more generally to the aspects related to users and user management in the UCIAD platform, including how to recognise a user, handle registration and login, manage and present information about the user's activity, and handle access rights over semantic data. The actual prototype of the UCIAD platform implementing all these elements is currently being finalised and will be described more fully in our final post.
Identifying and managing users of UCIAD
The information the UCIAD platform has about users is similar to what basic analytics systems have. The user is rarely seen directly, as the interaction is mediated through a “user agent”: a software programme running on a particular computer. Each HTTP request is associated with the ID of the user agent making it and the IP address of the corresponding computer. Analytics systems have long recognised that the combination of these two parameters is sufficient to identify a user with a reasonable level of accuracy. The drawback, however, is that the same user may use different agents (e.g., different browsers) and different computers (or even mobile phones) to access the Web.
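As an illustration of this analytics-style identification, the following minimal Python sketch extracts the (IP address, user agent) pair from a web server log line. It assumes the logs are in Apache's “combined” format, which this post does not specify; the `Setting` type and the `setting_of` function are hypothetical names used only for illustration.

```python
import re
from typing import NamedTuple, Optional

# Assumption: log lines follow the Apache "combined" format, e.g.
# 192.0.2.7 - - [10/Jul/2011:13:55:36 +0100] "GET /x HTTP/1.1" 200 512 "-" "Mozilla/5.0 ..."
COMBINED_LOG = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


class Setting(NamedTuple):
    """Hypothetical key approximating a 'user', as analytics systems do."""
    ip: str
    user_agent: str


def setting_of(log_line: str) -> Optional[Setting]:
    """Return the (IP address, user agent) pair of a log line, or None if it cannot be parsed."""
    match = COMBINED_LOG.match(log_line)
    if match is None:
        return None
    return Setting(match.group("ip"), match.group("agent"))
```

Grouping log lines by this key is what lets an analytics system treat them as coming from one user; it also shows the limitation mentioned above, since the same person on a second browser yields a different key.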
In UCIAD, we have the advantage that users are very likely to connect to the UCIAD platform using the same agents and computers they usually use to access the Web, and in particular the websites under consideration. As shown in the mock-up screenshot above, the “settings” the user is using (computer and user agent) can be detected at login time and attached to the user account. These settings are then used to aggregate all the activity data generated with the same computer and user agent, which are added to the set of activity data for that user.
In addition, this provides a convenient mechanism for aggregating information generated on different computers and with different settings. The user can log in to the UCIAD platform again with a different browser or a different device. When that happens, as described in the figure below, the current setting is simply added to the list of known settings for this user and contributes another set of activity data for this particular user.
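The login-time mechanism described in the last two paragraphs can be sketched as follows. This is a minimal illustration under our own assumptions, not the UCIAD code: `UserAccount`, `Activity` and `Setting` are hypothetical types, a setting is modelled as a plain (IP address, user agent) pair, and activities are kept in an in-memory list rather than in a triple store.

```python
from dataclasses import dataclass, field
from typing import List, NamedTuple, Set


class Setting(NamedTuple):
    """An (IP address, user agent) pair, as detected when the user logs in."""
    ip: str
    user_agent: str


class Activity(NamedTuple):
    """One raw activity record from the logs (simplified, hypothetical shape)."""
    setting: Setting
    url: str


@dataclass
class UserAccount:
    username: str
    settings: Set[Setting] = field(default_factory=set)

    def login(self, ip: str, user_agent: str) -> None:
        # A login from a new browser or device adds one more known setting;
        # logging in again from a known setting leaves the set unchanged.
        self.settings.add(Setting(ip, user_agent))

    def activities(self, log: List[Activity]) -> List[Activity]:
        # The user's activity data is the union of the activities of all known settings.
        return [a for a in log if a.setting in self.settings]
```

Because settings are accumulated rather than replaced, each new device a user logs in from simply widens the slice of the logs attributed to them.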
As explained in the post about reasoning on user-centric activity data, managing the activity data of a particular user amounts to creating a sub-graph of the complete graph of raw activity data we collect from logs, based on the user's known settings. This graph is then registered in our repository, and the next step is to ensure that the information provided is restricted to the graph of the logged-in user.
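In RDF terms, the sub-graph selection can be pictured with the minimal sketch below, which treats the raw graph as a set of (subject, predicate, object) string triples. The property `FROM_SETTING` and the `ex:` identifiers are hypothetical; the actual UCIAD ontology terms are not given in this post, and the real platform relies on a triple store with reasoning rather than Python filtering.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical property linking an activity node to the setting it came from.
FROM_SETTING = "http://example.org/uciad#fromSetting"


def user_subgraph(graph: Iterable[Triple], known_settings: Set[str]) -> Set[Triple]:
    """Return the triples describing activities performed from one of the user's settings."""
    triples = list(graph)
    # 1. Find the activity nodes attached to one of the user's known settings.
    activities = {s for (s, p, o) in triples if p == FROM_SETTING and o in known_settings}
    # 2. Keep every triple whose subject is one of those activities.
    return {(s, p, o) for (s, p, o) in triples if s in activities}
```

Stored under its own graph name in the repository, such a sub-graph becomes the unit that access control can refer to.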
Managing access rights over semantic data
We store, manipulate and reason over activity data using Semantic Web technologies, namely RDF, a triple store with inference capabilities, and SPARQL for querying. As part of the UCIAD platform, we needed a mechanism to restrict the queries being sent so that they only cover the part of the data the current user has access to: his/her own sub-graph of activity data.
Unfortunately, most current triple stores, and in particular the one we are using, do not provide access control mechanisms fine-grained enough to associate sub-graphs with particular users. We therefore implemented our own mechanism, which can be seen as a generic recipe for access control over activity data.
The whole idea is actually quite simple (as depicted in the diagram above): the actual SPARQL endpoint holding all the data for all the users is hidden using standard security measures, so that it can only be accessed by our own system. We then implement a “proxy SPARQL endpoint” that handles basic HTTP authentication. When receiving a query, this proxy endpoint checks the user's credentials and determines which sub-graphs the user has access to, so that it can modify the query to restrict it to these sub-graphs only (using the FROM clause in SPARQL). It then sends the query to the real, hidden SPARQL endpoint and forwards the results back to the user.
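To make the recipe concrete, here is a minimal Python sketch of the proxy's core logic. It is not the UCIAD implementation: the endpoint address, the `ACCESS` table and all function names are hypothetical, and the query rewriting uses regular expressions, which ignore comments and string literals, so a production proxy should rewrite a parsed query instead. One detail matters for security: any FROM or FROM NAMED clauses written by the client are stripped first, since otherwise a user could add another user's graph to the query's dataset. Adding only FROM clauses also means that, in a store following the SPARQL dataset rules, the query sees no named graphs, so GRAPH patterns cannot reach other users' data.

```python
import base64
import hmac
import re
import urllib.parse
import urllib.request
from typing import Dict, List, Tuple

# Hypothetical address of the real endpoint, reachable only by the proxy.
HIDDEN_ENDPOINT = "http://127.0.0.1:3030/uciad/sparql"

# Hypothetical access table: user -> (password, sub-graphs they may query).
# A real deployment would store password hashes, not plain passwords.
ACCESS: Dict[str, Tuple[str, List[str]]] = {
    "alice": ("alice-pw", ["http://example.org/uciad/graph/alice"]),
    "bob": ("bob-pw", ["http://example.org/uciad/graph/bob"]),
}

# Dataset clauses the client may have written itself (IRI or prefixed name),
# but not a variable such as ?from.
DATASET_CLAUSE = re.compile(
    r"(?<![?$:\w])FROM\s+(?:NAMED\s+)?(?:<[^>]*>|[\w.-]*:[\w.-]*)", re.IGNORECASE)
# The WHERE keyword, but not a variable (?where) or a prefixed name (ex:where).
WHERE_KEYWORD = re.compile(r"(?<![?$:\w])WHERE\b", re.IGNORECASE)


def authenticate(auth_header: str) -> List[str]:
    """Check HTTP Basic credentials and return the sub-graphs the user may query."""
    scheme, _, encoded = auth_header.partition(" ")
    if scheme.lower() != "basic":
        raise PermissionError("Basic authentication required")
    user, _, password = base64.b64decode(encoded).decode("utf-8").partition(":")
    expected = ACCESS.get(user)
    if expected is None or not hmac.compare_digest(password.encode(), expected[0].encode()):
        raise PermissionError("invalid credentials")
    return expected[1]


def restrict_query(query: str, graphs: List[str]) -> str:
    """Drop client dataset clauses and add one FROM clause per allowed graph."""
    query = DATASET_CLAUSE.sub("", query)
    match = WHERE_KEYWORD.search(query)
    if match is None:
        raise ValueError("only queries with an explicit WHERE clause are supported")
    froms = "".join(f"FROM <{g}>\n" for g in graphs)
    return query[: match.start()] + froms + query[match.start():]


def forward(query: str) -> bytes:
    """POST the rewritten query to the hidden endpoint and return the raw results."""
    body = urllib.parse.urlencode({"query": query}).encode("ascii")
    request = urllib.request.Request(
        HIDDEN_ENDPOINT, data=body,
        headers={"Accept": "application/sparql-results+json"})
    with urllib.request.urlopen(request) as response:
        return response.read()


def proxy(auth_header: str, query: str) -> bytes:
    """Authenticate, restrict, then forward: the whole proxy in one call."""
    return forward(restrict_query(query, authenticate(auth_header)))
```

With this sketch, the same `SELECT` sent by Alice and by Bob is rewritten with different FROM clauses, which is exactly why identical queries return different results per user. Wiring `proxy` to an HTTP server is left out here.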
While this mechanism is relatively simple, it offers an appropriate level of flexibility, allowing arbitrary sub-graphs and user definitions to be used as a model for access control. It is actually nice to see how, based on basic authentication mechanisms, the same queries asking for activity data return different results depending on which user is connected.
What users, anyway?
Of course, the mechanisms and techniques to manage, identify and process information about users do not answer the question of who these users are and what benefits they can get from the system. As argued before, it is in fact hard to predict what use will be made of giving users back their own activity data. General arguments can be made about the advantages of self-tracking, but in reality, what really matters is that whatever the system provides stays open to any use. Working with the development version of the UCIAD platform, we find it quite fascinating that we, as individual users, can trace back our activities, drill down into specific categories (e.g., search, commenting on blogs, checking the price of a course), send queries that might only be relevant to us (e.g., “how much did I use data.open.ac.uk on Sundays?”), etc. It helps us understand our own use of the resources provided by the University, and so to use them more efficiently.