Collecting and Processing Personal Activity Data

June 5th, 2012

I seem to be saying that all the time, but we are currently in one of the busiest periods of the project: processing data. An enormous and very tiring part of the beginning was spent on collecting data, which explains our relative silence lately. We discuss here some of the lessons we can already draw from what we have done.

What Data?

As explained earlier, the goal of the project is to investigate the idea of user-centric activity data, what users would do with it and what it would imply in terms of organisational policies. We will do that by collecting information about the use of the various websites of the Open University by a dozen users over a period of 4 weeks.

We selected users so that they represent a wide variety of roles in the organisation: students (mostly post-graduate), associate lecturers, researchers, admin and support staff. With the help of the IT services of the OU, we then created a script that extracted the log entries for these users (based on their identifiers) for all the websites concerned (intranet, virtual learning environment, general websites, etc.).
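To give an idea of what such an extraction involves, here is a minimal sketch in Python. It is not the script we ran with IT services: it assumes, purely for illustration, that each log line follows the Common Log Format (host, ident, authenticated user, then the timestamp), and the function name and identifiers are hypothetical.

```python
import re
from typing import Iterable, Iterator, Set, Tuple

# Assumption: Common Log Format prefix "host ident authuser [timestamp] ...".
# The actual layout of the OU logs is not described in this post.
CLF_PREFIX = re.compile(r'^(\S+) (\S+) (\S+) \[')


def extract_user_entries(lines: Iterable[str],
                         user_ids: Set[str]) -> Iterator[Tuple[str, str]]:
    """Yield (user_id, line) for each line whose authuser field is in user_ids."""
    for line in lines:
        match = CLF_PREFIX.match(line)
        # Lines that do not parse, or that belong to other users, are skipped.
        if match and match.group(3) in user_ids:
            yield match.group(3), line.rstrip("\n")
```

The same function can be run over the log of each website in turn, so that only the entries of the participants who volunteered ever leave the server logs.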

The organisational process of collecting “personal” data

The most difficult part of collecting data such as ours is clearly not technical. We naturally had to ensure that the users selected enrolled voluntarily and understood what we were going to do with the data, and what was expected from them.

Prior to that, however, we had to spend a lot of time obtaining approval from various parts of the Open University: the ethics committee, the student research project committee, IT security, the data protection coordinator, etc.

In short, we ran into a cycle, where we were directed from one committee to the other, until the situation finally resolved. The first lesson learned here is not to underestimate the inefficiency of organisational structures, and to plan for this very early and with the most pessimistic view of the time it will take. It is a part of the project that very much illustrates Hofstadter’s Law: “It always takes longer than you expect, even when you take into account Hofstadter’s Law.”

This is tricky, however, as it turns out that the best time to do these things, according to the official processes, is at proposal time. It is obvious, however, that if we were to do that, we would never submit any proposal… Generally, the whole thing is rather frustrating and gives the impression that the whole purpose of these committees is to prevent things from happening.

Another important aspect of getting approval from such committees is that they don’t understand anything about what we do! This escaped us a little at first, but it is obvious. These are groups of people looking at research across the whole of the university, with no technical background in what we do, and our topic is clearly confusing: it is research carried out directly on personal data, not just research with implications for personal data. The need to be very “pedagogical”, and to explain as clearly as possible that there is no way any evil can come out of our research, is clearly a challenge.

Visualising personal activity data

In order to explore the use of user-centric activity data, we are investigating an interface for personal web analytics, which is similar to web analytics tools such as Google Analytics or Piwik, but where the relationship between the organisation, the user and the data is inverted.

In the first phase of the project, we developed an initial version of such an interface, but it was not really intuitive or effective enough. We are now developing a new version which uses a more effective query engine and a lot of pre-processing of the data, so that it can actually be used by a variety of users.

UCIAD II interface mock up

What the data already told us

At the moment, we are processing the data. What we can already tell, however, is that the role a person takes clearly affects the way they use university websites. The size of the logs collected for the different users alone tells us that: students don’t really do much, and only on very specific websites (virtual learning environment); researchers and lecturers do a bit more, on other specific websites (tutor system, expense claim system); while admin staff generate a lot more logs on a wider variety of systems. The remaining question is how these roles will also affect the way people would use their own activity data, and how the multiple roles people might have, their different personae, interact and should be supported.
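The kind of comparison described above boils down to counting, for each role, how many log entries were collected and how many distinct websites they touch. A rough sketch, assuming each extracted entry has already been reduced to a (user identifier, role, website) triple; that triple layout and the function name are assumptions made for the example, not our actual data model.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def summarise_by_role(
    entries: Iterable[Tuple[str, str, str]]
) -> Dict[str, Tuple[int, int]]:
    """Map each role to (number of log entries, number of distinct websites)."""
    counts = defaultdict(int)   # role -> number of entries seen
    sites = defaultdict(set)    # role -> set of websites seen
    for _user_id, role, site in entries:
        counts[role] += 1
        sites[role].add(site)
    return {role: (counts[role], len(sites[role])) for role in counts}


# Made-up placeholder rows, only to show the call; not project data.
rows = [
    ("user_1", "role_a", "site_x"),
    ("user_1", "role_a", "site_x"),
    ("user_2", "role_b", "site_x"),
    ("user_2", "role_b", "site_y"),
    ("user_2", "role_b", "site_z"),
]
summary = summarise_by_role(rows)
```

Grouping by role rather than by user is a choice made for the sketch; with only a dozen participants, a per-user breakdown is just as easy to compute by keying on the first field instead.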

The data will tell us more when we put it in front of the users within the next couple of weeks…

Why I don’t believe in the personal data economy

May 1st, 2012

I care a lot about personal data (especially mine), and yep, my and everybody else’s personal data are all over the place. They are all over the place because they are valuable. They are all over the place because I (and you) don’t have much control over the way they are being collected, exchanged and exploited.

Now, what I conclude from that is that we need more control (and, of course, I would not pretend that I’m the only one thinking that, but that’s not the point). Apparently, what other people conclude from that is that you should be selling your personal data. The idea is that of a “personal data economy” where, instead of giving it away, you can decide on a price for your personal data. This is frequently promoted as the latest, new, brilliant idea on the Web, most recently by HP Labs: “A Stock Exchange for Your Personal Data”.

Why didn’t we think about that before?

Well, quite simply because we did. Numerous times. And it did not work.

Have you heard of things like the Attention Recorder, from the Attention Trust, that was going to make the attention economy the big thing of… hurmm… 2007? Well, me neither.

I do not believe in this thing quite simply because I don’t understand how it could work. It sounds really naive, mostly for three reasons:

  1. The fact that my personal data are valuable does not mean I want to sell them. Apparently, my organs are valuable. There is a market with people who would be more than happy to harvest them and get some benefits from that. To me, they are just essential. They don’t have value from an economic perspective. They have value because I can’t exist and function without them. In other words, having market value does not make something a commercialisable personal asset. You would have to be seriously naive to think that it does, and you would need very strong evidence to convince me that this is the way things work.
  2. Putting a price on it does not prevent anyone from getting it for free. That’s one part I have never really understood. Personal data is being collected from us, and this is actually necessary for quite a few things to function. If this is the case, why exactly would anybody want to pay for it? You can argue that this is exactly the point: they should not have access to it for free. But how is that going to change? This is not a trick question: I truly don’t understand this!
  3. Wouldn’t it actually kill the personal data economy? At the risk of appearing to contradict myself, I would argue that this personal data economy exists already, but not in the form envisaged in this “personal data stock-market” trend. We register for online services and somehow allow them to collect data from us, in exchange for what they have to offer. Personal data are not the product, they are the currency! We buy stuff on the Web with our personal data, and actually most of us are reasonably happy about most of what we get out of the deal. We could stretch this metaphor further, and talk about “personal data banks” (some do) and “personal data currency exchange”. But the real point here is that changing the current “market” to a model where personal data is a product being sold would change the balance of this economy: personal data would stop having the quality of a currency, i.e. having the same value for everybody. Now, here is the trick: if personal data stop having the same value for everybody, they start being biased; and if they start being biased, they stop having value to the people who exploit them.

Now of course, I have probably misunderstood something somehow, and maybe all these points have been addressed (in which case, when can I start making money by telling you how much I weigh? That reminds me I should stop telling things to my friends, just in case). Until that is confirmed though, I would rather work on being able to understand and control the way I spend my personal data online.