Archive for the ‘Anonymisation’ Category

Anonymisation is great, but don’t undervalue pseudonymisation

Posted on April 26th, 2014



Earlier this week, the Article 29 Working Party published its Opinion 05/2014 on Anonymisation Techniques.  The opinion describes (in quite some technical detail) the different anonymisation techniques available to data controllers and their relative merits, and makes some good practice suggestions – noting that “Once a dataset is truly anonymised and individuals are no longer identifiable, European data protection law no longer applies”.

This is a very significant point – data, once truly anonymised, is no longer subject to European data protection law.  This means that EU rules governing how long that data can be kept for, whether it can be exported internationally and so on, do not apply.  The net effect of this should be to incentivise controllers to anonymise their datasets, shouldn’t it?

Well, not quite.  Because the truth is that many controllers don’t anonymise their data, but use pseudonymisation techniques instead.  

Difference between anonymisation and pseudonymisation

Anonymisation means transforming personal information into data that “can no longer be used to identify a natural person … [taking into account] ‘all the means likely reasonably to be used’ by either the controller or a third party.  An important factor is that the processing must be irreversible.”  Using anonymisation, the resulting data should not be capable of singling out any specific individual, of being linked to other data about an individual, or of being used to deduce an individual’s identity.

Conversely, pseudonymisation means “replacing one attribute (typically a unique attribute) in a record by another.  The natural person is therefore still likely to be identified indirectly.”  In simple terms, pseudonymisation means replacing ‘obviously’ personal details with another unique identifier, typically generated through some kind of hashing, encryption or tokenisation function.  For example, “Phil Lee bought item x” could be pseudonymised to “Visitor 15364 bought item x”.
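As a toy illustration (not taken from the Opinion), the “Visitor 15364” example could be produced with a keyed hash held only by the controller.  The key, function names and token format below are invented for the sketch:

```python
import hashlib
import hmac

# Hypothetical secret held server-side by the controller.  Anyone holding
# this key can recompute the mapping, which is why the result is
# pseudonymised, not anonymised.
SECRET_KEY = b"controller-side-secret"

def pseudonymise(name: str) -> str:
    """Replace a directly identifying attribute with a stable token.

    The same input always yields the same token, so records about the
    same person can still be linked together at an individual level.
    (Truncating the digest to five digits, as here, is purely for
    readability and raises the collision risk in a real system.)
    """
    digest = hmac.new(SECRET_KEY, name.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"Visitor {int(digest[:8], 16) % 100000}"

record = f"{pseudonymise('Phil Lee')} bought item x"
```

Because the function is deterministic, every purchase by the same person carries the same token – the dataset keeps its record-level structure while the directly identifying attribute is removed.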

The Working Party is at pains to explain that pseudonymisation is not the same thing as anonymisation: “Data controllers often assume that removing or replacing one or more attributes is enough to make the dataset anonymous.  Many examples have shown that this is not the case…” and “pseudonymisation when used alone will not result in an anonymous dataset”.

The value of pseudonymisation

The Working Party lists various “common mistakes” and “shortcomings” of pseudonymisation but curiously, given its prevalence, fails to acknowledge the very important benefits it can deliver, including in terms of:

  • Individuals’ expectations: The average individual sees a very big distinction between data that is directly linked to them (i.e. associated with their name and contact details) and data that is pseudonymised, even if not fully anonymised.  In the context of online targeted advertising, for example, website visitors are very concerned about their web browsing profiles being collected and associated directly with their name and address, but less so with a randomised cookie token that allows them to be recognised, but not directly identified.
  • Data value extraction:  For many businesses, anonymisation is just not an option.  The data they collect typically has a value whose commercialisation, at an individual record level, is fundamental to their business model.  So what they need instead is a solution that enables them to extract value at a record level but also that respects individuals’ privacy by not storing directly identifying details, and pseudonymisation enables this.
  • Reversibility:  In some contexts, reversibility of pseudonymised data can be very important.  For example, in the context of clinical drug trials, it’s important that patients’ pseudonymised trial data can be reversed if needed, say, to contact those patients to alert them to an adverse drug event.  Fully anonymised data in this context would be dangerous and irresponsible.
  • Security:  Finally, pseudonymisation improves the security of data held by controllers.  Should that data be compromised in a data breach scenario, the likelihood that underlying individuals’ identities will be exposed and that they will suffer privacy harm as a result is considerably less.
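The reversibility point can be sketched with a tokenisation vault – a minimal, hypothetical example (class and identifier names are invented).  Unlike a one-way hash, the mapping table is retained under access control, so an authorised party can go back from token to identity:

```python
import secrets

class TokenVault:
    """Hypothetical tokenisation store mapping identities to random tokens.

    The vault keeps the mapping table (which would sit behind strict
    access controls), so pseudonymisation is reversible -- e.g. a trial
    sponsor could contact a patient about an adverse drug event.
    Records held outside the vault carry only the token.
    """

    def __init__(self) -> None:
        self._to_token: dict[str, str] = {}
        self._to_identity: dict[str, str] = {}

    def tokenise(self, identity: str) -> str:
        # Issue a fresh random token on first sight; reuse it thereafter
        # so records about the same person remain linkable.
        if identity not in self._to_token:
            token = f"Subject-{secrets.token_hex(4)}"
            self._to_token[identity] = token
            self._to_identity[token] = identity
        return self._to_token[identity]

    def reverse(self, token: str) -> str:
        # Only callers with access to the vault can do this.
        return self._to_identity[token]

vault = TokenVault()
token = vault.tokenise("Patient 0042")
```

The design choice is the trade-off described above: the random token carries no information about the person (better security if the pseudonymised records leak), while the controlled mapping table preserves the ability to re-contact individuals when genuinely necessary.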

It would be easy to read the Working Party’s Opinion and conclude that pseudonymisation ultimately serves little purpose, but this would be a foolhardy conclusion to draw.  Controllers for whom anonymisation is not possible should never be disincentivised from implementing pseudonymisation as an alternative – not doing so would be to the detriment of their security and to their data subjects’ privacy.

Instead, pseudonymisation should always be encouraged as a minimum measure intended to facilitate data use in a privacy-respectful way.  As such, it should be an essential part of every controller’s privacy toolkit!

The anonymisation challenge

Posted on November 29th, 2012



For a while now, it has been suggested that one of the ways of tackling the risks to personal information, beyond protecting it, is to anonymise it – that is, to stop such information being personal data altogether.  The effect of anonymising personal data is quite radical: take personal data, perform some magic on it, and that information is no longer personal data.  As a result, it becomes free from any protective constraints.  Simple.  People’s privacy is no longer threatened and users of that data can run wild with it.  Everybody wins.  However, as we happen to be living in the ‘big data society’, the problem is that, given the amount of information we generate as individuals, what used to be pure statistical data is becoming so granular that the real value of that information is typically linked to each of the individuals from whom it originates.  Is true anonymisation actually possible, then?

The UK Information Commissioner believes that given the potential benefits of anonymisation, it is at least worthwhile having a go at it.  With that in mind, the ICO has produced a chunky code of practice aimed at showing how to manage privacy risks through anonymisation.  According to the code itself, this is the first attempt ever made by a data protection regulator to explain how to rely on anonymisation techniques to protect people’s privacy, which is quite telling about the regulators’ faith in anonymisation given that the concept is already mentioned in the 1995 European data protection directive.  Nevertheless, the ICO is relentless in its defence of anonymisation as a tool that can help society meet its information needs in a privacy-friendly way.

The ICO believes that the legal test of whether information qualifies as personal data or not allows anonymisation to be a realistic proposition.  The reason is that EU data protection law only kicks in when someone is identifiable, taking into account all the means ‘likely reasonably’ to be used to identify the individual.  In other words, and as the code puts it, the law is not framed in terms of the mere possibility of an individual being identified.  The definition of personal data is based on the likely identification of an individual.  Therefore, the ICO argues that although it may not be possible to determine with absolute certainty that no individual will ever be identified as a result of the disclosure of anonymous data, that does not mean that personal data has been disclosed.

One of the advantages of anonymisation is that technology itself can help make it even more effective.  As with other privacy-friendly manifestations of technology – such as encryption and anti-malware software – the practice of anonymising data is likely to evolve at the same speed as the chances of identification.  This is so because technological evolution is in itself neutral and anonymisation techniques can and should evolve as the uses of data become more sophisticated.  What is clear is that whilst some anonymisation techniques are weak because reintroducing personal identifiers is as easy as stripping them out, technology can also help bulletproof anonymised data.

What makes anonymisation less viable, though, is the fact that in reality there will always be a risk of identification of the individuals to whom the data relates.  So the question is how remote that risk must be for anonymisation to work.  The answer is that it depends on the level of identification that turns non-personal data into personal data.  If personal data and personally identifiable information were the same thing, it would be much easier to establish whether a given anonymisation process has been effective.  But they are not, because personal data goes beyond being able to ‘name’ an individual.  Personal data is about being able to single out an individual, so the concept of identification can cover many situations, which makes anonymisation genuinely challenging.
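The singling-out problem can be made concrete with a toy linkage attack – all records below are invented for illustration.  A dataset has had names stripped out, yet the remaining quasi-identifiers (postcode area, age, sex) can be joined against a public register that carries the same attributes alongside names:

```python
# Toy "anonymised" release: the name attribute has been removed,
# but quasi-identifiers remain.
released = [
    {"zip": "SW1A", "age": 41, "sex": "F", "diagnosis": "asthma"},
    {"zip": "SW1A", "age": 63, "sex": "M", "diagnosis": "diabetes"},
    {"zip": "EC2R", "age": 41, "sex": "F", "diagnosis": "migraine"},
]

# Hypothetical public auxiliary data (e.g. an electoral roll) carrying
# the same quasi-identifiers next to names.
register = [
    {"name": "A. Smith", "zip": "SW1A", "age": 41, "sex": "F"},
    {"name": "B. Jones", "zip": "SW1A", "age": 63, "sex": "M"},
]

def reidentify(released, register):
    """Join the two datasets on quasi-identifiers.

    Wherever the combination of attributes matches exactly one person
    in the register, that individual has been singled out and their
    'anonymised' record re-attached to a name.
    """
    hits = []
    for record in released:
        key = (record["zip"], record["age"], record["sex"])
        matches = [p for p in register
                   if (p["zip"], p["age"], p["sex"]) == key]
        if len(matches) == 1:
            hits.append((matches[0]["name"], record["diagnosis"]))
    return hits
```

Two of the three released records are unique on their quasi-identifiers and so are re-identified; only the record with no register counterpart stays unmatched.  This is why stripping names alone does not meet the legal test described above.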

The ICO is optimistic about the benefits and the prospect of anonymisation.  In certain cases – mostly in the context of public sector data uses – it will clearly be possible to derive value from truly anonymised data.  In many other cases however, it is difficult to see how anonymisation in isolation will achieve its end, as data granularity will prevail in order to maximise the value of the information.  In those situations, the gap left by imperfect anonymisation will need to be filled in by a good and fair level of data protection and, in some other cases, by the principle of ‘privacy by default’.  But that’s a different kind of challenge.

This article was first published in Data Protection Law & Policy in November 2012.