Another year, another buzzword, and this time around it’s “Big Data” that’s getting everyone’s attention. But what exactly is Big Data, and why is everyone – commercial organisations, regulators and lawyers – so excited about it?
Put simply, the term Big Data refers to datasets that are very, very large – so large that, traditionally, supercomputers would have been required to process them. But, with the irrepressible evolution of technology, falling computing costs, and scalable, distributed data processing models (think cloud computing), Big Data processing is increasingly within the capability of most commercial and research organisations.
In its oft-quoted article “The Data Deluge”, The Economist reports that “Everywhere you look, the quantity of information in the world is soaring. According to one estimate, mankind created 150 exabytes (billion gigabytes) of data in 2005. [In 2010], it will create 1,200 exabytes.” Let’s put that in perspective – 1,200 exabytes is 1,200,000,000,000 gigabytes of data. A typical Blu-ray disc can hold 25 gigabytes – so 1,200 exabytes is the equivalent of about 48 billion Blu-ray discs. Estimating your typical Blu-ray movie at about 2 hours long (excluding special features and the like), there’s at least 96 billion hours of viewing time there, or about 146,000 human lifetimes. OK, this is a slightly fatuous example, but you get my point – and bear in mind that global data is growing year-on-year at an exponential rate, so these figures are already well out of date.
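For anyone who wants to check my sums, here is a minimal sketch of the Blu-ray arithmetic in Python. It uses decimal units throughout, and the variable names are purely illustrative. One figure is my own assumption rather than something stated above: the 75-year lifetime, which is roughly what the 146,000 figure implies.

```python
# Back-of-envelope check of the Blu-ray comparison (decimal units throughout).
GB_PER_EXABYTE = 10**9       # 1 exabyte = one billion gigabytes
DATA_EXABYTES = 1_200        # the 2010 estimate quoted by The Economist
DISC_CAPACITY_GB = 25        # typical Blu-ray disc, as in the text
HOURS_PER_DISC = 2           # one film of about two hours per disc
LIFETIME_YEARS = 75          # ASSUMPTION: not stated in the article
HOURS_PER_YEAR = 24 * 365.25

total_gb = DATA_EXABYTES * GB_PER_EXABYTE         # total data in gigabytes
discs = total_gb // DISC_CAPACITY_GB              # number of discs needed
viewing_hours = discs * HOURS_PER_DISC            # total viewing time in hours
lifetimes = viewing_hours / (LIFETIME_YEARS * HOURS_PER_YEAR)

print(f"{discs:,} discs")
print(f"{viewing_hours:,} hours of viewing")
print(f"roughly {lifetimes:,.0f} lifetimes of {LIFETIME_YEARS} years")
```

Change the assumed lifetime and the last figure moves accordingly; the disc and hour counts follow directly from the numbers quoted above.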
Much of this Big Data will be highly personal to us: think about the value of the data we all put “out there” when we shop online or post status updates, photos and other content through our various social networking accounts (I have at least five). And don’t forget the search terms we enter when we use our favourite search engines, or the data we generate when using mobile – particularly location-enabled – services. Imagine how organisations, if they had access to all this information, could use it to better advertise their products and services, roadmap product development to take account of shifting consumer patterns, spot and respond to potentially brand-damaging viral complaints – ultimately, keep their customers happier and improve their revenues.
The potential benefits of Big Data are vast and, as yet, still largely unrealised. It goes against the grain of any privacy professional to admit that there are societal advantages to data maximisation, but it would be disingenuous to deny this. Peter Fleischer, Google’s Privacy Counsel, expressed it very eloquently on his blog when he wrote “I’m sure that more and more data will be shared and published, sometimes openly to the Web, and sometimes privately to a community of friends or family. But the trend is clear. Most of the sharing will be utterly boring: nope, I don’t care what you had for breakfast today. But what is boring individually can be fascinating in crowd-sourcing terms, as big data analysis discovers ever more insights into human nature, health, and economics from mountains of seemingly banal data bits. We already know that some data sets hold vast information, but we’ve barely begun to know how to read them yet, like genomes. Data holds massive knowledge and value, even, perhaps especially, when we do not yet know how to read it. Maybe it’s a mistake to try to minimize data generation and retention. Maybe the privacy community’s shibboleth of data deletion is a crime against science, in ways that we don’t even understand yet.” (You can access Peter’s blog “Privacy…?” here.)
This quote raises the interesting question of whether the compilation and analysis of Big Data sets should really be considered personal data processing. Of course, many of the individual records within commercial Big Data sets will be personal – but the true value of Big Data processing is often (though not always) in the aggregate trends and patterns it reveals – less about predicting any one individual’s behaviours, reactions and preferences, and more about understanding the global picture. Perhaps it’s time that we stop thinking of privacy in terms of merely collecting data, and look more to the intrusiveness (or otherwise) of the purposes to which our data are put?
This is perhaps something for a wider, philosophical debate about the pros and cons of Big Data, and I wouldn’t claim to have the answers. What I can say, though, is that Big Data faces some big issues under data protection law as it stands today, not least in terms of data protection principles that mandate user notice and choice, purpose limitation, data minimisation, data retention and – of course – data exports. These are not issues that will go away under the new General Data Protection Regulation which, as if to gear itself up for a fight with Big Data proponents, further bolsters transparency, consent and data minimisation principles, while also proposing a new, highly controversial ‘right to be forgotten’.
So what can and should Big Data collectors do for now? Fundamentally, accountability for the data you collect and process will be key. Your data subjects need to understand how their data will be used, both at the individual and the Big Data level, to feel in control of this and to be comforted that their data won’t be used in ways that sit outside their reasonable expectations of privacy. This is not just a matter of external-facing privacy policies, but also a matter of carefully constructed internal policies that impose sensible checks and balances on the organisation’s use of data. It’s also about adopting Privacy Impact Assessments as a matter of organisational culture to identify and address risks whenever using Big Data analysis for new or exciting reasons.
Big Data is, and should be, the future of data processing, and our laws should not prevent this. But, equally, organisations need to be careful that they do not see the Big Data age as a free-for-all hunting season on user data that invades personal privacy and control. Big issues for Big Data indeed.