Genetic information and the future of privacy protection

Remarks at the P3G Privacy Summit: Data Sharing and Cloud Computing, 5th Paris Workshop on Genomics Epidemiology

May 3, 2013
Paris, France

Address by Jennifer Stoddart
Privacy Commissioner of Canada

(Check against delivery)


Thank you for inviting me to address you.

In my time today, I want to discuss the evolving privacy landscape, marked by changes in technology that make vast amounts of personal information more readily available than ever imagined.

Vast amounts of personal information are being amassed and used not only for the commercial benefit of marketers, but also for national security or law enforcement purposes by the state.   

We all recognize that the increasing capacity to both capture and analyse data can also be put to the laudable goal of improving the global health of populations for the greater benefit of society – as your important research clearly shows.        

What I hope to do in the next few minutes is situate the research endeavour within the larger societal picture and discuss some of the regulatory challenges posed in this burgeoning age of Big Data. I will also discuss some of the work of my Office in the area of health privacy.

Big Data: unprecedented accumulation, analysis, use and privacy concerns

By now you are no doubt accustomed to hearing people herald the age of Big Data.

That’s the shorthand for what many agree is a technological revolution transforming many aspects of our societies and forecast to change how we all go about our daily lives.

Amidst the hype, though, I would have us all reflect on the words of the Director of Statistics for the OECD, who recently wrote:

“Big data does not automatically mean bigger and better information. An increasingly important function of national statistics offices in the future will be to help users separate high quality statistical information from low quality data coming from all kinds of new sources.” 

To demonstrate how much “new” data there is to work with and sort through, consider that in the year 2000 roughly a quarter of the stored information in the world was digital.

Today, more than 98 per cent is digital. This no doubt reflects the veritable explosion in data over the past decade or so collected through e-mail, instant messages, voice mail, Web surfing, social media postings, surveillance videos and so on.

This unprecedented increase in digital data gave birth to a qualitative change in how the information is being used.

Traditional analysis started with a question and collected data to look for the answers.  

Now “advanced analytics” looks at the data to determine what questions it can answer, often including aspects that would not have occurred to anyone to even ask about.

Traditionally, scientists had a theory about human nature and tested it. One commentator noted that the theory of Big Data is to have no theory.

You simply gather huge amounts of information, observe the patterns and estimate probabilities about how people will act in the future. 

A new breed of scientist sifts through reams and reams of data, crunches it using analytical algorithms, makes observations and connects various dots, and only then suggests what it all means and how it can be used.

In other words, the scientific method inherited from the Enlightenment takes a backseat to this new, more impressionistic method.

This phenomenon, known as “predictive analytics”, is already happening and represents a very lucrative approach for big business. As recounted by New York Times writer Charles Duhigg, the retail giant Target developed a “pregnancy-prediction algorithm” to apply to its mammoth database of what women shoppers bought at Target stores in the United States.

By analyzing purchases of a carefully selected two dozen items, Target’s data scientists could not only assign a “pregnancy prediction” score to each female shopper, but also estimate her due date within a very small window.

This allowed Target to send these women coupons timed to specific stages of their pregnancies.
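Purely to make the mechanics concrete, here is a minimal sketch, in Python, of how such a purchase-based score might be computed. The item names, weights and threshold are entirely hypothetical; Target’s actual model and feature list have never been published.

```python
# Illustrative sketch of a purchase-based prediction score.
# All items, weights and the threshold are hypothetical.
WEIGHTS = {
    "unscented_lotion": 0.9,    # hypothetical weights learned from
    "calcium_supplement": 0.7,  # historical purchase histories
    "cotton_balls": 0.5,
    "large_tote_bag": 0.4,
    "wine": -0.8,               # some purchases push the score down
}

THRESHOLD = 1.5  # hypothetical cut-off for sending targeted coupons

def prediction_score(basket: set[str]) -> float:
    """Sum the weights of the items a shopper actually bought."""
    return sum(weight for item, weight in WEIGHTS.items() if item in basket)

shopper_basket = {"unscented_lotion", "calcium_supplement", "cotton_balls"}
score = prediction_score(shopper_basket)
if score > THRESHOLD:
    print(f"score {score:.1f}: shopper flagged for targeted coupons")
```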

The company deemed it a marketing triumph. But the results hit home (and I mean literally) when the irate father of a teenage girl found out she was pregnant, albeit in a roundabout way, based on coupons for maternity items, baby clothes and nursery furniture mailed to her at the family home.

To health researchers like yourselves, the ethical implications of such marketing behaviours are, of course, glaring.

Challenges to meaningful consent and anonymization, and prevalence of data breaches

You may think this example is far removed from your own area of research, but it does form part of the larger context which is motivating recent proposals to reform data protection legislation in Europe and elsewhere.    

Indeed, the rise of predictive analytics has sparked soul-searching among privacy advocates and regulators. That’s because the protection of personal information has long rested on three fundamental principles:

  1. A basic understanding by people of how their personal information will be used in order to provide informed consent;
  2. The use of that information only for the declared purpose for which it was initially collected and consented to, or purposes consistent with that use; and
  3. The minimization of personal information collected to what is directly relevant and necessary to accomplish the declared purpose and the discarding of the data once the original purpose has been served.

Having enshrined these principles in national privacy laws and in international understandings, many defenders of privacy have been understandably dismayed to realize that predictive analytics and Big Data are turning this framework on its head.

Big Data hasn’t simply increased the risk to privacy; it has changed the nature of that risk.

To be frank, the idea of informed consent in the online age has always been an ideal, not a reality.

I say this because surveys show that most people don’t bother reading privacy policies on websites before surrendering personal information such as name, age and gender, and then proceeding to voluntarily reveal much more about themselves in their postings.

And even when people do struggle through the dense legalese of most policies, very few can grasp all the implications.

My Office has worked diligently with organizations which we have investigated, audited, or otherwise examined to make these policies clearer and simpler. 

Yet our efforts have not significantly curbed the practice of writing unintelligible privacy policies.

With the era of Big Data, however, the very concept of informed consent is challenged even further.

Going back to the Target story, I doubt very much that the young woman was warned in the retailer’s privacy policy that the data collected on her shopping habits would be used to predict her pregnancy.

More generally, how can individuals give their consent to a future action that may be unknown at the time they are deciding to give or withhold consent?

Similarly, how can the data collector give fair notice of a purpose that didn’t exist when the personal information was collected?

And the very thought of minimizing collection and discarding data is anathema to the whole ethos of Big Data.

These are very significant questions and many data protection experts are justifiably alarmed that such uses of personal information may already be widespread, even though they contravene basic principles such as the Fair Information Principles of the OECD Guidelines of 1980.

As well, another strategy used to ensure privacy – the anonymization of data – is being challenged by advances in technology.

A prominent health researcher at the University of Ottawa and the Children’s Hospital of Eastern Ontario demonstrated this in 2011.

Dr Khaled El Emam’s work showed that date of birth, postal code and gender alone were sufficient to identify people in a health information database, even though their names and all other personal information had been removed.

Even without gender, 97 per cent of the individuals could be identified.
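To illustrate the mechanics behind this kind of finding, here is a minimal sketch, in Python, of measuring how many records in a “de-identified” table are unique on those three quasi-identifiers. The records are fabricated for illustration; this is not Dr El Emam’s actual data or method.

```python
# Sketch of quasi-identifier uniqueness: the core of re-identification
# risk. Records are (date of birth, postal code, gender); all fabricated.
from collections import Counter

records = [
    ("1975-03-14", "K1A 0B1", "F"),
    ("1975-03-14", "K1A 0B1", "M"),
    ("1982-11-02", "H3A 2T5", "F"),
    ("1990-06-21", "M5V 1J1", "M"),
]

counts = Counter(records)  # how many records share each combination
unique = sum(1 for record in records if counts[record] == 1)

print(f"{unique / len(records):.0%} of records are unique on "
      "(date of birth, postal code, gender)")
# Each unique combination can be matched against an identified public
# source, such as a voter registry, to put a name back on the record.
```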

It should be noted that Dr El Emam is himself an advocate of anonymization and has devoted further work to showing how it can remain effective.

But his work to which I referred shows just how challenging an objective this is becoming as computing power escalates exponentially.

Take, for example, some work reported on earlier this year at the Whitehead Institute at the Massachusetts Institute of Technology. There, researchers were able to re-identify nearly 50 individuals who had submitted personal genetic material as participants in genomic studies.

Adding to the list of data protection issues that concern both regulators and citizens is the growing magnitude of data breaches.

In the digital age, it’s not only far easier to accumulate data, but also to lose it. What used to be storable only in a dedicated filing room can now be kept, and therefore lost, on storage devices that fit in your pocket.

This new vulnerability to data loss, in which the threat posed by a malicious hacker can be rivalled, or even overshadowed, by a well-meaning researcher or graduate student who wants to take work home, is something for all organizations, in any field, to guard against.

The drive to strengthen privacy protection

All of this has prompted serious reappraisal. Some of the proposals being debated by the Committee on Civil Liberties, Justice and Home Affairs of the European Parliament are a manifestation of that paradigm shift. 

For example, in light of concerns about its continued viability in the face of technological change, they seek to restrict the use of pseudonymous data. And in order to strengthen individual confidence and control over how personal information will be used, the committee has proposed a tightening of consent provisions.

Even in the United States, where a narrower interpretation of what constitutes personal information has up until now been the norm, there is more recent movement to broaden it. 

And in Canada, I myself have raised the need for our federal data protection laws to be modernized to provide greater incentives for protection through stronger enforcement of both privacy and security.

All this said, I can imagine that you, as health researchers, may be wondering why your work would need to be treated in the same way as that of data-mining marketers.

After all, many of you are not commercially driven, but rather, motivated by the noble vocation to improve the health and well-being of populations around the globe.  And as well, unlike marketers, your work is subject to review by ethics boards and committees.

It’s clear that such efforts are not, and should not, be stifled by borders.  Diseases, especially rare ones, require global collaboration – and that requires the ability to work, and share information, across borders. 

All told, I obviously see privacy as an important value, but not as an absolute.  It’s a value that cannot be unduly diminished without costing personal freedom; so too, it’s one that should not unnecessarily hinder innovation that benefits health and humanity.

Individual consent for health research is an ideal, but it may be increasingly impracticable.  

Where there is no harm to the individual data subject, ethical treatment, proper governance, confidentiality and security of data may be acceptable alternatives for ensuring human dignity.

Work to harmonize privacy protections

Reconciling different points of view on data protection can be challenging.  Policymakers in the European Union, the United States and Asia appear to have different perspectives on privacy.

And of course, even within Europe, the debate around the reform of the EU data protection framework has revealed significant differences of opinion. 

As Canada’s Privacy Commissioner, I have been privileged to partake in these global discussions and hear first-hand the different perspectives of various jurisdictions.  I sometimes find myself in the stereotypical Canadian role of acting as somewhat of a bridge between them. 

Over the last three years I have had the honour of chairing a volunteer group of experts that was established to advise the Organisation for Economic Cooperation and Development on revising the Guidelines on the Protection of Privacy and Transborder Flows of Personal Data, adopted by the OECD in 1980.

Reaching consensus has not been easy.  I was in Paris at an OECD meeting a few weeks ago where proposed revisions to the Guidelines were discussed.  I should add that there are no plans to revise the eight foundational principles; rather, we were discussing supplementary text that elaborates on the Guidelines. 

Perhaps not surprisingly, differences of opinion emerged around the issue of transborder flows of data.

I think it is safe to say that everyone recognizes the importance and inevitability of international information flows. The challenge is how to ensure that this data is protected when it moves across borders. 

The data protection community recognizes that, at least in the short term, harmonizing our national laws to an international standard is unrealistic.  Instead, we have to think about how we can make our laws work together to achieve common objectives.

Interoperability is the new buzzword and we are seeing some promising developments.  For example, earlier this year, representatives of the EU’s Article 29 Working Party met for the first time with representatives from Asia-Pacific Economic Cooperation economies to develop a set of tools to facilitate transfers of personal data for multinational companies that operate both in Europe and the Asia-Pacific region.

In many ways, what I have strived to do on the regulatory front during my 10-year mandate to facilitate harmonization, or at least interoperability, of international data protection laws very much parallels the important work that Bartha Knoppers, P3G, and all of you as its members have been striving to do on the data access side to facilitate global data sharing.

So while we have been working at the issue from different perspectives, in the end, we are not so far apart.

Guarding against chilling research

There’s yet another issue on which we have common cause. 

As you know, one rapidly evolving area involves the potential collection and use of genetic information by insurance companies.

My Office has an interest in this question given our private sector legislation which limits the personal information organizations can collect to only that which is “necessary” for carrying out a legitimate business purpose. 

A legal and policy question we are examining is whether, in the current state of knowledge, life and health insurance companies need to access applicants’ genetic analyses for underwriting purposes.

To address this question, my Office commissioned two papers by academic experts: Angus Macdonald, an actuarial scientist from Heriot-Watt University, and Michael Hoy, an economist from the University of Guelph, Ontario.

Both experts conclude that, at present and in the short to mid-term future, a ban on the use of genetic information by life and health insurance providers (either by way of law, policy or moratorium) wouldn’t have a significant impact on the efficient operation of insurance markets.

I am currently taking this under advisement as I ponder what policy position to take on this important question. 

But what I can say is that, like you, I worry about the chilling effect insurance companies can have on potential participants in socially important health research who fear the prospect of having to hand over research results for risk assessment purposes.

Before I close, I want to briefly mention some of the research funded by my Office under our Contributions Program.  Over the years, we have invested more than $3 million in privacy research, $500,000 of which pertained to health privacy matters, and more than $270,000 of which was devoted to genetic privacy specifically.

I am proud to say that our program has attracted internationally renowned researchers like Bartha, who, together with her colleague Denise Avard at McGill University, carried out research on privacy and confidentiality in paediatric biobanks.

We’ve funded other scholars such as Trudo Lemmens and Lisa Austin to look at governance issues related to biobanks, and Nola Ries and Tim Caulfield to examine privacy policies and practices of the growing business of online direct-to-consumer genetic testing companies.

And finally, as the debates about the proposed European Regulation and elsewhere in civil society show, populations increasingly understand their own contributions to Big Data and to scientific research generally.

People increasingly expect not only direct beneficial results, but acknowledgment of their own contributions.  

To exemplify this, let me turn to some words from Deborah, the daughter of the cell donor, taken from Rebecca Skloot’s best-selling The Immortal Life of Henrietta Lacks:

“But I always have thought it was strange, if our mother cells done so much for medicine, how come her family can’t afford to see no doctors? Don’t make no sense. 

“People got rich off my mother without us even knowin about then takin her cells, now we don’t get a dime.”

Respecting privacy and protecting personal information is an ethical practice synonymous with trust. 

It’s timeless.  It cuts across borders, cultures and fields.  And as technological change marches on, its importance will grow in lock-step.
