A Record Challenge: Protecting Privacy in the Era of Big Data

Remarks at the ARMA International Annual Information Symposium

September 15, 2010
Toronto, Ontario

Chantal Bernier
Assistant Privacy Commissioner of Canada

(Check against delivery)


I. Introduction

It is a great pleasure to be here today, and to participate in a symposium on an issue of paramount importance to the Office of the Privacy Commissioner of Canada.

There are two defining pressures in the world of privacy now: public safety measures and the development of information technology, or “big data.” The latter will be the focus of my presentation today: what are the specific challenges to privacy in the era of big data?

As Canada’s federal data-protection authority for both the public and private sectors, we receive dozens of complaints every year from people concerned about the way their personal information has been collected, used or shared, whether by government or by a private company.

Hundreds more are vexed by difficulties and delays in gaining access to their personal information in the hands of institutions. As often as not, that can be traced back to a problem of poor recordkeeping.

And every year we learn of breaches of personal information – some minor, others hair-raising. This is not new. What is new is the information management challenge in the digital age.

It is a matter of adequate data security, and policies or procedures around the handling of personal records.

A core function of our Office is to ensure the proper collection, use, disclosure, storage, retention and disposal of personal information by government and private enterprises.

In the time available to me this morning, I propose to explore the scope of the privacy challenges we all face in this data-hungry world, both in the private sector and in the public sector.

I will also share with you how our Office has been using its authority to help strengthen safeguards for people’s personal information. But first, let us look at how the age of big data redefines the protection of privacy.

II. Age of Big Data

Software engineers and computer scientists inform us that we are living in the age of “Big Data,” deluged by data measured in petabytes, exabytes and zettabytes.

According to the Economist, for example, one million customer transactions per hour feed into Wal-Mart’s databases, which in turn are estimated to hold more than 2.5 petabytes of data.

How much is that? Take all the books catalogued in America’s Library of Congress, and multiply by 167.

Aside from the unthinkably vast amounts of data being generated, a major challenge lies in today’s awesome computing power. That makes it possible to collect, match, cross-reference, sift, mine and store data in limitless ways. It is thought, for example, that Google processes about one petabyte of data every hour.

The processing of data creates more information, and more records – and the cycle continues.

The information is not only voluminous. It is volatile and vulnerable.

In this Web 2.0 world, everything – a picture or a post, an e-mail or a tweet, all drafts of a reworked wiki entry, an audio or video file, a fingerprint or an iris scan, a person’s DNA – you name it: as long as it can be reduced to a set of ones and zeroes in some databank, it’s a record. It can be disseminated more broadly than ever, faster than ever and, if misdirected, deliberately or accidentally, cause greater harm than ever.

It is also interesting to note that according to the International Data Corporation, all the gigabytes that people create through their own actions – snapshots, blogs, e-mails, ATM transactions, downloading music and so on – amount to less than 10 percent of all the information that exists about them.

The other 90 percent coalesces behind them in a contrail of credit records, surveillance images, analytics on behaviour, web-use histories and so on. We are coming to a world where ambient technology picks up personal information as an “emanation,” so to speak, gathering information that may be trivial on its own but that, once aggregated, forms a disturbingly complete profile of an individual.

In all, IDC figures the “digital universe” is expanding by 50 percent per year, to an almost inconceivable 35 trillion gigabytes by the end of this decade.

III. Risk – unauthorized disclosure

Like all phenomena that bring high benefits, data collection also brings some ominous risks, particularly to privacy.

Risks range from damage to reputation, impact on career or income, and economic loss (for example, through fraud or hacking) to misuse of information by government authorities and even risk to public safety where personal data can be used for criminal purposes.

For example, in an audit of selected Ontario mortgage brokers that we reported on last spring, we examined why hundreds of credit reports had been wrongfully accessed using a web-based tool. We found that in every case, an individual impersonating an experienced mortgage agent had been able to download credit reports for his own nefarious purposes.

While police probed the fraudulent aspects of these incidents, our investigation focused on shortcomings in the policies and procedures of mortgage brokerages that led to inadequate safeguards for their clients’ personal records.

Loss or theft of data storage devices constitutes another risk.

Just last June, for instance, an unencrypted computer memory stick was stolen from an employee’s purse, exposing the medical files of 763 surgical patients at three major Toronto hospitals. This in spite of the fact that Ontario’s Information and Privacy Commissioner issued guidelines in 2007, advising health-system workers to encrypt all patient information that may be carried on mobile devices.

Last year, Agriculture Canada was the victim of a hacker who compromised the privacy of 60,000 farmers, accessing their personal credit files.

IV. Mitigating Risks

The question is: how do we mitigate these risks?  The answer is:

  1. Good records management
  2. Identifying purposes
  3. Minimizing collection
  4. Oversight for privacy

I will elaborate on each, starting with good records management.

1. Good records management

Mitigating the risks from unauthorized access or disclosure is one compelling reason for good records management.

But there are others.

Another reason is ensuring the capacity to retrieve information. It is a key requirement of privacy law that individuals have access to their personal information, and that right cannot be honoured without proper information management.

Indeed, our investigations branch has encountered situations of appalling disorder in private sector and public sector files, hopelessly frustrating efforts by individuals to gain access to their own personal information.

In one recent case involving a Crown corporation, we found the files in complete disarray, with pages missing, renumbered and probably tampered with. As a result, we had no way of comparing redacted records that had been released to the complainant under an access-to-information request with any original documentation.

A few years back, we conducted an audit of the Canada Border Services Agency’s recordkeeping practices. We found that CBSA’s records management was such that the agency could not, with reasonable certainty, report on the extent to which it shares information with the United States.

They have since implemented our recommendations, and a follow-up audit confirmed the improvements they had made to their records management.

A different audit of Canada’s passport operations turned up deficiencies in storage and disposal of the personal data required for passport applications. Even more disturbing was our discovery that there was no audit trail to record when, why and by whom records were accessed.
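To make concrete what such an audit trail would capture, the following is a minimal sketch in Python. It is an illustration only: the class names, field names and sample identifiers are hypothetical and are not drawn from any system our Office has audited. It simply shows each access to a record being logged with the "when, why and by whom."

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AccessEvent:
    """One audit-trail entry: which record was accessed, by whom, why and when."""
    record_id: str
    accessed_by: str
    reason: str
    accessed_at: datetime


class AuditTrail:
    """Append-only, in-memory log (hypothetical; a real system would persist it)."""

    def __init__(self):
        self._events = []

    def record_access(self, record_id, accessed_by, reason):
        # Refuse to log an access without a stated reason, so the "why" is never blank.
        if not reason.strip():
            raise ValueError("a reason is required for every access")
        event = AccessEvent(record_id, accessed_by, reason.strip(),
                            datetime.now(timezone.utc))
        self._events.append(event)
        return event

    def history(self, record_id):
        # Return every logged access to one record, oldest first.
        return [e for e in self._events if e.record_id == record_id]


# Hypothetical sample data, for illustration only.
trail = AuditTrail()
trail.record_access("passport-0001", "officer.a", "renewal application review")
trail.record_access("passport-0002", "officer.b", "address correction")
trail.record_access("passport-0001", "officer.b", "quality check")
```

Calling `trail.history("passport-0001")` then returns the two logged accesses to that record, in the order they occurred, which is the kind of question an auditor would want answered.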

2. Identifying purposes

Good records hygiene is also about clarifying the reasons for collecting and using personal information in the first place. This is an obligation under the privacy laws, and it serves as an important bulwark against the dangers of “function creep”: because the technology to get the data is there, we want it.

With the proliferation of data, we see it all the time – often under the guise of public safety or national security.

Police, for instance, might use cameras to collect surveillance or licence plate data to address particular problems in the present. But then they often want to hold onto it – just in case they find a purpose for it in future.

We are also seeing the state put pressure on the private sector to help gather data on its behalf.

Consider, for example, how an airline might once have used its passenger lists -- for reasons of booking its flights and perhaps refining its marketing efforts. Today, however, such lists are apt to be used in the global war on crime and terrorism.

Sometimes we cannot even guess at what a record might be used for in future. Maybe your DNA is needed for a medical test today. But with the huge advances in scientific understanding of the human genome, predictions about future uses for genetic information are limited only by our imaginations.

The only way to respect privacy while keeping up with developments in information technology is to clearly identify and streamline the primary, essential purposes for the collection, use and disclosure of information.

3. Minimizing collection

Both the Privacy Act, which applies to the public sector, and PIPEDA, which applies to private-sector, federally regulated entities, restrict the collection of personal information.

The Privacy Act obliges government institutions to respect the privacy of individuals by limiting the collection, use, disclosure, retention and disposal of recorded personal information according to what is strictly necessary.

Any personal information to be collected has to relate directly to a government program or activity. The data can ordinarily only be used for the purposes for which it was collected, or for a use consistent with those purposes.

The law also sets out the particular circumstances under which an institution may disclose personal information without the consent of the individual -- and where, conversely, it may refuse to give the individual access to the information.

Treasury Board policies and guidelines have fleshed out aspects of the Privacy Act since the legislation came into force 27 years ago.

For instance, because the Act gives people a general right to see their personal information, and to correct it as necessary, Treasury Board developed a Directive on Recordkeeping that sets out rules on the retention and disposal of data.

The Treasury Board Secretariat is also working on guidelines for the appropriate use of social networks, which will address such matters as the risk of unauthorized disclosure of sensitive government and personal information. 

One of the most powerful tools Treasury Board has designed to ensure privacy is the Privacy Impact Assessment (PIA), which our Office reviews. The purpose of a PIA is to identify and address the privacy implications of a government program or initiative. The process relies significantly on departments and agencies to integrate privacy considerations into their programs.

On the private-sector side, meanwhile, PIPEDA applies to federally regulated enterprises such as banks and airlines. It also applies to other businesses across most of Canada, except in provinces that have similar statutes – B.C., Alberta and Quebec and, for health information, Ontario. PIPEDA sets ground rules for how such organizations may collect, use or disclose information about individuals in the course of commercial activities.

Based on 10 Fair Information Principles, the law holds organizations accountable for the personal information under their control.

Among other things, PIPEDA also obliges organizations to identify the purposes for which they want to collect information, obtain meaningful consent for the collection, and limit the collection to what is reasonably necessary for the stated purposes.

The law, moreover, requires organizations to have appropriate safeguards for the personal information they collect, and sets rules around its retention. When organizations no longer need the personal information for its original purpose, they have to destroy or erase it, or make it anonymous.
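As a rough illustration of that last rule, the sketch below (in Python) separates records that are still needed from those whose purpose has lapsed, and strips the identifying fields from the latter. Everything in it is an assumption made for illustration: the `retain_until` field, the other field names and the sample data are hypothetical, and whether the remaining fields are truly anonymous would have to be assessed case by case.

```python
from datetime import date


def apply_retention(records, today):
    """Split records into those still needed and de-identified leftovers.

    Each record is a dict with hypothetical keys: 'name', 'postal_code',
    'purchase_total' and 'retain_until' (the last date the information
    is needed for its original purpose).
    """
    kept, anonymized = [], []
    for rec in records:
        if rec["retain_until"] >= today:
            kept.append(rec)  # still needed for its original purpose
        else:
            # Purpose has lapsed: drop direct identifiers, keep only the
            # field assumed here to be non-identifying.
            anonymized.append({"purchase_total": rec["purchase_total"]})
    return kept, anonymized


# Hypothetical sample data, for illustration only.
records = [
    {"name": "Customer A", "postal_code": "K1A 0A1",
     "purchase_total": 42.50, "retain_until": date(2010, 1, 31)},
    {"name": "Customer B", "postal_code": "M5V 2T6",
     "purchase_total": 17.25, "retain_until": date(2011, 6, 30)},
]
kept, anonymized = apply_retention(records, today=date(2010, 9, 15))
```

Here the first record has passed its retention date and survives only as a purchase total, while the second is kept intact. Outright destruction would simply omit the `anonymized` list.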

In sum: Organizations need the policies, procedures and people that allow them to address the whole lifecycle of the personal information they hold.

Because, as we tell organizations over and over again: “If you can’t protect it, don’t collect it.”

4. Oversight for privacy

The OPC ensures compliance with Canadian privacy law mainly through three functions:

  1. Investigations
  2. Audits
  3. Review of PIAs

(a) Investigations

One major investigation we conducted a few years ago found that the organization behind the Law School Admission Test did not need to collect thumbprints from students taking the test, since photographs of the students were an effective and less intrusive alternative.

Our landmark Facebook investigation of last year also dealt with data collection and retention, and whether users are properly informed about the networking site’s policies.

Google Street View has also prompted concerns about data collection and retention. For example, the company promised to blur the faces of people caught by the streetscape cameras. But we went further, seeking from Google an assurance that the original images are destroyed after blurring.

We recently also announced an investigation into Google Street View’s collection of Wi-Fi data. I cannot, however, speak to the case at this time, as it remains under investigation.

(b) Audits

Sometimes, instead of conducting individual investigations, we conduct a privacy audit as a way to get at the broader underlying issues.

Last year, for instance, our audit of FINTRAC, the Financial Transactions and Reports Analysis Centre of Canada, concluded that too much personal information was being reported to the agency by private-sector entities such as banks, real estate agents, gambling casinos and so forth.

In just a few weeks we will publish our findings in another audit – this one on federal government privacy policies and practices related to the disposal of paper documents and surplus computers. The purpose was to follow up on some egregious data breaches in the past.

Again, I cannot reveal yet what we found, but I encourage you to look up the audit when it is published.

(c) Review of Privacy Impact Assessments

As mentioned before, federal agencies must develop and maintain a PIA for every initiative involving the collection, use or disclosure of personal information. The purpose is to ensure that the initiative complies with privacy requirements and that any privacy issue of potential concern is addressed. We review PIAs in light of a four-part test to justify the collection of information, and of the 10 Fair Information Principles to ensure its protection. The four-part test addresses questions of:

  1. Necessity
  2. Proportionality
  3. Effectiveness and
  4. Less privacy invasive alternatives.

The 10 Fair Information Principles ensure the security of the information collected.

V. Data disasters

The vulnerability of personal information in the age of big data is such that in spite of all our efforts, we still see data disasters. Organizations in general are often tempted to collect and hang on to information – “just because,” or “just in case.” Besides, it is becoming cheaper to stash it than to sift it, sort it and delete the surplus.

But if the compilation and storage of data are cheap, data disasters are not. That is something that international retail giant TJX knows all too well.

In 2005, thieves cracked the computers of TJX, which owns HomeSense and Winners, and stole data on more than 45 million payment cards, as well as other personal information of people in Canada, the U.S. and overseas.

And they continued to steal data for a year-and-a-half before TJX finally got onto them.

Our Office launched a joint investigation with our Alberta counterpart. Among the major failings we turned up was that TJX collected too much information and kept it too long.

It seems like a simple error, but it cost the company well over $250 million to fix its computer systems and deal with lawsuits, investigations, and other claims stemming from the breach. And that’s not counting the cost to its reputation.

In another case, in the public sector, highly sensitive information about a Canadian citizen incarcerated abroad was leaked to the press. We initiated an investigation and found that over 1,000 public servants in one department could access that individual’s personal information. We recommended information management changes, which the department implemented fully, coming to terms with the reality of the digital age.

VI. Breach notification

In light of such catastrophes, our Office worked with public-sector bodies and private consumer and business groups to develop a set of voluntary data-breach notification guidelines. The guidelines, published in 2007, outline key steps in responding to data spills, such as containing the breach, evaluating the risks, notifying people affected, and preventing future breaches. We also developed an online reporting form, and designated a “notification officer” within our Office to ensure that breach reports arrive at one central place for attention.

Those are good and important steps, but we know that more is needed. And so we are particularly optimistic about amendments to PIPEDA that would make data-breach notification mandatory.

Bill C-29, which has received first reading in the House of Commons, would oblige organizations to report any material breach of security safeguards to our Office. Where a breach poses a real risk of significant harm, organizations would also be required to notify the affected individuals of the incident.

With the public sector, we engage in on-going collaboration to ensure prompt breach notification and proper response on our side.

VII. Other OPC activities

Meantime, our Office is also working on numerous other fronts to ensure that the personal information of Canadians is properly safeguarded.

In January, for instance, we participated in a symposium on recordkeeping in the Web 2.0 environment that was sponsored by Library and Archives Canada.

Our 2009-2010 Contributions Program supported a number of relevant research programs, including one on the control of personal medical data held in electronic health records.

We are working with Alberta and British Columbia on an information security checklist, and our three offices previously collaborated on a guide to help retailers limit the collection of driver’s licence data.

Our Office is also about to relaunch an interactive, web-based tool to help retailers and other small businesses safeguard their data. The online mini-course walks entrepreneurs through such critical steps as preparing an information audit of their business, developing a security plan, and doing an assessment of their training needs.

As you may know, we also hosted public consultations addressing important issues of data security. In Calgary last June, for instance, we examined the growing phenomenon of cloud computing, where data is processed and stored on remote third-party equipment.

In the public sector, we have developed training on Privacy Impact Assessments and we are working with the School of Public Service to enhance knowledge of the Privacy Act.

I want to underline that individuals also play a role. They need to take meaningful ownership of their personal information, to appreciate its inherent value, and to do what they can to protect it.

Toward that end, we publish many fact sheets, booklets and other forms of information.

A lot of our efforts have been directed at youth who, as a general rule, tend to be a little more cavalier with their personal information. We maintain a special website and blog, and have put together material – such as a YouTube video, posters, contests and so on – to get young people thinking about privacy in their online lives.

VIII. Conclusion

In conclusion, I think it is safe to say that, when it comes to the protection of personal information, the challenges are complex and plentiful, and growing more daunting all the time.

In the face of these challenges, there can be no single solution or approach.

However, our privacy laws are there to set the direction – to underscore the value of personal information, and to provide the framework for protecting it.

Then it is up to organizations to act in a way that respects both the letter and the spirit of the laws.

It is up to them to resist the impulse to gather more and more personal data, or to use, share or hold on to it beyond what the laws prescribe.

To resist the impulse today – and, more importantly, tomorrow.

Because technology will take us in only one direction: There will never be fewer records, less data. There will only be more.

So the need for safeguards and a healthy respect for personal information will become ever more pressing as time goes by.

Thank you for your attention.
