Data Integration and Linked Employer-Employee Data Conference


March 22, 2002
Wellington, New Zealand

George Radwanski
Privacy Commissioner of Canada

(Check Against Delivery)


This conference is devoted to the problem of linking research survey data with data collected for administrative purposes. It's interesting to me that you're talking first about the technical issues involved in creating these integrated data sets, then about the research that can be conducted using them-and only after that about privacy and confidentiality.

This suggests to me that privacy is seen as something that you protect once you've got the information. It's as though, once you've built these integrated data sets and used them for research, the only remaining issue is to make sure that the data are kept out of the reach of unauthorized users.

And that suggests to me that privacy is being confused with confidentiality and security.

Privacy is more than confidentiality and security. It can't be an afterthought. It has to be built into the system from the outset. Indeed, privacy concerns may well determine how you build these systems, and maybe even whether you build them at all.

Privacy is our right to control access to ourselves and information about ourselves, including the collection, use, and disclosure of that information.

That distinguishes it from confidentiality and security.

Confidentiality is our obligation to protect other people's personal information when it's in our possession. It's an obligation to care for the information, maintain its secrecy and not misuse or wrongfully disclose it.

And security is the process of assessing and countering threats and risks to information.

If information is being collected or used without consent, that's a fundamental violation of privacy, and no confidentiality and security measures will change that. If it was collected for one purpose and is being used for an unrelated purpose, without consent, that too is a violation of privacy.

In Canada, we have some experience with an integrated data base that assured security and confidentiality but nonetheless violated privacy. It was known as the Longitudinal Labour Force File.

The Longitudinal Labour Force File took information that had been collected by various agencies and departments of the Canadian government for such purposes as income tax, immigration, social assistance, employment services, and unemployment insurance-and assembled it into a huge research data base.

There was never any doubt that the government department that assembled this data base did so with the best of intentions. And it was not a violation of Canada's Privacy Act-at least, it wasn't a violation of the letter of the law. The act specifically allows a government institution that has collected personal information for administrative purposes to use or disclose it for research and statistical purposes. All the personal information in the file related directly to the operating programs of the department or was legally disclosed to it by other departments. The department also advised Canadians of the existence of the data base, at least to the degree it was required to by law.

But there were serious problems that made it a violation of the spirit of the Privacy Act. These included the sheer size and comprehensiveness of the data base, the lack of transparency about the use of the information when it was collected and assembled, the permanence of the data, and the absence of an explicit legal protective framework. More fundamentally, a "Big Brother" file made up of comprehensive dossiers on individual Canadians could prove to be a huge temptation to government to use it for data mining, profiling, monitoring and tracking individuals, or classifying and making predictions about them based on such things as ethnicity, disability, or political activity.

An uproar ensued when my predecessor's annual report raised the level of public awareness of the data base's existence. Eventually, this action by the Privacy Commissioner of Canada and the resulting public pressure compelled the government to dismantle the data base. To the department's credit, it worked quickly and co-operatively with my officials to remedy the problem.

This is an example of my ombudsman role as Privacy Commissioner. Although I don't have the power to order a government department or a private company to do something to respect Canadians' privacy rights, I do have a legislated mandate to speak out on behalf of these rights. Mine is not one voice among many-it is the voice of all Canadians, as spoken by an independent Officer of Parliament.

Like other privacy advocates, I don't dispute the value of personal information to a modern state-for such things as policy and program development, management of the economy, informing public debate, and evaluation of policy and program effectiveness.

But the lesson to be learned from the Longitudinal Labour Force File affair is that privacy has to be respected-it has to be built into the system-because without it, the legitimate goals of the government will not be achieved.

It's up to you to find a way to ensure that your desired statistical and research data bases protect privacy. The right is fundamental, and the onus is on those who design the systems to respect it.

Here are some of the key points I raise with governments when I talk to them about designing their administrative systems. These points may not come as good news for proponents of data base integration. But my hope is that they may help you to respect privacy in designing your systems-and spare you the problems we went through in Canada.

First, retain some walls between banks of data.

Information about individuals and their interactions with government is collected for specific uses. The separate data bases it's held in-"silos" as they're called-reflect the specific purposes that justify the collection and retention of the information. The information is compartmentalized.

Without those silo walls, an agency with a need to know only one piece of information can have access to lots more information than it needs or has any right to know.

And information can be combined, to create profiles of individuals. This may make statistical analysis easier, but it's also a distinguishing feature of surveillance societies. Building dossiers on individuals, tracking their activities and their interaction with government, however well-intended the purpose, has no place in an open, democratic society.

Second, beware of the common client identifier-a device for linking data, which is critical to the use of administrative data for research purposes, and at the same time an authentication, identification, and access device.

We need to question, always, to what extent there is actually a need to identify the client. Sometimes there is an obvious need-for example, when someone is seeking a government service or benefit. But many interactions can-and should-be anonymous. A simple request for general information, for example, requires no authentication of the client's identity beyond, perhaps, a mailing address or telephone number.

Yet governments may be tempted to require authentication when it's not really needed, especially if a ready means of authenticating identity, such as a smart card, is developed.

Governments should require authentication only when necessary. The default setting should be anonymity. Such a presumption in favour of anonymity is a powerful force for privacy.

Third, the desire to develop administrative data bases must not become a justification for refusing anonymity. We can't allow the widespread use of a linking device to impose its own pressure on the development of an identification device, in a sort of "personal information feedback loop."

Fourth, government agencies should drill down no deeper than necessary when authenticating identity. Different agencies need to know different things about a client, although there is some obvious overlap, such as name, address and date of birth. The architecture of the systems operated by different government departments and agencies must reflect that. Where anonymity is not possible, authentication information should be limited and segmented; the only personal information that's revealed should be the information that's needed for that specific transaction.

So how do you go about building privacy into systems at the outset? The best way is to change your mindset and put privacy first. Just as every major building or engineering project must now prepare an environmental assessment long before construction ever begins, so too should system designers conduct a Privacy Impact Assessment before the first piece of personal information is collected.

A Privacy Impact Assessment is an analysis of the likely impacts on privacy of a project, practice, or system. It involves looking at all the personal information practices that go into the system, such as what kinds of information will be collected, how consent will be obtained, how and for how long the information is to be kept, how it will be used, and to whom it will be disclosed. It looks at things like the purposes and statutory authorities for collection, use, and disclosure, what kinds of linkages there will be between this and other information, how individuals will be able to exercise their right of access to their information, and how they will be able to correct any inaccuracies. It also looks at privacy legislation and principles, and assesses how the project or system complies with them overall.

What are some of the privacy impacts you would be looking for? Let me give you some examples:

Will it be possible to combine unrelated personal information to create new information about identifiable individuals?

Will it be possible to track an individual's transactions with different programs?

Will the system, especially its demands for identification and authentication, lead to profiling, transaction monitoring, or other forms of surveillance?

Will the program or system entail the physical observation of individuals?

Will it facilitate electronic misuse of publicly available personal information?

A Privacy Impact Assessment is sometimes referred to as a risk-management tool. Getting it wrong on privacy is a risky proposition. As our government found out with the Longitudinal Labour Force File, good design is more cost-effective and less labour-intensive than retrofit.

Privacy is often most threatened by well-intentioned people who suggest trading it off for some greater good. Often it's public security that's described as the greater good; that's been particularly the case recently with the fight against terrorism. But sometimes the trade-off is for something called "efficiency."

It's too often forgotten what efficiency really is. Efficiency refers to the relation between means and ends, to the choice of the best means to achieve a particular goal.

How we define those goals is the crucial question. And protecting privacy has to be one of our primary goals, for privacy is a fundamental human right, arguably the one from which all others are derived-freedom of speech, freedom of thought, freedom of association, just about any freedom you can name.

Privacy should not be seen as an impediment to efficiency, or something that you can sacrifice to be more efficient. It is something that you have to protect. That's a fundamental element of your goal. Whether you're designing government programs, or conducting research surveys, or finding ways those two activities can be combined, it's up to you to find efficient means to achieve that goal.
