Privacy Upstream, Discrimination Downstream: The (Un)Intended Consequences of Data Analytics

Address given at the Reboot 18th Annual Privacy and Security Conference

Victoria, British Columbia
February 10, 2017

Address by Patricia Kosseim
Senior General Counsel and Director-General, Legal Services, Policy, Research and Technology Analysis Branch

(Check against delivery)

Introduction

As many of you know, the Office of the Privacy Commissioner of Canada is mandated to protect and promote the privacy rights of individuals. As data protection authorities, we focus our attention on the collection, use and disclosure of personal information upstream while human rights commissions typically address discriminatory impacts of that information downstream. As an example, the OPC asks whether it’s necessary for insurance companies to collect genetic information for actuarial purposes, whereas our human rights colleagues may step in when an individual has been denied coverage as a result.

Because discrimination is often a function of personal information collected about people, be it individuals or groups, we need to better understand the connection. We have never before seen so much data collected about every aspect of our lives, by humans and machines alike. The truth is we have all been unwittingly caught up in what one author has called the “digital dragnet.”^{Footnote 1} We need to look more closely at what can happen as a result, if it’s not happening already, so that we can work harder and smarter at the front end to limit how much personal information is collected and how it’s used in the first place.

I propose we go there today – to the deep, dark underworld of discrimination, so that we can work backwards, deconstruct the journey and figure out how best to address issues before they happen.

What Does “Down the Stream” Look Like?

Imagine you’re at your favourite waterpark, taking a nice leisurely ride down a winding, lazy river. You’re lying on a big, comfy, inflatable tube, slowly bobbing up and down enjoying the relaxing ride. Got your shades on, eyes closed. You’ve no idea where you’re going, blissfully ignorant of the fact that you’re getting sunburnt along the way!

Indeed, a growing number of studies demonstrate the pernicious, downstream effects of data analytics. You may have heard about the work of Latanya Sweeney, whose findings suggested racial bias in the ads delivered alongside Google searches for certain names. When searching for black-identifying first names, like DeShawn, Darnell or Jermaine, Sweeney found a higher percentage of ads offering criminal record checks than in searches for white-identifying names such as Brad, Jill and Emma. Whether the bias is attributable to race or socio-economic status, we just don’t know.^{Footnote 2}

A similarly troubling result was found in a ProPublica study of risk assessment software used across the US to predict criminal recidivism for parole and sentencing decisions. The study found significant racial bias: the software falsely flagged black defendants as future criminals at almost twice the rate of white defendants.^{Footnote 3}

Some argue “it is what it is.” Google and Reuters did not set out to develop discriminatory algorithms. The correlation is purely a function of past search activity, aggregated over millions if not billions of users. In other words, the computer, completely agnostic as to race or other factors, just spits back what it was taught to look for. Similarly, recidivism tends to correlate highly with employment, education and neighbourhood crime rates, measures on which some groups tend to fare less well than others.

But this argument is flawed in several respects. First, it ignores the role algorithms play in perpetuating and even aggravating discriminatory impacts. Second, it ignores the critical role humans play in designing these algorithms and then interpreting and applying their results. And third, it’s nearly impossible to test the claim that algorithms are neutral and agnostic when they are so opaque and closely guarded that no one outside can see them, a point Frank Pasquale makes in his book The Black Box Society.^{Footnote 4}

Cathy O’Neil, a data scientist with a PhD from Harvard who grew jaded during her stint in the finance industry, also supports these counterarguments in her aptly titled book, Weapons of Math Destruction.

Among several examples, O’Neil points to crime prediction software that cash-strapped police departments across America use to decide which neighbourhoods to send their limited patrol cars to, in order to maximize efficiency. These models analyze historical crime patterns to predict where criminal activity is most likely to occur next.^{Footnote 5}

But designers have a choice. They can include only serious crimes in their model: homicide, assault, burglary and auto theft. Or they can also include petty nuisance crimes like vagrancy and panhandling. Including the latter inflates the crime counts in some of the most impoverished neighbourhoods, which inevitably sends more police cars to those segregated parts of America and over-polices them as officers try to root out trouble with a zero-tolerance mindset.^{Footnote 6}

Although the model did not set out to target race, the factors chosen turn out to be very close proxies for it, creating a discriminatory feedback loop: more patrols in a neighbourhood lead to more recorded nuisance crimes there, which the model then reads as evidence that still more patrols are needed. Moreover, by relying on models that predict street crime, police are necessarily choosing to put fewer resources into fighting white-collar crime, in the financial industry for example. O’Neil goes on to ask “whether we as a society are willing to sacrifice a bit of efficiency in the interest of fairness. Should we handicap the models, leaving certain data out?”^{Footnote 7}

The potential for discrimination exists in the for-profit sector too, where automated decision-making processes are increasingly used to decide whom to lend money to, whom to insure, whom to hire or whom to accept into college.

O’Neil points to for-profit universities that use data about the most vulnerable in society to generate leads. It turns out the most vulnerable are also the most profitable: would-be students desperate to improve their lives will go to great lengths to secure government loans to get into university. With this in mind, some for-profit universities use algorithms and data analytics to segment their market and target their predatory advertising campaigns at groups such as ‘Welfare Mom with Kids’, ‘Recent Divorce, Death or Incarceration’, ‘Low Self-Esteem’, ‘Drug Rehab’, ‘Dead-End Jobs’ and ‘No Future’. And the computer models they use are designed to find exactly those people.^{Footnote 8}

Here too, there are trade-offs to make between fairness and efficiency, and choices about which data to collect and which to leave out. Take the emerging sharing economy. In the early days of online marketplaces, the anonymity of the parties to a transaction made it hard for sellers to discriminate. It seemed the sharing economy had the potential to end the discrimination often present in in-person transactions. But then platforms began requiring information such as names and photos, opening the door for racial bias to re-enter the online marketplace.

Similar to Sweeney’s findings, researchers from Harvard Business School found that Airbnb requests from guests with black-sounding names were 16 per cent less likely to be accepted than those from guests with white-sounding names. The point is, sharing platforms make design choices. They can choose not to require surnames or photos at all or, if they do, to reveal this information only after other salient aspects of the transaction, such as price and availability, have been agreed to.^{Footnote 9}

This is somewhat like the US symphony orchestras of the 1970s and 1980s that made a ‘concerted’ (no pun intended) effort to increase diversity by auditioning musicians behind a screen. By withholding information such as physical characteristics from conductors up front, the orchestras increased the female success rate by 160 per cent.^{Footnote 10}

Resetting the Voyage: Heading Upstream Again

Now that we know what downstream looks like, let’s head back upstream to explore what course corrections can be made at the outset.

This time, let’s change our little thought experiment. We are no longer lazily bobbing up and down on our inflatable dinghies at the waterpark, heading “somewhere down the crazy river.” We’re about to sail across the Atlantic from the Canary Islands to the Caribbean, the same route Christopher Columbus once took and the one my sister and brother-in-law recently completed in their 46-foot alloy sailboat.

For nineteen days, I followed their journey, anxiously reading their daily sailblog for news and praying for their safe arrival on terra firma. This was no lazy river ride! “Gybe the jib to starboard”, “lower the mainsail to windward”, “set the staysail to leeward”… and so it went, in technical language I barely understood. Taking turns on day watch and night watch, the two of them stayed vigilant throughout the journey, constantly adjusting their sails and charting and correcting their course according to the weather, currents, wind speed and direction.

How can we begin to address the adverse downstream effects of discrimination, and work proactively and diligently up front to mitigate, if not avoid, them altogether? How can we regulate the data we feed into algorithms, the algorithms we design to interpret the data, and the societal decisions we make based on their results, so as to minimize their harmful impacts?

These are questions being asked around the world as we come to realize that privacy laws alone may be too crude and limited as instruments. Increasingly, there is a sense that we need to develop robust ethical frameworks and bring to bear our best interdisciplinary thinking to fill in the gaps, so to speak, and help guide complex and value-laden decisions at a more granular level.

For example, in 2014, the International Conference of Data Protection and Privacy Commissioners adopted a unanimous resolution calling on all parties to be able to demonstrate that decisions based on Big Data, including the results of profiling, are fair, transparent, accountable… and ethical.^{Footnote 11}

In 2015, the European Data Protection Supervisor issued an Opinion on Digital Ethics, urging that the ethical dimension of data processing be considered so as to address deeper questions about its impacts on human dignity, individual freedom and the functioning of democracy.^{Footnote 12}

And the World Economic Forum’s Global Risks Report 2016 suggests that ethical frameworks and norms need to guide technological innovation if we are going to keep pace with the Fourth Industrial Revolution.^{Footnote 13}

Many in the business, academic and scientific communities have begun heeding that call. For today’s purposes, I have grouped the different responses into four categories: Organizational Ethics, Robot Ethics, Designer Ethics and Integrated Ethics.

Organizational Ethics

Many are actively advocating ethical frameworks to help guide organizational decision-making at the front end. Before deploying big data initiatives or new IoT devices, for example, some propose various ways for companies to broaden their view of potential harms and benefits beyond the bottom line.

International advocacy groups and think tanks have turned some of their best minds to this question.

The Information Accountability Foundation (IAF) has proposed a Unified Ethical Frame as a means by which organizations can more systematically and more meaningfully balance the full range of individual, societal and business interests when evaluating a proposed Big Data use against certain core values.^{Footnote 14}

The Centre for Information Policy Leadership (CIPL) has proposed the notion of enhanced organizational accountability to evaluate the appropriateness of data processing based on fairness and an ethical framework, taking into account risks of harm to individuals and benefits to the organization, individuals and society at large.^{Footnote 15}

Scholars associated with the Future of Privacy Forum (FPF) have suggested the creation of consumer ethics boards, modelled after research ethics boards in academia, as a way of recalibrating the asymmetry between individuals and big business and systematically reviewing proposed data uses from the consumer’s viewpoint.^{Footnote 16}

What this first group of initiatives has in common is some mechanism, either internal or external, that helps organizations enhance their decisions by supplementing the profit motive with more thoughtful consideration of potential harms and benefits from perspectives other than their own.

Robot Ethics

Another proposal, a little more out there, is to offload onto computers not only the analytical work they can do far more powerfully than humans, but also the ethical decision-making that goes along with it.

Can we teach robots to make ethical choices?

A UK research study found that we are not there yet. Researchers programmed a robot to follow Asimov’s First Law of Robotics: that a robot may not injure a human being or, through inaction, allow a human being to come to harm. They found that when the robot was able to save a single human surrogate, it did so 100 per cent of the time. However, when it faced a version of the trolley problem, the dilemma of being able to save one surrogate but not the other, it failed nearly half the time to rescue either. It became paralyzed with indecision, constantly changing its mind or “dithering” between the two.^{Footnote 17}

Somewhere, in robotics labs around the world, someone is working to perfect this experiment and the robot’s ability to make ethical choices. But no sooner will they succeed than they may need to reprogram their robot to account for new rules that some have suggested adding to Asimov’s original three.

In addition to:

A robot may not injure a human;
A robot must obey the orders of a human; and
A robot should protect itself except if that means injuring a human or disobeying an order;

Marc Rotenberg, from the Electronic Privacy Information Center, has proposed that:

A robot must always reveal the basis of its decision (concept of “Algorithmic Transparency”); and
A robot must always reveal its actual identity (to identify the human, organization or state behind a drone for example).^{Footnote 18}

Designer Ethics

A third group of proposals – more technical in nature – is to embed ethical values and principles into the very design of the system.

For example, the Institute of Electrical and Electronics Engineers has recently proposed draft ethics principles to guide the design and development of artificial intelligence and autonomous systems. Under these principles, such systems should embody the highest ideals of human rights, prioritize maximum benefit to humanity, and mitigate risks and negative impacts. The IEEE also presents a number of draft recommendations for the technical community to build these values into their systems, guide their research methodologies, maximize accountability and transparency, and recalibrate the data asymmetry that exists between the individual and the system.^{Footnote 19}

In a Harvard Business Review article, “Fixing Discrimination in Online Marketplaces,” authors Fisman and Luca list a number of concrete features that can be built into the design of online platforms to avoid, or at least minimize, potential discrimination.^{Footnote 20}

A good example of some of these choices in action arose just this week, when Facebook announced three design changes to its platform. In response to another ProPublica study last year, which found that advertisers could exclude users from seeing housing ads based on their “ethnic affinity,”^{Footnote 21} Facebook:

has adopted an explicit policy against discrimination and will require advertisers to certify that they accept the policy before joining its ad network;

is restricting housing, employment and credit-related advertisers from using audience selection tools based on certain personal characteristics; and

will begin automatically detecting and taking down ads that violate its policies, using machine learning it hopes to perfect over time.^{Footnote 22}

Integrated Ethics

Then there are those who advocate a broader, more integrated, interdisciplinary approach to addressing the ethical, legal and social issues of algorithms, much like we did in the ’80s at the outset of the Human Genome Project. The Human Genome Project held tremendous promise along with unknown consequences, and the world was both excited and greatly apprehensive about it.

James Watson, the first head of human genome research at the NIH, announced that a certain percentage of research funds would henceforth always be set aside for studying the ethical, legal and social implications of the human genome, otherwise known as ELSI research. This spawned a whole new generation of similar programs in the UK and Canada, as research funders came to realize that good science must be not only technically reliable, but also socially robust.

Just as the quest for the human genome became re-contextualized within the broader society and better grounded in what we collectively accept as being just, equitable and fair, so too must the future of the Digital Revolution. Computer scientists, like genomic scientists, must work in tandem with colleagues in law, philosophy, sociology, anthropology and other disciplines to anticipate and address the broader, downstream impacts of their work on society.

Kate Crawford, a visiting scholar at MIT, and Ryan Calo, of the University of Washington, suggest something along these lines: a social-systems approach to artificial intelligence that is more proactive than after-the-fact technical corrections and broader than ‘how to’ decisions. This approach calls for multidisciplinary, multi-pronged analysis at every stage of the AI innovation process, from conception to design, deployment and regulation, so that we gain a more holistic understanding of its potential societal impacts and critically assess its net benefits before it is adopted at all.^{Footnote 23}

Conclusion

There are many who think Canada has great potential to become a global leader in the field of artificial intelligence, including Steven Irvine, a former Facebook executive who recently returned to Canada to start a new AI company. According to him, we have the academic heavyweights as well as the “entrepreneurial talent and strong business leaders” to make this happen. Positioning Canada as an AI leader could bring significant economic benefits to the country. Some predict that AI could add $17.5-billion to our annual GDP and create 170,000 new jobs by 2025.^{Footnote 24}

But pursuing this opportunity is not without risks:

Futurist Yuval Harari imagines a not-too-distant future that challenges our homo-centric view of the world in favour of a data-centric one. He foresees the rise of dataism, in which authority shifts from humans to algorithms and engineers are reduced to chips and then to data, as we approach the day when the Internet of All Things completely subverts the human agenda. What will happen, he asks, to society, politics and daily life when non-conscious but highly intelligent algorithms get to know us better than we know ourselves?^{Footnote 25}

If Harari’s future is not plausible enough for you, consider the social credit rating system China hopes to launch by 2020. The goal is to create a nationwide “social-credit” system that compiles digital records of citizen behaviour, based on everything from fare cheating and jaywalking to internet activity and violations of family-planning rules. The system would give individuals an overall rating that determines which services they are entitled to and which blacklists they are put on. Should the experiment succeed, the state will get to decide an individual’s worth in society based on algorithms that not only claim to know people better than they know themselves, but also rate them morally in relation to their fellow citizens.^{Footnote 26}

This central question of the role of human agency in a technology-driven world lies at the core of our strategic priority work on the Economics of Personal Information. About a year ago, our Office launched a consultation paper on the role that consent plays in increasing Canadians’ sense of control over their personal information and enhancing their trust in the digital economy.

We received 51 written submissions, held stakeholder meetings in five major cities, and this week, our Commissioner is on the road, listening in on focus groups with individual Canadians across the country — all with a view to issuing a policy position in mid-2017.

It is quite a daunting task, come to think of it. Here we are, setting out on a journey to increase individual control, when technology, like a formidable current, is pushing us in the opposite direction.

Perhaps the consent process as we currently know it (reams and reams of legalese nobody reads) can be made more meaningful through creative legal, policy and technological means. Perhaps society will accept alternative modes of protection in certain circumstances and under certain conditions when individual consent is otherwise not practicable. What that alternative protection should look like, and what roles different actors (individuals, organizations, regulators and legislators) should play in providing it, will be the subject of our reflections over the next few months.

We need to tackle these issues head-on; certainly we can’t be complacent about them. But we will not resolve them by limiting ourselves to the here and now. We need to address them with courage and strategic foresight, by imagining and transposing ourselves into a plausible future, and working backwards from there to enable the pursuit of exciting possibilities while avoiding potential harms.

In conclusion, we cannot simply lie back on our inflatable dinghies, eyes closed, drifting down the lazy river wherever the market and technology currents take us. Like early explorers, we need to be active navigators on this journey: charting our course, adjusting the sails, adapting to changing winds, correcting our sail plan, anticipating bad weather and avoiding pirate territory as we head into the great unknown.

Endnotes

Footnote 1

Cathy O’Neil, Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy (New York, NY: Crown Publishing Group/Penguin Random House, 2016).

Footnote 2

Latanya Sweeney, “Discrimination in Online Ad Delivery” (2013) 56:5 Communications of the Association for Computing Machinery (CACM) 44.

Footnote 3

Julia Angwin et al., “Machine Bias: There’s software used across the country to predict future criminals. And it’s biased against blacks”, ProPublica (23 May 2016).

Footnote 4

Frank Pasquale, The Black Box Society: The Secret Algorithms That Control Money and Information (Cambridge, MA: Harvard University Press, 2015).

Footnote 5

Cathy O’Neil, Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy (New York, NY: Crown Publishing Group/Penguin Random House, 2016).

Footnote 6

Ibid.

Footnote 7

Ibid.

Footnote 8

Ibid.

Footnote 9

Ray Fisman and Michael Luca, “Fixing Discrimination in Online Marketplaces” (December 2016) Harvard Business Review.

Footnote 10

Ibid.

Footnote 11

36th International Conference of Data Protection and Privacy Commissioners, “Resolution: Big Data”.

Footnote 12

European Data Protection Supervisor, Opinion 4/2015 “Towards a New Digital Ethics: Data, dignity and technology” (11 September 2015).

Footnote 13

World Economic Forum, “The Global Risks Report 2016, 11th edition” (Geneva).

Footnote 14

Information Accountability Foundation, “Unified Ethical Frame for Big Data Analysis,” IAF Big Data Ethics Initiative, Part A (March 2015).

Footnote 15

Centre for Information Policy Leadership, “Protecting Privacy in a World of Big Data Paper 1: The Role of Enhanced Accountability in Creating a Sustainable Data-driven Economy and Information Society”.

Footnote 16

Conference Proceedings “Beyond IRBs: Designing Ethical Review Processes for Big Data Research” (Washington DC, 10 December 2015).

Footnote 17

Boer Deng, “Machine ethics: The robot’s dilemma” (2015) 523:7558 Nature 25.

Footnote 18

Marc Rotenberg, “Privacy in the Modern Age: The Search for Solutions” Presentation at the 38th International Conference of Data Protection and Privacy Commissioners (Marrakech, Morocco, 19 October 2016).

Footnote 19