My Data Made Me Do It: Ethical Considerations of Big Data

Address given to the North American Actuarial Council Meeting

Montreal, Quebec
September 30, 2016

Address by Patricia Kosseim
Senior General Counsel and Director General, Legal Services, Policy, Research and Technology Analysis Branch

(Check against delivery)


Thank you for the invitation to join your annual meeting. I understand you have already spent some time in past meetings discussing the promises of big data and how new transformational changes in predictive modelling stand to revolutionize traditional concepts of risk. As mathematicians who understand data and the value that can be extracted from it, actuaries are on the cusp of very exciting opportunities.

No doubt you have already discussed how datafying our backsides, seating positions, geo-location, driving and infotainment habits, and other telematics “exhausted” from smart car sensors, will change automobile insurance forever. How whole genome sequencing data, combined with other information gleaned from our Fitbits, smart vests, or other wearable devices and body sensors, will redefine the terms and conditions of health and life insurance. How data derived from our smart homes about room temperature, heating devices, alarm systems, appliances, electronics, and floor sensors that detect our gait and home usage, etc., will disrupt the field of property insurance. And how general patterns revealed not only from our credit histories, education and other sociodemographic data, but now combined with our online behaviour, who we befriend through social media, and what we post on Facebook, Twitter, etc., can better predict our financial risk and creditworthiness.

You have no doubt been well-primed about the benefits of big data by others much better qualified than me to extol its virtues, promises and opportunities (after all, lawyers are not “numbers experts”!). Today, however, you have invited me to discuss some of the ethical risks associated with big data, including the privacy risks. As stewards of your profession, you have chosen to hit the pause button and take the time needed to explore the broader societal implications of this major paradigm shift. The very values on which your profession is predicated require you to promote the ‘public interest’ before all else. For example, the Canadian Institute of Actuaries holds a professional duty to the public above even the needs of the profession and its members.

Broader Societal Implications of Big Data

While the promises of Big Data are game changing, so too are its potential consequences on how we understand the world and our place in it. In their book on Big Data, Viktor Mayer-Schönberger and Kenneth Cukier put it this way:

“Big Data is poised to reshape the way we live, work and think... The ground beneath our feet is shifting. Old certainties are being questioned. Big data requires fresh discussion of the nature of decision-making, destiny, justice. A worldview we thought was made of causes is being challenged by a preponderance of correlations. The possession of knowledge, which once meant an understanding of the past, is coming to mean an ability to predict the future.”

Just how major this transformation and its potential societal consequences are can be illustrated through a few examples:

We’ve all had that experience of receiving pop-up ads online uncannily related to topics we may have just queried or researched over the course of recent weeks.  While online behavioural advertising may be convenient for some, it remains offensive for others.  Be that as it may, they are targeted ads aimed at selling us more products and services, which we can always ignore if we choose to.   

But what about when our online behaviour is tracked to make inferences about our preferences, not to enhance commercial sales of perfumes, sports cars or diapers, but to tailor our newsfeeds about the world around us? Which day’s events will filter up to the top of our newsfeed? Brad and Angelina’s break-up and who wore it best at the Oscars, or the federal government’s approval of BC’s Pacific Northwest pipeline and the atrocious genocide going on in Syria? How will newsfeeds based on assumptions about who we are and predictions about what may interest us begin to color how we, in turn, see the world and what we come to believe is most salient in our lives? Will tailored newsfeeds shape us over time, and eventually nudge us into a self-fulfilling prophecy?

Similarly, will targeted political campaigns based on sophisticated big data analyses about who we are come to unduly influence how we vote? Will I be assumed to be right- or left-leaning based on who my ‘friends’ are or some robotic interpretation of what I may have tweeted that day to play devil’s advocate or just to be provocative? Will other parties abandon me on the assumption that I’m a lost cause, preferring instead to focus on getting their like-minded supporters out to vote and trying to sway the undecided? Will I stop receiving important information from the other side of an issue that may balance out my views and help inform how I exercise my most fundamental democratic right?

Big data not only influences the way we see the world, it also influences how others see us. Take, for example, a study at Harvard University by Dr. Latanya Sweeney, who found racial bias in ads connected with certain search terms used in Google and Reuters. When searching for black-identifying first names, like DeShawn, Darnell or Jermaine, Sweeney found a higher percentage of ads offering services for criminal records checks as compared to searches for people with white-identifying names such as Brad, Jill and Emma. Is it fair to evoke in others assumptions about who I am based on algorithmic programs that may be biased from the start?

We’ve all heard of the now infamous Target example, where the large superstore combined purchase patterns with basic demographic data to predict which of their women customers were likely to be in their second trimester of pregnancy and send them targeted ads for products they would need during this period of peak consumption.  On receiving baby coupons and samples at home, the father of a young teenage girl found out about her pregnancy before she was able to tell him about it herself. 

Not only do big data scientists know more about our friends and loved ones than we do, as Target has shown, soon will come the day when algorithms will know us better than we know ourselves. Take, for example, the insurance contract predicated on the principle of good faith. Applicants must disclose everything they know about their health, their property, etc. in order for the insurance company to take that knowledge into account when calculating risk. But what will happen when insurance companies actually have more information about us than we do? When they move from a model of pooled risk to highly individualized risk based on huge volumes of personal data amassed not from us, but indirectly from our tracking devices, our body sensors, our genomes, our cars and our homes? What of equal knowledge and good faith then? Will we get to see their analysis of us and challenge its fairness?

Losing Control Over Our Most Valuable Asset

Many now recognize that personal data is the new oil in today’s digital economy.   Some have even posited the idea that companies monetizing personal information should quantify it as they do any other economic asset and publicly report on it in their financial statements.  Given the increased value in the personal data that we generate (to say nothing of ownership), why are individual consumers not enriched by it?  Rather than gaining bargaining power and exercising greater control over this extremely valuable asset in high demand, why are we seemingly losing the capacity to select who we share it with and for what purposes?

1. Dwindling choice

As smart devices begin to crowd the market – everything from smart TVs to smart phones and smart appliances – there is less and less opportunity to choose good ol’ dumb devices that don’t collect, use and disclose our personal data en masse. Buyers are not always able to decline the extra bells and whistles of smart vehicles equipped with GPS, telematics, and infotainment systems. The same goes for videos: if you’ve noticed, there are fewer and fewer brick-and-mortar video stores nowadays, having been driven out of the market by data-rich, Internet-connected powerhouses like Netflix. Although people aren’t obligated to join social media networks, there are risks of being socially stigmatized if they don’t, opportunity costs for professionals who prefer not to and educational disadvantages for students who don’t join their classroom’s platform. Even though legally you should be able to surf the Net and visit websites without being targeted with online behavioural ads if you choose to decline them, a recent study by our Office found that opt-out icons and procedures, though offered on most websites visited, were very difficult for individuals to find, understand and actually exercise.

2. Increased Obscurity

As big data initiatives begin to replace small data transactions, organizations and entire industries are ‘going dark’. The commercial value of big data algorithms developed by ingenious engineers behind the scenes is far too great to reveal publicly and risk being copied and undersold by competitors. Even if organizations agreed to be open about what information they aggregate and analyze without relinquishing intellectual property rights in the process, the information flows, relationships and sheer math of it all have grown far too complex for the average layperson to even understand. As it is, even in a simple world of basic privacy policies, researchers from Carnegie Mellon found that it would take the average person about 40 minutes per day to read the privacy policies they encounter in day-to-day transactions. That’s 244 hours per year, for a combined national total of 54 billion hours per year! And that’s just to read the policies… let alone understand them! Then enter the world of artificial intelligence, where the sheer complexity of it all begins to evade even data scientists themselves as machines adapt their algorithms on their own.

3. Undefined purposes

The basic premise of a consent-based privacy model is to define and get agreement on the purposes for which personal data will be collected, used and disclosed. Once those purposes are fulfilled, the permission to retain those data expires and personal data should be returned to the individual, destroyed or otherwise anonymized. Any other purported purpose outside those agreed-upon boundaries requires new consent. In this sense, the concept of “purpose” is a pivotal legal gateway for consumers to exercise ongoing control over their personal data. Yet “purpose” is anathema to big data mindsets that seek to experiment with data to find trends and connections they don’t even know they’re looking for. Obtaining prospective consent for specific, defined purposes is said to be impossible, yet requesting broadly worded consent for future, undefined purposes seems to make a mockery of it.

4. Invisible Intermediaries

In a small data world, most commercial transactions were neatly categorized as agreements entered into between an individual consumer and an organization, where personal information was provided as consideration for a product or service on negotiated terms and conditions. In that context, informed consent as the legal mechanism to ‘seal the deal’ made sense. In an era of big data, the biggest players are invisible intermediaries that mine and combine data across many disparate sources independent of consumers’ participation. Individuals never see these actors, let alone have any sort of relationship with them – hence no real opportunity to say yea or nay to the use of their data. Conversely, companies don’t need to contact individuals directly anymore to scrape data from the Internet and other third-party sources. They claim not to be interested in identifying information and therefore should not be required to obtain consent. Moreover, at the scale at which they operate, they claim it’s simply impossible to do so.

What is the Place of Consent in a Data-Driven World?

In light of these challenges, our Office has recently launched a national discussion paper in which we have sought views from a broad range of stakeholders on the complex issue of consent in a world of big data and the Internet of Things (IoT). Among the questions we are asking are:

  • How can we make consent more meaningful in this new context, either informationally or technologically? Are there minimal disclosure requirements that organizations should be required to make transparent to consumers? Are there technological ways of de-identifying data to an acceptable threshold of risk or tagging data so that consumers’ permissions follow their personal information no matter where it flows through the ether?
  • Are there legitimate purposes to which our data may be put which we can agree to generally as a society, without having to require individual consent each and every time?  If so, what would those legitimate purposes be and under what conditions?  Conversely, are there clear no-go zones that should be prohibited, particularly if they risk harming individuals or adversely affecting individual or group interests?  
  • Recognizing that consent is vulnerable in this new context, are there additional protections required from an overarching governance perspective?  How can organizational accountability be strengthened?  Are there self-regulatory mechanisms such as trustmarks or codes of practice that may help raise the bar?  What should be the role of regulators and what enforcement powers should they have?
  • Finally, and most relevant for your purposes, if black and white legal solutions leave us wanting, what role can ethics play in helping guide reflection and action in this brave new world?

Emergence of Ethics Governance Frameworks

Recognizing the pressures on, and limitations of, individual autonomy in a big data era, regulators and advocacy groups the world over are exploring ethics governance regimes as a means of providing consumers with additional protections and assurances.

This “question du jour” is the subject of a much bigger, global conversation.  For example, two years ago, the International Conference of Data Protection and Privacy Commissioners adopted a unanimous resolution on Big Data calling on member countries to demonstrate that decisions based on Big Data, including the results of profiling, are fair, transparent, accountable… and ethical.  

In his 2015 opinion, “Towards a new digital ethics”, the European Data Protection Supervisor affirms that adherence to law is not enough in today’s digital environment.  Instead, modern privacy regulation must also consider the ethical dimension of data processing. Better respect for, and safeguarding of, human dignity are suggested as the counterweight to pervasive surveillance and power asymmetry. 

In a recent White House report on Big Data, the US government outlined a number of steps to ensure that growth in the use of data analytics is matched with equal innovation in rights protection.  The very first of these steps is to “support research into … building systems that support fairness and accountability, and developing strong data ethics frameworks.”

International advocacy groups and think tanks have also turned their best minds to this question.

The Information Accountability Foundation (IAF) launched its Big Data Ethics Initiative in 2014 to address challenges posed by big data analysis. Its Unified Ethical Frame is being proposed as a means by which organizations can meaningfully balance the full range of individual, societal and business interests when evaluating a proposed use of data against five core values: (i) beneficial; (ii) progressive; (iii) sustainable; (iv) respectful; and (v) fair. (The IAF is currently undertaking a project, funded by OPC’s arm’s-length Contributions Program, to adapt this framework to the Canadian context.)

The Centre for Information Policy Leadership (CIPL) has proposed the notion of enhanced organizational accountability for use of personal information. CIPL’s proposed model would see organizations increase transparency and enhance risk assessments when evaluating the appropriateness of data processing based on fairness and an ethical framework, taking into account risks of harm to individuals and benefits to the organization, individuals and society at large.

The Future of Privacy Forum (FPF) has looked to existing ethics governance regimes – such as research ethics boards that exist in academia – to ask whether such an idea could be adapted to the commercial context. Researcher Ryan Calo, for instance, has proposed the concept of Consumer Ethics Boards as a way of recalibrating the asymmetry between individuals and big business and systematically vetting the ethics of purported uses of consumer data before they are deployed.

While ethical governance is not a wholesale substitute for consent, there is increasing acknowledgement that ethics should form part of the big data discourse and play a critical role in protecting privacy based on agreed-upon principles and appropriate governance.

Conclusion: What Will Remain of Human Volition in a Big Data World?

In his recently published book, “Homo Deus: A Brief History of Tomorrow”, Prof. Yuval Noah Harari imagines a not-too-distant future which challenges our homo-centric view of the world in favor of a data-centric one: the rise of dataism, in which authority shifts from humans to algorithms, and engineers are reduced to chips, then data. To get to know ourselves and make decisions about our fate as humans, we will no longer need to self-reflect, resort to quiet sanctuaries, keep a private diary, tap into our intuition, explore our values and shape our personalities through intimate relationships, etc. All we need is to get our DNA sequenced, wear biosensors 24/7, let Google, Facebook and other Internet giants analyze our online behaviour, our emails, chats, messages, likes and clicks, and we will approach the day when the Internet of All Things will subvert the human agenda and be able to tell us whom to marry, which career to pursue and whether to start a war.

What will happen, he asks, to society, politics and daily life, when non-conscious but highly intelligent algorithms know us better than we know ourselves? This is a profound ethical question and one that is not mere fantasy. Just this week, some of the world’s largest data holders, including Facebook, Google, Microsoft, IBM, Amazon and others, launched a “Partnership on Artificial Intelligence to Benefit People and Society”.

Whether or not we buy into Harari’s prediction of our future and dataism as a worldview is not as important as questioning the destination of this big data train ride we’re on, lest we let it take us there on auto-pilot. Critical discussions like the one you are having today are indispensable for unlocking the tremendous value data has to improve the human condition, while also designing a future in which human agency still matters.
