Privacy Research Papers

The Age of Predictive Analytics: From Patterns to Predictions

A report prepared by the Research Group at the Office of the Privacy Commissioner of Canada

August 2012


Introduction

In both the public and private sectors there is an overall fascination with predicting how people will behave: What will people purchase? How do they use technology? When will someone behave badly, break the law, or commit fraud? danah boyd has noted this shift, saying “it’s no longer about what you ‘do’ but about what you ‘might do’, and that includes what other’s do where it implicates you or might influence you.”Footnote 1 This type of analysis isn’t an entirely new phenomenon, but its form is moving from predictions based on experience, intuition and critical thinking to predictions based on a technological analysis of raw data: predictive analytics.

In a recent article How Companies Learn Your Secrets,Footnote 2 and book The Power of Habit: Why we do what we do in life and business,Footnote 3 New York Times reporter Charles Duhigg described how certain corporations use “predictive analytics” to understand consumers’ shopping and personal habits in order to market to them more effectively. Specifically, Duhigg shed light on the practices of American department store Target, and reported that the company used predictive analytics as a means to uncover which women were likely to be in their early stages of pregnancy so that it could target advertising to them before any other company. The algorithm that was behind the analytics was dubbed the “pregnancy-prediction algorithm.”

The development of the pregnancy-prediction algorithm exemplifies the emergence of a corporate trend that places huge importance on internal “Data Analytics Departments” or “Data Science Teams.” Duhigg reported that Target has about 50 employees whose sole job is to find trends and patterns hidden in the data that Target collects on its customers.Footnote 4 In the development of the pregnancy-prediction algorithm, the team of data scientists tested theories and analyzed patterns in customer data, historical data from baby registries, and demographic data purchased from data brokers, and found that certain patterns and data linkages could be made that revealed predictable shopping patterns with women who were pregnant. According to Duhigg, the driving force behind the development of the algorithm was a theory that people’s buying habits are more likely to change when they go through a major life event, such as a pregnancy, and therefore targeting advertising campaigns to such customers could reinforce a habit of shopping at Target stores.

The purpose of this research report is to develop a better understanding of the concept of predictive analytics, which is the underlying process described in Duhigg’s article. Predictive analytics is a general purpose analytical process that can be applied in sectors as diverse as retail to boost sales, law enforcement to predict crime, and health programs to monitor for disease outbreaks. In that regard, it is not a straightforward concept to define or describe, and the privacy implications could vary from privacy neutral to privacy invasive, depending on its application. Moreover, it is important to acknowledge that predictive analytics is a process that is closely intertwined with previously known notions of data mining; however, its inferences extend beyond retrospective pattern analysis to a result that is more prospective and anticipatory.

Much of the research available on predictive analytics and referred to in this paper relates to the U.S. context and the practices of American companies. In the absence of hard data we still do not know precisely to what extent companies in Canada use predictive analytics, or for what purposes. Nevertheless, by looking at the U.S. context and research we can make some inferences about the current or future practices in Canada. By monitoring the trends in predictive analytics, we can move towards a better understanding of how it may be used by companies, organizations or government, and how it might impact individuals in the Canadian context.

In that regard, this paper will be a starting point to explore some of the underlying themes and conceptual frameworks for understanding predictive analytics, and to discuss a few of the various applications of this technology in the private and public sectors.

This paper will:

  • examine the concept of predictive analytics;
  • provide an overview of the context for predictive analytics;
  • identify some of the applications in the private and public sectors;
  • outline some of the broader privacy implications it can raise for individuals and for society as a whole; and
  • examine predictive analytics in relation to privacy principles in PIPEDA and the Privacy Act.

The Concept: “Predictive Analytics”

The first question many people ask is how predictive analytics differs from the data mining that governments and companies have been doing for some time. Data mining is defined as “the process of discovering interesting patterns and knowledge from large amounts of data.”Footnote 5 Both data mining and predictive analytics are processes that apply sophisticated mathematics and statistical analyses to data in an attempt to discover knowledge and patterns. Although they are related concepts, perhaps even synonymous in terms of process, predictive analytics gives us new clues as to how data mining practices are advancing and becoming increasingly intelligent.

Predictive analytics marks a progression from simply identifying patterns to making predictions based on patterns. Computerworld (2006) defines predictive analytics as “the branch of data mining concerned with forecasting probabilities.”Footnote 6 From this definition we see that predictive analytics is a concept that is more uniquely forward-looking, and when personal information is the raw data, predictive analytics is the process attempting to forecast our future behaviours or intentions. SAS, one of the world’s largest business analytics companies, says predictive analytics is about “revealing previously unseen patterns, sentiments and relationships [emphasis added].”Footnote 7 So where data mining describes the exploratory process of finding patterns and knowledge within data, predictive analytics then attempts to leverage that knowledge derived from data to anticipate meaning and make predictions about the future.

Predictive analytics is a general purpose analytical process that enables organizations to identify patterns in data that can be used to make predictions of various outcomes, not all of which have an impact on individuals. Software products are now becoming more readily available to companies to implement analytics into their business models.Footnote 8 They can be used by organizations to avoid risk, make unprofitable customers more profitable, retain profitable customers, reduce business expenses, identify fraud, avoid process failures, or even to analyze the effects of health treatments.Footnote 9 Emerging capabilities include real-time analytics and the analysis of unstructured information such as text.Footnote 10 The key point to note is that the types of decisions that can be made based on the outcomes of analytics are always advancing into new realms. The trend is a shift away from data mining that presents findings that are mere aggregations or characterizations of patterns in data;Footnote 11 predictive analytics is characterized by its ability or attempt to forecast, anticipate, or infer.
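The distinction drawn above, between describing patterns in data and forecasting probabilities from them, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy purchase records, the function names `describe` and `predict_probability`, and the choice of a smoothed frequency estimate are invented here and do not represent any organization's actual method.

```python
from collections import Counter

# Hypothetical toy records: (items bought, whether a later outcome occurred).
# All values are invented for illustration only.
history = [
    ({"lotion", "vitamins"}, True),
    ({"lotion"}, True),
    ({"vitamins", "soap"}, False),
    ({"soap"}, False),
    ({"lotion", "soap"}, True),
    ({"vitamins"}, False),
]


def describe(records):
    """Descriptive step: count how often each item appears in past records."""
    counts = Counter()
    for items, _outcome in records:
        counts.update(items)
    return counts


def predict_probability(records, item):
    """Predictive step: estimate the probability of the outcome given an item.

    Uses add-one (Laplace) smoothing so that an item never seen before
    yields 0.5 rather than a division by zero.
    """
    outcomes = [outcome for items, outcome in records if item in items]
    positives = sum(outcomes)
    return (positives + 1) / (len(outcomes) + 2)
```

The first function only summarizes what has already happened; the second turns the same records into a forward-looking estimate, which is the step that makes the analysis "predictive" in the sense used in this paper.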

The Context for Predictive Analytics

Technological innovation and the shifting nature of consumption on the Internet have played an important role in the emergence of predictive analytics as a tool for business and governments. The devices we use and the convergence of different technologies have prompted new channels and sources of data, leading to the proliferation of data at exponential rates. In recent years we have seen the term “Big Data” emerge to describe this trove of information that is garnered from people’s everyday activities, things we consume, and our interactions with other people and objects. This Big Data has in turn become increasingly valuable to organizations, and making predictions has become part of the formula for success. The following section will capture some of the essential elements of the overall context and describe how technology plays a role as a catalyst for predictive analytics, how Big Data is the essential ingredient, and how the desire to achieve success through data-driven decision-making is setting this trend in motion.

The catalyst: the platforms and incentives make all of us the product

The online environment is at the root of the proliferation of data and, ultimately, the emergence of tools to analyze it. Most of our online activities, such as using social media and applications or registering with companies and creating shopping profiles, prompt or persuade us to reveal information about ourselves. Sometimes we do not have control over what is requested, for example when we must fill out required fields in order to activate a service or purchase something online. Other times people share personal information willingly in a social manner or in exchange for a benefit or perceived benefit. The Internet of free platforms, free services and free content is laced with incentives, rewards and benefits to participation, whether it be for the convenience of the platforms, the enjoyment we get from being social, or the lure of deals on products we are interested in. The platforms and incentives are all designed specifically to encourage individuals to offer up information about themselves,Footnote 12 and as aptly put by Professor Zittrain in 2011, “If what you are getting online is for free, you are not the customer, you are the product.”Footnote 13

As consumers in particular, a great deal of personal information can be extracted from our consumption patterns and online activities. Social media companies and other corporations are shifting the behaviour of consumption on the Internet and linking people and objects together through our social interactions and through various technologies: “the use and convergence of the web, mobile phones, electronic financial systems, biometric identification systems, RFIDs, GPS, ambient intelligence, and so forth, all participate in the automatic generation of data which become available for still more pervasive and powerful data mining and tracking systems.”Footnote 14

These technologies can all give companies insights about what is happening now, or what will happen soon; “faster than real-time” is the new goal for digital products.Footnote 15 Organizations are all vying to get access to unique information, and that encourages each company to seek new avenues for data collection.Footnote 16 The real-time transactional customer data and the integration of multiple online platforms and technologies will only make the descriptions of patterns of behaviour and predictions of future behaviour increasingly accurate and meaningful.Footnote 17

The ingredients: our data crumbs

Our contemporary society is a data driven world. The concept of Big Data has been around for some time, but the scale of the concept is rapidly escalating. It is not only about the amount of data, but also how data are increasingly interrelated.

Big Data is a concept closely intertwined with predictive analytics because the data points are the ingredients that feed the application of predictive algorithms. In this contemporary and tech-driven society almost everything we do produces a stream of personal information.Footnote 18 Between the user-generated content that is offered up by individuals, and the personal information that companies extract from our consumption activities, it is becoming increasingly possible to capture a fairly detailed picture of how we organize and go about our daily lives: “We have gone from having little bits and pieces about us stored in lots of different places off- and online, to having fully formed pictures of who we are. All digitally captured and stored.”Footnote 19 It is becoming harder to maintain distinctions between our offline self and our digital activities, particularly as we interact with handheld mobile technologies.

With the scale of Big Data ever-increasing, so too is its value. Some have argued that “data is the new oil,”Footnote 20 meaning the new commodity to be refined (analyzed) and then exploited. The explosion of consumer data has created an entire industry that purports to draw meaning from that data. Companies are resorting to collecting vast amounts of data on consumers in order to gain a competitive advantage.Footnote 21 For many, the only way to survive in the global economy is to “embrace and leverage the power of information.”Footnote 22 The exploding data deluge, composed of our personal information as the raw data, and the increasing capacities for storage and information technologies, are contributing to this rise in the use of analytics.Footnote 23 The true value comes from measuring our behaviour and inclinations in fine detail and using those measurements as a basis to make predictions about future events.

The impetus: to know something before it happens

The desire to derive meaning from data is what motivates a great deal of the collection, storage, and dissemination of information, and why there is a push to find more and more advanced techniques for information analysis.Footnote 24 It is now about learning whatever we can about people, their attributes, and past actions in an effort to understand their predispositions and predict future actions.Footnote 25 Helen Nissenbaum, in her examination of pivotal technological transformations, describes this tendency as an “unbounded confidence placed in the potential of information processes and analysis to solve deep and urgent social problems” and explains that “the confidence fuels an energetic quest both for information and for increasingly sophisticated tools of analysis.”Footnote 26 Nissenbaum reminds us that this trend is not limited to the context of commercial interests, but that the appeal of analytical methods and tools such as predictive analytics will be sought after by a variety of decision-makers in different sectors, for example, in the financial sector, insurance and credit-card companies, health/hospitals, national security and law enforcement.Footnote 27 Depending on the application, the combination of big data coupled with the desire to predict and the intelligent capacity of predictive algorithms could be what truly magnifies the implications for privacy.

The Applications: Who is trying to predict what?

The growing appetite for, and emphasis on, using data to drive decision-making can have several motivations; sometimes it’s about anticipating outcomes that will generate more profits, and sometimes about managing risk or preventing negative outcomes. In a forthcoming book chapter, Ian Kerr characterizes different purposes for utilizing predictive technologies and analytics.Footnote 28 One type of prediction he calls “preferential predictions”, which are an attempt to anticipate individual preferences or inclinations, often for the purpose of tailoring offerings of products and services. Another type is “preemptive predictions”, which are an attempt to anticipate and prevent certain types of actions that are likely to generate social risk.Footnote 29 These two concepts can provide a basis for understanding some of the outcomes being sought by different applications of predictive analytics. The following section will offer some examples from both the private and public sectors to illustrate the desired outcomes of preferential and preemptive predictions.

Targeted Advertising

As we saw with the Target news story, the application of predictive analytics can help companies be more effective in targeting advertising with a view to increasing profits. Companies want to be able to infer their customers’ preferences, identify prospective or profitable customers, and target appropriate products and services at precisely the right moment.Footnote 30 Terry O’Reilly calls this activity Hyper-Targeting, and says that this practice will allow companies to send perfectly-tailored advertisements directly to individuals, based on deep knowledge of that individual’s personal life, at the exact moment they are about to buy something.Footnote 31 The advertising context is a good example of how the potential for large profits can shape how companies value and use personal information and what motivates them to use predictive analysis to better understand who we are, what we want, and when we want it, in real-time. Predictive analytics will enable companies to effectively perform this type of targeted advertising, and is a clear illustration of how a preferential prediction could be a lucrative purpose for using predictive analytics in the private sector.

Social science by social media

Predictive analytics is becoming a tool that helps draw valuable insights from collective user-generated and unstructured data that reveals things about human behaviour and communication patterns, sentiment analysis, and patterns of social influence. “Google searches, Facebook posts and Twitter messages, for example, make it possible to measure behaviour and sentiment in fine detail and as it happens.”Footnote 32

Facebook has the most extensive data set ever assembled on human social behaviour and its team of data scientists is looking for innovative ways to mine the troves of data for insights into human communication and social behaviour.Footnote 33 Since Facebook collects data from users interacting in real time, its data science team is in a unique position to be able to experiment and analyze patterns and motivations behind human social behaviour, preferences, and interactions. For example, its data can give insights into why and the extent to which ideas or fashions spread between people or the extent to which a person’s future actions are influenced by communications with their friends. In real-time, Facebook could track a social trend or calculate a country’s “gross national happiness” by analyzing key words and phrases that signal positive or negative emotions towards various things.Footnote 34 It could prove incredibly lucrative for Facebook to sell ‘insights’ mined from its data to companies who want to know how to induce people to share content or click on ads, or to economists and other researchers studying human social behaviour.

However, generating revenue is not the goal of all organizations interested in this type of application. A new initiative by the United Nations called “Global Pulse” is seeking to conduct sentiment analysis of messages in social networks and text messages to help predict job losses, spending reductions or disease outbreaks in a given region. The goal for the UN is to use early-warning signals and then direct assistance programs in advance of problems, such as preventing a region from slipping back into poverty.Footnote 35
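The keyword-based sentiment analysis mentioned in this section can be sketched in a few lines of Python. This is only an illustrative assumption about how such scoring might work: the word lists, the function names `sentiment_score` and `aggregate_index`, and the simple averaging are invented here and are not drawn from Facebook, Global Pulse, or any cited source.

```python
import re

# Hypothetical word lists; a real system would rely on a far larger,
# validated lexicon and would handle negation, sarcasm and context.
POSITIVE = {"happy", "great", "love", "awesome"}
NEGATIVE = {"sad", "terrible", "hate", "awful"}


def sentiment_score(post):
    """Return (positive word count) minus (negative word count) for one post."""
    words = re.findall(r"[a-z']+", post.lower())
    positive = sum(word in POSITIVE for word in words)
    negative = sum(word in NEGATIVE for word in words)
    return positive - negative


def aggregate_index(posts):
    """Average the per-post scores; returns 0.0 when there are no posts."""
    if not posts:
        return 0.0
    return sum(sentiment_score(post) for post in posts) / len(posts)
```

Even this crude approach shows why the technique scales so easily: once messages are available in bulk, an aggregate indicator for a region or a time period is a matter of counting words.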

Law enforcement and intelligence

Law enforcement and intelligence agencies have for a long time been using data mining and profiling techniques to predict or identify potential threats or criminal activity. In a society that is increasingly preoccupied with risks and threats,Footnote 36 there is a continuous concern that anyone can be the “bad man” and an enthusiasm for predictive technologies that preempt or prevent conduct that is perceived to generate social risk.Footnote 37 Predictive analytics products are becoming more popular amongst law enforcement agencies, and are already being implemented in the U.S. to help law enforcement forecast “hot spots” based on times and locations of previous crimes, combined with incident records, and historical and sociological information about criminal behaviour and patterns.Footnote 38 These pre-crime detection technologies continue to be developed and tested; some already claim that they can predict when crimes will be committed and who will commit them, before they actually happen.Footnote 39 IBM’s analytics tool touts that predictive analytics will help police move from “sense and respond” to “predict and act.”Footnote 40 Other programs seek to analyze behaviour and attribute patterns that are associated with criminal or terrorist activity.Footnote 41 Mandates for public safety and national security are commonly preoccupied with predicting which individuals are most likely to be terrorists or to commit crimes. With this as a goal, the heightened interest in preemptive predictions only continues to grow.

Location tracking

Mobile smartphones and tablets are making it possible and popular for someone to “geo-tag” their location in real-time. Mobile apps and online services are encouraging and facilitating this trend, and a recent study suggests that predicting someone’s future location by tracking mobile phone usage is proving quite accurate and effective.Footnote 42 In this study, an algorithm was able to predict a mobile phone user’s future GPS coordinates to within approximately 1,000 square metres. When the prediction took into account additional information from a single friend, the future location could be predicted to within 20 metres.Footnote 43 Even without geo-tagging, the study also showed that location could be predicted with similar accuracy using the geographic location of cell network towers.Footnote 44 The ability to track and predict an individual’s movements could be appealing for companies interested in tailoring advertisements based on preferential predictions, or to law enforcement seeking to predict and preempt criminal activity. Companies would be able to predict an individual’s movements so that they can offer a tailored deal at precisely the right moment. For example, a company using the technology could predict when you go on coffee break and where you tend to go so that it can offer a special coupon just as you are heading out the door.Footnote 45 Alternatively, law enforcement could use the technology to track and predict the location of certain individuals who are suspected criminals. Provided that law enforcement could obtain a warrant to access the GPS location data, the algorithms would enable them to monitor patterns in a suspect’s movement and intervene when the algorithm suggests future movement to an unusual area.Footnote 46
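To make the idea of location prediction more concrete, the following Python sketch predicts where someone is likely to be at a given hour from nothing more than a log of past observations. It is a deliberately simple stand-in and not the algorithm used in the study cited above; the function names `build_model` and `predict_place`, the hour-of-day grouping, and the sample data are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def build_model(visits):
    """Group past observations by hour of day.

    visits: iterable of (hour_of_day, place) pairs, e.g. (10, "cafe").
    Returns a mapping from hour to a Counter of places seen at that hour.
    """
    by_hour = defaultdict(Counter)
    for hour, place in visits:
        by_hour[hour][place] += 1
    return by_hour


def predict_place(model, hour):
    """Return the place most often observed at this hour, or None if unseen."""
    counts = model.get(hour)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical observation log, invented for illustration.
sample_visits = [(10, "cafe"), (10, "cafe"), (10, "office"), (18, "gym")]
sample_model = build_model(sample_visits)
```

With the sample log above, `predict_place(sample_model, 10)` returns `"cafe"`, which mirrors the coffee break coupon scenario: a routine only has to repeat a few times before it becomes predictable.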

Fraud Prevention

There are also areas in which the government or the private sector could utilize predictive analytics for fraud prevention. Government programs concerned with fraudulent transactions or claims for government benefits, insurance and credit reporting, could use pattern and trend analysis aimed at detecting and deterring fraud or false claims. For example, Service Canada's Integrity Services Branch is utilising statistical software as part of a Predictive Risk Analysis pilot project designed to detect Employment Insurance (EI) fraud and abuse.Footnote 47 The idea is that the predictive risk tool would analyze multiple databases and significantly improve the identification of EI applicants who have been overpaid. Each file that is flagged for review by the system would then be investigated. The program represents a shift to automated fraud detection and general risk management, which is now facilitated through the use of the analytics tools.
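The general pattern described above, in which records from more than one database are scored and high-scoring files are flagged for human investigation, can be sketched as follows. This is not a description of Service Canada's tool: the `Claim` fields, the scoring rule, the `threshold` value and the function names are all hypothetical and chosen only to illustrate the flag-then-review workflow.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    claim_id: str
    reported_weeks_unemployed: int
    # Hypothetical field standing in for data matched from a second database.
    weeks_with_payroll_record: int


def risk_score(claim):
    """Fraction of claimed weeks for which a payroll record also exists."""
    if claim.reported_weeks_unemployed <= 0:
        return 0.0
    overlap = min(claim.weeks_with_payroll_record,
                  claim.reported_weeks_unemployed)
    return overlap / claim.reported_weeks_unemployed


def flag_for_review(claims, threshold=0.25):
    """Return the IDs of claims whose score meets the (assumed) threshold.

    A flag is only a prompt for a human investigation, not a finding of fraud.
    """
    return [claim.claim_id for claim in claims if risk_score(claim) >= threshold]
```

The choice of threshold determines how many people are subjected to scrutiny, which is why the accuracy and transparency of such scoring rules matter for the privacy issues discussed later in this paper.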

The Privacy Implications

It is overly simplistic to presume that all or most data analytics is entirely problematic from a privacy point of view.Footnote 48 Predictive analytics can take many forms, and the privacy implications will vary according to the context in which it is used, as well as the scope and implementation. Big Data and intelligent predictive analytics could, on the one hand, help advance research, innovation, and new approaches to understanding the world and help make important and socially valuable decisions in fields such as public health, economic development and economic forecasting.Footnote 49 On the other, advanced analytics prompt increased data collection, sharing and linkages, and have the potential to be incredibly invasive and intrusive, discriminatory, and yet another underpinning of a surveillance society. The following section will contemplate some of the broader individual and societal privacy implications that can arise with the implementation of predictive analytics.

Predictive analytics can be “creepy”

While it is not a universal reaction, predictive analytics in certain contexts can prompt a “creepy” or unsettling feeling of being under the gaze of an omniscient observer who knows something about us and our behaviour.Footnote 50 danah boyd has argued that the collection of data in and of itself is not a violation of privacy, but explains that “piecing it together and using it to ‘stare’ is a serious violation of privacy norms.”Footnote 51

Mis-characterizations and inaccuracies are certainly a problematic outcome of analytics; however, accurate predictions could be an even greater invasion of privacy in certain contexts. Millar points out that the outcomes of predictive analytics can reveal things that form part of what he refers to as our “core private information.” He explains that the use of predictive analytics and data mining can violate core privacy when it reveals an individual’s unexpressed desires, beliefs or intentions to which only the individual would have first-person access.Footnote 52 He further explains that inferences made about us through the use of predictive algorithms, or deep analysis by a team of trained specialists with access to stores of data, could lead us to feel as though our core privacy was violated because they are often inferences that are beyond what could otherwise be easily observed or known by others about us.Footnote 53

In other words, presumptions that are made about people based on activities that are easily observable are not generally alarming to people. On the other hand, presumptions that are derived from a deeper and broader inspection of our activities, or that are made with technical assistance, may be unexpected or go beyond our reasonable expectations.

Opaque processes and outcomes

Predictive analytics is usually an opaque process. Even where a company acknowledges that it will use personal information in analytics, for example in a privacy policy, we know that most people do not read or understand the complex and arduous legal language in which such policies are usually written. A recent poll commissioned by the Office of the Privacy Commissioner of Canada found that 50% of Canadians who responded to the survey indicated they “rarely” or “never” consult privacy policies, and a majority (62%) of Canadians felt that the privacy policies of Internet sites they visit are somewhat or very vague in terms of giving them the information about what the company will do with their personal information.Footnote 54 Moreover, even where people read the fine print of a privacy policy or terms and conditions, human beings in general are not very good at weighing the impact of consequences that could be well into the future, and the risks increase as we disclose more, something that the design of social media conditions us to do.Footnote 55

For the most part, the extraction of personal information happens without consumers knowing exactly how much they have really provided. People usually do not have any choice as to what personal information they have to give up, or knowledge of who has access to it or how it is used.Footnote 56 While individuals do have some control at the front end, in terms of what they post and share on the Internet and what transactions they complete, their choice to participate is not usually based on a comprehensive understanding of how their data is being manipulated behind the scenes and beyond the moment of their transaction.

Some researchers refer to this as an ‘asymmetrical information flow.’Footnote 57 Companies, organizations, and governments are all trying to learn more intimate details about consumers and citizens, their behaviours, habits, intentions, etc., but individuals know proportionally very little about the organizations with whom they interact.Footnote 58 The complex processes that underlie predictive analytics or data mining techniques are lost on many individuals, if not most. This leaves people perplexed and completely unaware of the reasons why they may have been “denied a loan, targeted for a particular political campaign message, or saturated with ads at a particular time and place when they have been revealed to be most vulnerable to marketing appeals.”Footnote 59 Even if companies were to include detailed references to their activities, for example in a privacy policy, it seems unlikely that individuals or consumers would have the motivation or capacity to learn as much about them as they do about us.

Moreover, when predictive analytical techniques are used by government, it will often be the case that the analytical process and its outcomes are tightly concealed from the public for reasons of national security or public safety. Regardless of the specific application, many of the prediction algorithms and software applications are opaque because they “are subject to copyright and trade secret laws, so the public does not get to know who wrote them, how they work or whether the assumptions upon which they are based are sound.”Footnote 60

Discrimination and damage to reputation

Where predictive analytics are used to make inferences about future behaviours and intentions, there is a risk that individuals will be profiled and categorized and potentially become the subject of discrimination according to those predictions. If we consider this issue in the context of targeted advertising, the concern would be that profiles of consumers may lead to “exclusion of access to goods and services” or “price discrimination based not on the goods and services, but on the identities or profiles of customers.”Footnote 61 In this scenario, dynamic pricing responds to personal characteristics such as wealth (ability to pay), urgency of need, vulnerability to certain enticements, or marketing approaches.Footnote 62

The issue is not simply about the collection of information, but also what is inferred from the information. Reputation can be a gatekeeper to services, and it is easy for reputation to be built upon inaccurate information or information taken out of context.Footnote 63 It is one kind of problem to be excluded from receiving a certain type of advertisement, but it is potentially much more damaging to be excluded from a government program or subject to a disproportionate level of scrutiny based on misinformation.

Preemption could undermine due process

The aim of predictive and pre-emptive analysis is to make assumptions about what will happen before it even happens. From an ethical perspective, the careless and excessive adoption of technologies that anticipate wrongdoing before it even occurs could have a significant impact on our traditional models of justice, due process and individual freedoms.Footnote 64 The due process concept requires that individuals have an ability to observe, understand, participate in and respond to important decisions or actions that implicate them.Footnote 65 The opaque nature and complexities of analytics could make it more difficult to accomplish this level of transparency and fairness in process.

Predictive Analytics and PIPEDA

Private sector companies in Canada that carry out predictive analytics using personal information will have to ensure their practices are in compliance with the privacy principles contained in PIPEDA. Consent, purpose and use limitation, openness and transparency, and accountability will be some of the key areas to reflect on when we examine how predictive analytics is utilized in Canada in different contexts. The following section will identify some of the trigger points where predictive analytics could raise concerns under PIPEDA.

Knowledge and consent

Consent is a key governing principle under PIPEDA, requiring that individuals have a basic understanding of how their information will be used in order to provide meaningful consent for its collection and use. The issue is that a traditional model for consent is somewhat difficult to apply to the complex and dynamic scenario of data analytics. While people value privacy, they seem to be largely willing to exchange their personal information for online goods and free services. Companies tend to obtain consent for these practices by including an ambiguous “notice” to individuals, and that notice is typically buried deep within the confines of a privacy policy that contains complex legal language. Some argue that this form of consent has no meaning unless individuals have a genuine awareness of the profiles that are being compiled and fully understand how their data is being manipulated: “To know which of your data you want to hide you need to know what profile they match; to know if you want programs and profiles automatically adapted to your behaviour, you need to know when and how this happens… [and] these initiatives should not merely be left to contingent market incentives.”Footnote 66 It is extremely onerous to ask individuals to derive this understanding and provide meaningful consent by reading the convoluted language in privacy policies. Moreover, the reality is that mergers and subcontracting, data sharing agreements between companies and organizations, the scope of data collection points and the linkages being made, and the capacities of the predictive algorithms that only data scientists can really understand, all make this technical and business environment complex and variable over time.

Openness and transparency

The complexity and variability of the online and business environment also pose problems for transparency. Achieving transparency should mean that information handling practices are conveyed to users in a way that is relevant and meaningful to the choices they must make.Footnote 67 When we consider the power dynamics and the information asymmetries, the goals of organizations versus the goals of individuals to be social or to utilize innovative technology, and the complexity and largely hidden nature of analytical tools such as predictive analytics, it is no wonder that transparency is a difficult privacy principle to observe. Nissenbaum does not think it is possible to explain the current online advertising ecosystem in a useful way without resorting to a lot of detail. Typically, an organization will explain its personal information handling practices using complex legal and contractual language contained in lengthy terms and conditions or privacy policies, which are generally not read or understood by a majority of people.

Nissenbaum’s term, the “transparency paradox,”Footnote 68 captures the essence of this problem. She explains that if a privacy policy finely details every flow, condition, qualification, and exception, it is unlikely to be understood, let alone read;Footnote 69 however, summarizing information handling practices in a more simplistic style is no more helpful because it omits the important details that are likely going to make a difference for privacy.Footnote 70 Describing the practices and intended outcomes of predictive analytics is already challenging due to its intricacies and complexities, and there should be clear concerns raised where organizations obscure the details of their activities within ambiguous privacy policy statements.Footnote 71 One should be mindful of “simulated transparency,”Footnote 72 whereby the approach to transparency actually ends up being incomprehensible to the users and overly permissive to the company.

Accountability

Accountability is a key governing principle for organizations that implement predictive analytics. Being an accountable organization is about more than simply having privacy policies or designating a chief privacy officer. It is about having a business model that gives effect to all the privacy principles, and their underlying meaning.

Ethics is at the foundation of the fair information and privacy principles, and other notions such as appropriate flows of information, reasonable expectations of privacy and contextual integrity. It is fundamentally about acting in consideration of the effects on others and in that way should always play a key role in assessing the privacy implications of the different applications for predictive analytics. Paul Schwartz underscores the importance of ethical analytics and a contextual approach to understanding its implications. He emphasizes that essential components to responsible and ethical use of analytics are the privacy principles, such as notions of accountability and proportionality.Footnote 73 He provides some useful overarching ethical considerations that companies and organizations should respect before using predictive analytics:Footnote 74

  • Ethical use of analytics should be driven by a company’s obligation to be a socially responsible actor.
  • An organization’s processing, analysis, and decision-making through analytics should respect cultural and social norms about acceptable behaviour, and the use of “sensitive” information.
  • A company needs to be accountable, acknowledging that analytics could have a negative as well as a beneficial impact on individuals.
  • A company should assess the impact of its use of analytics on the trust in the company held by a wide range of stakeholders.

An assessment of potential impact is the starting point for organizations that are contemplating the use of predictive analytics. This notion ties in well with the OPC’s Accountability Document,Footnote 75 which states that “an accountable organization can demonstrate to customers, employees, shareholders, regulators, and competitors that it values privacy, not only for compliance reasons, but also because privacy makes good business sense.”

Predictive Analytics and the Privacy Act

Government departments and agencies in Canada that carry out predictive analytics using personal information will have to ensure that they are doing so in accordance with the provisions of the Privacy Act and in accordance with Treasury Board of Canada Secretariat (TBS) policies, such as the Directive on Privacy Practices and the Directive on Privacy Impact Assessments. The proliferation of data has been a key catalyst and ingredient in the emergence of predictive analytics in the private sector, and the public sector trend is similar in that it is marked by increased information sharing across different programs, increased outsourcing or contracting with private sector companies, and increased collection of information particularly where “intelligence” is being sought.

The Privacy Act does place some limits with regard to the collection of personal information, namely that personal information is only collected where it directly relates to an operating program or activity of the institution, and that government programs shall, wherever possible, collect personal information directly from the individual to whom it relates. Government departments are required to inform individuals of the purpose and authority for the collection of personal information, and to identify new consistent uses for personal information. It may be in those new “consistent uses” that we find the emergence of predictive analytics as a tool to replace manual scrutiny and analysis of large quantities of data. However, and particularly with public safety and fraud prevention programs, the decision to implement predictive analytics should be preceded by a careful consideration of the privacy impacts, and a thorough assessment of the necessity for using predictive analytics, whether its use is reasonable and proportionate to the outcome being sought, and whether the program will be effective while also being minimally intrusive.Footnote 76

Conclusion

The Target story that initiated this research report was a revealing illustration of the prominence being placed on data analytics and the position of “data scientists” within companies. The pregnancy-prediction algorithm was concerning because it demonstrated how analytics can yield very personal inferences about people, how opaque the process generally is, and how it can generate feelings of being manipulated or profiled.

This is a fast-evolving field and the scale of data aggregation and analysis is escalating at a magnitude that outpaces much of the concerned dialogue around the practice. Predictive analytics is a tool that can be applied in a variety of different ways, and ethical considerations and fair information and privacy principles can help frame a contextual approach to assessing the privacy risks associated with a given application of predictive analytics, and identify the uses of most concern. Applying ethical considerations should always begin with the realization that the decisions made based on the outcomes of analytics can have a negative effect on people, that certain information should not be collected for the purposes of analytics, and that there should be boundaries and reasonable limits on the types of assumptions that can be made about people’s future intentions and behaviours.Footnote 77

Duhigg claims that predictive analytics experts are saying “someday soon, it will be possible for companies to know our tastes and predict our habits better than we know ourselves.”Footnote 78 The vast potential and expanding scope for predictive analytics makes it an issue that is very much on the radar of those concerned with privacy. However, its complexity and obscurity make it difficult to grasp the extent to which it is already used in Canada, what outcomes are being sought, and to what extent it is effective, or to delineate in advance which specific forms of analytics will raise privacy concerns and which may not. In that way, this research report is only the first step in monitoring the trends in predictive analytics and acquiring a better understanding of the challenges that lie ahead.