Complaints under the Personal Information Protection and Electronic Documents Act (the Act)
The Office of the Privacy Commissioner of Canada initiated three complaints against Google Inc. (Google) on May 31, 2010, pursuant to subsection 11(2) of the Act, after being made aware that Google Street View cars had been collecting payload data from unencrypted WiFi networks during their collection of publicly broadcast WiFi signals (service set identifier [SSID] information and media access control [MAC] addresses).
The three complaints are as follows:
Google’s collection, use or disclosure of payload data was done without individuals’ prior knowledge and consent;
Google’s collection of payload data was done without prior identification of the purposes for which personal information (PI) was collected;
Google’s collection of payload data was not limited to that which was necessary for the purposes identified.
Summary of Investigation
Following a request from the German data protection authority in Hamburg to audit the WiFi data collected by Google’s Street View cars during a location-based project, Google discovered in May 2010 that it had been collecting payload data from unsecured wireless networks as part of its collection of WiFi data. By Google’s own admission, this inadvertent collection resulted from the integration of code developed in 2006 into the software used to collect WiFi signals. As a result, Google grounded its Street View cars, stopped the collection of WiFi network data on May 7, 2010, and segregated and stored all of the data already collected.
On June 1, 2010, our Office sent a letter to Google stating that it was launching an investigation into Google’s collection of payload data. Google responded on June 29, 2010.
On June 28, 2010, pursuant to subsection 11(2) of the Act, this Office requested to undertake a site visit to Google’s facility in Mountain View, California. The purpose of this site visit was twofold: 1) to allow the review of the payload data gathered by Google, and 2) to ask specific questions of Google’s representatives, such as the circumstances surrounding this incident, the segregation and storage of the payload data, and the mitigation and prevention measures Google intended to implement.
Google agreed to a site visit. Two technical representatives from this Office then went to the Mountain View facility on July 19, 2010. Although our technicians reviewed the payload data, no Google representatives were available in Mountain View to answer our questions. Instead, by letter dated July 16, 2010, Google answered general questions we posed in a questionnaire we sent on July 12, 2010.
On August 18, 2010, a videoconference was held between Google’s counsel and this Office in order to answer supplementary questions.
The results of our investigation into the three complaints against Google are summarized below in the following sections:
Google’s Product Counsel’s involvement in product review;
Circumstances surrounding the collection of payload data and technical testing;
Personal information collected;
Segregation and storage of the payload data;
Google’s future plans for its location-based services; and
Privacy implications of future plans, and mitigation and prevention measures that Google intends to implement to prevent a recurrence.
A. Google’s Product Counsel’s involvement in product review
Google advised that it has a formal review process for each external product launch. (“External product” denotes a product to be offered to consumers.) This process requires that a Product Counsel assess, among other things, the privacy implications of the product.
Since the code ultimately used to sample all categories of publicly broadcast WiFi data is not considered by Google to be an external product, the formal review process did not apply.
However, our investigation revealed that Google’s code design procedure includes a template and process by which code must be reviewed by Product Counsel before being used or integrated with another Google product. The template, a methodology document, is in fact mandatory and is the first step in the code design procedure.
Our investigation also revealed that, in the code design procedure document for the code later used to collect WiFi signals, the engineer did identify one or more privacy concerns about the information collection. These concerns related to the fact that Google could obtain sufficient data to precisely triangulate a user’s position at a given time.
The engineer qualified his concerns as being “superficial privacy implications”. He did not forward his code design documents to Product Counsel for review—contrary to company procedure. Thus, the code’s privacy implications were never assessed.
We were also informed that Google’s Product Counsel members are practising lawyers with various legal backgrounds. Google claims that they usually have some private-sector experience with privacy issues.
According to Google, Product Counsel Members attend the same introductory training session available to all new Google employees. As well, Product Counsel Members participate in weekly privacy- and security-issue meetings. Google also claims that “Privacy is part of the ongoing CLE [Continuing Legal Education] obligations of Google counsel.”
B. Circumstances surrounding the collection of payload data and technical testing
Google allows its engineers to use 20% of their time to work on projects of interest to them. When using this time in 2006, a Google engineer developed code to sample all categories of publicly broadcast WiFi data.
The engineer involved included lines in the code that allowed for the collection of payload data. He thought the data might be useful to Google in the future and that this type of collection would be appropriate.
This code was later used by Google when it decided to launch a particular location-based service. The service relies on a variety of signals (such as GPS, the location of cell towers and the location of WiFi access points) to provide the user with a location. Google installed antennas and appropriate software (including Kismet, an open-source application) on its Google Street View cars in order to collect publicly broadcast WiFi radio signals within the range of the cars while they travelled through an area. These signals are then processed to identify the WiFi networks (using their MAC address) and to map their approximate location (using the GPS co-ordinates of the car when the signal was received). This information on the identity of WiFi networks and their approximate location then populates the Google location-based services database.
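To illustrate the mechanics described above, the following sketch shows how drive-by sightings of a BSSID, each tagged with the car’s GPS fix at the moment the signal was received, could be aggregated into an approximate access-point location. This is illustrative only: the names and the simple centroid method are assumptions, not Google’s actual algorithm.

```python
from collections import defaultdict

def estimate_ap_locations(observations):
    """Group drive-by WiFi sightings by BSSID and estimate each access
    point's position as the centroid of the GPS fixes at which its
    beacon was received (a deliberately simplified placement method)."""
    sightings = defaultdict(list)
    for bssid, lat, lon in observations:
        sightings[bssid].append((lat, lon))
    locations = {}
    for bssid, fixes in sightings.items():
        lat = sum(f[0] for f in fixes) / len(fixes)
        lon = sum(f[1] for f in fixes) / len(fixes)
        locations[bssid] = (round(lat, 6), round(lon, 6))
    return locations

# Example: three sightings of one access point, one of another.
obs = [
    ("00:11:22:33:44:55", 45.4215, -75.6972),
    ("00:11:22:33:44:55", 45.4217, -75.6970),
    ("00:11:22:33:44:55", 45.4213, -75.6974),
    ("66:77:88:99:aa:bb", 45.4300, -75.7000),
]
print(estimate_ap_locations(obs))
```

A database of such BSSID-to-coordinate records is all a location-based service needs; none of the payload carried over the network is required for this purpose.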
In its representations to this Office, Google provided technical information on how it uses WiFi network data for location-based services. Google stated that its software does not store payload transmissions from encrypted networks, but that payload data sent over unencrypted WiFi networks is collected and “dumped” on a disk in raw format.
However, according to Google, the information thus collected would be fragmented because its cars are on the move when collection occurs and the equipment it uses to collect WiFi signals automatically changes channels five times per second.
In the course of our investigation, Google acknowledged that it erred in including, in the software used to collect WiFi network information, code allowing the collection of payload data. Google contends that the code was originally designed for a different data-collection purpose, which preceded its ultimate application in the collection of WiFi network information for location-based services. Google claims that it was not aware of the presence of this code when it began using the software for its geo-location project.
It claims that when the decision was made to use the software for collecting publicly broadcast WiFi information, the code was reviewed for bugs and validated by a second engineer before being integrated with, and installed on, Street View cars. The purpose of this review was to ensure the code did not interfere with normal Street View operations. The code was not further examined to verify what kind of data was actually being obtained through the collection of WiFi publicly broadcast signals.
Google admitted that since it was not its intention to collect payload data and it never intended to use payload data in any of its products, it was not in a position to identify any purposes for the collection of these data or seek consent from affected individuals. Google also admitted that it did not inform any affected individuals of the fact that it was collecting payload data since its employees did not realize they were doing so until May 2010.
Google provided three reasons to explain why the collection of payload data was not discovered earlier:
No one other than the engineer who developed the code was interested in looking at this program. No one thought payload data would be useful and no one had planned to use this data.
Payload data comprised a minuscule amount of the total data collected. Its collection was thus of minimal concern and no one had any reason to examine it.
The engineer had not seen the ramifications of including this code and, consequently, had not spoken of it with his manager.
Google also asserted that, since it had no purpose for the collection of payload data, there can be no justification for its retention. Consequently, Google wishes to securely destroy the data as soon as possible and is seeking this Office’s authorization to do so.
Our investigation revealed that Google collected WiFi data in Canada from March 30, 2009 to May 7, 2010, and that its Street View cars have driven through most urban areas and along major roads.
Google stated that it cannot accurately distinguish between WiFi networks and wireless devices. It can, however, identify the number of unique basic service set identifiers (BSSIDs), each of which generally identifies a single WiFi access point. Although a BSSID does identify an access point, it does not indicate how many devices or networks connect through that access point.
Google estimates that it collected over 6 million BSSIDs over the period its Street View cars drove throughout Canada.
C. Personal information collected
Our two technical experts visited Google’s offices in Mountain View, California on July 19 and 20, 2010. The purpose of this site visit was for them to examine the data that had been collected by Google’s Street View cars for Google’s location-based services so as to determine its nature and the quantity involved. Their examination focussed on finding examples of personal information within the WiFi payload data collected in Canada.
Our technical experts searched the payload data for anything that could constitute personal information (e.g., examples of e-mail, usernames, passwords and phone numbers). Through an automated search, they produced an approximate count of possible personal information; for example, the count included 787 e‑mail headers and 678 phone numbers. However, a match does not guarantee an accurate identification: the searches may have captured irrelevant items or missed others.
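An automated search of this kind can be sketched with pattern matching. The patterns below are hypothetical examples, not the expressions actually used by our experts; the sketch also shows why such counts are only approximate, since a pattern can match irrelevant strings and miss unusual formats.

```python
import re

# Hypothetical patterns of the kind an automated scan might use.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def count_matches(text):
    """Return an approximate count of each pattern in the text.
    Counts are approximate: the patterns produce both false
    positives and false negatives."""
    return {name: len(pat.findall(text)) for name, pat in PATTERNS.items()}

sample = "Contact alice@example.com or call 613-555-0101."
print(count_matches(sample))  # {'email': 1, 'phone': 1}
```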
To complement the automated search, our experts performed a manual verification for five instances of each type of personal information. This was to demonstrate the existence of each data type, while preventing our experts from intruding too deeply into any individual’s personal information.
Our technical experts found at least five instances of e-mails in which they noted the presence of e-mail addresses, complete e-mail headers, IP addresses, machine hostnames, and message contents. The messages in these five instances were truncated, but when performing manual verification for other items (e.g., phone numbers), the experts observed complete e-mail messages.
They also found five instances of usernames. These could be seen in cookies, MSN messages and chat sessions. They also found one instance where a password and username were included in an e-mail message that a person was sharing with others to tell them how to log in to a server.
Our experts also found at least five instances of real names of individuals, five instances of residential addresses and five more of business addresses. They noted that, unlike the residential addresses, the business addresses were very common.
They also found five instances of instant messenger headers and five instances of phone numbers—both business and personal phone numbers. Like business addresses, business phone numbers were easier to find than personal ones.
A search for nine-digit or sixteen-digit numbers, which could have been Social Insurance Numbers (SINs) or credit card numbers, was inconclusive because the dataset contained too many irrelevant but similar-looking numbers. Therefore, although we found no evidence of SINs or credit card numbers being collected, we still cannot entirely rule out the possibility that they were.
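To illustrate why raw digit-pattern searches are inconclusive, the sketch below pairs a sixteen-digit search with the Luhn checksum that payment card numbers satisfy, which filters out roughly ninety percent of random digit runs. This is illustrative only and not necessarily the method our experts used.

```python
import re

def luhn_valid(digits: str) -> bool:
    """Luhn checksum used by payment card numbers; most random
    sixteen-digit strings fail it, pruning false positives."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:          # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def card_candidates(text):
    """Find sixteen-digit runs and keep only those passing the Luhn check."""
    return [m for m in re.findall(r"\b\d{16}\b", text) if luhn_valid(m)]

# One arbitrary digit run and one Luhn-valid test number.
sample = "order 1234567812345678 ref 4539578763621486"
print(card_candidates(sample))  # ['4539578763621486']
```

Even with such a checksum, a surviving match is only a candidate, not proof that a card number was present, which is why our findings on this point remain inconclusive.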
Our technical experts also noticed sensitive items during their searches. For example, they found a list of names, phone numbers, addresses and medical conditions for specified individuals. They also found a reference to someone stopped for a speeding violation, along with address information.
Our experts often saw cookies being passed from client machines to Web servers. These cookies were unencrypted and some contained personal information, including IP addresses, user names and postal addresses. They were surprised by the frequency of unencrypted cookies containing personal information.
In summary, our experts found many instances of personal information in the sample they took of the payload data collected in Canada by Google.
D. Segregation and storage of the payload data
The WiFi data was collected through WiFi antennas attached to the roof of Street View cars. This WiFi antenna passively received the publicly broadcast radio signals within range of the car using open-source Kismet software. The data was then relayed to a Google-developed application called “gStumbler” and its executable program “gslite”, which processed the data for storage. The data was then saved to hard drives physically located in each Street View car and then subsequently transferred to Google’s servers.
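As a hypothetical sketch (the field and function names below are assumptions, not Google’s actual gslite code), processing limited to the stated purpose would retain only the network-identification fields and discard the payload regardless of whether the network was encrypted:

```python
from dataclasses import dataclass

@dataclass
class Frame:
    """A captured WiFi frame with the car's GPS fix at capture time."""
    bssid: str          # MAC address identifying the access point
    encrypted: bool     # whether the network was encrypted
    payload: bytes      # frame contents (not needed for geo-location)
    lat: float
    lon: float

def process_frame(frame: Frame) -> dict:
    """Keep only the network-identification fields needed for a
    location database; the payload is dropped unconditionally."""
    return {"bssid": frame.bssid, "lat": frame.lat, "lon": frame.lon}

f = Frame("00:11:22:33:44:55", False, b"GET /inbox HTTP/1.1", 45.4215, -75.6972)
record = process_frame(f)
assert "payload" not in record
print(record)
```

The over-collection at issue arose precisely because the software stored the raw payload of unencrypted frames rather than discarding it at this step.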
Google alleges it grounded its Street View cars and segregated the payload data on a restricted area of its network as soon as it became aware that its gStumbler application was collecting payload data from unencrypted WiFi networks.
As a follow-up step, between May 9 and May 13, 2010, a Google system administrator copied the files containing the payload data collected in all affected countries onto a total of four disks. These disks contained two copies of the data: one made after the data files had been categorized and labelled by country, and one made before categorization.
On May 15, 2010, the system administrator consolidated the payload data onto an encrypted hard drive, segregated by country. A second copy of the encrypted hard drive was made for security and backup preservation. The four original disks were then destroyed in a disk deformer.
A Google employee personally delivered one encrypted hard drive to another Google location for safekeeping, while the system administrator kept the other in a secure location. Once the employee arrived at the destination, the system administrator permanently destroyed the backup encrypted hard drive. The US data was then segregated onto a separate encrypted drive, while the data from the rest of the world remained on the initial encrypted drive.
E. Google’s future plans for its location-based services
Google still intends to offer location-based services, but it has discontinued the collection of WiFi data through its Street View cars and has no plans to resume it.
Google does not intend to contract out to a third party the collection of WiFi data.
Google intends to rely on its users’ handsets to collect the information on the location of WiFi networks that it needs for its location-based services database. The improvements in smart-phone technology in the past few years have allowed Google to obtain the data it needs for this purpose from the handsets themselves.
Although it has no tracking tool to keep records of a customer’s locations (and does not intend to create one), Google acknowledges that it does need to examine the potential privacy concerns of this method of collection.
F. Privacy implications of future plans, and mitigation and prevention measures
Google also states that as products are chartered or otherwise provided with resources and staffing, they are assigned to a Product Counsel in Google’s legal department. This individual has a first-level responsibility for identifying privacy issues in a product.
In order to avoid a recurrence of a product design having a negative impact on privacy, Google claimed to be reviewing its product launch procedures, code review procedures and 20% time policy. In so doing, it would ensure that its internal controls are robust enough to adequately address future issues. As of the issue date of this report, Google’s review of its procedures/policies has not yet been completed.
In making our determinations, we applied Principles 4.1.1 and 4.1.2 of the Personal Information Protection and Electronic Documents Act. Principle 4.1.1 stipulates that accountability for the organization’s compliance with the principles rests with the designated individual(s), even though other individuals within the organization may be responsible for the day-to-day collection and processing of personal information. In addition, other individuals within the organization may be delegated to act on behalf of the designated individual(s). Principle 4.1.2 continues that the identity of the individual(s) designated by the organization to oversee the organization’s compliance with the principles shall be made known upon request.
We also applied Principle 4.2, which states that the purpose for which personal information is collected shall be identified by the organization at or before the time the information is collected.
Principle 4.3 states that the knowledge and consent of the individual are required for the collection, use, or disclosure of personal information, except where inappropriate.
Lastly, Principle 4.4 states that the collection of personal information shall be limited to that which is necessary for the purposes identified by the organization.
On September 15, 2010, I shared an earlier version of this report with Google and invited its response. Taking that response into consideration, I have revised my preliminary letter of findings. What follows is a summary of our findings and recommendations.
Collection of personal information
During their site visit, our technical experts uncovered substantial amounts of personal information, including e-mail message content and e-mail, IP and postal addresses, captured in Google’s collection of payload data in Canada.
Google acknowledged to this Office that it did collect payload data, but not with the intent of using it in any of its products. According to Google, it was “simply mistaken” in collecting the data and did not seek consent from the affected individuals. Principle 4.3 of the Act requires that the knowledge and consent of the individual be obtained for the collection, use or disclosure of their personal information.
Google also stated that it had not identified any purposes for the collection of the payload data. Principle 4.2 requires that such a purpose be identified at or before the time of collection. Further, Principle 4.4 stipulates that the collection of personal information be limited to that which is necessary for the purposes identified. Since no purpose was identified, the collection in this case clearly could not be limited to any specific purpose, in violation of Principle 4.4.
Google’s Product Counsel’s involvement
Due to the engineer’s failure to forward his design document to the Product Counsel, the Counsel was unable to assess the privacy implications of the code designed to collect WiFi data. This is a careless error that I take very seriously since a review of design documents by a Product Counsel (and the use of a template) is clearly a mandatory step in Google’s code design procedure.
As a result, the unscrutinized code was later used to collect data containing personal information. Had the Product Counsel been involved when and as it should have been, Google might have discovered the risk of data over-collection and would have been in a position to remedy the situation before any collection took place. The ensuing negative effects on citizens’ privacy and on Google’s reputation could easily have been avoided.
Google informed our Office that engineering and product teams are accountable for complying with Google’s privacy policies and principles. Google then stated that it is working towards improving its code-and-product review processes, as well as accountability mechanisms, for engineering and product management personnel in order to improve their sensitivity to privacy issues at all stages of product and code development. A legal team is working with engineering directors to ensure a comprehensive review of codes for any privacy issues. Google believes that the review of its policies and procedures that it has undertaken will ensure no recurrences. Google stated that it will keep this Office informed as Google completes its review.
Code review and testing
Google asserted that the engineer who developed the lines of code did not foresee their ramifications, namely that they would ultimately allow the collection of a broader range of data from wireless networks. Our investigation could not determine with certainty whether this was a one-time error by one individual or a sign of a more generalized lack of awareness among employees with regard to the privacy implications of new products. At Google, the effects of new products on privacy should be well understood not only by the Product Counsel but also by the professionals who develop those products.
In this case, the review and testing of the product containing the code were insufficient to assess privacy impact. It would appear that the review consisted merely of ensuring that the product did not interfere with a second application—that used to collect pictures of the streets navigated by Street View vehicles.
As our investigation revealed, the review was not able to assess the extended capabilities of the product—including its ability to collect more information than necessary for the location-based project.
Steps taken to protect payload data
Once Google realized its Street View cars were collecting more data from wireless networks than anticipated, it expressed regret for inadvertently collecting the publicly broadcast data. It immediately grounded its vehicles and took measures to safeguard the collected payload data and segregate it by country of origin.
Google’s actions were justified, appropriate and sufficient to safeguard the payload data collected in Canada. In my view, Google upheld the related safeguard provisions under the Act.
Concerning the data that Google collected, it affirmed that it has
no desire to use the Canadian payload data in any manner and will continue to secure the data with strenuous access restrictions until it is deleted.
To this, I would like to add that not only privacy laws, but other applicable laws in the U.S. and in Canada, including laws of evidence, must also be taken into account in determining when to delete the Canadian payload data collected.
The fact that Google does not intend to resume collection of WiFi data with its Street View cars eliminates the possibility of further inappropriate collection of personal information through the tool developed by its engineer.
However, from users’ handsets, Google intends to obtain the information needed to populate its location-based services database. This alternative method of collection could also lead to inappropriate collection and retention of personal information if Google does not put in place appropriate safeguard measures.
I share Google’s goal of avoiding any recurrence of similar violations of individuals’ privacy. While I am pleased that Google has undertaken a review of its processes and procedures that could have an impact on privacy, I would nonetheless like the organization to ensure that these controls are complemented by an overarching governance model covering all privacy issues pertaining to the design of internal and external products and services. I would also like Google to adhere to reasonable timelines in implementing both the governance model and the revised processes and procedures. With this in view, and after reviewing the additional information Google provided to this Office, I am making the following recommendations:
That Google re-examine and improve the privacy training it provides all its employees, with the goal of increasing staff awareness and understanding of Google’s obligations under privacy laws.
That Google ensure it has a governance model in place that includes:
effective controls to ensure that all necessary procedures to protect privacy have been duly followed prior to the launch of any product;
clearly designated and identified individuals actively involved in the process and accountable for compliance with Google’s obligations under privacy laws.
That Google delete the Canadian payload data it collected, to the extent that Google is allowed to do so under Canadian and U.S. laws. If the Canadian payload data cannot immediately be deleted, the data needs to be properly safeguarded and access thereto is to be restricted.
At this time, I consider the matter to be well-founded and still unresolved. My Office will consider the matter resolved only upon receiving, by or before February 1, 2011, confirmation of the implementation of the above recommendations, at which point I will issue my final report and conclusions.