Compromised search histories can jeopardize an organization’s intellectual property and strategic plans.
Background
In July, AOL researchers mistakenly published over 21 million search queries generated by approximately 658,000 users. The data was downloaded and reposted across the Internet. In the face of immediate protests from the online community, AOL removed the data from its web site and issued a public apology. However, AOL was disingenuous in its apology when it claimed, “there was no personally identifiable data linked to these accounts.” While the search data may not have contained IP addresses or usernames, it did group each individual’s searches together under a randomly generated user ID. An identity can therefore be exposed by piecing together a user’s search terms. For example, the New York Times was able to identify a female resident of Lilburn, GA, by examining the searches recorded under her user ID.
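The grouping problem described above can be illustrated with a minimal Python sketch. The log rows, user IDs, and the function name `group_queries_by_user` are all hypothetical and invented for illustration; they are not taken from the AOL data or from any AOL tooling. The sketch only shows that once queries share a pseudonymous ID, collecting them into a single profile is trivial.

```python
from collections import defaultdict

# Hypothetical log rows in the form (pseudonymous_user_id, query).
# Invented for illustration; not taken from the released AOL data.
SAMPLE_LOG = [
    (1001, "landscapers in springfield"),
    (2002, "cheap flights to denver"),
    (1001, "homes sold in maple grove subdivision"),
    (1001, "jane doe springfield"),
    (2002, "denver hotel deals"),
]


def group_queries_by_user(log_rows):
    """Collect every query under the pseudonymous ID that issued it."""
    profiles = defaultdict(list)
    for user_id, query in log_rows:
        profiles[user_id].append(query)
    return dict(profiles)


profiles = group_queries_by_user(SAMPLE_LOG)
for user_id, queries in sorted(profiles.items()):
    # Each line is one user's combined search profile.
    print(user_id, queries)
```

Even in this toy data, the three queries grouped under ID 1001 together suggest a town, a neighborhood, and a name, although no single query identifies anyone on its own.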
Threat to Organizations
Just as this woman’s identity was compromised through an examination of the search data, an organization’s intellectual property or future corporate strategy can be exposed through a similar examination of legally or illegally obtained search data. In essence, aggregated search data acts as a keystroke logger for a search engine: just as a keystroke logger records everything a user types, a search log records everything a user searches for.
As a result, organizations expose themselves to another form of information leakage if they use search engines irresponsibly. For example, an organization that is conducting research for a patent application may reveal its future direction to any individual or organization that can gain access to its search history.
Recommendations
TRC customers should therefore consider revising their information security policies to account for the possibility of information leaks through the release of search query data. Customers can protect sensitive search queries, such as those relating to intellectual property and future strategies, by using anonymizing proxy software. Proxy software such as Tor allows organizations to browse the Internet and conduct searches anonymously by masking their IP addresses. According to its web site, Tor “communications are bounced around a distributed network of servers called onion routers, protecting you from websites that build profiles of your interests, local eavesdroppers that read your data or learn what sites you visit, and even the onion routers themselves.”
Tor hinders others from grouping an organization’s search queries together because it can route connections to the search engine through different, randomly selected relays. To the search engine, the queries may then appear to come from unrelated users in different locations conducting entirely unrelated searches. As a result, hackers and other predators will have a more difficult time building a full picture of what a particular organization is researching.
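As a rough illustration of the approach described above, the following Python sketch sends each search through a locally running Tor client. It rests on several assumptions: Tor is listening on its default SOCKS port (9050) on the local machine, the third-party `requests` library is installed with SOCKS support (`requests[socks]`), and Tor's default behavior of isolating streams that present different SOCKS credentials is in effect. The function names, the `search_url` argument, and the `q` query parameter are hypothetical and do not correspond to any particular search engine.

```python
import secrets

TOR_SOCKS_HOST = "127.0.0.1"  # assumption: a Tor client runs on this machine
TOR_SOCKS_PORT = 9050         # Tor's default SOCKS port


def tor_proxies_for_new_identity():
    """Build a requests-style proxy mapping with throwaway SOCKS credentials.

    Assumption: Tor keeps streams with different SOCKS credentials on
    separate circuits, so fresh random credentials request a fresh path.
    The socks5h scheme asks the proxy to resolve hostnames, which keeps
    DNS lookups inside Tor as well.
    """
    user = secrets.token_hex(8)
    password = secrets.token_hex(8)
    url = f"socks5h://{user}:{password}@{TOR_SOCKS_HOST}:{TOR_SOCKS_PORT}"
    return {"http": url, "https": url}


def search_via_tor(search_url, query):
    """Send one query to a (hypothetical) search URL through Tor."""
    import requests  # third-party; imported here so the helper above needs only the stdlib

    response = requests.get(
        search_url,
        params={"q": query},  # assumption: the engine reads the query from "q"
        proxies=tor_proxies_for_new_identity(),
        timeout=60,
    )
    response.raise_for_status()
    return response.text
```

Masking the IP address in this way does not help if the searches still carry the same cookies or are made from a logged-in account, so a policy built on this approach would also need to address those identifiers.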