There was of course no way of knowing whether you were being watched at any given moment. How often, or on what system, the Thought Police plugged in on any individual wire was guesswork. It was even conceivable that they watched everybody all the time.
Nineteen Eighty-Four (George Orwell)
We are all part of an invisible system of free immaterial labour. This is not free labor in the sense of producing culture or content for the digital economy, but a more subtle and unconscious form of work based on our basic existence: our movements, our patterns of behavior and our location, both on the internet and in the physical environment.
As long as you are connected to the network, information about your behavior is continuously collected, stored and analyzed by numerous algorithms built to serve the different goals of their owners. The market for the analysis of large data sets is growing by 40% per year worldwide, and data about our behavior, interests and preferences is surely among the most valuable data out there.
In this research, our main goal is to dive below the surface of the web and the websites we visit and to explore the network of hidden beneficiaries: the companies that collect and analyze data about our online behavior.
But let’s go a few steps back and look at the architecture that collects all this data. An HTTP cookie (also called a web cookie, Internet cookie, browser cookie or simply cookie) is a small piece of data sent from a website and stored in the user's web browser while the user is browsing that website. Every time the user loads the website, the browser sends the cookie back to the server to notify the website of the user's previous activity. This 20-year-old concept, developed in 1994, became a valuable tool for the commercialization and monetization of the network, enabling the user-targeting business models that are now the main source of income for most of the biggest Internet companies.
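To make the mechanism concrete, here is a minimal sketch of that round trip, written in Python with the standard library's http.cookies module. The cookie name visitor_id, the one-year lifetime and all four helper functions are our own illustrative assumptions and do not describe any particular website; the "browser" is reduced to a plain dictionary.

```python
from http.cookies import SimpleCookie


def server_set_cookie(visitor_id: str) -> str:
    """Build the Set-Cookie header value a server might send on a first visit."""
    cookie = SimpleCookie()
    cookie["visitor_id"] = visitor_id          # hypothetical cookie name
    cookie["visitor_id"]["path"] = "/"
    cookie["visitor_id"]["max-age"] = 60 * 60 * 24 * 365  # assumed: one year
    return cookie["visitor_id"].OutputString()


def browser_store(jar: dict, set_cookie_header: str) -> None:
    """Simulate the browser saving the cookie it received."""
    parsed = SimpleCookie()
    parsed.load(set_cookie_header)
    for name, morsel in parsed.items():
        jar[name] = morsel.value


def browser_cookie_header(jar: dict) -> str:
    """Build the Cookie header the browser sends on every later request."""
    return "; ".join(f"{name}={value}" for name, value in jar.items())


def server_read_cookie(cookie_header: str):
    """Recover the visitor identifier from an incoming Cookie header, if any."""
    parsed = SimpleCookie()
    parsed.load(cookie_header)
    morsel = parsed.get("visitor_id")
    return morsel.value if morsel else None


if __name__ == "__main__":
    jar = {}                                        # the browser's cookie store
    browser_store(jar, server_set_cookie("abc123"))  # first visit
    print(server_read_cookie(browser_cookie_header(jar)))  # later visit
```

The identifier itself carries no personal information. Its value to the website lies in the fact that the same identifier comes back with every request, so separate visits can be stitched into one history.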
“Before cookies, the Web was essentially private. After cookies, the Web becomes a space capable of extraordinary monitoring”.
Even the existence of HTTP cookies was not widely known to the public until 1996, when they received a lot of media attention, especially because of their potential privacy implications. Developed by Netscape in 1994, cookies were secretly introduced in the first version of Netscape’s web browser without notifying users or asking for their consent, without any mechanism to alert people when cookies were being placed on their computers, and without any transparency about the information stored in a cookie. In the 20 years since, numerous advocacy groups, online consumer privacy groups, privacy commissioners, commissions, and national and international regulatory bodies have tried different approaches to educating the general public, advocating for users, and legally regulating the impact of cookies on users’ privacy.
Digital Footprint Exploitation
There are three main types of targeting methods in the advertising industry: property, user segment, and behavioral targeting. Behavioral targeting, the most relevant for our research, is based on the exploitation of our digital footprint, the data that users leave behind on digital services. In most cases this data is collected without the owner’s knowledge. A digital footprint can contain many types of information: your IP address, the websites you visit, the time and length of each visit, the type of equipment you use, your search queries, your location, your sex and age, your sexual preferences, the books you buy and much more, depending on the service you are using. Brought together, all of this information enables user profiling: the construction and application of profiles generated by computerized data analysis, which allows the discovery of patterns or correlations in large quantities of data about users. As our interaction with the Web becomes more natural and even mediates our interaction with others, Web browsing behavior can be rich enough to uniquely characterize who we are through unconscious behavioral patterns, and to authenticate us with a cognitive fingerprint.
Advanced targeting methods such as predictive targeting are performed by algorithms that combine behavioral targeting with your history of responses, location-based data, socio-economic data, weather data or any other relevant data available. They can predict your response to content in real time and serve you the advertisement most likely to provoke a reaction that results in a conversion.
According to the Pew Internet & American Life survey from February 2012, 65% of search engine users say “I’m NOT OKAY with targeted advertising because I don’t like having my online behavior tracked and analyzed”. But before the general public can even form an opinion on this issue, it is important that people are aware of the scale and mechanisms of this phenomenon.
So, if you have ever asked yourself how Google or Facebook can be worth hundreds of billions of dollars even though they provide a free service, the answer is that they sell the service of profiling and targeting users, allowing others to serve their advertising to a selected group of users. For example, the personal data that Google is able to collect today can be far greater in scale and detail than what government secret services could collect in the past. The ever-growing hunger for data does not stop at our screens. It extends into physical space with mobile phone applications and platforms, biometric data from wearable fitness devices, a constant flow of real-time data through your Google Glass, Internet of Things devices, navigation data from your Google car, smart houses and smart cities, and finally reaches Earth orbit with a system of satellites providing free Internet.
Unfortunately, this invisible ecosystem based on the exploitation of user data is the same one that supports free online services and content.
Mapping the Trackers
According to our research, conducted on the 50 websites most frequently used by citizens of Serbia, there are on average 7 different third-party cookies embedded in every website we examined. In total, we detected 174 different types of cookies, observed 365 times. Those 174 unique cookies belong to 87 different companies. Four big US companies are massively dominant: Google (present on 90% of the websites), Facebook (46%), Twitter (24%) and Amazon (10%), alongside the infomediaries Gemius SA (36%) and Httpool (7%).
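As an illustration of how percentages of this kind can be derived, the sketch below counts, for each company, the share of websites on which at least one of its third-party cookies was seen. It is not the tooling used in our research: the site names, the OWNER mapping and the sample observations are invented for the example, and the domain comparison is deliberately naive (a careful study would rely on the Public Suffix List to decide what counts as the same site).

```python
from collections import defaultdict

# Hypothetical mapping from tracker domain to owning company (illustrative only).
OWNER = {
    "google-analytics.com": "Google",
    "doubleclick.net": "Google",
    "facebook.com": "Facebook",
    "gemius.pl": "Gemius SA",
}


def is_third_party(site_domain: str, cookie_domain: str) -> bool:
    """Naive check: a cookie is third-party if its domain is not the site's own."""
    cookie_domain = cookie_domain.lstrip(".")
    same_site = (cookie_domain == site_domain
                 or cookie_domain.endswith("." + site_domain))
    return not same_site


def company_reach(observations: dict) -> dict:
    """Map each company to the percentage of sites carrying its cookies.

    observations: {site_domain: [cookie_domain, ...]}
    """
    sites_per_company = defaultdict(set)
    for site, cookie_domains in observations.items():
        for cookie_domain in cookie_domains:
            if is_third_party(site, cookie_domain):
                company = OWNER.get(cookie_domain.lstrip("."), "unknown")
                # A set counts each site once, however many cookies it carries.
                sites_per_company[company].add(site)
    total_sites = len(observations)
    return {company: 100 * len(sites) / total_sites
            for company, sites in sites_per_company.items()}


if __name__ == "__main__":
    # Invented sample data: four fictional sites and the cookie domains seen on each.
    sample = {
        "news.example": [".news.example", ".google-analytics.com", ".doubleclick.net"],
        "shop.example": [".google-analytics.com", ".facebook.com"],
        "blog.example": ["blog.example"],
        "portal.example": [".gemius.pl", ".facebook.com"],
    }
    for company, share in sorted(company_reach(sample).items()):
        print(f"{company}: {share:.0f}% of sites")
```

Because each website is counted once per company, a company that places several different cookies on the same website does not inflate its own share. This is why the number of distinct cookies and the percentage of websites covered are reported as separate figures.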
So, even if you avoid using Google services, your surfing behavior is followed by them in 90% of cases. In our sample this is done through 17 different cookies. Google Analytics, the most frequent one, is installed on 65% of the websites. The second, owned by the same company, is DoubleClick, embedded on 40% of the websites. DoubleClick is a subsidiary of Google, acquired in 2008 for US$3.1 billion, that provides products and services allowing advertising agencies and media companies to traffic, target, deliver, and report on their advertising campaigns. Its products have been the subject of numerous controversies over tracking user behaviour, misleading users with an insufficiently effective opt-out option, and serving malware via drive-by download exploits. One of the documents provided by former NSA contractor Edward Snowden shows that the NSA uses Google cookies to pinpoint targets.
The second most frequent company in our research results is Facebook, covering almost half (46%) of the examined websites. Facebook trackers are mostly present through Like buttons, login functionality and other widgets embedded on first-party websites. Whenever you visit a website that has some of those trackers embedded, your browser sends your IP address (revealing your geographic area), your browser type and version, the page you’re on and any Facebook cookies from your machine, including your unique Facebook user ID, which is linked to your Facebook profile if you are registered there. This allows Facebook to record your behavior even outside its own domain and relate it to the huge amounts of data it has already collected on its social network.
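The request that an embedded widget triggers can be sketched in a few lines of Python. Everything named here is hypothetical: the host widgets.tracker.example, the cookie name uid and both functions are our own placeholders and do not describe Facebook's actual endpoints or cookie names. The sketch only shows which pieces of information travel with a single widget load and what a tracker could write down on receiving them.

```python
from dataclasses import dataclass


@dataclass
class WidgetRequest:
    """What arrives at the tracker when a browser loads an embedded widget."""
    client_ip: str   # visible to any server the browser connects to
    headers: dict    # HTTP headers the browser attaches automatically


def build_widget_request(page_url: str, client_ip: str, user_agent: str,
                         tracker_cookies: dict) -> WidgetRequest:
    """Simulate the browser fetching a widget embedded on a first-party page."""
    headers = {
        "Host": "widgets.tracker.example",  # hypothetical tracker host
        "Referer": page_url,                # the page you are on
        "User-Agent": user_agent,           # browser type and version
    }
    if tracker_cookies:
        # Cookies previously set by the tracker's own domain are sent along.
        headers["Cookie"] = "; ".join(
            f"{name}={value}" for name, value in tracker_cookies.items())
    return WidgetRequest(client_ip=client_ip, headers=headers)


def tracker_log_entry(request: WidgetRequest) -> dict:
    """What the tracker can record from one widget load."""
    cookie_header = request.headers.get("Cookie", "")
    cookies = dict(pair.split("=", 1)
                   for pair in cookie_header.split("; ") if pair)
    return {
        "ip": request.client_ip,
        "browser": request.headers.get("User-Agent"),
        "page": request.headers.get("Referer"),
        "user_id": cookies.get("uid"),  # hypothetical cookie name
    }


if __name__ == "__main__":
    request = build_widget_request(
        page_url="https://news.example/article-42",
        client_ip="203.0.113.7",
        user_agent="ExampleBrowser/1.0",
        tracker_cookies={"uid": "user-1001"},
    )
    print(tracker_log_entry(request))
```

Note that the user never clicks anything in this scenario. Loading the page is enough for the log entry to be created, and a visitor without an account still leaves an IP address, a browser signature and the address of the page being read.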
Based on our sample of the 50 websites most visited by users from Serbia, more than three quarters of online tracking cookies (75.4%) are owned by companies from the US. Google is mostly responsible for this high figure, taking half of the US share of the cookie pie and leaving the rest to be shared mostly among Facebook, Amazon and Twitter. Beneath the main layer of big US companies on the list there is a web of hundreds of smaller companies, mostly in advertising and data analytics, tracking your online behaviour. We can notice the presence of a few bigger regional players, such as Gemius SA and Adocean Ltd from Poland, as well as the Serbia-based HTTPool d.o.o. Overall, only a very small percentage of those cookies collect data for locally based companies. We can say that Serbia is a great exporter of information about the online behaviour of its citizens. The US is by far the most dominant user-tracking economy, extracting the highest financial value from our online behaviour.
Data is the oil of the 21st century, and online tracking is one of the main technologies for extracting this oil, which is made of our behaviour, movements and preferences.
Cookies are dead, long live Cookies!