Invisible Infrastructures – SHARE LAB
https://labs.rs
Research & Data Investigation Lab

Invisible Infrastructures : Surveillance Architecture
https://labs.rs/en/invisible-infrastructures-surveillance-achitecture/
Mon, 09 Mar 2015 11:46:37 +0000

In April 2014, we collected about 2000 pages of documents and reports through a series of FOIA [1] requests to the Commissioner [2]. The documents relate to the 2012 Report on the inspection procedure over the implementation and enforcement of the Law on Personal Data Protection by the operators and state bodies (the police and both civil and military intelligence agencies), and they served as the basis for our analysis of metadata retention and digital surveillance architecture. Our technical and legal analysis, presented in the form of an infographic, illustrates the different ways in which the 4 biggest telecommunication service providers in Serbia allow state bodies access to our metadata. The following series of infographics and the analysis show numerous methods of access to retained data that circumvent legal procedures and the necessary court orders (direct access to the servers, applications for direct access).

While smartphone penetration in Serbia is about 35% and constantly rising, the percentage of mobile phones in use is well over 130% [3], which means that about a quarter of the population has more than one mobile phone. Metadata as a type of information was mentioned earlier, and in this context it is important to mention that each and every device generates metadata, regardless of whether it is a smartphone or an earlier generation mobile phone. The only difference is that older mobile phones don’t support Internet access, so they don’t generate metadata related to Internet use. Because of the relatively high and rising number of smartphone users, as well as the likely direction of future development, this research is conducted from a smartphone’s perspective.

Every smartphone commercially available in Serbia (and in the world) at present supports three types of traffic through the cellular network: calls, SMS and mobile data (mobile Internet). It is important to note that all three types of traffic go through the same infrastructure, so the points at which surveillance is possible are the same for all of them. In this part of the research we therefore talk about traffic generated by mobile devices in general, and point out the differences between the three types of traffic where they arise.

So, let’s start from the beginning and explain the way a device connects to a network, or rather how it authenticates itself on the network. For the purpose of authentication the device uses 2 ID numbers: the device’s IMEI number (International Mobile Station Equipment Identity) and the SIM card’s IMSI number (International Mobile Subscriber Identity). Both numbers are unique and predefined for every device/SIM card. The mobile carriers have an infrastructure of Base Stations (BS) that are geographically distributed throughout the area served by the operator. The BS form the backbone of the entire mobile infrastructure.
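As a rough illustration of the two identifiers, the sketch below splits an IMSI into its country, network and subscriber parts and checks the last digit of an IMEI with the Luhn checksum. The sample numbers are made up for the example, the function names are our own, and the two-digit network code is an assumption that does not hold in every country.

```python
def split_imsi(imsi: str, mnc_digits: int = 2) -> dict:
    """Split an IMSI into country code (MCC), network code (MNC) and subscriber number (MSIN).

    The length of the MNC differs between countries, so it is passed in
    as a parameter (assumed to be 2 digits here).
    """
    if not imsi.isdigit() or len(imsi) > 15:
        raise ValueError("an IMSI is a string of at most 15 digits")
    return {
        "mcc": imsi[:3],
        "mnc": imsi[3:3 + mnc_digits],
        "msin": imsi[3 + mnc_digits:],
    }


def imei_is_valid(imei: str) -> bool:
    """Check the 15th (check) digit of an IMEI using the Luhn algorithm."""
    if not imei.isdigit() or len(imei) != 15:
        return False
    total = 0
    # Walk from the rightmost digit; every second digit is doubled.
    for position, char in enumerate(reversed(imei)):
        digit = int(char)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


# Made-up IMSI that starts with 220, the mobile country code of Serbia.
parts = split_imsi("220011234567890")
# Commonly used sample IMEI with a correct check digit.
ok = imei_is_valid("490154203237518")
```

The first eight digits of an IMEI identify the device type, which is how a make and model can be looked up from the number alone.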

Surveillance1C-01

When a call is initiated, the caller’s device contacts the nearest BS, and the BS forwards the call to the Mobile Switching Centre (MSC). The MSC then signals the BS nearest to the called user, who receives the call. Once the call is established (the called user answers the call), metadata is generated in the MSC. The MSCs archive the metadata in the carrier’s own datacentre. The content of the calls is not archived, although it also passes through the MSC.

Surveillance2c-02-02

What type of metadata is being archived? [4]
The answer to this question varies from carrier to carrier, at least in Serbia, but there is a general set of metadata that all carriers archive: the caller’s number, the called number, IMEI, details about the BS, date and time of the call, duration of the call, amount of data (for Internet), type of service, details about the identity of both parties, and a list of all SIM cards that have been used in the current device (and vice versa, a list of devices the current SIM card has been used in). There is also data that cannot be classified as metadata but can be obtained using the aforementioned metadata, such as the national ID number, the user’s address (through contracts or the registration of the SIM card for prepaid users) and the device make and model (using the IMEI number). The process of archiving this data is called data retention.
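To make the list above more concrete, here is a minimal sketch of what one retained record could look like as a data structure, and how the list of SIM cards used in one device follows from such records. The field names and sample values are hypothetical; the actual formats used by the carriers are not described in the documents.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class RetainedRecord:
    """One retained metadata entry (hypothetical field names)."""
    caller_number: str
    called_number: str
    imei: str
    base_station_id: str
    started_at: datetime
    duration_seconds: int
    service_type: str                  # "call", "sms" or "data"
    data_bytes: Optional[int] = None   # only meaningful for Internet traffic


def numbers_used_in_device(records: Iterable[RetainedRecord], imei: str) -> Set[str]:
    """Return every caller number that was seen together with the given device."""
    return {r.caller_number for r in records if r.imei == imei}


records = [
    RetainedRecord("0601111111", "0632222222", "490154203237518", "BS-17",
                   datetime(2014, 4, 1, 9, 30), 120, "call"),
    RetainedRecord("0643333333", "0632222222", "490154203237518", "BS-04",
                   datetime(2014, 4, 2, 18, 5), 0, "sms"),
]
same_device = numbers_used_in_device(records, "490154203237518")
```

Nothing in the record is content, yet two entries are already enough to link two phone numbers to the same physical device.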

How is this data stored?
Carriers in Serbia are obliged by law to store this data for a period of 12 months for every user. The data is stored on servers; there are no strict rules on whether the carriers need to buy their own servers or can use another company’s servers to store all this data. However, most of them own their data centers. All operations on the servers are logged for control purposes.
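The 12-month period can be sketched as a simple date calculation. This is only an illustration under the assumption that records older than the retention period are removed; the function names are our own and the documents do not describe how carriers actually implement the rule.

```python
import calendar
from datetime import date
from typing import Iterable, List


def retention_cutoff(today: date, months: int = 12) -> date:
    """Return the oldest date that still falls inside the retention period."""
    total_months = today.year * 12 + (today.month - 1) - months
    year, month_index = divmod(total_months, 12)
    month = month_index + 1
    # Clamp the day for shorter months, e.g. 29 February in a leap year.
    day = min(today.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def outside_retention(record_dates: Iterable[date], today: date) -> List[date]:
    """Return the record dates that are older than the retention period."""
    cutoff = retention_cutoff(today)
    return [d for d in record_dates if d < cutoff]


cutoff = retention_cutoff(date(2015, 3, 9))
old = outside_retention([date(2014, 3, 8), date(2014, 3, 9)], date(2015, 3, 9))
```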

How can this data be accessed?
The mobile carriers in Serbia have designated departments that deal with affairs related to data retention. The employees who work in those departments are specially trained to deal with the entire process of data retention and access to retained data. When it comes to access to retained data, several actors (i.e. state organs) have been identified that have accessed retained data in some way. Not all state organs have the right to access retained data; this right lies with the judicial bodies, as well as the Police and both civil and military intelligence agencies. Even within this group there are differences in who can access what and how. There are several mechanisms, or channels, that can be used for access to retained data.

Surveillance eng web3-03

Request [5]
The first mechanism is the simplest one; it is based on the request-response principle. This mechanism is used by all state organs and all carriers. A representative of the state submits a request to the carrier in which the requested data is stated. Several channels are commonly used for submitting these requests: email, fax, phone or in person. The special department within the carrier then processes the request and delivers a report based on the input that has been submitted. One potential issue with this mechanism is that requests submitted by phone should not be processed (although in some cases they are), because of the possibility of fraud and the inability to deliver the appropriate documentation (a court order). Some of the carriers have developed a system for submitting requests by designating a limited list of dedicated e-mail addresses that serve this purpose.

graphs-01

An upside of this mechanism is that every single request submitted to the carrier is recorded, which enables transparency and review of the requests the state organs submit.

graphs-02

Application for Independent access to retained data
Another mechanism for access to retained data is the so-called Application for Independent access to retained data. This is software implemented by some of the carriers in Serbia for the convenience of the state organs. This mechanism is used by the Police and by both the military and civil intelligence agencies. It means that these organs do not need to submit a request in order to get data. The application can be accessed online with credentials provided by the carrier. A set of different queries is available within the application, offering practically limitless access to all the data stored in the database in the form of different listings (outgoing calls, incoming calls, data usage, SMS/MMS communication etc.). All of the aforementioned listings, along with the basic details of the user whose metadata is being accessed, contain detailed information about location, duration of service, and all the other types of retained data mentioned earlier. Submitting a court order is not a requirement for accessing this data, so it is clear why this mechanism is problematic privacy-wise.

graphs-03

Even though these are the two primary mechanisms used by all carriers, there are some specific scenarios or specially established channels for communicating retained data between some carriers and some state organs. Here, we will give two such examples.

Sending data
There is an established connection between one mobile carrier and the Security Intelligence Agency (BIA) which represents a standalone mechanism for access to retained data, independent of all the other mechanisms. In practice, all the metadata of the users from the Mobile Switching Centre is automatically delivered to BIA on a daily basis. This results in non-transparent handling of retained metadata and implies data collection on a mass scale. Another issue with this mechanism is that it doesn’t comply with the legal provisions that allow retained data to be stored for a maximum of 12 months, because no authority monitors how BIA handles retained data. Furthermore, BIA doesn’t have the right to archive metadata; this responsibility lies only with the carriers.

Direct Access to the Retention Database
Another case is the link between another carrier (which provides only Internet and landline services) and BIA. In this situation, upon a request from BIA, the carrier provided the agency with a special connection to its own infrastructure, in such a manner that BIA is able to access every part of the data system and also intercept digital communication in the carrier’s network.

It is important to note that the last two mechanisms do not have any legal grounds. Furthermore, they are an active threat to users’ privacy and are in conflict with the legislation that regulates electronic communications and similar matters, both in Serbia and at the international level.

Wiretapping

The principle “Metadata doesn’t lie” is certainly true, as is the fact that metadata, if mapped right, can provide the interested party with much deeper insight into a situation than the content of the communication. However, this does not mean that the content is not important.

Wiretapping is a technique that has been around for as long as electronic communications have existed. With the new technologies used in the communication infrastructure and the new services that are available, the concept of wiretapping has changed and evolved into a new concept called surveillance. Surveillance is much more than wiretapping; it can be conducted on many levels, such as the personal or organisational level, but also on a mass level. This means that someone can have the ability to listen in on each and every call being made on a national or continental level. Mass surveillance is illegal in almost every country in Europe; for security purposes, the law instead establishes the concept of interception of electronic communications.

wiretapping-06

Interception of electronic communications means targeted surveillance, which can be conducted in special circumstances, with an appropriate court order and for a limited period of time. However, when it comes to these issues, even seemingly minor flaws in the law can have serious consequences and make room for mass surveillance.

In recent years a number of bylaws have been adopted that establish the rights and obligations of carriers and state organs with regard to the interception of electronic communications. These regulations are written in such a way that carriers are obliged to buy equipment (hardware and software) that can be used for interception and deliver it to a Monitoring Centre, whose headquarters are within BIA. Afterwards, BIA de facto has carte blanche to operate the equipment, whilst the carriers retain the obligation to fund its maintenance. As stated above, interception as a sensitive process is very well regulated, but the implications of the bylaws and the lack of transparency in the actual execution of the process are sound reasons to question the legitimacy of the procedure as it is currently being established in Serbia.

tracking-04

Physical tracking in real time

Base stations were mentioned in the introductory segment of this piece. They form the backbone of the cellular infrastructure. In fact, it is because of the BS that the entire network is called cellular. A cell is a geographical area covered by a single BS. At any moment any mobile device is connected to three BS, for the purpose of continuity and redundancy. That means that at any moment in time three base stations send and receive signals to and from the device. Base stations are set up in such a way that they record the distance to the device, from which its location can be derived, through several parameters related to the signal, including AOA (Angle of Arrival), TDOA (Time Difference of Arrival) and TOA (Time of Arrival). This means that anybody who has access to the BS can at any moment determine, with a high level of accuracy, the physical/geographical location of any device connected to the network.
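The way three distance measurements pin down one position can be sketched in a few lines. This is a simplified flat, two-dimensional model with ideal measurements, not the method used in any real network; the coordinates and function names are assumptions made for the example.

```python
SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def toa_to_distance(seconds: float) -> float:
    """Convert a one-way signal travel time (TOA) into a distance in metres."""
    return SPEED_OF_LIGHT * seconds


def trilaterate(stations, distances):
    """Locate a device from three base station positions and three distances.

    Each distance describes a circle around a base station. Subtracting the
    first circle equation from the other two leaves two linear equations in
    x and y, which are solved here with Cramer's rule.
    """
    (x1, y1), (x2, y2), (x3, y3) = stations
    d1, d2, d3 = distances
    a1, b1 = 2 * (x2 - x1), 2 * (y2 - y1)
    c1 = d1**2 - d2**2 + x2**2 - x1**2 + y2**2 - y1**2
    a2, b2 = 2 * (x3 - x1), 2 * (y3 - y1)
    c2 = d1**2 - d3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        raise ValueError("the three base stations must not lie on one line")
    return (c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det


# Hypothetical base stations one kilometre apart (coordinates in metres).
stations = [(0.0, 0.0), (1000.0, 0.0), (0.0, 1000.0)]
# Distances that a device standing at (300, 400) would produce.
distances = [500.0, 650000.0 ** 0.5, 450000.0 ** 0.5]
x, y = trilaterate(stations, distances)
```

Real measurements are noisy, so an actual system would combine more parameters and estimate the most likely position rather than solve the equations exactly.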

In Serbia, according to the bylaws mentioned in the previous section, BIA has access to special terminal equipment for tracking devices. Furthermore, there are custom-made mobile devices configured in such a way that they can be used for geo-tracking in real time. These mobile devices are issued by the carrier to the state organs upon request. This means that anyone who has access to that terminal equipment (and it is entirely up to BIA how it will be used) can precisely locate any mobile device connected to a network in Serbia [6].

Documents
Report 
Telekom
Telenor
VIP

Zapisnik11 Zapisnik12 Zapisnik13 Zapisnik14 Zapisnik15 Zapisnik16 Zapisnik17 Zapisnik18 Zapisnik19 Zapisnik20

Invisible Infrastructures : Data Flow
https://labs.rs/en/invisible-infrastructures-data-flow/
Sat, 07 Mar 2015 14:19:07 +0000

In the previous story we explored the exciting life of our hero, one small Internet packet. In order to create a wider picture of the data flow and to map key locations and actors, we conducted a broader analysis of the data paths to the top 100 websites visited by users located in Serbia.

networktopology

We used Nmap, an open source network scanner, to traceroute and visualize the paths to the top 100 websites visited by users in Serbia [5] according to Alexa, a web analytics company owned by Amazon. As in our previous maps, every dot represents one IP address (a router or other network device) and the lines between the dots are the links, the cables that connect them.

nationalflow2

National traffic

This network journey starts at the yellow dot in the middle of the map. After a few local hops all the traffic heads towards a few points. Since our Share Lab is connected to the Internet via SBB, the biggest regional ISP, the results of this research are based on their network.

All the data travels first to their server in Belgrade at the SBB TelePark. All the traffic to the local websites goes through a single point (bg-ds-r-1-oe0-0-0-1sbb.rs).

So, in theory, if you wanted to examine, filter or retain all the national traffic going through the SBB network, you would be able to do that using just this one point. In fact, SBB as well as the other ISPs in Serbia are obliged by the data retention law to do exactly that: to store all metadata about Internet traffic and allow government bodies to access it.
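The idea of a single point that all paths share can be checked mechanically once the traceroute results are available as lists of hops. The sketch below is not the tooling used for the maps; the hop names are invented placeholders and the function names are our own.

```python
from collections import Counter
from typing import Dict, List, Set


def hop_frequency(paths: List[List[str]]) -> Dict[str, int]:
    """Count in how many paths each hop appears (at most once per path)."""
    counts: Counter = Counter()
    for path in paths:
        counts.update(set(path))
    return dict(counts)


def shared_hops(paths: List[List[str]]) -> Set[str]:
    """Return the hops that every single path passes through."""
    counts = hop_frequency(paths)
    return {hop for hop, seen in counts.items() if seen == len(paths)}


# Invented example paths from one starting point to three websites.
paths = [
    ["lab-router", "isp-core", "exchange", "site-a"],
    ["lab-router", "isp-core", "exchange", "hosting", "site-b"],
    ["lab-router", "isp-core", "peering", "site-c"],
]
bottlenecks = shared_hops(paths)
```

A hop that appears in every path is a place where all of the observed traffic could be examined, filtered or retained.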

From this bottleneck of the national traffic, paths lead to different peering routers or to the Serbian Open Exchange (SOX). As we already explained in our Interconnection map of Serbia, networks run by different Internet Service Providers are interconnected at physical locations where their routers are connected by cables, such as Internet exchange points. Those are the places where different networks meet, merging into a single system and allowing us to connect to devices on any other network.

Local exchange points allow information to flow more locally. Without them, Internet packets would take different routes, and where there is no direct connection between two providers they would go through a third provider or even another country. Unfortunately, there is just one Internet Exchange Point in Serbia, and most of the packets go to Belgrade from any other place in Serbia. If more local exchange points existed in different parts of the country, data would flow more locally and its route would be significantly shorter.
But there is another curiosity visible on this map. There is a spot on Telenor servers, part of the SOX network (mainstream-telenor.sox.rs), that connects the most visited websites in Serbia. It belongs to Mainstream d.o.o, a company established in 2005 that provides hosting and maintenance services. More than half of the local websites from our sample are hosted by this company. Most of them have racks with servers in three data centers in Belgrade, but according to our map they are mostly situated at the Telenor Tier3 Data Center.

Telenor-Data-Center

Based on our map we can conclude that there is a high level of centralization of the local  internet traffic. We can define 3 different levels of centralization :

  1. Centralization on the level of ISP – single point where all the traffic is routed
  2. Internet Exchange point level – there is just one IX in Serbia
  3. Hosting level – Most of the biggest websites are hosted by one single company

 

Points of centralization are points of power: the more routers or ISPs meet at a single point, the more important that point, router or server becomes. It is of great significance to know who has control over these points, given that those entities have influence over the Internet in Serbia and, given the opportunity, could use or misuse their power.
exitnodes

Exit points

From our findings we are able to identify a few main data flow paths going out of the country. Similar to the centralization of the local flow over one single router, there are a few main spots through which our data passes before leaving the country.

The two biggest points of centralization according to our map are:

at-be-r-1-pc1.sbb.rs : mostly connecting to the routers related to DE-CIX in Frankfurt
bg-yo-r-1-pc1.sbb.rs : connected with bpt-b4-link.telia.net leading to the routers in Hungary and Prague

peer-A515169.sbb.rs : connected with Google owned websites

whereismy4-01

Data about Data Flow

Now that we have detected the main bottlenecks of the local internet traffic and the main exit points, let’s try to analyze the main ports and countries where our data flows to as its final destinations.

Of the 100 websites most visited by users from Serbia, only 27 are actually hosted in Serbia. More than half of those are hosted by a single hosting company. 63% of our Internet packets leave the country. Let’s examine where they go.

One big stream of data heads towards Budapest (25) and Vienna (25), and another to Prague (15). Those are mostly transit ports that transfer data further to Germany, the Netherlands, the UK or Switzerland.

Frankfurt, Germany, is by far the capital of our data flow, the biggest transit port of our data. Half of our packets pass through this city at some moment, mostly through the DE-CIX Internet Exchange Point. This place is not just the biggest gathering point for all Internet packets that come from Serbia, but the biggest Internet exchange in the world, connecting more than 600 ISPs.

Even though Frankfurt is a transit capital, not a lot of data is actually hosted there. The biggest share of our sample websites that are hosted in Europe, 11 of them, are situated in Amsterdam. Another interesting fact is that more than half of those 11 websites are related to pornography. This is the red light district of the Internet’s second biggest port.

data flow cities-01

36% of our visits head over the Atlantic Ocean to the US. Unlike the case of European countries, where most of the data is in transit, here the data is hosted.

When looking at the overall picture regarding the hosting of the websites most visited by Internet users from Serbia, the conclusion that can be drawn is that US hosting providers are dominant over those in the EU and Serbia (36% US, 27% EU, 27% RS).

Regarding data transfer, the most important location on the US East coast is Ashburn, Northern Virginia – one of the Internet’s capitals, home to a large number of data centers, a strategic communications hub for the eastern United States, a major communications gateway to Europe and the largest Internet peering point in North America.

According to the results of our research, the final destinations of our Internet packets are densely concentrated on the West coast, especially around San Francisco and Silicon Valley. But what we cannot be sure of is whether this is the final destination or just a mask. The findings of our other research say that the exact locations are somewhere else, mostly around Northern Virginia, where big data centers of Google, Facebook and Amazon are located.

whereismap-01

According to our research, it seems that the Internet we use is not such a decentralized place after all.

Based on our sample, the Internet we use consists of main data transit and hosting sites, capitals of data flow, situated in only 13 countries, where our data either flows through or ends its journey. This structure is very different from the original idea of a mesh, decentralized network, conceptualized in the beginnings of the Internet.

On the other hand, none of the websites from the list of the top 100 visited are outside of Europe and the US, not even from the region.

National borders of the Internet

For the purpose of this research, we can examine two different types of “borders” that exist on the Internet. The first type follows from the fact that, in order to operate in a certain country, Internet Service Providers are obliged to act in accordance with different national laws and regulations. The fact that physical infrastructure, i.e. cables, routers, switches and servers, is located on the territory of one country, and that this infrastructure is owned and managed by a legal entity (company or institution) subject to national regulations, directly points to links between the state, Internet Service Providers and the data traveling through the networks. For example, Internet Service Providers operating on the territory of Serbia are obliged by the Serbian Law on Electronic Communications to keep all the metadata and give different state institutions access to this data. In order to ensure their customers access to the entire Internet, ISPs have different interconnection points with providers in other, usually neighboring countries. Data, while traveling from one ISP to another, crosses the theoretical border point where one state jurisdiction ends and another starts. Borders applicable to the Internet are the same as the ones found in the “real” world. Internet Service Providers are the gatekeepers of the Internet, and therefore any potential form of state censorship, filtering or throttling of traffic would most likely be conducted in cooperation with them. Mapping the interconnection points of national and international providers and analyzing the structure of the network topology allow us to better understand the key points of this infrastructure, where potential Internet censorship, filtering or traffic throttling could happen.

The second type of border relevant to our research is created by websites, Internet platforms or applications themselves. Every device connected to the Internet needs an IP address in order to communicate with other devices. Even though IP addresses are logical rather than physical, the IP address alone is usually enough to determine the country in which a device is located. The reason for this is that IP addresses are allocated hierarchically: IANA (Internet Assigned Numbers Authority) hands out large ranges to regional Internet registries, which in turn allocate them to ISPs and other organizations, and the registries keep databases recording which range belongs to whom, including the country each range is associated with. Because of this, websites, Internet platforms or applications are able to detect which country you are visiting from and allow or block your access to the content or service. Reasons for blocking access to content on the national level vary from different intellectual property and copyright issues to the blocking of sexual, political or religious content under pressure from different governments worldwide. In this case, the role of the gatekeeper is played by the companies that own the websites or applications. You’ve probably already seen a message like this on sites such as YouTube: “This content is not available in your country”.
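A country lookup of this kind comes down to checking which registered range an address falls into. The sketch below uses address ranges reserved for documentation and assigns them to countries arbitrarily, so the table is purely hypothetical; real services rely on much larger databases built from registry data.

```python
import ipaddress
from typing import Optional, Set

# Hypothetical range-to-country table. The ranges are reserved for
# documentation purposes and do not belong to these countries.
RANGES = [
    (ipaddress.ip_network("192.0.2.0/24"), "RS"),
    (ipaddress.ip_network("198.51.100.0/24"), "DE"),
    (ipaddress.ip_network("2001:db8::/32"), "US"),
]


def country_for(ip: str) -> Optional[str]:
    """Return the country code of the range containing the address, if any."""
    address = ipaddress.ip_address(ip)
    for network, country in RANGES:
        # Only compare IPv4 with IPv4 and IPv6 with IPv6.
        if address.version == network.version and address in network:
            return country
    return None


def is_allowed(ip: str, blocked_countries: Set[str]) -> bool:
    """Decide whether a visitor may see the content, based only on the IP address."""
    return country_for(ip) not in blocked_countries


visitor_country = country_for("192.0.2.10")
allowed = is_allowed("198.51.100.7", blocked_countries={"DE"})
```

In this sketch, an address that is not found in the table is allowed through, which is one possible design choice among several.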

Data Flow and privacy

All the ISPs in Serbia and in most European Union countries are still legally obliged to store metadata. By storing and analyzing metadata, ISPs and government bodies are able to trace and identify the source and the destination, the date, time, duration and the type of communication. Even without access to the content, metadata reveals private information, sometimes much more than the content would.

Appearing in a video conference call in September 2014, Edward Snowden explained: “Metadata is extraordinarily intrusive. As an analyst, I would prefer to be looking at metadata than looking at content, because it’s quicker and easier, and it doesn’t lie… If I’m listening to your phone call, you can try to talk around things, you can use code words. But if I’m looking at your metadata, I know which number called which number. I know which computer talked to which computer”. Stewart Baker, former General Counsel of the National Security Agency (NSA), said: “Metadata absolutely tells you everything about somebody’s life. If you have enough metadata, you don’t really need content.” How much do the terms you google, all the subjects of your emails, the network of people you communicate with, websites you visit, your location and communication habits reveal about your private or professional life? Metadata analysis is much more intrusive and efficient than for example traditional surveillance techniques practiced by Stasi in former East Germany, described as one of the most effective and repressive intelligence and secret police agencies to ever have existed, employing by some estimations between 500,000 and 2 million occasional informants.

But, without metadata, communication on the Internet as we know it today would not be possible. In order for communication to be possible, we give consent to ISPs to handle and process our data and metadata, and at the same time by living in Serbia or the EU, under the data retention laws, we have agreed that our metadata is being stored, accessed and analyzed by different government bodies. On the other hand, information is the resource driving the Internet industry. The business models of the biggest Internet companies are based on collecting and analyzing our private information and automated profiling in order to sell targeted ads.

Having this in mind, our initial interest in this research was to try to better understand the invisible networks and mechanisms underlying those processes. In our previous work, the focus was more on the legal aspects and analysis of different cases of violation of human rights online, mostly related to privacy and freedom of expression. In order to achieve progress, we believe that we should try to examine and understand technical reality and processes well hidden under the surface of device screens, the complex and invisible mix of software and hardware layers consisting of infinite lines of code and vast amounts of cables, routers and servers.

Invisible Infrastructures : Online Trackers
https://labs.rs/en/invisible-infrastructures-online-trackers/
Fri, 06 Mar 2015 08:02:09 +0000

There was of course no way of knowing whether you were being watched at any given moment. How often, or on what system, the Thought Police plugged in on any individual wire was guesswork. It was even conceivable that they watched everybody all the time.
Nineteen Eighty-Four (George Orwell)

We are all part of an invisible, free, immaterial labour system, not in the sense of free labor [6] related to the production of culture or content in the digital economy, but a more subtle and unconscious form of work based on our basic existence, our movements, the patterns of our behavior and our location in both the Internet and the physical environment.

As long as you are connected to the network, information about your behavior is being continuously collected, stored and analyzed by numerous algorithms created to serve different goals for their owners. The market for the analysis of large sets of data is growing by 40% per year worldwide [7], and data about our behavior, our interests and our preferences is surely one of the most valuable sets of data out there.

In this research, our main goal is to dive a bit deeper than the surface of the web and websites we visit and explore the network of hidden beneficiaries, companies that are collecting and analyzing data about our online behavior.

Invisible infrastructure

But let’s go a few steps back, into the architecture of collecting all this data. An HTTP cookie (also called web cookie, Internet cookie, browser cookie or simply cookie) is a small piece of data sent from a website and stored in a user’s web browser while the user is browsing that website. Every time the user loads the website, the browser sends the cookie back to the server to notify the website of the user’s previous activity [8]. This 20-year-old concept, developed in 1994, became a valuable tool for the commercialization and monetization of the network, enabling the development of user-targeting business models that are now the main source of income for most of the biggest Internet companies.
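The round trip described above can be sketched with Python’s standard library. The cookie name `visitor_id` and both helper functions are made up for the illustration; a real browser also applies rules about domains, expiry and security flags that are left out here.

```python
from http.cookies import SimpleCookie


def server_set_cookie(visitor_id: str) -> str:
    """Build the response header line a website sends to store an identifier."""
    cookie = SimpleCookie()
    cookie["visitor_id"] = visitor_id
    cookie["visitor_id"]["path"] = "/"
    return cookie.output(header="Set-Cookie:")


def browser_cookie_header(set_cookie_line: str) -> str:
    """Build the Cookie header value a browser sends back on the next request."""
    jar = SimpleCookie()
    # Drop the "Set-Cookie:" prefix and parse the rest of the line.
    jar.load(set_cookie_line.split(":", 1)[1].strip())
    return "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())


first_response = server_set_cookie("abc123")
next_request = browser_cookie_header(first_response)
```

Because the same identifier comes back with every later request, the website can link all of those visits to one browser.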

“Before cookies, the Web was essentially private. After cookies, the Web becomes a space capable of extraordinary monitoring”.
Lawrence Lessig

Even the existence of HTTP cookies was not widely known to the public until 1996, when they received a lot of media attention, especially because of their potential privacy implications. Developed by Netscape in 1994, cookies were secretly introduced in the first version of Netscape’s web browser without notifying users or asking for their consent, without a notification mechanism to alert people when cookies were being placed on their computer, and without any transparency about the information stored in the cookie [9]. In the following 20 years of cookie existence, numerous advocacy groups, online consumer privacy groups, privacy commissioners, commissions and national and international regulatory bodies have tried different approaches to educating the general public, advocacy and legal regulation of the impact of cookies on users’ privacy.

Digital Footprint exploitation

There are 3 main types of targeting methods in the advertising industry: property [10], user segment [11], and behavioral targeting [12]. Behavioral targeting, the most relevant for our research, is based on the exploitation of our digital footprint, the data that users leave behind on digital services. In most cases this data is collected without the owner’s knowledge [13]. Our digital footprint can contain different types of information: your IP address, the websites that you visit, the time and length of your visit, the type of your equipment, your search queries, your location, your sex and age, sexual preferences, the books that you buy and much other information, depending on the service that you are using. All of this information brought together enables user profiling, the process of constructing and applying profiles generated by computerized data analysis, which allows the discovery of patterns or correlations in large quantities of data about users. As our interaction with the Web becomes more natural and even mediates our interaction with others [14], Web browsing behavior can be rich enough to uniquely characterize who we are through unconscious behavioral patterns and authenticate us with a cognitive fingerprint [15].
Advanced targeting methods such as Predictive Targeting, performed by algorithms, combine behavioral targeting, your history of responses, location-based data, socio-economic data, weather data or any other relevant data available in order to predict your response to content in real time and serve you the advertisement most likely to provoke a reaction that results in a conversion.

According to the Pew Internet & American Life survey [16] from February 2012, 65% of search engine users say “I’m NOT OKAY with targeted advertising because I don’t like having my online behavior tracked and analyzed”. But before the general public can even form an opinion about this issue, it is important that they are aware of the scale and mechanisms of this phenomenon.

Data Hoarders

So, if you have ever asked yourself how Google or Facebook can be worth hundreds of billions of dollars even though they provide a free service, the answer is that they sell the service of profiling and targeting users, allowing others to serve their advertising to a selected group of users. For example, the personal data that Google is able to collect today can be far larger in scale and far more detailed than what government secret services could collect in the past. The ever growing hunger for data doesn’t stop at our screens, but extends to physical space through mobile phone applications and platforms, biometric data from wearable fitness devices, a constant flow of real-time data through your Google glasses, Internet of Things devices, navigation data from your Google car, smart houses, smart cities and, finally, reaching Earth orbit with a system of satellites providing free Internet.
Unfortunately, this invisible ecosystem based on the exploitation of user data is the same one that supports free online services and content [17].

Mapping the Trackers

[Figure: Trackers numbers]

According to our research, conducted on the 50 websites most frequently used by citizens of Serbia, there are on average 7 different 3rd party cookies embedded in every website we examined. In total, we detected 174 different types of cookies, detected 365 times. Those 174 unique cookies belong to 87 different companies. There is a massive dominance of 4 big US companies: Google (present on 90% of the websites), Facebook (46%), Twitter (24%) and Amazon (10%), as well as the infomediaries Gemius SA (36%) and Httpool (7%).
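
The kind of counting behind these figures can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three sites, the cookie names and the `tracker_stats` helper are hypothetical and are not the data or the tooling used in the research.

```python
from collections import defaultdict

# Hypothetical scan results: site -> list of (cookie name, owning company).
# These sites and cookies are invented for illustration only.
observations = {
    "news.example": [("_ga", "Google"), ("fr", "Facebook"), ("id", "Google")],
    "shop.example": [("_ga", "Google"), ("gemius", "Gemius SA")],
    "blog.example": [("fr", "Facebook")],
}

def tracker_stats(observations):
    """Summarise third-party cookie detections across a sample of sites."""
    unique_cookies = set()
    detections = 0
    sites_by_company = defaultdict(set)
    for site, cookies in observations.items():
        for name, company in cookies:
            unique_cookies.add((name, company))
            detections += 1
            sites_by_company[company].add(site)
    n_sites = len(observations)
    # Share of sites (in percent) on which each company has at least one cookie.
    coverage = {c: 100 * len(s) / n_sites for c, s in sites_by_company.items()}
    return {
        "unique_cookies": len(unique_cookies),
        "detections": detections,
        "avg_per_site": detections / n_sites,
        "companies": len(sites_by_company),
        "coverage_pct": coverage,
    }

stats = tracker_stats(observations)
```

Applied to the totals reported above, the same average works out to 365 detections over 50 websites, or 7.3 per site, which matches the reported figure of about 7.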

[Figure: Trackers by company]


Tracking Giants

So, even if you avoid using Google services, your surfing behavior is followed by them in 90% of cases. In our sample this is done through 17 different cookies. Google Analytics, the most frequent one, is installed on 65% of the websites. The second, owned by the same company, is DoubleClick, embedded on 40% of the websites. DoubleClick is a subsidiary of Google, acquired in 2008 for US$3.1 billion, that provides products and services for advertising agencies and media companies, allowing clients to traffic, target, deliver, and report on their advertising campaigns. Its products have been the subject of numerous controversies over tracking user behaviour, misleading users by offering an opt-out option that is insufficiently effective, and serving malware via drive-by download exploits. One of the documents [18] provided by former NSA contractor Edward Snowden shows that the NSA uses Google cookies to pinpoint targets.

[Figure: Trackers Google]

The second most frequent company in our research results is Facebook, covering almost half (46%) of the examined websites. Facebook trackers are mostly present through Like buttons, login functionality and other widgets embedded on 1st party websites. Whenever you visit a website that has some of those trackers embedded, your browser sends your IP address (which reveals your geographic area), your browser type and version, the page you are on and any Facebook cookies stored on your machine, including your unique Facebook user ID, which is linked to your Facebook profile if you are registered there. This allows Facebook to record your behavior even outside of its own domain and relate it to the huge amounts of data that it has already collected on its social network.

[Figure: Trackers Facebook]

Based on our sample of the 50 websites most visited by users from Serbia, more than ¾ of online tracking cookies (75.4%) are owned by companies from the US. Google is mostly responsible for such a high result, taking half of the cookie pie for the US and leaving the rest to be shared mostly among Facebook, Amazon and Twitter. Beneath the main layer of big US companies on the list there is a web of hundreds of smaller companies, mostly in advertising and data analytics, tracking your online behaviour. We can notice the presence of a few bigger regional players such as Gemius SA and Adocean Ltd from Poland, as well as the Serbia-based HTTPool d.o.o. Overall, only a very small percentage of those cookies collect data for locally based companies. We can say that Serbia is a great exporter of information about the online behaviour of its citizens. The US is by far the most dominant user-tracking economy, extracting the highest financial value from our online behaviour.

[Figure: Trackers by country]

 

Data is the oil of the 21st century and online tracking is one of the main technologies to extract this oil made of our behaviour, movements and preferences.

Cookies are dead, long live Cookies!

]]>
6
Invisible Infrastructures : Mobile permissions https://labs.rs/en/invisible-infrastructures-mobile-permissions/ Mon, 02 Mar 2015 09:06:07 +0000 http://labs.rs/?p=152

Users, even advanced ones, often neglect the importance of the Terms of Service (ToS), Privacy Policies (PP) and other legal documents that bind them when they install applications on their devices. On the other hand, the companies that sell those applications or offer them for free often draft these documents so that the user grants many more permissions than the minimum required for the application to operate.

There can be multiple reasons for making the ToS and the PP long, complex and hard for the average user to understand. First of all, it is logical that the companies that produce or distribute applications want to protect themselves from almost any potential claim by users and to prevent legal consequences that can be costly or harm their reputation. The second possible reason is to gain access to personal information on the user’s device. However, not all applications have the same ToS and PP, and the goal of this research is to determine which are privacy friendly and which are not.

Users actively access about 27 apps on their smartphones every month. Even though the number of apps used per month is not increasing very fast (from 23.2 apps in 2011 to 26.8 apps in 2013), not reading the Terms of Service and Privacy Policies remains a common problem in app usage [19]. Moreover, the average number of installed apps for Android users is about 95 [20]. Analyses have shown that a Privacy Policy has an average length of 2,518 words and takes about 10 minutes to read, which means that a user needs to spend roughly 950 minutes (15.83 hours, or about 2 work days) to read the PPs of the apps they have installed.
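
The arithmetic behind this estimate is easy to reproduce. The short Python sketch below uses only the figures quoted above; the 8-hour work day is an assumption introduced here to convert hours into work days.

```python
# Figures quoted in the text above.
installed_apps = 95          # average number of apps installed by Android users
minutes_per_policy = 10      # average time needed to read one Privacy Policy
words_per_policy = 2518      # average Privacy Policy length in words

# Assumption (not from the text): a work day is 8 hours long.
hours_per_work_day = 8

total_minutes = installed_apps * minutes_per_policy
total_hours = total_minutes / 60
work_days = total_hours / hours_per_work_day
# Reading speed implied by the two per-policy figures, in words per minute.
implied_reading_speed = words_per_policy / minutes_per_policy

print(f"{total_minutes} minutes = {total_hours:.2f} hours = {work_days:.2f} work days")
print(f"implied reading speed: {implied_reading_speed:.1f} words per minute")
```

The implied reading speed of roughly 250 words per minute shows that the estimate assumes continuous, uninterrupted reading, so the real time cost is likely higher.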

It is important to understand the story behind the confusing, complex and time-consuming PP and ToS. Personal data in many formats (mostly content and metadata) has become a new type of currency. It is estimated that the accumulated financial value of personal data stored online could reach €1tn annually by 2020 [21]. Many global companies have developed strategies and tailored their business models to the concept of providing content in exchange for a certain amount of personal data that they can sell or use.

[Figure: Mobile permissions map]

The output of this part of the research is a logical map of the permissions that smartphone applications require users to grant during installation. Its purpose is to show clearly what users agree to. We recommend reading this map from the centre outwards: start with the categories of application, choose an actual application, read the list of permissions it requires and finally see what those permissions imply in plain words. Because the apps are categorised, the reader of this map can compare different apps that provide the same service and then choose the less intrusive one. For instance, consider comparing two search engines such as Google and DuckDuckGo. Google Search requires permission to execute over forty different operations on the device, while DuckDuckGo, which provides the same service, requires permissions for only three different operations without further prompting.

A further issue is the permissions required by applications that come preinstalled on the device. In the case of Serbia, one carrier sold smartphones that came with several apps (including one media app) already installed, without the possibility of uninstalling them. Besides being in collision with the principles of net neutrality, this practice takes away the user’s right to choose what kind of data will be given to whom.

Follow the money
There are several so-called monetisation models for smartphone apps. Essentially, it is no longer enough to develop a really cool application that is useful, educational, practical or pure fun; developers need to find a way to make money from it, since the majority of users are used to getting content or some sort of service for free. Monetisation mostly consists of revenue from advertisements or surveys, but in certain scenarios users can opt out of the advertising system for a fee.

Mobile advertising is the most common source of revenue from smartphone apps. There are variations, but they are generally characterised by a compromised user experience, intrusiveness and user drop-off. Methods for delivering ads to users include banner ads, interstitials, offer walls and notification ads.

Surveys are an emerging source of revenue. They are much easier to integrate into applications because they are mostly rendered as an overlay within the application. They are generally more practical than ads and deliver up to 20 times the revenue of standard ads.

Other monetisation concepts include caller ads, widget ads, video ads, audio ads etc. However, there are ways to produce revenue without explicitly or implicitly tracking users, such as paid applications, applications with premium features and applications with subscriptions [22].

Third Party Content vs. Mobile apps
This comparison might seem a bit strange at first sight, but let’s take a step back and look at the data that can be collected by third party content (TPC) and by mobile apps. As annoying as it is to have some company collect your data without your explicit permission, which makes TPC one of the most intrusive concepts on the Internet, it is much worse to be obliged to give a company that you may or may not know or like permission to access certain types of data on your device.

Now, it is important to note that TPC can only access metadata, which by default is a somewhat public category of data. Furthermore, there are techniques and tools (such as Tor, ad blockers etc.) that help users preserve a high level of privacy. With smartphone apps, by contrast, the user seals the deal and “willingly” gives away quite a slab of privacy, because not accepting the ToS and PP as presented means not being able to use the application at all.

To be clear, metadata (even though it has been defined several times throughout this paper) is device- or software-generated data that is necessary for every activity on the Internet. It includes the IP address, time of access, duration of the session, type of software used, location (derived from the IP address) and the like, and that is basically all that TPC owners have access to (which should not be considered little in any way).

What do these permissions mean?
Although most of the permissions are straightforward, users often don’t really perceive their intrusiveness, not because they don’t understand the words, but rather because they neglect to understand the meaning thereof. This is a good point to introduce the most common permissions users come across in most of the apps they install.

Make phone calls. This permission allows apps to call phone numbers, which can cost the user money. Without it, an application can launch the phone screen and fill in the number but needs the user to press the call button; with it, the app can do the entire process in the background.
Send SMS or MMS. This permission allows apps to send SMS and MMS messages on behalf of the user, which can also cost the user money.
Modify/delete SD card contents. This permission allows apps to read, write and delete anything stored on the SD card. There are many legitimate reasons for asking for this permission, as many users want applications to write some data to the SD card.
Read Contacts. Unless the application explicitly states a specific feature to access contact details, there should not be a reason to ask for this permission. It can access each and every contact stored on the phone.
Write contact data. Applications that are used for quick dial, and certain social networking apps might need this permission for regular operation, otherwise seeking this permission is unjustified.
Read calendar data. Calendar data often includes contact and location data, which makes it a certain type of sensitive data.
Read browser history and bookmarks. Browser history and bookmarks reveal quite a lot about the user, so access to them imposes a certain level of privacy invasion.
Read sensitive logs. Logs contain data that can be logically mapped to reveal the user’s activities; some applications even log data such as usernames and passwords.
Modify global system settings. Modifying global system settings can be an intrusive operation if the modifications lead to revealing other types of user data (for example, turning location settings on and off).
Retrieve running applications. The list of running apps is a legitimate resource for applications like task managers, but it also reveals information about the user’s preferences and the types of services used.
Display of system-level alerts. Abuse of this permission can lead to heavy pop-up advertising.
Take pictures and videos. This permission allows the application to take pictures and videos without any further prompting.
Access location extra commands. Applications that have this permission have detailed information about the user’s geographical location.
Change configuration. It is not clear what this permission grants, other than changing language and regional settings.
Kill background processes. A potentially risky permission if used to kill the processes of antivirus and similar apps.
Process outgoing calls. This permission grants access to metadata related to outgoing calls, so it should only be granted to VoIP apps.
Use SIP. SIP (Session Initiation Protocol) is used for VoIP services, so this permission has similar features to the “make phone calls” permission.
Write secure settings. This permission should be reserved for system applications.
Read profile. This permission allows the application to read personal account details of the user stored on the phone.
Read SMS. Applications that are granted this permission can access and read SMS messages, which in itself is a serious breach of privacy.
Write call log. This permission can be abused for hiding malicious behavior.
Write profile. Applications that have this permission can write data into the user’s profile.
Read social stream. This permission allows apps to access updates from social networks like Facebook and Twitter. This includes not only the user’s own updates, but also the updates of users in their network.
Authenticate accounts. This permission allows apps to authenticate credentials such as passwords. It is legitimate for apps that ask for user authentication and should be reserved for them, even though it is often abused for phishing.
Read email attachments. Email attachments often contain sensitive information and should therefore be private. This permission should be reserved for e-mail client apps.
Receive SMS/MMS. This permission allows the application to monitor incoming SMS/MMS messages, record them or perform processing on them.
Add system service. This permission should only be reserved for system applications.
Read instant messages (IM). Applications that ask for this permission can read instant messages, such as messages in Facebook Messenger and the like.

Intrusiveness
Finally, it is important to categorise the permissions, because users have the right to choose which applications they install on their own devices, and it is sometimes really hard to determine which application is privacy friendly and which is not. That is why, within this part of the research, we evaluated the different sorts of permissions granted to apps. We categorised the permissions into 3+1 categories: permissions with high, medium or low privacy risk (level of intrusiveness), and app-specific permissions.

Permissions type

The analysis of this secondary output shows that the apps we analysed require many permissions with a high level of intrusiveness. While some of the required permissions are legitimate for the operation of the app and are in accordance with the type of service the app provides, the need for some permissions should be seriously reconsidered by the applications’ developers.

]]>
152
Invisible Infrastructures: Understanding Autonomous Systems https://labs.rs/en/as/ Tue, 10 Feb 2015 13:29:33 +0000 http://labs.rs/?p=34

The Internet in its essence is not what most people perceive when they are online. It appears as an abstract space that offers limitless opportunities, but it actually consists of hardware: millions of servers, routers, cables and other network devices. In most cases, there is a physical cable or wireless connection reaching almost every corner of the world and every Internet user. Each and every network device in the Internet infrastructure has its own physical location. Some of them are grouped together, which makes their locations a sort of “crossroads” of the Internet.

One of the reasons we seldom discuss the issues of this invisible infrastructure is that packets travel through the network so fast that, in most cases, we don’t feel a significant difference between our packets traveling just around the corner and traveling around the world and back.

The fact that we are not able to perceive this difference does not change the fact that those packets, in just a fraction of a second, travel through thousands of kilometers of cables, a myriad of routers and switches, different national territories and a number of potential spots where they can be retained, slowed down, stored, copied or examined.

Unlike the telephone network, which for many years was a monopoly run by a single company in most countries, the global Internet consists of tens of thousands of interconnected networks run by telecommunication companies, Internet service providers, individual companies, universities, governments, and others [23]. Those entities have different legal regimes, business and technical relationships, privacy policies and ownership models. Even our most frequent and most sensitive communication relies on those entities. Even so, in most cases we are left in the dark about how those networks are interconnected and how they deal with our data.

orion wide

Our first step in understanding this invisible network is to try to understand the structure of our nearest network, the network run and owned by our Internet service provider. Every ISP is a story in itself: each has a different number of users and a different number of interconnected routers organized in different structures.
Every device that is connected to the Internet (your computer, routers, servers) has an IP address. The IP address is a logical Internet Protocol address that allows data to flow over the Internet. IANA (Internet Assigned Numbers Authority), through the RIRs (Regional Internet Registries [24]), assigns ranges of IP addresses to the entities interested in obtaining them, and the registries keep a database of which range belongs to whom, along with other data, including which range is assigned to which country. So, every ISP has a limited and defined range of IP addresses that it further assigns to its users and to the infrastructure that it owns.

This set, the range of all IP addresses that one ISP holds, was the starting point of our research.

provajderi tree

We took the IP ranges of every ISP and created a Network Topology map for each of them. To visualize large sets of data, in our case more than 300,000 different IP addresses and the links between them, we had to find a tool able to display, manipulate and transform the network into a map. We used Gephi [25], an interactive visualization and exploration platform for different kinds of networks and complex systems, dynamic and hierarchical graphs. The results are shown below in the form of 30 different maps of ISPs in Serbia.

Yunet Verat Telekom Sinet SBB SatTrakt RadiusVektorPTT
Orion Kopernikus IKOM HallSys ExeNet Beotelnet Zrenjanin Beotel AVCOM Amres Absolut OK

Different structures, and what we can learn from them

Network Structure analysis can be useful for different aspects of network security and efficiency of the network, but our main interests as researchers in this case are related to possible privacy related misuse of the network, digital surveillance and data retention, and different forms of Internet filtering, content control and censorship.

There are three basic network structures:
Centralized. All the devices are connected to one center. This center has privileged accessibility and thus represents the dominant element of the network.
Decentralized. Although the center is still the point of highest accessibility, the network is structured so that sub-centers also have significant levels of accessibility.
Distributed. No center has a level of accessibility that differs significantly from that of the others.
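
One way to put a number on the difference between these three structures is Freeman’s degree centralization, which is 1.0 for a pure star and 0.0 when every node has the same number of links. The sketch below is an illustration chosen here, not the method used to classify the ISP maps (which were read visually); the three toy networks are hypothetical.

```python
from collections import defaultdict

def degree_centralization(edges):
    """Freeman degree centralization of an undirected graph, from 0.0 to 1.0."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    n = len(neighbours)
    if n < 3:
        raise ValueError("need at least three connected nodes")
    degrees = [len(v) for v in neighbours.values()]
    top = max(degrees)
    # Sum of gaps to the best-connected node, divided by the gap sum of a star.
    return sum(top - d for d in degrees) / ((n - 1) * (n - 2))

# Toy topologies with six nodes each (hypothetical).
star = [("hub", f"r{i}") for i in range(5)]                      # centralized
two_hubs = [("h1", "h2"), ("h1", "a"), ("h1", "b"),
            ("h2", "c"), ("h2", "d")]                            # decentralized
ring = [(f"r{i}", f"r{(i + 1) % 6}") for i in range(6)]          # distributed

scores = {
    "star": degree_centralization(star),
    "two_hubs": degree_centralization(two_hubs),
    "ring": degree_centralization(ring),
}
```

In this toy example the star scores highest, the two-hub network sits in between and the ring scores zero, mirroring the centralized, decentralized and distributed cases.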

By analysing our visualizations of ISPs in Serbia we have noted that both centralized and decentralized models are present. The centralized model can be associated with the network of the state owned Telekom Serbia and an example of a decentralized model can be seen in the case of the University network – Amres.

But, apart from feeding our curiosity for a deeper understanding of our technological environment and our passion for visualizing big sets of data, can these maps be of practical use in the field of Internet freedom and user privacy?

The Game of Filtering

Internet filtering (or Internet censorship) is one of the most widespread government approaches to Internet control. Internet freedom around the world has declined for the fourth consecutive year, with a growing number of countries introducing online censorship and monitoring practices that are simultaneously more aggressive and more sophisticated in their targeting of individual users [26].

There are three commonly used techniques to block access to Internet sites: IP blocking, DNS tampering, and URL blocking using a proxy. These techniques are used to block access to specific web pages, domains, or IP addresses. When the targeted websites are outside the legal jurisdiction of the government (in a foreign country), they are the most effective way for it to block its citizens’ access. There are more advanced techniques (blocking searches involving blacklisted terms, keyword analysis, dynamic content analysis), but they are rarer and we will discuss them in other parts of our research.
What we find most interesting in relation to our ISP mapping efforts is the question: where in our ISP network topology would Internet filtering take place? According to the OpenNet Initiative study, Internet filtering can occur at any or all of the following four nodes in the network:

1) INDIVIDUAL COMPUTERS
2) INSTITUTIONS Filtering the network on an institutional level  using technical blocking
3) INTERNET SERVICE PROVIDERS Government-mandated filtering is most commonly implemented by Internet Service Providers (ISPs) using any one or combination of the technical filtering techniques mentioned above.
4) INTERNET BACKBONE State-directed implementation of national content filtering schemes and blocking technologies may be carried out at the backbone level, affecting Internet access throughout an entire country. This is often carried out at the international gateway.
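
To make the difference between the three blocking techniques named earlier more concrete, the following Python sketch shows what each one matches on: a destination address, a domain name, or a full URL. It is a simplified model written for this explanation, not a description of any real filtering product; all blacklist entries are reserved example values and the function names are hypothetical.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical blacklists; every entry is a reserved documentation/example value.
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]
BLOCKED_DOMAINS = {"blocked.example"}
BLOCKED_URL_PREFIXES = ("http://news.example/banned/",)

def ip_blocked(dest_ip):
    """IP blocking: match on the destination address, whatever sites it hosts."""
    addr = ipaddress.ip_address(dest_ip)
    return any(addr in net for net in BLOCKED_NETWORKS)

def dns_tampered(hostname):
    """DNS tampering: match on the looked-up name, covering a domain and its subdomains."""
    host = hostname.lower().rstrip(".")
    return any(host == d or host.endswith("." + d) for d in BLOCKED_DOMAINS)

def url_blocked(url):
    """URL blocking via a proxy: match on the full URL, so single pages can be targeted."""
    parts = urlsplit(url)
    normalised = f"{parts.scheme}://{parts.netloc.lower()}{parts.path}"
    return normalised.startswith(BLOCKED_URL_PREFIXES)
```

The sketch also hints at the trade-off: IP blocking is the coarsest of the three, since it affects everything hosted at an address, while URL blocking is the most precise but requires inspecting each request, which is why it is typically done through a proxy.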

Amres

In one of our previous studies [27], on the Internet filtering practice of AMRES, the national research and education network of Serbia, we discovered a decentralized method of content filtering, delegated to and executed by local administrators and routers at every university in Serbia. Each local administrator is responsible for their own blacklist of sites and ports. The AMRES network is one of the oldest ISPs in Serbia, established in the early 1990s, and the method of Internet filtering described here is filtering at the institutional level. If we take a look at the visualization of the AMRES network, we can clearly see why this method of Internet filtering was the most applicable one: the decentralized structure of the AMRES network practically imposes this kind of filtering strategy.

In our view, the type and complexity of a network’s structure and topology, its ownership model and its management needs play a crucial role in defining the model of Internet filtering and the amount and type of equipment that will be used. For us, as users or researchers without access to privileged information, the analysis of network topology maps can be a starting point for better understanding the infrastructures of control and potential repression.

Telekom

In December 2014, the Government of the Republic of Serbia sent a Proposal of the Law on Amendments to the Law on Games of Chance [28] to the Parliament. The proposed changes were adopted without discussion or public insight, even though these provisions would have introduced Internet censorship in Serbia through a “back door”. The main problem was the amendment [29] which prohibits “enabling access to websites by domestic electronic communication network service operators to legal entities or individuals organizing games of chance without the approval or consent of the Administration”.

Fortunately, after SHARE Foundation analyzed the Proposal and started a media campaign, the Proposal of the Law was withdrawn from the parliamentary procedure following an intervention of the Government. One part of the Proposal stated that the installation, maintenance and costs of the equipment intended for filtering would be the responsibility of the ISPs. To build an argument about the unreasonable costs that every ISP would face, we analyzed the network topology map of every individual ISP in Serbia and tried to estimate how much and what kind of equipment they would need to purchase. Even though our method is not 100% accurate, it gave us something to work with, an insight into the unknown and invisible design of the networks. Looking at the map of the network of Telekom Serbia, the biggest ISP in Serbia and the owner of the biggest share of the infrastructure, we could observe a highly centralized structure in which almost all the main nodes and routers were connected to just two main nodes. The logical conclusion is that, in order to perform real-time filtering, Telekom would need to install equipment at exactly those two points. On the other hand, from the number of nodes attached to those two main nodes, we can guess that they process huge amounts of traffic, so the equipment installed there would probably need to be of high-end performance. Given that there are just a few manufacturers of such equipment, we were able to predict the type and cost of the theoretical filtering solution.
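
The reasoning above, finding the few nodes to which almost everything else is attached, can be approximated on a topology graph by ranking nodes by their number of links. The Python sketch below is only an illustration: using node degree as a proxy for “main node” is an assumption made here, the toy network is hypothetical, and the actual analysis was done by reading the maps.

```python
from collections import defaultdict

def hub_coverage(edges, k=2):
    """Return the k best-connected nodes and the share of other nodes linked directly to them."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    # Highest degree first; ties are broken by name so the result is deterministic.
    hubs = sorted(neighbours, key=lambda n: (-len(neighbours[n]), n))[:k]
    hub_set = set(hubs)
    others = set(neighbours) - hub_set
    if not others:
        return hubs, 0.0
    covered = {n for n in others if neighbours[n] & hub_set}
    return hubs, len(covered) / len(others)

# Hypothetical centralized network: two cores, eight routers, one leaf behind r0.
edges = (
    [("core1", "core2")]
    + [("core1", f"r{i}") for i in range(4)]
    + [("core2", f"r{i}") for i in range(4, 8)]
    + [("r0", "x")]
)
hubs, share = hub_coverage(edges, k=2)
```

In a highly centralized network the share approaches 1 for a very small k, which is the situation described for Telekom Serbia; in a decentralized network many more points would be needed to reach the same coverage.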

We played the Game of Filtering on the maps of the other ISPs as well, and each of them was a story in itself. Most of them were much more decentralized, and it took more effort to find out where filtering could potentially happen. Decentralized networks are more complex to control: they have more crossroads, more points to cover if you want access to all the data flows. By contrast, it is hard not to see the shape of the Panopticon in a network organised like the one we saw in the case of Telekom Serbia.

Given that our analysis is still only at the level of an individual ISP, this is just a small fragment of the story. The Internet is a network of networks, and to be able to create a full picture and to understand where the points of control are, we need to examine their local interconnections and links to the International networks. This is the topic of our next analysis.

allproviders

]]>
34
Invisible Infrastructures : Internet Map of Serbia https://labs.rs/en/internet-map/ Sat, 07 Feb 2015 12:10:25 +0000 http://labs.rs/?p=183

For thousands of years maps have been essential tools helping humankind to define, explain, and navigate its way through the world. Topology maps of the Internet are an important tool for characterizing the infrastructure and understanding the properties, behavior and evolution of the Internet. In our previous study, we explored individual Internet Service Providers, their size and structure. Now we are trying to understand how they interconnect: we are exploring a network of networks or, we could say, the “Inter” of the Internet.

InternetMap

What are we looking at?

By identifying and tracerouting 300,000 IP addresses of 30 ISPs in Serbia using various open network analysis tools, we created a map representing over 4,500 main routers and servers that make up the core of the national Internet infrastructure. This Network Topology map allows us to identify the main actors, the companies (ISPs) that own and control the infrastructure and are able to access, retain, analyze or sell users’ metadata, as well as their interconnection points, the national Internet exit points and the level of infrastructure centralization, both nationally and at the level of individual ISPs.

Every dot represents one IP address (router or other network device) and the lines between the dots are the links – cables that connect them. Every colour represents  a different Internet Service Provider (ISP). This is a Network Topology map, i.e. it is not a physical map and it does not show exact geographical locations.
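
As a rough illustration of how such a map comes together, the Python sketch below turns traceroute results into the dots (nodes) and lines (edges) described above. It is a minimal sketch under stated assumptions: each trace is assumed to be already parsed into a list of hop addresses, `"*"` marks a hop that did not answer, and the addresses are reserved documentation values rather than real measurements.

```python
def paths_to_graph(paths):
    """Turn traceroute hop lists into a node set and an undirected edge set."""
    nodes, edges = set(), set()
    for hops in paths:
        previous = None
        for hop in hops:
            if hop == "*":
                # Unknown hop: do not draw a link across the gap.
                previous = None
                continue
            nodes.add(hop)
            if previous is not None and previous != hop:
                edges.add(frozenset((previous, hop)))
            previous = hop
    return nodes, edges

# Two hypothetical traces sharing part of their route.
paths = [
    ["192.0.2.1", "192.0.2.254", "198.51.100.1", "198.51.100.20"],
    ["192.0.2.2", "192.0.2.254", "198.51.100.1", "*", "203.0.113.9"],
]
nodes, edges = paths_to_graph(paths)
```

Links that appear in several traces are stored only once, so the shared part of the two routes contributes a single edge. The resulting node and edge sets could then be written out as an edge list for a visualization tool.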

Networks run by different Internet Service Providers are interconnected at physical locations where their routers are connected by cables. These points of connection are called Internet exchange points (IXPs). They are the places where different networks meet, joining them into a single system and allowing us to connect to devices on any other network.

Interconnection is both definitive of the Internet, and a manifestation of a business relationship between two ISPs [30].

Most ISPs are unlikely to have peering arrangements with all other ISPs in the world. Thus, with the exception of a small number of very large multinational network operators, most ISPs, themselves, need at least one transit provider to ensure they (and their customers) can reach the entire Internet [31].

Despite the strong theoretical background and the virtual nature of the subject of this research, the output is quite concrete.

The most important conclusion is the identification of the intersections, i.e. the points where the ISPs meet. These are points of power, and the more ISPs meet at a single point, the more important that point (router or server) becomes. It is important to know who manages and controls those points, because that is the entity that controls the Internet in Serbia.
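
Given a topology graph in which every node is labelled with the ISP it belongs to, the intersections can be found by looking for nodes whose links reach into more than one ISP. The Python sketch below illustrates the idea on a hypothetical five-node network; the `owner` mapping and the function name are assumptions introduced here, not the tooling used in the research.

```python
from collections import defaultdict

def interconnection_points(edges, owner):
    """Rank nodes by how many different ISPs they touch, keeping those that touch 2 or more."""
    isps_at_node = defaultdict(set)
    for a, b in edges:
        for node, peer in ((a, b), (b, a)):
            isps_at_node[node].add(owner[node])
            isps_at_node[node].add(owner[peer])
    meeting = {n: isps for n, isps in isps_at_node.items() if len(isps) >= 2}
    # Nodes where the most ISPs meet come first; ties are ordered by name.
    return sorted(meeting.items(), key=lambda item: (-len(item[1]), item[0]))

# Hypothetical network: a1 links ISP-A to both ISP-B and ISP-C.
owner = {"a1": "ISP-A", "a2": "ISP-A", "x": "ISP-A", "b1": "ISP-B", "c1": "ISP-C"}
edges = [("a1", "a2"), ("a1", "b1"), ("a1", "c1"), ("a2", "x")]
ranked = interconnection_points(edges, owner)
```

Here `a1` comes out on top because three ISPs meet there, which corresponds to the “points of power” described above.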

Beyond that, this research can serve as a starting point for further multidisciplinary research on the Internet infrastructure in Serbia. Examples include measuring Internet speed in Serbia, measuring the level of bandwidth throttling, and determining the routes that are used most often when accessing online content.


Methodology

The research process is divided into four phases. Every phase is equally important since it provides the input data for the phase that follows. The final output of this research can also be used as an input to some other, more advanced analysis.

Determining the IP ranges

Every device that is connected to the Internet has one or more interfaces through which it communicates with other devices on the network. Each network interface is defined by a set of parameters, one of which is its IP address. The IP address is a logical Internet Protocol address which allows data to flow over the Internet from its source to its intended destination.

Even though IP addresses are logical rather than physical, it is simple to determine from an IP address the country in which the device using it is located. The reason is that IP addresses are assigned centrally. IANA (Internet Assigned Numbers Authority), through the RIRs (Regional Internet Registries; RIPE NCC for Europe and parts of Asia), assigns ranges of IP addresses to the entities that request them, and the registries keep a database recording which range is assigned to whom, along with other data such as the country each range is associated with. In that sense, IP addresses are also somewhat physical addresses. This information is publicly available, and there are websites that list IP address ranges by country along with their actual owners.
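As a small illustration of what an assigned range looks like in practice, the sketch below uses Python's standard ipaddress module to expand a range and to test whether an address belongs to it. The range 192.0.2.0/24 is a block reserved for documentation and is used here only as a placeholder; it is not an actual Serbian allocation, and the function names are hypothetical.

```python
import ipaddress

# Placeholder range reserved for documentation, NOT a real Serbian allocation.
EXAMPLE_RANGE = "192.0.2.0/24"


def describe_range(cidr):
    """Return (first address, last address, number of addresses) of a CIDR range."""
    network = ipaddress.ip_network(cidr)
    return str(network[0]), str(network[-1]), network.num_addresses


def in_range(address, cidr):
    """Check whether a single IP address falls inside the given range."""
    return ipaddress.ip_address(address) in ipaddress.ip_network(cidr)


first, last, count = describe_range(EXAMPLE_RANGE)
print(first, last, count)
```

A list of real ranges taken from the public registry data could be fed to the same helpers to count how many addresses a country or an ISP holds.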

Scanning the Network

Since not all devices are connected directly to each other (in fact few are; even computers in a single office use a router to communicate), traffic has to be routed over the Internet. That means that if one host wants to communicate with another host on the Internet, it needs to establish a route through which they can connect. That route is in essence a sequence of IP addresses of the network devices that make it possible for the two hosts to communicate.

This means that in order to reach the destination address, the data hops from host to host. To see how two hosts are connected, ICMP (Internet Control Message Protocol), one of the most important protocols in the IP suite, is used. There is a simple tool called traceroute, mostly used in network diagnostics, that makes these hops visible in a systematic, usable form by sending ICMP messages and waiting for responses from the hosts along the way.

For tracerouting ranges of IP addresses there is a tool called Nmap, which is user friendly, detailed and precise. Naturally, the bigger the range, the more computing resources are consumed. Basically, Nmap traceroutes the paths between the host on which it runs and every IP address in the range being scanned.

Note: The output consists of the routes that connect the source computer to all the active hosts in the range that accept ICMP messages.
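The exact Nmap options used in this research are not stated, so the following is only a sketch of how such a scan could be invoked. It builds a command line from standard Nmap options (-sn, -PE, --traceroute, -oX); the helper name, the placeholder range and the output file name are assumptions made for illustration. The command is only assembled and printed, not executed.

```python
def build_scan_command(ip_range, output_file):
    """Assemble an Nmap command that traceroutes every responding host in a range.

    The option choice is an assumption, not the authors' documented setup.
    """
    return [
        "nmap",
        "-sn",            # host discovery only, skip the port scan
        "-PE",            # use ICMP echo requests as the discovery probe
        "--traceroute",   # record the hop path to each responding host
        "-oX", output_file,  # store the results as XML
        ip_range,
    ]


# Placeholder documentation range and file name, for illustration only.
command = build_scan_command("192.0.2.0/24", "scan.xml")
print(" ".join(command))
```

Running such a scan usually requires elevated privileges, and scanning networks you do not operate may be restricted, so the sketch deliberately stops at printing the command.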

Data Processing

The outputs of the scans are what we can call “raw data” in this case. A considerable portion of that data is unusable: hosts that, for various reasons, gave no response during the scans are irrelevant to the picture of the Internet infrastructure at the time of scanning.

The usable data needs to be extracted and formatted properly, so that it can be used as input to the visualization software. First and foremost, it is important to know which formats the visualization software can work with. For this research it was a CSV (Comma Separated Values) file with a simple structure of 3 fields: Source IP, Destination IP and Label.

The output of Nmap can be stored in an .xml file. Both of these file types are plain text formats, which makes parsing the data much easier. In essence, what is needed is a piece of software that extracts some text from one file and puts it in another. There are ample solutions available online, mainly scripts. In this case a Python script was used.

The script takes two arguments, the input file and the output file. It searches the input for certain keywords (in this case trace and ipaddr) and, when it finds them, extracts the associated values. In the end it generates the .csv file with the required structure (in this case omitting the Label field, which is not required). The script is available here.
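The original script is not reproduced here, so the following is a minimal sketch of the same idea rather than the authors' code. It assumes the usual layout of Nmap XML output, where each trace element contains hop elements carrying an ipaddr attribute, and it turns every pair of consecutive hops into one row. The sample XML, the function names and the Source/Target header are assumptions made for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Tiny hand-written sample in the shape of Nmap XML output (documentation addresses).
SAMPLE_XML = """<nmaprun>
  <host>
    <address addr="192.0.2.10" addrtype="ipv4"/>
    <trace>
      <hop ttl="1" ipaddr="198.51.100.1" rtt="1.20"/>
      <hop ttl="2" ipaddr="203.0.113.5" rtt="4.80"/>
      <hop ttl="3" ipaddr="192.0.2.10" rtt="9.10"/>
    </trace>
  </host>
</nmaprun>"""


def extract_edges(xml_text):
    """Return (source, destination) pairs for consecutive hops of every trace."""
    root = ET.fromstring(xml_text)
    edges = []
    for trace in root.iter("trace"):
        hops = [hop.get("ipaddr") for hop in trace.findall("hop") if hop.get("ipaddr")]
        # Pair each hop with the next one on the same route.
        edges.extend(zip(hops, hops[1:]))
    return edges


def edges_to_csv(edges):
    """Serialize the edge list as CSV text with a two-column header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target"])  # header names are an assumption
    writer.writerows(edges)
    return buffer.getvalue()


print(edges_to_csv(extract_edges(SAMPLE_XML)))
```

One caveat of this simple pairing: if a hop did not respond and is missing from the trace, the hops on either side of it are linked directly, so the resulting edge may skip over a silent device.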

Note: People who prefer Perl to Python should consider this link.

Data Visualization

In order to visualize large sets of data, in our case more than 300.000 different IP addresses and the links between them, we needed a tool able to display, manipulate and transform the network into a map. We used Gephi, an interactive visualization and exploration platform for different kinds of networks and complex systems, dynamic and hierarchical graphs.
Our main challenge was to represent a large number of nodes in a convenient way while keeping the visualization useful for further research. In our tests, most of the graph layout algorithms integrated into Gephi failed to deal with large networks (100k+ nodes), with the partial exception of the OpenOrd and ForceAtlas2 algorithms.
ForceAtlas2, the algorithm we used in the end, is a continuous graph layout algorithm: a force-directed layout that integrates techniques such as the Barnes-Hut simulation, degree-dependent repulsive force, and local and global adaptive temperatures. More about the algorithm can be found here.
To represent the results more clearly, we chose to eliminate end-nodes and other noise. The reduced and cleaned data set consisted of 4067 nodes, IP addresses that represent the interconnected infrastructure of the main routers and servers serving the end users in Serbia.
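How the end-nodes were filtered is not described in detail, so the sketch below shows only one plausible reading: a node is treated as an end-node if it has exactly one neighbour, and a single pass removes all such nodes together with their links. The function name and the example edge list are hypothetical.

```python
from collections import defaultdict


def prune_end_nodes(edges):
    """Drop nodes that have exactly one neighbour (single pass) and their links."""
    neighbours = defaultdict(set)
    for source, target in edges:
        if source != target:  # ignore self-loops when counting neighbours
            neighbours[source].add(target)
            neighbours[target].add(source)
    end_nodes = {node for node, adjacent in neighbours.items() if len(adjacent) == 1}
    return [
        (source, target)
        for source, target in edges
        if source != target and source not in end_nodes and target not in end_nodes
    ]


# Hypothetical example: three interconnected core routers and two end users.
EXAMPLE_EDGES = [
    ("core1", "core2"),
    ("core2", "core3"),
    ("core3", "core1"),
    ("core1", "user1"),
    ("core2", "user2"),
]

print(prune_end_nodes(EXAMPLE_EDGES))
```

Applying the pass repeatedly would also strip chains of devices that lead to a single end user; whether that is desirable depends on what should count as core infrastructure.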

Tools

Nmap ( http://nmap.org/ )
Python script used for XML to CSV parsing (script)
Gephi ( http://gephi.github.io/ ) 

]]>
183
Invisible Infrastructures : The Exciting life of Internet Packet https://labs.rs/en/packets/ Thu, 05 Feb 2015 12:44:51 +0000 http://labs.rs/?p=1 Before we dive deeper into the exciting life of an Internet packet, we should make a short stop and try to understand some basic technical aspects of the Internet communication and infrastructure. The Internet is a global network of computers and each computer connected to the Internet has a unique address. This address is known as an IP address (for example 24.135.245.173).

All the information transmitted through the Internet, between the routers, servers and other hosts, is split into smaller chunks of data known as packets. Every packet consists of a header and content. If we need to explain this by using an analogy, we should think about those packets as a traditional paper envelope where the letter inside is the content and the stamps and the addresses written on the outside are the headers. Without an address written on the envelope, the letter will never reach the intended destination. Similar to a post office, the ISP’s router examines the destination address of each packet and determines where to send it. As we said, those “addresses written on the envelope” are called headers and they are one type of metadata.


On a sunny morning at 7:45:03, one Internet packet is born. Weighing 60 bytes, he has just one simple mission in life – to get to the place called 173.252.120.6. Even though this does not sound like an exciting mission, the things that happen in the next second are pretty exciting. His journey starts with a fast 7 ms jump, 5 meters away, to the box called the home router. Up through the attic, where he passes through the switch where all the cables from the building meet, he jumps down to the street and into the underground cable that brings him to the main city router in Novi Sad. With a speed of 30.600.000 km/h he runs for 10 ms to Belgrade, to the SBB TelePark building.

89.216.8.141 SBB TelePark, Belgrade, RS (Photo: Google StreetView)

He jumps around a few routers inside the building and then leaves the country, traveling for 0,05 s through the tunnel in the direction of Frankfurt, Germany. Frankfurt is a really popular destination nowadays for young Internet packets born in Serbia. Almost 50% of them, at some point in their really short lives, pass through DE-CIX, the biggest Internet Exchange Point (IXP) in the world32, with an average of 2523 Gigabits of traffic per second33. This is the place where more than 600 ISPs from more than 60 countries meet and connect, something like an airport for the Internet.

In his long-distance journey our Internet packet will jump from one “crossroad” of the Internet to another, passing through different countries, crossing invisible borders and visiting big, gray, dehumanized buildings in the suburbs of cities. The European IXP scene today consists of some 150 IXPs and represents an impressive spectrum of players, ranging from the largest IXPs worldwide34 to up-and-coming IXPs and critical regional players35 all the way to small local IXPs that can be found all across Europe36.

80.81.194.40 – Equinix, Lärchenstr. 110, 65933 Frankfurt – DE-CIX premium enabled site (Photo: Google StreetView)
80.81.194.40 – Equinix, I.T.E.N.O.S. KPN, Level3, Telehouse, Kleyerstrasse 79-90, Frankfurt – DE-CIX (Photo: Google StreetView)

After the visit to the biggest Internet exchange point in the world, our packet is off to Dublin, Ireland, passing through the TelecityGroup carrier-neutral data center, which specializes in bandwidth-intensive applications, content and information hosting.

31.13.30.211 TelecityGroup, Dublin, IE (Photo: Google StreetView)

Some destinations on the path of our Internet packet are hidden from us: numerous repeaters, pieces of network equipment and intermediate routers along the way do not reveal their existence in our traceroute results. Most of this invisible equipment is there to make the journey possible, keeping the speed of packets constant or just connecting two cables, but some of the equipment is hidden from us for other reasons.

In the 1970s, Skewjack farm in west Cornwall, England, on the coast of the Atlantic Ocean, was known as a cult place for surfing enthusiasts – the Skewjack Surf Village. Unfortunately, the surf village was closed in 1986 and the place became known for another kind of surfing, web surfing, or to be more precise – an extended form of web-surfing voyeurism and hoarding. The farm is situated just a few kilometers from one really important place for the Internet, Widemouth Bay south of Bude, the landing spot for some of the biggest and most crowded transatlantic optic cables connecting Europe and the US, one of the backbones of today’s Internet. Before the Internet packet dives deep beneath the ocean, he will most likely jump to the bunker-like building at the Skewjack farm.

Skewjack, UK (Photo: Google Maps)

It was revealed in 2014 that this farm was the location of a Government Communications Headquarters interception point that copies data to GCHQ Bude, an even more visually exciting farm, populated with dozens of huge satellite dishes, that serves as a satellite ground station and eavesdropping centre. It is estimated that 25% of all Internet traffic travels through this point37.

GCHQ Bude, England (Photo: Google Maps)

After a quick detour, our packet goes to a transatlantic cable landing site 10 km away at Widemouth Bay, near the small coastal town of Bude, a place with one of the biggest concentrations of transatlantic optic cable landing sites in the world.

Before 1866, information traveled from one side of the Atlantic to the other only by ship, which sometimes took weeks. The first attempt, in 1858, at laying a 2,000-mile copper cable along the ocean bottom was successful, but the cable was operational for only three weeks before it was destroyed, having experienced many technical difficulties38. It took nine years and five attempts to succeed in building the transatlantic telegraph cable, “The Eighth World Wonder”, a technology that would rapidly transform communication between continents and create the first worldwide communication network.

Cable Station, Valentia Island, Ireland (Photo: Google StreetView)

The 1866 transatlantic telegraph cable, laid between Valentia Island in Ireland and Heart’s Content in Newfoundland, could transfer 8 words a minute, and it initially cost $10039 to send 10 words40. In 1900, the shape and topology of the telegraph network41 looked very similar to the submarine optic cable network that we have today42. The main landing points of this network, made of thousands of kilometers of optic cables, are shaped by geographical conditions as well as by political and economic power – the power to access, transfer and store information, to participate in the data and metadata exploitation industry and the surveillance-industrial complex.

It’s hard not to be seduced by the magic of those tiny streams of data traveling at the speed of light along the ocean floor. Different data streams are carried on different frequencies of light, allowing enormous amounts of data to be transferred, traveling at a speed of, in the case of our packet, 50.000.000 m/s43. In the past 150 years, the speed of transatlantic communication has gone from weeks to a fraction of a second, far beyond human perception, making the process of information transfer abstract and invisible. Still, for the high-frequency trading algorithms responsible for half of European Union and United States stock trades, every millisecond lost in the transfer of data plays a crucial role, pushing for faster and more sophisticated solutions in data transfer.

Tuckerton NJ, TAT 14 Landing point (Photo: Google Map)

There are a few main cable landing spots on the other side of the ocean. Most are situated on the east side of Long Island (Brookhaven) and at Manasquan and Tuckerton in New Jersey, an hour and a half by car south of New York City. Our Internet packet is now heading south, towards another Internet capital – Ashburn, Virginia, 50 km northwest of Washington, D.C.

At first, the Internet backbone was maintained by the US government, run by the National Science Foundation, and used by academic and educational communities and institutions. The NSF supercomputing initiative, launched in 1984, was designed to make high-performance computers accessible to researchers around the US44, and in 1986 this 56 kbit/s backbone was connecting scientific centers across the US. But the NSFNET Acceptable Use Policy45 barred the growing number of commercial ISPs from using this backbone. At the beginning of the 90s, commercial ISPs needed a way to physically connect to each other in order to exchange traffic over their private infrastructure, avoiding the government-owned backbone. They came up with common, neutral physical locations where they would connect their networks, a kind of roundabout on the information highway. One of the first such locations was Ashburn, a suburb of Washington, D.C., populated with numerous technology startups and military and government contractors. MAE (Metropolitan Area Exchange), created in 1992, quickly became one of the biggest crossroads in Internet history, with most of the world’s Internet traffic passing through it at some point, creating a sort of Internet black hole. The 5th floor of a building in Tysons Corner became a bottleneck of the Internet.

The opening of the network access points also marked an important philosophical shift, one that would have ramifications for its physical structure. In a clear departure from its original roots, the Internet was no longer structured as a mesh, but rather entirely depended on a handful of centers46.

Even though it is no longer as influential as it was at the beginning of the 90s, Ashburn is still one of the Internet capitals, home to a large number of data centers, a strategic communications hub for the eastern United States, a major communications gateway to Europe and the largest Internet peering point in North America.

Equinix, 44470 Chilum Place, Ashburn, VA

After a visit to the former Internet capital, our Internet packet heads 700 km southwest, to his final destination – Forest City in North Carolina. Forest City is home to 7,500 residents and hundreds of millions of user profiles. It is the physical manifestation of Facebook: the world’s biggest database of personal information, private and public photos, intimate chats, thoughts and emotions, packed into two massive 28.000-square-meter facilities filled with hard drives, routers, wires and cooling systems.

31.13.29.232 Facebook Data Center, Forest City, North Carolina, US (Photo: Google StreetView)

Only 80 full-time employees working three shifts are needed to run these gigantic gray buildings. Thanks to the automation systems47, one technician can take care of about 25,000 servers that work in complete darkness, with lights turning on only when sensors detect movement. Not far from this place there are other big facilities, created with the same goal and similar in size, but operated by Google (in Lenoir) and Apple (in Maiden).

Google data center, Lenoir, North Carolina, US (Photo: Google StreetView)

Those are the locations where your data actually exists. Data centers are monopolies of collective data, accumulation of information about information48. Those are the locations where the metadata society accumulates wealth, consisting of vast amounts of information, created by us and analyzed by them.

This is the end point of the exciting 1-second-long life and journey of our Internet packet. In only one second, he traveled over 9000 km and crossed numerous borders, being transferred from one ISP to another, each operating under different legal frameworks and commercial interests, jumping from one Internet crossroad to another and leaving a trace of his existence at every point along his path. The life mission of this packet was simple: he was created to tell facebook.com that a user somewhere in Serbia typed www.facebook.com into a browser. Once at his fated destination, he will trigger the birth of a number of new packets, filled with information, that will set out on a journey in the opposite direction, from the Facebook data center to the user’s computer, resulting in a Facebook page being shown on the screen in the blink of an eye.


 Ghosts and the afterlife of Data

At his final destination our packet will be stored, laid to rest in a dark, cold room of the data center among billions of other packets, waiting to eventually have an afterlife as a subject of algorithmic analysis. But this is not the only place where he will be stored. At numerous points on his journey he was cloned and stored in other data centers and on ISPs’ data retention servers in different countries, by different government agencies or commercial companies. He will eventually be used in different ways: as a piece of the big puzzle representing your behavior, preferences and interests, or as a little piece that will either clear you or mark you as a potential terrorist in the eyes of the algorithm. On the other hand, our little Internet packet will contribute to the fast-growing industry of personal data collection, analysis and trade. The estimated value of EU citizens’ data was €315bn in 2011 and has the potential to grow to nearly €1tn annually by 202049.

]]>
1