Invisible Infrastructures: Data Flow

In the previous story we followed the exciting life of our hero, one small Internet packet. To build a wider picture of the data flow and to map its key locations and actors, we then analyzed the data paths to the 100 websites most visited by users located in Serbia.

We used Nmap, an open-source network exploration and security scanner, to trace and visualize the routes to the 100 websites most visited by users in Serbia¹, according to Alexa, the web analytics company owned by Amazon. As in our previous maps, every dot represents one IP address (a router or other network device), and the lines between the dots are the links, the cables that connect them.
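
The article does not describe its processing pipeline, so the following is only a minimal sketch of how traceroute results can become a map of dots and lines. It assumes the scans were saved as XML with Nmap's -oX option together with --traceroute, in which each host element may contain a trace element holding hop elements with ttl and ipaddr attributes. The function name trace_graph and the sample data (built from reserved documentation address ranges) are hypothetical.

```python
import xml.etree.ElementTree as ET

def trace_graph(xml_text):
    """Collect nodes (hop IPs) and links (consecutive hop pairs) from Nmap XML.

    Assumes output of `nmap --traceroute -oX`: each <host> may contain a
    <trace> element with <hop ttl=".." ipaddr=".."/> children. Hops that did
    not answer are absent, so a link may silently span such a gap.
    """
    root = ET.fromstring(xml_text)
    nodes, links = set(), set()
    for host in root.iter("host"):
        trace = host.find("trace")
        if trace is None:
            continue
        # Order hops by TTL so each path runs from our side outwards.
        hops = sorted(trace.findall("hop"), key=lambda h: int(h.get("ttl")))
        previous = None
        for hop in hops:
            ip = hop.get("ipaddr")
            nodes.add(ip)
            if previous is not None and previous != ip:
                links.add((previous, ip))
            previous = ip
    return nodes, links


# Hypothetical scan of two targets, using documentation address ranges.
SAMPLE = """<nmaprun>
  <host><trace port="80" proto="tcp">
    <hop ttl="1" ipaddr="192.0.2.1" rtt="0.50"/>
    <hop ttl="2" ipaddr="198.51.100.7" rtt="3.10"/>
    <hop ttl="3" ipaddr="203.0.113.10" rtt="9.80"/>
  </trace></host>
  <host><trace port="443" proto="tcp">
    <hop ttl="1" ipaddr="192.0.2.1" rtt="0.40"/>
    <hop ttl="2" ipaddr="198.51.100.7" rtt="2.90"/>
    <hop ttl="3" ipaddr="203.0.113.20" rtt="25.00"/>
  </trace></host>
</nmaprun>"""

nodes, links = trace_graph(SAMPLE)
print(len(nodes), len(links))  # 4 3
```

Shared hops collapse into a single dot, which is exactly what makes convergence points visible on the map.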

National traffic

The journey starts at the yellow dot in the middle of the map. After a few local hops, all the traffic converges on a handful of points. Since Share Lab is connected to the Internet via SBB, the biggest regional ISP, the results of this research reflect SBB's network.

All data first travels to SBB's server in Belgrade, at the SBB TelePark, and all traffic to local websites goes through a single point (bg-ds-r-1-oe0-0-0-1sbb.rs).

In theory, then, anyone who wanted to examine, filter or retain all the national traffic on the SBB network could do so at this one point. In fact, SBB and the other ISPs in Serbia are obliged by the data retention law to do exactly that: store all metadata about Internet traffic and allow government bodies to access it.
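
Whether a single router sees all traced traffic can be checked directly on traceroute data: any hop that appears on every path to the local websites is such a chokepoint. A minimal sketch, with hypothetical hop names standing in for real routers:

```python
def common_hops(paths):
    """Return hops that appear on every path, in the order of the first path.

    Any such hop is a single point through which all traced traffic passes.
    """
    if not paths:
        return []
    shared = set(paths[0]).intersection(*map(set, paths[1:]))
    return [hop for hop in paths[0] if hop in shared]


# Hypothetical paths from a lab to three local websites.
paths = [
    ["home-gw", "isp-agg-1", "isp-core", "sox", "host-a"],
    ["home-gw", "isp-agg-1", "isp-core", "peer-x", "host-b"],
    ["home-gw", "isp-agg-1", "isp-core", "sox", "host-c"],
]
print(common_hops(paths))  # ['home-gw', 'isp-agg-1', 'isp-core']
```

The lab's own gateway is trivially shared, so the interesting result is the last common hop inside the ISP, here the hypothetical isp-core.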

From this national bottleneck, paths lead either to different peering routers or to the Serbian Open Exchange (SOX). As we explained in our Interconnection map of Serbia, networks run by different Internet Service Providers are interconnected at physical locations where their routers are linked by cables, such as Internet exchange points. These are the places where separate networks meet and merge into a single system, allowing a device on one network to reach devices on any other.

Local exchange points keep information flowing locally. Without them, Internet packets take longer routes: when two providers have no direct connection, traffic between them must pass through a third provider or even another country. Unfortunately, there is only one Internet exchange point in Serbia, so packets from anywhere else in the country mostly travel to Belgrade. If more exchange points existed in different parts of the country, data could stay local and its routes would be significantly shorter.

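The effect of a local exchange point on route length can be illustrated with a breadth-first search over a toy topology. This is a sketch under stated assumptions: the node names and links are hypothetical, and hop count is only a rough proxy, since a detour via Belgrade also adds physical distance and delay.

```python
from collections import deque

def hop_count(links, start, goal):
    """Breadth-first search: fewest links between two nodes, or None."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


# Hypothetical: two ISPs in the same city outside Belgrade that can only
# exchange traffic via the single exchange point in Belgrade.
links = [
    ("isp1-city", "isp1-belgrade"),
    ("isp1-belgrade", "ix-belgrade"),
    ("ix-belgrade", "isp2-belgrade"),
    ("isp2-belgrade", "isp2-city"),
]
print(hop_count(links, "isp1-city", "isp2-city"))  # 4

# Add a hypothetical local exchange point in that city.
links += [("isp1-city", "ix-city"), ("ix-city", "isp2-city")]
print(hop_count(links, "isp1-city", "isp2-city"))  # 2
```
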
There is another curiosity visible on this map. A point on Telenor's servers, part of the SOX network (mainstream-telenor.sox.rs), connects the most visited websites in Serbia. It belongs to Mainstream d.o.o., a company established in 2005 that provides hosting and maintenance services. More than half of the local websites in our sample are hosted by this company. It has racks of servers in three data centers in Belgrade, but according to our map the websites are mostly located at the Telenor Tier 3 data center.

Based on our map, we can conclude that local Internet traffic is highly centralized, at three different levels:

ISP level: a single point through which all the traffic is routed
Internet exchange point level: there is just one IX in Serbia
Hosting level: more than half of the most visited local websites are hosted by a single company

Points of centralization are points of power: the more routers or ISPs meet at a single point, the more important that point, router or server becomes. It matters a great deal who controls these points, since those entities have influence over the Internet in Serbia and, given the opportunity, could use or misuse their power.
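
One simple way to quantify how many routers "meet at a single point" is degree: the number of distinct neighbors each node has on the map. The sketch below ranks nodes by degree from a list of links; the node names are hypothetical, and degree is only one of several possible centrality measures (the article does not say which, if any, it used).

```python
from collections import Counter

def rank_by_degree(links, top=3):
    """Rank nodes by how many distinct neighbours they are linked to."""
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    degree = Counter({node: len(peers) for node, peers in neighbours.items()})
    return degree.most_common(top)


# Hypothetical links between an ISP core, an exchange point and hosts.
links = [
    ("isp-agg-1", "isp-core"),
    ("isp-core", "sox"),
    ("isp-core", "peer-x"),
    ("isp-core", "exit-1"),
    ("sox", "host-a"),
    ("sox", "host-c"),
    ("peer-x", "host-b"),
]
print(rank_by_degree(links, top=2))  # [('isp-core', 4), ('sox', 3)]
```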

Exit points

From our findings we were able to identify a few main paths along which data leaves the country. Just as local traffic is centralized on a single router, outgoing data passes through a few main points before leaving the country.

The two biggest points of centralization according to our map are:

at-be-r-1-pc1.sbb.rs: mostly connects to routers associated with DE-CIX in Frankfurt
bg-yo-r-1-pc1.sbb.rs: connects to bpt-b4-link.telia.net, which leads to routers in Hungary and Prague

A third notable point, peer-A515169.sbb.rs, connects to Google-owned websites.
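
Statements about exit points, or about how many of our packets pass through a given city, come down to counting the share of traced paths that contain a given hop. A minimal sketch with hypothetical, simplified paths:

```python
from collections import Counter

def path_share(paths):
    """Fraction of paths that pass through each hop (counted once per path)."""
    counts = Counter()
    for path in paths:
        counts.update(set(path))
    return {hop: n / len(paths) for hop, n in counts.items()}


# Hypothetical paths to four foreign websites, reduced to their key hops.
paths = [
    ["isp-core", "exit-1", "ix-frankfurt", "host-de"],
    ["isp-core", "exit-1", "ix-frankfurt", "host-nl"],
    ["isp-core", "exit-2", "transit-budapest", "host-us"],
    ["isp-core", "exit-2", "transit-prague", "ix-frankfurt", "host-ch"],
]
share = path_share(paths)
print(share["exit-1"], share["ix-frankfurt"])  # 0.5 0.75
```

Counting each hop once per path keeps routing loops or repeated hops from inflating a node's importance.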

Data about Data Flow

Now that we have identified the main bottlenecks of local Internet traffic and the main exit points, let’s analyze the main ports our data passes through and the countries where it reaches its final destinations.

Of the 100 websites most visited by users from Serbia, only 27 are actually hosted in Serbia, and more than half of those are hosted by a single company. 63% of our Internet packets leave the country. Let’s examine where they go.

One big stream of data heads towards Budapest (25) and Vienna (25), and another towards Prague (15). These are mostly transit ports that forward data further to Germany, the Netherlands, the UK or Switzerland.

Frankfurt, Germany, is by far the capital of our data flow and its biggest transit port. Half of our packets pass through this city at some point, mostly through the DE-CIX Internet exchange point. It is not only the biggest gathering point for Internet packets coming from Serbia but also the biggest Internet exchange point in the world, connecting more than 600 ISPs.

Even though Frankfurt is a transit capital, not much data is actually hosted there. Of our sample websites hosted in Europe, the largest share is located in Amsterdam, and more than half of those 11 websites are related to pornography. This is the red-light district of the Internet’s second-biggest port.

36% of our visits cross the Atlantic Ocean to the US. Unlike in European countries, where most of the data is in transit, here the data is hosted.

Looking at the overall picture of where the websites most visited by Internet users from Serbia are hosted, US hosting providers dominate over those in the EU and Serbia (36% US, 27% EU, 27% RS).
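
As a quick consistency check on these figures, the US and EU shares together give the 63% that leaves the country, and the three categories account for 90 of the 100 sites in the sample:

```python
# Hosting locations of the top 100 sites, as percentages reported in the text.
hosting = {"US": 36, "EU": 27, "RS": 27}

abroad = hosting["US"] + hosting["EU"]   # everything not hosted in Serbia
located = sum(hosting.values())          # sites with an identified location
print(abroad, located)  # 63 90
```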

For data transfer, the most important location on the US East Coast is Ashburn, Northern Virginia: one of the Internet’s capitals, home to a large number of data centers, a strategic communications hub for the eastern United States, a major communications gateway to Europe and the largest Internet peering point in North America.

According to our results, the final destinations of our Internet packets are densely concentrated on the West Coast, especially around San Francisco and Silicon Valley. What we cannot be sure of is whether these are the real final destinations or just a mask, since the location attached to an IP address may reflect where the range is registered rather than where the server physically sits. The findings of our other research suggest that the actual locations are elsewhere, mostly around Northern Virginia, where Google, Facebook and Amazon operate big data centers.
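
One common way to test whether a geolocated address is a mask, not something the original research describes, is a speed-of-light check. A packet cannot complete a round trip faster than light travels through optical fiber, roughly 200 km per millisecond, so a measured round-trip time (RTT) below that bound rules out the claimed location. The coordinates are approximate and the 40 ms measurement is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0
FIBRE_KM_PER_MS = 200.0  # light in optical fibre covers roughly 200 km per ms

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance between two points given in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def min_rtt_ms(distance_km):
    """Physical lower bound on round-trip time over fibre (no detours)."""
    return 2 * distance_km / FIBRE_KM_PER_MS

def location_plausible(measured_rtt_ms, lat1, lon1, lat2, lon2):
    """False if the measured RTT is faster than light in fibre would allow."""
    return measured_rtt_ms >= min_rtt_ms(great_circle_km(lat1, lon1, lat2, lon2))


BELGRADE = (44.79, 20.45)         # approximate coordinates
SAN_FRANCISCO = (37.77, -122.42)  # approximate coordinates

d = great_circle_km(*BELGRADE, *SAN_FRANCISCO)
print(round(d), round(min_rtt_ms(d)))
# A hypothetical 40 ms reply cannot physically come from San Francisco.
print(location_plausible(40.0, *BELGRADE, *SAN_FRANCISCO))  # False
```

The check can only rule a location out: a plausible RTT does not prove the server is where the registry says it is.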

According to our research, it seems that the Internet we use is not such a decentralized place after all.

Based on our sample, the Internet we use is built around a few main transit and hosting sites, capitals of data flow located in only 13 countries, where our data either passes through or ends its journey. This structure is very different from the original idea of a decentralized mesh network, conceived in the early days of the Internet.

Moreover, none of the top 100 websites are located outside Europe and the US, and none come from the wider region either.

National borders of the Internet

For the purpose of this research, we can distinguish two types of “borders” on the Internet. The first type exists because, in order to operate in a given country, Internet Service Providers must comply with its national laws and regulations. The physical infrastructure, i.e. cables, routers, switches and servers, is located on the territory of a particular country and is owned and managed by a legal entity (a company or institution) subject to national regulation, which directly links the state, the Internet Service Providers and the data traveling through their networks. For example, Internet Service Providers operating in Serbia are obliged by the Serbian Law on Electronic Communications to keep all metadata and give various state institutions access to it. To give their customers access to the entire Internet, ISPs maintain interconnection points with providers in other, usually neighboring, countries. Data traveling from one ISP to another crosses a theoretical border point where one state’s jurisdiction ends and another’s begins. In this sense, the borders that apply to the Internet are the same as those found in the “real” world. Internet Service Providers are the gatekeepers of the Internet, and any form of state censorship, filtering or throttling of traffic is therefore most likely to be carried out in cooperation with them. Mapping the interconnection points of national and international providers and analyzing the network topology help us better understand the key points of this infrastructure, where Internet censorship, filtering or traffic throttling could happen.

The second type of border relevant to our research is created by websites, Internet platforms and applications themselves. Every device connected to the Internet needs an IP address in order to communicate with other devices. Even though IP addresses are logical rather than physical, the IP address alone is usually enough to determine, with reasonable accuracy, the country in which a device is located. This is because IP addresses are distributed hierarchically: IANA (Internet Assigned Numbers Authority) allocates large blocks of addresses to the five Regional Internet Registries, which in turn assign smaller ranges to ISPs and other organizations and keep public records of which range belongs to whom, including the country in which the holder is registered. Because of this, websites, Internet platforms and applications can detect which country you are visiting from and allow or block your access to content or services accordingly. Reasons for blocking access to content at the national level range from intellectual property and copyright issues to the blocking of sexual, political or religious content under pressure from governments worldwide. In this case, the role of the gatekeeper is played by the companies that own the websites or applications. You have probably already seen a message like this on sites such as YouTube: “This content is not available in your country”.
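
A minimal sketch of how a platform might turn an IP address into a country and then into a block decision. The registry table, the blocked-country set and the function names are hypothetical; real services rely on registry records or commercial geolocation databases, and the ranges below are reserved documentation blocks, not real allocations.

```python
import ipaddress

# Hypothetical registry extract: address range -> country code.
REGISTRY = [
    (ipaddress.ip_network("192.0.2.0/24"), "RS"),
    (ipaddress.ip_network("198.51.100.0/24"), "DE"),
    (ipaddress.ip_network("203.0.113.0/24"), "US"),
]

BLOCKED_IN = {"RS"}  # hypothetical: countries where a video is unavailable

def country_of(ip):
    """Return the country code of the most specific matching range, or None."""
    addr = ipaddress.ip_address(ip)
    matches = [(net, cc) for net, cc in REGISTRY if addr in net]
    if not matches:
        return None
    # Longest-prefix match: the narrowest range is the most specific record.
    return max(matches, key=lambda m: m[0].prefixlen)[1]

def serve(ip):
    """Mimic a platform's geo-blocking decision for a visitor's IP address."""
    if country_of(ip) in BLOCKED_IN:
        return "This content is not available in your country"
    return "OK"


print(country_of("198.51.100.42"))  # DE
print(serve("192.0.2.15"))          # This content is not available in your country
```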

Data Flow and privacy

All ISPs in Serbia and in most European Union countries are still legally obliged to store metadata. By storing and analyzing metadata, ISPs and government bodies can trace and identify the source and destination, the date, time and duration, and the type of a communication. Even without access to the content, metadata reveals private information, sometimes much more than the content would.
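
To make this concrete, here is a minimal sketch of a record holding the fields listed above (source, destination, date and time, duration, type) and of how little code it takes to turn such records into a profile of someone's contacts and habits. The record layout, names and data are hypothetical; actual retention formats are defined by national law and differ between countries.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class MetadataRecord:
    """Hypothetical retained record: who, to whom, when, how long, what kind."""
    source: str
    destination: str
    start: datetime
    duration_s: int
    kind: str  # e.g. "call", "email", "web"

def contact_profile(records, subject):
    """Summarise, without any content, whom a subject talks to and when."""
    contacts = defaultdict(int)
    hours = defaultdict(int)
    for r in records:
        if subject in (r.source, r.destination):
            other = r.destination if r.source == subject else r.source
            contacts[other] += 1
            hours[r.start.hour] += 1
    return dict(contacts), dict(hours)


records = [
    MetadataRecord("A", "clinic", datetime(2016, 3, 1, 23, 5), 300, "call"),
    MetadataRecord("A", "clinic", datetime(2016, 3, 2, 23, 40), 240, "call"),
    MetadataRecord("B", "A", datetime(2016, 3, 2, 9, 15), 60, "email"),
]
contacts, hours = contact_profile(records, "A")
print(contacts)  # {'clinic': 2, 'B': 1}
print(hours)     # {23: 2, 9: 1}
```

Two late-night calls to a clinic already say something personal, even though not a single word of content was stored.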

Appearing in a video conference call in September 2014, Edward Snowden explained: “Metadata is extraordinarily intrusive. As an analyst, I would prefer to be looking at metadata than looking at content, because it’s quicker and easier, and it doesn’t lie… If I’m listening to your phone call, you can try to talk around things, you can use code words. But if I’m looking at your metadata, I know which number called which number. I know which computer talked to which computer”. Stewart Baker, former General Counsel of the National Security Agency (NSA), said: “Metadata absolutely tells you everything about somebody’s life. If you have enough metadata, you don’t really need content.” How much do the terms you google, the subjects of your emails, the network of people you communicate with, the websites you visit, your location and your communication habits reveal about your private or professional life? Metadata analysis is far more intrusive and efficient than traditional surveillance techniques such as those practiced by the Stasi in former East Germany, which has been described as one of the most effective and repressive intelligence and secret police agencies ever to have existed and which, by some estimates, employed between 500,000 and 2 million occasional informants.

Yet without metadata, communication on the Internet as we know it today would not be possible. To communicate at all, we consent to ISPs handling and processing our data and metadata; at the same time, by living in Serbia or the EU under data retention laws, we have accepted that our metadata is stored, accessed and analyzed by various government bodies. Information is also the resource driving the Internet industry: the business models of the biggest Internet companies are based on collecting and analyzing our private information and on automated profiling in order to sell targeted ads.

With this in mind, our initial aim in this research was to better understand the invisible networks and mechanisms underlying these processes. Our previous work focused more on legal aspects and on analyzing cases of human rights violations online, mostly related to privacy and freedom of expression. To make progress, we believe we need to examine and understand the technical reality and processes hidden beneath our device screens: the complex and invisible mix of software and hardware layers made up of countless lines of code and vast amounts of cables, routers and servers.

¹ http://www.alexa.com/topsites/countries/RS