mapping – SHARE LAB

Invisible Infrastructures: Understanding Autonomous Systems

admin — Tue, 10 Feb 2015 13:29:33 +0000

The Internet in its essence is not what most people perceive when online. It is an abstract space which gives limitless opportunities, but basically it consists of hardware, millions of servers, routers, cables and other network peripheral devices. Basically, in most cases, there is a physical cable or wireless connection reaching almost every corner of the world and every Internet user. Each and every network device of the Internet infrastructure has its own physical location. Some of them are grouped, which makes their locations a sort of “crossroads” of the Internet.

One of the reasons we seldom discuss the issues of this invisible infrastructure is the fact that the speed of the packets traveling through the network is so big and unnoticeable to us, in most cases we don’t feel a significant difference in whether our packets are traveling just around the corner or to around the world and back.

The fact that we are not able to perceive this difference does not change the fact that those packets, during just a little fragment of a second, travel through thousands of kilometers of cables, myriad of routers and switches, different national territories and a number of potential spots where they can be retained, slowed down, stored, copied or examined.

Unlike the telephone network, which for many years was a monopoly run by a single company in most countries, the global Internet consists of tens of thousands of interconnected networks run by telecommunication companies, Internet service providers, individual companies, universities, governments, and others ¹ . Those entities have different legal regimes, business and technical relationships, privacy policies and ownership models. Even our most frequent and most sensitive communication relies on those entities. But even so, in most cases, our knowledge of how those networks are interconnected and how they deal with our data is left in the dark.

Our first step of understanding this invisible network is to try to understand the structure of our nearest network, network runed and owned by our Internet service provider. Every ISP is a story for itself, they have a different number of users, a different number of interconnected routers organized in different structures.
Every device that is connected to the Internet (your computer, routers, servers) have an IP address. The IP address is a logical Internet Protocol address which allows data to flow over the Internet. IANA (Internet Assigned Numbers Authority) through the RIRs (Regional Internet Registries²) assigns the ranges of IP addresses to entities interested to buy them, and they keep a database of which range belongs to whom and other data, including which range is assigned to which country . So, every ISP has a limited and defined range of IP addresses that they further assign to their users and infrastructure that they own.

This set, range of all IP addresses that one ISP owns, was the starting point of our research.

We used IP ranges of every ISP and created a Network Topology map for every one of them. In order to visualize large sets of data, in our case more than 300.000 different IP addresses and links between them, we had to find a tool that is able to display, manipulate and transform the network into a map. We used Gephi ³, an interactive visualization and exploration platform for different kinds of networks and complex systems, dynamic and hierarchical graphs. The obtained results are showed below in form of 30 different maps of ISPs in Serbia.

Different structures, and what we can learn from them

Network Structure analysis can be useful for different aspects of network security and efficiency of the network, but our main interests as researchers in this case are related to possible privacy related misuse of the network, digital surveillance and data retention, and different forms of Internet filtering, content control and censorship.

There is three basic network structures:
Centralized. All the devices are connected to one center. This center has privileged accessibility and thus represents the dominant element of the network.
Decentralized. Although the center is still the point of highest accessibility, the network is structured so that sub-centers also have significant levels of accessibility.
Distributed. No center has a level of accessibility that significantly differs to the others.

By analysing our visualizations of ISPs in Serbia we have noted that both centralized and decentralized models are present. The centralized model can be associated with the network of the state owned Telekom Serbia and an example of a decentralized model can be seen in the case of the University network – Amres.

But, except feeding our curiosity for deeper understanding of our technological environment and passion for visualizing big sets of data, can we have a practical use of those maps in the field of internet freedom and user privacy?

The Game of Filtering

Internet filtering (or Internet Censorship) is one of the most widespread forms of government approach to internet control. Internet freedom around the world has declined for the fourth consecutive year, with a growing number of countries introducing online censorship and monitoring practices that are simultaneously more aggressive and more sophisticated in their targeting of individual users ⁴ .

There are three commonly used techniques to block access to Internet sites: IP blocking, DNS tampering, and URL blocking using a proxy. These techniques are used to block access to specific Web Pages, domains, or IP addresses. When the targeted websites are outside the legal jurisdiction of the government (in a foreign country) this is the most effective way to block access to their citizens. There are more advance techniques, (blocking searches involving blacklisted terms, keywords analysis, dynamic content analyses) but they are more rare and we will discuss them in other parts of our research.
What we find most interesting, related to our ISP mapping efforts is the question: Where will internet filtering take place in our ISP network topology? According to the OpenNet Initiative study, Internet filtration can occur at any or all of the following four nodes in network:

1) INDIVIDUAL COMPUTERS
2) INSTITUTIONS Filtering the network on an institutional level using technical blocking
3) INTERNET SERVICE PROVIDERS Government-mandated filtering is most commonly implemented by Internet Service Providers (ISPs) using any one or combination of the technical filtering techniques mentioned above.
4) INTERNET BACKBONE State-directed implementation of national content filtering schemes and blocking technologies may be carried out at the backbone level, affecting Internet access throughout an entire country. This is often carried out at the international gateway.

In one of our previous researches ⁵ related to the case of the national research and education network of Serbia – AMRES’ internet filtering practice, we discovered a decentralized method of content filtering, delegated and executed through local administrators and routers at every University in Serbia. Each local administrator is responsible for his own black list of sites and ports. The AMRES network is one of the oldest ISPs in Serbia, established in the early 1990s, and its method of Internet filtering presented here is filtering on institutional level. If we take a look at the visualization of the AMRES network, we can clearly see why this method of Internet filtering was the most applicable one – the decentralized structure of the AMRES network somehow imposes this kind of filtering strategy.

In our view, that type and complexity of a network structure and topology, ownership model & management needs, have a crucial role in defining the model of internet filtering, and the amount and type of equipment that will be used. For us, users or researchers without access to privileged information, the analysis of network topology maps can be a starting point for better understanding infrastructures of control and potential repression.

In December 2014, the Government of the Republic of Serbia sent a Proposal of the Law on Amendments to the Law on Games of Chance ⁶ to the Parliament. The proposed changes were adopted without a discussion and public insight, even though these provisions would introduce Internet censorship in Serbia through a “back door”. The solution that presented the main problem was the amendment ⁷ which prohibits “ enabling access to websites by domestic electronic communication network service operators to legal entities or individuals organizing games of chance without the approval or consent of the Administration”.

Fortunately, after SHARE Foundation analyzed the Proposal and started a media campaign, the Proposal of the Law was withdrawn from the parliamentary procedure following an intervention of the Government. In one part of the Proposal, it was written that the installation, maintenance and costs of the equipment intended for filtering is a responsibility of the ISPs. In order to create an argument regarding unreasonable costs that every ISP would have, we tried to analyze the network topology maps of every individual ISP in Serbia and try to guess how much and what kind of equipment they would need to purchase. Even though our method is not 100% accurate, we had in our hands something to work with, something that gave us an insight into the unknown and invisible design of the networks. By watching the map of Telekom Serbia’s network, the biggest ISP in Serbia and owner of the biggest share of the infrastructure, we could observe the highly centralized structure where almost all the main nodes, routers were connected to just two main servers. The logical conclusion is that in order to perform real time filtering they would need to instal equipment exactly in those two points. On the other hand, from the number of nodes attached to those two main routers, we can guess that they are able to process huge amounts of traffic, therefore the equipment that they would need to install would probably need to be of high-end performance. We were able to predict the type and cost of the theoretical filtering solution, giving that there are just a few manufacturers of such equipment.

We played the Game of Filtering on the maps of the other ISPs as well, and each of them was a story for itself. Most of them were much more decentralized and we needed more efforts to find out where filtering could potentially happen. Decentralized networks are more complex to control, they have more crossroads, more points to cover if you want to have access to all the data flows. Although, it’s hard not to see the shape of the Panopticon structure in the case of the network organisation similar to the one we saw on the case of Telekom Serbia.

Given that our analysis is still only at the level of an individual ISP, this is just a small fragment of the story. The Internet is a network of networks, and to be able to create a full picture and to understand where the points of control are, we need to examine their local interconnections and links to the International networks. This is the topic of our next analysis.

Invisible Infrastructures : Internet Map of Serbia

admin — Sat, 07 Feb 2015 12:10:25 +0000

For thousands of years maps have been the essential tools to help human mankind to define, explain, and navigate their way through the world. Topology maps of the Internet are an important tool for characterizing the infrastructure and understanding the properties, behavior and evolution of the Internet.In our previous study, we explored individual Internet Service Providers, their size and structure. Now we are trying to understand, how they interconnect, we are exploring a network of networks or we can say the Inter of Internet.

What are we looking at?

By identifying and tracerouting 300.000 IP addresses and 30 ISPs in Serbia using various open network analysis tools, we created a map representing over 4.500 main routers and servers that make the core of the national Internet infrastructure. This Network Topology map allows us to identify the main actors, companies (ISPs) that own and control the infrastructure, have a possibility to access, retain, analyze or sell user’s metadata, their interconnection points, national Internet exit points and the level of infrastructure centralization on both national as well as the level of individual ISPs.

Every dot represents one IP address (router or other network device) and the lines between the dots are the links – cables that connect them. Every colour represents a different Internet Service Provider (ISP). This is a Network Topology map, i.e. it is not a physical map and it does not show exact geographical locations.

Networks, run by different Internet Service Providers, are interconnected at physical locations where their routers are connected by cables, the points of connection are called Internet exchange points (IXP). Those are the places where different networks meet, joining different networks into a single system, allowing us to connect to other connected devices on any other network.

Interconnection is both definitive of the Internet, and a manifestation of a business relationship between two ISPs⁸.

Most ISPs are unlikely to have peering arrangements with all other ISPs in the world. Thus, with the exception of a small number of very large multinational network operators, most ISPs, themselves, need at least one transit provider to ensure they (and their customers) can reach the entire Internet⁹.

Despite the strong theoretical background, and the virtuality of the matter which was subject to this research, the output is quite concrete.

The most important conclusion is the identification of the intersections, i.e. the points where the ISPs meet. These are points of power, and the more ISPs meet at a single point the importance of that point, router, server, increases. It is important to know who manages and controls those points, because that is the entity that controls the internet in Serbia.

Anyway, the most important output of this research is that it can serve as a starting point for different multidisciplinary researches related to the internet infrastructure in Serbia. A few examples would include, measuring the internet speed in Serbia, measuring the level of bandwidth throttling, determining the routes that are used most often when accessing online content, etc.

Methodology

The research process is divided into four phases. Every phase is equally important since it provides the input data for the phase that follows. The final output of this research can also be used as an input to some other, more advanced analysis.

Determining the IP ranges

Every device that is connected to the Internet has one or more interfaces through which it communicates with other devices on the network. Each and every network interface is defined by a certain set of parameters, one of which is it’s IP address. The IP address is a logical Internet Protocol address which allows data to flow over the Internet from it’s source to the destination it was intended to reach.

Even though IP addresses are more logical rather than physical, using an IP address it is simple to determine in which country the device that uses it is located. The reason for this is that the IP addresses are assigned to users by a single authority. IANA (Internet Assigned Numbers Authority) through the RIRs (Regional Internet Registries, RIPE NCC for Europe and parts of Asia) assigns the ranges of IP addresses to the entities interested to rent them, but they keep a database as for which range is assigned to whom and other data including to which country is the certain range connected. That means that the IP addresses are also somewhat physical addresses. This information is publicly available, and there are websites online that show the IP address ranges by country along with the actual owner.

Scanning the Network

Since not all of the devices are connected directly to each other (in fact few are, i.e. even computers positioned in a single office use a router to communicate), there is the necessity of routing over the Internet. That means that if one host wants to communicate with another host on the Internet, he needs to establish a route through which they can connect. That route is in essence a set of IP addresses of different network devices that make it possible for the two hosts to communicate.

This means that in order to reach the destination address, the data hops from host to host. In order to see how two hosts are connected, the ICMP (Internet Control Message Protocol) is used. That is one of the most important protocols in the IP set of protocols. There is a simple tool, called traceroute, which is mostly used in network diagnostics. This tool makes the data hops over the Internet visible and systematic, which makes them usable by sending ICMP messages and waiting for responses from the destination hosts.

For tracerouting ranges of IP addresses there is a special tool called Nmap, which is quite user friendly, detailed and precise. Naturally, the bigger the range, the more computer resources are exploited. Basically, Nmap traceroutes the paths between the hosts on which it runs and every IP address from the range that is being scanned.

Note: The output is actually consisted of the routes that connect the source computers to all the active hosts from the range that accept ICMP messages.

Data Processing

The outputs of the scans are what we can call “raw data” in this case. They contain quite a portion of data that is not usable due to the hosts not giving any response during the scans because of different reasons, and are as such irrelevant for the Internet infrastructure at the time of scanning.

The actual usable data needs to be extracted and formatted in a proper way, so that it can be used as an input to the visualization software. First and most important it is to know what the software used for visualisation can work with. For this research it was CSV (Comma Separated Values) file, with a simple structure, i.e. 3 fields Source IP, Destination IP and Label.

The output of Nmap can be stored in a .xml file. Both of these file types are a special variant of text files, which makes the entire process of parsing data much easier. In essence, what is needed is a piece of software that will extract some text from one file, and put it in another. There is an ample of solutions available online, manly scripts. In this case a python script was used.

The script takes two arguments, the input file and the output file and what it does is, it searches the text files for a certain words (in this case trace and ipaddr) and when it comes to those predefined keywords it takes the necessary values. In the end it generates the .csv file with the required structure (in this case omitting the Label field, which is not required). The script is available here.

Note: People who prefer Perl to Python should consider this link.

Data Visualization

In order to visualize large sets of data, in our case more than 300.000 different IP addresses and the links between them, we needed to find a tool that has the ability to display, manipulate and transform the network into a map. We used Gephi, an interactive visualization and exploration platform for different kinds of networks and complex systems, dynamic and hierarchical graphs.
Our main challenge was how to represent a large number of nodes, in a most convenient way and still have a visualization useful for further research. Most of the Graph Layout Algorithms integrated into Gephi software during our tests failed to deal with large networks ( +100k nodes ) except partially OpenOrd and ForceAtlas2 algorithms.
ForceAtlas2, the algorithm that we used in the end is a Continuous Graph Layout Algorithm, a force-directed layout which is integrating different techniques such as the Barnes Hut simulation, degree-dependent repulsive force, and local and global adaptive temperatures. More about the algorithm you can find here.
In order to represent more clearly the results we chose to eliminate end-nodes and eliminate *noise*,. This reduced and cleared data set consisted of 4067 nodes, IP addresses that represent interconnected infrastructure of the main routers and servers serving the end users in Serbia.

Tools

Nmap ( http://nmap.org/ )
Python script used for XML to CSV parsing (script)
Gephi ( http://gephi.github.io/ )