Serbia – SHARE LAB

Hacking Team : The “Italian job” of Serbian security services

admin — Tue, 14 Jul 2015 07:07:52 +0000

Database of leaked Hacking Team emails reveals details on negotiations for purchasing spy software

At least one Serbian security service negotiated the purchase, while the Ministry of Defense comes up as a trial user of the spy software made by Hacking Team (HT), a company from Milan whose electronic databases were made publicly available last week by Anonymous and Wikileaks.

Not long after the Italian company’s Twitter account had been compromised, more than 400 gigabytes of data were published, including internal documents, client lists as well as source code.

Huge HT databases are still in the initial phase of analysis by experts, journalists and activists around the world. Share Foundation team singled out a company correspondence related to Serbia, in which members of the Security Information Agency (SIA) and the Ministry of Defense have participated, along with a private company located in New Belgrade.

The negotiations lasted until the end of 2011, partly with mediating services from a private company for trade and manufacturing of computer equipment “Teri Engineering“ from New Belgrade, whose CEO arranged meetings, software testing and negotiated the price. In an internal discussion, this Belgrade company was mentioned as a “player” which could introduce the spy software manufacturers “to the whole central Europe”.

According to the available information. the first contact from Serbia was established after the international exhibition of internal security equipment MiliPol Paris 2011, when a member of the Security Information Agency (SIA) contacted a branch of Hacking Team, asking if a presentation of HT software in Belgrade could be arranged.

SIA and Hacking Team

The software is known as the Remote Control System, RCS, based on the targeted spreading of viruses on computers and mobile phones of persons under surveillance. Most clients using this system are states and security services from across the world.

The initial presentation in Belgrade surely took place, but the correspondence dies down until April 2012, when the same SIA member addresses the HT manager, who will notify him that the new version of the software will be available in May and that they could meet at the end of that month.

In the internal correspondence of the HT manager regarding the planned presentation in SIA Headquarters in Belgrade on 24 and 25 May 2012 it is noted that the SIA was “already introduced to the software in their Headquarters in the beginning of the year and a month ago in Rome”. It is stated that the SIA is now calling them to test only the software for mobile device surveillance. One of the HT representatives communicating with the Serbian security service member is mentioned in “Spyfiles 3”, a Wikileaks database with information related to many global manufacturers and vendors of spy equipment and software.

MSA searching for the spy virus

Independently of the communication with the Security Information Agency, at the same time the CEO of “Teri Engineering”, a private company from Belgrade, addresses the Hacking Team managers, with a recommendation (and a percentage for closing the deal) from Nice Systems, an Israeli company specialised in electronic surveillance and data analysis. In the correspondence, the intermediary from Belgrade lists MSA which is an abbreviation for Military Security Agency (“VBA” in Serbian) as a possible client, and offers local implementation services.

The negotiations begin in April, a month before the parliamentary elections of 2012, and the intermediary from Serbia insist that the presentation is held as soon as possible. From the emails it could be understood that the presentation was held shortly afterwards, and that the client from Serbia (MSA) received the system for a trial.

Negotiations on price soon followed, and the intermediary – despite the hefty commission for her company and the partners from Israel – managed to significantly lower the price from close to 500.000 euros to around a half of that amount. A person with an email address on the Ministry of Defense domain participated in the correspondence regarding the technical details of activating the virus and using the infected phone.

In late fall 2012, CEO of “Teri Engineering“ from Belgrade notified HT that because of possible “problems with the budget”, the procuring entity (instead of MSA/Ministry of Defense) could be Telecom Serbia, “100% state-owned company”.

Same year in September, after the trial was finished, the intermediary from Belgrade told the HT representatives that their system had a problem “which does not exist with the competitors”. It was Gamma, a company from London, whose software FinSpy, as it is known, soon found its buyer in Serbia.

Communication continued at the start of 2014, when there is a news from Belgrade that the budget for this deal was finally adopted, but negotiations stumble because of the price. Another obstacle were the parliamentary elections (March 2014) and the expected changes in the Ministry of Defense and security agencies, with new personnel appointments awaiting.

Hacking Team tried to arrange another presentation in Belgrade, aiming to divert their potential client from the competition. At that moment however, the competing spy software is already in Serbia.

In May last year, the communication from Belgrade totally dies down.

How do the agencies monitor infected devices

Until now, several ways how the Hacking Team’s system uses exploits in targets (e.g. devices) were identified. It is an advanced graphical interface in which most operations are performed with a single click. With the system, buyers also receive an instruction manual how to execute different types of infections, physically and on the internet.

The most common way of infecting targeted devices via the internet is to send infected documents (.doc files) by email, which when saved automatically start downloading spyware in the background and install a “backdoor” on the infected device, therefore implementing HT spyware.

In the control panel, there is a list of all infected devices, with their maximum number depending on the specific product. It is important to note that every system is tailor-made and that the price of the system depends on its functions, supported devices (PC, Mac, BlackBerry, mobile devices) and operating systems (Windows, Linux, OS X).

Primary use of this software is to monitor the system on which the spyware is implemented and not be recognised by the anti-virus program, which is why it is necessary to update the system regularly, so the price of yearly maintenance is 20% of the total value of the licence (75.000 €).

As part of its server, Hacking Team also had a KnowledgeBase, where it was described in detail which data from which devices and operating systems can be extracted. There are also instruction how to infect devices, as well as analysis of different anti-malware software.

For technical support, user would open a ticket on Hacking Team’s website and then their team would do a reconstruction of the problem in a laboratory and found a solution, which can be another reason why the maintenance price is relatively high.

Users of RCS software are mostly governments or government agencies. The system works on the basis of proxy servers which “launder” the traffic through several countries, so it is virtually impossible to technically determine who performs surveillance and where is the surveilling operator located.

During the past several years, Hacking Team, a manufacturer of surveillance software and equipment, has been targeted by civic organisations because of its active role in the global development of the surveillance industry without civilian control, as well as selling the software to countries known for heavy human rights abuses, even when it represents a violation of UN sanctions, in case of Sudan.

Hacking Team was a key actor in the research carried out by CitizenLab at the start of last year, because of the sale of RCS to various governments. Their product was used for tracking the award-winning Moroccan news portal “Mamfakinch“ in 2012, as well as human rights activists from the United Arab Emirates.

Last year, Privacy International warned of the possibility that this company had received million and a half euros from funds connected to the Region of Lombardy in 2007. From the leaked financial databases it can be seen that Mexico, Italy and Morocco are the biggest Hacking Team clients, with “orders” valued at several million euros in total.

Share Foundation wrote about the legal framework for import of this kind of software in Serbia back in 2013 because of the “Trovicor” case, stating that rules for dual use goods must be applied and that a permit from the Ministry of Trade, Tourism and Telecommunications is obligatory. In October 2014, the European Commission updated the list of dual use goods, inter alia because of the need to control IT intrusion software (‘spyware’) and telecommunication and internet surveillance equipment. In accordance with this, the Government of the Republic of Serbia has also adopted aDecision in May 2015 to fully comply the national control list of dual use goods with the European Commission’s list.

On the other hand, use of equipment such as the one being sold by Hacking Team is not explicitly prescribed as a measure that state bodies can use. If we assume that certain organisations can be authorised to use this equipment, in our legal system that wouldn’t be possible without a court decision in accordance with the law. Using it in any other way would be an obvious violation of human rights which are guaranteed by the Constitution of the Republic of Serbia and numerous international conventions.

Invisible Infrastructures : Surveillance Architecture

admin — Mon, 09 Mar 2015 11:46:37 +0000

In April 2014, we collected about 2000 pages of documents and reports through the series of FOIA¹ requests to the Commissioner² related to the 2012 Report on the inspection procedure over the implementation and enforcement of the Law on Personal Data Protection by the operators and state bodies (the police and both civil and military intelligence agencies), that served as a base for our analysis on metadata retention and digital surveillance architecture. Our tech and legal analysis, presented in a form of an infographic, illustrates different ways in which the 4 biggest telecommunication service providers in Serbia allow state bodies access to our metadata. The following series of infographics and the analysis show numerous methods of access to retained data, which circumvent legal procedures and necessary court orders (direct access to the servers, applications for direct access).

While smartphone penetration in Serbia is about 35% and constantly rising, the percentage of mobile phones in use is well over 130%³. Which means that about a quarter of the populations has more than one mobile phone. Metadata as a type of information was mentioned earlier, and in this context it is important to mention that each and every device regardless of whether it is a smartphone or an earlier generation mobile phone generates metadata. The only difference being that older mobile phones don’t support Internet, thus they don’t generate metadata related to Internet use. Because of the relatively high and rising number of smartphone users, as well as the prospects of development of the matter, this research is conducted from a smartphone’s perspective.

Every smartphone commercially available in Serbia (and in the World) at present supports three types of traffic through the cellular network i.e. calls, SMS and mobile data (mobile Internet). It is important to note that all three types of traffic go through the same infrastructure, ergo the points in which surveillance is possible are the same for all of them. This would mean that in this part of the research we are talking about mobile device generated traffic in general and emphasising the differences that come to pass in all three different types of traffic.

So, let’s start from the beginning and explain the way a device connects to a network, or rather how it authenticates itself on the network. For the purpose of authentication the device uses 2 ID numbers, the first one is the device’s IMEI number (International Mobile Station Equipment Identity), and the SIM card’s IMSI number (International Mobile Subscriber Identity). Both numbers are unique and predefined for every device/SIM card. The mobile carriers have an infrastructures of Base Stations (BS) that are geographically distributed throughout the area that’s being served by the operator. The BS form the backbone of the entire mobile infrastructure.

When a call is initiated the caller’s device contacts the nearest BS, and the BS forwards the call to the Mobile Switching Centre (MSC). The MSC then informs the BS that is nearest to the called user who gets the call. Once the call is established (the called user answers the call) meta data is being generated in the MSC. The MSCs archive the metadata in the carrier’s own datacentre. The content of the calls is not being archived, but also passes through the MSC.

What type of metadata is being archived?⁴
The answer to this question varies from carrier to carrier, at least in Serbia, but there is a general set of metadata that all carriers archive i.e. Caller’s number, called number, IMEI, details about the BS, date and time of the call, duration of the call, amount of data (for Internet), type of service, details about the identity of both parties, list of all SIM cards that have been used in the current device (and vice versa, list of devices the current SIM card has been used in). There is also data that can not be classified as metadata, but can be accessed by having the aforementioned metadata, i.e. National ID number, user’s address (through contracts or registration of the SIM card for prepaid users) and device make and model (using the IMEI number). The process of archiving this data is called Data retention.

How is this data stored?
Carriers in Serbia are obliged by the law to store this data for a period of 12 months for every user. The data is stored on servers; there are no strict rules whether the carriers need to buy there own serves or can use other company’s servers to store all these data. However most of them have data centers in their ownership. All the operations on the servers are being logged for control purposes.

How can these data be accessed?
The mobile carriers in Serbia have designated departments that deal with affairs related to Data retention. The employees, who work in those departments are specially trained to deal with the entire process of data retention and access to retained data. When it comes to access of retained data, there have been identified several actors (i.e. state organs) that have accessed retained data in some way. Not all state organs have the right to access retained data, this right lays with the organs of justice, as well as the Police, and both civil and military intelligence agencies. Even within this group there are differences in who can access what and how. There are several mechanisms, or channels that can be used for access to retained data.

Request⁵
The first mechanism is the most simple one, it’s based on the request – response principle. This mechanism is used by all state organs and all carriers. Namely, a representative of the state submits a request to the carrier in which the requested data is stated. There are several forms that are commonly used for submitting these requests, mostly by email, fax, phone or in person. The special department within the carrier then processes the request and delivers a report based on the input that has been submitted. Potential issues in this mechanism include the fact that requests submitted by phone should not be (and in some cases are) processed because of the possibility of fraud, and the inability to deliver the appropriate documentation (a court order). Some of the carriers have developed a system for submitting requests by designating a limited list of dedicated e-mail addresses that serve this purpose.

An upside of this mechanism is that every single request submitted to the carrier, this enables transparency and review of the requests the state organs submit.

Application for Independent access to retained data
Another mechanism for access to retained data is the so-called Application for Independent access to retained data. This is a software implemented by some of the carriers in Serbia for the convenience of the state organs. This mechanism is used by the Police, and both the military and civil intelligence agencies. This basically means that these organs do not need to submit a request in order to get data. The application can be accessed online with credentials provided by the carrier. A set of different queries is available within the application which offers practically limitless access to all the data that is stored in the database in a form of different listings (outgoing calls, incoming calls, data usage, SMS/MMS communication etc.) All of the aforementioned listings, along with the basic details of the user whose metadata is being accessed, contain detailed information about location, duration of service, and all the other types of data that were mentioned earlier as retained data. Submitting a court order for accessing this data is not a requirement, so it is clear why this mechanism would be problematic privacy-wise.

Even though these are the two primary mechanisms used by all carriers, there are some specific scenarios or specially established channels of commuting retained data between some carriers and some state organs. Here, we will give two such examples.

Sending data
There is an established connection between one mobile carrier and the Security Intelligence Agency (BIA) which represents a standalone mechanism for access to retained data, independent of all the other mechanisms. There has been a practise that on a daily basis, all the metadata of the users from the Mobile Switching Centre is automatically delivered to BIA. This creates special circumstances of non-transparent handling with retained metadata and implicates data collection on a mass level. Another issue with this mechanism is that it doesn’t comply with the legal provisions that allow for retained data to be stored for a maximum length of 12 months, because no authority monitors BIA for handling retained data. Further more, BIA doesn’t enjoy the right to archive metadata, this responsibility only lies with the carriers.

Direct Access To the Retention database
Another case is the link between another carrier (who only provides with Internet and landline services) and BIA. In this situation upon a request of BIA the carrier provided them with a special connection to it’s own infrastructure in such a manner that BIA is able to access all four corners of the data system and also intercept digital communication in the carrier’s network.

It is important to note that the two last mechanisms do not have any legal grounds. Furthermore, they are an active threat to user’s privacy and are in conflict with the legislation that regulates electronic communications and similar matter both in Serbia and on international level.

Wiretapping

The principle Metadata doesn’t lie is certainly true, as is the fact that if metadata is mapped right it can provide the interested party with much deeper insight to the situation than the content of the communication. However, this does not mean that the content is not important.

Wiretapping is a technique that has been around for as long as electronic communications exist. With the new technologies used in the communication infrastructure and the new services that are available, the concept of wiretapping has changed and evolved into a new concept which is called surveillance. Surveillance is much more than wiretapping, it can be conducted on many levels, such as personal or organisational, but also on mass level. This means that someone can have the ability to listen into each and every call being made on a national or continental level. Mass surveillance is illegal in almost every country in Europe, for security purposes the law establishes a concept of interception of electronic communications.

Interception of electronic communications means targeted surveillance, which can be conducted in special circumstances with appropriate court order and for a limited period of time. However, when it comes to these issues even seemingly minor flaws in the law can have serious consequences and make space for mass surveillance.

In the recent years there has been a portion of bylaws that establish the rights and obligations of carriers and state organs in regard with interception of electronic communications. These regulations are put in such way that carriers are obliged to buy equipment (hardware and software) that can be used for interception and deliver it to a Monitoring Centre, whose headquarters are within BIA. Afterwards, BIA de facto has carte blanche for operation with the equipment, whilst the carriers retain the obligation to fund the maintenance thereof. As stated above, the interception as a sensitive process is very well regulated, but the implications of the bylaws and the lack of transparency in the actual execution of the process are a sound reason to question the legitimacy of the procedure, as it is currently being established in Serbia.

Physical tracking in real time

Base stations were mentioned in the introductory segment of this piece. They form the backbone of the cellular infrastructure. Actually, it is because of the BS that the entire network is called cellular. A cell is a geographical area covered by a single BS. At any moment any mobile device is connected to three BS, for the purpose of continuity and redundancy. That means that at any moment in time three base stations send and receive signals to and from the device. Base stations are set up in such a way that record the distance to the device, which is in fact it’s location, through several parameters related to the signal, some of them are AOA (Angle of Arrival), TDOA (Time Difference of Arrival) and TOA (Time of Arrival). This basically means that anybody who has access to BS can at any moment with a high level of accuracy determine the physical/geographical location of any device connected to the network.

In Serbia, according to the bylaws mentioned in the previous section has access to a special terminal equipment for tracking of devices. Furthermore, there are custom-made mobile devices that are configured in a way that they can be used for geo-tracking in real time. This mobile devices are issued by the carrier to the state organs upon request. Which means that anyone who has access to that terminal equipment (meaning that it’s entirely up to BIA how it will be used) can precisely locate any mobile device connected to a network in Serbia⁶.

Documents
Report
Telekom
Telenor
VIP

Invisible Infrastructures: Understanding Autonomous Systems

admin — Tue, 10 Feb 2015 13:29:33 +0000

The Internet in its essence is not what most people perceive when online. It is an abstract space which gives limitless opportunities, but basically it consists of hardware, millions of servers, routers, cables and other network peripheral devices. Basically, in most cases, there is a physical cable or wireless connection reaching almost every corner of the world and every Internet user. Each and every network device of the Internet infrastructure has its own physical location. Some of them are grouped, which makes their locations a sort of “crossroads” of the Internet.

One of the reasons we seldom discuss the issues of this invisible infrastructure is the fact that the speed of the packets traveling through the network is so big and unnoticeable to us, in most cases we don’t feel a significant difference in whether our packets are traveling just around the corner or to around the world and back.

The fact that we are not able to perceive this difference does not change the fact that those packets, during just a little fragment of a second, travel through thousands of kilometers of cables, myriad of routers and switches, different national territories and a number of potential spots where they can be retained, slowed down, stored, copied or examined.

Unlike the telephone network, which for many years was a monopoly run by a single company in most countries, the global Internet consists of tens of thousands of interconnected networks run by telecommunication companies, Internet service providers, individual companies, universities, governments, and others ⁵ . Those entities have different legal regimes, business and technical relationships, privacy policies and ownership models. Even our most frequent and most sensitive communication relies on those entities. But even so, in most cases, our knowledge of how those networks are interconnected and how they deal with our data is left in the dark.

Our first step of understanding this invisible network is to try to understand the structure of our nearest network, network runed and owned by our Internet service provider. Every ISP is a story for itself, they have a different number of users, a different number of interconnected routers organized in different structures.
Every device that is connected to the Internet (your computer, routers, servers) have an IP address. The IP address is a logical Internet Protocol address which allows data to flow over the Internet. IANA (Internet Assigned Numbers Authority) through the RIRs (Regional Internet Registries⁶) assigns the ranges of IP addresses to entities interested to buy them, and they keep a database of which range belongs to whom and other data, including which range is assigned to which country . So, every ISP has a limited and defined range of IP addresses that they further assign to their users and infrastructure that they own.

This set, range of all IP addresses that one ISP owns, was the starting point of our research.

We used IP ranges of every ISP and created a Network Topology map for every one of them. In order to visualize large sets of data, in our case more than 300.000 different IP addresses and links between them, we had to find a tool that is able to display, manipulate and transform the network into a map. We used Gephi ⁷, an interactive visualization and exploration platform for different kinds of networks and complex systems, dynamic and hierarchical graphs. The obtained results are showed below in form of 30 different maps of ISPs in Serbia.

Different structures, and what we can learn from them

Network Structure analysis can be useful for different aspects of network security and efficiency of the network, but our main interests as researchers in this case are related to possible privacy related misuse of the network, digital surveillance and data retention, and different forms of Internet filtering, content control and censorship.

There is three basic network structures:
Centralized. All the devices are connected to one center. This center has privileged accessibility and thus represents the dominant element of the network.
Decentralized. Although the center is still the point of highest accessibility, the network is structured so that sub-centers also have significant levels of accessibility.
Distributed. No center has a level of accessibility that significantly differs to the others.

By analysing our visualizations of ISPs in Serbia we have noted that both centralized and decentralized models are present. The centralized model can be associated with the network of the state owned Telekom Serbia and an example of a decentralized model can be seen in the case of the University network – Amres.

But, except feeding our curiosity for deeper understanding of our technological environment and passion for visualizing big sets of data, can we have a practical use of those maps in the field of internet freedom and user privacy?

The Game of Filtering

Internet filtering (or Internet Censorship) is one of the most widespread forms of government approach to internet control. Internet freedom around the world has declined for the fourth consecutive year, with a growing number of countries introducing online censorship and monitoring practices that are simultaneously more aggressive and more sophisticated in their targeting of individual users ⁸ .

There are three commonly used techniques to block access to Internet sites: IP blocking, DNS tampering, and URL blocking using a proxy. These techniques are used to block access to specific Web Pages, domains, or IP addresses. When the targeted websites are outside the legal jurisdiction of the government (in a foreign country) this is the most effective way to block access to their citizens. There are more advance techniques, (blocking searches involving blacklisted terms, keywords analysis, dynamic content analyses) but they are more rare and we will discuss them in other parts of our research.
What we find most interesting, related to our ISP mapping efforts is the question: Where will internet filtering take place in our ISP network topology? According to the OpenNet Initiative study, Internet filtration can occur at any or all of the following four nodes in network:

1) INDIVIDUAL COMPUTERS
2) INSTITUTIONS Filtering the network on an institutional level using technical blocking
3) INTERNET SERVICE PROVIDERS Government-mandated filtering is most commonly implemented by Internet Service Providers (ISPs) using any one or combination of the technical filtering techniques mentioned above.
4) INTERNET BACKBONE State-directed implementation of national content filtering schemes and blocking technologies may be carried out at the backbone level, affecting Internet access throughout an entire country. This is often carried out at the international gateway.

In one of our previous researches ⁹ related to the case of the national research and education network of Serbia – AMRES’ internet filtering practice, we discovered a decentralized method of content filtering, delegated and executed through local administrators and routers at every University in Serbia. Each local administrator is responsible for his own black list of sites and ports. The AMRES network is one of the oldest ISPs in Serbia, established in the early 1990s, and its method of Internet filtering presented here is filtering on institutional level. If we take a look at the visualization of the AMRES network, we can clearly see why this method of Internet filtering was the most applicable one – the decentralized structure of the AMRES network somehow imposes this kind of filtering strategy.

In our view, that type and complexity of a network structure and topology, ownership model & management needs, have a crucial role in defining the model of internet filtering, and the amount and type of equipment that will be used. For us, users or researchers without access to privileged information, the analysis of network topology maps can be a starting point for better understanding infrastructures of control and potential repression.

In December 2014, the Government of the Republic of Serbia sent a Proposal of the Law on Amendments to the Law on Games of Chance ¹⁰ to the Parliament. The proposed changes were adopted without a discussion and public insight, even though these provisions would introduce Internet censorship in Serbia through a “back door”. The solution that presented the main problem was the amendment ¹¹ which prohibits “ enabling access to websites by domestic electronic communication network service operators to legal entities or individuals organizing games of chance without the approval or consent of the Administration”.

Fortunately, after SHARE Foundation analyzed the Proposal and started a media campaign, the Proposal of the Law was withdrawn from the parliamentary procedure following an intervention of the Government. In one part of the Proposal, it was written that the installation, maintenance and costs of the equipment intended for filtering is a responsibility of the ISPs. In order to create an argument regarding unreasonable costs that every ISP would have, we tried to analyze the network topology maps of every individual ISP in Serbia and try to guess how much and what kind of equipment they would need to purchase. Even though our method is not 100% accurate, we had in our hands something to work with, something that gave us an insight into the unknown and invisible design of the networks. By watching the map of Telekom Serbia’s network, the biggest ISP in Serbia and owner of the biggest share of the infrastructure, we could observe the highly centralized structure where almost all the main nodes, routers were connected to just two main servers. The logical conclusion is that in order to perform real time filtering they would need to instal equipment exactly in those two points. On the other hand, from the number of nodes attached to those two main routers, we can guess that they are able to process huge amounts of traffic, therefore the equipment that they would need to install would probably need to be of high-end performance. We were able to predict the type and cost of the theoretical filtering solution, giving that there are just a few manufacturers of such equipment.

We played the Game of Filtering on the maps of the other ISPs as well, and each of them was a story for itself. Most of them were much more decentralized and we needed more efforts to find out where filtering could potentially happen. Decentralized networks are more complex to control, they have more crossroads, more points to cover if you want to have access to all the data flows. Although, it’s hard not to see the shape of the Panopticon structure in the case of the network organisation similar to the one we saw on the case of Telekom Serbia.

Given that our analysis is still only at the level of an individual ISP, this is just a small fragment of the story. The Internet is a network of networks, and to be able to create a full picture and to understand where the points of control are, we need to examine their local interconnections and links to the International networks. This is the topic of our next analysis.

Invisible Infrastructures : Internet Map of Serbia

admin — Sat, 07 Feb 2015 12:10:25 +0000

For thousands of years maps have been the essential tools to help human mankind to define, explain, and navigate their way through the world. Topology maps of the Internet are an important tool for characterizing the infrastructure and understanding the properties, behavior and evolution of the Internet.In our previous study, we explored individual Internet Service Providers, their size and structure. Now we are trying to understand, how they interconnect, we are exploring a network of networks or we can say the Inter of Internet.

What are we looking at?

By identifying and tracerouting 300.000 IP addresses and 30 ISPs in Serbia using various open network analysis tools, we created a map representing over 4.500 main routers and servers that make the core of the national Internet infrastructure. This Network Topology map allows us to identify the main actors, companies (ISPs) that own and control the infrastructure, have a possibility to access, retain, analyze or sell user’s metadata, their interconnection points, national Internet exit points and the level of infrastructure centralization on both national as well as the level of individual ISPs.

Every dot represents one IP address (router or other network device) and the lines between the dots are the links – cables that connect them. Every colour represents a different Internet Service Provider (ISP). This is a Network Topology map, i.e. it is not a physical map and it does not show exact geographical locations.

Networks, run by different Internet Service Providers, are interconnected at physical locations where their routers are connected by cables, the points of connection are called Internet exchange points (IXP). Those are the places where different networks meet, joining different networks into a single system, allowing us to connect to other connected devices on any other network.

Interconnection is both definitive of the Internet, and a manifestation of a business relationship between two ISPs¹².

Most ISPs are unlikely to have peering arrangements with all other ISPs in the world. Thus, with the exception of a small number of very large multinational network operators, most ISPs, themselves, need at least one transit provider to ensure they (and their customers) can reach the entire Internet¹³.

Despite the strong theoretical background, and the virtuality of the matter which was subject to this research, the output is quite concrete.

The most important conclusion is the identification of the intersections, i.e. the points where the ISPs meet. These are points of power, and the more ISPs meet at a single point the importance of that point, router, server, increases. It is important to know who manages and controls those points, because that is the entity that controls the internet in Serbia.

Anyway, the most important output of this research is that it can serve as a starting point for different multidisciplinary researches related to the internet infrastructure in Serbia. A few examples would include, measuring the internet speed in Serbia, measuring the level of bandwidth throttling, determining the routes that are used most often when accessing online content, etc.

Methodology

The research process is divided into four phases. Every phase is equally important since it provides the input data for the phase that follows. The final output of this research can also be used as an input to some other, more advanced analysis.

Determining the IP ranges

Every device that is connected to the Internet has one or more interfaces through which it communicates with other devices on the network. Each and every network interface is defined by a certain set of parameters, one of which is it’s IP address. The IP address is a logical Internet Protocol address which allows data to flow over the Internet from it’s source to the destination it was intended to reach.

Even though IP addresses are more logical rather than physical, using an IP address it is simple to determine in which country the device that uses it is located. The reason for this is that the IP addresses are assigned to users by a single authority. IANA (Internet Assigned Numbers Authority) through the RIRs (Regional Internet Registries, RIPE NCC for Europe and parts of Asia) assigns the ranges of IP addresses to the entities interested to rent them, but they keep a database as for which range is assigned to whom and other data including to which country is the certain range connected. That means that the IP addresses are also somewhat physical addresses. This information is publicly available, and there are websites online that show the IP address ranges by country along with the actual owner.

Scanning the Network

Since not all of the devices are connected directly to each other (in fact few are, i.e. even computers positioned in a single office use a router to communicate), there is the necessity of routing over the Internet. That means that if one host wants to communicate with another host on the Internet, he needs to establish a route through which they can connect. That route is in essence a set of IP addresses of different network devices that make it possible for the two hosts to communicate.

This means that in order to reach the destination address, the data hops from host to host. In order to see how two hosts are connected, the ICMP (Internet Control Message Protocol) is used. That is one of the most important protocols in the IP set of protocols. There is a simple tool, called traceroute, which is mostly used in network diagnostics. This tool makes the data hops over the Internet visible and systematic, which makes them usable by sending ICMP messages and waiting for responses from the destination hosts.

For tracerouting ranges of IP addresses there is a special tool called Nmap, which is quite user friendly, detailed and precise. Naturally, the bigger the range, the more computer resources are exploited. Basically, Nmap traceroutes the paths between the hosts on which it runs and every IP address from the range that is being scanned.

Note: The output is actually consisted of the routes that connect the source computers to all the active hosts from the range that accept ICMP messages.

Data Processing

The outputs of the scans are what we can call “raw data” in this case. They contain quite a portion of data that is not usable due to the hosts not giving any response during the scans because of different reasons, and are as such irrelevant for the Internet infrastructure at the time of scanning.

The actual usable data needs to be extracted and formatted in a proper way, so that it can be used as an input to the visualization software. First and most important it is to know what the software used for visualisation can work with. For this research it was CSV (Comma Separated Values) file, with a simple structure, i.e. 3 fields Source IP, Destination IP and Label.

The output of Nmap can be stored in a .xml file. Both of these file types are a special variant of text files, which makes the entire process of parsing data much easier. In essence, what is needed is a piece of software that will extract some text from one file, and put it in another. There is an ample of solutions available online, manly scripts. In this case a python script was used.

The script takes two arguments, the input file and the output file and what it does is, it searches the text files for a certain words (in this case trace and ipaddr) and when it comes to those predefined keywords it takes the necessary values. In the end it generates the .csv file with the required structure (in this case omitting the Label field, which is not required). The script is available here.

Note: People who prefer Perl to Python should consider this link.

Data Visualization

In order to visualize large sets of data, in our case more than 300.000 different IP addresses and the links between them, we needed to find a tool that has the ability to display, manipulate and transform the network into a map. We used Gephi, an interactive visualization and exploration platform for different kinds of networks and complex systems, dynamic and hierarchical graphs.
Our main challenge was how to represent a large number of nodes, in a most convenient way and still have a visualization useful for further research. Most of the Graph Layout Algorithms integrated into Gephi software during our tests failed to deal with large networks ( +100k nodes ) except partially OpenOrd and ForceAtlas2 algorithms.
ForceAtlas2, the algorithm that we used in the end is a Continuous Graph Layout Algorithm, a force-directed layout which is integrating different techniques such as the Barnes Hut simulation, degree-dependent repulsive force, and local and global adaptive temperatures. More about the algorithm you can find here.
In order to represent more clearly the results we chose to eliminate end-nodes and eliminate *noise*,. This reduced and cleared data set consisted of 4067 nodes, IP addresses that represent interconnected infrastructure of the main routers and servers serving the end users in Serbia.

Tools

Nmap ( http://nmap.org/ )
Python script used for XML to CSV parsing (script)
Gephi ( http://gephi.github.io/ )