1 Abstract
Following an application to the ISOC Community Projects funding, a grant was allocated to purchase equipment and write software to track Internet IPv6 connectivity worldwide. The detailed project application is available elsewhere and is not repeated here, since it would distract from the Team's achievements. The project team has designed and implemented an IPv6 Crawler: a computer and its software that runs through the DNS at preset intervals in order to detect, for example, IPv6 DNS servers and IPv6-compliant Web servers, SMTP mailers, and NTP servers. This Project aims to catalyse the rate of IPv6 adoption by creating and making available a set of tables and graphs showing the spread of adoption, per domain name. Results can be provided on automatically generated Web pages and displayed, for example, per country code top level domain (ccTLD), per gTLD, per type of organization, or per business field; the classification of results is user-configurable. By archiving the data, the service will also be useful for future historical purposes: it will track how a radically new technology spreads on the Internet, and this information might be useful to future strategic network planners working on the Future of the Internet.
2 Introduction
The Internet is running out of IPv4 addresses. While many individuals and organizations currently track both the depletion of IPv4 addresses and the spread and use of IPv6 addresses, no current effort follows a structured tracking method that could be expanded for future use. This project aims to introduce such a system. It is based around a set of computers dedicated to the task of tracking IPv6 connectivity and archiving the results for immediate or future use. Its design was undertaken with a view to using well known, ubiquitous and sustainable formats in order to keep the resulting data useful for future generations.
3 Technical Details
The equipment installed in this project is made up of two servers: a back end Crawler, and a front end Web server. Although both are connected to the same part of the backbone and through the same router, the two servers function entirely independently of each other. The Crawler works through connectivity tests and generates huge quantities of data which are stored as text-based data files. The Web server integrates this data into an SQL database which can then be interrogated by Web pages to make the results available worldwide. All data generated is archived for historical purposes. The most exciting part of this project is that the data generated by the back end can be analysed completely independently of the crawls taking place; an analogy would be a vacuum cleaner used to suck up anything in its path, with the ability to independently design a system to open the dust bag and explore what is inside it at a later stage. As such, the front end Web server is currently a conceptual design showing some of the capabilities for analysis which can be performed on the data.
It is worth noting that the multi-processor, multi-core hardware used in this project functions exclusively for the primary purpose of IPv6 crawling. At present, the limit on crawling speed is imposed by the bandwidth used in the process of connectivity testing: since a large proportion of the traffic generated is UDP traffic, it was decided to throttle the number of parallel processes so as not to cause problems with our upstream providers.
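The throttling itself amounts to a simple cap on the number of worker processes. The following minimal sketch (not the project's actual code; the probe command and host list are placeholders) shows the general idea in Python, the language of the software suite:

# Hypothetical sketch: cap the number of parallel connectivity probes
# so that UDP-heavy test traffic stays within agreed bandwidth limits.
import subprocess
from multiprocessing import Pool

MAX_PARALLEL_PROBES = 8  # throttle: tune to available upstream bandwidth

def probe(host):
    # placeholder for a real connectivity test (ping/traceroute/DNS query)
    reachable = subprocess.call(["ping", "-c", "1", "-W", "2", host]) == 0
    return host, reachable

if __name__ == "__main__":
    hosts = ["www.example.com", "ntp.example.com"]  # hypothetical input list
    pool = Pool(processes=MAX_PARALLEL_PROBES)
    for host, ok in pool.map(probe, hosts):
        print("%s: %s" % (host, "reachable" if ok else "unreachable"))
    pool.close()
    pool.join()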
3.1 Hardware
The hardware used for the project was purchased specifically for this task. There is a front end Web server and a back end Crawler. The two computers are independent, with the back end only performing crawling, and the front end performing Web server, archiving, and email functions. The front end synchronises its data, downloading it from the back end at regular intervals, and also acts as a storage database.
Figure 1: The Crawler (left picture), and the Web server (right picture), Router (below)
Specifications for the servers are shown below:

Crawler (back end):
- Model: HP DL360p
- Name (eth0): turtle.ipv6matrix.org ; crawler.ipv6matrix.org / turtle.ipv6matrix.com ; crawler.ipv6matrix.com / turtle.ipv6matrix.net ; crawler.ipv6matrix.net
- IPv4 address (eth0) / speed: 212.124.204.162 / 100 Mb/s
- IPv6 address (eth0) / speed: 2a00:19e8:20:1::a2 / 100 Mb/s
- Name (eth1): shell.ipv6matrix.org
- IPv4 address (eth1) / speed: 194.33.63.250 / 1 Gb/s (GIH private address space)
- CPU: 2 x Dual Core Intel(R) Xeon(TM) CPU 3.60GHz
- RAM: 4 Gb DDR2 SDRAM
- HD Storage: 146 Gb hardware SATA 2-disk RAID (hot swappable)
- PSU: 2 x hot-swappable redundant 535W
- Operating System: CENTOS 5 Linux / updated

Web Server (front end):
- Model: HP DL140
- Name (eth0): elephant.ipv6matrix.org ; www.ipv6matrix.org / elephant.ipv6matrix.com ; www.ipv6matrix.com / elephant.ipv6matrix.net ; www.ipv6matrix.net
- IPv4 address (eth0) / speed: 212.124.204.170 / 100 Mb/s
- IPv6 address (eth0) / speed: 2a00:19e8:20:1::aa / 100 Mb/s
- Name (eth1): tusk.ipv6matrix.org
- IPv4 address (eth1) / speed: 194.33.63.251 / 1 Gb/s (GIH private address space)
- CPU: 2 x Dual Core Intel(R) Xeon(TM) CPU 3.40GHz
- RAM: 4 Gb DDR2 SDRAM
- HD Storage: 2 x 1 Tb fast SATA
- PSU: Single 500W
- Operating System: Ubuntu 4.4 Linux / updated

Router:
- Model: CISCO 2811
- Operating System: Advanced IP Services IOS
- DRAM: 64 Mb
- Ethernet Ports / speed: 2 / 100 Mb/s
- Interface card / speed: MN-16ESW 16 port / 100 Mb/s
3.2 Network
Figure 2: Network set-up

The transfer of data between the Crawler and the Web server takes place using a pair of private (non-routable) IPv4 addresses and a cross-over 1 Gb/s CAT5e link. This allows for fast synchronisation of data between the servers. A specific set of non-routable IPv4 addresses is used for the 1 Gb/s cross-over Ethernet cable:

shell.ipv6matrix.org. 0 IN A 194.33.63.250 (on turtle.ipv6matrix.org, the Crawler)
tusk.ipv6matrix.org.  0 IN A 194.33.63.251 (on elephant.ipv6matrix.org, the WWW server)
Further peering agreements (peer 2 and peer 3) are in progress and being undertaken by 2020Media Ltd.
The Crawler's features are summarised below, grouped by feature category. T1 denotes the common testing procedure applied to every host.

T1 (common testing procedure):
- DNS: Find IPv4 address(es) from DNS
- DNS: Find IPv6 address(es) from DNS
- DNS: Find ASN lookup using internal database (requires regular updating)
- DNS: Find primary, secondary etc. DNS servers & whether they are dual stack (IPv6 glue)
- DNS: Check SOA record for DNS server & test contact
- Address Type: For each address, determine the type of address from its prefix: 6to4 prefix = 2002::/16; Teredo prefix = 2001::/32; 6bone etc., using a manually compiled prefix database (a classification sketch follows this list)
- Trace/ping: Ping & Traceroute IPv4 address(es)
- Trace/ping: Ping & Traceroute IPv6 address(es)
- Trace/ping: Calculate difference in latency between IPv4/IPv6, including mean and variation
- Trace/ping: Identify broken AAAA records from the above DNS results
- Trace/ping: Identify AAAA records with no actual connectivity (through ping & traceroute)
- Trace/ping: Identify MTU differences between IPv4 and IPv6 by using several packet sizes in probing
- Trace/ping: Record hop count from traceroutes and compare IPv4/IPv6 hop counts
- DNS: Detect if proper Reverse DNS is defined (matching of forward/reverse)
- Geo: Use Geo-localisation to match the geographical coordinates of the node, using a local Geo-localisation database (can be updated)

DNS (Check for DNS server):
- DNS: Obtain Name servers from DNS records about the domain itself
- T1: Test Name servers according to testing procedure T1

MX (Check MX records):
- DNS: Obtain MX details from DNS records about the domain itself
- T1: Test MX servers according to testing procedure T1, following the procedure for the primary MX, the secondary MX (if any), and any further MX (if any)
- Connect: For each MX record, connect to the remote machine using the SMTP port: detect the remote mailer type (if possible); detect if connected by IPv6 or IPv4; detect if an IPv6 record is present but unreachable, thus falling back to IPv4; detect if TLS is implemented at the remote machine; test the mailer/version

WWW (Check Web server):
- Automatic: Establish which domain works: check the domain prefixes www, ipv6, www.ipv6, www6, six, or indeed any other prefix (configurable using a config. file)
- T1: Test Web addresses according to testing procedure T1
- Connect: Test ports 80 (http) and 443 (https)

NTP (Check NTP server):
- Connect: Use the address prefix time (i.e. time.example.com) or ntp
- T1: Test DNS and connectivity of NTP servers according to testing procedure T1
- Connect: Use NTPDATE with the -d option and either the -4 (IPv4) or -6 (IPv6) option to detect an NTP server. Record -4 and -6 results separately for future use (finding out when IPv4 NTP servers start decreasing)

Other:
- Keep the option open for testing any other kind of server (modular approach)
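To illustrate the Address Type feature, here is a minimal classification sketch based on the prefix rules in the list above (6to4 = 2002::/16, Teredo = 2001::/32, 6bone = 3ffe::/16, now retired). It uses the ipaddress module from Python 3's standard library, which post-dates the project's Python 2.6; the Crawler itself relies on a manually compiled prefix database:

# Minimal sketch: classify an IPv6 address by well-known transition prefixes.
import ipaddress

PREFIXES = [
    ("6to4",   ipaddress.ip_network("2002::/16")),
    ("Teredo", ipaddress.ip_network("2001::/32")),
    ("6bone",  ipaddress.ip_network("3ffe::/16")),  # historical allocation
]

def classify(addr):
    ip = ipaddress.ip_address(addr)
    for name, net in PREFIXES:
        if ip in net:
            return name
    return "native/other"

print(classify("2002:c000:0204::1"))   # -> 6to4
print(classify("2001:0:53aa:64c::1"))  # -> Teredo
print(classify("2a00:19e8:20:1::a2"))  # -> native/other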
Using an example domain example.com, the tests of procedure T1 can be performed on the following extensions: www.example.com and its variants (www6, www.ipv6, etc.), the example.com SMTP mail exchangers (MX), ntp.example.com, and the DNS servers for example.com; hence its grouping as a common module. The software suite is built in Python 2.6.4 and is modular, so each test type can be enabled or disabled for successive runs. This is seen next.
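As an illustration of the common DNS step in procedure T1, the following sketch (an assumption about the approach, not the project's code) resolves both A and AAAA records for the prefix variants named above, using only the standard library:

# Illustrative sketch: look up the IPv4 (A) and IPv6 (AAAA) addresses
# of each candidate host name, the first step of procedure T1.
import socket

def lookup(host):
    v4, v6 = set(), set()
    try:
        for family, _, _, _, sockaddr in socket.getaddrinfo(host, None):
            if family == socket.AF_INET:
                v4.add(sockaddr[0])
            elif family == socket.AF_INET6:
                v6.add(sockaddr[0])
    except socket.gaierror:
        pass  # host does not resolve at all
    return sorted(v4), sorted(v6)

for prefix in ("www", "ipv6", "www.ipv6", "www6", "six"):
    host = "%s.example.com" % prefix  # example domain from the text
    v4, v6 = lookup(host)
    print("%s: IPv4=%s IPv6=%s" % (host, v4 or "-", v6 or "-"))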
3.3.3 Results
Test results are recorded in files which are saved in a time-coded set of sub-directories. An example of the directory structure is as follows:

crawls
|-- 2010-07-18__12-24-48_summary.db
|-- crawl_2010-06-16__11-40-49.log
|-- crawl_2010-07-18__12-24-48.log
|-- net
|   |-- 2010-06-16__11-40-49
|   |   |-- NS_net.csv
|   |   |-- WWW_net.csv
|   |   |-- NTP_net.csv
|   |   |-- MX_net.csv
|   |   |-- geoip_NS_net.csv
|   |   |-- ....
|   |   |-- ....
|   |   `-- net.db
|   `-- 2010-07-18__12-24-48
|       |-- NS_net.csv
|       |-- WWW_net.csv
|       |-- NTP_net.csv
|       |-- MX_net.csv
|       |-- geoip_NS_net.csv
|       |-- ....
|       |-- ....
|       `-- net.db
|-- ....
|-- ....
|-- ....
`-- com
    `-- 2010-07-18__12-24-48
        |-- NS_com.csv
        |-- ....
        |-- ....
        `-- com.db
The .LOG files in the top directory (crawls) provide a record of the crawler's status. The filename format is crawl_yyyy-mm-dd__hh-mm-ss.log, giving the time stamp of the run's start, where yyyy=year; mm=month in number format; dd=day; hh=hour in 24h format; mm=minute; ss=second (a parsing sketch follows the list below). The sub-directories holding the crawl files can be named after:
- A top level domain, when generated from an input file listing domains all under one top level domain; the name is derived from the input example.csv file, where example is the Top Level Domain in question. This is the preferred way of running the Crawler; or
- An arbitrary name derived from the input example.csv file, where example could be any word describing the criterion, for example britishuniversities, governmentsites, etc. This is seldom used, because the same results could be achieved by interrogating the SQL database generated later on. Nonetheless, it is possible to run the Crawler in this way.
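Because the time stamp format is fixed, a crawl's start time can be recovered mechanically from a log filename. A minimal sketch, using a filename from the listing above:

# Minimal sketch: recover a crawl's start time from its log filename,
# whose format is crawl_yyyy-mm-dd__hh-mm-ss.log as described above.
from datetime import datetime

def crawl_start(filename):
    stamp = filename[len("crawl_"):-len(".log")]
    return datetime.strptime(stamp, "%Y-%m-%d__%H-%M-%S")

print(crawl_start("crawl_2010-07-18__12-24-48.log"))
# -> 2010-07-18 12:24:48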
Each sub-directory then contains another set of sub-directories which include a time stamp in their name, whose format is yyyy-mm-dd__hh-mm-ss, where yyyy=year; mm=month in number format; dd=day; hh=hour in 24h format; mm=minute; ss=second. These sub-directories then contain the test results in multiple comma separated value (.CSV) text files. This format is expected to be supported on digital media for the foreseeable future. The .DB files are generated later, during the analysis stage, from the .CSV files, in order to create a mySQL database which can be interrogated remotely and/or by a machine. Both the .CSV source files and the .DB mySQL files are archived in their entirety, thanks to the fact that the directory structure is time-stamped and therefore not over-written by subsequent runs.

The following Comma Separated Value (CSV) text listings are created for each TLD, shown here for the .UK TLD (file name; main data type; content format):
- MX_uk (E-mail Exchanger): [type, domain, host, ipv4, ipv6, rank]
- NS_uk (Name Server): [type, domain, host, ipv4, ipv6, rank]
- WWW_uk (WWW): [type, domain, host, ipv4, ipv6]
- NTP_uk (Network Time Protocol): [type, domain, host, ipv4, ipv6]
- soa_NS_uk (Start of Authority, Name Servers): [type, domain, soa, primary_by_rank, primary_inhouse, secondary, total, contact, serial, refresh, retry, expire, minimum]
- geoip_MX_uk, geoip_NS_uk, geoip_NTP_uk, geoip_WWW_uk (Geoip: where is the server?): [type, domain, host, ipv4, ipv6, asn, city, region_name, country_code, longitude, latitude]
- reverse_MX_uk, reverse_NS_uk, reverse_NTP_uk, reverse_WWW_uk (Reverse IP)
- ping_MX_uk, ping_NS_uk, ping_NTP_uk, ping_WWW_uk (Ping): [type, domain, host, ipv4, ipv6, count, min, avg, max, std, min6, avg6, max6, std6]
- tcp25_MX_uk (TCP on 25, SMTP): [type, domain, host, port, ipv4, ipv6, tcp, tcp6]
- tcp80_WWW_uk (TCP on 80, HTTP): [type, domain, host, port, ipv4, ipv6, tcp, tcp6]
- tcp443_WWW_uk (TCP on 443, HTTPS): [type, domain, host, port, ipv4, ipv6, tcp, tcp6]
- tls_MX_uk (Transport Layer Security)
- Tracing the path: [type, domain, host, ipv4, ipv6, mtu4, hops4, back4, path4, mtu6, hops6, back6, path6]
- IPv6 Type
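As a usage illustration, the ping files above can be post-processed to compare IPv4 and IPv6 latency per host. The sketch below assumes the .CSV files carry no header row and that unreachable hosts leave non-numeric fields; both are assumptions to check against the real files:

# Illustrative sketch: read a ping_*_uk.csv file with the column layout
# shown above and report the IPv6-minus-IPv4 difference in average latency.
import csv

COLUMNS = ["type", "domain", "host", "ipv4", "ipv6", "count",
           "min", "avg", "max", "std", "min6", "avg6", "max6", "std6"]

with open("ping_WWW_uk.csv") as f:  # hypothetical path
    for row in csv.DictReader(f, fieldnames=COLUMNS):
        try:
            delta = float(row["avg6"]) - float(row["avg"])
        except (TypeError, ValueError):
            continue  # host unreachable over one protocol; skip
        print("%s: IPv6 is %+.1f ms vs IPv4" % (row["host"], delta))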
In addition to the above tables, a first analytical step produces summaries generated for all crawled TLDs. These are saved as follows:
- geoip_summary: [tld, type, country, hosts, ipv6hosts]
- domainPenetration_summary: [tld, total_num_domains, ipv6_enabled_domains_count]
- IpDuality_summary: [tld, type, domains, hosts, ipv4, ipv6, ipv4_6, no_ip]
- ping_summary: [tld, type, hosts, faster, delay6, delay4]
- path_summary: [tld, type, hosts, lesshops, hops6, hops4]
These are used as a basis for the Web server to draw the maps and display statistics. Once transferred over, they can be processed by the Web server and formatted into SQL databases. This is described in further detail in Section 3.4.
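A hedged sketch of this "format into SQL" step is shown below, using sqlite3 as a stand-in for the project's mySQL back end; the table name and file path are illustrative only, with columns following the domainPenetration_summary layout above:

# Hedged sketch: load a summary .CSV into an SQL table for later querying.
import csv
import sqlite3

conn = sqlite3.connect("matrix.db")  # hypothetical database file
conn.execute("""CREATE TABLE IF NOT EXISTS domain_penetration
                (tld TEXT, total_num_domains INTEGER,
                 ipv6_enabled_domains_count INTEGER)""")
with open("domainPenetration_summary.csv") as f:  # hypothetical path
    conn.executemany("INSERT INTO domain_penetration VALUES (?, ?, ?)",
                     csv.reader(f))
conn.commit()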
In each of those cases, the results recorded by the Crawler would be negatively affected, with a high probability of erroneous results being generated. Given the variety of firewall products and the high level of malicious activity on the Internet, which requires ever tighter firewall rules, it is impossible to avoid false positives except by behaving sensibly. A proposed solution for scanning a larger database of sites could be to mirror this set-up at various points around the Internet, each scanning the domain database out of synchronisation with the other scanners. This suggestion is described in the Further Work section of this report.
It is worth noting that at present, part of the information embedded in a Web page is served directly by the CherryPy system on port 4444; this is due to instability of the system (a memory leak), probably caused by the immaturity of the technologies used here. This is likely to be resolved as developers release updates.
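For reference, a CherryPy application of the kind that serves such embedded content on port 4444 can be as small as the following (illustrative only; the project's actual handlers are not shown):

# Minimal illustrative CherryPy application serving on port 4444.
import cherrypy

class Stats(object):
    @cherrypy.expose
    def index(self):
        # a real handler would query the mySQL results database here
        return "<p>IPv6-enabled domains: (placeholder)</p>"

if __name__ == "__main__":
    cherrypy.config.update({"server.socket_port": 4444,
                            "server.socket_host": "0.0.0.0"})
    cherrypy.quickstart(Stats())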
Figure 3: Web Page Structure

The detailed results tables are generated from the mySQL database back end and can therefore be interrogated using filters and search functions. It is important to note that the Web site is a prototype of the type of analytical results which could be displayed online. The Crawler's database contains such a wealth of information that the possibilities for plotting data, chronological analysis, trends, maps, etc. far exceed the concept presented in this report.
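As an example of such interrogation, a filtered query might look as follows; the table and column names are assumptions based on the IpDuality_summary layout in Section 3.3.3, and sqlite3 again stands in for mySQL:

# Hedged sketch: query the results database with a filter, listing the
# ten TLDs with the most IPv6-reachable WWW hosts.
import sqlite3

conn = sqlite3.connect("matrix.db")  # stand-in for the mySQL back end
query = """SELECT tld, hosts, ipv6
           FROM ip_duality
           WHERE type = ? AND ipv6 > 0
           ORDER BY ipv6 DESC LIMIT 10"""
for tld, hosts, ipv6_hosts in conn.execute(query, ("WWW",)):
    print("%s: %d of %d hosts reachable over IPv6" % (tld, ipv6_hosts, hosts))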
We recommend spending time using the filters to uncover more information, including further DNS anomalies, errors and curiosities.
4 Partners
The Internet Society has provided funding for the project as part of its Community Projects Funding.
The English Chapter of the Internet Society acts as the local home to the project.
Global Information Highway Ltd., through Dr. Olivier MJ Crépin-Leblond, has designed and coordinated the project and sponsored many of its logistics.
Nile University (NU), Egypt, through Dr. Sameh ElAnsary, Dr. Moustafa Ghanem, and Dr. Mohamed Abouelhoda, has partnered to write the software. The talented NU Team is composed of Mr. Mahmoud Ismail, Ms. Poussy Amr and Mr. Islam Ismail.

TwentytwentyMedia Ltd., through Mr. Rex Wickham and Mr. Alan Barnett, has provided connectivity and rack space in their stacks at Telehouse East, Britain's largest data centre and backbone Internet Exchange Point.

CTM International Ltd., through Mr. Omer Hamid, has supplied and configured all hardware required for the project, including all servers and telecommunication equipment required to connect to the Internet.