
botAnalytics: Improving HTTP-Based Botnet Detection by Using Network Behavior Analysis System

Meisam Eslahi

DISSERTATION SUBMITTED IN FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF COMPUTER SCIENCE

Faculty of Computer Science and Information Technology University of Malaya 2010

UNIVERSITI MALAYA ORIGINAL LITERARY WORK DECLARATION Name of Candidate: Meisam Eslahi Registration/Matric No: WGA070104 Name of Degree: Master of Computer Science Title of Project Paper/Research Report/Dissertation/Thesis (this Work): botAnalytics: Improving HTTP-Based Botnet Detection by Using Network Behavior Analysis System Field of Study: Network Security I do solemnly and sincerely declare that: (1) I am the sole author/writer of this Work; (2) This Work is original; (3) Any use of any work in which copyright exists was done by way of fair dealing and for permitted purposes and any excerpt or extract from, or reference to or reproduction of any copyright work has been disclosed expressly and sufficiently and the title of the Work and its authorship have been acknowledged in this Work; (4) I do not have any actual knowledge nor do I ought reasonably to know that the making of this work constitutes an infringement of any copyright work; (5) I hereby assign all and every rights in the copyright to this Work to the University of Malaya (UM), who henceforth shall be owner of the copyright in this Work and that any reproduction or use in any form or by any means whatsoever is prohibited without the written consent of UM having been first had and obtained; (6) I am fully aware that if in the course of making this Work I have infringed any copyright whether intentionally or otherwise, I may be subject to legal action or any other action as may be determined by UM. (I.C/Passport No: I2140114)

Candidate's Signature

Date

Subscribed and solemnly declared before, Witness's Signature Name: Dr Rosli Salleh Designation: Supervisor

Date

Abstract
This thesis reports on research conducted to develop a method for detecting HTTP-based Botnets using a Network Behaviour Analysis system. Bots are small pieces of malware that infect computers and join with other bots via the Internet to form a network of bots called a Botnet.

Botnets and their bots have a dynamic and flexible nature. The Botmasters, who control the Botnets, update the bots and change their code day by day to avoid traditional detection methods such as signature-based anti-viruses. In addition, Botmasters employ many techniques to keep their Botnets undetectable for as long as possible. The latest generations of Botnets are HTTP-based, and use the standard HTTP protocol to communicate with their bots. By hiding in normal HTTP traffic, the bots pass themselves off as ordinary network users and can easily bypass current network security systems.

To solve this problem, a method based on a network behaviour analysis system was developed to improve on existing methods of detecting HTTP-based Botnets and their bots. The system, botAnalytics, was developed by modifying existing network behaviour analysis methods and adding new features to them. The Delphi programming language was used to develop botAnalytics, while Microsoft SQL Server 2008 was selected as its database management system. New filters and algorithms were designed and developed to analyse the collected network packets for any evidence of suspicious HTTP-based Botnet activity.

In addition to HTTP-based Botnet detection, one of the HTTP header fields, the User-Agent, was used by botAnalytics to assess the level of danger of detected suspicious activities. This is the first reported use of the User-Agent to aid Botnet detection. Based on the results of the testing and evaluation of botAnalytics, the system has been found to be very efficient in detecting HTTP-based Botnets. botAnalytics was also found to be very efficient at detecting small-scale Botnets.


Acknowledgements
Thank God, the most Gracious and Merciful, for all the blessings bestowed on me. The submission of this dissertation marks the end of a somewhat long journey in my pursuit of a Master's degree at the University of Malaya, Kuala Lumpur. The journey would have been difficult if not for all the help, understanding and kindness of many people. Without doubt, I would like to express my sincere gratitude to my supervisors, Dr Omar Zakaria and Dr Rosli Salleh, for their kindness in taking me under their charge to conduct this research. Their patience and encouragement gave me the motivation to work on this research until its successful completion. Their guidance and readiness to share their knowledge greatly shaped the direction I took and what I needed to do to achieve my goal. I cannot thank them enough, and I hope the Malay way of expressing how I feel says it all: "Ribuan terima kasih" (a thousand thanks). While doing my studies and research in FCSIT, one is never working alone. I have had the friendship, goodwill and support of my course-mates and friends, who never hesitated to offer their advice and moral support when it was needed. To my good friend Mohsen Saghafi in particular, thank you for being there whenever I needed someone to go to for advice. To all of them, especially Saiful Khan, Teh Kang Hai, Paul Nelson, and Ali Keshavarz: a big thank you. I would like to express my gratitude and love to my family for their care and understanding while I was doing my research. To the two special women in my life, my mother K.Abdullahi and my wife Maryam Var Naseri: for your boundless love and your confidence in me, you have been my pillars of strength and determination, helping me to carry on. If I have succeeded, then you have been a big part of my success, and I dedicate this work to both of you together with my love.


Table of Contents
Abstract
Acknowledgements
Table of Contents
List of Figures
List of Tables
Abbreviations
Chapter 1: Introduction
1.1 Background
1.2 Motivation
1.3 Statement of Problem
1.4 Statement of Objectives
1.5 Proposed Solution
1.6 Thesis Scope
1.7 Thesis Organisation
Chapter 2: Bots and Botnets
2.1 Introduction
2.2 Characteristics of Botnets
2.2.1 Botnet Life Cycle and Botmaster Activities
2.2.2 Botmasters' Prime Targets
2.2.3 Botnet Command and Control (C&C) Mechanism
2.2.4 Centralised Command and Control Mechanism
2.2.4.1 IRC-based Botnets
2.2.4.2 HTTP-based Botnets
2.2.5 Decentralised or P2P Command and Control Mechanism
2.3 Why Choose HTTP Botnets?
2.4 Existing Botnet Detection Methods
2.4.1 Honeypots and Honeynets
2.4.2 Detection by Signature
2.4.3 Detection by DNS Monitoring
2.4.4 Detection Using Attack Behaviour Analysis
2.5 Detection Based on Network Behaviour Analysis
2.5.1 Why Choose Network Behaviour Analysis?
2.5.2 Existing Detection Methods Based on NBA
2.5.3 Evaluation and Comparison of Existing NBA Methods for Botnet Detection
2.6 Conclusion
Chapter 3: Modelling of the Detection System
3.1 Introduction
3.2 Proposed Method Architecture
3.3 Data Reduction Filters
3.4 VOU Mechanism
3.5 Analysing the Collected Traffic
3.6 LODA Mechanism
3.7 Proposed Method Flowchart
3.8 Conclusion
Chapter 4: Implementation of the Proposed Model
4.1 Introduction
4.2 The Delphi Programming Language
4.3 Client-Side Implementation
4.3.1 Settings
4.3.2 Sniffing the Traffic
4.3.3 H.T.S. Filter
4.3.4 G.P.S. Filter
4.3.5 VOU Mechanism
4.4 Database Implementation
4.4.1 Microsoft SQL Server 2008
4.4.2 Table Structures
4.4.3 Table Relationships
4.5 Server-Side Implementation
4.5.1 General Info
4.5.2 Analyse
4.5.3 Notifications
4.5.4 Report
4.5.5 User-Agent List
4.5.6 White List
4.5.7 Black List
4.5.8 Sensor Status
4.5.9 User Account
4.6 Conclusion
Chapter 5: Testing the Proposed Model
5.1 Introduction
5.2 Hardware Requirements
5.3 Testing Bots
5.4 Testing Command and Control Servers
5.5 Testing Clients
5.6 Testing Analyser
5.7 Testing Results
5.8 Conclusion
Chapter 6: Data Analysis and Discussion
6.1 Introduction
6.2 Evaluation of botAnalytics
6.2.1 Filtering Evaluation
6.2.2 VOU Algorithm Evaluation
6.2.3 LODA Algorithm Evaluation
6.3 Comparison of botAnalytics with Other Systems
6.3.1 False-Positive Rate
6.3.2 Efficiency in Small-Scale Botnets
6.4 Conclusion
Chapter 7: Conclusion and Future Work
7.1 Introduction
7.2 Achievement of Objectives
7.3 Contributions
7.3.1 HTTP-based Botnet Detection
7.3.2 Establishment of the User-Agent
7.3.3 New Filters and Algorithms
7.3.4 Evaluation of the Level of Danger
7.4 Limitations and Future Work
7.4.1 Real-Time Detection
7.4.2 Linux Platform
7.4.3 Other Types of Bots and Botnets
7.4.4 Prevention Methods
7.4.5 Advancing the User-Agent for Botnet Detection
7.5 Conclusion
References

List of Figures
Figure 2-1: Botnet Life Cycle (Schiller & Binkley, 2007)
Figure 2-2: General Schema of the Botnet C&C Mechanism
Figure 2-3: Centralised Botnet (Ping, et al., 2007)
Figure 2-4: IRC-based C&C Botnet (Gu, et al., 2008)
Figure 2-5: HTTP-based C&C Botnet (Gu, et al., 2008)
Figure 2-6: Decentralised or P2P Botnet (Ping, et al., 2007)
Figure 3-1: botAnalytics System Architecture
Figure 3-2: Flowchart of the H.T.S. Filter
Figure 3-3: Flowchart of the G.P.S. Filter
Figure 3-4: The VOU Module Flowchart
Figure 3-5: Flowchart of the H.A.R. Filter
Figure 3-6: Flowchart of the L.A.R. Filter
Figure 3-7: P.A.R. Filter Flowchart
Figure 3-8: LODA Module Flowchart
Figure 3-9: The Proposed Method Flowchart
Figure 4-1: botAnalytics Client-Side GUI
Figure 4-2: Settings GUI
Figure 4-3: Traffic Sniffer GUI
Figure 4-4: H.T.S. Filter GUI
Figure 4-5: G.P.S. Filter GUI
Figure 4-6: VOU Mechanism GUI
Figure 4-7: VOU Pseudo Code
Figure 4-8: botAnalytics Database: Relationships between the Tables
Figure 4-9: botAnalytics Server-Side GUI
Figure 4-10: General Info GUI
Figure 4-11: GET and POST Percentage Query Pseudo Code
Figure 4-12: Collected Traffic Statistics Query Pseudo Code
Figure 4-13: Primary Data Tab of the Analyse Section
Figure 4-14: Black/White Listing Tab of the Analyse Section
Figure 4-15: H.A.R. Result Tab of the Analyse Section
Figure 4-16: H.A.R. Filter Pseudo Code
Figure 4-17: L.A.R. Result Tab of the Analyse Section
Figure 4-18: L.A.R. Filter Pseudo Code
Figure 4-19: P.A.R. Result Tab of the Analyse Section
Figure 4-20: P.A.R. Filter Pseudo Code
Figure 4-21: LODA Module Pseudo Code
Figure 4-22: LODA Module Result GUI
Figure 4-23: Notifications GUI
Figure 4-24: Report GUI
Figure 4-25: User-Agent List GUI
Figure 4-26: White List GUI
Figure 4-27: Black List GUI
Figure 4-28: Sensor Status GUI
Figure 4-29: Sensor Info Pseudo Code
Figure 4-30: Top 10 Active Sensors Pseudo Code
Figure 4-31: Edit Profile Tab
Figure 4-32: Create New User Tab
Figure 4-33: Manage Existing Users Tab
Figure 5-1: General Schema for the Testing Phase
Figure 5-2: The Black Energy User-Agent (Nazario, 2007)
Figure 5-3: The Firefox User-Agent
Figure 5-4: The Bobax User-Agent
Figure 6-1: The H.T.S. Filter Results Chart (see also Table 6-1)
Figure 6-2: The G.P.S. Filter Results Chart (see also Table 6-1)
Figure 6-3: The H.A.R. Filter Results Chart (see also Table 6-1)
Figure 6-4: The L.A.R. Filter Results Chart (see also Table 6-1)
Figure 6-5: The P.A.R. Filter Results Chart (see also Table 6-1)


List of Tables
Table 2-1: Comparison of Methods from Past Research with botAnalytics
Table 4-1: Comparison of Managed-Code with Native-Code Languages
Table 4-2: Comparison of DBMSs
Table 4-3: Microsoft SQL Server 2008 Extra New Features
Table 4-4: tblUser Structure
Table 4-5: tblSquestion Structure
Table 4-6: tblRole Structure
Table 4-7: tblUserAgent Structure
Table 4-8: tblWhiteList Structure
Table 4-9: tblBlackList Structure
Table 4-10: tblClientsInfo Structure
Table 4-11: tblVOU Structure
Table 4-12: Structure of the tblHMType Table
Table 4-13: Structure of the tblVouValue Table
Table 4-14: Structure of the tblResult Table
Table 4-15: Structure of the tblLODA Table
Table 4-16: Structure of the tblNotification Table
Table 5-1: botAnalytics Filtering Results
Table 5-2: botAnalytics Botnet Detection Results
Table 6-1: botAnalytics: Results of Filtering
Table 6-2: The VOU Algorithm Results
Table 6-3: The LODA Algorithm Results
Table 6-4: Comparison of botAnalytics with Existing HTTP-based Botnet Detection Research
Table 6-5: The botAnalytics False-Positive Rate


Abbreviations
C&C: Command and Control
DBMS: Database Management System
DDoS: Distributed Denial of Service
DNS: Domain Name System
DNSBL: DNS-based Black Hole List
ERD: Entity Relationship Diagram
G.A.S.: Grouping and Sorting
G.P.S.: GET and POST Separator
GUI: Graphical User Interface
H.A.R.: High Access Rate
H.T.S.: HTTP Traffic Separator
HTTP: Hyper Text Transfer Protocol
IID: Iterative and Incremental Development
IRC: Internet Relay Chat
L.A.R.: Low Access Rate
LODA: Level of Danger Analysing
NBA: Network Behaviour Analysis
P2P: Peer-to-Peer
P.A.R.: Periodical Access Rate
RAD: Rapid Application Development
SDLC: System Development Life Cycle
VOU: Validation of User-Agent


Chapter 1: Introduction
1.1 Background
The development of computer networking, followed by the Internet in the second half of the last century, can be said to be one of the key technological developments that has revolutionised our daily life. The convenience and speed of digital communication have become an integral part of home computer use, as well as of every other aspect of human activity today, from education to business and research. While high-speed computer networking and the Internet have brought great convenience, a number of security challenges have also emerged with these technologies (O'Connor, 2004; Tanenbaum, 2002). With the increasing use of computer networks and the Internet on a global scale, network security has become an important issue. In fact, without adequate network security, all the benefits brought by these technological developments would be lost, as networks and the Internet are vulnerable to malicious attacks. These attacks or threats come in different forms and can generally be categorised as: viruses and worms; Trojans; backdoors; spyware; phishing; and Botnets. Among all these threats, the Botnet is considered the most dangerous (Barroso, 2007; Jae-Seo, HyunCheol, Jun-Hyung, Minsoo, & Bong-Nam, 2008; Star, 2008). A Botnet is a linked group of infected computers (termed bots or zombies), which communicate with each other and receive their commands from a controller called the Botmaster. A Botmaster has a mechanism for controlling the Botnet by sending commands to the bots and receiving responses from them. Different command and control mechanisms (e.g. IRC,

HTTP, and P2P) are used by Botmasters to achieve this goal (Govil & Jivika, 2007; Naseem, Shafqat, Sabir, & Shahzad, 2010). The main aim of Botnets is to carry out different types of malicious activities and to gain illegal profits. Some of these activities, such as Distributed Denial of Service (DDoS), spamming, thieving personal information, illegal hosting, click fraud, and adware, are described below:

a) DDoS: this is the distributed form of the Denial of Service (DoS) attack, carried out by sending a large number of UDP packets, ICMP requests, or TCP SYN floods aimed at exhausting the resources of particular servers and forcing them to shut down. Because Botmasters control the Botnets, they can launch this type of attack from thousands of different places by sending a particular command to the bots on the infected computers in the same Botnet (Govil & Jivika, 2007; Puri, 2003; Srikanth, Dina, Matthias, & Arthur, 2005).

b) Spamming: spamming refers to emails that have the same content but are sent in high volume. Botnets are a perfect platform for collecting email addresses from infected computers and for generating and sending spam or phishing emails (Yinglian et al., 2008).

c) Thieving Personal Information: Botmasters use Botnets to steal information and use it for their own benefit. They can set a trigger in the bots to make them scan websites where important information is entered. In addition, other applications such as key-loggers are spread by the bots to obtain sensitive information such as personal passwords, and financial data like online banking passwords and credit card information. Depending on the size of the Botnet, a Botmaster can collect the required data or information from thousands to millions of computers (Al-Hammadi & Aickelin, 2008; Govil & Jivika, 2007).

d) Illegal Hosting: a computer or server with large storage and a high-bandwidth connection to the Internet can become a target for a Botmaster, who gains control of it and uses it for illegal file sharing (AUSCERT, 2002; Puri, 2003).

e) Click Fraud and Adware: one of the main differences between Botnets and other Internet threats is that a Botnet can be used to make money through click fraud. Botmasters can amass a lot of money by using their bots to visit websites that pay a small sum of money for each visit or for each click on an advertisement. Pop-up advertisements can also be downloaded, installed, or displayed by bots to force a user to visit particular websites (Barroso, 2007).

In addition, Botnets can be used to spread other types of computer threats in the form of viruses, Trojans, backdoors, worms, etc. This means that Botnets are not only a threat in themselves, but also a platform for the distribution of other threats (Star, 2008).

1.2 Motivation

In recent years, Botnets have become the biggest threat to cyber security, and have been used as an infrastructure to carry out nearly every type of cyber attack. A review of the different types of malicious activities perpetrated by Botnets shows that they are not only a dangerous threat to computer networks and the Internet, but are also involved in other types of threats and attacks (Jae-Seo, et al., 2008; Lee, Wang, & Dagon, 2007). Based on a Network World report in 2009, more than 11.1 million computers in the US had been infected by the 10 most damaging Botnets. While the theft of personal information has always been considered one of the most disturbing Internet threats, the Zeus Botnet alone had infected nearly 3.5 million computers in attempts to steal sensitive information. Each bot can send an average of three spam emails or fake messages per second; thus, the Koobface Botnet, with 2.9 million infected computers, can generate more than 8 million fake messages per second (Messmer, 2009). In addition, the detection of Botnets and their associated bots is difficult for the reasons described below:

a) Skilful Developers: Botnet developers have higher technical capabilities than most other online attackers. Unlike other types of network threats, Botnets and their bots are designed and developed for long-term goals, or even for illegal monetary gain. Botmasters use various strategies to keep the bots safe and hidden for as long as possible (Lee et al., 2007).

b) Dynamic Nature and Flexibility: Botnets and bots have a dynamic and flexible nature. They are continuously updated, and their code is changed by the developers and owners to elude traditional detection methods such as signature-based anti-virus software. The McAfee Research Lab reported that any success in Botnet detection is only temporary, as the Botmasters frequently change their strategies and design new methods to recover and restore their detected bots within a short time (McAfee, 2010).

c) Using Standard Protocols: Botnets use standard protocols to establish their communication infrastructure. The latest generation of Botnets, called HTTP-based Botnets, uses the HTTP protocol as its communication method. By blending into normal HTTP traffic, these bots disguise themselves as ordinary network users and easily avoid detection by current network security systems (Jae-Seo et al., 2008).

d) Silent Threats: Barroso (2007) termed Botnets "silent threats", as they try to control the infected computers without the knowledge of the computer users. The bots on infected computers make no unusual or suspicious use of the CPU, memory, or other computer resources that would otherwise expose their presence.

The examples above show that Botnet detection is a major challenge in network security management.

1.3 Statement of Problem

Company computers with high-bandwidth connectivity to the Internet, university servers, and home computers are the main targets of Botnets. The Botmasters try to gain control of these targets and use them to carry out malicious activities.

Today, the detection of Botnets has become a central issue in the field of computer network security. Botnets have several characteristics that make them difficult to detect. They spread very quickly, and the Botmasters are always trying different techniques to protect their bots from existing anti-virus software and detection systems (Lee, et al., 2007). Currently, there is no effective technique to stop Botnets, and existing detection techniques are unable to detect and prevent them sufficiently. The McAfee Research Labs predicted that the cyber community will face more widely-distributed and more resilient Botnets, which are difficult to detect and destroy. Undoubtedly, network security researchers will continue to face big challenges on this problem (McAfee, 2010).

1.4 Statement of Objectives

The aim of this research is to develop an improved method for the detection of HTTP-based Botnets. In this context, the objectives of this research are as follows:
- To study detailed knowledge of HTTP-based and other types of Botnets.
- To evaluate the existing methods of Botnet detection.
- To study an overview of the characteristics and architecture of the Network Behaviour Analysis (NBA) system.
- To model a system that improves the detection of HTTP-based Botnets based on Network Behaviour Analysis (NBA).
- To develop an HTTP-based Botnet detection system using the NBA system architecture.
- To test and evaluate the proposed system and compare it with existing NBA methods.

1.5 Proposed Solution

In this thesis, a Network Behaviour Analysis system, called botAnalytics, is developed. botAnalytics uses software sensors installed on network clients to collect information on the network flows. The information from an entire network is stored in a server database and examined by another part of the botAnalytics system, known as the analyser, to look for any evidence of HTTP-based Botnet activity. botAnalytics aims to detect HTTP-based Botnets regardless of their size, and with a very low false-positive rate. Various types of data filtering were introduced for the first time, or modified, by botAnalytics to improve the detection process. In addition, one of the HTTP header fields, User-Agent (Fielding et al., 1999), was used to design a new algorithm that evaluates the danger level of detected suspicious activities.
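The thesis does not give the details of the User-Agent algorithm at this point. Purely as an illustration of the idea, the following sketch assigns a danger score to an HTTP request based on its User-Agent header; the prefix list, score values, and function name are all assumptions made for this example, not the actual botAnalytics logic.

```python
# Hypothetical sketch: rate the "danger level" of one HTTP request by its
# User-Agent header. Prefixes and thresholds are illustrative assumptions.

KNOWN_AGENT_PREFIXES = ("Mozilla/", "Opera/")  # agents of ordinary browsers

def user_agent_danger(headers):
    """Return a danger score in [0, 1] for one HTTP request's headers."""
    agent = headers.get("User-Agent")
    if agent is None:
        return 1.0   # no User-Agent at all: highly suspicious for "web" traffic
    if not agent.startswith(KNOWN_AGENT_PREFIXES):
        return 0.7   # unfamiliar agent string: suspicious
    return 0.1       # looks like an ordinary browser
```

A real system would combine such a per-request score with the flow-level evidence collected by the sensors before raising an alert.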

1.6 Thesis Scope

In this research, the Network Behaviour Analysis technique (Scarfone & Mell, 2007) was selected, as it can be modified and used to detect HTTP-based Botnets. Improvements are made to existing HTTP-based Botnet detection capabilities by adding new features. The Network Behaviour Analysis technique was chosen for its ability to detect encrypted and new (zero-day) bots, despite its drawback of working passively, which makes it unsuitable for real-time detection (Derek, 2009). It is difficult to find the source code of HTTP-based bots to establish a real Botnet; hence, the bots had to be simulated and implemented using appropriate programming approaches. The implementation of the bots in this research is based on two existing HTTP-based bots: Black Energy (Nazario, 2007) and Bobax (Joe, 2004). These two bots were selected because methods proposed by other researchers, such as Jae-Seo et al. (2008) and Gu, Zhang, & Lee (2008), were evaluated against them. Thus, the same bot structure can also be used to evaluate the new method developed in this research and compare it with the other methods.

1.7 Thesis Organisation

Chapter 1 (Introduction): This chapter presents an overview of Botnets and their malicious activities, the motivation for this research, the problem statement, the objectives, and the scope of this research.

Chapter 2 (Literature Review): This chapter presents information from the literature on Botnet characteristics, lifecycle, and architecture. It also gives an overview of current Botnet detection methods.

Chapter 3 (Modeling of Detection System): This chapter presents the steps involved in modeling the HTTP-based Botnet detection system to achieve the objectives.

Chapter 4 (Implementation of Proposed Model): This chapter discusses the steps involved in developing the proposed system.

Chapter 5 (Testing the Proposed Method): This chapter discusses the steps involved in testing the proposed system, and the testing process.

Chapter 6 (Result Analysis and Discussion): This chapter presents the research findings, and discusses the effects of the new filters and algorithms developed.

Chapter 7 (Conclusion and Future Work): This chapter provides a summary of the whole research and the significance of its findings. It also gives recommendations for related work to be undertaken in the future.

Chapter 2: Bot and Botnets


This chapter presents a review of the literature on Botnets and on methods for Botnet detection. Section one gives an overview of bots and Botnets. Section two discusses the characteristics of Botnets, including their life cycle, the Botmaster's functions and prime targets, and their command and control mechanisms. Existing Botnet detection methods are reviewed in section three. The last section presents the network behaviour analysis technique, and background information on its use in Botnet detection.

2.1 Introduction

A bot (from the term "robot") is an application that can perform and repeat a particular task faster than a human. When a large number of bots spread to different computers and connect to each other through the Internet, they form a group called a Botnet, which is a network of bots (Mitsuaki et al., 2007). Botnets range in size from large Botnets with millions of bots to small Botnets with only thousands of bots. Regardless of their size, which is directly linked to their complexity and purpose, Botnets are mainly created to carry out malicious activities in computer networks (Govil & Jivika, 2007; Lee, et al., 2007; Zhaosheng et al., 2008). A bot is designed to infect computers; the infected computers become part of a Botnet without their owners' knowledge, and come under the control of a person known as the Botmaster. The Botmaster sends orders to all the bots and controls the entire Botnet through the Internet and through servers known as the command and control (C&C) servers (Govil & Jivika, 2007; Zhaosheng, et al., 2008).


2.2 Characteristics of Botnets

2.2.1 Botnet Life Cycle and Botmaster Activities

Botnets can differ in size or structure but, in general, they go through the same stages in their life cycle (Govil & Jivika, 2007; Schiller & Binkley, 2007). Figure 2-1 shows the life cycle of Botnets.

a) Infection

The life cycle of a Botnet begins with the infection of different computers by its bots. An infected computer is known as a zombie (Lee, et al., 2007).

Figure 2-1: Botnet life cycle (Schiller & Binkley, 2007)


b) Rallying

After infecting a computer, the bot must connect to its Command and Control (C&C) server to let the Botmaster know that a new zombie has been successfully established. It also updates itself with essential information, such as the list of available C&C server IP addresses. Rallying thus refers to the process in which a bot connects to the C&C server for the first time (Schiller & Binkley, 2007).

c) Get Commands and Send Reports

During this stage, the bots on the infected computers (zombies) listen to the Command and Control server, or connect to it periodically, to get new commands from the Botmaster. A new command, when received, is treated as an order: the bots execute it, report the results to the Command and Control server, and then wait for new commands (Govil & Jivika, 2007; Schiller & Binkley, 2007).

d) Abandon

When a bot is no longer usable (e.g. too slow), or the Botmaster decides that a particular bot is no longer suitable, it may be abandoned. If this happens, the rest of the Botnet remains operational. A whole Botnet is destroyed only when all its bots are detected or abandoned, or when its Command and Control servers are detected and blocked (Schiller & Binkley, 2007).


e) Securing the Botnet

An important issue throughout the Botnet life cycle is the constant effort to keep the whole Botnet secure. Botmasters do this by encrypting the messages delivered between the bots, and between the bots and the Command and Control servers. In addition, Botmasters may update the bots with new code and new techniques to evade anti-virus software (Schiller & Binkley, 2007).

2.2.2 Botmasters' Prime Targets

Botmasters may infect different types of computers or servers, but the most common targets are less-monitored computers, computers with high-bandwidth connectivity, university servers, and home computers. Computers connected to the Internet via broadband give attackers an opportunity to use that bandwidth. Less computer-savvy home users are also prime targets of the Botmasters. These users usually have little awareness or knowledge of network security, and Botmasters take advantage of this to gain unauthorised access to their computers and to keep their bots there for a long time without being detected (Govil & Jivika, 2007; Puri, 2003).

2.2.3 Botnet Command and Control (C&C) Mechanism

As discussed in the previous sections, a Botnet threat comprises three main elements: the bots, the Command and Control (C&C) servers, and the Botmasters. The bots infect the computers, and the Command and Control servers distribute the Botmaster's orders to the bots on the infected computers. These three elements communicate closely with one another, and would be useless without some form of Command and Control mechanism to make this communication possible (Gu, Zhang, & Lee, 2008). The Command and Control mechanism creates an interface between the bots, the C&C servers, and the Botmasters to transmit data between them. It is crucial for Botmasters to establish a reliable connection between themselves, the infected computers, and the C&C servers (Govil & Jivika, 2007). Figure 2-2 shows the logical relationship between these three elements.

Figure 2-2: General schema of Botnets C&C mechanism

There are two types of Botnet command and control architectures, centralised and decentralised, based on the way communication is implemented (Chao, Wei, & Xin, 2009; Zeidanloo & Manaf, 2009).

2.2.4 Centralised Command and Control Mechanism

In the centralised command and control approach, all the zombies or bots are connected to a central C&C server, which constantly waits for new bots to connect. Depending on the Botmaster's settings, a C&C server may provide services to register the available bots, making it possible to track their activities. The Botmaster must, of course, be connected to the C&C server to control the Botnet and distribute its commands and tasks (Gu, et al., 2008; Jing, Yang, Kaveh, Hongmei, & Jingyuan, 2009; Lee, et al., 2007; Ping, Sherri, & Cliff, 2007). Figure 2-3 shows the structure of a centralised Command and Control Botnet.

Figure 2-3: Centralised Botnet (Ping, et al., 2007)

Centralised Botnets are the most common type of Botnet, as they are simple to create and manage, and their response is fast (Gu, et al., 2008; Jing, et al., 2009; Ping, et al., 2007). The centralised C&C mechanism is divided into two main types, IRC-based and HTTP-based, according to the communication protocol used to establish the connection (Naseem, et al., 2010; Zeidanloo & Manaf, 2009; Zhaosheng, et al., 2008).


2.2.4.1 IRC-based Botnets

IRC, or Internet Relay Chat, is a system used by computer users to communicate online or chat in real time (Kalt, 2000). This method was used by the first generation of bots, in which the Botmaster used an IRC server and the relevant channels to distribute commands (Jae-Seo, et al., 2008). Each bot connects to the IRC server and channel selected by the Botmaster, and waits for commands. In this setup, the Botmaster establishes real-time communication with all the connected bots, and controls them. IRC bots follow the PUSH approach, which means that once an IRC bot connects to a selected channel, it does not disconnect but remains in connected mode (Gu, et al., 2008; Naseem, et al., 2010; Ping, Lei, Baber, & Cliff, 2009). Figure 2-4 shows an IRC-based Command and Control Botnet.

Figure 2-4: IRC-based C&C Botnet (Gu, et al., 2008)


2.2.4.2 HTTP-based Botnets

HTTP-based Command and Control is a newer technique that allows Botmasters to control their bots using the HTTP protocol (Jae-Seo, et al., 2008). In this technique, the bots use a specific URL or IP address, defined by the Botmaster, to connect to a particular web server, which acts as the Command and Control server (Naseem, et al., 2010). HTTP bots adopt the PULL approach, unlike the PUSH approach used by IRC-based bots. In the PULL approach, an HTTP-based bot does not remain connected after it has first contacted the Command and Control server. Instead, the Botmaster publishes commands on certain web servers, and the bots periodically visit those servers to update themselves or fetch new commands. This process continues at a regular interval defined by the Botmaster (Gu, et al., 2008; Jae-Seo, et al., 2008; Naseem, et al., 2010; Ping, et al., 2009). Figure 2-5 shows an HTTP-based Command and Control Botnet.

Figure 2-5: HTTP-based C&C Botnet (Gu, et al., 2008)
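The PULL behaviour described above can be sketched, purely for illustration, as a polling loop. The fetch function, interval, and command format here are invented for the example (no network code is included); it is precisely this fixed polling rhythm that periodicity-based detectors later exploit.

```python
import time

# Illustrative-only sketch of the PULL model: a client periodically asks
# for a command at a fixed interval. The fetch function is injected, so
# the loop itself carries no network code.

def pull_loop(fetch_command, interval_s, max_polls):
    """Poll for commands at a fixed interval; return the commands seen."""
    seen = []
    for _ in range(max_polls):
        cmd = fetch_command()      # stands in for an HTTP GET to the C&C URL
        if cmd is not None:
            seen.append(cmd)
        time.sleep(interval_s)     # regular interval set by the Botmaster
    return seen
```

From a defender's viewpoint, the significant property is the fixed `interval_s`: it makes the bot's HTTP flows far more evenly spaced than a human user's browsing.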


2.2.5 Decentralised or P2P Command and Control Mechanism

The decentralised Command and Control architecture is based on the peer-to-peer network model. In this model, an infected computer or zombie can act as a bot and as a C&C server at the same time (Ianelli & Hackworth, 2005; Jing, et al., 2009; Naseem, et al., 2010). In fact, in P2P Botnets, instead of a central C&C server, each bot acts as a server that transmits commands to its neighbouring bots. The Botmaster sends commands to one or more bots, the bots that receive the commands deliver them to other bots, and this process is repeated by each bot that receives a new command. Unlike centralised Botnets, creating and managing P2P Botnets involves complex procedures and requires a high level of expertise (Gu, et al., 2008; Jing, et al., 2009; Ping, et al., 2007). Figure 2-6 shows the structure of a decentralised Command and Control Botnet.

Figure 2-6: Decentralized or P2P Botnet (Ping, et al., 2007)


2.3 Why Choose HTTP Botnets?

As discussed in sections 2.2.4 and 2.2.5, there are three different types of Botnets: IRC, HTTP, and P2P. The reasons for choosing HTTP-based Botnets for this research are as follows.

In the first generation of Botnets, the IRC technology was used by Botmasters to control the bots because the IRC system has several advantages, such as ease of use, control, and management (Ianelli & Hackworth, 2005; Jae-Seo, et al., 2008). However, the main weakness of IRC Botnets is the central control mechanism: a whole Botnet can be destroyed by blocking the IRC server or the IRC ports. The P2P Botnets were designed to overcome this problem (Wei, Tavallaee, Goaletsa, & A. Ghorbani, 2009; Zhaosheng, et al., 2008).

In decentralised or P2P Botnets, there is no central Command and Control server; rather, there are multiple distributed servers, and commands are delivered bot by bot across the entire Botnet. In addition, encryption methods are used to secure the communication (Gu, et al., 2008; Ianelli & Hackworth, 2005; Ping, et al., 2007). These techniques make P2P Botnets more difficult to detect than IRC Botnets. However, P2P Botnets are not as widely used as IRC Botnets because the implementation and control of P2P bots can be quite difficult and complex. In addition, message delivery in P2P Botnets suffers from latency, and the Botmasters are unable to verify the delivery status of their commands (Bailey, Cooke, Jahanian, Yunjing, & Karir, 2009).

Recently, Botmasters have begun to use the centralised Command and Control structure again, but with the HTTP protocol in place of the IRC protocol (Jae-Seo, et al., 2008; Naseem, et al., 2010), operating over port 80. Because port 80 carries a wide range of legitimate web services, the central Command and Control server is not easy to block (Sandvine, 2006). In addition, by using the HTTP protocol, bots hide their communication flows among normal HTTP flows and avoid detection by network defences such as firewalls (Chao, et al., 2009; Govil & Jivika, 2007; Zeidanloo & Manaf, 2009).

From this review of the characteristics of IRC, P2P, and HTTP-based Botnets, it is clear that the HTTP command and control mechanism is a newer technology preferred by Botmasters. Compared to IRC and P2P Botnets, HTTP-based Botnets have a set of attributes that makes them difficult to detect. Surprisingly, the amount of research focusing on the detection of HTTP-based Botnets is relatively low compared to that on detection methods for IRC-based and P2P Botnets. The following sections discuss past and current research on Botnet detection methods.


2.4 Existing Botnet Detection Methods

This section discusses the current methods and research on Botnet detection.

2.4.1 Honeypot and Honeynet

Honeypots are tools used as traps for bots: they can detect bots or collect information on their activities. This information can be used to better understand bot behaviour or the intentions of the Botmasters. Nepenthes is a good example of a Honeypot used to collect the bots' binary code and other information about them (Niels & Thorsten, 2007; Rajab, Zarfoss, Monrose, & Terzis, 2006). Freiling, Holz, and Wicherski (2005) used Honeypots to collect information about DDoS attacks, including DDoS signs and characteristics, cases, and the attackers' intentions and behaviour. This information is useful for developing methods to prevent DDoS attacks. Similarly, Rajab et al. (2006) combined several Honeypots in a multifaceted approach to collect a large amount of information about IRC bots. By analysing the data and tracking the activities of the bots, they learned more about the bots' characteristics and behaviour. Like any other tool or technique, Honeypots have their weaknesses. There are two types of Honeypots, low-interaction and high-interaction, and the main difference between them is the level of access granted to system resources, services, and functions.


Low-interaction honeypots, like Nepenthes, are installed on computers to emulate a limited set of operating system services; thus, they allow Botmasters only limited interaction with the computers. These computers may therefore never be completely compromised, and the information collected on them may not be sufficient for analysis to detect Botnets (Niels & Thorsten, 2007). High-interaction honeypots, on the other hand, do not emulate any operating system services but provide the real system and services. A Botmaster can use these real services to gain full control of the computer on which the high-interaction honeypot is installed (Niels & Thorsten, 2007). Today, it is not surprising that Botmasters use many techniques to avoid honeypots (C. Zou & Cunningham, 2006).

2.4.2 Detection by Signature

A signature refers to a known pattern or characteristic of a threat from intruders into computer systems. By analysing and comparing these patterns or characteristics, it is possible to distinguish threat activities from normal activities (Scarfone & Mell, 2007). Goebel and Holz (2007) used IRC nicknames as signatures. Using this method, known as Rishi, a reasonable amount of information on IRC traffic is collected; all IRC nicknames are then extracted from the collected data and checked against known bot nicknames by using regular expressions. To reduce the number of comparisons and the time taken, Goebel and Holz used a white list and a black list.
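The Rishi-style matching just described can be sketched as follows. The regular expressions and list entries below are made-up examples, not the patterns actually shipped with Rishi; only the overall structure (whitelist, blacklist, then regex matching) follows the description above.

```python
import re

# Sketch of signature-based nickname matching in the style of Rishi.
# Patterns and list contents are illustrative assumptions.

BOT_NICK_PATTERNS = [re.compile(p) for p in (
    r"^[A-Z]{3}-\d{4,}$",   # e.g. "DEU-12345": country code plus a number
    r"^bot[_-]?\d+$",       # e.g. "bot_42"
)]
WHITELIST = {"alice", "bob"}    # known-good nicks: skip the regex checks
BLACKLIST = {"XXX-001"}         # previously confirmed bot nicks

def is_bot_nick(nick):
    """Classify one IRC nickname using the lists first, then the patterns."""
    if nick in WHITELIST:
        return False
    if nick in BLACKLIST:
        return True
    return any(p.match(nick) for p in BOT_NICK_PATTERNS)
```

The white and black lists play exactly the role described above: they short-circuit the (comparatively expensive) regular-expression comparisons.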


The signature-based detection method is not very effective because it cannot identify new behaviour patterns or characteristics. It is based on a simple comparison of the collected information with the predefined characteristics of well-known bots. Thus, this method is good for detecting well-known bots, but largely useless for detecting new and zero-day bots (Chao, et al., 2009; Scarfone & Mell, 2007).

2.4.3 Detection by DNS Monitoring

Monitoring and analysing the DNS traffic generated by bots has been used as a technique to detect Botnets. Choi, Lee, Lee, and Kim (2007) found that bots generate DNS traffic in certain situations, for example, when identifying the Command and Control server or arranging attacks such as a DDoS attack. The researchers used three main differences between bot-generated DNS flows and normal DNS flows as ways to detect Botnets.

The first difference is the number of source IP addresses that send DNS queries to specific domain names. Botnet DNS queries are generated by a fixed group of IP addresses belonging to the bots of the same Botnet, whereas legitimate DNS queries for a particular domain name come from a random set of anonymous users.

The second difference is in the format and frequency of the DNS queries. Bots perform similar group activities; thus, DNS queries of the same format are generated by bots of the same Botnet intermittently, and only in special situations, whereas the DNS queries of normal users are generated continuously and in random formats.

The third difference is that normal users hardly ever use dynamic DNS (DDNS), whereas bots do.

Salomon and Brustoloni (2008) also used DDNS as a base parameter to suggest two approaches for Botnet detection. They found that Botmasters do not use particular Command and Control servers for long, and periodically change servers. When this happens, the bots try to find the address of the new Command and Control server, which produces a higher number of DDNS queries for specific domain names - a sign of unusual Botnet activity. NXDOMAIN was evaluated as another parameter. The term NXDOMAIN, or Non-Existent Domain, describes the state that occurs when DNS resolvers are unable to resolve a certain domain name for any reason, such as a change of domain name, an unregistered domain name, or server problems. Salomon and Brustoloni (2008) suggested that a high number of DDNS queries returning the NXDOMAIN code could have been generated by bots searching for Command and Control servers that had been blocked or moved.

DNSBL, or DNS Block List, is a list of spamming computer and network IP addresses. Ramachandran, Feamster, and Dagon (2006) stated that DNSBLs may be checked by Botmasters to keep themselves aware of their bots' status - to find out whether a particular bot is being blocked. Their algorithms are therefore designed to distinguish normal DNSBL queries (generated by normal services such as mail servers) from the queries generated by Botmasters.
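The first difference noted by Choi et al. - a fixed, repeating group of query sources versus an open-ended set of anonymous users - can be sketched as a simple per-domain count. The threshold logic below is an illustrative assumption, not the published algorithm.

```python
from collections import defaultdict

# Sketch: for each queried domain, compare how many queries arrive with
# how many distinct sources sent them. A fixed bot group produces many
# queries from few, repeating sources. Thresholds are illustrative.

def suspicious_domains(dns_queries, min_group=3):
    """dns_queries: iterable of (source_ip, domain) pairs.
    Flag domains queried repeatedly by a fixed group of sources."""
    sources = defaultdict(set)
    counts = defaultdict(int)
    for src, domain in dns_queries:
        sources[domain].add(src)
        counts[domain] += 1
    flagged = []
    for domain, srcs in sources.items():
        # many queries but few, repeating sources -> bot-like group activity
        if len(srcs) >= min_group and counts[domain] >= 3 * len(srcs):
            flagged.append(domain)
    return flagged
```

A production system would of course window this by time and account for caching resolvers, which aggregate many users behind one source IP.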

The detection methods discussed above were designed to analyse bot- and Botnet-generated DNS (domain name system) queries. These methods are no longer effective, as the new generation of bots and Botnets has been designed to generate a minimal number of DNS queries. Moreover, the process of analysing DNS traffic is very complex (Jae-Seo, et al., 2008).

2.4.4 Detection Using Attack Behaviour Analysis

In this method, researchers study the characteristics and behaviour of the attacks themselves, rather than the bots, the Command and Control servers, the Botmasters' behaviour, or the communication methods used. Hu, Knyz, and Shin (2009) proposed a system called RB-Seeker, which uses three sub-systems to detect bots that carry out URL redirection attacks. The first two sub-systems attempt to identify all domains related to redirection activities, based on the characteristics and behaviour of the URL redirection attack. At this stage, the system makes no decision about whether a domain is normal or malicious. In the next stage, the third sub-system examines DNS queries to distinguish the malicious domains from the normal ones. Although this method uses DNS-based techniques, its main focus is URL redirection activities, and DNS probing is used only as a sub-system; therefore, it does not belong to the DNS-based category.

Brodsky and Brodsky (2007) found that bots send a far higher number of spam emails within a short period than humans do. Based on this observation, the sources of spam emails were identified and recorded, and the number of spam emails generated by the same recorded sources within a short period was used as a decision-making parameter. Likewise, Yinglian et al. (2008) designed a system that collects all the URLs sent in spam emails and divides them into groups based on their web domains. In the next step, each URL group is passed to a regular expression generator to create a signature for malicious URLs. These methods can identify bots based on the similarity of their group activities, and are effective when countering attacks from a large number of attackers.
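The grouping step in the Yinglian et al. approach can be sketched as follows; the signature-generation step is omitted, and the function name is an invention for this example.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Sketch: bucket URLs harvested from spam emails by their web domain,
# before a per-group signature would be generated.

def group_urls_by_domain(urls):
    """Return {domain: [urls...]} with domains normalised to lower case."""
    groups = defaultdict(list)
    for url in urls:
        domain = urlparse(url).netloc.lower()
        groups[domain].append(url)
    return dict(groups)
```

Grouping first keeps the later regular-expression generation tractable: each signature only has to generalise over URLs from one domain.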

2.5 Detection Based on Network Behaviour Analysis

Network Behaviour Analysis, or NBA, is a method that collects a wide range of information and statistics about network traffic. The information is analysed to detect any signs of threats or malicious activities. An NBA system consists of several components, including the sensors and the management servers (the analyser) (Scarfone & Mell, 2007; Timofte & Romania, 2007). The NBA system collects information such as IP addresses, operating system, and available services, as well as logging data such as timestamps, event types, network protocols, host ports, and additional packet header fields, for each client (Scarfone & Mell, 2007).
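The kind of per-client flow information listed above might be represented, in a minimal sketch, by a record like the following. The field names are illustrative, not the actual botAnalytics database schema.

```python
from dataclasses import dataclass

# Sketch of the per-flow record an NBA sensor might collect before
# shipping it to the analyser. Field names are illustrative assumptions.

@dataclass
class FlowRecord:
    timestamp: float      # when the flow was observed
    src_ip: str
    dst_ip: str
    dst_port: int         # e.g. 80 for HTTP
    protocol: str         # "TCP", "UDP", ...
    bytes_sent: int
    user_agent: str = ""  # extra HTTP header field, when available

def to_row(flow: FlowRecord):
    """Flatten a record into the tuple a server database could store."""
    return (flow.timestamp, flow.src_ip, flow.dst_ip,
            flow.dst_port, flow.protocol, flow.bytes_sent, flow.user_agent)
```

Keeping the HTTP-specific fields (such as User-Agent) alongside the generic flow fields is what later lets an analyser correlate flow-level periodicity with header-level anomalies.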


2.5.1 Why Choose Network Behaviour Analysis?

The Network Behaviour Analysis system was chosen for this research for two main reasons:

a) Ability to Detect Unknown Threats: Botmasters update their techniques day by day to hide their activities from existing detection methods (Lee, et al., 2007). The NBA system can thwart this strategy, as it can detect unknown (zero-day) threats. This feature can further improve Botnet detection (Derek, 2009; Scarfone & Mell, 2007).

b) Ability to Detect Encrypted Threats: Botnets try to hide their communication flows in normal web traffic (e.g. HTTP C&C) (Zeidanloo & Manaf, 2009) or use encryption methods (e.g. P2P C&C) (Ping, et al., 2007). NBA looks for abnormal flow patterns in network traffic, rather than inspecting the content of the information being transmitted (Rehak et al., 2009).

In addition, a benchmark report from the Aberdeen Group (Derek, 2009) pointed out that NBA methods produce good results when combined with other methods.

2.5.2 Existing Detection Methods Based on NBA

The Network Behaviour Analysis technique has been widely used by researchers for Botnet detection for many years.


Strayer, Walsh, Livadas, and Lapsley (2006) designed a system to detect IRC bots using five filters. Initially, IRC chat traffic is separated from the other types of traffic. The IRC traffic is then examined using five different filters to reduce the amount of irrelevant traffic flows. The first filter reduces the IRC traffic based on the assumption that bots use only TCP-based IRC flows. The remaining four filters further reduce the tracked IRC traffic by discarding, respectively: flows that contain only SYN and RST flags; high-bit-rate flows; flows whose average packet size is larger than expected; and short-duration flows. In the last filtering stage, the machine-learning technique proposed by Livadas, Walsh, Lapsley, and Strayer (2006) is applied. Finally, a five-dimensional correlation algorithm makes the final decision to detect IRC bots (Strayer, Walsh, Livadas, & Lapsley, 2006).

Gianvecchio, Xie, Wu, and Wang (2008) studied results from different measurements showing the difference between bot behaviour and human behaviour in IRC chat. They noticed differences between bots and humans with respect to the inter-message delay and the message size in Internet chat rooms. After analysing these two parameters, they proposed a system that uses entropy- and machine-learning-based classifiers to detect chat bots.

Mitsuaki et al. (2007) introduced three metrics for detecting bots: relationship style, response time, and synchronisation activities. Because the Botmasters are connected to the bots via Command and Control servers, they assume a 1-to-N relationship between the Botmaster and the bots in a Botnet, and use the structure of this relationship as the first metric.

They also observed that IRC chat bots respond faster than humans; hence, the response time is used as the second metric. Finally, they observed that the bots get their commands from the Botmaster, which means that the bots may perform abnormal activities in synchronisation with other bots in the same Botnet. This synchronisation activity is used as the third metric.

Wei et al. (2009) categorised services or applications using signature-based and decision-tree classifiers. They categorised network applications into IRC chat, P2P, and web applications. Then, focusing on each category, they use the response time and synchronisation activity metrics introduced by Mitsuaki et al. (2007) to differentiate bot activities from normal activities.

Guofei, Phillip, Vinod, Martin, and Wenke (2007) proposed BotHunter, which models five subsets of events that may occur during the infection process by bots. They set these subsets in different correlation engines that examine the traffic flows to look for any evidence of Botnet activities. BotSniffer (Gu, et al., 2008) and its extension BotMiner (Guofei, Roberto, Junjie, & Wenke, 2008) are Botnet detection systems that carry out their tasks by analysing the similarity in the abnormal or malicious activities generated by the bots of the same Botnet.

Jae-Seo et al. (2008) used a parameter based on one of the pre-defined characteristics of HTTP-based Botnets. As discussed earlier, HTTP bots periodically connect to a particular Command and Control server to get updates. The researchers suggested that there is a degree of periodic repeatability, or DPR, that shows the rate of periodic connections to certain servers. The value of DPR is used as a parameter to detect HTTP-based bots. The next section evaluates some of the methods used in past studies and compares them to the system developed in this research.

2.5.3 Evaluation and Comparison of Existing NBA Methods for Botnet Detection

A Botnet detection system, called botAnalytics, was developed in this research to detect HTTP-based Botnets. The reasons for choosing HTTP-based bots, and the Network Behaviour Analysis approach for the design of botAnalytics, have already been discussed. In this section, botAnalytics is compared with other methods from past studies that also used the NBA technique. Table 2-1 shows the comparison in brief.
Table 2-1: Comparison of Methods from Past Researches with botAnalytics

As shown in Table 2-1, all the methods are able to detect unknown (zero-day) bots. This ability is one of the main advantages of using the Network Behaviour Analysis system, as discussed earlier. botAnalytics was designed to detect HTTP-based Botnets; hence, it cannot be compared with the first five methods, which were designed to detect IRC-based Botnets.

Jae-Seo et al. (2008) proposed a system to detect only HTTP-based bots. In this method, normal applications can incorrectly be detected as bots, which can produce very high false-positive results. The methods proposed by Guofei et al. (2008) and Gu et al. (2008) were designed to detect all three types of bots - IRC-based, P2P, and HTTP-based. In general, their methods produce low false-positive results, but their sub-systems involved in detecting HTTP-based bots produce high false-positive results. This is because the proposed HTTP-based Botnet detection sub-systems have the same design as that proposed by Jae-Seo et al.

As discussed earlier, the technique proposed by Guofei et al. and its extension by Gu et al. are based on the similarity of the bots' group activities, and use data mining approaches. These techniques need a Botnet with a large number of bots to produce enough data for a reliable decision. For this reason, these methods are not effective against small-scale Botnets. Gu et al. (2008) proposed a method to detect small-scale Botnets, but this method has a direct relationship with the false-positive rate: if its effectiveness against small-scale Botnets increases, the false-positive ratio also increases.

The botAnalytics system developed in this research was aimed at overcoming the weaknesses of BotSniffer (Gu, et al., 2008) and BotHunter (Guofei, Phillip, Vinod, Martin, & Wenke, 2007). It can detect even a very small-scale Botnet that has only one bot. In addition, botAnalytics produces a very low false-positive rate, unlike the method developed by Jae-Seo et al. (2008).

2.6 Conclusion

There are three types of Botnet, based on the way their bots communicate with each other. IRC-based and HTTP-based Botnets are called centralised, and P2P Botnets are called decentralised. HTTP-based bots are the latest generation of Botnets; they hide their activity within normal HTTP traffic. HTTP-based Botnets have a set of characteristics that make their detection difficult compared to IRC and P2P Botnets. Several methods and techniques have been used by researchers to track and detect Botnet activities, but there has been less research on HTTP-based Botnet detection than on detection methods for IRC-based and P2P Botnets. The ability of the NBA system to detect unknown and encrypted threats made it the preferred approach for modeling botAnalytics. The next chapter discusses the process of modeling a detection system based on the NBA architecture.


Chapter 3: Modeling of Detection System


3.1 Introduction

This chapter describes the method adopted to carry out the research on modeling a new system for detecting HTTP-based Botnets. As described in the literature review, this research proposes a detection method that uses the network behaviour analysis (NBA) architecture (Derek, 2009; Scarfone & Mell, 2007). The proposed method uses the NBA architecture to collect a wide range of information and statistics about particular network traffic. The collected information is then analysed to search for any signs of bot and Botnet activities.

3.2 Proposed Method Architecture

There are three layers in the proposed method architecture - the data collecting platform, the data storing platform, and the data analysing platform. Based on the NBA structure, the proposed method consists of several components, including the software sensors and the management server (Analyser) (Scarfone & Mell, 2007; Timofte & Romania, 2007). Figure 3-1 shows the schema of the proposed method architecture.

Figure 3-1: botAnalytics System Architecture


3.2.1 Data Collecting Platform

The data collecting platform consists of a set of software sensors installed on each client in a particular network. The main task of the data collecting platform is to collect data on the HTTP traffic of each client and to store the data in the database. This platform also uses a set of filters and other techniques to separate out data on unwanted traffic.

3.2.2 Data Analysing Platform

The data collected by the data collecting platform are analysed by the data analysing platform to detect suspicious activities associated with a bot or Botnet. A set of filters and techniques are used by this platform to make the analysis process as accurate as possible.

3.2.3 Data Storing Platform

The data storing platform is the place where the collected data are kept before and after the analysis process. All the results are saved in the database to maintain the history of the system performance.

3.3 Data Reduction Filters

In addition to sniffing network traffic, the proposed data collecting platform applies two filters to the collected data to filter out useless data and reduce the amount of unwanted traffic.


3.3.1 HTTP Traffic Separator Filter

The HTTP Traffic Separator (H.T.S.) filter was designed to separate the HTTP traffic from other types of traffic in the network. botAnalytics was designed to detect HTTP-based Botnets; as mentioned in section 2.2.4, HTTP-based Botnets use HTTP traffic, hence data on other types of network traffic are not collected. Figure 3-2 shows the flowchart of this filter.

Figure 3-2: The flowchart of H.T.S. filter

3.3.2 Get and Post Separator Filter

The Get and Post Separator (G.P.S.) filter was designed to select only the HTTP traffic with GET or POST methods. HTTP-based bots use the GET or POST methods to contact their Command and Control server; thus, the other methods provide no information about bot activities (Joe, 2004; Naseem, et al., 2010; Nazario, 2007). Therefore, the G.P.S. filter focuses on the HTTP methods, and only selects the HTTP traffic with the GET and POST methods. Figure 3-3 shows the flowchart of this filter.


Figure 3-3: The flowchart of G.P.S. filter
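The logic of the two data-reduction filters can be illustrated with the following Python sketch. This is an illustration only, not the Delphi implementation used in botAnalytics; the packet_type and method dictionary keys are assumed stand-ins for the fields provided by the traffic sniffer.

```python
# Illustrative sketch of the H.T.S. and G.P.S. filters. Each sniffed
# packet is represented here as a dict; the field names are assumptions.

def hts_filter(packets):
    """H.T.S.: keep only HTTP traffic, discarding all other packet types."""
    return [p for p in packets if p.get("packet_type") in ("http", "wwwhttp")]

def gps_filter(packets):
    """G.P.S.: keep only HTTP packets that use the GET or POST method."""
    return [p for p in packets if p.get("method") in ("GET", "POST")]

sniffed = [
    {"packet_type": "dns", "method": None},
    {"packet_type": "http", "method": "GET"},
    {"packet_type": "http", "method": "HEAD"},
    {"packet_type": "wwwhttp", "method": "POST"},
]
selected = gps_filter(hts_filter(sniffed))
print(len(selected))  # 2
```

Applied in sequence, the two filters leave only the GET/POST HTTP packets that the later analysis stages operate on.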

3.4 VOU Mechanism

The VOU (Validation of User-Agents) mechanism was designed based on a unique algorithm, and is used for the first time in this research in the data collecting platform. This mechanism assigns an appropriate value to the VOU field of each collected HTTP traffic packet. The VOU mechanism acts on each collected packet of HTTP traffic with the GET or POST methods, and obtains the User-Agent from the collected traffic header. In the next step, the VOU tries to identify the User-Agent string and its corresponding application from the installed application list. The installed application list contains the applications and services available on each client within the network, together with their corresponding User-Agents. This list can be updated by users, or automatically from websites such as www.user-agents.org. Figure 3-4 shows the flowchart of the VOU mechanism.


Figure 3-4: The VOU Module Flowchart

For each collected HTTP packet, the VOU field is updated with one of three values, based on the following conditions:

1) UNKNOWN value: If the VOU mechanism is not able to determine the User-Agent for any reason, for example due to encryption or the use of a fake User-Agent, the VOU field of the collected traffic will be given the UNKNOWN value. If the VOU mechanism is able to determine the User-Agent but is not able to identify the corresponding application, the VOU field will also be given the UNKNOWN value.

2) VALID value: The VOU field will be set to the VALID value if the User-Agent and its corresponding application have been identified, and the corresponding application is installed on the client and available at the same time.

3) NOTVALID value: If the User-Agent and its corresponding application have been identified but the corresponding application is not available on the client, the VOU field will be given the NOTVALID value.
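The decision logic above can be sketched as follows. This is an illustrative Python sketch, not the author's Delphi code; the known_agents mapping and the installed set are assumed stand-ins for the installed application list described in section 3.4.

```python
# Illustrative sketch of the VOU decision logic. known_agents maps
# User-Agent strings to application names; installed is the set of
# applications present on the client. Both structures are assumptions.

def vou_value(user_agent, known_agents, installed):
    if not user_agent or user_agent not in known_agents:
        return "UNKNOWN"   # missing, fake, or unrecognised User-Agent
    app = known_agents[user_agent]
    # Application identified: VALID only if it is installed on the client.
    return "VALID" if app in installed else "NOTVALID"

known = {"Mozilla/5.0 Firefox/3.6": "Mozilla Firefox"}
installed = {"Mozilla Firefox"}
print(vou_value("Mozilla/5.0 Firefox/3.6", known, installed))  # VALID
print(vou_value("EvilBot/1.0", known, installed))              # UNKNOWN
```

A User-Agent whose application is known but absent from the client would yield NOTVALID, the strongest indicator of a forged header.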

3.5 Analysing the Collected Traffic

The data collecting platform periodically sniffs the network traffic and applies the H.T.S. and G.P.S. filters to select only HTTP traffic using the GET or POST method. In addition to these filters, the VOU mechanism is applied to the collected data as described in section 3.4. When a reasonable number of packets have been collected and stored in the data storing platform, the Analyser begins its work in the data analysing platform as follows:

3.5.1 Grouping and Sorting

The Grouping and Sorting (G.A.S.) process sorts data on the collected traffic and divides them into different groups based on the source IP address (SIP), destination IP address (DIP), URL, and the User-Agent string (UA).

While most previous studies use only the source IP, destination IP, and domain name to divide the collected traffic packets into groups, the proposed method adds one of the HTTP header fields, the User-Agent, as a further parameter to make the classification of the collected network packets more accurate. After the G.A.S. process has categorised the traffic packets into groups, three different filters are applied to each group of packets to search for signs of suspicious activities and the presence of HTTP bots.
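The grouping step can be illustrated with a short Python sketch. This is an illustration under assumed field names, not the author's Delphi implementation; the four-part group key follows the G.A.S. description above.

```python
# Illustrative sketch of the G.A.S. step: packets are sorted by time and
# grouped by the key (source IP, destination IP, URL, User-Agent string).
# The dict field names are assumptions for illustration.
from collections import defaultdict

def group_and_sort(packets):
    groups = defaultdict(list)
    for p in sorted(packets, key=lambda p: p["time"]):
        key = (p["sip"], p["dip"], p["url"], p["ua"])
        groups[key].append(p)
    return groups

packets = [
    {"time": 2, "sip": "10.0.0.5", "dip": "1.2.3.4", "url": "/cmd", "ua": "X"},
    {"time": 1, "sip": "10.0.0.5", "dip": "1.2.3.4", "url": "/cmd", "ua": "X"},
    {"time": 3, "sip": "10.0.0.5", "dip": "1.2.3.4", "url": "/img", "ua": "X"},
]
groups = group_and_sort(packets)
print(len(groups))  # 2
```

Including the User-Agent in the key means that two applications on the same client talking to the same server still fall into separate groups, which is the accuracy gain argued for above.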

3.5.2 High Access Rate Filter

The High Access Rate (H.A.R.) filter eliminates groups of similar HTTP connections or requests that have been generated at a very high rate, for example more than one request per second. Figure 3-5 shows the H.A.R. filter flowchart.


Figure 3-5: Flowchart of H.A.R. Filter

3.5.3 Low Access Rate Filter

The Low Access Rate (L.A.R.) filter removes groups of HTTP traffic with fewer than two request packets over the whole data collecting period. For example, if a group of HTTP traffic is generated only within a very short part of the data collecting period, it will be removed by this filter. Figure 3-6 shows the L.A.R. filter flowchart.


Figure 3-6: Flowchart of L.A.R. Filter

3.5.4 Periodic Access Rate Filter

The Periodic Access Rate (P.A.R.) filter selects the HTTP connections or requests that were generated at periodic intervals. This filter was designed based on the nature of HTTP-based Botnets: as noted in the literature review, HTTP bots connect to their Command and Control server periodically to get commands or updates. Figure 3-7 shows the P.A.R. filter flowchart.


Figure 3-7: P.A.R. Filter Flowchart
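The three access-rate filters can be sketched together in Python. This is an illustrative sketch only, not the Delphi implementation; the one-second and 10% tolerance thresholds are assumptions chosen for the example, not values fixed by the thesis.

```python
# Illustrative sketch of the H.A.R., L.A.R., and P.A.R. filters applied to
# one traffic group, represented by its sorted request timestamps (seconds).

def har_drop(times):
    """H.A.R.: drop groups generated faster than one request per second."""
    gaps = [b - a for a, b in zip(times, times[1:])]
    return bool(gaps) and min(gaps) < 1.0

def lar_drop(times):
    """L.A.R.: drop groups with fewer than two requests in the period."""
    return len(times) < 2

def par_keep(times, tolerance=0.1):
    """P.A.R.: keep groups whose inter-request intervals are roughly equal."""
    gaps = [b - a for a, b in zip(times, times[1:])]
    if len(gaps) < 2:
        return False
    mean = sum(gaps) / len(gaps)
    return all(abs(g - mean) <= tolerance * mean for g in gaps)

periodic = [0, 300, 600, 900]  # e.g. a bot polling its C&C every 5 minutes
print(not har_drop(periodic) and not lar_drop(periodic) and par_keep(periodic))  # True
```

A group that survives H.A.R. and L.A.R. and is flagged by P.A.R. is exactly the periodic, moderate-rate pattern the thesis associates with HTTP bot polling.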

3.6 LODA Mechanism

The LODA (Level of Danger Analysing) mechanism is designed to analyse the detected suspicious traffic and define its level of danger. Figure 3-8 shows the flowchart of the LODA analysing algorithm. For every suspicious activity detected, the analysis process starts by examining the VOU field value, which was set by the VOU mechanism. If the value of the VOU field of a particular group of suspicious traffic is VALID, the level of danger field for that group will be set to LOW. If the VOU value is NOTVALID, the level of danger will be set to HIGH, and if the VOU value is UNKNOWN, the next step of the analysis will start.


Figure 3-8: LODA Module Flowchart

If the value of the VOU field is UNKNOWN, a query is sent to the database to retrieve the count of similar traffic groups generated by other clients in the network. The result is compared with the limit value set by the system administrators. If the count is greater than the limit value, the level of danger will be set to HIGH. If the count is less than the limit, another query will be submitted to retrieve the count of similar traffic groups from the client's own history. In this case, if the count is greater than the limit value, the level of danger will be set to HIGH, and if the count is less than the limit value, the level of danger will be set to LOW.
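The LODA decision chain can be condensed into a short sketch. This is an illustration, not the author's Delphi code; the two database count queries are stubbed out as plain integers, and limit stands for the administrator-defined threshold.

```python
# Illustrative sketch of the LODA decision chain. network_count and
# history_count stand in for the two database queries described above.

def loda(vou_value, network_count, history_count, limit):
    if vou_value == "VALID":
        return "LOW"
    if vou_value == "NOTVALID":
        return "HIGH"
    # UNKNOWN: judge by how widespread the same traffic group is.
    if network_count > limit:    # seen on many other clients in the network
        return "HIGH"
    if history_count > limit:    # seen repeatedly in this client's history
        return "HIGH"
    return "LOW"

print(loda("UNKNOWN", network_count=12, history_count=0, limit=5))  # HIGH
print(loda("VALID", network_count=0, history_count=0, limit=5))     # LOW
```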

3.7 Proposed Method Flowchart

The various filters and mechanisms designed in this research have been discussed in detail in the previous sections. Figure 3-9 shows the flowchart of the proposed method, with all the filters and mechanisms combined. As described in the proposed method architecture section, botAnalytics contains software sensors, which are installed on the network clients and form the data collecting platform. The data collecting platform periodically sniffs the network traffic and applies the H.T.S. and G.P.S. filters to select only HTTP traffic using the GET or POST method. The system then sets a collecting time and date, and collects the traffic information, which includes the source IP address, destination IP address, URL, User-Agent string value, and HTTP method. In the final step, the VOU mechanism is applied to the collected traffic data, and the results are stored in the data storing platform.


Figure 3-9: The Proposed Method Flowchart

When a reasonable number of packets have been collected and stored in the data storing platform, the Analyser begins its work in the data analysing platform by collecting a set of packets from the data storing platform and applying the G.A.S. filter to the packets. The G.A.S. filter sorts the traffic packets into different groups. Next, the H.A.R., L.A.R., and P.A.R. filters and the LODA mechanism are applied to each group of packets to search for signs of suspicious activities or the presence of bots.


3.8 Conclusion

This chapter discussed the modeling of a system to detect HTTP-based Botnets. The proposed model was designed based on the NBA architecture and consists of three platforms - data collecting, data storing, and data analysing. These platforms interact with each other to analyse the collected traffic data. The different filters - G.A.S., H.A.R., L.A.R., and P.A.R. - were designed to examine the unanalysed data and detect any suspicious activities. P.A.R. was designed based on the nature of HTTP-based bots, which periodically connect to their C&C servers. H.A.R. and L.A.R. were designed based on observations related to HTTP-based activities. In addition to Botnet detection, the VOU and LODA mechanisms were designed to analyse the level of danger of suspicious activities. The proposed model was implemented, and the implementation process is discussed in the next chapter.


Chapter 4: Implementation of Proposed Model


4.1 Introduction

The previous chapter discussed the techniques and methods of this research. This chapter discusses the implementation of the proposed model, focusing on the process of realising botAnalytics and executing it according to its main concepts and plans.

4.2 DELPHI Programming Language

To choose a suitable programming language to develop botAnalytics, two types of programming languages were reviewed: .NET and Java, which are managed-code languages, and C++ and DELPHI, which are native-code languages. In a managed-code language, the code is executed under the management of a virtual machine: C# code runs under the control of the .NET Framework Common Language Runtime, and Java code execution is controlled by the Java Virtual Machine (Gough, 2005). On the other hand, the native-code programming languages produce an executable file that can be run independently (Gileadi, Ford, Moerman, & Purba, 2007). The managed-code programming languages, which use virtual machines to control and execute the code, give lower performance compared to native-code programming languages such as C++ and DELPHI (Gileadi, et al., 2007).


The executable code produced by the native-code programming languages is optimised by the compiler to provide better execution time and speed, and can run independently without any additional requirements such as virtual machines (Gileadi, et al., 2007). Table 4-1 shows a comparison of managed-code and native-code programming languages.
Table 4-1: Comparison of Managed-code with Native-code Languages

botAnalytics was designed to have a good Graphical User Interface (GUI). C++ is a high-performance programming language, but it is not well suited to developing the user interface because of the complex coding involved. On the other hand, DELPHI has a form designer that provides a high-quality environment for visual development (Cantu, 2003). It also has a number of Visual Component Libraries (VCL) that greatly help in developing a system in a shorter time (Teixeira & Pacheco, 2001). In view of DELPHI's good features and its ability to meet the requirements of botAnalytics - a user-friendly GUI, high performance, and fast execution time - it was selected as the programming language to develop botAnalytics.


4.3 Client Side Implementation

The botAnalytics client side is the implementation of the data collecting platform (software sensors) described in chapter 3. The function of the client side is to collect the data on network traffic, apply the H.T.S. and G.P.S. filters and the VOU mechanism to the data, and store the results in the database (data storing platform), which resides on the server. Figure 4-1 shows the graphical user interface of the client side.

Figure 4-1: botAnalytics Client Side GUI

4.3.1 Settings

The settings tab on the client side consists of two parts - network configuration, and the installed application list. Figure 4-2 shows the settings GUI.


Figure 4-2: Setting GUI

a) Network configuration: The IP address of the server and the database can be configured here. A connection test is also provided to check the connectivity between the sensors in the data collecting platform and the database on the server.

b) Installed application list: From the settings GUI screen, all applications that have been installed on the client can be retrieved. The list of installed applications can be found in the HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall section of the Windows registry. TRegistry, a standard Delphi class, is used by the settings section to read information from the Windows registry.


4.3.2 Sniffing the Traffic

Magmonsock (Magmonsock, 2010) is a set of free Delphi components designed to sniff Internet traffic using the WinPcap network driver. WinPcap is a standard tool used by botAnalytics to capture information on the network traffic. The WinPcap library makes the low-level network layers accessible: the operating system is extended by the WinPcap driver to gain access to these layers. WinPcap has a high-performance architecture for network packet monitoring on the Windows operating system (Risso & Degioanni, 2001). These free components were used by botAnalytics to implement the traffic sniffer. Figure 4-3 shows the traffic sniffer GUI.

Figure 4-3: Traffic Sniffer GUI

The information on the sniffed traffic gathered by the traffic sniffer includes the time, port type, source and destination IP addresses, sniffed packet type, data length, and other information.

4.3.3 H.T.S. Filter

The H.T.S. filter is applied to the data of the sniffed traffic. It filters out all types of traffic except HTTP traffic. This filter examines the packet type value, which is provided by the traffic sniffer module, and only selects the traffic if the value is wwwhttp or http. In addition, this filter removes the TCP flags. Figure 4-4 shows the GUI of the H.T.S. filter.

Figure 4-4 : H.T.S Filter GUI

The information of the collected packets includes the time, protocol, data length, source and destination IP addresses, method (e.g. GET, POST), User-Agent string, host, and the URL.

4.3.4 G.P.S. Filter

The G.P.S. filter examines the method of the HTTP traffic, and only selects traffic with the GET or POST method. Figure 4-5 shows the GUI of the G.P.S. filter.


Figure 4-5 : G.P.S Filter GUI

4.3.5 VOU Mechanism

The VOU mechanism is one of the important algorithms introduced for the first time in botAnalytics. For each collected packet, the VOU module obtains the User-Agent string and sends a query to the installed application list described in section 3.4 to retrieve the User-Agent's corresponding application. Figure 4-6 shows the GUI of the VOU module.

Figure 4-6 : VOU Mechanism GUI


If the result of the query is null, the system is unable to find the name of any application corresponding to the User-Agent string, and the VOU sets the VOU field to UNKNOWN. If the query returns the name of an application corresponding to the User-Agent string, the VOU refers to the installed application list in the Settings section to verify whether the application is installed on the client. If the application name exists in the list, the VOU sets the VOU field to VALID; otherwise, it sets the VOU field to NOTVALID.

Figure 4-6 indicates that the User-Agent belongs to the Mozilla Firefox internet browser, which is installed on the client. Therefore, the VOU module sets the value VALID to the VOU field. Figure 4-7 shows the pseudo code of the VOU mechanism.

Figure 4-7 : VOU Pseudo Code


4.4 Database Implementation

ActiveX Data Objects (ADO) is provided by Microsoft to access relational databases. The Delphi ADO components belong to a standard Delphi component set, which makes ADO available to Delphi by encapsulating its functionality. botAnalytics uses the Delphi ADO components to gain access to the database, and to control the process of retrieving data from, and storing data into, the database.

4.4.1 Microsoft SQL Server 2008

Three different database management systems (DBMS) - MySQL, Oracle 11g, and Microsoft SQL Server 2008 - were reviewed with respect to their scalability, security, and performance, in order to select a suitable DBMS for botAnalytics. Table 4-2 shows a comparison of the three DBMSs.
Table 4-2 : Comparison of DBMSs

Microsoft SQL Server 2008 and Oracle 11g both support large databases, multiple CPUs, and large memory sizes; thus, they can support a large number of users and are highly scalable (Microsoft.com, 2010; Oracle.com, 2010). MySQL has good features of its own, such as MySQL Cluster, but has a lower level of scalability compared to the other two DBMSs (Mysql.com, 2010).


As noted earlier, performance is one of the strengths of botAnalytics. In terms of performance, MySQL cannot be compared to Microsoft SQL Server or Oracle, because of the database sizes and levels of scalability involved. However, with a small database, MySQL shows reasonable performance compared with the other two DBMSs (Erickson, 2009). Both Microsoft SQL Server and Oracle have strong security features, but based on the National Vulnerability Database (NIST) reports, Oracle databases consistently have more security vulnerabilities than Microsoft SQL Server (Microsoft, 2008). MySQL also provides a high level of security, but the level is still lower than that of Microsoft SQL Server 2008 and Oracle 11g (Erickson, 2009). Comparing Microsoft SQL Server 2008 with Oracle 11g, SQL Server 2008 has new features that directly enhance its performance (Microsoft, 2008). Table 4-3 shows a comparison of the two DBMSs.
Table 4-3 : Microsoft Sql Server 2008 Extra New Features


Based on the three important parameters - scalability, security, and performance - Microsoft SQL Server 2008 was selected as a suitable Database Management System for botAnalytics.

4.4.2 Table Structures

Thirteen different tables were designed to store and manage the data in botAnalytics. The structures of these tables are shown below:

a) tblUser (System User Table) This table stores the information of the system users. Table 4-4 shows the tblUser structure.
Table 4-4 : tblUser Structure

Field Name | Field Type | Description
UserCode | Integer | Primary Key
UserName | Varchar(30) | User Name
Password | Varchar(20) | User Password
RoleCode | Integer | User Role (Foreign Key from tblRole)
FirstName | Varchar(20) | User First Name
LastName | Varchar(30) | User Last Name
Address | Varchar(255) | User Address
Email | Varchar(20) | User Email Address
Phone | Varchar(15) | User Phone Number
HP | Varchar(15) | User Hand Phone Number
Status | Integer | User Status (Active / Inactive)
QCode | Integer | Security Question (Foreign Key from tblSquestion)
Answer | Varchar(100) | Security Question Answer


b) tblSquestion (Security Questions Table) This table stores the user's security question, which is used in the event the user forgets his/her password. Table 4-5 shows the structure of tblSquestion.
Table 4-5: tblSquestion Structure

Field Name | Field Type | Description
QCode | Integer | Primary Key
Question | Varchar(100) | Security Question

c) tblRole This table stores the types of users - Administrator or Operator. Table 4-6 shows the structure of tblRole.
Table 4-6: tblRole Structure

Field Name | Field Type | Description
RoleCode | Integer | Primary Key
RoleName | Varchar(20) | Type of User (Administrator / Operator)

d) tblUserAgent (User-Agent Table) This table stores the User-Agent string values and the names of their corresponding applications. Table 4-7 shows the tblUserAgent structure.
Table 4-7: tblUserAgent Structure

Field Name | Field Type | Description
UACode | Integer | Primary Key
UAString | Varchar(255) | User-Agent String Value
UACA | Varchar(30) | User-Agent Corresponding Application
UADes | Varchar(255) | User-Agent Description
UserCode | Integer | To Determine Which User has Added this Item to the User-Agent List (Foreign Key from tblUser)


e) tblWhiteList (White List Table) The trusted User-Agents, IP addresses, and URLs are kept in this table. Table 4-8 shows the structure of tblWhiteList.
Table 4-8: tblWhiteList Structure

Field Name | Field Type | Description
WCode | Integer | Primary Key
UAString | Varchar(255) | User-Agent String Value
IP | Varchar(30) | IP Address
URL | Varchar(255) | URL
UserCode | Integer | To Determine Which User has Added this Item to the White List (Foreign Key from tblUser)

f) tblBlackList (Black List Table) This table stores the User-Agents, IP addresses, and URLs that have been blocked because of suspicious activities or malicious attacks associated with them. Table 4-9 shows the tblBlackList structure.
Table 4-9: tblBlackList Structure

Field Name | Field Type | Description
BCode | Integer | Primary Key
UAString | Varchar(255) | User-Agent String Value
IP | Varchar(30) | IP Address
URL | Varchar(255) | URL
UserCode | Integer | To Determine Which User has Added this Item to the Black List (Foreign Key from tblUser)

g) tblClientsInfo (System Clients Information) This table stores all the information about the clients. Table 4-10 shows the tblClientsInfo structure.


Table 4-10: tblClientsInfo Structure

Field Name | Field Type | Description
CLCode | Integer | Primary Key
SIP | Varchar(15) | Source IP Address
SMA | Varchar(17) | Source MAC Address
JDate | Date | The Date the Client Joined the System

h) tblVOU (VOU Module Results Table) This table stores the data that were processed by the VOU module. The data kept in this table will be analysed by the LODA module. Table 4-11 shows the tblVOU structure.
Table 4-11: tblVOU Structure

Field Name | Field Type | Description
VOUCode | Integer | Primary Key
DateTime | Date | Date and Time the Packet was Collected
TCode | Integer | HTTP Method Type (Foreign Key from tblHMType)
CLCode | Integer | Client Info (Foreign Key from tblClientsInfo)
VCode | Integer | VOU Value (Foreign Key from tblVouValue)
DataLength | Integer | Length of Packet
DIP | Varchar(15) | Destination IP Address
HOST | Varchar(255) | Destination HOST
URL | Varchar(15) | Destination URL
UserAgent | Varchar(2000) | User-Agent String Value
Protocol | Varchar(50) | Packet Protocol

i) tblHMType (HTTP Method Type Table) According to RFC 2616 (Fielding, et al., 1999), the different methods of HTTP packets are OPTIONS, GET, HEAD, POST, PUT, DELETE, TRACE, and CONNECT. The tblHMType table keeps these methods in three categories - GET, POST, and Other. Table 4-12 shows the structure of this table.
Table 4-12: Structure of tblHMType Table

Field Name | Field Type | Description
TCode | Integer | Primary Key
MethodType | Varchar(10) | GET / POST / Other

j) tblVouValue (VOU Field Values Table) The VOU module may set the VCode field of the tblVOU table to any of three values - VALID, NOTVALID, and UNKNOWN. These values are kept in the tblVouValue table. Table 4-13 shows the structure of the tblVouValue table.
Table 4-13: Structure of tblVouValue Table

Field Name | Field Type | Description
VCode | Integer | Primary Key
VouValue | Varchar(8) | VALID / NOTVALID / UNKNOWN

k) tblResult (Analyse Module Results Table) The final analysed results produced by the Analyse module are stored in this table. Table 4-14 shows the structure of the tblResult table.
Table 4-14: Structure of tblResult Table

Field Name | Field Type | Description
RCode | Integer | Primary Key
CLCode | Integer | Client Info (Foreign Key from tblClientsInfo)
DIP | Varchar(15) | Destination IP Address
URL | Varchar(2000) | URL
LCode | Integer | LOD Value (Foreign Key from tblLodValue)
UAString | Varchar(2000) | User-Agent String
Count | Integer | Count (Increased by Adding the Same Item)
FDate | Date | The First Date that This Item was Added
LDate | Date | The Last Date when the Count Field was Updated


e) tblLodValue (LOD Field Value Table) The LODA module catagorises the suspicious activities into two types -high or low level of danger. The values of these two types are kept in the tblLodValue table. Table 4-15 shows the structure of the tblVouValue table.
Table 4-15: Structure of tblLodValue Table

Field Name   Field Type   Description
LCode        Integer      Primary Key
LodValue     Varchar(4)   High / Low

f) tblNotification (Notifications Table) The notifications generated by the system are stored in this table. Table 4-16 shows the structure of the tblNotification table.
Table 4-16: Structure of the tblNotification Table

Field Name   Field Type      Description
NCode        Integer         Primary Key
CLCode       Varchar(30)     Client that Generated this Notification (Foreign Key from tblClientsInfo)
NString      Varchar(2000)   Notification String
NDate        Date            Notification Date


4.4.3 Tables relationship
The structure of all the database tables in botAnalytics was presented in the previous section. Figure 4-8 shows the relationships between these tables in the database.

Figure 4-8: botAnalytics Database: Relationship between the Tables

4.5 Server Side Implementation The botAnalytics server implements the data analysing platform described in the botAnalytics architecture in the previous chapter. The main function of the server is to analyse the data collected by the clients to search for any evidence of HTTP-based Botnet activities. Figure 4-9 shows the general view of the GUI of the botAnalytics server.


Figure 4-9: botAnalytics Server Side GUI

4.5.1 General info
The General Info section provides information about the botAnalytics system. The information provided includes the users who are currently logged into the system, the time and date, and different types of information about the packets collected by the clients (sensors). Figure 4-10 shows the GUI of the General Info module. a) It shows the user who is currently logged into the system.

b) The standard Delphi TDateTime type is used in this screen to get the current system date and time.


Figure 4-10: General Info GUI

c) From this screen, a query is made to the database to display the percentage of collected packets using the GET and POST methods. Figure 4-11 shows the pseudo code of this query.

Figure 4-11: GET and POST Percentage Query Pseudo Code
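The query behind this figure can be sketched as follows. This is a hedged illustration only - the single-column tblVOU layout is a simplification of the real schema (where the method category is resolved through tblHMType), and the thesis implementation is in Delphi, not Python:

```python
import sqlite3

# Illustrative sketch of the GET/POST percentage query. The simplified
# tblVOU table used here holds the resolved method category directly.
def get_post_percentages(conn):
    cur = conn.cursor()
    total = cur.execute("SELECT COUNT(*) FROM tblVOU").fetchone()[0]
    if total == 0:
        return 0.0, 0.0
    get_n = cur.execute(
        "SELECT COUNT(*) FROM tblVOU WHERE MethodType = 'GET'").fetchone()[0]
    post_n = cur.execute(
        "SELECT COUNT(*) FROM tblVOU WHERE MethodType = 'POST'").fetchone()[0]
    return 100.0 * get_n / total, 100.0 * post_n / total
```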

d) This screen also shows information about the collected packets that have not yet been analysed. Figure 4-12 shows the pseudo code of this query.

Figure 4-12: Collected Traffic Statistics Query Pseudo Code

e) This screen also informs the user whether the analysis process is recommended, based on the duration and the amount of packets that have been collected by the sensors.

4.5.2 Analyse The analyse section is the main part of botAnalytics on the server. The G.A.S., H.A.R., L.A.R., and P.A.R. filters, and the LODA mechanism, are applied to the collected packets to search for any signs of suspicious activity.

a) Primary Data This tab shows all the data that have not been analysed yet. Figure 4-13 shows the primary data GUI.


Figure 4-13: Primary Data Tab of the Analyse Section

When the Start Analyse button is clicked, the system applies the G.A.S. filter to group the packets based on the packets' source and destination IP addresses, User-Agent, host, and URL. All the packets that are in the white list or the black list are deleted. Figure 4-14 shows the black/white listing tab.

As shown in Figure 4-14, below this tab is the statistical information, which shows the amount and the percentage of data that have been deleted based on the black and white listing. By clicking on the Next button, the H.A.R. filter is applied to the remaining data, and the results are shown in the H.A.R. results tab.


Figure 4-14: Black/White listing Tab of the Analyse Section

b) H.A.R. Filter botAnalytics applies the H.A.R. filter on groups of packets to delete all the packet groups that accessed the destination at a high rate (more than one access in one second). Figure 4-15 shows the GUI of the H.A.R. result tab.


Figure 4-15: H.A.R. Result Tab of the Analyse Section

As shown in Figure 4-15, below this tab is the statistical information, which shows the amount and the percentage of packets that have been deleted by the H.A.R. filter. Figure 4-16 shows the pseudo code of the H.A.R. filter.

Figure 4-16: H.A.R. Filter Pseudo Code

The H.A.R. filter deletes all packet groups that have made more than one access in one second to the destination. By clicking on the Next button, the L.A.R. filter is applied to the remaining packets and the result will be shown in the L.A.R. result tab.
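The H.A.R. rule can be sketched as follows. This is an illustrative Python sketch, not the thesis's Delphi code; the function name is hypothetical and timestamps are assumed to be in seconds:

```python
# Hedged sketch of the H.A.R. (high access rate) rule: a packet group is
# discarded as soon as any two of its accesses fall within one second.
def har_keep(timestamps):
    """Return True if the packet group survives the H.A.R. filter."""
    ts = sorted(timestamps)
    for earlier, later in zip(ts, ts[1:]):
        if later - earlier < 1.0:   # more than one access in one second
            return False
    return True
```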

c) L.A.R. Filter The L.A.R. filter deletes groups of packets with a low rate of access (not more than one access per hour) to the destination. Figure 4-17 shows the GUI of the L.A.R. result tab.

Figure 4-17: L.A.R. Result Tab of the Analyse Section

The L.A.R. filter deletes all packet groups that have made no more than one access per hour to the destination. Figure 4-18 shows the pseudo code of the L.A.R. filter.

Figure 4-18: L.A.R. Filter Pseudo Code
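The L.A.R. rule can be sketched as follows (an illustrative Python sketch under the assumption that the rate is judged over the whole collecting period; the name is hypothetical and the thesis implementation is in Delphi):

```python
# Hedged sketch of the L.A.R. (low access rate) rule: a group survives only
# if it accessed the destination more than once per hour of collecting time.
def lar_keep(timestamps, period_hours):
    """Return True if the packet group survives the L.A.R. filter."""
    return len(timestamps) > period_hours
```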


d) P.A.R. Filter: The P.A.R. filter selects groups of packets that periodically access the destination, and deletes all the remaining groups of packets. In this step, the selected packets indicate the presence of suspicious activities. Figure 4-19 shows the GUI of the P.A.R. result tab.

Figure 4-19: P.A.R. Result Tab of the Analyse Section

The P.A.R. filter calculates the total packet collecting time and divides it into equal intervals. Then, it selects only the packet groups that appear in all the intervals. The remaining packet groups are considered to be making non-periodic accesses to the destination, and are deleted by this filter. Figure 4-20 shows the pseudo code of the P.A.R. filter.

71

Figure 4-20: P.A.R. Filter Pseudo Code
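The interval test can be sketched as follows. This is a hedged illustration only: the interval count of 10 is an assumption (the thesis does not state it here), and the function name is hypothetical:

```python
# Hedged sketch of the P.A.R. (periodic access rate) rule: the collecting
# time is split into equal intervals, and a group is kept only when it has
# at least one access in every interval.
def par_keep(timestamps, start, end, n_intervals=10):
    """Return True if the packet group survives the P.A.R. filter."""
    width = (end - start) / n_intervals
    hit = set()
    for t in timestamps:
        idx = min(int((t - start) / width), n_intervals - 1)
        hit.add(idx)
    return len(hit) == n_intervals
```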

e) LODA Mechanism: The application of the LODA module is the last step in the analyse section. The module is used to assess the level of danger of the detected suspicious activities. Figure 4-21 shows the LODA pseudo code.

Figure 4-21: LODA Module Pseudo Code

For each suspicious activity detected by the system, the LODA module checks the VOU field. If the VOU field has a VALID value, LODA sets the level of danger to LOW, and if the value of the VOU field is NOTVALID, the LODA module sets the level of danger to HIGH. If the VOU field has the value UNKNOWN, a simple query is submitted to the database to retrieve the count of similar packet activity. If the count is not null, it means that the same packet activity had been detected by the system previously; therefore, the level of danger is set to HIGH. If the count is null, the level of danger is set to LOW. Figure 4-22 shows the GUI of the LODA module result.
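These decision rules can be sketched as follows (an illustrative Python sketch of the logic only; the thesis implementation is in Delphi, and `history_count` stands in for the result of the database query, with None representing a null result):

```python
# Hedged sketch of the LODA decision rules: VALID -> LOW, NOTVALID -> HIGH,
# UNKNOWN -> HIGH only if the same activity was recorded before.
def loda_level(vou_value, history_count=None):
    if vou_value == "VALID":
        return "LOW"
    if vou_value == "NOTVALID":
        return "HIGH"
    # UNKNOWN: consult the stored history of similar packet activity
    return "HIGH" if history_count is not None else "LOW"
```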

Figure 4-22: LODA Module Result GUI

4.5.3 Notifications The Notification section interacts with the User-Agent, white, and black lists in the database, and generates appropriate notifications based on any changes to the lists. For example, if an item is added to the black list or the white list, a notification is generated to inform the users that the item has been added, and to identify the user who added it. Figure 4-23 shows the notification GUI.

Figure 4-23: Notifications GUI

The notifications are added to the notifications table using a simple insert query, and they can be retrieved from the table using a simple select query.

4.5.4 Report The results of the Analyse process are stored in the Result table, and can be retrieved in the report section. The report function is implemented using simple queries that search the Result table based on different fields, which include the source and destination IP addresses, URL, User-Agent string, and the date. Figure 4-24 shows the GUI of the report function.


Figure 4-24: Report GUI

4.5.5 User Agent list Using a simple insert query, users can add a User-Agent string, its corresponding application, and its description to the User-Agent table. In addition, users can delete or edit existing User-Agents in the list. Figure 4-25 shows the User Agent List GUI.


Figure 4-25: User Agent List GUI

4.5.6 White list In the white list section, the users can add an IP address, URL, or User-Agent string to the white list. Figure 4-26 shows the white list GUI.

Figure 4-26: White List GUI

4.5.7 Black list In the black list section, the users can add an IP address, URL, or User-Agent string to the black list. Figure 4-27 shows the black list GUI.

Figure 4-27: Black List GUI

4.5.8 Sensor status Sensor status is another part of the System Info module. It provides information on the status of the botAnalytics system sensors. Knowing the sensor status keeps the users aware of the general situation of the system. Figure 4-28 shows the GUI of the sensor status. a) Sensors Info: This section provides information about all the sensors that are registered in the system. The botAnalytics sensors register themselves automatically when they are installed on the clients, by inserting the client IP address and MAC address into the clients info table.

Figure 4-28: Sensor Status GUI

The sensor info section shows this information and, by submitting a simple query to the VOU table, displays whether each sensor is active or inactive. Figure 4-29 shows the pseudo code of this query.


Figure 4-29: Sensor Info Pseudo Code

As shown in Figure 4-29, for each client in the clients info table, the system gets the IP address and queries the VOU table to check whether any packet has been inserted by that client in the last 30 minutes. If the query returns null, the system sets the client status to Inactive; otherwise, it sets the client status to Active.
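The status check can be sketched as follows (an illustrative Python sketch; the function name is hypothetical, `last_packet_time` stands in for the result of the VOU-table query, and None represents a null result):

```python
import datetime

# Hedged sketch of the sensor-status check: a client is Active when it has
# inserted at least one packet into the VOU table in the last 30 minutes.
def sensor_status(last_packet_time, now, window_minutes=30):
    if last_packet_time is None:        # query returned null
        return "Inactive"
    age = now - last_packet_time
    window = datetime.timedelta(minutes=window_minutes)
    return "Active" if age <= window else "Inactive"
```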

b) Active and Inactive sensors (%): The sensors info section retrieves the number of system sensors together with their status. The Active and Inactive sections use this information to calculate the percentages of active and inactive sensors.

c) Top 10 Active Sensors: This section shows the top 10 active sensors among all the system sensors. Figure 4-30 shows the pseudo code for the query used in this section.

Figure 4-30: Top 10 Active Sensors Pseudo Code


4.5.9 User account This section consists of three tabs - edit profile, create new user, and manage the existing users. Figure 4-31 shows the edit profile tab.

Figure 4-31: Edit Profile Tab

a) Edit profile: The edit profile tab allows the users to edit their profile information - change the password, edit user info, and change the security question. These functions can be carried out by all users.


b) Create New User: This section is only available to the system administrators. They can create new user accounts. Figure 4-32 shows the create new user tab.

Figure 4-32: Create New User Tab

c) Manage the Existing Users: This tab is only available to the administrators. They can change an existing user's status to active or inactive, or change the user's password. In addition, a user's role can be changed in this tab to administrator or operator. Figure 4-33 shows the Manage the Existing Users GUI.


Figure 4-33: Manage Existing Users Tab

The user account management module is implemented using simple queries on the user table.

4.6 Conclusion This chapter discussed the implementation of the proposed method, including the algorithms and the pseudo code of the filters and mechanisms. botAnalytics is based on a client-server architecture; some of its features were implemented on the client side, and some on the server side. An experiment was set up to test and evaluate the new filters
and mechanisms of the proposed method, and it is described in the next chapter.


Chapter 5: Testing the Proposed Model


5.1 Introduction This chapter discusses the different test scenarios designed to test and evaluate botAnalytics. The test environment simulates a real Botnet as closely as possible. Three main items are needed to prepare for the experiment and testing process - the infected PCs, the analyser, and the Command and Control servers.

5.2 Hardware Requirements In the testing phase, at least three items are needed - an analyser; infected PCs (zombies); and a C&C server, which sends the bots their commands. All the PCs need to have the following minimum configuration:
CPU: 1.8 GHz
RAM: 2 GB DDR
Hard Disk: 120 GB

However, the hard disk capacity of the analyser should be at least 240 GB, because all the data from the clients will be stored on it. The infected PCs and the analyser should be connected to each other in a network, as well as to the Internet. Figure 5-1 shows the general schema of the testing phase.


Figure 5-1 : General Schema for the Testing Phase

5.3 Testing bots Real HTTP bot binaries were not available for the test; as a result, the HTTP bots had to be generated and simulated (Gu, et al., 2008; Guofei, et al., 2008). Therefore, four different HTTP bots were implemented using the free Delphi Internet Component Suite (François, 2010). The bots were modeled after two HTTP bots - Black Energy (Nazario, 2007) and Bobax (Joe, 2004) - to test the botAnalytics system. The HTTP bots implemented for testing are UM_HBot1, UM_HBot2, UM_HBot3, and UM_HBot4, which are described as follows:

5.3.1 UM_HBot1 The UM_HBot1 is generated based on a description of the Black Energy bot. It contacts the Command and Control server every three minutes. Figure 5-2 shows the Black Energy (UM_HBot1) User-Agent.

Figure 5-2 : The Black Energy User Agent (Nazario, 2007)


5.3.2 UM_HBot2 The UM_HBot2 is a modified version of UM_HBot1, and generally follows the Black Energy bot structure, but it has been modified to become stealthier.

The UM_HBot2 contacts the Command and Control server periodically, but at random intervals of between three and ten minutes. In addition, UM_HBot2 uses the standard Firefox User-Agent string. Figure 5-3 shows the Firefox User-Agent.

Figure 5-3: The Firefox User Agent

5.3.3 UM_HBot3 The UM_HBot3 bot is generated based on the description of Bobax. It connects to the Command and Control server at four-minute intervals. Figure 5-4 shows the Bobax (UM_HBot3) User-Agent.

Figure 5-4: The Bobax User Agent

5.3.4 UM_HBot4 The UM_HBot4 bot follows the Bobax structure, but it has been modified to become more difficult to detect. Like UM_HBot2, the UM_HBot4 bot contacts the Command and Control server at random intervals, and uses the standard Firefox User-Agent.

5.4 Testing Command and Control servers The official domain of the botAnalytics system, www.eslahi.net, was used to simulate the Command and Control server. Thus, the bots involved in the testing periodically connect to this domain (as the Command and Control server).

5.5 Testing clients With the use of four different HTTP bots, four clients were used to test the botAnalytics system; each client was infected by one type of bot. For each bot, only one client was used, to evaluate the efficiency of botAnalytics in a small-scale (single bot) Botnet. Client 1 and Client 2 were infected by UM_HBot1 and UM_HBot3, respectively. The third client was infected by UM_HBot2, which used the standard Mozilla Firefox User-Agent. Thus, to evaluate the VOU mechanism, the Mozilla Firefox web browser was installed on this client; the VOU must generate the VALID value for the UM_HBot2 packets on the third client. The fourth client was infected by UM_HBot4, which also used the standard Mozilla Firefox User-Agent, but in this case, the Mozilla Firefox web browser was not installed on the client. Thus, the VOU mechanism must generate the NOTVALID value for the UM_HBot4 packets. In addition, the botAnalytics client side was installed on all four client machines, making them act as software sensors and form the data collecting platform described in section 3.2.1.


5.6 Testing analyser The botAnalytics server and database were located on the same host to form the data analysing and data storing platforms.

5.7 Testing results Testing was carried out for more than 3 hours (3h:11m:59s), and the traffic data from four different bots were collected and filtered by the botAnalytics system. Table 5-1 shows the results of the filtering process.

Table 5-1: botAnalytics Filtering Result

Aside from the use of the filters, the VOU and LODA algorithms were applied to the collected traffic data to complete the Botnet detection process. Table 5-2 shows the results.
Table 5-2: botAnalytics Botnet Detection Results

Analysis of the results and system evaluation will be discussed in the next chapter.


5.8 Conclusion This chapter described the different steps involved in testing the botAnalytics system. It is difficult to find real HTTP bot binaries; thus, different HTTP bots were implemented based on the descriptions of existing HTTP bots. The bots were modeled after two HTTP bots - Black Energy and Bobax - to test the botAnalytics system. The results from the use of the filters, and from the use of the VOU and LODA mechanisms, were presented. These results will be discussed and evaluated in the next chapter.


Chapter 6: Data Analysis and Discussion


This chapter discusses the analysis of the results obtained from the botAnalytics testing. The system's efficiency in small-scale Botnets (even with one bot), and the false-positive rate in detection, are presented as metrics to evaluate the system and its algorithms. 6.1 Introduction As discussed in previous chapters, different filters were developed for the first time, or modified from existing filters, to filter out useless traffic and facilitate Botnet detection. In addition, the VOU and LODA algorithms were developed and used for the first time in this research. These algorithms were used to evaluate the level of danger of the detected suspicious activities.

The first section of this chapter discusses the filtering process and the use of the new algorithms. The latter section of this chapter compares botAnalytics with other existing HTTP-based Botnet detection methods based on their false-positive rates, and their efficiency in detecting small-scale Botnets.

6.2 Evaluation of botAnalytics

This section discusses the results obtained following the use of the new algorithms and filters of botAnalytics on the traffic data.


6.2.1 Filtering evaluation The ability to differentiate the Botnets' Command and Control traffic flows from normal flows is very important in the Botnet detection process. The existing detection methods use many types of filters to sift through the collected network traffic data and purge (filter out) the useless traffic (Guofei, et al., 2008; Strayer, et al., 2006). This is done so that there is a smaller amount of traffic data to be analysed, thus making it faster to detect suspicious activities. botAnalytics uses five filters. Two filters - the H.T.S. and G.P.S. filters - are used to select HTTP traffic only, while the other three filters - H.A.R., L.A.R., and P.A.R. - are designed to separate the HTTP-based Command and Control traffic flows from normal flows. Table 6-1 shows the results of the filtering process on the collected data.

Table 6-1: botAnalytics: Results of Filtering

Table 6-1 shows that 305,861 packets were collected by the botAnalytics sensors on the four client computers, each of which had been infected by one type of HTTP-based bot. The results of each of the botAnalytics filters (H.T.S., G.P.S., H.A.R., L.A.R., and P.A.R.) are discussed below:


a) H.T.S. Filter: The HTTP-based bots are designed based on the HTTP protocol (Jae-Seo, et al., 2008); therefore, the H.T.S. filter selects only the packets of the application layer protocol HTTP from the collected traffic. The results show that the H.T.S. filter reduced the collected data from 305,861 to 102,623 packets, indicating that 66.45% of the collected traffic data packets were removed by this filter. Figure 6-1 shows the H.T.S. results in a bar chart.

Figure 6-1: The H.T.S. Filter Results Chart (See also Table 6-1)

b) G.P.S. Filter: The G.P.S. filter focuses on the HTTP methods, and selects only the HTTP traffic with the GET and POST methods. The HTTP-based bots use the GET or POST methods to contact their Command and Control server; thus, the other methods provide no information about bot activities (Joe, 2004; Naseem, et al., 2010; Nazario, 2007). The G.P.S. filter of botAnalytics removed 87.86% of the traffic data (the H.T.S. result), reducing the collected data from 102,623 packets to 12,461 packets. Figure 6-2 shows the G.P.S. filter results in a bar chart.

Figure 6-2: The G.P.S. Filter Results Chart (See also Table 6-1)

c) H.A.R. Filter: Automatic software such as updaters and downloaders may act like bots and increase the false-positive ratio in the Botnet detection process (Jae-Seo, et al., 2008). Strayer et al. (2006) observed that bots do not generate bulk data transfers; therefore, the H.A.R. filter was developed to remove all traffic that transmits at a rate of at least two packets per second. Figure 6-3 shows the H.A.R. filter results in a bar chart.

Figure 6-3: The H.A.R. Filter Results Chart (See also Table 6-1)

Strayer et al. (2006) had used the same approach to design a filter for their IRC-based Botnet detection system. However, that filter had not been very effective. The H.A.R. filter developed in this research to detect HTTP-based bots has proven to be very effective. As shown in Figure 6-3, the H.A.R. filter removed 91.93% of the unwanted traffic data packets, reducing the collected packets from 12,461 packets to 1,005 packets.


d) L.A.R. Filter: The L.A.R. filter removed the low access rate traffic (less than 2 packets in the whole data collecting period). The botAnalytics detection method is passive, which means that it initially collects the traffic data for a few hours, and then analyses the collected traffic to look for any suspicious activities. Bots are designed to perform bigger tasks much faster than humans; hence, they do not generate only brief traffic within a few hours (Strayer, et al., 2006). Figure 6-4 shows the L.A.R. filter results in a bar chart.

Figure 6-4: The L.A.R. Filter Results Chart (See also Table 6-1)

As shown in Figure 6-4, the L.A.R. filter reduced the collected traffic data from 1,005 packets to 226 packets, which means that 77.51% of the traffic data packets were removed in the L.A.R. filtering process.


e) P.A.R. Filter: The P.A.R. filter functions based on the same concept as the existing HTTP-based Botnet detection methods. As discussed earlier, the HTTP-based bots follow the pull style, which means that they periodically contact their Command and Control server to get commands (Gu, et al., 2008; Guofei, et al., 2008; Jae-Seo, et al., 2008). The P.A.R. filter selects only the traffic that periodically accesses a specific destination. It filtered out the unwanted traffic data packets and reduced the collected data from 226 packets to 125 packets, meaning that 44.69% of the traffic data packets were removed by this filter. Figure 6-5 shows the P.A.R. filter results in a bar chart.

Figure 6-5: The P.A.R. Filter Results Chart (See also Table 6-1)

Overall, the five botAnalytics filters reduced the collected traffic data from 305,861 packets to 125 packets. This means that 99.96% of the traffic data packets were removed by the botAnalytics filters, and only the HTTP-based bot packets survived the filtering process. This drastic reduction in the collected traffic data was achieved without the use of the white list or the black list techniques. It is evident from the results that the botAnalytics system was able to detect all the HTTP-based Botnets present during the testing process. In the next step, the botAnalytics system analyses the level of danger posed by the detected HTTP-based bots by using the new VOU and LODA algorithms.

6.2.2 VOU algorithm evaluation The botAnalytics system focuses on one of the HTTP header fields, called User-Agent. The VOU algorithm was designed to retrieve the User-Agent of the collected traffic. It sets the corresponding VOU field to one of three values - VALID, NOTVALID, and UNKNOWN. Table 6-2 shows the VOU results in the testing phase.
Table 6-2: The VOU Algorithm Result

The UM_HBot1 and UM_HBot3 bots use their own User-Agents, which are not in the botAnalytics system database. Thus, botAnalytics is unable to identify their User-Agents, and assigns the value UNKNOWN to them. The botAnalytics system is not signature-based, and it does not use the User-Agent value as a signature. Often, the retrieved User-Agents cannot be identified by the system, and are considered unknown User-Agents.

The UM_HBot2 uses the standard Mozilla Firefox User-Agent, which is in the botAnalytics system database, and thus it can be identified. Based on the testing scenario, the Mozilla Firefox web browser was installed on the client that was infected by UM_HBot2. When the botAnalytics system retrieved the User-Agent of the traffic generated by this bot, it encountered traffic that was seemingly generated by the Mozilla Firefox web browser. Because Mozilla Firefox was installed on the client, the system assigned the VALID value to the traffic generated by UM_HBot2. Likewise, the UM_HBot4 used the standard Mozilla Firefox User-Agent string, but in this case, Mozilla Firefox was not installed on the client that was infected by this bot. When the botAnalytics system retrieved the User-Agent of the traffic generated by this bot, it again encountered traffic that was seemingly generated by the Mozilla Firefox web browser. Because Mozilla Firefox was not installed on the client, the system assigned the NOTVALID value to the traffic generated by UM_HBot4. In the next step, these results are used by the LODA algorithm to evaluate the level of danger of suspicious network traffic.
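The VOU assignment described above can be sketched as follows (an illustrative Python sketch of the logic only; the function and parameter names are hypothetical, and the actual lookup is performed against the system database):

```python
# Hedged sketch of the VOU rules: an unrecognised User-Agent is UNKNOWN;
# a recognised one is VALID only if its application is installed on the client.
def vou_value(user_agent, known_agents, installed_apps):
    app = known_agents.get(user_agent)   # maps User-Agent string -> application
    if app is None:
        return "UNKNOWN"
    return "VALID" if app in installed_apps else "NOTVALID"
```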

6.2.3 LODA algorithm evaluation Presently, botAnalytics is the first and only system to use an algorithm to evaluate the level of danger posed by the HTTP-based bots, besides detecting them. When any suspicious activity is detected by the system, the LODA algorithm uses the VOU value of the activity to determine to what extent the activity can be dangerous, and to help the network security experts make better judgments or decisions. Table 6-3 shows the results following the use of the LODA algorithm in the testing phase.

Table 6-3: The LODA Algorithm Results

The VOU value of the UM_HBot4 bot was NOTVALID, which meant that it tried to present itself as a standard software that was not installed on the client; hence, the LODA algorithm identified its level of danger as high. On the other hand, the UM_HBot2 bot had a VALID value, and thus was identified as being of a low level of danger. The VOU value for the UM_HBot1 and UM_HBot3 bots was UNKNOWN. Hence, the LODA algorithm sent a query to the database to find out whether similar activity had been detected before. In the test scenario, there was no record of the bots' activities kept in the system. Thus, LODA was not able to find any history of the UM_HBot1 and UM_HBot3 bots' activity, and considered these bots to be of a low level of danger. botAnalytics is an expert system, which means that, henceforth, all suspicious activities associated with the UM_HBot1 and UM_HBot3 bots will be considered to be of a high level of danger, because this type of activity has now been saved into the system.


6.3 Comparison of botAnalytics with Other Systems In the literature review, the botAnalytics system was compared with other existing HTTP-based Botnet detection systems. botAnalytics was able to improve on the HTTP-based Botnet detection process by decreasing the false-positive rate, and increasing the detection efficiency of the system for small-scale Botnets (even with one bot). Table 6-4 shows a comparison of botAnalytics with systems developed in other research.
Table 6-4: Comparison of the botAnalytics with existing HTTP-based Botnet detection researches

6.3.1 False-Positive rate The current detection methods are based on the fact that HTTP-based bots periodically connect to their Command and Control servers. Jae-Seo et al. (2008) used only this information to propose their own method to detect HTTP-based bots. They cautioned, however, that automatic applications such as downloaders can generate a high false-positive rate. Likewise, Gu et al. (2008) found that some programmes, such as a Gmail session that periodically checks for new emails, can generate high false-positive results. They proposed a white list to avoid these false-positive results. However, the new filters in botAnalytics - H.A.R. and L.A.R. - successfully removed the types of traffic mentioned above. In effect, this reduced the false-positive rate without using the black list or the white list. Table 6-5 shows the false-positive results in the botAnalytics detection process.
Table 6-5: The botAnalytics False-Positive

Without using the black list or the white list technique, the H.A.R. and L.A.R. filters successfully removed all the types of traffic highlighted in the studies by Jae-Seo et al. and Gu et al. As shown in Table 6-5, the false-positive rate for each client, at between 0.002 and 0.003, can be considered to be 0. Moreover, if the white/black list is used by the botAnalytics system, it can achieve a 0% false-positive rate. As discussed above, botAnalytics incorporates new algorithms to determine the level of danger of detected suspicious activities, which can help system experts make informed decisions.

6.3.2 Efficiency in small-scale Botnets BotSniffer (Gu, et al., 2008) and BotMiner (Guofei, et al., 2008) are two Botnet detection systems that carry out their tasks based on an analysis of group activities. They rely on the fact that bots from the same Botnet generate the same traffic (Guofei, et al., 2008). Gu et al. (2008) found that group activity analysis requires a reasonable number of members (bots) in one Botnet.


However, BotSniffer proposed a sub-system for small-scale Botnets, but it is not as robust as the group analysis technique. The results in Table 6-5 show that only one bot per client was used in the test scenario to validate the efficiency of the system; this assumes that there is only one bot in the entire Botnet. The results also show that botAnalytics is able to detect a Botnet which has only one bot.

6.4 Conclusion In this chapter, the filters and the new algorithms developed in this research were discussed. The filters include the H.T.S., G.P.S., G.A.S., H.A.R., L.A.R., and P.A.R. filters, and the new algorithms include the VOU and LODA algorithms. The test results show that the new filters used in botAnalytics are efficient in reducing the amount of unneeded traffic data packets, facilitating the detection of suspicious bot activities. The VOU and LODA algorithms were developed to evaluate the level of danger posed by the detected suspicious activities. The botAnalytics system was compared with existing HTTP-based Botnet detection techniques in terms of false-positive rates and efficiency in small-scale Botnets. The results show that botAnalytics is able to detect HTTP-based bots with a very low false-positive ratio (which can be considered zero), even if a Botnet has only one bot.


Chapter 7: Conclusion and Future Work


7.1 Introduction This chapter presents the conclusion of the research. It discusses the achievement of the research objectives, and the contributions made. There are also recommendations for future work to be undertaken.

7.2 Achievement of Objectives The aim of the research was to develop an HTTP-based Botnet detection system based on the network behaviour analysis system. The botAnalytics system was aimed at detecting bots which are in centralised HTTP-based Command and Control Botnets and hide their malicious activities among normal HTTP traffic. The botAnalytics system was designed based on the network behaviour analysis technique, and developed using the Delphi programming language. The history of Botnets and the different types of bots were reviewed. In addition, the network behaviour analysis system and other existing Botnet detection methods, such as Honeypot and Honeynet, signature-based, DNS monitoring, and attack behaviour analysis, were discussed to provide an overview and acquire relevant knowledge of the research field. In the botAnalytics system, the User-Agent, which is one of the HTTP header fields, was used for the first time to evaluate the level of danger of the detected suspicious activities. In addition, considering that the existing methods mostly use the source and destination IP addresses, Host, and URLs to classify the collected traffic flows, botAnalytics added the User-Agent to these items to make the classification more accurate.

Chapter 4 and chapter 5 discussed the design and implementation of the botAnalytics system. Chapter 6 highlighted the performance of the botAnalytics system, which was evaluated on its false-positive rate, as well as its efficiency in small-scale Botnets. The performance of the botAnalytics system was compared to that of other existing HTTP-based Botnet detection methods developed by other researchers. The testing results show that botAnalytics produced a very low false-positive result, and exhibited very high efficiency in detecting even small-scale Botnets (even with one bot in the whole Botnet) when compared to other methods.

7.3 Contributions

The development of a detection system for HTTP-based Botnets, called botAnalytics, is the main contribution of this research. The major contributions can be summarised as follows:

7.3.1 HTTP-based Botnet Detection: botAnalytics adopted the network behaviour analysis architecture to detect HTTP-based Botnets. It achieved very good performance, with a very low false-positive ratio and high efficiency in detecting even small-scale Botnets.


7.3.2 Use of the User-Agent: Aside from detecting malicious HTTP-based Botnet activities, the botAnalytics system uses the User-Agent field of HTTP packet headers to evaluate the level of danger of detected suspicious activities; this is the first time such a method has been used for Botnet detection. The botAnalytics system also uses the User-Agent as an additional parameter, beside the source and destination IP addresses, Hosts, and URLs, to make the classification of collected network packets more accurate.
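The classification idea described above can be illustrated with a short sketch (in Python rather than the thesis's Delphi implementation, and with illustrative field names that are not taken from botAnalytics itself): each captured HTTP request is grouped into a flow keyed on the five items, so that requests differing only in their User-Agent fall into separate flows.

```python
from collections import defaultdict

def classify_flows(packets):
    """Group HTTP request records into flows keyed on the five
    classification items: source IP, destination IP, Host, URL,
    and User-Agent. (Record field names are illustrative.)"""
    flows = defaultdict(list)
    for pkt in packets:
        key = (pkt["src_ip"], pkt["dst_ip"], pkt["host"],
               pkt["url"], pkt["user_agent"])
        flows[key].append(pkt)
    return flows

# Three requests to the same host/URL; two share a User-Agent.
packets = [
    {"src_ip": "10.0.0.5", "dst_ip": "203.0.113.7", "host": "c2.example.net",
     "url": "/gate.php", "user_agent": "Mozilla/4.0"},
    {"src_ip": "10.0.0.5", "dst_ip": "203.0.113.7", "host": "c2.example.net",
     "url": "/gate.php", "user_agent": "Mozilla/5.0 (Windows NT 6.1)"},
    {"src_ip": "10.0.0.5", "dst_ip": "203.0.113.7", "host": "c2.example.net",
     "url": "/gate.php", "user_agent": "Mozilla/4.0"},
]

flows = classify_flows(packets)
# With the User-Agent in the key, the three requests form two distinct
# flows; keyed only on IPs, Host, and URL they would collapse into one.
print(len(flows))  # -> 2
```

The point of the sketch is the key choice: adding the User-Agent splits flows that the conventional four-item key would merge, which is what makes the classification more accurate.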

7.3.3 New Filters and Algorithms: In this research, the new H.A.R. and L.A.R. filters were developed and used for the first time for Botnet detection. These filters are very effective in removing useless traffic data packets, thus making the analysis process more accurate. Two new algorithms, VOU and LODA, were also introduced. They are used to examine detected suspicious HTTP-based bot activities and determine the level of danger they pose.
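This chapter does not restate how the H.A.R. and L.A.R. filters decide which packets to drop. Purely as an illustrative sketch (in Python, with assumed thresholds and an assumed count-based reading of "access rate" that is not confirmed by the text here), such a filter could discard flows whose request counts fall outside a plausible range, on the grounds that both one-off requests and very heavily accessed destinations are unlikely to be C&C traffic:

```python
def access_rate_filter(flows, low_threshold=2, high_threshold=100):
    """Keep only flows whose request count lies within [low, high].
    The thresholds and the count-based interpretation of 'access
    rate' are assumptions made for illustration only."""
    return {key: pkts for key, pkts in flows.items()
            if low_threshold <= len(pkts) <= high_threshold}

# Toy flows: one seen once, one seen 10 times, one seen 500 times.
flows = {
    "one-off":      [0],
    "candidate":    list(range(10)),
    "popular-site": list(range(500)),
}
kept = access_rate_filter(flows)
print(sorted(kept))  # -> ['candidate']
```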

7.3.4 Evaluating the Level of Danger: Existing Botnet detection methods are generally designed only to detect bots. In this research, however, in addition to detecting Botnets, botAnalytics also evaluates the level of danger of the detected bot activities.


7.4 Limitations and Future Work

This research has limitations, which it is hoped will be resolved in future work. They include:

7.4.1 Real-Time Detection: As discussed in the literature review, the botAnalytics system is based on the network behaviour analysis approach, which works passively: the system collects network traffic for a specific period and then analyses the collected data for evidence of HTTP-based bot activities. The proposed system is therefore unable to perform real-time detection.

7.4.2 Linux Platform: As noted in the literature review, one of the main targets of Botmasters is computers with high-bandwidth network connections, including computers in universities or homes where users have little awareness or knowledge of computer security. These targets usually run the Windows operating system. botAnalytics, however, was implemented in the Delphi programming language, which generates native code for the Windows platform; therefore, botAnalytics cannot run on other operating systems such as Linux.


7.4.3 Other Types of Bots and Botnets: In this research, botAnalytics was designed to detect HTTP-based bots only. As noted in the literature review, the two other types of bots, IRC and P2P, cannot be detected by botAnalytics. For future work, the architecture and characteristics of other types of Botnets should be reviewed so that appropriate systems can be developed to detect them.

7.4.4 Prevention Methods: The botAnalytics system does not provide a firewall against bots. When malicious bots are detected, botAnalytics is unable to block their activities. Security experts can use botAnalytics to detect bots, but they should use an external firewall to block the detected bots' activities.

7.4.5 Advancing the User-Agent for Botnet Detection: The User-Agent field is used to evaluate the danger level of suspicious activities and to classify the collected packets more accurately. Future efforts should focus on designing a pattern recognition method to distinguish original User-Agents from fake ones, and on using this method to improve Botnet detection.
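As a rough illustration of the direction proposed here (not a method from the thesis), a naive recogniser might compare each observed User-Agent string against patterns of well-known browser formats and flag anything that matches none of them. The patterns below are illustrative and far from complete; a practical recogniser would need a large, regularly updated pattern set and would still be defeated by bots that copy a browser's string exactly.

```python
import re

# Illustrative patterns for common legitimate browser User-Agents
# (assumption: real coverage would require many more patterns).
KNOWN_UA_PATTERNS = [
    re.compile(r"^Mozilla/5\.0 \(Windows NT \d+\.\d+.*\).*(Firefox|Chrome|Trident)"),
    re.compile(r"^Mozilla/5\.0 \(Macintosh;.*\).*Safari"),
]

def looks_forged(user_agent: str) -> bool:
    """Flag a User-Agent string that matches none of the known patterns."""
    return not any(p.match(user_agent) for p in KNOWN_UA_PATTERNS)

print(looks_forged("Mozilla/5.0 (Windows NT 6.1; rv:40.0) Gecko/20100101 Firefox/40.0"))  # -> False
print(looks_forged("MSIE"))  # -> True: a truncated, malformed UA typical of simple bots
```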


7.5 Conclusion

This thesis reports on the research conducted to develop a method for detecting HTTP-based Botnets based on the network behaviour analysis system. In addition to detecting HTTP-based bots, the botAnalytics system introduces a technique to evaluate the danger level of suspicious activities. This is done by using one of the HTTP header fields, the User-Agent, and this is the first time this technique has been used in a Botnet detection system. The botAnalytics system has been evaluated and its performance compared to that of other HTTP-based Botnet detection systems, based on the false-positive rate and on detection efficiency even for small-scale Botnets. The test results show that botAnalytics achieved higher efficiency in detecting small-scale Botnets than the other existing methods. The very low false-positive ratio, obtained through the use of the H.A.R. and L.A.R. filters, shows that botAnalytics is able to detect HTTP-based bots very efficiently. All the aims and objectives of the research have been achieved, and botAnalytics is an improved method for detecting HTTP-based Botnets. There remains, however, a need for vigilance, because the threats from Botmasters and their Botnets are ongoing, as are the continuing threats from IRC and P2P Botnets.

107

