A Flow based Anomaly Detection System using Chi-square Technique
Muraleedharan N, Arun Parmar, Manish Kumar
Computer Networking and Internet Engineering
Centre for Development of Advanced Computing (C-DAC)
Bangalore, India
{murali,parmar,manish}@ncb.ernet.in

Abstract- Various tools exist that can evade security mechanisms such as firewalls, IDS and IPS, and they help intruders send malicious traffic to a network or system. Inspecting malicious traffic and identifying anomalous activity is therefore essential to stop further intruder activity that may lead to an attack. In this paper we present a flow based system that detects anomalous activity using IP flow characteristics with a chi-square detection mechanism. The system identifies anomalous activities such as scan and flood attacks through automatic behavior analysis of the network traffic, and it gives detailed information about the attacker, victim, type and time of the attack, which can be used for the corresponding defense. The anomaly detection capability of the proposed system is compared with the SNORT intrusion detection system, and the results show a higher detection rate than SNORT for different scan and flood attacks. The proposed system detects different stealth scans and malformed packet scans. Since the probability of stealth scans being used in real attacks is very high, the system can identify real attacks at the initial stage so that preventive action can be taken.

Keywords- Anomaly detection, chi-square, flow, scan detection

I. INTRODUCTION

Intrusion detection systems can be classified into two types: signature based and anomaly based. The signature based method is very useful for known attacks, and well known tools such as IPS and IDS are available. Anomaly detection, originally proposed by Denning [1], is an active area of network intrusion detection research. It can detect various types of intrusion based on deviation from the normal usage of the network, which is an advantage over the signature based technique.

Anomaly detection refers to finding abnormal traffic patterns or abnormal behavior in a network or system. Malicious traffic such as scan, flood, DoS, DDoS, worm and high data transfer changes the normal behavior of network traffic. Detecting those malicious activities with detailed information is necessary for identifying and preventing attacks. Intruders use scanning techniques to collect information about the network, systems and applications. In the initial stages, attackers try to understand the systems and their vulnerabilities by analyzing the gathered information. Once the attacker discovers the vulnerabilities, he tries different exploits against them to get control over the system or network. Sometimes an attacker may monopolize a system or network through a DoS attack. The gathered information lets an attacker get the maximum benefit with a low probability of detection. Proper detection of the initiator of malicious traffic can help to prevent future attacks.

Various tools are available with which an attacker can conduct different types of scanning and flooding on a network or system. Moreover, some of the scanning and flooding tools provide features for evading firewall rules or sneaking past intrusion detection or prevention systems [2]. Examining the traffic to detect the anomalous activity of an attacker at the initial stage is required to stop further attacks.

Traditionally, parameters such as the number of connection requests, the number of rejected connections, average packet size and flag values in the packet headers are used to detect malicious traffic. Because of the dynamic nature of network traffic and application access, this type of detection can generate false alarms. A major challenge in anomaly detection is the volume of data to be analyzed. Collecting and analyzing network traffic information at packet level in a high-speed network, and providing accurate results in real time, is a difficult task. In a high-speed network, packet-level traffic monitoring is expensive because it requires deep packet inspection, and such intrusion detection systems can detect only known attacks based on signatures. For these reasons, flow based traffic monitoring and anomaly detection are important.

An IP flow is a unidirectional sequence of IP packets of a given protocol travelling between a source and a destination IP and port pair within a certain period of time [3]. Capturing flows plays an important role in network monitoring and network security. NetFlow version 9, implemented by Cisco [3] and published by the IETF as an informational RFC, is widely used for network monitoring and anomaly detection in high-speed networks. Many other solutions, such as jflow, netstream and cflowd, were developed for exporting flows from routers.

IPFIX was developed as a universal standard for exporting Internet Protocol flow information from routers, probes and other devices [4]. The IPFIX working group of the IETF, created in 2001, standardizes and enhances network flow capturing and filtering. An important advantage of IPFIX over other flow export protocols is that the user can define the parameters in the flow by means of a template. So it can be used for network monitoring, network planning,

978-1-4244-4791-6/10/$25.00 © 2010 IEEE
security analysis, application monitoring, host monitoring, and traffic engineering. The information model for IPFIX is defined in RFC 5102 [5].

In this paper, we propose the use of IPFIX with a novel network anomaly detection method based on a chi-square mechanism applied to transport layer protocol behavior analysis. The system collects flow data as input for the analysis and defines flow records using the IPFIX protocol. With this method we can identify anomalies such as scan and flood, together with more granular information about the type of scan, the attackers, the victims and the attack time, which is useful for the corresponding defense. Another advantage of this method is that the same technique can detect different types of scan and flood.

The remainder of this paper is organized as follows. Related work is introduced in Section II. Section III explains the architecture and detection techniques of our system. Section IV explains the experiment set-up, the results and the result analysis. Section V concludes the paper.

II. RELATED WORK

The survey by Varun, Arindam and Preveen [6] provides a structured and comprehensive overview of the research on anomaly detection by grouping the existing techniques into categories based on the underlying approach, and it also discusses the computational complexity of the different techniques. Qayyum et al. [7] present guidelines for statistical anomaly detection techniques from the perspective of different scenarios and areas of implementation. They also describe the pros and cons of different anomaly detection techniques.

Because of its simplicity and performance, chi-square detection is widely used in anomaly detection. Nong Ye et al. [8] suggested chi-square based statistical profiling for anomaly detection based on packet inspection. S. M. Emran and Nong Ye [9] explained the robustness and scalability of the chi-square mechanism compared with the Canberra method for real-time intrusion detection in large networks. They tested the method with different datasets containing both normal and intrusive activities and reported a good detection ratio. Nong Ye, C. M. Borror and D. Parmar [10] compared the chi-square mechanism with the Hotelling's T² control chart. Their results show better scalability and accuracy for the chi-square procedure.

In a high-speed network, profiling and anomaly detection based on packet inspection can be expensive. Based on the study of chi-square techniques, and to remove this expense, we suggest applying chi-square techniques to flow data for network monitoring and anomaly detection.

Recently, flow based security analysis and anomaly detection, instead of packet based analysis, have been getting attention from many researchers. Kim et al. [11] suggest that by aggregating packets of the identical flow, one can identify the abnormal traffic pattern that appears during an attack. They formalize detection functions for attack detection, which are composed of several traffic parameters and constant values. Their proposed method focuses strictly on the detection of DoS/DDoS attacks. Another work based on flow monitoring is explained in [12]; it monitors four predefined metrics that capture the flow statistics of the network. This method can detect UDP flood, ICMP flood and scanning by using the Holt-Winters forecasting technique, which makes projections about future performance based on historical and current data of the network. The predictions produced by this technique may raise false alarms because network behavior is not static.

Y. Hu, D. M. Chiu and J. Lui [13] use flow aggregation to identify abnormal clusters. Their algorithm relies on information-theoretic techniques and identifies the clusters of attack flows by using parameters such as source IP, destination IP, source port and destination port. With the same parameters, an entropy based flow aggregation algorithm [14] was developed for identifying the abnormal distribution of traffic in a cluster. Both proposed systems use different combinations of these four parameters. These methods identify abnormal behavior by the volume of the cluster corresponding to the considered combination; they have low overhead but can produce false alarms.

Based on scanning behavior and patterns, C. Leckie and R. Kotagiri [15] proposed a probabilistic technique for scan detection. In their scheme, scan detection is based on the count of the same combination of parameters such as source IP, destination IP and port. This scheme relies on a packet based approach for detecting scan activities.

In our proposed technique we use eighteen parameters for anomaly detection, all of which are strongly affected by malicious traffic. Besides detecting anomalous activities, with the help of these parameters the proposed system can provide the attack category, the attack type, and victim and attacker details. Moreover, the system is capable of detecting distributed scan and flood attacks.

In our previous work on flow data analysis [16], we identified the behavior of flow data with respect to different transport layer protocols such as TCP, UDP and ICMP. Different types of anomaly detection can be done using flow information. Quyen et al. [17] proposed a solution for detecting scan activities by considering the small volume of flow data. Usually scan and flood attacks contain a huge number of small packets, but tools are available for scanning with a large data length option. With that option attackers can scan a network or system with a heavy payload in each packet, and such scans can go undetected by methods that consider only small packet volume.

III. PROPOSED SYSTEM

A. Objective

The main objective of the system is to develop a generic solution for detecting network anomalies such as scan and flood in high-speed networks using flow data. Another objective is to provide more granular information, like scan type, time,

286 2010 IEEE 2nd International Advance Computing Conference


protocol, attacker and victim, about the detected anomaly with the help of the IPFIX protocol.

B. Architecture

Figure 1 illustrates the architecture of the proposed anomaly detection system. The system consists of three key components: flow probe, flow collector and flow analyzer. The flow analyzer consists of a Profiler and an Anomaly detector. The system accepts flow data as input and generates alerts as output.

Flow Probe: Aggregates the packets based on flow keys and generates flow packets. These flow packets are exported to the flow collector. Flow enabled network devices such as routers or switches perform the function of the flow probe.

Flow Collector: Collects the exported flow packets from the flow probe and creates flow records. These flow records are given to the flow analyzer for detecting anomalous activities.

Flow Analyzer: Like a generic anomaly detection system, this system works in a profile mode and a detection mode. In profile mode, the system identifies the normal behavior of the traffic and derives a baseline (threshold) for it. In detection mode, the system calculates the corresponding value from real-time data and compares it with the profile-time threshold. If the detection-time value is higher than the profile-time threshold, it generates an alert.

C. Detection Method

Parameters

We have considered TCP, UDP and ICMP traffic for this flow analysis. Based on behavior analysis of normal and abnormal IP flow information, we have selected 6 parameters for each protocol (TCP, UDP and ICMP), giving 18 parameters in total. There are two major reasons for selecting these parameters. Firstly, they show large variation during anomalous activities such as scan and flood attacks. Secondly, besides network based anomalies, they can also identify protocol based anomalies. The selected parameters are shown in Table I.

TABLE I. PARAMETER LISTS

Protocols: TCP, UDP, ICMP
Parameters (per protocol):
  Number of packets
  Average packet size in bytes
  Average flow duration in seconds
  Number of flows
  Average packets per flow
  Number of single packet flows

The number of packets, number of flows, number of single packet flows and average flow duration change significantly during scan and flood attacks. Similarly, anomalous behavior in a protocol (TCP) can be detected using the average packet size, average packets per flow and number of single packet flows. Since the ICMP header does not have a port number field, the type and code values in the ICMP header are used for flow creation.

Threshold Setting

At the initial stage, to learn the network behavior, our system profiles the traffic for a specified duration and keeps track of the required parameters for setting the threshold. The entire profile period $\alpha$ is subdivided into multiple profile intervals of equal duration $\beta$, and the number of profile intervals $\delta$ is calculated as

$$\delta = \frac{\alpha}{\beta} \qquad (1)$$

During each profile interval, the values of all parameters are observed and stored in the database. Once the profile period is over, the mean value of each parameter is calculated, and for each profile interval the system calculates the chi-square value using formula (2).

$$\chi_p = \sum_{i=1}^{18} \frac{(O_i - E_i)^2}{E_i} \qquad (2)$$

where $\chi_p$ is the chi-square value for profile interval $p$, $O_i$ is the observed value and $E_i$ is the expected value of parameter $i$ for profile interval $p$. We take the mean as the expected value. The system then calculates the mean ($\mu_\chi$) and standard deviation ($\sigma_\chi$) of all $\delta$ chi-square values. The threshold value $T$ for anomaly detection is calculated using the equation

$$T = \mu_\chi + 3 \times \sigma_\chi \qquad (3)$$

After the profile period, the system moves into detection mode. In detection mode it calculates the chi-square value ($\chi_d$) from the considered parameters collected from the network traffic, using the following formula.

$$\chi_d = \sum_{i=1}^{18} \frac{(O_i - E_i)^2}{E_i} \qquad (4)$$

where $O_i$ is the observed value from the real-time traffic and $E_i$ is the expected value of parameter $i$ from the profiled traffic. To identify an anomaly, $\chi_d$ is compared with the threshold value $T$. If $\chi_d > T$, the system detects an anomaly and generates an alert.
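The threshold setting and detection steps of equations (1) to (4) can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the choice of population standard deviation is an assumption (the paper does not say which estimator is used), and skipping parameters whose expected value is zero is an added guard against division by zero.

```python
from statistics import mean, pstdev


def profile_interval_count(profile_period, profile_interval):
    """Equation (1): number of profile intervals, delta = alpha / beta."""
    return profile_period // profile_interval


def chi_square(observed, expected):
    """Equations (2) and (4): sum over parameters of (O_i - E_i)^2 / E_i.

    Assumption: parameters with an expected value of 0 are skipped.
    """
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e != 0)


def build_profile(intervals):
    """Derive expected values and the threshold T from profile-time data.

    `intervals` holds one parameter vector per profile interval
    (18 values per vector in the paper; any fixed length works here).
    """
    n_params = len(intervals[0])
    # The mean of each parameter over all intervals is the expected value.
    expected = [mean(row[i] for row in intervals) for i in range(n_params)]
    chi_values = [chi_square(row, expected) for row in intervals]
    # Equation (3): T = mean + 3 * standard deviation of the chi-square values.
    # Assumption: population standard deviation (pstdev).
    threshold = mean(chi_values) + 3 * pstdev(chi_values)
    return expected, threshold


def is_anomalous(observed, expected, threshold):
    """Detection mode: raise an alert when chi_d exceeds T."""
    return chi_square(observed, expected) > threshold


if __name__ == "__main__":
    # Toy profile with 3 parameters per interval instead of 18.
    toy_intervals = [[100, 10, 50], [110, 12, 55], [90, 8, 45], [100, 10, 50]]
    expected, threshold = build_profile(toy_intervals)
    print(is_anomalous([1000, 10, 50], expected, threshold))
```

With the profile period of 86400 seconds and profile interval of 60 seconds reported in the test set-up, `profile_interval_count` gives 1440 intervals.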
Figure 1: Architecture diagram

After detecting the anomaly, the system analyzes the corresponding flow data set for verification and identification of attack type. Attacker and victim details are identified using the source and destination IP fields in the flow data. The attack type can be identified by means of the protocol, port number and flag value. The alert generated by the system consists of the attack time, attacker and victim IP, category of attack (scan, flood, DoS and DDoS) and type of attack (scan type and flood type).

IV. EXPERIMENT

A. Test set-up

To evaluate the detection capability, accuracy and performance of the system, we deployed it in a live network with around 250 nodes running different operating systems and applications. This network is connected to the Internet through a 2 Mbps link, and Internet access is restricted by means of a proxy server. Flow data are collected from the gateway machine using a software flow probe called 'libipfix' [18] and exported to a flow collector machine. The collector machine keeps track of the flow data and analyzes it for scanning and flooding activity.

The gateway machine is connected to the mirrored port of the switch through a hub. To compare the detection capability of the system with the 'snort' [19] intrusion detection system, we installed and configured 'snort' on one machine that is also connected to the hub. Both machines can therefore access the same traffic, which includes all incoming, outgoing and internal traffic. We enabled and configured the 'sfportscan' preprocessor in snort for identifying scanning activities. The profile period (α) of the proposed system is configured as 86400 seconds (1 day) and the profile interval (β) is 60 seconds (1 minute).

We identified two machines inside our network, one to initiate anomalous activity, which includes scan and flood, and the other as the victim of the anomalous activity. Time synchronization of the attacker machine and flow analyzer is done through the ntp service. Using the 'nmap' [20] and 'hping3' [21] tools, we carried out different types of scanning and flooding from the attacker machine to the victim machine.

B. Result

Table II shows the results of the proposed anomaly detection system. To verify the detection capabilities of the system, we conducted 13 different scan attacks using the 'nmap' tool and 2 different flood attacks (TCP and UDP) using 'hping3' and 'nmap'. The Attack Category column of the table gives the category of anomaly, and the Attack Type column describes the type of attack within that category. The 'Output' columns compare the detection capabilities of the system with the SNORT intrusion detection system. The 'Detection Delay' column gives the time difference (in seconds) between a scan and its detection.

TABLE II. RESULT COMPARISON

Sl no  Attack category  Attack type        Proposed system          Snort
                                           Detected  Delay (sec)    Detected  Delay (sec)
1      Scan             SYN scan           Yes       118            Yes       1
2      Scan             Connect            Yes       83             Yes       1
3      Scan             ACK                Yes       115            No        -
4      Scan             NULL               Yes       78             No        -
5      Scan             FIN                Yes       114            No        -
6      Scan             XMAS               Yes       101            No        -
7      Scan             OS fingerprint     Yes       106            Yes       1
8      Scan             Maimon             Yes       112            No        -
9      Scan             Window             Yes       119            No        -
10     Scan             Datalength         Yes       100            No        -
11     Scan             Version Detection  Yes       109            Yes       1
12     Scan             Fast Scan          Yes       79             Yes       1
13     Scan             UDP                Yes       78             Yes       1
14     Flood            TCP                Yes       -              Yes       -
15     Flood            UDP                Yes       -              Yes       -

C. Result analysis

Table II shows that the detection rate of the proposed system is higher than that of Snort. Out of 13 different scans, the proposed system detected all of them (detection rate 100%) but snort detected only 6 (detection rate 46%), and all the flood attacks were detected by both systems. Unlike Snort, the proposed system is able to detect stealth scans such as XMAS, NULL and FIN. Scans using the ACK flag in the TCP header are normally detected with a stateful detection mechanism, but the proposed system detected the ACK scan as well.

The Nmap tool provides an option (--data-length) for changing the default payload size of scan packets. By using this option an attacker can craft a packet with a heavy payload. We used this option to create scan packets of 300 bytes and sent them to the victim machine. The proposed system detected this scan and identified the scan type.

Because attackers try to hide from detection mechanisms such as IDS and firewalls, they use stealthy scan options. Similarly, in a real attack the probability of detection can be reduced by changing the default packet size of the scan to evade the detection mechanism of the victim network. By detecting stealthy scans and malformed packet scans, the proposed system can identify real attacks at the initial stage so that preventive action can be taken.

Regarding the detection delay, snort has an advantage over the proposed system. The average detection delay of the proposed system is about 100 seconds and that of snort is 1 second. The detection delay of the proposed system can be reduced by reducing the profile interval and the flow export time. Reducing the profile interval may, however, create false positives.
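The detection rates and average delay quoted in the result analysis can be recomputed from the scan rows of Table II. The short Python sketch below only transcribes those rows; the variable names are illustrative and not part of the original system.

```python
# Scan rows of Table II:
# (attack type, detected by proposed system, proposed delay in seconds, detected by Snort)
scans = [
    ("SYN scan", True, 118, True),
    ("Connect", True, 83, True),
    ("ACK", True, 115, False),
    ("NULL", True, 78, False),
    ("FIN", True, 114, False),
    ("XMAS", True, 101, False),
    ("OS fingerprint", True, 106, True),
    ("Maimon", True, 112, False),
    ("Window", True, 119, False),
    ("Datalength", True, 100, False),
    ("Version Detection", True, 109, True),
    ("Fast Scan", True, 79, True),
    ("UDP", True, 78, True),
]

total = len(scans)
proposed_detected = sum(1 for _, proposed, _, _ in scans if proposed)
snort_detected = sum(1 for _, _, _, snort in scans if snort)

# Detection rates as percentages of the 13 scan types.
proposed_rate = 100 * proposed_detected / total
snort_rate = 100 * snort_detected / total

# Mean detection delay of the proposed system over the 13 scans.
mean_delay = sum(delay for _, _, delay, _ in scans) / total

print(f"proposed: {proposed_rate:.0f}%  snort: {snort_rate:.0f}%  mean delay: {mean_delay:.1f} s")
```

The rates agree with the 100% and 46% figures in the text, and the mean delay over the 13 scans is close to the reported value of about 100 seconds.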



V. CONCLUSION

A flow based anomaly detection system using the chi-square technique is proposed in this paper. The proposed system provides a generic solution for detecting network anomalies such as scan and flood in high-speed networks. By means of flow data, the system can effectively detect different types of anomaly with detailed information about the activity. The capability of the system was compared with Snort and the results were found to be satisfactory.

Two related issues for future research are how to reduce the detection delay of the system to achieve real-time detection and prevention, and how to extend the system to detect worms.

REFERENCES

[1] D.E. Denning, An intrusion detection model, IEEE Transactions on Software Engineering, SE-13, 1987, 222-232
[2] Firewall/IDS Evasion and Spoofing, Nmap Reference Guide, http://nmap.org-bypass-firewalls-ids.html
[3] Introduction to Cisco IOS NetFlow – A Technical Overview, http://www.cisco.com/en/US/prod/collateral/iosswrel/ps6537/ps6555/ps6601/prod_white_paper0900aecd80406232.html
[4] RFC 5101, Specification of the IP Flow Information Export (IPFIX) Protocol for the Exchange of IP Traffic Flow Information
[5] http://tools.ietf.org/html/rfc5102
[6] Varun Chandola, Arindam Banerjee and Preveen Kumar, Anomaly Detection: A Survey, ACM Comput. Surv. 41, 3, Article 15 (July 2009)
[7] A. Qayyum, M. H. Islam and Jamil, Taxonomy of Statistical Based Anomaly Detection Techniques for Intrusion Detection, IEEE International Conference on Emerging Technologies, 2005
[8] Nong Ye, Qiang Chen, Syed Masum Emran and Kyutae Noh, Chi-square Statistical Profiling for Anomaly Detection, in IEEE, 2000
[9] Syed Masum Emran, Nong Ye, Robustness of Chi-Square and Canberra Distance Metrics for Computer Intrusion Detection, Quality and Reliability Engineering International, 2002
[10] Nong Ye, Connie M. Borror, Darshit Parmar, Scalable Chi-Square Distance versus Conventional Statistical Distance for Process Monitoring with Uncorrelated Data Variables, Quality and Reliability Engineering International, 2003
[11] Myung-Sup Kim, Hun-Jeong Kang, Seong-Cheol Hong, Seung-Hwa Chung, and James W. Hong, A Flow-based Method for Abnormal Network Traffic Detection, IEEE/IFIP Network Operations and Management Symposium, 2004
[12] HA Nguyen, T Van Nguyen, DI Kim, D Choi, Network traffic anomalies detection and identification with flow monitoring, WCON ’08, May 2008
[13] Yan Hu, Dah-Ming Chiu, John C.S. Lui, Adaptive Flow Aggregation – A New Solution for Robust Flow Monitoring under Security Attacks, in IEEE, Oct 2006
[14] Yan Hu, Dah-Ming Chiu, John C.S. Lui, Entropy Based Adaptive Flow Aggregation, in IEEE/ACM, Dec 2007
[15] C. Leckie, R. Kotagiri, A Probabilistic Approach to Detecting Network Scans, in IEEE/IFIP, 2002
[16] Muraleedharan N, Analysis of TCP Flow data for Traffic Anomaly and Scan Detection, 16th IEEE International Conference on Networks, 2008
[17] M. Z. Le The Quyen and Y. Tanaka, Anomaly identification based on flow analysis, TENCON 2006, 2006 IEEE Region 10 Conference, pp. 1-4, November 2006
[18] http://meteor.fokus.fraunhofer.de/libipfix/downloads.php
[19] http://www.snort.org
[20] http://nmap.org
[21] http://www.hping.org
