OUTLINE
Introduction
Experimental results
Conclusion
Introduction
A network intrusion detection system (NIDS) attempts to detect malicious activities on a network. One problem of NIDSs is that they detect only known network attacks. Many methods have been proposed for the design of NIDSs, such as:
- a decision tree based on C5, used by the KDD Cup 1999 winner;
- fuzzy rough C-means (FRCM), based on fuzzy set theory;
- support vector machines (SVMs).
This study proposes an SVM-based intrusion detection system that uses a hierarchical clustering algorithm to preprocess the KDD Cup 1999 dataset before SVM training.
BIRCH hierarchical clustering:
- Suited to settings where time and memory are limited
- Incremental and dynamic clustering of incoming objects
- Only one scan of the data is necessary
- Constructs a tree called a clustering feature (CF) tree
- Does not need the whole dataset in advance
- Handles noise effectively
Two key phases:
1. Scan the database to build an in-memory CF tree.
2. Apply a clustering algorithm to cluster the leaf nodes.
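Clustering feature (CF) of a cluster of N instances $X_1, \dots, X_N$ (squares taken feature by feature):
$$CF = (N, LS, SS), \qquad LS = \sum_{i=1}^{N} X_i, \qquad SS = \sum_{i=1}^{N} X_i^{2}$$
Theorem to merge sub-clusters:
$$CF_1 + CF_2 = (N_1 + N_2,\; LS_1 + LS_2,\; SS_1 + SS_2)$$
Given a cluster of instances, we define the centroid:
$$X_0 = \frac{LS}{N} = \frac{1}{N} \sum_{i=1}^{N} X_i$$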
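The CF definition and the merge theorem translate directly into a small data type. The sketch below assumes that LS and SS are stored per feature. The class name `ClusteringFeature` is hypothetical. The `radius` method is an addition not shown on the slide: it is the usual BIRCH root-mean-square radius, computed from the CF alone, and it is the quantity a threshold T can be compared against.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ClusteringFeature:
    """CF = (N, LS, SS) for a cluster of d-dimensional instances.

    LS and SS are stored per feature (component-wise sum and square sum).
    """
    n: int
    ls: List[float]
    ss: List[float]

    @classmethod
    def from_point(cls, x: Sequence[float]) -> "ClusteringFeature":
        # A single instance X has CF = (1, X, X^2)
        return cls(1, list(x), [v * v for v in x])

    def __add__(self, other: "ClusteringFeature") -> "ClusteringFeature":
        # Additivity theorem: CF1 + CF2 = (N1 + N2, LS1 + LS2, SS1 + SS2)
        return ClusteringFeature(
            self.n + other.n,
            [a + b for a, b in zip(self.ls, other.ls)],
            [a + b for a, b in zip(self.ss, other.ss)],
        )

    def centroid(self) -> List[float]:
        # X0 = LS / N
        return [v / self.n for v in self.ls]

    def radius(self) -> float:
        # RMS distance of the members to the centroid, from the CF alone:
        # R^2 = (sum of SS over features) / N - ||X0||^2
        r2 = sum(self.ss) / self.n - sum(c * c for c in self.centroid())
        return math.sqrt(max(r2, 0.0))  # clamp tiny negative rounding error
```

For example, merging the CFs of the points (1, 2) and (3, 4) gives N = 2, LS = [4, 6], SS = [10, 20]. The centroid is [2.0, 3.0] and the radius is √2 ≈ 1.414. None of the original points need to be kept.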
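Inserting a new entry: the algorithm starts from the root and recursively descends the CF tree to the leaf level, at each level choosing the child node whose centroid is closest to the new entry. At the leaf, the entry is absorbed into the closest leaf entry if the result stays within the threshold T. Otherwise it is added as a new leaf entry, and the leaf node is split if it has no room left.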
If no leaf node split is invoked, the algorithm simply updates the CFs along the path to reflect the addition of the new entry.
If a leaf node split is invoked, the algorithm checks whether the parent node satisfies the branching factor constraint B.
- If the parent node violates B, it is split too, and the same check is repeated recursively on its ancestors up to the root.
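The insertion steps above can be sketched as a compact CF tree. This is a simplified illustration under stated assumptions, not the study's implementation. The threshold T is applied to the radius of the merged leaf entry (BIRCH also allows a diameter test). A single branching factor B bounds both leaf and non-leaf nodes, whereas the original BIRCH uses separate limits. A split seeds two groups with the farthest pair of entries. The names `CF`, `Node`, `CFTree`, `threshold` and `branching` are hypothetical.

```python
import math
from typing import List, Optional, Sequence


class CF:
    """Clustering feature (N, LS, SS); LS and SS are kept per feature."""
    def __init__(self, n: int, ls: List[float], ss: List[float]):
        self.n, self.ls, self.ss = n, ls, ss

    @classmethod
    def of(cls, x: Sequence[float]) -> "CF":
        return cls(1, list(x), [v * v for v in x])

    def __add__(self, o: "CF") -> "CF":  # additivity theorem
        return CF(self.n + o.n, [a + b for a, b in zip(self.ls, o.ls)],
                  [a + b for a, b in zip(self.ss, o.ss)])

    def centroid(self) -> List[float]:
        return [v / self.n for v in self.ls]

    def radius(self) -> float:
        r2 = sum(self.ss) / self.n - sum(c * c for c in self.centroid())
        return math.sqrt(max(r2, 0.0))


def dist(a: CF, b: CF) -> float:
    return math.dist(a.centroid(), b.centroid())  # Euclidean centroid distance


class Node:
    def __init__(self, leaf: bool):
        self.leaf = leaf
        self.entries: List[CF] = []         # one CF per entry
        self.children: List["Node"] = []    # parallel to entries (non-leaf only)

    def cf(self) -> CF:
        return sum(self.entries[1:], self.entries[0])  # CF of the whole node


class CFTree:
    def __init__(self, threshold: float, branching: int):
        self.T, self.B = threshold, branching  # one B for leaf and non-leaf nodes
        self.root = Node(leaf=True)

    def insert(self, x: Sequence[float]) -> None:
        sibling = self._insert(self.root, CF.of(x))
        if sibling is not None:  # the root itself split: grow a new root level
            new_root = Node(leaf=False)
            for child in (self.root, sibling):
                new_root.entries.append(child.cf())
                new_root.children.append(child)
            self.root = new_root

    def _insert(self, node: Node, e: CF) -> Optional[Node]:
        """Insert e below node; return a new sibling if node had to split."""
        if not node.entries:  # empty root leaf
            node.entries.append(e)
            return None
        # Follow the entry whose centroid is closest to the new entry
        i = min(range(len(node.entries)), key=lambda k: dist(node.entries[k], e))
        if node.leaf:
            merged = node.entries[i] + e
            if merged.radius() <= self.T:  # absorb into the closest leaf entry
                node.entries[i] = merged
                return None
            node.entries.append(e)  # otherwise it becomes a new leaf entry
        else:
            child = node.children[i]
            split = self._insert(child, e)
            node.entries[i] = child.cf()  # update the CF on the insertion path
            if split is not None:
                node.entries.append(split.cf())
                node.children.append(split)
        # Branching factor check; a split is reported back to the parent
        return self._split(node) if len(node.entries) > self.B else None

    def _split(self, node: Node) -> Node:
        # Seeds: the farthest pair of entries; the rest join the nearer seed
        n = len(node.entries)
        s1, s2 = max(((a, b) for a in range(n) for b in range(a + 1, n)),
                     key=lambda p: dist(node.entries[p[0]], node.entries[p[1]]))
        pairs = list(zip(node.entries, node.children or [None] * n))
        sibling = Node(node.leaf)
        node.entries, node.children = [], []
        for k, (cf, child) in enumerate(pairs):
            to_sibling = k == s2 or (
                k != s1 and dist(cf, pairs[s2][0]) < dist(cf, pairs[s1][0]))
            target = sibling if to_sibling else node
            target.entries.append(cf)
            if child is not None:
                target.children.append(child)
        return sibling

    def leaf_entries(self, node: Optional[Node] = None) -> List[CF]:
        node = node or self.root
        if node.leaf:
            return list(node.entries)
        return [e for c in node.children for e in self.leaf_entries(c)]
```

After all records are inserted, `leaf_entries()` returns the leaf CFs. Their counts still add up to the number of inserted records, but there are far fewer of them than records.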
SVMs have shown good results in data classification, but they cannot be trained on a dataset as large as the KDD Cup 1999 dataset because of system failures caused by insufficient memory.
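One way to see why memory runs out is to size a dense kernel (Gram) matrix, which grows with the square of the number of training instances. The sketch below is a back-of-the-envelope upper bound assuming 8-byte floats. Practical SVM solvers usually cache only part of this matrix. The 494,021-record size is that of the commonly used 10% KDD Cup 1999 training subset; the study does not confirm it as its training set size. The 10,000-instance figure is purely hypothetical.

```python
def dense_kernel_matrix_bytes(n_samples: int, bytes_per_value: int = 8) -> int:
    """Bytes needed to hold a full n x n kernel (Gram) matrix.

    Upper-bound illustration of quadratic growth; not a measurement of
    any particular SVM library, which may cache only part of the matrix.
    """
    return n_samples * n_samples * bytes_per_value


# Illustrative sizes: the 10% KDD Cup 1999 subset vs. a hypothetical reduced set
for n in (494_021, 10_000):
    print(f"{n:>9,} samples -> {dense_kernel_matrix_bytes(n) / 1e9:,.1f} GB")
```

The full subset would need roughly 1.95 TB for a dense matrix. The reduced set would need 0.8 GB. This quadratic gap is why shrinking the training set before SVM training matters.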
CF tree construction
The CF trees can be constructed with a single scan of the dataset. The KDD Cup 1999 data contains four kinds of attacks:
- DoS: denial of service
- R2L: illegitimate access from a remote machine
- U2R: acquiring the privileges of a super user
- Probing: port scanning
One CF tree is built for each kind of attack, and one CF tree for normal traffic.
The KDD Cup 1999 dataset has 41 features, so LS and SS contain the sum and the square sum of each of the 41 features.
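The one-tree-per-category setup and the 41-dimensional LS and SS vectors can be illustrated with a single scan that routes each record to its category's CF. This minimal sketch keeps only the root-level CF per category. In the full method, each category's records would be inserted into that category's own CF tree. The sketch assumes all 41 features are already numeric. The category strings and the `per_category_cf` name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

N_FEATURES = 41  # number of features per KDD Cup 1999 record
CATEGORIES = ("normal", "DoS", "Probe", "R2L", "U2R")  # hypothetical label strings

CFSummary = Tuple[int, List[float], List[float]]  # (N, LS, SS)


def per_category_cf(records: Iterable[Tuple[Sequence[float], str]]) -> Dict[str, CFSummary]:
    """Single scan over (features, category) pairs, returning one root-level
    CF = (N, LS, SS) per traffic category. Features must already be numeric."""
    n: Dict[str, int] = defaultdict(int)
    ls: Dict[str, List[float]] = defaultdict(lambda: [0.0] * N_FEATURES)
    ss: Dict[str, List[float]] = defaultdict(lambda: [0.0] * N_FEATURES)
    for x, label in records:
        if len(x) != N_FEATURES:
            raise ValueError(f"expected {N_FEATURES} features, got {len(x)}")
        if label not in CATEGORIES:
            raise ValueError(f"unknown category: {label!r}")
        n[label] += 1
        for j, v in enumerate(x):
            ls[label][j] += v        # per-feature linear sum
            ss[label][j] += v * v    # per-feature square sum
    return {c: (n[c], ls[c], ss[c]) for c in n}
```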
Feature selection
Not all features are needed in the design of a network intrusion detection system, so it is critical to identify the important features of network traffic data. Whether a feature is important is determined by the accuracy and the number of false positives of the system with and without that feature.
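The with-and-without test can be sketched as a delete-one-feature loop. The system is evaluated once with all features and once with each feature removed. The sketch assumes a hypothetical `evaluate(features)` callback that trains and tests the IDS on the given feature indices and returns (accuracy, false positive count). Ranking by accuracy drop first and by extra false positives second is an illustrative choice, not necessarily the study's rule.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical callback: trains/tests the IDS on the given feature indices and
# returns (accuracy, number_of_false_positives).
Evaluator = Callable[[Sequence[int]], Tuple[float, int]]


def rank_features_by_removal(n_features: int, evaluate: Evaluator) -> List[Tuple[int, float, int]]:
    """Return (feature, accuracy_drop, extra_false_positives) per feature,
    most important first. A feature matters if removing it lowers accuracy
    or adds false positives relative to the all-features baseline."""
    all_features = list(range(n_features))
    base_acc, base_fp = evaluate(all_features)
    results = []
    for j in all_features:
        acc, fp = evaluate([f for f in all_features if f != j])
        results.append((j, base_acc - acc, fp - base_fp))
    # Larger accuracy drop first; ties broken by more extra false positives
    results.sort(key=lambda r: (r[1], r[2]), reverse=True)
    return results
```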
Experimental results
The datasets contained 24 training attack types, classified into four kinds of attacks: DoS, U2R, R2L, and Probe.
The best accuracy of this system was 95.72%, with a false positive rate of only 0.73%. The system showed superior performance on DoS and Probe attacks but performed poorly on U2R and R2L attacks, because the original KDD Cup 1999 dataset contains too few instances of these two attack types.
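For reference, the two headline figures are usually computed from a binary confusion matrix that treats attacks as positives and normal traffic as negatives. The sketch below assumes those standard definitions; the study's exact computation is not shown. The counts in the test are hypothetical and are not the study's results.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Share of all records (attack or normal) that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def false_positive_rate(fp: int, tn: int) -> float:
    """Share of normal records wrongly flagged as attacks: FP / (FP + TN)."""
    return fp / (fp + tn)
```

For example, with hypothetical counts TP = 90, TN = 95, FP = 5 and FN = 10, accuracy is 185/200 = 0.925 and the false positive rate is 5/100 = 0.05.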
This table presents the original number of instances for each kind of attack and the number of instances in the CF trees built with different thresholds T.
Hierarchical clustering reduces the number of instances in each dataset. For example, with a CF tree built with T = 0.2, the DoS data shrinks to only 271 instances, a very small number compared to the original dataset.
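The effect of T on the size of the reduced dataset can be shown with a flat, tree-less simplification of BIRCH leaf absorption. Each point joins the summary with the nearest centroid if the merged radius stays within T; otherwise it opens a new summary. This is not the study's CF tree, and it has no branching factor. The `count_summaries` name and the synthetic Gaussian data are assumptions for illustration only, not KDD records.

```python
import math
import random
from typing import List, Sequence


def count_summaries(points: Sequence[Sequence[float]], T: float) -> int:
    """Flat, tree-less variant of BIRCH leaf absorption (illustration only).

    Each point is merged into the summary with the nearest centroid if the
    merged radius stays <= T; otherwise it opens a new summary. Returns the
    number of summaries, i.e. the size of the reduced dataset.
    """
    summaries: List[list] = []  # each summary: [N, LS (per feature), total SS]
    for x in points:
        best = min(summaries, default=None,
                   key=lambda s: math.dist([v / s[0] for v in s[1]], x))
        if best is not None:
            n = best[0] + 1
            ls = [a + b for a, b in zip(best[1], x)]
            ss = best[2] + sum(v * v for v in x)
            r2 = ss / n - sum((v / n) ** 2 for v in ls)
            if math.sqrt(max(r2, 0.0)) <= T:
                best[:] = [n, ls, ss]  # absorb the point into this summary
                continue
        summaries.append([1, list(x), sum(v * v for v in x)])
    return len(summaries)


# Synthetic 2-D data (hypothetical): a larger T yields fewer summaries
rng = random.Random(0)
pts = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(500)]
for T in (0.1, 0.2, 0.5, 1.0):
    print(T, count_summaries(pts, T))
```

A larger T lets each summary absorb more points, so fewer instances reach the SVM. In the extreme, T = 0 keeps every distinct point.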
This table compares this system with the KDD Cup 1999 winner's system and other studies. This system provided the best detection rate for DoS and Probe attacks, ESC-IDS showed the best detection rate for R2L attacks, and Multi-classifier showed the best detection rate for U2R attacks.
In terms of overall accuracy, this system achieved the best performance.
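The per-attack comparison relies on a detection rate for each class. The sketch below assumes that detection rate means per-class recall: instances of a class labelled correctly, divided by all instances of that class. The function name and the toy labels in the test are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence


def detection_rate_per_class(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Per-class detection rate: correctly labelled instances of a class divided
    by all instances of that class (i.e. per-class recall)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / totals[c] for c in totals}
```

Because each class is normalized by its own size, a rare class such as U2R can have a low detection rate even when overall accuracy is high.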
As shown in the table, the detection rate of this IDS on new attacks is only 39.04%, and detection is worst on new R2L attacks.
Conclusion
CONCLUSION AND PERSPECTIVES
Many studies on NIDSs have applied SVMs because SVMs are well known for their generalization performance. This study proposed an SVM-based network intrusion detection system that uses BIRCH hierarchical clustering for data preprocessing. BIRCH hierarchical clustering provides high-quality, abstracted, and reduced datasets for SVM training, and the resulting SVM classifiers showed better performance than SVM classifiers trained on the original, redundant dataset.