= T, where H is the hidden layer output matrix, β_i = [β_i1, β_i2, …, β_im]^T is the weight vector connecting the i-th hidden node and the output nodes, and b_i is the threshold of the i-th hidden node. w_i · x_i denotes the inner product of w_i and x_i. The value of N is calculated as in Equation (4.1). For finding the Moore-Penrose matrix, MATLAB is used. The original matrix H is passed as a parameter from the Java application and the resultant matrix H† is returned.

Median of values in window w:

u_i = median(x : x ∈ w)    (5.1)
Mean of values in window v:

μ_i = (1 / |v|) Σ_{x ∈ v} x    (5.2)
Step 3: Gain1 is the ratio of the absolute difference between the current data point and the median to the mean, amplified by the threshold, and is calculated as

Gain1_i = (|x_i − u_i| × t) / μ_i    (5.3)
Gain2 is a ratio normalized between two distance magnitudes:

Gain2_i = (|μ_i − u_i| / |x_i − μ_i|) × 100    (5.4)

where x_i is the current data point, u_i the median over window w and μ_i the mean over window v.
The median is calculated over the short-term window w, while the mean is calculated over the longer-term window v. This ratio makes Gain2 much more robust to fluctuations that may happen over the long term. Using a brute-force search it was found that selecting w/v < 0.5 is the best choice, as the ratio of window sizes affects the outlier detection performance. If Gain1 > Gain2, the data point is classified as an outlier; Gain2 thus acts as a data-dependent threshold that decides whether Gain1 indicates an outlier. Gain1 is sensitive to the mean and to the deviation of the current data point from the median. Gain2 is sensitive to the deviation of the current data point from the mean and to the deviation of the mean from the median. In the presence of outliers in window v, Gain1 will be greater than Gain2.
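The gain computation of this step can be sketched in a few lines. This is a hedged reading of Equations (5.3) and (5.4), assuming Gain1_i = |x_i − u_i|·t/μ_i and Gain2_i = (|μ_i − u_i|/|x_i − μ_i|)·100, with u the median over w and μ the mean over v; all function and variable names are illustrative:

```python
from statistics import mean, median

def gains(x, w_vals, v_vals, t):
    """Compute Gain1 and Gain2 for the current data point x.
    w_vals: values in the short-term window w (median, Eq. 5.1)
    v_vals: values in the long-term window v (mean, Eq. 5.2)
    t:      amplification threshold, assumed to act as a multiplicative factor
    """
    u = median(w_vals)                      # short-term median
    m = mean(v_vals)                        # long-term mean
    gain1 = abs(x - u) * t / m              # assumed form of Eq. (5.3)
    gain2 = abs(m - u) / max(abs(x - m), 1e-9) * 100  # assumed form of Eq. (5.4)
    return gain1, gain2

g1, g2 = gains(5000, [1000, 1010, 990], [1000, 1010, 990, 1005, 995], t=2)
# A point far from the recent median scores a high Gain1, so g1 > g2 here.
```

A sudden spike against a stable history drives Gain1 up while Gain2 stays small, which is exactly the Gain1 > Gain2 condition used for classification.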
Step 4: A stronger possibility of the current data point being an outlier or CP is indicated if Gain1 is higher than Gain2. To classify the current data point as a CP, an additional check is made to find whether the point lies beyond a certain band around the median, represented by LL_i and UL_i. The classification state is then saved in vector GV.
GV_i = 1 : Gain1_i > Gain2_i ∧ (CPx_i < LL_i ∨ CPx_i > UL_i)
GV_i = 0 : Gain1_i > Gain2_i ∧ (LL_i ≤ CPx_i ≤ UL_i)    (5.5)

where CPx_i denotes the current data point.
Step 5: The gain information in vector GV is used to classify outliers and CPs. If the current point has a higher Gain1, as indicated in GV_i, the past three states are considered for classification. GVs_i is the sum of the states GV_i, GV_{i−1} and GV_{i−2}, and thus stores the state of the past two data points together with the selected current data point. If the value of GVs_i is 3, it indicates that outliers were detected in the past and the current point could be the change point. The previous states are tested to make sure no change point was detected in the past two data points; if one was, the CP is inferred as an outlier. Similarly, if the value of GVs_i is 1, it is possible that the current point is an outlier, as shown in Equation (5.6) and Equation (5.7).
GVs_i = GV_i + GV_{i−1} + GV_{i−2}, if GV_i = 1
GVs_i = 0, if GV_i = 0    (5.6)
GVs_i = 1, if GVs_i = 3 ∧ (GVs_{i−1} = 3 ∨ GVs_{i−2} = 3)
GVs_i = GVs_i + GVs_{i−1}, if GVs_i = 1 ∧ (GVs_{i−1} = 1 ∨ GVs_{i−2} = 1)    (5.7)
Finally, CPx_i is classified depending on the value of GVs_i. If the state of the current point is 0, it is inferred that there is no significant deviation. If the state is 1 or 2 and the current data point deviates by more than the t% threshold, it is inferred as an outlier, and this signifies a higher possibility of CPx_{i−1} being a CP. If the state of the current point is 3, it is inferred with accuracy that the point two positions before the current one is the CP.
GVs_i = 0 : No change, adjust LL, UL to u_i
GVs_i ∈ {1, 2} ∧ (CPx_i > CPx_{i−1} + t% × CPx_{i−1}) : Outlier    (5.8)
GVs_i > 2 : Change point, adjust LL, UL
Step 6: The classification of outliers and change points, as signified in the GVs vector, is reported. The gains and the state elements of vectors GV and GVs for the past data points N, N−1 and N−2 are persisted. Persisting the median and mean over the window sizes while classifying the current data point enables an online implementation.
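The state logic of Steps 4 to 6 can be sketched as follows. This is a simplified reading of Equations (5.5) to (5.8): the rule of Equation (5.7) that downgrades a repeated change point to an outlier is omitted for brevity, and all names are illustrative:

```python
def gv_state(gain1, gain2, x, ll, ul):
    """Eq. (5.5), as read here: flag the point (GV_i = 1) only when
    Gain1 dominates Gain2 AND x falls outside the band [LL_i, UL_i]."""
    return 1 if gain1 > gain2 and (x < ll or x > ul) else 0

def classify(gv, gv_prev1, gv_prev2):
    """Eqs. (5.6)-(5.8), simplified: sum the current flag with the two
    previous flags and map the sum state to a decision."""
    gvs = gv + gv_prev1 + gv_prev2 if gv == 1 else 0   # Eq. (5.6)
    if gvs == 0:
        return gvs, "no change"
    if gvs > 2:
        return gvs, "change point"   # deviation sustained over three points
    return gvs, "outlier"            # isolated deviation (state 1 or 2)

# An isolated excursion beyond the band is reported as an outlier ...
flag = gv_state(8.0, 0.5, 5000, 900, 1100)
state, label = classify(flag, 0, 0)
# ... while three consecutive flags indicate a change point.
state3, label3 = classify(1, 1, 1)
```

The sum state makes the distinction cheap to maintain online: only the two previous flags need to be persisted, matching Step 6.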
5.7 STRUCTURE CHART
The overall flow between the different modules of the research work is shown in Figure 5.3.
Figure 5.3 Structure chart of the research work
The proposed CPOD algorithm examines the network data, creates a description of differences and stores it in the knowledge base for further reference. If a deviation is detected, it signals the alarm unit. One strategy for invoking the deviation analyzer is to query the knowledge base periodically for new profiles; alternatively, the profiler may signal when a new profile is added to the knowledge base. The alarm unit is responsible for informing the administrator when the deviation analyzer reports unusual behavior in the network stream. This can be in the form of SMS, e-mails, console alerts, log entries, etc.
5.8 IMPORTANCE OF SELECTING OPTIMAL SUBSET OF
FEATURES
Selecting appropriate features is one of the most important steps to be performed before any kind of data manipulation. Considerable research effort has gone into areas such as statistical pattern recognition, statistics, ML and DM. Most of this effort has also been successfully applied
in AI applications such as image retrieval, text categorization and IDS. For selecting an optimal subset of features from a given data set, the FS process generally performs generation and evaluation of subsets from the original data set with a stopping criterion, and finally the result is validated. The subset generation and evaluation process is repeated until the stopping criterion is met. Recent research has shown that reaching an optimal set of features is an NP-hard task. The FS process
normally does not produce or combine new features. Initially it starts from a given
set of fixed features and gradually searches for the optimal subset. During the subset
generation step new sets of features are selected based on an internal searching or
heuristic method. The factors that affect this step are the search direction and
starting point. The backward method starts from the full set of features and gradually removes them, while the forward method starts with an empty set and incrementally adds features as needed. The bidirectional approach combines both methods, and a random approach tries to avoid local minima. The search strategy is used to create the next set of candidate features; the common search strategies are heuristic search, complete or sequential search, and random search.
Once a feature subset is generated, the subset evaluation phase will
evaluate the features against a certain criterion. This stage is a complex task which
can be achieved with the help of expert knowledge information or using ML
algorithms or both. The stopping criterion is used to end the subset generation
process. This may happen when the process completes its search, or a specified
threshold is reached such as maximum number of iterations. In the final stage the
result validation will provide an empirical proof of the selected feature set with the
use of expert knowledge or by conducting a performance experiment. In practice, a
feasible way to evaluate the results is to conduct experiments before and after the FS
process and compare the overall performance of the algorithm.
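As an illustration of the generation/evaluation/stopping loop described above, a minimal greedy forward-selection sketch is shown below. The evaluation function is a toy stand-in, not the criterion used in this thesis, and all names are illustrative:

```python
def forward_select(features, evaluate, max_iters=10):
    """Greedy forward feature selection: start from an empty set, add the
    feature that most improves the evaluation score, and stop when no
    candidate improves the score or the iteration budget (the stopping
    criterion) is exhausted."""
    selected, best_score = [], float("-inf")
    for _ in range(max_iters):
        candidates = [f for f in features if f not in selected]
        if not candidates:
            break
        score, feat = max((evaluate(selected + [f]), f) for f in candidates)
        if score <= best_score:          # stopping criterion: no improvement
            break
        selected.append(feat)
        best_score = score
    return selected

# Toy evaluator: rewards two "useful" features, penalises subset size.
toy = lambda s: sum(2 for f in s if f in ("src_ip", "pkt_rate")) - 0.5 * len(s)
subset = forward_select(["src_ip", "ttl", "pkt_rate", "flags"], toy)
```

In a real wrapper, `evaluate` would be the validation performance of a classifier trained on the candidate subset, which is exactly the subset-evaluation step discussed above.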
Chapter 6 presents the proposed model for network level feature
evaluation. The model provides a solution for the subset evaluation step of the
feature selection process. The model uses statistical methods combined with SVM
and Relevance Vector Machines (RVM). The method is primarily designed for
network traffic based features.
5.9 EXPERIMENTAL RESULTS
Statistics of IP addresses and their packet counts, observed in our research laboratory over time periods of 30 minutes recorded as windows w1, w2 and w3, are shown in Table 5.1.
Figure 5.4 Initial packet capture screen
Table 5.1 Real time data collected
IP Address       Time Window w1   Time Window w2   Time Window w3
172.16.30.28     1010             1011             2000
172.16.30.91     86               86               90
172.16.30.75     415              492              512
172.16.30.108    140              140              140
172.16.30.70     24               24               24
172.16.30.92     58               58               58
172.16.30.68     175              175              179
172.16.30.69     14               14               14
172.16.30.96     14               14               14
172.16.30.35     7                7                7
172.16.30.95     100              100              100
172.16.30.114    7                7                7
172.16.30.18     3                3                3
172.16.30.88     2                2                2
172.16.30.49     35               36               36
172.16.30.101    11               12               12
For experimental analysis, the packet count for all IP addresses was captured for one week on all working days, in time windows of 30 minutes from 9:00 AM to 5:00 PM. The average packet rate was calculated for each time window; the statistics are shown in Table 5.2.
Table 5.2 Average packet rate for one week
Week 1 Average Packet rate
Time Day 1 Day 2 Day 3
9:00 1010 1000 900
9:30 1000 900 1000
10:00 3000 1000 1123
10:30 900 923 950
11:00 1020 900 856
11:30 1011 1000 1200
12:00 1000 1000 1000
12:30 5000 5950 5020
1:00 900 950 912
1:30 1001 1011 1010
2:00 1000 1000 1000
2:30 1120 1210 1000
3:00 980 900 1000
3:30 850 900 850
4:00 1020 1220 1211
4:30 1120 1020 1112
5:00 20 20 20
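For illustration, the Day 1 column of Table 5.2 can be screened with a simple robust median/MAD rule; the 5-sigma band used here is an illustrative choice, not the threshold t of the CPOD algorithm:

```python
from statistics import median

# Day 1 average packet rates from Table 5.2
day1 = {"9:00": 1010, "9:30": 1000, "10:00": 3000, "10:30": 900,
        "11:00": 1020, "11:30": 1011, "12:00": 1000, "12:30": 5000,
        "1:00": 900, "1:30": 1001, "2:00": 1000, "2:30": 1120,
        "3:00": 980, "3:30": 850, "4:00": 1020, "4:30": 1120, "5:00": 20}

vals = list(day1.values())
med = median(vals)                            # robust centre of the day
mad = median(abs(v - med) for v in vals)      # median absolute deviation
band = 5 * 1.4826 * mad                       # ~5 sigma under normality
flagged = [t for t, v in day1.items() if abs(v - med) > band]
```

The rule flags the 10:00 and 12:30 spikes and the end-of-day drop at 5:00, while the ordinary fluctuations around 1000 packets stay inside the band.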
Figure 5.5 Outlier detected for Table 5.2 data
Figure 5.5 shows the outliers detected for the network data of a particular host over the time period 9:00 to 5:00.
For the real-time statistics, the data was collected for one week and the graphs are shown in Figures 5.6 and 5.7.
Figure 5.6 Plot of real time data and detection of change point and outliers
Figure 5.7 Plot of real time data and detection of change point and outliers
Figure 5.8 Plot of real time data collected for week 1
Figure 5.9 Plot of real time data collected for week 2
Figure 5.10 shows the CUSUM chart, which helps in analyzing the change point and the variation in behavior for a particular host IP.
Figure 5.10 CUSUM analysis
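A CUSUM path such as the one in Figure 5.10 accumulates deviations from a target level; a sustained shift in packet rate shows up as a kink in the path. A minimal sketch, assuming the target is the series mean:

```python
from statistics import mean

def cusum(series, target=None):
    """Cumulative sum of deviations from a target level (the series mean by
    default). A level shift in the data appears as a kink in the path."""
    target = mean(series) if target is None else target
    path, s = [], 0.0
    for x in series:
        s += x - target
        path.append(s)
    return path

# Synthetic packet rates with a level shift after the fifth window.
path = cusum([1000, 1010, 990, 1005, 995, 1400, 1410, 1390])
```

With the series mean as target, the path always returns to zero at the end; the change point is located where the path turns, here at the last pre-shift window.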
Figure 5.11 shows the standard deviation analysis, which is used for drawing the normal network profile from all the observed statistics and results.
Figure 5.11 Standard deviation analysis
Figure 5.12 Outlier detected for the data given in Table 5.2
Figure 5.13 Snapshot showing threshold of IP addresses captured
5.10 SUMMARY
Much work on techniques for detecting changes in data is still in progress. The CPOD algorithm is based on a statistical method that is fast and space efficient. Applying the algorithm to synthetic and real data sets revealed its effectiveness in detecting changes.
In my research, a solution for classifying outliers and CPs from real-time data is implemented, addressed in two parts: scoring and classification. Scores that reflect outliers are computed incrementally so as to keep track of the state of outliers in the data series. The algorithm is characterized by its ability to address outliers and change points at the same time, which enables it to deal with frequent and fast changes in the source. The current implementation and usage indicate the success of the algorithm. This gives a unifying view of outlier detection and change point detection in real-time network data. In this work, a system with OD was designed and successfully implemented.
My research work addresses management of the IDS, where significant detection of attacks presents major difficulties. Using the RVM, a probabilistic kernel-based learning machine model is implemented, as explained in Chapter 6.
CHAPTER 6
APPLICATION OF RELEVANCE VECTOR MACHINES
IN DEVELOPING IDS
6.1 INTRODUCTION
A reasonable level of security is provided by static defense mechanisms such as firewalls and software updates. Dynamic mechanisms, such as IDS and Network Analyzers (NA), can also be used to achieve security. The main difference between an IDS and an NA is that an IDS aims at the specific goal of detecting attacks, whereas an NA aims to determine changing trends in a network of computers. Earlier work emphasized that data can be obtained in three ways: using real traffic, sanitized traffic or simulated traffic. In real time, however, a fast response to external events, with reduced false positives, is demanded and expected within an extremely short time. Therefore the design of alternative algorithms to implement real-time learning is imperative for critical applications in fast-changing environments. Even for offline applications speed is still a need, and a real-time learning algorithm that reduces training time and human effort to nearly zero would always be of considerable value. Mining data in real time is still a big challenge.
IDS involve automatic identification of unusual activity by collecting data and comparing it with reference data. An assumption of IDS is that a network's normal behavior is distinct from abnormal or intrusive behavior, which can be the result of various attacks.
In my research work, flow analysis is used for network traffic analysis, searching for behavioral characteristics in a flow. Various characteristics, such as transferred bytes, packets, flow length, inter-arrival times, inter-packet gaps, etc., are monitored and computed. Data collected from flows can be used on high-speed networks, as there is no deep packet inspection.
In the recent years, there has been a growing interest in the development
of Change Detection (CD) techniques for the analysis of intrusion detection. This
interest stems from the wide range of applications in which CD methods can be
used. Detecting the changes by observing data collected at different times is one of
the most important applications of network security. Research in exploring CD
techniques for medium/high network data can be found for the new generation of
very high resolution data. The advent of these technologies has greatly increased the
ability to monitor and resolve the details of changes and makes it possible to
analyze. At the same time, they present a new challenge over other technologies in
that a relatively large amount of data must be analyzed and corrected for registration
and classification errors to identify frequently changing trends. In my research work an approach for IDS which embeds a change detection algorithm within a Relevance Vector Machine (RVM) is implemented. IDS is considered a complex task that handles a huge amount of network-related data with different parameters. Current research has shown that kernel-based learning methods are very effective in addressing these problems. In contrast to the SVM, the RVM provides a probabilistic output while preserving the accuracy. The focus of my work is to model an RVM that can work with a large network data set in a real environment and to develop an RVM classifier for IDS. The new model consists of the Change Point Outlier Detection (CPOD) algorithm (explained in Chapter 5) and the RVM. The model is competitive in processing time and improves the classification performance compared to other known classification models such as SVM. The goal is to make the system simple but efficient in detecting network intrusion in an actual real-time environment. Results show that the model learns more effectively, automatically adjusting to changes as well as adjusting the threshold, while minimizing the false alarm rate with timely detection.
In my research work a hybrid approach for improving the performance of the detection algorithm by building more intelligence into the system is proposed. In this direction, CP detection is considered for discovering change points when the properties of network behavior change. A CP is a change in characteristics that occurs very fast with respect to the sampling period of the measurements, if not instantaneously. The detection of changes refers to tools that help decide whether such a change in the characteristics has occurred or not. OD is another major step in the DM problem, which discovers abnormal or deviating data points with respect to the distribution of the data. Outliers are often treated as error or noise, although they may carry very important information.
A real-time detection system is one in which network intrusion detection happens while an attack is occurring: a real-time IDS captures the present, online network traffic data. Bayesian learning algorithms such as the RVM allow the user to specify a probability distribution over possible parameter values of the learned classifier. This provides one solution to the overfitting problem, as the algorithm can use the prior distribution to regularize the classifier.
6.2 RELATION BETWEEN DM, ML AND STATISTICS
Figure 6.1 Relation between DM, ML and statistics
Figure 6.1 shows the relation between DM, ML and statistics. DM can
help to improve IDS by employing one or more of the following techniques:
1. Data summary: Statistics which includes finding outliers.
2. Visualization: Presenting a graphical summary of the acquired
answers.
3. Clustering of the data into categories.
4. Rule Discovery: Defining normal activity and enabling to discover
anomalies.
5. Classification: Predict the category to which a particular dataset
belongs.
Using DM techniques it is convenient to extract patterns and examine issues relating to their feasibility, usefulness, effectiveness and scalability. A few specific things that DM can contribute to the design of IDS are:
1. Remove normal activity from the alarm data set and allow analysis to focus on real attacks.
2. Identify false alarms and insignificant signatures.
3. Find anomalous activity that uncovers a real attack.
4. Identify important, interesting ongoing patterns in time series data
set.
Some of the benefits of using DM Techniques are
1. Large data sets may contain valuable implicit information that can be
discovered automatically.
2. It can tackle applications that are difficult to develop using traditional manual programming.
3. Classification of security issues involves a vast amount of data that
needs to be analyzed and DM is well suited to discover interesting
patterns.
Use of DM approaches in developing IDS
1. IDS are difficult to program using ordinary programming languages.
2. ML is suitable as it is adaptive and dynamic in nature.
3. The environment in which an IDS works depends on personal preferences; hence the ability of computers to learn and improve their performance is valuable, though challenging.
Research shows that many ML techniques can be used for data classification, and popular supervised learning techniques have been shown to give high detection accuracy for IDS. A. M. Turing identified learning as a precondition for intelligent systems. ML is generally used for automatic computing procedures that are based on logical or binary operations; such procedures learn a task from a series of examples, and in my research work the use of ML is concerned with classification. DT approaches the problem by applying a sequence of logical steps and is capable of representing the most complex problems given sufficient data, which means an enormous data set needs to be collected. GA and Inductive Logic Procedures (ILP) are currently active research areas that allow dealing with more general types of data; they are helpful when the number and type of attributes vary and additional layers of learning are introduced. Some of the main characteristics of ML and DM are:
1. ML focuses on prediction, based on known properties learnt from the training data.
2. DM focuses on the discovery of previously unknown properties in the data.
3. DM uses many ML methods with a slightly different goal in mind.
4. ML employs DM methods as unsupervised learning or as a
preprocessing step to improve learner accuracy.
5. In ML, the performance is usually evaluated with respect to the ability to reproduce known knowledge.
The main aim of ML algorithms is to generate classifying expressions
simple enough to be understood by the users. Like statistical approaches,
background knowledge may be used in development but operation is assumed to be
automatic without user intervention.
A wide range of real-world applications are discussed in the statistical analysis and DM communities. Statistical techniques usually assume an underlying distribution of the data and require the elimination of data instances containing noise. Statistical methods, though computationally intense, can be applied to analyze the data, and they are widely used to build behavior-based IDS. The behavior of the system is measured by a number of variables sampled over time, such as the resource usage duration, the number of processors, and the memory and disk resources consumed during a session. The model keeps averages of all the variables and detects whether thresholds are exceeded based on the standard deviation of each variable. Very few online (real-time) network IDS approaches have been proposed so far. Subaie [132] uses Hidden Markov Models (HMM) over NN in anomaly intrusion detection to classify normal network activity and attacks using a large training dataset; the approach was evaluated by analyzing how it affected the classification results. The authors in [133] propose a hybrid intelligent system using DT, SVM and Fuzzy SVM for anomaly detection (unknown or new attacks). Their results show that the hybrid SVM approach improves the performance for all the classes when compared to a plain SVM approach.
ML is concerned with building systems that learn from experience and improve their performance. Classification is one of the standard tasks in ML. A classifier is a function F(x) that assigns a class label C from a finite set of labels; given a character image, for example, the classifier might identify the letter it depicts. Classifiers can be constructed automatically, and one method to build them is to use supervised ML techniques that are executed in two stages: a learning phase and a testing phase.
A classifier is a function F(i) : I → C, where I is an instance space of input vectors and C = {C1, C2, …, Cn} is a class space. A binary classifier F_b is a classifier mapping the instance space I to a binary class space: F_b(i) : I → {P, N}.
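The definition of F_b can be written directly as code; the linear threshold form below is only one illustrative choice of classifier, and all names are assumptions for the example:

```python
from typing import Callable, Sequence

Label = str  # class space C = {"P", "N"} for the binary case

def make_threshold_classifier(weights: Sequence[float],
                              bias: float) -> Callable[[Sequence[float]], Label]:
    """Return a binary classifier F_b(i): I -> {P, N} as a plain function.
    The linear score w . i + b is an illustrative choice of F, not the only one."""
    def f_b(i: Sequence[float]) -> Label:
        score = sum(w * x for w, x in zip(weights, i)) + bias
        return "P" if score >= 0 else "N"
    return f_b

clf = make_threshold_classifier([1.0, -2.0], bias=0.5)
```

Any function with this signature, including an SVM or RVM decision function, fits the definition of F_b.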
SVM, proposed by Cortes and Vapnik [134], is a supervised learning algorithm that is used increasingly in IDS. The classification performance of the SVM model is better than that of other classification methods, such as ANN. A benefit of SVMs is that they learn very effectively from high dimensional data. Rui [135] uses an incremental RVM to detect intrusions. The features selected by this method prove to be effective and also decrease the space density of the data. This improves the generalization performance of the RVM, and the results are better than those of standard RVM and SVM, which guarantees the reliability of using an RVM-based approach for designing IDS. RVM has better generalization performance than SVM due to the generation of fewer support vectors.
RVM is capable of delivering a fully probabilistic output, and it has proved to have performance nearly identical to, if not better than, that of SVM in several benchmarks. Di He [136] proposes an IDS approach based on the RVM where a Chebyshev chaotic map is introduced as the inner training noise signal. The results show that the approach can reach higher detection probabilities under different kinds of intrusions while reducing the computational complexity efficiently.
6.3 SVM TRAINING AND CLASSIFICATION
The SVM is one of the most successful classification algorithms in the DM area. It is a supervised learning algorithm that is used increasingly in IDS, and its classification performance is better than that of other classification methods, such as ANN. The benefits of SVM are:
1. They learn very effectively with high dimensional data.
2. Maps input feature vectors into a higher dimensional feature space
through some nonlinear mapping.
3. Can learn a larger set of patterns and be able to scale better, because
the classification complexity does not depend on the dimensionality
of the feature space.
SVMs also have the ability to update the training patterns dynamically whenever a new pattern appears during classification. The main disadvantage is that an SVM can only handle binary classification, whereas intrusion detection requires multi-class classification.
SVM algorithms are binary classifiers, which is sufficient to distinguish between normal and intrusive data. Recent SVM algorithms support multi-class learning: the approach combines several two-class SVMs, and for each SVM the training data is partitioned into two classes, one representing an original class and the other representing the attacks. It is also necessary to specify an upper bound parameter C, which must be determined experimentally. This results in a cross-validation procedure, which is wasteful both in computation and in data.
SVM uses the kernel trick to apply linear classification methods to nonlinear classification problems. SVM tries to separate two classes of data points in a multi-dimensional space using a maximum margin hyperplane, that is, the hyperplane that has the maximum distance to the closest data points of both classes. The problem of learning SVMs is theoretically well founded and well understood. They are particularly useful for application domains with a high number of dimensions, such as text classification, image recognition, bioinformatics or medical applications. The disadvantage of these methods is that the models are difficult to interpret.
Suppose empirical data is given:

(x_1, y_1), …, (x_n, y_n) ∈ X × Y    (6.1)

where X is a nonempty set whose elements x_i are the predictor variables, and the y_i ∈ Y are called the response variables.
No assumptions are made on the domain X, and in order to generalize to unseen data points some additional structure is required. In the case of binary classification, given a new input x ∈ X, we predict the corresponding y ∈ {±1}; that is, we choose y such that (x, y) is in some sense similar to the training examples. A similarity measure is therefore required on X and on {±1}. For X, a function

k : X × X → R, (x, x′) ↦ k(x, x′), for all x, x′ ∈ X    (6.2)

k(x, x′) = ⟨f(x), f(x′)⟩    (6.3)

where f maps into some dot product space H, called the feature space. The similarity measure k is usually called a kernel, and f is called its feature map. The main advantage of using such a kernel is the ability to construct algorithms in dot product spaces using the similarity measure.
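One standard kernel satisfying Equation (6.3) is the Gaussian RBF, for which the feature map f is implicit (and infinite dimensional). A minimal sketch, with the bandwidth gamma chosen arbitrarily for illustration:

```python
from math import exp

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian RBF kernel k(x, x') = exp(-gamma * ||x - x'||^2).
    It equals <f(x), f(x')> for an implicit feature map f into an
    infinite-dimensional dot product space."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return exp(-gamma * sq_dist)

# Kernel behavior: k(x, x) = 1, and similarity decays with distance.
same = rbf_kernel([1.0, 2.0], [1.0, 2.0])
near = rbf_kernel([1.0, 2.0], [1.1, 2.1])
far  = rbf_kernel([1.0, 2.0], [5.0, 6.0])
```

The kernel value plays exactly the role of the similarity measure k(x, x′) discussed above: nearby points score close to 1, distant points close to 0.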
The detection of intruders in a data set can be realized using many methods that identify an unknown pattern from a set of known patterns. SVMs are supervised learning methods used for classification. SVM is a kernel method, and the selection of the kernel function used in an SVM is crucial in determining its performance. The idea of SVM is to identify the maximum margin using kernel methods: the data is first implicitly mapped to a high dimensional kernel space, the maximum margin classifier is determined in that space, and the corresponding decision function in the original space can be nonlinear.
Figure 6.2 (a) Original data in the input space; (b) mapped data in the feature space
The nonlinearly separable data in the input space becomes linearly separable in the kernel-induced feature space, as illustrated in Figure 6.2. Consider a classification problem where the discriminant function is nonlinear, as illustrated in Figure 6.2 (a). Applying a mapping function f(x) into the feature space, the data under consideration can become linearly separable, as illustrated in Figure 6.2 (b). The aim of SVM classification is to find an optimal hyperplane separating relevant and irrelevant vectors by maximizing the size of the margin between the classes. The user can initially construct the training set by selecting samples from several validation data sets.
In my research work, all known IP addresses are considered normal, and by providing a bigger training set the accuracy and invariance would increase. The IDS uses an ML approach, and an effort is made to automate intrusion detection in large data sets, thereby improving retrieval accuracy. Similarly, in the absence of class labels, unsupervised clustering can be employed. Intrusion detection depends on a similarity measure; however, classification can also be performed using techniques that neither require nor make use of similarity measures.
During the training process, specifying the kernel parameter is important; if it is too small, generalization performance may suffer from overfitting. If a sufficient data set is available, the kernel parameter can be found using cross validation: single elements are removed from the data set one at a time, the SVM is trained on the remaining elements, and it is then tested on the removed element. Tight bounds on the generalization error can be obtained using the approximation that the set of Support Vectors (SV) does not change when single patterns are removed. The SV are the training points that lie closest to the separating hyperplane; the perpendicular distance between the separating hyperplane and the hyperplane through these closest points is the margin. Recent model selection strategies give a reasonable estimate for the kernel parameter based on theoretical arguments, without the use of validation data, and over a limited number of datasets these strategies appear to work well. In real-life datasets, the data points are often not labeled. This is of particular importance in situations where labeling data is expensive or the dataset is large and unlabeled. SVM constructs its hypothesis using the subset of the data containing the most informative patterns; these patterns are good candidates for active or selective sampling techniques, which would predominantly request the labels of the patterns that will become SV.
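The leave-one-out procedure described above can be sketched generically; `train` and `err` are placeholder callables, not the actual SVM training and error functions used in this work:

```python
def loo_error(data, train, err):
    """Leave-one-out cross validation: remove one element at a time, train
    on the rest, test on the held-out element, and return the mean error."""
    total = 0.0
    for i in range(len(data)):
        held_out = data[i]
        rest = data[:i] + data[i + 1:]
        model = train(rest)            # placeholder for SVM training
        total += err(model, held_out)  # placeholder for the test error
    return total / len(data)

# Toy example: the "model" is the mean of the training labels and the
# error is the squared deviation of the held-out label from it.
from statistics import mean
labels = [1.0, 1.0, 1.0, 5.0]
e = loo_error(labels, train=mean, err=lambda m, y: (m - y) ** 2)
```

In kernel parameter selection, the loop is repeated for each candidate parameter value and the value with the lowest leave-one-out error is kept.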
SVM searches for SV, which are data points found to lie at the edge of an area in space that forms a boundary between one class of points and another. The SV are used to identify a hyperplane that separates the classes. SVM modeling deals with these support vectors rather than the whole training dataset, so the size of the training set is not an issue. If the data is not linearly separable, kernels are used to map the data into higher dimensions so that the classes become linearly separable.
Many researchers have proposed SVM as a novel technique for developing IDS. SVMs map input feature vectors into a higher dimensional feature space through some nonlinear mapping, and are developed on the principle of Structural Risk Minimization (SRM). SRM seeks to find a hypothesis h with the lowest probability of error, whereas the traditional learning techniques for intrusion detection are based on the minimization of the empirical risk, which attempts to optimize performance on the learning set. Computing the hyperplane that separates the data points to train an SVM leads to a quadratic optimization problem. SVM uses a linear separating hyperplane to create a classifier, but not all problems can be separated linearly in the original input space; SVM uses kernels to solve this problem. The kernel functions transform linear algorithms into nonlinear ones via a map into feature spaces; they include polynomials, Radial Basis Functions (RBF), two-layer sigmoid neural nets, etc. The user may provide one of these functions at the time of training the classifier, which selects the SV. SVM classifies data using these SV, which are members of the set of training inputs that outline a hyperplane in feature space.
6.4 THE KERNEL MAPPING
The key issue of SVMs is to use f(x) to map the data into a higher
dimensional space. Covers theorem guarantees that any data set becomes arbitrarily
separable as the data dimension grows. Finding such nonlinear transformations is far
from trivial. To achieve this task, a class of functions called kernels is used. Roughly
154
speaking, a kernel K(x,y) is a real-valued function K : X X -> R for which there
exists a function x:X -> Z, where Z is a real vector space, with the property
K(x,y) =f(x)
T
f(y). This function f(x) is precisely the mapping in Equation (6.2).
The kernel K(x,y) acts as a dot product in the space Z. In the SVM literature X and
Z are called, respectively, input space and feature space. Kernel methods give a
systematic and principled approach to ML and the good generalization performance
achieved can be readily justified using statistical learning theory or Bayesian
arguments.
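To make the kernel property concrete, here is a small sketch (not from the thesis) checking that the degree-2 polynomial kernel on R^2 equals an explicit dot product f(x)^T f(y) in a 3-dimensional feature space; the feature map used is the standard one for this kernel:

```python
import math

def poly_kernel(x, y):
    """Degree-2 homogeneous polynomial kernel: K(x, y) = (x . y)^2."""
    return sum(a * b for a, b in zip(x, y)) ** 2

def feature_map(x):
    """Explicit map f of a 2-D input into the 3-D feature space."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

x, y = (1.0, 2.0), (3.0, 0.5)
lhs = poly_kernel(x, y)                                            # kernel in input space
rhs = sum(a * b for a, b in zip(feature_map(x), feature_map(y)))   # dot product in feature space
print(lhs, rhs)   # both 16.0 (up to floating point)
```

The kernel computes the feature-space dot product without ever constructing f(x) explicitly, which is why working in very high dimensional feature spaces does not increase the computational cost.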
The emphasis will be on using RBF kernels which generate RBF
networks. This approach is general since other types of learning machines can be
readily generated with different choices of kernel. RBF networks have been widely
used because they exhibit good generalization and universal approximation. This is
achieved by using RBF nodes in the hidden layer. A new approach to designing
RBF networks based on kernel methods is applied and this technique has a number
of advantages. The emphasis is on classification and novel intrusion detection. The
kernel representation of data amounts to a nonlinear projection of data into a high
dimensional space where it is easier to separate the two classes of data. With a
suitable choice of kernel, the data can become separable in feature space despite
being non-separable in the original input space.
In real-time problems the task is often not to classify but to detect novel or
abnormal instances. In the current research work, which involves classification of
intrusions, the system cannot correctly detect an intrusion whose abnormal
behavior is distinct from all normal behavior in the training set. Novelty
detection would potentially highlight such data as abnormal by modeling the support
of the data distribution. The main objective is to create a binary valued function
which is positive in those regions of input space where the data predominantly lies
and negative elsewhere. The approach is to find a hypersphere with a minimal radius
R and centre a which contains most of the data; novel test points lie outside the
boundary of this hypersphere.
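A minimal sketch of this hypersphere idea, assuming the centre a is taken as the mean of the normal training data and the radius R as a quantile of the training distances (the function names and the tiny data set are illustrative, not the thesis implementation):

```python
import math

def center(data):
    """Centre a: componentwise mean of the training vectors."""
    n = len(data)
    return [sum(x[i] for x in data) / n for i in range(len(data[0]))]

def dist(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def fit_hypersphere(data, quantile=0.95):
    """Radius R chosen so that roughly `quantile` of training points fall inside."""
    a = center(data)
    d = sorted(dist(x, a) for x in data)
    r = d[min(int(quantile * len(d)), len(d) - 1)]
    return a, r

def is_novel(x, a, r):
    """Binary valued function: a point outside the sphere is flagged as novel."""
    return dist(x, a) > r

normal = [[0.0, 0.0], [0.1, -0.1], [-0.2, 0.1], [0.05, 0.2], [0.1, 0.1]]
a, r = fit_hypersphere(normal, quantile=1.0)
print(is_novel([5.0, 5.0], a, r))   # True: far outside the sphere
print(is_novel([0.0, 0.1], a, r))   # False: inside the sphere
```

A kernelized version would replace `dist` with distances computed through a kernel, giving a sphere in feature space rather than input space.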
Some of the unique features of SVM and Kernel Methods are
1. They are explicitly based on a theoretical model of learning
2. They come with theoretical guarantees about their performance
3. They have a modular design that allows one to separately implement
and design their components
4. They are not affected by local minima
5. They do not suffer from the curse of dimensionality
The major advantage of kernel based systems is that, once a valid
kernel function is selected, one can work in spaces of practically any dimension
without increasing the computational cost. The user need not even know which
features are being used. A further advantage is that a kernel can be designed for a
particular problem and applied directly to the data without a separate feature
extraction process, which matters when a large data set with many features is
available. The importance of kernel based learning methods is that any valid kernel
can be used with a kernel based algorithm. The R interfaces provided in e1071 [137]
and klaR [138] include an interface to SVM along with other classification tools
such as Regularized Discriminant Analysis (RDA). libsvm is integrated software for
support vector classification, and R offers an interface to it; libsvm provides a
robust and fast SVM implementation and produces results for most classification and
regression problems. Most of the libsvm and klaR SVM code is in C++ and hence can
be enhanced by modifying the code and adding new kernels.
6.5 RVM TRAINING AND CLASSIFICATION
RVM is a sparse ML algorithm that is similar to the SVM in many
respects, and is another area of interest in the research community as it provides
a number of advantages. RVM is based on a Bayesian formulation of a linear model
with an appropriate prior that results in a sparse data representation. As a
result, it can generalize well and provide inferences at very low computational
cost. The SVM has several desirable properties: it fits functions in high
dimensional feature spaces through the use of kernels, and with the possibly large
space of functions available in feature space, good generalization performance can
be achieved. It is sparse, meaning only a subset of training examples is retained
at runtime, thereby improving computational efficiency. Although relatively sparse,
SVMs make unnecessary use of basis functions, as the number of SV required
typically grows linearly with the size of the training data set. SVM outputs a
point estimate in regression and a binary decision in classification; as a result
it is difficult to estimate the conditional distribution to capture the uncertainty
during prediction. In the SVM the kernel function must be a continuous symmetric
kernel of a positive integral operator to satisfy the Mercer condition, whereas the
RVM places no such restriction on its kernel. While maintaining its classification
accuracy, RVM has the ability to yield a decision function that is much sparser
than that of SVM. This leads to a significant reduction in the computational
complexity of the decision function, thereby making it more suitable for real time
applications.
The RVM produces a function comprised of a set of kernel functions,
also known as basis functions, and a set of weights. This function represents a
model of the system presented to the learning process through a training data set.
The kernels and weights are calculated by the learning process, and the model
function is defined as the weighted sum of kernels. From the set of training
vectors the RVM selects a sparse subset of input vectors which are deemed relevant
by the probabilistic learning scheme. This subset is used to build a function that
estimates the output of the system from the inputs. These relevant vectors form the
basis functions and comprise the model function.
RVM is a probabilistic sparse kernel model identical in functional form
to the SVM, making predictions based on a function of the form

    y(x) = Σ_{n=1}^{N} α_n K(x, x_n) + α_0                    (6.4)

where the α_n are the model weights and K(·,·) is a kernel function. It adopts a
Bayesian approach to learning by introducing a prior over the weights α,

    p(α | λ) = Π_{m=1}^{N} N(α_m | 0, λ_m^{-1})               (6.5)
governed by a set of hyper-parameters λ, one associated with each weight, whose
most probable values are iteratively estimated from the data. Sparsity is achieved
because in practice the posterior distribution over many of the weights is sharply
peaked around zero. Furthermore, unlike in the SVM classifier, the non-zero weights
in the RVM are not associated with examples close to the decision boundary, but
rather appear to represent prototypical examples. These examples are termed
Relevance Vectors (RV). The training function returns an object containing the
model parameters along with indexes for the RV, the kernel function and the
hyper-parameters used.
RVM has been applied in many areas, such as object detection and
classification, target detection in images and classification of
microcalcifications in mammograms.
In the classification phase, each network record selected from the
feature selection phase is classified as normal data or attack data. This phase
uses two main datasets, one for training and one for testing. In the first phase,
training is performed using RVM with a set of network records whose answer classes
are known. Based on the training, the IDS model can classify the data in each
record into normal network activity or one of the main attack types. The model is
then tested with a new, untrained dataset in which each record was captured in a
real time environment in the college research lab.
For an input vector x, an RVM classifier models the probability
distribution of its class label C ∈ {−1, +1} using logistic regression as

    p(C = 1 | x) = 1 / (1 + exp(−f_RVM(x)))                   (6.6)

where the classifier function f_RVM(x) is given by

    f_RVM(x) = Σ_{i=1}^{N} α_i K(x, x_i)                      (6.7)
where K(·,·) is a kernel function and x_i, i = 1, 2, ..., N, are the training
samples. The parameters α_i, i = 1, 2, ..., N, in f_RVM(x) are determined using
Bayesian estimation, introducing a sparse prior on the α_i. The parameters α_i are
assumed to be statistically independent, obeying a zero-mean Gaussian distribution
with variance λ_i^{-1}, which forces them to be highly concentrated around zero,
leading to very few nonzero terms in f_RVM(x).
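Equations (6.6) and (6.7) can be sketched directly. The relevance vectors and weights below are hand-picked for illustration, not produced by the Bayesian estimation described above:

```python
import math

def rbf_kernel(x, y, width=1.0):
    """Gaussian kernel K(x, y) with kernel width `width`."""
    d2 = sum((p - q) ** 2 for p, q in zip(x, y))
    return math.exp(-d2 / (2.0 * width ** 2))

def f_rvm(x, relevance_vectors, alphas, width=1.0):
    """Equation (6.7): f_RVM(x) = sum_i alpha_i * K(x, x_i).
    In a trained RVM most alpha_i are exactly zero (sparsity)."""
    return sum(a * rbf_kernel(x, xi, width)
               for a, xi in zip(alphas, relevance_vectors) if a != 0.0)

def p_class1(x, relevance_vectors, alphas, width=1.0):
    """Equation (6.6): logistic sigmoid of the decision function."""
    return 1.0 / (1.0 + math.exp(-f_rvm(x, relevance_vectors, alphas, width)))

# Hypothetical relevance vectors and weights (sparse: one weight is zero).
rvs    = [(0.0, 0.0), (1.0, 1.0), (4.0, 4.0)]
alphas = [2.0, 0.0, -2.0]

p = p_class1((0.2, 0.1), rvs, alphas)
print(round(p, 3))   # greater than 0.5: pulled toward the positive relevance vector
```

Because the zero-weight terms drop out of the sum, prediction cost scales with the number of relevance vectors rather than the full training set size, which is the sparsity advantage the text describes.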
6.6 METHODOLOGY AND PROPOSED ARCHITECTURE
Over 90% of Internet traffic uses TCP. Because of its widespread use
and impressive growth, this research focuses on the detection of anomalous
behavior within TCP traffic. Exploring the TCP packet attributes enables a
classifier to identify normal and abnormal activity on a packet-by-packet basis.
From these attributes a DT is built which enables different attacks and violations
to be identified and classified. The process of building a classifier model using
RVM is depicted in Figure 6.3.
Figure 6.3 Architecture of the RVM model
In my research work, the design of an IDS is treated as a traditional
classification problem where each abnormal behavior corresponds to a class label.
Researchers have found that SVM classifiers are ideal for designing IDS, and hence
this kind of classification algorithm was chosen. Since SVM is a binary classifier,
the multi class problem needs to be decomposed into binary problems. Traffic in the
network changes continuously as users log in and make use of the Internet. For
capturing packets in real time, the JPCAP and WinPcap tools are used to collect the
information being transmitted. A 30 minute data set is collected which contains
both normal and attack data. In order to mine the contextual information contained
in the data set, it is required to detect the attacks and extract the required
information.
The procedure is as follows:
1. Collect data set with normal and attack behavior.
2. Extract features and derive a subset of features that are necessary and sufficient
to be used in a classifier
3. Train SVM and use SVM for classification
[Figure 6.3 comprises three stages built from the following components: real time
data, data preprocessing, log files, the CPOD algorithm, an outliers vector,
traffic data, classifier inference using RVM, and the alarm unit.]
The network data is collected from an interface capable of
capturing information flowing within the local network. For example, anomalies can
be detected on a single machine, a group of networks, a switch or a router. For the
current research work the TCP/IP packets are collected in real time from the
research lab network and dumped for further processing. The Data Preprocessing
phase handles the conversion of raw packet or connection data into a format that
the algorithms can use and stores the results in the knowledge base. Rather than
operating on a raw network dump file, the algorithm uses summary information to
perform the analysis. The data is preprocessed to generate summary lines about each
connection found in the dump file. The resulting summary file is then parsed and
processed by the algorithm to assign a score to each data point or time point, with
a higher score indicating a higher possibility of being an outlier or a change
point.
The Log File stores the rules produced by the detection algorithm
for the further mining process. It may also hold information for the preprocessor,
such as patterns for recognizing attacks and conversion templates. This training
data is responsible for generating the initial rule sets needed for deviation
analysis, which can be triggered automatically based on time or the amount of
preprocessed data available.
The proposed Outlier Detection algorithm examines the network data,
creates a description of the differences and stores it in the outlier vectors for
further reference. If a deviation is detected, it signals the alarm unit. One
strategy for invoking the deviation analyzer is to periodically query the outlier
vectors for new profiles; alternatively, the profiler may signal when a new profile
is added. The Alarm Unit is responsible for informing the administrator when the
deviation analyzer reports unusual behavior in the network stream. This can be in
the form of SMS, e-mails, console alerts, log entries etc.
In the data preprocessing step shown in Figure 6.3, packets are
captured using the JPCAP library and information is extracted from each packet,
including the IP header, TCP header, UDP header and ICMP header. The packet
information is then partitioned and formed into records by aggregating information
every 30 minutes. Each record consists of data features considered the key
signature features representing the main characteristics of network data and
activities.
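The aggregation into per-window records can be sketched as follows. This is a simplified illustration: the record fields (packet count, byte count, SYN count) are hypothetical stand-ins, not the actual signature features of Table 3.2:

```python
# Group captured packet summaries into fixed 30-minute windows and
# aggregate simple per-window features per source IP.
WINDOW = 30 * 60  # window length in seconds

packets = [
    # (timestamp_sec, src_ip, payload_bytes, is_syn) -- made-up sample capture
    (10,   "10.0.0.1", 120, True),
    (300,  "10.0.0.1", 400, False),
    (50,   "10.0.0.2",  60, True),
    (2000, "10.0.0.1", 200, True),
]

records = {}
for ts, ip, size, syn in packets:
    key = (ts // WINDOW, ip)            # (window index, source IP)
    rec = records.setdefault(key, {"packets": 0, "bytes": 0, "syn_count": 0})
    rec["packets"] += 1
    rec["bytes"] += size
    rec["syn_count"] += syn             # bool counts as 0/1

for (win, ip), rec in sorted(records.items()):
    print(win, ip, rec)
```

Each resulting record is one row of classifier input; a real pipeline would emit many more fields per window.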
Experiments were done using the R statistical framework, and the
performance of the process at each level was measured. In order to train the SVM,
unique IP addresses with the 19 different feature sets listed in Table 3.2 were
used, with 3043 training instances and 375 and 459 test instances for normal and
attack respectively.
6.6.1 R Statistical Tool
The R statistical tool is used for the implementation, and it is a known
fact that different choices of kernel, kernel parameters and method of solving the
quadratic problems result in very different models. The R software has built-in
functions that can be used for solving the quadratic problems, and its API
facilities make it extremely flexible and capable of performing evaluation
measurements. DM combines concepts, tools and algorithms from ML and Statistics for
the analysis of huge datasets. This allows users to gain insight, understanding and
knowledge of a data set, and many commercially available products offer high levels
of analytical tooling. R is ideally suited for many of the challenging tasks
associated with DM; it offers a complete statistical computing product and a
programming language for the skilled statistician.
R provides the SVM in e1071 as an interface to libsvm, which provides a
very efficient and fast implementation. ksvm, provided in kernlab [139] for kernel
learning, is integrated into R so that different kernels can easily be explored.
kernlab in R currently has an implementation of the RVM which can be used for
regression. A new class called kernel is also introduced, and kernel functions are
objects of this class. An issue with SVM is that parameter tuning is not an easy
job and is computationally expensive. The usual approach is to build multiple
models with different parameters and choose the one with the lowest expected error;
this can lead to suboptimal results, though, unless a quite extensive search is
performed. Research has shown that it is possible to predict the performance of
models under different parameter settings. The task of learning has been found to
be difficult with the large set of available collection features; by changing the
representation of the features, the ability to reason and learn improves
significantly. Learning is easier if the entities are projected into a higher
dimensional space, which can be done by computing dot products of the data.
Different kernels result in different projections and have demonstrated excellent
performance on many ML and pattern recognition tasks. However, kernel methods are
sensitive to the choice of kernel, may be intolerant to noise, and cannot deal well
with missing data or data of mixed types.
Choosing a suitable kernel is vital, as different kernels result in different
mappings of the data space to the feature space. The R tool supports many kernels;
for this work a linear and an RBF kernel are used. Results show that the RBF kernel
works better for detection, while the linear kernel could not perform a mapping to
a higher dimensionality space. Because it has more parameters, the polynomial
kernel was not used.
kernlab in R is an extensible package for kernel based ML methods
which provides a framework for creating and using kernel based algorithms. The
package contains dot product kernels, implementations of SVM and RVM, Gaussian
processes, a ranking algorithm, kernel PCA, kernel CCA, kernel feature analysis and
a range of clustering algorithms.
6.6.2 RBF Networks
In my research work the RBF kernel is used. An RBF network consists of three
layers, as described below:
1. Input Layer: This layer broadcasts the values of the input vector to
each of the units in the hidden layer. One neuron in the input layer
corresponds to one predictor variable. If the values are categorical
variables, n-1 neurons are used where n is the number of categories in
the input vector.
2. Hidden Layer: Each unit in this layer produces an activation
based on its associated radial basis function. The hidden layer
consists of a variable number of neurons, each containing a radial
basis function centered on a point with the same dimensions as the
predictor variables.
3. Output Layer: Each unit in this layer computes a linear combination
of the activations of the hidden units. The layer has a weighted sum
of outputs from the hidden layer to form the network outputs.
    f(x) = Σ_{j=1}^{m} w_j h_j(x)

    h_j(x) = exp(−(x − c_j)^2 / r_j^2)
Figure 6.4 RBF networks
Figure 6.4 shows an RBF network where f(x), the function corresponding to the
output unit, is a linear combination of the m radial basis functions h_1(x),
h_2(x), ..., h_m(x). Each h_j(x) is a Gaussian activation function with parameters
r, which could be the radius or standard deviation, and c, the center or average
taken from the input vectors, defined separately for each RBF unit. The learning
process is based on adjusting the parameters of the network.
This can be achieved by adjusting the three parameters
1. Weight w between the hidden nodes and the output nodes
2. Center c of each neuron of the hidden layer
3. Unit width r.
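The three-layer structure described above amounts to a short forward pass. The centers, radii and weights below are fixed by hand for illustration rather than learned:

```python
import math

def rbf_forward(x, centers, radii, weights):
    """RBF network output f(x) = sum_j w_j * h_j(x), where
    h_j(x) = exp(-||x - c_j||^2 / r_j^2)."""
    out = 0.0
    for c, r, w in zip(centers, radii, weights):
        d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, c))   # squared distance to center
        out += w * math.exp(-d2 / r ** 2)                  # hidden activation times weight
    return out

# Two hidden units with hand-picked parameters (hypothetical values).
centers = [(0.0, 0.0), (2.0, 2.0)]
radii   = [1.0, 1.0]
weights = [1.0, -1.0]

print(rbf_forward((0.0, 0.0), centers, radii, weights))  # near +1: close to the first center
print(rbf_forward((2.0, 2.0), centers, radii, weights))  # near -1: close to the second center
```

Training adjusts the three parameter groups listed above (weights w, centers c and widths r); the forward pass itself stays the same.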
The tests compared the linear kernel model with the RBF kernel by calculating the
performance over all classes. The RBF kernel resulted in better performance, but
some bad choices for the RBF kernel parameters reduce accuracy to below 60%.
Sufficient tests must therefore be conducted to determine good parameters, as this
can determine the performance of the IDS.
Changing the values of h(x) and the weights affects the model, and in order to
find the optimal parameters it is necessary to try different values and compare the
results. In this work the best parameter setting for each of the feature sets of
Table 6.1 was determined.
As specified in Table 6.1, the input data set x to the classifier (SVM or
RVM) is collected over a window size of 30 minutes. The window size, chosen
empirically for this research work, was large enough to cover all attacks in the
dataset. Data was preprocessed for the training samples by applying a Gaussian
function to detect the centers of all manually identified attacks. For applying
RVM, several parameters need to be tuned for best performance during the training
phase. The most important are the type of kernel function to be used (i.e.,
polynomial vs RBF) and its associated parameter (i.e., the order p for the
polynomial, or the kernel width s for RBF). To determine these, a cross validation
procedure was applied to the training set to fix the parameter settings for the RVM
classifier model.
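The cross validation procedure can be sketched generically. Here `fit` and `error` are placeholders standing in for the actual RVM training and evaluation routines, and the toy data is illustrative:

```python
import random

def k_fold_indices(n, k, seed=0):
    """Split indices 0..n-1 into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(data, params, fit, error, k=5):
    """Return the parameter value with the lowest mean validation error."""
    folds = k_fold_indices(len(data), k)
    best = None
    for p in params:
        errs = []
        for fold in folds:
            held = set(fold)
            val = [data[i] for i in fold]
            train = [data[i] for i in range(len(data)) if i not in held]
            model = fit(train, p)
            errs.append(error(model, val))
        mean_err = sum(errs) / len(errs)
        if best is None or mean_err < best[1]:
            best = (p, mean_err)
    return best[0]

# Toy 1-D example: pick the threshold p that best separates two classes.
data = [(x / 10.0, x >= 5) for x in range(10)]
fit = lambda train, p: p                                   # "model" is just the threshold
error = lambda p, val: sum((x >= p) != y for x, y in val) / len(val)
print(cross_validate(data, [0.2, 0.5, 0.8], fit, error))   # 0.5 separates the classes
```

For an RVM, `params` would range over candidate kernel widths s or polynomial orders p, and `fit`/`error` would wrap the actual training and test routines.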
For each attack in the test set, the attack detection process was carried out
in the following steps:
1. A trained classifier (RVM or SVM) was applied with a threshold to classify each
record in the dataset as NORMAL or ATTACK.
2. A confusion matrix of the potential attacks was generated.
3. The detected attacks were grouped into attack objects.
4. The performance of the detection algorithm was evaluated using the Receiver
Operating Characteristic (ROC) curve.
6.7 EXPERIMENTAL RESULTS
ROC [140] curves plot the correct detection rate, the True Positive (TP)
rate, versus the average number of False Positives (FP) per dataset, varied over
the continuum of the decision threshold.
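Each point on an ROC curve is the (false positive rate, true positive rate) pair obtained at one decision threshold; sweeping the threshold over the classifier scores traces the curve. A small sketch with made-up scores:

```python
def roc_points(scores, labels):
    """For each candidate threshold, return (false_pos_rate, true_pos_rate)."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and not y)
        points.append((fp / neg, tp / pos))
    return points

# Hypothetical classifier scores (higher = more attack-like) and truth labels.
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [True, True, False, True, False, False]
for fpr, tpr in roc_points(scores, labels):
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

The lowest threshold always yields the (1, 1) corner, and a classifier whose curve hugs the top-left corner dominates one nearer the diagonal.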
Figure 6.5 ROC curve for the data
Figure 6.6 Performance chart of SVM
Table 6.1 Error obtained by the RVM with different parametric values

  Poly Kernel   Degree 1   Degree 2   Degree 3   Degree 4
  Error Rate    0.092      0.084      0.063      0.042

  RBF Kernel    σ = 1      σ = 4      σ = 6      σ = 9
  Error Rate    0.072      0.064      0.043      0.039
Table 6.2 Error obtained by the SVM with different parametric values

  Poly Kernel   Degree 1   Degree 2   Degree 3   Degree 4
  Error Rate    0.102      0.093      0.081      0.052

  RBF Kernel    σ = 1      σ = 4      σ = 6      σ = 9
  Error Rate    0.094      0.072      0.051      0.043
The training dataset has a total of 121 examples of the ATTACK class and 1291
examples of the NORMAL class. This dataset was collected after the preprocessing
phase, and these examples formed the training set. The RVM was trained, and
Tables 6.1 and 6.2 summarize the training results, listing the generalization error
obtained by each classifier under different parametric values. A similar set of
results was also obtained for training the SVM. The best results for the RVM were
obtained with a degree 4 polynomial kernel, while the SVM achieved its best error
level with a degree 5 polynomial kernel.
Table 6.3 Comparison between RVM and SVM models

  Model   Number of training data   Number of vectors   Testing Performance
  SVM     100                       25                  0.76
          500                       129                 0.81
          1000                      230                 0.86
          5000                      540                 0.88
  RVM     100                       17                  0.71
          500                       109                 0.72
          1000                      170                 0.81
          5000                      240                 0.82
At these parametric values, the number of relevance vectors for the RVM
was found to be smaller than the number of support vectors for the SVM. Table 6.3
summarizes the number of support vectors and relevance vectors found from the
training dataset. With their parameters tuned, both the RVM and SVM classifier
models were retrained using all the samples in the training set. The trained
classifiers were subsequently used for performance evaluation.
Figure 6.7 ROC curve obtained for RVM and SVM models
6.8 PERFORMANCE ANALYSIS OF RVM AND SVM
Both the RVM and SVM classifiers were evaluated using all the attacks
in the test dataset. As can be deducted from Figure 6.7 the RVM classifier could
achieve essentially the same performance as the SVM, but with a much reduced
computational complexity. Table 6.4 shows the comparison between the SVM and
RVM models. The value of testing performance from RVM model is effectively
same as that of SVM with lesser support vectors. The performance of the RVM is
also better than the SVM. The current implementation and usage indicates the
success of the algorithm. This gave a unifying view of detection of attacks in real
time network data. Usage of RVM also showed a competitive accuracy by
maintaining ability of sparseness. Experimental results showed that the RVM model
168
achieved essentially the same performance with a much sparser model as a
previously developed SVM model. This much reduced computational complexity in
RVM makes it more feasible for real time processing while designing IDS. The
proposed method is competitive with respect to processing time and allows the use
of selected training data set. The result shows an improvement in RVM
classification performance.
CHAPTER 7
CONCLUSION
This thesis has made contributions to two key research areas, namely
intrusion detection and ML; the contributions apply specifically to the application
of ML to intrusion detection. Several factors were found to affect the results
significantly, and further investigation demonstrated that this is indeed a
critical challenge in designing an IDS. SLFN was capable of detecting a particular
class of intrusion, and the CP algorithm was proposed to optimize the detection of
outliers. This approach was found to be successful and able to detect the class of
intrusion. Since single objective optimization is performed, there was no control
over the classification. To address this limitation RVM was proposed, and class
imbalance has been identified as a significant challenge for intrusion detection.
The overall performance and detection scope of the IDS depend directly
on the feature selection stage. The main focus of this thesis is on mining the
most useful network features for attack detection. To this end, a network feature
classification schema is proposed together with a deterministic feature evaluation
procedure that helps to identify the most useful features that can be extracted
from network packets. The datasets differ, however, in the time of collection, the
size of the network, the throughput, and the type of users that the networks have.
The proposed method uses mathematical, statistical and RVM techniques
to rank the participation of individual features into the detection process. The
presented experimental results empirically confirm that the proposed model can
successfully be applied to mine new features in the detection process.
An ideal dataset for IDS should be labeled at packet level and must
contain a considerable number of attacks. The current work does not differentiate
the final results based on the speed of the attacks. We believe that an interesting
further study would be to analyze the set of features that are appropriate for fast
or slow attacks. To do that, however, the dataset used needs to have an equal
number of attacks in each of the attack categories under study. Multiple datasets
can be considered, and the data sets need to be extracted from a set of diverse
networks.
An extensive review of Artificial Intelligence (AI) applied to intrusion
detection was conducted, and the findings are reported in the literature survey.
Various research studies have adopted ML techniques and evaluated them on the
KDD Cup 99 data set. The results thus obtained vary and are also contradictory.
This motivated an investigation into the causes of the discrepancies, and it was
found that a critical challenge for intrusion detection is the collection of the
data set and the detection of a particular class of intrusion in real time.
SLFN was proposed to optimize the weights and selection of layers in
MLP to better learn and was found to be successful. The system was able to detect
the previously unknown class of intrusion. The data set selected posed several
challenges during the selection of ML algorithms such as:
1. Working with high dimensional data and large memory requirements.
2. Learning speed that gets affected because of very large data set.
3. Feature selection from the large data set.
4. Implementing learning that is incremental / continuous.
5. Detecting new unknown intrusions.
As explained before, combining the CP and OD methods with RVM was
carried out without too much loss of accuracy. The system has proved good with
respect to architecture, data processing, alert aggregation and reporting
mechanisms. The methods proposed here have addressed a critical challenge of
learning from a large data set and provide the user with a set of solutions, which
the user can select and incorporate into an IDS framework as a detection module.
Furthermore, there is always room for improvement in the proposed methods
concerning scalability and performance, which are discussed further in Section 7.2.
The combination of CP and OD was a successful method for improving the
performance of IDS. The results obtained in this thesis support the observation
that combining different methods can improve the performance of IDS. Current
approaches to creating hybrids are likely to succeed because they may yield a
solution with a good classification trade-off. It has been demonstrated in this
thesis that the FPR was comparatively low and that the approach outperformed other
methods.
7.1 MAJOR CONTRIBUTIONS AND NOVELTY
The main contribution of this thesis to the intrusion detection domain is
the demonstrated suitability and application of RVM in building robust and
efficient IDS. A novel framework is developed that addresses three critical issues
affecting the performance of anomaly and hybrid IDS in high speed networks.
The following three issues are addressed:
1. Attack detection coverage
2. Generating less false alarms
3. Efficiency in operation.
As a result of this research, a framework is built to develop efficient IDS.
The framework offers customization and ease of detecting a variety of attacks. The
system can identify the type of attack, and a specific intrusion response mechanism
can be initiated by the user so that the impact of the attack is minimized.
CP and OD are efficient methodologies for building robust and efficient
IDS, and integrating the framework with these two techniques can be used to build
an effective IDS. Using CP and OD as intrusion detectors resulted in very few false
alarms, and attacks can be detected with very high accuracy.
The logging framework developed using JPCAP, WinPcap and Java
can capture network data that are significant for detecting attacks. The framework
can be used for a variety of applications that require an IDS as a plug-in.
Network sessions need to be modeled in order to detect attacks with high
accuracy, and the Feature Extractor can be effectively used to model the events and
select the required features. Using CPOD, attacks can be detected with a smaller
window size and a good selection of threshold. A range of experiments was
performed, showing that in order to detect intrusions effectively it is critical to
model the correlations between multiple features; treating the feature sets as
independent makes the model complex and inefficient, as it affects the attack
detection capability. The framework developed allows specific features to be easily
defined and extracted, which enables the building of effective intrusion detectors.
Our framework is customizable and can be used to build efficient network IDS which
can detect a wide variety of attacks. Experimental results and comparison with
other well known methods for intrusion detection, such as Decision Trees, Naive
Bayes and Support Vector Machines, have shown better performance in terms of
accuracy and detection rate without affecting overall system performance.
The notable part of our research work is the improvement in attack
detection accuracy. Statistical tests using CPOD demonstrate a higher assurance in
detection accuracy. As the system developed is not based on signatures of attacks,
it is capable of detecting novel attacks. Experimental results confirm that our
system, based on the CPOD and RVM methods, can detect attacks at an early stage by
analyzing only a small amount of data, resulting in an efficient system which can
block attacks in real time.
7.2 FUTURE WORK
The task of detecting intrusions in networks is very critical and leaves no
margin for error. Developing successful attack detection requires identifying the
best possible approach, which is an extremely difficult task; developing a single
solution that works for every network and application is a real challenge. In my
research work a novel framework is developed using different methods which perform
better than previously known approaches. In order to improve the overall
performance, domain knowledge is used for selecting better features for training,
which is justified by the critical nature of the task of intrusion detection. An
interesting direction for future research is to develop a completely automatic IDS.
Another area of work is to develop a faster implementation by employing our
approach on multi core processors.
IPS/IRS, which aim at preventing attacks rather than simply detecting
them, are another area that can be explored. This can be achieved by integrating the
IDS with the known security policy of individual networks, which would also help
minimize the false alarms raised by the IDS.
One of our objectives in this work was to detect and classify network
attacks. Future research in this area is definitely needed, and other DM methods can
be incorporated. Studies could also be conducted with more attack types that are
totally new, or variations on existing types. In this vein, other studies could address
the problem of classifying rarely seen attack types such as U2R and R2L. Future
work will explore expanding the output of the individual classifiers so that it
becomes easier to identify the exact source of a given attack. By adding a prediction
layer it is possible to reduce the complexity of carefully tuning the thresholds and
window sizes: the idea is to develop a layer that predicts the next data point with
some probability, learns to readjust its probabilities from current deviations, and
thus enhances accuracy. Although this study makes a contribution to IDS
classification, there are other DM methods, such as memory-based systems, logistic
regression and discriminant analysis, that can be further explored.
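The prediction-layer idea above could, for instance, be realized with simple exponential smoothing: the layer predicts the next data point, measures how far the observed value deviates from the prediction, and readjusts its estimates from that deviation. The class name, parameters and constants below are illustrative assumptions, not part of the implemented system.

```python
class PredictionLayer:
    """Illustrative sketch of a prediction layer: forecasts the next data
    point with exponential smoothing and tracks a smoothed prediction
    error, so that "surprising" points are flagged without manually tuned
    absolute thresholds. (Assumed design, not the thesis implementation.)"""

    def __init__(self, alpha=0.3, beta=0.1, k=3.0):
        self.alpha = alpha   # smoothing weight for the predicted level
        self.beta = beta     # smoothing weight for the deviation estimate
        self.k = k           # deviations beyond k * dev count as surprising
        self.level = None    # current prediction of the next point
        self.dev = 0.0       # smoothed absolute prediction error

    def update(self, x):
        """Feed one observation; return (prediction_error, is_surprising)."""
        if self.level is None:       # first point: just initialize the level
            self.level = x
            return 0.0, False
        error = abs(x - self.level)
        surprising = self.dev > 0 and error > self.k * self.dev
        # readjust the model from the current deviation
        self.level = self.alpha * x + (1 - self.alpha) * self.level
        self.dev = self.beta * error + (1 - self.beta) * self.dev
        return error, surprising
```

Fed a steady stream, the layer reports nothing; a sudden large jump produces an error far above the learned deviation and is flagged, which is the behavior a data-dependent threshold would otherwise have to be tuned for.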
7.3 SIGNIFICANT CHALLENGES AND OPEN ISSUES
1. It is very hard to trace the true source of an attack. If a reliable
method is developed that can trace packets back to their actual
source, many attacks could be prevented. Though some solutions are
available, a global effort is required, which is a real challenge ahead.
The goal is not only to identify the true source but to do so without
affecting the overall performance of the system.
2. New methods based on user profiling can be developed which learn
normal user activity and use the learned model to detect deviations,
if any. Most related works are based on thresholds; hence a detailed
empirical analysis can be performed to develop such an IDS.
3. In the current Internet era, keeping pace with rapidly and ever-
changing networks and applications is still a major task. Research on
IDS must keep up with present-day networks that support wireless
technologies, ad hoc networks and mobile devices. IDS must be
developed in such a way that they can integrate with such networks
and devices, and should also accommodate future advances in a
comprehensible manner.
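The threshold-based user profiling suggested in point 2 could be prototyped as a per-user baseline with a deviation test: learn a user's normal activity level, then flag observations that fall too far from it. The feature (e.g. logins per day), the threshold value and the class below are illustrative assumptions, not part of the system described here.

```python
import statistics

class UserProfile:
    """Sketch of threshold-based user profiling: record observations of
    normal activity, then flag values that lie more than `threshold`
    standard deviations from the learned mean. (Hypothetical prototype,
    not the thesis implementation.)"""

    def __init__(self, threshold=3.0):
        self.threshold = threshold   # allowed deviation in std-dev units
        self.history = []            # observed normal activity values

    def learn(self, value):
        """Record one observation of normal activity (e.g. logins per day)."""
        self.history.append(value)

    def is_deviation(self, value):
        """True if `value` deviates beyond the threshold; needs at least
        two training observations before any decision is made."""
        if len(self.history) < 2:
            return False
        mean = statistics.mean(self.history)
        stdev = statistics.stdev(self.history)
        if stdev == 0:
            return value != mean
        return abs(value - mean) / stdev > self.threshold
```

The empirical analysis mentioned above would then amount to studying how the threshold and the chosen activity features trade false alarms against missed deviations.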
APPENDIX 1
GLOSSARY OF TECHNICAL TERMS
Alert: A message generated by the IDS whenever it detects an event of interest.
An alert typically contains information about the attack or the unusual
activity that was detected.
Anomaly: Any significant deviation from normal behavior or patterns.
Attack: An intelligent act that is a deliberate attempt (especially in the sense of
a method or technique) to evade security services and violate the security
policy of a system; in other words, an intrusion attempt.
Event: Activity detected by the IDS which may result in an alert. For example,
N failed logins in T seconds might indicate a brute-force login attack.
False negative: Occurs when the IDS does not identify an event that is part of an
attack as malicious.
False positive: Occurs when the IDS identifies an event that is not part of an
attack as malicious.
Intrusion: Any set of actions that attempt to compromise the confidentiality,
integrity or availability of system or network resources. Every intrusion is
a consequence of an attack, but not all attacks lead to an intrusion.
Intrusion Detection System: Monitors computer systems and/or networks and
analyzes the data for possible hostile attacks originating from the external
world, as well as for system misuse or attacks originating from inside the
enterprise.
Network Security: Protection of the integrity, availability and confidentiality of
network assets and services from associated threats and vulnerabilities, so
as to maintain service availability, avoid financial losses and damage to
image, and protect personnel, customer and business secrets.
Promiscuous Mode: A network interface card set in promiscuous mode accepts
not only the packets intended for it but also receives and processes all
other packets moving around the network.
Signature/Pattern-based intrusion detection: The intrusion detection system
contains a database of known vulnerabilities in the form of sequences of
strings. It monitors traffic and seeks a pattern or signature match.
True Negative: Occurs when no alert is triggered for events which are not part of
an attack.
True Positive: Occurs when alerts are triggered for events which are part of an
attack.
Vulnerability: A flaw or weakness in a system's design, implementation, or
operation and management that could be exploited to violate the system's
security posture.
Security Policy: A set of rules and practices that specify or regulate how a
system or organization provides security services to protect sensitive and
critical system resources.
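The brute-force heuristic mentioned under Event (N failed logins in T seconds) can be sketched as a sliding-window counter over login events. The function, parameter values and event format below are illustrative assumptions, not part of the implemented system.

```python
from collections import deque

def make_bruteforce_detector(n=5, t=60.0):
    """Return a callable fed (timestamp, success) login events that raises
    an alert once `n` failures occur within a window of `t` seconds.
    (Illustrative sketch of the N-failures-in-T-seconds heuristic.)"""
    failures = deque()   # timestamps of recent failed logins

    def on_login(timestamp, success):
        if success:
            return False
        failures.append(timestamp)
        # discard failures that have aged out of the window
        while failures and timestamp - failures[0] > t:
            failures.popleft()
        return len(failures) >= n

    return on_login
```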
APPENDIX 2
ATTACK DESCRIPTION
ARP poison: An attacker who has compromised a host on the local network
disrupts traffic by listening for ARP-who-has packets and sending forged
replies. ARP (Address Resolution Protocol) is used to resolve IP
addresses to Ethernet addresses; thus the attacker disrupts traffic by
misdirecting it at the data link layer.
DoS attack: A denial-of-service attack or distributed denial-of-service attack
(DDoS attack) is an attempt to make a computer resource unavailable to
its intended users. Although the means, motives and targets of a DoS
attack may vary, it generally consists of the concerted, malevolent efforts
of a person or persons to prevent an Internet site or service from
functioning efficiently or at all, temporarily or indefinitely, by choking
network bandwidth and/or consuming computing resources such as
memory and CPU.
Fragment overlap attack: A TCP/IP fragmentation attack is possible because IP
allows packets to be broken into fragments for more efficient transport
across various media. The TCP packet (and its header) is carried in the
IP packet. In this attack the second fragment contains an incorrect offset;
when the packet is reconstructed, the port number is overwritten.
IPsweep: A surveillance sweep to determine which hosts are listening on a
network. This information is useful to an attacker in staging attacks and
searching for vulnerable machines.
Land: A denial-of-service attack in which a remote host is sent a spoofed TCP
SYN packet with the same source and destination IP address and port.
Neptune: Floods the target machine with SYN requests on one or more ports,
causing denial of service.
POD: This attack, also known as Ping of Death, crashes some older operating
systems by sending an oversized fragmented IP packet that reassembles
to more than 65,535 bytes, the maximum allowed by the IP protocol. It is
called Ping of Death because some older versions of Windows 95 could
be used to launch the attack using ping -l 65510.
Smurf: A distributed network flooding attack initiated by sending ICMP ECHO
REQUEST packets to a broadcast address with the spoofed source
address of the target. The target is then flooded with ECHO REPLY
packets from every host on the broadcast address.
Teardrop: Reboots the host by sending a fragmented IP packet that cannot be
reassembled because of a gap between the fragments.
UDP storm: An attacker floods the local network by setting up a loop between
an echo server and a client machine or another echo server, by sending a
UDP packet to one server with the spoofed source address of the other.
LIST OF PUBLICATIONS
1. Naveen N.C., Dr. Srinivasan R., Dr. Natarajan S., Application of Change
Point Outlier Detection Methods in Real Time Intrusion Detection,
International Conference on Advanced Computer Science Applications and
Technologies (ACSAT 2012), Kuala Lumpur, Malaysia, 26-28 Nov 2012;
accepted for publication in IEEE Xplore, 2013
2. Naveen N.C., Dr. Srinivasan R., Dr. Natarajan S., Application of Relevance
Vector Machines in Real Time Intrusion Detection, (IJACSA) International
Journal of Advanced Computer Science and Applications, 3(9), pp 48-53,
2012
3. Naveen N.C., Anisha B.S., Arvind Murthy, A Unified Approach for Outlier
Detection Using Change Point for Intrusion Detection, IFRSA's
International Journal of Computing, 2(3), pp 550-555, July 2012
4. Naveen N.C., Dr. Srinivasan R., Dr. Natarajan S., A Unified Approach for
Real Time Intrusion Detection using Intelligent Data Mining Techniques,
International Journal of Computer Applications (IJCA), Special Issue on
Network Security and Cryptography (NSC), pp 13-17, 2011
5. Naveen N.C., Dr. Srinivasan R., Dr. Natarajan S., Research Directions in
Intrusion Detection, Prevention and Response Systems: A Survey, IFRSA
International Journal of Data Warehousing & Mining (IIJDWM), 1(1),
pp 95-100, Aug 2011
VITAE
Currently working as Associate Professor in the Dept. of ISE, R V
College of Engineering, Bangalore.
Responsibilities held in current designation:
Handle subjects for M.Tech Software Engineering and Information
Technology
Co-ordinator for Placement, NBA and TEQIP of the ISE Department
Event coordinator and organizer for workshops conducted for faculty of
various engineering colleges
Conducted various technical and cultural fests in the department
Play a major role in college administrative activities
BOS and BOE member for Autonomous
Professional Training
1. Successfully completed training in ORACLE at TULEC, Bangalore.
2. Successfully completed training in C, C++ at SPAN.
3. Successfully completed training in Java, EJB.
4. Successfully completed training on .Net conducted by Microsoft.
Industry Exposure
Working as a corporate trainer for induction batches of Wipro, Tata Elxsi,
YAHOO, SAP Labs and Sabre Holdings, and as faculty for MS, BITS Pilani.
Book Publication
Solution Manual for the custom Cryptography and Network Security, 4th Edition,
Pearson, 2011, ISBN 978-81-317-5906-6
Naveen N C