A New Data-Mining Approach for Network Intrusion
Detection System using Boyer-Moore Algorithm

Ch. Balarama Krishna1, K. Rajesh Kumar2
1, 2 Asst. Professor, Department of Computer Science
Swaranandhra College of Engineering & Technology
Seetharampuram-Narsapur, Andhra Pradesh, India.
Chbalaram25@gmail.com, kumarkaligithi@gmail.com

Abstract: As today's information systems are increasingly open to the Internet, the importance of protected networks has grown considerably. New intelligent Intrusion Detection Systems, which are based on sophisticated algorithms rather than the popular signature-based detection, are in use. There is frequently a need to upgrade an installed Intrusion Detection System because of new attack methods or improved computing environments. Since many current Intrusion Detection Systems are implemented by manual encoding of expert knowledge, changes to them are costly and slow. In a data mining-based intrusion detection system, we should make use of specific domain knowledge related to intrusion detection in order to effectively extract relevant rules from large amounts of records. This paper proposes a new ensemble approach for the Boyer-Moore algorithm. Experimental results show better results for detecting intrusions compared to other existing methods.

Keywords: Data mining, Naïve Bayesian, Boyer and Moore algorithm, ensemble approach, network intrusion detection system.

I. INTRODUCTION

Being widely used and rapidly developed in recent years, network technologies have provided us with new life and shopping experiences, especially in the fields of e-business, e-learning and e-money. But along with network development, there has come a huge increase in network crime. It not only greatly affects our everyday life, which relies heavily on networks and Internet technologies, but also damages computer systems that serve our daily activities, including business, learning, entertainment and so on. Besides this, internal hacking is difficult to detect because firewalls and Intrusion Detection Systems usually only defend against outside attacks. Intrusion Detection Systems are an essential mechanism used as a countermeasure to maintain data integrity and system availability during attacks. An Intrusion Detection System (IDS) is a combination of software and hardware that attempts to perform intrusion detection. Intrusion detection is the process of collecting intrusion-related knowledge by observing events and analyzing them for signs of intrusion. It raises an alarm when a possible intrusion occurs in the system. The network data source of intrusion detection consists of a large amount of textual information, which is difficult to comprehend and analyze. Many IDS can be described in terms of three fundamental functional components: Information Source, Analysis, and Response. Different sources of information and events based on information are collected to decide whether an intrusion has taken place. This information is collected at various levels such as system, host, application, etc. Based on analysis of this data, we can detect intrusions using two general practices: misuse detection and anomaly detection. Misuse detection is based on extensive knowledge of patterns associated with known attacks provided by human experts. Pattern matching, data mining, and state transition analysis are some of the approaches for misuse detection. Anomaly detection is based on profiles that represent normal behavior of users, hosts or networks, and on detecting attacks as significant deviations from these profiles. Statistical methods and expert systems are some of the methods for intrusion detection based on anomaly detection.

The primary motivation behind using data mining in intrusion detection [5, 10, 12, 13, 15, 18] is automation. Patterns of normal behavior and patterns of intrusion can be computed using data mining. To apply data mining techniques in intrusion detection, first, the collected monitoring data needs to be preprocessed and converted to a format suitable for mining. Next, the reformatted data will be used to develop a clustering or classification model. The classification model can be rule-based, decision-tree based, association-rule based, Bayesian-network based, or neural network based. Intrusion detection mechanisms based on IDS are not only automated but also provide significantly higher accuracy and efficiency. Unlike manual techniques, data mining ensures that no
intrusion will be missed while checking real-time records on the network. Credibility is essential in every system. IDS are now becoming an essential part of our security system, and their credibility also adds value to the whole system. Data mining techniques can be applied to gain insightful knowledge of intrusion prevention mechanisms. They can help detect new vulnerabilities and intrusions, discover previously unknown patterns of attacker behavior, and provide decision support for intrusion management. The paper is organized as follows: Section 2 explains data mining. Section 3 introduces the Boyer-Moore algorithm. Experiments and results are included in Section 4, with the conclusion in Section 5.

II. DATA MINING

Data mining, also called Knowledge Discovery and Data Mining, is one of the hot topics in the field of knowledge extraction from databases. Data mining is used to automatically learn patterns from large quantities of data. Mining can efficiently discover useful and interesting rules from large collections of data. It is a fairly recent topic in computer science but utilizes many older computational techniques from statistics, information retrieval, machine learning and pattern recognition. Data mining is a discipline that works to find the major relations between collections of data and enables the discovery of new and anomalous behavior. Data mining-based intrusion detection techniques generally fall into one of two categories: misuse detection and anomaly detection. In misuse detection, each instance in a data set is labeled as 'normal' or 'intrusion' and a learning algorithm is trained over the labeled data. These techniques are able to automatically retrain intrusion detection models on different input data that include new types of attacks, as long as they have been labeled appropriately. Data mining is used in different fields such as marketing, financial affairs and business organizations in general, and has proven successful. The main approaches of data mining that are used include classification, which maps a data item into one of several predefined categories. This approach normally outputs "classifiers" that have the ability to classify new data in the future, for example, in the form of decision trees or rules. An ideal application in intrusion detection would be to gather sufficient "normal" and "abnormal" audit data for a user or a program. The second essential approach is clustering, which maps data items into groups according to similarity or distance between them.

Anomaly detection techniques thus identify new types of intrusions as deviations from normal usage [7,8]. In statistics-based outlier detection techniques [4] the data points are modeled using a stochastic distribution and points are determined to be outliers depending upon their relationship with this model. However, with increasing dimensionality, it becomes increasingly difficult and inaccurate to estimate the multidimensional distributions of the data points [1]. However, recent outlier detection algorithms that we utilize in this study are based on computing the full dimensional distances of the points from one another [9,16] as well as on computing the densities of local neighborhoods [6]. Classifier construction is another essential research challenge in building efficient IDS. Nowadays, many data mining algorithms have become very popular for classifying intrusion detection datasets, such as decision trees, the Naïve Bayesian classifier, neural networks, genetic algorithms, and support vector machines. However, the classification accuracy of most existing data mining algorithms needs to be improved, because it is very difficult to detect several new attacks, as the attackers are continuously changing their attack patterns. Anomaly network intrusion detection models are now used to detect new attacks, but the false positives are usually very high. The performance of an intrusion detection model depends on its detection rates and false positives. Ensemble approaches [14, 17] have the advantage that they can be made to adapt to changes in the stream more accurately than single-model techniques. Several ensemble approaches have been proposed for classification of evolving data streams. The ensemble classification technique is advantageous over the single classification method. It is a combination of several base models and it is used for continuous learning. An ensemble classifier has better accuracy than a single classification technique. Bagging and boosting are two of the most well-known ensemble learning methods due to their theoretical performance guarantees and strong experimental results. Boosting has attracted much attention in the machine learning community as well as in statistics, mainly because of its excellent performance and computational attractiveness for large datasets.

III. OUR APPROACH

The proposed model uses the Boyer-Moore algorithm, i.e. Naïve Bayesian classification techniques, to increase the performance of the intrusion detection system.

The Boyer-Moore Algorithm: Although the above algorithm is quite clever, it does not help much unless the strings being searched contain a lot of repeated patterns. The algorithm of Boyer and Moore [BM 77] compares the pattern with the text from right to left. If the text symbol that is compared with the rightmost pattern symbol does not occur in the pattern at all, then the pattern can be shifted by m positions behind this text symbol, where m is the length of the pattern.

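The Boyer-Moore search outlined in this section can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: the function names (`bad_character_table`, `border_positions`, `complete_shift_table`, `boyer_moore_search`) are hypothetical, and the second preprocessing pass, which covers the case where only part of the matching suffix occurs at the beginning of the pattern, follows the standard formulation of the algorithm rather than anything spelled out in the text. The arrays f and s are the ones used in the good suffix preprocessing example of this section.

```python
def bad_character_table(pattern):
    """Map each symbol to the index of its rightmost occurrence in the pattern."""
    return {symbol: index for index, symbol in enumerate(pattern)}


def border_positions(pattern):
    """First preprocessing pass: compute the border array f and the shifts s
    for borders that cannot be extended to the left."""
    m = len(pattern)
    f = [0] * (m + 1)
    s = [0] * (m + 1)
    i, j = m, m + 1
    f[i] = j  # the empty suffix at position m has no border
    while i > 0:
        # Try to extend a known border to the left by the same symbol.
        while j <= m and pattern[i - 1] != pattern[j - 1]:
            if s[j] == 0:      # keep the entry if a shorter suffix set it already
                s[j] = j - i
            j = f[j]
        i -= 1
        j -= 1
        f[i] = j
    return f, s


def complete_shift_table(f, s):
    """Second pass (assumed, standard formulation): fill the remaining entries
    of s using the widest border of the whole pattern. Modifies s in place."""
    j = f[0]
    for i in range(len(s)):
        if s[i] == 0:
            s[i] = j
        if i == j:
            j = f[j]


def boyer_moore_search(text, pattern):
    """Return the start positions of all occurrences of pattern in text."""
    m, n = len(pattern), len(text)
    if m == 0:
        return []
    occ = bad_character_table(pattern)
    f, s = border_positions(pattern)
    complete_shift_table(f, s)
    matches = []
    i = 0
    while i <= n - m:
        j = m - 1
        # Compare the pattern with the text from right to left.
        while j >= 0 and pattern[j] == text[i + j]:
            j -= 1
        if j < 0:
            matches.append(i)
            i += s[0]
        else:
            # Take the larger of the good suffix and bad character shifts.
            i += max(s[j + 1], j - occ.get(text[i + j], -1))
    return matches
```

For the pattern abbabab, `border_positions` returns f = 5 6 4 5 6 7 7 8 and s = 0 0 0 0 2 0 4 1, which agrees with the worked preprocessing example, and `boyer_moore_search("abbababacba", "babac")` reports a single match at position 4.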
Bad character heuristics
This method can also be applied if the bad character, i.e. the text symbol that causes a mismatch, occurs somewhere else in the pattern. Then the pattern can be shifted so that it is aligned to this text symbol. The next example illustrates this situation.
Example:
0 1 2 3 4 5 6 7 8 9 ...
a b b a b a b a c b a
b a b a c
    b a b a c
Comparison b-c causes a mismatch. Text symbol b occurs in the pattern at positions 0 and 2. The pattern can be shifted so that the rightmost b in the pattern is aligned to text symbol b.

Good suffix heuristics
Sometimes the bad character heuristics fails. In the following situation the comparison a-b causes a mismatch. An alignment of the rightmost occurrence of the pattern symbol a with the text symbol a would produce a negative shift. Instead, a shift by 1 would be possible. However, in this case it is better to derive the maximum possible shift distance from the structure of the pattern. This method is called good suffix heuristics.
Example:
0 1 2 3 4 5 6 7 8 9 ...
a b a a b a b a c b a
c a b a b
    c a b a b
The suffix ab has matched. The pattern can be shifted until the next occurrence of ab in the pattern is aligned to the text symbols ab, i.e. to position 2. In another situation the suffix ab has matched but there is no other occurrence of ab in the pattern. Then the pattern can be shifted behind ab, i.e. to position 5.

For the preprocessing of the good suffix heuristics, two cases have to be considered:
1. The matching suffix occurs somewhere else in the pattern (Figure 1).
2. Only a part of the matching suffix occurs at the beginning of the pattern (Figure 2).

Figure 1: The matching suffix (gray) occurs somewhere else in the pattern
Figure 2: Only a part of the matching suffix occurs at the beginning of the pattern

The situation in the first case is much the same as in the Knuth-Morris-Pratt preprocessing. The matching suffix is a border of a suffix of the pattern. Thus, the borders of the suffixes of the pattern have to be determined. However, now the inverse mapping is needed between a given border and the shortest suffix of the pattern that has this border. Moreover, it is necessary that the border cannot be extended to the left by the same symbol, since this would cause another mismatch after shifting the pattern. In the following first part of the preprocessing algorithm an array f is computed. Every entry f[i] contains the starting position of the widest border of the suffix of the pattern starting at position i. The suffix ε beginning at position m has no border, therefore f[m] is set to m+1. As in the Knuth-Morris-Pratt preprocessing algorithm, each border is calculated by checking whether a shorter border that is already known can be extended to the left by the same symbol. However, the case when a border cannot be extended to the left is also interesting, since it leads to a promising shift of the pattern if a mismatch occurs. Therefore, the corresponding shift distance is saved in an array s, provided that this entry is not already occupied. The latter is the case when a shorter suffix has the same border.

Good suffix preprocessing
Example:
i: 0 1 2 3 4 5 6 7
p: a b b a b a b
f: 5 6 4 5 6 7 7 8
s: 0 0 0 0 2 0 4 1
The widest border of suffix babab beginning at position 2 is bab, starting at position 4. Therefore, f[2] = 4. The widest border of suffix ab beginning at position 5 is ε, beginning at position 7. Therefore, f[5] = 7. The values of array s are determined by the borders that cannot be extended to the left. The suffix babab starting at position 2 has border bab, starting at position 4. This border cannot be extended to the left since p[1] ≠ p[3]. The difference 4 - 2 = 2 is the shift distance if bab has matched and then a mismatch occurs. Therefore, s[4] = 2. The suffix babab beginning at position 2 has border b, too, beginning at position 6. This border cannot be extended either. The difference 6 - 2 = 4 is the shift distance if b has matched and then a mismatch occurs. Therefore,
s[6] = 4. The suffix b beginning at position 6 has border ε, beginning at position 7. This border cannot be extended to the left. The difference 7 - 6 = 1 is the shift distance if nothing has matched, i.e. if a mismatch occurs in the first comparison. Therefore, s[7] = 1.

IV. EXPERIMENT AND RESULT

The proposed Boyer-Moore algorithm is tested on the KDDCup'99 dataset and compared with Naïve Bayes, KNN, eClass0 [1], eClass1 [1] and the Winner (KDDCup'99).

A. Survey of Anomaly Detection
There are two common types of attacks in network intrusion detection (NIDS): the attacks that involve single connections and the attacks that involve multiple connections (bursts of connections). The standard metrics in Table I treat all types of attacks similarly, thus failing to provide a sufficiently generic and systematic evaluation for the attacks that involve many network connections.

TABLE I. METRICS FOR EVALUATION OF INTRUSION DETECTION

Interleaved Test-Then-Train: In this method each individual example can be used to test the model before it is used for training, and from this the accuracy can be incrementally updated. The intention behind using this method is that the model is always being tested on examples it has not seen. The advantage over the holdout method is that no holdout set is needed for testing, and it ensures a smooth plot of accuracy over time, as each individual example becomes increasingly less significant to the overall average.

B. Assessment on KDDCup'99 Data Set
The experiment is set up on a real intrusion detection data stream which was used in the Knowledge Discovery and Data Mining (KDD) 1999 Cup competition. In the KDD99 dataset the input data flow contains the details of the network connections, such as protocol type, connection duration, login type etc. Each data sample in the KDD99 dataset represents the attribute values of a class in the network data flow, and each class is labeled either as normal or as an attack with exactly one specific attack type. In total, 41 features have been used in the KDD99 dataset and each connection can be categorized into five main classes: one normal class and four main intrusion classes, DOS, U2R, R2L and Probe. There are 22 different types of attacks that are grouped into the four main types of attacks, DOS, U2R, R2L and Probe, tabulated in Table II. The experimental setting is for the KDD99 Cup, taking 10% of the whole real raw data stream (494021 data samples), and 12 features are selected as per the proposed algorithm. Figures 1(a)-1(c) show a graphical comparison of the Boyer-Moore algorithm with the Winner (KDDCup'99), eClass0, eClass1, KNN, C4.5 and Naïve Bayes in terms of accuracy or detection rate.

TABLE II. TYPES OF ATTACKS

Figure 1: (a) Normal with 41 features; (b) DOS attack with 41 features; (c) Probe attack with 41 features

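The interleaved test-then-train procedure described in this section can be sketched in Python as follows. This is a minimal sketch under stated assumptions: `LookupModel` is a hypothetical stand-in for whatever incremental classifier is being evaluated, the five-record stream is made-up illustration data (not taken from KDDCup'99), and detection rate and false positive rate use the usual definitions TP/(TP+FN) and FP/(FP+TN), since the contents of Table I are not reproduced in the text.

```python
class LookupModel:
    """Hypothetical stand-in classifier: remembers the last label seen for
    each feature tuple and predicts "normal" for anything unseen."""

    def __init__(self):
        self.memory = {}

    def predict(self, features):
        return self.memory.get(features, "normal")

    def learn(self, features, label):
        self.memory[features] = label


def interleaved_test_then_train(model, stream):
    """Test on each example before training on it.

    Returns the running accuracy after every example, the detection rate
    and the false positive rate (assumed definitions, see lead-in)."""
    correct = 0
    tp = fp = tn = fn = 0
    accuracy_curve = []
    for seen, (features, label) in enumerate(stream, start=1):
        predicted = model.predict(features)   # test first
        correct += int(predicted == label)
        is_attack = label != "normal"
        flagged = predicted != "normal"
        if is_attack and flagged:
            tp += 1
        elif is_attack:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
        model.learn(features, label)          # then train
        accuracy_curve.append(correct / seen)
    detection_rate = tp / (tp + fn) if tp + fn else 0.0
    false_positive_rate = fp / (fp + tn) if fp + tn else 0.0
    return accuracy_curve, detection_rate, false_positive_rate


# Made-up stream of (features, label) pairs for illustration only.
sample_stream = [
    (("tcp", "http"), "normal"),
    (("icmp", "ecr_i"), "smurf"),
    (("icmp", "ecr_i"), "smurf"),
    (("tcp", "http"), "normal"),
    (("tcp", "private"), "neptune"),
]
curve, detection_rate, false_positive_rate = interleaved_test_then_train(
    LookupModel(), sample_stream
)
```

Because every example is scored before the model has seen it, no separate holdout set is needed, and the running average in `curve` changes less and less with each new example.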
V. CONCLUSION

This paper introduced a network intrusion detection model using the Boyer-Moore algorithm: a learning technique that allows combining several decision trees to form a classifier which is obtained from a weighted majority vote of the classifications given by individual trees. The detection accuracy of the Boyer-Moore algorithm has been compared with Naïve Bayes, KNN, eClass0, eClass1 and the Winner (KDDCup'99). The Boyer-Moore algorithm outperformed the compared algorithms on the real-world intrusion dataset KDDCup'99. On the basis of these results, it can be concluded that the Boyer-Moore algorithm may be a competitive alternative to these techniques in intrusion detection systems.

REFERENCES

[1] S. Kumar, "Classification and detection of computer intrusions", Ph.D. thesis, Purdue Univ., West Lafayette, IN, 1995.
[2] W. Lee and D. Xiang, "Information-theoretic measures for anomaly detection", in Proc. of the 2001 IEEE Symp. on Security and Privacy, Oakland, CA, May 2001, pp. 130-143.
[3] A. K. Ghosh, A. Schwartzbard, and M. Schatz, "Learning program
[4] P. Nowak, "System wykrywania włamań i informowania o awariach serwisów internetowych", Master Thesis, Technical University of Lodz, July 2006.
[5] P. P. Angelov, X. Zhou, "Evolving fuzzy rule-based classifiers from data streams", IEEE Transactions on Fuzzy Systems, Vol. 16, No. 6, pp. 1462-1475, 2008.
[6] E. Amoroso, "Sieci: Wykrywanie intruzów", Wydawnictwo RM, 1998.
[7] M. Wójtowski, B. Sakowicz, P. Mazur, "Kompleksowy system wysokiej dostępności", Mikroelektronika i Informatyka, Łódź 2005, pp. 211-216, ISBN 83-922632-0-0.
[8] B. Sakowicz, J. Wojciechowski, K. Dura, "Metody budowania wielowarstwowych aplikacji lokalnych i rozproszonych w oparciu o technologię Java 2 Enterprise Edition", Mikroelektronika i Informatyka, maj 2004, KTMiI P.Ł., pp. 163-168, ISBN 83-919289-5-0.
[9] B. Foote, "Integrating Java with C++", JavaWorld.com, 1996.
[10] Aggarwal, P. Yu, "Outlier Detection for High Dimensional Data", Proceedings of the ACM SIGMOD Conference, 2001.
[11] B. Caswell, J. Hewlett, "Snort users manual", 2003.
[12] R. Bane, N. Shivsharan, "Network intrusion detection system (NIDS)", pp. 1272-1277, 2008.
[13] V. Barnett, T. Lewis, "Outliers in Statistical Data", John Wiley and Sons, NY, 1994.
[14] R. G. Byrnes, D. J. Barrett, R. E. Silverman, "Linux. Bezpieczeństwo. Receptury.", O'Reilly, 2003.
[15] H. S. Javitz, A. Valdes, "The NIDES Statistical Component: Description and Justification", Technical Report, Computer Science Laboratory, SRI International, 1993.
[16] E. Knorr, R. Ng, "Algorithms for Mining Distance-based Outliers in Large Data Sets", Proceedings of the VLDB Conference, 1998.
[17] W. Lee, S. J. Stolfo, "Data mining approaches for intrusion detection", Proc. of the 7th USENIX Security Symp., San Antonio, TX, 1998.
[18] Masahiro Yamauchi, Thsimasa, "A Heuristic Algorithm FSD for the Legal Firing Sequence Problem of Petri Nets", Hiroshima University.
[19] M. Masud, J. Gao, L. Khan, J. Han, "Classifying evolving data streams for intrusion detection".
[20] M. Panda, M. Patra, "Ensemble rule based classifiers for detecting network intrusions".
[21] W. Lee and S. J. Stolfo, "Data mining approaches for intrusion detection", Proc. of the 7th USENIX Security Symp., San Antonio, TX, 1998.
[22] W. Lee, S. J. Stolfo, and K. W. Mok, "A data mining framework for building intrusion detection models", Proc. of the 1999 IEEE Symp. on Security and Privacy, Oakland, CA, May 1999, pp. 120-132.
[23] W. Lee, R. A. Nimbalkar, K. K. Yee, S. B. Patil, P. H. Desai, T. T. Tran, and S. J. Stolfo, "A data mining and CIDF based approach for detecting novel and distributed intrusions", Lecture Notes in Computer Science, Vol. 1907, pp. 49-54, 2000.
[24] The UCI KDD Archive, "KDD cup 1999 data", http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html
[25] MIT Lincoln Laboratory, "DARPA intrusion detection evaluation", http://www.ll.mit.edu/IST/ideval/, MA, USA.
[26] S. Mukkamala and A. H. Sung, "Identifying significant features for network forensic analysis using artificial intelligent techniques", International Journal of Digital Evidence, Vol. 1, Issue 4, Winter 2003.
