
System

Kaushal Mittal 04329024

M.Tech I Year

Under the Guidance of Prof. Sunita Sarawagi

KReSIT, Indian Institute of Technology Bombay

Abstract

Classification and clustering techniques in data mining are useful for a wide variety of real-time applications dealing with large amounts of data. Applications of data mining include text classification, selective marketing, medical diagnosis, and intrusion detection systems. Intrusion detection systems are software systems for identifying deviations from the normal behavior and usage of a system. They detect attacks using data mining techniques, namely classification and clustering algorithms. In this report, I discuss approaches based on classification techniques such as naive Bayesian classifiers, neural networks, and a WINNOW based algorithm. Approaches based on clustering techniques, such as hierarchical and density based clustering, are discussed to emphasize the use of clustering techniques in intrusion detection.

1 Introduction

Classification techniques analyze and categorize data into known classes, with each data sample labeled with a known class label. Clustering is the process of grouping objects into a set of clusters such that similar objects are members of the same cluster and dissimilar objects belong to different clusters. In classification, the classes and the number of classes are predefined: training examples, each assigned a predefined label, are used to create a model. This is not the case with clustering. Classification techniques are examples of supervised learning, and clustering techniques are examples of unsupervised learning.

Intrusion detection systems are software used for identifying the intentional or unintentional use of system resources by unauthorized users. They can be categorized into misuse detection systems and anomaly detection systems. Misuse detection systems model attacks as specific patterns and are more useful for detecting known attack patterns. Anomaly detection systems are adaptive systems that distinguish the behavior of normal users from that of other users. Misuse detection systems can detect specific types of attacks but are not generalized: they cannot detect new attacks until trained for them. Anomaly detection systems, on the other hand, are adaptive in nature and can deal with new attacks, but they cannot identify the specific type of attack. If an intrusion occurs during learning, an anomaly detection system may learn the intruder's behavior and hence fail. Being more generalized and having a wider scope than misuse detection systems, anomaly detection systems are the focus of most current research.

Data mining approaches can be applied to both anomaly and misuse detection. A data sample is a set of system properties representing the behavior of the system or user. Classification techniques are used to learn a model from a training set of data samples; the model is then used to classify data samples as instances of anomalous or normal behavior. Clustering techniques can be used to form clusters of data samples corresponding to normal use of the system; any data sample with characteristics different from the formed clusters is considered an instance of anomalous behavior. Clustering based techniques can detect new attacks, unlike classification based techniques.
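The clustering-based detection idea above can be sketched in a few lines; the cluster centers, radii, and Euclidean distance test are illustrative assumptions for the example, not details from the report.

```python
import numpy as np

# Illustrative sketch (not from the report): flag a sample as anomalous
# when it lies outside every cluster of normal-behavior samples.
# Cluster centers and radii are assumed to come from a prior clustering step.
def is_anomalous(sample, centers, radii):
    """Return True if `sample` lies outside every normal-behavior cluster."""
    dists = np.linalg.norm(centers - sample, axis=1)  # distance to each center
    return bool(np.all(dists > radii))                # outside all clusters?

# Toy normal-behavior clusters: two centers, each with radius 1.0
centers = np.array([[0.0, 0.0], [10.0, 10.0]])
radii = np.array([1.0, 1.0])

print(is_anomalous(np.array([0.2, 0.1]), centers, radii))  # near a cluster -> False
print(is_anomalous(np.array([5.0, 5.0]), centers, radii))  # far from both -> True
```

A sample resembling a known normal cluster is accepted; anything else is treated as a possible new attack, which is why clustering can flag attacks never seen during training.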

A number of classification and clustering algorithms can be used for anomaly detection. [?] proposes the use of Bayesian classifiers to learn a model that distinguishes the behavior of an intruder from that of normal users. [?] proposes a hierarchical clustering based algorithm for anomaly detection on networks. [?] proposes a WINNOW based algorithm for anomaly detection. [?] proposes the use of neural networks, and [?] proposes the use of density based clustering for anomaly detection.

The rest of the report is organized as follows: Section 2 discusses Bayesian classifiers and neural network based classification. Section 3 discusses hierarchical and density based clustering. Section 4 discusses the anomaly detection approach based on the WINNOW algorithm and the use of the classification and clustering algorithms from Sections 2 and 3 for anomaly detection. Section 5 gives the conclusion.

2 Classification Techniques

In classification, training examples are used to learn a model that can classify data samples into known classes. The classification process involves the following steps:

1. Create a training data set.

2. Identify the class attribute and the classes.

3. Identify attributes useful for classification (relevance analysis).

4. Learn a model using the training examples in the training set.

5. Use the model to classify unknown data samples.

A variety of classification techniques, viz. decision tree induction, Bayesian classification, Bayesian belief networks, neural networks, etc., are used in data mining based applications. In this section, I discuss naive Bayesian classifiers and neural networks.

2.1 Naive Bayesian Classifiers

Naive Bayesian classifiers use Bayes' theorem to classify new instances of data. Each instance is a set of attribute values described by a vector X = (x1, x2, ..., xn). Considering m classes, the sample X is assigned to the class Ci if and only if

P(X|Ci)P(Ci) > P(X|Cj)P(Cj)

for all j in (1, m) such that j ≠ i. That is, the sample belongs to the class with the maximum posterior probability. For categorical data, P(xk|Ci) is calculated as the ratio of the frequency of value xk for attribute Ak among the samples of class Ci to the number of training samples in that class. For continuous valued attributes, a Gaussian distribution can be assumed without any loss of generality.

In the naive Bayesian approach, the attributes are assumed to be conditionally independent given the class. In spite of this assumption, naive Bayesian classifiers give satisfactory results, because the focus is on identifying the classes of the instances, not the exact probabilities. Applications like spam mail classification and text classification can use naive Bayesian classifiers. Theoretically, Bayesian classifiers are the least prone to errors. The limitation is the requirement of the prior probabilities: the amount of probability information required is exponential in the number of attributes, the number of classes, and the maximum cardinality of the attributes. With an increase in the number of classes or attributes, the space and computational complexity of Bayesian classifiers increases exponentially.

2.2 Neural Networks

An artificial neural network consists of a connected set of processing units. The connections have weights that determine how one unit affects another. A subset of the units act as input nodes, another subset as output nodes, and the remaining nodes constitute the hidden layer. By assigning an activation to each input node and allowing the activations to propagate through the hidden layer nodes to the output nodes, the neural network performs a functional mapping from input values to output values. The mapping is stored in the weights of the connections.
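The forward pass just described can be sketched in a few lines; the layer sizes, random weights, and sigmoid activation are illustrative assumptions, not details from the report.

```python
import numpy as np

# Illustrative sketch (not from the report) of the functional mapping a
# small feed-forward network computes: input activations propagate through
# one hidden layer to the output nodes via the connection weights.
def forward(x, w_hidden, w_out):
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    hidden = sigmoid(w_hidden @ x)   # hidden-layer activations
    return sigmoid(w_out @ hidden)   # output-node activations

rng = np.random.default_rng(0)
x = np.array([0.5, -1.0, 2.0])       # input activations
w_hidden = rng.normal(size=(4, 3))   # 3 inputs -> 4 hidden units
w_out = rng.normal(size=(2, 4))      # 4 hidden -> 2 outputs
print(forward(x, w_hidden, w_out))   # the mapping lives in the weights
```

Training, discussed next, consists of adjusting `w_hidden` and `w_out` so this mapping reproduces the desired outputs.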

Backpropagation networks are simple feed-forward neural networks. Input is submitted to the network, and the activations at each level are cascaded forward, ending with activations at the output nodes. During training, the backpropagation algorithm [?] is used to tune the values of the weights on the connections: the error at the output layer is calculated and backpropagated, and this feedback is used at the intermediate levels to readjust the weights. The performance of the training phase depends on the learning rate used for adjusting the weights. Too small a learning rate makes learning very slow; conversely, too large a value may cause the weights to oscillate between wrong values, and the network may take a long time to learn. Training stops when the weights tend to converge or the network is able to classify the samples correctly. After training, the backpropagation network can be used as a model for classifying new instances.

Neural networks are adaptive in nature, tolerant of noisy data, and can classify instances for which they were not trained. However, training may take a long time and is an irreversible process. Also, the knowledge representation in neural networks is not directly interpretable by humans.

3 Clustering Techniques

Clustering involves unsupervised learning: the classes, and the number of classes, are not known in advance. In this section I discuss the hierarchical and density based approaches to clustering.

3.1 Hierarchical Clustering Algorithms

These algorithms group the data into a tree of clusters, forming a hierarchical structure. Clusters are merged or split based on a distance measure that accounts for the similarity or difference between the samples. The distance can be the Euclidean distance, mean distance, maximum distance, average distance, centroid distance, etc. The number of clusters acts as a parameter restricting the level of clustering: clustering stops when the required number of clusters has been formed or the depth of the clustering tree has reached a specified value. Hierarchical clustering algorithms can be categorized into:

• Agglomerative algorithms - based on a bottom-up approach.

• Divisive algorithms - based on a top-down approach.

3.1.1 Agglomerative Algorithms

These algorithms initially assign each sample to a separate cluster. The clusters with the least distance between them are merged into larger clusters until the termination condition is satisfied or a single cluster is left.

1. BIRCH - Balanced Iterative Reducing and Clustering using Hierarchies.
In BIRCH, a summary of the statistics of a cluster, called a cluster feature (CF), is calculated for each sub-cluster of n samples. A height-balanced CF tree is dynamically constructed, with samples as leaf nodes and the CFs of the children as non-leaf nodes. A sample is kept in the closest leaf node. When the size of a leaf node becomes larger than the threshold, the node splits, and the CF is recalculated for the individual nodes and updated in the tree. The complexity of the algorithm is O(n), where n is the number of objects to be clustered. It is known to generate the best clusters with the available resources, but it does not work well if the clusters are not spherical in shape (because it uses the notion of a radius to define the boundary of a cluster).

2. CURE - Clustering Using Representatives.
This algorithm also works well for non-spherical clusters. In CURE, a cluster is represented by a set of representative points, generated by randomly selecting scattered points in the cluster and shrinking them by a fraction towards the cluster center. The two clusters with the closest pair of representative points are merged. Using more than one representative point allows non-spherical clusters; however, aggregate interconnectivity is ignored, and hence categorical attributes cannot be handled.
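The bottom-up merging described in Section 3.1.1 can be sketched as follows; the single-link distance and the toy points are assumptions for the example, not details of BIRCH or CURE.

```python
import numpy as np

# Illustrative sketch (not from the report) of bottom-up agglomerative
# clustering with single-link distance: repeatedly merge the two closest
# clusters until the requested number of clusters remains.
def agglomerate(points, n_clusters):
    clusters = [[i] for i in range(len(points))]   # each sample starts alone
    while len(clusters) > n_clusters:
        best = (None, None, float("inf"))
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # single-link: least distance between any pair of members
                d = min(np.linalg.norm(points[i] - points[j])
                        for i in clusters[a] for j in clusters[b])
                if d < best[2]:
                    best = (a, b, d)
        a, b, _ = best
        clusters[a] += clusters.pop(b)             # merge the closest pair
    return clusters

pts = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
print(agglomerate(pts, 2))  # -> [[0, 1], [2, 3]]
```

BIRCH and CURE refine exactly this loop: BIRCH summarizes clusters with CFs to avoid the all-pairs scan, and CURE replaces the member-to-member distance with distances between representative points.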

3.2 Density based Clustering Algorithms

Sets of samples forming a dense region are treated as clusters. Since the clusters are based on density rather than distance, they need not be spherical.

3.2.1 DBSCAN - Density Based Spatial Clustering of Applications with Noise.

This algorithm identifies regions of sufficiently high density as clusters. All the samples within radius ε of a sample form its ε-neighbourhood. The ε-neighbourhood of a sample point is called a core group, i.e., an initial cluster. All objects that are density-reachable, density-connected, or directly density-reachable are merged to form larger clusters; this continues until no more merging of clusters is possible. If spatial indexes are used, the complexity of DBSCAN is O(n log n); otherwise it is O(n²). The algorithm is useful for applications with spatial databases and noise.

4 Intrusion Detection Systems

This section briefly discusses the data mining approaches proposed for intrusion detection systems. The characteristics of a good intrusion detection system are:

1. High detection rate.

2. Few false alarms.

3. Low CPU cost.

4. Quick detection of intrusions.

The user profile and system behavior, comprising statistics related to the network, CPU, memory, processes, software, and applications used by the users, constitute the test data for the intrusion detection system. A large number of system tools and utilities plugged into the operating system can be used to collect this data. For the Windows operating system, applications like perfmon and netstat are used, whereas for Linux systems, top, tcpdump, strace, etc. are used.

The rest of this section discusses approaches for anomaly detection based on the classification and clustering techniques described in Sections 2 and 3.

4.1 Naive Bayesian Approach

[?] proposes the use of the naive Bayesian approach for anomaly detection on systems running the Windows operating system. [?] measure around 1500 features every second. Features are specific system properties like the average CPU utilization, the average of the last 10 values of the data transfer rate, the memory utilization, the number of processes, etc. A data sample comprises values for each of these 1500 features. The problem is to classify each data sample into the anomalous or the normal category.

[?] assume that the features are conditionally independent given the category. The training set is used to calculate the prior probabilities, which are then used to calculate the probability of obtaining the current measurement given each possible category. The current sample is assigned the label for which the calculated probability is maximum. [?] conducted experiments over test data and found the detection rate of this approach to be 57.8%. The low detection rate is on account of the assumption that the features are conditionally independent.

4.2 Neural Network

[?] proposes the use of neural networks for anomaly detection. The approach consists of maintaining a database of the sequences of system calls made by each program to the operating system, used as the signature of normal behavior. If the online sequence of system calls for a program differs from the sequences in the database, anomalous behavior is registered. If a significant percentage of sequences do not match, an intrusion alarm is raised.

A backpropagation network is trained with a training set of system call sequences. A leaky bucket algorithm is used to capture the temporal locality of anomalous sequences: when closely related anomalous sequences are encountered, the counter reaches a large value, and when a normal sequence is obtained, the counter gradually drops towards zero. This leads to intrusion detection only when many similar anomalous sequences are obtained, representing the behavior of an intruder.
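The leaky-bucket accumulation just described can be sketched as follows; the fill, leak, and threshold values are illustrative assumptions, not parameters from the report.

```python
# Illustrative sketch (not from the report's code) of the leaky-bucket idea
# in Section 4.2: anomalous sequences fill a counter, normal sequences
# drain it, and an alarm fires only after a burst of anomalies.
def leaky_bucket(flags, fill=1.0, leak=0.5, threshold=3.0):
    """flags: iterable of booleans, True = anomalous sequence observed."""
    level, alarms = 0.0, []
    for i, anomalous in enumerate(flags):
        if anomalous:
            level += fill                    # an anomaly fills the bucket
        else:
            level = max(0.0, level - leak)   # normal traffic drains it
        if level >= threshold:
            alarms.append(i)                 # temporally clustered anomalies
    return alarms

# One isolated anomaly never alarms; a burst of four does.
print(leaky_bucket([True, False, False, False]))  # -> []
print(leaky_bucket([True, True, True, True]))     # -> [2, 3]
```

This is what gives the detector its temporal locality: scattered mismatches are forgiven, while sustained runs of anomalous sequences push the counter past the threshold.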

4.3 Hierarchical Clustering

[?] proposes the use of graph clustering for intrusion detection over networks. The approach consists of using agglomerative clustering to form clusters of nodes that communicate extensively with each other. The nodes or systems on the network can be represented by a graph in which nodes represent the systems and weighted edges represent the amount of data exchanged between the systems they link. This graph is decomposed into a number of clusters such that the nodes within each cluster exchange data extensively with the other nodes in the cluster.

A feature vector consisting of values for features like node degree, average outgoing traffic, etc. is calculated. A neural network is used to learn the mapping from these feature values to normal or anomalous behavior. If an intruder uses the system, the traffic over the network changes, resulting in a change in the feature values and leading to detection by the neural network as anomalous behavior.

4.4 Density based Local Outliers

[?] proposes a density based clustering approach for anomaly detection. A data sample corresponding to anomalous behavior is considered an outlier. The approach assigns a local outlier factor (LOF) to each data sample: the greater the LOF, the greater the probability that the sample is an outlier. The k-distance, the distance to the kth nearest neighbor, is computed for each sample. The DBSCAN algorithm is used to find the k-neighbourhood of each data sample and form clusters of samples corresponding to normal behavior. A large LOF value for a data sample indicates that it is distant from the clusters of samples corresponding to normal behavior; hence the sample is an outlier.

This approach does not require tuning and is adaptive in nature. Most of the techniques discussed above learn the behavior of a specific user and detect deviations from that behavior as anomalous. For systems with multiple valid users, the requirement is to consider the behavior of each of the N valid users as normal and that of the remaining users as anomalous. For such requirements this approach is useful: it creates density based clusters corresponding to the behavior of the N users, and any sample not resembling the behavior of any of the N users will lie outside the clusters and be considered an outlier.

4.5 WINNOW based Algorithm

[?] proposes a WINNOW based algorithm for anomaly detection. Most of the approaches discussed above did not achieve high detection rates. Experimental results show that this approach can achieve a detection rate of about 95% with less than one false alarm per day. The data collected is the same as that described in Section 4.1. Perfmon is used to measure around 200 different properties and 1500 different features corresponding to these properties. Each data sample is a vector containing the values of these 1500 features.

The algorithm consists of three phases.

4.5.1 Training Phase

Data samples are collected corresponding to the normal user, together with an equal number of samples corresponding to an intruder (any user other than the normal user). The values of most of the features are continuous; they are discretized into ten bins. The values of a feature are assigned to the ten bins by fitting standard distribution functions such as the uniform, Gaussian, exponential, and Erlang distributions. The distribution function with the minimum root mean square error represents the probability distribution for the feature. The probability distribution is obtained by normalizing the frequency in each bin by the total count of samples. [?] propose using the WINNOW based algorithm to assign a weight to each feature to model the normal behavior.

The WINNOW based algorithm is as follows:

1. Initialize the weight of each feature, w_f, to 1.

2. For each training sample:

(a) Initialize votes_for and votes_against to 0.

(b) For each feature, if the relative probability of the feature is less than the constant r, add the weight w_f to votes_for; otherwise add it to votes_against.

(c) If votes_for > votes_against, the measurement is anomalous.

(d) If the sample corresponds to the normal user but is considered anomalous, the weights of all features that voted for raising the alarm are reduced to half of their current values. Conversely, if an anomalous sample is treated as normal, the weights of all features that voted against raising the alarm are reduced to half of their current values.
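One step of the weight-update loop above can be sketched as follows; the feature probabilities and the value of r are illustrative assumptions, not values from the report.

```python
# Illustrative sketch (not from the report's code) of one WINNOW-style
# training step from Section 4.5.1: features vote with their weights, and
# the weights of features that voted for the wrong outcome are halved.
def winnow_step(weights, rel_prob, r, sample_is_normal):
    votes_for = sum(w for w, p in zip(weights, rel_prob) if p < r)
    votes_against = sum(w for w, p in zip(weights, rel_prob) if p >= r)
    flagged_anomalous = votes_for > votes_against
    new_weights = list(weights)
    for i, p in enumerate(rel_prob):
        if sample_is_normal and flagged_anomalous and p < r:
            new_weights[i] /= 2      # punished for a false alarm
        elif not sample_is_normal and not flagged_anomalous and p >= r:
            new_weights[i] /= 2      # punished for a miss
    return flagged_anomalous, new_weights

# Two features; the first looks unlikely (p < r), so it votes for an alarm.
flag, w = winnow_step([1.0, 1.0], rel_prob=[0.01, 0.9], r=0.1,
                      sample_is_normal=True)
print(flag, w)  # votes tie 1.0 vs 1.0 -> not flagged; weights unchanged
```

The multiplicative halving is what makes WINNOW attractive here: weights of unreliable features shrink quickly, so a few informative features out of the 1500 come to dominate the vote.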

4.5.2 Tuning Phase

The tuning data consists of data samples from the normal user and from the intruders (other users). This phase involves calculating three system parameters: W, the window size; Thresh_mini; and Thresh_full. For different combinations of these parameters, the following steps are executed:

1. The feature values of a test sample, collected each second, vote for mini alarms. If the ratio of votes_for to votes_against is greater than Thresh_mini, a mini alarm is raised.

2. If the number of mini alarms in the last W seconds is greater than Thresh_full, an alarm signaling intrusion is raised. After each such alarm, the system waits for W seconds to avoid overlapping samples.

The goal is to select the values of these parameters so as to maximize the intrusion detection rate and minimize the false alarms.

4.5.3 Operation Phase

In this phase, the learned statistical model, along with the values of the parameters W, Thresh_mini, and Thresh_full, is used to detect anomalous behavior. The system can retrain and retune to adjust to the changing behavior of the normal user. Even during this phase, the WINNOW algorithm has to be used to adjust to the changing behavior of the user; otherwise the false alarm rate will increase and the intrusion detection rate will drop.

4.5.4 Experimental Evaluation and Analysis

[?] have conducted experiments to analyze the performance of the proposed WINNOW based algorithm. The analysis shows that if the tuning parameters are carefully selected, the intrusion detection rate reaches 95% with less than one false alarm per day. The tuning parameter W must be selected carefully: the smaller the value of W, the larger the number of false alarms, due to overlap of samples with samples that have already raised an alarm. As W increases, the false alarm rate decreases, but an intruder gets more time to use the system before being detected. The system can adapt itself to learn the changing behavior of the normal user, but if the tuning parameters are wrongly selected, the system may learn the intruder's behavior as well.
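The two-level alarm scheme of Section 4.5.2, with the window parameter W whose trade-off is analyzed above, can be sketched as follows; the flag sequence and parameter values are illustrative assumptions.

```python
from collections import deque

# Illustrative sketch (not from the report's code) of the two-level alarm
# logic in Section 4.5.2: per-second mini alarms, and a full alarm when
# more than thresh_full mini alarms occur within a window of W seconds.
def full_alarms(mini_alarm_flags, W, thresh_full):
    window = deque()
    alarms, cooldown = [], 0
    for t, mini in enumerate(mini_alarm_flags):
        window.append(mini)
        if len(window) > W:
            window.popleft()                 # keep only the last W seconds
        if cooldown > 0:
            cooldown -= 1                    # wait W seconds after an alarm
            continue
        if sum(window) > thresh_full:
            alarms.append(t)                 # sustained burst of mini alarms
            cooldown = W
    return alarms

# Mini alarms each second (True = mini alarm raised that second)
flags = [False, True, True, True, False, False, True, True, True, True]
print(full_alarms(flags, W=3, thresh_full=2))  # -> [3, 8]
```

The W trade-off is visible directly in the sketch: a smaller W makes the window easier to fill repeatedly (more false alarms), while a larger W delays the first full alarm.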

5 Conclusion

Intrusion detection systems are one of the key areas of application of data mining techniques. Naive Bayesian classifiers, although they perform well for most applications in spite of the conditional independence assumption, do not provide good results for intrusion detection systems. Clustering techniques can be used for intrusion detection, as they can also detect unknown attacks; they are useful for misuse detection as well as anomaly detection systems.

The WINNOW based algorithm provides higher detection rates and lower false alarm rates than the other approaches discussed, and the system involves less CPU cost; the only costly phase is the tuning phase. Often, intrusion or misuse of a system is best described by excessive usage of resources and by events that do not occur frequently, e.g., too many print jobs, downloads, etc. The system assumes that the samples collected from all users other than the normal user are samples of anomalous behavior. In practice, this may not be a good representative set for anomalous behavior. It is necessary to test the system with data consisting of real data samples corresponding to intrusive behavior.

References

[1] Jude Shavlik and Mark Shavlik, Selection, Combination, and Evaluation of Effective Software Sensors for Detecting Abnormal Computer Usage, KDD 2004, Seattle, Washington, USA, 2004.

[2] Srivastava and V. Kumar, A Comparative Study of Anomaly Detection Schemes in Network Intrusion Detection, Proc. SIAM Conf. Data Mining, 2003.

[3] M. Schatz, Learning Program Behavior Profiles for Intrusion Detection, USENIX Workshop on Intrusion Detection and Network Monitoring, April 1999.

[4] Intrusion Detection by Graph Clustering and Graph Drawing, RAID 2000.

[5] Machine Learning, McGraw-Hill, International Edition, 1997.

[6] Data Mining Concepts and Techniques, Morgan Kaufmann, 2001.
