
Machine Learning Approach for Anomaly Detection in Cloud - A Review

Pratibha S
Assistant Professor, Department of CSE,
PES Institute of Technology and Management,
Shivamogga, India
pratibha@pestrust.edu.in

Dr. Vinay S
Professor, Department of CSE,
PES Institute of Technology and Management,
Shivamogga, India
vinay@pestrust.edu.in

Abstract: Cloud computing is a new evolution in distributed architecture that provides services and resources on demand. Because the cloud is an Internet-based technology, it faces new security attacks and challenges with respect to reliability and safety. Anomaly detection is one form of intrusion detection in the cloud, used to detect unknown attack patterns. Given the increase in network traffic and the exponential growth of data, human involvement in the detection system is difficult and faces limitations. A machine learning technique aims to implement a detection system that can learn from data (experience) and make decisions without human intervention. The main advantage of machine learning is its ability to analyze huge volumes of audit data, discover new attacks, and construct detection rules, improving the speed and accuracy of detection in the network. There are various machine learning techniques for improving a detection system, each with its own advantages and limitations. This paper presents an overview of different machine learning techniques for anomaly detection, which can guide researchers in this area of machine learning.

Keywords: anomaly detection, machine learning, supervised, unsupervised, Neural Networks, Bayesian Network, Support Vector Machine, Genetic algorithm, Decision Tree, fuzzy logic

1. INTRODUCTION
Cloud computing is Internet based and has become an essential part of our life. The on-demand services offered by cloud computing suffer from the traditional attacks of the Internet, for which a firewall is not sufficient. Security in cloud computing is becoming an important concern because of threats related to data and service availability. An intrusion detection system is a security mechanism that monitors network activities and identifies the various attacks affecting network services. Intrusion detection systems are always designed with a preventative mechanism against intrusions. Since the cloud environment offers services such as Software as a Service, Platform as a Service, and Infrastructure as a Service, it is necessary to employ an effective intrusion detection system to improve the availability, security and performance of the cloud.

Basically, there are two categories of intrusion detection: signature based detection and anomaly based detection [8]. A signature based (misuse) detection system detects known attacks based on previous rules and patterns stored in a database. The downside of misuse detection is that it does not detect new attacks which are not in the database.

Anomaly detection is the process of finding patterns in a dataset whose behavior is not normal or expected. These unexpected behaviors are also termed anomalies or outliers. An anomaly cannot always be categorized as an attack; it can be a surprising behavior which was previously not known, and it may or may not be harmful.

1.1 Anomaly detection technique
The basic methodology of anomaly detection consists of parameterization, training a model, and then detection [2].

Figure 1: Methodology of Anomaly Detection

Parameterization: Pre-processing data into a pre-established format such that it is acceptable to, or in accordance with, the targeted system's behavior.
Training stage: A model is built on the basis of the normal (or abnormal) behavior of the system. Different approaches can be chosen depending on the type of anomaly detection considered. Training can be either manual or automatic.
Detection stage: When the model for the system is available, it is compared with the (parameterized or predefined) observed traffic. If the deviation found exceeds a predefined threshold (or, in the case of abnormality models, is less than it), an alarm is triggered.

The remainder of the paper is organized as follows. Section 2 presents the machine learning techniques for anomaly detection.
Section 3 summarizes the comparison of various machine learning techniques, Section 4 gives recommendations for using machine learning tools to detect anomalies, and Section 5 concludes the paper.

2. MACHINE LEARNING BASED ANOMALY DETECTION SYSTEM
The cloud computing domain poses challenges such as a high cost of errors, a lack of training data, a semantic gap between results and their operational interpretation, and enormous variability in input data [6]. Overcoming these challenges with signature based detection involves human activity to test attacks and frame newly defined rules, and it may take hours or days to generate signatures for rapid attacks. This motivates constructing and maintaining a detection system with less human effort. Machine learning is an appropriate tool for building such a predictive model.

Machine learning is behavior based. It is the ability of a program to learn from previous experiences and improve the performance of a system.

Compared to other detection approaches, machine learning is significantly harder to employ effectively, for the following reasons.
• Machine-learning algorithms excel much better at finding similarities than at identifying activity that does not belong there: the classic machine learning application is a classification problem, rather than discovering meaningful outliers as required by an anomaly detection system.
• A basic rule of machine learning is that one needs to train a system with specimens of all classes and, crucially, the number of representatives found in the training set for each class should be large.
• In intrusion detection, the relative cost of any misclassification is extremely high compared to many other machine learning applications.
• A false positive requires spending expensive analyst time examining the reported incident only to eventually determine that it reflects benign underlying activity. False negatives, on the other hand, have the potential to cause serious damage to an organization.
• When addressing the semantic gap, one dimension of consideration is the incorporation of local security policies. Traffic diversity is not restricted to packet-level features, but extends to application-layer information as well, both in terms of syntactic and semantic variability.

Machine learning can be done with two methods, supervised and unsupervised.

2.1 Supervised Machine learning method
Supervised methods (also known as classification methods) require a labeled training set containing both normal and anomalous samples to construct the predictive model [7]. Supervised detection methods generally give better results than unsupervised methods because they have access to more information. The technical issues with supervised methods are the need to label the training set and the shortage of data sets covering all areas. The most common supervised algorithms are Supervised Neural Networks, Support Vector Machines (SVM), k-Nearest Neighbors, Bayesian Networks and Decision Trees.

2.2 Unsupervised Machine learning method
Unsupervised methods (also known as clustering methods) are mainly used for behavior that is densely concentrated in particular areas or groups of areas. The most common unsupervised algorithms are K-Means, Self-organizing maps (SOM), C-means, the Expectation-Maximization meta algorithm (EM), Adaptive resonance theory (ART), Unsupervised Niche Clustering (UNC) and the One-Class Support Vector Machine.

3. COMPARISON OF MACHINE LEARNING TECHNIQUES
The literature survey found that various machine learning techniques aim to achieve a high detection rate, each with its own pros and cons, according to [8][23].

Table 1: Pros and cons of machine learning techniques

| Sl. No. | Machine Learning Technique | Pros | Cons |
|---|---|---|---|
| 1 | Neural Networks | Ability to generalize from limited, noisy and incomplete data. Does not need expert knowledge and can find unknown or novel intrusions. | Slow training process, not suitable for real-time detection. Over-fitting may happen during neural network training. |
| 2 | Bayesian Network | Encodes probabilistic relationships among the variables of interest. Ability to incorporate both prior knowledge and data. | Harder to handle continuous features. May not contain any good classifiers if prior knowledge is wrong. |
| 3 | Support Vector Machine | Better learning ability for small samples. High training rate and decision rate, insensitiveness to input data. | Training takes a long time. Mostly used as a binary classifier, which cannot give additional information about the detected type of attack. |
| 4 | Genetic algorithm | Capable of deriving the best classification rules and selecting optimal parameters. Biologically inspired and employs an evolutionary algorithm. | Cannot assure constant optimization response times. Over-fitting. |
| 5 | Fuzzy Logic | Reasoning is approximate rather than precise. Effective, especially against port scans and probes. | High resource consumption involved. Reduced, relevant rule subset identification and dynamic rule updating at runtime is a difficult task. |
| 6 | Decision Tree | Works well with huge data sets. High detection accuracy. | |
| 7 | Self-organizing map | Simple and easy-to-understand algorithm that works. A topological clustering unsupervised algorithm that works with nonlinear data sets. Excellent capability to visualize high-dimensional data in a 1 or 2 dimensional space. | Time consuming algorithm. |
| 8 | Expectation Maximization Meta algorithm | Can easily change the model to adapt to a different distribution of data sets. The number of parameters does not increase as the training data increases. | Slow convergence in some cases. |

In cloud computing intrusion detection research, one strategy for finding attacks is to continuously monitor cloud activities for anomalies and to profile rules that deviate from previous experience using machine learning tools. Extensive research on these algorithms has found that using any single algorithm does not yield the best result. The present research direction is therefore to bring out the effectiveness of each algorithm, limit their drawbacks, and combine them into a hybrid technique to build a highly accurate detection system.

Table 2 presents a summary of various hybrid approaches, compiled from the survey and related work in [2].

Table 2: Compilation of hybrid approaches for anomaly detection

| Author Name | Methods used | Methodology | Pros and Cons |
|---|---|---|---|
| Chitrakar, Roshan, and Chuanhe (2012) | SVM classification and k-medoids clustering | Similar data instances are grouped by the k-medoids technique and the resulting clusters are classified using SVM classifiers. | Higher accuracy. Time complexity is higher when the dataset is very large. |
| Chitrakar, Roshan, and Chuanhe (2012) | k-Medoids clustering and Naïve Bayes classification | Similar data instances are grouped using the k-Medoids clustering technique. The resulting clusters are classified using Naïve Bayes classifiers. | Increase in detection rate and reduction in mean time of false alarm rate. Hard to predict the behavior of the Naïve Bayes classifier in different environments. |
| Fu, Liu and Pannu (2012) | One-Class and Two-Class Support Vector Machines (in cloud computing) | First, a one-class SVM is used for detecting the abnormality score. Secondly, the detector is retrained when certain new data records are included in the existing dataset. | Does not require a prior failure history and is self-adaptive by learning from observed failure events. The accuracy of failure detection cannot reach 100%. |
| Farid, Harbi, and Rahman (2010) | Naïve Bayes and decision tree for adaptive intrusion detection | Performs balanced detections and keeps false positives at an acceptable level for different types of network attacks. | Minimized false positives and maximized balanced detection rates. Requires improvement of the false positive rate for remote-to-user attacks. |
| Yasami and Mozaffari (2009) | k-Means clustering and ID3 decision tree learning methods | k-Means clustering is first applied to the normal training instances to form k clusters. An ID3 decision tree is constructed on each cluster. | Outperforms the individual k-Means and ID3 methods. The approach is limited to a specific dataset. |
| Peddabachigari, Abraham, Grosan and Thomas (2007) | Decision Tree (DT) and Support Vector Machines (SVM) | The data set is first passed through the DT and node information is generated, which is passed along with the original set of attributes through the SVM to obtain the final output. | Delivers good performance on the KDD Cup dataset. Compared to SVM alone, the approach delivers equivalent results. |

4. ANALYSIS AND RECOMMENDATIONS
By "machine learning" we mean algorithms that are first trained with reference input to "learn" its specifics (either supervised or unsupervised), and are then deployed on previously unseen input for the actual detection process [18]. The research work conducted so far evaluates the performance of various machine learning algorithms in terms of accuracy, detection rate and false alarm rate. The KDD CUP'99 dataset is the current benchmark dataset for conducting experiments. Basically, machine learning is a
classification problem of finding similarities rather than discovering outliers [15].

Cloud security researchers face challenges in using machine learning tools for anomaly detection and in deploying a model that works without human intervention. The following are recommendations for using machine learning techniques [6].

• One should have a clear picture of what problem a system targets: what specifically are the attacks to be detected. The more narrowly one can define the target activity, the better one can tailor a detector to its specifics and reduce the potential for misclassifications.
• The single most important step for sound evaluation concerns obtaining appropriate data to work with. The "gold standard" here is obtaining access to a dataset containing real network traffic from as large an environment as possible, and ideally multiple of these from different networks.
• The most important aspect of interpreting results is to understand their origins. A sound evaluation frequently requires relating input and output on a very low level. Researchers need to manually examine false positives. If, when doing so, one cannot determine why the system incorrectly reported a particular instance, this indicates a lack of insight into the anomaly detection system's operation.

5. CONCLUSION
With the increasing use of the cloud in industrial, academic and medical fields, offered by providers such as Google, Amazon and Microsoft, the study of cloud security is very important. The complex properties of the cloud computing environment always demand data availability, service availability and reliability. Monitoring cloud activities continuously is a major security task. There is a need to analyze the large volume of network data and improve the performance of intrusion detection. Detecting abnormal activities in the cloud through anomaly based detection, with a human-independent solution, is possible using machine learning techniques. Every day cloud computing witnesses a new anomaly. Machine learning tools, used appropriately as classification or clustering methods, can find these anomalies.

There are various machine learning techniques, each with advantages and drawbacks. Extensive research is being carried out to decide on the applicability of machine learning techniques in different circumstances. Researchers try to optimize various machine learning techniques or hybrid techniques to achieve a high detection rate, high accuracy and a low false alarm rate. This paper is an outcome of a study of various machine learning techniques and can serve as guidance for further research in anomaly detection in the cloud.

REFERENCES
[1] Arif Sari, A Review of Anomaly Detection Systems in Cloud Networks and Survey of Cloud Security Measures in Cloud Storage Applications, Journal of Information Security, 2015, 6, 142-154.
[2] Shikha Agrawal, Jitendra Agrawal, Survey on Anomaly Detection using Data Mining Techniques, 19th International Conference on Knowledge Based and Intelligent Information and Engineering Systems 2015, Procedia Computer Science 60 (2015) 708-713.
[3] Mahendra Kumar Ahirwar, Manish Kumar Ahirwar, Uday Chourasia, Anomaly Detection in the Services Provided by Multi Cloud Architectures: A Survey, IJRET: International Journal of Research in Engineering and Technology, 2014, eISSN: 2319-1163, pISSN: 2321-7308.
[4] Ana Cristina Oliveira, Marco Spohn, Reinaldo Gomes, Do Le Quoc and Breno Jacinto Duarte, Improving Network Traffic Anomaly Detection for Cloud Computing Services, ICSNC 2014: The Ninth International Conference on Systems and Networks Communications.
[5] Jayveer Singh, Manisha J. Nene, A Survey on Machine Learning Techniques for Intrusion Detection Systems, International Journal of Advanced Research in Computer and Communication Engineering, Vol. 2, Issue 11, November 2013.
[6] Ms. Vijitha Kondiparthi, Ms. Greeshmananda V., Machine Learning For Cloud Computing Intrusion Detection, International Journal of Innovative Research and Development, 2013, ISSN: 2278-0211 (Online).
[7] Salima Omar, Asri Ngadi, Hamid H. Jebur, Machine Learning Techniques for Anomaly Detection: An Overview, International Journal of Computer Applications (0975-8887), Volume 79, No. 2, October 2013.
[8] Hua Tang, Zhuolin Cao, Machine Learning-based Intrusion Detection Algorithms, Journal of Computational Information Systems 5:6 (2009) 1825-1831.
[9] Chandola V., Banerjee A., Kumar V., Anomaly detection: A survey, ACM Computing Surveys (CSUR), 41(3), 2009, p. 15.
[10] Shelke, P.K., Sontakke, S. and Gawande, A.D. (2012) Intrusion Detection System for Cloud Computing. International Journal of Scientific & Technology Research, 1, 67-71.
[11] Roschke, S., Cheng, F. and Meinel, C. (2009) Intrusion Detection in Cloud. Eight IEEE International Conference on Dependable Automatic and Secure Computing, Liverpool, 729-734.
[12] Tang D. H., Cao Z., Machine Learning-based Intrusion Detection Algorithm, Journal of Computational Information Systems, 5(6), 2009, p. 1825-1831.
[13] T. Shon and J. Moon, A hybrid machine learning approach to network anomaly detection, Information Sciences, vol. 177, pp. 3799-3821, 2007.
[14] Abhinav S. Raut, Kavita R. Singh, Anomaly Based Intrusion Detection - A Review, Int. J. on Network Security, Vol. 5, 2014.
[15] Anton Gulenko, Marcel Wallschläger, Florian Schmidt, Odej Kao, Feng Liu, Evaluating Machine Learning Algorithms for Anomaly Detection in Clouds, 2016 IEEE International Conference on Big Data.
[16] Amjad Hussain Bhat, Sabyasachi Patra, Dr. Debasish Jena, Machine Learning Approach for Intrusion Detection on Cloud Virtual Machines, (IJAIEM) Volume 2, Issue 6, June 2013.
[17] Hung-Jen Liao, Chun-Hung Richard Lin, Ying-Chih Lin, Kuang-Yuan Tung, Intrusion detection system: A comprehensive review, Journal of Network and Computer Applications 36 (2013) 16-24.
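
As an illustration of the three-stage methodology described in Section 1.1 (parameterization, training, detection), the following is a minimal sketch and not a method taken from any of the surveyed works. It assumes a simple statistical model of normal behavior (per-feature mean and standard deviation) and uses the largest z-score as the deviation. The feature names, the sample records and the threshold of 3.0 are hypothetical choices made only for this example.

```python
from statistics import mean, stdev

# Hypothetical feature names; real audit records would need their own mapping.
FEATURES = ("bytes_sent", "connections")


def parameterize(record):
    """Parameterization: turn a raw record (dict) into a numeric feature vector."""
    return [float(record[name]) for name in FEATURES]


def train(normal_records):
    """Training: summarize normal behavior as a (mean, std) pair per feature."""
    vectors = [parameterize(r) for r in normal_records]
    return [(mean(col), stdev(col)) for col in zip(*vectors)]


def deviation(model, record):
    """Largest per-feature z-score of the record relative to the normal model."""
    scores = []
    for value, (mu, sigma) in zip(parameterize(record), model):
        # A feature with zero spread in the training data is ignored in this sketch.
        scores.append(abs(value - mu) / sigma if sigma > 0 else 0.0)
    return max(scores)


def detect(model, record, threshold=3.0):
    """Detection: trigger an alarm when the deviation exceeds the threshold."""
    return deviation(model, record) > threshold


# Made-up "normal" observations used only to build the model.
normal = [
    {"bytes_sent": 100, "connections": 4},
    {"bytes_sent": 110, "connections": 5},
    {"bytes_sent": 90, "connections": 6},
    {"bytes_sent": 105, "connections": 5},
]
model = train(normal)

print(detect(model, {"bytes_sent": 102, "connections": 5}))    # close to normal
print(detect(model, {"bytes_sent": 5000, "connections": 80}))  # large deviation
```

The threshold controls the trade-off discussed in Section 2: lowering it catches more deviations but raises the number of false positives an analyst must examine.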
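
Section 4 names accuracy, detection rate and false alarm rate as the measures used to compare algorithms but does not define them. The sketch below assumes the conventional definitions: detection rate as TP / (TP + FN), false alarm rate as FP / (FP + TN), and accuracy as (TP + TN) / total, where "positive" means a record flagged as anomalous. The function names and the sample labels are hypothetical and serve only to illustrate the arithmetic.

```python
def confusion_counts(predicted, actual):
    """Return (tp, fp, tn, fn); True means 'anomaly' in both lists."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    tn = sum(1 for p, a in pairs if not p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    return tp, fp, tn, fn


def evaluate(predicted, actual):
    """Compute the three measures under the assumed conventional definitions."""
    tp, fp, tn, fn = confusion_counts(predicted, actual)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "detection_rate": tp / (tp + fn) if (tp + fn) else 0.0,
        "false_alarm_rate": fp / (fp + tn) if (fp + tn) else 0.0,
    }


# Made-up detector output and ground truth for eight records.
predicted = [True, True, False, False, True, False, False, False]
actual = [True, False, False, False, True, True, False, False]

metrics = evaluate(predicted, actual)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this made-up sample, two of the three true anomalies are caught and one of the five normal records is flagged, which shows why a detector can report reasonable accuracy while still missing attacks.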
