76 (IJCNS) International Journal of Computer and Network Security,

Vol. 2, No. 10, 2010

An Empirical Study to Investigate the Effectiveness of Masquerade Detection

Jung Y. Kim (1), Charlie Y. Shim (2) and Daniel McDonald (3)

(1) Utica College, Computer Science Department,
1600 Burrstone Road, Utica, NY, 13492
jkim@utica.edu
(2) Kutztown University of Pennsylvania, Computer Science Department,
PO Box 730, Kutztown, PA, 19530
shim@kutztown.edu
(3) Kutztown University of Pennsylvania, Computer Science Department,
PO Box 730, Kutztown, PA, 19530
dmcdo373@live.kutztown.edu

Abstract: Masquerade detection is an important research area in computer
security. A masquerade attack can be identified when an audited user’s
patterns significantly deviate from his or her normal profile. One of the
popular approaches in masquerade detection is to use Support Vector
Machines (SVMs). The main goal is to maximize detection rates while
minimizing the number of false alarms. In this paper, we explore various
aspects of masquerade detection using SVMs to determine how the overall
effectiveness of the system can be enhanced. Setting a proper threshold
level has a greater influence on the false alarm rate than on the
detection rate. In addition, we have found that the classifier that takes
the order of the instances into account outperformed the other type when
the instance length is not overly long.

Keywords: SVM (Support Vector Machine), masquerade detection, detection
rate, false alarm rate.

1. Introduction

The main purpose of the masquerade detection framework is to identify
masquerade attempts before serious loss or damage occurs. Masquerade
attacks are difficult to detect since masqueraders enter the system as
valid users and thus are not affected by existing access control schemes
[1]. Masquerade detection can be designed as a class of anomaly detection
in that the test instance is declared anomalous if it does not fall
within the boundary of normal behavior [2]. Note that the behavior of
masqueraders is unusual and thus deviates from that of legitimate users.
The goal of anomaly detection is to maximize detection rates while
minimizing false alarm rates. Various approaches have been tried to
implement masquerade detection, and one of the most recent attempts is to
use the Support Vector Machine (SVM). This is because the SVM has
achieved excellent classification performance in a wide range of
applications such as text, images, and computer security [3], [4], [5].
However, the focus has usually been placed on demonstrating the
superiority of the proposed method over other approaches.

The important topic that has been overlooked is how we can maximize the
effectiveness of masquerade detection using the SVM. Factors such as the
relationship between the type of employed classifier and the ideal length
of instances can be examined further. Unfortunately, few studies on
SVM-based masquerade detection have discussed these issues. Changes in
these parameters directly affect the overall effectiveness of masquerade
detection, and this is what we studied in this research. The rest of the
paper is organized as follows. Section two surveys the previous work
related to the topic that has been investigated. In section three, we
present our empirical study and illustrate the effect of adjusting
different parameters in detail. Section four summarizes our work and
concludes with our findings.

2. Related Work

Masquerade detection is an important field of study, and various
approaches such as Naïve Bayes classifiers [6], [7] and Support Vector
Machines [8], [9], [10] have been attempted. Applying a Naïve Bayes
classifier is simple and effective. A drawback of using this classifier,
however, is that new “unseen” characteristics are more likely to be
considered a legitimate user’s patterns, which allows a masquerader to
elude detection [11].

Wang and Stolfo found that employing the SVM in masquerade detection
performed better than a Naïve Bayes classifier in that it showed higher
detection rates [6]. In an attempt to increase the efficiency of the
system, they used “truncated” UNIX commands and a large command set. They
used the one-class training algorithm to detect masquerade attacks and
asserted that increasing the detection threshold might allow for a higher
detection rate [6]. However, even though higher detection accuracy could
be achieved, their system left the problem of false alarm rates
escalating at the same time. Therefore, the idea of combining the output
of the system with other sensors was suggested to reduce the number of
false alarms.

Maxion applied “enriched” UNIX commands (commands with their
corresponding arguments) to a Naïve Bayes classifier [11]. Higher
detection rates were achieved with minimally increased false alarm rates.
Moreover, irregularly used arguments of enriched commands could be
identified [11]. However, the problem of proper threshold setting was
still left unaddressed. Another study showed that composing two kernel
methods improved the detection accuracy while slightly lowering the false
alarm rate [8].

3. Empirical Study and Experimental Results

As we surveyed in the previous section, the SVM has been popularly
employed in masquerade detection. Nevertheless, these studies mainly
focused on demonstrating the superiority of the proposed model when
compared to other approaches. The main purpose of our research is to
provide a guideline for modeling an ideal set of features in utilizing
the SVM so that the effectiveness of masquerade detection can be
maximized. Our study analyzes the performance of masquerade detection
with respect to three parameters: threshold levels, the type of
classifiers, and the length of instances. Section 3.1 describes our
experimental design and overall test results.

3.1 Dataset and Experimental Design

We used the most popular dataset, provided by Schonlau et al., for our
experiments. This dataset is called the SEA data and it includes 15,000
UNIX commands for each of 50 users [7]. We believed that the sequence of
UNIX commands was a good identifier to determine the identity of each
user. This approach has been widely used by many researchers [8], [9],
[11]. The sequence of commands was parsed and partitioned to generate
meaningful subgroups which were fed to the SVM. That is, each user’s
command history in the dataset was divided into multiple files which were
broken down into two distinct categories: training data and test data.
Commands were first taken from the dataset to compile a 500-line file for
training on the appropriate SVM, which generated a profile for each user.
Next, multiple files were generated for each sequence length, 4 to 13,
for the purpose of identifying the effectiveness in terms of sequence
length. Each user’s profile was then trained on the appropriate SVM, and
the profile was used to classify each test file for each user. For each
user, 500 tests were conducted.

We analyzed detection rates by classifying a user’s profile against other
users’ test files. Comparing a user’s test data against his (or her) own
normal profile generated false alarms. Data was then collected to
determine the average detection rate and false alarm rate for each user
in terms of different instance lengths. This data was then broken down by
three threshold values: 35%, 50%, and 70%. This in turn was averaged to
determine the average detection rate and false alarm rate for each
sequence length. That way, we could determine the relationship between
the threshold level and the performance of masquerade detection.

Different types of classifiers were used for SVMs and we classified the
types of classifiers into two distinct groups: ordered and unordered. The
order of the command sequence is considered in ordered classifiers
whereas it is not taken into account in unordered classifiers. The LIBSVM
(a Library for Support Vector Machines) is an integrated implementation
for support vector classification [12] and this was selected for
unordered classification. The SVMHMM (Support Vector Machines Hidden
Markov Model) is an implementation of SVMs for sequence tagging [13] and
it was selected for ordered classification.

The overall results of the tests that we have conducted are presented in
Figure 1 and Figure 2. Figure 1 shows detection rates when different
threshold (TH) values, 35%, 50%, and 70%, were applied to each SVM, and
Figure 2 shows their corresponding false alarm rates. Note that both
detection rates and false alarm rates increase as the instance length
gets longer. Detailed analyses of our findings are described in Sections
3.2 to 3.4.

Figure 1. Comparison of detection rates

Figure 2. Comparison of false alarm rates

3.2 Threshold values

The threshold value represents the selected minimum matching percentage
used to decide whether the audited behavior is classified as a masquerade
attack or not. Determining an appropriate threshold level directly
affects the performance of the system. That is, in general, raising the
threshold value increases both detection rates and false alarms. Figure 3
and Figure 4 show the average detection rates and false alarm rates when
three threshold values, 35%, 50%, and 70%, were applied to each SVM.

Our testing showed that threshold values had a profound effect on
detection and false alarm rates. Increasing the
threshold value increased both detections and false alarms. Although the
lowest threshold, 35%, had the lowest detection rates, this threshold
produced minimal false alarms in testing (see Figure 3 and Figure 4).
Much higher detection rates were seen at a threshold of 50% than at a
threshold of 35%. While detection rates as high as 93.3% (SVMHMM) and
96.3% (LIBSVM) were achieved at a threshold level of 70%, this came at
the cost of a high false alarm rate, 83.5% (SVMHMM) and 89.1% (LIBSVM)
respectively. Thus, a threshold of 70% or higher is seen as impractical
for use due to the excessively high false alarm rates.

Figure 3. Analysis of thresholds (SVMHMM)

Figure 4. Analysis of thresholds (LIBSVM)

One important fact that we have found is that the number of false alarms
increased at a much faster rate than detections as the threshold value
was increased; note that an average increase of 67% in the false alarm
rate was found, whereas there was only a 23.2% increase in the detection
rate when the threshold value was raised to 70%. Thus, an appropriate
threshold level needs to be selected in such a way that reasonable
detection rates and tolerable false alarm rates can be achieved.

3.3 The type of classifiers

The SVMHMM (ordered classifier) outperformed the LIBSVM (unordered
classifier) in minimizing false alarms when the instance lengths are not
overly long. There was a significant difference in the false alarm rates
between the two SVMs when the smallest instance length was used. The
greatest difference between the two classifiers was seen at a 50%
threshold with the smallest two sequence lengths (see Figure 2). False
alarm rates differed by 26 points at sequence length four and by 18
points at length five.

Note that the SVMHMM outperformed the LIBSVM in most cases with respect
to detection rates when the instance lengths were less than 10 (see
Figure 1). However, as the instance length was increased, the performance
of both SVMs converged at length 10. The performance degradation in the
SVMHMM seems to be caused by the increasing particularity as the instance
lengths become too long. This is because it is less likely that users
always enter a long series of commands in exactly the same pattern.

The performance of the LIBSVM, however, turned out to be less dependent
on the instance lengths, and this behavior is shown in Figure 1 and
Figure 2. The reason behind this phenomenon is that the specific order of
commands entered by users is not considered in the LIBSVM. Therefore,
there is no significant change in the performance as the instance length
varies.

3.4 Instance lengths

In order to determine the effect of applying different instance lengths,
we classified the employed instance lengths into three groups: Short
(lengths of 4 to 6), Medium (lengths of 7 to 9), and Long (lengths of 10
to 13). Testing results are averaged and redrawn using these groups and
they are represented in Figures 5, 6, 7, and 8.

Figure 5. Analysis of detection rates (SVMHMM)

Figure 6. Analysis of false alarm rates (SVMHMM)

When the SVMHMM was used, increasing the instance length was shown to
increase both detections and false alarms (see Figure 5 and Figure 6).
Thus, detection rates can be maximized by using larger instance lengths
whereas
a smaller instance length is desirable in order to maintain lower false
alarm rates.

As we previously mentioned in Section 3.3, the performance of the LIBSVM
is less affected by the instance length (see Figure 7 and Figure 8). Note
that there was a slight benefit in the detection rate as the instance
lengths increased under the 35% threshold setting.

Figure 7. Analysis of detection rates (LIBSVM)

Figure 8. Analysis of false alarm rates (LIBSVM)

However, as the instance length increased, results using different
classifiers (SVMHMM and LIBSVM) began to converge (see Figures 5 to 8).
Results showed that both classifiers converge to the same detection and
false alarm rates when a long instance group was used.

Figure 9. Analysis of Instance Length Change

Therefore, we can assert that there is no significant benefit in
employing a longer instance; however, it is still attractive to use a
medium length instance; note that there was a 21.86% increase in the
detection rate (see Figure 9) when the instance was lengthened from short
to medium.

4. Conclusion

There have been many approaches to tackling masquerade attacks. However,
these studies primarily focused on demonstrating the advantage of the
proposed model when compared to other approaches. The main goal of our
research is to investigate the effectiveness of masquerade detection
using SVMs. We analyzed the performance of masquerade detection with
respect to three parameters: threshold levels, the type of classifiers,
and the length of instances.

In conclusion, none of the parameters that were selected and tested was
able to improve detection rates while decreasing false alarms. In all
tests, increased detection rates correlated with increased false alarm
rates. However, masquerade detection using sequence classification was
more successful in limiting false alarms when smaller instance lengths
were used. Increasing the threshold value to 70% showed little benefit
since false alarm rates increased significantly with only a slight
increase in detection rates. This study shows that there is an advantage
in applying smaller instance lengths to a classifier which considers the
order, as an effort to minimize false alarm rates. If maximizing
detection capability is the main goal, the type of classifier is less
relevant. Instead, it is desirable to use a longer instance at a
threshold level where false alarms can be kept within reasonable limits.

Finally, a new dataset, if available, could be used in order to support
and reinforce the validity of our findings. This research will help to
provide a principle for modeling an ideal set of rules so that the
effectiveness of masquerade detection can be maximized.

References

[1] B. Szymanski and Y. Zhang, “Recursive Data Mining for Masquerade
Detection and Author Identification,” Proceedings of the Fifth Annual
IEEE SMC, pp. 424-431, 2004.
[2] V. Chandola, A. Banerjee, and V. Kumar, “Anomaly detection: A
survey,” ACM Computing Surveys (CSUR), Volume 41, Issue 3, pp.
15:1-15:58, 2009.
[3] Z. Liu, J. Liu, and Z. Chen, “A generalized Gilbert's algorithm for
approximating general SVM classifiers,” Neurocomputing, Volume 73, Issue
1-3, pp. 219-224, 2009.
[4] J. Wu, Z. Lin, and M. Lu, “Asymmetric semi-supervised boosting for
SVM active learning in CBIR,” Proceedings of the ACM International
Conference on Image and Video Retrieval, pp. 182-188, 2010.
[5] T. Joachims, “Text categorization with support vector machines:
Learning with many relevant features,” Proceedings of the European
Conference on Machine Learning (ECML), pp. 137-142, 1998.
[6] K. Wang and S. Stolfo, “One-Class Training for Masquerade Detection,”
Proceedings of the ICDM Workshop on Data Mining for Computer Security
(DMSEC), Melbourne, pp. 2-7, 2003.
[7] M. Schonlau, W. DuMouchel, W. Ju, A. Karr, M. Theus, and Y. Vardi,
“Computer Intrusion: Detecting Masquerades,” Statistical Science, Vol.
16, No. 1, pp. 58-74, 2001.
[8] J. Seo and S. Cha, “Masquerade detection based on SVM and
sequence-based user commands profile,” Proceedings of the 2nd ACM
Symposium on Information, Computer and Communications Security (ASIACCS
'07), 2007.
[9] H. Kim and S. Cha, “Empirical evaluation of SVM-based masquerade
detection using UNIX commands,” Computers & Security, Vol. 24, No. 2, pp.
160-168, 2005.
[10] S. Mukkamala and A. Sung, “Feature Ranking and Selection for
Intrusion Detection Systems Using Support Vector Machines,” Proceedings
of the Second Digital Forensic Research Workshop (DFRWS), Syracuse, 2002.
[11] R. Maxion, “Masquerade Detection Using Enriched Command Lines,”
Proceedings of the International Conference on Dependable Systems &
Networks, pp. 22-25, 2003.
[12] C. Chang and C. Lin, “LIBSVM -- A Library for Support Vector
Machines,” [Online]. Available: http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[Accessed: Jun. 14, 2010].
[13] T. Joachims, “SVMHMM - Sequence Tagging with Structural Support
Vector Machines,” August 14, 2008, [Online]. Available:
http://www.cs.cornell.edu/People/tj/svm_light/svm_hmm.html. [Accessed:
Jul. 26, 2010].
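
The partitioning of command histories into fixed-length instances and the
threshold rule described in the experimental design can be illustrated
with a short Python sketch. It is only an illustration under stated
assumptions, not the authors' implementation: the paper does not say
whether instances overlap (non-overlapping chunks are assumed here), the
threshold is assumed to mean that a test file is flagged as a masquerade
when the fraction of its instances matching the profile falls below the
threshold, and a plain set lookup stands in for the trained SVM. All
function names and the toy command lists are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Instance = Tuple[str, ...]


def partition(commands: Sequence[str], length: int) -> List[Instance]:
    """Split a command history into consecutive, non-overlapping instances.

    Assumption: non-overlapping chunks; an incomplete tail is dropped.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    usable = len(commands) - len(commands) % length
    return [tuple(commands[i:i + length]) for i in range(0, usable, length)]


def is_flagged(instances: Sequence[Instance],
               matches_profile: Callable[[Instance], bool],
               threshold: float) -> bool:
    """Flag a test file when the matching fraction is below the threshold.

    Assumption: this is one plausible reading of "minimum matching
    percentage"; it makes a higher threshold flag more files.
    """
    if not instances:
        raise ValueError("no instances to classify")
    matched = sum(1 for inst in instances if matches_profile(inst))
    return matched / len(instances) < threshold


def rate(flags: Sequence[bool]) -> float:
    """Percentage of test files that were flagged."""
    return 100.0 * sum(flags) / len(flags)


# Toy data (hypothetical, not taken from the SEA dataset).
training = ["ls", "cd", "ls", "vi", "ls", "cd", "ls", "vi"]
profile = set(partition(training, 2))   # stand-in for a trained SVM profile
own_test = ["ls", "cd", "ls", "vi", "cd", "ls"]
other_test = ["gcc", "make", "ls", "cd", "gdb", "make"]

for th in (0.35, 0.50, 0.70):
    # Flagging the user's own data counts as a false alarm;
    # flagging another user's data counts as a detection.
    false_alarm = is_flagged(partition(own_test, 2), profile.__contains__, th)
    detected = is_flagged(partition(other_test, 2), profile.__contains__, th)
    print(f"TH={th:.0%}: detected={detected}, false alarm={false_alarm}")
```

With this toy data the other user's file is flagged at all three
thresholds, while the user's own file is flagged only at 70%, which
mirrors the reported tendency of false alarms to grow as the threshold is
raised. Averaging such flags over many test files with `rate` would give
per-user detection and false alarm percentages.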

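
The distinction between ordered and unordered classifiers can also be
made concrete with a small Python sketch. The two encodings below are
hypothetical illustrations chosen for clarity; they are not the actual
input formats of LIBSVM or SVMHMM. The unordered view keeps only command
counts, while the ordered view pairs each command with its position, so
two instances containing the same commands in a different order look
identical to the former and different to the latter.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def unordered_features(instance: Sequence[str]) -> Dict[str, int]:
    """Bag-of-commands view: command counts, order discarded."""
    return dict(Counter(instance))


def ordered_features(instance: Sequence[str]) -> Tuple[Tuple[int, str], ...]:
    """Position-tagged view: each command is paired with its position."""
    return tuple(enumerate(instance))


a = ["cd", "ls", "vi", "make"]
b = ["make", "vi", "ls", "cd"]   # same commands as a, in reverse order

# Identical under the unordered view, different under the ordered view.
print(unordered_features(a) == unordered_features(b))
print(ordered_features(a) == ordered_features(b))
```

This is consistent with the observation that an order-sensitive
classifier becomes more particular as instances grow longer, since a
longer instance is less likely to recur in exactly the same order, while
an order-insensitive one is less affected by instance length.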