
2010 Sixth International Conference on Information Assurance and Security

RAPID: Reputation based Approach for Improving Intrusion Detection Effectiveness
Ashley Thomas
SecureWorks, Inc
Atlanta, USA
athomas@secureworks.com

Abstract - Reducing false positives has been one of the toughest challenges and a very practical problem in real life deployments of intrusion detection systems. It leads to decreased confidence in IDS alerts. The security analyst is faced with a choice between disabling valuable signatures that also generate false positives on one hand, and missing true alerts among the flood of false positives on the other. In this paper we present an architecture that utilizes IP reputation along with signature levels in order to reduce false positives and thereby increase the effectiveness of the IDS. In the proposed approach the IDS signatures are classified and grouped into various levels based on their false positive rating, and the incoming traffic is analyzed by one or more of the signature levels based on the reputation of the IP addresses involved. We also discuss a prototype implementation of the proposed approach based on the open source IDS Snort. Evaluation showed promising results in reducing false positives and a corresponding improvement in the Bayesian detection rate for the prototype system as compared to Snort.

Keywords - Intrusion Detection Systems, Reputation, False positive reduction

I. INTRODUCTION


Among the various challenges faced by IDS devices, the case of high false positives is a significant one [1]. False positives occur when the criteria or signature used to identify a malicious activity also misidentifies legitimate activity as malicious. In a typical environment there are very few intrusions in a day, whereas there are thousands or millions of benign sessions. This means that the base rate, or the unconditional probability of intrusion, in such environments is extremely low. Axelsson [2] showed that, due to the base rate fallacy, an IDS device's effectiveness is drastically affected: the IDS generates a huge volume of false positives even though the detection rate or accuracy may be near perfect. Subsequently, the security analyst soon learns to ignore such alerts, or disables the signature completely. This, in effect, reduces the effectiveness of the intrusion detection system.

There are signatures that are created to detect attacks against specific vulnerabilities, and others that are aimed at detecting some generic pattern or activity. The former type, i.e., the highly specific signatures, is usually recommended in order to avoid false positives. The downside of this approach is that such signatures fail to detect the attack when there is the slightest variation in it. Generic signatures and detection modules, on the other hand, target a broad spectrum of malicious behavior. For example, consider a signature that searches for a series of 0x90 bytes (a noop sled), which is often part of a buffer overflow attack. Similarly, the fnord preprocessor module that is part of the open source IDS Snort detects polymorphic shellcode by searching for series of bytes that correspond to noop instructions. Such mechanisms are broad enough to detect any buffer overflow payload that contains such a pattern, and will successfully detect new attacks, zero days and variations of existing attacks. However, they also tend to match certain benign payloads, such as a binary file transfer, a JPEG file, or some encrypted payload. In fact, since such payloads are encountered far more frequently in daily Internet traffic, these signatures or modules are likely to generate more false positives than true positives. Similarly, behavior based and anomaly based intrusion detection are other classes of mechanisms that detect attacks that are not specific to any vulnerability or exploit; these also tend to generate more false positives.

With a typical IDS device that uses very specific signatures covering attacks against specific vulnerabilities, it is a race against time: new signatures have to be developed and deployed before the attacker launches an attack. The generic signatures, on the other hand, offer protection irrespective of the type of vulnerability, and hence are useful. Instead of disabling such signatures in the face of huge numbers of false positives, it would be better if such signatures could be applied to a subset of the traffic, especially the more suspicious subset.

In this paper, we propose an IDS architecture that groups signatures and detection mechanisms according to their false positive rating and priority. They are then selectively applied to the incoming traffic based on its relative suspicion level; for example, highly suspicious traffic is analyzed by both the low and high false positive groups. The goal of this architecture is to increase IDS effectiveness.

As part of the proposed IDS architecture, we also explore a reputation based approach for classifying incoming traffic into various classes of suspicion level. Most network based attacks use a multi stage strategy, for example reconnaissance, gaining access, maintaining access, inflicting damage, attacking further, and covering tracks [3]. By detecting a prior stage of an attack and maintaining a reputation for the concerned IP addresses, both attacker and victim, subsequent stages may be detected using the combined power of the specific and generic signature sets. Attacks in the past that were part of a separate attack (rather than a prior stage of the same attack) also contribute to the reputation.


II. RELATED WORK

The challenge of controlling false positives may be addressed at various stages. It may be controlled at the IDS device itself, for example by writing more specific signatures. Alternatively, once the alert has been generated by the IDS device, approaches such as alert clustering, alert aggregation and alert correlation may be used to address the false positives. Valdes et al [12] used a heuristic based approach to group alerts based on alert similarity. Debar et al [13] proposed an approach that aggregates and correlates alerts produced by a number of different IDS devices. Our approach attempts to control the false positives at the IDS device itself using reputation and signature levels. By preventing the generation of false positives completely, critical IDS resources and post processing work may be saved. However, both approaches may be used in conjunction as they are essentially complementary.


Bolzoni et al [7] approached the false positive problem by using correlation and alert verification. Using the additional context of output anomalies, and correlating these with the alerts generated by the IDS, the false positives are reduced. The effectiveness of this approach depends on the accurate identification of an output anomaly by the output anomaly detector engine for alerts generated by the NIDS. For every attack there may not be a corresponding output anomaly, and this could result in false negatives. Other practical network scenarios such as asymmetric routing could also limit its effectiveness.



Christodorescu et al [4] addressed the problem of false positives using a combination-of-detectors approach. The approach considered a set of attacks that can be detected by the IDS using an expensive but more accurate method, for example sandboxing. Due to performance constraints this method cannot be applied to every connection. Therefore, a combination of detectors was used: an efficient detector distinguished between benign and might-be-intrusion events, and a less efficient one further analyzed the might-be-intrusions to make the final decision. While remaining under given performance constraints, their approach is able to apply costly detection techniques to a small subset of the traffic. The approach suggests a solution to reduce false positives associated with signatures for which there is an expensive, but accurate, alternate method to detect the considered attack. However, for detection techniques that have a high frequency of false positives, there need not exist an alternative detection technique that is costly but has a lower false positive rate. Also, in other cases, implementing the expensive detection in a real scenario may not be practical; for example, maintaining various versions of FTP servers for sandboxing in order to detect attacks against those FTP server versions may not be feasible. In addition, any inconsistencies between the real server and the sandbox server (differences in version, or patch level) may lead to errors in detection. Our approach can be applied to reduce the false positives of any signature, including very generic signatures, by placing them in an appropriate signature level and selectively applying them to the network traffic based on reputation.

III. BACKGROUND

Bayesian Detection Rate

The Bayesian detection rate is a metric for measuring IDS effectiveness [2][15], and is defined as the likelihood of an actual intrusion when an alert has been generated by the IDS. Axelsson [2] showed that when the base rate of intrusion is low, the Bayesian detection rate is dominated by the false positive rate; even large differences in the true positive rate do not alter it substantially. The base rate of intrusion is the unconditional probability of intrusion.

Let I and ~I denote the intrusion and non-intrusion events, and let A and ~A denote the presence and absence of an alert. The Bayesian detection rate, P(I|A), is the probability of an intrusion given that an alert has been generated, and is given by

P(I|A) = P(I) P(A|I) / [ P(I) P(A|I) + P(~I) P(A|~I) ]        (1)

where P(A|I) is the true positive (detection) rate, P(A|~I) is the false positive rate, and P(~I) = 1 - P(I). When P(I) is very low (of the order of 2 x 10^-5), even for an unrealistically high detection rate P(A|I) of 100%, the false positive rate P(A|~I) has to be extremely low (of the order of 10^-5) for the Bayesian detection rate to reach 66%.
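As a quick numerical check of this claim (a minimal sketch; the probabilities are the illustrative values quoted above, not measurements from any particular deployment):

```python
# Bayesian detection rate P(I|A) from Eq. (1).
def bayesian_detection_rate(base_rate, tp_rate, fp_rate):
    # base_rate = P(I), tp_rate = P(A|I), fp_rate = P(A|~I)
    return (base_rate * tp_rate) / (base_rate * tp_rate + (1 - base_rate) * fp_rate)

# P(I) = 2e-5, P(A|I) = 1.0 (perfect detection), P(A|~I) = 1e-5
print(bayesian_detection_rate(2e-5, 1.0, 1e-5))  # ~0.667, i.e. roughly 66%
```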

Effect of P(I), for a given P(A|I) != 0 and P(A|~I) != 0

In (1), when P(I) is 1 (completely attack traffic), the Bayesian detection rate reaches its maximum value of 1 for any P(A|I) != 0. On the contrary, when P(I) is 0 (completely benign traffic), the Bayesian detection rate becomes 0, given that the false positive rate P(A|~I) is non-zero. In realistic networks, the base rate of intrusion is very low and hence results in a low Bayesian detection rate. One way to improve the Bayesian detection rate, and hence the effectiveness of the system, is to increase the base rate of intrusion.

Effect of P(A|I), for a given P(I) != 0 and P(A|~I) != 0

As the true positive rate P(A|I) increases, the Bayesian detection rate increases.

Effect of P(A|~I), for a given P(I) != 0 and P(A|I) != 0

As the false positive rate P(A|~I) increases, the Bayesian detection rate decreases. When P(I) is low, this decrease is more prominent.
The total effectiveness of the IDS is a function of the effectiveness of each of the individual signatures. Signatures that have a relatively high false positive rate can cause the overall IDS system's false positive rate to be high and, subsequently, the Bayesian detection rate of the system to be low, especially in environments with a very low base rate of intrusion. When the Bayesian detection rate is really low, say 0.01, it means that out of 100 generated alerts only one may indicate an intrusion. This may lead to a scenario where identifying that one attack among all the noise becomes difficult and impractical. In practice, it is common for the security officer or administrator to disable signatures that are noisy, thereby in effect controlling the Bayesian detection rate of the system. The risk of missing an attack due to the disabled rule is accepted. Clearly, this becomes a tradeoff between false positives and false negatives.
IV. MOTIVATION FOR SIGNATURE LEVELS AND TRAFFIC CLASSIFICATION

In most cases of real network traffic data, the base rate of intrusion will be low, and in such situations the false positive rate dominates the Bayesian detection rate equation and results in a poor value. In order to improve the Bayesian detection rate, one option is to increase the base rate. This is the main point of our approach.

A typical signature based intrusion detection system may comprise thousands of individual signatures, each with varying priority, specificity, sensitivity, and cost. The signatures can be grouped into various signature levels based on their false positive rating (specificity) and also their priority. When the base rate of intrusion is low, it is mainly the signature levels with the relatively higher false positive ratings that affect the Bayesian detection rate of the system. This may be improved by increasing the effective base rate of intrusion from the perspective of these signature levels. In other words, if the traffic data subset that is analyzed by these signature levels can be created such that the base rate of intrusion for that subset is higher, then the Bayesian detection rate can be improved. This is the motivation for using the signature levels approach and for classifying the incoming traffic data so that it is analyzed by the appropriate signature levels.


Consider a signature that has a relatively high false positive rate. This signature may be placed in a signature level with other signatures of similar false positive ratings, and only the highly suspicious traffic subset (i.e., the subset with a high base rate of intrusion) is analyzed by that level. This can, in effect, result in a higher Bayesian detection rate for that signature and subsequently a higher Bayesian detection rate for the system. However, if a particular signature is applied only against a subset of the traffic, it may result in false negatives (missed attacks) due to the presence of an attack in the uninspected traffic subset. This essentially is a selection problem in which, based on the priority and the specificity (false positive rate) of the signatures and the tolerance level for false positives, the signatures are to be grouped into multiple levels.
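To illustrate why concentrating suspicious traffic into a smaller subset helps, the following sketch compares the Bayesian detection rate of a noisy signature applied to all traffic versus only to a reputation-selected subset (all counts and rates here are made-up illustrative values, not results from the paper):

```python
def bayesian_detection_rate(base_rate, tp_rate, fp_rate):
    return (base_rate * tp_rate) / (base_rate * tp_rate + (1 - base_rate) * fp_rate)

# Hypothetical numbers for illustration only.
total_conns, total_attacks = 1_000_000, 100           # overall base rate 1e-4
suspicious_conns, suspicious_attacks = 10_000, 80     # most attacks fall in the suspicious subset

overall_rate = total_attacks / total_conns            # 1e-4
subset_rate = suspicious_attacks / suspicious_conns   # 8e-3, roughly 80x higher

# The same noisy signature (TP rate 0.9, FP rate 0.01) becomes far more trustworthy
# when it inspects only the suspicious subset.
print(bayesian_detection_rate(overall_rate, 0.9, 0.01))  # ~0.009
print(bayesian_detection_rate(subset_rate, 0.9, 0.01))   # ~0.42
```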
V. REPUTATION BASED ANALYSIS

The concept of reputation is commonly used in the area of network security. It is used in email spam detection [11], where a list of IP addresses and corresponding reputation scores is maintained and used to identify spam email. It is used in mobile ad hoc networks [10] to identify misbehaving nodes, and in peer to peer networks to classify good and bad peers.

Reputation about an entity, for example an IP address, is an opinion that is gathered and maintained based on past information. Such reputation can be formed locally on a system, based on the information that is seen and analyzed on that system, or globally, based on information and events analyzed and shared between groups of trusted systems.

With respect to the problem at hand, a reputation system is useful in order to classify incoming traffic based on its relative level of suspicion. An IP address that has conducted reconnaissance or an attack in the past will be maintained in the reputation store with an associated score. Similarly, IP addresses of well known attackers and botnets may also be included in the reputation store. The more malicious activity an IP address has been involved in, the higher its reputation score. An IP address that was possibly compromised, based on past alerts, would also have a relatively high score, since the chance of that IP address initiating malicious traffic is relatively high.

Criteria for building reputation

There should be a set of defined criteria to build and maintain reputation for IP addresses. The following three criteria are used for that purpose:

Local Alerts: The IDS alerts generated are fed back to the reputation system. This information includes the source and destination IP addresses, and a suspicion score (that corresponds to the signature priority). The reputation of the concerned IP addresses increases with each generated alert.

Distributed Alert Information: Alert information may be shared among a group of trusted devices, in order to maintain a globally valid reputation system. In this paper we consider reputation built only from a single IDS perspective.

Traffic Heuristics: This is yet another factor that may be used to select suspicious IP addresses. For example, various characteristics of the network traffic could be analyzed for abnormalities. Abnormalities may be seen at the network layer in the form of a high percentage of fragments, or at the transport layer in the form of a high percentage of out of order segments, overlapping segments, etc. Such traffic abnormalities may be due to the use of IDS evasion tools such as fragroute [6].

Avoiding IP Address Spoofing

Since IP addresses can be spoofed easily, an attacker may try to manipulate and corrupt the reputation system. The system should take precautions to avoid such manipulation. Only traffic that has a very low chance of being spoofed should be considered for building reputation. It is advisable to avoid using UDP and ICMP based alerts because these have a high chance of spoofing. Also, stateful analysis of TCP should be done in order to avoid alerting on stateless traffic and corrupting the reputation store.
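A minimal sketch of how the local-alert criterion and the anti-spoofing precaution could fit together (the alert fields, the score value and the dictionary-based store are illustrative assumptions, not the paper's implementation):

```python
reputation = {}   # IP address -> accumulated suspicion score

def update_reputation(alert):
    """Feed a generated alert back into the reputation store (local alerts criterion).

    Only alerts from stateful, established TCP sessions are used, since UDP and
    ICMP source addresses are too easily spoofed.
    """
    if alert["proto"] != "tcp" or not alert["established"]:
        return
    for ip in (alert["src_ip"], alert["dst_ip"]):
        reputation[ip] = reputation.get(ip, 0) + alert["score"]

# Example: a priority-derived score of 100 applied to both endpoints of an alert.
update_reputation({"proto": "tcp", "established": True,
                   "src_ip": "203.0.113.7", "dst_ip": "192.0.2.10", "score": 100})
```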

VI. PROPOSED ARCHITECTURE AND PROTOTYPE

We propose a Reputation based Approach for Improving intrusion Detection (RAPID); the proposed architecture is shown in Fig. 1. The signatures and analysis modules are grouped into multiple levels based on priority and false positive rating. Fig. 1 shows only three levels; there may, however, be an arbitrary number of signature levels. For a given priority, the false positive rating increases as the level increases.

Figure 1. Proposed Architecture.


The classifier module does a reputation lookup in order to retrieve the suspicion scores of the source and destination IP addresses of the input traffic. The traffic is then analyzed by the signature levels, starting from the lowest level. There is a check after each level to verify whether the traffic needs to be analyzed by the higher levels (with higher false positive ratings). The alerts and heuristics generated by the IDS signatures and analysis modules are fed back so that the reputation of the IP addresses is maintained. Fig. 1 also shows the reputation being shared between distributed systems.

The main aspects of the architecture are as follows:

Signature levels: This is a logical grouping of the various IDS signatures and modules based on their relative false positive rating and priority. The priority relates to the relative importance of the signatures and modules.

Reputation system: This is a database or store that gives a reputation score for a given IP address. This store may be populated with the IP addresses of well known attackers and botnets. In addition, it is continually updated based on the local alert information. The system architecture should provide for this capability.

Classifier module: This module classifies the input traffic based on the reputation scores of the source and destination IP addresses.

RAPID: A Prototype Implementation

In order to validate our idea, we implemented and incorporated the framework into Snort [5] version 2.6. We chose Snort for the following reasons. Firstly, Snort is a widely used network intrusion detection system, and is mainly signature based. Secondly, although significant improvements have been made to Snort, it still struggles with a high number of false positives in real life scenarios. These reasons make Snort a good candidate against which our approach may be validated.

A Snort signature consists mainly of two parts: a rule header and a rule options list. The rule header contains the rule's action, protocol, source and destination IP addresses and netmasks, and the source and destination port information. The rule options section contains the alert message and information on which parts of the packet should be inspected to determine whether the rule action should be taken.

Incorporating the architectural needs

Based on the architectural needs described in Section VI, the following modifications and additions were made to the Snort program.


Signature levels: A new rule option, siglevel, was created in order to classify a rule according to its false positive rating. The rule option takes a numeric value as its argument, in the range 1 to 10, with 1 being the lowest false positive rating and 10 the highest. This numeric value maps to a corresponding siglevel_threshold, which is configurable. Signatures with similar siglevel ratings form a logical signature level. The allocation of a siglevel to a signature or module is an iterative process based on observed behavior (false positives) and may be based on heuristics.

Reputation system: A hash table based IP cache was implemented for maintaining reputation information, in order to facilitate the classification of traffic based on the suspicion level of the IP address. When a signature triggers, the corresponding IP address (source, destination or both) is inserted into this cache with an associated reputation score that is determined by the priority of the triggered signature. If an entry already exists, the score is incremented by the score specified in the signature. The higher the score, the more suspicious the corresponding IP address.


Signature priority: Snort already supports a rule option, classification, that includes a priority. A direct mapping between the signature priority and a signature score was implemented; e.g., priority 1 maps to a score of 100. This means that when a priority 1 signature is triggered, the reputation score of the corresponding IP address is incremented by 100. Table I shows a sample of the various signature classifications and the corresponding priorities used by Snort.
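For illustration, such a priority-to-score mapping might look like the following sketch (only the priority 1 to score 100 mapping is stated in the text; the remaining values are assumptions):

```python
# Assumed mapping from Snort classification priority to reputation score increment.
# Only "priority 1 -> 100" is given in the text; the other values are illustrative.
PRIORITY_SCORE = {1: 100, 2: 50, 3: 25, 4: 10}

def score_for_priority(priority):
    return PRIORITY_SCORE.get(priority, 10)

# A priority 1 signature firing adds 100 to the reputation of the involved IP addresses.
assert score_for_priority(1) == 100
```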


Classification module: When the IDS starts inspecting a packet, a lookup is made into the reputation information cache to retrieve the associated scores of the involved IP addresses. Traffic with a higher suspicion score (maintained in the reputation store) is analyzed by signature levels with higher siglevel values, i.e., by every level whose reputation score requirement is met: the score must be greater than or equal to that level's siglevel_threshold.
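An illustrative sketch of this level-selection check (the threshold values and the exact comparison are assumptions based on the description above, not code taken from the prototype):

```python
# Assumed configurable mapping: siglevel (1..10) -> minimum reputation score
# an IP address must have before that level also analyzes its traffic.
SIGLEVEL_THRESHOLD = {1: 0, 2: 0, 3: 50, 4: 100, 5: 200,
                      6: 300, 7: 400, 8: 500, 9: 750, 10: 1000}

def levels_to_apply(reputation_score):
    """Return the signature levels that should inspect traffic with this score."""
    return [level for level, threshold in sorted(SIGLEVEL_THRESHOLD.items())
            if reputation_score >= threshold]

print(levels_to_apply(0))    # [1, 2] -> only the most specific, low false positive levels
print(levels_to_apply(600))  # [1, 2, 3, 4, 5, 6, 7, 8] -> noisier levels are included
```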
TABLE I. CLASSIFICATION

Classification | Description | Priority
Successful-admin | Successful Admin Privilege Gain |
Attempted-admin | Attempted Admin Privilege Gain |
Successful-user | Successful User Privilege Gain |
Attempted-user | Attempted User Privilege Gain |
Successful-dos | Denial of Service |
Attempted-dos | Attempted Denial of Service |
Bad-unknown | Potentially Bad Traffic |

TABLE II. BASE RATES OF EVALUATION DATASETS

Dataset | Conns | Attacks | Base rate
Darpa Week 4-5 | 530571 | 193 | 0.00036375
Private set | 243570 | 1167 | 0.00479


VII. EXPERIMENTS AND RESULTS

In order to validate the approach, we evaluate and compare RAPID and open source Snort using the test datasets. The signature set used is the registered user version released by snort.org [8], comprising 3004 rules. Additional configuration consists of a list of preprocessor modules, including the stream4, frag2, fnord, back orifice, and portscan modules. We refer to this configuration as the default configuration.

Evaluation datasets

DARPA Dataset: The first set, the DARPA 1999 IDS evaluation dataset [14], is an artificially created dataset. Although the dataset is not current, we used it mainly because it is labeled (i.e., the ground truth is known), publicly available, and widely used as a standard for IDS evaluation purposes [9]. The dataset is divided into five weeks, two of which are training sets (attack free) and the remaining three weeks are test datasets that are labeled (containing attacks). For the evaluations, we used the week 1 dataset, which is attack free, and the week 4 and 5 datasets, which have labeled attacks. Together, weeks 4 and 5 contain 201 instances of attacks. Of these, 8 had the console as the attacker instead of an IP address. These were not considered for the evaluation since correlating them with the generated alerts based on IP address was not possible. This left a total of 193 attacks, as shown in Table II. The total number of connections (including TCP, UDP and ICMP) was identified by counting the distinct 5-tuples (protocol, source IP address/port, destination IP address/port), and the base rate of intrusion was calculated by dividing the total number of attacks by the total number of connections.

The DARPA evaluation dataset [14] also provides a truth scoring file that contains the details of each attack instance. This includes the name of the attack, the attacker's IP address, the victim's IP address, the victim's port (for example, port 25), the attack duration, and the attack start time, among other details. During evaluation, this information was used to classify the generated alerts as false positives or true positives, and also to identify false negatives, as described in the evaluation approach below.

Private Dataset: The second dataset is real network traffic (as opposed to artificially generated traffic) collected from a corporate network over a period of one week. The dataset consists of 25 million packets and comprises mainly HTTP, SMTP and SSH, among other traffic types. As opposed to the DARPA set, for which the ground truth or labeling of the dataset was provided, the labeling of the private dataset was done as a separate step before the evaluation. The set contains 243570 connections and a total of 1167 attacks. This gives an overall base rate of intrusion, P(I), of 0.00479 (shown in Table II).
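These base rates follow directly from the attack and connection counts (a quick check against Table II):

```python
# Base rate of intrusion = attacks / connections (values from Table II).
print(193 / 530571)    # ~0.00036375 (DARPA week 4-5)
print(1167 / 243570)   # ~0.00479    (private dataset)
```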

Evaluation approach

The evaluation conducted is a trace driven approach. We selected datasets for which the ground truth is known, i.e., which data (packets) are benign and which are malicious. Based on this information, we calculated the base rate of each dataset as shown in Table II. After the IDS under evaluation has processed the dataset, the generated alerts are compared against the ground truth in order to identify true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN). The false positive rate P(A|~I) and the true positive rate P(A|I) were calculated using the following equations:

P(A|~I) = FP / (FP + TN)

P(A|I) = TP / (TP + FN)

The Bayesian detection rate was then calculated from the false positive rate, the true positive rate and the base rate of intrusion using (1).
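A small sketch of how these rates can be derived from the labeled ground truth; the counts below are placeholders for illustration, not values from the evaluated datasets:

```python
def evaluation_rates(tp, fp, tn, fn):
    """Compute P(A|~I), P(A|I) and the base rate P(I) from labeled counts."""
    fp_rate = fp / (fp + tn)                       # P(A|~I)
    tp_rate = tp / (tp + fn)                       # P(A|I)
    base_rate = (tp + fn) / (tp + fp + tn + fn)    # P(I), attacks / connections
    return fp_rate, tp_rate, base_rate

# Placeholder confusion-matrix counts.
print(evaluation_rates(tp=80, fp=900, tn=99000, fn=20))
```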

Results Discussions:

DARPA Dataset: The DARPA week 1 dataset is attack free. We evaluated open source Snort (default configuration) using this dataset. Since the dataset was completely attack free, all the generated alerts were false positives and the number of true positives is zero, as shown in Table III. The top 5 high false positive alerts were identified (Table IV). The corresponding signatures and modules were allotted to an appropriate signature level based on their relative false positive rate as part of the RAPID configuration. This may be considered a tuning step for the RAPID system. Using this configuration we evaluated RAPID with the week 1 dataset. The results are shown in Table III; as expected, the results show a decrease in false positives.

TABLE III. RESULTS: DARPA WEEK 1 ATTACK-FREE DATASET

IDS | Total Alerts | FP | TP
Snort - default config | 125619 | 125619 | 0
RAPID | 281 | 281 | 0
Snort - disabled config | 281 | 281 | 0

As a separate step, the selected top 5 false positive signatures or modules were completely disabled in a separate configuration; we refer to this as the disabled configuration. Evaluating Snort with this disabled configuration also shows a decreased number of false positives, as expected. When there are no attacks, disabling the generic signatures that generate a high rate of false positives has no consequences, i.e., no subsequent false negatives, and the results are similar to the results for RAPID.

TABLE IV. TOP 5 FALSE POSITIVE ALERTS

Alert Name (signature id or module name) | Count | Sig level
SNMP request udp (sid: 1417) | 62251 |
SNMP public access udp (sid: 1411) | 62251 |
Possible Mutated IA32 NOP Sled detected (fnord) | 495 |
WEB-MISC /doc/ access (sid: 1560) | 209 |
POLICY FTP anonymous login attempt (sid: 553) | 301 |

As the next step, the systems were evaluated using the week 4-5 datasets, which contain labeled attacks. Snort was evaluated using both the default and disabled configurations. For RAPID, the previous configuration was used for the initial evaluation. Following the initial evaluation, the top 5 high false positive alerts were identified and the corresponding signatures were allotted to an appropriate signature level. After this second tuning step, the RAPID system was re-evaluated (referred to as RAPID tuned in the results tables). The results are shown in Tables V and VI. The results show that with RAPID the false positives dropped by 72% for the initial run, and by 88% after re-tuning. Completely disabling the high false positive signatures (disabled configuration) produced the fewest false positives.

TABLE V. DARPA WEEK 4-5 ATTACK DATASET

IDS | Total Alerts | False Alerts | True Alerts | Attacks Detected | Attacks Missed
Snort - default config | 29784 | 21066 | 8718 | 80 | 113
RAPID | 12482 | 5900 | 6582 | 69 | 124
RAPID - tuned | 3965 | 2680 | 1285 | 57 | 136
Snort - disabled config | 574 | 417 | 157 | 22 | 171

The false negatives, i.e., missed attacks, are highest for the Snort disabled configuration, where the high false positive signatures and/or modules were completely disabled. In the case of RAPID, the high false positive signatures were not completely disabled but placed in an appropriate signature level, so that only connections with a certain suspicion level were analyzed by those signatures. After the second tuning step (RAPID tuned), yet another set of top 5 high false positive signatures was placed in higher signature levels. In these cases, corresponding to the reduction in false positives, we see an increase in false negatives as well. But the false negatives are not as high as in the case of the Snort disabled configuration.

TABLE VI. DARPA WEEK 4-5 ATTACK DATASET - CALCULATIONS

IDS | P(A|I) | P(A|~I) | P(I|A)
Snort - default config | 0.414 | 0.02173 | 0.0068
RAPID | 0.357 | 0.00608 | 0.0209
RAPID - tuned | 0.295 | 0.00276 | 0.0374
Snort - disabled config | 0.113 | 0.00043 | 0.0872

The Bayesian detection rate, P(I|A), is the probability of an actual intrusion when an alert is generated. This was calculated using (1). For RAPID, this value lies between those of Snort's default configuration and disabled configuration. Further, it can be seen that after the second tuning step, there was a reduction in total false positives (with a corresponding increase in false negatives). Therefore, by tuning the RAPID system, the security officer can negotiate a middle ground between the two extremes, i.e., having to deal with a lot of false positives, and missing a lot of attacks.

Private Dataset: In the private dataset, all the attacks originated from a single IP address, which performed a series of scanning activities followed by attacks. Since the dataset is not labeled, manual analysis was performed. No instances of false negatives were identified when the dataset was analyzed by open source Snort (default configuration). This was used as a baseline for comparing results.

The results of the comparison between the open source IDS and our prototype system, RAPID, on the private dataset are shown in Table VII. The number of false positives reduced significantly for RAPID, and the resulting Bayesian detection rate also improved significantly. The Bayesian detection rate, i.e., the probability of an actual intrusion when an alert is generated, was almost 99% for RAPID compared to 34% for open source Snort.


TABLE VII. RESULTS: PRIVATE DATASET

IDS | TP | FP | P(A|I) | P(A|~I) | P(I|A)
Snort | 1229 | 2358 | | 0.00968 | 0.3320
RAPID | 1229 | | | 0.0000328 | 0.9932


It may be noted that the Bayesian detection rate of the system was higher with the private dataset, which had a higher base rate of intrusion. This is consistent with (1).

Limitations:

The approach is based on a reputation system that is keyed on IP addresses. In most cases, the IP address is a valid identity of the end point. There are, however, certain special environments, such as NAT (Network Address Translation) and DHCP, which we discuss below.


Network Address Translation:

In this case the traffic from all machines behind the NAT device, say a firewall, has the same source IP address. Attack traffic from one of the machines behind the NAT device can therefore affect the reputation of all the devices behind it. As a result, the benign as well as the suspicious traffic from those machines ends up being inspected by the high false positive signatures.

DHCP:

In the case of DHCP, machines receive a temporary IP address for a period of time as a lease, and after the lease expires the IP address may be allotted to a different machine. This poses a challenge for our approach, since our assumption is based on using the IP address as an identity for the attacker. Timeouts are used as a workaround for this issue, i.e., IP addresses entered into the reputation system are timed out after a period of inactivity during which no alerts or suspicious activity are detected.
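A minimal sketch of such an inactivity timeout (the 24 hour window and the dictionary-based store are illustrative assumptions):

```python
import time

INACTIVITY_TIMEOUT = 24 * 3600   # assumed inactivity window, in seconds
reputation = {}                  # ip -> {"score": int, "last_seen": float}

def record_alert(ip, score, now=None):
    now = time.time() if now is None else now
    entry = reputation.setdefault(ip, {"score": 0, "last_seen": now})
    entry["score"] += score
    entry["last_seen"] = now

def expire_stale_entries(now=None):
    # Drop entries with no alerts inside the timeout window, so an IP address
    # reassigned by DHCP does not inherit the previous lease holder's reputation.
    now = time.time() if now is None else now
    for ip in [ip for ip, e in reputation.items()
               if now - e["last_seen"] > INACTIVITY_TIMEOUT]:
        del reputation[ip]
```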


VIII. CONCLUSION AND FUTURE WORK



Reducing false positives is one of the major practical challenges in real life IDS deployments. In this paper we proposed an approach to improve false positive reduction. We discussed our implementation of that approach and conducted a side by side evaluation of our prototype system, RAPID, and the open source IDS Snort. Our evaluation showed promising results favoring the proposed approach.

As future work, the reputation may be calculated from a set of distributed and trusted devices. This may result in globally valid reputation information that can be used by each of the individual devices.

Our approach used the reputation concept to associate IP addresses with a certain level of badness. One variation to this approach could be to consider the goodness side as well. The better the reputation of an IP address, the less inspection or analysis packets to and from that IP address could be given. This may be of interest from a performance perspective, especially when the IDS is overloaded and driven toward resource exhaustion.

We discussed the use of reputation in order to create traffic subsets with an increased base rate of intrusion, so that certain valuable signatures that otherwise have a high false positive rating can be used against those subsets. Anomaly based IDS could also benefit from such an approach, especially when used in conjunction with a signature based IDS.

REFERENCES


[1] T. Pietraszek, "Using adaptive alert classification to reduce false positives in intrusion detection," in RAID 2004, vol. 3324 of Lecture Notes in Computer Science, Sophia Antipolis, France, Springer-Verlag, 2004, pp. 102-124.
[2] S. Axelsson, "The base-rate fallacy and its implications for the difficulty of intrusion detection," Proceedings of the 6th ACM Conference on Computer and Communications Security, 1999, pp. 1-7.
[3] N. Ye, Secure Computer and Network Systems: Modeling, Analysis and Design, John Wiley and Sons, 2008.
[4] M. Christodorescu and S. Rubin, "Can Cooperative Intrusion Detectors Challenge the Base-Rate Fallacy?" in Malware Detection, vol. 27 of Advances in Information Security, Springer-Verlag, October 2006.
[5] M. Roesch, "Snort: Lightweight intrusion detection for networks," 13th Systems Administration Conference (LISA '99), USENIX Association, 1999, pp. 229-238.
[6] D. Song, Fragroute, 2002, http://www.monkey.org/dugsong/fragroute/
[7] D. Bolzoni, B. Crispo and S. Etalle, "ATLANTIDES: An Architecture for Alert Verification in Network Intrusion Detection Systems," 2007.
[8] Snort Ruleset Download, http://www.snort.org/start/rules
[9] P. Mell, V. Hu, R. Lippmann, J. Haines and M. Zissman, "An Overview of Issues in Testing Intrusion Detection Systems," http://csrc.nist.gov/publications/nistir/nistir-7007.pdf
[10] P. Michiardi and R. Molva, "Core: A Collaborative Reputation mechanism to enforce node cooperation in Mobile Ad Hoc Networks," Communication and Multimedia Security Conference (CMS '02), September 2002.
[11] H. Esquivel, T. Mori, A. Akella and A. Mutapcic, "On the Effectiveness of IP Reputation for Spam Filtering," COMSNETS 2010.
[12] A. Valdes and K. Skinner, "Probabilistic Alert Correlation," Proceedings of the 4th International Symposium on Recent Advances in Intrusion Detection (RAID), 2001.
[13] H. Debar and A. Wespi, "Aggregation and Correlation of Intrusion-Detection Alerts," Proceedings of the 4th International Symposium on Recent Advances in Intrusion Detection (RAID), 2001.
[14] R. P. Lippmann, J. W. Haines, D. J. Fried, J. Korba and K. Das, "The 1999 DARPA Off-Line Intrusion Detection Evaluation," Computer Networks, 2000.
[15] G. Gu, P. Fogla, D. Dagon and W. Lee, "Measuring intrusion detection capability: An information-theoretic approach," Proceedings of the ACM Symposium on Information, Computer and Communications Security (ASIACCS '06).