Mahmoud M. Ghozlan
Department of Information Technology
IGSR, Alexandria University
Alexandria, Egypt
ghozlan.edu@gmail.com
Abstract: Most of the valuable information resources of organizations are stored in databases, so protecting this information against intruders is a serious concern. However, conventional security mechanisms have not been designed to detect anomalous actions of database users. Intrusion detection systems (IDS) deliver an extra layer of security that cannot be guaranteed by built-in security tools. IDS provide the ideal solution to defend databases from intruders. In this paper, we suggest an anomaly detection approach that summarizes the raw transactional SQL queries into a compact data structure called a hexplet, which can model normal database access behavior (abstracting the user's role profile) and recognize impostors, and which is specifically tailored for role-based access control (RBAC) database systems. This hexplet allows us to preserve the correlation among SQL statements in the same transaction by exploiting the information in the transaction-log entry. Our target is to improve detection accuracy, especially the detection of intruders inside the organization who behave anomalously. Our model utilizes the Naive Bayes Classifier (NBC) as a simple technique for evaluating the legitimacy of a transaction. Experimental results show the performance of the proposed model in terms of the equal error rate.

Keywords: Database security, Anomaly detection, Database intrusion detection, Role-based profiling.

I. INTRODUCTION

In today's business world, the most valuable asset of an organization is its information, which therefore needs efficient management and protection. Over the last few years, database systems have formed the center of the information systems infrastructure, because they permit the efficient administration and retrieval of huge amounts of data and offer mechanisms that can be employed to ensure the integrity of the stored data. Data found in these databases range from private information and banking transactions to personal medical data, commercial contracts, etc. Any security violation of these databases will lead to a bad reputation for the organization, loss of customers' confidence, and might even lead to lawsuits.

There are many ways to secure databases, such as user authentication, data transaction encryption, data watermarking, and intrusion detection. Each of these has its benefits, and they should work together to reach the maximum security level of the database. User authentication is a prevention technique that prevents unauthorized users from gaining access to the database. Data transaction encryption is also a prevention technique, which thwarts attackers from understanding the data if they sniff the session. Watermarking the data is a detection technique that is used to ensure data integrity. Intrusion detection is a detection technique that is used to identify malicious activities as early as possible if the system's prevention mechanisms were bypassed, in order to minimize the harm caused by intruders. For more information regarding these techniques, the reader can refer to [1][2][3].

Available database security mechanisms are not designed to detect intruders; they are intended to keep intruders out. Intrusion detection systems are therefore considered the second line of defense. Database intrusion is commonly defined as a set of actions that try to violate data integrity, data confidentiality, or data availability, while database intrusion detection is the process of tracking transactions submitted to the database and analyzing them to detect the possible presence of intruders. In general, there are two types of database intrusion attacks: (I) insider and (II) outsider [1]. Insider attacks are those in which an intruder has all the privileges to access the database but performs malicious actions. Outsider attacks are those in which the intruder does not have the proper rights to access the database and attempts first to break in and then to perform malicious actions. Detecting insider attacks is often more difficult than detecting outsider attacks.
command, tables accessed and columns accessed) and then uses it to create role profiles. This method has been extended in [17], in which the authors upgraded their data structure to contain information about the SQL query predicate in order to increase the efficiency of the intrusion detection system. Their data structure, the "quiplet", stores five basic pieces of information about the SQL statement (SQL command, relations accessed, attributes accessed, tables accessed in the SQL query predicate, and columns accessed in the SQL query predicate). They also use a profiling technique to detect SQL injections. Furthermore, the technique proposed by the authors in [18] is an implementation of an intrusion detection mechanism in the DBMS layer using the data structure presented in [17].

The approach stated by the researchers in [19] uses its own data structure to create a profile for each role in the database. This data structure is used in further comparisons to detect anomalous access behavior in the database. In their approach, the database log file is read to extract the list of tables accessed by each transaction and the list of attributes read and written by transactions. The main disadvantages of this approach are that it does not keep any information about the SQL query predicate in the data structure and does not extract the correlation among queries in the same transaction.

In [20], the authors proposed a database intrusion detection mechanism that can provide an additional security layer to the database. It can be considered a generic approach for any database application to detect malicious activities. Their approach concentrated on security policies for transactions permitted by the DBMS. It is designed to mine the audit log of legitimate transactions performed on the database and to generate signatures for legal transactions as per the security policy. Transactions that do not comply with the signature of a valid transaction are identified as malicious.

Most of the IDS discussed above show a lot of variance in their accuracy and efficiency. The main challenge faced by most of them is that any attempt to improve the rate of correct intrusion detection generally causes an increase in false alarms as well. In this work, our aim is to build an intrusion detection scheme that creates transaction-based profiles for each role in the database, using the database log file to extract profile features, with the aim of improving intrusion detection performance. The main challenge in attacking our problem is to extract the right information from the database traces, so that accurate profiles can be built. To address this problem, we propose a new representation for the database log records, which holds information about a whole transaction, not only a single query. The algorithms in [16][17] are similar to our approach in concept, but our approach goes beyond them by keeping the dependency between SQL statements in the same transaction and evaluating the whole transaction as one unit, rather than evaluating each SQL statement individually.

III. THE PROPOSED SYSTEM

The approach we follow is similar to the one suggested by Kamra et al. [17]; however, we improved the representation of SQL commands to also include information about the whole transaction. Our ID system builds a profile for each role and is able to detect role intruders, that is, individuals who, while holding a specific role, diverge from the normal activities of that role. With respect to ID, using roles means that the number of profiles to form and maintain is much smaller than the number one would need when considering individual users. This implies that an ID solution based on RBAC could be easily deployed in practice.

The two problems that we address in this work are as follows: how to construct and maintain profiles that represent accurate and stable user behavior, and how to employ these profiles for carrying out the ID task at hand. The main challenge in overcoming these difficulties is to extract the right information from the database log, so that accurate profiles can be built. When role information exists, the problem is transformed into a supervised learning problem. Compared to the work in [19], the proposed system employs a new representation for the database log records that holds information about the whole transaction's commands, not only the list of attributes read and written by transactions as in the comparable work. By using this representation, we may enhance intrusion detection performance.

A. System Design

The system's design consists of three main components: the traditional DBMS tool that handles the query execution process, the database audit log files, and the ID mechanism. These modules form the new extended DBMS, enriched with an independent ID system that operates at the application level. The flow of interactions for the ID process is shown in Fig. 1. Every time a transaction is issued, it is examined by the ID mechanism before execution. First, the system transforms the new transaction into the data structure supported by our ID mechanism (hexplet). Then the system checks the hexplet against the existing profiles and submits the assessment of the transaction (anomalous vs. not anomalous) to the response engine. The response engine applies a policy base of existing response mechanisms to issue a reply that depends on the assessment of the transaction submitted by the comparison process. In case of intrusion, the most common action is to send an alert to the security administrator. However, other actions are possible, such as disabling the user or dropping the query. If the transaction is assessed as not anomalous, the response engine simply updates the database audit log and the profiles with the new transaction's information. Before the detection phase, the learning phase should be performed to create the initial profiles from a set of intrusion-free records from the database audit log.

B. Hexplet Data Structure

In order to identify user behavior, the proposed system uses the database log file for mining information regarding users' actions. The log records, after being processed, are used to form preliminary profiles representing acceptable actions. Each group of one or more entries (each single transaction) in the log file is represented as a separate data unit; these units are then combined to form the desired profiles. We assume that SQL queries related to the same transaction are marked together in the log file.
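The assumption that statements of the same transaction are marked together in the log can be illustrated with a minimal sketch. The log layout used here (one statement per line, delimited by `Start Transaction` and `End Transaction` markers, as in the paper's own examples) and the function name `group_transactions` are assumptions made for illustration; a real MySQL audit log would need its own parser.

```python
from typing import Iterable, List


def group_transactions(log_lines: Iterable[str]) -> List[List[str]]:
    """Split a flat sequence of log lines into per-transaction statement lists.

    Assumed (hypothetical) layout: statements of one transaction appear
    between 'Start Transaction' and 'End Transaction' marker lines.
    """
    transactions: List[List[str]] = []
    current = None  # statements of the open transaction, or None outside one
    for raw in log_lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        marker = line.lower()
        if marker == "start transaction":
            current = []
        elif marker == "end transaction":
            if current is not None:
                transactions.append(current)
            current = None
        elif current is not None:
            current.append(line)
        else:
            # A statement outside explicit markers is treated as a
            # one-statement transaction.
            transactions.append([line])
    return transactions


example_log = [
    "Start Transaction",
    "Delete From R2 Where R2.A2 = 16 and R2.B2 = 5",
    "Select * from R2",
    "End Transaction",
    "Select * from R1",
]
grouped = group_transactions(example_log)
```

Each inner list is then the unit that would be turned into one data structure, so the two statements of the first transaction stay together instead of being profiled separately.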
[Figure: learning phase (DB log file, transaction's hexplet transformation, role-based profile creator, role profiles) and intrusion detection phase (new transaction, hexplet transformation, comparison of the new transaction against the role profiles, response engine issuing alarm, drop transaction, or no action).]
Fig. 1. The proposed intrusion detection system

In order to construct profiles, we need to preprocess the log file items and translate them into a format that can be analyzed by our algorithm. Therefore, we represent each transaction by a basic data unit that holds six fields, and thus it is called a hexplet. Our hexplet is an extension of the data structure presented in [17]; it represents a single transaction and consists of a linked list of quiplets. A hexplet contains the following information: the first five elements represent the first SQL statement in the transaction (SQL command, relations accessed, attributes accessed, tables accessed in the SQL query predicate, and columns accessed in the SQL query predicate); the sixth element holds information about the rest of the SQL commands in the transaction (if any).

For the sake of simplicity, we characterize a generic hexplet using a 6-ary relation $H(c, P_R, P_A, S_R, S_A, N_H)$, where $c$ corresponds to the first SQL command, $P_R$ to the projection relation information, $P_A$ to the projection attribute information, $S_R$ to the selection relation information, and $S_A$ to the selection attribute information, all for the first SQL command; $N_H$ refers to the next hexplet, which represents the next SQL command in the transaction (if any), and this hexplet in turn contains a reference to the next hexplet (if any), and so on. Table I shows two different transactions and their representation as hexplets. In the example, we consider a database schema consisting of two relations R1 = {A1; B1; C1; D1} and R2 = {A2; B2; C2; D2}. The complete representation of the hexplet is (SQL-CMD, PROJ-REL-BIN[], PROJ-ATTR-BIN[][], SEL-REL-BIN[], SEL-ATTR-BIN[][], NEXT-HEXPLET).

The first field is symbolic and corresponds to the first SQL command in the transaction. The second is a binary vector that contains 1 in its i-th position if the i-th relation is projected in the SQL query. The third field is a vector of n vectors, where n is the number of relations in the database. Element PROJ-ATTR-BIN[i][j] is equal to 1 if the SQL query projects the j-th attribute of the i-th relation; otherwise, it is equal to 0. Likewise, the fourth field is a binary vector that contains 1 in its i-th position if the i-th relation is used in the SQL query predicate. The fifth field is a vector of n vectors, where n is the number of relations in the database. Element SEL-ATTR-BIN[i][j] is equal to 1 if the SQL query references the j-th attribute of the i-th relation in the query predicate; otherwise, it is equal to 0. The sixth field is a reference that is repeated for as long as SQL commands remain in the transaction; the first five elements in the referenced hexplet hold the same five components described above, and so on.

To summarize, in a query-based approach, the queries of the same transaction are recorded in different data structures (i.e., different actions), whereas in our approach the complete transaction is recorded in one data structure (i.e., the queries form one action). Consider, for example, that the user issued the following transaction:

Start Transaction
Delete From R2 Where R2.A2 = 16 and R2.B2 = 5
Select * from R2
End Transaction

In a query-based approach, this action could be considered normal, which may result in a false negative. Because the proposed approach binds all the queries of the same transaction into one behavior, it is expected to improve the performance of the IDS.

C. Classifier

In this work, we employ the Naive Bayes Classifier (NBC) for the ID task in RBAC-administered databases. The motivation for using NBC is its low computational requirements for the detection task. The small running time is mainly due to the attribute independence assumption. For a better understanding of the concepts underlying the NBC, see [16][17]. In the classification problem, a set of training hexplets is provided, and a new instance with attribute values $(a_1, a_2, \ldots, a_n)$ is given (corresponding to the set of observations). The goal is to predict the target value, or the class, of this new instance. The approach we define here is to assign to this new instance the most probable class value $v_{MAP}$, given the attributes $(a_1, a_2, \ldots, a_n)$ that describe it. That is,

$$v_{MAP} = \arg\max_{v_j \in V} P(v_j \mid a_1, a_2, \ldots, a_n). \quad (1)$$

Using Bayes' theorem, we can rewrite the expression as:

$$v_{MAP} = \arg\max_{v_j \in V} \frac{P(a_1, a_2, \ldots, a_n \mid v_j)\, P(v_j)}{P(a_1, a_2, \ldots, a_n)}$$

TABLE I. EXAMPLES OF HEXPLET CONSTRUCTION

Transaction:
Start Transaction
SELECT R1:A1;R1:C1;R2:B2;R2:D2 FROM R1;R2 WHERE R1:A1 = R2:B2
End Transaction
Hexplet:
{<select> <1; 1> <[1; 0; 1; 0]; [0; 1; 0; 1]> <1; 1> <[1; 0; 0; 0]; [0; 1; 0; 0]> <{Null}>}

Transaction:
Start Transaction
SELECT R1:A1;R1:C1;R2:B2;R2:D2 FROM R1;R2 WHERE R1:A1 = 5 and R2:B2 = 5
Delete From R2 Where R2:A2 = 16 and R2:B2 = 5
End Transaction
Hexplet:
{<select> <1; 1> <[1; 0; 1; 0]; [0; 1; 0; 1]> <1; 1> <[1; 0; 0; 0]; [0; 1; 0; 0]> <{<delete> <0; 1> <[0; 0; 0; 0]; [0; 0; 0; 0]> <0; 1> <[0; 0; 0; 0]; [1; 1; 0; 0]> <{Null}>}>}
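The construction illustrated in Table I can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the class and function names (`ParsedStatement`, `Hexplet`, `attr_matrix`, `build_hexplet`) are hypothetical, the SQL is assumed to be already parsed into the relations and attributes it touches, and the selection relation vector is derived here from the predicate attributes. The schema is the two-relation example R1, R2 used in the text.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence

# Example schema from the text: two relations with four attributes each.
SCHEMA: Dict[str, List[str]] = {
    "R1": ["A1", "B1", "C1", "D1"],
    "R2": ["A2", "B2", "C2", "D2"],
}


@dataclass
class ParsedStatement:
    """Hypothetical output of an SQL parser (parsing itself is out of scope)."""
    cmd: str
    proj_rels: Sequence[str]   # relations targeted by the command
    proj_attrs: Sequence[str]  # projected attributes, e.g. "R1.A1"
    sel_attrs: Sequence[str]   # attributes referenced in the predicate


@dataclass
class Hexplet:
    sql_cmd: str
    proj_rel_bin: List[int]
    proj_attr_bin: List[List[int]]
    sel_rel_bin: List[int]
    sel_attr_bin: List[List[int]]
    next_hexplet: Optional["Hexplet"] = None  # rest of the transaction


def attr_matrix(attrs: Sequence[str]) -> List[List[int]]:
    """One binary row per relation, one column per attribute of that relation."""
    wanted = set(attrs)
    return [[int(f"{rel}.{col}" in wanted) for col in cols]
            for rel, cols in SCHEMA.items()]


def build_hexplet(statements: Sequence[ParsedStatement]) -> Optional[Hexplet]:
    """Chain the statements of one transaction into a linked list of hexplets."""
    head: Optional[Hexplet] = None
    for st in reversed(statements):  # build from the last statement backwards
        sel_attr = attr_matrix(st.sel_attrs)
        head = Hexplet(
            sql_cmd=st.cmd.lower(),
            proj_rel_bin=[int(rel in st.proj_rels) for rel in SCHEMA],
            proj_attr_bin=attr_matrix(st.proj_attrs),
            sel_rel_bin=[int(any(row)) for row in sel_attr],
            sel_attr_bin=sel_attr,
            next_hexplet=head,
        )
    return head


# Second transaction of Table I: a SELECT followed by a DELETE.
transaction = [
    ParsedStatement("SELECT", ["R1", "R2"],
                    ["R1.A1", "R1.C1", "R2.B2", "R2.D2"], ["R1.A1", "R2.B2"]),
    ParsedStatement("DELETE", ["R2"], [], ["R2.A2", "R2.B2"]),
]
hexplet = build_hexplet(transaction)
```

Building the list from the last statement backwards lets each node receive its successor directly, so the sixth field of the final statement stays empty (the `{Null}` entry in Table I).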
Dropping the denominator, which does not depend on $v_j$, and applying the attribute independence assumption gives:

$$v_{MAP} = \arg\max_{v_j \in V} P(a_1, a_2, \ldots, a_n \mid v_j)\, P(v_j) = \arg\max_{v_j \in V} P(v_j) \prod_{i=1}^{n} P(a_i \mid v_j). \quad (2)$$

In this case, estimating $P(v_j)$ is simple since it requires just counting the frequency of $v_j$ in the training data. Estimating $P(a_i \mid v_j)$ requires only a frequency count over the tuples in the training data with class value equal to $v_j$. To tackle the problem of zero probability (the number of observations is very small in large training sets, or zero), we adopt a standard Bayesian approach in estimating this probability, as discussed in [17].

The NBC directly applies to our anomaly detection framework by considering the set of roles in the system as classes and the log file hexplets as the observations. In what follows, we show how equation 1 can be applied to our data type (hexplet). If $R$ denotes the set of roles, the predicted role of a given observation $H(c, P_R, P_A, S_R, S_A, N_H)$ is

$$r_{MAP} = \arg\max_{r_j \in R} P(r_j) \prod_{k=1}^{N} \left[ P(c_k \mid r_j) \prod_{i=1}^{N_P^k} P(P_A^k[i] \mid r_j) \prod_{i=1}^{N_S^k} P(S_A^k[i] \mid r_j) \right] \quad (3)$$

where $N$ is the number of SQL commands in the transaction, $N_P^k$ is the number of attributes that appear in the projection part of the SQL command $k$ currently being processed, and $N_S^k$ is the number of attributes that appear in the selection part of the SQL command $k$ currently being processed. With the above equation in place, the ID task is quite straightforward. For every new transaction, its $r_{MAP}$ is predicted by the classifier. If this $r_{MAP}$ is different from the original role associated with the transaction, an anomaly is detected. For accepted transactions, the classifier can be updated directly by increasing the frequency count of the relevant attributes. The procedure for ID can easily be generalized to the case in which a user is assigned more than one role at a time. This is because our method detects anomalies on a per-transaction basis rather than a per-user basis. Hence, as long as the role associated with the transaction is consistent with the role predicted by the classifier, the system will not detect an anomaly.

IV. EXPERIMENTAL EVALUATION

This section reports results from our experimental evaluation of the proposed approach, illustrating its performance as an intrusion detection mechanism. The experiments were performed on the MySQL 5.2 CE DBMS on Microsoft Windows 7 Enterprise SP1 32-bit, running on a machine with the following configuration: Intel Core Duo CPU T2350 @ 1.86 GHz 1.87 GHz, 2 GB of RAM. The real dataset used for evaluation consists of 2000 transactions (6500 SQL statements). Each transaction contains a random number of SQL statements. The database itself consists of 55 relations (tables) with 665 attributes (columns) in all. In the database there are 8 roles that access the database with different read-only and read-write roles. The dataset used for training should be intrusion free (trusted transactions), and we used the database log file after revision for this purpose. The intruder/anomalous queries were generated by reading the database log and changing the role assigned to each transaction randomly. The queries in this dataset consist of a mix of select, insert, update, and delete commands collected at random, and hence the number of each is random.

For the experimental assessment, we investigate the quality of the results of our method by calculating the precision and recall percentages, which are defined as follows [16]:

$$\text{Precision} = \frac{TP}{TP + FP} \quad (4)$$

$$\text{Recall} = \frac{TP}{TP + FN} \quad (5)$$

Here, TP is the number of times the system was able to detect intruders correctly, FP is the number of times the system raised a wrong alarm, and FN is the number of times that an intruder accessed the database and the system could not detect him. The results are averaged over a 5-fold cross-validation for the dataset to achieve randomization. One matter of concern is the low recall rate for this dataset, because recall reflects the ability of the system to prevent intruders from accessing the database. The objective of this evaluation is two-fold. First, we present results comparing the classification behavior when log files are modeled using the proposed hexplet and the existing f-quiplet representations. Second, we measure the performance of our system in terms of computational cost, namely the time consumed to train our algorithm and the time consumed to assess a transaction.

TABLE II. AVERAGE PRECISION & RECALL STATISTICS (%)

Representation    Precision    Recall
Hexplet           94.18        94
F-Quiplet [17]    95.62        85

[Figure: Total Training Time (Seconds); horizontal axis: Number of Transactions; vertical axis: Time in Seconds.]
Fig. 2. Total training time with different number of transactions
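The precision and recall definitions in equations (4) and (5) can be checked with a short sketch. The function names and the sample counts below are hypothetical and chosen only to make the arithmetic easy to follow; they are not the paper's experimental data.

```python
from typing import Iterable, Tuple


def confusion_counts(outcomes: Iterable[Tuple[bool, bool]]) -> Tuple[int, int, int]:
    """Count TP, FP, FN from (is_intruder, alarm_raised) pairs."""
    tp = fp = fn = 0
    for is_intruder, alarm_raised in outcomes:
        if is_intruder and alarm_raised:
            tp += 1  # intruder correctly detected
        elif alarm_raised:
            fp += 1  # wrong alarm on a legitimate transaction
        elif is_intruder:
            fn += 1  # intruder missed by the system
    return tp, fp, fn


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision and recall as percentages, following equations (4) and (5)."""
    precision = 100.0 * tp / (tp + fp) if (tp + fp) else 0.0
    recall = 100.0 * tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical outcomes: 3 detected intruders, 1 false alarm, 1 missed intruder,
# and 2 legitimate transactions that raised no alarm.
sample = [(True, True)] * 3 + [(False, True)] + [(True, False)] + [(False, False)] * 2
counts = confusion_counts(sample)
scores = precision_recall(*counts)
```

With these counts, precision is 3 / (3 + 1) and recall is 3 / (3 + 1), so both come out at 75 percent; legitimate transactions without an alarm do not enter either formula.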
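The role-prediction rule of Section III-C (predict the most probable role for a transaction and flag an anomaly when it differs from the claimed role) can be sketched as follows. This is a simplified stand-in, not the paper's implementation: each transaction is assumed to be already flattened into string tokens such as `cmd:delete` or `sel:R2.A2`, the class name `RoleNaiveBayes` is hypothetical, and add-one (Laplace) smoothing is used here in place of the Bayesian estimate referenced from [17].

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Sequence, Set


class RoleNaiveBayes:
    """Naive Bayes over per-transaction feature tokens, one class per role."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha  # smoothing constant (assumption: add-one smoothing)
        self.role_counts: Counter = Counter()
        self.token_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocab: Set[str] = set()

    def update(self, role: str, tokens: Sequence[str]) -> None:
        """Learning step: increase the frequency counts for an accepted transaction."""
        self.role_counts[role] += 1
        self.token_counts[role].update(tokens)
        self.vocab.update(tokens)

    def log_score(self, role: str, tokens: Sequence[str]) -> float:
        """log P(role) + sum of log P(token | role), with smoothing."""
        score = math.log(self.role_counts[role] / sum(self.role_counts.values()))
        counts = self.token_counts[role]
        denom = sum(counts.values()) + self.alpha * len(self.vocab)
        for token in tokens:
            score += math.log((counts[token] + self.alpha) / denom)
        return score

    def predict(self, tokens: Sequence[str]) -> str:
        """Return the most probable role among the roles seen in training."""
        return max(self.role_counts, key=lambda r: self.log_score(r, tokens))

    def is_anomalous(self, claimed_role: str, tokens: Sequence[str]) -> bool:
        """Anomaly when the predicted role differs from the claimed role."""
        return self.predict(tokens) != claimed_role


model = RoleNaiveBayes()
model.update("clerk", ["cmd:select", "proj:R1.A1", "sel:R1.A1"])
model.update("clerk", ["cmd:select", "proj:R1.C1", "sel:R1.A1"])
model.update("admin", ["cmd:delete", "sel:R2.A2", "sel:R2.B2"])
model.update("admin", ["cmd:delete", "sel:R2.A2", "cmd:select", "proj:R2.A2"])

# Whole transaction (delete followed by select) evaluated as one unit.
suspect = ["cmd:delete", "sel:R2.A2", "sel:R2.B2", "cmd:select", "proj:R2.A2"]
```

Scoring loops over every role and every token of the transaction, and working in log space avoids underflow when a transaction contains many statements.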
[Figure: Average Detection Time (Milliseconds); vertical axis: Time in Milliseconds.]
The complexity of the detection algorithm is $O(R \cdot N \cdot A)$, where $R$ is the number of roles in the database, $N$ is the number of SQL statements in the transaction, and $A$ is the average
[11] S. Lee, W. Low, and P. Wong, ''Learning Fingerprints for a Database
Intrusion Detection System'', Proceedings of the 7th European
Symposium on Research in Computer Security, Switzerland, pp. 264 -
170, October 14 - 18, 2002.
[12] Y. Hu and B. Panda, ''Identification of Malicious Transactions in
Database Systems'', Proceedings of 7th International Database
Engineering and Applications Symposium, pp. 329 - 335, Hong Kong,
July 18-17, 2003.
[13] Y. Hu, and B. Panda, ''A Data Mining Approach for Database Intrusion
Detection'', Proceedings of ACM Symposium on Applied Computing,
pp. 711 - 718, Cyprus, March 14-16, 2004.
[14] C. Chung, M. Gertz, and K. Levitt, ''DEMIDS: A Misuse Detection
System for Database Systems'', Proceedings of the 3rd International
Working Conference on Integrity and Internal Control in Information
Systems, pp. 159 - 168, Netherlands, November 17-19, 1999.
[15] R. Agrawal, J. Kiernan, R. Srikant, and Y. Xu, ''Hippocratic Databases'',
Proceedings of the 17th International Conference on Very Large
Databases, pp. 143 - 154, Hong Kong, August 20 - 23, 2002.
[16] E. Bertino, A. Kamra, E. Terzi, and A. Vakali, ''Intrusion Detection in
RBAC-Administered Databases'', Proceedings of the 21st Annual
Computer Security Applications Conference, pp. 160 - 172, USA,
December 05 - 09, 2005.
[17] A. Kamra, E. Terzi, and E. Bertino, ''Detecting Anomalous Access
Patterns in Relational Database'', The Very Large Database Journal, vol.
16, Issue 5, pp. 1063 - 1077, 2008.
[18] A. Kamra, E. Bertino, and G. Lebanon, ''Mechanisms for Database
Intrusion Detection and Response'', Proceedings of the 2nd SIGMOD
PhD Workshop on Innovative Database Research, pp. 31 - 36, Canada,
June 13, 2008.
[19] U. P. Rao, G. Sahani, and D. Patel, ''Machine Learning Proposed
Approach for Detecting Database Intrusions in RBAC Enabled
Databases'', Proceedings of the International Conference on Computing
Communication and Networking Technologies, pp. 1 - 4, India, July 29 -
31, 2010.
[20] Y. Rathod, M. Chaudhari, and G. Jethava, ''Database Intrusion Detection
by Transaction Signature'', Proceedings of 3rd International Conference
on Computing Communication & Networking Technologies, India, pp. 1
- 5, July 26 - 28, 2012.