
2010 10th Annual International Symposium on Applications and the Internet

Analysis & Detection of SQL Injection Vulnerabilities via Automatic Test Case
Generation of Programs

Michelle Ruse, Tanmoy Sarkar, Samik Basu
Department of Computer Science, Iowa State University
Email: mruse@iastate.edu, tanmoy@iastate.edu, sbasu@iastate.edu

Abstract: SQL injection attacks occur due to vulnerabilities in the design of queries, where a malicious user can take advantage of input opportunities to insert code into the queries that modifies the query conditions, resulting in unauthorized database access. We provide a novel technique to identify the possibilities of such attacks. The central theme of our technique is based on automatically developing a model for a SQL query such that the model captures the dependencies between various components (sub-queries) of the query. We then analyze the model using the CREST test-case generator and identify the conditions under which the query corresponding to the model is deemed vulnerable. We further analyze the obtained condition-set to identify its subset; this subset is referred to as the causal set of the vulnerability. Our technique considers the semantics of the query conditions, i.e., the relationship between the conditions, and as such complements the existing techniques, which only rely on the syntactic structure of the SQL query. In short, our technique can detect vulnerabilities in nested SQL queries, and can provide results with no false positives or false negatives when compared to the existing techniques.

Keywords: SQL injection, Program Verification, Assertion, Test cases, Decision Diagrams

This work is supported in part by NSF grant CCF0702758.
978-0-7695-4107-5/10 $26.00 © 2010 IEEE
DOI 10.1109/SAINT.2010.60

I. INTRODUCTION

Many Web services provide end-users with the capability to query and update databases available over the Internet. One of the most common forms of code-injection based attacks on such services is caused by vulnerabilities in the design of queries that are made available to the end-users. These attacks allow intruders to obtain results from and/or update databases without proper authorization. In this paper, we focus on injection attacks in SQL queries.

The vulnerability exploited by SQL injection attacks is based on maliciously updating the actual query conditions (that are dependent on user-inputs) such that the condition becomes a tautology (always evaluates to true). When a condition that is responsible for providing limited access (based on the type of user making the query) to the underlying database becomes a tautology, the result is unauthorized access and update to the database. That is, SQL injection leads to violation of privacy, security and integrity of data stored in the database.

The primary challenge in detecting and avoiding such injection attacks stems from the fact that it is difficult, if not impossible, to check for all possible forms of malicious user-inputs that can make key conditions in SQL queries tautologies. Typically, SQL injection detection (which we refer to as SQLID) techniques rely on enforcing certain rules that must be satisfied by the user-inputs. In the event the user-inputs violate any one of these rules, the query is deemed malicious and not allowed to execute. These rules are then enforced or checked at runtime with actual user inputs to queries. While the existing rule-based techniques provide promising results, they suffer from the drawback that the rules for SQLID are constructed from the syntactic structure of the SQL query.

In this paper, we show that there is a large class of SQL queries where the existing SQLID techniques, based on syntactic analysis of queries, either fail to detect an injection vulnerability (false negative) or detect a vulnerability for some input when there is none (false positive). We further show that such false inferences can be avoided by considering the dependencies between the conditions in the SQL queries, especially when the query is nested. Such dependencies cannot be identified by pure syntactic analysis. The steps in our technique are as follows:
1. The SQL query is translated into a C program that captures the nesting of sub-queries and dependencies between their query conditions.
2. The concolic testing tool CREST (footnote 1) is employed to obtain the valuations of the conditions (specifically the user-input dependent conditions) that can maliciously affect the top-level query, thus making it vulnerable to injection attacks.
3. Finally, the valuations of the conditions are analyzed to identify a minimal set of conditions and their valuations that are responsible for such injection attacks. We refer to these condition-valuation pairs as a causal set of an injection vulnerability.

Footnote 1. CREST: Automatic Test Generation Tool for C. Available at http://code.google.com/p/crest/.

The advantage of obtaining such a causal set is that (a) the actual execution of the SQL query (with user inputs) can be
monitored at run-time against the conditions in the causal set and (b) the execution can be identified to exploit a SQL vulnerability if the conditions in the causal set are satisfied.

SELECT X1 FROM T1, T2
WHERE Y11 = $input11 AND Y12 = $input12
AND Y13 NOT IN
    (SELECT X2 FROM T3, T4
     WHERE Y21 = $input21
     OR Y22 = $input22)

Consider the above query, which has four locations (corresponding to user-inputs) where code injection can happen. We denote these locations as c11, c12, c21 and c22, corresponding to the conditions that depend on user inputs $input11, $input12, $input21 and $input22. The query is exploited via code injection if one of the following holds due to injected code:
1. Condition at location c11 becomes a tautology, condition at location c12 does not become a contradiction, and disjunction of the conditions at locations c21 and c22 does not become a tautology;
2. Condition at location c12 becomes a tautology, condition at location c11 does not become a contradiction, and disjunction of the conditions at locations c21 and c22 does not become a tautology;
3. Conditions at locations c11 and c12 do not become contradictions, and disjunction of the conditions at locations c21 and c22 becomes a contradiction.

In contrast to the existing techniques, our technique automatically identifies the above requirements. These requirements are used at run-time to check whether or not the user inputs satisfy them. In the following sections, we will use the above example to discuss the salient aspects of our approach and present the advantages of our technique over the existing ones for SQLID. The contributions of our approach are summarized as follows:
1. A new approach for SQLID that analyzes the semantic dependencies between SQL query conditions and does not solely rely on the syntactic structure of the query.
2. Our approach is complementary to the existing techniques for SQLID and leads to an effective detection mechanism for SQL injection vulnerabilities. In particular, our technique, being based on semantic dependencies, does not have any false positive or false negative results.
3. We provide a novel reduction technique that takes as input various cases that can lead to SQL injection and automatically combines these cases into a succinct summary. The succinctness allows for easy understanding of the query vulnerability and efficient monitoring of the user-inputs that can lead to exploitation of the vulnerability. We refer to the summary as the causal set.

The rest of the paper is organized as follows. Section II presents a comparative study of existing techniques for SQLID. Section III discusses our proposed techniques; especially (a) the technique for translating SQL queries to the corresponding C-program, (b) the application of CREST for obtaining conditions of injection vulnerability, and (c) the analysis technique deployed to obtain the minimal causal set. Section IV presents advantages of the proposed technique. Finally, Section V discusses future directions of research.

II. RELATED WORK

SQLID techniques can be classified into two broad categories. One class of techniques is concerned with the embedding of SQL queries in Web programs. The primary objective is to identify whether the Web inputs from the user can be relayed directly (without sanitization) to the SQL query. These techniques are typically based on program analysis [5], [7]. The other class of techniques works with the raw SQL query and tries to identify when one or more user inputs can exploit the query vulnerability. In essence, the two classes of techniques complement each other in finding (a) the inputs that are introduced in an embedded SQL query without being sanitized and (b) the subset of unsanitized inputs that are likely to allow unauthorized access to the underlying database via the SQL query. Our technique falls in the second class and we focus on comparing our technique with the ones in that class. Most of these techniques depend on static analysis of the SQL query and runtime detection based on the result of static analysis.

Buehrer et al. [3], and Halfond and Orso [4] propose techniques that construct syntactic models (parse tree and FSA) capturing the structures of the query in the absence of the user-inputs and detect injection at runtime whenever the syntactic structure of the query with the user inputs deviates from the model.

Works by Su and Wassermann [11], [10], on the other hand, rely on generating a grammar from a given SQL query such that the grammar only accepts the intended query. At runtime, the query with the user-input is verified against the grammar. In the event the grammar does not accept the run-time query, the query is not allowed to execute. In contrast to [3], [4], Su and Wassermann's work provides a comprehensive and formal definition of the intended behavior of the SQL queries.

While the above works are based on detecting structural dissimilarities using grammar-based (regular [4] and context-free [11], [10]) techniques, others have developed techniques that rely on embedding/inserting preventive code in SQL queries to disallow users from injecting code. Boyd and Keromytis [1] augment the SQL query by inserting random values before and after user inputs, and check whether each user input (without any SQL statement) in the augmented query is flanked by identical random values. If the random values before and after a user input are not identical, then the query is deemed malicious due to code injection. The technique, though promising, suffers from some of the usual drawbacks of syntactic analysis (see below). In addition, if the user guesses the random

values used to flank the inputs, then he/she can inject code without getting detected. Another code embedding technique has been developed by Bravenboer et al. [2] that has recently resulted in a practical tool, StringBorg (see http://strategoxt.org/Stratego/StringBorg). The central theme of this technique is to enhance the host language syntax (e.g., Java) to allow for direct insertion of SQL queries in the language. This ensures that all user inputs are appropriately escaped, thus avoiding code injection attacks. However, broad application of this technique requires supporting implementations for different host languages (at present the supported host languages include PHP and Java).

All the above techniques aim to identify syntactic differences between queries with and without user inputs. While some syntactic differences are indeed indicators of malicious activity, there are several that may not lead to intrusive behavior. On the other hand, there can be some malicious behaviors that are not identifiable via syntactic differences. Consider the example introduced in Section I. Suppose that the input $input12 is equal to AND 1=1 --. This makes the condition associated with the location c13 a tautology, thus leading to an injection attack. However, several of the existing techniques will not be able to detect any change in the structure of the query. As such, this code injection will go undetected (false negative).

On the other hand, if both the inputs $input11 and $input22 are made equal to OR 1=1, a change in the structure of the query will be detected by the above techniques; however, such injection of code is benign (see the injection vulnerability requirements discussed in Section I; the condition associated with location c11 is a tautology, but the disjunction of the conditions at locations c21 and c22 is a tautology). As such, even if the structure of the query changes, the inputs will not be able to exploit the SQL injection vulnerability for the example under consideration (false positive). We address the above drawbacks by proposing and developing a complementary method that takes into consideration the semantic dependencies between conditions in a SQL query and identifies the ones whose valuations (tautology) can lead to SQL injection.

III. ANALYZING SQL QUERY CONDITIONS

Our technique consists of three main steps: (a) compiling SQL queries to a target language (in our case C) such that the dependencies between query conditions at various locations in the query are faithfully captured; (b) applying an existing test generator (in our case CREST) to obtain the test cases that correspond to valuation of conditions at different locations leading to possible injection vulnerability exploitation; and finally (c) analyzing the test cases to identify the cause of the vulnerability that can be effectively used during run-time monitoring of the query-executions.

A. Translating SQL Query Conditions to C-Programs

The primary objective of our translator is to generate a program (essentially using the if-control construct) which captures the valuations of the conditions at different locations in the query (associated with the WHERE clause) and their inter-dependencies that can maliciously affect the query result. For the purpose of discussion, we will consider the following types of conditions (footnote 2): atomic conditions of the form X=Value; belongs-to conditions of the form X IN (some nested query result); and boolean combination (conjunction, disjunction, negation, etc.) of the above conditions.

Footnote 2. Other forms of conditional expressions can be translated by following appropriate rules of translation.

In the event the WHERE condition is atomic, the query becomes vulnerable whenever the condition (after code-injection) at that location becomes a tautology. We do not consider the valuation of the exact condition in the query; instead, we are interested in the valuation of the condition (after code injection) at the location where the original condition was present. For example, for an atomic condition X=$input in a WHERE clause of the query, we say that the WHERE clause contains a location (say, c) that holds an atomic condition dependent on user input and may be affected by the user. The user can make this condition a tautology by providing an input such that $input= OR 1=1. Observe that the user input makes the original condition non-atomic (by adding a disjunction); however, we are not concerned with this exact change. We simply detect that the location c (where a user-input dependent condition is present) contains a condition that has become a tautology.

Similarly, for a conjunctive condition, we are interested in finding out whether the conditions at any one of the locations (say c1 and c2) which contain the conjuncts can be made a tautology. This is because if any one of the conjuncts at a location (e.g., c1) becomes a tautology while the other (c2) is not a contradiction, then the query result is affected by the condition at c1. A simple and commonly used example illustrating this scenario is:

SELECT X FROM T WHERE user=$input1 AND passwd=$input2

In this example, there are two locations c1 and c2. If the user provides $input2 such that it is equal to OR 1=1, then the user can access entries in the table without proper authorization. This intrusion is allowed as long as no other condition, i.e., at location c1, becomes a contradiction, in which case the result of the query is an empty set. Similar arguments can be provided for the dual operation: disjunction.

Proceeding further, for conditions that depend on nested sub-queries (belongs-to), we say that if the sub-query is affected by some code-injection then the belongs-to condition is also affected. For example, in the query presented in Section I, if conditions at locations c21 and c22 evaluate

to a contradiction (due to code injection of the form 0=1), the condition at location c13 (associated with Y13 NOT IN ...) becomes a tautology.

Algorithm 1 Query Translator
 1: procedure TRANSLATE(q)
 2:   Obtain condition-locations c associated to WHERE clause;
 3:   TRANSLATE(c, q);
 4: end procedure
 5: procedure TRANSLATE(c, q)
 6:   if c is atomic then
 7:     print if (c == taut) q = taut;
 8:     print if (c == cont) q = cont;
 9:   end if
10:   if c := c1 AND c2 then
11:     TRANSLATE(c1, q1); TRANSLATE(c2, q2);
12:     print if (q1 == taut && q2 != cont) q = taut;
13:     print if (q2 == taut && q1 != cont) q = taut;
14:     print if (q1 == cont) q = cont;
15:     print if (q2 == cont) q = cont;
16:   end if
17:   if c := V IN qk then
18:     TRANSLATE(qk);
19:     print if (qk == taut) q = taut;
20:     print if (qk == cont) q = cont;
21:   end if
22:   if c := V NOT IN qk then
23:     TRANSLATE(qk);
24:     print if (qk == taut) q = cont;
25:     print if (qk == cont) q = taut;
26:   end if
      ... Translation rules contd.
27: end procedure

Algorithm 1 presents a snapshot of our translator. It takes as input a SQL query and generates program code. The first step, as noted above, is to gather the locations c of conditions associated with the WHERE clause of the query (Line 2). Then a subroutine TRANSLATE, with the condition location c and query q as parameters, is invoked (we have overloaded q to denote a query and also a variable to capture how the query is affected by the conditions in its WHERE clause). As outlined above, the algorithm recursively explores the query condition-locations (conjunctions, disjunctions, etc.) and, wherever necessary, analyzes the locations of subquery conditions (e.g., at Lines 18, 23). Note that if there exists a conjunctive (disjunctive) condition at location c, we represent it as AND (OR) of the corresponding locations holding the conjuncts (disjuncts) (see Line 10).

Example 1: Figure 1(a) presents the recursive exploration of the query in Section I by the translation algorithm. In the figure, c11 denotes the location for the condition Y11=$input11, c12 denotes the location for the condition Y12=$input12, c13 denotes the location for the condition Y13 NOT IN ..., c21 denotes the location for Y21=$input21, and finally c22 denotes the location for Y22=$input22. Each condition location can either take the valuation (a) taut, denoting that the condition at that location has become a tautology; or (b) cont, denoting that the condition at that location has become a contradiction; or (c) remain unchanged (denoting no code injection). In addition to q and q13, which capture whether the top-level and the nested queries, respectively, are affected by their corresponding WHERE conditions, there are several other q** variables used as intermediate data variables in the translator. Figure 1(b) shows the code generated as a result of the translation.

We say that the injection vulnerability is exploited if the valuations of conditions at locations (related to user inputs) are such that the program (resulting from translation) violates the assertion on q, the top-level query (Line 18 in Figure 1(b)). The following theorem states the correctness of the above claim.

Theorem 1 (Sound and Complete Translation): Given a program P generated by Algorithm 1 from a query Q, there exists some execution path in P where q evaluates to taut at the program's exit point if and only if there exists some combination of valuations of query conditions at different locations that maliciously affects the result of Q.

Proof sketch. The proof follows directly from the semantics of the conditions and their effect on the queries. If the query condition is conjunctive, then the query is affected only when the condition in at least one of the locations becomes a tautology while conditions at other locations are not contradictions. This is carefully captured by the translation algorithm and appropriately used to generate the corresponding code. As a result, the program P will have an execution path which makes q (the program variable used to capture a SQL injection attack at the top-level query Q) equal to tautology. Similar arguments can be made following the semantics of other types of conditions.

The above theorem ensures that there exist no false positives or false negatives in our analysis.

B. Application of CREST

We use CREST, an automatic test generation engine for C programs, to analyze the program obtained after translation. We consider the assertion that q does not evaluate to taut at the exit of the program (Line 18 in Figure 1(b)). The program keeps all variables uninitialized. More specifically, uninitialized variables are declared as CREST variables, which allows CREST to choose different valuations of these variables to generate test cases that violate the assertion. At its core, CREST relies on concrete and symbolic execution of programs to maximize exploration of branches in a program and identify assertion violations (if they exist). In the event the program does not contain any loops (as is the case of the result of our translations), CREST can potentially explore all possible branches and therefore can generate all possible test cases that lead to assertion violation. Each test case assigns some values to the CREST variables and these

values denote the conditions under which q evaluates to taut, i.e., a vulnerability is exploited.

(a)
c11 & c12 & c13 -> q
    c11 -> q11
    c12 & c13 -> q123
        c12 -> q12
        c13 -> q13
            c21 | c22 -> q13
                c21 -> q21
                c22 -> q22

(b)
0. variable declarations, initializations
1. if (c11 == taut) q11 = taut;
2. if (c11 == cont) q11 = cont;
3. if (c21 == taut) q21 = taut;
4. if (c21 == cont) q21 = cont;
5. if (c22 == taut) q22 = taut;
6. if (c22 == cont) q22 = cont;
7. if (q21 == taut) q13 = cont;
8. if (q22 == taut) q13 = cont;
9. if (q21 == cont && q22 == cont) q13 = taut;
10. if (c12 == taut) q12 = taut;
11. if (c12 == cont) q12 = cont;
12. if (q12 == taut && q13 != cont) q123 = taut;
13. if (q13 == taut && q12 != cont) q123 = taut;
14. if (q12 == cont || q13 == cont) q123 = cont;
15. if (q123 == taut && q11 != cont) q = taut;
16. if (q11 == taut && q123 != cont) q = taut;
17. if (q123 == cont || q11 == cont) q = cont;
// For SQL injection requirement
18. assert(q != taut);

(c) [Execution graph; nodes are the program line numbers 1, 4, 6, 9, 10, 12, 13, 15, 18]

Figure 1: (a) Possible execution tree of TRANSLATE; (b) Result of translation; (c) Partial execution graph explored by CREST.

Example 2: Figure 1(c) shows some of the execution traces of the program in Figure 1(b) explored by CREST to generate test cases (each node in the trace denotes a line number of the program). The execution traces 1-4-6-9-10-12-15-18 and 1-4-6-9-10-13-15-18 correspond to the test case where c11, c12 are tautologies and c21, c22 are contradictions. Note that CREST may not assign taut or cont to all variables while generating a test case. For instance, the path 1-4-6-9-13-15-18 corresponds to the test case where c11 is a tautology and c21, c22 are contradictions. The variable c12 remains uninitialized; we will refer to such values as unin.

C. Causal Set Detection: Reductions

In the above sections, we have presented how the CREST test case generator can be used effectively to identify injection-causing requirements (i.e., the valuation of conditions at various locations). At run-time, when the user inputs are provided, they are monitored to check whether any of these requirements are satisfied. Any user input that satisfies at least one requirement will be deemed intrusive and the query will not be allowed to execute with the input, thus stopping the SQL injection attack. While CREST generates all possible requirements in terms of condition valuations at each location, the number of such requirements may be large, and therefore it may be inefficient to verify user inputs against each of the requirements one at a time. For instance, CREST identifies eight different cases, corresponding to the case where the condition at location c12 is a tautology (footnote 3) (see table in Figure 3(a)), under which the injection vulnerability can be exploited in the query (introduced in Section I). The summary of these cases is that after the user provides some input, the condition at location c12 becomes a tautology, the condition at location c11 does not become a contradiction, and the disjunction of the conditions at locations c21 and c22 does not become a tautology.

Footnote 3. Similar cases are obtained when conditions at locations c11, c21, or c22 become either tautologies or contradictions.

In this section, we present a reduction mechanism which results in a summarization of all cases obtained from CREST. The proposed succinctness achieves two advantages. First, the succinctness permits efficient monitoring of user inputs at runtime. Second, it removes all redundancies in the conditions, thus allowing the developer to understand the root cause of the SQL injection vulnerability in the query and to take appropriate corrective measures.

Decision tree representation of vulnerability requirements. Recall that a vulnerability requirement is given in terms of valuation of conditions at different locations of the query under consideration. The domain of valuation D is {taut, cont, unin}. Each requirement can be viewed as a conjunctive formula where each conjunct corresponds to a valuation of a condition at a particular location. For instance, one of the requirements is

c11 = taut ∧ c21 = unin ∧ c22 = cont ∧ c12 = taut

That is, the conditions at locations c11 and c12 are tautologies, the condition at location c21 is uninitialized (i.e., not adversely affected by user input) and the condition at location c22 is a contradiction.

The set of all requirements is therefore a disjunction of conjunctive formulas representing individual requirements. Such formulas can be represented using a (3-valued) decision tree where each node in the tree corresponds to one of the location variables and directed edges from a node represent its valuation. The edges are labeled with items in D. The ordering in which variables appear in the tree is pre-specified and the leaf node is termed the T (true) node. A path from the root to the leaf in the tree corresponds to a conjunctive formula, which in turn corresponds to one possible valuation of the location variables as described by some requirement. Figure 3(b) presents a 3-valued decision

tree representing the injection-requirements shown in the table (Figure 3(a)).

Decision trees to Decision diagrams. Decision trees can be reduced to decision diagrams, which removes all duplications and redundancies from the decision tree, taking into consideration the semantics of boolean operations (conjunction and disjunction) over the domain of the decision tree node-values (D in our case). (See [6] for details.) We present rules for reducing our 3-valued decision tree to a 3-valued decision diagram in Figure 2.

[Figure 2 panels: Remove Redundant Tests; Generalize Test Values (if (V=taut) then not(cont) else not(taut)); Remove Duplicate Tests]
Figure 2: Decision graph representation and reductions of query conditions

(a)
c11   c21   c22   c12
taut  cont  cont  taut
taut  cont  unin  taut
taut  unin  unin  taut
taut  unin  cont  taut
unin  cont  cont  taut
unin  cont  unin  taut
unin  unin  unin  taut
unin  unin  cont  taut

(b) [3-valued decision tree over the variable order C11, C21, C22, C12; each of the eight root-to-T paths corresponds to one row of (a)]

(c) C11 --not(cont)--> C21 --not(taut)--> C22 --not(taut)--> C12 --taut--> T

Figure 3: (a) Requirements, (b) 3-valued Decision Tree, (c) 3-valued Decision Diagram

The first rule states that if there is a node c1 such that all its three branches go to the same node c2, then the valuation of the node c1 is not relevant, i.e., there exist some specific valuations for all variables other than c1 such that for all possible valuations of c1, there exists an injection-causing requirement. In this case, node c1 can be removed and all incoming edges to c1 are redirected to its child-node (c2). This rule is commonly referred to as redundant test removal.

The second rule corresponds to the case when there exists a node c1 in a path where one of its branches is labeled with unin and the other is labeled with V (which is equal to either taut or cont), and both branches lead to the same node c2. In that case, the two branches from c1 are merged to reflect that the valuation of c1 is not equal to the negation of V. This merging follows from the fact that if there are at least two paths in the decision tree, one where c1 is equal to taut (or cont) and the other where c1 is equal to unin, and all other node values remain the same, then the valuation of c1 in these paths is equal to not(cont) (or not(taut)). We refer to this rule as generalization. It is worth mentioning that the generalization rule depends on the domain and semantics of the valuations in a multi-valued decision tree/diagram.

Finally, the third rule corresponds to at least two identical subtrees/graphs that are rooted at two different nodes. In that case, one of the nodes is removed and all incoming edges to the removed node are redirected to the one that is not removed. This rule is referred to as duplicate test removal.

The application of the above rules converts a 3-valued decision tree to a 3-valued decision diagram (a DAG). Figure 3(c) presents the 3-valued decision diagram obtained from the decision tree in Figure 3(b). The steps that lead to the decision diagram are summarized as follows. Using the rule to remove duplicate tests where the test node does not have any children, only one node T is allowed in the decision diagram. All but one of the c12 nodes are removed from the decision tree (duplicate test removal). Similarly, there are four duplicate subtrees rooted at c22 and as such three of them are removed. The node c22 has two branches, each going to the same node c12, and as such the branches are merged (generalization) to not(taut). Similarly, duplicate test removal and generalization are applied to nodes c21 and c11 to obtain the decision diagram.

The decision diagram states that the SQL injection vulnerability can be exploited by user inputs which make (a) the condition at location c12 a tautology, (b) the condition at location c11 not a contradiction, and (c) the conditions at locations c21 and c22 not tautologies. This is a concise and precise representation of the injection requirements shown in the table of Figure 3(a). In essence, the decision diagram captures the causal set of requirements. Note that due to space constraints, in Figure 3, we have shown one small set of requirements in the table and the corresponding decision tree and diagram. The size of the table is much larger for our example; the reduction due to summarization to a causal set obtained by generating the corresponding decision diagram, therefore, is significant.

The reduction algorithm for obtaining a decision diagram from a decision tree is well-studied [6]. It is based on recursive backward exploration of the decision tree and has a complexity of O(N log(N)), where N is the total number of nodes in the decision tree. One of the challenging aspects of decision diagram construction is choosing the order of the nodes (e.g., we considered the ordering c11 followed by c21, followed by c22, followed by c12) that results in the smallest possible decision diagram corresponding to a decision tree. Finding such an ordering is computationally expensive (NP-Complete). However, we can leverage different heuristics [9] that have been proposed to efficiently produce a good ordering of variables.

IV. PRELIMINARY EVALUATION

As proved in Section III (Theorem 1), our technique does not have any false positives or false negatives (for the SQL query syntax considered for translation). Additionally, we have claimed that our technique is likely to capture in a succinct fashion the core conditions (causal set) which, when satisfied by the user-inputs, will cause a SQL injection attack. In the following, we use some examples to show that our claim holds true in practice.

The example introduced in Section I contains four locations where user-inputs can affect the conditions. As each of the locations can take one of three values (taut, cont and unin), there are 3^4 = 81 different test inputs. CREST can identify around 28 different injection-causing test cases (see Figure 3(a) for test cases corresponding to c12 = taut). However, our technique of reduction obtains only 4 different elements in the causal set. In short, our technique results in 85% reduction. Next, consider the following SQL query.

    SELECT deductible FROM policy as p
    WHERE inputPolicy = $input11 OR id = $input12
    UNION
    SELECT d.insuredname FROM dependents as d
    WHERE inputPolicy = $input21 OR id = $input22

Similar to the previous example, this query also has four locations where user inputs affect the conditions; however, the dependencies between these locations are different from those in the previous example. CREST obtains 13 different injection-causing test cases, while our technique correctly identifies the causal set to contain cases where at least one of the locations results in a taut condition, and reduces that number to 4 (about 69% reduction).

In summary, the proposed technique has two main advantages. It does not produce any false positives or false negatives. It produces results that capture exactly the cause of SQL injection with respect to user inputs. The causal set is, therefore, precise and succinct, making it easier to monitor for injection-causing user inputs and also to take appropriate corrective measures in the event of an injection.

V. CONCLUSION

We have shown that our technique is at the same time more general and more precise than the existing techniques, as it relies on semantic dependencies between the conditions that are affected by user inputs. This work specifically focuses on SQL query vulnerabilities that are exploited by injections that lead to tautologies in query conditions. As part of future work, we plan to develop a complete framework and empirically evaluate the strengths of our technique using real-life SQL queries. We also plan to investigate various decision diagram construction and minimization heuristics, and identify the ones that best suit our purpose. In particular, we will consider the Multi-valued Decision Diagram Library [8], which allows for representation of logical formulas over variables with finite domains of any size and has several optimized reduction algorithms.

REFERENCES

[1] S. W. Boyd and A. D. Keromytis. SQLrand: Preventing SQL injection attacks. In Applied Cryptography and Network Security (ACNS) Conference, 2004.

[2] M. Bravenboer, E. Dolstra, and E. Visser. Preventing injection attacks with syntax embeddings. In ACM Conference on Generative Programming and Component Engineering, 2007.

[3] G. Buehrer, B. W. Weide, and P. A. G. Sivilotti. Using parse tree validation to prevent SQL injection attacks. In ACM Workshop on Software Engineering and Middleware, 2005.

[4] W. G. J. Halfond and A. Orso. AMNESIA: Analysis and monitoring for neutralizing SQL-injection attacks. In IEEE/ACM International Conference on Automated Software Engineering, 2005.

[5] W. G. J. Halfond, A. Orso, and P. Manolios. Using positive tainting and syntax-aware evaluation to counter SQL injection attacks. In ACM SIGSOFT Foundations of Software Engineering (FSE), 2006.

[6] M. Huth and M. Ryan. Logic in Computer Science: Modelling and Reasoning about Systems, chapter 6. Cambridge University Press, 2004.

[7] M. Martin and M. S. Lam. Automatic generation of XSS and SQL injection attacks with goal-directed model checking. In USENIX Security Symposium, 2008.

[8] A. S. Miner. Implicit GSPN reachability set generation using decision diagrams. Performance Evaluation, 56(1-4):145-165, 2004.

[9] M. Rice and S. Kulhari. A survey of static variable ordering heuristics for efficient BDD/MDD construction. Technical report, UC Riverside, 2008.

[10] Z. Su and G. Wassermann. The essence of command injection attacks in web applications. In ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 2006.

[11] G. Wassermann and Z. Su. Sound and precise analysis of web applications for injection vulnerabilities. In ACM Conference on Programming Language Design and Implementation, 2007.

