Beruflich Dokumente
Kultur Dokumente
30
Authorized licensed use limited to: Universitat Passau. Downloaded on May 31,2010 at 14:00:55 UTC from IEEE Xplore. Restrictions apply.
tion obtained from counterexamples [1, 7]. By contrast, 2.1. Configurable Program Analysis with
we introduce new predicates based on information obtained Dynamic Precision Adjustment
from the abstract states encountered so far during the anal-
ysis. This is related to the automatic discovery of invari- For presentation and formalization purposes, we use the
ants from state samples [11], such as linear relationships framework of configurable program analysis [4]. This pro-
between variables, and to widening operators used in ab- vides the advantage of making explicit the new, orthogonal
stract interpretation [9]. However, unlike widening, CPA+ capability that dynamic precision adjustment gives the anal-
does not relax the precision of individual abstract states but ysis. We add two components to the CPA: in order to pair
adjusts the precision of entire analyses going forward. In abstract-domain elements with precisions, we introduce a
particular, the precision of any component analysis may be set of precisions, and in order to adjust the precision of such
increased as well as decreased, depending on the accumu- pairs, we introduce a precision adjustment operator. Thus
lated results so far. the new algorithm for dynamic precision adjustment will be
able to maintain for each abstract state its precision.
2. Program-Analysis Framework A configurable program analysis with dynamic precision
adjustment (CPA+) D = (D, Π, , merge, stop, prec)
consists of an abstract domain D, a set Π of precisions, a
We introduce the concept of configurable program anal- transfer relation , a merge operator merge, a termination
ysis with dynamic precision adjustment (CPA+). A CPA+ check stop, and a precision adjustment function prec, which
makes it possible to dynamically adjust the precision of an are explained in the following.
analysis while exploring the abstract state space of the pro-
gram. A composite CPA+ can control the precision of its 1. The abstract domain D = (C, E, [[·]]) is defined by the
component analyses during the verification process: it can set C of concrete states, the semi-lattice E of abstract states,
make a component analysis more abstract, and more effi- and a concretization function [[·]]. The semi-lattice E =
cient (in the extreme: switch it off completely), and it can (E,
, ⊥, ,
) consists of the (possibly infinite) set E of
make a component analysis more precise, and more expen- abstract domain elements, the top element
∈ E, the bot-
sive (in the extreme: track the values for each variable with tom element ⊥ ∈ E, the partial order ⊆ E × E, and the
the best precision possible). function
: E × E → E (the join operator). The func-
tion
yields the least upper bound for two lattice elements,
Preliminaries. We restrict the presentation to a simple im- and the symbols
and ⊥ denote the least upper bound and
perative programming language, where all operations are greatest lower bound of the set E, respectively. The con-
either assignments or assume operations, and all variables cretization function [[·]] : E → 2C assigns to each abstract
range over integers.1 A program is represented by a control- state e its meaning, i.e., the set of concrete states that it rep-
flow automaton (CFA), which consists of a set L of program resents. For soundness of the program analysis, the abstract
locations (models the program counter pc), an initial pro- domain has to fulfill the following requirements:
gram location pc 0 (models the program entry), and a set
G ⊆ L × Ops × L of control-flow edges (models the oper- (a) [[
]] = C and [[⊥]] = ∅
ation that is executed when control flows from one program
location to another). The set of program variables that occur (b) ∀e, e ∈ E : e e ⇒ [[e]] ⊆ [[e ]]
in operations from Ops is denoted by X. A concrete state
of a program is a variable assignment c : X ∪{pc} → Z that (c) ∀e, e ∈ E : [[e
e ]] ⊇ [[e]] ∪ [[e ]]
assigns to each variable an integer value. The set of all con- (the join operator is precise or overapproximate)
crete states of a program is denoted by C. A set r ⊆ C of
Note that requirement (c) is implied by (b) and the fact that
concrete states is called a region. Each edge g ∈ G defines
g
is the least upper bound.
a (labeled) transition relation → ⊆ C × {g} × C. The com-
plete transitionrelation → is the union over all control-flow 2. The set Π of precisions determines the possible preci-
g g
edges: → = g∈G →. We write c→c if (c, g, c ) ∈ →, sions of the abstract domain. The program analysis uses el-
g
and c→c if there exists a g with c→c . A concrete state cn ements from Π to keep track of different precisions for dif-
is reachable from a region r, denoted by cn ∈ Reach(r), ferent abstract states. A pair (e, π) is called abstract state e
if there exists a sequence of concrete states c0 , c1 , . . . , cn with precision π. The operators on the abstract domain are
such that c0 ∈ r and for all 1 ≤ i ≤ n, we have ci−1 →ci . parametric in the precision.
1 In
3. The transfer relation ⊆ E × G × E × Π assigns to
our implementation based on B LAST, we allow C programs
as inputs, and transform them into the intermediate language C IL
each abstract state e possible new abstract states e with pre-
(http://manju.cs.berkeley.edu/cil/). B LAST supports in- cision p which are abstract successors of e, and each transfer
g
terprocedural program analysis. is labeled with a control-flow edge g. We write e(e , π)
31
Authorized licensed use limited to: Universitat Passau. Downloaded on May 31,2010 at 14:00:55 UTC from IEEE Xplore. Restrictions apply.
if (e, g, e , π) ∈ , and e(e , π) if there exists a g with Algorithm 2.1 CPA+(D, e0 , π0 )
g
e(e , π). Input: a configurable program analysis with dynamic precision
The transfer relation has to fulfill the following require- adjustment D = (D, Π, , merge, stop, prec),
ment: an initial abstract state e0 ∈ E with precision π0 ∈ Π,
where E denotes the set of elements of the semi-lattice of D
(d) ∀e
∈ E, g ∈ G, π ∈Π : Output: a set of reachable abstract states
g
e(e ,π) [[e ]] ⊇ c∈[[e]] {c | c→c } Variables: a set reached of elements of E × Π,
g
32
Authorized licensed use limited to: Universitat Passau. Downloaded on May 31,2010 at 14:00:55 UTC from IEEE Xplore. Restrictions apply.
approximation of the set of reachable concrete states. The states, E1 and E2 being their respective sets of abstract
configurable program analysis with dynamic precision ad- states, a composite set of precisions Π× , a composite
justment is given by the abstract domain D, the preci- transfer relation × ⊆ (E1 × E2 ) × G × (E1 × E2 ) × Π× ,
sions Π, the transfer relation of the input program, the a composite merge operator merge× : (E1 × E2 ) ×
merge operator merge, the termination check stop, and the (E1 × E2 ) × Π× → (E1 × E2 ), a composite termination
precision adjustment function prec. The algorithm keeps check stop× : (E1 × E2 ) × 2E1 ×E2 × Π× → B,
updating two sets of abstract states with precision: a set and a composite precision adjustment function
reached to store all abstract states with precision that are prec× : (E1 ×E2 )×Π× ×2(E1 ×E2 )×Π× → (E1 ×E2 )×Π× .
found to be reachable, and a set waitlist to store all abstract The three composites × , merge× , and stop× are ex-
states with precision that are not yet processed (the frontier). pressions over the components of D1 , D2 , and Π×
The state exploration starts from the initial abstract state e0 (i , mergei , stopi , preci , [[·]]i , Ei ,
i , ⊥i , i ,
i ), as well
with precision π0 . For a current abstract state e with pre- as the operators ↓ and . The strengthening operator ↓
cision π, the algorithm first adjusts the (local) precision of and the compare relation can be used to increase the
the algorithm using the precision adjustment function prec, precision of the composite operators in a way similar to
based on the set of reached abstract states with precision. their use in CPA [4].
Next the algorithm considers each successor e with the new To guarantee that a configurable program analysis with
precision π̂, according to the transfer relation. Now, us- dynamic precision adjustment can be built from the com-
ing the given operator merge, the abstract successor state posite program analysis with dynamic precision, we impose
is combined with each existing abstract state from reached. the requirements for CPA+ operators (d, e, f, g) from Sec-
If the operator merge has added information to the new ab- tion 2.1 on the four composite operators.
stract state, such that the old abstract state is subsumed, then For a given composite program analysis with dynamic
the old abstract state with precision is replaced by the new precision C = (D1 , D2 , Π× , × , merge× , stop× , prec× ),
abstract state with precision in the sets reached and waitlist. we can construct the program analysis
If after the merge step the resulting new abstract state with with dynamic precision adjustment D× =
precision is not covered by the set reached, then it is added (D× , Π× , × , merge× , stop× , prec× ), where the product
to the set reached and to the set waitlist. domain D× is defined as the direct product of D1 and D2 :
D× = D1 × D2 = (C, E× , [[·]]× ). The product lattice is
Theorem 2.1 (Soundness) For a given configurable pro-
E× = E1 × E2 = (E1 × E2 , (
1 ,
2 ), (⊥1 , ⊥2 ), × ,
× )
gram analysis with dynamic precision adjustment D and an
with (e1 , e2 ) × (e1 , e2 ) if e1 1 e1 and
initial abstract state e0 with precision π0 , Algorithm CPA+
e2 2 e2 (and for the join operation, we have
computes a set of abstract states that overapproximates the
(e1 , e2 )
× (e1 , e2 ) = (e1
1 e1 , e2
2 e2 )). The
set of reachable concrete
states: product concretization function [[·]]× is such that
[[e]] ⊇ Reach([[e0 ]]).
e∈CPA+(D,e0 ,π0 )
[[(d1 , d2 )]]× = [[d1 ]]1 ∩ [[d2 ]]2 .
It is necessary to sharpen the analysis over the otherwise
Proof. The proof constructs an invariant over the main imprecise cartesian product of the component abstract do-
loop of the algorithm which is strong enough to show that mains [8, 10, 14]. Previous approaches have tried to achieve
in each iteration, the algorithm (1) does not replace an ab- this by defining a particular, specialized product. Basing our
stract state with precision from the set reached by one that formalism on top of configurable program analysis, we can
represents only a subset of concrete states, even if the pre- specify the desired degree of ‘sharpness’ in the composite
cision is refined, and (2) adds abstract states that cover all operators × , merge× , and stop× . In previous approaches,
concrete successors of an abstract state when it is removed a redefinition of basic operations was necessary, but using
from the set waitlist. CPA [4], we can reuse the existing abstract interpreters. In
addition to that, the CPA+ allows the analysis algorithm to
2.3. Composition for CPA+ specify a local precision for each abstract state and to adjust
this precision dynamically during the analysis.
A configurable program analysis with dy-
namic precision adjustment can be composed of Theorem 2.2 (Composition) Given a composite program
several configurable program analyses with dy- analysis with dynamic precision adjustment C, the result D×
namic precision adjustment. A composite CPA+ of the composition is a configurable program analysis with
C = (D1 , D2 , Π× , × , merge× , stop× , prec× )3 consists of dynamic precision adjustment.
two configurable program analyses with dynamic precision
adjustment D1 and D2 sharing the same set of concrete
Proof. The requirements (a), (b), and (c) for abstract do-
3 We extend this notation to any finite number of Di . mains of CPA+ follow from the construction of D× as
33
Authorized licensed use limited to: Universitat Passau. Downloaded on May 31,2010 at 14:00:55 UTC from IEEE Xplore. Restrictions apply.
⎧
product domain (abstract domains are closed under prod- ⎪
⎪
if π(x) = 0 or p/e ⇒ (x =
)
⎪
⎪
uct). The operators of the composite CPA+ fulfill the re- ⎨ ⊥ if π(x) = 0 and
quirements (d), (e), (f), and (g) of CPA+ by definition. e (x) = (p/e ⇒ false or ∃y ∈ X : e(y) = ⊥)
⎪
⎪
⎪
⎪ c if π(x) = 0 and p/e ⇒ (x = c)
⎩
e(x) otherwise
3. Application: Combining Explicit and where p/e denotes the predicate p with all occurrences of a
Symbolic Program Analysis variable y ∈ X replaced by e(y), or
(2) g = (·,
⎧ w := exp, ·) and for all x ∈ X :
The motivation of this work is the need for a formalism ⎨
if π(x) = 0
that allows us to compose a program analysis from known e (x) = exp/e if π(x) = 0 and x = w
⎩
parts and that makes it possible to dynamically adjust the e(x) otherwise
precision of the algorithm, locally, and across different ab- where exp/e denotes the evaluation of an expression exp
stract domains. Our goal is to track variables explicitly in over X for⎧an abstract value assignment e:
the analysis, and at the point where it becomes too expen- ⎪
⎪ ⊥ if x ∈ X occurs in exp with e(x) = ⊥
⎪
⎪
sive to track all possible values, we abstract the values that ⎨
if x ∈ X occurs in exp with e(x) =
we have seen so far by predicates and add these predicates exp/e = c otherwise, where expression exp
⎪
⎪
to a predicate analysis. We first introduce three CPA+ and ⎪
⎪ evaluates to c, after each occurrence
⎩
then the composite CPA+. of variable x ∈ X is replaced by e(x)
4. The merge operator does not combine elements when
3.1. CPA+ for Explicit Analysis control flow meets: mergeC (e, e , π) = e .
5. The termination check considers abstract states individu-
The CPA+ for explicit analysis C = ally: stopC (e, R, π) = (∃e ∈ R : e e ).
(DC , ΠC , C , mergeC , stopC , precC ), which can track 6. The precision adjustment function computes a new ab-
explicit values for program variables, consists of the stract state with precision: prec(e, π, R) = (e , π ) if the
following components: following holds for all x ∈ X: if |R(x)| ≥ π(x) then
1. The abstract domain DC = (C, E, [[·]]) uses the semi- π (x) = π(x) ∧ e (x) =
, and otherwise π (x) =
lattice E = (E,
, ⊥, ,
), which consists of the set E = π(x) ∧ e (x) = e(x), where R(x) denotes the set {e(x) |
(X → Z) of abstract variable assignments, where Z = (e, ·) ∈ R} of explicit values held in R for variable x.
Z∪{
Z , ⊥Z } induces the flat lattice over the integer values For a set M of pairs, we use the notation (x, ·) ∈ M for
(we write Z to denote the set of integer values). The top ∃y : (x, y) ∈ M .
element
∈ E with
(x) =
Z for all x ∈ X is the
abstract variable assignment that has no explicit value for 3.2. CPA+ for Predicate Analysis
any variable, and the bottom element ⊥ ∈ E with ⊥(x) = The CPA+ for predicate analysis P =
⊥Z for all x ∈ X is the abstract variable assignment which (DP , ΠP , P , mergeP , stopP , precP ), a program anal-
models that there is no value assignment possible (such a ysis for cartesian predicate abstraction that tracks the
state cannot be reached in an execution of the program). validity of predicates over program variables, consists of
The partial order ⊆ E × E is defined as e e if for all the following components:
x ∈ X, we have e(x) = e (x) or e(x) = ⊥Z or e (x) =
1. The domain DP = (C, E, [[·]]) is based on predi-
34
Authorized licensed use limited to: Universitat Passau. Downloaded on May 31,2010 at 14:00:55 UTC from IEEE Xplore. Restrictions apply.
g
3. The transfer relation P has the transfer eP (e , π) 3.4. Composition of Explicit, Predicate, and
if e is the largest set of predicates from π such that Location Analysis
[[e]] ⊆ pre(p, g) for each p ∈ e , where pre(p, g) = {c ∈ C |
g
∃c ∈ C : c→c and c |= p} denotes the weakest precondi- We construct a composite CPA+ that dynamically
tion for predicate p and control-flow edge g. changes the precision across different analyses, based on
4. The merge operator does not combine elements when the analyses defined in the previous subsections. In par-
control flow meets: mergeP (e, e , π) = e . ticular, we define a composite CPA+ that on-the-fly ab-
stracts the explicit CPA+ (by switching it off for certain
5. The termination check considers abstract states individu- variables) and refines the predicate CPA+ (by adding pred-
ally: stopP (e, R, π) = (∃e ∈ R : e ⊇ e ). icates), using information from the reachable set of abstract
6. The precision adjustment function does not change the states with precision. The predicates added to the predicate
abstract state with precision: prec(e, π, R) = (e, π). CPA+ characterize the sample set of values provided by
the explicit CPA+.
Note. If a predicate is not in the precision set and not in
The composite program analysis with dynamic precision
the abstract state, then this predicate is not tracked. If a
C = (L, P, C, Π× , × , merge× , stop× , prec× ), is based
predicate is in the precision set but not in the abstract state,
on the CPA+ for location analysis L, the CPA+ for pred-
then there exists a concrete state represented by the abstract
icate analysis P, the CPA+ for explicit analysis C, and the
state for which the predicate is not true. If a predicate is in
following components:
the abstract state, then the predicate is true for all concrete
states represented by the abstract state. 1. The composite precision is defined by
In the above formalization, we assume a fixed universe of Π× = ΠL × ΠP × ΠC .
possible predicates — isolated automatic refinement of the 2. The transfer relation × has the transfer
g
predicate abstraction is not discussed in this paper, as our (l, P, v)× ((l , P , v ), (πl , πP , πv )) if
goal is to point out possibilities to refine the abstract states lL (l , πl ) and P P (P , πP ) and vC (v , πv ).
by using information from other component analyses.
3. The composite merge operator is defined by
merge× ((l, P, v), (l , P , v ), π) = (l , P , v ).
3.3. CPA+ for Location Analysis
4. The composite termination check is defined by
stop× (e, R, π) = (∃e ∈ R : e × e ).
The CPA+ for location analysis L =
(DL , ΠL , L , mergeL , stopL , precL ), which tracks the 5. The composite precision adjustment funct. is defined by
syntactical reachability of program locations, consists of prec× ((l, P, v), (πl , πP , πv ), R) = ((l , P , v ), (πl , πP , πv ))
the following components: if (l , πl ) = (l, πl ) and
(v , πv ) = precC (v, πv ,
1. The domain DL is based on the flat lattice for the set L
{(v , πv ) | ((l, p, v ), (·, ·, πv )) ∈ R}) and
of program locations: πP = πP ∪ x∈X:v(x)=∧v (x)=
DL = (C, E, [[·]]), with E = (L ∪ {
, ⊥},
, ⊥, ,
), abstract(x, {v | ((l, P, v ), ·) ∈ R}) and
⊥ l
and l = l ⇒ l l for all elements l, l ∈ L P = P ∪ {p ∈ (πP \ πP ) | v |= p}.
(this implies ⊥
l = l,
l =
, l
l =
for all elements This composite precision adjustment function stops the
l, l ∈ L, l = l ), and [[
]] = C, [[⊥]] = ∅, and for all l ∈ L: explicit analysis for a variable x if the number of val-
[[l]] = {c ∈ C | c(pc) = l}. ues for x exceeds the specified threshold. If this situation
2. There is only one precision, which is the set of all loca- occurs, then the precision of the predicate analysis is in-
tions: ΠL = {L}. creased by adding all predicates computed by abstraction
g function abstract : (X × 2X→Z ) → 2P . Our implemen-
3. The transfer relation L has the transfer lL (l , π) tation of abstract applies a simple heuristic for discovering
g
if g = (l, ·, l ), and the transfer
L (
, π) for all g ∈ G relationships between variables and values, based on pred-
(the syntactical successor in the CFA without considering icates that syntactically occur in the program; user-defined
the semantics of the operation op). predicates are also possible. More interesting implementa-
4. The merge operator does not combine elements when tions, which mine predicates that best generalize the sample
control flow meets: mergeL (e, e , π) = e . set of explicit value assignments, are left for future work.
We plan to investigate methods based on linear templates
5. The termination check considers abstract states individu-
to infer predicates from the sample set, and we will draw
ally: stopL (e, R, π) = (e ∈ R).
from dynamic approaches like DAIKON [11], which auto-
6. The precision is never adjusted: precL (e, π, R) = (e, π). matically detect partial invariants from program executions.
35
Authorized licensed use limited to: Universitat Passau. Downloaded on May 31,2010 at 14:00:55 UTC from IEEE Xplore. Restrictions apply.
(a) Extreme examples
3.5. Analysis of Heap Structures
Program Symbolic Explicit + Symbolic Explicit
k=0 k=1 k=5 k=∞
For lack of space we do not provide a full description of
the following combination of configurable program analy- ex1 0.46 s 0.17 s 0.21 s —
ses with dynamic precision adjustment. In order to analyze ex2 0.43 s 0.16 s 0.21 s 1.00 s
data structures on the heap, we use a composition of the ex3 1 0.16 s 0.13 s 0.11 s 0.08 s
CPA+ for explicit heap analysis and the CPA+ for shape ex3 2 0.40 s 0.24 s 0.14 s 0.09 s
analysis. The explicit heap analysis keeps track of instances ex3 4 1.41 s 0.58 s 0.20 s 0.12 s
of data structures (lists, trees) in its abstract states. Each ab- ex3 8 6.53 s 2.10 s 0.31 s 0.18 s
stract state represents an explicit, finite part of heap memory loop1 25.20 s 26.01 s 22.78 s 0.16 s
that is modeled as a table. Pointer addresses are modeled as loop2 279.84 s 277.07 s 258.79 s 0.44 s
symbolic values and memory content can be looked up in square — — — 0.08 s
the table (i.e., pointer arithmetic is currently not supported). (b) SSH client/server software
We use an additional mapping from program variables to a
Program Symbolic Explicit + Symbolic
scalar value, a symbolic pointer, or top. The size of the ex-
k=0 k=1 k=2
plicit memory state is measured using the number of entries
s3 clnt.1 27.61 s 2.87 s 7.73 s
in the heap-content table.
s3 clnt.2 22.14 s 2.47 s 3.58 s
The CPA+ for shape analysis is derived from the CPA
s3 clnt.3 14.75 s 2.54 s 3.55 s
for shape analysis that we used earlier [3, 4]. The preci-
s3 clnt.4 10.61 s 2.52 s 3.56 s
sion adjustment function tries to find a trade-off between
s3 srvr.1 7.95 s 1.50 s 2.25 s
the large space requirements of the explicit heap analysis
s3 srvr.2 5.00 s 1.39 s 2.13 s
and the expensive computations of the shape analysis. More
s3 srvr.3 3.61 s 1.44 s 2.11 s
precisely, the precision adjustment function measures the
s3 srvr.4 4.32 s 1.41 s 2.14 s
size of the explicit data structures that are produced already.
s3 srvr.6 67.93 s 1.49 s 2.17 s
Once a threshold is hit, the function abstract generalizes
s3 srvr.7 35.59 s 1.95 s 2.56 s
from a set of explicit data structures to shapes, i.e., the func-
s3 srvr.8 4.88 s 1.44 s 2.17 s
tion analyzes the explicit sample structures by querying for
s3 srvr.9 33.20 s 1.94 s 2.67 s
well-known data-structure invariants, and derives interest-
s3 srvr.10 4.58 s 1.47 s 2.16 s
ing core and instrumentation predicates for the shape anal-
s3 srvr.11 36.64 s 1.95 s 2.60 s
ysis [17]. From here, the overall analysis proceeds with
s3 srvr.12 64.78 s 1.50 s 2.21 s
tracking more precise shapes and not storing explicit infor-
s3 srvr.13 125.91 s 2.04 s 2.64 s
mation anymore for that particular structure.
s3 srvr.14 68.27 s 1.52 s 2.19 s
s3 srvr.15 5.01 s 1.68 s 2.12 s
4. Experiments s3 srvr.16 69.12 s 1.51 s 2.28 s
36
Authorized licensed use limited to: Universitat Passau. Downloaded on May 31,2010 at 14:00:55 UTC from IEEE Xplore. Restrictions apply.
Program Shapes Explicit + Shapes Explicit it requires a predicate for every value of the variable:
h=3 h=5 h=∞ i = 0, i = 1, ..., i = 100. Program square contains a
list 1 0.24 s 0.17 s 0.19 s — function that computes the square using additions exclu-
list 2 0.84 s 0.85 s 1.08 s — sively (strength reduction). Predicate analysis of this pro-
list 3 2.99 s 3.30 s 5.06 s — gram fails for two reasons. First, even if unrolling the loop,
list 4 8.80 s 11.40 s 24.39 s — the symbolic analysis could not find the predicate necessary
list bnd4 1 0.35 s 0.28 s 0.07 s 0.06 s to prove safety. Second, the main program uses an expres-
list bnd4 2 1.02 s 1.22 s 0.22 s 0.21 s sion in an assert statement that is beyond the decision pro-
list bnd4 3 2.90 s 4.40 s 1.23 s 1.15 s cedure used in B LAST. The explicit analysis is able to prove
list bnd4 4 7.45 s 12.78 s 6.14 s 5.90 s the program safe efficiently.
list flags 1 0.44 s 0.36 s 0.36 s — The second set of programs (s3 clnt.i, s3 srvr.i)
list flags 2 1.36 s 1.08 s 1.19 s — represents the subroutine for the connection handshake pro-
list flags 3 4.95 s 3.70 s 4.15 s — tocol (state machine) of the SSH client and server, for which
list flags 4 19.66 s 13.89 s 16.33 s — we verify several protocol-specified safety properties (one
list flags 5 79.10 s 59.09 s 74.50 s — line in the table for one property). In all experiments,
list bnd2 f1 0.42 s 0.06 s 0.05 s 0.07 s the combination analysis significantly outperforms the pure
list bnd4 f1 0.69 s 0.51 s 0.11 s 0.08 s predicate analysis (π(x) = 0), sometimes by orders of mag-
list bnd4 f2 2.38 s 1.46 s 0.21 s 0.19 s nitude. The reason is that most predicates in the pure predi-
list bnd4 f3 8.19 s 4.86 s 0.57 s 0.52 s cate analysis are (mis-) used to track just one single explicit
list bnd4 f4 31.65 s 18.40 s 1.94 s 2.14 s value, for which the explicit analysis is superior. The con-
list bnd4 f5 130.94 s 74.72 s 9.64 s 9.48 s figuration for k = 2 always needs more time for the verifi-
cation task, because it tries to track a second value for vari-
Table 2. Performance comparison on examples for ables that require symbolic analysis; for these programs it is
the combination of shape analysis and explicit faster to switch to symbolic tracking after the analysis has
heap analysis found out that one value is not sufficient. A purely explicit
analysis results in false alarms for all examples, because
tic predicates from the program, and predicates returned by
there are variables that have to be analyzed symbolically.
B LAST’s lazy counterexample-guided refinement.
(The programs contain branches whose outcome depends
The first set of example programs consists of some con-
on inputs or uninitialized variables.)
structed, relatively small examples. The programs ex1 and
ex2 exhibit different scenarios for which the combination Explicit Heap Analysis and Shape Analysis. For the sec-
of the two analyses is better than any of the component anal- ond part of our evaluation, we used the CPA+ that results
yses on its own. Program ex1 is the example from Fig. 1. from combining the CPA+ L with the CPA+ for explicit
The results show that it is best to track variables with a heap analysis and the CPA+ for shape analysis. Table 2 re-
small number of possible values explicitly and the loop in- ports performance results for a set of small programs that
dex symbolically. The purely explicit analysis (π(x) = ∞) manipulate lists. We experimented with several configu-
fails for ex1 due to the unbounded number of iterations in rations for the threshold value h, which is the number of
one of the loops. Program ex2 is like ex1 but the exit con- nodes in the list after which we call the function abstract
dition of the second loop is changed so that the number of and switch off the explicit analysis for the heap. The ex-
iterations is bounded by a known constant. Therefore, the treme threshold h = 0 switches the explicit analysis off for
purely explicit analysis succeeds, but the symbolic analy- the whole analysis, and h = ∞ keeps the explicit analysis
sis is still faster due to the high number of concrete values always on. Function abstract takes as input the explicit rep-
of the loop counter. Program ex3 1 results from ex1 by resentation of the data structure and creates a shape graph
removing the loops. It consequently shows that the purely that summarizes the explicit structure.
explicit analysis is always best. The three variations ex3 2, The first four lines of the table let us analyze a negative
ex3 4 and ex3 8 are extended versions of ex3 1, with a effect of explicit heap analysis: the lists generated by pro-
larger number of program locations, which is twice, four gram list i represent an unbounded sequence of i differ-
times and eight times the size of ex3 1, respectively. This ent data elements in an ascending order. The shape graph of
experimental setting illustrates the exponential run time for the shape analysis contains O(i) nodes, whereas the size of
the symbolic analysis versus the linear run time for the ex- the explicit structure not only has a size proportional to the
plicit analysis. length of the list, but explores many different versions of the
To verify programs loop1 and loop2, it is neces- list (combinatorial explosion). The next four examples are
sary to unroll a loop 100 and 200 times, respectively. variations of the first four, but create lists of bounded length
A predicate analysis is prohibitively expensive because (list bnd4 i). The results show an effect similar to the
37
Authorized licensed use limited to: Universitat Passau. Downloaded on May 31,2010 at 14:00:55 UTC from IEEE Xplore. Restrictions apply.
unbounded case, i.e., the explicit analysis suffers from com- References
binatorial explosion for larger values of i, and the summa-
rization in the shape graphs overcompensates the overhead [1] T. Ball and S. K. Rajamani. The S LAM project: Debugging
for the expensive shape operations. But the explicit heap system software via static analysis. In Proc. POPL, pages
analysis can be more efficient when the number of different 1–3. ACM, 2002.
[2] D. Beyer, T. A. Henzinger, R. Jhala, and R. Majumdar. The
lists that need to be examined is relatively small.
software model checker B LAST: Applications to software
The third set of examples list flags i again pro- engineering. Int. J. Softw. Tools Technol. Transfer, 9(5-
duces lists of unbounded length. The performance num- 6):505–525, 2007.
bers indicate that the explicit analysis can be helpful —if [3] D. Beyer, T. A. Henzinger, and G. Théoduloz. Lazy shape
the threshold is reasonably small— for constructing the ab- analysis. In Proc. CAV, LNCS 4144, pages 532–546.
stract shape graphs by conversion from explicit heap struc- Springer, 2006.
tures (speedup of 24 % for threshold h = 3). The last set [4] D. Beyer, T. A. Henzinger, and G. Théoduloz. Config-
urable software verification: Concretizing the convergence
of examples produces lists of bounded length, but requires
of model checking and program analysis. In Proc. CAV,
larger and many shape graphs to prove the property. This LNCS 4590, pages 504–518. Springer, 2007.
shows that the overhead of expensive shape-analysis opera- [5] B. Blanchet, P. Cousot, R. Cousot, J. Feret, L. Mauborgne,
tions can be effectively avoided by explicit heap tracking A. Miné, D. Monniaux, and X. Rival. A static analyzer for
when the control flow of the program limits the number large safety-critical software. In Proc. PLDI, pages 196–
of possible structures for the explicit analysis and leads to 207. ACM, 2003.
many different shape graphs in the shape analysis. [6] S. Chaki, E. M. Clarke, A. Groce, and O. Strichman.
Predicate abstraction with minimum predicates. In Proc.
Discussion. So far, our preliminary experiments have not CHARME, LNCS 2860, pages 19–34. Springer, 2003.
identified an absolut advantage of one of the configurations, [7] E. M. Clarke, O. Grumberg, S. Jha, Y. Lu, and H. Veith.
but that different configurations lead to significantly differ- Counterexample-guided abstraction refinement for symbolic
ent performance, and that further work is necessary to lever- model checking. J. ACM, 50(5):752–794, 2003.
[8] M. Codish, A. Mulkers, M. Bruynooghe, M. G. de la Banda,
age the potential that the combination offers. We introduced
and M. Hermenegildo. Improving abstract interpretations by
the new concept of CPA+ in order to make it possible to
combining domains. In Proc. PEPM, 194–205. ACM, 1993.
compare such different analyses and combinations thereof, [9] P. Cousot and R. Cousot. Abstract interpretation: A unified
and showed that the results are indeed interesting and worth lattice model for the static analysis of programs by construc-
a further research project. tion or approximation of fixpoints. In Proc. POPL, pages
All benchmarks and an executable of the implementa- 238–252. ACM, 1977.
tion are available online on our supplementary web page at [10] P. Cousot and R. Cousot. Systematic design of program
analysis frameworks. In Proc. POPL, 269–282. ACM, 1979.
http://www.cs.sfu.ca/∼dbeyer/blast cpaplus/.
[11] M. D. Ernst, J. H. Perkins, P. J. Guo, S. McCamant,
C. Pacheco, M. S. Tschantz, and C. Xiao. The DAIKON sys-
5. Conclusion tem for dynamic detection of likely invariants. Sci. Comput.
Program., 69(1-3):35–45, 2007.
[12] P. Godefroid. Model checking for programming languages
We have provided an algorithm and tool for experiment- using V ERI S OFT. In Proc. POPL, pp. 174–186. ACM, 1997.
ing with the combination of several program analyses. Our [13] B. S. Gulavani, T. A. Henzinger, Y. Kannan, A. V. Nori, and
S. K. Rajamani. S YNERGY: A new algorithm for property
method allows the on-line transfer of information between
checking. In Proc. FSE, pages 117–127. ACM, 2006.
the different analyses, and this information can be used to [14] S. Gulwani and A. Tiwari. Combining abstract interpreters.
increase or decrease the precision of any analysis on-the-fly. In Proc. PLDI, pages 376–386. ACM, 2006.
Using our tool, we showed that it can be beneficial to com- [15] T. A. Henzinger, R. Jhala, R. Majumdar, and K. L. McMil-
bine predicate abstraction with an explicit analysis because lan. Abstractions from proofs. In Proc. POPL, pages 232–
many predicates are used to track only specific values of 244. ACM, 2004.
variables. Our method allows us to dynamically change the [16] T. A. Henzinger, R. Jhala, R. Majumdar, and G. Sutre. Lazy
abstraction. In Proc. POPL, pages 58–70. ACM, 2002.
partition between the variables that are analyzed symboli- [17] M. Sagiv, T. W. Reps, and R. Wilhelm. Parametric shape
cally using predicates and those that are analyzed explic- analysis via 3-valued logic. ACM Trans. Program. Lang.
itly. We also studied the effects of combining a symbolic, Syst., 24(3):217–298, 2002.
graph-based shape analysis with an explicit heap analysis. [18] K. Sen, D. Marinov, and G. Agha. C UTE: A concolic unit
The new formalism and tool allow us to quickly set up such testing engine for C. In Proc. ESEC/FSE, pages 263–272.
experiments and study the effects of different parameter set- ACM, 2005.
[19] G. Yorsh, T. Ball, and M. Sagiv. Testing, abstraction, theo-
tings. We have not yet addressed the problem of automati-
rem proving: Better together! In Proc. ISSTA, pages 145–
cally mining meaningful predicates from a given set of sam-
156. ACM, 2006.
ple values. This is left for future investigation.
38
Authorized licensed use limited to: Universitat Passau. Downloaded on May 31,2010 at 14:00:55 UTC from IEEE Xplore. Restrictions apply.