Concurrent Software
by
Niloofar Razavi
Copyright © 2014 by Niloofar Razavi
Abstract
Niloofar Razavi
Doctor of Philosophy
University of Toronto
2014
With the increasing dependency on software systems, we require them to be reliable and correct. Software testing is the predominant approach in industry for finding software errors. There have been great advances in testing sequential programs throughout the past decades. Several techniques have been introduced with the aim of automatically generating input values such that the executions of the program with those inputs provide meaningful coverage.
Today, multi-threaded (concurrent) programs are becoming pervasive in the era of multi-
processor systems. The behaviour of a concurrent program depends not only on the input
values but also on the way the executions of threads are interleaved. Testing concurrent pro-
grams is notoriously hard because often there is an exponentially large number of thread interleavings that have to be explored. In this thesis, we propose an array of heuristic-based testing techniques for concurrent programs that prioritize and test a subset of interleavings:
(A) a sound and scalable technique that, based on the events of an observed execution, predicts runs that might contain null-pointer dereferences. This technique explores the interleaving space (based on the observed execution) while keeping the input values fixed, and can handle large runs.
(B) a test generation technique that uses a set of program executions as a program under-
approximation to explore both input and interleaving spaces. This technique generates tests
that increase branch coverage in concurrent programs based on their approximation models.
(C) a bounded-interference heuristic, defined based on the notion of data-flow between threads and parameterized by the number of interferences among threads. Testing techniques that employ this heuristic are able to provide coverage guarantees for concurrent programs.
(D) a testing technique which adapts sequential concolic testing to concurrent programs by incorporating the bounded-interference heuristic into it. The technique provides branch coverage guarantees (modulo the interference bound) for concurrent programs.
Based on the above techniques, we have developed tools and used them to successfully find bugs in concurrent programs.
Acknowledgements
I want to take this opportunity to thank all the people who had an influence on the work presented in this thesis. First and foremost, none of this work would have been possible without Azadeh Farzan's support. I am grateful for the time she dedicated to my projects, as well as her
inspiration. I would like to thank my committee members, Marsha Chechik, Sheila McIlraith,
Steve Easterbrook, and Scott Stoller for their insightful comments and suggestions about my
work. I thank our collaborators at the University of Illinois at Urbana-Champaign for their collaboration on the null-pointer dereference prediction work, and Helmut Veith and Andreas Holzer from the Vienna University of Technology for their collaboration on the (conc)²olic testing work. I need to thank the people at
the NEC Laboratories America, Aarti Gupta, Franjo Ivancic and Vineet Kahlon, for the amaz-
ing internship experience I had at NEC. I learned a lot from them and enjoyed every minute
of working at NEC. A special thanks to all my Toronto friends and officemates: Varada Kol-
hatkar, Jocelyn Simmons, Golnaz Elahi, Alicia Grubb, Zachary Kincaid, Aws Albarghouthi
Finally and foremost, I would like to thank my family. My parents, Seyed Alireza and
Sousan, whose constant love and encouragement have helped me to be optimistic even in the most difficult days of my life. My sisters, Negin and Negar, who were always listening patiently to my complaints. My love, Andreas, whose unwavering love, support, and understanding
Contents

1 Introduction
1.1 Background
1.2 Contributions
1.3 Outline
2.2 Preliminaries
2.6 Evaluation
2.6.1 Implementation
2.6.2 Experiments
2.8 Summary
3.2 Preliminaries
3.5 Evaluation
3.5.1 Implementation
3.5.2 Experiments
3.7 Summary
5.1 Preliminaries
5.1.2 Global Traces (Revisited)
6.7 Summary
Bibliography
List of Tables

2.1 Experimental results for precise/relaxed prediction using logical constraint encoder/solver
2.3 Experimental results for predicting data races and atomicity violations using planning encoder/solver
heuristic
List of Figures

2.5 ExceptioNull with logical constraint encoder/solver built on top of Penelope
3.2 Test generation based on MTA for the program in Figure 3.1
6.2 Symbolic trace obtained from the assertion-violating execution of the program presented in Figure 6.1 and its corresponding interference scenario IS()
6.4 Constraint systems DC(I) and TC(I) for an interference scenario I = (V, E, ℓ)
6.6 An example showing initial path exploration for thread T′ (cf. 6.2)
d2, d3, and d4

Appendices
Chapter 1
Introduction
1.1 Background
Software systems nowadays affect every aspect of our lives; they are used in medical services, transportation, education systems, and businesses. Even a small error in any of these systems may lead to a huge loss of money, time, or lives. Therefore, there is a great need to develop techniques to ensure that software systems are reliable, safe, and secure. In industry, software testing is still the predominant technique to find correctness and performance issues in software systems. Billions of dollars are spent on software testing each year.
Sequential software systems consist of a single thread of execution. The testing process of a
sequential system includes providing different input values to the system and investigating the
behaviour of the system under the given inputs. For example, Concolic testing [24, 73, 5, 84, 4]
is an automatic sequential testing technique which executes the program with both concrete and
symbolic inputs simultaneously and uses path constraints (i.e., branch conditions encountered
during execution) on symbolic inputs to generate input values that lead the execution towards
uncovered parts of the program. Sequential software testing techniques are often coupled with a
notion of coverage that the technique provides. Various coverage criteria have been introduced
for sequential program testing over the years, e.g., path coverage [24, 4, 73], control-flow
coverage [84], predicate coverage [34], etc. These coverage criteria quantify the testing process
and give the tester some meaningful information about how much the software has been tested.
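To make the concolic loop described above concrete, the following self-contained Java sketch runs a toy program with a concrete input, records the branch condition encountered as a path constraint, negates it, and searches for a new input satisfying the negated constraint. All names are ours, and the brute-force search is only a stand-in for the SMT solver that real concolic testers use.

```java
import java.util.*;

public class ConcolicSketch {
    // Toy program under test: one branch over a single int input.
    static String run(int x) {
        if (x * x > 100) return "big";
        return "small";
    }

    // Stand-in "constraint solver": brute-force search for an input that
    // satisfies the negated branch condition. Real concolic testers use SMT.
    static OptionalInt solve(java.util.function.IntPredicate negatedConstraint) {
        for (int x = -1000; x <= 1000; x++)
            if (negatedConstraint.test(x)) return OptionalInt.of(x);
        return OptionalInt.empty();
    }

    // One concolic iteration: execute concretely, record the path constraint,
    // negate it, and solve for an input driving the uncovered branch.
    public static Set<String> exploreBothBranches(int seed) {
        Set<String> covered = new HashSet<>();
        covered.add(run(seed));                       // concrete execution
        boolean tookBig = seed * seed > 100;          // recorded path constraint
        OptionalInt flipped = solve(x -> (x * x > 100) != tookBig); // negated
        flipped.ifPresent(x -> covered.add(run(x)));  // re-execute with new input
        return covered;
    }

    public static void main(String[] args) {
        System.out.println(exploreBothBranches(3)); // covers both branches
    }
}
```

Starting from the seed input 3 (which covers only the "small" branch), negating the recorded constraint yields an input that covers the "big" branch as well, giving full branch coverage after two executions.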
Concurrent software is becoming pervasive with the increase in the usage of multi-core hardware. However, testing concurrent software is more
challenging than testing sequential software since the behaviour of a concurrent program not
only depends on input values but also is affected by the way the executions of threads are
interleaved; i.e., even under fixed input values, different interleavings of executions of threads
may lead to different behaviours. Test generation for concurrent systems includes both (i)
exploring the input space to find a set of input values that may trigger a bug (input generation),
and (ii) exploring the interleaving space, with bug-triggering input values, to find possible
bugs (schedule generation). However, the exploration space is huge for real programs and it is impossible to explore it exhaustively within practical time and resource limits.
Stress testing [2] and randomized testing [11, 83, 71, 58, 43, 3] are two traditional testing
techniques for concurrent programs. Stress testing [2] is an approach which puts programs
under pressure by providing a set of input values to the program and then executing the program
with each input for a long time (for server applications) or for many times with many threads
(for other types of applications) with the hope of exploring different interleavings in different
executions. Randomized testing [11, 83, 71, 58, 43, 3], on the other hand, aims at exploring
distinct interleavings in different executions by changing the priorities of the threads on the fly
or strewing the code with sleep commands for random time intervals. However, both of these approaches may repeatedly explore the same interleavings across executions and provide no guarantees on the explored interleaving space.
More recent techniques employ heuristics to restrict the exploration space to a manageable size. These techniques can be categorized as follows:
Techniques in this category rely on a given set of input values and focus on interleaving ex-
ploration while keeping the inputs fixed. For example, there is an array of prediction tech-
niques [17, 82, 35, 93, 92, 90, 89, 7, 59, 74, 68] which are based on the philosophy of using
heuristics to target interleavings that are more likely to contain bugs, e.g., interleavings that
contain data races [13, 95, 57, 50, 69, 18] or atomicity violations [75, 58, 49, 94, 19, 15], and
testing as many of those as possible (under the given time and space limitations).
A data race occurs when two threads access a shared memory location at the same time
and at least one of the accesses is a write access. Data races might lead to unpredictable
program behaviors and could be symptomatic of errors. A code unit is not atomic if it is interrupted during an execution by statements from another thread, and the interaction cannot
be ruled out as harmless by presenting an equivalent execution in which this interruption does
not occur. Atomicity violations could lead to concurrent program behaviours overlooked by
programmers [47]. Many prediction techniques use data races and atomicity violations as
heuristics to reduce the interleaving exploration space [82, 90, 59, 68]. Prediction techniques utilize a static lock-based [17, 82, 35, 59, 93, 92] or symbolic [90, 89] (using SMT solvers) analysis to predict such bug-exposing interleavings.
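The lock-based style of race analysis mentioned above can be illustrated with an Eraser-style lockset algorithm. The sketch below is our own minimal rendition over a recorded list of shared-variable accesses, not the algorithm of any cited paper: it intersects, per variable, the sets of locks held at every access, and flags variables whose accesses come from multiple threads with no common protecting lock.

```java
import java.util.*;

public class LocksetSketch {
    // One recorded access: which thread touched which shared variable,
    // and which locks the thread held at that moment.
    record Access(String thread, String var, boolean isWrite, Set<String> locksHeld) {}

    // Eraser-style lockset algorithm: for each shared variable, intersect the
    // lock sets of all accesses; an empty intersection, with accesses from
    // more than one thread and at least one write, signals a potential race.
    public static Set<String> potentialRaces(List<Access> trace) {
        Map<String, Set<String>> candidate = new HashMap<>(); // var -> common locks
        Map<String, Set<String>> threads = new HashMap<>();   // var -> accessing threads
        Map<String, Boolean> written = new HashMap<>();       // var -> ever written?
        for (Access a : trace) {
            candidate.merge(a.var(), new HashSet<>(a.locksHeld()),
                            (old, cur) -> { old.retainAll(cur); return old; });
            threads.computeIfAbsent(a.var(), v -> new HashSet<>()).add(a.thread());
            written.merge(a.var(), a.isWrite(), Boolean::logicalOr);
        }
        Set<String> racy = new HashSet<>();
        for (String v : candidate.keySet())
            if (candidate.get(v).isEmpty() && threads.get(v).size() > 1 && written.get(v))
                racy.add(v);
        return racy;
    }

    public static void main(String[] args) {
        List<Access> trace = List.of(
            new Access("T1", "x", true,  Set.of("l")),  // x consistently under lock l
            new Access("T2", "x", false, Set.of("l")),
            new Access("T1", "y", true,  Set.of("l")),  // y protected inconsistently
            new Access("T2", "y", true,  Set.of()));
        System.out.println(potentialRaces(trace));      // reports y but not x
    }
}
```

Like the static lock-based analyses above, this is a heuristic: an empty lockset indicates a *potential* race, which is why prediction techniques go on to search for a feasible schedule realizing it.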
There is also an array of search techniques [53, 54, 12, 55] that take a more coverage-
oriented approach than the prediction techniques. They characterize a subset of the search
space by a bounding parameter p. More behaviors are explored as p is increased, and in the
limit all behaviors are explored. Context bounding [53, 54, 55] has been used as a heuris-
tic to prioritize interleavings within a bounded number of context-switches over the others.
The intuition behind this search strategy is that many bugs in concurrent programs manifest
themselves with only a small number of context switches occurring during the program execution. For
example, CHESS [53] is a tool that explores all interleavings (under the fixed input values)
up to a bounded number of context-switches. Delay bounding [12] has been used to transform
a deterministic scheduler into a sufficiently non-deterministic one (for the testing purpose) by
enabling it to delay each ready task for a bounded number of times. The non-deterministic
scheduler allows efficient exploration of the interleaving space by increasing the possibility of
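To make the context-bounding intuition concrete, the following sketch (our own counting exercise, not the CHESS algorithm) counts how many interleavings of two fixed per-thread event sequences survive a context-switch bound, showing how drastically the bound prunes the space.

```java
public class ContextBoundSketch {
    // Count interleavings of two per-thread event sequences (aLeft and bLeft
    // events remaining) that use at most `bound` context switches.
    // `cur` is the thread that ran last: 0 for A, 1 for B, -1 at the start.
    public static long count(int aLeft, int bLeft, int cur, int bound) {
        if (aLeft == 0 && bLeft == 0) return 1;
        long total = 0;
        if (aLeft > 0) {
            int cost = (cur == 1) ? 1 : 0;            // switching from B to A
            if (bound - cost >= 0) total += count(aLeft - 1, bLeft, 0, bound - cost);
        }
        if (bLeft > 0) {
            int cost = (cur == 0) ? 1 : 0;            // switching from A to B
            if (bound - cost >= 0) total += count(aLeft, bLeft - 1, 1, bound - cost);
        }
        return total;
    }

    public static void main(String[] args) {
        int a = 8, b = 8;                             // 8 events per thread
        // All interleavings: C(16,8) = 12870; with at most 2 switches: only 16.
        System.out.println(count(a, b, -1, Integer.MAX_VALUE));
        System.out.println(count(a, b, -1, 2));
    }
}
```

Even for this tiny example (two threads, eight events each), the unbounded space has 12870 interleavings, while a bound of two context switches leaves only 16, which is the kind of reduction that makes systematic exploration tractable.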
The main advantage of the techniques in this category is that they are both simple and
efficient in finding simple concurrency bugs that do not require complicated input values to be
revealed. They simplify the testing process by ignoring input exploration and have been proven
to be very effective in bug finding. As a result, these techniques are used for testing software
in earlier development stages under some manually provided input values, where the goal is to find simple concurrency bugs quickly.
Techniques in this category use program under-approximations as heuristics to limit the ex-
ploration space; i.e., input/interleaving exploration is done for the approximated programs.
Most of these techniques [80, 79] use concurrent trace programs, i.e., program slices built from program executions, as approximation models. They perform a static symbolic analysis on concurrent trace programs, based on encoding possible executions of concurrent trace programs as a set of logical constraints and using SMT solvers, to
check safety properties (of course they are incomplete for proving properties because of the
approximations); e.g., the output of the analyses is whether an assertion can be violated or not.
The main advantage of these techniques over the techniques in the first category is that they
perform input exploration (although it is limited to the approximated programs) and hence
could be used to catch more complicated bugs; of course, this advantage comes at the expense
of more complicated analyses. Therefore, these techniques are best to be applied after testing
the software with the techniques in the first category to search for bugs that might be overlooked
there, i.e., bugs that occur under specific input values that are not tested by the techniques in the first category.
Techniques that fall into this category explore both input and interleaving spaces of the whole
program while using some heuristics for interleaving exploration. Sequentialization tech-
niques [44, 42, 85, 63, 62, 21] transform a concurrent program into a sequential program and
then analyze the sequential program statically (e.g., for finding assertion violations) using se-
quential analysis techniques. These techniques utilize some heuristics to embed a subset of
behaviours of the concurrent program in the resulting sequential program. Most of these sequentialization techniques are based on the context bounding heuristic; the sequential program can context-switch non-deterministically after each concurrent program statement if the bound has not been reached yet.
Other techniques in this category [73, 70] leverage concolic testing techniques, using some
heuristics for interleaving exploration, to generate tests for concurrent programs. For example,
jCute [73] is a concolic testing tool for concurrent Java programs that uses data races as a
heuristic to prune the interleaving space. It executes the program and identifies data races in
the observed execution. After each program execution, jCute either keeps the interleaving fixed
as before and performs input generation to cover a previously uncovered part of the program
or keeps the input values fixed as before and explores a new interleaving by simply re-ordering the events involved in a data race.
The techniques in this category are more expensive than the techniques in the second cate-
gory since the input/interleaving exploration is performed for the whole program rather than its approximation. In return, these techniques are capable of providing coverage guarantees for the concurrent programs after the testing process is finished. Therefore, these techniques can be applied in later stages of software development (when techniques in other categories fail to find any more bugs) to provide coverage guarantees for the program under test.
1.2 Contributions
In this dissertation, we focus on effective test generation for concurrent programs and advance the state of the art as follows.

A. Predicting null-pointer dereferences from an observed program run
A prediction technique (in category (i)) is sound if the predicted runs are feasible program runs
and is scalable if it works for large runs. Having an analysis which is both sound and scalable
is a common challenge in all prediction techniques and often one of these issues is sacrificed
for the benefit of the other. On the other hand, prediction techniques have mostly focused on
data races, atomicity violations and assertion violations as heuristics to explore interleavings
that might contain any of these violation patterns. However, the applicability of prediction
techniques is not restricted to these bugs; i.e., they can target other types of bugs (e.g., memory
bugs, deadlocks, etc.) that are also common in concurrent programs. That requires coming up with appropriate violation patterns (reflecting these bugs) and providing corresponding
analyses.
We introduce a new pattern, called null reads, for predicting null-pointer dereferences in
concurrent programs. The intuition behind this pattern is that null is a critical value and in
many cases reading null values might lead to memory bugs. Our prediction technique is both
sound and scalable. To provide scalability, the analysis is performed at the shared commu-
nication level (i.e., accesses to shared variables and synchronization events) by suppressing
local computation in the observed runs. We also develop a static pruning technique which
drastically reduces the size of the prediction problem. To provide soundness, we employ the
maximal causal model [74] which works at the shared communication level and guarantees
soundness. We also develop a relaxation technique that allows us to deviate from the maxi-
C HAPTER 1. I NTRODUCTION 7
mal causal model gradually to predict some (not necessarily sound) runs when the prediction
We propose two different techniques for encoding the prediction problem based on the
maximal causal model; in the first technique, the problem is encoded as a constraint satisfac-
tion problem and the state-of-the-art SMT (Satisfiability Modulo Theories) solvers are used to
search for solutions. The second technique is based on conceptualization and realization of the
prediction problem as an AI automated planning [56] problem. This enables us to benefit from
AI planners.
B. Test generation based on multi-trace analysis
Most of the techniques that use program approximations to perform input/interleaving exploration (in category (ii)) have, so far, been aimed at finding assertion violations. In fact, none of these
techniques have targeted test generation (i.e., input and schedule generation) for exploring dif-
ferent possible program behaviours. Furthermore, all existing techniques in category (ii) fix
the approximation model a priori which does not allow exploring program code or behaviors
that are beyond the approximation. For example, they cannot catch the violation of assertions that do not appear in the approximation model.
We use concurrent trace programs (i.e., program slices built from program executions) as
program approximation models and develop a multi-trace analysis to generate tests for concur-
rent programs. The multi-trace analysis is built on top of symbolic prediction techniques and
utilizes information available in multiple program runs to generate tests that would increase branch coverage. However, we do not fix the approximation model a priori; i.e., the approximation is augmented by the observed run after running each generated test. Furthermore, we make the
multi-trace analysis target test generation for branches that are not present in the approximation
which allows us to explore program behaviours that are beyond the approximation. Note that in
an active testing framework [25], many runtime bugs can be encoded as branches. Therefore,
by targeting branch coverage, one can implicitly aim for catching those bugs. We use this fact
and combine a sequential testing technique with the multi-trace analysis such that individual
threads are exposed to sequential test generation first to increase branch coverage as much as
possible. Upon saturation, we fall back to our multi-trace analysis to generate tests for covering the remaining uncovered branches.
C. Bounded-interference heuristic
Existing techniques in category (iii) suffer from two main shortcomings. (i) Inefficiency: most sequentialization techniques use the context bounding heuristic. Note
that many thread interleavings might be equivalent to each other according to the way threads
interfere with each other. Therefore, exploring all such interleavings reduces the efficiency of these techniques.
(ii) Lack of coverage guarantees: most concolic testing techniques for concurrent programs
use data races as a heuristic for exploring the interleaving space. Due to this heuristic, these
techniques are unable to quantify the partial work done during the testing process as a coverage
measurement. Therefore, they cannot provide any coverage guarantees (on program code or
behaviours) for concurrent programs when the time or memory limit is hit.
We propose a new heuristic, called bounded-interference, for the techniques in category (iii), to efficiently provide coverage guarantees for concurrent programs. An interference happens whenever a thread reads a value that is written by another thread. The
idea behind the bounded-interference heuristic is to gradually explore all program behaviours
within a bounded number of interferences among threads. This heuristic is parameterized with
the number of interferences and therefore can be used to provide coverage guarantees (modulo the interference bound) for concurrent programs.
Another property of this heuristic is that it is defined based on the notion of data flow
among the threads, in contrast to the control-based notions, such as context bounding, that are tied to schedules. Therefore, it can be naturally incorporated into sequential testing techniques
to explore the input space and the interference space in a unified manner.
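To make the notion of interference concrete, here is a small sketch (our own encoding of the idea, not the thesis's formalization) that replays a recorded trace of shared-variable accesses and counts the reads that observe a value last written by a different thread; a bounded-interference search would only explore runs whose count stays within a given bound.

```java
import java.util.*;

public class InterferenceSketch {
    // A trace event: which thread accessed which shared variable, and whether
    // the access was a write.
    record Event(String thread, String var, boolean isWrite) {}

    // An interference occurs when a thread reads a value that was last
    // written by a different thread. Replay the trace and count such reads.
    public static int countInterferences(List<Event> trace) {
        Map<String, String> lastWriter = new HashMap<>(); // var -> writing thread
        int interferences = 0;
        for (Event e : trace) {
            if (e.isWrite()) {
                lastWriter.put(e.var(), e.thread());
            } else {
                String w = lastWriter.get(e.var());
                if (w != null && !w.equals(e.thread())) interferences++;
            }
        }
        return interferences;
    }

    public static void main(String[] args) {
        List<Event> trace = List.of(
            new Event("T1", "x", true),   // T1 writes x
            new Event("T1", "x", false),  // T1 reads its own write: no interference
            new Event("T2", "x", false),  // T2 reads T1's write: interference
            new Event("T2", "y", true),   // T2 writes y
            new Event("T1", "y", false)); // T1 reads T2's write: interference
        System.out.println(countInterferences(trace)); // 2
    }
}
```

Because the count depends only on data flow between threads, and not on where context switches happen, many schedules collapse into the same interference scenario, which is exactly what makes the heuristic cheaper to enumerate than context-switch-based bounds.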
We develop a sequentialization technique that transforms a concurrent program into a sequential program such that the sequential program embeds all behaviours of the concurrent program within the given interference bound. We keep the transformation simple by considering concurrent programs with only two threads, where only one thread can read values written by the other.
A nice property of the sequentialization technique is that inputs of the concurrent program
and interference scenarios are both encoded as inputs of the generated sequential program and
hence the underlying sequential testing technique is able to explore both input and interference
scenario spaces side by side. The sequentialization is sound, i.e., every bug in a generated
sequential program represents a bug in the corresponding concurrent program and applying a
sequential testing technique with specific coverage guarantees provides coverage guarantees for the original concurrent program.
D. (Conc)²olic testing
Existing concolic testing techniques for concurrent programs (in category (iii)) use data races as a
heuristic for exploring the interleaving space; i.e., interleaving exploration is done by switching
the order of events involved in a data race in a previous execution. However, these techniques
are able to provide coverage guarantees only when the testing algorithm is terminated after
considering all possible orderings of events involved in a data race. Note that this exploration
space is often very large for real world programs such that the testing algorithm fails to ter-
minate in a reasonable amount of time. Unfortunately, due to the data race heuristic, these
techniques are unable to quantify the partial work done (e.g., at the occasion of a timeout) as a coverage measurement. We incorporate the bounded-interference heuristic into the sequential concolic testing technique to generate tests for concurrent programs. Using the bounded-interference heuristic,
our concolic testing technique provides coverage guarantees (modulo the interference bound)
for concurrent programs both after the testing process is finished and when a time/computation
limit is reached. We introduce a new component in concolic testing that explores possible
interference scenarios (within the interference bound), and build a general framework which
can employ different exploration strategies for inputs and interference scenarios.
We develop a search strategy that targets branch coverage in concurrent programs; i.e., in-
terference scenario and input spaces are explored based on branches of the concurrent program
that are yet uncovered during the testing process. This test generation technique is sound and
provides branch coverage guarantees (modulo interference bound) for concurrent programs.
1.3 Outline
Chapter 2 describes a sound and scalable technique for predicting null-pointer derefer-
ences in concurrent programs. This chapter is based on two of our publications on this
technique (i.e., [65] and [16]). We discuss the null reads pattern and present our static
pruning technique, which increases scalability. Then, we present the logical constraints
and planning encodings of the prediction problem based on the maximal causal model.
We discuss the relaxation technique and experimentally show the effectiveness and effi-
ciency of the prediction technique. Finally, we describe the related work and compare it to our prediction technique.

Chapter 3 describes our test generation technique based on multi-trace analysis. This chapter is organized according to our publication on test generation based on multi-trace analysis (i.e., [66]). We present our multi-trace analysis in detail and experimentally show the effectiveness of the test generation technique in increasing branch coverage and finding concurrency bugs. At the
end of the chapter, we present the related work and compare it to our multi-trace analysis
technique.
Chapter 5 describes our bounded-interference sequentialization technique, which enables sequential testing techniques to generate tests with coverage guarantees for concurrent programs. This chapter is based on our publication on bounded-interference sequentialization (i.e., [64]). We present the transformation algorithm, prove it to be sound, and describe our implementation (tool). We evaluate the effectiveness and efficiency of testing based on this sequentialization technique.
Chapter 6 describes our concolic testing technique for concurrent programs. This chapter is based on our publication on concolic testing for concurrent programs (i.e., [14]). We show how the general testing framework is built based on the bounded-interference heuristic, and we instantiate it for a search strategy that targets branch coverage in concurrent programs. We prove the soundness of the technique and experimentally show the effectiveness of the testing technique in code coverage and finding concurrency bugs. Finally, we present the related work.
Chapter 7 summarizes the research in this thesis and identifies possible directions for future
work.
Chapter 2
Predicting Null-Pointer Dereferences in Concurrent Programs
Prediction-based testing is a promising approach for testing concurrent programs [82, 59, 89,
90, 74, 68, 7, 77, 78]. It involves observing one execution of the program under test with some
input values, and from that predicting alternate interleavings of thread executions with the same input values. Prediction explores only interleavings of thread executions that are close to the observed execution, while at the same time covering interesting interleavings that are likely to expose bugs.
Prediction techniques, so far, have focused on predicting bugs that correspond to atomicity
violations [82, 59, 90], data races [68], and assertion violations [89, 7]; i.e., these violation
patterns are used as heuristics to reduce the exploration space to interleavings that are more
probable to realize any of these patterns. Although these heuristics have been very successful
in finding concurrency bugs, recent research [96] shows that memory bugs (e.g., null-pointer
dereferences) are often more harmful than many other types of bugs since they normally cause
program crashes. Therefore, memory bugs could be good candidates to be targeted by predic-
tion techniques.
Here, we propose a new violation pattern for prediction that is different from data races or
atomicity violations; we propose null reads, which target interleavings that lead to null-pointer dereferences. We develop the following fundamental techniques to soundly and scalably predict executions that are likely to realize null reads patterns:
1. Prediction at the shared communication level: We suppress local computation entirely to achieve scalability; we use the maximal causal model [74], which works at the shared communication level (i.e., accesses to shared variables and synchronization events). Our approximation of the prediction problem asks for runs that force
threads to read null values where possible. Predicted runs in this model will be feasi-
ble but may not actually cause a null-pointer dereference (e.g., the thread reading a null
value might dereference a pointer only if it is not null), though they are likely to do so.
2. Static Pruning: We use a static analysis that aggressively prunes the executions by identi-
fying a small segment of the observed run on which the prediction effort can be focused.
Pruning of executions does not affect feasibility of the runs, but increases the scalability
of our technique.
3. Relaxed prediction: We develop a relaxation technique at the shared communication level that allows some leeway, so that the prediction algorithm can predict runs with mild deviations from the maximal causal model; this makes the class of predicted runs larger at the expense of possibly making them infeasible, though in practice, we found the predicted runs to be feasible in most cases.
4. SMT and AI planning encodings: We encode the prediction problem both as a constraint
satisfaction problem and as an AI planning problem. The former enables the applicabil-
ity of state-of-the-art SMT solvers, while the latter enables us to benefit from the compact encodings offered by AI planning for the prediction problem.
5. Re-execution: The runs predicted using the above techniques might be infeasible (in the case of
relaxed prediction), or might be feasible and yet not cause any null-pointer dereference.
We mitigate this by re-executing the program according to the predicted runs to check
if a null-pointer dereference actually occurs. Errors reported are always real (i.e., they
cause an uncaught exception or result in failing the test harness), and hence we incur no
false positives.
This chapter is based on our publications on these techniques (i.e., [65] and [16]). We begin with a motivating example.
Consider the code snippet extracted from the Pool 1.2¹ library in the Apache Commons collection, presented in Figure 2.1. In the returnObject method, first, the state of the shared object pool is tested outside the synchronized block by checking the value of the flag variable isClosed; the method then enters a synchronized block that dereferences the shared object pool. Method close, on the other hand, closes the pool by writing null to pool and setting isClosed to true, signaling that the pool has been closed.
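The pattern described above can be reconstructed as the following minimal Java sketch. It is not the exact code of Figure 2.1 (the class layout and bodies are simplified), and the `betweenCheckAndSync` hook is ours, added only to deterministically replay the buggy interleaving within a single thread.

```java
public class ObjectPool {
    private Object pool = new Object();
    private boolean isClosed = false;

    // Test hook: lets us inject a concurrent action between the unsynchronized
    // check and the synchronized block. Purely illustrative, not in the library.
    Runnable betweenCheckAndSync = () -> {};

    String returnObject() {
        if (isClosed) return "pool already closed";   // check is OUTSIDE the lock
        betweenCheckAndSync.run();                    // a concurrent close() can land here
        synchronized (this) {
            return pool.toString();                   // NPE if pool was set to null
        }
    }

    synchronized void close() {
        pool = null;                                  // release the pooled objects
        isClosed = true;                              // signal that the pool is closed
    }

    public static void main(String[] args) {
        ObjectPool p = new ObjectPool();
        p.betweenCheckAndSync = p::close;             // emulate the racy schedule
        try {
            p.returnObject();
            System.out.println("no error");
        } catch (NullPointerException e) {
            System.out.println("null-pointer dereference!");
        }
    }
}
```

With the hook left empty, returnObject completes normally; with close() injected between the check and the synchronized block, the dereference of pool inside the block fails, which is exactly the interleaving the prediction technique must discover.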
The error in this code (and such errors are very typical) stems from the fact that the check of
isClosed in method returnObject is not within the synchronized block; hence, if a thread
executing the returnObject method performs the check at line 3, and then a concurrent
thread executes the method close before the synchronized block begins, then the access to pool inside the synchronized block dereferences a null pointer.
In a dynamic testing setting, consider the scenario where we observe an execution σ with two threads T and T′, where T executes the method returnObject first, and then T′ executes the method close after T finishes executing returnObject. There is no null-pointer
¹ http://commons.apache.org
dereference in the execution σ. Our goal is to predict a permutation of the events of σ (called σ′) that exhibits a null-pointer dereference.

Our prediction for null-pointer dereferences works as follows. In the run σ, thread T reads a non-null value from the shared object pool when pool is accessed at line 10. Also, T′ writes a null value to the same shared object pool at line 18. Our prediction approach identifies that read-write pair and searches for alternative schedules σ′ in which the read at line 10 (in T) reads the value null written by the write at line 18 (in T′). Therefore, it predicts a run σ′ in which T is executed first until it gets to the synchronized block at line 5, followed by the execution of T′ and then the execution of the synchronized block in T, which dereferences the null value.
Our prediction algorithm observes accesses to shared variables and synchronization events
but suppresses the semantics of the local computation entirely and does not even observe them.
Then, it identifies null-WR pairs (e, f), where e is a write of null to a variable and f is a non-null read of the same variable in the observed run. Next, it encodes the problem of finding
a sound permutation of events of the observed run in which f is reading the null value written
by e as a logical constraint system or an AI planning problem and uses the state-of-the-art SMT
solvers and planners to search for an answer (if there exists any solution).
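The identification of null-WR pairs can be sketched as a simple scan over the recorded events. The event record below is a hypothetical encoding of ours (not the thesis's trace format); it only finds candidate pairs, leaving the question of whether a feasible reordering couples each pair to the constraint solver or planner, as described above.

```java
import java.util.*;

public class NullWrPairs {
    // A trace event: thread, variable, whether it is a write, and the value
    // involved (a Java null models the null value being written or read).
    record Event(String thread, String var, boolean isWrite, Object value) {}

    // A null-WR pair (e, f): e writes null to a variable and f is a non-null
    // read of the same variable by a different thread. Each pair is a
    // candidate: is there a feasible reordering in which f reads e's null?
    public static List<int[]> nullWrPairs(List<Event> trace) {
        List<int[]> pairs = new ArrayList<>();
        for (int i = 0; i < trace.size(); i++) {
            Event e = trace.get(i);
            if (!e.isWrite() || e.value() != null) continue;   // e: write of null
            for (int j = 0; j < trace.size(); j++) {
                Event f = trace.get(j);
                if (!f.isWrite() && f.value() != null           // f: non-null read
                        && f.var().equals(e.var())
                        && !f.thread().equals(e.thread()))
                    pairs.add(new int[]{i, j});
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<Event> trace = List.of(
            new Event("T",  "pool", false, "obj"),  // T reads non-null pool
            new Event("T'", "pool", true,  null));  // T' writes null to pool
        System.out.println(nullWrPairs(trace).size()); // one candidate pair
    }
}
```

On the pool example, the single candidate pair corresponds to the read at line 10 and the null write at line 18, which the encoding then tries to couple in a sound reordering.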
2.2 Preliminaries
Our prediction technique, similar to other prediction techniques, is based on program runs,
i.e., a run of the program is observed and then the information available in the run is used for
predicting other runs. Here, we first discuss what kind of information is available in program runs, and then provide background on the maximal causal model [74], which is the basis for our prediction technique.
We define a global trace to be a sequence of global computation (i.e., accesses to shared vari-
ables) and synchronization events. Note that a global trace does not contain any information
about local computation (i.e., reads and writes to local variables) in the execution.
We assume a set of thread identifiers T = {T1, T2, ...} and define a set of shared variables SV = {sv1, sv2, ...} that the threads can access. Let Init(x) and Val(x) represent the initial value and the set of possible values that the shared variable x ∈ SV can take, respectively.
The set of actions that a thread can perform on the set of shared variables SV and global locks includes rd(x, val), wt(x, val), ac(l), rel(l), and tf(Tj). Actions rd(x, val) and wt(x, val) correspond to reading value val from and writing value val to shared variable x, respectively. Actions ac(l) and rel(l) represent acquiring and releasing lock l, respectively, and tf(Tj) represents creating (forking) thread Tj. An event is a pair (Ti, a), where a is an action performed by thread Ti ∈ T. Let EV denote the set of all possible events. The sequence of events observed during an execution of the program forms a global trace.
Definition 2.2.1 (Global Trace). A global trace is a finite string σ ∈ EV∗. By σ[n], we denote the nth event of σ. Given a global trace σ, σ|Ti is the projection of σ to events involving Ti.
In this chapter, whenever we refer to traces we mean global traces. A global trace σ is lock-valid iff it respects the semantics of locking, i.e., two threads cannot hold the same lock simultaneously.
Definition 2.2.2 (Lock-Valid Traces). Let σ be a global trace and σ|Ti,l be the projection of σ to the acquire and release events of lock l performed by thread Ti. Then, σ is lock-valid iff:
(i) For each lock l, σ|Ti,l (if it is not empty) starts with an acquire event (Ti, ac(l)), and acquire events (Ti, ac(l)) alternate with corresponding lock release events (Ti, rel(l)) in σ|Ti,l, and
(ii) For each acquire event σ[m] = (Ti, ac(l)), either (1) there exists a corresponding release event σ[n] = (Ti, rel(l)) such that m < n, and there are no acquire or release events of lock l by other threads between σ[m] and σ[n], or (2) the lock is not released by Ti in σ (i.e., there is no event (Ti, rel(l)) after σ[m]) and there are no acquire or release events of lock l by other threads after σ[m] in σ.
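As a concrete illustration, lock-validity can be checked directly on an event list. The following is a minimal sketch; the tuple encoding of events and all names are ours, not the thesis's:

```python
def is_lock_valid(trace):
    """Check Definition 2.2.2 on a concrete event list: acquires and releases
    of each lock must alternate within a thread, and no two threads may hold
    the same lock at the same time."""
    holder = {}  # lock -> thread currently holding it
    for thread, action, lock in (e for e in trace if e[1] in ("ac", "rel")):
        if action == "ac":
            if lock in holder:           # lock already held (non-reentrant)
                return False
            holder[lock] = thread
        else:                            # action == "rel"
            if holder.get(lock) != thread:
                return False             # releasing a lock it does not hold
            del holder[lock]
    return True

# Two threads taking the same lock in sequence: lock-valid.
ok = [("T1", "ac", "l"), ("T1", "rel", "l"), ("T2", "ac", "l"), ("T2", "rel", "l")]
# T2 acquires l while T1 still holds it: not lock-valid.
bad = [("T1", "ac", "l"), ("T2", "ac", "l")]
```

A single left-to-right pass suffices because a trace is a total order of events.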
Let σ be a lock-valid trace. Lock-sets and lock acquisition histories for σ are defined as follows:
Definition 2.2.3 (Lock-Sets and Acquisition Histories (from [39])). Lock-Set(Ti, σ[j]) is defined to be the set of locks acquired but not released by Ti before σ[j] in σ. Then, for thread Ti and lock l such that l ∈ Lock-Set(Ti, σ[n]), where n is the length of σ, we define AH(Ti, l, σ) to be the set of locks that were acquired (and possibly released) by Ti after the last (Ti, ac(l)) event in σ.
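Both notions can be computed by a single pass over the trace. A small sketch under the same hypothetical event encoding (lock events only):

```python
def lock_set(trace, thread, j):
    """Lock-Set(Ti, sigma[j]): locks acquired but not released by `thread`
    before position j in the trace."""
    held = set()
    for t, action, lock in trace[:j]:
        if t == thread:
            if action == "ac":
                held.add(lock)
            elif action == "rel":
                held.discard(lock)
    return held

def acquisition_history(trace, thread, lock):
    """AH(Ti, l, sigma): locks acquired by `thread` after its last ac(l)
    event in the trace (assumes such an event exists)."""
    last = max(i for i, e in enumerate(trace) if e == (thread, "ac", lock))
    return {l for t, a, l in trace[last + 1:] if t == thread and a == "ac"}

# T1 acquires l1, then acquires and releases l2 while still holding l1.
sigma = [("T1", "ac", "l1"), ("T1", "ac", "l2"), ("T1", "rel", "l2")]
```

Here Lock-Set(T1, σ[3]) = {l1}, and AH(T1, l1, σ) = {l2} because l2 was acquired after the last acquisition of l1.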
A global trace σ is data-valid iff it respects the read-write constraints, i.e., each read from a shared variable should read the value written by the most recent write event to that shared variable.
Definition 2.2.4 (Data-Valid Traces). Let σ be a global trace. Then, σ is data-valid iff for each read event σ[n] = (Tp, rd(x, val)), either:
(i) The last write event to x writes value val; i.e., there is m such that m < n and σ[m] = (Ti, wt(x, val)) and there is no k such that m < k < n and σ[k] = (Tq, wt(x, val′)) for any val′ and any thread Tq, or
(ii) There is no write event to variable x before the read and val is the initial value of x; i.e., there is no m such that m < n and σ[m] = (Tj, wt(x, val′)) (for any val′ and any thread Tj), and val = Init(x).
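Data-validity is likewise checkable in one pass. A sketch in the same hypothetical encoding (only read/write events are modeled; `init` plays the role of Init):

```python
def is_data_valid(trace, init):
    """Check Definition 2.2.4: each read must return the most recent write to
    its variable, or the initial value if no write precedes it."""
    mem = dict(init)                     # variable -> last written value
    for thread, action, var, val in trace:
        if action == "wt":
            mem[var] = val
        elif action == "rd" and mem.get(var) != val:
            return False                 # read does not match the last write
    return True

# T2 reads the value T1 just wrote: data-valid.
sigma = [("T1", "wt", "x", 1), ("T2", "rd", "x", 1), ("T2", "wt", "x", 2)]
```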
A global trace σ is creation-valid iff every thread is created at most once and the events of each thread occur only after its creation.
Definition 2.2.5 (Creation-Valid Traces). A global trace σ is creation-valid iff for every Ti ∈ T, there is at most one event of the form (Tj, tf(Ti)) in σ, and, if such an event exists, then all events of Ti occur after this event in σ.
Each global trace obtained from an execution of a program defines a total order on the set of events in it. Furthermore, there is an induced partial order between the events of each thread:
Definition 2.2.6 (Program Order). Let σ be a global trace obtained from an execution of a program. We define a partial order ⊑k such that σ[i] ⊑k σ[j] iff σ[i], σ[j] ∈ σ|Tk and i ≤ j.
The program order arranges the events in each thread according to their order in σ.
Definition 2.2.7 (Causal Relation). Let σ be a global trace. We define a partial order ≼ on the events of σ as the smallest transitive relation such that σ[i] ≼ σ[j] whenever i < j and one of the following conditions holds (among them):
(iv) σ[i] = (Tp, wt(x, val)) and σ[j] = (Tq, rd(x, val′)) or σ[j] = (Tq, wt(x, val′)) for some x, val, val′, and Tp ≠ Tq, or
(v) σ[i] = (Tp, rd(x, val)), σ[j] = (Tq, wt(x, val′)) for some x, val, val′, and Tp ≠ Tq, or
(vi) σ[i] = (Tp, tf(Tq)) and σ[j] is performed by Tq (i.e., σ[j] = (Tq, ∗)), and Tp ≠ Tq.
The causal relation, for each event in the trace, defines a set of events on which the event
depends.
Our prediction technique is based on the maximal causal model [74]. The maximal causal
model works at the shared communication level, i.e., it only considers accesses to shared vari-
ables and synchronization events. Given a global trace of a concurrent program, a causal model
is obtained which is both sound and maximal; i.e., all traces consistent with the causal model
correspond to feasible executions of the concurrent program under analysis, and assuming only
the global trace and no knowledge about the source code of the program, the model captures
more feasible executions than any other sound causal model. In the following, we define the set of precisely predictable runs.
Definition 2.2.8 (Precisely Predictable Runs (adapted from [74])). Let σ be a global trace over a set of threads T, shared variables SV, and locks L, obtained from a program execution. A run with global trace σ′ is precisely predictable from σ iff:
(i) for every thread Ti, σ′|Ti is a prefix of σ|Ti,
(ii) σ′ is lock-valid,
(iii) σ′ is data-valid, and
(iv) σ′ is creation-valid.
Let PrPred(σ) denote the set of all runs with global traces that are precisely predictable from σ.
The first condition above ensures that the sequence of events of Ti occurring in σ′ is a prefix of the sequence of events of Ti occurring in σ. Note that we are forcing the thread Ti to read the same values of shared variables as it did in the original run. Along with data-validity, this ensures that the thread Ti reads precisely the same values and updates the local state in the same way as in the observed run. Lock-validity and creation-validity are, of course, required for feasibility. The following theorem states the soundness of the prediction, which guarantees that all predicted runs are feasible.
Theorem 2.2.9 (from [74]). Let P be a program and σ be a global trace corresponding to an execution of P. Then, every run in PrPred(σ) is a feasible execution of P.
The complete proof of this theorem can be found in [74]. The intuition behind why the theorem holds is that, as long as each read event reads the same value as it did in the observed run, each thread is forced to take the same path as it took in the observed run. Since the observed run is a feasible run, all predicted runs are guaranteed to be feasible.
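Definition 2.2.8 can be checked directly on small traces. The sketch below (hypothetical names and encoding; creation-validity omitted for brevity) combines the per-thread prefix condition with lock- and data-validity:

```python
def proj(trace, thread):
    """Projection sigma|Ti of a trace onto one thread's events."""
    return [e for e in trace if e[0] == thread]

def precisely_predictable(candidate, observed, init):
    """Check Definition 2.2.8 (creation-validity omitted): per thread,
    `candidate` must be a prefix of `observed`, and `candidate` must be
    lock-valid and data-valid."""
    for t in {e[0] for e in observed}:
        p, s = proj(candidate, t), proj(observed, t)
        if s[:len(p)] != p:                  # condition (i): per-thread prefix
            return False
    holder, mem = {}, dict(init)
    for e in candidate:
        t, action = e[0], e[1]
        if action == "ac":
            if e[2] in holder:
                return False                 # lock-validity violated
            holder[e[2]] = t
        elif action == "rel":
            if holder.get(e[2]) != t:
                return False
            del holder[e[2]]
        elif action == "wt":
            mem[e[2]] = e[3]
        elif action == "rd" and mem.get(e[2]) != e[3]:
            return False                     # data-validity violated
    return True

observed = [("T1", "wt", "x", 1), ("T1", "rd", "x", 1), ("T2", "wt", "x", 2)]
# Scheduling T2's write alone is predictable; starting T1 with its read is not,
# because T1's first observed event is its write.
```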
Although null-pointer dereferences can occur on both local and shared variables, a prediction that takes into account local variables and local computation encounters scalability issues. We therefore approximate the prediction problem at the shared communication level, i.e., we consider only accesses to shared variables and synchronization events. Consider a thread T that in some interleaving reads a non-null value from shared variable x and subsequently does some computation locally using the non-null value, and consider the task of predicting whether this could result in a null-pointer dereference. Our approximation of the prediction problem at the shared communication level asks for a run that forces the thread T to read a null value from x. Note that this approximation is neither sound nor complete: thread T may read null for x but may not dereference the pointer (e.g., it could check if x is null), and there may be runs where the value read is not null and yet the local computation results in a null-pointer dereference. This approximation is, however, necessary to scale to large runs, as it is imperative that local computation is not modeled. To guarantee the feasibility of the predicted runs, our prediction approach is based on the maximal causality model (discussed in Section 2.2.2). Now, we formally define the precise prediction problem for null-reads.
Definition 2.3.1 (Precisely Predictable Null-Reads). Let σ be a global trace obtained from an execution of a program P. We say that a run with global trace σ′ is a precisely predictable run with a null-read iff:
(i) σ′ = σ′′.f, where f is of the form (Ti, rd(x, null)) and σ′′ is precisely predictable from σ (see Definition 2.2.8), and
(ii) there is some val ≠ null such that (σ′′|Ti).rd(x, val) is a prefix of σ|Ti.
Intuitively, the above conditions require that σ′ be a precisely predictable run from σ followed by a read of null by a thread Ti on variable x, and further, in the observed trace σ, thread Ti must be performing a non-null read of variable x after performing its events in σ′′. The above captures the fact that we want a precisely predictable run followed by a single null-read that corresponds to a non-null read in the original observed run. Note that σ′ itself is not in PrPred(σ).
The first step of our prediction is to identify a set of null-WR pairs α = (e, f), where e is a write of null to a variable and f is a non-null read of the same variable, in the observed trace. Then, we perform a static lock-based analysis on the null-WR pairs to identify a small segment of the observed run on which the prediction can focus. Finally, we encode the prediction problem over that segment and solve it with a constraint solver or a planner. In the following, we discuss how we identify the null-WR pairs and then present the static pruning technique.
Each null-WR pair α = (e, f) is a tuple where e is a write of null to a shared variable x and f is a non-null read of the same shared variable. We would like to identify pairs that are feasible at least according to the hard constraints of thread creation and locking in the program. For instance, if a thread writes to a shared variable x and reads from it in the same lock-protected region of code, then clearly the read cannot match a write protected by the same lock in another thread. Similarly, if a thread initializes a variable x to a non-null value and then creates another thread that reads x, clearly the read cannot see the uninitialized x. We use a lock-based static analysis of the run (without using a constraint solver) to filter out such infeasible null-WR pairs α.
The idea is to check whether, for a null-WR pair α = (e, f), f can read from e in a (not necessarily feasible) run that only respects lock-validity and creation-validity constraints (and not data-validity). Creation-validity is captured by computing a causal relation among the threads (Definition 2.2.7) by considering only the program order and thread creation constraints (i.e., items (i) and (vi) in Definition 2.2.7). If f ≼ e according to this relation, then clearly f cannot occur after e, and the null-WR pair should be discarded. Lock-validity is captured by reducing the problem of realizing the pair (e, f) to pairwise reachability under nested locking [39],
Figure 2.2: Static lock-based analysis for feasibility of a null-WR pair α = (e, f).
which is then solved by computing lock-sets and acquisition histories for each event. Similar techniques have been exploited for finding atomicity violations in the tool PENELOPE [82].
Consider an observed trace σ and a null-WR pair α = (e, f), where f (a read in thread Tj) occurs before e (a write in thread Ti) in σ. Let us assume that e′′ is the next write event (to the same shared variable) in thread Ti after e.
We claim that if there exists a lock-valid run with global trace σ′ (obtained by permuting the events in σ) in which f reads the null value provided by e, then in σ′, f should be scheduled after e, but before e′′; if f is scheduled before e, then it would not read from e. If f is scheduled after e′′, then the write in e′′ overwrites the null value written by e before it reaches f. This means that there should exist an event e′ of thread Ti, occurring between events e and e′′, that is right before (or after) f in σ′; in other words, e′ and f are co-reachable. Note that in cases where there is no write event (to the same variable accessed in e and f) in Ti after e, e′ can be any event of Ti after e.
As shown in Figure 2.2, we iterate over all possible events e′ of Ti between e and e′′ in σ and use a simple technique [39] to check the co-reachability of e′ and f. As proposed in [39], the co-reachability check is done by examining the lock-sets and acquisition histories (see Definition 2.2.3) at e′ and f: the lock-sets at e′ and f must be disjoint, and the acquisition histories at e′ and f must be compatible, i.e., there are no locks l ∈ Lock-Set(Ti, e′) and l′ ∈ Lock-Set(Tj, f) such that l′ ∈ AH(Ti, l, σ) and l ∈ AH(Tj, l′, σ).
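The lock-set/acquisition-history test can be illustrated concretely. This sketch (our own names, not the implementation from [39] or the thesis) computes both at a given trace position and applies the disjointness and compatibility conditions:

```python
def locks_info(trace, thread, pos):
    """Lock-set and acquisition histories of `thread` just before trace[pos]."""
    held, ah = set(), {}
    for t, action, lock in (e for e in trace[:pos] if e[1] in ("ac", "rel")):
        if t != thread:
            continue
        if action == "ac":
            for l in held:
                ah[l].add(lock)      # `lock` acquired while l is still held
            held.add(lock)
            ah[lock] = set()
        else:
            held.discard(lock)
            ah.pop(lock, None)
    return held, ah

def co_reachable(trace, ti, pi, tj, pj):
    """Kahlon-style check: two events can be adjacent in some lock-valid
    schedule only if their lock-sets are disjoint and their acquisition
    histories are compatible."""
    ls1, ah1 = locks_info(trace, ti, pi)
    ls2, ah2 = locks_info(trace, tj, pj)
    if ls1 & ls2:
        return False                 # both threads would hold the same lock
    return not any(l2 in ah1[l1] and l1 in ah2[l2]
                   for l1 in ls1 for l2 in ls2)

# T1 holds a and acquired b inside it; T2 holds b and acquired a inside it:
# the acquisition histories are incompatible.
sigma = [("T1", "ac", "a"), ("T1", "ac", "b"), ("T1", "rel", "b"),
         ("T1", "rel", "a"), ("T2", "ac", "b"), ("T2", "ac", "a"),
         ("T2", "rel", "a")]
```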
Given a global trace σ, we collect all null-WR pairs α = (e, f) as discussed in the previous section. Then, according to the prediction problem, for each null-WR pair α = (e, f) we have to search for a lock-valid, data-valid, and creation-valid trace, consisting of the events in σ, in which f reads the null value written by e. However, instead of using the whole of σ (which can be very large) for the purpose of prediction, we slice a relevant segment of it, and use the segment instead. This segment is often orders of magnitude smaller than σ itself, and hence the scalability of prediction is increased. Moreover, any run predicted from the segment will still be feasible. While this limits the number of predictable runs in theory, it does not prevent us from finding errors in practice (in particular, no error was missed due to pruning in our experiments).
Consider a global trace σ and a null-WR pair α = (e, f). The idea behind pruning is to first prune away the events in σ which are not causally before e or f, as they play no role in the occurrence of e or f. Assume that σ′ is the new trace obtained after pruning. Then, in the next step, we find the largest prefix of σ′ before reaching e and f such that all of the locks are free at the end of this prefix. The intuition behind this is that such a prefix can be replayed in the predicted run precisely in the same way as it occurred in the observed run. The prediction problem is then restricted to the remaining suffix, containing e and f, while the initial values of shared variables for prediction are obtained from the last write to each shared variable in the prefix segment. The prefix can then be stitched to a run predicted from the suffix, since the suffix starts from a state in which all locks are free.
For the first step, let U be the smallest subset of the events of σ that satisfies the following properties: (1) U contains events e and f, (2) for any event e′ in U, all events e′′ that are causally before it (i.e., e′′ ≼ e′ according to Definition 2.2.7) are in U, and (3) for every event …
Figure 2.3: A run of a program with four threads, projected onto individual threads; the cuts mark the causally relevant events and the removable prefix for the null-WR pair α = (e, f).
The intuition is that events outside this causally-closed set are not relevant for the scheduling of the null-WR pair. Figure 2.3 presents a run of a program with 4 threads that is projected onto individual threads. Here, e belongs to thread T1 and f belongs to thread T2. The cut in the figure marks the boundary after which events are not causally before e or f, and hence need not be considered.
For the second step, we identify a causally prefix-closed set of events before e and f to remove. For the null-WR pair α, let D be the largest subset of the remaining events that satisfies the following properties: (1) it does not contain e or f, (2) for any event e′ in D, all events e′′ that are causally before it (i.e., e′′ ≼ e′) are in D, and (3) for any event e′ in Ti such that e′ is the last event of Ti in D, Lock-Set(Ti, e′) is empty. In the figure, the second curve marks this prefix.
The run segment relevant to a null-WR pair α is then defined as the set of events retained by the first step minus those removed by the second, scheduled according to the total order in σ. This run segment is passed to the run prediction phase.
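The first pruning step, computing the causally-closed set of events relevant to e and f, can be sketched as a fixpoint over direct causal edges. This is a simplification (reduced causal relation, hypothetical names), not the thesis implementation:

```python
def causally_before(trace, i, j):
    """Direct causal edges (simplified from Definition 2.2.7): program order,
    or conflicting accesses to the same shared variable."""
    if i >= j:
        return False
    a, b = trace[i], trace[j]
    if a[0] == b[0]:                                  # same thread
        return True
    if a[1] in ("rd", "wt") and b[1] in ("rd", "wt") and a[2] == b[2]:
        return a[1] == "wt" or b[1] == "wt"           # at least one write
    return False

def relevant_events(trace, e_pos, f_pos):
    """Smallest causally-closed set of positions containing e and f,
    computed as a fixpoint."""
    keep = {e_pos, f_pos}
    changed = True
    while changed:
        changed = False
        for j in list(keep):
            for i in range(j):
                if i not in keep and causally_before(trace, i, j):
                    keep.add(i)
                    changed = True
    return sorted(keep)

# T3's write to an unrelated variable y is sliced away.
sigma = [("T1", "wt", "x", 1), ("T3", "wt", "y", 5), ("T2", "rd", "x", 1)]
```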
In this section, we encode the problem of precisely predicting a run that realizes a null-WR pair α as a set of logical constraints. More specifically, given a trace σ and a null-WR pair α = (e, f), we encode the set of all possible runs in the maximal causal model that force the read f to read the null value written by e as a set of constraints, and use a constraint solver to search for a solution. The constraints fall into Difference Logic [52], which is efficiently decidable [41].
Prediction according to the maximal causal model is basically an encoding of the creation-
validity, data-validity, and lock-validity constraints using logic, where quantification is re-
moved by expanding over the finite set of events under consideration. Modeling this using
constraint solvers has been done before ([68]) in the context of finding data races. We reformulate this encoding briefly here (in Section 2.4.1) and adapt it to predict null-pointer dereferences. We also propose a wide set of carefully chosen optimizations on this encoding.
Prediction based on the maximal causal model is sound, in the sense that it guarantees fea-
sibility of the predicted runs. However, sound prediction under the maximal causal model can
be too restrictive in some cases, i.e., the constraint system is unsatisfiable. Slightly diverging
from the maximal causal model can lead to prediction of runs that are also feasible in the origi-
nal program in many cases. In Section 2.4.2, we present a relaxation technique on the maximal
causal model and show how the precise encoding is changed to reflect the relaxation technique.
In this section, first we present how the maximal causal model can be captured by logical
constraints. Then, we show how these constraints are adapted for predicting runs realizing a
specific null-WR pair. Finally, we provide a set of optimizations that reduce the size of the constraint system.

Ψ: PO ∧ FC ∧ LC1 ∧ LC2 ∧ DC
PO: ⋀_{Ti∈T} PO_{Ti} ∧ C_init
C_init: ⋀_{1≤i≤n} (t_init < t_{e_{i,1}})
PO_{Ti}: ⋀_{1≤j≤m_i−1} (t_{e_{i,j}} < t_{e_{i,j+1}})
FC: ⋀_{Ti∈T} (t_{e_{tf(Ti)}} < t_{e_{i,1}})
LC1: ⋀_{l∈L} ⋀_{Ti≠Tj} ⋀_{[aq,rl]∈L_{Ti,l}} ⋀_{[aq′,rl′]∈L_{Tj,l}} ((t_{rl} < t_{aq′}) ∨ (t_{rl′} < t_{aq}))
LC2: ⋀_{l∈L} ⋀_{Ti≠Tj} ⋀_{aq∈NoRel_{Ti,l}} ⋀_{[aq′,rl′]∈L_{Tj,l}} (t_{rl′} < t_{aq})
DC: ⋀_{x∈SV} ⋀_{val∈Val(x)} ⋀_{r∈R_{x,val}} (⋁_{w′∈W_{x,val}} Coupled(r, w′))
Coupled(r, w): (t_w < t_r) ∧ ⋀_{w′∈W_x∖{w}} ((t_r < t_{w′}) ∨ (t_{w′} < t_w))

Figure 2.4: The constraint system encoding precisely predictable runs.
Given a trace σ, we first encode the constraints on all runs precisely predictable from it, using the maximal causal model, independent of the specification that we want runs that realize a given null-WR pair. A predicted run can be seen as a total ordering of the events of σ. We associate with each event e an integer timestamp variable t_e that represents the position of e in the predicted run. Using these timestamps, we logically model the constraints required for precisely predictable runs (see Definition 2.2.8); i.e., the runs should respect the program order, thread creation, lock-validity, and data-validity constraints.
Figure 2.4 illustrates the various constraints. The constraint system is a conjunction of the program order (PO), thread creation (FC), lock-validity (LC1, LC2), and data-validity (DC) constraints.
Suppose that the given trace σ consists of the events of n different threads, and let σ|Ti = e_{i,1}, e_{i,2}, ..., e_{i,m_i} be the sequence of events in σ that relate to thread Ti.
PO: The program order constraint (PO) captures the condition that the predicted run respects the program order of the observed run. We consider an initial event e_init which corresponds to the initialization of shared variables. This event should happen before any thread starts execution in any predicted run; constraint C_init encodes this fact. The constraint PO_{Ti} requires that the predicted run obey the order of events in thread Ti, and PO requires that all threads respect their program orders.
FC: Turning to creation-validity, suppose that e_tf(Ti) is the event that creates thread Ti. Then, the constraint FC requires that the first event of Ti can only happen after e_tf(Ti). Combined with the program order constraint, this means that all events before the creation of Ti in the thread that created Ti must also occur before the first event of Ti in the predicted run.
LC1 ∧ LC2: Lock-validity (see Definition 2.2.2) is captured by the formula LC = LC1 ∧ LC2. We assume that each lock acquire event aq of lock l in the observed run is matched by precisely one lock release event rl of lock l in the same thread, unless the lock is not released by the thread in the run. Each lock acquire event aq and its corresponding lock release event rl define a lock block, represented by [aq, rl]. Let L_{Ti,l} be the set of lock blocks in thread Ti regarding lock l. Then, LC1 ensures that no two threads can be inside lock blocks of the same lock l simultaneously. Turning to locks that never get released, the constraint LC2 ensures that the acquire of lock l by a thread that never releases it must always occur after all releases of lock l in other threads. In this formula, NoRel_{Ti,l} stands for the lock acquire events in Ti with no corresponding lock release event.
DC: The data-validity constraints DC (see Definition 2.2.4) capture the fact that reads must
be coupled with appropriate writes; more precisely, that every read of a value from a variable
must have a write before it writing that value to that variable, and moreover, there is no other
intermediate write to that variable. Let Rx,val represent the set of all read events that read value
val from variable x in , Wx represent the set of all write events to variable x, and Wx,val
represent the set of all write events that specifically write value val to variable x. For each
read event r = rd(x, val) and write event w ∈ W_{x,val}, the formula Coupled(r, w) requires that
in the predicted run, w should happen before r and all other writes to variable x should either
happen before w or after r; i.e., w should be the most recent write to variable x before r and
hence r is coupled with w. The constraint DC requires that each read be coupled with a write
that writes the same value as the read reads in the observed run.
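To make the shape of the encoding concrete, the following sketch (hypothetical names) emits the PO and DC constraints as strings over timestamp variables; thread creation, the lock constraints, and the initial-value case of DC are omitted for brevity:

```python
def encode(trace):
    """Emit PO and DC constraints of the Figure 2.4 encoding as strings over
    timestamp variables t_i (one per event position)."""
    cons, last = [], {}
    for i, ev in enumerate(trace):
        t = ev[0]
        # PO: t_init before each thread's first event, then per-thread order
        cons.append(f"t_init < t_{i}" if t not in last else f"t_{last[t]} < t_{i}")
        last[t] = i
    for r, ev in enumerate(trace):
        if ev[1] != "rd":
            continue
        var, val = ev[2], ev[3]
        writers = [w for w, e2 in enumerate(trace)
                   if e2[1] == "wt" and e2[2:] == (var, val)]
        all_writes = [w for w, e2 in enumerate(trace)
                      if e2[1] == "wt" and e2[2] == var]
        disj = []
        for w in writers:
            # Coupled(r, w): w before r, every other write outside (w, r)
            parts = [f"t_{w} < t_{r}"] + [f"(t_{r} < t_{w2} | t_{w2} < t_{w})"
                                          for w2 in all_writes if w2 != w]
            disj.append("(" + " & ".join(parts) + ")")
        cons.append(" | ".join(disj) if disj else "false")
    return cons

sigma = [("T1", "wt", "x", 1), ("T2", "rd", "x", 1)]
```

On this two-event trace, the encoding yields the two PO atoms and a single coupling disjunct for the read.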
Now, we present how the constraint system proposed above can be adapted for predicting runs (consistent with the maximal causal model) that realize a null-WR pair. Suppose that α = (e, f) is the null-WR pair and σ is (the segment of) the observed trace containing e and f which is considered for prediction. Notice that in the observed run f reads a non-null value, while we will force it to read null in the predicted run by coupling it with the write event e. Therefore, we drop from the data-validity formula (DC) the requirement that the value read at f be the same as in the observed run. In addition, we add a constraint NC = Coupled(f, e) that forces the read f to be coupled with the write e, i.e., e occurs before f while no other write to the same variable is scheduled between them.
Suppose that f is performed by thread Ti and e is performed by thread Tj . Since both e and
f should occur in the predicted run, according to the program order, all of the events in Ti and
Tj before f and e should appear in the predicted run. Note that once f reads a different value,
we no longer have any predictive power on what the program will do (as we do not examine the
code of the program but only its runs). Consequently, we cannot predict any events causally
later than f , i.e., f should be the last event in the predicted run.
A further complication is how to deal with events that are after e in Tj and events in threads
other than Ti and Tj . Note that some of these events may need to occur in order to satisfy the
requirements of events before f in the predicted run (for instance a read before f may require
a write after e to occur). However, not all of these events are required to happen before f . Our
strategy is to let the solver figure out the precise set of events that are required in the predicted
run. Therefore, the lock-validity and data-validity constraints are enforced only on events that are scheduled before f (i.e., their timestamp is less than the timestamp of f). More precisely, we replace:
(i) (⋁_{w′∈W_{x,val}} Coupled(r, w′)) in the formula DC with ((t_r < t_f) → ⋁_{w′∈W_{x,val}} ((t_{w′} < t_f) ∧ Coupled(r, w′))),
(ii) (t_{rl′} < t_{aq}) in LC2 with (((t_{rl′} < t_f) ∧ (t_{aq} < t_f)) → (t_{rl′} < t_{aq})),
(iii) (t_{rl} < t_{aq′} ∨ t_{rl′} < t_{aq}) in LC1 with (((t_{aq′} < t_f) ∧ (t_{aq} < t_f)) → (t_{rl} < t_{aq′} ∨ t_{rl′} < t_{aq})),
and add ⋀_{Ti∈T} ⋀_{l∈L} ⋀_{[aq,rl]∈L_{Ti,l}} ((t_{aq} < t_f) → (t_{rl} < t_f)) to the constraint system to ensure that each lock block is completely scheduled before f if its lock acquire event is scheduled before f. This is to avoid introducing new non-released locks in the predicted runs that might lead to deadlock.
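At toy scale, the role of the solver can be played by brute-force search over schedules. The sketch below is our own simplification (hypothetical names, no SMT solver): it enumerates per-thread-prefix subsets and orderings, checks lock- and data-validity for the events scheduled before f, and requires f to be coupled with the null write e:

```python
from itertools import permutations, product

def po_ok(trace, order):
    """Per-thread order in the schedule must match the observed trace."""
    last = {}
    for i in order:
        t = trace[i][0]
        if last.get(t, -1) > i:
            return False
        last[t] = i
    return True

def thread_prefixes(trace):
    """All event subsets that are per-thread prefixes of the trace."""
    per_thread = {}
    for i, e in enumerate(trace):
        per_thread.setdefault(e[0], []).append(i)
    for cuts in product(*(range(len(v) + 1) for v in per_thread.values())):
        yield [i for v, c in zip(per_thread.values(), cuts) for i in v[:c]]

def predict_null_read(trace, init, e_pos, f_pos):
    """Search schedules (per-thread prefixes, f last) for one that is lock-
    and data-valid before f and couples f with the null write at e_pos."""
    var = trace[f_pos][2]
    for subset in thread_prefixes(trace):
        if e_pos not in subset or f_pos not in subset:
            continue
        for order in permutations(subset):
            if order[-1] != f_pos or not po_ok(trace, order):
                continue
            holder, mem, writer, ok = {}, dict(init), {}, True
            for i in order[:-1]:
                ev = trace[i]
                if ev[1] == "ac":
                    if ev[2] in holder:
                        ok = False
                        break
                    holder[ev[2]] = ev[0]
                elif ev[1] == "rel":
                    del holder[ev[2]]
                elif ev[1] == "wt":
                    mem[ev[2]], writer[ev[2]] = ev[3], i
                elif ev[1] == "rd" and mem.get(ev[2]) != ev[3]:
                    ok = False
                    break
            if ok and writer.get(var) == e_pos:
                return list(order)
    return None

# Observed run: T1 reads the pool variable p under lock m; T2 later nulls it.
sigma = [("T1", "ac", "m"), ("T1", "rd", "p", "obj"),
         ("T1", "rel", "m"), ("T2", "wt", "p", None)]
```

On this pool-like example the search places T2's null write between T1's lock acquisition and its read, mirroring the motivating scenario.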
Optimizations
The data-validity constraint (DC) is expensive to express as given in Figure 2.4; in the worst case, it is cubic in the maximum number of accesses to any variable. There are several optimizations that reduce the number of constraints in the encoding. Suppose that r = (Ti, rd(x, val)) is a read event. Then:
(i) each write event w′ to x that occurs after r in Ti, i.e., r ⊑i w′, can be excluded from the coupling candidates for r,
(ii) suppose that w is the most recent write to x before r in Ti; then each write event w′ to x before w in Ti, i.e., w′ ⊑i w, can also be excluded from the coupling candidates for r,
(iii) when r is being coupled with w ∈ W_{x,val} in thread Tj, each write event w′ before w in Tj, i.e., w′ ⊑j w, can be excluded in Coupled(r, w), since the program order already places it before w,
(iv) suppose that r is being coupled with w ∈ W_{x,val} in thread Tj and w′ is the next write event to x after w in thread Tj; then each write event w′′ after w′ in Tj, i.e., w′ ⊑j w′′, can be excluded in Coupled(r, w), since requiring w′ to occur after r already places w′′ after r, and
(v) event r can be coupled with e_init only when there is no other write event to x before r in its own thread Ti.
The lock-validity constraint (LC), which is quadratic in the number of lock blocks, is also optimized:
(i) If a read event r in thread Ti can be coupled with only one write event w, which is in thread Tj, then in all precisely predictable runs w happens before r. Therefore, the lock blocks of each lock l that are in Tj before w and the lock blocks of lock l that are in Ti after r are already ordered. Hence, there is no need to consider mutual-exclusion constraints between them.
(ii) When considering lock acquire events with no corresponding release events in LC2, it is sufficient to only consider the last corresponding lock blocks in each thread and exclude the earlier ones.
Full details are given in Appendix A, which provides the logical constraint encoding of the problem based on the encoding presented in this section.
To guarantee feasibility of predicted runs, the encoding based on the maximal causal model restricts all of the reads in the predicted run (except f) to read the same value as they did in the observed trace. However, this can be too restrictive, in the sense that no run can be predicted under this restriction. For instance, suppose that α = (e, f) is a null-WR pair where e is in thread Ti and f is in thread Tj. Furthermore, suppose that in the observed trace σ, all events of Tj occur before the events of Ti. In this case, if there is a read event r in thread Ti before e (r ⊑i e) that can be matched only with a write w in Tj after f (f ⊑j w), then there is no precisely predictable run in which α is realized. However, if the value being read by r does not affect the paths taken by the threads (for example, there is no conditional that checks the value of this variable), ignoring the constraints related to r will help us in finding a feasible run.
We hence have a trade-off between two choices; we would like to maintain the same values
read for as many shared variable reads as possible to increase the probability of getting a
feasible run, but at the same time allow a few reads to read different values to make it possible
to predict some runs. Our proposal is an iterative algorithm for finding the minimum number of reads that must be exempt from data-validity constraints to allow the prediction algorithm to find at least one run. We define a suitable relaxed logical constraint system to predict such a run. Our experiments show that exempting a few reads from data-validity constraints greatly improves the flexibility of the constraints and increases the possibility of predicting a run, while the predicted runs remain feasible in most cases.
Suppose that there are n read events in trace σ. The iterative algorithm works as follows. The data-validity constraints are expressed so that we specifically ask for n reads to be coupled precisely. If we fail to find a solution, then we attempt to find a solution that couples n − 1 reads precisely in the next round. We keep decrementing n and searching for a solution until the constraint system becomes satisfiable and a run (solution) is found, or a threshold for the number of relaxed reads is reached. The changes required in the encoding to make this possible are as follows.
For every read event ri ∈ R, we introduce a new Boolean variable bi that is true iff the data-validity constraint for ri is enforced, together with an integer variable bInti which is 0 if bi is false and 1 if bi is true. This is enforced through a set of constraints, one for each ri ∈ R: [(bi → bInti = 1) ∧ (¬bi → bInti = 0)]. Furthermore, for each ri ∈ R, in the DC constraint we change the sub-term ((t_r < t_f) → ⋁_{w′∈W_{x,val}} ((t_{w′} < t_f) ∧ Coupled(r, w′))) to (bi → ((t_r < t_f) → ⋁_{w′∈W_{x,val}} ((t_{w′} < t_f) ∧ Coupled(r, w′)))), forcing the data-validity constraint for read ri to hold when bi is true. Note that with these changes we require a different theory, i.e., Linear Arithmetic, in the SMT solver to solve the constraints, compared to the Difference Logic which was used for our original set of constraints.
Initially, we set a threshold k to be |R|, i.e., the number of all read events. In each iteration, we assert the constraint Σ_{1≤i≤|R|} bInti = k, which specifies the number (k) of data-validity constraints that should hold in that iteration. If no run can be predicted with the current threshold (i.e., the constraint solver reports unsatisfiability), then k is decremented, and we iterate until the formula is satisfiable. This way, when a satisfying assignment is found, it is guaranteed to have the maximum number of reads that respect data-validity possible for a predictable run. Note that once k < |R|, the predicted run is not theoretically guaranteed to be a feasible run. However, in practice, k is close to |R| and predicted runs are usually feasible in the program.
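The iterative loop can be mimicked at toy scale by counting how many reads remain precisely coupled during a brute-force schedule search. A sketch (our own names; locks and the SMT solver omitted):

```python
from itertools import permutations

def po_ok(trace, order):
    """Per-thread order in the schedule must match the observed trace."""
    last = {}
    for i in order:
        t = trace[i][0]
        if last.get(t, -1) > i:
            return False
        last[t] = i
    return True

def schedule_with_k_valid(trace, init, f_pos, k):
    """Find a schedule ending with f_pos reading null in which at least k of
    the other reads still respect data-validity."""
    f_thread, f_var = trace[f_pos][0], trace[f_pos][2]
    # schedule everything except f's thread's events after f
    pool = [i for i in range(len(trace))
            if i <= f_pos or trace[i][0] != f_thread]
    for order in permutations(pool):
        if order[-1] != f_pos or not po_ok(trace, order):
            continue
        mem, good = dict(init), 0
        for i in order[:-1]:
            ev = trace[i]
            if ev[1] == "wt":
                mem[ev[2]] = ev[3]
            elif ev[1] == "rd":
                good += mem.get(ev[2]) == ev[3]
        if mem.get(f_var) is None and good >= k:
            return list(order)
    return None

def relax_until_sat(trace, init, f_pos):
    """Decrement k, the number of precisely coupled reads, until satisfiable."""
    n_reads = sum(1 for i, ev in enumerate(trace)
                  if ev[1] == "rd" and i != f_pos)
    for k in range(n_reads, -1, -1):
        order = schedule_with_k_valid(trace, init, f_pos, k)
        if order is not None:
            return k, order
    return None

# Ti's read of y can only be matched by Tj's write, which occurs after f in
# Tj, so full data-validity is unsatisfiable and that read must be exempted.
sigma = [("Ti", "rd", "y", 1), ("Ti", "wt", "x", None),
         ("Tj", "rd", "x", "A"), ("Tj", "wt", "y", 1)]
```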
AI automated planning is a rich and rapidly evolving area of research [56]. The last 15 years
have seen tremendous advances in the field with the development of compact encoding tech-
niques for state representation and transition functions, together with highly effective search
techniques based on both satisfiability (SAT) and heuristic search. These advances have not
only led to the development of fast and highly effective AI planning systems, but they have
also led to advances in model checking [23, 26, 28] and related fields. Therefore, we elected
to explore the effectiveness of AI automated planning as a vehicle for prediction. In this con-
text, the prediction of a run realizing a null-WR pair is characterized as a sequential planning
problem with the temporally extended goal of achieving the particular violation being sought.
In this section, we propose a means of encoding the realization of a null-WR pair as a tem-
porally extended goal and characterizing the overall task as a classical planning problem using
the maximal causal model (discussed in Section 2.2.2). Despite the focus on null reads, it is
important to note that a very similar encoding can be applied to prediction of runs contain-
ing other types of concurrency violations like atomicity violations and data races. Indeed, the
merit of using AI planning is in exploiting the rich compact encodings of transition systems
that planners use, the ability to encode complex violation patterns (at least anything that can be
encoded in Linear Temporal Logic [60, 87]) as planning goals and the highly optimized heuris-
tic search techniques that have been honed over the past decade. Here, we only focus on the
encoding of precise prediction, using the maximal causal model. Note that our precise predic-
tion technique does not require incorporating any numerical data. We leave the investigation
of possible encodings of prediction based on the relaxation technique for future work.
In the rest of this section, we first present background on planning. Then, we show how the prediction problem can be encoded as a planning problem.
Informally, a planning problem can be described as follows: given a description of a set of ac-
tions an agent can perform, together with a specification of an initial state and a goal, the task
of automated planning is to generate a set of actions, together with some ordering constraints,
such that if those actions are executed by an agent, starting in the initial state, following the
ordering constraints, they will lead to a state in which the goal is achieved. Classical planning
problems are characterized by a finite initial state that is completely specified, a finite set of
actions that are deterministic, and a goal condition that is restricted to conditions placed on the
final state of the system. More formally, a STRIPS classical planning problem [56] is defined as a tuple P = (S0, F, A, G), where F is a finite set of facts, S0 ⊆ F defines the initial state, G ⊆ F specifies a set of goal states where the facts comprising G hold, and A is a finite set of deterministic actions. Each action a ∈ A is described by a tuple (pre(a), add(a),
del(a)) where pre(a) is a pair (pre+(a), pre−(a)) of disjoint subsets of F that define the positive and negative preconditions of action a, respectively. (For STRIPS classical planning, pre−(a) is empty.) Further, add(a) and del(a) are disjoint subsets of F that define, respectively, the add and delete effects of action a. Planning states record only positive information. Therefore, every f ∈ F that is not explicitly mentioned in a planning state, including the initial state, is assumed to be false in that state. Action a is applicable in a planning state s ⊆ F iff pre+(a) ⊆ s and pre−(a) ∩ s = ∅. Applying action a in state s results in a new, successor state, succ(a, s) = (s \ del(a)) ∪ add(a). The goal G corresponds to a set of planning states, and a plan (strictly speaking, a sequential plan) is a sequence of actions whose execution, starting from the initial state, results in a state in which G holds.
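To make the STRIPS semantics concrete, the applicability and successor-state definitions above can be sketched directly; the set-based representation, action names, and toy domain below are illustrative only, not taken from the thesis.

```python
from collections import deque

def applicable(action, state):
    """a is applicable in s iff pre+(a) is a subset of s and pre-(a) does not intersect s."""
    pre_pos, pre_neg = action["pre"]
    return pre_pos <= state and not (pre_neg & state)

def succ(action, state):
    """succ(a, s) = (s \\ del(a)) | add(a)."""
    return (state - action["del"]) | action["add"]

def plan(s0, goal, actions):
    """Blind breadth-first search for a sequential plan reaching a state with goal held."""
    frontier = deque([(frozenset(s0), [])])
    seen = {frozenset(s0)}
    while frontier:
        state, steps = frontier.popleft()
        if goal <= state:
            return steps
        for name, a in actions.items():
            if applicable(a, state):
                nxt = frozenset(succ(a, state))
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, steps + [name]))
    return None

# Toy domain: two actions that must occur in order.
acts = {
    "a1": {"pre": ({"p"}, set()), "add": {"q"}, "del": {"p"}},
    "a2": {"pre": ({"q"}, set()), "add": {"g"}, "del": set()},
}
print(plan({"p"}, {"g"}, acts))  # ['a1', 'a2']
```

Real planners such as FF replace the blind search with heuristic search, but the state-transition semantics is exactly this.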
Much of the research and many of the advances in automated planning have been with
respect to classical planning. However, many real-world planning problems do not fit within
this narrow characterization. One such restriction is that the goal to be achieved by an agent, the objective of the search, is restricted to some property of the final state of the system.
A relaxation of this restriction is to support goals that are temporally extended, i.e., where
the objective can specify properties that occur along the trajectory of states realized by a plan
execution. In the spirit of this, a temporally extended planning problem P (e.g., [1]) in this setting is a classical planning problem P = (S0, F, A, G) where the goal G is not restricted to a final-state goal, but rather is a set of facts together with some ordering constraints. Such temporally extended goals are often specified in linear temporal logic (LTL) [60]. A sequential plan for a temporally extended planning problem is a sequence of actions whose induced sequence of states satisfies the temporally extended goal.
Automated planning problems are typically encoded in terms of a planning domain de-
scription that describes the dynamics of the planning problem via parameterized representations
of the actions, their preconditions and effects, and by a problem instance that includes a de-
scription of the initial state and the goal. The de facto standard for specifying planning domains
and planning instances is PDDL, the Planning Domain Definition Language [51]. PDDL has
evolved over the years to address increasing needs for expressiveness, and is firmly estab-
lished as the input specification language for most automated planning systems. PDDL3 [22],
a recent version of PDDL, allows for the specification of temporally extended constraints and preferences.
Automated planning systems themselves vary in their approaches to plan generation. Two
popular approaches are those based on heuristic search, as exemplified by the very successful
Fast-Forward (FF) domain-independent planning system used here [31], and those based on
SAT (e.g., [67, 40]). While these systems take PDDL as input, most transform the PDDL
into an internal representation that is tailored to the needs of their search algorithm. Recent
advances in automated planning have seen the development of effective planning techniques for
net-benefit planning, and planning with nondeterministic effects of actions. These advances further motivate the use of planning as a vehicle for prediction.
We encode the dynamics of the given trace as an initial state S0 , a set of facts F , and a set
of actions A. Actions correspond to events within the given trace. The facts record which
actions (i.e., run events) have been executed and some specific properties relating to the most
recent write to each shared variable and lock availability. The preconditions and effects for
individual actions are written so as to enforce program order, and also to enforce lock-validity,
data-validity and creation-validity that ensure that any plan generated from this planning in-
stance corresponds to a precisely predictable run from the given global trace, which guarantees the feasibility of the predicted run.
We treat each null-WR pair (e, f) as a temporally extended goal. More specifically, it can be specified as an LTL formula eventually(Happened_e and next(eventually Happened_f))
where Happened_e and Happened_f encode the occurrence of events e and f, respectively. As such, the task of predicting a run that realizes the pair is viewed as the automated generation of a plan with a temporally extended goal. Exploiting results proposed in [1], such problems can be compiled into classical planning problems via the correspondence between LTL and Büchi automata [81]. In more restrictive cases, such as realizing a null-WR pair as described here, there is an even simpler transformation of temporally extended goals into final-state goals [29] via what is effectively precondition control on the actions.
Here, we present schemas or templates for the general PDDL encoding we employ for prediction. For ease of explanation, the syntax does not strictly conform to PDDL syntax but is expressively equivalent. First, we illustrate how the maximal causal model is encoded in a planning domain such that any sequence of applicable actions corresponds to a feasible execution of the corresponding program. Then, we describe how null reads, considered as temporally extended goals, can be compiled into constraints on the evaluation of the domain.
Given a trace, we use the maximal causal model to encode the set of precisely predictable runs as follows.
Events and Program Order: We encode each event in the trace as an action in the planning domain. Therefore, we may have five different types of actions: read, write, thread creation, lock acquire, and lock release actions. Let {e_{i,1}, e_{i,2}, ..., e_{i,m}} be the projection of the trace on thread Ti, ordered by the program order of Ti. According to the program order, event e_{i,j+1} cannot occur before event e_{i,j}. Let action Ac_{i,j} represent event e_{i,j}, i.e., the j-th event in thread Ti. For each action Ac_{i,j} we introduce a predicate (Done_{i,j}) that indicates whether the action has been applied or not. These predicates are initially false and become true after the application of action Ac_{i,j}. To enforce program order, action Ac_{i,j} (for j > 1) carries (Done_{i,j−1}) among its preconditions.
In a general planning problem, an action may be applied several times to find a solution for the planning problem. Note that this cannot happen in our case, as each action represents an event which can occur at most once in any run. Therefore, each action cannot be applied more than once in any plan. To encode this fact, predicate (NOT Done_{i,j}) is considered as one of the preconditions of each action Ac_{i,j}. Putting it all together, the following forms the template for every action:

(:ACTION Ac_{i,j}
  :precondition (AND (Done_{i,j−1}) (NOT Done_{i,j}) ...)
  :effect (AND (Done_{i,j}) ...))

Note that if an action encodes the first event in a thread Ti, then its precondition only consists of (NOT Done_{i,1}). The actions may also have other preconditions and effects according
to the type of the event (i.e., read, write, thread creation, lock acquire, and release) they rep-
resent. The . . . denotes that other event-specific conditions may be added to the preconditions
and effects of the template. In the following, we describe each of these event types in detail.
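The program-order and apply-at-most-once constraints above can be mechanized; the following sketch (function and predicate naming are our own, hypothetical choices) emits one template per event of a thread, with "..." marking where event-specific preconditions and effects would be spliced in.

```python
def thread_actions(tid, num_events):
    """Emit PDDL-style action templates for one thread's events, enforcing
    program order (the Done predicate of the predecessor as a precondition)
    and apply-at-most-once ((NOT Done) of the event itself)."""
    actions = []
    for j in range(1, num_events + 1):
        pre = [f"(NOT Done_{tid}_{j})", "..."]
        if j > 1:
            # Program order: event j cannot occur before event j-1.
            pre.insert(0, f"(Done_{tid}_{j - 1})")
        actions.append(
            f"(:ACTION Ac_{tid}_{j}\n"
            f"  :precondition (AND {' '.join(pre)})\n"
            f"  :effect (AND (Done_{tid}_{j}) ...))"
        )
    return actions

for template in thread_actions(1, 2):
    print(template)
```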
Write Events: Let W_x represent the set of all write events to variable x in the observed run. To keep track of the most recent write event to variable x, we consider a set of predicates, represented by writes(x) = {(x_{m,n}) | e_{m,n} ∈ W_x} ∪ {(x_init)}.

Predicate (x_init) indicates whether x has its initial value. It is initially true, indicating that no write event has been performed on x. Predicate (x_{m,n}) indicates whether event e_{m,n} has performed the most recent write to x. Predicates of this type are all initially false. When action Ac_{m,n}, which encodes a write to x, is applied, predicate (x_{m,n}) becomes true. In addition, all predicates in writes(x) other than (x_{m,n}) are set to false, indicating that they do not correspond to the most recent write to x. Therefore, at each point in time only one of the predicates in writes(x) can be true. The following shows how action Ac_{i,j}, which corresponds to a write event to variable x, is encoded:
(:ACTION Ac_{i,j}
  :precondition (AND (Done_{i,j−1}) (NOT Done_{i,j}))
  :effect (AND (Done_{i,j}) (x_{i,j}) (NOT x_init) ... (NOT x_{m,n}) ...))

where the effect list contains (NOT x_{m,n}) for every predicate (x_{m,n}) in writes(x) other than (x_{i,j}).
Read Events: According to data-validity in the maximal causal model, each read event in the predicted run should read the same value as it did in the original run. However, we are not encoding the real values in the planning domain; it is enough to recognize, for each read event, the set of write events that write the same value to the corresponding variable as the value the read event observed.

Suppose that event e_{i,j} = (Ti, rd(x, val)) reads the value val from variable x, and let Write_{x,val} denote the set of events that write value val to variable x. In any precisely predictable run, read event e_{i,j} can only be coupled with a write event in Write_{x,val}. Therefore, for each write event e_{m,n} ∈ Write_{x,val}, an action variant is considered as follows:

(:ACTION Ac_{i,j}
  :precondition (AND (Done_{i,j−1}) (NOT Done_{i,j}) (x_{m,n}))
  :effect (AND (Done_{i,j})))

Having (x_{m,n}) as a precondition of the action forces the read event to read the value written by e_{m,n}.
Thread Creation Events: According to creation-validity in the maximal causal model, each thread can start execution only after it is created. For each thread Ti, we consider a predicate (Created_i) which indicates whether Ti has been created or not. Each predicate (Created_i) is initially false and is set to true by the action corresponding to the event creating Ti; every action of Ti then carries (Created_i) among its preconditions.
Lock Acquire and Lock Release Events: According to lock-validity, each lock can be ob-
tained by at most one thread at each point of time. Therefore, if a lock is obtained by thread Ti
then other threads cannot acquire it unless Ti releases the lock. Assume that L = {l_1, ..., l_m} is the set of locks acquired in the observed run. To guarantee lock-validity, a predicate (Available_{l_j}) is introduced for each lock l_j, indicating whether lock l_j is held by some thread or is free.
These predicates are initially true since all of the locks are available at the beginning.
Suppose that action Ac_{i,j} corresponds to a lock acquire event on lock l. It requires (Available_l) as a precondition and sets it to false in the effects:

(:ACTION Ac_{i,j}
  :precondition (AND (Done_{i,j−1}) (NOT Done_{i,j}) (Available_l))
  :effect (AND (Done_{i,j}) (NOT Available_l)))

Note that after performing the action, lock l is not available anymore and cannot be acquired by other threads until it is released.
Actions corresponding to a lock release event on lock l set (Available_l) to true, making the lock available again. Therefore, action Ac_{i,j}, which corresponds to a lock release event on lock l, is encoded as follows:

(:ACTION Ac_{i,j}
  :precondition (AND (Done_{i,j−1}) (NOT Done_{i,j}))
  :effect (AND (Done_{i,j}) (Available_l)))
A domain description following these template transformations of a given trace enables us to generate feasible runs based on the maximal causal model. Next, we show how we can encode the realization of a null-WR pair as a planning goal.
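Putting the templates together, a small encoder can mechanically translate a trace into PDDL-style actions. The sketch below uses assumed event and predicate naming conventions, not the thesis's exact encoding; thread-creation predicates are omitted for brevity, and each read is shown with a single compatible-write variant.

```python
def encode_trace(trace, writes_by_var):
    """Emit one PDDL-style action per trace event, enforcing program order,
    lock-validity (Available predicates), data-validity (a read's compatible
    write must be the most recent one), and most-recent-write bookkeeping.
    trace: list of (tid, kind, arg) in per-thread program order."""
    actions, counters = [], {}
    for tid, kind, arg in trace:
        j = counters[tid] = counters.get(tid, 0) + 1
        pre = [f"(NOT Done_{tid}_{j})"]
        eff = [f"(Done_{tid}_{j})"]
        if j > 1:                      # program order within the thread
            pre.insert(0, f"(Done_{tid}_{j - 1})")
        if kind == "acquire":          # lock must be free; taking it makes it unavailable
            pre.append(f"(Available_{arg})")
            eff.append(f"(NOT Available_{arg})")
        elif kind == "release":        # releasing makes the lock available again
            eff.append(f"(Available_{arg})")
        elif kind == "write":          # this event becomes the most recent write
            me = f"x_{arg}_{tid}_{j}"
            eff.append(f"({me})")
            eff += [f"(NOT {p})" for p in sorted(writes_by_var[arg]) if p != me]
        elif kind == "read":           # arg names a compatible write's predicate;
            pre.append(f"({arg})")     # one action variant per compatible write
        actions.append(f"(:ACTION Ac_{tid}_{j}\n"
                       f"  :precondition (AND {' '.join(pre)})\n"
                       f"  :effect (AND {' '.join(eff)}))")
    return actions

writes = {"v": {"x_v_init", "x_v_1_2"}}
trace = [(1, "acquire", "l"), (1, "write", "v"), (1, "release", "l"),
         (2, "read", "x_v_1_2")]
print("\n".join(encode_trace(trace, writes)))
```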
A null-WR pair (e, f) consists of a write e, writing a null value to a shared variable, and a read f from the same shared variable, reading a non-null value. A null read with respect to this pair occurs when f reads the null value written by e.
Suppose that Ac_{i,j} is the action corresponding to the write event e which writes a null value to variable x, and Ac_f is the action corresponding to the read event f. Recall that after the application of action Ac_{i,j}, predicate (x_{i,j}) becomes true, indicating that Ac_{i,j} has performed the most recent write to x. According to the encoding of write events, any other write to variable x after the application of Ac_{i,j} would set (x_{i,j}) to false. Since e should be the most recent write to x when the read f is about to happen, it is enough to make (x_{i,j}) a precondition of action Ac_f.

We also consider a predicate (Happened_f) which represents whether action Ac_f has happened or not. This predicate is initially false and is set to true in the effect set of action Ac_f. Then, the final-state goal is defined as (:goal (Happened_f)). In this case, f is guaranteed to read the null value written by e when (Happened_f) becomes true, because Ac_f can be applied only in states where (x_{i,j}) holds.
The complete planning encoding of the problem, based on the encoding presented in this section, is provided in Appendix A.
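The effect of this precondition control can be checked on a tiny model: with e's write encoded as Ac_1_1, a competing write as Ac_2_1, and f as Ac_2_2 carrying (x_1_1) as a precondition, a blind breadth-first search over the STRIPS semantics can only find plans in which e is the most recent write to x when f executes. All names here are illustrative, not from the thesis.

```python
from collections import deque

def applicable(a, s):
    return a["pre+"] <= s and not (a["pre-"] & s)

def succ(a, s):
    return (s - a["del"]) | a["add"]

def bfs_plan(s0, goal, actions):
    frontier, seen = deque([(frozenset(s0), [])]), {frozenset(s0)}
    while frontier:
        s, steps = frontier.popleft()
        if goal <= s:
            return steps
        for name, a in actions.items():
            if applicable(a, s):
                n = frozenset(succ(a, s))
                if n not in seen:
                    seen.add(n)
                    frontier.append((n, steps + [name]))
    return None

# e = Ac_1_1 writes null to x; Ac_2_1 is another write to x;
# f = Ac_2_2 reads x and carries (x_1_1) as precondition control.
A = {
    "Ac_1_1": {"pre+": set(), "pre-": {"Done_1_1"},
               "add": {"Done_1_1", "x_1_1"}, "del": {"x_init", "x_2_1"}},
    "Ac_2_1": {"pre+": set(), "pre-": {"Done_2_1"},
               "add": {"Done_2_1", "x_2_1"}, "del": {"x_init", "x_1_1"}},
    "Ac_2_2": {"pre+": {"Done_2_1", "x_1_1"}, "pre-": {"Done_2_2"},
               "add": {"Done_2_2", "Happened_f"}, "del": set()},
}
print(bfs_plan({"x_init"}, {"Happened_f"}, A))
```

The only plan reorders the writes so that the competing write happens first and e is the most recent write when f runs.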
2.6 Evaluation
We implemented our prediction techniques in a tool, called ExceptioNULL, for multi-threaded Java programs. ExceptioNULL is built on top of Penelope [82], which is a tool that predicts atomicity violations using a lock-based analysis. The implementation is equipped with both logical constraint and planning encoding techniques. To evaluate our null-pointer dereference prediction techniques, we applied ExceptioNULL to a collection of multi-threaded Java programs. In the following, we briefly discuss ExceptioNULL and present our experimental results.
Figure 2.5: ExceptioNULL with logical constraint encoder/solver built on top of Penelope.
2.6.1 Implementation
Figure 2.5 demonstrates the architecture of ExceptioNULL with logical constraint encoder/solver. It consists of three main components: a monitor, a run predictor, and a scheduler. The monitor and scheduler are built on top of Penelope, with considerable enhancements and
optimizations, including the extension of the monitoring to observe values of shared variables
at reads and writes. In the following, we will explain each of these components in more detail.
Monitor: The monitor component has an instrumenter which uses the Bytecode Engineering
Library (BCEL)2 to (automatically) instrument every class file in bytecode so that a call to
an event recorder is made after each relevant action is performed. These relevant actions in-
clude field and array accesses, acquisitions and releases of locks, thread creations, but exclude
accesses to local variables. The instrumented classes are then used in the Java Virtual Ma-
chine (JVM) to execute the program and generate a global trace. For the purpose of generating
data-validity constraints, values read/written by shared variable accesses are also recorded.
2 http://jakarta.apache.org/bcel

Run Predictor: The run predictor consists of several components: a null-WR pair extractor, a segment generator, a logical constraint and planning encoder/solver, and a run extractor. The
null-WR pair extractor generates a set of null-WR pairs from the observed trace by the static
lock analysis described in Section 2.3.1. The segment generator component, for each null-WR pair (e, f), isolates the part of the observed trace that is relevant to the pair, as described earlier in this chapter.
Given a null-WR pair and the relevant segment, the logical constraint encoder/solver (as
shown in Figure 2.5), first, produces a set of constraints according to the encoding presented
in Section 2.4. Then, it utilizes the Z3 [9] SMT solver to find a solution. Any model found
by Z3 represents a partial run. The run extractor component generates a run by attaching the
partial run generated based on the model returned by Z3 to the prefix generated by the segment
generator. When Z3 cannot find a solution, the logical constraint encoder/solver iteratively weakens the constraints according to the relaxation method proposed in Section 2.4.2 and calls Z3 again.
Figure 2.6 shows the planning encoder/solver. Given a null-WR pair and the relevant segment, the planning encoder/solver first uses the segment to generate a planning problem encoding the maximal causal model, as discussed in Section 2.5.2. Then, the null-WR pair is encoded as a temporally extended goal. The planning domain and the temporally extended goal are then compiled into a classical planning problem according to the algorithm proposed in Section 2.5.2. To find a plan, one can use a variety of plan generation algorithms. Here we use FF, the Fast-Forward heuristic search planner [31].
Scheduler: The scheduler is implemented using BCEL as well; the scheduling algorithm is
instrumented into Java classes using bytecode transformations, so that the program interacts
with the scheduler when it is executing an action regarding shared variable accesses or syn-
chronization events. The scheduler, at each point, looks at the predicted run, and directs the
appropriate thread to perform a sequence of n steps. The communication between the sched-
uler and threads is implemented using wait-notify synchronization, which allows us to deterministically control the schedule.
2.6.2 Experiments
We evaluate the effectiveness and efficiency of our null-pointer dereference prediction techniques on a collection of benchmarks with several test cases and input parameters. We investigate the effects of the static pruning and
relaxed prediction discussed in Sections 2.3.2 and 2.4.2, respectively. Finally, we evaluate uti-
lizing planning techniques in null-pointer dereference prediction (as discussed in Section 2.5).
Benchmarks: The benchmarks are all concurrent Java programs that use synchronized blocks and methods as means of synchronization. They include RayTracer from the Java Grande benchmark suite3, Elevator, Vector, Stack, StringBuffer, and HashSet from Java libraries, Pool (3 releases) and StaticBucketMap from the Apache Commons Project4, Apache FtpServer5, Hedc6, and Weblech7. The Elevator program simulates a number of elevators serving requests, RayTracer renders a scene of spheres from a given view point, and Pool is an object pooling API in the Apache Commons Project.
3 http://www.javagrande.org/
4 http://commons.apache.org
5 http://mina.apache.org/ftpserver
6 http://www.hedc.ethz.ch
7 http://weblech.sourceforge.net
Program (LOC)          Input  Base   Threads  Shared  Locks  Interl.  Monitor  Null-WR  Precisely  Add. by  Feasible  Time per    Total     Deref. by  Add. Deref.
                                              Vars           Points   Time     Pairs    Predicted  Relax.   Predict.  Prediction  Time      Precise    by Relax.
Elevator (566)         Data   7.3s   3        116     8      14K      7.4s     0        -          -        -         -           7.9s      0          0
                       Data2  7.3s   5        168     8      30K      7.4s     0        -          -        -         -           8.9s      0          0
                       Data3  19.2s  5        723     50     150K     19.0s    0        -          -        -         -           58.5s     0          0
RayTracer (1.5K)       A-10   5.0s   10       106     10     648      5.0s     9        9          -        9         5.6s        50.5s     1          0
                       A-20   3.6s   20       196     20     1.7K     4.4s     19       19         -        19        6.7s        2m15s     1          0
                       B-10   42.4s  10       106     10     648      42.5s    9        9          -        9         42.7s       6m24s     1          0
Apache FtpServer (22K) LGN    1m2s   4        112     4      582      60s      116      78         32       65        1m13s       2h14m46s  9          3
Weblech v.0.0.3 (35K)  Std    4.9s   3        153     3      1.6K     4.92s    55       10         29       30        16.26s      10m34s    1          1@

Table 2.1: Experimental results for precise/relaxed prediction using logical constraint encoder/solver. Marked error counts indicate non-termination and unexpected behavior, respectively. All other errors are null-pointer dereference exceptions.
FtpServer is an FTP server by Apache, and Vector, Stack, HashSet, and StringBuffer are Java library classes that implement concurrent vector, stack, hash-set, and string-buffer data structures, respectively. Hedc is a Web crawler application and Weblech is a website download tool.
Table 2.1 illustrates the experimental results for null-pointer dereference prediction using
the logical constraint encoder/solver. It provides information about monitoring, run prediction,
and scheduling phases. In the monitoring phase, the number of threads, shared variables, locks,
the number of potential interleaving points (i.e., number of global events), and the time taken
for monitoring are reported. For the prediction phase, we report the number of null-WR pairs in
the observed run, the number of precisely predicted runs, and the additional number of runs
predicted by relaxation (when there is no precisely predicted run for a null read-write pair). In
the scheduling phase, we report the total number of feasible (i.e., could be scheduled) predicted
runs. Finally, we report the average time for prediction and rescheduling of each run, the total
time taken to complete the tests (for all phases), and also the number of errors found using the
Observations: Comparing the number of null-WR pairs with the number of precisely predicted runs, we can see that our precise prediction technique performs very well in practice; i.e., for most of the benchmarks, the precise prediction is able to generate feasible runs for many of the null-WR pairs. Furthermore, we found 27 errors in total in our set of benchmarks using the precise prediction technique, which also demonstrates its effectiveness.
According to the number of additional runs predicted by relaxation, we conclude that the relaxation technique works extremely well; the relaxation method predicted many runs for the null-WR pairs for which the precise prediction technique could not find any solution. Comparing the number of feasible predictions with the number of precisely predicted runs, we can see that a large number of runs predicted by the relaxed prediction technique were feasible.
[Bar chart: prediction time in seconds (log scale), with and without pruning, for each benchmark.]
Figure 2.7: Prediction times with/without pruning in log scale.
In our experiments, the errors manifested in the form of raised exceptions in most of the programs. The error in Weblech manifested as unexpected behavior (the user is asked to push a stop button even after the website is downloaded completely, resulting in non-termination!). RayTracer has a built-in validation test which failed in some of the predicted runs. For some of the test cases of Vector and Stack the output produced was not the one expected. In Table 2.1, exceptions raised in different parts of the code are counted as separate errors. For example, the 9 exceptions in FtpServer are raised in 7 different functions and at different locations inside the functions, and involve null-pointer dereferences on 5 different variables.
In general, we can see that E XCEPTIO NULL performs considerably well, predicting a large
number of feasible program runs leading to null-pointer dereferences. In total, it finds about
40 executions with null-pointer dereferences in the benchmarks. All the errors are completely
reproducible deterministically using the scheduler. Furthermore, despite the use of fairly so-
phisticated static analysis and logic-solvers, the time taken for prediction is very reasonable.
The effect of pruning: Figure 2.7 illustrates the substantial impact of our pruning algorithm
(presented in Section 2.3.2) in reducing prediction time. It presents prediction time with and
without using the pruning algorithm. Note that the histogram is on a logarithmic scale. For ex-
ample, in the case of Weblech, the prediction algorithm is about 16 times faster with pruning.
Furthermore, all errors found without the pruning were found on the pruned runs, showing that
the pruning did not affect the quality of error-finding on our benchmarks.
Table 2.2 reports the results of precise prediction using the planning encoder/solver on some of the benchmarks. As mentioned before, the encoding is based on the maximal causal model and can only support precise prediction. Therefore, from the Java benchmarks, we pick the ones for which relaxed prediction does not have any effect on the number of errors found. In Table 2.2, in addition to the information about
the observed runs, i.e., number of threads, shared variables, locks and events, we provide the
number of null-WR pairs, number of precisely predicted runs, average time per prediction and
number of null-pointer dereferences found by the predicted runs for both logical constraint and
planning encodings.
From Table 2.2, we can observe that the number of predicted runs is not affected by the heuristics used in FF: whenever a plan exists for a prediction problem, FF is able to find it. Therefore, in terms of the number of predicted runs, the logical constraint encoder/solver and the planning encoder/solver are equivalent. As a result, using the planning encoder/solver we could find all of the errors found by using the logical constraint encoder/solver; in fact, the same set of errors was found.
Another observation is that the planning encoder/solver is much faster than the logical constraint encoder/solver. The average time per prediction is 0.01 seconds for FF, which is negligible. This implies that the heuristic-based search algorithm embedded in FF performs very well on the planning problems obtained through the encoding outlined in Section 2.5.2. This suggests that other test generation techniques could also employ a planning encoder/solver to speed up prediction and benefit from the advanced search algorithms embedded in planners.
As noted previously, the novelty and effectiveness of the planning encoding is in the con-
Program (LOC)       Input  Threads  Shared Vars  Locks  Events  Null-WR Pairs  Predicted Runs (Z3/FF)  Time/Pred. (FF/Z3, s)  Deref. (Z3/FF)
SBucketMap (750)    BMT    4        123          19     892     2              2 / 2                   0.01 / 0.25            1 / 1
Pool 1.3 (7K)       PT1    4        30           1      100     3              0 / 0                   - / -                  - / -
                    PT2    4        31           1      271     3              0 / 0                   - / -                  - / -
                    PT4    4        23           3      422     62             1 / 1                   <0.01 / 1.33           0 / 0
StringBuffer (1.4K) SBT    3        16           3      80      2              2 / 2                   <0.01 / 0.15           1 / 1
Elevator (566)      Data   3        116          8      14K     0              - / -                   - / -                  - / -

Table 2.2: Experimental results for precise prediction using planning encoder/solver.
                            Data Races                       Atomicity Violations
Program (LOC)       Input   Patterns  Predicted  Time (s)    Patterns  Predicted  Time (s)
Vector (1.3K)       VT1     0         -          -           0         -          -
                    VT2     0         -          -           11        11         0.01
                    VT3     0         -          -           4         4          0.01
Stack (1.4K)        ST1     0         -          -           0         -          -
                    ST2     0         -          -           22        22         0.01
                    ST3     0         -          -           8         8          0.01
SBucketMap (750)    BMT     1         1          0.02        2         2          0.01
Pool 1.3 (7K)       PT1     0         -          -           0         -          -
                    PT2     0         -          -           12        10         <0.01
                    PT4     0         -          -           127       10         <0.01
HashSet (1.3K)      HT1     20        20         0.01        3         3          0.01
StringBuffer (1.4K) SBT     0         -          -           1         1          <0.01
Elevator (566)      Data    0         -          -           4         0          -

Table 2.3: Experimental results for predicting data races and atomicity violations using planning encoder/solver.
ceptualization and realization of the test generation task as an AI automated planning task where the null-read patterns are characterized as temporally extended goals. Nevertheless, our conceptualization of the test generation task allows for predicting runs with other violation patterns as well. This ability to elegantly and effectively deal with arbitrary violation patterns is a strength of the approach. We investigated the applicability of this test generation technique to data race and atomicity violation patterns.
We used Penelope [82] to extract a set of data race and atomicity violation patterns from the observed runs. Each data race pattern is a tuple (e, f) where e and f are accesses to the same shared variable in different threads which conflict with each other (i.e., at least one of them is a write access). Each atomicity violation pattern is a tuple (e1, e2, f) where e1, e2, and f are accesses to the same shared variable, e1 and e2 are in one thread and f is in another thread, and f conflicts with both e1 and e2. For each data race pattern (e, f), the planning goal is to perform the actions corresponding to e and f back to back. For each atomicity violation pattern (e1, e2, f), the planning goal is to perform the corresponding action of f after the corresponding action of e1 and before the corresponding action of e2.
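The text leaves the compilation of the "back to back" requirement implicit; one plausible encoding, by analogy with the precondition control used for null-WR pairs, introduces an auxiliary fact that survives only if no other action intervenes. All names below are hypothetical, not the thesis's own.

```python
# Hypothetical sketch: (JustHappened_e) is added by Ac_e and deleted by every
# action other than Ac_f, so Ac_f can only fire immediately after Ac_e.
ac_e = {"pre+": set(), "pre-": {"Done_e"},
        "add": {"Done_e", "JustHappened_e"}, "del": set()}
ac_f = {"pre+": {"JustHappened_e"}, "pre-": {"Done_f"},
        "add": {"Done_f", "Happened_f"}, "del": set()}
# Every other action a would additionally get: a["del"] |= {"JustHappened_e"}
goal = {"Happened_f"}
```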
Table 2.3 shows the experimental results for predicting data races and atomicity violations
using planning encoder/solver. We report the number of access patterns, number of predicted
runs, and average time per predicted run by FF for both data races and atomicity violations.
According to the experiments, the planning encoder/solver was able to predict runs for most of the access patterns quickly (on average, 0.01 seconds per prediction). This confirms the applicability of the planning-based prediction technique to violation patterns beyond null reads.
Conclusion: Our experiments showed that our prediction technique is very effective in practice at predicting null-pointer dereferences. Using this technique, we could find 41 bugs in our
set of benchmarks. We showed that our relaxation technique is very useful when there is no
precise solution for the prediction problem. Using the relaxation technique, we could predict runs (many of which were feasible) that resulted in finding 13 additional
bugs. We also showed that our pruning technique made the prediction process up to 16 times
faster for some of the benchmarks without affecting the quality of bug-finding. We compared the logical constraint and AI planning encoding approaches for precise prediction. Our experiments showed that the planning encoder/solver is much faster than the logical constraint encoder/solver. Finally, we showed that our prediction technique is general and can be applied to other violation patterns, such as data races and atomicity violations.
2.7 Related Work

Prediction techniques use heuristics (e.g., atomicity violations, data races, and assertion viola-
tions) to reduce the interleaving exploration space under fixed inputs. Similar to our prediction
technique for finding null-pointer dereferences, several prediction techniques work at the shared-communication level, i.e., they consider accesses to shared variables and synchronization events. These
techniques either enhance a lock-based analysis [17, 82, 77, 78, 35] or a graph-based anal-
ysis [93, 92] for prediction. For example, P ENELOPE [17, 82] is a testing tool for predicting
atomicity violations in concurrent programs. It works at the shared-communication level and uses a
lock-based analysis that guarantees all of the predicted runs respect the semantics of locking.
However, since data-flow is ignored in the analysis, the predicted runs are not guaranteed to be
feasible. Other lock-based techniques [77, 78, 35] follow a more restrictive approach in inter-
leaving exploration to guarantee soundness; in the predicted runs each read should be matched
with exactly the same write as it was matched in the observed run. Techniques that utilize a
graph-based analysis [93, 92], on the other hand, build a dependency graph based on the events
in the observed run to identify atomicity violations. These techniques are even more restrictive
than lock-based techniques according to the set of explored interleavings since in addition to
the constraint that each read should be matched with exactly the same write as it did in the
observed run, the predicted runs should preserve the order of lock blocks as in the observed
run. These restrictions guarantee soundness for the prediction technique. The maximal causal
model (MCM) [74] is another technique working at the shared communication level that targets
sound prediction. It is the maximal precise prediction technique one can achieve at the shared
communication level. We discussed this technique in this chapter in detail. It has been used by
Said et al. [68] for finding data race witnesses. Our null-pointer prediction technique utilizes this model as well.
There are also prediction techniques [90, 89] that work at the statement level. These techniques observe a run, symbolically encode every single instruction executed (local computation as well as global computation) in the observed run, and use a sound symbolic analysis to predict atomicity violations and assertion violations. These techniques produce very large encodings and therefore face scalability problems.
A more liberal notion of generalized dynamic analysis of a single run has also been studied
in a series of papers by Chen et al. [7, 6]. JPredictor [7] offers a predictive runtime analysis
that uses sliced causality [6] to exclude the irrelevant causal dependencies from an observed run
and then exhaustively investigates all of the interleavings consistent with the sliced causality to detect potential violations.
CT RIGGER [59] is another testing tool that targets finding atomicity violations in concur-
rent programs. It first extracts a set of atomicity violation patterns from an observed run. Then,
for each pattern, it instruments the program code by inserting some synchronization around the
accesses corresponding to the pattern, with the aim of increasing the probability of realizing
the violation pattern in the execution of the instrumented program. However, the atomicity
violation patterns are not guaranteed to be realized in the instrumented program, i.e., it is not
sound.
Another closely related work is C ON M EM [96], where the authors target a variety of mem-
ory errors in testing concurrent programs, including null-pointer dereferences, but the predic-
tion algorithms are much weaker and quite inaccurate compared to our robust prediction tech-
niques; their prediction analysis is mostly based on the synchronization events present in an
observed run (ignoring the flow of data among the threads) and hence is not sound. Therefore,
they had to build a validator to automatically prune false positives by enforcing the predicted
interleavings.
2.8 Summary
In this chapter, we introduced a new pattern for interleaving selection that targets null-pointer
dereferences, together with a sound and scalable prediction technique. For the sake of scalability,
our prediction is based on an approximation that ignores local computation entirely. We also proposed a static pruning technique
that reduces the size of the prediction problem drastically. We exploited the maximal causal
model [74] that guarantees sound prediction at the shared communication level. For cases where
there is no sound solution, we proposed a relaxation method at the expense of losing soundness
guarantees. However, our experiments showed that the majority of the runs predicted by the re-
laxation method are feasible. We developed two different encodings for our prediction problem
based on logical constraints and AI planning. The former encoding allows us to use state-of-
the-art SMT solvers to search for a solution. The latter encoding allows us to benefit from the
compact encoding and advanced heuristic-based search algorithms embedded in the planners.
According to our experiments, both approaches are equally effective in bug finding. However,
the planning approach proved to be much faster than the logical constraints approach. We
implemented our prediction technique in a tool that predicts null-pointer dereferences. Our
experiments with this tool demonstrated the efficiency and the effectiveness of our null-pointer
dereference prediction technique.
Chapter 3
Test Generation Based on Under-approximations of Programs
Program slices built from concurrent program executions, referred to as concurrent trace pro-
grams, have been used as program under-approximations to find bugs in the corresponding pro-
grams [89, 77, 80, 79]. Concurrent trace programs encode program runs as a set of thread-local
operations. Some techniques [80, 79] subject concurrent trace programs (instead of the whole
programs) to analysis. However, none of these techniques target test generation (i.e., input/schedule
generation) for exploring different program behaviours. Rather, they focus on finding assertion
violations corresponding to assertions that are present in the approximation model. Furthermore,
in all of these techniques, the approximation model is fixed at the beginning, and none of these
techniques consider program inputs. Similar to these techniques, we use sets of program runs as
under-approximations of concurrent programs. However, our main goals are (i) to use the
approximation model as a basis for test generation, and (ii) to generate tests that increase code
coverage in the concurrent program.
More specifically, test generation targets covering static branches in the program that have not
been covered by previous tests. Note that in an active testing framework [25], many runtime
bugs can be encoded as branches. Therefore, by targeting branch coverage, we can implicitly
target such bugs as well. The key idea is to search for new def-use pairs, where a definition
(def) represents a write to a shared variable in some thread and a use represents a read from
that variable in some other thread, such that exercising the pair would lead to covering a
previously uncovered branch.
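To make the def-use idea concrete, such pairs can be mined from a logged run. The following sketch is illustrative only; its event encoding (tuples of thread id, access kind, and variable) is an assumption rather than the format used by our tool:

```python
# Minimal sketch: mine inter-thread def-use pairs from a logged run.
# Events are (thread_id, kind, var) tuples; this encoding is an
# assumption for illustration.

def interthread_def_use_pairs(run):
    """Return (def_idx, use_idx) pairs where a read of a shared variable
    may observe a write made by a different thread."""
    pairs = []
    for i, (t_w, kind_w, v_w) in enumerate(run):
        if kind_w != "write":
            continue
        for j in range(i + 1, len(run)):
            t_r, kind_r, v_r = run[j]
            if v_r != v_w:
                continue
            if kind_r == "write":
                break  # this def is shadowed for all later reads
            if kind_r == "read" and t_r != t_w:
                pairs.append((i, j))
    return pairs

run = [(1, "write", "x"), (2, "write", "x"), (1, "read", "x")]
print(interthread_def_use_pairs(run))  # [(1, 2)]
```

Here the read by thread 1 pairs only with thread 2's write, since thread 1's own earlier write is shadowed.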
We exploit the fact that it is easy to generate a set of different test runs (e.g., by executing
the program with different input values) without any significant effort. By observing already
available test runs for various writes to shared variables, we are then able to select segments
of previously observed runs, and insert them (not necessarily atomically) into other runs to
exploit previously unseen def-use pairs leading to covering previously uncovered branches of
the program. In the following, we call such segments containing a write to a shared variable
interloper segments.
Given this intuition, we have to address the following challenges: (1) How to generate an
interesting set of test inputs and thread schedules to start with, if none is provided, (2) How to
effectively search for feasible interloper segments, and (3) How to generate inputs and feasible
schedules corresponding to inserting an interloper segment into another run leading to covering
the target branch.
To address the first question, we rely on sequential test generation techniques; i.e., we
subject each thread to sequential testing individually, first. We utilize the fact that state-of-
the-art sequential test generation techniques are generally able to quickly cover a large part of
the program in terms of branches in individual threads. Indeed, the branches of a concurrent
program that are not covered using sequential testing techniques alone may require interesting
interactions between the threads of the concurrent program that are worth further exploration.
To address the second question, we develop a static Multi-Trace Analysis (MTA) technique.
In our MTA technique, we advance the symbolic predictive analysis technique [89, 77] (which
considers fixed inputs) with symbolic inputs to be able to generate input values. Furthermore,
unlike symbolic predictive analysis, our MTA exploits information in multiple program runs.
Finally, to address the third question, we generate an appropriate logical constraint system,
whose model implies a set of input values and a schedule, and use SMT solvers to search for
solutions.
This chapter is based on our publication on the MTA technique (i.e., [66]). We elaborate on
the technique in the rest of this chapter.
Consider the simple concurrent program consisting of two threads that call Thread1 and
Thread2 in Figure 3.1, respectively. Variable x is shared among the threads and input is
the input of the program. The error in Thread1 is not reachable (i.e., the if-branch is not
coverable) when the threads are executed sequentially back to back. However, the error will
become reachable when input ≥ 2 and the read of x at line 3 in Thread1 reads the value
written by the write in the other thread at line 9. Our goal is to generate a test (i.e., input values
plus a schedule) such that the execution of the program with the generated inputs according to
the generated schedule reaches the error.
Suppose that we subject each thread to sequential test generation to increase code coverage
as much as possible in individual threads. Sequential test generation will not do much for
Thread1 since its behavior does not depend on the inputs of the program. However, for
Figure 3.2: Test generation based on MTA for the program in Figure 3.1.
Thread2, sequential test generation will generate two different values for input, one ≤ 0 and
one > 0, corresponding to skipping and covering the if-branch at line 8, respectively. Without
loss of generality, suppose that 0 and 1 are the two values generated for input.
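Since Figure 3.1 is not reproduced here, the following sketch is a plausible reconstruction of the example consistent with the description; the exact branch conditions are assumptions, while the line numbers (2, 3, 8, 9) follow the text:

```python
x = 0          # shared variable
error = False  # set when Thread1's error branch is reached

def thread1_body():
    global x, error
    x = 1            # line 2: Thread1's local write to x
    if x > 1:        # line 3: branch guarding the error (assumed condition)
        error = True

def thread2_body(inp):
    global x
    if inp > 0:      # line 8
        x = inp      # line 9: the candidate interloper write

# Sequential back-to-back executions (as in Run1/Run2) skip the branch:
thread1_body()
thread2_body(1)
assert not error

# The test generated by MTA interleaves the write at line 9 between
# Thread1's lines 2 and 3, with a fresh input value (here, 2):
x = 1                # Thread1, line 2
thread2_body(2)      # interloper segment: overwrites x
if x > 1:            # Thread1, line 3 now reads the interloper's write
    error = True
print(error)         # True
```

The second half of the sketch replays, deterministically, the interleaving plus input that covers the branch.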
Now, we execute the concurrent program with the generated inputs and get two concurrent
runs Run1 and Run2 depicted in Figure 3.2. One can see that the if-branch in Thread1
(which leads to the error) is skipped in both of these runs since the read of x in the branch
condition always reads the value written to x locally at line 2. However, we observe that in
Run2 there is a write to x by the other thread that could be a candidate for providing a value for
the read of x in the branch condition (overwriting the value written at line 2). Therefore, we
select an appropriate interloper segment from Run2, containing the candidate write to variable
x (as shown in Figure 3.2), and insert it between the write to x at line 2 and the read from x at
line 3 in Run1 (shown by an arrow in Figure 3.2) and search for input values and a schedule
that would result in covering the if-branch at line 3 (if possible). Figure 3.2, on the right
side, depicts a generated test that results in covering the if-branch at line 3 and hence leads to
the error state. Note that the error state cannot be reached by applying prediction techniques
to either of these runs alone, since reaching it requires new input values.
Similar to symbolic prediction [89, 90], we symbolically encode the set of all feasible runs,
where the interloper segment is inserted in Run1 and the if-branch at line 3 is taken, as a set
of logical constraints. Unlike symbolic prediction which works on fixed inputs, we consider
symbolic values for inputs to be able to do input generation. In this example, the interloper
segment is inserted atomically between the write to x and the read from x in Run1. How-
ever, in general, these two events might be far from each other and in that case, the generated
constraints encode all feasible runs in which the statements in the interloper segment are
interleaved with the statements in Run1 appearing between the corresponding write and read.
We discuss this encoding in detail later in this chapter.
3.2 Preliminaries
In this section, we first formally define symbolic traces and concurrent trace programs which
form the basis for our test generation technique. Then, we discuss how a concurrent trace pro-
gram, obtained from a single execution of a concurrent program, is used by symbolic prediction
techniques to predict bugs in the concurrent program. Finally, we provide a brief overview of
concolic testing for sequential programs.
In Section 2.2.1, we defined the notion of a global trace as the sequence of events correspond-
ing to accesses to shared variables and synchronization events. In this section, we define the
notion of a symbolic trace that contains information about both global and local computation
in program executions.
For a concurrent program, let T = {T1, T2, ...} represent the set of thread identifiers, and SV
be the set of shared variables. Each thread Ti has a finite set of local variables LVi, and can
access the set of variables in Vi = SV ∪ LVi during its execution. Each thread Ti executes a
sequence of trace statements, defined as follows.
Definition 3.2.1 (Trace Statement). A trace statement is a tuple (sId, stmt) where sId is a
unique identifier (e.g., a combination of a thread identifier and location in the program), and
stmt is one of the following statements executed by a thread Ti:
(assume(c), asgn) is the atomic guarded assignment, where asgn is a set of assignments,
each of the form v := exp, where v ∈ Vi is a variable and exp is an expression over Vi.
assume(c) means the conditional expression c over Vi must be true for the assignments
in asgn to execute.
assert(c) is the assertion statement. The conditional expression c over Vi must be true
when the statement executes; otherwise, an assertion violation is raised.
The guarded assignment (assume(c), asgn) may have the following variants: (1) when
c = true, it can represent normal assignments; (2) when the assignment set is empty, assume(c)
itself can represent the then-branch of an if(c)-statement, while assume(¬c) can represent
the else-branch; and (3) with both guard and assignments, it can represent an atomic check-
and-set, which is the foundation for synchronization primitives. In particular, it can precisely
capture the semantics of all synchronization primitives in the standard PThreads library. For
instance, acquiring a lock l by thread Ti can be modeled as the atomic check-and-set
(assume(l = 0), {l := i}).
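As an illustration of the check-and-set reading of guarded assignments, a small interpreter sketch follows; the state and statement representations are assumptions for illustration:

```python
def step(state, stmt):
    """Execute one guarded assignment (assume(c), asgn) if enabled.
    c is a predicate over the state; asgn maps variables to functions
    of the state. Returns the new state, or None when blocked."""
    cond, asgn = stmt
    if not cond(state):
        return None  # the transition does not exist: the thread blocks
    new = dict(state)
    for var, expr in asgn.items():
        new[var] = expr(state)
    return new

# Lock acquisition modeled as an atomic check-and-set:
# (assume(l == 0), {l := 1}).
acquire = (lambda s: s["l"] == 0, {"l": lambda s: 1})

s = step({"l": 0}, acquire)
print(s["l"])            # 1: the lock is now held
print(step(s, acquire))  # None: a second acquire blocks
```

Blocking thus falls out of the semantics directly: when the guard is false, the transition simply does not exist.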
Let stmtIds represent the set of trace statement identifiers in the program. We refer to the
execution of trace statements as events. An event e is a tuple (tid, loc), where tid ∈ T is a
thread index, loc = (sId, instId) represents the location of thread Ttid where sId ∈ stmtIds
is the identifier of the statement and instId represents the thread-local instance identifier of the
trace statement with statement identifier sId; i.e., if a trace statement is executed again inside
a loop, a new event will be generated at run-time with the same sId and a new instId. Let EV
represent the set of all events. A symbolic trace π is a finite sequence of events in EV.
Definition 3.2.3 (Global Locations in Symbolic Traces). Let π be a symbolic trace. The global
location at π[j] is defined as a tuple (loc1, loc2, . . .) where loci is the location of thread Ti at
π[j], i.e., the location of Ti in the last event of thread Ti in π before π[j].
A global location at a specific point in a symbolic trace defines the location of each thread
at that point.
Given a symbolic trace π, we build a concurrent program where each thread Ti consists of
a single path of execution (specifically the path it took in π). We refer to this program as a
concurrent trace program (or CTP). The semantics of CTPs is defined using state transition
systems.
Let V = LV1 ∪ · · · ∪ LVk ∪ SV be the set of variables and Val be a set of values. A state is a
map s : V → Val assigning a value to each variable. We use s[v] and s[exp] to denote the
values of variable v and expression exp in state s, respectively. There is a transition from state
s to state s′ on an event e whose statement identifier is sId iff either:
(1) sId refers to a statement of form (assume(c), asgn), s[c] is true, and for each assignment
v := exp in asgn, s′[v] = s[exp] holds; s and s′ agree on other variables. Note that if
s[c] is false, the transition does not exist, i.e., the execution is blocked.
(2) sId refers to a statement of form assert(c) and s[c] is true. When s[c] is false, an attempt
to execute the statement results in an assertion violation.
Now, we formally define concurrent trace programs obtained from symbolic traces.
Definition 3.2.4 (Concurrent Trace Program). The concurrent trace program of a symbolic
trace π is a partially ordered set CTPπ = (E, ⊑), where E is the set of events in π, and ⊑
is a partial order, where for any ei, ej ∈ E, we have ei ⊑ ej iff tid(ei) = tid(ej) and ei
appears before ej in π.
A concurrent trace program CTPπ orders events from the same thread by their execution
order in π; events from different threads are not explicitly ordered. Let π′ = e′1 . . . e′n be a
linearization of CTPπ. We say π′ is a feasible linearization iff there exist states s0, . . . , sn
such that s0 is the initial state of the program and for all i = 1, . . . , n there exists a transition
from si−1 to si on event e′i.
CTPs have been used to predict bugs in concurrent programs [90, 89]: Given a symbolic trace π,
a model CTPπ is derived to symbolically check all its feasible linearizations. For this, a logical
constraint formula Φ_CTPπ is created such that Φ_CTPπ is satisfiable iff there exists a feasible
linearization of CTPπ exhibiting the bug of interest. The encoding first transforms CTPπ into
concurrent static single assignment (CSSA) form.
CSSA Encoding. The CSSA form has the property that each variable is defined exactly once.
A definition of variable v is a trace statement that modifies v, and a use is a trace statement
where v appears in an expression. Unlike in the classic sequential SSA form, we do not need to
add φ-functions to model the confluence of multiple if-else branches, because in CTPπ, each
thread has a single control path. Throughout the transformation, trace statements in CTPπ are
changed as follows:
1. For each definition of a variable v, create a fresh name v′ and replace the definition
v := exp with v′ := exp.
2. For each use of a local variable v ∈ LVi, replace v with the most recent (unique) defini-
tion v′.
3. For each use of a shared variable v ∈ SV, create a unique name v′ and add the definition
v′ ← π(v1, . . . , vk), where each vi is either the most recent definition of v in
the same thread, or a definition of v in another concurrent thread. Then, replace v with v′.
From CSSA to Φ_CTPπ. Each event e is assigned a fresh integer variable O(e) denoting its
execution time. Let HB(e, e′) denote that e happens before e′, which is encoded as a logical
constraint: O(e) < O(e′). A path condition g(e), defined for each event e in CTPπ, is the
conjunction of the assume conditions that must hold for e to execute along its thread-local
path. Then, Φ_CTPπ = Φ_PO ∧ Φ_ST ∧ Φ_Π,
where Φ_PO, Φ_ST, and Φ_Π encode program order, statements, and π-functions, respec-
tively, and are built as follows:
1. Let Φ_PO = Φ_ST = Φ_Π = true, initially.
2. Program Order: For each event e in CTPπ with a thread-local preceding event e′, let
Φ_PO := Φ_PO ∧ HB(e′, e).
3. Statements: For each event e containing an assignment lval := exp, let
Φ_ST := Φ_ST ∧ (lval = exp). If e contains assume(c), let Φ_ST := Φ_ST ∧
(g(e) → c).
4. π-Functions: For each w = π(v1, . . . , vk), defined in event e, suppose that ei is the event that
defines vi; let Φ_Π := Φ_Π ∧ ⋁_{i=1}^{k} [(w = vi) ∧ g(ei) ∧ HB(ei, e) ∧
⋀_{j=1, j≠i}^{k} (HB(ej, ei) ∨ HB(e, ej))].
Intuitively, the π-function evaluates to vi iff it chooses the ith definition in the π-set.
Having chosen vi, all other definitions occur before ei or after the use of vi.
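The π-function constraint in step 4 can be generated mechanically. The following sketch emits it as a formula string; the naming scheme for the O(e) and g(e) variables is an assumption:

```python
def pi_constraint(use, defs):
    """Build the step-4 constraint for w = pi(v1, ..., vk) at event
    `use`; `defs` maps each value name vi to its defining event ei.
    The O_e / g_e names mirror the O(e) and g(e) variables."""
    def hb(a, b):
        return f"(O_{a} < O_{b})"
    disjuncts = []
    for vi, ei in defs.items():
        conj = [f"(w = {vi})", f"g_{ei}", hb(ei, use)]
        for vj, ej in defs.items():
            if ej != ei:
                # every other definition is before ei or after the use
                conj.append(f"({hb(ej, ei)} | {hb(use, ej)})")
        disjuncts.append("(" + " & ".join(conj) + ")")
    return " | ".join(disjuncts)

print(pi_constraint("e3", {"v1": "e1", "v2": "e2"}))
```

Each disjunct commits the π-function to one definition and constrains all competing definitions to fall outside the window between that definition and the use.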
Checking for Bugs. Formula Φ_CTPπ encodes all feasible linearizations of CTPπ. To check for
a specific bug, e.g., an assertion or atomicity violation, another formula Φ_bug is built such that
Φ_CTPπ ∧ Φ_bug is satisfiable iff there is a feasible linearization that exhibits the bug.
Concolic testing is an effective test generation technique for sequential programs [24, 73, 5,
84, 4] for which different coverage criteria have been studied throughout the years. It assumes
that the behavior of a sequential program solely depends on the values of inputs provided by
the external environment, i.e., the program is deterministic. The main idea behind concolic
testing is to use information available in previous executions of the program to generate input
values that drive the execution towards covering uncovered parts of the program. It augments
traditional symbolic execution with concrete execution by falling back upon concrete values
observed during concrete execution to handle non-linear computations or calls to external
library functions.
As shown in Figure 3.3, concolic testing has three main components: a concolic execution
engine, a path exploration component, and a realizability checker. The concolic execution engine executes the
program with a given input vector concolically (i.e., executes the program with concrete and
symbolic input values at the same time) and as a result, generates a symbolic trace that con-
sists of a sequence of path constraints on symbolic inputs (i.e., branch conditions based on
symbolic inputs encountered during execution). It generally falls back upon concrete values to
handle non-linear computations or calls to external library functions. Different coverage cri-
teria have been investigated and employed by concolic testing techniques. For example, path
coverage targets exploring all possible program execution paths, or control-flow coverage and
its variations such as basic block coverage and explicit branch coverage target code coverage.
These coverage criteria quantify a degree to which the program has been tested.
Given a symbolic trace, the path exploration component then selects one of the branch con-
ditions and negates it (while keeping the previous branch conditions the same). The goal is to
try to diverge from the already observed executions by taking a different side of an encountered
branch. The path exploration component can follow a simple DFS or utilize some heuristics
(e.g., branch statements, overall stack trace, the depth of the branches, etc.) in selecting
the target branch. Finally, the realizability checker component uses SMT solvers to generate
an input vector (if possible) that would satisfy the new path constraints, with the understanding
that such an input vector is likely to drive the execution of the program towards a different path.
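The loop formed by these three components can be sketched end to end on a toy program; here a brute-force search over a small integer domain stands in for the SMT-based realizability checker:

```python
def program(x, trace):
    # Toy program under test; records each branch as (predicate, taken?).
    trace.append(("x > 10", x > 10))
    if x > 10:
        trace.append(("x < 20", x < 20))
        if x < 20:
            return "target"
    return "other"

def solve(constraints):
    # Stand-in for an SMT solver: brute-force search over a small domain.
    for cand in range(-100, 100):
        if all(eval(pred, {"x": cand}) == want for pred, want in constraints):
            return cand
    return None

covered, inputs, seen = set(), [0], set()
while inputs:
    x = inputs.pop()
    trace = []
    covered.add(program(x, trace))
    for i in range(len(trace)):
        # keep the prefix as observed, negate branch condition i
        flipped = tuple(trace[:i]) + ((trace[i][0], not trace[i][1]),)
        if flipped in seen:
            continue
        seen.add(flipped)
        new_x = solve(list(flipped))
        if new_x is not None:
            inputs.append(new_x)
print(sorted(covered))  # ['other', 'target']
```

Starting from the single input 0, the loop flips one branch at a time and reaches both outcomes of the toy program.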
Our test generation technique uses sets of previously observed program runs as under-approximation
models for programs. To that end, each thread is subjected to sequential concolic testing, first,
to increase branch coverage in individual threads. After each execution, essential information
about the run (i.e., statements executed), current coverage (i.e., covered/skipped branches), and
writes to shared variables in the execution is stored. Upon saturation, multi-trace analysis is
used to generate new test inputs and thread schedules to cover previously uncovered branches.
The intuition behind this approach is that some of the bugs in concurrent programs might
be sequential bugs that do not relate to any specific interleaving. The idea is to catch those bugs
by sequential testing, which is cheaper than concurrent testing, without having to consider
the interleaving space. Then, concurrent test generation aims to cover the remaining uncovered
branches by exploring the input space and the interleaving space simultaneously to find a test
for each of them.
Our MTA works as follows: First, we select a target branch of interest based on the current
coverage information. Then, we pick a stored run that has been previously observed to come
close to the target branch but skipped it, i.e., the condition of the branch did not hold when
the branch was hit. Suppose that the uncovered branch depends on a set of shared variables
S. Generally, the branch condition may not be in terms of shared variables, but by intra-thread
def-use analysis we can determine the set of shared variables whose values affect the condition.
Then, we choose candidate interloper segments from the set of so-far stored runs, such that
the interloper segments contain a write to a shared variable in S. Note that these interloper
segments may contain executions of multiple threads. The idea is to select an interloper seg-
ment and insert it (not necessarily atomically) into the runs that came close to the target branch,
such that some of the shared variables on which the uncovered branch depends are overwritten
with values that could make the branch condition true.
We encode all possible interleavings where the interloper segment is inserted in the selected
run and the target branch condition is satisfied as an SMT problem. Any solution to this
problem implies input values and a schedule that covers the target branch.
In this section, we first briefly discuss how we perform sequential testing of concurrent pro-
grams as the first step of our test generation technique. Then, we present our multi-trace
analysis in detail.
In order to perform sequential testing of a concurrent program, we first execute the program
with a set of random inputs, I, to obtain a symbolic trace of the program (represented by π).
Then, we focus on sequential testing of each thread Ti at a time. Based on the observed trace,
we obtain a new symbolic trace π′ by adding ordering constraints between the events of
different threads in π such that thread Ti is executed
sequentially and without any interference from other threads (if possible).
To do so, we generate happens-before relations on the events of π to enforce all of the events
of other threads to happen after the last event of Ti in π. In cases where a complete sequential
execution of Ti is not feasible, the events of other threads are kept as late as possible.
For sequential testing of thread Ti, we apply a traditional concolic testing technique [4]
starting with input set, I, and following the schedule implied by π′. Then, we perform a depth-
first search, making a path constraint (i.e., conjunction of the conditions of the branches tra-
versed) corresponding to the inner-most uncovered branch in Ti while requiring the condition
of the uncovered branch to be true according to π′. A satisfiable solution for these constraints
provides input values that cover the branch.
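The sequentialization of an observed trace described above can be sketched as follows; the event representation is an assumption, and the "if possible" caveat (blocking interactions between threads) is ignored:

```python
def sequentialize(trace, tid):
    """Schedule derived from an observed trace: thread `tid` runs to
    completion first; the remaining events keep their relative order
    but are forced to happen after the last event of `tid`.
    Events are (thread_id, label) tuples (an assumed encoding)."""
    mine = [e for e in trace if e[0] == tid]
    others = [e for e in trace if e[0] != tid]
    return mine + others

trace = [(1, "a"), (2, "b"), (1, "c"), (2, "d")]
print(sequentialize(trace, 1))  # [(1, 'a'), (1, 'c'), (2, 'b'), (2, 'd')]
```

The happens-before constraints in the text correspond exactly to this reordering: every event of another thread is placed after the last event of the thread under test.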
Without loss of generality, we assume that there is an if-branch in thread Ti (whose condition
depends on a shared variable x) which could not be covered by sequential concolic testing.
Furthermore, we assume that there is a run rn which hits the corresponding if-statement
while the condition of the if-statement is evaluated to false. The main goal is to generate a
test (i.e., input values and a schedule) in which the last write to x before the if-branch in rn is
overwritten by another write to x and the branch is covered. To that end, we find an interloper
segment from a run (could be different from rn), with a write to x, that could be soundly (but
not necessarily atomically) inserted after the last write to x in rn, and search for possible input
values and schedules that realize this insertion.
Algorithm 1 presents our test generation technique using MTA. The inputs of the algorithm
include a concurrent program P , a set of branches Brs that are left uncovered during sequen-
tial testing, and a set of successful runs of the program Rns. Initially, Rns mostly contains
sequential runs, but over time it accumulates multi-threaded executions as well. In fact, we
extend the set of program runs (that form an under-approximation for the concurrent program)
4 while Rns′ ≠ ∅ do
7 while Vars ≠ ∅ do
11 gLoc ← getGlobalLocation(rn, e)
13 while segs ≠ ∅ do
16 if cs is satisfiable then
For each uncovered branch, the algorithm goes over the runs hitting the branch, tries to find interloper segments
from other runs (with a write to a variable var affecting the branch condition), inserts them into the runs hitting the
branch, and searches for input values and a schedule that cover the branch. The interloper segment should be inserted
between the last write (w) and the last read (r) of var before the branch. getGlobalLocation returns the global location at
the insertion point, findEnterloperSegments returns interloper segments, and logicalEncoder encodes the insertion problem
as a logical constraint system. Any solution to the constraint system implies input values and a schedule that cover
the branch. getUncoveredBranches returns the yet uncovered branches according to the newly executed test run.
after each program execution correspondingly. The set of branches Brs is also updated with
the branches skipped in the execution (if they have not been previously covered).
The main loop (lines 1-20) goes over the uncovered branches in Brs one by one and tries
to generate a test that covers the target branch. For each uncovered branch br, it picks a set of
runs Rns′ from Rns that hit the branch. Obviously, the branch condition is false
in all of these runs. Then, in lines 4-20, it iterates over these runs, searching for an appropriate
interloper segment that could be inserted in the run. For each of these runs rn, it first finds
the set of shared variables V ars whose values affect the branch condition by performing a
traditional def-use analysis on rn. Then, for each of these variables var, it tries to find a
segment containing a write to var that can be inserted after the last write to var in rn (lines 7-20).
For a variable var ∈ Vars, let (w, r) be a pair of events where w denotes the last write
to var before the target branch and r denotes the read of var just before the branch in rn. To
break this write/read matching and overwrite the effect of w, the interloper segment should be
inserted between w and r in rn. The interloper segments should be selected in such a way that
they could be inserted soundly in rn. At a minimum, threads executing in the segment should
be at the same locations as they are at the insertion point in rn, i.e., the global locations at the
beginning of the segment and insertion point should be the same. The while loop at line 10
goes over the global locations at an event e in rn, such that w < e < r, where < represents the
order of the events in rn, and tries to find an appropriate set of interloper candidates.
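The selection of the (w, r) pair and the candidate insertion points between them can be sketched as follows; the event encoding is an assumption:

```python
def insertion_points(run, var, tid):
    """Find (w, r, candidates): w is the last write to `var` before the
    read r of `var` by thread `tid`; candidate insertion points are the
    positions e with w < e < r. Events are (thread_id, kind, var)."""
    r = max(i for i, (t, k, v) in enumerate(run)
            if t == tid and k == "read" and v == var)
    w = max(i for i, (t, k, v) in enumerate(run)
            if k == "write" and v == var and i < r)
    return w, r, list(range(w + 1, r))

run = [(1, "write", "x"), (1, "other", "-"), (1, "read", "x")]
print(insertion_points(run, "x", 1))  # (0, 2, [1])
```

Each candidate position corresponds to one global location at which findEnterloperSegments is asked for matching segments.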
Given a global location gLoc, a variable var V ars, and a set of runs Rns, algorithm
findEnterloperSegments returns a set of segments from Rns such that the global location at
the beginning of the segments is consistent with gLoc and each segment contains a write to var.
We discuss findEnterloperSegments (Algorithm 2) in detail, later. The while loop at line 13,
goes over the interloper segments and calls logicalEncoder engine which generates a set of
logical constraints encoding the set of all feasible runs of the program that result from inserting
a specific segment seg at gLoc in rn before the affecting read r such that the condition of br
is satisfied. An SMT solver is then
used to check the satisfiability of the constraints. Any model satisfying the constraints implies
a set of input values and a schedule that would cause the branch to be covered. In that case,
the program is executed with the generated inputs according to the generated schedule to get a
new run rn′. Then, rn′ is added to Rns and the skipped branches in rn′ are added to Brs if
they have not been previously covered.
Given a global location gLoc, a shared variable var, and a set of runs Rns, findEnterloperSeg-
ments (Algorithm 2) returns a set of segments from Rns such that the global location at the
beginning of the segments is consistent with gLoc and each segment contains a write to var.
The while loop at line 3 goes over the runs in which there is at least one write to var. For each
run, it iterates over the set of writes to var and finds candidate segments containing a write w
to var and starting at a global location consistent with gLoc. Note that write w might be pro-
tected by some locks in the corresponding thread. In that case, the interloper segment should
release all of those locks to let other threads be able to obtain the locks in the future without
being blocked. Therefore, to build the interloper segment, the algorithm first moves forward to
the first event ev after w in rn where the corresponding thread of w does not hold any locks.
Let getFirstLockFreePoint(rn, w) return such event ev at line 7 (ev is equal to w when w is not
a protected write).
Then, event ev is added to the segment and the algorithm moves backwards in rn, adding
events to the segment, until it reaches w (lines 10-13). Then, it continues moving backwards
in rn, considering the preceding events and adding them to the segment, until it reaches
a global location consistent with gLoc (if possible). Note that there might be some threads
not active in the segment. Requiring the location of such threads to match with gLoc is too
restrictive and could miss useful segments. Therefore, as we move backwards in rn, we keep
track of the active threads in a set Threads. While moving backwards in rn, the algorithm
3 while Rns′ ≠ ∅ do
6 foreach w ∈ Wrts do
7 ev ← getFirstLockFreePoint(rn, w)
8 seg ← ev
9 Threads ← {tid(ev)}
10 while ev ≠ w do
11 ev ← preceding event of ev in rn
12 seg ← ev.seg
15 while gLoc|Threads ≠ gLoc′|Threads and ev is not the first event in rn do
16 ev ← preceding event of ev in rn
18 seg ← ev.seg
22 return segs
Given a global location gLoc, variable var, and a set of runs Rns, the algorithm returns all interloper segments
consisting of a write to var from Rns which start at a global location consistent with gLoc and end at the first
lock-free point after the write. getFirstLockFreePoint returns the first lock-free point after a given write in a given
run. The algorithm then moves backwards in the run, until it reaches a global location consistent with gLoc (if
possible). Threads keeps track of the active threads in the interloper segment, which are the only threads whose
locations are required to match gLoc at the beginning of the interloper segment.
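The backward construction performed by Algorithm 2 can be sketched as follows; lock handling (getFirstLockFreePoint) is elided, and the event and location encodings are assumptions:

```python
def segment_start_matches(seg, threads, g_loc):
    # The location of each active thread at the segment start is taken
    # from its first event in the segment (an assumed encoding).
    for t in threads:
        first = next(e for e in seg if e[0] == t)
        if g_loc.get(t) != first[1]:
            return False
    return True

def find_interloper_segments(runs, var, g_loc):
    """For every write to `var`, grow a segment backwards until the
    active threads' start locations match g_loc (if possible).
    Events are (thread_id, location, kind, var)."""
    segments = []
    for run in runs:
        for i, ev in enumerate(run):
            tid, loc, kind, v = ev
            if kind != "write" or v != var:
                continue
            seg, threads, j = [ev], {tid}, i
            while not segment_start_matches(seg, threads, g_loc) and j > 0:
                j -= 1
                seg.insert(0, run[j])      # extend the segment backwards
                threads.add(run[j][0])     # the thread becomes active
            if segment_start_matches(seg, threads, g_loc):
                segments.append(seg)
    return segments

run2 = [(2, "L8", "branch", "-"), (2, "L9", "write", "x")]
segs = find_interloper_segments([run2], "x", {2: "L8"})
print(len(segs), len(segs[0]))  # 1 2
```

Only threads that actually contribute events to the segment are required to match gLoc, mirroring the Threads set in Algorithm 2.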
searches for a global location gLoc′ where the projections of gLoc′ and gLoc to the threads in
Threads are equal (lines 15-19); i.e., gLoc|Threads = gLoc′|Threads. If such a global location is
reached, then the segment is added to the set of appropriate segments which is returned by the
algorithm.
Given a run rn, an uncovered branch br (which is skipped in rn), a shared variable var that
affects the branch condition, an event r in rn which reads from the shared variable var, an
interloper segment seg, an event w in seg that writes to var, and a global location gLoc in rn
representing the insertion point, logicalEncoder generates a logical constraint system encoding
the set of all feasible runs, in which the schedule is the same as in rn until reaching gLoc and
then events in the interloper segment are interleaved with the events in rn after global location
gLoc in a way that r is guaranteed to read the value written by w, and the condition of br is
satisfied.
We call the event sequence in rn before the global location gLoc, the prefix segment and
the event sequence after gLoc and before br, the main segment. The inputs of the program are
treated symbolically such that we could use SMT solvers [9, 10] to simultaneously search for
input values and a schedule that would cause br to be covered. The SMT encoding is based
on the concurrent trace programs (See Definition 3.2.4) of the main and interloper segments.
However, unlike symbolic prediction, we consider symbolic values for inputs to be able to explore the input space as well.
Let CTPmain and CTPint denote the CTPs of the main and interloper segments, respectively. Note that findInterloperSegments ensures that the location of each thread that is active in the interloper segment is the same at the beginning of both segments. Therefore, each active thread in the interloper segment has a maximum common prefix of locations in both CTPmain and CTPint. The thread may then diverge after this prefix in the two segments.
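The per-thread common prefix mentioned above can be computed with a simple scan; the location sequences below are hypothetical.

```python
# Maximal common prefix of each active thread's locations in the two CTPs;
# after the prefix, a thread may diverge between the segments.

def common_prefix_len(a, b):
    """Length of the longest common prefix of two location sequences."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n

main_locs = {"T1": [1, 2, 3, 4], "T2": [10, 11]}   # locations in CTP_main
int_locs  = {"T1": [1, 2, 7],    "T2": [10, 11]}   # locations in CTP_int
prefix = {t: common_prefix_len(main_locs[t], int_locs[t]) for t in main_locs}
# T1 shares [1, 2] and then diverges; T2 never diverges
```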
Suppose that Emain and Eint represent the sets of events in the main and interloper segments, respectively. Note that not all of these events may be required for test generation. Indeed, certain events may be inconsistent with each other, e.g., if they originated from diverging runs. Therefore, for each event ei ∈ Emain ∪ Eint, an indicator bit b_ei is considered whose value determines whether the event is required to happen before the target branch or not.
logicalEncoder generates a constraint formula Φ such that Φ is satisfiable iff there exist input values and a schedule (which follows the prefix segment and then interleaves the execution of threads in the main and interloper segments) that covers br. Φ consists of 7 different sub-formulas:
Φ = FP ∧ PO ∧ ST ∧ Π ∧ BR ∧ AWR ∧ Ind,
1. Prefix (FP): Encodes the events of the prefix segment, where e′i is the predecessor of ei in the prefix segment. This keeps the order of events in the prefix fixed. Note that g(ei) is required to be true in any case, since all of the events in the prefix segment are executed.
2. Partial Order (PO): Defined as in Section 3.2.3. This preserves the order of events in both main and interloper segments.
3. Statements (ST): Defined as in Section 3.2.3. This encodes the statements in both main and interloper segments.
4. Φ-Functions (Π): Define a new φ-function for each shared variable use in Emain ∪ Eint ∪ {r} to include definitions in the prefix, main, and interloper segments. As in standard CSSA-based encodings, a use v = φ(v1, ..., vk) takes the value vi iff its defining event happens last before the use e: [HB(ei, e) ∧ ⋀_{j=1, j≠i}^{k} (HB(ej, ei) ∨ HB(e, ej))], where ei denotes the event that defines vi.
5. Branch (BR): Encodes covering the uncovered target branch. In rn, there is an event ebr that relates to the statement of the target branch with condition c. Let BR = c ∧ g(e′), where e′ is the predecessor of ebr in the corresponding thread. This guarantees that the execution reaches the branch and its condition is satisfied.
6. Affecting Write/Read Matching (AWR): Let Wvar represent the set of all events that write to var. The constraint, stated over happens-before relations among w, r, and the other writes in Wvar (e.g., HB(w, ei)), guarantees that the read of var in r is matched with the write to var in w.
7. Indicators (Ind): For each event ei, Ind contains a constraint saying that if b_ei is true then the path condition of ei should be true as well and ei should happen before the target branch; otherwise, it should happen after the branch. Ind also contains implications (b_ei → b_ej) saying that each event ei happening before the branch requires that its preceding event ej (in the same thread) also happen before the branch.
Let de^main_Ti and de^int_Ti represent the first events of thread Ti after the common prefix in the main and interloper segments, respectively. Since a thread cannot execute both de^main_Ti and de^int_Ti before the branch, for each thread after this point we should consider events either from the main segment or from the interloper segment; i.e., for each active thread Ti in the interloper segment, at most one of b_{de^main_Ti} and b_{de^int_Ti} can be true.
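As a non-symbolic illustration of what Φ asks the SMT solver to find, the following brute-force sketch enumerates interleavings of hypothetical main and interloper segment events (the prefix stays fixed) and keeps only those in which the read r observes the write w:

```python
# Brute-force analogue of the search delegated to the SMT solver: enumerate
# interleavings preserving each segment's internal order, keep those where
# r reads the value written by the interloper write w.
from itertools import permutations

def linearizations(main, inter):
    """All interleavings preserving each segment's internal event order."""
    for order in set(permutations(main + inter)):
        if [e for e in order if e in main] == list(main) and \
           [e for e in order if e in inter] == list(inter):
            yield order

def r_reads_w(order, r, w, writes):
    """True iff w is the last write to the variable before r in order."""
    last = None
    for e in order:
        if e in writes:
            last = e
        if e == r:
            return last == w
    return False

main  = ["w1:x=0", "r:read x", "br:x>0"]   # hypothetical main segment
inter = ["w:x=5"]                          # hypothetical interloper segment
good = [o for o in linearizations(main, inter)
        if r_reads_w(o, "r:read x", "w:x=5", {"w1:x=0", "w:x=5"})]
```

Only the interleaving that places the interloper write between the local write and the read survives; the SMT encoding finds such a witness symbolically, together with input values.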
Discussion on Scalability. As mentioned in Section 2.7, the main drawback of symbolic prediction (using CTPs) is that it encounters scalability issues for large runs. The reason is that the SMT encoding is done at the statement level, which considers local computation as well as communication among threads; hence the size of the SMT problem grows rapidly as the size of the CTP increases. Although our test generation technique here utilizes CTPs, the proposed SMT encoding is scalable to large CTPs. This is because, normally, both the main and interloper segments are very small compared to the length of the runs. The SMT encoding keeps a large part of the generated schedule (i.e., the prefix segment) fixed and only encodes the interleavings of events in the main and interloper segments. This decreases the size of the generated SMT problem. We show this empirically in Section 3.5.
3.5 Evaluation
We have implemented the test generation technique based on MTA on top of FUSION [90]. FUSION is modified to consider symbolic values for input variables. We subjected our tool to a set of benchmark programs to evaluate our testing technique. In the following, we briefly discuss the implementation and then present our
experimental results.
3.5.1 Implementation
Figure 3.4 presents the architecture of the tool. It has four main components: a sequential
concolic execution engine, a multi-trace analysis engine, an SMT solver, and a concurrent
execution engine. Sequential concolic execution engine performs sequential testing of con-
current programs as discussed in Section 3.4.1. Multi-trace analysis engine consists of four
sub-components: coverage-guided target selection, concurrent test run selection, interloper se-
lection, and multi-trace SMT encoder. Coverage-guided target selection selects a target branch
according to the number of attempts that have been made for covering that branch; branches with fewer attempts have priority over the others. For a target branch, concurrent test
run selection returns a set of runs with different prefixes before skipping the branch. Interloper
selection and multi-trace SMT encoder implement the algorithms presented in Section 3.4.
Given a set of inputs and a schedule, the concurrent execution engine executes the program with those inputs under the given schedule.
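Coverage-guided target selection, as described above, can be sketched with a priority queue; the branch names and attempt counts here are hypothetical.

```python
# Pick the uncovered branch with the fewest covering attempts so far.
import heapq

def select_target(attempts):
    """attempts maps each uncovered branch to its number of attempts."""
    heap = [(n, br) for br, n in attempts.items()]
    heapq.heapify(heap)
    _, br = heapq.heappop(heap)
    return br

target = select_target({"br1": 3, "br2": 0, "br3": 1})
```

`br2`, with zero attempts, is selected first; this spreads the covering effort evenly across uncovered branches.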
3.5.2 Experiments
We evaluated the effectiveness and efficiency of the test generation technique using MTA on a set of benchmark programs.
apache1 and apache2 are programs corresponding to two bugs in A PACHE FTP server from
BugBench [46]. apache1s and apache2s are simplified versions of apache1 and apache2,
respectively, where we removed parts of the code that were immaterial to branches with re-
spect to shared variables. ctrace is a fast, lightweight trace/debug C library. ctrace1 and
ctrace2 are two test drivers using this library, which contain some data races. splay is a
program built using a C library implementing several tree structures. Finally, aget is a multi-threaded download accelerator.
Table 3.1: Experimental results.

Program (LOC)   | Num. of inputs | Num. of branches | Seq. conc.: tests / covered / time(s) | MTA: tests / covered / time(s) | Branches with interlopers | Avg. interloper segs | Bug found | % branch coverage
bluetooth (88)  | 3 each         | 12 | 2 / 7 / 1    | 4 / 5 / 3 | 1 | 1   | yes | 58 → 100
apache1s (253)  | 1 each         | 8  | 4 / 7 / 3    | 1 / 1 / 1 | 2 | 8   | yes | 87 → 100
apache1 (640)   | 3 each         | 22 | 6 / 16 / 10  | 2 / 2 / 2 | 2 | 32  | yes | 72 → 81
apache2s (268)  | 2 each         | 10 | 3 / 7 / 2    | 2 / 2 / 3 | 1 | 5   | yes | 70 → 90
apache2 (864)   | 3 each         | 22 | 4 / 15 / 9   | 1 / 1 / 3 | 2 | 81  | yes | 68 → 73
aget (680)      | 1 (fixed) each | 18 | 1 / 12 / 121 | 1 / 1 / 2 | 1 | 179 | yes | 66 → 72
Table 3.1 presents the experimental results. We report the number of inputs of programs and
for each thread in each program we show the total number of branches reported by FUSION. Note that due to the simplifications (e.g., constant propagation) applied to the observed traces by FUSION, the number of branches reported by FUSION is less than the actual number of branches in the program. For example, FUSION omits branches which depend only on local variables or relate to sanity checks on the system execution and does not include them in its reports.
The table also contains information about the sequential testing of threads and the multi-trace analysis. For sequential testing, we report the number of generated tests, the number of covered branches, and the total time spent in testing each thread. For multi-trace analysis, we report the number of generated tests, the number of covered branches, the total time, whether any bug is found, the number of branches for which some interloper segments were found, and the average number of interloper segments found per branch. The table also presents the improvement in the percentage of branch coverage achieved by MTA over sequential testing.
Observations: The experiments show that the sequential testing is able to cover a large number
of branches quite fast. Note that some programs in our benchmark suite have fixed inputs. For
example, aget expects a URL as input which we fixed for testing purposes. The sequential
testing of these programs consists of a single execution of the program. As can be seen in the
table, execution of these programs itself takes a considerable amount of time.
The experiments also show that MTA is successful in increasing branch coverage over se-
quential testing. For example, in case of splay, MTA increases branch coverage from 33% (in
sequential testing) to 80%. According to the number of branches with interloper segments, we can see that MTA could find interloper segments for only a few of the uncovered branches. However, often a test generated to cover an uncovered branch can lead to covering some other yet uncovered branches as well. Furthermore, trying only a few interloper segments was often sufficient to cover the target branch.
Another observation is that the total time spent on MTA is reasonable in practice. This is largely due to (1) relying on the strength of sequential testing techniques to cover most branches
sequentially, and (2) the effectiveness of MTA in finding interloper segments to cover target
branches.
Furthermore, MTA is very effective in finding concurrency bugs. All of the bugs found in the benchmark suite were revealed by MTA by covering a branch that was not coverable in sequential testing. This suggests that branches that cannot be covered by purely sequential testing are good candidates for exposing concurrency bugs.
Comparison with prediction techniques: Some programs in our benchmark suite have fixed
Program  | FUSION (random inputs): runs / time / bug found | FUSION (MTA inputs): runs / time / bug found
apache1s | 35 / 9s / no | 2 / 1s / yes

Table 3.2: Comparing MTA with symbolic prediction using FUSION. DNF: did not finish.
inputs. One could claim that predictive analysis may generate the same results on these benchmarks. Note that MTA is different from predictive analysis (for which the inputs are fixed as well) in the sense that MTA aims to increase branch coverage (by exploring both input and interleaving spaces of the program) while predictive analysis does not perform input exploration.
To investigate the need for automated input generation for concurrent programs, we com-
pared our tool with FUSION [90] on our set of benchmarks. FUSION tries to find concurrency bugs such as data races. Since it does not report on coverage, here we only highlight
whether the known bugs in the benchmarks are discovered by FUSION or not. In Table 3.2, we report the number of generated runs (i.e., schedules in this case) and the time spent by FUSION
on prediction with both some randomly generated fixed inputs and some bug-triggering inputs
generated by MTA.
We observe that for most of the benchmarks, FUSION cannot discover the bugs with random
inputs. It can spend substantial analysis time in searching for alternative thread interleavings
without finding the known bugs, as in the case of apache2. However, it performs pretty well for some of the benchmarks with bug-triggering inputs generated by MTA. For ctrace
benchmarks, FUSION did not finish due to the large overhead of symbolic prediction using
CTPs which encodes all feasible permutations of events considering all computation (global
as well as local) in the observed run. Although MTA also utilizes CTPs to generate tests, it
was able to handle ctrace benchmarks, since as discussed before, often a large prefix of the
generated schedule is fixed according to the runs in which interloper segments are inserted.
Conclusion: Our experiments showed that MTA is very effective in increasing branch coverage in concurrent programs. Furthermore, using MTA, we could find a large number of
bugs in our benchmarks. This confirms that increasing branch coverage in concurrent programs
is an appropriate approach for bug finding. We also compared MTA with FUSION, which is a symbolic prediction tool that explores the interleaving space with fixed inputs. Our experiments showed that applying FUSION with some randomly generated inputs on concurrent
programs in most cases fails to find program bugs. However, our MTA analysis was able to find
the bugs in the benchmarks as it performs input exploration as well as interleaving exploration.
This shows the need for automated input generation in bug finding.
Similar to MTA, some recent works [80, 79] use concurrent trace programs as approximation models of concurrent programs. These works target assertion violations, i.e., input/interleaving exploration is tailored towards finding assertion violations. The technique in [80] utilizes concurrent trace programs to capture a suitable communication for finding assertion violations (in
concurrent trace programs). In [79], a two-staged analysis is proposed which separates intra-
and inter-thread reasoning. The first stage uses sequential program semantics to obtain a pre-
cise summary of each thread in terms of the accesses to shared variables made by the thread.
The second stage performs inter-thread reasoning by composing these thread-modular summaries using the notion of sequential consistency to find assertion violations. However, there are important differences between MTA and these works; MTA targets test generation, exploring different program behaviours according to the uncovered parts of the program. Moreover, these
techniques keep the under-approximation fixed and are restricted to concurrent trace program
behaviours; e.g., they cannot check program assertions that do not show up in the concurrent
trace program. MTA, on the other hand, extends the program approximation during the testing
process. It uses program approximations to generate tests that target exploring parts of the
program that do not appear in the approximation; i.e., it explores program behaviours that are not captured by the current approximation.
There are also some works exploiting analyses based on (def-use) pairs of shared variables in concurrent programs. For example, Shi et al. use invariants based on a def-use relation (obtained from a set of bug-free program runs) for bug detection and error diagnosis [76]. Wang et
al. [91] follow a similar approach for coverage-guided testing. They utilize dynamic informa-
tion collected from bug-free test runs to learn ordering constraints over the memory-accessing
and synchronization statements. These ordering constraints are treated as likely invariants and
are used to guide the selection of interleavings for future test execution. However, none of these techniques explores the input space of the program.
Our notion of interloper segments is related to a work by Shacham et al. [75], where they test composed concurrent operations against a single adversarial environment. There, the search is guided towards interleaving non-commutative simple operations. However, interloper segments used by MTA may contain events from multiple threads, and are used to increase branch coverage in concurrent programs.
At a high level, our main insight of separating sequential coverage and leveraging it for concurrent programs is similar to the insight by Joshi et al. [38] that many bugs in concurrent programs can be found by sequential analysis. Their goal, however, was to improve the usability of concurrent bug finding tools by filtering away bugs that can be reproduced purely sequentially.
3.7 Summary
In this chapter, we presented a test generation technique for concurrent programs. We used concurrent trace programs as approximation models and developed a multi-trace analysis to cover the uncovered parts of the program based on the executions seen so far. More specifically, the analysis targets increasing branch coverage in concurrent
programs. It combines information available in multiple runs of the program to (1) focus on an
interloper segment of a run that provides values needed to take an uncovered branch, (2) insert
the interloper segment in another run by searching for input values and an interleaving that
would result in covering the uncovered branch. The multi-trace analysis encodes this problem
as a set of logical constraints and uses SMT solvers to search for possible input values and in-
terleaving. Our test generation technique utilizes the fact that the state-of-the-art sequential test
generation techniques are generally able to quickly cover a large part of the program. There-
fore, at the beginning, individual threads are exposed to sequential testing to increase branch
coverage as much as possible. Upon saturation, our test generation technique falls back to the multi-trace analysis. We implemented our technique on top of FUSION [90], a tool that supports concurrent C programs. We performed experiments that show the effectiveness of our multi-trace analysis in increasing branch coverage and finding concurrency bugs.
Testing techniques for sequential programs are often coupled with a notion of coverage that the
techniques provide. Different program coverage criteria (e.g., path coverage, branch coverage,
etc.) have been introduced and targeted by sequential testing techniques. These coverage
criteria quantify the effort put into the testing process in a meaningful way. For example, a
sequential testing technique that provides branch coverage assures the tester that all of the branches in the program that could be covered under some input values are actually explored.
Providing such coverage guarantees for concurrent programs is challenging since in ad-
dition to the input values, the exploration space is affected by the interleaving of execution of
threads. Heuristics like context bounding [53, 54, 55] and delay bounding [12] were introduced
and used by many techniques to reduce the exploration space into a manageable set that pro-
vides a meaningful coverage for concurrent programs. They characterize a subset of the search
space by a bounding parameter p. As p is increased, more program behaviors are explored and
in the limit it is guaranteed that all program behaviors are explored. For example, CHESS [53] is a tool from Microsoft that provides coverage guarantees on the interleaving space, i.e., all thread interleavings (within the given bound) are explored while keeping the inputs
CHAPTER 4. BOUNDED-INTERFERENCE: A HEURISTIC FOR PROVIDING COVERAGE GUARANTEES 84
of the program fixed. Also, several sequentialization techniques [44, 42, 85, 62, 63] utilize
the context bounding heuristic to search over both input and interleaving spaces (modulo con-
text bound) for finding assertion violations in concurrent programs. The main problem with
these heuristics is that they are defined based on the notion of control-flow among the threads (ignoring data completely), and therefore search strategies that utilize these heuristics are not guaranteed to be efficient. In fact, many thread interleavings might be equivalent to each other according to the way threads interfere with each other, and therefore exploring all such interleavings is redundant.
In this chapter, we introduce bounded-interference, a new heuristic for generating tests with coverage guarantees for concurrent programs. An interference happens whenever a
thread reads a value from a shared variable which is provided by another thread. The bounded-interference heuristic is parameterized by the number of interferences among threads. The advantage of this heuristic is that, since it is defined from the point of view of flow of data between threads (in contrast to
the control-based notions such as context bounding), it can be very naturally incorporated into
a setting in which the search for input values and interleaving can be performed in a unified
manner. This heuristic might have applications beyond test generation; e.g., it can be used in model checking and program verification, bug localization, bug fixing, etc. Here, we focus on the test generation aspect. Utilizing this heuristic, we have developed two different test generation techniques, based on concolic testing techniques, for concurrent programs, which are presented in detail in the following chapters.
An interference happens whenever a thread reads a value from a shared variable which is provided by another thread. The idea is to incrementally increase the degree of interference among threads while exploring program behaviours; i.e., first all
program behaviours without any interference are explored. After that, all program behaviours
with only one interference are explored. Then, all program behaviours with only two interferences are explored, and so on. In the following, we present the application of the bounded-interference heuristic on an example.
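The incremental exploration just described can be sketched as an iterative deepening loop over the interference bound; `explore(k)` is a placeholder for any search that enumerates behaviours with exactly k interferences, and the search space below is hypothetical.

```python
# Iterative deepening over the interference bound: explore behaviours with
# 0 interferences, then 1, then 2, ..., stopping at the first bug.

def bounded_interference_search(explore, max_bound):
    for k in range(max_bound + 1):
        for behaviour in explore(k):
            if behaviour.get("bug"):
                return k, behaviour  # minimal interference bound exposing a bug
    return None

# Hypothetical search space in which a bug needs two interferences.
space = {0: [{"bug": False}], 1: [{"bug": False}], 2: [{"bug": True}]}
result = bounded_interference_search(lambda k: space.get(k, []), 3)
```

The returned bound is minimal by construction, mirroring the digression from fully sequential behaviour one interference at a time.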
Figure 4.1 shows a simplified model of the Bluetooth driver [62]. There are two dispatch
functions, called Add and Stop. Function Add is called by the operating system to perform I/O
in the driver and Stop is called to stop the device. There are four shared variables: pendingIO,
stoppingFlag, stoppingEvent, and stopped. The variable pendingIO is initialized to 1 and keeps track of the number of concurrently executing threads in the driver. It
is incremented atomically whenever a thread enters the driver and is decremented atomically
whenever it exits the driver. The boolean variable stoppingFlag is initialized to false and
will be set to true to signal the closing of the device. New threads are not supposed to enter
the driver once stoppingFlag is set to true. Variable stoppingEvent is initialized to false,
and will be set to true after pendingIO becomes zero. Finally, stopped is initialized to false
and will be set to true once the device is fully stopped; the thread stopping the driver sets it to
true after it is established that there are no other threads running in the driver. Threads that call
function Add expect stopped to be false (assertion at line 10) after they enter the driver.
Consider a concurrent program with two threads, T and T′, calling Add and Stop, respectively. The assertion at line 10 in function Add ensures that the driver is not stopped before
T starts working inside the driver. It is easy to see that this assertion always passes if T is
executed sequentially, i.e., without any interference from T′. Therefore, if the assertion at line 10 is to be violated, it will have to be with some help from T′, where a shared variable read in T is provided by a write in T′.
We start by digressing slightly from the fully sequential execution of T, by letting only one read of a shared variable in T be non-local. If the read from stoppingFlag in T reads the value written by T′ at line 22, then the assert statement at line 10 is not reachable, since the if-branch at line 9 will not be covered. Selecting the read from pendingIO at line
6 in T as the non-local read forces the read from stopped in the assertion statement to read the initial value false (since only one read can be non-local), and hence the assertion check will
be passed successfully. Finally, if we select the read from stopped in the assertion statement as the non-local read, then it has to read the value written by T′ at line 30. However, since both threads read and write pendingIO at lines 6 and 24, there is no interleaving in which none of the reads from pendingIO is non-local. Therefore, the assertion cannot be violated by
making only one read non-local. So, we digress more by allowing two reads of shared variables
to be non-local.
With two non-local reads, one can see that the assertion at line 10 can be falsified if the reads from pendingIO (at line 6) and stopped (at line 10) read the values written by T′ at lines 24 and 30, respectively. A feasible interleaving that realizes this interference scenario would be the one in which T is executed first until it evaluates the branch condition at line 3, then T′ runs to completion, and finally T resumes, reading the values written by T′ and violating the assertion.
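This interference scenario can be replayed on a toy Python model. The driver logic below is reconstructed from the prose description (variable names from the text); the exact statement-level code of Figure 4.1 is an assumption.

```python
# Toy model of the simplified Bluetooth driver scenario: threads are
# generators yielding between steps so a scheduler can interleave them.

class Driver:
    def __init__(self):
        self.pendingIO = 1          # threads currently in the driver
        self.stoppingFlag = False
        self.stoppingEvent = False
        self.stopped = False

def add_steps(d, log):
    """Thread T (Add)."""
    if not d.stoppingFlag:          # line 3: branch on stoppingFlag
        yield
        d.pendingIO += 1            # enter the driver
        yield
        _ = d.pendingIO             # line 6: read pendingIO
        yield
        log.append(("assert_holds", not d.stopped))  # line 10 assertion

def stop_steps(d, log):
    """Thread T' (Stop)."""
    d.stoppingFlag = True           # line 22: signal closing
    yield
    d.pendingIO -= 1                # line 24: leave the driver
    if d.pendingIO == 0:
        d.stoppingEvent = True
    yield
    if d.stoppingEvent:
        d.stopped = True            # line 30: device fully stopped

def run(schedule):
    d, log = Driver(), []
    threads = {"T": add_steps(d, log), "T2": stop_steps(d, log)}
    for tid in schedule:
        try:
            next(threads[tid])
        except StopIteration:
            pass
    return log

seq = run(["T"] * 4)                                # fully sequential T
bad = run(["T", "T2", "T2", "T2", "T", "T", "T"])   # the scenario above
```

In the sequential run the assertion holds; under the schedule in which T evaluates its branch, T′ runs to completion, and T resumes, the assertion observes stopped = true and is violated.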
The concept of context bounding was first introduced by Qadeer et al. [62] and later used by
many techniques in test generation [53, 55], model checking [61], and sequentialization [44, 42,
85]. It characterizes a subset of program executions with a bounding parameter, i.e., the number
of context-switches between threads. The idea is to incrementally increase this bound (starting
from 0) and explore all program executions within the bounded number of context-switches.
It is based on the conviction that most concurrency errors will be discovered within a small
number of context-switches. In practice, the bound for the number of context-switches cannot
go beyond 2 or 3 for real programs because of the large number of interleavings required to be
explored even for a small bound. The main drawback of the context bounding heuristic is that
it is defined based on the notion of control-flow among the threads and it completely ignores
data-flow. In fact, many thread interleavings might be equivalent to each other according to the way threads interfere with each other, and therefore exploring all such interleavings imposes a significant, unnecessary cost.
The bounded-interference heuristic, on the other hand, is defined based on the notion of data flow among threads. All program executions with exactly the same set of interferences are behaviourally equivalent under the same input values, no matter how the executions of threads are interleaved. Therefore, for given input values it suffices to try only one of the possible interleavings that realize each interference scenario. This property presents an advantage of bounded-interference over context bounding; under context bounding, all possible interleavings (no matter whether they introduce new interference scenarios or not) within the context-switch bound have to be explored.
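The equivalence underlying this argument can be illustrated with a small sketch: interleavings that induce the same reads-from (interference) relation form one equivalence class, so one representative per class suffices. Events here are hypothetical (thread id, op, variable) triples.

```python
# Group interleavings by the reads-from relation they induce: each read is
# matched with the last preceding write of the same variable.

def reads_from(schedule):
    last_write, rf = {}, []
    for ev in schedule:
        tid, op, var = ev
        if op == "w":
            last_write[var] = ev
        elif op == "r" and var in last_write:
            rf.append((ev, last_write[var]))
    return frozenset(rf)

s1 = [("T1", "w", "x"), ("T1", "r", "x"), ("T2", "w", "y")]
s2 = [("T1", "w", "x"), ("T2", "w", "y"), ("T1", "r", "x")]
s3 = [("T2", "w", "y"), ("T1", "w", "x"), ("T1", "r", "x")]
signatures = {reads_from(s) for s in (s1, s2, s3)}
```

All three interleavings induce the same reads-from relation, so under fixed inputs exploring one of them suffices, while context bounding would treat them as distinct interleavings to explore.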
One interesting question is whether there is any relation between the minimum bounds required to catch a bug using the bounded-interference and context bounding heuristics, respectively. Note that in both heuristics, all program behaviours will be explored in the limit, and
therefore all program bugs will be caught by both heuristics in the limit. However, the problem is that the search space (even for a small bound) can be really large for real programs, which makes it impossible to reach the limit. Theoretically, every bug that can be discovered using the bounded-interference heuristic with bound k is revealed by some execution with a minimum number of context-switches, say k′, and therefore is discoverable using the context bounding heuristic with a bound at least as large as k′. However, there is no relation between k and k′ in general; there are program bugs for which k is smaller than k′ and program bugs in which k′ is smaller than k. For example, the bug in the program presented in Section 4.1 requires at least 2 non-local reads. Cases where k′ is smaller than k are easy to find; there a single context-switch can introduce several interferences. However, the cases where k is smaller than k′ might not be so obvious in the first place. In the following, we present a buggy program which requires at least 3 context-switches but only one interference to be caught.
Consider the buggy program in Figure 4.2. There are two threads, Thread1 and Thread2, where both have a batch update section in which they update a shared memory location G
several times. Let us assume that according to the specification of the program, threads cannot
be in their batch update section simultaneously. The input variable turn determines which
thread can perform its updates first. Shared variables start and done identify the thread that
just started and finished its updates, respectively. Each thread waits until turn indicates that it
has permission to start its batch update (implemented by a while loop with a wrong condition).
As soon as it gets the permission it updates start and enters the batch update section where it
updates G 50 times. After performing the updates, it gives the turn to the other thread to give
it a chance to perform its updates (if it has not performed its updates yet) and updates done
accordingly. An assertion then ensures that if the thread is the most recent one that started
updating G and also the most recent one that finished its updates, then turn should be set for
the other thread. Assume that the assertion statement is executed atomically.
However, the wrong condition of the while loops would cause input values other than 1 and 2 to violate the specification requirement. This bug requires at least 3 context switches to be
found, i.e., k′ = 3. One execution that violates the assertion in Thread1 would be the one in
which Thread2 is executed until it updates the start variable. Then, there is a context-switch and Thread1 enters its batch update section (overwriting start), performs its batch updates,
and updates the turn variable. Then, there is another context-switch and Thread2 completes
its execution by performing its batch updates, overwriting turn and updating variable done.
Then, another context-switch happens and Thread1 continues execution by overwriting done.
At this point, the assertion in Thread1 is falsified and the bug is revealed. However, the bug can be caught by the bounded-interference heuristic with the bound set to one (i.e., k = 1); it is enough to only consider the read from turn in the assertion as a non-local read that reads the value written by Thread2 while finishing its
batch updates. Note that any schedule that realizes this interference scenario still requires 3
context-switches, but this way, one can focus on finding a feasible schedule (e.g., encoding it as an SMT problem) realizing the interference scenario as opposed to exploring all interleavings explicitly. In summary, neither heuristic is guaranteed to always perform better than the other in catching concurrency bugs. We believe that these heuristics can be viewed as two complementary heuristics and suggest that program analysis techniques employ them in combination.
The bounded-interference heuristic is defined based on the notion of data flow among threads.
Therefore, it can be incorporated into the sequential concolic testing techniques to explore the
interference scenario space (in addition to input exploration) to provide coverage guarantees
for concurrent programs. We have developed two different techniques with coverage guarantees based on this heuristic: bounded-interference sequentialization and bounded-interference concolic testing, obtained by incorporating the bounded-interference heuristic into sequential concolic testing. Here, we briefly discuss these two techniques; a detailed description of each follows in the next chapters. The sequentialization-based technique transforms concurrent programs into sequential programs, with the aim of being able to apply sequential concolic testing techniques to the generated sequential program without any modification. The sequentialization is parameterized by a bound on the number of interferences; given a concurrent program and a bound k on the number of
interferences, the sequentialization transforms the concurrent program into a sequential pro-
gram such that every execution of the sequential program corresponds to an execution of the concurrent program with at most k interferences. To be able to use the power of
sequential concolic testing techniques (namely, input exploration) to explore both input and in-
terference scenario spaces of concurrent programs, inputs of the generated sequential programs
consist of the inputs of the corresponding concurrent program as well as some other inputs that
specify interference scenarios (i.e., non-local reads and their matching writes). Therefore, by
applying concolic testing on the generated sequential program, the input space and the inter-
ference space of the concurrent program will be explored systematically. The main advantage
of this technique is that one can utilize off-the-shelf sequential concolic testing tools without any modification and benefit from their coverage-oriented search algorithms to generate tests with coverage guarantees (modulo the interference bound), depending on the coverage guarantees that the underlying sequential concolic testing tool provides.
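The input layout described above can be sketched as follows; all names are hypothetical.

```python
# The sequentialized program's input vector carries the original inputs plus
# an encoding of the interference scenario (which reads are non-local and
# which writes they observe), so a sequential concolic tester explores both
# the input and interference spaces at once.

def make_sequential_inputs(program_inputs, nonlocal_reads, matching_writes):
    assert len(nonlocal_reads) == len(matching_writes)
    return {
        "inputs": list(program_inputs),
        "scenario": list(zip(nonlocal_reads, matching_writes)),
        "k": len(nonlocal_reads),  # number of interferences used (<= bound)
    }

vec = make_sequential_inputs([7, 42], ["read@6", "read@10"], ["write@24", "write@30"])
```

Mutating the scenario entries of this vector is how a concolic tester would enumerate interference scenarios, exactly as it enumerates ordinary input values.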
Bounded-interference concolic testing, on the other hand, adapts sequential concolic test-
ing techniques to support concurrent programs in the first place by incorporating the bounded-
interference heuristic in their search algorithms. To that end, sequential concolic testing is
equipped with one additional component, called the interference exploration component, that enumerates interference scenarios for uncovered parts of the program. This technique, like MTA-based concolic testing, exploits the fact that sequential concolic testing is able to quickly cover a large
part of individual threads. Therefore, individual threads are exposed to sequential concolic
testing first to increase coverage as much as possible in individual threads. Indeed, parts
of the concurrent program that are not covered by sequential concolic testing may require
some interferences among threads to be covered. After sequential concolic testing of threads,
bounded-interference concolic testing considers uncovered parts of the program and explores
the interference space (modulo the interference bound) searching for input values and thread
schedules that would result in covering an uncovered part of the program. The main advantage
of this technique is that the interference exploration component, embedded in bounded-interference concolic testing, provides coverage guarantees on its own, and the completeness of the technique does not depend on the coverage guarantees of sequential concolic testing tools. This technique is presented in Section 6 in detail.
Chapter 5
Testing Based on Bounded-Interference Sequentialization
Several sequentialization techniques have been introduced in the literature [44, 42, 85, 62, 63] with the aim of reducing
the problem of concurrent program analysis to sequential program analysis. Throughout the
sequentialization, a concurrent program is transformed into a sequential program such that the
sequential program embeds a subset of behaviours of the concurrent program. Then, avail-
able techniques for sequential program analysis can be utilized for analyzing the resulting
sequential program (and hence the concurrent program). Most of the proposed sequential-
ization techniques utilize the context bounding heuristic to reduce the search space, i.e., the
sequential program embeds a set of concurrent program behaviours within a bounded number
of context-switches.
However, there are some problems with these sequentialization techniques that make it
infeasible to apply traditional sequential testing techniques to the generated sequential programs: (1) The generated sequential program is highly non-deterministic. This is because it encodes a large set of interleavings of the concurrent program. Most sequential testing techniques (specifically those that provide coverage guarantees
like concolic testing) assume that sequential programs are deterministic and hence they do not handle such non-determinism. (2) Some of these sequentialization techniques [44, 42, 63] are aimed to be used in a static setting (where the programs are not
executed) and for finding assertion violations. According to these sequentialization techniques, if one wants to execute the generated sequential program, one has to guess the values of shared variables at the beginning of each context. As a result, the sequential program can reach unreachable states under wrong guesses. (3) The context bounding heuristic makes the exploration
process inefficient: many thread interleavings might be equivalent to each other according to the way threads interfere with each other, and therefore exploring all such interleavings is redundant.
Because of the aforementioned problems, sequentialization techniques have been used only
in static program analysis so far. Naturally, they suffer from static program analysis limitations;
they do not perform well regarding memory tracking and calls to function libraries for which
the source code is not available. They normally cannot handle complicated cases and also
suffer from false positives (i.e., they might warn users in cases where the program is correct).
Our technique, based on bounded-interference sequentialization, targets test generation, and hence state-of-the-art sequential testing techniques (e.g., concolic testing [24, 73, 4]) can be applied to the resulting sequential program without any modification. This way, we can employ and benefit from the advanced exploration algorithms
embedded into sequential testing techniques for an effective testing of concurrent programs.
Given a concurrent program P and an interference bound k, we propose a transformation that transforms P into a sequential program Pbk such that every execution of Pbk corresponds to at least one execution of P (possibly a partial execution) with at most k interferences. Our transformation effectively defers both the input generation and interference scenario selection tasks to the sequential testing technique, by encoding both as inputs
to the newly generated sequential program. All program behaviours within a certain degree
of interference are encoded into the resulting sequential program, but the set of interferences
is determined by the values of some inputs in the sequential program. Therefore, both input
and interference spaces of the concurrent program P will be explored when the corresponding
sequential program is subjected to sequential testing. We effectively encode all feasible interleavings for a set of interferences (defined by the inputs) as a set of constraints, and then use an SMT solver, during the execution, to check that the chosen interference scenario remains realizable.
Our concurrent program testing technique is then as follows: Each individual thread is ex-
posed to sequential concolic testing (i.e., interference bound k = 0), first. Then, we incremen-
tally increase k (starting with k = 1), sequentialize and perform sequential concolic testing on
the resulting sequential program to find bugs. Applying a sequential testing technique with specific coverage guarantees would provide coverage guarantees (modulo the interference bound) for the concurrent program.
Our transformation has the following limitations: (i) It works for concurrent programs con-
sisting of two threads. However, a study of concurrency bugs by Lu et al. [47] found that 96%
of concurrency bugs involve only two threads. Therefore, the choice of limiting concurrent
programs to contain only two threads should not be restrictive in finding concurrency bugs. (ii)
It only allows one thread to be interfered by the other one. Our experiments show that this was
not restrictive in finding concurrency bugs in our benchmarks, i.e., we could catch all of the known bugs in these benchmarks.
5.1 Preliminaries
In this section, we first fix the syntax of a simple sequential/concurrent programming language.
We use it later, in Section 5.2, to present the transformation algorithm. We also define the
notion of a consistent global trace which is used to prove the soundness of the sequentialization
technique.
(a) Syntax of sequential programs:

  <seq pgm>      ::= <input decl> <main method>
  <input decl>   ::= inputs: <var decl>
  <main method>  ::= main { <var list> <stmt>; }
  <var list>     ::= vars: <var decl>
  <stmt>         ::= <stmt>; <stmt> | <simple stmt> | <complex stmt>
  <simple stmt>  ::= skip | x = <expr> | assume(<b expr>) | assert(<b expr>)
  <complex stmt> ::= if (<b expr>) { <stmt>; } else { <stmt>; }
  <b expr>       ::= true | false | x | ¬<b expr> | <b expr> ∧ <b expr>

(b) Syntax of concurrent programs:

  <conc pgm>     ::= <input decl> <var list> <init method> <seq pgm>+
  <complex stmt> ::= if (<b expr>) { <stmt>; } else { <stmt>; } | lock(x) { <stmt>; }

Figure 5.1: The syntax of (a) sequential and (b) concurrent programs.
Here, we define the syntax of a simple sequential/concurrent programming language with vari-
ables, either scalars or arrays, ranging over integer and boolean domains (Figure 5.1). We
assume that array sizes are specified statically during variable declaration. We also assume that
programs are bounded: while-loops are unrolled a bounded number of times and function calls are in-lined.
Figure 5.1a presents the syntax of a simple sequential programming language. A sequential program
has a list of inputs and a method named main, from which it starts the execution. The main
method has a list of variables and a sequence of statements. Statements are either simple (e.g.,
skip, assignment, assume, and assert) or complex (e.g., conditional statement). Expressions can
be integer constants, variables, or boolean expressions. Boolean expressions can be true, false,
or boolean variables and can be combined using standard boolean operations. Furthermore,
non-boolean expressions are implicitly transformed to boolean expressions in the natural way
(i.e., false when the expression is evaluated to zero and true, otherwise) when assigned to
boolean variables.
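As a minimal illustration, the implicit coercion described above corresponds to:

```python
# Implicit coercion of non-boolean expressions to booleans, as described:
# an expression is false when it evaluates to zero and true otherwise.
def to_bool(value):
    return value != 0
```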
Figure 5.1b presents the syntax of concurrent programs. A concurrent program consists of a set of sequential programs (called threads) running in parallel. Threads share some variables, and their inputs are included in the inputs of the concurrent program. Here, the definition of the complex statement is extended with a lock statement. A lock statement consists of a sequence of statements which are executed after acquiring a lock on a variable. When a thread obtains a lock on a variable, other threads cannot acquire a lock on the same variable unless the
thread releases the lock. Each concurrent program has a method, named init, for initializing
shared variables, and also for linking the inputs of the concurrent program to the inputs of the
individual threads.
We defined the notion of a global trace in Section 2.2.1, which is a sequence of accesses to
shared variables and synchronization operations. Note that a global trace does not contain any
information about local computations (i.e., reads and writes to local variables). In this chapter,
wherever we refer to program traces we mean global traces. However, we assume that program
threads are created statically (as in Figure 5.1b) at the beginning of the program and there is no dynamic thread creation.
Definition 5.1.1 (Consistent Global Traces). A global trace (Definition 2.2.1) is consistent if it is creation-valid, lock-valid, and data-consistent (Definition 2.2.5).
Note that the programming language in Figure 5.1b does not allow dynamic thread creation
(i.e., all of the threads are created statically at the beginning). Therefore, every global trace is
creation-valid by default. As a result, the definition of consistent global traces reduces to lock-validity and data-consistency.
Definition 5.1.2 (n-Interference Thread-Local Traces). A thread-local (i.e., all events corre-
spond to the same thread) global trace is an n-interference thread-local trace if there are n read events that read values different from what is written by the most recent write events to the
corresponding variables.
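Under a hypothetical event encoding (tuples of operation, variable, and observed value, which are not the thesis's notation), the interference count of Definition 5.1.2 can be computed as follows:

```python
# Count interference events in a thread-local trace: reads whose observed
# value differs from the most recent local write (or the initial value)
# of the same variable. The event encoding is a hypothetical illustration.

def count_interferences(trace, init_vals):
    last_write = dict(init_vals)  # variable -> value of most recent write
    n = 0
    for op, var, val in trace:
        if op == "wt":
            last_write[var] = val
        elif op == "rd" and last_write.get(var) != val:
            n += 1  # read observed a value no local write produced
    return n
```

For instance, a trace that reads x = 0, writes x = 1, and then reads x = 5 is a 1-interference thread-local trace (the last read cannot be explained locally).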
Definition 5.1.3. Let P be a concurrent program with an input set I. An n-interference thread-local trace for thread Ti is feasible if it corresponds to an execution of Ti under some input values I while allowing n reads from shared variables to read arbitrary values during the execution. We call such executions interfered executions of Ti.
Note that the above definition places no restriction on the arbitrary values read by interfered read events. In fact, there might be no execution of P (under input values I) in which these values are actually provided by writes of other threads.

Lemma 5.1.4. Let P be a concurrent program and let σ be a consistent global trace. If there exist some input values I such that, for each thread Ti in P, σ|Ti is a feasible n-interference thread-local trace (for some n) under I, then σ represents a feasible execution of P under I.
Proof. Since σ|Ti corresponds to a thread-local execution of each thread Ti under I with n interferences, σ respects program order (i.e., the execution order in each thread). Moreover, σ is data-consistent, which shows that for each interfered read from a shared variable x in a thread Ti there is another thread that can provide the value read by the interfered read. Since σ is also lock-valid, σ represents a feasible execution of P under I.
Let P be a bounded concurrent program consisting of two threads T and T', with a set of inputs I. Given an interference bound k in T (i.e., the number of reads in T that are allowed to read values written by T'), we transform P into a sequential program Pbk such that each execution of Pbk corresponds to an execution of P with at most k interferences in T, in which T' is not interfered by T. The sequential program Pbk has an input set Ibk where I ⊆ Ibk . The inputs
in Ibk \ I specify the interference scenario (i.e., the set of non-local reads and their matching
writes). Once k is fixed, there is a choice of which k reads in T to interfere with and which k writes in T' to select as their corresponding writes. Program Pbk takes all of these
choices as inputs. This means that any sequential testing tool that explores the input space
systematically will naturally try all possible interference scenarios (within the computation
limit).
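The space of interference scenarios encoded by these inputs can be enumerated explicitly. The sketch below (with hypothetical read/write identifier maps, not an API from the thesis) generates all ordered choices of k reads of T together with candidate writes of T' on the same shared variable, which is exactly what a systematic input explorer would end up trying:

```python
# Enumerate interference scenarios as "inputs": choose k reads of T and,
# for each, a matching write of T' on the same shared variable.
from itertools import combinations, product

def scenarios(read_vars, write_vars, k):
    """read_vars: {read_id: var} for T; write_vars: {write_id: var} for T'."""
    reads = sorted(read_vars)
    for rds in combinations(reads, k):        # ordered: rds[i] < rds[i+1]
        # candidate writes must target the same shared variable as the read
        cands = [[w for w, v in sorted(write_vars.items())
                  if v == read_vars[r]] for r in rds]
        for wrts in product(*cands):
            yield list(rds), list(wrts)
```

For two reads of T (on x and y) and three writes of T' (two on x, one on y), the k = 1 scenarios are the three same-variable read/write pairings.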
The sequential program Pbk has two copies of shared variables; each thread reads/writes on
its own copy. Pbk first simulates the execution of T' to let it perform all writes that are supposed to provide values for non-local reads. Then, it simulates the execution of T, where non-local reads obtain the values stored by those writes.
The sequential program Pbk simulates the execution of T' from the beginning until the first lock-free point (i.e., a point where thread T' does not hold any lock) at which all writes that are supposed to produce values for non-local reads have occurred. The reason that the execution of T' is continued to a lock-free point (after all writes that provide values for non-local reads are performed) is that T' does not execute any further events once Pbk switches to T; stopping T' only at a lock-free point therefore avoids simulating executions in which T is blocked forever for acquiring locks that are never released by T'.
Since T' uses its own copy of shared variables during execution, the sequential program stores the values written by the writes that are supposed to provide values for non-local reads in some auxiliary variables, and later loads these values when the corresponding non-local reads are performed. When Pbk simulates the execution of T, it retrieves the value stored in the corresponding auxiliary variable for each non-local read.
Note that not all interference scenarios defined by inputs are realizable. Therefore, we
have to ensure that there exists a feasible trace of P which (1) consists of the same events
as in the execution of Pbk , (2) observes for each interfered read in T the value written by the
corresponding write in T', and (3) all other reads that are not involved in any interference read
values written by their own thread (or the initial value of corresponding variable when there is
no write to the variable before the read in that thread). To achieve this, all global events (i.e.,
accesses to shared variables and synchronization events) are logged during the execution of Pbk ,
and a set of constraints is generated that corresponds to the existence of a feasible trace. Every
time that T performs a read from a shared variable, we use a call to an SMT solver to check
for the satisfiability of these constraints. If the feasibility check passes, it means that there
exists a trace (representing an execution of P ), with the same set of global events, in which the
previous reads involved in interferences are reading from the writes defined by the interference
scenario, and all other reads read from local writes. In this case, the execution of Pbk continues.
Otherwise, the execution is abandoned to prevent exploring unreachable states. Note that since
the interferences are limited to the ones specified by the inputs, the state of the program after
passing each feasibility check is the same for any possible model of the constraint system.
Therefore, it is enough to ensure the existence of a feasible trace in order to proceed with the execution soundly. In the remainder of this section, we precisely define the transformation that generates Pbk from P.
inputs: I;
int[k] rds, wrts;
int[k] vals;
bool[k] rDone, wDone;

main() {
    vars: G, G';
    // initialize G, G'
    ...
    // read-write assumptions
    ...
    [T'];
    assume(allWsDone());
    [T];
}

Figure 5.2: The overall structure of the sequential program Pbk.
Figure 5.2 illustrates the sequential program Pbk generated based on concurrent program P
consisting of threads T and T'. We assume that both T and T' are bounded sequential programs where loops are unrolled for a bounded number of times and function calls are in-lined.
Therefore, all reads from shared variables in T and all writes to shared variables in T' can be
identified and enumerated, according to their order in the corresponding thread. The input set
of Pbk consists of I (i.e., inputs of the concurrent program P ), and two arrays, rds and wrts,
of size k specifying k interferences; rds[i] stores the identifier of the ith non-local read in
T and wrts[i] stores the identifier of the write in T' which is supposed to provide a value for rds[i]. In the case that the number of interferences is k' < k, we assume that rds[i] = null and wrts[i] = null for all k' < i ≤ k.
The sequential program Pbk has two copies of shared variables, G and G', on which T and T' operate, respectively. Variable vals is an array of size k, where vals[i] stores the value
written by wrts[i]. There are also two arrays of size k, named rDone and wDone, such
that rDone[i] and wDone[i] indicate whether the ith non-local read and its matching write
have occurred, respectively. All elements of these arrays are initialized to false. wDone[i] and rDone[i] become true when wrts[i] and rds[i] are performed, respectively. These
arrays are used to ensure that the corresponding reads and writes show up in the execution of
Pbk .
The main method in Pbk first initializes shared variables (according to the init method of
concurrent program P ). Then, it prunes away some obvious unrealizable interference scenarios
(explained in the next paragraph). Then, it calls the transformed version of T' (represented by [T']) and ensures that all writes specified by wrts have occurred during its execution; function allWsDone returns true if wDone[i] is true for all 1 ≤ i ≤ k (whenever wrts[i] is not null) and false otherwise. Finally, the main method calls the transformed version of T (represented by [T]).
As mentioned earlier, not all interference scenarios defined by rds and wrts are realiz-
able. The minimum (but not sufficient) requirement is that for each non-local read rds[i]
from a shared variable, its matching write wrts[i] should write to the same shared variable.
This is ensured through a set of assumption statements in the main method. Note that in our
transformation scheme, one always has the option of approximating the search space by allow-
ing only a subset of reads in T to be non-local, and also by selecting only a subset of writes
to the corresponding variable in T' as candidates for each non-local read. The main method
also ensures that in the case that the number of interferences is less than k (say k' < k), then rds[1..k'] and wrts[1..k'] define the interference scenario and the rest of the elements in rds and wrts are set to null. This is done by a set of assumptions of the form (rds[i] = null) ⇒ (rds[i+1] = null). Note that we do not need to fix the number of interferences in advance; it is determined by the rds and wrts inputs.
Furthermore, to avoid the exploration of redundant interference scenarios (where the same set of interferences is merely permuted in the rds and wrts arrays), the main method imposes an order on the set of non-local reads such that rds[i] < rds[i+1] for 1 ≤ i < k (whenever rds[i] and rds[i+1] are not null). The sequential program Pbk uses two auxiliary functions, append and isFeasible, to check for the existence of a feasible
trace realizing the input interference scenario. Function append is used to add information
about global events to a log file. Each global event is a tuple (Ti , a) where Ti is the identifier
of the thread performing the event and a is a read/write action to a shared variable x or is
a lock acquire/release action on a lock variable l. At any point during the execution of Pbk ,
this log provides the exact sequence of global events that occurred up to that point. Function
isFeasible checks whether the log can correspond to a feasible trace of program P (cf.
Section 5.2.2).
Figure 5.3 (on the right side) contains the transformation function for the statements of thread T'. The transformed program [T'] is called by the main method and is executed until the first lock-free point (i.e., a point where T' does not hold any locks) at which all writes specified in wrts have occurred. Note that the log contains all information necessary to determine which locks are held at any point in the execution. Function returnCondition, used in [T'], returns true if T' is at a lock-free point and all writes in wrts have been performed; otherwise, it returns false.
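A minimal sketch of the returnCondition logic, under an assumed representation of the held locks and the wrts/wDone arrays (the parameter names are illustrative, not from the thesis):

```python
# Sketch of the returnCondition check used in [T']: stop simulating T'
# once no locks are held and every write named in wrts has occurred.

def return_condition(held_locks, wrts, wDone):
    if held_locks:                      # not at a lock-free point
        return False
    # every non-null entry of wrts must have been performed
    return all(done for w, done in zip(wrts, wDone) if w is not None)
```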
To keep the log consistent, for each shared variable access in [T'] we log an access to the corresponding variable in G instead of G'. For each shared variable x, let x' denote the corresponding copy for thread T' and let (b)expr' be a (boolean) expression in which each shared variable x is replaced by x'.
For each expression, the transformation logs a read event for each shared variable read in the expression. For each assignment statement writing to a variable x, the right-hand side expression (b)expr is transformed first and (b)expr' is assigned to the corresponding variable; if x is a local variable then x is used as the left-hand side of the assignment; otherwise, x' is used as the left-hand side to let T' work on its own copy of shared variables. In case the assignment statement writes to a shared variable, the transformation checks
[T] (left column; transformation of the statements of thread T):

  (b)expr  =>  // for each read r of a shared variable x in (b)expr:
      if (r == rds[1]) {
          x = vals[1];
          rDone[1] = true;
          append(log, (T, rd("x", x)), 1);
          assume(isFeasible(log));
      } else if (r == rds[2]) {
          x = vals[2];
          assume(rDone[1]);
          rDone[2] = true;
          append(log, (T, rd("x", x)), 2);
          assume(isFeasible(log));
      }
      ...
      else if (r == rds[k]) {
          x = vals[k];
          assume(rDone[k-1]);
          rDone[k] = true;
          append(log, (T, rd("x", x)), k);
          assume(isFeasible(log));
      } else {
          append(log, (T, rd("x", x)));
          assume(isFeasible(log));
      }

  x = (b)expr (x is a shared var)  =>  [(b)expr]; x = (b)expr; append(log, (T, wt("x", x)))
  x = (b)expr (x is a local var)   =>  [(b)expr]; x = (b)expr
  lock(x){S}  =>  append(log, (T, ac(x))); [S]; append(log, (T, rel(x)))

[T'] (right column; transformation of the statements of thread T'):

  (b)expr  =>  // for each read r of a shared variable x in (b)expr:
      append(log, (T', rd("x", x')));

  x = (b)expr (x is a local var)  =>  [(b)expr]; x = (b)expr'
  x = (b)expr (x is a shared var and w is the id of this write)  =>
      [(b)expr]; x' = (b)expr';
      if (w == wrts[1]) {
          vals[1] = x';
          wDone[1] = true;
          append(log, (T', wt("x", x')), 1);
          if (returnCondition()) return;
      } else if (w == wrts[2]) {
          vals[2] = x';
          wDone[2] = true;
          append(log, (T', wt("x", x')), 2);
          if (returnCondition()) return;
      }
      ...
      else if (w == wrts[k]) {
          vals[k] = x';
          wDone[k] = true;
          append(log, (T', wt("x", x')), k);
          if (returnCondition()) return;
      } else {
          append(log, (T', wt("x", x')));
      }
  lock(x){S}  =>  append(log, (T', ac(x))); [S]; append(log, (T', rel(x))); if (returnCondition()) return;
  assume(b expr)  =>  [b expr]; assume(b expr')

Figure 5.3: The transformation of the statements of T (left) and T' (right).
whether the write is in wrts. If the write is supposed to provide a value for the j th non-
local read (i.e., the write is equal to wrts[j]), the value of the shared variable is stored in
vals[j] and wDone[j] is set to true. Then, a write event to x by T' is logged and function returnCondition is called.
For a lock statement on variable x, a lock acquire and a lock release event are logged right
before and after the transformation of the lock body, respectively. Furthermore, after logging a
lock release event, function returnCondition is called to check whether the execution of T' should be stopped. Note that sequential programs do not have any lock statement. Therefore,
here only the body of the lock statement is considered in the sequential program.
For assume and assert statements, the corresponding boolean expressions are transformed before these statements, and the transformed statements refer to (b)expr' instead of (b)expr. For each conditional statement, the transformation generates a conditional statement (having (b)expr' instead of (b)expr as the conditional expression) where the statements in both the if and else branches are transformed, correspondingly. Here, the conditional boolean expression is transformed right before the conditional statement as well. The transformation of a sequence of statements is the sequence of the transformed statements.
Figure 5.3 (on the left side) presents the transformation function for the statements of thread T. Each read from a shared variable in T is a candidate for a non-local read (observing a value provided by a write in T') while restricting the total number of non-local reads to k. When transforming a (boolean) expression, for each read r from a shared variable x, two cases are considered:

(i) r is selected as one of the non-local reads by the inputs; if r is the j th non-local read, then x is assigned the value vals[j], and rDone[j] is set to true, indicating that the j th non-local read has been performed, which is
required when the next non-local read (i.e., rds[j+1]) is performed. Then, a read event from x by T is logged, recording that it was the j th non-local read. Finally, it is ensured that a feasible trace exists that realizes the interference scenario so far, by calling function isFeasible inside an assume statement.
(ii) r is a local read, i.e., it does not belong to the input set rds. A read event from x by T is logged, and it is ensured that a feasible trace exists in which this read and all previous local reads see values written locally (while all previous non-local reads are matched with the corresponding writes as specified by the inputs).
For each assignment statement, the right-hand side expression is transformed first (as discussed above). Then, the assignment itself remains unchanged. In case the assignment writes to a shared variable, a write event to the corresponding variable is logged. For a lock statement on variable x, lock acquire and lock release events are logged right before and after the transformation of the lock body, respectively. Assume and assert statements remain the same, except that the corresponding boolean expressions are transformed before these statements.
For each conditional statement, the transformation generates a conditional statement (with the
same conditional expression) where the statements in both if and else branches are trans-
formed, correspondingly. Here, the conditional boolean expression is transformed right before
the statement as well. The transformation of a sequence of statements is the sequence of the transformed statements.
The isFeasible function gets a log as its input and checks for the existence of a feasible trace of the concurrent program (consisting of the events in the log) in which each rds[i] in the log reads the value written by wrts[i] (as specified by the interference scenario) and all other reads read values written by local writes. The isFeasible function generates a constraint system that encodes all such feasible traces and uses an SMT solver to
find an answer. For each logged event e in the log, an integer variable t_e is considered to encode the timestamp of the event. The constraints required for such feasible traces are captured using these timestamps:

Φ : PO ∧ LC ∧ WRC_interference ∧ WRC_local

PO : ⋀_{i=1}^{m-1} (t_{e_i} < t_{e_{i+1}}) ∧ ⋀_{i=1}^{n-1} (t_{e'_i} < t_{e'_{i+1}}) ∧ C_init

Figure 5.4: The constraint system checked by isFeasible (the LC and WRC constraints are described in the text below).
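To make the role of these timestamp constraints concrete, the brute-force sketch below searches for an assignment of timestamps satisfying program order and read-write coupling over a tiny log. A real implementation would hand the constraints to an SMT solver; the event encoding here is a hypothetical illustration, not the thesis's:

```python
# Brute-force feasibility: look for timestamps (a total order of logged
# events) satisfying program order (PO) and read-write coupling (WRC).
from itertools import permutations

def is_feasible(log, couple):
    """log: list of (thread, op, var) events in logged order;
    couple: {read_index: write_index} forcing each read to observe that
    write (every other write to the same variable falls outside [w, r])."""
    n = len(log)
    for order in permutations(range(n)):          # candidate timestamps
        t = {e: i for i, e in enumerate(order)}   # event -> timestamp
        # PO: events of the same thread keep their logged order
        if any(t[i] > t[j] for i in range(n) for j in range(i + 1, n)
               if log[i][0] == log[j][0]):
            continue
        ok = True
        for r, w in couple.items():
            var = log[r][2]
            if not t[w] < t[r]:
                ok = False
                break
            for x in range(n):
                if x not in (r, w) and log[x][1] == "wt" and log[x][2] == var \
                        and t[w] < t[x] < t[r]:
                    ok = False                    # intervening write breaks coupling
                    break
            if not ok:
                break
        if ok:
            return True
    return False
```

For example, coupling a read of T with an earlier write of T' is feasible, while coupling a read with a write that program order places after it is not.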
Figure 5.4 illustrates the constraint system. It is a conjunction of program order constraints (PO), lock-validity constraints (LC), and write-read coupling constraints (WRCinterference and WRClocal).
PO: Let log|T = e_1, e_2, ..., e_m and log|T' = e'_1, e'_2, ..., e'_n be the sequences of events in the log projected to threads T and T', respectively. According to the program order, each event cannot happen unless its preceding event in its thread (according to the log) has occurred. We also consider an
initial event einit which corresponds to the initialization of shared variables. This event should
happen before any thread starts its execution in any feasible trace; it is encoded as the constraint
Cinit in Figure 5.4. The constraint PO ensures that the order of events in T and T' is preserved.
LC: Each feasible trace should be lock-valid; i.e., threads cannot hold the same lock simul-
taneously. Each lock acquire event aq of lock l in the log is matched by precisely one lock
release event rl of lock l in the same thread, unless the lock is not released by thread T in the
log. Each lock acquire event aq and its corresponding lock release event rl define a lock block,
represented by [aq, rl]. Let L_{T,l} and L_{T',l} be the sets of lock blocks of lock l in threads T and T', respectively. Then, LC1 ensures that T and T' cannot be inside lock blocks of the same lock l simultaneously. Turning to locks that are not released by T in the log, the constraint LC2 ensures that an acquire of lock l by thread T that is never released must occur after all releases of lock l in thread T'. In this formula, NoRel_{T,l} stands for the set of lock acquire events on l in T that have no matching release in the log.
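Consistent with this description, the lock-validity constraints can be written as follows (a reconstruction; the exact notation of Figure 5.4 may differ):

```latex
LC_1:\ \bigwedge_{l}\ \bigwedge_{[aq,rl]\,\in\, L_{T,l}}\ \bigwedge_{[aq',rl']\,\in\, L_{T',l}}
\Big( (t_{rl} < t_{aq'}) \;\vee\; (t_{rl'} < t_{aq}) \Big)

LC_2:\ \bigwedge_{l}\ \bigwedge_{aq\,\in\, NoRel_{T,l}}\ \bigwedge_{[aq',rl']\,\in\, L_{T',l}}
(t_{rl'} < t_{aq})
```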
WRCinterference & WRClocal : Let Wx represent the set of all write events to shared variable x in
the log, and LocW be a function that for each read event r from a shared variable x returns the
most recent write event to x in the log performed by the same thread; in case there is no such
write event, einit is returned. For each read event r from a shared variable x and write event
w to the same variable, the formula Coupled(r, w) ensures that r is coupled with w by forcing
all events that write to x to happen either before w or after r. Therefore, WRCinterference ensures
that each read rds[i] is coupled with wrts[i] and WRClocal ensures that all other reads are
coupled with the most recent local writes to the corresponding variables.
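Using the timestamp variables, the coupling and write-read constraints described above can be written as follows (a reconstruction consistent with the prose; the original figure's notation may differ):

```latex
Coupled(r,w):\ (t_w < t_r)\ \wedge\ \bigwedge_{w''\,\in\, W_x \setminus \{w\}}
\big( (t_{w''} < t_w) \vee (t_r < t_{w''}) \big)

WRC_{interference}:\ \bigwedge_{i=1}^{k} Coupled(\mathit{rds}[i],\, \mathit{wrts}[i])
\qquad
WRC_{local}:\ \bigwedge_{r\ \text{local read}} Coupled(r,\, LocW(r))
```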
Now, we discuss the soundness and completeness of our testing technique based on bounded-interference sequentialization. Let P be a concurrent program with threads T and T', and let Pbk be the sequential program obtained from P with interference bound k (recall that in Pbk , T' is not interfered by T).
Lemma 5.3.1. Suppose that an error is reached in [T'] when Pbk is executed with some input values I, rds, and wrts. The error corresponds to an error in concurrent program P, i.e., there exists an execution of P that leads to the error.
Proof. Let log be the log generated by Pbk during the execution. Since the error was revealed
in [T'], and [T'] simulates the execution of T' (on its own copy of shared variables) without any interference from T, log represents a consistent feasible n-interference thread-local trace of thread T' (for n = 0). Therefore, log corresponds to an execution of P with input values I in which T' is executed first, sequentially, and reaches the error.
Lemma 5.3.2. Let log be the log written by Pbk (during the execution of Pbk with input values I, rds, and wrts) at a feasibility check point. Suppose that the constraint system generated by isFeasible is satisfiable, and let σ be the trace obtained by ordering the events of log according to the timestamps satisfying the constraint system. Then, σ is a feasible global trace of P under input values I.
Proof. log|T represents a feasible n-interference thread-local trace (for some n ≤ k) and log|T' represents a feasible thread-local trace without interference (i.e., an n-interference thread-local trace for n = 0) under input values I. According to the lock-validity and write-read constraints in the feasibility check, σ is a consistent global trace. According to the program order constraints, σ|T = log|T and σ|T' = log|T'. Therefore, both σ|T and σ|T' are feasible n-interference thread-local traces. Based on Lemma 5.1.4, σ must be a feasible global trace of P under I.
Lemma 5.3.3. Suppose that an error is reached in [T] when Pbk is executed with some input values I, rds, and wrts. The error corresponds to an error in concurrent program P, i.e., there exists an execution of P that leads to the error.
Proof. Let log be the log generated by Pbk before revealing the error. There are two cases:
(i) The error is revealed before any non-local read. In this case, log|T is a consistent feasible n-interference thread-local trace of T (for n = 0). Therefore, log|T corresponds to an execution of P with input values I where T is executed sequentially first, which leads to the error.
(ii) The error is revealed after some non-local reads in [T ]. Let logpre be the prefix of
log which is passed to the last call of function isFeasible in the execution of Pbk
before the error. We know that isFeasible(logpre ) returns true since otherwise the
execution would have been aborted. According to Lemma 5.3.2, there exists a (partial) execution R of program P under input values I with a global trace σ such that σ|T = logpre |T, where logpre |T is a common prefix of log|T and σ|T. On the other hand, log|T and logpre |T are feasible n-interference thread-local traces of T (for some n) under I with exactly the same interference scenario; therefore, R can be extended, following log|T, to an execution of P under I that reveals the error.
Theorem 5.3.4. Every error revealed in the execution of Pbk corresponds to an error in the
concurrent program P , i.e., there exists an execution of P that leads to the error.
Proof. The error is either revealed in [T] or in [T']. According to Lemmas 5.3.1 and 5.3.3, in both cases there exists an execution of P that leads to the error.
Lemma 5.3.5. Let Pbk be the sequential program obtained from a concurrent program P according to the transformation presented in Section 5.2. Suppose that a bug requires k' interferences in T (for some k' ≤ k) to be revealed under some input values in P. Then, there exist input values I, wrts and rds for Pbk such that the execution of Pbk reveals the bug.
Proof. Let σ be the global trace corresponding to an execution R of program P with k' interferences in T under input values I that reveals the bug. There are two cases:
k' = 0, i.e., the bug is a sequential bug either in T or in T'. Let rds[i] = null and wrts[i] = null for all i ≤ k. We claim that the execution of Pbk under input values I, wrts and rds reveals the bug. Let log be the log generated during the execution of Pbk .
First, assume that the bug is in T′. σ|T′ represents a feasible 0-interference global trace for the sequential execution of T′ under input values I. On the other hand, Pbk calls [T′], which simulates the sequential execution of T′ (on its own copy of shared variables) under input values I. Furthermore, since wrts[i] = null for all i ≤ k, the execution of [T′] will not be stopped by any return statement added in [T′] (to return after the first lock-free point where all writes in wrts are performed). Therefore, log|T′ = σ|T′ and the bug is revealed.

Now, assume that the bug is in T. σ|T represents a feasible 0-interference global trace for the sequential execution of T under input values I. Pbk first calls [T′], which works on the second copy of shared variables. Therefore, when [T′] returns, the main copy of shared variables still holds the initial values. Since rds[i] = null for all i ≤ k, each read from a shared variable reads a value generated locally during the execution of [T], and hence Pbk simulates the sequential execution of T under input values I. Therefore, log|T = σ|T and the bug is revealed.
k′ > 0, i.e., the bug is a concurrency bug that is revealed in T. From σ, we obtain a set of interference pairs as follows. Let ρ = {(id(r), id(w)) | r is a read event by T and w is its matching write performed by T′}; i.e., ρ specifies the set of interferences in σ. We sort ρ such that (id(r), id(w)) < (id(r′), id(w′)) iff id(r) < id(r′). Let rds[1..k′] = Reads(ρ) and wrts[1..k′] = Writes(ρ), where Reads and Writes return arrays consisting of the reads and writes in ρ, respectively, and let rds[i] = wrts[i] = null for k′ < i ≤ k. We claim that Pbk reveals the bug when it is executed with input values I, wrts and rds:
Let vw[1..k′] and vr[1..k′] represent the values written by wrts[1..k′] and read by rds[1..k′] in σ, respectively. Assume that the execution of Pbk with input values I, wrts and rds generates a log file log. We prove that log|T = σ|T, which reveals the bug.
Pbk first calls [T′]. The execution of [T′] with input values I, wrts and rds simulates a 0-interference global trace for the sequential execution of T′ with input values I that contains all writes in wrts[1..k′].

Pbk calls [T] after [T′]. Let log|T,i and σ|T,i represent the prefixes of log|T and σ|T right before the ith non-local read, respectively. σ|T,1 corresponds to the sequential execution of T under input values I until the first non-local read. Also, before reaching the first non-local read, Pbk simulates the sequential execution of T under input values I. Therefore, rds[1] will occur in the execution of Pbk and log|T,1 = σ|T,1. When reaching rds[1], Pbk loads value vals[1] = vw[1] into rds[1]. Therefore, Pbk continues simulating the thread-local execution of T in R until reaching rds[2]. As a result, log|T,2 = σ|T,2. The same kind of argument is valid for later non-local reads, and therefore log|T,k′ = σ|T,k′. When reaching rds[k′], Pbk loads value vals[k′] = vw[k′] into rds[k′]. After rds[k′] is performed, Pbk continues simulating the same thread-local execution path of T as the thread-local execution path of T in R (i.e., log|T = σ|T), which reaches the error. Note that, according to the above, every prefix of log passed to isFeasible during the execution of Pbk is feasible. Therefore, the execution of Pbk will not be aborted before getting to the error.
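The construction of rds and wrts used in this case of the proof (sort the interference pairs by read identifier, fill the first k′ slots, pad the rest with null) can be sketched as follows; the function name and encoding are ours, for illustration only:

```python
def build_interference_arrays(pairs, k):
    # pairs: interference pairs (id(r), id(w)) extracted from the global trace.
    # Sorting realizes: (id(r), id(w)) < (id(r'), id(w')) iff id(r) < id(r').
    pairs = sorted(pairs, key=lambda p: p[0])
    assert len(pairs) <= k, "the bug must require at most k interferences"
    # First k' slots hold the sorted reads/writes; the rest are null (None).
    rds  = [r for r, _ in pairs] + [None] * (k - len(pairs))
    wrts = [w for _, w in pairs] + [None] * (k - len(pairs))
    return rds, wrts

# A bug needing k' = 2 interferences under bound k = 3:
rds, wrts = build_interference_arrays([(9, 4), (2, 7)], 3)
```

Here the pair with the smaller read identifier comes first, so rds becomes [2, 9, None] and wrts becomes [7, 4, None].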
Theorem 5.3.6. Suppose that a testing tool provides path coverage guarantees. Let Pbk be the sequential program obtained from a concurrent program P according to the transformation presented in Section 5.2. Subjecting the testing tool to Pbk will catch all bugs that require k′ interferences (where 0 ≤ k′ ≤ k) in thread T and no interference in thread T′.
Proof. Assume that a bug requires k′ interferences (where 0 ≤ k′ ≤ k) in thread T and no interference in thread T′ to be revealed under some input values in P. According to Lemma 5.3.5,
there exists some input values I, wrts and rds such that the execution of Pbk under these inputs
reveals the bug. Therefore, there exists at least one execution path in Pbk that leads to the bug.
Hence, a sequential testing tool that provides path coverage guarantees should be able to catch
the bug.
5.4 Evaluation
We have implemented a prototype testing tool for multi-threaded C# programs according to the sequentialization technique presented in this chapter. In this section, we first briefly discuss the implementation and then present the experimental results.
5.4.1 Implementation
We extended an open-source C# parser1 such that, in addition to a concurrent program, it gets an unrolling bound (u) for loops
and an interference bound (k) as inputs and transforms the concurrent program to a sequential
program, according to the transformation rules presented in Section 5.2. Throughout the trans-
formation, each loop in the concurrent program is unrolled for u times, and all reads and writes
to shared variables are identified and enumerated. In this prototype, the assumptions in the main method of the sequential program (defining the set of possible interference scenarios) are added manually.
1 http://csparser.codeplex.com/
According to Theorem 5.3.6, the testing technique is complete if the resulting sequential
program is subjected to a sequential testing tool that provides path coverage guarantees. How-
ever, providing path coverage guarantees can be very expensive in practice. We used Microsoft
PEX [84] as our backend sequential testing tool, which itself uses Z3 [9] as the underlying
SMT solver. PEX targets different variations of control-flow coverage such as basic block cov-
erage, explicit and implicit branch coverage. Control-flow coverage, in general, is weaker than
path coverage in the sense that it might miss some program bugs. However, PEX managed to
find all known bugs and some new bugs in our benchmarks.
For a given concurrent program, we sequentialize the program, starting with interference
bound k = 1, and use PEX to test the resulting sequential program. In the case that no error
is found by PEX, the interference bound is increased incrementally and the same process is repeated until the time/computation limit is hit, an error is found, or all possible interference scenarios within the bound have been explored.
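This iterative deepening over the interference bound can be sketched as follows; sequentialize and run_tester below are hypothetical stand-ins for the transformation of Section 5.2 and the backend sequential tester (PEX), not the actual tool interfaces:

```python
def iterative_deepening_test(program, sequentialize, run_tester, k_max, budget):
    """Increase the interference bound k until a bug is found, the
    time/computation budget is exhausted, or the maximal bound is reached."""
    for k in range(1, k_max + 1):
        if budget <= 0:
            return ('timeout', k - 1)          # partial work up to bound k - 1
        seq_prog = sequentialize(program, k)   # transformation of Section 5.2 (stub)
        found_bug, cost = run_tester(seq_prog) # e.g., PEX on the sequential program (stub)
        budget -= cost
        if found_bug:
            return ('bug', k)
    return ('exhausted', k_max)

# Toy stand-ins in which the bug needs two interferences:
result = iterative_deepening_test(
    'P',
    lambda prog, k: (prog, k),       # "sequentialization" keeps the bound around
    lambda sp: (sp[1] >= 2, 10),     # "tester" finds the bug only when k >= 2
    k_max=3, budget=100)
```

With these stand-ins the driver stops at k = 2, mirroring the observation below that most bugs are caught with one or two interferences.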
5.4.2 Experiments
Our benchmarks are listed in Table 5.1. Bluetooth is the program presented in Figure 4.1. Account is a program that creates and manages bank accounts.
Meeting [44] is a sequential program for scheduling meetings. Like in [44], we assumed that
there are two copies of the program running concurrently. Vector, Stack, StringBuffer,
2 http://research.microsoft.com/en-us/um/redmond/projects/z3/
Table 5.1: Number of bugs found with k interferences (k = 1, 2, 3), total number of tests generated by PEX, and total testing time.

Program (LOC)        Bugs (k=1)   Bugs (k=2)   Bugs (k=3)   Num. of tests   Total time (s)
Bluetooth (55)       0            1            0            18              26
Account (103)        1            0            0            38              28
Meeting (101)        1            0            0            12              16
HashSet (334)        1            0            0            32              22
StringBuffer (198)   1            0            0            12              12
Series (230)         1            0            0            9               10
Ray (1002)           1            0            0            7               18
FTPNET (2158)        2            0            0            10              56
Mutual (104)         1            0            0            28              10
and Hashset are all classes in Java libraries. To test these library classes, we wrote pro-
grams with two threads, where each thread executes exactly one method of the corresponding
class. Series, SOR, and Ray are Java Grande multi-threaded benchmarks3 . For the above
Java programs, we used a Java to C# converter to transform the corresponding Java classes to
C#. FTPNET4 is an open source FTP server in C# and Mutual is a buggy program (presented
in Figure 4.2) in which threads can be in a critical section simultaneously due to improper
synchronization.
We set the loop unrolling bound u to 2 and sequentialized the programs for interference bounds 1 ≤ k ≤ 3 (larger unrolling bounds did not change the results
3 http://www.javagrande.org/
4 http://sourceforge.net/projects/ftpnet/
in any of the benchmarks). In the main method of the sequential program, we added assumptions allowing all possible interference scenarios; i.e., we let each shared variable read in T be a non-local read and consider all writes to the corresponding shared variable in thread T′ as candidate coupling writes for it.
Table 5.1 contains information about the number of bugs found by allowing k interferences for 1 ≤ k ≤ 3, the total number of tests generated by PEX, and the total time spent by PEX for testing each benchmark.
Observations: The experiments show that the bounded-interference heuristic is very effective
in finding concurrency bugs. All of the bugs were found by allowing only one or two inter-
ferences. This suggests that concurrency bugs are typically not very complex. In all of the
benchmarks, no new error was found when k was increased to 3. Since these benchmarks have
been used by other tools before, we know of no (previously found) bugs that were missed by
our tool. Moreover, we found some new bugs that were not previously reported in FTPNET.
Another observation is that the total number of generated tests by PEX is reasonable. Since
this number is directly affected by the total number of possible interference scenarios (in which
one thread is interfered with by the other one), we can conclude that the bounded-interference heuristic performs well in reducing the search space while being effective in finding
bugs.
Furthermore, the testing technique is very time efficient; the testing time for all of the
benchmarks (except SOR) was less than two minutes. For SOR, the majority of the time (about 7 minutes) was spent testing the program with 3 interferences. This is because there are many shared variable reads in SOR and many candidate coupling writes for each read.
Although the main goal of the sequentialization was to quickly test whether the bounded-
interference heuristic performs well in practice, we were curious to see how it performs compared to other sequentialization techniques. We selected Poirot5, which is a tool that
5 http://research.microsoft.com/en-us/projects/poirot/
performs context-bounded analysis to find assertion violations, and compared our tool with it. A side-by-side comparison with Poirot is, however, not entirely meaningful, since Poirot does not aim for test generation to increase code coverage.
Poirot has its own input language. We picked two of the benchmarks (i.e., SOR and Mutual), which did not use the object-oriented paradigm and could be translated to the input language of Poirot more naturally, and translated them manually. Our experiments showed that Poirot did not scale well on these benchmarks. Poirot failed to catch the bug
(which is an assertion violation) in SOR for context bound of 2, 3, and 4 within 30 minutes (for
each bound). For Mutual, which requires 3 context switches to expose the bug, we set the loop unrolling bound u = 50. Our tool found the bug in a few seconds while Poirot failed to find it within the time limit.
Conclusion: Our experiments showed that the bounded-interference heuristic is very effective
in finding concurrency bugs. In fact, all of the bugs in our benchmarks were found by allowing only a small number of interferences among threads. This suggests that the bounded-interference
heuristic can be used by test generation techniques to efficiently search through the exploration
space. We also performed some experiments to compare our testing technique, based on the bounded-interference sequentialization, with Poirot, which uses the context-bounding sequentialization to find bugs. Our experiments showed that the bounded-interference heuristic scales better on these benchmarks.

5.5 Related Work
The idea of sequentializing concurrent programs and analyzing the resulting sequential pro-
grams was first proposed by Qadeer et al. in [62]. According to their sequentialization tech-
nique, the generated sequential program simulates the behaviours of the concurrent program
up to only two context-switches. Their transformation algorithm is cheap in the sense that it
does not introduce any additional copies of the shared variables. However, allowing a maximum of two context switches severely limits the set of concurrent behaviours that can be explored.

Lal and Reps [44] proposed another sequentialization technique in which a boolean concurrent program with finitely many threads is transformed to a boolean sequential program which simulates the executions of the concurrent program within k rounds of the threads (in a round-robin manner). Their transformation introduces k extra copies of shared
variables, one to keep the values of shared variables at each round. The sequential program
calls the threads sequentially, and each thread uses the corresponding set of shared variables
in each round. It then ensures that the values of shared variables at the end of each round are
equal to their values at the beginning of the next round. Lahiri et al. [42] adapted the
sequentialization technique of Lal and Reps for C programs. Building on this transformation,
Rakamaric implemented a tool, called STORM [63], for static unit checking. Later, La Torre et
al. [85] adjusted the sequentialization technique of Lal and Reps for k context switches instead of k rounds. These sequentializations are geared towards static program analysis; execution of the sequential programs requires guessing the values of shared variables at the beginning of each context/round, which might lead to unreachable states being explored.
La Torre et al. [85] also proposed a lazy sequentialization technique using the context-bounding heuristic that does not introduce any additional copies of shared variables. The idea is to
execute the active thread in each context from the beginning to re-compute the values of local
variables at the beginning of that context while using the corresponding pre-computed values
of shared variables at the beginning of each previous context in which the thread was active.
The main problem with the lazy transformation is that it has a huge overhead from calling threads multiple times.

A further problem shared by the context-bounding sequentializations is that the generated sequential programs are highly non-deterministic; a context switch is added non-deterministically after each statement of the concurrent program in the sequential program. Therefore, it is not feasible to apply sequential testing techniques to the generated sequential programs.
In contrast, our sequentialization is geared towards test generation, and available sequential testing techniques can be applied to the generated sequential programs without modification.
5.6 Summary
In this chapter, we introduced a sequentialization based on the bounded-interference heuristic in order to evaluate the effectiveness of the heuristic. Based on our sequentialization, a concurrent program is transformed to a sequential program such that the executions of the generated sequential program correspond to executions of the concurrent program
(within a bounded number of interferences among threads). One advantage of our sequential-
ization technique is that (in contrast to the sequentialization techniques based on context bound-
ing) state-of-the-art sequential testing techniques can be applied on the generated sequential
programs without any modification to explore both input and interference spaces of concur-
rent programs. Furthermore, using sequential testing tools with coverage guarantees (like path
coverage) would imply coverage guarantees (modulo the interference bound and computation
limits) on the concurrent program after the testing process is finished. We implemented a prototype for multi-threaded C# programs. Our experiments showed that the bounded-interference heuristic is indeed effective in practice.

Chapter 6. Bounded-Interference Concolic Testing of Concurrent Programs
Concolic testing [24, 73, 5, 84, 4] is a powerful technique for providing coverage guarantees for sequential programs and has proven successful in practice. Concolic testing assumes that programs are deterministic, i.e., they will take the same execution path when the same input values
are given to them. Several advanced search algorithms over the input space (which is the only
parameter for sequential programs) have been proposed and embedded in concolic testing, tar-
geting different coverage criteria (e.g., path coverage, branch coverage, etc.). However, apply-
ing concolic testing to concurrent programs is very challenging. The behaviour of a concurrent
program is influenced not only by input values but also by interleavings of execution of threads.
Therefore, concolic execution of concurrent programs would result in a set of constraints that
are closely tied to the specific schedule performed during program execution.
Data races have been used to leverage concolic testing to generate tests for concurrent
programs [73, 70]. For example, jCUTE [73] is a concolic testing tool for multi-threaded
Java programs that uses data races as a heuristic for interleaving exploration. It first executes
the program with some random input values and observes an execution of the program under
some random thread scheduling. It identifies possible data races in the execution and repeat-
edly either generates new inputs (by keeping the schedule same as before) or generates a new
schedule (by keeping the input values same as before) by re-ordering the events that form a
data race. The main problem with this testing technique is that it can provide coverage guar-
antees only when the testing algorithm is terminated after considering all possible orderings
of events involved in a data race. Note that this exploration space is often very large for real
world programs such that the testing algorithm fails to terminate in a reasonable amount of
time. Therefore, due to the data race heuristic, jCUTE is unable to quantify the partial work
done (e.g., in the event of a timeout) as a meaningful coverage measure for the program.
In this chapter, we generalize concolic testing to concurrent programs, and hence we call it
concurrent concolic testing, or (conc)2olic testing for short [14]. However, we use the bounded-interference heuristic to prioritize the exploration of interleavings in such a way that we can quantify the effort spent on testing as a coverage measure. We introduce a
new component in concolic testing, called interference scenario exploration component, that
explores possible interference scenarios (within the interference bound) and for each of them
generates a test (i.e., input values and a schedule) that realizes it (if possible). Using the inter-
ference scenario exploration component, we build a general testing framework where one can
employ different strategies in exploring both input space and interference scenario space. We
have implemented a search strategy that targets achieving maximal branch coverage for concur-
rent programs (time and space allowing) while considering a bounded number of interferences among threads.
(Conc)2olic testing can theoretically guarantee completeness in the limit; i.e., if the testing algorithm runs for long enough without encountering memory issues, then, in the limit,
we can cover every program branch or declare it unreachable. However, it can also provide
coverage guarantees (modulo the maximum bound reached) in the event of timeouts or out-
of-memory errors. Naturally, (conc)2olic testing is limited by the same constraints that hold
concolic testing back, namely, external function libraries or limitations of the SMT solvers for
undecidable logics.
We implemented the (conc)2olic testing technique as a tool for testing multi-threaded C
programs and used a set of benchmarks from the concurrency research literature to demonstrate the practical efficiency of our technique in providing coverage and finding bugs in these benchmarks.

6.1 Motivating Example

Figure 6.1 shows a buggy implementation of function addAll of a concurrent vector. We use
this example to explain ideas and algorithms presented in this chapter. This example also nicely
demonstrates why there is a need for systematic exploration of both input and interleaving spaces.
Function addAll has two input parameters which are pointers to vector structures. It ap-
pends all elements of the second vector to the end of the first vector. Each vector has three
fields: data which is an array holding vector elements, size which represents the size of
data, and cnt which keeps track of the number of elements in data. Function addAll uses
a lock lk to synchronize the calls to this function. It first checks whether there is enough
space to insert all elements of u->data into v->data, i.e., v->cnt + u->cnt ≤ v->size (cf. line 4). If not, it increases the size of v->data accordingly. The invariant v->size ≥ u->cnt + v->cnt is stated as an assertion at line 8. Finally, it copies the elements and updates
v->cnt. The bug in addAll corresponds to the fact that the value of v->cnt is read (at line 2) outside the lock block, and hence v->cnt can be changed by other threads before the lock is acquired.
Imagine a concurrent program with two threads T and T 0 , each of them calling addAll
with v and u as arguments, where v is shared between the threads and u is an input of the
program. Therefore, each individual field of v is treated as a shared variable and each individual
field of u is treated as an input. Also, suppose that initially v->cnt is 10 and v->size is 20.
Now, consider the situation where u->cnt=7 and the program is executed as follows:
(i) The first thread T executes line 2, reading 10 from v->cnt, 7 from u->cnt and storing
value 17 in numElem.
(ii) The second thread T 0 is executed completely. It reads values 10 and 7 from v->cnt and
u->cnt, respectively (at line 2) and assigns 17 to numElem. Then, it enters the lock
block. Since v->size is greater than 17, it skips lines 5 and 6 and assigns 17 to v->cnt.
(iii) Then, T continues execution: It skips lines 5 and 6 since (numElem=17) < (v->size=20).
However, when T gets to the assertion, v->cnt has value 17 written by T′. Therefore, the assertion v->size ≥ u->cnt + v->cnt fails, since 20 < 7 + 17. This bug requires a particular schedule combined with particular (relative) values for the input vectors to manifest. If the threads
are executed sequentially back to back, nothing goes wrong. On the other hand, if we execute
the same interleaving (as described above), but start with u->cnt having the value 3 (instead
of 7), then again nothing goes wrong; the first thread assigns 13 to numElem, the second thread skips lines 5 and 6 and assigns 13 to v->cnt. Then, the first thread skips lines 5 and 6 since (numElem=13) < (v->size=20), and the assertion holds since 20 ≥ 3 + 13. This means that triggering this concurrency bug does not
solely depend on the schedule, nor does it solely depend on the chosen input values; it depends
on finding the right combination of input values and a schedule. Any testing technique that
does not explore the combination space systematically has the potential of missing this bug.
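The interleaving discussed above can be replayed deterministically. The sketch below models the behaviour of addAll as described in the text (Figure 6.1 itself is not reproduced here, so the field names and line references follow the prose) and returns whether the line-8 assertion holds when T finally checks it:

```python
def replay_addall_schedule(u_cnt, v_cnt=10, v_size=20):
    """Replay the schedule from Section 6.1: T executes line 2 (outside the
    lock), then T' runs addAll to completion, then T resumes inside the lock.
    Returns True iff the assertion v->size >= u->cnt + v->cnt holds for T."""
    num_elem_T = v_cnt + u_cnt     # (i) T, line 2: numElem from the stale v->cnt
    # (ii) T' executes addAll completely on the shared vector:
    num_elem_T2 = v_cnt + u_cnt
    if num_elem_T2 > v_size:       # lines 5-6: grow v->data (not needed here)
        v_size = num_elem_T2
    v_cnt = num_elem_T2            # T' appends the elements and updates v->cnt
    # (iii) T resumes with its stale numElem; lines 5-6 are skipped because
    # numElem < v->size, so the capacity is never re-checked:
    if num_elem_T > v_size:
        v_size = num_elem_T
    return v_size >= u_cnt + v_cnt  # the line-8 assertion, seeing T''s write

ok_small = replay_addall_schedule(3)   # u->cnt = 3: assertion holds
ok_large = replay_addall_schedule(7)   # u->cnt = 7: assertion fails
```

Under the same schedule, u->cnt = 3 passes while u->cnt = 7 violates the assertion, confirming that neither the schedule nor the inputs alone trigger the bug.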
6.2 Preliminaries
In this section, we introduce some notions from concolic testing adjusted to our application.
Classical sequential concolic testing (discussed in Section 3.2.4) logs a set of path constraints
over input variables during concolic execution which describes the conditions on the values of
the inputs that have to be true to drive the execution of the program along the same path. How-
ever, doing the same for concolic execution of multi-threaded programs would result in a set of
constraints that are closely tied to the specific schedule performed during program execution.
To solve this problem, we proceed as follows: Instead of explicitly tracking scheduling deci-
sions, we introduce symbolic variables which enable us to track the information flow between
threads. More precisely, we introduce an additional symbolic variable each time a shared vari-
able is read. Furthermore, for each shared variable write, we store the symbolic value (based on
symbolic inputs and symbolic read variables). By doing so, we will be able to flexibly combine
reads from and writes to shared variables and build a set of path constraints in a way which is
not tied to a specific schedule but rather depends on a set of interferences among the threads.
In the following, we define the notion of global symbolic traces. Like the global traces introduced in Section 2.2.1, global symbolic traces do not contain any information about local
computations (i.e., accesses to local variables). However, for shared variable read and write
events, instead of concrete values read or written during the execution, we record symbolic
values. Furthermore, to be able to generate path constraints, global symbolic traces include
branching decisions (that depend on shared variables or input values) made throughout the
execution as well.
Formally, a concurrent program consists of a set of threads T = {T1 , T2 , . . .}, a set of input
variables IN, a set of shared variables SV, a set of local variables LV, and a set of locks L that the
threads manipulate. Let SymbIN be a set of symbolic input variables {i0 , i1 , . . .} and SymbRV
be a set of symbolic shared read variables {r0 , r1 , . . .}. Furthermore, let Expr represent the
set of all expressions over SymbIN and SymbRV, and let Pred(Expr) represent the set of all
predicates over Expr. Then, the set of actions that a thread can perform is defined as:

Act = { rd(x, r), wt(x, val), tf(Ti), ac(l), rel(l), br(φ) }

where x ∈ SV, r ∈ SymbRV, val ∈ Expr, Ti ∈ T, l ∈ L, and φ ∈ Pred(Expr).
Action rd(x, r) corresponds to reading symbolic value r from a shared variable x. Each time
we observe a read from a shared variable during concolic execution, we introduce a new sym-
bolic variable r ∈ SymbRV that is uniquely associated with that specific read. Action wt(x, val) corresponds to writing a symbolic value val ∈ Expr to a shared variable x. To couple a read of x with a write to x, it is enough to connect the stored expres-
sion at write to the symbolic value of the read, i.e., r = val. Action tf(Ti ) represents forking
thread Ti . Actions ac(l) and rel(l) represent acquiring and releasing of lock l, respectively.
Finally, action br(φ) denotes a branch which requires predicate φ to be true. We model assertions in a program by two branches, i.e., one branch for passing the assertion and one branch for violating it. An event is a pair (Ti, a) where Ti ∈ T and a is an action. Let EV denote the set of all possible events. During concolic execution, we log the sequence of observed events.

Definition 6.2.1 (Global Symbolic Trace). A global symbolic trace σ is a finite string σ ∈ EV*. By σ[n], we denote the nth event of σ. Given a global symbolic trace σ, σ|Ti is the projection of σ onto the events of thread Ti.
In this chapter, wherever we refer to symbolic traces (or, shortly, traces), we mean global symbolic traces. The inputs to the concolic execution engine (which is adapted to execute multi-threaded programs) are an input vector in of the program and a schedule π, which exactly specifies the thread interleaving. A schedule is a sequence (Ti1, n1)(Ti2, n2) . . . (Tim−1, nm−1)(Tim, ∗) where Tij ∈ T, for all 1 ≤ j ≤ m, and nj > 0, for 1 ≤ j < m, specifies the number of executed actions. A tuple (Tij, ∗) represents the execution of thread Tij until Tij terminates. A program run R = P(in, π) is feasible if P can be executed with input vector in and according to schedule π. Each feasible program run R generates a global symbolic trace, which we denote by σ(R).

We assume that the program is instrumented in such a way that all program actions covered by an execution are recorded in the trace.
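As a toy illustration of this schedule format (the replay function and the string-based action encoding below are ours, not the tool's interface), a schedule can be replayed over per-thread action lists to produce the interleaved global trace:

```python
def replay(schedule, thread_actions):
    """Replay a schedule [(tid, n), ..., (tid, '*')] over per-thread action
    lists. (tid, '*') runs the thread to completion; n > 0 runs exactly n
    actions. Returns the interleaved trace as (thread, action) events."""
    pos = {t: 0 for t in thread_actions}   # next action index per thread
    trace = []
    for tid, n in schedule:
        acts = thread_actions[tid]
        end = len(acts) if n == '*' else pos[tid] + n
        assert end <= len(acts), "infeasible schedule: thread has too few actions"
        trace.extend((tid, a) for a in acts[pos[tid]:end])
        pos[tid] = end
    return trace

# T runs one action, T' runs to completion, then T finishes:
trace = replay([('T', 1), ("T'", '*'), ('T', '*')],
               {'T': ['rd(v->cnt)', 'ac(lk)'],
                "T'": ['rd(v->cnt)', 'wt(v->cnt)']})
```

This mirrors the interleaving of Section 6.1, where T performs one read before T′ executes completely.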
Figure 6.2, on the left, shows the symbolic trace obtained from the assertion-violating execution of the program in Figure 6.1, discussed in Section 6.1. Note that concolic execution does not log any information about accesses to local variables. Internally, the concolic execution engine keeps track of the symbolic values of local variables and is therefore able to record symbolic values for shared-variable accesses and branch predicates.
Figure 6.2: Symbolic trace obtained from the assertion-violating execution of the program presented in Figure 6.1 and its corresponding interference scenario IS(σ). i0 represents the symbolic value of input u->cnt; r0, r0′, r3′, and r4′ read the initial value 10, and r1, r2, r1′, and r2′ read the initial value 20.
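To see how coupling turns this trace into solvable input constraints, consider the assertion-violating path under the stated initial values (v->cnt = 10, v->size = 20, with i0 the symbolic u->cnt). Coupling r3 to the write wt(v->cnt, r4′ + i0) reduces every path constraint to a constraint over i0 alone; the brute-force enumeration below is an illustrative stand-in for what the SMT solver does:

```python
def bug_inputs(lo=0, hi=20, v_cnt0=10, v_size0=20):
    """Enumerate values of the symbolic input i0 (= u->cnt) that satisfy the
    path constraints of the assertion-violating trace, with read r3 coupled
    to the write wt(v->cnt, r4' + i0) of the other thread."""
    sols = []
    for i0 in range(lo, hi + 1):
        r0 = r0p = r3p = r4p = v_cnt0      # reads of the initial v->cnt
        r1 = r2 = r1p = r2p = v_size0      # reads of the initial v->size
        r3 = r4p + i0                      # coupling constraint: r3 = r4' + i0
        if (r0p + i0 <= r1p and            # T' skips lines 5-6
            r2p >= i0 + r3p and            # T' passes the assertion
            r0 + i0 <= r1 and              # T skips lines 5-6 (stale numElem)
            r2 < i0 + r3):                 # T fails the assertion
            sols.append(i0)
    return sols

sols = bug_inputs()
```

The failing branch 20 < i0 + (10 + i0) forces i0 ≥ 6, while the skipped-growth branches force i0 ≤ 10, so exactly the inputs 6 through 10 expose the bug; i0 = 7 from Section 6.1 is one of them, and i0 = 3 is not.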
6.3 Interference Scenarios

In this section, we formally define the concept of interference scenarios, its variations, and
some applicable operations, which form the basis for (conc)2olic testing. We also define two
constraint systems to generate input values and schedules according to interference scenarios,
respectively.
An interference occurs whenever a thread reads a value that is written by another thread. We
introduce interference scenarios to describe a class of program executions under which certain interferences happen. An interference scenario consists of a set of thread-local symbolic traces extended with an interference relation between write and read
events from different threads. We represent a set of interference scenarios in a data structure
called interference forest. Formally, an interference forest is a finite labeled directed acyclic
graph whose nodes represent events and whose edges express relations between events.
For each node v ∈ V where ℓ(v) = (Ti, a), we also define Th(v) = Ti and Ac(v) = a to be the thread and the action of the corresponding event, respectively. The set of edges E is the disjoint union of thread-local edges EL and interference edges EI. A thread-local edge (or simply, a local edge) is an edge (s, t) ∈ EL where Th(s) = Th(t). An interference edge (s, t) ∈ EI is an edge where Th(s) ≠ Th(t) and Ac(s) = wt(x, val) and Ac(t) = rd(x, r) for
some x, val, and r. We require that EI is an injective relation, i.e., each read is connected to at
most one write by EI . The thread-local edges can be naturally partitioned according to their
threads, i.e., EL = ET1 ⊎ ET2 ⊎ . . . ⊎ ETn. Each ETi induces a subforest GTi which consists
of all nodes with Th(v) = Ti and edges in ETi . We require that each GTi is a rooted tree.
The number of interference edges |EI| is called the degree of the interference forest.
Figure 6.3: Example of an interference forest. Dashed lines enclose an interference scenario in the forest (the causal interference scenario of node n).

Given an interference forest J, RI(J) denotes the read nodes involved in the interference edges of J.
Figure 6.3 shows an interference forest. The nodes labeled with read/write and branch
actions are represented by squares and diamonds, respectively. Local edges are depicted by arrows and interference edges by dotted arrows. The left tree represents GT1 and
the right tree represents GT2 . The degree of the interference forest is 2.
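A minimal encoding of interference forests (an illustrative data structure of our own, not the tool's implementation) that enforces the well-formedness conditions above (local edges stay within a thread, interference edges go from a write to a read in another thread, and EI is injective on reads) and computes the degree |EI|:

```python
class InterferenceForest:
    """Nodes carry (thread, action) labels; local edges stay within a thread;
    interference edges couple a write to a read of another thread, injectively
    on reads. degree() returns |E_I|."""
    def __init__(self):
        self.label = {}            # node -> (Th(v), Ac(v))
        self.local = set()         # E_L
        self.intf = {}             # E_I as read node -> write node (injective)
    def add_node(self, v, thread, action):
        self.label[v] = (thread, action)
    def add_local_edge(self, s, t):
        assert self.label[s][0] == self.label[t][0]   # Th(s) = Th(t)
        self.local.add((s, t))
    def add_interference_edge(self, w, r):
        tw, aw = self.label[w]; tr, ar = self.label[r]
        assert tw != tr and aw.startswith('wt') and ar.startswith('rd')
        assert r not in self.intf                     # each read coupled at most once
        self.intf[r] = w
    def degree(self):
        return len(self.intf)                         # |E_I|

# A forest with two thread trees and two interference edges, as in Figure 6.3:
f = InterferenceForest()
f.add_node(1, 'T1', 'wt(y)');    f.add_node(2, 'T1', 'rd(x,r)')
f.add_node(3, 'T2', 'wt(x,e)');  f.add_node(4, 'T2', 'rd(y,r2)')
f.add_local_edge(1, 2);          f.add_local_edge(3, 4)
f.add_interference_edge(3, 2);   f.add_interference_edge(1, 4)
```

The resulting forest has degree 2, matching the example in Figure 6.3.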
The transitive closure E* of the edge relation E is called the causality relation of I. Given a node n, the causal interference scenario (CIS) of n is the subforest of I induced by the causal predecessors of n, i.e., by the node set {v | (v, n) ∈ E*}. We denote it by C = CIS(I, n).
Every causal interference scenario is itself an interference scenario. This is also the crucial property why interference forests serve as compact representations for sets of interference scenarios. In Figure 6.3, the causal interference scenario of node n is the interference scenario enclosed by the dashed lines.
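The causal predecessors {v | (v, n) ∈ E*} can be computed by reverse reachability; a short sketch (function name ours) that collects this set for a node n:

```python
def causal_predecessors(edges, n):
    """Nodes v with (v, n) in the transitive closure E* of the edge relation,
    found by DFS over the reversed edges. CIS(I, n) is the subforest of I
    induced by this set."""
    rev = {}
    for s, t in edges:
        rev.setdefault(t, []).append(s)
    seen, stack = set(), [n]
    while stack:
        v = stack.pop()
        for p in rev.get(v, []):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

# A local chain a -> b -> n plus an interference edge c -> b from another thread:
preds = causal_predecessors([('a', 'b'), ('b', 'n'), ('c', 'b')], 'n')
```

Here all of a, b, and c causally precede n, so the causal interference scenario of n is induced by {a, b, c}.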
Let J and K be two interference forests, and let RdsJ and RdsK be the symbolic variables corresponding to reads from shared variables in the node labels of J and K, respectively. J and K are isomorphic if there exist a bijection f : RdsJ → RdsK and a bijection g : VJ → VK such that (i) g(vj) = vk iff ℓK(vk) is equal to ℓJ(vj) where each symbolic variable r ∈ RdsJ is replaced by f(r), and (ii) g preserves the edge relation.
Isomorphism on interference forests is like isomorphism on labeled graphs, where isomorphic nodes have the same labels modulo the symbolic variables corresponding to reads from shared variables. Construction of new interference scenarios from existing ones and merging interference scenarios into an interference forest are two central operations in the (conc)2olic testing technique.
Definition 6.3.5 (Compatible Interference Forests). Two interference forests I, J are compati-
ble if there is an interference forest K and interference subforests I′, J′ of K such that I′ and J′ are isomorphic to I and J, respectively.

Definition 6.3.5 also applies to compatible interference scenarios since each interference scenario is itself an interference forest.
Remark 6.3.6. Two compatible interference forests can be merged into an interference forest
by naturally taking the minimal K, i.e., K only contains nodes and edges corresponding to I 0
and J 0 . Note that if interference scenarios I and J are not compatible, then there is at least
Each symbolic trace σ (obtained from a program execution) defines an interference scenario, denoted by IS(σ). Intuitively, each event represents a unique node in IS(σ) which is labeled with that event. For each thread Ti, thread-local edges are added between the corresponding nodes according to the order in σ|Ti (where σ|Ti = σi,1, σi,2, ..., σi,m denotes the projection of events in σ on thread Ti). An interference edge is added for each node labeled with a read event if the last write event to the same shared variable before the read event in σ was performed by a different thread.
Formally, IS(σ) = (V, E, ℓ) where

V = ∪Ti {ni,j | ni,j is a unique node for event σi,j},

and E contains a thread-local edge (ni,j, ni,j+1) for consecutive events of each thread Ti, together with an interference edge (ni,k, nj,h) whenever Ac(σi,k) = wt(x, val) and Ac(σj,h) = rd(x, r), for some x, val, r, and σi,k is the last write to x in σ before σj,h.
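The construction above can be sketched as follows; the tuple encoding of events as (thread, kind, variable) is an illustrative assumption, not the thesis' event type:

```python
# Sketch: deriving the interference scenario IS(sigma) from a symbolic trace.
# Thread-local edges follow the per-thread projection of the trace; an
# interference edge connects a read to the last preceding write to the same
# shared variable when that write comes from another thread.

def interference_scenario(trace):
    nodes = list(range(len(trace)))
    edges = set()
    last_in_thread = {}   # thread -> last node of that thread
    last_write = {}       # shared var -> node of the last write to it
    for n, (thread, kind, var) in enumerate(trace):
        if thread in last_in_thread:               # thread-local edge
            edges.add((last_in_thread[thread], n))
        last_in_thread[thread] = n
        if kind == 'wt':
            last_write[var] = n
        elif kind == 'rd' and var in last_write:
            w = last_write[var]
            if trace[w][0] != thread:              # interference edge
                edges.add((w, n))
    return nodes, edges
```

For the trace T1:wt(x), T2:rd(x), T2:wt(y), T1:rd(y), this produces two thread-local edges and two interference edges, one per cross-thread data flow.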
Figure 6.2 shows the interference scenario for the symbolic trace obtained from the assertion-violating execution of the example program.

Definition 6.3.7 (Realizable Interference Scenario). An interference scenario I is realizable in a concurrent program P iff there is a feasible partial run R of P with σ = σ(R) such that IS(σ) is isomorphic to I.
Realizable interference scenarios define equivalence classes on the set of program runs which represent the same flow of data among the threads. Note that interference scenarios are not monotonic w.r.t. realizability. Let I and I′ be two interference scenarios where I is a subgraph of I′. Then, the realizability of I does not imply the realizability of I′, and vice versa. We will discuss the reasons for this behaviour at the end of this section.
Interference scenarios specify partial program runs, and therefore unanticipated behaviour can be observed: let I be a realizable interference scenario and let R be a partial program run with σ = σ(R) such that I is isomorphic to IS(σ). Let R′ be a run that extends R, with σ′ = σ(R′). Then IS(σ′) need not coincide with IS(σ). More specifically, IS(σ′) might contain some additional interferences. We refer to these as unforeseen interferences.
Each interference scenario implies constraints on both data and temporal order of the events.
Here, we describe these constraints in detail. In Section 6.4.3, we present a theorem (Theo-
rem 6.4.3) that shows how these constraints can be used to check for the realizability of an
interference scenario.
Data Constraints. Each interference scenario I = (V, E, `) defines a data constraint DC(I) as
shown in Figure 6.4. Any solution to DC(I) (if one exists), defines an input vector i for the
concurrent program. The constraint DC(I) consists of three parts: (i) DCbranch , (ii) DCinterference ,
and (iii) DClocal . The constraint DCbranch encodes all branch conditions occurring in I. The
intuition behind this constraint is that the program execution should follow the control path
in each thread represented by the respective branching conditions. DCinterference relates each
read from a shared variable, which should be interfered by a write from another thread, to the
symbolic value of the corresponding write. Finally, DClocal relates each read from a shared
variable, which should not be interfered by any write from other threads, to the most recent
write to the same shared variable performed by the same thread. If there is no such write
before the read, the symbolic value of the shared variable is constrained to the initial value of
the variable. In this formula, let intReads represent all nodes v with Ac(v) = rd(x, r) such that v is involved in an interference edge in EI, and let LocW be a function that, for each node v with Ac(v) = rd(x, r) and Th(v) = Ti, returns the node u with Ac(u) = wt(x, val) and Th(u) = Ti that is the most recent such write before v in the same thread (if one exists).
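A toy version of DC(I) can be assembled as a conjunction of branch, interference, and local constraints. The brute-force search over a small input domain below is a stand-in for the SMT solver used in practice; all names and the lambda-based constraint encoding are illustrative assumptions:

```python
from itertools import product

# Toy sketch of DC(I) = DC_branch /\ DC_interference /\ DC_local.
# Constraints are Python predicates over a valuation of the symbolic
# variables; a real implementation hands the conjunction to an SMT solver.

def solve_dc(variables, domain, dc_branch, dc_interference, dc_local):
    """Brute-force a model of the conjunction, or return None."""
    constraints = dc_branch + dc_interference + dc_local
    for values in product(domain, repeat=len(variables)):
        env = dict(zip(variables, values))
        if all(c(env) for c in constraints):
            return env
    return None

# Example: input i flows into a write wt(x, i+1); an interfered read
# rd(x, r) must match it (DC_match: r = val), and a branch needs r > 3.
model = solve_dc(
    ['i', 'r'], range(0, 6),
    dc_branch=[lambda e: e['r'] > 3],
    dc_interference=[lambda e: e['r'] == e['i'] + 1],   # r coupled to the write value
    dc_local=[],
)
```

Any model found this way directly yields the input vector i mentioned in the text.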
Temporal Constraints. Each interference scenario I = (V, E, ℓ) also defines a temporal-consistency constraint TC(I). Any solution to this constraint defines a schedule for the concurrent program. The constraints in TC(I), as defined in Figure 6.4, are divided into the following
Figure 6.4: Constraint systems DC(I) and TC(I) for an interference scenario I = (V, E, ℓ). (One representative entry: DCmatch(vrd, vwt) : (r = val), for Ac(vwt) = wt(x, val) and Ac(vrd) = rd(x, r).)
four categories: (i) thread-local program-order consistency (POTi), (ii) thread-fork consistency (FC), (iii) lock consistency (LC1 & LC2), and (iv) write-read consistency (WRC1 & WRC2). For each node n in I, an integer variable tn (its timestamp) encodes the index of the node's event in a symbolic trace σ. In the constraints in Figure 6.4, let ni,j represent the jth node in GTi, and let ntf(Ti) represent the node n with Ac(n) = tf(Ti). The constraints of TC(I) are:
POTi : Ensures that for thread Ti , the thread-local program order is respected in the schedule.
LC1 & LC2: Each lock-acquire node aq with Ac(aq) = ac(l) and Th(aq) = Ti and its corresponding lock-release node rl in Ti define a lock block, represented by [aq, rl]. Let LTi,l
be the set of lock blocks in thread Ti regarding lock l. LC1 ensures that no two threads can be
inside lock blocks of the same lock l, simultaneously. LC2 ensures that the acquire of lock l
by a thread that never releases it in I must occur after all releases of lock l in other threads. In
this formula, NoRelTi ,l stands for lock acquire nodes in Ti with no corresponding lock release
nodes.
WRCinterference & WRClocal: Let Wx represent the set of all nodes u with Ac(u) = wt(x, val), and let intReads and LocW be as defined before. For each read node v and write node u, the formula Coupled(v, u) ensures that the read event of v is coupled with the write event of u in σ by forcing every event that writes to the corresponding variable to happen either before the event of u or after the event of v.
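A toy version of TC(I) can assign integer timestamps subject to program order and write-read coupling; the brute-force search over permutations below stands in for the solver, and all node names are illustrative:

```python
from itertools import permutations

# Toy sketch of TC(I): find timestamps respecting program order (PO) and
# Coupled(w, r): for a coupling on variable x, every other write to x must
# happen either before w or after r.

def solve_tc(nodes, program_order, couplings, writes_per_var):
    for perm in permutations(nodes):
        t = {n: i for i, n in enumerate(perm)}
        if any(t[a] >= t[b] for (a, b) in program_order):
            continue                                   # PO violated
        ok = True
        for (w, r, var) in couplings:
            if t[w] >= t[r]:
                ok = False
                break
            for other in writes_per_var[var]:
                if other != w and t[w] < t[other] < t[r]:
                    ok = False                         # intervening write
                    break
            if not ok:
                break
        if ok:
            return t
    return None

# Two same-thread writes w1, w2 to x; the read r must be coupled with w1,
# so w2 has to be pushed after r.
model = solve_tc(['w1', 'w2', 'r'], [('w1', 'w2')],
                 [('w1', 'r', 'x')], {'x': ['w1', 'w2']})
```

The timestamps of a model, sorted in ascending order, give exactly the schedule mentioned in the text.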
These constraint systems explain the non-monotonic behaviour of interference scenarios w.r.t. realizability that was mentioned in the discussion following Definition 6.3.7. Let I and I′ be two interference scenarios where I is a subgraph of I′. Then,
according to the data constraints, all constraints in DCbranch(I) and DCinterference(I) appear in DCbranch(I′) and DCinterference(I′), respectively. However, the constraints in DClocal(I) and DClocal(I′) are incomparable. The same phenomenon exists in the temporal-consistency constraints, i.e., WRClocal in I and I′ are incomparable. This implies that, by extending an interference scenario, the resulting constraint systems do not change in a monotonic way.
In this section, we first propose a general concolic testing framework for concurrent programs
based on the notion of interference scenarios defined in the previous section. Then, we develop
Figure 6.5: The (conc)²olic testing framework: a concolic execution engine takes inputs and a schedule (initially random inputs and a sequential schedule), produces a multi-threaded symbolic trace, and feeds the path exploration and interference scenario exploration components.
Figure 6.5 shows the (conc)²olic testing framework. Similar to concolic testing (see Figure 3.3), (conc)²olic testing has a concolic execution engine, a path exploration component, and a realizability check component. However, (conc)²olic testing has one more component, called the interference exploration component, which explores the space of interference scenarios.
In (conc)²olic testing, the execution engine is leveraged to execute a concurrent program
with the provided input values based on a given schedule. Symbolic traces obtained from
concolic execution are stored in an interference forest that keeps track of various interference
scenarios that have already been explored. The path exploration component then, based on an
already-seen interference scenario, aims to cover a previously uncovered part of the program
(e.g., uncovered branches), by doing input exploration according to that interference scenario.
Based on the interference scenario and a target branch defined by the path exploration, the
realizability check component investigates whether there exist a set of input values and a fea-
sible schedule such that the execution of the concurrent program with the inputs and based
on the schedule results in covering the branch. If the answer is yes, the next round of concolic execution uses input values and a schedule that realize the interference scenario and cover the branch. In the case that the answer is no, the interference exploration component extends the interference scenario by introducing new interferences. In the following, we describe each of these components in more detail.
Concolic Execution. There are two input parameters for the concolic execution engine in (conc)²olic testing: (1) an input vector and (2) a schedule. The concolic execution engine executes the concurrent program with the given input vector and according to the given schedule.
The program is instrumented such that, during the execution, all accesses to shared variables,
synchronization events, and branching events are recorded to generate a global symbolic trace.
This global symbolic trace contains all necessary information for the (conc)²olic engine to make progress. However, it excludes any extra information, such as details of local computations of threads, that can safely be ignored in (conc)²olic testing to gain scalability and efficiency.
Path Exploration. The role of the path exploration component is to explore the input space for
a new set of input values that according to a previously-seen interference scenario covers a yet
uncovered part of the program. As in concolic testing, it gets a symbolic trace and selects a
branching event whose condition should be flipped to drive the execution towards an uncovered
part of the program. The goal of path exploration is to use the exact set of interferences seen in a global symbolic trace and to explore the input space based on that. Therefore, the path exploration component keeps the observed interferences fixed while searching for new input values.
Realizability Checker. Getting an interference scenario with an uncovered target branch, the
realizability checker determines if there is a set of input values and a feasible schedule such that
the execution of the program with the input values and under the schedule realizes the given
interference scenario and leads to covering the target branch. It generates the corresponding
constraint systems (discussed in Section 6.3.2) for the interference scenario. There are two possible outcomes:
(i) The combined constraint system has a solution. Then, any solution implies an input vec-
tor and a schedule which give rise to a program execution leading to covering the target
branch with exactly the same set of interferences as defined in the given interference sce-
nario. Therefore, we can formulate a new execution for the next round of the concolic
execution module.
(ii) At least one of the constraint systems does not have a solution. This means that the interference scenario has to change. The current interference scenario is passed to the interference exploration module (described below), so that new ones are produced based on it.
Interference Exploration. The interference exploration component extends an interference scenario by a new interference. This is done by picking a read from the given interference scenario that is not interfered by other threads, and an appropriate write from the forest, and adding an interference from the write to the read to generate a new interference scenario. Note that the occurrence of the write event itself may be conditional on the existence of other interferences. Therefore, to preserve soundness, all of those interferences have to be included in the produced interference scenario as well.
Search Strategy. Having the above components, (conc)²olic testing can exploit different search strategies and heuristics to explore the interference scenario space. We have developed a search strategy based on the bounded-interference heuristic and targeting branch coverage. According to this search strategy, all interference scenarios with one interference are explored first, then interference scenarios with two interferences, and so on. A nice feature of this exploration strategy is that it is complete (Theorem 6.4.3) modulo the interference bound (and, of course, concolic testing limitations).
Here, we present an algorithm that instantiates the (conc)²olic testing framework by employing the bounded-interference heuristic in the search strategy and targeting branch coverage. Each assertion in the program can be modeled by two branches, one for passing the assertion and one for its violation. Therefore, our (conc)²olic testing implicitly aims at finding assertion violations. We are specifically interested in interference scenarios related to nodes labeled with
branch actions:
Definition 6.4.1 (Interference Scenario Candidate, ISC). Let n be a node with Ac(n) = br(c), for some branch condition c. A causal interference scenario C is an interference scenario candidate for node n
if sink(C) = n.
Note that each ISC C with sink(C) = n (if realizable) defines a set of partial program runs in which Ac(n) is the last action of the run. According to the bounded-interference heuristic, our algorithm enumerates all ISCs of degree i and checks their realizability before moving on to ISCs of degree i + 1.
Assumptions. For a concurrent program P, we assume that (i) individual threads in P are deterministic sequential programs, and (ii) all threads are created by the main method of program P. Furthermore, to keep the exposition simple, we make the following simplifying assumptions: (iii) there are no unforeseen interferences for an ISC C, i.e., each program run R′ extending a partial run R, with C = IS(R), results in an interference scenario IS(R′) which has exactly the same interferences as C; (iv) there are no locks in the concurrent programs. Note that we state assumptions (iii) and (iv) only for ease of presentation, and our (conc)²olic testing is not limited to settings where these assumptions hold; in particular, all benchmark programs in Section 6.5.2 contain locks. We address removing these assumptions in Section 6.4.4.
Algorithm 3 shows our (conc)²olic testing algorithm. Given a concurrent program P and a threshold kmax for the number of interferences, the algorithm explores ISCs of degree at most kmax with the aim of increasing branch coverage. For each such ISC, the algorithm tries to compute a corresponding test.
1-5: Data Structures. The algorithm utilizes three central data structures: (i) a global interference forest forest that stores all interference scenarios explored by concolic execution, (ii) a list of sets W0, ..., Wkmax, where each Wk, for 0 ≤ k ≤ kmax, serves as a worklist for ISCs of degree k, and (iii) a list of sets UN0, ..., UNkmax, where each UNk, for 0 ≤ k ≤ kmax, stores all processed but unrealizable ISCs of degree k. All these data structures are initially empty (cf. lines 1 to 5). During the execution of Algorithm 3, each generated ISC C of degree k is initially inserted into Wk. Later on, Algorithm 3 checks for the realizability of C and moves it to UNk if it turns out to be unrealizable.
Algorithm 3: The (conc)²olic testing algorithm.

1: InterferenceForest forest ← ∅
2: // worklists for ISCs of each degree
3: for k = 0 to kmax do
4:     Wk ← ∅
5:     UNk ← ∅
6: i ← random inputs
7: foreach thread Tj do
8:     σ ← ConcolicExecution(P, (i, (Tj, ·)))
9:     W0 ← W0 ∪ ExtractISCs(σ)
10: for k = 0 to kmax do
11:     while Wk ≠ ∅ do
12:         remove an ISC C from Wk
13:         ISC-Set iscs ← ∅
14:         (result, i, π) ← RealizabilityCheck(C)
15:         if result = unrealizable then
16:             UNk ← UNk ∪ {C}
17:             iscs ← ExploreISCs(C, write-nodes(forest))
18:         else
19:             σ ← ConcolicExecution(P, (i, π))
20:             Wk ← Wk ∪ ExtractISCs(σ)
21:             Wrts ← new-write-nodes(forest, σ)
22:             foreach C′ ∈ UNi′, 0 ≤ i′ ≤ k − 1 do
23:                 iscs ← iscs ∪ ExploreISCs(C′, Wrts)
24:         foreach C′ ∈ iscs do
25:             k′ ← Degree(C′)
26:             if k′ ≤ kmax then
27:                 Wk′ ← Wk′ ∪ {C′}
Algorithm 4: ExtractISCs(σ).

1: F ← IS(σ), extended by a dangling dual node for each branch node
2: MergeInterferenceForests(forest, F)
3: ISC-Set iscs ← ∅
4: foreach dangling node d that was not merged with an existing node do
5:     iscs ← iscs ∪ {CIS(forest, d)}
6: return iscs
6-9: Initial Path Exploration. We initialize W0 by executing a test (i, (Tj, ·)) for each thread Tj (line 8), where i is a random input vector (we use the same i for each thread Tj) and the schedule (Tj, ·) allows only a sequential execution of thread Tj without any interruption from other threads. After the concolic execution of thread Tj, program execution is aborted without executing any other thread, and a global symbolic trace σ is returned. The global symbolic trace σ is passed to ExtractISCs (at line 9), which derives new ISCs for uncovered branches with exactly the same set of interferences implied by σ. Since σ corresponds to a thread-local execution here, the degree of all generated ISCs is equal to 0. The ExtractISCs algorithm is described in the next paragraph. The returned ISCs are inserted into worklist W0. After the initial path exploration phase, W0 contains, for each thread in P, a set of ISCs for further exploration.
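The dual-branch-node step that ExtractISCs performs on a trace's branch nodes can be sketched as follows; the pair encoding of branch nodes and the string-based negation are illustrative assumptions:

```python
# Sketch of the dangling-node step in ExtractISCs: for every branch node in
# IS(sigma), form a dual node whose symbolic constraint is negated, and keep
# only the duals whose constraint is new to the forest.

def add_dangling_nodes(branch_nodes, known_constraints):
    """branch_nodes: (node_id, constraint_string) pairs;
    known_constraints: constraints already present in the forest.
    Returns the dual (dangling) nodes that were not merged away."""
    dangling = []
    for node_id, cond in branch_nodes:
        dual = ('dual_' + str(node_id), 'not(' + cond + ')')
        if dual[1] not in known_constraints:
            dangling.append(dual)
    return dangling
```

On the initial sequential runs the forest starts out empty, so every branch node contributes a dangling dual node and hence a degree-0 ISC.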
ExtractISCs. Algorithm 4 (ExtractISCs) gets a global symbolic trace σ as input. It first obtains an interference scenario according to σ, i.e., IS(σ). For example, Figure 6.6a shows IS(σT0), where σT0 is the global symbolic trace returned by the initial sequential execution of thread T0 (introduced in Figure 6.2). Then, the algorithm tries to generate new ISCs for uncovered branches according to IS(σ). Each branch node in IS(σ) has a corresponding dual branch node in which its symbolic constraint is negated. ExtractISCs (at line 1) extends IS(σ) to an interference forest F by introducing, for each branch, a dual branch node,
(a) Interference scenario IS(σT0) for a symbolic trace σT0 obtained by a sequential execution of thread T0 (cf. Figure 6.2).
Figure 6.6: An example showing initial path exploration for thread T0 (cf. Figure 6.2).
called a dangling node. Figure 6.6b shows the extension of IS(σT0) in Figure 6.6a by dangling nodes. The generated interference forest F is then merged into forest (cf. line 2) as described in Remark 6.3.6. Finally, for each dangling node which was not merged with an existing node, ExtractISCs creates an ISC (cf. lines 4 and 5). For example, CIS(forest, d1) and CIS(forest, d2) in Figure 6.6c are the ISCs generated by ExtractISCs from the interference forest in Figure 6.6b. The generated ISCs are returned to the main algorithm. Note that all of these ISCs will have the same interferences as IS(σ). Since forest is initially empty during the initial path exploration phase, no dangling node is merged away and each of them yields an ISC.
10-27: Main Loop. The testing algorithm processes worklists W0, ..., Wkmax in ascending order. While processing Wk, each ISC C ∈ Wk is removed from Wk and its realizability is
Figure 6.7: Interference scenario IS(σ) from Figure 6.2, extended by dangling nodes d1, d2, d3, and d4.
checked. Given an ISC C, RealizabilityCheck returns a triple (result, i, π) where result indicates whether or not C is realizable and, if so, i and π are a corresponding input vector and schedule.
15-17: ISC Exploration. If C is not realizable, it is stored into UNk for later processing. Since the realizability of ISCs is not monotonic (as discussed in Section 6.3), C still has a chance to become realizable if some more interferences are introduced into it. Therefore, the algorithm collects all write nodes stored in forest (cf. line 17) and calls ExploreISCs (Algorithm 5) to extend C to a set of ISCs for target branch sink(C) by introducing a new interference from a write in Wrts to a read in C. Each of the generated ISCs has a degree i > k and is added to Wi in lines 24 to 27. Since i > k, the newly generated ISCs will be processed after Wk is exhausted.
19-20: Path Exploration. If C is realizable, then the program is concolically executed with input vector i and according to schedule π (cf. line 19). The moment sink(C) is executed, the schedule enforces an exclusive execution of thread Th(sink(C)) without any interruption from other threads. The concolic execution returns a global symbolic trace σ which is fed
Algorithm 5: ExploreISCs(C, Wrts).

1: ISC-Set iscs ← ∅
2: Rds ← read nodes in C that are not interfered
3: foreach nr ∈ Rds do
4:     foreach nw ∈ Wrts do
5:         if nr and nw access the same shared variable then
6:             ...
7:             Iw ← CIS(forest, nw)
8:             if nr ∉ RI(Iw) and compatible(C, Iw) then
9:                 C′ ← merge(C, Iw)
10:                // extend thread-local sub-scenarios past lock blocks (cf. Section 6.4.4)
11:                iscs ← iscs ∪ {C′ extended by the interference edge (nw, nr)}
12:                ...
13:                ...
14: return iscs
to ExtractISCs to update forest and to derive ISCs from σ, similar to the k = 0 case described earlier. Figure 6.7 shows an example where k > 0. There, the loosely dashed lines enclose the interference scenario candidate CIS(forest, d4). All generated ISCs will be added to Wk.
21-23: ISC Re-Exploration. When forest is updated with IS(σ) during the path exploration, some write nodes might be added to the forest. For each of these write nodes, all previously unrealizable ISCs have to be reconsidered and extended (if possible) by an interference from the new write node to a read node in these ISCs. This happens at lines 21 to 23; each previously unrealizable ISC is extended by an interference corresponding to the newly observed write (if possible). All of these writes require exactly k interferences to happen, since they occur after sink(C) is covered. Therefore, the ISCs generated at line 23 have a degree greater than k and are added to the corresponding worklists at lines 24 to 27.
ExploreISCs. Algorithm 5 (ExploreISCs) extends a given ISC with new interferences. Let nr be a read node in a given ISC C. Let nw be a write node in forest and let Iw be the causal interference scenario of nw, i.e., Iw = CIS(forest, nw). To create an ISC C″ which extends C by the interference (nw, nr), the algorithm checks the following conditions:

1. nr is not an interfered read of Iw, i.e., nr ∉ RI(Iw).

2. nr and nw correspond to the same shared variable, i.e., if Ac(nr) = rd(x, r), for some symbolic variable r, then Ac(nw) is of the form wt(x, val), for some expression val.

3. C and Iw are compatible interference forests (cf. Definition 6.3.5 in Section 6.3).

If all conditions are satisfied, then C and Iw are merged as described in Remark 6.3.6 and form an ISC C′. Then, C′ is extended to the ISC C″ by introducing the interference edge (nw, nr). Algorithm 5 collects all generated ISCs and finally returns them at line 14. Note that each generated ISC C″ has the same sink as C, i.e., sink(C″) = sink(C), and has at least one more interference than C, i.e., Degree(C″) ≥ Degree(C) + 1. The degree of C″ might increase by more than one, because Iw might contain interferences which are not present in C.
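The condition checks and the extension step can be sketched as follows; all predicates and the edge-set encoding of ISCs are illustrative stand-ins for the definitions above:

```python
# Sketch of the ExploreISCs checks before extending an ISC C by an
# interference (nw, nr): same shared variable, nr not already an interfered
# read of Iw (nr not in RI(Iw)), and C compatible with Iw.

def can_extend(nr, nw, var_of, interfered_reads_of_iw, compatible):
    """var_of: node -> shared variable it accesses (illustrative encoding)."""
    same_var = var_of[nr] == var_of[nw]
    return same_var and nr not in interfered_reads_of_iw and compatible

def extend_isc(c_edges, iw_edges, nw, nr):
    """Merge C with Iw = CIS(forest, nw) and add the new interference edge;
    this mirrors C'' = merge(C, Iw) + (nw, nr)."""
    return c_edges | iw_edges | {(nw, nr)}
```

Because the merged result also contains all of Iw's edges, the degree of the new ISC can grow by more than one, exactly as noted in the text.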
6.4.3 Soundness and Completeness

In Section 6.3.2, we discussed the data constraints DC(I) and temporal constraints TC(I) corresponding to an interference scenario I. In the following, we first show that these constraints can be used to check the realizability of I, which guarantees soundness. Then, we prove that Algorithm 3 is complete.
The soundness of our (conc)²olic testing is based on the following lemma:

Lemma 6.4.2. Assume that TC(C) and DC(C) are satisfiable for an ISC C generated by Algorithm 3. Let i be a model for DC(C) and let π be a schedule obtained according to the values of the timestamps (sorted in ascending order) from a model of TC(C). Then, R = P(i, π) is a feasible run of program P and IS(R) is isomorphic to C.
Theorem 6.4.3. Let C be an ISC generated by Algorithm 3. C is realizable if and only if DC(C) and TC(C) are satisfiable.

Proof. (⇒): First, we assume that C is realizable, i.e., there exists a program run R = P(i, π) such that IS(R) is isomorphic to C. In this case, i and π are models for DC(C) and TC(C), respectively.

(⇐): Now, we assume that for an ISC C generated by Algorithm 3, DC(C) and TC(C) are satisfiable. Let i be a model for DC(C) and let π be a schedule obtained from a model of TC(C). According to Lemma 6.4.2, R = P(i, π) is a feasible run of program P and IS(R) is isomorphic to C, i.e., C is realizable.

In Algorithm 3, the realizability of an ISC C is therefore checked by determining whether DC(C) and TC(C) are both satisfiable. Assume that C is realizable, i is a model for DC(C), and π′ is a schedule obtained according to the values of the timestamps (sorted in ascending order) from a model of TC(C). Then, the RealizabilityCheck algorithm (called in Algorithm 3 at line 14) returns a triple (result, i, π) where result determines that C is realizable and π extends π′ in a way that forces the sequential execution of thread Th(n) after π′.
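Turning a model of TC(C) into a schedule amounts to sorting the events by their timestamp values; the (thread, node) list format below is an illustrative encoding of a schedule:

```python
# Sketch: deriving a schedule from a TC(C) model by sorting nodes on their
# timestamp values, as RealizabilityCheck does.

def schedule_from_timestamps(timestamps, thread_of):
    """timestamps: node -> integer model value; thread_of: node -> thread id.
    Returns the (thread, node) pairs in execution order."""
    order = sorted(timestamps, key=lambda n: timestamps[n])
    return [(thread_of[n], n) for n in order]
```

Any total order consistent with the timestamp values is a valid schedule, since the TC constraints only relate timestamps by inequalities.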
Theorem 6.4.4 (Soundness). Each program test run R = P(i, π) generated by Algorithm 3 to realize an ISC C is feasible and realizes C.

Proof. A program test run R = P(i, π) is generated whenever an ISC C is found to be realizable. In this case, i is a solution to DC(C) and π is a schedule according to a model of TC(C). The claim then follows from Lemma 6.4.2.
To state the completeness theorem, we first define the notion of k-coverable branches: a branch br is k-coverable if there exists a feasible program run that covers br and whose interference scenario has degree k. The completeness of our (conc)²olic testing is based on the following lemma:

Lemma 6.4.6. For each k-coverable branch br where k ≤ kmax, either k = 0 and br is covered by the initial random test (at line 8), or there exists a realizable ISC C in Wk (sink(C) might be different from br) whose generated test covers br.
Theorem 6.4.7 (Completeness). Given a concurrent program P and a bound kmax on the number of interferences, Algorithm 3 covers every k-coverable branch of P with k ≤ kmax.

The theorem is implied trivially by Lemma 6.4.6. Like all completeness results in concolic testing, our completeness theorem relies on several idealizing assumptions. The theorem states that for deterministic programs without non-linear arithmetic and calls to external library functions, our (conc)²olic testing algorithm covers all branches of P that require at most kmax many interferences to be covered. In practice, concolic execution falls back upon concrete values observed during execution to handle non-linear computations or calls to external library functions.
Dealing with Unforeseen Interferences. In order to drop assumption (iii), stated at the beginning of Section 6.4.2, about unforeseen interferences, we need to make the following changes: (1) The concolic execution engine stops as soon as an unforeseen interference is observed and returns a global symbolic trace that ends with the read event of the unforeseen interference.
(2) The ExtractISCs algorithm is extended as follows: when building forest F at line 1, a distinguished dangling node is added which is labeled with the read event of the unforeseen interference. The ExtractISCs algorithm will then create a causal interference scenario for this special dangling node (at line 5). Consequently, the testing algorithm will try to realize this interference scenario, first without introducing any interference; if this is not possible, it will introduce some interferences later. Note that the notion of CIS has to be extended to allow sink nodes labeled with read events. Since our algorithms never make use of the fact that the sink of a CIS is a branch node, this extension is straightforward.
Dealing with Locks. At the beginning of Section 6.4, we assumed that programs do not contain
locks. Here, we present the issues that locks in programs introduce and discuss how the testing
algorithm can be changed slightly to handle programs with locks. Consider an ISC C with sink(C) = n. It might be the case that, for a thread Ti ≠ Th(sink(C)), the last node in GTi is labeled with a write event (interfering with a read node in GTh(sink(C))) that happened while Ti was holding some locks. This situation may cause the following problems: (i) C might never become realizable, e.g., if n is also protected by the same locks, then Ti never gets a chance to release the locks for Th(sink(C)). (ii) C might be realizable, but the test generated for C may lead to a deadlock, e.g., if thread Th(sink(C)) acquires any of these locks later.
To solve these problems, whenever we create a new ISC C, we extend all thread-local sub-scenarios of C according to forest, except for thread Th(sink(C)), such that for each thread Ti the last node in GTi according to C is not protected by any lock. This change to the testing algorithm can be done by uncommenting the commented line in the ExploreISCs algorithm. As an example, consider the ISC shown in Figure 6.7. There, T0 holds a lock at node 10. Therefore, the ISC is extended to include node 11, where T0 releases the lock. We assume that this extension is always unique, i.e., each lock-acquire operation corresponds to exactly one lock-release operation in the code, where both operations are in the same block of code under the same branches.
Note that this assumption is not unrealistic, since many widely-used programming languages (e.g., Java, C#, etc.) provide lock-block constructs (i.e., the lock is acquired at the beginning of the lock block and is released at the end of the lock block) that inherently satisfy it.
6.4.5 Optimizations
Pruning Unrealizable ISCs. The search strategy repeatedly adds new interferences to unrealizable ISCs. For an unrealizable ISC, it might be the case that no extension of it by interferences will ever become realizable. From the unsatisfiable core of the constraint systems (defined in Section 6.3.2), we can identify such situations. Let C = (V, E, ℓ) be an ISC. The data constraint DC(C) is then equal to DCbranch(V) ∧ DCinterference(C) ∧ DClocal(C). Extending C to an ISC C′ by new interferences removes some predicates in DClocal(C) from DC(C′), but the predicates in DCbranch(V) and DCinterference(C) remain part of DC(C′). Therefore, if the unsatisfiable core of DC(C) does not involve predicates from DClocal(C), we can conclude that DC(C′), or that of any other extension of C, will not be satisfiable either, and, therefore, we can exclude C from further exploration.
Analogously, if TC(C) is not satisfiable and no constraints from WRClocal are involved in the unsatisfiable core, then, again, we can conclude that C will not become realizable by adding new interferences, and we can exclude C from further exploration. Furthermore, in both cases, the unsatisfiable core can be used to guide the exploration by introducing an interference for a so-far uninterfered read.
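The pruning decision itself reduces to a set intersection once the solver has labeled the core; the label-set encoding below is an illustrative assumption (a real implementation would obtain cores from an SMT solver):

```python
# Sketch of the unsat-core pruning rule: if DC(C) (or TC(C)) is
# unsatisfiable and no DC_local (resp. WRC_local) label occurs in the
# unsatisfiable core, then no extension of C can become realizable.

def prune_forever(unsat_core, local_labels):
    """unsat_core, local_labels: sets of constraint labels.
    True iff C and every extension of it can be discarded."""
    return not (unsat_core & local_labels)
```

If the core does mention a local constraint, extending C may remove exactly that constraint, so C must stay in the worklist.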
Duplication Freedom. The ExploreISCs algorithm allows multiple instantiations of the same ISC. For example, suppose that an ISC C becomes realizable by introducing interferences for two reads. The algorithm can first select either of these reads and generate two ISCs, in each of which one of the reads is interfered. Then, in the future, these two ISCs can be extended such that the other read is also interfered, generating two instances of the same ISC. To avoid duplication of ISCs, we use a caching mechanism: an ISC is processed only if it is not already in the cache.
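Such a cache needs an order-insensitive key for an ISC; identifying an ISC by its sink plus the set of its interference edges is an illustrative choice, not necessarily the tool's actual keying scheme:

```python
# Sketch of the duplication-freedom cache: the same ISC reached along
# different extension orders maps to the same canonical key and is
# processed only once.

class ISCCache:
    def __init__(self):
        self._seen = set()

    def should_process(self, sink, interferences):
        """interferences: iterable of (write, read) interference edges."""
        key = (sink, frozenset(interferences))
        if key in self._seen:
            return False
        self._seen.add(key)
        return True
```

Because frozenset ignores ordering, the two extension orders described in the text collapse to one cache entry.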
Prioritized Exploration. While processing each worklist Wk , we can choose to prioritize the
ISCs in Wk . For example, in our implementation, we assign higher priorities to ISCs which
would cover some yet uncovered part of program code (in case they are realizable). Based
on this exploration strategy, the ExploreISCs algorithm (at line 11) first processes ISCs with
higher priorities.
6.5 Evaluation
We implemented our (conc)²olic testing technique in a tool called ConCrest. We built ConCrest on top of Crest [4], which is a concolic testing tool for sequential C programs. In the following, we briefly discuss the implementation and then present our experimental results.
6.5.1 Implementation
In Section 6.4.1, we discussed how a traditional concolic testing technique can be lifted to (conc)²olic testing. Figure 6.5 shows the high-level components of (conc)²olic testing. We implemented (conc)²olic testing by extending the sequential concolic testing tool Crest [4] as follows: (i) The concolic execution engine is changed such that, in addition to an input vector, it also gets a schedule, executes the program with the given input vector according to the schedule, and generates a global symbolic trace. (ii) The path exploration component is adapted as described in Section 6.4.1. (iii) The interference exploration component implements Algorithm 5. (iv) The constraint system generated by the realizability check component is lifted as discussed in Section 6.3.2. (v) The search algorithm is changed based on the bounded-interference heuristic (Algorithm 3).
Program | #Thrd | #Inputs | #Br | #Br covered with k interferences | Max k reached | #Br improvement | Bug found | #ISC | time

Table 6.1: Experimental results for (conc)²olic testing according to the bounded-interference heuristic. #Br: number of static branches, i.e., number of basic code blocks. k: number of interferences. Full Cov.: all branches are covered. Max Cov.: all possible interference scenarios are explored.
6.5.2 Experiments
Benchmarks: bluetooth is a simplified version of the Bluetooth driver from [62]. sor is from the Java Grande multi-threaded benchmarks (which we translated to C). ctrace1 and ctrace2 are two test drivers for the ctrace library. apache1 and apache2 are test drivers for the Apache FTP server from BugBench [46]. splay and rbTree are test drivers for a C library which implements several types of trees. aget is a multi-threaded download accelerator.
The last benchmark is a synthetic program designed to evaluate the scalability of our (conc)²olic testing with respect to the number of threads. It has the property that a new assertion can be violated every time we increase the number of threads by one. In this example, there is a shared variable x among the threads. Each thread has an integer input i, with 0 ≤ i ≤ 5, and performs x = x + 10i nine times in a loop. There is also an assertion in the loop checking that x does not have a specific value.
The experiments were performed with k_max = 100 (at most 100 interferences) and a time-
out of 2 hours. In Table 6.1, we report the number of threads and inputs, the total num-
ber of static branches in the benchmarks, the number of static branches covered by having
0/1/2/3/4/etc. interferences, the maximum bound reached for the number of interferences (and
the reason why it did not go beyond it), branch coverage improvement over sequential testing
(i.e., 0 interference), if any bug was found (and the number of interferences required to expose
the bug), the total number of explored ISCs, and the time spent on testing, respectively.
Observations: The experiments show that ConCREST is effective in increasing branch coverage.
For some of the benchmarks, a substantial number of branches were not sequentially
coverable and were only covered after interferences were introduced; e.g., for rbTree, the
number of covered branches increases from 67 in sequential testing to 95 in (conc)²olic testing.
From the maximum bound reached in each benchmark for the number of interferences, we
can see that although we set the maximum number of interferences to 100, the actual bound
explored by ConCREST is much smaller. This is because in most cases (with the exception
of 2 timeout cases), we either achieved maximum branch coverage or explored all possible
ISCs (i.e., no more branches are coverable). When no bug is found, reaching maximum
coverage provides guarantees to the tester that, e.g., no assertion in the code can be violated.
There are cases where maximum branch coverage is achieved, but the number does not
coincide with the total number of static branches. We found that the remaining branches were
either not coverable by the test driver, or were branches on local variables, related to sanity
checks on the system execution. Such sanity checks include checks on system call executions,
such as whether a file-open operation using fopen succeeded or not. Since our test execution
was not providing any mock environment, such as for the file system, these sanity checks could
not fail.
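The kind of environment-dependent sanity check meant here can be illustrated with a small sketch; the function name and the path used below are hypothetical:

```c
#include <stdio.h>

/* A local sanity-check branch of the kind our harness could not
   cover: whether it is taken depends only on the environment (the
   file system), not on program inputs or thread interferences. */
static int open_and_check(const char *path) {
    FILE *f = fopen(path, "r");
    if (f == NULL)   /* never taken when the harness provides the file */
        return -1;
    fclose(f);
    return 0;
}
```

With a mock file system, a tester could force fopen to fail and cover the error branch; without one, the branch is simply unreachable.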
Row | U | P | D | Assertion Violation Found (time (s)) | Max k Reached | Total Time (s)

Table 6.2: Optimization effects on the pfscan benchmark. U = unsat-core guidance, P = prioritized exploration, D = duplication-freedom.
Another observation is that ConCREST is very effective at finding bugs; all of the known
bugs in the benchmarks were discovered by ConCREST. All of the bugs found in our benchmarks
were revealed by covering a branch that could not be covered by sequential testing.
Moreover, as the table shows, all bugs were discovered under a relatively small number of
interferences (at most 4). This implies that concurrency bugs are not very complex according
to the bounded-interference measure.
Furthermore, we can see that the time spent by ConCREST is very reasonable. In all
of the benchmarks (except two), we could get full coverage or maximum coverage in a few
minutes. The interference spaces of pfscan and ctrace2 were so huge that the interference
bound could not go beyond 4 and 1 (within the time limit of 2 hours), respectively. However,
ConCREST was able to find the bugs in these benchmarks within the time limit.
Effect of Optimizations: Table 6.2 presents the effects of the optimizations discussed in Section 6.4.5
for pfscan (as an example). The bug in pfscan corresponds to an assertion violation.
For each configuration, we report whether the assertion violation is found (and the time spent on testing until it is found), the
maximum number of interferences explored, and the total time spent for testing.
Table 6.3: Comparison of (conc)²olic testing with MTA.

Program | #Br covered by k>0 | Time to find the bug (s) | timeout? | #Br covered by MTA | Time to find the bug (s)
apache1 | 3  | 1      | no  | 2  | 32
apache2 | 1  | 1      | no  | 1  | 81
aget    | 1  | 3      | no  | 1  | 180
splay   | 18 | no bug | no  | 8  | no bug
ctrace1 | 3  | 1      | no  | 3  | 341
ctrace2 | 5  | 757    | yes | 14 | 447
We can observe that when no optimization is enabled, ConCREST runs out of memory
with k = 2. The efficiency of unsat-core guidance is clear, because without this optimization
k cannot go higher than 2. In fact, to move to k = 4 and catch the assertion violations,
both unsat-core guidance and duplication-freedom optimizations have to be enabled. The ef-
fect of prioritized exploration can be observed by comparing rows 1 and 3: when prioritization
is enabled the assertion violation is found earlier. Therefore, all of the optimizations are effec-
tive in reducing the exploration space (and the time spent on exploration accordingly) to catch
program bugs.
Comparison with MTA: The goal of multi-trace analysis (discussed in Chapter 3) is to increase
branch coverage in concurrent programs. (Conc)²olic testing also generates tests targeting
branch coverage. However, it has another goal as well, which is to provide coverage
guarantees. It is therefore natural to compare these testing techniques against each other. In Chapter 3, we discussed how MTA
performs its test generation process on program approximations (i.e., sets of program runs) by omitting branches which
depend only on local variables or relate to sanity checks on the system execution, and does not
include them in the total number of branches. Therefore, instead of a side-by-side comparison of
total branch counts, we compare the number of branches covered by MTA (after sequential testing of individual threads) and
(conc)²olic testing (with k > 0 interferences). Furthermore, we compare the time spent by each technique until a bug is found.
In Table 6.3, we consider the benchmarks used to evaluate both testing techniques (i.e.,
benchmarks common to Tables 6.1 and 3.1). For each benchmark, we report the number of
branches covered after sequential testing and the time spent until the bug was found by each
technique. We consider a timeout of 2 hours for (conc)²olic testing (as in Table 6.1) and also
report if the (conc)²olic testing of the benchmarks reaches the timeout.
Comparing the number of covered branches, we can see that except in ctrace2 (where
(conc)²olic testing reaches the timeout), (conc)²olic testing performs better than MTA. This
is because (conc)²olic testing guarantees to cover each branch that can be covered by some
test (modulo the interference bound), whereas MTA employs heuristics in selecting the interloper
segments and exploring the input and interleaving spaces that do not provide coverage
guarantees.
Another observation is that for benchmarks where (conc)²olic testing does not reach the
timeout, MTA requires more time to find the bug. This is because the analysis in (conc)²olic
testing is performed at the global computation level (i.e., it is based on accesses to shared variables
and synchronization events), while MTA considers and encodes all computation (local as well
as global) in the runs, which naturally increases the size of the test generation problem and
the time needed to solve it.
From the above observations, one might conclude that (conc)²olic testing is better than
MTA in every way. However, that is not true. In the case of ctrace2, where (conc)²olic testing
reached the timeout, MTA performs better than (conc)²olic testing regarding both branch
coverage and bug finding time. The interference scenario space of ctrace2 is so huge that
(conc)²olic testing could not get to k = 2 in 2 hours. In this case, MTA required less time to
find the bug and had a chance to cover more branches. These branches are covered under interference
scenarios that (conc)²olic testing did not get the chance to explore. This suggests
that for programs with large interference scenario spaces, MTA might actually perform better
than (conc)²olic testing regarding bug finding. Therefore, each technique has its own merit.
Conclusion: Our experiments showed that (conc)²olic testing is effective in increasing branch
coverage in concurrent programs. In fact, for many of our benchmarks, it was able to provide
maximum branch coverage, which in turn provides bug finding guarantees for the tester, e.g., that no
assertion in the code can be violated. Furthermore, (conc)²olic testing was able to find a large
number of bugs in our benchmarks. All of the bugs in our benchmarks were found by allowing
only a small number of interferences among threads, which shows the effectiveness of the
bounded-interference heuristic in bug finding. We also showed that all of the optimizations
that we proposed for reducing the interference scenario exploration space are necessary for the
scalability of (conc)²olic testing. Furthermore, we compared (conc)²olic testing with MTA.
Our experiments showed that neither outperforms the other, and each technique has
its own merit. (Conc)²olic testing performs better than MTA for programs with
manageable interference scenario spaces, as it can provide maximum coverage guarantees for
them. However, for programs with huge interference scenario spaces, MTA may perform better
in bug finding, as it might be able to try interference scenarios that (conc)²olic testing does not get the chance to explore.
6.6 Related Work

Concolic testing of multi-threaded programs was introduced by Sen et al. [73, 72, 70]
and realized in a tool for testing concurrent Java programs, named jCUTE. There are several
differences between the technique proposed by Sen et al. and (conc)²olic testing. Their technique
uses data races as a heuristic to limit the interleaving space; i.e., interleaving exploration
is done based on the data races found in previous executions, by delaying the execution of the
threads at the data race points to obtain schedules in which the order of the execution of the events
involved in a data race is flipped. (Conc)²olic testing uses the bounded-interference heuristic
to reduce the exploration space. Their algorithm is proved to be complete in [70]. However, in
contrast to (conc)²olic testing, which provides coverage guarantees modulo the maximum bound
reached, jCUTE cannot provide coverage guarantees on partial work done (e.g., when
hitting time or memory limitations). This is due to the use of the data race heuristic, which does
not quantify the partial work done as a meaningful coverage measure for the program.
Our notion of interference-based search is related to work by Wang et al. [80, 79], where
they use concurrent trace programs as an under-approximation of the programs and explore
the interference scenario space to find bugs in program approximations. The proposed method
in [80] utilizes both over- and under-approximations of interferences among the threads in con-
current trace programs to capture suitable interferences for finding assertion violations. In [79],
a two-staged analysis is proposed which separates intra- and inter-thread reasoning. The first
stage uses sequential program semantics to obtain a precise summary of each thread in terms
of the global accesses made by the thread. The second stage performs inter-thread reason-
ing by composing these thread-modular summaries using the notion of sequential consistency
to find assertion violations. However, there are several differences between these techniques
and (conc)²olic testing: they work on program approximations as opposed to programs. Furthermore,
they target discovering assertion violations (by performing a symbolic analysis) in
concurrent trace programs, as opposed to generating tests for exploring different program behaviours.
Moreover, their analysis is based on symbolic traces, which include all computation
(local as well as global) and synchronization in program executions. The analysis in (conc)²olic
testing uses global symbolic traces, which ignore local computation, and is therefore much
more scalable.
6.7 Summary
In this chapter, we adapted a sequential concolic testing technique, based on the bounded-interference
heuristic, to generate tests for concurrent programs. We introduced a new component
in concolic testing, called the interference scenario exploration component, that explores
possible interference scenarios (within the interference bound). Using the interference scenario
exploration component, we built a general testing framework where one can employ different
strategies in exploring both the input and interference scenario spaces. We have developed a search
strategy that targets providing maximum branch coverage by incrementally increasing the interference
bound. Therefore, it is able to provide coverage guarantees (modulo the interference
bound and concolic execution limitations) after the testing process is finished or when a time
or computation limit is reached. We proved that our testing technique is both sound and complete.
We implemented the testing technique by leveraging the concolic testing tool CREST to
test multi-threaded C programs and performed experiments that show the effectiveness
of (conc)²olic testing in increasing branch coverage and finding concurrency bugs.
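The incremental bound-raising strategy can be sketched as a small loop; the per-degree exploration is stubbed out with a toy coverage table (needed_k, KMAX, and the function names are illustrative, not ConCREST's actual code):

```c
#include <assert.h>

enum { KMAX = 100, N_BR = 3 };

/* toy model: interference degree needed to cover each branch */
static int needed_k[N_BR] = {0, 1, 4};
static int covered[N_BR];

/* returns the number of branches newly covered at degree k */
static int explore_degree(int k) {
    int newly = 0;
    for (int b = 0; b < N_BR; b++)
        if (!covered[b] && needed_k[b] <= k) { covered[b] = 1; newly++; }
    return newly;
}

/* explore all scenarios of degree k before moving to k+1;
   stop early once every branch is covered (maximum coverage) */
static int search(void) {
    int total = 0;
    for (int k = 0; k <= KMAX && total < N_BR; k++)
        total += explore_degree(k);
    return total;
}
```

The early exit mirrors the observation in Table 6.1 that the bound actually explored stays far below k_max once maximum coverage is reached.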
Chapter 7

Conclusion and Future Work
Testing concurrent programs is notoriously hard because the behavior of a concurrent program
not only depends on input values but also is affected by the way the executions of threads
are interleaved. There is often an exponentially large number of interleavings that need to be
explored, and an exhaustive search is mostly infeasible. The research in this thesis focused
on developing testing techniques that use heuristics to reduce the exploration space by focusing on a manageable subset of it.
Summary
The first approach that we took to alleviate the exploration problem was to ignore input explo-
ration and explore the interleaving space under fixed inputs. However, the interleaving space
itself is huge enough for real world concurrent programs to make the complete exploration
infeasible. Therefore, many techniques that follow this approach (e.g., prediction techniques)
target specific types of bugs and try to select and explore a subset of the interleavings that have
more chances to reveal those bugs. These techniques have focused mostly on bugs correspond-
ing to atomicity violations, data races, and assertion violations. However, there are other types
of bugs (e.g., memory bugs, deadlocks, etc.) that have not been studied in this approach and
158
In Chapter 2, we introduced a pattern, called null reads, for targeting memory bugs that lead
to null-pointer dereferences in concurrent programs; i.e., interleavings that realize this pattern
are likely to lead to such bugs. We proposed a prediction technique that, according to a single observed execution of the program, predicts other runs
(under fixed inputs) that realize a null read pattern. We studied two different encodings of the
prediction problem, one as a set of logical constraints and one as an AI planning problem. The
former allows us to use SMT solvers to search for solutions while the latter enables us to ben-
efit from the compact encoding techniques and advanced heuristic-based searching algorithms
embedded in AI planners. Our prediction technique is both sound and scalable. We provided
a theorem that proves the soundness. The scalability of the prediction technique is supported
by our set of experiments. Our experiments also showed that our prediction technique is very
fast and effective in finding null-pointer dereferences in concurrent programs. Another valu-
able property of the prediction technique is that it provides a general framework which can be
applied on patterns other than null reads to find other types of bugs.
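For concreteness, the null reads pattern (e, f) can be sketched as follows; the names are illustrative, and the dereference at e is replaced by a check so the sketch stays safe to run:

```c
#include <stddef.h>

static int  cell = 42;
static int *p = &cell;   /* shared pointer */

/* f: the write that stores NULL into the shared pointer */
static void f_write_null(void) { p = NULL; }

/* e: the read site; a predicted run forces e to observe f's value,
   and dereferencing p at that point would then fault */
static int e_reads_null(void) { return p == NULL; }
```

A predicted run realizing the pattern is exactly one that schedules f_write_null before e's read while keeping all other reads consistent with the observed execution.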
The next approach that we took to alleviate the exploration problem in testing concurrent
programs was to explore both input and interleaving spaces of the programs but based on an
approximation model. Most of the techniques that follow this approach, use concurrent trace
programs, i.e., program slices built based on program executions, as approximation models for
concurrent programs. These techniques fix the approximation model a priori and utilize static
analyses on it. According to previous research [25], many runtime bugs (including assertion violations) can
be encoded as branches in an active testing framework. We used this result and built a multi-
trace analysis with the aim of increasing branch coverage in concurrent programs which inher-
ently targets a broader set of bugs than assertion violations. The multi-trace analysis, for each
uncovered branch, tries to find an interloper segment from an execution trace that provides a
value needed to cover the branch and searches for input values and a schedule that would cover
the uncovered branch by inserting the interloper segment in another run. Our test generation
technique combines sequential concolic testing with the multi-trace analysis by subjecting
each thread to sequential concolic testing first, to increase branch coverage in individual threads
as much as possible. Then, upon saturation, the multi-trace analysis targets uncovered branches.
Furthermore, the test generation technique does not fix the approximation model; it extends
the approximation model by each generated test run. Our experiments showed that the test generation technique is effective in increasing branch coverage and finding concurrency bugs.
The last approach in test generation for concurrent programs was to consider the programs
in the first place and use heuristics for input/interleaving exploration that allow us to provide
some meaningful coverage guarantees for the programs. Most of the techniques that follow this
approach use context bounding (which is defined based on the notion of control flow among
threads) as the heuristic to limit the interleaving exploration. However, many thread interleav-
ings might be equivalent to each other according to the way threads interfere with each other.
Therefore, exploring all such interleavings reduces the efficiency without discovering any new
bugs.
In earlier chapters, we introduced the bounded-interference heuristic, which limits the interleaving space by a parameter that bounds the number of interferences among the threads. Therefore,
it can be used to provide coverage guarantees modulo interference bound. Another property of
this heuristic is that it is defined based on the notion of flow of data between threads (in con-
trast to the control-based notions such as context bounding). Therefore, it can be incorporated
into sequential testing techniques to explore the input and interleaving spaces of the concurrent
program. Given a concurrent program and an interference bound k, our sequentialization transforms the concurrent program into a
sequential program such that the resulting sequential program encodes all behaviours of the
concurrent program with at most k interferences among the threads. One ad-
vantage of this sequentialization is that traditional sequential testing techniques can be applied
on the resulting sequential program without any modification. Inputs of the concurrent pro-
gram and interference scenarios are both encoded as inputs of the resulting sequential program
and therefore underlying sequential testing tools are able to explore both input and interfer-
ence scenario spaces, side by side. We proved that our sequentialization is sound, and can
be complete (modulo the interference bound) depending on the coverage guarantees that the underlying
sequential testing tool provides. Our experiments showed that most concurrency
bugs can be revealed by allowing a small number of interferences among the threads, and hence
the bounded-interference heuristic is effective in practice.
After ensuring the effectiveness of the bounded-interference heuristic and to avoid the over-
head of sequentialization and the dependency of the coverage guarantees on the underlying
sequential testing tools, we adapted a sequential concolic testing technique with the bounded-
interference heuristic to generate tests for concurrent programs with coverage guarantees. In
Chapter 6, we introduced a new component in concolic testing that explores possible interfer-
ence scenarios, within the interference bound. Then, for each interference scenario, it generates
a test (i.e., input values and a schedule) that realizes the interference scenario (if possible). A
nice property of this testing technique is that it provides a general framework where one can
employ different search strategies. We developed a search strategy that targets branch coverage in concurrent programs; i.e., interference scenarios
are explored based on uncovered branches in the program. We proved that this technique
is both sound and complete (modulo the interference bound). The completeness does not depend
on the coverage guarantees of sequential concolic testing techniques. In contrast to previous techniques
for concurrent programs, where coverage guarantees can be provided only when the testing al-
gorithm terminates (after the exploration is completed according to the underlying heuristic),
our concolic testing technique is able to quantify the partial work done and provide coverage
guarantees at each point of time (e.g., at the occasion of a timeout) modulo the explored bound.
Conclusion
We developed several heuristic-based techniques for testing concurrent programs. These techniques
attack the problem of test generation from different points of view, and each of them has
its own merit. In fact, it is the testing goal that determines which of these techniques
should be used for testing. The testing goal can range from simplicity and time efficiency to
coverage guarantees.
Our prediction technique simplifies the testing problem by exploring the interleaving space
under fixed inputs and targeting a specific type of bug. Although we have developed our prediction technique by targeting null-pointer
dereferences in concurrent programs, our technique provides a framework which can be used
to investigate other types of bugs as well. This technique is very simple and therefore can be
used in earlier stages of the program design, where the goal is to catch, as fast as possible, simple bugs that do
not require complicated scenarios for input values. It can also be used when
the tester has an idea under which input values the program might encounter problems. In such cases, it is
the most efficient technique when the goal is to catch a specific type of bug.
Our test generation technique based on program approximations is more expensive than
the prediction technique since it performs input exploration as well as interleaving exploration.
This technique does not target any specific type of bugs; it tries to find bugs by increasing
branch coverage in concurrent programs. In contrast to the prediction technique, this tech-
nique is able to catch bugs that depend on complicated scenarios for input values. During the
software testing process, we suggest applying prediction techniques first to catch simple bugs
at a lower cost. Then, our test generation technique based on program approximation can be applied
to catch bugs that might be overlooked by the prediction techniques because of ignoring
input exploration. However, due to the approximations, this technique cannot provide coverage guarantees.
The bounded-interference concolic testing technique, like the test generation technique
based on program approximations, tries to find bugs by increasing branch coverage in concurrent
programs. However, it is able to provide branch coverage guarantees for concurrent
programs (modulo the interference bound). Therefore, in the limit, all coverable branches are guaranteed
to be covered at the end of the testing process. However, this technique is more expensive
than the other techniques and hence is best used at the last stages of software testing.
Now, one might wonder why we need the previous techniques when the bounded-interference
concolic testing technique is able to provide full coverage guarantees. The answer
is that for many large programs, the time and computation limitations might not let the interference
exploration reach its limit. Therefore, for such programs, the interference bound
cannot go beyond a certain bound. In this case, by using the previous techniques, one still has
a chance to find bugs at lower cost.
Future Work
In the following, we mention some interesting open problems of this thesis that are worth
addressing in the future:
Future Work on Prediction: Our prediction technique finds null-pointer dereferences in
concurrent programs by exploring the interleavings that realize a null read pattern. We believe
that there is still plenty of room to investigate new violation patterns that target other types of
program bugs. For example, some array out-of-bound errors could be detected by a pattern
(e, f ) where e is a read from a shared variable indexing an array and f is a write to the same
shared variable that writes a value that is out of the array bound. Any run that realizes this
pattern, i.e., forcing e to read the value written by f , would lead to an array out-of-bound error.
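The (e, f) pattern just described can be sketched as follows; the bound check stands in for the faulting access, and all names are illustrative:

```c
enum { BUF_LEN = 4 };

static int buf[BUF_LEN];
static int idx = 0;   /* shared variable indexing buf */

/* f: writes a value outside [0, BUF_LEN) into the shared index */
static void f_write_oob(void) { idx = 7; }

/* e: reads idx to index buf; a run forcing e to see f's value
   would access buf out of bounds (checked here instead of done) */
static int e_would_overflow(void) {
    return idx < 0 || idx >= BUF_LEN;
}
```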
Other interesting patterns would be ones that target deadlocks in concurrent programs. For
example, some simple deadlock situations could be detected by a pattern (ac1, ac2, ac1′, ac2′)
where ac1 and ac2 are lock acquisition events of locks l and l′ in one thread, respectively,
ac1′ and ac2′ are lock acquisitions of locks l′ and l in another thread, respectively, and ac2
and ac2′ are inside the lock blocks corresponding to ac1 and ac1′, respectively; any run in which ac2
and ac2′ are co-reachable would block both threads. We leave investigating these violation patterns for future work.
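A minimal way to state this pattern programmatically is to compare the nested lock-acquisition orders of two threads; the struct and function below are an illustrative sketch, not a proposed implementation:

```c
/* Lock ids stand for l and l'; `first` is the outer acquisition
   (ac1 / ac1') and `second` the one nested inside its lock block
   (ac2 / ac2'). */
typedef struct { int first, second; } acq_order;

/* The pattern holds when two threads nest the same two locks in
   opposite orders: if ac2 and ac2' become co-reachable, each thread
   waits for the lock the other one holds. */
static int deadlock_pattern(acq_order t1, acq_order t2) {
    return t1.first != t1.second
        && t1.first == t2.second
        && t1.second == t2.first;
}
```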
To guarantee soundness, we utilized the maximal causal model in our prediction technique
that requires each read in the predicted run (except the one involved in the null read pattern)
to read the same value as it did in the original run. Note that a prediction problem does not
necessarily have a solution in the maximal causal model. We proposed a relaxation technique,
which deviates from the maximal causal model gradually, by allowing some reads to read
different values than what they read in the original run. The relaxation technique searches
for a run, realizing the given null read pattern, with the minimum number of relaxed reads
while there is no preference on which reads to relax. Runs that are predicted by the relaxation
technique are no longer guaranteed to be feasible. We believe that a more detailed analysis
based on the program source code, to detect the set of reads whose values do not affect
any branch condition, could help in predicting more feasible runs by adapting the relaxation
technique such that those reads have priority over others to get relaxed.
We also showed that the prediction problem can be encoded as an AI automated planning
problem to benefit from the compact encoding techniques and advanced heuristic-based search-
ing algorithms embedded in AI planners. Note that the encoding employed for sound prediction
did not exploit numerics. A variety of planners, including a variant of FF, METRIC-FF [32],
plan with numeric fluents. There has also been significant work on planning with numerics
using a diversity of approaches including SAT-encodings (e.g., [33]), and most recently with
encodings using so-called Planning Modulo Theories [27], the latter holding great promise
for test generation with numeric reasoning but with the computational advantages of domain-independent
planning. We leave for future work investigating the applicability of these techniques in our relaxation technique, which involves numeric reasoning.
Future Work on Test Generation Based on Approximation Models: Our test generation
technique based on approximation models can be improved in several directions.
We developed a search algorithm that prioritizes yet uncovered branches according to the num-
ber of attempts that have been made for covering them; branches with fewer attempts
have priority over the others. Then, for the branch with the highest priority, the multi-trace anal-
ysis enumerates all possible interloper segments, looking for input values and a schedule that
would result in covering the branch by inserting the interloper segment into another run. Al-
though our experiments showed that this search strategy is effective in increasing branch cov-
erage and finding concurrency bugs, we believe that the test generation technique can still be
improved by realizing other heuristics in the search algorithm. For example, the depth of the
uncovered branches in the control-flow graph of the program could also be used as a heuristic
to prioritize the branches; branches with smaller depth have priority over others, since covering
them might lead to discovering a larger part of the program. Furthermore, interloper segments
could also be prioritized according to different factors. For example, shorter interloper segments
could have priority over others since they are less restrictive w.r.t. generating
feasible tests. Finally, the search algorithm can be adapted to incremental testing where it can
skip covering some of the uncovered branches (e.g., according to a list provided by the tester).
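The attempt-count prioritization described above amounts to a min-selection over the uncovered branches; the arrays below are a toy stand-in for the real bookkeeping, and all names are illustrative:

```c
enum { N_BR = 4 };

static int covered[N_BR]  = {1, 0, 0, 0};   /* branch 0 already covered */
static int attempts[N_BR] = {0, 3, 1, 2};   /* covering attempts so far */

/* returns the uncovered branch with the fewest attempts,
   or -1 when every branch is covered */
static int pick_branch(void) {
    int best = -1;
    for (int b = 0; b < N_BR; b++)
        if (!covered[b] && (best < 0 || attempts[b] < attempts[best]))
            best = b;
    return best;
}
```

Swapping the comparison for one based on control-flow depth, or filtering branches against a tester-provided skip list, yields the variants discussed above.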
Another topic for future work would be to investigate more sophisticated and targeted search
strategies.

Future Work on Bounded-Interference Concolic Testing: Our bounded-interference concolic
testing technique provides coverage guarantees for concurrent programs (modulo the interference
bound). We developed a search strategy where all interference scenarios of degree k
(i.e., consisting of k interferences) for all uncovered branches are explored before exploring
interference scenarios of degree k + 1 (starting at k = 0). This search strategy is not guaran-
teed to always be the most efficient one. For example, a search strategy that focuses on one
branch at a time and explores the interference space (increasing the number of interferences to
the bound) accordingly might be a more efficient strategy when some branches have priorities
over the others, e.g., they correspond to assertion violations. Another issue is the size of the
interference scenario space that could get very large even for small bounds for large programs.
Our current search strategy enumerates all possible interference scenarios (modulo interfer-
ence bound). We believe that the search strategy can borrow ideas from partial order reduction
techniques [86, 26] in verification to avoid redundant exploration of interference scenarios that
have the same effect. However, one has to be very careful with applying reduction techniques
to preserve completeness. We leave investigating other search strategies and possible reduction
techniques for future work.
The bounded-interference heuristic can also be used in program analysis areas other than test generation. For example, the context
bounding heuristic has been used in model checking to develop context-bounded model check-
ers [61, 8] that verify properties modulo a context bound. We can consider similar applica-
tion for the bounded-interference heuristic to verify properties modulo an interference bound.
As another example, the bounded-interference heuristic can be used in bug localization (i.e.,
identifying the roots of the bug) and bug fixing [36, 37, 48]. Concurrency bugs often relate
to unintended interferences among threads. Using the bounded-interference heuristic, we can find interference scenarios with the minimum number of inter-
ferences that lead to a single bug. These interference scenarios can be analyzed automatically
to localize the bug. Applying the bounded-interference heuristic in automatic bug localization
and fixing is an interesting direction for future work.
In Chapter 4, we mentioned that the context bounding heuristic is not very efficient, as many
interleavings might be equivalent to each other according to the interferences that exist among
the threads. We think that partial order reduction techniques [86, 26] can be combined with
context bounding to avoid this inefficiency. Here, the partial order reduction techniques would define
equivalence classes on the set of interleavings such that interleavings in the same class realize the
same interference scenarios. In this case, the combination of partial order reduction techniques
with context bounding would result in exploring all interference scenarios within a bounded
number of context switches.
number of context switches. Implementing this approach and comparing it with the bounded-
[1] Jorge A. Baier and Sheila A. McIlraith. Planning with first-order temporally extended
goals using heuristic search. In Proceedings of the 21st National Conference on Artificial
[2] David Bainbridge, Ian H. Witten, Stefan Boddie, and John Thompson. Stress-testing
general purpose digital library software. In Proceedings of the 13th European Conference
on Research and Advanced Technology for Digital Libraries, ECDL '09, pages 203–214,
2009.
[3] Sebastian Burckhardt, Pravesh Kothari, Madanlal Musuvathi, and Santosh Nagarakatte.
[4] Jacob Burnim and Koushik Sen. Heuristics for scalable dynamic test generation. In Pro-
[5] Cristian Cadar, Vijay Ganesh, Peter M. Pawlowski, David L. Dill, and Dawson R. En-
gler. EXE: Automatically generating inputs of death. In Proceedings of the 13th ACM
[6] Feng Chen and Grigore Rosu. Parametric and sliced causality. In Proceedings of the
2007.
[7] Feng Chen, Traian-Florin Serbanuta, and Grigore Rosu. jPredictor: a predictive runtime
analysis tool for Java. In Proceedings of the 30th International Conference on Software
[8] Lucas Cordeiro and Bernd Fischer. Verifying multi-threaded software using smt-based
[9] Leonardo Mendonça de Moura and Nikolaj Bjørner. Z3: An efficient SMT solver. In
Proceedings of the 14th International Conference on Tools and Algorithms for the Con-
[10] Bruno Dutertre and Leonardo de Moura. A fast linear-arithmetic solver for DPLL(T).
[11] Orit Edelstein, Eitan Farchi, Yarden Nir, Gil Ratsaby, and Shmuel Ur. Multithreaded Java
[12] Michael Emmi, Shaz Qadeer, and Zvonimir Rakamaric. Delay-bounded scheduling. In
[13] Dawson Engler and Ken Ashcraft. RacerX: effective, static detection of race conditions
[14] Azadeh Farzan, Andreas Holzer, Niloofar Razavi, and Helmut Veith. (Conc)2 olic Testing.
[15] Azadeh Farzan and P. Madhusudan. Causal atomicity. In Proceedings of the 18th Inter-
[16] Azadeh Farzan, P. Madhusudan, Niloofar Razavi, and Francesco Sorrentino. Predicting
47–56, 2012.
[17] Azadeh Farzan, P. Madhusudan, and Francesco Sorrentino. Meta-analysis for atomicity
[18] Cormac Flanagan and Stephen N. Freund. FastTrack: efficient and precise dynamic race
[19] Cormac Flanagan and Shaz Qadeer. Types for atomicity. In Proceedings of the 2003 ACM
[20] John Foley and Chris Murphy. Q&A: Bill Gates On Trustworthy Computing.
qa-bill-gates-on-trustworthy-computing/6502378.
ceedings of the 17th International Conference on Tools and Algorithms for the Construc-
[22] Alfonso Gerevini, Patrik Haslum, Derek Long, Alessandro Saetti, and Yannis Dimopou-
los. Deterministic planning in the fifth international planning competition: PDDL3 and
[23] Patrice Godefroid. Model checking for programming languages using VeriSoft. In Pro-
[24] Patrice Godefroid, Nils Klarlund, and Koushik Sen. DART: directed automated random
[25] Patrice Godefroid, Michael Y. Levin, and David A. Molnar. Active property checking. In
Proceedings of the 8th ACM & IEEE International Conference on Embedded Software,
[26] Patrice Godefroid and Pierre Wolper. A partial approach to model checking. Information
[27] Peter Gregory, Derek Long, Maria Fox, and J. Christopher Beck. Planning modulo theo-
ries: Extending the planning paradigm. In Proceedings of the 22nd International Confer-
[28] Alex Groce and Willem Visser. Model checking Java programs using structural heuristics.
[29] Patrik Haslum and Alban Grastien. Diagnosis as planning: Two case studies. In Proceed-
[30] Maurice P. Herlihy and Jeannette M. Wing. Linearizability: a correctness condition for
492, 1990.
[31] Joerg Hoffmann. FF: The Fast-Forward planning system. AI Magazine, 22:57–62, 2001.
[32] Joerg Hoffmann. The Metric-FF planning system: Translating ignoring delete lists to
2003.
[33] Joerg Hoffmann, Carla Gomes, Bart Selman, and Henry Kautz. SAT encodings of state-
[34] Andreas Holzer, Christian Schallhart, Michael Tautschnig, and Helmut Veith. Query-
tion, Model Checking, and Abstract Interpretation, VMCAI '09, pages 151–166, 2009.
[35] Jeff Huang and Charles Zhang. Persuasive prediction of concurrency access anomalies.
[36] Guoliang Jin, Linhai Song, Wei Zhang, Shan Lu, and Ben Liblit. Automated atomicity-
[37] Guoliang Jin, Wei Zhang, Dongdong Deng, Ben Liblit, and Shan Lu. Automated
[38] Saurabh Joshi, Shuvendu K. Lahiri, and Akash Lal. Underspecified harnesses and inter-
[39] Vineet Kahlon, Franjo Ivančić, and Aarti Gupta. Reasoning about threads communicat-
ing via locks. In Proceedings of the 17th International Conference on Computer Aided
[40] Henry A. Kautz and Bart Selman. Unifying SAT-based and graph-based planning. In Pro-
[41] Daniel Kroening and Ofer Strichman. Decision Procedures: An Algorithmic Point of
[42] Shuvendu K. Lahiri, Shaz Qadeer, and Zvonimir Rakamaric. Static and precise detec-
tion of concurrency errors in systems code using SMT solvers. In Proceedings of the
2009.
[43] Zhifeng Lai, Shing-Chi Cheung, and Wing Kwong Chan. Detecting atomic-set serializ-
235–244, 2010.
[44] Akash Lal and Thomas Reps. Reducing concurrent analysis under a context bound to
[45] Jaejin Lee, David A. Padua, and Samuel P. Midkiff. Basic compiler algorithms for parallel
[46] Shan Lu, Zhenmin Li, Feng Qin, Lin Tan, Pin Zhou, and Yuanyuan Zhou. BugBench:
Benchmarks for evaluating bug detection tools. In Workshop on the Evaluation of Soft-
[47] Shan Lu, Soyeon Park, Eunsoo Seo, and Yuanyuan Zhou. Learning from mistakes: a
[48] Shan Lu, Soyeon Park, and Yuanyuan Zhou. Detecting concurrency bugs from the per-
[49] Shan Lu, Joseph Tucek, Feng Qin, and Yuanyuan Zhou. AVIO: Detecting atomicity
[50] Daniel Marino, Madanlal Musuvathi, and Satish Narayanasamy. LiteRace: effective sam-
[51] Drew V. McDermott. PDDL The Planning Domain Definition Language. Technical
[52] Jesper B. Møller, Jakob Lichtenberg, Henrik Reif Andersen, and Henrik Hulgaard. Dif-
ference decision diagrams. In 8th Annual Conference of the EACSL on Computer Science
[53] Madan Musuvathi and Shaz Qadeer. Chess: systematic stress testing of concurrent soft-
[54] Madanlal Musuvathi and Shaz Qadeer. Iterative context bounding for systematic testing
[55] Madanlal Musuvathi, Shaz Qadeer, Thomas Ball, Gerard Basler, Piramanayagam Aru-
muga Nainar, and Iulian Neamtiu. Finding and reproducing heisenbugs in concurrent
[56] Dana Nau, Malik Ghallab, and Paolo Traverso. Automated Planning: Theory & Practice.
[57] Robert O'Callahan and Jong-Deok Choi. Hybrid dynamic data race detection. In Pro-
ceedings of the 9th ACM SIGPLAN Symposium on Principles and Practice of Parallel
[58] Chang-Seo Park and Koushik Sen. Randomized active atomicity violation detection in
[59] Soyeon Park, Shan Lu, and Yuanyuan Zhou. CTrigger: exposing atomicity violation bugs
[60] Amir Pnueli. The temporal logic of programs. In Proceedings of the 18th Annual Sym-
[61] Shaz Qadeer and Jakob Rehof. Context-bounded model checking of concurrent software.
In Proceedings of the 11th International Conference on Tools and Algorithms for the
[62] Shaz Qadeer and Dinghao Wu. KISS: keep it simple and sequential. SIGPLAN Notices,
39:14–24, 2004.
[63] Zvonimir Rakamaric. STORM: static unit checking of concurrent programs. In Proceed-
[64] Niloofar Razavi, Azadeh Farzan, and Andreas Holzer. Bounded-interference sequential-
ization for testing concurrent programs. In Proceedings of the 5th International Confer-
[65] Niloofar Razavi, Azadeh Farzan, and Sheila A. McIlraith. Generating effective tests for
[66] Niloofar Razavi, Franjo Ivančić, Vineet Kahlon, and Aarti Gupta. Concurrent test gen-
eration using concolic multi-trace analysis. In Proceedings of the 10th Asian Symposium
[67] Jussi Rintanen. Planning with specialized SAT solvers. In Proceedings of the 25th AAAI
[68] Mahmoud Said, Chao Wang, Zijiang Yang, and Karem Sakallah. Generating data race
[69] Stefan Savage, Michael Burrows, Greg Nelson, Patrick Sobalvarro, and Thomas Ander-
son. Eraser: a dynamic data race detector for multithreaded programs. ACM Transactions
[70] Koushik Sen. Scalable automated methods for dynamic program analysis. PhD disser-
tation, 2006.
[71] Koushik Sen. Race directed random testing of concurrent programs. SIGPLAN Notices,
43(6):11–21, 2008.
[72] Koushik Sen and Gul Agha. A race-detection and flipping algorithm for automated testing
Conference on Hardware and Software, Verification and Testing, HVC '06, pages 166–182.
[73] Koushik Sen and Gul Agha. CUTE and jCUTE: concolic unit testing and explicit path
[74] Traian-Florin Serbanuta, Feng Chen, and Grigore Rosu. Maximal causal models for se-
[75] Ohad Shacham, Nathan Bronson, Alex Aiken, Mooly Sagiv, Martin Vechev, and Eran
[76] Yao Shi, Soyeon Park, Zuoning Yin, Shan Lu, Yuanyuan Zhou, Wenguang Chen, and
Weimin Zheng. Do I use the wrong definition?: DeFuse: definition-use invariants for de-
tecting concurrency and sequential bugs. In Proceedings of the ACM International Con-
[77] Arnab Sinha, Sharad Malik, and Aarti Gupta. Efficient predictive analysis for detecting
the Theory and Applications of Formal Methods in Hardware and System Verification,
[78] Arnab Sinha, Sharad Malik, Chao Wang, and Aarti Gupta. Predictive analysis for de-
[79] Nishant Sinha and Chao Wang. Staged concurrent program analysis. In Proceedings of
[80] Nishant Sinha and Chao Wang. On interference abstractions. In Proceedings of the 38th
[81] Fabio Somenzi and Roderick Bloem. Efficient Büchi automata from LTL formulae. In Pro-
[82] Francesco Sorrentino, Azadeh Farzan, and P. Madhusudan. PENELOPE: weaving threads
[83] Scott D. Stoller. Testing concurrent Java programs using randomized scheduling. In
[84] Nikolai Tillmann and Jonathan De Halleux. Pex: white box test generation for .NET.
In Proceedings of the 2nd International Conference on Tests and Proofs, TAP '08, pages
134–153, 2008.
[85] Salvatore La Torre, P. Madhusudan, and Gennaro Parlato. Reducing context-bounded con-
[86] Antti Valmari. A stubborn attack on state explosion. In Proceedings of the 2nd Interna-
[87] Moshe Y. Vardi. An automata-theoretic approach to linear temporal logic. In Banff Higher
[88] Christoph von Praun and Thomas R. Gross. Object race detection. In Proceedings of the
[89] Chao Wang, Sudipta Kundu, Malay Ganai, and Aarti Gupta. Symbolic predictive analysis
for concurrent programs. In Proceedings of the 2nd World Congress on Formal Methods,
[90] Chao Wang, Rhishikesh Limaye, Malay Ganai, and Aarti Gupta. Trace-based symbolic
Tools and Algorithms for the Construction and Analysis of Systems, TACAS '10, pages
328–342, 2010.
[91] Chao Wang, Mahmoud Said, and Aarti Gupta. Coverage guided systematic concurrency
[92] Liqiang Wang and Scott D. Stoller. Accurate and efficient runtime detection of atomicity
[93] Liqiang Wang and Scott D. Stoller. Runtime analysis of atomicity for multi-threaded
[94] Jaeheon Yi, Caitlin Sadowski, and Cormac Flanagan. SideTrack: generalizing dynamic
atomicity analysis. In Proceedings of the 7th Workshop on Parallel and Distributed Sys-
[95] Yuan Yu, Tom Rodeheffer, and Wei Chen. RaceTrack: efficient detection of data race
2005.
[96] Wei Zhang, Chong Sun, and Shan Lu. ConMem: detecting severe concurrency bugs
Appendix A

Encodings in Prediction
Consider the buggy program in Figure 1. initialize and exit are methods of the same
class, with x and y being fields of the class of types myObject and int, respectively.
Method initialize initializes x and y, and then calls a function func of x. Method exit
reads an input and, if the value of the input is greater than zero, writes zero and null to y
and x, respectively. Now, consider a concurrent program consisting of two threads, T1 and
T2, where T1 executes initialize and T2 executes exit.

Suppose that we execute the program with input value 1 and observe a run in which T1 is
executed completely before T2. Let p1, ..., p8 and q1, ..., q4 be the events of
this run, where p1 = (T1, ac(l)), p2 = (T1, wt(x, obj)), p3 = (T1, wt(y, 1)), p4 = (T1, rd(y, 1)),
p5 = (T1, rel(l)), p6 = (T1, ac(l)), p7 = (T1, rd(x, obj)), p8 = (T1, rel(l)), q1 = (T2, ac(l)),
q2 = (T2, wt(y, 0)), q3 = (T2, wt(x, null)), and q4 = (T2, rel(l)). Based on this global trace and
using the maximal causal model, we want to predict a run for the null-WR pair (q3, p7).
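Since Figure 1 itself is missing from this copy, the following is a hypothetical reconstruction of the program described above; the class names and exact statement order are assumptions consistent with the trace p1, ..., p8, q1, ..., q4.

```python
# Hypothetical reconstruction of the Figure 1 program (the figure is not in
# this copy); class names and bodies are assumptions matching the trace.
import threading

class MyObject:
    def func(self):
        return 42

class Example:
    def __init__(self):
        self.l = threading.Lock()   # the lock l of the trace
        self.x = None               # field of type myObject
        self.y = 0                  # field of type int

    def initialize(self):           # executed by T1
        with self.l:                # p1..p5: write x, write y, read y
            self.x = MyObject()
            self.y = 1
            _ = self.y
        with self.l:                # p6..p8: read x and call func on it
            self.x.func()           # dereferences null if exit ran in between

    def exit(self, inp):            # executed by T2
        with self.l:                # q1..q4
            if inp > 0:
                self.y = 0          # write zero to y
                self.x = None       # the null write paired with the read p7
```

In the observed run, T1 completes before T2 and no error occurs; scheduling T2's critical section between T1's two lock-protected blocks makes the read at p7 see null, so the call to func fails.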
Here is the set of constraints obtained by the logical-constraint encoding proposed in Sec-
tion 2.4.1:

Φ = PO ∧ FC ∧ LC ∧ DC ∧ C, where
FC = (true)
This constraint system is satisfiable, and t_{init1} < t_{p1} < t_{p2} < t_{p3} < t_{init2} < t_{p4} < t_{p5} < t_{q1} <
t_{q2} < t_{q3} < t_{q4} < t_{p6} < t_{p7} defines a schedule which would lead to a null-pointer dereference
(the read p7 observes the null written at q3) when the program reads value 1 from the input (as in the observed run).
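The way a schedule falls out of the satisfiable constraint system can be illustrated with a small sketch (this is an illustration, not the thesis tool): treat the ordering constraints as precedence edges over the events and topologically sort them into a total order of timestamps.

```python
# Sketch: derive a schedule for the null-WR pair (q3, p7) by topologically
# sorting an event-precedence relation (illustrative, not the thesis tool).
from collections import defaultdict, deque

events = ["p1", "p2", "p3", "p4", "p5", "p6", "p7", "p8",
          "q1", "q2", "q3", "q4"]

# Program-order edges within each thread, plus constraints forced by the
# encoding: lock l orders the critical sections, and the null-WR pair
# requires q3 (the null write) to precede p7 (the read).
edges = [("p1", "p2"), ("p2", "p3"), ("p3", "p4"), ("p4", "p5"),
         ("p5", "p6"), ("p6", "p7"), ("p7", "p8"),
         ("q1", "q2"), ("q2", "q3"), ("q3", "q4"),
         ("p5", "q1"),   # T2 acquires l after T1's first critical section
         ("q4", "p6"),   # T1 re-acquires l after T2 releases it
         ("q3", "p7")]   # the predicted null write happens before the read

def schedule(events, edges):
    """Kahn's algorithm: any topological order is a consistent schedule."""
    succ, indeg = defaultdict(list), {e: 0 for e in events}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    queue = deque(e for e in events if indeg[e] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    assert len(order) == len(events), "cyclic constraints: no schedule"
    return order

sched = schedule(events, edges)
print(sched.index("q3") < sched.index("p7"))  # True: the read sees null
```

The resulting order matches the schedule above: T1 runs up to p5, then T2's whole critical section, then p6 and the failing read p7.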
Next, we encode the same prediction as a planning problem, using the encoding proposed in Sec-
tion 2.5.2. Here is the set of actions in the planning domain that encode the null-pointer deref-
erence prediction:
(:action Acp1 ...)
(:action Acp2 ...)
(:action Acp3 ...)
(:action Acp4 ...)
(:action Acp5 ...)
(:action Acp6 ...)
(:action Acp7 ...)
(:action Acq1 ...)
(:action Acq2 ...)
(:action Acq3 ...)
(:action Acq4 ...)
(:goal (Happenedp7))
According to this encoding, the sequence Acp1, Acp2, Acp3, Acp4, Acp5, Acq1, Acq2, Acq3, Acq4,
Acp6, Acp7 is a solution for the planning problem, which implies a bug-triggering schedule for
the program.
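The bodies of the actions above are missing from this copy. As a hedged illustration only, an action in such an encoding could take the following shape, where the predicates Happenedq2, HoldsLockT2l, and Happenedq3 are assumed names, not the thesis encoding:

```pddl
;; Hypothetical sketch of one action: performing event q3 requires its
;; thread-local predecessor q2 to have happened and T2 to hold lock l;
;; the effect records that q3 has happened.
(:action Acq3
  :precondition (and (Happenedq2) (HoldsLockT2l))
  :effect (Happenedq3))
```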
Appendix B
Proof. The key feature of ISCs generated by Algorithm 3 that proves the lemma is that they are
generated based on feasible program runs; in fact, the lemma does not hold for an arbitrary
ISC C. For an interference scenario S = (V, E, ℓ), let G_{Ti}(S), V(S), E(S), and E_I(S) represent
the graph of thread Ti in S, the set of vertices, the set of edges, and the set of interference
edges of S, respectively. Let σ be the schedule, i.e., a sequence of thread identifiers that shows
which thread should be executed at each step, and let R = P(i, σ) be the run obtained by
executing P on input i according to σ. We show that R is feasible and that IS(R) and C are
isomorphic.

Let R_k = P(i, σ[1..k]) be the partial program run obtained by executing program P ac-
cording to σ for k steps, and let C_k be the sub-interference scenario of C such that, for each thread
Ti, C_k contains the first m nodes of G_{Ti}(C) if Ti is executed for m steps in σ[1..k]. C_k contains the
edges of C for which both involved vertices are in C_k. Let π_i(C_k) be the sequence of labels
of the nodes on the path in G_{Ti}(C_k) from the root to the leaf.

Now, we have to prove that R_k is feasible and that IS(R_k) and C_k are isomorphic for all 1 ≤
k ≤ n (where n = |σ|). To prove the isomorphism of IS(R_k) and C_k, we prove that (1)
G_{Ti}(IS(R_k)) and G_{Ti}(C_k) are isomorphic for every thread Ti, and (2) E_I(IS(R_k)) and E_I(C_k)
are isomorphic, i.e., there is an interference edge (u, v) in C_k iff (isom(u), isom(v)) is an
interference edge in IS(R_k), where isom(u) maps a node u in C_k to its isomorphic node in
IS(R_k).
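The objects the lemma manipulates can be pictured with a small sketch (names and representation are assumptions, not the thesis code): an interference scenario as per-thread label sequences plus interference edges between (thread, position) pairs, with isomorphism reducing to equality of both components under the positional node map.

```python
# Sketch of the lemma's data: an interference scenario as per-thread
# event-label sequences plus interference edges between (thread, position)
# pairs. Representation and names are assumptions, not the thesis code.
from dataclasses import dataclass, field

@dataclass
class Scenario:
    labels: dict                                     # thread id -> labels on G_Ti
    interferences: set = field(default_factory=set)  # {((t, i), (t', j))}

def isomorphic(a, b):
    """Isomorphic iff every per-thread path carries the same labels and the
    interference edges coincide under the positional node map."""
    return a.labels == b.labels and a.interferences == b.interferences

s1 = Scenario({1: ["wt(x)", "rel(l)"], 2: ["ac(l)", "rd(x)"]},
              {((1, 0), (2, 1))})   # write in T1 interferes with read in T2
s2 = Scenario({1: ["wt(x)", "rel(l)"], 2: ["ac(l)", "rd(x)"]},
              {((1, 0), (2, 1))})
print(isomorphic(s1, s2))  # True
```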
Induction base: Without loss of generality, we assume that thread Ti performs action a in
the first step in R. First of all, R_1 is feasible because the first step cannot be blocked, as no
blocking synchronization event (like a lock acquire) happens before it in R. We know that C is
built from feasible executions of P (i.e., π_i(C_1) corresponds to a feasible thread-local execution
of Ti). Note that threads are deterministic: if Ti performs
reading/writing to a shared variable or acquiring a lock as its first action a in an execution of the
program, then it should read/write to the same variable or acquire the same lock, respectively,
as its first action in all program executions. If Ti performs a branching action corresponding
to a conditional statement as its first action a, then the action of the node in C_1 should also
be a branching action corresponding to the same conditional statement. Let br(ℓ) be the label
of the node in C_1. Without loss of generality, we assume that ℓ corresponds to the branch where
the condition of the conditional statement is true. The conditional statement can only depend
on inputs (and not on any shared variable reads, since there is no read from a shared variable
before a in Ti). On the other hand, the input vector i is obtained from DC(C), which forces the
condition of the conditional statement to be true in R_1 as well. This
proves that G_{Ti}(IS(R_1)) and G_{Ti}(C_1) are isomorphic. Also, there is no interference for the
first step in both IS(R_1) and C_1, and hence E_I(IS(R_1)) and E_I(C_1) are both empty. As a
result, IS(R_1) and C_1 are isomorphic.

Induction hypothesis: Suppose that for all 1 ≤ k ≤ n − 1, R_k is feasible and IS(R_k) is
isomorphic to C_k.
Induction step: We prove that R_n is feasible and IS(R_n) is isomorphic to C_n. Without loss of
generality, assume that in the nth step, thread Ti performs an action a. We have the following
cases based on the type of a:

a = wt(x, val): Thread Ti is able to perform a since it is not a blocking action,
and hence R_n is feasible. We know that π_i(C_n) represents a feasible thread-local ex-
ecution of Ti. According to the induction hypothesis, G_{Ti}(IS(R_{n−1})) and G_{Ti}(C_{n−1}) are
isomorphic, meaning that Ti in run R_{n−1} takes exactly the same path as it takes in the
feasible thread-local execution represented by π_i(C_{n−1}). Together with the determinism
of Ti, it is implied that the action corresponding to the last node in G_{Ti}(C_n) is wt(x, val′),
where val′ is equal to val with each symbolic variable r replaced by its map r′. There-
fore, G_{Ti}(IS(R_n)) and G_{Ti}(C_n) are isomorphic. Furthermore, a does not introduce any
new interference edge in IS(R_n), and hence E_I(IS(R_n)) = E_I(IS(R_{n−1})). On the
other hand, E_I(C_{n−1}) is equal to E_I(C_n), since all nodes in C_{n−1} are causally before the
last node of G_{Ti}(C_n) in C_n, and hence the last node of G_{Ti}(C_n) is not involved in any
interference edge. Therefore, E_I(IS(R_n)) is isomorphic to E_I(C_n).
a = ac(l): Thread Ti is able to perform a without getting blocked, since R_n fol-
lows the schedule, which is lock-consistent according to the temporal-consistency con-
straints. According to the induction hypothesis, G_{Ti}(IS(R_{n−1}))
and G_{Ti}(C_{n−1}) are isomorphic, meaning that Ti in run R_{n−1} takes exactly the same path as
it takes in the feasible thread-local execution represented by π_i(C_{n−1}). Together with the
determinism of Ti, it is implied that the action corresponding to the last node in G_{Ti}(C_n)
is ac(l). Therefore, G_{Ti}(IS(R_n)) and G_{Ti}(C_n) are isomorphic. Furthermore, a does
not introduce any new interference edge, and hence E_I(IS(R_n)) is isomorphic to E_I(C_n).
a = rel(l): Thread Ti is able to perform a since it is not a blocking action, and hence
R_n is feasible. According to the induction hypothesis, G_{Ti}(IS(R_{n−1})) and G_{Ti}(C_{n−1}) are
isomorphic, meaning that Ti in run R_{n−1} takes exactly the same path as it takes in the
feasible thread-local execution represented by π_i(C_{n−1}). Together with the determinism
of Ti, it is implied that the action corresponding to the last node in G_{Ti}(C_n) is rel(l).
Therefore, G_{Ti}(IS(R_n)) and G_{Ti}(C_n) are isomorphic. Furthermore, a does not introduce
any new interference edge, and hence E_I(IS(R_n)) is
isomorphic to E_I(C_n).
a = br(ℓ′): Thread Ti is able to perform a since it is not a blocking action, and hence
R_n is feasible. According to the induction
hypothesis, G_{Ti}(IS(R_{n−1})) and G_{Ti}(C_{n−1}) are isomorphic, meaning that Ti in run R_{n−1}
takes exactly the same path as it takes in the feasible thread-local execution represented
by π_i(C_{n−1}). Together with the determinism of Ti, it is implied that the action corre-
sponding to the last node in G_{Ti}(C_n) should be a branching action br(ℓ) corresponding to the
same conditional statement S.
Without loss of generality, we assume that ℓ represents that the condition of statement
S is true. We prove that ℓ′ represents that the condition of statement S is true as well,
i.e., ℓ′ is equal to ℓ with each symbolic variable r replaced by its map r′. The input vector i satisfies
DC(C), which prevents the condition of
statement S from being false in R_n. As a result, G_{Ti}(IS(R_n)) and G_{Ti}(C_n) are isomorphic. Further-
more, a does not introduce any new interference in IS(R_n). Therefore, E_I(IS(R_n)) =
E_I(IS(R_{n−1})), which is isomorphic to E_I(C_n).
a = rd(x, r): Thread Ti is able to perform a since it is not a synchronization action, and
hence R_n is feasible. According to the induction hypothesis, G_{Ti}(IS(R_{n−1})) and G_{Ti}(C_{n−1})
are isomorphic, meaning that Ti in run R_{n−1} takes exactly the same path as it takes in the
feasible thread-local execution represented by π_i(C_{n−1}). Together with the determinism
of Ti, it is implied that the action corresponding to the last node in G_{Ti}(C_n) should be
equal to rd(x, r′) for some symbolic variable r′. Therefore, G_{Ti}(IS(R_n)) and G_{Ti}(C_n)
are isomorphic. For the interference edges, we consider two cases:
The leaf of G_{Ti}(C_n) is a node n_r (labeled with (Ti, rd(x, r′))) that is involved in
an interference edge (n_w, n_r) in C_n, for some node n_w writing to x. We
prove that there is an interference edge from isom(n_w) to isom(n_r) (i.e., the leaf
of G_{Ti}(IS(R_n))) in IS(R_n). Let θ be a model for TC(C_n); the schedule orders the events of R_n
according to θ. Since IS(R_{n−1}) and C_{n−1} are isomorphic, θ orders the nodes
in IS(R_{n−1}) such that isom(n_w) is the last node writing to x before isom(n_r). Therefore, there
is an interference edge from isom(n_w) to isom(n_r) in IS(R_n).
The leaf of G_{Ti}(C_n) is a node n_r (labeled with (Ti, rd(x, r′))) that is not involved
in any interference edge in C_n. Let n_w be the last node before n_r in G_{Ti}(C_n) such
that it is labeled with (Ti, wt(x, val)) for some val. We prove that isom(n_r) (i.e.,
the leaf of G_{Ti}(IS(R_n))) is not involved in any interference edge in IS(R_n). In C_n, n_w is
the last node writing to x before n_r. Since IS(R_{n−1}) and C_{n−1} are isomorphic,
θ orders the nodes in IS(R_{n−1}) such that isom(n_w) is the last node writing
to x. isom(n_w) is labeled with (Ti, wt(x, val′)), where val′ is equal to val with
each symbolic variable r replaced by its map r′. Therefore, the last write to x
is done by thread Ti itself, and hence isom(n_r) is not involved in any interference edge.
As a result, E_I(IS(R_n)) and E_I(C_n) are isomorphic.
of individual threads by path exploration. Therefore, for each branch br that is coverable by
sequential testing, either it is covered by the initial random test (at line 8), or there exists a
corresponding realizable ISC C in W^0 such that either sink(C) = br or sink(C) = br′, where
br′ is in thread Th(br) before br, and the generated test for C covers br. As a result, all writes
that can happen without any interference are added to the interference forest after processing
W^0.

Induction Hypothesis: For each k < n and each k-coverable branch br, there exists a
realizable ISC C in W^k whose generated test covers br. This implies that each write that
requires k interferences to happen is added to the interference forest
while processing W^k.
Induction Step: Let br be an n-coverable branch. We prove that W^n contains a realizable ISC
C′ whose generated test covers br (C′ could have a sink other than br). Let C be a realizable
ISC with n interferences and sink(C) = br. Suppose that π = br_{0,1}, br_{0,2}, ..., br_{0,m_0}, br_{k_1,1},
br_{k_1,2}, ..., br_{k_1,m_{k_1}}, br_{k_2,1}, ..., br_{k_h,1}, ..., br_{k_h,m_{k_h}} is the sequence of branch nodes in G_{Th(br)}(C),
where br_{i,j} represents the j-th branch node that requires exactly i interferences according to
C to be covered. Note that the set of interferences required to cover a branch br_{i,j} according
to C can be smaller than the interference set of C_{IS}(C, br_{i,j}), i.e.,
i ≤ |E_I(C_{IS}(C, br_{i,j}))|. For example, a read involved in an interference in G_{Th(br)}(C) before
br_{i,j} might only affect a branch that comes after br_{i,j}, although it appears in E_I(C_{IS}(C, br_{i,j})).
According to π, the first m_0 branches are coverable during sequential testing, then the next
m_{k_1} branches require (the same) k_1 interferences, then the next m_{k_2} branches require (the same) k_2
interferences, and so on, where 0 < k_1 < k_2 < ... < k_h = n, and br = br_{k_h,m_{k_h}}. Let C_{k_1}, C_{k_2},
..., C_{k_h} be the minimal realizable ISCs (which are subgraphs of C) for branches br_{k_1,1}, br_{k_2,1},
..., br_{k_h,1}, respectively. Each C_{k_i} is a subgraph of C_{k_{i+1}} for all 1 ≤ i < h.
We first prove (by induction) that W^n contains C_{k_h} at some point during the execution of
Algorithm 3. Then, we show that an ISC C′ with sink(C′) = br_{k_h,i} (for some 1 ≤ i ≤ m_{k_h}),
which is a super-IS of C_{k_h} with exactly the same set of interferences as C_{k_h}, is put in W^n
at some point during the execution of Algorithm 3, and its generated test covers br. Now,
we prove by induction that W^{k_h} = W^n contains C_{k_h} at some point during the execution of
Algorithm 3.
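The worklist bookkeeping that the rest of the proof reasons about can be sketched as follows. This is a hypothetical simplification, not Algorithm 3 itself: ISCs are processed level by level, where level k holds ISCs with k interferences; realizable ISCs yield tests, and unrealizable ones are parked in UN[k] and later extended with one more interference.

```python
# Hypothetical sketch of the proof's worklist discipline (not Algorithm 3):
# W[k] holds ISCs with k interferences; unrealizable ISCs are parked in
# UN[k] and extended by one interference into a higher level.
def process_levels(initial_iscs, is_realizable, extend, max_degree):
    W = {0: list(initial_iscs)}
    UN = {k: [] for k in range(max_degree + 1)}
    covered = []
    for k in range(max_degree + 1):
        for isc in W.get(k, []):
            if is_realizable(isc):
                covered.append(isc)          # generate a test for this ISC
            else:
                UN[k].append(isc)
                for bigger in extend(isc):   # add one more interference
                    W.setdefault(len(bigger), []).append(bigger)
    return covered

# Toy run: an ISC is a set of interference ids, realizable once it contains
# interference 0; extend adds the smallest missing id.
extend = lambda isc: [isc | {min({0, 1, 2} - isc)}] if isc != {0, 1, 2} else []
is_realizable = lambda isc: 0 in isc
covered = process_levels([set()], is_realizable, extend, 3)
print(covered)  # [{0}]
```

The empty ISC at level 0 is unrealizable, is parked, and its one-interference extension {0} is picked up at level 1 and covered, mirroring how C_{k_1} is eventually reached from the degree-0 ISC I.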
Induction Base: We show that W^{k_1} contains C_{k_1} at some point during the execution of Algo-
rithm 3. Since br_{0,1}, br_{0,2}, ..., br_{0,m_0} are coverable by sequential concolic testing, the algo-
rithm covers them through the initial path exploration. The test that covers br_{0,m_0} skips br_{k_1,1},
since br_{k_1,1} requires interferences to be covered. Therefore, a dangling node correspond-
ing to br_{k_1,1} will be added to forest, and the algorithm generates an ISC I with degree 0 (i.e.,
I = C_{IS}(forest, br_{k_1,1})) and inserts it in W^0. Note that G_{Th(br_{k_1,1})}(I) is equal to G_{Th(br_{k_1,1})}(C_{k_1}).

According to C_{k_1}, there should be k ≤ k_1 reads in Th(br_{k_1,1}) in I before br_{k_1,1} that are
required to be interfered with writes from other threads. Each of those writes requires < k_1
interferences to happen. We order these reads based on the number of interferences their cor-
responding writes require. Suppose that each read node r_i is interfered with w_i, which requires
w_i^d interferences (for 1 ≤ i ≤ k), and w_1^d ≤ ... ≤ w_k^d. According to the induction hypothesis,
each write w_i is added to the interference forest while the algorithm processes W^{w_i^d}.
While processing W^0, the algorithm picks and removes I from W^0 at line 12. Since I is not
realizable, it will be added to UN^0. If w_1^d = 0, then ExploreISCs (called at line 17) generates
an ISC I′ for br_{k_1,1} by using I and introducing an interference from w_1 to r_1. If w_1^d > 0, then
ExploreISCs (called at line 23) generates the same ISC I′ while processing W^{w_1^d}, after w_1
has occurred, by selecting I from UN^0 and introducing an interference from w_1 to r_1. In both
cases, the resulting I′ will be added to W^{w_1^d+1}. If I′ is not realizable, it will be added to
UN^{w_1^d+1} while processing W^{w_1^d+1}; then, while processing W^{w_2^d} and after w_2 has
occurred, the algorithm selects I′ from UN^{w_1^d+1} and
generates the ISC I″ by adding an interference from w_2 to r_2. In both cases,
the resulting I″ will be added to W^{combined(w_1,w_2)+2}, where combined(w_1, w_2) represents the
number of distinct interferences that are required for both w_1 and w_2. This pattern continues
until the last read r_k is interfered, and by then the generated ISC is equal to C_{k_1} and is added
to W^{k_1}.
Induction Hypothesis: W^{k_i} contains C_{k_i} for all 1 ≤ i < h at some point during the execution
of Algorithm 3.
Induction Step: W^{k_{h−1}} contains C_{k_{h−1}} at some point during the execution of Algorithm 3.
While processing W^{k_{h−1}}, the algorithm generates a test that realizes C_{k_{h−1}} and leads to cov-
ering the branch br_{k_{h−1},1}. Assume that this test covers branches br_{k_{h−1},1}, ..., br_{k_{h−1},i} for some
1 ≤ i ≤ m_{k_{h−1}}.

If i ≠ m_{k_{h−1}}, then the test skips the branch br_{k_{h−1},i+1}. Therefore, a dangling node is added
to the forest according to br_{k_{h−1},i+1}, and an ISC I = C_{IS}(forest, br_{k_{h−1},i+1}) for br_{k_{h−1},i+1} is
generated and added to W^{k_{h−1}}, which is also realizable. As a result, while processing W^{k_{h−1}},
Algorithm 3 generates a test for ISC I that covers br_{k_{h−1},i+1}. Algorithm 3 continues path
exploration until all branches br_{k_{h−1},1}, ..., br_{k_{h−1},m_{k_{h−1}}} are added to forest and covered by some
tests.
The test that covers br_{k_{h−1},m_{k_{h−1}}} (which could be the test that realized C_{k_{h−1}} in the first place, if
i = m_{k_{h−1}}) would skip br_{k_h,1}, since it requires some additional interferences. Therefore, br_{k_h,1}
is added as a dangling node in forest, and the algorithm generates an ISC I with degree k_{h−1}
(i.e., I = C_{IS}(forest, br_{k_h,1})) and inserts it in W^{k_{h−1}}. Note that I is a sub-IS of C_{k_h}. According
to C_{k_h}, there should be k ≤ k_h = n reads in I which are not involved in any interference but
are interfered in C_{k_h}. These reads are interfered with writes that require < n interferences to
happen. We order these reads based on the number of interferences their corresponding writes
require. Suppose that each read node r_i is interfered with w_i, which requires w_i^d interferences (for
1 ≤ i ≤ k), and w_1^d ≤ ... ≤ w_k^d. According to the induction hypothesis, each write w_i is added
to the interference forest while the algorithm processes W^{w_i^d}.

While processing W^{k_{h−1}}, the algorithm picks and removes I from W^{k_{h−1}} at line 12. Since
I is not realizable, it will be added to UN^{k_{h−1}}. If w_1^d ≤ k_{h−1}, then ExploreISCs (called at
line 17) generates an ISC for br_{k_h,1} by using I and introducing an interference from w_1 to r_1.
If w_1^d > k_{h−1}, then ExploreISCs (called at line 23) generates the same ISC while processing
W^{w_1^d} after w_1 has occurred, by selecting I from UN^{k_{h−1}} and introducing an interference from w_1
to r_1. In both cases, the resulting ISC I′ will be added to
W^{combined(I,w_1)+1}, where combined(I, w_1) represents the number of the distinct interferences
that are in I or are required for w_1.

Now, either w_2^d < combined(I, w_1) + 1 or w_2^d ≥ combined(I, w_1) + 1. In the first case, the
algorithm picks and removes I′ from W^{combined(I,w_1)+1} (while processing W^{combined(I,w_1)+1}) at
line 12 and generates an ISC I″ by calling ExploreISCs (called at line 17), where an interfer-
ence is added from w_2 to r_2. In the second case, since I′ is not realizable, it will be added
to UN^{combined(I,w_1)+1} while processing W^{combined(I,w_1)+1}. After w_2 has occurred, while pro-
cessing W^{w_2^d}, the algorithm selects I′ from UN^{combined(I,w_1)+1} and generates the same ISC I″
by adding an interference from w_2 to r_2. In both cases, the resulting ISC I″ will be added
to W^{combined(I,w_1,w_2)+2}, where combined(I, w_1, w_2) represents the number of distinct interfer-
ences that are in I or are required for w_1 or w_2. This pattern continues until
the last read r_k is interfered, and by then the generated ISC is equal to C_{k_h} and will be added to
W^{k_h} = W^n.
We have proved that W^n contains C_{k_h}, which we know is realizable. Therefore, while
processing W^n, the algorithm generates a test for C_{k_h} that covers br_{k_h,1}. Assume that this test
covers branches br_{k_h,1}, ..., br_{k_h,i} for some 1 ≤ i ≤ m_{k_h}. If i = m_{k_h}, then the proof is complete
and C′ = C_{k_h} is the ISC in W^n whose generated test covers br. If i ≠ m_{k_h}, then the test skips
the branch br_{k_h,i+1}. Therefore, a dangling node is added to the forest according to br_{k_h,i+1},
and an ISC I = C_{IS}(forest, br_{k_h,i+1}) for br_{k_h,i+1} is generated and added to W^n, which is also
realizable. As a result, while processing W^n, Algorithm 3 generates a test for ISC I that covers
br_{k_h,i+1}. Algorithm 3 continues path exploration until all branches br_{k_h,1}, ..., br_{k_h,m_{k_h}} are added
to forest and covered by some tests. Here as well, W^n contains a realizable ISC C′, at some