Abstract: Software testing is a prime concern for the software industry and researchers. Test
cases play a significant role in the software testing process, and optimising them is essential
to test software effectively. Finding the maximum number of faults and rectifying them before
the actual software release is among the most complex and critical tasks in the software
development process. This paper deals with software test case optimisation using a bacteriologic
algorithm (BA) and a requirement mapping-based approach. Test case optimisation means
selecting effective test cases with maximum code coverage and fault detection capability,
and consequently minimising and prioritising the test cases.
Keywords: test suite; test case optimisation; software development life cycle; SDLC; genetic
algorithm; GA; bacteriologic algorithm; BA.
Reference to this paper should be made as follows: Srivastava, P.R. (2016) ‘Test case
optimisation a nature inspired approach using bacteriologic algorithm’, Int. J. Bio-Inspired
Computation, Vol. 8, No. 2, pp.122–131.
2 Background and motivation

The aim of software engineering is to develop high quality software in a cost-effective
manner. In this regard, software testing is an important and complex phase of the software
development life cycle (SDLC). It is estimated that the effort consumed by the software testing
process is nearly 50% of the total software effort (Pressman, 2001; Rothermel et al., 2001).
This is the motivation for this work. Over the past few years, various researchers have been
engaged in finding methods for test case minimisation and prioritisation, so that the whole
system can be effectively tested in the given time. The first approach used for test suite
minimisation is the classical Greedy approach (Cormen et al., 2001). It uses a function that
finds the test case in the test suite that satisfies most of the requirements and removes those
requirements from the requirement set. This process is repeated until all the requirements
are satisfied.

Harrold et al. (1993) (HGS) proposed a technique which selects a set of test cases from a test
suite that can represent the entire test suite in terms of coverage, i.e., it has the same
coverage as the original test suite.

The main drawback of the above two approaches is that the fault detection capability of the
test suites is reduced.

Wong et al. (1995) suggested that, keeping all-uses coverage constant, reducing a test suite's
size causes little or no reduction in its fault detection effectiveness.

Von Ronne (1999) generalised the HGS algorithm to support covering each condition multiple
times.

In contrast to previous studies, Rothermel et al.'s (1998, 2001) empirical study showed that
minimisation of test suites severely affects their fault detection capabilities.

Tallam and Gupta (2005) developed another heuristic called the delayed-greedy strategy to
minimise the test suites while considering the coverage requirements.

The next significant work in the field of software testing was the use of genetic algorithms
(GAs) for test case optimisation. GAs were developed by Holland (1992), based on the process of
evolution of biological organisms. Krishnamoorthi and Mary (2009) proposed a test suite
prioritisation technique for regression testing using GA, but the problem with GA is that it
gets stuck in a local optimum, i.e., it does not search for the best result across all
generations. They also used time-based prioritisation, which requires the extra overhead of
calculating the execution time of every test case.

A recent evolutionary method adapted from GA is the BA (Baudry et al., 2005), which is inspired
by the nature of bacteria. Some researchers have shown that this variant of GA can also be used
for test case optimisation and performs better than GA. Rad et al. (2010) used GA and BA for
optimising test data in mutation testing. Similarly, Baudry et al. (2005) proposed an approach
for test case optimisation in mutation-based testing using both GA and BA. Their comparison
study shows that BA performs better than GA, but it requires the work overhead of generating
mutants (in the case of mutation-based testing) and testing each of them.

This paper presents a BA and requirement mapping-based approach for test case optimisation in
the case of regression testing. It searches for the fittest bacteria globally, i.e., in all
generations.

3 Definitions and concepts

This section explains some useful terms and established concepts in the context of test case
optimisation. The optimisation here involves minimising and prioritising the test cases, as
explained below.

3.1 Test case minimisation (Harrold et al., 1993)

Initially, a set of test cases is available to the tester, as this technique is designed for
regression testing. Minimisation of the test cases is carried out using BA with two
considerations. First, the test suites created should consist of the minimum number of test
cases with maximum branch and fault coverage. Second, the number of test suites should be
optimal.

3.2 Test case prioritisation (Rothermel et al., 2001)

In regression testing a large number of test suites are present, but due to strict time
constraints it is not feasible to run each and every test suite over the code. So the test
suites must be arranged according to some criteria. This paper proposes prioritisation of test
suites on the basis of branch coverage and fault detection capability.

3.3 Basic GA

Holland (1992) first introduced the concept of GAs. These algorithms were influenced by
Darwin's theory of evolution. GAs automatically search the input area for data with the
maximum usage. GAs require a fitness function capable of calculating the quality of
chromosomes (a chromosome is a collection of genes). Figure 1 shows the basic GA.

Figure 1 Basic GA flow diagram

GAs use the selection, crossover and mutation genetic operators.
124 P.R. Srivastava
1 Selection: in this process two chromosomes are selected for crossover and mutation.
Chromosomes with a high fitness value have a high probability of selection.

2 Crossover: this genetic operator selects a random crossover point and makes new descendants
by crossover of the parents. The simplest way is one-point crossover, which selects a gene
as the crossover point. The parents' genes before or beyond the crossover point are
exchanged, resulting in two new descendants.

3 Mutation: the crossover process is followed by the mutation process, which randomly changes
the new descendants.

The minimisation algorithm, which uses these matrices, decides the initial test suite size, and
then test suites are generated by randomly selecting test cases. These form the initial
population for the bacteriologic algorithm. Then selection and mutation operations are applied
and new descendants are formed. This process is repeated until the stopping criterion is met.

The fitness function, fit, used here for calculating the fitness of test suites is the weighted
sum of bc and fda, which can be obtained directly from the two tables formed by requirement
mapping.

fit = w1 * bc + w2 * fda
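The weighted sum above can be sketched in C as follows. This is only an illustration: the function names `coverage_percent` and `fitness` are hypothetical, and it assumes that bc and fda are expressed as percentages, as they are in the case study.

```c
/* Hypothetical helper: share of covered requirements, in percent. */
double coverage_percent(unsigned covered, unsigned total)
{
    return 100.0 * covered / total;
}

/*
 * Fitness of one test suite (bacterium): fit = w1 * bc + w2 * fda.
 * bc and fda are assumed to be percentages; w1 and w2 are the
 * tester-chosen weights, which must not both be zero.
 */
double fitness(double w1, double w2, double bc, double fda)
{
    return w1 * bc + w2 * fda;
}
```

For instance, `fitness(1.0, 0.0, coverage_percent(3, 4), coverage_percent(4, 5))` evaluates the weighted sum for a suite that covers 3 of 4 branches and 4 of 5 def-use pairs, with all weight placed on branch coverage.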
The next part presents the algorithm for the proposed approach, followed by a case study which
explains the algorithm thoroughly.

4.2 Proposed algorithm

The algorithm below shows the steps involved in the proposed approach for test case
minimisation and prioritisation. It takes two inputs:
1 source code
2 initial set of test cases.

The output is a minimised and prioritised set of test suites. Rb is the set of branches
extracted from the source code and Rc is the set of extracted variable def-use pairs. Tablebc
is the branch coverage table and Tablefda is the variable def-use table. TS is a test suite
which gives the optimal length L. The weights w1 and w2 cannot both be zero at the same time.
If test suites are already present instead of test cases, steps 5, 6 and 7 are not needed.

Input: source code, S
       Set of initial test cases, ti
Output: minimised and prioritised test suites.
Algorithm:
1 Extract branches as requirements from source code S
    Rb = {B1, B2, B3....Bn}
2 Map initial test cases, ti, with Rb
    Tablebc = map_fun(ti, Rb)
3 Find define-use of different variables, from source code S
    Rc = {a1(line x, line y)....an(line x, line y)}
4 Map initial test cases, ti, with Rc
    Tablefda = map_fun(ti, Rc)
5 Find a test suite TS with maximum branch coverage and fault detection ability
    TS = find(Tablefda, Tablebc, ti)
6 Find optimal length of test suites as
    L = find_length(TS)
7 Generate random test suites TSi = {TS1, TS2,....TSn} of length L from ti
8 Apply bacteriologic algorithm over TSi
    8.1 Calculate fitness for each bacterium in TSi
        fiti = w1 * bci + w2 * fdai
        where bc = branch coverage of bacterium (obtained from Tablebc)
        fda = fault detection ability (obtained from Tablefda)
        w1 and w2 are weights ranging between 0 and 1.
    8.2 Arrange all bacteria in TSi in descending order of fiti
    8.3 Apply mutation
    8.4 Recalculate fiti
    8.5 Memorise fittest bacteria
    8.6 Add to next generation
    8.7 Repeat from 8.1 until the stopping criterion is met.
        (the stopping criterion here can be a number of generations)
9 Rearrange all memorised bacteria in descending order of fiti, which prioritises them.

The above algorithm has certain parameters: the stopping criterion, the memorisation threshold
(Baudry et al., 2005), i.e., the maximum number of bacteria of each generation to be memorised,
the weights w1 and w2, and the number of mutation rounds per generation. These are to be
decided by the tester by carefully examining the code and the nature of the program. The
stopping criterion can be set as a number of generations, as long as bacteria of the required
fitness are being obtained. The weights w1 and w2 depend upon the nature of the program. For
example, for 'type of triangle' shown in Table 5, only branches, i.e., p-uses, were present
and no def-use pairs were found, so w1 should be given more weight than w2. The def-use table
will not be constructed for this program and the value of w2 can be set to zero. The rounds of
mutation per generation and the memorisation threshold can be set by the tester. In our
experience, the mutation rounds for each generation should be more than the number of test
suites, which ensures that almost every test suite gets mutated in a generation. The
memorisation threshold depends on the number of bacteria and up to what fitness value they are
to be memorised for each generation. The algorithm is easier to understand through an example.
The next part presents a case study which explains the algorithm thoroughly.

4.3 Case study

For a thorough explanation of the proposed approach, the source code of binary search (written
in C) is taken from 'Data Structure using C and C++' by Tanenbaum (2008). It is given as input
to the algorithm and returns the position of the element to be searched in the array.

    /* Returns pointer to the position of the sought number, val */
    int *binary_search(int val)
    {
        unsigned int L = 0, R = array_length(arr), M;
        while (L < R)
        {
            M = (L + R - 1) / 2;
            if (val == arr[M])
                return arr + M;
            else if (val < arr[M])
                R = M;
            else
                L = M + 1;
        }
        return NULL;
    }

Source code S

In the above source code, L, R and val are the variables for which the set of initial test
cases (ti) is given as the second input to the algorithm:

Test Cases (ti):
Variables: L, R, val

The sample array arr[] = {-14, -7, 3, 8, 12} is taken for designing the test cases (ti), which
are shown in Table 1.
Test case optimisation a nature inspired approach using bacteriologic algorithm 127
Table 1 Set of test cases for source code S

t17    0, 5, -8
t18    0, 5, 9
t19    0, 5, 3
t20    0, 5, 5

Table 2 Branch coverage table (Tablebc)

t18    0, 5, 9    X X
t19    0, 5, 3    X X X
t20    0, 5, 5    X X

Note: The ‘X’ in column Bi and row ti depicts that branch Bi is covered by corresponding test
case ti.

Now further process involves the following steps:

Step 1: Extract branches from source code S as:

     1.      int *binary_search(int val)
     2.      {
     3.          unsigned int L = 0, R = array_length(arr), M;
     4.  B1      while (L < R)
     5.          {
     6.              M = (L + R - 1) / 2;
     7.  B2          if (val == arr[M])
     8.                  return arr + M;
     9.  B3          else if (val < arr[M])
    10.                  R = M;
    11.  B4          else
    12.                  L = M + 1;
    13.          }
    14.          return NULL;
    15.      }

So, the set of branches Rb = {B1, B2, B3, B4}

Step 2: Map initial test cases ti (from Table 1) with Rb and form a branch coverage Tablebc.
This is shown in Table 2.

Step 3: Extract variable def-use pairs from source code S as V(line x, line y), where V is a
variable which is defined at line x and used at line y. Here a variable definition means that
the variable is initialised with some value, which can be a constant or the result of an
expression, and a variable use means that the variable is used in some computation.

In source code S the following def-use pairs are extracted:

Rc = {L(3, 6), R(3, 6), M(6, 8), M(6, 10), M(6, 12)}

E.g., L(3, 6) means that variable L is defined at line 3 and then used at line 6.

Step 4: Map initial test cases ti (from Table 1) with Rc to form Tablefda (fault coverage
table). This is shown in Table 3. An ‘X’ in Table 3 means that both the definition and the use
of a variable are covered by the corresponding test case ti. For example, the ‘X’ in row t1 and
column L(3, 6) shows that test case t1 covers both the definition (at line 3) and the use (at
line 6) of L.
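One way to obtain a row of Tablebc and Tablefda for a test case is to run an instrumented copy of the source code. The sketch below is an assumption-laden illustration, not the mapping function used in the paper: the names `binary_search_traced`, `struct coverage` and the bit flags are hypothetical, L and R are passed in as parameters because the test cases list them as variables, and a branch is counted as covered when it is taken at least once. Under a different coverage rule the marks may not match Tables 2 and 3.

```c
#include <stddef.h>

/* Sample array from the case study. */
static const int arr[] = {-14, -7, 3, 8, 12};

/* Bit flags for the branch requirements Rb = {B1, B2, B3, B4}. */
enum { B1 = 1, B2 = 2, B3 = 4, B4 = 8 };

/* Bit flags for Rc = {L(3,6), R(3,6), M(6,8), M(6,10), M(6,12)}. */
enum { DU_L_3_6 = 1, DU_R_3_6 = 2, DU_M_6_8 = 4,
       DU_M_6_10 = 8, DU_M_6_12 = 16 };

/* Hypothetical record: one row of Tablebc and one row of Tablefda. */
struct coverage {
    unsigned branches;
    unsigned def_use;
};

/*
 * Instrumented copy of binary_search. Assumption: a branch is covered
 * when it is taken at least once; L(3,6) and R(3,6) are covered when
 * line 6 first reads the values assigned at line 3.
 */
const int *binary_search_traced(unsigned L, unsigned R, int val,
                                struct coverage *cov)
{
    unsigned M;
    int first_iteration = 1;

    cov->branches = 0;
    cov->def_use = 0;
    while (L < R) {
        cov->branches |= B1;
        if (first_iteration) {
            cov->def_use |= DU_L_3_6 | DU_R_3_6;
            first_iteration = 0;
        }
        M = (L + R - 1) / 2;
        if (val == arr[M]) {
            cov->branches |= B2;
            cov->def_use |= DU_M_6_8;   /* M used at line 8 */
            return arr + M;
        } else if (val < arr[M]) {
            cov->branches |= B3;
            cov->def_use |= DU_M_6_10;  /* M used at line 10 */
            R = M;
        } else {
            cov->branches |= B4;
            cov->def_use |= DU_M_6_12;  /* M used at line 12 */
            L = M + 1;
        }
    }
    return NULL;
}
```

Calling the function once per test case and storing the resulting `struct coverage` gives one table row per test case; bitmasks were chosen here only because they make the later union of rows within a test suite a single OR operation.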
Table 3 Fault coverage table (Tablefda)

Testcase/def-use    L(3, 6)  R(3, 6)  M(6, 8)  M(6, 10)  M(6, 12)
t1     X X X X
t2     X X X X
t3     X X X
t4     X X X
t5     X X X
t6     X X X X
t7     X X X
t8     X X X
t9     X X X
t10    X X X
t11    X X X X
t12    X X X X
t13    X X X
t14    X X X
t15    X X X
t16    X X X X
t17    X X X
t18    X X X
t19    X X X X
t20    X X X

Step 5: Using the minimisation algorithm, find a test suite which has maximum branch coverage
and fault detection ability. For this, Tables 2 and 3 can be combined and the classical Greedy
approach (Cormen et al., 2001) can be applied. It finds the test case that satisfies most of
the requirements and removes those requirements from the requirement set. This process is
repeated until all the requirements are satisfied. For example, here test case t1 covers
branches B1, B2, B3 (from Table 2) and def-use pairs L(3, 6), R(3, 6), M(6, 8), M(6, 10) (from
Table 3). These requirements are removed and a test case is sought which covers the rest of the
requirements. Test case t2 covers branches B1, B2, B4 (from Table 2) and def-use pairs L(3, 6),
R(3, 6), M(6, 8), M(6, 10) (from Table 3). So t1 and t2 together cover all branches and def-use
pairs. Together they form a test suite with 100% bc and fda. We get a test suite
TS = {t1, t2}.

Step 6: Now find the length of TS (the number of test cases in TS), which gives the optimal
test suite size, L = length(TS); here L = 2.

Step 7: Form initial test suites of size L by randomly taking test cases from ti. Here the
following 10 test suites of size L = 2 are formed from the initial test cases (ti):

TS1 = {t1, t6}, TS2 = {t2, t7}, TS3 = {t3, t8}, TS4 = {t4, t9}, TS5 = {t5, t10},
TS6 = {t11, t12}, TS7 = {t13, t14}, TS8 = {t15, t16}, TS9 = {t17, t18}, TS10 = {t19, t20}.

If the tester has only test cases, not test suites, then the optimal size of a test suite must
be calculated; otherwise the calculations involved in steps 5, 6 and 7 are not required.

Step 8: Apply BA on the test suites generated in step 7 for a number of generations, until the
stopping criterion is met. The stopping criterion here can be a number of generations. In every
generation, the test suites with the highest fitness are selected by the selection operator and
mutation is applied to them. The mutation operator randomly exchanges test cases between test
suites. This process is repeated several times in every generation.

For example, suppose that at some point the selection operator chooses two test suites
(bacteria), TS4 = {t4, t9} and TS7 = {t13, t14}, and the mutation operator chooses to exchange
the first test cases of these test suites. After mutation the new test suites are
TS4 = {t13, t9} and TS7 = {t4, t14}. This process of selection and mutation is carried out
several times and new test suites are created. After every generation the fitness of the
generated test suites must be calculated. For example:

Fitness calculation of TS4 = {t13, t9}:

Test case t13 covers branches B1 and B3; test case t9 covers branches B1 and B4 (from Table 2).
So the test suite TS4 covers 3 of the 4 branches, and its branch coverage is:

bc = (3 / 4) * 100 = 75%

Similarly, t13 covers L(3, 6), R(3, 6), M(6, 10) and t9 covers L(3, 6), R(3, 6), M(6, 12) (from
Table 3). So the test suite TS4 covers 4 of the 5 def-use pairs, and its fault detection
ability (fda) is:

fda = (4 / 5) * 100 = 80%

Let the weights be w1 = 1 and w2 = 0, i.e., the approach tends to cover ‘all p-use and some
c-use’ here. Therefore the fitness value of test suite TS4 is:

fit = (1 * 75) + (0 * 80) = 75%

The fittest bacteria (test suites) of each generation are memorised according to a memorisation
threshold (the maximum number of bacteria of each generation to be memorised), which is
determined by the tester. Finally these bacteria are arranged in descending order of their
fitness value, which gives minimised and prioritised test suites according to branch coverage
and fault detection capability.

For this case study of binary search, the stopping criterion was set as a number of
generations, and the algorithm was run for ten generations. The weight w1 was set to one and w2
to zero. Although these weight values favour branch coverage, the results in Table 4 show that
the first six test suites cover all branches and def-use pairs.
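The mutation and fitness calculation described for TS4 can be sketched in C as below. This is a minimal illustration under stated assumptions, not the implementation used for the case study: the types `struct test_case` and `struct test_suite` and all function names are hypothetical, coverage rows are stored as bitmasks (bit i of `branches` for branch B(i+1), bit i of `def_use` for the i-th pair of Rc), and the suite length is fixed at L = 2.

```c
#include <stddef.h>

#define NUM_BRANCHES 4   /* size of Rb in the case study */
#define NUM_DEF_USE  5   /* size of Rc in the case study */
#define SUITE_LEN    2   /* optimal length L found in step 6 */

/* Hypothetical representation of one row of Tablebc and Tablefda. */
struct test_case {
    int id;              /* e.g. 13 stands for t13 */
    unsigned branches;   /* bit i set: branch B(i+1) is covered */
    unsigned def_use;    /* bit i set: i-th def-use pair of Rc is covered */
};

struct test_suite {
    struct test_case tc[SUITE_LEN];
};

/* Count the set bits of a coverage mask. */
static unsigned count_bits(unsigned mask)
{
    unsigned n = 0;
    while (mask) {
        n += mask & 1u;
        mask >>= 1;
    }
    return n;
}

/* Branch coverage of a suite in percent: union of its rows in Tablebc. */
double suite_bc(const struct test_suite *ts)
{
    unsigned covered = 0;
    for (size_t i = 0; i < SUITE_LEN; i++)
        covered |= ts->tc[i].branches;
    return 100.0 * count_bits(covered) / NUM_BRANCHES;
}

/* Fault detection ability in percent: union of its rows in Tablefda. */
double suite_fda(const struct test_suite *ts)
{
    unsigned covered = 0;
    for (size_t i = 0; i < SUITE_LEN; i++)
        covered |= ts->tc[i].def_use;
    return 100.0 * count_bits(covered) / NUM_DEF_USE;
}

/* fit = w1 * bc + w2 * fda */
double suite_fit(const struct test_suite *ts, double w1, double w2)
{
    return w1 * suite_bc(ts) + w2 * suite_fda(ts);
}

/* Mutation as described in step 8: exchange the test cases at one position. */
void mutate_swap(struct test_suite *a, struct test_suite *b, size_t pos)
{
    struct test_case tmp = a->tc[pos];
    a->tc[pos] = b->tc[pos];
    b->tc[pos] = tmp;
}
```

With t13 encoded as branches B1 and B3 and t9 as B1 and B4, swapping position 0 of TS4 and TS7 and then calling `suite_fit` with w1 = 1 and w2 = 0 follows the same arithmetic as the worked example: 3 of 4 branches, 4 of 5 def-use pairs, and a fitness equal to the branch coverage.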
As a result of an execution we obtained ten test suites, as shown in Table 4.

Table 4 Test suites obtained as a result of BA

Test suites    Test cases     bc     fda    fit
TS1            ts11, ts6      100    100    100
TS2            ts3, ts12      100    100    100
TS3            ts1, ts14      100    100    100
TS4            ts11, ts18     100    100    100
TS5            ts17, ts6      100    100    100
TS6            ts1, ts18      100    100    100
TS7            ts1, ts7       75     80     75
TS8            ts15, ts12     75     80     75
TS9            ts5, ts18      75     80     75
TS10           ts4, ts9       50     60     50

The above test suites are prioritised according to their fit value and shown in graphical form
in Figure 4. The graph shows the prioritised test suites and their corresponding bc, fda and
fit values.

From Figure 4, it can be observed that the first six test suites have 100% branch coverage (bc)
and fault detection ability (fda). Further, to check the feasibility of the proposed algorithm,
five different source codes have been tested, and Table 5 shows that BA performs better than GA
in some aspects.

The first example in Table 5, i.e., ‘type of triangle’, has no def-use variables; it has only
p-uses. The number of test cases is selected according to worst-case testing, i.e., 5^n, where
n is the number of variables for which test cases are to be designed. In our case ‘bubble sort’
can have 3,125 test cases in the worst case, but for feasibility we designed 625 test cases. A
graphical representation of the percent reduction in test cases by GA and BA (from Table 5) is
shown in Figure 5.

Figure 5 shows graphically the reduction in test cases after applying BA and GA. Clearly the
performance of BA is better than GA for every sample program. For example, for the ‘type of
triangle’ program there were initially 125 test cases; BA reduced them to 15 test cases and GA
reduced them to 45. All the reduced test cases were prioritised according to their fitness
values (fit). This analysis shows that the proposed approach works well for regression test
case optimisation. The introduction of the memorisation function in BA makes it capable of
searching for the fittest bacteria globally, i.e., in all generations, whereas GA does a local
search. The approach described in this paper gives an efficient method to minimise and also
prioritise the test cases.
Figure 4 Graph showing test suites from Table 4 and their fitness values
Table 5

Title               LOC   Branch   Def-use   Number of test cases   % Reduction after BA   % Reduction after GA
Type of triangle    25    8        0         125                    88                     64
LCM                 17    3        8         25                     92                     80
Greatest number     15    3        3         125                    88                     84
Bubble sort         20    5        10        625                    84                     72
Armstrong number    23    3        8         25                     92                     84
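The two quantities behind the table, the worst-case test count 5^n and the percent reduction, can be checked with a small C sketch. The function names are hypothetical, and the reduction formula (initial minus reduced, divided by initial) is an assumption that happens to reproduce the figures quoted for ‘type of triangle’ (125 test cases reduced to 15 by BA and to 45 by GA).

```c
/* Worst-case number of test cases for n variables: 5^n. */
unsigned long worst_case_tests(unsigned n)
{
    unsigned long total = 1;
    while (n--)
        total *= 5;
    return total;
}

/*
 * Percent reduction when `initial` test cases are reduced to `reduced`.
 * Assumes 0 < initial and reduced <= initial.
 */
double percent_reduction(unsigned initial, unsigned reduced)
{
    return 100.0 * (initial - reduced) / initial;
}
```

For example, `worst_case_tests(5)` gives the worst-case count for a program with five variables, and `percent_reduction(125, 15)` gives the BA reduction for a suite cut from 125 to 15 test cases.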
Figure 5 Graph showing reduction of test cases for five sample programs from Table 5
Trunfio, G.A. (2014) ‘Enhancing the firefly algorithm through a cooperative coevolutionary
approach: an empirical study on benchmark optimisation problems’, Int. J. of Bio-Inspired
Computation, Vol. 6, No. 2, pp.108–125.
Von Ronne, J. (1999) Test Suite Minimization: An Empirical Investigation, Bachelor thesis, June.
Wong, W.E., Horgan, J.R., London, S. and Mathur, A.P. (1995) ‘Effect of test set minimization
on fault detection effectiveness’, Proceedings of the Seventeenth International Conference on
Software Engineering, Seattle, Washington, USA, pp.41–50.