
Generation of Test Data Using Meta Heuristic Approach

Praveen Ranjan Srivastava (1), Vinod Ramachandran (2), Manish Kumar (3), Gourab Talukder (4), Vivek Tiwari (5), Prateek Sharma (6)
(1) Lecturer (Computer Science)
(2-6) Undergraduate Students
Computer Science and Information System Group, BITS Pilani 333031, India
praveenrsrivastava@gmail.com

Abstract: Software testing is of great importance to the development of any software, and a prime concern is to minimize its cost. A major problem in software testing is the generation of test data, and several metaheuristic approaches to this problem have become popular. The aim is to generate an optimal set of test data that does not compromise the thoroughness of testing. Our objective is to generate such efficient test data for a given piece of software using a genetic algorithm and ant colony optimization. We also compare the two approaches to determine which is more effective at generating test data, and under what constraints, if any.

Key Words: Software Testing, Genetic Algorithm (GA), Ant Colony Optimization (ACO), Fitness Function

1. Introduction to Software Testing

Genetic algorithms have been popular over the past few years for their use in software testing. They help test a given piece of software efficiently and in a short time [1], which is one of the main benefits of evolutionary testing approaches. Another modern approach to software testing is Ant Colony Optimization, in which test data is generated with the help of artificial ants.

2. Genetic Algorithm

Genetic Algorithm (GA) is an evolutionary search mechanism used to approximate solutions to optimization and search problems [3]. A Genetic Algorithm follows a series of steps modeled on evolution, such as population generation, mating, crossover and mutation, which proceed towards the generation of fitter offspring, or solutions.

2.1. Main Components of Genetic Algorithm:

Initialization: Initially, many individual solutions are randomly generated to form an initial population. The population size depends on the nature of the problem. Traditionally, the population is generated randomly, covering the entire range of possible solutions (the search space).

Selection: Individual solutions are selected through a fitness-based process, where fitter solutions (as measured by a fitness function) are typically more likely to be selected. Certain selection methods rate the fitness of each solution and preferentially select the best solutions. Selection is usually stochastic, so that a small proportion of less fit solutions is also selected; this keeps the diversity of the population large and prevents premature convergence on poor solutions. Popular selection methods include roulette wheel selection and tournament selection.

Reproduction: The next step is to generate a second-generation population of solutions from those selected, through the genetic operators crossover and mutation. Producing a "child" solution by crossover and mutation creates a new solution that typically shares many of the characteristics of its "parents". New parents are selected for each child, and the process continues until a new population of solutions of appropriate size, with an increased average fitness value, is generated.

Termination: This generational process is repeated until a termination condition has been reached. Common terminating conditions are:
• A solution is found that satisfies minimum criteria.
• A fixed number of generations is reached.
• The fitness of the highest-ranking solution is reaching, or has reached, a plateau such that successive iterations no longer produce better results.
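The roulette wheel selection mentioned under Selection can be sketched in C as follows. This is only a minimal illustration, not the implementation used in this paper: the function name roulette_select and its parameters are hypothetical, fitness values are assumed to be non-negative integers, and the random number r (0 <= r < 1) is passed in by the caller so that the behaviour is reproducible.

```c
#include <stddef.h>

/* Roulette wheel selection (illustrative sketch).
 * Each individual owns a slice of the wheel proportional to its fitness.
 * r is a caller-supplied random number with 0 <= r < 1.
 * Returns the index of the selected individual, or -1 if nothing can be
 * selected (empty population or total fitness of zero).
 * Assumption: all fitness values are non-negative. */
int roulette_select(const int *fitness, size_t n, double r)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += fitness[i];
    if (n == 0 || total <= 0)
        return -1;

    double target = r * (double)total;   /* position on the wheel */
    long cumulative = 0;
    for (size_t i = 0; i < n; i++) {
        cumulative += fitness[i];
        if (target < (double)cumulative)
            return (int)i;
    }
    return (int)(n - 1);                 /* guard against rounding at the end */
}
```

Because fitter individuals own larger slices, they are chosen more often, yet every individual with non-zero fitness keeps some chance of being selected, which is how this scheme preserves diversity.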

Authorized licensed use limited to: The University of British Columbia Library. Downloaded on February 20,2010 at 19:17:03 EST from IEEE Xplore. Restrictions apply.
3. Ant Colony Optimization

The Ant Colony Optimization (ACO) algorithm is inspired by the observation of real ants. Individually, each ant is blind, frail and almost insignificant, yet by cooperating with each other the colony of ants demonstrates complex behaviors. One of these is the ability to find the shortest route to a food source or some other interesting landmark. This is done by laying down special chemicals called pheromones. As more ants use a particular trail, the pheromone concentration on it increases, hence attracting more ants [6].

ACO, like any other metaheuristic algorithm [2], drives some basic heuristic in order to escape from local optima: either a constructive heuristic, starting from a null solution and adding elements to build a good complete one, or a local search heuristic, starting from a complete solution and iteratively modifying some of its elements in order to achieve a better one. The metaheuristic part permits the low-level heuristic to obtain solutions better than those it could have achieved alone, even if iterated. Usually, the controlling mechanism is achieved either by constraining or randomizing the set of local neighbor solutions to consider in local search, or by combining elements taken from different solutions. The particular way of defining components and associated probabilities is problem-specific, and can be designed in different ways, facing a trade-off between the specificity of the information used for the conditioning and the number of solutions which need to be constructed before effectively biasing the probability distribution to favor the emergence of good solutions.

In ACO, a set of concurrent and asynchronous computational agents (a colony of ants) moves through states of the problem corresponding to partial solutions of the problem to solve. They move by applying a stochastic local decision policy based on two parameters, called trails and attractiveness. By moving, each ant incrementally constructs a solution to the problem. When an ant completes a solution, or during the construction phase, the ant evaluates the solution and modifies the trail value on the components used in its solution. This pheromone information directs the search of future ants.

Applying the Ant Colony Optimization Algorithm

Let α be the current vertex.

1. Evaluation at vertex α
• Update the Track - Push the current vertex α into the track set S.
• Evaluate Connections - Evaluate all connections to the current vertex α to determine T. The procedure involves evaluation of all possible transitions from the current state α to other neighboring states, using the state-transition table associated with the UML Statechart diagram.
• Sense the Trace - For the non-negative connections in T, the ant senses and gathers the corresponding pheromone levels P at the other ends of the connections.

2. Move to next vertex
• Select Destination - The following prioritized rules are used in the ant's selection:
   i) Select the vertex Vi with the lowest pheromone level P(Vi) sensed from the current vertex α.
   ii) If vertices Vi and Vj share the same lowest pheromone level P(Vi) = P(Vj), but T(Vi) = 0 and T(Vj) = 1, select Vi.
   iii) If vertices Vi and Vj share the same lowest pheromone level P(Vi) = P(Vj) and T(Vi) = T(Vj), randomly select one vertex.
   Destination β is the vertex selected using the above rules.
• Update Pheromone - Update the pheromone level for the current vertex α. The pheromone levels at the vertices are adjusted using the following formulas:
   P(α) = max(P(α), P(β)+1)        if T(β) = 1
   or
   P(α) = max(P(α), P(β)+1) + TP   if T(β) = 0
   where TP is a high pheromone level which decays within one iteration of the steps; that is, TP decays to 0 before the ant's next move at the end of Step 2.
• Move - Move to the destination vertex β, set α = β, and return to Step 1.

4. Previous works in Genetic Algorithm and Ant Colony Optimization:

Genetic Algorithms have been used for several optimization problems. They have been used to generate test plans for functionality testing [3], i.e. to verify that the software satisfies design and functional suitability criteria [4].

Further work has combined Genetic Algorithms with formal concept analysis to generate branch coverage test data automatically, which supports automatic test data generation [5].

Ant Colony Optimization has similarly been used for test data generation. The "all states coverage" requirement is commonly used in state-based software testing. A test suite is said to achieve all states coverage if every state is accessed at least once by a test case within the suite. On the basis of this simple fact, Ant Colony Optimization has been used with several variations to cover various problems [2].
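The destination-selection rules i) to iii) and the pheromone update formulas of Section 3 can be sketched in C as shown below. This is a hedged illustration under stated assumptions, not the authors' code: the function names select_destination and updated_pheromone are hypothetical, T[v] = -1 is assumed to mark a vertex that is not connected to the current vertex (the text only says that non-negative entries of T are considered), and ties under rule iii) are broken with the standard rand() function.

```c
#include <stdlib.h>

/* Select the destination vertex from the current vertex (sketch).
 * T[v]: -1 = not connected (assumption), 0 or 1 as in rules ii) and iii).
 * P[v]: pheromone level sensed at vertex v.
 * Returns the index of the chosen vertex, or -1 if no vertex is connected. */
int select_destination(const int *T, const int *P, int n)
{
    int best = -1;
    int ties = 0;
    for (int v = 0; v < n; v++) {
        if (T[v] < 0)
            continue;                        /* only non-negative connections */
        if (best < 0 || P[v] < P[best] ||
            (P[v] == P[best] && T[v] < T[best])) {
            best = v;                        /* rule i), then rule ii) */
            ties = 1;
        } else if (P[v] == P[best] && T[v] == T[best]) {
            ties++;                          /* rule iii): pick one at random */
            if (rand() % ties == 0)
                best = v;
        }
    }
    return best;
}

/* New pheromone level P(alpha) after choosing destination beta:
 * max(P(alpha), P(beta)+1), plus the temporary level tp when T(beta) = 0. */
int updated_pheromone(int p_alpha, int p_beta, int t_beta, int tp)
{
    int base = (p_alpha > p_beta + 1) ? p_alpha : p_beta + 1;
    return (t_beta == 0) ? base + tp : base;
}
```

The decay of TP to 0 before the ant's next move is left to the caller in this sketch, for example by subtracting tp again once the move has been made.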

Ant Colony Optimization has also been used to solve several other optimization problems, such as the Traveling Salesman Problem, where it has been reported to obtain better results than even genetic algorithms.

5. Test Data Generation Approach:

Both Genetic Algorithm and ACO have been used to generate test data [6], and the backbone is the same for both. The case taken here is the Resource Request Algorithm, which the operating system uses to allocate resources to processes during their execution cycle. Using Genetic Algorithm and ACO, we try to find a suitable set of test data that covers the need of each process.

The backbone of the genetic process is the fitness function [7], which counts the number of times a particular datum enters and continues in the resource request algorithm; the higher the value, the higher the chance of avoiding a deadlock. The test data with higher counts are retained, and genetic crossover and mutation are applied to them to yield better results. Poor test data are removed each time.

In the Ant Colony Optimization approach, the CFG (Control Flow Graph) is drawn and the various paths of flow are generated using ant colony optimization. The pheromone function is designed so that it yields paths with the maximum number of nodes. The test data satisfying such a path constitute the required set of test data.

6. Resource Request Algorithm

The Resource Request algorithm is a resource allocation and deadlock avoidance algorithm that incorporates the Banker's algorithm [8]. The Banker's algorithm, developed by Edsger Dijkstra, tests for system safety at a given instant of time. The resource request algorithm performs a mock allocation as requested by a given process and then makes a "safe-state" check to test for possible deadlock conditions after the allocation. If the allocation does not lead the system to a deadlock, it allocates the resources as requested by the process.

 1. int work, count = 0, i, allocation[n], need[n], finish[n];
 2. while (count < n)
 3. {
 4.     for (i = 0; i < n; i++)
 5.     {
 6.         if (finish[i] == 0 && need[i] <= work)   /* 0 = not finished */
 7.         {   work = work + allocation[i];
 8.             finish[i] = 1;                        /* mark process i as finished */
 9.             count++;
10.         }
11.     }
12. }
13. }   /* end of the enclosing function, whose opening is not shown */

Figure 1. Banker's Algorithm with one resource unit

Fig. 1 is the code of the Banker's algorithm for a single resource. The variable work keeps track of the number of units of the single resource that are available for a given process. The array allocation keeps track of the number of units of the resource that are allocated to each process. The array need keeps track of the number of units required by each process to go to completion. The array finish holds a flag for each process that keeps track of its status. The variable count keeps track of the number of processes that have gone to completion.

Fig. 2. Control Flow Graph

Fig. 2 is the control flow graph for the given code of the resource request algorithm.

7. Test Data Generation using GA:

The Genetic Algorithm can be used to generate the optimized set of test data for the given problem. Here the problem taken is the Resource Request Algorithm. The relevant code is listed in Fig. 1.

Initialization: The initial set of test data is held in two arrays, the allocation array and the need array. The allocation array lists the resources allocated to each of the n processes, and the need array lists the total resources each of the n processes needs. The initial population is drawn from the set of positive integers. The validity of the population depends upon the limit on the number of resources and also on the need of the particular process.

Initially, taking care of the limit of need for each process, random numbers are generated and filled into the arrays as initial test data.

Selection: The selection procedure is defined by the fitness function for the particular genetic algorithm problem. The fitness function has to evaluate the strength of the test data in terms of how many decision nodes it went through while still giving results.

To design the fitness function, we need a stub in the fitness function which gives the same output as the program for the given test data. In short, the fitness function contains the test oracle. Another evaluation criterion is the number of nodes covered by the test data. For that, each decision node is given a weight. The time to evaluate the node (to execute the instruction at that node) can be one specific criterion.

In this problem, we need to find the fitness value for each of the n entries in the allocation array. We take a dummy variable k to denote the passes made through a decision node for a particular entry. The while loop is covered once for each entry, so k becomes 1. For each allocation[i] we evaluate the value of k inside the if block as well as the for block. We need to define an array of k here (one element for each of the n data in the set). For each of the n entries, if it crosses the for loop its k value is incremented. If it also goes inside the if block, k is incremented again.

After the execution, we have the value of k for each of the n allocations. The maximum value of k corresponds to the value in the allocation array which has traversed the decision nodes most often and hence has checked the conditions best. Again, the output should match the test oracle output.

Reproduction: We sort the k array as well as the allocation array. We take the first two values in the sorted allocation array, which correspond to the highest k values. The binary representation of the integers is taken and a crossover is performed at some random point. The output is checked to be in the range defined by the number of resources. In alpha = 15% of cases, a mutation is also applied to the output at some random point in the binary string. The whole cycle is then performed with the new set of integers.

Termination: After a number of trials equal to twice the value of n, we stop the genetic algorithm. The allocation array now holds the set of data which traverses the whole code the greatest number of times, and which is therefore taken to be the best set of test data.

Given below is an example explaining our approach to the generation of test data using a genetic algorithm. We initially have 5 processes, namely P0, P1, P2, P3 and P4. The initial Allocation array is {0, 2, 3, 2, 0}.

Table 1
Process   Allocation   Need   Fitness
P0        0            7      8
P1        2            1      1
P2        3            6      10
P3        2            2      4
P4        0            4      6

Table 2
Process   Need   Fitness   Random no.   Cross Over   Random no. for mutation   Mutated Need
P2        0110   10        2            0111         3                         3
P0        0111   8         2            0110         1                         7

In the tables above, the Fitness Values for the processes are calculated on the basis of the Banker's Algorithm. For example, let us calculate the Fitness Value for process P3.

Let us assume that the total number of available instances of the resource is 10. Since processes P1, P2 and P3 have already been allocated 2, 3 and 2 units of the resource respectively, the Net Available Instances of the resource can be calculated as Total Instances - Currently Allocated Instances. Hence, the Net Available Instances of the resource are 3. Let the variable K be used to keep track of the fitness values. Initially K is set to zero. Each time we enter the inner for-loop or the if-block, K is incremented by 1. The value of K when a process enters the if-block is taken to be its Fitness Value. Thus process P0's for-loop iteration increments K by 1, as do those of processes P1 and P3. Additionally, process P1 gets executed and hence its if-block also increments K by 1. Hence, process P3's Fitness Value is equal to 4, the value of K at the beginning of process P3's if-block. The same procedure is used to determine the Fitness Values for the other processes.

The safety sequence exists if all the processes go to completion. In our analysis we have taken the fitness values based on the number of times a process strives to go to completion. The process with the largest fitness value is completed last. Hence the last 2 processes in our safety sequence are the fittest in the current mating pool, and we apply the genetic operators to them to obtain the next offspring. We repeat this genetic operation for twice the number of processes to finally get feasible results.

The next generation of test data, after crossover and mutation, appears to be fitter.
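The worked example of Section 7 can be traced with the following C sketch. It is an illustration under assumptions rather than the authors' program: the function names safety_sequence, crossover and mutate are hypothetical, the population is limited to 16 processes for simplicity, the loop of Fig. 1 is extended with a progress flag so that it stops when no safety sequence exists, and the mutation position is assumed to be counted from 1 at the least significant bit, which is the reading consistent with Table 2. The 15% mutation rate is not modelled.

```c
#include <stddef.h>

/* Runs the loop of Fig. 1 and records the order in which processes complete.
 * Returns the number of completed processes (n if a safety sequence exists).
 * Sketch limit (assumption): n <= 16. */
size_t safety_sequence(const int *allocation, const int *need, size_t n,
                       int work, size_t *order)
{
    int finished[16] = {0};
    size_t count = 0;
    int progress = 1;
    if (n > 16)
        return 0;
    while (count < n && progress) {
        progress = 0;              /* stop instead of looping forever if unsafe */
        for (size_t i = 0; i < n; i++) {
            if (!finished[i] && need[i] <= work) {
                work += allocation[i];
                finished[i] = 1;
                order[count++] = i;
                progress = 1;
            }
        }
    }
    return count;
}

/* Single-point crossover on bits-wide values: each child keeps its own top
 * `point` bits and takes the remaining low bits from the other parent.
 * Precondition: point <= bits and bits - point < 32. */
void crossover(unsigned a, unsigned b, unsigned point, unsigned bits,
               unsigned *child_a, unsigned *child_b)
{
    unsigned low_mask = (1u << (bits - point)) - 1u;
    *child_a = (a & ~low_mask) | (b & low_mask);
    *child_b = (b & ~low_mask) | (a & low_mask);
}

/* Mutation: flip bit number pos, counted from 1 at the least significant bit. */
unsigned mutate(unsigned value, unsigned pos)
{
    return value ^ (1u << (pos - 1u));
}
```

With allocation {0, 2, 3, 2, 0}, need {7, 1, 6, 2, 4} and 3 net available units, the sketch completes the processes in the order P1, P3, P4, P0, P2, so P0 and P2 are the last two and enter the mating pool. Crossing their needs 0110 and 0111 at point 2 yields 0111 and 0110, and mutating at positions 3 and 1 yields 3 and 7, matching the values listed in Table 2.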

8. Test Data Generation using Ant Colony Optimization

Fig. 4: The various possible paths

Using Ant Colony Optimization [9], we generate various paths from the Start Node 1 to the Finish Node 13. The optimized set of test data should be such that it covers all the edges in the CFG. For this we assign a flag to each edge in the CFG. As the ant passes through a particular edge, we set its flag to 1; the default value is 0. At the end, for each test datum we sum the flag values over all edges, which tells us the effectiveness of the test datum.

Now, if two test data yield paths with the same value (the sum of the flag bits), we cannot tell which portion of the CFG each covers: they may cover disjoint sets of edges or the same set. We have therefore attached a value to each node: value 1 to node 1, value 2 to node 2, and so on. If one test datum has a value of 5 and covers nodes 1-2-3-4-5, and another covers nodes 6-7-8-9-10, the sum of weights is 1+2+3+4+5 = 15 for the first and 6+7+8+9+10 = 40 for the second, so we can differentiate between the two and judge which of them covers the better region.

When generating paths, these two values are taken into account in the pheromone level at each edge. The pheromone level changes dynamically, because we add another variable that records how many ants have traversed a given edge: the flag bit is set to 1 the first time the edge is crossed, whereas this counter is incremented every time an ant crosses it. The counter indicates the vulnerability of the edge and its expectancy of being covered by any test data. The higher the expectancy, the lower the pheromone content of the edge, and this is the deciding factor in choosing paths. The other two variables later help to sieve the test data generated by ACO. In our case, the optimal path is the one that makes the ant traverse the largest number of nodes. This is because the process (ant) that finishes last traverses the maximum number of nodes, hence ensuring a safety sequence.

9. Co-relation between Ant Colony Optimization and Genetic Algorithm

The Genetic Algorithm is used to give improved results on the basis of checking the various decision nodes covered by the test data in the program. The more decision nodes covered, the better the test data.

Fig. 3: Relationship between GA and ACO.

In the case of Ant Colony Optimization, we take the control flow graph and observe the ant traversal based on the pheromone content assigned to the edges. The pheromone in this example depends on the value of the test data, more precisely on the need of the process. The ants traverse various paths between the initial node of the CFG and the end node. The path with maximum coverage corresponds to some set of test data, which is the optimal set of test data.

Considering the GA and ACO approaches with the same population size N, the same number of Offspring in GA or Ants in ACO, the same Fitness Values in GA or Pheromone Values in ACO, and paying attention to the Genetic Iterations in GA and the Pheromone Allocation in ACO, we conclude that the two approaches are closely correlated. Fig. 3 illustrates this correlation.

Both approaches thus work on the coverage achieved by the test data and finally give us the optimized set of test data.
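The two path measures described in Section 8, the sum of edge flags and the sum of node values, can be sketched in C as follows. This is a minimal illustration under assumptions: the names PathScore and score_path are hypothetical, node k is assumed to carry the value k, and the sketch counts distinct edges, so a path through five nodes scores 4 on the flag sum. The per-edge traversal counter and its effect on the pheromone level are not modelled.

```c
#include <stddef.h>

#define MAX_NODE 13   /* nodes are numbered 1..13, as in the CFG */

typedef struct {
    int edges_covered;   /* sum of edge flags: distinct edges traversed */
    int weight_sum;      /* sum of node values along the path */
} PathScore;

/* Scores one path given as a sequence of node numbers.
 * Returns {-1, -1} if a node number lies outside 1..MAX_NODE. */
PathScore score_path(const int *path, size_t len)
{
    int flag[MAX_NODE + 1][MAX_NODE + 1] = {{0}};   /* default flag value 0 */
    PathScore s = {0, 0};
    for (size_t i = 0; i < len; i++) {
        if (path[i] < 1 || path[i] > MAX_NODE) {
            PathScore invalid = {-1, -1};
            return invalid;
        }
    }
    for (size_t i = 0; i < len; i++) {
        s.weight_sum += path[i];                     /* node k has value k */
        if (i + 1 < len && !flag[path[i]][path[i + 1]]) {
            flag[path[i]][path[i + 1]] = 1;          /* ant crossed this edge */
            s.edges_covered++;
        }
    }
    return s;
}
```

For the paths 1-2-3-4-5 and 6-7-8-9-10 the flag sums are equal, while the weight sums are 15 and 40, which is what allows the two test data to be told apart.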

10. Conclusion

We have tried to generate test data using both GA and ACO and to find suitable test data. We used the genetic algorithm to generate test data for the Resource Request Algorithm.

In our analysis, we generated various test data, counted the valid and invalid test cases, and computed the success ratio for each of the approaches. We used ACO on the same set of data and computed its success ratio.

Fig. 5: Comparison between ACO and GA.

From the graph we can see that ACO achieves a success ratio of about 53-55% and GA achieves a success ratio of 45-46%. Hence both approaches have a reasonable amount of success in generating test data for software testing. ACO is a more modern approach which has not yet been explored much; however, it is seen to have a better success rate.

11. References

[1] Roger S. Pressman, "Software Engineering - A Practitioner's Approach", McGraw-Hill, India, 2005.

[2] Ant Colony Optimization, Vittorio Maniezzo, Luca Maria Gambardella, Fabio de Luigi, 2004.

[3] Genetic Algorithm Based Software Testing, Jarmo T. Alander, Timo Mantere, Department of Information Technology, University of Vaasa, Finland, 2005.

[4] Using Genetic Algorithms to Generate Test Plans for Functionality Testing, Francisca Emanuelle Veira Francisco Martins Rafael Silva et al. (NATUS Project, Brazil), ACE SE, March 2006.

[5] Using a Genetic Algorithm and Formal Concept Analysis to Generate Branch Coverage Test Data Automatically, Susan Khor and Peter Grogono, Department of Computer Science, Concordia University, 19th International Conference on Automated Software Engineering (ASE), 2004.

[6] Software Test Data Generation using Ant Colony Optimization, Huaizhong Li and C. Peng Lam, Transactions on Engineering, Computing and Technology, v1, December 2004, ISSN 1305-5313, World Enformatika Society.

[7] Efficient Software Test Case Generation Using Genetic Algorithm Based Graph Theory, Dr. Velur Rajappa et al., First International Conference on Emerging Trends in Engineering and Technology (IEEE), July 2008.

[8] Silberschatz et al., Operating System Concepts, sixth edition, Wiley, Singapore.

[9] K. Ayari, S. Bouktif and G. Antoniol, Automatic Mutation Test Input Data Generation via Ant Colony, GECCO'07, July 7-11, 2007, London, England, United Kingdom (ACM Press).
