
Nov 6, 2008

Presented by Amy Siu and EJ Park


[Diagram: Application Release 1 with its R1 test cases, and Application Release 2 with its R2 test cases]

 Validate modified software


 Often with existing test cases from previous
release(s)
 Ensure existing features are still working
Regression testing is expensive!
 A strategy to
◦ Minimize the test suite
◦ Maximize fault detection ability

 Considerations and trade-offs


◦ Cost to select test cases
◦ Time to execute test suite
◦ Fault detection effectiveness

 Regression test case selection techniques affect
the cost-effectiveness of regression testing

 Empirical evaluation of 5 selection techniques

 No new technique proposed

[Diagram: program P (Application Release 1) with test suite T; program P' (Application Release 2) with selected test cases T', new test cases T'', and new test suite T''']

The regression test selection problem:
 Programs: P, P'
 Test suite: T
 Selected test cases: T' ⊆ T
 New test cases: T'' for P'
 New test suite: T''' for P', including the selection from T'
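To make the notation concrete, here is a minimal Python sketch of how T''' is assembled; the `select` and `create_new_tests` helpers are hypothetical placeholders, not something defined in the paper.

```python
# Minimal sketch of the selection problem: assemble T''' from T' (selected from T) and T'' (new tests).
def build_regression_suite(T, P, P_prime, select, create_new_tests):
    T_selected = select(T, P, P_prime)             # T' ⊆ T, chosen by an RTS technique
    T_new = create_new_tests(P_prime, T_selected)  # T'', new tests for P'
    return T_selected | T_new                      # T''', the suite run against P'
```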

 5 test case selection techniques
◦ Minimization
◦ Dataflow
◦ Safe
◦ Ad Hoc / Random
◦ Retest-All

Minimization
• Select minimal sets of test cases T'
• Only cover modified or affected portions of P
 – '81 Fischer et al.
 – '90 Hartman and Robson
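As a rough illustration of the minimization idea only (not the tools cited above), the sketch below greedily picks tests until all modified or affected elements are covered; the data structures are assumptions.

```python
# Greedy set-cover sketch of minimization: repeatedly add the test that covers
# the most still-uncovered modified/affected elements.
def minimize(coverage, affected):
    # coverage: test id -> set of program elements the test exercises
    # affected: set of elements modified or affected by the change
    selected, uncovered = set(), set(affected)
    while uncovered and coverage:
        best = max(coverage, key=lambda t: len(coverage[t] & uncovered))
        if not coverage[best] & uncovered:
            break  # no remaining test covers anything still uncovered
        selected.add(best)
        uncovered -= coverage[best]
    return selected
```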

Dataflow
• Select test cases T' that exercise data interactions that have been affected by modifications in P'
 – '88 Harrold and Soffa
 – '88 Ostrand and Weyuker
 – '89 Taha et al.

Safe
• Guarantee that T' contains all test cases in T that can reveal faults in P'
 – '92 Laski and Szermer
 – '94 Chen et al.
 – '97 Rothermel and Harrold
 – '97 Vokolos and Frankl
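Safe techniques rely on program analysis (e.g., DejaVu's dangerous-edge analysis). As a simplified sketch of the underlying intuition only, the code below keeps every test whose recorded coverage touches an entity that changed between P and P'; all names and data structures are illustrative assumptions.

```python
# Simplified "safe" selection sketch: retain every test that executed a changed entity,
# so no fault-revealing test that exercises the modification is dropped.
def safe_select(coverage, changed_entities):
    # coverage: test id -> set of entities (e.g., CFG edges) the test executed in P
    # changed_entities: entities whose code differs between P and P'
    return {t for t, executed in coverage.items() if executed & changed_entities}
```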

Ad Hoc / Random
• Select T' based on hunches, or loose associations of test cases with functionality

Retest-All
• “Select” all the test cases in T to test P'

 How do the techniques differ?
◦ The ability to reduce regression testing cost
◦ The ability to detect faults
◦ Trade-offs between test size reduction and fault detection
◦ Cost-effectiveness comparison
◦ Factors affecting the efficiency and effectiveness of test selection techniques

 Calculating the cost of RTS (Regression Test Selection) techniques
 cost = A + E(T')
 A: the cost of the analysis required to select test cases
 E(T'): the cost of executing and validating the selected test cases
 They measure
◦ The reduction of E(T') by calculating the size reduction: Reduction = |T'| / |T|
◦ The average of A by simulating on several machines
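A minimal sketch of the cost model and the size-reduction measure, assuming per-test execution-and-validation costs are available; the function and parameter names are illustrative.

```python
# cost = A + E(T'): analysis cost plus the cost of executing and validating the selected tests
def regression_cost(analysis_cost, selected, exec_cost):
    # exec_cost: test id -> cost of executing and validating that test
    return analysis_cost + sum(exec_cost[t] for t in selected)

# Reduction = |T'| / |T|: fraction of the original suite that still has to run
def size_reduction(selected, original):
    return len(selected) / len(original)
```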

 On a per-test-case basis
◦ Effectiveness = # of test cases in T that reveal a fault in P' but are not in T'

 On a per-test-suite basis (their choice)
◦ Classify the result of a test selection:
 (1) No test case in T reveals the fault, so none in T' does either, or
 (2) Some test cases in both T and T' reveal the fault, or
 (3) Some test cases in T reveal the fault, but none in T' does.
◦ Effectiveness = 1 − (% of selections with no fault-revealing test cases)
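A sketch of the per-test-suite measure, assuming we already know whether T and the selected T' contain fault-revealing tests; this is an illustration, not the authors' tooling.

```python
# Classify one selection into the three outcomes above, then aggregate.
def suite_outcome(reveals_in_T, reveals_in_T_prime):
    # reveals_in_T / reveals_in_T_prime: does T / the selected T' contain a fault-revealing test?
    if not reveals_in_T:
        return 1  # (1) no test in T reveals the fault, so T' cannot either
    if reveals_in_T_prime:
        return 2  # (2) both T and T' contain fault-revealing tests
    return 3      # (3) T reveals the fault but the selected T' does not

def effectiveness(outcomes):
    # 1 - (fraction of selections that dropped every fault-revealing test)
    return 1 - sum(o == 3 for o in outcomes) / len(outcomes)
```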

Programs

 Programs: All C programs

◦ The Siemens Programs: 7 C programs

◦ Space: Interpreter for an array definition language

◦ Player: Subsystem of Empire (Internet game)

 Faulty versions of each program

How do the authors create the test pool and the test suites?
Test Pool Design

 Siemens Programs
◦ Constructing a test pool of black-box test cases from Hutchins et al.
◦ Adding additional white-box test cases

 Space
◦ 10,000 randomly generated test cases from Vokolos and Frankl
◦ Adding new test cases to cover the program's CFG (control-flow graph)

 Player
◦ 5 unique versions of Player, named "base" versions
◦ Creating their own test cases from Empire information files

Test Suite Design
[Diagram: for each program (Siemens/Space: P1…P8; Player: per command — command1, command2, …), test suites are drawn from the program's test pool (TC1, TC2, TC3, …) with a random number generator; coverage-based suites draw from Tp(E), the pool tests that exercise coverage element E]
Test suite size as a fraction of the test pool:
◦ Siemens: 0.06%~19.77%
◦ Space: 0.04%~94.35%
◦ Player: 0.77%~4.55%
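A sketch of the suite construction suggested by the diagram: suites are drawn from each program's test pool with a random number generator, and coverage-based suites draw from Tp(E), the pool tests exercising element E. The helper names and data structures are assumptions.

```python
import random

def random_suite(pool, size, rng=random):
    # Draw a fixed-size test suite from the test pool at random.
    return set(rng.sample(list(pool), size))

def coverage_based_suite(tp, rng=random):
    # tp: coverage element E -> Tp(E), the set of pool tests that exercise E.
    # Pick a test for each still-uncovered element until every element is covered.
    suite, covered = set(), set()
    for element, candidates in tp.items():
        if element in covered or not candidates:
            continue
        test = rng.choice(sorted(candidates))
        suite.add(test)
        covered.update(e for e, tests in tp.items() if test in tests)
    return suite
```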
 Minimization
◦ Created a simulator tool

 Dataflow (only for the Siemens programs)
◦ Simulated a dataflow testing tool
◦ Def-use pairs affected by the modification

 Safe
◦ DejaVu: Rothermel and Harrold's RTS algorithm
 Detects "dangerous edges"
◦ Aristotle: program analysis system

 Random: n% of test cases from T, selected randomly

 Variables
◦ Independent
 9 programs (Siemens, Space, and Player)
 RTS technique (safe, dataflow, minimization, random(25, 50, 75), retest-all)
 Test suite creation criterion
◦ Dependent
 Average reduction in test suite size
 Fault detection effectiveness

 Design
◦ Test suites: 100 coverage-based + 100 random

 Internal
◦ Instrumentation effects can bias results
 They ran each test selection algorithm on each test suite and each subject program

 External
◦ Limited ability to generalize results to industrial practice
 Small size and simple fault patterns of the subject programs
 Only the corrective maintenance process is considered

 Construct
◦ Adequacy of measurement
 The cost and effectiveness measurements are too coarse!

 Comparison 1
◦ Test Size Reduction
◦ Fault Detection Effectiveness

 Comparison 2
◦ Program-Analysis-Based Techniques (minimization, safe, and dataflow)
◦ Random Technique

Results: test suite size reduction
◦ Minimization: always chooses 1 test case
◦ Safe and Dataflow: similar behavior on Siemens
◦ Safe: best on Space and Player
◦ Random techniques: constant percentage of test cases
Results: fault detection effectiveness
◦ Minimization: had the lowest effectiveness
◦ Safe and Dataflow: similar median performance on Siemens
◦ Random techniques: overall effectiveness increased with test suite size
◦ Random techniques: the rate of increase diminished as size increased
 Minimization vs. Random
◦ Assumption: k value = analysis time
◦ Comparison method
 Start from a trial value of k
 Choose a test suite via minimization
 Choose |test suite| + k test cases at random
 Adjust k until the effectiveness is equal
◦ Comparison result
 For coverage-based test suites: k = 2.7
 For random test suites: k = 4.65

 Safe vs. Random
◦ Same assumption about k
◦ Find k so that the random techniques reach a fixed 100(1−p)% fault detection
◦ Comparison results
 Safe: reduction is very high; effectiveness is 100% in general
 Effectiveness ratio, coverage-based suites: k = 0 → 96.7%, k = 0.1 → 99%
 Effectiveness ratio, random suites: k = 0 → 89%, k = 10 → 95%, k = 25 → 99%
 Test suite size ↑: effectiveness ↑, but the rate of increase ↓

 Safe vs. Retest-All
◦ When is Safe desirable?
 When the analysis cost is less than the cost of running the unselected test cases
 Test suite reduction depends on the program
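A sketch of the comparison procedure above: grow k until random suites of size |T'| + k match the minimization suite's fault detection, giving the number of extra test cases the analysis time is worth. The effectiveness function and test data are assumed to be supplied; names are illustrative.

```python
import random

def break_even_k(T, minimized_suite, effectiveness, trials=100, max_k=50, rng=random):
    # Smallest k for which random suites of size |minimized_suite| + k are, on
    # average over `trials` draws, at least as effective as the minimized suite.
    target = effectiveness(minimized_suite)
    for k in range(max_k + 1):
        size = min(len(T), len(minimized_suite) + k)
        avg = sum(effectiveness(set(rng.sample(list(T), size)))
                  for _ in range(trials)) / trials
        if avg >= target:
            return k
    return None  # effectiveness not matched within max_k extra tests
```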
 Minimization
◦ Smallest suite size, but least effective
◦ "On the average" applies to long-run behavior
◦ The number of test cases to choose depends on the available run time

 Safe and Dataflow
◦ Nearly equivalent average behavior in cost-effectiveness
◦ Why is Safe better than Dataflow?
◦ When is dataflow useful?
◦ Better analysis is required for Safe

 Random
◦ Constant percentage of size reduction
◦ As suite size ↑, fault detection effectiveness ↑

 Retest-All
◦ No size reduction, 100% fault detection effectiveness

(1) Improve Cost Model with Other Factors
(2) Extend analysis to Multiple Types of Faults
(3) Develop Time-Series-Based Models
(4) Scalability with More Complex Fault Distribution

Timeline of follow-up work:
◦ 2001: Current paper; regression test selection for Java software [1]
◦ 2002: Test prioritization [2]; cost models with more factors [3], [4]
◦ 2003: Using field data [5], [6]
◦ 2004: Larger software [7]
◦ 2005: Larger and more complex software [8]
◦ 2006: Improved cost model [9]; multiple types of faults [10]
◦ 2007: 2 papers
◦ 2008: 4 papers
[1] Mary Jean Harrold, James A. Jones, Tongyu Li, Donglin Liang, Alessandro Orso, Maikel
Pennings, Saurabh Sinha, Steven Spoon, “Regression Test Selection for Java
Software”, OOPSLA 2001, October 2001.
[2] Jung-Min Kim, Adam Porter, “A history-based test prioritization technique for regression
testing in resource constrained environments”, 24th International Conference on
Software Engineering, May 2002.
[3] A. G. Malishevsky, G. Rothermel, and S. Elbaum, “Modeling the Cost-Benefits Tradeoffs
for Regression Testing Techniques”, Proceedings of the International Conference on
Software Maintenance, October 2002.
[4] S. Elbaum, P. Kallakuri, A. Malishevsky, G. Rothermel, and S. Kanduri, “Understanding
the Effects of Changes on the Cost-Effectiveness of Regression Testing Techniques”,
Technical Report 020701, Department of Computer Science and Engineering,
University of Nebraska-Lincoln, July 2002.
[5] Alessandro Orso, Taweesup Apiwattanapong, Mary Jean Harrold, “Improving Impact
Analysis and Regression Testing Using Field Data”. RAMSS 2003, May 2003.
[6] Taweesup Apiwattanapong, Alessandro Orso, Mary Jean Harrold, “Leveraging Field Data
for Impact Analysis and Regression Testing”, ESEC/FSE 2003, September 2003.
[7] Alessandro Orso, Nanjuan Shi, Mary Jean Harrold, “Scaling Regression Testing to Large
Software Systems”, FSE 2004, November 2004.
[8] J. M. Kim, A. Porter, and G. Rothermel, “An Empirical Study of Regression Test
Application Frequency”, Journal of Software Testing, Verification, and Reliability, V. 15,
no. 4, December 2005, pages 257-279.
[9] H. Do and G. Rothermel, “An Empirical Study of Regression Testing Techniques
Incorporating Context and Lifecycle Factors and Improved Cost-Benefit Models”,
FSE 2006, November 2006.
[10] H. Do and G. Rothermel, “On the Use of Mutation Faults in Empirical Assessments of
Test Case Prioritization Techniques”, IEEE Transactions on Software Engineering, V.
32, No. 9, September 2006, pages 733-752.