
Introduction to Software Testing

Software testing is a vital part of the software lifecycle. To understand its role, it is
instructive to review the definitions of software testing in the literature.

Among alternative definitions of testing are the following:

"... the process of exercising or evaluating a system or system component by manual or


automated means to verify that it satisfies specified requirements or to identify
differences between expected and actual results ..."

(ANSI/IEEE Standard 729, 1983).

"... any activity aimed at evaluating an attribute or capability of a program or system and
determining that it meets its required results. Testing is the measurement of software
quality ..."

(Hetzel, W., The Complete Guide to Software Testing, QED Information Sciences Inc.,
1984).

"... the process of executing a program with the intent of finding errors..."

(Myers, G. J., The Art of Software Testing, Wiley, 1979).

Of course, none of these definitions claims that testing shows that software is free from
defects. Testing can show the presence, but not the absence of problems.

According to Humphrey [1], software testing is defined as 'the execution of a program to find its faults'. Thus, a successful test is one that finds a defect. This sounds simple
enough, but there is much to consider when we want to do software testing. Besides
finding faults, we may also be interested in testing performance, safety, fault-tolerance or
security.

Testing often becomes a question of economics. For large projects, more testing
will usually reveal more bugs. The question then becomes when to stop testing, and what
is an acceptable level of bugs. This is the question of 'good enough software'.

It is important to remember that testing assumes that requirements are already validated.

Basic Methods
White Box Testing
White box testing is performed to reveal problems with the internal structure of a
program. This requires the tester to have detailed knowledge of the internal structure. A
common goal of white-box testing is to ensure a test case exercises every path through a
program. A fundamental strength that all white box testing strategies share is that the
entire software implementation is taken into account during testing, which facilitates
error detection even when the software specification is vague or incomplete. The
effectiveness or thoroughness of white-box testing is commonly expressed in terms of
test or code coverage metrics, which measure the fraction of code exercised by test cases.
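
To make the idea concrete, here is a minimal sketch (the function and inputs are invented for illustration): the tester reads the implementation and picks inputs that drive execution down each path, and a coverage tool can then confirm that all branches were exercised.

    # A hypothetical unit under test; inspecting its code shows two paths.
    def classify_temperature(celsius):
        if celsius >= 100:
            return "boiling"        # path 1
        return "not boiling"        # path 2

    # White-box tests: inputs chosen from the branch condition so that
    # both paths (and hence all statements) are exercised.
    def test_boiling_path():
        assert classify_temperature(100) == "boiling"

    def test_not_boiling_path():
        assert classify_temperature(99) == "not boiling"

    if __name__ == "__main__":
        test_boiling_path()
        test_not_boiling_path()
        print("both paths covered")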

Black Box Testing

Black box tests are performed to assess how well a program meets its requirements,
looking for missing or incorrect functionality. Functional tests typically exercise code
with valid or nearly valid input for which the expected output is known. This includes
concepts such as 'boundary values'.
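
As a small sketch, assume a hypothetical specification that says scores from 0 to 100 are valid and everything else is rejected; a black-box tester derives cases from that statement alone, concentrating on the boundary values. The implementation below is only a stand-in so the example runs.

    # Stand-in implementation; a black-box tester would see only the specification.
    def is_valid_score(score):
        return 0 <= score <= 100

    # Boundary-value cases: just outside, on, and just inside each boundary.
    cases = [(-1, False), (0, True), (1, True),
             (99, True), (100, True), (101, False)]

    for value, expected in cases:
        assert is_valid_score(value) == expected, f"unexpected result for {value}"
    print("all boundary cases behave as specified")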

Performance tests evaluate response time, memory usage, throughput, device utilization,
and execution time. Stress tests push the system to or beyond its specified limits to
evaluate its robustness and error handling capabilities. Reliability tests monitor system
response to representative user input, counting failures over time to measure or certify
reliability.
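
A crude sketch of a performance test follows; the operation, workload, and 50-millisecond budget are all invented for illustration.

    import time

    def handle_request(payload):
        # Stand-in for the operation whose response time is under test.
        return sorted(payload)

    RUNS = 1000
    workload = list(range(500, 0, -1))

    start = time.perf_counter()
    for _ in range(RUNS):
        handle_request(workload)
    average_ms = (time.perf_counter() - start) / RUNS * 1000

    # The performance test fails if the average response time exceeds the budget.
    assert average_ms < 50, f"average response time {average_ms:.2f} ms over budget"
    print(f"average response time: {average_ms:.3f} ms")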

Testing Levels
Different Levels of Test
Testing occurs at every stage of system construction. The larger the piece of code in which
defects are detected, the harder and more expensive they are to find and correct.

The different levels of testing reflect that testing, in the general sense, is not a single
phase of the software lifecycle. It is a set of activities performed throughout the entire
software lifecycle.

In considering testing, most people think of the activities described in figure 1. The
activities after Implementation are normally the only ones associated with testing.
Software testing must be considered before implementation, as is suggested by the input
arrows into the testing activities.
Figure 1: V-Shaped Life Cycle

The following paragraphs describe the testing activities from the 'second half' of the
software lifecycle.

Unit Testing
Unit testing exercises a unit in isolation from the rest of the system. A unit is typically a
function or small collection of functions (libraries, classes), implemented by a single
developer.

The main characteristic that distinguishes a unit is that it is small enough to test
thoroughly, if not exhaustively. Developers are normally responsible for the testing of
their own units and these are normally white box tests. The small size of units allows a
high level of code coverage. It is also easier to locate and remove bugs at this level of
testing.
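
For instance (a sketch with invented names), a developer might test a leap-year helper in isolation with Python's unittest; the unit is small enough that the interesting parts of its input domain can be covered almost exhaustively.

    import unittest

    def is_leap_year(year):
        # The unit under test, written and tested by the same developer.
        return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

    class LeapYearTest(unittest.TestCase):
        def test_representative_years(self):
            for year, expected in [(1996, True), (1900, False),
                                   (2000, True), (2023, False)]:
                self.assertEqual(is_leap_year(year), expected)

    if __name__ == "__main__":
        unittest.main()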

Integration Testing
One of the most difficult aspects of software development is the integration and testing of
large, untested sub-systems. The integrated system frequently fails in significant and
mysterious ways, and the failures are difficult to fix.

Integration testing exercises several units that have been combined to form a module,
subsystem, or system. Integration testing focuses on the interfaces between units, to make
sure the units work together. The nature of this phase is certainly 'white box', as we must have some knowledge of the units to recognize whether we have successfully fused them together into the module.

There are three main approaches to integration testing: top-down, bottom-up and 'big
bang'. Top-down combines, tests, and debugs top-level routines that become the test
'harness' or 'scaffolding' for lower-level units. Bottom-up combines and tests low-level
units into progressively larger modules and subsystems. 'Big bang' testing is,
unfortunately, the prevalent integration test 'method': waiting for all the units to be
complete before trying them out together.

(From [1])

Bottom-up integration

Major features:
• Allows early testing aimed at proving the feasibility and practicality of particular modules.
• Modules can be integrated in various clusters as desired.
• Major emphasis is on module functionality and performance.

Advantages:
• No test stubs are needed.
• It is easier to adjust manpower needs.
• Errors in critical modules are found early.

Disadvantages:
• Test drivers are needed.
• Many modules must be integrated before a working program is available.
• Interface errors are discovered late.

Comments: At any given point, more code has been written and tested than with top-down testing. Some people feel that bottom-up is a more intuitive test philosophy.

Top-down integration

Major features:
• The control program is tested first.
• Modules are integrated one at a time.
• Major emphasis is on interface testing.

Advantages:
• No test drivers are needed.
• The control program plus a few modules forms a basic early prototype.
• Interface errors are discovered early.
• Modular features aid debugging.

Disadvantages:
• Test stubs are needed.
• The extended early phases dictate a slow manpower buildup.
• Errors in critical modules at low levels are found late.

Comments: An early working program raises morale and helps convince management that progress is being made. It is hard to maintain a pure top-down strategy in practice.

Integration tests can rely heavily on stubs or drivers. Stubs stand in for unfinished
subroutines or sub-systems. A stub might consist of a function header with no body, or it
may read and return test data from a file, return hard-coded values, or obtain data from
the tester. Stub creation can be a time-consuming part of the testing effort.
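
As a sketch (all names and prices are invented), a stub for an unfinished pricing subsystem might simply return hard-coded values so that the higher-level module calling it can be integrated and tested:

    # Stub standing in for a pricing subsystem that is not yet implemented.
    def get_price_stub(product_id):
        # Hard-coded test data in place of a real lookup; prices are in cents.
        return {"A100": 999, "B200": 2450}.get(product_id, 0)

    # Higher-level module under test, integrated against the stub.
    def order_total(product_ids, price_lookup=get_price_stub):
        return sum(price_lookup(pid) for pid in product_ids)

    assert order_total(["A100", "B200"]) == 3449
    print("order_total works against the stub")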

The cost of drivers and stubs in the top-down and bottom-up testing methods is what
drives the use of 'big bang' testing. This approach waits for all the modules to be
constructed and tested independently, and when they are finished, they are integrated all
at once. While this approach is very quick, it frequently reveals more defects than the
other methods. These errors have to be fixed and, as we have seen, errors that are found
later take longer to fix. In addition, as with bottom-up integration, there is really nothing
that can be demonstrated until late in the process.

External Function Testing

The 'external function test' is a black box test to verify the system correctly implements
specified functions. This phase is sometimes known as an alpha test. Testers will run tests
that they believe reflect the end use of the system.

System Testing
The 'system test' is a more robust version of the external test, and can be known as an
alpha test. The essential difference between 'system' and 'external function' testing is the
test platform. In system testing, the platform must be as close as possible to the customer's
production environment, including factors such as hardware setup and database size and
complexity. By replicating the target environment, we can more accurately test 'softer'
system features (performance, security and fault-tolerance).

Because of the similarities between the test suites in the external function and system test
phases, a project may leave one of them out. It may be too expensive to replicate the user
environment for the system test, or we may not have enough time to run both.

Acceptance Testing
An acceptance (or beta) test is an exercise of a completed system by a group of end users
to determine whether the system is ready for deployment. Here the system will receive
more realistic testing than in the 'system test' phase, as the users have a better idea of how
the system will be used than the system testers.

Regression Testing
Regression testing is an expensive but necessary activity performed on modified software
to provide confidence that changes are correct and do not adversely affect other system
components. Four things can happen when a developer attempts to fix a bug. Three of
these things are bad, and one is good:

                      New Bug   No New Bug
Successful Change     Bad       Good
Unsuccessful Change   Bad       Bad

Because of the high probability that one of the bad outcomes will result from a change to
the system, it is necessary to do regression testing.

It can be difficult to determine how much re-testing is needed, especially near the end of
the development cycle. Most industrial testing is done via test suites: automated sets of
procedures designed to exercise all parts of a program and to show defects. While the
original suite could be used to test the modified software, this might be very time-
consuming. A regression test selection technique chooses, from an existing test set, the
tests that are deemed necessary to validate modified software.

There are three main groups of test selection approaches in use:

• Minimization approaches seek to satisfy structural coverage criteria by identifying a minimal set of tests that must be rerun.
• Coverage approaches are also based on coverage criteria, but do not require minimization of the test set. Instead, they seek to select all tests that exercise changed or affected program components (see the sketch below).
• Safe approaches attempt instead to select every test that could cause the modified program to produce different output than the original program.
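
A very small sketch of a coverage-based selection technique follows; the coverage map, component names, and test names are all invented. Given a record of which components each existing test exercises, only the tests touching a changed component are selected for re-execution.

    # Hypothetical coverage map: test name -> program components it exercises.
    coverage_map = {
        "test_login":    {"auth", "session"},
        "test_report":   {"report", "db"},
        "test_checkout": {"cart", "db", "auth"},
    }

    def select_regression_tests(changed_components):
        """Select every test that exercises at least one changed component."""
        return sorted(test for test, covered in coverage_map.items()
                      if covered & changed_components)

    # Only the tests touching the modified 'auth' component are re-run.
    print(select_regression_tests({"auth"}))  # ['test_checkout', 'test_login']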

An interesting approach to limiting test cases is based on whether we can confine testing
to the "vicinity" of the change. (For example, if I put a new radio in my car, do I have to do
a complete road test to make sure the change was successful?) A newer breed of regression
test theory tries to identify, through program-flow analysis or reverse engineering, where
boundaries can be placed around modules and subsystems. The resulting graphs can be used
to determine which tests from the existing suite may exhibit changed behavior on the new version.

Regression testing has been receiving more attention as corporations focus on fixing the
'Year 2000 Bug'. The goal of most Y2K work is to correct the date-handling portions of a
system without changing any other behavior. A new 'Y2K' version of the system is
compared against a baseline original system. With the obvious exception of date formats,
the behavior of the two versions should be identical. This means not only do they do
the same things correctly, they also do the same things incorrectly. A non-Y2K bug in the
original software should not have been fixed by the Y2K work.

A frequently asked question about regression testing is 'The developer says this problem
is fixed. Why do I need to re-test?', to which the answer is 'The same person probably told
you it worked in the first place'.

Installation Testing
The testing of full, partial, or upgrade install/uninstall processes.

Completion Criteria
There are a number of different ways to determine that the test phase of the software
lifecycle is complete. Some common examples are:

• All black-box test cases are run
• White-box test coverage targets are met
• Rate of fault discovery goes below a target value
• A target percentage of all faults in the system is found
• Measured reliability of the system achieves its target value (mean time to failure)
• Test phase time or resources are exhausted

When we begin to talk about completion criteria, we move naturally into a discussion of
software testing metrics.

Metrics
Goals
As stated above, the major goal of testing is to discover errors in the software. A
secondary goal is to build confidence that the system will work without error when
testing does not reveal any errors. Then what does it mean when testing does not detect
any errors? We can say that either the software is high quality or the testing process is
low quality. We need metrics on our testing process if we are to tell which is the right
answer.

As with all domains of the software process, there is a host of metrics that can be used in
testing. Rather than discuss the merits of specific measurements, it is more important to
know what they are trying to achieve.

Three themes prevail:

• Quality Assessment (What percentage of defects is captured by our testing process? How many remain?)
• Risk Management (What is the risk related to the remaining defects?)
• Test Process Improvement (How long does our testing process take?)

Quality Assessment
An important question in the testing process is "when should we stop?" The answer is
when system reliability is acceptable or when the gain in reliability cannot compensate
for the testing cost. To answer either of these concerns we need a measurement of the
quality of the system.

The most commonly used means of measuring system quality is defect density. Defect
density is represented by:

# of Defects / System Size

where system size is usually expressed in thousands of lines of code (KLOC). Although
defect density is a useful indicator of quality when used consistently within an organization,
there are a number of well-documented problems with this metric. The most commonly cited
relate to inconsistent definitions of defects and of system size.

Defect density accounts only for defects that are found in-house or over a given amount
of operational field use. Other metrics attempt to estimate how many defects remain
undetected. A simplistic case of error estimation is based on 'error seeding'. We assume
the system has X errors. It is artificially seeded with S additional errors. After testing,
we have discovered Tr 'real' errors and Ts seeded errors. If we assume (a questionable
assumption) that the testers find the same percentage of seeded errors as real errors, we
can calculate X:

• S / (X + S) = Ts / (Tr + Ts)
• X = S * ((Tr + Ts) / Ts - 1), which simplifies to X = S * (Tr / Ts)

For example, if we find half the seeded errors, then the number of 'real' defects found
represents half of the total defects in the system.
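
The estimate is easy to automate; the following sketch (function name invented) applies the formula and reproduces the 'half the seeded errors' case:

    def estimate_real_errors(seeded, found_seeded, found_real):
        """Estimate X, the total number of real errors, assuming real and
        seeded errors are found in the same proportion: X = S * Tr / Ts."""
        if found_seeded == 0:
            raise ValueError("no seeded errors found; the estimate is undefined")
        return seeded * found_real / found_seeded

    # 50 errors seeded, 25 of them found (half), alongside 40 real errors found:
    # the estimate is 80 real errors in total, so about 40 remain undetected.
    print(estimate_real_errors(seeded=50, found_seeded=25, found_real=40))  # 80.0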

Estimating the number and severity of undetected defects allows informed decisions on
whether the quality is acceptable or whether additional testing would be cost-effective. It is
very important to consider maintenance costs and redevelopment effort when deciding on
the value of additional testing.

Risk Management
Metrics involved in risk management measure how important a particular defect is (or
could be). These measurements allow us to prioritize our testing and repair cycles. A
truism is that there is never enough time or resources for complete testing, making
prioritization a necessity.

One approach is known as Risk Driven Testing, where 'Risk' has a specific meaning. The
failure of each component is rated by Impact and Likelihood. Impact is a severity rating,
based on what would happen if the component malfunctioned. Likelihood is an estimate
of how probable it is that the component would fail. Together, Impact and Likelihood
determine the Risk for the piece.

Obviously, higher ratings on each scale correspond to higher overall risk from defects in
the component. With a rating scale, this might be represented visually:

[Risk matrix: Impact rating (1 to 3) on the vertical axis plotted against Likelihood rating (1 to 4) on the horizontal axis.]

The relative importance of likelihood and impact will vary from project to project and
company to company.
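
As a minimal sketch (component names and ratings are invented), risk-driven prioritization can be as simple as ordering components by the product of their Impact and Likelihood ratings and testing the riskiest first:

    # Hypothetical ratings: component -> (impact 1-3, likelihood 1-4).
    components = {
        "payment processing": (3, 2),
        "report layout":      (1, 4),
        "user login":         (3, 3),
    }

    # Risk score = impact * likelihood; highest-risk components are tested first.
    by_risk = sorted(components.items(),
                     key=lambda item: item[1][0] * item[1][1],
                     reverse=True)

    for name, (impact, likelihood) in by_risk:
        print(f"{name}: risk = {impact * likelihood}")
    # user login: risk = 9, payment processing: risk = 6, report layout: risk = 4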

A system level measurement for risk management is the Mean Time To Failure (MTTF).
Test data sampled from realistic beta testing is used to find the average time until system
failure. This data is extrapolated to predict overall uptime and the expected time the
system will be operational. Sometimes measured with MTTF is Mean Time To Repair
(MTTR). This represents the expected time until the system will be repaired and back in
use after a failure is observed. Availability, obtained by calculating MTTF / (MTTF +
MTTR), is the probability that a system is available when needed. While these are
reasonable measures for assessing quality, they are more often used to assess the risk
(financial or otherwise) that a failure poses to a customer or in turn to the system supplier.
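
For example (the figures are invented), the availability formula can be applied directly:

    # Hypothetical measurements extrapolated from beta testing.
    mttf_hours = 950.0  # mean time to failure
    mttr_hours = 50.0   # mean time to repair

    availability = mttf_hours / (mttf_hours + mttr_hours)
    print(f"availability: {availability:.1%}")  # 95.0%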

Process Improvement
It is generally accepted that to achieve improvement you need a measure against which to
gauge performance. To improve our testing processes, we need the ability to compare the
results from one process to another.

Popular measures of the testing process report the following (see the sketch below for a worked example):

• Effectiveness: number of defects found and successfully removed / number of defects presented
• Efficiency: number of defects found in a given time
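
A small sketch (all counts invented) of how the two measures might be computed from defect-tracking data:

    # Hypothetical counts taken from a defect-tracking system.
    defects_presented = 120         # defects believed present when testing began
    defects_found_and_removed = 96
    testing_days = 30

    effectiveness = defects_found_and_removed / defects_presented
    efficiency = defects_found_and_removed / testing_days  # defects found per day

    print(f"effectiveness: {effectiveness:.0%}")         # 80%
    print(f"efficiency: {efficiency:.1f} defects/day")   # 3.2 defects/day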

It is also important to consider system failures reported in the field by customers. If a
high percentage of customer-reported defects were not revealed in-house, it is a
significant indicator that the testing process is incomplete.

A good defect reporting structure will allow defect types and origins to be identified. We
can use this information to improve the testing process by altering and adding test
activities to improve our chances of finding the defects that are currently escaping
detection. By tracking our test efficiency and effectiveness, we can evaluate the changes
made to the testing process.

Testing metrics give us an idea of how reliable our testing process has been at finding
defects, and can be a reasonable indicator of its performance in the future. It must be
remembered that measurement is not the goal; improvement through measurement,
analysis and feedback is what is needed.
