
Functional testing

From Wikipedia, the free encyclopedia

Functional testing is a type of black box testing that bases its test cases on the specifications of the software component under test. Functions are tested by feeding them input and examining the output; internal program structure is rarely considered (unlike white-box testing).[1]

Functional testing differs from system testing in that functional testing "verif[ies] a program by checking it against ... design document(s) or specification(s)", while system testing "validate[s] a program by checking it against the published user or system requirements" (Kaner, Falk, Nguyen 1999, p. 52).

Functional testing typically involves five steps:[citation needed]

1. The identification of functions that the software is expected to perform

2. The creation of input data based on the function's specifications

3. The determination of output based on the function's specifications

4. The execution of the test case

5. The comparison of actual and expected outputs
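As a minimal illustration of steps 2 through 5, the sketch below uses a hypothetical specification ("orders of 100 or more receive a 10% discount") and a hypothetical discount function; the inputs and expected outputs are derived from the specification, and actual outputs are compared against them.

```python
# Hypothetical specification: "orders of 100 or more receive a 10% discount".
# The implementation below exists only to make the example runnable; in black
# box testing, the test cases are derived from the specification alone.
def discount(total):
    return total * 0.9 if total >= 100 else total

# Steps 2 and 3: input data and expected output, both based on the specification.
test_cases = [
    (50, 50),      # below the threshold: no discount expected
    (100, 90),     # boundary value named in the specification
    (200, 180),    # typical discounted order
]

# Steps 4 and 5: execute each case and compare actual with expected output.
for given_input, expected in test_cases:
    actual = discount(given_input)
    assert actual == expected, f"discount({given_input}) = {actual}, expected {expected}"
```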

See also

Non-functional testing

Acceptance testing

Regression testing

System testing

Software testing

Integration testing

Unit testing

Database testing

References

1. ^ Kaner, Falk, Nguyen. Testing Computer Software. Wiley Computer Publishing, 1999, p. 42. ISBN 0-471-35846-0.

External links

JTAG for Functional Test without Boundary-scan (http://www.corelis.com/blog/index.php/blog/2011/01/10/jtag-for-functional-test-without-boundary-scan)

Retrieved from "http://en.wikipedia.org/w/index.php?title=Functional_testing&oldid=510357783"

Categories: Software testing

Computing stubs



Functionality Testing

What is Functionality Testing?

Functionality testing is employed to verify whether your product meets the intended specifications and functional requirements laid out in your development documentation.

What is the purpose of Functionality Testing?

As competition in the software and hardware development arena intensifies, it becomes critical to deliver products that are virtually bug-free. Functionality testing helps your company deliver products with a minimum amount of issues to an increasingly sophisticated pool of end users. Potential purchasers of your products may find honest and often brutal product reviews online from consumers and professionals, which might deter them from buying your software. nResult will help ensure that your product functions as intended, keeping your service and support calls to a minimum. Let our trained professionals find functional issues and bugs before your end users do!

How can nResult help you deliver high quality products that are functionally superior to products offered by your competition?

We offer several types of functional testing techniques:

Ad Hoc – Takes advantage of individual testing talents based upon product goals, level of user capabilities, and possible areas and features that may create confusion. The tester will generate test cases quickly, on the spur of the moment.

Exploratory – The tester designs and executes tests while learning the product. Test design is organized by a set of concise patterns designed to assure that testers don’t miss anything of importance.

Combination – The tester performs a sequence of events using different paths to complete tasks. This can uncover bugs related to the order of events that are difficult to find using other methods.

Scripted – The tester uses a test script that lays out the specific functions to be tested. A test script can be provided by the customer/developer or constructed by nResult, depending on the needs of your organization.

Let nResult ensure that your hardware or software will function as intended. Our team will check for any anomalies or bugs in your product, through any or all stages of development, to help increase your confidence level in the product you are delivering to market. nResult offers detailed, reasonably priced solutions to meet your testing needs.

Services offered by nResult:

Accessibility Testing – With accessibility testing, nResult ensures that your software or hardware product is accessible and effective for those with disabilities.

Compatibility Testing – Make sure your software applications and hardware devices function correctly with all relevant operating systems and computing environments.

Interoperability Testing – Make sure your software applications and hardware devices function correctly with all other products in the market.

Competitive Analysis – Stack up next to your competitors with a full competitive analysis report.

Performance Testing – Ensure that your software/web application or website is equipped to handle anticipated and increased network traffic with adequate performance testing. Performance Testing includes Load Testing and Benchmarking.

Localization Testing – Make certain that your localized product blends flawlessly with the native language and culture.

Medical Device Testing – nResult provides solutions for complying with challenging and expensive testing requirements for your medical device.

Web Application Testing – Find and eliminate weaknesses in your website’s usability, functionality, performance, and browser compatibilities.

Certification Testing – Add instant credibility to your product from one of the most trusted names in testing.

Security Testing – Test your product for common security vulnerabilities; gain peace of mind in an insecure world.

Introduction to Performance Testing


First Presented for:

PSQT/PSTT Conference Washington, DC May, 2003

Scott Barber Chief Technology Officer PerfTestPlus, Inc.

www.PerfTestPlus.com

© 2006 PerfTestPlus, Inc. All rights reserved.

Agenda


Why Performance Test?
What is Performance related testing?
Intro to Performance Engineering Methodology
Where to go for more info
Summary / Q&A


Why Performance Test?


Speed - Does the application respond quickly enough for the intended users?

Scalability – Will the application handle the expected user load and beyond? (AKA Capacity)

Stability – Is the application stable under expected and unexpected user loads? (AKA Robustness)

Confidence – Are you sure that users will have a positive experience on go-live day?


Speed


User Expectations

– Experience

– Psychology

– Usage

System Constraints

– Hardware

– Network

– Software

Costs

– Speed can be expensive!


Scalability


How many users…

– before it gets “slow”?

– before it stops working?

– will it sustain?

– do I expect today?

– do I expect before the next upgrade?

How much data can it hold?

– Database capacity

– File Server capacity

– Back-up Server capacity

– Data growth rates


Stability


What happens if…

– there are more users than we expect?

– all the users do the same thing?

– a user gets disconnected?

– there is a Denial of Service Attack?

– the web server goes down?

– we get too many orders for the same thing?


Confidence


If you know what the performance is…

– you can assess risk.

– you can make informed decisions.

– you can plan for the future.

– you can sleep the night before go-live day.

The peace of mind that it will work on go-live day alone justifies the cost of performance testing.


What is Performance Related Testing?


[Slide diagram comparing and contrasting Performance Validation, Performance Testing, and Performance Engineering in terms of Detect (What?), Diagnose (Why?), Resolve, and Not Resolved.]


Performance Validation


“Performance validation is the process by which software is tested with the intent of determining if the software meets pre-existing performance requirements. This process aims to evaluate compliance.”

Primarily used for…

– determining SLA compliance.

– IV&V (Independent Validation and Verification).

– validating subsequent builds/releases.


Performance Testing


“Performance testing is the process by which software is tested to determine the current system performance. This process aims to gather information about current performance, but places no value judgments on the findings.”

Primarily used for…

– determining capacity of existing systems.

– creating benchmarks for future systems.

– evaluating degradation with various loads and/or configurations.


Performance Engineering


“Performance engineering is the process by which software is tested and tuned with the intent of realizing the required performance. This process aims to optimize the most important application performance trait, user experience.”

Primarily used for…

– new systems with pre-determined requirements.

– extending the capacity of old systems.

– “fixing” systems that are not meeting requirements/SLAs.


Compare and Contrast


Validation and Testing:

– Are a subset of Engineering.

– Are essentially the same except:

• Validation usually focuses on a single scenario and tests against pre-determined standards.

• Testing normally focuses on multiple scenarios with no pre- determined standards.

– Are generally not iterative.

– May be conducted separately from software development.

– Have clear end points.


Compare and Contrast


Engineering:

– Is iterative.

– Has clear goals, but ‘fuzzy’ end points.

– Includes the effort of tuning the application.

– Focuses on multiple scenarios with pre-determined standards.

– Heavily involves the development team.

– Occurs concurrently with software development.


Intro to PE Methodology


Evaluate System
Develop Test Assets
Baselines and Benchmarks
Analyze Results
Tune
Identify Exploratory Tests
Execute Scheduled Tests
Complete Engagement


Evaluate System


Determine performance requirements.
Identify expected and unexpected user activity.
Determine test and/or production architecture.
Identify non-user-initiated (batch) processes.
Identify potential user environments.
Define expected behavior during unexpected circumstances.


Develop Test Assets


Create Strategy Document.
Develop Risk Mitigation Plan.
Develop Test Data.
Automated test scripts:
– Plan
– Create
– Validate


Baseline and Benchmarks


Most important for iterative testing.

Baseline (single user) for initial basis of comparison and ‘best case’.

Benchmark (15-25% of expected user load) determines actual state at loads expected to meet requirements.
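A minimal sketch of how these two runs might be parameterized and summarized, assuming a hypothetical run_load_test(users) driver that returns a list of response-time samples; the expected user count and the 20% figure are illustrative only.

```python
import statistics

def summarize(samples):
    """Response-time summary (in seconds) for one test run."""
    s = sorted(samples)
    return {
        "median": statistics.median(s),
        "p90": s[int(0.9 * (len(s) - 1))],
        "max": s[-1],
    }

EXPECTED_USERS = 500                          # hypothetical expected production load
BASELINE_USERS = 1                            # single user: initial 'best case' comparison
BENCHMARK_USERS = int(0.20 * EXPECTED_USERS)  # within the 15-25% guideline above

# run_load_test(users) is a hypothetical driver returning response-time samples:
#   baseline = summarize(run_load_test(BASELINE_USERS))
#   benchmark = summarize(run_load_test(BENCHMARK_USERS))
# Comparing the two shows how far behavior at benchmark load drifts from best case.
```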


Analyze Results


Most important. Most difficult. Focuses on:

– Have the performance criteria been met?

– What are the bottlenecks?

– Who is responsible to fix those bottlenecks?

– Decisions.


Tune


Engineering only. Highly collaborative with development team. Highly iterative. Usually, performance engineer ‘supports’ and ‘validates’ while developers/admins ‘tune’.


Identify Exploratory Tests


Engineering only. Exploits known bottleneck. Assists with analysis & tuning. Significant collaboration with ‘tuners’. Not robust tests – quick and dirty, not often reusable/relevant after tuning is complete.


Execute Scheduled Tests


Only after Baseline and/or Benchmark tests. These tests evaluate compliance with documented requirements. Often are conducted on multiple hardware/configuration variations.


Complete Engagement


Document:

– Actual Results

– Tuning Summary

– Known bottlenecks not tuned

– Other supporting information

– Recommendation


Package Test Assets:

– Scripts

– Documents

– Test data


Where to go for more information


http://www.PerfTestPlus.com (My site)
http://www.QAForums.com (Huge QA Forum)
http://www.loadtester.com (Good articles and links)
http://www.keynote.com/resources/resource_library.html (Good articles and statistics)


Summary


We test performance to:

– Evaluate Risk.

– Determine system capabilities.

– Determine compliance.

Performance Engineering Methodology:

– Ensures goals are accomplished.

– Defines tasks.

– Identifies critical decision points.

– Shortens testing lifecycle.


Questions and Contact Information


Scott Barber Chief Technology Officer PerfTestPlus, Inc.


E-mail: sbarber@perftestplus.com
Web Site: www.PerfTestPlus.com


Software Performance Testing

Xiang Gan

Helsinki 26.09.2006

Seminar paper

University of Helsinki

Department of Computer Science

HELSINGIN YLIOPISTO – HELSINGFORS UNIVERSITET – UNIVERSITY OF HELSINKI

Faculty/Section: Faculty of Science
Department: Department of Computer Science
Author: Xiang Gan
Title: Software performance testing
Month and year: 26.9.2006
Number of pages: 9

Abstract: Performance is one of the most important aspects concerned with the quality of software. It indicates how well a software system or component meets its requirements for timeliness. Till now, however, no significant progress has been made on software performance testing. This paper introduces two software performance testing approaches which are named workload characterization and early performance testing with distributed application, respectively.

ACM Computing Classification System (CCS): A.1 [Introductory and Survey], D.2.5 [Testing and Debugging]

Keywords: software performance testing, performance, workload, distributed application


Contents

1 Introduction
2 Workload characterization approach
2.1 Requirements and specifications in performance testing
2.2 Characterizing the workload
2.3 Developing performance test cases
3 Early performance testing with distributed application
3.1 Early testing of performance
3.1.1 Selecting performance use-cases
3.1.2 Mapping use-cases to middleware
3.1.3 Generating stubs
3.1.4 Executing the test
4 Conclusion
References

1 Introduction

Although the functionality supported by a software system is obviously important, it is usually not the only concern. Individual users and society as a whole may face significant breakdowns and incur high costs if the system cannot meet the quality-of-service requirements for non-functional aspects, for instance performance, availability, security and maintainability, that are expected from it.

Performance is an indicator of how well a software system or component meets its requirements for timeliness. There are two important dimensions to software performance timeliness: responsiveness and scalability [SmW02]. Responsiveness is the ability of a system to meet its objectives for response time or throughput. The response time is the time required to respond to stimuli (events). The throughput of a system is the number of events processed in some interval of time [BCK03]. Scalability is the ability of a system to continue to meet its response time or throughput objectives as the demand for the software function increases [SmW02].
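To make these two measures concrete, the following sketch (not taken from the paper) computes the mean response time and the throughput from a hypothetical list of event start and end timestamps.

```python
def summarize_timeliness(events):
    """events: hypothetical list of (start_s, end_s) timestamps of processed requests."""
    response_times = [end - start for start, end in events]   # time to respond to each stimulus
    window = max(end for _, end in events) - min(start for start, _ in events)
    throughput = len(events) / window                          # events processed per unit of time
    mean_response = sum(response_times) / len(response_times)
    return mean_response, throughput

# Example: three requests observed over roughly four seconds.
print(summarize_timeliness([(0.0, 0.4), (1.0, 1.7), (3.5, 4.1)]))
```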

As Weyuker and Vokolos argued [WeV00], the primary problems that projects report after field release are usually not system crashes or incorrect system responses, but rather system performance degradation or problems handling the required system throughput. On inquiry, it often turns out that although the software system has gone through extensive functionality testing, it was never really tested to assess its expected performance. They also found that performance failures can be roughly classified into the following three categories:

– the lack of performance estimates,
– the failure to have proposed plans for data collection,
– the lack of a performance budget.

This seminar paper concentrates on introducing two software performance testing approaches. Section 2 introduces a workload characterization approach, which requires a careful collection of data over significant periods of time in the production environment. In addition, the importance of clear performance requirements written in requirement and specification documents is emphasized, since these are the fundamental basis for carrying out performance testing. Section 3 focuses on an approach for testing the performance of distributed software applications as early as possible in the software engineering process, since fixing performance problems at the end of the whole process imposes a large overhead on the development team. Even worse, it may be impossible to fix some performance problems without sweeping redesign and re-implementation, which can consume substantial time and money. Section 4 concludes the paper.


2 Workload characterization approach

As indicated in [AvW04], one of the key objectives of performance testing is to uncover problems that are revealed when the system is run under specific workloads. This is sometimes referred to in the software engineering literature as an operational profile [Mus93]. An operational profile is a probability distribution describing the frequency with which selected important operations are exercised. It describes how the system has historically been used in the field and thus how it is likely to be used in the future. To this end, performance requirements are one of the necessary prerequisites for determining whether software performance testing has been conducted in a meaningful way.

2.1 Requirements and specifications in performance testing

Performance requirements must be provided in a concrete, verifiable manner [VoW98]. They should be explicitly included in a requirements or specification document and might be provided in terms of throughput or response time; they might also include system availability requirements.

One of the most serious problems with performance testing is making sure that the stated requirements can actually be checked to see whether or not they are fulfilled [WeV00]. For instance, in functional testing it is useless to choose inputs for which it is impossible to determine whether or not the output is correct. The same applies to performance testing. It is important to write requirements that are meaningful for the purpose of performance testing. It is quite easy to write a performance requirement for an ATM such as: one customer can finish a single transaction of withdrawing money from the machine in less than 25 seconds. It might then be possible to show that the time used in most of the test cases is less than 25 seconds, while it fails in only one test case. Such a situation, however, cannot guarantee that the requirement has been satisfied. A more plausible performance requirement should state that the time used for such a single transaction is less than 25 seconds when the server at the host bank is run with an average workload. Assuming that a benchmark has been established which accurately reflects the average workload, it is then possible to test whether this requirement has been satisfied or not.
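As an illustration of how such a requirement could be made checkable, the sketch below samples repeated transactions while an average-workload benchmark is driven against the host bank server; the atm_client and start_average_workload helpers are hypothetical, and the 25-second limit comes from the example above.

```python
import time

def measure_withdrawal(atm_client):
    """Time one withdrawal transaction; atm_client is a hypothetical test client."""
    start = time.perf_counter()
    atm_client.withdraw(amount=100)
    return time.perf_counter() - start

def check_requirement(atm_client, start_average_workload, samples=50, limit_s=25.0):
    """Sample transactions while the average-workload benchmark runs on the server."""
    stop = start_average_workload()            # hypothetical: starts the benchmark workload
    try:
        times = [measure_withdrawal(atm_client) for _ in range(samples)]
    finally:
        stop()                                 # always stop the benchmark workload
    # The requirement is stated for the average-workload condition, so every
    # sampled transaction must stay under the 25-second limit.
    return all(t <= limit_s for t in times), times
```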

2.2 Characterizing the workload

In order to do the workload characterization, it is necessary to collect data for significant periods of time in the production environment. This helps characterize the system workload, and these representative workloads can then be used to determine what the system performance will look like when it is run in production on significantly large workloads.


The workload characterization approach described by Alberto Avritzer and Joe Kondek [AKL02] comprises two steps, illustrated as follows.

The first step is to model the software system. Since most industrial software systems are usually too complex to handle all the possible characteristics, modeling is necessary. The goal of this step is thus to establish a simplified version of the system in which the key parameters have been identified. It is essential that the model be as close to the real system as possible, so that the data collected from it will realistically reflect the true system’s behavior. Meanwhile, it shall be simple enough that it is feasible to collect the necessary data.

The second step, after the system has been modeled and key parameters identified, is to collect data while the system is in operation. According to the paper [AKL02], this activity should usually be done for periods of two to twelve months. Following that, the data must be analyzed and a probability distribution determined. Although the input space is, in theory, enormous, the frequency distribution is highly non-uniform, and experience has shown that a relatively small number of inputs actually occur during the period of data collection. The paper [AKL02] showed that it is quite common for only several thousand inputs to correspond to more than 99% of the probability mass associated with the input space. This means that a very accurate picture of the performance that the user of the system tends to see in the field can be drawn by testing only this relatively small number of inputs.
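A minimal sketch of this analysis step, assuming the collected data has already been reduced to a sequence of operation names (all names are hypothetical): it estimates the operational profile and selects the small set of inputs that carries most of the probability mass.

```python
from collections import Counter

def operational_profile(operation_log):
    """Estimate the operational profile, i.e. P(operation), from observed frequencies.

    operation_log is a hypothetical iterable of operation names collected during
    the data-collection period (for example, parsed from production logs).
    """
    counts = Counter(operation_log)
    total = sum(counts.values())
    # most_common() keeps the resulting dict sorted by decreasing probability.
    return {op: n / total for op, n in counts.most_common()}

def dominant_inputs(profile, mass=0.99):
    """Return the smallest prefix of operations covering `mass` of the probability."""
    selected, covered = [], 0.0
    for op, p in profile.items():
        selected.append(op)
        covered += p
        if covered >= mass:
            break
    return selected

# Example with made-up data: 'withdraw' dominates the observed workload.
log = ["withdraw"] * 950 + ["deposit"] * 40 + ["transfer"] * 10
print(dominant_inputs(operational_profile(log)))
```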

2.3 Developing performance test cases

After performing the workload characterization and determining which system characteristics require data collection, that information is used to design performance test cases that reflect field production usage of the system. The following prescriptions were defined by Weyuker and Vokolos [WeV00]. One of the most interesting points in this list of prescriptions is that they also defined how to design performance test cases when detailed historical data is unavailable. Their situation at the time was that a new platform had been purchased but was not yet available, while the software had already been designed and written explicitly for the new hardware platform. The goal of such work is to determine whether there are likely to be performance problems once the hardware is delivered and the software is installed and running with the real customer base.

Typical steps to form performance test cases are as follows:

– identify the software processes that directly influence the overall performance of the system,
– for each process, determine the input parameters that will most significantly influence the performance of the system. It is important to limit the parameters to the essential ones so that the set of test cases selected will be of manageable size,
– determine realistic values for these parameters by collecting and analyzing existing usage data. These values should reflect desired usage scenarios, including both average and heavy workloads,
– if there are parameters for which historical usage data are not available, then estimate reasonable values based on such things as the requirements used to develop the system or experience gathered by using an earlier version of the system or similar systems,
– if, for a given parameter, the estimated values form a range, then select representative values from within this range that are likely to reveal useful information about the performance behavior of the system. Each selected value should then form a separate test case.

It is, however, important to recognize that this list cannot be treated as a precise recipe for test cases, since every system is different.
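One simple way to turn such representative parameter values into concrete test cases is to enumerate their combinations, as sketched below; the parameter names and values are purely illustrative, and the cross-product combination is a simplification rather than a prescription from [WeV00].

```python
from itertools import product

# Hypothetical representative values for one performance-critical process
# (parameter names and values are illustrative only).
PARAMETERS = {
    "concurrent_users": [50, 200, 500],               # average, heavy, peak workloads
    "transaction_mix": ["read_heavy", "write_heavy"],
    "payload_kb": [1, 64],
}

def generate_test_cases(parameters):
    """Yield one performance test case per combination of representative values."""
    names = list(parameters)
    for values in product(*(parameters[name] for name in names)):
        yield dict(zip(names, values))

for case in generate_test_cases(PARAMETERS):
    print(case)
```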

3 Early performance testing with distributed application

Testing techniques are usually applied towards the end of a project. However, most researchers and practitioners agree that the most critical performance problems depend upon decisions made in the very early stages of the development life cycle, such as architectural choices. Although iterative and incremental development has been widely promoted, the situation with regard to testing techniques has not changed much.

With the increasing advances in distributed component technologies, such as J2EE and CORBA, distributed systems are no longer built from scratch [DPE04]. Modern distributed systems are often built on top of middleware. As a result, when the architecture is defined, a certain part of the implementation of a class of distributed applications is already available. It was therefore argued that this enables performance testing to be successfully applied at such early stages.

The method proposed by Denaro, Polini and Emmerich [DPE04] is based upon the observation that the middleware used to build a distributed application often determines the overall performance of the application. However, they also noted that it is the coupling between the middleware and the application architecture that determines the actual performance; the same middleware may perform quite differently in the context of different applications. Based on this observation, architecture designs were proposed as a tool to derive application-specific performance test cases, which can be executed on the middleware platform that is available early and on which a distributed application will be built. This allows performance measurements to be made at a very early stage of the development process.


3.1 Early testing of performance

The approach for early performance testing of distributed component-based applications consists of four phases [DPE04]:

– selection of the use-case scenarios relevant to performance, given a set of architecture designs,
– mapping of the selected use-cases to the actual deployment technology and platform,
– creation of stubs of components that are not available in the early stages of the development, but are needed to implement the use-cases, and
– execution of the test.

The detailed contents in each phase are discussed in the following sub-sections.

3.1.1 Selecting performance use-cases

First of all, the design of functional test cases is entirely different from that of performance test cases, as already indicated in the previous section. For performance testing of distributed applications, however, the main parameters involved are much more complicated than those described before. Table 1 is excerpted from the paper [DPE04] to illustrate this point.


Table 1: Performance parameters [DPE04].

Apart from traditional concerns about workloads and physical resources, consideration of the middleware configuration is also highlighted in this table (in this case, it describes J2EE-based middleware). The last row of the table classifies the relevant interactions in distributed settings according to the place where they occur. This taxonomy is far from complete; however, it was believed that such a taxonomy of distributed interactions is key to using this approach. The next step is the definition of appropriate metrics to evaluate the performance relevance of the available use-cases according to the interactions that they trigger.

3.1.2 Mapping use-cases to middleware

At the early stages of the development process, software architecture is generally defined at a very abstract level. It usually just describes the business logic and abstracts away many details of deployment platforms and technologies. From this point, it is necessary to understand how abstract use-cases are mapped to possible deployment technologies and platforms.

To facilitate the mapping from abstract use-cases to concrete instances, software connectors might be a feasible solution, as indicated in [DPE04]. Software connectors mediate interactions among components; that is, they establish the rules that govern component interaction and specify any auxiliary mechanisms required [MMP00]. According to the paper [MMP00], four major categories of connectors were identified based on the services provided to interacting components: communication, coordination, conversion, and facilitation. In addition, major connector types were also identified: procedure call, data access, linkage, stream, event, arbitrator, adaptor, and distributor. Each connector type supports one or more interaction services. The architecturally relevant details of each connector type are captured by dimensions and, possibly, sub-dimensions. A dimension consists of a set of values. Connector species are created by choosing the appropriate dimensions and values for those dimensions from connector types. Figure 1 depicts the software connector classification framework, which provides a more descriptive illustration of the whole structure.

As a particular element of software architecture, software connectors were studied to investigate the possibility of defining systematic mappings between architectures and middleware. Well-characterized software connectors may be associated with deployment topologies that preserve the properties of the original architecture [DPE04]. As indicated, however, further work is still required to understand the many dimensions and species of software connectors and their relationships with possible deployment platforms and technologies.


Figure 1: Software connector classification framework [MMP00].

3.1.3 Generating stubs

To actually implement the test cases, one must address the problem that not all of the application components which participate in the use-cases are available in the early stages of development. Stubs should be used in place of the missing components. Stubs are fake versions of components that can be used instead of the corresponding components for instantiating the abstract use-cases. Stubs only take care that the distributed interactions happen as specified and that the other components are coherently exercised.

The main hypothesis of this approach is that performance measurements in the presence of the stubs are decent approximations of the actual performance of the final application [DPE04]. This results from the observation that the available components, for instance middleware and databases, embed the software that mainly impacts performance. The coupling between such implementation support and the application-specific behavior can be extracted from the use-cases, while the implementation details of the business components remain negligible.
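The sketch below shows one possible shape for such a stub; the component name, interface, and timing are hypothetical. The stub honors the interface the use-case needs and simulates only a small local processing cost, so that the measured behavior is dominated by the middleware and database interactions.

```python
import random
import time

class PaymentServiceStub:
    """Hypothetical stub standing in for a not-yet-implemented business component.

    It performs no real business logic; it only makes sure the distributed
    interaction pattern (one request, one reply, small local delay) happens
    as specified so the rest of the deployment is coherently exercised.
    """

    def __init__(self, simulated_processing_ms=(1, 5)):
        self.simulated_processing_ms = simulated_processing_ms

    def authorize(self, account_id: str, amount: float) -> dict:
        # Simulate the (small) local cost of the missing component; the dominant
        # costs are expected to come from middleware and database interactions.
        low, high = self.simulated_processing_ms
        time.sleep(random.uniform(low, high) / 1000.0)
        return {"account_id": account_id, "amount": amount, "authorized": True}
```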

3.1.4 Executing the test

Building the support for test execution involves mainly technical problems, provided that the scientific problems raised in the previous three sub-sections have been solved. In addition, several aspects, for example the deployment and implementation of workload generators and the execution of measurements, can be automated.
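As an example of the kind of automation mentioned here, the following sketch is a minimal closed-loop workload generator; `transaction` stands for a hypothetical callable that drives one request through the deployed use-case (real components plus stubs).

```python
import concurrent.futures
import time

def run_workload(transaction, users=20, duration_s=60):
    """Minimal closed-loop workload generator (sketch).

    Each simulated user repeatedly calls `transaction` (a hypothetical callable
    issuing one request against the deployed use-case) until the duration
    expires, and records the observed response times.
    """
    deadline = time.time() + duration_s

    def user_loop():
        samples = []
        while time.time() < deadline:
            start = time.perf_counter()
            transaction()
            samples.append(time.perf_counter() - start)
        return samples

    with concurrent.futures.ThreadPoolExecutor(max_workers=users) as pool:
        futures = [pool.submit(user_loop) for _ in range(users)]
        return [sample for f in futures for sample in f.result()]
```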

4 Conclusion

In all, two software performance testing approaches were described in this paper. The workload characterization approach can be treated as a traditional performance testing approach; it requires carefully collecting a series of data in the production field and can only be applied at the end of the project. In contrast, the early performance testing approach for distributed software applications is more novel, since it encourages performance testing early in the development process, that is, when the architecture is defined. Although it is not yet a very mature approach and more research needs to be conducted on it according to its advocates [DPE04], its future looks promising, since it allows performance problems to be fixed as early as possible, which is quite attractive.

Several other aspects also need to be discussed. First of all, there has been very little research published in the area of software performance testing. For example, entering “software performance testing” in the IEEE Xplore search facility returned only 3 results when this paper was written. Such a situation indicates that the field of software performance testing as a whole is only in its initial stage and needs much more emphasis in the future. Secondly, the importance of requirements and specifications is discussed in this paper. The fact, however, is that usually no performance requirements are provided, which means that there is no precise way of determining whether or not the software performance is acceptable. Thirdly, a positive trend is that software performance, as an important quality, is increasingly emphasized during the development process. Smith and Williams [SmW02] proposed Software Performance Engineering (SPE), which is a systematic, quantitative approach to constructing software systems that meet performance objectives. It aids in tracking performance throughout the development process and prevents performance problems from emerging late in the life cycle.


References

AKL02: Avritzer A., Kondek J., Liu D., Weyuker E.J., Software performance testing based on workload characterization. Proc. of the 3rd International Workshop on Software and Performance, Jul. 2002, pp. 17-24.

AvW04: Avritzer A. and Weyuker E.J., The role of modeling in the performance testing of E-commerce applications. IEEE Transactions on Software Engineering, 30, 12, Dec. 2004, pp. 1072-1083.

BCK03: Bass L., Clements P., Kazman R., Software Architecture in Practice, second edition. Addison-Wesley, Apr. 2003.

DPE04: Denaro G., Polini A., Emmerich W., Early performance testing of distributed software applications. Proc. of the 4th International Workshop on Software and Performance, 2004, pp. 94-103.

MMP00: Mehta N., Medvidovic N. and Phadke S., Towards a taxonomy of software connectors. Proc. of the 22nd International Conference on Software Engineering, 2000, pp. 178-187.

Mus93: Musa J.D., Operational profiles in software reliability engineering. IEEE Software, 10, 2, Mar. 1993, pp. 14-32.

SmW02: Smith C.U. and Williams L.G., Performance Solutions: A Practical Guide to Creating Responsive, Scalable Software. Boston, MA, Addison-Wesley, 2002.

VoW98: Vokolos F.I., Weyuker E.J., Performance testing of software systems. Proc. of the 1st International Workshop on Software and Performance, Oct. 1998, pp. 80-87.

WeV00: Weyuker E.J. and Vokolos F.I., Experience with performance testing of software systems: issues, an approach and a case study. IEEE Transactions on Software Engineering, 26, 12, Dec. 2000, pp. 1147-1156.

Software Reliability Engineering: A Roadmap

Michael R. Lyu


Michael R. Lyu received his Ph.D. in computer science from the University of California, Los Angeles, in 1988. He is a Professor in the Computer Science and Engineering Department of the Chinese University of Hong Kong. He worked at the Jet Propulsion Laboratory, Bellcore, and Bell Labs, and taught at the University of Iowa. He has participated in more than 30 industrial projects, published over 250 papers, and helped to develop many commercial systems and software tools. Professor Lyu is frequently invited as a keynote or tutorial speaker to conferences and workshops in the U.S., Europe, and Asia. He initiated the International Symposium on Software Reliability Engineering (ISSRE) in 1990. He also received Best Paper Awards at ISSRE'98 and ISSRE'2003. Professor Lyu is an IEEE Fellow and an AAAS Fellow, for his contributions to software reliability engineering and software fault tolerance.

Software Reliability Engineering: A Roadmap

Michael R. Lyu Computer Science and Engineering Department The Chinese University of Hong Kong, Hong Kong lyu@cse.cuhk.edu.hk

Abstract

Software reliability engineering is focused on engineering techniques for developing and maintaining software systems whose reliability can be quantitatively evaluated. In order to estimate as well as to predict the reliability of software systems, failure data need to be properly measured by various means during software development and operational phases. Moreover, credible software reliability models are required to track underlying software failure processes for accurate reliability analysis and forecasting. Although software reliability has remained an active research subject over the past 35 years, challenges and open questions still exist. In particular, vital future goals include the development of new software reliability engineering paradigms that take software architectures, testing techniques, and software failure manifestation mechanisms into consideration. In this paper, we review the history of software reliability engineering, the current trends and existing problems, and specific difficulties. Possible future directions and promising research subjects in software reliability engineering are also addressed.

1. Introduction

Software permeates our daily life. There is probably no other human-made material which is more omnipresent than software in our modern society. It has become a crucial part of many aspects of society: home appliances, telecommunications, automobiles, airplanes, shopping, auditing, web teaching, personal entertainment, and so on. In particular, science and technology demand high-quality software for making improvements and breakthroughs.

The size and complexity of software systems have grown dramatically during the past few decades, and the trend will certainly continue in the future. The data from industry show that the size of the software for various systems and applications has been growing exponentially for the past 40 years [20]. The trend of such growth in the telecommunication, business, defense, and transportation industries shows a compound growth rate of ten times every five years. Because of this ever-increasing dependency, software failures can lead to serious, even fatal, consequences in safety-critical systems as well as in normal business. Previous software failures have impaired several high-visibility programs and have led to loss of business [28].

The ubiquitous software is also invisible, and its invisible nature makes it both beneficial and harmful. From the positive side, systems around us work seamlessly thanks to the smooth and swift execution of software. From the negative side, we often do not know when, where and how software has failed, or will fail. Consequently, while reliability engineering for hardware and physical systems continuously improves, reliability engineering for software has not really lived up to our expectations over the years.

This situation is frustrating as well as encouraging. It is frustrating because the software crisis identified as early as the 1960s still stubbornly stays with us, and “software engineering” has not fully evolved into a real engineering discipline. Human judgments and subjective preferences, instead of physical laws and rigorous procedures, dominate many decision-making processes in software engineering. The situation is particularly critical in software reliability engineering. Reliability is probably the most important factor to claim for any engineering discipline, as it quantitatively measures quality, and the quantity can be properly engineered. Yet software reliability engineering, as elaborated in later sections, is not yet fully delivering its promise. Nevertheless, there is an encouraging aspect to this situation. The demands on, techniques of, and enhancements to software are continually increasing, and so is the need to understand its reliability. The unsettled software crisis poses tremendous opportunities for software engineering researchers as well as practitioners. The ability to manage quality software production is not only a necessity, but also a key distinguishing factor in maintaining a competitive advantage for modern businesses.

Software reliability engineering is centered on a key attribute, software reliability, which is defined as the probability of failure-free software operation for a specified period of time in a specified environment [2]. Among other attributes of software quality such as functionality, usability, capability, and maintainability, etc., software reliability is generally accepted as the major factor in software quality since it quantifies software failures, which can make a powerful system inoperative. Software reliability engineering (SRE) is therefore defined as the quantitative study of the operational behavior of software-based systems with respect to user requirements concerning reliability. As a proven technique, SRE has been adopted either as standard or as best current practice by more than 50 organizations in their software projects and reports [33], including AT&T, Lucent, IBM, NASA, Microsoft, and many others in Europe, Asia, and North America. However, this number is still relatively small compared to the large amount of software producers in the world.

Existing SRE techniques suffer from a number of weaknesses. First of all, current SRE techniques collect the failure data during integration testing or system testing phases. Failure data collected during the late testing phase may be too late for fundamental design changes. Secondly, the failure data collected in the in-house testing may be limited, and they may not represent failures that would be uncovered in the actual operational environment. This is especially true for high-quality software systems which require extensive and wide-ranging testing. The reliability estimation and prediction using the restricted testing data may cause accuracy problems. Thirdly, current SRE techniques or modeling methods are based on some unrealistic assumptions that make the reliability estimation too optimistic relative to real situations. Of course, the existing software reliability models have had their successes; but every model can find successful cases to justify its existence. Without cross-industry validation, the modeling exercise may become merely of intellectual interest and would not be widely adopted in industry. Thus, although SRE has been around for a while, credible software reliability techniques are still urgently needed, particularly for modern software systems [24].

In the following sections we will discuss the past, the present, and the future of software reliability engineering. We first survey what techniques have been proposed and applied in the past, and then describe what the current trend is and what problems and concerns remain. Finally, we propose the possible future directions in software reliability engineering.

2. Historical software reliability engineering techniques

In the literature, a number of techniques have been proposed to attack software reliability engineering problems based on the software fault lifecycle. We discuss these techniques, and focus on two of them.

2.1. Fault lifecycle techniques

Achieving highly reliable software from the customer’s perspective is a demanding job for all software engineers and reliability engineers. [28] summarizes the following four technical areas which are applicable to achieving reliable software systems, and they can also be regarded as four fault lifecycle techniques:

1) Fault prevention: to avoid, by construction, fault occurrences.

2) Fault removal: to detect, by verification and validation, the existence of faults and eliminate them.

3) Fault tolerance: to provide, by redundancy, service complying with the specification in spite of faults having occurred or occurring.

4) Fault/failure forecasting: to estimate, by evaluation, the presence of faults and the occurrences and consequences of failures. This has been the main focus of software reliability modeling.

Fault prevention is the initial defensive mechanism against unreliability. A fault which is never created costs nothing to fix. Fault prevention is therefore the inherent objective of every software engineering methodology. General approaches include formal methods in requirement specifications and program verifications, early user interaction and refinement of the requirements, disciplined and tool-assisted software design methods, enforced programming principles and environments, and systematic techniques for software reuse. Formalization of software engineering processes with mathematically specified languages and tools is an aggressive approach to rigorous engineering of software systems. When applied successfully, it can completely prevent faults. Unfortunately, its application scope has been limited. Software reuse, on the other hand, finds a wider range of applications in industry, and there is empirical evidence for its effectiveness in fault prevention. However, software reuse without proper certification could lead to disaster. The explosion of the Ariane 5 rocket, among others, is a classic example where seemingly harmless software reuse failed miserably, in which critical software faults slipped through all the testing and verification procedures, and where a system went terribly wrong only during complicated real-life operations.

Fault prevention mechanisms cannot guarantee avoidance of all software faults. When faults are injected into the software, fault removal is the next protective means. Two practical approaches for fault removal are software testing and software inspection, both of which have become standard industry practices in quality assurance. Directions in software testing techniques are addressed in [4] in detail.

When inherent faults remain undetected through the testing and inspection processes, they will stay with the software when it is released into the field. Fault tolerance is the last defending line in preventing faults from manifesting themselves as system failures. Fault tolerance is the survival attribute of software systems in terms of their ability to deliver continuous service to the customers. Software fault tolerance techniques enable software systems to (1) prevent dormant software faults from becoming active, such as defensive programming to check for input and output conditions and forbid illegal operations; (2) contain the manifested software errors within a confined boundary without further propagation, such as exception handling routines to treat unsuccessful operations; (3) recover software operations from erroneous conditions, such as checkpointing and rollback mechanisms; and (4) tolerate system-level faults methodically, such as employing design diversity in the software development.

Finally, if software failures are destined to occur, it is critical to estimate and predict them. Fault/failure forecasting involves formulation of the fault/failure relationship, an understanding of the operational environment, the establishment of software reliability models, developing procedures and mechanisms for software reliability measurement, and analyzing and evaluating the measurement results. The ability to determine software reliability not only gives us guidance about software quality and when to stop testing, but also provides information for software maintenance needs. It can also support software warranties when the reliability of the software has been properly certified. The concept of scheduled maintenance with software rejuvenation techniques [46] can also be solidified.

The subjects of fault prevention and fault removal have been discussed thoroughly by other articles in this issue. We focus our discussion on issues related to techniques on fault tolerance and fault/failure forecasting.

2.2. Software reliability models and measurement

As a major task of fault/failure forecasting, software reliability modeling has attracted much research attention in estimation (measuring the current state) as well as prediction (assessing the future state) of the reliability of a software system. A software reliability model specifies the form of a random process that describes the behavior of software failures with respect to time. A historical review as well as an application perspective of software reliability models can be found in [7, 28]. There are three main reliability modeling approaches: the error seeding and tagging approach, the data domain approach, and the time domain approach, which is considered to be the most popular one. The basic principle of time domain software reliability modeling is to perform curve fitting of observed time-based failure data by a pre-specified model formula, such that the model can be parameterized with statistical techniques (such as the least squares or maximum likelihood methods). The model can then provide estimation of existing reliability or prediction of future reliability by extrapolation techniques. Software reliability models usually make a number of common assumptions, as follows. (1) The operational environment where the reliability is to be measured is the same as the testing environment in which the reliability model has been parameterized. (2) Once a failure occurs, the fault which causes the failure is immediately removed. (3) The fault removal process will not introduce new faults. (4) The number of faults inherent in the software and the way these faults manifest themselves to cause failures follow, at least in a statistical sense, certain mathematical formulae. Since the number of faults (as well as the failure rate) of the software system decreases as testing progresses, resulting in growth of reliability, these models are often called software reliability growth models (SRGMs).
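To make the curve-fitting step concrete, here is a minimal sketch, with entirely hypothetical failure counts and scipy assumed available, that fits the Goel-Okumoto NHPP mean value function m(t) = a(1 - e^(-bt)) by nonlinear least squares and then derives the current failure intensity and a short-horizon reliability estimate. Goel-Okumoto is only one of many SRGM forms, and this is an illustration rather than a prescribed procedure:

```python
# A minimal sketch of time-domain SRGM curve fitting (hypothetical data).
# The Goel-Okumoto NHPP mean value function m(t) = a * (1 - exp(-b * t))
# is fitted to cumulative failure counts by nonlinear least squares.
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical failure data: test time (hours) and cumulative failures observed.
t = np.array([10, 20, 30, 40, 50, 60, 70, 80], dtype=float)
cum_failures = np.array([12, 21, 27, 32, 35, 38, 40, 41], dtype=float)

def goel_okumoto(t, a, b):
    """Expected cumulative failures by time t: a = total faults, b = detection rate."""
    return a * (1.0 - np.exp(-b * t))

(a_hat, b_hat), _ = curve_fit(goel_okumoto, t, cum_failures, p0=[50.0, 0.01])

# Current failure intensity lambda(T) = a * b * exp(-b * T), and reliability over
# a future mission of length x: R(x | T) = exp(-(m(T + x) - m(T))).
T, x = t[-1], 10.0
intensity = a_hat * b_hat * np.exp(-b_hat * T)
reliability = np.exp(-(goel_okumoto(T + x, a_hat, b_hat) - goel_okumoto(T, a_hat, b_hat)))
print(f"a={a_hat:.1f} faults, b={b_hat:.4f}/h, lambda(T)={intensity:.3f}/h, R({x}h)={reliability:.3f}")
```

In practice, maximum likelihood estimation on the raw failure times, goodness-of-fit checks, and comparison of several candidate SRGMs would precede any reliance on such numbers.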

Since Jelinski and Moranda proposed the first SRGM [23] in 1972, numerous SRGMs have been proposed in the past 35 years, such as exponential failure time class models, Weibull and Gamma failure time class models, infinite failure category models, Bayesian models, and so on [28, 36, 50]. Unified modeling approaches have also been attempted [19]. As mentioned before, the major challenges of these models do not lie in their technical soundness, but in their validity and applicability in real world projects.

[Figure 1. Software Reliability Engineering Process Overview — a flowchart: Determine Reliability Objective → Develop Operational Profile → Perform Software Testing → Collect Failure Data → Apply Software Reliability Tools → Select Appropriate Software Reliability Models → if the reliability objective is not met, continue testing; once it is met, Use Software Reliability Models to Calculate Current Reliability → Start to Deploy → Validate Reliability in the Field → Feedback to Next Release.]

Figure 1 shows an SRE framework in current practice [28]. First, a reliability objective is determined quantitatively from the customer's viewpoint to maximize customer satisfaction, and customer usage is defined by developing an operational profile. The software is then tested according to the operational profile, failure data collected, and reliability tracked during testing to determine the product release time. This activity may be repeated until a certain reliability level has been achieved. Reliability is also validated in the field to evaluate the reliability engineering efforts and to achieve future product and process improvements.

It can be seen from Figure 1 that there are four major components in this SRE process, namely (1) reliability objective, (2) operational profile, (3) reliability modeling and measurement, and (4) reliability validation. A reliability objective is the specification of the reliability goal of a product from the customer viewpoint. If a reliability objective has been specified by the customer, that reliability objective should be used. Otherwise, we can select the reliability measure which is the most intuitive and easily understood, and then determine the customer's "tolerance threshold" for system failures in terms of this reliability measure.

The operational profile is a set of disjoint alternatives of system operational scenarios and their associated probabilities of occurrence. The construction of an operational profile encourages testers to select test cases according to the system's likely operational usage, which contributes to more accurate estimation of software reliability in the field.

Reliability modeling is an essential element of the reliability estimation process. It determines whether a product meets its reliability objective and is ready for release. One or more reliability models are employed to calculate, from failure data collected during system testing, various estimates of a product's reliability as a function of test time. Several interdependent estimates can be obtained to make equivalent statements about a product's reliability. These reliability estimates can provide the following information, which is useful for product quality management: (1) The reliability of the product at the end of system testing. (2) The amount of (additional) test time required to reach the product's reliability objective. (3) The reliability growth as a result of testing (e.g., the ratio of the value of the failure intensity at the start of testing to the value at the end of testing). (4) The predicted reliability beyond the system testing, such as the product's reliability in the field.

Despite the existence of a large number of models, the problem of model selection and application is manageable, as there are guidelines and statistical methods for selecting an appropriate model for each application. Furthermore, experience has shown that it is sufficient to consider only a dozen models, particularly when they are already implemented in software tools [28].

Using these statistical methods, "best" estimates of reliability are obtained during testing. These estimates are then used to project the reliability during field operation in order to determine whether the reliability objective has been met. This procedure is an iterative process, since more testing will be needed if the objective is not met. When the operational profile is not fully developed, the application of a test compression factor can assist in estimating field reliability. A test compression factor is defined as the ratio of execution time required in the operational phase to execution time required in the test phase to cover the input space of the program. Since testers during testing are quickly searching through the input space for both normal and difficult execution conditions, while users during operation only execute the software at a regular pace, this factor represents the reduction of failure rate (or increase in reliability) during operation with respect to that observed during testing.
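A small worked sketch of this adjustment, using purely hypothetical coverage times and a hypothetical test-phase failure intensity, might look as follows:

```python
# A minimal sketch of applying a test compression factor (all numbers invented).
# Suppose covering the program's input space takes 25 execution hours in test,
# but 500 execution hours at the users' regular pace in operation.
test_hours_to_cover = 25.0
field_hours_to_cover = 500.0
compression_factor = field_hours_to_cover / test_hours_to_cover   # C = 20

# If testing estimated a failure intensity of 0.08 failures per execution hour,
# the projected field failure intensity is reduced by the same factor.
lambda_test = 0.08
lambda_field = lambda_test / compression_factor
print(f"C = {compression_factor:.0f}, projected field intensity = {lambda_field:.4f} failures/hour")
```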

Finally, the projected field reliability has to be validated by comparing it with the observed field reliability. This validation not only establishes benchmarks and confidence levels of the reliability estimates, but also provides feedback to the SRE process for continuous improvement and better parameter tuning. When feedback is provided, SRE process enhancement comes naturally: the model validity is established, the growth of reliability is determined, and the test compression factor is refined.

2.3. Software fault tolerance techniques and models

Fault tolerance, when applicable, is one of the major approaches to achieve highly reliable software. There are two different groups of fault tolerance techniques: single version and multi-version software techniques [29]. The former includes program modularity, system closure, atomicity of actions, error detection, exception handling, checkpoint and restart, process pairs, and data diversity [44]; while the latter, so-called design diversity, is employed where multiple software versions are developed independently by different program teams using different design methods, yet they provide equivalent services according to the same requirement specifications. The main techniques of this multiple version software approach are recovery blocks, N-version programming, N self-checking programming, and other variants based on these three fundamental techniques.

Reliability models attempt to estimate the probability of coincident failures in multiple versions. Eckhardt and Lee (1985) [15] proposed the first reliability model of fault correlation in design diversity, which predicts positive correlations between version failures under the assumption that the difficulty of demands varies across the demand space. Littlewood and Miller (1989) [25] suggested that negative fault correlations may exist when design diversity is forced. Dugan and Lyu (1995) [14] proposed a Markov reward model to compare system reliability achieved by various design diversity approaches, and Tomek and Trivedi (1995) [43] suggested a stochastic reward net model for software fault tolerance. Popov, Strigini et al. (2003) [37] estimated the upper and lower bounds for failure probability of design diversity based on the subdomain concept on the demand space. A detailed summary of fault-tolerant software and its reliability modeling methods can be found in [29]. Experimental comparisons and evaluations of some of the models are listed in [10] and [11].
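The Eckhardt-Lee intuition can be illustrated with a small numerical sketch (the difficulty profile below is invented): when the probability of failure varies across demands, the chance that two independently developed versions fail on the same demand, E[theta^2], exceeds the (E[theta])^2 that naive independence would predict:

```python
# A small numerical sketch of the Eckhardt-Lee intuition (invented difficulty profile).
# theta[x] is the chance that an independently developed version fails on demand x.
# When difficulty varies across demands, coincident failures of two versions are more
# likely than independence would predict, because E[theta^2] >= (E[theta])^2.

# Hypothetical demand space: most demands are easy, a few are hard for every team.
theta = [0.001] * 950 + [0.2] * 50

p_single = sum(theta) / len(theta)                  # P(a random version fails) ~ 0.011
p_both = sum(p * p for p in theta) / len(theta)     # P(two independent versions both fail)

print(f"P(one version fails)       = {p_single:.5f}")
print(f"P(both versions fail)      = {p_both:.6f}")
print(f"independence would predict = {p_single ** 2:.6f}")
```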

3. Current trends and problems

The challenges in software reliability not only stem from the size, complexity, difficulty, and novelty of software applications in various domains, but also relate to the knowledge, training, experience and character of the software engineers involved. We address the current trends and problems from a number of software reliability engineering aspects.

3.1. Software reliability and system reliability

Although the nature of software faults is different from that of hardware faults, the theoretical foundation of software reliability comes from hardware reliability techniques. Previous work has been focused on extending the classical reliability theories from hardware to software, so that by employing familiar mathematical modeling schemes, we can establish a software reliability framework consistently from the same viewpoints as hardware. The advantages of such modeling approaches are: (1) The physical meaning of the failure mechanism can be properly interpreted, so that the effect of failures on reliability, as measured in the form of failure rates, can be directly applied to the reliability models. (2) The combination of hardware reliability and software reliability to form system reliability models and measures can be provided in a unified theory. Even though the actual mechanisms of the various causes of hardware faults and software faults may be different, a single formulation can be employed from the reliability modeling and statistical estimation viewpoints. (3) System reliability models inherently engage system structure and modular design in block diagrams. The resulting reliability modeling process is not only intuitive (how components contribute to the overall reliability can be visualized), but also informative (reliability-critical components can be quickly identified).
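As a minimal sketch of point (2), assuming both the hardware and software failure processes can be approximated by constant rates arranged in series, the unified formulation reduces to multiplying the two reliabilities (all rates and times below are invented):

```python
# A minimal sketch of a unified series-system formulation (assumed constant rates):
# if both hardware and software must survive, R_sys(t) = R_hw(t) * R_sw(t).
import math

lambda_hw = 2.0e-5   # hardware failures per hour (assumed)
lambda_sw = 8.0e-5   # software failures per hour (assumed)
t = 1000.0           # mission time in hours

r_hw = math.exp(-lambda_hw * t)
r_sw = math.exp(-lambda_sw * t)
r_sys = r_hw * r_sw  # series composition: both must survive
print(f"R_hw={r_hw:.4f}  R_sw={r_sw:.4f}  R_system={r_sys:.4f}")
```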

The major drawbacks, however, are also obvious. First of all, while hardware failures may occur independently (or approximately so), software failures do not happen independently. The interdependency of software failures is also very hard to describe in detail or to model precisely. Furthermore, similar hardware systems are developed from similar specifications, and hardware failures, usually caused by hardware defects, are repeatable and predictable. On the other hand, software systems are typically "one-of-a-kind." Even similar software systems or different versions of the same software can be based on quite different specifications. Consequently, software failures, usually caused by human design faults, seldom repeat in exactly the same way or in any predictable pattern. Therefore, while failure mode and effect analysis (FMEA) and failure mode and effect criticality analysis (FMECA) have long been established for hardware systems, they are not very well understood for software systems.

3.2. Software reliability modeling

Among all software reliability models, SRGM is probably one of the most successful techniques in the literature, with more than 100 models existing in one form or another, through hundreds of publications. In practice, however, SRGMs encounter major challenges. First of all, software testers seldom follow the operational profile to test the software, so what is observed during software testing may not be directly extensible for operational use. Secondly, when the number of failures collected in a project is limited, it is hard to make statistically meaningful reliability predictions. Thirdly, some of the assumptions of SRGM are not realistic, e.g., the assumptions that the faults are independent of each other; that each fault has the same chance to be detected in one class; and that correction of a fault never introduces new faults [40]. Nevertheless, the above setbacks can be overcome with suitable means. Given proper data collection processes to avoid drastic invalidation of the model assumptions, it is generally possible to obtain accurate estimates of reliability and to know that these estimates are accurate.

Although some historical SRGMs have been widely adopted to predict software reliability, researchers believe they can further improve the prediction accuracy of these models by adding other important factors which affect the final software quality [12,31,48]. Among others, code coverage is a metric commonly engaged by software testers, as it indicates how completely a test set executes a software system under test, therefore influencing the resulting reliability measure. To incorporate the effect of code coverage on reliability in the traditional software reliability models, [12] proposes a technique using both time and code coverage measurement for reliability prediction. It reduces the execution time by a parameterized factor when the test case neither increases code coverage nor causes a failure. These models, known as adjusted Non-Homogeneous Poisson Process (NHPP) models, have been shown empirically to achieve more accurate predictions than the original ones.
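A minimal sketch of this coverage-adjustment idea is shown below; the discount factor and test log are invented rather than the calibrated parameterization of [12], and the adjusted time would then replace raw execution time as input to an NHPP model:

```python
# A minimal sketch of the coverage-adjustment idea (assumed data and discount factor):
# test time contributes fully to the effective ("adjusted") time only when a test
# case increases code coverage or reveals a failure; otherwise it is discounted.
def adjusted_time(test_log, discount=0.2):
    """test_log: list of (duration_hours, coverage_increased, failed) tuples."""
    effective = 0.0
    for duration, coverage_increased, failed in test_log:
        weight = 1.0 if (coverage_increased or failed) else discount
        effective += weight * duration
    return effective

log = [(1.0, True, False), (1.0, False, False), (1.0, False, True), (1.0, False, False)]
print(f"raw time = {len(log)} h, adjusted time = {adjusted_time(log):.1f} h")
```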

In the literature, several models have been proposed to determine the relationship between the number of failures/faults and the test coverage achieved, with various distributions. [48] suggests that this relation is a variant of the Rayleigh distribution, while [31] shows that it can be expressed as a logarithmic-exponential formula, based on the assumption that both fault coverage and test coverage follow the logarithmic NHPP growth model with respect to the execution time. More metrics can be incorporated to further explore this new modeling avenue.

Although there are a number of successful SRE models, they are typically measurement-based models which are employed in isolation at the later stage of the software development process. Early software reliability prediction models are often too insubstantial, seldom executable, insufficiently formal to be analyzable, and typically not linked to the target system. Their impact on the resulting reliability is therefore modest. There is currently a need for a credible end-to-end software reliability model that can be directly linked to reliability prediction from the very beginning, so as to establish a systematic SRE procedure that can be certified, generalized and refined.

3.3. Metrics and measurements

Metrics and measurements have been an important part of the software development process, not only for software project budget planning but also for software quality assurance purposes. As software complexity and software quality are highly related to software reliability, the measurements of software complexity and quality attributes have been explored for early prediction of software reliability [39]. Static as well as dynamic program complexity measurements have been collected, such as lines of code, number of operators, relative program complexity, functional complexity, operational complexity, and so on. The complexity metrics can be further included in software reliability models for early reliability prediction, for example, to predict the initial software fault density and failure rate.

In SRGM, the two measurements related to reliability are: 1) the number of failures in a time period; and 2) time between failures. An important advancement of SRGM is the notion of "time" during which failure data are recorded. It has been demonstrated that CPU time is more suitable and more accurate than calendar time for recording failures, in which the actual execution time of software can be faithfully represented [35]. More recently, other forms of metrics for testing efforts have been incorporated into software reliability modeling to improve the prediction accuracy [8,18].

One key problem about software metrics and measurements is that they are not consistently defined and interpreted, again due to the lack of physical attributes of software. The achieved reliability measures may differ for different applications, yielding inconclusive results. A unified ontology to identify, describe, incorporate and understand reliability-related software metrics is therefore urgently needed.

3.4. Data collection and analysis

The software engineering process is described sardonically as a garbage-in/garbage-out process. That is to say, the accuracy of its output is bounded by the precision of its input. Data collection, consequently, plays a crucial role for the success of software reliability measurement.

There is an apparent trade-off between the data collection and the analysis effort. The more accuracy is required for analysis, the more effort is required for data collection. Fault-based data are usually easier to collect due to their static nature. Configuration management tools for source code maintenance can help to collect these data as developers are required to check in and check out new updated versions of code for fault removal. Failure-based data, on the other hand, are much harder to collect and usually require additional effort, for the following reasons. First, the dynamic operating condition where the failures occur may be hard to identify or describe. Moreover, the time when the failures occur must be recorded manually, after the failures are manifested. Calendar time data can be coarsely recorded, but they lack accuracy for modeling purposes. CPU time data, on the other hand, are very difficult to collect, particularly for distributed systems and networking environments where multiple CPUs are executing software in parallel. Certain forms of approximation are required to avoid the great pain in data collection, but then the accuracy of the data is consequently reduced. It is noted that while manual data collection can be very labor intensive, automatic data collection, although unavoidable, may be too intrusive (e.g., online collection of data can cause interruption to the system under test).

The amounts and types of data to be collected for reliability analysis purposes vary between organizations. Consequently, the experiences and lessons so gained may only be shared within the same company culture or at a high level of abstraction between organizations. To overcome this disadvantage, systematic failure data analysis for SRE purposes should be conducted.

Given field failure data collected from a real system, the analysis consists of five steps: 1) preprocessing of data, 2) analysis of data, 3) model structure identification and parameter estimation, 4) model solution, if necessary, and 5) analysis of models. In Step 1, the necessary information is extracted from the field data. The processing in this step requires detailed understanding of the target software and operational conditions. The actual processing required depends on the type of data. For example, the information in human-generated reports is usually not completely formatted. Therefore, this step involves understanding the situations described in the reports and organizing the relevant information into a problem database. In contrast, the information in automatically generated event logs is already formatted. Data processing of event logs consists of extracting error events and coalescing related error events.
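For automatically generated event logs, Step 1 often amounts to extracting error records and coalescing bursts of related events. A minimal sketch, with an invented log format and a five-minute window chosen purely for illustration:

```python
# A minimal sketch of Step 1 for event logs: extract error events and coalesce
# bursts of related events within a time window into single problem records.
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)          # assumed coalescing window

raw_events = [
    ("2023-01-01 10:00:01", "ERROR", "disk timeout"),
    ("2023-01-01 10:00:03", "ERROR", "disk timeout"),
    ("2023-01-01 10:02:00", "INFO",  "job started"),
    ("2023-01-01 10:30:00", "ERROR", "null reference"),
]

def coalesce(events, window=WINDOW):
    errors = [(datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"), msg)
              for ts, level, msg in events if level == "ERROR"]
    groups, current = [], []
    for ts, msg in sorted(errors):
        if current and ts - current[-1][0] > window:
            groups.append(current)
            current = []
        current.append((ts, msg))
    if current:
        groups.append(current)
    return groups

for g in coalesce(raw_events):
    print(f"problem record: {len(g)} event(s) starting {g[0][0]}: {g[0][1]}")
```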

In Step 2, the data are interpreted. Typically, this step begins with a list of measures to evaluate. However, new issues that have a major impact on software reliability can also be identified during this step. The results from Step 2 are reliability characteristics of operational software in actual environments and issues that must be addressed to improve software reliability. These include fault and error classification, error propagation, error and failure distribution, software failure dependency, hardware-related software errors, evaluation of software fault tolerance, error recurrence, and diagnosis of recurrences.

In Step 3, appropriate models (such as Markov models) are identified based on the findings from Step 2. We identify model structures and realistic ranges of parameters. The identified models are abstractions of the software reliability behavior in real environments. Statistical analysis packages and measurement-based reliability analysis tools are useful at this stage.

Step 4 involves either using known techniques or developing new ones to solve the model. Model solution allows us to obtain measures, such as reliability, availability, and performability. The results obtained from the model must be validated against real data. Reliability and performance modeling and evaluation tools such as SHARPE [45] can be used in this step.

In Step 5, "what if" questions are addressed, using the identified models. Model factors are varied and the resulting effects on software reliability are evaluated. Reliability bottlenecks are determined and the effects of design changes on software reliability are predicted. Research work currently addressed in this area includes software reliability modeling in the operational phase, the modeling of the impact of software failures on performance, detailed error and recovery processes, and software error bursts. The knowledge and experience gained through such analysis can be used to plan additional studies and to develop the measurement techniques.

3.5. Methods and tools

In addition to software reliability growth modeling, many other methods are available for SRE. We provide a few examples of these methods and tools.

Fault trees provide a graphical and logical framework for a systematic analysis of system failure modes. Software reliability engineers can use them to assess the overall impact of software failures on a system, or to prove that certain failure modes will not occur. If they may occur, the occurrence probability can also be assessed. Fault tree models therefore provide an informative modeling framework that can be engaged to compare different design alternatives or system architectures with respect to reliability. In particular, they have been applied to both fault tolerant and fault intolerant (i.e., non-redundant) systems. Since this technique originates from hardware systems and has been extended to software systems, it can be employed to provide a unified modeling scheme for hardware/software co-design. Reliability modeling for hardware-software interactions is currently an area of intensive research [42].
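As a minimal sketch of fault tree evaluation, assuming independent basic events and an invented two-gate structure (the probabilities are illustrative only):

```python
# A minimal fault tree sketch: OR- and AND-gates over independent basic events.
def or_gate(probs):
    q = 1.0
    for p in probs:
        q *= (1.0 - p)
    return 1.0 - q

def and_gate(probs):
    q = 1.0
    for p in probs:
        q *= p
    return q

# Top event: the software service fails OR both redundant replicas fail.
p_service_fault = 1e-4            # assumed probability per demand
p_replica_fail = 1e-2             # assumed per-replica failure probability
p_top = or_gate([p_service_fault, and_gate([p_replica_fail, p_replica_fail])])
print(f"P(top event) = {p_top:.6f}")
```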

In addition, simulation techniques can be provided for SRE purposes. They can produce observables of interest in reliability engineering, including discrete integer-valued quantities that occur as time progresses. One simulation approach produces artifacts in an actual software environment according to factors and influences believed to typify these entities within a given context [47]. The artifacts and environment are allowed to interact naturally, whereupon the flow of occurrences of activities and events is observed. This artifact-based simulation allows experiments to be set up to examine the nature of the relationships between software failures and other software metrics, such as program structure, programming error characteristics, and test strategies. It is suggested that the extent to which reliability depends merely on these factors can be measured by generating random programs having the given characteristics, and then observing their failure statistics.

Another reliability simulation approach [28] produces time-line imitations of reliability-related activities and events. Reliability measures of interest to the software process are modeled parametrically over time. The key to this approach is a rate-based architecture, in which phenomena occur naturally over time as controlled by their frequencies of occurrence, which depend on driving software metrics such as number of faults so far exposed or yet remaining, failure criticality, workforce level, test intensity, and software execution time. Rate-based event simulation is an example of a form of modeling called system dynamics, whose distinctive feature is that the observables are discrete events randomly occurring in time. Since many software reliability growth models are also based on rate (in terms of software hazard), the underlying processes assumed by these models are fundamentally the same as the rate-based reliability simulation. In general, simulations enable investigations of questions too difficult to be answered analytically, and are therefore more flexible and more powerful.
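A minimal sketch of rate-based event simulation, assuming a Jelinski-Moranda-style hazard proportional to the number of remaining faults and invented parameter values:

```python
# A minimal sketch of rate-based reliability simulation (assumed parameters):
# the failure (hazard) rate at any instant is proportional to faults remaining;
# inter-failure times are drawn from an exponential at the current rate.
import random

random.seed(42)
N0, PHI = 40, 0.003        # assumed initial fault count and per-fault hazard (/hour)

def simulate(n0=N0, per_fault_rate=PHI):
    t, times = 0.0, []
    remaining = n0
    while remaining > 0:
        rate = remaining * per_fault_rate
        t += random.expovariate(rate)      # time to next failure at current rate
        times.append(t)
        remaining -= 1                     # assume the detected fault is removed
    return times

failure_times = simulate()
print(f"first failure at {failure_times[0]:.1f} h, last at {failure_times[-1]:.1f} h")
```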

Various SRE measurement tools have been developed for data collection, reliability analysis, parameter estimation, model application and reliability simulation. Any major improvement on SRE is likely to focus on such tools. We need to provide tools and environments which can assist software developers to build reliable software for different applications. The partition of tools, environments, and techniques that will be engaged should reflect proper employment of the best current SRE practices.

3.6. Testing effectiveness and code coverage

As a typical mechanism for fault removal in software reliability engineering, software testing has been widely practiced in industry for quality assurance and reliability improvement. Effective testing is defined as uncovering of most if not all detectable faults. As the total number of inherent faults is not known, testing effectiveness is usually represented by a measurable testing index. Code coverage, as an indicator to show how thoroughly software has been stressed, has been proposed and is widely employed to represent fault coverage.

 

Correlation | Reference | Findings
Positive | Horgan (1994) [17], Frankl (1988) [16], Rapps (1988) [38] | High code coverage brings high software reliability and a low fault rate.
Positive | Chen (1992) [13] | A correlation between code coverage and software reliability was observed.
Positive | Wong (1994) | The correlation between test effectiveness and block coverage is higher than that between test effectiveness and the size of the test set.
Positive | Frate (1995) | An increase in reliability comes with an increase in at least one code coverage measure, and a decrease in reliability is accompanied by a decrease in at least one code coverage measure.
Positive | Cai (2005) [8] | Code coverage contributes to a noticeable amount of fault coverage.
Negative | Briand (2000) [6] | The testing result for published data did not support a causal dependency between code coverage and fault coverage.

Table 1. Comparison of Investigations on the Relation of Code Coverage to Fault Coverage

Despite the observations of a correlation between code coverage and fault coverage, a question is raised: Can this phenomenon of concurrent growth be attributed to a causal dependency between code coverage and fault detection, or is it just coincidental due to the cumulative nature of both measures? In one investigation of this question, an experiment involving Monte Carlo simulation was conducted on the assumption that there is no causal dependency between code coverage and fault detection [6]. The testing result for published data did not support a causal dependency between code coverage and defect coverage.

Nevertheless, many researchers consider coverage as a faithful indicator of the effectiveness of software testing results. A comparison among various studies on the impact of code coverage on software reliability is shown in Table 1.

3.7. Testing and operational profiles

The operational profile is a quantitative characterization of how a system will be used in the field by customers. It helps to schedule test activities, generate test cases, and select test runs. By allocating development and test resources to functions on the basis of how they are used, software reliability engineering can thus be planned with productivity and economics considerations in mind.

Using an operational profile to guide system testing ensures that if testing is terminated and the software is shipped because of imperative schedule constraints, the most-used operations will have received the most testing, and the reliability level will be the maximum that is practically achievable for the given test time. Also, in guiding regression testing, the profile tends to find, among the faults introduced by changes, the ones that have the most effect on reliability. Examples of the benefits of applying operational profiles can be found in a number of industrial projects [34].
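A minimal sketch of profile-driven test selection, using an invented operational profile for a hypothetical account-management system:

```python
# A minimal sketch of operational-profile-driven test selection (invented profile):
# operations are drawn for testing with the probabilities customers exercise them.
import random

random.seed(7)
operational_profile = {          # assumed occurrence probabilities, summing to 1
    "query_account": 0.55,
    "transfer_funds": 0.25,
    "update_profile": 0.15,
    "close_account": 0.05,
}

operations = list(operational_profile)
weights = [operational_profile[op] for op in operations]
test_plan = random.choices(operations, weights=weights, k=1000)

for op in operations:
    print(f"{op:15s} {test_plan.count(op):4d} test runs")
```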

Although significant improvement can be achieved by employing operational profiles in regression or system testing, challenges still exist for this technique. First of all, the operational profiles for some applications are hard to develop, especially for some distributed software systems, e.g., Web services. Moreover, unlike those of hardware, the operational profiles of software cannot be duplicated in order to speed up the testing, because the failure behavior of software depends greatly on its input sequence and internal status. While different software units can be tested at the same time in unit testing, this parallel approach is not applicable in system testing or regression testing. As a result, learning to deal with improper operational profiles and with the dependencies within the operational profile are the two major problems in operational profile techniques.

3.8. Industry practice and concerns

Although some success stories have been reported, there is a lack of wide industry adoption for software reliability engineering across various applications. Software practitioners often see reliability as a cost rather than a value, an investment rather than a return. Often the reliability attribute of a product takes less priority than its functionality or innovation. When product delivery schedule is tight, reliability is often the first element to be squeezed.

The main reason for the lack of industry enthusiasm in SRE is because its cost-effectiveness is not clear. Current SRE techniques incur visible overhead but yield invisible benefits. In contrast, a company's target is to have visible benefit but invisible overhead. The former requires some demonstration in the form of successful projects, while the latter involves avoidance of labor-intensive tasks. Many companies, voluntarily or under compulsion from their quality control policy, collect failure data and make reliability measurements. They are not willing to spend much effort on data collection, let alone data sharing. Consequently, reliability results cannot be compared or benchmarked, and the experiences are hard to accumulate. Most software practitioners only employ some straightforward methods and metrics for their product reliability control. For example, they may use some general guidelines for quality metrics, such as fault density, lines of code, or development or testing time, and compare current projects with previous ones.

As the competitive advantage of product reliability is less obvious than that of other product quality attributes (such as performance or usability), few practitioners are willing to try out emerging techniques on SRE. The fact that there are so many software reliability models to choose from also intimidates practitioners. So instead of investigating which models are suitable for their environments or which model selection criteria can be applied, practitioners tend to simply take reliability measurements casually, and they are often suspicious about the reliability numbers obtained by the models. Many software projects claim to set reliability objectives such as five 9's or six 9's (meaning 0.99999 to 0.999999 availability, or 10^-5 to 10^-6 failures per execution hour), but few can validate their reliability achievement.
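For orientation, the availability figures quoted above translate into annual downtime roughly as follows (a simple arithmetic sketch, ignoring planned maintenance):

```python
# What "five 9's" and "six 9's" mean in annual downtime terms.
MINUTES_PER_YEAR = 365.25 * 24 * 60

for availability in (0.999, 0.9999, 0.99999, 0.999999):
    downtime = (1.0 - availability) * MINUTES_PER_YEAR
    print(f"availability {availability:<8} -> about {downtime:7.2f} minutes of downtime per year")
```

Five 9's corresponds to roughly five minutes of unplanned downtime per year; six 9's to about half a minute, which explains why such claims are rarely validated in practice.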

Two major successful hardware reliability engineering techniques, reliability prediction by system architecture block diagrams and FME(C)A, still cannot be directly applied to software reliability engineering. This, as explained earlier, is due to the intricate software dependencies within and between software components (and sub-systems). If software components can be decoupled, or their dependencies can be clearly identified and properly modeled, then these popular techniques in hardware may be applicable to software, whereupon wide industry adoption may occur. We elaborate this in the following section.

3.9. Software architecture

Systematic examination of software architectures for a better way to support software development has been an active research direction in the past 10 years, and it will continue to be center stage in the coming decade [41]. Software architectural design not only impacts software development activities, but also affects SRE efforts. Software architecture should be enhanced to decrease the dependency of different software pieces that run on the same computer or platform, so that their reliabilities do not interact. Fault isolation is a major design consideration for software architecture. Good software architecture should enjoy the property that exceptions are raised when faults occur, and module failures are properly confined without causing system failures. In particular, this type of component-based software development approach requires a different framework, quality assurance paradigm [9], and reliability modeling [51] from those in traditional software development.

A recent trend in software architecture is that as information engineering is becoming the central focus for today's businesses, service-oriented systems and the associated software engineering will be the de facto standards for business development. Service orientation requires seamless integration of heterogeneous components and their interoperability for proper service creation and delivery. In a service-oriented framework, new paradigms for system organizations and software architectures are needed for ensuring adequate decoupling of components, swift discovery of applications, and reliable delivery of services. Such emerging software architectures include cross-platform techniques [5], open-world software [3], service-oriented architectures [32], and Web applications [22]. Although some modeling approaches have been proposed to estimate the reliability for specific Web systems [49], SRE techniques for general Web services and other service-oriented architectures require more research work.

4. Possible future directions

SRE activities span the whole software lifecycle. We discuss possible future directions with respect to five areas: software architecture, design, testing, metrics and emerging applications.

4.1. Reliability for software architectures and off-the-shelf components

Due to the ever-increasing complexity of software systems, modern software is seldom built from scratch. Instead, reusable components have been developed and employed, formally or informally. On the one hand, revolutionary and evolutionary object-oriented design and programming paradigms have vigorously pushed software reuse. On the other hand, reusable software libraries have been a deciding factor regarding whether a software development environment or methodology would be popular or not. In the light of this shift, reliability engineering for software development is focusing on two major aspects: software architecture, and component-based software engineering.

The software architecture of a system consists of software components, their external properties, and their relationships with one another. As software architecture is the foundation of the final software product, the design and management of software architecture is becoming the dominant factor in software reliability engineering research. Well-designed software architecture not only provides a strong, reliable basis for the subsequent software development and maintenance phases, but also offers various options for fault avoidance and fault tolerance in achieving high reliability. Due to the cardinal importance of, and complexity involved in, software architecture design and modeling, being a good software architect is a rare talent that is highly in demand. A good software architect sees widely and thinks deeply, as the components should eventually fit together in the overall framework, and the anticipation of change has to be considered in the architecture design. A clean, carefully laid out architecture requires up-front investments in various design considerations, including high cohesion, low coupling, separation of modules, proper system closure, concise interfaces, avoidance of complexity, etc. These investments, however, are worthwhile since they eventually help to increase software reliability and reduce operation and maintenance costs.

One central research issue for software architecture concerning reliability is the design of failure-resilient architecture. This requires an effective software architecture design which can guarantee separation of components when software executes. When component failures occur in the system, they can then be quickly identified and properly contained. Various techniques can be explored in such a design. For example, memory protection prevents interference and failure propagation between different application processes. Guaranteed separation between applications has been a major requirement for the integration of multiple software services in complicated modern systems. It should be noted that the separation methods can support one another, and usually they are combined to achieve better reliability returns. Exploiting this synergy for reliability assessment is a possibility for further exploration.

In designing failure-resilient architecture, additional resources and techniques are often engaged. For example, error handling mechanisms for fault detection, diagnosis, isolation, and recovery procedures are incorporated to tolerate component failures; however, these mechanisms will themselves have some impact on the system. Software architecture has to take this impact into consideration. On the one hand, the added reliability-enhancement routines should not introduce unnecessary complexity, making them error-prone, which would decrease the reliability instead of increasing it. On the other hand, these routines should be made unintrusive while they monitor the system, and they should not further jeopardize the system while they are carrying out recovery functions. Designing concise, simple, yet effective mechanisms to perform fault detection and recovery within a general framework is an active research topic for researchers.

While software architecture represents the product view of software systems, component-based software engineering addresses the process view of software engineering. In this popular software development technique, many research issues are identified, such as the following. How can reliable general reusable components be identified and designed? How can existing components be modified for reusability? How can a clean interface design be provided for components so that their interactions are fully under control? How can defensive mechanisms be provided for the components so that they are protected from others, and will not cause major failures? How can it be determined whether a component is risk-free? How can the reliability of a component be assessed under untested yet foreseeable operational conditions? How can the interactions of components be modeled if they cannot be assumed independent? Component-based software engineering allows structure-based reliability to be realized, which facilitates design for reliability before the software is implemented and tested. The dependencies among components will thus need to be properly captured and modeled first.

These methods favor reliability engineering in multiple ways. First of all, they directly increase reliability by reducing the frequency and severity of failures. Run-time protections may also detect faults before they cause serious failures. After failures, they make fault diagnosis easier, and thus accelerate reliability improvements. For reliability assessment, these failure prevention methods reduce the uncertainties of application interdependencies or unexpected environments. So, for instance, having sufficient separation between running applications ensures that when we port an application to a new platform, we can trust its failure rate to equal that experienced in a similar use on a previous platform plus that of the new platform, rather than being also affected by the specific combination of other applications present on the new platform. Structure-based reliability models can then be employed with this system aspect in place. With this modeling framework assisted by well-engineered software architecture, the range of applicability of structure-based models can further be increased. Examples of new applications could be to specify and investigate failure dependence between components, to cope with wide variations of reliability depending on the usage environment, and to assess the impact of system risk when components are checked-in or checked-out of the system.
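A minimal sketch of what such a structure-based model can look like, in the spirit of Markov-style architecture-based reliability models, with an invented three-component usage profile and invented per-visit component reliabilities; the session reliability is estimated here by Monte Carlo simulation:

```python
# A minimal sketch of architecture-based (structure-based) reliability estimation:
# a user session walks an assumed usage transition graph, and each component visit
# succeeds with an assumed per-visit reliability.
import random

random.seed(3)
reliability = {"UI": 0.9999, "Logic": 0.999, "DB": 0.9995}           # assumed
transitions = {                                                       # assumed usage profile
    "UI":    [("Logic", 0.7), ("END", 0.3)],
    "Logic": [("DB", 0.6), ("UI", 0.2), ("END", 0.2)],
    "DB":    [("Logic", 0.9), ("END", 0.1)],
}

def run_once(start="UI"):
    state = start
    while state != "END":
        if random.random() > reliability[state]:
            return False                          # component failed during this visit
        next_states, probs = zip(*transitions[state])
        state = random.choices(next_states, weights=probs, k=1)[0]
    return True

runs = 100_000
successes = sum(run_once() for _ in range(runs))
print(f"estimated system reliability per session: {successes / runs:.4f}")
```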

4.2. Achieving design for reliability

To achieve reliable system design, fault tolerance mechanisms need to be in place. A typical response to system or software faults during operation includes a sequence of stages: Fault confinement, Fault detection, Diagnosis, Reconfiguration, Recovery, Restart, Repair, and Reintegration. Modern software systems pose challenging research issues in these stages, which are described as follows:

1. Fault confinement. This stage limits the spread of fault effects to one area of the system, thus preventing contamination of other areas. Fault-confinement can be achieved through use of self-checking acceptance tests, exception handling routines, consistency checking mechanisms, and multiple requests/confirmations. As the erroneous system behaviours due to software faults are typically unpredictable, reduction of dependencies is the key to successful confinement of software faults. This has been an open problem for software reliability engineering, and will remain a tough research challenge.

2. Fault detection. This stage recognizes that something unexpected has occurred in the system. Fault latency is the period of time between the occurrence of a software fault and its detection. The shorter it is, the better the system can recover. Techniques fall into two classes: off-line and on-line. Off-line techniques such as diagnostic programs can offer comprehensive fault detection, but the system cannot perform useful work while under test. On-line techniques, such as watchdog monitors or redundancy schemes, provide a real-time detection capability that is performed concurrently with useful work.

3. Diagnosis. This stage is necessary if the fault detection technique does not provide information about the failure location and/or properties. On-line, failure-prevention diagnosis is the research trend. When the diagnosis indicates unhealthy conditions in the system (such as low available system resources), software rejuvenation can be performed to achieve in-time transient failure prevention.

4. Reconfiguration. This stage occurs when a fault is detected and a permanent failure is located. The system may reconfigure its components either to replace the failed component or to isolate it from the rest of the system. Successful reconfiguration requires robust and flexible software architecture and the associated reconfiguration schemes.

5. Recovery. This stage utilizes techniques to eliminate the effects of faults. Basic recovery approaches are based on fault masking, retry, and rollback. Fault-masking techniques hide the effects of failures by allowing redundant, correct information to outweigh the incorrect information. To handle design (permanent) faults, N-version programming can be employed. Retry, on the other hand, attempts a second try at an operation and is based on the premise that many faults are transient in nature. A recovery blocks approach is engaged to recover from software design faults in this case. Rollback makes use of the system operation having been backed up (checkpointed) to some point in its processing prior to fault detection, and operation recommences from this point (a minimal retry-with-rollback sketch is given after this list). Fault latency is important here because the rollback must go back far enough to avoid the effects of undetected errors that occurred before the detected error. The effectiveness of design diversity as represented by N-version programming and recovery blocks, however, continues to be actively debated.

6. Restart. This stage occurs after the recovery of undamaged information. Depending on the way the system is configured, hot restart, warm restart, or cold restart can be achieved. In hot restart, resumption of all operations from the point of fault detection can be attempted, and this is possible only if no damage has occurred. In warm restart, only some of the processes can be resumed without loss; while in cold restart, complete reload of the system is performed with no processes surviving.

7. Repair. In this stage, a failed component is replaced. Repair can be off-line or on-line. In off-line repair, if proper component isolation can be achieved, the system can continue operating while the failed component is removed for repair. Otherwise, the system must be brought down to perform the repair, and so system availability and reliability depend on how fast a fault can be located and removed. In on-line repair, the component may be replaced immediately with a backup spare (in a procedure equivalent to reconfiguration), or operation may continue without the faulty component (for example, masking redundancy or graceful degradation). With on-line repair, system operation is not interrupted; however, achieving complete and seamless repair poses a major challenge to researchers.

8. Reintegration. In this stage the repaired module must be reintegrated into the system. For on-line repair, reintegration must be performed without interrupting system operation.
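The retry-with-rollback sketch referenced under the Recovery stage is given below; the flaky operation, failure probability, and retry budget are all invented for illustration:

```python
# A minimal sketch of retry with checkpoint/rollback recovery (assumed operations):
# state is checkpointed before each step; on failure the step is retried from the
# last checkpoint, on the premise that many faults are transient.
import copy
import random

random.seed(11)

def flaky_step(state):
    """A hypothetical operation that fails transiently about 30% of the time."""
    if random.random() < 0.3:
        raise RuntimeError("transient fault")
    state["processed"] += 1
    return state

def run_with_rollback(steps=5, max_retries=5):
    state = {"processed": 0}
    for _ in range(steps):
        checkpoint = copy.deepcopy(state)          # checkpoint before the step
        for attempt in range(max_retries + 1):
            try:
                state = flaky_step(state)
                break
            except RuntimeError:
                state = copy.deepcopy(checkpoint)  # rollback and retry
        else:
            print("retries exhausted: escalate to reconfiguration/repair")
            return state
    return state

print(run_with_rollback())
```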

Design for reliability techniques can further be pursued in four different areas: fault avoidance, fault detection, masking redundancy, and dynamic redundancy. Non-redundant systems are fault intolerant and, to achieve reliability, generally use fault avoidance techniques. Redundant systems typically use fault detection, masking redundancy, and dynamic redundancy to automate one or more of the stages of fault handling. The main design consideration for software fault tolerance is cost-effectiveness. The resulting design has to be effective in providing better reliability, yet it should not introduce excessive cost, including performance penalty and unwarranted complexity, which may eventually prove unworthy of the investment.

4.3. Testing for reliability assessment

Software testing and software reliability have traditionally belonged to two separate communities. Software testers test software without referring to how software will operate in the field, as often the environment cannot be fully represented in the laboratory. Consequently they design test cases for exceptional and boundary conditions, and they spend more time trying to break the software than conducting normal operations. Software reliability measurers, on the other hand, insist that software should be tested according to its operational profile in order to allow accurate reliability estimation and prediction. In the future, it will be important to bring the two groups together, so that on the one hand, software testing can be effectively conducted, while on the other hand, software reliability can be accurately measured. One approach is to measure the test compression factor, which is defined as the ratio between the mean time between failures during operation and during testing. This factor can be empirically determined so that software reliability in the field can be predicted from that estimated during testing. Another approach is to ascertain how other testing related factors can be incorporated into software reliability modeling, so that accurate measures can be obtained based on the effectiveness of testing efforts.

Recent studies have investigated the effect of code coverage on fault detection under different testing profiles, using different coverage metrics, and have studied its application in reducing test set size [30]. Experimental data are required to evaluate code coverage and determine whether it is a trustworthy indicator for the effectiveness of a test set with respect to fault detection capability. Also, the effect of code coverage on fault detection may vary under different testing profiles. The correlation between code coverage and fault coverage should be examined across different testing schemes, including function testing, random testing, normal testing, and exception testing. In other words, white box testing and black box testing should be cross-checked for their effectiveness in exploring faults, and thus in increasing reliability.

Furthermore, evidence for variation between different coverage metrics can also be established. Some metrics may be independent and some correlated. The quantitative relationship between different code coverage metrics and fault detection capability should be assessed, so that redundant metrics can be removed and orthogonal ones can be combined. New findings about the effect of code coverage and other metrics on fault detection can be used to guide the selection and evaluation of test cases under various testing profiles, and a systematic testing scheme with predictable reliability achievement can therefore be derived.

Reducing test set size is a key goal in software testing. Different testing metrics should be evaluated regarding whether they are good filters in reducing the test set size while maintaining the same effectiveness in achieving reliability. This assessment should be conducted under various testing scenarios [8]. If such a filtering capability can be established, then the effectiveness of test cases can be quantitatively determined when they are designed. This would allow the prediction of reliability growth with the creation of a test set before it is executed on the software, thus facilitating early reliability prediction and possible feedback control for better test set design schemes.

Other than linking software testing and reliability with code coverage, statistical learning techniques may offer another promising avenue to explore. In particular, statistical debugging approaches [26, 52], whose original purpose was to identify software faults with probabilistic modeling of program predicates, can provide a fine quantitative assessment of program codes with respect to software faults. They can therefore help to establish accurate software reliability prediction models based on program structures under testing.
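A minimal sketch of the predicate-ranking idea behind such statistical debugging approaches, with invented run counts; predicates whose truth raises the probability of failure beyond their background rate float to the top:

```python
# A minimal sketch of predicate-based statistical debugging (counts are invented).
# counts[p] = (true_in_failing, true_in_passing, observed_in_failing, observed_in_passing)
counts = {
    "ptr == NULL at foo.c:42":   (18, 2, 20, 80),
    "x > len at util.c:10":      (5, 40, 20, 80),
    "retval < 0 at io.c:77":     (12, 30, 20, 80),
}

def increase(true_fail, true_pass, obs_fail, obs_pass):
    failure = true_fail / (true_fail + true_pass)    # P(fail | predicate true)
    context = obs_fail / (obs_fail + obs_pass)       # P(fail | predicate observed)
    return failure - context

scores = {p: increase(*c) for p, c in counts.items()}
for p, s in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{s:+.3f}  {p}")
```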

4.4. Metrics for reliability prediction

Today it is almost a mandate for companies to collect software metrics as an indication of a maturing software development process. While it is not hard to collect metrics data, it is not easy to collect clean and consistent data. It is even more difficult to derive meaningful results from the collected metrics data. Collecting metrics data for software reliability prediction purposes across various projects and applications is a major challenge. Moreover, industrial software engineering data, particularly those related to system failures, are historically hard to obtain across a range of organizations. It will be important for a variety of sources (such as NASA, Microsoft, IBM, Cisco, etc.) across industry and academia to make available real-failure data for joint investigation to establish credible reliability analysis procedures. Such a joint effort should define (1) what data to collect by considering domain sensitivities, accessibility, privacy, and utility; (2) how to collect data in terms of tools and techniques; and (3) how to interpret and analyze the data using existing techniques.

In addition to industrial data collection efforts, novel methods to improve reliability prediction are actively being researched. For example, by extracting rich information from metrics data using a sound statistical and probability foundation, Bayesian Belief Networks (BBNs) offer a promising direction for investigation in software engineering [7]. BBNs provide an attractive formalism for different software cases. The technique allows software engineers to describe prior knowledge about software development quality and software verification and validation (SV&V) quality, with manageable visual descriptions and automated inferences. The software reliability process can then be modified with inference from observed failures, and future reliability can be predicted. With proper engagement of software metrics, this is likely to be a powerful tool for reliability assessment of software based systems, finding applications in predicting software defects, forecasting software reliability, and determining runaway projects [1].
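A minimal sketch of the BBN idea, reduced to two discrete causes and a single defect node with invented priors and conditional probabilities, computed by direct enumeration rather than a BBN library:

```python
# A minimal two-cause BBN sketch for defect prediction (all probabilities invented):
# process quality and V&V quality jointly determine the chance of a high residual
# defect density, and observing evidence updates the belief.
cpt_defects = {   # P(high defects | process_good, vv_good)
    (True, True): 0.05, (True, False): 0.25,
    (False, True): 0.30, (False, False): 0.70,
}
p_process_good = 0.7      # prior belief about development process quality
p_vv_good = 0.6           # prior belief about verification & validation quality

def p_high_defects(process_evidence=None, vv_evidence=None):
    total = 0.0
    for proc in (True, False):
        if process_evidence is not None and proc != process_evidence:
            continue
        for vv in (True, False):
            if vv_evidence is not None and vv != vv_evidence:
                continue
            p_proc = p_process_good if proc else 1 - p_process_good
            p_vv = p_vv_good if vv else 1 - p_vv_good
            weight = (1.0 if process_evidence is not None else p_proc) * \
                     (1.0 if vv_evidence is not None else p_vv)
            total += weight * cpt_defects[(proc, vv)]
    return total

print(f"prior P(high defects)            = {p_high_defects():.3f}")
print(f"after observing poor V&V quality = {p_high_defects(vv_evidence=False):.3f}")
```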

Furthermore, traditional reliability models can be enhanced to incorporate testing completeness or effectiveness metrics, such as code coverage, in addition to their traditional testing-time-based metrics. The key idea is that failure detection is related not only to the time the software spends under testing, but also to the fraction of the code that has been executed by the testing.

The effect of testing time on reliability can be estimated using distributions from traditional SRGMs. However, new models are needed to describe the effect of coverage on reliability. These two dimensions, testing time and coverage, are not orthogonal. The degree of dependency between them is thus an open problem for investigation. Formulation of new reliability models which integrate time and coverage measurements for reliability prediction would be a promising direction.
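Purely as an illustration of how time and coverage might be combined (the functional form, parameters, and coverage weighting below are assumptions for the sketch, not a published model), one could scale testing time by the achieved coverage before feeding it into an exponential-style mean value function:

```python
import math

def expected_failures(time_units, coverage, a=100.0, b=0.05):
    """Illustrative two-factor mean value function: the usual testing-time
    argument of an exponential SRGM is replaced by a coverage-weighted
    'effective effort'.  a = assumed total fault content, b = detection rate."""
    effective_effort = time_units * coverage    # hours scaled by coverage in [0, 1]
    return a * (1.0 - math.exp(-b * effective_effort))

# Same testing time exposes more failures when more of the code is exercised.
for t, c in [(10, 0.2), (10, 0.6), (40, 0.6), (40, 0.9)]:
    print(f"t = {t:3d} h, coverage = {c:.1f} -> "
          f"expected failures = {expected_failures(t, c):.1f}")
```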

One drawback of the current metrics and data collection process is that it is a one-way, open-loop avenue: while metrics of the development process can indicate or predict the outcome quality, such as the reliability, of the resulting product, they often cannot provide feedback to the process regarding how to make improvements. Metrics would present tremendous benefits to reliability engineering if they could achieve not just prediction, but also refinement. Traditional software reliability models take metrics (such as defect density or times between failures) as input and produce a reliability quantity as output. In the future, the reverse function is urgently called for: given a reliability goal, what should the reliability process (and the resulting metrics) look like? By providing such feedback, a closed-loop software reliability engineering process can be informative as well as beneficial in achieving predictably reliable software.

4.5. Reliability for emerging software applications

Software engineering targeted for general systems may be too ambitious. It may find more successful applications if it is domain-specific. In this Future of Software Engineering volume, future software engineering techniques for a number of emerging application domains have been thoroughly discussed. Emerging software applications also create abundant opportunities for domain-specific reliability engineering.

One key industry in which software will have a tremendous presence is the service industry. Service-oriented design has been employed since the 1990s in the telecommunications industry, and it reached the software engineering community as a powerful paradigm for Web service development, in which standardized interfaces and protocols gradually enabled the use of third-party functionality over the Internet, creating seamless vertical integration and enterprise process management for cross-platform, cross-provider, and cross-domain applications. Based on the future trends for Web application development laid out in [22], software reliability engineering for this emerging technique poses enormous challenges and opportunities. The design of reliable Web services and the assessment of Web service reliability are novel and open research questions. On the one hand, having abundant service providers for a Web service suddenly makes the design diversity approach appealing, as the diversified service design is perceived not as a cost, but as an available resource. On the other hand, this unplanned diversity may not be equipped with the necessary quality, and the compatibility among various service providers can pose major problems. Seamless Web service composition in this emerging application domain is therefore a central issue for reliability engineering. Extensive experiments are required in the area of measuring Web service reliability. Some investigations have been initiated with limited success [27], but more efforts are needed.

Researchers have proposed the publish/subscribe paradigm as a basis for middleware platforms that support software applications composed of highly evolvable and dynamic federations of components. In this approach, components do not interact with each other directly; instead an additional middleware mediates their communications. Publish/subscribe middleware decouples the communication among components and supports implicit bindings among components. The sender does not know the identity of the receivers of its messages, but the middleware identifies them dynamically. Consequently new components can dynamically join the federation, become immediately active, and cooperate with the other components without requiring any reconfiguration of the architecture. Interested readers can refer to [21] for future trends in middleware-based software engineering technologies.

The open system approach is another trend in software applications. Closed-world assumptions do not hold in an increasing number of cases, especially in ubiquitous and pervasive computing settings, where the world is intrinsically open. Applications cover a wide range of areas, from dynamic supply-chain management, dynamic enterprise federations, and virtual endeavors on the enterprise level, to automotive applications and home automation on the embedded-systems level. In an open world, the environment changes continuously. Software must adapt and react dynamically to changes, even if they are unanticipated. Moreover, the world is open to new components that context changes could make dynamically available, for example due to mobility. Systems can discover and bind such components dynamically to the application while it is executing. The software must therefore exhibit a self-organization capability. In other words, the traditional solution that software designers have adopted (carefully elicit change requests, prioritize them, specify them, design the changes, implement and test them, then redeploy the software) is no longer viable. More flexible and dynamically adjustable reliability engineering paradigms for rapid responses to software evolution are required.

5. Conclusions

As the cost of software application failures grows, and as these failures increasingly impact business performance, software reliability will become progressively more important. Employing effective software reliability engineering techniques to improve product and process reliability will be in the industry's best interests, as well as one of its major challenges. In this paper, we have reviewed the history of software reliability engineering, the current trends and existing problems, and specific difficulties. Possible future directions and promising research problems in software reliability engineering have also been addressed. We have laid out the current and possible future trends for software reliability engineering in terms of meeting industry and customer needs. In particular, we have identified new software reliability engineering paradigms by taking software architectures, testing techniques, and software failure manifestation mechanisms into consideration. Some thoughts on emerging software applications have also been provided.

References

[1] S. Amasaki, O. Mizuno, T. Kikuno, and Y. Takagi, "A Bayesian Belief Network for Predicting Residual Faults in Software Products," Proceedings of the 14th International Symposium on Software Reliability Engineering (ISSRE 2003), November 2003, pp. 215-226.
[2] ANSI/IEEE, Standard Glossary of Software Engineering Terminology, STD-729-1991, ANSI/IEEE, 1991.
[3] L. Baresi, E. Nitto, and C. Ghezzi, "Toward Open-World Software: Issues and Challenges," IEEE Computer, October 2006, pp. 36-43.
[4] A. Bertolino, "Software Testing Research: Achievements, Challenges, Dreams," Future of Software Engineering 2007, L. Briand and A. Wolf (eds.), IEEE-CS Press, 2007.
[5] J. Bishop and N. Horspool, "Cross-Platform Development: Software That Lasts," IEEE Computer, October 2006, pp. 26-35.
[6] L. Briand and D. Pfahl, "Using Simulation for Assessing the Real Impact of Test Coverage on Defect Coverage," IEEE Transactions on Reliability, vol. 49, no. 1, March 2000, pp. 60-70.
[7] J. Cheng, D.A. Bell, and W. Liu, "Learning Belief Networks from Data: An Information Theory Based Approach," Proceedings of the Sixth International Conference on Information and Knowledge Management, Las Vegas, 1997, pp. 325-331.
[8] X. Cai and M.R. Lyu, "The Effect of Code Coverage on Fault Detection Under Different Testing Profiles," ICSE 2005 Workshop on Advances in Model-Based Software Testing (A-MOST), St. Louis, Missouri, May 2005.
[9] X. Cai, M.R. Lyu, and K.F. Wong, "A Generic Environment for COTS Testing and Quality Prediction," Testing Commercial-off-the-shelf Components and Systems, S. Beydeda and V. Gruhn (eds.), Springer-Verlag, Berlin, 2005, pp. 315-347.
[10] X. Cai, M.R. Lyu, and M.A. Vouk, "An Experimental Evaluation on Reliability Features of N-Version Programming," in Proceedings of the 16th International Symposium on Software Reliability Engineering (ISSRE 2005), Chicago, Illinois, Nov. 8-11, 2005.
[11] X. Cai and M.R. Lyu, "An Empirical Study on Reliability and Fault Correlation Models for Diverse Software Systems," in Proceedings of the 15th International Symposium on Software Reliability Engineering (ISSRE 2004), Saint-Malo, France, Nov. 2004, pp. 125-136.
[12] M. Chen, M.R. Lyu, and E. Wong, "Effect of Code Coverage on Software Reliability Measurement," IEEE Transactions on Reliability, vol. 50, no. 2, June 2001, pp. 165-170.
[13] M.H. Chen, A.P. Mathur, and V.J. Rego, "Effect of Testing Techniques on Software Reliability Estimates Obtained Using Time Domain Models," in Proceedings of the 10th Annual Software Reliability Symposium, Denver, Colorado, June 1992, pp. 116-123.
[14] J.B. Dugan and M.R. Lyu, "Dependability Modeling for Fault-Tolerant Software and Systems," in Software Fault Tolerance, M.R. Lyu (ed.), New York: Wiley, 1995, pp. 109-138.
[15] D.E. Eckhardt and L.D. Lee, "A Theoretical Basis for the Analysis of Multiversion Software Subject to Coincident Errors," IEEE Transactions on Software Engineering, vol. 11, no. 12, December 1985, pp. 1511-1517.
[16] P.G. Frankl and E.J. Weyuker, "An Applicable Family of Data Flow Testing Criteria," IEEE Transactions on Software Engineering, vol. 14, no. 10, October 1988, pp. 1483-1498.
[17] J.R. Horgan, S. London, and M.R. Lyu, "Achieving Software Quality with Testing Coverage Measures," IEEE Computer, vol. 27, no. 9, September 1994, pp. 60-69.
[18] C.Y. Huang and M.R. Lyu, "Optimal Release Time for Software Systems Considering Cost, Testing-Effort, and Test Efficiency," IEEE Transactions on Reliability, vol. 54, no. 4, December 2005, pp. 583-591.
[19] C.Y. Huang, M.R. Lyu, and S.Y. Kuo, "A Unified Scheme of Some Non-Homogeneous Poisson Process Models for Software Reliability Estimation," IEEE Transactions on Software Engineering, vol. 29, no. 3, March 2003, pp. 261-269.
[20] W.S. Humphrey, "The Future of Software Engineering: I," Watts New Column, News at SEI, vol. 4, no. 1, March 2001.
[21] V. Issarny, M. Caporuscio, and N. Georgantas, "A Perspective on the Future of Middleware-Based Software Engineering," Future of Software Engineering 2007, L. Briand and A. Wolf (eds.), IEEE-CS Press, 2007.
[22] M. Jazayeri, "Web Application Development: The Coming Trends," Future of Software Engineering 2007, L. Briand and A. Wolf (eds.), IEEE-CS Press, 2007.
[23] Z. Jelinski and P.B. Moranda, "Software Reliability Research," in Proceedings of the Statistical Methods for the Evaluation of Computer System Performance, Academic Press, 1972, pp. 465-484.
[24] B. Littlewood and L. Strigini, "Software Reliability and Dependability: A Roadmap," in Proceedings of the 22nd International Conference on Software Engineering (ICSE 2000), Limerick, June 2000, pp. 177-188.
[25] B. Littlewood and D. Miller, "Conceptual Modeling of Coincident Failures in Multiversion Software," IEEE Transactions on Software Engineering, vol. 15, no. 12, December 1989, pp. 1596-1614.
[26] C. Liu, L. Fei, X. Yan, J. Han, and S. Midkiff, "Statistical Debugging: A Hypothesis Testing-based Approach," IEEE Transactions on Software Engineering, vol. 32, no. 10, October 2006, pp. 831-848.
[27] N. Looker and J. Xu, "Assessing the Dependability of SOAP-RPC-Based Web Services by Fault Injection," in Proceedings of the 9th IEEE International Workshop on Object-oriented Real-time Dependable Systems, 2003, pp. 163-170.
[28] M.R. Lyu (ed.), Handbook of Software Reliability Engineering, IEEE Computer Society Press and McGraw-Hill, 1996.
[29] M.R. Lyu and X. Cai, "Fault-Tolerant Software," Encyclopedia on Computer Science and Engineering, Benjamin Wah (ed.), Wiley, 2007.
[30] M.R. Lyu, Z. Huang, S. Sze, and X. Cai, "An Empirical Study on Testing and Fault Tolerance for Software Reliability Engineering," in Proceedings of the 14th IEEE International Symposium on Software Reliability Engineering (ISSRE 2003), Denver, Colorado, November 2003, pp. 119-130.
[31] Y.K. Malaiya, N. Li, J.M. Bieman, and R. Karcich, "Software Reliability Growth with Test Coverage," IEEE Transactions on Reliability, vol. 51, no. 4, December 2002, pp. 420-426.
[32] T. Margaria and B. Steffen, "Service Engineering: Linking Business and IT," IEEE Computer, October 2006, pp. 45-55.
[33] J.D. Musa, Software Reliability Engineering: More Reliable Software Faster and Cheaper (2nd Edition), AuthorHouse, 2004.
[34] J.D. Musa, "Operational Profiles in Software Reliability Engineering," IEEE Software, vol. 10, no. 2, March 1993, pp. 14-32.
[35] J.D. Musa, A. Iannino, and K. Okumoto, Software Reliability: Measurement, Prediction, Application, McGraw-Hill, Inc., New York, NY, 1987.
[36] H. Pham, Software Reliability, Springer, Singapore, 2000.
[37] P.T. Popov, L. Strigini, J. May, and S. Kuball, "Estimating Bounds on the Reliability of Diverse Systems," IEEE Transactions on Software Engineering, vol. 29, no. 4, April 2003, pp. 345-359.
[38] S. Rapps and E.J. Weyuker, "Selecting Software Test Data Using Data Flow Information," IEEE Transactions on Software Engineering, vol. 11, no. 4, April 1985, pp. 367-375.
[39] Rome Laboratory (RL), Methodology for Software Reliability Prediction and Assessment, Technical Report RL-TR-92-52, volumes 1 and 2, 1992.
[40] M.L. Shooman, Reliability of Computer Systems and Networks: Fault Tolerance, Analysis and Design, Wiley, New York, 2002.
[41] R. Taylor and A. van der Hoek, "Software Design and Architecture: The Once and Future Focus of Software Engineering," Future of Software Engineering 2007, L. Briand and A. Wolf (eds.), IEEE-CS Press, 2007.
[42] X. Teng, H. Pham, and D. Jeske, "Reliability Modeling of Hardware and Software Interactions, and Its Applications," IEEE Transactions on Reliability, vol. 55, no. 4, Dec. 2006, pp. 571-577.
[43] L.A. Tomek and K.S. Trivedi, "Analyses Using Stochastic Reward Nets," in Software Fault Tolerance, M.R. Lyu (ed.), New York: Wiley, 1995, pp. 139-165.
[44] W. Torres-Pomales, "Software Fault Tolerance: A Tutorial," NASA Langley Research Center, Hampton, Virginia, TM-2000-210616, Oct. 2000.
[45] K.S. Trivedi, "SHARPE 2002: Symbolic Hierarchical Automated Reliability and Performance Evaluator," in Proceedings of the International Conference on Dependable Systems and Networks, 2002.
[46] K.S. Trivedi, K. Vaidyanathan, and K. Goseva-Popstojanova, "Modeling and Analysis of Software Aging and Rejuvenation," in Proceedings of the 33rd Annual Simulation Symposium, IEEE Computer Society Press, Los Alamitos, CA, 2000, pp. 270-279.
[47] A. von Mayrhauser and D. Chen, "Effect of Fault Distribution and Execution Patterns on Fault Exposure in Software: A Simulation Study," Software Testing, Verification & Reliability, vol. 10, no. 1, March 2000, pp. 47-64.
[48] M.A. Vouk, "Using Reliability Models During Testing With Nonoperational Profiles," in Proceedings of the 2nd Bellcore/Purdue Workshop on Issues in Software Reliability Estimation, October 1992, pp. 103-111.
[49] W. Wang and M. Tang, "User-Oriented Reliability Modeling for a Web System," in Proceedings of the 14th International Symposium on Software Reliability Engineering (ISSRE 2003), Denver, Colorado, November 2003, pp. 1-12.
[50] M. Xie, Software Reliability Modeling, World Scientific Publishing Company, 1991.
[51] S. Yacoub, B. Cukic, and H. Ammar, "A Scenario-Based Reliability Analysis Approach for Component-Based Software," IEEE Transactions on Reliability, vol. 53, no. 4, 2004, pp. 465-480.
[52] A.X. Zheng, M.I. Jordan, B. Liblit, M. Naik, and A. Aiken, "Statistical Debugging: Simultaneous Identification of Multiple Bugs," in Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, PA, 2006, pp. 1105-1112.


Software Reliability Testing

From Wikipedia, the free encyclopedia

Software reliability testing is a field of testing concerned with checking the ability of software to function under given environmental conditions for a particular amount of time, taking into account the precision of the software. In software reliability testing, problems in the software's design and functionality are discovered, and assurance is given that the system meets all requirements. Software reliability is the probability that software will work properly in a specified environment and for a given period of time.

Probability = Number of cases in which a failure occurs / Total number of cases under consideration

Using this formula, the failure probability is estimated by testing a sample of all available input states. The set of all possible input states is called the input space. To find the reliability of the software, the output space is derived from the given input space and the software. [1]
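A minimal sketch of this sampling idea, assuming a hypothetical program under test, a reference oracle, and a small integer input space:

```python
import random

def estimate_failure_probability(program, oracle, input_space, sample_size=1000):
    """Estimate the failure probability by running the program on a random
    sample of the input space and comparing the output against an oracle."""
    sample = random.sample(input_space, min(sample_size, len(input_space)))
    failures = sum(1 for x in sample if program(x) != oracle(x))
    return failures / len(sample)

if __name__ == "__main__":
    # Hypothetical program under test: integer square root with a seeded bug.
    def program(x):
        return int(x ** 0.5) if x != 10 else 4   # wrong answer for x == 10
    def oracle(x):
        return int(x ** 0.5)

    inputs = list(range(100))                    # the input space
    p_fail = estimate_failure_probability(program, oracle, inputs, 50)
    print("estimated failure probability:", p_fail)
    print("estimated reliability:", 1 - p_fail)
```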

Contents

1 Overview
2 Objective of reliability testing
  2.1 Secondary objectives
  2.2 Points for defining objectives
3 Need of reliability testing
4 Types of reliability testing
  4.1 Feature test
  4.2 Load test
  4.3 Regression test
5 Tests planning
  5.1 Steps for planning
  5.2 Problems in designing test cases
6 Reliability enhancement through testing
  6.1 Reliability growth testing
  6.2 Designing test cases for current release
7 Reliability evaluation based on operational testing
  7.1 Reliability growth assessment and prediction
  7.2 Reliability estimation based on failure-free working
8 See also
9 References
10 External links

Overview

To perform software testing, it is necessary to design test cases and a test procedure for each software module. Data for reliability testing is gathered from various stages of development, such as the design and operating stages. The tests are limited because of restrictions such as the cost of performing them and time constraints. Statistical samples are obtained from the software products to test the reliability of the software. Once sufficient data or information is gathered, statistical studies are performed. Time constraints are handled by applying fixed dates or deadlines to the tests to be performed; after this phase, the design of the software is frozen and the actual implementation starts. As there are restrictions on cost and time, the data is gathered carefully so that each data point has a purpose and the expected precision. [2] To achieve satisfactory results from reliability testing, one must take care of certain reliability characteristics. For example, Mean Time To Failure (MTTF) [3] is measured in terms of three factors:

1. Operating Time.

2. Number of on/off cycles.

3. Calendar Time.

If the restriction is on operating time, or if the focus is on the first factor, then compressed time acceleration can be applied to reduce the test time. If the focus is on calendar time (that is, there are predefined deadlines), then intensified stress testing is used. [2]

Software reliability is measured in terms of Mean Time Between Failures (MTBF). [4]

MTBF consists of Mean Time To Failure (MTTF) and Mean Time To Repair (MTTR). MTTF is the time difference between two consecutive failures, and MTTR is the time required to fix a failure. [5] The reliability of good software should always lie between 0 and 1. Reliability increases as errors or bugs are removed from the program. [6]

For example, if MTBF = 1000 hours for a piece of software, then the software should, on average, operate for 1000 hours of continuous operation between failures.
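A minimal sketch of how MTTF, MTTR, and MTBF relate, computed from a made-up failure log (the timestamps and simple bookkeeping are assumptions for illustration):

```python
# Each entry: (time the failure occurred, time the repair finished), in hours.
failure_log = [(120.0, 124.0), (380.0, 381.5), (900.0, 905.0)]

uptimes, downtimes = [], []
last_restored = 0.0
for failed_at, repaired_at in failure_log:
    uptimes.append(failed_at - last_restored)    # time spent working
    downtimes.append(repaired_at - failed_at)    # time spent repairing
    last_restored = repaired_at

mttf = sum(uptimes) / len(uptimes)       # mean time to failure
mttr = sum(downtimes) / len(downtimes)   # mean time to repair
mtbf = mttf + mttr                       # mean time between failures

print(f"MTTF = {mttf:.1f} h, MTTR = {mttr:.1f} h, MTBF = {mtbf:.1f} h")
```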

Objective of reliability testing

The main objective of reliability testing is to assess the performance of the software under given conditions, using known and fixed procedures and without applying any corrective measures, while considering the software's specifications.

Secondary objectives

1. To find the structure of recurring failures.
2. To find the number of failures occurring in a specified amount of time.
3. To find the mean life of the software.
4. To discover the main cause of failure.
5. To check the performance of different units of the software after preventive actions have been taken.

Points for defining objectives

1. The behaviour of the software should be defined under the given conditions.
2. The objective should be feasible.
3. Time constraints should be provided. [7]

Need of reliability testing

Computer software is now applied in a great many fields, including critical applications in industry, the military, and commercial systems. Software engineering has been developing since the last century to support such software, yet there is still no complete measure for assessing it; software reliability measures are used as a tool for this assessment. Software reliability is therefore one of the most important attributes of any software. [8]

Assessing reliability is required in order to improve the performance of the software product and of the software development process. Reliability testing is of great use to software managers and practitioners. Thus, testing the reliability of software is ultimately important. [9]

Types of reliability testing

Software reliability testing involves checking the features provided by the software, the load that the software can handle, and regression testing. [10]

Feature test

A feature test for software is conducted in the following steps:

Each operation in the software is executed once.
Interaction between two operations is reduced.
Each operation is checked for its proper execution.

The feature test is followed by the load test. [10]

Load test

This test is conducted to check the performance of the software under maximum workload. Any software performs well up to a certain amount of load, after which the response time of the software starts degrading. For example, a web site can be tested to see how many simultaneous users it can serve without performance degradation. This testing mainly helps for databases and application servers. Load testing also requires software performance testing, which checks how well the software performs under workload. [10]
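A minimal sketch of such a load test, assuming a hypothetical URL and using simple threads to simulate increasing numbers of concurrent users (real load testing tools add pacing, think time, and richer reporting):

```python
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

URL = "http://example.com/"        # hypothetical system under test

def one_request(_):
    """Issue a single request and return its response time in seconds."""
    start = time.perf_counter()
    with urlopen(URL, timeout=10) as resp:
        resp.read()
    return time.perf_counter() - start

def run_load(concurrent_users, requests_per_user=5):
    """Fire requests from `concurrent_users` simulated users and
    return the average response time."""
    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        times = list(pool.map(one_request,
                              range(concurrent_users * requests_per_user)))
    return sum(times) / len(times)

if __name__ == "__main__":
    for users in (1, 5, 10, 20):               # increasing load steps
        print(users, "users ->", round(run_load(users), 3), "s average")
```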

Regression test

Regression testing is used to check whether any bug fixes in the software have introduced new bugs. It determines whether one part of the software affects another. Regression testing is conducted after every change in the software's features. This testing is periodic; the period depends on the size and features of the software. [10]

Tests planning

Reliability testing costs more compared with other types of testing, so proper management and planning are required. The plan includes the testing process to be implemented, data about the test environment, the test schedule, test points, etc.

Steps for planning

1. Find the main aim of testing.
2. Know the requirements of testing.
3. Review the existing data and check it against the requirements.
4. Considering the priorities of the tests, find out which tests are necessary.
5. Utilize the available time, money, and manpower properly.
6. Determine the specifications of the tests.
7. Allot different responsibilities to the testing teams.
8. Decide policies for reporting the results of testing.
9. Maintain control over the testing procedure throughout. [7]

Problems in designing test cases

There are some problems to be aware of when designing these tests.

Test cases can be selected simply by choosing valid input values for each field of the software, but after changes in a particular module the previously recorded input values need to be checked again; they may not exercise the new features introduced since the older version of the software.
There may be some critical runs in the software that are not handled by any test case, so careful test case selection is necessary. [10]

Reliability enhancement through testing

Studies during the development and design of software help improve the reliability of the product. Reliability testing is essentially performed to eliminate the failure modes of the software. Life testing of the product should always be done after the design is finished, or at least after the complete design has been finalized. [11] Failure analysis and design improvement are achieved through the following kinds of testing.

Reliability growth testing

This testing is used to check new prototypes of the software, which are initially expected to fail frequently. [11] The causes of failure are detected, and actions are taken to reduce defects. Suppose T is the total accumulated test time for the prototype and n(T) is the number of failures from the start up to time T. On a log-log scale, the graph of n(T)/T against T is a straight line; this graph is called a Duane plot, and from it one can estimate how much reliability can be gained from the remaining test-and-fix cycles:

ln( n(T)/T ) = b - alpha * ln(T)    (eq. 1)

Solving eq. 1 for n(T) gives

n(T) = K * T^(1 - alpha),

where K is e^b. If the value of alpha in the equation is zero, the reliability cannot be improved as expected for the given number of failures. For alpha greater than zero, the cumulative failure rate n(T)/T decreases as the cumulative test time T increases, i.e., reliability grows with continued testing.
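A small sketch of fitting the Duane plot line by least squares, using made-up cumulative test times and failure counts:

```python
import math

# Hypothetical reliability-growth data: cumulative test time T (hours)
# and cumulative number of failures n(T) observed by that time.
T = [10, 25, 50, 100, 200, 400]
n = [8, 15, 24, 36, 52, 74]

# Least-squares fit of ln(n/T) = b - alpha * ln(T)  (the Duane plot line).
xs = [math.log(t) for t in T]
ys = [math.log(c / t) for c, t in zip(n, T)]
x_mean, y_mean = sum(xs) / len(xs), sum(ys) / len(ys)
slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
         / sum((x - x_mean) ** 2 for x in xs))
alpha, b = -slope, y_mean - slope * x_mean
K = math.exp(b)

print(f"alpha = {alpha:.2f}, K = {K:.2f}")
# Predicted cumulative failures: n(T) = K * T**(1 - alpha)
print("predicted n(800 h) =", round(K * 800 ** (1 - alpha), 1))
```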

Designing test cases for current release

If a new operation is added in the current release of the software, then the test cases for that operation are written differently:

First, plan how many new test cases are to be written for the current version.
If the new feature is part of an existing feature, share the test cases between the new and existing features.
Finally, combine all test cases from the current version and the previous one, and record all the results. [10]

There is a predefined rule to calculate the count of new test cases for the software. If N is the probability of occurrence of new operations in the new release of the software, R is the probability of occurrence of used operations in the current release, and T is the number of all previously used test cases, then the number of new test cases for the current release is (N / R) x T.
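A short worked example of this rule, with hypothetical values for N, R, and T:

```python
# Hypothetical occurrence probabilities and test-case count.
N = 0.2    # occurrence probability of the new operations
R = 0.8    # occurrence probability of the previously used operations
T = 600    # number of previously used test cases

new_test_cases = round(N / R * T)
print(new_test_cases)   # 150 new test cases for the current release
```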

Reliability evaluation based on operational testing

Operational testing is used to assess the reliability of software. It checks the working of the software in its relevant operational environment; the main problem is constructing such an operational environment. This kind of simulation is found in some industries, such as the nuclear industry and aircraft. Predicting future reliability is a part of reliability evaluation. Two techniques are used for this:

Steady state reliability estimation: In this case, feedback from delivered software products is used. Depending on those results, the future reliability of the next version of the product is predicted. This is similar to sample testing of physical products.

Reliability growth based prediction: This method uses the documentation of the testing procedure. For example, consider a developed software product for which several new versions are created. Data about the testing of each version of the software is considered, and on the basis of the observed trend the reliability of the software is predicted. [12]

Reliability growth assessment and prediction

In the assessment and prediction of software reliability, a reliability growth model is used. During operation of the software, data about its failures is stored in statistical form and given as input to the reliability growth model. Using this data, the reliability growth model evaluates the reliability of the software. A lot of material about reliability growth models is available, with many probability models claiming to represent the failure process, but there is no single model that is best suited to all conditions, so a model has to be chosen according to the circumstances.
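As one concrete possibility, the sketch below fits the classical Goel-Okumoto reliability growth model to hypothetical cumulative failure data (the data values, the initial parameter guesses, and the use of SciPy's curve fitter are assumptions for the illustration):

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical failure data: cumulative failures observed by test time t (hours).
t = np.array([10, 20, 40, 80, 160, 320], dtype=float)
cumulative_failures = np.array([9, 16, 27, 41, 55, 66], dtype=float)

# Goel-Okumoto mean value function, a classical reliability growth model:
# mu(t) = a * (1 - exp(-b * t)); a = expected total failures, b = detection rate.
def mu(t, a, b):
    return a * (1.0 - np.exp(-b * t))

(a, b), _ = curve_fit(mu, t, cumulative_failures, p0=(100.0, 0.01))
print(f"a = {a:.1f} expected total failures, b = {b:.4f} per hour")

# Failure intensity after 400 hours of testing; its reciprocal approximates
# the expected time to the next failure at that point.
intensity = a * b * np.exp(-b * 400)
print(f"remaining failures ~ {a - mu(400, a, b):.1f}, "
      f"expected time to next failure ~ {1 / intensity:.1f} h")
```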

Reliability estimation based on failure-free working

In this case, the reliability of the software is estimated under assumptions such as the following:

If a bug is found, it is certain that it will be fixed by someone.
Fixing a bug will not affect the reliability of the software.
Each fix in the software is accurate. [12]

See also

Software testing
Load testing
Regression testing
Reliability engineering

References

1. ^ Software Reliability. Hoang Pham.
2. ^ a b E.E. Lewis. Introduction to Reliability Engineering.
3. ^ "MTTF" (http://www.weibull.com/hotwire/issue94/relbasics94.htm).
4. ^ Roger Pressman. Software Engineering: A Practitioner's Approach. McGraw-Hill.
5. ^ "Approaches to Reliability Testing & Setting of Reliability Test Objectives" (http://www.softwaretestinggenius.com/articalDetails.php?qry=963).
6. ^ Aditya P. Mathur. Foundations of Software Testing. Pearson.
7. ^ a b Reliability and Life Testing Handbook. Dimitri Kececioglu.
8. ^ A Statistical Basis for Software Reliability Assessment. M. Xie.
9. ^ Software Reliability Modelling. M. Xie.
10. ^ a b c d e f John D. Musa. Software Reliability Engineering: More Reliable Software, Faster and Cheaper. McGraw-Hill. ISBN 0-07-060319-7.
11. ^ a b E.E. Lewis. Introduction to Reliability Engineering. ISBN 0-471-01833-3.
12. ^ a b "Problem of Assessing Reliability". CiteSeerX: 10.1.1.104.9831 (http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.104.9831).

External links

Mean Time Between Failure (http://www.weibull.com/hotwire/issue94/relbasics94.htm/)
Software Life Testing (http://www.weibull.com/basics/accelerated.htm/)

Retrieved from "http://en.wikipedia.org/w/index.php?title=Software_Reliability_Testing&oldid=521833844" Categories: Software testing

This page was last modified on 7 November 2012 at 15:12.

Text is available under the Creative Commons Attribution-ShareAlike License; additional terms may apply. See Terms of Use for details. Wikipedia® is a registered trademark of the Wikimedia Foundation, Inc., a non-profit organization.


Software performance testing

From Wikipedia, the free encyclopedia

In software engineering, performance testing is in general testing performed to determine how a system performs in terms of responsiveness and stability under a particular workload. It can also serve to investigate, measure, validate or verify other quality attributes of the system, such as scalability, reliability and resource usage.

Performance testing is a subset of performance engineering, an emerging computer science practice which strives to build performance into the implementation, design and architecture of a system.

Contents

1 Performance testing types
  1.1 Load testing
  1.2 Stress testing
  1.3 Endurance testing (soak testing)
  1.4 Spike testing
  1.5 Configuration testing
  1.6 Isolation testing
2 Setting performance goals
  2.1 Concurrency/throughput
  2.2 Server response time
  2.3 Render response time
  2.4 Performance specifications
  2.5 Questions to ask
3 Pre-requisites for Performance Testing
  3.1 Test conditions
  3.2 Timing
4 Tools
5 Technology
6 Tasks to undertake
7 Methodology
  7.1 Performance testing web applications
8 See also
9 External links

Performance testing types

Load testing

Load testing is the simplest form of performance testing. A load test is usually conducted to understand the behaviour of the system under a specific expected load. This load can be the expected concurrent number of users on the application performing a specific number of transactions within the set duration. This test will give out the response times of all the important business critical transactions. If the database, application server, etc. are also monitored, then this simple test can itself point towards any bottlenecks in the application software.

Stress testing

Stress testing is normally used to understand the upper limits of capacity within the system. This kind of test is done to determine the system's robustness in terms of extreme load and helps application administrators to determine if the system will perform sufficiently if the current load goes well above the expected maximum.

Endurance testing (soak testing)

Endurance testing is usually done to determine if the system can sustain the continuous expected load. During endurance tests, memory utilization is monitored to detect potential leaks. Also important, but often overlooked, is performance degradation: that is, ensuring that the throughput and/or response times after some long period of sustained activity are as good as or better than at the beginning of the test. It essentially involves applying a significant load to a system for an extended, significant period of time. The goal is to discover how the system behaves under sustained use.

Spike testing

Spike testing is done by suddenly increasing the number of, or the load generated by, users by a very large amount and observing the behaviour of the system. The goal is to determine whether performance will suffer, the system will fail, or it will be able to handle dramatic changes in load.

Configuration testing

Rather than testing for performance from the perspective of load, tests are created to determine the effects of configuration changes to the system's components on the system's performance and behaviour. A common example would be experimenting with different methods of load-balancing.

Isolation testing

Isolation testing is not unique to performance testing but involves repeating a test execution that resulted in a system problem. Often used to isolate and confirm the fault domain.

Setting performance goals

Performance testing can serve different purposes.

It can demonstrate that the system meets performance criteria.goals Performance testing can serve different purposes. It can compare two systems to find which performs

It can compare two systems to find which performs better.can demonstrate that the system meets performance criteria. Or it can measure what parts of the

Or it can measure what parts of the system or workload causes the system to perform badly.It can compare two systems to find which performs better. Many performance tests are undertaken without

Many performance tests are undertaken without due consideration to the setting of realistic performance goals. The first question from a business perspective should always be "why are we performance testing?". These considerations are part of the business case of the testing. Performance goals will differ depending on the system's technology and purpose; however, they should always include some of the following:

Concurrency/throughput

If a system identifies end-users by some form of log-in procedure then a concurrency goal is highly desirable. By definition this is the largest number of concurrent system users that the system is expected to support at any given moment. The work-flow of your scripted transaction may impact true concurrency especially if the iterative part contains the log-in and log-out activity.

If the system has no concept of end-users, then the performance goal is likely to be based on a maximum throughput or transaction rate. A common example would be casual browsing of a web site such as Wikipedia.

Server response time

This refers to the time taken for one system node to respond to the request of another. A simple example would be an HTTP 'GET' request from a browser client to a web server. In terms of response time, this is what all load testing tools actually measure. It may be relevant to set server response time goals between all nodes of the system.

Render response time

Render response time is a difficult thing for load testing tools to deal with, as they generally have no concept of what happens within a node apart from recognizing a period of time where there is no activity 'on the wire'. To measure render response time, it is generally necessary to include functional test scripts as part of the performance test scenario, which is a feature not offered by many load testing tools.

Performance specifications

It is critical to detail performance specifications (requirements) and document them in any performance test plan. Ideally, this is done during the requirements development phase of any system development project, prior to any design effort. See Performance Engineering for more details.

However, performance testing is frequently not performed against a specification i.e. no one will have expressed what the maximum acceptable response time for a given population of users should be. Performance testing is frequently used as part of the process of performance profile tuning. The idea is to identify the “weakest link” – there is inevitably a part of the system which, if it is made to respond faster, will result in the overall system running faster. It is sometimes a difficult task to identify which part of the system represents this critical path, and some test tools include (or can have add-ons that provide) instrumentation that runs on the server (agents) and report transaction times, database access times, network overhead, and other server monitors, which can be analyzed together with the raw performance statistics. Without such instrumentation one might have to have someone crouched over Windows Task Manager at the server to see how much CPU load the performance tests are generating (assuming a Windows system is under test).

Performance testing can be performed across the web, and even done in different parts of the country, since it is known that the response times of the internet itself vary regionally. It can also be done in-house, although routers would then need to be configured to introduce the lag that would typically occur on public networks. Loads should be introduced to the system from realistic points. For example, if 50% of a system's user base will be accessing the system via a 56K modem connection and the other half over a T1, then the load injectors (computers that simulate real users) should either inject load over the same mix of connections (ideal) or simulate the network latency of such connections, following the same user profile.

It is always helpful to have a statement of the likely peak number of users that might be expected to use the system at peak times. If there can also be a statement of what constitutes the maximum allowable 95th percentile response time, then an injector configuration could be used to test whether the proposed system met that specification.
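A small sketch of such a check, assuming a hypothetical specification of "95% of requests must complete within 2.0 seconds at peak load" and a list of measured response times:

```python
# Measured response times (seconds) collected during a peak-load test run.
response_times = [0.8, 1.1, 0.9, 1.7, 2.4, 1.2, 1.0, 3.1, 1.4, 1.3,
                  0.7, 1.9, 1.1, 1.6, 2.2, 1.0, 1.2, 0.9, 1.5, 1.8]

def percentile(samples, p):
    """Nearest-rank percentile: the smallest value such that at least
    p% of the samples are less than or equal to it."""
    ordered = sorted(samples)
    rank = max(1, int(round(p / 100.0 * len(ordered))))
    return ordered[rank - 1]

SPEC_SECONDS = 2.0                       # hypothetical 95th-percentile target
p95 = percentile(response_times, 95)
print(f"95th percentile = {p95:.1f} s ->",
      "meets spec" if p95 <= SPEC_SECONDS else "violates spec")
```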


Questions to ask

Performance specifications should ask the following questions, at a minimum:

In detail, what is the performance test scope? What subsystems, interfaces, components, etc. are in and out of scope for this test?
For the user interfaces (UIs) involved, how many concurrent users are expected for each (specify peak vs. nominal)?
What does the target system (hardware) look like (specify all server and network appliance configurations)?
What is the Application Workload Mix of each system component? (for example: 20% log-in, 40% search, 30% item select, 10% checkout)
What is the System Workload Mix? [Multiple workloads may be simulated in a single performance test] (for example: 30% Workload A, 20% Workload B, 50% Workload C)
What are the time requirements for any/all back-end batch processes (specify peak vs. nominal)?

Pre-requisites for Performance Testing

A stable build of the system, which must resemble the production environment as closely as possible, is required.

The performance testing environment should not be combined with the user acceptance testing (UAT) or development environment. This is risky because, if UAT, integration, or other tests are running in the same environment, the results obtained from the performance testing may not be reliable. As a best practice, it is always advisable to have a separate performance testing environment resembling the production environment as much as possible.

Test conditions

In performance testing, it is often crucial (and often difficult to arrange) for the test conditions to be similar to the expected actual use. This is, however, not entirely possible in actual practice. The reason is that the workloads of production systems have a random nature, and while the test workloads do their best to mimic what may happen in the production environment, it is impossible to exactly replicate this workload variability - except in the most simple system.

Loosely-coupled architectural implementations (e.g.: SOA) have created additional complexities with performance testing. Enterprise services or assets (that share a common infrastructure or platform) require coordinated performance testing (with all consumers creating production-like transaction volumes and load on shared infrastructures or platforms) to truly replicate production-like states. Due to the complexity and financial and time requirements around this activity, some organizations now employ tools that can monitor and create production-like conditions (also referred as "noise") in their performance testing environments (PTE) to understand capacity and resource requirements and verify / validate quality attributes.

Timing

It is critical to the cost performance of a new system that performance test efforts begin at the inception of the development project and extend through to deployment. The later a performance defect is detected, the higher the cost of remediation. This is true in the case of functional testing, but even more so with performance testing, due to the end-to-end nature of its scope. It is always crucial for the performance test team to be involved as early as possible, because key performance prerequisites, e.g. acquisition and preparation of the performance test environment, are often lengthy and time-consuming.


Tools

In the diagnostic case, software engineers use tools such as profilers to measure what parts of a device or software contribute most to the poor performance, or to establish throughput levels (and thresholds) for maintaining acceptable response time.

Technology

Performance testing technology employs one or more PCs or Unix servers to act as injectors – each emulating the presence of numbers of users and each running an automated sequence of interactions (recorded as a script, or as a series of scripts to emulate different types of user interaction) with the host whose performance is being tested. Usually, a separate PC acts as a test conductor, coordinating and gathering metrics from each of the injectors and collating performance data for reporting purposes. The usual sequence is to ramp up the load – starting with a small number of virtual users and increasing the number over a period to some maximum. The test result shows how the performance varies with the load, given as number of users vs response time. Various tools are available to perform such tests. Tools in this category usually execute a suite of tests which will emulate real users against the system. Sometimes the results can reveal oddities, e.g., that while the average response time might be acceptable, there are outliers of a few key transactions that take considerably longer to complete – something that might be caused by inefficient database queries, pictures etc.

Performance testing can be combined with stress testing, in order to see what happens when an acceptable load is exceeded: does the system crash? How long does it take to recover if a large load is reduced? Does it fail in a way that causes collateral damage?

Analytical Performance Modeling is a method to model the behaviour of a system in a spreadsheet. The model is fed with measurements of transaction resource demands (CPU, disk I/O, LAN, WAN), weighted by the transaction mix (business transactions per hour). The weighted transaction resource demands are added up to obtain the hourly resource demands and divided by the hourly resource capacity to obtain the resource loads. Using the response time formula R = S / (1 - U) (R = response time, S = service time, U = load), response times can be calculated and calibrated with the results of the performance tests. Analytical performance modelling allows evaluation of design options and system sizing based on actual or anticipated business usage. It is therefore much faster and cheaper than performance testing, though it requires thorough understanding of the hardware platforms.
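A minimal sketch of this spreadsheet-style calculation, using the response time formula from the text together with hypothetical transaction rates and service demands:

```python
# Spreadsheet-style analytical model: U = load, R = S / (1 - U).
# The transaction mix and per-transaction CPU demands are hypothetical.
transactions = {
    # name: (transactions per hour, CPU seconds demanded per transaction)
    "login":    (3000, 0.030),
    "search":   (6000, 0.080),
    "checkout": (1500, 0.120),
}
cpu_capacity_sec_per_hour = 3600.0      # one CPU fully busy for an hour

# Hourly CPU demand, weighted by the transaction mix, gives the utilization U.
hourly_demand = sum(rate * s for rate, s in transactions.values())
utilization = hourly_demand / cpu_capacity_sec_per_hour

print(f"CPU utilization U = {utilization:.2f}")
for name, (rate, service_time) in transactions.items():
    r = service_time / (1.0 - utilization)        # R = S / (1 - U)
    print(f"{name:9s} service time {service_time * 1000:4.0f} ms -> "
          f"response time {r * 1000:5.0f} ms")
```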

Tasks to undertake

Tasks to perform such a test would include:

Decide whether to use internal or external resources to perform the tests, depending on in-house expertise (or lack thereof)
Gather or elicit performance requirements (specifications) from users and/or business analysts
Develop a high-level plan (or project charter), including requirements, resources, timelines and milestones
Develop a detailed performance test plan (including detailed scenarios and test cases, workloads, environment info, etc.)