
Software Reliability

Leralyn S. Quitaneg
Carl Angelo S. Pamplona

Software Reliability
Software reliability testing is a field of software testing that assesses a software product's ability to
function, under given environmental conditions, for a particular amount of time. It
helps discover many problems in the software's design and functionality.
Software reliability is an important attribute of software quality, together with functionality,
usability, performance, serviceability, capability, installability, and maintainability.
Software reliability is the probability that software will work properly in a specified environment
and for a given amount of time. Using the following formula, the probability of failure is
calculated by testing a sample of all available input states.
Probability = No. of failing cases / Total no. of cases
The set of all possible input states is called the input space. To find the reliability of the software, we
need to determine the output space that the software produces from the given input space.
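As a minimal sketch of the formula above, the following Python snippet estimates the probability of failure by running a sample of input states. The function under test (reciprocal), its oracle, and the sample are hypothetical and chosen only for illustration.

```python
def estimate_failure_probability(func, oracle, sample_inputs):
    """Return (number of failing cases) / (total number of cases)."""
    failures = 0
    for x in sample_inputs:
        try:
            ok = oracle(x, func(x))
        except Exception:
            ok = False  # an unhandled exception counts as a failure
        if not ok:
            failures += 1
    return failures / len(sample_inputs)


# Hypothetical software under test: raises an exception for the input 0.
def reciprocal(x):
    return 1 / x


# Hypothetical oracle: input times output should be (almost exactly) 1.
def reciprocal_oracle(x, y):
    return abs(x * y - 1) < 1e-9


sample = list(range(-5, 5))  # 10 sampled input states, including 0
p_fail = estimate_failure_probability(reciprocal, reciprocal_oracle, sample)
print(p_fail)  # 1 failing case out of 10 -> 0.1
```

The estimate is only as good as the sample: if the sampled inputs do not resemble real usage, the figure says little about reliability in operation.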
Functional and Non-functional Requirements
Functional requirements may be calculations, technical details, data manipulation and
processing, and other specific functionality that define what a system is supposed to accomplish. For example:
The system will check all operator inputs to see that they fall within their
required ranges.
The system will check all disks for bad blocks each time it is booted.
The system must be implemented using a standard implementation of Ada.
Non-functional requirements specify criteria that can be used to judge the operation of a
system, rather than specific behaviors. Reliability and availability are specified as part of the
non-functional requirements for the system.
The required level of reliability must be expressed quantitatively.
Reliability is a dynamic system attribute.
Source code reliability specifications are meaningless (e.g. N faults/1000 LOC)
An appropriate metric should be chosen to specify the overall system reliability.
Failure Probabilities
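If a system has two independent components, A and B, and its operation depends on both of them,
then the probability of system failure is approximately
P(S) = P(A) + P(B)
(exactly, P(S) = P(A) + P(B) - P(A)P(B); the product term is negligible when both probabilities are small).
If a component is replicated n times, then the probability of failure is
P(S) = P(A)^n
meaning that the system fails only when all replicas fail at once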
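The two cases can be checked numerically. The sketch below assumes independent failures: a system that needs every component fails if any one of them fails, while a component replicated n times fails only if all n replicas fail, giving P(A)^n. The function names and the probabilities are hypothetical.

```python
import math


def series_failure_probability(probs):
    """System depends on every component: it fails if any one fails."""
    survive = math.prod(1 - p for p in probs)
    return 1 - survive


def replicated_failure_probability(p, n):
    """System has n independent replicas: it fails only if all n fail."""
    return p ** n


p_a, p_b = 0.01, 0.02  # hypothetical component failure probabilities

exact = series_failure_probability([p_a, p_b])       # P(A) + P(B) - P(A)P(B)
approx = p_a + p_b                                   # small-probability approximation
replicated = replicated_failure_probability(p_a, 3)  # three replicas of A

print(exact, approx, replicated)
```

With these values the exact result (about 0.0298) is close to the sum 0.03, and three replicas of A reduce the failure probability to about one in a million.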
Software Reliability Metrics
Reliability metrics are units of measure for system reliability
System reliability is measured by counting the number of operational failures and relating these
to demands made on the system at the time of failure
A long-term measurement program is required to assess the reliability of critical systems
Reliability Metrics
Probability of Failure on Demand (POFOD)
POFOD = 0.001
The service fails for one in every 1000 requests
Rate of Fault Occurrence (ROCOF)
ROCOF = 0.02
Two failures for every 100 operational time units
Mean Time to Failure (MTTF)
Average time between observed failures (also known as MTBF)
Availability = MTBF / (MTBF+MTTR)
MTBF = Mean Time Between Failures
MTTR = Mean Time to Repair
Reliability = MTBF / (1+MTBF)
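A minimal sketch of how POFOD, ROCOF, and availability could be computed from observed counts follows. The function names and the MTBF/MTTR figures are hypothetical; the POFOD and ROCOF inputs reproduce the example values given above.

```python
def pofod(failed_requests, total_requests):
    """Probability of failure on demand: failures per service request."""
    return failed_requests / total_requests


def rocof(failures, operational_time_units):
    """Rate of failure occurrence: failures per operational time unit."""
    return failures / operational_time_units


def availability(mtbf, mttr):
    """Fraction of time the system is available: MTBF / (MTBF + MTTR)."""
    return mtbf / (mtbf + mttr)


print(pofod(1, 1000))           # 0.001
print(rocof(2, 100))            # 0.02
print(availability(98.0, 2.0))  # 0.98, assuming MTBF = 98 h and MTTR = 2 h
```

All three are ratios, so the choice of time unit or demand unit has to be stated alongside the number for it to be meaningful.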
Time Units
Raw Execution Time
non-stop system
Calendar Time
If the system has regular usage patterns
Number of Transactions
demand type transaction systems
Availability
Measures the fraction of time the system is actually available for use
Takes repair and restart times into account
Relevant for non-stop continuously running systems (e.g. traffic signal)
Probability of Failure on Demand
Probability system will fail when a service request is made
Useful when requests are made on an intermittent or infrequent basis
Appropriate for protection systems, where service requests may be rare and the consequences can be
serious if the service is not delivered
Relevant for many safety-critical systems with exception handlers
Rate of Fault Occurrence
Reflects rate of failure in the system
Useful when the system has to process a large number of similar requests that are relatively frequent
Relevant for operating systems and transaction processing systems
Mean Time to Failure
Measures time between observable system failures
For stable systems MTTF = 1/ROCOF
Relevant for systems where individual transactions take a lot of processing time (e.g. CAD or WP systems)
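The relation MTTF = 1/ROCOF can be illustrated with a small failure log. The sketch below assumes a stable system observed from time 0 until its last recorded failure; the failure times are hypothetical.

```python
def inter_failure_times(failure_times):
    """Convert cumulative failure times into the gaps between failures."""
    previous = 0
    gaps = []
    for t in failure_times:
        gaps.append(t - previous)
        previous = t
    return gaps


# Hypothetical log: operational time units at which failures were observed.
failure_times = [40, 95, 150, 200]

gaps = inter_failure_times(failure_times)       # [40, 55, 55, 50]
mttf = sum(gaps) / len(gaps)                    # mean time to failure
rocof = len(failure_times) / failure_times[-1]  # failures per time unit

print(mttf, rocof, 1 / rocof)
```

Here four failures in 200 time units give a ROCOF of 0.02 and an MTTF of 50 time units, which agrees with 1/ROCOF.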
Failure Consequences
Reliability does not take consequences into account
Transient faults have no real consequences but other faults might cause data loss or corruption
May be worthwhile to identify different classes of failure, and use different metrics for each
When specifying reliability both the number of failures and the consequences of each matter
Failures with serious consequences are more damaging than those where repair and recovery is straightforward
In some cases, different reliability specifications may be defined for different failure types
Failure Classification
Transient - only occurs with certain inputs
Permanent - occurs on all inputs
Recoverable - system can recover without operator help
Unrecoverable - operator has to help
Non-corrupting - failure does not corrupt system state or data
Corrupting - system state or data are altered
Building Reliability Specification
For each sub-system analyze consequences of possible system failures
From system failure analysis partition failure into appropriate classes
For each class, set out the reliability requirement using an appropriate metric
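One way to make these steps concrete is to represent each failure class as a combination of the three classification axes and attach a metric and a limit to it. This is only a sketch: the type names, the chosen metrics, and all numeric limits and measurements are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Duration(Enum):
    TRANSIENT = "occurs only with certain inputs"
    PERMANENT = "occurs on all inputs"


class Recovery(Enum):
    RECOVERABLE = "system can recover without operator help"
    UNRECOVERABLE = "operator has to help"


class Effect(Enum):
    NON_CORRUPTING = "system state and data are not corrupted"
    CORRUPTING = "system state or data are altered"


@dataclass(frozen=True)
class FailureClass:
    duration: Duration
    recovery: Recovery
    effect: Effect


@dataclass(frozen=True)
class Requirement:
    failure_class: FailureClass
    metric: str   # e.g. "POFOD" or "ROCOF"
    limit: float  # largest acceptable value (hypothetical)


# Hypothetical specification: a stricter limit for the more damaging class.
spec = [
    Requirement(FailureClass(Duration.TRANSIENT, Recovery.RECOVERABLE,
                             Effect.NON_CORRUPTING), "ROCOF", 0.02),
    Requirement(FailureClass(Duration.PERMANENT, Recovery.UNRECOVERABLE,
                             Effect.CORRUPTING), "POFOD", 0.001),
]


def violations(spec, measured):
    """Return the requirements whose measured value exceeds the limit."""
    return [r for r in spec if measured[r.failure_class] > r.limit]


# Hypothetical measurements from testing, keyed by failure class.
measured = {
    spec[0].failure_class: 0.01,   # within the limit
    spec[1].failure_class: 0.005,  # exceeds the limit
}
broken = violations(spec, measured)
print([r.metric for r in broken])  # ['POFOD']
```

Keeping a separate requirement per failure class lets serious failures carry a much tighter limit than minor, recoverable ones.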
Statistical Reliability Testing
The test data used needs to follow typical software usage patterns (the operational profile)
Measuring numbers of errors needs to be based on errors of omission (failing to do the right
thing) and errors of commission (doing the wrong thing)
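Test data that follows typical usage patterns can be generated by sampling operations in proportion to how often they occur in practice. The sketch below assumes a hypothetical operational profile for a banking system; the operation names and their probabilities are illustrative only.

```python
import random
from collections import Counter

# Hypothetical operational profile: operation -> share of real-world usage.
operational_profile = {
    "view_balance": 0.70,
    "transfer": 0.25,
    "close_account": 0.05,
}


def generate_test_cases(profile, n, seed=0):
    """Draw n operations with probabilities taken from the profile."""
    rng = random.Random(seed)  # fixed seed so the test run is repeatable
    operations = list(profile)
    weights = [profile[op] for op in operations]
    return rng.choices(operations, weights=weights, k=n)


cases = generate_test_cases(operational_profile, 1000)
counts = Counter(cases)
print(counts)
```

Common operations dominate the generated test set, so the failure rate observed during testing approximates the rate users would experience.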
Difficulties with Statistical Reliability Testing
Uncertainty when creating the operational profile
High cost of generating the operational profile
Statistical uncertainty problems when high reliabilities are specified
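The statistical uncertainty problem can be quantified under a simple model. Assume test cases are drawn independently from the operational profile and no failures are observed. To claim with confidence C that the probability of failure on demand is at most p, the number of tests n must satisfy (1 - p)^n <= 1 - C. The function name below is hypothetical.

```python
import math


def failure_free_tests_needed(target_pofod, confidence):
    """Smallest n with (1 - target_pofod) ** n <= 1 - confidence."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - target_pofod))


for p in (1e-2, 1e-3, 1e-4):
    print(p, failure_free_tests_needed(p, 0.95))
```

Under this model a POFOD of 0.01 at 95% confidence already needs 299 failure-free tests, and the count grows roughly tenfold for each additional factor of ten in required reliability, which is why very high reliability targets are hard to validate by testing alone.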
Safety Specification
Each safety requirement should be specified separately
These requirements should be based on hazard and risk analysis
Safety requirements usually apply to the system as a whole rather than individual components
System safety is an emergent system property
Safety Life Cycle
Concept and scope definition
Hazard and risk analysis
Safety requirements specification
safety requirements derivation
safety requirements allocation
Planning and development
safety related systems development
external risk reduction facilities
safety validation
installation and commissioning
Operation and maintenance
System decommissioning
Safety Processes
Hazard and risk analysis
assess the hazards and risks associated with the system
Safety requirements specification
specify system safety requirements
Designation of safety-critical systems
identify sub-systems whose incorrect operation can compromise entire system safety
Safety validation
check overall system safety
Hazard Analysis Stages
Hazard identification
identify potential hazards that may arise
Risk analysis and hazard classification
assess risk associated with each hazard
Hazard decomposition
seek to discover potential root causes for each hazard
Risk reduction assessment
describe how each hazard is to be taken into account when system is designed
Risk Assessment
Assess the hazard severity, hazard probability, and accident probability
Outcome of risk assessment is a statement of acceptability
Intolerable (must never occur)
ALARP (as low as reasonably practical, given cost and schedule constraints)
Acceptable (consequences are acceptable and no extra cost should be incurred to
reduce it further)

Risk Acceptability
Determined by human, social, and political considerations
In most societies, the boundaries between regions are pushed upwards with time (meaning risk
becomes less acceptable)
Risk assessment is always subjective (what is acceptable to one person is ALARP to another)

Risk Reduction
System should be specified so that hazards do not arise or result in an accident
Hazard avoidance
system designed so hazard can never arise during normal operation
Hazard detection and removal
system designed so that hazards are detected and neutralized before an accident can occur
Damage limitation
system designed to minimize accident consequences
Security Specification
Similar to safety specification
not possible to specify quantitatively
usually stated in "system shall not" terms rather than "system shall" terms
no well-defined security life cycle yet
security deals with generic threats rather than system specific hazards
Security Specification Stages
Asset identification and evaluation
data and programs identified with their level of protection
degree of protection depends on asset value
Threat analysis and risk assessment
security threats are identified and the risks associated with each are estimated
Threat assignment
identified threats are related to assets so that each asset has a list of associated threats
Technology analysis
available security technologies are assessed for their applicability against the threats
Security requirements specification
where appropriate these will identify the security technologies that may be used to
protect against different threats to the system