
Software Architecture

Glossary
Module

Code that together serves some functionality X.

Component

Two or more modules that together provide some service Y.

System

"A collection of components organized to accomplish a specific function or set of functions." (From the 2000 IEEE standard; see the section at the end of this article)

Architectural description (AD)

"A collection of products to document an architecture" (IEEE, 2000)

Architecture

The software architecture of a system is the set of structures needed to reason about the system, which comprise software elements, relations among them, and properties of both.

Pattern

"[..] Usually describes software abstractions used by advanced designers and programmers in their software. [..] It can be applied to both the thing (for example, a collection class and its associated iterator) and the directions for making a thing." - Coplien J. (1998)

Architectural pattern

A composition of architectural elements providing packaged strategies for solving some of the problems facing a system.

Design patterns

A pattern that can be used for solving a problem in a subsystem, or a coding problem in a module.

View

"A representation of a whole system from the perspective of a related set of concerns" (IEEE, 2000) E.g. a logical view and a development view of the system.

Viewpoint

"A specification of the conventions for constructing and using a view. A pattern or template from which to develop individual views by establishing the purposes and audience for a view and the techniques for its creation and analysis." (IEEE, 2000)

This means the architectural patterns are at a higher level than the design patterns.
Why is Software Architecture important?
1. An architecture will inhibit or enable a system's driving quality attributes.

2. The decisions made in an architecture allow you to reason about and manage change as the
system evolves.
3. The analysis of an architecture enables early prediction of a system's qualities.
4. A documented architecture enhances communication among stakeholders.
5. The architecture is a carrier of the earliest and hence most fundamental, hardest-to-change
design decisions.
6. An architecture defines a set of constraints on subsequent implementation.
7. The architecture dictates the structure of an organization, or vice versa.
8. An architecture can provide the basis for evolutionary prototyping.
9. An architecture is the key artifact that allows the architect and project manager to reason about
cost and schedule.
10. An architecture can be created as a transferable, reusable model that forms the heart of a product line.
11. Architecture-based development focuses attention on the assembly of components, rather than simply on their creation.
12. By restricting design alternatives, architecture channels the creativity of developers, reducing design and system complexity.
13. An architecture can be the foundation for training a new team member.

Contexts of software architecture


Architectures exist in four different contexts.
Technical - What technical role does the software architecture play in the system or systems of
which it's a part? The technical context includes the achievement of quality attribute
requirements, as well as current technology (cloud, mobile computing etc.).
Project life cycle - How does a software architecture relate to the other phases of a software
development life cycle?
Business - How does the presence of a software architecture affect an organization's business
environment? The system created must satisfy the business goals of a wide variety of
stakeholders, each of whom has different expectations for the system.
Professional - What is the role of a software architect in an organization or a development
project? You must have certain skills and knowledge to be an architect, and there are certain
duties that you must perform as an architect. These are influenced not only by coursework and
reading, but also by your experiences.

Architectural patterns
Module Patterns
Layered Pattern
Software is divided into layers that can be developed in parallel. Layers are always drawn as a stack of boxes, and each layer exposes a public interface. Letting any layer use the public interfaces of any other layer, rather than only the layer directly below, is called layer bridging.

Component-and-Connector Patterns
Broker Pattern
The broker pattern solves the problem of having many services distributed among multiple servers. It puts a broker between the clients and the servers. The broker forwards each client request to the correct server and returns the server's response to the client that made the request.

Model-View-Controller Pattern
We want a way to keep the user interface separate from the applications functionality. We divide the
software into a model with the application data, a view that displays it and a controller that manages the
notifications of state changes.

Model

An object for saving data. It usually represents some sort of object, e.g. a user. The
model must have methods to read and write the attributes, and also be able to fire change
events.

View

A view is shown to the user. It reads and visualizes data.

Controller

Logic. Interprets input and manipulates/updates data.
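A minimal Java sketch of the three roles (the counter example and all class names here are invented for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Model: holds application data and fires change events to listeners.
class CounterModel {
    private int value;
    private final List<Consumer<Integer>> listeners = new ArrayList<>();

    public void addListener(Consumer<Integer> l) { listeners.add(l); }
    public int getValue() { return value; }
    public void setValue(int v) {
        value = v;
        listeners.forEach(l -> l.accept(v)); // fire change event
    }
}

// View: reads and visualizes the model's data.
class CounterView {
    public String render(int value) { return "Count: " + value; }
}

// Controller: interprets input and updates the model.
class CounterController {
    private final CounterModel model;
    public CounterController(CounterModel model) { this.model = model; }
    public void increment() { model.setValue(model.getValue() + 1); }
}
```

Note that the view never talks to the controller directly; it only reacts to change events fired by the model.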

Pipe-and-Filter Pattern
The software is required to transform streams of data multiple times. The solution is a pipeline of filters that alter the data in the correct order.
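A minimal sketch of the pipeline idea in Java (the Pipeline class and the string filters are illustrative, not taken from any book):

```java
import java.util.List;
import java.util.function.UnaryOperator;

// A filter transforms the data; the pipeline applies filters in order,
// feeding each filter's output into the next one through the "pipe".
class Pipeline<T> {
    private final List<UnaryOperator<T>> filters;

    public Pipeline(List<UnaryOperator<T>> filters) { this.filters = filters; }

    public T run(T input) {
        T data = input;
        for (UnaryOperator<T> f : filters) {
            data = f.apply(data); // output of one filter is input of the next
        }
        return data;
    }
}
```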

Client-Server Pattern
Shared resources and services need to be easily accessible by a large number of clients. Clients request services from servers. Note that some components can be both clients and servers.

Peer-to-Peer Pattern
When we have a set of distributed computational entities that are equally important, peer-to-peer (aka P2P) really shines. Peer-to-peer is typically a request-reply system. A peer's search for another peer is often directed through peers in the middle, thus forming a swarm. BitTorrent much?

Service-Oriented Pattern
A service provider needs to provide easily accessible services for its consumers. The consumers don't need to know anything about the implementation. Components have interfaces that describe the services they provide. Often multi-language.

Publish-Subscribe Pattern
Components work together via announced messages or events. Components may subscribe to a set of
events. Any component may be both publisher and subscriber.
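A minimal sketch of an event bus in Java (the EventBus class and the event names are hypothetical):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Components publish named events; subscribers to that event are notified.
// Publishers and subscribers never reference each other directly.
class EventBus {
    private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();

    public void subscribe(String event, Consumer<String> handler) {
        subscribers.computeIfAbsent(event, k -> new ArrayList<>()).add(handler);
    }

    public void publish(String event, String payload) {
        subscribers.getOrDefault(event, List.of()).forEach(h -> h.accept(payload));
    }
}
```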

Shared-Data Pattern
Multiple data accessors share one or more shared data stores. The connector enables reading from and writing to these stores.

Allocation Patterns
Map-Reduce Pattern
Satisfies the need to quickly analyze enormous volumes of data. Often, sorting the data and then analyzing it is insufficient. The map-reduce pattern needs specialized hardware for parallelization, plus the functions map and reduce.

Multi-Tier Pattern
Both a C&C and an allocation pattern, depending on use: if the tiers group components of similar functionality, it is C&C; if the tiers are allocated to computationally independent nodes, it is an allocation pattern.

Design patterns
Singleton pattern
A class is instantiated only once, and the single instance is globally accessible. The constructor is private, so the object can only be created inside the class. It is important that the created instance can be retrieved from outside the class. A singleton pattern is not necessarily optimized for multithreading.
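A sketch of a thread-safe variant in Java, using the initialization-on-demand holder idiom (JVM class loading guarantees the instance is created exactly once, even with multiple threads):

```java
// Thread-safe, lazily initialized singleton.
class Singleton {
    private Singleton() {} // private constructor: only created inside the class

    // The holder class is not loaded until getInstance() is first called;
    // class initialization is guaranteed to be thread-safe by the JVM.
    private static class Holder {
        static final Singleton INSTANCE = new Singleton();
    }

    public static Singleton getInstance() { return Holder.INSTANCE; }
}
```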

Observer pattern
The observer pattern relies on observers that get their information from a subject; when the subject changes, it notifies its observers so they can update their state. This differs from Model-View-Controller, which is an architectural, system-wide pattern.
For example: a program taking an integer and presenting it in different numeral systems. When an integer is entered, the subject notifies all registered observers, in our example the octal, binary, and hexadecimal views. They then update their state accordingly.
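The numeral-system example above can be sketched in Java roughly like this (class names are illustrative; the octal observer is omitted for brevity):

```java
import java.util.ArrayList;
import java.util.List;

interface Observer { void update(int value); }

// Subject: holds the integer and notifies registered observers on change.
class IntegerSubject {
    private final List<Observer> observers = new ArrayList<>();
    public void attach(Observer o) { observers.add(o); }
    public void setValue(int v) { observers.forEach(o -> o.update(v)); }
}

// Each observer keeps its own representation of the current value.
class BinaryObserver implements Observer {
    String last;
    public void update(int v) { last = Integer.toBinaryString(v); }
}

class HexObserver implements Observer {
    String last;
    public void update(int v) { last = Integer.toHexString(v); }
}
```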

State pattern
A main class changes its behavior based on what state the program is in.
For example: a program starting to run will trigger the onCreate or start method. When the running state is set to false, this triggers the onDestroy method.
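A rough Java sketch of the idea (the class and state names are made up; the onCreate/onDestroy lifecycle is folded into simple state objects):

```java
interface State { String name(); }

class CreatedState implements State { public String name() { return "created"; } }
class RunningState implements State { public String name() { return "running"; } }
class DestroyedState implements State { public String name() { return "destroyed"; } }

// The main class delegates to its current state object, and swaps
// that object when the running flag changes.
class Program {
    private State state = new CreatedState(); // onCreate: the program starts here

    public String stateName() { return state.name(); }

    public void setRunning(boolean running) {
        // running=true enters the running state; running=false "destroys" it
        state = running ? new RunningState() : new DestroyedState();
    }
}
```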

Template pattern
In Java terms (this could be generalized), we have an abstract parent class, which therefore cannot be instantiated. It is extended by child classes inheriting its properties.
For example, zoo animals: a zoo animal is a class containing legs and methods for feeding. It is inherited by elephants, dogs etc. Dogs and elephants eat different things, so we implement the actual methods in the dog and elephant classes. The abstract class says that dogs and elephants (all zoo animals) have to be fed.
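The zoo example might be sketched in Java like this (the names and food choices are invented):

```java
// Abstract parent class: cannot be instantiated; defines the template.
abstract class ZooAnimal {
    private final int legs;
    protected ZooAnimal(int legs) { this.legs = legs; }
    public int getLegs() { return legs; }

    // Template method: the feeding routine is fixed here,
    // but the food is chosen by each subclass.
    public final String feed() { return "Feeding " + favoriteFood(); }

    protected abstract String favoriteFood(); // every zoo animal must be fed
}

class Dog extends ZooAnimal {
    Dog() { super(4); }
    protected String favoriteFood() { return "kibble"; }
}

class Elephant extends ZooAnimal {
    Elephant() { super(4); }
    protected String favoriteFood() { return "hay"; }
}
```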

Factory pattern
Is used to encapsulate a set of objects, so that you can use them without having to specify the exact class of each object. It can be done either by defining an interface implemented in the child classes, or by defining it in one class and overriding it in the derived classes.
An example is a button factory: when you're drawing a GUI you don't have to enter all the specifics of each button, but have a factory that takes care of all the specifics for you.
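The button-factory example could be sketched in Java like this (the button kinds and render strings are invented):

```java
interface Button { String render(); }

class OkButton implements Button { public String render() { return "[ OK ]"; } }
class CancelButton implements Button { public String render() { return "[ Cancel ]"; } }

// The factory hides the exact class: callers ask for a button by kind.
class ButtonFactory {
    public static Button create(String kind) {
        switch (kind) {
            case "ok":     return new OkButton();
            case "cancel": return new CancelButton();
            default: throw new IllegalArgumentException("unknown button: " + kind);
        }
    }
}
```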

Abstract factory pattern


Here we encapsulate multiple factories made based on the factory pattern in a complete abstract factory.

The abstract factory pattern creates an abstract factory with some general properties. We then make a concrete factory for each of these properties. Each of these factories implements an interface able to generate our instances, for example shapes.
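A sketch in Java using shape families (the plain/rounded families are an invented example of the "general properties" mentioned above):

```java
interface Shape { String draw(); }

// One product family: plain shapes.
class PlainCircle implements Shape { public String draw() { return "plain circle"; } }
class PlainSquare implements Shape { public String draw() { return "plain square"; } }

// Another product family: rounded shapes.
class RoundedCircle implements Shape { public String draw() { return "rounded circle"; } }
class RoundedSquare implements Shape { public String draw() { return "rounded square"; } }

// The abstract factory: one interface, one concrete factory per family.
// Client code picks a factory once and never names the concrete classes.
interface ShapeFactory {
    Shape createCircle();
    Shape createSquare();
}

class PlainShapeFactory implements ShapeFactory {
    public Shape createCircle() { return new PlainCircle(); }
    public Shape createSquare() { return new PlainSquare(); }
}

class RoundedShapeFactory implements ShapeFactory {
    public Shape createCircle() { return new RoundedCircle(); }
    public Shape createSquare() { return new RoundedSquare(); }
}
```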

Quality Attributes (Chapters 5 - 12)


A testable property of a system that tells how well the system satisfies the needs of the stakeholders. Requirements come in three categories:
Functional requirements: What the system must do and how it must behave or react.
Quality attribute requirements: Qualifications of functional requirements, for instance how fast a function must perform.
Constraints: Design decisions with zero degrees of freedom, for instance the use of a certain programming language.

Availability
Availability refers to the property of software being there and ready to carry out its task when it is needed. It encompasses both reliability and dependability.
Reliability

Delivers data to the right recipient at the right time.

Dependability

"The ability to avoid failures that are more frequent and more severe than
acceptable." - Avizienis

This means that availability is about detecting and preventing faults, but also recovering from them. The
book points out that one could say "failure is not an option", but that it is a lousy design philosophy as it
would not work in practice.
The term "high availability" typically refers to "5 nine"-availability (99.999 % availability) or higher,
which translates to 1 minute and 18 seconds of downtime over 90 days, or 5 minutes and 15 seconds over
a year.

General scenario
Source of stimulus: Hardware
Stimulus: Failure
Artifacts: Hard disk on server
Environment: Normal operation
Response: Backup server takes over
Response measure: Back to normal operation after 30 s

Tactics
The goal of these tactics is to mask or repair the fault.

Detect faults
Monitor

A component monitors all services and reports inconsistencies

Heartbeat

Periodic message exchange between a system monitor and a process being monitored.

Sanity checking

Checks the validity or reasonableness of the output from a component.

Condition monitoring

Checks conditions in a process or device, or validates assumptions made during the design. Implemented with e.g. checksums.

Voting

The most common realization of this tactic is called triple modular redundancy (TMR). It has three components that do the same thing. They all receive the same input and forward their output to a voting logic, which compares the outputs and reports a failure if there are any inconsistencies.

Replication

Have multiple copies of a component. Effective against hardware failure.

Recover from faults


Prevent faults
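As an illustration of the heartbeat tactic above, a monitor-side check might look like this in Java (a simplified, single-process sketch; a real implementation exchanges messages over a network):

```java
// The monitored process calls beat() periodically; the monitor declares
// it failed if no beat arrives within the timeout window.
class HeartbeatMonitor {
    private final long timeoutMillis;
    private long lastBeatMillis;

    HeartbeatMonitor(long timeoutMillis, long nowMillis) {
        this.timeoutMillis = timeoutMillis;
        this.lastBeatMillis = nowMillis;
    }

    // Called whenever a heartbeat message arrives from the monitored process.
    public void beat(long nowMillis) { lastBeatMillis = nowMillis; }

    // True while the process is still considered alive.
    public boolean isAlive(long nowMillis) {
        return nowMillis - lastBeatMillis <= timeoutMillis;
    }
}
```

Time is passed in explicitly here so the logic is testable; production code would use the system clock.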

Interoperability
Interoperability is about the degree to which two or more systems can usefully exchange meaningful
information via interfaces in a particular context.
It includes:
Syntactic interoperability

The ability to exchange data.

Semantic interoperability

The ability to correctly interpret the data being exchanged.

A system cannot be interoperable in isolation. Systems can also interoperate in indirect ways.


Reasons you would want interoperability in your system:
Your system provides a service to be used by other (unknown) systems
Your system constructs capabilities from existing systems.
These reasons give us two important aspects of interoperability:

Discovery

The system/consumer of some service has to know (discover) the location, identity and interface of the service to be used. This may be done during runtime, or prior to runtime.

Handling of the response

The service can either: Report back with a response, broadcast the response or
send the response to another system.

One also has to manage available interfaces.

General scenario
Source of stimulus: A system that initiates a request.
Stimulus: A request to exchange information among systems.
Artifacts: The systems that wish to interoperate.
Environment: The systems that wish to interoperate are discovered at runtime or prior to runtime.
Response: The result, and where it is sent. The request could also be rejected. In either case, the result may be logged.
Response measure: Percentage of information exchanges correctly processed, or percentage correctly rejected.

Tactics

Locate
Discover service

Locate a service by searching a known directory service.

Manage interfaces
Orchestrate

Uses a control mechanism to coordinate, manage and sequence the invocation of services.

Tailor interface

Adds or removes capabilities of an interface. One can add capabilities such as translation, buffering, or smoothing of data. One can remove capabilities, e.g. to hide particular functions from untrusted users.

Modifiability
Change happens.

Four questions to consider when planning for modifiability:

What can change?
What is the likelihood of the change?
Where is the change made, and who makes it?
What is the cost of the change?

Important measures:

Cohesion

A module should do exactly what it is intended to do, and nothing else. This means splitting responsibilities up into different modules. E.g. if you have a "Person" module for some sort of system (e.g. banking), it does not make sense to put a lot of email responsibilities in there, e.g. for sending email; that should rather go in an email module. You'll want high cohesion.

Coupling

How modules or components are tied together. You'll want loose coupling; tight coupling makes the system harder to modify, e.g. if module A depends on B and C, which in turn depend on each other and on A.

General scenario
Source of stimulus: Who makes the change: end user, developer, sysadmin.
Stimulus: Someone wants to make a change.
Artifacts: Code, data, interfaces, components, resources, configurations ...
Environment: Runtime, compile time, build time, initiation time, design time.
Response: Make, test or deploy the modification.
Response measure: Cost in terms of time, effort, money, new defects, complexity.

Tactics

Reduce the size of a module


Split module

A big module will be expensive to modify. Split it up into smaller modules.

Increase cohesion

Increase semantic coherence

A module should only have some responsibility A. If you discover it also has some responsibility B, move that to another module or create a new one for it.

Reduce coupling
Encapsulate

Private and public methods. Have explicit interfaces for other modules to
use, hide what's irrelevant to them.

Use an intermediary

Put something in between dependencies. If A is dependent on B, put X in between to handle that communication. This leaves A and B more independent of each other. The intermediary depends on the type of dependency.

Restrict dependencies

Some module A can only talk with a given set of modules X. In practice
this means you'll restrict modules visibility.

Refactor

Rewrite code so that you do not have duplicated code or responsibilities, and so that the code can be easily understood by developers.

Abstract common services

In the case where two modules provide almost the same services, it will increase modifiability to let them implement some abstract service. (In Java, this means creating an abstract class.)

Defer binding
A parameterized function f(a, b) is more general than the similar function f(a) that assumes b = 0. When we bind the value of some parameters at a different phase in the life cycle than the one in which we defined the parameters, we are applying the defer binding tactic.
You'll want to bind as late as possible, so a change will only mean a change of argument, not of the inner workings of a module.
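A small Java illustration of early versus deferred binding (the property name app.threshold is made up for the example):

```java
// Early binding: the value is fixed at compile time.
// Changing it requires editing and rebuilding this class.
class EarlyBound {
    static int threshold() { return 10; }
}

// Deferred binding: the same value is bound at runtime from a
// system property, so changing it needs no code change at all.
class LateBound {
    static int threshold() {
        return Integer.parseInt(System.getProperty("app.threshold", "10"));
    }
}
```

The same idea applies to configuration files, plugins and dependency injection: the later the binding phase, the cheaper the change.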

Performance
It's about time.
Performance describes the software's ability to meet time requirements.

General scenario
Source of stimulus: Internal or external to the system.
Stimulus: Arrival of a periodic, sporadic or stochastic event.
Artifacts: The system, or one or more components of the system.
Environment: Operational mode: normal, emergency, peak or overload.
Response: Process events, change level of service.
Response measure: Latency, deadline, throughput, jitter, miss rate.

Tactics

Control resource demand

Work to produce smaller demand

Managing sampling rate

Reduce the rate at which environmental data is sampled.

Limit event response

Process events only up to a set maximum rate, queuing events that arrive faster.

Prioritize events

Prioritize events depending on their importance.

Reduce overhead

Reduce the use of intermediaries (which are important for modifiability). A modifiability/performance tradeoff.

Bound execution times

Limit the amount of time that can be used to process an event.

Increase resource efficiency

Improve critical algorithms.

Manage resources

Work to make the resources at hand work more efficiently

Increase resources

Costs money.

Introduce concurrency

Process in parallel when possible

Maintain multiple copies of computations (aka caching)

Introduces new responsibilities, like keeping data synchronized and choosing what data to cache.

Bound queue sizes

Limit the events that can arrive, and you'll limit the resources
spent on processing events. Need policy for what happens with
queue overflow. Often paired with limit event response tactic.

Schedule resources

If there is contention for a resource, the resource must be scheduled.

Security
Measures the system's ability to protect data from those who are not meant to have access, while still
giving those who are authorized access.
Simple characteristics of security (CIA for short):
1. Confidentiality

Data is protected from unauthorized access.

2. Integrity

Data is not subject to unauthorized manipulation.

3. Availability

The system and its data are available for legitimate use.

Other characteristics used to support CIA:


4. Authentication

Verifies identities

5. Nonrepudiation

Guarantees that the sender of a message cannot later deny sending it.

6. Authorization

Grants users privileges to perform a task (or tasks)

General scenario
Source of stimulus: Human or another system. May or may not have been identified already.
Stimulus: Unauthorized attempt to display, change or delete data, access system services, change the system's behavior or reduce availability.
Artifacts: System services, data within the system, a component or resources of the system, data delivered to or from the system.
Environment: Online or offline. With or without a firewall. Fully, partially or not operational.
Response: Stop unauthorized use. Logging. Recovering from the attack.
Response measure: Time used to end the attack. Number of attacks detected. How long it takes to recover from an attack. How much data was vulnerable to an attack. Value of the system/data compromised.

Tactics
One method of thinking about how to achieve security in a system is to think about physical security.

Detect attacks

If you detect the attacks, you can stop the attack. Very clever.

Detect intrusion

Compare network traffic against known malicious patterns, e.g. TCP flags, payload sizes, source or destination address, port, etc.

Detect service denial

Compare incoming traffic with known patterns of DDoS attacks.

Verify message integrity

Use checksums or hash-values to see if messages are valid.

Detect message delay

Detect possible man-in-the-middle attacks. Check the time it takes to deliver messages, and check that this time stays about the same across messages.

Resist attacks

Make it harder for attackers

Identify actors

Use input to identify attacker

Authenticate actors

Certificates, passwords etc.

Authorize actors

Authenticated users are allowed to modify and add data.

Limit access

Different degree of access

Limit exposure

Hide facts about the system. "Security by obscurity."

Encrypt data

Protect data and communication with encryption.

Separate entities

Physical separation, virtualization etc. Components are not in the same place, and may be less affected by each other.

Change default settings

Don't have default settings.

React to attacks

Attack! How to respond

Revoke access

Limit access if you believe an attack is under way

Lock computer

Upon repeated failed login attempts, lock the system.

Inform actors

Inform people who may need to act.

Recover from attacks

Restore systems and data. Maintain audit trail.

Testability
To what degree the system is testable. Used to demonstrate faults.
You'll want to plan for testing in the project plan, as it is very important for systems, especially big ones.
You'll want a testing infrastructure that makes it easy to test, to introduce tests and mimic faults to see if
the system can handle them. The infrastructure should also enable logging system states. You'll also want
these to run automatically with development cycles/increments.

General scenario
Source of stimulus: Unit, integration, system or acceptance testers. Users. Tests run manually or automatically.
Stimulus: The completion of a coding segment.
Artifacts: The part of the system being tested.
Environment: Design, development, compile, integration, deployment or run time.
Response: Execute the test suite and capture results, capture the activity that resulted in the fault, control and monitor the state of the system.
Response measure: Effort to find the fault or class of faults, test coverage, reduction in risk exposure ...

Tactics

Control and observe system state

You cannot test something if you cannot observe what happens when
you do.

Specialized testing interfaces

Capture variables. Can be done with special get and set methods,
report-method, reset-method or a method to turn on verbose output.

Record/playback

Record state when it's crossing an interface

Localize state storage

Hard to test with distributed states

Abstract data sources

Make it easy to switch data sources by making the interfaces they go through abstract. E.g. with a banking system you would want a test database in development, but another in production. This means you want to be able to switch data sources easily.

Sandbox

Isolate a part of the system from the real world for experimentation.
E.g. virtualization (with resources for instance.)

Executable assertions

Hard-code assertions placed at specified points to check whether values are okay.

Limit Complexity

Complex software is harder to test. It's hard to replicate a state when the state space is large (that is: the software is complex).

Limit structural complexity

Avoid or resolve cyclic dependencies, encapsulate and isolate environmental dependencies, and reduce dependencies in general. High cohesion and loose coupling (see modifiability) also help testability.

Limit nondeterminism

That is, limit behavioral complexity. Deterministic systems are easier to test. Find all sources of nondeterminism, such as unconstrained parallelism, and weed them out as much as possible. Not all nondeterminism can be removed.
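The executable-assertions tactic above can be illustrated in Java (the Account class is an invented example; note that Java asserts must be enabled with the -ea flag):

```java
// Assertions hard-coded at specified points check that values are okay
// during testing, without affecting the release build (asserts are off
// by default and enabled with java -ea).
class Account {
    private int balance;

    public void deposit(int amount) {
        assert amount > 0 : "deposit must be positive"; // precondition
        balance += amount;
        assert balance >= 0 : "balance invariant violated"; // invariant
    }

    public int getBalance() { return balance; }
}
```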

Usability
Usability is concerned with how easy it is for the user to accomplish a desired task and the kind of user
support the system provides.
Over the last few years this has become increasingly important, as users don't want to use (or buy)
systems which are hard to use.
Comprises the following areas:

Learning system features
Using a system efficiently
Minimizing the impact of errors
Adapting the system to user needs
Increasing confidence and satisfaction

General scenario
Source of stimulus: End user, possibly in a specialized role.
Stimulus: End user does or attempts something in relation to the usability areas mentioned above.
Artifacts: The system, or the part of it the user interacts with.
Environment: Runtime or configuration time.
Response: Provide the user with the feature needed, or anticipate the user's needs.
Response measure: Task time, number of errors, number of tasks accomplished, user satisfaction, gain of user knowledge, ratio of successful operations to total operations, or amount of time or data lost when an error occurs.

Tactics

Support user initiative

When the system is running, you enhance usability by giving the user feedback on what he or she is doing.


Cancel

Let the user cancel operations. This means the system must listen for the cancel
command.

Undo

Let the user undo actions.

Pause/resume

Let the user pause and resume processes that take a lot of time.

Aggregate

Let the user aggregate lower-level objects into a single group, so the user can perform the same operation on the whole group.

Support system initiative

Maintain task model

Task model is used to give system context of what the user is trying to do, so it
can help the user get there. E.g. knowing all sentences start with a capital letter
and knowing the user is writing a text, the system can correct the text when the
user forgets to use a capital letter.

Maintain user model

Represents the user's knowledge of the system. Can control the amount of assistance the system gives. Can let the user customize the system.

Maintain system model

System model used to determine expected system behavior, e.g. so it can give
the user an estimate of how much time an operation will take.

Other Quality Attributes


A few more arise frequently:
Variability

A special form of modifiability. It describes to what degree the system, documentation and staff support making variants of the system.

Portability

A special form of modifiability: to what degree the system can be made to run on a different platform than originally intended.

Development distributability

To what extent the development of the software can be distributed.

Scalability

How the system handles the addition of more resources. Two types: horizontal and vertical. Horizontal refers to adding more resources to logical units, e.g. adding another server to a cluster of servers; called elasticity in cloud environments. Vertical: adding more resources to a physical unit, e.g. more memory to a single server.

Deployability

How an executable arrives at a host, and how it is invoked. That means: How
easy is it to deploy the system.

Mobility

Problems of movement, battery, reconnecting etc.

Monitorability

The ability of the operations staff to monitor the system.

Safety

Does the system lead to unsafe actions, e.g. can a missile system fire random
missiles.

Architectural Tactics and Patterns (Chapter 13)


An architectural pattern is a package of design decisions that is found repeatedly in practice, has known properties that permit reuse, and describes a class of architectures.
Patterns are discovered, not invented.
An architectural pattern establishes a relationship between:
A context. A recurring, common situation in the world that gives rise to a problem.
A problem. The problem that arises in a given context. Often includes quality attributes that must
be met.
A solution. A successful architectural solution to the problem. Determined and described by:
A set of element types (for example, data repositories, processes, and objects)
A set of interaction mechanisms or connectors (for example, method calls, events, or
message bus)
A topological layout of the components
A set of semantic constraints covering topology, element behavior, and interaction
mechanisms.
This {context, problem, solution} form constitutes a template for documenting a pattern.

Documenting Software Architectures (Chapter 18)


Views (18.3)
A view is a representation of a set of system elements and relations among them: not all system elements, but those of a particular type. For example, a layered view of a system would show elements of type "layer"; that is, it would show the system's decomposition into layers and the relations among those layers.
The concept of views gives us our most fundamental principle of architecture documentation:
"Documenting an architecture is a matter of documenting the relevant views and then adding documentation that applies to more than one view."
What are relevant views? That depends entirely on your goals.

Module Views
Module structures describe how the system is to be structured as a set of code or data units that have to be
constructed or procured.
Example of fitting architectural pattern: Layered Pattern.

Component-and-Connector Views
Component-and-connector structures describe how the system is to be structured as a set of elements that
have runtime behavior (components) and interactions (connectors).
The purpose of the C&C views is to show how the system works, guide development by specifying
structure and behaviour of runtime elements and help reason about runtime system quality attributes, such
as performance and availability.
Examples of fitting architectural patterns: Broker, MVC, Pipe-and-Filter, Peer-to-Peer, Service-oriented
architectural, Publish-Subscribe, Shared-Data pattern.

Allocation Views
Allocation views describe the mapping of software units to elements of an environment in which the
software is developed or in which it executes. The environment might be the hardware, the operating
environment in which the software is executed, the file systems supporting development or deployment,
or the development organization(s). TL;DR: Allocation views describe the mapping from software
elements to non-software elements (CPUs, file systems, networks, development teams, etc).

Overview of the views:

Logical view
For: End users, functionality.
"Supports the functionality requirements. The system is decomposed into a set of key abstractions, taken (mostly) from the problem domain in the form of objects and object classes."

Process view
For: Integrators; performance, scalability.
Addresses some non-functional requirements. Looks at how different processes work on different tasks, and how they communicate with each other.

Development view
For: Programmers, software management.
"Focuses on the actual software module organization on the software development environment." It shows how the complete system can be divided into small chunks, like modules and libraries. This is used by developers/software managers to give tasks to different team members.

Physical view
For: System engineers; topology, communications.
"Mapping software to hardware." Focuses on non-functional requirements and how the physical hardware will fulfill these.

Scenarios
Shows how all the views work together, through instances of more general use cases called scenarios.

Lecture 1
(Envisioning Architecture)
What is Software Architecture?
The software architecture of a program or computing system is the structure or
structures of the system, which comprise software elements, the externally visible
properties of those elements, and the relationships among them.
Software architecture encompasses the structures of large software systems:
- abstract view
- eliminates details of implementation, algorithm, and data representation
- concentrates on the behavior and interaction of black-box elements

Software application architecture is the process of defining a structured solution that meets all of the
technical and operational requirements, while optimizing common quality attributes such as performance,

security, and manageability. It involves a series of decisions based on a wide range of factors, and each of
these decisions can have considerable impact on the quality, performance, maintainability, and overall
success of the application.
Software architecture encompasses the set of significant decisions about the organization of a software
system including the selection of the structural elements and their interfaces by which the system is
composed; behavior as specified in collaboration among those elements; composition of these structural
and behavioral elements into larger subsystems; and an architectural style that guides this organization.
Software architecture also involves functionality, usability, resilience, performance, reuse,
comprehensibility, economic and technology constraints, tradeoffs and aesthetic concerns.
Why is software architecture important for our business cycle?
Like any other complex structure, software must be built on a solid foundation. Failing to consider key
scenarios, failing to design for common problems, or failing to appreciate the long term consequences of
key decisions can put your application at risk. Modern tools and platforms help to simplify the task of
building applications, but they do not replace the need to design your application carefully, based on your
specific scenarios and requirements. The risks exposed by poor architecture include software that is
unstable, is unable to support existing or future business requirements, or is difficult to deploy or manage
in a production environment.
Systems should be designed with consideration for the user, the system (the IT infrastructure), and the
business goals. For each of these areas, you should outline key scenarios and identify important quality
attributes (for example, reliability or scalability) and key areas of satisfaction and dissatisfaction. Where
possible, develop and consider metrics that measure success in each of these areas.

User, business, and system goals


Trade offs are likely, and a balance must often be found between competing requirements across these
three areas. For example, the overall user experience of the solution is very often a function of the
business and the IT infrastructure, and changes in one or the other can significantly affect the resulting
user experience. Similarly, changes in the user experience requirements can have significant impact on

the business and IT infrastructure requirements. Performance might be a major user and business goal, but
the system administrator may not be able to invest in the hardware required to meet that goal 100 percent
of the time. A balance point might be to meet the goal only 80 percent of the time.
Architecture focuses on how the major elements and components within an application are used by, or
interact with, other major elements and components within the application. The selection of data
structures and algorithms or the implementation details of individual components are design concerns.
Architecture and design concerns very often overlap. Rather than use hard and fast rules to distinguish
between architecture and design, it makes sense to combine these two areas. In some cases, decisions are
clearly more architectural in nature. In other cases, the decisions are more about design, and how they
help you to realize that architecture.
Definition of Software Architecture?
The software architecture of a program or computing system is the structure or structures of the system,
which comprise software elements, the externally visible properties of those elements, and the
relationships among them.
The components of a system are its modules. Architecture, then, is not about the system as a whole so
much as about each module, its relationships, and the externally visible properties of those modules.
What is the Goal of Software Architecture?
Application architecture seeks to build a bridge between business requirements and technical
requirements by understanding use cases, and then finding ways to implement those use cases in the
software. The goal of architecture is to identify the requirements that affect the structure of the
application. Good architecture reduces the business risks associated with building a technical solution. A
good design is sufficiently flexible to be able to handle the natural drift that will occur over time in
hardware and software technology, as well as in user scenarios and requirements. An architect must
consider the overall effect of design decisions, the inherent trade offs between quality attributes (such as
performance and security), and the trade offs required to address user, system, and business requirements.
Keep in mind that the architecture should:

Expose the structure of the system but hide the implementation details.
Realize all of the use cases and scenarios.
Try to address the requirements of various stakeholders.
Handle both functional and quality requirements.

Why do we need software architecture?


Normally, software development is requirement-driven. How do we overcome the existing problems in
software development? Some of the problems with the pre-architecture life cycle are:
1. Few stakeholders involved: when we design a system, only a few stakeholders are involved. When
we deliver the system, more stakeholders come into the picture who were not involved at all
during the development stage. They may have different kinds of expectations.

2. Iteration mainly on functional requirements: during the SDLC, the iterations are mainly on the
functionality of the system.
3. No balancing of the functional and quality requirements: mostly we deal with the technical or
functional requirements. But when we look into the quality (non-functional) requirements, such as
usability, performance, availability, maintainability, etc., there is no balancing of functional
requirements with non-functional requirements.

Seeing the above problems, we introduce architecture during the development phase. The development
phase comprises architecture, detailed design, and implementation. We treat the architecture at a
granular level because we focus not only on the functional requirements but on the quality
(non-functional) requirements too.
When do we introduce Architecture in the SDLC?
1. Many stakeholders involved: there will be different views of the system, and naturally there will
be different types of stakeholders involved.
2. Iteration on both functional and quality requirements: we get the requirements from the
stakeholders, and we try to fulfill them without compromising on quality.
3. Balancing of functional and quality requirements: taking account of both requirements and
quality, we build the architecture and sign an agreement with the stakeholders. Then development
starts.
What is the role of an architect?
In the software realm, we have different people with different roles:
- the customer, who wants the system to be built;
- the business analyst, who takes the requirements from the customer;
- the business intelligence person, who takes the requirements and analyzes them as well;
- the software developer, who develops the software.
We have one more role, the software architect (system architect), who takes the requirements
from the customer and prepares a blueprint of the product to be delivered.

Stakeholders

each stakeholder has different concerns & goals, some contradictory

Development Organization

immediate business, long-term business, and organizational (staff skills, schedule, & budget)

Background & Experience of the Architects

repeat good results, avoid duplicating disasters

The Technical Environment

standard industry practices or common SE techniques

The properties required by the business & organizational goals are not understood most of the time.

- Architects need to know & understand the nature, source, and priority of constraints on the project as
early as possible.

Architects must identify & actively engage the stakeholders to solicit their needs & expectations.

- Use architecture reviews & iterative prototyping.

A simplistic view of the role is that architects create architectures, and their responsibilities encompass all
that is involved in doing so. This would include articulating the architectural vision, conceptualizing and
experimenting with alternative architectural approaches, creating models and component and interface
specification documents, and validating the architecture against requirements and assumptions.
The architect (or team) needs to partner well with a variety of different stakeholder groups, including
management at different levels, business analysts or marketing, and developers. The architect needs to
balance participation (to gain insight, ensure excellence and get buy-in) with the need to create conceptual
integrity and keep the architecture decision process from stalling. The more broadly scoped the
architecture, the more likely it is that the architecture will be challenged on many fronts. The architect has
to shed distaste for what may be considered "organizational politics," and actively work to sell the
architecture to its various stakeholders, communicating extensively and working networks of influence to
ensure the ongoing success of the architecture.
But "buy-in" to the architecture vision is not enough either. Anyone involved in implementing the
architecture needs to understand it. Weighty architectural documents are notorious dust-gatherers. The
early participation of key developers brings good ideas into the architecture process and also creates
broader understanding and vested interest in its outcome. In addition, for bigger projects, it can be quite
helpful to create and teach tutorials to help developers understand the architecture and the rationale for the
decisions it represents. During the construction cycles, the architect needs to be available to actively
consult on the application of the architecture, to explain the rationale behind architectural choices, and to
make amendments to the architecture when justified. The architect also acts as mentor and coach,
working with developers to address challenges that arise, especially when they have broad/systemic
impact or are critical to the success of the system.
Lastly, the architect must lead the architecture team, the developer community, and, in its technical
direction, the organization.

An architect abstracts the complexity of a system into a manageable model that describes the essence of a
system by exposing important details and significant constraints.
An architect maintains control over the architecture lifecycle parallel to the project's software
development lifecycle. Although an architect may be most visible during the requirements and design
stages of a project lifecycle, he or she must proactively monitor the adherence of the implementation to
the chosen architecture during all iterations. Architecture on paper is fruitless unless implemented
proficiently.
An architect stays on course in line with the long-term vision. When a project's scope creep attempts to
manipulate the software architecture in a certain way in order to satisfy the desires of myriad stakeholders, the
architect must know when to say "NO" to select requests in order to say "YES" to others. An architect
must focus on actions that produce results early while staying on course for the long term. When project
variables outside of one's control change, the architect must adjust the strategy given the resources
available while maintaining the long-term goal.
An architect progressively makes critical decisions that define a specific direction for a system in terms of
implementation, operations, and maintenance. The critical decisions must be faithfully made and backed
up by understanding and evaluation of alternative options. These decisions usually result in tradeoffs that
principally define characteristics of a system. Additionally these decisions must be well documented in a
manner understood by others.
An architect sets quantifiable objectives that encapsulate quality attributes of a system. The fitness of the
architecture is measured against set marks.
An architect works closely with executives to explain the benefits and justify the investment in software
architecture of a solution. This may be done by participating in business process re-engineering activities,
by using Cost Benefit Analysis Method, or by measuring the level of component / architecture re-use
between projects with help from the software process improvement team. A software architect must be
effective in order to deliver results that are meaningful to the projects, have an impact on the bottom
line, and result in greater profits.
An architect inspires, mentors, and encourages colleagues to intelligently apply customized industry
best practices. Educating the recipients and participants of system architecture is essential to successfully
selling the chosen architectural path. Specifically the stakeholders must be able to understand, evaluate,
and reason about software architecture. If an architect is the only one who can read and understand
documented system architecture, then he has failed to integrate his best practices into the organizational
culture.
An architect fights entropy that threatens the architect's structural approach to problem solving. It's an
architect's job to keep the inertia going once the project is in progress. He or she must convince all
relevant stakeholders that the chosen approach is sound; moreover, the chosen architectural solution must
be well explained and justified. The benefits of implementing a system in a particular way must be
explained not only in terms of "that's the right pattern for this problem," but also to demonstrate the
measurable benefits - such as easier integration. For example, in a product line approach an architect must

be able to demonstrate how the subsequent projects will be easier to implement due to the presence of a
common base from which subsequent work can be done.
An architect creates and distributes tailored views of software architectures to appropriate stakeholders at
appropriate intervals. For example, a customer may demand to become more involved with a project and
they may need to know an abstract view of a system on the level understood by them. A government
customer may require an architect to demonstrate early in the project how a given system meets High
Level Architecture requirements for a specific framework. It's the architect's responsibility to identify
and present a sufficient level of information that a customer needs.
An architect acts as an agent of change in organizations where process maturity is not sufficient for
creating and maintaining architecture centric development. If the concept of software architecture is not
well recognized in an organization it may be a tough sell to formally recognize the role of software
architecture in a SDLC. Without senior management commitment and without mature software
development process, architecture of the system on paper may not reflect the actual architecture of a
system.

The three types of architect


Enterprise architects ensure convergence between business needs and technologies by establishing
architectural guidelines such as enterprise conceptual data models or, in service-oriented environments,
business service interfaces. For each project, they must validate that the technical solution designed by the
software architect complies with the corporation's network policies and capabilities. It is interesting to
note that the job of the enterprise architect is not terribly technical: It requires technical understanding,
but to an equal degree it requires understanding business issues and business needs.
Infrastructure architects, on the other hand, are highly technical. They ensure the safe and productive
deployment and operation of enterprise applications. This involves managing hardware, network and
operating systems, as well as the so-called infrastructure services, including security, logging, and error
management.
Software architects design the technical solution of the entire application. The logical and physical
structure that they conceive must simplify the technical work and be within the technical capabilities of
the development team.
Software architects must work hand-in-hand with enterprise and infrastructure architects.
Importance of an architect
The relationships among business goals, product requirements, the architect's experience, architectures, and
fielded systems form a cycle with feedback loops that a business can manage:
To handle growth, to expand enterprise area, and to take advantage of previous investments in
architecture & system building.

Architecture is the vehicle for stakeholder communication

Architecture manifests the earliest set of design decisions

Constraints on implementation

Dictates organizational structure

Inhibits or enables quality attributes

Architecture is a transferable abstraction of a system


-

Product lines share a common architecture

Allows for template-based development

Basis for training

What are the activities involved with the Architecture

Creating the Business Case for the System

Understanding the Requirements

Creating or Selecting the Architecture

Communicating the Architecture

Analyzing or Evaluating the Architecture

Implementing Based on the Architecture

Ensuring Conformance to an Architecture

What makes a Good Architecture

No such thing as an inherently good or bad architecture.

Architectures are more or less fit for some stated purpose.

Architectures can be evaluated - one of the great benefits of paying attention to them - but only in
the context of specific goals.

Rules of Thumb: process & product (structural) recommendations

What are the rules of thumb for making a Good Architecture?


Process Recommendations:

include functional requirements and a prioritized list of quality attributes the system must
satisfy

analyze & formally evaluate before it is too late to change

Product Recommendations:

well-defined modules using principles of information hiding & separation of concerns

separate modules that produce data from those that consume data to increase
modifiability & staged upgrades

write tasks or processes to allow easy reallocation, perhaps at runtime.

There are rules of thumb that must be followed while designing an architecture.
These fall into two categories. Process and Structure
Process for Developing an Architecture
1. The architecture should be the product of a single architect or a group of architects with an identified
leader.
2. The architect / team should have a set of functional requirements and non-functional requirements (
quality attributes ) that the architecture is supposed to satisfy. The quality attribute list should be
prioritized.
3. The architecture should be well documented.
4. The architecture should be communicated / presented to the stakeholders who should be actively
involved in its review.
5. The architecture should be analysed for quantitative measures like maximum throughput and evaluated
for quality attributes.
6. The architecture should lend itself to incremental implementation via the creation of a skeletal system
in which the communication paths are exercised but at first have minimal functionality.
7. The architecture should result in a specific set of resource contention areas, the resolution of which is
clearly specified, circulated, and maintained. If performance is a concern the architects should produce
time budgets for the major transactions or threads in the system.
Structure of the Architecture
1. The architecture should have well defined modules whose functional responsibilities are allocated on
the principles of information hiding and separation of concerns.
2. Each module should have a well defined interface that encapsulates changeable aspects from other
software that uses its facilities.

3. Every task and process should be written such that its assignment to a specific processor can be easily
changed perhaps even at runtime.
4. The architecture should feature a small number of simple interaction patterns. That is, the system
should do the same thing in the same way throughout.
5. Modules that produce data should be separate from modules that consume data.
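The recommendation to separate data producers from data consumers can be sketched as two modules coupled only through a queue. This is a minimal illustration; the doubling step stands in for arbitrary consumer logic, and either side can be replaced or reallocated without touching the other:

```python
# Producer and consumer as separate modules, coupled only through a queue.
from queue import Queue

def producer(q, items):
    """Produces data; knows nothing about who consumes it."""
    for item in items:
        q.put(item)
    q.put(None)  # sentinel marking end of data

def consumer(q):
    """Consumes data; can be upgraded or replaced independently."""
    results = []
    while (item := q.get()) is not None:
        results.append(item * 2)  # placeholder for real processing
    return results

q = Queue()
producer(q, [1, 2, 3])
print(consumer(q))  # [2, 4, 6]
```

Because the only shared element is the queue, the staged upgrades mentioned earlier become possible: a new consumer can be swapped in behind the same channel.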

Lecture 2
(Structures and Quality Attributes )

What is Architectural Pattern, Reference Models and Reference Architecture?


There are three stages that capture characteristics of an architecture, on the way from box-and-arrow to
full software architectures:
-

Architectural Patterns

Reference Models

Reference Architectures

Architectural Patterns

A description of element & relation types together with a set of constraints on how they may be used.
These constraints on an architecture define a set or family of architectures.
For example, the client-server pattern has two element types (client and server); their relationship is a protocol that the
server uses to communicate with each of its clients. The clients don't communicate directly. Functionality
is excluded.
The main characteristic of patterns is re-usability.
Value of Patterns

They exhibit known quality attributes, and are a reuse of experience.

Some patterns solve performance problems, others apply to high-security systems, or high-availability goals.

Often the architect's first major design decision.

Also referred to as architectural styles.
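A minimal in-process sketch of the client-server pattern: two element types whose only relationship is the protocol the server exposes. The toy key-value protocol and all names here are invented for illustration; the point is the constraint that clients share state only through the server, never with each other:

```python
class Server:
    """The single server element: owns the protocol and the shared state."""
    def __init__(self):
        self._store = {}

    def handle(self, request):
        # The protocol: a (verb, key, value) tuple. Clients know nothing more.
        verb, key, value = request
        if verb == "put":
            self._store[key] = value
            return "ok"
        if verb == "get":
            return self._store.get(key)
        return "error"

class Client:
    """A client element: communicates only with the server, never with peers."""
    def __init__(self, server):
        self._server = server

    def put(self, key, value):
        return self._server.handle(("put", key, value))

    def get(self, key):
        return self._server.handle(("get", key, None))

server = Server()
a, b = Client(server), Client(server)
a.put("greeting", "hello")
print(b.get("greeting"))  # hello - the clients share state only via the server
```

Note how the sketch matches the definition above: the element types and the protocol are fixed, while the functionality inside each client is left open.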

Reference Models

A division of functionality together with data flow between the pieces.

A standard decomposition of a known problem into parts that cooperatively solve the problem.

They arise from experience, and are thus a characteristic of mature domains.

For example, the standard parts of a compiler or database management system & how they work together.
Reference Architectures

A reference model mapped onto software elements and the data flows between them. The
elements must cooperatively implement the functionality defined in the reference model.

The mapping may be 1-1, but an element may implement a part of a function or several functions.

Between box-and-line sketches that are the barest of starting points and full-fledged architectures, with all
of the appropriate information about a system filled in, lie a host of intermediate stages. Each stage
represents the outcome of a set of architectural decisions, the binding of architectural choices. Some of
these intermediate stages are very useful in their own right. Before discussing architectural structures, we
define three of them.
1. An architectural pattern is a description of element and relation types together with a set of
constraints on how they may be used. A pattern can be thought of as a set of constraints on an
architecture-on the element types and their patterns of interaction-and these constraints define a
set or family of architectures that satisfy them. For example, client-server is a common
architectural pattern. Client and server are two element types, and their coordination is described
in terms of the protocol that the server uses to communicate with each of its clients. Use of the

term client-server implies only that multiple clients exist; the clients themselves are not identified,
and there is no discussion of what functionality, other than implementation of the protocols, has
been assigned to any of the clients or to the server. Countless architectures are of the client-server
pattern under this (informal) definition, but they are different from each other. An architectural
pattern is not an architecture, then, but it still conveys a useful image of the system-it imposes
useful constraints on the architecture and, in turn, on the system.
One of the most useful aspects of patterns is that they exhibit known quality attributes. This is
why the architect chooses a particular pattern and not one at random. Some patterns represent
known solutions to performance problems, others lend themselves well to high-security systems,
still others have been used successfully in high-availability systems. Choosing an architectural
pattern is often the architect's first major design choice.
The term architectural style has also been widely used to describe the same concept.
2. A reference model is a division of functionality together with data flow between the pieces. A
reference model is a standard decomposition of a known problem into parts that cooperatively
solve the problem. Arising from experience, reference models are a characteristic of mature
domains. Can you name the standard parts of a compiler or a database management system? Can
you explain in broad terms how the parts work together to accomplish their collective purpose? If
so, it is because you have been taught a reference model of these applications.
3. A reference architecture is a reference model mapped onto software elements (that cooperatively
implement the functionality defined in the reference model) and the data flows between them.
Whereas a reference model divides the functionality, a reference architecture is the mapping of
that functionality onto a system decomposition. The mapping may be, but by no means
necessarily is, one to one. A software element may implement part of a function or several
functions.
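The compiler reference model invoked above can be made concrete with a toy sketch. The three stages below are drastically simplified stand-ins for the standard parts (lexer, parser, code generator), and the single-operator "language" is invented purely to show the division of functionality and the data flow between the pieces:

```python
# A toy compiler reference model: standard parts with data flowing between them.
def lexer(source):
    """Divide the character stream into tokens."""
    return source.split()

def parser(tokens):
    """Build a (trivial) syntax tree: (operator, left operand, right operand)."""
    return (tokens[1], int(tokens[0]), int(tokens[2]))

def code_generator(ast):
    """Here we interpret directly instead of emitting code, to stay short."""
    op, a, b = ast
    return a + b if op == "+" else a - b

def compile_and_run(source):
    # The data flow of the reference model: source -> tokens -> tree -> result.
    return code_generator(parser(lexer(source)))

print(compile_and_run("2 + 3"))  # 5
```

Mapping each function onto a concrete software element (and possibly splitting or merging them) is exactly the step that turns this reference model into a reference architecture.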
Reference models, architectural patterns, and reference architectures are not architectures; they are useful
concepts that capture elements of an architecture. Each is the outcome of early design decisions.
The figure given below shows the relationships of reference models, architectural patterns, reference architectures,
and software architectures. (The arrows indicate that subsequent concepts contain more design elements.)

Reference architecture is an abstraction. For example, in software, we are often solving the same
problems, particularly within an industry, but for a different company. A reference architecture might
provide a template for solving the common problems faced by any company in the banking industry, such
as how to model a loan, or an API definition for wire transfer, as examples.

An actual architecture will be the fully-fleshed-out implementation of either the reference architecture
templates, or something custom, or maybe a combination of the two.
Definition: Architectural patterns are a method of arranging blocks of functionality to address a need.
Patterns can be used at the software, system, or enterprise levels. Good pattern expressions tell you how
to use them, and when, why, and what trade-offs to make in doing so. Patterns can be characterized
according to the type of solution they are addressing (e.g., structural or behavioral).
What are the differences between Architecture pattern and Design Pattern?
The term "design pattern" is often used to refer to any pattern which addresses issues of software
architecture, design, or programming implementation. In Pattern-Oriented Software Architecture: A
System of Patterns, the authors define these three types of patterns as follows:
An Architecture Pattern expresses a fundamental structural organization or schema for software
systems. It provides a set of predefined subsystems, specifies their responsibilities, and includes
rules and guidelines for organizing the relationships between them.
A Design Pattern provides a scheme for refining the subsystems or components of a software
system, or the relationships between them. It describes a commonly recurring structure of
communicating components that solves a general design problem within a particular context.
An Idiom is a low-level pattern specific to a programming language. An idiom describes how to
implement particular aspects of components or the relationships between them using the features
of the given language.
These distinctions are useful, but it is important to note that architecture patterns in this context still refers
solely to software architecture. Software architecture is certainly an important part of the focus of
TOGAF, but it is not its only focus.
Why is Architecture Important?
Three fundamental reasons from a technical perspective:
Communication among stakeholders
- a basis for mutual understanding, negotiation, & consensus
Early design decisions
- earliest point at which decisions can be analyzed
Transferable abstraction of a system
- can promote large-scale reuse
Communication among stakeholders

Software architecture represents a common abstraction of a system that most if not all of the system's
stakeholders can use as a basis for mutual understanding, negotiation, consensus, and communication.
Early design decisions
Software architecture manifests the earliest design decisions about a system, and these early bindings
carry weight far out of proportion to their individual gravity with respect to the system's remaining
development, its deployment, and its maintenance life.
It is also the earliest point at which design decisions governing the system to be built can be analyzed.
The architecture defines constraints on implementation
The architecture inhibits or enables a system's quality attributes
Predicting system qualities by studying the architecture
The architecture makes it easier to reason about and manage change
The architecture helps in evolutionary prototyping
The architecture enables more accurate cost and schedule estimates
Transferable abstraction of a system
Software architecture constitutes a relatively small, intellectually graspable model for how a system is
structured and how its elements work together, and this model is transferable across systems.
In particular, it can be applied to other systems exhibiting similar quality attribute and functional
requirements and can promote large-scale re-use.
Software product lines share a common architecture
Systems can be built using large, externally developed elements

List some of the quality attributes as per ISO 9126


Whether a system will be able to exhibit its desired (or required) quality attributes is substantially
determined by its architecture. The relationship between architectures and quality is as follows:
If your system requires high performance, you need to manage the time-based behavior of elements and
the frequency and volume of inter-element communication.
If modifiability is important, you need to assign responsibilities to elements such that changes to the
system do not have far-reaching consequences.
If your system must be highly secure, you need to manage and protect inter-element communication and
which elements are allowed to access which information. You may also need to introduce specialized
elements (such as a trusted kernel) into the architecture.
If you believe scalability will be needed in your system, you have to carefully localize the use of
resources to facilitate the introduction of higher-capacity replacements.

If your project needs to deliver incremental subsets of the system, you must carefully manage inter-component usage.
If you want the elements of your system to be re-usable in other systems, you need to restrict inter-element coupling so that when you extract an element it does not come out with too many attachments to
its current environment to be useful.
The strategies for these and other quality attributes are supremely architectural. It is important to
understand, however, that architecture alone cannot guarantee functionality or quality. Poor downstream
design or implementation decisions can always undermine an adequate architectural design. Decisions at
all stages of the life cycle, from high-level design to coding and implementation, affect system quality.
Therefore, quality is not completely a function of architectural design. To ensure quality, a good
architecture is necessary, but not sufficient.
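The advice above on restricting inter-element coupling for reuse can be sketched as programming against an interface rather than a concrete element. The `Notifier`/`OrderProcessor` names are invented for illustration:

```python
from typing import Protocol

class Notifier(Protocol):
    """The only thing OrderProcessor is allowed to know about notification."""
    def send(self, message: str) -> None: ...

class OrderProcessor:
    """Depends only on the Notifier interface, not on a concrete channel,
    so it can be extracted and reused with few attachments."""
    def __init__(self, notifier: Notifier):
        self._notifier = notifier

    def process(self, order_id: str) -> None:
        # Real processing would happen here; we only report completion.
        self._notifier.send(f"order {order_id} processed")

class ListNotifier:
    """One interchangeable implementation; an email or SMS channel would be another."""
    def __init__(self):
        self.sent = []
    def send(self, message: str) -> None:
        self.sent.append(message)

n = ListNotifier()
OrderProcessor(n).process("42")
print(n.sent)  # ['order 42 processed']
```

Extracting `OrderProcessor` into another system brings along only the small `Notifier` interface, which is precisely the "few attachments" property the text asks for.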
What is the difference between a view and a structure?
A view is a representation of a coherent set of architectural elements, consisting of:
- a set of elements
- the relationships among them

A structure is the set of elements itself, as they exist in software or hardware.

The two terms are often used interchangeably, but the text distinguishes them:

View
A representation of a set of elements and the relations among them.
Structure
The set of elements itself, as they exist in software or hardware
We restrict our attention at any one moment to one (or a small number) of the software system's
structures.
To communicate meaningfully about an architecture, we must make clear which structure or
structures we are discussing at the moment.
What are the different types of structures?
Correspond to the three broad types of decisions that architectural design involves:

How is the system to be structured as a set of code units (modules)?

How is the system to be structured as a set of elements that have runtime behavior
(components) and interactions (connectors)?

How is the system to relate to non-software structures in its environment (i.e., CPUs, file
systems, networks, development teams, etc. - allocation)?

There are 3 types of structures.


Module structures : units of implementation with assigned areas of functionality - usually static
Component-and-connector structures (C&C) : runtime components (principal units of computation)
and connectors (communication vehicles)

Allocation structures :show relationships between software elements & external environments (creation
or execution)

Module structures
Elements are modules, which are units of implementation.
* What is the primary functional responsibility assigned to each module?
* What other software elements is a module allowed to use?
* What other software does it actually use?
Decomposition
* shows how larger modules are decomposed into smaller ones recursively
Uses
* The units are: modules, procedures or resources on the interfaces of modules
* The units are related by the uses relation
Layered
* "uses relations" structured into layers
Class, or generalization
* shows the inherits-from or is-an-instance-of relations among the modules
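The uses and layered structures above can be made concrete with a small sketch: given a "uses" relation between modules, we can check mechanically whether it respects a layering (in a layered structure, a module may only use modules in the same or lower layers). A minimal sketch in Python; the module names, layer numbers, and the `layering_violations` helper are illustrative, not from the text.

```python
# Sketch: checking that a "uses" relation respects a layered module structure.
# The module names and layer assignments below are hypothetical examples.

LAYER = {"ui": 2, "logic": 1, "storage": 0}   # higher number = higher layer

USES = [
    ("ui", "logic"),       # allowed: a module may use same or lower layers
    ("logic", "storage"),  # allowed
    ("storage", "ui"),     # violation: a lower layer uses a higher layer
]

def layering_violations(uses, layer):
    """Return the uses-pairs where a module uses a module in a higher layer."""
    return [(a, b) for a, b in uses if layer[a] < layer[b]]

print(layering_violations(USES, LAYER))  # → [('storage', 'ui')]
```

Engineering the uses structure this way is what makes "easily extended or contracted" systems possible, as discussed later under non-functional properties.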
Component-and-connector structures
Elements are runtime components (units of computation) and connectors (communication
vehicles among components)
The relation is attachment, showing how the components and connectors are hooked together
* What are the major executing components and how do they interact?
* What are the major shared data stores?
* Which parts of the system are replicated?
* How does data progress through the system?
* What parts of the system can run in parallel?
* How can the system's structure change as it executes?
Process, or communicating processes
* units are processes or threads that are connected with each other by communication,
synchronization, and/or exclusion operations
Concurrency
* The units are components and the connectors are logical threads
* A logical thread is a sequence of computation that can be allocated to a separate
physical thread

Shared data, or repository


* This structure comprises components and connectors that create, store, and access
persistent data
Client-server
* The components are the clients and servers, and the connectors are protocols and
messages
Allocation structures
the relationship between the software elements and the elements in one or more external
environments
* What processor does each software element execute on?
* In what files is each element stored during development, testing, and system building?
* What is the assignment of software elements to development teams?
Deployment
* Shows how software (usually a process from a component-and-connector view) is
assigned to hardware-processing and communication elements
* Relations are allocated-to and migrates-to if the allocation is dynamic
Implementation
* how software elements (usually modules) are mapped to the file structure(s)
Work assignment
* assigns responsibility for implementing and integrating the modules to development
teams
Non-functional Properties
Deals with quality part of the system.
Each structure provides a method for reasoning about some of the relevant quality attributes, for example:

the uses structure, must be engineered to build a system that can be easily extended or
contracted

the process structure is engineered to eliminate deadlock and reduce bottlenecks

the module decomposition structure is engineered to produce modifiable systems, etc.

Relating Structures to Each Other

Although the structures give different system perspectives, they are not independent.

Elements of one structure are related to elements in another, and we need to reason about these
relationships.

- For example, a module in a decomposition structure may map to one, part of one, or several,
components in a component-and-connector structure at runtime.

In general, mappings are many-many.

How to choose a structure


Kruchten's 4+1 Views:
Logical - elements are key abstractions that are objects or classes in OO. This is a module view.

Process - addresses concurrency & distribution of functionality. This is a C&C (component-and-connector) view.
Development - shows organization of software modules, libraries, subsystems, and units of development.
This is an allocation view.
Physical - maps other elements onto processing & communication nodes, also an allocation view, but
usually referred to specifically as the deployment view.

Lecture 3
(Quality classes and attribute, quality attribute scenario and architectural tactics)

What is Architecture Tradeoff Analysis Method

The Architecture Tradeoff Analysis Method (ATAM) is a method for evaluating software architectures
relative to quality attribute goals. ATAM evaluations expose architectural risks that potentially inhibit the
achievement of an organization's business goals. The ATAM gets its name because it not only reveals
how well an architecture satisfies particular quality goals, but it also provides insight into how those
quality goals interact with each other: how they trade off against each other.
What is functionality?

Ability of the system to fulfill its responsibilities

Software Quality Attributes- also called non-functional properties

Orthogonal to functionality

is a constraint that the system must satisfy while delivering its functionality

Design Decisions

A constraint driven by external factors (use of a programming language, making everything service oriented)

Consider the following requirements

User interface should be easy to use

User interface should allow redo/undo at any level of depth

Radio button or check box? Clear text? Screen layout? --- NOT architectural decisions

Architectural decision

The system should be modifiable with least impact

Modular design is a must - architectural

Coding technique should be simple - not architectural

Need to process 300 requests/sec

Interaction among components, data sharing issues--architectural

Choice of algorithm to handle transactions -- non architectural

Quality Attributes and Functionality

Any product (software products included) is sold based on its functionality, i.e., its
features

Mobile phone, MS-Office software

Providing the desired functionality is often quite challenging

Time to market

Cost and budget

Rollout Schedule

Functionality DOES NOT determine the architecture. If functionality is the only thing you need

It is perfectly fine to create a monolithic software blob!

You wouldn't require modules, threads, distributed systems, etc.

Examples of Quality Attributes


The success of a product will ultimately rest on its Quality attributes
Too slow!-- performance

Keeps crashing! --- availability


So many security holes! --- security
Reboot every time a feature is changed! --- modifiability
Does not work with my home theater! --- integrability
Needs to be achieved throughout the design, implementation and deployment
Should be designed in and also evaluated at the architectural level
Quality attributes are NON-orthogonal
One can have an effect (positive or negative) on another
e.g., performance is affected by nearly all the others: most other attributes demand more code, whereas
performance demands the least
Defining and understanding system quality attributes

Defining a quality attribute for a system

"System should be modifiable" is a vague, ambiguous requirement

How to associate a failure to a quality attribute

Is it an availability problem, performance problem or security or all of them?

Everyone has his own vocabulary of quality

ISO 9126 and ISO 25000 attempts to create a framework to define quality attributes.

Different quality attributes


Availability is concerned with system failure and the duration of system failures. System failure occurs
when the system does not provide the service for which it was intended.
Modifiability is about the cost of change, both in time and money.
Performance is about timeliness. Events occur and the system must respond in a timely fashion.
Security is the ability of the system to prevent or resist unauthorized access while providing access to
legitimate users. An attack is an attempt to breach security.
Testability refers to the ease with which the software can be made to demonstrate its faults or lack
thereof. To be testable the system must control inputs and be able to observe outputs.
Usability is how easy it is for the user to accomplish tasks and what support the system provides for the
user to accomplish this. Dimensions are:
Learning system features
Using the system efficiently
Minimizing the impact of errors
Adapting the system to the user's needs
Increasing confidence and satisfaction
Quality Attribute Scenarios
A Quality Attribute Scenario is a quality attribute specific requirement.

There are 6 parts:


1. Source of stimulus (e.g., human, computer system, etc.)
2. Stimulus : a condition that needs to be considered
3. Environment : what are the conditions when the stimulus occurs?
4. Artifact : what elements of the system are stimulated.
5. Response : the activity undertaken after arrival of the stimulus
6. Response measure : the measure of the response when it occurs. It should be measurable so that the requirement can be
tested.
The quality attributes must be described in terms of scenarios, such as when 100 users initiate complete
payment transactions, the payment component, under normal circumstances, will process the requests with
an average latency of three seconds. This statement, or scenario, allows an architect to make quantifiable
arguments about a system. A scenario defines the source of stimulus (users), the actual stimulus (initiate
transaction), the artifact affected (payment component), the environment in which it exists (normal
operation), the effect of the action (transaction processed), and the response measure (within three
seconds). Writing such detailed statements is only possible when relevant requirements have been
identified and an idea of components has been proposed.
Writing effective scenarios takes some time to learn. But it's an important skill, as it is in the scenarios
that vaguely desired software behaviors are turned into tangible and measurable goals. Measurable
goals tell you what architectural approaches and tactics to apply as you design the system. It's easiest to
learn by looking at examples; catalogs of well-defined quality attribute scenarios are available for study.
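The six scenario parts can be captured as a simple record, and the payment example from the text fills in each part. A minimal sketch, where the class and field names are chosen here for illustration:

```python
from dataclasses import dataclass

@dataclass
class QualityAttributeScenario:
    source: str            # who or what generates the stimulus
    stimulus: str          # the condition that needs to be considered
    environment: str       # conditions under which the stimulus occurs
    artifact: str          # the part of the system that is stimulated
    response: str          # activity undertaken after the stimulus arrives
    response_measure: str  # how the response is measured, so it can be tested

# The payment scenario from the text, expressed in the six parts:
payment = QualityAttributeScenario(
    source="100 concurrent users",
    stimulus="initiate complete payment transaction",
    environment="normal operation",
    artifact="payment component",
    response="requests are processed",
    response_measure="average latency of three seconds",
)
print(payment.response_measure)
```

Recording scenarios in a structured form like this makes it obvious when a part (most often the response measure) is missing.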

What do you mean by Architectural Tactics? Or How to Achieve Quality?


Scenarios help describe the qualities of a system, but they don't describe how those qualities will be achieved.
Architectural tactics describe how a given quality can be achieved. For each quality there may be a large
set of tactics available to an architect. It is the architect's job to select the right tactic in light of the needs
of the system and the environment. For example, performance tactics may include options to develop
better processing algorithms, develop a system for parallel processing, or revise the event scheduling policy.
Whatever tactic is chosen, it must be justified and documented.
System qualities can be categorized into four parts: runtime qualities, non-runtime qualities, business
qualities, and architecture qualities. Each of the categories and its associated qualities are briefly
described below. Other articles on this site provide more information about each of the software quality
attributes listed below, their applicable properties, and the conflicts among the qualities.
Runtime System Qualities
Runtime System Qualities can be measured as the system executes.

Functionality
Definition: the ability of the system to do the work for which it was intended.
Performance
Definition: the response time, utilization, and throughput behavior of the system. Not to be confused with
human performance or system delivery time.
Security
Definition: a measure of the system's ability to resist unauthorized attempts at usage or behavior
modification, while still providing service to legitimate users.
Availability (Reliability quality attributes falls under this category)
Definition: the measure of time that the system is up and running correctly; the length of time between
failures and the length of time needed to resume operation after a failure.
Usability
Definition: the ease of use and of training the end users of the system. Sub qualities: learnability,
efficiency, affect, helpfulness, control.
Interoperability
Definition: the ability of two or more systems to cooperate at runtime
Non-Runtime System Qualities
Non-Runtime System Qualities cannot be measured as the system executes.
Modifiability
Definition: the ease with which a software system can accommodate changes to its software
Portability
Definition: the ability of a system to run under different computing environments. The environment types
can be either hardware or software, but is usually a combination of the two.
Reusability
Definition: the degree to which existing applications can be reused in new applications.
Integrability
Definition: the ability to make the separately developed components of the system work correctly
together.
Testability
Definition: the ease with which software can be made to demonstrate its faults
Business Qualities
Non-Software System Qualities that influence other quality attributes.
Cost and Schedule
Definition: the cost of the system with respect to time to market, expected project lifetime, and utilization
of legacy and COTS systems.
Marketability
Definition: the use of the system with respect to market competition.
Appropriateness for Organization

Definition: availability of the human input, allocation of expertise, and alignment of team and software
structure. Business process re-engineering
Architecture Qualities
Quality attributes specific to the architecture itself.
Conceptual Integrity
Definition: the integrity of the overall structure that is composed from a number of small architectural
structures.
Correctness
Definition: accountability for satisfying all requirements of the system.
Domain Specific Qualities
Quality attributes found in specific business domains.
Sensitivity
Definition: the degree to which a system component can pick up something being measured.
Calibrability
Definition: ability of a system to recalibrate itself to some specific working range.

What are the different design decisions one needs to take to achieve quality?

To address a quality, the following seven design decisions need to be taken:

Allocation of responsibilities

Coordination model

Data model

Resource Management

Mapping among architectural elements

Binding time decisions

Technology choice

Allocation of Responsibilities
Decisions involving allocation of responsibilities include the following:
Identifying the important responsibilities, including basic system functions, architectural
infrastructure, and satisfaction of quality attributes.
Determining how these responsibilities are allocated to non-runtime and runtime elements
(namely, modules, components, and connectors).

Strategies for making these decisions include functional decomposition, modeling real-world objects,
grouping based on the major modes of system operation, or grouping based on similar quality
requirements: processing frame rate, security level, or expected changes.
In Chapters 5-11, where we apply these design decision categories to a number of important quality
attributes, the checklists we provide for the allocation of responsibilities category are derived
systematically from understanding the stimuli and responses listed in the general scenario for that QA.
Coordination Model
Software works by having elements interact with each other through designed mechanisms. These
mechanisms are collectively referred to as a coordination model. Decisions about the coordination model
include these:
Identifying the elements of the system that must coordinate, or are prohibited from coordinating.
Determining the properties of the coordination, such as timeliness, currency, completeness,
correctness, and consistency.
Choosing the communication mechanisms (between systems, between our system and external
entities, between elements of our system) that realize those properties. Important properties of the
communication mechanisms include stateful versus stateless, synchronous versus asynchronous,
guaranteed versus nonguaranteed delivery, and performance-related properties such as throughput
and latency.
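The synchronous-versus-asynchronous property above can be illustrated with a toy sketch: a synchronous call blocks the caller until the reply arrives, while an asynchronous design decouples sender and receiver through a queue and a worker. This is only an illustration of the distinction, not a prescribed mechanism; the request strings and `handle` function are hypothetical.

```python
import queue
import threading

def handle(request):
    return f"handled:{request}"

# Synchronous coordination: the caller blocks until the reply is available.
sync_reply = handle("req-1")

# Asynchronous coordination: the sender enqueues work and continues;
# a worker thread produces the reply later.
requests, replies = queue.Queue(), queue.Queue()

def worker():
    while True:
        req = requests.get()
        if req is None:            # sentinel tells the worker to stop
            break
        replies.put(handle(req))

t = threading.Thread(target=worker)
t.start()
requests.put("req-2")              # the sender does not wait for the reply
requests.put(None)
t.join()
async_reply = replies.get()
print(sync_reply, async_reply)
```

Note how the asynchronous variant also raises the other properties listed above: delivery guarantees, ordering, and completeness now depend on the queue, not on the caller.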
Data Model
Every system must represent artifacts of system-wide interest (data) in some internal fashion. The
collection of those representations and how to interpret them is referred to as the data model. Decisions
about the data model include the following:
Choosing the major data abstractions, their operations, and their properties. This includes
determining how the data items are created, initialized, accessed, persisted, manipulated,
translated, and destroyed.
Compiling metadata needed for consistent interpretation of the data.
Organizing the data. This includes determining whether the data is going to be kept in a relational
database, a collection of objects, or both. If both, then the mapping between the two different
locations of the data must be determined.
Management of Resources
An architect may need to arbitrate the use of shared resources in the architecture. These include hard
resources (e.g., CPU, memory, battery, hardware buffers, system clock, I/O ports) and soft resources (e.g.,
system locks, software buffers, thread pools, and non-thread-safe code).
Decisions for management of resources include the following:
Identifying the resources that must be managed and determining the limits for each.
Determining which system element(s) manage each resource.

Determining how resources are shared and the arbitration strategies employed when there is
contention.
Determining the impact of saturation on different resources. For example, as a CPU becomes
more heavily loaded, performance usually just degrades fairly steadily. On the other hand, when
you start to run out of memory, at some point you start paging/swapping intensively and your
performance suddenly crashes to a halt.
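Arbitration under contention can be sketched with a bounded pool: a semaphore caps concurrent use of a shared resource, and requests beyond the limit wait. A minimal sketch; the resource name, pool size, and counters are arbitrary examples introduced here.

```python
import threading
import time

MAX_CONNECTIONS = 2   # hypothetical cap on a shared resource (e.g., DB handles)
pool = threading.BoundedSemaphore(MAX_CONNECTIONS)
lock = threading.Lock()
active = 0
peak = 0

def use_resource():
    global active, peak
    with pool:                      # blocks while the resource is saturated
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.01)            # stand-in for work using the resource
        with lock:
            active -= 1

threads = [threading.Thread(target=use_resource) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(peak)  # never exceeds MAX_CONNECTIONS
```

The semaphore is the arbitration strategy here (first-come, first-served blocking); other strategies, such as rejecting or prioritizing requests, change the saturation behavior.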
Mapping among Architectural Elements
An architecture must provide two types of mappings. First, there is mapping between elements in
different types of architecture structures, for example, mapping from units of development (modules) to
units of execution (threads or processes). Next, there is mapping between software elements and
environment elements, for example, mapping from processes to the specific CPUs where these processes
will execute.
Useful mappings include these:
The mapping of modules and runtime elements to each other, that is, the runtime elements that
are created from each module; the modules that contain the code for each runtime element.
The assignment of runtime elements to processors.
The assignment of items in the data model to data stores.
The mapping of modules and runtime elements to units of delivery.
Binding Time Decisions
Binding time decisions introduce allowable ranges of variation. This variation can be bound at different
times in the software life cycle by different entities, from design time by a developer to runtime by an
end user. A binding time decision establishes the scope, the point in the life cycle, and the mechanism for
achieving the variation.
The decisions in the other six categories have an associated binding time decision. Examples of such
binding time decisions include the following:
For allocation of responsibilities, you can have build-time selection of modules via a
parameterized makefile.
For choice of coordination model, you can design runtime negotiation of protocols.
For resource management, you can design a system to accept new peripheral devices plugged in
at runtime, after which the system recognizes them and downloads and installs the right drivers
automatically.
For choice of technology, you can build an app store for a smartphone that automatically
downloads the version of the app appropriate for the phone of the customer buying the app.
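Late binding can be sketched as a registry consulted at runtime: which implementation runs is decided by a configuration value read at startup rather than being fixed at design time. The exporter names and registry shape below are hypothetical.

```python
# Sketch: binding the choice of implementation at runtime via a registry,
# rather than hard-coding it at design time. Names are hypothetical.

def json_exporter(data):
    return "json:" + str(data)

def csv_exporter(data):
    return "csv:" + str(data)

EXPORTERS = {"json": json_exporter, "csv": csv_exporter}

def export(data, fmt):
    # The binding from 'fmt' to code happens here, at runtime;
    # new exporters can be registered without touching this function.
    return EXPORTERS[fmt](data)

config_format = "csv"              # e.g., read from a config file at startup
print(export([1, 2], config_format))  # → csv:[1, 2]
```

The cost trade-off discussed next applies directly: the registry indirection is the up-front cost of late binding, paid to avoid modifying `export` for every new format.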
When making binding time decisions, you should consider the costs to implement the decision and the
costs to make a modification after you have implemented the decision. For example, if you are
considering changing platforms at some time after code time, you can insulate yourself from the effects
caused by porting your system to another platform at some cost. Making this decision depends on the

costs incurred by having to modify an early binding compared to the costs incurred by implementing the
mechanisms involved in the late binding.
Choice of Technology
Every architecture decision must eventually be realized using a specific technology. Sometimes the
technology selection is made by others, before the intentional architecture design process begins. In this
case, the chosen technology becomes a constraint on decisions in each of our seven categories. In other
cases, the architect must choose a suitable technology to realize a decision in every one of the categories.
Choice of technology decisions involve the following:
Deciding which technologies are available to realize the decisions made in the other categories.
Determining whether the available tools to support this technology choice (IDEs, simulators,
testing tools, etc.) are adequate for development to proceed.
Determining the extent of internal familiarity as well as the degree of external support available
for the technology (such as courses, tutorials, examples, and availability of contractors who can
provide expertise in a crunch) and deciding whether this is adequate to proceed.
Determining the side effects of choosing a technology, such as a required coordination model or
constrained resource management opportunities.
Determining whether a new technology is compatible with the existing technology stack. For
example, can the new technology run on top of or alongside the existing technology stack? Can it
communicate with the existing technology stack? Can the new technology be monitored and
managed?
Requirements for a system come in three categories:
1. Functional. These requirements are satisfied by including an appropriate set of responsibilities
within the design.
2. Quality attribute. These requirements are satisfied by the structures and behaviors of the
architecture.
3. Constraints. A constraint is a design decision that's already been made.
To express a quality attribute requirement, we use a quality attribute scenario. The parts of the scenario
are these:
1. Source of stimulus
2. Stimulus
3. Environment
4. Artifact
5. Response
6. Response measure

An architectural tactic is a design decision that affects a quality attribute response. The focus of a tactic is
on a single quality attribute response. Architectural patterns can be seen as packages of tactics.
The seven categories of architectural design decisions are these:

1. Allocation of responsibilities
2. Coordination model
3. Data model
4. Management of resources
5. Mapping among architectural elements
6. Binding time decisions
7. Choice of technology

1. What is the relationship between a use case and a quality attribute scenario? If you wanted to add
quality attribute information to a use case, how would you do it?
2. Do you suppose that the set of tactics for a quality attribute is finite or infinite? Why?
3. Discuss the choice of programming language (an example of choice of technology) and its
relation to architecture in general, and the design decisions in the other six categories? For
instance, how can certain programming languages enable or inhibit the choice of particular
coordination models?
4. We will be using the automatic teller machine as an example throughout the chapters on quality
attributes. Enumerate the set of responsibilities that an automatic teller machine should support
and propose an initial design to accommodate that set of responsibilities. Justify your proposal.
5. Think about the screens that your favorite automatic teller machine uses. What do those screens
tell you about binding time decisions reflected in the architecture?
6. Consider the choice between synchronous and asynchronous communication (a choice in the
coordination mechanism category). What quality attribute requirements might lead you to choose
one over the other?
7. Consider the choice between stateful and stateless communication (a choice in the coordination
mechanism category). What quality attribute requirements might lead you to choose one over the
other?
8. Most peer-to-peer architecture employs late binding of the topology. What quality attributes does
this promote or inhibit?

Lecture 4
(Usability and Its Tactics)
What is Usability?
Usability is concerned with how easy it is for the user to accomplish a desired task and the kind of user
support the system provides. It can be broken down into the following areas:
Learning system features. If the user is unfamiliar with a particular system or a particular aspect
of it, what can the system do to make the task of learning easier?
Using a system efficiently. What can the system do to make the user more efficient in its
operation?
Minimizing the impact of errors. What can the system do so that a user error has minimal impact?
Adapting the system to user needs. How can the user (or the system itself) adapt to make the
user's task easier?
Increasing confidence and satisfaction. What does the system do to give the user confidence that
the correct action is being taken?
Two types of tactics support usability, each intended for two categories of "users." The first category,
runtime, includes those that support the user during system execution. The second category is based on
the iterative nature of user interface design and supports the interface developer at design time. It is
strongly related to the modifiability tactics already presented.

Goal of runtime usability tactics

Once a system is executing, usability is enhanced by giving the user feedback as to what the system is
doing and by providing the user with the ability to issue usability-based commands such as cancel, undo,
aggregate, and show multiple views. These commands support the user in either error correction or more
efficient operations.
Human-computer interaction research has used the terms "user initiative," "system initiative," and "mixed initiative" to
describe which of the human/computer pair takes the initiative in performing certain actions and how the
interaction proceeds. The usability scenarios (Chapter 4, Understanding Quality Attributes) combine
initiatives from both perspectives. For example, when canceling a command the user issues a cancel
("user initiative") and the system responds. During the cancel, however, the system may put up a progress
indicator ("system initiative"). Thus, cancel demonstrates "mixed initiative." We use this distinction
between user and system initiative to discuss the tactics that the architect uses to achieve the various
scenarios.
When the user takes the initiative, the architect designs a response as if for any other piece of
functionality. The architect must enumerate the responsibilities of the system to respond to the user
command. To use the cancel example again: When the user issues a cancel command, the system must be
listening for it (thus, there is the responsibility to have a constant listener that is not blocked by the
actions of whatever is being canceled); the command to cancel must be killed; any resources being used
by the canceled command must be freed; and components that are collaborating with the canceled
command must be informed so that they can also take appropriate action.
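The cancel responsibilities just listed can be sketched with a worker thread and a cancellation flag: the listener is never blocked by the work being canceled, and the worker frees its state and notifies a collaborator on exit. A minimal sketch under those assumptions; the flag, resource list, and notification list are stand-ins introduced here.

```python
import threading
import time

cancel_requested = threading.Event()   # the constant, non-blocked "listener"
notified = []                          # stands in for informing collaborators

def long_running_task():
    resources = ["buffer"]             # stand-in for resources held by the work
    for _ in range(1000):
        if cancel_requested.is_set():  # check the cancel flag between steps
            break
        time.sleep(0.001)
    resources.clear()                  # free resources used by the canceled work
    notified.append("collaborator informed of cancel")

worker = threading.Thread(target=long_running_task)
worker.start()
cancel_requested.set()                 # user initiative: issue the cancel
worker.join()
print(notified)
```

The key design point is that the cancel signal is observed by the main thread's `Event`, not processed inside the blocked work, which is exactly the "constant listener" responsibility.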
When the system takes the initiative, it must rely on some information (a model) about the user, the task
being undertaken by the user, or the system state itself. Each model requires various types of input to
accomplish its initiative. The system initiative tactics are those that identify the models the system uses to
predict either its own behavior or the user's intention. Encapsulating this information will enable an
architect to more easily tailor and modify those models. Tailoring and modification can be either
dynamically based on past user behavior or offline during development.
Maintain a model of the task. In this case, the model maintained is that of the task. The task model is used
to determine context so the system can have some idea of what the user is attempting and provide various
kinds of assistance. For example, knowing that sentences usually start with capital letters would allow an
application to correct a lower-case letter in that position.
Maintain a model of the user. In this case, the model maintained is of the user. It determines the
user's knowledge of the system, the user's behavior in terms of expected response time, and other
aspects specific to a user or a class of users. For example, maintaining a user model allows the
system to pace scrolling so that pages do not fly past faster than they can be read.
Maintain a model of the system. In this case, the model maintained is that of the system. It
determines the expected system behavior so that appropriate feedback can be given to the user.
The system model predicts items such as the time needed to complete current activity.

DESIGN-TIME TACTICS
User interfaces are typically revised frequently during the testing process. That is, the usability engineer
will give the developers revisions to the current user interface design and the developers will implement
them. This leads to a tactic that is a refinement of the modifiability tactic of semantic coherence:
Separate the user interface from the rest of the application. Localizing expected changes is the
rationale for semantic coherence. Since the user interface is expected to change frequently both
during the development and after deployment, maintaining the user interface code separately will
localize changes to it. The software architecture patterns developed to implement this tactic and to
support the modification of the user interface are:
- Model-View-Controller
- Presentation-Abstraction-Control
- Seeheim
- Command Pattern (can be used to implement undo/redo operations)
- Arch/Slinky
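The "separate the user interface" tactic is what Model-View-Controller packages: the model holds application state with no UI knowledge, the view only renders, and the controller mediates between them, so the interface can be revised without touching the model. A minimal sketch using the standard pattern roles; the counter example itself is illustrative.

```python
class CounterModel:            # application state, no knowledge of the UI
    def __init__(self):
        self.value = 0

    def increment(self):
        self.value += 1

class CounterView:             # rendering only; replaceable independently
    def render(self, value):
        return f"Count: {value}"

class CounterController:       # mediates user input -> model -> view
    def __init__(self, model, view):
        self.model, self.view = model, view

    def on_click(self):
        self.model.increment()
        return self.view.render(self.model.value)

controller = CounterController(CounterModel(), CounterView())
print(controller.on_click())   # → Count: 1
```

Swapping `CounterView` for a different rendering (say, localized text) changes no model or controller code, which is the localization of change the tactic aims for.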
Summary of runtime usability tactics

The end user is the one who is going to use the system. The user wants to learn the system in order to use it efficiently.
If usability is good, then when the system is updated, the update concerns just the
functionality, without changing much in the user interface.
The runtime tactics address how the system takes the initiative and how it responds efficiently to the user's initiative.
User Initiative and System Response

Cancel

When the user issues cancel, the system must listen to it (in a separate thread)

Cancel action must clean the memory, release other resources and send cancel command
to the collaborating components

Undo

System needs to maintain a history of earlier states which can be restored

This information can be stored as snapshots

Pause/resume

Should implement a mechanism to temporarily stop a running activity, take its snapshot,
and then release the resources for others' use
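A minimal pause/resume sketch, again with illustrative names; the worker blocks while paused, and a snapshot is taken before resources are yielded:

```python
import threading

class PausableWorker:
    """Pause/resume tactic sketch; class and field names are illustrative."""

    def __init__(self):
        self._resume = threading.Event()
        self._resume.set()                      # start in the running state
        self.snapshot = None
        self.progress = 0

    def step(self):
        self._resume.wait()                     # blocks while paused
        self.progress += 1                      # one unit of work

    def pause(self):
        self.snapshot = {"progress": self.progress}  # snapshot before yielding
        self._resume.clear()                    # resources could be freed now

    def resume(self):
        self.progress = self.snapshot["progress"]    # restore from snapshot
        self._resume.set()

w = PausableWorker()
w.step(); w.step()
w.pause()                # activity stopped, snapshot taken
w.resume()               # state restored, work continues
w.step()
print(w.progress)        # 3
```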

Aggregate (change font of the entire paragraph)

For an operation to be applied to a large number of objects

Provide facility to group these objects and apply the operation to the group
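The aggregation tactic amounts to a grouping object that forwards one operation to every member; a sketch with hypothetical document elements:

```python
class Span:
    """A hypothetical document element with a font attribute."""
    def __init__(self, text, font="serif"):
        self.text, self.font = text, font

class Group:
    """Aggregation tactic: one operation applied to every member at once."""
    def __init__(self, members):
        self.members = list(members)
    def apply(self, op):
        for m in self.members:
            op(m)

paragraph = Group([Span("Hello"), Span("world")])
paragraph.apply(lambda s: setattr(s, "font", "sans"))  # restyle whole paragraph
print([s.font for s in paragraph.members])             # ['sans', 'sans']
```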

System Initiated

Task model

Determine the current runtime context, infer what the user is attempting, and then help

Correct spelling during typing but not during password entry (context-specific help)

System model

Maintains its own model and provides feedback on some internal activities

Time needed to complete the current activity (e.g., % completion of a download)

User model

Captures the user's knowledge of the system and behavioral patterns, and provides help

Adjust scrolling speed, user specific customization, locale specific adjustment

What are the different software architecture models for user interfaces?

Model view controller architecture pattern

Presentation abstraction control

Command Pattern (can be used to implement undo/redo operations)

Arch/Slinky

Similar to Model view controller
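The Command Pattern's undo/redo support can be sketched as follows (class names are illustrative): each executed command is pushed on an undo stack so it can later be reversed or replayed.

```python
class Command:
    def execute(self): ...
    def undo(self): ...

class AppendText(Command):
    """A hypothetical concrete command operating on a document (a list)."""
    def __init__(self, doc, text):
        self.doc, self.text = doc, text
    def execute(self):
        self.doc.append(self.text)
    def undo(self):
        self.doc.pop()

class History:
    """Executed commands are stacked so they can be undone and redone."""
    def __init__(self):
        self._undo, self._redo = [], []
    def run(self, cmd):
        cmd.execute()
        self._undo.append(cmd)
        self._redo.clear()          # a new action invalidates the redo stack
    def undo(self):
        cmd = self._undo.pop()
        cmd.undo()
        self._redo.append(cmd)
    def redo(self):
        cmd = self._redo.pop()
        cmd.execute()
        self._undo.append(cmd)

doc = []
h = History()
h.run(AppendText(doc, "a")); h.run(AppendText(doc, "b"))
h.undo()
print(doc)        # ['a']
h.redo()
print(doc)        # ['a', 'b']
```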

Usability General Scenarios

Figure above gives an example of a usability scenario: A user, wanting to minimize the impact of an
error, wishes to cancel a system operation at runtime; cancellation takes place in less than one second.
The portions of the usability general scenarios are:
Source of stimulus. The end user is always the source of the stimulus.
Stimulus. The stimulus is that the end user wishes to use a system efficiently, learn to use the
system, minimize the impact of errors, adapt the system, or feel comfortable with the system. In
our example, the user wishes to cancel an operation, which is an example of minimizing the
impact of errors.
Artifact. The artifact is always the system.
Environment. The user actions with which usability is concerned always occur at runtime or at
system configuration time. Any action that occurs before then is performed by developers and,
although a user may also be the developer, we distinguish between these roles even if performed
by the same person. In figure above, the cancellation occurs at runtime.
Response. The system should either provide the user with the features needed or anticipate the
user's needs. In our example, the cancellation occurs as the user wishes and the system is restored
to its prior state.
Response measure. The response is measured by task time, number of errors, number of problems
solved, user satisfaction, gain of user knowledge, ratio of successful operations to total
operations, or amount of time/data lost when an error occurs. In figure above, the cancellation
should occur in less than one second.
The usability general scenario generation table is given below
Portion of Scenario

Possible Values

Source

End user

Stimulus

Wants to
learn system features; use system efficiently; minimize impact of errors;
adapt system; feel comfortable

Artifact

System

Environment

At runtime or configure time

Response

System provides one or more of the following responses:


to support "learn system features":
help system is sensitive to context; interface is familiar to user; interface is
usable in an unfamiliar context
to support "use system efficiently":
aggregation of data and/or commands; re-use of already entered data and/or
commands; support for efficient navigation within a screen; distinct views
with consistent operations; comprehensive searching; multiple simultaneous
activities
to "minimize impact of errors":
undo, cancel, recover from system failure, recognize and correct user error,
retrieve forgotten password, verify system resources
to "adapt system":
customizability; internationalization
to "feel comfortable":
display system state; work at the user's pace

Response Measure

Task time, number of errors, number of problems solved, user satisfaction,


gain of user knowledge, ratio of successful operations to total operations,
amount of time/data lost

COMMUNICATING CONCEPTS USING GENERAL SCENARIOS


One of the uses of general scenarios is to enable stakeholders to communicate. We have already pointed
out that each attribute community has its own vocabulary to describe its basic concepts and that different
terms can represent the same occurrence. This may lead to miscommunication. During a discussion of
performance, for example, a stakeholder representing users may not realize that the latency of the
response to events has anything to do with users. Facilitating this kind of understanding aids discussions
of architectural decisions, particularly about trade-offs.
Quality Attribute Stimuli
Quality Attribute

Stimulus

Availability

Unexpected event, nonoccurrence of expected event

Modifiability

Request to add/delete/change/vary functionality, platform, quality attribute, or capacity

Performance

Periodic, stochastic, or sporadic event arrivals

Security

Tries to display, modify, or change/delete information; access system services; or reduce
availability of system services

Testability

Completion of phase of system development

Usability

Wants to learn system features, use a system efficiently, minimize the impact
of errors, adapt the system, feel comfortable

Above table gives the stimuli possible for each of the attributes and shows a number of different concepts.
Some stimuli occur during runtime and others occur before. The problem for the architect is to understand
which of these stimuli represent the same occurrence, which are aggregates of other stimuli, and which
are independent. Once the relations are clear, the architect can communicate them to the various
stakeholders using language that each comprehends. We cannot give the relations among stimuli in a
general way because they depend partially on environment. A performance event may be atomic or may
be an aggregate of other lower-level occurrences; a failure may be a single performance event or an
aggregate. For example, it may occur with an exchange of several messages between a client and a server
(culminating in an unexpected message), each of which is an atomic event from a performance
perspective.

An Analysis Framework for Specifying Quality Attributes


[For each quality-attribute-specific requirement.]

Source of stimulus. This is some entity (a human, a computer system, or any other actuator) that
generated the stimulus.
Stimulus. A condition that needs to be considered when it arrives at a system.
Environment. The stimulus occurs within certain conditions. The system may be in an overload condition
or may be idle when the stimulus occurs.
Artifact. Some artifact is stimulated. This may be the whole system or some pieces of it.
Response. The activity undertaken after the arrival of the stimulus.
Response measure. When the response occurs, it should be measurable in some fashion so that the
requirement can be tested.

Lecture 5
(Availability and Its Tactics)

What is Fault, Error and Failure?


Fault : It is a condition that causes the software to fail to perform its required function.
Error : Refers to difference between Actual Output and Expected output.
Failure : It is the inability of a system or component to perform required function according to its
specification.

IEEE Definitions
Failure: External behavior is incorrect
Fault: Discrepancy in code that causes a failure.
Error: Human mistake that caused fault
Note:
Error is terminology of Developer.

Bug is terminology of Tester

Failure Classification

Transient - only occurs with certain inputs

Permanent - occurs on all inputs

Recoverable - system can recover without operator help

Unrecoverable - operator has to help

Non-corrupting - failure does not corrupt system state or data

Corrupting - system state or data are altered

Availability

Readiness of the software to carry out its task

A related concept is Reliability

100% available (which is actually impossible) means it is always ready to perform the
intended task

Ability to continuously provide correct service without failure

Availability vs Reliability

Software is said to be available even when it fails but recovers immediately

Such a software will NOT be called Reliable

Thus, Availability measures the fraction of time system is really available for use

Takes repair and restart times into account

Relevant for non-stop continuously running systems (e.g. traffic signal)

Availability is concerned with system failure and its associated consequences. A system failure occurs
when the system no longer delivers a service consistent with its specification. Such a failure is observable
by the system's users, either humans or other systems.
Among the areas of concern are how system failure is detected, how frequently system failure may occur,
what happens when a failure occurs, how long a system is allowed to be out of operation, when failures
may occur safely, how failures can be prevented, and what kinds of notifications are required when a
failure occurs.

We need to differentiate between failures and faults. A fault may become a failure if not corrected or
masked. That is, a failure is observable by the system's user and a fault is not. When a fault does become
observable, it becomes a failure. For example, a fault can be choosing the wrong algorithm for a
computation, resulting in a miscalculation that causes the system to fail.
Once a system fails, an important related concept becomes the time it takes to repair it. Since a system
failure is observable by users, the time to repair is the time until the failure is no longer observable. This
may be a brief delay in the response time or it may be the time it takes someone to fly to a remote location
in the mountains of Peru to repair a piece of mining machinery (this example was given by a person who
was responsible for repairing the software in a mining machine engine).
The distinction between faults and failures allows discussion of automatic repair strategies. That is, if
code containing a fault is executed but the system is able to recover from the fault without it being
observable, there is no failure.
The availability of a system is the probability that it will be operational when it is needed. This is
typically defined as

Availability = MTTF / (MTTF + MTTR)

where MTTF is the mean time to failure and MTTR is the mean time to repair.
From this come terms like 99.9% availability, or a 0.1% probability that the system will not be
operational when needed.
Scheduled downtimes (i.e., out of service) are not usually considered when calculating availability, since
the system is "not needed" by definition. This leads to situations where the system is down and users are
waiting for it, but the downtime is scheduled and so is not counted against any availability requirements.
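The availability definition can be checked numerically; the MTTF/MTTR figures below are made up for illustration:

```python
def availability(mttf_hours, mttr_hours):
    """Availability = MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)

# A system that runs 999 hours between failures and needs 1 hour to repair:
a = availability(999.0, 1.0)
print(f"{a:.1%}")                          # 99.9%

# Unscheduled downtime implied over a (non-leap) year:
downtime_hours_per_year = (1 - a) * 365 * 24
print(round(downtime_hours_per_year, 2))   # 8.76
```

So "three nines" already allows nearly nine hours of unscheduled downtime a year, which is why high-availability requirements are usually stated as a response measure in scenarios.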
What is Software Reliability?

Probability of failure-free operation of a system over a specified time within a specified


environment for a specified purpose

Difficult to measure the purpose.

Difficult to measure environmental factors.

It is not enough to consider a simple failure rate:

Not all failures are created equal; some have much more serious consequences.

Might be able to recover from some failures reasonably.

Differentiate between Availability, Reliability and Serviceability


The term reliability refers to the ability of a computer-related hardware or software component to
consistently perform according to its specifications. In theory, a reliable product is totally free of technical
errors. In practice, vendors commonly express product reliability as a percentage. The Institute of
Electrical and Electronics Engineers ( IEEE ) sponsors an organization devoted to reliability in
engineering known as the IEEE Reliability Society (IEEE RS).
Availability is the ratio of time a system or component is functional to the total time it is required or
expected to function. This can be expressed as a direct proportion (for example, 9/10 or 0.9) or as a
percentage (for example, 90%). It can also be expressed in terms of average downtime per week, month
or year or as total downtime for a given week, month or year. Sometimes availability is expressed in
qualitative terms, indicating the extent to which a system can continue to work when a significant
component or set of components goes down.
Serviceability is an expression of the ease with which a component, device or system can be maintained
and repaired. Early detection of potential problems is critical in this respect. Some systems have the
ability to correct problems automatically before serious trouble occurs; examples include built-in features
of OSs such as Microsoft Windows XP and auto-protect-enabled anti-virus software and spyware
detection and removal programs. Ideally, maintenance and repair operations should cause as little
downtime or disruption as possible.
Period of loss of availability determined by:
Time to detect failure
Time to correct failure
Time to restart application
Availability Scenarios
Source of stimulus. We differentiate between internal and external indications of faults or failure
since the desired system response may be different. In our example, the unexpected message arrives from
outside the system.
Stimulus. A fault of one of the following classes occurs.
- omission. A component fails to respond to an input.
- crash. The component repeatedly suffers omission faults.
- timing. A component responds but the response is early or late.
- response. A component responds with an incorrect value.
In Figure 4.3, the stimulus is that an unanticipated message arrives. This is an example of a
timing fault. The component that generated the message did so at a different time than expected.

Artifact. This specifies the resource that is required to be highly available, such as a processor,
communication channel, process, or storage.
Environment. The state of the system when the fault or failure occurs may also affect the desired
system response. For example, if the system has already seen some faults and is operating in other
than normal mode, it may be desirable to shut it down totally. However, if this is the first fault
observed, some degradation of response time or function may be preferred. In our example, the
system is operating normally.
Response. There are a number of possible reactions to a system failure. These include logging the
failure, notifying selected users or other systems, switching to a degraded mode with either less
capacity or less function, shutting down external systems, or becoming unavailable during repair.
In our example, the system should notify the operator of the unexpected message and continue to
operate normally.
Response measure. The response measure can specify an availability percentage, or it can
specify a time to repair, times during which the system must be available, or the duration for
which the system must be available. In Figure 4.3, there is no downtime as a result of the
unexpected message.

Possible values for Availability Scenario


Portion of Scenario

Possible Values

Source

Internal to the system; external to the system

Stimulus

Fault: omission, crash, timing, response

Artifact

System's processors, communication channels, persistent storage, processes

Environment

Normal operation;
degraded mode (i.e., fewer features, a fallback solution)

Response

System should detect event and do one or more of the following:


record it
notify appropriate parties, including the user and other systems
disable sources of events that cause fault or failure according to defined
rules
be unavailable for a prespecified interval, where interval depends on
criticality of system
continue to operate in normal or degraded mode

Response Measure

Time interval when the system must be available


Availability time
Time interval in which system can be in degraded mode
Repair time

There are two broad approaches to tackling faults:


Fault tolerance
Fault prevention

Fault Tolerance

Allow the system to continue in presence of faults. Methods are

Error Detection

Error Masking (through redundancy)

Recovery

Fault Prevention

Techniques to avoid the faults to occur

Availability Tactics

Availability Tactics- Fault Detection

Fault detection
Ping/echo;
Heartbeat;
Exceptions
Fault recovery
Mostly redundancy based
[byzantine faults] Voting: multiple processes working in parallel.
[crash, timing] Active redundancy (hot restart)
[crash] Passive redundancy (warm restart), spare.

Shadow

Repair the component

Run in shadow mode to observe the behavior

Once it performs correctly, reintroduce it

State resynch

Related to the hot and warm restart

When the faulty component is started, its state must be upgraded to the latest state.

Update depends on allowed downtime, size of the state, and number of messages
required for the update.

Check pointing and recovery

Application periodically commits its state and puts a checkpoint

Recovery routines can either roll forward or roll back the failed component to a
checkpoint when it recovers

Escalating Restart

Allows system to restart at various levels of granularity

Kill threads and recreate child processes

Free and reinitialize memory locations

Hard restart of the software

Nonstop forwarding (used in router design)

If the main recipient fails, the alternate routers keep receiving the packets
When the main recipient comes up, it rebuilds its own state

Reintroduction: shadow operation, resynchronization, checkpoint/rollback


Fault prevention
Removal from service; Transactions

Faulty component removal

Fault detector predicts imminent failure based on a process's observable parameters
(e.g., CPU usage or memory consumption is going very high)

The process can be removed (rebooted) and can be auto-restarted

Transaction

Group relevant set of instructions to a transaction

Execute a transaction so that either all steps succeed or all fail

Predictive Modeling

Analyzes past failure history to build an empirical failure model

The model is used to predict upcoming failure

Software upgrade (preventive maintenance)

Periodic upgrade of the software through patching prevents known vulnerabilities

A failure occurs when the system no longer delivers a service that is consistent with its specification; this
failure is observable by the system's users. A fault (or combination of faults) has the potential to cause a
failure. Recall also that recovery or repair is an important aspect of availability. The tactics we discuss in
this section will keep faults from becoming failures or at least bound the effects of the fault and make
repair possible.

Many of the tactics we discuss are available within standard execution environments such as operating
systems, application servers, and database management systems. It is still important to understand the
tactics used so that the effects of using a particular one can be considered during design and evaluation.
All approaches to maintaining availability involve some type of redundancy, some type of health
monitoring to detect a failure, and some type of recovery when a failure is detected. In some cases, the
monitoring or recovery is automatic and in others it is manual.
We first consider fault detection. We then consider fault recovery and finally, briefly, fault prevention.
FAULT DETECTION
Three widely used tactics for recognizing faults are ping/echo, heartbeat, and exceptions.
Ping/echo. One component issues a ping and expects to receive back an echo, within a predefined
time, from the component under scrutiny. This can be used within a group of components
mutually responsible for one task. It can also be used by clients to ensure that a server object
and the communication path to the server are operating within the expected performance bounds.
"Ping/echo" fault detectors can be organized in a hierarchy, in which a lowest-level detector pings
the software processes with which it shares a processor, and the higher-level fault detectors ping
lower-level ones. This uses less communications bandwidth than a remote fault detector that
pings all processes.
Heartbeat (dead man timer). In this case one component emits a heartbeat message periodically
and another component listens for it. If the heartbeat fails, the originating component is assumed
to have failed and a fault correction component is notified. The heartbeat can also carry data. For
example, an automated teller machine can periodically send the log of the last transaction to a
server. This message not only acts as a heartbeat but also carries data to be processed.
Each node implements a lightweight process called heartbeat daemon that periodically
(say 10 sec) sends heartbeat message to the master node.
If the master receives the heartbeat from a node on both connections (a node is connected
redundantly for fault tolerance), everything is OK
If it receives the heartbeat on only one connection, it reports that one of the network
connections is faulty
If it receives no heartbeat at all, it reports that the node is dead (assuming that the master
gets heartbeats from the other nodes)
Trick: Often heartbeat signal has a payload (say resource utilization info of that node)
Exceptions. One method for recognizing faults is to encounter an exception, which is raised when
one of the fault classes we discussed in Chapter 4 is recognized. The exception handler typically
executes in the same process that introduced the exception.

The ping/echo and heartbeat tactics operate among distinct processes, and the exception tactic operates
within a single process. The exception handler will usually perform a semantic transformation of the fault
into a form that can be processed.
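The heartbeat tactic described above can be sketched as a monitor that declares a node dead when no heartbeat arrives within a timeout; node names and the timeout are illustrative, and timestamps are passed explicitly to keep the example deterministic:

```python
import time

class HeartbeatMonitor:
    """Heartbeat tactic sketch: a node is declared dead if no heartbeat has
    been seen within `timeout` seconds."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}

    def beat(self, node, now=None):
        # A real heartbeat message may also carry a payload (e.g. resource
        # utilization stats for that node).
        self.last_seen[node] = now if now is not None else time.monotonic()

    def dead_nodes(self, now=None):
        now = now if now is not None else time.monotonic()
        return [n for n, t in self.last_seen.items() if now - t > self.timeout]

m = HeartbeatMonitor(timeout=30)
m.beat("node-a", now=100.0)
m.beat("node-b", now=100.0)
m.beat("node-a", now=125.0)      # node-a keeps beating; node-b goes silent
print(m.dead_nodes(now=140.0))   # ['node-b']
```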
FAULT RECOVERY
Fault recovery consists of preparing for recovery and making the system repair. Some preparation and
repair tactics follow.
Voting. Processes running on redundant processors each take equivalent input and compute a
simple output value that is sent to a voter. If the voter detects deviant behavior from a single
processor, it fails it. The voting algorithm can be "majority rules" or "preferred component" or
some other algorithm. This method is used to correct faulty operation of algorithms or failure of a
processor and is often used in control systems. If all of the processors utilize the same algorithms,
the redundancy detects only a processor fault and not an algorithm fault. Thus, if the consequence
of a failure is extreme, such as potential loss of life, the redundant components can be diverse.
One extreme of diversity is that the software for each redundant component is developed by
different teams and executes on dissimilar platforms. Less extreme is to develop a single software
component on dissimilar platforms. Diversity is expensive to develop and maintain and is used
only in exceptional circumstances, such as the control of surfaces on aircraft. It is usually used for
control systems in which the outputs to the voter are straightforward and easy to classify as
equivalent or deviant, the computations are cyclic, and all redundant components receive
equivalent inputs from sensors. Diversity has no downtime when a failure occurs since the voter
continues to operate. Variations on this approach include the Simplex approach, which uses the
results of a "preferred" component unless they deviate from those of a "trusted" component, to
which it defers. Synchronization among the redundant components is automatic since they are all
assumed to be computing on the same set of inputs in parallel.
Active redundancy (hot restart). All redundant components respond to events in parallel.
Consequently, they are all in the same state. The response from only one component is used
(usually the first to respond), and the rest are discarded. When a fault occurs, the downtime of
systems using this tactic is usually milliseconds since the backup is current and the only time to
recover is the switching time. Active redundancy is often used in a client/server configuration,
such as database management systems, where quick responses are necessary even when a fault
occurs. In a highly available distributed system, the redundancy may be in the communication
paths. For example, it may be desirable to use a LAN with a number of parallel paths and place
each redundant component in a separate path. In this case, a single bridge or path failure will not
make all of the system's components unavailable.
Synchronization is performed by ensuring that all messages to any redundant component are sent
to all redundant components. If communication has a possibility of being lost (because of noisy or
overloaded communication lines), a reliable transmission protocol can be used to recover. A
reliable transmission protocol requires all recipients to acknowledge receipt together with some
integrity indication such as a checksum. If the sender cannot verify that all recipients have

received the message, it will resend the message to those components not acknowledging receipt.
The resending of unreceived messages (possibly over different communication paths) continues
until the sender marks the recipient as out of service.
Passive redundancy (warm restart/dual redundancy/triple redundancy). One component (the
primary) responds to events and informs the other components (the standbys) of state updates
they must make. When a fault occurs, the system must first ensure that the backup state is
sufficiently fresh before resuming services. This approach is also used in control systems, often
when the inputs come over communication channels or from sensors and have to be switched
from the primary to the backup on failure. Chapter 6, describing an air traffic control example,
shows a system using it. In the air traffic control system, the secondary decides when to take over
from the primary, but in other systems this decision can be done in other components. This tactic
depends on the standby components taking over reliably. Forcing switchovers periodically (for
example, once a day or once a week) increases the availability of the system. Some database
systems force a switch with storage of every new data item. The new data item is stored in a
shadow page and the old page becomes a backup for recovery. In this case, the downtime can
usually be limited to seconds.
Synchronization is the responsibility of the primary component, which may use atomic broadcasts
to the secondaries to guarantee synchronization.
Spare. A standby spare computing platform is configured to replace many different failed
components. It must be rebooted to the appropriate software configuration and have its state
initialized when a failure occurs. Making a checkpoint of the system state to a persistent device
periodically and logging all state changes to a persistent device allows for the spare to be set to
the appropriate state. This is often used as the standby client workstation, where the user can
move when a failure occurs. The downtime for this tactic is usually minutes.
There are tactics for repair that rely on component reintroduction. When a redundant component
fails, it may be reintroduced after it has been corrected. Such tactics are shadow operation, state
resynchronization, and rollback.
Shadow operation. A previously failed component may be run in "shadow mode" for a short time
to make sure that it mimics the behavior of the working components before restoring it to service.
State resynchronization. The passive and active redundancy tactics require the component being
restored to have its state upgraded before its return to service. The updating approach will depend
on the downtime that can be sustained, the size of the update, and the number of messages
required for the update. A single message containing the state is preferable, if possible.
Incremental state upgrades, with periods of service between increments, lead to complicated
software.
Checkpoint/rollback. A checkpoint is a recording of a consistent state created either periodically
or in response to specific events. Sometimes a system fails in an unusual manner, with a
detectably inconsistent state. In this case, the system should be restored using a previous

checkpoint of a consistent state and a log of the transactions that occurred since the snapshot was
taken.
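A checkpoint/rollback sketch (the service, its state, and the values are illustrative): on recovery, roll back to the last checkpoint and then roll forward through the log of transactions taken since:

```python
import copy

class CheckpointedService:
    """Checkpoint/rollback sketch: state is recorded at checkpoints; after a
    failure, the checkpoint plus the transaction log is replayed."""

    def __init__(self):
        self.state = {"balance": 0}
        self._checkpoint = None
        self._log = []               # transactions since the last checkpoint

    def apply(self, amount):
        self.state["balance"] += amount
        self._log.append(amount)

    def checkpoint(self):
        self._checkpoint = copy.deepcopy(self.state)
        self._log.clear()

    def recover(self):
        # Roll back to the checkpoint, then roll forward through the log
        self.state = copy.deepcopy(self._checkpoint)
        for amount in self._log:
            self.state["balance"] += amount

svc = CheckpointedService()
svc.apply(100)
svc.checkpoint()
svc.apply(50)
svc.state["balance"] = -999      # simulated corruption: inconsistent state
svc.recover()
print(svc.state["balance"])      # 150
```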
FAULT PREVENTION
The following are some fault prevention tactics.
Removal from service. This tactic removes a component of the system from operation to undergo
some activities to prevent anticipated failures. One example is rebooting a component to prevent
memory leaks from causing a failure. If this removal from service is automatic, an architectural
strategy can be designed to support it. If it is manual, the system must be designed to support it.
Transactions. A transaction is the bundling of several sequential steps such that the entire bundle
can be undone at once. Transactions are used to prevent any data from being affected if one step
in a process fails and also to prevent collisions among several simultaneous threads accessing the
same data.
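The all-or-nothing behavior of a transaction can be sketched with Python's built-in sqlite3 module, whose connection context manager commits on success and rolls back on an exception; the schema and account values are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INT)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 0)])
conn.commit()

try:
    with conn:  # opens a transaction; rolls back automatically on exception
        conn.execute("UPDATE accounts SET balance = balance - 50 "
                     "WHERE name = 'alice'")
        conn.execute("UPDATE accounts SET balance = balance + 50 "
                     "WHERE name = 'bob'")
        raise RuntimeError("simulated crash before commit")
except RuntimeError:
    pass

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)   # {'alice': 100, 'bob': 0} -- both updates were undone
```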
Process monitor. Once a fault in a process has been detected, a monitoring process can delete the
nonperforming process and create a new instance of it, initialized to some appropriate state as in
the spare tactic.
Detect Fault

Timer and Timestamping

If the running process does not reset the timer periodically, the timer expires and
announces a failure

Timestamping: assigns a timestamp (can be a count, based on the local clock) with a
message in a decentralized message passing system. Used to detect inconsistency
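The timer idea above can be sketched as a watchdog that the process must "kick" before it expires; times are passed explicitly so the example is deterministic, and all names are illustrative:

```python
class Watchdog:
    """Watchdog-timer sketch: the process must kick the timer before the
    deadline, otherwise a failure is announced."""

    def __init__(self, period, now=0.0):
        self.period = period
        self.deadline = now + period
        self.failed = False

    def kick(self, now):
        self.deadline = now + self.period   # process resets the timer

    def check(self, now):
        if now > self.deadline:
            self.failed = True              # timer expired: announce failure
        return self.failed

wd = Watchdog(period=5.0, now=0.0)
wd.kick(now=4.0)           # process resets the timer in time
print(wd.check(now=8.0))   # False  (deadline is now 9.0)
print(wd.check(now=10.0))  # True   (timer expired -> failure announced)
```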

Voting (TMR Triple Modular Redundancy)

Three identical copies of a module are connected to a voting system which compares
outputs from all the three components. If there is an inconsistency in their outputs when
subjected to the same input, the voting system reports error/inconsistency

Majority voting, or preferred component wins
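The TMR voting scheme above can be sketched as a majority vote over the three module outputs, reporting which replica disagreed (function and variable names are illustrative):

```python
from collections import Counter

def tmr_vote(outputs):
    """TMR voting sketch: the majority output wins; replicas that disagree
    with the majority are reported as faulty."""
    value, votes = Counter(outputs).most_common(1)[0]
    if votes < 2:
        raise RuntimeError("no majority: all replicas disagree")
    faulty = [i for i, out in enumerate(outputs) if out != value]
    return value, faulty

# Three identical modules compute the same function; replica 2 is faulty:
value, faulty = tmr_vote([42, 42, 17])
print(value, faulty)   # 42 [2]
```

Note the caveat from the fault-recovery discussion: if all three replicas run the same algorithm, voting detects only processor faults, not algorithm faults.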

Availability Tactics- Error Masking

Hot spare (Active redundancy)

Every redundant process is active

When one node fails, another one is taken up

Downtime is in milliseconds

Warm restart (Passive redundancy)

Standbys keep syncing their states with the primary one

When primary fails, backup starts

Spare copy (Cold)

Spares are offline until the primary fails; then a spare is restarted

Typically restarts to the checkpointed position

Downtime is in minutes

Used when the MTTF (Mean Time To Failure) is high and high availability is not that
critical

Service Degradation

Ignore faulty behavior

The most critical components are kept live and less critical component functionality is
dropped (e.g., in the Windows OS, a basic/repair mode rather than the complete system)

E.g., if a component sends false messages or is under a DoS (Denial-of-Service) attack,
ignore output from that component

Exception handling: this masks, or can even correct, the error

Quality Design Decisions


Recall that one can view an architecture as the result of applying a collection of design decisions. What
we present here is a systematic categorization of these decisions so that an architect can focus attention on
those design dimensions likely to be most troublesome.
The seven categories of design decisions are
1.
2.
3.
4.
5.
6.
7.

Allocation of responsibilities
Coordination model
Data model
Management of resources
Mapping among architectural elements
Binding time decisions
Choice of technology

These categories are not the only way to classify architectural design decisions, but they do provide a
rational division of concerns. These categories might overlap, but it's all right if a particular decision
exists in two different categories, because the concern of the architect is to ensure that every important
decision is considered. Our categorization of decisions is partially based on our definition of software

architecture in that many of our categories relate to the definition of structures and the relations among
them.
Allocation of Responsibilities
Decisions involving allocation of responsibilities include the following:
Identifying the important responsibilities, including basic system functions, architectural
infrastructure, and satisfaction of quality attributes.
Determining how these responsibilities are allocated to non-runtime and runtime elements
(namely, modules, components, and connectors).
Strategies for making these decisions include functional decomposition, modeling real-world objects,
grouping based on the major modes of system operation, or grouping based on similar quality
requirements: processing frame rate, security level, or expected changes.
In Chapters 5-11, where we apply these design decision categories to a number of important quality
attributes, the checklists we provide for the allocation of responsibilities category are derived
systematically from understanding the stimuli and responses listed in the general scenario for that QA.
Coordination Model
Software works by having elements interact with each other through designed mechanisms. These
mechanisms are collectively referred to as a coordination model. Decisions about the coordination model
include these:
Identifying the elements of the system that must coordinate, or are prohibited from coordinating.
Determining the properties of the coordination, such as timeliness, currency, completeness,
correctness, and consistency.
Choosing the communication mechanisms (between systems, between our system and external
entities, between elements of our system) that realize those properties. Important properties of the
communication mechanisms include stateful versus stateless, synchronous versus asynchronous,
guaranteed versus nonguaranteed delivery, and performance-related properties such as throughput
and latency.
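The trade-off in the last bullet can be illustrated with a minimal sketch (the names `handle` and `inbox` are illustrative assumptions, not from the text): the same request handled synchronously and asynchronously.

```python
import queue
import threading

def handle(request):
    # stand-in for the coordinated element's work
    return f"processed:{request}"

# Synchronous: the caller blocks until the response arrives, which couples
# the caller's timeliness to the callee's.
sync_result = handle("r1")

# Asynchronous: the caller enqueues and moves on; delivery and ordering
# guarantees must now be decided explicitly by the coordination model.
inbox = queue.Queue()
results = []

def worker():
    while True:
        req = inbox.get()
        if req is None:          # sentinel: stop the worker
            break
        results.append(handle(req))

t = threading.Thread(target=worker)
t.start()
inbox.put("r2")
inbox.put(None)
t.join()
```

The synchronous call is simpler but stateful with respect to the caller's timeline; the queued form decouples the elements at the cost of explicit decisions about ordering and guaranteed delivery.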
Data Model
Every system must represent artifacts of system-wide interest (data) in some internal fashion. The
collection of those representations and how to interpret them is referred to as the data model. Decisions
about the data model include the following:
Choosing the major data abstractions, their operations, and their properties. This includes
determining how the data items are created, initialized, accessed, persisted, manipulated,
translated, and destroyed.
Compiling metadata needed for consistent interpretation of the data.

Organizing the data. This includes determining whether the data is going to be kept in a relational
database, a collection of objects, or both. If both, then the mapping between the two different
locations of the data must be determined.
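As a sketch of that last decision (all names here are illustrative assumptions): a data abstraction kept both as an in-memory object and as a relational row, with the mapping between the two representations made explicit.

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class Account:
    id: int
    owner: str
    balance: float

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, owner TEXT, balance REAL)")

def save(acc):
    # object -> relational mapping
    conn.execute("INSERT OR REPLACE INTO account VALUES (?, ?, ?)",
                 (acc.id, acc.owner, acc.balance))

def load(acc_id):
    # relational -> object mapping
    row = conn.execute("SELECT id, owner, balance FROM account WHERE id = ?",
                       (acc_id,)).fetchone()
    return Account(*row)

save(Account(1, "alice", 42.0))
restored = load(1)
```

Keeping the mapping in two small functions is the point: if the relational schema changes, only this mapping layer is touched, not every consumer of `Account`.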
Management of Resources
An architect may need to arbitrate the use of shared resources in the architecture. These include hard
resources (e.g., CPU, memory, battery, hardware buffers, system clock, I/O ports) and soft resources (e.g.,
system locks, software buffers, thread pools, and non-thread-safe code).
Decisions for management of resources include the following:
Identifying the resources that must be managed and determining the limits for each.
Determining which system element(s) manage each resource.
Determining how resources are shared and the arbitration strategies employed when there is
contention.
Determining the impact of saturation on different resources. For example, as a CPU becomes
more heavily loaded, performance usually just degrades fairly steadily. On the other hand, when
you start to run out of memory, at some point you start paging/swapping intensively and your
performance suddenly crashes to a halt.
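A minimal sketch of resource arbitration (the pool size and names are assumptions): a counting semaphore limits concurrent use of a shared resource, so contention beyond the limit blocks rather than oversubscribing the resource.

```python
import threading

MAX_SLOTS = 2                       # assumed limit for the shared resource
slots = threading.BoundedSemaphore(MAX_SLOTS)
lock = threading.Lock()
active = 0
peak = 0

def use_resource():
    global active, peak
    with slots:                     # acquire a slot; blocks when saturated
        with lock:
            active += 1
            peak = max(peak, active)
        # ... work with the shared resource would happen here ...
        with lock:
            active -= 1

threads = [threading.Thread(target=use_resource) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The semaphore is the arbitration strategy; the `peak` counter just demonstrates that saturation is bounded by design instead of discovered at runtime.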
Mapping among Architectural Elements
An architecture must provide two types of mappings. First, there is mapping between elements in
different types of architecture structures; for example, mapping from units of development (modules) to
units of execution (threads or processes). Next, there is mapping between software elements and
environment elements; for example, mapping from processes to the specific CPUs where these processes
will execute.
Useful mappings include these:
The mapping of modules and runtime elements to each other: that is, the runtime elements that
are created from each module, and the modules that contain the code for each runtime element.
The assignment of runtime elements to processors.
The assignment of items in the data model to data stores.
The mapping of modules and runtime elements to units of delivery.
Binding Time Decisions
Binding time decisions introduce allowable ranges of variation. This variation can be bound at different
times in the software life cycle by different entities, from design time by a developer to runtime by an
end user. A binding time decision establishes the scope, the point in the life cycle, and the mechanism for
achieving the variation.
The decisions in the other six categories have an associated binding time decision. Examples of such
binding time decisions include the following:

For allocation of responsibilities, you can have build-time selection of modules via a
parameterized makefile.
For choice of coordination model, you can design runtime negotiation of protocols.
For resource management, you can design a system to accept new peripheral devices plugged in
at runtime, after which the system recognizes them and downloads and installs the right drivers
automatically.
For choice of technology, you can build an app store for a smartphone that automatically
downloads the version of the app appropriate for the phone of the customer buying the app.
When making binding time decisions, you should consider the costs to implement the decision and the
costs to make a modification after you have implemented the decision. For example, if you are
considering changing platforms at some time after code time, you can insulate yourself from the effects
caused by porting your system to another platform at some cost. Making this decision depends on the
costs incurred by having to modify an early binding compared to the costs incurred by implementing the
mechanisms involved in the late binding.
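A tiny sketch of a late (runtime) binding, with assumed names: the allowable range of variation is fixed at design time, while the concrete choice is bound from configuration at runtime.

```python
import json

# The allowable range of variation, fixed at design time.
GREETINGS = {"en": "Hello", "no": "Hei"}

def greet(name, config_json):
    # The concrete choice is bound at runtime, from configuration
    # (a JSON string stands in for a config file here).
    config = json.loads(config_json)
    return f"{GREETINGS[config['lang']]}, {name}!"

# The same code serves both choices; only the late-bound config differs.
print(greet("Ada", '{"lang": "en"}'))  # Hello, Ada!
print(greet("Ada", '{"lang": "no"}'))  # Hei, Ada!
```

The cost asymmetry the paragraph describes shows up even here: the dictionary and the config plumbing are the up-front mechanism cost; each later change of language is then nearly free.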
Choice of Technology
Every architecture decision must eventually be realized using a specific technology. Sometimes the
technology selection is made by others, before the intentional architecture design process begins. In this
case, the chosen technology becomes a constraint on decisions in each of our seven categories. In other
cases, the architect must choose a suitable technology to realize a decision in every one of the categories.
Choice of technology decisions involve the following:
Deciding which technologies are available to realize the decisions made in the other categories.
Determining whether the available tools to support this technology choice (IDEs, simulators,
testing tools, etc.) are adequate for development to proceed.
Determining the extent of internal familiarity as well as the degree of external support available
for the technology (such as courses, tutorials, examples, and availability of contractors who can
provide expertise in a crunch) and deciding whether this is adequate to proceed.
Determining the side effects of choosing a technology, such as a required coordination model or
constrained resource management opportunities.
Determining whether a new technology is compatible with the existing technology stack. For
example, can the new technology run on top of or alongside the existing technology stack? Can it
communicate with the existing technology stack? Can the new technology be monitored and
managed?

Hardware vs Software Reliability Metrics

Hardware metrics are not suitable for software since they are based on the notion of component
failure

Software failures are often design failures

Often the system is available after the failure has occurred

Hardware components can wear out

Software Reliability Metrics

Reliability metrics are units of measure for system reliability

System reliability is measured by counting the number of operational failures and relating these to
demands made on the system at the time of failure

A long-term measurement program is required to assess the reliability of critical systems

Time Units

Raw Execution Time: for non-stop systems

Calendar Time: if the system has regular usage patterns

Number of Transactions: for demand-type transaction systems

Reliability Metric POFOD

Probability Of Failure On Demand (POFOD):

Likelihood that the system will fail when a request is made.

E.g., POFOD of 0.001 means that 1 in 1000 requests may result in failure.

Any failure is important; the number of failures matters less than the fact that POFOD > 0.

Relevant for safety-critical systems

Reliability Metric ROCOF & MTTF

Rate Of Occurrence Of Failure (ROCOF):

Frequency of occurrence of failures.

E.g., ROCOF of 0.02 means 2 failures are likely in each 100 time units.

Relevant for transaction processing systems

Mean Time To Failure (MTTF):

Measure of time between failures.

E.g., MTTF of 500 means an average of 500 time units passes between two consecutive
failures.

Relevant for systems with long transactions

Rate of Fault Occurrence

Reflects rate of failure in the system

Useful when system has to process a large number of similar requests that are relatively frequent

Relevant for operating systems and transaction processing systems

Mean Time to Failure (MTTF)

Measures time between observable system failures

For stable systems MTTF = 1/ROCOF

Relevant for systems when individual transactions take lots of processing time (e.g. CAD
systems)

Failure Consequences

When specifying reliability both the number of failures and the consequences of each matter

Failures with serious consequences are more damaging than those where repair and recovery is
straightforward

In some cases, different reliability specifications may be defined for different failure types

Building Reliability Specification

For each sub-system, analyze the consequences of possible system failures

From the system failure analysis, partition failures into appropriate classes

For each class, set out the appropriate reliability metric

Functional and Non-functional Requirements


System functional requirements may specify error checking, recovery features, and system failure
protection
System reliability and availability are specified as part of the non-functional requirements for the system.

Reliability Metrics
Probability of Failure on Demand (POFOD)
POFOD = 0.001 means one in every 1000 requests to the service fails
Rate of Fault Occurrence (ROCOF)
ROCOF = 0.02 means two failures for each 100 operational time units
Mean Time to Failure (MTTF): average time between observed failures (closely related to MTBF)
Availability = MTBF / (MTBF + MTTR)
MTBF = Mean Time Between Failures
MTTR = Mean Time To Repair
Reliability = MTBF / (1 + MTBF)
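The formulas above transcribe directly into code (a sketch; time units are whatever the measurement program uses):

```python
def availability(mtbf, mttr):
    # fraction of time the system is available for use
    return mtbf / (mtbf + mttr)

def pofod(failures, demands):
    # probability of failure on demand
    return failures / demands

def mttf_from_rocof(rocof):
    # for stable systems, MTTF is the reciprocal of ROCOF
    return 1.0 / rocof

assert pofod(1, 1000) == 0.001        # 1 failure in 1000 requests
assert mttf_from_rocof(0.02) == 50.0  # 2 failures per 100 time units
print(availability(500.0, 10.0))      # high MTBF, short repairs -> near 1
```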

How to Calculate Availability?


Measures the fraction of time system is really available for use
Takes repair and restart times into account
Relevant for non-stop continuously running systems (e.g. traffic signal)

Probability of Failure on Demand


Probability that the system will fail when a service request is made
Useful when requests are made on an intermittent or infrequent basis
Appropriate for protection systems where service requests may be rare and consequences can be serious if the service fails
Relevant for many safety-critical systems with exception handlers
Rate of Fault Occurrence
Reflects rate of failure in the system
Useful when system has to process a large number of similar requests that are relatively
frequent
Relevant for operating systems and transaction processing systems
Mean Time to Failure
Measures time between observable system failures
For stable systems MTTF = 1/ROCOF
Relevant for systems when individual transactions take lots of processing time (e.g. CAD or
WP systems)
Failure Consequences
Reliability does not take consequences into account

Transient faults may have no real consequences, but other faults might cause data loss or corruption
May be worthwhile to identify different classes of failure, and use different metrics for each
When specifying reliability both the number of failures and the consequences of each matter
Failures with serious consequences are more damaging than those where repair and recovery is
In some cases, different reliability specifications may be defined for different failure types
Failure Classification
Transient - only occurs with certain inputs
Permanent - occurs on all inputs
Recoverable - system can recover without operator help
Unrecoverable - operator has to help
Non-corrupting - failure does not corrupt system state or data
Corrupting - system state or data are altered

How to Build Reliability Specification


For each sub-system, analyze the consequences of possible system failures
From the system failure analysis, partition failures into appropriate classes
For each class, set out the appropriate reliability metric.


Failure Class: Permanent, Non-corrupting
Example: ATM fails to operate with any card; must restart to correct
Metric: ROCOF = .0001 (time unit = days)

Failure Class: Transient, Non-corrupting
Example: Magnetic stripe can't be read on undamaged card
Metric: POFOD = .0001 (time unit = transactions)

Specification Validation
It is impossible to empirically validate high reliability specifications
"No database corruption" really means a POFOD of less than 1 in 200 million
If each transaction takes 1 second to verify, simulation of one day's transactions takes 3.5 days
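As a back-of-the-envelope check of the 3.5-day figure (the daily transaction volume is an assumption; the notes do not state it):

```python
TRANSACTIONS_PER_DAY = 300_000    # assumed daily volume
SECONDS_PER_TRANSACTION = 1       # verification time per transaction
SECONDS_PER_DAY = 24 * 60 * 60

# Replaying one day's load at 1 second per transaction:
days = TRANSACTIONS_PER_DAY * SECONDS_PER_TRANSACTION / SECONDS_PER_DAY
print(round(days, 1))  # 3.5
```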

Statistical Reliability Testing


Test data needs to follow typical software usage patterns
Measuring numbers of errors needs to be based on errors of omission (failing to do the right thing) and
errors of commission (doing the wrong thing)

What are the difficulties with Statistical Reliability Testing?


Uncertainty when creating the operational profile
High cost of generating the operational profile
Statistical uncertainty problems when high reliabilities are specified

Safety Specification
Each safety requirement should be specified separately
These requirements should be based on hazard and risk analysis
Safety requirements usually apply to the system as a whole rather than to individual components
System safety is an emergent system property

Lecture 6
(Modifiability and Its Tactics)
Modifiability

The ability to modify the system in response to changed requirements so that:

the time and cost to implement the change are optimal

the impact of the modification (testing, deployment, and change management) is minimal

When do you want to introduce modifiability?

If (cost of modification without the modifiability mechanism in place) > (cost of
modification with the mechanism in place) + (cost of installing the mechanism)
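The decision rule above transcribes directly into code (the cost figures below are made-up illustration values):

```python
def should_add_mechanism(cost_without, cost_with, cost_of_mechanism):
    # introduce modifiability only if it pays for itself on this change
    return cost_without > cost_with + cost_of_mechanism

# Illustration: a change costs 100 without the mechanism, 20 with it.
print(should_add_mechanism(100, 20, 50))   # True: mechanism pays off
print(should_add_mechanism(100, 20, 500))  # False: mechanism too expensive
```

In practice the comparison is made over the set of anticipated changes, not a single one, which is why the tactics below emphasize anticipating change.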

Modifiability is about the cost of change. It brings up two concerns.


1. What can change (the artifact)? A change can occur to any aspect of a system, most commonly
the functions that the system computes, the platform the system exists on (the hardware, operating
system, middleware, etc.), the environment within which the system operates (the systems with
which it must interoperate, the protocols it uses to communicate with the rest of the world, etc.),
the qualities the system exhibits (its performance, its reliability, and even its future
modifications), and its capacity (number of users supported, number of simultaneous operations,
etc.). Some portions of the system, such as the user interface or the platform, are sufficiently
distinguished and subject to change that we consider them separately. The category of platform
changes is also called portability. Those changes may be to add, delete, or modify any one of
these aspects.
2. When is the change made and who makes it (the environment)? Most commonly in the past, a
change was made to source code. That is, a developer had to make the change, which was tested
and then deployed in a new release. Now, however, the question of when a change is made is
intertwined with the question of who makes it. An end user changing the screen saver is clearly
making a change to one of the aspects of the system. Equally clear, it is not in the same category
as changing the system so that it can be used over the Web rather than on a single machine.
Changes can be made to the implementation (by modifying the source code), during compile
(using compile-time switches), during build (by choice of libraries), during configuration setup
(by a range of techniques, including parameter setting) or during execution (by parameter
setting). A change can also be made by a developer, an end user, or a system administrator.
Once a change has been specified, the new implementation must be designed, implemented, tested, and
deployed. All of these actions take time and money, both of which can be measured.

Modifiability General Scenarios


From these considerations we can see the portions of the modifiability general scenarios. Figure 4.4 gives
an example: "A developer wishes to change the user interface. This change will be made to the code at
design time, it will take less than three hours to make and test the change, and no side-effect changes will
occur in the behavior."
Source of stimulus. This portion specifies who makes the changes: the developer, a system
administrator, or an end user. Clearly, there must be machinery in place to allow the system
administrator or end user to modify a system, but this is a common occurrence. In Figure 4.4, the
modification is to be made by the developer.
Stimulus. This portion specifies the changes to be made. A change can be the addition of a
function, the modification of an existing function, or the deletion of a function. It can also be
made to the qualities of the system: making it more responsive, increasing its availability, and so
forth. The capacity of the system may also change. Increasing the number of simultaneous users
is a frequent requirement. In our example, the stimulus is a request to make a modification, which
can be to the function, quality, or capacity.
Variation is a concept associated with software product lines (see Chapter 14). When considering
variation, a factor is the number of times a given variation must be specified. One that must be
made frequently will impose a more stringent requirement on the response measures than one that
is made only sporadically.
Artifact. This portion specifies what is to be changed: the functionality of a system, its platform,
its user interface, its environment, or another system with which it interoperates. In Figure 4.4, the
modification is to the user interface.
Environment. This portion specifies when the change can be made: design time, compile time,
build time, initiation time, or runtime. In our example, the modification is to occur at design time.
Response. Whoever makes the change must understand how to make it, and then make it, test it
and deploy it. In our example, the modification is made with no side effects.
Response measure. All of the possible responses take time and cost money, and so time and cost
are the most desirable measures. Time is not always possible to predict, however, and so less
ideal measures are frequently used, such as the extent of the change (number of modules
affected). In our example, the time to perform the modification should be less than three hours.
Table 4.2 presents the possible values for each portion of a modifiability scenario.
Table 4.2. Modifiability General Scenario Generation

Source: End user, developer, system administrator

Stimulus: Wishes to add/delete/modify/vary functionality, quality attribute, capacity

Artifact: System user interface, platform, environment; system that interoperates with target system

Environment: At runtime, compile time, build time, design time

Response: Locates places in architecture to be modified; makes modification without affecting other functionality; tests modification; deploys modification

Response Measure: Cost in terms of number of elements affected, effort, money; extent to which this affects other functions or quality attributes

Modifiability Scenario
A sample modifiability scenario is "A developer wishes to change the user interface to make a screen's
background color blue. This change will be made to the code at design time. It will take less than three
hours to make and test the change and no side effect changes will occur in the behavior." Figure 4.4
illustrates this sample scenario (omitting a few minor details for brevity).
Figure 4.4. Sample modifiability scenario

A collection of concrete scenarios can be used as the quality attribute requirements for a system. Each
scenario is concrete enough to be meaningful to the architect, and the details of the response are
meaningful enough so that it is possible to test whether the system has achieved the response. When

eliciting requirements, we typically organize our discussion of general scenarios by quality attributes; if
the same scenario is generated by two different attributes, one can be eliminated.
For each attribute we present a table that gives possible system-independent values for each of the six
parts of a quality scenario. A general quality scenario is generated by choosing one value for each
element; a concrete scenario is generated as part of the requirements elicitation by choosing one or more
entries from each column of the table and then making the result readable. For example, the scenario
shown in Figure 4.4 is generated from the modifiability scenario given in Table 4.2 (on page 83), but the
individual parts were edited slightly to make them read more smoothly as a scenario.
Concrete scenarios play the same role in the specification of quality attribute requirements that use cases
play in the specification of functional requirements.
Modifiability Tactics
Recall from Chapter 4 that tactics to control modifiability have as their goal controlling the time and cost
to implement, test, and deploy changes. Figure 5.4 shows this relationship.

We organize the tactics for modifiability in sets according to their goals. One set has as its goal reducing
the number of modules that are directly affected by a change. We call this set "localize modifications." A
second set has as its goal limiting modifications to the localized modules. We use this set of tactics to
"prevent the ripple effect." Implicit in this distinction is that there are modules directly affected (those
whose responsibilities are adjusted to accomplish the change) and modules indirectly affected by a change
(those whose responsibilities remain unchanged but whose implementation must be changed to
accommodate the directly affected modules). A third set of tactics has as its goal controlling deployment
time and cost. We call this set "defer binding time."
LOCALIZE MODIFICATIONS
Although there is not necessarily a precise relationship between the number of modules affected by a set
of changes and the cost of implementing those changes, restricting modifications to a small set of
modules will generally reduce the cost. The goal of tactics in this set is to assign responsibilities to
modules during design such that anticipated changes will be limited in scope. We identify five such
tactics.
Maintain semantic coherence. Semantic coherence refers to the relationships among
responsibilities in a module. The goal is to ensure that all of these responsibilities work together
without excessive reliance on other modules. Achievement of this goal comes from choosing

responsibilities that have semantic coherence. Coupling and cohesion metrics are an attempt to
measure semantic coherence, but they are missing the context of a change. Instead, semantic
coherence should be measured against a set of anticipated changes. One subtactic is to abstract
common services. Providing common services through specialized modules is usually viewed as
supporting re-use. This is correct, but abstracting common services also supports modifiability. If
common services have been abstracted, modifications to them will need to be made only once
rather than in each module where the services are used. Furthermore, modification to the modules
using those services will not impact other users. This tactic, then, supports not only localizing
modifications but also the prevention of ripple effects. Examples of abstracting common services
are the use of application frameworks and the use of other middleware software.
Anticipate expected changes. Considering the set of envisioned changes provides a way to
evaluate a particular assignment of responsibilities. The basic question is "For each change, does
the proposed decomposition limit the set of modules that need to be modified to accomplish it?"
An associated question is "Do fundamentally different changes affect the same modules?" How is
this different from semantic coherence? Assigning responsibilities based on semantic coherence
assumes that expected changes will be semantically coherent. The tactic of anticipating expected
changes does not concern itself with the coherence of a module's responsibilities but rather with
minimizing the effects of the changes. In reality this tactic is difficult to use by itself since it is not
possible to anticipate all changes. For that reason, it is usually used in conjunction with semantic
coherence.
Generalize the module. Making a module more general allows it to compute a broader range of
functions based on input. The input can be thought of as defining a language for the module,
which can be as simple as making constants input parameters or as complicated as implementing
the module as an interpreter and making the input parameters be a program in the interpreter's
language. The more general a module, the more likely that requested changes can be made by
adjusting the input language rather than by modifying the module.
Limit possible options. Modifications, especially within a product line (see Chapter 14), may be
far ranging and hence affect many modules. Restricting the possible options will reduce the effect
of these modifications. For example, a variation point in a product line may be allowing for a
change of processor. Restricting processor changes to members of the same family limits the
possible options.
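The "generalize the module" tactic, in its simplest form of turning a constant into an input parameter, can be sketched as follows (function names and the discount example are assumptions for illustration):

```python
def price_hardcoded(amount):
    # the 10% discount is baked into the module: changing it means a code change
    return amount * 0.90

def price_general(amount, discount):
    # the constant became an input parameter: the simplest generalization
    return amount * (1.0 - discount)

assert price_hardcoded(100.0) == price_general(100.0, 0.10)
# A requested change (a 15% discount) is now an input change, not a code change:
print(price_general(100.0, 0.15))  # 85.0
```

At the far end of the same spectrum, the "input" grows into a full language and the module into its interpreter, as the text notes.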
PREVENT RIPPLE EFFECTS
A ripple effect from a modification is the necessity of making changes to modules not directly affected by
it. For instance, if module A is changed to accomplish a particular modification, then module B is
changed only because of the change to module A. B has to be modified because it depends, in some sense,
on A.
We begin our discussion of the ripple effect by discussing the various types of dependencies that one
module can have on another. We identify eight types:
1. Syntax of

- data. For B to compile (or execute) correctly, the type (or format) of the data that is produced by
A and consumed by B must be consistent with the type (or format) of data assumed by B.
- service. For B to compile and execute correctly, the signature of services provided by A and
invoked by B must be consistent with the assumptions of B.
2. Semantics of
- data. For B to execute correctly, the semantics of the data produced by A and consumed by B
must be consistent with the assumptions of B.
- service. For B to execute correctly, the semantics of the services produced by A and used by B
must be consistent with the assumptions of B.
3. Sequence of
- data. For B to execute correctly, it must receive the data produced by A in a fixed sequence. For
example, a data packet's header must precede its body in order of reception (as opposed to
protocols that have the sequence number built into the data).
- control. For B to execute correctly, A must have executed previously within certain timing
constraints. For example, A must have executed no longer than 5ms before B executes.
4. Identity of an interface of A. A may have multiple interfaces. For B to compile and execute
correctly, the identity (name or handle) of the interface must be consistent with the assumptions
of B.
5. Location of A (runtime). For B to execute correctly, the runtime location of A must be consistent
with the assumptions of B. For example, B may assume that A is located in a different process on
the same processor.
6. Quality of service/data provided by A. For B to execute correctly, some property involving the
quality of the data or service provided by A must be consistent with B's assumptions. For
example, data provided by a particular sensor must have a certain accuracy in order for the
algorithms of B to work correctly.
7. Existence of A. For B to execute correctly, A must exist. For example, if B is requesting a service
from an object A, and A does not exist and cannot be dynamically created, then B will not
execute correctly.
8. Resource behavior of A. For B to execute correctly, the resource behavior of A must be consistent
with B's assumptions. This can be either resource usage of A (A uses the same memory as B) or
resource ownership (B reserves a resource that A believes it owns).
With this understanding of dependency types, we can now discuss tactics available to the architect for
preventing the ripple effect for certain types.
Notice that none of our tactics necessarily prevent the ripple of semantic changes. We begin with
discussion of those that are relevant to the interfaces of a particular module (information hiding and
maintaining existing interfaces) and follow with one that breaks a dependency chain (use of an
intermediary).
Hide information. Information hiding is the decomposition of the responsibilities for an entity (a
system or some decomposition of a system) into smaller pieces and choosing which information
to make private and which to make public. The public responsibilities are available through
specified interfaces. The goal is to isolate changes within one module and prevent changes from
propagating to others. This is the oldest technique for preventing changes from propagating. It is
strongly related to "anticipate expected changes" because it uses those changes as the basis for
decomposition.
Maintain existing interfaces. If B depends on the name and signature of an interface of A,
maintaining this interface and its syntax allows B to remain unchanged. Of course, this tactic will
not necessarily work if B has a semantic dependency on A, since changes to the meaning of data
and services are difficult to mask. Also, it is difficult to mask dependencies on quality of data or
quality of service, resource usage, or resource ownership. Interface stability can also be achieved
by separating the interface from the implementation. This allows the creation of abstract
interfaces that mask variations. Variations can be embodied within the existing responsibilities, or
they can be embodied by replacing one implementation of a module with another.
Patterns that implement this tactic include
- adding interfaces. Most programming languages allow multiple interfaces. Newly visible services or
data can be made available through new interfaces, allowing existing interfaces to remain unchanged and
provide the same signature.
- adding adapter. Add an adapter to A that wraps A and provides the signature of the original A.
- providing a stub A. If the modification calls for the deletion of A, then providing a stub for A will allow
B to remain unchanged if B depends only on A's signature.
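The adapter variant can be sketched minimally (class and method names are illustrative assumptions): A's interface has changed, so an adapter wraps the new A and re-exposes the signature B was written against, leaving B unchanged.

```python
class NewA:
    # the changed interface of A
    def fetch_record(self, key):
        return {"key": key, "value": 42}

class AAdapter:
    # re-exposes the original signature B depends on: get(key) -> value
    def __init__(self, impl):
        self._impl = impl

    def get(self, key):
        return self._impl.fetch_record(key)["value"]

def b(a):
    # B was written against the old interface and needs no change
    return a.get("k1") + 1

print(b(AAdapter(NewA())))  # 43
```

A stub works the same way but returns canned data, which suffices when B depends only on A's signature.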
Restrict communication paths. Restrict the modules with which a given module shares data. That is,
reduce the number of modules that consume data produced by the given module and the number of
modules that produce data consumed by it. This will reduce the ripple effect since data
production/consumption introduces dependencies that cause ripples. Chapter 8 (Flight Simulation)
discusses a pattern that uses this tactic.
Use an intermediary. If B has any type of dependency on A other than semantic, it is possible to
insert an intermediary between B and A that manages activities associated with the dependency.
All of these intermediaries go by different names, but we will discuss each in terms of the
dependency types we have enumerated. As before, in the worst case, an intermediary cannot
compensate for semantic changes. The intermediaries are
- data (syntax). Repositories (both blackboard and passive) act as intermediaries between the
producer and consumer of data. The repositories can convert the syntax produced by A into that
assumed by B. Some publish/subscribe patterns (those that have data flowing through a central
component) can also convert the syntax into that assumed by B. The MVC and PAC patterns

convert data in one formalism (input or output device) into another (that used by the model in
MVC or the abstraction in PAC).
- service (syntax). The facade, bridge, mediator, strategy, proxy, and factory patterns all provide
intermediaries that convert the syntax of a service from one form into another. Hence, they can all
be used to prevent changes in A from propagating to B.
- identity of an interface of A. A broker pattern can be used to mask changes in the identity of an
interface. If B depends on the identity of an interface of A and that identity changes, by adding
that identity to the broker and having the broker make the connection to the new identity of A, B
can remain unchanged.
- location of A (runtime). A name server enables the location of A to be changed without
affecting B. A is responsible for registering its current location with the name server, and B
retrieves that location from the name server.
- resource behavior of A or resource controlled by A. A resource manager is an intermediary that
is responsible for resource allocation. Certain resource managers (e.g., those based on Rate
Monotonic Analysis in real-time systems) can guarantee the satisfaction of all requests within
certain constraints. A, of course, must give up control of the resource to the resource manager.
- existence of A. The factory pattern has the ability to create instances as needed, and thus the
dependence of B on the existence of A is satisfied by actions of the factory.
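The name-server intermediary for runtime location can be sketched as follows (the registry interface and the host:port strings are illustrative assumptions, not a specific product):

```python
class NameServer:
    """Intermediary that maps logical names to current locations.
    A registers its location; B looks it up by name, so A can move
    without any change to B."""
    def __init__(self):
        self._registry = {}

    def register(self, name, location):
        # Called by A whenever its location changes.
        self._registry[name] = location

    def locate(self, name):
        # Called by B before each connection attempt.
        return self._registry[name]
```

If A later registers a new location under the same name, B's lookup code is untouched; only the registry entry changes.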
DEFER BINDING TIME
The two tactic categories we have discussed thus far are designed to minimize the number of modules that
require changing to implement modifications. Our modifiability scenarios include two elements that are
not satisfied by reducing the number of modules to be changed: time to deploy and allowing
nondevelopers to make changes. Deferring binding time supports both of those scenarios at the cost of
requiring additional infrastructure to support the late binding.
Decisions can be bound into the executing system at various times. We discuss those that affect
deployment time. The deployment of a system is dictated by some process. When a modification is made
by the developer, there is usually a testing and distribution process that determines the time lag between
the making of the change and the availability of that change to the end user. Binding at runtime means
that the system has been prepared for that binding and all of the testing and distribution steps have been
completed. Deferring binding time also supports allowing the end user or system administrator to make
settings or provide input that affects behavior.
Many tactics are intended to have impact at loadtime or runtime, such as the following.
Runtime registration supports plug-and-play operation at the cost of additional overhead to
manage the registration. Publish/subscribe registration, for example, can be implemented at either
runtime or load time.
Configuration files are intended to set parameters at startup.

Polymorphism allows late binding of method calls.


Component replacement allows load time binding.
Adherence to defined protocols allows runtime binding of independent processes.
The tactics for modifiability are summarized in Figure 5.5.
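The configuration-file tactic, binding parameters at startup rather than at compile time, can be sketched like this (the file name and settings are hypothetical):

```python
import json
import os
import tempfile

# Hypothetical settings an administrator could edit without a rebuild.
settings = {"log_level": "INFO", "max_connections": 50}

# Write the file to a temporary directory so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), "app.json")
with open(path, "w") as f:
    json.dump(settings, f)

def load_config(path):
    """Bind configuration parameters at startup (load time)."""
    with open(path) as f:
        return json.load(f)

config = load_config(path)
```

Because the values are read at load time, a nondeveloper can change behavior by editing the file, at the cost of the extra read and validation machinery the late binding requires.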

What is a Wrapper?
A wrapper is a code segment that encapsulates a particular feature, piece of code, or module so that other interfaces cannot modify it directly.
Dependency between two modules (B depends on A)
Publish-subscribe model -> the publisher has no information about the subscribers, and the subscribers have no information about the publisher.
Adherence to a defined protocol -> if more than one module wants to use the same resource at the same time, a protocol must be defined to avoid deadlock.
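The publish-subscribe decoupling noted above can be sketched minimally (topic names and the EventBus class are illustrative assumptions):

```python
class EventBus:
    """Minimal publish-subscribe intermediary: publishers and
    subscribers know only the bus and a topic name, never each other."""
    def __init__(self):
        self._subs = {}

    def subscribe(self, topic, callback):
        # Subscriber registers interest in a topic.
        self._subs.setdefault(topic, []).append(callback)

    def publish(self, topic, message):
        # Publisher emits to the topic; the bus fans out to whoever
        # subscribed, which the publisher never sees.
        for callback in self._subs.get(topic, []):
            callback(message)
```

Adding or removing a subscriber requires no change to any publisher, which is the modifiability payoff of the pattern.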
Allocation of Responsibilities
Determine the types of changes that can arise from technical, customer, or business sources
Determine what sort of additional features are required to handle the change
Determine which existing features are impacted by the change
Coordination Model
For those where modifiability is a concern, use techniques to reduce coupling
Use publish-subscribe, use enterprise service bus
Identify which features can change at runtime
which devices, communication paths or protocols can change at runtime
And make sure that such changes have limited impact on the system

Binding
Determine the latest time at which the anticipated change is required
Choose a defer binding if possible
Try to avoid too many binding choices
Choice of Technology
Evaluate the technology that can handle modifications with least impact (e.g. enterprise service bus)
Watch for vendor lock-in problem

Performance
What is Performance?

A software system's ability to meet timing requirements when it responds to an event

Events are

interrupts, messages, requests from users or other systems

clock events marking the passage of time

The system, or some element of the system, must respond to them in time

Performance is about timing. Events (interrupts, messages, requests from users, or the passage of time)
occur, and the system must respond to them. There are a variety of characterizations of event arrival and
the response but basically performance is concerned with how long it takes the system to respond when
an event occurs.
One of the things that make performance complicated is the number of event sources and arrival patterns.
Events can arrive from user requests, from other systems, or from within the system. A Web-based
financial services system gets events from its users (possibly numbering in the tens or hundreds of
thousands). An engine control system gets its requests from the passage of time and must control both the
firing of the ignition when a cylinder is in the correct position and the mixture of the fuel to maximize
power and minimize pollution.
For the Web-based financial system, the response might be the number of transactions that can be
processed in a minute. For the engine control system, the response might be the variation in the firing
time. In each case, the pattern of events arriving and the pattern of responses can be characterized, and
this characterization forms the language with which to construct general performance scenarios.
A performance scenario begins with a request for some service arriving at the system. Satisfying the
request requires resources to be consumed. While this is happening the system may be simultaneously
servicing other requests.
An arrival pattern for events may be characterized as either periodic or stochastic. For example, a periodic
event may arrive every 10 milliseconds. Periodic event arrival is most often seen in real-time systems.

Stochastic arrival means that events arrive according to some probabilistic distribution. Events can also
arrive sporadically, that is, according to a pattern not capturable by either periodic or stochastic
characterizations.
Multiple users or other loading factors can be modeled by varying the arrival pattern for events. In other
words, from the point of view of system performance, it does not matter whether one user submits 20
requests in a period of time or whether two users each submit 10. What matters is the arrival pattern at the
server and dependencies within the requests.
The response of the system to a stimulus can be characterized by latency (the time between the arrival of
the stimulus and the system's response to it), deadlines in processing (in the engine controller, for
example, the fuel should ignite when the cylinder is in a particular position, thus introducing a processing
deadline), the throughput of the system (e.g., the number of transactions the system can process in a
second), the jitter of the response (the variation in latency), the number of events not processed because
the system was too busy to respond, and the data that was lost because the system was too busy.
Notice that this formulation does not consider whether the system is networked or standalone. Nor does it
(yet) consider the configuration of the system or the consumption of resources. These issues are
dependent on architectural solutions, which we will discuss in Chapter 5.
Performance General Scenarios
From these considerations we can see the portions of the performance general scenario, an example of
which is shown in Figure 4.5: "Users initiate 1,000 transactions per minute stochastically under normal
operations, and these transactions are processed with an average latency of two seconds."

Source of stimulus. The stimuli arrive either from external (possibly multiple) or internal sources. In our
example, the source of the stimulus is a collection of users.
Stimulus. The stimuli are the event arrivals. The arrival pattern can be characterized as periodic,
stochastic, or sporadic. In our example, the stimulus is the stochastic initiation of 1,000
transactions per minute.

Artifact. The artifact is always the system's services, as it is in our example.


Environment. The system can be in various operational modes, such as normal, emergency, or
overload. In our example, the system is in normal mode.
Response. The system must process the arriving events. This may cause a change in the system
environment (e.g., from normal to overload mode). In our example, the transactions are
processed.
Response measure. The response measures are the time it takes to process the arriving events
(latency or a deadline by which the event must be processed), the variation in this time (jitter), the
number of events that can be processed within a particular time interval (throughput), or a
characterization of the events that cannot be processed (miss rate, data loss). In our example, the
transactions should be processed with an average latency of two seconds.
Table 4.3 gives elements of the general scenarios that characterize performance.
Table 4.3. Performance General Scenario Generation

Portion of Scenario   Possible Values
Source                One of a number of independent sources, possibly from within system
Stimulus              Periodic events arrive; sporadic events arrive; stochastic events arrive
Artifact              System
Environment           Normal mode; overload mode
Response              Processes stimuli; changes level of service
Response Measure      Latency, deadline, throughput, jitter, miss rate, data loss

For most of the history of software engineering, performance has been the driving factor in system
architecture. As such, it has frequently compromised the achievement of all other qualities. As the
price/performance ratio of hardware plummets and the cost of developing software rises, other qualities
have emerged as important competitors to performance.
Latency - the time between the arrival of a stimulus and the system's response to it
Throughput - the number of transactions processed per unit of time
Jitter - the allowable variation in latency
Performance Tactics

Recall from Chapter 4 that the goal of performance tactics is to generate a response to an event arriving at
the system within some time constraint. The event can be single or a stream and is the trigger for a request
to perform computation. It can be the arrival of a message, the expiration of a time interval, the detection
of a significant change of state in the system's environment, and so forth. The system processes the events
and generates a response. Performance tactics control the time within which a response is generated. This
is shown in Figure 5.6. Latency is the time between the arrival of an event and the generation of a
response to
it.

After an event arrives, either the system is processing on that event or the processing is blocked for some
reason. This leads to the two basic contributors to the response time: resource consumption and blocked
time.
1. Resource consumption. Resources include CPU, data stores, network communication bandwidth,
and memory, but it can also include entities defined by the particular system under design. For
example, buffers must be managed and access to critical sections must be made sequential.
Events can be of varying types (as just enumerated), and each type goes through a processing
sequence. For example, a message is generated by one component, is placed on the network, and
arrives at another component. It is then placed in a buffer; transformed in some fashion
(marshalling is the term the Object Management Group uses for this transformation); processed
according to some algorithm; transformed for output; placed in an output buffer; and sent onward
to another component, another system, or the user. Each of these phases contributes to the overall
latency of the processing of that event.
2. Blocked time. A computation can be blocked from using a resource because of contention for it,
because the resource is unavailable, or because the computation depends on the result of other
computations that are not yet available.
- Contention for resources. Figure 5.6 shows events arriving at the system. These events may be
in a single stream or in multiple streams. Multiple streams vying for the same resource or
different events in the same stream vying for the same resource contribute to latency. In general,
the more contention for a resource, the more likelihood of latency being introduced. However,
this depends on how the contention is arbitrated and how individual requests are treated by the
arbitration mechanism.
- Availability of resources. Even in the absence of contention, computation cannot proceed if a
resource is unavailable. Unavailability may be caused by the resource being offline or by failure
of the component or for some other reason. In any case, the architect must identify places where
resource unavailability might cause a significant contribution to overall latency.

- Dependency on other computation. A computation may have to wait because it must


synchronize with the results of another computation or because it is waiting for the results of a
computation that it initiated. For example, it may be reading information from two different
sources; if these two sources are read sequentially, the latency will be higher than if they are read
in parallel.
With this background, we turn to our three tactic categories: resource demand, resource management, and
resource arbitration.
RESOURCE DEMAND
Event streams are the source of resource demand. Two characteristics of demand are the time between
events in a resource stream (how often a request is made in a stream) and how much of a resource is
consumed by each request.
One tactic for reducing latency is to reduce the resources required for processing an event stream. Ways
to do this include the following.
Increase computational efficiency. One step in the processing of an event or a message is
applying some algorithm. Improving the algorithms used in critical areas will decrease latency.
Sometimes one resource can be traded for another. For example, intermediate data may be kept in
a repository or it may be regenerated depending on time and space resource availability. This
tactic is usually applied to the processor but is also effective when applied to other resources such
as a disk.
Reduce computational overhead. If there is no request for a resource, processing needs are
reduced. In Chapter 17, we will see an example of using Java classes rather than Remote Method
Invocation (RMI) because the former reduces communication requirements. The use of
intermediaries (so important for modifiability) increases the resources consumed in processing an
event stream, and so removing them improves latency. This is a classic modifiability/performance
tradeoff.
Another tactic for reducing latency is to reduce the number of events processed. This can be done in one
of two fashions.
Manage event rate. If it is possible to reduce the sampling frequency at which environmental
variables are monitored, demand can be reduced. Sometimes this is possible if the system was
overengineered. Other times an unnecessarily high sampling rate is used to establish harmonic
periods between multiple streams. That is, some stream or streams of events are oversampled so
that they can be synchronized.
Control frequency of sampling. If there is no control over the arrival of externally generated
events, queued requests can be sampled at a lower frequency, possibly resulting in the loss of
requests.
Other tactics for reducing or managing demand involve controlling the use of resources.

Bound execution times. Place a limit on how much execution time is used to respond to an event.
Sometimes this makes sense and sometimes it does not. For iterative, data-dependent algorithms,
limiting the number of iterations is a method for bounding execution times.
Bound queue sizes. This controls the maximum number of queued arrivals and consequently the
resources used to process the arrivals.
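The two bounding tactics can be sketched together; the class and function names are hypothetical illustrations, not a prescribed API:

```python
from collections import deque

class BoundedQueue:
    """Bound queue sizes: arrivals beyond the limit are dropped,
    capping the resources spent on queued requests."""
    def __init__(self, limit):
        self._q = deque()
        self._limit = limit
        self.dropped = 0

    def offer(self, item):
        if len(self._q) >= self._limit:
            self.dropped += 1   # request lost, by design
            return False
        self._q.append(item)
        return True

    def poll(self):
        return self._q.popleft() if self._q else None

def iterate_bounded(step, state, max_iters):
    """Bound execution times: cap the iterations of a
    data-dependent algorithm so response time stays predictable."""
    for _ in range(max_iters):
        state = step(state)
    return state
```

Both tactics trade completeness (some requests lost, some results approximate) for a predictable upper bound on resource use.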
RESOURCE MANAGEMENT
Even though the demand for resources may not be controllable, the management of these resources affects
response times. Some resource management tactics are:
Introduce concurrency. If requests can be processed in parallel, the blocked time can be reduced.
Concurrency can be introduced by processing different streams of events on different threads or
by creating additional threads to process different sets of activities. Once concurrency has been
introduced, appropriately allocating the threads to resources (load balancing) is important in order
to maximally exploit the concurrency.
Maintain multiple copies of either data or computations. Clients in a client-server pattern are
replicas of the computation. The purpose of replicas is to reduce the contention that would occur
if all computations took place on a central server. Caching is a tactic in which data is replicated,
either on different speed repositories or on separate repositories, to reduce contention. Since the
data being cached is usually a copy of existing data, keeping the copies consistent and
synchronized becomes a responsibility that the system must assume.
Increase available resources. Faster processors, additional processors, additional memory, and
faster networks all have the potential for reducing latency. Cost is usually a consideration in the
choice of resources, but increasing the resources is definitely a tactic to reduce latency. This kind
of cost/performance tradeoff is analyzed in Chapter 12.
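The caching form of the "maintain multiple copies" tactic can be sketched as follows (the repository interface and slow_fetch callback are illustrative assumptions):

```python
class CachingRepository:
    """Cache as a data replica: reads hit the fast local copy when
    possible; the backing store remains the source of truth, and
    writes must keep the copies synchronized."""
    def __init__(self, slow_fetch):
        self._slow_fetch = slow_fetch  # expensive lookup, e.g. disk or network
        self._cache = {}
        self.misses = 0

    def read(self, key):
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._slow_fetch(key)
        return self._cache[key]

    def write(self, key, value, store):
        store[key] = value
        self._cache[key] = value   # keep the replica consistent
```

The consistency burden mentioned in the text shows up concretely in `write`: every path that mutates the backing store must also update (or invalidate) the cached copy.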
RESOURCE ARBITRATION
Whenever there is contention for a resource, the resource must be scheduled. Processors are scheduled,
buffers are scheduled, and networks are scheduled. The architect's goal is to understand the characteristics
of each resource's use and choose the scheduling strategy that is compatible with it.
A scheduling policy conceptually has two parts: a priority assignment and dispatching. All scheduling
policies assign priorities. In some cases the assignment is as simple as first-in/first-out. In other cases, it
can be tied to the deadline of the request or its semantic importance. Competing criteria for scheduling
include optimal resource usage, request importance, minimizing the number of resources used,
minimizing latency, maximizing throughput, preventing starvation to ensure fairness, and so forth. The
architect needs to be aware of these possibly conflicting criteria and the effect that the chosen tactic has
on meeting them.
A high-priority event stream can be dispatched only if the resource to which it is being assigned is
available. Sometimes this depends on pre-empting the current user of the resource. Possible preemption

options are as follows: can occur anytime; can occur only at specific pre-emption points; and executing
processes cannot be pre-empted. Some common scheduling policies are:
1. First-in/First-out. FIFO queues treat all requests for resources as equals and satisfy them in turn.
One possibility with a FIFO queue is that one request will be stuck behind another one that takes
a long time to generate a response. As long as all of the requests are truly equal, this is not a
problem, but if some requests are of higher priority than others, it is problematic.
2. Fixed-priority scheduling. Fixed-priority scheduling assigns each source of resource requests a
particular priority and assigns the resources in that priority order. This strategy ensures better
service for higher-priority requests but admits the possibility of a low-priority, but important,
request taking an arbitrarily long time to be serviced because it is stuck behind a series of higher-priority requests. Three common prioritization strategies are
- semantic importance. Each stream is assigned a priority statically according to some domain
characteristic of the task that generates it. This type of scheduling is used in mainframe systems
where the domain characteristic is the time of task initiation.
- deadline monotonic. Deadline monotonic is a static priority assignment that assigns higher
priority to streams with shorter deadlines. This scheduling policy is used when streams of
different priorities with real-time deadlines are to be scheduled.
- rate monotonic. Rate monotonic is a static priority assignment for periodic streams that assigns
higher priority to streams with shorter periods. This scheduling policy is a special case of
deadline monotonic but is better known and more likely to be supported by the operating system.
3. Dynamic priority scheduling:
- round robin. Round robin is a scheduling strategy that orders the requests and then, at every
assignment possibility, assigns the resource to the next request in that order. A special form of
round robin is a cyclic executive where assignment possibilities are at fixed time intervals.
- earliest deadline first. Earliest deadline first assigns priorities based on the pending requests
with the earliest deadline.
4. Static scheduling. A cyclic executive schedule is a scheduling strategy where the pre-emption
points and the sequence of assignment to the resource are determined offline.
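Earliest deadline first is simple to sketch with a heap-based priority queue (the request tuples below are hypothetical):

```python
import heapq

def edf_order(requests):
    """Earliest-deadline-first dispatch: pop pending requests in
    order of their deadlines. Each request is a (deadline, name)
    pair, so the heap orders by deadline."""
    heap = list(requests)
    heapq.heapify(heap)
    order = []
    while heap:
        deadline, name = heapq.heappop(heap)
        order.append(name)
    return order
```

Because the priority is recomputed from whatever is currently pending, this is a dynamic policy, unlike rate monotonic or deadline monotonic, where priorities are fixed per stream.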
For Further Reading at the end of this chapter lists books on scheduling theory.
The tactics for performance are summarized in Figure 5.7.

Why does a system fail to respond in time?

Resource consumption

CPU, memory, data store, network communication

A buffer may have to be accessed sequentially inside a critical section

There may be a workflow of tasks, one of which may be choked with requests

Blocked computation time

Resource contention

Unavailability of a resource

Deadlock due to resource dependencies

Control Resource Demand

Increase computational efficiency: improve the algorithms used in performance-critical areas

Reduce overhead

Reduce resource consumption when it is not needed

Use local objects instead of RMI (Remote Method Invocation) calls

Local interfaces in EJB 3.0

Remove intermediaries (conflicts with modifiability)

Manage

event rate: if you have control, don't sample events more often than necessary (e.g., sampling of
environmental data)

sampling rate: if you don't have control, sample queued requests at a lower rate, possibly losing
some requests

Bound

execution: decide how much time may be spent on an event, e.g., an iteration bound on a
data-dependent algorithm

queue size: controls the maximum number of queued arrivals

Manage Resources

Increase Resources(infrastructure)

Faster processors, additional processors, additional memory, and faster networks

Increase Concurrency

If possible, process requests in parallel

Process different streams of events on different threads

Create additional threads to process different sets of activities

Multiple copies

Computations: so that they can be performed faster (client-server; multiple copies of clients)

Data:

use of a cache for faster access and reduced contention

Hadoop maintains data copies to avoid data transfer and improve data locality

Resource Arbitration

Resources are scheduled to reduce contention

Processors, buffer, network

Architect needs to choose the right scheduling strategy

FIFO

Fixed Priority

Semantic importance

Domain specific logic such as request from a privileged class gets higher priority

Deadline monotonic (streams with shorter deadlines get higher priority)

Dynamic priority

Round robin

Earliest deadline first - the request with the earliest completion deadline goes first

Static scheduling

A scheduling policy may also be pre-emptive or non-pre-emptive

Design Checklist for a Quality Attribute

Allocate responsibility

Manage Data

Identify the portion of the data that needs to be managed for this quality attribute

Plan for various data design w.r.t. the quality attribute

Resource Management Planning

How infrastructure should be monitored, tuned, deployed to address the quality concern

Manage Coordination

Determine how modules will coordinate to meet the quality requirement

Plan how system elements communicate and coordinate

Binding

Performance- Design Checklist- Allocate responsibilities

Identify which features may involve or cause

Heavy workload

Time-critical response

Identify which parts of the system are heavily used

For these, analyze the scenarios that can result in performance bottleneck

Furthermore-

Assign responsibilities related to threads of control: allocation and de-allocation of
threads, maintaining thread pools, and so forth

Assign responsibilities that will schedule shared resources or appropriately select and
manage performance-related artifacts such as queues, buffers, and caches

Performance- Design Checklist- Manage Data

Identify the data that's involved in time-critical responses, heavily used, or massive in size
and costly to load. For that data, determine

whether maintaining multiple copies of key data would benefit performance

partitioning data would benefit performance

whether reducing the processing requirements for the creation, initialization, persistence,
manipulation, translation, or destruction of the enumerated data abstractions is possible

whether adding resources to reduce bottlenecks for the creation, initialization,
persistence, manipulation, translation, or destruction of the enumerated data abstractions
is feasible.

Performance- Design Checklist- Manage Coordination

Look for the possibility of introducing concurrency (and obviously pay attention to thread-safety), event prioritization, or a scheduling strategy

Check whether the strategy will have a significant positive effect on performance (e.g.,
introducing concurrency also creates extra overhead)

Determine whether the choice of threads of control and their associated responsibilities
introduces bottlenecks

Consider appropriate mechanisms for example

Stateful vs. stateless, synchronous vs. asynchronous, guaranteed delivery

Performance Design Checklist- Resource Management

Determine which resources (CPU, memory) in your system are critical for performance.

Plan for mitigating actions early, for instance

Where heavy network loading will occur, determine whether co-locating some
components will reduce loading and improve overall efficiency.

Ensure that components with heavy computation requirements are assigned to processors
with the most processing capacity.

Prioritization of resources and access to resources

Ensure they will be monitored and managed under normal and overloaded system
operation.

scheduling and locking strategies

Deploying additional resources on demand to meet increased loads

Typically possible in a Cloud and virtualized scenario

Performance Design checklist- Binding

For each element that will be bound after compile time, determine the

time necessary to complete the binding

additional overhead introduced by using the late binding mechanism

Ensure that these values do not pose unacceptable performance penalties on the system.

Performance Design Checklist- Technology choice

Choice of technology is often governed by the organization's mandate (enterprise architecture)

Find out whether the chosen technology will let you set and meet real-time deadlines

Do you know its characteristics under load and its limits?

Does your choice of technology give you the ability to set

scheduling policy

Priorities

policies for reducing demand

allocation of portions of the technology to processors

Does your choice of technology introduce excessive overhead?

Lecture 7
(Security, Testability, Interoperability RL 6.1,6.2,6.3)
What is security?

A measure of the system's ability to resist unauthorized usage while still providing its services to
legitimate users

Ability to protect data and information from unauthorized access

An attempt to breach this is an Attack

Unauthorized attempt to access, modify, delete data

Theft of money by e-transfer, modification records and files, reading and copying
sensitive data like credit card number

Deny service to legitimate users

Security is a measure of the system's ability to resist unauthorized usage while still providing its services
to legitimate users. An attempt to breach security is called an attack[1] and can take a number of forms. It
may be an unauthorized attempt to access data or services or to modify data, or it may be intended to deny
services to legitimate users.
[1]

Some security experts use "threat" interchangeably with "attack."

Attacks, often occasions for wide media coverage, may range from theft of money by electronic transfer
to modification of sensitive data, from theft of credit card numbers to destruction of files on computer
systems, or to denial-of-service attacks carried out by worms or viruses. Still, the elements of a security
general scenario are the same as the elements of our other general scenarios: a stimulus and its source, an
environment, the target under attack, the desired response of the system, and the measure of this response.

Security can be characterized as a system providing nonrepudiation, confidentiality, integrity, assurance,
availability, and auditing. For each term, we provide a definition and an example.
1. Nonrepudiation is the property that a transaction (access to or modification of data or services)
cannot be denied by any of the parties to it. This means you cannot deny that you ordered that
item over the Internet if, in fact, you did.
2. Confidentiality is the property that data or services are protected from unauthorized access. This
means that a hacker cannot access your income tax returns on a government computer.
3. Integrity is the property that data or services are being delivered as intended. This means that
your grade has not been changed since your instructor assigned it.
4. Assurance is the property that the parties to a transaction are who they purport to be. This means
that, when a customer sends a credit card number to an Internet merchant, the merchant is who
the customer thinks they are.
5. Availability is the property that the system will be available for legitimate use. This means that a
denial-of-service attack won't prevent your ordering this book.
6. Auditing is the property that the system tracks activities within it at levels sufficient to reconstruct
them. This means that, if you transfer money out of one account to another account, in
Switzerland, the system will maintain a record of that transfer.
Each of these security categories gives rise to a collection of general scenarios.
Security General Scenarios
The portions of a security general scenario are given below. Figure 4.6 presents an example. A correctly
identified individual tries to modify system data from an external site; system maintains an audit trail and
the correct data is restored within one day.
Source of stimulus. The source of the attack may be either a human or another system. It may
have been previously identified (either correctly or incorrectly) or may be currently unknown. If
the source of the attack is highly motivated (say politically motivated), then defensive measures
such as "We know who you are and will prosecute you" are not likely to be effective; in such
cases the motivation of the user may be important. If the source has access to vast resources (such
as a government), then defensive measures are very difficult. The attack itself is unauthorized
access, modification, or denial of service.
The difficulty with security is allowing access to legitimate users and determining legitimacy. If
the only goal were to prevent access to a system, disallowing all access would be an effective
defensive measure.

Stimulus. The stimulus is an attack or an attempt to break security. We characterize this as an unauthorized person or system trying to display information, change and/or delete information, access services of the system, or reduce availability of system services. In Figure 4.6, the stimulus is an attempt to modify data.
Artifact. The target of the attack can be either the services of the system or the data within it. In
our example, the target is data within the system.
Environment. The attack can come when the system is either online or offline, either connected to
or disconnected from a network, either behind a firewall or open to the network.
Response. Using services without authorization or preventing legitimate users from using services
is a different goal from seeing sensitive data or modifying it. Thus, the system must authorize
legitimate users and grant them access to data and services, at the same time rejecting
unauthorized users, denying them access, and reporting unauthorized access. Not only does the
system need to provide access to legitimate users, but it needs to support the granting or
withdrawing of access. One technique to prevent attacks is to cause fear of punishment by
maintaining an audit trail of modifications or attempted accesses. An audit trail is also useful in
correcting from a successful attack. In Figure 4.6, an audit trail is maintained.
Response measure. Measures of a system's response include the difficulty of mounting various
attacks and the difficulty of recovering from and surviving attacks. In our example, the audit trail
allows the accounts from which money was embezzled to be restored to their original state. Of
course, the embezzler still has the money, and he must be tracked down and the money regained,
but this is outside of the realm of the computer system.
Table 4.4 shows the security general scenario generation table.

Table 4.4. Security General Scenario Generation


Portion of Scenario: Possible Values

Source: An individual or system that is correctly identified, identified incorrectly, or of unknown identity; that is internal/external, authorized/not authorized; with access to limited resources or vast resources

Stimulus: Tries to display data, change/delete data, access system services, or reduce the availability of system services

Artifact: System services; data within the system

Environment: Either online or offline, connected or disconnected, firewalled or open

Response: Authenticates user; hides the identity of the user; blocks access to data and/or services; allows access to data and/or services; grants or withdraws permission to access data and/or services; records access/modifications or attempts to access/modify data/services by identity; stores data in an unreadable format; recognizes an unexplainably high demand for services, informs a user or another system, and restricts availability of services

Response Measure: Time/effort/resources required to circumvent security measures with probability of success; probability of detecting an attack; probability of identifying the individual responsible for an attack or access/modification of data and/or services; percentage of services still available under a denial-of-service attack; restoration of data/services; extent to which data/services are damaged and/or legitimate access is denied

Security Tactics- Close to Physical Security

Detect:

Limit access through security checkpoints

Enforce everyone to wear badges and check that visitors are legitimate

Resist:

Armed guards

React:

Lock the door automatically

Recover:

Keep a backup of the data in a different place

A. Detect Attacks

Detect Intrusion: compare network traffic or service request patterns within a system to a set of signatures or known patterns of malicious behavior stored in a database.

Detect Service Denial: compare the pattern or signature of network traffic coming into a system to historic profiles of known denial-of-service (DoS) attacks.

Verify Message Integrity: use checksums or hash values to verify the integrity of messages, resource files, deployment files, and configuration files.

Detect Message Delay: by checking the time it takes to deliver a message, it is possible to detect suspicious timing behavior.
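A minimal sketch of delay detection: compare the observed delivery time against an expected bound. The threshold value here is purely illustrative.

```python
# Illustrative threshold: delays beyond this bound are flagged as suspicious.
MAX_EXPECTED_DELAY = 0.5  # seconds

def is_delay_suspicious(sent_at: float, received_at: float,
                        max_delay: float = MAX_EXPECTED_DELAY) -> bool:
    """Flag a message whose delivery took longer than the expected bound."""
    return (received_at - sent_at) > max_delay
```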

B. Resist Attacks

Identify Actors: identify the source of any external input to the system.

Authenticate & Authorize Actors:

Use strong passwords, OTP, digital certificates, biometric identity

Use access control pattern, define proper user class, user group, role based access

Limit Access

Restrict access based on message source or destination ports

Use of DMZ (demilitarized zone)

Limit Exposure: minimize the attack surface of a system by allocating a limited number of
services to each host

Data confidentiality:

Use encryption to protect data in databases

Use encryption-based communication, such as SSL, for web-based transactions

Use a virtual private network (VPN) to communicate between two trusted machines

Separate Entities: can be done through physical separation on different servers attached to
different networks, the use of virtual machines, or an air gap.

Change Default Settings: Force the user to change settings assigned by default.

C. React to Attacks

Revoke Access: limit access to sensitive resources, even for normally legitimate users and uses, if
an attack is suspected.

Lock Computer: limit access to a resource if there are repeated failed attempts to access it.

Inform Actors: notify operators, other personnel, or cooperating systems when an attack is
suspected or detected.
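The Lock Computer tactic above can be sketched as a simple failed-attempt counter. The threshold and in-memory storage are assumptions for illustration; a real system would persist this state and add time-based unlock.

```python
from collections import defaultdict

MAX_FAILURES = 3  # illustrative lockout threshold

class AccountLocker:
    """Lock access to a resource after repeated failed access attempts."""

    def __init__(self, max_failures: int = MAX_FAILURES):
        self.max_failures = max_failures
        self.failures = defaultdict(int)  # failed attempts per user

    def record_failure(self, user: str) -> None:
        self.failures[user] += 1

    def record_success(self, user: str) -> None:
        self.failures[user] = 0  # a successful login resets the counter

    def is_locked(self, user: str) -> bool:
        return self.failures[user] >= self.max_failures
```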

D. Recover From Attacks

In addition to the availability tactics for recovering failed resources, there is Audit.

Audit: keep a record of user and system actions and their effects, to help trace the actions of, and
to identify, an attacker.

Tactics for achieving security can be divided into those concerned with resisting attacks, those concerned
with detecting attacks, and those concerned with recovering from attacks. All three categories are
important. Using a familiar analogy, putting a lock on your door is a form of resisting an attack, having a
motion sensor inside of your house is a form of detecting an attack, and having insurance is a form of recovering from an attack. Figure 5.8 shows the goals of the security tactics.

RESISTING ATTACKS
In Chapter 4, we identified nonrepudiation, confidentiality, integrity, and assurance as goals in our
security characterization. The following tactics can be used in combination to achieve these goals.
Authenticate users. Authentication is ensuring that a user or remote computer is actually who it
purports to be. Passwords, one-time passwords, digital certificates, and biometric identifications
provide authentication.
Authorize users. Authorization is ensuring that an authenticated user has the rights to access and
modify either data or services. This is usually managed by providing some access control patterns
within a system. Access control can be by user or by user class. Classes of users can be defined
by user groups, by user roles, or by lists of individuals.
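Role-based access control, mentioned above, can be sketched as a mapping from roles to permissions. The roles, permissions, and users shown are invented examples, not from the text.

```python
# A minimal role-based access control (RBAC) sketch.
ROLE_PERMISSIONS = {
    "admin":  {"read", "write", "delete"},
    "editor": {"read", "write"},
    "viewer": {"read"},
}

USER_ROLES = {"alice": "admin", "bob": "viewer"}

def is_authorized(user: str, action: str) -> bool:
    """An authenticated user may act only if their role grants the permission."""
    role = USER_ROLES.get(user)
    return action in ROLE_PERMISSIONS.get(role, set())
```

Unknown users and unknown roles fall through to the empty permission set, so access is denied by default.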
Maintain data confidentiality. Data should be protected from unauthorized access. Confidentiality
is usually achieved by applying some form of encryption to data and to communication links.
Encryption provides extra protection to persistently maintained data beyond that available from
authorization. Communication links, on the other hand, typically do not have authorization
controls. Encryption is the only protection for passing data over publicly accessible
communication links. The link can be implemented by a virtual private network (VPN) or by a
Secure Sockets Layer (SSL) for a Web-based link. Encryption can be symmetric (both parties use
the same key) or asymmetric (public and private keys).
Maintain integrity. Data should be delivered as intended. It can have redundant information
encoded in it, such as checksums or hash results, which can be encrypted either along with or
independently from the original data.
Limit exposure. Attacks typically depend on exploiting a single weakness to attack all data and
services on a host. The architect can design the allocation of services to hosts so that limited
services are available on each host.
Limit access. Firewalls restrict access based on message source or destination port. Messages
from unknown sources may be a form of an attack. It is not always possible to limit access to
known sources. A public Web site, for example, can expect to get requests from unknown
sources. One configuration used in this case is the so-called demilitarized zone (DMZ). A DMZ is
used when access must be provided to Internet services but not to a private network. It sits
between the Internet and a firewall in front of the internal network. The DMZ contains devices
expected to receive messages from arbitrary sources such as Web services, e-mail, and domain
name services.
DETECTING ATTACKS
The detection of an attack is usually through an intrusion detection system. Such systems work by
comparing network traffic patterns to a database. In the case of misuse detection, the traffic pattern is
compared to historic patterns of known attacks. In the case of anomaly detection, the traffic pattern is
compared to a historical baseline of itself. Frequently, the packets must be filtered in order to make

comparisons. Filtering can be on the basis of protocol, TCP flags, payload sizes, source or destination
address, or port number.
Intrusion detectors must have some sort of sensor to detect attacks, managers to do sensor fusion,
databases for storing events for later analysis, tools for offline reporting and analysis, and a control
console so that the analyst can modify intrusion detection actions.
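Misuse detection, as described above, compares traffic against known attack signatures. A toy version of that comparison (with invented signatures; real detectors use far richer pattern databases and filtering):

```python
# A toy misuse detector: match incoming request payloads against a
# database of known-bad signatures.
KNOWN_ATTACK_SIGNATURES = [
    "DROP TABLE",        # SQL injection fragment
    "../../etc/passwd",  # path traversal
    "<script>",          # reflected XSS
]

def detect_intrusion(request_payload: str) -> list:
    """Return the signatures matched by the payload (empty list = clean)."""
    return [sig for sig in KNOWN_ATTACK_SIGNATURES if sig in request_payload]
```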
RECOVERING FROM ATTACKS
Tactics involved in recovering from an attack can be divided into those concerned with restoring state and
those concerned with attacker identification (for either preventive or punitive purposes).
The tactics used in restoring the system or data to a correct state overlap with those used for availability
since they are both concerned with recovering a consistent state from an inconsistent state. One difference
is that special attention is paid to maintaining redundant copies of system administrative data such as
passwords, access control lists, domain name services, and user profile data.
The tactic for identifying an attacker is to maintain an audit trail. An audit trail is a copy of each
transaction applied to the data in the system together with identifying information. Audit information can
be used to trace the actions of an attacker, support nonrepudiation (it provides evidence that a particular
request was made), and support system recovery. Audit trails are often attack targets themselves and
therefore should be maintained in a trusted fashion.
Figure 5.9 provides a summary of the tactics for security.

Design Checklist- Allocation of Responsibilities

Identify the services that need to be secured

Identify the modules and subsystems offering these services

For each such service

Identify the actors that can access the service, and implement authentication and levels of
authorization for them

Verify checksums and hash values

Allow/deny the data associated with this service for these actors

Record attempts to access or modify data or services

Encrypt data that are sensitive

Implement a mechanism to recognize reduced availability of these services

Implement notification and alert mechanisms

Implement a mechanism to recover from an attack

Design Checklist- Manage Data

Determine the sensitivity of different data fields

Ensure that data of different sensitivity is separated

Ensure that data of different sensitivity has different access rights and that access rights are
checked prior to access.

Ensure that access to sensitive data is logged and that the log file is suitably protected.

Ensure that data is suitably encrypted and that keys are separated from the encrypted data.

Ensure that data can be restored if it is inappropriately modified.

Design Checklist- Manage Coordination

For inter-system communication (applies to people as well)

Ensure that mechanisms for authenticating and authorizing the actor or system, and
encrypting data for transmission across the connection are in place.

Monitor communication

Monitor anomalous communication such as

unexpectedly high demands for resources or services

Unusual access patterns

Mechanisms for restricting or terminating the connection.

Design Checklist- Manage Resource

Define appropriate grant or denial of resources

Record access attempts to resources

Encrypt data

Monitor resource utilization

Log

Identify sudden high demand for a particular resource, for instance high CPU utilization
at an unusual time

Ensure that a contaminated element can be prevented from contaminating other elements.

Ensure that shared resources are not used for passing sensitive data from an actor with access
rights to that data to an actor without access rights.

Identify legitimate users


The architecture should be designed so that a security breach in one component does not affect
other components.
Design checklist- Binding

Runtime binding of components can be untrusted. Determine the following

Based on the situation, implement certificate-based authentication for a component

Implement certificate management and validation

Define access rules for components that are dynamically bound

Implement an audit trail whenever a late-bound component tries to access records

System data should be encrypted, with the keys intentionally withheld from late-bound
components

Any component bound at runtime should itself be sufficiently secured.


Design Checklist- Technology choice
Choice of technology is often governed by the organization mandate (enterprise architecture)

Decide tactics first. Based on the tactics, ensure that your chosen technologies support the tactics

Determine what technologies are available to help with user authentication, data access rights, resource
protection, and data encryption

Identify technology and tools for monitoring and alert

Developers should know the tools, techniques, and technologies for monitoring and for preventing security breaches.

Testability

The ease with which software can be made to demonstrate its faults through testing

If a fault is present in a system, then we want it to fail during testing as quickly as possible.

At least 40% of development effort goes into testing

Done by developers, testers, and verifiers (tools)

Specialized software for testing

E.g. test harness, simple playback capability, specialized testing chamber

Dijkstra's Thesis

Testing can't guarantee the absence of errors; it can only show their presence.

Fault discovery is a probability

The probability that the next test execution will fail and exhibit the fault

For perfectly testable code, each component's internal state must be controllable through its
inputs, and its output must be observable

Error-free software does not exist.

Testability Scenario
Software testability refers to the ease with which software can be made to demonstrate its faults through
(typically execution-based) testing. At least 40% of the cost of developing well-engineered systems is
taken up by testing. If the software architect can reduce this cost, the payoff is large.
In particular, testability refers to the probability, assuming that the software has at least one fault, that it
will fail on its next test execution. Of course, calculating this probability is not easy and, when we get to
response measures, other measures will be used.
For a system to be properly testable, it must be possible to control each component's internal state and
inputs and then to observe its outputs. Frequently this is done through use of a test harness, specialized
software designed to exercise the software under test. This may be as simple as a playback capability for
data recorded across various interfaces or as complicated as a testing chamber for an engine.
Testing is done by various developers, testers, verifiers, or users and is the last step of various parts of the
software life cycle. Portions of the code, the design, or the complete system may be tested. The response
measures for testability deal with how effective the tests are in discovering faults and how long it takes to
perform the tests to some desired level of coverage.

Testability General Scenarios


Figure 4.7 is an example of a testability scenario concerning the performance of a unit test: A unit tester
performs a unit test on a completed system component that provides an interface for controlling its
behavior and observing its output; 85% path coverage is achieved within three hours.

Figure 4.7. Sample testability scenario

Source of stimulus. The testing is performed by unit testers, integration testers, system testers, or the
client. A test of the design may be performed by other developers or by an external group. In our
example, the testing is performed by a tester.
Stimulus. The stimulus for the testing is that a milestone in the development process is met. This
might be the completion of an analysis or design increment, the completion of a coding increment
such as a class, the completed integration of a subsystem, or the completion of the whole system.
In our example, the testing is triggered by the completion of a unit of code.
Artifact. A design, a piece of code, or the whole system is the artifact being tested. In our
example, a unit of code is to be tested.
Environment. The test can happen at design time, at development time, at compile time, or at
deployment time. In Figure 4.7, the test occurs during development.
Response. Since testability is related to observability and controllability, the desired response is
that the system can be controlled to perform the desired tests and that the response to each test can
be observed. In our example, the unit can be controlled and its responses captured.
Response measure. Response measures are the percentage of statements that have been executed
in some test, the length of the longest test chain (a measure of the difficulty of performing the
tests), and estimates of the probability of finding additional faults. In Figure 4.7, the measurement
is percentage coverage of executable statements.

Table 4.5 gives the testability general scenario generation table.

Table 4.5. Testability General Scenario Generation


Portion of Scenario: Possible Values

Source: Unit developer; increment integrator; system verifier; client acceptance tester; system user

Stimulus: Analysis, architecture, design, class, or subsystem integration completed; system delivered

Artifact: Piece of design, piece of code, complete application

Environment: At design time, at development time, at compile time, at deployment time

Response: Provides access to state values; provides computed values; prepares test environment

Response Measure: Percent of executable statements executed; probability of failure if fault exists; time to perform tests; length of longest dependency chain in a test; length of time to prepare test environment

Goal of Testability Tactics

Using testability tactics the architect should aim to reduce the high cost of testing when the
software is modified

Two categories of tactics

The first introduces controllability and observability into the system during design

The second deals with limiting complexity in the system's design

Testability Tactics
Control and Observe System State

Specialized Interfaces for testing:

to control or capture variable values for a component, either through a test harness or
through normal execution. (A test harness, or automated test framework, is a collection of
software and test data configured to test a program unit by running it under varying
conditions and monitoring its behavior and outputs.)

Use a special interface that a test harness can use

Make use of some metadata through this special interface

Record/Playback: capturing information crossing an interface and using it as input for further
testing.

Localize State Storage: To start a system, subsystem, or module in an arbitrary state for a test, it
is most convenient if that state is stored in a single place.

Interface and implementation

If they are separated, the implementation can be replaced by a stub for testing the rest of
the system

Sandbox: isolate the system from the real world to enable experimentation that is unconstrained
by the worry about having to undo the consequences of the experiment.

Executable Assertions: assertions are (usually) hand coded and placed at desired locations to
indicate when and where a program is in a faulty state.
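The executable-assertions tactic can be illustrated with Python's built-in assert statement. The banking domain and the invariants checked are invented examples.

```python
def transfer(balance: float, amount: float) -> float:
    """Debit `amount` from `balance`, with hand-coded assertions marking
    states the program must never reach."""
    assert amount > 0, "faulty state: non-positive transfer amount"
    new_balance = balance - amount
    assert new_balance >= 0, "faulty state: overdrawn balance"
    return new_balance
```

During testing, a violated assertion pinpoints when and where the program entered a faulty state; in production the assertions can be disabled (in Python, with the -O flag).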

Manage Complexity

Limit Structural Complexity:

avoiding or resolving cyclic dependencies between components,

isolating and encapsulating dependencies on the external environment

reducing dependencies between components in general.

Limit Non-determinism: find all sources of non-determinism (e.g. multiple threads running),
such as unconstrained parallelism, and remove them as far as possible.

Internal Monitoring

Implement a built-in monitoring mechanism

One should be able to turn on or off

one example is logging

Performed typically by instrumentation: AOP (aspect-oriented programming) or
preprocessor macros. Instrument the code to introduce a recorder at some point

The goal of tactics for testability is to allow for easier testing when an increment of software development
is completed. Figure 5.10 displays the use of tactics for testability. Architectural techniques for enhancing
the software testability have not received as much attention as more mature fields such as modifiability,
performance, and availability, but, as we stated in Chapter 4, since testing consumes such a high
percentage of system development cost, anything the architect can do to reduce this cost will yield a
significant benefit.
Figure 5.10. Goal of testability tactics

Although in Chapter 4 we included design reviews as a testing technique, in this chapter we are concerned only with testing a
running system. The goal of a testing regimen is to discover faults. This requires that input be provided to
the software being tested and that the output be captured.
Executing the test procedures requires some software to provide input to the software being tested and to
capture the output. This is called a test harness. A question we do not consider here is the design and
generation of the test harness. In some systems, this takes substantial time and expense.
We discuss two categories of tactics for testing: providing input and capturing output, and internal
monitoring.
INPUT/OUTPUT
There are three tactics for managing input and output for testing.
Record/playback. Record/playback refers to both capturing information crossing an interface and
using it as input into the test harness. The information crossing an interface during normal
operation is saved in some repository and represents output from one component and input to
another. Recording this information allows test input for one of the components to be generated
and test output for later comparison to be saved.
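The record/playback tactic can be sketched as a proxy that logs every call crossing a component's interface, so the log can later be replayed against a fresh instance and its outputs compared. The proxy mechanism shown (Python's __getattr__) is one possible realization, not the only one.

```python
class RecordingProxy:
    """Wrap a component and record every call crossing its interface."""

    def __init__(self, component, log: list):
        self._component = component
        self._log = log

    def __getattr__(self, name):
        method = getattr(self._component, name)
        def recorded(*args, **kwargs):
            result = method(*args, **kwargs)
            # Save inputs and output crossing the interface for later replay.
            self._log.append({"method": name, "args": args,
                              "kwargs": kwargs, "result": result})
            return result
        return recorded

def playback(component, log: list) -> bool:
    """Re-apply the recorded calls and compare against the saved outputs."""
    return all(getattr(component, e["method"])(*e["args"], **e["kwargs"]) == e["result"]
               for e in log)
```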
Separate interface from implementation. Separating the interface from the implementation allows
substitution of implementations for various testing purposes. Stubbing implementations allows

the remainder of the system to be tested in the absence of the component being stubbed.
Substituting a specialized component allows the component being replaced to act as a test harness
for the remainder of the system.
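Separating interface from implementation might look like the following sketch: client code depends only on an abstract interface, so a stub can stand in for the real implementation during testing. The payment domain and names are invented for illustration.

```python
from abc import ABC, abstractmethod

class PaymentGateway(ABC):
    """The interface the rest of the system depends on."""
    @abstractmethod
    def charge(self, amount: float) -> bool: ...

class StubGateway(PaymentGateway):
    """A stub standing in for the real implementation during testing."""
    def __init__(self, always_succeed: bool = True):
        self.always_succeed = always_succeed
        self.charges = []  # observable record for test verification

    def charge(self, amount: float) -> bool:
        self.charges.append(amount)
        return self.always_succeed

def checkout(gateway: PaymentGateway, amount: float) -> str:
    """Client code written only against the interface."""
    return "paid" if gateway.charge(amount) else "declined"
```

Because checkout never names a concrete class, the stub also acts as a test harness: it captures the calls the rest of the system makes.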
Specialize access routes/interfaces. Having specialized testing interfaces allows the capturing or
specification of variable values for a component through a test harness as well as independently
from its normal execution. For example, metadata might be made available through a specialized
interface that a test harness would use to drive its activities. Specialized access routes and
interfaces should be kept separate from the access routes and interfaces for required functionality.
Having a hierarchy of test interfaces in the architecture means that test cases can be applied at any
level in the architecture and that the testing functionality is in place to observe the response.
INTERNAL MONITORING
A component can implement tactics based on internal state to support the testing process.
Built-in monitors. The component can maintain state, performance load, capacity, security, or
other information accessible through an interface. This interface can be a permanent interface of
the component or it can be introduced temporarily via an instrumentation technique such as
aspect-oriented programming or preprocessor macros. A common technique is to record events
when monitoring states have been activated. Monitoring states can actually increase the testing
effort, since tests may have to be repeated with the monitoring turned off. Increased visibility into
the activities of the component, however, usually more than outweighs the cost of the additional testing.
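A built-in monitor that can be switched on and off might be sketched as a decorator that records events only while monitoring is enabled. The global flag and event log are simplifications of what instrumentation frameworks provide.

```python
import functools

MONITORING_ENABLED = True   # can be turned off to re-run tests unmonitored
EVENT_LOG = []              # events recorded while monitoring is active

def monitored(func):
    """Record an event each time the wrapped operation runs,
    but only while monitoring is switched on."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if MONITORING_ENABLED:
            EVENT_LOG.append(f"call:{func.__name__}")
        return func(*args, **kwargs)
    return wrapper

@monitored
def handle_request(x: int) -> int:
    return x * 2
```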
Figure 5.11 provides a summary of the tactics used for testability.
Figure 5.11. Summary of testability tactics

Design Checklist- Allocation of Responsibility


Identify the services that are most critical and hence need to be most thoroughly tested.

Identify the modules, subsystems offering these services

For each such service

Ensure that internal monitoring mechanisms, like logging, are well designed

Make sure that the allocation of functionality provides

low coupling,

strong separation of concerns, and

low structural complexity.

Design Checklist- Testing Data

Identify the data entities related to the critical services that need to be most thoroughly tested.

Ensure that creation, initialization, persistence, manipulation, translation, and destruction of these
data entities are possible:

State Snapshot: Ensure that the values of these data entities can be captured if required,
while the system is in execution or at fault

Replay: Ensure that the desired values of these data entities can be set (state injection)
during testing so that it is possible to recreate the faulty behavior

Design Checklist- Testing Infrastructure

Is it possible to inject faults into the communication channel and to monitor the state of
the communication?

Is it possible to execute test suites and capture results for a distributed set of systems?

Testing for potential race conditions: check if it is possible to explicitly map

processes to processors

threads to processes

So that the desired test response is achieved and potential race conditions identified

Design Checklist- Testing resource binding

Ensure that components that are bound later than compile time can be tested in the late bound
context

E.g. loading a driver on-demand

Ensure that late bindings can be captured in the event of a failure, so that you can re-create the
system's state leading to the failure.

Ensure that the full range of binding possibilities can be tested.

Design Checklist- Resource Management

Ensure there are sufficient resources available to execute a test suite and capture the results

Ensure that your test environment is representative of the environment in which the system will
run

Ensure that the system provides the means to:

test resource limits

capture detailed resource usage for analysis in the event of a failure

inject new resource limits into the system for the purposes of testing

provide virtualized resources for testing

Choice of Tools

Determine what tools are available to help achieve the testability scenarios

Do you have regression testing, fault injection, and recording/playback support from the
testing tools?

Does your choice of tools support the type of testing you intend to carry out?

You may want fault injection, but you need a tool that supports the level of
fault injection you want

Does it support capturing and injecting the data state?

Interoperability
(How coordination happens)

The ability of two or more systems to usefully exchange information through an interface

The ability to transfer data (syntactic interoperability) and to interpret data (semantic interoperability)

Information exchange can be direct or indirect

Interface

Beyond API

Need to have a set of assumptions you can safely make about the entity exposing the API

Example- you want to integrate with Google Maps

Why Interoperate?

The services provided by Google Maps are used by unknown systems

They must be able to use Google Maps without Google knowing who they are

You may want to construct a capability from a variety of systems

A traffic sensing system can receive stream of data from individual vehicles

Raw data needs to be processed

Need to be fused with other data from different sources

Need to determine the traffic congestion

Overlay with Google Maps

Interoperability Scenario
The system combines the location information with other details, overlays it on Google Maps, and broadcasts the result
Notion of Interface

Information exchange

Can be as simple as A calling B

A and B can exchange implicitly without direct communication

Operation Desert Storm, 1991: an anti-missile system failed to exchange information with, and
thus intercept, an incoming ballistic rocket

The system required a periodic restart in order to recalibrate its position. Since it
wasn't restarted, the position information wasn't correctly computed, due to error
accumulation

Interface

Here it also means the set of assumptions that can safely be made about this entity

E.g. it is safe to assume that the API of the anti-missile system DOES NOT give information
about gradual degradation

Interoperability is about the degree to which two or more systems can usefully exchange meaningful
information. Like all quality attributes, interoperability is not a yes-or-no proposition but has shades of
meaning.

Interoperability Tactics

Locate (Discover Service)

Identify the service through a known directory service. Here a service implies a set of capabilities available
through an interface

By name, location, or other attributes
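The Locate tactic can be sketched as a toy directory service: providers register capabilities under a name with attributes, and clients discover endpoints by name or by attribute. All service names and endpoints below are invented.

```python
# A toy directory service for the Locate (Discover Service) tactic.
REGISTRY = {}

def register(name: str, endpoint: str, **attributes) -> None:
    """A provider registers its endpoint and descriptive attributes."""
    REGISTRY[name] = {"endpoint": endpoint, **attributes}

def locate(name: str = None, **wanted) -> list:
    """Return endpoints matching a service name and/or a set of attributes."""
    hits = []
    for svc, info in REGISTRY.items():
        if name is not None and svc != name:
            continue
        if all(info.get(k) == v for k, v in wanted.items()):
            hits.append(info["endpoint"])
    return hits
```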

REpresentational State Transfer (REST)


REST is an architectural pattern in which services are described using a uniform interface. RESTful
services are viewed as hypermedia resources. REST is stateless.
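REST's uniform, stateless interface can be sketched as a dispatcher that maps an HTTP method plus a resource path to a handler, with each request carrying everything the server needs. The orders resource and in-memory store are invented for illustration.

```python
# A toy RESTful dispatcher over an in-memory "orders" resource.
ORDERS = {"1": {"item": "book"}}

def handle(method: str, path: str, body: dict = None):
    """Return (status_code, payload) for a request; statelessly per call."""
    parts = path.strip("/").split("/")
    if parts[0] != "orders":
        return 404, None
    if method == "GET" and len(parts) == 2:      # read one resource
        order = ORDERS.get(parts[1])
        return (200, order) if order else (404, None)
    if method == "POST" and len(parts) == 1:     # create a new resource
        new_id = str(len(ORDERS) + 1)
        ORDERS[new_id] = body
        return 201, {"id": new_id}
    return 405, None                             # method not allowed
```

The point of the uniform interface is that every resource is manipulated through the same small set of verbs; clients need no service-specific operations.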

Allocation of Responsibilities: Check which system features need to interoperate with others.
For each of these features, ensure that the designers implement
Accepting and rejecting requests
Logging of requests
Notification mechanisms

Exchange of information
Coordination Model: coordination should ensure that performance SLAs are met. Plan for
Handling the volume of requests
Timeliness in responding to and sending messages
Currency of the messages sent
Handling jitter in message arrival times

Data Model

Identify the data to be exchanged among interoperating systems

If the data can't be exchanged due to confidentiality, plan for data transformation before
exchange

Identification of Architectural Component

The components that are going to interoperate should be available, secure, meet
performance SLA (consider design-checklists for these quality attributes)

Resource Management

Ensure that system resources are not exhausted (a flood of requests shouldn't deny a
legitimate user)

Consider communication load

When resources are to be shared, plan for an arbitration policy

Binding Time

Ensure that it has the capability to bind unknown systems

Ensure the proper acceptance and rejection of requests

Ensure service discovery when you want to allow late binding

Technology Choice

Consider technology that supports interoperability (e.g. web-services)

Lecture 8
(Introduction to Patterns)

What is a (Architecture) Pattern?

A set of components (or subsystems), their responsibilities, interactions, and the way they
collaborate

Constraints or rules that decide the interaction

To solve a recurring architectural problem in a generic way

Synonymous with architectural style

Properties of Patterns

Addresses a recurring design problem that arises in specific design situations and presents a
solution to it

Document existing, well-proven design experience

Identify and specify abstractions at a high level

Provide a common vocabulary and understanding of design principles

Help to build complex systems

Manage software complexity

A note on Design Principles

A set of guidelines that helps to get a good design

Robert Martin's book on Agile Software Development says:

Avoid Rigidity (hard to change)

Avoid Fragility (whenever I change it, something breaks)

Avoid Immobility (can't be reused)

OO Design Principles

Open-Closed

Open to extension, closed for modification

e.g. Template Method and Strategy patterns

Dependency Inversion

Decouple two module dependencies (A → B)

A holds an interface of B; the implementer of B implements that interface

Liskov Substitution

A superclass can be replaced by its subclasses

Interface Segregation

Don't pollute an interface; define it for a specific purpose

e.g. Adapter pattern

Single Responsibility

One class, only one task
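To make the dependency-inversion principle concrete, here is a minimal hedged sketch in Python: the high-level module depends only on an abstraction it owns, and the low-level module implements it. All class and method names are illustrative assumptions, not from any real framework.

```python
# Dependency Inversion sketch: ReportService (high level) knows only the
# MessageSender abstraction; EmailSender (low level) implements it.
from abc import ABC, abstractmethod

class MessageSender(ABC):          # the interface held by the high-level side
    @abstractmethod
    def send(self, text): ...

class ReportService:               # high-level module A
    def __init__(self, sender: MessageSender):
        self._sender = sender      # depends only on the abstraction

    def publish(self, report):
        return self._sender.send(f"REPORT: {report}")

class EmailSender(MessageSender):  # low-level module B implements the interface
    def send(self, text):
        return f"emailed: {text}"

service = ReportService(EmailSender())
print(service.publish("Q3 sales"))   # -> emailed: REPORT: Q3 sales
```

Swapping `EmailSender` for any other `MessageSender` implementation requires no change to `ReportService`, which is the point of the inverted dependency.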

Context

A scenario or situation where a design problem arises

Ideally the scenario should be generic, but that may not always be possible

Describe situations in which the problem occurs

Give a list of all known situations

Example

Developing a messaging solution for mobile applications

Developing software for a man-machine interface

Problem

Starts with a generic problem statement that captures the central theme

Completed by forces: aspects of the problem that should be considered when solving it

A force can be a requirement

A force can be a constraint

A force can be a desirable property

Forces may complement or contradict one another

Example

Ease of modifying the User Interface (Personalization)

Solution

Configuration to balance forces

Structure with components and relationships

Run-time behavior

Structure: addresses the static part of the solution

Run-time behavior: addresses the dynamic part

Example

Building blocks for the application

Specific input events and their processing

Pattern System
A pattern system for software architecture is a collection of patterns for software architecture, together
with guidelines for their implementation, combination, and practical use in software development.

Supports the development of high-quality software systems, addressing both functional and
non-functional requirements

It should comprise a sufficient base of patterns

It should describe all its patterns uniformly

It should expose the various relationships between patterns

It should organize its constituent patterns

It should support the construction of software systems

It should support its own evolution

Pattern Classification

It should be simple and easy to learn

It should consist of only a few classification criteria

Each classification criterion should reflect natural properties of patterns

It should provide a roadmap

The schema should be open to integration of new patterns

Problem Categories

Mud to Structure: Includes patterns that support a suitable decomposition of an overall system
task into cooperating subtasks

Distributed Systems: Includes patterns that provide infrastructures for systems that have
components located in different processes or in several subsystems and components

Interactive Systems: Includes patterns that help to structure human-computer interaction

Adaptable Systems: Includes patterns that provide infrastructures for the extension and
adaptation of applications in response to evolving and changing functional requirements

Structural Decomposition: Includes patterns that support a suitable decomposition of
subsystems and complex components into cooperating parts

Organization of Work: Includes patterns that define how components collaborate to provide a
complex service

Creation: Includes patterns that help with instantiating objects and recursive object structures

Service Variation: Comprises patterns that support changing the behavior of an object or
component

Service Extension: Includes patterns that help to add new services to an object or object
structure dynamically

Adaptation: Provides patterns that help with interface and data conversion

Access Control: Includes patterns that guard and control access to services or components

Management: Includes patterns for handling homogeneous collections of objects, services and
components in their entirety

Communication: Includes patterns that help organize communication between components

Resource Handling: Includes patterns that help manage shared components and objects

Architectural Patterns, Design Patterns, and Idioms by Problem Category

Mud to Structure: Layers, Pipes and Filters, Blackboard (architectural patterns)

Distributed Systems: Broker, Pipes and Filters, Microkernel (architectural patterns)

Interactive Systems: MVC, PAC (architectural patterns)

Adaptable Systems: Microkernel, Reflection (architectural patterns)

Creation: Abstract Factory, Prototype, Builder (design patterns); Singleton, Factory Method
(idioms)

Structural Decomposition: Whole-Part, Composite

Organisation of Work: Master-Slave, Chain of Responsibility, Command, Mediator

Access Control: Proxy, Façade, Iterator

Service Variation: Bridge, Strategy, State (design patterns); Template Method (idiom)

Service Extension: Decorator, Visitor

Management: Command Processor, View Handler, Memento

Adaptation: Adapter

Communication: Publisher-Subscriber, Forwarder-Receiver, Client-Dispatcher-Server

Resource Handling: Flyweight (design pattern); Counted Pointer (idiom)

Mud to Structure

Before we start a new system, we collect requirements from the customer and transform those into
specifications

Requirements → Architecture (optimistic view)

A ball of mud is the realization

Cutting the ball along only one aspect (such as along lines visible in the application domain) may not
be of help

Need to consider functional and non-functional attributes

Architectural Patterns

Lecture 9
(Layering Pattern)

Example
Suppose that the store should provide the capability for a user to
Browse the catalog of products
Select a product and put it in a shopping cart
Products are stored in a table:

Name                              | Category | Edition | Price
Software Architecture in Practice | Book     | 2nd     | 2453
Software Architecture             | Book     | 3rd     | 500

When you implement it, it will look like Flipkart or Amazon

What do you need at a minimum?

Three sets of classes

One set manages display of products, ease of selection, navigation

Another set manages the product management, pricing

Another set manages the database access

UI Layer classes

Business Layer classes

Database Layer classes
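A minimal sketch of these three sets of classes, using the product data from the table above; the class and method names are illustrative assumptions, not from any real framework.

```python
# Three-layer sketch for the store example: each class belongs to exactly
# one layer and talks only to the layer directly below it.

class ProductTable:                      # database layer: data access only
    def __init__(self):
        self._rows = [
            {"name": "Software Architecture in Practice", "price": 2453},
            {"name": "Software Architecture", "price": 500},
        ]
    def find(self, name):
        return next(r for r in self._rows if r["name"] == name)

class Catalog:                           # business layer: product rules, pricing
    def __init__(self, table):
        self._table = table
    def price_with_tax(self, name, rate=0.18):
        return round(self._table.find(name)["price"] * (1 + rate), 2)

class StoreUI:                           # UI layer: display and selection only
    def __init__(self, catalog):
        self._catalog = catalog
    def show(self, name):
        return f"{name}: Rs. {self._catalog.price_with_tax(name)}"

ui = StoreUI(Catalog(ProductTable()))
print(ui.show("Software Architecture"))
```

Note that the UI never touches `ProductTable` directly: the database layer could be replaced (say, by a real SQL-backed class with the same `find` interface) without changing the layers above.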

Layers Architectural Pattern


Helps to structure applications that can be decomposed into groups of subtasks, where each group of
subtasks is at a particular level of abstraction.
Layers

Implementing protocols

Conceptually different issues split into separate, interacting layers

Functionality decomposed into layers; helps replace layer(s) with better or different
implementation

Layers: 3-part schema


Context

A large system that requires decomposition

Problem

Mix of low- and high-level issues, where high-level operations rely on low-level
ones
A typical pattern of communication flow consists of requests moving from high
level to low level, and answers to requests, incoming data and notification about
events traveling in the opposite direction

Forces

Code changes should not ripple through the system
Stable interfaces; standardization
Exchangeable parts
Grouping of responsibilities for better understandability and
maintainability

Solution

Structure the system into an appropriate number of layers

Implementation Guideline

Define abstraction criteria


Levels of abstraction define the layers.
The most generic components are in the lowest layer, whereas the domain-specific components are in the
top layer

More stable components (which hardly undergo change) are in lower layers. Use degree of
stability to decide layers

Distance from hardware (top to bottom):

User-visible elements

Specific Application Modules

Common Service Levels

OS Interface Level

Hardware

Determine the no. of abstraction levels

Typically each abstraction level is one layer

Map the abstraction levels to layers

Use mechanisms to keep the number of layers to an optimum number (say 3 layers for a typical self-service based application)

Too Few Layers Can Result in Poor Structure

Too Many Layers Impose Unnecessary Overhead

Complete Layer specification


A) Name the layer and assign tasks

Highest layers provide the system functionality perceived by the user

Lower layers are helpers

In a bottom-up approach, create generic tasks at the lowest level, a sort of infrastructure

Requires experience to achieve this

B) Specify the services

Strict separation of layers

No component should spread over two layers

Inverted pyramid of use: lower-level layers offer generic services which will be used by
upper-level layers.

Construct Each Layer

Specify layer interface

Use a black box approach

Layer N treats Layer N-1 as a black box

Structure each layer

Identify components inside each layer

Bridge or strategy pattern can help

Supports multiple implementations of services provided by a layer

Supports dynamic exchange of the algorithms used
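As a hedged sketch of how the Strategy pattern supports multiple, exchangeable implementations inside a layer, consider a persistence layer whose storage strategy is chosen at construction time. All class names here are illustrative, not from any real framework.

```python
# Strategy inside a layer: the layer's interface (put/get) stays fixed
# while the storage strategy behind it can be exchanged.

class InMemoryStore:                 # one interchangeable strategy
    def __init__(self):
        self._data = {}
    def save(self, key, value):
        self._data[key] = value
    def load(self, key):
        return self._data[key]

class UpperCaseStore(InMemoryStore): # alternative strategy, same interface
    def save(self, key, value):
        super().save(key, str(value).upper())

class PersistenceLayer:              # the layer: a black box to the layer above
    def __init__(self, strategy):
        self._strategy = strategy    # algorithm chosen when the layer is built
    def put(self, key, value):
        self._strategy.save(key, value)
    def get(self, key):
        return self._strategy.load(key)

layer = PersistenceLayer(UpperCaseStore())
layer.put("title", "layers")
print(layer.get("title"))   # LAYERS
```

The layer above sees only `put`/`get`, so strategies can be swapped (even at runtime, by reassigning `_strategy`) without touching client code.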

Inter layer communication

Design an error handling strategy

Define an efficient strategy

Handling may be expensive; errors need to propagate

Layers Pattern Summary

Context: A large system that requires decomposition

Problem: Mix of low- and high-level issues, where high-level operations rely on low-level
ones. A typical pattern of communication flow consists of requests moving from high
level to low level, and answers to requests, incoming data and notifications about
events traveling in the opposite direction.

Forces: Code changes should not ripple through the system; stable interfaces and
standardization; exchangeable parts; grouping of responsibilities for better
understandability and maintainability

Solution: Structure the system into an appropriate number of layers

Variants

Relaxed Layered System


Layering Through Inheritance

Benefits

Reuse of layers
Support for standardization
Dependencies are kept local
Exchangeability

Liabilities

Cascades of changing behavior


Lower efficiency
Unnecessary work
Difficulty in establishing the correct granularity

Lecture 10

Pipes and Filters Architectural Pattern


Definition
"The Pipes and Filters architectural pattern provides a structure for systems that
process a stream of data. Each processing step is encapsulated in a filter component.
Data [are] passed through pipes between adjacent filters. Recombining filters allows
you to build families of related filters." [Buschmann]

Context
The context consists of programs that must process streams of data.

Problem
Suppose we need to build a system to solve a problem:

that must be built by several developers


that decomposes naturally into several independent processing steps

for which the requirements are likely to change.

The design of the components and their interconnections must consider the following
forces [Buschmann]:

It should be possible to enhance the system by substituting new filters for


existing ones or by recombining the steps into a different communication
structure.
Components implementing small processing steps are easier to reuse than
components implementing large steps.
If two steps are not adjacent, then they share no information.
Different sources of input data exist.
It should be possible to display or store the final results of the computation in
various ways.
If the user stores intermediate results in files, then the likelihood of errors
increases and the file system may become cluttered with junk.
Parallel execution of the steps should be possible.

Solution

Divide the task into a sequence of processing steps.


Let each step be implemented by a filter program that consumes data from its
input and produces data on its output incrementally.
Connect the output of one step as the input to the succeeding step by means
of a pipe.
Enable the filters to execute concurrently.
Connect the input to the sequence to some data source, such as a file.
Connect the output of the sequence to some data sink, such as a file or display
device.
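The solution steps above can be sketched with Python generators, where each filter consumes its input incrementally and the generator connections play the role of pipes. The filter names are illustrative; a production system would use processes or threads for true concurrency.

```python
# Pipes-and-filters sketch: generators as incremental filters, with the
# generator chaining acting as the pipes.

def source(lines):                 # data source
    yield from lines

def to_words(stream):              # filter: split lines into words
    for line in stream:
        yield from line.split()

def lowercase(stream):             # filter: normalize case
    for word in stream:
        yield word.lower()

def sink(stream):                  # data sink: collect the results
    return list(stream)

result = sink(lowercase(to_words(source(["Pipes AND Filters", "and pipes"]))))
print(result)   # ['pipes', 'and', 'filters', 'and', 'pipes']
```

Because each stage yields items one at a time, downstream filters begin work before upstream filters finish, mirroring the incremental-processing requirement of the pattern; recombining stages is just re-nesting the calls.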

Structure
The filters are the processing units of the pipeline. A filter may enrich, refine, or
transform its input data [Buschmann].

It may enrich the data by computing new information from the input data and
adding it to the output data stream.
It may refine the data by concentrating or extracting information from the
input data stream and passing only that information to the output stream.

It may transform the input data to a new form before passing it to the output
stream.
It may, of course, do some combination of enrichment, refinement, and
transformation.

A filter may be active (the more common case) or passive.

An active filter runs as a separate process or thread; it actively pulls data from
the input data stream and pushes the transformed data onto the output data
stream.
A passive filter is activated by being called either:
o as a function, a pull of the output from the filter
o as a procedure, a push of output data into the filter

The pipes are the connectors--between a data source and the first filter, between
filters, and between the last filter and a data sink. As needed, a pipe synchronizes the
active elements that it connects together.
A data source is an entity (e.g., a file or input device) that provides the input data to
the system. It may either actively push data down the pipeline or passively supply data
when requested, depending upon the situation.
A data sink is an entity that gathers data at the end of a pipeline. It may either actively
pull data from the last filter element or it may passively respond when requested by
the last filter element.
See the Class-Responsibility-Collaborator (CRC) cards for these elements on page 56
of the Buschmann book.

Implementation
Implementation of the pipes-and-filters architecture is usually not difficult. It often
includes the following steps [Buschmann]:
1. Divide the functionality of the problem into a sequence of processing steps.
Each step should only depend upon the outputs of the previous step in the
sequence. The steps will become the filters in the system.

In dividing up the functionality, be sure to consider variations or later changes
that might be needed--a reordering of the steps or substitution of one processing
step for another.
2. Define the type and format of the data to be passed along each pipe.
For example, Unix pipes carry an unstructured sequence of bytes. However,
many Unix filters read and write streams of ASCII characters that are
structured into lines (with the newline character as the line terminator).
Another important formatting issue is how the end of the input is marked. A
filter might rely upon a system end-of-input condition or it may need to
implement its own "sentinel" data value to mark the end.
3. Determine how to implement each pipe connection.
For example, a pipe connecting active filters might be implemented with
operating system or programming language runtime facility such as a message
queue, a Unix-style pipe, or a synchronized-access bounded buffer.
A pipe connecting to a passive filter might be implemented as a direct call of
the adjacent filter: a push connection as a call of the downstream filter as a
procedure or a pull connection as a call of the upstream filter as a function.
4. Design and implement the filters.
The design of a filter is based on the nature of the task to be performed and the
natures of the pipes to which it can be connected.
o An active filter needs to run with its own thread of control. It might run
as a "heavyweight" operating system process (i.e., having its own
address space) or as a "lightweight" thread (i.e., sharing an address
space with other threads).
o A passive filter does not require a separate thread of control (although
it could be implemented with a separate thread).

The selection of the size of the buffer inside a pipe is an important performance
tradeoff. Large buffers may use up much available memory but likely will
involve less synchronization and context-switching overhead. Small buffers
conserve memory at the cost of increased overhead.

To make filters flexible and, hence, increase their potential reusability, they
often will need different processing options that can be set when they are
initiated. For example, Unix filters often take command line parameters, access
environment variables, or read initialization files.
5. Design for robust handling of errors.
Error handling is difficult in a pipes-and-filters system since there is no global
state and often multiple asynchronous threads of execution. At the least, a
pipes-and-filters system needs mechanisms for detecting and reporting errors.
An error should not result in incorrect output or other damage to the data.
For example, a Unix program can use the stderr channel to report errors to its
environment.
More sophisticated pipes-and-filters systems should seek to recover from
errors. For example, the system might discard bad input and resynchronize at
some well-defined point later in the input data. Alternatively, the system might
back up the input to some well-defined point and restart the processing, perhaps
using a different processing method for the bad data.
6. Configure the pipes-and-filters system and initiate the processing.
One approach is to use a standardized main program to create, connect, and
initiate the needed pipe and filter elements of the pipeline.
Another approach is to use an end-user tool, such as a command shell or a
visual pipeline editor, to create, connect, and initiate the needed pipe and filter
elements of the pipeline.

Example
An example pipes-and-filter system might be a retargetable compiler for a
programming language. The system might consist of a pipeline of processing elements
similar to the following:
1. A source element reads the program text (i.e., source code) from a file (or
perhaps a sequence of files) as a stream of characters.
2. A lexical analyzer converts the stream of characters into a stream of lexical
tokens for the language--keywords, identifier symbols, operator symbols, etc.
3. A parser recognizes a sequence of tokens that conforms to the language
grammar and translates the sequence to an abstract syntax tree.

4. A "semantic" analyzer reads the abstract syntax tree and writes an


appropriately augmented abstract syntax tree.
Note: This element handles context-sensitive syntactic issues such as type
checking and type conversion in expressions.
5. A global optimizer (usually optionally invoked) reads an augmented syntax
tree and outputs one that is equivalent but corresponds to program that is
more efficient in space and time resource usage.
Note: A global optimizer may transform the program by operations such as
factoring out common subexpressions and moving statements outside of loops.
6. An intermediate code generator translates the augmented syntax tree to a
sequence of instructions for a virtual machine.
7. A local optimizer converts the sequence of intermediate code (i.e., virtual
machine) instructions into a more efficient sequence.
Note: A local optimizer may transform the program by removing unneeded
loads and stores of data.
8. A backend code generator translates the sequence of virtual machine
instructions into a sequence of instructions for some real machine platform
(i.e., for some particular hardware processor augmented by operating system
calls and a runtime library).
9. If the previous step generated symbolic assembly code, then an assembler is
needed to translate the sequence of symbolic instructions into a relocatable
binary module.
10. If the previous steps of the pipeline generated a sequence of separate binary
modules, then a linker might be needed to bind the separate modules with
library modules to form a single executable (i.e., object code) module.
11. A sink element outputs the resulting binary module into a file.
The pipeline can be reconfigured to support a number of different variations:

If source code preprocessing is to be supported (e.g., as in C), then


a preprocessor filter (or filters) can be inserted in front of the lexical analyzer.
If the language is to be interpreted rather than translated into object code,
then the backend code generator (and all components after it in the pipeline)
can be replaced by an interpreter that implements the virtual machine.

If the compiler is to be retargeted to a different platform, then a backend code


generator (and assembler and linker) for the new platform can be substituted
for the old one.
If the compiler is to be modified to support a different language with the same
lexical structure, then only the parser, semantic analyzer, global optimizer,
and intermediate code generator need to be replaced.
Note: If the parser is driven by tables that describe the grammar, then it may be
possible to use the same parser with a different table.

If a load-and-go compiler is desired, the file-output sink can be replaced by


a loader that loads the executable module into an address space in the
computer's main memory and starts the module executing.

Of course, a pure active-filters system as described above for a compiler may not be
very efficient or convenient.

Sometimes a system of filters can be made more efficient by directly sharing a


global state. Otherwise the global information must be encoded by one filter,
passed along a pipe to an adjacent filter, decoded by that filter, and so forth
on downstream.
In the compiler pipeline, the symbol table is a key component of the global
state that is constructed by the lexical analyzer and needed by the phases
downstream through (at least) the intermediate code generator.

Sometimes performance can be improved by combining adjacent active filters


into one program and replacing the pipe by an upstream function call (a
passive pull connection) or a downstream procedure call (a passive push
connection).
In the compiler pipeline, it may be useful to combine the phases from lexical
analysis through intermediate code generation into one program because they
share the symbol table. Performance can be further improved by having the
parser directly call the lexical analyzer when the next token is needed.

Although a piece of information may not be required at some step, the


availability of that information may be useful.
For example, the symbol table information is not usually required during
backend code generation, interpretation, or execution. However, some of the

symbol table information, such as variable and procedure names, may be useful
in generation of error messages and execution traces or for use by a runtime
debugging tools.

Variants
So far we have focused on single-input single-output filters. A generalization of the
pipes-and-filters pattern allows filters with multiple input and/or multiple output pipes
to be connected in any directed graph structure.
In general, such dataflow systems are difficult to design so that they compute the
desired result and terminate cleanly. However, if we restrict ourselves to directed
acyclic graph structures, the problem is considerably simplified.
In the UNIX operating system shell, the tee filter provides a mechanism to split a
stream into two streams, named pipes provide mechanisms for constructing network
connections, and filters with multiple input files/streams provide mechanisms for
joining two streams.
Consider the following UNIX shell commands. On a Solaris machine, this sequence
sets up a pipe to build a sorted list of all words that occur more than once in a file:
# create two named pipes
mknod pipeA p
mknod pipeB p
# set up side chain computation (running in the background)
cat pipeA >pipeB &
# set up main pipeline computation
cat filename | tr -cs "[:alpha:]" "[\n*256]" \
| tr "[:upper:]" "[:lower:]" | sort | tee pipeA | uniq \
| comm -13 - pipeB | uniq

The mknod commands set up two named pipes, pipeA and pipeB, for connecting
to a "side chain" computation.
The "side chain" command starts a cat program running in a background fork
(note the &). The program takes its input from the pipe named pipeA and
writes its output to the pipe named pipeB.
The main pipeline uses a cat filter as a source for the stream. The next two
stages use filter tr to translate each sequence of non-alphabetic characters to
a single newline character and to map all uppercase characters to lowercase,
respectively. The words are now in a standard form--in lowercase, one per
line.

The fourth stage of the main pipeline sorts the words into ascending order
using the sort filter.
After the sort, the main pipeline uses a tee filter to replicate the stream,
sending one copy down the main pipeline and another copy onto the side
chain via pipeA.
The side chain simply copies the words from pipeA onto pipeB. Meanwhile the
main pipeline uses the uniq filter to remove adjacent duplicate words.
The main pipeline stream and the side chain stream are then joined by
the comm filter. The comm filter takes two inputs, one from main pipeline's
stream (note the - parameter) and another from pipeB.
Invoking the comm filter with the -13 option cause it to output the lines that
appear in the second stream (i.e., pipeB) but not the first stream (i.e., the main
pipeline). Thus, the output is an alphabetical list of words that appear more
than once in the input file.
The final stage, another uniq filter, removes duplicates from the final output.

Consequences
Benefits
The pipes-and-filters architectural pattern has the following benefits [Buschmann]:

Intermediate files unnecessary, but possible. File system clutter is avoided


and concurrent execution is made possible.
Flexibility by filter exchange. It is easy to exchange one filter element for
another with the same interfaces and functionality.
Flexibility by recombination. It is not difficult to reconfigure a pipeline to
include new filters or perhaps to use the same filters in a different sequence.
Reuse of filter elements. The ease of filter recombination encourages filter
reuse. Small, active filter elements are normally easy to reuse if the
environment makes them easy to connect.
Rapid prototyping of pipelines. Flexibility of exchange and recombination and
ease of reuse enables the rapid creation of prototype systems.
Efficiency by parallel processing. Since active filters run in separate processes
or threads, pipes-and-filters systems can take advantage of a multiprocessor.

Liabilities
The pipes-and-filters architectural pattern has the following liabilities [Buschmann]:

Sharing state information is expensive or inflexible. The information must be


encoded, transmitted, and then decoded.
Efficiency gain by parallel processing is often an illusion. The costs of data
transfer, synchronization, and context switching may be high. Non-incremental
filters, such as the Unix sort, can become the bottleneck of a
system.
Data transformation overhead. The use of a single data channel between
filters often means that much transformation of data must occur, for example,
translation of numbers between binary and character formats.
Error handling. It is often difficult to detect errors in pipes-and-filters systems.
Recovering from errors is even more difficult.

Pipe-And-Filter
A very simple, yet powerful architecture, that is also very robust. It consists of any
number of components (filters) that transform or filter data, before passing it on via
connectors (pipes) to other components. The filters are all working at the same time.
The architecture is often used as a simple sequence, but it may also be used for very
complex structures.

The filter transforms or filters the data it receives via the pipes with which it is
connected. A filter can have any number of input pipes and any number of output
pipes.
The pipe is the connector that passes data from one filter to the next. It is a directional
stream of data, that is usually implemented by a data buffer to store all data, until the
next filter has time to process it.
The pump or producer is the data source. It can be a static text file, or a keyboard
input device, continuously creating new data.

The sink or consumer is the data target. It can be another file, a database, or a
computer screen.

Examples

Unix programs. The output of one program can be linked to the input
of another program.
Compilers. The consecutive filters perform lexical analysis, parsing,
semantic analysis, and code generation.

Where does it come from?


The popularity of the architecture is mainly due to the Unix operating system. It has
become popular because Ken Thompson (who created Unix, together with Dennis
Ritchie) decided to limit the architecture to a linear pipeline. Using the architecture at
all was an idea of Doug McIlroy, their manager at Bell Labs at the time (1972). Both
filters (coroutines) and pipes (streams) were not new, but it is not clear to me who
designed the architecture of linking the coroutines by streams. As far as I can see, the
design was made by Doug McIlroy.

When should you use it?


This architecture is great if you have a lot of transformations to perform and you need
to be very flexible in using them, yet you want them to be robust.

How does it work?


The application links together all inputs and outputs of the filters by pipes, then
spawns separate threads for each filter to run in.
Here's an idea of the relationships that can be created between the different filter
processes, through pipes.

All filters are processes that run (virtually) at the same time. That means, they can run
as different threads, coroutines, or be located on different machines entirely. Every
pipe connected to a filter has its own role in the function of the filter. So if you
connect a pipe, you also need to specify the role it plays in the filter process. The
filters should be made so robust that pipes can be added and removed at runtime.
Every time the filter performs a step, it reads from its input pipes, performs its
function on this data, and places the result on all output pipes. If there is insufficient
data in the input pipes, the filter simply waits.
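A minimal sketch of this scheme, assuming Python threads as the filter processes and queues as the synchronizing pipes; the stage functions and the sentinel convention are illustrative choices, not part of any standard API.

```python
# Active filters: each filter runs in its own thread, pulling from an
# input queue (pipe) and pushing to an output queue. A sentinel object
# marks the end of the stream, as discussed in the implementation steps.
import queue
import threading

SENTINEL = object()

def filter_stage(func, inbox, outbox):
    while True:
        item = inbox.get()           # blocks (waits) if the pipe is empty
        if item is SENTINEL:
            outbox.put(SENTINEL)     # propagate end-of-stream downstream
            break
        outbox.put(func(item))

pipe1, pipe2, pipe3 = queue.Queue(), queue.Queue(), queue.Queue()
stages = [
    threading.Thread(target=filter_stage, args=(lambda x: x * 2, pipe1, pipe2)),
    threading.Thread(target=filter_stage, args=(lambda x: x + 1, pipe2, pipe3)),
]
for t in stages:
    t.start()

for value in [1, 2, 3]:              # the pump feeds the first pipe
    pipe1.put(value)
pipe1.put(SENTINEL)

results = []                         # the sink drains the last pipe
while (item := pipe3.get()) is not SENTINEL:
    results.append(item)
for t in stages:
    t.join()
print(results)   # [3, 5, 7]
```

The blocking `get` is exactly the "filter simply waits" behavior described above, and the bounded-buffer variant (`queue.Queue(maxsize=n)`) realizes the buffer-size tradeoff discussed in the implementation section.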
The architecture also allows for a recursive technique, whereby a filter itself consists
of a pipe-filter sequence:

Problems

If a filter needs to wait until it has received all data (e.g. a sort filter),
its data buffer may overflow, or it may deadlock.
If the pipes only allow for a single data type (a character or byte) the
filters will need to do some parsing. This complicates things and slows

them down. If you create different pipes for different datatypes, you
cannot link any pipe to any filter.
Common implementation techniques

Filters are commonly implemented by separate threads. These may be


either hardware or software threads/coroutines.

Pipes and Filters


A structure for systems that process a stream of data
Filter

Has interfaces from which a set of inputs can flow in and a set of outputs can flow out

Each processing step is encapsulated in a filter component

Independent entities

Does not share state with other filters

Does not know the identity of upstream and downstream filters

Not all data needs to be processed before the next filter can start working

Pipes

Data is passed through pipes between adjacent filters

Stateless data stream

Source end feeds filter input and sink receives output.

Recombining filters allows you to build families of related systems


Pipes and Filters: 3-part schema

Context

Processing Data Streams

Problem

System that must process or transform a stream of input data
Multi-stage operations on data (workflow)
Many developers may work on different stages
Requirements may change

Forces

Future enhancements: exchange of processing steps or their recombination
Reuse desired, hence small processing steps
Non-adjacent processing steps do not share information
Different sources of data exist (e.g. different sensor data)
Store final results in various ways
Explicit storage of intermediate results should be done automatically
Multiprocessing of the steps should be possible

Solution

Pipes and filters, from data source to data sink

Simple case

Known Example: Compiler Design

Various Components

Scenario 1

Scenario 2

Scenario 3

Scenario 4 - Multiprocess

Implementation

Steps

Divide the system's task into a sequence of processing stages

Define the data format to be passed along each pipe

Decide how to implement each pipe connection

Design and implement the filters

Design the error handling

Set up the processing pipeline
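Under the common threads-and-queues implementation technique, the steps above can be sketched end to end: pipes as bounded blocking queues (steps 2-3), each filter as a thread (step 4), a sentinel value for orderly shutdown (step 5), and a final assembly function (step 6). All names are invented for illustration:

```python
import queue
import threading

SENTINEL = None  # marks end-of-stream on a pipe (part of error handling design)

def run_filter(fn, inp, out):
    """One filter: read from the input pipe, process, write to the output pipe."""
    while True:
        item = inp.get()          # blocks if there is insufficient data
        if item is SENTINEL:
            out.put(SENTINEL)     # propagate end-of-stream downstream
            return
        out.put(fn(item))         # the filter's processing step

def run_pipeline(data, fns):
    """Set up the processing pipeline (the final step above)."""
    # One pipe per connection: a bounded buffer is the pipe mechanism.
    pipes = [queue.Queue(maxsize=4) for _ in range(len(fns) + 1)]
    threads = [threading.Thread(target=run_filter, args=(fn, pipes[i], pipes[i + 1]))
               for i, fn in enumerate(fns)]
    for t in threads:
        t.start()
    for item in data:             # the data source feeds the first pipe
        pipes[0].put(item)
    pipes[0].put(SENTINEL)
    results = []
    while (item := pipes[-1].get()) is not SENTINEL:  # the data sink
        results.append(item)
    for t in threads:
        t.join()
    return results

print(run_pipeline([1, 2, 3], [lambda x: x * 2, lambda x: x + 1]))  # [3, 5, 7]
```

Because `queue.Queue.get` blocks, a filter with insufficient input simply waits, and the bounded `maxsize` keeps a fast upstream filter from flooding a slow downstream one.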

Initial Steps

Design Pipe and Filter

Final Steps

Variants

Tee and Join pipeline


Filters with more than one input and/or more than one output

Benefits

No intermediate files necessary, but possible

Filter addition, replacement, and reuse


Possible to hook any two filters together

Rapid prototyping of pipelines

Concurrent execution

Certain analyses possible


Throughput, latency, deadlock

Liabilities

Sharing state information is expensive or inflexible

Data transformation overhead

Error handling can be a problem

Does not work well with interactive applications

Lowest common denominator on data transmission determines the overall throughput

Pipe and Filter in Cloud based Service

Most PaaS service providers (Amazon, Azure, Google) provide message-oriented
service orchestration

Pipe-n-Filter is a common pattern

Azure
The components with the worker role are the filters
The pipe is the queuing service

Amazon
EC2 instances are filters, communicating via SQS pipes

Lecture 11
RL 10.2 Blackboard Architecture
RL 11.1 Distributed Pattern

A blackboard architecture is a distributed computing architecture in which distributed applications,
modelled as intelligent agents, share a common data structure called the blackboard and a
scheduling/control process. The blackboard can be either centralized or distributed, depending
on the requirements and constraints of the application(s).
To solve a complex problem in the blackboard style, the intelligent agents cooperate as
functional specialists, observing updates to the blackboard and self-activating (in an event-driven
process) when there is new information to process. Agents continually update the
blackboard with partial solutions when their capabilities for processing match the state of
the blackboard.
The blackboard architecture is a distributed computing model based on a metaphor for how
people work together to collaboratively solve a problem around a blackboard (a whiteboard in
today's lingo). For example, one person is standing at the whiteboard working on a solution while
three other people are sitting (or standing) around watching. One of the observers sees new
information on the whiteboard, thinks of how he (or she) can contribute, and then jumps up,
takes the whiteboard marker from the person working, and adds to the solution. This process is
repeated in various scenarios.
The blackboard architecture can be very effective in solving complex distributed computing
problems, including event processing problems; however, scheduling the self-activating agents
can be a key challenge. Another core challenge is how to model and manage the blackboard
itself, especially in distributed blackboard architectures.

Blackboard Architecture

Context and Problem

A set of heterogeneous, specialized modules which dynamically change their strategies in
response to unpredictable events

Non-deterministic strategies

Problem

When there are no deterministic solutions for processing raw data, and it is required to
interchange algorithms working on intermediate computations

Solutions to partial problems require different representations

No predetermined strategy is present to solve a problem (in functional decomposition,
the sequence of activations is more hard-coded)

Dealing with uncertain knowledge

Forces

A complete search of the solution space is not possible

Different algorithms to be used for partial solutions

One algorithm uses results of another algorithm

Input, intermediate data, output can have different representation

No strict sequence between algorithms, one can run them concurrently if required

Examples

Speech recognition (HEARSAY project 1980)

Vehicle identification and tracking

Robot control (navigation, environment learning, reasoning, destination route planning)

Modern machine learning systems for complex tasks (e.g. the popular quiz machine
developed by IBM for the Jeopardy! challenge)

Adobe OCR text recognition

Modern compilers tend to be more blackboard-oriented

Blackboard Pattern

Two kinds of components

Central data structure blackboard

Components operating on the blackboard

System control is entirely driven by the blackboard state

Components of Blackboard

The blackboard is the shared data structure where solutions are built

The control plan encapsulates information necessary to run the system

It is accessed and updated by control knowledge sources

Domain KSs (Domain Knowledge Sources) are concerned with solving domain-specific
problems

Control KSs adapt the current control plan to the current situation

The control component selects, configures and executes knowledge sources
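A minimal sketch of these components: the blackboard is a shared dictionary, each knowledge source pairs an applicability condition with an action, and the control component loops, selecting whichever source matches the current blackboard state. The toy problem and every name here are invented for illustration:

```python
class KnowledgeSource:
    """Pairs a condition ('can I contribute now?') with an action that
    writes a partial solution back onto the blackboard."""
    def __init__(self, name, condition, action):
        self.name, self.condition, self.action = name, condition, action

def control_loop(blackboard, sources):
    # The control component selects and executes any knowledge source
    # whose condition matches the current blackboard state.
    progress = True
    while progress and "solution" not in blackboard:
        progress = False
        for ks in sources:
            if ks.condition(blackboard):
                ks.action(blackboard)
                progress = True
    return blackboard

# Toy problem: turn raw text into a word count in two cooperative steps.
sources = [
    KnowledgeSource("segmenter",
                    lambda bb: "raw" in bb and "words" not in bb,
                    lambda bb: bb.update(words=bb["raw"].split())),
    KnowledgeSource("counter",
                    lambda bb: "words" in bb and "solution" not in bb,
                    lambda bb: bb.update(solution=len(bb["words"]))),
]
final = control_loop({"raw": "to be or not to be"}, sources)
print(final["solution"])  # 6
```

Note that neither knowledge source calls the other; cooperation happens only through the blackboard, which is what allows sources to be added, replaced, or run in a different order independently.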

Solution Structure

Automated Robot Navigation

The robot's high-level goal is to visit a set of places as soon as possible

The successive sub-goals are

to decide on a sequence of places to visit

to compute the best route, and

to navigate with a constraint of rapidity

Benefits

Distributed Systems - Broker Pattern


Context
A complex environment comprises distributed systems

You want to take advantage of computing power of many CPUs, or a cluster of low-cost systems

Some software may be available only on a specific computer

For security reasons, you may want to run different parts on different systems

Some services are provided by business partners over the internet

Problem with distributed components

To build a complex software system as a set of decoupled, interoperating components rather
than a monolith.

Greater flexibility, maintainability, changeability

Partitioning into independent components makes system distributable and scalable.

Require a flexible means of inter-process communication

If participating components handle communication, there can be several issues

The system depends on which communication mechanism is used

Clients need to know the location of servers

Forces

It should be possible to distribute components during deployment; the application should be
unaware of

Whether the service is collocated (i.e. in the same machine) or remote

If remote, where the server is located

Need to exchange, add, or remove components at run-time

Must not depend on system-specific details to guarantee portability and interoperability

Architecture should hide system-specific and implementation-specific details from users of


components and services

Specifically communication issues, data transfer, security issues

Broker Pattern: Solution

Introduce a broker component to achieve better decoupling of clients and servers

Servers: register themselves with the broker and make their services available to clients
through method interfaces.

Clients: access the functionality of servers by sending requests via the broker

The Broker:

Locating the appropriate server and forwarding a request to that server

Transmitting results and exceptions back to the client
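An in-process sketch of that division of labour; a real broker would add network transport, proxies, and marshaling, and all names here are invented:

```python
class Broker:
    """Keeps a registry of servers; locates a server for each client
    request, forwards the call, and transmits the result back."""
    def __init__(self):
        self._registry = {}

    def register(self, service_name, server):
        # Servers register themselves and make their services available.
        self._registry[service_name] = server

    def request(self, service_name, *args):
        # Clients access server functionality by sending requests here.
        server = self._registry.get(service_name)
        if server is None:
            raise LookupError(f"no server offers {service_name!r}")
        return server(*args)      # forward, then transmit the result back

broker = Broker()
broker.register("add", lambda a, b: a + b)  # a server's method interface
broker.register("upper", str.upper)

print(broker.request("add", 2, 3))    # 5
print(broker.request("upper", "hi"))  # HI
```

Because clients name a service rather than a server, servers can be registered, exchanged, or removed at run time without touching client code.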

Broker Pattern: Solution -- 2

Reduces the development complexity

Introduces an object model in which distributed services are encapsulated within objects.

Broker systems offer a path to the integration of two core technologies:

Distribution

Object-oriented design

They extend object-oriented design from single applications to distributed applications that

can run on heterogeneous machines and

be written in different programming languages.

Broker Pattern: Structure

Participating components

Clients

Servers

Brokers

Bridges

Client-side proxies

Server-side proxies

Broker

Broker pattern: Implementation

The implementation can be broken down into nine steps; the first six are:

1. Define an object model or use an existing one
2. Decide which kind of component-interoperability the system should offer
3. Specify the APIs the broker component provides for collaborating with clients and servers
4. Use proxy objects to hide implementation details from clients and servers
5. Design the broker component in parallel with steps 3 and 4
6. Develop IDL compilers


Scenario 1

Broker as service locator

Broker resides at a well-known location and then exposes that location to the client

Broker is responsible for locating the server for the client.

Broker also implements a repository for

adding and removing server components

Makes it possible to add, remove, or exchange server components at run time

Once the server is located, client and server talk directly

Broker behavior: server look-up

Broker as Intermediary

In some situations, direct communication between client and server is not desirable

For security reasons you may want to host all the servers in your company's private
network, behind a firewall, and

only allow access to them from the broker

Broker forwards all the requests and responses between the server and the client instead of
direct communication

Broker as intermediary

Broker Known Uses- CORBA

CORBA is the oldest amongst the middleware technologies used in today's IT world

CORBA stands for Common Object Request Broker Architecture and is defined by

its interfaces

their semantics and

protocols used for communication (Internet Inter-ORB Protocol, IIOP)

CORBA supports the basic Broker pattern.

For the basic functionality CORBA supports the so-called Dynamic Invocation Interface (DII,
which lets a client discover at run time which services a server provides) on the client side

An IDL compiler creates the client proxy (client stub) and the server proxy (called the skeleton)

Various ORB extensions support a wide variety of advanced features

CORBA supports client-side asynchrony via a standardized interface. Server-side
asynchrony is supported only in proprietary, vendor-specific ways.

Broker Known Uses- RMI

Sun's Java Remote Method Invocation (RMI) is based on the Transparent Broker variant pattern

The client-side proxy (the so-called stub) and the server-side invoker (the so-called skeleton)
have to be created manually by an additional compilation step

In contrast to CORBA, the service interface is not written in an abstract IDL, but in Java.

RMI is limited to the usage of Java

To establish interoperability RMI-IIOP is provided

RMI doesn't support client-side or server-side asynchrony out of the box -- you have to
implement it yourself

A central naming service (called RMI registry) allows clients to look up servant identifiers

Broker Known Uses- .NET

Microsoft's .NET Remoting platform implements the Transparent Broker variant pattern to
handle remote communication.

Since the .NET platform supports reflection to acquire type information, the client proxy
is created automatically at runtime behind the scenes, completely transparently to the
application developer.

No separate source code generation or compilation step is required.

The interface description for the client proxy can be provided by MSIL code or by a
WSDL description of the interface itself.

The client proxy is responsible for creating the invocation request, but is not in charge of
any communication-related aspects.

The remote communication functionality of .NET Remoting is encapsulated within a framework
consisting of marshalers (so-called Formatters in .NET Remoting) and transport channels, which
abstract from the underlying transport layer.

Flexible; allows custom extensions to fulfil, for example, QoS requirements.

Supports the client-side asynchrony broker variants. Lifecycle management strategies


for servants are also included within the framework.

Doesn't have a central naming or lookup system. Clients have to know the object
reference of the servant in advance. However different strategies exist to avoid the
hardcoding of the server destination inside the client application code

Benefits

Location Independence -- Clients do not have to care where an object is located, though for
remote objects they always have to use the more complex interface, unless a Transparent
Broker is used.

Type System Transparency -- Differences in type systems are coped with by an intermediate
network protocol. The marshaler translates between programming-language-specific types and
the common network protocol.

Isolation -- Separating all the communication-related code into its own layer isolates it from the
application. You can decide to run the application distributed or all on one computer without
having to change any application code.

Separation of Concerns -- The communication and marshaling concerns are properly
encapsulated in the requestor, invoker, and marshaler.

Resource Management -- The management of network and other communication resources,
such as connections, transfer buffers and threads, is encapsulated within the Broker participants
and therefore separated from the application logic.

Portability -- Platform dependencies, which typically arise from low-level I/O and IP
communication, are encapsulated within the Broker participants and therefore separated from
the application logic.

Liabilities

Error Handling -- Clients have to cope with the inherent unreliability and the associated errors of
network communication.

Overhead -- Developers can easily forget about the location of objects, which can cause
overhead if the expenses of remote communication are not considered.

Performance

Lower fault tolerance (server fails, broker fails, ...)

Testing and debugging

Lecture 12
Interactive Systems

MVC Architecture
Model View Controller, or MVC as it is popularly called, is a software design
pattern for developing web applications. A Model View Controller pattern is
made up of the following three parts:

Model - The lowest level of the pattern, which is responsible for maintaining data.

View - This is responsible for displaying all or a portion of the data to the user.

Controller - Software code that controls the interactions between the Model and View.

MVC is popular as it isolates the application logic from the user interface
layer and supports separation of concerns. Here the Controller receives all
requests for the application and then works with the Model to prepare any
data needed by the View. The View then uses the data prepared by the
Controller to generate a final presentable response. The MVC abstraction
can be graphically represented as follows.

The model
The model is responsible for managing the data of the application. It
responds to requests from the view and to instructions from the
controller to update itself.

The view
A presentation of data in a particular format, triggered by a controller's
decision to present the data. Views are often script-based templating
systems like JSP, ASP, or PHP, and are easy to integrate with AJAX technology.

The controller
The controller is responsible for responding to user input and performing
interactions on the data model objects. The controller receives the input,
validates it, and then performs the business operation that modifies
the state of the data model.
Struts2 is an MVC-based framework. In the coming chapters, let us see how
we can use the MVC methodology within Struts2.

Model-View-Controller
Context
The purpose of many computer systems is to retrieve data from a data store and
display it for the user. After the user changes the data, the system stores the updates
in the data store. Because the key flow of information is between the data store and
the user interface, you might be inclined to tie these two pieces together to reduce
the amount of coding and to improve application performance. However, this
seemingly natural approach has some significant problems. One problem is that the
user interface tends to change much more frequently than the data storage system.
Another problem with coupling the data and user interface pieces is that business
applications tend to incorporate business logic that goes far beyond data
transmission.

Problem
How do you modularize the user interface functionality of a Web application so that
you can easily modify the individual parts?

Forces
The following forces act on a system within this context and must be reconciled as
you consider a solution to the problem:
User interface logic tends to change more frequently than business logic, especially in Web-based applications. For example, new user interface pages may be added, or existing page
layouts may be shuffled around. After all, one of the advantages of a Web-based thin-client
application is the fact that you can change the user interface at any time without having to
redistribute the application. If presentation code and business logic are combined in a single
object, you have to modify an object containing business logic every time you change the user

interface. This is likely to introduce errors and require the retesting of all business logic after
every minimal user interface change.
In some cases, the application displays the same data in different ways. For example, when an
analyst prefers a spreadsheet view of data whereas management prefers a pie chart of the same
data. In some rich-client user interfaces, multiple views of the same data are shown at the same
time. If the user changes data in one view, the system must update all other views of the data
automatically.
Designing visually appealing and efficient HTML pages generally requires a different skill set
than does developing complex business logic. Rarely does a person have both skill sets.
Therefore, it is desirable to separate the development effort of these two parts.
User interface activity generally consists of two parts: presentation and update. The
presentation part retrieves data from a data source and formats the data for display. When the
user performs an action based on the data, the update part passes control back to the business
logic to update the data.
In Web applications, a single page request combines the processing of the action associated
with the link that the user selected with the rendering of the target page. In many cases, the target
page may not be directly related to the action. For example, imagine a simple Web application
that shows a list of items. The user returns to the main list page after either adding an item to the
list or deleting an item from the list. Therefore, the application must render the same page (the
list) after executing two quite different commands (adding or deleting) -- all within the same HTTP
request.
User interface code tends to be more device-dependent than business logic. If you want to
migrate the application from a browser-based application to support personal digital assistants
(PDAs) or Web-enabled cell phones, you must replace much of the user interface code, whereas
the business logic may be unaffected. A clean separation of these two parts accelerates the
migration and minimizes the risk of introducing errors into the business logic.
Creating automated tests for user interfaces is generally more difficult and time-consuming
than for business logic. Therefore, reducing the amount of code that is directly tied to the user
interface enhances the testability of the application.

Solution
The Model-View-Controller (MVC) pattern separates the modeling of the domain, the
presentation, and the actions based on user input into three separate classes
[Burbeck92]:
Model. The model manages the behavior and data of the application domain, responds to
requests for information about its state (usually from the view), and responds to instructions to
change state (usually from the controller).

View. The view manages the display of information.


Controller. The controller interprets the mouse and keyboard inputs from the user, informing
the model and/or the view to change as appropriate.
Figure 1 depicts the structural relationship between the three objects.

Figure 1: MVC class structure


It is important to note that both the view and the controller depend on the model.
However, the model depends on neither the view nor the controller. This is one of the
key benefits of the separation. This separation allows the model to be built and
tested independent of the visual presentation. The separation between view and
controller is secondary in many rich-client applications, and, in fact, many user
interface frameworks implement the roles as one object. In Web applications, on the
other hand, the separation between view (the browser) and controller (the server-side components handling the HTTP request) is very well defined.
Model-View-Controller is a fundamental design pattern for the separation of user
interface logic from business logic. Unfortunately, the popularity of the pattern has
resulted in a number of faulty descriptions. In particular, the term "controller" has
been used to mean different things in different contexts. Fortunately, the advent of
Web applications has helped resolve some of the ambiguity because the separation
between the view and the controller is so apparent.

Variations
In Application Programming in Smalltalk-80: How to use Model-View-Controller
(MVC) [Burbeck92], Steve Burbeck describes two variations of MVC: a passive model
and an active model.
The passive model is employed when one controller manipulates the model
exclusively. The controller modifies the model and then informs the view that the
model has changed and should be refreshed (see Figure 2). The model in this
scenario is completely independent of the view and the controller, which means that
there is no means for the model to report changes in its state. The HTTP protocol is
an example of this. There is no simple way in the browser to get asynchronous

updates from the server. The browser displays the view and responds to user input,
but it does not detect changes in the data on the server. Only when the user
explicitly requests a refresh is the server interrogated for changes.

Figure 2: Behavior of the passive model
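A minimal sketch of this passive-model flow, with invented class names: the controller alone mutates the model and then explicitly asks the view to refresh, while the model knows nothing about either.

```python
class CounterModel:
    """The model holds data only; it references neither view nor controller."""
    def __init__(self):
        self.value = 0

class CounterView:
    """The view renders whatever state the model currently has."""
    def render(self, model):
        return f"count = {model.value}"

class CounterController:
    """Only the controller manipulates the model, then refreshes the view."""
    def __init__(self, model, view):
        self.model, self.view = model, view

    def increment(self):
        self.model.value += 1                # 1. controller modifies the model
        return self.view.render(self.model)  # 2. controller informs the view

ctrl = CounterController(CounterModel(), CounterView())
print(ctrl.increment())  # count = 1
print(ctrl.increment())  # count = 2
```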


The active model is used when the model changes state without the controller's
involvement. This can happen when other sources are changing the data and the
changes must be reflected in the views. Consider a stock-ticker display. You receive
stock data from an external source and want to update the views (for example, a
ticker band and an alert window) when the stock data changes. Because only the
model detects changes to its internal state when they occur, the model must notify
the views to refresh the display.
However, one of the motivations of using the MVC pattern is to make the model
independent of the views. If the model had to notify the views of changes, you
would reintroduce the dependency you were looking to avoid. Fortunately,
the Observer pattern [Gamma95] provides a mechanism to alert other objects of
state changes without introducing dependencies on them. The individual views
implement the Observer interface and register with the model. The model tracks the
list of all observers that subscribe to changes. When a model changes, the model
iterates through all registered observers and notifies them of the change. This
approach is often called "publish-subscribe." The model never requires specific
information about any views. In fact, in scenarios where the controller needs to be
informed of model changes (for example, to enable or disable menu options), all the
controller has to do is implement the Observer interface and subscribe to the model

changes. In situations where there are many views, it makes sense to define multiple
subjects, each of which describes a specific type of model change. Each view can
then subscribe only to types of changes that are relevant to the view.
Figure 3 shows the structure of the active MVC using Observer and how the observer
isolates the model from referencing views directly.

Figure 3: Using Observer to decouple the model from the view in the active model
Figure 4 illustrates how the Observer notifies the views when the model changes.
Unfortunately, there is no good way to demonstrate the separation of model and
view in a Unified Modeling Language (UML) sequence diagram, because the diagram
represents instances of objects rather than classes and interfaces.

Figure 4: Behavior of the active model
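The publish-subscribe mechanism described above can be sketched as follows, using the stock-ticker example from the text; the class and method names are invented:

```python
class StockModel:
    """Active model: tracks registered observers and notifies them of
    every state change; it never references a concrete view."""
    def __init__(self):
        self._observers = []
        self._prices = {}

    def attach(self, observer):
        self._observers.append(observer)   # views subscribe to changes

    def set_price(self, symbol, price):
        # A state change from an external source (e.g. a stock feed).
        self._prices[symbol] = price
        for obs in self._observers:        # publish to all subscribers
            obs.update(symbol, price)

class TickerBand:
    """One of possibly many views implementing the observer interface."""
    def __init__(self):
        self.lines = []

    def update(self, symbol, price):
        self.lines.append(f"{symbol}: {price}")

model = StockModel()
ticker = TickerBand()
model.attach(ticker)
model.set_price("ACME", 42.0)
print(ticker.lines)  # ['ACME: 42.0']
```

The model only depends on the abstract `update` method, so an alert window or any other view can be attached without changing `StockModel` at all.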

Example
See Implementing Model-View-Controller in ASP.NET.

Testing Considerations
Testability is greatly enhanced when you employ Model-View-Controller.
Testing components becomes difficult when they are highly interdependent,

especially with user interface components. These types of components often require
a complex setup just to test a simple function. Worse, when an error occurs, it is hard
to isolate the problem to a specific component. This is the reason why separation of
concerns is such an important architectural driver. MVC separates the concern of
storing, displaying, and updating data into three components that can be tested
individually.
Apart from the problems posed by interdependencies, user interface frameworks are
inherently difficult to test. Testing user interfaces either requires tedious (and error-prone) manual testing or testing scripts that simulate user actions. These scripts tend
to be time-consuming to develop and brittle. MVC does not eliminate the need for
user interface testing, but separating the model from the presentation logic allows
the model to be tested independent of the presentation and reduces the number of
user interface test cases.

Resulting Context
Architecting the presentation layer around the MVC pattern results in the following
benefits and liabilities:

Benefits
Supports multiple views. Because the view is separated from the model and there is no direct
dependency from the model to the view, the user interface can display multiple views of the
same data at the same time. For example, multiple pages in a Web application may use the same
model objects. Another example is a Web application that allows the user to change the
appearance of the pages. These pages display the same data from the shared model, but show it
in a different way.
Accommodates change. User interface requirements tend to change more rapidly than
business rules. Users may prefer different colors, fonts, screen layouts, and levels of support for
new devices such as cell phones or PDAs. Because the model does not depend on the views,
adding new types of views to the system generally does not affect the model. As a result, the
scope of change is confined to the view. This pattern lays the foundation for further
specializations of this pattern such as Page Controller and Front Controller.

Liabilities
Complexity. The MVC pattern introduces new levels of indirection and therefore increases the
complexity of the solution slightly. It also increases the event-driven nature of the user-interface
code, which can become more difficult to debug.
Cost of frequent updates. Decoupling the model from the view does not mean that
developers of the model can ignore the nature of the views. For example, if the model undergoes

frequent changes, it could flood the views with update requests. Some views, such as graphical
displays, may take some time to render. As a result, the view may fall behind update requests.
Therefore, it is important to keep the view in mind when coding the model. For example, the
model could batch multiple updates into a single notification to the view.

What is Model View Controller (MVC)?
In a typical application you will find these three fundamental parts:

Data (Model)

An interface to view and modify the data (View)

Operations that can be performed on the data (Controller)

The MVC pattern, in a nutshell, is this:

1. The model represents the data, and does nothing else. The
model does NOT depend on the controller or the view.

2. The view displays the model data, and sends user actions
(e.g. button clicks) to the controller. The view can:

o be independent of both the model and the controller; or

o actually be the controller, and therefore depend on the model.

3. The controller provides model data to the view, and interprets
user actions such as button clicks. The controller depends on
the view and the model. In some cases, the controller and the
view are the same object.
Rule 1 is the golden rule of MVC so I'll repeat it:

The model represents the data, and does nothing


else. The model does NOT depend on the
controller or the view.
Let's take an address book application as an example. The model is
a list of Person objects, the view is a GUI window that displays the
list of people, and the controller handles actions such as "Delete
person", "Add person", "Email person", etc. The following example
does not use MVC because the model depends on the view.
//Example 1:
void Person::setPicture(Picture pict){
    m_picture = pict;         //set the member variable
    m_listView->reloadData(); //update the view
}

The following example uses MVC:

//Example 2:
void Person::setPicture(Picture pict){
    m_picture = pict; //set the member variable
}

void PersonListController::changePictureAtIndex(Picture newPict, int personIndex){
    m_personList[personIndex].setPicture(newPict); //modify the model
    m_listView->reloadData();                      //update the view
}

In the above example, the Person class knows nothing about the
view. The PersonListController handles both changing the
model, and updating the view. The view window tells the controller
about user actions (in this case, it tells the controller that the user
changed the picture of a person).

What is the advantage of MVC?


Unnecessary complexity is the devil of software development.
Complexity leads to software that is buggy, and expensive to
maintain. The easiest way to make code overly complex is to put
dependencies everywhere. Conversely, removing unnecessary
dependencies makes delightful code that is less buggy and easier
to maintain because it is reusable without modification. You can
happily reuse old, stable code without introducing new bugs into it.
The primary advantage of the MVC design pattern is this:

MVC makes model classes reusable without


modification.

The purpose of the controller is to remove the view dependency


from the model. By removing the view dependency from the model,
the model code becomes delightful.
Why is the model code so delightful? Let's continue with the
address book application example. The project manager
approaches the developer and says "We love the contact list
window, but we need a second window that displays all the contacts
by their photos only. The photos should be in a table layout, with
five photos per row."
If the application uses MVC, this task is pretty straightforward.
Currently there are three classes: Person, PersonListController,
and PersonListView. Two classes need to be
created: PersonPhotoGridView and PersonPhotoGridController.
The Person class remains the same, and is easily plugged into the
two different views. How delightful.
If the application is structured badly like in Example 1, then things
get more complicated. Currently there are two classes: Person
and PersonListView. The Person class cannot be plugged into
another view, because it contains code specific to PersonListView.
The developer must modify the Person class to accommodate the
new PersonPhotoGridView, and ends up complicating the model
like so:
//Example 3:
void Person::setPicture(Picture pict){
    m_picture = pict; //set the member variable
    if(m_listView){   //check if it's in a list view
        m_listView->reloadData(); //update the list view
    }
    if(m_gridView){   //check if it's in a grid view
        m_gridView->reloadData(); //update the grid view
    }
}

As you can see, the model code is starting to turn nasty. If the
project manager then says "we're porting the app to a platform with
a different GUI toolkit" the delightfulness is even more prominent.
With MVC, the Person class can be displayed by different GUI
toolkits without any modification. Just make a controller and a view
with the new toolkit, just as you would with the old toolkit. Without
MVC, it is a nightmare to support multiple GUI toolkits. The code
may end up looking like this:
//Example 4:
void Person::setPicture(Picture pict){
    m_picture = pict;
#ifdef ORIGINAL_GUI_TOOLKIT
    if(m_listView){ //check if it's in a list view
        m_listView->reloadData(); //update the list view
    }
    if(m_gridView){ //check if it's in a grid view
        m_gridView->reloadData(); //update the grid view
    }
#endif
#ifdef NEW_GUI_TOOLKIT
    if(m_listView){ //check if it's in a list view
        m_listView->redisplayData(); //update the list view
    }
    if(m_gridView){ //check if it's in a grid view
        m_gridView->redisplayData(); //update the grid view
    }
#endif
}

The setPicture method is basically spaghetti code at this point.

Why not put the controller code in the view?
One solution to the spaghetti code problem in Example 4 is to move
the controller code from the model to the view like so:
//Example 5:
void PersonListView::newPictureClicked(Picture clickedPicture){
    m_selectedPerson.setPicture(clickedPicture);
    this->reloadData();
}

The above example also makes the model reusable, which is the
main advantage of MVC. When the view will only ever display one
type of model object, then combining the view and the controller is
fine. For example, a SinglePersonView will only ever display
a Person object, so the SinglePersonView can double as the
controller.
However, if the controller is separate from the view, then MVC has a second advantage:

MVC can also make the view reusable without modification.
Not only does MVC make the model delightful, it can also make the view delightful. Ideally, a list view should be able to display lists of anything, not just Person objects. The code in Example 5 cannot be a generic list view, because it is tied to the model (the Person class). In the situation where the view should be reusable (e.g. a list view, or a table view) and the model should be reusable, MVC is the only thing that will work. The controller removes the dependencies from both the model and the view, which allows them to be reused elsewhere.
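As an illustration of that view reusability, here is a hedged sketch of a generic list view in the same C++ style as the earlier examples. The template, the RowFormatter type, and personRow are assumptions invented for this example: the view knows nothing about Person, the model knows nothing about the view, and the controller-supplied formatter is the only glue.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Generic view: can display a list of anything, given a row formatter.
template <typename T>
class ListView {
public:
    using RowFormatter = std::string (*)(const T&);
    explicit ListView(RowFormatter fmt) : m_format(fmt) {}
    void reloadData(const std::vector<T>& items) {
        m_rows.clear();
        for (const T& item : items) m_rows.push_back(m_format(item));
    }
    std::size_t rowCount() const { return m_rows.size(); }
    const std::string& row(std::size_t i) const { return m_rows[i]; }
private:
    RowFormatter m_format;
    std::vector<std::string> m_rows;  // what would be drawn on screen
};

// The model stays plain; it never mentions the view.
struct Person { std::string name; };

// Controller glue: knows both sides, keeps them apart.
inline std::string personRow(const Person& p) { return p.name; }
```

The same ListView can then be instantiated for Person objects, integers, or anything else, without modifying either the view or the model.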

Conclusion
The MVC design pattern inserts a controller class between the view
and the model to remove the model-view dependencies. With the
dependencies removed, the model, and possibly the view, can be
made reusable without modification. This makes implementing new
features and maintenance a breeze. The users get stable software
quickly, the company saves money, and the developers don't go
insane. How good is that?

Ajax and MVC

These two buzzwords have emerged in recent years as key features of frameworks, both in the PHP landscape and in other languages' niches. Let's analyze the relationship between Ajax technology and MVC frameworks, and why they are so comfortable with each other.

MVC
The Model-View-Controller pattern separates every feature of an application into three aspects: the Model, which is the representation of data and domain-specific behavior; the View(s), which reflect changes to the model and handle the presentation logic; and the Controller, which channels user actions to drive the Model. The goal of this separation of concerns is being able to change as much as possible of one of the three layers without having an impact on the others.
Many web frameworks have embraced the MVC pattern,
introducing a stack of components for Controllers and
Views (and in some cases also to ease the development of
the Model) to subclass or configure in order to build a
full-featured web application without handling the raw
HTTP requests. In the case of PHP, frameworks abstract away much of the boilerplate work with the native language constructs ($_GET, $_POST, $_SESSION), and provide a higher-level object-oriented API.

AJAX
The introduction of the XMLHttpRequest object in
modern browsers marked the starting point of the AJAX
(Asynchronous JavaScript And XML) era, where a page
is capable of initiating HTTP requests towards the
server-side application following events that happen on
the client. The initial innovation led the way for the diffusion of JavaScript libraries that performed reliable cross-browser Ajax requests for the first time, and were able to compose pages from segments generated independently on the server, inserting them into the DOM. Although the AJAX acronym includes XML, anything can be returned from such a server request for the client's pleasure, from text to HTML to JSON. AJAX is everywhere now: Twitter and Facebook timelines are realized by inserting AJAX results into the homepage, and DZone's infinite pagination is implemented with the same pattern. Google's GMail and Google Documents make heavy use of AJAX. A modern application cannot ignore the revolution that AJAX brought to web development.

Their union
While the MVC pattern is not inherently web-related, the AJAX technology takes advantage of the separation of concerns favored by MVC to reuse as much code as possible on the server, and enrich the user experience. How does AJAX affect the classical MVC components of a web application?
The Model component is usually not touched when introducing AJAX into an application, as it deals with the inner business logic of the application. If the Model itself is well factored, it will continue to reside on the server and ignore every change in the presentation layer which derives from AJAX-powered web pages.
The View instead becomes the principal subject of changes, as AJAX pages are essentially different implementations of the Views, which are still targets of the same Model. Here are two simple examples of alternate Views used as AJAX callbacks:

- a View can be generated without layout (header, footer, menus), for inclusion as a simulated frame into an already existing page. Historically, this was one of the first and simplest implementations of AJAX-powered webpages, which would modify a specific div instead of reloading the whole document. The AjaxLink jQuery plugin is an example of this application.
- a View can return part of the Model, like a single object or a collection, in a different format from the human-readable HTML. Examples of machine-readable formats are representations as XML, JSON, or literal JavaScript.

The Controller layer still reacts to HTTP requests as the basic events generated from the client, although most of the requests are not focused on a complete document, but rather on a particular endpoint or resource. Of course, these resources are as virtual as the original documents, since this is the dynamic nature of a web application.
The controller has to return the appropriate View of the Model as a text stream (this is a requirement of the format of HTTP responses), which, as seen before, may be one of several formats of the same object graph. Part of the Controller is moved to the client: while Views were produced on the server and rendered on the client from the start, the boundary is not clearly defined in AJAX applications.
For example, when a View in JSON or XML format is returned to the client, this is an intermediate representation, as the View that the end user will ultimately see is always composed of segments of HTML code. So what for the server-side architecture is only a View becomes input for a Controller on the client, which generates or clones HTML to accommodate it.
The original View and the client-side Controller may not even be part of the same application, as the latter may consume an external web service. However, there are security limitations on what an XMLHttpRequest object can do, so these mashups either have to pass through the server or use a hidden iframe as a proxy (the same workaround commonly used for AJAX-like file uploads).

Support
How do frameworks embrace AJAX, and what support is provided to further extend the MVC metaphor into AJAX applications? Having tried Zend Framework for managing the multiple-format Views I talked about earlier, I saw that the generic PHP code is already present and ready to be employed.
The Zend_Controller component provides a helper named AjaxContext, which is configured while subclassing the base controller of the MVC stack. The configuration sets up specific actions for usage with XMLHttpRequest calls, by letting the helper intercept the requests. It recognizes the X-Requested-With non-standard HTTP header sent by most JavaScript libraries and disables the standard layout, switching the view to render (a PHP script in Zend Framework) from action.phtml to action.xml.phtml, or even to json_encode().
With this example in mind, it's very easy to extend an
existing application by introducing AJAX user
experiences, while reusing the existing code. The
separation of concerns of MVC is finally leveraged: the
Model does not change at all, the controllers are tweaked
and separate implementations of the Views are coded.

Context and Problem

Context
Interactive application with flexible human-computer interface

Problem
Because the flow of information is between the data store and the UI, one may be inclined to combine data and UI code to reduce the amount of coding and to improve application performance.
However, the major problem is that the UI tends to change much more frequently than the data storage system.
Another problem is that business applications tend to incorporate complex business logic, which also gets mixed up with the UI code.

Forces

- Same data with different presentations needed
- Display, behavior, and data manipulation to be reflected immediately
- UI is a different element altogether
- Changes are very frequent, more than for data and business logic
- One should be able to test the UI part on its own
- Skillset: HTML page designer skills are different from core app development skills. It is desirable to separate the UI from the rest of the app
- Changing look-and-feel (even device dependency) shouldn't affect the core app
- In a web app, one UI action can trigger much processing, and the outcomes may then need to be collated into one

Model View Controller

MVC Components

The model directly manages the data, logic and rules of the application.

A view can be any output representation of information, such as a chart or a diagram.


Multiple views of the same information are possible, such as a bar chart for management
and a tabular view for accountants.

The controller accepts input and converts it to commands for the model or view.

Interactions

A model stores data that is retrieved according to commands from the controller and
displayed in the view.

A view generates new output to the user based on changes in the model.

A controller can send commands to the model to update the model's state. It can also
send commands to its associated view to change the view's presentation of the model.

Model-View-Controller

Other Dynamic Scenarios

Update to display alone i.e. no change in the controller

System exit. Sequence of deletion or destruction of objects

Scenario of update with multiple View-Controller pairs

MVC Implementation

Steps

Fundamental steps for realizing MVC:

1. Separate human-computer interaction from the core functionality
2. Implement the set-up of MVC (setup part)
3. Design and implement the Model
4. Design and implement the Views
5. Design and implement the Controllers

3: Design the Model

- Encapsulate the data and the functionality to access and modify the data
- Bridge to the core business logic of the system
- Publish-Subscribe design pattern:
  - Implement a registry that holds references to observers (Views and Controllers)
  - Provide APIs for an observer to subscribe and unsubscribe
  - Implement notify(), which is called every time (other) parts of the system change the model's state (and data)
  - notify() in turn calls update() on each observer (a view or a controller)
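The publish-subscribe mechanism described above can be sketched roughly as follows; the Observer and Model names and the setValue API are illustrative assumptions, not taken from a specific framework.

```cpp
#include <algorithm>
#include <vector>

// Observer interface implemented by views and controllers.
class Observer {
public:
    virtual ~Observer() = default;
    virtual void update() = 0;  // called when the model's state changes
};

class Model {
public:
    // Registry APIs: observers subscribe and unsubscribe themselves.
    void subscribe(Observer* o) { m_observers.push_back(o); }
    void unsubscribe(Observer* o) {
        m_observers.erase(
            std::remove(m_observers.begin(), m_observers.end(), o),
            m_observers.end());
    }
    // Any state-changing API ends by calling notify().
    void setValue(int v) {
        m_value = v;
        notify();
    }
    int value() const { return m_value; }
private:
    void notify() {  // in turn calls update() on each registered observer
        for (Observer* o : m_observers) o->update();
    }
    int m_value = 0;
    std::vector<Observer*> m_observers;
};
```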
4: Design and Implement Views

Design the appearance of the View(s):
- Presents information to the user
- Each important data entity should have a view
- Each view may have its own controller (sometimes a set of views can also share a controller)
- Creates a controller using the Factory Method design pattern (makeController() in the View class)
- The view exposes a method for the controller to use directly, bypassing the model (a scenario where the model's state is not changed by the action of a user)

Implement the update() method:
- Retrieves data from the model and presents it on the screen

Initialization:
- Register with the model
- Set up the relationship with the controller

- Look for efficiency in fetching data from the model to build the view
- The view decides, based on the changes, whether Draw needs to be called
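A rough sketch of this step, assuming a hypothetical Model with a single name field: the view creates its controller via the makeController() factory method and pulls data from the model in update(). Registration with the model's change-propagation registry is omitted here for brevity.

```cpp
#include <memory>
#include <string>

// Hypothetical Model holding one piece of displayable state.
class Model {
public:
    void setName(const std::string& n) { m_name = n; }
    std::string name() const { return m_name; }
private:
    std::string m_name;
};

class Controller;  // forward declaration; defined below

class View {
public:
    explicit View(Model& m) : m_model(m) {}
    virtual ~View();        // defined after Controller is complete
    void initialize();      // create controller, do the first draw
    virtual void update() { // retrieve data from the model...
        m_cached = m_model.name();
        draw();             // ...and present it on the screen
    }
    std::string displayed() const { return m_cached; }
protected:
    // Factory Method: subclasses may return a specialized controller.
    virtual std::unique_ptr<Controller> makeController();
    virtual void draw() { /* render m_cached with the GUI toolkit */ }
    Model& m_model;
    std::string m_cached;
    std::unique_ptr<Controller> m_controller;
};

class Controller {
public:
    Controller(Model& m, View& v) : m_model(m), m_view(v) {}
private:
    Model& m_model;  // would receive translated user-input events
    View& m_view;
};

View::~View() = default;
std::unique_ptr<Controller> View::makeController() {
    return std::make_unique<Controller>(m_model, *this);
}
void View::initialize() {
    m_controller = makeController();  // set up the view-controller relationship
    update();                         // initial presentation of model state
}
```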
5: Design and Implement Controllers

Initialization procedure:
- Binds the controller to its View and Model and enables event processing
- Subscribes to the change-propagation mechanism
- Sets up the relationship with the View

Implement event processing:
- Accept user input as events; event delivery to the controller is platform dependent
- Events are translated into requests for the model or the associated view

Controller behavior dependent on the state of the model:
- Registers for change propagation
- Implements its update() procedure
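The controller steps above might be sketched like this; the Event structure, the "rename" request, and the callback-based subscription are illustrative assumptions. The controller translates a platform event into a model request, and reacts to the resulting change notification in its update().

```cpp
#include <functional>
#include <string>
#include <vector>

// Platform-delivered user-input event (illustrative shape).
struct Event { std::string kind; std::string payload; };

class Model {
public:
    void rename(const std::string& n) { m_name = n; notify(); }
    std::string name() const { return m_name; }
    void subscribe(std::function<void()> cb) { m_subs.push_back(std::move(cb)); }
private:
    void notify() { for (auto& cb : m_subs) cb(); }
    std::string m_name;
    std::vector<std::function<void()>> m_subs;
};

class Controller {
public:
    explicit Controller(Model& m) : m_model(m) {
        // Initialization: subscribe to the change-propagation mechanism.
        m_model.subscribe([this] { update(); });
    }
    // Event processing: translate a user event into a model request.
    void handleEvent(const Event& e) {
        if (e.kind == "rename") m_model.rename(e.payload);
        // other event kinds would map to other model or view requests
    }
    int updates() const { return m_updates; }
private:
    void update() { ++m_updates; }  // reacts to model change notifications
    Model& m_model;
    int m_updates = 0;
};
```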
Variants

Document-View:
- Document = Model
- View = View + Controller
- Loose coupling of View and Controller enables multiple simultaneous, synchronized but different views of the same document
MVC in AJAX-based Applications

Traditional web-based UI is thin-client based:
- The browser sends HTTP GET/POST requests to the server
- The entire web page is refreshed
- Client-side JavaScript is used for field validations
- One request may entail retrieving data from many servers

AJAX running in a browser:
- Makes asynchronous calls to the server without refreshing the primary HTML
- No longer a thin client; provides a richer user interface
AJAX in Action

AJAX and MVC

Benefits
- Multiple views of the same model
- Synchronized views (as soon as the Model changes, the Views also get the changes via notifications)
- Pluggable views and controllers
- Exchangeability of look-and-feel
- Framework potential
Should you use it everywhere?

Maybe not. Extremely complex web applications may be split into multiple layers! You may not be able to get away with just View/Business Logic/Data layers.

Here's an example where MVC by itself may be a bad choice: try designing an air traffic control system or a loan processing application for a large bank. Just MVC by itself would be a bad choice: you will inevitably have event buses/message queues along with a multi-layered architecture, with MVC within individual layers and possibly a comprehensive MVC design to keep the code base better organized.
Liabilities
- Increased complexity
- Potential for an excessive number of updates
- Intimate connection between view and controller
- Close coupling of views and controllers to a model
- Inefficiency of data access in the view
- Difficulty of using MVC with modern user-interface tools