Network Trustworthiness
Prof. William H. Sanders
Department of Electrical and Computer Engineering and
Coordinated Science Laboratory
University of Illinois at Urbana-Champaign
whs@uiuc.edu
www.mobius.uiuc.edu
www.perform.csl.uiuc.edu
www.iti.uiuc.edu
© 2005 William H. Sanders. All rights reserved. Do not duplicate without permission of the author.
Slide 1
Course Outline
Issues in Model-Based Validation of High-Availability
Computer Systems/Networks
Combinatorial Modeling
Stochastic Activity Network Concepts
Analytic/Numerical State-Based Modeling
Case Study: Embedded Fault-Tolerant Multiprocessor System
Solution by Simulation
Symbolic State-space Exploration and Numerical Analysis of
State-sharing Composed Models
Case Study: Security Evaluation of a Publish and Subscribe
System
The Art of System Trust Evaluation / Conclusions
Slide 2
[Figure: service states: proper service and improper service, with failure and restoration transitions between them]
Slide 5
Slide 6
Slide 7
Time to Failure - measure of the time to failure from last restoration. (Expected
value of this measure is referred to as MTTF - Mean time to failure.)
Coverage - the probability that, given a fault, the system can tolerate the fault
and continue to deliver proper service.
Slide 8
[Figure: taxonomy of validation methods, with branches including: passive (no fault injection) vs. active (fault injection on prototype); with vs. without contact; hardware- vs. software-implemented; modeling (analysis/numerical vs. simulation); continuous vs. discrete event/state; deterministic vs. non-deterministic; probabilistic vs. non-probabilistic; state-space-based vs. non-state-space-based (combinatorial); stand-alone, sequential, parallel, and networked/distributed systems. Möbius supports model-based validation of the italicized (red) items.]
Slide 9
[Figure: requirement decomposition: requirements are decomposed across modules (Module A, Module B, ..., Module Z) and sub-elements (AA1-AA3, AP1-AP2, M1-M6, L1-L3), yielding a functional model of the system (probabilistic or logical), with assumptions backed by supporting logical arguments and experimentation]
Slide 10
The exponential random variable with rate λ has distribution and density

  F_X(t) = 0 for t ≤ 0;  F_X(t) = 1 - e^(-λt) for t > 0

  f_X(t) = (d/dt) F_X(t) = 0 for t ≤ 0;  f_X(t) = λ e^(-λt) for t > 0

Its mean is 1/λ and its variance is 1/λ².
The exponential random variable is the only continuous random variable that is
memoryless.
To see this, let X be an exponential random variable representing the time that an
event occurs (e.g., a fault arrival).
Important Fact 1: P[X > t + s | X > s] = P[X > t] (memoryless property)!
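The memoryless property is easy to check numerically. The following is a minimal sketch, not part of the original slides; the rate, time points, and sample count are illustrative choices.

```python
import random

# Monte Carlo check of the memoryless property for an exponential
# random variable X with rate lam: P[X > t+s | X > s] = P[X > t].
random.seed(0)
lam, t, s, n = 2.0, 0.4, 0.7, 200_000

samples = [random.expovariate(lam) for _ in range(n)]

# P[X > t], estimated directly from all samples.
p_gt_t = sum(x > t for x in samples) / n

# P[X > t+s | X > s], estimated from the samples that survive past s.
survivors = [x for x in samples if x > s]
p_cond = sum(x > t + s for x in survivors) / len(survivors)

print(p_gt_t, p_cond)  # both should be close to exp(-lam*t), about 0.449
```

Both estimates agree (up to sampling noise) with the closed form e^(-λt), illustrating that having already survived s time units does not change the remaining lifetime distribution.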
Slide 12
This can be thought of as looking at X at time t, observing that the event has not
occurred, and measuring the number of events (probability of the event) that
occur per unit of time at time t.
Important Fact 2: The exponential random variable has a constant failure rate!
Slide 13
Slide 14
P[A < B] = ∫_0^∞ P[A < B | A = x] f_A(x) dx
         = ∫_0^∞ P[A < B | A = x] λe^(-λx) dx
         = ∫_0^∞ P[x < B] λe^(-λx) dx
         = ∫_0^∞ (1 - P[B ≤ x]) λe^(-λx) dx
         = ∫_0^∞ (1 - (1 - e^(-μx))) λe^(-λx) dx
         = ∫_0^∞ e^(-μx) λe^(-λx) dx
         = λ ∫_0^∞ e^(-(λ+μ)x) dx = λ / (λ + μ)
Slide 15
Course Outline
Issues in Model-Based Validation of High-Availability
Computer Systems/Networks
Combinatorial Modeling
Stochastic Activity Network Concepts
Analytic/Numerical State-Based Modeling
Case Study: Embedded Fault-Tolerant Multiprocessor System
Solution by Simulation
Symbolic State-space Exploration and Numerical Analysis of
State-sharing Composed Models
Case Study: Security Evaluation of a Publish and Subscribe
System
The Art of System Trust Evaluation / Conclusions
Slide 16
Combinatorial Methods
Slide 17
Slide 18
Lecture Outline
Review definition of reliability
Failure rate
System reliability
Maximum
Minimum
k of N
Reliability formalisms
Reliability block diagrams
Fault trees
Reliability graphs
Reliability modeling process
Slide 19
Reliability
One key to building highly available systems is the use of reliable components
and systems.
Reliability: The reliability of a system at time t (R(t)) is the probability that the
system operation is proper throughout the interval [0,t].
Slide 20
Failure Rate
What is the rate at which a component fails at time t? This is the probability that a
component that has not yet failed fails in the interval (t, t + Δt), as Δt → 0.
Note that we are not looking at P[X ∈ (t, t + Δt)] ≈ f_X(t)Δt. Rather, we are seeking
P[X ∈ (t, t + Δt) | X > t].
P[X ∈ (t, t + Δt) | X > t] = P[X ∈ (t, t + Δt), X > t] / P[X > t]
                           = P[X ∈ (t, t + Δt)] / (1 - F_X(t))
                           ≈ f_X(t)Δt / (1 - F_X(t))

Dividing by Δt and letting Δt → 0 gives the failure rate

r_X(t) = f_X(t) / (1 - F_X(t))
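As a quick check of the failure-rate formula, here is a small sketch (not from the slides) evaluating r_X(t) for an exponential distribution; the rate λ and evaluation times are illustrative. The hazard comes out constant and equal to λ.

```python
import math

# Failure (hazard) rate r_X(t) = f_X(t) / (1 - F_X(t)) for an
# exponential distribution with rate lam.
lam = 0.25

def f(t):  # exponential density
    return lam * math.exp(-lam * t)

def F(t):  # exponential CDF
    return 1.0 - math.exp(-lam * t)

def hazard(t):
    return f(t) / (1.0 - F(t))

# The exponential's hazard rate is constant: r_X(t) = lam for every t.
print([round(hazard(t), 6) for t in (0.1, 1.0, 10.0)])
```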
Slide 21
[Figure: the "bathtub curve": failure rate r_X(t) vs. time, with a high break-in period, a low, flat normal-operation period, and a rising wear-out period]
Slide 22
System Reliability
While FX can give the reliability of a component, how do you compute the
reliability of a system?
System failure can occur when one, all, or some of the components fail. If one
makes the independent failure assumption, system failure can be computed quite
simply. The independent failure assumption states that all component failures of a
system are independent, i.e., the failure of one component does not cause another
component to be more or less likely to fail.
Given this assumption, one can determine:
1) Minimum failure time of a set of components
2) Maximum failure time of a set of components
3) Probability that k of N components have failed at a particular time t.
Slide 23
Maximum: the system fails when all N components have failed, so

F_S(t) = P[max(X_1, ..., X_N) ≤ t] = ∏_{i=1}^{N} F_Xi(t)
Slide 24
[Figure: Venn diagram with events A1, A2, A3]

This is an application of the law of total probability (LOTP).
Slide 25
Minimum, cont.

F_S(t) = P[X_1 ≤ t OR X_2 ≤ t OR ... OR X_N ≤ t]
       = 1 - P[X_1 > t AND X_2 > t AND ... AND X_N > t]            (by trick)
       = 1 - P[X_1 > t] P[X_2 > t] ... P[X_N > t]                  (by independence)
       = 1 - (1 - P[X_1 ≤ t])(1 - P[X_2 ≤ t]) ... (1 - P[X_N ≤ t]) (by LOTP)
       = 1 - ∏_{i=1}^{N} (1 - F_Xi(t))
Slide 26
k of N

Let X_1, ..., X_N be component failure times that have identical distributions (i.e.,
F_X1(t) = F_X2(t) = ...). The system fails at time S if k of the N components fail.

F_S(t) = Σ_{i=k}^{N} C(N, i) F_X(t)^i (1 - F_X(t))^(N-i)
Slide 27
k of N in General

For non-identical failure distributions, we must sum over all combinations of at
least k failures.

Let G_k be the set of all subsets of {X_1, ..., X_N} such that each element of G_k is a set
of size at least k, i.e.,

G_k = { g ⊆ {X_1, ..., X_N} : |g| ≥ k }.

The set G_k represents all the possible failure scenarios. Now F_S is given by

F_S(t) = Σ_{g ∈ G_k} ∏_{X ∈ g} F_X(t) ∏_{X ∉ g} (1 - F_X(t))
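The general k-of-N formula can be evaluated by enumerating subsets. The sketch below is illustrative (the function name and numeric CDF values are invented); for identical components it must agree with the binomial formula, which gives a convenient cross-check.

```python
from itertools import combinations
from math import comb

# General k-of-N failure probability at a fixed time t: sum over every
# subset g of at least k components of prod F_X over g times
# prod (1 - F_X) over the components not in g.
def k_of_n_general(F, k):
    n = len(F)
    total = 0.0
    for size in range(k, n + 1):
        for g in combinations(range(n), size):
            p = 1.0
            for i in range(n):
                # g is exactly the set of failed components.
                p *= F[i] if i in g else 1.0 - F[i]
            total += p
    return total

# With identical components this must match the binomial formula.
Fx, k, n = 0.2, 2, 4
binom = sum(comb(n, i) * Fx**i * (1 - Fx)**(n - i) for i in range(k, n + 1))
general = k_of_n_general([Fx] * n, k)
print(binom, general)
```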
Slide 28
Slide 29
Summary

A system comprises N components, where the component failure times are given
by the random variables X_1, ..., X_N. The system fails at time S with distribution
F_S if:

Condition: all N components fail (maximum)
  F_S(t) = ∏_{i=1}^{N} F_Xi(t)

Condition: one component fails (minimum)
  F_S(t) = 1 - ∏_{i=1}^{N} (1 - F_Xi(t))

Condition: k components fail, identical distributions
  F_S(t) = Σ_{i=k}^{N} C(N, i) F_X(t)^i (1 - F_X(t))^(N-i)

Condition: k components fail, general case
  F_S(t) = Σ_{g ∈ G_k} ∏_{X ∈ g} F_X(t) ∏_{X ∉ g} (1 - F_X(t))
Slide 30
Reliability Formalisms
There are several popular graphical formalisms for expressing system reliability. At
the core of their solvers are the methods we have just examined. In particular, we will
examine:
Reliability Block Diagrams
Fault Trees
Reliability Graphs
There is nothing particularly special about these formalisms except their popularity.
It is easy to implement these formalisms, or design your own, in a spreadsheet, for
example.
Slide 31
Series: the system fails if any component fails.
  [RBD: source, C1, C2, C3, sink connected in series]

Parallel: the system fails if all components fail.
  [RBD: source and sink connected by C1, C2, C3 in parallel]

k of N: the system fails if at least k of N components fail.
  [RBD: source and sink connected by C1, C2, C3 through a "2 of 3" node]
Slide 32
Example
A NASA satellite architecture under study is designed for high reliability. The
major computer system components include the CPU system, the high-speed
network for data collection and transmission, and the low-speed network for
engineering and control. The satellite fails if any of the major systems fail.
There are 3 computers, and the computer system fails if 2 or more of the computers
fail. Failure distribution of a computer is given by FC.
There is a redundant (2) high-speed network, and the high-speed network system
fails if both networks fail. The distribution of a high-speed network failure is given
by FH.
The low-speed network is arranged similarly, with a failure distribution of FL.
Slide 33
RBD Example

[RBD: source feeds three "computer" blocks through a 2-of-3 node, in series with two parallel HSN blocks and two parallel LSN blocks, into the sink]

F_S(t) = 1 - [1 - Σ_{i=2}^{3} C(3, i) F_C(t)^i (1 - F_C(t))^(3-i)] [1 - (F_H(t))²] [1 - (F_L(t))²]
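The satellite's failure probability at a fixed time t can be computed directly from the three subsystem CDF values. This sketch is illustrative (the function name and the numeric values for F_C, F_H, F_L are invented for the example).

```python
from math import comb

# Failure probability of the satellite RBD at a fixed time t.
def satellite_failure(FC, FH, FL):
    # 2-of-3 computer subsystem fails if at least 2 of 3 computers fail.
    F_comp = sum(comb(3, i) * FC**i * (1 - FC)**(3 - i) for i in range(2, 4))
    # Each redundant network pair fails only if both networks fail.
    F_hsn = FH**2
    F_lsn = FL**2
    # The satellite fails if any major subsystem fails (series combination).
    return 1 - (1 - F_comp) * (1 - F_hsn) * (1 - F_lsn)

print(satellite_failure(0.1, 0.05, 0.05))
```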
Slide 34
Fault Trees

AND gates: true if all of the inputs are true (fail).
  [AND gate over C1, C2, C3]
OR gates: true if any of the inputs are true (fail).
  [OR gate over C1, C2, C3]
k of N gates: true if at least k of the inputs are true (fail).
  [2-of-3 gate over C1, C2, C3]
Slide 35
[Fault tree for the satellite example: a top-level OR gate over a 2-of-3 gate on C1, C2, C3, an AND gate on H1, H2, and an AND gate on L1, L2]
Slide 36
Reliability Graphs

[Reliability graphs: arcs labeled with failure distributions connect nodes between a source and a sink; a series example with arcs F_C1 and F_C2, and a parallel example with arcs F_C1, F_C2, F_C3]
Slide 44
[Figure: a reliability graph with a bridge: arcs A and B leave the source, and a bridging arc connects the two internal nodes on the way to the sink]

How do we solve this?
Slide 45
Solving by Conditioning

Recall that P[E | F] = P[E ∩ F] / P[F].

If F and F̄ are complementary events, i.e., F ∩ F̄ = ∅ and F ∪ F̄ = Ω, then there is a trick:

P[E] = P[E ∩ F] + P[E ∩ F̄]
     = P[E | F] P[F] + P[E | F̄] P[F̄]
Slide 46
[Figure: the bridge reliability graph: arcs A and B leave the source, arcs D and E enter the sink, and arc C bridges the two internal nodes. Conditioning on C yields two simpler graphs.]

If C has failed, the remaining paths are A-D and B-E:

F_S|C fail(t) = P[S ≤ t | C ≤ t] = (1 - (1 - F_A(t))(1 - F_D(t)))(1 - (1 - F_B(t))(1 - F_E(t))),
and P[C ≤ t] = F_C(t).

If C is up, the graph reduces to (A parallel B) in series with (D parallel E):

F_S|C up(t) = P[S ≤ t | C > t] = 1 - (1 - F_A(t) F_B(t))(1 - F_D(t) F_E(t)),
and P[C > t] = 1 - P[C ≤ t] = 1 - F_C(t).

Thus, F_S(t) = F_S|C fail(t) F_C(t) + F_S|C up(t)(1 - F_C(t)).
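The conditioning solution can be cross-checked by brute-force enumeration over all 2^5 component states, assuming the usual bridge layout (arcs A and B from the source, D and E into the sink, C bridging); all numeric failure probabilities below are illustrative.

```python
from itertools import product

# Component failure probabilities at a fixed time t (illustrative numbers).
FA, FB, FC, FD, FE = 0.1, 0.2, 0.05, 0.1, 0.2

# Conditioning on the bridge component C.
F_given_C_failed = (1 - (1 - FA) * (1 - FD)) * (1 - (1 - FB) * (1 - FE))
F_given_C_up = 1 - (1 - FA * FB) * (1 - FD * FE)
F_cond = F_given_C_failed * FC + F_given_C_up * (1 - FC)

# Brute force: the system works iff source and sink are connected.
# The source-to-sink paths are A-D, B-E, A-C-E, and B-C-D.
def works(a, b, c, d, e):
    return (a and d) or (b and e) or (a and c and e) or (b and c and d)

F_brute = 0.0
probs = [FA, FB, FC, FD, FE]
for state in product([0, 1], repeat=5):  # 1 means the component is up
    p = 1.0
    for up, pf in zip(state, probs):
        p *= (1 - pf) if up else pf
    if not works(*state):
        F_brute += p

print(F_cond, F_brute)  # the two answers agree
```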
Slide 48
Slide 49
This condition simplifies computation because all that is needed for a solution is
the reliability of each component at time t; the solution then becomes a
straightforward computation.
Slide 50
Reliability/Availability Tables

A system comprises N components. The reliability of component i at time t is given by
R_Xi(t), and the availability of component i at time t is given by A_Xi(t).

Condition: system fails if all components fail
  R_S(t) = 1 - ∏_{i=1}^{N} (1 - R_Xi(t))
  A_S(t) = 1 - ∏_{i=1}^{N} (1 - A_Xi(t))

Condition: system fails if one component fails
  R_S(t) = ∏_{i=1}^{N} R_Xi(t)
  A_S(t) = ∏_{i=1}^{N} A_Xi(t)

Condition: system fails if at least k components fail, identical distributions
  R_S(t) = 1 - Σ_{i=k}^{N} C(N, i) (1 - R_X(t))^i R_X(t)^(N-i)
  A_S(t) = 1 - Σ_{i=k}^{N} C(N, i) (1 - A_X(t))^i A_X(t)^(N-i)

Condition: system fails if at least k components fail, general case
  R_S(t) = 1 - Σ_{g ∈ G_k} ∏_{X ∈ g} (1 - R_X(t)) ∏_{X ∉ g} R_X(t)
  A_S(t) = 1 - Σ_{g ∈ G_k} ∏_{X ∈ g} (1 - A_X(t)) ∏_{X ∉ g} A_X(t)
Slide 51
In all cases, numbers should be used with caution and adjusted based on
observation and experience.
Slide 52
Modeling Process
Slide 53
RBD_Process(system):
    Define the system
    Define proper service
    Create an RBD out of components
    for each component:
        if the component is simple:
            obtain reliability data for the component
        else:
            do RBD_Process(component)
    Compute the reliability of the system
    Do the results meet the specification? Modify the design and repeat as necessary.
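The recursive structure of this process can be sketched as a toy evaluator. The dictionary representation and all names below are invented for illustration; a real RBD tool is far richer.

```python
# Toy recursive RBD evaluator: a component is either "simple" (it carries a
# reliability value, i.e., "obtain reliability data") or composite (it carries
# its own RBD of subcomponents combined in series or parallel).
def reliability(component):
    if "value" in component:            # simple component
        return component["value"]
    parts = [reliability(c) for c in component["parts"]]  # recurse into RBD
    if component["kind"] == "series":   # fails if any part fails
        r = 1.0
        for p in parts:
            r *= p
        return r
    else:                               # parallel: fails only if all parts fail
        q = 1.0
        for p in parts:
            q *= 1.0 - p
        return 1.0 - q

# Illustrative system: one simple block in series with a parallel pair.
system = {"kind": "series", "parts": [
    {"value": 0.99},
    {"kind": "parallel", "parts": [{"value": 0.9}, {"value": 0.9}]},
]}
print(reliability(system))  # 0.99 * (1 - 0.1 * 0.1)
```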
Slide 54
Summary
Reliability: review of definition
Failure rate
System reliability
Independent failure assumption
Minimum, maximum, k of N
Reliability block diagrams, fault trees, reliability graphs
Reliability modeling process
Slide 55
Slide 56
Introduction
Stochastic activity networks, or SANs, are a convenient, graphical, high-level
language for describing system behavior. SANs are useful in capturing the
stochastic (or random) behavior of a system.
Examples:
The amount of time a program takes to execute can be computed precisely if
all factors are known, but this is nearly impossible and sometimes useless.
At a more abstract level, we can approximate the running time by a random
variable.
Fault arrivals almost always must be modeled by a random process.
We begin by describing a subset of SANs: stochastic Petri nets.
Slide 58
[Slide: Petri net symbols shown graphically: tokens, transitions, input arcs, output arcs]
Slide 59
Example: [SPN with places P1 and P2 and transition t1. Transition t1 is enabled.]
Slide 60
Example: [SPN with places P1-P4 and transition t1, shown before and after t1 fires]
Slide 61
If a transition t becomes enabled, and before t fires, some other transition fires
and changes the state of the SPN such that t is no longer enabled, then t aborts,
that is, t will not fire.
Since the exponential distribution is memoryless, one can say that transitions
that remain enabled continue or restart, as is convenient, without changing the
behavior of the network.
Slide 62
Notes on SPNs
SPNs are much easier to read, write, modify, and debug than Markov chains.
Most SPN formalisms include a special type of arc called an inhibitor arc, which
enables the associated transition only if there are zero tokens in the connected place,
and the identity (do-nothing) function. Example: modify the SPN to give writes priority.
SPNs are limited in their expressive power: they may only perform +, -, >, and
test-for-zero operations.
More general and flexible formalisms are needed to represent real systems.
Slide 65
Slide 66
SAN Symbols

Stochastic activity networks (hereafter SANs) have four new symbols in addition to
those of SPNs, shown graphically: input gates, output gates, cases, and
instantaneous activities.
Slide 67
SAN Terms
1. activation - the time at which an activity begins
2. completion - the time at which an activity completes
3. abort - the time, after activation but before completion, at which the activity is
no longer enabled
4. active - the period after an activity has been activated but before it completes or
aborts
Slide 73
[Figure: timelines of an activity's lifecycle, marking activation, the activity time, and completion: an activity may complete after its activity time elapses; it may be aborted if it becomes disabled after activation but before completion; or it may complete and be activated again while it remains enabled]
Slide 74
Completion Rules
When an activity completes, the following events take place (in the order listed),
possibly changing the marking of the network:
1. If the activity has cases, a case is (probabilistically) chosen.
2. The functions of all the connected input gates are executed (in an
unspecified order).
3. Tokens are removed from places connected by input arcs.
4. The functions of all the output gates connected to the chosen case are
executed (in an unspecified order).
5. Tokens are added to places connected by output arcs connected to the
chosen case.
Ordering is important, since the effects of the actions can be marking-dependent.
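The five completion steps can be sketched in code. The toy marking-and-activity representation below is invented for illustration and is not Möbius syntax; gates are plain functions on the marking.

```python
import random

# Toy sketch of the SAN completion rules acting on a marking (place -> tokens).
random.seed(2)

marking = {"P1": 1, "P2": 0, "P3": 0}

activity = {
    "cases": [(0.7, "P2"), (0.3, "P3")],  # (probability, output place)
    "input_arcs": ["P1"],
    "input_gates": [],    # gate functions on the marking (none here)
    "output_gates": [],
}

def complete(act, m):
    # 1. If the activity has cases, a case is probabilistically chosen.
    r, chosen = random.random(), act["cases"][-1][1]
    acc = 0.0
    for p, place in act["cases"]:
        acc += p
        if r < acc:
            chosen = place
            break
    # 2. Execute the functions of all connected input gates.
    for g in act["input_gates"]:
        g(m)
    # 3. Remove tokens from places connected by input arcs.
    for place in act["input_arcs"]:
        m[place] -= 1
    # 4. Execute the output gate functions of the chosen case.
    for g in act["output_gates"]:
        g(m)
    # 5. Add tokens to places on output arcs of the chosen case.
    m[chosen] += 1

complete(activity, marking)
print(marking)  # P1 loses its token; P2 or P3 gains one
```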
Slide 75
Exponential
Hyperexponential
Deterministic
Weibull
Conditional Weibull
Normal
Erlang
Gamma
Beta
Uniform
Binomial
Negative Binomial
Slide 80
Slide 81
Slide 82
Activity CPUfail1, case probabilities:
  case 1: 0.987
  case 2: 0.005
  case 3: 0.008

Gate Enabled1, definition:
  Predicate: MARK(CPUboards1) > 0 && MARK(NumComp) > 0
  Function:  MARK(CPUboards1);
Slide 83
Gate definitions:

Covered1, Function:
  if (MARK(CPUboards1) == 0)
      MARK(NumComp)--;

Uncovered1, Function:
  MARK(CPUboards1) = 0;
  MARK(NumComp)--;

Catastrophic1, Function:
  MARK(CPUboards1) = 0;
  MARK(NumComp) = 0;
Slide 84
Reward Variables
Reward variables are a way of measuring performance- or dependability-related
characteristics about a model.
Examples:
Expected time until service
System availability
Number of misrouted packets in an interval of time
Processor utilization
Length of downtime
Operational cost
Module or system reliability
Slide 85
Reward Structures
Reward may be accumulated in two different ways:
A model may be in a certain state or states for some period of time, for
example, CPU idle states. This is called a rate reward.
An activity may complete. This is called an impulse reward.
The reward variable is the sum of the rate reward and the impulse reward structures.
Slide 86
R(m) = 16N if m is a degraded-mode marking, and 0 otherwise
C(a) = K for the relevant activity completions, and 0 otherwise
By carefully integrating the reward structure from 0 to t, we get the profit at time t.
This is an example of an interval-of-time variable.
Slide 87
Reward Variables
A reward variable is the sum of the impulse and rate reward structures over a
certain time.
Let [t, t + l] be the interval of time defined for a reward variable:
If l is 0, then the reward variable is called an instant-of-time reward variable.
If l > 0, then the reward variable is called an interval-of-time reward
variable.
If l > 0, then dividing an interval-of-time reward variable by l gives a time-averaged interval-of-time reward variable.
Slide 88
[Figure: relationships among instant-of-time, interval-of-time [t, t + l], and time-averaged interval-of-time reward variables, connected by limits as t goes to infinity and as l goes to infinity]
Slide 89
Slide 92
Slide 93
Model Composition
A composed model is a way of connecting different SANs together to form a larger
model.
Model composition has two operations:
Replicate: Combine 2 or more identical SANs and reward structures
together, holding certain places common among the replicas.
Join: Combine 2 or more different SANs and reward structures together,
combining certain places to permit communication.
Slide 94
[Diagram: the Replicate operation: a submodel is replicated a certain number of times, with certain places held common to all replicas; certain places in different submodels can also be made common]
Slide 95
Rationale
There are many good reasons for using composed models.
Building highly reliable systems usually involves redundancy. The
replicate operation models redundancy in a natural way.
Systems are usually built in a modular way. Replicates and Joins are
usually good for connecting together similar and different modules.
Tools can take advantage of something called the Strong Lumping Theorem
that allows a tool to generate a Markov process with a smaller state space
(to be described in Session 7).
Slide 96
(Note initial marking of NumComp is two since there will be two computers
in the composed model.)
Slide 98
[Composed model: a Rep node "Rep1" over the "Node" submodel, with Reps = 2 and common place NumComp]
Slide 99
Composed Model
How does adding an additional computer affect reliability?
In the composed model, change the number of replications to 3 and change
various reward variables - easy. (Use a global variable if you suspect
you may want to do this.)
In the flat model, add another computer - hard.
In composed model, the number of states in the underlying Markov chain is much
smaller, especially for large numbers of replications. (Details will be given in
Session 7.)
Slide 102
Slide 103
Session Outline
Review of Markov process theory and fundamentals
Methods for constructing state-level models from SANs
Analytic/numerical solution techniques
Transient solution
Standard uniformization (instant-of-time variables)
Adaptive uniformization (instant-of-time variables)
Interval-of-time uniformization (interval-of-time variables)
Steady-state solution (steady-state instant-of-time variables)
Direct solution
Iterative solution
Slide 104
Weaknesses of Simulation
Slide 105
Slide 106
Slide 107
Slide 108
Slide 110
Slide 111
Slide 112
Slide 113
Classification of models by state and time:

- Continuous state, continuous time: analog signal
- Continuous state, discrete time: A-to-D converter
- Discrete state, continuous time: computer availability model
- Discrete state, discrete time: round-based network protocol model
Slide 114
Markov Process
A special type of random process that we will examine in detail is called the
Markov process. A Markov process can be informally defined as follows.
Given the state (value) of a Markov process X at time t (X(t)), the future
behavior of X can be described completely in terms of X(t).
Markov processes have the very useful property that their future behavior is
independent of past values.
Slide 115
Markov Chains
A Markov chain is a Markov process with a discrete state space.
We will always make the assumption that a Markov chain has a state space in
{1,2, . . .} and that it is time-homogeneous.
A Markov chain is time-homogeneous if its future behavior does not depend on
what time it is, only on the current state (i.e., the current value).
We make this concrete by looking at a discrete-time Markov chain (hereafter
DTMC). A DTMC X has the following property:
P[X(t + k) = j | X(t) = i, X(t - 1) = n_{t-1}, X(t - 2) = n_{t-2}, ..., X(0) = n_0]
  = P[X(t + k) = j | X(t) = i]    (1)
  = P_ij^(k)                      (2)
Slide 116
DTMCs
Notice that given i, j, and k, P_ij^(k) is a number!
P_ij^(k) can be interpreted as the probability that if X has value i, then after k time-steps, X has value j.
Slide 117
Slide 120
π_j(1) = P[X(1) = j]
       = Σ_{i=1}^{n} P[X(1) = j | X(0) = i] P[X(0) = i]
       = Σ_{i=1}^{n} π_i(0) P_ij
Slide 121
If P is the matrix

P = [ p_11 ... p_1n
      ...       ...
      p_n1 ... p_nn ]

then p_ij = P_ij, and π(1) = π(0)P, where π(0) and π(1) are row vectors, and π(0)P is a
vector-matrix multiplication.

The important consequence of this is that we can easily specify a DTMC in terms of
an occupancy probability vector π and a transition probability matrix P.
Slide 122
Slide 123
A Simple Example
Suppose the weather at Urbana-Champaign, Illinois, can be modeled the following
way:
If it's sunny today, there's a 60% chance of being sunny tomorrow, a
30% chance of being cloudy, and a 10% chance of being rainy.
If it's cloudy today, there's a 40% chance of being sunny tomorrow, a
45% chance of being cloudy, and a 15% chance of being rainy.
If it's rainy today, there's a 15% chance of being sunny tomorrow, a 60%
chance of being cloudy, and a 25% chance of being rainy.
If it's rainy on Friday, what is the forecast for Monday?
Slide 124
Taking state 1 = sunny, 2 = cloudy, 3 = rainy, and Friday as time 0:

π(0) = (0, 0, 1)

P = [ .60  .30  .10
      .40  .45  .15
      .15  .60  .25 ]
Slide 125
The weather on Saturday is

π(1) = π(0)P = (.15, .60, .25),

that is, 15% chance sunny, 60% chance cloudy, 25% chance rainy.
The weather on Sunday is π(2) = π(1)P, and on Monday it is π(3) = π(2)P.
Slide 126
Solution, cont.
Alternatively, we could compute P^3, since we found π(3) = π(0)P^3.
Working out solutions by hand can be tedious and error-prone, especially for
larger models (i.e., models with many states). Software packages are used
extensively for this sort of analysis.
Software packages compute π(k) as (. . . ((π(0)P)P)P . . .)P rather than computing P^k, since computing the latter results in a large fill-in.
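As a concrete illustration of this vector-matrix iteration, the weather example above can be solved in a few lines (a sketch; any numerical library would do just as well):

```python
# Transient solution of the weather DTMC by repeated
# vector-matrix multiplication: pi(k+1) = pi(k) P.
P = [[0.60, 0.30, 0.10],   # sunny  -> sunny/cloudy/rainy
     [0.40, 0.45, 0.15],   # cloudy -> ...
     [0.15, 0.60, 0.25]]   # rainy  -> ...

def step(pi, P):
    n = len(pi)
    return [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]

pi = [0.0, 0.0, 1.0]       # rainy on Friday (state 3)
for _ in range(3):         # Saturday, Sunday, Monday
    pi = step(pi, P)
print(pi)                  # close to (0.4316, 0.4200, 0.1484)
```

Note that P^k is never formed; only a length-n vector is updated at each step.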
Slide 127
Graphical Representation
It is frequently useful to represent the DTMC as a directed graph. Nodes represent
states, and edges are labeled with probabilities. For example, our weather
prediction model would look like this:
[Figure: a three-node directed graph with nodes 1 = Sunny Day, 2 = Cloudy Day, 3 = Rainy Day, and edges labeled with the transition probabilities from P (e.g., .6 from node 1 to itself, .3 from node 1 to node 2, .1 from node 1 to node 3, and so on).]
Slide 128
[Figure: a three-state DTMC for a simple computer, with X = 1 (computer idle), X = 2 (computer working), and X = 3 (computer failed). Edges are labeled with the probabilities P_idle, P_arr, P_busy, P_com, P_r, P_fi, P_fb, and P_ff, which together form the transition probability matrix P.]
Slide 129
There are various ways to compute this. The simplest is to calculate π(n) for increasingly large n; when π(n+1) ≈ π(n), we can believe that π(n) is a good approximation to the steady-state distribution. This can be rather inefficient if n needs to be large.
Slide 130
Classifications
It is much easier to solve for the steady-state behavior of some DTMCs than
others. To determine if a DTMC is easy to solve, we need to introduce some
definitions.
Definition: A state j is said to be accessible from state i if there exists an n ≥ 0 such that P_ij^(n) > 0. We write i → j.
Note: recall that P_ij^(n) = P[X(n) = j | X(0) = i].
If one thinks of accessibility in terms of the graphical representation, a state j is
accessible from state i if there exists a path of non-zero edges (arcs) from node i to
node j.
Slide 131
Slide 132
Periodicity
Consider the following DTMC:

[Figure: a two-state DTMC in which state 1 goes to state 2 with probability 1, and state 2 goes to state 1 with probability 1.]

With π(0) = (1, 0), does lim_{n→∞} π(n) exist? No!

However, lim_{n→∞} (1/n) Σ_{i=1}^n π(i) does exist; it is called the time-averaged steady-state distribution, and is denoted by π*.

Definition: A state i is said to be periodic with period d if P_ii^(n) > 0 only when n is some multiple of d. If d = 1, then i is said to be aperiodic.
A steady-state solution for an irreducible DTMC exists if all the states are aperiodic.
A time-averaged steady-state solution for an irreducible DTMC always exists.
Slide 133
A steady-state solution can be found by solving π = πP subject to Σ_{i=1}^n π_i = 1. If the DTMC is irreducible and aperiodic, it can be shown that this solution is unique. If the DTMC is periodic, then this solution yields π*.

One can understand the equation π = πP in two different ways:
1. In steady-state, the probability distribution satisfies π(n+1) = π(n)P, and by definition π(n+1) = π(n) in steady-state.
2. Flow equations.

Flow equations require some visualization. Imagine a DTMC graph, where the nodes are assigned the occupancy probability, or the probability that the DTMC has the value of the node.
Slide 134
Flow Equations
Let π_i P_ij be the probability mass that moves from state i to state j in one time-step. Since probability must be conserved, the probability mass entering a state must equal the probability mass leaving it:

Prob. mass in = Prob. mass out

Σ_{j=1}^n π_j P_ji = Σ_{j=1}^n π_i P_ij = π_i Σ_{j=1}^n P_ij = π_i,

which is exactly the ith component of π = πP.
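To make the balance view concrete, here is a small sketch that finds π for the weather chain by power iteration, stopping when π(n+1) ≈ π(n) (the matrix and tolerance are illustrative assumptions):

```python
# Steady-state distribution of a DTMC by power iteration:
# repeat pi <- pi P until the change falls below a tolerance.
P = [[0.60, 0.30, 0.10],
     [0.40, 0.45, 0.15],
     [0.15, 0.60, 0.25]]

def steady_state(P, tol=1e-12):
    n = len(P)
    pi = [1.0 / n] * n                     # any initial distribution works
    while True:
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(nxt[j] - pi[j]) for j in range(n)) < tol:
            return nxt
        pi = nxt

pi = steady_state(P)
# pi satisfies pi = pi P (the flow equations) and sums to 1
```

Because P is stochastic, each iterate automatically remains a probability vector; no renormalization is needed.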
Slide 135
P[X(t+τ) = j | X(t) = i, X(t-t_1) = k_1, X(t-t_2) = k_2, ..., X(t-t_n) = k_n]
    = P[X(t+τ) = j | X(t) = i]
    = p_ij(τ)

for all τ > 0, 0 < t_1 < t_2 < ... < t_n.
A CTMC is completely described by the initial probability distribution (0) and the
transition probability matrix P(t) = [pij(t)]. Then we can compute (t) = (0)P(t).
The problem is that pij(t) is generally very difficult to compute.
Slide 136
CTMC Properties
This definition of a CTMC is not very useful until we understand some of the
properties.
First, notice that pij() is independent of how long the CTMC has previously been in
state i, that is,
P[X(t+τ) = j | X(u) = i for u ∈ [0, t]]
    = P[X(t+τ) = j | X(t) = i]
    = p_ij(τ)
There is only one random variable that has this property: the exponential random
variable. This indicates that CTMCs have something to do with exponential
random variables. First, we examine the exponential r.v. in some detail.
Slide 137
The exponential random variable with rate λ has CDF

F_X(t) = { 0,            t ≤ 0
           1 - e^(-λt),  t > 0 }

and density

f_X(t) = (d/dt) F_X(t) = { 0,           t ≤ 0
                           λ e^(-λt),   t > 0 }.
The exponential random variable is the only random variable that is memoryless.
To see this, let X be an exponential random variable representing the time that an
event occurs (e.g., a fault arrival).
We will show that P[ X > t + s X > s ] = P[ X > t ] .
Slide 138
Memoryless Property
Proof of the memoryless property:
P[X > t+s | X > s] = P[X > t+s, X > s] / P[X > s]
                   = P[X > t+s] / P[X > s]
                   = (1 - F_X(t+s)) / (1 - F_X(s))
                   = e^(-λ(t+s)) / e^(-λs)
                   = e^(-λt) e^(-λs) / e^(-λs)
                   = e^(-λt)
                   = P[X > t]
Slide 139
Event Rate
The fact that the exponential random variable has the memoryless property
indicates that the rate at which events occur is constant, i.e., it does not change
over time.
Often, the event associated with a random variable X is a failure, so the event rate
is often called the failure rate or the hazard rate.
The event rate of X is defined as the probability that the event associated with X
occurs within the small interval [t, t + t], given that the event has not occurred by
time t, per the interval size t:
h(t) = lim_{Δt→0} P[t < X ≤ t+Δt | X > t] / Δt.
This can be thought of as looking at X at time t, observing that the event has not
occurred, and measuring the number of events (probability of the event) that occur
per unit of time at time t.
Slide 140
Observe that:

h(t) = lim_{Δt→0} [ (F_X(t+Δt) - F_X(t)) / Δt ] · 1/(1 - F_X(t))
     = f_X(t) / (1 - F_X(t))

in general. For the exponential random variable,

f_X(t) / (1 - F_X(t)) = λ e^(-λt) / (1 - (1 - e^(-λt))) = λ e^(-λt) / e^(-λt) = λ.

This is why we often say a random variable X is exponential with rate λ.
Slide 141
Slide 142
Let A and B be independent exponential random variables with rates λ and μ. Then

P[A < B] = ∫_0^∞ P[A < B | A = x] f_A(x) dx
         = ∫_0^∞ P[x < B] λ e^(-λx) dx
         = ∫_0^∞ (1 - P[B ≤ x]) λ e^(-λx) dx
         = ∫_0^∞ (1 - (1 - e^(-μx))) λ e^(-λx) dx
         = ∫_0^∞ e^(-μx) λ e^(-λx) dx
         = λ ∫_0^∞ e^(-(λ+μ)x) dx = λ / (λ + μ)
Slide 143
Imagine a random process X with state space S = {1, 2, 3} and X(0) = 1 (so P[X(0) = 1] = 1). X goes to state 2 (takes on a value of 2) after an exponentially distributed time with parameter λ. Independently, X goes to state 3 after an exponentially distributed time with parameter μ. These state transitions are like competing random variables.
We say that from state 1, X goes to state 2 with rate λ and to state 3 with rate μ.
X remains in state 1 for an exponentially distributed time with rate λ + μ. This is called the holding time in state 1. Thus, the expected holding time in state 1 is 1/(λ+μ). The probability that X goes to state 2 is λ/(λ+μ).
Slide 144
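A quick sanity check of this race between two exponential clocks (the rates λ = 2 and μ = 3 are illustrative values, not from the slides):

```python
import random

# Race between two independent exponential clocks with rates lam and mu.
# Empirically, state 2 should win with probability lam / (lam + mu),
# and the mean holding time should be 1 / (lam + mu).
random.seed(42)
lam, mu = 2.0, 3.0
wins2, total_hold = 0, 0.0
N = 200_000
for _ in range(N):
    t2 = random.expovariate(lam)   # time until the jump to state 2
    t3 = random.expovariate(mu)    # time until the jump to state 3
    total_hold += min(t2, t3)      # holding time = first clock to ring
    wins2 += (t2 < t3)

print(wins2 / N)        # close to lam/(lam+mu) = 0.4
print(total_hold / N)   # close to 1/(lam+mu) = 0.2
```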
Slide 145
State-Transition-Rate Matrix
A CTMC can be completely described by an initial distribution π(0) and a state-transition-rate matrix. A state-transition-rate matrix Q = [q_ij] is defined as follows:

q_ij = { rate of transitions from state i to state j,   i ≠ j
         -Σ_{k≠i} q_ik,                                 i = j }
Example: A computer is idle, working, or failed. When the computer is idle, jobs arrive with rate λ, and they are completed with rate μ. The computer fails with rate λ_w when it is working, and with rate λ_i when it is idle.
Slide 146
[Figure: the corresponding CTMC, with transitions 1 → 2 (rate λ), 2 → 1 (rate μ), 1 → 3 (rate λ_i), and 2 → 3 (rate λ_w).]

Let X = 1 represent the system is idle, X = 2 the system is working, and X = 3 a failure. Then

Q = [ -(λ + λ_i)    λ             λ_i
       μ            -(μ + λ_w)    λ_w
       0             0            0   ]

If the computer is repaired with rate ν, the new CTMC adds a transition 3 → 1 with rate ν, and

Q = [ -(λ + λ_i)    λ             λ_i
       μ            -(μ + λ_w)    λ_w
       ν             0            -ν  ]
Slide 147
Slide 148
Slide 149
[Figure: a portion of the generated state space, tabulated by the values of CPUboards1, CPUboards2, and NumComp in each state.]
[Figure: the corresponding state-transition lists, giving for each state its successor states and rates, e.g., (19, p1), (20, p2), (21, p3), (6, (p1+p2)), (7, p3).]
Slide 150
Slide 151
A state in the reduced base model is composed of a state tree and an impulse
reward.
During reduced base model construction, the use of state trees permits an
algorithm to automatically determine valid lumpings based on symmetries in the
composed model.
The reduced base model is constructed by finding all possible (state tree,
impulse reward) combinations and computing the transition rates between states.
Slide 152
Composed Model
Slide 153
[Figure: example state trees from the reduced base model. Each state is a tree whose root R records NumComp and whose computer leaves record CPUboards. State 1 has NumComp = 2 with both computers at CPUboards = 3; the covered, uncovered, and catastrophic cases of a failure lead to state 2 (NumComp = 1; CPUboards = 3 and 2), state 3 (CPUboards = 3 and 0), and state 4 (NumComp = 0; CPUboards = 3 and 0).]
Slide 154
Slide 155
Place comments, as
specified by edit
comments, in file.
State-space generation must be done before all analytic/numerical solutions are done.
Slide 156
Slide 157
The transient behavior of a CTMC is governed by

(d/dt) P(t) = Q P(t) = P(t) Q,

where Q is the state-transition-rate matrix of the Markov chain.
Solving this differential equation in some form is difficult, but necessary to compute a transient solution.
Slide 158
Matrix exponentiation: P(t) = e^(Qt), where

e^(Qt) = I + Σ_{n=1}^∞ (Qt)^n / n!

Matrix exponentiation has some potential, but directly computing e^(Qt) by summing this series can be expensive and prone to numerical instability.*
* See C. Moler and C. Van Loan, Nineteen Dubious Ways to Compute the Exponential of a Matrix, SIAM
Review, vol. 20, no. 4, pp. 801-836, October 1978.
Slide 159
Standard Uniformization
Starting with the CTMC state-transition-rate matrix Q, construct:
1. A Poisson process with rate Λ ≥ max_i |q_ii|
2. A DTMC with P = I + Q/Λ

Then:

π(t) = π(0) Σ_{k=0}^∞ ((Λt)^k / k!) e^(-Λt) P^k,

where ((Λt)^k / k!) e^(-Λt) is the probability of k transitions of the Poisson process in time t.

In actual computation:

π(t) ≈ Σ_{k=0}^{N_s} ((Λt)^k / k!) e^(-Λt) π(k),   with π(k+1) = π(k) P.
Slide 160
The truncation point N_s is chosen large enough that Σ_{k=0}^{N_s} ((Λt)^k / k!) e^(-Λt) is sufficiently close to 1.
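A minimal sketch of standard uniformization for a small three-state CTMC like the idle/working/failed example (the numeric rates and the truncation threshold are illustrative assumptions):

```python
import math

# Standard uniformization: pi(t) = sum_k Poisson(k; Lam*t) * pi(k),
# where pi(k+1) = pi(k) P and P = I + Q/Lam.
Q = [[-3.0,  2.0,  1.0],
     [ 4.0, -4.5,  0.5],
     [ 1.0,  0.0, -1.0]]

def transient(Q, pi0, t, eps=1e-10):
    n = len(Q)
    Lam = max(abs(Q[i][i]) for i in range(n))         # uniformization rate
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / Lam
          for j in range(n)] for i in range(n)]
    pik = pi0[:]                                      # pi(k) for k = 0
    result = [0.0] * n
    k, weight, total = 0, math.exp(-Lam * t), 0.0     # Poisson weight for k = 0
    while total < 1.0 - eps:                          # stop when mass ~ 1
        for j in range(n):
            result[j] += weight * pik[j]
        total += weight
        k += 1
        weight *= Lam * t / k                         # next Poisson weight
        pik = [sum(pik[i] * P[i][j] for i in range(n)) for j in range(n)]
    return result

pi_t = transient(Q, [1.0, 0.0, 0.0], t=0.5)
# pi_t sums to (nearly) 1 and is the state distribution at time t
```

All operations are numerically benign: every quantity involved is a probability, which is the main practical advantage over summing the e^(Qt) series directly.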
Slide 161
Slide 163
(solves for expected values of interval-of-time and time-averaged intervalof-time variables on intervals [t0, t1] when both t0 and t1 are finite)
Number of digits of
accuracy in the
solution. Solution
reported is a lower
bound.
Volume of
intermediate results
reported. 1 gives the
greatest volume, greater
numbers less.
Series of time
intervals for which
solution is desired.
Intervals are
separated by spaces.
Each interval can be
specified as t1:t2.
Slide 167
subject to Σ_{i=1}^n π_i = 1.
Slide 170
Cons: Solution complexity is O(n³), so it does not scale well to large models; memory requirements are high due to fill-in and are not known a priori.
Recommendation: Use for small CTMCs (tens of states) or medium-sized and stiff
CTMCs (hundreds to a few thousands), or when high accuracy is required.
Reminder: High accuracy in solution does not mean high accuracy in prediction.
Use accuracy to do relative comparisons.
Slide 171
Slide 172
Slide 174
Slide 175
Gauss-Seidel
One of the most widely used stationary iterative methods is called Gauss-Seidel.
The algorithm, applied to the steady-state equations πQ = 0, appears as follows:

for k = 1 to convergence
    for i = 1 to n
        π_i^(k+1) = -(1/q_ii) ( Σ_{j=1}^{i-1} π_j^(k+1) q_ji + Σ_{j=i+1}^{n} π_j^(k) q_ji )
    end for
end for

An intuitive explanation for this algorithm: rearranged, each update solves the ith balance equation exactly, using the newest available values of the other components:

-q_ii π_i^(k+1) = Σ_{j=1}^{i-1} π_j^(k+1) q_ji + Σ_{j=i+1}^{n} π_j^(k) q_ji
Slide 178
SOR
There is an extension to Gauss-Seidel called successive over-relaxation, or SOR,
that sometimes gives better performance.
Let Δx_i = x_i^(k+1) - x_i^(k), where x^(k) and x^(k+1) are the kth and (k+1)th Gauss-Seidel iterates. The (k+1)th SOR iterate, x̃^(k+1), is computed as

x̃_i^(k+1) = x_i^(k) + ω Δx_i,

where 0 < ω < 2.
Choosing ω is a hard problem in general. Automatic techniques for choosing ω exist, but are not implemented in Möbius.
Note: ω = 1 is the same as Gauss-Seidel.
Recommendation: Leave ω = 1 unless you are solving a similar system many times and the matrix is stiff.
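A sketch of Gauss-Seidel with an optional SOR weight ω for the steady-state equations πQ = 0 (ω = 1 reduces to plain Gauss-Seidel; the generator matrix, tolerance, and per-sweep renormalization are illustrative assumptions):

```python
# Gauss-Seidel / SOR for pi Q = 0 with normalization sum(pi) = 1.
Q = [[-3.0,  2.0,  1.0],
     [ 4.0, -4.5,  0.5],
     [ 1.0,  0.0, -1.0]]

def sor_steady_state(Q, omega=1.0, tol=1e-12, max_iter=10_000):
    n = len(Q)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        old = pi[:]
        for i in range(n):
            # Solve the ith balance equation for pi_i using the
            # newest available values of the other components.
            s = sum(pi[j] * Q[j][i] for j in range(n) if j != i)
            gs = -s / Q[i][i]                    # Gauss-Seidel value
            pi[i] = pi[i] + omega * (gs - pi[i]) # SOR relaxation step
        total = sum(pi)
        pi = [x / total for x in pi]             # renormalize each sweep
        if max(abs(pi[i] - old[i]) for i in range(n)) < tol:
            break
    return pi

pi = sor_steady_state(Q)
# pi approximately satisfies pi Q = 0 and sums to 1
```

Only the nonzero entries of each column of Q are touched per update, which is why this method needs no fill-in, in contrast to direct solution.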
Slide 179
Stopping criterion,
expressed as 10-x, where x is
given. The criterion used is
the infinity difference norm.
SOR weight factor.
Values < 1 guarantee
convergence, but slow it.
Values >= 1 speed
convergence, but may not
converge.
Maximum number of
iterations allowed.
Slide 180
[Table: for each model class — all activities exponential; exponential and deterministic activities — the supported steady-state variables (instant-of-time^b, time-averaged interval-of-time^a), the supported statistics (mean, variance, and distribution), and the applicable analytic solvers (dss and iss; ars; diss and adiss).]

a. If only rate rewards are used, the time-averaged interval-of-time steady-state measure is identical to the instant-of-time steady-state measure (if both exist).
b. Provided the instant-of-time steady-state distribution is well-defined. Otherwise, the time-averaged interval-of-time steady-state variable is computed, and only results for rate rewards should be derived.
Slide 183
Slide 184
Problem Origin
Slide 186
Problem Description
Slide 187
[Figure: system block diagram. Each of three computers contains three memory modules (each with 41 RAM chips and 2 interface chips), three CPU modules (each with 6 CPU chips), two I/O ports (each with 6 I/O chips), error handlers, and a 2-channel interface bus.]
Slide 188
Definition of Operational
Slide 189
Coverage
This system could be modeled using combinatorial methods if we did not take coverage into account. Coverage is the probability that the failure of a chip will not cause the larger system to fail when sufficient redundancy exists; that is, coverage is the probability that the fault is contained.
The coverage probabilities are given in the following table:

[Table: coverage probability for each redundant component — RAM Chip, Memory Module, CPU Unit, I/O Port, Computer.]
For example, if a RAM chip fails, there is a 0.2% chance the memory module
will fail even if sufficient redundancy exists. If the memory module fails, there
is a 5% chance the computer will fail. If a computer fails, there is a 5% chance
the system will fail.
Slide 190
Each SAN models the behavior of the module in the event of a module
component failure.
Slide 191
List of Places
Slide 192
List of Activities
Slide 193
Slide 194
Composed Model
[Figure: the composed model, built from Join and Rep nodes (Join1; Rep1 with Reps = 3; Rep2), sharing the common places computer_failed, memory_failed, cpus, errorhandlers, and ioports.]
Slide 195
cpu_modules SAN

Place            Marking
cpus             3
ioports          2
errorhandlers    2
memory_failed    0
computer_failed  0
Slide 196
Enabling Predicate
(MARK(cpus) > 1) &&
(MARK(memory_failed) < 2) &&
(MARK(computer_failed) < 2)
Function
identity
Distribution
expon(0.0052596 * MARK(cpus))
Slide 197
Case probabilities of activity module_cpu_failure:

Case 1:
    if (MARK(cpus) == 3)
        return(0.995);
    else
        return(0.0);

Case 2:
    if (MARK(cpus) == 3)
        return(0.00475);
    else
        return(0.95);

Case 3:
    if (MARK(cpus) == 3)
        return(0.00025);
    else
        return(0.05);
Slide 198
Output gate functions (OG1-OG3, one per case):

OG1:
    if (MARK(cpus) == 3)
        MARK(cpus)--;

OG2:
    MARK(cpus) = 0;
    MARK(ioports) = 0;
    MARK(errorhandlers) = 0;
    MARK(memory_failed) = 2;
    MARK(computer_failed)++;

OG3:
    MARK(cpus) = 0;
    MARK(ioports) = 0;
    MARK(errorhandlers) = 0;
    MARK(memory_failed) = 2;
    MARK(computer_failed) = 2;
Slide 199
Model Solution
For the modeled two-computer system with non-perfect coverage at all levels (i.e., the model as described), the state space contains 10,114 states. The 10-year mission reliability was computed to be 0.995579.
Slide 213
Impact of Coverage
Coverage can have a large impact on reliability and state-space size. Various coverage schemes were evaluated with the following results:

Design description                                  State-space size   Reliability (10-year mission time)
100% coverage at all levels                         4278               0.999539
Nonperfect coverage considered at all levels        10114              0.995579
Nonperfect coverage at all levels,
  no spare memory module                            1335               0.987646
Nonperfect coverage at all levels,
  no spare CPU module                               3299               0.973325
Nonperfect coverage at all levels,
  no spare IO port                                  3299               0.985419
Nonperfect coverage at all levels, no spare
  memory module, CPU module, or IO port             511                0.935152
100% coverage at all levels, no spare memory
  module, CPU module, IO port, or RAM chips                            0.702240

Slide 214
Solution by Simulation
Slide 216
Motivation
High-level formalisms (like SANs) make it easy to specify realistic systems, but
they also make it easy to specify systems that have unreasonably large state
spaces.
State-of-the-art tools (like Möbius) can handle state-level models with a few tens of millions of states, but not more.
When state spaces become too large, discrete event simulation is often a viable
alternative.
Discrete-event simulation can be used to solve models with arbitrarily large state
spaces, as long as the desired measure is not based on a rare event.
When rare events are present, variance reduction techniques can sometimes be
used.
Slide 218
Advantages of Simulation
Simulation can be applied to any SAN model. The most
prominent difference, compared with analytic solvers, is that
generally distributed activities can be used.
Simulation does not require the generation of a state space and
therefore does not require a finite state space. Therefore, much
more detailed models can be solved.
Slide 219
Disadvantages of Simulation
Simulation only provides an estimate of the desired measure. An
approximate confidence interval is constructed that contains the actual
result with some user-specified probability.
Higher desired accuracy dramatically increases the necessary simulation
time. As a rule, to make the confidence interval n times narrower, the
simulation has to be run n2 times as long.
The rare event problem may arise. If simulation is used to estimate a
small probability, such as the reliability of a highly-reliable system,
extremely long simulations may have to be performed to encounter the
particular event often enough.
Complicated models can require long simulation times, even if the rare event
problem is not an issue. The simulators in Möbius perform the
necessary event scheduling very efficiently, but it should be realized that
simulation is not a panacea.
Slide 220
Slide 221
Types of Simulation
Continuous-state simulation is applicable to systems where the notion of state is
continuous and typically involves solving (numerically) systems of differential
equations. Circuit-level simulators are an example of continuous-state simulation.
Discrete-event simulation is applicable to systems in which the state of the system
changes at discrete instants of time, with a finite number of changes occurring in
any finite interval of time.
Since we will focus on validating end-to-end systems, rather than circuits, we will
focus on discrete-event simulation.
There are two types of discrete-event simulation execution algorithms:
Fixed-time-stamp advance
Variable-time-stamp advance
2005 William H. Sanders. All rights reserved. Do not duplicate without permission of the author.
Slide 222
[Figure: fixed-time-stamp advance. Events e1-e6 occur along a time axis divided into fixed steps t, 2t, 3t, 4t, 5t; the simulation clock advances one step at a time regardless of how many events fall in each step.]
Good for all models where most events happen at fixed increments of time (e.g.,
gate-level simulations).
Has the advantage that no future event list needs to be maintained.
Can be inefficient if events occur in a bursty manner, relative to time-step used.
Slide 223
Slide 224
Slide 225
Slide 226
Slide 227
Slide 228
Slide 229
LCGs have been studied extensively; good choices of a, b, and m are known.
See, e.g., Law and Kelton (1991), Jain (1991).
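For illustration, here is the standard LCG recurrence x_(n+1) = (a·x_n + b) mod m with one well-studied parameter set, the "minimal standard" generator a = 16807, b = 0, m = 2^31 - 1 (an illustrative choice from the literature, not necessarily what Möbius uses):

```python
# Lehmer / "minimal standard" LCG: x_{n+1} = (a * x_n) mod m.
A, M = 16807, 2**31 - 1

def lcg(seed):
    x = seed
    while True:
        x = (A * x) % M
        yield x / M          # uniform (0,1) variate

gen = lcg(seed=1)
us = [next(gen) for _ in range(5)]
# every u lies strictly between 0 and 1; with seed 1, the first
# raw state is 16807
```

The full period here is m - 1, which is far too short for a modern simulator consuming many variates per state change; longer-period generators are preferred in practice.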
Slide 230
Tausworthe Generators
As with LCGs, analysis has been done to determine good choices of the ci.
Slide 231
Suppose you have a uniform [0,1] random variable, and you wish to have a
random variable X with CDF FX. How do we do this?
All other random variates can be generated from uniform [0,1] random variates.
Slide 232
Slide 233
For the exponential distribution,

F_X(a) = 1 - e^(-λa),

so

X = F_X^(-1)(U) = -(1/λ) ln(1 - U).
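A sketch of the inverse-transform method for exponential variates (the rate λ = 2 is an illustrative value):

```python
import math
import random

# Inverse-transform method: if U ~ uniform(0,1), then
# X = -ln(1 - U) / lam is exponential with rate lam.
def exp_variate(lam, u):
    return -math.log(1.0 - u) / lam

random.seed(7)
lam = 2.0
xs = [exp_variate(lam, random.random()) for _ in range(100_000)]
mean = sum(xs) / len(xs)
# the sample mean is close to 1/lam = 0.5
```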
Slide 234
Convolution Technique
Technique can be used for all random variables X that can be expressed as the
sum of n random variables
X = Y1 + Y2 + Y3 + . . . + Yn
Slide 235
Composition Technique
Technique can be used when the distribution of a desired random variable can be
expressed as a weighted sum of other distributions.
F(x) = Σ_{i=0}^∞ p_i F_i(x),   where p_i ≥ 0 and Σ_{i=0}^∞ p_i = 1.
A variant of composition can also be used if the density function of the desired
random variable can be expressed as weighted sum of other density functions.
Slide 236
Acceptance-Rejection Technique
Indirect method for generating random variates that should be used when other methods fail or are inefficient.
Must find a function m(x) that majorizes the density function f(x) of the desired distribution; m(x) majorizes f(x) if m(x) ≥ f(x) for all x.
Note: m(x) is not itself a density function, but

m(x)/c,   where c = ∫ m(x) dx,

is a density function.
If random variates with density m(x)/c can be easily generated, then random variates for f(x) can be found as follows:
1) Generate y with density m(x)/c
2) Generate u with uniform [0,1] distribution
3) If u ≤ f(y)/m(y), return y; else go to 1.
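A sketch of acceptance-rejection for the density f(x) = 2x on [0,1], majorized by the constant m(x) = 2 (both choices are illustrative assumptions):

```python
import random

# Acceptance-rejection: f(x) = 2x on [0,1], majorized by m(x) = 2.
# c = integral of m = 2, so m(x)/c is the uniform density on [0,1].
def triangular_variate():
    while True:
        y = random.random()          # y has density m(x)/c = 1 on [0,1]
        u = random.random()
        if u <= (2.0 * y) / 2.0:     # accept with probability f(y)/m(y)
            return y

random.seed(3)
xs = [triangular_variate() for _ in range(100_000)]
mean = sum(xs) / len(xs)
# E[X] = integral of x * 2x dx over [0,1] = 2/3
```

The expected number of trials per accepted variate is c = 2, so a tighter majorizing function would make the method more efficient.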
Slide 237
Useful for generating any discrete distribution, e.g., case probabilities in a SAN. More efficient algorithms exist for special cases; we will review the most general case.
Suppose a random variable has probability distribution p(0), p(1), p(2), ... on the non-negative integers. Then a random variate for this random variable can be generated using the inverse transform method:
1) Generate u with uniform [0,1] distribution
2) Return the j satisfying

Σ_{i=0}^{j-1} p(i) < u ≤ Σ_{i=0}^{j} p(i).
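A sketch of this discrete inverse-transform search, applied to the SAN case probabilities from the earlier example:

```python
import random

# Discrete inverse transform: return the smallest j with
# p(0) + ... + p(j) >= u.
def discrete_variate(probs, u):
    cum = 0.0
    for j, p in enumerate(probs):
        cum += p
        if u <= cum:
            return j
    return len(probs) - 1            # guard against floating-point round-off

random.seed(11)
cases = [0.995, 0.00475, 0.00025]    # case probabilities from the example
counts = [0, 0, 0]
for _ in range(100_000):
    counts[discrete_variate(cases, random.random())] += 1
# counts[0] / 100_000 is close to 0.995
```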
Slide 238
Make sure the uniform [0,1] generator that is used has a long enough period.
Modern simulators can consume random variates very quickly (multiple per
state change!).
Use separate random number streams for different activities in a model system.
Regular division of a single stream can cause unwanted correlation.
Consider multiple random-variate generation techniques when generating non-uniform random variates; different techniques have very different efficiencies.
Slide 239
A common mistake is to run the basic simulation loop a single time and presume the observations generated are the answer.
Slide 240
Can be:
Instant-of-time, at a fixed t, or in steady-state
Interval-of-time, for fixed interval, or in steady-state
Time-averaged interval-of-time, for fixed interval, or in steady-state
Estimators on these measures include:
Mean
Variance
Interval - Probability that the measure lies in some interval [x,y]
Don't confuse this with an interval-of-time measure.
Can be used to estimate density and distribution function.
Percentile - The 100γth percentile is the smallest value of the estimator x such that F(x) ≥ γ.
Slide 241
Slide 242
Slide 243
(The variance of the sample mean is σ²/N, where σ² = Var[X].)

The sample variance is

s² = (1/(N-1)) Σ_{n=1}^N (x_n - x̄)² = (1/(N-1)) Σ_{n=1}^N x_n² - (N/(N-1)) x̄².
Slide 244
A (1 - α) confidence interval for the mean is

[ x̄ - t_{N-1}(1 - α/2) s / √N ,  x̄ + t_{N-1}(1 - α/2) s / √N ],

where t_{N-1}(1 - α/2) is the 100(1 - α/2)th percentile of the Student's t distribution with N - 1 degrees of freedom.
The interpretation of the equation is that with (1 - α) probability the real value (μ) lies within the given interval.
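A sketch of this confidence-interval computation (the data and confidence level are illustrative; the t percentile for N = 30 at 95% confidence is taken as the tabulated value ≈ 2.045):

```python
import math
import random

# (1 - alpha) confidence interval for the mean of N observations:
# xbar +/- t_{N-1}(1 - alpha/2) * s / sqrt(N).
random.seed(5)
xs = [random.expovariate(1.0) for _ in range(30)]   # toy sample, true mean 1.0
N = len(xs)
xbar = sum(xs) / N
s2 = sum((x - xbar) ** 2 for x in xs) / (N - 1)     # sample variance
t = 2.045                                           # t_{29}(0.975), tabulated
half = t * math.sqrt(s2 / N)
lo, hi = xbar - half, xbar + half
# with ~95% probability the true mean lies in [lo, hi]
```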
Slide 245
Define

s_i² = (1/(N-2)) Σ_{n≠i} (x_n - x̄_i)²,

where

x̄_i = (1/(N-1)) Σ_{n≠i} x_n,

i.e., the sample variance and sample mean computed with the ith observation removed.
Slide 246
Now define

Z_i = N s² - (N-1) s_i²,  for i = 1, 2, ..., N,   and   Z̄ = (1/N) Σ_{i=1}^N Z_i

(where s² is the sample variance as defined for the mean), and

s_Z² = (1/(N-1)) Σ_{i=1}^N (Z_i - Z̄)².

Then

[ Z̄ - t_{N-1}(1 - α/2) s_Z / √N ,  Z̄ + t_{N-1}(1 - α/2) s_Z / √N ]

is a (1 - α) confidence interval about σ².
Slide 247
Such estimators are very important, since the mean and variance alone are not enough to plan from when simulating a single system.
Slide 248
Slide 249
Slide 250
[Figure: observations collected along n independent trajectories; trajectory i yields observations O_i1, O_i2, ..., O_iM_i.]

Compute

x_i = ( Σ_{j=1}^{M_i} O_ij ) / M_i

as the ith observation, where M_i is the number of observations in trajectory i.

The x_i are considered to be independent, and confidence intervals are generated.

Useful for a wide range of models/measures (the system need not be ergodic), but slower than other methods, since the transient phase must be repeated multiple times.
Slide 251
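A minimal sketch of the replication estimator (the trajectory data are hypothetical):

```python
import statistics

# Hypothetical observations from three independent simulation trajectories;
# each trajectory restarts from the initial state, so its transient phase
# is re-simulated every time.
trajectories = [
    [0.82, 0.85, 0.80, 0.84],        # trajectory 1, M_1 = 4 observations
    [0.79, 0.83, 0.81],              # trajectory 2, M_2 = 3
    [0.86, 0.84, 0.85, 0.83, 0.82],  # trajectory 3, M_3 = 5
]

# One observation per trajectory: x_i = (1/M_i) * sum_j O_ij
x = [statistics.mean(t) for t in trajectories]

# The x_i are treated as i.i.d., so the CI machinery for the mean applies.
point_estimate = statistics.mean(x)
```

Each trajectory collapses to a single independent observation, which is what justifies applying the t-interval from the earlier slides to the $x_i$.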
Slide 252
Slide 253
Slide 254
Slide 255
Slide 256
Simulator Editor
Slide 261
Batch means simulator output:

  Variable Name      : utilization
  Batch Number       : 10
  Simulation Time    : 1.100000e+04
  Time (CPU seconds) : 41
  Batch Mean         : 8.467695e-01
  Mean               : 8.447065e-01 +/- 1.516121e-03
  Variance           : 4.417886e-02 +/- 5.035103e-04

Replication simulator output:

  Variable Name      : utilization
  Replication Number : 2400
  Simulation Time    : 1.000000e+02
  Time (CPU seconds) : 1498
  Current Value      : 1.000000e+00
  Mean               : 8.466667e-01 +/- 8.196275e-03
  Sample Variance    : 4.196934e-02
  Variance           : 4.196934e-02 +/- 2.588252e-03
Slide 262
Simulation Characteristics

  Variable                               | Steady-state or Transient | Instant-of-time or Interval-of-time   | Mean, Variance, or Distribution  | Applicable Simulator
  Reward variable and activity variable  | Transient                 | Instant-of-time and interval-of-time  | Mean, variance, and distribution | tsim and itsim
  Reward variable and activity variable  | Steady-state              | Instant-of-time                       | Mean, variance, and distribution | ssim
Slide 263
Slide 265
Motivation

State-space (SS) explosion, or largeness, problem in discrete-state systems:
  Costly generation and representation of the SS (space and time)
  Costly representation of the CTMC (space)
  Costly representation of the solution vector (space) and costly iteration/solution time (time)

Typical solutions:
  Largeness avoidance, e.g., using lumping techniques at the CTMC level or at the model level
  Largeness tolerance using BDDs, MDDs, MTBDDs, Kronecker representations, or matrix diagrams (MDs)
Slide 266
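The explosion is easy to quantify: n independent components with k local states each yield $k^n$ global states, and a dense generator matrix over that space needs $(k^n)^2$ entries. A toy calculation (the numbers are illustrative only):

```python
# n components with k local states each -> k**n global states; a dense
# CTMC generator matrix over that space has (k**n)**2 entries.
def global_state_count(k, n):
    return k ** n

states = global_state_count(4, 10)   # 10 components, 4 local states each
dense_matrix_entries = states ** 2   # why dense representations are hopeless
```

Over a million states from ten small components is why symbolic data structures (MDDs, matrix diagrams) and lumping matter.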
What Is New?
Slide 267
[Figure: composed models sharing state. Left: submodels M1 and M2 each hold state variable SV1 and are combined with a Join node, which merges the shared variable. Right: a Rep (3) node creates three replicas of submodel M1.]
Slide 268
Introduction to MDD

[Figure: a multi-valued decision diagram (MDD) representing a function; at each level, a node's outgoing edges are labeled with the possible values of that level's variable (0, 1, 2, ...), and shared sub-DAGs keep the representation compact.]
Slide 269
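To make the idea concrete, here is a minimal MDD sketch in Python. This illustrates the data structure only (it is not the Möbius implementation), and the encoded set is hypothetical:

```python
# A node is a tuple of children indexed by the variable's value at that
# level; the terminals True/False record whether a path is in the set.
T, F = True, False

# Encode the state set {(x1, x2) : x1 + x2 <= 1}, x1 in {0,1,2}, x2 in {0,1}.
# On larger state spaces, many paths share sub-DAGs, which is what keeps
# MDDs compact.
x2_le_1 = (T, T)              # after x1 = 0, any x2 works
x2_eq_0 = (T, F)              # after x1 = 1, only x2 = 0 works
root = (x2_le_1, x2_eq_0, F)  # children indexed by x1 = 0, 1, 2

def contains(node, state):
    """Walk the MDD from the root, following each variable's value."""
    for value in state:
        if node is T or node is F:
            break
        node = node[value]
    return node is T
```

Membership queries cost one edge traversal per level, independent of how many states the MDD encodes.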
[Figure: MDD level assignment for a composed model Rep2 (N) over a Join of Rep1 (M), with submodels cpu, error handler, IO port, and memory. The shared variable mem is assigned, e.g., level 2+M in the inner replicate and level 3 in the outer replicate.]
Slide 272
Algorithm Overview
Slide 273
Slide 274
Slide 275
Slide 276
Lumping

[Figure: a Rep node replicating three identical atomic submodels (AM); replica symmetry makes the corresponding states lumpable.]
Slide 277
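The effect of replica symmetry can be sketched numerically (the numbers are illustrative, not from the slides): global states that are permutations of one another behave identically and can be lumped, with a canonical representative, e.g., the sorted tuple, naming each equivalence class:

```python
from itertools import product

k, n = 3, 3                                # 3 local states, Rep (3) replicas
full = list(product(range(k), repeat=n))   # all k**n = 27 global states
lumped = {tuple(sorted(s)) for s in full}  # one canonical state per class
```

Here 27 states collapse to 10 classes; in general $k^n$ states collapse to $\binom{k+n-1}{n}$ classes, a reduction that grows rapidly with the number of replicas.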
Lumping

[Figure fragment: equivalence classes are identified by canonical states v with min(v) = v; the resulting structures may become huge, and therefore the MDDs are broken up.]
Slide 281
Slide 282
Slide 283
Slide 284
Slide 285
Fairly fast iteration: less than 6 times slower than lumped sparse matrix
Solving larger CTMCs
Slide 286
Slide 287
Slide 288
[Figure: spectrum of attackers, ordered from HIGH to LOW in innovation, planning, stealth, and coordination.]
  Nation-states, terrorists, multinationals: economic intelligence, information terrorism, military spying, disciplined strategic cyber attack
  Serious hackers: selling secrets, civil disobedience, harassment, embarrassing organizations, stealing credit cards, collecting trophies
  Script kiddies: copy-cat attacks, curiosity, thrill-seeking
Slide 289
[Figure: layered defense technologies: trusted computing base, boundary controllers, firewalls, intrusion detection systems, VPNs, PKI, and a hardened operating system to resist and detect attacks; intrusion tolerance and graceful degradation to tolerate attacks.]
Slide 290
Slide 291
CONTEXT: Create robust software and hardware that are fault-tolerant, attack-resilient, and easily adaptable to changes in functionality and performance over time.
GOAL: Create an underlying scientific foundation,
methodologies, and tools that will:
Enable clear and concise specifications,
Quantify the effectiveness of novel solutions,
Test and evaluate systems in an objective manner, and
Predict system assurance with confidence.
Slide 292
Slide 293
Slide 294
JBI Design Overview

[Figure: the JBI Core comprises four quadrants (Quad 1-4), each layered into an Executive Zone, an Operations Zone, and a Crumple Zone; clients reach the core through a network Access Proxy. Protection domains (Domain1-Domain5) provide isolation among selected functions on individual core hosts and on clients. Proxy logic inspects, forwards, and rate-limits traffic, e.g., sensor reports to the DC and PSQImpl exchanges over IIOP, RMI, Eascii, UDP, TCP, and STCP.]
Slide 295
Slide 296
Slide 297
Requirement Decomposition

[Figure: models for the Access Proxy, the Client, and the PSQ Server (assumptions AA1-AA3 and AP1-AP2; component models M1-M6) are composed over network domains L1-L3 (ADF) into a functional model of the system (probabilistic or logical), which rests on assumptions and on supporting logical arguments and experimentation.]
Slide 298
1. A precise statement of the requirements.
2. High-level functional model description:
   a) Data and alert flows for the processes related to the requirements,
   b) Assumed attacks and attack effects.
[Threat/vulnerability analysis; whiteboarding]
Slide 299
3. Detailed descriptions of model component behaviors representing 2a and 2b, along with statements of the underlying assumptions made for each component. [Probabilistic modeling or logical argumentation, depending on requirement]
Slide 300
4. Construct an executable functional model. [Probabilistic modeling, if the model constructed in 3 is probabilistic]

In parallel:

5. a) Verification of the modeling assumptions of Step 3 [logical argumentation] and,
   b) where possible, justification of the model parameter values chosen in Step 4. [Experimentation]
Slide 301
Slide 302
7. Comparison of results obtained in Step 6, noting in particular the configurations and parameter values for which the requirements of Step 1 are satisfied.
Slide 303
[Figure: decomposition of the JBI mission into requirements. JBI critical functionality: an initialized JBI provides essential services; the JBI is properly initialized; authorized publish, subscribe, query, and join/leave operations are processed successfully; unauthorized activity is properly rejected; and confidential information is not exposed. Detection/correlation requirements and IDS objectives support these.]
Slide 304
[Figure: the complete assurance argument mapping the JBI survivability requirements (PIP requirements 1-4) onto the executable model, model assumptions, and supporting arguments. The requirements (authorized publish, subscribe, query, and join/leave processed successfully; JBI properly initialized; unauthorized activity properly rejected; confidential info not exposed; IDS/correlation requirements) decompose through dataflow, timeliness, integrity, and confidentiality properties (from functional model execution; IO confidentiality end-to-end, in transit, and in storage; confidentiality of application-layer messages and network communications) into component-level assumptions such as QA1-QA4 (QIS incorruptibility, communication cutoff, input integrity, function correctness), SA1-SA3 (IO integrity, client confidentiality, and IO authenticity in the PSQ server), AA1-AA3 (access-proxy function correctness, application-layer integrity, and application-layer confidentiality), MA1 (SM Byzantine agreement), PA1-PA2 (client-core communication, alternate path availability), SeA1-SeA3 (sensor false-alarm rate, detection delay, detection probability), CoA1 (correlator false-alarm rate), and PsA1-PsA2 (ADF policy server input correctness and synchronization). These rest in turn on supporting arguments, e.g., CERT vulnerability DB analysis, attack-model parameter selection, IDS experimental evaluation, type-enforcement hardened kernels (SELinux, Solaris, Windows), correctness of the modified ITUA protocols, ADF agent/protocol/policy correctness, ADF NIC physical security, restricted routing and managed-switch correctness, and DoD Common Access Card (CAC) / PKCS #11 key protection (algorithmic framework, key length, key lifetime, tamperproof hardware).]
Slide 305
Steps 4-5: The access proxy verifies that the client is in a valid session by sending the session key accompanying the IO to the Downstream Controller for verification.

Step 6: The access proxy forwards the IO to the PSQ Server in its quadrant.

....
Slide 306
Slide 307
OS vulnerability
Non-JBI-specific application-level vulnerability
pcommon: common-mode failure
Data-level vulnerabilities: attacks in breadth
Slide 308
Compromise
Slide 309
Restart Processes
Secure Reboot
Permanent Isolation
Slide 310
[Figure: attack-scenario snapshot across the JBI zones (Crumple, Operations, Executive). At T = 85 min., discovery of a vulnerability on the Main PD, OS 1, affects all Quadrant 1 components running OS 1: the access proxy (AP, with heartbeat and alert channels), downstream controller (DC), guardian (Gu), PSQ server, state manager (SM), correlator (Co), sensors (Se), actuators (Ac), and local controllers (LC), each behind an ADF NIC. The publishing client on OS 1 and the policy server (PS) are also shown; SM replicas on OS 2, OS 3, and OS 4 are unaffected.]
Slide 311
Assumptions (4.4.3)

AA1: Only well-formed traffic is forwarded by a correct access proxy.
AA2: The access proxy cannot access cryptographic keys used to sign messages that pass through it.
AA3: The access proxy cannot access the contents of an IO if application-level end-to-end encryption is being used.
AA4: Attacks on an access proxy can only be launched from compromised clients, or from corrupted core elements that interact with the access proxy during the normal course of a mission.
...
Slide 312
Slide 313
[Figure: revised assurance argument (cf. Slide 305) connecting the top-level requirements (authorized publish, subscribe, query, and join/leave processed successfully; unauthorized activity properly rejected; confidential info not exposed; IDS objectives) through dataflow, timeliness, integrity, and confidentiality properties (including notification confidentiality) to the functional-model and component-model assumptions. Added in this version are origin-of-attack and attack-propagation assumptions for each component (e.g., CA1-CA3 for clients, AA4-AA8 for the access proxy, DA1-DA3 for the DC, GA1-GA2 for the guardian, SA1-SA7 for the PSQ server, SeA1-SeA5 for sensors, CoA1-CoA4 for the correlator, MA1-MA3 for the SM, LA1-LA2 for local controllers, AcA1-AcA2 for actuators, ScA1 for subscribed clients), process-corruption and process-isolation assumptions (component-specific policies on SELinux, Trusted Solaris, and Windows 2000, with type enforcement, hardened kernels, and kernel-loadable wrappers), and ADF key-initialization, agent, protocol, and policy correctness arguments supporting private-key confidentiality (DoD Common Access Card, PKCS #11, algorithmic framework, key length, key lifetime, tamperproof hardware).]
Slide 314
[Figure: detail of the assurance argument for the access-proxy model. Requirements SA3 (IO integrity in PSQ server), SA4 (client confidentiality in PSQ server), AA2 (AP application-layer integrity), and AA3 (AP application-layer confidentiality) rest on "No cryptography in access proxy" and "Private key confidentiality". The latter decomposes into "Keys protected from theft" (DoD Common Access Card (CAC), PKCS #11 compliance, physical protection of the CAC device, protection of CAC authentication data, no compromise of authorized processes accessing the CAC) and "Keys not guessable" (algorithmic framework, key length, key lifetime). "ADF NIC services protected" requires tamperproof hardware that is neither preconfigured nor reconfigurable, with no unauthorized direct or indirect access.]
Slide 315
Slide 316
[Figure: bar chart of the probability that the requirement is satisfied, for experiments 1.1-14. Configurations vary the total number of operating systems (1-4), the number of protection domains (0-4 p.d.), and whether the access-proxy OS differs from the core OS; the probability ranges from 1.00 for the most diverse configuration (4 OS, 4 protection domains) down to roughly 0.52 for the least diverse.]
Slide 318
[Figure: two plots comparing "per component" policies against "no restriction" as a function of MTTD (100-1000 minutes): one on a 0.80-1.00 vertical scale (availability-like measure) and one on a 0-120 scale.]
Per-pd policies considerably improve performance (10% unavailability vs. 1.5% at MTTD = 100 minutes).

ADF NICs can handle per-port policies => this feature should be exploited, which implies setting the communication ports in advance.
Slide 319
Step 1: Formulate a precise statement of R.
Step 2: If R is logically decomposable, decompose it iteratively into sub-requirements. [Logical decomposition]
Step 3: For every atomic requirement Ra: if it is not quantitative, apply logical argumentation; if it is quantitative, build an attack tree (e.g., compromise client, escalate privileges, get in the middle of client/core traffic, steal key/certificate, sniff packets, perform ARP spoofing, modify network routing, re-route traffic at both ends, read the IO as it passes through the AP) from which an attack graph is constructed automatically.
Step 4: Detailed description of the components. [Data flow]
Step 5: Justify the modeling assumptions of Step 4. [Verify assumptions and parameter values]
Step 6: Construct a simulation model. [Probabilistic measures; infrastructure-level attacks]
Step 7: Evaluation and comparison: compare against the requirement; if the requirement is not met, the system is not valid.

[Figure: the steps annotated on the assurance-argument graph and on the JBI core architecture (quadrants; operations, crumple, and client zones; access proxy; local controller; protection domains Domain1-Domain2; and transports such as IIOP, RMI, UDP, and TCP).]
Slide 320
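For the quantitative branch of Step 3, an attack tree with independent leaf events can be evaluated bottom-up. This sketch uses a made-up gate structure and made-up leaf probabilities purely to show the mechanics:

```python
# OR gate: the attack succeeds if any child succeeds; AND gate: all children
# must succeed. Leaf probabilities and their independence are assumptions.
def p_or(probs):
    q = 1.0
    for p in probs:
        q *= 1.0 - p          # P[no child succeeds]
    return 1.0 - q

def p_and(probs):
    r = 1.0
    for p in probs:
        r *= p
    return r

# Hypothetical subtree for "read data":
# (compromise client AND escalate privileges) OR (sniff packets)
p_read_data = p_or([p_and([0.3, 0.5]), 0.1])
```

With 0.3 and 0.5 for the AND branch and 0.1 for sniffing, the top event probability is 1 - (1 - 0.15)(1 - 0.1) = 0.235; in the methodology these numbers would come from the attack model and parameter-selection arguments, not be assumed.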
Slide 321
Slide 322
In general:
  Use tricks from probability theory to reduce the complexity of the model
  Choose the right solution method

Simulation:
  The result is just an estimator based on a statistical experiment
  Estimation of the accuracy of the estimate is essential
  Use confidence intervals!

Analytic/numerical model solution:
  Avoid state-space explosion
    Limit model complexity
    Use the structure of the model (symmetries) to reduce state-space size
  Understand the accuracy/limitations of the chosen numerical method
    Transient solution
    (Iterative or direct) steady-state solution
Slide 323
Slide 324
Slide 325
Slide 326
Slide 327
Next Steps

You have:
  Learned theory related to reliability, availability, and performance validation using SANs and Möbius
  Learned about the advantages and disadvantages of various (analytical/numerical and simulation-based) solution algorithms.

There are many places to go for further information:
  Möbius software web pages (www.mobius.uiuc.edu)
  Performability Engineering Research Group web pages (www.perform.csl.uiuc.edu)
Slide 328