Why Probability Theory?
Information is exchanged in a computer network in a
random way, and the events that modify the behavior of
links and nodes in the network are also random.
We need a way to reason quantitatively about the
likelihood of events in a network, and to predict the
behavior of network components.
Example 1:
Measure the time between two packet arrivals into the
cable of a local area network.
Determine how likely it is that the interarrival time
between any two packets is less than T sec.
Probability Theory
A mathematical model used to quantify the
likelihood of events taking place in an
experiment in which events are random.
It consists of:
A sample space: The set of all possible
outcomes of a random experiment.
The set of events: Subsets of the sample
space.
The probability measure: Defined according to a
probability law for all the events of the
sample space.
Random Experiment
A random experiment is specified by stating
an experimental procedure and a set of
measurements and observations.
Example 2:
We count the number of packets received
correctly at a base station in a wireless LAN
during a period of time of T sec.
We want to know how likely it is that the
next packet received after T sec is correct.
Our Modeling Problem ...
We want to reason in quantitative terms
about the likelihood of events that can be
observed.
This requires:
Describing all the events in which we are
interested for the experiment.
Combining events into sets of events that are
interesting (e.g., all packets arriving with 5 sec.
latency)
Assigning a number to each of those events
reflecting the likelihood that they occur.
Probability Law
Probability of an event: A number
assigned to the event reflecting the
likelihood with which it can occur or has
been observed to occur.
Probability Law: A rule that assigns
probabilities to events in a way that reflects
our intuition of how likely the events are.
What we need then is a formal way of assigning
these numbers to events and any combination of
events that makes sense in our experiments!
Probability Law
Let E be a random experiment (r.e.)
Let A be an event in E
The probability of A is denoted by P(A)
A probability law for E is a rule that assigns P(A)
to A in a way that the following conditions, taken
from our daily experience, are satisfied:
A may or may not take place; it has some
likelihood (which may be 0 if it never occurs).
Something must occur in our experiment.
If two events are mutually exclusive (one excludes the
other), then the likelihood that either occurs is the
likelihood that one occurs plus the likelihood that the
other occurs.
Probability Law
More formally, we state the same as follows.
A probability law for E is a rule that assigns a
number P(A) to each event A in E satisfying
the following axioms:
AI:   P(A) ≥ 0
AII:  P(S) = 1
AIII: If A ∩ B = ∅, then P(A ∪ B) = P(A) + P(B)
Everything else is derived from these axioms!
Important Corollaries
1. If A_1, A_2, ..., A_k are mutually exclusive (A_i ∩ A_j = ∅ for i ≠ j), then
   P(A_1 ∪ A_2 ∪ ... ∪ A_k) = Σ_k P(A_k)
2. P(A^c) = 1 − P(A)
3. P(A) ≤ 1
4. P(∅) = 0
5. P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
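To make these rules concrete, here is a minimal Python sketch (my own illustration, not part of the original slides) that checks corollaries 2 and 5 by enumerating a small finite sample space with equally likely outcomes; the die-roll events A and B are arbitrary choices:

from fractions import Fraction

# Toy sample space: one roll of a fair six-sided die.
S = {1, 2, 3, 4, 5, 6}

def prob(event):
    # Probability of an event (a subset of S) under equally likely outcomes.
    return Fraction(len(event & S), len(S))

A = {2, 4, 6}   # "roll is even"
B = {4, 5, 6}   # "roll is at least 4"

# Corollary 2: P(A^c) = 1 - P(A)
assert prob(S - A) == 1 - prob(A)

# Corollary 5: P(A u B) = P(A) + P(B) - P(A n B)
assert prob(A | B) == prob(A) + prob(B) - prob(A & B)

print(prob(A), prob(B), prob(A | B))   # 1/2 1/2 2/3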
Probability Law
All probabilities must be in [0, 1]
The probabilities of mutually exclusive events sum to at most 1
(and to exactly 1 over the whole sample space S)
[Figure: a sample space S containing an event A and a single outcome x, annotated with P(S) = 1, P(∅) = 0, P(A), and P(x).]
Conditional Probability
Events of interest occur within some context,
and that context changes their likelihood.
[Figure: two timelines, one marking packet collisions over time and one marking interarrival times.]
We are interested in events occurring given that
others take place!
Conditional Probability
The likelihood that an event A occurs given that
another event B occurs is, in general, different from
the likelihood that A occurs at all.
We define the conditional probability of A
given B as follows:
P(A | B) = P(A ∩ B) / P(B), for P(B) > 0
We require P(B) > 0 because we know B occurs!
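As a small illustration (mine, not from the slides), a conditional probability can be computed by enumeration; the two-dice events below are arbitrary and show that conditioning can also increase a likelihood:

from fractions import Fraction

# Sample space: ordered outcomes of rolling two fair dice, all equally likely.
S = [(i, j) for i in range(1, 7) for j in range(1, 7)]

def prob(pred):
    # P(event) with equally likely outcomes: favorable count / total count.
    return Fraction(sum(1 for s in S if pred(s)), len(S))

def cond_prob(pred_a, pred_b):
    # P(A | B) = P(A and B) / P(B), defined only when P(B) > 0.
    p_b = prob(pred_b)
    assert p_b > 0
    return prob(lambda s: pred_a(s) and pred_b(s)) / p_b

A = lambda s: s[0] + s[1] >= 10   # the sum is at least 10
B = lambda s: s[0] == 6           # the first die shows 6

print(prob(A))           # 1/6
print(cond_prob(A, B))   # 1/2: knowing B makes A more likely here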
Theorem of Total Probability
The purpose is to divide and conquer:
We can describe how likely an event is by
partitioning it into mutually exclusive pieces.
[Figure: a timeline alternating busy and idle periods, with success and failure outcomes partitioned into events B_1, B_2, B_3.]
Theorem of Total Probability
[Figure: a sample space partitioned into events B_1, B_2, ..., B_n, with an event A cutting across the partition; the piece A ∩ B_5 is highlighted.]
P(A) = Σ_i P(A | B_i) P(B_i) = Σ_i P(A ∩ B_i)
The intersections of A with the B_i are mutually exclusive.
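Here is a minimal sketch of the theorem in use; the channel states and the numbers (P(busy), P(loss | busy), and so on) are hypothetical values chosen only for illustration:

# Partition the channel state into mutually exclusive pieces B_i = {busy, idle}
# and compute P(loss) with the theorem of total probability.
p_state = {"busy": 0.3, "idle": 0.7}           # P(B_i); must sum to 1
p_loss_given = {"busy": 0.10, "idle": 0.01}    # P(A | B_i)

# P(A) = sum_i P(A | B_i) * P(B_i)
p_loss = sum(p_loss_given[s] * p_state[s] for s in p_state)
print(p_loss)   # 0.3*0.10 + 0.7*0.01 = 0.037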
Independence of Events
In many cases, an event does not depend on prior events,
or we want that to be the case.
Example: Our probability model should not have to account
for the entire history of a LAN.
Independence of an event with respect to another means
that its likelihood does not depend on that other event.
A is independent of B if P(A | B) = P(A)
B is independent of A if P(B | A) = P(B)
So the likelihood of A does not change by
knowing about B, and vice versa!
This also means:
P(A ∩ B) = P(A) P(B)
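A quick enumeration check of the product rule (again my own sketch); the two events refer to different, physically independent coin flips:

from fractions import Fraction
from itertools import product

# Joint sample space of two independent fair coin flips.
S = list(product("HT", repeat=2))

def prob(pred):
    return Fraction(sum(1 for s in S if pred(s)), len(S))

A = lambda s: s[0] == "H"   # first flip is heads
B = lambda s: s[1] == "H"   # second flip is heads

# Independence: P(A and B) == P(A) * P(B)
print(prob(lambda s: A(s) and B(s)) == prob(A) * prob(B))   # True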
Random Variables
We are not interested in describing the likelihood
of every outcome of an experiment explicitly.
We are interested in quantitative properties
associated with the outcomes of the experiment.
Example:
What is the probability with which each packet
sent in an experiment is received correctly?
We don't really care!
What is the probability of receiving no packets
correctly within a period of T sec.?
We care! This may make a router delete a
neighbor!
Random Variables
We implicitly use a measurement that assigns a numerical
value to each outcome of the experiment.
The measurement itself is deterministic; it is based on
fixed, deterministic rules.
The randomness of the observed values of the
measurement is completely determined by the randomness
of the experiment itself.
A random variable X is a rule that assigns a
numerical value to each outcome of a
random experiment
Yes, it is really a function!
Random Variables
Definition: A random variable X on a sample
space S is a function X: S >R that assigns a
real number X(s) to each sample point s in S.
[Figure: the sample space S with sample points s_1, ..., s_5 mapped by X to values on the real line; note that X(s_4) = X(s_5).]
For each value x_i of X, the set { s ∈ S | X(s) = x_i } ⊆ S is an event,
which is called the event space
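As a small sketch (my own example, not from the slides), a random variable really is just a function on the sample space; here X counts how many of two transmitted packets are received correctly, with all four outcomes assumed equally likely only to keep the arithmetic simple:

from fractions import Fraction
from itertools import product
from collections import Counter

# Sample space: outcomes of sending 2 packets, each either "ok" or "lost".
S = list(product(["ok", "lost"], repeat=2))

# The random variable X: S -> R assigns to each sample point the number of
# packets received correctly.
def X(s):
    return sum(1 for pkt in s if pkt == "ok")

# Induced probabilities P(X = x): group the sample points mapped to each value.
pmf = Counter()
for s in S:
    pmf[X(s)] += Fraction(1, len(S))

print(dict(pmf))   # {2: 1/4, 1: 1/2, 0: 1/4}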
Random Variables
The purpose is to simplify the description of the problem.
We will not have to define the sample space!
[Figure: a sample point s ∈ S is mapped by X to a value x = X(s) on the real line, and the probability P(X(s) = x) is assigned to each of the possible values of X.]
Types of Random Variables
Discrete and continuous: typically used for counting
packets and for measuring time intervals, respectively.
[Figure: a timeline from 0 to t with packet arrivals numbered 1, 2, 3, 4 (a discrete count), and a busy period followed by the question "next packet time?" (a continuous time interval).]
Discrete Random Variables
We are interested in the probability that a discrete random
variable X (our measurement!) equals a certain value or
range of values.
Example:
We measure the delay experienced by each packet sent from one host to
another over the Internet; say we sent 1M packets (we have 1M delay
measurements). We want to know the likelihood with which any one
packet experiences a delay of 5 ms or less.
Probability Mass Function (pmf) of a random
variable X :
The probability that X assumes a given value x
P(X = x) = p_X(x)
Discrete Random Variables
Cumulative Distribution Function (cdf) of a
random variable X: The probability that X
takes on any value in the interval (−∞, x]:
F_X(x) = P(X ≤ x)
The cdf and pmf of a random variable are just
probabilities and obey the same axioms AI to
AIII. Therefore,
0 ≤ F_X(x) ≤ 1;  lim_{x→+∞} F_X(x) = 1;  lim_{x→−∞} F_X(x) = 0
F_X(a) ≤ F_X(b) for a < b
P(a < X ≤ b) = F_X(b) − F_X(a)
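A short sketch (mine) of how the cdf accumulates the pmf, reusing the hypothetical two-packet pmf from the earlier example:

from fractions import Fraction

# Hypothetical pmf for a discrete r.v. X: the number of correctly received
# packets out of 2 (same values as in the earlier sketch).
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def cdf(x):
    # F_X(x) = P(X <= x): sum the pmf over all values not exceeding x.
    return sum(p for value, p in pmf.items() if value <= x)

print(cdf(0), cdf(1), cdf(2))   # 1/4 3/4 1
# P(a < X <= b) = F_X(b) - F_X(a)
print(cdf(2) - cdf(0))          # 3/4 = P(0 < X <= 2)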
Continuous Random Variables
The probability that a continuous r.v. X assumes a given
value is 0.
Therefore, we use the probability that X assumes a range
of values and make the length of that range tend to 0.
The probability density function (pdf) of X, if it exists,
is defined in terms of the cdf of X as
f_X(x) = dF_X(x) / dx
P(a ≤ X ≤ b) = ∫_a^b f_X(x) dx
F_X(x) = ∫_{−∞}^x f_X(t) dt = P(X ≤ x)
If x → ∞, then ∫_{−∞}^{+∞} f_X(t) dt = 1, because the cdf is a
probability and the total probability is 1.
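A short numerical sketch (mine) of the pdf/cdf relationship, using an exponential density with an assumed rate lam = 2.0 simply because it will reappear later for interarrival times:

import math

lam = 2.0   # assumed rate, arrivals per second

def pdf(x):
    # Exponential pdf: f_X(x) = lam * exp(-lam * x) for x >= 0.
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def cdf(x):
    # Exponential cdf: F_X(x) = 1 - exp(-lam * x) for x >= 0.
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0

# P(a <= X <= b) two ways: from the cdf, and by numerically integrating the pdf.
a, b, n = 0.1, 0.5, 100_000
width = (b - a) / n
riemann = sum(pdf(a + (i + 0.5) * width) for i in range(n)) * width

print(cdf(b) - cdf(a))   # exact: F_X(b) - F_X(a)
print(riemann)           # numerical integral of the pdf; agrees closely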
What We Will Use
We are interested in:
Using the definitions of well-known
r.v.s to compute probabilities
Computing average values and
deviations from those values for
well-known r.v.s
Mean and Variance
What is the average queue length at each
router?
What is our worst-case queue?
[Figure: a network of routers A, B, C, D carrying two virtual circuits, VC1 and VC2, with a forwarding table of the form "For VC1 use 3, For VC2 use 2, ..., For VCn use 3".]
Mean and Variance
Expected value or mean of X:
E(X) = Σ_k x_k p_X(x_k)  for a d.r.v.
E(X) = ∫_{−∞}^{+∞} t f_X(t) dt  for a c.r.v.
The mean is always defined for the r.v.s we use!
[Figure: queue size over time fluctuating around its mean, with a spike labeled "too much?".]
Variance
Describes how much a r.v. deviates from its
average value, i.e., D = X − E(X)
We are only interested in the magnitude of
the deviation, so we use
D² = (X − E(X))²
The variance of a r.v. is defined as the mean
squared deviation E(D²):
Var(X) = σ² = E[(X − E(X))²]
Important relation:
Var(X) = E(X²) − [E(X)]²
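A minimal sketch (my own, using the same hypothetical two-packet pmf as before) that computes the mean and checks that the two forms of the variance agree:

from fractions import Fraction

# Hypothetical pmf: number of correctly received packets out of 2.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def expect(f):
    # E[f(X)] = sum_k f(x_k) * p_X(x_k) for a discrete r.v.
    return sum(f(x) * p for x, p in pmf.items())

mean = expect(lambda x: x)                        # E(X)
var_def = expect(lambda x: (x - mean) ** 2)       # E[(X - E(X))^2]
var_alt = expect(lambda x: x ** 2) - mean ** 2    # E(X^2) - [E(X)]^2

print(mean, var_def, var_alt)   # 1 1/2 1/2: both variance forms agree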
Properties of Mean and Variance
Useful when we discuss amplifying random
variables or adding constant biases.
E(aX + b) = aE(X) + b;  E(b) = b
Var(c) = 0
Var(X + c) = Var(X)
Var(cX) = c² Var(X)
Examples of Random Variables
We are interested in those r.v.s that permit us to
model system behavior based on the present state
alone.
We need to
count arrivals in a time interval
count the number of times we need to repeat
something to succeed
count the number of successes and failures
measure the time between consecutive arrivals
The trick is to map our performance questions into
the above four types of experiments
Bernoulli Random Variable
Let A be an event related to the outcomes
of a random experiment.
X = 1 if A occurs and 0 otherwise.
This is the Bernoulli r.v.; it has two
possible outcomes: success (1) or
failure (0)
p_X(0) = P_0 = P(X = 0) = q = 1 − p
p_X(1) = P_1 = P(X = 1) = p = 1 − q
We use it as a building block for other types of counting
Geometric Random Variable
Used to count the number of attempts needed to
succeed doing something.
Example: How many times do we have to
transmit a packet over a broadcast radio
channel before it is sent w/o interference?
Assume that each attempt is independent of any prior
attempt!
Assume each attempt has the same probability of success (p)
[Figure: a timeline of transmission attempts: failure, failure, failure, success!]
Geometric Random Variable
We want to count the number k of trials needed for the first
success in a sequence of Bernoulli trials!
p = probability of success in a trial
p_X(k) = P(X = k) = (1 − p)^{k−1} p,  k = 1, 2, ...
Why this is the case is a direct consequence of assuming
independent Bernoulli trials, each with the same probability of success:
k − 1 failures must occur before the first successful trial
Memoryless property: The probability of needing k additional
trials for a success, having already experienced n failures, is the
same as the probability of needing k trials from the start:
P(X = n + k | X > n) = P(X = k)
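A short sketch (mine) of the geometric pmf and its memoryless property; the per-attempt success probability p = 0.3 is an arbitrary assumed value:

p = 0.3   # assumed probability that one transmission attempt succeeds

def pmf(k):
    # P(X = k) = (1 - p)^(k - 1) * p, k = 1, 2, ...
    return (1 - p) ** (k - 1) * p

def tail(n):
    # P(X > n): the first n attempts all fail.
    return (1 - p) ** n

# Memoryless property: P(X = n + k | X > n) == P(X = k)
n, k = 4, 3
print(pmf(n + k) / tail(n), pmf(k))   # both 0.147

# Expected number of attempts before a packet goes out without interference: 1/p
print(sum(j * pmf(j) for j in range(1, 1000)))   # approx 3.33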
Binomial Random Variable
X denotes the number of times success occurs in n
independent Bernoulli trials.
Consider the specific outcome s ∈ S such that
s = (1, 1, ..., 1, 0, 0, ..., 0), where 1 = success;
it has k 1's followed by n − k 0's.
Let A_i = success in trial i; then P(A_i) = p and P(A_i^c) = 1 − P(A_i) = q, and
s = A_1 ∩ A_2 ∩ ... ∩ A_k ∩ A_{k+1}^c ∩ ... ∩ A_n^c
Because the trials are independent:
P(s) = P(A_1) P(A_2) ... P(A_k) P(A_{k+1}^c) ... P(A_n^c) = p^k (1 − p)^{n−k}
Binomial Random Variable
P(s) = P(A_1) P(A_2) ... P(A_k) P(A_{k+1}^c) ... P(A_n^c) = p^k (1 − p)^{n−k}
There are
C(n, k) = n! / [(n − k)! k!]
different ways to choose positions for the k 1's in the n slots.
Because each such outcome is mutually exclusive of the others:
P_n(k) = C(n, k) p^k (1 − p)^{n−k} = {n! / [(n − k)! k!]} p^k (1 − p)^{n−k},  0 ≤ k ≤ n
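A minimal sketch of the binomial pmf using Python's math.comb; the values n = 10 trials and p = 0.2 are assumptions for illustration only:

import math

n, p = 10, 0.2   # assumed number of trials and per-trial success probability

def binom_pmf(k):
    # P_n(k) = C(n, k) * p^k * (1 - p)^(n - k)
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# The pmf sums to 1 over k = 0..n (the whole sample space, axiom AII).
print(sum(binom_pmf(k) for k in range(n + 1)))   # 1.0 up to rounding

# e.g., the probability of at most 2 packets being received correctly
print(sum(binom_pmf(k) for k in range(3)))       # about 0.678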
Poisson Random Variable
We use this r.v. in cases where we need to count event
occurrences in a time period.
An event occurrence will typically be a packet arrival.
Arrivals are assumed to occur at random over a time
interval.
The time interval is (0, t]
We divide the time interval into n small subintervals of
length Δt = t/n
The probability of a new arrival in a given subinterval is
defined to be λΔt, where λ is the arrival rate; this probability
is constant and independent of the subinterval.
Poisson Random Variable
A sequence of n independent Bernoulli trials, with X being
the number of arrivals in (0, t]
Make n → ∞ and Δt → 0: the probability of more than one arrival
in Δt is negligible compared to the probability of having 0 or 1 arrivals.
By assumption, whether or not an event occurs in a subinterval is
independent of the outcomes in other subintervals.
We have:
[Figure: the interval (0, t] divided into n subintervals of length Δt, each holding an arrival with probability λΔt; the k arrivals 1, 2, 3, ..., k fall somewhere among the n subintervals.]
Poisson Random Variable
P_n(k) = C(n, k) p^k (1 − p)^{n−k}, where k is the number of arrivals
p = P{arrival in Δt} = λΔt = λt/n
Make n → ∞ and Δt → 0; then:
P_k = P{k arrivals in [0, t]} = P(X = k) = lim_{n→∞} {n! / [(n − k)! k!]} (λt/n)^k (1 − λt/n)^{n−k}
Rearrange to get:
P_k = lim_{n→∞} {n! / [(n − k)! n^k]} [(λt)^k / k!] (1 − λt/n)^n (1 − λt/n)^{−k}
    = lim_{n→∞} {[n (n − 1)(n − 2) ... (n − k + 1)] / n^k} [(λt)^k / k!] (1 − λt/n)^n (1 − λt/n)^{−k}
P_k = [(λt)^k / k!] lim_{n→∞} (1 − λt/n)^n, since the other factors tend to 1;
and also lim_{n→∞} (1 + x/n)^n = e^x, here with x = −λt
Poisson Random Variable
We see that the Poisson r.v. is the result of an
approximation of i.i.d. arrivals in a time interval.
We will say arrivals are Poisson, meaning we can use the
above formulas to describe packet arrivals.
The probability of 0 arrivals in (0, t] plays a key role in our
treatment of interarrival times.
P_k = P{k arrivals in (0, t]} = [(λt)^k / k!] e^{−λt}
P_0 = P{0 arrivals in (0, t]} = e^{−λt}
P{some arrivals in (0, t]} = 1 − P_0 = 1 − e^{−λt}
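A minimal numerical sketch (mine) of these formulas; the rate lam = 5 packets/sec and interval t = 2 sec are assumed values, and the last line checks that the Bernoulli-subinterval (binomial) model with many small subintervals approaches the Poisson probabilities:

import math

lam, t = 5.0, 2.0   # assumed arrival rate (packets/sec) and interval length (sec)

def poisson_pmf(k):
    # P_k = (lam*t)^k / k! * e^(-lam*t)
    return (lam * t) ** k / math.factorial(k) * math.exp(-lam * t)

def binomial_approx(k, n):
    # n subintervals, each with arrival probability p = lam*t/n.
    p = lam * t / n
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

print(poisson_pmf(0))                                 # P_0 = e^(-lam*t)
print(1 - poisson_pmf(0))                             # P{some arrival in (0, t]}
print(poisson_pmf(8), binomial_approx(8, 100_000))    # nearly equal for large n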
Important Properties of Poisson Arrivals
The aggregation of Poisson sources is a Poisson
source.
If packets from a Poisson source are routed such
that a path is chosen independently with
probability p, that stream is also Poisson, with
rate p times the original rate.
It turns out that the time of a given Poisson
arrival is uniformly distributed in a time interval
The parameter λ is the average arrival rate.
Given exactly one arrival in (0, t], the probability that it occurred by time x ≤ t is
P{arrival in (0, x] | 1 arrival in (0, t]} = [(λx) e^{−λx}] [e^{−λ(t−x)}] / [(λt) e^{−λt}] = x / t
The above probability corresponds to the
uniform distribution!
Hence, on average, a given packet arrives in the middle of
a fixed time interval with Poisson arrivals.
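To close, a rough simulation sketch (my own, with assumed rates lam1 = 2.0 and lam2 = 3.0 arrivals/sec over t = 1 sec) of the aggregation property: merging two independent Poisson sources behaves like a single Poisson source of rate lam1 + lam2:

import math, random

lam1, lam2, t, runs = 2.0, 3.0, 1.0, 200_000   # assumed rates, interval, sample size
random.seed(1)

def poisson_count(lam):
    # Count arrivals in (0, t] by accumulating exponential interarrival times.
    count, clock = 0, random.expovariate(lam)
    while clock <= t:
        count += 1
        clock += random.expovariate(lam)
    return count

# Aggregate the two sources and compare against a Poisson source of rate lam1 + lam2.
agg = [poisson_count(lam1) + poisson_count(lam2) for _ in range(runs)]
print(sum(agg) / runs)                        # close to (lam1 + lam2) * t = 5
print(sum(1 for k in agg if k == 0) / runs)   # empirical P_0
print(math.exp(-(lam1 + lam2) * t))           # theoretical P_0 = e^{-(lam1+lam2) t}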