(ENM 503)
Michael A. Carchidi
June 30, 2015
Chapter 10 - Simulation and Monte-Carlo Methods
The following notes are based on the textbook entitled: A First Course in
Probability by Sheldon Ross (9th edition) and these notes can be viewed at
https://canvas.upenn.edu/
after you log in using your PennKey user name and Password.
1. Introduction and Motivation for Simulation
In this chapter, we want to discuss simulation and specifically Monte-Carlo
methods for computing probabilities. A more detailed discussion of simulation
methods, of which Monte-Carlo methods are just one part, is given in the
ESE 603 course that is offered during the spring semester, and that course serves
as a good continuation of the ENM 503 course.
Let us motivate the ideas behind simulation and Monte-Carlo methods in
probability by considering the following geometric probability problem. Suppose
that two coins of radii R1 and R2 are thrown on a rectangular sheet of paper
having length L > 0 and width W > 0 so that the position of each coin's center
uniformly lands somewhere on the sheet of paper. Note that this does not require
that the entire coin lands on the paper, only its center. Given these conditions, we
would like to compute (in terms of the inputs: L, W, R1 and R2) the probability
that the two coins overlap. Such a problem is known as a geometric probability
problem.
Without any loss of generality, we may assume that the rectangle is fixed on
an xy plane as the region
R = {(x, y) | 0 ≤ x ≤ L, 0 ≤ y ≤ W }.    (1)
If (X1 , Y1 ) give the coordinates of the center of coin 1 and if (X2 , Y2 ) give the
coordinates of the center of coin 2, then, under the conditions of the problem, we
have X1 and X2 , both independent and uniform random variables in the continuous interval [0, L), and we have Y1 and Y2 , both independent and uniform random
variables in the continuous interval [0, W ), i.e.,
X1 ~ U[0, L)  and  Y1 ~ U[0, W)    (2a)

and

X2 ~ U[0, L)  and  Y2 ~ U[0, W).    (2b)
From the geometry of the problem, we then see that the two coins will overlap
when the distance between their centers,

D = √((X2 − X1)² + (Y2 − Y1)²),    (3)

satisfies the condition

D ≤ R1 + R2,    (4)

and so we seek the probability

P = Pr(D ≤ R1 + R2),    (5)
where D is some random variable that could be as small as zero, when the two
centers coincide, or as large as (L² + W²)^(1/2), when the two centers are on opposite
corners of the rectangle. This is somewhat difficult to compute analytically since
the random variables X1 and X2 are from U[0, L) and the random variables Y1
and Y2 are from U[0, W), making it difficult to determine the random nature of
the random variable D as defined in Equation (3), even though stating the range
space of D as

0 ≤ D ≤ √(L² + W²)

is somewhat obvious.
We shall see that simulation offers a way to estimate the probability in Equation (5) using the computer and without requiring that much more work than we
have already done. Such an estimate will be provided in the last section of this
chapter. Before we see how this is accomplished, it should first be noted that
since

X1 ~ U[0, L),  Y1 ~ U[0, W),  X2 ~ U[0, L)  and  Y2 ~ U[0, W),

we may write

X1 = LZ11,  Y1 = W Z12,  X2 = LZ21,  Y2 = W Z22,

where Z11, Z12, Z21 and Z22 are all independent standard uniform random variables, U[0, 1). Then
P = Pr(D ≤ R1 + R2) = Pr(√((X2 − X1)² + (Y2 − Y1)²) ≤ R1 + R2)

becomes

P = Pr(√(L²(Z21 − Z11)² + W²(Z22 − Z12)²) ≤ R1 + R2)

or

P = Pr(√((Z21 − Z11)² + β²(Z22 − Z12)²) ≤ α),    (6a)

where

α = (R1 + R2)/L  and  β = W/L,    (6b)

thereby showing that P is not a function of the four parameters: L, W, R1 and
R2, but is rather a function of only the two parameters α and β, and in the special
case when W = L (i.e., when β = 1), then P depends only on the single value
of α. These results will serve as a way of checking the simulation for accuracy by
seeing if P stays fixed when one changes the values of L, W, R1 and R2 in a way
that keeps the values of α and β fixed.
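The scaled form (6a) can be estimated directly by Monte-Carlo sampling. The following Python sketch (the function name and sample sizes are our own choices, not from the text) draws the four standard uniform samples Z11, Z12, Z21 and Z22 and counts how often the overlap condition holds; as noted above, the estimate should stay roughly fixed when L, W, R1 and R2 are changed in a way that keeps α and β fixed.

```python
import math
import random

def estimate_overlap(L, W, R1, R2, trials=50_000, seed=1):
    """Monte-Carlo estimate of P = Pr(D <= R1 + R2) using Equation (6a)."""
    alpha = (R1 + R2) / L          # alpha = (R1 + R2)/L, Equation (6b)
    beta = W / L                   # beta = W/L, Equation (6b)
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        z11, z12, z21, z22 = (rng.random() for _ in range(4))
        d = math.sqrt((z21 - z11) ** 2 + beta ** 2 * (z22 - z12) ** 2)
        if d <= alpha:
            hits += 1
    return hits / trials

# Doubling L, W, R1 and R2 leaves alpha and beta (and hence P) unchanged,
# so the two estimates below should agree up to sampling noise.
p1 = estimate_overlap(2.0, 1.0, 0.10, 0.15, seed=1)
p2 = estimate_overlap(4.0, 2.0, 0.20, 0.30, seed=2)
```

Since α and β are identical in the two calls, the two estimates differ only by sampling noise, which is exactly the accuracy check described above.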
At the heart of all simulations are random numbers so let us now discuss these.
A more detailed discussion is found in the ESE 603 course.
2. The Definition of Random Numbers
A random number (denoted by R) is simply a sample from the standard uniform distribution U[0, 1), whose pdf and cdf are given by

f(x) = 0, for x < 0
       1, for 0 ≤ x < 1
       0, for 1 ≤ x

and

F(x) = 0, for x < 0
       x, for 0 ≤ x < 1
       1, for 1 ≤ x,
respectively. It is easily seen that the mean and variance of the standard uniform
distribution are given by

E(X) = ∫₀¹ x dx = 1/2  and  V(X) = ∫₀¹ (x − 1/2)² dx = 1/12,

respectively, and these will help when it comes to checking a random sequence for
accuracy.
Random numbers are a necessary basic ingredient in simulation because, from
a sample R ~ U[0, 1), we shall see that in theory it is possible to generate a
sample from any other random variable X. The reader may never have to write
a computer program to generate random numbers because all well-written simulation software has built-in subroutines, objects, or functions that will generate
random numbers. For example, Microsoft Excel, which we shall use later to solve
the problem proposed in the introduction, has a routine called RAND() which
generates a random number. However, it is still important to understand the
basic ideas behind the generation and testing of random numbers.
the sample variance

(1/N) Σ_{i=1}^{N} (Ri − 1/2)²

in the limit of large N must approach 1/12. In addition, these limits should
be approached in an oscillatory (or non-monotonic) manner, so that sometimes
they are too high and sometimes they are too low, and so on.
4. Generation of Pseudo-Random Numbers
This section describes the common method for the generation of random numbers and some methods for testing these for randomness. Since a computer
algorithm must be used to generate random numbers, they are technically not
really random. For this reason, they are called pseudo-random, since the word pseudo
implies that the very act of generating random numbers by any known method
removes the potential for true randomness: if the method is known, the
set of random numbers can be replicated over and over again. Therefore a philosophical argument could be made that it is impossible to construct a computer
algorithm that generates truly random numbers.
Therefore, the real goal of any random-number generation scheme is to
produce a sequence of numbers between zero and one which simulates (or mimics)
the necessary properties of uniformity and independence as closely as possible, so
that if just the sequence of numbers
{R1 , R2 , R3 , . . . , RN }
is provided to a user, it should be virtually impossible for the user to reconstruct
the computer algorithm that produced this sequence of numbers.
When generating pseudo-random numbers, certain problems or errors can occur which should be avoided by a good algorithm. Some of these errors (but
certainly not all) include the following:
- the generated numbers may not really be uniformly distributed,
- the generated numbers may really be discrete-valued instead of continuous-valued,
- the sample mean of the generated numbers may be consistently too high above 1/2 or too low below 1/2,
- the sample variance of the generated numbers may be consistently too high above 1/12 or too low below 1/12, and
- the numbers may not really be independent, in that there may be dependence in any of the following ways:
  - autocorrelation between numbers, e.g., every fifth number is larger than the mean of 1/2, and so on,
  - numbers successively higher or lower than adjacent numbers,
  - several numbers found above the mean followed by several numbers below the mean, and so on.
Any departures from uniformity and independence for a particular generation
scheme may be detected by tests such as those we shall describe later. Generators,
such as RAND() in Microsoft Excel, have passed many of these tests as well as more
stringent tests, and so there is really no excuse for using a generator that has
been found to be deficient.
In most cases, random numbers are generated as part of a subroutine (or
function) for a given simulation and most generators of random numbers should
satisfy the following practical conditions:
- the generator routine should be fast, since good statistics require a large sample size of random numbers,
- the generator routine should be portable to different computers, and ideally to different programming languages,
- the generator routine should have a long cycle length or period, which is the length of the random-number sequence before previous numbers begin to repeat themselves (what this means will be discussed in more detail later),
- the random numbers generated should be replicable, so that it is possible to generate the same sequence of random numbers given the same starting point in the sequence, and
- the generated random numbers should closely approximate the ideal statistical properties of uniformity and independence.
Note that constructing algorithms that seem to generate random numbers is
easy, but constructing algorithms that really do produce sequences of random
numbers that are independent and uniformly distributed in the interval between
0 and 1 is much more difficult.
One purpose of this section is to discuss the central issues in random-number
generation in order to enhance one's understanding of the generation of random
numbers and to show some of the techniques that are used to test a sequence of
numbers for independence and uniformity.
First we discuss the techniques for generating random numbers and then we
shall discuss some tests used to see if these sequences are random.
A seemingly simple way to generate a sequence of N random numbers
{R0 , R1 , R2 , R3 , . . . , RN }
is to start with a continuous function f that maps the interval [0, 1) onto the
interval [0, 1), i.e.,
f : [0, 1) [0, 1).
Then an initial value (called the seed) R0 in the interval [0, 1) is chosen and the
iteration scheme Rn+1 = f(Rn), for n = 0, 1, 2, . . . , N − 1, is used to generate
R1, R2, R3, . . ., RN. This is known as an iteration (or recursive) method. Let's
illustrate the idea with two examples.
Example #1: f(R) = 4R(1 − R)

The function f(R) = 4R(1 − R), which is plotted below,

(plot of f(R) = 4R(1 − R) for 0 ≤ R ≤ 1)

maps [0, 1) onto [0, 1). Note first that the seed R0 = 3/4 is a fixed point of f,
since f(3/4) = 4(3/4)(1 − 3/4) = 3/4, so that it produces the constant sequence
{3/4, 3/4, 3/4, . . .}, which is certainly not random. Similarly, the seed

R0 = (5 − √5)/8 = 0.34549...

produces the sequence

{R0, R1, R2, R3, . . .} = { (5 − √5)/8, (5 + √5)/8, (5 − √5)/8, (5 + √5)/8, . . . }

or

{R0, R1, R2, R3, . . .} = {0.34549, 0.90451, 0.34549, 0.90451, ...}
which is also certainly not random, showing that such a recursive method is very
much dependent on the value of the seed R0. In addition, we should note that
the sequence generated using R0 = 0.6 may never contain the numbers 0.75 or
(5 − √5)/8.
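A few lines of Python (our own illustration, not part of the text) confirm the 2-cycle behavior of the iteration Rn+1 = 4Rn(1 − Rn) for this special seed:

```python
import math

def iterate(f, r0, n):
    """Return [R0, R1, ..., Rn] for the recursive scheme R_{k+1} = f(R_k)."""
    seq = [r0]
    for _ in range(n):
        seq.append(f(seq[-1]))
    return seq

f = lambda r: 4.0 * r * (1.0 - r)

# The seed R0 = (5 - sqrt(5))/8 = 0.34549... flips back and forth
# between (5 - sqrt(5))/8 and (5 + sqrt(5))/8 on every iteration.
seq = iterate(f, (5.0 - math.sqrt(5.0)) / 8.0, 10)
```

Printing `seq` shows the alternating pattern 0.34549..., 0.90451..., 0.34549..., and so on.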
Example #2: f(R) = R²

It should be noted that some choices of the mapping function f : [0, 1) →
[0, 1) will never produce a sequence that looks random for any choice of R0. For
example, the mapping function f(R) = R², which is plotted below,

(plot of f(R) = R² for 0 ≤ R ≤ 1)

produces the sequence Rn = R0^(2ⁿ), which decreases monotonically toward zero
for every seed R0 in [0, 1), and so the resulting sequence never looks random.
One drawback of iteration schemes of the form

f : [0, 1) → [0, 1)  and  f : N → [0, 1)

is that they rely on real-number arithmetic, which can sometimes be unpredictable when performed by a computer. To illustrate this statement, the reader
is directed to the 4R(1 − R) worksheet in the Microsoft Excel file that accompanies this chapter. This worksheet illustrates what is commonly known as the
butterfly effect, which says that a small change at the beginning of an iteration scheme can very quickly propagate into a very large effect later on. It is
sometimes dramatically worded by saying that a single butterfly flapping its wings in
South America could result in a tornado being formed in Texas. Specifically, this
worksheet shows that the sequence generated using

R0 = 0.6  and  Rn+1 = 4Rn(1 − Rn)

differs noticeably from the sequence generated using

R0 = 0.60000001  and  Rn+1 = 4Rn(1 − Rn),

even as soon as in the values of R18. This is mainly due to the limited storage
capability of a computer; such effects cannot always be avoided, and they are
the subject of a branch of mathematics known as Chaos Theory. A better scheme,
which uses mostly integer arithmetic (and hence avoids this type of chaotic behavior), is now described.
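The same experiment is easy to reproduce outside of Excel. This Python sketch (our own, not the course worksheet) iterates the two nearby seeds and watches the trajectories separate:

```python
def logistic_orbit(r0, n):
    """Iterates of R_{k+1} = 4 R_k (1 - R_k) starting from the seed r0."""
    orbit = [r0]
    for _ in range(n):
        r = orbit[-1]
        orbit.append(4.0 * r * (1.0 - r))
    return orbit

a = logistic_orbit(0.6, 40)
b = logistic_orbit(0.60000001, 40)

# The seeds differ by only 1e-8, yet within a few dozen iterations the
# orbits disagree in the first decimal place: chaotic sensitivity.
max_gap = max(abs(x - y) for x, y in zip(a, b))
```

The gap between the two orbits roughly doubles per step on average, so the initial 10⁻⁸ discrepancy quickly becomes visible to the naked eye.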
Linear Congruential Method
The linear congruential method is the most widely used method for generating
random numbers. The major advantage of this method is that it uses mostly
integer arithmetic and hence can be implemented easily on a computer with very
dependable outcomes. The linear congruential method first produces a sequence
of integers
{X0 , X1 , X2 , X3 , . . . , Xn , . . .}
between 0 and m 1 according to the following linear recursive relationship
Xn+1 = (aXn + c) mod(m)
(7)
for n = 0, 1, 2, 3, . . .. The initial integer value X0 is called the seed, the integer a
is called the constant multiplier, the integer c is the increment, and the integer
m is the modulus with m > 1. From this sequence of integers, the sequence of
random numbers in the interval [0, 1),
{R0 , R1 , R2 , R3 , . . . , Rn , . . .},
is then computed using Rn = Xn /m for n = 0, 1, 2, 3, . . ., and hence involves
a single division. This is the only real-number arithmetic needed and all other
arithmetic is integer.
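As a sketch (in Python rather than Excel; the function name is our own), the recursion in Equation (7) together with the single division Rn = Xn/m looks like this, using the constants a = 17, c = 43, m = 100 and X0 = 27 that appear in a later example:

```python
def lcg(a, c, m, x0, count):
    """Generate `count` random numbers R_n = X_n / m from the linear
    congruential recursion X_{n+1} = (a X_n + c) mod m, seed X_0 = x0."""
    rs = []
    x = x0
    for _ in range(count):
        rs.append(x / m)           # the single real-number division
        x = (a * x + c) % m        # all other arithmetic is integer
    return rs

rs = lcg(17, 43, 100, 27, 5)
# rs == [0.27, 0.02, 0.77, 0.52, 0.27]: the seed 27 already returns
# after four steps, so this tiny generator has a cycle length of 4.
```
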
Modular Arithmetic
By definition, we say that a = b mod(m) when the integer a b is evenly
divisible by m. In fact, the notation b mod(m) is used to represent the remainder
one gets when one divides b by m. For example, 7 mod(3) = 1 since 3 divides
into 7 twice with a remainder of 1.

Since each Xn+1 is completely determined by Xn, the sequence of Xn's must
eventually repeat, and hence the cycle length (or period) of the sequence

{R0 , R1 , R2 , R3 , . . . , Rn , . . .}
equals the number of entries in the repetitive part of the sequence. This would
then suggest that the sequence
{R0 , R1 , R2 , R3 , . . . , Rn , . . .},
is really not random unless the cycle length of the sequence is large enough so that
the repetitive nature of the sequence is very well hidden. We shall see that the
selection of the values for a, c, m, and X0 can drastically affect the cycle length
but first lets look at an example.
Example #4: A Linear Congruence
Let us use the linear congruential method to generate a sequence of random
numbers using a = 17, c = 43, m = 100, and X0 = 27 (along with X0 = 20) in
the equation
Xi+1 = (aXi + c) mod(m) = (17Xi + 43) mod(100)
for i = 0, 1, 2, 3, . . .. Here the Xi's will be integers from 0 to 99, inclusive, and
so the Ri's will be two decimal-place random numbers between 0.00 and 0.99,
inclusive. The following two tables of results (one using X0 = 27 and one using
X0 = 20) are obtained.
Using X0 = 27:

  i    Xi     Ri
  0    27    0.27
  1     2    0.02
  2    77    0.77
  3    52    0.52
  4   (27)   0.27
  5     2    0.02
  6    77    0.77
  7    52    0.52
  8    27    0.27
  9     2    0.02
 10    77    0.77

Using X0 = 20:

  i    Xi     Ri        i    Xi     Ri
  0    20    0.20      11    21    0.21
  1    83    0.83      12     0    0.00
  2    54    0.54      13    43    0.43
  3    61    0.61      14    74    0.74
  4    80    0.80      15     1    0.01
  5     3    0.03      16    60    0.60
  6    94    0.94      17    63    0.63
  7    41    0.41      18    14    0.14
  8    40    0.40      19    81    0.81
  9    23    0.23      20   (20)   0.20
 10    34    0.34      21    83    0.83
The numbers in parentheses show where the sequence starts to repeat. Note that
using the seed X0 = 27 gives the numbers 0.27, 0.02, 0.77, and 0.52, and these
continually repeat, resulting in a cycle length of 4, but using the seed X0 = 20
does a little better, resulting in a cycle length of 20; note, however, that the numbers
0.27, 0.02, 0.77 and 0.52 can never appear in this sequence of 20 numbers. One
should note that the resulting sequence generated by

Xi+1 = (aXi + c) mod(m)

does depend on the seed X0, and the repetitive parts of two different sequences
can have no elements in common.
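The cycle lengths quoted above are easy to verify mechanically. This Python helper (our own, not from the text) iterates Xi+1 = (17Xi + 43) mod(100) until a state recurs and reports the length of the repetitive part:

```python
def cycle_length(a, c, m, x0):
    """Length of the repetitive part of X_{n+1} = (a*X_n + c) mod m."""
    seen = {}                  # state -> index of its first occurrence
    x, i = x0, 0
    while x not in seen:
        seen[x] = i
        x = (a * x + c) % m
        i += 1
    return i - seen[x]         # distance between the two occurrences

len27 = cycle_length(17, 43, 100, 27)   # cycle of length 4
len20 = cycle_length(17, 43, 100, 20)   # cycle of length 20
```

This confirms that the cycle length genuinely depends on the seed X0, exactly as the two tables show.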
The ultimate test of the linear congruential method, as of any generation
scheme, is how closely the generated numbers approximate uniformity and independence. Other important properties include maximum density and maximum
period. By maximum density, it is meant that the values assumed by the Ri s
leave no large gaps on [0, 1).
Gaps
With regards to these gaps, note that the sequence of random numbers generated by the linear congruential method can only come from the set

{0, 1/m, 2/m, 3/m, . . . , (m − 1)/m},

which means that the Ri's are discrete (not continuous) on the interval [0, 1) and
the gap is no smaller than 1/m. However, all of this is of little consequence if the
modulus m is very large. Values of m as large as

m = 2^48 = 281,474,976,710,656

are in common use these days, making

1/m ≈ 3.5527 × 10^(−15),

so that the discreteness of such a sequence, and the resulting gap produced, is
well hidden.
Periods
With regards to the period, we again note that the sequence of random numbers generated by the linear congruential method can only come from the set

{0, 1/m, 2/m, 3/m, . . . , (m − 1)/m},

which means that the maximum period of the sequence of Ri's can be no larger than
m, and a maximum period equal to m can be achieved by proper choices of a, c
and X0 (for a given value of m). Specifically, the following general results from
number theory can be utilized to insure maximum periods when m is either a
power of 2 (which is good when it comes to computers) or when m is a prime
number.
For m a power of 2 and c ≠ 0, the longest possible period that can be
achieved is m, and this is accomplished whenever c is odd and a = 1 mod(4).
Furthermore, it should be obvious that this does not depend on the choice
of seed X0, since every integer from 0 to m − 1, inclusive, will be represented
somewhere in the sequence of Xi's.

For m a power of 2 and c = 0, the longest possible period that can be
achieved is m/4, and this is accomplished whenever the seed X0 is odd and
a = 3 mod(8) or a = 5 mod(8).
Example #5: A Linear Congruence in Which m is a Power of 2

Let us take a = 13, c = 0 and m = 64 = 2^6 in the equation Xi+1 = 13Xi mod(64),
using each of the seeds X0 = 1, 2, 3 and 4. The following table of Xi values is
obtained.

  i   X0=1   X0=2   X0=3   X0=4        i   X0=1   X0=2   X0=3   X0=4
  0     1      2      3      4         9    45     26      7     52
  1    13     26     39     52        10     9     18     27     36
  2    41     18     59     36        11    53     42     31     20
  3    21     42     63     20        12    49     34     19      4
  4    17     34     51    (4)        13    61     58     55     52
  5    29     58     23     52        14    25     50     11     36
  6    57     50     43     36        15     5     10     15     20
  7    37     10     47     20        16    (1)     2     (3)     4
  8    33    (2)    35      4         17    13     26     39     52
The numbers in parentheses show where each sequence starts to repeat. The
maximum period of m/4 = 16 is achieved using X0 odd (1 or 3). Notice that
a = 13 = 5 mod(8), as required to achieve the maximum period. Note also that when
X0 = 1, the generated sequence assumes values (when ordered) in the set

{1, 5, 9, 13, 17, 21, 25, 29, 33, 37, 41, 45, 49, 53, 57, 61}
so that the gap between successive possible values of the random numbers is

4/64 = 1/16 = 0.0625,
which is large, and this leads one to be concerned about the density of the random
numbers generated using the scheme in this example. Of course, this generator
has a period that is too short and a density that is too low for it to be used
to generate random numbers, but it does illustrate the importance of properly
choosing a, c, m and X0.
Example #6: A Linear Congruence in Which m is Prime
Let's assume that a = 5, m = 7 (a prime) and c = 0. Then the following
table shows that choosing a = 5 leads to a maximum period of m − 1 = 6 using
Xi+1 = 5Xi mod(7).

  k    a^k − 1            Comments
  1    5^1 − 1 = 4        Not divisible by 7
  2    5^2 − 1 = 24       Not divisible by 7
  3    5^3 − 1 = 124      Not divisible by 7
  4    5^4 − 1 = 624      Not divisible by 7
  5    5^5 − 1 = 3124     Not divisible by 7
  6    5^6 − 1 = 15624    Divisible by 7
Using the seed X0 = 3, the generated sequence is as follows.

  i    Xi
  0     3
  1     1
  2     5
  3     4
  4     6
  5     2
  6    (3)
The number in parentheses shows where the sequence of Xi's starts to repeat. The
maximum period of m − 1 = 6 is then achieved using any value of X0 not equal
to zero, and the resulting sequence of random numbers

{3/7, 1/7, 5/7, 4/7, 6/7, 2/7, 3/7, . . .},

when ordered, i.e.,

{1/7, 2/7, 3/7, 4/7, 5/7, 6/7, 1/7, . . .},

produces a gap of 1/7.
Once again we point out that using a large value of m such as m = 2^48 and
having a maximum period of m (when c ≠ 0) or m/4 (when c = 0) will result
in a small gap and a large period for appropriately chosen values of a and c,
and this will mask the discrete and the repetitive nature of the numbers being
generated.
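These period results are easy to check numerically. The sketch below (our own Python) measures cycle lengths directly and confirms the m/4 period for m = 64 with a = 13 and c = 0, the full period m when c is odd and a = 1 mod(4), and the period m − 1 = 6 for the prime modulus m = 7 with a = 5:

```python
def period(a, c, m, x0):
    """Cycle length of X_{n+1} = (a*X_n + c) mod m from the seed x0."""
    seen = {}
    x, i = x0, 0
    while x not in seen:
        seen[x] = i
        x = (a * x + c) % m
        i += 1
    return i - seen[x]

p_pow2_c0 = period(13, 0, 64, 1)   # m = 2^6, c = 0, odd seed -> m/4 = 16
p_pow2_c1 = period(5, 3, 64, 0)    # c = 3 odd, a = 5 = 1 mod(4) -> full period 64
p_prime   = period(5, 0, 7, 3)     # m = 7 prime, a = 5 -> m - 1 = 6
```

The constants a = 5 and c = 3 in the second call are our own illustrative choices satisfying the stated conditions; any odd c with a = 1 mod(4) behaves the same way.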
19
Hypothesis Testing for Uniformity

In testing for uniformity, the null hypothesis is

H0 : {R1 , R2 , R3 , . . . , RN } are uniform on the interval [0, 1)

and failure to reject this null hypothesis means that no evidence of non-uniformity
has been detected on the basis of the test.
Note that this does not imply that further testing of the generator for uniformity
is unnecessary because no test can ever guarantee that the generated numbers
are distributed uniformly on the interval [0, 1).
Hypothesis Testing for Independence
In testing for independence, the null hypothesis is
H0 : {R1 , R2 , R3 , . . . , RN } are independent on the interval [0, 1)
(neglecting the seed R0 ) and failure to reject this null hypothesis means that no
evidence of dependency has been detected on the basis of this test.
Note that this does not imply that further testing of the generator for independence is unnecessary, because no test can ever guarantee that the generated
numbers are independent.
Level of Significance

For each of the above tests, a level of significance α must be stated. This level
of significance is the probability of rejecting the null hypothesis given that the
null hypothesis is true, and it is known as a Type I (α) error, i.e.,

α = Pr(Reject H0 | H0 is true),    (8a)

which is often referred to as a false positive. The decision maker sets the value
of α, and usually α is set to a small value such as 0.01 or 0.05. This then says
that the probability that you reject the null hypothesis, given that it is true (i.e.,
make a false positive), would be small. Note that a Type II (β) error involves the
probability of accepting the null hypothesis given that the null hypothesis is false,
and is defined as

β = Pr(Accept H0 | H0 is false),    (8b)

and this is known as a false negative. Of course,

Pr(Accept H0 | H0 is true) = 1 − α  and  Pr(Reject H0 | H0 is false) = 1 − β

are not considered errors.
Note that we can never choose to accept H0 with certainty. We can only choose
to reject H0 (or accept H0) up to a certain significance level.
If several tests are made on a sequence of random numbers, the probability
of rejecting the sequence (making a Type I (α) error) on at least one test, by
chance alone, must increase. Similarly, if one test is conducted on many sets of
random numbers, the probability of rejecting at least one set (making a Type
I (α) error), by chance alone, must increase as well. For example, if 100 sets of
numbers were subjected to a particular test, with α = 0.05, it would be expected
that (100)(0.05) = 5 of these sets would be rejected by chance alone, or if one set
of numbers is subjected to 100 tests (all with the same level α), then this set of
numbers is expected to not pass (100)(0.05) = 5 of these tests by chance alone.
In general, if the number of rejections in N tests (all with the same level α) is
close to the expected number, Nα, then there is no compelling reason to discard
the generator that is being tested, since Nα rejections would normally occur by
chance alone. In addition, if a set of random numbers passes all the tests, it is
still no guarantee that the set is truly random, because it is always possible that
some underlying pattern will go undetected.
Frequency Tests
Basic tests that should always be performed to validate a new generator
of random numbers are tests for uniformity. At least two different methods of
testing are readily available. They are the Kolmogorov-Smirnov (KS) and the
chi-squared (2 ) tests and both of these tests measure the degree of agreement
between the distribution of a sample of generated random numbers and results
predicted by the theoretical uniform distribution U [0, 1). These both assume the
null hypothesis of no significant difference between the sample distribution and
the theoretical distribution.
The Kolmogorov-Smirnov (KS) Test
This test compares the empirical cdf SN (x) constructed from a sample of N
random numbers to the theoretical cdf F (x) of the standard uniform distribution.
For the standard uniform distribution U[0, 1), the theoretical cdf is given by

F(x) = 0, for x < 0
       x, for 0 ≤ x ≤ 1
       1, for 1 ≤ x

which is plotted below.

(plot of the cdf F(x) of the standard uniform distribution)
The Kolmogorov-Smirnov statistic is the largest absolute deviation between the
empirical and theoretical cdfs,

DN = max |SN(x) − F(x)|,    (9a)

and it can be shown that

Pr(DN ≤ x/√N) ≈ 1 − 2 Σ_{k=1}^{∞} (−1)^{k−1} e^{−2k²x²}    (9b)

for large values of N. For example, setting

Pr(DN ≤ D_{α,N} = x/√N) = 1 − 2 Σ_{k=1}^{∞} (−1)^{k−1} e^{−2k²x²} = 1 − α

gives

Pr(DN ≤ 1.22/√N) ≈ 1 − 2 Σ_{k=1}^{∞} (−1)^{k−1} e^{−2k²(1.22)²} ≈ 0.90 = 1 − 0.10,

Pr(DN ≤ 1.36/√N) ≈ 1 − 2 Σ_{k=1}^{∞} (−1)^{k−1} e^{−2k²(1.36)²} ≈ 0.95 = 1 − 0.05,

and

Pr(DN ≤ 1.63/√N) ≈ 1 − 2 Σ_{k=1}^{∞} (−1)^{k−1} e^{−2k²(1.63)²} ≈ 0.99 = 1 − 0.01.
When using the Kolmogorov-Smirnov test to compare a random sequence {R1, R2, R3, ..., RN}
against the standard uniform cdf, the test procedure follows the five steps below,
which can be easily performed using Microsoft Excel.

1.) Rank the sequence {R1, R2, R3, ..., RN} from smallest to largest. Specifically,
let R(i) denote the ith smallest observation, so that

R(1) ≤ R(2) ≤ R(3) ≤ · · · ≤ R(N).
2.) Compute

D+_N = max_{1≤i≤N} ( i/N − R(i) ),

which is the largest deviation of SN(x) above F(x), and

D−_N = max_{1≤i≤N} ( R(i) − (i − 1)/N ),

which is the largest deviation of SN(x) below F(x).

3.) Compute the sample statistic DN = max(D+_N, D−_N).

4.) Determine the critical value, D_{α,N}, using

D_{α,N} ≈ (1/√N) × 1.22, for α = 0.10
          (1/√N) × 1.36, for α = 0.05
          (1/√N) × 1.63, for α = 0.01

when the sample size N is larger than 35, which is usually the case.
5.) If the sample statistic DN is greater than the critical value, D_{α,N}, the null
hypothesis that the sample data are a sample from a standard uniform
distribution is rejected. If DN ≤ D_{α,N}, we conclude that no difference has
been detected between the true distribution of {R1 , R2 , R3 , . . . , RN } and the
standard uniform distribution U[0, 1).
Example #7: The Kolmogorov-Smirnov (KS) Test
Consider the small set of N = 5 random numbers
{0.44, 0.81, 0.14, 0.05, 0.93}.
The calculations can be arranged in the following table, where entries that would
be negative are omitted.

  Ri      R(i)     i/N     i/N − R(i)    R(i) − (i − 1)/N
 0.44     0.05     0.20       0.15            0.05
 0.81     0.14     0.40       0.26
 0.14     0.44     0.60       0.16            0.04
 0.05     0.81     0.80                       0.21
 0.93     0.93     1.00       0.07            0.13

                           D+ = 0.26        D− = 0.21

so that the sample statistic is DN = max(0.26, 0.21) = 0.26.

(plot of the empirical cdf S5(x) against the uniform cdf F(x))
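The five-step procedure is mechanical enough to script. This Python sketch (our own) reproduces the calculation for the five numbers of this example:

```python
def ks_statistics(sample):
    """Return (D_plus, D_minus) for a sample tested against U[0, 1)."""
    r = sorted(sample)                             # step 1: rank the data
    n = len(r)
    d_plus = max((i + 1) / n - r[i] for i in range(n))   # deviation above F(x)
    d_minus = max(r[i] - i / n for i in range(n))        # deviation below F(x)
    return d_plus, d_minus

d_plus, d_minus = ks_statistics([0.44, 0.81, 0.14, 0.05, 0.93])
d_n = max(d_plus, d_minus)
# d_plus = 0.26 and d_minus = 0.21, matching the hand calculation,
# so the sample statistic is d_n = 0.26.
```
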
The Chi-Squared Test

The chi-squared test uses the sample statistic

χ² = Σ_{i=1}^{n} (Oi − Ei)²/Ei,    (10a)
where Oi is the observed number of observations in the ith class, Ei is the expected
number of observations in the ith class (based on the random variable one believes
the observations come from), and n is the number of classes chosen. Of course,
we must have

Σ_{i=1}^{n} Oi = N,    (10b)
and for the uniform distribution, the expected number of observations in each
class is Ei = N/n for equally-sized classes. We now present an intuitive argument
showing that the χ² sampling distribution (for large values of N) is approximately
the chi-squared distribution with n − 1 degrees of freedom. Tables of the
percentage points of the chi-squared distribution with ν degrees of freedom for
different values of α are also easily obtained.
An Intuitive Argument Showing that χ² ≈ χ²_{n−1} - Optional

The statistic

χ² = Σ_{i=1}^{n} (Oi − Ei)²/Ei  with  Σ_{i=1}^{n} Oi = N    (11)

can be understood intuitively as follows. Suppose, for example, that the classes
are based on the weight W of a customer, with the n = 6 classes

C1 = {W | W < 100 lbs},
C2 = {W | 100 lbs ≤ W < 150 lbs},
C3 = {W | 150 lbs ≤ W < 200 lbs},
C4 = {W | 200 lbs ≤ W < 250 lbs},
C5 = {W | 250 lbs ≤ W < 300 lbs},
C6 = {W | 300 lbs ≤ W },

and the data points coming in refer to customers entering a store. Then each
customer can be placed into one of these six classes depending on the weight of
the customer.
Now let Oi be the random variable on the number of data points coming in
and placed in class Ci, for i = 1, 2, 3, ..., n. This is a random variable, just like the
number of customers coming into a store with weights less than 100 lbs is also
a random variable. Let Ei = E(Oi) be the expected value of Oi based on some
distribution, and to determine the variance of Oi, we must make some assumption
about the nature of Oi. As data points come into class Ci, suppose we make the
reasonable assumption that they come in according to a Poisson process, which
leads to a Poisson distribution with parameter λi t (with t = 1 time unit). Before
we continue, let us be reminded about the assumptions behind a Poisson process.
A Poisson Process - A Reminder
Consider a sequence of random events such as the arrival of units at a shop or
the arrival of data coming in as measurements. These events may be described
by a counting function N(t) (defined for all t ≥ 0), which equals the number
of events that occur in the closed time interval [0, t]. We assume that t = 0 is
the point at which the observations begin, whether or not an arrival occurs at
that instant, and we note that N(t) is a random variable with possible values
equal to the non-negative integers: 0, 1, 2, 3, . . .. Such an arrival process is called a
Poisson process with mean rate λ (per unit time) if the following three reasonable
assumptions are fulfilled.
A1: Arrivals occur one at a time: This implies that the probability of 2 or more
arrivals in a very small (i.e., infinitesimal) time interval Δt is zero compared
to the probability of 1 or 0 arrivals occurring in the same time interval Δt.

A2: N(t) has stationary increments: The distribution of the number of arrivals
between t and t + Δt depends only on the length of the interval Δt and not
on the starting point t. Thus, arrivals are completely random, without rush
or slack periods. In addition, the probability that a single arrival occurs in
a small time interval Δt is proportional to Δt.

A3: N(t) has independent increments: The numbers of arrivals in non-overlapping
time intervals are independent random variables.

Under these assumptions, N(t) has a Poisson distribution with parameter λt,
and one may use

E(N(t)) = λt = V(N(t))

as the mean and variance in the arrival of the data that is being studied.
Back to the Intuitive Argument that χ² ≈ χ²_{n−1}

Using this little reminder about the Poisson process, we then see from A1, A2
and A3 above that assuming the data comes in as one piece of data every
time unit is reasonable, and under this assumption we find that

E(Oi) = Ei  and  V(Oi) = Ei,

so that

σ(Oi) = √Ei.

If we then define the random variable

Zi = (Oi − Ei)/√Ei = (Oi − E(Oi))/√V(Oi),
then (using the normal approximation to the Poisson distribution, whose pdf
involves the factor e^{−(x−Ei)²/(2Ei)}) the distribution of Oi looks more and more
like a bell-shaped normal curve as Ei increases, as the following plots indicate.

(plots of the Poisson distribution of Oi for increasing values of Ei)

These plots indicate that

Zi = (Oi − Ei)/√Ei

is approximately

N(0, 1),

provided that Ei ≥ 5. The reason for choosing Ei ≥ 5, besides the visual indications in the above figures, is because if X ~ N(μ, σ), then
Pr(X < 0) = Pr( (X − μ)/σ < (0 − μ)/σ ) = Φ(−μ/σ) = ∫_{−∞}^{−μ/σ} (1/√(2π)) e^{−x²/2} dx.
(plot of Φ(−√μ) versus μ)

With μ = Ei and σ = √Ei, we have μ/σ = √μ, and so

Pr(X < 0) = Φ(−√μ) ≤ Φ(−√5) ≈ 0.0126

when μ ≥ 5. This shows that when N(μ, σ) is used to approximate the Poisson
distribution, only about 1% of the probability is in the forbidden region to
the left of zero.
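As a quick numerical check of this bound (our own Python, using the standard relation Φ(z) = (1 + erf(z/√2))/2):

```python
import math

def phi(z):
    """Standard normal cdf computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Probability that a N(mu, sqrt(mu)) variable falls below zero when
# mu = 5: the "forbidden region" of the normal approximation.
mu = 5.0
p_forbidden = phi(-mu / math.sqrt(mu))   # = phi(-sqrt(5))
```

The value comes out near 0.013, small enough that the normal approximation to the Poisson distribution is harmless for Ei ≥ 5.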
Now going back to Equation (11), we see that

Σ_{i=1}^{n} Zi² = Σ_{i=1}^{n} ( (Oi − Ei)/√Ei )² = χ²,

and since

Σ_{i=1}^{n} Oi = N,

this then says (for example) that On is completely known once O1, O2, ..., On−1
are known, which says that

χ² = Σ_{i=1}^{n−1} (Oi − Ei)²/Ei + (On − En)²/En = Σ_{i=1}^{n−1} (Oi − Ei)²/Ei + constant,

or

χ² = Σ_{i=1}^{n−1} Zi² + constant.
This now makes χ² approximately chi-squared with one less degree of freedom,
since only the Zi's for i = 1, 2, 3, ..., n − 1 are independent standard normal random
variables. This is why the statistic

χ² = Σ_{i=1}^{n} (Oi − Ei)²/Ei  with  Σ_{i=1}^{n} Oi = N    (12b)

is approximately a sample from the chi-squared distribution with n − 1 degrees
of freedom. The critical value χ²_{α,ν} of the chi-squared distribution with ν
degrees of freedom is defined by

F(χ²_{α,ν}) = Pr(X ≤ χ²_{α,ν}) = 1 − α,

and critical values such as χ²_{0.1,7} are easily obtained from tables or from
software.

When using the chi-squared test to compare a random sequence {R1, R2, R3, . . . , RN}
against the standard uniform distribution, the test procedure follows the steps
below.

1.) Divide the interval [0, 1) into n classes

C1 = [x0, x1),  C2 = [x1, x2),  . . . ,  Cn = [x_{n−1}, x_n),
for chosen values of x0 ≡ 0 < x1 < x2 < x3 < · · · < x_{n−1} < 1 ≡ xn. It is
recommended (but not necessary) that all n classes have the same size by
making xi = i/n for i = 1, 2, 3, ..., n − 1, so that

C1 = [0, 1/n),  C2 = [1/n, 2/n),  . . . ,  Cn = [(n − 1)/n, 1).
2.) Compute Oi as
Oi = # of {R1 , R2 , R3 , . . . , RN } in Ci
for each i = 1, 2, 3, . . . , n.
3.) Compute Ei using Ei = (xi − xi−1)N, as predicted by the standard uniform
distribution U[0, 1), for each i = 1, 2, 3, . . . , n, and (as demonstrated earlier)
it is recommended that the ith class be large enough so that Ei ≥ 5. When
using classes of equal size, we have Ei = N/n for each value of i, and then
Ei ≥ 5 says that we should choose n so that n ≤ N/5. In fact, it is usually
best to choose n so that

√N ≤ n ≤ N/5

when N ≥ 25.
4.) Compute the sample statistic

χ² = Σ_{i=1}^{n} (Oi − Ei)²/Ei,

which is approximately a sample from the chi-squared distribution χ²_{n−1}.

5.) Determine the critical value, χ²_{α,n−1}, from either a chi-squared table or from
the equation

Pr(X ≤ χ²_{α,n−1}) = (1/(2^{(n−1)/2} Γ((n − 1)/2))) ∫₀^{χ²_{α,n−1}} x^{(n−1)/2−1} e^{−x/2} dx = 1 − α.
Example #8: The Chi-Squared Test

Consider the following set of N = 100 random numbers.

0.90 0.76 0.99 0.31 0.71 0.17 0.51 0.43 0.39 0.26
0.25 0.79 0.77 0.17 0.23 0.99 0.54 0.56 0.84 0.97
0.89 0.64 0.67 0.82 0.19 0.46 0.01 0.97 0.24 0.88
0.87 0.70 0.56 0.56 0.82 0.05 0.81 0.30 0.40 0.64
0.44 0.81 0.41 0.05 0.93 0.66 0.28 0.94 0.64 0.47
0.12 0.94 0.52 0.45 0.65 0.10 0.69 0.96 0.40 0.60
0.21 0.74 0.73 0.31 0.37 0.42 0.34 0.58 0.19 0.11
0.46 0.22 0.99 0.78 0.39 0.18 0.75 0.73 0.79 0.29
0.67 0.74 0.02 0.05 0.42 0.49 0.49 0.05 0.62 0.78
Using the n classes

C1 = [0, 1/n),  C2 = [1/n, 2/n),  . . . ,  Cn = [(n − 1)/n, 1),

so that

Ei = ( i/n − (i − 1)/n ) N = N/n

for each class, we have (for n = 10) the 10 classes:

C1 = [0, 0.1),  C2 = [0.1, 0.2),  C3 = [0.2, 0.3),  . . . ,  C10 = [0.9, 1.0),

and the expected value for each class is Ei = 100/10 = 10 ≥ 5. Using these 100
numbers we generate the next table.
Class    Oi     Ei    Oi − Ei    (Oi − Ei)²    (Oi − Ei)²/Ei
 C1       7     10      −3           9             0.9
 C2       9     10      −1           1             0.1
 C3       8     10      −2           4             0.4
 C4       9     10      −1           1             0.1
 C5      14     10      +4          16             1.6
 C6       7     10      −3           9             0.9
 C7      10     10       0           0             0.0
 C8      15     10      +5          25             2.5
 C9       9     10      −1           1             0.1
 C10     12     10      +2           4             0.4
         100    100      0                     χ² = 7.0
Using a significance level of α = 0.05 with n − 1 = 9 degrees of freedom, solving

(1/(2^{9/2} Γ(9/2))) ∫₀^{χ²_{0.05,9}} x^{7/2} e^{−x/2} dx = 0.95

and resulting in χ²_{0.05,9} ≈ 16.919, we see that χ² < χ²_{0.05,9}, and so the null hypothesis
that the 100 numbers come from a standard uniform distribution should not be
rejected on the basis of this test and significance level.
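The arithmetic of this example is easily mechanized. The sketch below (our own Python) recomputes the χ² statistic from the observed class counts in the table above and compares it to the tabulated critical value χ²_{0.05,9} ≈ 16.919:

```python
# Observed counts O_i for the classes C1 through C10 above.
observed = [7, 9, 8, 9, 14, 7, 10, 15, 9, 12]
N = sum(observed)                  # 100 numbers in total
n = len(observed)                  # 10 equally-sized classes
expected = N / n                   # E_i = 10 for every class

chi_sq = sum((o - expected) ** 2 / expected for o in observed)

CRITICAL_0_05_9 = 16.919           # chi-squared table value, 9 d.o.f.
reject = chi_sq > CRITICAL_0_05_9  # False: do not reject uniformity
```
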
Both the Kolmogorov-Smirnov and the chi-squared tests are acceptable for
testing the uniformity of a sample of data. The Kolmogorov-Smirnov test is
the more powerful of the two, since it directly compares cdfs, and so it is the
more recommended of the two. Furthermore, the Kolmogorov-Smirnov test can
be applied to small sample sizes, whereas the chi-squared test is valid only for
large samples, so that each Ei ≥ 5.
Testing for uniformity is certainly important, but it does not tell the whole
story. It should be noted that the order in which the Ri's are computed has no
effect on the conclusions drawn from the Kolmogorov-Smirnov and chi-squared
tests for uniformity, but that order is certainly important from the perspective
of giving the appearance of independence, as the next example shows.
Example #9: A Perfect Random-Number Sequence Or Not
Consider the sequence

{X1, X2, X3, . . . , XN}

generated using X0 = m − 1 and

X_{i+1} = (X_i + 1) mod m.

Such a sequence must always lead to

{X1, X2, X3, . . . , Xm} = {0, 1, 2, 3, . . . , m − 1},

which then repeats in a maximum cycle of length m. It is clear that the resulting
random numbers

{R1, R2, R3, . . . , Rm} = {0, 1/m, 2/m, 3/m, . . . , (m − 1)/m}

would easily pass any Kolmogorov-Smirnov and chi-squared tests, since we would
always find that Dm = 0 and χ² = 0 for any choice of classes C1, C2, C3, . . . ,
Cn. Yet such a sequence definitely does not look random. This set of numbers
would pass all possible frequency tests with ease, but the ordering of the numbers
produced by the generator would not be random, and so these numbers would not
pass any tests for independence.
In fact, in general, one can take any sequence of random numbers that would
pass all possible frequency tests and simply rearrange them (e.g., in increasing
order), and these same numbers would easily fail any type of independence test.
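A quick numerical sketch of this example (the runs-counting helper is our own; the expected-runs figure (2n − 1)/3 is the standard value used by the runs test described next):

```python
m = 100
# The "perfect" sequence 0, 1/m, 2/m, ..., (m-1)/m from Example #9.
r = [i / m for i in range(m)]

# Frequency (chi-squared) statistic with 10 equal classes: exactly 0,
# since every class receives exactly m/10 = 10 of the values.
observed = [0] * 10
for x in r:
    observed[min(int(x * 10), 9)] += 1
chi2 = sum((o - m / 10) ** 2 / (m / 10) for o in observed)

# A crude independence check: count the "runs up and down".
# A random ordering of 100 distinct values averages about
# (2*100 - 1)/3 = 66.3 runs; this sequence is one single run up.
runs = 1 + sum(1 for i in range(1, m - 1)
               if (r[i] > r[i - 1]) != (r[i + 1] > r[i]))
print(chi2, runs)  # 0.0 1
```

The perfect frequency score together with the single run makes the failure of independence immediate.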
There are many tests for independence, some of which include:

Runs Test: Tests the runs up and down, or the runs above and below
the mean, by comparing the actual values to the expected values as
predicted by the standard uniform distribution U[0, 1). The statistics
for comparison are the Standard Normal Distribution and the Chi-Squared
Distribution.

Autocorrelation Test: Tests the correlation between the generated numbers
and compares the sample correlation to the expected correlation
of zero as predicted by the standard uniform distribution U[0, 1). The
statistic for comparison is the Standard Normal Distribution.

Gap Test: Counts the number of digits that appear between repetitions
of a particular digit and then uses the Kolmogorov-Smirnov (KS)
test to compare this with the expected size of gaps as predicted by a
geometric distribution. The statistic for comparison is the Geometric
Distribution.

Poker Test: Treats numbers grouped together as a poker hand. For
example, a five-digit number 0.11433 can be thought of as a five-card
poker hand having two pairs, or 0.22222 can be thought of as a five-card
poker hand having five of a kind, and so on. Then, a chi-squared (χ²)
statistic is used to compare the frequency of these poker hands to
the expected frequencies.
Since dG(z)/dz = 1, we get

g(z) = { 1, for 0 ≤ z < 1
         0, otherwise },

which is also the pdf of a uniform distribution on the interval [0, 1), and so we
have shown that if F(x) is the cdf of some random variable X, then Z = F(X) ~
U[0, 1).
This means that (in theory) if F(x) is the cdf of some random variable X,
then R = F(X) has the continuous standard uniform distribution on the interval
[0, 1), and X = F⁻¹(R), where F⁻¹ is the inverse function of F (which always exists
since F(x) is a monotonically increasing function of x), has the distribution with pdf
f(x) = F′(x). Therefore if

{R1, R2, R3, . . . , RN}

is a random sample from U[0, 1), then

{X1, X2, X3, . . . , XN}  with  Xi = F⁻¹(Ri)

becomes a random sample from the random variable X having cdf F(x). Note that
in practice, it may be very difficult (if not impossible) to get a simple algebraic
form for F(x), and even if it is possible to get a simple algebraic form for F(x), it
may be very difficult (if not impossible) to get a simple algebraic form for F⁻¹(R).
For this reason, other methods such as the acceptance-rejection method have been
developed, and this method is discussed in detail in the ESE 603 course. Let us
now look at a few examples.
Example #10: Exponential Distribution with Parameter λ

The exponential distribution with parameter λ > 0 has pdf

f(x) = { λ e^{−λx}, for 0 ≤ x
         0,         for x < 0 }

and cdf

F(x) = ∫_{−∞}^{x} f(z) dz = { 1 − e^{−λx}, for 0 ≤ x
                              0,           for x < 0 }.

Setting R = F(X) = 1 − e^{−λX} and solving for X gives

X = F⁻¹(R) = −(1/λ) ln(1 − R).

Since 1 − R is also a standard uniform random variable whenever R is, we may
generate the random sample with

Xi = −(1/λ) ln(Ri)

instead of

Xi = −(1/λ) ln(1 − Ri).

This removes the need for the operation that subtracts each of the Ri's from 1,
which could result in considerable savings of computer time, especially if the value
of N (i.e., the size of the random sample) is very large. In Excel, this allows one
to use

−(1/λ)*LN(RAND())

to generate samples of X.
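In a general-purpose language the same generator is a one-liner; here is a hedged Python sketch (the function name is ours):

```python
import math
import random

def exponential_sample(lam, n, rng=None):
    """Inverse-transform samples from Exponential(lam) via
    X = -ln(1 - R)/lam. Since 1 - R is U(0,1] when R is U[0,1),
    this is the safe form of the shortcut X = -ln(R)/lam."""
    rng = rng or random.Random()
    return [-math.log(1.0 - rng.random()) / lam for _ in range(n)]

sample = exponential_sample(lam=2.0, n=100_000)
mean = sum(sample) / len(sample)   # should be close to 1/lam = 0.5
```

Using 1 − R inside the logarithm avoids the (rare but possible) evaluation of ln(0) when the generator returns exactly 0.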
The uniform distribution on [a, b) has pdf

f(x) = { 1/(b − a), for a ≤ x < b
         0,         otherwise }

and cdf

F(x) = { 0,               for x ≤ a
         (x − a)/(b − a), for a ≤ x ≤ b
         1,               for b ≤ x }.

Setting R = F(X) = (X − a)/(b − a) and solving for X gives X = a + (b − a)R,
so that if {R1, R2, R3, . . . , RN} is a random sample from U[0, 1), then
{X1, X2, X3, . . . , XN} with

Xi = a + (b − a)Ri

for i = 1, 2, 3, . . . , N, is a random sample from U[a, b). In Excel, this allows one
to use

a + (b − a)*RAND()

to generate samples of X ~ U[a, b).
Consider next the triangular distribution on [a, c] with mode at b (where
a < b < c), having pdf

f(x) = { 0,                for x ≤ a
         h(x − a)/(b − a), for a ≤ x ≤ b
         h(c − x)/(c − b), for b ≤ x ≤ c
         0,                elsewhere }

with peak height h = 2/(c − a), and cdf

F(x) = ∫_{−∞}^{x} f(z) dz = { 0,                        for x ≤ a
                              h(x − a)²/(2(b − a)),     for a ≤ x ≤ b
                              1 − h(c − x)²/(2(c − b)), for b ≤ x ≤ c
                              1,                        for c ≤ x }.

(A plot of this pdf rises linearly from a to b and falls linearly from b to c.)

To generate samples we may use the inverse transform method and set R = F(X), yielding

R = h(X − a)²/(2(b − a))  when  0 = F(a) ≤ R ≤ F(b) = h(b − a)/2

and

R = 1 − h(c − X)²/(2(c − b))  when  h(b − a)/2 = F(b) ≤ R ≤ F(c) = 1.

Solving these for X, and using h = 2/(c − a) so that h(b − a)/2 = (b − a)/(c − a),
gives

X = F⁻¹(R) = a + √((b − a)(c − a)R)  when  0 ≤ R ≤ (b − a)/(c − a)

and

X = F⁻¹(R) = c − √((c − b)(c − a)(1 − R))  when  (b − a)/(c − a) ≤ R ≤ 1.

Thus if {R1, R2, R3, . . . , RN} is a random sample from U[0, 1), then
{X1, X2, X3, . . . , XN} with

Xi = { a + √((b − a)(c − a)Ri),       for 0 ≤ Ri ≤ (b − a)/(c − a)
       c − √((c − b)(c − a)(1 − Ri)), for (b − a)/(c − a) ≤ Ri ≤ 1 }

is a random sample from this triangular distribution.
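The two-branch inverse translates directly into code; a Python sketch (the function name and the numeric check are our own choices):

```python
import math
import random

def triangular_sample(a, b, c, n, rng=None):
    """Inverse-transform samples from the triangular distribution
    on [a, c] with mode b (a < b < c)."""
    rng = rng or random.Random()
    cut = (b - a) / (c - a)              # F(b), where the two branches meet
    out = []
    for _ in range(n):
        r = rng.random()
        if r <= cut:
            out.append(a + math.sqrt((b - a) * (c - a) * r))
        else:
            out.append(c - math.sqrt((c - b) * (c - a) * (1.0 - r)))
    return out

xs = triangular_sample(0.0, 10.0, 40.0, 100_000)
mean = sum(xs) / len(xs)                 # E[X] = (a + b + c)/3 = 50/3
```

The standard library's random.triangular(low, high, mode) implements the same distribution and can serve as a cross-check.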
Consider next a random variable X having pdf

f(x) = (10/3)(x − x⁴)

for 0 ≤ x ≤ 1, and zero otherwise. (A plot of this pdf on [0, 1] rises from 0,
peaks near 1.6, and returns to 0 at x = 1.) Its cdf is

F(x) = ∫_0^x (10/3)(t − t⁴) dt = (5/3)x² − (2/3)x⁵

for 0 ≤ x ≤ 1, and the equation R = F(X) cannot be solved for X algebraically,
so a numerical root-finding scheme such as Newton's method must be used.
Newton's method for solving g(x) = 0 generates the iterates

x_{n+1} = x_n − g(x_n)/g′(x_n).

If the initial guess x0 is not too far away from a solution to g(x) = 0, then

lim_{n→∞} x_n = a solution to g(x) = 0.

Therefore if we want to solve for X given that R = F(X), then we let g(X) =
F(X) − R and get g′(X) = F′(X) = f(X), where f is the pdf of X, so that

X_{n+1} = X_n − g(X_n)/g′(X_n) = X_n − (F(X_n) − R)/f(X_n)

(with an initial guess of 0 < X0 < 1) could generate a sequence of values that
converge to the X for which R = F(X). For example, earlier we had f(x) = 10(x −
x⁴)/3 and F(x) = 5x²/3 − 2x⁵/3, and so solving

R = F(X) = (5/3)X² − (2/3)X⁵

leads to

X_{n+1} = X_n − (F(X_n) − R)/f(X_n) = X_n − ((5/3)X_n² − (2/3)X_n⁵ − R)/((10/3)(X_n − X_n⁴)),

which reduces to

X_{n+1} = X_n − (1/10)(2X_n⁵ − 5X_n² + 3R)/(X_n(X_n³ − 1)).
(A figure accompanying this discussion shows the successive Newton iterates
x0, x1, x2, x3, . . . converging to the solution.)
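The iteration above can be sketched in Python (assuming, as in the text, an initial guess strictly inside (0, 1)):

```python
def newton_inverse_cdf(r, x0=0.5, tol=1e-12, max_iter=50):
    """Newton's method for R = F(X) with the example pdf
    f(x) = (10/3)(x - x^4) and cdf F(x) = (5/3)x^2 - (2/3)x^5 on [0, 1]."""
    x = x0
    for _ in range(max_iter):
        f = (10.0 / 3.0) * (x - x ** 4)                   # g'(x) = f(x)
        big_f = (5.0 / 3.0) * x ** 2 - (2.0 / 3.0) * x ** 5
        step = (big_f - r) / f                            # g(x)/g'(x)
        x -= step
        if abs(step) < tol:
            break
    return x

x = newton_inverse_cdf(0.5)   # the X for which F(X) = 0.5
```

Applying this routine to each Ri in a uniform sample produces the desired random sample from f.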
Here the probabilities satisfy

Σ_{k=1}^{n} p_k = 1,

and the resulting cdf is the piecewise-linear function

F(x) = { 0,               for x ≤ x0
         c0 + m0(x − x0), for x0 ≤ x ≤ x1
         c1 + m1(x − x1), for x1 ≤ x ≤ x2
         c2 + m2(x − x2), for x2 ≤ x ≤ x3
         . . .
         1,               for xn ≤ x },

where the slopes are given by

mi = (c_{i+1} − c_i)/(x_{i+1} − x_i).
Applying the inverse transform method, we set R = F(X) and solve for X on
each linear piece to get

X = F⁻¹(R) = { x0 + (R − c0)/m0,                 for c0 ≤ R ≤ c1
               x1 + (R − c1)/m1,                 for c1 ≤ R ≤ c2
               x2 + (R − c2)/m2,                 for c2 ≤ R ≤ c3
               . . .
               x_{n−1} + (R − c_{n−1})/m_{n−1},  for c_{n−1} ≤ R ≤ cn },

so that if {R1, R2, R3, . . . , RN} is a random sample from U[0, 1), then

{X1, X2, X3, . . . , XN}

with

Xi = { x0 + (Ri − c0)/m0,                 for c0 ≤ Ri ≤ c1
       x1 + (Ri − c1)/m1,                 for c1 ≤ Ri ≤ c2
       x2 + (Ri − c2)/m2,                 for c2 ≤ Ri ≤ c3
       . . .
       x_{n−1} + (Ri − c_{n−1})/m_{n−1},  for c_{n−1} ≤ Ri ≤ cn }

is a random sample from the distribution having cdf F(x).
In many cases, however, the inverse of the cdf does not have a simple closed
form. If we are willing to approximate the inverse of the cdf, then we may still
be able to generate these random variates. For example, starting from F(x), we
may choose a value of n and values of x1, x2, x3, . . . , xn, and construct the
following table
Random Variable Values    Cumulative
(Increasing Order)        Probabilities
x1                        c1 = Pr(X ≤ x1) = F(x1)
x2                        c2 = Pr(X ≤ x2) = F(x2)
x3                        c3 = Pr(X ≤ x3) = F(x3)
. . .                     . . .
xn                        cn = Pr(X ≤ xn) = F(xn)
like that in the previous example, and then we may use the method described in
the previous example to generate a random sample from X. This says that if
{R1, R2, R3, . . . , RN} is a random sample from U[0, 1), then

{X1, X2, X3, . . . , XN}

with

Xi = { x1 + (Ri − F(x1))/m1,                 for F(x1) ≤ Ri ≤ F(x2)
       x2 + (Ri − F(x2))/m2,                 for F(x2) ≤ Ri ≤ F(x3)
       . . .
       x_{n−1} + (Ri − F(x_{n−1}))/m_{n−1},  for F(x_{n−1}) ≤ Ri ≤ F(xn) },

where

mi = (F(x_{i+1}) − F(x_i))/(x_{i+1} − x_i),

for i = 1, 2, 3, . . . , N, is an approximation to a random sample from the
distribution having cdf F(x). The larger we make the value of n, and the smaller
we make the intervals [xi, xi+1], the better the approximation, but also the more
computer work is involved and so the slower the algorithm.
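A Python sketch of this table-based approximation (the grid, the test cdf, and the function name are our own choices):

```python
import bisect
import math
import random

def table_inverse_sample(xs, cs, n, rng=None):
    """Approximate inverse-transform sampling from a tabulated cdf.

    xs: increasing x-values; cs: cumulative probabilities F(xs[i]).
    Linear interpolation is used between table entries."""
    rng = rng or random.Random()
    out = []
    for _ in range(n):
        r = rng.uniform(cs[0], cs[-1])       # stay within the table's range
        j = bisect.bisect_right(cs, r) - 1   # locate the bracketing entry
        j = min(j, len(xs) - 2)
        m = (cs[j + 1] - cs[j]) / (xs[j + 1] - xs[j])  # slope m_j
        out.append(xs[j] + (r - cs[j]) / m)
    return out

# Tabulate the standard exponential cdf F(x) = 1 - e^{-x} and sample from it.
xs = [i * 0.05 for i in range(201)]          # grid on [0, 10]
cs = [1.0 - math.exp(-x) for x in xs]
sample = table_inverse_sample(xs, cs, 50_000)
mean = sum(sample) / len(sample)             # close to 1 (truncated at 10)
```

Refining the grid trades extra table construction and look-up work for a better approximation, exactly as described above.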
Discrete Distributions
Samples from discrete distributions can also be generated using the inverse
transform method, either numerically through a table look-up procedure, or in
some cases algebraically with the final generation scheme in terms of a formula
involving the ceiling and/or floor functions.
Note that for the sake of this discussion involving discrete distributions, we
shall assume that R ~ U(0, 1], which includes 1 but not 0. This is done simply
out of convenience, and since the distributions U(0, 1), U(0, 1], U[0, 1) and
U[0, 1] differ only on events of probability zero, it really does not matter as long
as N (the sample size) is large. After all, getting exactly R = 0 or R = 1 should
be a very rare event. Let us illustrate the ideas with some examples.
Example #15: An Empirical Discrete Distribution
Suppose we have a random variable X with a discrete range space
RX = {x1 , x2 , x3 , . . . , xn }
and corresponding probabilities {p1 , p2 , p3 , . . . , pn }. Then we may construct the
following table of cumulative probabilities.
Random Variable Values    Probabilities    Cumulative
(Increasing Order)                         Probabilities
x1                        p1               c1 = p1
x2                        p2               c2 = c1 + p2
x3                        p3               c3 = c2 + p3
. . .                     . . .            . . .
xn                        pn               cn = c_{n−1} + pn = 1
The cdf is then the step function

F(x) = { c0 = 0,   for x < x1
         c1,       for x1 ≤ x < x2
         c2,       for x2 ≤ x < x3
         c3,       for x3 ≤ x < x4
         . . .
         c_{n−1},  for x_{n−1} ≤ x < xn
         cn = 1,   for xn ≤ x }.
Note that F is not continuous and it should not be made continuous using some
interpolation scheme. When applying the inverse transform method in this case,
we note that if
{R1 , R2 , R3 , . . . , RN }
is a random sample from U (0, 1], then
{X1 , X2 , X3 , . . . , XN }
with

Xi = F⁻¹(Ri) = { x1,  for 0 = c0 < Ri ≤ c1
                 x2,  for c1 < Ri ≤ c2
                 x3,  for c2 < Ri ≤ c3
                 . . .
                 xn,  for c_{n−1} < Ri ≤ cn = 1 }

is a random sample from X.
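The table look-up procedure can be sketched in Python (the function name is ours; bisect_left reproduces the boundary convention c_{j−1} < R ≤ c_j):

```python
import bisect
import random

def discrete_sample(values, probs, n, rng=None):
    """Table look-up (inverse transform) sampling from a discrete
    distribution: return x_j when c_{j-1} < R <= c_j."""
    rng = rng or random.Random()
    cum, total = [], 0.0
    for p in probs:
        total += p
        cum.append(total)                   # c_1, c_2, ..., c_n (~1.0)
    out = []
    for _ in range(n):
        r = rng.random()                    # U[0,1); R = 0 maps to x_1
        j = bisect.bisect_left(cum, r)      # first j with c_j >= r
        out.append(values[min(j, len(values) - 1)])  # guard vs. round-off
    return out

sample = discrete_sample([1, 2, 3], [0.2, 0.5, 0.3], 100_000)
```

No interpolation is performed; each Ri is simply mapped to the value whose cumulative interval contains it.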
Note that for generating discrete random variables, the inverse transform technique becomes a table look-up procedure, and unlike the case of a continuous
variable, interpolation should not be done. However, if the values of xi in the
above table are such that x_{i+1} − x_i is a constant (independent of i), then the
ceiling and/or floor functions may be used along with the expression for F 1 (R)
to generate a sample from a discrete distribution X, as we shall now demonstrate.
But first, let us be reminded of the ceiling and floor functions.
The Ceiling (Round Up) and Floor (Round Down) Functions
The ceiling function is defined by

⌈x⌉ = the smallest integer greater than or equal to x,          (13a)

(a plot of ⌈x⌉ versus x is a staircase lying on or above the line y = x)
and the floor function is defined by

⌊x⌋ = the largest integer less than or equal to x,              (13b)

(a plot of ⌊x⌋ versus x is a staircase lying on or below the line y = x).
Note that in general

⌊x⌋ ≤ x ≤ ⌈x⌉                                                   (13c)

and, when x is not an integer,

⌈x⌉ = 1 + ⌊x⌋.                                                  (13d)
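These are exactly the functions math.ceil and math.floor in Python, and identities (13c) and (13d) are easy to spot-check:

```python
import math

# Spot-check (13c) and (13d); (13d) requires x not be an integer.
for x in [2.3, -2.3, 0.5, -0.5, 7.99]:
    assert math.floor(x) <= x <= math.ceil(x)    # (13c)
    assert math.ceil(x) == 1 + math.floor(x)     # (13d)
print(math.ceil(-2.3), math.floor(-2.3))  # -2 -3
```

Note in particular the behavior for negative arguments: ⌈−2.3⌉ = −2 while ⌊−2.3⌋ = −3.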
As an illustration, consider the discrete uniform distribution on the k equally
likely values a + b, a + 2b, a + 3b, . . . , a + kb (each having probability 1/k).
Its cdf is

F(x) = { 0/k = 0,    for x < a + b
         1/k,        for a + b ≤ x < a + 2b
         2/k,        for a + 2b ≤ x < a + 3b
         . . .
         (k − 1)/k,  for a + (k − 1)b ≤ x < a + kb
         1,          for a + kb ≤ x },

and the table look-up procedure gives

Xi = F⁻¹(Ri) = { a + b,   for 0 < Ri ≤ 1/k
                 a + 2b,  for 1/k < Ri ≤ 2/k
                 a + 3b,  for 2/k < Ri ≤ 3/k
                 . . .
                 a + kb,  for (k − 1)/k < Ri ≤ 1 }.

Since the jth case reads Xi = a + jb for j − 1 < kRi ≤ j, i.e., j = ⌈kRi⌉, the
entire table collapses into the single formula

Xi = a + ⌈kRi⌉ b.                                               (14)
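Equation (14) gives a one-line generator; a Python sketch (the function name is ours):

```python
import math
import random

def discrete_uniform_sample(a, b, k, n, rng=None):
    """Samples from {a + b, a + 2b, ..., a + kb} via equation (14):
    X = a + ceil(k*R)*b with R ~ U(0, 1]."""
    rng = rng or random.Random()
    out = []
    for _ in range(n):
        r = 1.0 - rng.random()             # random() is U[0,1); 1-R is U(0,1]
        out.append(a + math.ceil(k * r) * b)
    return out

rolls = discrete_uniform_sample(a=0, b=1, k=6, n=60_000)   # a fair die
```

Using 1 − R keeps R in (0, 1], so ⌈kR⌉ always lands in {1, 2, . . . , k}.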
Suppose instead that we start from integers Zi taking values in {0, 1, 2, . . . ,
m − 1} and set r = ⌊m/k⌋. Let us agree that all those values of Zi satisfying
0 ≤ Zi < r will be assigned the value Xi = a + b, all those values of Zi satisfying
r ≤ Zi < 2r will be assigned the value Xi = a + 2b, all those values of Zi satisfying
2r ≤ Zi < 3r will be assigned the value Xi = a + 3b, and so on, up to all those
values of Zi satisfying (k − 1)r ≤ Zi < kr, which will be assigned the value
Xi = a + kb. In other words, all values of Zi satisfying

(j − 1)r ≤ Zi < jr

for j = 1, 2, 3, . . . , k are accepted and assigned the value Xi = a + jb. Since

(j − 1)r ≤ Zi < jr  if and only if  j − 1 ≤ Zi/r < j,

we see that

j = 1 + ⌊Zi/r⌋,

and so

Xi = a + (1 + ⌊Zi/r⌋) b

can be used to compute the value of Xi from the value of Zi whenever
1 + ⌊Zi/r⌋ ≤ k, and any value of Zi ≥ kr is rejected. The number of acceptable
values of Zi is then equal to kr = k⌊m/k⌋, and the number of rejected values is
m − kr = m − k⌊m/k⌋.
The efficiency of this procedure is the fraction of Zi values accepted,

kr/m = k⌊m/k⌋/m = ⌊x⌋/x,  where x = m/k.

(A plot of ⌊x⌋/x versus x = m/k rises toward 1 as x increases.) Since

1 − 1/x ≤ ⌊x⌋/x ≤ 1

for all x > 1, we see that the efficiency of the method is better than 0.9 (90%)
when m/k ≥ 10.
As another example, consider the geometric distribution with parameter
0 < p < 1, having pmf p(1 − p)^{k−1} for k = 1, 2, 3, . . . . Its cdf, for x ≥ 1, is

F(x) = Σ_{k=1}^{⌊x⌋} p(1 − p)^{k−1} = p (1 − (1 − p)^{⌊x⌋})/(1 − (1 − p)) = 1 − (1 − p)^{⌊x⌋}.

Setting R = 1 − (1 − p)^X and solving for X gives X = ln(1 − R)/ln(1 − p),
or simply

X = F⁻¹(R) = ⌈ln(1 − R)/ln(1 − p)⌉.                             (15)
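Equation (15) likewise gives a one-line generator; a Python sketch (the function name and the guard for R = 0 are ours):

```python
import math
import random

def geometric_sample(p, n, rng=None):
    """Inverse-transform samples from the geometric distribution on
    {1, 2, 3, ...} using equation (15): X = ceil(ln(1-R)/ln(1-p))."""
    rng = rng or random.Random()
    out = []
    for _ in range(n):
        r = rng.random()                   # U[0,1); r = 0 would give ceil(0) = 0
        x = math.ceil(math.log(1.0 - r) / math.log(1.0 - p))
        out.append(max(1, x))              # guard the measure-zero edge case
    return out

sample = geometric_sample(p=0.25, n=100_000)
mean = sum(sample) / len(sample)           # E[X] = 1/p = 4
```

Both ln terms are negative, so the ratio is positive, and the ceiling maps it onto the support {1, 2, 3, . . . }.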
of independent random variables X1, X2, X3, . . . , Xn having a common
distribution will, with probability 1, converge to the mean μ = E(X) of that
common distribution. In other words,

Pr( lim_{n→∞} (1/n) Σ_{i=1}^{n} Xi = μ ) = 1.

Therefore, simulation is ideal for approximating the average (or expected value)
of a random variable by simply computing

(1/N) Σ_{i=1}^{N} Xi

for a large sample size N.
and then, since each of X, Y and Z are from U[0, 1), we have

A = ∫_0^1 ∫_0^1 ∫_0^1 (1/2) x z dx dy dz = 1/8.

The area worksheet that accompanies this chapter shows the result obtained
using a simulation for N = 5000 samples, and it agrees rather nicely with the
result of 1/8.
A more difficult analytical calculation is to compute the average perimeter of
such a triangle, since the perimeter of one such triangle is

P = X + √(Y² + Z²) + √((Y − X)² + Z²),

resulting in the average perimeter

E(P) = ∫_0^1 ∫_0^1 ∫_0^1 ( x + √(y² + z²) + √((y − x)² + z²) ) dx dy dz,

which splits as

E(P) = 1/2 + ∫_0^1 ∫_0^1 √(y² + z²) dy dz + ∫_0^1 ∫_0^1 ∫_0^1 √((y − x)² + z²) dx dy dz,

and numerical evaluation gives

∫_0^1 ∫_0^1 ∫_0^1 √((y − x)² + z²) dx dy dz ≈ 0.65176.
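Both averages are easy to estimate by simulation; a Python sketch (our reading of the example places the triangle's vertices at (0,0), (X,0) and (Y,Z), which reproduces the area and perimeter formulas above):

```python
import math
import random

def triangle_stats(n, rng=None):
    """Monte-Carlo estimates of the average area and average perimeter
    of the triangle with vertices (0,0), (X,0), (Y,Z), where X, Y, Z
    are independent U[0,1) random variables."""
    rng = rng or random.Random()
    area = perim = 0.0
    for _ in range(n):
        x, y, z = rng.random(), rng.random(), rng.random()
        area += 0.5 * x * z                              # base x, height z
        perim += x + math.hypot(y, z) + math.hypot(y - x, z)
    return area / n, perim / n

avg_area, avg_perim = triangle_stats(200_000)
# avg_area should be close to the exact answer 1/8 = 0.125
```

The simulation replaces the awkward triple integrals with two running sums.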
Such integrals are tedious to evaluate exactly, but the average perimeter is
easy to estimate by simulation: we simply compute the sample average

(1/N) Σ_{i=1}^{N} Pi
for a large number of samples N. Let us now see how we may use simulation to
estimate probabilities.
Computing Probabilities Using The Strong Law of Large Numbers
Using simulation to compute probabilities is an important application of the
strong law of large numbers. It works by constructing probabilities as expected
values. Toward this end, suppose that a sequence of independent trials of some
experiment is performed and suppose that E is some fixed event of the experiment
and suppose that E occurs with probability Pr(E) on any particular trial. Defining
the random variable X by

X = { 1, if E occurs on the trial
      0, if E does not occur on the trial },

we have

E(X) = (1) Pr(E) + (0) Pr(Eᶜ) = Pr(E),
showing that the expected value of X is the same as the probability of the
occurrence of E. Therefore, letting Xi be the value of X observed on the ith
trial, the strong law of large numbers gives

Pr( lim_{n→∞} (1/n) Σ_{i=1}^{n} Xi = E(X) = Pr(E) ) = 1,

or

Pr(E) = lim_{n→∞} (1/n) Σ_{i=1}^{n} Xi ≈ (1/N) Σ_{i=1}^{N} Xi

for a large number N of simulated trials.
Returning to the two-coins problem from the introduction, we have X1, X2 ~
U[0, L) and Y1, Y2 ~ U[0, W ), and the two coins will overlap when the distance
between their centers,

D = √((X2 − X1)² + (Y2 − Y1)²),

is less than or equal to the sum of their radii, i.e., when D ≤ R1 + R2. To compute
the probability that the two coins overlap requires that we compute

P = Pr(D ≤ R1 + R2).
Using simulation to solve this problem, we first use our random number generator
(RAND() in Microsoft Excel) to generate four independent random numbers R11,
R12, R13 and R14, and we use a + (b − a)R to generate a sample from U[a, b). Thus
we set

X11 = 0 + (L − 0)R11 = L R11  and  Y11 = 0 + (W − 0)R12 = W R12,

along with

X12 = 0 + (L − 0)R13 = L R13  and  Y12 = 0 + (W − 0)R14 = W R14,

to generate X11 ~ U[0, L), X12 ~ U[0, L), Y11 ~ U[0, W ) and Y12 ~ U[0, W ).
Then we compute

D1 = √((X12 − X11)² + (Y12 − Y11)²)

and, with Z the random variable defined by

Z = { 1, when D ≤ R1 + R2
      0, when D > R1 + R2 },

we set

Z1 = { 1, when D1 ≤ R1 + R2
       0, when D1 > R1 + R2 }.

We then use our random number generator (RAND() in Microsoft Excel) to
generate another four independent random numbers R21, R22, R23 and R24, and we
set

X21 = 0 + (L − 0)R21 = L R21  and  Y21 = 0 + (W − 0)R22 = W R22,
along with

X22 = 0 + (L − 0)R23 = L R23  and  Y22 = 0 + (W − 0)R24 = W R24.

Then we compute

D2 = √((X22 − X21)² + (Y22 − Y21)²)

and we set

Z2 = { 1, when D2 ≤ R1 + R2
       0, when D2 > R1 + R2 }.

Continuing this process and constructing Z3, Z4, . . . , we then use the fact that
Pr(D ≤ R1 + R2) = E(Z) to get

Pr(D ≤ R1 + R2) = E(Z) = lim_{n→∞} (1/n) Σ_{i=1}^{n} Zi ≈ (1/N) Σ_{i=1}^{N} Zi.
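The whole procedure can be collected into one routine; a Python sketch (the function name and the sample inputs are ours):

```python
import math
import random

def coin_overlap_probability(L, W, r1, r2, n, rng=None):
    """Monte-Carlo estimate of the probability that two coins of radii
    r1 and r2, whose centers land uniformly on an L-by-W sheet, overlap."""
    rng = rng or random.Random()
    hits = 0
    for _ in range(n):
        x1, y1 = L * rng.random(), W * rng.random()   # center of coin 1
        x2, y2 = L * rng.random(), W * rng.random()   # center of coin 2
        if math.hypot(x2 - x1, y2 - y1) <= r1 + r2:   # Z_i = 1
            hits += 1
    return hits / n                                    # (1/N) sum of Z_i

p = coin_overlap_probability(L=10.0, W=8.0, r1=0.5, r2=0.5, n=100_000)
```

As a sanity check, the estimate should grow as the radii grow, since overlap becomes easier.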