Chapter 3
. . . there are many functions which are not at all suited for approximation by
a single polynomial in the entire interval which is of interest.
G. Dahlquist and Å. Björck
A panorama
The main goal of this chapter is to begin exploring the deep connection between proba-
bility and analysis. That connection is fully described by measure theory. But it is not at
all obvious, which may be the reason measure theory was not applied to probability until
thirty-some years after it was developed.
Analysis becomes relevant when we ask more about probability than simply carrying
out familiar probability computations in discrete probability, e.g. chances of picking a
particular card from a deck. For example, much deeper mathematical issues arise when
we play a probability game over and over, e.g., how is the probability of choosing a king
from a single shuffled deck related to the results obtained when we choose at random a
single card from each deck in a large, or infinite, collection of decks? That relation is key
to using probability as a model for physical phenomena and developing a systematic way
to carry out probability computations.
To give an example of a deep question in probability, we begin by explaining the Law
of Large Numbers, which partially answers the question of what happens when we re-
peat a random experiment over and over. This theorem is a central theme because it is
connected to the use of probability as a mathematical model and we develop a number
Definition 3.1.1
Example 3.1.1
The experiment is to draw a card from a standard deck. We can classify the possible
outcomes in a number of ways, e.g.,
Sample space 1: A point in the space of 52 outcomes.
Note that sample spaces 2 and 3 are sets whose points are sets.
There is a special case of a sequence of trials that greatly simplifies analysis but is
still important.
Definition 3.1.2
Example 3.1.2
Example 3.1.3
Definition 3.1.3
Example 3.1.4
Definition 3.1.4
3. If A is an event and P (A) is the probability of A, then P (A) is the sum of prob-
abilities of the outcomes in A.
P is generally reserved for probability. The probability (function) on the events in
the sample space is the function that assigns a probability to each event. For later
reference, probability is a non-negative finitely additive set function.
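The idea of a non-negative, finitely additive set function can be sketched in a few lines of Python. The outcome weights below are an invented example, not taken from the text:

```python
from fractions import Fraction

# A probability function assigns a non-negative weight to each outcome,
# with total weight 1; the probability of an event (a set of outcomes)
# is the sum of the weights of its outcomes.
weights = {1: Fraction(1, 2), 2: Fraction(1, 4),
           3: Fraction(1, 8), 4: Fraction(1, 8)}

def P(event):
    """Probability of an event as a sum over its outcomes."""
    return sum(weights[o] for o in event)

print(P(set(weights)))  # the sure event has probability 1
print(P(set()))         # the impossible event has probability 0
```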
Probability is associated with randomness and uncertainty and the rules governing
probability reflect properties of the experiment. But, it is important to note that there
is nothing uncertain or random about the rules governing probability. The connection
to uncertainty or randomness comes through the interpretation of the probability values
placed on the outcomes and how those values are assigned.
Example 3.1.5
Consider the experiment of flipping a two-sided coin with a head side (H ) and a tail
side (T ). The sample space is {H , T }. Given the complexity of modeling the physics
of the motion through the flip to the catch, we might assign probability by assuming
that each outcome is equally likely, i.e. P (H ) = P (T ) = 1/2. The randomness or
uncertainty in the experiment is that, short of carrying out a complex predictive
physics simulation, we cannot predict which outcome will occur before the toss is
made.
In general, a common approach for assigning probabilities in the absence of any infor-
mation about probabilities of events is based on assuming each outcome is equally likely.
If the sample space has n outcomes, then P (any outcome) = 1/n. It follows that, for any event A, $P(A) = \frac{\text{number of outcomes in } A}{n}$.
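As a sketch of the equally-likely rule (the card encoding below is our own illustration, not the book's notation):

```python
from fractions import Fraction

# With n equally likely outcomes, each outcome has probability 1/n,
# so an event A has probability |A| / n.
ranks = ["A"] + [str(v) for v in range(2, 11)] + ["J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = [(r, s) for r in ranks for s in suits]
n = len(deck)  # 52 outcomes

def P(event):
    return Fraction(len(event), n)

kings = [card for card in deck if card[0] == "K"]
print(P(kings))  # 4/52 = 1/13
```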
Example 3.1.6
Example 3.1.7
Definition 3.1.5
In discrete probability, a sample space together with its power set and a set of prob-
abilities is called a probability space. If X is the sample space and P the probability,
then we write $(X, \mathcal{P}(X), P)$ to emphasize the three ingredients.
Definition 3.1.6
A sure event must occur in an experiment, so it contains the entire sample space.
An almost sure event is an event with probability one. An event with probability
zero happens almost never. An impossible event never occurs in an experiment,
so it is the event with no outcomes.
Definition 3.1.7
Probabilities must satisfy certain properties with respect to taking unions and inter-
sections of events. For example,
Theorem 3.1.1
Definition 3.1.8
Two events in a probability space are (mutually) exclusive if they have no outcomes
in common.
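A quick sketch with a uniform six-sided die (our own illustration) shows why exclusivity matters: probabilities of mutually exclusive events add, while overlapping events require subtracting the intersection:

```python
from fractions import Fraction

# Uniform six-sided die: P(event) = |event| / 6.
def P(event):
    return Fraction(len(event), 6)

evens, odds = {2, 4, 6}, {1, 3, 5}
# Mutually exclusive events share no outcomes, so probabilities add.
assert evens & odds == set()
assert P(evens | odds) == P(evens) + P(odds)

# Overlapping events double-count the intersection if simply added.
low = {1, 2, 3}
assert P(evens | low) == P(evens) + P(low) - P(evens & low)
print("additivity checks pass")
```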
Theorem 3.1.2
3.2 The Law of Large Numbers

The Law of Large Numbers describes what happens when we repeat a random experiment. It is a result that we discuss several more times throughout the book. In this
section, we give an elementary proof of a simple version that does not require measure
theory or any significant probability theory.
We work in a sample space associated with a given experiment. We assume that a
certain outcome O occurs with some specific probability x when the experiment is con-
ducted, but that we do not know x. How might we determine it? If we conduct a single
trial, O might result or it might not. In either case, it gives little information about x.
However, if we conduct a large number $m \gg 1$ of trials, intuition suggests O should oc-
cur approximately $x \cdot m$ times, at least most of the time. Another way of stating this
intuition is,
$$\frac{\text{number of times } O \text{ occurs}}{\text{total number of trials}} \approx \text{probability of } O.$$
But, we have to be wary about intuitive feelings:
Example 3.2.1
If we conduct a set of trials in which we flip a fair coin many times (m), we expect to
see around 50% (m/2) heads most of the time. However, it turns out the probability
of getting exactly m/2 heads in m flips is approximately
$$\frac{1}{\sqrt{m}}.$$
So, we have to be careful about how we state obtaining the expected result. Moreover,
we might have a run of either good or bad luck in the sequence of experiments that
undermines the intuition.
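This can be checked numerically: the exact probability of exactly m/2 heads is $\binom{m}{m/2}/2^m$, and it decays on the scale of $1/\sqrt{m}$ (the exact constant is beside the point here):

```python
import math

# Exact probability of exactly m/2 heads in m fair flips,
# compared with the 1/sqrt(m) scale on which it decays.
for m in (10, 100, 1000):
    p = math.comb(m, m // 2) / 2**m
    print(m, p, 1 / math.sqrt(m))
```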
Theorem 3.2.1: Law of Large Numbers
Given $\epsilon > 0$ and $\delta > 0$, for all sufficiently large m, if k denotes the number of times O occurs in m trials, then
$$P\left(\left|\frac{k}{m} - x\right| < \epsilon\right) > 1 - \delta. \qquad (3.4)$$
It is important to spend some time reading the conclusion of Theorem 3.2.1 and un-
derstanding its meaning. The theorem does not say O will occur exactly $x \cdot m$ times in m
trials, nor that O must occur approximately $x \cdot m$ times in m trials. The role of $\epsilon$ is that
it quantifies the way in which k/m approximates x, thus avoiding the issue in Ex. 3.2.1.
The role of $\delta$ is that it allows the possibility, however small in probability, that the se-
quence of trials can produce a result that is not expected. For example, we might have the
(mis)fortune to obtain all heads in the sequence of trials. By making $\epsilon$ small, we obtain a
better approximation to x. By making $\delta$ small, we obtain the expected result with higher
probability. The cost in each case is having to conduct a large number m of trials.
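A minimal Monte Carlo sketch of the theorem's content (the seed and trial counts are arbitrary choices of ours): the observed frequency k/m settles toward x as m grows.

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

def frequency(m, x=0.5):
    """Observed frequency k/m of an outcome with probability x in m trials."""
    k = sum(random.random() < x for _ in range(m))
    return k / m

# With more trials, |k/m - x| is small with high probability.
for m in (100, 10_000, 1_000_000):
    print(m, abs(frequency(m) - 0.5))
```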
As well as being interesting in its own right, the Law of Large Numbers (LLN) is
centrally important to various aspects of probability theory. For example, it is important
to the interpretation of the probability values assigned to events.
Example 3.2.2
We could try to describe the result of a particular coin flip for a fair coin determin-
istically by using physics involving the initial position on the thumb, equations de-
scribing the effects of force and direction of the flip, the effect of airflow, and so on.
In the absence of such a detailed computation for a particular flip, it is reasonable to
believe that the outcome is equally likely to be heads or tails.
The LLN describes how the assignment of a probability to the event in question can be
validated by repeated independent experiments.
The LLN can be proved using a very elementary argument based on the binomial
expansion. Of course, the binomial coefficients and binomial expansions are very impor-
tant in probability. We recall the basic ideas.
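The binomial coefficients and the binomial expansion can be checked directly (Python's math.comb computes C(m, k)):

```python
import math

# Binomial coefficients C(m, k) = m! / (k! (m - k)!).
m = 5
row = [math.comb(m, k) for k in range(m + 1)]
print(row)  # [1, 5, 10, 10, 5, 1]

# Binomial expansion of (x + (1 - x))^m = 1: the terms
# C(m, k) x^k (1 - x)^(m - k) sum to 1 for any x.
x = 0.3
terms = [math.comb(m, k) * x**k * (1 - x)**(m - k) for k in range(m + 1)]
print(sum(terms))  # 1.0 up to rounding
```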
Definition 3.2.1
Theorem 3.2.2
Definition 3.2.2
Theorem 3.2.3
Theorem 3.2.4
Proof. We prove Theorem 3.2.1 by proving that, given $\epsilon > 0$ and $\delta > 0$, for m sufficiently large,
$$\sum_{\substack{0 \le k \le m \\ |k/m - x| < \epsilon}} p_{m,k}(x) > 1 - \delta.$$
We estimate the complementary sum:
$$\sum_{\substack{0 \le k \le m \\ |k/m - x| \ge \epsilon}} p_{m,k}(x) \le \sum_{\substack{0 \le k \le m \\ |k/m - x| \ge \epsilon}} \frac{1}{\epsilon^2}\left(\frac{k}{m} - x\right)^2 p_{m,k}(x) \le \frac{1}{\epsilon^2 m^2}\, T_m(x),$$
where
$$T_m(x) = \sum_{k=0}^{m} (k - m x)^2\, p_{m,k}(x).$$
Using (3.5c) and (3.5d), we find $T_m(x) = m x(1-x)$, and so $T_m(x) \le \frac{m}{4}$, for all $0 \le x \le 1$.
Therefore,
$$\sum_{\substack{0 \le k \le m \\ |k/m - x| \ge \epsilon}} p_{m,k}(x) \le \frac{1}{4 m \epsilon^2} \quad \text{and} \quad \sum_{\substack{0 \le k \le m \\ |k/m - x| < \epsilon}} p_{m,k}(x) \ge 1 - \frac{1}{4 m \epsilon^2}, \quad 0 \le x \le 1. \qquad (3.6)$$
Remark 3.2.1
It is interesting to consider how the final line implies the result. Given $\epsilon$ and $\delta$, we
require
$$m \ge \frac{1}{4 \delta \epsilon^2}.$$
This can be achieved uniformly with respect to the value of x. However, increasing
the accuracy by decreasing $\epsilon$ requires a very substantial increase in the number of
trials m. This adverse scaling occurs again, unfortunately.
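The tail estimate in the proof can be checked numerically, assuming (consistently with the proof) that $p_{m,k}(x)$ denotes the binomial term $\binom{m}{k} x^k (1-x)^{m-k}$:

```python
import math

# p_{m,k}(x) = C(m, k) x^k (1 - x)^(m - k), the binomial term.
def p(m, k, x):
    return math.comb(m, k) * x**k * (1 - x)**(m - k)

# Total probability of the indices with |k/m - x| >= eps; the proof
# bounds this by 1 / (4 m eps^2), uniformly in x.
def tail(m, x, eps):
    return sum(p(m, k, x) for k in range(m + 1) if abs(k / m - x) >= eps)

m, eps = 200, 0.1
bound = 1 / (4 * m * eps**2)  # = 0.125
for x in (0.1, 0.3, 0.5, 0.7, 0.9):
    assert tail(m, x, eps) <= bound
print("tail bound holds for m =", m)
```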
Theorem 3.3.1
Assume $f \in C([a, b])$. Given $\epsilon > 0$, there is a polynomial $b_m$ of sufficiently high
degree m, such that
$$\sup_{[a,b]} |f(x) - b_m(x)| < \epsilon.$$
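A sketch of the theorem in action, assuming the standard Bernstein construction $b_m(f, x) = \sum_k f(k/m)\, p_{m,k}(x)$ with nodes $x_k = k/m$ on [0, 1] (the target function below is our own choice):

```python
import math

# Bernstein polynomial of degree m for f on [0, 1]:
#   b_m(f, x) = sum_k f(k/m) C(m, k) x^k (1 - x)^(m - k).
def bernstein(f, m, x):
    return sum(f(k / m) * math.comb(m, k) * x**k * (1 - x)**(m - k)
               for k in range(m + 1))

# f(x) = |x - 1/2| is continuous but has a corner, so no single Taylor
# expansion covers [0, 1]; the Bernstein polynomials still converge.
f = lambda t: abs(t - 0.5)
grid = [i / 200 for i in range(201)]
for m in (10, 100, 1000):
    err = max(abs(f(x) - bernstein(f, m, x)) for x in grid)
    print(m, err)  # the sup error shrinks as m grows
```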
We observe that the set of polynomials with rational coefficients is dense in the space
of all polynomials on [a, b ] with respect to the sup metric. The set of polynomials with
rational coefficients is a countable set. We conclude,
Theorem 3.3.2
C ([a, b ]) is separable.
Definition 3.3.1
Definition 3.3.2
For $f \in C([0, 1])$ and all x in $[0, 1]$,
$$|f(x) - b_m(f, x)| \le \frac{9}{4}\, \omega(f, m^{-1/2}). \qquad (3.7)$$
4
Using (3.5b), we write the error as
$$f(x) - b_m(f, x) = \sum_{k=0}^{m} f(x)\, p_{m,k}(x) - \sum_{k=0}^{m} f(x_k)\, p_{m,k}(x) = \sum_{k=0}^{m} \big(f(x) - f(x_k)\big)\, p_{m,k}(x).$$
We split the sum into the terms I with $|x_k - x| < \epsilon$ and the terms II with $|x_k - x| \ge \epsilon$. For I, each $|f(x) - f(x_k)| \le \omega(f, \epsilon)$, so $|I| \le \omega(f, \epsilon)$.
For II, we note that there is a C such that $|f(x)| \le C$, for $0 \le x \le 1$. Hence, II is small by
the Law of Large Numbers. More precisely, (3.6) in the proof of the LLN implies
$$|II| \le 2C \sum_{\substack{0 \le k \le m \\ |k/m - x| \ge \epsilon}} p_{m,k}(x) \le \frac{C}{2 m \epsilon^2}.$$
So, we can make II as small as desired by taking m large. It is a good exercise to show that
in fact,
$$|II| \le \omega(f, \epsilon)\left(1 + \frac{1}{4 m \epsilon^2}\right),$$
and so,
$$|f(x) - b_m(f, x)| \le \omega(f, \epsilon)\left(2 + \frac{1}{4 m \epsilon^2}\right).$$
Setting $\epsilon = m^{-1/2}$ proves the result.
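As a concrete check of the machinery, consider $f(x) = x^2$. Moment identities of the kind used in the LLN proof give the closed form $b_m(f, x) = x^2 + x(1-x)/m$ (a standard fact, stated here as our addition), so the sup error on [0, 1] is exactly $1/(4m)$, comfortably inside the bound (3.7):

```python
import math

def bernstein(f, m, x):
    """Bernstein polynomial sum_k f(k/m) C(m,k) x^k (1-x)^(m-k)."""
    return sum(f(k / m) * math.comb(m, k) * x**k * (1 - x)**(m - k)
               for k in range(m + 1))

# For f(x) = x^2 the closed form b_m(f, x) = x^2 + x(1 - x)/m holds,
# so the error x(1 - x)/m is maximized at x = 1/2, giving 1/(4m).
f = lambda t: t * t
m = 50
grid = [i / 1000 for i in range(1001)]
err = max(abs(f(x) - bernstein(f, m, x)) for x in grid)
print(err, 1 / (4 * m))  # both are (essentially) 0.005
```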
3.4 References
3.5 Worked problems