
BST 401 Probability Theory

Xing Qiu Ha Youn Lee

Department of Biostatistics and Computational Biology


University of Rochester

September 21, 2009

Qiu, Lee BST 401


Outline

1 Lebesgue-Stieltjes Measure and Distribution Functions

2 Measurable functions



Lebesgue measure review, the construction

Let F0 be the field generated by the collection of all intervals, and let µ0 be the usual length measure on intervals.
Extend µ0 to µ1 : G → R, where G consists of F0 together with the limiting sets of F0; on these limiting sets µ1 is defined by exchanging the order of limit and measure.
Extend µ1 to µ∗, an outer measure defined on 2^Ω. Unfortunately, µ∗ in general does not satisfy countable additivity.
Restrict µ∗ to the collection of measurable sets, denoted by F∗, which is a σ-algebra.
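For simple sets the outer measure is easy to compute by hand: when A is a finite union of intervals, the infimum over interval covers in the definition of µ∗ is attained by merging overlapping intervals and summing their lengths. A minimal Python sketch of that special case (the function name is my own, not from the course):

```python
def outer_measure(intervals):
    """Total length of a finite union of intervals, given as (a, b) pairs.
    For such sets the infimum over countable interval covers in the
    definition of the outer measure is attained by merging overlaps."""
    merged = []
    for a, b in sorted(intervals):
        if merged and a <= merged[-1][1]:
            # Overlapping or touching: extend the previous interval.
            merged[-1] = (merged[-1][0], max(merged[-1][1], b))
        else:
            merged.append((a, b))
    return sum(b - a for a, b in merged)

print(outer_measure([(0, 1), (0.5, 2), (3, 4)]))  # → 3.0
```

The general outer measure requires an infimum over all countable covers; this sketch only illustrates why that infimum is the "natural length" on sets where we already know the answer.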




Lebesgue measure review, the main results

The Carathéodory extension theorem: there exists one and only one way to extend a σ-finite measure µ0 on an algebra F0 to F, the σ-algebra generated by F0.
The measure approximation theorem: for any A ∈ F and any given ε > 0, there exists a set B ∈ F0 such that µ(A∆B) < ε.
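The approximation theorem is easiest to see with counting measure on a finite Ω, where every set is measurable and µ(A∆B) simply counts the points on which A and B disagree. A toy Python check (illustrative only):

```python
def sym_diff_measure(A, B):
    """Counting measure of the symmetric difference A Δ B:
    the number of elements on which the two sets disagree."""
    return len(set(A) ^ set(B))

# B approximates A up to an error of µ(AΔB) = 2.
print(sym_diff_measure({1, 2, 3, 4}, {2, 3, 4, 5}))  # → 2
```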




Generalizations

Slight generalization of Lebesgue measure: µ((a, b]) = F(b) − F(a), where F(·) is a non-decreasing, continuous function.
Further generalization: F(·) only needs to be right-continuous, so jumps are allowed.
Definition of right-continuity: F(xn) ↓ F(x) when xn ↓ x.
Such an F is called a Stieltjes measure function. If µ is a probability measure, F is called the distribution function of µ. We will use these two terms interchangeably.
Theorem (1.5), pg. 440: given such an F, there is a measure µ s.t. µ((a, b]) = F(b) − F(a).
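The relation µ((a, b]) = F(b) − F(a) is easy to experiment with numerically. The sketch below uses a hypothetical F with a jump at 0 (a point mass of 1/2 at 0 plus a uniform part on (0, 1]); the names are illustrative, not from the text:

```python
def F(x):
    """A non-decreasing, right-continuous Stieltjes measure function:
    point mass 1/2 at 0, plus uniform mass 1/2 spread over (0, 1]."""
    if x < 0:
        return 0.0
    return 0.5 + 0.5 * min(x, 1.0)

def mu_interval(a, b):
    """µ((a, b]) = F(b) - F(a), per Theorem (1.5)."""
    return F(b) - F(a)

print(mu_interval(-1, 0))  # interval captures the jump at 0 → 0.5
print(mu_interval(0, 1))   # the continuous part → 0.5
```

Because F is right-continuous with a jump at 0, the half-open interval (−1, 0] picks up the point mass, exactly the behavior the jump is meant to model.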




Definitions

The converse of Theorem (1.5) is almost true: for most measures we can define a distribution function. The only exceptions are measures with µ((a, b]) = ∞ for some finite interval.

Definition
A Lebesgue-Stieltjes measure on R is a measure µ : B → R such that µ(I) < ∞ for each bounded interval I.

Alternatively, we may define an L-S measure by an F(·) which satisfies the non-decreasing and right-continuity conditions.


Comments and examples

Page 1.4.5.



Discrete measure

Page 26.
Let µ be an L-S measure that is concentrated on a countable set S = {x1, x2, . . .}.
Its distribution function is a step function.
µ can be extended to 2^Ω.
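A discrete L-S measure and its step distribution function can be sketched directly; the point masses below are invented for illustration:

```python
# Hypothetical point masses on a countable (here finite) set S.
masses = {0.0: 0.2, 1.0: 0.5, 2.5: 0.3}

def mu(A):
    """Measure of an arbitrary subset A of R: since µ is concentrated
    on S, it extends to all of 2^Ω by summing the point masses in A."""
    return sum(m for x, m in masses.items() if x in A)

def F(x):
    """Distribution function F(x) = µ((-∞, x]): a step function that
    jumps at each point of S (right-continuous by the <= below)."""
    return sum(m for xi, m in masses.items() if xi <= x)

print(mu({1.0, 2.5}))  # → 0.8
print(F(1.0))          # jump at 1.0 included → 0.7
```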




Restriction

In the discrete measure example, we can restrict µ to 2^S instead of B; the restriction has essentially the same properties.
Remark: why don't we say we restrict µ to S?
Another restriction example: µ is concentrated on some interval [a, b].
Construct B[a, b]; then we can restrict µ to this σ-field without losing its mathematical properties.
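Computationally, restriction is trivial: the restricted measure is the same set function, evaluated only on a smaller σ-field. This also answers the remark above: we restrict to the collection of sets 2^S, not to the point set S, since a measure takes sets as arguments. A toy sketch with made-up weights:

```python
from itertools import combinations

S = ("a", "b", "c")
weights = {"a": 0.5, "b": 0.3, "c": 0.2}

def mu(A):
    """A discrete measure, defined here on any subset of S."""
    return sum(weights[x] for x in A)

def powerset(S):
    """2^S: the σ-field we restrict µ to."""
    return [frozenset(c) for r in range(len(S) + 1)
            for c in combinations(S, r)]

# The restriction of µ to 2^S: same values, smaller domain.
restricted = {A: mu(A) for A in powerset(S)}
print(restricted[frozenset({"a", "b"})])  # → 0.8
```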




L-S measure on Rn

Sketch of construction:
The analogue of intervals: rectangles.
Open rectangles, closed rectangles, semi-closed rectangles: each is determined by two vertices, so we use the same notation (a, b].
The smallest σ-field containing all rectangles: B(Rn).
An L-S measure on Rn is a measure µ : B(Rn) → R such that µ(I) < ∞ for each bounded rectangle I.
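A semi-closed rectangle (a, b] in Rn is just a product of semi-closed intervals determined by the two vertices a and b, so membership can be checked coordinate-wise. A small sketch:

```python
def in_rectangle(x, a, b):
    """Is x in the semi-closed rectangle
    (a, b] = (a1, b1] x ... x (an, bn]?"""
    return all(ai < xi <= bi for xi, ai, bi in zip(x, a, b))

print(in_rectangle((0.5, 1.0), (0, 0), (1, 1)))  # → True
print(in_rectangle((0.0, 0.5), (0, 0), (1, 1)))  # → False (left edge excluded)
```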




L-S Measure on Rn (II)

Its distribution function is defined to be F(x) = µ((−∞, x]).
These distribution functions are also:
Non-decreasing: for a ≤ b, that is, a1 ≤ b1, a2 ≤ b2, . . ., we have F(a) ≤ F(b).
Right-continuous: F is right-continuous in each variable.
Conversely, just as in the 1-dim case, for any non-decreasing, right-continuous function there exists a unique measure on B(Rn) corresponding to it.




Measure of a finite rectangle

This is a main difference from the 1-dim case: µ((a, b]) ≠ F(b) − F(a).
Draw a two-dimensional example to illustrate this point.
Pages 442-443 describe an elaborate way of measuring a rectangle by means of the distribution function, but that formula is not often used. The reason is that later we will see that we can define a density function for most common, useful distributions, and then there is an easy way to calculate the measure of a rectangle (or an arbitrary region, for that matter) using integrals.
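In two dimensions the correct rectangle formula is the inclusion-exclusion identity µ((a, b]) = F(b1, b2) − F(a1, b2) − F(b1, a2) + F(a1, a2); this is my statement of the 2-dim case of the formula the slides refer to. A quick numerical check against the uniform distribution on (0, 1]², where F(x, y) = xy on the unit square:

```python
def F(x, y):
    """2-dim distribution function of the uniform measure on (0,1]^2."""
    clamp = lambda t: max(0.0, min(t, 1.0))
    return clamp(x) * clamp(y)

def mu_rect(a, b):
    """µ((a, b]) by 2-dim inclusion-exclusion on F.
    Note this is NOT simply F(b) - F(a)."""
    (a1, a2), (b1, b2) = a, b
    return F(b1, b2) - F(a1, b2) - F(b1, a2) + F(a1, a2)

# Area of (0.25, 0.75] x (0.25, 0.75] under the uniform measure:
print(mu_rect((0.25, 0.25), (0.75, 0.75)))  # → 0.25
# The naive 1-dim formula F(b) - F(a) gives a different (wrong) value:
print(F(0.75, 0.75) - F(0.25, 0.25))        # → 0.5
```

The two corner terms with mixed coordinates are exactly what F(b) − F(a) omits, which is why the 1-dim formula fails in higher dimensions.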




An example of measurable function

A probability example to show the motivation. Let (Ω, F, P) be a probability space. To be more specific, let's say Ω = {HEAD, TAIL-a, TAIL-b},
F = {∅, Ω, {HEAD}, {TAIL-a, TAIL-b}},
P({HEAD}) = 1/2, P({TAIL-a, TAIL-b}) = 1/2.
P is a probability measure. Interpretation: the probability of seeing HEAD and the probability of seeing TAIL (of either type) are both 1/2.
Now define a function h on Ω as follows: h(HEAD) = −1, h(TAIL-a) = h(TAIL-b) = 1. (Such a function is sometimes called a coding function.)
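The coin space above is small enough to compute with directly. The sketch below represents F as a list of frozensets and evaluates P(h = c) as P(h⁻¹({c})), looking the preimage up in F:

```python
OMEGA = {"HEAD", "TAIL-a", "TAIL-b"}
F = [frozenset(), frozenset(OMEGA),
     frozenset({"HEAD"}), frozenset({"TAIL-a", "TAIL-b"})]
P = {frozenset(): 0.0, frozenset(OMEGA): 1.0,
     frozenset({"HEAD"}): 0.5,
     frozenset({"TAIL-a", "TAIL-b"}): 0.5}

h = {"HEAD": -1, "TAIL-a": 1, "TAIL-b": 1}  # the coding function

def prob_h_equals(c):
    """P(h = c) = P(h^{-1}({c})); well defined because the
    preimage is a member of F."""
    preimage = frozenset(w for w in OMEGA if h[w] == c)
    assert preimage in F, "preimage not measurable"
    return P[preimage]

print(prob_h_equals(-1))  # → 0.5
print(prob_h_equals(1))   # → 0.5
```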




An example of measurable function (II)

This function codes HEAD, TAIL-a, TAIL-b into numbers, which are much easier to process than plain English descriptions!
One nice consequence: we can talk about P(h = −1) and P(h = 1) instead of P({HEAD}) and P({TAIL-a, TAIL-b}).
h not only maps arbitrary events into tangible numbers, but also maps a measure defined on an arbitrary space to a measure defined on numbers.




Another example (II)

A more complex example: measuring the height of trees. Ω: trees (descriptive). P: a probability on events concerning the height of trees. h : Ω → R maps every tree to a real number (in centimeters, inches, etc.).
In this example, we may want to estimate probabilities of the form P(a < h(ω) ≤ b), i.e., the probability of a certain range of heights.
Notation: for any function h : Ω → R and a set A ⊂ R, define h−1(A) = {ω ∈ Ω | h(ω) ∈ A}, the preimage of A. Draw a diagram to show this set.
In this notation, P(a < h(ω) ≤ b) = P(h−1((a, b])).
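The preimage notation is mechanical to compute; the tree names and heights below are invented for illustration:

```python
heights = {"oak": 180.0, "pine": 250.0, "birch": 140.0}  # h : Ω → R

def preimage(h, pred):
    """h^{-1}(A) = {ω ∈ Ω : h(ω) ∈ A}, with A given as a predicate."""
    return {w for w, value in h.items() if pred(value)}

# h^{-1}((150, 260]): trees whose height lies in (150, 260].
print(sorted(preimage(heights, lambda v: 150 < v <= 260)))  # → ['oak', 'pine']
```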




Non-measurable function

Now let us define another function g on Ω: g(HEAD) = −1, g(TAIL-a) = 0, g(TAIL-b) = 1.
This function is bad because P(g = 0) and P(g = 1) are not well defined: the preimages {TAIL-a} and {TAIL-b} are not in F.
Gambling interpretation: TAIL-a has a probability that is not measurable for us players. All we can observe are: HEAD, lose one dollar; TAIL, most of the time (TAIL-b) we win one dollar, but sometimes the result is canceled by the casino (TAIL-a).
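The failure can be checked mechanically: g⁻¹({0}) = {TAIL-a} is not a member of F. A self-contained sketch using the coin space from the measurable-function example:

```python
OMEGA = {"HEAD", "TAIL-a", "TAIL-b"}
F = [frozenset(), frozenset(OMEGA),
     frozenset({"HEAD"}), frozenset({"TAIL-a", "TAIL-b"})]

g = {"HEAD": -1, "TAIL-a": 0, "TAIL-b": 1}

def is_measurable(f):
    """On this finite space, f is measurable iff every preimage
    f^{-1}({c}) lies in F (preimages of bigger sets are then
    finite unions of these, which a field also contains)."""
    return all(frozenset(w for w in OMEGA if f[w] == c) in F
               for c in set(f.values()))

print(is_measurable(g))                                       # → False
print(is_measurable({"HEAD": -1, "TAIL-a": 1, "TAIL-b": 1}))  # → True
```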




Random variables and Borel-measurable functions

Start with an arbitrary probability space (Ω, F, P).
A function h : Ω → R is called a random variable if h−1((a, b]) is always measurable, i.e., P(a < h(ω) ≤ b) is always well defined.
From the Carathéodory extension theorem, we can extend from intervals to Borel sets: if h is a random variable, then for any Borel set B, h−1(B) is measurable.
R can be extended to Rn, i.e., to functions taking vector values. These functions are called n-dimensional random vectors.
Furthermore, if P is replaced by an arbitrary measure, h is called an n-dimensional Borel-measurable function.
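For a finite probability space the defining condition can be exercised directly: compute h⁻¹((a, b]) and evaluate P on it, which also yields the induced (pushforward) distribution of the random variable on R. All names below are illustrative:

```python
# A finite probability space: P is given on the atoms, so F = 2^Ω
# and every subset of Ω is measurable.
P_atoms = {"w1": 0.2, "w2": 0.3, "w3": 0.5}
h = {"w1": -1.0, "w2": 0.5, "w3": 2.0}  # a random variable on Ω

def prob_in_interval(a, b):
    """P(a < h ≤ b) = P(h^{-1}((a, b]))."""
    return sum(p for w, p in P_atoms.items() if a < h[w] <= b)

print(prob_in_interval(0, 3))   # P(0 < h ≤ 3)  → 0.8
print(prob_in_interval(-2, 0))  # P(-2 < h ≤ 0) → 0.2
```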

