The binomial coefficient C(M, n) proves useful. By the binomial theorem,

(a + b)^M = Σ_{n=0}^{M} C(M, n) a^n b^(M−n).

If a = b = 1, this simplifies to

2^M = Σ_{n=0}^{M} C(M, n).
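The identity above is easy to check numerically; a minimal sketch in Python (the function name `binomial_sum` is my own):

```python
from math import comb

def binomial_sum(M, a=1, b=1):
    """Sum of C(M, n) * a^n * b^(M - n) over n = 0..M (binomial theorem)."""
    return sum(comb(M, n) * a**n * b**(M - n) for n in range(M + 1))

# With a = b = 1 the sum collapses to 2^M.
for M in range(8):
    assert binomial_sum(M) == 2**M

# The general identity gives (a + b)^M.
assert binomial_sum(5, a=2, b=3) == (2 + 3)**5
```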
Assume that the event space 𝒜 has the following properties:

Axiom 10: Ω ∈ 𝒜. That is, the event that one of the outcomes occurs is an event. Note that Pr[Ω] = 1.

Axiom 11: If A ∈ 𝒜, then A^c ∈ 𝒜, where A^c is the complement of A. That is, if A is an event, then not-A is an event.

Axiom 12: If A_1 and A_2 ∈ 𝒜, then A_1 ∪ A_2 ∈ 𝒜. That is, "either event happening" is an event.

Any collection of events that fulfills the above three assumptions/axioms is called a Boolean algebra.[8]
These three axioms/assumptions about event space, along with the definitions of events and event space, imply the following:

1. ∅ ∈ 𝒜. This follows from the first two axioms. Why?

2. If A_1 and A_2 ∈ 𝒜, then A_1 ∩ A_2 ∈ 𝒜.

3. If A_1, A_2, ..., A_n ∈ 𝒜, then ∪_{i=1}^{n} A_i ∈ 𝒜 and ∩_{i=1}^{n} A_i ∈ 𝒜.

Can you convince your fellow students that these three theorems follow logically from the three axioms?
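As a sanity check, one can test a candidate collection of subsets against the three axioms by brute force on a small finite Ω. A sketch (events represented as frozensets; the function name `is_boolean_algebra` and the example sets are my own choices):

```python
from itertools import chain, combinations

def is_boolean_algebra(omega, events):
    """Check the three axioms on a candidate event space over a finite omega."""
    events = set(events)
    if frozenset(omega) not in events:                       # Axiom 10: omega is an event
        return False
    if any(frozenset(omega - A) not in events for A in events):   # Axiom 11: complements
        return False
    if any(frozenset(A | B) not in events for A in events for B in events):  # Axiom 12: unions
        return False
    return True

omega = {1, 2, 3, 4}
# The power set of omega is always a Boolean algebra.
power_set = {frozenset(s) for s in chain.from_iterable(
    combinations(sorted(omega), r) for r in range(len(omega) + 1))}
assert is_boolean_algebra(omega, power_set)

# The implied theorems hold: the empty set is in, and intersections stay in.
assert frozenset() in power_set
assert all(frozenset(A & B) in power_set for A in power_set for B in power_set)

# A collection missing complements fails the check.
assert not is_boolean_algebra(omega, {frozenset(omega), frozenset({1})})
```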
[8] According to MathWorld, "The Boolean algebra of a set S is the set of subsets of S that can be obtained from S by means of a finite number of the set operations ∪, ∩, and complementation." For more details, see ??.
Wikipedia says, or used to say, "a Boolean algebra is an algebraic structure (a collection of elements and operations on them obeying defining axioms) that captures essential properties of both set operations and logic operations. Specifically, it deals with the set operations of intersection, union, complement; and the logic operations of AND, OR, NOT." See ??.
For example, if A_1 and A_2 ∈ 𝒜, then by the third axiom A_1 ∪ A_2 ∈ 𝒜. Then, by the second axiom, (A_1 ∪ A_2)^c ∈ 𝒜. But from De Morgan's law we know that (A ∪ B)^c = A^c ∩ B^c. So (A_1 ∪ A_2)^c = A_1^c ∩ A_2^c ∈ 𝒜.
Try to answer the following question from the first set of review questions:

1. The experiment is that a coin is flipped twice. How many outcomes are in the sample space, and what are they? Now define and enumerate the event space.
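One way to check your answer is to enumerate both spaces by brute force. A sketch, assuming the event space is taken to be all subsets of the sample space (which is fine here, since the sample space is finite):

```python
from itertools import chain, combinations, product

# Sample space for two coin flips: ordered pairs of H/T.
sample_space = [''.join(p) for p in product('HT', repeat=2)]
assert sample_space == ['HH', 'HT', 'TH', 'TT']   # 4 outcomes

# Taking the event space to be all subsets of the sample space gives
# 2^4 = 16 events, including the empty set and the sample space itself.
event_space = list(chain.from_iterable(
    combinations(sample_space, r) for r in range(len(sample_space) + 1)))
assert len(event_space) == 16
```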
1.6.2 The probability function, Pr[A]

Paraphrasing MGB, recollect that Ω denotes the sample space and that 𝒜 is the set of events, an algebra of events, of interest from some random experiment.

Definition 13: The axiomatic definition of probability: given our definitions of events and event space, and given the three axioms imposed on event space, a probability function Pr[·] will exist that maps events in 𝒜 onto the [0, 1] interval.[9] That is, there exists some function that identifies the probability associated with any event in 𝒜, Pr[A].
Assume it has the following three properties:

Axiom 14: Pr[A] ≥ 0 ∀ A ∈ 𝒜.

Axiom 15: Pr[Ω] = 1.

Axiom 16: If A_1, A_2, ..., A_n is a sequence of mutually exclusive events (A_i ∩ A_j = ∅ ∀ i, j, i ≠ j), and if ∪_{i=1}^{n} A_i ∈ 𝒜, then Pr[∪_{i=1}^{n} A_i] = Σ_{i=1}^{n} Pr[A_i].
From this definition of a probability function (with the three axioms) and the earlier definitions and axioms, it is possible to deduce a bunch of additional properties that a probability function must have (MGB), including:

Pr[∅] = 0

Pr[A^c] = 1 − Pr[A]

If A and B ∈ 𝒜 and A ⊂ B, then Pr[A] ≤ Pr[B]

If A_1, A_2, ..., A_n ∈ 𝒜, then Pr[A_1 ∪ A_2 ∪ ... ∪ A_n] ≤ Pr[A_1] + Pr[A_2] + ... + Pr[A_n]. This is called Boole's inequality.

If A and B ∈ 𝒜, then Pr[A] = Pr[A ∩ B] + Pr[A ∩ B^c].[10] That is, Pr[A] = Pr[AB] + Pr[AB^c].

If A and B ∈ 𝒜, then Pr[A ∩ B^c] = Pr[A] − Pr[A ∩ B]. That is, Pr[AB^c] = Pr[A] − Pr[AB].
Put simply, axiomatic probability theory builds up the notion of probability from a number of assumptions/axioms.

The axiomatic approach subsumes and incorporates the traditions of the Classical and Frequency approaches.
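The derived properties above can be illustrated on a small finite sample space. A sketch using classical (equally likely) probabilities for one roll of a die; the events A and B are arbitrary choices for illustration:

```python
from fractions import Fraction

def pr(event, omega):
    """Classical probability: N(A) / N(Omega) for a finite, equally likely Omega."""
    return Fraction(len(event), len(omega))

omega = frozenset(range(1, 7))          # one roll of a die
A = frozenset({1, 2, 3})
B = frozenset({2, 3, 4, 5})

assert pr(frozenset(), omega) == 0                          # Pr[empty set] = 0
assert pr(omega - A, omega) == 1 - pr(A, omega)             # complement rule
assert pr(A | B, omega) <= pr(A, omega) + pr(B, omega)      # Boole's inequality
assert pr(A, omega) == pr(A & B, omega) + pr(A - B, omega)  # Pr[A] = Pr[AB] + Pr[AB^c]
```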
[9] Note that 𝒜 consists of events, not numbers. That is, the domain of the function is events (collections of sets), so, formally speaking, Pr[A] is a set function.
You are likely more familiar with functions where the domain of the function is not a collection of sets, but rather all or part of the real line, or all or part of Euclidean N-space.
[10] Note that Pr[A ∩ B] ≡ Pr[AB] and Pr[A ∩ B^c] ≡ Pr[AB^c].
Now that we have defined the probability function Pr[A], we can return to the earlier assertion that all events are subsets of the sample space, but that for sample spaces of infinite size, some subsets of the sample space are not events. MGB, page 23, explain this as follows:

In our definition of event and 𝒜, a collection of events, we stated that 𝒜 cannot always be taken to be the collection of all subsets of Ω. The reason for this is that for "sufficiently large" Ω the collection of all subsets of Ω is so large that it is impossible to define a probability function consistent with the above axioms (referring to the three axioms above).
I am not sure I completely understand, but here is a shot at further explanation (maybe your other stats books have a better explanation).

MGB are defining events as being something different from a subset of the sample space: being a subset of the sample space is necessary but not sufficient.

This raises the question of what the other required conditions are. Reading between the lines, MGB are saying that an event is something that has a probability associated with it, and that probability must be consistent with the three axioms: Pr[A] ≥ 0 ∀ A ∈ 𝒜; Pr[Ω] = 1; and if A_1, A_2, ..., A_n is a sequence of mutually exclusive events (A_i ∩ A_j = ∅ ∀ i, j, i ≠ j) and ∪_{i=1}^{n} A_i ∈ 𝒜, then Pr[∪_{i=1}^{n} A_i] = Σ_{i=1}^{n} Pr[A_i].
So, if I read MGB correctly, they are saying that if there is a large enough number of subsets of the sample space, there is no probability function that maps all of those subsets into probabilities and fulfills the above axioms.

As an aside, this is not typically a problem when one works with an infinite sample space because, typically, one is only interested in a limited number of events. Consider, for example, a sample of one observation when the sample space is the real line from zero to one, including the end points. This sample space consists of an infinite number of samples/points, and there is an infinite number of events. That said, we can, for example, consider only two events, ω < 0.5 and ω ≥ 0.5, and easily define the probability function for both of these events in a way that is consistent with the above axioms.
Thinking about this raises the following question in my mind: if the sample space is Ω = {ω : 0 ≤ ω ≤ 1}, each number in this range is a possible sample (a point in the sample space), and also an event. Some of these samples can only be identified with an infinite number of digits, e.g. 0.5721...666...175...763.... One can explicitly define events that are ranges of numbers, but I am not sure about some of the individual numbers. There is also the question in my mind of the probability of drawing a specific number. Is it zero = 1/∞?
Returning to Feller's book, page 14:

(ital. added by Feller) The sample space provides the model of an ideal experiment in the sense that, by definition, every thinkable outcome of the experiment is completely described by one, and only one, sample point. It is meaningful to talk about an event A only when it is clear for every outcome of the experiment whether the event A has or has not occurred. The collection of all those sample points representing outcomes where A has occurred completely describes the event. Conversely, any given aggregate A containing one or more sample points can be called an event: this event does, or does not, occur according as the outcome of the experiment is, or is not, represented by a point of the aggregate A. We therefore define the word event to mean the same as an aggregate of sample points. We shall say that an event A consists of (or contains) certain points, namely those representing outcomes of the ideal experiment in which A occurs.
I take this to mean that Feller is defining an event and a subset of the sample space as the same thing, contradicting MGB.

However, on page 18 Feller says, "In this volume we shall consider only discrete sample spaces," in which case, I think, MGB and Feller would agree that events and subsets of the sample space are one and the same.[11]
1.6.3 Note that at this point, we know what an axiomatic probability function is, but don't know, in general, how to determine the probability of an event.
But we know how to calculate the probabilities if the number of outcomes is finite and the outcomes are equally likely (the classical world of probability). In this case

Pr[A] = N(A)/N(Ω),

where N(Ω) is the number of elements in Ω (the number of possible outcomes), and N(A) is the number of elements in A.

If the number of outcomes is finite but all outcomes are not equally likely, one can determine the probability of an event if one knows the probability associated with each element in Ω, Pr[ω_j]:

Pr[A] = Σ_{j∈A} Pr[ω_j].[12]
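A minimal sketch of this computation; the outcome probabilities below describe a made-up loaded die, chosen purely for illustration:

```python
# Probabilities for each outcome of a loaded die (they must sum to one).
pr_omega = {1: 0.25, 2: 0.25, 3: 0.10, 4: 0.10, 5: 0.10, 6: 0.20}
assert abs(sum(pr_omega.values()) - 1.0) < 1e-12

def pr_event(A):
    """Pr[A] = sum of Pr[omega_j] over the outcomes omega_j in the event A."""
    return sum(pr_omega[w] for w in A)

assert abs(pr_event({1, 2}) - 0.50) < 1e-12       # Pr[roll a 1 or a 2]
assert abs(pr_event({3, 4, 5, 6}) - 0.50) < 1e-12 # complement of the event above
```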
Consider a case where the sample space has an infinite number of elements, not necessarily all equally likely. In this case, one can estimate Pr[A] using the frequency definition of probability. Probabilities estimated using the frequency approach fulfill all the required properties (axioms) of the axiomatic definition of Pr[A], so the frequency approach is consistent with an axiomatic approach to probability.

Consider the case where the number of outcomes is finite, the outcomes are not equally likely, and one does not know the Pr[ω_j]. In this case one can also estimate Pr[A] using the frequency definition of probability.[13]

Much of what we do in econometrics to estimate probabilities is in the spirit of the frequency approach.
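A sketch of the frequency approach: estimate Pr[ω < 0.5] for a draw on [0, 1] by the relative frequency of the event across repeated trials. The uniform distribution and trial count here are my own choices for illustration:

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

# Frequency approach: relative frequency of the event {omega < 0.5}
# across many independent draws of omega uniform on [0, 1].
trials = 100_000
hits = sum(1 for _ in range(trials) if random.random() < 0.5)
estimate = hits / trials

assert abs(estimate - 0.5) < 0.01   # close to the true probability of 0.5
```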
[11] Feller defines a discrete sample space as follows: "A sample space is called discrete if it contains only finitely many points or infinitely many points which can be arranged into a simple sequence..." MGB would say such a sample space is countable.
[12] Note that equally likely is the special case where Pr[ω_j] = 1/N(Ω) ∀ j.
[13] Unless one has a large sample space, one will probably want to sample with replacement.
1.7 But don't I have to learn about how many possible samples there are if the population is of size M, where M is a finite number, and the sample size is N, N ≤ M?
It would seem so. Books on probability or statistics cover, or assume, this topic.

Determining probabilities is often all about how many different things can happen and how many of them will have some property.

The answer to how many things can happen is simple if the population size is infinite. In that case, there are an infinite number of samples and sometimes an infinite number of ways an event can occur.

When M is finite and N ≤ M, the answer depends on whether one samples with or without replacement, and on whether one is talking about ordered or unordered samples.

For the moment, we will limit ourselves to considering sampling without replacement (what economists typically do).
Consider two possible samples of 3 observations, (2, 55, 17) and (55, 2, 17), where each observation in the sample is identified by a number, so in the first sample the first observation is population member #2.

These are two different outcomes.

Are these two different samples, or are they the same sample? The answer depends on whether the order is important. If ordering is important, these are two distinct samples. If ordering is not important, they are the same sample.

Consider poker, 5-card hands: each possible hand is a different outcome. But in terms of the game, order is not important; for example, a hand with 3 kings and 2 jacks has the same meaning/value, independent of the order in which you were dealt the cards. So, 3 kings and 2 jacks is an event, an event generated by a number of different outcomes. How many different outcomes?
1.7.1 With the above in mind, how many outcomes (ordered samples) are there if one is sampling without replacement, the population is of size M, and the sample size is N?

M(M−1)(M−2)⋯(M−N+1)
= [M(M−1)(M−2)⋯(M−N+1)(M−N)(M−N−1)(M−N−2)⋯1] / [(M−N)(M−N−1)(M−N−2)⋯1]
= M!/(M−N)!
= (M)_N,

where X! = X(X−1)(X−2)(X−3)⋯1.

Why is it (M)_N? On the first draw there are M possibilities, on the next draw M−1, on the next M−2, and so on until N observations are drawn.

For example, if the population of interest, Ph.D. students in economics, is 56 and one randomly samples 10 individuals from this population, there are 56!/(56−10)! = 1.2921 × 10^17 possible samples, if the order in which the individuals were drawn matters.
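The falling factorial (M)_N can be checked directly; Python's `math.perm` computes exactly this count of ordered samples:

```python
from math import factorial, perm

M, N = 56, 10

# (M)_N = M * (M-1) * ... * (M-N+1): the number of ordered samples of
# size N drawn without replacement from a population of size M.
falling = 1
for i in range(N):
    falling *= M - i

assert falling == factorial(M) // factorial(M - N)   # same as M!/(M-N)!
assert falling == perm(M, N)                         # math.perm computes this directly
assert f"{falling:.4e}" == "1.2921e+17"              # the 1.2921 x 10^17 in the text
```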
1.7.2 How many ways are there if order is not important?
N! is the number of dierent ways one can order the same N observations (why?
)so, the answer is
M!
(M N)!N!
=
(M)
N
N!
=
_
M
N
_
For our students, the answer is
(56)!
(5610)!10!
= 3: 560 710
10
, many fewer than
when order is important. but still a big number.
_
M
N
_
is called the binomial coecient.
14
It is also known as a combination
or combinatorial number. An alternative notation is
M
C
N
and can be read as
"M choose N."
In terms of sets, where order does not matter, C(M, N) is the number of subsets of a set of M elements that have N elements, so it will be useful for counting events.
[14] For a cool graph of the binomial coefficient, see http://mathworld.wolfram.com/BinomialCoefficient.html
You should be able to derive the above formulae and explain what they mean.

A graph of the binomial coefficient C(4, k) = 4Ck, assuming a population size of 4 and 1 ≤ k ≤ 4:

[Plot of 4Ck against k, for k between 1 and 4]

Looks correct: C(4, 1) = 4.0, there are 4 subsets of size 1; C(4, 2) = 6.0, there are 6 subsets of size 2; C(4, 3) = 4.0; and C(4, 4) = 1.0.
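The four values read off the graph are quick to verify:

```python
from math import comb

# The values of 4Ck for k = 1, ..., 4, as read off the graph.
values = [comb(4, k) for k in range(1, 5)]
assert values == [4, 6, 4, 1]

# And including k = 0, the counts sum to 2^4, the 2^M identity again.
assert sum(comb(4, k) for k in range(0, 5)) == 2**4
```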
Note that the function C(n, k) is defined for non-integer values of k.
Now I will try to graph C(n, k) for 1 ≤ n, k ≤ 4.

[Surface plot of nCk over 1 ≤ n, k ≤ 4]
Note that the function C(n, k) is defined for non-integer values of n and k. It is much wilder looking if you include negative values for n and k.
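One way to see how C(n, k) extends to non-integer arguments is through the gamma function, since Γ(x + 1) generalizes x!. A sketch (the function `binom` is my own helper, not a library routine; at negative integers gamma has poles, which is why the surface looks wild once n or k goes negative):

```python
from math import gamma

def binom(n, k):
    """Generalized binomial coefficient Gamma(n+1) / (Gamma(k+1) * Gamma(n-k+1)).

    Defined for non-integer n and k wherever the gamma values are finite.
    """
    return gamma(n + 1) / (gamma(k + 1) * gamma(n - k + 1))

assert abs(binom(4, 2) - 6.0) < 1e-9   # agrees with the integer case C(4, 2)
assert binom(3.5, 1.25) > 0            # a perfectly sensible non-integer value
```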
1.7.3 How often will you use these formulae (2^M, (M)_N, and C(M, N))?

A lot, if you are a devotee of problems in classical probability (Ω is finite and each outcome is equally likely). Poker players should be devotees. These formulae give us shorthand ways to count, and classical probabilities (the probability of a full house in poker) are determined by counting.

When M is small one can often intuit the number of times an event will occur without explicitly using the appropriate formula.
We will sometimes use them. For example, (M)_N and C(M, N) appear in some discrete distribution functions; in particular, the binomial distribution contains the term C(M, N).
Binomial distribution: a digression. Continuing with this digression, a discrete random variable is said to have a binomial distribution if

Pr[x : n] = C(n, x) p^x (1 − p)^(n−x)   if x = 0, 1, 2, ..., n
          = 0                           otherwise,

where p is the probability of a "yes," so Pr[x : n] is the probability of observing x yeses out of n trials.
In explanation, p^x (1 − p)^(n−x) is the probability of one way of observing x yeses in n trials, but there are many ways of getting x yeses in n trials; specifically, there are C(n, x) ways.
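A sketch of the binomial pmf as defined above, checking that the probabilities over all possible x sum to one (the values of n and p are arbitrary choices):

```python
from math import comb

def binomial_pmf(x, n, p):
    """Pr[x : n] = C(n, x) * p^x * (1-p)^(n-x) for x = 0, 1, ..., n, else 0."""
    if x not in range(n + 1):
        return 0.0
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 10, 0.3
# The probabilities over all possible x must sum to one.
assert abs(sum(binomial_pmf(x, n, p) for x in range(n + 1)) - 1.0) < 1e-12

# One way of getting x yeses has probability p^x (1-p)^(n-x); there are C(n, x) ways.
assert abs(binomial_pmf(2, n, p) - comb(n, 2) * 0.3**2 * 0.7**8) < 1e-15
```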
For a cool interactive graph of the binomial distribution, go to http://demonstrations.wolfram.com/BinomialDistribution/
How might an econometrician use such a distribution? Let's say you want to model the probability of choosing beverage A over B as a function of the price of each, the amount of sugar in each, and whether the beverage is colored brown or not. One might assume that the probability of a yes to beverage A is[15]

p_A = e^(V_A) / (e^(V_A) + e^(V_B)),

where

V_j = β_s S_j + β_c C_j + β_p (price_j)
where S_j is the sugar content of beverage j, price_j is its price, and C_j = 1 if its color is brown, and zero otherwise. Substituting these functions for the V_j, one has p_A as a function of the prices and characteristics of the two beverages, p_A = f(price_A, price_B, S_A, S_B, C_A, C_B). Specifically,
p_A = f(price_A, price_B, S_A, S_B, C_A, C_B)
    = e^(β_s S_A + β_c C_A + β_p price_A) / [e^(β_s S_A + β_c C_A + β_p price_A) + e^(β_s S_B + β_c C_B + β_p price_B)]
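A sketch of this choice probability in code; the coefficient values and beverage characteristics below are made up purely for illustration:

```python
from math import exp

def v(beta_s, beta_c, beta_p, sugar, brown, price):
    """Deterministic utility V_j = beta_s*S_j + beta_c*C_j + beta_p*price_j."""
    return beta_s * sugar + beta_c * brown + beta_p * price

def p_choose_a(v_a, v_b):
    """Logit-style choice probability p_A = e^V_A / (e^V_A + e^V_B)."""
    return exp(v_a) / (exp(v_a) + exp(v_b))

# Made-up coefficients and characteristics, purely for illustration.
beta_s, beta_c, beta_p = 0.05, -0.2, -1.0
v_a = v(beta_s, beta_c, beta_p, sugar=30.0, brown=1, price=1.50)
v_b = v(beta_s, beta_c, beta_p, sugar=25.0, brown=0, price=1.25)

p_a = p_choose_a(v_a, v_b)
assert 0.0 < p_a < 1.0                                 # the form bounds p_A in (0, 1)
assert abs(p_a + p_choose_a(v_b, v_a) - 1.0) < 1e-12   # p_A + p_B = 1
```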
Plugging f(price_A, price_B, S_A, S_B, C_A, C_B) into the binomial distribution, one gets Pr[n_A : n], the probability that n_A of the n beverages you drink will be beverage A, as a function of the prices and characteristics of the two beverages:

Pr[x : n] = C(n, x) p_A^x (1 − p_A)^(n−x)   if x = 0, 1, 2, ..., n
          = 0                               otherwise
And

Pr[x : n] = C(n, x) (f(price_A, price_B, S_A, S_B, C_A, C_B))^x (1 − f(price_A, price_B, S_A, S_B, C_A, C_B))^(n−x)   if x = 0, 1, 2, ..., n
          = 0   otherwise
One could then collect data from a bunch of pairwise comparisons, allowing the prices and other characteristics to vary across the pairs. One could then use the data to come up with estimates of β_s, β_c, and β_p. Maximum likelihood estimation would be a good way to do this.
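A rough sketch of the maximum likelihood idea under the setup above: simulate binomial choice data at made-up "true" coefficients, write down the binomial log-likelihood, and check that it peaks at the simulating value. Here only β_p is searched over a crude grid, with the other coefficients held at their true values; a real application would maximize over all three coefficients with a numerical optimizer:

```python
import random
from math import exp, log

random.seed(1)

def p_a(beta_s, beta_c, beta_p, a, b):
    """Choice probability for beverage a over b; a, b = (sugar, brown, price)."""
    va = beta_s * a[0] + beta_c * a[1] + beta_p * a[2]
    vb = beta_s * b[0] + beta_c * b[1] + beta_p * b[2]
    return exp(va) / (exp(va) + exp(vb))

true = (0.05, -0.2, -1.0)          # made-up "true" beta_s, beta_c, beta_p
pairs = [((30.0, 1, pa_), (25.0, 0, pb_))
         for pa_ in (1.0, 1.5, 2.0) for pb_ in (1.0, 1.5, 2.0)]
n = 500                            # trials per pair of beverages

# Simulate how many times A is chosen in each of the nine settings.
data = [(a, b, sum(random.random() < p_a(*true, a, b) for _ in range(n)))
        for a, b in pairs]

def log_lik(beta_s, beta_c, beta_p):
    """Binomial log-likelihood; the C(n, x) term is constant in beta, so dropped."""
    ll = 0.0
    for a, b, x in data:
        p = p_a(beta_s, beta_c, beta_p, a, b)
        ll += x * log(p) + (n - x) * log(1 - p)
    return ll

# Crude grid search over beta_p, holding the other coefficients at their true values.
grid = [-2.0, -1.5, -1.0, -0.5, 0.0]
best = max(grid, key=lambda bp: log_lik(true[0], true[1], bp))
assert best == -1.0   # the likelihood peaks at the value used to simulate the data
```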
[15] I chose this functional form for the probability simply because it restricts p_A to be between zero and one, inclusive, and p_A + p_B = 1.