Probability Theory & Uncertainty
Read Chapter 13 of the textbook
Michael S. Lewicki, Carnegie Mellon AI: Probability Theory, Mar 1, 2007
What you will learn today
probability as uncertainty
There are almost always things that are unknown, or not precisely known.
Examples:
- bus schedule
- quickest way to the airport
- sensors
- joint positions
- finding an H-bomb
Probability
the precise representation of knowledge and uncertainty
Probability theory
how to optimally update your knowledge based on new information
What is the probability that the sea level will rise 1 meter within the century?
http://www.biz.uiowa.edu/iem/
The Iowa Electronic Markets are real-money futures markets in which contract
payoffs depend on economic and political events such as elections.
It is very difficult to recognize phonemes from naturally spoken speech when they are presented in isolation.
All modern speech recognition systems rely heavily on context (as do we).
This allows the recognition of words, even if there is a great deal of uncertainty in
each of the individual parts.
De Finetti's definition of probability

de Finetti showed that the price you set has to obey the axioms of probability or you face certain loss, i.e. you'll lose every time.
Axioms of probability
Axioms (Kolmogorov):
0 ≤ P(A) ≤ 1
P(true) = 1
P(false) = 0
P(A or B) = P(A) + P(B) - P(A and B)
Corollaries:
- The probabilities of all values of a single random variable must sum to 1:
  ∑_{i=1}^{n} P(D = d_i) = 1
- The joint probability of a set of variables must also sum to 1.
- If A and B are mutually exclusive:
  P(A or B) = P(A) + P(B)
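The axioms and the sum-to-1 corollary can be checked in a few lines of code. This is a minimal sketch, not from the lecture; the distribution over a four-valued variable D and the choice of events are illustrative.

```python
# Illustrative distribution: P(D = d_i) for i = 1..4.
probs = [0.1, 0.2, 0.3, 0.4]

# Axiom: each probability lies in [0, 1]; corollary: values of D sum to 1.
assert all(0.0 <= p <= 1.0 for p in probs)
assert abs(sum(probs) - 1.0) < 1e-12

# Inclusion-exclusion: P(A or B) = P(A) + P(B) - P(A and B).
# For the mutually exclusive events A = {d1}, B = {d2}, P(A and B) = 0,
# so the rule reduces to P(A or B) = P(A) + P(B).
p_a, p_b, p_a_and_b = probs[0], probs[1], 0.0
p_a_or_b = p_a + p_b - p_a_and_b
```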
Rules of probability
conditional probability
Bayes rule
independence
Copyright 2001, Andrew W. Moore. Probabilistic Analytics: Slide 41

The Joint Distribution

Recipe for making a joint distribution of N variables:
1. Make a truth table listing all combinations of values of your variables (if there are N Boolean variables then the table will have 2^N rows).
2. For each combination of values, say how probable it is.
3. If you subscribe to the axioms of probability, those numbers must sum to 1.

Example: Boolean variables A, B, C

  A  B  C  Prob
  1  1  1  0.10
  0  1  1  0.25
  1  0  1  0.10
  0  0  1  0.05
  1  1  0  0.05
  0  1  0  0.10
  1  0  0  0.05
  0  0  0  0.30

[Figure: Venn diagram with the regions of A, B, C labeled by these probabilities]

Copyright 2001, Andrew W. Moore. Probabilistic Analytics: Slide 42

Using the Joint

Once you have the joint distribution you can ask for the probability of any logical expression involving your attributes:

P(E) = ∑_{rows matching E} P(row)
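The recipe above can be sketched directly in code. The dict-based representation and helper name below are this sketch's choice, not the slides' notation, but the eight probabilities are the ones from the example table.

```python
# Joint distribution over Boolean variables (A, B, C), as in the example table.
joint = {
    (1, 1, 1): 0.10, (0, 1, 1): 0.25, (1, 0, 1): 0.10, (0, 0, 1): 0.05,
    (1, 1, 0): 0.05, (0, 1, 0): 0.10, (1, 0, 0): 0.05, (0, 0, 0): 0.30,
}
assert abs(sum(joint.values()) - 1.0) < 1e-12  # axiom: the rows sum to 1

def prob(event):
    """P(E) = sum of P(row) over the rows matching logical expression E."""
    return sum(p for row, p in joint.items() if event(*row))

p_a = prob(lambda a, b, c: a == 1)       # P(A)
p_a_or_b = prob(lambda a, b, c: a or b)  # P(A or B)
```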
All the nice looking slides like this one from now on are from Andrew Moore.
Copyright 2001, Andrew W. Moore. Probabilistic Analytics: Slide 43

Using the Joint

P(Poor Male) = ∑_{rows matching Poor Male} P(row) = 0.4654

Copyright 2001, Andrew W. Moore. Probabilistic Analytics: Slide 44

Using the Joint

P(Poor) = ∑_{rows matching Poor} P(row) = 0.7604
Copyright 2001, Andrew W. Moore. Probabilistic Analytics: Slide 45

Inference with the Joint

P(E1 | E2) = P(E1, E2) / P(E2) = ∑_{rows matching E1 and E2} P(row) / ∑_{rows matching E2} P(row)

Copyright 2001, Andrew W. Moore. Probabilistic Analytics: Slide 46

Inference with the Joint

P(Male | Poor) = 0.4654 / 0.7604 = 0.612
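The same two-sums computation works on any joint table. Here is a sketch of P(E1 | E2) = P(E1, E2) / P(E2) using the Boolean A, B, C joint from the earlier slide; the helper names are this sketch's, not the slides'.

```python
joint = {
    (1, 1, 1): 0.10, (0, 1, 1): 0.25, (1, 0, 1): 0.10, (0, 0, 1): 0.05,
    (1, 1, 0): 0.05, (0, 1, 0): 0.10, (1, 0, 0): 0.05, (0, 0, 0): 0.30,
}

def prob(event):
    return sum(p for row, p in joint.items() if event(*row))

def cond_prob(e1, e2):
    """P(E1 | E2): rows matching 'E1 and E2', divided by rows matching E2."""
    return prob(lambda *r: e1(*r) and e2(*r)) / prob(e2)

# P(A | C) = P(A, C) / P(C) = 0.20 / 0.50
p_a_given_c = cond_prob(lambda a, b, c: a == 1, lambda a, b, c: c == 1)
```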
Continuous probability distributions
marginal probability
Bayes rule
Copyright 2001, Andrew W. Moore. Probability Densities: Slide 5

A PDF of American Ages in 2000

[Figure: probability density function of age in the 2000 US population]

Copyright 2001, Andrew W. Moore. Probability Densities: Slide 6

A PDF of American Ages in 2000

Let X be a continuous random variable. If p(x) is a Probability Density Function for X then

P(a < X ≤ b) = ∫_a^b p(x) dx

P(20 < age ≤ 30) = ∫_20^30 p(age) d age = 0.36

more of Andrew's nice slides
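The interval probability P(a < X ≤ b) = ∫_a^b p(x) dx can be approximated numerically. This sketch uses a stand-in exponential density p(x) = exp(-x) on [0, ∞), since the slide's age data is not available here, and checks the trapezoid-rule approximation against the closed form for that density.

```python
import math

def p(x):
    # Stand-in density: exponential with rate 1 (not the slide's age PDF).
    return math.exp(-x)

def prob_interval(a, b, steps=100_000):
    # Trapezoid-rule approximation of the integral of p from a to b.
    h = (b - a) / steps
    total = 0.5 * (p(a) + p(b))
    for i in range(1, steps):
        total += p(a + i * h)
    return total * h

approx = prob_interval(1.0, 2.0)
exact = math.exp(-1.0) - math.exp(-2.0)  # closed form for this density
```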
What does p(x) mean?

The likelihood p(a) can only be compared relative to other values p(b).
It indicates the relative probability of the integrated density over a small delta:

Copyright 2001, Andrew W. Moore. Probability Densities: Slide 13

Talking to your stomach

What's the gut-feel meaning of p(x)?

If p(a) = α p(b), then when a value X is sampled from the distribution, you are α times as likely to find that X is "very close to" a (e.g. 5.31) than that X is "very close to" b (e.g. 5.92).

Copyright 2001, Andrew W. Moore. Probability Densities: Slide 14

Talking to your stomach

What's the gut-feel meaning of p(x)?

If p(a) / p(b) = α, then

lim_{h→0} P(a - h/2 < X < a + h/2) / P(b - h/2 < X < b + h/2) = α
Copyright 2001, Andrew W. Moore. Probability Densities: Slide 17

Expectations

E[X] = the expected value of random variable X
= the average value we'd see if we took a very large number of random samples of X

E[X] = ∫_{-∞}^{∞} x p(x) dx

Copyright 2001, Andrew W. Moore. Probability Densities: Slide 18

Expectations

E[X] (continued):
= the first moment of the shape formed by the axes and the blue curve
= the best value to choose if you must guess an unknown person's age and you'll be fined the square of your error

E[age] = 35.897
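E[X] = ∫ x p(x) dx can be evaluated numerically the same way as interval probabilities. This sketch again uses a stand-in exponential density (true mean 1); the slide's E[age] = 35.897 comes from the age density, which is not reproduced here.

```python
import math

def p(x):
    # Stand-in density: exponential, mean exactly 1.
    return math.exp(-x)

def expectation(lo=0.0, hi=50.0, steps=200_000):
    # Trapezoid-rule approximation of the integral of x * p(x) dx;
    # the tail beyond hi = 50 is negligible for this density.
    h = (hi - lo) / steps
    total = 0.5 * (lo * p(lo) + hi * p(hi))
    for i in range(1, steps):
        x = lo + i * h
        total += x * p(x)
    return total * h

mean = expectation()
```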
Copyright 2001, Andrew W. Moore. Probability Densities: Slide 19

Expectation of a function

μ = E[f(X)] = the expected value of f(x) where x is drawn from X's distribution
= the average value we'd see if we took a very large number of random samples of f(X)

E[f(X)] = ∫_{-∞}^{∞} f(x) p(x) dx

Note that in general:  E[f(X)] ≠ f(E[X])

Copyright 2001, Andrew W. Moore. Probability Densities: Slide 20

Variance

σ² = Var[X] = the expected squared difference between x and E[X]

Var[X] = ∫_{-∞}^{∞} (x - μ)² p(x) dx

= amount you'd expect to lose if you must guess an unknown person's age and you'll be fined the square of your error, and assuming you play optimally
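A Monte Carlo sketch (this example's setup, not the slides') makes "in general E[f(X)] ≠ f(E[X])" concrete: with f(x) = x² and X uniform on [0, 1], E[X²] = 1/3 while (E[X])² = 1/4, and the gap is exactly the variance Var[X] = 1/12.

```python
import random

random.seed(0)
samples = [random.uniform(0.0, 1.0) for _ in range(200_000)]
n = len(samples)

e_x = sum(samples) / n                  # approx 1/2 = E[X]
e_x2 = sum(x * x for x in samples) / n  # approx 1/3 = E[X^2]
var = e_x2 - e_x ** 2                   # approx 1/12 = Var[X]
# e_x2 != e_x ** 2: applying f after averaging is not the same as
# averaging f(X), and the difference is the variance.
```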
Copyright 2001, Andrew W. Moore. Probability Densities: Slide 21

Standard Deviation

σ² = Var[X] = ∫_{-∞}^{∞} (x - μ)² p(x) dx
= the expected squared difference between x and E[X]
= amount you'd expect to lose if you must guess an unknown person's age and you'll be fined the square of your error, and assuming you play optimally

σ = Standard Deviation = "typical" deviation of X from its mean

σ = √Var[X]

Copyright 2001, Andrew W. Moore. Probability Densities: Slide 22

In 2 dimensions

p(x, y) = probability density of random variables (X, Y) at location (x, y)
Copyright 2001, Andrew W. Moore. Probability Densities: Slide 23

In 2 dimensions

Let X, Y be a pair of continuous random variables, and let R be some region of (X, Y) space.

P((X, Y) ∈ R) = ∬_R p(x, y) dy dx

Copyright 2001, Andrew W. Moore. Probability Densities: Slide 24

In 2 dimensions

P(20 < mpg < 30 and 2500 < weight < 3000) = area under the 2-d surface within the red rectangle
Copyright 2001, Andrew W. Moore. Probability Densities: Slide 25

In 2 dimensions

P( [(mpg - 25)/10]² + [(weight - 3300)/1500]² < 1 ) = area under the 2-d surface within the red oval

Copyright 2001, Andrew W. Moore. Probability Densities: Slide 26

In 2 dimensions

Take the special case of region R = "everywhere".
Remember that with probability 1, (X, Y) will be drawn from "somewhere".
So...

∫_{-∞}^{∞} ∫_{-∞}^{∞} p(x, y) dy dx = 1
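The normalization identity can be verified on a grid. The density below is this sketch's choice (two independent standard normals, truncated to [-6, 6]² since the mass beyond six standard deviations is negligible); a midpoint-rule double sum recovers total probability 1.

```python
import math

def p(x, y):
    # Product of two independent standard normal densities.
    return math.exp(-0.5 * (x * x + y * y)) / (2.0 * math.pi)

n, lo, hi = 400, -6.0, 6.0
h = (hi - lo) / n
# Midpoint-rule approximation of the double integral over [-6, 6]^2.
total = sum(
    p(lo + (i + 0.5) * h, lo + (j + 0.5) * h) * h * h
    for i in range(n)
    for j in range(n)
)
# total should be very close to 1
```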
Copyright 2001, Andrew W. Moore. Probability Densities: Slide 27

In 2 dimensions

Let X, Y be a pair of continuous random variables, and let R be some region of (X, Y) space.

p(x, y) = lim_{h→0} P(x - h/2 < X ≤ x + h/2  and  y - h/2 < Y ≤ y + h/2) / h²

Copyright 2001, Andrew W. Moore. Probability Densities: Slide 28

In m dimensions

Let (X_1, X_2, ..., X_m) be an m-tuple of continuous random variables, and let R be some region of ℝ^m.

P((X_1, X_2, ..., X_m) ∈ R) = ∫∫...∫_R p(x_1, x_2, ..., x_m) dx_m ... dx_2 dx_1
Copyright 2001, Andrew W. Moore. Probability Densities: Slide 29

Independence

If X and Y are independent then knowing the value of X does not help predict the value of Y:

X ⊥ Y  iff  ∀ x, y : p(x, y) = p(x) p(y)

[Figure: mpg and weight are NOT independent]

Copyright 2001, Andrew W. Moore. Probability Densities: Slide 30

Independence

[Figure: the contours say that acceleration and weight are independent]
Copyright 2001, Andrew W. Moore. Probability Densities: Slide 31

Multivariate Expectation

μ_X = E[X] = ∫ x p(x) dx

E[mpg, weight] = (24.5, 2600)

The centroid of the cloud

Copyright 2001, Andrew W. Moore. Probability Densities: Slide 32

Multivariate Expectation

E[f(X)] = ∫ f(x) p(x) dx
Copyright 2001, Andrew W. Moore. Probability Densities: Slide 33

Test your understanding

Question: When (if ever) does E[X + Y] = E[X] + E[Y]?
- All the time?
- Only when X and Y are independent?
- It can fail even if X and Y are independent?

Copyright 2001, Andrew W. Moore. Probability Densities: Slide 34

Bivariate Expectation

E[f(x, y)] = ∬ f(x, y) p(x, y) dy dx

if f(x, y) = x     then E[f(x, y)] = ∬ x p(x, y) dy dx
if f(x, y) = y     then E[f(x, y)] = ∬ y p(x, y) dy dx
if f(x, y) = x + y then E[f(x, y)] = ∬ (x + y) p(x, y) dy dx

E[X + Y] = E[X] + E[Y]
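A quick sampling sketch (this example's construction, not the slides') answers the quiz: E[X + Y] = E[X] + E[Y] holds all the time, with no independence assumption, even when Y is a deterministic function of X.

```python
import random

random.seed(1)
xs = [random.uniform(0.0, 1.0) for _ in range(100_000)]
ys = [x * x for x in xs]  # Y = X^2: completely dependent on X
n = len(xs)

e_x = sum(xs) / n
e_y = sum(ys) / n
e_sum = sum(x + y for x, y in zip(xs, ys)) / n
# e_sum equals e_x + e_y up to floating-point rounding, even though
# X and Y are as dependent as they can be.
```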
Copyright 2001, Andrew W. Moore. Probability Densities: Slide 35

Bivariate Covariance

Cov[X, Y] = σ_xy = E[(X - μ_x)(Y - μ_y)]

Cov[X, X] = Var[X] = σ_x² = σ_xx = E[(X - μ_x)²]
Cov[Y, Y] = Var[Y] = σ_y² = σ_yy = E[(Y - μ_y)²]

Copyright 2001, Andrew W. Moore. Probability Densities: Slide 36

Bivariate Covariance

Write Z = (X, Y)ᵀ, then

Cov[Z] = E[(Z - μ_z)(Z - μ_z)ᵀ] = | σ_x²   σ_xy |
                                  | σ_xy   σ_y² |
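The covariance definitions can be exercised with sample estimates (population-style averages, dividing by n). The 0.5 coupling between X and Y below is this sketch's choice; it makes Cov[X, Y] = 0.5 Var[X] in expectation, and Cov[X, X] equals Var[X] by construction.

```python
import random

random.seed(2)
n = 100_000
xs = [random.gauss(0.0, 1.0) for _ in range(n)]
ys = [0.5 * x + random.gauss(0.0, 1.0) for x in xs]  # correlated with X

mx = sum(xs) / n
my = sum(ys) / n
# Sample versions of sigma_xy, sigma_xx, and Var[X]:
cov_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
var_x = sum((x - mx) ** 2 for x in xs) / n
cov_xx = sum((x - mx) * (x - mx) for x in xs) / n  # Cov[X, X] = Var[X]
```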
Copyright 2001, Andrew W. Moore. Probability Densities: Slide 37

Covariance Intuition

E[mpg, weight] = (24.5, 2600)

[Figure: mpg vs. weight scatter with σ_mpg and σ_weight marked around the mean]

Copyright 2001, Andrew W. Moore. Probability Densities: Slide 38

Covariance Intuition

E[mpg, weight] = (24.5, 2600)

[Figure: the same scatter with the Principal Eigenvector of Σ drawn through the cloud]
Copyright 2001, Andrew W. Moore. Probability Densities: Slide 39

Covariance Fun Facts

Cov[Z] = E[(Z - μ_z)(Z - μ_z)ᵀ] = | σ_x²   σ_xy |
                                  | σ_xy   σ_y² |

- True or False: If σ_xy = 0 then X and Y are independent
- True or False: If X and Y are independent then σ_xy = 0
- True or False: If σ_xy = σ_x σ_y then X and Y are deterministically related
- True or False: If X and Y are deterministically related then σ_xy = σ_x σ_y

How could you prove or disprove these?

Copyright 2001, Andrew W. Moore. Probability Densities: Slide 40

General Covariance

Let X = (X_1, X_2, ..., X_k) be a vector of k continuous random variables

Σ = Cov[X] = E[(X - μ_X)(X - μ_X)ᵀ],  Σ_ij = Cov[X_i, X_j]

Σ is a k x k symmetric non-negative definite matrix
If all distributions are linearly independent it is positive definite
If the distributions are linearly dependent it has determinant zero
Copyright 2001, Andrew W. Moore. Probability Densities: Slide 41

Test your understanding

Question: When (if ever) does Var[X + Y] = Var[X] + Var[Y]?
- All the time?
- Only when X and Y are independent?
- It can fail even if X and Y are independent?

Copyright 2001, Andrew W. Moore. Probability Densities: Slide 42

Marginal Distributions

p(x) = ∫_{-∞}^{∞} p(x, y) dy
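For a discrete joint, the marginalization integral becomes a sum over the other variable's values. The table below is illustrative, not from the slides.

```python
# Illustrative discrete joint over X in {x0, x1} and Y in {y0, y1}.
joint = {
    ('x0', 'y0'): 0.10, ('x0', 'y1'): 0.30,
    ('x1', 'y0'): 0.25, ('x1', 'y1'): 0.35,
}

def marginal_x(x):
    """p(x) = sum over y of p(x, y)."""
    return sum(p for (xi, yi), p in joint.items() if xi == x)

px0 = marginal_x('x0')  # 0.10 + 0.30 = 0.40
px1 = marginal_x('x1')  # 0.25 + 0.35 = 0.60
```

The two marginals sum to 1, as they must, since they repartition the same joint mass.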
Copyright 2001, Andrew W. Moore. Probability Densities: Slide 43

Conditional Distributions

p(x | y) = p.d.f. of x when Y = y

[Figure: p(mpg | weight = 4600), p(mpg | weight = 3200), p(mpg | weight = 2000)]

Copyright 2001, Andrew W. Moore. Probability Densities: Slide 44

Conditional Distributions

p(x | y) = p(x, y) / p(y)

Why?
Copyright 2001, Andrew W. Moore. Probability Densities: Slide 45

Independence Revisited

It's easy to prove that these statements are equivalent:

X ⊥ Y  iff  ∀ x, y : p(x, y) = p(x) p(y)

∀ x, y : p(x, y) = p(x) p(y)
⇔ ∀ x, y : p(x | y) = p(x)
⇔ ∀ x, y : p(y | x) = p(y)

Copyright 2001, Andrew W. Moore. Probability Densities: Slide 46

More useful stuff

(These can all be proved from the definitions on previous slides)

∫_{-∞}^{∞} p(x | y) dx = 1

p(x | y, z) = p(x, y, z) / p(y, z)

Bayes Rule:  p(x | y) = p(y | x) p(x) / p(y)
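Bayes rule is worth seeing with numbers. This sketch applies p(x | y) = p(y | x) p(x) / p(y) to a classic two-value example; the sensitivity, false-positive rate, and base rate below are illustrative, not from the slides.

```python
p_x = 0.01              # p(condition)
p_y_given_x = 0.95      # p(positive test | condition)
p_y_given_not_x = 0.10  # p(positive test | no condition)

# p(y) by marginalizing over both values of x:
p_y = p_y_given_x * p_x + p_y_given_not_x * (1.0 - p_x)

# Bayes rule: posterior probability of the condition given a positive test.
p_x_given_y = p_y_given_x * p_x / p_y
# Despite the accurate test, the posterior stays below 0.09 because the
# base rate p_x is so low.
```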
Next time: The process of probabilistic inference
1. define a model of the problem
2. derive posterior distributions and estimators
3. estimate parameters from data
4. evaluate model accuracy