
KEEE494: 2nd Semester 2009 Lab I

Estimating PDFs, Means, Variances


1 Exp 1: Estimated PDF/CDF Plots From Data
Suppose we have a vector x of observations from a continuous probability distribution with density function (PDF) f(x). This experiment teaches you how we can get an estimated plot of f(x). Let's suppose that x consists of n samples, n large. (We will typically take n = 100000 in our experiments.) We then subdivide the range of x values into N bins of equal width. (In our experiments, we typically take N = n/100, so that if the data were uniformly distributed, we would expect about 100 samples in each bin.) Let Δ be the bin width, let x_i be the midpoint of the i-th bin, and let n_i be the number of samples in the i-th bin (which can be found with the Matlab function hist). Then
n_i ≈ nΔf(x_i)
This is because the probability that the distribution will yield a value in the i-th bin is the area under the density function f(x) for this bin, which is about Δf(x_i). Hence, we get the estimated value of f(x_i) by dividing n_i by nΔ:
f(x_i) ≈ n_i/(nΔ)
We can implement these ideas with the following Matlab code:
n=length(x);             % n is the number of samples in x
N=floor(n/100);          % N is the number of bins
A=min(x); B=max(x);      % [A,B] is the range of x values
Delta=(B-A)/N;           % Delta is the bin width
t=A-Delta/2+[1:N]*Delta; % horizontal axis of bin midpoints
f=hist(x,t)/(Delta*n);   % vertical axis of density estimates
bar(t,f);                % estimated density plot
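For readers who want to check the same computation outside MATLAB, here is a rough Python/NumPy sketch of the estimator (an illustration added here, not part of the original handout; numpy.histogram plays the role of hist, and all variable names are our own):

```python
import numpy as np

# Sketch of the density estimator above in Python/NumPy.
rng = np.random.default_rng(0)
x = rng.random(100000)                 # n uniform(0,1) samples
n = len(x)
N = n // 100                           # number of bins
counts, edges = np.histogram(x, bins=N, range=(x.min(), x.max()))
delta = edges[1] - edges[0]            # bin width Delta
t = (edges[:-1] + edges[1:]) / 2       # bin midpoints
f = counts / (n * delta)               # density estimate f(x_i) ~ n_i/(n*Delta)
```

Since every sample falls inside [min(x), max(x)], the estimated density integrates to one: sum(f)*delta = 1.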
Example 1. Obtain a plot on your screen of the estimated uniform(0,1) density from 100000 rand pseudorandomly generated samples by running the following code:
n=100000;
x=rand(1,n);
N=floor(n/100);
A=min(x); B=max(x);
Delta=(B-A)/N;
t=A-Delta/2+[1:N]*Delta;
f=hist(x,t)/(Delta*n);
bar(t,f);
title('Estimated Uniform (0,1) PDF')
Does your plot look like a jagged rectangular pulse of amplitude one? The jaggedness is unavoidable (it is due to variance). One way to reduce the jaggedness in the density estimate is to use a smoothing filter. There is an entire theory devoted to such smoothing filters, which we do not touch in this lab.
Example 2. Run the following in order to obtain an estimated CDF for uniform (0,1) data:
n=100000;
x=rand(1,n);
N=floor(n/100);
A=min(x); B=max(x);
Delta=(B-A)/N;
t=A-Delta/2+[1:N]*Delta;
p=hist(x,t)/n;
CDF=cumsum(p);
plot(t,CDF)
Does the plot look like the CDF of the uniform (0,1) density? Compare the code above to the code in Example 1; see from this comparison if you understand how the code above does its job. The estimated CDF plot looks smoother than the estimated density plot. Why?
Example 3. Obtain a plot on your screen of the estimated standard Gaussian density from 100000 randn pseudorandomly generated samples by running the following code:
n=100000;
x=randn(1,n);
N=floor(n/100);
A=min(x); B=max(x);
Delta=(B-A)/N;
t=A-Delta/2+[1:N]*Delta;
f=hist(x,t)/(Delta*n);
bar(t,f)
title('Estimated Standard Gaussian PDF')
Compute 1/√(2π). Is this about equal to the peak value of the estimated density curve that you see on your computer screen?
You can check by drawing the theoretical PDF on the same figure by adding three lines to the above m-file as below:
hold on
pdf=1/sqrt(2*pi)*exp(-t.^2/2);
plot(t,pdf)
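As a numerical sanity check of the peak-value question, here is a Python/NumPy sketch (added here for illustration; it is not part of the MATLAB handout):

```python
import math
import numpy as np

# The standard Gaussian density peaks at 1/sqrt(2*pi) ~ 0.3989; the
# histogram estimate's maximum should land close to that value.
rng = np.random.default_rng(1)
x = rng.standard_normal(100000)
n = len(x)
counts, edges = np.histogram(x, bins=n // 100)
delta = edges[1] - edges[0]
f = counts / (n * delta)
peak = f.max()
theory = 1 / math.sqrt(2 * math.pi)
```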
Example 4. Run the following to get an estimate of the plot of the CDF of a standard Gaussian distribution:
n=100000;
x=randn(1,n);
N=floor(n/100);
A=min(x); B=max(x);
Delta=(B-A)/N;
t=A-Delta/2+[1:N]*Delta;
p=hist(x,t)/n;
CDF=cumsum(p);
plot(t,CDF)
title('Estimated Gaussian(0,1) CDF')
Use your standard Gaussian CDF table on page 142 to see if the actual CDF values at z = 0, 0.5, 1.0, 1.5, 2.0 conform to the estimated CDF values you get from the curve on your screen.
Example 5. We know we can simulate an exponential distribution with density ae^(-ax)u(x) with parameter a by the transformation X = -log(U)/a, where U is uniform in the interval [0, 1]. To test this, run the code:
n=100000;
u=rand(1,n);
a=0.5;
x=-log(u)/a;
N=floor(n/100);
A=min(x); B=max(x);
Delta=(B-A)/N;
t=A-Delta/2+[1:N]*Delta;
f=hist(x,t)/(Delta*n);
bar(t,f)
title('Estimated Exponential(a) PDF (a=0.5)')
Look at the peak value of your estimated density curve. Is this about what you expected it to be? Try another value of a if time permits.
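The expected peak is f(0) = a, since the exponential density ae^(-ax) is largest at x = 0. A quick Python/NumPy check of this claim (a sketch added here, not from the handout):

```python
import numpy as np

# Exponential(a) via the inverse-CDF transform x = -log(u)/a with u uniform;
# the estimated density should peak near f(0) = a.
rng = np.random.default_rng(2)
a = 0.5
u = rng.random(100000)      # uniform(0,1) samples
x = -np.log(u) / a
n = len(x)
counts, edges = np.histogram(x, bins=n // 100)
delta = edges[1] - edges[0]
f = counts / (n * delta)
peak = f.max()
```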
Example 6. Run the following code to get the estimated CDF plot for an exponential distribution:
n=100000;
u=rand(1,n);
a=0.5;
x=-log(u)/a;
N=floor(n/100);
A=min(x); B=max(x);
Delta=(B-A)/N;
t=A-Delta/2+[1:N]*Delta;
p=hist(x,t)/n;
CDF=cumsum(p);
plot(t,CDF)
axis([0 5 0 1])
title('Estimated Exponential(a) CDF (a=0.5)')
Compute the actual values of the CDF 1 - e^(-0.5x) at x = 1, 1.5, 2, 2.5, 3 and see how these compare to what the estimated CDF plot gives.
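Those exact values can also be compared against an empirical CDF programmatically; a Python/NumPy sketch (added here for illustration):

```python
import math
import numpy as np

# Empirical CDF of exponential(0.5) samples versus the exact 1 - exp(-0.5*x).
rng = np.random.default_rng(3)
a = 0.5
x = -np.log(rng.random(100000)) / a
xs = np.sort(x)

def ecdf(v):
    # fraction of samples <= v
    return np.searchsorted(xs, v, side="right") / len(xs)

points = [1.0, 1.5, 2.0, 2.5, 3.0]
errs = [abs(ecdf(p) - (1 - math.exp(-a * p))) for p in points]
```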
Example 7. Run the following code, which generates estimated PDF and CDF plots for a Rayleigh distribution:
n=100000;
x=randn(1,n);
y=randn(1,n);
r=sqrt(x.^2+y.^2);
N=floor(n/100);
A=min(r); B=max(r);
Delta=(B-A)/N;
t=A-Delta/2+[1:N]*Delta;
f=hist(r,t)/(Delta*n);
subplot(2,1,1)
bar(t,f)
title('Estimated Rayleigh PDF')
p=Delta*f;
CDF=cumsum(p);
subplot(2,1,2)
plot(t,CDF)
title('Estimated Rayleigh CDF')
Example 8. Run the following code, which generates estimated PDF and CDF for a chi-square distribution:
n=100000;
sigma=1;
x=sigma*randn(1,n);
y=x.^2;
N=floor(n/100);
A=min(y); B=max(y);
Delta=(B-A)/N;
t=A-Delta/2+[1:N]*Delta;
f=hist(y,t)/(Delta*n);
subplot(2,1,1)
bar(t,f)
title('Estimated chi-square PDF')
p=Delta*f;
CDF=cumsum(p);
subplot(2,1,2);
plot(t,CDF)
title('Estimated chi-square CDF')
Examine the effect of changing sigma to 0.5 in line two of the program. (The sigma here is the standard deviation of the underlying Gaussian distribution, not the standard deviation of the chi-square distribution.)
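Since Example 8 squares a single zero-mean Gaussian, Y is chi-square with one degree of freedom, so E[Y] = σ² and halving σ should quarter the mean. A Python/NumPy spot check of that effect (a sketch added here):

```python
import numpy as np

# Mean of Y = (sigma*Z)^2 for Z standard Gaussian: E[Y] = sigma^2.
rng = np.random.default_rng(4)

def chi2_mean(sigma, n=100000):
    y = (sigma * rng.standard_normal(n)) ** 2
    return y.mean()

m1 = chi2_mean(1.0)   # expect about 1.0
m2 = chi2_mean(0.5)   # expect about 0.25
```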
2 Exp 2: Estimating Means/Variances from Data
Given a vector of data points x from a probability distribution (discrete or continuous), mean(x) is used to estimate the mean μ of the distribution, var(x) is used to estimate the variance σ² of the distribution, and std(x) is used to estimate the standard deviation σ.
Example 9. What are the mean μ and variance σ² of the standard uniform distribution? (That is, the uniform distribution in the interval [0, 1]?) Look these two values up in your textbook if you do not know the answer. Run the following lines of code to estimate μ, σ², and σ from data.
x=rand(1,50000);
mean(x)
var(x)
std(x)
Example 10. What are the mean μ and variance σ² of the standard Gaussian distribution? Look them up if you don't know them. Run the following lines of code to estimate μ, σ², and σ from data.
x=randn(1,50000);
mean(x)
var(x)
std(x)
Example 11. What are the mean μ and variance σ² of the exponential distribution with parameter a = 1? Look these up if you don't know them. A data point simulating data from this distribution is -log(rand(1,1)). Run the following lines of code to estimate μ, σ², and σ from data.
x=-log(rand(1,50000));
mean(x)
var(x)
std(x)
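The exponential distribution with a = 1 has mean 1, variance 1, and standard deviation 1, so all three estimates should land near 1. An equivalent check in Python/NumPy (a sketch added here; ddof=1 matches MATLAB's sample-variance convention):

```python
import numpy as np

# mean/var/std estimates for exponential(1) data; all should be close to 1.
rng = np.random.default_rng(5)
x = -np.log(rng.random(50000))
m = x.mean()
v = x.var(ddof=1)   # sample variance, as MATLAB's var computes
s = x.std(ddof=1)
```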
3 Exp 3: Effect of Linear Transformations on Means/Variance
Let X be a RV with mean μ_X and variance σ²_X. Let Y be the RV
Y = aX + b,
where a, b are constants. We will show in class that the mean and variance of Y are related to the mean and variance of X by the equations
μ_Y = aμ_X + b
σ²_Y = a²σ²_X.
This experiment will provide numerical verification of this fact.
Example 12. Let X be uniformly distributed in [0, 1]. Let Y = -3X + 7. What will the mean and variance of Y be? Try to figure this out using paper and pencil. Then, do the following simulation to get the approximate answers.
x=rand(1,50000);
y=-3*x+7;
mean(y)
var(y)
Example 13. Let X be standard Gaussian. Let Y = -3X + 7. What will the mean and variance of Y be? Try to figure this out using paper and pencil. Then, do the following simulation to get the approximate answers.
x=randn(1,50000);
y=-3*x+7;
mean(y)
var(y)
Example 14. Let X be exponential with parameter a = 1. Let Y = -3X + 7. What will the mean and variance of Y be? Try to figure this out using paper and pencil. Then, do the following simulation to get the approximate answers.
x=-log(rand(1,50000));
y=-3*x+7;
mean(y)
var(y)
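For Example 14 the rules above give E[Y] = -3·1 + 7 = 4 and Var(Y) = (-3)²·1 = 9, since the exponential(1) distribution has mean 1 and variance 1. A Python/NumPy version of the same simulation (a sketch added here):

```python
import numpy as np

# Linear transform of exponential(1) data: Y = -3X + 7.
rng = np.random.default_rng(6)
x = -np.log(rng.random(50000))
y = -3 * x + 7
m = y.mean()        # expect about 4
v = y.var(ddof=1)   # expect about 9
```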
Lab Homework 1
1. Let us denote the chi-square random variable Y; then Y = Σ_{k=1}^{n} X_k², where the X_k are independent and identically distributed Gaussian random variables with zero mean and variance σ². Generate 100,000 chi-square samples with degree 4 (n = 4) and set σ² such that E[Y] = 1.
Estimate the PDF, the mean, and the variance. Plot the estimated PDF and the exact PDF curves on the same figure. Calculate the exact mean and variance of the chi-square with degree 4 and compare the exact values with the estimated values.
2. Let us denote the Ricean random variable Y; then Y is
Y = √(X₁² + X₂²)
where X₁ and X₂ are Gaussian random variables with means m₁ and m₂, respectively, and common variance σ². Denote s² = m₁² + m₂². Generate 100,000 Ricean samples with m₁ = m₂ = 1/√2 and σ² = 1.
Estimate the PDF, the mean, and the variance. Plot the estimated PDF and the exact PDF curves on the same figure. Calculate the exact mean and variance of the Ricean distribution and compare the exact values with the estimated values.
In this lab we will learn some special random variables which are widely used not only in the communication area but also in many other natural phenomena. These are (1) the central chi-square and (2) the noncentral chi-square. As special cases of (1) and (2), we consider the Rayleigh and Ricean random variables, respectively.
Central Chi-Square Density
Define the random variable Y as
Y = Σ_{i=1}^{n} X_i²   (1)
where the X_i, i = 1, 2, ..., n are statistically independent and identically distributed (i.i.d.) Gaussian random variables with zero mean and variance σ². Then the characteristic function of Y is given by
ψ_Y(jν) ≡ E{e^{jνY}} = 1 / (1 − j2νσ²)^{n/2}   (2)
which is a consequence of the statistical independence of the X_i's. The pdf of Y is obtained as the inverse Fourier transform, that is,
f_Y(y) = 1 / (σⁿ 2^{n/2} Γ(n/2)) · y^{n/2−1} e^{−y/2σ²},  y ≥ 0   (3)
where Γ(q) is the gamma function defined as
Γ(q) = ∫₀^∞ t^{q−1} e^{−t} dt,  q > 0
Γ(q) = (q − 1)!,  q an integer and q > 0
Γ(1/2) = √π,  Γ(3/2) = √π / 2
This pdf is called the Gamma or chi-square pdf with n degrees of freedom, denoted χ²(n), and is illustrated in Figure 1 for several values of n.
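A useful cross-check with Exp 1 (an observation added here, not in the original handout): setting n = 2 in (3) and using Γ(1) = 1 gives

```latex
f_Y(y) = \frac{1}{2\sigma^2}\, e^{-y/2\sigma^2}, \qquad y \ge 0,
```

an exponential density with parameter a = 1/(2σ²). This is why the squared radius x.^2 + y.^2 formed in Example 7 (with σ = 1) has an exponentially decaying histogram.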
[Figure 1 appears here: curves of f_Y(y) versus y for n = 1, 2, 4, and 8.]
Figure 1: Pdf of a chi-square-distributed random variable for several degrees of freedom (σ² = 1)
The first and second moments of Y can be computed to give
E{Y} = nσ²
E{Y²} = 2nσ⁴ + n²σ⁴
σ²_Y = 2nσ⁴   (4)
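These moments are easy to spot-check by Monte Carlo simulation. A Python/NumPy sketch (added here), assuming σ = 1 and n = 4, so E{Y} = 4 and σ²_Y = 8:

```python
import numpy as np

# Sum of squares of n = 4 i.i.d. standard Gaussians: chi-square with 4 dof.
rng = np.random.default_rng(7)
X = rng.standard_normal((100000, 4))
Y = (X ** 2).sum(axis=1)
mean_Y = Y.mean()        # expect about n*sigma^2 = 4
var_Y = Y.var(ddof=1)    # expect about 2*n*sigma^4 = 8
```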
The cumulative distribution function (cdf) of Y is obtained by integrating (3), that is,
F_Y(y) ≡ Prob{Y ≤ y} = ∫₀^y 1/(σⁿ 2^{n/2} Γ(n/2)) · u^{n/2−1} e^{−u/2σ²} du   (5)
For the special case when n is even (let k = n/2), (5) can be computed in closed form to give
F_Y(y) = 1 − e^{−y/2σ²} Σ_{j=0}^{k−1} (1/j!) (y/2σ²)^j
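For n = 4 (so k = 2) and σ = 1 this closed form reads F_Y(y) = 1 − e^{−y/2}(1 + y/2); a Monte Carlo comparison in Python/NumPy (a sketch under those assumptions, added here):

```python
import math
import numpy as np

# Empirical CDF of a chi-square(4) sample versus the closed-form cdf.
rng = np.random.default_rng(8)
Y = (rng.standard_normal((100000, 4)) ** 2).sum(axis=1)
Ys = np.sort(Y)

def F_exact(y):
    # k = 2 terms of the series: 1 - exp(-y/2) * (1 + y/2)
    return 1 - math.exp(-y / 2) * (1 + y / 2)

errs = [abs(np.searchsorted(Ys, y) / len(Ys) - F_exact(y)) for y in (2.0, 4.0, 6.0)]
```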
Noncentral Chi-Square Density
When the Gaussian random variables of (1) have nonzero means m_i, i = 1, 2, ..., n but still identical variances σ², the characteristic function of Y is no longer given by (2) but becomes
ψ_Y(jν) = 1/(1 − j2νσ²)^{n/2} · exp( jν Σ_{i=1}^{n} m_i² / (1 − j2νσ²) )   (6)
As expected, (6) reduces to (2) when all the means are zero. The inverse Fourier transform of this characteristic function yields
f_Y(y) = 1/(2σ²) · (y/s²)^{(n−2)/4} exp( −(s² + y)/2σ² ) I_{n/2−1}( √y·s/σ² ),  y ≥ 0   (7)
where s² is the sum of the means squared, that is,
s² = Σ_{i=1}^{n} m_i²
and I_k(x) is the k-th-order modified Bessel function of the first kind, which can be expressed by the infinite series
I_k(x) = Σ_{j=0}^{∞} (x/2)^{k+2j} / ( j! Γ(k + j + 1) ),  x ≥ 0
The pdf, as given by (7), is typically referred to as the noncentral chi-square density with n degrees of freedom. The parameter s² is called the noncentrality parameter of the pdf. The cdf is obtained by integration as
F_Y(y) = ∫₀^y 1/(2σ²) · (u/s²)^{(n−2)/4} exp( −(s² + u)/2σ² ) I_{n/2−1}( √u·s/σ² ) du
Here again, when n is even (k = n/2), the cdf can be expressed in terms of the generalized Q-function Q_k(x, y) as
F_Y(y) = 1 − Q_k( s/σ, √y/σ )
where
Q_k(a, b) ≡ ∫_b^∞ x (x/a)^{k−1} exp( −(x² + a²)/2 ) I_{k−1}(ax) dx
= Q_1(a, b) + exp( −(b² + a²)/2 ) Σ_{j=1}^{k−1} (b/a)^j I_j(ab)
The function Q_1(a, b) is typically denoted by Q(a, b) [not to be confused with the Gaussian tail integral Q(x), which is a function of one variable] and is referred to as the Marcum Q-function; it is given by
Q(a, b) ≡ ∫_b^∞ x exp( −(x² + a²)/2 ) I_0(ax) dx
The first and second moments of Y can be computed in closed form to give
E{Y} = nσ² + s²
E{Y²} = 2nσ⁴ + 4σ²s² + (nσ² + s²)²
σ²_Y = 2nσ⁴ + 4σ²s²
which reduce to (4) when the means are zero (that is, s = 0).
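These moments can also be spot-checked by simulation. Assuming n = 2, m₁ = m₂ = 1 and σ = 1 (so s² = 2), the formulas give E{Y} = nσ² + s² = 4 and σ²_Y = 2nσ⁴ + 4σ²s² = 12; a Python/NumPy sketch (added here):

```python
import numpy as np

# Noncentral chi-square with n = 2, means (1, 1), sigma = 1: s^2 = 2.
rng = np.random.default_rng(9)
X = rng.standard_normal((100000, 2)) + 1.0
Y = (X ** 2).sum(axis=1)
mean_Y = Y.mean()        # expect about n*sigma^2 + s^2 = 4
var_Y = Y.var(ddof=1)    # expect about 2n*sigma^4 + 4*sigma^2*s^2 = 12
```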
Rayleigh Density
Suppose we define a new random variable R, which is
R ≡ √Y = √( Σ_{i=1}^{n} X_i² )   (8)
where the X_i, i = 1, 2, ..., n are as defined earlier and the means are zero. Then the pdf of R can be easily obtained from the chi-square pdf with a simple transformation to give
f_R(r) = r^{n−1} / ( 2^{(n−2)/2} σⁿ Γ(n/2) ) · exp( −r²/2σ² ),  r ≥ 0   (9)
Here again, when n is even (let k = n/2), the cdf of R can be obtained in closed form as
F_R(r) = 1 − exp( −r²/2σ² ) Σ_{j=0}^{k−1} (1/j!) ( r²/2σ² )^j,  r ≥ 0
and the m-th moment of R is given by
E{R^m} = (2σ²)^{m/2} Γ((n + m)/2) / Γ(n/2),  m ≥ 0
for any integer m. As a special case of (9) when n = 2, we obtain the familiar Rayleigh pdf
f_R(r) = (r/σ²) exp( −r²/2σ² ),  r ≥ 0   (10)
with corresponding cdf
F_R(r) = 1 − exp( −r²/2σ² ),  r ≥ 0   (11)
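Both the moment formula and the cdf can be spot-checked by simulation. For n = 2 and σ = 1, E{R} = √2 · Γ(3/2)/Γ(1) = √(π/2) ≈ 1.2533 and F_R(1) = 1 − e^{−1/2}; a Python/NumPy sketch (added here):

```python
import math
import numpy as np

# Rayleigh samples as the radius of two i.i.d. standard Gaussians (n = 2).
rng = np.random.default_rng(10)
R = np.hypot(rng.standard_normal(100000), rng.standard_normal(100000))
mean_theory = math.sqrt(math.pi / 2)           # (2*sigma^2)^{1/2} Gamma(3/2)/Gamma(1)
Rs = np.sort(R)
cdf_at_1 = np.searchsorted(Rs, 1.0) / len(Rs)  # empirical F_R(1)
cdf_theory = 1 - math.exp(-0.5)                # exact F_R(1) from (11)
```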
Rician Density
When R is still defined by (8) and the X_i's have nonzero means (each X_i has mean m_i), each with variance σ², the pdf of R becomes
f_R(r) = r^{n/2} / ( σ² s^{(n−2)/2} ) · exp( −(r² + s²)/2σ² ) I_{n/2−1}( rs/σ² ),  r ≥ 0   (12)
with corresponding cdf for n even (k = n/2)
F_R(r) = 1 − Q_k( s/σ, r/σ )
where s² is the sum of the means squared, as defined earlier. The m-th moment of R is
E{R^m} = (2σ²)^{m/2} exp( −s²/2σ² ) Γ((n + m)/2)/Γ(n/2) · ₁F₁( (n + m)/2, n/2; s²/2σ² )
where ₁F₁(α, β; x) is the confluent hypergeometric function given by
₁F₁(α, β; x) ≡ Σ_{j=0}^{∞} Γ(α + j) Γ(β) x^j / ( Γ(α) Γ(β + j) j! ),  β ≠ 0, −1, −2, ...
When n = 2, (12) reduces to the familiar Rician pdf
f_R(r) = (r/σ²) exp( −(s² + r²)/2σ² ) I_0( rs/σ² ),  r ≥ 0   (13)
where s = √(m₁² + m₂²), with corresponding cdf
F_R(r) = 1 − Q( s/σ, r/σ )   (14)
When s = 0, the Rician pdf and cdf of (13) and (14) reduce to the Rayleigh pdf and cdf of (10) and (11), respectively.
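Since R² is the noncentral chi-square variable Y, the earlier moment formula gives E{R²} = 2σ² + s² for n = 2. With the homework parameters m₁ = m₂ = 1/√2 and σ² = 1 (so s² = 1), this is 3; a Python/NumPy spot check (a sketch added here):

```python
import math
import numpy as np

# Rician radius-squared: R^2 = X1^2 + X2^2 with shifted Gaussians.
rng = np.random.default_rng(11)
m = 1 / math.sqrt(2)
x1 = rng.standard_normal(100000) + m
x2 = rng.standard_normal(100000) + m
R2 = x1 ** 2 + x2 ** 2
mean_R2 = R2.mean()   # expect about 2*sigma^2 + s^2 = 3
```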
On The Joint Density of Amplitude and Phase
So far, we have been concentrating on the marginal densities of Y or R, basically ignoring any phase information. Let's define
R = √( X₁² + X₂² ) and tan Θ = X₂/X₁
where X₁ and X₂ are independent Gaussian random variables with variance σ², and m₁ = m but m₂ = 0. We already know that R is Rician distributed if m ≠ 0 but Rayleigh distributed if m = 0. It would be interesting to derive the joint pdf f_{R,Θ}(r, θ) for both cases. Since R and Θ are two functions of two random variables, it is easily shown that
f_{R,Θ}(r, θ) = r f_{X₁,X₂}(r cos θ, r sin θ),  r ≥ 0, |θ| ≤ π
= r/(2πσ²) exp( −[ (r cos θ − m)² + (r sin θ)² ] / 2σ² )
or equivalently
f_{R,Θ}(r, θ) = r/(2πσ²) exp( −[ r² − 2mr cos θ + m² ] / 2σ² ),  r ≥ 0, |θ| ≤ π   (15)
To obtain the marginal densities, we integrate the joint density:
f_R(r) = ∫_{−π}^{π} f_{R,Θ}(r, θ) dθ = r/(2πσ²) exp( −(r² + m²)/2σ² ) ∫_{−π}^{π} exp( 2mr cos θ / 2σ² ) dθ
= (r/σ²) exp( −(r² + m²)/2σ² ) I_0( mr/σ² ),  r ≥ 0
which is the Rician pdf, as expected, with s = |m|. On the other hand, the marginal pdf of Θ becomes
f_Θ(θ) = ∫₀^∞ f_{R,Θ}(r, θ) dr
= 1/(2πσ²) exp( −m²/2σ² ) ∫₀^∞ r exp( −( r² − 2mr cos θ )/2σ² ) dr   (16)
Completing the square in the integral and then substituting x = r − m cos θ, (16) reduces to
f_Θ(θ) = 1/(2πσ²) exp( −m² sin²θ / 2σ² ) ∫_{−m cos θ}^{∞} (x + m cos θ) exp( −x²/2σ² ) dx
or
f_Θ(θ) = ( m cos θ / (√(2π) σ) ) exp( −m² sin²θ / 2σ² ) [ 1 − Q( m cos θ / σ ) ] + exp( −m²/2σ² ) / 2π   (17)
Note that when m = 0, the joint pdf in (15) becomes
f_{R,Θ}(r, θ) = ( r/(2πσ²) ) exp( −r²/2σ² ) = f_R(r) f_Θ(θ)
where f_R(r) = (r/σ²) exp( −r²/2σ² ), r ≥ 0, and f_Θ(θ) = 1/2π, |θ| ≤ π. Hence, R and Θ are independent random variables, with R Rayleigh distributed and Θ uniformly distributed. On the other hand, when m ≠ 0, R and Θ are dependent, with joint pdf as given by (15). The marginal pdf of R is Rician, and f_Θ(θ) is given by (17) and is plotted in Figure 2 as a function of |m|/σ. As one would expect, the pdf of Θ becomes quite peaked as |m|/σ increases.
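That peaking is easy to see numerically. In a Python/NumPy sketch (added here): for m = 0 the phase is uniform on (−π, π], whose variance is π²/3 ≈ 3.29, while for |m|/σ = 4 the phase variance collapses toward zero:

```python
import math
import numpy as np

# Phase Theta = atan2(X2, X1) for m = 0 (Rayleigh case) and m = 4 (Rician).
rng = np.random.default_rng(12)
theta0 = np.arctan2(rng.standard_normal(100000), rng.standard_normal(100000))
theta4 = np.arctan2(rng.standard_normal(100000), rng.standard_normal(100000) + 4.0)
var0 = theta0.var()   # expect about pi^2/3 (uniform phase)
var4 = theta4.var()   # expect a much smaller value (peaked phase)
```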
[Figure 2 appears here: curves of f_Θ(θ) versus θ for |m|/σ = 0, 1, 2, and 4.]
Figure 2: Pdf for phase angle, Rician channel