DETECTION & ESTIMATION THEORY
Control Systems: where the position of a powerboat has to be estimated for the corrective
navigation system in the presence of sensor and environmental noise, or the occurrence of an
abrupt change in the system is to be detected.
Communication Systems: where the transmitted signals have to be identified at the receiver,
or the carrier frequency of a signal is to be estimated for demodulation of the baseband signal
in the presence of degrading noise.
Image Processing: where an object has to be identified, or its position and orientation have to
be estimated from a camera image in the presence of lighting and background noise.
Radar Systems: where the occurrence of an airborne target (e.g., an aircraft, a missile) is
to be detected, or the delay of the received pulse echo is to be estimated to determine the
location of the target in the presence of noise.
Seismology: where the presence of underground oil is to be detected, or the distance of the oil
deposit has to be estimated from noisy sound reflections caused by the different densities of oil
and rock layers.
Sonar Systems: where the presence of a submarine is to be detected, or the delay of the
received signal at each of the sensors is to be estimated to locate it in the presence of noise
and attenuation.
Speech Processing: where the presence of different events (such as phonemes or words) is to
be detected in a speech signal in the context of speech recognition, or the parameters of
the speech production model have to be estimated in the context of speech coding, in the
presence of speech/speaker variability and environmental noise.
Apart from these, a number of applications stemming from the analysis of data from physical
phenomena, economics, etc., could also be mentioned.
The majority of applications require either the detection of one or more events of interest and/or the
estimation of an unknown parameter from a collection of observed data which also
includes "artifacts" due to sensor inaccuracies, additive noise, signal distortion (convolutional
noise), model inaccuracies, unaccounted sources of variability and multiple interfering signals.
These artifacts make detection and estimation challenging problems.
The PDFs under each hypothesis are denoted by p(x[0]; H0) and p(x[0]; H1), which for this
example are

p(x[0]; H0) = (1/√(2π)) exp(−x²[0]/2)
p(x[0]; H1) = (1/√(2π)) exp(−(x[0] − 1)²/2).

Note that in deciding between H0 and H1, we are essentially asking whether x[0] has been
generated according to the PDF p(x[0]; H0) or p(x[0]; H1). Alternatively, if we consider the
family of PDFs parameterized by the mean, the problem becomes one of deciding between two
parameter values.
This distinction is analogous to the classical versus Bayesian approach to parameter estimation
highlighted earlier.
Hierarchy of Detection Problems
The detection problem in its simplest form assumes that both signal and noise characteristics
are completely known. If the characteristics of the signal and/or noise are unknown or only
partially known, the detection problem becomes more challenging as well as more complex. The
hierarchy of detection problems, along with their typical applications, is listed in
Table 1.2.
Conditions                                          Applications
Level 1: Known signals in noise                     1. Synchronous digital communication
                                                    2. Pattern recognition
Level 2: Signals with unknown parameters in noise   1. Digital communication system without phase reference
                                                    2. Digital communication over slowly fading channels
                                                    3. Conventional pulse radar and sonar, target detection
and the objective is to decide which one of these hypotheses is true based on the observed
data.
We begin with those decision-making problems in which the PDF for each assumed
hypothesis is completely known; this is why they are referred to as simple hypothesis testing
problems. The primary approaches to simple hypothesis testing are the classical approach,
based on the Neyman-Pearson theorem, and the Bayesian approach, based on minimization of
the Bayes risk. In many ways these approaches are analogous to the classical and Bayesian
methods of statistical estimation theory.
8.2.2 Neyman-Pearson (NP) Detector
Before explaining the NP detector, we first describe some relevant terminology. Suppose
we observe a realization of a random variable whose PDF is either N(0, 1) or N(1, 1). The
detection problem can be summarized as:

H0 : x[0] ~ N(0, 1)
H1 : x[0] ~ N(1, 1)

The PDFs under each hypothesis, along with the probabilities of the hypothesis testing errors, are
shown in Figure 8.1. A reasonable approach might be to decide H1 if x[0] > 1/2. This is
because if x[0] > 1/2 the observed sample is more likely when H1 is true. Our detector then
compares the observed datum value with 1/2, which is called the threshold.
With this scheme we can make two types of errors. If we decide H1 but H0 is true, we make a Type I
error. On the other hand, if we decide H0 when H1 is true, we make a Type II error. The
terms Type I and Type II error are used in the statistical domain, but in the engineering domain these
errors are referred to as a false alarm and a miss, respectively. The term P(Hi; Hj) denotes the
probability of deciding Hi when Hj is true. Note that it is not possible to reduce both error
probabilities simultaneously. A typical approach is to hold one error probability fixed
while minimizing the other.
Figure 8.1: PDFs in binary hypothesis testing, possible errors and their probabilities
In general terms, the goal of a detector is to decide either H0 or H1 based on the observed data
x = [x[0] x[1] … x[N − 1]]^T. This is a mapping from each possible data set value into a
decision. The decision regions for the previous example are shown in Figure 8.2. Let R1 be
the set of values in R^N that map into decision H1. Then the probability of false alarm PFA is
constrained as

PFA = ∫_{R1} p(x; H0) dx = α,

where α is termed the significance level or size of the test in statistics. Now there are many
regions R1 that satisfy the above relation. Our goal is to choose the one that maximizes the
probability of detection, defined as

PD = ∫_{R1} p(x; H1) dx,
According to the Neyman-Pearson theorem, to maximize PD for a given PFA = α we decide H1 if

L(x) = p(x; H1)/p(x; H0) > γ,

where the threshold γ is computed from the constraint on the probability of false alarm value:

PFA = ∫_{x : L(x) > γ} p(x; H0) dx = α.

The function L(x) is termed the likelihood ratio and the entire test is called the likelihood
ratio test (LRT).
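For concreteness, here is a minimal numerical sketch of this NP test in Python (assuming NumPy/SciPy are available; the values of alpha and the datum are illustrative, not from the text):

```python
import numpy as np
from scipy.stats import norm

# NP test for H0: x[0] ~ N(0,1) versus H1: x[0] ~ N(1,1).
# The LRT reduces to comparing x[0] against a threshold gamma'.
alpha = 0.1                    # chosen P_FA
gamma = norm.isf(alpha)        # P_FA = Q(gamma) = alpha
PD = norm.sf(gamma - 1.0)      # P_D = Q(gamma - 1) for the unit mean shift

x0 = 0.9                       # an observed datum (illustrative)
print(f"gamma={gamma:.3f}, PD={PD:.3f}, decide H1: {x0 > gamma}")
```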
8.2.3 Example
Consider the general signal detection problem

H0 : x[n] = w[n], n = 0, 1, …, N − 1
H1 : x[n] = s[n] + w[n], n = 0, 1, …, N − 1,

where the signal is s[n] = A for A > 0 and w[n] is WGN with variance σ².
The NP detector decides H1 if

L(x) = p(x; H1)/p(x; H0) > γ.

Taking the logarithm of both sides, simplifying, and moving the non-data-dependent terms to the right-
hand side, we decide H1 if

T(x) = (1/N) ∑_{n=0}^{N−1} x[n] > γ′.

Thus the NP detector compares the sample mean x̄ to a threshold γ′. Note that the test
statistic is Gaussian under each hypothesis:
We have then

T(x) ~ N(0, σ²/N) under H0

and

T(x) ~ N(A, σ²/N) under H1,

and therefore we decide between two hypotheses that differ by a shift in the mean of T,
where μ1 > μ0. For this type of detector the detection performance is completely characterized
by the deflection coefficient, defined as

d² = (E(T; H1) − E(T; H0))² / var(T; H0).
In the case when μ0 = 0, d² = μ1²/σ², which may be interpreted as the signal-to-noise ratio
(SNR).
Since

PFA = Q((γ′ − μ0)/√(var(T; H0))) and PD = Q((γ′ − μ1)/√(var(T; H1))),

we have

PD = Q(Q⁻¹(PFA) − √(d²)).

The detection performance is, therefore, monotonic with respect to the deflection coefficient.
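The relation PD = Q(Q⁻¹(PFA) − √d²) is easy to evaluate numerically. A minimal sketch (parameter values are illustrative assumptions):

```python
import numpy as np
from scipy.stats import norm

def pd_from_pfa(pfa, d2):
    # P_D = Q(Q^{-1}(P_FA) - sqrt(d^2)) for a mean-shifted Gaussian statistic
    return norm.sf(norm.isf(pfa) - np.sqrt(d2))

# For the DC level in WGN, d^2 = N*A^2/sigma^2 for the sample-mean statistic.
N, A, sigma2 = 25, 0.5, 1.0
print(pd_from_pfa(1e-2, N * A**2 / sigma2))
```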
8.3.2 Receiver Operating Characteristics
The receiver operating characteristic, or simply ROC curve, is a graphical means to illustrate
the performance of the NP detector. The ROC curve was first used in World War II for the
analysis of radar signals, and later it was employed in signal detection theory. On the ROC curve,
each point corresponds to a pair (PFA, PD) for a given threshold value. By adjusting the value
of the threshold, any point on the curve may be obtained. As the threshold increases, the
PFA decreases but so does the PD, and vice versa.
We already know that for the DC level in WGN example we have

PFA = Q(γ′/√(σ²/N))

and

PD = Q((γ′ − A)/√(σ²/N)).

Now consider instead a detector that ignores the data and decides H1 whenever a coin toss comes up
heads (with probability p). The probability of occurrence of a head in the coin toss has no dependence
upon which hypothesis is true, and therefore PFA = PD = p. This detector then generates the point (p, p) on
the ROC. Considering the different possible values of p, it would generate a 45° line on the ROC
curve, and the same has been marked as a dotted line.
The family of ROCs generated for different values of the deflection coefficient d² is also
shown in Figure 8.3. As d² increases, the value of PD obtained for a given value of PFA also
increases. For d² → ∞, the ideal ROC is obtained, i.e., PD = 1 for any value of PFA.
Further, as the threshold γ varies from ∞ to −∞, the point (PFA(γ), PD(γ)) traces out the ROC curve
from (0, 0) to (1, 1).
The salient properties of the ROC curve for binary hypothesis testing are:
1. If the threshold γ → −∞, the detector always decides H1 and PFA = PD = 1. Thus the point (1, 1) belongs
to the ROC curve.
2. If the threshold γ → ∞, the detector never decides H1 and PFA = PD = 0. Thus the point (0, 0) belongs to
the ROC curve.
3. The slope of the ROC curve at any point (PFA(γ), PD(γ)) is equal to the threshold γ.
4. All points of the ROC curve satisfy PD ≥ PFA.
5. The ROC curve is concave i.e., the domain of the achievable pairs (PD, PFA) is convex.
6. The region of feasible tests is symmetric about the point (0.5, 0.5) i.e., if (PFA, PD) is feasible, so
is (1 - PFA, 1 - PD).
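These properties can be visualized by sweeping the threshold. A sketch that plots the family of ROCs for several deflection coefficients (assuming matplotlib is available):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

pfa = np.linspace(1e-6, 1.0, 500)
for d2 in (0.0, 1.0, 4.0, 9.0):        # d2 = 0 traces the 45-degree chance line
    pd = norm.sf(norm.isf(pfa) - np.sqrt(d2))
    plt.plot(pfa, pd, label=f"d2 = {d2}")
plt.xlabel("P_FA"); plt.ylabel("P_D"); plt.legend(); plt.show()
```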
8.3.3 Example
Consider a random variable Y given by Y = N + λθ, where θ is either 0 or 1, λ is a fixed
number between 0 and 2, and N is uniformly distributed on (−1, 1). We wish to decide between the
hypotheses H0: θ = 0 and H1: θ = 1.
Find (i) the Neyman-Pearson (NP) decision rule to decide H1 for 0 ≤ PFA ≤ 1, and (ii) sketch the
receiver operating characteristic.
The distributions under the two hypotheses are plotted in Fig. 8.4. Let a threshold γ be chosen to
give the specified value of the false alarm probability PFA; then
From the above relationship, the ROC for different values of λ is plotted and is also shown in
Fig. 8.4.
It can be shown that the detector which minimizes the Bayes risk decides H1 if

p(x|H1) P(H1) > p(x|H0) P(H0).

This detector, which minimizes Pe for any prior probability, is termed the maximum a
posteriori probability (MAP) detector.
In cases where the prior probabilities of the hypotheses are equal, the detector which
minimizes Pe decides H1 if

p(x|H1) > p(x|H0).
where A > 0 and w[n] is WGN with variance σ². It is reasonable to assume that
P(H0) = P(H1) = 1/2. The receiver that minimizes Pe decides H1 if

p(x|H1) > p(x|H0),

or, equivalently, we decide H1 if x̄ > A/2. This detector is the same as that obtained with the NP
criterion except for the threshold and, of course, the performance. To determine Pe we note that the
threshold is A/2 and the two conditional error probabilities are equal. Thus,

Pe = Q(√(NA²/(4σ²))).
minimize that risk function. This principle of minimizing the maximum average cost over the
choice of P1 is referred to as the minimax criterion.
The Bayes' risk for a binary hypothesis testing problem is given by

R = ∑_i ∑_j Cij P(Hi; Hj) P(Hj).

Further, we can express the probabilities of the different decisions in terms of the probability of
false alarm PFA, the probability of miss PM and the probability of detection PD. Since one of the
hypotheses H0 and H1 always occurs, i.e., P0 = 1 − P1, we can use this to rewrite the
risk as a function of P1 alone.
Assuming a fixed value of P1 with P1 ∈ (0, 1), the Bayes' test decides H1 if
If P1 = P1*, such that P1* ∈ (0, 1), then the risk as a function of P1 is shown in Figure 8.5.
If the costs of correct decisions are individually zero (C00 = C11 = 0), then the minimax rule
for P1 = P1* reduces to
Furthermore, if the costs of incorrect decisions are individually one (C01 = C10 = 1), then the
minimax rule for P1 = P1* reduces to
PM = PFA
and the minimax cost in this case is
where u(y) is the unit step function. For uniform costs (C00 = C11 = 0 and C01 = C10 = 1), find
the minimax decision rule to decide H1.
Figure 8.6: PDFs of the hypotheses for minimax decision rule example
The PDFs under the two hypotheses are plotted in Figure 8.6. Let the chosen threshold be y = γ;
then we have
On using the complementary error function table, the value of γ that satisfies the above
relation turns out to be γ ≈ 0.565.
Thus the minimax decision rule decides H1 if
y > 0.565
For the uniform cost assignment (C00 = C11 = 0 and C01 = C10 = 1), we have that the minimax risk equals Pe.
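The minimax threshold can also be found numerically by solving PFA(γ) = PM(γ). A sketch of that root-finding step; the densities below are placeholders, not the ones of Figure 8.6 (which are not reproduced in the text), so the resulting γ differs from 0.565:

```python
from scipy.optimize import brentq
from scipy.stats import norm

def pfa(g):   # P(y > g; H0), assuming H0: y ~ N(0,1) (placeholder)
    return norm.sf(g)

def pm(g):    # P(y < g; H1), assuming H1: y ~ N(1,1) (placeholder)
    return norm.cdf(g - 1.0)

gamma = brentq(lambda g: pfa(g) - pm(g), -5.0, 5.0)
print(gamma)  # 0.5 for these placeholder densities
```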
Let the decision region Ri = {x : decide Hi}, where i = 0, 1, …, M − 1. These Ri's together
partition the observation space, so each x must be assigned to one and only one of the decision
regions. The cost contribution to the Bayes risk if x is assigned to R1, say, is ∫_{R1} C1(x)p(x)dx;
that for assigning x to R2 is ∫_{R2} C2(x)p(x)dx, and so on. Generalizing, we should assign x to Rk if

Ci(x) = ∑_{j=0}^{M−1} Cij P(Hj|x)

is minimum for i = k.
Hence, we should choose the hypothesis that minimizes Ci(x)
over i = 0, 1, …, M − 1.
To determine the decision rule that minimizes Pe we use the uniform costs; then
Since the first term is independent of i, the cost Ci(x) is minimized by maximizing P(Hi|x).
Thus, the minimum Pe decision rule is to decide Hk if P(Hk|x) ≥ P(Hi|x) for all i.
where the additive noise w[n] is Gaussian with zero mean and variance σ². The costs are
Cii = 0 and Cij = 1 for i ≠ j, with i, j = 0, 1, 2. Determine the decision regions and the minimum
probability of error Pe.
Figure 8.7: Decision regions for multiple DC signals in white Gaussian noise for the N = 1 case
In this problem, as the prior probabilities are equal, P(H0) = P(H1) = P(H2) = 1/3, the ML
decision rule applies. First consider the simple case of N = 1; the PDFs of the three hypotheses are
shown in Figure 8.7. By symmetry, it is obvious that, as per the ML rule, to minimize Pe we
should decide H0 if y[0] < 1.5, H1 if 1.5 < y[0] < 2.5, and H2 if y[0] > 2.5.
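A small sketch of this N = 1 ML rule, assuming (as the thresholds 1.5 and 2.5 suggest) DC levels of 1, 2 and 3 under H0, H1 and H2:

```python
import numpy as np
from scipy.stats import norm

levels = np.array([1.0, 2.0, 3.0])   # assumed DC levels under H0, H1, H2
sigma = 1.0

def ml_decide(y0):
    # ML with equal noise variances = pick the nearest level
    return int(np.argmin((y0 - levels) ** 2))

# probability of a correct decision given H1 (two decision boundaries)
Pc_H1 = norm.cdf(0.5 / sigma) - norm.cdf(-0.5 / sigma)
print(ml_decide(2.2), Pc_H1)
```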
For the multiple-sample (N > 1) case, the multivariate PDFs as well as the decision regions are
not feasible to plot. In this case we need to derive a test statistic; to do so, note that the
conditional PDF under each hypothesis can be compactly written as
Further, using ȳ as the mean of the observations y[n] and after some manipulation, we can express Di²
as
To determine the minimum Pe, note that in this case there are six types of errors, unlike
the binary case. In general, for an M-ary detection problem there are M² − M = M(M − 1)
error types. It is therefore easier to determine Pc = 1 − Pe, where Pc is the probability of a
correct decision. Thus
Conditioned on Hi, we have
so that
Since the amplitude A is unknown, the PDF is not completely specified, so we cannot directly
perform the LRT as discussed earlier.
There are two approaches to composite hypothesis testing:
1. Bayesian approach: the unknown parameters are considered realizations of random
variables and assigned a prior PDF.
2. Generalized likelihood ratio test (GLRT): the unknown parameters are first estimated and
then used in a likelihood ratio test.
The general problem is to decide between H0 and H1 when the PDFs depend on different
sets of unknown parameters. These parameters may or may not be the same under each
hypothesis. Under H0, assume the vector parameter θ0 is unknown, while under H1, assume
the vector parameter θ1 is unknown.
The unconditional PDFs p(x; H0) and p(x; H1) are now completely specified; they no
longer depend on the unknown parameters. With the Bayesian approach, the optimal NP
detector decides H1 if

p(x; H1)/p(x; H0) = [∫ p(x|θ1; H1) p(θ1) dθ1] / [∫ p(x|θ0; H0) p(θ0) dθ0] > γ.

Remark: In this approach the required integrations are multidimensional, with dimension
equal to that of the unknown parameter vector. The choice of the prior PDFs can also prove to be
difficult. If indeed some prior knowledge is available, then it should be used. If not, one can
use a non-informative prior, i.e., one having a PDF as 'flat' as possible.
8.7.3 Example
Detection of the unknown DC level in WGN: Bayesian approach
where the DC level A in WGN is unknown and can take on any value −∞ < A < ∞, and w[n] is
WGN with variance σ².
To solve this problem using the Bayesian approach, we assign the prior A ~ N(0, σA²), where A is
independent of the noise w[n]. The conditional PDF under H1 is given by
But
Letting
On taking the logarithm of both sides and retaining only the data-dependent terms, we decide
H1 if
or
Remark: Note the form of the detector. As the unknown DC level can be either positive or
negative, the detector is formed by comparing either the square or the absolute value of the
sufficient statistic with an appropriate threshold.
where θ̂1 is the MLE of θ1 assuming H1 is true (i.e., it maximizes p(x; θ1, H1)), and θ̂0 is the
MLE of θ0 assuming H0 is true (i.e., it maximizes p(x; θ0, H0)). This approach also provides
information about the unknown parameters, since the first step in determining LG(x) is to find
the MLEs.
8.7.3 Example
Detection of the unknown DC level in WGN: GLRT approach
Assume θ1 = A and that there are no unknown parameters under H0. The hypothesis test becomes
and the GLRT decides H1 if
Remark: Note that the form of the detector is identical to that obtained using the
Bayesian approach. The derivation of the detector using the GLRT often turns out to be much
simpler than that of the Bayesian approach.
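A minimal sketch of this GLRT for the unknown DC level (the statistic N·x̄²/σ² and its χ²₁ null distribution follow from the discussion above; sample values are illustrative):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
N, sigma2, A_true = 100, 1.0, 0.3
x = A_true + rng.normal(0.0, np.sqrt(sigma2), N)

A_hat = x.mean()                  # MLE of A under H1
T = N * A_hat**2 / sigma2         # GLRT statistic (two-sided in A)
alpha = 0.01
gamma = norm.isf(alpha / 2.0)**2  # P(T > gamma; H0) = 2Q(sqrt(gamma)) = alpha
print(T, gamma, T > gamma)
```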
Instead of summing the observations and comparing the sum with a threshold, we could count the
number of times the observation samples exceed zero; this count can also be used to detect
the unknown fixed positive voltage level. In this case, the detector can be given as
N+ > γu,
where N+ is the number of positive observations and γu denotes an appropriately chosen threshold.
Such a detector essentially counts the number of positive signs in the observations and so is termed
the sign detector. A sign detector is very simple to implement, requiring only a hard limiter followed
by an adder. Unlike that of the sample-mean detector, the performance analysis of the sign detector
is rather involved and is undertaken in the following.
where p is the probability of an observed data sample being positive given that a fixed positive
voltage A is transmitted.
Let d[n] denote the sign of x[n]:
For the sign detector, the likelihood ratio test (LRT) for deciding H1 can be given as
where
Further, on taking the logarithm to the base p/(1 − p), we can express the LRT in a more useful
form as
Note that N+ is a sum of Bernoulli distributed random variables. Under the hypothesis H0,
N+ has a binomial distribution with parameters N and 0.5, i.e.,
Under the hypothesis H1, N+ has a binomial distribution with parameters N and p, i.e.,
9.2.2 Example
Derive a sign detector that uses nine observations and ensures a probability of false alarm
of 0.1 for detecting a positive signal A in the presence of zero-mean Gaussian noise,
and analyze its performance when the probability of a data sample being positive is (i) 0.75 and (ii) 0.99.
Given N = 9, the detection problem is,
or
Since
so
while
or
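Since the intermediate expressions are not reproduced above, here is a minimal numerical sketch of the solution using the binomial tail probabilities:

```python
from scipy.stats import binom

# smallest integer threshold k with P(N+ >= k; H0) <= 0.1, N+ ~ Bin(9, 0.5)
N, alpha = 9, 0.1
k = next(k for k in range(N + 1) if binom.sf(k - 1, N, 0.5) <= alpha)
pfa = binom.sf(k - 1, N, 0.5)

for p in (0.75, 0.99):            # P(sample > 0; H1), cases (i) and (ii)
    pd = binom.sf(k - 1, N, p)
    print(f"threshold k={k}, PFA={pfa:.4f}, p={p}: PD={pd:.4f}")
```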
so we can have a recursive arrangement for the LRT, Λ(x_N) = Λ(x_{N−1}) L(x[N − 1]), with the initial condition Λ(x₁) being the likelihood ratio of the first sample.
For fixed values of PFA and PM, the thresholds η0 and η1 need to be derived to meet these
constraints. Hence (using the standard Wald approximations),

η1 = (1 − PM)/PFA and η0 = PM/(1 − PFA).

To summarize, the sequential LRT detector decides H1 to be true if Λ(x_N) > η1, and if
Λ(x_N) < η0 then it decides H0 to be true. If Λ(x_N) > η0 but smaller than η1, then another
sample is taken to form Λ(x_{N+1}) and the test is repeated.
This kind of test allows the user to terminate the test earlier than the conventional NP test,
once the presence or absence of a target has been determined with an acceptable level of error
(PFA or PM).
Remarks: The salient limitations of this approach are:
Samples are assumed to be IID
PFA and PM are assumed to be constant
9.3.2 Example
Consider the detection of a DC level A in additive Gaussian noise with zero mean and variance
σ². Conduct a sequential likelihood ratio test (SLRT) to detect the presence or absence of
the signal. It is desired to terminate the test when PM ≤ β or PFA ≤ α.
The detection problem is
We know that the thresholds η1 and η0 for deciding the hypotheses H1 and H0, respectively, can
be given as
The outcome of the LRT versus the number of samples considered is shown in Figure 9.1.
Note the cases N = 1, 2, …, 7, where the decision is deferred and another sample is required
to be taken since neither of the thresholds is crossed. When N = 8, the upper threshold is
crossed, thus allowing the decision that the signal-present hypothesis (H1) is
true.
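A minimal simulation sketch of this sequential test, using the Wald threshold approximations quoted earlier (parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
A, sigma2 = 0.5, 1.0
alpha, beta = 0.01, 0.01                 # target P_FA and P_M
eta0 = np.log(beta / (1 - alpha))        # lower log-threshold
eta1 = np.log((1 - beta) / alpha)        # upper log-threshold

logL, n = 0.0, 0
while eta0 < logL < eta1:
    x = A + rng.normal(0.0, np.sqrt(sigma2))   # signal actually present
    logL += (A * x - A**2 / 2.0) / sigma2      # Gaussian LLR increment
    n += 1
print("decide H1" if logL >= eta1 else "decide H0", "after", n, "samples")
```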
where the signal s[n] is an assumed known deterministic signal and the noise w[n] is WGN with
variance σ².
Recall that the NP detector decides H1 if L(x) = p(x; H1)/p(x; H0) > γ.
Thus we decide H1 if

T(x) = ∑_{n=0}^{N−1} x[n] s[n] > γ′,
where T(x) is the test statistic and γ′ a threshold chosen to satisfy PFA = α for a given α.
It is clear that the received data is correlated with a replica of the signal, and therefore the
detector is often referred to as the replica-correlator detector. Figure 10.1 shows the block
diagram of the replica-correlator detector.
Figure 10.1: Replica-correlator detector for deterministic signal in white Gaussian noise
10.1.2 Matched Filter Detector
In this section we show that the replica-correlator detector can be interpreted as performing finite
impulse response (FIR) filtering on the data. Assume that x[n] is the input to an FIR filter with
impulse response h[n], where h[n] is nonzero for n = 0, 1, …, N − 1. The output of the filter at time
n ≥ 0 is given by

y[n] = ∑_{k=0}^{N−1} h[k] x[n − k].

If we consider the impulse response of the FIR filter to be a "flipped-around" version of the
signal to be detected, i.e.,

h[n] = s[N − 1 − n],

then
the filter output at time n = N − 1 is y[N − 1] = ∑_{n=0}^{N−1} x[n] s[n] = T(x). Thus we decide H1 if the
matched filter output sampled at n = N − 1 exceeds the threshold γ′; this is exactly the
replica-correlator statistic, and the resulting implementation is called the matched filter detector.
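The equivalence is easy to verify numerically; a sketch (the signal choice is an arbitrary assumption):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 64
s = np.cos(2 * np.pi * 0.1 * np.arange(N))   # an assumed known signal
x = s + rng.normal(0.0, 1.0, N)              # data under H1, sigma^2 = 1

T_corr = np.dot(x, s)                        # replica-correlator statistic
h = s[::-1]                                  # matched filter h[n] = s[N-1-n]
T_mf = np.convolve(x, h)[N - 1]              # filter output at n = N - 1
print(np.allclose(T_corr, T_mf))             # True
```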
where H(f) and X(f) are the discrete-time Fourier transforms of h[n] and x[n], respectively.
From the matched filter interpretation, H(f) = F{h[n]} = F{s[N − 1 − n]}, where F{·}
represents the discrete-time Fourier transform. So the filter Fourier transform H(f) can be
shown to be
then we have
or equivalently
As under each hypothesis the data samples x[n] are Gaussian, and since the test statistic T(x) is a
linear combination of Gaussian random variables, T(x) is also Gaussian. Let E(T; Hi) and
var(T; Hi) denote the expected value and the variance of T(x) under Hi; then

E(T; H0) = 0, E(T; H1) = ∑ s²[n] = ε, var(T; H0) = var(T; H1) = σ²ε,

where we have used the fact that the w[n]'s are uncorrelated, and ε denotes the signal energy.
Thus T(x) ~ N(0, σ²ε) under H0 and T(x) ~ N(ε, σ²ε) under H1.
It is to be noted that as ε/σ² increases, the PDFs retain their shape but move further apart; this
obviously improves the detection performance. As the PDFs under both hypotheses are
known, we can find the PFA and PD as
where Q(x) = 1 − Φ(x) and Φ(x) is the CDF of the standard normal distribution N(0, 1). Since the CDF
is monotonically increasing, Q(x) must be monotonically decreasing, and so is Q⁻¹(x). On
substituting the threshold from the PFA expression into the expression for PD, we can write

PD = Q(Q⁻¹(PFA) − √(ε/σ²)).

The above relation establishes that PD monotonically increases with increasing ε/σ², i.e., the
energy-to-noise ratio.
where A > 0. Which one of the signals would yield the better detection performance?
The two hypotheses are

H0: x[n] = w[n] versus H1: x[n] = si[n] + w[n], n = 0, 1, …, N − 1,

where the signal si[n] denotes either of the given deterministic signals s1[n] or s2[n] and the
noise w[n] is WGN with variance σ².
Note that in the case of detection of a known deterministic signal in WGN with variance σ², the
detection performance of the NP detector is completely characterized by the signal-to-noise
ratio ε/σ².
As both the signals have identical energy, they would yield identical detection
performance under WGN.
On simplifying and incorporating the non-data-dependent terms into the threshold, we decide H1 if

T(x) = x^T C⁻¹ s > γ′.

This is referred to as a generalized matched filter. Note that for WGN, C = σ²I, and the detector
reduces to the replica-correlator.
Further, it may be viewed as a replica-correlator where the replica is the modified signal
s′ = C⁻¹ s; then

T(x) = x^T s′.

Thus the linear transformation D is a whitening transform, and the generalized matched filter
can also be viewed as a pre-whitener followed by a replica-correlator or matched filter, as
shown in Figure 10.3.
Since T(x) is a linear transformation of the data x, the PDF of the test statistic under either
hypothesis remains Gaussian, the same as that of the data. The first two moments of the test
statistic under either hypothesis are determined below:
It is to be noted that in the case of correlated noise the signal can be designed to maximize
s^T C⁻¹ s, and hence PD, unlike the white noise case, in which the shape of the signal has no
importance and only the signal energy matters.
energy. So the optimal signal is chosen by maximizing s^T C⁻¹ s subject to the fixed energy
constraint s^T s = ε. Making use of a Lagrange multiplier, we maximize the function

J(s) = s^T C⁻¹ s + λ(ε − s^T s),

or, setting the gradient with respect to s to zero,

C⁻¹ s = λ s.

Thus, we should choose the signal s as the eigenvector of C⁻¹ whose corresponding
eigenvalue λ is maximum. Alternatively, we should choose the signal as the eigenvector
of C that has the minimum eigenvalue.
10.5.2 Example
It is desired to design a signal for achieving the best detection performance in colored WSS
Gaussian noise with autocovariance function (ACF) rww[k] = P + σ²δ[k], where both P and σ
are non-zero positive constants. Two competing signals proposed are:
where A > 0. Which one of the signals would yield the better detection performance?
The two hypotheses are:
where the signal si[n] denotes either of the given deterministic signals s1[n] or s2[n], and w[n]
is colored WSS Gaussian noise with ACF rww[k] = P + σ²δ[k], or equivalently the covariance
matrix C = P11^T + σ²I, where 1 denotes the N × 1 column vector of ones and I is the N × N
identity matrix.
Note that in the case of detection of a known deterministic signal in correlated noise, the
detection performance of the NP detector is completely characterized by the signal-to-noise
ratio (SNR)
the value, say θ1, then s = Hθ1 is the known signal. Under the null hypothesis H0 we
have θ = 0, so that no signal is present. In applying the linear model to detection problems,
we decide whether the signal s = Hθ1 is present or not. The detection problem can be
mathematically expressed as
The NP detector immediately follows by letting s = Hθ1 in the detector T(x) = x^T C⁻¹ s > γ′, i.e.,
we decide H1 if
expression .
Further, we can generalize the above results by noting that for the general linear model the
minimum variance unbiased (MVU) estimator of θ is θ̂ = (H^T C⁻¹ H)⁻¹ H^T C⁻¹ x,
where γ is a threshold computed from the constraint on probability of false alarm, PFA.
The PDF under the hypotheses can be given as
with
The test statistic T(x) being a linear function of the data, and assuming that the signal and the noise
are uncorrelated, the PDFs of T(x) under H0 and H1 can be shown to be N(0, σ²||s||²) and
N(||s||², σ²||s||²), respectively.
Finding the probability of false alarm and applying the given constraint (say α),
Noting that the detection threshold satisfies the constraint with equality,
or
where the signal s[n] is a zero-mean white WSS Gaussian random process with variance σs², and
the noise w[n] is WGN with variance σ². For these modeling assumptions, x ~ N(0, σ²I) under
H0 and x ~ N(0, (σs² + σ²)I) under H1. So the NP detector decides H1 if
where
where
Therefore, the NP detector basically computes the energy of the received signal, T(x) = ∑ x²[n],
and compares it to a predetermined threshold. Hence it is referred to as the energy detector.
On using the definition of the right-tailed probability Q_{χ²_ν}(x) for a χ²_ν random
variable, we can find the PFA and PD as

PFA = Q_{χ²_N}(γ′/σ²) and PD = Q_{χ²_N}(γ′/(σs² + σ²)).

As done earlier, we can substitute the value of the threshold γ′ determined from the PFA expression
into the PD expression to get

PD = Q_{χ²_N}(Q_{χ²_N}⁻¹(PFA)/(1 + σs²/σ²)).

Thus, with an increase in σs²/σ², the argument of the Q_{χ²_N} function decreases and the detection
performance improves.
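A sketch of these energy detector expressions using SciPy's chi-squared tail functions (parameter values are illustrative):

```python
from scipy.stats import chi2

N, sigma2, sigma_s2 = 16, 1.0, 0.5
alpha = 1e-2

gamma = sigma2 * chi2.isf(alpha, df=N)          # from P_FA = Q_chi2(gamma/sigma2)
PD = chi2.sf(gamma / (sigma2 + sigma_s2), df=N) # P_D at that threshold
print(gamma, PD)
```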
where
or
so that
Now, let
Note that the NP detector correlates the received data with an estimate of the signal, i.e., ŝ. It is
therefore termed the estimator-correlator detector. Recall that if θ is an unknown random
vector whose realizations are to be estimated based on the data x, where θ and x are jointly
Gaussian with zero mean, then the MMSE estimator is given by

θ̂ = C_{θx} C_{xx}⁻¹ x,

where Cθx = E(θx^T) and Cxx = E(xx^T). In the context of the detection problem, we have θ = s and
x = s + w, with s and w uncorrelated. The MMSE estimate of the signal realization can be
given as

ŝ = Cs (Cs + σ²I)⁻¹ x.

Thus it can be argued that the signal estimate ŝ is the Wiener filter estimate of the given
realization of the random signal. The block diagram of the estimator-correlator is shown in
Figure 11.1.
Figure 11.1: Estimator-correlator detector for detection of Gaussian random signal in white
Gaussian noise
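A minimal sketch of the estimator-correlator statistic T(x) = x^T ŝ with ŝ the Wiener estimate; the signal covariance below is an assumed example:

```python
import numpy as np

rng = np.random.default_rng(3)
N, sigma2 = 32, 1.0
k = np.arange(N)
Cs = 0.8 ** np.abs(k[:, None] - k[None, :])     # assumed signal covariance

W = Cs @ np.linalg.inv(Cs + sigma2 * np.eye(N)) # Wiener estimator matrix
x = rng.multivariate_normal(np.zeros(N), Cs + sigma2 * np.eye(N))  # H1 data
s_hat = W @ x                                   # MMSE signal estimate
print(x @ s_hat)                                # estimator-correlator statistic
```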
By noting that s = Hθ ~ N(0, HCθH^T) and using the previous results for the NP detector in
the random signal case, the estimator-correlator detector decides H1 if
where s ~ N(μs, Cs), w ~ N(0, Cw), and s and w are independent. The NP detector decides
H1 if
or
Taking the logarithm of both sides, retaining only the data-dependent terms and scaling produces the
test statistic
Thus the test statistic consists of both a linear form and a quadratic form in the data x. Consider
the following special cases:
1. Cs = 0, i.e., a deterministic signal with s = μs. Then
where A and ϕ are random variables, f0 is a known frequency within the range 0 < f0 < 0.5, and
w[n] is WGN with variance σ².
Instead of assigning PDFs to A and ϕ, it is more convenient to note that
and
Thus s[n] is a WSS Gaussian random process with ACF rss[k] = σs² cos 2πf0k. In addition, we
can show that the PDF of the amplitude is Rayleigh, or
and the PDF of ϕ = arctan(−q/p) is uniform on (0, 2π), with A and ϕ independent of each other. Since
the amplitude PDF is Rayleigh distributed, this channel model is referred to as the Rayleigh
fading channel model.
With these assumptions, we note that the observed data follows the Bayesian linear model
x = Hθ + w, where
Further, noting that for large N and 0 < f0 < 0.5, we have H^T H ≈ (N/2)I. Thus
or
From the above relations, we can implement the detector in two ways.
In one implementation, the data is correlated with the cosine ("in-phase") and sine ("quadrature")
replicas of the sinusoidal signal. As the phase is random, one or both of these outputs (I or Q) will be
large in magnitude if the signal is present. Since the sign of a correlator output can be positive or
negative, we square the I and Q outputs, sum them with scaling by 1/N and then compare to a
threshold. This type of detector is known as a quadrature matched filter or an incoherent matched
filter.
The second implementation is known as a periodogram detector or a sampled-spectrum detector. In
this, the Fourier transform of the data x[n] is computed at f0, magnitude-squared and scaled by
1/N, and then compared with a threshold.
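A sketch of the quadrature (incoherent) matched filter statistic; the frequency, amplitude and phase below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
N, f0 = 128, 0.11
n = np.arange(N)
x = 0.4 * np.cos(2 * np.pi * f0 * n + 1.2) + rng.normal(0.0, 1.0, N)

I = np.dot(x, np.cos(2 * np.pi * f0 * n))   # in-phase correlator output
Q = np.dot(x, np.sin(2 * np.pi * f0 * n))   # quadrature correlator output
T = (I**2 + Q**2) / N                       # periodogram value at f0
print(T)
```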
2.1 Outline
In this module the basics of classical parameter estimation are discussed and the bounds on
unbiased estimators are described. The salient topics discussed in this module are:
Minimum variance unbiased estimator (MVUE)
Cramer-Rao lower bound (CRLB) on unbiased estimators
Fisher information and its relation to CRLB
Computation of CRLB in general cases
Linear model of data and its generalization
But often this criterion does not yield a realizable estimator, i.e., one which can be written
as a function of the data only:
since, although the variance var(θ̂) depends only on the data, the bias term [E(θ̂) − θ]² is a function
of the unknown θ. Thus an alternative approach is to require E(θ̂) − θ = 0 (i.e., zero bias)
and minimize var(θ̂). This produces the minimum variance unbiased estimator (MVUE).
2.2.1 Minimum Variance Unbiased Estimator (MVUE)
The MVUE is an optimal estimator. The two attributes of this optimal estimator are:
It should be unbiased: E(θ̂) = θ for all θ.
It should have minimum variance: this ensures that the estimator, in the expected sense, deviates
from the true value of the parameter minimally among all possible unbiased estimators for the
problem.
2.2.2 Example
Consider a DC signal, A, in the presence of additive white Gaussian noise (WGN)
w[n] with variance σ²; the observed data x[n] is given by:

x[n] = A + w[n], n = 0, 1, …, N − 1.

Find its variance:
The sample-mean estimator turns out to be unbiased with variance var(Â) = σ²/N.
But it is not obvious whether it is the MVUE or not. If Â is the MVUE, all other unbiased
estimators would have their variance satisfy var(θ̂) ≥ σ²/N.
2.2.3 Existence of MVUE
The MVUE is the most desired estimator, but it does not always exist.
Figure 2.1 depicts two possible situations for the variance var(θ̂) of an unbiased
estimator of the parameter θ. Suppose only three unbiased estimators exist, with
variances as shown in Figure 2.1(a); then clearly the estimator with the uniformly smallest
variance is the MVUE. If the situation shown in Figure 2.1(b) exists, then there is no MVUE,
since for θ > θo one estimator is better, while for θ < θo another is better. In the former case the
MVUE is sometimes referred to as the uniformly minimum variance unbiased estimator to
emphasize the fact that it has the smallest variance for all θ. In general the MVUE does not
always exist.
so that
and
For θ ≥ 0 the minimum possible variance of an unbiased estimator is 18/36, while that for θ <
0 is 24/36. Clearly, between these two estimators, no MVU estimator exists.
and
then we can find the MVUE as θ̂ = g(x), and the minimum variance is 1/I(θ). The proof
of these results is given later in Section 2.4.1.
For a p-dimensional parameter, θ, the equivalent condition in terms of the covariance matrix is
given by:
Cθ̂ − I⁻¹(θ) is positive semi-definite, where Cθ̂ = E[(θ̂ − E(θ̂))(θ̂ − E(θ̂))^T] is the
covariance matrix of the estimator. The Fisher information matrix, I(θ), is given as:
then we can find the MVUE as θ̂ = g(x), and the minimum covariance is I⁻¹(θ).
2.3.2 MVU Estimator and CRLB Attainment
In general, an MVU estimator may exist but may not attain the CRLB. To illustrate this, let
us assume that there exist three unbiased estimators for estimating the unknown parameter θ
in an estimation problem, with variances as shown in Figure 2.3. As shown in
Figure 2.3(a), the estimator θ̂3 is efficient, as it attains the CRLB, and therefore it is also the
MVUE. On the other hand, in Figure 2.3(b), the estimator θ̂3 does not attain the CRLB, so it
is not efficient. But its variance is uniformly less than that of the other possible unbiased
estimators, so it is the MVUE.
Note that the above regularity conditions are satisfied in general, except when the domain over
which the PDF is nonzero depends on the unknown parameter (e.g., the uniform
distribution U(0, θ) with unknown domain parameter θ).
Given the PDF p(x; θ), the Fisher information I(θ) can also be expressed as
or
The Fisher information has the essential properties of an information measure, as is
evident from the following facts:
This results in
2.3.4 Example
Consider the case of a DC signal A embedded in noise. The noisy observation is given by:

x[n] = A + w[n], n = 0, 1, …, N − 1.

Note that the second derivative turns out to be a constant; thus the CRLB is

var(Â) ≥ σ²/N,

where I(θ) = N/σ² and g(x) = x̄. It was shown earlier that, for estimation of the DC level
in WGN, the sample-mean estimator has a variance of σ²/N. Thus it is indeed an MVUE, as its
variance achieves the minimum possible variance bound.
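A quick Monte Carlo sketch confirming that the sample mean attains the bound σ²/N (values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
A, sigma2, N, trials = 1.0, 2.0, 50, 20000

x = A + rng.normal(0.0, np.sqrt(sigma2), size=(trials, N))
A_hat = x.mean(axis=1)
print(A_hat.var(), sigma2 / N)   # empirical variance vs the CRLB
```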
which means that as the number of observations increases, the MSE of the estimator descends
to zero, i.e., the estimate converges to θ.
For example, if the MSE of x̄ is 1/N, then since lim_{N→∞}(1/N) = 0, x̄ is a consistent estimator
of θ, or more specifically "MSE-consistent". There are other types of consistency definitions
that look at the probability of the errors. They work better when the estimator does not have a
variance.
2.4.1 CRLB in General Cases
In practice, we often require the CRLB for a parameter which is a function of some more
fundamental parameter. So we first discuss the derivation of the CRLB for a transformation
of a scalar parameter. Later, the derivation of the CRLB for a deterministic signal in white
Gaussian noise and its extension to the general Gaussian case are discussed.
CRLB for Transformed Parameter
Consider a transformed parameter α = g(θ), where the PDF of the data is parameterized by θ.
Then the CRLB for an estimator of α, under the regularity conditions, is given by

var(α̂) ≥ (∂g/∂θ)² / I(θ).

The speed measurement would be less accurate at higher speeds (the bound being quadratic in
speed) and more accurate for larger distances. Thus, the speed estimator can achieve the CRLB only
asymptotically.
2.5.1 CRLB for Signals in White Gaussian Noise
Assume that a deterministic signal s[n; θ] with an unknown parameter θ is observed in WGN as

x[n] = s[n; θ] + w[n], n = 0, 1, …, N − 1.

This form of the bound emphasizes the importance of the signal dependence on θ: signals
which change rapidly as the unknown parameter changes result in more accurate estimation.
Proof
The likelihood function is
thus both the mean and covariance may depend on θ. Then the Fisher information matrix can
be given by (the proof is lengthy, so omitted)
where
where θ = [A f0 ϕ]^T, A > 0, 0 < f0 < 1/2, and −π ≤ ϕ ≤ π. Since multiple parameters are
unknown, the vector case of the CRLB is required; note also that the covariance
matrix, C = σ²I, does not depend on θ. Thus we have
For evaluating the CRLB, it is assumed that f0 is not close to 0 or 1/2, since that allows for
certain simplifications based on the approximations
into the form I(θ)(g(x) -θ). When we do this the MVU estimator for θ is:
2.6.2 Example
Consider fitting the data, x(t), by a pth-order polynomial function of t:
where the θi's are the polynomial coefficients and w(t) is the approximation error, assumed to be
zero-mean Gaussian with a constant variance.
Assume we have N samples of data; then:
x = Hθ + w, where H is the N × (p + 1) observation matrix:
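The matrix itself is not reproduced above; a sketch of its construction and of the resulting least squares fit (a quadratic is used as an illustrative case):

```python
import numpy as np

rng = np.random.default_rng(6)
N, p = 50, 2
t = np.linspace(0.0, 1.0, N)
theta_true = np.array([1.0, -2.0, 3.0])     # p + 1 polynomial coefficients
H = np.vander(t, p + 1, increasing=True)    # columns 1, t, t^2
x = H @ theta_true + rng.normal(0.0, 0.1, N)

theta_hat, *_ = np.linalg.lstsq(H, x, rcond=None)  # (H^T H)^{-1} H^T x
print(theta_hat)
```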
2.6.3 Example
Consider the problem of estimating the channel in a wireless communication system.
The receiver cannot decode the transmitted symbols correctly unless the channel is known,
and the channel is usually unknown in wireless communication. To overcome this issue, the
transmitter regularly transmits a known pseudorandom noise (PN) sequence u[n] so as to enable
the receiver to estimate the unknown channel, as depicted in Figure 2.5. The unknown channel is
modeled with a finite impulse response (FIR) filter of an appropriate order, say p.
Assuming that w[n] is white Gaussian noise and noting that the data is in the linear model form,
the MVU estimator of the channel impulse response is

ĥ = (H^T H)⁻¹ H^T x.

For large N, and using the fact that u[n] = 0 for n < 0 and n > N − 1, it can be shown that
the entries of H^T H can be identified with the autocorrelation function of the known sequence u[n].
With this approximation, H^T H also takes the form of a symmetric Toeplitz matrix.
Therefore, the FIR filter coefficient estimators are (approximately) independent, and the MVU
estimator for the ith filter coefficient can be given as
That is:
3.1 Outline
In this module the general approaches to finding the MVUE are discussed. These approaches make
use of sufficient statistics. Also, the best linear unbiased estimator (BLUE) is described,
which is much easier to find in practical cases. The salient topics discussed in this module are:
Sufficient statistics
Determination of the MVUE using sufficient statistics
Best linear unbiased estimation (BLUE)
By letting
we have the sufficient statistic T(x) = ∑n x[n], which is in fact the minimal sufficient
statistic.
3.2.3 Example
Consider the problem of estimating the phase of a sinusoid embedded in WGN, w[n] ~ N(0, σ²).
Here, the amplitude A and the frequency f0 of the sinusoid, as well as the noise PDF, are
assumed to be known.
The PDF of the data vector can be given as
Noting that the exponent can be expanded as
Note that in this problem, as per the factorization theorem, there is no single sufficient statistic;
rather, T1(x) and T2(x) are jointly sufficient statistics for the estimation of the
phase of the sinusoid.
where η(θ) and A(θ) are some functions of θ, T(x) is a function of the data x, and h(x) is purely a
function of the data, i.e., it does not involve θ.
Suppose that x = {x[0], x[1], …, x[N − 1]} are i.i.d. samples from a member of the exponential
family with the parameter θ; then the joint PDF can be expressed as
From this it is apparent, by the factorization theorem, that T(x) = ∑_{n=0}^{N−1} T(x[n]) is a
sufficient statistic.
In the case of multi-parameter members of the exponential family, the joint PDF or PMF with
parameter set θ = [θ1, θ2, …, θd]^T can be expressed as
3.3.3 Example
Show that the Gamma distribution belongs to the exponential family of distributions.
The Gamma distribution is characterized by the density function
We note that the Gamma distribution has the form of the exponential family of distributions
with η1(α, β) = −β, η2(α, β) = α − 1, T1(x) = x, T2(x) = ln x, A(α, β) = ln Γ(α) − α ln β, and h(x)
= u(x), the unit step function that restricts the support to x > 0.
Obviously:
The BLUE is derived by finding the A which minimizes the variance var(θ̂), subject
to the constraint AH = I, where C is the covariance matrix of the data x. Carrying out the
minimization yields the following form for the BLUE:

θ̂ = (H^T C⁻¹ H)⁻¹ H^T C⁻¹ x,

with minimum covariance Cθ̂ = (H^T C⁻¹ H)⁻¹.
Salient attributes of BLUE:
For the general linear model, the BLUE is identical in form to the MVUE.
The BLUE assumes only up to 2nd-order statistics, and not the complete PDF of the data, unlike
the MVUE, which was derived assuming a Gaussian PDF.
If the data is truly Gaussian then the BLUE is also the MVUE.
The BLUE for the general linear model can be stated in terms of following theorem.
Gauss-Markov Theorem: Consider a general data model of the form:
x = Hθ + w
where H is known, and w is noise with covariance C (the PDF of w otherwise arbitrary).
Then the BLUE of θ is:
where w[n] is of unspecified PDF with var(w[n]) = σn², and the unknown parameter θ = A is to
be estimated. We assume a BLUE estimate and identify H by noting:
E[x] = 1θ
where x = [x[0], x[1], x[2], …, x[N − 1]]^T and 1 = [1, 1, 1, …, 1]^T, so we have H ≡ 1. Also,
we note that in the case of white noise, where σn² = σ², we get the sample-mean
estimator:
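A sketch of this BLUE, which for H = 1 and uncorrelated noise reduces to a variance-weighted average of the samples (the noise variances below are assumed values):

```python
import numpy as np

rng = np.random.default_rng(7)
A = 2.0
var_n = np.array([0.5, 1.0, 4.0, 0.25, 2.0])   # assumed per-sample variances
x = A + rng.normal(0.0, np.sqrt(var_n))

A_blue = np.sum(x / var_n) / np.sum(1.0 / var_n)
print(A_blue)   # equals the sample mean when all variances are equal
```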
3.4.3 Example
Consider the acoustic echo cancellation (AEC) problem, whose signal flow-graph is
shown in Figure 3.1. A speech signal u[n] from the far-end side is broadcast into a
room by means of a loudspeaker. A microphone is present in the room to record the local
signal v[n], which is to be transmitted back to the far-end side. The recorded microphone
signal y[n] = v[n] + x[n] contains the undesired echo x[n] due to the acoustic echo path existing
between the loudspeaker and the microphone. The echo path transfer function is modeled using an
FIR filter, so the echo signal can be considered as a filtered
version of the loudspeaker signal. The objective of the AEC is to
estimate the impulse response Ĥ(z) of the echo path so as to produce an echo-free
signal.
Here v, the near-end signal vector, is modeled as a zero-mean process with correlation matrix
R = E[vv^T]. Any linear estimator of h can be written as a linear function of the microphone
signal vector y as
3.4.4 Example
Show that the BLUE commutes over linear (affine) transformations.
Let us consider that, given the BLUE of θ, we wish to estimate
where H is known, and w is noise with covariance C (the PDF of w otherwise arbitrary).
Then the BLUE of θ is:
where Cθ̂ is the minimum covariance matrix. Further, assuming that the
transformation matrix B is invertible, we get
Then,
Therefore,
This shows that, in the case of a linear transformation of the parameter, the BLUE of the
transformed parameter can be obtained simply by applying the same linear transformation to
the BLUE of the original parameter.
where .
Maximizing the likelihood function by setting its derivative to zero yields the MLE.
Another important observation is that, unlike the previous estimators, the MLE does not
require an explicit analytical expression for p(x; θ)! Indeed, given even a plot of the PDF as a
function of θ, one can numerically search for the θ that maximizes the PDF.
4.2.2 Example
Consider the problem of a DC signal embedded in noise:
where w[n] is WGN with zero mean and known variance σ².
We know that the MVU estimator for θ is the sample mean. To see that this is also the MLE,
we consider the PDF:
4.2.3 Example
Consider the problem of a DC signal embedded in noise:
where w[n] is WGN with zero mean but unknown variance which is also A; that is, the
unknown parameter, θ = A, manifests itself both as the unknown signal level and as the variance of
the noise. Although a highly unlikely scenario, this simple example demonstrates the power
of the MLE approach, since finding the MVUE by the earlier procedures is not easy. The
likelihood function for x is given by:
On differentiating we have:
and setting the derivative to zero and solving for θ produces the MLE:
and:
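A sketch that maximizes this likelihood numerically and compares the result against the closed-form root of the likelihood equation, Â = −1/2 + √((1/N)∑x² + 1/4):

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(8)
A_true, N = 1.5, 2000
x = A_true + rng.normal(0.0, np.sqrt(A_true), N)   # mean A, variance A

def neg_loglike(A):
    return 0.5 * N * np.log(2 * np.pi * A) + np.sum((x - A) ** 2) / (2 * A)

res = minimize_scalar(neg_loglike, bounds=(1e-6, 20.0), method="bounded")
A_closed = -0.5 + np.sqrt(np.mean(x**2) + 0.25)
print(res.x, A_closed)   # the two agree
```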
where θ̂ is the MLE of θ. If g is not a one-to-one function (i.e., not invertible), then α̂ is
obtained as the MLE of the transformed likelihood function, p_T(x; α), which is defined as:
4.3.2 Example
In this example we demonstrate the finding of the transformed MLE. In the context of the previous
example, consider two different parameter transformations: (i) α = exp(A) and (ii) α = A².
Case (i): From the previous example, the PDF parameterized by the parameter θ = A can be given
as
Now, to find the MLE of α, setting the derivative of p_T(x; α) with respect to α to zero yields
or
But Â being the MLE of A, we have α̂ = exp(Â). Thus the MLE of the transformed
parameter is found by substituting the MLE of the original parameter into the transformation
function. This is known as the invariance property of the MLE.
Case (ii): Since A = ±√α, α is not a one-to-one transformation of A. If we
take only A = +√α, then some possible PDFs will be missing. To characterize all possible
PDFs, we need to consider the two sets of PDFs
and the MLE of θ is found by differentiating the log-likelihood, which can be shown to yield:
Proof
In the following, the proof of this important property is outlined for the scalar parameter case.
Assume that the observations are IID and the regularity condition holds,
i.e., E[∂ ln p(x; θ)/∂θ] = 0. Further assume that the first-order and the second-order derivatives
of the likelihood function are defined.
Before deriving the asymptotic PDF, it is first shown that the MLE is a consistent estimator. For
this, using the Kullback-Leibler information inequality,
or
with equality if and only if θ1 = θ2, the right-hand side of the above inequality is maximized
for θ = θ0. As the data is IID, the maximization of the log-likelihood function is equivalent to
maximizing
But for N → ∞, this converges to the expected value by the law of large numbers. Hence, if
θ0 is the true value of θ, we have
By a continuity argument, the normalized log-likelihood function is also maximized at θ = θ0, or,
as N → ∞, the MLE is θ̂ = θ0. Thus the MLE is consistent.
To derive the asymptotic PDF of the MLE, using the mean value theorem we have
where the intermediate point lies between θ̂ and θ0. But by the definition of the MLE, the left-hand
side of the above relation is zero, so that
Now let ξn = ∂ ln p(x[n]; θ)/∂θ, which is a random variable, being a function of x[n]. Additionally,
since the x[n]'s are IID, so are the ξn's. By the central limit theorem, the numerator term has a PDF
that converges to a Gaussian with mean given below,
due to the independence of the random variables. We then apply Slutsky's theorem, which says
that if a sequence of random variables xn has the asymptotic PDF of the random variable x, and
the sequence of random variables yn converges to a constant c, then xn/yn has the same
asymptotic PDF as the random variable x/c. Thus in this case
So that
or equivalently
or finally
Thus the distribution of the MLE of a parameter is asymptotically normal, with mean the true
value of the parameter and variance the inverse of the Fisher information.
is minimized over the N observation samples of interest, and we call the minimizing value the LSE
of θ. More precisely, we have:
An important assumption needed to produce a meaningful unbiased estimate is that the noise and
model inaccuracies, w[n], have zero mean. However, no other probabilistic assumption about
the data is made (i.e., the LSE is valid for both Gaussian and non-Gaussian noise). At the same
time, we cannot make any optimality claims with the LSE (as these would depend on the
distribution of the noise and modeling errors).
A problem that arises from assuming the signal model function s(n; θ), rather than knowledge
of p(x; θ), is the need to choose an appropriate signal model. Then again, in order to obtain a
closed-form or parametric expression for p(x; θ), one usually requires knowledge of the
underlying model and noise characteristics anyway.
5.2.2 Example
Consider observations, x[n], arising from a DC-level signal model, s[n] = s(n; θ) = θ:
The resulting LSE, the sample mean, is surprisingly identical in functional form to the MVU
estimator for the linear model.
An interesting extension of the linear LS is the weighted LS, where the contribution to the
error from each data sample can be weighted in importance by using a
different form of the error criterion:
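A sketch of weighted least squares with the standard solution θ̂ = (H^T W H)⁻¹ H^T W x; the line model and inverse-variance weights are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(9)
N = 40
H = np.column_stack([np.ones(N), np.arange(N, dtype=float)])
theta_true = np.array([1.0, 0.1])
var_n = np.linspace(0.1, 2.0, N)                  # heteroscedastic noise
x = H @ theta_true + rng.normal(0.0, np.sqrt(var_n))

W = np.diag(1.0 / var_n)                          # inverse-variance weights
theta_wls = np.linalg.solve(H.T @ W @ H, H.T @ W @ x)
print(theta_wls)
```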
Recall that for this problem the earlier derived BLUE is of the form:
Note that the LSE does not take into account the near-end signal characteristics (R = E[vv^T])
and therefore, in practice, it is not found to be as effective as the BLUE.
On combining the two equations and using the matrix form, we have
or
The error vector ε = x − Hθ̂ must be orthogonal to the columns of H. This is the well-known
orthogonality principle. In effect, the error represents the part of x that cannot be
described by the signal model. The minimum LS error Jmin can be given as
where hi^T x is the length of the component of the vector x along hi. In matrix notation this is
so that
This result is due to the orthonormal columns of H. As a result, we have H^T H = I and therefore
In general, the columns of H will not be orthonormal, so the signal vector estimate is
obtained as
and hence
Remark: Note that the constrained LSE is a corrected version of the unconstrained LSE. In
cases where the constraint happens to be satisfied by the unconstrained estimate, i.e., Aθ̂ = b,
then according to the above relation the LSE and the constrained LSE are identical. Such is
usually not the case, however.
5.5.2 Example
In this example we explain the effect of constraints on the LSE. Consider a signal model
With some matrix algebra we can show the constrained LSE and the corresponding signal
estimate to be
Since θ1 = θ2, the two observations are averaged, which is intuitively reasonable. In this simple
problem, we can easily incorporate the constraint into the given signal model.
Note that the parameter to be estimated then reduces to θ only. Estimating the unconstrained
LSE of θ using the reduced signal model would have produced the same result. Similar to the
least squares case, it is instructive to view the constrained least squares estimation problem
geometrically, as done in Figure 5.5 in the context of this example.
and p(x; θ) is the PDF of x parameterized by θ. In the Bayesian approach, the estimator is
similarly derived by minimizing the Bayesian MSE, θ̂ = arg min Bmse(θ̂), where

Bmse(θ̂) = E[(θ − θ̂)²]

is the Bayesian mean square error and the expectation is with respect to the joint PDF p(x, θ)
of x and θ (since θ is now a random variable). It is to be noted that the squared error (θ̂ − θ)² is
identical in both the Bayesian and classical MSE. The minimum Bmse(θ̂) estimator, or MMSE
estimator, is derived by differentiating the expression for Bmse(θ̂) with respect to θ̂ and setting
the result to zero, to yield:

θ̂ = E(θ|x).

Thus the MMSE estimate is the conditional expectation of the parameter θ given the observations x.
Apart from the computational (and analytical!) requirements in deriving an expression for the
posterior PDF and then evaluating the expectation E(θ|x), there is also the problem of finding
an appropriate prior PDF. The usual choice is to assume that the joint PDF, p(x, θ), is
Gaussian, and hence both the prior PDF, p(θ), and the posterior PDF, p(θ|x), are also Gaussian
(this property implies the Gaussian PDF is a conjugate prior distribution). Thus the form of
the PDFs remains the same, and all that changes are the means and the variances.
6.2.2 Example
Consider a signal embedded in noise:

x[n] = A + w[n], n = 0, 1, …, N − 1,

where as before w[n] ~ N(0, σ²) is a WGN process and the unknown parameter θ = A is to be
estimated. However, in the Bayesian approach we also assume that the parameter A is a random
variable with a prior PDF, which in this case is the Gaussian PDF p(A) = N(μA, σA²). We also
have p(x|A) = N(A, σ²), and we can assume that A and x are jointly Gaussian. Thus the
posterior PDF p(A|x) is Gaussian, and the MMSE estimate is

Â = α x̄ + (1 − α) μA, where α = σA²/(σA² + σ²/N).
Upon closer examination of the MMSE estimate we observe the following (assume σA² ≪ σ²):
1. With little data (N small), α → 0 and Â → μA; that is, the MMSE estimate tends towards
the mean of the prior PDF and effectively ignores the contribution of the data. Also p(A|x) ≈
N(μA, σA²).
2. With a large amount of data (N large), α → 1 and Â → x̄; that is, the MMSE estimate
tends towards the sample mean and effectively ignores the contribution of the prior
information. Also p(A|x) ≈ N(x̄, σ²/N).
Conditional PDF of the Multivariate Gaussian: If x and y are jointly Gaussian, where x is k × 1
and y is l × 1, with the mean vector [E(x)^T, E(y)^T]^T and the partitioned covariance matrix,
then the conditional PDF, p(y|x), is also Gaussian, with posterior mean vector and
covariance matrix given by:

E(y|x) = E(y) + Cyx Cxx⁻¹ (x − E(x)), Cy|x = Cyy − Cyx Cxx⁻¹ Cxy.

This result can be used for MMSE estimation involving a jointly Gaussian parameter vector
and data vector.
Lecture 18 : Properties of Bayesian Estimator
6.3.1 Bayesian Linear Model
Now consider the Bayesian linear model:
x = Hθ + w
where θ is the unknown parameter to be estimated, with prior PDF N(μθ, Cθ), and w is
Gaussian noise with PDF N(0, Cw). The MMSE estimate is provided by the expression for E(y|x),
where we identify y ≡ θ. We have:
Now p(x|θ) is, in reality, p(x|θ,α), but we can obtain the true p(x|θ) by:
is one specific case of a general estimator that attempts to minimize the average of a cost
function, C(ϵ), that is, the Bayes risk R = E[C(ϵ)], where ϵ = θ − θ̂. Figure 6.1 shows the
plots of three different cost functions of wide interest; which of the central tendencies of the
posterior PDF gets emphasized by each choice of cost function is discussed below:
1. Quadratic: C(ϵ) = ϵ², which yields R = Bmse(θ̂). It has already been shown that the estimate
that minimizes Bmse(θ̂) is the mean of the posterior PDF, θ̂ = E(θ|x).
2. MAP Estimator: the hit-or-miss cost function yields the maximum a posteriori (MAP)
estimator, which is the mode (or maximum) of the posterior PDF, i.e., the value that maximizes
p(θ|x).
3. Bayesian ML Estimator: the Bayesian maximum likelihood estimator, which is the special
case of the MAP estimator where the prior PDF, p(θ), is uniform or non-informative.
Noting that the conditional PDF of x given θ, p(x|θ), is essentially equivalent to the PDF
of x parameterized by θ, p(x; θ), the Bayesian ML estimator is equivalent to the classical
MLE.
Comparison among the three types of Bayesian estimators:
The MMSE estimator is preferred due to its squared-error cost function, but it is also the most
difficult to derive and compute, due to the need to find an expression for the posterior PDF, p(θ|x),
in order to evaluate the integral ∫ θ p(θ|x) dθ.
The hit-or-miss cost function used in the MAP estimator, though less "precise", is much
easier to work with, since there is no need to integrate; one only finds the maximum of the
posterior PDF p(θ|x), which can be done either analytically or numerically.
The Bayesian ML estimator is equivalent in preference to the MAP only in the case where the prior
is non-informative; otherwise it is a sub-optimal estimator.
As with the classical MLE, an expression for the conditional PDF, p(x|θ), is easier to obtain
than one for the posterior PDF, p(θ|x). Since in most cases knowledge of the prior is not
available, it is not surprising that classical MLEs tend to be more prevalent. However, it may not
always be prudent to assume that the prior is uniform, especially in cases where prior
knowledge of the estimate is available even though the exact PDF is unknown. In these cases
a MAP estimate may perform better, even if an "artificial" prior PDF is assumed (e.g., a
Gaussian prior, which has the added benefit of yielding a Gaussian posterior).
where a = [a1,a2,…,aN-1]T.
The estimation problem now is to choose the weight coefficients to minimize the
Bayesian MSE:
The resulting estimator is termed the linear minimum mean square error (LMMSE) estimator.
It is to be noted that the LMMSE estimator will be sub-optimal unless the MMSE estimator
happens to be linear; the MMSE estimator is linear when θ and x are jointly Gaussian.
If the Bayesian linear model is applicable, we can write
x = Hθ + w
The weight coefficients are obtained by setting ∂Bmse/∂ai = 0 for i = 0, 1, …, N − 1; this yields:
where ho denotes the true value of the impulse response; the estimate is then given by the mean
of the posterior PDF p(ho|y):
It is worth comparing the form of the LMMSE estimator with that of the earlier derived BLUE:
Note that in the MMSE criterion of the Bayesian framework, the variance of the estimator and
the squared bias are weighted equally. Thus the variance of the estimator can be reduced
below that of the MVUE if the estimator is no longer constrained to be unbiased.
We assume N samples of time-series data x = [x[0], x[1], …, x[N − 1]]^T which are wide-sense
stationary (WSS). Further, as E(x) = 0, the N × N covariance matrix takes the symmetric
Toeplitz form:
where rxx[k] = E(x[n]x[n − k]) is the autocorrelation function (ACF) of the x[n] process
and Rxx denotes the autocorrelation matrix. Note that since x[n] is WSS, the
expectation E(x[n]x[n − k]) is independent of the absolute time index n.
In signal processing the estimated ACF is used; it is given by
Both the data x and the parameter to be estimated are assumed to be zero mean. Thus the
LMMSE estimator is:
Application of LMMSE estimation to the three signal processing problems of
smoothing, filtering and prediction gives rise to different kinds of Wiener filters, which are
discussed in the following sections.
smoothing and filtering is that the signal estimate ŝ[n] can use the entire data set:
the past values (x[0], x[1], …, x[n − 1]), the present value x[n] and the future values
(x[n + 1], x[n + 2], …, x[N − 1]). This means that the solution cannot be cast as a "filtering"
problem, since we cannot apply a causal filter to the data. We assume that the signal and noise
processes are uncorrelated. Hence,
and thus
Also
Now, by letting h[k] = a_{n−k}, the above estimator can be expressed as a convolution sum,
where h[k] can be identified as the impulse response of an infinite-length two-sided time-
invariant filter.
Analogous to the LSE case (refer to Section 5.4.1), the orthogonality principle also holds for the
LMMSE case, i.e., the error in estimation (θ − θ̂) is always orthogonal (or perpendicular) to
the observed data {…, x[−1], x[0], x[1], …}. This can be mathematically expressed as
Hence,
Thus the equations required to be solved for determining the infinite Wiener filter impulse
response, also referred to as the Wiener-Hopf equations, are given by
On taking the Fourier transform of both sides of the above equation we have

H(f) = Pss(f)/Pxx(f),

where H(f) is the frequency response of the infinite Wiener smoother, and Pxx(f) and Pss(f) are
the power spectral densities of the noisy and clean signals, respectively.
As the signal and noise are assumed to be uncorrelated, the frequency response of the Wiener
smoother can be expressed as

H(f) = Pss(f) / (Pss(f) + Pww(f)).

Remarks: Since the power spectral densities are real and even functions of frequency, the
impulse response also turns out to be real and even. This means that the designed filter is
non-causal, which is consistent with the fact that the signal is estimated using future as
well as present and past data.
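A frequency-domain sketch of this smoother; the spectra below are assumed (known) model spectra, not taken from the text:

```python
import numpy as np

rng = np.random.default_rng(11)
N = 1024
f = np.fft.fftfreq(N)
Pss = 1.0 / (1.0 + (f / 0.05) ** 2)      # assumed low-pass signal PSD
Pww = 0.5 * np.ones(N)                   # white noise PSD

# synthesize a signal roughly matching Pss, then add white noise
s = np.fft.ifft(np.sqrt(Pss) * np.fft.fft(rng.normal(size=N))).real
x = s + rng.normal(0.0, np.sqrt(0.5), N)

H = Pss / (Pss + Pww)                    # Wiener smoother response
s_hat = np.fft.ifft(H * np.fft.fft(x)).real
print(np.mean((x - s)**2), np.mean((s_hat - s)**2))  # smoothing lowers the MSE
```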
respectively. Assume that the signal and noise are uncorrelated and zero mean. Find the non-
causal optimal Wiener filter for estimating the clean signal from its noisy version.
On computing the z-transforms of the ACFs, the power spectral densities of the signal and the
noise can be derived as
Given Hopt(z), the impulse response of the optimal stable filter turns out to be
Then:
For the large-data (n → ∞) case, the time-varying impulse response h(n)[k] can be replaced with
its time-invariant version h[k], and we have
This is termed the infinite Wiener filter. The determination of the causal Wiener filter
involves the use of the spectral factorization theorem and is explained in the following.
The one-sided z-transform of a sequence x[n] is defined as
where the filter impulse response h[n] is constrained to be causal. The two-sided z-transform
that satisfies the Wiener-Hopf equation can be written as
7.4.3 Example
Consider a signal s[n] corrupted with an additive white noise w[n]. The signal and noise are
assumed to be zero mean and uncorrelated. The autocorrelation function (ACF) of the signal
and noise are
Find the causal optimal Wiener filter to estimate the signal from its noisy observations.
As the signal and noise are uncorrelated, we have
Figure 7.2: Plot showing the two-sided and the causal optimal filters for the causal Wiener
filter example problem.
Therefore,
where rxx = [rxx[l] rxx[l + 1] … rxx[l + N − 1]]^T (a time-reversed version of the ACF vector).
When written out, we get the Wiener-Hopf prediction equations:
As pointed out earlier, the Levinson recursion is a computationally efficient procedure for
solving these equations recursively. The special case of l = 1, the one-step predictor, covers
two important cases in signal processing:
The values −h[n] are termed the linear prediction coefficients (LPC), which are used extensively in
speech coding. For example, a 10th-order (N = 10) linear predictor is commonly used in speech
coding and is given by
The resulting Wiener-Hopf equations are identical to the Yule-Walker equations used to
solve for the autoregressive (AR) filter parameters of an AR(N) process.
7.5.2 Example
Consider a real wide-sense stationary (WSS) random process x with the autocorrelation sequence
or
On solving, we have a1 = −5/6 and a2 = 1/6. Thus, the optimal predictor polynomial is given by
or
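A sketch of solving these prediction equations with SciPy's Toeplitz solver. The ACF values below are an assumption chosen to be consistent with the quoted solution (they are not given in the text):

```python
import numpy as np
from scipy.linalg import solve_toeplitz

r = np.array([1.0, 5.0 / 7.0, 3.0 / 7.0])   # assumed r[0], r[1], r[2]

h = solve_toeplitz((r[:2], r[:2]), r[1:])   # solve R h = [r1, r2]^T
a = np.concatenate(([1.0], -h))             # predictor polynomial coefficients
print(a)                                    # [1, -5/6, 1/6]
```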
A.
B.
C.
D.
8. The maximum likelihood procedure yields an estimator that is asymptotically efficient, but
A. sometimes it also yields an efficient estimator for finite data records
B. it never yields an efficient estimator for finite data records
C. it yields MVU estimator for finite data records and not the efficient estimator
D. none of the above
9. Given the observations {X1, X2, …, XN} having the Poisson distribution
[P(X = x) = e^{−λ} λ^x / x!, (x = 0, 1, …)] with unknown parameter λ > 0, the maximum
likelihood estimate of λ is
A.
B. 1∕
C. ²
D. 1∕ ²
10. For an estimation problem, if an efficient estimator exists, then maximum likelihood
estimator
A. will always produce it.
B. will produce it in some cases only.
C. will never produce it.
D. none of the above
11. For the least squares estimation, which one of the following assumptions is true?
A. The data is assumed to have uniform distribution.
B. The data is assumed to have Gaussian distribution.
C. The data is assumed to be probabilistic with first two moments known.
D. No probabilistic assumption about the data is made.
A.
B.
C.
D.
16. For a DC level in WGN detection, assume that we wish to have PFA = 10-4 and
PD = 0.99. If the SNR is -30dB, the number of samples N required for detection is
A. 20, 465
B. 28, 646
C. 36, 546
D. 40, 486
17. Consider a binary hypothesis testing problem with the conditional probabilities of the
received data as
with hypotheses H0 and H1 being equally likely. Find the minimum probability of error.
A. 0.2012
B. 0.3854
C. 0.4385
D. 0.5108
18. For the binary hypothesis testing problem:
where c > 0, and U[a, b] denotes the uniform PDF on [a, b]. The condition for the perfect
detector (PFA = 0, PD = 1) is
A. c < 1∕2
B. c < 1
C. c > 1∕2
D. c > 1
19. Consider an M = 2 pulse amplitude modulation (PAM) scheme
subject to an average energy constraint. To have minimum probability of error Pe, the best
choice for the signal amplitudes A0 and A1 is
A. A0 = A1
B. A0 =
C. A0 = -
D. A0 = -A1
20. For the linear model, x = Hθ + w, let ŝ = Hθ̂. Which one of the following is correct?
A. ||ŝ||² + ||x − ŝ||² = ||x||²
B. ||ŝ||² + ||x − ŝ||² > ||x||²
C. ||ŝ||² + ||x − ŝ||² < ||x||²
D. ||x||² + ||x − ŝ||² = ||ŝ||²
21. The minimum Bayes risk for a binary hypothesis testing problem with costs C00 = 1, C11 =
2, C10 = 2, and C01 = 4 is given by
where π0 is the prior probability of hypothesis H0. Find the values of the minimax risk and the
least favorable prior probability.
A. R(πL) = 1; πL = 1
B. R(πL) = 0; πL = 1
C. R(πL) = 2; πL = 0
D. R(πL) = 2; πL = 0
22. Consider the PDFs for H0 and H1 given as:
A.
B.
C.
D.
23. Consider the detection problem:
where s[n] = A cos(2πf0n + ϕ) is the signal and w[n] the noise distributed as
w[n] ~ N(0, σ²). For estimating the amplitude A of the signal, if the detection statistic is