DETECTION & ESTIMATION THEORY
Control Systems: where the position of a powerboat has to be estimated for the corrective
navigation system in the presence of sensor and environmental noise, or the occurrence of an
abrupt change in the system is to be detected.
Communication Systems: where the transmitted signals have to be identified at the receiver,
or the carrier frequency of a signal is to be estimated for demodulation of the baseband signal
in the presence of degrading noise.
Image Processing: where an object has to be identified, or its position and orientation have to
be estimated from a camera image in the presence of lighting and background noise.
Radar Systems: where the occurrence of an airborne target (e.g., an aircraft, a missile) is
to be detected, or the delay of the received pulse echo is to be estimated to determine the
location of the target in the presence of noise.
Seismology: where the presence of underground oil is to be detected, or the distance of the oil
deposit has to be estimated from noisy sound reflections caused by the different densities of oil
and rock layers.
Sonar Systems: where the presence of a submarine is to be detected, or the delay of the
received signal at each of the sensors is to be estimated to locate it in the presence of noise
and attenuation.
Speech Processing: where the presence of different events (such as phonemes or words) is to
be detected in a speech signal in the context of speech recognition, or the parameters of
the speech production model have to be estimated in the context of speech coding, in the
presence of speech/speaker variability and environmental noise.
Apart from these, a number of applications stemming from the analysis of data from physical
phenomena, economics, etc., could also be mentioned.
The majority of applications require either the detection of one or more events of interest and/or the
estimation of an unknown parameter from a collection of observed data which also
includes "artifacts" due to sensor inaccuracies, additive noise, signal distortion (convolutional
noise), model inaccuracies, unaccounted sources of variability and multiple interfering signals.
These artifacts make detection and estimation challenging problems.
The PDFs under each hypothesis are denoted by p(x[0]; H0) and p(x[0]; H1), which for this
example are

p(x[0]; H0) = (1/√(2π)) exp(−x²[0]/2)
p(x[0]; H1) = (1/√(2π)) exp(−(x[0] − 1)²/2).

Note that in deciding between H0 and H1, we are essentially asking whether x[0] has been
generated according to the PDF p(x[0]; H0) or p(x[0]; H1). Alternatively, if we consider the
family of PDFs parameterized by the mean, the problem becomes one of deciding between two
parameter values.
This distinction is analogous to the classical versus Bayesian approach to parameter estimation
highlighted earlier.
Hierarchy of Detection Problems
The detection problem in its simplest form assumes that both signal and noise characteristics
are completely known. If the characteristics of the signal and/or noise are unknown or only
partially known, the detection problem becomes more challenging as well as more complex. The
hierarchy of detection problems, along with their typical applications, is listed in
Table 1.2.
Conditions                                          Applications
Level 1: Known signals in noise                     1. Synchronous digital communication
                                                    2. Pattern recognition
Level 2: Signals with unknown parameters in noise   1. Digital communication system without phase reference
                                                    2. Digital communication over slowly fading channels
                                                    3. Conventional pulse radar and sonar, target detection
and the objective is to decide which one of these hypotheses is true based on the observed
data.
We begin with those decision-making problems in which the PDF for each assumed
hypothesis is completely known; this is why they are referred to as simple hypothesis testing
problems. The primary approaches to simple hypothesis testing are the classical approach,
based on the Neyman-Pearson theorem, and the Bayesian approach, based on minimization of
the Bayes risk. In many ways these approaches are analogous to the classical and Bayesian
methods of statistical estimation theory.
8.2.2 Neyman-Pearson (NP) Detector
Before explaining the NP detector, we first describe some relevant terminology. Suppose
we observe a realization of a random variable whose PDF is either N(0, 1) or N(1, 1). The
detection problem can be summarized as:

H0 : x[0] ~ N(0, 1)
H1 : x[0] ~ N(1, 1)

The PDFs under each hypothesis, along with the probabilities of the hypothesis testing errors, are
shown in Figure 8.1. A reasonable approach might be to decide H1 if x[0] > 1/2. This is
because if x[0] > 1/2 the observed sample is more likely when H1 is true. Our detector then
compares the observed datum value with 1/2, which is called the threshold.
With this scheme we can make two types of errors. If we decide H1 but H0 is true, we make a Type I
error. On the other hand, if we decide H0 when H1 is true, we make a Type II error. The
terms Type I and Type II error are used in the statistical domain, but in the engineering domain these
errors are referred to as a false alarm and a miss, respectively. The term P(Hi; Hj) denotes the
probability of deciding Hi when Hj is true. Note that it is not possible to reduce both error
probabilities simultaneously. A typical approach is to hold one error probability fixed
while minimizing the other.
Figure 8.1: PDFs in binary hypothesis testing, possible errors and their probabilities
In general terms, the goal of a detector is to decide either H0 or H1 based on the observed data
x = [x[0] x[1] … x[N − 1]]^T. This is a mapping from each possible data set value into a
decision. The decision regions for the previous example are shown in Figure 8.2. Let R1 be
the set of values in R^N that map into decision H1. Then the probability of false alarm PFA is
constrained as

PFA = ∫_{R1} p(x; H0) dx = α,

where α is termed the significance level or size of the test in statistics. Now there are many
regions R1 that satisfy the above relation. Our goal is to choose the one that maximizes the
probability of detection, defined as

PD = ∫_{R1} p(x; H1) dx,
According to the Neyman-Pearson theorem, to maximize PD for a given PFA = α we decide H1 if

L(x) = p(x; H1)/p(x; H0) > γ,

where the threshold γ is computed from the constraint on the probability of false alarm value:

PFA = ∫_{x : L(x) > γ} p(x; H0) dx = α.

The function L(x) is termed the likelihood ratio and the entire test is called the likelihood
ratio test (LRT).
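For concreteness, here is a minimal numerical sketch of this NP test in Python (assuming NumPy/SciPy are available; the values of alpha and the datum are illustrative, not from the text):

```python
import numpy as np
from scipy.stats import norm

# NP test for H0: x[0] ~ N(0,1) versus H1: x[0] ~ N(1,1).
# The LRT reduces to comparing x[0] against a threshold gamma'.
alpha = 0.1                    # chosen P_FA
gamma = norm.isf(alpha)        # P_FA = Q(gamma) = alpha
PD = norm.sf(gamma - 1.0)      # P_D = Q(gamma - 1) for the unit mean shift

x0 = 0.9                       # an observed datum (illustrative)
print(f"gamma={gamma:.3f}, PD={PD:.3f}, decide H1: {x0 > gamma}")
```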
8.2.3 Example
Consider the general signal detection problem

H0 : x[n] = w[n], n = 0, 1, …, N − 1
H1 : x[n] = s[n] + w[n], n = 0, 1, …, N − 1,

where the signal is s[n] = A for A > 0 and w[n] is WGN with variance σ².
The NP detector decides H1 if

L(x) = p(x; H1)/p(x; H0) > γ.

Taking the logarithm of both sides, simplifying, and moving the non-data-dependent terms to the right-
hand side, we decide H1 if

T(x) = (1/N) ∑_{n=0}^{N−1} x[n] > γ′.

Thus the NP detector compares the sample mean x̄ to a threshold γ′. Note that the test
statistic is Gaussian under each hypothesis:
We have then

T(x) ~ N(0, σ²/N) under H0

and

T(x) ~ N(A, σ²/N) under H1,

and therefore we decide between two hypotheses that differ by a shift in the mean of T,
where μ1 > μ0. For this type of detector the detection performance is completely characterized
by the deflection coefficient, defined as

d² = (E(T; H1) − E(T; H0))² / var(T; H0).
In the case when μ0 = 0, d² = μ1²/σ², which may be interpreted as the signal-to-noise ratio
(SNR).
Since

PFA = Q((γ′ − μ0)/√(var(T; H0))) and PD = Q((γ′ − μ1)/√(var(T; H1))),

we have

PD = Q(Q⁻¹(PFA) − √(d²)).

The detection performance is, therefore, monotonic with respect to the deflection coefficient.
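The relation PD = Q(Q⁻¹(PFA) − √d²) is easy to evaluate numerically. A minimal sketch (parameter values are illustrative assumptions):

```python
import numpy as np
from scipy.stats import norm

def pd_from_pfa(pfa, d2):
    # P_D = Q(Q^{-1}(P_FA) - sqrt(d^2)) for a mean-shifted Gaussian statistic
    return norm.sf(norm.isf(pfa) - np.sqrt(d2))

# For the DC level in WGN, d^2 = N*A^2/sigma^2 for the sample-mean statistic.
N, A, sigma2 = 25, 0.5, 1.0
print(pd_from_pfa(1e-2, N * A**2 / sigma2))
```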
8.3.2 Receiver Operating Characteristics
The receiver operating characteristic, or simply ROC curve, is a graphical means to illustrate
the performance of the NP detector. The ROC curve was first used in World War II for the
analysis of radar signals, and later it was employed in signal detection theory. On the ROC curve,
each point corresponds to a pair (PFA, PD) for a given threshold value. By adjusting the value
of the threshold, any point on the curve may be obtained. As the threshold increases, the
PFA decreases but so does the PD, and vice versa.
We already know that for the DC level in WGN example we have

PFA = Q(γ′/√(σ²/N))

and

PD = Q((γ′ − A)/√(σ²/N)).

Now consider instead a detector that ignores the data and decides H1 whenever a coin toss comes up
heads (with probability p). The probability of occurrence of a head in the coin toss has no dependence
upon which hypothesis is true, and therefore PFA = PD = p. This detector then generates the point (p, p) on
the ROC. Considering the different possible values of p, it would generate a 45° line on the ROC
curve, and the same has been marked as a dotted line.
The family of ROCs generated for different values of the deflection coefficient d² is also
shown in Figure 8.3. As d² increases, the value of PD obtained for a given value of PFA also
increases. For d² → ∞, the ideal ROC is obtained, i.e., PD = 1 for any value of PFA.
Further, as the threshold γ varies from ∞ to −∞, the point (PFA(γ), PD(γ)) traces out the ROC curve
from (0, 0) to (1, 1).
The salient properties of the ROC curve for binary hypothesis testing are:
1. If the threshold γ → −∞, the detector always decides H1 and PFA = PD = 1. Thus the point (1, 1) belongs
to the ROC curve.
2. If the threshold γ → ∞, the detector never decides H1 and PFA = PD = 0. Thus the point (0, 0) belongs to
the ROC curve.
3. The slope of the ROC curve at any point (PFA(γ), PD(γ)) is equal to the threshold γ.
4. All points of the ROC curve satisfy PD ≥ PFA.
5. The ROC curve is concave i.e., the domain of the achievable pairs (PD, PFA) is convex.
6. The region of feasible tests is symmetric about the point (0.5, 0.5) i.e., if (PFA, PD) is feasible, so
is (1 - PFA, 1 - PD).
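These properties can be visualized by sweeping the threshold. A sketch that plots the family of ROCs for several deflection coefficients (assuming matplotlib is available):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

pfa = np.linspace(1e-6, 1.0, 500)
for d2 in (0.0, 1.0, 4.0, 9.0):        # d2 = 0 traces the 45-degree chance line
    pd = norm.sf(norm.isf(pfa) - np.sqrt(d2))
    plt.plot(pfa, pd, label=f"d2 = {d2}")
plt.xlabel("P_FA"); plt.ylabel("P_D"); plt.legend(); plt.show()
```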
8.3.3 Example
Consider a random variable Y given by Y = N + λθ, where θ is either 0 or 1, λ is a fixed
number between 0 and 2, and N is uniformly distributed on (−1, 1). We wish to decide between the
hypotheses H0: θ = 0 and H1: θ = 1.
Find (i) the Neyman-Pearson (NP) decision rule to decide H1 for 0 ≤ PFA ≤ 1, and (ii) sketch the
receiver operating characteristic.
The distributions under the two hypotheses are plotted in Fig. 8.4. Let a threshold γ be chosen to
give the specified value of the false alarm probability PFA; then
From the above relationship, the ROC for different values of λ is plotted and is also shown in
Fig. 8.4.
It can be shown that the detector which minimizes the Bayes risk decides H1 if

p(x|H1) P(H1) > p(x|H0) P(H0).

This detector, which minimizes Pe for any prior probability, is termed the maximum a
posteriori probability (MAP) detector.
In cases where the prior probabilities of the hypotheses are equal, the detector which
minimizes Pe decides H1 if

p(x|H1) > p(x|H0).
where A > 0 and w[n] is WGN with variance σ². It is reasonable to assume that
P(H0) = P(H1) = 1/2. The receiver that minimizes Pe decides H1 if

p(x|H1) > p(x|H0),

or, equivalently, we decide H1 if x̄ > A/2. This detector is the same as that obtained with the NP
criterion except for the threshold and, of course, the performance. To determine Pe we note that the
threshold is A/2 and the two conditional error probabilities are equal. Thus,

Pe = Q(√(NA²/(4σ²))).
minimize that risk function. This principle of minimizing the maximum average cost over the
choice of P1 is referred to as the minimax criterion.
The Bayes' risk for a binary hypothesis testing problem is given by

R = ∑_i ∑_j Cij P(Hi; Hj) P(Hj).

Further, we can express the probabilities of the different decisions in terms of the probability of
false alarm PFA, the probability of miss PM and the probability of detection PD. Since one of the
hypotheses H0 and H1 always occurs, i.e., P0 = 1 − P1, we can use this to rewrite the
risk as a function of P1 alone.
Assuming a fixed value of P1 with P1 ∈ (0, 1), the Bayes' test decides H1 if
If P1 = P1*, such that P1* ∈ (0, 1), then the risk as a function of P1 is shown in Figure 8.5.
If the costs of correct decisions are individually zero (C00 = C11 = 0), then the minimax rule
for P1 = P1* reduces to
Furthermore, if the costs of incorrect decisions are individually one (C01 = C10 = 1), then the
minimax rule for P1 = P1* reduces to
PM = PFA
and the minimax cost in this case is
where u(y) is the unit step function. For uniform costs (C00 = C11 = 0 and C01 = C10 = 1), find
the minimax decision rule to decide H1.
Figure 8.6: PDFs of the hypotheses for minimax decision rule example
The PDFs under the two hypotheses are plotted in Figure 8.6. Let the chosen threshold be y = γ;
then we have
On using the complementary error function table, the value of γ that satisfies the above
relation turns out to be γ ≈ 0.565.
Thus the minimax decision rule decides H1 if
y > 0.565
For the uniform cost assignment (C00 = C11 = 0 and C01 = C10 = 1), we have that the minimax risk equals Pe.
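The minimax threshold can also be found numerically by solving PFA(γ) = PM(γ). A sketch of that root-finding step; the densities below are placeholders, not the ones of Figure 8.6 (which are not reproduced in the text), so the resulting γ differs from 0.565:

```python
from scipy.optimize import brentq
from scipy.stats import norm

def pfa(g):   # P(y > g; H0), assuming H0: y ~ N(0,1) (placeholder)
    return norm.sf(g)

def pm(g):    # P(y < g; H1), assuming H1: y ~ N(1,1) (placeholder)
    return norm.cdf(g - 1.0)

gamma = brentq(lambda g: pfa(g) - pm(g), -5.0, 5.0)
print(gamma)  # 0.5 for these placeholder densities
```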
Let the decision region Ri = {x : decide Hi}, where i = 0, 1, …, M − 1. These Ri's together
partition the observation space, so each x must be assigned to one and only one of the decision
regions. The cost contribution to the Bayes risk if x is assigned to R1, say, is ∫_{R1} C1(x)p(x)dx;
that for assigning x to R2 is ∫_{R2} C2(x)p(x)dx, and so on. Generalizing, we should assign x to Rk if

Ci(x) = ∑_{j=0}^{M−1} Cij P(Hj|x)

is minimum for i = k.
Hence, we should choose the hypothesis that minimizes Ci(x)
over i = 0, 1, …, M − 1.
To determine the decision rule that minimizes Pe we use the uniform costs; then
Since the first term is independent of i, the cost Ci(x) is minimized by maximizing P(Hi|x).
Thus, the minimum Pe decision rule is to decide Hk if P(Hk|x) ≥ P(Hi|x) for all i.
where the additive noise w[n] is Gaussian with zero mean and variance σ². The costs are
Cii = 0 and Cij = 1 for i ≠ j, with i, j = 0, 1, 2. Determine the decision regions and the minimum
probability of error Pe.
Figure 8.7: Decision regions for multiple DC signals in white Gaussian noise for the N = 1 case
In this problem, as the prior probabilities are equal, P(H0) = P(H1) = P(H2) = 1/3, the ML
decision rule applies. First consider the simple case of N = 1; the PDFs of the three hypotheses are
shown in Figure 8.7. By symmetry, it is obvious that, as per the ML rule, to minimize Pe we
should decide H0 if y[0] < 1.5, H1 if 1.5 < y[0] < 2.5, and H2 if y[0] > 2.5.
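A small sketch of this N = 1 ML rule, assuming (as the thresholds 1.5 and 2.5 suggest) DC levels of 1, 2 and 3 under H0, H1 and H2:

```python
import numpy as np
from scipy.stats import norm

levels = np.array([1.0, 2.0, 3.0])   # assumed DC levels under H0, H1, H2
sigma = 1.0

def ml_decide(y0):
    # ML with equal noise variances = pick the nearest level
    return int(np.argmin((y0 - levels) ** 2))

# probability of a correct decision given H1 (two decision boundaries)
Pc_H1 = norm.cdf(0.5 / sigma) - norm.cdf(-0.5 / sigma)
print(ml_decide(2.2), Pc_H1)
```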
For the multiple-sample (N > 1) case, the multivariate PDFs as well as the decision regions are
not feasible to plot. In this case we need to derive a test statistic; to do so, note that the
conditional PDF under each hypothesis can be compactly written as
Further, using ȳ as the mean of the observations y[n] and after some manipulation, we can express Di²
as
To determine the minimum Pe, note that in this case there are six types of errors, unlike
the binary case. In general, for an M-ary detection problem there are M² − M = M(M − 1)
error types. It is therefore easier to determine Pc = 1 − Pe, where Pc is the probability of a
correct decision. Thus
Conditioned on Hi, we have
so that
Since the amplitude A is unknown, the PDF is not completely specified, so we cannot directly
perform the LRT as discussed earlier.
There are two approaches to composite hypothesis testing:
1. Bayesian approach: the unknown parameters are considered realizations of random
variables and assigned a prior PDF.
2. Generalized likelihood ratio test (GLRT): the unknown parameters are first estimated and
then used in a likelihood ratio test.
The general problem is to decide between H0 and H1 when the PDFs depend on different
sets of unknown parameters. These parameters may or may not be the same under each
hypothesis. Under H0, assume the vector parameter θ0 is unknown, while under H1, assume
the vector parameter θ1 is unknown.
The unconditional PDFs p(x; H0) and p(x; H1) are now completely specified; they no
longer depend on the unknown parameters. With the Bayesian approach, the optimal NP
detector decides H1 if

p(x; H1)/p(x; H0) = [∫ p(x|θ1; H1) p(θ1) dθ1] / [∫ p(x|θ0; H0) p(θ0) dθ0] > γ.

Remark: In this approach the required integrations are multidimensional, with dimension
equal to that of the unknown parameter vector. The choice of the prior PDFs can also prove to be
difficult. If indeed some prior knowledge is available, then it should be used. If not, one can
use a non-informative prior, i.e., one having a PDF as 'flat' as possible.
8.7.3 Example
Detection of the unknown DC level in WGN: Bayesian approach
where the DC level A in WGN is unknown and can take on any value −∞ < A < ∞, and w[n] is
WGN with variance σ².
To solve this problem using the Bayesian approach, we assign the prior A ~ N(0, σA²), where A is
independent of the noise w[n]. The conditional PDF under H1 is given by
But
Letting
On taking the logarithm of both sides and retaining only the data-dependent terms, we decide
H1 if
or
Remark: Note the form of the detector. As the unknown DC level can be either positive or
negative, the detector is formed by comparing either the square or the absolute value of the
sufficient statistic with an appropriate threshold.
where θ̂1 is the MLE of θ1 assuming H1 is true (i.e., it maximizes p(x; θ1, H1)), and θ̂0 is the
MLE of θ0 assuming H0 is true (i.e., it maximizes p(x; θ0, H0)). This approach also provides
information about the unknown parameters, since the first step in determining LG(x) is to find
the MLEs.
8.7.3 Example
Detection of the unknown DC level in WGN: GLRT approach
Assume θ1 = A and that there are no unknown parameters under H0. The hypothesis test becomes
and the GLRT decides H1 if
Remark: Note that the form of the detector is identical to that obtained using the
Bayesian approach. The derivation of the detector using the GLRT often turns out to be much
simpler than that of the Bayesian approach.
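A minimal sketch of this GLRT for the unknown DC level (the statistic N·x̄²/σ² and its χ²₁ null distribution follow from the discussion above; sample values are illustrative):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
N, sigma2, A_true = 100, 1.0, 0.3
x = A_true + rng.normal(0.0, np.sqrt(sigma2), N)

A_hat = x.mean()                  # MLE of A under H1
T = N * A_hat**2 / sigma2         # GLRT statistic (two-sided in A)
alpha = 0.01
gamma = norm.isf(alpha / 2.0)**2  # P(T > gamma; H0) = 2Q(sqrt(gamma)) = alpha
print(T, gamma, T > gamma)
```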
Instead of summing the observations and comparing the sum with a threshold, we could count the
number of times the observation samples exceed zero; this count can also be used to detect
the unknown fixed positive voltage level. In this case, the detector can be given as
N+ > γu,
where N+ is the number of positive observations and γu denotes an appropriately chosen threshold.
Such a detector essentially counts the number of positive signs in the observations and so is termed
the sign detector. A sign detector is very simple to implement, requiring only a hard limiter followed
by an adder. Unlike that of the sample-mean detector, the performance analysis of the sign detector
is rather involved and is undertaken in the following.
where p is the probability of an observed data sample being positive given that a fixed positive
voltage A is transmitted.
Let d[n] denote the sign of x[n]:
For the sign detector, the likelihood ratio test (LRT) for deciding H1 can be given as
where
Further, on taking the logarithm to the base p/(1 − p), we can express the LRT in a more useful
form as
Note that N+ is a sum of Bernoulli distributed random variables. Under the hypothesis H0,
N+ has a binomial distribution with parameters N and 0.5, i.e.,
Under the hypothesis H1, N+ has a binomial distribution with parameters N and p, i.e.,
9.2.2 Example
Derive a sign detector that uses nine observations and ensures a probability of false alarm
of 0.1 for detecting a positive signal A in the presence of zero-mean Gaussian noise,
and analyze its performance when the probability of a data sample being positive is (i) 0.75 and (ii) 0.99.
Given N = 9, the detection problem is,
or
Since
so
while
or
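Since the intermediate expressions are not reproduced above, here is a minimal numerical sketch of the solution using the binomial tail probabilities:

```python
from scipy.stats import binom

# smallest integer threshold k with P(N+ >= k; H0) <= 0.1, N+ ~ Bin(9, 0.5)
N, alpha = 9, 0.1
k = next(k for k in range(N + 1) if binom.sf(k - 1, N, 0.5) <= alpha)
pfa = binom.sf(k - 1, N, 0.5)

for p in (0.75, 0.99):            # P(sample > 0; H1), cases (i) and (ii)
    pd = binom.sf(k - 1, N, p)
    print(f"threshold k={k}, PFA={pfa:.4f}, p={p}: PD={pd:.4f}")
```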
so we can have a recursive arrangement for the LRT, Λ(x_N) = Λ(x_{N−1}) L(x[N − 1]), with the initial condition Λ(x₁) being the likelihood ratio of the first sample.
For fixed values of PFA and PM, the thresholds η0 and η1 need to be derived to meet these
constraints. Hence (using the standard Wald approximations),

η1 = (1 − PM)/PFA and η0 = PM/(1 − PFA).

To summarize, the sequential LRT detector decides H1 to be true if Λ(x_N) > η1, and if
Λ(x_N) < η0 then it decides H0 to be true. If Λ(x_N) > η0 but smaller than η1, then another
sample is taken to form Λ(x_{N+1}) and the test is repeated.
This kind of test allows the user to terminate the test earlier than the conventional NP test,
once the presence or absence of a target has been determined with an acceptable level of error
(PFA or PM).
Remarks: The salient limitations of this approach are:
Samples are assumed to be IID
PFA and PM are assumed to be constant
9.3.2 Example
Consider the detection of a DC level A in additive Gaussian noise with zero mean and variance
σ². Conduct a sequential likelihood ratio test (SLRT) to detect the presence or absence of
the signal. It is desired to terminate the test when PM ≤ β or PFA ≤ α.
The detection problem is
We know that the thresholds η1 and η0 for deciding the hypotheses H1 and H0, respectively, can
be given as
The outcome of the LRT versus the number of samples considered is shown in Figure 9.1.
Note the cases N = 1, 2, …, 7, where the decision is deferred and another sample is required
to be taken since neither of the thresholds is crossed. When N = 8, the upper threshold is
crossed, thus allowing the decision that the signal-present hypothesis (H1) is
true.
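A minimal simulation sketch of this sequential test, using the Wald threshold approximations quoted earlier (parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
A, sigma2 = 0.5, 1.0
alpha, beta = 0.01, 0.01                 # target P_FA and P_M
eta0 = np.log(beta / (1 - alpha))        # lower log-threshold
eta1 = np.log((1 - beta) / alpha)        # upper log-threshold

logL, n = 0.0, 0
while eta0 < logL < eta1:
    x = A + rng.normal(0.0, np.sqrt(sigma2))   # signal actually present
    logL += (A * x - A**2 / 2.0) / sigma2      # Gaussian LLR increment
    n += 1
print("decide H1" if logL >= eta1 else "decide H0", "after", n, "samples")
```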
where the signal s[n] is an assumed known deterministic signal and the noise w[n] is WGN with
variance σ².
Recall that the NP detector decides H1 if L(x) = p(x; H1)/p(x; H0) > γ.
Thus we decide H1 if

T(x) = ∑_{n=0}^{N−1} x[n] s[n] > γ′,
where T(x) is the test statistic and γ′ a threshold chosen to satisfy PFA = α for a given α.
It is clear that the received data is correlated with a replica of the signal, and therefore the
detector is often referred to as the replica-correlator detector. Figure 10.1 shows the block
diagram of the replica-correlator detector.
Figure 10.1: Replica-correlator detector for deterministic signal in white Gaussian noise
10.1.2 Matched Filter Detector
In this section we show that the replica-correlator detector can be interpreted as performing finite
impulse response (FIR) filtering on the data. Assume that x[n] is the input to an FIR filter with
impulse response h[n], where h[n] is nonzero for n = 0, 1, …, N − 1. The output of the filter at time
n ≥ 0 is given by

y[n] = ∑_{k=0}^{N−1} h[k] x[n − k].

If we consider the impulse response of the FIR filter to be a "flipped-around" version of the
signal to be detected, i.e.,

h[n] = s[N − 1 − n],

then
the filter output at time n = N − 1 is y[N − 1] = ∑_{n=0}^{N−1} x[n] s[n] = T(x). Thus we decide H1 if the
matched filter output sampled at n = N − 1 exceeds the threshold γ′; this is exactly the
replica-correlator statistic, and the resulting implementation is called the matched filter detector.
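The equivalence is easy to verify numerically; a sketch (the signal choice is an arbitrary assumption):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 64
s = np.cos(2 * np.pi * 0.1 * np.arange(N))   # an assumed known signal
x = s + rng.normal(0.0, 1.0, N)              # data under H1, sigma^2 = 1

T_corr = np.dot(x, s)                        # replica-correlator statistic
h = s[::-1]                                  # matched filter h[n] = s[N-1-n]
T_mf = np.convolve(x, h)[N - 1]              # filter output at n = N - 1
print(np.allclose(T_corr, T_mf))             # True
```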
where H(f) and X(f) are the discrete-time Fourier transforms of h[n] and x[n], respectively.
From the matched filter interpretation, H(f) = F{h[n]} = F{s[N − 1 − n]}, where F{·}
represents the discrete-time Fourier transform. So the filter Fourier transform H(f) can be
shown to be
then we have
or equivalently
As under each hypothesis the data samples x[n] are Gaussian, and since the test statistic T(x) is a
linear combination of Gaussian random variables, T(x) is also Gaussian. Let E(T; Hi) and
var(T; Hi) denote the expected value and the variance of T(x) under Hi; then

E(T; H0) = 0, E(T; H1) = ∑ s²[n] = ε, var(T; H0) = var(T; H1) = σ²ε,

where we have used the fact that the w[n]'s are uncorrelated, and ε denotes the signal energy.
Thus T(x) ~ N(0, σ²ε) under H0 and T(x) ~ N(ε, σ²ε) under H1.
It is to be noted that as ε/σ² increases, the PDFs retain their shape but move further apart; this
obviously improves the detection performance. As the PDFs under both hypotheses are
known, we can find the PFA and PD as
where Q(x) = 1 − Φ(x) and Φ(x) is the CDF of the standard normal distribution N(0, 1). Since the CDF
is monotonically increasing, Q(x) must be monotonically decreasing, and so is Q⁻¹(x). On
substituting the threshold from the PFA expression into the expression for PD, we can write

PD = Q(Q⁻¹(PFA) − √(ε/σ²)).

The above relation establishes that PD monotonically increases with increasing ε/σ², i.e., the
energy-to-noise ratio.
where A > 0. Which one of the signals would yield the better detection performance?
The two hypotheses are

H0: x[n] = w[n] versus H1: x[n] = si[n] + w[n], n = 0, 1, …, N − 1,

where the signal si[n] denotes either of the given deterministic signals s1[n] or s2[n] and the
noise w[n] is WGN with variance σ².
Note that in the case of detection of a known deterministic signal in WGN with variance σ², the
detection performance of the NP detector is completely characterized by the signal-to-noise
ratio ε/σ².
As both the signals have identical energy, they would yield identical detection
performance under WGN.
On simplifying and incorporating the non-data-dependent terms into the threshold, we decide H1 if

T(x) = x^T C⁻¹ s > γ′.

This is referred to as a generalized matched filter. Note that for WGN, C = σ²I, and the detector
reduces to the replica-correlator.
Further, it may be viewed as a replica-correlator where the replica is the modified signal
s′ = C⁻¹ s; then

T(x) = x^T s′.

Thus the linear transformation D is a whitening transform, and the generalized matched filter
can also be viewed as a pre-whitener followed by a replica-correlator or matched filter, as
shown in Figure 10.3.
Since T(x) is a linear transformation of the data x, the PDF of the test statistic under either
hypothesis remains Gaussian, the same as that of the data. The first two moments of the test
statistic under either hypothesis are determined below:
It is to be noted that in the case of correlated noise the signal can be designed to maximize
s^T C⁻¹ s, and hence PD, unlike the white noise case, in which the shape of the signal has no
importance and only the signal energy matters.
energy. So the optimal signal is chosen by maximizing s^T C⁻¹ s subject to the fixed energy
constraint s^T s = ε. Making use of a Lagrange multiplier, we maximize the function

J(s) = s^T C⁻¹ s + λ(ε − s^T s),

or, setting the gradient with respect to s to zero,

C⁻¹ s = λ s.

Thus, we should choose the signal s as the eigenvector of C⁻¹ whose corresponding
eigenvalue λ is maximum. Alternatively, we should choose the signal as the eigenvector
of C that has the minimum eigenvalue.
10.5.2 Example
It is desired to design a signal for achieving the best detection performance in colored WSS
Gaussian noise with autocovariance function (ACF) rww[k] = P + σ²δ[k], where both P and σ
are non-zero positive constants. Two competing signals proposed are:
where A > 0. Which one of the signals would yield the better detection performance?
The two hypotheses are:
where the signal si[n] denotes either of the given deterministic signals s1[n] or s2[n], and w[n]
is colored WSS Gaussian noise with ACF rww[k] = P + σ²δ[k], or equivalently the covariance
matrix C = P11^T + σ²I, where 1 denotes the N × 1 column vector of ones and I is the N × N
identity matrix.
Note that in the case of detection of a known deterministic signal in correlated noise, the
detection performance of the NP detector is completely characterized by the signal-to-noise
ratio (SNR)
the value, say θ1, then s = Hθ1 is the known signal. Under the null hypothesis H0 we
have θ = 0, so that no signal is present. In applying the linear model to detection problems,
we decide whether the signal s = Hθ1 is present or not. The detection problem can be
mathematically expressed as
The NP detector immediately follows by letting s = Hθ1 in the detector T(x) = x^T C⁻¹ s > γ′, i.e.,
we decide H1 if
expression .
Further, we can generalize the above results by noting that for the general linear model the
minimum variance unbiased (MVU) estimator of θ is θ̂ = (H^T C⁻¹ H)⁻¹ H^T C⁻¹ x,
where γ is a threshold computed from the constraint on probability of false alarm, PFA.
The PDF under the hypotheses can be given as
with
The test statistic T(x) being a linear function of the data, and assuming that the signal and the noise
are uncorrelated, the PDFs of T(x) under H0 and H1 can be shown to be N(0, σ²||s||²) and
N(||s||², σ²||s||²), respectively.
Finding the probability of false alarm and applying the given constraint (say α),
Noting that the detection threshold satisfies the constraint with equality,
or
where the signal s[n] is a zero-mean white WSS Gaussian random process with variance σs², and
the noise w[n] is WGN with variance σ². For these modeling assumptions, x ~ N(0, σ²I) under
H0 and x ~ N(0, (σs² + σ²)I) under H1. So the NP detector decides H1 if
where
where
Therefore, the NP detector basically computes the energy of the received signal, T(x) = ∑ x²[n],
and compares it to a predetermined threshold. Hence it is referred to as the energy detector.
On using the definition of the right-tailed probability Q_{χ²_ν}(x) for a χ²_ν random
variable, we can find the PFA and PD as

PFA = Q_{χ²_N}(γ′/σ²) and PD = Q_{χ²_N}(γ′/(σs² + σ²)).

As done earlier, we can substitute the value of the threshold γ′ determined from the PFA expression
into the PD expression to get

PD = Q_{χ²_N}(Q_{χ²_N}⁻¹(PFA)/(1 + σs²/σ²)).

Thus, with an increase in σs²/σ², the argument of the Q_{χ²_N} function decreases and the detection
performance improves.
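A sketch of these energy detector expressions using SciPy's chi-squared tail functions (parameter values are illustrative):

```python
from scipy.stats import chi2

N, sigma2, sigma_s2 = 16, 1.0, 0.5
alpha = 1e-2

gamma = sigma2 * chi2.isf(alpha, df=N)          # from P_FA = Q_chi2(gamma/sigma2)
PD = chi2.sf(gamma / (sigma2 + sigma_s2), df=N) # P_D at that threshold
print(gamma, PD)
```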
where
or
so that
Now, let
Note that the NP detector correlates the received data with an estimate of the signal, i.e., ŝ. It is
therefore termed the estimator-correlator detector. Recall that if θ is an unknown random
vector whose realizations are to be estimated based on the data x, where θ and x are jointly
Gaussian with zero mean, then the MMSE estimator is given by

θ̂ = C_{θx} C_{xx}⁻¹ x,

where Cθx = E(θx^T) and Cxx = E(xx^T). In the context of the detection problem, we have θ = s and
x = s + w, with s and w uncorrelated. The MMSE estimate of the signal realization can be
given as

ŝ = Cs (Cs + σ²I)⁻¹ x.

Thus it can be argued that the signal estimate ŝ is the Wiener filter estimate of the given
realization of the random signal. The block diagram of the estimator-correlator is shown in
Figure 11.1.
Figure 11.1: Estimator-correlator detector for detection of Gaussian random signal in white
Gaussian noise
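A minimal sketch of the estimator-correlator statistic T(x) = x^T ŝ with ŝ the Wiener estimate; the signal covariance below is an assumed example:

```python
import numpy as np

rng = np.random.default_rng(3)
N, sigma2 = 32, 1.0
k = np.arange(N)
Cs = 0.8 ** np.abs(k[:, None] - k[None, :])     # assumed signal covariance

W = Cs @ np.linalg.inv(Cs + sigma2 * np.eye(N)) # Wiener estimator matrix
x = rng.multivariate_normal(np.zeros(N), Cs + sigma2 * np.eye(N))  # H1 data
s_hat = W @ x                                   # MMSE signal estimate
print(x @ s_hat)                                # estimator-correlator statistic
```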
By noting that s = Hθ ~ N(0, HCθH^T) and using the previous results for the NP detector in
the random signal case, the estimator-correlator detector decides H1 if
where s ~ N(μs, Cs), w ~ N(0, Cw), and s and w are independent. The NP detector decides
H1 if
or
Taking the logarithm of both sides, retaining only the data-dependent terms and scaling produces the
test statistic
Thus the test statistic consists of both a linear form and a quadratic form in the data x. Consider
the following special cases:
1. Cs = 0, i.e., a deterministic signal with s = μs. Then
where A and ϕ are random variables, f0 is a known frequency within the range 0 < f0 < 0.5, and
w[n] is WGN with variance σ².
Instead of assigning PDFs to A and ϕ, it is more convenient to note that
and
Thus s[n] is a WSS Gaussian random process with ACF rss[k] = σs² cos 2πf0k. In addition, we
can show that the PDF of the amplitude is Rayleigh, or
and the PDF of ϕ = arctan(−q/p) is uniform on (0, 2π), with A and ϕ independent of each other. Since
the amplitude PDF is Rayleigh distributed, this channel model is referred to as the Rayleigh
fading channel model.
With these assumptions, we note that the observed data follows the Bayesian linear model
x = Hθ + w, where
Further, noting that for large N and 0 < f0 < 0.5, we have H^T H ≈ (N/2)I. Thus
or
From the above relations, we can implement the detector in two ways.
In one implementation, the data is correlated with the cosine ("in-phase") and sine ("quadrature")
replicas of the sinusoidal signal. As the phase is random, one or both of these outputs (I or Q) will be
large in magnitude if the signal is present. Since the sign of a correlator output can be positive or
negative, we square the I and Q outputs, sum them with scaling by 1/N and then compare to a
threshold. This type of detector is known as a quadrature matched filter or an incoherent matched
filter.
The second implementation is known as a periodogram detector or a sampled-spectrum detector. In
this, the Fourier transform of the data x[n] is computed at f0, magnitude-squared and scaled by
1/N, and then compared with a threshold.
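A sketch of the quadrature (incoherent) matched filter statistic; the frequency, amplitude and phase below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
N, f0 = 128, 0.11
n = np.arange(N)
x = 0.4 * np.cos(2 * np.pi * f0 * n + 1.2) + rng.normal(0.0, 1.0, N)

I = np.dot(x, np.cos(2 * np.pi * f0 * n))   # in-phase correlator output
Q = np.dot(x, np.sin(2 * np.pi * f0 * n))   # quadrature correlator output
T = (I**2 + Q**2) / N                       # periodogram value at f0
print(T)
```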
2.1 Outline
In this module the basics of classical parameter estimation are discussed and the bounds on
unbiased estimators are described. The salient topics discussed in this module are:
Minimum variance unbiased estimator (MVUE)
Cramer-Rao lower bound (CRLB) on unbiased estimators
Fisher information and its relation to CRLB
Computation of CRLB in general cases
Linear model of data and its generalization
But often this criterion does not yield a realizable estimator, i.e., one which can be written
as a function of the data only:
since, although the variance var(θ̂) depends only on the data, the bias term [E(θ̂) − θ]² is a function
of the unknown θ. Thus an alternative approach is to require E(θ̂) − θ = 0 (i.e., zero bias)
and minimize var(θ̂). This produces the minimum variance unbiased estimator (MVUE).
2.2.1 Minimum Variance Unbiased Estimator (MVUE)
The MVUE is an optimal estimator. The two attributes of this optimal estimator are:
It should be unbiased: E(θ̂) = θ for all θ.
It should have minimum variance: this ensures that the estimator, in the expected sense, deviates
from the true value of the parameter minimally among all possible unbiased estimators for the
problem.
2.2.2 Example
Consider a DC signal, A, in the presence of additive white Gaussian noise (WGN)
w[n] with variance σ²; the observed data x[n] is given by:

x[n] = A + w[n], n = 0, 1, …, N − 1.

Find its variance:
The sample-mean estimator turns out to be unbiased with variance var(Â) = σ²/N.
But it is not obvious whether it is the MVUE or not. If Â is the MVUE, all other unbiased
estimators would have their variance satisfy var(θ̂) ≥ σ²/N.
2.2.3 Existence of MVUE
The MVUE is the most desired estimator, but it does not always exist.
Figure 2.1 depicts two possible situations for the variance var(θ̂) of an unbiased
estimator of the parameter θ. Suppose only three unbiased estimators exist, with
variances as shown in Figure 2.1(a); then clearly the estimator with the uniformly smallest
variance is the MVUE. If the situation shown in Figure 2.1(b) exists, then there is no MVUE,
since for θ > θo one estimator is better, while for θ < θo another is better. In the former case the
MVUE is sometimes referred to as the uniformly minimum variance unbiased estimator to
emphasize the fact that it has the smallest variance for all θ. In general the MVUE does not
always exist.
so that
and
For θ ≥ 0 the minimum possible variance of an unbiased estimator is 18/36, while that for θ <
0 is 24/36. Clearly, between these two estimators, no MVU estimator exists.
and
then we can find the MVUE as θ̂ = g(x), and the minimum variance is 1/I(θ). The proof
of these results is given later in Section 2.4.1.
For a p-dimensional parameter, θ, the equivalent condition in terms of the covariance matrix is
given by:
Cθ̂ − I⁻¹(θ) is positive semi-definite, where Cθ̂ = E[(θ̂ − E(θ̂))(θ̂ − E(θ̂))^T] is the
covariance matrix of the estimator. The Fisher information matrix, I(θ), is given as:
then we can find the MVUE as θ̂ = g(x), and the minimum covariance is I⁻¹(θ).
2.3.2 MVU Estimator and CRLB Attainment
In general, an MVU estimator may exist but may not attain the CRLB. To illustrate this, let
us assume that there exist three unbiased estimators for estimating the unknown parameter θ
in an estimation problem, with variances as shown in Figure 2.3. As shown in
Figure 2.3(a), the estimator θ̂3 is efficient, as it attains the CRLB, and therefore it is also the
MVUE. On the other hand, in Figure 2.3(b), the estimator θ̂3 does not attain the CRLB, so it
is not efficient. But its variance is uniformly less than that of the other possible unbiased
estimators, so it is the MVUE.
Note that the above regularity conditions are satisfied in general, except when the domain over
which the PDF is nonzero depends on the unknown parameter (e.g., the uniform
distribution U(0, θ) with unknown domain parameter θ).
Given the PDF p(x; θ), the Fisher information I(θ) can also be expressed as
or
The Fisher information has the essential properties of an information measure, as is
evident from the following facts:
This results in
2.3.4 Example
Consider the case of a DC signal A embedded in noise. The noisy observation is given by:

x[n] = A + w[n], n = 0, 1, …, N − 1.

Note that the second derivative turns out to be a constant; thus the CRLB is

var(Â) ≥ σ²/N,

where I(θ) = N/σ² and g(x) = x̄. It was shown earlier that, for estimation of the DC level
in WGN, the sample-mean estimator has a variance of σ²/N. Thus it is indeed an MVUE, as its
variance achieves the minimum possible variance bound.
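A quick Monte Carlo sketch confirming that the sample mean attains the bound σ²/N (values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
A, sigma2, N, trials = 1.0, 2.0, 50, 20000

x = A + rng.normal(0.0, np.sqrt(sigma2), size=(trials, N))
A_hat = x.mean(axis=1)
print(A_hat.var(), sigma2 / N)   # empirical variance vs the CRLB
```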
which means that as the number of observations increases, the MSE of the estimator descends
to zero, i.e., the estimate converges to θ.
For example, if the MSE of x̄ is 1/N, then since lim_{N→∞}(1/N) = 0, x̄ is a consistent estimator
of θ, or more specifically "MSE-consistent". There are other types of consistency definitions
that look at the probability of the errors. They work better when the estimator does not have a
variance.
2.4.1 CRLB in General Cases
In practice, we often require the CRLB for a parameter which is a function of some more
fundamental parameter. So we first discuss the derivation of the CRLB for a transformation
of a scalar parameter. Later, the derivation of the CRLB for a deterministic signal in white
Gaussian noise and its extension to the general Gaussian case are discussed.
CRLB for Transformed Parameter
Consider a transformed parameter α = g(θ), where the PDF of the data is parameterized by θ.
Then the CRLB for an estimator of α, under the regularity conditions, is given by

var(α̂) ≥ (∂g/∂θ)² / I(θ).

The speed measurement would be less accurate at higher speeds (the bound being quadratic in
speed) and more accurate for larger distances. Thus, the speed estimator can achieve the CRLB only
asymptotically.
2.5.1 CRLB for Signals in White Gaussian Noise
Assume that a deterministic signal s[n; θ] with an unknown parameter θ is observed in WGN as

x[n] = s[n; θ] + w[n], n = 0, 1, …, N − 1.

This form of the bound emphasizes the importance of the signal dependence on θ: signals
which change rapidly as the unknown parameter changes result in more accurate estimation.
Proof
The likelihood function is
thus both the mean and covariance may depend on θ. Then the Fisher information matrix can
be given by (the proof is lengthy, so omitted)
where
where θ = [A f0 ϕ]^T, A > 0, 0 < f0 < 1/2, and −π ≤ ϕ ≤ π. Since multiple parameters are
unknown, the vector case of the CRLB is required; note also that the covariance
matrix, C = σ²I, does not depend on θ. Thus we have
For evaluating the CRLB, it is assumed that f0 is not close to 0 or 1/2, since that allows for
certain simplifications based on the approximations
into the form I(θ)(g(x) -θ). When we do this the MVU estimator for θ is:
2.6.2 Example
Consider fitting the data, x(t), by a pth-order polynomial function of t:
where the θi's are the polynomial coefficients and w(t) is the approximation error, assumed to be
zero-mean Gaussian with a constant variance.
Assume we have N samples of data; then:
x = Hθ + w, where H is the N × (p + 1) observation matrix:
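The matrix itself is not reproduced above; a sketch of its construction and of the resulting least squares fit (a quadratic is used as an illustrative case):

```python
import numpy as np

rng = np.random.default_rng(6)
N, p = 50, 2
t = np.linspace(0.0, 1.0, N)
theta_true = np.array([1.0, -2.0, 3.0])     # p + 1 polynomial coefficients
H = np.vander(t, p + 1, increasing=True)    # columns 1, t, t^2
x = H @ theta_true + rng.normal(0.0, 0.1, N)

theta_hat, *_ = np.linalg.lstsq(H, x, rcond=None)  # (H^T H)^{-1} H^T x
print(theta_hat)
```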
2.6.3 Example
Consider the problem of estimating the channel in a wireless communication system.
The receiver cannot decode the transmitted symbols correctly unless the channel is known,
and the channel is usually unknown in wireless communication. To overcome this issue, the
transmitter regularly transmits a known pseudorandom noise (PN) sequence u[n] so as to enable
the receiver to estimate the unknown channel, as depicted in Figure 2.5. The unknown channel is
modeled with a finite impulse response (FIR) filter of an appropriate order, say p.
Assuming that w[n] is white Gaussian noise and noting that the data is in the linear model form,
the MVU estimator of the channel impulse response is

ĥ = (H^T H)⁻¹ H^T x.

For large N, and using the fact that u[n] = 0 for n < 0 and n > N − 1, it can be shown that
the entries of H^T H can be identified with the autocorrelation function of the known sequence u[n].
With this approximation, H^T H also takes the form of a symmetric Toeplitz matrix.
Therefore, the FIR filter coefficient estimators are (approximately) independent, and the MVU
estimator for the ith filter coefficient can be given as
That is:
3.1 Outline
In this module the general approaches to finding the MVUE are discussed. These approaches make
use of sufficient statistics. Also, the best linear unbiased estimator (BLUE) is described,
which is much easier to find in practical cases. The salient topics discussed in this module are:
Sufficient statistics
Determination of the MVUE using sufficient statistics
Best linear unbiased estimation (BLUE)
By letting
we have the sufficient statistic T(x) = ∑n x[n], which is in fact the minimal sufficient
statistic.
3.2.3 Example
Consider the problem of estimating the phase of a sinusoid embedded in WGN, w[n] ~ N(0, σ²).
Here, the amplitude A and the frequency f0 of the sinusoid, as well as the noise PDF, are
assumed to be known.
The PDF of the data vector can be given as
Noting that the exponent can be expanded as
Note that in this problem, as per the factorization theorem, there is no single sufficient statistic;
rather, T1(x) and T2(x) are jointly sufficient statistics for the estimation of the
phase of the sinusoid.
where η(θ) and A(θ) are some functions of θ, T(x) is a function of the data x, and h(x) is purely a
function of the data, i.e., it does not involve θ.
Suppose that x = {x[0], x[1], …, x[N − 1]} are i.i.d. samples from a member of the exponential
family with the parameter θ; then the joint PDF can be expressed as
From this it is apparent, by the factorization theorem, that T(x) = ∑_{n=0}^{N−1} T(x[n]) is a
sufficient statistic.
In the case of multi-parameter members of the exponential family, the joint PDF or PMF with
parameter set θ = [θ1, θ2, …, θd]^T can be expressed as
3.3.3 Example
Show that the Gamma distribution belongs to the exponential family of distributions.
The Gamma distribution is characterized by the density function
We note that the Gamma distribution has the form of the exponential family of distributions
with η1(α, β) = −β, η2(α, β) = α − 1, T1(x) = x, T2(x) = ln x, A(α, β) = ln Γ(α) − α ln β, and h(x)
= u(x), the unit step function that restricts the support to x > 0.
Obviously:
The BLUE is derived by finding the A which minimizes the variance var(θ̂), subject
to the constraint AH = I, where C is the covariance matrix of the data x. Carrying out the
minimization yields the following form for the BLUE:

θ̂ = (H^T C⁻¹ H)⁻¹ H^T C⁻¹ x,

with minimum covariance Cθ̂ = (H^T C⁻¹ H)⁻¹.
Salient attributes of BLUE:
For the general linear model, the BLUE is identical in form to the MVUE.
The BLUE assumes only up to 2nd-order statistics, and not the complete PDF of the data, unlike
the MVUE, which was derived assuming a Gaussian PDF.
If the data is truly Gaussian then the BLUE is also the MVUE.
The BLUE for the general linear model can be stated in terms of following theorem.
Gauss-Markov Theorem: Consider a general data model of the form:
x = Hθ + w
where H is known, and w is noise with covariance C (the PDF of w otherwise arbitrary).
Then the BLUE of θ is:
where w[n] is of unspecified PDF with var(w[n]) = σn², and the unknown parameter θ = A is to
be estimated. We assume a BLUE estimate and identify H by noting:
E[x] = 1θ
where x = [x[0], x[1], x[2], …, x[N − 1]]^T and 1 = [1, 1, 1, …, 1]^T, so we have H ≡ 1. Also,
we note that in the case of white noise, where σn² = σ², we get the sample-mean
estimator:
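A sketch of this BLUE, which for H = 1 and uncorrelated noise reduces to a variance-weighted average of the samples (the noise variances below are assumed values):

```python
import numpy as np

rng = np.random.default_rng(7)
A = 2.0
var_n = np.array([0.5, 1.0, 4.0, 0.25, 2.0])   # assumed per-sample variances
x = A + rng.normal(0.0, np.sqrt(var_n))

A_blue = np.sum(x / var_n) / np.sum(1.0 / var_n)
print(A_blue)   # equals the sample mean when all variances are equal
```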
3.4.3 Example
Consider the acoustic echo cancellation (AEC) problem, whose signal flow-graph is
shown in Figure 3.1. A speech signal u[n] from the far-end side is broadcast into a
room by means of a loudspeaker. A microphone is present in the room to record the local
signal v[n], which is to be transmitted back to the far-end side. The recorded microphone
signal y[n] = v[n] + x[n] contains the undesired echo x[n] due to the acoustic echo path existing
between the loudspeaker and the microphone. The echo path transfer function is modeled using an
FIR filter, so the echo signal can be considered as a filtered
version of the loudspeaker signal. The objective of the AEC is to
estimate the impulse response Ĥ(z) of the echo path so as to produce an echo-free
signal.
Here v, the near-end signal vector, is modeled as a zero-mean process with correlation matrix
R = E[vv^T]. Any linear estimator of h can be written as a linear function of the microphone
signal vector y as
3.4.4 Example
Show that the BLUE commutes over linear (affine) transformations.
Let us consider that, given the BLUE of θ, we wish to estimate
where H is known, and w is noise with covariance C (the PDF of w otherwise arbitrary).
Then the BLUE of θ is:
where Cθ̂ is the minimum covariance matrix. Further, assuming that the
transformation matrix B is invertible, we get
Then,
Therefore,
This shows that, in the case of a linear transformation of the parameter, the BLUE of the
transformed parameter can be obtained simply by applying the same linear transformation to
the BLUE of the original parameter.
where .
Maximizing the likelihood function by setting its derivative to zero yields the MLE.
Another important observation is that, unlike the previous estimators, the MLE does not
require an explicit analytical expression for p(x; θ)! Indeed, given even a plot of the PDF as a
function of θ, one can numerically search for the θ that maximizes the PDF.
4.2.2 Example
Consider the problem of a DC signal embedded in noise:
where w[n] is WGN with zero mean and known variance σ².
We know that the MVU estimator for θ is the sample mean. To see that this is also the MLE,
we consider the PDF:
4.2.3 Example
Consider the problem of a DC signal embedded in noise:
where w[n] is WGN with zero mean but unknown variance which is also A; that is, the
unknown parameter, θ = A, manifests itself both as the unknown signal level and as the variance of
the noise. Although a highly unlikely scenario, this simple example demonstrates the power
of the MLE approach, since finding the MVUE by the earlier procedures is not easy. The
likelihood function for x is given by:
On differentiating we have:
and setting the derivative to zero and solving for θ produces the MLE:
and:
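A sketch that maximizes this likelihood numerically and compares the result against the closed-form root of the likelihood equation, Â = −1/2 + √((1/N)∑x² + 1/4):

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(8)
A_true, N = 1.5, 2000
x = A_true + rng.normal(0.0, np.sqrt(A_true), N)   # mean A, variance A

def neg_loglike(A):
    return 0.5 * N * np.log(2 * np.pi * A) + np.sum((x - A) ** 2) / (2 * A)

res = minimize_scalar(neg_loglike, bounds=(1e-6, 20.0), method="bounded")
A_closed = -0.5 + np.sqrt(np.mean(x**2) + 0.25)
print(res.x, A_closed)   # the two agree
```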
where θ̂ is the MLE of θ. If g is not a one-to-one function (i.e., not invertible), then α̂ is
obtained as the MLE of the transformed likelihood function, p_T(x; α), which is defined as:
4.3.2 Example
In this example we demonstrate the finding of the transformed MLE. In the context of the previous
example, consider two different parameter transformations: (i) α = exp(A) and (ii) α = A².
Case (i): From the previous example, the PDF parameterized by the parameter θ = A can be given
as
Now, to find the MLE of α, setting the derivative of p_T(x; α) with respect to α to zero yields
or
But Â being the MLE of A, we have α̂ = exp(Â). Thus the MLE of the transformed
parameter is found by substituting the MLE of the original parameter into the transformation
function. This is known as the invariance property of the MLE.
Case (ii): Since A = ±√α, α is not a one-to-one transformation of A. If we
take only A = +√α, then some possible PDFs will be missing. To characterize all possible
PDFs, we need to consider the two sets of PDFs
and the MLE of θ is found by differentiating the log-likelihood, which can be shown to yield:
Proof
In the following, the proof of this important property is outlined for the scalar parameter case.
Assume that the observations are IID and the regularity condition holds,
i.e., E[∂ ln p(x; θ)/∂θ] = 0. Further assume that the first-order and the second-order derivatives
of the likelihood function are defined.
Before deriving the asymptotic PDF, it is first shown that the MLE is a consistent estimator. For
this, using the Kullback-Leibler information inequality,
or
with equality if and only if θ1 = θ2, the right-hand side of the above inequality is maximized
for θ = θ0. As the data is IID, the maximization of the log-likelihood function is equivalent to
maximizing
But for N → ∞, this converges to the expected value by the law of large numbers. Hence, if
θ0 is the true value of θ, we have
By a continuity argument, the normalized log-likelihood function is also maximized at θ = θ0, or,
as N → ∞, the MLE is θ̂ = θ0. Thus the MLE is consistent.
To derive the asymptotic PDF of the MLE, using the mean value theorem we have
where the intermediate point lies between θ̂ and θ0. But by the definition of the MLE, the left-hand
side of the above relation is zero, so that
Now let ξn = ∂ ln p(x[n]; θ)/∂θ, which is a random variable, being a function of x[n]. Additionally,
since the x[n]'s are IID, so are the ξn's. By the central limit theorem, the numerator term has a PDF
that converges to a Gaussian with mean given below,
due to the independence of the random variables. We then apply Slutsky's theorem, which says
that if a sequence of random variables xn has the asymptotic PDF of the random variable x, and
the sequence of random variables yn converges to a constant c, then xn/yn has the same
asymptotic PDF as the random variable x/c. Thus in this case
So that
or equivalently
or finally
Thus the distribution of the MLE of a parameter is asymptotically normal, with mean the true
value of the parameter and variance the inverse of the Fisher information.
is minimized over the N observation samples of interest, and we call the minimizing value the LSE
of θ. More precisely, we have:
An important assumption needed to produce a meaningful unbiased estimate is that the noise and
model inaccuracies, w[n], have zero mean. However, no other probabilistic assumption about
the data is made (i.e., the LSE is valid for both Gaussian and non-Gaussian noise). At the same
time, we cannot make any optimality claims with the LSE (as these would depend on the
distribution of the noise and modeling errors).
A problem that arises from assuming the signal model function s(n; θ), rather than knowledge
of p(x; θ), is the need to choose an appropriate signal model. Then again, in order to obtain a
closed-form or parametric expression for p(x; θ), one usually requires knowledge of the
underlying model and noise characteristics anyway.
5.2.2 Example
Consider observations, x[n], arising from a DC-level signal model, s[n] = s(n; θ) = θ:
The resulting LSE, the sample mean, is surprisingly identical in functional form to the MVU
estimator for the linear model.
An interesting extension of the linear LS is the weighted LS, where the contribution to the
error from each data sample can be weighted in importance by using a
different form of the error criterion:
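A sketch of weighted least squares with the standard solution θ̂ = (H^T W H)⁻¹ H^T W x; the line model and inverse-variance weights are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(9)
N = 40
H = np.column_stack([np.ones(N), np.arange(N, dtype=float)])
theta_true = np.array([1.0, 0.1])
var_n = np.linspace(0.1, 2.0, N)                  # heteroscedastic noise
x = H @ theta_true + rng.normal(0.0, np.sqrt(var_n))

W = np.diag(1.0 / var_n)                          # inverse-variance weights
theta_wls = np.linalg.solve(H.T @ W @ H, H.T @ W @ x)
print(theta_wls)
```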
Recall that for this problem the earlier derived BLUE is of the form:
Note that the LSE does not take into account the near-end signal characteristics (R = E[vv^T])
and therefore, in practice, it is not found to be as effective as the BLUE.
On combining the two equations and using the matrix form, we have
or
The error vector ε = x − Hθ̂ must be orthogonal to the columns of H. This is the well-known
orthogonality principle. In effect, the error represents the part of x that cannot be
described by the signal model. The minimum LS error Jmin can be given as
where hi^T x is the length of the component of the vector x along hi. In matrix notation this is
so that
This result is due to the orthonormal columns of H. As a result, we have H^T H = I and therefore
In general, the columns of H will not be orthonormal, so the signal vector estimate is
obtained as
and hence
Remark: Note that the constrained LSE is a corrected version of the unconstrained LSE. In
cases where the constraint happens to be satisfied by the unconstrained estimate, i.e., Aθ̂ = b,
then according to the above relation the LSE and the constrained LSE are identical. Such is
usually not the case, however.
5.5.2 Example
In this example we explain the effect of constraints on the LSE. Consider a signal model
With some matrix algebra we can show the constrained LSE and the corresponding signal
estimate to be
Since θ1 = θ2, the two observations are averaged, which is intuitively reasonable. In this simple
problem, we can easily incorporate the constraint into the given signal model.
Note that the parameter to be estimated then reduces to θ only. Estimating the unconstrained
LSE of θ using the reduced signal model would have produced the same result. Similar to the
least squares case, it is instructive to view the constrained least squares estimation problem
geometrically, as done in Figure 5.5 in the context of this example.
and p(x; θ) is the PDF of x parameterized by θ. In the Bayesian approach, the estimator is
similarly derived by minimizing the Bayesian MSE, θ̂ = arg min Bmse(θ̂), where

Bmse(θ̂) = E[(θ − θ̂)²]

is the Bayesian mean square error and the expectation is with respect to the joint PDF p(x, θ)
of x and θ (since θ is now a random variable). It is to be noted that the squared error (θ̂ − θ)² is
identical in both the Bayesian and classical MSE. The minimum Bmse(θ̂) estimator, or MMSE
estimator, is derived by differentiating the expression for Bmse(θ̂) with respect to θ̂ and setting
the result to zero, to yield:

θ̂ = E(θ|x).

Thus the MMSE estimate is the conditional expectation of the parameter θ given the observations x.
Apart from the computational (and analytical!) requirements in deriving an expression for the
posterior PDF and then evaluating the expectation E(θ|x), there is also the problem of finding
an appropriate prior PDF. The usual choice is to assume that the joint PDF, p(x, θ), is
Gaussian, and hence both the prior PDF, p(θ), and the posterior PDF, p(θ|x), are also Gaussian
(this property implies the Gaussian PDF is a conjugate prior distribution). Thus the form of
the PDFs remains the same, and all that changes are the means and the variances.
6.2.2 Example
Consider a signal embedded in noise:

x[n] = A + w[n], n = 0, 1, …, N − 1,

where as before w[n] ~ N(0, σ²) is a WGN process and the unknown parameter θ = A is to be
estimated. However, in the Bayesian approach we also assume that the parameter A is a random
variable with a prior PDF, which in this case is the Gaussian PDF p(A) = N(μA, σA²). We also
have p(x|A) = N(A, σ²), and we can assume that A and x are jointly Gaussian. Thus the
posterior PDF p(A|x) is Gaussian, and the MMSE estimate is

Â = α x̄ + (1 − α) μA, where α = σA²/(σA² + σ²/N).
Upon closer examination of the MMSE estimate we observe the following (assume σA² ≪ σ²):
1. With little data (N small), α → 0 and Â → μA; that is, the MMSE estimate tends towards
the mean of the prior PDF and effectively ignores the contribution of the data. Also p(A|x) ≈
N(μA, σA²).
2. With a large amount of data (N large), α → 1 and Â → x̄; that is, the MMSE estimate
tends towards the sample mean and effectively ignores the contribution of the prior
information. Also p(A|x) ≈ N(x̄, σ²/N).
Conditional PDF of the Multivariate Gaussian: If x and y are jointly Gaussian, where x is k × 1
and y is l × 1, with the mean vector [E(x)^T, E(y)^T]^T and the partitioned covariance matrix,
then the conditional PDF, p(y|x), is also Gaussian, with posterior mean vector and
covariance matrix given by:

E(y|x) = E(y) + Cyx Cxx⁻¹ (x − E(x)), Cy|x = Cyy − Cyx Cxx⁻¹ Cxy.

This result can be used for MMSE estimation involving a jointly Gaussian parameter vector
and data vector.
Lecture 18 : Properties of Bayesian Estimator
6.3.1 Bayesian Linear Model
Now consider the Bayesian linear model:
x = Hθ + w
where θ is the unknown parameter to be estimated, with prior PDF N(μθ, Cθ), and w is
Gaussian noise with PDF N(0, Cw). The MMSE estimate is provided by the expression for E(y|x),
where we identify y ≡ θ. We have:
Now p(x|θ) is, in reality, p(x|θ,α), but we can obtain the true p(x|θ) by:
is one specific case of a general estimator that attempts to minimize the average of a cost
function, C(ϵ), that is, the Bayes risk R = E[C(ϵ)], where ϵ = θ − θ̂. Figure 6.1 shows the
plots of three different cost functions of wide interest; which of the central tendencies of the
posterior PDF gets emphasized by each choice of cost function is discussed below:
1. Quadratic: C(ϵ) = ϵ², which yields R = Bmse(θ̂). It has already been shown that the estimate
that minimizes Bmse(θ̂) is the mean of the posterior PDF, θ̂ = E(θ|x).
2. MAP Estimator: the hit-or-miss cost function yields the maximum a posteriori (MAP)
estimator, which is the mode (or maximum) of the posterior PDF, i.e., the value that maximizes
p(θ|x).
3. Bayesian ML Estimator: the Bayesian maximum likelihood estimator, which is the special
case of the MAP estimator where the prior PDF, p(θ), is uniform or non-informative.
Noting that the conditional PDF of x given θ, p(x|θ), is essentially equivalent to the PDF
of x parameterized by θ, p(x; θ), the Bayesian ML estimator is equivalent to the classical
MLE.
Comparison among the three types of Bayesian estimators:
The MMSE estimator is preferred due to its squared-error cost function, but it is also the most
difficult to derive and compute, due to the need to find an expression for the posterior PDF, p(θ|x),
in order to evaluate the integral ∫ θ p(θ|x) dθ.
The hit-or-miss cost function used in the MAP estimator, though less "precise", is much
easier to work with, since there is no need to integrate; one only finds the maximum of the
posterior PDF p(θ|x), which can be done either analytically or numerically.
The Bayesian ML estimator is equivalent in preference to the MAP only in the case where the prior
is non-informative; otherwise it is a sub-optimal estimator.
As with the classical MLE, an expression for the conditional PDF, p(x|θ), is easier to obtain
than one for the posterior PDF, p(θ|x). Since in most cases knowledge of the prior is not
available, it is not surprising that classical MLEs tend to be more prevalent. However, it may not
always be prudent to assume that the prior is uniform, especially in cases where prior
knowledge of the estimate is available even though the exact PDF is unknown. In these cases
a MAP estimate may perform better, even if an "artificial" prior PDF is assumed (e.g., a
Gaussian prior, which has the added benefit of yielding a Gaussian posterior).
where a = [a1,a2,…,aN-1]T.
The estimation problem now is to choose the weight coefficients to minimize the
Bayesian MSE:
The resulting estimator is termed the linear minimum mean square error (LMMSE) estimator.
It is to be noted that the LMMSE estimator will be sub-optimal unless the MMSE estimator
happens to be linear; the MMSE estimator is linear when θ and x are jointly Gaussian.
If the Bayesian linear model is applicable, we can write
x = Hθ + w
The weight coefficients are obtained by setting ∂Bmse/∂ai = 0 for i = 0, 1, …, N − 1; this yields:
where ho denotes the true value of the impulse response; the estimate is then given by the mean
of the posterior PDF p(ho|y):
It is worth comparing the form of the LMMSE estimator with that of the earlier derived BLUE:
Note that in the MMSE criterion of the Bayesian framework, the variance of the estimator and
the squared bias are weighted equally. Thus the variance of the estimator can be reduced
below that of the MVUE if the estimator is no longer constrained to be unbiased.
We assume N samples of time-series data x = [x[0], x[1], …, x[N − 1]]^T which are wide-sense
stationary (WSS). Further, as E(x) = 0, the N × N covariance matrix takes the symmetric
Toeplitz form:
where rxx[k] = E(x[n]x[n − k]) is the autocorrelation function (ACF) of the x[n] process
and Rxx denotes the autocorrelation matrix. Note that since x[n] is WSS, the
expectation E(x[n]x[n − k]) is independent of the absolute time index n.
In signal processing the estimated ACF is used; it is given by
Both the data x and the parameter to be estimated are assumed to be zero mean. Thus the
LMMSE estimator is:
Application of LMMSE estimation to the three signal processing problems of
smoothing, filtering and prediction gives rise to different kinds of Wiener filters, which are
discussed in the following sections.
smoothing and filtering is that the signal estimate ŝ[n] can use the entire data set:
the past values (x[0], x[1], …, x[n − 1]), the present value x[n] and the future values
(x[n + 1], x[n + 2], …, x[N − 1]). This means that the solution cannot be cast as a "filtering"
problem, since we cannot apply a causal filter to the data. We assume that the signal and noise
processes are uncorrelated. Hence,
and thus
Also
Now, by letting h[k] = a_{n−k}, the above estimator can be expressed as a convolution sum,
where h[k] can be identified as the impulse response of an infinite-length two-sided time-
invariant filter.
Analogous to the LSE case (refer to Section 5.4.1), the orthogonality principle also holds for the
LMMSE case, i.e., the error in estimation (θ − θ̂) is always orthogonal (or perpendicular) to
the observed data {…, x[−1], x[0], x[1], …}. This can be mathematically expressed as
Hence,
Thus the equations required to be solved for determining the infinite Wiener filter impulse
response, also referred to as the Wiener-Hopf equations, are given by
On taking the Fourier transform of both sides of the above equation we have

H(f) = Pss(f)/Pxx(f),

where H(f) is the frequency response of the infinite Wiener smoother, and Pxx(f) and Pss(f) are
the power spectral densities of the noisy and clean signals, respectively.
As the signal and noise are assumed to be uncorrelated, the frequency response of the Wiener
smoother can be expressed as

H(f) = Pss(f) / (Pss(f) + Pww(f)).

Remarks: Since the power spectral densities are real and even functions of frequency, the
impulse response also turns out to be real and even. This means that the designed filter is
non-causal, which is consistent with the fact that the signal is estimated using future as
well as present and past data.
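A frequency-domain sketch of this smoother; the spectra below are assumed (known) model spectra, not taken from the text:

```python
import numpy as np

rng = np.random.default_rng(11)
N = 1024
f = np.fft.fftfreq(N)
Pss = 1.0 / (1.0 + (f / 0.05) ** 2)      # assumed low-pass signal PSD
Pww = 0.5 * np.ones(N)                   # white noise PSD

# synthesize a signal roughly matching Pss, then add white noise
s = np.fft.ifft(np.sqrt(Pss) * np.fft.fft(rng.normal(size=N))).real
x = s + rng.normal(0.0, np.sqrt(0.5), N)

H = Pss / (Pss + Pww)                    # Wiener smoother response
s_hat = np.fft.ifft(H * np.fft.fft(x)).real
print(np.mean((x - s)**2), np.mean((s_hat - s)**2))  # smoothing lowers the MSE
```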
respectively. Assume that the signal and noise are uncorrelated and zero mean. Find the non-
causal optimal Wiener filter for estimating the clean signal from its noisy version.
On computing the z-transforms of the ACFs, the power spectral densities of the signal and the
noise can be derived as
Given Hopt(z), the impulse response of the optimal stable filter turns out to be
Then:
For the large-data (n → ∞) case, the time-varying impulse response h(n)[k] can be replaced with
its time-invariant version h[k], and we have
This is termed the infinite Wiener filter. The determination of the causal Wiener filter
involves the use of the spectral factorization theorem and is explained in the following.
The one-sided z-transform of a sequence x[n] is defined as
where the filter impulse response h[n] is constrained to be causal. The two-sided z-transform
that satisfies the Wiener-Hopf equation can be written as
7.4.3 Example
Consider a signal s[n] corrupted with an additive white noise w[n]. The signal and noise are
assumed to be zero mean and uncorrelated. The autocorrelation function (ACF) of the signal
and noise are
Find the causal optimal Wiener filter to estimate the signal from its noisy observations.
As the signal and noise are uncorrelated, we have
Figure 7.2: Plot showing the two-sided and the causal optimal filters for the causal Wiener
filter example problem.
Therefore,
where rxx = [rxx[l] rxx[l + 1] … rxx[l + N − 1]]^T (a time-reversed version of the ACF vector).
When written out, we get the Wiener-Hopf prediction equations:
As pointed out earlier, the Levinson recursion is a computationally efficient procedure for
solving these equations recursively. The special case of l = 1, the one-step predictor, covers
two important cases in signal processing:
The values −h[n] are termed the linear prediction coefficients (LPC), which are used extensively in
speech coding. For example, a 10th-order (N = 10) linear predictor is commonly used in speech
coding and is given by
The resulting Wiener-Hopf equations are identical to the Yule-Walker equations used to
solve for the autoregressive (AR) filter parameters of an AR(N) process.
7.5.2 Example
Consider a real wide-sense stationary (WSS) random process x with the autocorrelation sequence
or
On solving, we have a1 = −5/6 and a2 = 1/6. Thus, the optimal predictor polynomial is given by
or
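A sketch of solving these prediction equations with SciPy's Toeplitz solver. The ACF values below are an assumption chosen to be consistent with the quoted solution (they are not given in the text):

```python
import numpy as np
from scipy.linalg import solve_toeplitz

r = np.array([1.0, 5.0 / 7.0, 3.0 / 7.0])   # assumed r[0], r[1], r[2]

h = solve_toeplitz((r[:2], r[:2]), r[1:])   # solve R h = [r1, r2]^T
a = np.concatenate(([1.0], -h))             # predictor polynomial coefficients
print(a)                                    # [1, -5/6, 1/6]
```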
A.
B.
C.
D.
8. The maximum likelihood procedure yields an estimator that is asymptotically efficient, but
A. sometimes it also yields an efficient estimator for finite data records
B. it never yields an efficient estimator for finite data records
C. it yields MVU estimator for finite data records and not the efficient estimator
D. none of the above
9. Given the observations {X1, X2, …, XN} having the Poisson distribution
[P(X = x) = e^{−λ} λ^x / x!, (x = 0, 1, …)] with unknown parameter λ > 0, the maximum
likelihood estimate of λ is
A.
B. 1∕
C. ²
D. 1∕ ²
10. For an estimation problem, if an efficient estimator exists, then maximum likelihood
estimator
A. will always produce it.
B. will produce it in some cases only.
C. will never produce it.
D. none of the above
11. For the least squares estimation, which one of the following assumptions is true?
A. The data is assumed to have uniform distribution.
B. The data is assumed to have Gaussian distribution.
C. The data is assumed to be probabilistic with first two moments known.
D. No probabilistic assumption about the data is made.
A.
B.
C.
D.
16. For a DC level in WGN detection, assume that we wish to have PFA = 10-4 and
PD = 0.99. If the SNR is -30dB, the number of samples N required for detection is
A. 20, 465
B. 28, 646
C. 36, 546
D. 40, 486
17. Consider a binary hypothesis testing problem with the conditional probabilities of the
received data as
with hypotheses H0 and H1 being equally likely. Find the minimum probability of error.
A. 0.2012
B. 0.3854
C. 0.4385
D. 0.5108
18. For the binary hypothesis testing problem:
where c > 0, and U[a, b] denotes the uniform PDF on [a, b]. The condition for the perfect
detector (PFA = 0, PD = 1) is
A. c < 1∕2
B. c < 1
C. c > 1∕2
D. c > 1
19. Consider an M = 2 pulse amplitude modulation (PAM) scheme
subject to an average energy constraint. To have minimum probability of error Pe, the best
choice for the signal amplitudes A0 and A1 is
A. A0 = A1
B. A0 =
C. A0 = -
D. A0 = -A1
20. For the linear model, x = Hθ + w, let ŝ = Hθ̂. Which one of the following is correct?
A. ||ŝ||² + ||x − ŝ||² = ||x||²
B. ||ŝ||² + ||x − ŝ||² > ||x||²
C. ||ŝ||² + ||x − ŝ||² < ||x||²
D. ||x||² + ||x − ŝ||² = ||ŝ||²
21. The minimum Bayes risk for a binary hypothesis testing problem with costs C00 = 1, C11 =
2, C10 = 2, and C01 = 4 is given by
where π0 is the prior probability of hypothesis H0. Find the values of the minimax risk and the
least favorable prior probability.
A. R(πL) = 1; πL = 1
B. R(πL) = 0; πL = 1
C. R(πL) = 2; πL = 0
D. R(πL) = 2; πL = 0
22. Consider the PDFs for H0 and H1 given as:
A.
B.
C.
D.
23. Consider the detection problem:
where s[n] = A cos(2πf0n + ϕ) is the signal and w[n] the noise distributed as
w[n] ~ N(0, σ²). For estimating the amplitude A of the signal, if the detection statistic is