You are on page 1of 23

Estimation: Bias, Variance and Mean Square Error Let denote the thing that we are trying to estimate.

denote the result of an estimation based on Let one data set. Each data set used, will generate a different ) = E[ ] = difference between the true Bias, b( value and the average of all possible estimates.

Variance, of the estimates about the mean of all estimates.

2 Mean Square Error (m.s.e.) = E[ ( ) ] = b 2 +

2 =

E[ ( E [ ]) ] = measure of the spread


Estimation: Some definitions Estimate is consistent if, when we use more data to form the estimate, the mean square error is reduced. If we have two ways of estimating the same thing, we say that the estimator that leads to the smaller mean square error is more efficient than the other estimator.
= (a,b) b true bias xxx x xx x value x x x a estimates mean of all estimates

Examples we did in class

Bias and variance of an estimate of the mean. Bias of the estimate of the variance of a set of measurements: 1 N 2
N ) ( Xn

assuming that the measurements were independent random variables. Two methods of estimating Rxx() from T sec. of data.
1. Dividing by the integration time: T-| | Estimation was unbiased but had very high variance, particularly when was close to T. 2. Dividing by total time: T Estimation was biased (asymptotically unbiased). This was equivalent to multiplying Method 1 estimate by a triangular window (T-| |)/T. This window attenuated the high variance estimates.

n =1

Power Spectral Density Estimation

* lim X X Sxx ( f ) = E T T T T + = R ( ) e j 2 f d . xx

Estimation: 1. Could Fourier Transform the Autocorrelation function estimate (not computationally efficient). 2. Could use the frequency domain definition directly.
* ( f ) = XT XT Raw Estimate = S xx T

Extremely poor variance characteristics. We showed that variance was Sxx ( f ) by T, the length of data used.

, unaffected

Power Spectral Density Estimation (Continued)

Smoothed estimate from segment averaging. x(t) w(t) Tr
1. 2.


Break signal up into NSEG segments, Tr seconds long. For each segment:
1. Apply a window to smooth transition at ends of segments 2. Fourier Transform windowed segment XT(f) 3. Calculate a raw power spectral density: |XTr|2/Tr estimate


Average the results from each segment to get the smoothed estimate.

~ Sxx ( f ) =


NSEG i =1

(f ) S xx i

Power Spectral Density (PSD) Estimation (Continued)

We argued that the distribution of the smoothed PSD was related to that of a Chi-squared random variable ( 2 ) with = 2.NSEG degrees of freedom, if Tr was large enough so we could ignore bias errors.

~ 2.Nseg .S xx Variance Sxx


4.NSEG ~ = Variance S xx = 2 ( 2.NSEG ) 2 Sxx

and rearranging we showed that:

S xx 2 ~ Variance [ Sxx ] = NSEG

Therefore, we can control variance by averaging more segments. Note: shorter segments mean larger bias, so for a fixed T seconds of data, there is a trade-off between Segment Length (Tr), which controls the bias, and Number of Segments (NSEG), which controls the variance: T=Tr.NSEG.

Cross Spectral Density (CSD)

* lim XT Y T Sxy ( f ) = E T T + j 2 f = Rxy ( ) e d .

1. 2.


Could Fourier Transform the Cross-correlation function estimate (not computationally efficient). Could use the frequency domain definition directly. Raw Estimate =
* XT Y T Sxy ( f ) = T

1. 2. 3.

As with PSD,this has extremely poor variance characteristics, so

divide the time histories into segments, generate a raw estimate from each segment, and average to reduce variance and produce a smoothed estimate.

Cross Spectral Density Estimation: Segment Averaging x(t) w(t) Tr y(t) w(t) Tr time
* Tr ( f ) = XTrY S xyi
NSEG i =1


Fourier Transform of Windowed Segments XT(f) & YT(f). Raw Estimate from ith segment =
~ Smoothed Estimate = S xy ( f ) =



(f ) S xyi

Issues with Cross Spectral Density Estimates

1. Reduce bias by choosing the segment length (Tr) as large as possible. (Bias greatest where the phase changes rapidly.) Reduce variance by averaging many segments. Might require a large amount of averaging to reduce noise effects: y(t) = h(t)*x(t) + n(t) x,n uncorrelated, h-impulse response Sxy (f) = H(f) *Sxx (f) + Sxn (f) To make Sxn (f) 0 may require a lot of averaging if the signal to noise on the output [ SNRy= |H|2Sxx/Snn ] is low. 4. Time delays between x and y cause problems, if the time delay (to) is greater than a small fraction of the segment length (Tr). (See class notes.)

2. 3.

Coherence Function Estimation: Substitute in Smoothed Estimates of Spectral Densities

Coherence takes values in the range 0 to 1. 2 ~ 2 | Sxy | 2 | Sxy | 2 ~ Definition: xy = ; Estimate: xy = ~ ~ Sxx Syy Sxx Syy Note that substituting in raw spectral density estimates results in an estimate of 1. (See class notes for proof.) A result where the coherence = 1 at all frequencies from measured signals should be treated with a high degree of suspicion. Estimate highly sensitive to bias in spectral density estimates, particularly bad where the phase of the cross spectral density changes rapidly (at maxima and minima in |Sxy|). COHERENCE 0 because of: NOISE NONLINEARITY BIAS ERRORS IN ESTIMATION

Example: System with Some Nonlinearities (cubic stiffness) and Noisy Measurements
Nonlinearity causes spread of energy here, around 3x and 5x this frequency Nonlinear Mode

Nonlineary causes broad dips in coherence function. If you drive the system harder these regions become wider

Poor SNR on output causing this Dip due to Bias Errors

Example: Linear System with Noisy Measurements

High SNR; Tr = 512/fs

High SNR; Tr = 2048/fs

Low SNR on output; Tr = 512/fs

Dip filled in with noise

Less Averaging compared to N=512 case: fewer segments greater variance

Bias greatest where phase change is fastest

Dips mainly due to bias. and thus get smaller as resolution increases

SNRy also affecting coherence here

H1(f) and H2(f): Effects of Noise

If the system is linear and there is no noise:

H(f) = Sxy(f)/Sxx(f) = Syy(f)/Syx(f)

Case with Noise:

H1 = Sxmym/Sxmxm

Assuming here that spectral density estimation errors small (Tr and NSEG Large).

Noise on the input adversely affects this estimate of H. |H1| < |H|

= [Sxy(f)/Sxx(f)]/[1+ Snxn x /Sxx] = H(f)/[1+ Snxnx /Sxx]

H2 = Symym/S*xmym

Noise on the output adversely affects this estimate of H. |H2| > |H|

= [Syy(f)/S*xy(f)].[1+ Snyny /Syy] = H(f).[1+ Snyny /Syy]

Estimation of H
Note that, e.g.,
~ ~ E[H1] E[ S ]/E[ S xy xx ] ~ ~ E[H1] = E[ S xy / Sxx ]
Frequency response function estimates (both H1 and H2) are

extremely sensitive to bias errors which are worse at peaks and troughs.

Require large segment sizes to overcome bias, but this means less segments to average, thus higher variance.

Note: A low coherence function does not necessarily imply a poor frequency response function estimate. If the only reason the coherence function is low is noise on the response (input), then the H1 (H2) frequency response estimation should be accurate, provided sufficient averaging was done to reduce the variance of the estimates.

Some notes
1.Systems with feedback and noise may result in erroneous frequency response function estimates. n(t)



d and n uncorrelated

H(f) G(f)


The problem is caused by the noise n(t) affecting both the output and the input signals, and thus input and output noise is correlated.. you cannot set Sxn(f) to zero in this example. Sometimes, in these cases, we use a third signal, r(t), that is highly correlated to the noise-free output and uncorrelated with the noise. Estimate of H = Sry/Srx. A good choice for r(t) here would be d(t).

Some notes
2. Multi-input systems with correlated inputs may cause problems in frequency response function estimation.

n(t) + u(t) u, m and n uncorrelated +

x1(t) x2(t)

H(f) G(f) +



y=h(t)*x1(t)+g(t)*x2(t); x1(t) = u(t) + n(t); x2(t) = u(t) + m(t); Sx1y = H.Sx1x1 + G.Sx1x2 and you can show that Sx1x2 = Suu 0. So, Sx1y / Sx1x1 = H + G. (Suu /[Suu + Snn]) H Similarly: Sx2y / Sx2x2 = H (Suu /[Suu + Smm]) + G G
Can sometimes use PARTIAL COHERENCE techniques to overcome this problem.

Partial Spectral and Coherence

We just touched on this is in the class. Basically we try to sequentially remove the influence of correlated components in a series of signals.
If you are interested in more details, see Bendat and Piersol, Random Data: Analysis and Measurement Procedures, John Wiley.

Applied to the previous example, we would try to remove the effect of correlated inputs by modifying x2(t) and y(t), to generate x2 1(t) and y 1(t), and then examine: Sx2 1y 1/Sx2 1x2 1. x2 1(t) = x2(t) - l12(t)*x1(t), where L12(f) = Sx1x2/Sx1x1 y 1(t) = y(t) - l1y(t)*x1(t), where L1y(f) = Sx1y/Sx1x1 = [H + G.L12] For the previous example you can show that: Sx2 1y 1 = G [ Smm+ L12 Snn ], where L12 = Suu / [Suu + Snn] Sx2 1x2 1 = [Smm+ L12 Snn], Therefore Sx2 1y 1/Sx2 1x2 1 = G. We can also generate a partial coherence function from these modified signals: x2 x1,y x1 2 = |Sx2 1y 1|2/[Sx2 1x2 1 Sy 1y 1]

Sines + Noise
The power spectral density of a sinusoid is: A A ( f f1 ) + ( f + f1 ) 2 2 But by using windows Tr seconds long, the delta functions become sinc functions with maximum height affected by window size Tr. If Tr is too small the sinc functions will be buried in the noise. But as Tr is increased the sinc functions begin to emerge from the noise. So if you expect a peak in your spectrum is due to sinewave, increase the window size (freqency resolution) and see if the peak gets larger, as you would expect if it were truly a sine wave.
2 2

Sines in noise
sinc function emerging from noise as Tr increases.

PSD - V2/Hz

Tr = N N=4,096 N=8,192 N=16,384

Variation in estimated PSD due to lack of averaging. Tr larger NSEG smaller, larger variance.

Frequency - Hertz

Calibration of PSD in MatLab

The units should be (signal units)2/Hz. The PSD you see is one-half of a two-sided PSD. Even though you give the PSD program the sampling rate, the output is incorrectly scaled by a factor of fs. To get total power of the signal you should be able to integrate the estimated power spectral density and get the mean square value of the time history (Parsevals Theorem). However if you calculate

{ P ( 0 ) + P ( fs / 2 ) + 2.

( N / 2 ) 1 k =1

P ( k .fs / N ) } ( frequency

resolution )

where N is the number of points in a segment and P(f) is the estimate of Sxx(f), it is off by a factor of fs. You need to divide by fs to get the correct result.

Calibration (continued)
Power Spectral Density:
Recall that for fs/2<f<fs/2, XTr(f) Xk So: our raw PSD estimate generated from one segment should be:
(1/Tr).|XTr(f)|2 (1/(N)).2 |Xk|2 = | Xk|2/N Volts2/Hz.

Calibration Continued: Energy Spectral Density

We sometimes have segments that contain a single transient (tap testing of structures) and we average the raw spectra from each segment to remove noise effects. If we choose different Tr, i.e., allow a shorter or longer time between successive transients, (transient should have died away in the segment), the PSD will change because of the divide by Tr.


time - s

To overcome this problem we estimate an Energy Spectral Density (ESD) (remove the divide by Tr in the raw PSD estimate.) Raw ESD estimate = |XTr(f)|2 2 |Xk|2 (Volts/Hz) 2 [You also need to be careful with window choice here so as not to distort the transient.]

Calibration Continued: Power Spectrum Power Spectrum

Segment averaging is often applied to periodic signals that are corrupted with noise. As resolution increases the noise floor in the power spectrum decreases.
Recall: ck = Xk/N, if you synchronize, dont alias and there is no noise.

Raw PWR estimate = |Xk|2/N2 = Raw PSD estimate . (frequency resolution) = ( | Xk|2/N ) . (fs/N) Volts2 So effectively at each point in the PSD you have integrated the power spectral density over a bin of width fs/N Hertz.