Sie sind auf Seite 1von 16

4/CY/O8 Advanced System Identification Lecture 4

Data Pre-processing

When data have been collected from the identification


experiment, it is often necessary to do some processing
on the data set prior to using it for identification.

There are several possible deficiencies in the data that


should be attended to:

1. High-frequency disturbances in the data record, above


the frequencies of interest to the system dynamics.

2. Occasional outliers and missing data.

3. Drift and offset, low frequency disturbances.

Dr. V.M. Becerra 1


4/CY/O8 Advanced System Identification Lecture 4

Drifts and detrending

There are two different approaches to dealing with these


problems:

1. Removing the disturbances by explicit pre-treatment


of the data.

2. Letting the noise model take care of the disturbances.

The first approach involves removing trends and off-sets


by direct subtraction.

Dr. V.M. Becerra 2


4/CY/O8 Advanced System Identification Lecture 4

Signal offsets

There are several approaches to dealing with signal off-


sets or non-zero mean values:

1. Let y(t) and u(t) be deviations from a physical


equilibrium: y (t ) = y m (t ) − y (t ) , y (t ) = y m (t ) − y (t )
2. Subtract the mean values from the data.
1 N m 1 N m
y = ∑ y (t ) , u = ∑ u (t )
N t =1 N t =1
3. Estimate the offset explicitly. For example,

A(q −1 ) y m (t ) = B(q −1 )u m (t ) + α + v(t )

Dr. V.M. Becerra 3


4/CY/O8 Advanced System Identification Lecture 4

Drift, trends and seasonal variations

Methods to cope with other slow disturbances in the


data are analogous to the previous approaches for
dealing with offsets.

Drifts and trends can be seen as time-varying


equilibrium points, or time-varying mean values.

With some knowledge about the frequencies of the


slow variations, an alternative is to high-pass filter the
data:

y (t ) = F (q ) y m (t ), u (t ) = F (q )u m (t )

Dr. V.M. Becerra 4


4/CY/O8 Advanced System Identification Lecture 4

Outliers and missing data

In practice, data acquisition equipment is not perfect.


It may be that single values of the input-output data
are missing or corrupt due to malfunctioning of
sensors or communication links.

A practical method to deal with this problem is to


replace outliers or missing measurements by smoothed
estimates prior to parameter estimation.

Dr. V.M. Becerra 5


4/CY/O8 Advanced System Identification Lecture 4

Model Validation – Residual Analysis

Once a model has been identified, it is important to


validate the model using a data set that should be
independent of the data used to calculate the model
parameters.

The ‘leftovers’ of the modeling process - the part of the


data that the model could not reproduce- are the
residuals:

ε (t ) = ε (t ,θˆN ) = y (t ) − yˆ (t ,θˆN )

These residuals carry information about the quality of the


model.

When doing model validation, one typically computes


the Mean Square Error, the autocorrelation of the
residuals, and the cross-correlation between the residuals
and the input. The values of the autocorrelation and
cross-correlation should be small and lie within certain
confidence limits.

Dr. V.M. Becerra 6


4/CY/O8 Advanced System Identification Lecture 4

The Mean Square Error

This is the average of the squared error:

N
1
MSE =
N
∑ (t )
ε 2

t =1

This is a measure in a single positive number of how well


the model output fits the measured data.

Auto-correlation of the residuals.

As the residuals are assumed to be a white noise


sequence, it is good to check its auto-correlation:

N
1
Rε (τ ) =
N

N
∑ ε (t )ε (t − τ )
t =1

For different values of τ = 1, 2, 3, 4, …

If these numbers are not small for τ ≠ 0 then part of ε


could have been predicted from past data, and so this is a
sign of deficiency in the model.

Dr. V.M. Becerra 7


4/CY/O8 Advanced System Identification Lecture 4

Cross-correlation between the residuals and the input

Similarly, the residuals should not be correlated with the


input, so it is also good to check the cross-correlation of
the residuals and the input:

N
1
R (τ ) =
N
εu
N
∑ ε (t )u(t − τ )
t =1

If there are traces of past inputs in the residuals, then


there is a part of y(t) that originates from the past input
and that has not been properly picked up by the model.
Hence, the model could be improved.

Dr. V.M. Becerra 8


4/CY/O8 Advanced System Identification Lecture 4

Subspace Identification – An introduction

A linear system can always be represented in state space


form:

x(k +1) = A x(k) + Bu(k) + w(t)

y( k) = C x(k) + Du( k) + v( k)

where:

x is a n-dimensional state vector


u is a nu-dimensional input vector
y is a ny dimensional output vector
v is a ny-dimensional noise vector
w is a n-dimensional process noise vector

A, B, C and D are parameter matrices of the appropriate


dimensions.

Dr. V.M. Becerra 9


4/CY/O8 Advanced System Identification Lecture 4

The main idea behind subspace identification techniques


is that given the input-output data sequences u(t), y(t),
t=1…N, the state sequence x(t), t=1…N, is estimated
first, and then the state space matrices A,B,C,D are found
using a least squares procedure.

Assume for a moment that not only y and u are


measured, but also the state vector x. Now, with known
u, y and x we can form a linear regression form the state
space model above:

LMx(t +1)OP L O
Y (t) = M ,P Θ = MM A B PP
MN y(t) PQ MNC D PQ
LMx(t)OP LMw(t)OP
Φ(t) = M P, E(t) = M
u(t
MN PQ ) MN v(t) PPQ

Then the state space model above may be written as:

Y (t) = Θ Φ(t) + E(t)

From this equation, the matrix elements in Θ can be


estimated by the simple least squares method.

Dr. V.M. Becerra 10


4/CY/O8 Advanced System Identification Lecture 4

How do we obtain the state sequence x(t), t=1..N from


the input-output data?

All state vectors that can be reconstructed from input-


output data are linear combinations of the n k-step ahead
output predictors (See Ljung 1999):

y(t + k|),
t k =1,...n

where n is the model order (the dimension of x)

We can form these predictors and select an algebraic


basis among its components.

LM y(t+1|)t OP
x(t) = L
MMy(t+#n|)t PP
N Q
The choice of L will determine the basis of the state
space realization. The predictor y(t + k|)
t is a linear
function of u(s), y(s), s=1,…t

The method is called subspace identification, because it


is based on subspace projections – a concept from linear
algebra.

A well known algorithm for subspace identification is


the so called N4SID, originally developed by Peter Van
Overschee at the University of Leuven, Belgium.

Dr. V.M. Becerra 11


4/CY/O8 Advanced System Identification Lecture 4

EXAMPLE

An identification experiment has been carried out on a


two tank mixing process. This process was located at the
Control Engineering Centre Laboratory at City
University, London, and a schematic diagram is given in
Figure 1.

ucold uhot

T1
L1 L2
T2

Tank 1 Tank 2

Figure 1: The two tank mixing process


The process has two manipulated inputs, which are the
openings (in percentage) of the cold (ucold) and hot
water valves (uhot).

Dr. V.M. Becerra 12


4/CY/O8 Advanced System Identification Lecture 4

The four measured variables are the levels in tanks 1 and


2, L1 and L2 (in cm), and the temperatures in tanks 1 and
2, T1 and T2 (in degrees C).

The experiment was carried out in open loop by


manually specifying the values of the valve openings.

The input sequences to the process were chosen to be


binary pseudo-random signals shifting between 10% and
40% for the cold water valve, and between 20% and 40%
for the hot water valve.

The sampling time used was 20s. The data set consists
of 190 samples.

Dr. V.M. Becerra 13


4/CY/O8 Advanced System Identification Lecture 4

MATLAB code to estimate a state space model for the level in


Tank 1 (uses the System Identification Toolbox):
load ex090398;
mix =iddata([L1],[ucold uhot]);
mixd = detrend(mix,’constant’);
mixe = mixd([1:95],:);
mixv = mixd([96:190],:)
ssmodel = n4sid(mixe,2);
compare(mixv, ssmodel);
resid(mixv, ssmodel);
present(ssmodel)
40

35

30
L (cm)

25
1

20

15

10
0 20 40 60 80 100 120 140 160 180 200
s a m p le

50

40
ucold (%)

30

20

10

0
0 20 40 60 80 100 120 140 160 180 200
s a m p le
50

40
uhot (%)

30

20

10

0
0 20 40 60 80 100 120 140 160 180 200
s a m p le

Dr. V.M. Becerra 14


4/CY/O8 Advanced System Identification Lecture 4

y1. (sim)

mixv; measured
ssmodel; fit: 93.25%

10

5
1
y

-5

-10

100 110 120 130 140 150 160 170 180 190

Correlation function of residuals. Output y1


1

0.5

-0.5
0 5 10 15 20 25
lag
Cross corr. function between input u1 and residuals from output y1
0.4

0.2

-0.2

-0.4
-25 -20 -15 -10 -5 0 5 10 15 20 25
lag

Dr. V.M. Becerra 15


4/CY/O8 Advanced System Identification Lecture 4

These are the model parameters returned by N4SID:


A=
x1 x2
x1 0.90194 0.053728
x2 -0.48226 -0.15668

B=
u1 u2
x1 0.0027808 0.0010754
x2 0.0085512 -0.0001901

C=
x1 x2
y1 41.407 -2.2392

D=
u1 u2
y1 0 0

Dr. V.M. Becerra 16

Das könnte Ihnen auch gefallen