
Identification and Estimation

- identification of the model structure (find a suitable class of models)
- design of experiment; selecting input and output signals
- parameter estimation (estimation of the parameter values θ in the chosen model)
- model validation

[Block diagram: the input u drives both the process (output y) and the model with parameter estimate θ̂ (output ym); the error y − ym is fed back to adjust θ̂.]

The parameters are estimated by minimising the output-error criterion

min_{θ̂} J(θ, θ̂) = min_{θ̂} ∫_0^T [ y(t, θ) − ŷ(t, θ̂) ]^2 dt
Identification (cont.)

- Model structures:
  - regression models
  - general (SISO) models
  - state models
  - "black-box" models (e.g. impulse response models such as the residence time distribution, neural net models; any input-output model can be considered in this class)
Identification (cont.)

- Input signal:
  - The estimation result depends crucially on the characteristics of the input signal
    - convergence of the estimate
    - the signal must be rich enough to excite the dynamics ("persistently exciting")
  - If the model structure is too simple, changes in the output are explained by parameter variations, which is undesirable. A model that is too complex does not usually improve the input-output prediction much.
Least Squares Estimation

The model

y(t) = ϕ_1(t) θ_1 + ϕ_2(t) θ_2 + ⋯ + ϕ_n(t) θ_n = ϕ(t)^T θ

is linear in the parameters. The estimation problem can then be
formulated as an optimisation problem, which is analytically
solvable.

ϕ(t) = [ϕ_1(t)  ϕ_2(t)  ⋯  ϕ_n(t)]^T   (n×1, regressors)
θ = [θ_1  θ_2  ⋯  θ_n]^T   (n×1, parameters)
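
For example, the first-order difference-equation model used later in the
closed-loop section,

y(t) = a y(t−1) + b u(t−1) + e(t)

is linear in the parameters with ϕ(t) = [y(t−1)  u(t−1)]^T and θ = [a  b]^T;
the noise e(t) plays the role of the equation error.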
Least Squares (cont.)

The process experiment gives the observations

{ ( y(i), ϕ(i) ),  i = 1, 2, …, t }

Use the following notation:

Y(t) = [y(1)  y(2)  ⋯  y(t)]^T

Residual = estimation error:

E(t) = [ε(1)  ε(2)  ⋯  ε(t)]^T,   ε(i) = y(i) − ŷ(i) = y(i) − ϕ^T(i) θ
Least Squares (cont.)

Φ(t) = [ϕ^T(1); ϕ^T(2); … ; ϕ^T(t)]   (rows ϕ^T(i))

P(t) = [Φ^T(t) Φ(t)]^{-1} = [ ∑_{i=1}^{t} ϕ(i) ϕ^T(i) ]^{-1}

Gauss: "minimize the sum of squares of the estimation error"

V(θ, t) = (1/2) ∑_{i=1}^{t} ε^2(i) = (1/2) ∑_{i=1}^{t} [ y(i) − ϕ^T(i) θ ]^2 = (1/2) E^T E = (1/2) ‖E‖^2
Least Squares (cont.)

in which

E = Y − Ŷ = Y − Φθ

Solution:

2 V(θ, t) = E^T E = (Y − Φθ)^T (Y − Φθ)
          = Y^T Y − Y^T Φθ − θ^T Φ^T Y + θ^T Φ^T Φ θ = V_1(θ, t)

But θ^T Φ^T Y = (θ^T Φ^T Y)^T = Y^T Φθ   (a scalar)
Least Squares (cont.)

Note that for a square matrix A and a vector x

(∂/∂x)(Ax) = A,   (∂/∂x)(x^T A x) = x^T (A + A^T)

in which the gradient with respect to x is considered to be
a row vector.

Now, search for the minimum:

∂V_1(θ, t)/∂θ = −Y^T Φ − Y^T Φ + θ^T (Φ^T Φ + Φ^T Φ) = −2 Y^T Φ + 2 θ^T Φ^T Φ = 0
Least Squares (cont.)

which gives   θ^T Φ^T Φ = Y^T Φ

and by taking the transpose

Φ^T Φ θ = Φ^T Y   (normal equations)

If Φ^T Φ is non-singular, a unique solution exists. It is

θ = θ̂ = (Φ^T Φ)^{-1} Φ^T Y
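
As a minimal sketch of the batch solution (NumPy, with simulated data;
all signal values and names below are illustrative, not from the lecture):

```python
import numpy as np

# Batch least squares via the normal equations for the ARX example
# y(t) = a*y(t-1) + b*u(t-1) + e(t), phi(t) = [y(t-1), u(t-1)]^T.
rng = np.random.default_rng(0)
a_true, b_true = 0.8, 0.5
N = 200
u = rng.normal(size=N)                      # persistently exciting input
y = np.zeros(N)
for t in range(1, N):
    y[t] = a_true * y[t - 1] + b_true * u[t - 1] + 0.05 * rng.normal()

Phi = np.column_stack([y[:-1], u[:-1]])     # rows phi^T(t)
Y = y[1:]

# Normal equations: (Phi^T Phi) theta = Phi^T Y
theta_hat = np.linalg.solve(Phi.T @ Phi, Phi.T @ Y)
print("estimate of [a, b]:", theta_hat)     # close to [0.8, 0.5]
```

In practice np.linalg.lstsq(Phi, Y, rcond=None) is numerically preferable
to forming Φ^T Φ explicitly.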
Least Squares (cont.)

The solution is a minimum, because the Hessian matrix

∂^2 V_1(θ, t)/∂θ^2 = 2 Φ^T Φ

is always positive semidefinite (and positive definite when Φ^T Φ is
non-singular).

(Note: a square matrix A is positive definite if x^T A x > 0 for all
non-zero vectors x, and positive semidefinite if x^T A x ≥ 0.
Negative (semi)definiteness is defined accordingly. A symmetric
matrix A is positive definite iff all its eigenvalues are positive.)
Least Squares (cont.)

Note that the solution can be written in the form

θ̂(t) = [ ∑_{i=1}^{t} ϕ(i) ϕ^T(i) ]^{-1} ∑_{i=1}^{t} ϕ(i) y(i) = P(t) ∑_{i=1}^{t} ϕ(i) y(i)

The condition that Φ^T Φ is non-singular is called the
excitation condition. Note that the dimension of the matrix
is n × n, in which n is the number of parameters to be
estimated.
Least Squares (cont.)

Exercise: Prove that if A is a real n × p matrix and x a
p-dimensional column vector, then

A^T A x = 0  ⇔  A x = 0

Prove further that if rank(A) = p, then A^T A is non-singular.
Prove that A^T A is non-singular iff the columns of A are
linearly independent.
Recursive Least Squares (RLS)

In on-line identification the algorithms must run continuously


as new measurement data is flowing in. Two points are of
interest:
- how to develop a recursive form of the least squares
estimation algorithm?
- how to give more weight to the ”new” data?
Let us first try to write the least squares algorithm in a
recursive form.
Recursive Least Squares (cont.)

P(t) = [Φ^T(t) Φ(t)]^{-1} = [ ∑_{i=1}^{t} ϕ(i) ϕ^T(i) ]^{-1}

which gives easily

P(t)^{-1} = P(t−1)^{-1} + ϕ(t) ϕ^T(t)

and

θ̂(t) = P(t) ∑_{i=1}^{t} ϕ(i) y(i) = P(t) [ ∑_{i=1}^{t−1} ϕ(i) y(i) + ϕ(t) y(t) ]
Recursive Least Squares (cont.)

Using the formula of the estimate and then the expression
of P(t)^{-1} gives

∑_{i=1}^{t−1} ϕ(i) y(i) = P(t−1)^{-1} θ̂(t−1) = P(t)^{-1} θ̂(t−1) − ϕ(t) ϕ^T(t) θ̂(t−1)

and

θ̂(t) = θ̂(t−1) − P(t) ϕ(t) ϕ^T(t) θ̂(t−1) + P(t) ϕ(t) y(t)
     = θ̂(t−1) + P(t) ϕ(t) [ y(t) − ϕ^T(t) θ̂(t−1) ]
     = θ̂(t−1) + K(t) ε(t)
Recursive Least Squares (cont.)

where

K(t) = P(t) ϕ(t)
ε(t) = y(t) − ϕ^T(t) θ̂(t−1)

The residual ε(t) can be interpreted as the prediction error
of y(t) (one-step predictor), based on the old estimate θ̂(t−1).
Recursive Least Squares (cont.)

The matrix inversion lemma: Let A, C and C^{-1} + D A^{-1} B be
non-singular matrices of appropriate dimensions. Then
A + BCD is non-singular and

(A + BCD)^{-1} = A^{-1} − A^{-1} B (C^{-1} + D A^{-1} B)^{-1} D A^{-1}

Proof: Multiplying by A + BCD from the left gives


Recursive Least Squares (cont.)

(A + BCD) [ A^{-1} − A^{-1} B (C^{-1} + D A^{-1} B)^{-1} D A^{-1} ]
  = I − B (C^{-1} + D A^{-1} B)^{-1} D A^{-1} + BCD A^{-1} − BCD A^{-1} B (C^{-1} + D A^{-1} B)^{-1} D A^{-1}
  = I + BCD A^{-1} − B (I + C D A^{-1} B) (C^{-1} + D A^{-1} B)^{-1} D A^{-1}
  = I + BCD A^{-1} − B C (C^{-1} + D A^{-1} B) (C^{-1} + D A^{-1} B)^{-1} D A^{-1}
  = I + BCD A^{-1} − BCD A^{-1}
  = I
Recursive Least Squares (cont.)

Apply the inversion lemma to

P(t) = [Φ^T(t) Φ(t)]^{-1} = [ P(t−1)^{-1} + ϕ(t) ϕ^T(t) ]^{-1}

which gives

P(t) = P(t−1) − P(t−1) ϕ(t) [ I + ϕ^T(t) P(t−1) ϕ(t) ]^{-1} ϕ^T(t) P(t−1)

Note that I = 1 above (a scalar).
Recursive Least Squares (cont.)

It follows that

K(t) = P(t) ϕ(t) = P(t−1) ϕ(t) [ 1 − ϕ^T(t) P(t−1) ϕ(t) / (1 + ϕ^T(t) P(t−1) ϕ(t)) ]
     = P(t−1) ϕ(t) / (1 + ϕ^T(t) P(t−1) ϕ(t))

Collecting the results together gives the desired RLS
algorithm.
Recursive Least Squares (cont.)

θ̂(t) = θ̂(t−1) + K(t) [ y(t) − ϕ^T(t) θ̂(t−1) ]                          (n×1)

K(t) = P(t) ϕ(t) = P(t−1) ϕ(t) / (1 + ϕ^T(t) P(t−1) ϕ(t))               (n×1)

P(t) = P(t−1) − P(t−1) ϕ(t) ϕ^T(t) P(t−1) / (1 + ϕ^T(t) P(t−1) ϕ(t))
     = [ I − K(t) ϕ^T(t) ] P(t−1)                                        (n×n)

K(t) determines how to correct the previous estimate
based on the new measurement data.
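
A minimal NumPy sketch of these update equations (the function name and
initial values are illustrative; the lam argument anticipates the
forgetting factor introduced on the later slides, lam = 1 reproducing
the algorithm above):

```python
import numpy as np

def rls_update(theta, P, phi, y, lam=1.0):
    """One RLS step; lam = 1.0 gives the basic algorithm,
    lam < 1 adds exponential forgetting."""
    phi = phi.reshape(-1, 1)                   # regressor as column vector
    denom = lam + (phi.T @ P @ phi).item()
    K = P @ phi / denom                        # gain K(t)
    eps = y - (phi.T @ theta).item()           # prediction error
    theta = theta + K * eps                    # estimate update
    P = (P - K @ phi.T @ P) / lam              # covariance update
    return theta, P

# Initialisation with a "large" P0 (see the slides below):
n = 2
theta = np.zeros((n, 1))
P = 1e4 * np.eye(n)
# For each new sample: theta, P = rls_update(theta, P, phi_t, y_t)
```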
Recursive Least Squares (cont.)

RLS can be interpreted as the optimal state estimator


(Kalman filter) to the system

θ(t+1) = θ(t)
y(t) = ϕ^T(t) θ(t) + e(t)

where e is white noise. The RLS can be interpreted in


the stochastic framework. Also, in a geometric sense, it
is a consequence of the projection theorem.
Recursive Least Squares (cont.)

Note that P(t) is defined only when Φ^T(t) Φ(t) is
non-singular. Choose the initial value at a time instant t_0
when this holds:

P(t_0) = [Φ^T(t_0) Φ(t_0)]^{-1}
θ̂(t_0) = P(t_0) Φ^T(t_0) Y(t_0)

and use the recursion for t > t_0.
Recursive Least Squares (cont.)

Choose a positive definite matrix P_0 and

P(0) = P_0
P(t) = [ P_0^{-1} + Φ^T(t) Φ(t) ]^{-1}

Choose P_0 "large". Interpretation in the stochastic
setting: set the covariance of the parameter estimates
large in the beginning.
Recursive Least Squares (cont.)

The second important question was: how to give more weight to
new data and forget the old history. For example, if the
parameters change, the old values cause problems as time
goes by.
Solution: use a forgetting factor. The parameter values
are assumed to change slowly with respect to the dynamics
of the estimator. If the estimator is tuned to be too
fast, this can cause severe robustness problems;
the estimator may exhibit oscillations.
Recursive Least Squares (cont.)

New cost function:

V(θ, t) = (1/2) ∑_{i=1}^{t} λ^{t−i} [ y(i) − ϕ^T(i) θ ]^2

where the forgetting factor λ is between 0 and 1; usually
λ ∈ [0.95, 1].
Recursive Least Squares (cont.)

The new RLS estimator becomes

θ̂(t) = θ̂(t−1) + K(t) [ y(t) − ϕ^T(t) θ̂(t−1) ]

K(t) = P(t) ϕ(t) = P(t−1) ϕ(t) / (λ + ϕ^T(t) P(t−1) ϕ(t))

P(t) = [ I − K(t) ϕ^T(t) ] P(t−1) / λ

Problem: if ϕ(t) = 0 (no excitation), then K(t) = P(t) ϕ(t) = 0 and
P(t) = P(t−1)/λ keeps growing. This is called estimator windup.
There are methods, e.g. constant-trace algorithms, to deal with
the problem.
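
Continuing the earlier sketch (it assumes the illustrative rls_update
helper defined above), a forgetting factor lets the estimator track a
parameter that changes during the experiment:

```python
import numpy as np

# Assumes rls_update from the earlier sketch is available.
rng = np.random.default_rng(1)
theta_hat = np.zeros((2, 1))
P = 1e4 * np.eye(2)
lam = 0.98                                  # forgetting factor in [0.95, 1]

b = 0.5
y_prev, u_prev = 0.0, 0.0
for t in range(1000):
    a = 0.8 if t < 500 else 0.6             # parameter changes halfway
    u = rng.normal()                        # persistently exciting input
    y = a * y_prev + b * u_prev + 0.05 * rng.normal()
    phi = np.array([y_prev, u_prev])        # [y(t-1), u(t-1)]
    theta_hat, P = rls_update(theta_hat, P, phi, y, lam=lam)
    y_prev, u_prev = y, u

print("final estimate [a, b]:", theta_hat.ravel())   # tracks [0.6, 0.5]
```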
Identification in Closed Loop

Consider a forward-path system

y(t) = a y(t−1) + b u(t−1) + e(t)

with the proportional regulator

u(t) = g y(t)
Identification in Closed Loop

[Block diagram: the input u(t) drives the plant b z^{-1} / (1 − a z^{-1});
the noise e(t) is filtered by 1 / (1 − a z^{-1}) and added to produce the
output y(t).]

By substituting the controller equation into the system
equation we obtain either one of the following equations:

y(t) = (a + g b) y(t−1) + e(t)
y(t) = (a/g + b) u(t−1) + e(t)

i.  It is not possible to identify both parameters a and b;
    we can only estimate a + g b.
ii. There are two equivalent low-order models driven by the
    same white noise.
Identifiability can be improved by:

i.   adding an independent signal ("dither") into the
     feedback loop (illustrated in the sketch below),
ii.  adding delay in the feedback,
iii. using a time-variable or non-linear feedback.
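
A small simulation sketch of point i (all numerical values are made up):
with pure proportional feedback the regressors y(t−1) and u(t−1) are
proportional, so Φ^T Φ is singular and only a + gb can be recovered;
adding dither restores identifiability.

```python
import numpy as np

rng = np.random.default_rng(2)
a, b, g = 0.8, 0.5, -0.4
N = 500

def simulate(dither_std):
    y = np.zeros(N); u = np.zeros(N)
    for t in range(1, N):
        y[t] = a * y[t - 1] + b * u[t - 1] + 0.05 * rng.normal()
        u[t] = g * y[t] + dither_std * rng.normal()   # feedback (+ dither)
    return np.column_stack([y[:-1], u[:-1]]), y[1:]

for std in (0.0, 0.2):
    Phi, Y = simulate(std)
    # Without dither the Gram matrix is (numerically) singular.
    print("dither std", std, "cond(Phi^T Phi) =", np.linalg.cond(Phi.T @ Phi))
    theta, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
    print("  least squares estimate [a, b]:", theta)
```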


Simplified algorithms

To save calculation effort, the updating of P can be
avoided by introducing simplified estimation algorithms.
These are based on different ideas, e.g. the geometrical
interpretation, and usually lead to slower convergence rates.
Example: Kaczmarz's algorithm:

θ̂(t) = θ̂(t−1) + [ γ ϕ(t) / (ϕ^T(t) ϕ(t)) ] ( y(t) − ϕ^T(t) θ̂(t−1) )
Simplified algorithms

Avoiding the potential problem of division by zero leads to

θ̂(t) = θ̂(t−1) + [ γ ϕ(t) / (α + ϕ^T(t) ϕ(t)) ] ( y(t) − ϕ^T(t) θ̂(t−1) )

where α ≥ 0 and 0 < γ < 2.

In the stochastic framework the stochastic approximation
(SA) algorithm and its simplified version, the least mean
square (LMS) algorithm, are obtained.
Simplified algorithms

SA:

θ̂(t) = θ̂(t−1) + P(t) ϕ(t) ( y(t) − ϕ^T(t) θ̂(t−1) )

where P(t) = [ ∑_{i=1}^{t} ϕ^T(i) ϕ(i) ]^{-1} is a scalar.

LMS:

θ̂(t) = θ̂(t−1) + γ ϕ(t) ( y(t) − ϕ^T(t) θ̂(t−1) )

where γ is a constant.
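
A minimal sketch of the LMS update (illustrative names; compare with the
rls_update sketch above, which must also propagate P(t)):

```python
import numpy as np

def lms_update(theta, phi, y, gamma=0.01):
    """One LMS step: move the estimate along phi, scaled by the
    prediction error and a constant step size gamma."""
    phi = phi.reshape(-1, 1)
    eps = y - (phi.T @ theta).item()     # prediction error
    return theta + gamma * phi * eps     # no covariance update needed

# Usage: theta = lms_update(theta, phi_t, y_t) for each new sample;
# cheap per step, but convergence is slower than full RLS.
```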
Continuous-Time Models

The least squares algorithm can also be formulated and
solved in the case of continuous-time systems.

Model:      y(t) = ϕ^T(t) θ

Criterion:  V(θ) = ∫_0^t e^{−α(t−τ)} ( y(τ) − ϕ^T(τ) θ )^2 dτ

where α corresponds to the forgetting factor.

Continuous-Time Models

The solution is expressed by the normal equation

[ ∫_0^t e^{−α(t−τ)} ϕ(τ) ϕ^T(τ) dτ ] θ̂(t) = ∫_0^t e^{−α(t−τ)} ϕ(τ) y(τ) dτ

The estimate is unique if the matrix

R(t) = ∫_0^t e^{−α(t−τ)} ϕ(τ) ϕ^T(τ) dτ

is invertible.
Continuous-Time Models

The solution can be formulated as the algorithm

dθ̂(t)/dt = P(t) ϕ(t) e(t)

e(t) = y(t) − ϕ^T(t) θ̂(t)

dP(t)/dt = α P(t) − P(t) ϕ(t) ϕ^T(t) P(t)

where P(t) = R(t)^{-1}.
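
As a rough sketch, these differential equations could be integrated with a
simple forward-Euler step (the step size dt, the value of α and the signal
sources are placeholders):

```python
import numpy as np

def ct_rls_step(theta, P, phi, y, alpha, dt):
    """One forward-Euler step of the continuous-time estimator ODEs."""
    phi = phi.reshape(-1, 1)
    e = y - (phi.T @ theta).item()               # e(t) = y - phi^T theta_hat
    dtheta = P @ phi * e                         # d(theta_hat)/dt
    dP = alpha * P - P @ phi @ phi.T @ P         # dP/dt
    return theta + dt * dtheta, P + dt * dP

# Usage: at each sampling instant feed the current phi(t) and y(t):
# theta, P = ct_rls_step(theta, P, phi_t, y_t, alpha=0.1, dt=0.01)
```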
