
COVARIANCE AND GRAMIAN MATRICES

IN CONTROL AND SYSTEMS THEORY

KURUKULASURIYA VINCENZA FERNANDO

A thesis submitted for the degree of DOCTOR OF PHILOSOPHY

in the FACULTY OF ENGINEERING, the UNIVERSITY OF SHEFFIELD

September 1982

DEPARTMENT OF CONTROL ENGINEERING


I have fought a good fight, I have finished my course

2 Timothy 4.7

ACKNOWLEDGEMENTS

The author is grateful to his mentor, Professor H. Nicholson,

for his guidance. The author also wishes to acknowledge the

University of Sheffield for financial assistance in the form of

the Edgar Allen Scholarship and the Linley Scholarship for doctoral

research. The manuscript was skilfully prepared by

Mrs. Josephine Stubbs.



Covariance and Gramian matrices in control and systems theory

K. V. FERNANDO

SUMMARY

Covariance and Gramian matrices in control and systems

theory and pattern recognition are studied in the context of

reduction of dimensionality and hence complexity of large-scale

systems. This is achieved by the removal of redundant or

'almost' redundant information contained in the covariance

and Gramian matrices. The Karhunen-Loève expansion (principal

component analysis) and its extensions and the singular value

decomposition of matrices provide the framework for the work

presented in the thesis. The results given for linear dynamical

systems are based on controllability and observability Gramians

and some new developments in singular perturbational analysis

are also presented.



CONTENTS

Acknowledgements

Summary

Contents

Part 1  Exordium

Chapter 1  Exordium

Part 2  The Karhunen-Loève Expansion and Extensions

Chapter 2  The Karhunen-Loève expansion with reference to singular value decomposition and separation of variables

Chapter 3  The discrete double-sided Karhunen-Loève expansion

Chapter 4  Two-dimensional curve-fitting and prediction using spectral analysis

Part 3  Singular Perturbational Model-Order Reduction of Balanced Systems

Chapter 5  Singular perturbational model reduction of continuous-time balanced systems

Chapter 6  Singular perturbational approximations for discrete-time balanced systems

Chapter 7  On balanced model-order reduction of discrete-time systems and their continuous-time equivalents

Chapter 8  Reciprocal transformations in balanced model-order reduction

Part 4  The Cross-Gramian Matrix W_co in Linear Systems Theory

Chapter 9  On the structure of balanced and other principal representations of SISO systems

Chapter 10  Minimality of SISO linear systems

Chapter 11  On the Cauchy index of linear systems

Part 5  Measures for Quantification of Controllability, Observability and Input-Output Behaviour

Chapter 12  The degree of controllability due to individual inputs

Chapter 13  The coherence between system inputs and outputs

Chapter 14  On discrimination of inputs in multi-input systems

Part 6  Closure

Chapter 15  Closure

Part 7  Appendices

Appendix 1  The double-sided least-squares problem

Appendix 2  The double-sided least-squares problem with diagonal constraints

Appendix 3  Singular perturbational model reduction in the frequency domain

Appendix 4  On the applicability of Routh approximations and allied methods in model-order reduction

PART 1

Exordium

CHAPTER 1
Exordium

1. The prelude

The classical historian Edward Gibbon (1737-1794) noted in his

autobiography that independence is the first of the earthly blessings.

In an earthly subject such as mathematics, especially in linear

analysis, 'independence' is often the first concept which has to be

comprehended.

Independence in linear analysis (see any standard text on linear algebra or matrix theory, Mirsky^6, Gantmacher^7) can be defined as follows. If x^i, i = 1,...,m denotes a set of real n-vectors, then this set {x^i} is said to be linearly dependent if there are scalar values a_i, i = 1,...,m (not all zero) such that

    Σ_{i=1}^{m} a_i x^i = 0    (1)

In the contrary case, that is, if equation (1) implies that a_i = 0 for all i, then the set {x^i} is said to be linearly independent.

One of the important concepts associated with linear independence is rank. Let {x^(i)} denote all possible permutations of indexing of the vectors x^i, i = 1,...,m. The rank is then defined as the largest integer r such that, for some permutation,

    Σ_{i=1}^{r} a_i x^(i) ≠ 0

whenever the scalars a_i, i = 1,...,r, are not all zero. If r is equal to m, then the set {x^i} is said to be of full rank. If the null vector does not belong to this set, then

    1 ≤ r ≤ n

According to this definition, verification of linear independence requires an uncountably infinite number of tests covering all possible values of a_i, i = 1,...,m, which is obviously impractical. Fortunately, there is a simple test for verification of independence which is due to J. P. Gram (1850-1916). Gram's monumental contribution^1 was submitted in the year 1881 and was published in 1883, and thus we are in the midst of important centennial anniversaries.

2. The Gramian matrix


If the vector set {x^i} is considered as a set of column vectors, then a matrix of n,m dimensions can be formed as

    X = [ x^1 | x^2 | ... | x^m ]

The matrix product defined by G^2 = X^T X (X^T is the transpose of X) is known as the Gram matrix or the Gramian, and its determinant as the Gram determinant. The Gramian G^2 can also be written in the format

    (G^2)_{ij} = (x^i)^T x^j

The following may be considered as the fundamental result in linear

algebra.

Theorem: The Gram determinant is zero if and only if the set {x^i} is linearly dependent.

Interesting variations of this result are available in most texts on linear algebra and matrix theory. Since the determinant of a matrix is given by the product of its eigenvalues, linear independence is guaranteed if all the eigenvalues of the Gramian are non-zero. Equivalently, if the Gramian G^2 is positive definite, then the set {x^i} is linearly independent. The notation G^2 is used to show the non-negative definiteness of the Gramian, and it is used in this thesis whenever it is required to emphasise this property.
If the spectral expansion of the Gramian G^2 is written in the form

    G^2 = U D^2 U^T ,   U^T U = U U^T = I ,   D^2 = diag(d_1^2, ..., d_m^2)

where U is an orthonormal matrix, I is the identity matrix and D^2 is a diagonal matrix, then the Gram determinant is given by

    det G^2 = Π_{i=1}^{m} d_i^2

If the determinant is 'small' due to small eigenvalues, then this determinant is 'almost' zero, showing that the set {x^i}, although theoretically independent, is close to degenerating to a set of rank less than m. This theme is present throughout the thesis, where almost dependent subspaces are removed to simplify theoretical or computational problems.
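As a minimal numerical sketch of this test (assuming NumPy; the function name and the tolerance `tol` are illustrative choices, not part of the text), the Gram determinant and the eigenvalues of G^2 can be computed directly, with small eigenvalues flagging 'almost' dependent sets:

```python
import numpy as np

def gram_independence_test(X, tol=1e-10):
    """Test linear independence of the columns of X via the Gramian G^2 = X^T X."""
    G2 = X.T @ X                      # the Gramian (Gram matrix)
    eigvals = np.linalg.eigvalsh(G2)  # real and non-negative for a Gramian
    det = np.prod(eigvals)            # det G^2 = product of the eigenvalues
    return det, eigvals, eigvals.min() > tol

# Three vectors in R^3; the third is nearly a combination of the first two.
X = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1e-8]])
det, eigvals, independent = gram_independence_test(X)
print(det, eigvals, independent)      # tiny determinant: an 'almost' dependent set
```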

Similarly, as in the problem of determining the linear independence of column vectors, if {y^i} is a set of row vectors, then a matrix Y can be formed as

    Y = [ y^1 ]
        [ y^2 ]
        [ ... ]
        [ y^m ]

The Gram matrix in this case can be defined as

    Ḡ^2 = Y Y^T

This Gramian Ḡ^2 can be used in exposing the linear independence of the row vector set {y^i}. Gram matrices of the forms X^T X and X X^T appear in this thesis, especially in connection with the singular value decomposition of the matrix X (see section 4).

3. Covariance matrices

If M^2 is a positive definite matrix of n,n dimensions and a is an n-vector, then a function p(x) can be defined as

    p(x) = [1 / ((2π)^{n/2} det M)] exp(-½ (x-a)^T M^{-2} (x-a))    (2)

which has the property

    ∫ p(x) dx = 1    (3)

where dx is the infinitesimal element

    dx = dx_1 ... dx_n ,   x = (x_1, ..., x_n)^T

If x is a random vector with the probability density function p(x), then {x} is called a Gaussian (or normal) process^8.

The following first- and second-order moments can be easily verified:

    E{x} = a ,   E{(x-a)(x-a)^T} = M^2

where E{·} denotes the expectation operator. The vector a is called the 'average value' and the matrix M^2 the covariance matrix of the process.

If the spectral decomposition of the matrix M^2 is of the form

    M^2 = U D^2 U^T ,   U^T U = U U^T = I

then equation (2) can be written in the form

    p(y) = Π_{i=1}^{n} [1 / ((2π)^{1/2} d_i)] exp(-(y_i - β_i)^2 / 2d_i^2)

where

    y = (y_1, ..., y_n)^T = U^T x ,   β = (β_1, ..., β_n)^T = U^T a

If d_i^2 is almost zero, then

    [1 / ((2π)^{1/2} d_i)] exp(-(y_i - β_i)^2 / 2d_i^2) ≈ 0   except near y_i = β_i

Thus, as in the case of Gramian matrices, we may ignore y_i due to 'statistical' dependence. Such removal of dependent data is a recurrent theme in this thesis. We also observe that the validity of equation (3) can be verified using this canonical transformation y = U^T x.

If there are a large number of observations of the process, then the first- and second-order moments can be estimated in the following manner:

    a = lim_{m→∞} (1/m) Σ_{i=1}^{m} x^i

    M^2 = lim_{m→∞} (1/m) Σ_{i=1}^{m} (x^i - a)(x^i - a)^T

We observe that the covariance matrix M^2 is a Gramian matrix formed by the infinite set {x^i - a}.
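A short sketch of these estimates (assuming NumPy; the sample size and the particular mean and covariance are illustrative) shows the sample covariance arising literally as a Gramian of the centred observations:

```python
import numpy as np

rng = np.random.default_rng(0)

# m observations x^i of an n-dimensional Gaussian process
m = 10_000
a_true = np.array([1.0, -2.0, 0.5])
M2_true = np.array([[2.0, 0.3, 0.0],
                    [0.3, 1.0, 0.1],
                    [0.0, 0.1, 0.5]])
X = rng.multivariate_normal(a_true, M2_true, size=m)

a_hat = X.mean(axis=0)            # estimate of the first-order moment a
D = X - a_hat                     # the centred set {x^i - a}
M2_hat = (D.T @ D) / m            # covariance estimate as a Gramian of D

print(np.round(a_hat, 2))
print(np.round(M2_hat, 2))        # approaches M2_true as m grows
```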


4. The singular value decomposition

We have already indicated the reduction of the quadratic form x^T M^{-2} x into its canonical form. A general quadratic form x^T P x, where P is a symmetric matrix, can be simplified using the spectral decomposition

    P = U D U^T ,   U^T U = U U^T = I

Thus,

    x^T P x = (U^T x)^T D (U^T x) = Σ_{i=1}^{n} d_i y_i^2

where y = U^T x. If the rank of the matrix P is r, then n-r of the diagonal values of D will be zero.

A natural extension of the above simplification is concerned with the bilinear form x^T Q w, where x and w are n- and m-dimensional vectors respectively, and Q is an n,m dimensional matrix.

For the special case m = n, the problem was solved independently by three celebrities in the theory of matrices, namely Beltrami^3 (1873), Jordan^4 (1874) and Sylvester^2 (1889).

If the spectral decompositions of the Gramian matrices (which share the same eigenvalue set D^2) are written in the format

    Q Q^T = U D^2 U^T ,   Q^T Q = V D^2 V^T ,   U^T U = V^T V = I

then the singular value decomposition of Q is given by

    Q = U D V^T

where the diagonal matrix D can be chosen to have positive diagonal values (called the singular values).

The bilinear form x^T Q w in its canonical form is given by

    x^T Q w = (U^T x)^T D (V^T w) = Σ_{i=1}^{n} d_i y_i z_i

where y = U^T x and z = V^T w.

However, the discovery of the singular value decomposition should be attributed to Jacobi^5, one of the founders of matrix theory. In the year 1832, that is exactly 150 years ago, Jacobi derived this decomposition for the special case n = 3, which was used in the simplification of a double integral via the canonical form described earlier. We observe that the generalization of the singular value decomposition from the 3-dimensional to the n-dimensional case is obvious to us, although at the time of Jacobi higher-dimensional spaces above 3 were not generally considered as physically meaningful.

The generalization of the decomposition for rectangular matrices is due to Eckart and Young^9 (1939). For the rectangular matrix Θ of rank r, the decomposition is given by^{11}

    Θ = U D V^T

where

    [U|Ũ]^T [U|Ũ] = I_n ,   [V|Ṽ]^T [V|Ṽ] = I_m ,   U^T U = V^T V = I_r

    Θ Θ^T = [U|Ũ] [ D^2  0 ] [U|Ũ]^T = U D^2 U^T
                  [ 0    0 ]

    Θ^T Θ = [V|Ṽ] [ D^2  0 ] [V|Ṽ]^T = V D^2 V^T
                  [ 0    0 ]

where D is a diagonal r,r dimensional matrix.
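These relations are easy to verify numerically; a brief sketch (assuming NumPy, with a random rectangular matrix standing in for Θ) checks the shared eigenvalue set D^2 of the two Gramians and the canonical form of the bilinear form:

```python
import numpy as np

rng = np.random.default_rng(1)
Theta = rng.standard_normal((4, 3))       # rectangular, full rank r = 3

U, d, Vt = np.linalg.svd(Theta, full_matrices=False)
D2 = np.diag(d**2)

# Both Gramians share the eigenvalue set D^2
print(np.allclose(Theta @ Theta.T, U @ D2 @ U.T))    # True
print(np.allclose(Theta.T @ Theta, Vt.T @ D2 @ Vt))  # True

# Canonical form of the bilinear form x^T Theta w
x, w = rng.standard_normal(4), rng.standard_normal(3)
y, z = U.T @ x, Vt @ w
print(np.isclose(x @ Theta @ w, np.sum(d * y * z)))  # True
```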


5. Computation of Gramians and the singular value decomposition

As shown in the previous section, the information present in the Gramian matrices Θ^T Θ and Θ Θ^T is explicitly present in the decomposition Θ = U D V^T. Furthermore, the natural method of computation of the singular value decomposition is to form the Gramian matrices Θ^T Θ and Θ Θ^T and then to compute the spectral decompositions of those matrices. However, this approach has certain pitfalls if finite word length arithmetic is used in the computation, as the following example indicates.

Let Θ be the full rank matrix

    Θ = [ 1.0   ε ]
        [ 1.0   0 ]

where ε is a small real number such that the floating point arithmetic unit can distinguish between 1.0 and ε. That is,

    1.0 ≠ fl(1.0 + ε)

where fl(·) denotes floating point operations. However, we assume that the floating point arithmetic is 'blind' to values of ε^2:

    1.0 = fl(1.0 + ε^2)

The Gramian matrix Θ Θ^T is of the form

    Θ Θ^T = [ 1.0 + ε^2   1.0 ]
            [ 1.0         1.0 ]

Thus, the Gramian Θ Θ^T is internally represented in the computer as the rank one matrix

    [ 1.0   1.0 ]
    [ 1.0   1.0 ]
if the above described floating point arithmetic unit is used, implying that the matrix Θ is not full rank.

The above example clearly indicates that formation of Gramian matrices should be avoided in rank determination and in the computation of the singular value decomposition. Fortunately, there is a computational scheme for the decomposition which does not require the formation of the Gramian matrices, which is due to Golub et al^{10} (1970) and which is based on an extended QR algorithm. Thus, the linear independence/dependence and the associated rank can be determined using the singular value decomposition without formation of the Gramian matrices.
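The pitfall is simple to reproduce; a short sketch (assuming NumPy and IEEE double precision, with ε = 10^{-8} chosen so that ε^2 underflows against 1.0):

```python
import numpy as np

eps = 1e-8                            # eps**2 = 1e-16 is lost against 1.0 in float64
Theta = np.array([[1.0, eps],
                  [1.0, 0.0]])

# Rank via the Gramian: forming 1.0 + eps**2 destroys the information
G = Theta @ Theta.T
print(np.linalg.matrix_rank(G))       # 1 -- the Gramian appears singular

# Rank via the SVD of Theta itself (no Gramian is formed)
print(np.linalg.matrix_rank(Theta))   # 2 -- full rank is detected
print(np.linalg.svd(Theta, compute_uv=False))
```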

6. The objectives and the organization of the thesis

The main aim of this thesis is to study the appearance of

Gramian and canonical matrices in control and systems theory including

pattern recognition and signal processing. More specifically,

linearly dependent or 'almost' linearly dependent subspaces are

removed so that more attention can be given to the more 'robust'

linearly independent subspaces. Such removal of dependent data

reduces the dimensionality and hence the complexity of such systems

pertaining to that data. Thus data reduction is paramount in the

analysis of large-scale systems.

The thesis is in seven parts and this introductory chapter

forms Part 1.

Part 2, which consists of Chapters 2, 3 and 4, is about the Karhunen-Loève expansion/transform, which is fundamental in the

analysis of random processes. In Chapter 2, we study the



relationships between this expansion, the singular value

decomposition and the technique of separation of variables.

In Chapter 3, 2-dimensional data reduction is investigated in

the context of the singular value decomposition. Finally, in

Chapter 4, extrapolation or prediction of 'future' data using a random expansion which is structurally similar to the Karhunen-Loève expansion is described.

Part 3 is concerned with model-order reduction of linear dynamical systems, which is becoming increasingly important in the context of large-scale systems theory. The traditional methods of model-order reduction are based on modal methods, where the slow-time and the fast-time behaviour, as characterized by the 'small' and 'large' eigenvalues of the system matrix respectively, are the criteria for order reduction. The more modern

approach is to delete the least controllable and the least observable

parts of the system. This is achieved by means of 'balancing'

transformations which transform the controllability and observability

Gramians into their canonical diagonal forms. The theme in this

part is to harmonize the singular perturbational approach with that

of the balanced reduction. In Chapters 5 and 6, respectively,

continuous-time and discrete-time systems are studied. In

Chapter 7, the inter-relationships between continuous-time and

discrete-time model-order reduction are investigated through the

Cayley transformation. Finally, in Chapter 8, the combined

singular perturbational balanced method is exposed in the context

of reciprocal transformations.

Part 4 of the thesis, which encompasses Chapters 9, 10 and 11, describes a new Gramian matrix, called the cross-Gramian, which has properties of a cross-covariance matrix. This matrix, denoted by W_co, contains information pertaining to both controllability and observability of linear single-input single-output systems. In Chapter 9, this matrix is studied in relation to balanced and other principal representations of linear systems. The minimality or joint controllability/observability is the subject of Chapter 10. In Chapter 11, we demonstrate that the matrix W_co contains information about the Cauchy index of the system.

Part 5 is concerned with the quantification of input and output

behaviour of linear systems based on properties of the Gramian

matrix. In Chapter 12, the use of a Mahalanobis distance measure

is discussed in relation to the degree of controllability. Due to

the well known duality between controllability and observability,

this naturally extends to observability as well. In Chapter 13,

measures for describing inter-relationships between inputs and

outputs are proposed. In Chapter 14, which is the final Chapter

in this part, a method based on the Mahalanobis distance is

described which can be used to discriminate system inputs or

outputs.

The concluding Chapter 15 forms Part 6 of the thesis and Part 7

contains the appendices.



7. Originality of the research and style of presentation

The research work presented in this thesis is completely original. All the Chapters and the four appendices are best described as 'essays', since they can be read and understood almost independently of the other essays.

This style is considered as the best way to present the broad

range of research work undertaken in this thesis which ranges

from control and systems theory to image processing, pattern

recognition, time series analysis and signal processing.

The material in Chapters 2 to 6 and 9 to 13 and Appendices 1 to 3 has been published or accepted for publication in the journals of the Institution of Electrical Engineers (IEE), London, or the Institute of Electrical and Electronics Engineers (IEEE), New York. Chapters 7, 8 and 14 and Appendix 4 have been submitted for possible publication.



References

1. Gram, J.P.: 'Ueber die Entwickelung reeller Functionen in Reihen mittelst der Methode der kleinsten Quadrate', Journal für die reine und angewandte Mathematik, 1883, 94, pp.41-73.

2. Sylvester, J.J.: 'Sur la réduction biorthogonale d'une forme linéo-linéaire à sa forme canonique', Comptes Rendus Acad. Sci. Paris, 1889, 108, pp.651-653.

3. Beltrami, E.: 'Sulle funzioni bilineari', Giornale di Mat. Battaglini, 1873, 11, pp.98-106.

4. Jordan, C.: 'Mémoire sur les formes bilinéaires', Journal de Mathématiques Pures et Appliquées, 1874, series 2, 19, pp.35-54.

5. Jacobi, C.G.J.: 'De transformatione integralis duplicis indefiniti in formam simpliciorem', Crelle's Journal (Journal für die reine und angewandte Mathematik), 1832, 8, pp.253-279 and 321-357.

6. Mirsky, L.: 'An Introduction to Linear Algebra', Clarendon Press, Oxford, 1955.

7. Gantmacher, F.R.: 'The Theory of Matrices', Vol.1, Chelsea, New York, 1959.

8. Miller, K.S.: 'An Introduction to Vector Stochastic Processes', Krieger, New York, 1980.

9. Eckart, C., and Young, G.: 'A principal axis transformation for non-Hermitian matrices', Bull. Amer. Math. Soc., 1939, 45, pp.118-121.

10. Golub, G.H., and Reinsch, C.: 'Singular value decomposition and least-squares solutions', Numer. Math., 1970, 14, pp.403-420.

11. Klema, V.C., and Laub, A.J.: 'The singular value decomposition: its computation and some applications', IEEE Trans. Automatic Control, 1980, AC-25, (2), pp.164-176.


PART 2

The Karhunen-Loève Expansion and Extensions
CHAPTER 2

The Karhunen-Loève expansion with reference to singular
value decomposition and separation of variables

Abstract The Karhunen-Loève expansion for random processes, the method of principal component analysis, the singular value decomposition of rectangular matrices and the method of separation of variables used in mathematical physics and functional analysis are shown to possess the same basic structure, based on orthonormal basis functions or vectors and associated eigenproblems.

1. Introduction The Karhunen-Loève expansion is one of the fundamental expansions used for describing random processes, and has been used widely in control, estimation and information theory and also in image processing and pattern recognition. The continuous expansion is based on orthogonal functions derived from eigenfunction solutions of covariance functions.

The continuous form of the expansion is well known and the associated optimal properties can be found in texts on probability and communication theory and pattern recognition^{1-9}. However, it is difficult to obtain numerical solutions since it involves eigenfunction problems defined by Fredholm integrals.
eigenfunction problems defined by Fredho1m integrals.

The discrete form of the expansion leads to matrix eigenva1ue


. 8-11 23
problems which are well suited for digital computatlon '

The extension from the continuous to the discrete case has been
- 18 -

motivated by these numerical considerations and, unfortunately. the

full algebraic properties of the expansion have not been fully

investigated or utilized in the published literature. The discrete

case is also equivalent to the method of principal component analysis

used in mathematical statistics, which has had extensive applications


. . l'
1n the SOC1a SC1ences 12-14,23 •

The aim of this Chapter is to show that the discrete Karhunen-

Lo~ve expansion is algebraically equivalent to the singular value


. 15 16
decomposition of a rectangular matr1x ' • For the continuous

case, the expansion is equivalent to the classical technique of

separation of variables using orthonormal basis functions (Bernoulli's

separation method) which is a well recognized method of solution of


. 17 24
partial differential equations in phys1cs ' • A more formal

approach can be based on approximation theory, spectral theory and

generalized functions in a Hilbert space setting.

2. The singular value decomposition of a rectangular matrix^{15,16}

The singular value decomposition of an m,n dimensional matrix X is given by

    X = U C V^T   or   x_{ij} = Σ_{k=1}^{r} c_k u_{ik} v_{jk}    (1)

where U and V are orthonormal matrices and C is a diagonal matrix. The rank of the matrix X is taken as r, and X ∈ M_{m,n}, C = diag(c_1, ..., c_r), c_k > 0, k = 1,...,r, U ∈ M_{m,r}, V ∈ M_{n,r}, r ≤ min(m,n). Also

    U^T U = I   or   Σ_{k=1}^{m} u_{ki} u_{kj} = δ_{ij}    (2)

    V^T V = I   or   Σ_{k=1}^{n} v_{ki} v_{kj} = δ_{ij}    (3)

where δ_{ij} denotes the Kronecker delta function.

The matrix U can be obtained as the eigenvector solution of the problem defined by

    S U = U C^2   or   Σ_{k=1}^{m} s_{ik} u_{k i_1} = u_{i i_1} c_{i_1}^2 ,   i_1 = 1,...,r    (4)

where

    S = X X^T   or   s_{ij} = Σ_{k=1}^{n} x_{ik} x_{jk}    (5)

The matrix V is similarly given by

    R V = V C^2    (6)

where

    R = X^T X   or   r_{jk} = Σ_{i=1}^{m} x_{ij} x_{ik}    (7)

The nonnegative matrices S and R can also be written in the dyadic format

    S = U C^2 U^T   or   s_{i i_1} = Σ_{k=1}^{r} c_k^2 u_{ik} u_{i_1 k}    (8)

    R = V C^2 V^T   or   r_{j j_1} = Σ_{k=1}^{r} c_k^2 v_{jk} v_{j_1 k}    (9)

3. The separation of variables of a function of two variables^{17,18,24,25}

The equivalent decomposition for a continuous function of two variables is given by the method of separation of variables. A function x(w,t) of two independent variables w and t can be represented by

    x(w,t) = Σ_{k=1}^{r} c_k u_k(w) v_k(t)    (1*)

where the sets {u_k} and {v_k} are orthonormal functions with

    ∫_W u_i(w) u_j(w) dw = δ_{ij}    (2*)

    ∫_T v_i(t) v_j(t) dt = δ_{ij}    (3*)

The analogies between eqns (1), (2) and (1*), (2*), etc are obvious, where the summation has been replaced by integration.

The orthonormal functions u_k(w) can be obtained from the eigenproblem defined by the Fredholm integral equation^{18,25}

    ∫_W s(w,w_1) u_k(w_1) dw_1 = c_k^2 u_k(w)    (4*)

where the kernel function s(w,w_1) is given by

    s(w,w_1) = ∫_T x(w,t) x(w_1,t) dt    (5*)

Similarly, for v_k(t),

    ∫_T r(t,t_1) v_k(t_1) dt_1 = c_k^2 v_k(t)    (6*)

where

    r(t,t_1) = ∫_W x(w,t) x(w,t_1) dw    (7*)

By Mercer's theorem, the kernel functions s(w,w_1) and r(t,t_1) can be written in the form

    s(w,w_1) = Σ_{k=1}^{r} c_k^2 u_k(w) u_k(w_1)    (8*)

    r(t,t_1) = Σ_{k=1}^{r} c_k^2 v_k(t) v_k(t_1)    (9*)

4. Karhunen-Loève expansions The continuous form of the Karhunen-Loève expansion is given by eqn (1*) if it is assumed that w is the probability space variable of a second-order random process. Integration with respect to the variable w in eqns (2*) and (7*) can then be replaced by the expectation operator E[·]. Thus

    E[u_i u_j] = δ_{ij}    (10*)

    r(t,t_1) = E[x(t) x(t_1)]    (11*)

and the kernel r can be identified as a covariance function. (Note - it is usual to suppress the variable w from x(w,t), u_k(w), etc.)

Similarly, for the discrete case, the Karhunen-Loève expansion is given by eqn 1, with i denoting the discrete probability variable of a random process or of the 'experiments'. Equations 2 and 7 can then be modified as

    E[U^T U] = I    (10)

    r_{jk} = m E[x_{ij} x_{ik}]   or   R = E[X^T X]    (11)

A dual set of results can be obtained by assuming t to be the probability space variable instead of w.

5. Conclusions The role of the singular value decomposition of rectangular matrices in random signal analysis and, particularly, the relation to the discrete Karhunen-Loève expansion has been highlighted. For the continuous case, the expansion is also equivalent to the method of separation of variables used in classical physics and functional analysis.

In most engineering and other problems involving numerical values and large samples, the expectation operator can be suppressed, and the Karhunen-Loève expansion and the singular value decomposition then become numerically identical.

In applications concerned, for example, with the forecasting of load data^{11}, the signal matrix X is formed from load data at day (row) i and at hour (column) j. It is usual to consider that the
(row) i and at hour (column) j. It is usual to consider that the

rows of the matrix are due to different 'experiments', and to obtain

the eigenvector matrix V which gives the modes of the system data

in the row direction. This matrix will contain information pertaining

to the variation of the load in a day due to industrial and domestic

peaks, etc at each hour of the day. However, if the data in each

column is taken as being due to different experiments, then the

matrix U will show the weekly pattern, indicating the reduced demands

at weekends. Hence, it is not always necessary to know which

subscript refers to the probability space or the 'experiments' in

practice.
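A small sketch of this dual reading (assuming NumPy; the synthetic 'load' matrix below is illustrative and not the data discussed above) shows both sets of modes arising from one decomposition:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical load-like data: 28 days (rows) x 24 hours (columns), built
# from a daily shape modulated by a weekly weekend effect, plus noise.
hours = np.arange(24)
daily = 1.0 + 0.5 * np.sin(2 * np.pi * (hours - 6) / 24)
weekly = np.array([1.0 if day % 7 < 5 else 0.7 for day in range(28)])
X = np.outer(weekly, daily) + 0.02 * rng.standard_normal((28, 24))

U, c, Vt = np.linalg.svd(X, full_matrices=False)

print(np.round(Vt[0], 2))    # dominant hour-direction mode: the daily pattern
print(np.round(U[:, 0], 2))  # dominant day-direction mode: the weekly pattern
```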

The problem can also be modified to include basis functions u_k(w) and v_k(t) which are orthonormal with respect to weighting functions p(w) and q(t), respectively. Then

    ∫_W u_i(w) u_j(w) p(w) dw = δ_{ij}

and

    ∫_T v_i(t) v_j(t) q(t) dt = δ_{ij}

If w is the probability space variable, then p(w) will correspond to the probability density. The results due to this extension are straightforward and hence omitted. For the discrete case, weighting matrices P and Q can be introduced accordingly.

The singular value decomposition has been used in model-order reduction of systems within the framework of principal component analysis^{19-22} (see Parts 3 and 4 for details).

The relationships between the various techniques are illustrated below. It is hoped that this exposition will further strengthen the links between different disciplines which can be studied within the general framework of systems theory.

                     continuous      discrete
    random           KLE             KLE/PCA
    deterministic    SOV             SVD

KLE  Karhunen-Loève expansion
SOV  Separation of variables
SVD  Singular value decomposition
PCA  Principal component analysis
6. References

1. Miller, K.S.: 'Complex stochastic processes', Addison-Wesley, Reading, MA, 1974.

2. Wong, E.: 'Stochastic processes in information and dynamical systems', McGraw-Hill, New York, 1971.

3. Kailath, T.: 'A view of three decades of linear filtering theory', IEEE Trans. Information Theory, 1974, IT-20, 2, pp.146-181. Also in 'Linear least-squares estimation', (Ed. Kailath, T.), Benchmark papers in elect. eng. & comp. sci., 1977, vol.17, Dowden, Hutchinson & Ross, Stroudsburg, Pennsylvania.

4. Kelly, E.J., and Root, W.L.: 'A representation of vector-valued random processes', J. of Mathematics & Physics, 1960, 39, pp.211-216.

5. Garcia, A.M.: 'On the smoothness of functions satisfying certain integral inequalities', in 'Functional analysis', (ed. Wilde, C.O.), Academic Press, 1970.

6. Davenport, W.B., and Root, W.L.: 'An introduction to the theory of random signals and noise', McGraw-Hill, New York, 1958.

7. Baggeroer, A.B.: 'State variables and communication theory', M.I.T. Press, Cambridge, MA, 1970.

8. Fukunaga, K.: 'Introduction to statistical pattern recognition', Academic Press, New York, 1972.

9. Fu, K.S.: 'Sequential methods in pattern recognition and machine learning', Academic Press, New York, 1968.

10. Ahmed, N., and Rao, K.R.: 'Orthogonal transforms for signal processing', Springer-Verlag, Berlin, 1975.

11. Nicholson, H.: 'Structure of interconnected systems', Peter Peregrinus (for IEE), London, 1978.

12. Glaser, E.M., and Ruchkin, D.S.: 'Principles of neurobiological signal analysis', Academic Press, New York, 1976.

13. Kshirsagar, A.M.: 'Multivariate analysis', Marcel Dekker, New York, 1972.

14. Hotelling, H.: 'Analysis of a complex of statistical variables into principal components', J. Educational Psychology, 1933, 24, pp.417-441 and pp.498-520.

15. Forsythe, G., and Moler, C.B.: 'Computer solution of linear algebraic systems', Prentice-Hall, Englewood Cliffs, NJ, 1967.

16. Albert, A.: 'Regression and the Moore-Penrose pseudoinverse', Academic Press, New York, 1972.

17. Gould, S.H.: 'Variational methods for eigenvalue problems', Univ. of Toronto Press, Toronto, 1957.

18. Dieudonné, J.: 'Foundations of modern analysis', Academic Press, 1960.

19. Moore, B.C.: 'Principal component analysis in linear systems: controllability, observability, and model reduction', Univ. of Toronto, Dept. of Elect. Eng., Systems Control Reports 7801, 7802, January 1980.

20. Andrews, H.C., and Hunt, B.R.: 'Digital image restoration', Prentice-Hall, Englewood Cliffs, NJ, 1977.

21. Brier, G.W., and Meltesen, G.T.: 'Eigenvector analysis for prediction of time series', J. Applied Meteorology, 1976, 15, 12, pp.1307-1312.

22. Moore, B.C.: 'Singular value analysis of linear systems', Proc. Conf. on Decision & Control, January 1979, San Diego, CA, USA, pp.66-73.

23. Ozeki, K.: 'A coordinate-free theory of eigenvalue analysis related to the method of principal components and the Karhunen-Loève expansion', Information and Control, 1979, 42, pp.38-59.

24. Sagan, H.: 'Boundary and eigenvalue problems in mathematical physics', Wiley, New York, 1961.

25. Hille, E.: 'Methods in classical and functional analysis', Addison-Wesley, Reading, MA, 1972.


CHAPTER 3

The discrete double-sided Karhunen-Loève expansion

Abstract. A new expansion for the representation of random data - the double-sided Karhunen-Loève expansion - is hypothesized, with application for data analysis, contraction and prediction in two-dimensional processes.

1. Introduction. The Karhunen-Loève (K-L) expansion is one of the basic forms used for describing random signals and has had wide application in pattern recognition, feature selection, image processing, data compression and prediction^{1-7}. The expansion is formed using a set of orthonormal basis functions which can be obtained as a set of eigenvectors of a data covariance matrix, and optimal properties are associated with the expansion, which is closely related to the least-squares estimation problem^{8,23}. The truncated series minimizes the summated mean-square error and also the entropy function defined over the variance of the random coefficients of the expansion, from the information theoretic point of view^2.

The pattern recognition or feature selection problem can be concerned with identifying the modes or the energy spectrum of the process. These properties can then be modified to emphasize or restrict certain aspects which may be required for data compression. Thus, an image can be enhanced by altering the corresponding energy values, and non-dominant terms of the expansion attributed to noise can be filtered or suppressed^{9,10}. Data compression techniques also

have application in the analysis of biomedical data and in the

coding and transmission of picture signals. Such data contraction

can be considered as smoothing if the non-dominant neglected modes

of the expansion are due to high-frequency components. From a

statistical point of view and in geographical applications, this

is also known as trend surface analysis.

Prediction or forecasting of non-stationary random processes

which cannot be modelled exactly is essentially a problem of

extrapolation of past data using known patterns. The K-L expansion

has been used successfully for this problem, and particularly for

the forecasting of electrical power and water system demands, and

of air pollution and traffic flows^{11-13,26}.

A double-sided form of the K-L expansion is now developed for application within two-dimensional^{15,25} space/time coordinate systems, and is related to the double-sided least-squares problem^{14}. The double-sided form of the discrete K-L expansion is based on the singular value decomposition of matrices^{16,21}. In numerical problems with large samples, the expectation operator can be suppressed, and then the K-L expansion and the singular value decomposition technique are identical^{27}. The decomposition has been used in the study of glass properties^{28}, meteorology^{22} and image processing^{29-31}.

The expansion can be used for the analysis of spatially-correlated

patterns, for example in geographically-located data, and for time-

mapped data, which is becoming increasingly important in many fields

of study including engineering, econometrics, ecology, meteorology,

geology, planning and regional science.


2. The Karhunen-Loève expansion. The one-dimensional K-L expansion is concerned with the representation of mn data points obtained from m experiments each with n observations. The data are ordered as an m×n-dimensional random signal matrix of the form

    X = [ x_1(1)  ...  x_1(n) ]
        [   .            .    ]
        [ x_m(1)  ...  x_m(n) ]

In the electrical load prediction problem, for example, the element x_i(j) would represent the demand at time j hours (1..n) on day i (1..m).

The K-L expansion is now defined by the row-problem representation^8

    X = A V^T ,   X ∈ M_{m,n} ,  A ∈ M_{m,r} ,  V ∈ M_{n,r}

where A is a random coefficient matrix with expected value E[A] = 0. (Note - the notation M_{m,n} denotes a real matrix of dimension m×n, etc.) The matrix V represents a set of basis functions and contains the orthonormalized eigenvectors of the positive semi-definite covariance matrix R_1, defined as

    R_1 = E[X^T P_1 X] ,   R_1 ∈ M_{n,n} ,  P_1 ∈ M_{m,m}

where P_1 is the a priori probability matrix associated with the m experiments, with elements p_{ij}, 0 ≤ p_{ii} ≤ 1, p_{ij} = 0, i ≠ j. In the load prediction problem, the probabilities would be assigned to each row depending on whether it is representative of demand for that particular day, which may include, for example, the effects of a freak weather condition.

The system modes are then identified with the eigenvalue problem defined by

    R_1 V = V Λ_1 ,   Λ_1 ∈ M_{n,n}

where Λ_1 is the diagonal eigenvalue matrix which correlates the coefficient matrix A, with

    E[A^T P_1 A] = Λ_1

If the series is truncated to include only the first k eigenvectors (with the eigenvalues ordered in decreasing order of magnitude), then

    X̄ = Ā V̄^T ,   Ā ∈ M_{m,k} ,  V̄ ∈ M_{n,k}

where Ā denotes the truncated A-matrix with k columns, and X̄ ∈ M_{m,n} is the reconstructed data X-matrix obtained using the truncated series. The expansion contains the first k modes, with

    E[Ā^T P_1 Ā] = Λ̄_1

The error function is then given by

    J = trace E[(X - X̄)^T P_1 (X - X̄)]
      = trace Λ_1 - trace Λ̄_1

representing the sum of the omitted eigenvalues.

A similar problem could also be considered with m observations resulting from n experiments, with a priori probabilities assigned to each column. In the load prediction problem, the probabilities could emphasize the probable occurrence of demand at a particular hour of each day, say, for example, during periods of peak TV viewing. In this case, the K-L expansion could be represented by the column-problem format

    X = U B ,   U ∈ M_{m,r} ,  B ∈ M_{r,n}

where U contains the orthonormalized set of m vectors, or eigenvectors of the covariance matrix R_2, defined as

    R_2 = E[X P_2 X^T] ,   R_2 ∈ M_{m,m} ,  P_2 ∈ M_{n,n}

where P_2 is the a priori probability matrix associated with the n experiments. Then

    R_2 U = U Λ_2   and   E[B P_2 B^T] = Λ_2

where Λ_2 ∈ M_{m,m} is the diagonal eigenvalue matrix.
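A minimal sketch of the row-problem computation (assuming NumPy; the expectation is replaced by the weighted sample form, and the function and data below are illustrative):

```python
import numpy as np

def kl_row_expansion(X, P1, k):
    """Row-problem K-L sketch: X ~ A_hat V_hat^T with k dominant modes of R1."""
    R1 = X.T @ P1 @ X                       # weighted covariance matrix, n x n
    lam, V = np.linalg.eigh(R1)             # eigenvalues in ascending order
    idx = np.argsort(lam)[::-1][:k]         # keep the k largest eigenvalues
    V_hat = V[:, idx]
    A_hat = X @ V_hat                       # random coefficient matrix
    X_hat = A_hat @ V_hat.T                 # reconstruction from k modes
    J = np.trace((X - X_hat).T @ P1 @ (X - X_hat))
    return X_hat, lam[idx], J               # J equals the sum of omitted eigenvalues

rng = np.random.default_rng(3)
X = rng.standard_normal((12, 24))                 # 12 'days' x 24 'hours'
P1 = np.diag(np.linspace(0.45, 1.0, 12))          # emphasize later days
X_hat, modes, J = kl_row_expansion(X, P1, k=3)
print(modes, J)
```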

3. The double-sided K-L expansion. The double-sided K-L expansion can now be formulated to introduce the possibility of correlation between both the row and column data associated with either m or n experiments containing either n or m observations respectively. The expansion will define this dual behaviour inherently, based on the properties of the two covariance matrices R_1 and R_2 and the two spectra corresponding to row and column correlations respectively.

The expansion can be developed within a space(time)/space(time)- or space/time-coordinate framework. For example, in contrast to the time/time-coordinate load prediction problem, the measurement of river-water quality (such as biochemical oxygen demand (BOD), dissolved oxygen (DO), etc) could form a data matrix with spatial-coordinate rows and time-coordinate columns. For each independent physical cause, both spatial and temporal variations will be present, and these will be characterized by an eigenvalue and corresponding row and column eigenvectors.

3.1 Probability and weighting matrices and 'energy'. For the case with a priori probability matrices P_1 = I_m, P_2 = I_n, representing absolute certainty of the experiments, the covariance matrices will be given by

    R_1 = E[X^T X] = E[V Λ_1 V^T] ,   Λ_1 ∈ M_{n,n}
    R_2 = E[X X^T] = E[U Λ_2 U^T] ,   Λ_2 ∈ M_{m,m}

The eigenvalue matrices Λ_1 and Λ_2 differ only by the number of zero eigenvalues. If r is the rank of the matrix X (r ≤ min(m,n)), then the ranks of R_1 and R_2 will also be equal to r, and

    Λ_1 = diag(λ_1, ..., λ_r, 0, ..., 0)
    Λ_2 = diag(λ_1, ..., λ_r, 0, ..., 0)

If the non-zero eigenvalues are contained in the diagonal matrix Λ defined as

    Λ = diag(λ_1, ..., λ_r)

then

    Λ_1 = Λ ⊕ 0_{n-r,n-r} ,   Λ_2 = Λ ⊕ 0_{m-r,m-r}

where ⊕ denotes the direct sum and 0_{n-r,n-r} is the null matrix of order n-r.

The traces of the covariance matrices are given by

    trace R_1 = trace R_2 = E( Σ_{i=1}^{m} Σ_{j=1}^{n} x_{ij}^2 ) = trace E[Λ]

This is equal to the sum of the eigenvalues and is a measure of the total 'energy' content of the system.

For the probability matrices P_1 and P_2 considered as general positive definite weighting matrices, we have R_1 = E[X^T P_1 X] and R_2 = E[X P_2 X^T]. In general, the traces are not equal and can be considered as directional energies. The selection of the probability or weighting matrices P_1 and P_2 will be dictated by experience and the requirements of the problem. For example, in TV image processing, relative weighting could be used to increase the information content in the centre of the image compared to the edge regions. Also, row or horizontal scanning will preserve continuity in that direction and reduce the correlation in the vertical direction, due to the discrete nature of scanning and time delays, and the resulting effects could be de-emphasized by assigning appropriate weighting values to the matrices P_1 and P_2.

Note - If P_1 and P_2 are positive definite matrices, then rank R_1 = rank R_2 = rank X = r, and the number of non-zero eigenvalues will be equal to r in each direction.

Since the above energies are directional, it may be possible to define a combined non-directional energy term. A possible scalar candidate function which is balanced in each direction is given by

    J_1 = P_1 * (X P_2 X^T) = P_2 * (X^T P_1 X)

where * denotes the matrix inner product, or the sum of inner products of corresponding rows or columns. From the two-dimensional point of view, any 'energy' maximization should then be with respect to J_1.

In the least-squares formulation, maximization of 'energy' with respect to J_1 corresponds to the minimization of 'energy' due to the error terms. Minimization could then be attempted using the non-directional function defined as

    J = P_1 * (E P_2 E^T) = P_2 * (E^T P_1 E)

where the error matrix E = X - X̄.


3.2 The unweighted double-sided K-L expansion. The double-sided K-L expansion is now defined by

    X = U C V^T    (1)

or by the spectral decomposition

    X = Σ_{i=1}^{r} c_{ii} u_i v_i^T

with dimensions X ∈ M_{m,n}, U ∈ M_{m,r}, C ∈ M_{r,r}, V ∈ M_{n,r}, where X is a data matrix of rank r and U, C and V are full rank (= r) matrices.
Note - If the expansion X = U C V^T is used for a time/space system, then the eigenvector matrix U will contain time-series information and the matrix V will relate to a space-series. Similarly, for purely spatial or temporal systems, the matrices U and V will be of the same kind. If the time variable is t and the space variable s, then the signal matrix can be written in the forms U(t) C V(s)^T, U(t_1) C V(t_2)^T or U(s_1) C V(s_2)^T for time/space, time/time or space/space systems respectively.

If U and V are the orthonormal eigenvector matrices formed from the non-zero eigenvalues of the matrices X X^T and X^T X respectively, then the decomposition of eqn 1 becomes the singular value decomposition of the rectangular matrix X^{16,21}. Now, since

    R_1 = E[X^T X] = E[V Λ V^T] ,   Λ ∈ M_{r,r}
    R_2 = E[X X^T] = E[U Λ U^T]

eqn 1 can be considered as the double-sided K-L expansion. It can also be shown that the matrix C is diagonal and equal to the square root of the eigenvalue matrix. Thus

    C^2 = Λ ,   C = Λ^{1/2}

The matrix C is also given by

    C = U^T X V

Further, it can be shown that A = U C and B = C V^T are also K-L expansions.

If now the expansion is truncated to include only k (≤ r) modes, the matrix C can be solved in terms of the least-squares problem defined by^{14} (see Appendix 1)

    X = Ū C̄ V̄^T + E

where E is a residual error matrix. With an error function

    J = trace(E^T E)

the truncated solution for C is then given by

    C̄ = Ū^T X V̄ ,   Ū ∈ M_{m,k} ,  C̄ ∈ M_{k,k} ,  V̄ ∈ M_{n,k} ,  k ≤ r

assuming the orthogonality conditions Ū^T Ū = I, V̄^T V̄ = I, and the reconstructed X̄ is given by

    X̄ = Ū C̄ V̄^T

Since C is a diagonal matrix, its least-squares solution is equivalent to truncation, ie C̄ ⊂ C, and

    Ā = Ū C̄ ,   B̄ = C̄ V̄^T

The reconstructed covariance matrices are given by

    R̄_1 = E[X̄^T X̄] = E[V̄ C̄^T C̄ V̄^T] = E[V̄ Λ̄ V̄^T]
    R̄_2 = E[X̄ X̄^T] = E[Ū C̄ C̄^T Ū^T] = E[Ū Λ̄ Ū^T]

which contain only the first k modes. The minimized error function is then given by

    J = trace E[Λ - Λ̄]    (2)

which is equal to the sum of the omitted eigenvalues. Similarly, the maximized total energy is given by

    trace R̄_1 = trace R̄_2 = trace E[Λ̄]
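A quick numerical check of eqn 2 in the unweighted case (assuming NumPy; the random data matrix is purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((12, 24))

U, c, Vt = np.linalg.svd(X, full_matrices=False)
k = 5
X_bar = U[:, :k] @ np.diag(c[:k]) @ Vt[:k, :]   # reconstruction from k modes
E = X - X_bar

# The minimized error equals the sum of the omitted eigenvalues lambda_i = c_i^2
print(np.isclose(np.trace(E.T @ E), np.sum(c[k:]**2)))   # True
```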


3.3 The general double-sided K-L expansion. The least-squares problem with weighting matrices can be obtained by considering the truncated expansion

    X = ψ C φ^T + E ,   ψ ∈ M_{m,r} ,  C ∈ M_{r,r} ,  φ ∈ M_{n,r}

where ψ, φ and C are full rank (= r) matrices. With an error function

    J = P_1 * (E P_2 E^T) = P_2 * (E^T P_1 E)

the least-squares estimate of C is then given by the double-sided solution^{14}. Without loss of generality we can assume that

    H = P_1^{1/2} ψ ,   G = P_2^{1/2} φ ,   H ∈ M_{m,r} ,  G ∈ M_{n,r}

and the truncated matrices H̄ and Ḡ are orthogonal, with

    H̄^T H̄ = Ḡ^T Ḡ = I_r

Then, for the weighted data matrix

    Y = P_1^{1/2} X P_2^{1/2}

the reconstructed value of Y is

    Ȳ = H̄ C̄ Ḡ^T

Also

    R̄_1(Y) = E[Ȳ^T Ȳ] = E[Ḡ C̄^T C̄ Ḡ^T]
    R̄_2(Y) = E[Ȳ Ȳ^T] = E[H̄ C̄ C̄^T H̄^T]

Comparing these results with the un-weighted solution, the matrices Ḡ and H̄ can now be identified as the eigenvector matrices of the covariance matrices R̄_1(Y) and R̄_2(Y) respectively. Then

    C̄ = H̄^T Y Ḡ

and the reconstructed data matrix is given by

    X̄ = P_1^{-1/2} Ȳ P_2^{-1/2}

The minimized value of the error function is given similarly by eqn 2, which is equal to the sum of the neglected eigenvalues associated with Y, and the maximized energy value is given by

    trace R̄_1(Y) = trace R̄_2(Y) = trace E[Λ̄(Y)]
4. Computational procedure. The procedure for determining the

reconstructed data matrix from the original matrix X is developed

with the following steps.

(a) Y - P 1 ~ XP 2 ~ -+ RI' R2 -+ H ,G -+ H TYG (-C)

(b) choose k to truncate H ,G ,C -+ B ,G ,C


---T
(c) Y .. BCG , X .. P -~yp -~
1 2
T T
(d) trace (E PIE) trace(EP 2E )

(e) J .. P *(EP ET) .. P *(ETp E) .. trace(A-A)


1 2 2 1
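The steps are straightforward to sketch with the singular value decomposition of the weighted matrix Y (assuming NumPy and diagonal positive definite P_1, P_2; names and test data are illustrative):

```python
import numpy as np

def double_sided_kl(X, P1, P2, k):
    """Weighted double-sided K-L contraction following steps (a)-(e)."""
    P1h, P2h = np.sqrt(P1), np.sqrt(P2)          # P^(1/2) for diagonal P
    Y = P1h @ X @ P2h                            # step (a)
    H, c, Gt = np.linalg.svd(Y, full_matrices=False)
    Y_bar = H[:, :k] @ np.diag(c[:k]) @ Gt[:k, :]          # steps (b), (c)
    X_bar = np.linalg.inv(P1h) @ Y_bar @ np.linalg.inv(P2h)
    E = X - X_bar                                # step (d)
    J = np.sum(c[k:]**2)                         # step (e): trace(Lambda - Lambda_bar)
    return X_bar, E, J

rng = np.random.default_rng(5)
X = rng.standard_normal((12, 12))
P1 = np.diag(np.linspace(0.45, 1.0, 12))
P2 = np.diag(np.linspace(0.6, 1.0, 12))
X_bar, E, J = double_sided_kl(X, P1, P2, k=5)
print(J, np.trace(E.T @ P1 @ E), np.trace(E @ P2 @ E.T))
```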

5. Example. A data matrix (Table 1) representing the number of passengers (×1000) carried on scheduled international airlines for monthly periods from 1949 to 1960^{19,24} is used to illustrate the

application of the double-sided K-L expansion technique for data

contraction. Fig. 1 illustrates the hyper-surface generated by

the two-dimensional process. The data is cyclic with summer peaks

and has a rising pattern or trend as the popularity of air travel

increased. In classical time series analysis, such behaviour has

been explained using complicated and empirical composite models.


Table 1

      JAN  FEB  MAR  APR  MAY  JUN  JUL  AUG  SEP  OCT  NOV  DEC
1949  112  118  132  129  121  135  148  148  136  119  104  118
1950  115  126  141  135  125  149  170  170  158  133  114  140
1951  145  150  178  163  172  178  199  199  184  162  146  166
1952  171  180  193  181  183  218  230  242  209  191  172  194
1953  196  196  236  235  229  243  264  272  237  211  180  201
1954  204  188  235  227  234  264  302  293  259  229  203  229
1955  242  233  267  269  270  315  364  347  312  274  237  278
1956  284  277  317  313  318  374  413  405  355  306  271  306
1957  315  301  356  348  355  422  465  467  404  347  305  336
1958  340  318  362  348  363  435  491  505  404  359  310  337
1959  360  342  406  396  420  472  548  559  463  407  362  405
1960  417  391  419  461  472  535  622  606  508  461  390  432

'Steady states' were removed using the method described in the appendix. The steady part X^s of the matrix X is found to be very dominant and it is given by

    X^s = c_s ā b̄^T

where

    ā = 10^{-1} (1.21 1.34 1.63 1.89 2.15 2.29 2.72 3.14 3.53 3.55 4.10 4.56)^T
    b̄ = 10^{-1} (2.47 2.40 2.76 2.72 2.77 3.18 3.58 3.58 3.08 2.72 2.38 2.67)^T
    c_s = 3.652 × 10^3

The average trend could be identified from the vector ā and the cyclic pattern from the vector b̄.

The signal matrix X after the removal of steady states is shown in Table 2.

The a priori probability weighting matrices are selected as

    P_1 = diag(0.45  0.50  0.55  0.60  0.65  0.70  0.75  0.80  0.85  0.90  0.95  1.00)

    P_2 = diag(0.60  0.70  0.80  0.90  1.00  1.00  1.00  1.00  0.90  0.80  0.70  0.60)

with the elements of P_1 emphasizing the later data and the elements of P_2 providing increased weighting for the summer months.

The coefficient matrix is then given by

    C = diag(68.47  36.65  -30.69  23.65  19.55  -11.19  -9.21  6.10  -4.49  -1.82  -0.30  0.00)

and the reconstructed data matrix based on k = 5 modes is illustrated in Table 3.

The optimized values of the error functions with the retention of different numbers of modes associated with the un-truncated signal, and the corresponding maximized directional 'energy' functions, are given in Table 4.

It can be seen from the reconstructed matrix that the best fits to the original matrix X are near the centre-bottom of the matrix, corresponding to the summer of 1960. The overall contraction could be judged from the ratio defined by error energy/un-truncated signal energy:

    J / trace Λ = 3.31 × 10^{-2}   for k = 5

This ratio can be considered as the square of the noise/signal ratio and, since it is very low, further truncation is possible with only a slightly increased penalty. The direction ratios are given by

    trace(E^T P_1 E) / trace(X^T P_1 X) = 3.39 × 10^{-2}

    trace(E P_2 E^T) / trace(X P_2 X^T) = 4.12 × 10^{-2}

and it is evident that the column direction is penalized more than the row direction.


Table 2

      2.5   11.5    9.9    8.3   -1.8   -5.8  -10.5  -10.7   -3.7   -1.5   -1.2   -0.3
     -5.5    8.9    6.4    1.9  -10.4   -6.3   -5.1   -4.9    7.3    0.2   -2.0    9.5
     -1.8    7.3   14.0    0.9    7.0  -11.2  -14.3  -14.1    0.4    0.2    4.6    7.0
      1.1   14.8    3.1   -5.7   -8.1   -1.0  -15.9   -4.7   -3.5    3.6    8.4   10.0
      1.9    7.4   19.1   20.6   10.5   -7.2  -18.0   -9.8   -5.8   -3.0   -5.9   -9.2
X =  -2.1  -12.3    4.7   -0.7    2.3   -1.7    2.5   -6.3    1.2    1.5    4.5    5.8
     -2.9   -5.1   -6.7   -1.6   -5.4   -0.8    8.0   -8.7    5.6    3.9    1.1   12.7
      4.9    1.8    0.6    0.2   -0.3    9.0    1.5   -6.1    0.8   -6.2   -1.7   -0.6
     -2.7   -7.9    0.9   -3.0   -2.3   12.4    3.2    5.5    6.5   -3.4   -1.0   -8.1
     11.4   -1.4   -5.2  -15.0   -6.5   11.4   13.4   27.8   -7.1   -3.4   -5.5  -18.9
     -5.4  -17.1   -6.9  -12.1    4.5   -4.3   11.1   22.5    0.9   -0.4    5.2    4.9
      6.3   -5.2  -40.0    7.3   10.2    5.5   25.2    9.6   -5.7    8.1   -5.5  -12.8

Table 3

      2.3   11.2    9.3    7.4   -3.1   -3.6  -12.4  -11.0   -1.7   -0.4   -1.1    0.5
     -1.8    6.3    5.8   -2.7   -6.2   -2.3   -7.2   -8.1    2.7    0.6    3.4    9.5
     -3.9    6.1   13.7    5.5    3.1  -11.2  -15.0  -11.5    0.0    0.5    3.5    8.9
      2.1   16.8    5.2   -5.2   -9.9   -5.4  -12.7   -4.7   -1.8    2.4    4.3    9.8
      2.5    8.4   19.1   18.8   12.2   -7.3  -17.7  -10.9   -5.4   -3.6   -5.5  -10.4
X̄ =  -5.4   -9.1    3.0    4.9    2.6   -1.8    1.8   -4.2    4.9   -0.6    2.6    6.3
     -4.5   -4.5   -8.3   -2.1   -3.9    1.0    5.2   -8.6    6.3    3.1    3.7   11.8
      1.5    4.2    1.4    1.8   -3.2    7.2    1.5   -5.4    2.9   -2.3   -2.8   -2.8
     -4.1   -7.8    3.1   -3.6   -2.9   14.5    5.5    4.4    4.0   -4.9   -2.9   -6.4
      8.9   -0.9   -6.9  -14.2   -6.0   13.0   11.4   28.8   -6.0   -3.9   -5.2  -19.0
     -7.7  -18.0   -6.4  -13.4    4.4   -4.9   11.7   21.7    0.4    1.0    5.2    4.1
      6.8   -8.3  -39.5    7.4    9.9    5.1   25.8    9.5   -6.2    8.0   -5.1  -12.4

The reconstructed data process (including the steady states) is almost identical in form to the representation illustrated in Fig. 1.
Table 4 - Optimized Values of Error Functions

 k    trace E^T P_1 E    J = P_1*(E P_2 E^T)    trace E P_2 E^T
12    0.                 0.                     0.
11    0.                 0.                     0.
10    1.27 × 10^{-1}     8.96 × 10^{-2}         1.52 × 10^{-1}
 9    4.58               3.40                   6.40
 8    3.03 × 10^{1}      2.36 × 10^{1}          3.69 × 10^{1}
 7    7.66 × 10^{1}      6.07 × 10^{1}          8.95 × 10^{1}
 6    1.81 × 10^{2}      1.45 × 10^{2}          2.04 × 10^{2}
 5    3.23 × 10^{2}      2.71 × 10^{2}          4.25 × 10^{2}
 4    7.78 × 10^{2}      6.53 × 10^{2}          9.51 × 10^{2}
 3    1.41 × 10^{3}      1.21 × 10^{3}          1.72 × 10^{3}
 2    2.75 × 10^{3}      2.15 × 10^{3}          2.95 × 10^{3}
 1    4.24 × 10^{3}      3.50 × 10^{3}          4.43 × 10^{3}

trace X^T P_1 X = 9.52 × 10^{3} ,   P_1*(X P_2 X^T) = 8.19 × 10^{3} = trace Λ ,   trace X P_2 X^T = 1.03 × 10^{4}
6. Conclusion. A two-dimensional K-L-type expansion has been proposed, assuming that there are two spectra or covariance matrices which can be associated with regression in the horizontal (row) and vertical (column) directions. If the experiments in both directions are equally likely and certain, this expansion reduces to the singular value decomposition of a rectangular matrix.

The technique will have application for the analysis of temporal and spatially-located data which exist in many disciplines. It is also being developed and extended for two-dimensional curve fitting and prediction^{17} (see Chapter 4), which will have application for the forecasting of a wide range of industrial and socio-economic system data. Extensions to multidimensional processes are also under consideration.
7. References

1. AHMED, N., and RAO, K.R.: 'Orthogonal transforms for digital signal processing', Springer-Verlag, Berlin, 1975.

2. FU, K.S.: 'Sequential methods in pattern recognition and machine learning', Academic Press, NY, 1968.

3. ANDREWS, H.C.: 'Computer techniques in image processing', Academic Press, NY, 1969.

4. ANDREWS, H.C.: 'Introduction to mathematical methods in pattern recognition', Wiley, NY, 1972.

5. ULLMANN, J.R.: 'Pattern recognition techniques', Butterworths, London, 1973.

6. MENDEL, J.M., and FU, K.S.: 'Adaptive, learning and pattern recognition systems', Academic Press, NY, 1970.

7. FUKUNAGA, K.: 'An introduction to pattern recognition', Academic Press, NY, 1973.

8. NICHOLSON, H.: 'Sequential least-squares prediction based on spectral analysis', Int. Jl. Compr. Math., Sect. B, 1972, 3, pp.257-270.

9. ALBERT, G.E.: 'Statistical methods in prediction, filtering and detection problems', Jl. SIAM Appl. Math., 1960, 8, 4, pp.640-653.

10. SELIN, I.: 'The sequential estimation and detection of signals in normal noise, I', Inf. and Control, 1964, 7, pp.512-534.

11. MATTHEWMAN, P.D., and NICHOLSON, H.: 'Techniques for load prediction in the electricity-supply industry', Proc. IEE, 1968, 115, 10, pp.1451-1457.

12. STERLING, M.J.H., and ANTCLIFFE, D.J.: 'A technique for the prediction of water demand from past consumption data only', Jl. Inst. of Water Engineers, 1974, 28, 8, pp.413-420.

13. NICHOLSON, H., and SWANN, C.D.: 'The prediction of traffic flow volumes based on spectral analysis', Transpn. Res., 1974, 8, pp.533-538.

14. FERNANDO, K.V.M., and NICHOLSON, H.: 'The double-sided least-squares problem', Electronics Letters, 1979, 15, 20, pp.624-625.

15. WILLSKY, A.S.: 'Relationships between digital signal processing and control and estimation', Proc. IEEE, 1978, 66, 9, pp.996-1017.

16. GOLUB, G.H., and REINSCH, C.: 'Singular value decomposition and least squares solutions', Numer. Math., 1970, 14, pp.403-420. Also in Handbook for Automatic Computation: Linear Algebra, Vol.2, pp.134-151, edited by Wilkinson, J.H., and Reinsch, C., Springer-Verlag, Berlin, 1971.

17. FERNANDO, K.V.M., and NICHOLSON, H.: 'Two-dimensional curve fitting and prediction using spectral analysis', Proc. IEE, 1982, 129D, in press.

18. MACDUFFEE, C.C.: 'The theory of matrices', Chelsea, NY, 1946.

19. BROWN, R.G.: 'Smoothing, forecasting and prediction', Prentice-Hall, NJ, 1962.

20. CLIFF, A.D., and ORD, J.K.: 'Spatial autocorrelation', Pion Ltd, London, 1973.

21. ALBERT, A.: 'Regression and the Moore-Penrose pseudoinverse', Academic Press, NY, 1972.

22. BRIER, G.W., and MELTESEN, G.T.: 'Eigenvector analysis for prediction of time series', J. Applied Meteorology, 1976, 15, 12, pp.1307-1312.

23. NICHOLSON, H.: 'Structure of interconnected systems', Peter Peregrinus, IEE, London, 1978.

24. BOX, G.E.P., and JENKINS, G.M.: 'Time series analysis, forecasting and control', Holden-Day, San Francisco, 1971.

25. LARIMORE, W.E.: 'Statistical inference on stationary random fields', Proc. IEEE, 1977, 65, 6, pp.961-970.

26. SAITO, O., and TAKEDA, H.: 'Two-stage predictor of air pollution adapted to daily fluctuation', Proc. IEE, 1979, 126, 1, pp.107-112.

27. FERNANDO, K.V.M., and NICHOLSON, H.: 'The Karhunen-Loève expansion with reference to singular value decomposition and separation of variables', to be submitted for publication.

28. PETERSON, G.E., CARNEVALE, A., and KURKJIAN, C.R.: 'Singular-value decomposition and boron spectra in glass', J. of Non-Crystalline Solids, 1977, 23, pp.243-259.

29. SAWCHUK, A.A., and PEYROVIAN, M.J.: 'Restoration of astigmatism and curvature of field', J. Optical Soc. of Amer., 1975, 65, 6, pp.712-715.

30. SAHASRABUDHE, S.C., and KULKARNI, A.D.: 'Shift variant image degradation and restoration using SVD', Computer Graphics & Image Processing, 1979, 9, pp.203-212.

31. ANDREWS, H.C., and PATTERSON, C.L.: 'Outer product expansions and their uses in digital image processing', Amer. Math. Monthly, 1975, 82, pp.1-13.

32. TOU, J.T., and GONZALEZ, R.C.: 'Pattern recognition principles', Addison-Wesley, Reading, MA, 1974.

33. LOÈVE, M.: 'Probability theory', Van Nostrand, Princeton, NJ, 1963.
8. Appendix

Removal of the 'steady states'

In the row formulation of the expansion, it is required that the expected value of the row sums be zero:

    Σ_{j=1}^{n} x_i(j) = a_i ,   E[a_i] = 0   for all i

Similarly, in the column representation of the expansion, the column sums have to be zero:

    Σ_{i=1}^{m} x_i(j) = b_j ,   E[b_j] = 0   for all j

If the signal matrix has large deviations from the expected values, then removal of the steady states or the mean levels would be required. From the computational point of view, this would be desirable to avoid large numerical values. Steady state removal is sometimes known as entropy reduction^{32}.

The following numerical scheme could be used to remove the steady state matrix X^s from the data matrix X^d to give the matrix X with the required properties.

The rank one matrix X^s is defined as

    X^s = (1/d) a b^T   if d ≠ 0

where

    a = (a_1, ..., a_m)^T
    b = (b_1, ..., b_n)^T
    d = Σ_{i=1}^{m} a_i = Σ_{j=1}^{n} b_j

The matrix X^s can also be represented in the format

    X^s = c_s ā b̄^T

where ā and b̄ are the normalized vectors of a and b, respectively, and

    c_s = λ_s^{1/2}

Then λ_s could be interpreted as the 'energy value' associated with the steady states.
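A small sketch of the scheme (assuming NumPy; the offset test data are illustrative):

```python
import numpy as np

def remove_steady_state(Xd):
    """Split Xd = X + X_s, with X_s = (1/d) a b^T the rank one steady part."""
    a = Xd.sum(axis=1)            # row sums a_i
    b = Xd.sum(axis=0)            # column sums b_j
    d = a.sum()                   # grand total, assumed non-zero
    Xs = np.outer(a, b) / d
    return Xd - Xs, Xs            # the residual X has zero row and column sums

rng = np.random.default_rng(6)
Xd = 100.0 + rng.standard_normal((12, 12))
X, Xs = remove_steady_state(Xd)
print(np.allclose(X.sum(axis=1), 0.0), np.allclose(X.sum(axis=0), 0.0))  # True True
```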
Fig. 1. Number of passengers carried on scheduled international airlines
CHAPTER 4

Two-dimensional curve-fitting and prediction using spectral analysis

Abstract: A curve-fitting or prediction problem for two-dimensional

or cyclic processes is defined and solved using spectral techniques.

The assumed statistical model is structurally similar to the


, .
Karhunen-Loeve expans10n and the technique can be implemented

using the singular value decomposition. Two examples using

published data illustrate the feasibility of the method and the

peculiarities associated with the problem.



1. Introduction

One-dimensional techniques have often been used for the processing

of data or signals which are essentially two-dimensional or cyclic.

These include, for example, load demands on utilities, meteorological

conditions, biological cycles, consumer demands and economic indicators.


In one-dimensional methods, as in conventional ARMA time series analysis^{17}, which is generally not suitable for long-term prediction and for seasonal or cyclic processes^{15}, two-dimensional properties such as long-term trend and cyclic effects are removed or neglected, and thus such techniques do not utilize available information optimally. However, the two-dimensional and cyclic nature of processes has been appreciated recently^{1-3}, and is leading to new research activity in this area.

Two-dimensional properties of data can be exploited using the Karhunen-Loève expansion (KLE) [20], which is a fundamental expansion for random processes. This is equivalent to the well-known technique of principal component analysis [16], and at an abstract level to separation of variables methods in functional analysis and mathematical physics [4] (Chapter 2). It is also known as the method of characteristics [11].

Contraction and smoothing of data using the double-sided KLE based on the singular value decomposition [18] of a matrix have been studied previously [3,4] (Chapters 2,3). Contraction and smoothing is a problem of interpolation and in this particular case it is equivalent to least-squares interpolation in tensor product spaces. We now extend this to extrapolation using past data, which can be viewed as either prediction or curve-fitting depending on the type of data being handled.

The data or the signal matrix could consist of discretized values

along temporal or spatial dimensions or a combination of the two. Causality
of data is not required or assumed and purely spatial two-dimensional

processes can also be studied. If the data is cyclic then one of the

dimensions could represent short-term behaviour within the cycle. The

other dimension could then represent long-term aspects including any

trends and other non-stationary effects.

The technique of extrapolation of data using KLE has been used in


the forecasting of load demands on power systems [7-11] and water distribution networks [12], and of traffic flows [13], internal pressure during brain haemorrhage [11] and air pollution [14], among other applications.

The object of this chapter is to develop a model to extrapolate data which

can be considered as two-dimensional or cyclic. The structure of the

model is similar to the KLE and can be implemented using the singular

value decomposition technique.

2. The problem

It is assumed that the discretized data are in matrix form, with

the two dimensions of the data corresponding to the row and column directions. Without loss of generality, the submatrix X_22 is taken as the unknown data to be predicted, which is embedded in the (m,n)-dimensional composite matrix X. Thus,

X = [ X_11  X_12 ]
    [ X_21  X_22 ] ,     X_ij ∈ M_{m_i,n_j} ,   i,j = 1,2

The unknown matrix X22 could represent future data to be forecasted,

as in load prediction problems, or it could correspond to inaccessible

data in a hostile environment or to lost records due to instrumentation

failure.
- 52 -

3. The Karhunen-Loève extrapolation method [7-14]

Forecasting of future data using the Karhunen-Loève expansion is

implemented usually in two stages. The first stage involves the formation

of a covariance matrix, the spectral decomposition of that matrix to

obtain the "modes" or the eigenvector structure and then contraction of

the data to eliminate the "noise" or the insignificant modes of the

system. This can be considered as the training or learning stage of

the method. The actual extrapolation using the modes or basis "pattem"

functions is represented by the second stage.

We assume that the data matrix Y_1 = [X_11  X_12] is formed using m_1 experiments each with n discrete sampled values. The data record denoted by the row vector y^i contains the information about the ith experiment and it is assumed that the expected value of y^i is zero, and that this second-order process has a covariance matrix R_y,

Y_1 ∈ M_{m_1,n} ,   E[(y^i)^T (y^i)] = R_y

where E[·] denotes the expectation operator.

If the covariance matrix R_y is already known (say, using analytical modelling or experimental evaluation) then we proceed to the decomposition of that matrix. However, if it is not known, then it has to be estimated using the information in the matrix Y_1. This can be achieved using Bayesian or any other standard learning method [20]. The simplest method is given by the asymptotic approximation,

R̂_y ≈ (1/m_1) Σ_{i=1}^{m_1} (y^i)^T (y^i) = (1/m_1) Y_1^T Y_1

provided m_1 is large.
- 53 -

The spectral expansion of the covariance matrix R_y can be written in the spectral format,

R_y = V D_y^2 V^T

where the matrix V contains the orthogonal set of n eigenvectors and D_y^2 is the diagonal eigenvalue matrix ordered with decreasing magnitude.

The KLE for the process can then be expressed as,

Y_1 = A_1 V^T ,   A_1 ∈ M_{m_1,n}

where A_1 is a random coefficient matrix, given by

A_1 = Y_1 V

The most important property of this matrix is that (the expected value of) the columns of the matrix are orthogonal, with

E[A_1^T A_1] = D_y^2

Because of this diagonal form, the modes of the process can be decoupled individually. All the optimal properties ascribed to this expansion are due to this absence of off-diagonal terms.

The diagonal values of the matrix D_y^2 may contain zero or near-zero entries and they can be neglected without loss of information. If the expansion is truncated using only k modes, then it can be written in the format,

Ȳ_1 = Ā_1 V̄^T ,   Ȳ_1 ∈ M_{m_1,n} ,   Ā_1 ∈ M_{m_1,k} ,   V̄ ∈ M_{n,k} ,   D̄_y^2 ∈ M_{k,k}

where Ȳ_1 denotes the data matrix reconstructed from the most significant k modes and Ā_1 and V̄ are truncated matrices of A_1 and V, respectively, containing only the first k columns.


Prediction or extrapolation is based on the assumption that the remaining experiments as specified by the matrix Y_2 = [X_21  X_22] have the same truncated eigenvector structure. The matrix Y_2 can then be expressed using the KLE as

Y_2 = Ā_2 V̄^T + E_2

where Ā_2 is the truncated coefficient matrix and E_2 is a possible error matrix. The above expansion can be decomposed into two parts,

X_21 = Ā_2 V̄_1^T + E_21
X_22 = Ā_2 V̄_2^T + E_22

where V̄ = [V̄_1^T  V̄_2^T]^T. If n_1 is greater than the number of modes k, then the random coefficient matrix Ā_2 can be obtained by solving the following least-squares problem:

minimize  trace(E_21^T E_21)

The least-squares estimate Â_2 is then given by,

Â_2 = X_21 V̄_1 (V̄_1^T V̄_1)^{-1}

The estimate X̂_22 of the unknown matrix X_22 can be computed as,

X̂_22 = Â_2 V̄_2^T

Since the extrapolation was performed with the matrix X_21 in the horizontal or row direction using the eigenvector structure of the covariance matrix R_y, it can be called the horizontal or row prediction method (Fig. 1).
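As an illustration, the row method can be sketched as follows (a Python/NumPy sketch, not part of the original text; the SVD of Y_1 is used in place of an explicit eigendecomposition of R_y, and all names are assumptions of the illustration):

    import numpy as np

    def row_predict(X11, X12, X21, k):
        # Horizontal (row) KLE prediction of X22: the k dominant right
        # singular vectors of Y1 = [X11 X12] play the role of the
        # eigenvectors of R_y; the coefficients A2 are fitted to X21
        # by least squares (requires n1 > k).
        Y1 = np.hstack([X11, X12])
        _, _, Vt = np.linalg.svd(Y1, full_matrices=False)
        V = Vt.T[:, :k]                        # truncated modes V-bar
        n1 = X11.shape[1]
        V1, V2 = V[:n1, :], V[n1:, :]          # partition conforming to [X21 X22]
        A2 = X21 @ V1 @ np.linalg.inv(V1.T @ V1)   # least-squares estimate
        return A2 @ V2.T                       # estimate of X22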

The assumption that the data matrix is due to m experiments each with n sampled values is quite arbitrary in most applications, and it could be conveniently assumed that the data is from n experiments each with m sampled data values [3,4] (Chapters 2,3). A vertical or column predictor (Fig. 2) can then be obtained by extrapolating the matrix X_12 in the column direction by employing the eigenvector structure of the covariance matrix R_z,

E[z^j (z^j)^T] = R_z

and with analogous relationships as defined for the horizontal case.

If z^j is a column vector of Z_1, where Z_1 = [X_11^T  X_21^T]^T, then an estimate of R_z can be obtained using the approximation,

R̂_z ≈ (1/n_1) Σ_{j=1}^{n_1} (z^j)(z^j)^T = (1/n_1) Z_1 Z_1^T
4. A case for two-dimensional extrapolation

As mentioned in the introduction, one-dimensional techniques have

often been used for the processing of data or signals which are essentially

two-dimensional or approximately cyclic. However, often the solution is

then a series of continuous (smooth) curves rather than a continuous

surface, and such extrapolation methods do not utilize all available data

optimally. Even if such methods provide very acceptable one-dimensional


- 56 -

fitting, it will be at the expense of the fitting in the complementary

direction. Thus, true two-dimensional extrapolation has to be good

in both directions and should not have a bias to one direction or the

other (unless of course, such a bias is desired).

In practical problems with limited data points, the assumption that

the process has a zero mean (i.e. E[y^i] = 0) is highly restrictive.

Often, the experiments are not stationary and it is required to remove

the trend or cyclic or other effects from the data before the asymptotic

covariance matrix is formed. The conventional practice is to remove

the trend effects by differencing of data. However, differencing is

akin to differentiation and such techniques are anathema to most engineers.

It is also possible to use polynomial curve-fitting to remove the trend

or similar effects. Such methods artificially introduce a secondary

curve-fitting process. One of our objectives is to avoid such secondary

methods and to use the trend and other similar terms to our own

advantage. This is achieved by considering the process to be two-

dimensional and accepting such trend effects in the model.

One possible way of devising a two-dimensional extrapolation method

is to exploit the eigenstructure of the matrix V as well as that of the

matrix U. If both these matrices are used, most of the information in

the data matrix is abstracted, and thus it is reasonable to assume that

the prediction will be more consistent than if only one matrix is used.

Another objective is to avoid matrix inversion (or equivalently

solution of linear matrix equations) by exploiting the orthogonal structure

of the eigenvector matrices U and V. Such methods will also minimize

the roundoff errors due to the reduced computational effort, and the

technique will be numerically stable due to the inherently well-conditioned

orthogonal matrices.
- 57 -

5. A two-dimensional statistical model

The element x_ij of the random matrix (field) is assumed to be of the form,

x_ij = Σ_{k} u_ik d_k v_jk

where u_ik and v_jk are random variables with the following second-order properties:

E[u_ip u_iq] = δ_pq ,   E[v_jp v_jq] = δ_pq

The function δ_pq is the Kronecker delta.
For an integer pair p,q, the sum of the products x_ip x_iq, i = 1, ..., m_1, is given by

Σ_{i=1}^{m_1} x_ip x_iq = Σ_{r=1}^{k} Σ_{s=1}^{k} v_pr v_qs d_r d_s Σ_{i=1}^{m_1} u_ir u_is

If m_1 is large, we may assume the asymptotic approximation,

(1/m_1) Σ_{i=1}^{m_1} u_ir u_is ≈ E[u_ir u_is] = δ_rs

which gives the result,

(1/m_1) Σ_{i=1}^{m_1} x_ip x_iq ≈ Σ_{r=1}^{k} v_pr v_qr d_r^2

Equivalently, the above relation can be written in the matrix format,

(1/m_1) Y_1^T Y_1 ≈ V̄ D̄^2 V̄^T      (1)

Similarly, by considering the column direction, the following approximation can be obtained,

(1/n_1) Z_1 Z_1^T ≈ Ū D̄^2 Ū^T      (2)
- 58 -

Equations (1) and (2) are very similar to the results obtained using the conventional Karhunen-Loève extrapolation method, except that there are two equations corresponding to both row and column extrapolation. To determine the "patterns" or the matrices Ū and V̄, we again use asymptotic approximations.

If n is large, the following asymptotic approximation is valid,

(1/n) Σ_{j=1}^{n} v_jr v_js ≈ E[v_jr v_js] = δ_rs

which implies that

V̄^T V̄ ≈ I      (3)

Similarly,

Ū^T Ū ≈ I      (4)

Thus, the matrices Ū and V̄ are approximately orthonormal matrices. This property can be used to obtain estimates of these matrices by considering them to be the orthonormal eigenvector matrices of Y_1^T Y_1 and Z_1 Z_1^T, respectively. This is evident from equations (1) and (2).

In the next sections, extrapolation schemes to obtain the unknown matrix X_22 are developed.

6. Extrapolation using the two-dimensional model

Since the "pattern" matrices U and V can be calculated using

equations (3) and (4), the problem of extrapolation reduces to estimating

the matrix D. In the conventional approach, the actual extrapolation

is achieved by considering the matrix X (the row method) or the matrix


2l
X (the column method). However, in the two-dimensional approach, we
l2
~y use all three known matrices X , X , and X •
12 2l ll
- 59 -

6.1 Extrapolation with three known matrices

Using the assumed two-dimensional model, the matrices Y_1 and Z_1 can be written in the format,

Y_1 = Ū_1 D̄ V̄^T + E_1
Z_1 = Ū D̄ V̄_1^T + E_2

where E_1 and E_2 are possible error matrices. Since the matrices Y_1, Z_1, Ū, V̄, Ū_1, and V̄_1 are known, the diagonal matrix D̄ can be estimated by using a least-squares technique.

If the quadratic error function for minimization is taken as,

J = trace(E_1^T E_1) + trace(E_2^T E_2)

the least-squares estimate of the diagonal matrix D̄ is given by [6] (Appendix 2)

d̂ = ( (Ū_1^T Ū_1) * (V̄^T V̄) + (Ū^T Ū) * (V̄_1^T V̄_1) )^{-1} c

where the elements of the vector c are given by

c_i = (ū_1^i)^T Y_1 v̄^i + (ū^i)^T Z_1 v̄_1^i

ū^i and v̄^i are the column vectors of the matrices Ū and V̄, respectively, and * denotes the Hadamard product or the Schur product, defined as the element-by-element product of any two matrices of equal dimensions.

Using the orthogonality conditions (3) and (4) to simplify the solution, the elements of the matrix D̂ can be written in the form,

d̂_i = c_i / ( (ū_1^i)^T ū_1^i + (v̄_1^i)^T v̄_1^i )

Thus, inversion of matrices is not required in this method.


- 60 -

An estimate of the unknown matrix X_22 can be computed as,

X̂_22 = Ū_2 D̂ V̄_2^T

Since all three known matrices are used in the extrapolation, the estimate will be less sensitive to any unrepresentative values in the known matrices. In the conventional approach, only one known matrix (X_12 or X_21) is used, and such values tend to have large effects on the predicted values.
values. In the context of load prediction in the power and water

industries, such unexpected demands occur due to freak weather conditions

and bank holiday weekends.

6.2 Extrapolation using an arbitrary number of known matrices

We have so far assumed that the unknown block is the matrix X_22 and that there are only three known blocks. However, such assumptions are not necessary and we can extend this to any arbitrary number of blocks, and the unknown block does not have to be the corner block. The least-squares problem is then defined by

minimize  Σ_k Σ_l trace(E_kl^T E_kl) ,   (k,l) ∈ known blocks

Then the estimate D̂ is given by

d̂ = ( Σ_k Σ_l (Ū_k^T Ū_k) * (V̄_l^T V̄_l) )^{-1} ĉ

ĉ_i = Σ_k Σ_l (ū_k^i)^T X_kl v̄_l^i

The estimate of the unknown block X_st is given by

X̂_st = Ū_s D̂ V̄_t^T

By judicious choice of the known blocks for extrapolation, inversion of matrices may be avoided; a numerical sketch of this general estimator follows.
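The following is a Python/NumPy sketch, not part of the original text; the dictionary interface for the known blocks and all names are assumptions of the illustration:

    import numpy as np

    def estimate_diagonal(known, U_blk, V_blk, k):
        # Least-squares estimate of the k diagonal elements of D from
        # an arbitrary set of known blocks.  `known` maps an index pair
        # (s, t) to the block X_st; U_blk[s] and V_blk[t] are the
        # corresponding (truncated) row and column pattern blocks.
        M = np.zeros((k, k))
        c = np.zeros(k)
        for (s, t), X in known.items():
            U, V = U_blk[s][:, :k], V_blk[t][:, :k]
            M += (U.T @ U) * (V.T @ V)        # Hadamard (Schur) product term
            c += np.array([U[:, i] @ X @ V[:, i] for i in range(k)])
        return np.linalg.solve(M, c)          # vector of d_i values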
- 61 -

7. Formation of the data matrix for cyclic processes

For a cyclic process of period N (e.g. N = 12 months or N = 52 weeks if the cycle is taken as a year; N = 24 hours if the cycle is a day), the data matrix X can be formed in the following way [9]:

X = [ x_1(1)     ...  x_1(N')      x_1(N'+1)     ...  x_1(N)     ]
    [ ...                                                         ]
    [ x_M(1)     ...  x_M(N')      x_M(N'+1)     ...  x_M(N)     ]
    [ x_{M+1}(1) ...  x_{M+1}(N')  x_{M+1}(N'+1) ...  x_{M+1}(N) ]

The suffix i in x_i(j) refers to the cycle. It is assumed that data up to and including the Mth cycle and the data points 1 to N' in the (M+1)th cycle are known. The unknown data to be predicted are the data points N'+1 to N in the (M+1)th cycle.

Using the above method, predictions can be made only of a fraction of a cycle. If, however, full cycle-ahead predictions are required, the following 'doubling up' procedure can be used:

X = [ x_1(1)     ... x_1(N)      x_2(1)     ... x_2(N)     ]
    [ ...                                                   ]
    [ x_{M-1}(1) ... x_{M-1}(N)  x_M(1)     ... x_M(N)     ]
    [ x_M(1)     ... x_M(N)      x_{M+1}(1) ... x_{M+1}(N) ]

with X_21 = [x_M(1) ... x_M(N)] and the unknown block X_22 = [x_{M+1}(1) ... x_{M+1}(N)]. This method has the advantage of having the continuity preserved from data points N to N+1 (e.g. December to January) in the row direction; the known blocks of this arrangement can be assembled as in the sketch below.
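An illustrative Python/NumPy sketch (the array layout is an assumption of the illustration):

    import numpy as np

    def doubled_up_blocks(x):
        # x is an (M, N) array holding the M complete known cycles.
        # Row i of the doubled-up matrix holds cycles i and i+1 side by
        # side, so the right half of the last row (cycle M+1) is the
        # unknown block X22.
        X11 = x[:-1]        # cycles 1 .. M-1  (left halves)
        X12 = x[1:]         # cycles 2 .. M    (right halves)
        X21 = x[-1:]        # cycle M, the known left half of the last row
        return X11, X12, X21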


- 62 -

8. Computational procedure

Computation of the covariance matrices Y_1^T Y_1 and Z_1 Z_1^T can be avoided if the singular value decompositions of Y_1 and Z_1, respectively, are used in obtaining the spectral decompositions. The following procedure is used in implementing the prediction method.

(a) Compute the singular value decompositions [18]

Y_1 = U_y D_y V^T ,   Z_1 = U D_z V_z^T

(b) Truncate the eigenvector matrices U and V to give Ū and V̄, respectively.

(c) Solve the least-squares problem defined in section 6 to give the matrix D̂.

(d) Compute the prediction submatrix X̂_22 (a minimal numerical sketch of steps (a)-(d) follows).
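A Python/NumPy sketch of the procedure (illustrative names; possible sign ambiguities of the separately computed singular vectors are absorbed by the diagonal least-squares fit):

    import numpy as np

    def two_dim_predict(X11, X12, X21, k):
        # Two-dimensional prediction of X22: SVDs of Y1 and Z1 give the
        # patterns V and U, a diagonal least-squares fit over the three
        # known blocks gives D, and X22 = U2 D V2^T.
        Y1 = np.hstack([X11, X12])                 # row-direction data
        Z1 = np.vstack([X11, X21])                 # column-direction data
        _, _, Vt = np.linalg.svd(Y1, full_matrices=False)   # step (a)
        Uz, _, _ = np.linalg.svd(Z1, full_matrices=False)
        V, U = Vt.T[:, :k], Uz[:, :k]              # step (b): truncation
        m1, n1 = X11.shape
        U1, U2 = U[:m1], U[m1:]
        V1, V2 = V[:n1], V[n1:]
        M = np.zeros((k, k))                       # step (c): diagonal fit
        c = np.zeros(k)
        for Up, Vq, X in ((U1, V1, X11), (U1, V2, X12), (U2, V1, X21)):
            M += (Up.T @ Up) * (Vq.T @ Vq)
            c += np.array([Up[:, i] @ X @ Vq[:, i] for i in range(k)])
        d = np.linalg.solve(M, c)
        return U2 @ np.diag(d) @ V2.T              # step (d)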

9. Example 1

To illustrate the method, one-cycle-ahead predictions equivalent to twelve months were attempted for data corresponding to the number of international airline passengers per month (entering and leaving the United States) in the years 1949 to 1960. This airline data is seasonal or cyclic with a trend and has been widely analysed in the literature [3,17]. Figure 3 shows the actual passenger levels and the predicted values. The predictions were obtained using five years of immediate past data (M = 5) and two modes (k = 2). We arrived at the following conclusions after extensive numerical experiments with this data.

(a) Reasonable one-year ahead predictions can be obtained using this

method as is obvious from Fig. 3. Efficiency of the method was

gauged using the mean error and the mean squared error for each year.
- 63 -

(b) It was found that if more than five years of past data were used,

the errors marginally increased. This may be due to long-term

nonstationary effects which cannot be taken into account.

Similarly, if M was taken as less than five, predictions were poor

which may be due to insufficient statistics.

(c) Generally, acceptable predictions were obtained using only one mode (k = 1). For k = 2, corresponding to two modes, slightly better (which might not be statistically significant) predictions can be made. It was found that the ratio of 'energy' in the first mode to the second mode, d̂_1^2/d̂_2^2, is high, and this could explain the marginal differences between the predictions using one and two modes.
(d) The elements of the matrices Y_1^T Y_1 and Z_1 Z_1^T are positive and thus the first modes v̄_1 and ū_1 correspond to Perron-Frobenius eigenvectors, which have positive elements. The vector v̄_1 contained most of the seasonal information and the vector ū_1 the trend. Due to orthogonality conditions, ū_2, v̄_2 and other higher-order modes contain both positive and negative elements and thus can be considered as completely oscillatory modes.

(e) Since only past data is used, it is not possible to take into account other factors which affect the number of passengers, such as the high number of airline accidents in the United States in 1958 [19]. This might perhaps explain the large prediction errors for that year.

10. Example 2

As our second example, we have chosen the power demand on the Hydro-Quebec system from Monday 15 November to Sunday 21 November 1971 as published by Srinivasan and Pronovost [21]. We have attempted 12-hour predictions at hourly intervals at mid-day and mid-night. Since the data is approximately cyclic with a period of 24 hours, the doubling-up procedure detailed in section 7 is not required for 12-hour predictions. Figure 4 shows the actual realized load levels and the predictions using four days of past data (M = 4) and two modes (k = 2).

It can be seen from Figure 4 that the prediction errors are within reasonable limits except in the periods 1 to 12 hours on the Saturday and the Sunday. This is due to the different patterns of load which exist on weekdays and weekends. Since the "patterns" or the "modes" for prediction are based on the weekdays, such errors are to be expected. This indicates that power load data has three-dimensional properties rather than two-dimensional, with hourly, daily, and weekly correlations, which has been well recognized by power engineers for some time [21], although the significance of higher-dimensional data has not yet been fully appreciated in the literature [1-2].

11. Conclusions

A curve-fitting or prediction problem for two-dimensional or cyclic processes has been formulated and solved. The assumed statistical model is structurally similar to the Karhunen-Loève expansion, and is implemented using the singular value decomposition.

The method can be used for temporal, spatial or mixed data in

engineering, socio-economics, bio-medicine and other fields for general

two-dimensional curve-fitting and long-term prediction. A typical

application of this method could be in prediction of load demands in

power and other utilities. In such problems, enough data is usually

available for formation of the asymptotic covariances and the data is

cyclic. Furthermore, it is reasonable to assume that such systems are

excited by noise rather than by deterministic inputs which conforms to

our strictly statistical model.


- 65 -

12. References

1. Willsky, A.S.: 'Digital signal processing and control and estimation theory - points of tangency, areas of intersection, and parallel directions', M.I.T. Press, Cambridge, MA, 1979.

2. Strintzis, M.G.: 'Dynamic representation and recursive estimation of cyclic and two-dimensional processes', IEEE Trans. Automatic Control, 1978, AC-23, pp.801-809.

3. Fernando, K.V.M., and Nicholson, H.: 'Discrete double-sided Karhunen-Loève expansion', Proc.IEE, part D, 1980, 127, pp.155-160.

4. Fernando, K.V.M., and Nicholson, H.: 'Karhunen-Loève expansion with reference to singular-value decomposition and separation of variables', Proc.IEE, part D, 1980, 127, pp.204-206.

5. Fernando, K.V.M., and Nicholson, H.: 'Double-sided least-squares problem', Electronics Letters, 1979, 15, (20), pp.624-625.

6. Fernando, K.V.M., and Nicholson, H.: 'Double-sided least-squares problem with diagonal constraints', Electronics Letters, 1980, 16, (3), pp.82-83.

7. Farmer, E.D., and Potten, M.J.: 'Development of online load prediction techniques with results from trials in the south-west region of the CEGB', Proc.IEE, 1968, 115, pp.1549-1558.

8. Matthewman, P.D., and Nicholson, H.: 'Techniques for load prediction in the electricity supply industry', Proc.IEE, 1968, 115, pp.1451-1457.

9. Nicholson, H.: 'Structure of interconnected systems', Peter

Peregrinus, London, 1978.


- 66 -

10. Belik, D.D., Nelson, D.J., and Olive, D.W.: 'Use of Karhunen-Loève

expansion to analyse hourly load requirements for a power utility',

Proc.IEEE PES Winter Meeting, New York, January 1978.

11. Ivakhnenko, A.G., and Lapa, V.G.: 'Cybernetics and forecasting

techniques', Elsevier, New York, 1967.

12. Sterling, M.J.H., and Antcliffe, D.J.: 'A technique for the prediction of water demand from past consumption data only', Jl. Inst. of Water Engineers, 1974, 28, pp.413-420.

13. Nicholson, H., and Swann, C.D.: 'The prediction of traffic flow

volumes based on spectral analysis', Transpn. Res., 1974, 8,

pp.533-538.

14. Saito, 0., and Takeda, H.: 'Two-stage predictor of air pollution

adapted to daily fluctuation', Proc.IEE, 1979, 126, pp.107-112.

15. Chatfield, C., and Prothero, D.L.: 'Box-Jenkins seasonal fore-

casting: problems in a case-study', J. R. Statist. Soc. A, 1973,

136, pp.295-336.

16. Brillinger, D.R.: 'Time series: data analysis and theory',

Holt, Rinehart and Winston, New York, 1975.

17. Box, G.E.P., and Jenkins, G.M.: 'Time series analysis, forecasting

and control', Holden-Day, San Francisco, 1970.

18. Garbow, B.S., Boyle, J.M., Dongarra, J.J., and Moler, C.B.: 'Matrix eigensystem routines: EISPACK guide extensions', Lecture Notes in Computer Science, vol. 51, Springer-Verlag, Berlin, 1977.

19. Lombardo, T.G.: 'The federal aviation administration under

scrutiny', IEEE Spectrum, 1980, 17, (11), pp.53-56.

20. Fu, K.S.: 'Sequential methods in pattern recognition and machine

learning', Academic Press, New York, 1968.

21. Srinivasan, K., and Pronovost, R.: 'Short term load forecasting using

multiple correlation models', IEEE Trans. Power Apparatus and Systems,


1976, PAS-94, (5), pp.1854-1858.
- 67 -

[Figure] Fig. 1: Conventional horizontal (row) extrapolation; the hatched area indicates the data required to form the covariance matrix

[Figure] Fig. 2: Conventional vertical (column) extrapolation


[Figure] Fig. 3: The international passengers leaving and entering the USA per month (number of passengers against year, 1954-1960; actual and predicted values)

[Figure] Fig. 4: Load on the Hydro-Quebec system (actual and predicted values over Friday 13-24 hrs, Saturday 0-12 hrs, Saturday 12-24 hrs, Sunday 0-12 hrs, Sunday 12-24 hrs)


- 70 -

PART 3

Singular Perturbational Model Order Reduction

of Balanced Systems
- 71 -

CHAPTER 5

Singular Perturbational Model Reduction

of Continuous-time Balanced Systems

Abstract: The balanced representations of linear systems due to Moore are shown to be natural and convenient media for singular perturbational model reduction.

1. Introduction

The linear stable time-invariant system S(A,B,C) defined by

ẋ = Ax + Bu ,   y = Cx

x ∈ M_{n,1} ,   u ∈ M_{m,1} ,   y ∈ M_{r,1}
A ∈ M_{n,n} ,   B ∈ M_{n,m} ,   C ∈ M_{r,n}

is an input-output balanced system if the controllability Gramian W_c and the observability Gramian W_o are diagonal and equal [6-8]. In this case

W_c = W_o ≜ W ,   W_c, W_o, W ∈ M_{n,n}

where

W_c = ∫_0^∞ e^{At} B B^T e^{A^T t} dt

W_o = ∫_0^∞ e^{A^T t} C^T C e^{At} dt

and A, B, C represent balanced system matrices.


- 72 -

A linear system can be balanced using similarity transformations [6], which is the basic operation used in principal component analysis (and is also equivalent to the Karhunen-Loève expansion/transform method (see Chapter 2)) of linear systems [4,6-8].

In the balanced system approach, the reduced-order model is obtained by direct elimination of 'weak' subsystems whose contribution to the impulse response of the system is negligible. However, in the perturbational approach, a 'boundary layer' correction is used to account for the eliminated subsystem [1,2,12]. Our aim is to show that such boundary layer corrections can be accommodated in balanced models and that such models are natural representations for singular perturbational reduction.

2. Model Reduction using Balanced Systems

If the linear system S(A,B,C) is balanced [6-8], then the diagonal Gramian matrix W satisfies the Lyapunov equations,

W A^T + A W = -B B^T      (1)
W A + A^T W = -C^T C      (2)

Model reduction is achieved by considering a partition of the balanced system matrices of the form

A = [ A_11  A_12 ]      B = [ B_1 ]      C = [ C_1  C_2 ]
    [ A_21  A_22 ] ,        [ B_2 ] ,

where the matrices A_11 and A_22 are square. The two subsystems S(A_ii,B_i,C_i) are also input-output balanced and the Gramians for the subsystems are given by


- 73 -

W_i A_ii^T + A_ii W_i = -B_i B_i^T
W_i A_ii + A_ii^T W_i = -C_i^T C_i ,      i = 1,2

where W = diag(W_1, W_2). The diagonal Gramians also satisfy the following equations which relate the cross-coupling between the subsystems:

W_i A_ji^T + A_ij W_j = -B_i B_j^T
W_i A_ij + A_ji^T W_j = -C_i^T C_j ,      i,j = 1,2

If S(A_22,B_2,C_2) is a weak subsystem, then the diagonal elements of the Gramian W_2 will be small in comparison with those of W_1. By eliminating the weaker subsystem S(A_22,B_2,C_2), we obtain the reduced-order model S(A_11,B_1,C_1).

3. Model Reduction using the Singular Perturbation Method

In the singular perturbation method [1,2], the linear system defined by

ẋ_1 = A_11 x_1 + A_12 x_2 + B_1 u
μ ẋ_2 = A_21 x_1 + A_22 x_2 + B_2 u

is approximated by the reduced-order system

dx̄/dt = Ā x̄ + B̄ u ,   ȳ = C̄ x̄

where Ā = A_11 - A_12 A_22^{-1} A_21
      B̄ = B_1 - A_12 A_22^{-1} B_2
      C̄ = C_1 - C_2 A_22^{-1} A_21

The reduced-order state matrix Ā is the Schur complement of the submatrix A_22 in A. The Schur complement of a partitioned matrix is fundamental [9] in model reduction problems and also in data contraction [10], the Kron method of tearing [3], and elsewhere [7].

The zeroth-order perturbational approximation is exact for μ = 0, and is frequently used [12,13] for the case μ = 1. However, the perturbational parameter μ does not explicitly appear in the approximation and such simplifications can be considered as approximate aggregation [11].

4. Weak and Fast Subsystems

The diagonal values of the state matrix A are related directly to the elements of the Gramian matrices. Thus,

2 w_ii a_ii = -Σ_j b_ij^2

where the lower-case letters indicate the elements of the respective matrices. The relatively small elements w_ii in weak systems can occur in two ways:

(a) with large (negative) values of a_ii, which correspond to high damping (real parts of the eigenvalues), which is a property of fast systems,

(b) with small values of b_ij, which means that the states x_i are not strongly excited by impulses.

We observe that not all systems are suitable for singular

perturbational reduction particularly if mechanism (b) dominates.

However, if the eigenvalues cluster into disjoint regions, singular

perturbational reduction is possible and for such systems, mechanism

(a) will prevail. Thus, fast subsystems and weak subsystems have

common attributes in certain classes of systems.


- 75 -

5. Model Reduction of Balanced Systems using the Singular Perturbation Method

The principal component approach is elegant from the point of view of minimal realization although, as pointed out by Moore [6], model reduction through subsystem elimination is not a well understood operation. Model reduction using singular perturbational methods has achieved a certain amount of maturity, and thus it is natural to investigate whether the singular perturbational approach is compatible with the balanced representation. The following proposition answers this affirmatively.

Proposition: If the system S(A,B,C) is balanced, then the singular perturbational approximation S(Ā,B̄,C̄) also defines a balanced system. Further, the diagonal Gramian for the reduced-order system is given by the matrix W_1. Thus, the Lyapunov equations for the reduced-order system are of the form,

W_1 Ā^T + Ā W_1 = -B̄ B̄^T      (3)
W_1 Ā + Ā^T W_1 = -C̄^T C̄      (4)

Proof: The Lyapunov equation (3) can be obtained by premultiplying and postmultiplying (1) by T_1 and T_1^T, respectively, where

T_1 = [ I  :  -A_12 A_22^{-1} ]

Similarly, (4) can be obtained by premultiplying and postmultiplying (2) by T_2 and T_2^T, respectively, where

T_2 = [ I  :  -A_21^T (A_22^T)^{-1} ] ,      T_1, T_2 ∈ M_{r,n}

Since the Gramian matrix W_1 is diagonal, the reduced-order system is also balanced and this completes the proof.
- 76 -

In the subsystem elimination method using balanced systems, the retention of the dominant second-order modes (i.e. the diagonal elements of the Gramian) is considered as a criterion for model reduction. Since we arrive at the same Gramian matrix W_1, this condition is naturally satisfied in the perturbational approach. For single-input single-output systems, a shared Gramian W_1 also signifies the fact that the two systems have a common first moment (i.e. the dc gain). This can be proved using results in reference 7.

6. Numerical Procedure

To obtain a reduced-order model for the system S(A,B,C):

(a) use similarity transformations [6] to give the balanced system S(A,B,C)

(b) partition the system S(A,B,C) to give the strong subsystem S(A_11,B_1,C_1) and the weak subsystem S(A_22,B_2,C_2)

(c) check whether the weak subsystem is fast by calculating the eigenvalues of the matrices A_11 and A_22

(d) if (c) is true, calculate the reduced-order representation S(Ā,B̄,C̄); a minimal numerical sketch follows.
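The sketch below covers steps (b) and (d), assuming the system has already been balanced (a Python/NumPy illustration, not part of the original text; function and variable names are assumptions, and the eigenvalue check of step (c) is omitted):

    import numpy as np

    def sp_reduce_balanced(A, B, C, r):
        # Singular perturbational reduction of a balanced continuous-time
        # system, retaining the first r (strong) states.  The reduced
        # matrices are the Schur-complement formulas of section 3.
        A11, A12, A21, A22 = A[:r, :r], A[:r, r:], A[r:, :r], A[r:, r:]
        B1, B2 = B[:r], B[r:]
        C1, C2 = C[:, :r], C[:, r:]
        X = np.linalg.solve(A22, A21)      # A22^{-1} A21
        Y = np.linalg.solve(A22, B2)       # A22^{-1} B2
        Ar = A11 - A12 @ X                 # Schur complement
        Br = B1 - A12 @ Y
        Cr = C1 - C2 @ X
        return Ar, Br, Cr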

7. Example

To illustrate the procedure and the peculiarities of singular perturbational reduction of balanced systems, we have chosen the fourth-order model for the longitudinal dynamics of an F-8 aircraft without the wind disturbances [5].

An internally balanced representation of that model is given by

A = [ -6.560x10^-3   -7.577x10^-2    7.390x10^-4    3.564x10^-3 ]
    [  7.577x10^-2   -8.383x10^-3    9.204x10^-4    4.445x10^-3 ]
    [ -9.171x10^-4    1.142x10^-3   -9.219x10^-2   -3.086       ]
    [  3.597x10^-3   -4.486x10^-3    3.136          -1.816      ]

B = [ -4.713   4.831   -3.293x10^-1   1.292 ]^T

C = [ 5.530x10^-4    1.231x10^-3   -1.951x10^-1   -1.743x10^-1 ]
    [ 4.713          4.831         -2.653x10^-1   -1.281       ]

with W = diag(1.693x10^3, 1.392x10^3, 5.882x10^-1, 4.597x10^-1).

If the system is partitioned into 2x2 subsystems in the natural order, the eigenvalues of the subsystems are given by

λ(A_11) = -7.472x10^-3 ± j 7.577x10^-2
λ(A_22) = -9.543x10^-1 ± j 2.989

Thus, the subsystem S(A_22,B_2,C_2) is considerably weak and fast compared with the subsystem S(A_11,B_1,C_1).

The balanced reduced-order model is then given by

Ā = [ -6.562x10^-3   -7.577x10^-2 ]      B̄ = [ -4.713 ]
    [  7.577x10^-2   -8.380x10^-3 ]           [  4.831 ]

C̄ = [ 5.510x10^-4   8.593x10^-4 ]
    [ 4.713         4.831       ]
- 78 -

8. Conclusions

The singular perturbational model reduction technique is

accommodated in the framework of internally balanced systems which

gives a new unified technique which retains the advantages of both

methods.
In some physical systems, singular perturbational parameters can be explicitly identified [1]. However, in general, this may not be possible and difficulties can be encountered in identifying the subsystems to be eliminated. Usually, this requires solution of (quadratic) Riccati equations to determine the fast systems [12]. By using balanced systems, we have avoided such difficulties and the balancing operation requires only solution of (linear) Lyapunov equations. The second-order modes of the system (i.e. the diagonal elements of the matrix W) in some way act as the perturbational parameters. However, further research is necessary to investigate such aspects.

In this Chapter, we have exploited the Schur complement in

perturbational reduction. However, it appears in a wider context

in aggregation and in the Kron method of tearing among other areas,

and thus it is reasonable to speculate that the balanced approach

of Moore is applicable in these areas as well.


- 79 -

9. References
1. Kokotovic, P.V., O'Malley, R.E., and Sannuti, P.: 'Singular

perturbations and order reduction in control theory - An

overview', Automatica, 1976, 12, pp.123-132.


2. Sandell, Jr., N.R., Varaiya, P., Athans, M., and Safonov, M.G.: 'Survey of decentralized control methods for large scale systems', IEEE Trans. Automatic Control, 1978, AC-23, pp.108-128.

3. Nicholson, H.: 'Structure of Interconnected Systems', Peregrinus,

London, 1978.
4. Fernando, K.V.M., and Nicholson, H.: 'Karhunen-Loève expansion with reference to singular-value decomposition and separation of variables', Proc.IEE, 1980, 127D, pp.204-206.

5. Teneketzis, D., and Sandell, Jr., N.R.: 'Linear regulator design

for stochastic systems by a multiple time-scales method', IEEE

Trans. Automatic Control, 1977, AC-22, pp.615-621.

6. Moore, B.C.: 'Principal component analysis in linear systems:

Controllability, observability and model reduction', IEEE Trans.

Automatic Control, 1981, AC-26, pp.17-32.

7. Fernando, K.V., and Nicholson, H.: 'On the structure of balanced

and other principal representations of SISO systems', IEEE Trans.

Automatic Control, 1983, AC-28, (3), (tentative).

8. Pernebo, L., and Silverman, L.M.: 'Balanced systems and model

reduction', Proc.IEEE Conf. Decision and Control, Fort Lauderdale,

Florida, pp.865-867, December 1979.

9. Cottle, R.W.: 'Manifestations of the Schur complement', Linear

Algebra and its Applications, 1974, 8, pp.189-211.


-80-

10. Brillinger, D.R.: 'Time Series: Data Analysis and Theory',

Holt, Rinehart and Winston, New York, 1974.

11. Aoki, M.: 'Control of large-scale dynamic systems by

aggregation', IEEE Trans. Automatic Control, 1968, AC-13,

pp.246-253.
12. O'Malley, R.E., Anderson, L.R.: 'Singular perturbations, order

reduction, and decoupling of large-scale systems', in "Numerical

Analysis of Singular Perturbation Problems", (Hemker, P.W., and

Miller, J.J.H., eds.), Academic Press, London, 1979.

13. Calise, A.J.: 'Singular perturbation methods for variational problems in aircraft flight', IEEE Trans. Automatic Control, 1976, AC-21, pp.345-353.


- 81 -

CHAPTER 6

Singular Perturbational Approximations for Discrete-time Balanced Systems

Abstract: A new zero-order singular perturbational type approximation

is developed for model reduction of discrete-time linear systems.

This approximation is also suitable for model reduction of systems

which have fast subsystems and which are represented in the internally

balanced format. Subsystem elimination as suggested by Moore for

continuous-time systems does not generalise for the discrete-time

case and the singular perturbational approximation gives a particular

solution to the discrete-time internally balanced model reduction

problem.

1. Introduction

Principal axis realizations for discrete-time linear systems were first introduced by Mullis and Roberts [1,2] in the synthesis of minimum roundoff noise fixed-point digital filters. The problem under consideration was to find the optimum word length necessary in registers to optimize the storage and quantization efficiencies simultaneously. These results were extended by Moore [3] for continuous-time linear systems.
These results were extended by Maore for continuous-time linear systems.

The storage and quantization effects can be translated into controllability

and observability properties, respectively, if the control systems

terminology is used. The best trade-off between high controllability

with low observability and low controllability with high observability

is provided by internally balanced principal axis state-space representa-

tions, which contain equal amounts of information about controllability

and observability. Such balanced representations are convenient media

for model reduction since equal amounts of information about control la-

bility and observability can be neglected without causing any imbalance

in controllability or observability properties.


- 82 -

For continuous-time balanced systems, Moore [3] proposed direct elimination of weak subsystems, which are characterized by small second-order modes, as a first approximation in the model reduction problem. This is a convenient and very acceptable technique since the resultant reduced-order model is also internally balanced and retains the dominant second-order modes of the original system. However, this technique does not generalize to the discrete-time case, in the sense that the reduced-order system is neither internally balanced nor contains the dominant second-order modes. The object of this Chapter is to demonstrate that if a zero-order singular perturbational approximation is used in model reduction of discrete-time systems which have fast subsystems, internally balanced reduced-order representations can be obtained which

2. Internally Balanced Discrete-time Systems

For the asymptotically stable, controllable and observable discrete-time linear system S(A,B,C) described in the format,

x(k+1) = Ax(k) + Bu(k) ,   y(k) = Cx(k)

the controllability Gramian matrix W_c(p) and the observability Gramian matrix W_o(p) are defined by

W_c(p) = Σ_{k=0}^{p} A^k B B^T (A^T)^k

W_o(p) = Σ_{k=0}^{p} (A^T)^k C^T C A^k

If the system S(A,B,C) is a principal axis representation, then the controllability and the observability Gramians are diagonal. If a system is represented by some other canonical form, then it can be brought into a principal axis representation by using similarity transformations [1,2].

If a similarity transformation is used to bring them to be equal and diagonal, then the system is said to be internally balanced [3,4,9]. For the infinite-time definition with p → ∞, the Gramian matrices W_c ≜ W_c(∞) and W_o ≜ W_o(∞) may be computed by solving the discrete-time equivalents of the Lyapunov equations,

W_c - A W_c A^T = B B^T
W_o - A^T W_o A = C^T C

For the internally balanced system S(A,B,C), the matrix equations are given by

W - A W A^T = B B^T      (1)
W - A^T W A = C^T C      (2)

where W is a diagonal matrix. The assumed asymptotic stability, controllability and observability properties ensure that the diagonal values of W, which are called the second-order modes, are positive. We assume, without loss of generality, that these are ordered in decreasing order of magnitude, and that they are distinct.

If the diagonal element w_ii of the Gramian matrix W is small in comparison with the other elements, then the state x_i of the controllable system

x(k+1) = Ax(k) + Bu(k)

contributes marginally to the impulse response. Similarly, the contribution towards the impulse response of the observable system

x^d(k+1) = A^T x^d(k) + C^T u^d(k)

by the dual state x_i^d is small.
- 84 -

If the Gramian matrix W of dimension n,n is partitioned in the format,

W = [ W_1   0  ]
    [  0   W_2 ]

where W_1 and W_2 are diagonal matrices of dimensions r,r and n-r,n-r respectively, then the system S(A,B,C) can be partitioned to conform to the above as,

A = [ A_11  A_12 ]      B = [ B_1 ]      C = [ C_1  C_2 ]
    [ A_21  A_22 ] ,        [ B_2 ] ,

We assume that the diagonal elements of W_2 are appreciably smaller than those of W_1 and, following Moore [3], we call the subsystem S(A_11,B_1,C_1) the strong subsystem and the subsystem S(A_22,B_2,C_2) the weak subsystem. However, we emphasise that the subsystems S(A_ii,B_i,C_i), i = 1,2, are not internally balanced as in the case of the continuous-time equivalent and the submatrices W_i, i = 1,2, are not balanced Gramians of the subsystems. These conditions preclude direct extension of the continuous-time results of Moore [3] to discrete-time systems.

3. Singular Perturbational Reduction of Discrete-time Systems

In a survey of large-scale systems [5], it was pointed out that singular perturbational results for discrete-time systems are not widely available. This situation has been remedied to some extent in the literature (see for example references 6-8). However, the published results are not directly suitable for our problem.

For continuous-time systems, a subsystem is said to be fast if the eigenvalues of the subsystem are large (i.e. s → -∞) in the complex plane, and a subsystem is said to be slow if its eigenvalues are near the origin (i.e. s ≈ 0). We carry this definition to the discrete-time case, where a subsystem is said to be fast if the eigenvalues are near the origin (i.e. z ≈ 0) and slow if they are near z = 1 in the complex plane. This extension is consistent with the well-known sampled-data approximation z = e^{sT}, where T is the sampling time.

It is usually assumed in models of singular perturbation that

the fast states rapidly approach a linear combination of the slow

states. However, in references 6-8, it has been assumed that the

slow states can be approximated using the fast states which is the

converse of the usual assumption. Such a hypothesis leads to

approximations in fast-time rather than in slow-time.

In this section, we develop a slow-time approximation for discrete-time systems. We recall that a continuous-time system S(F,G,H) can be written in the singular perturbational format,

ẋ_1 = F_11 x_1 + F_12 x_2 + G_1 u      (slow-time)
μ ẋ_2 = F_21 x_1 + F_22 x_2 + G_2 u     (fast-time)

where μ is a positive small perturbational parameter. By considering the approximation ẋ(t) ≈ [x(k+1) - x(k)]/T, we propose the analogous singular perturbational discrete-time model S_μ(A,B,C) in the format,

x_1(k+1) - x_1(k) = (A_11 - I) x_1(k) + A_12 x_2(k) + B_1 u(k)
μ[x_2(k+1) - x_2(k)] = A_21 x_1(k) + (A_22 - I) x_2(k) + B_2 u(k)

As in the continuous-time case, we assume that the matrices A_12 and A_21 are small and that the subsystem matrix A_22 is fast. With μ → 0, we obtain the approximation

x_1(k+1) = Ā x_1(k) + B̄ u(k)      (slow-time)

Similarly, by considering the observable system, the reduced-order approximation in slow-time can be obtained as S(Ā,B̄,C̄), with

Ā = A_11 + A_12 (I - A_22)^{-1} A_21
B̄ = B_1 + A_12 (I - A_22)^{-1} B_2
C̄ = C_1 + C_2 (I - A_22)^{-1} A_21

We observe that the inverse of the matrix (I - A_22) always exists under our assumptions. The above approximation can be derived independently without considering the controllable and the observable systems separately [11] (see Appendix 3). To our knowledge, these reduced-order approximations have not appeared in the control systems literature. However, structurally similar "aggregations" have been used by Leontief in econometric problems [12].
- 87 -

4. Weak and Fast Subsystems

If fast subsystems are present in the system, then the transients associated with these subsystems will vanish quickly and thus the overall contribution towards the impulse response of the system will be small. Thus, fast subsystems will be characterized by relatively small diagonal elements of the matrix W, corresponding to weak subsystems. However, we do not imply that all weak subsystems are fast, and thus this property should be checked at each instance of application of the proposed method of reduction.

Numerical experience [10] (see Chapter 5) with continuous-time systems which are suitable for perturbational reduction indicates that fast subsystems are substantially weak, and this property can also be expected in discrete-time systems.

5. Singular Perturbational Reduction of Balanced Systems

If the weak subsystem is also a fast subsystem, then it can be removed using singular perturbational approximations. The following proposition shows a means of achieving this.

Proposition: If the system S(A,B,C) is internally balanced, then the singular perturbational approximation S(Ā,B̄,C̄) also defines an internally balanced system. Further, the diagonal Gramian for the reduced-order system is given by the matrix W_1. Thus, the matrix equations which describe the Gramian are of the form,

W_1 - Ā W_1 Ā^T = B̄ B̄^T      (3)
W_1 - Ā^T W_1 Ā = C̄^T C̄      (4)

Proof: Equation (3) can be obtained by premultiplying and postmultiplying (1) by T_1 and T_1^T, respectively, where

T_1 = [ I  :  A_12 (I - A_22)^{-1} ]

Similarly, (4) can be obtained by premultiplying and postmultiplying (2) by T_2 and T_2^T, respectively, where

T_2 = [ I  :  A_21^T (I - A_22^T)^{-1} ]

Since the Gramian matrix W_1 is diagonal, the reduced-order system is also internally balanced and this completes the proof.

The next proposition indicates that the reduced model has very desirable properties.

Proposition: The reduced-order model S(Ā,B̄,C̄) is asymptotically stable, controllable, and observable.

Proof: The positive definiteness of the diagonal matrix W_1 guarantees these properties [4].

We observe that the results given in this section are algebraically correct whether or not the system conforms to the perturbational model developed in section 3. However, the reduced-order model S(Ā,B̄,C̄) cannot be considered a good approximation of the original system S(A,B,C) merely because it retains the dominant singular values. Thus, the supporting evidence from the singular perturbational model is required to justify our approximation.

6. Numerical Procedure

To obtain a reduced-order model for the system S(A,B,C):

(a) use similarity transformations [4,9] to give the balanced system S(A,B,C)

(b) partition the system S(A,B,C) to give the strong subsystem and the weak subsystem using the diagonal values of the Gramian matrix W

(c) check whether the weak subsystem is fast by calculating the eigenvalues of the matrix A_22 and check the smallness of A_12 and A_21

(d) if (c) is true, calculate the zero-order singular perturbational approximation (a numerical sketch follows)

Ā = A_11 + A_12 (I - A_22)^{-1} A_21
B̄ = B_1 + A_12 (I - A_22)^{-1} B_2
C̄ = C_1 + C_2 (I - A_22)^{-1} A_21
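A corresponding sketch of step (d) for the discrete-time case (Python/NumPy, not part of the original text; balancing and the fastness checks of steps (a)-(c) are assumed to have been carried out):

    import numpy as np

    def sp_reduce_balanced_dt(A, B, C, r):
        # Zero-order singular perturbational reduction of a balanced
        # discrete-time system, retaining the first r (strong) states.
        A11, A12, A21, A22 = A[:r, :r], A[:r, r:], A[r:, :r], A[r:, r:]
        B1, B2 = B[:r], B[r:]
        C1, C2 = C[:, :r], C[:, r:]
        S = np.eye(A22.shape[0]) - A22     # (I - A22), invertible here
        X = np.linalg.solve(S, A21)        # (I - A22)^{-1} A21
        Y = np.linalg.solve(S, B2)         # (I - A22)^{-1} B2
        Ar = A11 + A12 @ X
        Br = B1 + A12 @ Y
        Cr = C1 + C2 @ X
        return Ar, Br, Cr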

7. Conclusions

We have demonstrated the feasibility of singular perturbational

model reduction of systems which are represented in internally balanced

formats. This is particularly significant, since the subsystem elimination method proposed by Moore [3] breaks down for the discrete case. However, the perturbational approximations are valid for the discrete case as well as for the continuous case [10] (Chapter 5).

Due to this inherent consistency of perturbational reduction in

internally balanced systems, it may be possible to use other techniques

available in the singular perturbational approach for the design and

analysis of balanced systems. Since the internally balanced

representations are well-conditioned with respect to controllability

and observability, which are fundamental in control and other systems

studies, balanced representations could be used to give a more format

structure with respect to controllability and observability in

singular perturbation analysis.


- 90 -

8. References
1. Mullis, C.T., and Roberts, R.A.: 'Synthesis of minimum roundoff noise fixed point digital filters', IEEE Trans. Circuits and Systems, 1976, CAS-23, (8), pp.551-561.

2. Mullis, C.T., and Roberts, R.A.: 'Roundoff noise in digital

filters: Frequency transformations and invariants', IEEE Trans.

Acoustics, Speech and Signal Processing, 1976, ASSP-24, pp.538-550.

3. Moore, B.C.: 'Principal component analysis in linear systems:

Controllability, observability, and model-reduction', IEEE Trans.

Automatic Control, 1981, AC-26, (1), pp.17-32.

4. Pernebo, L., and Silverman, L.M.: 'Balanced systems and model

reduction', Proc. IEEE Conf. on Decision and Control, Fort

Lauderdale, Florida, pp.865-867, Dec. 1979.


5. Sandell, Jr., N.R., Varaiya, P., Athans, M., and Safonov, M.G.: 'Survey of decentralized control methods for large scale systems', IEEE Trans. Automatic Control, 1978, AC-23, pp.108-128.

6. Phillips, R.G.: 'Reduced modelling and control of two-time scale

discrete systems', Int. J. Control, 1980, 31, pp.765-780.

7. Rajagopalan, P.K., and Naidu, S.: 'A singular perturbed difference

equation in optimal control problems', Int. J. Control, 1980, 32,

pp.925-936.
8. Blankenship, G.: 'Singularly perturbed difference equations in

optimal control problems', IEEE Trans. Automatic Control, 1981,

AC-26, (4), pp.911-917.


9. Kung, S.-Y., and Lin, D.W.: 'A state-space formulation for optimal Hankel-norm approximations', IEEE Trans. Automatic Control, 1981, AC-26, (4), pp.942-946.


- 91 -

10. Fernando, K.V., and Nicholson, H.: 'Singular perturbational

reduction of balanced systems', IEEE Trans. Automatic Control,

1982, AC-27, (2), pp.466-468.

11. Fernando, K.V., and Nicholson, H.: 'Singular perturbational

model reduction in the frequency domain', IEEE Trans. Automatic

Control, 1982, AC-27, (4).

12. Leontief, W.: 'An alternative to aggregation in input-output


analysis and national accounts', The Review of Economics and

Statistics, 1967, 49, pp.4l2-4l9.


- 92 -

CHAPTER 7

On Balanced Model-Order Reduction of Discrete-Time
Systems and their Continuous-Time Equivalents

Abstract: The model-order reduction method for continuous-time

systems based on subsystem elimination as proposed by Moore does

not generalise to the discrete-time case. In this Chapter, we

develop the discrete-time equivalent of the continuous-time technique

using the Cayley transformation between discrete-time and continuous-

time equivalents. The new reduced-order approximations based on

this method are exactly balanced and retain the dominant modes of

the system. The suitability of this result is verified using a

singular perturbational model.

1. Introduction

The balanced model-order reduction method of Moore [1] is based on removal of 'modes' which are weak with respect to joint controllability and observability (i.e. the minimality), as characterised by the controllability Gramian matrix W_c and the observability Gramian matrix W_o. The technique can be implemented by realization of the system in the balanced format, in which case the two Gramian matrices are equal and diagonal. The diagonal values are called

the second order modes of the system. The robust part of the system

is represented by relatively high second-order modes and the weak

part (if any) by low values. The direct removal of the weak sub-

system gives a robust approximation of the original system. Such

approximate representations retain the dominant second-order modes

of the original system and are balanced.


- 93 -

Direct generalization of this technique to the discrete-time case is not possible. If subsystem elimination is used for discrete-time systems, the resultant approximation neither retains the dominant second-order modes nor is it balanced [2,3].

In this Chapter, we develop the discrete-time equivalent of the continuous-time balanced technique of Moore using the Cayley transformation. This method gives reduced-order balanced representations which retain the dominant modes of the original system. The suitability of the approximation is investigated using a singular perturbational model. Furthermore, it is shown that this approximation can be considered as a 'generalized singular perturbational' technique [6] (Appendix 3). This solution is also complementary to the singular perturbational approximation for balanced discrete-time systems [5] (Chapter 6). For completeness, we summarize the balanced approximations for discrete-time systems and their equivalents for continuous-time systems.

2. Preliminaries

For the continuous-time asymptotically stable time-invariant system S(A,B,C)

ẋ(t) = Ax(t) + Bu(t) ,   y(t) = Cx(t)

the controllability Gramian matrix W_c and the observability Gramian matrix W_o can be obtained as the solution of the Lyapunov equations given by

W_c A^T + A W_c = -B B^T
W_o A + A^T W_o = -C^T C
- 94 -

If the system S(A,B,C) is a balanced representation [1], then the Gramian matrices are diagonal and equal,

W_c = W_o ≜ W

The diagonal values of the Gramian matrix W are called the second-order modes, and we assume that they are ordered in non-increasing order of magnitude. We may partition the matrix W in the format,

W = [ W_1   0  ]
    [  0   W_2 ]

where the diagonal matrix W_1 contains the dominant modes of the system and the matrix W_2 contains the non-dominant values. Conforming to the above partition, the system S(A,B,C) can be decomposed in the format

A = [ A_11  A_12 ]      B = [ B_1 ]      C = [ C_1  C_2 ]
    [ A_21  A_22 ] ,        [ B_2 ] ,

By elimination of the weak subsystem S(A_22,B_2,C_2), we may obtain the reduced-order approximation of the original system as the subsystem S(A_11,B_1,C_1), which is a balanced representation with the Gramian matrix W_1.
For the discrete-time asymptotically stable time-invariant system S(F,G,H)

x(k+1) = Fx(k) + Gu(k) ,   y(k) = Hx(k)

the Gramian matrices can be obtained as the solution of the matrix equations given by

W_c - F W_c F^T = G G^T
W_o - F^T W_o F = H^T H

If the system is balanced, then

W_c = W_o ≜ W

where W is a diagonal matrix ordered in non-increasing order of magnitude. As in the continuous-time case, we may partition the discrete-time system in the format

F = [ F_11  F_12 ]      G = [ G_1 ]      H = [ H_1  H_2 ]
    [ F_21  F_22 ] ,        [ G_2 ] ,

The subsystem S(F_11,G_1,H_1) has been suggested [2,3] as a reduced-order approximation of the original system. However, this subsystem is not a balanced representation and does not retain the dominant second-order modes W_1. Thus, this approximation method cannot be considered as the generalization of the balanced method of Moore [1].

3. Balanced Model Order Reduction

It is well known [7] that the Gramian matrices are identically equal if the continuous-time system S(A,B,C) and the discrete-time system S(F,G,H) are related by the Cayley transformation defined by

A = -(I + F)^{-1}(I - F)
B = ±√2 (I + F)^{-1} G
C = ±√2 H (I + F)^{-1}

This also corresponds to transformations between immittance matrix and scattering matrix descriptions in network theory [7,10].

If the weak subsystem S(A_22,B_2,C_2) corresponding to the balanced Gramian matrix W_2 is removed, then the balanced approximation S(A_11,B_1,C_1) is given by

A_11 = -(I + F̄)^{-1}(I - F̄)
B_1 = ±√2 (I + F̄)^{-1} Ḡ
C_1 = ±√2 H̄ (I + F̄)^{-1}

where F̄ = F_11 - F_12 (I + F_22)^{-1} F_21
      Ḡ = G_1 - F_12 (I + F_22)^{-1} G_2
      H̄ = H_1 - H_2 (I + F_22)^{-1} F_21

This result may be verified using the well-known matrix lemma [10] for inversion of partitioned matrices. The reduced-order system S(A_11,B_1,C_1) has the balanced Gramian matrix W_1 and the equivalent discrete-time system S(F̄,Ḡ,H̄) also has the same balanced Gramian matrix.

Thus, the required approximation corresponding to subsystem elimination in continuous-time is given by the discrete-time subsystem S(F̄,Ḡ,H̄) if the Cayley transformation is assumed as the criterion of equivalence between continuous-time and discrete-time systems; a numerical sketch of this construction follows.
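The following is a Python/NumPy sketch, not part of the original text; names are assumptions of the illustration, and (I + F_22) is assumed well conditioned, as required by the interpretation of the next section:

    import numpy as np

    def cayley_balanced_reduce(F, G, H, r):
        # Discrete-time balanced reduction equivalent, via the Cayley
        # transformation, to subsystem elimination in continuous time:
        # returns S(Fbar, Gbar, Hbar) with the first r states retained.
        F11, F12, F21, F22 = F[:r, :r], F[:r, r:], F[r:, :r], F[r:, r:]
        G1, G2 = G[:r], G[r:]
        H1, H2 = H[:, :r], H[:, r:]
        S = np.eye(F22.shape[0]) + F22       # (I + F22)
        X = np.linalg.solve(S, F21)          # (I + F22)^{-1} F21
        Y = np.linalg.solve(S, G2)           # (I + F22)^{-1} G2
        Fbar = F11 - F12 @ X
        Gbar = G1 - F12 @ Y
        Hbar = H1 - H2 @ X
        return Fbar, Gbar, Hbar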

4. The Singular Perturbational Interpretation

It has been demonstrated previously [4,5] (Chapters 5, 6 and Appendix 3) that balanced model-order reduction is consistent with the singular perturbational technique. We now describe the derived reduced-order model S(F̄,Ḡ,H̄) using a singular perturbational argument. To obtain this approximation, we define the singular perturbational model S_μ(F,G,H) in the form,

x_1(k+1) + x_1(k) = (F_11 + I) x_1(k) + F_12 x_2(k) + G_1 u(k)      (fast-time)
μ[x_2(k+1) + x_2(k)] = F_21 x_1(k) + (F_22 + I) x_2(k) + G_2 u(k)      (slow-time)

where μ denotes a small perturbational parameter. As usual, we assume that the matrices F_12 and F_21 which define the interaction between the subsystems are small. Then, with μ → 0, we may obtain the required approximations as

x_1(k+1) = F̄ x_1(k) + Ḡ u(k)      (fast-time)
x_2(k) = -(I + F_22)^{-1} F_21 x_1(k) - (I + F_22)^{-1} G_2 u(k)      (slow-time)

It is obvious that the approximation is valid only if the eigenvalues of the matrix sum (I + F_22) are large. This is equivalent to the requirement that the eigenvalues of the matrix F_22 are near z = 1 in the complex plane. In addition, we assume that the eigenvalues of the matrix F_11 are away from z = 1 and thus the subsystem S(F_11,G_1,H_1) is a 'fast' subsystem.

Similarly, by considering the observable system, it is possible to derive the complete singular perturbational approximation in 'fast-time' as S(F̄,Ḡ,H̄).

5. Generalized Singular Perturbational Balanced Approximations


Fernando and Nicholson^6 (Appendix 3) demonstrated that balanced approximations are special cases of 'generalized singular perturbational' approximations. There are two possible balanced approximations for continuous-time systems with equivalent results in discrete-time. For completeness, we review these results and indicate the applicability and suitability of each reduced-order representation.



The generalized approximation S(Ag,Bg,Cg) for continuous-time systems is of the form,

Ag(s0) = A11 + A12 (s0 I - A22)^{-1} A21
Bg(s0) = B1 + A12 (s0 I - A22)^{-1} B2
Cg(s0) = C1 + C2 (s0 I - A22)^{-1} A21

where s0 is the dominant frequency of the robust subsystem S(A11,B1,C1) and the non-dominant frequency of the weak subsystem S(A22,B2,C2). Similarly, for discrete-time systems, they are given by

Fg(z0) = F11 + F12 (z0 I - F22)^{-1} F21
Gg(z0) = G1 + F12 (z0 I - F22)^{-1} G2
Hg(z0) = H1 + H2 (z0 I - F22)^{-1} F21

If the frequency s0 is negative infinite, corresponding to a 'fast' subsystem S(A11,B1,C1) and a 'slow' subsystem S(A22,B2,C2), then the result obtained by Moore through subsystem elimination can be derived,

S(A11,B1,C1) = S(Ag(-∞), Bg(-∞), Cg(-∞))

The corresponding discrete-time equivalent with z0 = -1 is given by

S(F̄,Ḡ,H̄) = S(Fg(-1), Gg(-1), Hg(-1))

If the frequency s0 is zero, corresponding to a 'slow' subsystem S(A11,B1,C1) and a 'fast' subsystem S(A22,B2,C2), then the result obtained by Fernando and Nicholson^4 (Chapter 5) manifests,

S(Â,B̂,Ĉ) = S(Ag(0), Bg(0), Cg(0))

which is the traditional singular perturbational approximation given by the Schur complement. The corresponding approximation in discrete-time with z0 = 1 is then given by

S(F̂,Ĝ,Ĥ) = S(Fg(1), Gg(1), Hg(1))

which was derived by Fernando and Nicholson^5 (Chapter 6).

Thus, it is possible to present a unified approach for balanced

model-order reduction using the generalized singular perturbational

approach.
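A minimal sketch of the unified formulas, with the interface assumed for illustration; the same routine covers both the s0 = 0 and the large negative s0 cases discussed above:

```python
# Minimal sketch of the generalized approximation S(Ag(s0),Bg(s0),Cg(s0)).
import numpy as np
from scipy.linalg import solve

def generalized_approximation(A, B, C, r, s0):
    # s0 = 0 gives the Schur-complement ('slow') result; a large negative
    # s0 approaches Moore's direct elimination S(A11,B1,C1); the discrete
    # analogue uses (z0*I - F22) with z0 = 1 or z0 = -1.
    A11, A12, A21, A22 = A[:r, :r], A[:r, r:], A[r:, :r], A[r:, r:]
    B1, B2 = B[:r, :], B[r:, :]
    C1, C2 = C[:, :r], C[:, r:]
    M = s0 * np.eye(A22.shape[0]) - A22
    return (A11 + A12 @ solve(M, A21),
            B1 + A12 @ solve(M, B2),
            C1 + C2 @ solve(M, A21))
```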

6. Conclusions

We have developed the equivalent of the subsystem elimination

method of Moore in discrete-time and have shown that it does not

generalise in the manner suggested by Moore. Instead, balanced

model-order reduction which retains the dominant second-order modes

is best explained using singular perturbational arguments.

For completeness, we have demonstrated that there are two

continuous-time generalized approximations which can give balanced

reduced-order models. The 'fast' and 'slow' approximations have

equivalent counterparts in discrete-time systems through the Cayley

transformation. In network theory, this relationship can be

considered as the equivalence between immittance matrix and

scattering matrix descriptions.

Perhaps the feasibility of singular perturbational model-order approximations in the framework of balanced systems was first suggested by Verriest and Kailath^9 but was not elaborated by those

authors. We have demonstrated conclusively that the singular

perturbational technique is the central theme in balanced model-

order reduction in both continuous-time and discrete-time systems.

7. References

1. Moore, B.C.: 'Principal component analysis in linear systems: Controllability, observability, and model reduction', IEEE Trans. Automatic Control, 1981, AC-26, (1), pp.17-32.

2. Kung, S.: 'A new identification and model reduction algorithm via singular value decomposition', 12th Asilomar Conf. Circuits, Systems, and Computers, Nov. 1978.

3. Silverman, L.M., and Bettayeb, M.: 'Optimal approximations of linear systems', Proc. JACC, paper FA8-A, August 1980.

4. Fernando, K.V., and Nicholson, H.: 'Singular perturbational reduction of balanced systems', IEEE Trans. Automatic Control, 1982, AC-27, (2), pp.466-468.

5. Fernando, K.V., and Nicholson, H.: 'Singular perturbational approximations for discrete-time systems', IEEE Trans. Automatic Control, 1983, AC-28, (1).

6. Fernando, K.V., and Nicholson, H.: 'Singular perturbational model-order reduction in the frequency domain', IEEE Trans. Automatic Control, 1982, AC-27, (4).

7. Anderson, B.D.O., and Vongpanitlerd, S.: 'Network Analysis and Synthesis', Prentice-Hall, Englewood Cliffs, NJ, 1973.

8. Pernebo, L., and Silverman, L.M.: 'Model reduction via balanced state space representations', IEEE Trans. Automatic Control, 1982, AC-27, (2), pp.382-387.

9. Verriest, E., and Kailath, T.: 'On generalized balanced realizations', Proc. 19th IEEE Conf. Decision and Control, Albuquerque, NM, pp.504-505, Dec. 1980.

10. Nicholson, H.: 'Structure of Interconnected Systems', Peregrinus, London, 1978.


CHAPTER 8

Reciprocal Transformations in Balanced Model-Order Reduction

Abstract: Direct elimination of weak subsystems and singular

perturbational approximations given by the Schur complement have

been suggested for model-order reduction of balanced linear systems.

These two approaches are dual to each other and such reciprocal

approximations are well known in other model-order reduction

techniques. A standard example is used to illustrate the two

reciprocal approaches.

1. Introduction

Moore^1 was able to interpret, in an elegant manner, the minimal

realization problem and the model-order reduction problem in linear

systems theory from the point of view of 'signal injection'.

Instead of relying on the classical parameters of the system, which

may be susceptible to structural instabilities, Moore based the

realization on second-order averages of the controllable part of

the system and the observable part which are excited by impulse

inputs. These averages, given by the controllable Gramian matrix

and the observable Gramian matrix are central to the realization of

internally balanced models. Using this approach, Moore was able

to show that 'nearly optimal' reduced-order representations which

have approximately the same impulse responses as the original system

can exist. This is achieved by removing the 'weak' subsystems and



retaining the robust part of the system corresponding to the 'strong'

second-order modes. These second-order modes are given by Gramian

matrices which are diagonal and equal for balanced realizations.


Fernando and Nicholson^4 (Chapter 5) considered the special case

where the weak subsystem corresponds to fast dynamics and the strong

subsystem to slow dynamics of the system. Most physical systems

behave in this manner, and the concept has been exploited in modal

methods of model-order reduction. Singular perturbational

approximations are feasible for this case and a low-frequency

approximation at s =0 is given by the usual Schur complement result.


Fernando and Nicholson^5 (Appendix 3) also demonstrated that if the weak subsystem and the strong subsystem are due to slow and fast dynamics, respectively, then a 'generalized singular perturbational' approximation is possible at s = -∞. This result is given by direct

elimination of the weak subsystem which was first proposed by Moore

as a 'balanced model-order reduction'. However, Moore derived this

result using reasonable but heuristic arguments and no conventional

explanation was given. In both these approaches, the resultant

reduced-order representations are balanced and retain the dominant

second-order modes of the original system.

The object of this Chapter is to show that the direct elimination

method of Moore and the usual perturbational result are related by

a reciprocal transformation. Similar transformations are well known


in other model-order reduction techniques^{2,3,10} (Appendix 4). We

demonstrate the viability of the two approaches and compare the

results using a standard example.



2. Internally Balanced Models and their Reciprocals

For the time-invariant, asymptotically stable linear system S(A,B,C) described by

ẋ(t) = Ax(t) + Bu(t),    y(t) = Cx(t)

the controllability Gramian matrix W_c^2 and the observability Gramian matrix W_o^2 may be defined in the infinite interval by

W_c^2 = ∫_0^∞ (e^{At}B)(e^{At}B)^T dt

W_o^2 = ∫_0^∞ (e^{A^T t}C^T)(e^{A^T t}C^T)^T dt

These Gramian matrices can also be obtained as the solutions of the following Lyapunov equations

A W_c^2 + W_c^2 A^T = -BB^T
A^T W_o^2 + W_o^2 A = -C^T C

The state-space representation S(A,B,C) is said to be internally balanced^1 if the Gramian matrices W_c^2 and W_o^2 are equal and diagonal,

W_c^2 = W_o^2 = W^2

Internally balanced representations can be obtained using similarity transformations^1 and, without loss of generality, we assume that the system S(A,B,C) is internally balanced with the diagonal Gramian matrix W^2. The diagonal elements of the Gramian matrix W^2 are called the second-order modes of the system and we assume that they are ordered as a non-increasing sequence.



We define the 'reciprocal system' S(F,G,H) of S(A,B,C) by

F = A^{-1},    G = A^{-1}B,    H = CA^{-1}

The system S(F,G,H) is also asymptotically stable and the Lyapunov equations are given by

F W^2 + W^2 F^T = -GG^T
F^T W^2 + W^2 F = -H^T H

The reciprocal system S(F,G,H) is also controllable and observable and has the same diagonal Gramian matrix W^2.

It is easily seen that the reciprocal system of S(F,G,H) is given by the system S(A,B,C) and thus these two systems are dual to each other. This dual relationship is also reflected in the Markov parameters and the moments of the system. The kth moment of the system S(A,B,C) is equal to the (k-1)th Markov parameter of the system S(F,G,H),

CA^{-k}B = HF^{k-2}G,    k > 1

Similarly, the kth moment of the system S(F,G,H) is given by the (k-1)th Markov parameter of the system S(A,B,C),

HF^{-k}G = CA^{k-2}B

These relations indicate that a reciprocal system has reciprocal

properties to the original system. We observe particularly that

if a system has dominant high-frequency behaviour, then the reciprocal

system will have dominant low-frequency behaviour and vice-versa.

Similar reciprocal relationships are well known in the model-reduction literature and have been widely used to eliminate inherent frequency biases in the Routh approximation method^{2,3} and elsewhere^{10}.
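A minimal sketch of the reciprocal transformation, with illustrative example data, checking that the Gramian matrix is carried over unchanged:

```python
# Minimal sketch: the reciprocal system and invariance of the Gramian.
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, inv

def reciprocal_system(A, B, C):
    Ai = inv(A)
    return Ai, Ai @ B, C @ Ai      # F = A^{-1}, G = A^{-1}B, H = CA^{-1}

A = np.array([[-1.0, 0.5], [0.0, -3.0]])
B = np.array([[1.0], [1.0]])
C = np.array([[1.0, 0.5]])
F, G, H = reciprocal_system(A, B, C)

Wc = solve_continuous_lyapunov(A, -B @ B.T)       # A W + W A^T = -B B^T
Wc_rec = solve_continuous_lyapunov(F, -G @ G.T)   # F W + W F^T = -G G^T
print(np.allclose(Wc, Wc_rec))   # True: the same controllability Gramian
```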

3. Model-Order Reduction of Balanced Systems and their Reciprocals

The nth order internally balanced system S(A,B,C) can be partitioned in the format,

A = [A11 A12; A21 A22],    B = [B1; B2],    C = [C1 C2]

where A11 and A22 are square matrices corresponding to the strong and the weak subsystems, respectively. If the diagonal Gramian matrix W^2 is partitioned in the format

W^2 = diag(W1^2, W2^2)

the diagonal Gramians Wi^2, i = 1,2 can be associated with the internally balanced subsystems S(Aii,Bi,Ci), i = 1,2. Without loss of generality, we assume that the diagonal values of the matrix W1^2 are large in comparison with the elements of W2^2. Following Moore^1, the subsystem S(A11,B1,C1) is called the 'strong' subsystem and the subsystem S(A22,B2,C2), the 'weak' subsystem.

Moore advocated, as a first approximation, direct elimination of the weak subsystem and thus, the reduced-order model is simply given by the strong subsystem S(A11,B1,C1).

We now investigate the structure of the model if reciprocal transformations are used in this model-order reduction process. Instead of eliminating the weak subsystem of the system S(A,B,C), it is first transformed into the reciprocal format S(F,G,H) and the weak subsystem of S(F,G,H) is eliminated to give S(F11,G1,H1). The reduced-order model S(Â11,B̂1,Ĉ1) is then obtained by computing the reciprocal of the system S(F11,G1,H1). Thus,

S(A,B,C) --reciprocal--> S(F,G,H) --eliminate weak subsystem--> S(F11,G1,H1) --reciprocal--> S(Â11,B̂1,Ĉ1)

Again, we point out that similar intermediate reciprocal transformations are common in the model-order reduction literature^{2,3,10}.

If this procedure is followed, then the reduced-order model S(Â11,B̂1,Ĉ1) is given by the usual singular perturbational approximation,

Â11 = A11 - A12 A22^{-1} A21
B̂1 = B1 - A12 A22^{-1} B2
Ĉ1 = C1 - C2 A22^{-1} A21

It can be shown^4 that the reduced-order model S(Â11,B̂1,Ĉ1) is also an internally balanced system with the balanced Gramian matrix W1^2.

The above low-frequency singular perturbational approximation can be derived by considering the inverse of the matrix A in the format of the well known lemma for inversion of partitioned matrices, which is also sometimes known as the K-partitioned inverse^8,

A^{-1} = [Δ^{-1}               -Δ^{-1}A12 A22^{-1}
          -A22^{-1}A21 Δ^{-1}   A22^{-1} + A22^{-1}A21 Δ^{-1}A12 A22^{-1}]

with Δ = A11 - A12 A22^{-1} A21

In particular, the leading block of the reciprocal system is F11 = Δ^{-1}, so that reciprocal elimination returns Â11 = F11^{-1} = Δ. Thus, a direct duality exists between direct and reciprocal elimination of subsystems through the reciprocal transformation.
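This duality is easily checked numerically; in the following minimal sketch (data and names illustrative), the three-step reciprocal route reproduces the Schur complement result exactly:

```python
# Minimal sketch: reciprocal -> eliminate weak subsystem -> reciprocal
# equals the Schur-complement approximation.
import numpy as np
from scipy.linalg import inv, solve

def schur_reduction(A, B, C, r):
    A11, A12, A21, A22 = A[:r, :r], A[:r, r:], A[r:, :r], A[r:, r:]
    B1, B2, C1, C2 = B[:r, :], B[r:, :], C[:, :r], C[:, r:]
    return (A11 - A12 @ solve(A22, A21),
            B1 - A12 @ solve(A22, B2),
            C1 - C2 @ solve(A22, A21))

def via_reciprocal(A, B, C, r):
    F, G, H = inv(A), inv(A) @ B, C @ inv(A)       # reciprocal
    F11, G1, H1 = F[:r, :r], G[:r, :], H[:, :r]    # eliminate weak part
    return inv(F11), inv(F11) @ G1, H1 @ inv(F11)  # reciprocal again

A = np.array([[-1.0, 0.2, 0.1], [0.3, -2.0, 0.4], [-0.1, 0.2, -5.0]])
B = np.array([[1.0], [0.5], [0.2]])
C = np.array([[1.0, 0.3, 0.1]])
print(all(np.allclose(x, y) for x, y in
          zip(schur_reduction(A, B, C, 2), via_reciprocal(A, B, C, 2))))
```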



We recall that for the low-frequency singular perturbational approximation to be valid, the weak subsystem S(A22,B2,C2) has to be a fast subsystem. However, the high-frequency approximation given by S(A11,B1,C1) is valid only if the weak subsystem S(A22,B2,C2) is slow^5. Thus, these two methods are complementary to each other.

4. An Illustrative Example

To illustrate the two different approaches in the balanced model-order reduction problem, consider the system described by the transfer-function,

g(s) = (s+4) / [(s+1)(s+3)(s+5)(s+10)]

which has been previously studied by Meier and Luenberger^6, Wilson^7, and recently by Moore^1 and Fernando and Nicholson^9 (Chapter 11).

The system can be realized in the internally balanced format^1,

A = [-0.4378    1.168    0.4143    0.05098
     -1.168    -3.135    2.835    -0.3289
      0.4143   -2.835  -12.48     -3.249
     -0.05098  -0.3289   3.249    -2.952 ]

B = [-0.1181   -0.1307   0.05634   -0.006875]^T

C = [-0.1181    0.1307   0.05634    0.006875]

and the balanced Gramian matrix W^2 is given by,

W^2 = diag[0.01954, 0.272x10^{-2}, 0.1272x10^{-3}, 0.8006x10^{-5}]

The system can be decomposed into 2x2 subsystems, in the natural order, as shown. The subsystem S(A11,B1,C1) is the strong subsystem and it is the reduced-order balanced approximation of Moore.

If the reciprocal method of reduction is used, the perturbational result is given by

Â11 = [-0.4249   1.257
       -1.257   -3.735],    B̂1 = [-0.1164   -0.1427]^T,    Ĉ1 = [-0.1164   0.1427]

We observe that the perturbational correction is not very prominent for this example. The eigenvalues of the subsystem matrices A11 and A22 are given by

λ(A11) = -1.113, -2.460
λ(A22) = -11.20, -4.232

Thus, the weak subsystem is faster than the strong subsystem, although the effect is not very pronounced.

The transfer functions associated with the reduced models, together with their error ratios, are given in Table 1. If the scalar h(t) denotes the impulse response of the original system and ĥ(t) that of a reduced-order representation, then the error ratio is defined as,

error ratio = ∫_0^∞ h_e^2(t) dt / ∫_0^∞ h^2(t) dt,    where h_e(t) = h(t) - ĥ(t)

For comparison purposes, the optimal solution with respect to the least-squares criterion^{6,7},

minimize ∫_0^∞ h_e^2(t) dt

is also included in Table 1. This was obtained by direct numerical optimization.

It can be seen from Table 1 that the reduced-order models obtained through direct elimination and reciprocal elimination (singular perturbational result) are nearly optimal. If optimality is paramount, the direct elimination result is the preferred solution for this problem. However, the near optimality has been achieved by having a badly positioned numerator zero. In the result based on the reciprocal approach, the zero is most favourably positioned. This can be an advantage in certain applications and design procedures.

Figure 1 gives the frequency response of the systems. Both

reduced-order models represent the original transfer-function

adequately. As expected, at high frequencies, the direct solution

is marginally better than the reciprocal approach.

Figure 2 gives the phase behaviour of the original system and

the reduced-order representations. Even at high frequencies, the

phase behaviour of the reduced-order representation obtained through

reciprocal transformations is better than the direct approach.



5. Conclusions
We have demonstrated that a duality exists between direct

elimination of subsystems as proposed by Moore and the low-frequency

singular perturbational-type approximation as given by the Schur

complement, and the two approximations are reciprocal to each other.

A standard numerical example is used to illustrate the form of

the reduced-order solutions. There are certain minor advantages

and disadvantages in these two approaches and we have compared

these with the optimal solution.

If the system under consideration has a clearly defined weak

subsystem which is fast (as in the example in reference 4), then

the singular perturbational approach seems to give the best solution.

Alternatively, if the weak subsystem is slow then the direct

elimination method would be preferred. However, if the weak sub-

system is either marginally fast or slow, then both these approaches

should be investigated to determine the appropriate approximation

which can be based on optimality, frequency response or other

criteria.
Table 1: The reduced-order models and their error ratios

method                          transfer function                           error ratio
direct elimination (Moore^1)    -0.003127(s-23.14)/[(s+1.113)(s+2.460)]     0.03938
reciprocal elimination          -0.006808(s-12.29)/[(s+1.003)(s+3.158)]     0.06931
'optimal'                       -0.003222(s-22.66)/[(s+1.099)(s+2.511)]     0.03929

6. References

1. Moore, B.C.: 'Principal component analysis in linear systems: Controllability, observability, and model reduction', IEEE Trans. Automatic Control, 1981, AC-26, (1), pp.17-32.

2. Hutton, M.F., and Friedland, B.: 'Routh approximations for reducing order of linear, time-invariant systems', IEEE Trans. Automatic Control, 1975, AC-20, (3), pp.329-337.

3. Fernando, K.V., and Nicholson, H.: 'On the applicability of Routh approximations and allied methods in model-order reduction', IEEE Trans. Automatic Control, submitted for publication.

4. Fernando, K.V., and Nicholson, H.: 'Singular perturbational reduction of balanced systems', IEEE Trans. Automatic Control, 1982, AC-27, (2), pp.466-468.

5. Fernando, K.V., and Nicholson, H.: 'Singular perturbational model-order reduction in the frequency domain', IEEE Trans. Automatic Control, 1982, AC-27, (4).

6. Meier, III, L., and Luenberger, D.G.: 'Approximation of linear constant systems', IEEE Trans. Automatic Control, 1967, AC-12, pp.585-588.

7. Wilson, D.A.: 'Optimum solution of model-reduction problem', Proc. IEE, 1970, 117, (6), pp.1161-1165.

8. Nicholson, H.: 'Structure of Interconnected Systems', Peregrinus, London, 1978.

9. Fernando, K.V., and Nicholson, H.: 'On the Cauchy index of linear systems', IEEE Trans. Automatic Control, 1983, AC-28, (3).

10. Gutman, P.O., Mannerfelt, C.F., and Molander, P.: 'Contributions to the model reduction problem', IEEE Trans. Automatic Control, 1982, AC-27, (2), pp.454-455.


[Fig. 2. Frequency response: argument of H(jω) (degrees) against frequency ω rad/sec, for (a) reciprocal elimination, (b) direct elimination, (c) original system.]

[Fig. 1. Frequency response: magnitude of H(jω) against frequency ω rad/sec, for (a) reciprocal elimination, (b) direct elimination, (c) original system.]

PART 4

The Cross-Gramian Matrix W_co in Linear Systems Theory

CHAPTER 9

On the Structure of Balanced and Other Principal Representations of SISO Systems

Abstract: A new matrix W_co, which can be considered as a cross-Gramian matrix and which contains information about both controllability and observability, is defined for single-input, single-output linear systems. Using this matrix, the structural properties of linear systems are studied in the context of principal component analysis. The matrix W_co can be used in obtaining balanced and other principal representations without computation of the controllability and the observability Gramians. The importance of this matrix in model-order reduction is highlighted.

1. Introduction

Moore^1 used concepts from the principal component analysis of Hotelling to investigate the controllability and observability of linear systems and also as a tool for model-order reduction. The technique is based essentially on simultaneous diagonalization of the controllability Gramian W_c^2 and the observability Gramian W_o^2 using appropriate similarity transformations. It was shown by Moore that it is somewhat inadequate and sometimes misleading to study controllability or observability, individually, and combined investigations are required.
In this Chapter, we define a new matrix W_co which can be considered as a cross-Gramian matrix and which carries information pertaining to controllability and observability, and which is directly connected to both controllability and observability Gramians. Thus, this new matrix is a natural candidate for the study of combined investigations of controllability and observability and is used to expound the structure of SISO linear systems in the framework of principal component analysis.

Our main results are based on the absolute value symmetry of the state matrix under balanced conditions. However, the analysis could be carried out using the more general principal (axis) representations^3, to which the more specific balanced representations also belong. The role of principal representations in model-order reduction is also investigated.

The spectral structure of the matrix W_co is paramount in our analysis and in fact the absolute values of the spectrum are given by the singular values of the system. We also show the relationship between the singular values and the dc gain of the system and the importance of that result as an alternative criterion for model-order reduction.

2. Preliminaries

For the linear nth order single-input, single-output asymptotically stable time-invariant system S(A,b,c) described by

ẋ(t) = Ax(t) + bu(t),    y(t) = cx(t)

the controllability Gramian matrix W_c^2 is defined as

W_c^2 = ∫_0^T (e^{At}b)(e^{At}b)^T dt

where the term e^{At}b represents the impulse response of the states of the system. We assume that the system is controllable and thus, W_c^2 is a positive definite matrix. Similarly, the observability Gramian matrix W_o^2 can be defined by considering the impulse response of the dual system,

W_o^2 = ∫_0^T (ce^{At})^T (ce^{At}) dt

We also assume that the system is observable, resulting in a positive definite observability Gramian.

If the time interval of interest (0,T) is taken as infinite, then the Gramian matrices can be obtained by solving the following Lyapunov equations

W_c^2 A^T + A W_c^2 = -bb^T     (1)
W_o^2 A + A^T W_o^2 = -c^T c    (2)

In the principal component analysis approach of Moore^1, the system S(A,b,c) is transformed into normalized and balanced forms which satisfy one of the following constraints.

Input normal form:           W_c^2(P) = I,      W_o^2(P) = Σ^4
Output normal form:          W_c^2(P) = Σ^4,    W_o^2(P) = I
Internally balanced form:    W_c^2(P) = W_o^2(P) = Σ^2

The matrix P denotes the similarity transformation

S(A,b,c) → S(P^{-1}AP, P^{-1}b, cP)

required to bring the Gramian matrices to the normal or the balanced formats (see table 1). The matrix Σ^2 is diagonal and its positive diagonal elements define the singular values of the system.

Table 1: The system under the similarity transformation P

original        transformed
A               P^{-1}AP
b               P^{-1}b
c               cP
W_c^2           P^{-1}W_c^2 (P^{-1})^T
W_o^2           P^T W_o^2 P
W_c^2 W_o^2     P^{-1}W_c^2 W_o^2 P
W_co            P^{-1}W_co P

We assume that they are ordered in the non-increasing order of magnitude.

We observe that the normal and the internally balanced forms differ only by a diagonal similarity transformation. A more general format which encompasses the normal and the balanced forms can be defined in the following manner, and is called a principal (axis) representation^3.

Principal (axis) representation:    W_c^2(P) = Σ_c^2,    W_o^2(P) = Σ_o^2

where Σ_c^2 and Σ_o^2 are positive diagonal matrices. We note that one of the diagonal matrices Σ_c^2 or Σ_o^2 is arbitrary but not both. We denote internally balanced and principal representations by S(Â,b̂,ĉ) and S(Ā,b̄,c̄), respectively.


3. The Symmetry in Internally Balanced Systems

Moore^1 referred to the absolute value symmetry of the state matrix A in single-input, single-output internally balanced systems. We present that property as a lemma.

Lemma 1: If the system S(Â,b̂,ĉ) is internally balanced, then

(a) b̂_i = ±ĉ_i
(b) either â_ij = â_ji or σ_i^2 = σ_j^2, if b̂_i b̂_j = ĉ_i ĉ_j ≠ 0
(c) â_ij = -â_ji, if b̂_i b̂_j = -ĉ_i ĉ_j ≠ 0
(d) either â_ij = -â_ji and σ_i^2 = σ_j^2, or â_ij = â_ji = 0, if b̂_i b̂_j = ĉ_i ĉ_j = 0

where â_ij denotes the i,jth element of the matrix Â.

Proof: Since the Gramian matrices W_c^2(P), W_o^2(P) are diagonal and equal to Σ^2, the diagonal elements of (1) and (2) are of the form,

2 â_ii σ_i^2 = -b̂_i^2 = -ĉ_i^2    for i = 1,n

and part (a) of the lemma is true.


The i,jth elements of (1) and (2) are given by

σ_i^2 â_ji + â_ij σ_j^2 = -b̂_i b̂_j     (3)
σ_i^2 â_ij + â_ji σ_j^2 = -ĉ_i ĉ_j     (4)

and the difference and the sum of (3) and (4) are of the form,

(σ_j^2 - σ_i^2)(â_ij - â_ji) = -b̂_i b̂_j + ĉ_i ĉ_j     (5)
(σ_j^2 + σ_i^2)(â_ij + â_ji) = -b̂_i b̂_j - ĉ_i ĉ_j     (6)

If b̂_i b̂_j = ĉ_i ĉ_j ≠ 0, then from (5), the element â_ij appears symmetrically in the matrix Â or σ_i^2 = σ_j^2. If the second possibility b̂_i b̂_j = -ĉ_i ĉ_j ≠ 0 is satisfied, then from (6), the element â_ij appears skew-symmetrically. If the remaining possibility, b̂_i b̂_j = ĉ_i ĉ_j = 0, is true, then from (3) and (4) one possibility is â_ij = ±â_ji. However, since σ_i^2 and σ_j^2 are positive, the condition â_ij = â_ji ≠ 0 is not admissible and hence â_ij = -â_ji and σ_i^2 = σ_j^2. The other possibility is â_ij = â_ji = 0. □
1.J J1. 1. J . 1.J J1.
The appearance of non-symmetrical or non-skewsymmetrical elements according to part (b) of lemma 1 is non-generic. The following result indicates that internally balanced formats can be found with absolute value symmetry.

Lemma 2: Let S(Â,b̂,ĉ) be a balanced representation with non-distinct singular values of multiplicity two. If the state matrix Â is not absolute value symmetric, then an orthogonal transformation can be found which transforms the representation to another balanced format with absolute value symmetry.

Proof: Without loss of generality assume that the first two singular values of the system S(Â,b̂,ĉ) are non-distinct and thus, the two elements â_12 and â_21 do not appear symmetrically or skew-symmetrically. Now consider the orthogonal transformation Q where

Q = M ⊕ I_{n-2},    M = (1/√(m^2+1)) [1  m; -m  1]

The symbol ⊕ denotes the direct sum. Under the orthogonal transformation the representation becomes,

S(Â,b̂,ĉ) → S(Q^T Â Q, Q^T b̂, ĉQ) = S(Ã,b̃,c̃)

It is easily seen that S(Ã,b̃,c̃) is still internally balanced since the diagonal Gramian matrices are invariant under this transformation. Again, lemma 1 guarantees the absolute value symmetry of the matrix Ã except for the elements ã_12 and ã_21. These elements are given by

(1+m^2) ã_12 = â_12 - (â_22 - â_11)m - â_21 m^2
(1+m^2) ã_21 = â_21 - (â_22 - â_11)m - â_12 m^2

If we constrain these two elements to be skew-symmetric, then m should satisfy the quadratic equation given by

(â_12 + â_21)m^2 + 2(â_22 - â_11)m - (â_12 + â_21) = 0

A real solution for m always exists for the above equation and the existence of a real orthogonal transformation with the required properties is proved. □

Remark 1: Since ã_21 = -ã_12, then from (3) and (4)

b̃_1 b̃_2 = c̃_1 c̃_2 = 0

This shows that non-absolute value symmetry as considered in part (b) of lemma 1 can be accommodated in part (d) of lemma 1, after the orthogonal transformation. □

Remark 2: According to part (b) of lemma 1, either â_12 = â_21 or σ_1^2 = σ_2^2. However, a pathological case can occur where â_12 = â_21 and σ_1^2 = σ_2^2. Then, under the orthogonal transformation, b̃_1 b̃_2 = c̃_1 c̃_2 = 0 and either b̃_1 or b̃_2 is zero. If the system is second order, then it is not an asymptotically stable system which contradicts our assumption about stability. This argument seems to be correct even for higher order systems. □

In the remainder of this Chapter, we assume that the internally balanced representation possesses the absolute value symmetry and that the system is not pathological (as defined in remark 2).

4. The Cross-Gramian Matrix W_co

Using the impulse responses of the controllable system and the observable system, we define a matrix W_co as

W_co = ∫_0^∞ (e^{At}b)(ce^{At}) dt     (7)

which we call the cross-Gramian matrix of the system. To our knowledge, this matrix W_co has not appeared previously in the control literature. Perhaps the closest analog is the cross-covariance in statistical analysis, where the usual Gramian matrices could be considered as auto-covariances under appropriate white noise inputs instead of the usual impulse inputs.

It is easily seen that the matrix W_co can be computed by solving the linear matrix equation,

W_co A + A W_co = -bc     (8)

Since the state matrix A is assumed to be stable, a unique solution matrix W_co exists. Standard algorithms are available for obtaining this solution^2.
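A minimal computational sketch, assuming scipy's Sylvester solver; the example data are illustrative:

```python
# Minimal sketch: W_co as the solution of equation (8).
import numpy as np
from scipy.linalg import solve_sylvester

def cross_gramian(A, b, c):
    # solves W A + A W = -b c
    return solve_sylvester(A, A, -np.outer(b, c))

A = np.array([[-1.0, 0.4], [-0.4, -2.0]])
b = np.array([1.0, 0.5])
c = np.array([0.8, -0.3])
Wco = cross_gramian(A, b, c)
print(np.allclose(A @ Wco + Wco @ A, -np.outer(b, c)))   # True
```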
It is intuitively clear that the matrix W_co carries information about both controllability and observability. This contrasts with the Gramian W_c^2 which contains controllability data only and the Gramian W_o^2 which contains observability data only.

It was shown by Moore^1 that it is inadequate and sometimes misleading to study controllability and observability, individually, and a combined approach is required in the analysis of dynamical systems. We will demonstrate that this cross-Gramian matrix W_co is consistent with the philosophy advocated by Moore and also fundamental to it.

It is easily seen from table 1 that the eigenvalues of the matrix W_co are invariant under similarity transformations of the system. We denote these eigenvalues by λ_i, i = 1,n and form the diagonal eigenvalue matrix Λ as

Λ = diag(λ_1, λ_2, ..., λ_n)

The following result shows the importance of the matrix W_co in the principal component analysis of SISO linear systems.

Theorem 1: If the system S(Â,b̂,ĉ) is internally balanced with absolute value symmetry in the state matrix Â, then

(a) the corresponding cross-Gramian matrix W_co(P) for the balanced system is diagonal and the diagonal elements are given by

λ_i = σ_i^2     if ĉ_i = b̂_i ≠ 0
λ_i = -σ_i^2    if ĉ_i = -b̂_i

(b) the square of the matrix W_co(P) is equal to the product W_c^2(P)W_o^2(P) under any arbitrary similarity transformation P. That is,

W_co^2(P) = W_c^2(P) W_o^2(P)

Proof: Form the diagonal signature matrix U such that

u_i = 1     if ĉ_i = b̂_i ≠ 0
u_i = -1    if ĉ_i = -b̂_i

If the system is internally balanced, (2) can be written in the form,

Σ^2 Â + Â^T Σ^2 = -ĉ^T ĉ

and by premultiplying the above by U, we obtain

(UΣ^2)Â + (UÂ^T U)(UΣ^2) = -b̂ĉ     (9)

The i,jth element of the matrix UÂ^T U is equal to u_i u_j â_ji and we consider all four possibilities.

ĉ_i = b̂_i,   ĉ_j = b̂_j :    â_ij = â_ji,    u_i u_j â_ji = â_ij
ĉ_i = -b̂_i,  ĉ_j = -b̂_j :   â_ij = â_ji,    u_i u_j â_ji = â_ij
ĉ_i = b̂_i,   ĉ_j = -b̂_j :   â_ij = -â_ji,   u_i u_j â_ji = â_ij
ĉ_i = -b̂_i,  ĉ_j = b̂_j :    â_ij = -â_ji,   u_i u_j â_ji = â_ij

Thus, UÂ^T U = Â, and by comparing (9) with (8), we obtain the required result (part a)

W_co(P) = UΣ^2 = Λ     (10)

To prove part b, from (10), W_co^2(P) = Σ^4. However, W_c^2(P)W_o^2(P) = Σ^4 and hence

W_co^2(P) = W_c^2(P) W_o^2(P)

It is easily seen that this result is true even under any arbitrary similarity transformation P, which completes the proof. □

The following result extends the above theorem concerning balanced systems to more general principal representations.

Corollary 1: If the system S(Ā,b̄,c̄) is a principal representation, then the corresponding matrix W_co(P) is diagonal and the diagonal elements are given by

λ_i = σ_i^2     if sign(c̄_i) = sign(b̄_i),  b̄_i c̄_i ≠ 0
λ_i = -σ_i^2    if sign(c̄_i) = sign(-b̄_i)

Proof: We observe that any principal representation differs from the internally balanced format by a diagonal similarity transformation. However, the matrix W_co(P) is equal to Λ when the system is internally balanced and it does not vary under diagonal similarity transformations. The rule for obtaining the signatures is obvious. □

The following result confirms the validity of the converse of theorem 1 for principal representations.

Corollary 2: If the matrix W_co(P̄) is diagonal, then the corresponding system S(Ā,b̄,c̄) = S(P̄^{-1}AP̄, P̄^{-1}b, cP̄) is a principal representation.

Proof: If the matrix W_co(P̄) is diagonal, then it is equal to the eigenvalue matrix Λ (assuming that the diagonal elements are ordered in the non-increasing order of absolute value magnitude). It can be shown by substitution that

W_c^2(P̄) = -diag(..., b̄_i^2/2ā_ii, ...)
W_o^2(P̄) = -diag(..., c̄_i^2/2ā_ii, ...)     for ā_ii ≠ 0

thus satisfying the conditions for principal representations. □


The following result indicates that the matrix W_co is well-conditioned even when there are non-distinct singular values.

Corollary 3: The eigenvalues of the matrix W_co are always distinct provided the multiplicity of the singular values is at most two.

Proof: If σ_i^2 and σ_j^2 are equal, then due to lemma 2,

â_ij = -â_ji

However, due to theorem 1, λ_i = σ_i^2 and λ_j = -σ_i^2 (or vice versa) and thus, the eigenvalues are distinct. □

5. Model-order Reduction using Principal Representations

Moore^1 used internally balanced representations in model-order reduction based on subsystem elimination. However, as the following result indicates, we may use principal representations instead of internally balanced representations in model-order reduction and obtain the same reduced-order model.

Theorem 2: Let S(Â,b̂,ĉ) be the internally balanced representation and S(Ā,b̄,c̄) be a principal representation with

S(Â,b̂,ĉ) = S(D^{-1}ĀD, D^{-1}b̄, c̄D)

where D is an arbitrary diagonal matrix which defines all possible principal representations for that system (assuming that the matrix Λ is ordered in the non-increasing order of absolute value magnitude). If the internally balanced representation and the principal representation are partitioned in the format,

Â = [Â11 Â12; Â21 Â22],    b̂ = [b̂1; b̂2],    ĉ = [ĉ1 ĉ2]
Ā = [Ā11 Ā12; Ā21 Ā22],    b̄ = [b̄1; b̄2],    c̄ = [c̄1 c̄2]

with order of Â11 = order of Ā11 etc., then the balanced representation S(Â11,b̂1,ĉ1) and the principal representation S(Ā11,b̄1,c̄1) describe the same reduced-order model.

Proof: If the matrix D is partitioned in the same format as D = diag(D1,D2), then

S(Â11,b̂1,ĉ1) = S(D1^{-1}Ā11 D1, D1^{-1}b̄1, c̄1 D1)

which completes the proof. □

If we decompose the matrices Σ^2 and Λ conforming to the partitions in theorem 2, then

Σ^2 = diag(Σ1^2, Σ2^2),    Λ = diag(Λ1, Λ2)

Moore^1 used the trace of the diagonal Gramian matrix Σ2^2 as a measure of error in model reduction. The trace of the matrix Σ^2, which is equal to the sum of the singular values, can be considered as the total "energy" of the system and relative error ratios can be computed in conjunction with this value.

The following result indicates that the trace of the matrix Λ is related to the dc gain of the system.

Theorem 3: The sum of the eigenvalues λ_i, i = 1,n gives half the dc gain of the system. That is

trace Λ = -½ cA^{-1}b

Proof: From (7),

trace W_co = ∫_0^∞ ce^{2At}b dt = -½ cA^{-1}b

Since trace W_co = trace Λ, the result follows. □
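A minimal numerical check of this result, with illustrative data:

```python
# Minimal sketch: trace(W_co) equals half the dc gain -cA^{-1}b.
import numpy as np
from scipy.linalg import solve_sylvester, solve

A = np.array([[-1.0, 0.4], [-0.4, -2.0]])
b = np.array([1.0, 0.5])
c = np.array([0.8, -0.3])
Wco = solve_sylvester(A, A, -np.outer(b, c))
print(np.isclose(np.trace(Wco), -0.5 * c @ solve(A, b)))   # True
```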

An alternative criterion for model reduction can be stated using the above theorem. Instead of the requirement that the trace of the matrix Σ2^2 is "small", we may specify that the dc gain of the subsystem S(Â22,b̂2,ĉ2), given by twice the trace of the matrix Λ2, should be small. The smallness of the singular values guarantees low dc gains, however the converse is not necessarily true.

This is a significant result since in most established model-order reduction methods, the dc gain is one of the criteria for obtaining reduced-order models. Often, it is specified that the reduced-order model and the original model should have the same dc gain and, due to the direct relationship between the singular values and the dc gain, it may be possible to take such constraints into account in this approach.

6. Computation of Principal Representations

Since principal representations are equally useful as the balanced representation in model-order reduction, we may use principal instead of balanced representations. Computation of balanced representations normally requires the solution of the Lyapunov equations (1) and (2) and a series of spectral decompositions and similarity transformations. However, for SISO systems we suggest the following numerical procedure, which requires less computational effort.

(a) Compute the matrix W_co as the solution of (8) using any standard algorithm^2.

(b) Compute the real spectral decomposition of the matrix W_co, to give

W_co = V Λ V^{-1}

where V is an eigenvector matrix and Λ is the diagonal eigenvalue matrix.

(c) Compute the principal representation S(Ā,b̄,c̄) given by

S(Ā,b̄,c̄) = S(V^{-1}AV, V^{-1}b, cV)

This step is obvious from corollary 2.

(d) (Optional step) If a balanced representation is required, we may use the diagonal similarity transformation P̄ defined by,

P̄ = diag(..., p̄_i, ...)

where p̄_i = (u_i b̄_i / c̄_i)^{1/2} if c̄_i ≠ 0
      p̄_i = 1 if c̄_i = 0    □
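A minimal sketch of steps (a)-(d) (the function name, the eigenvalue ordering convention and the tolerance in step (d) are assumptions of the illustration):

```python
# Minimal sketch of the numerical procedure for a SISO system.
import numpy as np
from scipy.linalg import solve_sylvester, eig, inv

def principal_representation(A, b, c, balance=True):
    Wco = solve_sylvester(A, A, -np.outer(b, c))   # (a) solve (8)
    lam, V = eig(Wco)                              # (b) spectral decomposition
    lam, V = lam.real, V.real     # the spectrum of W_co is real (corollary 3)
    order = np.argsort(-np.abs(lam))               # non-increasing |lambda_i|
    lam, V = lam[order], V[:, order]
    Abar, bbar, cbar = inv(V) @ A @ V, inv(V) @ b, c @ V   # (c) principal form
    if not balance:
        return Abar, bbar, cbar
    u = np.sign(lam)                               # (d) optional balancing
    p = np.ones_like(bbar)
    nz = np.abs(cbar) > 1e-12
    p[nz] = np.sqrt(np.abs(u[nz] * bbar[nz] / cbar[nz]))
    return np.diag(1/p) @ Abar @ np.diag(p), bbar / p, cbar * p
```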

We have used the above numerical procedure on a number of problems, including the examples described by Moore^1, and identical results were obtained.
In references 3 and 4 and elsewhere, computation of principal representations is based on diagonalization of the matrix product W_c^2 W_o^2, which is equal to W_co^2. However, W_co^2 can have non-distinct eigenvalues while the eigenvalues of the matrix W_co are distinct (corollary 3). Thus, the spectral decomposition problem associated with W_co is always well-conditioned, which may not be the case with W_co^2. Also, our approach avoids formation of the product W_c^2 W_o^2 and is well-conditioned with respect to round-off errors.

7. Conclusions

We have defined a new matrix W_co which can be considered as a cross-Gramian matrix and which contains information pertaining to both controllability and observability. Using this matrix, the structure of SISO linear systems in the context of principal component analysis has been studied. It was shown that its properties can be used in model-order reduction in the framework of the more general principal representations without computing the more specific balanced representations. However, both principal and balanced representations give the same reduced-order model.

Due to the inherent signatures, the cross-Gramian matrix W_co contains more information than the controllability and observability Gramian matrices. In fact, it can be shown that W_co and the Hankel matrix associated with the same system share common properties including the Cauchy index^5 (Chapter 11).

We have also proved the relationship between the singular values of the system and the dc gain. It was explained how this property can be used as an alternative criterion in model-order reduction.

8. References

1. Moore, B.C.: 'Principal component analysis in linear systems: Controllability, observability, and model reduction', IEEE Trans. Automatic Control, 1981, AC-26, (1), pp.17-32.

2. Bartels, R.H., and Stewart, G.W.: 'Solution of the matrix equation AX + XB = C', Comm. of the ACM, 1972, 15, (9), pp.820-826.

3. Mullis, C.T., and Roberts, R.A.: 'Synthesis of minimum roundoff noise fixed point digital filters', IEEE Trans. Circuits and Systems, 1976, CAS-23, (9), pp.551-561.

4. Silverman, L.M., and Bettayeb, M.: 'Optimal approximation of linear systems', in Proc. 1980 Joint Automatic Control Conference, San Francisco, CA, paper FA8-A, August 1980.

5. Fernando, K.V., and Nicholson, H.: 'On the Cauchy index of linear systems', IEEE Trans. Automatic Control, 1983, AC-28, (3).


CHAPTER 10

Minimality of SISO Linear Systems

Abstract: A new test for minimality of single-input single-output state-space realizations is proposed, based on rank conditions of a cross-Gramian type matrix, without computing the controllability and observability Gramian matrices.

1. Introduction

The problem of determining minimality of state-space representations of linear systems is of fundamental importance in modern control and general systems theory. Much effort (see for example Kailath^2) has gone into investigations of minimality since the pioneering work of Kalman et al^1 and other workers. If a system representation is minimal, then the system is jointly controllable and observable and hence minimality tests are based on controllability and observability criteria.

Fernando and Nicholson^3 (see Chapter 9) defined a cross-Gramian matrix W_co which contains information pertaining to both controllability and observability. It was shown that this matrix W_co can be used in deriving internally balanced and other principal axis realizations of SISO systems without computing the controllability Gramian matrix W_c and the observability Gramian matrix W_o. The object of this Chapter is to show that if the matrix W_co is of full rank, then the realization is minimal.

2. Preliminaries

For the asymptotically stable SISO time-invariant system S(A,b,c) defined by

ẋ(t) = Ax(t) + bu(t),    y(t) = cx(t)

the controllability and observability Gramian matrices^{1,2} can be defined in the infinite-time interval as,

W_c = ∫_0^∞ (e^{At}b)(e^{At}b)^T dt     (1)

W_o = ∫_0^∞ (ce^{At})^T (ce^{At}) dt     (2)

These Gramian matrices can also be obtained by solution of the Lyapunov equations given by,

W_c A^T + A W_c = -bb^T      (3)
W_o A + A^T W_o = -c^T c     (4)

If (3) and (4) are used as definitions of the Gramian matrices, instead of (1) and (2), then the assumption regarding asymptotic stability is not required provided that there are no eigenvalues of the state matrix A such that,

λ_i(A) + λ_j(A) = 0    for i = 1,n, j = 1,n

where n is the order of the system. The above condition also guarantees the uniqueness of the solutions W_c and W_o of (3) and (4).


3
Fernando and Nicholson (see Chapter 9) defined the cross-Gramian

matrix Wco as
Cl)

At. At
wco - of (e b) (ce ) dt
- 135 -

This matrix can be obtained as the solution of the matrix equation

given by
W A + AW = - bc (5)
co co

which can be considered as an equivalent and a more general


4
defini tion. We observe that there are standard algorithms

for solution of (5).

3. The results

We present the following result which relates the three Gramian matrices.

Proposition 1: W_c W_o = W_co^2

Proof: For asymptotically stable systems, an indirect proof was provided in Chapter 9. However, a simple proof is possible provided the state matrix A is semi-simple. Using the eigenvector matrix U of the matrix A as a similarity transformation, we may obtain the equivalent system S(Ã,b̃,c̃) where Ã is diagonal,

S(Ã,b̃,c̃) = S(U^{-1}AU, U^{-1}b, cU)

The representation S(Ã,b̃,c̃) is in general complex.

For the transformed system, the i,jth elements of the Gramian matrices are given by,

(W̃_c)_ij = -b̃_i b̃_j / (ã_ii + ã_jj)

(W̃_o)_ij = -c̃_i c̃_j / (ã_ii + ã_jj)

(W̃_co)_ij = -b̃_i c̃_j / (ã_ii + ã_jj)

and thus, W̃_c W̃_o = W̃_co^2

which provides the required result through the similarity transformation. □


If the controllability Gra~an
matrix W is of full rank, then
c
1 2
it is well known that the system is completely controllable ' •

Similarly, if the observability Gramian matrix W is of full rank,


o
then it guarantees complete observability. The following result

is about joint controllability and observability.

Propos i don 2: If the cross-Gra~an matrix W is of full rank,


co
then the system is completely controllable and observable, and hence

a ndnimal realization. This condition is both necessary and

sufficient.

Proof: This results as a direct consequence of the relationship

between Gramian matrices as indicated in proposition 1.

Thus, proposition 2 provides a direct method of determining

ndnimality of SISO realizations by checking the rank of the matrix

W There is considerable savings in computation since we have


co
to solve only one matrix equation for W rather than two matrix
co
equations for Wc and Wo' Furthermore, we have to check only one

matrix for rank rather than two.
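A minimal sketch of this test (the rank tolerance is an assumption of the illustration):

```python
# Minimal sketch of the minimality test of proposition 2.
import numpy as np
from scipy.linalg import solve_sylvester

def is_minimal(A, b, c, tol=1e-10):
    Wco = solve_sylvester(A, A, -np.outer(b, c))   # cross-Gramian, eqn (5)
    return np.linalg.matrix_rank(Wco, tol=tol) == A.shape[0]
```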


4. The Discrete-time Problem

The results in section 3 can be extended to the discrete-time system S_d(A,b,c) defined by

x_{t+1} = Ax_t + bu_t,    y_t = cx_t

Instead of equations (3), (4), and (5), the Gramian matrices are defined by equations (6), (7), and (8), respectively.

W_c - A W_c A^T = bb^T      (6)
W_o - A^T W_o A = c^T c     (7)
W_co - A W_co A = bc        (8)

We also assume that there are no eigenvalues of the state matrix A such that

λ_i(A) λ_j(A) = 1    for i = 1,n, j = 1,n

It can be shown easily that propositions 1 and 2 are also true for the discrete-time case.

5. Conclusions

We have developed a simple test for minimality of SISO realizations based on joint controllability and observability, as characterized by the cross-Gramian matrix W_co, which can be obtained as the solution of a single matrix equation.

6. References

1. Kalman, R.E., Ho, Y.C., and Narendra, K.S.: 'Controllability of linear dynamical systems', Contributions to Differential Equations, 1962, 1, (2), pp.189-213.

2. Kailath, T.: 'Linear Systems', Prentice-Hall, Englewood Cliffs, NJ, 1980.

3. Fernando, K.V., and Nicholson, H.: 'On the structure of balanced and other principal representations of SISO systems', IEEE Trans. Automatic Control, 1983, AC-28, (3), (tentative publication date).

4. Golub, G.H., Nash, S., and Van Loan, C.: 'A Hessenberg-Schur method for the problem AX + XB = C', IEEE Trans. Automatic Control, 1979, AC-24, (4), pp.903-913.


CHAPTER 11

On the Cauchy Index of Linear Systems

Abstract: The Cauchy index for linear SISO systems is given by the signature of the cross-Gramian matrix W_co defined by Fernando and Nicholson^1. This index is useful in the qualitative understanding of systems and the importance of this index in internally balanced model reduction is illustrated.

1. Introduction

The Cauchy index is one of the fundamental parameters available in the study of rational functions and thus naturally important in studies involving transfer functions of both continuous-time and discrete-time linear systems^{2,3,5}. This index is given by the signature of the associated Hankel matrix of the rational function and is especially useful in the characterisation of systems with respect to the structure of poles and zeros. For a system with distinct poles, the Cauchy index is equal to the number of real poles with positive residues minus the number of poles with negative residues.

Fernando and Nicholson^1 (Chapter 9) defined a cross-Gramian matrix W_co which contains information about both controllability and observability. Due to this property, information contained in the controllability Gramian W_c^2 and the observability Gramian W_o^2 becomes redundant if the matrix W_co is known. The object of this Chapter is to show that the Cauchy index is given by the signature of the cross-Gramian matrix W_co. We also demonstrate the similarities between the Hankel matrix and the cross-Gramian matrix.

The Cauchy index is particularly useful in system identification^2 and in model-order reduction. Structural changes occur in the model-order reduction process, and by knowing the Cauchy index at each level, the structural changes which accompany the reduction process can be predicted. A numerical example is included to illustrate the importance of the Cauchy index in internally balanced model-order reduction^4.

2. The Gramian and Hankel Matrices

For the linear nth order time-invariant asymptotically stable continuous-time system S_c(A,b,c),

ẋ(t) = Ax(t) + bu(t),    y(t) = cx(t)

the controllability Gramian matrix W_c^2 and the observability Gramian matrix W_o^2 are defined as,

W_c^2(T) = ∫_0^T (e^{At}b)(e^{At}b)^T dt,    T > 0     (1)

W_o^2(T) = ∫_0^T (ce^{At})^T (ce^{At}) dt,    T > 0     (2)

Fernando and Nicholson^1 (Chapter 9) demonstrated that the information in the controllability Gramian and the observability Gramian is essentially contained in the cross-Gramian matrix W_co defined by

W_co(T) = ∫_0^T (e^{At}b)(ce^{At}) dt,    T > 0     (3)

In a stochastic formulation of the problem, this matrix can be considered as a cross-covariance between the controllable states and the observable states, while the controllable and the observable Gramians can be considered as auto-covariances.

For the discrete-time system S_d(A,b,c),

x_{t+1} = Ax_t + bu_t,    y_t = cx_t

the Gramian matrices are defined by,

W_c^2(p) = Σ_{k=0}^{p-1} A^k b b^T (A^T)^k,    p > n     (4)

W_o^2(p) = Σ_{k=0}^{p-1} (A^T)^k c^T c A^k,    p > n     (5)

In an analogous manner to the continuous-time case, we may define the W_co matrix as,

W_co(p) = Σ_{k=0}^{p-1} A^k b c A^k     (6)

The Gramian matrices are directly related to the controllability and observability matrices of Kalman, thus

W_o^2(p) = Q_o^T(p) Q_o(p)     (7)
W_c^2(p) = Q_c(p) Q_c^T(p)     (8)
W_co(p) = Q_c(p) Q_o(p)        (9)

where Q_c(p) = [b  Ab  ...  A^{p-1}b]
      Q_o^T(p) = [c^T  (cA)^T  ...  (cA^{p-1})^T]

The (symmetric) Hankel matrix H(p) can be defined using these controllability and observability matrices, and is given by

H(p) = Q_o(p) Q_c(p)     (10)

The similarity between the Hankel matrix H(p) and the cross-Gramian matrix W_co(p) is obvious. The Hankel matrix may be considered as an "outer product" of the observability and the controllability matrices while the cross-Gramian matrix manifests as an "inner product". This suggests that there is a direct duality between the Hankel matrix and the cross-Gramian matrix.

We observe that the definitions used in the discrete-time case are also valid in the continuous-time case and provide alternative but equivalent definitions for Gramians.
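The 'inner product'/'outer product' duality, and the equality of the signatures used in section 4 below, may be checked with the following minimal sketch (the data and the horizon p are illustrative assumptions):

```python
# Minimal sketch relating H(p) and W_co(p) through Q_c and Q_o.
import numpy as np

def qc_qo(A, b, c, p):
    Qc = np.column_stack([np.linalg.matrix_power(A, k) @ b for k in range(p)])
    Qo = np.vstack([c @ np.linalg.matrix_power(A, k) for k in range(p)])
    return Qc, Qo

def signature(M, tol=1e-10):
    lam = np.linalg.eigvals(M).real
    return int(np.sum(lam > tol) - np.sum(lam < -tol))

A = np.array([[0.5, 0.1], [0.0, -0.4]])    # a stable discrete-time example
b = np.array([1.0, 0.5])
c = np.array([0.8, -0.2])
Qc, Qo = qc_qo(A, b, c, p=4)
H = Qo @ Qc       # Hankel matrix, the 'outer product' (10)
Wco = Qc @ Qo     # cross-Gramian, the 'inner product' (9)
print(signature(H) == signature(Wco))   # True (cf. proposition 1)
```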

3. The Cauchy Index

For the proper rational transfer function

g(s) = Σ_{k=1}^∞ h_k s^{-k}

the associated Hankel matrix is defined as,

H(p) = [h_1    h_2      ...  h_p
        h_2    h_3      ...  h_{p+1}
        ...
        h_p    h_{p+1}  ...  h_{2p-1}]

If this transfer function is that of the dynamical system S_c or S_d, then the Markov parameters h_k are given by,

h_k = cA^{k-1}b

We reproduce the fundamental theorem of Hermite and Hurwitz^{2,3} which relates the Cauchy index to the signature of the Hankel matrix. (The signature of a matrix is the number of positive real eigenvalues minus the number of negative real eigenvalues.)

Theorem 1: The Cauchy index of g(s) is equal to the signature of the associated Hankel matrix. □

A simple "control theoretic" proof was provided by Brockett^2 which illuminates the essentials of the theorem.

The following result, which is often quoted in minimal realization theory and circuit analysis, is due to Anderson^5.

Theorem 2: For the system S_c or S_d, there exists a unique symmetric matrix R satisfying

RA = A^T R,    Rb = c^T

and the signature of R is equal to the Cauchy index of the system. □

4. The Cauchy Index and the Cross-Gramian Matrices

We propose the following two results which relate the Cauchy index to the signature of the cross-Gramian matrices.

Proposition 1: The Cauchy index of the system S_c or S_d is given by the signature of the cross-Gramian matrix W_co(p).

Proof: It is obvious from (9) and (10) that all non-zero eigenvalues of the Hankel matrix H(p) are given by the eigenvalues of the cross-Gramian matrix W_co(p). Thus, the signatures of these matrices are equal and hence the result. □

Proposition 2: The Cauchy index of the system S_c is given by the signature of the cross-Gramian matrix W_co(T).

Proof: From theorem 2 (see reference 5 for details),

R Q_c(n) = Q_o^T(n)

As a direct consequence of the Cayley-Hamilton theorem,

R e^{At}b = e^{A^T t}c^T

Post-multiplying the above by b^T e^{A^T t} and integrating in the interval (0,T), we obtain

R W_c^2(T) = W_co^T(T)

Since W_c^2(T) is positive definite, the result is obvious. □
We have included the cross-Gramian matrices W_co(p) and W_co(T) in the list of matrices which contain information about the Cauchy index. We now proceed to study the Cauchy index in relation to internally balanced representations.

5. The Cauchy Index and Balanced Representations

If the controllability Gramian matrix W_c^2(T) and the observability Gramian matrix W_o^2(T) of the system S_c(Ã,b̃,c̃) are diagonal and equal, then the system S_c(Ã,b̃,c̃) is said to be an internally balanced representation^4. Such balanced representations can be obtained by using similarity transformations on any equivalent realization S_c(A,b,c). The following result is due to Fernando and Nicholson^1 (Chapter 9).

Theorem 3: For the system S_c(A,b,c), a balanced representation S_c(Ã,b̃,c̃) (with T→∞) exists such that

(a) the elements of the vectors b̃ and c̃ are equal in magnitude, that is

b̃_i = c̃_i or b̃_i = -c̃_i for all i.

(b) the cross-Gramian matrix W_co(∞) ≜ Λ is diagonal.

(c) If b̃_i = c̃_i ≠ 0 then λ_i = σ_i^2
    If b̃_i = -c̃_i then λ_i = -σ_i^2

where W_co(∞) = Λ = diag(..., λ_i, ...)
      W_c^2(∞) = W_o^2(∞) = Σ^2 = diag(..., σ_i^2, ...)

(d) If b̃_i b̃_j = c̃_i c̃_j ≠ 0 then ã_ij = ã_ji
    If b̃_i b̃_j = -c̃_i c̃_j then ã_ij = -ã_ji    □

The following result gives an explicit solution for the matrix R (defined in theorem 2) for balanced representations.

Proposition 3: For the system S_c(Ã,b̃,c̃), the matrix R is diagonal and the diagonal elements are given by

r_i = 1 if b̃_i = c̃_i
r_i = -1 if b̃_i = -c̃_i

Proof: Obvious from theorem 3. □

Proposition 3 provides a convenient way of determining the Cauchy index of balanced representations by inspection. The validity of proposition 2 is also obvious from the above result.



6. Application in Model Reduction: An Example

The following transfer function is used by Moore^4 in internally balanced model-order reduction.

g(s) = (s+4) / [(s+1)(s+3)(s+5)(s+10)]

     = 1/[24(s+1)] - 1/[28(s+3)] - 1/[40(s+5)] + 2/[105(s+10)]

Thus, the Cauchy index for this system is zero, which is one of the five possible values (±4, ±2, 0) for a fourth-order system.

If the system is realized in the internally balanced format, the controllability and the observability Gramians are diagonal, and equal, and given by^4 (for the infinite-time definition of the Gramians)

W̃_c^2 = W̃_o^2 = Σ^2 = diag(0.0159, 0.272x10^{-2}, 0.126x10^{-3}, 0.8x10^{-5})

The cross-Gramian matrix W_co is also diagonal and equal to the balanced Gramians except for the signature (theorem 3),

W_co = Λ = diag(0.0159, -0.272x10^{-2}, 0.127x10^{-3}, -0.8x10^{-5})

Since the signature of the matrix W_co is zero, from proposition 2, we know that the Cauchy index is zero.

In internally balanced model-order reduction, the reduced model is obtained by subsystem elimination and the dominant diagonal values (the second-order modes) are retained in the reduction process. Thus, by inspecting the diagonal values of W_co, the Cauchy indices of the reduced-order models can be inferred. Thus,

order          4    3    2    1
Cauchy index   0    1    0    1

We now consider the transfer function

ḡ(s) = 1/[24(s+1)] + 1/[28(s+3)] + 1/[40(s+5)] + 2/[105(s+10)]

which is equal to the transfer function g(s) except for the positions of the zeros.

The Cauchy index for this transfer function is four, and thus the signature of the cross-Gramian matrix W_co is also four. The Cauchy indices for the reduced-order models of the transfer function ḡ(s) are given by,

order          4    3    2    1
Cauchy index   4    3    2    1

We observe that if the absolute value of the Cauchy index is equal to the system order^2

(a) the system has real poles only
(b) the system is not non-minimum phase.

Thus, the reduced-order models of ḡ(s) are not non-minimum phase. However, such an inference cannot be used for the transfer function g(s), since the Cauchy index is zero. Thus, the occurrence of right-half plane zeros (in the reduction process) cannot be overruled, although the transfer function g(s) is minimum phase. In fact, the second- and third-order reduced models obtained by Moore have right-half plane zeros. However, if the second-order approximation is obtained by eliminating the second and the fourth states (instead of the third and fourth), the reduced model will not have right-half plane zeros.

We have deduced this information by inspecting the partial fraction expansion of the transfer function g(s). However, in large-scale system studies, it is much more convenient to compute the matrix W_co and then, by inspecting the signature of the matrix W_co, we may obtain the same results.
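A minimal sketch of this computation, constructing W_co from the partial-fraction data quoted above (the diagonal construction is an illustrative assumption):

```python
# Minimal sketch: the Cauchy index of g(s) from the signature of W_co.
import numpy as np
from scipy.linalg import solve_sylvester

poles = np.array([-1.0, -3.0, -5.0, -10.0])
residues = np.array([1/24, -1/28, -1/40, 2/105])   # residues of g(s)

A = np.diag(poles)
b = np.sqrt(np.abs(residues))
c = np.sign(residues) * b          # so that b_i c_i equals each residue

Wco = solve_sylvester(A, A, -np.outer(b, c))
lam = np.linalg.eigvals(Wco).real
print(int(np.sum(lam > 0) - np.sum(lam < 0)))   # 0, the Cauchy index of g(s)
```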



7. Conclusions

We have related the Cauchy index to the cross-Gramian matrices and shown the important structural information given by the Cauchy index. The Cauchy index is classically associated with the Hankel matrix and we have proved that the information contained in the Hankel matrix is essentially present in the cross-Gramian matrices.

The wealth of structural information provided by the Cauchy index in the context of internally balanced model-order reduction has been amply demonstrated using a standard example.

8. References

1. Fernando, K.V., and Nicholson, H.: 'On the structure of balanced and other principal representations of SISO systems', IEEE Trans. Automatic Control, 1983, AC-28, (3).

2. Brockett, R.W.: 'Some geometric questions in the theory of linear systems', IEEE Trans. Automatic Control, 1976, AC-21, (4), pp.449-455.

3. Gantmacher, F.R.: 'Theory of Matrices', vol. II, Chelsea, New York, 1976.

4. Moore, B.C.: 'Principal component analysis in linear systems: Controllability, observability, and model reduction', IEEE Trans. Automatic Control, 1981, AC-26, (1), pp.17-32.

5. Anderson, B.D.O.: 'On the computation of the Cauchy index', Quart. Appl. Math., 1972, 29, pp.577-582.


PART 5

Measures for Quantification of Controllability and Observability, and Input-Output Behaviour

CHAPTER 12

The Degree of Controllability due to Individual Inputs

Abstract: Mahalanobis distance, which is an information-theoretic metric measure, can be used as an index to investigate the effectiveness of individual inputs in multivariable control systems.

1. Introduction

Since the introduction of the concept of controllability for linear dynamical systems by Kalman two decades ago, much effort has been devoted to quantifying controllability. The natural candidate for such an index has been the controllability Gramian matrix (W-matrix),

    W(T) = ∫₀ᵀ e^{At} B Bᵀ e^{Aᵀt} dt

for the linear time-invariant dynamical system defined by

    ẋ(t) = Ax(t) + Bu(t)

    x(t) ∈ M_{n,1} ,  u(t) ∈ M_{m,1} ,  A ∈ M_{n,n} ,  B ∈ M_{n,m}

Kalman et al¹ suggested that scalar functions of the W-matrix, namely the determinant and the trace of the inverse of the W-matrix, are suitable for identifying the degree of controllability. These functions are related to the minimal energy problem and physical interpretations are possible².

Kalman et al also realized the importance of the W-matrix as an information-theoretic measure and related it to the Shannon definition of information and the Fisher information matrix. Later, Mitra³ investigated the W-matrix using information quantifiers due to Kolmogorov and Shannon in the context of model-order reduction. However, the basic weakness of this approach is that the W-matrix is not invariant under similarity transformations of the system (A,B). This was overcome by Friedland⁴ by defining the degree of controllability as the condition number of the W-matrix given by

    k(W) = ‖W‖ ‖W⁻¹‖

If ‖·‖ is taken as the spectral norm of W, the condition number is then given by

    k(W) = |λ_max| / |λ_min|

where λ_max and λ_min are the eigenvalues of W having the largest and the smallest magnitudes, respectively. The basic notion for this definition comes from numerical algebra; however, a physical interpretation can be given using Rayleigh quotients⁵,⁶. Recently, Denham⁷ used angles between subspaces, which is a non-metric information measure, for inter-relating inputs and outputs of large-scale systems.

The objective of this Chapter is to develop measures to quantify the degree of controllability of individual inputs in multi-input linear systems. This is achieved by using a metric information measure known as the Mahalanobis distance⁸,⁹.

2. The Mahalanobis distance⁸,⁹

It is well known that for a Gaussian vector z, the probability density function is given by

    p(z) = (2π)^{−n/2} |Φ|^{−1/2} exp{−½ (z − z̄)ᵀ Φ⁻¹ (z − z̄)}

where

    z̄ = E[z] ,  Φ = E[(z − z̄)(z − z̄)ᵀ]

and E[·] denotes the expectation operator. If we take the logarithm of the function, ignoring the constant scalar, we may define a new function

    M(z, z̄) = (z − z̄)ᵀ Φ⁻¹ (z − z̄)

which is still a measure of probability associated with the random vector z. This measure is sometimes known as the Mahalanobis distance from the mean.

For a particular class of random vectors denoted by z^i, belonging to the class S^i, which is a subset of the general class S,

    z^i ∈ S^i ⊂ S ,  i = 1,m

the Mahalanobis distance M(z^i, z̄) defined by

    M(z^i, z̄) = (z^i − z̄)ᵀ Φ⁻¹ (z^i − z̄)

is a measure of 'oscillatory energy' of the vector z^i. The loci of points of constant energy define a hyper-ellipsoid with the principal axes in the directions of the eigenvectors of the covariance matrix Φ. The lengths of the semi-axes are given by the square roots of the eigenvalues. Thus, the role of the matrix Φ⁻¹ is to de-weight heavily the eigenvector 'modes' which are powerfully represented in the set S, and lightly the 'modes' which are not. The expected value of this measure is given by

    E[M(z^i, z̄)] = trace Φ⁻¹ Φ^i

where the matrix Φ^i is the covariance of the vector z^i.

Although this is a distance measure, due to the form of weighting by the matrix Φ⁻¹ it is independent of the unit of measurement (i.e. dimensionless). Thus, it can be considered as the fraction of energy in the subset S^i with respect to the set S.

3. The degree of controllability due to individual inputs

We assume that the linear system (A,B) is asymptotically stable and fully controllable. These assumptions may be relaxed in a more formal study. The W-matrix can be considered as the Gramian matrix

    W(T) = ∫₀ᵀ x(t) xᵀ(t) dt

for deterministic unit impulses at the inputs. For stochastic inputs of the form

    E[u(t)] = 0 ,  E[u(t) uᵀ(τ)] = I δ(t − τ)

we can take the W-matrix as the covariance

    W = lim_{t→∞} W(t) = lim_{t→∞} E[x(t) xᵀ(t)]

Without loss of generality, it is assumed that T→∞, and thus the results for the deterministic and stochastic approaches technically coincide. If there is a unit impulse signal at the i'th input with all other inputs being held zero, the response of the system is given by

    x(t) = x^i(t) = e^{At} b^i

where b^i is the i'th column of the matrix B.

Now we can compare the energy in the controllability subspace due to the i'th input only using the Mahalanobis distance. This is a measure of the degree of controllability of the state-space, and we define this index by

    d_i² = (1/n) trace(W⁻¹ W^i)

where

    W^i = lim_{t→∞} E[x^i(t)(x^i(t))ᵀ] = ∫₀^∞ e^{At} b^i (b^i)ᵀ e^{Aᵀt} dt

4. Properties of the degree of controllability index

(a) The scalar d_i² is invariant under similarity transformations of the linear system (A,B). Under the similarity transformation T, the system becomes (TAT⁻¹, TB) and the distance measure is given by

    d̃_i² = (1/n) trace[(TWTᵀ)⁻¹ (TW^iTᵀ)] = (1/n) trace(W⁻¹ W^i) = d_i²

The result follows from the invariance of the trace under similarity transformations.

(b) The sum of the indexes is equal to unity,

    Σ_{i=1}^m d_i² = 1

The matrix W^i is given by the solution of the Lyapunov equation

    A W^i + W^i Aᵀ = −b^i (b^i)ᵀ

and the result is due to the fact that W = Σ_{i=1}^m W^i.

(c) The indexes are always positive and less than unity (equality holds for the case m = 1),

    1 ≥ d_i² > 0 ,  i = 1,m

Since the system is asymptotically stable, the matrix W^i is positive semi-definite⁵,¹¹, and hence the result.
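A minimal computational sketch of the index follows (an illustration added here, not part of the original chapter; the second-order system is hypothetical). It assumes a stable pair (A,B) and computes each W^i from its Lyapunov equation:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def controllability_indices(A, B):
    n = A.shape[0]
    # W^i solves A W^i + W^i A^T = -b^i (b^i)^T, one per input column
    Wi = [solve_continuous_lyapunov(A, -np.outer(b, b)) for b in B.T]
    W = sum(Wi)                        # W is the sum of the W^i (property (b))
    Winv = np.linalg.inv(W)
    return np.array([np.trace(Winv @ w) / n for w in Wi])

A = np.array([[-1.0,  0.0],
              [ 1.0, -3.0]])
B = np.array([[1.0, 0.2],
              [0.0, 1.0]])
d2 = controllability_indices(A, B)
print(d2, d2.sum())                    # each index in (0, 1]; the sum is 1
```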

5. Extension

The definitions given for controllability of linear continuous systems can be extended to unstable systems and to systems having uncontrollable subspaces, and can be used to study the importance of various outputs using the concept of observability. They can also be extended to discrete systems and to include input signal statistics of the form E[u(t)uᵀ(τ)] = N δ(t−τ), where N is a positive diagonal matrix.

6. Conclusions

We have defined measures to quantify the degree of controllability of individual inputs based on the Mahalanobis distance. These measures are in some ways similar to the information-theoretic measures proposed by Kalman et al¹, and can also be used to investigate the dissimilarity between controllable subspaces¹⁴.

We believe that the definitions based on the Mahalanobis distance are very appropriate for linear systems analysis due to the direct connection with Gaussian processes. However, the Mahalanobis distance is not the only measure available from the fields of information theory and pattern recognition, and other measures⁸⁻¹⁰,¹²,¹³, both metric and non-metric (e.g. angles between subspaces⁷), are important in linear systems analysis and design.

7. References

1. Kalman, R.E., Ho, Y.C., and Narendra, K.S.: 'Controllability of linear dynamical systems', Contributions to Differential Equations, 1963, 1, pp.189-213.

2. Johnson, C.D.: 'Optimization of a certain quality of complete controllability and observability for linear dynamical systems', ASME J. Basic Engineering, June 1969, pp.228-238.

3. Mitra, D.: 'W matrix and the geometry of model equivalence and reduction', Proc. IEE, 1969, 116, pp.1101-1106.

4. Friedland, B.: 'Controllability index based on conditioning number', ASME J. of Dynamical Systems, Measurement & Control, December 1975, pp.444-445.

5. Lancaster, P.: 'Theory of Matrices', Academic Press, New York, 1969.

6. McCallion, H.: 'Vibration of linear mechanical systems', Longman, London, 1973.

7. Denham, M.J., and Mahil, S.S.: 'Determination of large scale system structures using principal component analysis', Proc. Int. Conf. on Control and Its Applications, University of Warwick, UK, March 1981.

8. Tou, J.T., and Gonzalez, R.C.: 'Pattern Recognition Techniques', Addison-Wesley, Reading, MA, 1974.

9. Rao, C.R.: 'Advanced Statistical Methods in Biometric Research', Wiley, New York, 1952.

10. Baram, Y., and Sandell, N.R.: 'An information theoretic approach to dynamical systems modelling and identification', IEEE Trans. Automatic Control, 1978, 23, (1), pp.61-66.

11. Barnett, S., and Storey, C.: 'Matrix Methods in Stability Theory', Nelson, London, 1970.

12. Fukunaga, K.: 'Introduction to Statistical Pattern Recognition', Academic Press, New York, 1972.

13. Baram, Y.: 'Distance measures for stochastic models', Int. J. Control, 1981, 33, pp.149-157.

14. Fernando, K.V., and Nicholson, H.: 'On discrimination of inputs in multi-input systems', under preparation.

CHAPTER 13

The Coherence Between System Inputs and Outputs

Abstract: Some measures, based on Gramian matrices and signal processing practice, are defined to quantify the inter-relationship between a particular system input and an output. Such quantifiers are useful in the analysis and design of linear multi-input multi-output multivariable systems. This approach is particularly significant for preliminary studies involving the input-output behaviour of large-scale or complex systems.

1. Introduction

The analysis of interaction between system inputs and outputs and the implications of such interacting behaviour constitutes an important branch in systems theory, especially in the context of large-scale systems. Although the ultimate aim of such analysis is usually to design a 'controller' for the system, the study of interaction itself poses a non-trivial problem to the system analyst. As an example, for a system with ten inputs and ten outputs (which is not an unusual situation in process control or econometrics), there could be up to a hundred possible forms of interaction. In such situations, physical considerations and intuitive reasoning may break down due to the dimensionality of the problem. Although most multivariable techniques utilise interaction effects in the design of controllers, such refined methods are usually not feasible in large-scale problems.

In process control, interactions between inputs and outputs are usually quantified using the relative gain array, which is an extension of the Bristol measure¹⁵,¹⁶,²², based on a conditioning number related to the gain of the system. Since the steady-state behaviour is paramount in process control, this measure is usually based on the dc gain of the system. The Bristol measure has been used widely in the design of controllers for distillation columns and similar processes.

The study of interaction between inputs and outputs is also an important branch of econometrics. Input-output tables are widely used in the Leontief analysis of national economies and other socio-economic behaviour. Using these tables, Leontief was able to disprove some popular but erroneous beliefs in economics.

From the control-theoretic point of view, the input and output behaviour is usually studied in the context of controllability and observability, respectively. Recently, there has been a resurgence of interest in quantifying controllability and observability. Friedland⁴ defined an index to quantify the 'goodness' of controllability (and observability) using the conditioning number of the controllability Gramian matrix, which led Moore¹ to define balanced and other principal realizations. Moore has shown the advantages of working directly on signals and their statistics rather than on secondary objects such as model parameters. The principal component analysis of Moore led Denham et al² to define some measures to quantify the relationship between a particular input and an output. These measures can be determined by measuring the angles between some controllable and observable subspaces, and also by computing the second-order modes of the system.

In these different but related disciplines, namely econometrics and control theory, the analysis of interaction between inputs and outputs has been considered as an important part of system analysis. Although these approaches differ conceptually, the underlying theme has been the problem of scaling the system to exemplify the interactions. As pointed out by Brockett et al²¹, a unified scaling theory does not exist for the analysis of control systems. In other scientific disciplines, scaling is achieved using non-dimensional quantities, the well-known Reynolds number in fluid mechanics being a typical example.

In estimation and other stochastic problems, cross-correlations and cross-covariances between signals, through which coherence functions can be defined, signify the presence or absence of relationships between these signals. Such coherence measures have been exploited fully in the literature, especially in input-output identification. Paradoxically, the applicability and the suitability of such measures for the analysis and design of multivariable systems have not been fully recognised.

The aim of this Chapter is to define similar measures for relating inputs to outputs based on concepts from signal processing practice and least-squares theory. These complement the measures already defined to quantify the degree of controllability/observability³ (Chapter 12). In common with references 1-6, these measures are defined using Gramian matrices (W-matrices), which can be considered as (auto)covariances or second-order moments in a stochastic formulation of the problem, thus giving a direct link with least-squares theory.

However, the controllability Gramian W_c and the observability Gramian W_o are inadequate to compute coherence functions, since they do not give any explicit information about the cross-effects between inputs and outputs. To account for such input-output interactions, a matrix W_co has been defined by Fernando and Nicholson⁵,¹⁸,¹⁹ (Chapters 9-11), which is again a second-order statistical average and contains information about both the controllability and observability properties. The coherence measures are specified using scalar functions of these matrices.

2. The controllability and observability Gramians and extensions

For the nth-order linear time-invariant, stable, controllable and observable system S(A,B,C),

    ẋ(t) = Ax(t) + Bu(t) ,  y(t) = Cx(t)

the controllability Gramian matrix W_c^i with respect to the input u_i in the infinite interval is defined by

    W_c^i ≜ ∫₀^∞ (e^{At} b^i)(e^{At} b^i)ᵀ dt

The vector b^i denotes the ith column of the matrix B, and u_k(t) is the kth input. In a statistical formulation, the matrix W_c^i can be considered as the (auto)covariance of the states x^i(t), under the white noise input u_i(t) with statistical averages of the form

    E[u_i(t)] = 0 ,  E[u_i(t) u_i(τ)] = δ(t − τ)

where E[·] denotes the expectation operator and δ(·) is the Dirac delta. That is,

    E[x^i(t)(x^i(t))ᵀ] = W_c^i

Similarly, the observability Gramian matrix W_o^j is defined by

    W_o^j ≜ ∫₀^∞ (e^{Aᵀt} c_jᵀ)(e^{Aᵀt} c_jᵀ)ᵀ dt

where c_j denotes the jth row of the matrix C. The dual system is characterized by

    ẋ_d(t) = Aᵀ x_d(t) + Cᵀ v(t) ,  y_d(t) = Bᵀ x_d(t)

and, with white noise at the jth dual input only (v_k(t) = 0 for k ≠ j and all t),

    E[x_d^j(t)(x_d^j(t))ᵀ] = W_o^j

In studies involving inputs, the controllability Gramians W_c^i are paramount, and similarly the observability Gramians W_o^j with respect to the outputs. However, these Gramians do not convey any direct information about the relationships between the inputs and the outputs of the system. It is well known that in similar problems the cross-effects between the two processes x^i(t) and x_d^j(t) are analysed using cross-covariances of the form⁵,¹⁸,¹⁹ (Chapters 9-11)

    E[x^i(t)(x_d^j(t))ᵀ] = W_co^{ij}

With the same white noise input in the controllable and the observable systems,

    u_i(t) = v_j(t)

the matrix W_co^{ij} is given by

    W_co^{ij} = ∫₀^∞ e^{At} b^i c_j e^{At} dt

The matrix W_co^{ij} can be computed by solving the following Lyapunov-type equation⁵,¹⁸,¹⁹ (Chapters 9-11)

    W_co^{ij} A + A W_co^{ij} = −b^i c_j

The fundamental relationship between the cross-Gramian matrix W_co^{ij} and the controllability and observability Gramians W_c^i and W_o^j is given by⁵,¹⁸,¹⁹

    (W_co^{ij})² = W_c^i W_o^j
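As a numerical illustration (a sketch added here; the system matrices are arbitrary), the three Gramians can be obtained from Lyapunov and Sylvester solves and the relationship above checked directly:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, solve_sylvester

A = np.array([[-1.0,  0.0],
              [ 0.0, -3.0]])
b = np.array([1.0, 1.0])    # ith column of B
c = np.array([1.0, -0.5])   # jth row of C

Wc = solve_continuous_lyapunov(A, -np.outer(b, b))    # A Wc + Wc A^T = -b b^T
Wo = solve_continuous_lyapunov(A.T, -np.outer(c, c))  # A^T Wo + Wo A = -c^T c
Wco = solve_sylvester(A, A, -np.outer(b, c))          # A Wco + Wco A = -b c

print(np.allclose(Wco @ Wco, Wc @ Wo))                # True: (Wco)^2 = Wc Wo
```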

3. Input-output relationships

For the system S(A,B,C), driven by white noise inputs of the form

    E[u(t)] = 0 ,  E[u(t) uᵀ(τ)] = I δ(t − τ)

the cross-covariance between the jth output and the ith input is given by

    ∫₀^∞ c_j e^{At} b^i dt

which is the negative of the first moment of the system, or the dc gain. The moments are invariant under similarity transformations and do not depend on the particular state-space realization.

Information about the first moment is also contained in the cross-covariance matrix W_co^{ij}, and follows from the relation

    ∫₀^∞ c_j e^{At} b^i dt = 2 trace ∫₀^∞ e^{At} b^i c_j e^{At} dt = 2 trace W_co^{ij}

Thus, the matrix W_co^{ij} carries information about the ith input and the jth output which is not present in the Gramian matrices W_c^i and W_o^j individually. The matrix W_co^{ij} may thus be considered as the carrier of information from the ith input to the jth output.
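This relation is easily checked numerically (a sketch added here with an arbitrary stable triple; for a stable system the dc gain −c_j A⁻¹ b^i equals twice the trace of the cross-Gramian):

```python
import numpy as np
from scipy.linalg import solve_sylvester

A = np.array([[-1.0,  0.0],
              [ 0.0, -3.0]])
b = np.array([1.0, 1.0])
c = np.array([1.0, -0.5])

Wco = solve_sylvester(A, A, -np.outer(b, c))     # A Wco + Wco A = -b c
dc_gain = -c @ np.linalg.solve(A, b)             # integral of c e^{At} b
print(np.isclose(dc_gain, 2.0 * np.trace(Wco)))  # True
```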

The dc gain is one of the fundamental measures in systems studies and its computation is almost routine in any design method. If the dc gain is zero in a system, then step-wise changes in the output cannot be obtained using step-wise inputs. If the dc gain is non-zero and if the transient response of the system is of no consequence, the long-range (infinite-time) objective of step-wise output control can be achieved by knowing the dc gain alone. This is also true for the multivariable case, where all possible dc gains from each input to all the outputs will be required.

In fact, this is the type of control envisaged by Leontief¹⁰,¹¹ in the study of input-output economic and other models. The essentials of our argument are present in Leontief models, although the terminology is obviously different.

In practice, however, control of economic systems is based on heuristic and fuzzy rules which have been determined through past input-output correlations. Instead of using all the possible inputs to control all outputs (which might be required for the 'optimal' strategy), simplified laws are used to control a particular output (or a group of outputs) using a particular input (or a group of inputs). Such input-output pairings are well known, and perhaps the control of inflation through control of the money supply is a typical example.

4. Possible measures for quantification of input-output relationships

We have shown the importance of the dc gain in control, and without any reservation it can be considered as the most important measure in relating inputs to outputs. For multivariable systems, we may predict the behaviour of the system by inspecting tabulated values of the dc gain.

As mentioned earlier, the dc gains are invariant under similarity transformations, and in that sense they are absolute measures. However, the dc gains are not invariant under input or output scaling. Thus, if a particular input is measured in metres per second instead of kilometres per second, it will be reflected in the dc gain table as a thousandfold increase. If the physical knowledge of the system is limited, this may convey the false impression that the particular input is important because of the high-gain path. Thus, for unbiased understanding of systems, we require measures which are also invariant under input and output scaling. We call such measures structural measures.

One of the measures advocated by Denham et al² depends on the sum of the singular values of the system S(A,b^i,c_j). According to our criterion, this quantity is not a structural measure, since it is not invariant under input and output scaling. However, Denham et al avoided some of the difficulties by using lateral arguments involving ratios. Such analysis is essentially Leontief¹⁰,¹¹ in nature.

Another possible quantifier could be based on the concept of output controllability⁷ by using the value c_j W_c^i (c_j)ᵀ, which is also equal to (b^i)ᵀ W_o^j b^i. Again, this is not a structural measure.

Denham et al also defined quantifiers based on angles between observable subspaces and controllable subspaces. Such a measure is intuitively very appealing and is a true structural property. However, a firm control-theoretic interpretation has not yet been found.

5. The coherence between inputs and outputs

It is well known that a coherence measure γ_pq to relate two scalar stochastic processes p and q can be defined by

    γ_pq = f_pq² / (f_pp f_qq) ,  0 ≤ γ_pq ≤ 1

where the scalar values f_pp, f_qq and f_pq are some second-order statistical averages of the processes. If the processes p and q are directly related to each other then the coherence measure tends to unity. If they are unrelated then the measure approaches zero.

Similarly, the coherence between the input i and the output j can be measured by considering the states x^i(t) of the controllable system and the dual states x_d^j(t) of the observable system. By using scalar functions of the auto-covariance matrices W_c^i and W_o^j and the cross-covariance matrix W_co^{ij}, we may define scalar coherence measures for vector processes. One possible definition for such a measure is given by

    γ_ij = (trace W_co^{ij})² / trace(W_c^i W_o^j) = (trace W_co^{ij})² / trace(W_co^{ij})²

where both the denominator and the numerator, and hence the measure, are invariant under similarity transformations of the form

    (A, b^i, c_j) → (TAT⁻¹, Tb^i, c_j T⁻¹)

It is also seen that the measure is invariant under input and output scaling of the form

    b^i → α b^i ,  c_j → β c_j ,  u_i → (1/α) u_i ,  y_j → (1/β) y_j

where α and β are non-zero scalar values. Thus, this measure is completely independent of scaling and can be considered as a 'non-dimensional' quantity as used in other scientific disciplines. Also, this quantifier is a true structural measure according to our criterion.

We also observe that the term in the numerator is directly related to the dc gain of the system, showing again the importance of this quantity in input-output studies.

An explanation of the term in the denominator of the measure γ_ij, namely trace(W_c^i W_o^j), is required. In the analysis of roundoff noise in digital filters, Mullis and Roberts⁶ demonstrated that (for discrete-time systems) trace(W_c^i W_o^j) can be considered as the 'storage energy capacity' of the system. If we drive the system S(A,b^i,c_j) with unity-variance white noise in the input u_i(t) from t = −∞, then as t → 0

    E[x(t) xᵀ(t)] → W_c^i

From t = 0 onwards, if the system is unexcited with u^i(t) = 0, then

    E[∫₀^∞ (y_j(t))² dt] = trace(W_c^i W_o^j)

As the term 'storage energy capacity' implies, the response y_j(t) is dictated by the amount of storage energy at t = 0. Thus, the quantity trace(W_c^i W_o^j) will be determined essentially by dynamic elements (such as capacitors and inductors in electrical networks) rather than static elements (such as resistors). It can be shown (using integration by parts) that the storage energy is also given by

    ∫₀^∞ t (h_ij(t))² dt

where h_ij(t) denotes the impulse response of the system.

Now the coherence measure can be described as the ratio between the 'static effects' of the system and the 'dynamic effects' of the system, since the numerator is related to the dc gain and the denominator to the storage energy capacity.
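The storage-energy form of the denominator can be confirmed numerically; the sketch below (an added illustration with an arbitrary stable system) compares the time integral with trace(W_c W_o):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, expm

A = np.array([[-1.0,  0.0],
              [ 0.0, -3.0]])
b = np.array([1.0, 1.0])
c = np.array([1.0, -0.5])

Wc = solve_continuous_lyapunov(A, -np.outer(b, b))
Wo = solve_continuous_lyapunov(A.T, -np.outer(c, c))

# impulse response h(t) = c e^{At} b on a fine grid, then the
# integral of t h(t)^2 approximated by a simple Riemann sum
dt = 0.002
t = np.arange(0.0, 40.0, dt)
h = np.array([c @ expm(A * ti) @ b for ti in t])
print(np.isclose(np.sum(t * h * h) * dt, np.trace(Wc @ Wo), rtol=1e-3))
```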

It should now be obvious that the coherence measure defined through signal processing practice has an uncanny analogue in electrical engineering, namely the power factor or the inverse of the Q-factor.

In robust control of systems²⁰ (especially in process control and econometrics¹¹), it is usually desired that static (low-frequency) behaviour is dominant and the dynamic (transient or high-frequency) effects are minimal (although, at the level of 'fine tuning', dynamic effects such as slight 'overshoots' are desirable in servomechanisms after compensation). The use of the coherence measure in this respect is self-evident, since it is based on the ratio between static and dynamic energies.

For a particular input-output pair, if the coherence measure is very low, then it is best described as a 'tuned circuit'. Alternatively, if the measure is high, then due to the relatively high dc gain, a robust design of controller is possible¹³.

We wish to point out that the Bristol measure, which is essentially based on the dc gain of the system, is a quantifier of the static behaviour of the system, while the measure proposed by Denham et al is essentially dynamic. In contrast, the coherence measure defined above depends on both the static and the dynamic behaviour of the system.

6. The coherence measure for internally balanced representations

The system S(A,b^i,c_j) is said to be internally balanced if the controllability Gramian matrix W_c^i and the observability Gramian matrix W_o^j are diagonal and equal. If a system S(A,b^i,c_j) is not internally balanced, this can be achieved by similarity transformations, and

hence the coherence measure will also be invariant. We denote the balanced Gramian matrix as W̄^{ij}, that is

    W̄^{ij} = W̄_c^i = W̄_o^j

Since (W_co^{ij})² = W_c^i W_o^j, it is seen⁵ that the cross-Gramian matrix W_co^{ij} is diagonal for balanced representations, and we denote this diagonal matrix as V^{ij}; that is, W_co^{ij} = V^{ij}. It is easy to verify that the diagonal values of the matrix V^{ij}, denoted by v_k^{ij}, k = 1,n, are the eigenvalues of the cross-Gramian matrix W_co^{ij}.

In fact, these diagonal values are the singular values (second-order modes) of the system except for possible sign variations. If we denote the singular values of the system by w_k^{ij}, k = 1,n, which are the diagonal values of the matrix W̄^{ij}, then

    w_k^{ij} = |v_k^{ij}| ,  k = 1,n

The following relationships are then obvious:

    trace W_co^{ij} = Σ_{k=1}^n v_k^{ij}

    trace(W_c^i W_o^j) = trace(W_co^{ij})² = Σ_{k=1}^n (v_k^{ij})²

The coherence measure is then given by

    γ_ij = (Σ_{k=1}^n v_k^{ij})² / Σ_{k=1}^n (v_k^{ij})²     (1)

7. A modified coherence measure

It is well known that it is difficult to control systems which have both positively and negatively decaying exponentials in their impulse responses. This is a major problem in process control systems¹⁵⁻¹⁷,²², where simple controllers are preferred. The intrinsic reason for non-minimum phase behaviour of systems is also due to these mixed exponentials.

Since the signs of the exponentials are determined by the residues at the poles of the system, these signatures are given by the Cauchy index¹⁸ (Chapter 11) (assuming that the poles are distinct). For an nth-order system with all positively decaying exponentials, the Cauchy index is given by n. Similarly, if they are all negatively decaying, then it is equal to −n. Such systems are always minimum phase. However, if the Cauchy index lies between these extremes, then non-minimum phase behaviour can occur, depending on the parameters of the system, and with increasing possibility if the magnitude of the index is low. Thus, the Cauchy index is an indicator which can be used in identifying 'troublesome' input-output pairs.

The main difficulty with the Cauchy index is that, like the rank of a matrix⁴, it is essentially a non-robust measure which can vary under a small perturbation of the system parameters. Thus, to properly quantify the information in the Cauchy index, we have to qualify it by using a condition number. In this respect, the cross-Gramian matrix W_co^{ij} is valuable.

Fernando and Nicholson¹⁸ (Chapter 11) have shown that the signature of the matrix W_co^{ij} is equal to the Cauchy index of the system. Since the robust part of the system is dictated by the dominant second-order modes (as given by the unsigned eigenvalues of the matrix W_co^{ij}), the signature of these values is more important than the signature of the non-dominant values. One way of accounting for the magnitudes of the second-order modes in the definition of the 'condition number' is to weight them proportionally. Such a measure can be defined using the eigenvalues of the cross-Gramian matrix W_co^{ij} as

    β_ij = (Σ_{k=1}^n v_k^{ij})² / (Σ_{k=1}^n |v_k^{ij}|)²     (2)

The numerator of this measure, as in the case of γ_ij, is based on the dc gain of the system. The denominator depends on the dc gain of a hypothetical system with a cross-Gramian matrix which has positive eigenvalues given by |v_k^{ij}|. That is,

    β_ij = (dc gain of the original system S(A,b^i,c_j))² / (dc gain of the hypothetical system)²

Since β_ij is similar to the coherence measure γ_ij, we call this new measure the modified coherence measure. It is easily verified that this measure always takes values between zero and one,

    0 ≤ β_ij ≤ 1

Furthermore, for first-order systems it is always equal to unity, and for non-proper systems it is equal to zero.

If the magnitude of the Cauchy index of the dominant part of the system is high, then the measure will take values near unity. If it is low, then the measure β_ij will take low values.

8. The numerical procedure

To compute the coherence measures between the input i and the output j:

(a) Solve the equation W_co^{ij} A + A W_co^{ij} = −b^i c_j using the algorithm given in reference 12 or an equivalent algorithm.

(b) Compute the eigenvalues of the matrix W_co^{ij}, which are denoted by v_k^{ij}, k = 1,n.

(c) Compute the coherence measure γ_ij using equation (1) and/or the modified measure β_ij using equation (2).

Since eigenvector calculations are avoided, the computation of the measures can be accomplished using orthogonal transformations. Thus, the procedure is numerically well-conditioned. A computational sketch of steps (a)-(c) is given below.
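The following minimal sketch (added for illustration; SciPy's Sylvester solver is used as a stand-in for the Bartels-Stewart algorithm of reference 12) implements the three steps:

```python
import numpy as np
from scipy.linalg import solve_sylvester

def coherence_measures(A, bi, cj):
    # (a) cross-Gramian: A Wco + Wco A = -b^i c_j
    Wco = solve_sylvester(A, A, -np.outer(bi, cj))
    # (b) its eigenvalues v_k (real for a SISO pair)
    v = np.linalg.eigvals(Wco).real
    # (c) the coherence measures of eqns (1) and (2)
    gamma = v.sum() ** 2 / (v ** 2).sum()
    beta = v.sum() ** 2 / np.abs(v).sum() ** 2
    return gamma, beta
```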

As described in Section 6, the 'obvious' computational scheme would be to compute the balanced realizations for each input and output. However, this obvious approach is undesirable for the following reasons.

(1) Computation of balanced realizations is numerically expensive, and such realizations are unnecessary as far as the computation of the second-order modes and the measures are concerned.

(2) All published numerical procedures for computation of balanced realizations are based on the eigen-structure of the matrix product W_c W_o, which is equal to the square of the matrix W_co. Since the matrix product is not formed in our approach, it is a 'square-root' method and thus well-conditioned with respect to roundoff errors.

However, if balanced realizations are required, then they can be obtained⁵ (Chapter 9) by computing the eigenvectors of the matrix W_co.

9. The experimental procedure

The coherence measure γ_ij can be computed using experimental results, without knowing the model of the process. The procedure is:

(a) Determine the dc gain from input i to output j using step responses.

(b) Determine the impulse response from input i to output j, denoted by h_ij(t), and compute the integral

    ∫₀^∞ t (h_ij(t))² dt = trace(W_co^{ij})² = trace(W_c^i W_o^j)

(c) Compute the coherence measure using the formula

    γ_ij = (0.5 × dc gain of the system S(A,b^i,c_j))² / trace(W_c^i W_o^j)

10. Illustrative examples

Example 1: The single-input single-output system defined by the transfer function

    R(s) = s / ((s+2)(s+1))

was considered by Wang and Davison¹³ to illustrate the difficulties in designing a robust controller. Because of the zero at the origin, the first moment of the system is zero and thus the system cannot track a step input. The coherence measures are obviously equal to zero,

    γ_ij = β_ij = 0

due to the zero at the origin. Thus, our measures also indicate that the system is somewhat ill-conditioned. However, further research is required to compare ill-conditioning in robust controller problems with that of balanced systems.

The diagonal V matrix for the system is given by

    V¹¹ = diag(1/36, −1/36)

and the singular values (second-order modes) are of the form

    W̄¹¹ = diag(1/36, 1/36)

One of the reasons for the ill-conditioned nature of the system is the non-distinct singular values⁵,¹⁴, which is reflected directly in the coherence measures.

This example also indicates that if the first and the higher derivatives of the input are dominant in the impulse response, then the coherence measures are low, tending towards zero. Alternatively, if the derivatives are not dominant, then we may expect high coherence values.

Example 2: To illustrate some of the properties of the measures, we consider the transfer function

    g(s) = (s + z) / ((s + 1)(s + 3))

where z is a real number. If z lies between 1 and 3, then the residues at the poles are positive (Cauchy index of 2) and thus the exponentials will be positively decaying. Such systems are relatively easy to control. If z lies outside this domain, then the Cauchy index is zero and the exponentials will have opposite signs. Due to the form of the measures, we would expect large values for them if z lies between 1 and 3.

We now consider a 2×2 transfer-function matrix with the common denominator (s + 1)(s + 3), whose numerator entries are weighted by real scalar constants k_ij, i,j = 1,2.

Denham et al used the storage energy capacity of the system (sum of the singular values) as the criterion for choosing input-output pairs for control. If k₁₁ is large enough, then the combination (u₁,y₁) can be obtained as the 'optimal' result, which is obviously a bad choice since the subsystem 1,1 is non-proper.

If we use the dc gain as the deciding criterion (assuming all k_ij are equal), then the combination (u₂,y₂) is the first choice. This is inappropriate due to the badly positioned numerator zero.

Table 1 gives the coherence measures (which are invariant of k_ij) for this problem. It is seen that the best combinations for control are given by (u₁,y₂) and (u₂,y₁).

Table 1: The measures for example 2

    1(a): The coherence measure γ_ij

          y₁        y₂
    u₁    0.        1.10345
    u₂    0.86487   0.48251

    1(b): The modified coherence measure β_ij

          y₁        y₂
    u₁    0.        1.0
    u₂    0.76191   0.48251

Example 3: To illustrate the measures we have defined for physical systems, we have considered the state-space model of an oil-fired boiler⁸. This is a ninth-order model which has an asymptotically stable 8th-order subsystem, which can be obtained by directly decoupling the ninth state. There are three main inputs and three significant states in this subsystem. They are:

    u₁   mass flow of the steam at the superheater
    u₂   mass flow of fuel
    u₃   mass flow of water at the economiser

    x₁   steam density
    x₂   superheater steam temperature
    x₆   steam drum pressure

Tables 2(a) and 2(b) show half the dc gain and the sum of the singular values, respectively, which are required in the calculation of the coherence measures. It is obvious from these tables that, even for a three-input three-output system, the relationships between inputs and outputs are difficult to assess.

However, from tables 2(c) and 2(d), which give the coherence measures, the inter-relationships between inputs and outputs are quite explicit. These tables indicate that the three inputs are directly related to all the three outputs except that u₃ is almost completely unrelated to x₂. This tabular evidence is in agreement with physical reasoning.

If one-to-one input-output pairing is required, we may obtain the following combinations by observing the highest values in table 2(c) or 2(d) and avoiding the unrelated pair (u₃,x₂) in the combination. One possible combination is given by

    (u₁,x₁) ,  (u₂,x₂) ,  (u₃,x₆)

which is a reasonable choice. However, this is not the only possible solution.

Table 2: The measures for example 3

    2(a): Half the dc gain

          x₁            x₂            x₆
    u₁   −1.021×10⁻²   −4.807        −1.248×10³
    u₂    2.098×10⁻¹    8.470×10¹     2.700×10⁴
    u₃   −4.764×10⁻³   −6.335×10⁻⁶   −4.645×10²

    2(b): Sum of the singular values

          x₁            x₂            x₆
    u₁    8.483×10⁻³    4.698         1.179×10³
    u₂    2.088×10⁻¹    8.491×10¹     2.695×10⁴
    u₃    5.093×10⁻³    2.794×10⁻²    4.956×10²

    2(c): The coherence measure γ_ij

          x₁        x₂          x₆
    u₁    1.450     1.047       1.120
    u₂    1.009     0.995       1.010
    u₃    0.875     5.1×10⁻⁸    0.878

    2(d): The modified coherence measure β_ij

          x₁        x₂          x₆
    u₁    0.958     0.976       0.964
    u₂    0.891     0.988       0.924
    u₃    0.775     2.4×10⁻⁸    0.781

Example 4: As the final example, we have chosen to study the 16th-order model⁹ of an F100 turbofan jet engine which was also considered by Denham et al². This model also has three main inputs and three main outputs, where

    u₁   main burner fuel flow
    u₂   nozzle jet area
    u₃   inlet guide vane position

    y₁   engine net thrust level
    y₂   total engine airflow
    y₃   turbine inlet temperature

Tables 3(a) and 3(b) give half the dc gain and the sum of the singular values, respectively, and we observe that the values in table 3(b) are marginally different from those of Denham et al². As mentioned earlier, Denham et al used some lateral arguments to choose input-output pairings for this problem. However, if we consider table 3(c) or 3(d), the conclusions are quite explicit and are given by

    (u₁,y₁) ,  (u₂,y₃) ,  (u₃,y₂)

and these combinations agree with those of Denham et al². However, our measure γ_ij is fundamentally different from that of Denham et al, since the sum of the singular values used by them appears in the denominators of eqns 1 and 2.

Table 3: The measures for example 4

    3(a): Half the dc gain

          y₁            y₂            y₃
    u₁    5.000×10⁻¹    2.610×10⁻³    6.063×10⁻²
    u₂   −4.819×10²     8.627         1.412×10²
    u₃   −7.075        −1.973×10⁻¹   −1.072

    3(b): Sum of the singular values

          y₁            y₂            y₃
    u₁    4.339×10⁻¹    2.528×10⁻³    5.864×10⁻²
    u₂    1.069×10³     9.594         1.508×10²
    u₃    1.048×10¹     1.938×10⁻¹    1.500

    3(c): The coherence measure γ_ij

          y₁        y₂        y₃
    u₁    1.328     1.066     1.069
    u₂    0.203     0.809     0.877
    u₃    0.456     1.036     0.511

    3(d): The modified coherence measure β_ij

          y₁        y₂        y₃
    u₁    0.9086    0.7932    0.7732
    u₂    0.0962    0.5860    0.5891
    u₃    0.2296    0.8953    0.2758

11. Conclusion

We have defined coherence measures using fundamental data related to controllability and observability which are consistent with signal processing practice and least-squares theory. The first coherence measure may be interpreted as the ratio between static and dynamic energy. The modified measure is directly related to the Cauchy index of the system. We have illustrated the usefulness and the importance of these measures using non-trivial examples.

The coherence measures can be evaluated for very large-scale systems because of the availability of the efficient numerical algorithms required for computation. Alternatively, they may be determined experimentally.

We have also highlighted the relationships between our approach and that of input-output Leontief models. In such economic and other social systems, the dc gain is paramount and the transient behaviour is secondary. Since the second-order modes (singular values) are available in the intermediate calculations, the controllability and observability properties of the system are also available, which is a definite advantage in these studies.

We have also established the importance of the coherence measures in determining possible non-minimum phase behaviour and the occurrence of exponentials of mixed signs in impulse responses. Such information is especially valuable in process control.

We do not claim that the measures defined in this Chapter are the best or that there are no other alternatives in large-scale system analysis. However, our approach seems to be reasonable and applicable. Furthermore, these measures are consistent with the requirements in a variety of disciplines.

We have not discussed the possible application of these measures in the control of large or complex systems. However, the coherence measures, together with other measures which characterise dynamic and static behaviour, can have applications in 'structure-free' modelling and control of systems as envisaged by Bristol²⁰ and others. These measures can also have application in the design of fuzzy controllers.

12. References

1. Moore, B.C.: 'Principal component analysis in linear systems: Controllability, observability and model reduction', IEEE Trans. Automatic Control, 1981, AC-26, (1), pp.17-32.

2. Denham, M.J., and Mahil, S.S.: 'Determination of large scale system structures using principal component analysis', Proc. Int. Conf. on Control and Its Applications, University of Warwick, UK, March 1981.

3. Fernando, K.V., and Nicholson, H.: 'Degree of controllability due to individual inputs', Electronics Letters, 1981, 17, (9), pp.330-331.

4. Friedland, B.: 'Controllability index based on conditioning number', ASME J. Dyn. Syst. Meas. & Control, 1975, pp.444-445.

5. Fernando, K.V., and Nicholson, H.: 'On the structure of balanced and other principal representations of SISO systems', IEEE Trans. Automatic Control, 1983, AC-28, (3).

6. Mullis, C.T., and Roberts, R.A.: 'Roundoff noise in digital filters: Frequency transformations and invariants', IEEE Trans. Acoustics, Speech, and Signal Processing, 1976, ASSP-24, (6), pp.538-550.

7. Kreindler, E., and Sarachik, P.E.: 'On the concept of controllability and observability of linear systems', IEEE Trans. Automatic Control, 1964, AC-9, (2), pp.129-136.

8. Nicholson, H.: 'Dynamic optimisation of a boiler', Proc. IEE, 1964, 111, pp.1479-1499.

9. Sain, M.K., Peczkowski, J.L., and Melsa, J.L.: 'Alternatives for linear multivariable control', National Engineering Consortium, Chicago, 1978.

10. Bacharach, M.: 'Biproportional matrices and input-output change', The University Press, Cambridge, 1970.

11. Leontief, W.: 'Input-output economics', Oxford Univ. Press, New York, 1966.

12. Bartels, R.H., and Stewart, G.W.: 'Solution of the matrix equation AX + XB = C', Communications ACM, 1972, 15, (9), pp.820-826.

13. Wang, S.H., and Davison, E.J.: 'Penetrating transmission zeros in the design of robust servomechanism systems', IEEE Trans. Automatic Control, 1981, AC-26, (3), pp.784-787.

14. Pernebo, L., and Silverman, L.: 'Balanced systems and model reduction', Proc. 18th IEEE Conf. Decision Control, Fort Lauderdale, Florida, December 1979.

15. Tung, L.S., and Edgar, T.F.: 'Analysis of control-output interactions in dynamic systems', AIChE Journal, 1981, 27, (4), pp.690-693.

16. Bristol, E.H.: 'Recent results on interactions in multivariable process control', AIChE 71st Annual Meeting, Miami, FL, November 1978.

17. Bristol, E.H.: 'The right half plane'll get you if you don't watch out', Proc. JACC, 1981, Charlottesville, VA, USA.

18. Fernando, K.V., and Nicholson, H.: 'On the Cauchy index of linear systems', IEEE Trans. Automatic Control, 1983, AC-28, (3), (tentative).

19. Fernando, K.V., and Nicholson, H.: 'On the minimality of SISO linear systems', Proc. IEEE, to be published.

20. Bristol, E.H.: 'Pattern recognition: An alternative to parameter identification in adaptive control', Automatica, 1977, 13, pp.197-202.

21. Brockett, R.W., and Krishnaprasad, P.S.: 'A scaling theory for linear systems', IEEE Trans. Automatic Control, 1980, AC-25, (2), pp.197-207.

22. Shinskey, F.G.: 'Process-Control Systems', McGraw-Hill, New York, 1979.

CHAPTER 14

On Discrimination of Inputs in Multi-input Systems

Abstract: A metric information measure known as the Mahalanobis distance is used to quantify the dissimilarity between controllable subspaces due to any two inputs in multi-input linear systems.

1. Introduction

Recently, a metric information measure known as the Mahalanobis distance²⁻⁴ was used to quantify the effectiveness of inputs in multi-input linear systems¹. However, for proper understanding of a control system, this measure of effectiveness alone is not sufficient, and a measure for the similarity or dissimilarity of the controllable subspaces due to individual inputs is required. The object of this Chapter is to define such a measure for linear systems using the Mahalanobis distance.

Apart from the theoretical importance, such measures can be used to analyse complex systems. With some control problems, the designer will have discretion in choosing the system inputs based on physical reasoning and engineering judgement. However, with complex problems such qualitative reasoning might be difficult or absent, and more quantitative measures are required.

Quantitative measures for discrimination of inputs are important for system operation; under emergency operating conditions, an alternative control strategy with loss of an input (e.g. actuator failure) can be devised if the similarity or dissimilarity of controllable subspaces due to individual inputs is known. Thus, quantifiers for the discrimination of inputs, together with an effectiveness measure¹ (Chapter 12), are crucial in the design and operation of complex systems.

2. The Mahalanobis distance for discrimination of vectors²⁻⁴

For two classes of random vectors of dimension n, denoted by z^i and z^j, which belong to the classes S^i and S^j, respectively, and which are subsets of the general class S,

    z^i ∈ S^i ⊂ S ,  i = 1,m

the Mahalanobis distance between the vectors is defined by

    M(z^i, z^j) = (z^i − z^j)ᵀ Φ⁻¹ (z^i − z^j)

The square n,n matrix Φ is the covariance of the vectors z,

    Φ = E[(z − z̄)(z − z̄)ᵀ] ,  z̄ = E[z]

and E[·] denotes the expectation operator. The expected value of the measure is given by

    E[M(z^i, z^j)] = trace Φ⁻¹{(Φ^{ii} + Φ^{jj}) − (Φ^{ij} + Φ^{ji})}

where Φ^{ij} denotes the cross-covariance of the vectors z^i and z^j. It is a measure of dissimilarity between the classes S^i and S^j, and gives high values if they are orthogonal and low values when they are similar.

3. Discrimination of inputs

We assume that the linear system (A,B) is asymptotically stable and fully controllable. The controllability Gramian for the system

    ẋ = Ax + Bu

is given by

    W(T) = ∫₀ᵀ e^{At} B Bᵀ e^{Aᵀt} dt

for deterministic unit impulses at the inputs. For stochastic inputs of the form

    E[u(t)] = 0 ,  E[u(t) uᵀ(τ)] = I δ(t − τ)

we can take the W-matrix as the covariance

    W = lim_{t→∞} W(t) = lim_{t→∞} E[x(t) xᵀ(t)]

If there is an impulse at the ith input, with all other inputs to the system being held zero, the response of the system is given by

    x(t) = x^i(t) ,  b^i ∈ M_{n,1}

where the vector b^i is the ith column of the matrix B.

We assume that the columns of the matrix B are normal, which can be achieved by scaling the inputs. That is, (b^i)ᵀb^i = 1 for all i. This normalisation is required to avoid discrimination of two inputs which are identical except for their amplitudes.

The degree of dissimilarity between the controllable subspaces due to the ith and jth inputs can be measured using the Mahalanobis distance measure for discrimination of vectors. We define this measure as

    d_ij² = trace W⁻¹{(W^{ii} + W^{jj}) − (W^{ij} + W^{ji})}

where

    W^{ij} = ∫₀^∞ e^{At} b^i (b^j)ᵀ e^{Aᵀt} dt ,  W = ∫₀^∞ e^{At} B Bᵀ e^{Aᵀt} dt

The matrix W^{ij} can be computed by solving the Lyapunov equation

    A W^{ij} + W^{ij} Aᵀ = −b^i (b^j)ᵀ

Obviously, high magnitudes of d_ij² indicate dissimilarity of the controllable subspaces due to inputs i and j.
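A computational sketch follows (an illustration added here, not from the original text); it assumes unit-norm columns of B and uses Lyapunov solves for W and the W^{ij}, noting that W^{ji} = (W^{ij})ᵀ:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def discrimination_index(A, B, i, j):
    # each Gramian solves A X + X A^T = -(right-hand side)
    lyap = lambda M: solve_continuous_lyapunov(A, -M)
    Wii = lyap(np.outer(B[:, i], B[:, i]))
    Wjj = lyap(np.outer(B[:, j], B[:, j]))
    Wij = lyap(np.outer(B[:, i], B[:, j]))
    W = lyap(B @ B.T)
    # d_ij^2 = trace W^{-1}{(W^ii + W^jj) - (W^ij + W^ji)}
    return np.trace(np.linalg.solve(W, (Wii + Wjj) - (Wij + Wij.T)))
```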

4. Properties of the discrimination index

(a) Under similarity transformations of the form

    (A,B) → (TAT⁻¹, TB)

the scalar d_ij² is invariant.

(b) The discrimination index is nonnegative,

    d_ij² ≥ 0

(c) For input-normal systems⁵, the Gramian matrix W is equal to the identity, and thus the index can be written in the simplified form

    d_ij² = trace{(W^{ii} + W^{jj}) − (W^{ij} + W^{ji})}     (1)

5. A modified measure

We assume, without loss of generality, that the system is input-normalized⁵. The distance measure for discrimination is then given by eqn 1. If the ith column of the matrix B is equal to the jth column, then the distance between them is zero,

    d_ij² = 0

However, if b^j differs from b^i only with respect to sign, then the distance is given by

    d_ij² = 4 trace W^{ii}

In most applications, discrimination with respect to sign variations is not required. Thus, we define a modified measure which is insensitive to sign variations as

    d̄_ij² = trace(W^{ii} + W^{jj}) − t̄r(W^{ij} + W^{ji})

where t̄r is defined as

    t̄r W = Σ |w_ii|

in which the w_ii are the diagonal elements of the matrix W. Thus, for column vectors b^i and b^j which differ with respect to sign only, the modified distance measure is given by

    d̄_ij² = 0

6. Conclusions

A measure for discrimination of the controllable subspaces due to any two inputs in a multi-input system has been defined using the Mahalanobis distance. A modified measure was also defined which is insensitive to sign variations. These measures, together with that for controllability, provide meaningful measures for quantifying the effectiveness of individual inputs. By using observability Gramians instead of controllability Gramians, these measures can also be used to analyse outputs in multiple-output systems.

7. References

1. Fernando, K.V., and Nicholson, H.: 'The degree of controllability due to individual inputs', Electronics Letters, 1981, 17, (9), pp.330-331.

2. Tou, J.T., and Gonzalez, R.C.: 'Pattern recognition techniques', Addison-Wesley, Reading, MA, 1974.

3. Marill, T., and Green, D.M.: 'On the effectiveness of receptors in recognition systems', IEEE Trans. Information Theory, 1963, IT-9, pp.11-17.

4. Mahalanobis, P.C.: 'On the generalized distance in statistics', Proc. National Inst. Sci. India, 1936, 2, pp.49-55.

5. Moore, B.C.: 'Principal component analysis in linear systems: Controllability, observability, and model reduction', IEEE Trans. Automatic Control, 1981, AC-26, (1), pp.17-32.

PART 6

Closure
CHAPTER 15

Closure

1. Some precepts (and heresies)

The importance of the Gramian and covariance matrices and their spectral decompositions in control and systems theory, including pattern recognition, has been highlighted in this thesis. In all the problems studied, removal of redundant or 'almost' redundant information was the recurrent theme. Such superfluous data (often due to noise, uncontrollable/unobservable modes, unreliable high-frequency effects, modelling errors etc.) unnecessarily complicates the understanding of the underlying mechanisms which govern the process. Thus, removal of redundant data is a prerequisite in the analysis of complex or large-scale systems. In any case, the predicted information explosion due to advances in microelectronics, fibre optics, satellite communication and intelligent computers can be deflated by efficient methods of information contraction.

In Part 2 of the thesis, where the Karhunen-Loève expansion and its extensions are studied, the models assumed for data reduction and extrapolation were rather subtle or non-existent. However, in Parts 3 and 4, formal state-space representations were used in the analysis. It is well known that a large class of systems can be modelled via state-space representations (or equivalently by transfer functions), and thus model-order reduction methods and related problems are paramount in system theory. However, it is also well known that there are non-trivial problems in parameter identification even with linear representations (for example, see reference 1 for the different results given by different techniques of identification for a second-order scalar system). An alternative to identification is mathematical modelling. However, except for systems which are governed by well-behaved physical laws (such as electromagnetic phenomena), modelling is no easy task, and this leaves out most of the social phenomena. Thus, state-space and other formal representations, although elegant from a mathematical point of view, cannot be considered as universal. Due to these difficulties in formal representations, the author believes that more effort should be spent on investigating the feasibility of using simple, subtle and implicit models in representing dynamical systems. This is the motivation for Part 5, where non-orthodox ways of characterization of dynamical systems are studied.

The airline data which is prominent in Part 2 has been analysed in the literature using a wide variety of complicated time-series techniques. However, as indicated in Chapter 4, a simple model based on the Karhunen-Loève expansion/singular value decomposition can give superior results to those of time-series methods. This is due to the non-conformity of data to man-made assumptions. For example, it is often assumed that the data is stationary, does not have trends, is asymptotically stable, etc., which are at odds with reality. The author believes that by reducing the number of assumptions and by using more simple models some of these problems can be avoided.

What are the alternatives if formal models are too complex or non-existent, which is the case for most of the phenomena observed in real life?

This is a difficult question to answer. However, one approach could be to decompose the data into orthogonal (perpendicular) components so that each component can be studied as an independent scalar system. In this way, it may be possible to understand the underlying mechanics (if any) of the process without the problems associated with dimensionality. This is the main reason for diagonalization of covariance/Gramian matrices.

One may argue that the orthogonalization procedure is an abstract mathematical technique without any importance in reality. This is not true, and there are many situations where orthogonality has been used in practice. There are a non-countably infinite number of colours in the visible spectrum. However, we need only three primary (independent) colours to represent the wide spectrum.

2. References

1. Söderström, T.: 'Identification of stochastic linear systems in presence of input noise', Automatica, 1981, 17, (5), pp.713-725.

PART 7

Appendices

APPENDIX 1

The double-sided least-squares problem

Abstract. The double-sided least-squares problem is formulated under a separability condition, using the properties of the Kronecker product to obtain the overall solution based on two standard sub-problems.

1. Introduction. The standard matrix least-squares problem¹ is concerned with obtaining a solution for the 'n' column elements of the unknown state or parameter matrix X which are related to the 'n' column elements of the observed matrix Y by the equation

    Y = HX + E ,  Y ∈ M_{m,n} ,  H ∈ M_{m,p} ,  X ∈ M_{p,n}     (1)

where H is a known matrix and E is a matrix of residual errors. If H is of maximal rank p, the least-squares estimate, obtained by minimizing the error criterion

    J = Eᵀ P E ,  P ∈ M_{m,m}

is given by

    X̂ = (HᵀPH)⁻¹ HᵀP Y

The solution corresponds to a linear transformation of Y of the form

    Ŷ = HX̂ = NPY ,  N = H(HᵀPH)⁻¹Hᵀ

giving the observed error matrix

    Y − Ŷ = (I − NP)Y = P⁻¹LY ,  L = P − PNP

and the minimized criterion J = Yᵀ L Y.

In terms of the matrix elements, the relationship of eqn 1 is given by

    y_ij = Σ_{k=1}^p h_ik x_kj + e_ij

The column j of the matrix Y correlates with the column j of the matrix X, and no interactions are assumed to exist between the columns.

A similar least-squares problem can also be formed in terms of observed and state row vectors, with the matrix representation

    Y = XGᵀ + E ,  Y ∈ M_{m,n} ,  X ∈ M_{m,q} ,  G ∈ M_{n,q}     (2)

where G is the known coefficient matrix. The least-squares solution obtained by minimizing J = EQEᵀ is then given by

    X̂ = YQG(GᵀQG)⁻¹

The elements of eqn 2 are given by

    y_ij = Σ_{k=1}^q x_ik g_jk + e_ij

In this case, row i of the matrix Y correlates with the same row of the matrix X, and no interactions exist between rows.

The problems represented by eqns 1 and 2 are equivalent, since the ordering of rows and columns are not generic properties. The correlations or the 'flows' are in different directions but are, however, uni-directional. Such relationships have very wide application in classical least-squares estimation, but are inadequate for the representation of two-dimensional processes

involving two-directional flows. These require a more general

transformation between two matrices which will correlate both row

and column elements.


Such a relationship exists between the m,n dimensional matrix Y and a p,q dimensional matrix Z with the Kronecker product mapping [2]

vec(Y) = F vec(Z) ,   F ∈ M_{mn,pq} , vec(Y) ∈ M_{mn,1}     (3)

or

y = (G ⊗ H) z ,   G ∈ M_{n,q} , H ∈ M_{m,p} , z ∈ M_{pq,1}

where vec(·) is the operator which stacks the columns of a matrix into a column vector and F is the Kronecker product of the matrices G and H. Cross-correlations then exist between rows and columns, with the expansion

y_ij = Σ_k Σ_l h_ik z_kl g_jl

Each element y_ij of the matrix Y now depends on all the elements of the matrix Z instead of on one particular row or column. The matrix equivalent of eqn 3 is given by

Y = H Z G^T

This includes a composition of two linear transformations, and the equivalent vector map is formed using the tensor or Kronecker product of the two linear transformations. This form of representation was known to Sylvester [3].
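As a numerical aside, not part of the original development, the vec/Kronecker identity underlying eqn 3 is easily checked. The sketch below assumes only NumPy; the dimensions are arbitrary and purely illustrative.

```python
import numpy as np

# Verify vec(H Z G^T) = (G kron H) vec(Z), where vec(.) stacks columns.
rng = np.random.default_rng(0)
m, p, n, q = 4, 3, 5, 2                 # illustrative dimensions
H = rng.standard_normal((m, p))
G = rng.standard_normal((n, q))
Z = rng.standard_normal((p, q))

vec = lambda M: M.flatten(order='F')    # column-stacking operator vec(.)

lhs = vec(H @ Z @ G.T)                  # matrix form of eqn 3
rhs = np.kron(G, H) @ vec(Z)            # vector (Kronecker) form
assert np.allclose(lhs, rhs)
```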

2. The double-sided least-squares problem. We now consider the double-sided least-squares formulation with the equation

Y = H Z G^T + E     (4)

relating the measurement or observed matrix Y and the parameter or state matrix Z, where H and G are known maximal rank matrices and E is the residual error matrix. The vector form of eqn 4 is

y = F z + e

where e = vec(E). In a statistical framework, we may also introduce the properties

E[e] = 0 ,   E[e e^T] = R

where E[·] is the expectation operator and R is the error covariance matrix of the random noise vector e.

With a minimizing function

J = e^T S e ,   S ∈ M_{mn,mn}     (5)

and with F of maximal rank (pq), the least-squares solution is given by

ẑ = (F^T S F)^{-1} F^T S y

The problem dimension can now be reduced if the error criterion matrix S is assumed to be separable, of the form

S = Q ⊗ P

The least-squares solution vector is then given by

ẑ = [(G ⊗ H)^T (Q ⊗ P)(G ⊗ H)]^{-1} (G ⊗ H)^T (Q ⊗ P) y
  = [(G^T Q G)^{-1} ⊗ (H^T P H)^{-1}] [(G^T Q) ⊗ (H^T P)] y
  = [(G^T Q G)^{-1} G^T Q] ⊗ [(H^T P H)^{-1} H^T P] y



and the corresponding least-squares solution matrix is

Ẑ = (H^T P H)^{-1} H^T P Y Q G (G^T Q G)^{-1}     (6)
  = Z + (H^T P H)^{-1} H^T P E Q G (G^T Q G)^{-1}

The measurement estimate then introduces a composition of two linear transformations of Y of the form

Ŷ = H Ẑ G^T = N P Y Q M

where

N = H (H^T P H)^{-1} H^T ,   M = G (G^T Q G)^{-1} G^T

The observed error matrix is given by

Y − Ŷ = Y − N P Y Q M

The error function of eqn 5 can also be written in the form

J = Q * (E^T P E)

or

J = P * (E Q E^T)

where the operation * represents the bilinear scalar product of two similar-dimension matrices, defined by

A * B = B * A = Σ_{i=1}^m Σ_{j=1}^n a_ij b_ij ,   A, B ∈ M_{m,n}

I * B = trace B

Alternatively,

J = y^T [S − S (M ⊗ N) S] y
  = y^T [S − (Q M Q) ⊗ (P N P)] y
  = Q * (Y^T P Y) − (Q M Q) * (Y^T P N P Y)



If the error covariance matrix R is assumed to be a separable process, of the form

R = U ⊗ V ,   U ∈ M_{n,n} , V ∈ M_{m,m}

and if the weighting matrices are set equal to the inverses of the error covariance matrices, Q = U^{-1} and P = V^{-1}, then the error covariance matrix of the least-squares vector estimate ẑ is given by

E[(z − ẑ)(z − ẑ)^T] = (G^T U^{-1} G)^{-1} ⊗ (H^T V^{-1} H)^{-1}

2.1 The equivalent decomposed problem. The double-sided least-squares problem represented by eqn 4 can be decomposed into two standard least-squares sub-problems. These are equivalent to column and row 'scanning', and the estimate of the state matrix Z or the 'image' can be formed from a combined solution of the sub-problems.

The overall problem is then represented by the column problem

Y = H X + E_1 ,   X̂ = (H^T P H)^{-1} H^T P Y

with the estimate X̂ used as an 'observed' matrix in the row problem

X̂ = Z G^T + E_2 ,   Ẑ = X̂ Q G (G^T Q G)^{-1}

which will give the unknown state matrix Z, corresponding to eqn 6.
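As a numerical aside not in the original text, the following sketch (assuming NumPy, with identity weights P and Q for simplicity) checks that the matrix solution of eqn 6, the vectorized Kronecker solution, and the two-stage column/row decomposition all agree.

```python
import numpy as np

# Double-sided least squares: compare eqn 6, its vectorized form,
# and the column/row two-stage decomposition of section 2.1.
rng = np.random.default_rng(1)
m, p, n, q = 6, 3, 5, 2
H = rng.standard_normal((m, p))
G = rng.standard_normal((n, q))
Y = rng.standard_normal((m, n))
P, Q = np.eye(m), np.eye(n)          # identity weights for simplicity

# Matrix solution, eqn 6
Z1 = np.linalg.solve(H.T @ P @ H, H.T @ P @ Y @ Q @ G) \
     @ np.linalg.inv(G.T @ Q @ G)

# Vectorized (Kronecker) solution
F, S = np.kron(G, H), np.kron(Q, P)
z = np.linalg.solve(F.T @ S @ F, F.T @ S @ Y.flatten(order='F'))

# Two-stage decomposition: column problem, then row problem
X_hat = np.linalg.solve(H.T @ P @ H, H.T @ P @ Y)
Z2 = X_hat @ Q @ G @ np.linalg.inv(G.T @ Q @ G)

assert np.allclose(Z1.flatten(order='F'), z)
assert np.allclose(Z1, Z2)
```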

3. Conclusion. A solution has been given for the least-squares estimate of the double-sided composite problem. If the error criterion is separable, then the solution can be decomposed into two sub-problems which can be solved sequentially.



The double-sided problem has application in two-dimensional curve-fitting and prediction problems [4,5]. It can also be used for the solution of the inverse output feedback problem [6], which requires solution for the unknown matrix P in the equation

Ā = A − B P C

where A and Ā are the open- and closed-loop system matrices respectively for the linear dynamical system S(A,B,C).

4. References

1. NICHOLSON, H.: 'Sequential least-squares prediction based on spectral analysis', Int. Jl. Comp. Math., Sect. B, 1972, 3, pp.257-270.

2. BELLMAN, R.: 'Introduction to matrix analysis', McGraw-Hill, NY, 1960, 1970.

3. MARCUS, M.: 'Finite dimensional multilinear algebra', Marcel Dekker, NY, 1973.

4. FERNANDO, K.V.M., and NICHOLSON, H.: 'Discrete double-sided Karhunen-Loeve expansion', Proc. IEE, 1980, 127D, (4), pp.155-160.

5. FERNANDO, K.V.M., and NICHOLSON, H.: 'Two-dimensional curve-fitting and prediction using spectral analysis', Proc. IEE, 1982, 129D, in press.

6. PARASKEVOPOULOS, P.N., and KING, R.E.: 'A Kronecker product approach to pole assignment by output feedback', Int. Jl. Control, 1976, 24, (3), pp.325-334.



APPENDIX 2

The double-sided least-squares problem with diagonal constraints

Abstract: The double-sided least-squares problem with a constrained parameter matrix is formulated and solved using multilinear products.

1. Introduction. The double-sided least-squares problem has been defined with the relationship [1]

Y = H Z G^T + E     (1)

with Y ∈ M_{m,n} , H ∈ M_{m,k} , Z ∈ M_{k,k} , G ∈ M_{n,k} , where Y is the observed data matrix, Z is the state or parameter matrix, H and G are known maximal rank matrices and E is the residual error matrix.

The quadratic error function for minimization is taken as

J = e^T (Q ⊗ P) e = e^T S e = Q * (E^T P E)

where ⊗ is the Kronecker product [2,3] and e = vec(E), where vec(·) is the operator which stacks the columns of a matrix into a column vector. The operation * denotes the matrix inner product defined by

A * B = B * A = Σ_{i=1}^m Σ_{j=1}^n a_ij b_ij ,   A, B ∈ M_{m,n}

The least-squares formulation is now extended to consider the case with Z constrained to be a diagonal matrix. The importance of such constrained expressions is due to the form of spectral expansions of matrices. For example, any square symmetrical matrix W with distinct eigenvalues can be represented by


W = Σ_i d_ii u_i u_i^T = U D U^T ,   W, D, U ∈ M_{m,m}

where U = [u_i] and D = diag[d_ii] can be identified as the eigenvector and eigenvalue matrices respectively. Alternatively, U could be a triangular matrix with diagonal elements equal to unity. Such decompositions are useful in Gaussian elimination techniques and related problems [5]. If the matrix W is rectangular, then it can be represented by the singular value decomposition [4,5]

W = Σ_{i=1}^r d_ii u_i v_i^T = U D V^T ,   r = rank(W)

with dimensions W ∈ M_{m,n} , U ∈ M_{m,r} , D ∈ M_{r,r} , V ∈ M_{n,r} , where U and V are eigenvector matrices of W W^T and W^T W, respectively, and D is the square root of the eigenvalue matrix Λ(W W^T). Thus

D = Λ^{1/2} ,   Λ, D ∈ M_{r,r}

This spectral decomposition is now proposed for the representation of another matrix Y which is assumed to have approximately the same eigenvector matrices U and V as the matrix W. The matrices W and Y could, for example, contain the observed outputs from a plant and its model, or the outputs from a system model and its reduced-order form. A diagonal eigenvalue matrix can be found by assuming that the modes or eigenvectors of the original matrix W exist in the matrix Y. The measurement equation is then written in the format

Y = U Z V^T + E

where Z^2 will give the eigenvalues or 'energy' values of the spectrum of (Y^T Y) with respect to the spectrum of (W^T W).
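As an illustrative aside not in the original text, the sketch below (assuming NumPy) fits a diagonal Z to a matrix Y using the singular vectors of a reference matrix W. With orthonormal U and V and unit weights, the general solution derived in section 2 reduces to z_ii = u_i^T Y v_i.

```python
import numpy as np

# Fit a diagonal Z in Y = U Z V^T + E, with U, V taken from the SVD of W.
rng = np.random.default_rng(2)
m, n = 6, 5
W = rng.standard_normal((m, n))
U, d, Vt = np.linalg.svd(W, full_matrices=False)
V = Vt.T

# Y shares the 'modes' of W, with a scaled spectrum plus small noise
Y = U @ np.diag(1.1 * d) @ V.T + 0.01 * rng.standard_normal((m, n))

# Least-squares diagonal estimate: z_ii = u_i^T Y v_i (orthonormal U, V)
z = np.array([U[:, i] @ Y @ V[:, i] for i in range(d.size)])
print(np.round(z / d, 3))      # each ratio is close to 1.1
```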



2. The diagonal problem. If the matrix Z is diagonal, then eqn 1 can be written in the form

Y = H Z G^T + E ,   Z ∈ M_{k,k}
  = Σ_{i=1}^k z_ii F_i + E     (2)

where F_i = h_i g_i^T ,   F_i ∈ M_{m,n}

If the matrix set F_i , i = 1, ..., k, is arbitrary, then the least-squares problem is similar to that posed in reference 7.

The Kronecker product can be used in the expansion of eqn 2 instead of the dyadic product h_i g_i^T (both are tensor products and hence equivalent). Then

y = Σ_{i=1}^k z_ii (g_i ⊗ h_i) + e
  = F z_d + e

where y = vec(Y), e = vec(E), F ∈ M_{mn,k} and z_d is the column vector formed from the diagonal elements of the matrix Z.

The unbiased least-squares estimate of the vector z_d is then given by

ẑ_d = (F^T S F)^{-1} F^T S y

and the matrix estimate Ẑ can be formed using the above definition. The symmetrical matrix F^T S F is of the form

(F^T S F)_ij = (g_i ⊗ h_i)^T (Q ⊗ P)(g_j ⊗ h_j)

and the ij element can be reduced to the form

(F^T S F)_ij = (g_i^T Q g_j)(h_i^T P h_j)

The matrix F^T S F can then be decomposed into two symmetrical matrices, with

F^T S F = A ∘ B ,   A = G^T Q G ,   B = H^T P H

where ∘ denotes the Hadamard product [3,4,8] or the Schur product [2], which is formed from element-by-element multiplication of the matrices A and B. Thus

(F^T S F)_ij = a_ij b_ij

Since the matrices A and B are positive definite, with H, G, P and Q of maximal rank, then by Schur's lemma [3,4,8] the matrix F^T S F is also positive definite and thus nonsingular. The occurrence of the Hadamard product A ∘ B is not unexpected, since it is a principal submatrix of the Kronecker product matrix A ⊗ B [3].

The vector F^T S y is similarly given by

F^T S y = (h_1^T P Y Q g_1   ...   h_i^T P Y Q g_i   ...   h_k^T P Y Q g_k)^T
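To make the construction concrete, the following sketch (not from the original text; it assumes NumPy and, for simplicity, identity weights P and Q) solves the diagonally constrained problem through the Hadamard-product normal equations.

```python
import numpy as np

# Diagonally constrained double-sided least squares:
# [(G^T Q G) o (H^T P H)] z_d = F^T S y, with 'o' the Hadamard product.
rng = np.random.default_rng(3)
m, n, k = 6, 5, 3
H = rng.standard_normal((m, k))
G = rng.standard_normal((n, k))
P, Q = np.eye(m), np.eye(n)              # identity weights for simplicity
Z_true = np.diag(rng.standard_normal(k))
Y = H @ Z_true @ G.T                     # noise-free data for the check

A = G.T @ Q @ G                          # right-hand factor
B = H.T @ P @ H                          # left-hand factor
rhs = np.array([H[:, i] @ P @ Y @ Q @ G[:, i] for i in range(k)])

z_d = np.linalg.solve(A * B, rhs)        # '*' is elementwise (Hadamard)
assert np.allclose(z_d, np.diag(Z_true))
```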

3. Conclusions. The diagonally constrained double-sided least-squares problem has been formulated and methods of solution indicated. An application in two-dimensional curve-fitting and prediction is being investigated [6].

4. References

1. Fernando, K.V.M., and Nicholson, H.: 'Double-sided least-squares problem', Electronics Letters, 1979, 15, (20), pp.624-625.

2. Bellman, R.: 'Introduction to matrix analysis', McGraw-Hill, New York, 1970.

3. Marcus, M., and Minc, H.: 'A survey of matrix theory and matrix inequalities', Allyn & Bacon, Boston, 1964.

4. Rao, C.R., and Mitra, S.K.: 'Generalized inverse of matrices and its applications', Wiley, New York, 1971.

5. Forsythe, G.E., and Moler, C.B.: 'Computer solution of linear algebraic systems', Prentice-Hall, Englewood Cliffs, NJ, 1967.

6. Fernando, K.V.M., and Nicholson, H.: 'Two-dimensional curve-fitting and prediction using spectral analysis', Proc. IEE, 1982, 129D, in press.

7. Basu, J.P., and Basu, R.: 'Estimations of proportions of given populations when observable units contain several populations', IEEE Trans. Systems, Man and Cybernetics, 1976, SMC-6, (11), pp.775-777.

8. Halmos, P.R.: 'Finite-dimensional vector spaces', Van Nostrand Reinhold, New York, 1958.



APPENDIX 3

Singular Perturbational Model Reduction in the Frequency Domain

Abstract: Singular perturbational approximations for linear

continuous-time and discrete-time systems are developed in the

frequency domain. It is shown that the familiar singular

perturbational result is an approximation at the origin in the

complex plane. However, if the system has multiple time-scale

effects, other approximations can be obtained at different locations

on the negative real axis to emphasize such behaviour. The

relationship between singular perturbational approximations and

direct subsystem elimination is also investigated.

1. Introduction

The singular perturbational method has become one of the most popular methods for obtaining reduced-order representations of linear systems [2,3]. Some of the advantages of this technique are due to its simplicity, the consistency with mathematical models of some physical systems, and the relationship with other established methods such as aggregation [4], the "dominant mode" methods [7], and the Routh approximation [8]. However, the development of this method has been mostly on an ad hoc basis, and the theoretical implications of the method have not been fully explained or understood.

The object of this note is to develop singular perturbational approximations for continuous-time and discrete-time systems in the neighbourhood of the negative real axis in the complex plane. The negative real axis is paramount in singular perturbational studies, since "fast" and "slow" phenomena depend on the real parts of the poles of the system. It is shown that the usual perturbational result is an approximation at the origin. However, for systems with multiple time-scale effects, the origin need not be the most desirable position for approximation. Thus, we may generalize the singular perturbational results by considering approximations at other positions on the negative real axis.

2. The generalized approximation

We consider the linear stable system S(A,B,C) defined by

ẋ(t) = A x(t) + B u(t) ,   y(t) = C x(t)

which has the transfer function

H(s) = C (sI − A)^{-1} B

where s = σ + jω is the complex frequency.

The system can be partitioned in the format

A = | A11  A12 | ,   B = | B1 | ,   C = [ C1  C2 ]
    | A21  A22 |         | B2 |

where we assume that all submatrices conform to the orders of their subsystems. The transfer function can be written in the form

H(s) = [ C1  C2 ] | sI − A11    −A12   |^{-1} | B1 |
                  |  −A21     sI − A22 |      | B2 |

If we use the well-known lemma for the inverse of partitioned matrices (sometimes known as K-partitioning [1]), then the transfer function can be expressed as

H(s) = H1(s) + H2(s)

where

H1(s) = C(s) [sI − A(s)]^{-1} B(s)

A(s) = A11 + A12 (sI − A22)^{-1} A21
B(s) = B1 + A12 (sI − A22)^{-1} B2
C(s) = C1 + C2 (sI − A22)^{-1} A21

H2(s) = C2 (sI − A22)^{-1} B2

If the subsystem S(A22, B2, C2) is stable and is not dominant in the neighbourhood of the frequency s = σ0, then the system S(A,B,C) can be approximated by the reduced-order representation S(A(σ0), B(σ0), C(σ0)), which has the transfer function

H̃(s) = C(σ0) [sI − A(σ0)]^{-1} B(σ0)

We call this result the "generalized singular perturbational approximation" at s = σ0.

In large-scale system studies, we may approximate the system at different frequencies s = σ0 to study the behaviour on different time-scales, provided such multiple time-scale effects are present in the system. However, the subsystem being eliminated (generically denoted by S(A22, B2, C2) in this study) does not have to be the same subsystem at different frequency approximations.
If the point of approximation is the origin, then we obtain the familiar zeroth-order singular perturbational result

A(0) = A11 − A12 A22^{-1} A21
B(0) = B1 − A12 A22^{-1} B2
C(0) = C1 − C2 A22^{-1} A21

provided that the first moment (dc gain) of the second subsystem, which is given by H2(0) = −C2 A22^{-1} B2, is "small" or singular compared with that of the approximation. This seems to be more general than the conventional assumption that the second subsystem S(A22, B2, C2) is "fast".
The non-dominance of the subsystem S(A22, B2, C2) at a nominal frequency s = σ0 can be due to the pole structure, as in "fast" subsystems. However, this is not the only possibility, and it could be due to the numerator dynamics of the transfer function, including any non-minimum-phase properties.
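By way of numerical illustration (this sketch is not part of the original note; NumPy only, and all matrices and partition sizes are invented for the purpose), the generalized approximation S(A(σ0), B(σ0), C(σ0)) is formed below. Taking σ0 = 0 recovers the familiar zeroth-order result, and the dc gains of the full and reduced systems then differ only by the eliminated subsystem's dc gain H2(0).

```python
import numpy as np

# Generalized singular perturbational approximation at s = sigma0.
def gsp(A11, A12, A21, A22, B1, B2, C1, C2, sigma0=0.0):
    R = np.linalg.inv(sigma0 * np.eye(A22.shape[0]) - A22)  # (sigma0*I - A22)^{-1}
    return A11 + A12 @ R @ A21, B1 + A12 @ R @ B2, C1 + C2 @ R @ A21

# Two-state example: slow pole near -1, fast pole near -100
A11, A12 = np.array([[-1.0]]), np.array([[1.0]])
A21, A22 = np.array([[0.5]]), np.array([[-100.0]])
B1, B2 = np.array([[1.0]]), np.array([[1.0]])
C1, C2 = np.array([[1.0]]), np.array([[1.0]])

Ar, Br, Cr = gsp(A11, A12, A21, A22, B1, B2, C1, C2, sigma0=0.0)

A = np.block([[A11, A12], [A21, A22]])
B, C = np.vstack([B1, B2]), np.hstack([C1, C2])
# dc gains agree up to H2(0) = -C2 A22^{-1} B2 (0.01 here)
print(C @ np.linalg.solve(-A, B), Cr @ np.linalg.solve(-Ar, Br))
```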

3. Direct subsystem elimination

In this section, we study the approximation at the other extreme of the real axis, with σ0 → −∞. If the subsystem S(A22, B2, C2) is non-dominant at negative infinity, then the generalized approximation is given by

A(σ0) → A11 ,   B(σ0) → B1 ,   C(σ0) → C1   as σ0 → −∞

Thus, the generalized singular perturbational approximation at negative infinity can be obtained by direct elimination of the second subsystem.

Direct elimination of subsystems [5] and the singular perturbational approximation at the origin [6] have been suggested for model reduction of internally balanced systems. It is interesting to note that both these methods are generalized singular perturbational approximations.



4. Approximations for discrete-time systems

For the discrete-time system Sd(A,B,C) defined by

x(k+1) = A x(k) + B u(k) ,   y(k) = C x(k)

the transfer function can be written in the form

H(z) = C (zI − A)^{-1} B

As in the continuous-time case, generalized singular perturbational approximations can be obtained in a similar manner. However, as pointed out by Blankenship [9], the usual approximation at the origin, with z = 0, corresponds to characterization of "fast" behaviour in the discrete-time case rather than "slow" behaviour as in the continuous-time problem.

If the second subsystem Sd(A22, B2, C2) is stable and is non-dominant in the neighbourhood of z = 1, then the system can be approximated by Sd(A(1), B(1), C(1)), where

A(1) = A11 + A12 (I − A22)^{-1} A21
B(1) = B1 + A12 (I − A22)^{-1} B2
C(1) = C1 + C2 (I − A22)^{-1} A21

which characterises slow behaviour. However, this is not the only approximation possible, and any point on the real axis, but within the unit circle, is a possible candidate frequency for obtaining a reduced-order model.
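A brief numerical illustration, again not in the original note (NumPy only, with invented matrices): the z = 1 approximation retains the steady-state behaviour, the full and reduced gains at z = 1 differing only by the eliminated subsystem's contribution C2 (I − A22)^{-1} B2.

```python
import numpy as np

# Discrete-time generalized approximation at z = 1 (slow behaviour).
A11, A12 = np.array([[0.9]]), np.array([[0.1]])
A21, A22 = np.array([[0.1]]), np.array([[0.2]])   # fast: eigenvalue 0.2
B1, B2 = np.array([[1.0]]), np.array([[0.1]])
C1, C2 = np.array([[1.0]]), np.array([[0.1]])

R = np.linalg.inv(np.eye(1) - A22)                 # (I - A22)^{-1}
A1, B1r, C1r = A11 + A12 @ R @ A21, B1 + A12 @ R @ B2, C1 + C2 @ R @ A21

A = np.block([[A11, A12], [A21, A22]])
B, C = np.vstack([B1, B2]), np.hstack([C1, C2])
# Gains at z = 1: full system versus reduced model (differ by ~0.0125)
print(C @ np.linalg.solve(np.eye(2) - A, B),
      C1r @ np.linalg.solve(np.eye(1) - A1, B1r))
```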

5. Conclusions

We have shown the feasibility of singular perturbational approximations in the frequency domain and defined generalized singular perturbational approximations valid in the neighbourhood of the real axis in the complex plane. The results developed in this note give an alternative, and more refined, insight into the singular perturbational model reduction problem.

6. References

1. Nicholson, H.: 'Structure of interconnected systems', Peregrinus, London, 1978.

2. Kokotovic, P.V., O'Malley, R.E., and Sannuti, P.: 'Singular perturbations and order reduction in control theory - An overview', Automatica, 1976, 12, pp.123-132.

3. Sandell, Jr., N.R., Varaiya, P., Athans, M., and Safonov, M.G.: 'Survey of decentralized control methods for large-scale systems', IEEE Trans. Automatic Control, 1978, AC-23, (1), pp.108-128.

4. Aoki, M.: 'Control of large-scale dynamical systems by aggregation', IEEE Trans. Automatic Control, 1968, AC-13, (3), pp.246-253.

5. Moore, B.C.: 'Principal component analysis in linear systems: Controllability, observability, and model reduction', IEEE Trans. Automatic Control, 1981, AC-26, (1), pp.17-32.

6. Fernando, K.V., and Nicholson, H.: 'Singular perturbational model reduction of balanced systems', IEEE Trans. Automatic Control, 1982, AC-27, (2), pp.466-468.

7. Nicholson, H.: 'Dynamic optimization of a boiler', Proc. IEE, 1964, 111, pp.1479-1499.

8. Hutton, M.F., and Friedland, B.: 'Routh approximations for reducing order of linear systems', IEEE Trans. Automatic Control, 1975, AC-20, pp.329-337.

9. Blankenship, G.: 'Singularly perturbed difference equations in optimal control', IEEE Trans. Automatic Control, 1981, AC-26, (4), pp.911-917.

APPENDIX 4

On the Applicability of Routh Approximations
and Allied Methods in Model-order Reduction

Abstract: Routh approximations used in model-order reduction tend to preserve high-frequency behaviour, while low-frequency approximations are usually required in control design. This high-frequency bias can be remedied using reciprocal transformations. We demonstrate that if low frequencies are not dominant or are unimportant, then reciprocal transformations should be avoided. We also show that some of the disadvantages of the Routh method recently reported in the literature are avoidable.

1. Introduction

The Routh approximation method of Hutton and Friedland [1] and similar techniques, known variously as Hurwitz approximations and Routh-Hurwitz approximations [4,5,6], provide a convenient and simple procedure for obtaining reduced-order representations of linear systems described by transfer functions. The resultant reduced-order models are always stable provided the original systems are stable. The main peculiarity of this method is the tendency to preserve the high-frequency behaviour of the system at the expense of the low frequencies. However, most model-order reduction methods are geared to give low-frequency approximations rather than high-frequency approximations, a tradition initiated in references 7 and 8 following Rosenbrock's original work on modal control. This practice is



due to the importance of slow-time behaviour in control systems design. Furthermore, in most physical systems low-frequency effects dominate, and thus low-frequency approximations are paramount in design procedures. In Routh approximation methods, the high-frequency bias is avoided by using a reciprocal transformation before and after model-order reduction [1]. This reciprocal transformation essentially changes low-frequency effects into high-frequency effects and vice versa. Reciprocal transformations are easy to implement and are of the form

H̃(s) = (1/s) H(1/s)

where the transfer functions H(s) and H̃(s) denote the original system and the transformed system, respectively. If the transfer function is of the form

H(s) = (b1 s^{n-1} + ... + bn) / (a0 s^n + a1 s^{n-1} + ... + an)

then the transformed system is given by

H̃(s) = (bn s^{n-1} + ... + b1) / (an s^n + ... + a0)

which only involves a reordering of the coefficients of the original transfer function. The use of reciprocal transformations has become the standard (or rather the orthodox) practice in Routh approximations, and often the application of this transformation is not explicitly acknowledged.
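Since the transformation is just a coefficient reversal, it is trivial to implement. The following sketch (not part of the original text) applies it to the transfer function of Example 1 below.

```python
# Reciprocal transformation H~(s) = (1/s) H(1/s): reverse the numerator
# and denominator coefficient lists (descending powers of s).
def reciprocal(num, den):
    return num[::-1], den[::-1]

# H(s) = (100s^2 + 1100s + 1000) / (s^3 + 111s^2 + 1110s + 1000)
num = [100.0, 1100.0, 1000.0]
den = [1.0, 111.0, 1110.0, 1000.0]
print(reciprocal(num, den))
# ([1000.0, 1100.0, 100.0], [1000.0, 1110.0, 111.0, 1.0])
```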



The main objective of this Appendix is to indicate the frequent misuse of reciprocal transformations in the literature. It is obvious that if the high frequencies dominate in a system, reciprocal transformations should be avoided and the Routh approximation technique should be used directly. Otherwise, the dominant high frequencies will be attenuated and the resultant reduced-order model will represent the non-dominant low-frequency effects. Similarly, if the high-frequency behaviour is important (say, in understanding the transient response) and a high-frequency approximation is desired (perhaps in addition to a low-frequency reduced-order representation), then reciprocal transformations should not be used.

Recently, Shamash [2,3] used two examples to discredit the Routh approximation and allied techniques. We investigate these two examples in the context of the use of reciprocal transformations and show that the defects pointed out by Shamash can be easily rectified.

To avoid ambiguities, we call the technique the Direct Routh Approximation (DRA) if reciprocal transformations are not used. Otherwise, we call it the Reciprocal Routh Approximation (RRA).

2. Illustrative Examples

Example 1: Shamash [3] used the following problem as a counter-example for the Routh approximation method. The transfer function is of the form

H(s) = (100s^2 + 1100s + 1000) / (s^3 + 111s^2 + 1110s + 1000)
     = 100(s + 1)(s + 10) / ((s + 1)(s + 10)(s + 100))

where two of the numerator zeros cancel out two of the poles. These additional poles representing low-frequency effects were introduced to 'confuse' those model-order reduction methods which are based on truncation of high-frequency behaviour. Obviously, the preferred first-order reduced model should be of the form

100 / (s + 100)

which is, in fact, the first-order Padé approximation of H(s) [3]. The first-order RRA is of the form

R1(s) = 0.9009 / (s + 0.9009)     (RRA)

which is clearly a bad approximation, as pointed out by Shamash. However, in model-order reduction of large-scale systems this will not be evident, and the validity has to be checked by accounting for the impulse energy of the system. The impulse energy, defined by

||h||^2 = ∫_0^∞ h^2(t) dt

where h(t) is the impulse response of the system H(s), is given by the α and β parameters of the system [1] in the form

||h||^2 = Σ_{i=1}^n β_i^2 / (2 α_i)

These parameters occur in the Alpha and Beta tables which have to be computed in the Routh approximation method [1].

The value of the impulse energy of the original system H(s) is given by ||h||^2 = 50, while the value for R1(s) is ||h1||^2 = 50/111. This conclusively indicates that the reduced-order result R1(s) is a bad approximation. In this case, the energy loss is due to high-frequency 'leakage', if signal-processing terminology is used.

To investigate the high-frequency behaviour, we have computed the DRA as

G1(s) = 100 / (s + 111)     (DRA)

Although this approximation is not as good as the Padé solution, the result is reasonable. The impulse energy of the approximation is given by ||g1||^2 = 50(100/111), which is near the value of the original system. Thus, by comparing energy values, we may conclude that high frequencies dominate in this system and that the DRA gives the best overall approximation. By computing the RRA as well as the DRA, and their impulse energies, we have avoided the disadvantages pointed out by Shamash.

Example 2: The following transfer function

H(s) = (8169.13s^3 + 50664.97s^2 + 9984.32s + 500) / (100s^4 + 10520s^3 + 52101s^2 + 10105s + 500)

     = 81.6913(s + 6.004)(s + 0.1009 + j0.0025)(s + 0.1009 − j0.0025) / ((s + 100)(s + 5)(s + 0.1)^2)

was also investigated by Shamash [2,3]. Approximate cancellation of the poles at −0.1 is possible, and the preferred second-order approximation should be of the form

H(s) ≈ 81.6913(s + 6.004) / ((s + 100)(s + 5))

Clearly, the high-frequency effects are dominant in this system. The RRA is of the form

R2(s) = (0.1936s + 0.009694) / (s^2 + 0.1959s + 0.009694)     (RRA)
      = 0.1936(s + 0.05007) / ((s + 0.09796 + j0.009921)(s + 0.09796 − j0.009921))

and obviously it represents the low-frequency effects. The impulse energy for this representation is given by ||h2||^2 = 0.1204, which indicates high leakage when compared with ||h||^2 = 34.07 for the original system.

The DRA is given by

G2(s) = (81.69s + 506.6) / (s^2 + 105.2s + 520.0)     (DRA)
      = 81.69(s + 6.201) / ((s + 5.201)(s + 100.0))

which is a high-frequency approximation. The validity of the DRA as an overall approximation is evident from the impulse energy ||g2||^2 = 34.06, which is near the original value.

These results demonstrate that Routh approximations can give reasonable results even when high-frequency effects are dominant.
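The impulse energies quoted in both examples can be checked independently of the Alpha and Beta tables. The sketch below (not from the original text; it assumes NumPy and SciPy are available) computes ||h||^2 exactly as C W C^T, where W is the controllability Gramian, in keeping with the Gramian theme of this thesis.

```python
import numpy as np
from scipy import signal, linalg

def impulse_energy(num, den):
    # ||h||^2 = C W C^T, with A W + W A^T + B B^T = 0 (controllability Gramian)
    A, B, C, D = signal.tf2ss(num, den)
    W = linalg.solve_continuous_lyapunov(A, -B @ B.T)
    return (C @ W @ C.T).item()

# Example 1: original system, RRA and DRA
print(impulse_energy([100., 1100., 1000.], [1., 111., 1110., 1000.]))  # 50
print(impulse_energy([0.9009], [1., 0.9009]))                          # 50/111
print(impulse_energy([100.], [1., 111.]))                              # 45.05

# Example 2: original system, RRA and DRA
print(impulse_energy([8169.13, 50664.97, 9984.32, 500.],
                     [100., 10520., 52101., 10105., 500.]))            # approx 34.07
print(impulse_energy([0.1936, 0.009694], [1., 0.1959, 0.009694]))      # 0.1204
print(impulse_energy([81.69, 506.6], [1., 105.2, 520.0]))              # 34.06
```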



3. Conclusions

We have demonstrated that reciprocal transformations should not always be used in obtaining reduced-order models, if the high-frequency behaviour of the system is dominant or important. This result, in retrospect, is very obvious. By methodically computing the Reciprocal Routh Approximations and the Direct Routh Approximations, and their impulse energy values, the pitfalls reported in the literature concerning Routh approximations can be avoided.

4. References

1. Hutton, M.F., and Friedland, B.: 'Routh approximations for reducing order of linear time-invariant systems', IEEE Trans. Automatic Control, 1975, AC-20, (3), pp.329-337.

2. Shamash, Y.: 'Viability of methods for generating stable reduced order models', IEEE Trans. Automatic Control, 1981, AC-26, (6), pp.1285-1286.

3. Shamash, Y.: 'Failure of the Routh-Hurwitz method of reduction', IEEE Trans. Automatic Control, 1980, AC-25, (2), pp.313-314.

4. Appiah, R.K.: 'Linear model reduction using Hurwitz polynomial approximation', Int. J. Control, 1978, 28, pp.477-488.

5. Shamash, Y.: 'Note on equivalence of the Routh and Hurwitz methods of approximation', Int. J. Control, 1979, 30, pp.899-900.

6. Langholz, G., and Feinmesser, D.: 'Model reduction by Routh approximations', Int. J. Systems Science, 1978, 9, pp.493-496.

7. Davison, E.J.: 'A method for simplifying linear dynamic systems', IEEE Trans. Automatic Control, 1966, AC-11, (1), pp.93-101.

8. Nicholson, H.: 'Dynamic optimisation of a boiler', Proc. IEE, 1964, 111, (8), pp.1479-1499.
