
ADJUSTMENT OF OBSERVATIONS

## DEPARTMENT OF GEOMATICS ENGINEERING

ENGO 361: Adjustment of Observations H(3-3), WINTER 2001

Naser El-Sheimy
1. INTRODUCTION TO ADJUSTMENT OF OBSERVATIONS
1.1. Why Observations?
Geomatics Engineers are usually faced with the problem of estimating some unknown
quantities (parameters). This is done by collecting several measurements of some
kind, known as observations, and then adopting an appropriate mathematical model
relating the observations to the unknowns.

♦ Observations generally require some form of instrumentation that is used by an
observer in a certain environment.
♦ All observations contain errors.
♦ An error is the difference between an observation of a quantity and the true
value (which can never be known):

ε = l − t
ε ... true error
l ... observed value
t ... true value

♦ Since t can never be known, ε will never be known as well.
♦ Both quantities can, however, be estimated:

v = l − x̂
v ... estimated error (estimated residual)
x̂ ... estimated value

[Figure: the required parameters (together with a reliability measure) are obtained
through a model from measurements made by an observer using equipment in a certain
environment; the measurements contain errors.]

1.2. Sources of Errors

Sources of errors:

♦ Personal: limitations of the observer (the ability to repeat the same reading);
carelessness of the observer.
♦ Instrumental: due to imperfect construction or incomplete adjustment of the
instruments; e.g., incorrect graduation.
♦ Natural: due to changing environmental conditions in which the measurements are
made; e.g., temperature variation causes expansion/contraction of the chain.

## 1.3. Types of Errors

1) Gross Errors or blunders or mistakes
 Characteristics: the magnitude is significantly different (very large or very
small) in comparison to the other measured values (an abnormal observation).
 Source: personal errors (carelessness of the observer).
 Effect: inhomogeneous observables.
 Treatment: must be detected and eliminated from the measurements by careful
checking of the measurements.
e.g., measuring a distance:
31.1 m, 31.3 m, 31.2 m, 13.1 m, 31.15 m
(the 13.1 m observation is a gross error)
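A blunder like the one above can be flagged automatically. The sketch below is a minimal illustration (not a method from the notes): it compares each observation against the median, and the tolerance of 1 m is an arbitrary assumption for this data set.

```python
# Flag observations that deviate abnormally from the median (possible blunders).
def flag_blunders(obs, tol=1.0):
    s = sorted(obs)
    n = len(s)
    # median: middle element (odd n) or mean of the two middle elements (even n)
    median = s[n // 2] if n % 2 else 0.5 * (s[n // 2 - 1] + s[n // 2])
    return [x for x in obs if abs(x - median) > tol]

distances = [31.1, 31.3, 31.2, 13.1, 31.15]  # metres
print(flag_blunders(distances))  # only the 13.1 m observation stands out
```

The median is used instead of the mean because the mean itself is dragged toward the blunder.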


2) Systematic Errors
 Characteristics: occur according to some deterministic system which, when known,
can be expressed by some functional relationship.
 Source: instrumental, natural, personal, or all of them.
 Effect: shifts all the observations; the shift is constant if its magnitude and
sign remain the same throughout the measuring process.

 Treatment: must be detected and corrected by, e.g., calibrating the instruments
before they are used.
 Can also be eliminated by using certain procedures during measurement; e.g., a
survey level's collimation errors can be minimized by taking measurements at equal
distances from the level.

 Example:

[Figure: a level with collimation error Δ sighted from the midpoint between two
staves at equal distances L; the equal errors Δ cancel in the height difference.]
3) Random Errors
 Characteristics: the remaining errors after gross and systematic errors have
been removed.
 They have no functional relationship based upon a deterministic system and are
usually modelled by a stochastic model (probability theory).
 Source: personal, natural, and instrumental.
 They cannot generally be eliminated; however, they can be minimised by taking
redundant observations and applying the so-called “Method of Least Squares”.
 This process is referred to as the “Adjustment of Observations” or “Adjustment
Computations”, which is our main concern in this course.
 Based on the above, we cannot seek the ‘true’ value; all we can get is an
‘estimate’ of the ‘true’ value.

 The small variations between the measurements and the ‘true’ value or its
‘estimate’ are regarded as “errors.”
1.4. Notation used in Least Squares

♦ Parameters: the unknown quantities, denoted by

x = [x1  x2  ...  xu]T
u ... number of unknowns

♦ Observations:

l = [l1  l2  ...  ln]T
n ... number of observations (measurements)

♦ Mathematical Model (discussed in further detail in Chapter 3):

0 = f(x, l)
The function that relates x and l.

## 1.5. Types of Observations

Types of Observations:
♦ Direct – under the same conditions, or under different conditions
♦ Indirect

♦ Direct observation under the same conditions
 Observer, instruments, and environment are all the same
 e.g., measuring an angle:
Parameter: x = α, u = 1
Observations: L = [α1  α2  ...  αn]
Math Model: x̂ = Mean(L)

♦ Direct observation under different conditions
 Any one of observer, instrument, or environment changes
♦ Indirect observation
 x is measured indirectly
 e.g., measuring a height using an angle and a distance:
Parameter: x = h, u = 1
Observations: L = [θ  d]
Math Model: x̂ = d·tan(θ) + hI
(hI ... height of the instrument)

1.6. Behaviour of Random Errors

♦ For simplicity we’ll consider the direct case under the same conditions.
♦ Assumption: All measurements are free of gross errors and corrected for all
systematic errors.

|                        | Practical Case                                  | Theoretical Case                        |
|------------------------|--------------------------------------------------|-----------------------------------------|
| No. of obs.            | n → finite number                                | n → ∞                                   |
| Variable               | Random sample L = (l1 l2 ... ln)                 | Random variable                         |
| Estimate               | x̂ = Σli / n (mean)                              | x̂ → t (true value)                     |
| Errors                 | Residuals vi = x̂ − li (Σvi = 0)                 | True errors εi = t − li                 |
| Distribution of errors | Relative frequency histogram; Range = vmax − vmin (Probability Distribution Histogram, PDH) | Probability Distribution Function (PDF) |

Note: RF1 is the number of observations which have residuals between 0 and v1.

1.7. Understanding the Meaning of Residual Errors (v)

♦ Given L = (l1, l2, ..., ln) and accepting, for the time being, the fact that the
arithmetic mean x̄ of all li is the best estimate x̂ for the measured quantity –
i.e.,

x̂ = x̄ = Σli / n

we can compute vi as vi = x̂ − li
♦ The residuals express the degree of closeness of the repeated measurements of the
same quantity to each other, and therefore the (v) values can be used to express
the precision of x̂ (and also the precision of the observer who made these
measurements).
e.g., two observers A and B measure the same angle.

| A         | B         |
|-----------|-----------|
| v1 = +2″  | v1 = +5″  |
| v2 = −1″  | v2 = −3″  |
| v3 = +1″  | v3 = +2″  |
| v4 = 0    | v4 = −3″  |
| v5 = −2″  | v5 = −4″  |
|           | v6 = +3″  |

Now, we define the range over which the residuals change as:

Range A = 2″ − (−2″) = 4″
Range B = 5″ − (−4″) = 9″

Since Range A < Range B, we conclude that the angle computed from data set (A) is
more precise than the angle computed from data set (B).
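The range comparison above can be reproduced in a few lines (a minimal Python sketch; the residual values in arc-seconds are those of the example):

```python
# Compare the precision of two observers by the range of their residuals.
residuals_a = [+2, -1, +1, 0, -2]        # arc-seconds
residuals_b = [+5, -3, +2, -3, -4, +3]   # arc-seconds

def residual_range(v):
    return max(v) - min(v)

print(residual_range(residuals_a))  # 4 arc-seconds
print(residual_range(residuals_b))  # 9 arc-seconds -> observer A is more precise
```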

1.8. Determining the Probability Distribution Histogram (PDH)
1) Calculate the estimated parameter:
Σl i
xˆ = x =
n
2) Calculate the residuals for each observation:

vi = xˆ − l i
3) Calculate the range of the residuals:

## Range = v max − v min

4) Divide the range into k equal intervals ∆j (j=1,2,…k).
5) Calculate the relative frequency for each interval.

fj = nj / (n·Δj)

nj ... the number of residuals that fall within the boundaries of Δj

[Figure: relative-frequency histogram of the residuals between −v and +v, divided
into intervals Δj over the range.]
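Steps 1–5 can be sketched as follows (a minimal Python illustration; the sample observations are made up for demonstration):

```python
# Build the Probability Distribution Histogram (PDH) of the residuals.
def pdh(observations, k):
    n = len(observations)
    x_hat = sum(observations) / n              # 1) estimated parameter (mean)
    v = [x_hat - l for l in observations]      # 2) residuals
    v_min, v_max = min(v), max(v)
    d = (v_max - v_min) / k                    # 3)-4) range split into k intervals
    counts = [0] * k
    for x in v:                                # assign each residual to its interval
        j = min(int((x - v_min) / d), k - 1)   # clamp the largest residual into bin k-1
        counts[j] += 1
    f = [nj / (n * d) for nj in counts]        # 5) relative frequency per interval
    return f, d

f, d = pdh([31.10, 31.30, 31.20, 31.15, 31.25], k=2)
print(sum(fj * d for fj in f))  # area under the histogram = 1
```

Note that the area Σ fj·Δ equals 1 by construction, which is exactly the property derived in the Notes that follow.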

Notes:
 The area under the histogram = 1:

A_PDH = Σ_{j=1..k} Aj = Σ_{j=1..k} fj·Δ = Σ_{j=1..k} (nj/(n·Δ))·Δ = (1/n) Σ_{j=1..k} nj = n/n = 1

[Figure: histogram with areas A1 ... A6 over intervals of width Δ between −v and +v.]

 The histogram can be used for probability computation:

P(v1 ≤ v ≤ v2) = Area(v1 to v2)

1.9. Characteristics of PDF

♦ Many scholars throughout the history of statistics have tried to describe the PDF
curve.
♦ A commonly accepted model was given by Gauss (Gauss PDF):

G(ε) = (h/√π)·e^(−h²ε²),  e = 2.71828

[Figure: bell-shaped curve G(ε) over −ε ... +ε.]

Note: h is the only parameter that completely describes the shape of the Gauss
PDF.

♦ G(0) = h/√π
♦ h is usually called the “precision index”: Precision ∝ h.
♦ Since the area under the curve = 1, the higher the peak of the PDF (the larger h),
the narrower the curve must become – i.e., a smaller range (more precise).

[Figure: two curves with peaks h1/√π (high precision, Range 1) and h2/√π (low
precision, Range 2).]

Properties of Gauss PDF
1) Area under the curve = unity:

∫_{−∞}^{+∞} G(ε) dε = 1

2) Symmetric around ε = 0. Positive and negative errors have equal probability of
occurring (no systematic errors):

P(ε) = P(−ε)

3) Maximum value at ε = 0: small errors are more probable than large errors.

4) Asymptotic to the ε axis at ε = ±∞. The probability of very large errors is
negligibly small (i.e., no gross errors).

5) Two points of inflection at ε = ±1/(h√2). These points occur where the second
derivative of the function equals zero:

∂²G(ε)/∂ε² = 0

6) Statistical properties:

P(ε ≤ ε1) = ∫_{−∞}^{ε1} G(ε) dε = A1

P(ε ≥ ε2) = ∫_{ε2}^{+∞} G(ε) dε = A2 = 1 − ∫_{−∞}^{ε2} G(ε) dε

P(ε1 ≤ ε ≤ ε2) = ∫_{ε1}^{ε2} G(ε) dε

(More when we discuss statistical analysis)
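The unit-area and symmetry properties can be checked numerically (a minimal sketch; h = 1 is an arbitrary choice, and the Riemann sum over [−8, 8] is only an approximation of the full integral):

```python
import math

# Gauss PDF: G(e) = (h / sqrt(pi)) * exp(-h^2 * e^2)
def gauss_pdf(e, h=1.0):
    return (h / math.sqrt(math.pi)) * math.exp(-(h * e) ** 2)

# Riemann-sum approximation of the area under the curve over [-8, 8].
step = 0.001
area = sum(gauss_pdf(-8 + i * step) * step for i in range(16000))

print(round(area, 4))                      # property 1: area ~ 1
print(gauss_pdf(0.5) == gauss_pdf(-0.5))   # property 2: symmetry
```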

1.10. Reliability of Measurements

♦ Recall that the true value t of certain unknown quantity can never be obtained.
However, an estimate xˆ can be determined.
♦ x̂ alone will not be satisfactory from the client's point of view because it is
influenced by random errors.
♦ Therefore, we need a certain “measure” of the existing random errors to describe
its “goodness”, “reliability”, and “repeatability”.
♦ This measure should help as well in accepting or rejecting certain observations
depending on the desired precision.
♦ Three terms are commonly used in expressing the reliability of measurements:
1. Precision: the degree of closeness of repeated measurements of the
same quantity to each other. Precision is affected only by random
errors.

[Figure: two error distributions – high precision (closely clustered) and low
precision (widely clustered).]

2. Accuracy: the degree of closeness of a measurement to the true value.
Accuracy is affected by both random and systematic errors.

[Figure: an error distribution offset from zero by a systematic error.]

3. Uncertainty: the range within which the error of a measurement is expected
to fall.

♦ Consider a circular target, where t is the centre of the target:

[Figure: four targets – precise but not accurate; not precise but accurate; not
precise and not accurate; precise and accurate.]

Precision ––––––––––––––––––→ Internal Reliability
Accuracy ––––––––––––––––––→ External Reliability

♦ In the absence of systematic errors, accuracy is equivalent to precision.

Measures of Precision

1) Average error (ae)
 The arithmetic mean of the absolute values of the errors.
 Random sample:

t unknown:  ae = Σ|vi| / (n − 1),  vi = x̂ − li,  x̂ = x̄ = Σli / n
t known:    ae = Σ|εi| / n,        εi = t − li

 Random variable:

ae = ∫_{−∞}^{+∞} |ε| G(ε) dε = 1/(h√π)

2) Probable Error (Pe)
 Pe has the following property: half the resulting errors are smaller in
magnitude than Pe and the other half are larger than Pe:

50% of |vi| > Pe
50% of |vi| < Pe

 How: arrange the absolute values of the errors in ascending (or descending)
order.

For an odd number of errors: Pe = v_((n+1)/2)
e.g., v1 v2 v3 v4 v5 → Pe = v3

For an even number of errors: Pe = ½ (v_(n/2) + v_(n/2+1))
e.g., v1 v2 v3 v4 v5 v6 → Pe = (v3 + v4)/2

 Problems with Pe: it is insensitive to the magnitude of the largest errors.
The two sets below have the same Pe (5″) even though the largest error differs
greatly:

1″ 1″ 3″ 5″ 6″ 7″ 11″
1″ 1″ 3″ 5″ 6″ 7″ 100″

 For a random variable, Pe is defined by P(−Pe ≤ ε ≤ Pe) = 0.5 and, due to the
symmetry of the Gaussian PDF:

2 ∫_{0}^{Pe} G(ε) dε = 0.5

Pe = 0.4769 / h
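The odd/even rules above amount to taking the median of the absolute residuals, which can be sketched as (a minimal Python illustration):

```python
# Probable Error: the median of the absolute residuals.
def probable_error(residuals):
    v = sorted(abs(r) for r in residuals)
    n = len(v)
    if n % 2:                                  # odd n: middle element
        return v[n // 2]
    return 0.5 * (v[n // 2 - 1] + v[n // 2])   # even n: mean of the two middle elements

print(probable_error([2, -1, 1, 0, -2]))       # odd case (5 residuals)
print(probable_error([5, -3, 2, -3, -4, 3]))   # even case (6 residuals)
```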

3) Standard Deviation (σ)
 σ is defined as the square root of the arithmetic mean of the sum of squares
of the errors.
 The square of the standard deviation – σ² – is known as the “variance” or
“mean square error”, and consequently σ is sometimes referred to as the Root
Mean Square Error (RMSE).
 Random sample:

t unknown:  σ² = (1/(n − 1)) Σ_{i=1..n} vi²
t known:    σ² = (1/n) Σ_{i=1..n} εi²   (σ: Root Mean Square Error, RMSE)

Note: we have only one unknown, and hence we only need a single observation to
determine it. The remainder (n − 1) is usually known as the “Degrees of Freedom”
or “Redundancy”, which we will refer to as “r”:

r = n − u
r ... redundancy
n ... number of observations
u ... number of unknowns

 σ for a random variable:

σ² = ∫_{−∞}^{+∞} ε² G(ε) dε = 1/(2h²)

σ = 1/(h√2)
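The two sample formulas can be sketched together (Python; the observations are illustrative):

```python
import math

# Standard deviation when the true value t is unknown (divide by n - 1).
def sigma_t_unknown(obs):
    n = len(obs)
    x_hat = sum(obs) / n
    return math.sqrt(sum((x_hat - l) ** 2 for l in obs) / (n - 1))

# RMSE when the true value t is known (divide by n).
def sigma_t_known(obs, t):
    return math.sqrt(sum((t - l) ** 2 for l in obs) / len(obs))

obs = [31.10, 31.30, 31.20, 31.15, 31.25]
print(sigma_t_unknown(obs))
print(sigma_t_known(obs, t=31.20))
```

Note that dividing by n − 1 instead of n makes the t-unknown estimate slightly larger, reflecting the lost degree of freedom.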

Unlike the previous two measures of precision, σ has some distinguished properties
which make it a preferred measure of precision:

1. It corresponds to the points of inflection of the Gauss PDF, at ε = ±1/(h√2).

2. It has a physical interpretation related to the PDF:

P(−σ ≤ ε ≤ σ) = 0.683
P(−2σ ≤ ε ≤ 2σ) = 0.954
P(−3σ ≤ ε ≤ 3σ) = 0.997

These intervals are called confidence intervals.

• The 3σ limit is usually used in practice as a criterion for rejecting bad
observations.
• Residuals greater than 3σ are usually treated as gross errors.

3. Because of the squaring of the residuals in the calculation of σ, the
magnitudes of large errors are fully represented and reflected in the computed
σ value.
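The 3σ rejection criterion can be sketched as follows (Python; the data set is illustrative, and in practice σ would be recomputed after each rejection pass):

```python
import math

# Reject observations whose residuals exceed 3 sigma (treated as gross errors).
def reject_outliers(obs):
    n = len(obs)
    x_hat = sum(obs) / n
    v = [x_hat - l for l in obs]
    sigma = math.sqrt(sum(r ** 2 for r in v) / (n - 1))
    return [l for l, r in zip(obs, v) if abs(r) <= 3 * sigma]

# 20 good distance observations plus one blunder of 13.1 m.
obs = [31.1, 31.3, 31.2, 31.15, 31.25] * 4 + [13.1]
print(reject_outliers(obs))  # the blunder 13.1 is removed
```

With very few observations a single blunder inflates σ so much that it may survive the 3σ test, which is why a reasonably redundant data set is assumed here.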

1.11. Review of Uni-variate Statistics

♦ Uni-variate: a single variable (i.e., we deal with “measurements” of a single
quantity – e.g., a distance, an angle, etc.)

Given: L = l1, l2, l3, ..., ln

Required: the best estimate x̂

♦ Best estimate x̂:

x̂ = x̄ = Σli / n

♦ Variance of x – σx²:

σx² = Σv² / (n − 1)   (σx² in square units of x, e.g. m² or degrees²)
σx = √(σx²)           (σx in units of x, e.g. m or degrees)

♦ Variance of the best estimate – σx̄²:

σx̄² = σx² / n

Note: since x̂ is the best estimate of the unknown parameter out of a group of
measurements (each having a variance of σx²), σx̄² should be better than σx²:

σx̄ < σx
σx̄ ... precision of the mean
σx ... precision of a single measurement

♦ Proof that σx̄² = σx²/n:
To determine the standard deviation of the mean, we begin with the expression for
the mean:

x̄ = (Σ_{i=1..n} xi) / n = x1/n + x2/n + ... + xn/n

Then, we use the Law of Propagation of Variances:
 Given x = f(a, b, c) with σa, σb, σc:

σx² = (∂f/∂a)² σa² + (∂f/∂b)² σb² + (∂f/∂c)² σc²

 e.g., x = x1 + x2 = f(x1, x2):

σx² = (∂f/∂x1)² σx1² + (∂f/∂x2)² σx2² = (1)² σx1² + (1)² σx2² = σx1² + σx2²

♦ Applying the law of propagation of variances to the equation for the mean:

σx̄² = (1/n)² σx1² + (1/n)² σx2² + ... + (1/n)² σxn²
    = (1/n)² (σx1² + σx2² + ... + σxn²)
    = (1/n²)(n σ²)
    = σ² / n

σx̄ = σ / √n

♦ This yields the final expression for the standard deviation of the mean:

σx̄ = √( Σvi² / ((n − 1)·n) )
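Both estimates can be computed together, and the relation σx̄ = σx/√n checked directly (Python sketch; the observations are illustrative):

```python
import math

# Precision of a single measurement and of the mean, from repeated observations.
def single_and_mean_sigma(obs):
    n = len(obs)
    x_hat = sum(obs) / n
    sum_v2 = sum((x_hat - l) ** 2 for l in obs)
    sigma_x = math.sqrt(sum_v2 / (n - 1))           # single measurement
    sigma_mean = math.sqrt(sum_v2 / ((n - 1) * n))  # the mean: sigma_x / sqrt(n)
    return sigma_x, sigma_mean

sx, sm = single_and_mean_sigma([10.02, 9.98, 10.00, 10.01, 9.99])
print(sx, sm)  # the mean is more precise than any single measurement
```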

1.12. Direct Observation with Different Conditions

♦ Consider that two observers A and B are measuring the same quantity X (say a
distance or angle), and that it is required to find the best estimate of X by
making use of the two sets of observations:

Observer A: α1, α2, α3, ..., αn
Observer B: α1, α2, α3, ..., αm

♦ Note that each group of observations is considered as a uni-variate problem with
its own best estimate and standard deviation:

Observer A:
ᾱA = (Σ_{i=1..n} αi) / n
σαA = √( Σ_{i=1..n} vi² / (n − 1) ),  vi = ᾱA − αi

Observer B:
ᾱB = (Σ_{i=1..m} αi) / m
σαB = √( Σ_{i=1..m} vi² / (m − 1) ),  vi = ᾱB − αi

♦ Now, ᾱA and ᾱB can be considered as measurements – they are the best estimates of
two groups of measurements of the same quantity (X).
♦ Therefore our original problem can be formulated as follows:

Given: ᾱA, σᾱA and ᾱB, σᾱB

Required: the best estimate of α

♦ Could we say ᾱ = (ᾱA + ᾱB)/2? NO – because this does not take into account the
difference in precision expressed by σᾱA and σᾱB.
♦ To account for the difference in precision between the two sets of observations,
we introduce a new quantity called the weight:

p = 1/σ²

Why?

 Recall:

σx = √( Σvi² / (n − 1) ), where vi is an error (residual), and

σx̄ = √( Σvi² / ((n − 1)·n) )

 Therefore, a measurement of high precision will have a small variance, and
vice versa.
 Since the value of the variance goes in the opposite direction to that of the
precision, another measure of precision is often used.
 We call this quantity the “Weight”.

ᾱ = Σpiαi / Σpi = (pA ᾱA + pB ᾱB) / (pA + pB)

pA = 1/σᾱA²,  pB = 1/σᾱB²

♦ In general:

x̂ = ΣPx / ΣP   (weighted mean)

σx̂ = √( ΣPv² / ((n − 1) ΣP) )

♦ Note: if all the observations have the same variance (i.e., equal weights), the
above formulas reduce to the mean and variance equations discussed before.
ENGO361 Naser El-Sheimy

1. MULTIVARIATE STATISTICS
♦ Geomatics problems (surveying problems) normally include the measurement of
several quantities. In turn, these measurements are used to determine several
unknown parameters.
♦ The measured quantities cannot usually be treated separately. Instead, they must
be dealt with simultaneously. Both the effect of each quantity on the others and
the statistical relationship between quantities must be taken into consideration in
order to obtain a meaningful solution of the unknowns.
♦ A multivariate consists of several univariates, e.g.:

L = [a, b, c], where a, b, and c are univariates.

That is,
a = [a1, a2, ... an] with ā, σa, and σā
b = [b1, b2, ... bm] with b̄, σb, and σb̄
c = [c1, c2, ... ck] with c̄, σc, and σc̄
♦ Example:
Consider the situation shown in the figure below:

[Figure: traverse to point A (angle α1, distance d1) and on to point B (angle α2,
distance d2); the components d1 sin α1 and d1 cos α1 are shown.]

Unknowns: x = [xa, ya, xb, yb]
Observations: L = [α1, d1, α2, d2]

ya = y0 + d1 cos α1
yb = y0 + d1 cos α1 + d2 cos α2

Any errors in α1 and d1 will affect the accuracy of (xa, ya).
Any errors in α1, d1, α2 and d2 will affect the accuracy of (xb, yb).


♦ The following quantities summarise the statistical information of the multivariate
L = [a, b, c]:

1. The mean of the multivariate:

L̄ = (ā, b̄, c̄)
ā = Σai/n,  b̄ = Σbi/m,  c̄ = Σci/k

2. The variance of the multivariate:

σ²L = (σa²  σb²  σc²)

3. The variance of the mean of the multivariate:

σ²L̄ = (σā²  σb̄²  σc̄²) = (σa²/n  σb²/m  σc²/k)

1.1. Covariance

♦ Covariance is a measure of the degree of correlation between any two
components of a multivariate.
♦ For example, if we have the following set of measurements:

L = (a, b, c), where a, b, and c have the same number of observations (n).

Then the covariance between a and b is given by:

σab = (1/(n − 1)) Σ_{i=1..n} vai·vbi   (units of a·b)

vai = ā − ai
vbi = b̄ − bi

♦ σab has the physical units of a multiplied by the physical units of b. Its value
is not bounded: it can take any value between −∞ and +∞.
♦ The covariance between the mean values of a and b is:

σāb̄ = σab / n
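The sample covariance (and the correlation coefficient defined below, ρ = σab/(σa·σb)) can be sketched as follows (Python; the two series are illustrative):

```python
import math

# Sample covariance between two series measured together.
def covariance(a, b):
    n = len(a)
    am, bm = sum(a) / n, sum(b) / n
    return sum((am - x) * (bm - y) for x, y in zip(a, b)) / (n - 1)

# Correlation coefficient: covariance normalised by the two standard deviations.
def correlation(a, b):
    sa = math.sqrt(covariance(a, a))   # sigma_a
    sb = math.sqrt(covariance(b, b))   # sigma_b
    return covariance(a, b) / (sa * sb)

a = [1.0, 2.0, 3.0, 4.0]
b = [2.1, 3.9, 6.2, 7.8]
print(covariance(a, b), correlation(a, b))  # strongly positively correlated
```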


♦ In practice, all the variances and covariances of a multivariate are assembled
into one matrix called the variance-covariance matrix (v-c matrix), or simply the
covariance matrix:

CL = [ σa²  σab  σac
       σba  σb²  σbc
       σca  σcb  σc² ]

♦ A similar expression, C_L̄, can be written for the variance-covariance matrix of
the mean.
♦ If the elements of the multivariate are statistically independent (no
correlation), then the variance-covariance matrix will be a diagonal matrix:

CL = diag(σa²  σb²  σc²)

Properties of the Variance-covariance Matrix

1. Symmetric: σij = σji
2. Its diagonal elements are positive.
3. Non-singular – i.e., the variance-covariance matrix must be invertible. This
also means that the determinant of CL must not equal zero. This property is
essential for computing the weight matrix P needed in least-squares adjustment,
where P = C⁻¹L.

Counterexamples:

A = [3  4; 4  −1]            → invalid: a variance cannot be negative.
A = [5 1 −2; 1 3 0; 2 0 4]   → invalid: the matrix is not symmetric.
A = [6 6; 6 6]               → invalid: the matrix is not invertible (|A| = 0).


## 1.2. Correlation Coefficient

♦ The correlation coefficient is a normalised measure of how strongly two
components of a multivariate are related:

ρab = ρba = σab / (σa·σb)

Properties of the Correlation Coefficient

1. Unit-less:
e.g., a → distance (cm), b → angle (sec): σab has units cm·sec, so
ρab = cm·sec / (cm·sec) is dimensionless.

2. Has limits of ±1:
ρaa = σaa/(σa·σa) = σa²/(σa·σa) = 1

♦ If
ρab = 0   → completely uncorrelated
ρab = +1  → completely positively correlated
ρab = −1  → completely negatively correlated
|ρab| < 1 → partially correlated

0 < |ρab| < 0.35    weak correlation (i.e., strong solution)
0.35 < |ρab| < 0.75 significant correlation
0.75 < |ρab| < 1    strong correlation (i.e., weak solution)

♦ Similar to the v-c matrix, we can construct the “correlation matrix” ρL:

ρL = [ 1    ρab  ρac
       ρba  1    ρbc
       ρca  ρcb  1  ]

NOTE:

Cl = [ σa²   ρab·σa·σb  ρac·σa·σc
       sym   σb²        ρbc·σb·σc
             sym        σc²       ]


## 1.3. Geometrical Interpretation of the Covariance and Correlation

[Figure: five scatter plots of b against a.]

♦ a increases, b increases → ρab positive
♦ a increases, b decreases → ρab negative
♦ Scattered cloud → very weak correlation
♦ a does not change while b increases, or a increases while b does not change →
no correlation

♦ A mathematical model is comprised of two parts:

1. Functional Model: describes the deterministic (i.e., physical, geometric)
relation between quantities:

f(x, l, c) = 0  (all may be vector quantities)

c ... constants
- e.g., the speed of light
- treated as absolute (known) quantities
- σc² = 0 and Pc ∝ 1/σc² = ∞

x ... unknown parameters
- the quantities we wish to solve for
- e.g., the area of a triangle, the co-ordinates (x, y, z) of a point
- usually treated as having zero weight (but don't have to be)
- Px ∝ 1/σx² = 0 → σx² = ∞

l ... observables
- measurements
- e.g., distances, angles, satellite pseudoranges
- 0 < σl² < ∞

2. Stochastic Model: describes the non-deterministic (probabilistic) behaviour of
model quantities, particularly the observations:

e.g., l = (a, b),  Cl = [ σa²  σab
                          σba  σb² ]


1) Direct Model

x_{u,1} = g_{m,1}(l_{n,1})

x = [x1  x2  ...  xu]T   number of unknowns = u
g = [g1  g2  ...  gm]T   number of functions = m
l = [l1  l2  ...  ln]T   number of observations = n

The model is direct with respect to the parameters.
One equation per parameter (i.e., u = m).
The parameters are expressed directly as functions of the observations.

♦ Example (a rectangle with sides a and b, diagonal c, and area A):
Observations: l = [a  b]T (n = 2)
Unknowns: x = [A  c]T (u = 2)
Functions: g = [g1  g2]
g1 ⇒ A = a·b              (m = 2)
g2 ⇒ c = (a² + b²)^(1/2)

2) Indirect Model

l_{n,1} = h_{m,1}(x_{u,1})

The model is indirect with respect to the parameters.
One equation per observation (n = m).


♦ Example: Levelling between two stations (i.e. elevation difference between two
stations)

B
∆hAB
A

Observations: l = ΔhAB (n = 1)
Unknowns: x = hB (u = 1)
Functions: h = [h1] (m = 1)
h1 ⇒ ΔhAB = hB − hA

3) Implicit Model

f_{m,1}(x_{u,1}, l_{n,1}) = 0  (m ≠ n ≠ u)

The model is implicit with respect to the parameters and the observations.
The parameters and observations cannot be separated, and have an “interwoven”
relationship.
♦ Example (fitting a straight line y = a + bx to three points):
Observations: l = [x1 y1 x2 y2 x3 y3]T (n = 6)
Unknowns: x = [a  b]T (u = 2)
Functions: f = [f1  f2  f3]
f1 ⇒ 0 = a + b·x1 − y1
f2 ⇒ 0 = a + b·x2 − y2   (m = 3)
f3 ⇒ 0 = a + b·x3 − y3


## 1.6. Spaces and Transformations

[Figure: the parameter space X (dim X = u) and the observation space L (dim L = n)
are related to each other through the mappings G_{u,n} and H_{n,u}, and both are
mapped into the model space F (dim F = m) through the design matrices A_{m,u} and
B_{m,n}.]

x_{u,1} = g_{m,1}(l_{n,1})

Linear
♦ The math model is linear w.r.t. the parameters (i.e., when the math model is
differentiated, it yields a vector of constants).
♦ Example – a simple levelling network (arrows point at the higher station):

[Figure: benchmark BM (known elevation HA) connected to stations A and B by the
levelling lines ΔH1, ΔH2, ΔH3.]

Unknowns: x = [HB], u = 1
Observations: l = [ΔH1  ΔH2  ΔH3]T, n = 3
Constants: c = [HA]
Functions: m = number of parameters (unknowns), m = u = 1

Math Model

u = 1, m = 1, n = 3 ∴ unique solution


HB = HA + ΔH1 + ΔH2 + ΔH3

[HB]_{1x1} = [1 1 1]_{1x3} [ΔH1  ΔH2  ΔH3]T_{3x1} + [HA]_{1x1}

Non-linear

♦ e.g.,  [x1; x2] = [g1(l); g2(l)] = [l1·cos c + l2²·sin c ;  l1 + l2 + l3·l1]

♦ c is a constant, n = 3, u = m = 2.
♦ We cannot come up with a (constant) g matrix.
♦ To solve this problem we usually linearize the model using a Taylor series
expansion (will be discussed later).

Conditional Model
♦ A special case of the direct model, where no parameters appear in the model:

0 = g_{m,1}(l_{n,1})

♦ Example 1 – estimating the internal angles of a triangle:
Unknowns: u = 2 (any two angles)
Observations: n = 3 (α, β, γ)
Functions: m = number of independent conditions = n − u = 3 − 2 = 1

Math Model

α + β + γ − 180° = 0

[1 1 1] [α  β  γ]T − 180° = 0


♦ Example 2 – a levelling network (arrows point at the higher station):

[Figure: benchmarks BM1 and BM2 and stations A, B, C connected by the levelling
lines ΔH1 ... ΔH5.]

Unknowns: x = [Ha  Hb  Hc]T, u = 3
Observations: l = [ΔH1  ΔH2  ΔH3  ΔH4  ΔH5]T, n = 5
Constants: c = [HBM1  HBM2]T
Functions: m = number of independent conditions = n − u = 5 − 3 = 2

Math Model

HBM1 + ΔH1 + ΔH2 + ΔH3 − HBM2 = 0
ΔH2 + ΔH4 − ΔH5 = 0

[1 1 1 0  0
 0 1 0 1 −1]_{2x5} [ΔH1  ΔH2  ΔH3  ΔH4  ΔH5]T_{5x1} + [HBM1 − HBM2
                                                        0          ]_{2x1} = 0


For the indirect model, the number of functions equals the number of observations:

m = n

Linear
♦ Example – consider the same levelling network (arrows point at the higher
station):

[Figure: benchmarks BM1 and BM2 and stations A, B, C connected by the levelling
lines ΔH1 ... ΔH5.]

Unknowns: x = [Ha  Hb  Hc]T, u = 3
Observations: l = [ΔH1  ΔH2  ΔH3  ΔH4  ΔH5]T, n = 5
Constants: c = [HBM1  HBM2]T
Functions: m = number of observations = n = 5

Math Model

[ΔH1]   [ 1  0  0]          [−HBM1]
[ΔH2]   [−1  1  0]  [Ha]    [  0  ]
[ΔH3] = [ 0 −1  0]  [Hb]  + [ HBM2]
[ΔH4]   [ 0 −1  1]  [Hc]    [  0  ]
[ΔH5]   [−1  0  1]          [  0  ]
 nx1       nxu       ux1      nx1

Redundancy = m − u = 5 − 3 = 2


Non-linear
♦ Non-linear models will be linearised using a Taylor series expansion (will be
discussed in Chapter 6).
♦ Example – finding the co-ordinates of a point by resection:

[Figure: distances L1, L2, L3 measured from the unknown point (x, y) to the known
points (x1, y1), (x2, y2), (x3, y3).]

Unknowns: x = [x  y]T, u = 2 (the two co-ordinates)
Observations: l = [L1  L2  L3]T, n = 3
Functions: m = number of observations = n = 3
Redundancy = degrees of freedom = m − u = 3 − 2 = 1

Math Model:

L1 = h1(x) = [(x1 − x)² + (y1 − y)²]^(1/2)
L2 = h2(x) = [(x2 − x)² + (y2 − y)²]^(1/2)
L3 = h3(x) = [(x3 − x)² + (y3 − y)²]^(1/2)


## F(x,l) = 0 & g(l) = 0

♦ Example: consider the example of line fitting, knowing additionally that point
(2) is equidistant from points (1) and (3). This condition can be added to the
mathematical model.

Observations: l = [x1 y1 x2 y2 x3 y3]T (n = 6)
Unknowns: x = [a  b]T (u = 2)
Functions:

F(x, l) = 0:
f1 ⇒ 0 = a + b·x1 − y1
f2 ⇒ 0 = a + b·x2 − y2   (m = 3)
f3 ⇒ 0 = a + b·x3 − y3

Adding the condition on the observations, g(l) = 0:

[(x1 − x2)² + (y1 − y2)²]^(1/2) − [(x3 − x2)² + (y3 − y2)²]^(1/2) = 0


## L = h(x) & h(x) = 0

♦ Example – consider the same levelling network (arrows point at the higher
station), knowing additionally that stations A and B are at the edge of a lake
(i.e., Ha = Hb):

[Figure: the same levelling network with benchmarks BM1, BM2 and stations A, B, C.]

Math Model

L = h(x):

[ΔH1]   [ 1  0  0]          [−HBM1]
[ΔH2]   [−1  1  0]  [Ha]    [  0  ]
[ΔH3] = [ 0 −1  0]  [Hb]  + [ HBM2]
[ΔH4]   [ 0 −1  1]  [Hc]    [  0  ]
[ΔH5]   [−1  0  1]          [  0  ]

h(x) = 0:

Ha − Hb = 0
[1 −1 0] [Ha  Hb  Hc]T = [0]


3. ERROR PROPAGATION
♦ Error propagation is the process of evaluating the errors in estimated quantities
(X) as functions of the errors in the measurements (L).
♦ Concept:

l = (l1, l2, ..., ln) + random errors → [Mathematical Model (+ modelling errors)]
→ x = (x1, x2, ..., xu) + propagated errors

l, Cl ... observations and their variance-covariance matrix
x̂, Cx̂ ... best estimate of the unknowns and its variance-covariance matrix

♦ Simple example: suppose that a quantity y is estimated from a measured quantity x
according to the following function (representing a straight line):

y = a + bx     (1)

Using the concept of the true value introduced in Chapter 1:

yt = a + bxt   (2)

Defining the error of a measurement as the measured value minus the true value,
(1) − (2):

y − yt = b(x − xt)
dy = b·dx

 That is, any error in x of value dx will introduce (i.e., propagate) an error
of value b·dx into the y component.
 Any error in estimating b (i.e., any error in the math model) will also
introduce errors in y.

♦ In general:
 Given:
Observations: l = [l1  l2  ...  ln]T (recall each li is a univariate)
Covariance information: Cl
 Required:
Estimated unknowns: x̂ = [x̂1  x̂2  ...  x̂u]T
Covariance information: Cx̂
 With:
u = number of unknowns
n = number of observations
n_necessary = the minimum number of observations required to estimate the unknowns
r = degrees of freedom, r = n − n_necessary (normally, n_necessary = u)

♦ Classification of problems:

r = 0:
 n_necessary = n
 number of equations = number of unknowns
 unique solution
→ we will study univariate and multivariate error propagation.

r > 0:
 number of observations > number needed to reach a unique solution
→ we will study the method of least squares.

♦ Zero-redundancy error propagation (r = 0) splits into four cases:
 Univariate, Cl diagonal (“uncorrelated observations”): dim(x) = 1
 Univariate, Cl not diagonal: dim(x) = 1
 Multivariate, Cl diagonal: dim(x) = u
 Multivariate, Cl not diagonal: dim(x) = u


3.1. Univariate

♦ Characteristics:

x = [x1]_{1x1}

♦ Steps of solution:
1. Construct the mathematical model (direct model):
x = f(l), where l = [l1  l2  ...  ln]
2. Obtain the best estimate of x:
x̂ = f(l)
3. Estimate the precision of x̂:

σx̂² = (∂f/∂l1)² σl1² + (∂f/∂l2)² σl2² + ... + (∂f/∂ln)² σln²
    = Σ_{i=1..n} (∂f/∂li)² σli²   → observe the units

♦ Example:
 Given: x = A (the area of a rectangle with sides a and b)
L = (a, b)
a = 30 m, σa = 0.1 m
b = 40 m, σb = 0.2 m
 Required: Â and σÂ
 Solution:
1. Mathematical model: x = f(l) → A = a·b
2. Best estimate: x̂ = f(l) → Â = a·b = 30 m · 40 m = 1200 m²
3. Estimate the precision of x̂ → σÂ (m²):

σÂ² = (∂A/∂a)² σa² + (∂A/∂b)² σb² = b²σa² + a²σb²

σÂ² = 1600 m² · 0.01 m² + 900 m² · 0.04 m² = 52 m⁴
σÂ = 7.211 m²

 Best estimate of A = 1200 m² ± 7.211 m²
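The rectangle example can be verified numerically (a Python sketch following the law of propagation of variances):

```python
import math

# Error propagation for A = a*b:  sigma_A^2 = b^2*sigma_a^2 + a^2*sigma_b^2
def area_with_sigma(a, sigma_a, b, sigma_b):
    area = a * b
    var = (b ** 2) * sigma_a ** 2 + (a ** 2) * sigma_b ** 2
    return area, math.sqrt(var)

area, sigma = area_with_sigma(30.0, 0.1, 40.0, 0.2)
print(area, sigma)  # 1200 m^2 and sqrt(52) = 7.211 m^2
```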
♦ Example of univariate error propagation:
 Given (a height h determined from a distance d and a vertical angle θ, each
observed repeatedly):

| Quantity | Mean    | σ    | n |
|----------|---------|------|---|
| d        | 56.78 m | 2 cm | 4 |
| θ        | 9°12′7″ | 30″  | 9 |

 Required: ĥ and σĥ
 Solution:
1. Mathematical model: x = f(l) → h = d·tan θ
2. Best estimate: x̂ = f(l)
3. Estimate the precision of ĥ:

σĥ² = (∂h/∂d)² σd̂² + (∂h/∂θ)² σθ̂²   (all terms expressed in cm²)

σd̂ = σd/√4 = 2/2 = 1 cm

σθ̂ = σθ/√9 = 30″/3 = 10″

∂h/∂d = tan θ (unitless)      → (∂h/∂d)² σd̂² → cm²
∂h/∂θ = d·sec² θ (units of m) → (∂h/∂θ)² σθ̂² → m², with σθ̂ converted to radians

σĥ² = (tan 9°12′7″)² (1 cm)² + (56.78×100 cm · sec² 9°12′7″)² (10″/206265)²

ĥ = 9.1983 m ±
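The height example can be completed numerically (Python sketch; the σĥ value missing from the notes is computed here rather than taken from them, and the angle is taken as 9°12′7″ from the table above):

```python
import math

# h = d * tan(theta); propagate sigma_d and sigma_theta into sigma_h.
d = 56.78                                             # metres (mean of 4 obs)
sigma_d = 0.02 / math.sqrt(4)                         # 2 cm / sqrt(4) = 1 cm, in m
theta = math.radians(9 + 12 / 60 + 7 / 3600)          # 9 deg 12' 7"
sigma_theta = math.radians(30 / 3600) / math.sqrt(9)  # 30" / sqrt(9) = 10", in rad

h = d * math.tan(theta)
var_h = (math.tan(theta) * sigma_d) ** 2 \
      + (d / math.cos(theta) ** 2 * sigma_theta) ** 2
print(round(h, 4), round(math.sqrt(var_h), 4))        # height (m) and its sigma (m)
```

Note how the angular term is converted to radians before propagation, which is what the 1/206265 factor in the hand calculation does.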

♦ In the previous section we discussed the error propagation of a univariate: a
single unknown quantity derived from uncorrelated observables, to which we applied
the law of propagation of variances.
♦ In this chapter, we discuss the error propagation of a multivariate, in which we
apply the law of propagation of variance-covariance matrices (also known as the
covariance law).
♦ To derive the covariance law, let us start with the special case of a univariate
(a single unknown x as a function of uncorrelated observables):

x = f(l1, l2, ..., ln) = f(l)

with the variance of x, σx², derived from the law of propagation of variances as:

σx² = (∂x/∂l1)² σl1² + (∂x/∂l2)² σl2² + ... + (∂x/∂ln)² σln²

Writing this as a matrix product with the row vector of partial derivatives
Jl = [∂x/∂l1  ∂x/∂l2  ...  ∂x/∂ln] and the diagonal matrix
Cl = diag(σl1², σl2², ..., σln²):

σx²_{1x1} = Jl_{1xn} · Cl_{nxn} · JlT_{nx1}

♦ Now, if we have more than one unknown – say a vector x of u unknowns related to
the n observations as follows:

x_{ux1} = f_{ux1}(l_{nx1})

or,

x1 = f1(l1, l2, ..., ln)
x2 = f2(l1, l2, ..., ln)
...
xu = fu(l1, l2, ..., ln)

♦ In this case, σx² becomes a matrix Cx with dimensions u × u that takes the
following form:

Cx = [ σx1²   σx1x2  ...  σx1xu
       sym    σx2²   ...
                     ...  σxu²  ]

♦ Moreover, the Jl matrix will contain all the partial derivatives ∂xj/∂li,
j = 1, 2, ..., u; i = 1, 2, ..., n:

Jl_{uxn} = [ ∂x1/∂l1  ∂x1/∂l2  ...  ∂x1/∂ln
             ∂x2/∂l1  ∂x2/∂l2  ...  ∂x2/∂ln
             ...
             ∂xu/∂l1  ∂xu/∂l2  ...  ∂xu/∂ln ]

6/14
/
ENGO361 Naser El-Sheimy

♦ J_l is usually called the Jacobian matrix. It is also sometimes called the coefficient matrix (since, in the case of a linear mathematical model, the partial derivatives are just the coefficients of the observations).
♦ When some observations are correlated, the variance-covariance matrix C_l will not be a diagonal matrix:

   C_l = [ σ_l₁²      σ_l₁l₂  …  σ_l₁lₙ ]
         [            σ_l₂²   …  σ_l₂lₙ ]
         [ symmetric          ⋱    ⋮    ]
         [                       σ_lₙ²  ]

♦ Taking all the above characteristics of the multivariate case into account, the final covariance law is:

   C_x = J_l · C_l · J_lᵀ
♦ Summary of steps for solving a multivariate error propagation:
 Given: l (n×1), C_l (n×n)
 Required: x̂ (u×1), C_x̂ (u×u)
1. Form the direct (explicit) mathematical model x = f(l)
2. Establish the variance-covariance matrix of the observations C_l
3. Evaluate the elements of the Jacobian matrix J_l = [∂x_j/∂l_i], j = 1, 2, …, u; i = 1, 2, …, n
4. Adjust the physical units in the covariance law
5. Apply the covariance law to get C_x
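The steps above can be sketched as a small function. The example values (two observed sides of a hypothetical rectangle, with its area and perimeter as the derived quantities) are assumptions for illustration only:

```python
def propagate(J, Cl):
    """Covariance law Cx = J * Cl * J^T, with plain nested lists."""
    u, n = len(J), len(J[0])
    JC = [[sum(J[i][k] * Cl[k][j] for k in range(n)) for j in range(n)]
          for i in range(u)]
    return [[sum(JC[i][k] * J[j][k] for k in range(n)) for j in range(u)]
            for i in range(u)]

# Hypothetical example: sides a = 40 m, b = 30 m observed with variances
# 0.01 m^2 and 0.04 m^2; derived x1 = area (a*b), x2 = perimeter (2a+2b).
a, b = 40.0, 30.0
J = [[b, a],       # d(area)/da, d(area)/db
     [2.0, 2.0]]   # d(perimeter)/da, d(perimeter)/db
Cl = [[0.01, 0.0],
      [0.0, 0.04]]
Cx = propagate(J, Cl)
print(round(Cx[0][0], 6))  # 73.0 (variance of the area, m^4)
```

Note that even though the two observations are uncorrelated, the derived area and perimeter are correlated (the off-diagonal of Cx is non-zero), which is exactly the point made in the levelling example that follows.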


♦ Example:
 Given: a benchmark A with H_A = 20 m and two observed height differences (∆h₁ from A to point 1, ∆h₂ from point 1 to point 2):
   l = [∆h₁ ∆h₂]ᵀ = [5 8]ᵀ m
   C_l = [ 4  0 ] mm²
         [ 0  9 ]
 Required: x = [H₁ H₂]ᵀ and C_x; discuss the degree of correlation between H₁ and H₂.
 Solution:
1. Mathematical Model
   H₁ = H_A + ∆h₁ = 20 + 5 = 25 m
   H₂ = H_A + ∆h₁ − ∆h₂ = 20 + 5 − 8 = 17 m
2. Variance-covariance matrix of the observations:
   C_l is given; note that the observations are uncorrelated (however, this does not mean that the best estimates of x will be uncorrelated as well).
3. Construct the Jacobian matrix:
   J_l (2×2) = [ ∂H₁/∂∆h₁  ∂H₁/∂∆h₂ ] = [ 1   0 ]   (no units)
               [ ∂H₂/∂∆h₁  ∂H₂/∂∆h₂ ]   [ 1  −1 ]
   (Note: a linear model results in the coefficients of the observations as the elements of J_l)
4. Adjust the physical units in the covariance law:
   Since all the partial derivatives are unitless and the variances of the observations are in mm² (which are the same units as C_x), no scaling is needed.


5. Apply the covariance law to get C_x:
   C_x = J_l C_l J_lᵀ
   [ σ_H₁²    σ_H₁H₂ ] = [ 1   0 ] [ 4  0 ] [ 1   1 ]
   [ σ_H₂H₁   σ_H₂²  ]   [ 1  −1 ] [ 0  9 ] [ 0  −1 ]
                       = [ 4   0 ] [ 1   1 ] = [ 4   4 ] mm²
                         [ 4  −9 ] [ 0  −1 ]   [ 4  13 ]
From which we get:
   σ_H₁ = √(4 mm²) = 2 mm, σ_H₂ = √(13 mm²) = 3.6 mm
Note: σ_H₁² = σ_HA² + σ_∆h₁² = 0 + 4 = 4 mm², and σ_H₂² = σ_HA² + σ_∆h₁² + σ_∆h₂² = 0 + 4 + 9 = 13 mm²
 Correlation:
   ρ_H₁H₂ = σ_H₁H₂ / (σ_H₁ σ_H₂) = 4 / (2 · 3.6) = +0.55 (significant)

 Homework: consider ρ_∆h₁∆h₂ = 0.25. What will ρ_H₁H₂ be?
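A minimal NumPy sketch checking the levelling example above (for the homework, the correlated case can be explored by adding the corresponding off-diagonal covariance to Cl and re-running):

```python
import numpy as np

# Covariance law for H1 = HA + dh1, H2 = HA + dh1 - dh2:
J = np.array([[1.0,  0.0],
              [1.0, -1.0]])     # Jacobian w.r.t. (dh1, dh2)
Cl = np.diag([4.0, 9.0])        # mm^2, uncorrelated observations
Cx = J @ Cl @ J.T               # expected: [[4, 4], [4, 13]] mm^2
rho = Cx[0, 1] / np.sqrt(Cx[0, 0] * Cx[1, 1])
print(round(rho, 2))  # 0.55
```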


3.3. Pre-analysis of Survey Measurements

♦ Types of analysis:
 Pre-analysis (before the measurements): pre-specified accuracy σ_x or C_x
 Post-analysis (after the measurements): actual accuracy σ_x or C_x
♦ Pre-analysis is the analysis of the component measurements before the project is actually undertaken.
♦ Assumption: all components of a survey measurement are free of bias caused by systematic errors. This means that variances, or standard deviations, can be used as measures of accuracy as well as measures of precision.
♦ Main items to consider in pre-analysis of a certain survey project are:
1. Possible survey techniques (and thus the corresponding mathematical
model)
 Example: the elevation of B above A can be obtained either by spirit levelling or by trigonometric levelling:
   H_B = H_A + Σ∆hᵢ (spirit levelling)
   H_B = H_A + h_I + L sin θ − h_r (trigonometric levelling, with instrument height h_I and reflector height h_r)

2. Available surveying instruments (cost, simplicity, and the precision of a single measurement).


♦ Consider the simple case of a single unknown and uncorrelated observations; the process of pre-analysis is performed through the application of the law of propagation of variances.

Recall:
   σ_x² = (∂x/∂l₁)² (σ_l₁²/n₁) + (∂x/∂l₂)² (σ_l₂²/n₂) + … + (∂x/∂lₙ)² (σ_lₙ²/nₙ)   (n terms)

   σ_x² = Σᵢ₌₁ⁿ (∂x/∂lᵢ)² (σ_lᵢ²/nᵢ)

   σ_x² … final required accuracy
   (∂x/∂lᵢ) … effect of the math model
   (σ_lᵢ²/nᵢ) … effect of the instrument and the number of observations

Note:
   σ_x … is usually given in this case (as a pre-specified accuracy)
   ∂x/∂lᵢ … will depend on the mathematical model
   nᵢ … is the number of observations of lᵢ


♦ Usually ∂x/∂lᵢ and σ_lᵢ are related and easy to decide upon, and therefore the numbers of observations nᵢ are the main quantities we are interested in.
♦ Since we have only a single equation and n unknowns, we cannot estimate the nᵢ unless we have some additional information or make some assumptions (e.g., the n terms contribute equally to the total error budget).
♦ Example:
 The distance d cannot be measured directly; however, we can measure s₁, s₂ and α (the two sides and the included angle of a triangle containing d).
 We know the following (the values of the measurements can be obtained from a map, i.e. approximate values):

   Measurement    Standard deviation
   s₁ = 136 m     σ_s₁ = 1.5 cm
   s₂ = 115 m     σ_s₂ = 1.5 cm
   α = 50°        σ_α = 10″

 Required: n_s₁, n_s₂, n_α such that σ_d ≤ 0.5 cm
 Solution:
1. Mathematical Model
   d = (s₁² + s₂² − 2 s₁ s₂ cos α)^½ = 107.77 m (cosine law)
2. Error Model
   σ_d² = (∂d/∂s₁)² σ̃_s₁² + (∂d/∂s₂)² σ̃_s₂² + (∂d/∂α)² σ̃_α²
   where σ̃ denotes the standard deviation of the mean of the repeated measurements (σ̃² = σ²/n):
   ∂d/∂s₁ = (2s₁ − 2s₂ cos α)/(2d) = 0.576 (unitless)
   ∂d/∂s₂ = (2s₂ − 2s₁ cos α)/(2d) = 0.255 (unitless)
   ∂d/∂α = s₁ s₂ sin α / d = 111.17 m = 11117 cm
   (0.5 cm)² = (0.576)² σ̃_s₁² + (0.255)² σ̃_s₂² + (11117 cm)² (σ̃_α/ρ″)²
   0.25 cm² = 0.332 σ_s₁²/n_s₁ + 0.065 σ_s₂²/n_s₂ + 0.003 σ_α²/n_α

 We have 3 unknowns in one equation. Therefore, to solve this equation, we must impose some conditions:
1. 1st Trial: assume s₁, s₂ and α contribute equally to σ_d²:
   0.25/3 = 0.332 (1.5)²/n_s₁ → n_s₁ = 9
   0.25/3 = 0.065 (1.5)²/n_s₂ → n_s₂ = 2
   0.25/3 = 0.003 (10)²/n_α → n_α = 4
   Note: although σ_s₁ = σ_s₂, we get n_s₁ > n_s₂; therefore the equal distribution is not a proper assumption.
2. 2nd Trial: split the budget as 0.25 = 0.15 + 0.05 + 0.05:
   0.15 = 0.332 (1.5)²/n_s₁ → n_s₁ = 5
   0.05 = 0.065 (1.5)²/n_s₂ → n_s₂ = 3
   0.05 = 0.003 (10)²/n_α → n_α = 6
   which is more realistic than the first assumption, especially for n_s₁ and n_s₂.
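The two trials above can be sketched as a small budget-allocation function (the coefficients are the squared partial derivatives computed in the error model):

```python
import math

# Pre-analysis: allocate the error budget sigma_d^2 = 0.25 cm^2 among
# the three measurement types and solve for the repetitions n_i.
coeff = {"s1": 0.332, "s2": 0.065, "alpha": 0.003}       # (partial)^2 terms
sigma2 = {"s1": 1.5**2, "s2": 1.5**2, "alpha": 10.0**2}  # single-obs variances

def repetitions(budget):
    # budget: fraction of 0.25 cm^2 assigned to each measurement type
    return {k: math.ceil(coeff[k] * sigma2[k] / (0.25 * budget[k]))
            for k in coeff}

print(repetitions({"s1": 1/3, "s2": 1/3, "alpha": 1/3}))  # 1st trial
print(repetitions({"s1": 0.6, "s2": 0.2, "alpha": 0.2}))  # 2nd trial
```

Rounding up with `ceil` guarantees the budget is met; other budget splits can be tried the same way.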


♦ Example:
 The height h of a survey station (A) above the instrument at (B) is required with an accuracy of 0.01 m:
   h = s sin α − h_r
   s = 400 m, α = 30°
1. Estimate σ_s, σ_α, σ_hr assuming balanced (equal) accuracies:
   σ_h² = (∂h/∂s)² σ_s² + (∂h/∂α)² σ_α² + (∂h/∂h_r)² σ_hr²
   σ_s = (σ_h/√3) / (∂h/∂s) = (0.01/√3) / sin 30° = 0.0115 m
   σ_α = (σ_h/√3) / (∂h/∂α) = (0.01/√3) / (s cos α) = (0.01/√3)/346.4 = 1.67×10⁻⁵ rad = 1.67×10⁻⁵ · ρ″ = 3.4″
   σ_hr = (σ_h/√3) / |∂h/∂h_r| = (0.01/√3) / 1 = 0.0058 m
2. If σ_α is limited by the instrument used to 5″, re-evaluate σ_s and σ_hr to accommodate this limitation:
   (0.01)² = (0.5)² σ_s² + (346.4)² (5″/ρ″)² + (−1)² σ_hr²
   (0.0056)² m² = (0.5)² σ_s² + σ_hr²
   Balancing the accuracies of the two remaining terms:
   σ_s = (0.0056/√2) / 0.5 = 0.008 m (8 mm) → choose an EDM
   σ_hr = (0.0056/√2) / 1 = 0.004 m (4 mm)
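Step 1 of the balanced-accuracy allocation above can be checked numerically; this sketch uses the example's values (s = 400 m, α = 30°, target σ_h = 0.01 m):

```python
import math

# Balanced pre-analysis for h = s*sin(a) - hr: each of the three error
# terms receives an equal share sigma_h^2 / 3 of the budget.
s, alpha = 400.0, math.radians(30.0)
sigma_h = 0.01
share = sigma_h / math.sqrt(3)            # per-term standard deviation

sigma_s = share / math.sin(alpha)         # distance accuracy, m
sigma_a = share / (s * math.cos(alpha))   # angle accuracy, rad
sigma_a_sec = sigma_a * 206265.0          # angle accuracy, arc seconds
sigma_hr = share / 1.0                    # reflector-height accuracy, m
print(round(sigma_s, 4), round(sigma_a_sec, 1), round(sigma_hr, 4))
# 0.0115 3.4 0.0058
```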


♦ In the previous chapters, we dealt with the case where the number of observables (n) is just enough to provide the necessary number of equations (m, or n_necessary) to estimate the (u) unknowns. This usually results in a unique solution for the unknowns.
♦ In practice, however, a unique solution is dangerous, as an error in a single observation can radically affect the final solution for the unknowns. Therefore, in Geomatics we typically have redundant observations, i.e. we take more observations than are necessary for a unique solution.

Number of redundant observations = r = n − u (degrees of freedom)

♦ A mathematical model ( l = f(x), g(l) = 0, f(x, l) = 0 ) in which r > 0 is termed an over-determined mathematical model, and it can lead to an infinite number of solutions for the unknowns x.
♦ This happens due to the discrepancies among the different equations of the math model, caused by the random errors still present in some or all of the observables.
♦ To illustrate the above fact, let us discuss the following Examples:
1. Levelling Networks
The following example shows three different routes (A→P, B→P, C→P) for estimating the elevation of point P, given H_A = 10 m, H_B = 12 m, H_C = 15 m and the observed height differences ∆h₁ = 1.1 m, ∆h₂ = 1 m, ∆h₃ = 3.8 m:
   H_P = H_A + ∆h₁ = 10 + 1.1 = 11.1 m
or H_P = H_B − ∆h₂ = 12 − 1 = 11.0 m
or H_P = H_C − ∆h₃ = 15 − 3.8 = 11.2 m

u = 1, n = 3, r = n − u = 3 − 1 = 2

Due to the different uncertainties (errors) in the observations ∆hᵢ, the three estimates of the elevation of P are not identical.


2. Area of a Triangle
u = 1, n_necessary = 3, n = 6 (the three sides a, b, c and the three angles A, B, C are all observed):
   ∆₁ = ½ a b sin C
   ∆₂ = ½ c b sin A
   ∆₃ = ½ a c sin B
Again, identical results will most probably not be obtained, due to the errors in the observation vector l.

The above-mentioned problems associated with the over-determined math model can be overcome by adjusting the observations.
♦ The apparent inconsistency with the math model, due to measurement errors, can be resolved by replacing the given observations l with another set, the so-called best estimates of the observations l̂, such that the new set l̂ fits the model exactly:

   l̂ (n×1) = l (n×1) + v̂ (n×1)

Note: the estimated residuals v̂ are unknown and must be determined before the observations can be estimated.

♦ Without adjustment: given l = [l₁ l₂ … lₙ] and a math model with n > u (i.e. r > 0), choosing different subsets of n₀ (= necessary) observations to solve for x = [x₁ x₂ … xᵤ] yields inconsistent (different) solutions x̂′, x̂″, …, x̂ᵏ.
♦ With adjustment: estimating the residuals v = [v₁ v₂ … vₙ] together with the parameters yields a unique solution x̂.

♦ How do we choose v? There are essentially an infinite number of possible sets of residuals that provide estimated observations l̂ that fit the math model. However, there is only one set of residuals that yields the optimal least squares solution: Σvᵢ² = vᵀv = min.

4.2. The Least Squares Method

♦ Principle:
In addition to the fact that the adjusted observations must satisfy the mathematical model exactly, the corresponding residuals must satisfy the least squares criterion:

   φ = Σᵢ₌₁ⁿ vᵢ² = v₁² + v₂² + … + vₙ² = min

♦ Simple example on the concept:
 Required: x –– u (unknowns) = 1
 Given: l = [l₁ l₂]ᵀ = [15.12 15.14]ᵀ m –– n (observations) = 2
 r (redundancy/extra observations) = n − u = 2 − 1 = 1
The final value of x̂ (best estimate) can be obtained from the observation equations:
   x̂ = l₁ + v̂₁ = l̂₁
   x̂ = l₂ + v̂₂ = l̂₂
 Find v̂₁ and v̂₂ such that l̂₁ = l̂₂.


Possible values for v:

   v₁      v₂       l̂₁      l̂₂      Σv²
   0       −0.02    15.12   15.12   φ₁ = (0)² + (−0.02)² = 4×10⁻⁴
   0.01    −0.01    15.13   15.13   φ₂ = (0.01)² + (−0.01)² = 2×10⁻⁴
   0.015   −0.005   15.135  15.135  φ₃ = (0.015)² + (−0.005)² = 2.5×10⁻⁴

Note that φ₂ is the smallest, but is it the very minimum value when all possible combinations of corrections are considered?

♦ Geometric interpretation of Σv² = min:

The adjusted observations l̂₁ and l̂₂ are related to each other by l̂₁ − l̂₂ = 0, i.e. l̂₁ = l̂₂, which is a line with 45° inclination (the condition line) in the (l̂₁, l̂₂) plane.

The observation vector l = [15.12 15.14]ᵀ can be represented by a point (A). The projection of point (A) onto the condition line has (among others) the three possibilities A₁, A₂ and A₃, corresponding to φ₁, φ₂ and φ₃ respectively; all three points satisfy the condition l̂₁ = l̂₂, where

   l̂ = A₁ = [15.12 15.12]ᵀ or A₂ = [15.13 15.13]ᵀ or A₃ = [15.135 15.135]ᵀ

However, A₂ is the only solution that satisfies

   v₂ᵀ · l̂ = 0 (inner or dot product),

that is,

   [0.01 −0.01] · [15.13 15.13]ᵀ = 0.

This means that the vector v₂ is perpendicular to the condition line and therefore minimises the function.

Note as well that x̂ = l̂ = Σlᵢ/n = (15.12 + 15.14)/2 = 15.13, that is, the mean satisfies the least squares condition.
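A quick numerical sketch of this observation: evaluating φ at the mean and at nearby candidate values confirms that the mean gives the smallest sum of squared residuals for the two observations of the example:

```python
# Check that the arithmetic mean minimises phi = sum(v_i^2)
# for the observations l1 = 15.12 m, l2 = 15.14 m.
l = [15.12, 15.14]

def phi(x_hat):
    return sum((x_hat - li) ** 2 for li in l)

mean = sum(l) / len(l)   # 15.13
candidates = [mean + d for d in (-0.01, -0.005, 0.0, 0.005, 0.01)]
best = min(candidates, key=phi)
print(round(best, 2))  # 15.13
```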

♦ Proof that the mean is the least squares estimate for a group of measurements of a certain parameter:
Given: l₁, l₂, …, lₙ. Required: x̂.
   l₁ + v₁ = x̂
   l₂ + v₂ = x̂
   ⋮
   lₙ + vₙ = x̂
Using the least squares condition:
   φ = Σᵢ₌₁ⁿ vᵢ² = (x̂ − l₁)² + (x̂ − l₂)² + … + (x̂ − lₙ)² = min
For φ to be minimum,
   ∂φ/∂x̂ = 0
   ∂φ/∂x̂ = 2(x̂ − l₁) + 2(x̂ − l₂) + … + 2(x̂ − lₙ) = 0
   0 = 2n x̂ − 2(l₁ + l₂ + … + lₙ)
   0 = n x̂ − Σᵢ₌₁ⁿ lᵢ
   x̂ = Σ lᵢ / n
If the observations are unequal in precision, we have to consider the weights ρᵢ of the observations, that is
   x̂ = Σ ρᵢ lᵢ / Σ ρᵢ  ⇒ weighted mean

♦ Proof that the weighted mean is the least squares estimate for a group of observations, with different precision, of the same parameter:
   lᵢ + vᵢ = x̂  →  vᵢ = x̂ − lᵢ
   φ = Σ ρᵢ vᵢ² = min
Substituting for vᵢ:
   φ = Σ ρᵢ (x̂ − lᵢ)² = Σ ρᵢ (x̂² − 2 x̂ lᵢ + lᵢ²) = x̂² Σρᵢ − 2 x̂ Σρᵢlᵢ + Σρᵢlᵢ² = min
For φ to be minimum:
   ∂φ/∂x̂ = 2 x̂ Σρᵢ − 2 Σρᵢlᵢ = 0
   x̂ = Σ ρᵢ lᵢ / Σ ρᵢ
It is very important to note that for
   φ = Σ v² = min: ∂φ/∂v = 2 Σvᵢ = 0 → Σv = 0
and
   φ = Σ ρᵢ vᵢ² = min: ∂φ/∂v = 2 Σρᵢvᵢ = 0 → Σρᵢvᵢ = 0
(check chapter 1 and lab 1)
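The weighted mean and the property Σρᵢvᵢ = 0 can be sketched numerically; the observation values and weights below are hypothetical, chosen only for illustration:

```python
# Weighted mean as the least squares estimate for observations of
# unequal precision (weights p ~ 1/sigma^2, hypothetical values).
l = [10.02, 10.05, 10.03]
p = [4.0, 1.0, 2.0]

x_hat = sum(pi * li for pi, li in zip(p, l)) / sum(p)
v = [x_hat - li for li in l]            # v_i = x_hat - l_i, as in the proof
weighted_v_sum = sum(pi * vi for pi, vi in zip(p, v))
print(round(x_hat, 4))  # 10.0271
# weighted_v_sum is zero (up to rounding), confirming sum(p_i * v_i) = 0.
```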
♦ Based on the above discussion, the basic concept of adjustment is, therefore, to allow the observables l to change slightly while solving for x.
♦ This means that in the over-determined model f(x, l) = 0, we consider l as approximate values for the observables which need to be corrected by certain small amounts, denoted by v, so as to yield a unique solution (v is the vector of residuals).
♦ The mathematical model becomes:

   f(x̂ (u×1), l̂ (n×1)) = f(x̂ (u×1), l (n×1) + v (n×1)) = 0

   x̂ … best estimate (adjusted) values of the parameters (unknown)
   l̂ … adjusted values of the observables
   l … original observations
   v … residuals vector (unknown)


♦ The above model cannot be solved for x and v simultaneously because we have more unknowns (u "for x" + n "for v") than the n equations; that is, we need u extra equations.
♦ Recall that the residuals v are very small and behave according to the Gauss law of random errors (Σv = 0). As a result, we can find conditions that provide the required extra equations, enabling us to solve for both x and v simultaneously.
♦ The condition of least squares of the residuals (Σv² = min) was found to satisfy the properties of the best estimate:
   1) Maximum likelihood (most probable)
   2) Minimum variance (most precise)
   3) Unbiased (most accurate)

4.3. Least Squares Techniques

♦ General (implicit) model: f(x (u×1), l (n×1)) = 0; x and l cannot be written as an explicit function of each other.
♦ Parametric model: l (n×1) = f(x (u×1)); each observed quantity provides one observation equation; a one-stage adjustment in which x̂ and l̂ are estimated simultaneously.
♦ Conditional model: f (r×1)(l (n×1)) = 0, where r = n − u is the degree of freedom; a two-stage adjustment in which l̂ is estimated first, then x̂ directly from x (u×1) = f(l̂ (n×1)).
♦ These math models can be either linear or non-linear. Non-linear models must first be linearised before conducting the least squares adjustment.
4.4. Linearisation of Non-linear Models

♦ Linearisation is the process of approximating a non-linear function by a linear one. The most convenient and efficient approach to linearisation used in Geomatics is the Taylor series expansion.
♦ The expansion of a function is performed about a point of expansion (POE), i.e. good approximate values for the involved variables.
1) Univariate function, l = f(x):

   l = f(x⁰) + ∂f/∂x|x⁰ (x − x⁰) + (1/2!) ∂²f/∂x²|x⁰ (x − x⁰)² + higher order terms

Neglecting the second and higher order terms, and writing x = x⁰ + δ:

   l = f(x⁰) + ∂f/∂x|x⁰ δ
   0 = (f(x⁰) − l) + ∂f/∂x|x⁰ δ

   w (n×1) + A (n×u) δ (u×1) = 0

   w … misclosure vector
   δ … corrections to the unknown parameters

2) Multivariate functions, f (m×1)(x (u×1), l (n×1)) = 0
POE:
   x⁰ – approximate values of the unknowns; can be estimated from a group of u observations (use the number of observations necessary to solve for x)
   l_obs – observed values

   f(x, l) ≈ f(x⁰, l_obs) + ∂f/∂x|x⁰,l_obs (x̂ − x⁰) + ∂f/∂l|x⁰,l_obs (l̂ − l_obs)

In matrix form:

   w (m×1) + A (m×u) δ (u×1) + B (m×n) v (n×1) = 0

   w (m×1) … misclosure vector
   A (m×u) … 1st design matrix (partial derivatives of the functions w.r.t. the parameters)
   δ (u×1) … corrections to the approximate values of the unknown parameters x⁰
   B (m×n) … 2nd design matrix (partial derivatives of the functions w.r.t. the observables)
   v (n×1) … corrections to the observations (residuals)

Note: l̂ (n×1) = l_obs (n×1) + v (n×1) (adjusted observations); x̂ (u×1) = x⁰ (u×1) + δ (u×1) (adjusted parameters)


♦ From the multivariate linearisation, we can derive the two special-case adjustment models:
1) Parametric model
   l (n×1) = f (n×1)(x (u×1)), or f (n×1)(x (u×1)) − l (n×1) = 0 (n×1); POE: x⁰, l_obs
   (f(x⁰) − l_obs) + (∂f/∂x) δ − I v = 0
   w (n×1) + A (n×u) δ (u×1) − v (n×1) = 0 (n×1)
   This can be derived from the implicit model by substituting B = −I.
2) Conditional model
   f (r×1)(l (n×1)) = 0 (r×1); POE: l_obs
   f(l_obs) + (∂f/∂l) v = 0
   w (r×1) + B (r×n) v (n×1) = 0 (r×1)
   This can be derived from the implicit model by substituting A = 0.

♦ Linear Parametric Math Model Example: Levelling Network
Constants: H_A (point A is a benchmark)
Observations – n = 3:
   l = [∆h₁ ∆h₂ ∆h₃]ᵀ
Unknowns – u = 2:
   x = [h_B h_C]ᵀ
Observation equation for a single observation, l̂ = f(x̂):
   ∆ĥ₁ = ĥ_B − h_A


Expand into differential form, l + v = f(x⁰) + (∂f(x)/∂x) δ:
   ∆h₁ + v₁ = (h_B⁰ − h_A) + (1) δ_hB
   v₁ = (h_B⁰ − h_A − ∆h₁) + δ_hB, with w₁ = h_B⁰ − h_A − ∆h₁
Similarly for ∆h₂ (= ĥ_B − ĥ_C):
   ∆h₂ + v₂ = (h_B⁰ − h_C⁰) + δ_hB − δ_hC
   v₂ = (h_B⁰ − h_C⁰ − ∆h₂) + [1 −1] [δ_hB δ_hC]ᵀ
A similar equation can be derived for ∆h₃ (= ĥ_C − h_A), and together the three equations can be written in matrix form:

   [ v₁ ]   [ 1   0 ]            [ h_B⁰ − h_A − ∆h₁  ]
   [ v₂ ] = [ 1  −1 ] [ δ_hB ] + [ h_B⁰ − h_C⁰ − ∆h₂ ]
   [ v₃ ]   [ 0   1 ] [ δ_hC ]   [ h_C⁰ − h_A − ∆h₃  ]

   v (n×1) = A (n×u) δ (u×1) + w (n×1)

♦ Non-linear Parametric Math Model Example: 2-D Distance Network
Constants: x_A, y_A – x_B, y_B – x_C, y_C (coordinates of the known stations)
Observations – n = 3:
   l = [d_AP d_BP d_CP]ᵀ
Unknowns – u = 2:
   x = [x_P y_P]ᵀ
Observation equation, l̂ = f(x̂):
   d̂_AP = √((x̂_P − x_A)² + (ŷ_P − y_A)²)

Expand into differential form, v = A δ + w. For the first distance:
   v_dAP = (f(x⁰) − d_AP_obs) + [∂d_AP/∂x_P  ∂d_AP/∂y_P] [δ_xP δ_yP]ᵀ
   v_dAP = (d_AP⁰ − d_AP_obs) + [∆x_AP⁰/d_AP⁰  ∆y_AP⁰/d_AP⁰] [δ_xP δ_yP]ᵀ
where, with ∆x_AP⁰ = x_P⁰ − x_A and ∆y_AP⁰ = y_P⁰ − y_A:
   ∂d_AP/∂x_P = 2(x_P⁰ − x_A) / (2 √((x_P⁰ − x_A)² + (y_P⁰ − y_A)²)) = ∆x_AP⁰ / d_AP⁰
   ∂d_AP/∂y_P = 2(y_P⁰ − y_A) / (2 √((x_P⁰ − x_A)² + (y_P⁰ − y_A)²)) = ∆y_AP⁰ / d_AP⁰
For the three distances together:

   [ v_dAP ]   [ ∆x_AP⁰/d_AP⁰  ∆y_AP⁰/d_AP⁰ ]            [ d_AP⁰ − d_AP_obs ]
   [ v_dBP ] = [ ∆x_BP⁰/d_BP⁰  ∆y_BP⁰/d_BP⁰ ] [ δ_xP ] + [ d_BP⁰ − d_BP_obs ]
   [ v_dCP ]   [ ∆x_CP⁰/d_CP⁰  ∆y_CP⁰/d_CP⁰ ] [ δ_yP ]   [ d_CP⁰ − d_CP_obs ]

   v (n×1) = A (n×u) δ (u×1) + w (n×1)
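One row of this linearised system can be evaluated numerically; the coordinates and the observed distance below are hypothetical values chosen so the geometry is easy to check by hand:

```python
import math

# One observation equation of the 2-D distance network: known station A,
# approximate (POE) coordinates for the unknown point P.
xA, yA = 0.0, 0.0
xP0, yP0 = 30.0, 40.0        # approximate values (POE)
d_obs = 50.02                # hypothetical observed distance, m

d0 = math.hypot(xP0 - xA, yP0 - yA)          # distance computed at the POE
row_A = [(xP0 - xA) / d0, (yP0 - yA) / d0]   # [dd/dxP, dd/dyP]
w = d0 - d_obs                               # misclosure for this equation
print(row_A, round(w, 2))  # [0.6, 0.8] -0.02
```

The Jacobian row is simply the unit vector from A towards P⁰, which is the expected geometric meaning of the partial derivatives.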

♦ Condition Model Example
Levelling network: the same network as in the parametric example:
   l = [h₁ h₂ h₃]ᵀ, n = 3
   x = [h_B h_C]ᵀ, u = 2
   r = n − u = 3 − 2 = 1


Note: the condition equation is a geometric or physical condition that must be satisfied by the adjusted observations l̂:
   g(l̂) = 0
   ĥ₁ − ĥ₂ − ĥ₃ = 0
Expand into differential form about l_obs, g(l̂) = g(l_obs) + (∂g/∂l) v = 0:

   B (r×n) v (n×1) + w (r×1) = 0 (r×1)

   [1 −1 −1] [v₁ v₂ v₃]ᵀ + (h₁ − h₂ − h₃) = 0

• The number of equations is fewer than in the parametric model, but the condition model is more difficult to program.
• If there is more than one condition, they must be independent (that is, you cannot obtain one condition equation from the other condition equations); finding such an independent set is one of the practical difficulties of the conditional model.
• Following the estimation of the v (n×1) vector, we can estimate the best-estimate values of the observations:
   l̂ (n×1) = l (n×1) + v (n×1)
• Since there are more observations than unknowns, there are several possibilities for the direct model equations x̂ = f(l̂); all possibilities are equivalent if the adjusted observations are used. For example:
   ĥ_B = h_A + ĥ₁
   ĥ_C = h_A + ĥ₃

   [ ĥ_B ] = [ 1  0  0 ] [ ĥ₁ ĥ₂ ĥ₃ ]ᵀ + [ h_A ]
   [ ĥ_C ]   [ 0  0  1 ]                  [ h_A ]

Another possibility is:
   ĥ_B = h_A + ĥ₁
   ĥ_C = h_A + ĥ₁ − ĥ₂

   [ ĥ_B ] = [ 1   0  0 ] [ ĥ₁ ĥ₂ ĥ₃ ]ᵀ + [ h_A ]
   [ ĥ_C ]   [ 1  −1  0 ]                  [ h_A ]

Note: this is a direct model with redundant adjusted observations.

♦ Combined Model Example: Circle Fit Problem
Observations: points (xᵢ, yᵢ), i = 1, …, 4, on the circumference of the circle
Unknowns: centre point of the circle (x_c, y_c) and radius r

   l (8×1) = [x₁ y₁ x₂ y₂ x₃ y₃ x₄ y₄]ᵀ, x (3×1) = [x_c y_c r]ᵀ
   n = 8, u = 3

Math model, f(x, l) = 0:

   (x̂ᵢ − x̂_c)² + (ŷᵢ − ŷ_c)² − r̂² = 0, i = 1, 2, 3, 4

(note that the observables and parameters are not separable)

The first design matrix and the corrections vector are:

   A (4×3) = ∂f/∂x, with rows [∂fᵢ/∂x_c  ∂fᵢ/∂y_c  ∂fᵢ/∂r], i = 1, …, 4;  δ (3×1) = [δ_xc δ_yc δ_r]ᵀ

The misclosure vector is the model evaluated at the POE:

   w (4×1) = f(x⁰, l_obs) = [ (x₁ − x_c⁰)² + (y₁ − y_c⁰)² − r⁰² ]
                            [ (x₂ − x_c⁰)² + (y₂ − y_c⁰)² − r⁰² ]
                            [ (x₃ − x_c⁰)² + (y₃ − y_c⁰)² − r⁰² ]
                            [ (x₄ − x_c⁰)² + (y₄ − y_c⁰)² − r⁰² ]

The second design matrix B (4×8) = ∂f/∂l contains, in row i, the partials ∂fᵢ/∂xᵢ and ∂fᵢ/∂yᵢ in the columns of the corresponding observations, and zeros elsewhere.

e.g., for f₁ = (x₁ − x_c)² + (y₁ − y_c)² − r² = 0:
   ∂f₁/∂x₁ = 2(x₁ − x_c⁰), ∂f₁/∂y₁ = 2(y₁ − y_c⁰)   → 1st row of the B matrix
   ∂f₁/∂x_c = −2(x₁ − x_c⁰), ∂f₁/∂y_c = −2(y₁ − y_c⁰), ∂f₁/∂r = −2r⁰   → 1st row of the A matrix

   vᵀ (1×n) = [v_x₁ v_y₁ v_x₂ v_y₂ v_x₃ v_y₃ v_x₄ v_y₄]
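Building A, B and w for one iteration of the circle fit can be sketched directly from the partials above; the four points and the slightly off-centre POE below are hypothetical values for illustration:

```python
# Combined-model matrices for the circle fit, evaluated at the POE.
pts = [(5.0, 0.0), (0.0, 5.0), (-5.0, 0.0), (0.0, -5.0)]
xc0, yc0, r0 = 0.1, 0.0, 5.0      # POE, deliberately slightly off-centre

A, B, w = [], [], []
for i, (x, y) in enumerate(pts):
    dx, dy = x - xc0, y - yc0
    A.append([-2 * dx, -2 * dy, -2 * r0])     # df/dxc, df/dyc, df/dr
    row = [0.0] * 8                            # df/dl: only x_i, y_i non-zero
    row[2 * i], row[2 * i + 1] = 2 * dx, 2 * dy
    B.append(row)
    w.append(dx**2 + dy**2 - r0**2)            # f evaluated at the POE
print(w)  # misclosures shrink to zero as the POE approaches the true circle
```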

Starting from the linearised parametric model:

   A (n×u) δ (u×1) + w (n×1) = v (n×1), with POE x⁰, and x̂ = x⁰ + δ̂


♦ As discussed before, the best estimate is the one that satisfies the least squares condition:

   φ = Σ v² = min, or Σ p v² = min
   φ = vᵀ (1×n) P (n×n) v (n×1) = min

♦ P is the weight matrix, P ∝ C_l⁻¹ (the inverse of the variance-covariance matrix of the observations).
♦ φ is called the variation function. Substituting v = A δ + w:

   φ = (A δ + w)ᵀ P (A δ + w)
     = (δᵀAᵀ + wᵀ) P (A δ + w)
     = (δᵀAᵀ + wᵀ)(P A δ + P w)
     = δᵀAᵀPAδ + δᵀAᵀPw + wᵀPAδ + wᵀPw

Note that δᵀAᵀPw = wᵀPAδ (each is a 1×1 scalar), therefore

   φ = δᵀ(AᵀPA)δ + 2(wᵀPA)δ + wᵀPw = min

♦ The condition for φ to be minimum is that its derivative with respect to all variables must be zero. The only variable in the minimisation (variation) function φ is the vector δ.
♦ Thus, for φ = min → ∂φ/∂δ = 0:

   ∂φ/∂δ = 2δᵀ(AᵀPA) + 2(wᵀPA) = 0

♦ Transpose the whole equation (note: Pᵀ = P, symmetric matrix):

   (AᵀPA) δ̂ + AᵀPw = 0
   N (u×u) δ̂ (u×1) + u (u×1) = 0

   N = AᵀPA … normal-equations matrix
   u = AᵀPw … normal equations vector (also known as the vector of constant terms)


♦ The solution of the normal equations for δ is:

   δ̂ = −N⁻¹u = −(AᵀPA)⁻¹AᵀPw
   x̂ = x⁰ + δ̂

♦ It is interesting to note here that the normal equations system can be derived from the variation function in a more direct way:

   φ = vᵀPv = min, where v = A δ + w

and therefore,

   ∂φ/∂δ = (∂φ/∂v)(∂v/∂δ) = (2vᵀP)(A) = 0

which, after transposing, yields the following:

   AᵀPv = 0
   AᵀP(Aδ + w) = 0
   (AᵀPA)δ + AᵀPw = 0
   Nδ + u = 0

♦ Solution for the residuals v:

   v̂ = Aδ̂ + w = −A(AᵀPA)⁻¹AᵀPw + w = (I − A(AᵀPA)⁻¹AᵀP) w

   l̂ (n×1) = l_obs (n×1) + v̂ (n×1)
♦ This chapter provides more detail on the least squares solution equations for the parametric method (also known as observation-equation least squares).
5.1. Estimated Parameters and Adjusted Observations

♦ Recall that the parametric math model is given by:
   l (n×1) = f (n×1)(x (u×1))
♦ Starting with the linearised functional model:
   A (n×u) δ (u×1) + w (n×1) = v (n×1), with POE x⁰
   x̂ = x⁰ + δ̂
♦ As discussed before, the best estimate is the one that satisfies the least squares condition:

   φ = Σ v² = min, or Σ p v² = min

In matrix form, this is:

   φ = vᵀ (1×n) P (n×n) v (n×1) = min

where P is the weight matrix, P ∝ C_l⁻¹ (the inverse of the variance-covariance matrix of the observations). Substituting v = A δ + w:

   φ = (A δ + w)ᵀ P (A δ + w)
     = (δᵀAᵀ + wᵀ) P (A δ + w)

Including P inside the 2nd bracket:

     = (δᵀAᵀ + wᵀ)(P A δ + P w)
     = δᵀAᵀPAδ + δᵀAᵀPw + wᵀPAδ + wᵀPw

Note that δᵀAᵀPw = wᵀPAδ = a scalar quantity, therefore:

   δᵀAᵀPw + wᵀPAδ = 2(wᵀPAδ) = 2(δᵀAᵀPw)

   φ = δᵀ(AᵀPA)δ + 2(wᵀPA)δ + wᵀPw = min

♦ The condition for φ to be a minimum quantity means its derivative with respect to all variables within the equation must be zero. The only variable in the minimisation (variation) function φ is the vector δ.
♦ Therefore, for φ = min → ∂φ/∂δ = 0 (recall from Lab 1: ∂(XᵀCX)/∂X = 2XᵀC):

   ∂φ/∂δ = 2δᵀ(AᵀPA) + 2(wᵀPA) = 0

♦ Transpose the whole equation (note: because P is a symmetric matrix, Pᵀ = P):

   (AᵀPA) δ̂ + AᵀPw = 0
   N (u×u) δ̂ (u×1) + u (u×1) = 0

♦ This expression is known as the normal equations.
   u = normal equations vector (also known as the vector of constant terms)

♦ The solution of the normal equations for δ is:

   δ̂ = −N⁻¹u = −(AᵀPA)⁻¹AᵀPw
   x̂ = x⁰ + δ̂

♦ It is interesting to note here that the normal equations system can be derived from the variation function in a more direct way:

   φ = vᵀPv = min, where v = A δ + w

and therefore,

   ∂φ/∂δ = (∂φ/∂v)(∂v/∂δ) = (2vᵀP)(A) = 0

which, after transposing, yields the following:

   AᵀPv = 0
   AᵀP(Aδ + w) = 0
   (AᵀPA)δ + AᵀPw = 0
   Nδ + u = 0

♦ Solution for the residuals vector (the corrections to the observations):

   v̂ = Aδ̂ + w = −A(AᵀPA)⁻¹AᵀPw + w = (I − A(AᵀPA)⁻¹AᵀP) w

♦ The adjusted observations vector:

   l̂ (n×1) = l_obs (n×1) + v̂ (n×1)
5.2. Estimated Variance-Covariance Matrices for the Adjusted Quantities

♦ Since least squares only estimates the values of x̂ (x̂ = x⁰ + δ̂) and l̂ (l̂ = l + v̂), it is always necessary to compute a measure of precision for these estimated quantities (C_x̂ and C_l̂).
♦ The only general way to compute C_x̂ and C_l̂ is through the use of the covariance law.

5.2.1. C_δ – The V-C matrix of the solution vector δ

Functional model:

   δ̂ = −N⁻¹u = −(AᵀPA)⁻¹AᵀPw = −(AᵀPA)⁻¹AᵀP (f(x⁰) − l)
     = −(AᵀPA)⁻¹AᵀP f(x⁰) + (AᵀPA)⁻¹AᵀP l = K₂ + K₁ l

where K₂ = −(AᵀPA)⁻¹AᵀP f(x⁰) and K₁ = N⁻¹AᵀP are constant, and l is the (random) variable.
Using the covariance law:

   C_δ = K₁ C_l K₁ᵀ

Remember that

   P = σ₀² C_l⁻¹, i.e. C_l = σ₀² P⁻¹

where σ₀² is called the variance factor (or variance of unit weight, or a priori variance factor); it is usually chosen = 1.
Substituting for K₁ and C_l = σ₀² P⁻¹:

   C_δ = (N⁻¹AᵀP) σ₀²P⁻¹ (N⁻¹AᵀP)ᵀ
       = σ₀² N⁻¹Aᵀ (P P⁻¹) (N⁻¹AᵀP)ᵀ
       = σ₀² N⁻¹Aᵀ (P A N⁻¹)        [(N⁻¹AᵀP)ᵀ = P A N⁻¹, since P and N are symmetric]
       = σ₀² N⁻¹ (AᵀPA) N⁻¹
       = σ₀² N⁻¹ N N⁻¹

That is, the variance-covariance matrix of the solution vector δ is

   C_δ = σ₀² N⁻¹

5.2.2. C_x̂ – The V-C matrix of the adjusted parameters

   x̂ = x⁰ + δ̂, with x⁰ constant

   C_x̂ = J C_δ Jᵀ = (∂x̂/∂δ̂) C_δ (∂x̂/∂δ̂)ᵀ = C_δ

   ∴ C_x̂ = C_δ = σ₀² N⁻¹ (= N⁻¹ if σ₀² = 1). If σ₀² is replaced by its estimate, then C_x̂ = σ̂₀² N⁻¹.

Note: we have introduced a new quantity σ̂₀², called the a posteriori variance factor, where
   σ̂₀² = vᵀPv / d.o.f. (degrees of freedom d.o.f. = n − u)
5.2.3. C_l̂ – The V-C matrix of the adjusted observations

Functional model: l̂ = f(x̂) (for a linear model, l̂ = A x̂)
To get C_l̂, apply the covariance law:

   C_l̂ = (∂l̂/∂x̂) C_x̂ (∂l̂/∂x̂)ᵀ = J C_x̂ Jᵀ
   C_l̂ (n×n) = A (n×u) C_x̂ (u×u) Aᵀ (u×n)

We can also derive an expression for the variance-covariance matrix of v (the residuals vector):

   C_v = C_l − C_l̂

5.3. Iterative Solution to the Parametric Model

♦ Steps:
1. Identify the elements of x (u×1) and l (n×1)
2. Form the observation equations l = f(x)
3. Find approximate values x⁰ (using u observation equations) and use them to evaluate the linearised equations:
   v (n×1) = A (n×u) δ (u×1) + w (n×1)
   where A = ∂f/∂x|x=x⁰ and w (n×1) = f (n×1)(x⁰) − l (n×1)
4. Establish the C_l matrix (check the units)
5. Establish the weight matrix P = σ₀² C_l⁻¹ (choose the variance factor to simplify the computations; usually σ₀² can be assumed = 1)
   Note: if C_l is a diagonal matrix, its inverse is the diagonal matrix of reciprocals, e.g.
   C_l = diag(1, 4, 6, 9) ⇒ C_l⁻¹ = diag(1, 1/4, 1/6, 1/9)
6. Form N = AᵀPA and u = AᵀPw, and solve for δ̂₁ = −N⁻¹u
7. Update the approximate values:
   x̂₁ = x₁⁰ + δ̂₁
   Note: if the model is linear, go to step 11
8. Check the magnitude of each element of δ̂₁ for significance
9. If δ̂₁ has significant elements (to be discussed later), choose the new POE x₂⁰ = x̂₁:
   • Calculate A₂, w₂ with x₂⁰ and l
   • Solve for δ̂₂ = −N₂⁻¹u₂ (step 6)
   • Update the approximate values (step 7)
   • Check the elements of δ̂₂ (step 8)
10. Repeat step 9 until there are no significant elements in δ̂
11. Calculate C_x̂, C_l̂, C_v and σ̂₀² = vᵀPv / d.o.f.
5.4. Numerical Example: Parametric Least Squares

♦ The sketch shows a levelling network abcd, in which point (a) is assumed to be fixed with zero elevation (H_a = 0.0).
Observations (with the corresponding section lengths):

   l (6×1) = [h₁ h₂ h₃ h₄ h₅ h₆]ᵀ = [6.16 12.57 6.41 1.09 11.58 5.07]ᵀ m
   section lengths = [4 2 2 4 2 4] km

Parameters (unknowns):

   x (3×1) = [h_b h_c h_d]ᵀ

♦ Required: given that the variance of the elevation differences is 1 cm² per km, estimate the elevations of points b, c, and d and their variance-covariance matrix C_x̂.
Note: n = 6, u = 3
♦ Solution
1. Observation equations, l̂ = f(x̂):
   ĥ₁ = ĥ_c − h_a
   ĥ₂ = ĥ_d − h_a
   ĥ₃ = −ĥ_c + ĥ_d
   ĥ₄ = ĥ_b − h_a
   ĥ₅ = −ĥ_b + ĥ_d
   ĥ₆ = −ĥ_b + ĥ_c
2. Estimate approximate values (POE) of the parameters x⁰ using any 3 of the 6 observation equations. Using equations 4, 1, and 2 (use easy equations):
   h_b⁰ = h_a + h₄ = 0.0 + 1.09 = 1.09 m
   h_c⁰ = h_a + h₁ = 0.0 + 6.16 = 6.16 m
   h_d⁰ = h_a + h₂ = 0.0 + 12.57 = 12.57 m
Hence
   x⁰ (3×1) = [1.09 6.16 12.57]ᵀ m

3. Calculate the misclosure vector – w = f(xo) − l
   (Remember that our model l = f(x) will not provide a unique value for x because l contains random errors; this is the main reason why the vector w is not zero. We use an approximate value for x, which we call xo; different xo will lead to different w, but we should end up with the same solution.)

 0 
 w1   h0c − ha   h   0.0 
1
w   h − h  h   0.0 
 2   d0 a
  2  
 w3   − hc + hd   h3   0.0 
 = 0 −  = m
 w4   hb − ha  h4   0.0 
 w5   0 0   h5  − 0.1
  − hb + hd     
 w6   0 0  h6   0.0 
− hb + hc  {
1 l
44244 3
( )
f x0

Note: Equations 4, 1, and 2 will, by construction, have zero elements in the w vector.
4. Write down the linearised equations – v 6,1 = A 6,3 δ 3,1 + w 6,1

0 1 0
0 0 0

 0 −1 1
A 6,3 =  unitless
1 0 0
− 1 0 1
 
− 1 1 0

Note: The units of v, w, and δ are all in meters.

5. Construct the Cl matrix – in a levelling network the variance of an observed elevation difference is proportional to the length of the corresponding section, and the observations are usually uncorrelated.
   ∴ σh² ∝ L; in our example σhi² = 1 cm²/km × Li

   Cl 6,6 = diag[σh1² σh2² σh3² σh4² σh5² σh6²]
   Cl = diag[4 2 2 4 2 4] cm²

   Scale Cl to (meter)² to be consistent with v, w, and δ:
   Cl = 10⁻⁴ diag[4 2 2 4 2 4] m²
   Recalling that P ∝ Cl⁻¹:  P = σ0² Cl⁻¹

   P = σ0² · 10⁴ · diag[1/4 1/2 1/2 1/4 1/2 1/4] 1/m²

   If we assume σ0² = 10⁻⁴:

   ∴ P = diag[1/4 1/2 1/2 1/4 1/2 1/4] 1/m²

   Note: we chose σ0² to simplify the computations.

6. Solve for the estimated parameters
   δ̂ 3,1 = −N⁻¹ 3,3 u 3,1

                              1.0  −0.25  −0.5
   N 3,3 = Aᵀ3,6 P6,6 A6,3 = −0.25   1.0  −0.5     (Note: N is always symmetric)
                             −0.5   −0.5   1.5

          1.6  0.8  0.8                        0.05
   N⁻¹ =  0.8  1.6  0.8      u 3,1 = AᵀPw =    0.00
          0.8  0.8  1.2                       −0.05

                   −0.04                       1.09    −0.04    1.05
   ∴ δ̂ = −N⁻¹u =    0.00  m  →  x̂ = xo + δ̂ = 6.16  +  0.00  = 6.16  m
                    0.02                      12.57     0.02   12.59

   Adjusted parameters: x̂ = [1.05 6.16 12.59]ᵀ m
7. Adjusted observations – l̂ = l + v̂

   v̂ = Aδ̂ + w = [0.00 0.02 0.02 −0.04 −0.04 0.04]ᵀ m

   l̂ = l + v̂ = [6.16 12.57 6.41 1.09 11.58 5.07]ᵀ + [0.00 0.02 0.02 −0.04 −0.04 0.04]ᵀ
             = [6.16 12.59 6.43 1.05 11.54 5.11]ᵀ m

Note: If we use any 3 equations of the 6 observation equations and we use l̂ instead of l, we should estimate the same x̂.
8. Now, we estimate some quantities which express the accuracy of x̂ and l̂

   σ̂0² = a posteriori variance factor = vᵀ1,6 P6,6 v6,1 / (n − u) = 0.002 / (6 − 3) = 6.7×10⁻⁴
   (n − u = degrees of freedom)

                             10.67  5.33  5.33
   Cx̂ 3,3 = σ̂0² N⁻¹ = 10⁻⁴ · 5.33 10.67  5.33    m²
                              5.33  5.33  8.00

   σhb = √(10⁻⁴ × 10.67) = 3.27 cm
   σhc = √(10⁻⁴ × 10.67) = 3.27 cm
   σhd = √(10⁻⁴ × 8.00)  = 2.83 cm

Finally (if needed), the v-c matrix of l̂ is computed:

                             10.67  5.33 −5.33  5.33  0.00  5.33
                                    8.00  2.67  5.33  2.67  0.00
   Cl̂ = A Cx̂ Aᵀ = 10⁻⁴ ·                 8.00  0.00  2.67 −5.33    m²
                          symmetrical    10.67 −5.33 −5.33
                                                8.00  5.33
                                                     10.67

0.0
Homework: Try to solve this problem with x = 0.0 , you should end up
o

0.0
with the same results. In general, for any linear model, you can always
assume x o = 0
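As a check, the whole example can be reproduced numerically. A minimal sketch in Python (assuming NumPy is available), following steps 2–8 above:

```python
import numpy as np

# Levelling network of Section 5.4: x = [hb, hc, hd], point a fixed (Ha = 0).
A = np.array([[0, 1, 0], [0, 0, 1], [0, -1, 1],
              [1, 0, 0], [-1, 0, 1], [-1, 1, 0]], float)
l = np.array([6.16, 12.57, 6.41, 1.09, 11.58, 5.07])      # m
Cl = 1e-4 * np.diag([4.0, 2, 2, 4, 2, 4])                 # m^2 (1 cm^2/km)
P = 1e-4 * np.linalg.inv(Cl)                              # sigma0^2 = 1e-4

x0 = np.array([1.09, 6.16, 12.57])       # POE from equations 4, 1, 2
w = A @ x0 - l                           # misclosure (the model is linear)
N = A.T @ P @ A
u = A.T @ P @ w
delta = -np.linalg.solve(N, u)           # corrections [-0.04, 0.00, 0.02]
x_hat = x0 + delta                       # adjusted elevations of b, c, d

v = A @ delta + w                        # residuals
l_hat = l + v                            # adjusted observations
s02 = (v @ P @ v) / (6 - 3)              # a posteriori variance factor
Cx = s02 * np.linalg.inv(N)              # v-c matrix of the parameters
Cl_hat = A @ Cx @ A.T                    # v-c matrix of l_hat
print(x_hat, np.sqrt(np.diag(Cx)) * 100) # elevations (m) and sigmas (cm)
```

The printed standard deviations agree with the 3.27 cm, 3.27 cm, and 2.83 cm found by hand above.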
6. CONDITIONAL MODEL
♦ The general form of the conditional mathematical model was given before as:
   f r,1 (l̂ n,1) = 0 r,1   where l̂ = l + v̂
   l̂ is the vector of adjusted observations
   v̂ is the vector of estimated residuals
   r (degrees of freedom) = n − u
♦ Two basic properties must be satisfied for the conditional model:
1) Number of equations = Number of degrees of freedoms. This means that each
redundant observation provides one independent condition equation.
2) The equations describe the functional relationship among the observations only. This obviously indicates that the unknown parameters x will not be estimated directly. If required, the parameters x̂ and their covariance matrix Cx̂ have to be computed after the adjustment using the direct model (x̂ = g(l̂)) and the law of propagation of variances (this is a disadvantage when comparing the conditional with the parametric model).
6.1. Examples of Surveying Problems using the Conditional Math Model:
1. Levelling Networks – Two types of conditions:
   i) Levelling line with fixed end-points (figure: line from A through C and D to B; HA and HB fixed, h1, h2, h3 observed)

      xᵀ1,2 = [HC HD]   lᵀ1,3 = [h1 h2 h3]   r = 1

      h1 + h2 + h3 + (HA − HB) = 0

   ii) Closed loop (figure: loop through A, B, C with h1, h2, h3 observed)

      xᵀ1,2 = [HB HC]   lᵀ1,3 = [h1 h2 h3]   r = 1

      h1 − h2 − h3 = 0
♦ How to choose the condition equations:
i) Make sure that each observation appears at least once in the condition
equations.
ii) Ensure that the equations are linearly independent. If you ensure that the number of equations = r, this by default will give you linearly independent equations.

♦ Example: (figure: level network between fixed points a and b, with observations h1 … h5)
   u = 3, n = 5, r = 2

   row(1):  h1 − h2 + h3 + (Ha − Hb) = 0
   row(2): −h2 + h4 + h5 = 0
   row(3):  h1 + h3 − h4 − h5 + (Ha − Hb) = 0

   Note: row(3) = row(1) − row(2)
   (i.e. row(3) is linearly dependent on row(1) and row(2))
2. Traverse Networks
   i) Traverse connecting two fixed points a and b (figure: angles α1, α2, α3 and distances d1, d2, d3 observed along the traverse; points 1 and 2 unknown)

      x = [x1 y1 x2 y2]ᵀ     l = [α1 α2 α3 d1 d2 d3]ᵀ

      n = 6, u = 4
      r = 6 − 4 = 2

      Σ(i=1..3) ∆xi − (xb − xa) = 0    where ∆xi = di sin αi
      Σ(i=1..3) ∆yi − (yb − ya) = 0          ∆yi = di cos αi

   ii) Closed loop traverse (figure: angles α1 … α4 and distances d1 … d4)

      n = 8, u = 6
      r = 8 − 6 = 2

      Σ ∆xi = 0
      Σ ∆yi = 0

   iii) Closed traverse with observed angles – angle condition (figure: interior angles θ1 … θ4)

      Σ(i=1..S) θi − K = 0
      where K = (2S − 4) · 90° and S = no. of stations
      e.g. if S = 4 → K = 360°
      e.g. Triangle: θ1 + θ2 + θ3 − 180° = 0
6.2. Linearised (or reduced) form of the condition equation:

## ♦ As stated before, the least squares adjustment requires a linear model.

♦ As in the parametric case, the Taylor expansion is used.
♦ The point of expansion (POE) here is defined by the values of the observation
vector.
♦ The linearised functional model
   f r,1 (l n,1) = 0 r,1       POE: l obs

   f(l obs) + (∂f/∂l) v = 0    – r equations in n unknowns (r < n)

   w r,1 + B r,n v n,1 = 0 r,1
    where B = ∂f/∂l and w = f(l)
♦ Stochastic model
Cl = P-1
n,n n,n

♦ Notes:
 After the adjustment, the misclosure vector (w) must become zero, i.e. f(l̂) = 0.
 Both the matrix B and the vector w are numerically known; the only unknown here is the residual vector (v).
♦ Variation function φ in matrix form:
φ = vT P v = min.
♦ In the parametric model derivation, an explicit substitution was made for v. This
cannot be done here since v is not isolated in the functional model.
♦ Could we do the following in order to come up with an expression for (v)?
1. Starting with B v̂ + w = 0
 pre-multiply by B-1

B-1 B v̂ = - B-1 w

I v̂ = - B-1 w
 This is not possible since B is not a square matrix (r < n always)

2. Starting with B v̂ + w = 0
 pre-multiply by BT
BT B v̂ = - BT w

v̂ = - (BT B) -1 BT w
 To examine the validity of this possibility, look at the rank and dimension
of B and BT B.
 Provided that all the condition equations are independent, rank (B) = r (its
smallest dimension).
 Since rank (BT B) = rank (B) = r < n and dimension of (BT B) = n, n
 Therefore, BT B is singular (i.e., (BT B) -1 does not exist)

♦ The way to minimise φ is to use the Lagrange multiplier technique.

♦ Lagrange multipliers allow the minimisation of φ = v̂ T P v̂ subject to the
condition B v̂ + w = 0.
♦ The variation function is:

   φ = v̂ᵀ1,n P n,n v̂ n,1 + 2 k̂ᵀ1,r (B r,n v̂ n,1 + w r,1) = min

    Where k is the vector of Lagrange multipliers (one multiplier per condition).
    (The factor 2 is introduced for convenience only.)
    Note: the second term is equal to zero (B v̂ + w = 0) and therefore does not change the value of φ.

♦ To minimise φ, differentiate with respect to v and k:

   φ = vᵀPv + 2kᵀ(Bv + w) = min
   φ = vᵀPv + 2kᵀBv + 2kᵀw = min

   ∂φ/∂v = 2v̂ᵀP + 2k̂ᵀB = 0
   ∂φ/∂k = 2v̂ᵀBᵀ + 2wᵀ = 0

♦ Transpose each and divide by 2:

   P n,n v̂ n,1 + Bᵀn,r k̂ r,1 = 0  – n equations in (n + r) unknowns
   B r,n v̂ n,1 + w r,1 = 0        – r equations in the (n) unknowns v̂

## ♦ Express in hyper-matrix form

 P B T   vˆ   0   0 
 
 B 0   kˆ  +  w  =  0 
      
♦ Solution possibilities to the hyper-matrix system:
1. Invert hyper-matrix
−1
 vˆ   P BT   0 
  = −   

ˆ
k  B 0  w
 This solution requires inversion of a large matrix

2. Elimination
 For the hyper-matrix system

 A B   x   u   0
    +   =  
 C D  y   v   0

Ax+By+u=0 (i)

Cx+Dy+v=0 (ii)
 Eliminate x from (i)
x = - A-1 (B y + u) (provided A-1 exists!) (iii)
 Now substitute (iii) into (ii)
- C A-1 (B y + u) + D y + v = 0

- C A-1 B y - C A-1 u + D y + v = 0

(D - C A-1 B) y + (v - C A-1 u) = 0
 In this case
A = P, B = BT, C = B, D = 0, x = v̂ , y = k, u = 0, v = w
 substitute into
(D - C A-1 B) y + (v - C A-1 u) = 0

(0 - B P-1 BT) k + w = 0

(B P-1 BT) k = w
∴ k = (B P-1 BT)-1 w = (B Cl BT)-1 w
Note: Since the weight matrix P enters the computation of the k and v vectors only through Cl, the choice of σo² does not affect them; it is usual to use σo² = 1.

rank(B) = r,  rank(P) = n > r  ⇒  rank(B P⁻¹ Bᵀ) = r, so (B P⁻¹ Bᵀ)⁻¹ exists

 To solve for v̂ , substitute k into

P v̂ + Bᵀ k̂ = 0 (first group of the hyper-matrix)

v̂ = - P-1BT k = - Cl BT (B Cl BT)-1 w

ˆl = l + vˆ
♦ Estimated variance factor
   σ̂0² = vᵀPv / r

♦ Covariance matrix for the adjusted residuals – Cv̂

 Functional model

vˆ = − C l B T (BC l B T ) −1 w
= − C l B T (BC l B T ) −1 f(l) = − Kf(l)
 Apply the law of Variance-covariance propagation
T
 ∂vˆ   ∂vˆ 
C vˆ =  C l  
 ∂l   ∂l 
∂vˆ ∂ ( Kf ( l )) ∂( f ( l ))
=− = −K = − KB = −C l B T (BC l B T ) −1 B
∂l ∂l ∂l

   Cv̂ = [ClBᵀ(BClBᵀ)⁻¹B] Cl [ClBᵀ(BClBᵀ)⁻¹B]ᵀ
       = ClBᵀ(BClBᵀ)⁻¹ BClBᵀ (BClBᵀ)⁻¹BCl

   ∴ Cv̂ = ClBᵀ(BClBᵀ)⁻¹BCl
♦ Covariance matrix for the Adjusted Observations – C ˆl
 Functional model

ˆl = l + vˆ
= l − C l B T (BC l B T ) −1 w
= l − C l B T (BC l B T ) −1 f(l)
 Covariance propagation
T
 ∂ˆl   ∂ˆl 
C ˆl =  C l  
 ∂l   ∂l 

∂ˆl
= I − Cl B T (BCl B T ) −1 B
∂l

   Cl̂ = [I − ClBᵀ(BClBᵀ)⁻¹B] Cl [I − ClBᵀ(BClBᵀ)⁻¹B]ᵀ
       = Cl − ClBᵀ(BClBᵀ)⁻¹BCl − ClBᵀ(BClBᵀ)⁻¹BCl + ClBᵀ(BClBᵀ)⁻¹BClBᵀ(BClBᵀ)⁻¹BCl

   ∴ Cl̂ = Cl − ClBᵀ(BClBᵀ)⁻¹BCl

   ∴ Cl̂ = Cl − Cv̂

## 6.3. Iterative solution of the condition model

♦ Linearised model
Bv+w=0
♦ Initial Computation

   B(1) = ∂f/∂l |l obs
   w(1) = f(l obs)
♦ Iteration (1):
 Solve for v̂ (1) ,

 Correct observations l̂(1) = l obs + v̂(1)

♦ Iteration (2): needed if the model is non-linear
   B(2) = ∂f/∂l |l̂(1) ;  in general:

   B(i+1) = ∂f/∂l |l̂(i)

   w(i+1) = f(l̂(i)) + B(i+1) (l obs − l̂(i))

 Solve for v̂(2), where in general
   v̂(i+1) = −Cl B(i+1)ᵀ (B(i+1) Cl B(i+1)ᵀ)⁻¹ w(i+1)
 Correct observations l̂(2) = l + v̂(2)
♦ Iterate until the corrections become insignificant (l̂(i+1) ≈ l̂(i))
♦ At each iteration calculate
   l̂(i+1) = l obs + v̂(i+1)

6.4. Direct model solution

♦ The condition model adjustment will give the residual vector (v) and the adjusted
observations along with their respective covariance matrices.
♦ The direct model, then, uses the adjusted observations from the condition model
solution to calculate the parameters (if any)
 Functional model

xˆ = g(ˆl)
 variance propagation
   Cx̂ = (∂x̂/∂l̂) Cl̂ (∂x̂/∂l̂)ᵀ
(see the next example)

♦ Consider the case of station adjustment in a triangulation network, in which the six angles between the four directions A, B, C, D are observed at the station: α1 = ∠(A,B), α2 = ∠(A,C), α3 = ∠(A,D), α4 = ∠(B,C), α5 = ∠(B,D), α6 = ∠(C,D) (see sketch).

   xᵀ1,3 = [α1 α4 α6]      lᵀ1,6 = [α1 α2 α3 α4 α5 α6]

   n = 6, u = 3
   r = 6 − 3 = 3
   Angle   Observed value
   α1       89° 59′ 58.3″
   α2      180° 00′ 01.4″
   α3      270° 00′ 00.2″
   α4       89° 59′ 59.8″
   α5      179° 59′ 57.0″
   α6       90° 00′ 03.1″

   All six angles have the same standard deviation σ = 1″, that is Cl = I arcsec².

♦ Solution
1. The three independent condition equations – f(ˆl) = 0
(α̂1 + α̂ 4 ) − α̂ 2 = 0

(α̂1 + α̂ 5 ) − α̂ 3 = 0

(α̂ 2 + α̂ 6 ) − α̂ 3 = 0
Note: we wrote the condition equations in terms of the adjusted
observations.
2. The linearised condition equations – B 3,6 v 6,1 + w 3,1 = 0

The B matrix is:

                       1 −1  0  1  0  0
   B 3,6 = ∂f(l)/∂l =  1  0 −1  0  1  0     unitless
                       0  1 −1  0  0  1

Recall: Since the observations are of the same physical quantities as the unknowns, the B matrix is unitless. That is, the problem is linear and therefore only one iteration is required.

The misclosure vector w 3,1 is computed as:

                    (α1 + α4) − α2     −3.3
   w 3,1 = f(l) =   (α1 + α5) − α3  =  −4.9    arcsec
                    (α2 + α6) − α3      4.3

   Note that v and w will have the same units.

3. Covariance matrix of the observations – Cl
Since all angles are of equal precision and uncorrelated, Cl will be a
unit matrix, i.e.: Cl = I and P = σ 02 C l−1 = I

σ0² = a priori variance factor, could be assumed = 1.

4. The least-squares estimated values of the vector of residuals – v̂

   v̂ = −Cl Bᵀ (B Cl Bᵀ)⁻¹ w
   (units: Cl [arcsec²] · Bᵀ [unitless] · (BClBᵀ)⁻¹ [1/arcsec²] · w [arcsec] → v̂ in arcsec)

                        3  1 −1
   B Cl Bᵀ = B Bᵀ =     1  3  1      (B Cl Bᵀ should be symmetric)
                       −1  1  3

                  0.65
   (BBᵀ)⁻¹ w =   −2.70    arcsec
                  2.55

   ∴ v̂ = −Cl Bᵀ (B Cl Bᵀ)⁻¹ w = [2.05 −1.90 −0.15 −0.65 2.70 −2.55]ᵀ arcsec
5. The adjusted observations – l̂ = l + v̂

         α̂1      90° 00′ 00.35″
         α̂2     179° 59′ 59.50″
   l̂ =   α̂3  =  270° 00′ 00.05″
         α̂4      89° 59′ 59.15″
         α̂5     179° 59′ 59.70″
         α̂6      90° 00′ 00.55″

   check: f(l̂) = 0
6. The variance-covariance matrix of the residual vector – C vˆ
Cv = Cl BT (B Cl BT)-1 B Cl
Remember: These matrices have been computed before in the computation
of v̂
7. The variance-covariance matrix of the adjusted observations – C ˆl

C ˆl = C l − C vˆ

8. The unknown parameters x̂ and their variance-covariance matrix Cx̂

Since the unknown parameters are directly observed (i.e., they are
among the l vector), we can simply extract the C xˆ from the C ˆl matrix.
But if you want a general way of doing that:

   x̂ = g(l̂)

         α̂1      1 0 0 0 0 0
   x̂ =   α̂4  =   0 0 0 1 0 0   · [α̂1 α̂2 α̂3 α̂4 α̂5 α̂6]ᵀ = J l̂
         α̂6      0 0 0 0 0 1

   Cx̂ = J Cl̂ Jᵀ
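The station-adjustment numbers above can be verified with a short script; a minimal NumPy sketch of the conditional solution, working in arcseconds:

```python
import numpy as np

# Station adjustment: 3 condition equations among 6 observed angles.
B = np.array([[1, -1, 0, 1, 0, 0],
              [1, 0, -1, 0, 1, 0],
              [0, 1, -1, 0, 0, 1]], float)
w = np.array([-3.3, -4.9, 4.3])          # misclosures f(l) in arcsec
Cl = np.eye(6)                           # sigma = 1" for every angle

M = B @ Cl @ B.T                         # = B B^T here
k = np.linalg.solve(M, w)                # (B Cl B^T)^-1 w
v = -Cl @ B.T @ k                        # residuals in arcsec (step 4)
Cv = Cl @ B.T @ np.linalg.inv(M) @ B @ Cl
Cl_hat = Cl - Cv                         # v-c matrix of adjusted observations

# Extract C_x for x = [a1, a4, a6] with the J matrix of step 8.
J = np.array([[1, 0, 0, 0, 0, 0],
              [0, 0, 0, 1, 0, 0],
              [0, 0, 0, 0, 0, 1]], float)
Cx = J @ Cl_hat @ J.T
print(v)                                 # residuals of step 4
```

The residuals reproduce the hand computation, and B v + w = 0 confirms that the conditions are satisfied after the adjustment.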
6.6. Summary of Parametric and Conditional Least Squares

Mathematical model
   Parametric: l̂ n,1 = f(x̂ u,1) – no ambiguity, since each observation provides an equation.
   Conditional: f r,1(l̂ n,1) = 0, r = n − u – some ambiguity may occur during the stage of selecting the condition equations.
   Remark: r < n, so the conditional model is preferable.

Design matrices and linearisation
   Parametric: v n,1 = A n,u δ u,1 + w n,1, with A = ∂f(x)/∂x |x=x0 and w = f(xo) − l – xo is required; more computations.
   Conditional: B r,n v n,1 + w r,1 = 0, with B = ∂f(l)/∂l |l – no approximate values are needed; w = f(l) is a direct calculation.
   Remark: (B, w) takes less effort than (A, w); thus the conditional model is preferable.

Size of inversion
   Parametric: δ = −(AᵀPA)⁻¹u,u AᵀPw – size of inversion is (u×u).
   Conditional: v = −ClBᵀ(BClBᵀ)⁻¹r,r w – size of inversion is (r×r).
   Remark: if u < r the parametric model is preferable; if r < u the conditional model is preferable.

Final results
   Parametric: x̂ = xo + δ̂, l̂ = l + v̂, Cx̂ = σ̂0²(AᵀPA)⁻¹, Cl̂ = A Cx̂ Aᵀ – no additional computations are needed.
   Conditional: l̂ = l + v̂, Cl̂ = Cl − ClBᵀ(BClBᵀ)⁻¹BCl, with σ̂0² = vᵀPv/r; if parameters are required, x̂ = g(l̂) and Cx̂ = J Cl̂ Jᵀ – more computations.
   Remark: the conditional results are incomplete (the parameters still have to be derived).
7. COMBINED (IMPLICIT) MODEL
♦ Linearised functional model (m equations)

   A m,u δ u,1 + B m,n v n,1 + w m,1 = 0 m,1   (u + n unknowns in m equations)

 With stochastic model Cl (n,n)

♦ The variation function is
   φ = v̂ᵀPv̂ + 2k̂ᵀ(Aδ̂ + Bv̂ + w) = min
♦ To minimise the variation function, differentiate with respect to v̂, δ̂, k̂ and set equal to zero:

   ∂φ/∂v̂ = 2v̂ᵀP + 2k̂ᵀB = 0
   ∂φ/∂δ̂ = 2k̂ᵀA = 0
   ∂φ/∂k̂ = 2δ̂ᵀAᵀ + 2v̂ᵀBᵀ + 2wᵀ = 0

Dividing by 2 and transposing:

   Pv̂ + Bᵀk̂ = 0        ①
   Aᵀk̂ = 0             ②
   Aδ̂ + Bv̂ + w = 0
♦ The above equation can be written in hyper-matrix notation with the following
conditions:
1) The upper left matrix of the hyper-matrix must be invertible (the P matrix
is invertible)
2) The hyper-matrix should be symmetric (arrange the equations to achieve
this condition)

 P BT 0   vˆ   0   0 
      
B 0 A   kˆ  +  w  =  0 
 0 AT 0   δˆ   0   0 

♦ Partition into [A B; C D][x; y] + [u; v] = [0; 0] and eliminate x (v̂ in this case) using
   (D − C A⁻¹ B) y + (v − C A⁻¹ u) = 0
 Here A = P, B = (Bᵀ 0), C = (B; 0), D = [0 A; Aᵀ 0], x = v̂, y = (k̂; δ̂), u = 0, v = (w; 0):

   (B; 0) P⁻¹ (Bᵀ 0) = [BP⁻¹Bᵀ  0; 0  0]

   ( [ 0  A; Aᵀ  0 ] − [ BP⁻¹Bᵀ  0; 0  0 ] ) (k̂; δ̂) + (w; 0) = (0; 0)

   −BP⁻¹Bᵀ   A     k̂      w     0
                        +    =            ③
      Aᵀ     0     δ̂      0     0

♦ Partition this hyper-matrix to eliminate k̂:
   [0 − Aᵀ(−BP⁻¹Bᵀ)⁻¹A] δ̂ + [0 − Aᵀ(−BP⁻¹Bᵀ)⁻¹w] = 0

   ∴ Aᵀ(BP⁻¹Bᵀ)⁻¹A δ̂ + Aᵀ(BP⁻¹Bᵀ)⁻¹w = 0
   ∴ N δ̂ + u = 0
   δ̂ = −N⁻¹u,  where N = Aᵀ(BP⁻¹Bᵀ)⁻¹A and u = Aᵀ(BP⁻¹Bᵀ)⁻¹w;  recall P⁻¹ = Cl
 Note: if B = −I (i.e. the parametric model)
   N = Aᵀ[(−I)P⁻¹(−I)ᵀ]⁻¹A = Aᵀ(P⁻¹)⁻¹A = AᵀPA
   u = Aᵀ[(−I)P⁻¹(−I)ᵀ]⁻¹w = Aᵀ(P⁻¹)⁻¹w = AᵀPw
 which are the same results as for the parametric model.
♦ substitute δ̂ into the first group of equation ③ to estimate k̂
(-B P-1 BT) k̂ + A δ̂ + w = 0
k̂ = (B P-1 BT) -1 (A δ̂ + w) = (B CL BT) -1 (A δ̂ + w)
k̂ = M-1 (A δ̂ + w) ………….… where M = B CL BT
♦ substitute k̂ into the first equation of the hyper-matrix (P v̂ + Bᵀ k̂ = 0) to estimate v̂
   v̂ = −P⁻¹Bᵀk̂ = −P⁻¹Bᵀ(BP⁻¹Bᵀ)⁻¹(Aδ̂ + w)
 That is
   v̂ = −ClBᵀ(BClBᵀ)⁻¹(Aδ̂ + w) = −ClBᵀk̂
♦ Therefore, the adjusted quantities are:

xˆ = x 0 + δˆ

ˆl = l + vˆ

## ♦ The covariance matrices are:

1. Cδ̂
    Functional model
   δ̂ = −[Aᵀ(BClBᵀ)⁻¹A]⁻¹ Aᵀ(BClBᵀ)⁻¹ f(l, x0)    (x0 is not stochastic)

   Cδ̂ = (∂δ̂/∂l) Cl (∂δ̂/∂l)ᵀ
 where
   ∂δ̂/∂l = −[Aᵀ(BClBᵀ)⁻¹A]⁻¹ Aᵀ(BClBᵀ)⁻¹ ∂f(l, x0)/∂l = −N⁻¹Aᵀ(BClBᵀ)⁻¹B
    Therefore:
   Cδ̂ = {N⁻¹Aᵀ(BClBᵀ)⁻¹B} Cl {Bᵀ(BClBᵀ)⁻¹AN⁻¹}
       = N⁻¹Aᵀ(BClBᵀ)⁻¹ BClBᵀ (BClBᵀ)⁻¹AN⁻¹
       = N⁻¹Aᵀ(BClBᵀ)⁻¹AN⁻¹
       = N⁻¹NN⁻¹ = N⁻¹
       = [Aᵀ(BClBᵀ)⁻¹A]⁻¹

   Cδ̂ = [AᵀM⁻¹A]⁻¹
2. C xˆ

xˆ = x o + δˆ

C xˆ = Cδˆ

3. C vˆ without proof

   Cv̂ = ClBᵀM⁻¹BCl − ClBᵀM⁻¹AN⁻¹AᵀM⁻¹BCl

 where

M = BC l B T
4. C ˆl
ˆl = l + vˆ

Cˆl = Cl − C vˆ
7.1. Iterative Solution of the combined model

♦ Linearised model
Aδ + Bv + w = 0

♦ At iteration (i), calculate
   A(i) = ∂f/∂x |x(i)o, l        B(i) = ∂f/∂l |x(i)o, l

 at the current point of expansion (POE):
   x(i)o = x̂(i−1),   l = l obs

♦ calculate δ̂(i) and v̂(i):
   δ̂(i) = −N(i)⁻¹ u(i)
 where
   N(i) = A(i)ᵀ M(i)⁻¹ A(i)
   u(i) = A(i)ᵀ M(i)⁻¹ w(i)
   M(i) = B(i) P⁻¹ B(i)ᵀ
   w(i) = f(x(i)o, l obs) + B(i) (l obs − l̂(i−1))   [Note: for i = 1 the second term is zero]
   k(i) = M(i)⁻¹ (A(i) δ̂(i) + w(i))
   v̂(i) = −P⁻¹ B(i)ᵀ k(i)
♦ then
   l̂(i) = l + v̂(i)
   x̂(i) = x(i)o + δ̂(i) = x̂(i−1) + δ̂(i)

♦ repeat until δˆ (i +1) − δˆ (i) approaches 0
7.2. Example
 The 2-D co-ordinate transformation (shift, rotation, and scale) between
two co-ordinate systems (x, y) and (u, v) is given by

    X         cos θ  −sin θ    U       TX
       =  S                       +
    Y i       sin θ   cos θ    V i     TY

   (figure: the (U, V) axes rotated by θ and offset by TX, TY with respect to the (X, Y) axes)

 For simplicity we will assume that TX = TY = 0; with a = S cos θ and b = S sin θ we can write the 2-D model as

    X       a  −b    U
       =
    Y i     b   a    V i
 Therefore, the unknowns are a and b.
 In order to estimate a and b, observations are required. In this case, the following table gives 3 points with known co-ordinates in both systems.
i U V X Y
1 0.0 1.0 -2.1 1.1
2 1.0 0.0 1.0 2.0
3 1.0 1.0 -0.9 2.8

 For all three given points, the Cl matrix of the (U, V) co-ordinates is

   Cli = 0.01 I 2,2     ∴ Cl = 0.01 I 6,6

 All (X, Y) co-ordinates are to be considered constants.

 Solution:
1. l and x
   l = [U1 V1 U2 V2 U3 V3]ᵀ, n = 6      x = [a b]ᵀ, u = 2

2. Mathematical model
f1 → a Ui - b Vi - Xi = 0

f2 → b Ui + a Vi - Yi = 0 ⇒ a Vi + b Ui - Yi = 0

3. Linearised equations

   A 6,2 δ̂ 2,1 + B 6,6 v̂ 6,1 + w 6,1 = 0

                               U1 −V1      0 −1
                               V1  U1      1  0
   A = ∂f/∂x |x=x0, l=lobs =   U2 −V2  =   1  0
                               V2  U2      0  1
                               U3 −V3      1 −1
                               V3  U3      1  1

   Note that A is not a function of x or l.

                                U1   V1   U2   V2   U3   V3
                               a0  −b0    0    0    0    0
                               b0   a0    0    0    0    0
   B = ∂f/∂l |x=x0, l=lobs =    0    0   a0  −b0    0    0
                                0    0   b0   a0    0    0
                                0    0    0    0   a0  −b0
                                0    0    0    0   b0   a0

   Therefore, a0 and b0 are needed to evaluate B. They can be evaluated by simple computation through the use of 2 equations of the math model:
   a0 = 1, b0 = 2, using the equations of point 2.
                    a0U1 − b0V1 − X1        0.1
                    a0V1 + b0U1 − Y1       −0.1
   w = f(x0, l) =   a0U2 − b0V2 − X2   =    0.0
                    a0V2 + b0U2 − Y2        0.0
                    a0U3 − b0V3 − X3       −0.1
                    a0V3 + b0U3 − Y3        0.2

4. The δ̂ vector
   δ̂ = −N⁻¹u = −[Aᵀ(BP⁻¹Bᵀ)⁻¹A]⁻¹ Aᵀ(BP⁻¹Bᵀ)⁻¹ w

   δ̂ = [0.0  −0.05]ᵀ

   ∴ x̂ = x0 + δ̂ = [1.0  2.0]ᵀ + [0.0  −0.05]ᵀ = [1.0  1.95]ᵀ
5. The v̂ vector
   v̂ = −ClBᵀ(BClBᵀ)⁻¹(Aδ̂ + w) = [0.01 0.08 0.02 0.01 −0.05 −0.05]ᵀ

   l̂ = l + v̂ = [0.0 1.0 1.0 0.0 1.0 1.0]ᵀ + [0.01 0.08 0.02 0.01 −0.05 −0.05]ᵀ
             = [0.01 1.08 1.02 0.01 0.95 0.95]ᵀ

 1.0 
Check: make uses of x̂ ==   to calculate the values of the (X,Y) coordinates
1.95
making use of the (U, V) coordinates and the math model:
 X  a − b  U
  =    
 Y i  b a   V i
8. COMBINATIONS OF MODELS
♦ Assume that observations are made in two groups, with the second group consisting of one or several observations. Both groups have a common set of parameters.
♦ Given: 2 sets of observations collected at different times for the same parameters, i.e. (L1, CL1) with n1 observations and (L2, CL2) with n2 observations.

♦ Required: Xu,1 The best estimate for a group of parameters from a group of measurements
that have been taken at two different times ( e.g. L1 at t1 and L2 at t2 )

♦ Functional model:

   F1(x, L1) = 0:   A1 m1,u δ u,1 + B1 m1,n1 V1 n1,1 + W1 m1,1 = 0
   F2(x, L2) = 0:   A2 m2,u δ u,1 + B2 m2,n2 V2 n2,1 + W2 m2,1 = 0

♦ Variation function using Lagrange Multipliers
   φ = V1ᵀP1V1 + V2ᵀP2V2 + 2K1ᵀ(A1δ + B1V1 + W1) + 2K2ᵀ(A2δ + B2V2 + W2)

♦ Note: For each group of observations, there’s a quadratic form and a Lagrange Multiplier

♦ To minimize φ, differentiate φ w.r.t. all variables ( δ, V1, V2, K1, K2 ) and equate to zero.

♦ Then arrange in hyper-matrix notation and solve by elimination.

∂φ
= 2 K1T A1 + 2 K 2T A2 =0
∂δ

∂φ
= 2 V1T P1 + 2 K1T B1 =0
∂V1

∂φ
= 2 V2T P2 + 2 K 2T B2 =0
∂V2

∂φ
= 2 δ T A1T + 2V1T B1T + 2W1T = 0
∂K1

∂φ
= 2δ T A2T + 2V2T B2T + 2W2T =0
∂K 2

♦ Transpose all equations, divide by 2, and arrange in a hyper-matrix:

 P1 0 B1T 0 0  V1   0   0 
      
0 P2 0 B2T 0  V2   0   0 
 
 B1 0 0 0 A1  K1  +  W1  =  0 
     
0 B2 0 0 A2  K 2  W2   0 
      
0 0 A1T A2T 0  δ   0   0 

## ♦ To solve this system algebraically, partition and eliminate V1

♦ Then partition and eliminate V2
♦ Perform the elimination until a solution for δ is reached
♦ Then perform back substitution to get expressions for all other variables (V1, V2, K1, and K2)
♦ Then perform the Law of propagation of variance to estimate the v-c matrices
8.1. Summation of Normals – Parametric Models

♦ Given: (L1, CL1) and (L2, CL2); no correlation between L1 and L2

♦ Required: X

♦ Functional Models:

   L̂1 = F1(X̂) with the linear model → A1 n1,u δ u,1 + W1 n1,1 = V1 n1,1
   L̂2 = F2(X̂) with the linear model → A2 n2,u δ u,1 + W2 n2,1 = V2 n2,1

♦ Variation function

   φ = V1ᵀP1V1 + V2ᵀP2V2 = min
     = (δᵀA1ᵀ + W1ᵀ)P1(A1δ + W1) + (δᵀA2ᵀ + W2ᵀ)P2(A2δ + W2)
     = δᵀA1ᵀP1A1δ + δᵀA1ᵀP1W1 + W1ᵀP1A1δ + W1ᵀP1W1
     + δᵀA2ᵀP2A2δ + δᵀA2ᵀP2W2 + W2ᵀP2A2δ + W2ᵀP2W2 = min

   (In each line the two middle terms are equivalent scalars.)

♦ Now minimise φ (φ is only a function of δ):

   φ = δᵀA1ᵀP1A1δ + 2δᵀA1ᵀP1W1 + W1ᵀP1W1 (constant)
     + δᵀA2ᵀP2A2δ + 2δᵀA2ᵀP2W2 + W2ᵀP2W2 (constant) = min

   ∂φ/∂δ = 2δᵀA1ᵀP1A1 + 2W1ᵀP1A1 + 2δᵀA2ᵀP2A2 + 2W2ᵀP2A2 = 0

♦ Transpose and divide by 2:

   (A1ᵀP1A1 + A2ᵀP2A2) δ + (A1ᵀP1W1 + A2ᵀP2W2) = 0
      N1        N2              U1        U2

   ∴ δ = −(N1 + N2)⁻¹ (U1 + U2)

♦ Note: The δ vector involves addition of the normal-equation matrices and vectors corresponding to each set of observations.

♦ The same procedure can be applied for 3 (or more) sets of observations, with the combined normals: δ = −(N1 + N2 + N3 + …)⁻¹(U1 + U2 + U3 + …).
8.1.1. Variance propagation to estimate Cδ and Cx̂   (CL⁻¹ = P)

   δ = −(N1 + N2)⁻¹ [ A1ᵀCL1⁻¹(f(x0) − L1) + A2ᵀCL2⁻¹(f(x0) − L2) ]
                      (first bracket: U1;  second bracket: U2)

   x0: non-stochastic, and therefore δ = F(L)

   ∴ Cδ = (∂δ/∂L) CL (∂δ/∂L)ᵀ

   Where
   L (n1+n2),1 = [L1; L2]      CL (n1+n2),(n1+n2) = diag(CL1, CL2)

   ∂δ/∂L = [∂δ/∂L1   ∂δ/∂L2]
   ∂δ/∂L1 = (N1 + N2)⁻¹A1ᵀCL1⁻¹ = N⁻¹A1ᵀCL1⁻¹
   ∂δ/∂L2 = (N1 + N2)⁻¹A2ᵀCL2⁻¹ = N⁻¹A2ᵀCL2⁻¹

   ∴ Cδ = [N⁻¹A1ᵀCL1⁻¹  N⁻¹A2ᵀCL2⁻¹] · diag(CL1, CL2) · [CL1⁻¹A1N⁻¹; CL2⁻¹A2N⁻¹]
        = N⁻¹A1ᵀCL1⁻¹A1N⁻¹ + N⁻¹A2ᵀCL2⁻¹A2N⁻¹
        = N⁻¹ (A1ᵀCL1⁻¹A1 + A2ᵀCL2⁻¹A2) N⁻¹
        = N⁻¹ (N1 + N2) N⁻¹ = N⁻¹NN⁻¹

   ∴ Cδ = N⁻¹ = (N1 + N2)⁻¹

   ∴ X̂ = X0 + δ
   ∴ CX̂ = Cδ
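The summation-of-normals result can be illustrated numerically; a minimal NumPy sketch with a hypothetical straight-line fit (l ≈ c0 + c1·t, made-up data) whose observations are split into two groups:

```python
import numpy as np

# Hypothetical linear model l = A x (fit y = c0 + c1*t), split in two groups.
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
l = np.array([0.1, 1.9, 4.2, 5.8, 8.1, 9.9])
A = np.column_stack([np.ones_like(t), t])
P = np.eye(6)                            # equal weights
w = -l                                   # POE x0 = 0, so w = f(x0) - l = -l

A1, w1, P1 = A[:4], w[:4], P[:4, :4]     # group 1: first 4 observations
A2, w2, P2 = A[4:], w[4:], P[4:, 4:]     # group 2: last 2 observations

N1, U1 = A1.T @ P1 @ A1, A1.T @ P1 @ w1
N2, U2 = A2.T @ P2 @ A2, A2.T @ P2 @ w2
d_sum = -np.linalg.solve(N1 + N2, U1 + U2)          # summation of normals
d_all = -np.linalg.solve(A.T @ P @ A, A.T @ P @ w)  # all data at once
print(d_sum)
```

Adding the normal matrices and vectors of the two groups gives exactly the same δ as adjusting all six observations in one batch.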
8.2. Sequential Least Squares-Parametric Model

• In the previous section (summation of normals), it has been shown that two sets of
observations ( L1 and L2 ) for the same set of parameters can be combined to get a new
solution

   δ̂ = −(N1 + N2)⁻¹(U1 + U2),  where
   N1 = A1ᵀu,n1 P1 n1,n1 A1 n1,u  and  N2 = A2ᵀu,n2 P2 n2,n2 A2 n2,u   (both of size u,u)

• What if n2 << u (e.g. n2 = 1 and u = 4)? With the summation-of-normals method, a (u,u) matrix must be inverted again just to add the new (single) observation (the assumption here is that the solution has already been obtained for the 1st group of observations).

• This can be a significant computational burden, especially when observations are being added at a regular interval in real time, for example in GPS positioning.

• Sequential least squares provides a method where only an (n2,n2) inversion is required for updating the solution with n2 new observations.

Derivation

To derive the sequential expressions, the summation of normal solution is re-written as:

   δ = −(N1 + A2ᵀCL2⁻¹A2)⁻¹ (U1 + A2ᵀCL2⁻¹W2)

   δ = −(N1 + A2ᵀCL2⁻¹A2)⁻¹ U1 − (N1 + A2ᵀCL2⁻¹A2)⁻¹ A2ᵀCL2⁻¹W2

To proceed, two matrix-inversion lemmas are utilized:

   (S⁻¹ + TᵀR⁻¹T)⁻¹ = S − STᵀ(R + TSTᵀ)⁻¹TS .............(i)
   (S⁻¹ + TᵀR⁻¹T)⁻¹ TᵀR⁻¹ = STᵀ(R + TSTᵀ)⁻¹ ...........(ii)

Apply lemma (i) to the 1st term and lemma (ii) to the 2nd term of the δ equation with:
   S = N1⁻¹,  T = A2,  R = CL2

1st term:

   −(N1 + A2ᵀCL2⁻¹A2)⁻¹ U1 = −[N1⁻¹ − N1⁻¹A2ᵀ(CL2 + A2N1⁻¹A2ᵀ)⁻¹A2N1⁻¹] U1

2nd term:

   −(N1 + A2ᵀCL2⁻¹A2)⁻¹ A2ᵀCL2⁻¹ W2 = −N1⁻¹A2ᵀ(CL2 + A2N1⁻¹A2ᵀ)⁻¹ W2

Combining terms 1 and 2:

   δ = −N1⁻¹U1 + N1⁻¹A2ᵀ(CL2 + A2N1⁻¹A2ᵀ)⁻¹ A2N1⁻¹U1 − N1⁻¹A2ᵀ(CL2 + A2N1⁻¹A2ᵀ)⁻¹ W2

   With δ(−) = −N1⁻¹U1 (so that A2N1⁻¹U1 = −A2δ(−)):

   δ = δ(−) − N1⁻¹A2ᵀ(CL2 + A2N1⁻¹A2ᵀ)⁻¹ (A2δ(−) + W2)

Now set

   δ(+) = δ(−) − N1⁻¹A2ᵀ(CL2 + A2N1⁻¹A2ᵀ)⁻¹ (A2δ(−) + W2)

   ∴ δ(+) u,1 = δ(−) u,1 − K u,n2 (A2δ(−) + W2)

   K = N1⁻¹A2ᵀ(CL2 + A2N1⁻¹A2ᵀ)⁻¹

K is known as the Gain Matrix, which quantifies how much each new observation will contribute to the corrections to the parameters.

Note: 1) Only an (n2 × n2) inversion is required.

2) The parameters before the update are
   X̂(−) = X0 + δ̂(−)
3) The updated parameters are
   X̂(+) = X0 + δ(+)
Note that X0 (the POE) is the same in both cases.

Understanding the Gain Matrix:

   K = N1⁻¹A2ᵀ(CL2 + A2N1⁻¹A2ᵀ)⁻¹

• If the 2nd set of observations, L2, is imprecise, i.e. CL2 has large elements, the Gain Matrix will generally have small elements. Thus the new observations will not greatly contribute to the solution update.

• As the precision of L2 increases, the CL2 elements decrease and L2 tends to contribute more to the solution update.

• In δ(+) = δ(−) − K(A2δ(−) + W2), the term in parentheses is of the form Aδ + W = V. That is:

• V2(−) = A2δ(−) + W2 is the predicted residual, or innovation, vector. Hence, the estimated residual after the update is given by

• V2(+) = A2δ(+) + W2
Covariance Matrices

   CX(−) = Cδ(−) = N1⁻¹ = (A1ᵀCL1⁻¹A1)⁻¹

   Cδ(+) = Cδ(−) − K A2 Cδ(−)

Note: Cδ(+) < Cδ(−) due to the subtraction of KA2Cδ(−); thus the covariance matrix is improved by the update.

## 8.3. Sequential Solution of Linear parametric Models

Step 1:
Model: L1 = F1(X)

• Linearised model:  A1 n1,u δ u,1 + W1 n1,1 = V1 n1,1
• P.O.E.: X0
• Initial solution with n1 observations (n1 ≥ u):

   δ(−) = −(A1ᵀP1A1)⁻¹ A1ᵀP1W1 = −(N1)⁻¹U1

 where P1 = σ0² CL1⁻¹

   Cδ(−) = N1⁻¹

Step 2:
Model: L2 = F2(X)

Linearised model:  A2 n2,u δ u,1 + W2 n2,1 = V2 n2,1

Update the solution with n2 observations (n2 ≥ 1):

   δ(+) = δ(−) − K(A2δ(−) + W2)

Note: δ(−) is the solution from step 1 (i.e. from the 1st group of observations).

Final parameter estimates:
   X̂ = X0 + δ(+)

Updated covariance matrix:
   Cδ(+) = CX̂ = Cδ(−) − K A2 Cδ(−)

Step 3:
Addition of 3rd set of observations

L3 = F3 ( X )

δ(+) and Cδ(+) from step (2) become δ(−) and Cδ(−) respectively, for step (3).

Note: if n2 >> u, summation of normals is the preferred method.
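Steps 1–3 can be sketched numerically; a minimal NumPy illustration on a hypothetical straight-line fit (parameters x = [c0, c1], made-up observations) shows that the gain-matrix update reproduces the one-batch solution:

```python
import numpy as np

# Hypothetical line fit, updated sequentially with the gain matrix.
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
l = np.array([0.1, 1.9, 4.2, 5.8, 8.1, 9.9])
A = np.column_stack([np.ones_like(t), t])
w = -l                                   # POE x0 = 0 -> w = -l
CL = np.eye(6)                           # unit weights, sigma0^2 = 1

A1, w1, CL1 = A[:4], w[:4], CL[:4, :4]   # step 1: initial batch (n1 = 4)
N1 = A1.T @ np.linalg.inv(CL1) @ A1
d_minus = -np.linalg.solve(N1, A1.T @ np.linalg.inv(CL1) @ w1)
Cd_minus = np.linalg.inv(N1)

A2, w2, CL2 = A[4:], w[4:], CL[4:, 4:]   # step 2: two new observations
K = Cd_minus @ A2.T @ np.linalg.inv(CL2 + A2 @ Cd_minus @ A2.T)  # gain
d_plus = d_minus - K @ (A2 @ d_minus + w2)     # only a 2x2 inversion
Cd_plus = Cd_minus - K @ A2 @ Cd_minus

d_batch = -np.linalg.solve(A.T @ A, A.T @ w)   # all 6 at once (P = I)
print(d_plus)
```

Both the updated parameters and the updated covariance matrix agree with the batch (summation-of-normals) result, as the derivation promises.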

8.4. Summation of Normals and Sequential LS for the Implicit Model:

The combinations of models discussed so far have mainly been derived for the parametric model. The derivation of the summation of normals and sequential LS for the implicit model follows the same scheme as the parametric case.

PARAMETRIC:
   δ = −(N1 + N2)⁻¹(U1 + U2)
   Ni = AiᵀPiAi,   Ui = AiᵀPiWi
   Cδ = CX̂ = (N1 + N2)⁻¹
   Pi = σ0²CLi⁻¹  and  CLi = σ0²Pi⁻¹

IMPLICIT:
   δ = −(N1′ + N2′)⁻¹(U1′ + U2′)
   Ni′ = Aiᵀ(BiPi⁻¹Biᵀ)⁻¹Ai = AiᵀMi⁻¹Ai,   Mi = BiPi⁻¹Biᵀ
   Ui′ = AiᵀMi⁻¹Wi
   Cδ = CX̂ = (N1′ + N2′)⁻¹

SEQUENTIAL:
   δ(+) = δ(−) − K(A2δ(−) + W2)
   δ(−) = −(A1ᵀP1A1)⁻¹(A1ᵀP1W1) = −N1⁻¹U1
   K = N1⁻¹A2ᵀ(CL2 + A2N1⁻¹A2ᵀ)⁻¹
   For the implicit model, change P1 to (B1P1⁻¹B1ᵀ)⁻¹ (i.e. (B1CL1B1ᵀ)⁻¹) and CL2 to B2CL2B2ᵀ.
8.5. Parameter Observations

• Parameter observation is a method which can be used in cases where a priori information about the parameters is available.

• For example, station coordinates (or elevations) and their covariance matrix (X̂ and CX̂) may be available from a previous adjustment.

• In this case X̂ can be considered as a direct observation (with CX̂) along with some observation vector, to obtain a better estimate of X̂.
Functional Model:

   X obs u,1 = X̂ u,1

Linearised Functional Model:

   X obs + VX = X0 + δ

   VX = δ + (X0 − X obs)

   VX u,1 = δ u,1 + WX u,1   (of the form V = Aδ + W, with A = I)

with the stochastic model CX u,u = PX⁻¹ u,u

• Note: a variable (observation or parameter) that has infinite variance, σ^2 → ∞, has a corresponding weight of P = 1/σ^2 = 0. In this case, the variable becomes an unknown parameter.

## • A variable with zero variance, σ^2 = 0, has infinite weight, P → ∞, and is therefore regarded as a constant.

• In between these two extreme cases there are an infinite number of possibilities for
weighting parameters.
Functional Models (Linearized)

A(n,u) δ(u,1) + W(n,1) = V(n,1)

I(u,u) δ(u,1) + WX(u,1) = VX(u,1)

## with the stochastic models:

CL = P^-1   (P = CL^-1)

CX = PX^-1   (PX = CX^-1)

Variation function

φ = V^T P V + VX^T PX VX = min

= (δ^T A^T + W^T) P (Aδ + W) + (δ^T + WX^T) PX (δ + WX) = min

= δ^T A^T P A δ + δ^T A^T P W + W^T P A δ + W^T P W
+ δ^T PX δ + δ^T PX WX + WX^T PX δ + WX^T PX WX = min

= δ^T A^T P A δ + 2 δ^T A^T P W + W^T P W
+ δ^T PX δ + 2 δ^T PX WX + WX^T PX WX = min

Minimize φ

∂φ/∂δ = 2 δ^T A^T P A + 2 W^T P A + 2 δ^T PX + 2 WX^T PX = 0

## Transpose and divide by 2

A^T P A δ + PX δ + A^T P W + PX WX = 0

(A^T P A + PX) δ + (A^T P W + PX WX) = 0

δ(u,1) = −( A^T(u,n) P(n,n) A(n,u) + PX(u,u) )^-1 ( A^T P W(n,1) + PX WX(u,1) )
Variance-Covariance matrix of Estimated Correction Cδ:

Functional Model

δ = −(A^T P A + PX)^-1 [ A^T P (F(X0) − L) + PX (X0 − Xobs) ]

∴ Cδ = (∂δ/∂L) CL (∂δ/∂L)^T + (∂δ/∂Xobs) CX (∂δ/∂Xobs)^T

which, after applying the law of propagation of variances, reduces to

Cδ = (A^T P A + PX)^-1 = (N + PX)^-1 ,  with N = A^T P A

## δ = −(A^T CL^-1 A + CX^-1)^-1 (A^T CL^-1 W + CX^-1 WX)

• If the observed parameters are very precise, i.e. CX has small elements, then CX^-1 will have large elements, as will (A^T CL^-1 A + CX^-1), and therefore its inverse will be very small. Finally, δˆ will be smaller than if it were calculated without the additional parameter observations.

• If the parameter observations are not precise, CX will have large elements and CX^-1 will have small elements. Thus the parameter observations will have little contribution to the solution vector.
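A minimal scalar sketch of this weighting behaviour (one unknown elevation; all numbers invented): the solution δ = −(A^T P A + PX)^-1 (A^T P W + PX WX) pulls the estimate between the prior parameter observation and the new measurements according to their weights.

```python
# One unknown elevation x, a prior estimate from an earlier adjustment, and
# three new direct observations (all values hypothetical).
x0 = 50.000                  # approximate value used for linearization
x_obs, var_x = 50.012, 4e-6  # prior parameter observation and its variance (m^2)
l = [50.020, 50.016, 50.018] # new direct observations of x
var_l = 1e-5                 # variance of each new observation (m^2)

P = 1.0 / var_l              # observation weight
PX = 1.0 / var_x             # parameter-observation weight

# A is a column of ones, W = x0 - l, WX = x0 - x_obs
AtPA = P * len(l)
AtPW = P * sum(x0 - li for li in l)
WX = x0 - x_obs

delta = -(AtPW + PX * WX) / (AtPA + PX)
C_delta = 1.0 / (AtPA + PX)  # C_delta = (A^T P A + P_X)^-1

print(round(x0 + delta, 4), C_delta)
```

The estimate lands between the prior value 50.012 and the mean of the new observations, weighted by PX and P, and C_delta is smaller than either input variance.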
ENGO361, Dr. Naser El-Sheimy 1/15

9. Statistical Analysis
In Chapter 2, we defined the one-dimensional Gaussian Probability Distribution Function (PDF) of a normally distributed random variable as:

f(x) = (1 / (σx √(2π))) e^( −(x − µx)^2 / (2σx^2) ) ,

where µx and σx are the mean and standard deviation of the variable (x). And the Normal Distribution Function (NDF) as:

F(t) = ∫_{−∞}^{t} (1 / (σx √(2π))) e^( −(x − µx)^2 / (2σx^2) ) dx

where (t) is the upper bound of the integration as shown in the Figure below.

[Figure: normal density curve f(x); the shaded area under the curve up to t represents the probability.]
• As stated before, the area under the curve represents the probability of occurrence; the integration of this function yields the area under the curve.

• Unfortunately, the integration of Equation (1) cannot be carried out in closed form, and thus numerical integration techniques must be used to tabulate values of this function. This has been done for the function when the mean is zero (µ = 0) and the variance is 1 (σ^2 = 1).

The result of this integration is shown in the following table. The tabulated values represent the
areas under the curve from –∞ to t.

t Decimals of (t)
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
-3.2 .
-3.1 .
. .
. .
. .
-0.2 .
-0.1 .
0.0 .
0.1 .
0.2 .
. .
.
1.6 . . . . . . . . 0.95352
.
3.2

Example:

• To determine the area under the curve from −∞ to 1.68: scan along the row 1.6 and under the column 0.8. At the intersection of row 1.6 and column 0.8 (i.e. t = 1.68), the value 0.95352 occurs.

Meaning:

• Since the area under the curve represents probability, and its maximum area is 1, this means that there is a 95.352% (0.95352 × 100) probability that (t) is less than or equal to 1.68.
• Alternatively, it can be stated that there is a 4.648% ((1 − 0.95352) × 100) probability that (t) is greater than 1.68.
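The table lookup can be reproduced with the error function from the standard library (the helper name phi is ours, not from the notes):

```python
import math

def phi(t):
    """Standard normal CDF N(t), evaluated with the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

p = phi(1.68)
print(round(p, 5))      # tabulated value: 0.95352
print(round(1 - p, 5))  # tail probability: 0.04648
```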

## 9.1. The Standard Normal Distribution Function

• Once available, this table can be used to evaluate the distribution function for any mean, µ, and variance σ^2.

• For example, if y is a normal random variable with a mean µy and standard deviation σy, an equivalent normal random variable z = (y − µy)/σy can be defined that has µ = 0 and σ^2 = 1.

## • Substituting the definition of Z with µz = 0 and σz^2 = 1 into the NDF:


N(t) = ∫_{−∞}^{t} (1/√(2π)) e^(−Z^2/2) dZ ,  this is known as the Standard Normal Distribution Function.

## • P(Z < t) = N(t)

• P(a < Z < b) = N(b) – N(a)

[Figure: standard normal curve with the area between a and b shaded.]

• If –a = b = t
• P(-t < Z < t) = P(|Z| < t ) = N(t) – N(-t)

## • From Symmetry of the normal distribution, it is seen that:

[Figure: standard normal curve with symmetric limits −t and +t.]
• P(Z > t) = P(Z < -t)

• 1 – N(t) = N(-t)

## • Therefore, P(|Z| < t ) = N(t) – N(-t) = 2 N(t) – 1


## Probability of the Standard Deviation (Standard Error)

From the above characteristic, the probability of the standard deviation can be estimated as
follows:

## P(−σ < Z < σ) = N(σ) − N(−σ) …….. recall that σ = 1 here

Looking into the table for t = 1 and t = −1, the area between −σ and σ is:
P(−σ < Z < σ) = 0.84134 − 0.15866 = 0.68268

[Figure: normal error curve G(ε) with limits marked at ±σ, ±2σ, and ±3σ.]

P(−σ ≤ ε ≤ σ) = 0.683
P(−2σ ≤ ε ≤ 2σ) = 0.954   (called confidence interval)
P(−3σ ≤ ε ≤ 3σ) = 0.997

Meaning:

For any group of measurements there is approximately a 68.3% chance that any single observation has an error within ±σ. This is true for any set of measurements having normally distributed errors.
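The three probabilities above can be checked numerically with the error function from the standard library:

```python
import math

def phi(t):
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

for k in (1, 2, 3):
    # P(-k*sigma <= error <= k*sigma) = N(k) - N(-k)
    p = phi(k) - phi(-k)
    print(f"within ±{k}σ: {p:.3f}")
```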

## 9.2. The 50% Probable Error:

For any group of observations, the 50% probable error establishes the limits within which 50%
of the errors should fall.

In other words, any measurement has the same chance of falling within these limits as it has of falling outside them.

## P(|Z| < t ) = N(t) – N(-t) = 0.5

=2 N(t) – 1 = 0.5

Therefore

## 1.5 = 2 N(t) or 0.75 = N(t)


From the SNDF tables it is apparent that 0.75 lies between the t values 0.67 and 0.68, that is
N(0.67) = 0.7486 and N(0.68) = 0.7517

Interpolating linearly:

∆t / (0.68 − 0.67) = (0.7500 − 0.7486) / (0.7517 − 0.7486) = 0.0014 / 0.0031 = 0.4516

∆t = 0.01 × 0.4516 ≈ 0.0045

and therefore, t = 0.67 + 0.0045 = 0.6745

For any set of observations, therefore, the 50% probable error can be obtained by computing the
standard error and then multiplying it by 0.6745, i.e.

E50 = 0.6745 σ

## 9.3. The 95% Probable Error:

• The 95% probable error, or E95, is the bound within which, theoretically, 95% of the
observation group’s errors should fall.

## • This error category is popular with geomaticians for expressing precision.

• Using the same reasoning as in the equations for the 50% probable error:

## P(|Z| < t ) = N(t) – N(-t) = 0.95

= 2 N(t) – 1 = 0.95
Therefore

## 1.95 = 2 N(t) or 0.975 = N(t)

• From the SNDF tables it is determined that 0.975 occurs with a (t) value of 1.960.

• Thus, for any set of observations, therefore, the 95% probable error can be obtained by
computing the standard deviations (or error) and then multiplying it by 1.96, i.e.

E95 = 1.96 σ

## 9.4. Other Percent Probable Errors

Using the same computation procedures, other percent error probable can be calculated.

## Probable Error   Multiplier

E50     0.6745 σ
E90     1.6449 σ
E95     1.9600 σ
E99     2.5758 σ
E99.7   2.9677 σ
E99.9   3.2905 σ
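The multipliers can be regenerated from the inverse standard normal CDF, since Ep corresponds to the t with 2 N(t) − 1 = p (a sketch using the standard library):

```python
from statistics import NormalDist

multipliers = {}
for p in (0.50, 0.90, 0.95, 0.99, 0.997, 0.999):
    # two-sided bound: solve 2*N(t) - 1 = p, i.e. t = inv_cdf((1 + p) / 2)
    multipliers[p] = NormalDist().inv_cdf((1 + p) / 2)

for p, t in multipliers.items():
    print(f"E{100 * p:g} = {t:.4f} sigma")
```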

The E99.7 is often used in detecting blunders as will be explained in the following example.

Example:

The arc-second portion of 50 direction readings from a 1″ instrument is listed below. Find the mean, standard deviation (error), and the E95. Check the observations at a 99% level of certainty for blunders.

41.9 46.3 44.6 46.1 42.5 45.9 45.0 42.0 47.5 43.2 43.0 45.7 47.6
49.5 45.5 43.3 42.6 44.3 46.1 45.6 52.0 45.5 43.4 42.2 44.3 44.1
42.6 47.2 47.4 44.7 44.2 46.3 49.5 46.0 44.3 42.8 47.1 44.7 45.6
45.5 43.4 45.5 43.1 46.1 43.6 41.8 44.7 46.2 43.2 46.8

Solution:

The mean = ( Σ Li ) / n = 2252 / 50 = 45.04″

The standard deviation = sqrt( Σ (Li − mean)^2 / (n − 1) ) = ±2.12″

∴ E95 = ±1.96 σ = ±4.16″ (thus 95% of the data should fall within 45.04 ± 4.16″, i.e. in the 40.88″–49.20″ range)

The data actually contain three values that deviate from the mean by more than 4.16″ (i.e. that are outside the range 40.88″ to 49.20″). They are 49.5 (2 times) and 52.0. No values are less than 40.88″, and therefore 47/50 = 94% of the measurements lie in the E95 range.

∴ E99 = ±2.576 σ = ±5.46″ (Thus 99% of the data should fall between 45.04 ± 5.46″ or in the
“39.58 - 50.50” range).

Actually, one value is greater than 50.50, and thus 49/50 = 98% of all measurements fall in
this range.
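The whole example can be verified with a short script (the 50 readings are copied from the table above):

```python
from math import sqrt

readings = [
    41.9, 46.3, 44.6, 46.1, 42.5, 45.9, 45.0, 42.0, 47.5, 43.2, 43.0, 45.7, 47.6,
    49.5, 45.5, 43.3, 42.6, 44.3, 46.1, 45.6, 52.0, 45.5, 43.4, 42.2, 44.3, 44.1,
    42.6, 47.2, 47.4, 44.7, 44.2, 46.3, 49.5, 46.0, 44.3, 42.8, 47.1, 44.7, 45.6,
    45.5, 43.4, 45.5, 43.1, 46.1, 43.6, 41.8, 44.7, 46.2, 43.2, 46.8,
]

n = len(readings)
m = sum(readings) / n                                    # mean
s = sqrt(sum((x - m) ** 2 for x in readings) / (n - 1))  # standard deviation
e95, e99 = 1.96 * s, 2.576 * s

inside95 = sum(1 for x in readings if abs(x - m) <= e95)
inside99 = sum(1 for x in readings if abs(x - m) <= e99)
print(round(m, 2), round(s, 2))  # 45.04 and 2.12
print(inside95, inside99)        # 47 and 49 of 50
```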

## 9.5. Confidence Intervals and Statistical Testing

• The following contains a finite population of 100 values. The mean (µ) and the variance
(σ2) of that population are 26.1 and 17.5.

## 18.2 26.4 20.1 29.9 29.8 26.6 26.2

25.7 25.2 26.3 26.7 30.6 22.6 22.3
30.0 26.5 28.1 25.6 20.3 35.5 22.9
30.7 32.2 22.2 29.2 26.1 26.8 25.3
24.3 24.4 29.0 25.0 29.9 25.2 20.8
29.0 21.9 25.4 27.3 23.4 38.2 22.6
28.0 24.0 19.4 27.0 32.0 27.3 15.3
26.5 31.5 28.0 22.4 23.4 21.2 27.7
27.1 27.0 25.2 24.0 24.5 23.8 28.2
26.8 27.7 39.8 19.8 29.3 28.5 24.7
22.0 18.4 26.4 24.2 29.9 21.8 36.0
21.3 28.8 22.8 28.5 30.9 19.1 28.1
30.3 26.5 26.9 26.6 28.2 24.2 25.5
30.2 18.9 28.9 27.6 19.6 27.9 24.9
21.3 26.7

• By randomly selecting 10 values from this table, estimates of the mean and the variance of this sample can be computed.

• However, it would not be expected that these estimates ( X̄ and S^2 ) would exactly match the mean and the variance of the population. If the sample size were increased, X̄ and S^2 would be expected to match µ and σ^2 more closely, as shown in the table below.

## Increasing sample sizes

No. X S2
10 26.9 28.1
20 25.9 21.9
30 25.9 20.0
40 26.5 18.6
50 26.6 20.0
60 26.4 17.6
70 26.3 17.1
80 26.3 18.4
90 26.3 17.8
100 26.1 17.5
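The convergence illustrated in the table can be reproduced by sampling from the population values listed above (which samples are drawn depends on the random seed, which is our choice):

```python
import random
from statistics import mean, pvariance, variance

# The finite population of 100 values from the table above.
population = [
    18.2, 26.4, 20.1, 29.9, 29.8, 26.6, 26.2,
    25.7, 25.2, 26.3, 26.7, 30.6, 22.6, 22.3,
    30.0, 26.5, 28.1, 25.6, 20.3, 35.5, 22.9,
    30.7, 32.2, 22.2, 29.2, 26.1, 26.8, 25.3,
    24.3, 24.4, 29.0, 25.0, 29.9, 25.2, 20.8,
    29.0, 21.9, 25.4, 27.3, 23.4, 38.2, 22.6,
    28.0, 24.0, 19.4, 27.0, 32.0, 27.3, 15.3,
    26.5, 31.5, 28.0, 22.4, 23.4, 21.2, 27.7,
    27.1, 27.0, 25.2, 24.0, 24.5, 23.8, 28.2,
    26.8, 27.7, 39.8, 19.8, 29.3, 28.5, 24.7,
    22.0, 18.4, 26.4, 24.2, 29.9, 21.8, 36.0,
    21.3, 28.8, 22.8, 28.5, 30.9, 19.1, 28.1,
    30.3, 26.5, 26.9, 26.6, 28.2, 24.2, 25.5,
    30.2, 18.9, 28.9, 27.6, 19.6, 27.9, 24.9,
    21.3, 26.7,
]

# Population parameters (the notes quote mu = 26.1 and sigma^2 = 17.5).
print(round(mean(population), 1), round(pvariance(population), 1))

# Growing random samples: the sample statistics drift toward mu and sigma^2.
random.seed(0)
for n in (10, 30, 100):
    s = random.sample(population, n)
    print(n, round(mean(s), 1), round(variance(s), 1))
```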

• Since the mean and the variance of the sample ( X̄ and S^2 ) are computed from random variables, they are also random variables. Thus it is concluded that the values computed contain errors.

## Random sample sets from population (µ = 26.1 and σ2=17.5 ).

Set 1: 29.9, 18.2, 30.7, 24.4, 36.0, 25.6, 26.5, 29.9, 19.6, 27.9 X = 26.9, S2 = 28.1

Set 2: 26.9, 28.1, 29.2, 26.2, 30.0, 27.1, 26.5, 30.6, 28.5, 25.5 X = 27.9, S2 = 2.9

Set 3: 32.2, 22.2, 23.4, 27.9, 27.0, 28.9, 22.6, 27.7, 30.6, 26.9 X = 26.9, S2 = 10.9

Set 4: 24.2, 36.0, 18.2, 24.3, 24.0, 28.9, 28.8, 30.2, 28.1, 29.0 X = 27.2, S2 = 23.0

• Fluctuations in the mean and variance computed from sample sets raise questions about the reliability of these estimates.

• A higher confidence value is likely to be placed on a sample set with small variance than
on one with large variance.

• In the above table, because of its small variance, one is more likely to believe that the
mean of the 2nd set is the most reliable estimate of the population mean. In reality this is
not the case, as the means of the other three sets are actually closer to the population
mean (µ = 26.1).

• The estimation of the mean and variance of a variable from sample data is referred to as point estimation (because it results in one value for each parameter in question).

• After having performed a point estimation, the question remains as to how much the estimate is likely to deviate from the still unknown true values of the parameters (mean and variance). In other words, we would like to have an indication of how good the estimation is and how much it can be relied on.

• An absolute answer to this question is not possible because sampling never leads to the true parameters.

• It is only possible to estimate probabilities with which the true value of the parameters in
question is likely to be within a certain interval around the estimate. Such probabilities
can be determined if the distribution function F(X) of the random variable is given.

Recall: The probability that a random variable Z takes values within the boundaries a and b is given by:

P(a < Z < b) = F(b) − F(a) = ∫_{X=a}^{X=b} f(X) dX

## • By analogy to this, the probability statement for a confidence interval of a parameter Z, whose estimate is Ẑ, is:

P(a < Z < b) = 1 − α

## • where (1 − α) is called the confidence level or degree of confidence, which is conventionally taken to be 90%, 95%, or 99%. The values a and b are called the lower and upper confidence limits for the parameter Z.

• The probability that the parameter does not fall in a given interval is α.

## • Given: X̄, σ^2, and n (number of observations) ….… (i.e. σ is known)

• Required: the (1 − α) confidence interval on µ (which is unknown).

## • Recall: (X̄ − µ) / (σ/√n) is normally distributed with mean 0.0 and variance 1.0.

• The probability statement for the confidence interval, which is symmetric here, is then:

P( −Zα/2 < (X̄ − µ)/(σ/√n) < Zα/2 ) = 1 − α

or

P( X̄ − Zα/2 · σ/√n < µ < X̄ + Zα/2 · σ/√n ) = 1 − α

• For example, for α = 5%, Zα/2 = 1.96, therefore we can write the above equation as:

P( X̄ − 1.96 σ/√n < µ < X̄ + 1.96 σ/√n ) = 0.95

• The above is an example of the so-called two-sided confidence interval. On the other hand, for a one-sided confidence interval we write:

P( µ < X̄ + Zα · σ/√n ) = 1 − α

• Therefore, for α = 5%,

P( µ < X̄ + 1.6449 σ/√n ) = 0.95

[Figure: two-sided interval with tail areas α/2 beyond ±Zα/2, and one-sided interval with tail area α beyond Zα; the central area is 1 − α in both cases.]

Example:

Suppose a distance is measured n = 8 times with mean X̄ = 10.1 cm. We assume that the variance of the normal population is known to be σ^2 = 0.1 cm^2, i.e. σ = √0.1 ≈ 0.316 cm. Then the 95% two-sided confidence interval on µ (which is unknown) is:

P( X̄ − 1.96 σ/√n < µ < X̄ + 1.96 σ/√n ) = 0.95

P( 10.1 − 1.96 × 0.316/√8 < µ < 10.1 + 1.96 × 0.316/√8 ) = 0.95

P{ 9.88 < µ < 10.32 } = 0.95

For a one-sided interval,

P( µ < X̄ + 1.6449 σ/√n ) = 0.95

P( µ < 10.1 + 1.6449 × 0.316/√8 ) = P{ µ < 10.28 } = 0.95

• Let us now consider the case in which the standard deviation of the distribution, σ, is not known and has to be replaced by the standard deviation, S, of the sample. The estimator under testing is then (X̄ − µ) / (S/√n), which has a different distribution than (X̄ − µ) / (σ/√n), as will be shown in the next section.

## 9.7. Some Often Used Distributions:

In connection with the theory of errors of observations and least squares adjustment, there are a few (one-dimensional) distributions that are often used. Only continuous distributions are discussed, particularly those used for statistical testing.

## 9.7.1. Gaussian or Normal Distributions:

The Gaussian or normal distribution (ND) is the most frequently used distribution in statistical theory. Its density function is given by:
Function is given by:
f(x) = (1 / (σx √(2π))) e^( −(x − Mx)^2 / (2σx^2) )   which is fully described by the two parameters Mx and σx.

The cumulative NDF of the standardized random variable Z = (X − MX)/σX (having zero mean and unit standard deviation, i.e. Mz = 0 and σz = 1) is given by:

F(Z) = (1/√(2π)) ∫_{−∞}^{Z} e^(−z^2/2) dz ,   the table handed out during class gives the values of F(Z).
2π −∞

## 9.7.2. The χ^2 distribution (chi square)

• The distribution of the sum of squares of independent random variables, each of which is normally distributed, is known as the χ^2 (chi square) distribution.

Examples:
• χ^2(n) = X1^2 + X2^2 + ……… + Xn^2, which is a χ^2 distribution of "n degrees of freedom"
• V^T(1,n) P(n,n) V(n,1) is a χ^2 distribution of "n degrees of freedom"

[Figure: χ^2 density curve; the area under the curve to the right of χ^2(α,m) equals α.]

χ^2(α,m):  α = the area under the curve to the right of χ^2(α,m)
           m = degrees of freedom

The χ^2 table lists χ^2(α,m) for rows m = 1, 2, …, 120 and columns α = 0.995, 0.99, ……, 0.0005, defined by:

P( χ^2(m) > χ^2(α,m) ) = α = ∫_{χ^2(α,m)}^{∞} f(χ^2) dχ^2 = 1 − ∫_{0}^{χ^2(α,m)} f(χ^2) dχ^2

## 9.7.3. The F (Fisher) Distribution:

• The distribution of the ratio of two independent random variables, each having a χ^2 (chi-square) distribution divided by its degrees of freedom, is said to have an F-distribution.

F(m,n) = ( χ^2(m) / m ) / ( χ^2(n) / n )   is an F-distribution of (m) and (n) degrees of freedom.

• The practical application of the F distribution in least squares adjustment and statistical testing is concerned with the comparison of variances, such as those obtained from two adjustments. In some practical cases the comparison may be between a variance obtained from the adjustment (such as σˆ0^2) and an a priori given reference variance (such as σ0^2); this case refers to F(r,∞) = χ^2(r) / r (where r is the degrees of freedom).

Recall: the variance σˆ^2 = Σ v^2 / (n − 1), where Σ v^2 has a χ^2(n−1) distribution and (n − 1) is the degrees of freedom.

[Figure: F density curve; the area under the curve to the right of F(α,m1,m2) equals α.]

e.g. the F(0.05, m1, m2) table lists, for rows m1 = 1, 2, 3, …, 100, …, 500 and columns m2 = 1, 2, …., 16, the values defined by:

P( F(m1,m2) > F(0.05,m1,m2) ) = 0.05 = ∫_{F(0.05,m1,m2)}^{∞} f(F) dF = 1 − ∫_{0}^{F(0.05,m1,m2)} f(F) dF

## 9.7.4. The t (student) Distribution

The t-distribution is used in connection with sampling (i.e. testing using sample statistics).

Let X1, X2, ……, Xn be n independent random (stochastic) variables of identical normal distribution with mean M and standard deviation σ. Then the random variable

t = ((x̄ − M) / s) √n   is said to have a t-distribution with (n − 1) degrees of freedom, where

x̄ = (1/n) Σ xi   and   s^2 = (1/(n−1)) Σ (xi − x̄)^2

are the sample mean and sample variance, respectively.

[Figure: t density curve; the area under the curve to the right of t(α,m) equals α.]

The t table lists t(α,m) for rows m = 1, 2, …, 120 and columns α = 0.25, 0.2, …, 0.0005, defined by:

P( t(m) > t(α,m) ) = α = ∫_{t(α,m)}^{∞} f(t) dt = 1 − ∫_{−∞}^{t(α,m)} f(t) dt
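A minimal sketch of computing the t statistic for a small sample (the sample values and the tested mean M are invented for illustration):

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical sample of n = 5 measurements and an assumed mean M to test against
sample = [10.12, 10.08, 10.11, 10.09, 10.10]
M = 10.08

n = len(sample)
xbar = mean(sample)           # sample mean
s = stdev(sample)             # sample standard deviation, divisor (n - 1)
t = (xbar - M) * sqrt(n) / s  # compare against t(alpha, n-1) from the table
print(n - 1, round(t, 3))     # degrees of freedom and the t value
```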
THE UNIVERSITY OF CALGARY
DEPARTMENT OF GEOMATICS ENGINEERING

## ENGO 361: Adjustment of observations

MIDTERM - WINTER 1999
February 25, 1999
Open Book – All Calculators Allowed
Time: 60 minutes (30% of final course grade)

Question 1: (9%)

The opposite figure shows a triangulation station (O) from which the horizontal directions d1, d2, and d3 were measured to the three stations A, B, and C respectively, with:

d1 = 45° 15′ 25″
d2 = 75° 25′ 35″
d3 = 115° 35′ 45″

CL = diag(2, 2, 2) arcsec^2

[Figure: station O with directions d1, d2, d3 to stations A, B, C; α1 is the angle between d1 and d2, and α2 the angle between d2 and d3.]

1. Calculate the estimates of the two horizontal angles α1 and α2 and their variance-covariance matrix.
2. Discuss the degree of correlation between the two angles.

Question 2: (3%)

If you have an Electronic Distance Measuring (EDM) device that has been calibrated to give a standard deviation for a single measurement of 15 mm, how many times should you measure a baseline of length ≅ 5 km whose relative error is specified not to exceed 2 PPM?

Question 3: (3%)
h2 B
The opposite figure shows a leveling line which runs
from a benchmark (A) of known elevation (Ha) to
point B. The observed height differences h1 and h2 are h1
to be observed with the same precision and
uncorrelated.
Calculate how many times should h1 and h2 be A
measured to have the standard deviation of the
elevation of point B not to exceed 3 mm? Assume that the standard deviation of a single height
difference observation = 5mm.

Question (4): (7%)

A distance is measured five times with the following values obtained: 156.11, 156.14, 156.08, 156.05, and 156.15 m. The first two measurements have a standard deviation of 1 cm and the last three have a standard deviation of 2 cm. All measurements are uncorrelated.
1. What is the weighted least-squares estimate of the distance?
2. What is the standard deviation of the estimated distance?

## Question (5): (8%)

1. Write down in matrix form the condition equations for the following triangulation networks:

[Figures (a) and (b): two triangulation networks with measured angles α1, α2, α3, α4.]

## 2. Write down in matrix form the indirect mathematical model

for the following leveling network, where:

L = [h1 h2 h3 h4 h5]^T ,  X = [Hd He]^T

points a, b, and c are benchmarks, and Hd, He are the unknown elevations of points d and e.

[Figure: leveling network with benchmarks a, b, c, unknown points d and e, and observed height differences h1 … h5.]

Good Luck

NE\ne
