By //. Passing
Abt. für Praktische Mathematik, Hoechst AG, Frankfurt/Main 80 and
W. Bablok
Allg. Biometrie, Boehringer Mannheim GmbH, Mannheim
Summary: Procedures for the statistical evaluation of method comparisons and Instrument tests often have
a requirement for distributional properties of the experimental data, but this requirement is frequently not
met. In our paper we propose a new linear regression procedure with no special assumptions regarding
the distribution of the samples and the measurement errors. The result does not depend on the assignment of
the methods (instruments) to X and Y. After testing a linear relationship between X and confidence
limits are given for the slope and the intercept a; they are used to determine whether there is only a
chance difference between and l and between and 0. The mathematical background is amplified separat
ely in an appendix.
Ein neues biometrisches Verfahren zur Überprüfung der Gleichheit von Meßwerten von zwei analytischen
Methoden
Anwendung von linearen Regressionsverfahren bei Methodenvergleichsstudien in der Klinischen Chemie, Teil l
Zusammenfassung: Bei der statistischen Auswertung von Methodenvergleichen und bei Geräteerprobungen
werden in der Regel Verfahren eingesetzt, deren Anforderungen an die Verteilung der experimentiellen
Daten häufig nicht erfüllt sind. In unserer Arbeit schlagen wir daher ein neues lineares Regressionsver
fahren vor, das keine besonderen Annahmen für die Verteilung der Stichprobe und der Meßfehler voraus
setzt. Das Ergebnis ist unabhängig von der Zuordnung der Methoden (Geräte) zu den Variablen X und Y.
Nach Prüfung eines linearen Zusammenhanges zwischen X und Y werden Vertrauensgrenzen für die Stei
gung und den Achsenabschnitt angegeben. Mit ihrer Hilfe werden die Hypothesen = l und = 0 ge
testet. Die mathematischen Grundlagen werden separat in einem Appendix abgehandelt.
Contents 1. Introduction
1. Introduction Parameter estimation in linear regression models is
one of the standard to
2. Method Comparison and Linear Regression Pics >n basic statistical text
„ . ^T _ . . books. With the widespread use of pocket calculators
3. A New Regression Procedure the computation of linear regression lines based on
4. Discussion and Examples least Squares has become routine work in scientific
5. Appendix: Mathenaatieal Derivations research. Unfortunately, many textbooks cover the
model assumptions rather briefly and only rarely the For the ith sample this relationship is described by
reader is warned of the consequences if those as the equations
sumptions are violated. However, it is in the very
nature of many experiments which call for linear re
and
gression that most of the assumptions cannot be ob
served. For instance, in many situations there is no • ι
independent variable which is free of error; but pro xf and yf denote the expected values of this sample
cedures taking this into account (1,2) are based on and ξί and ηί give the measurement errofs. In this
more rigid and idealized distributional requirements way each method may have its own expected value
than can be met in real experiments. for the ith sample.
If there exists a structural relationship between the
two methods it can be described by the linear equa
tion
2. Method Comparison and Linear Regression y t  a + x*
Clinical chemistry is one of the areas where linear From the n experimental values (Xj,yO the follow
regression models play a major role in the statistical ing objectives should be attained:
evaluation of experiments. Especially in comparing
analytical methods for measuring the same chemical i) estimation of α and ;
substance or in Instrument testing the need for re ii) statistical test of the assumption of linearity; and
liable parameter estimation becomes very obvious in if linearity is given
the judgement of equality. Clearly a method com
parison cannot exclusively be based on the evalua iii) test of the hypothesis β = 1;
tion of a regression model. Many more properties of iv) test of the hypothesis α = 0.
the methods must be compared; at least accuracy, If both hypotheses are accepted we can infer y* =
imprecision, sensitivity, specificity, and r nge of con xf, i.e. the two methods X and Y measure the same
centration should be studied. For a detailed discus concentration within the investigated concentration
sion see I.e. (3,4). r nge.
The experimental layout can be described s follows: In practice one of the following four procedures is
There are 2 different methods (Instruments) which used:
measure the same chemical analyte in a given me
dium (e.g. serum, plasma, urine, ...). The question [1] linear regression yi = α +
is: Do the methods measure the same concentration [2] linear regression Xj = A + By, + %
of the analyte or is there a systematic difference in
the measurements? (For simplicity, we only refer to [3] principal component analysis (Deming's proced
concentrations but our Statements are also valid for ure) (5)
any other quantity.) [4] standardized principal component analysis (5, 6,
The usual experimental procedure is to draw n inde 7)
pendent samples from a population in which a given All four procedures assume a linear relation between
analyte is to the measured with values Xi and yj for the two methods, however each one has specific theo^
the ith sample. These measurements are realisations retical requirements:
of a pair of random variables X and Y, where X re
presents the values of method l and Υ the values of — [1] and [2] ask for an errorfree independent vari
method 2. For simplicity we also denote method l able X or Y and normally distributed error terins
by method X and method 2 by method Y. with constant variance. A statistical test of linear
ity can be performed only if there exist multiple
Following the statistical model of I.e. (5) each ran measurements of the dependent variable1). Pro
dom variable is the sum of two components: cedures [1] and [2] are not equivalent.and may
 ,one variable representing the Variation of the ex even give contradictory results.
pected value of the analyte within the population
of all possible samples;
 one variable representing the Variation of the *) Strictly speaking this test should only be used if the independ
ent variable has fixed values, s is assumed in the usual least
measurement error for a given sample. squares linear regression.
 [3] and [4] assume that the expected values x* 3. A New Regression Procedure
and y* come from a normal distribution. The er
On the basis of the structural relationship model äs
ror terms have to be normally distributed with a
described in chapter 2 we make the following as
constant variance  and o^; they follow the re
sumptions:
strictions
for [3] x*,y* are the expected values of random variables
from an arbitrary, continuous distribution (i.e.
and
the sampling distribution is arbitrary).
, are realisations of random error terms, both
A statistical test of linearity has not so far been pro coming from the same type of distribution.
posed. Their variances  and ojj need not to be con
stant within the sampling ränge but should re
However, in method comparison studies we general main proportional, that is
ly find the following Situation:
— Neither method X nor method is free of ran
dom error.
— The distribution of the measurement errors is usu In part II of our paper we shall demonstrate that
ally not normal (8). these rather weak assumptions are sufficient for reli
able parameter estimations and hypothesis testing if
— The expected values x* and yf are not a random — 1. There we shall investigate the influence of the
sample from normal distributions, since the me distributions on the result of our procedure.
thods are compared over a wide concentration
ränge of the analyte which covers values of both
healthy and diseased persons. i) Estimätion of and
— Extreme values (outliers) are not necessarily gross According to Theil (12) the slopes of the straight
measurement errors; they may be caused by dif lines between any two points are employed for the
ferent properties of the methods with respect to estimätion of ß. They are given by
specificity or susceptibility to interferences. There
fore they should not be removed from the calcula for l < i < j < n.
i  Xj
tion without experimental reason.
— The variance of the measurement errors is not There are (^ ] possible ways to connect any two
constant over the ränge of concentrations; in fact points. W
the variability increases with the magnitude of the Identical pairs of measurements with
measurements.
xi = x j a n d y i = y j .
Therefore it must be expected that a researcher using
any of the above procedures may obtain biased esti do not contribute to the estimätion of ß; the cor
mations for and ß, and therefore misleading re responding Sij is not defined at this stage. For reasons
sults from the experiment. This Situation is equally of symmetry (see appendix) any S,j with a value of
disappointing for the investigator and the statistician. — l is also disregarded.
Furthermore, from x{ = Xj and y} yj it follows that
In the last 20 yeärs many different proposals have
Sy = ± «>, depending on the sign of the difference
been published forparäineter estimätion in the linear
y$  ^ ; from x{ Xj and y, = yj it follows that Sy = 0.
mpdel, using less stringent distributional assumptions.
Since (X,Y) is a continuous bivariate variable the
The estimations were either'based ori robust pro
occurrence of any of these special cases has a prob
cedures [for a detailed discussion with references see
ability of zero (experimental data should exhibit these
Lc. (9) and also i.e. (10)] or on a distributionfree
cases very rarely). In total there are
approach (11). We are, however, not aware of a pro
posal which deals with the problem of a structural
relationship.
We now describe a procedure which can achieve all slopes Sjj. After sorting the Sy the ranked sequence
the objeetives (i) to (iv) and does not require spe S(l) ^ S(2) < . . . . ^S(N)
cific assumptions regarding the distributions of the
expected values or the error terms. is obtained.
If we substitute the structural relationship in the de The estimation of requires that at least one half of
finition of the S\} we find the points is located above or on the regression line
and at least one half of the points below or on the
ytyt  line. As (X, Y) is a continuous bivariäte variable then
 an equal number of points lies above and below the
From yf = + ßx* and djj = (x*  x*) we get regression line with probability L pöint (x»yi) is
located above the line only if a < yi — bx,. Therefofe
o ß · djj + ( ,  ^)
it can easily be shown that
^ dij + (6  ;)
a = med (y^ — bxi}
ß is an estimator of a.
de + d  §) If bL denotes the lower and bu the upper limit of the
d + Zj confidence interval for ß then the corresponding H
^ dij + Zg ' mits for are given by
where Zjj and z§ are indepedent and from the same aL = med {yj  byXj}
distribution. ay == med {y{ * bLXi}.
Since the values of Sjj are not independent it is ob
These limits are conservative.
vious that their median can be a biased estimator
of ß. We proceed therefore äs follows: With the n pairs of measurement (Xi,yj) one can
either calculate
Let K be the number of values of Sjj with Sjj < 1.
Then using K äs an offset, ß is estimated by the y* = ahbx* or x* = A + By*.
shifted median b of the S(\\:
The above estimators for and ß show the following
, if N is odd property:
b= =" and A= —
b '
S +K if N is even
y ' ( (T ) · analogous förmulas hold for their confidence limits.
The proof is given in the appendix. 1t is.therefore
For the construction of a twosided confiderice inter irrelevant which one of the two methods is denoted
val for ß on the level let w^ denote the 11  ^ lq
quan byX.
2 \ *· l '
tue of the standardized normal distribution. ii) Statistical fest of the assumption of linearity
In testing for linearity one has to inspeet how the
With regression line fits the data or how randomly the
n ( n ~ l)(2n + 5) data scatters about y* = a + bx*. Naturally the para
18 meters a and b are fixed in this context and oür test
and will be conditional on a and b.
If there is a nonlinear relatioiisMp between x* and y*
one would expect to find too many consecutive meas
(Mi rounded to an integer value) urements either above or below the fitted line. Let l
the confidence interval for ß is given by denote the number of points (Xi,yi) with y\ > a + bxi
and L the mimber of points with y^ < af bxj. To
S(M + K) < ß < S(M2 + K)·
every point (Xi,yj) we assign a score , i.e.
Unfortunately the sequence of scores depends on the Tabl. 1. Critical values of the cusum statistic
way in which the points (Xj,yO are ranked: either by
increasing Xvalues or by increasing Yvalues. That γ (%) hv
is, the result of a test for linearity would depend on l 1.63
which one of the methods is assigned to the X and 5 1.36
which one to the Yvariable. 10 1.22
ϋ
er
ί
α»
^3
α»
0
Ό
Ο£
c
v
X
\ *
χ
δ X
X
X
~ (O
s
I
'X
CD χ
CXI
σ
ο
X
J
v
X
χ
CD
f
ca.
.c
l
»r X
(U x X C5
C x er
σ
ι
χ: χχΧ cn c/)
'i *x
X
σ « X ' y C
L •ο
0
X rv
£= £2.
X
0 x'
ρ Qi CD
χ
Ό Ο
ο c X*'
X
JC
"QJ
ο
·—
:
X
y CD
'~
ε
^L·
σ
α.
xv £
εο δ χ Ό
α>
<—J 1 1 1 1 ll 1 M ι 1 1 1 M 1 ! 1 ll 1 ' 1 MI 1 1 IM 1 M l ': ' l l l l l I I I l l l l l l l l l l ( I M I I l l I I l . l l M I l l r^·»
CD CD CD CD CD , CD CD
to »a· cxi cxi ^j· <o
sanpisay
ι ι ι
«
g 11
l II
^ι
s c ·£
'S 2o **
E
P l II
?ε —8 ***I
C3
σ
o.
•5 — •Ό
E 8
o
o 0X ^Q
TD
O 12
t l
.'S
"o
οc .εo «Ξ^
Ιϊ 1
S O a
c
o ""
8. € e
P 1
i!
rt "ce
C
o
a l
E « >. 0 «
(U
•o
.1 h= JB00 Oe Λ60 ΛW>
3
« ^ !2 CN m
o 11
i
8n δ
g,
H
X ;
g S
o*
X ^
κ
χ
x
c^ l
CM
XJ ir»
O x * *
X
K
X
κ X
0
χ g
£
C C* X 6
c
C5
X
υo σ»
1. i X
xx
5 ^ * X c
ca. «Ό ϊ( 3
g O
X.
v X °^
c5
ι o*
« Xx ^ «K CD
fsi
Ou
T3 "o l»
cD '»X
JC
·*
cS 1
•~
X
X
O
aE
ii
2 £*· 1
* W
o /
«»> t i 1 O Bn o°
O
U3 O
«N* O
fxi =>
C
S S g
sanpi sau lo
8c
'S "c
I 11
O
CO ii
S8
O o>
o X ^
o
o ll

^2 ol
sl
o
c
.go
•g
•
ff
O
to
s. σ> <
o 8
o
o
.c
o»
If
S ii
O >,
oi
ob
In particular the inconstancy of the variances is the and 1.165 respectively; the latter is significantly dif
reason for using only signs for the estimation of ferent from 1. We then move the extreme point to
and the test for linearity. Moreover, all measure the value (1180, 1190) and calculate a second time.
ment points (Xj,y») have equal weights in the estima The new results are 0.994 and 1.043, which are both
tion of the regression line; therefore extreme points not significant.
do not show undue influence on the calculation. The It can be argued that this is not ä'fair comparison
same is true if the ränge of concentration is rather with a pf ocedure for which the sampling of the data
large (i.e. the ränge covers several powers of 10). is not adequate. However, the data comes from a
These theoretical arguments are supported by three real experiment and the standardized principal com
figures all based on the same set of samples measur ponent analysis is a recommended and often used
ed by methods l, 2 and 3. In figure l method l evaluation procedure (4).
is assigned to X and method 2 to Y. In figure 2 this The procedures [1] to [4] are related to each other.
assignment is interchanged. Obviously, all plots with For instance, the slope estimated by procedure [4] is
in figure 2 are obtained from the corresponding ones eqüal to the geometric mean of the slope from pro
in figure l by reflection, showing the independence cedure [1] and the reciprocal of the slope from pro
of the assignment to X and Y. We have chosen this cedure [2]. Furthermore, the slope of [4] will al·
example to demonstrate this property of our proce ways be greater than or equal to that of [1], because
dure even though the estimation of is clearly dif b[4] = b[l]/r. Since b[4] is just the ratio of two
ferent from 1. Standard deviations, it is independent of any jpint
In figure 3 we cfemonstrate how one extreme point function of X and Y. Under the assumption of normal·
can influence the outcpme of the comparison of ly distributed error terms and expected values
method l with method 3. First we calculate our the ratio
procedure and the standardized principal component
with the original data set, i.e. including the extreme
point (1180, 1398). The estimations of are 0.998
is estimated by the ratio of the Standard deviations
of the measurements. Since the Standard deviation
depends only on the sum of the variances of the two
UOOF
Variation components it cannot reflect any independ
ent change in their distributional properties. In con
trast the estimator
b = med \
present for various Computers. For small desk Com considered: normal distribution, mixture of two nor
puters with a Standard memory size we have written mal distributions, and a skew distribution. c is varied
a PASCAL program which allows the evaluation of between 2 and oo, n from 40 to 90 and both CV's
up to 70 samples. With sufficient memory this num are varied independently of each other from l % to
ber can easily be extended. This program is avail 13%. The slope b is calculated for every of 500 data
able on request. In addition a BASIC program writ sets which are generated for each choice of para
ten for a HP 85 desk Computer can be requested; it meters and distributions and the median of this 500
can easily be adapted to the B ASICversion of other slope estimations is computed. The deviation of this
Computers. median from β = l is an estimate of the bias of b.
From the Simulation we find that b is unbiased for
CV's < 7%. The details and the behaviour of our
procedure compared with 6 others are given in part II
of this paper.
5. Appendix: Mathematical Derivations From the above, it follows that a estimates a.
7. What does b estimate?
The values for Sjj are identically distributed but not 2. The procedure is independent ofthe assignment to
independent. Therefore the sample median of the Sy X and Υ
may give a biased estimation of . It is plausible
that a somehow shifted median would be a better For
estimator. We cannot prove theoretically that the y* = a + bx* and x* = A + By*
median shifted by our offset K is unbiased. How we show that
ever, we can demonstrate einpirically that our proce
dure estiinates β correctly in the case of the null B=   .
hypothesis by using the following Simulation model. D D
We define
Let [Cu,c0] be the common r nge of concentrations
in which both methods are applicable. It is assumed arctg Sjj
that both methods have constant coefficients of Varia if  l < Sij < oo
tion CV^ and Ονη in [CU,CQ]. Let (i.e. 45°<arctgS ij <90°)
(1)
c0 arctg Sy + 180°
c: =
if  oo < Sjj < 1
(i.e. 90° < arctg SV) < 45°)
The r nge of both methods is transformed into the
The domain of ω,j with 45° < ω^ < 135° lies sym
interval — , l ; in doing so CV^ and CVη re metrical to 45° which corresponds to the ideal slope
of l for a regression line in method comparison.
Since Sy = — l cannot uniquely be assigned to ω^ , we
in unchanged. On — , I n samples are drawn
main have the choice of including these values in both
assignments, or of excluding them — s we have
with "true values" x? and yf = x? for i = l, ..., n done — from the calculation. If we now interchange
from two different distributions respectively: one in X and Y, we find that ω^ is transformed to 90° 
(Oij and the rank order of the sorted ω^ is revers
which the x? are equidistant on l—*, l L and one ed, but not changed in the sequence. If the slope
conforms to
where the samples are skewly distributed over — , l . (2) b = tg med
L ^ J
it follows that
B = tg med {90° 
The "true values" x* and yf are distorted by inde
pendent "rneasurement errors" §j and r\{ giving = tg [90°  med
"measured values" Xi = x* + % and yi = y* 4 TJ, Three
types of distribution of "measurement errors" are tg med {o)ij}
To derive formula (2) let us consider the following Analogously we obtain for the intercept after inter
two ways of ranking on R \ {—1} u { — » , 00} changing X and Y:
which for simplicity's sake are given in graphical
A = med {xj —Byj}
form:
l
I. r ( t + 00
(5) = — med {bxj  b · Byj}
D ·r
II. i loo1 l l a
= —gmed{yibXi} =  — .
Ranking according to I gives the natural rank order
of the Sy; ranking according to Π shows the cor The confidence interval for α can be transformed
responding order of the Sjj with Sy = tgcoy, if the <i)jj in the same manner:
are ranked ίη the natural order.
AU = med {XiBLyi}
Clearly, the sequence in II is the same s in I for the
region(l, + oo] 5 onlythefirstKvalueswithSij < l τ—
are added to the end by a left roundshift. There DU
fore, if we sort the Sy according to I it is sufficierit

to use K s an offset for the deterrnination of the bu'
median with respect to rank order II.
and it follqws that
if N odd
b= (rank
order I), The result of testing the hypotheses α = 0 is thefe
fore independent f the choice of X.
if N even In the cusumtest the rank order of the O} and of the
(3) A femains unchanged if X and Y are interchanged,
= med {Sjj} (rank order II)
only the sign of the η is revefsed. Since the test
= med tg { )ij} (natural ranking) statistic is j cusum (i)  the result is independent of
= tg med (natural ranking) the assignment.
The last equality is exact only for odd N's; how
ever even for N > 40 the difference 3. J stification of confidence intervals
Let
'(N + l  M,) (6) K( oo,  D < MI and K(_ Iv0) < MI
hold.
= l (rank order I,
SCN + IM. + K) X and Y inter To justify the formula for M! and to give a sufficient
changed) condition for formula (6) we proceed s follows. In
The result of testing the hypothesis β = l is there I.e. (17) it is shown that a confidence interval for β
fore independent of the assignment of the methods can be constructed by determining all those 's for
to X and Y. which
Xi and R. =  x
Brought to you by  Memorial University of Newfoundland
J. Clin. Chem. Clin. Biochem. / Vol. 21» 1983 / No. 11
Authenticated  134.153.184.170
Download Date  5/24/14 6:53 PM
Passing and Bablok: A ncw proccdurc for testing the equality of measurcmcnts from two methods 719
are not significantly correlated according to Ken Therefore it can be concluded that for method com
daWs . Let parisons in clinical chemistry the proposed confid
enee interval for has the actual level of about 95%.
(7)
Q(ß) = * UM) l (XiXj) (RiRj) < 0} The empirical derivation of this Statement might seem
unsatisfactory. But the same Simulation model can
Then P(ß) + Q(ß) = N with probability 1. also be used to demonstrate the behaviour of the
From other regression procedures mentioned in chapter 2
(xs  Xj) (Ri  Rj) = (x,  Xj)2 (Sij  ß) under realistic conditions. In our second paper we
shall show the favourable properties of our method
follows that when compared with the others.
P(ß) = * {('O) l S« > ß}
A sufficient condition for (6) is
and r(k)
or
M,=· F,(X) 
r(k) > 0
We studied the properties of the confidenee interval
for on the definition of Mi in the Simulation model
and obtajned the following result: If both methods VFT *  > w
have the same precision then in all cases the actual
r« < 0
confidenee level is about 95%; it is never less thän l
91% or higher than 96%. More details are given in VML 'r(k)
part II of this paper.
References
1. Halperin, M. (1961) J. Amer. Statist. Assoc. 56, 657669. 11. Maritz, J.S. (1981) DistributionFree Statistical Methods,
2. Madansky, A. (1959) J. Amer. Statist. Assoc. 54, 173205. Chapman and Hall, London.
3. Stamm, D. (1979) J. Clin. Chem. Clin. Biochem. 77, 277 12. Theil, H. (1950) Proc. Kon. Ned. Akad. v. Wetensch, AS 3,
279 and 280282. Part I 386392, part II 521525, part III 13971412.
4. Haeckel, R. (1982) J. Clin. Chem. Clin. Biochem. 20, 107 13. Bradley, J. V. (1968) Distribution free Statistical tests. Prent
110. ice Hall, Englewood Cliffs, N.J.
5. Feldmann, U., Schneider, B., Klinkers, H. & Haeckel, R. 14. Wold, S. & Sjöström, M. (1978) Lirtear free energy relation
(1981) J. Clin. Chem. Clin. Biochem. 79, 121137. ship äs tools for investigating chemical similarity — Theory
6. Ricker, W.E. (1973) J. Fish. Res. Board Can. 30, 409434. and Practice, In: Correlation Anälysis in Chemistryj Recent
7. Jolicoeur, P. (1975) J. Fish. Res. Board Can. 32,14911494. Advances (Chapman, N.B. & Shorter, J., eds.), Plenum Press,
8. Michotte, Y. (1978) Evaluation of precision and accuracy  New York and London.
comparison of two procedures, In: Evaluation and optimiza 15. Van Dobben de Bruyn, C.S. (1968) Cumulative Sum Tests:
tion of laboratory methods and analytical procedures (Mas Theory and Practice, Griffin, London.
sart, D. L., Dijkstra, A. & Kaufmann, L., eds.), Eisevier, Am 16. Cornbleet, P.J. & Shea, M.C, (1978) Clin. Chem. 24, 857
sterdam, Oxford, New York. 861.
9. Heiler, S. (1980) Robuste Schätzung im Linearen Modell, In: 17. Hollander, M. & Wolfe, D. A. (1973) Nonparametric Statist
Robuste Verfahren (Nowak, H. & Zentgraf, R., eds.), Sprin ical Methods, J. Wiley & Sons, New York.
ger Verlag, Berlin. 18. Witting, H. & Nölle, G. (1970) Angewandte mathematische
10. Wolf, G. K. (1980) Praktische Erfahrung mit Rrobusten Ver Statistik, B. G. Teubner. Stuttgart.
fahren bei klinischen Versuchen, In: Robuste Verfahren (No
wak, H. & Zentgraf, R., eds.), Springer Verlag, Berlin.
Dr. H. Passing
Äbtig, für Praktische Mathematik, Hoechst AG
D.6230 Frankfurt/Main 80
W. Bablok  v
Allg. Biometrie, Boehringer Mannheim GmbH
D6800 Mannheim 3!