
DOCUMENT RESUME

ED 216 874                                SE 037 439

AUTHOR      O'Brien, Francis J., Jr.
TITLE       Proof That the Sample Bivariate Correlation
            Coefficient Has Limits (Plus or Minus) 1.
PUB DATE    82
NOTE        32p.

EDRS PRICE  MF01/PC02 Plus Postage.
DESCRIPTORS *Correlation; *Educational Research; Higher
            Education; Mathematical Formulas; *Proof
            (Mathematics); *Research Tools; *Statistics
IDENTIFIERS *Applied Statistics

ABSTRACT
This paper presents in detail a proof of the limits of the sample bivariate correlation coefficient which requires only knowledge of algebra. Notation and basic formulas of standard (z) scores, bivariate correlation formulas in unstandardized and standardized form, and algebraic inequalities are reviewed first, since they are necessary for understanding the proof. Then the proof is presented, with an appendix containing additional proofs of related material. (NNS)


Proof That the Sample Bivariate Correlation Coefficient Has Limits ±1

Francis J. O'Brien, Jr., Ph.D.

Virtually all social science students who have studied applied statistics have been introduced to the concepts and formulas of linear correlation of two variables. Applied statistics textbooks routinely report the theoretical limits of the bivariate correlation coefficient; namely, that the coefficient is no more than +1 and no less than -1. However, no commonly used applied statistics textbook proves this. One of the best textbooks available to students of education and psychology introduces the proof (Glass and Stanley, 1970). Undoubtedly, one of the constraints placed on authors by publishers is the space available for detailed explanations, derivations and proofs.

This paper will set forth in detail a proof of the limits of the sample bivariate correlation coefficient. Since the proof requires only knowledge of algebra, most students of applied statistics at the advanced undergraduate or introductory graduate level should have little difficulty in understanding the proof. As a former instructor of graduate-level introductory applied statistics, I know that the typical student can understand the proof as it is presented here.
The key for understanding statistical proofs is a presentation of detailed steps in a well articulated and coherent manner. A review of relevant statistical and mathematical concepts is also helpful (and usually required). When students are presented important statistical proofs in detail, they feel that some of the mystery and magic of mathematics has been unveiled. My experience has been that the typical student of applied statistics can follow a good number of proofs because most proofs can be presented algebraically without use of calculus. In addition to enhancing knowledge, an occasional proof often increases academic motivation.

Some Preliminary Concepts

The proof requires knowledge of several concepts in statistics and mathematics. In order to make this paper self-contained, some preliminary concepts stated in a consistent notation will be reviewed. We will review the concepts and formulas of standard scores (z scores), bivariate correlation formulas in unstandardized and standardized form, and algebraic inequalities.

Notation and Basic Formulas

Table 1 is a layout of symbolic values written in the notation to be used in this paper. The model presented in Table 1 is of two measures in unstandardized (raw score) and standardized (z score) form. Table 2 presents some familiar formulas based on unstandardized variables that will be useful for the development of the proof.

1. This paper is one of a series contemplated for publication [see O'Brien, 1982]. Eventually I hope to present a textbook of applied statistics proofs and derivations to supplement standard applied statistics textbooks.

Table 1

Table Layout for Two Measures in Unstandardized and Standardized Form

Case   Measure X (unstandardized / standardized)          Measure Y (unstandardized / standardized)
1      $X_1$    $(X_1-\bar{X})/S_x = Z_{x_1}$             $Y_1$    $(Y_1-\bar{Y})/S_y = Z_{y_1}$
2      $X_2$    $(X_2-\bar{X})/S_x = Z_{x_2}$             $Y_2$    $(Y_2-\bar{Y})/S_y = Z_{y_2}$
3      $X_3$    $(X_3-\bar{X})/S_x = Z_{x_3}$             $Y_3$    $(Y_3-\bar{Y})/S_y = Z_{y_3}$
...
i      $X_i$    $(X_i-\bar{X})/S_x = Z_{x_i}$             $Y_i$    $(Y_i-\bar{Y})/S_y = Z_{y_i}$
...
n      $X_n$    $(X_n-\bar{X})/S_x = Z_{x_n}$             $Y_n$    $(Y_n-\bar{Y})/S_y = Z_{y_n}$

Sample Size       $n_x$       $n_{z_x}$       $n_y$       $n_{z_y}$
Sample Mean       $\bar{X}$   $\bar{Z}_x$     $\bar{Y}$   $\bar{Z}_y$
Sample Variance   $S_x^2$     $S_{z_x}^2$     $S_y^2$     $S_{z_y}^2$

NOTE: All sample size terms are equal; that is, $n_x = n_{z_x} = n_y = n_{z_y}$. Any of these sample size terms could be identified by just one symbol, such as n. We will use n when it is not important to distinguish among the sample size terms, but will use the table values above when it is necessary or important to do so.

Table 2

Relevant Formulas for Unstandardized Measures

                  Measure X                                              Measure Y

Sample Mean       $\bar{X} = \dfrac{\sum_{i=1}^{n_x} X_i}{n_x}$           $\bar{Y} = \dfrac{\sum_{i=1}^{n_y} Y_i}{n_y}$

Sum               $n_x\bar{X} = \sum_{i=1}^{n_x} X_i$                     $n_y\bar{Y} = \sum_{i=1}^{n_y} Y_i$

Sample Variance   $S_x^2 = \dfrac{\sum_{i=1}^{n_x}(X_i-\bar{X})^2}{n_x-1}$   $S_y^2 = \dfrac{\sum_{i=1}^{n_y}(Y_i-\bar{Y})^2}{n_y-1}$

Sum of Squares    $(n_x-1)S_x^2 = \sum_{i=1}^{n_x}(X_i-\bar{X})^2$         $(n_y-1)S_y^2 = \sum_{i=1}^{n_y}(Y_i-\bar{Y})^2$

NOTES:

1. The sample size terms are equal: $n_x = n_y$. Also $n_x = n_y = n$.

2. "Sum" is simply an algebraic manipulation of "Sample Mean"; i.e., multiply over the sample size term in "Sample Mean" to get "Sum". Also, "Sum of Squares" is such a manipulation based on "Sample Variance". "Sum" and "Sum of Squares" will be useful later on.

3. Descriptive statistics for standardized scores will be developed in the body of the text.
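As a supplement not in the original paper, the short Python sketch below illustrates the Table 2 formulas on a small made-up sample; the data values are hypothetical, chosen only to make the arithmetic easy to follow.

```python
# A minimal numerical illustration of the Table 2 formulas (hypothetical data).
X = [2.0, 4.0, 6.0, 8.0]

n_x = len(X)
mean_x = sum(X) / n_x                      # Sample Mean: X-bar = (sum of X_i)/n_x
sum_x = n_x * mean_x                       # Sum: n_x * X-bar = sum of X_i
ss_x = sum((x - mean_x) ** 2 for x in X)   # Sum of Squares: sum of (X_i - X-bar)^2
var_x = ss_x / (n_x - 1)                   # Sample Variance: SS / (n_x - 1)

print(mean_x, sum_x, ss_x, var_x)          # 5.0  20.0  20.0  6.666...
# Check the "Sum of Squares" identity from Table 2: (n_x - 1) * S_x^2 = SS
assert abs((n_x - 1) * var_x - ss_x) < 1e-12
```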

Standard Scores

It will be recalled that the standard score for an unstandardized measure (raw score) is "the score minus the mean divided by the standard deviation". For case 1 of measure X in Table 1, the standard (z) score is:

$$Z_{x_1} = \frac{X_1 - \bar{X}}{S_x}$$

For any (hypothetical) case i, the standard score of an X measure is:

$$Z_{x_i} = \frac{X_i - \bar{X}}{S_x}$$

The same procedure can be applied to Y measures. For case 1:

$$Z_{y_1} = \frac{Y_1 - \bar{Y}}{S_y}$$

Similarly, for the ith (hypothetical) case, we have:

$$Z_{y_i} = \frac{Y_i - \bar{Y}}{S_y}$$

Since a standard score distribution (such as in Table 1) is a distribution of variable measures, we can calculate means, standard deviations, variances, correlations, and so forth, just as we can calculate these statistics for unstandardized measures.

Most students will recall that the mean of z scores is equal to 0 and the standard deviation (and variance) of standardized scores is equal to 1. (The proof of these statements is given in the Appendix.1)

The mean of X standardized scores is defined as:

$$\bar{Z}_x = \frac{\sum_{i=1}^{n_{z_x}} Z_{x_i}}{n_{z_x}}$$

Similarly, for Y measures:

$$\bar{Z}_y = \frac{\sum_{i=1}^{n_{z_y}} Z_{y_i}}{n_{z_y}}$$

The variance of X in z score notation is defined as:

$$S_{z_x}^2 = \frac{\sum_{i=1}^{n_{z_x}} (Z_{x_i} - \bar{Z}_x)^2}{n_{z_x} - 1}$$

1. The Appendix contains proofs of certain concepts or relationships that may be of interest to the reader but are not crucial for the development of the proof in this paper (the theoretical limits of the sample bivariate correlation coefficient).

For the standardized Y measure the variance is defined as:

$$S_{z_y}^2 = \frac{\sum_{i=1}^{n_{z_y}} (Z_{y_i} - \bar{Z}_y)^2}{n_{z_y} - 1}$$

Sum of Squared Standard Scores

To understand the proof it is necessary to know the result of summing a distribution of squared standard scores. If we square each standard score for the X measure in Table 1 and sum them, we obtain:

$$\sum_{i=1}^{n_{z_x}} Z_{x_i}^2 = Z_{x_1}^2 + Z_{x_2}^2 + \cdots + Z_{x_n}^2$$

If we substitute the appropriate means and variances in the right hand side of the expression, we obtain:

$$\sum_{i=1}^{n_{z_x}} Z_{x_i}^2 = \frac{(X_1-\bar{X})^2}{S_x^2} + \frac{(X_2-\bar{X})^2}{S_x^2} + \cdots + \frac{(X_n-\bar{X})^2}{S_x^2}$$

Since $S_x^2$ is a constant, we can factor it outside and write:

$$\sum_{i=1}^{n_{z_x}} Z_{x_i}^2 = \frac{1}{S_x^2}\left[(X_1-\bar{X})^2 + (X_2-\bar{X})^2 + \cdots + (X_n-\bar{X})^2\right]$$

Rewriting the right hand side in summation notation, we obtain:

$$\sum_{i=1}^{n_{z_x}} Z_{x_i}^2 = \frac{\sum_{i=1}^{n_x}(X_i-\bar{X})^2}{S_x^2}$$

From Table 2, we know that we can substitute the sum of squares term into the numerator on the right hand side. This results in:

$$\sum_{i=1}^{n_{z_x}} Z_{x_i}^2 = \frac{(n_x-1)S_x^2}{S_x^2} = n_x - 1$$

(Recall that $n_x - 1 = n - 1$.)

If we were to work through the same steps for Y, we would obtain:

$$\sum_{i=1}^{n_{z_y}} Z_{y_i}^2 = \frac{(n_y-1)S_y^2}{S_y^2} = n_y - 1$$

(Recall that $n_y - 1 = n - 1$.)

These relationships between squared z scores and sample size are very important for the proof later on. They will be summarized later for easy reference.
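The result that the squared z scores sum to n - 1 is easy to check numerically. The following Python sketch (an added illustration, not part of the original paper) standardizes a small hypothetical sample and confirms the identity, along with the mean-of-zero property proven in the Appendix.

```python
# Verify numerically that the squared z scores sum to n - 1 (hypothetical data).
X = [10.0, 40.0, 20.0, 50.0, 75.0, 90.0]

n = len(X)
mean_x = sum(X) / n
var_x = sum((x - mean_x) ** 2 for x in X) / (n - 1)   # S_x^2 with the n-1 divisor
sd_x = var_x ** 0.5

z = [(x - mean_x) / sd_x for x in X]                  # z_i = (X_i - X-bar)/S_x

print(sum(z))                    # ~0: the mean of z scores is 0 (Appendix, proof 1)
print(sum(zi ** 2 for zi in z))  # n - 1 = 5: the key identity of this section
```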

Correlation Formulas

Unstandardized Form

Using the notation and variables in Tables 1 and 2, the unstandardized form of the correlation for two measures (X and Y) is defined as follows:

$$r_{xy} = \frac{\dfrac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{n-1}}{\sqrt{\dfrac{\sum_{i=1}^{n_x}(X_i-\bar{X})^2}{n_x-1}}\;\sqrt{\dfrac{\sum_{i=1}^{n_y}(Y_i-\bar{Y})^2}{n_y-1}}}$$

Note that the numerator contains the term $n-1$ because it is not important or necessary to distinguish between $n_x-1$ and $n_y-1$. However, in the denominator it is helpful to distinguish $n_x-1$ from $n_y-1$. In any case, all of the sample size terms would be equal to the same numerical value if the correlation coefficient were computed on a set of data ($n_x-1 = n_y-1 = n-1$).

Standardized Form

The correlation of measure X and measure Y in standard score form is defined as follows:

$$r_{z_x z_y} = \frac{\dfrac{\sum_{i=1}^{n}(Z_{x_i}-\bar{Z}_x)(Z_{y_i}-\bar{Z}_y)}{n-1}}{\sqrt{\dfrac{\sum_{i=1}^{n_{z_x}}(Z_{x_i}-\bar{Z}_x)^2}{n_{z_x}-1}}\;\sqrt{\dfrac{\sum_{i=1}^{n_{z_y}}(Z_{y_i}-\bar{Z}_y)^2}{n_{z_y}-1}}}$$

It is proven in the Appendix that this correlation formula is equal to:

$$r_{z_x z_y} = \frac{\sum_{i=1}^{n} Z_{x_i} Z_{y_i}}{n-1}$$

If we rearrange this formula by multiplying over the $n-1$ term, we obtain:

$$(n-1)\,r_{z_x z_y} = \sum_{i=1}^{n} Z_{x_i} Z_{y_i}$$

This relationship will be useful in the proof; it will be restated for easy reference later.

The reader may recall that the same correlation coefficient results whether the variables are in raw score form or standard score form. That is:

$$r_{xy} = r_{z_x z_y}$$

This statement is proven in the Appendix. We will restate it prior to the proof for the reader's convenience.
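To make the equivalence concrete, here is a short Python sketch (added for illustration; the data are hypothetical) that computes the correlation from raw scores and from z scores and shows that $r_{xy} = r_{z_x z_y} = \sum Z_{x_i} Z_{y_i}/(n-1)$.

```python
# Compare the raw score and standard score forms of the correlation (hypothetical data).
X = [1.0, 2.0, 4.0, 5.0, 8.0]
Y = [2.0, 1.0, 5.0, 4.0, 9.0]
n = len(X)

def mean(v):
    return sum(v) / len(v)

def sd(v):
    m = mean(v)
    return (sum((vi - m) ** 2 for vi in v) / (len(v) - 1)) ** 0.5

# Unstandardized form: covariance divided by the product of standard deviations.
r_raw = sum((x - mean(X)) * (y - mean(Y)) for x, y in zip(X, Y)) / (n - 1) / (sd(X) * sd(Y))

# Standardized form: sum of z score products divided by n - 1.
zx = [(x - mean(X)) / sd(X) for x in X]
zy = [(y - mean(Y)) / sd(Y) for y in Y]
r_z = sum(a * b for a, b in zip(zx, zy)) / (n - 1)

print(r_raw, r_z)   # identical, as proven in the Appendix
```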

Inequalities

Before starting the proof it is necessary to review one further topic: algebraic inequalities. In the proof we are required to manipulate inequalities of the form "greater than or equal to" and "less than or equal to". An example will serve as a refresher. For two variables, say A and B, we can write:

$$A \geq B$$

which means "A is greater than or equal to B". Equivalently, we can write:

$$B \leq A$$

which means the same thing: "B is less than or equal to A". For example, $3 \geq 1$, or equivalently, $1 \leq 3$. All of this may seem obvious. What students sometimes forget is what happens when multiplying or dividing by negative quantities. For example, if $3 > 1$ and we multiply this inequality by $-1$, we would obtain:

$$-1[(3) > (1)] \;\Rightarrow\; -3 < -1$$

That is, the inequality sign is reversed when multiplying by a negative number. The same result occurs for more complex expressions. For example:

$$1 - A \geq B$$

Multiplying each side of the inequality by $-1$, we obtain:

$$-1[(1-A) \geq (B)] \;\Rightarrow\; A - 1 \leq -B$$

Example: $1 - \tfrac{1}{4} \geq 0$. Multiplying through by $-1$, we obtain:

$$-1[(1-\tfrac{1}{4}) \geq 0] \;\Rightarrow\; \tfrac{1}{4} - 1 \leq 0$$

Summary of Important Concepts

We have reviewed standard scores (z), correlation formulas and algebraic inequalities. All of these concepts are important for understanding the proof that follows. For the reader's convenience, we will summarize these concepts for easy reference. This is done in Table 3.

Table 3

Summary of Important Concepts

1. $\sum_{i=1}^{n} Z_{x_i}^2 = n-1$

2. $\sum_{i=1}^{n} Z_{y_i}^2 = n-1$

3. $r_{xy} = r_{z_x z_y} = \dfrac{\sum_{i=1}^{n} Z_{x_i} Z_{y_i}}{n-1}$

4. $(n-1)\,r_{xy} = \sum_{i=1}^{n} Z_{x_i} Z_{y_i}$

5. $-1[(1-A) \geq (B)] \;\Rightarrow\; (A-1) \leq -B$

Proof

We are now ready to present the proof. Formally, we want to prove the following statements:

$$r_{xy} \geq -1 \qquad\text{and}\qquad r_{xy} \leq +1$$

Writing each of the statements in one linear form:

$$-1 \leq r_{xy} \leq +1$$

This states the same information as the above two separate statements. The proof consists of two parts: one part shows the lower limit of $r_{xy}$ (i.e., $r_{xy} \geq -1$), and the second part shows the upper limit of $r_{xy}$ (i.e., $r_{xy} \leq +1$). We will prove the upper limit first.

Proof that $r_{xy} \leq +1$

To prove this limit, we will perform algebraic manipulations on a statement which is mathematically true. That statement is:

$$\sum_{i=1}^{n}(Z_{x_i} - Z_{y_i})^2 \geq 0$$

In words, the statement means: the sum of squared differences of n standardized value pairs will always be equal to or greater than 0. The reader may refer to Table 1 for clarification. The squared differences are taken in each row (pairs) of Z values, starting at $Z_{x_1}, Z_{y_1}$ and continuing down to the last pair of Z's ($Z_{x_n}, Z_{y_n}$).

Most students readily agree that the squared sum will be greater than 0. But can it ever be exactly equal to 0? Yes, theoretically it can. Referring to Table 1, if one imagines each standardized X and Y measure to have the same numerical value,1 then it is apparent that each difference will be 0; so the squared value of 0 is also 0. Now, a sum of squared 0's will itself be equal to 0. While it may be unlikely to occur in practice, it is only required that

$$\sum_{i=1}^{n}(Z_{x_i} - Z_{y_i})^2 \geq 0$$

be true in a mathematical sense. Thus, the statement is true.

We will expand this squared sum, perform algebraic manipulations and substitutions, and arrive at the proof for the upper limit of the sample correlation coefficient. The actual steps in the derivation will now be presented. Notes pertaining to the algebra are provided for the reader's reference; refer to Tables 1, 2 and 3 as needed. It is suggested that the reader first examine each algebraic statement, then read the accompanying note for explanation.

1. That is, within pairs, not all pairs. Example:

   $Z_{x_i}$:  1.41   -.68   .05   etc.
   $Z_{y_i}$:  1.41   -.68   .05   etc.

Step 1. $\sum_{i=1}^{n}(Z_{x_i} - Z_{y_i})^2 \geq 0$

Note: the true statement from before.

Step 2. $\sum_{i=1}^{n}(Z_{x_i}^2 + Z_{y_i}^2 - 2Z_{x_i}Z_{y_i}) \geq 0$

Note: squaring each term, we obtain an expansion of the binomial in this form: $(A-B)^2 = A^2 + B^2 - 2AB$.

Step 3. $\sum_{i=1}^{n} Z_{x_i}^2 + \sum_{i=1}^{n} Z_{y_i}^2 - 2\sum_{i=1}^{n} Z_{x_i}Z_{y_i} \geq 0$

Note: distributing the summation operator to each term, and bringing the constant 2 outside the summation sign.

Step 4. $(n-1) + (n-1) - 2(n-1)r_{xy} \geq 0$

Note: this next step is very important. We substitute three quantities, all from Table 3:

$$\sum_{i=1}^{n} Z_{x_i}^2 = n-1, \qquad \sum_{i=1}^{n} Z_{y_i}^2 = n-1, \qquad \sum_{i=1}^{n} Z_{x_i}Z_{y_i} = (n-1)\,r_{xy}$$

Step 5. $2(n-1) - 2(n-1)r_{xy} \geq 0$

Note: collecting the like terms of $(n-1)$.

Step 6. $2(n-1)(1 - r_{xy}) \geq 0$

Note: factoring out the $2(n-1)$ term.

Step 7. $1 - r_{xy} \geq 0$

Note: dividing each side of the inequality by $2(n-1)$, which does not change the inequality sign, as $2(n-1)$ is always positive because n must always be greater than 1.

Step 8. $r_{xy} - 1 \leq 0$

Note: here we make use of multiplying an inequality by a negative number, which reverses the inequality sign (see Table 3). Multiplying each side by $-1$ reverses the inequality and the sign of $1 - r_{xy}$.

Step 9. $r_{xy} \leq +1$

Note: now add +1 to each side. This gives us the result. END OF PROOF FOR UPPER LIMIT.

Proof that $r_{xy} \geq -1$

Part two of the proof will be much simpler because the structure of the proof is very much like the first part. We will follow the same basic steps. We start out with a statement that is mathematically true, namely:

$$\sum_{i=1}^{n}(Z_{x_i} + Z_{y_i})^2 \geq 0$$

Again, this statement is true in a mathematical sense even though the "equals 0" aspect is very unlikely to occur in statistical practice. The development of the proof with appropriate notes follows.

Step 1. $\sum_{i=1}^{n}(Z_{x_i} + Z_{y_i})^2 \geq 0$

Note: the statement restated. Squaring each term results in a binomial expansion in this form: $(A+B)^2 = A^2 + B^2 + 2AB$.

Step 2. $\sum_{i=1}^{n} Z_{x_i}^2 + \sum_{i=1}^{n} Z_{y_i}^2 + 2\sum_{i=1}^{n} Z_{x_i}Z_{y_i} \geq 0$

Note: distributing the summation operator and bringing out the 2.

Step 3. $(n-1) + (n-1) + 2(n-1)r_{xy} \geq 0$

Note: making the same three substitutions as in part one.

Step 4. $2(n-1)(1 + r_{xy}) \geq 0$

Note: adding like terms and factoring.

Step 5. $1 + r_{xy} \geq 0$

Note: dividing each side by $2(n-1)$.

Step 6. $r_{xy} \geq 0 - 1$

Note: adding $-1$ to each side.

Step 7. $r_{xy} \geq -1$

Note: simplifying. END OF PROOF FOR LOWER LIMIT.

We have just proven that $-1 \leq r_{xy} \leq +1$. See the Appendix for additional proofs of related material.
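As an informal numerical companion to the proof (not in the original paper), the sketch below draws many random samples and confirms that the computed $r_{xy}$ never leaves $[-1, +1]$, and that the limits are attained for exactly linear data, where the z score pairs are identical or opposite in sign.

```python
# Empirical check of -1 <= r_xy <= +1 over random samples (illustration only).
import random

def corr(X, Y):
    n = len(X)
    mx, my = sum(X) / n, sum(Y) / n
    sx = (sum((x - mx) ** 2 for x in X) / (n - 1)) ** 0.5
    sy = (sum((y - my) ** 2 for y in Y) / (n - 1)) ** 0.5
    return sum((x - mx) * (y - my) for x, y in zip(X, Y)) / ((n - 1) * sx * sy)

random.seed(1)
for _ in range(1000):
    X = [random.gauss(0, 1) for _ in range(20)]
    Y = [random.gauss(0, 1) for _ in range(20)]
    assert -1.0 - 1e-12 <= corr(X, Y) <= 1.0 + 1e-12

# The limits are attained for exact linear relationships (Z_x = Z_y or Z_x = -Z_y):
X = [1.0, 2.0, 3.0, 4.0]
print(corr(X, [2 * x + 5 for x in X]))    # +1.0
print(corr(X, [-3 * x + 1 for x in X]))   # -1.0
```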

APPENDIX

Selected Proofs

1. That the mean of standard scores is equal to 0.

We will start with the definition of the mean of z scores for the X measure:

$$\bar{Z}_x = \frac{\sum_{i=1}^{n_{z_x}} Z_{x_i}}{n_{z_x}}$$

Expanding the right side:

$$\bar{Z}_x = \frac{1}{n_{z_x}}\left[\frac{X_1-\bar{X}}{S_x} + \frac{X_2-\bar{X}}{S_x} + \cdots + \frac{X_n-\bar{X}}{S_x}\right]$$

Factoring the constant $S_x$ outside and rewriting the sum of deviations in summation notation:

$$\bar{Z}_x = \frac{1}{n_{z_x} S_x}\sum_{i=1}^{n_x}(X_i - \bar{X})$$

Distributing the summation sign inside the parentheses:

$$\bar{Z}_x = \frac{1}{n_{z_x} S_x}\left(\sum_{i=1}^{n_x} X_i - n_x\bar{X}\right)$$

Since $\sum_{i=1}^{n_x} X_i$ is equal to $n_x\bar{X}$, and the sum of the constant $\bar{X}$ is $n_x\bar{X}$,1 we obtain:

$$\bar{Z}_x = \frac{1}{n_{z_x} S_x}\left(n_x\bar{X} - n_x\bar{X}\right) = 0$$

Thus the mean of $Z_x$ scores is equal to 0. Similar reasoning for the Y measure will produce the same result, namely:

$$\bar{Z}_y = \frac{\sum_{i=1}^{n_{z_y}} Z_{y_i}}{n_{z_y}} = \frac{1}{n_{z_y} S_y}\left(n_y\bar{Y} - n_y\bar{Y}\right) = 0$$

Therefore, variables in standardized form have mean equal to 0.

1. Recall that when taking the sum of a constant (say C) we have:

$$\sum_{i=1}^{n} C = C + C + C + \cdots + C = nC$$

That is, the sum of a constant is equal to the constant times the number of terms added (in this case, n).

2. That the variance and standard deviation of standard scores is equal to 1.

By definition, the variance for X measures in standard score form is:

$$S_{z_x}^2 = \frac{\sum_{i=1}^{n_{z_x}}(Z_{x_i} - \bar{Z}_x)^2}{n_{z_x}-1}$$

Since we know that $\bar{Z}_x = 0$, we now have:

$$S_{z_x}^2 = \frac{\sum_{i=1}^{n_{z_x}} Z_{x_i}^2}{n_{z_x}-1}$$

If we rewrite $Z_{x_i}$ in terms of the unstandardized mean and standard deviation:

$$S_{z_x}^2 = \frac{\sum_{i=1}^{n_{z_x}}\left[\dfrac{X_i-\bar{X}}{S_x}\right]^2}{n_{z_x}-1}$$

Rearranging terms:

$$S_{z_x}^2 = \frac{1}{n_{z_x}-1}\cdot\frac{\sum_{i=1}^{n_x}(X_i-\bar{X})^2}{S_x^2}$$

From Table 2, we can substitute into the numerator the "sum of squares" for the X measure. Since $n_x = n_{z_x}$, this results in:

$$S_{z_x}^2 = \frac{(n_x-1)S_x^2}{(n_x-1)S_x^2}$$

We can cancel terms, leaving:

$$S_{z_x}^2 = 1$$

Similar reasoning for Y standardized measures will produce, as the next to the last step in the derivation:

$$S_{z_y}^2 = \frac{(n_y-1)S_y^2}{(n_{z_y}-1)S_y^2} = 1 \qquad\text{since } n_y = n_{z_y}$$

In each case, the standard deviation for the appropriate variance term is simply the square root of 1. That is:

$$S_{z_x} = \sqrt{S_{z_x}^2} = \sqrt{1} = 1 \qquad\text{and}\qquad S_{z_y} = \sqrt{S_{z_y}^2} = \sqrt{1} = 1$$

Thus the variance and standard deviation of z scores are equal to 1.

3. That $r_{xy} = r_{z_x z_y}$.

We want to show that when measures X and Y are converted to standard scores and correlated, the resulting correlation is the same as the correlation between the unstandardized (raw) measures of X and Y. Let us first rewrite the correlation formula for z scores:

$$r_{z_x z_y} = \frac{\dfrac{1}{n-1}\sum_{i=1}^{n}(Z_{x_i}-\bar{Z}_x)(Z_{y_i}-\bar{Z}_y)}{\sqrt{\dfrac{\sum_{i=1}^{n_{z_x}}(Z_{x_i}-\bar{Z}_x)^2}{n_{z_x}-1}}\;\sqrt{\dfrac{\sum_{i=1}^{n_{z_y}}(Z_{y_i}-\bar{Z}_y)^2}{n_{z_y}-1}}}$$

Since $\bar{Z}_x = \bar{Z}_y = 0$, we can simplify to get:

$$r_{z_x z_y} = \frac{\dfrac{1}{n-1}\sum_{i=1}^{n} Z_{x_i} Z_{y_i}}{\sqrt{\dfrac{\sum_{i=1}^{n_{z_x}} Z_{x_i}^2}{n_{z_x}-1}}\;\sqrt{\dfrac{\sum_{i=1}^{n_{z_y}} Z_{y_i}^2}{n_{z_y}-1}}}$$

In the denominator, we recognize that:

$$\sum_{i=1}^{n_{z_x}} Z_{x_i}^2 = n_{z_x}-1 \qquad\text{and}\qquad \sum_{i=1}^{n_{z_y}} Z_{y_i}^2 = n_{z_y}-1$$

Substituting these values, we obtain:

$$r_{z_x z_y} = \frac{\dfrac{1}{n-1}\sum_{i=1}^{n} Z_{x_i} Z_{y_i}}{\sqrt{\dfrac{n_{z_x}-1}{n_{z_x}-1}}\;\sqrt{\dfrac{n_{z_y}-1}{n_{z_y}-1}}}$$

The denominator cancels out completely, leaving:

$$r_{z_x z_y} = \frac{1}{n-1}\sum_{i=1}^{n} Z_{x_i} Z_{y_i}$$

(Recall that this relationship was used in the proof for the limits of $r_{xy}$.) Now, expanding the z score terms:

$$r_{z_x z_y} = \frac{1}{n-1}\sum_{i=1}^{n}\frac{(X_i-\bar{X})}{S_x}\cdot\frac{(Y_i-\bar{Y})}{S_y}$$

This is identical to:

$$r_{z_x z_y} = \frac{\dfrac{1}{n-1}\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{S_x S_y}$$

Recognizing that $S_x = \sqrt{S_x^2}$ and $S_y = \sqrt{S_y^2}$, we can write:

$$r_{z_x z_y} = \frac{\dfrac{1}{n-1}\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{S_x^2}\sqrt{S_y^2}}$$

Rewriting the denominator of the variance product term in raw score terms (see Table 2):

$$r_{z_x z_y} = \frac{\dfrac{1}{n-1}\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\dfrac{\sum_{i=1}^{n_x}(X_i-\bar{X})^2}{n_x-1}}\;\sqrt{\dfrac{\sum_{i=1}^{n_y}(Y_i-\bar{Y})^2}{n_y-1}}}$$

This is precisely the form for $r_{xy}$ that was defined earlier in the paper. Therefore, the correlation between measures in raw score and z score forms is identical.

References

Glass, Gene V and Stanley, Julian C. Statistical Methods in Education and Psychology. Englewood Cliffs, New Jersey: Prentice-Hall, 1970.

O'Brien, Francis J., Jr. A proof that t2 and F are identical: the general case. ERIC Clearinghouse for Science, Mathematics and Environmental Education, Ohio State University, April, 1982. (Note: the ED number for this document was not available at the time of this publication.)

DOCUMENT RESUME

ED 215 894                                SE 037 098

AUTHOR      O'Brien, Francis J., Jr.
TITLE       A Proof That t2 and F are Identical: The General Case.
PUB DATE    24 Apr 82
NOTE        20p.

EDRS PRICE  MF01/PC01 Plus Postage.
DESCRIPTORS *College Mathematics; Equations (Mathematics); Higher
            Education; Instructional Materials; *Mathematical
            Applications; Mathematical Concepts; *Mathematical
            Formulas; Mathematics; *Proof (Mathematics);
            *Research Tools; *Statistics; Supplementary Reading
            Materials

ABSTRACT
This document proves that the F statistic can be obtained by squaring t-test values, or that equivalent t-test values may be obtained by extracting the positive square roots of F values. Proof to varying degrees of completeness and accessibility has been given by other scholars, but generally these prior statements, particularly those available to students of education or psychology, focus on the special case when sample sizes are equal. No source could be found that provides a complete, detailed proof of the general case that was understandable to students of applied statistics. This document seeks to give a clear step-by-step proof, with a numerical example worked out, and a plan is provided for proving the special case. It is felt the reader should be able to follow the proof of the general case, and should therefore have little difficulty in translating the acquired knowledge into proving the special case. (MP)

A Proof that $t^2$ and F are Identical: the General Case

Francis J. O'Brien, Jr., Ph.D.

It is well known that a researcher who has collected data from two independent groups may perform either a t-test for independent samples or a one-way analysis of variance for two groups. This is because knowledge of results from one type of computation can be transformed into an equivalent result for the other type of computation. For example, if a t-test for independent samples is calculated, the equivalent F statistic can be obtained by squaring the t-test value. Analogously, if the researcher has available the F statistic obtained from a one-way analysis of variance for two groups, the equivalent t-test value may be obtained by extracting the positive square root of the F value. That is, $t^2 = F$ or $t = F^{1/2}$.

The proof that $t^2 = F$ has been given to varying degrees of completeness and accessibility to students by other scholars in professional journals (see Rucci and Tweney, 1980 for citations). Statistics textbooks commonly available to students of education or psychology occasionally provide hints for proving the special case of the relationship (when sample sizes are equal). (See, for example, Glass and Stanley, 1970.)

The motivation for presenting the proof is twofold. First, many prior statements of the proof for the general case (of unequal sample sizes) are either abbreviated, mathematically inaccessible or incomplete for understanding this important relationship. A search of the literature did not reveal a source that provided a complete, detailed proof of the general case that was understandable to students of applied statistics. Second, a full step-by-step proof for the general case will give readers a sense that statistics is not all just a babel of "Greek arithmetic". As a former instructor of graduate level applied statistics, I know that many students can follow well-articulated proofs and desire to see them worked out.

In this paper three tasks will be accomplished. First, a clear step-by-step proof that $t^2 = F$ in the general case will be provided. Second, a numerical example will be worked out. Third, a plan will be provided for proving the special case. It is felt that the reader should be able to follow the proof of the general case, and therefore should have little difficulty in translating the acquired knowledge into proving the special case.

Proof that $t^2 = F$: the General Case

First, let us lay out a table of symbolic values in order to introduce a familiar notation and the variables used in the proof. This is done in Table 1.

The plan for the proof is important for understanding the strategy involved in attacking a statistical proof. The steps of the plan that will be used here are given below:

1. State the form of the t-test statistic using the notation of Table 1.
2. Square the t in step 1.
3. State the form of the F statistic using the notation of Table 1.
4. Simplify algebraically the F statistic in step 3.
5. Observe that the simplified F of step 4 is equal to the squared t of step 2.
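As a supplement to the plan (not part of the original paper), the end result of the five steps can be previewed symbolically. The sketch below uses the sympy library (an assumption; any computer algebra system would do) to confirm that the $t^2$ of step 2 and the F of step 3, with the grand mean substituted, are the same expression.

```python
# Symbolic check that t^2 = F for two independent groups (requires sympy).
import sympy as sp

n1, n2 = sp.symbols('n1 n2', positive=True)   # group sample sizes
m1, m2 = sp.symbols('m1 m2', real=True)       # group means
v1, v2 = sp.symbols('v1 v2', positive=True)   # group variances s.1^2 and s.2^2

grand_mean = (n1 * m1 + n2 * m2) / (n1 + n2)  # weighted grand mean (Table 1, note 3)
ms_within = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)

t_squared = (m1 - m2) ** 2 / (ms_within * (n1 + n2) / (n1 * n2))
F = (n1 * (m1 - grand_mean) ** 2 + n2 * (m2 - grand_mean) ** 2) / ms_within

print(sp.simplify(t_squared - F))   # 0, so t^2 = F in the general case
```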

Table 1

Table Layout for Two Independent Groups

                  Group 1          Group 2          Total
Observations      $X_{11}$         $X_{12}$
                  $X_{21}$         $X_{22}$
                  $X_{31}$         $X_{32}$
                  $X_{41}$         $X_{42}$
                  ...              ...
Sample Size       $n_{.1}$         $n_{.2}$         $n_{..}$
Sample Mean       $\bar{X}_{.1}$   $\bar{X}_{.2}$   $\bar{X}_{..}$
Sample Variance   $s_{.1}^2$       $s_{.2}^2$       $s_{..}^2$

Notes for Table 1:

1. Sample sizes are assumed unequal; that is, $n_{.1} \neq n_{.2}$.

2. The total sample size is $n_{..} = n_{.1} + n_{.2}$.

3. The grand mean ($\bar{X}_{..}$) is a weighted mean since sample sizes are unequal. That is,

$$\bar{X}_{..} = \frac{n_{.1}\bar{X}_{.1} + n_{.2}\bar{X}_{.2}}{n_{.1} + n_{.2}}$$

4. $s_{..}^2$ is not needed for the proof. It is included only for completeness.

Steps 1 and 2: the t statistic and its square

Using the notation of Table 1, the t-test statistic for two independent groups is:

$$t = \frac{\bar{X}_{.1} - \bar{X}_{.2}}{\sqrt{\dfrac{(n_{.1}-1)s_{.1}^2 + (n_{.2}-1)s_{.2}^2}{n_{.1}+n_{.2}-2}\cdot\dfrac{n_{.1}+n_{.2}}{(n_{.1})(n_{.2})}}}$$

Squaring the t of step 1 gives:

$$t^2 = \frac{(\bar{X}_{.1} - \bar{X}_{.2})^2}{\dfrac{(n_{.1}-1)s_{.1}^2 + (n_{.2}-1)s_{.2}^2}{n_{.1}+n_{.2}-2}\cdot\dfrac{n_{.1}+n_{.2}}{(n_{.1})(n_{.2})}}$$

This is the expression we will use as a basis of comparison with the simplified F statistic to be obtained in step 4. Note that the above squared t value is referred to simply as $t^2$.

Step 3: the F statistic

It will be recalled that the F statistic is the ratio of two independent sums of squares: a between1 sums of squares ($SS_b$) and a within sums of squares ($SS_w$). Also, each sums of squares is divided by an appropriate degrees of freedom term: $df_b$ (between sums of squares degrees of freedom) and $df_w$ (within sums of squares degrees of freedom). The general expression of the F statistic (for any number of groups) is:

$$F_{df_b,\,df_w} = \frac{SS_b/df_b}{SS_w/df_w}$$

The general form of $df_b$ is "the number of groups minus one" (i.e., $df_b = J-1$, where J is the number of groups). For two groups, $J = 2$, and so $df_b$ for two groups is $df_b = 2 - 1 = 1$. The general form of $df_w$ is "the total sample size minus the number of groups" (i.e., $df_w = n_{..} - J$). Since $n_{..} = n_{.1} + n_{.2}$, for $J = 2$ groups we can write $df_w = n_{.1} + n_{.2} - 2$.

1. Grammarians will point out that the preposition "between" refers to the relationship of two entities while "among" refers to more than two. However, since the reference here is ultimately to two groups, "between" will be used instead of the correct "among". I accept the righteous indignation of the grammarian.


Thus, using the notation of Table 1, we can write the F statistic for two groups as follows:

$$F_{1,\,n_{.1}+n_{.2}-2} = \frac{SS_b/1}{SS_w/(n_{.1}+n_{.2}-2)} = \frac{SS_b}{SS_w/(n_{.1}+n_{.2}-2)}$$

In order to facilitate the proof, we will write F in the familiar terms of means and variances. That is:

$$F = \frac{n_{.1}(\bar{X}_{.1}-\bar{X}_{..})^2 + n_{.2}(\bar{X}_{.2}-\bar{X}_{..})^2}{\dfrac{(n_{.1}-1)s_{.1}^2 + (n_{.2}-1)s_{.2}^2}{n_{.1}+n_{.2}-2}}$$

We will refer to this expression as simply F.

It is instructive to now compare the values of $t^2$ and F. For the reader's convenience, we will restate $t^2$ and F so that they may be compared and referred to later. This is done in Table 2.

Table 2

Restatement of $t^2$ and F

$$t^2 = \frac{(\bar{X}_{.1}-\bar{X}_{.2})^2}{\dfrac{(n_{.1}-1)s_{.1}^2 + (n_{.2}-1)s_{.2}^2}{n_{.1}+n_{.2}-2}\cdot\dfrac{n_{.1}+n_{.2}}{(n_{.1})(n_{.2})}}$$

$$F = \frac{n_{.1}(\bar{X}_{.1}-\bar{X}_{..})^2 + n_{.2}(\bar{X}_{.2}-\bar{X}_{..})^2}{\dfrac{(n_{.1}-1)s_{.1}^2 + (n_{.2}-1)s_{.2}^2}{n_{.1}+n_{.2}-2}}$$

Step 4. Simplify F.

This is the next to the last step in the proof. A full step-by-step simplification of F requires several algebraic steps. We will first simplify the numerator of F ($SS_b$). Notes pertaining to the algebraic manipulations are provided for the reader's convenience.

We start by looking at the numerator of F:

$$SS_b = n_{.1}(\bar{X}_{.1} - \bar{X}_{..})^2 + n_{.2}(\bar{X}_{.2} - \bar{X}_{..})^2$$

Since $\bar{X}_{..} = \dfrac{n_{.1}\bar{X}_{.1} + n_{.2}\bar{X}_{.2}}{n_{.1}+n_{.2}}$, we can substitute for $\bar{X}_{..}$:

$$SS_b = n_{.1}\left[\bar{X}_{.1} - \frac{n_{.1}\bar{X}_{.1} + n_{.2}\bar{X}_{.2}}{n_{.1}+n_{.2}}\right]^2 + n_{.2}\left[\bar{X}_{.2} - \frac{n_{.1}\bar{X}_{.1} + n_{.2}\bar{X}_{.2}}{n_{.1}+n_{.2}}\right]^2$$

Finding common denominators for each bracketed term:

$$SS_b = n_{.1}\left[\frac{(n_{.1}+n_{.2})\bar{X}_{.1} - (n_{.1}\bar{X}_{.1} + n_{.2}\bar{X}_{.2})}{n_{.1}+n_{.2}}\right]^2 + n_{.2}\left[\frac{(n_{.1}+n_{.2})\bar{X}_{.2} - (n_{.1}\bar{X}_{.1} + n_{.2}\bar{X}_{.2})}{n_{.1}+n_{.2}}\right]^2$$

Removing the inside parentheses, multiplying the group means through, subtracting, and cancelling like terms:

$$SS_b = n_{.1}\left[\frac{n_{.2}\bar{X}_{.1} - n_{.2}\bar{X}_{.2}}{n_{.1}+n_{.2}}\right]^2 + n_{.2}\left[\frac{n_{.1}\bar{X}_{.2} - n_{.1}\bar{X}_{.1}}{n_{.1}+n_{.2}}\right]^2$$

Factoring like sample size terms:

$$SS_b = n_{.1}\left[\frac{n_{.2}(\bar{X}_{.1} - \bar{X}_{.2})}{n_{.1}+n_{.2}}\right]^2 + n_{.2}\left[\frac{n_{.1}(\bar{X}_{.2} - \bar{X}_{.1})}{n_{.1}+n_{.2}}\right]^2$$

Squaring each term separately inside the brackets:

$$SS_b = \frac{n_{.1}n_{.2}^2(\bar{X}_{.1} - \bar{X}_{.2})^2 + n_{.2}n_{.1}^2(\bar{X}_{.2} - \bar{X}_{.1})^2}{(n_{.1}+n_{.2})^2}$$

Note that $(\bar{X}_{.2} - \bar{X}_{.1})^2$ is the same as $(\bar{X}_{.1} - \bar{X}_{.2})^2$ because squared differences of the same two terms yield the same quantity. Hence, we can factor out like sample size terms in the numerator to obtain:

$$SS_b = \frac{(n_{.1})(n_{.2})(n_{.2} + n_{.1})(\bar{X}_{.1} - \bar{X}_{.2})^2}{(n_{.1}+n_{.2})^2}$$

Simplifying $(n_{.1}+n_{.2})$ against $(n_{.1}+n_{.2})^2$, we obtain:

$$SS_b = \frac{(n_{.1})(n_{.2})(\bar{X}_{.1} - \bar{X}_{.2})^2}{n_{.1}+n_{.2}}$$

This is just about it. Now substitute the value of $SS_b$ just obtained into the F statistic, and obtain:

$$F = \frac{\dfrac{(n_{.1})(n_{.2})(\bar{X}_{.1} - \bar{X}_{.2})^2}{n_{.1}+n_{.2}}}{\dfrac{(n_{.1}-1)s_{.1}^2 + (n_{.2}-1)s_{.2}^2}{n_{.1}+n_{.2}-2}}$$

If we divide the numerator and denominator by $\dfrac{(n_{.1})(n_{.2})}{n_{.1}+n_{.2}}$, we obtain:

$$F = \frac{(\bar{X}_{.1} - \bar{X}_{.2})^2}{\dfrac{(n_{.1}-1)s_{.1}^2 + (n_{.2}-1)s_{.2}^2}{n_{.1}+n_{.2}-2}\cdot\dfrac{n_{.1}+n_{.2}}{(n_{.1})(n_{.2})}}$$

Step 5. Observe that $t^2 = F$.

If the value of F above is compared with the value of $t^2$, it will be seen that they are in the same form, signifying that they are equal. Refer to Table 2 for this comparison. END OF PROOF.

This completes the entire proof in accordance with the five step plan. We now turn to a numerical example for two independent groups of unequal sample size.

Numerical Example

The analytic proof using algebraic rules for the general case was given in great detail. A numerical example should provide additional insight. The data and descriptive statistics are provided in Table 3, which is modeled on Table 1. Note that the data were chosen for illustrative purposes only.

Table 3

Data for Numerical Example

                  Group 1    Group 2    Total
Observations      10         50
                  40         70
                  20         100
                  50         40
                  75
                  90
Sample Size       6          4          10
Sample Mean       47.5       65.0       54.5
Sample Variance   957.5      700.0      NOT NEEDED

Refer to Table 2 for the t-test formula. Using the formula there, the t statistic value is:

$$t = \frac{47.5 - 65.0}{\sqrt{\dfrac{(6-1)957.5 + (4-1)700.0}{6+4-2}\cdot\dfrac{10}{24}}} = \frac{-17.5}{\sqrt{\dfrac{6887.5}{8}\cdot\dfrac{10}{24}}} = -.9240$$

If we square this value, we obtain:

$$(-.9240)^2 = .8537 \qquad (t^2 = .8537)$$

We place the computed value in the margin for easy reference. Now compute the F statistic using the formula provided in Table 2:

$$F = \frac{6(47.5-54.5)^2 + 4(65.0-54.5)^2}{\dfrac{(6-1)957.5 + (4-1)700.0}{6+4-2}} = \frac{6(49) + 4(110.25)}{6887.5/8} = .8537 \qquad (F = .8537)$$

Thus, $t^2 = .8537$ and $F = .8537$, or $t^2 = F$.
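The arithmetic of the numerical example is easy to reproduce in a few lines of Python (an added illustration using the Table 3 data):

```python
# Reproduce the Table 3 numerical example: t^2 = F for unequal group sizes.
from statistics import mean, variance   # variance uses the n-1 divisor

g1 = [10, 40, 20, 50, 75, 90]   # Group 1
g2 = [50, 70, 100, 40]          # Group 2
n1, n2 = len(g1), len(g2)
m1, m2 = mean(g1), mean(g2)     # 47.5 and 65.0
grand = (n1 * m1 + n2 * m2) / (n1 + n2)                       # 54.5
ms_w = ((n1 - 1) * variance(g1) + (n2 - 1) * variance(g2)) / (n1 + n2 - 2)

t = (m1 - m2) / (ms_w * (n1 + n2) / (n1 * n2)) ** 0.5         # -0.9240
F = (n1 * (m1 - grand) ** 2 + n2 * (m2 - grand) ** 2) / ms_w  # 0.8537

print(round(t, 4), round(t * t, 4), round(F, 4))   # -0.924  0.8537  0.8537
```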

Proof of the Special Case

If the sample sizes are equal in a research design of two independent groups, the proof that $t^2 = F$ is somewhat easier to derive. Rather than perform the necessary algebraic manipulations, it may prove instructive for the reader to actually derive it himself or herself. Working through the analytic proof for the special case will solidify an understanding of the proof for the general case.

Some hints will be provided for the reader in proving the special case. They are summarized as follows:

1. $n_{.1} = n_{.2} = n$. That is, since sample sizes are equal, one symbol for sample size may be used; it could be called n.

2. $\bar{X}_{..} = \dfrac{\bar{X}_{.1} + \bar{X}_{.2}}{2}$. Also, since sample sizes are equal, the grand mean ($\bar{X}_{..}$) is simply the average of the means of each group. This value should be substituted for $\bar{X}_{..}$ in the proof.

By making these two changes and by following the five step outline used for the general case proof, the reader should be able to derive the proof for the special case. One may also wish to "make up" an easy-to-work-with numerical data set to check on the process.
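In that spirit, here is a minimal Python check of the special case (an added illustration; the data are hypothetical):

```python
# Special case check: with equal sample sizes, t^2 still equals F.
from statistics import mean, variance

g1 = [3, 7, 8, 10]    # hypothetical Group 1 (n = 4)
g2 = [5, 6, 9, 12]    # hypothetical Group 2 (n = 4)
n = len(g1)
m1, m2 = mean(g1), mean(g2)
grand = (m1 + m2) / 2   # equal n: grand mean is the simple average of group means
ms_w = ((n - 1) * variance(g1) + (n - 1) * variance(g2)) / (2 * n - 2)

t_sq = (m1 - m2) ** 2 / (ms_w * 2 / n)
F = (n * (m1 - grand) ** 2 + n * (m2 - grand) ** 2) / ms_w
print(t_sq, F)   # equal values
```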

Note

For students or researchers who enjoy proofs in applied statistics, the following two references may be useful.

Edwards, Allen L. Expected Values of Discrete Random Variables and Elementary Statistics. New York: Wiley, 1964.

Guilford, J. P. and Fruchter, Benjamin. Fundamental Statistics in Psychology and Education, 4th and 5th editions. New York: McGraw-Hill, 1973.

References

Glass, Gene V and Stanley, Julian C. Statistical Methods in Education and Psychology. Englewood Cliffs, New Jersey: Prentice-Hall, 1970.

Rucci, Anthony J. and Tweney, Ryan D. Analysis of variance and the "second discipline" of scientific psychology: a historical account. Psychological Bulletin, 1980, Vol. 87, No. 1, 166-184.

DOCUMENT RESUME

ED 331 706                                SE 051 972

AUTHOR      O'Brien, Francis J., Jr.
TITLE       A Derivation of the Limits of the Sample Multivariate
            Correlation Coefficient.
PUB DATE    Mar 91
NOTE        18p.
PUB TYPE    Guides - Classroom Use - Instructional Materials (For
            Learner) (051)

EDRS PRICE  MF01/PC01 Plus Postage.
DESCRIPTORS Algebra; *College Mathematics; *Correlation; Higher
            Education; Learning Activities; Mathematical
            Applications; Mathematics Education; *Multivariate
            Analysis; *Problem Solving; *Proof (Mathematics);
            *Statistics

ABSTRACT
This paper is the sixth in a series designed to supplement the statistics training of students. The intended audience is social science undergraduate and graduate students studying applied statistics. The purpose of the applied statistics monographs is to provide selected proofs and derivations of important relationships or formulas that students do not find available and/or comprehensible in journals, textbooks and similar sources. Derived are the theoretical limits of the sample multivariate (or multiple) correlation of one criterion (dependent variable) and any (finite) number of predictors (independent variables). The proof given in this paper involves deriving the individual terms of R. The lower limit and upper limit of R are derived separately. (KR)

A Derivation of the Limits of the Sample Multivariate Correlation Coefficient

Francis J. O'Brien, Jr., Ph.D.

36 Linden Street
Middletown, RI 02840
March, 1991

Copyright 1991, Francis J. O'Brien, Jr. All rights reserved.

Table of Contents

Introduction
Introduction to Proof
Proof that 0 ≤ R ≤ 1
    Proof of the Lower Limit
    Proof of the Upper Limit
References

List of Tables

Table 1. Formulas and Relationships for Sample Standardized Variables
Table 2. Formulas and Relationships for the Terms of the Multiple R

Introduction

This paper is the sixth in a series of ERIC publications designed to supplement the statistics training of students. For related documents see O'Brien (1982a; 1982b; 1982c; 1984; 1987). The intended audience for these papers is social science undergraduate and graduate students studying applied statistics.

The purpose of these applied statistics monographs is to provide selected proofs and derivations of important relationships or formulas that students do not find available and/or comprehensible in journals, textbooks and similar sources. For example, based on the author's personal experience as a former applied statistics instructor at the graduate level, few students would profit from a reading of Kendall and Stuart (1967) to understand the proof provided in the present paper. The unique feature of the papers in this series is detailed step-by-step proofs or derivations written in a consistent notation system. Calculus is neither used nor assumed. Each proof or derivation is presented algebraically in detail.

The present paper assumes familiarity with the author's 1982c paper (or equivalent knowledge). That paper formulated a detailed derivation of the sample multiple correlation formula for one dependent variable and p predictors for the linear model based on standardized (z) variables.
Introduction to Proof

In this paper we derive the theoretical limits of the sample multivariate (or multiple) correlation of one criterion (dependent variable) and any (finite) number of predictors (independent variables). To facilitate the development of the proof, we will work with standardized (z) variables. Although the proof could be presented in the unstandardized (raw score) form, using normalized variables reduces some of the algebraic detail.

Many students have learned that the multivariate correlation between one dependent variable and a finite number of independent variables can be expressed in terms of a weighted sum of regression weights and Pearson (zero-order) product-moment correlations between dependent/independent variables. This relationship holds only for standardized variables. This correlation for p independent variables can be written (see O'Brien, 1982c):

$$R_{z_y.z_1,\ldots,z_j,\ldots,z_p} = \sqrt{B_1 r_{y1} + B_2 r_{y2} + \cdots + B_j r_{yj} + \cdots + B_p r_{yp}}$$

Writing the right-hand side in summation notation:

$$R_{z_y.z_1,\ldots,z_j,\ldots,z_p} = \sqrt{\sum_{j=1}^{p} B_j r_{yj}}$$

where

$R_{z_y.z_1,\ldots,z_p}$ = multiple correlation of p standardized variables,
$Z_y$ = the standardized dependent variable,
$Z_1, Z_2, \ldots, Z_p$ = the standardized independent variables,
$B_1, B_2, \ldots, B_p$ = beta (regression) weights attached to each standardized independent variable,*
$r_{y1}, r_{y2}, \ldots, r_{yp}$ = product moment (zero-order) dependent/independent variable correlations.

Many students know that the numerical limits on the above multiple R are zero and 1 (i.e., $0 \leq R \leq 1$). The purpose of this paper is to prove that statement.

Proof that $0 \leq R \leq 1$

In this section we present a detailed proof that the limits of the multiple R are 0/1. First, a review is given of the notation and necessary definitions as well as the relevant results that were derived in O'Brien (1982c).

We can state the formal linear regression prediction equation for p standardized predictors as follows:**

$$\hat{Z}_{y_i} = B_1 Z_1 + B_2 Z_2 + \cdots + B_j Z_j + \cdots + B_p Z_p$$

This equation represents the predicted standardized criterion measure or score ($\hat{Z}_{y_i}$) for the ith subject in the sample on the p standardized variables $Z_1$ through $Z_p$.

* Technically, the beta weights ($B_j$) are called "standardized partial regression coefficients". The formal notation in some standard textbooks is more elaborate than ours (e.g., Hays, 1973 or Kendall and Stuart, 1967). As in previous papers, we have minimized the reading of the symbolism to clarify the concepts in the development of the proof.

** The coefficient "A" is not included for the reason given in O'Brien (1982c); i.e., it "drops out" in the least squares derivations and so may be ignored.

The multiple correlation (or just R for short) for this regression model of p standardized predictors may be defined conceptually as:

$$R = \mathrm{Corr}(Z_y, \hat{Z}_y) = \frac{\mathrm{Cov}(Z_y, \hat{Z}_y)}{\sqrt{\mathrm{Var}(Z_y)\,\mathrm{Var}(\hat{Z}_y)}}$$

where Corr is the correlation operator, Cov is the covariance operator, and Var is the variance operator. Note that $Z_y$ is the random variable that represents the "observed" or known information while $\hat{Z}_y$ represents the "predicted" information.

The proof that is given in this paper involves deriving the individual terms of R. Two tables are provided for reference in the development of the proof. Table 1 summarizes familiar formulas for standardized variables. Table 2 is a summary of the results derived in O'Brien (1982c) for the multiple R of p standardized variables. The information in each table provides the essential building blocks of the proof.

Table 1

Formulas and Relationships for Sample Standardized Variables

Name of Quantity   Formula                                                   Note

Sum                $\sum_{i=1}^{n} Z_j = 0$                                  n is sample size. The summation is understood to be across the sample for a given predictor j.

Sum of Squares     $\sum_{i=1}^{n} Z_j^2 = n-1$                              Above note applies.

Mean               $\bar{Z}_j = 0$                                           Mean of jth predictor for total sample. The summation is understood to be across the sample for a given predictor j.

Variance           $\dfrac{\sum_{i=1}^{n} Z_j^2}{n-1} = \mathrm{Var}(Z_j) = 1$   Variance of jth predictor for total sample. The summation is understood to be across the sample for a given predictor j.

Correlation        $\dfrac{\sum_{i=1}^{n} Z_x Z_y}{n-1} = r_{z_x z_y} = r_{xy}$   General zero-order correlation formula for any two standardized variables, $Z_x$ and $Z_y$.

Note: Proof of these formulas/relationships may be found in O'Brien (1982b, Appendix).

Table 2

Formulas and Relationships for the Sample Multiple R

$$\mathrm{Cov}(Z_y, \hat{Z}_y) = \sum_{j=1}^{p} B_j r_{yj}$$

$$\mathrm{Var}(Z_y) = 1$$

$$\mathrm{Var}(\hat{Z}_y) = \sum_{j=1}^{p} B_j^2 + 2\sum_{j<k} B_j B_k r_{jk} = \sum_{j=1}^{p} B_j r_{yj}$$

$$\mathrm{Corr}(Z_y, \hat{Z}_y) = R = \frac{\sum_{j=1}^{p} B_j r_{yj}}{\sqrt{(1)\sum_{j=1}^{p} B_j r_{yj}}} = \sqrt{\sum_{j=1}^{p} B_j r_{yj}}$$

where $r_{yj}$ = dependent/independent variable Pearson (zero-order) correlations, and $r_{jk}$ = Pearson correlations among the p independent variables.

Note: Proof of these formulas/relationships may be found in O'Brien (1982c).

As the reader can verify from Table 2, the covariance term is $\mathrm{Cov}(Z_y, \hat{Z}_y) = \mathrm{Var}(\hat{Z}_y) = R^2$. These relationships constitute the "key" to the proof for the 0/1 limits of R as developed in this paper. We now demonstrate this proof. The development of the proof will consist of two parts: one part will demonstrate the proof for the lower limit and the other will show the proof for the upper limit. The lower limit is now presented.

Proof of the lower limit

The proof of the lower limit ($R \geq 0$) is based on an algebraic inequality and the information in Table 1 and Table 2. Recall the conceptual definition of the sample variance of $\hat{Z}_y$:

$$\mathrm{Var}(\hat{Z}_y) = \frac{\sum_{i=1}^{n}(\hat{Z}_{y_i} - \bar{\hat{Z}}_y)^2}{n-1}$$

As is true for any standardized variable, the mean is $\bar{\hat{Z}}_y = 0$ (see Table 1). Thus,

$$\mathrm{Var}(\hat{Z}_y) = \frac{\sum_{i=1}^{n}\hat{Z}_{y_i}^2}{n-1}$$

The reader will agree that the following algebraic inequality is a true statement mathematically:

$$\frac{\sum_{i=1}^{n}\hat{Z}_{y_i}^2}{n-1} \geq 0$$

From Table 2, this statement is equivalent to:

$$\mathrm{Var}(\hat{Z}_y) = \sum_{j=1}^{p} B_j^2 + 2\sum_{j<k} B_j B_k r_{jk} \geq 0$$

But, as the reader can verify from Table 2, $\mathrm{Var}(\hat{Z}_y) = R^2$. Hence,

$$\mathrm{Var}(\hat{Z}_y) = R^2 \geq 0$$

Since the value of the square root of a variance term is, by definition, positive, then

$$\sqrt{\mathrm{Var}(\hat{Z}_y)} \geq 0$$

or, by substituting $R^2$,

$$\sqrt{R^2} \geq 0$$

Consequently, $R \geq 0$. The proof for the lower limit has been demonstrated.

Proof of the upper limit

The proof of the upper limit ($R \leq 1$) follows with similar logic. The reader will recall that the least squares criterion for standard scores can be stated as follows (see O'Brien, 1982c):

$$\sum_{i=1}^{n}(Z_{y_i} - \hat{Z}_{y_i})^2 = \text{a minimum}$$

We can also write the least squares criterion as:

$$\sum_{i=1}^{n}(Z_{y_i} - \hat{Z}_{y_i})^2 \geq 0$$

which is a true statement mathematically.

Our proof for the upper limit will consist of first expanding the above squared sum, substituting quantities from Tables 1 and 2, and simplifying. We then return to the inequality relation and conclude the derivation. Expanding out the left side as a binomial and bringing in the summation operator:

$$\sum_{i=1}^{n}(Z_{y_i} - \hat{Z}_{y_i})^2 = \sum_{i=1}^{n}Z_{y_i}^2 + \sum_{i=1}^{n}\hat{Z}_{y_i}^2 - 2\sum_{i=1}^{n}Z_{y_i}\hat{Z}_{y_i}$$

Each term can be simplified in turn. As shown in Table 1, the sum of squared standardized scores in a sample is:

$$\sum_{i=1}^{n}Z_{y_i}^2 = n-1$$

where n is the sample size. As for the second term in the expansion, that term reduces to

$$\sum_{i=1}^{n}\hat{Z}_{y_i}^2 = (n-1)\,\mathrm{Var}(\hat{Z}_y)$$

which is derived as an algebraic manipulation of the form given in Table 1. The last term can be obtained in several steps by expansion and manipulation as follows:

$$2\sum_{i=1}^{n}Z_{y_i}\hat{Z}_{y_i} = 2\sum_{i=1}^{n}Z_{y_i}(B_1 Z_1 + B_2 Z_2 + \cdots + B_p Z_p)$$

$$= 2\sum_{i=1}^{n}(B_1 Z_1 Z_{y_i} + B_2 Z_2 Z_{y_i} + \cdots + B_p Z_p Z_{y_i})$$

$$= 2\left(B_1\sum_{i=1}^{n}Z_1 Z_{y_i} + B_2\sum_{i=1}^{n}Z_2 Z_{y_i} + \cdots + B_p\sum_{i=1}^{n}Z_p Z_{y_i}\right)$$

From Table 1, it can be seen that any term of the form $\sum_{i=1}^{n}Z_x Z_y$ is equal to $(n-1)r_{xy}$. For correlations involving the independent/dependent variables, we have:

$$2\sum_{i=1}^{n}Z_{y_i}\hat{Z}_{y_i} = 2\big(B_1(n-1)r_{y1} + B_2(n-1)r_{y2} + \cdots + B_p(n-1)r_{yp}\big) = 2(n-1)\sum_{j=1}^{p}B_j r_{yj}$$

Collecting all terms together, we can now rewrite the least squares criterion as:

$$\sum_{i=1}^{n}(Z_{y_i} - \hat{Z}_{y_i})^2 = (n-1) + (n-1)\,\mathrm{Var}(\hat{Z}_y) - 2(n-1)\sum_{j=1}^{p}B_j r_{yj} \geq 0$$

Upon factoring out $n-1$ and dividing it through the inequality, we have

$$\frac{1}{n-1}\sum_{i=1}^{n}(Z_{y_i} - \hat{Z}_{y_i})^2 = 1 + \mathrm{Var}(\hat{Z}_y) - 2\sum_{j=1}^{p}B_j r_{yj} \geq 0$$

Now, from Table 2, we know that

$$\sum_{j=1}^{p}B_j r_{yj} = \mathrm{Var}(\hat{Z}_y)$$

Thus,

$$\frac{1}{n-1}\sum_{i=1}^{n}(Z_{y_i} - \hat{Z}_{y_i})^2 = 1 + \mathrm{Var}(\hat{Z}_y) - 2\,\mathrm{Var}(\hat{Z}_y) = 1 - \mathrm{Var}(\hat{Z}_y) \geq 0$$

Reversing the sense of the inequality, we can write the right hand side of the above as:

$$\mathrm{Var}(\hat{Z}_y) \leq 1$$

But since $\mathrm{Var}(\hat{Z}_y) = R^2$, then

$$R^2 \leq 1 \quad\text{or}\quad R \leq 1.$$

Proof is completed. We have proven in this paper that $0 \leq R \leq 1$.
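As an added numerical companion (not in the original paper), the following Python sketch builds a small regression on standardized variables, obtains the beta weights from the normal equations in correlation form, and confirms both $R^2 = \sum B_j r_{yj}$ and $0 \leq R \leq 1$. The data are hypothetical and numpy is assumed available.

```python
# Numerical check of R^2 = sum(B_j * r_yj) and 0 <= R <= 1 (hypothetical data).
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))
y = X @ np.array([0.6, -0.3, 0.2]) + rng.normal(size=n)

# Standardize every variable (mean 0, variance 1, n-1 divisor).
Zx = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
Zy = (y - y.mean()) / y.std(ddof=1)

Rxx = (Zx.T @ Zx) / (n - 1)   # correlations among the predictors (r_jk)
ryx = (Zx.T @ Zy) / (n - 1)   # criterion/predictor correlations (r_yj)

B = np.linalg.solve(Rxx, ryx)   # beta weights from the normal equations
R_squared = float(B @ ryx)      # R^2 = sum of B_j * r_yj (Table 2)

Zy_hat = Zx @ B
print(R_squared, np.var(Zy_hat, ddof=1))   # equal: Var(Z-hat_y) = R^2
assert 0.0 <= R_squared ** 0.5 <= 1.0      # the limits proven above
```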

References

Hays, W. L. Statistics for the Social Sciences (2nd ed.). NY: Holt, Rinehart & Winston, 1973.

Kendall, M. G. & A. Stuart. The Advanced Theory of Statistics. Vol. II: Inference and Relationship (2nd ed.). NY: Hafner Publishing Co., 1967.

O'Brien, F. A proof that t2 and F are identical: the general case. Washington, D.C.: Educational Resources Information Center, 1982a. (ED 215894)

O'Brien, F. Proof that sample bivariate correlation coefficient has limits ±1. Washington, D.C.: Educational Resources Information Center, 1982b. (ED 216874)

O'Brien, F. A derivation of the sample multiple correlation formula for standard scores. Washington, D.C.: Educational Resources Information Center, 1982c. (ED 223429)

O'Brien, F. A derivation of the sample multiple correlation formula for raw scores. Washington, D.C.: Educational Resources Information Center, 1984. (ED 235205)

O'Brien, F. A derivation of the unbiased standard error of estimate: the general case. Washington, D.C.: Educational Resources Information Center, 1987. (ED 280896)

DOCUMENT RESUME

ED 235 205                                TM 830 618

AUTHOR      O'Brien, Francis J., Jr.
TITLE       A Derivation of the Sample Multiple Correlation
            Formula for Raw Scores.
INSTITUTION National Opinion Research Center, New York, NY.
PUB DATE    24 Jun 83
NOTE        64p.; For related document, see ED 223 429.
PUB TYPE    Guides - Classroom Use - Materials (For Learner) (051)

EDRS PRICE  MF01/PC03 Plus Postage.
DESCRIPTORS *Correlation; Higher Education; Instructional
            Materials; *Mathematical Formulas; *Scores;
            *Statistics; *Supplementary Reading Materials
IDENTIFIERS Linear Models; *Multiple Correlation Formula

ABSTRACT
This paper, a derivation of the multiple correlation formula for unstandardized (raw) scores, is the fourth in a series of publications. The purpose of these papers is to provide supplementary reading for students of applied statistics. The intended audience is social science graduate and advanced undergraduate students familiar with applied statistics. The minimum background for most of the existing and forthcoming papers is knowledge of applied statistics through rudimentary analysis of variance, and multiple correlation and regression analysis. The unique feature of this set of papers is detailed proofs and derivations of important formulas and derivations which are not readily available in textbooks, journal articles, and other similar sources. Each proof or derivation is presented in a clear, detailed and consistent fashion. When necessary, a review of relevant algebra is provided. Calculus is not used or assumed. This series seeks to address the needs of students to see a full, comprehensible statement of a mathematical argument. (PN)

A Derivation of the Sample Multiple Correlation Formula for Raw Scores

Francis J. O'Brien, Jr., Ph.D.
Assistant Sampling Director
National Opinion Research Center

NORC
Sampling Department
902 Broadway
New York, NY 10010
June 24, 1983

Copyright 1983, Francis J. O'Brien, Jr. All rights reserved.

ERRATA SHEET

"A derivation of the sample multiple correlation formula for raw scores" by Francis J. O'Brien, Jr., June 24, 1983

Location (original page numbers, at top)   Now Reads                Corrected To
Page 10, footnote, 4 lines down            nX1Y                     nXY
Page 13                                    var(b2,x2)               var(b2x2)
Page 17, footnote                          Multiple R               multiple R
Page 29, equation (36)                     b Ex2                    Ex y
Page 36, 2 lines from bottom of text       mathematical calculus    mathematical statistics
Page 40, 2nd equation                      S1                       r
Page 43, last line in text                 Sy                       Sy2

Table of Contents

Introduction
Overview of Derivation
Brief Review of Regression Analysis and Derivation for Two Predictors
    Normal Equations
    Multiple Correlation
    Derivation
Derivation for Three Predictors
Derivation for p Predictors
    Multiple Correlation for p Predictors and Derivation
Appendix A: Normal Equations in Regression Analysis
    Introduction
    Plan
    Finding Normal Equations for the Two Predictor Model
    Finding Normal Equations for p Predictors
    Alternate Procedure
    Example for Five Predictors
Appendix B: Errata for paper ED 223 429
References

List of Tables

Table 1. Descriptive Sample Statistics
Table 2. Normal Equations and Multiple Correlation Formula for Two Raw Score Predictors
Table 3. Normal Equations and Multiple Correlation Formula for Three Raw Score Predictors
Table 4. Normal Equations and Multiple Correlation Formula for p Raw Score Predictors

A Derivation of the Sample Multiple Correlation Formula for Raw Scores

Francis J. O'Brien, Jr., Ph.D.
National Opinion Research Center, New York

Introduction

This paper is the fourth in a series of publications. The purpose of these papers is to provide supplementary reading for students of applied statistics (see O'Brien, 1982a; 1982b; 1982c). My intended audience is social science graduate and advanced undergraduate students familiar with applied statistics. The minimum background for most of the existing and forthcoming papers is knowledge of applied statistics through rudimentary analysis of variance, and multiple correlation and regression analysis.

The unique feature of this set of papers is detailed proofs and derivations of important formulas and derivations which are not readily available in textbooks, journal articles, and other similar sources. Each proof or derivation is presented in a clear, detailed and consistent fashion. When necessary, a review of relevant algebra is provided. Calculus is not used or assumed.

As a former instructor of applied statistics on the graduate level, I know that many students are very capable of understanding the proofs and derivations presented in these papers. My experience has been that many students desire to see a full, comprehensible statement of a mathematical argument. This series seeks to address such needs.

The present paper is a companion work to an earlier paper (O'Brien, 1982c). Each is a derivation of the multiple correlation formula for the linear model. The first paper formulated a detailed derivation of the multiple correlation formula for standard (z) scores. The present paper is a derivation of the multiple correlation formula for unstandardized (raw) scores. Readers should find each paper interesting and informative.

Typographical errors appeared in the earlier paper. For the reader's convenience, corrections are summarized in Appendix B of the present paper. The author would be grateful if other errors in that paper or the present paper were communicated to him.

The two papers taken together are meant to be preparatory reading for a related paper.1

1. Forthcoming with the expected title: "A Derivation of the Unbiased Sample Standard Error of Estimate: the General Case." It will appear in ERIC.

Overview of Derivation

In this paper we will present a derivation of the linear multiple correlation formula for raw scores. The basic objective is to derive this formula for one raw score criterion (dependent variable) and any finite number of raw score predictors (independent variables).

Let us first state the formula we will derive and introduce the notation used. The linear multiple correlation between one criterion and p predictors can be expressed as:

$$R_{Y.x_1,x_2,\dots,x_j,\dots,x_p} = \frac{\sqrt{b_1 r_{y1} S_y S_1 + b_2 r_{y2} S_y S_2 + \dots + b_j r_{yj} S_y S_j + \dots + b_p r_{yp} S_y S_p}}{S_y}$$

Writing the right hand side in summation notation:

$$R_{Y.x_1,x_2,\dots,x_j,\dots,x_p} = \frac{\sqrt{\sum_{j=1}^{p} b_j r_{yj} S_y S_j}}{S_y}$$

where:

R_{Y.x_1,...,x_j,...,x_p} = the multiple correlation of raw scores,
Y = the observed raw score criterion to be predicted,
x_1, x_2, ..., x_j, ..., x_p = raw score predictors of the criterion,
b_1, b_2, ..., b_j, ..., b_p = slope coefficients or regression weights,
r_{y1}, ..., r_{yj}, ..., r_{yp} = product moment criterion-predictor correlations,
S_1, ..., S_j, ..., S_p = standard deviations of the predictors,
S_y = the standard deviation of the criterion.

This is the formula that is derived in this paper. We will first present a derivation for the simplest multivariate case: one criterion and two predictors. A derivation is then presented for three predictors. The latter derivation is a useful exercise because it allows a review of the logic and procedures used in the derivation. In addition, it will motivate the use of summation notation when the algebra becomes complex. The derivation is then presented for the general case of p (finite) predictors. An integral part of this paper is Appendix A. In that appendix, a method is presented for finding the "normal equations" in regression analysis for raw score linear models.

Prior to starting the derivation for two predictors, let us outline the plan which will be followed in the derivations. The steps we will use are:

1. state the regression model
2. derive the normal equations (see Appendix A)
3. define the multiple correlation
4. apply rules of covariance and variance algebra to simplify the definitional form of the multiple correlation formula
5. substitute the normal equations into the multiple correlation formula
6. simplify.

We will refine these steps to suit a particular application.
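Before working through the algebra, it may help to see the target formula confirmed numerically. The following is a minimal sketch of my own (not part of the original derivation), assuming Python with numpy; the data and variable names are invented for illustration. It fits a two predictor model by least squares and checks that R computed from the formula above equals the simple correlation between Y and the predicted criterion.

    import numpy as np

    rng = np.random.default_rng(1)
    n, p = 200, 2
    X = rng.normal(size=(n, p))                      # gross raw score predictors
    Y = 3.0 + 1.5 * X[:, 0] - 0.8 * X[:, 1] + rng.normal(size=n)

    x = X - X.mean(axis=0)                           # deviation score predictors
    coef, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), x]), Y, rcond=None)
    a, b = coef[0], coef[1:]                         # slope intercept and slope coefficients

    Sy = Y.std(ddof=1)
    S = x.std(axis=0, ddof=1)
    r_y = np.array([np.corrcoef(x[:, j], Y)[0, 1] for j in range(p)])

    R_formula = np.sqrt(np.sum(b * r_y * Sy * S)) / Sy
    R_direct = np.corrcoef(Y, a + x @ b)[0, 1]       # corr(Y, Y-hat)
    assert np.isclose(R_formula, R_direct)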

Brief Overview of Regression Analysis and Derivation for Two Predictors

In this section we will review the basic concepts, logic and our notation for regression analysis. Introductory applied statistics textbooks can be consulted for more detailed information on regression analysis theory (see, for example, Lindeman et al., 1982). The intention in this section is to review the rationale of regression analysis.

The primary use of statistical regression analysis is controlled prediction and explanation of quantitative data. The basic principle that lies behind regression analysis involves selecting a general mathematical function that best matches the underlying form of the variables over which one desires to exercise predictability. Assume one is attempting to predict one raw score criterion by use of two raw score predictors. Assume further that the relationship between each predictor and the criterion is linear in form. The mathematical function most often selected to obtain the best linear "fit" for these conditions is provided by the following equation:

$$\hat Y = a + b_1 x_1 + b_2 x_2$$

where:

Ŷ = the predicted (not actual or observed) criterion,
a, b_1, b_2 = constants to be selected by the "least squares" procedure; a = the slope intercept, and b_1 and b_2 = slope coefficient terms,
x_1, x_2 = predictor variables in deviation score form.

It is conventional to express the predictor variables in deviation score form. That is, for each predictor, first find its mean and then subtract the mean from each score. For example, x_1 = X_1 - X̄_1. Here, for either variable, "cap X" is the actual (or gross) raw score and X̄ is its arithmetic mean. It is not necessary for any mathematical reason to re-express the predictors in deviation score form. This is done simply to force the algebra to be more tractable. As such, it is a matter of convenience. Note that we do not re-express either type of criterion, Y (or Ŷ), as deviations. We could, but we have chosen not to do this since most authors follow this convention.

Using deviation scores for the predictors, we can now write the two predictor raw score model as follows:

$$\hat Y = a + b_1(X_1 - \bar X_1) + b_2(X_2 - \bar X_2) = a + b_1 x_1 + b_2 x_2$$

As stated, we will use the second form in this paper.


The regression model stated above is an idealized mathematical model. If a variable set consisting of one criterion and two predictors can be assumed to be linear, then the model is a reasonable one to apply for prediction of actual or observed criterion scores. It is idealized in the sense that it assumes no error is made in the prediction of Y. In practice, when an actual criterion score is compared to the criterion score generated by the model, some error is likely to occur--the "fit" is less than perfect.¹ If we call the actual sample raw score criterion Y, we can state another model (an observed raw score model):

$$Y = \hat Y + e$$

where:

e = the amount of numerical error resulting from using the idealized mathematical model (Ŷ) to predict the actual criterion score (Y).

That is, an actual criterion consists of a predicted quantity plus an error component.

The error made in predicting the observed criterion score by the idealized mathematical model is:

$$e = Y - \hat Y$$

This is the quantity we want to be as small as possible in order to minimize the error in prediction. It can be seen that, if e = 0, the actual criterion is perfectly predicted by the idealized model (Y = Ŷ).

1. Readers of the 1982c paper may wonder why on page 2 thereof the raw score regression model was stated in terms of gross raw score (and not deviation score) predictors. As stated, it is not necessary mathematically to re-express. In any case, the major result we are seeking in this paper is unaffected by the initial form of the predictors. The derivation could be made without the translation of predictors into deviation score form, but the result would involve unnecessary and unwanted complexities. Practically speaking, this paper would have been very much longer if re-expression was not done.


The technique most often used in the social sciences to accomplish this goal is the "least squares" procedure. Essentially, this procedure seeks to maximize predictability by minimizing prediction error. The least squares criterion or goal is summarized in the following expression:¹

$$\sum_{i=1}^{n} (Y - \hat Y)^2 = \sum_{i=1}^{n} e^2 = \text{a minimum}$$

If we substitute the quantity for Ŷ previously defined, we can rewrite the least squares criterion as:

$$\sum (Y - a - b_1 x_1 - b_2 x_2)^2 = \sum e^2 = \text{a minimum}$$

(As an aside, "least squares" means we determine values for a, b_1 and b_2 in Ŷ such that the squared error term results in the least possible value.)

1. If it is understood that the summation limits range from the first observation (i=1) to the last (i=n), then we can drop the summation limits; n refers to the total number of observations for the criterion and predictors. This sample size is the same regardless of the number of predictors in the regression model. Later in the paper, when the algebra becomes more complex, we use summation limits extensively.

Normal Equations

Having stated the multiple regression model for two predictors, we now derive the so-called "normal equations". A discussion of the procedures and results we will need is presented in Appendix A. The reader may wish to read Appendix A at this point (or take the next step on faith).

The normal equations are derived from the least squares criterion using calculus. The basic idea that lies behind the technique for two predictors is to generate an equation for each of the constants in the regression model (a, b_1 and b_2). For the two predictor model, the normal equations for a, b_1 and b_2, respectively, are found to be:

$$\sum Y = na + b_1 \sum x_1 + b_2 \sum x_2$$
$$\sum x_1 Y = a \sum x_1 + b_1 \sum x_1^2 + b_2 \sum x_1 x_2$$
$$\sum x_2 Y = a \sum x_2 + b_1 \sum x_1 x_2 + b_2 \sum x_2^2$$

In the first normal equation (for a), n is the sample size.

These normal equations can be simplified by substituting various descriptive statistics into terms of the equations. Other terms will cancel in the process. For the reader's convenience in following the substitutions, some basic formulas for sample descriptive statistics are presented in Table 1.

Table 1
Descriptive Sample Statistics

Statistic             Raw Score Form                                    Deviation Score Form

Mean                  X̄_1 = ΣX_1 / n                                    same
Variance              S_1² = Σ(X_1 - X̄_1)² / (n-1)                      S_1² = Σx_1² / (n-1)
Standard Deviation    S_1 = √[Σ(X_1 - X̄_1)² / (n-1)]                    S_1 = √[Σx_1² / (n-1)]
Correlation of        r_{y1} = Σ(X_1 - X̄_1)(Y - Ȳ) / [(n-1)S_1S_y]      r_{y1} = Σx_1y / [(n-1)S_1S_y]
Y and x_1                                                               (where y = Y - Ȳ)

Note: For "mean" it is understood that the summation extends across all n values of X_1 (and Y for "correlation"). This applies equally to the other statistics defined in the table.
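As a quick check on Table 1, the following sketch (mine; numpy assumed, the five data values invented for illustration) computes each statistic from deviation scores and compares the correlation against numpy's built-in routine.

    import numpy as np

    X1 = np.array([2.0, 4.0, 8.0, 10.0, 11.0])       # illustrative raw scores
    Y  = np.array([1.0, 5.0, 6.0, 9.0, 14.0])
    n  = len(X1)

    x1, y = X1 - X1.mean(), Y - Y.mean()             # deviation score form
    S1 = np.sqrt(np.sum(x1 ** 2) / (n - 1))          # standard deviations
    Sy = np.sqrt(np.sum(y ** 2) / (n - 1))
    r_y1 = np.sum(x1 * y) / ((n - 1) * S1 * Sy)      # Table 1 correlation formula

    assert np.isclose(r_y1, np.corrcoef(X1, Y)[0, 1])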

In the first normal equation, we recognize that, on the right hand side:

$$\sum x_1 = \sum (X_1 - \bar X_1) = 0 \qquad \text{and} \qquad \sum x_2 = \sum (X_2 - \bar X_2) = 0$$

In the second normal equation, we can see that Σx_1² = Σ(X_1 - X̄_1)², but the sample variance is S_1² = Σx_1²/(n-1), so that (n-1)S_1² may be substituted for Σx_1². As for Σx_1x_2, we can use the definition of the sample correlation between x_1 and x_2 to simplify this term. By definition, for samples:

$$r_{12} = \frac{\sum (X_1 - \bar X_1)(X_2 - \bar X_2)}{(n-1)S_1 S_2} = \frac{\sum x_1 x_2}{(n-1)S_1 S_2} \qquad \text{or} \qquad \sum x_1 x_2 = (n-1) r_{12} S_1 S_2$$

This may be substituted. Finally, Σx_1Y may be simplified as follows. Now, Σx_1Y = Σ(X_1 - X̄_1)Y is identical to Σ(X_1 - X̄_1)(Y - Ȳ) = Σx_1y (where y = Y - Ȳ).¹ This is recognized to be the numerator of the correlation between x_1 and Y (r_{y1} or r_{1y}). Hence,

$$r_{y1} = \frac{\sum x_1 Y}{(n-1) S_1 S_y} \qquad \text{or} \qquad \sum x_1 Y = (n-1) r_{y1} S_y S_1$$

This may be substituted into the second normal equation.

1. PROOF:

$$\sum (X_1 - \bar X_1)(Y - \bar Y) = \sum (X_1 Y - \bar X_1 Y - X_1 \bar Y + \bar X_1 \bar Y)$$
$$= \sum X_1 Y - \bar X_1 \sum Y - \bar Y \sum X_1 + n \bar X_1 \bar Y = \sum X_1 Y - \bar X_1 (n \bar Y) - \bar Y (n \bar X_1) + n \bar X_1 \bar Y = \sum X_1 Y - n \bar X_1 \bar Y$$

Also,

$$\sum (X_1 - \bar X_1) Y = \sum X_1 Y - \bar X_1 \sum Y = \sum X_1 Y - n \bar X_1 \bar Y$$

Therefore,

$$\sum (X_1 - \bar X_1) Y = \sum (X_1 - \bar X_1)(Y - \bar Y)$$

End of proof.
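The identity just proved is also easy to confirm numerically; a small sketch (numpy assumed, data invented):

    import numpy as np

    X1 = np.array([2.0, 4.0, 8.0, 10.0, 11.0])
    Y  = np.array([1.0, 5.0, 6.0, 9.0, 14.0])

    x1 = np.asarray(X1) - X1.mean()
    # sum of x1*Y equals sum of x1*(Y - Ybar): centering Y changes nothing
    assert np.isclose(np.sum(x1 * Y), np.sum(x1 * (Y - Y.mean())))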

For the third equation, we can write down immediately the following simplifications:

$$\sum x_2 Y = (n-1) r_{y2} S_y S_2$$

In addition,

$$\sum x_2^2 = (n-1) S_2^2 \qquad \text{and} \qquad \sum x_1 x_2 = (n-1) r_{12} S_1 S_2$$

Making all these substitutions, we arrive at a simplified set of the originally stated normal equations:

$$\sum Y = na + b_1(0) + b_2(0)$$
$$(n-1) r_{y1} S_y S_1 = a(0) + b_1 (n-1) S_1^2 + b_2 (n-1) r_{12} S_1 S_2$$
$$(n-1) r_{y2} S_y S_2 = a(0) + b_1 (n-1) r_{12} S_1 S_2 + b_2 (n-1) S_2^2$$

To further simplify, eliminate zero terms and, for the last two normal equations, divide each term by (n-1). This gives us:

$$\sum Y = na$$
$$r_{y1} S_y S_1 = b_1 S_1^2 + b_2 r_{12} S_1 S_2$$
$$r_{y2} S_y S_2 = b_1 r_{12} S_1 S_2 + b_2 S_2^2$$

As a final simplification, we can divide through the first equation by n:

$$a = \bar Y$$

These are the normal equations we want to work with in the derivation for two predictors. For the reader's convenience in working through the derivation, we will restate them prior to the derivation.
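Note, in passing, that the two simplified normal equations are linear in b_1 and b_2, so in practice they can be solved as a 2×2 system once the sample statistics are in hand. A minimal sketch (mine; numpy assumed, the statistic values invented):

    import numpy as np

    S1, S2, Sy = 2.0, 3.0, 4.0                       # illustrative sample statistics
    r12, ry1, ry2 = 0.30, 0.60, 0.50

    A = np.array([[S1 * S1,        r12 * S1 * S2],
                  [r12 * S1 * S2,  S2 * S2      ]])
    rhs = np.array([ry1 * Sy * S1,
                    ry2 * Sy * S2])
    b1, b2 = np.linalg.solve(A, rhs)                 # slope coefficients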

Multiple Correlation

We are now ready to define the multiple correlation for one criterion and two predictors. By definition:¹

$$R_{Y.x_1,x_2} = \mathrm{corr}(Y, \hat Y) = \mathrm{corr}(Y,\ a + b_1 x_1 + b_2 x_2) = \frac{\mathrm{cov}(Y, \hat Y)}{\sqrt{\mathrm{var}(Y)\,\mathrm{var}(\hat Y)}} = \frac{\mathrm{cov}(Y,\ a + b_1 x_1 + b_2 x_2)}{\sqrt{\mathrm{var}(Y)\,\mathrm{var}(a + b_1 x_1 + b_2 x_2)}}$$

where:

corr means correlation,
cov means covariance, and
var means variance.

1. Alternative notation systems use R_{y.x_1x_2} or R_{y.12}, among others.

It is important to remember that a, b_1 and b_2 function as constants. Elementary covariance and variance operations performed on the above correlation formula yield, in the first step:

$$R_{Y.x_1,x_2} = \frac{\mathrm{cov}(Y,a) + \mathrm{cov}(Y, b_1 x_1) + \mathrm{cov}(Y, b_2 x_2)}{\sqrt{\mathrm{var}(Y)}\ \sqrt{\mathrm{var}(a) + \mathrm{var}(b_1 x_1) + \mathrm{var}(b_2 x_2) + 2\,\mathrm{cov}(a, b_1 x_1) + 2\,\mathrm{cov}(a, b_2 x_2) + 2\,\mathrm{cov}(b_1 x_1, b_2 x_2)}}$$

Applying rules of covariance and variance for variables and constants, we can achieve further simplification. This is done below.

To briefly review: the variance of any constant is zero; the variance of a product term containing a constant yields the squared constant times the variance of the variable--for example,

$$\mathrm{var}(b_1 x_1) = b_1^2\,\mathrm{var}(x_1)$$

When a covariance term contains constants, factor the constants outside the covariance operator (sometimes this reduces the covariance to zero)--for example,

$$\mathrm{cov}(a, b_1 x_1) = a b_1\,\mathrm{cov}(1, x_1) = 0$$

but

$$\mathrm{cov}(b_1 x_1, b_2 x_2) = b_1 b_2\,\mathrm{cov}(x_1, x_2)$$

By definition, the covariance is related to the simple correlation--for example,

$$\mathrm{cov}(x_1, x_2) = r_{12} S_1 S_2$$

This should appear correct since, by definition,

$$r_{12} = \frac{\mathrm{cov}(x_1, x_2)}{\sqrt{\mathrm{var}(x_1)\,\mathrm{var}(x_2)}}$$

Applying these rules:

$$R_{Y.x_1,x_2} = \frac{0 + b_1\,\mathrm{cov}(Y, x_1) + b_2\,\mathrm{cov}(Y, x_2)}{\sqrt{\mathrm{var}(Y)}\ \sqrt{0 + b_1^2\,\mathrm{var}(x_1) + b_2^2\,\mathrm{var}(x_2) + 0 + 0 + 2 b_1 b_2\,\mathrm{cov}(x_1, x_2)}}$$

As mentioned, by definition:

cov(Y, x_1) = r_{y1} S_y S_1
cov(Y, x_2) = r_{y2} S_y S_2
cov(x_1, x_2) = r_{12} S_1 S_2

One further observation should be made with respect to the variance of the predictors. For example, the variance of x_1 is:

$$\mathrm{var}(x_1) = \mathrm{var}(X_1 - \bar X_1)$$

By definition, the variance of this difference is:

$$\mathrm{var}(X_1) + \mathrm{var}(\bar X_1) - 2\,\mathrm{cov}(X_1, \bar X_1)$$

Since X̄_1 is a constant,

$$\mathrm{var}(x_1) = \mathrm{var}(X_1) + 0 - 0 = S_1^2$$

Similar results obtain for var(x_2). Therefore, when all substitutions are made:

$$R_{Y.x_1,x_2} = \frac{b_1 r_{y1} S_y S_1 + b_2 r_{y2} S_y S_2}{S_y \sqrt{b_1^2 S_1^2 + b_2^2 S_2^2 + 2 b_1 b_2 r_{12} S_1 S_2}}$$

This is the form of the multiple R we will use in the derivation. It will be restated for the reader's convenience.

Derivation

The following formula for one criterion and two predictors appears in many applied statistics textbooks:

$$R_{Y.x_1,x_2} = \frac{\sqrt{b_1 r_{y1} S_y S_1 + b_2 r_{y2} S_y S_2}}{S_y}$$

We are now able to show its derivation. For the reader's convenience, a restatement of the simplified set of normal equations and the multiple R formula is given in Table 2. The derivation involves two steps: a) substitute the normal equations into the numerator of the multiple R formula and b) simplify algebraically. See the text following Table 2.

Table 2
Normal Equations and Multiple Correlation Formula for Two Raw Score Predictors

Normal Equations

$$r_{y1} S_y S_1 = b_1 S_1^2 + b_2 r_{12} S_1 S_2$$
$$r_{y2} S_y S_2 = b_1 r_{12} S_1 S_2 + b_2 S_2^2$$

Multiple Correlation

$$R_{Y.x_1,x_2} = \frac{b_1 r_{y1} S_y S_1 + b_2 r_{y2} S_y S_2}{S_y \sqrt{b_1^2 S_1^2 + b_2^2 S_2^2 + 2 b_1 b_2 r_{12} S_1 S_2}}$$

NOTE: The term a = Ȳ is omitted because it plays no role in the derivation (other than zero). Proof involves the substitution of the normal equations into the numerator of the multiple R formula and simplifying. See text for details.

Notice that the numerator of the multiple R formula contains the terms r_{y1}S_yS_1 and r_{y2}S_yS_2. These terms are functionally related to the normal equations. If we substitute the normal equation for each term into R and rearrange terms, we obtain the following results:

$$R_{Y.x_1,x_2} = \frac{b_1\left(b_1 S_1^2 + b_2 r_{12} S_1 S_2\right) + b_2\left(b_1 r_{12} S_1 S_2 + b_2 S_2^2\right)}{S_y \sqrt{b_1^2 S_1^2 + b_2^2 S_2^2 + 2 b_1 b_2 r_{12} S_1 S_2}} = \frac{b_1^2 S_1^2 + b_2^2 S_2^2 + 2 b_1 b_2 r_{12} S_1 S_2}{S_y \sqrt{b_1^2 S_1^2 + b_2^2 S_2^2 + 2 b_1 b_2 r_{12} S_1 S_2}}$$

Hence, the numerator and the bracketed term of the denominator are identical. Now, the bracketed term of the denominator can be simplified algebraically if we remember radicals and laws of exponents.¹ Simplifying:

$$R_{Y.x_1,x_2} = \frac{\sqrt{b_1^2 S_1^2 + b_2^2 S_2^2 + 2 b_1 b_2 r_{12} S_1 S_2}}{S_y}$$

Therefore, recalling that this quantity equals the original numerator b_1 r_{y1}S_yS_1 + b_2 r_{y2}S_yS_2,

$$R_{Y.x_1,x_2} = \frac{\sqrt{b_1 r_{y1} S_y S_1 + b_2 r_{y2} S_y S_2}}{S_y}$$

END OF PROOF

1. Let the denominator (inside the brackets) be called A. Thus, the structure of the multiple R is R = A/(S_y √A). Recall the following permissible operation (rationalizing the denominator): A/√A = √A.

For readers familiar with the 1982c paper, it is possible to obtain a "cheap" proof in the analogous standard score regression model: if variables are in standard score form, then the standard deviations become unity. Thus, in the notation of the 1982c paper,

$$R_{Z_y.Z_1,Z_2} = \sqrt{B_1 r_{y1} + B_2 r_{y2}}$$

Derivation for Three Predictors

Let us now work out the derivation for a three predictor raw score linear regression model. This will allow us to review the logic and procedures of the derivation. We will also introduce the use of summation, which becomes necessary for the general case of p predictors.

The first step is to state the regression model. For three predictors:

$$\hat Y = a + b_1 x_1 + b_2 x_2 + b_3 x_3$$

We have simply added an independent variable to our prediction (idealized) mathematical model to form a four dimensional model (Ŷ and three predictors with their associated slope terms). As in the two predictor model, we make use of the least squares criterion to establish our goal of minimizing the prediction error:

$$\sum (Y - \hat Y)^2 = \sum (Y - a - b_1 x_1 - b_2 x_2 - b_3 x_3)^2 = \sum e^2 = \text{a minimum}$$

The next step is the application of partial differentiation to find derivatives for each of the terms in the prediction model (a, b_1, b_2 and b_3). This procedure produces the set of normal equations; Appendix A shows the procedures involved. Omitting the cumbersome algebra involved in simplifying the original set of normal equations, we can state the final and simplified set of normal equations as follows:¹

$$r_{y1} S_y S_1 = b_1 S_1^2 + b_2 r_{12} S_1 S_2 + b_3 r_{13} S_1 S_3$$
$$r_{y2} S_y S_2 = b_1 r_{12} S_1 S_2 + b_2 S_2^2 + b_3 r_{23} S_2 S_3$$
$$r_{y3} S_y S_3 = b_1 r_{13} S_1 S_3 + b_2 r_{23} S_2 S_3 + b_3 S_3^2$$

1. Recall that the value of a (= Ȳ) is determined in practice, but it plays no role in the derivation since it "drops out" in the covariance and variance operations of the multiple R derivation.

The above normal equations are the ones we will make use of in the derivation of the multiple R formula for three predictors. A restatement of them is presented in Table 3 for easy reference.

The third step is to define the multiple correlation of one criterion and three raw score predictors. Rules of covariance and variance algebra will allow us to simplify the definitional form of R. The term a is omitted.¹ The multiple R is defined as follows:

$$R_{Y.x_1,x_2,x_3} = \mathrm{corr}(Y, \hat Y) = \mathrm{corr}(Y,\ b_1 x_1 + b_2 x_2 + b_3 x_3) = \frac{\mathrm{cov}(Y, \hat Y)}{\sqrt{\mathrm{var}(Y)\,\mathrm{var}(\hat Y)}}$$

All of the above forms state equivalent ways to define the multiple R. The last is amenable to operations of covariance and variance. Applying rules of covariance and variance algebra:

$$R_{Y.x_1,x_2,x_3} = \frac{\mathrm{cov}(Y, b_1 x_1) + \mathrm{cov}(Y, b_2 x_2) + \mathrm{cov}(Y, b_3 x_3)}{\sqrt{\mathrm{var}(Y)}\ \sqrt{\mathrm{var}(b_1 x_1) + \mathrm{var}(b_2 x_2) + \mathrm{var}(b_3 x_3) + 2\,\mathrm{cov}(b_1 x_1, b_2 x_2) + 2\,\mathrm{cov}(b_1 x_1, b_3 x_3) + 2\,\mathrm{cov}(b_2 x_2, b_3 x_3)}}$$

$$= \frac{b_1 r_{y1} S_y S_1 + b_2 r_{y2} S_y S_2 + b_3 r_{y3} S_y S_3}{S_y \sqrt{b_1^2 S_1^2 + b_2^2 S_2^2 + b_3^2 S_3^2 + 2 b_1 b_2 r_{12} S_1 S_2 + 2 b_1 b_3 r_{13} S_1 S_3 + 2 b_2 b_3 r_{23} S_2 S_3}}$$

This is as far as we can simplify the multiple R at this point. We will retain this for easy reference; see Table 3.

1. For justification, the reader may want to include it in the definition of R and ascertain the result.

Table 3
Normal Equations and Multiple Correlation Formula for Three Raw Score Predictors

Normal Equations

$$r_{y1} S_y S_1 = b_1 S_1^2 + b_2 r_{12} S_1 S_2 + b_3 r_{13} S_1 S_3$$
$$r_{y2} S_y S_2 = b_1 r_{12} S_1 S_2 + b_2 S_2^2 + b_3 r_{23} S_2 S_3$$
$$r_{y3} S_y S_3 = b_1 r_{13} S_1 S_3 + b_2 r_{23} S_2 S_3 + b_3 S_3^2$$

Multiple Correlation

$$R_{Y.x_1,x_2,x_3} = \frac{b_1 r_{y1} S_y S_1 + b_2 r_{y2} S_y S_2 + b_3 r_{y3} S_y S_3}{S_y \sqrt{b_1^2 S_1^2 + b_2^2 S_2^2 + b_3^2 S_3^2 + 2 b_1 b_2 r_{12} S_1 S_2 + 2 b_1 b_3 r_{13} S_1 S_3 + 2 b_2 b_3 r_{23} S_2 S_3}}$$

NOTE: Again, we note that the term a (= Ȳ) is omitted from the normal equations and the multiple R. Derivation involves substituting the normal equations into the multiple R and simplifying. See the text for details.

We have stated the multiple regression model and least squares criterion, and presented the normal equations and the multiple R formula. The fourth step is to substitute the normal equations into the multiple R. If we substitute each of the normal equations for the appropriate term of the numerator of R, we obtain (see Table 3):

$$\mathrm{cov}(Y, \hat Y) = b_1 r_{y1} S_y S_1 + b_2 r_{y2} S_y S_2 + b_3 r_{y3} S_y S_3$$
$$= b_1\left(b_1 S_1^2 + b_2 r_{12} S_1 S_2 + b_3 r_{13} S_1 S_3\right) + b_2\left(b_1 r_{12} S_1 S_2 + b_2 S_2^2 + b_3 r_{23} S_2 S_3\right) + b_3\left(b_1 r_{13} S_1 S_3 + b_2 r_{23} S_2 S_3 + b_3 S_3^2\right)$$

Now let us write each parenthesized term on a separate line to form a covariance matrix:

$$\mathrm{cov}(Y, \hat Y) = b_1^2 S_1^2 + b_1 b_2 r_{12} S_1 S_2 + b_1 b_3 r_{13} S_1 S_3$$
$$\qquad\quad +\ b_1 b_2 r_{12} S_1 S_2 + b_2^2 S_2^2 + b_2 b_3 r_{23} S_2 S_3$$
$$\qquad\quad +\ b_1 b_3 r_{13} S_1 S_3 + b_2 b_3 r_{23} S_2 S_3 + b_3^2 S_3^2$$

At this point we will introduce summation to simplify the algebra. Consider the three squared terms along the northwest to southeast diagonal of the covariance matrix. It is clear that we might express these terms in summation as follows:

$$b_1^2 S_1^2 + b_2^2 S_2^2 + b_3^2 S_3^2 = \sum_{j=1}^{3} b_j^2 S_j^2$$

The remaining six terms in the matrix consist of three pairs of quantities:

$$2 b_1 b_2 r_{12} S_1 S_2 + 2 b_1 b_3 r_{13} S_1 S_3 + 2 b_2 b_3 r_{23} S_2 S_3$$

One common way to express this in summation (one of several forms often seen in multivariate statistics textbooks) is as follows:

$$2 \sum_{j=2}^{3} \sum_{i=1}^{j-1} b_i b_j r_{ij} S_i S_j \qquad \text{or} \qquad 2 \sum_{i<j} b_i b_j r_{ij} S_i S_j$$

The total number of terms to be summed is determined by multiplying the upper limits (3 × 2 = 6). In the double summation operation, hold the inside operator at i = 1 and increment the outer operator (j = 2, 3), giving ij = 12 + 13. Now increment i to 2 and complete the limits of j (with the side condition that i < j; e.g., ij = 22 is not permitted). The subscripts that result from all of the summation operations are: 12 + 13 + 23. Each value, of course, is taken twice.
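The subscript bookkeeping of the double summation can be mimicked directly in code; a tiny sketch (Python assumed):

    pairs = [(i, j) for j in range(2, 4) for i in range(1, j)]   # j = 2,3; i < j
    print(pairs)                                                 # [(1, 2), (1, 3), (2, 3)]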

Thus, the nine covariance terms of the multiple R numerator can be written in all of the following ways:

$$\mathrm{cov}(Y, \hat Y) = b_1^2 S_1^2 + b_2^2 S_2^2 + b_3^2 S_3^2 + 2 b_1 b_2 r_{12} S_1 S_2 + 2 b_1 b_3 r_{13} S_1 S_3 + 2 b_2 b_3 r_{23} S_2 S_3$$

$$= \sum_{j=1}^{3} b_j^2 S_j^2 + 2 \sum_{j=2}^{3} \sum_{i=1}^{j-1} b_i b_j r_{ij} S_i S_j$$

$$= b_1 r_{y1} S_y S_1 + b_2 r_{y2} S_y S_2 + b_3 r_{y3} S_y S_3 = \sum_{j=1}^{3} b_j r_{yj} S_y S_j$$

This last equation is simply a restatement of the multiple R numerator from Table 3. The second equation was just derived from the first equation.

Turning to the denominator of the multiple R in Table 3, it is readily apparent that it is similar to the covariance term above. That is,

$$\mathrm{var}(\hat Y) = b_1^2 S_1^2 + b_2^2 S_2^2 + b_3^2 S_3^2 + 2 b_1 b_2 r_{12} S_1 S_2 + 2 b_1 b_3 r_{13} S_1 S_3 + 2 b_2 b_3 r_{23} S_2 S_3 = \sum_{j=1}^{3} b_j^2 S_j^2 + 2 \sum_{j=2}^{3} \sum_{i=1}^{j-1} b_i b_j r_{ij} S_i S_j$$

If we now form the ratio of covariance and variance terms for the multiple R, we can complete the derivation for three predictors:

$$R_{Y.x_1,x_2,x_3} = \frac{\displaystyle\sum_{j=1}^{3} b_j^2 S_j^2 + 2 \sum_{j=2}^{3} \sum_{i=1}^{j-1} b_i b_j r_{ij} S_i S_j}{S_y \sqrt{\displaystyle\sum_{j=1}^{3} b_j^2 S_j^2 + 2 \sum_{j=2}^{3} \sum_{i=1}^{j-1} b_i b_j r_{ij} S_i S_j}}$$

Notice that the numerator and denominator (under the radical) are identical in form. If we make the same algebraic simplification we made for the two predictor derivation, we obtain:

$$R_{Y.x_1,x_2,x_3} = \frac{\sqrt{\sum_{j=1}^{3} b_j r_{yj} S_y S_j}}{S_y}$$

This completes the derivation for three predictors. END OF PROOF

We now derive the multiple R for any possible (finite) number of predictors in the linear regression model.

Derivation for p Predictors

The derivation of the multiple correlation formula for any number of predictors will be presented as a generalization of the two and three predictor cases. A rigorous mathematical proof that the generalization holds for p predictors could be provided by "mathematical induction". Our approach in this section is a straightforward multivariate generalization.

For reference, the following is a listing of the general steps for the p predictor variable case:

1. state the regression model for p predictors
2. derive the normal equations (see Appendix A)
3. define the multiple R
4. substitute normal equations into numerator of R
5. express the covariance term in summation
6. express the variance term in summation
7. simplify.

The linear regression model is:

$$\hat Y = a + b_1 x_1 + b_2 x_2 + \dots + b_j x_j + \dots + b_p x_p$$

The least squares criterion is:

$$\sum (Y - \hat Y)^2 = \sum e^2 = \text{a minimum}$$

Substituting for Ŷ:

$$\sum (Y - a - b_1 x_1 - b_2 x_2 - \dots - b_j x_j - \dots - b_p x_p)^2 = \sum e^2 = \text{a minimum}$$

Next we derive the normal equations.¹ In unsimplified form we have:

$$\sum Y = na + b_1 \sum x_1 + b_2 \sum x_2 + \dots + b_j \sum x_j + \dots + b_p \sum x_p$$
$$\sum x_1 Y = a \sum x_1 + b_1 \sum x_1^2 + b_2 \sum x_1 x_2 + b_3 \sum x_1 x_3 + \dots + b_j \sum x_1 x_j + \dots + b_p \sum x_1 x_p$$
$$\sum x_2 Y = a \sum x_2 + b_1 \sum x_1 x_2 + b_2 \sum x_2^2 + b_3 \sum x_2 x_3 + \dots + b_j \sum x_2 x_j + \dots + b_p \sum x_2 x_p$$
$$\sum x_3 Y = a \sum x_3 + b_1 \sum x_1 x_3 + b_2 \sum x_2 x_3 + b_3 \sum x_3^2 + \dots + b_j \sum x_3 x_j + \dots + b_p \sum x_3 x_p$$
$$\vdots$$
$$\sum x_p Y = a \sum x_p + b_1 \sum x_1 x_p + b_2 \sum x_2 x_p + b_3 \sum x_3 x_p + \dots + b_j \sum x_j x_p + \dots + b_p \sum x_p^2$$

1. Note that the normal equations for terms such as Σx₁x₂ are written such that the first subscript is always less than the second one. Since these products are symmetric (Σx₁x₂ = Σx₂x₁, etc.), this method simplifies the algebra. See Appendix A for more detail.

If we apply the same logic and make the same substitutions we made for 2 and 3 predictors, we obtain a simplified set of normal equations:

$$r_{y1} S_y S_1 = b_1 S_1^2 + b_2 r_{12} S_1 S_2 + b_3 r_{13} S_1 S_3 + \dots + b_j r_{1j} S_1 S_j + \dots + b_p r_{1p} S_1 S_p$$
$$r_{y2} S_y S_2 = b_1 r_{12} S_1 S_2 + b_2 S_2^2 + b_3 r_{23} S_2 S_3 + \dots + b_j r_{2j} S_2 S_j + \dots + b_p r_{2p} S_2 S_p$$
$$r_{y3} S_y S_3 = b_1 r_{13} S_1 S_3 + b_2 r_{23} S_2 S_3 + b_3 S_3^2 + \dots + b_j r_{3j} S_3 S_j + \dots + b_p r_{3p} S_3 S_p$$
$$\vdots$$
$$r_{yp} S_y S_p = b_1 r_{1p} S_1 S_p + b_2 r_{2p} S_2 S_p + b_3 r_{3p} S_3 S_p + \dots + b_j r_{jp} S_j S_p + \dots + b_p S_p^2$$

A restatement of the normal equations is given in Table 4.

Multiple Correlation Formula for p Predictors and Derivation

We are now ready to derive the multiple correlation formula for p predictors. See Table 4 for a statement of the definition of the multiple R. The covariance term is:

$$\mathrm{cov}(Y, \hat Y) = \mathrm{cov}(Y,\ a + b_1 x_1 + b_2 x_2 + \dots + b_j x_j + \dots + b_p x_p) = b_1 r_{y1} S_y S_1 + b_2 r_{y2} S_y S_2 + b_3 r_{y3} S_y S_3 + \dots + b_j r_{yj} S_y S_j + \dots + b_p r_{yp} S_y S_p$$

Now, substitute the normal equations (line for line--see Table 4):

$$\mathrm{cov}(Y, \hat Y) = b_1\left(b_1 S_1^2 + b_2 r_{12} S_1 S_2 + \dots + b_p r_{1p} S_1 S_p\right) + b_2\left(b_1 r_{12} S_1 S_2 + b_2 S_2^2 + \dots + b_p r_{2p} S_2 S_p\right) + \dots + b_p\left(b_1 r_{1p} S_1 S_p + b_2 r_{2p} S_2 S_p + \dots + b_p S_p^2\right)$$

Multiply each of the b_j terms inside the parentheses and write each parenthesized sum on a separate line:

$$\mathrm{cov}(Y, \hat Y) = b_1^2 S_1^2 + b_1 b_2 r_{12} S_1 S_2 + b_1 b_3 r_{13} S_1 S_3 + \dots + b_1 b_p r_{1p} S_1 S_p$$
$$\qquad\quad +\ b_1 b_2 r_{12} S_1 S_2 + b_2^2 S_2^2 + b_2 b_3 r_{23} S_2 S_3 + \dots + b_2 b_p r_{2p} S_2 S_p$$
$$\qquad\quad +\ b_1 b_3 r_{13} S_1 S_3 + b_2 b_3 r_{23} S_2 S_3 + b_3^2 S_3^2 + \dots + b_3 b_p r_{3p} S_3 S_p$$
$$\qquad\quad \vdots$$
$$\qquad\quad +\ b_1 b_p r_{1p} S_1 S_p + b_2 b_p r_{2p} S_2 S_p + b_3 b_p r_{3p} S_3 S_p + \dots + b_p^2 S_p^2$$

For reasons presented earlier, the term a is omitted in the derivation.

Table 4
Normal Equations and Multiple Correlation Formula for p Raw Score Predictors

Normal Equations

$$r_{y1} S_y S_1 = b_1 S_1^2 + b_2 r_{12} S_1 S_2 + b_3 r_{13} S_1 S_3 + \dots + b_j r_{1j} S_1 S_j + \dots + b_p r_{1p} S_1 S_p$$
$$r_{y2} S_y S_2 = b_1 r_{12} S_1 S_2 + b_2 S_2^2 + b_3 r_{23} S_2 S_3 + \dots + b_j r_{2j} S_2 S_j + \dots + b_p r_{2p} S_2 S_p$$
$$r_{y3} S_y S_3 = b_1 r_{13} S_1 S_3 + b_2 r_{23} S_2 S_3 + b_3 S_3^2 + \dots + b_j r_{3j} S_3 S_j + \dots + b_p r_{3p} S_3 S_p$$
$$\vdots$$
$$r_{yp} S_y S_p = b_1 r_{1p} S_1 S_p + b_2 r_{2p} S_2 S_p + b_3 r_{3p} S_3 S_p + \dots + b_j r_{jp} S_j S_p + \dots + b_p S_p^2$$

Multiple Correlation

$$R_{Y.x_1,x_2,\dots,x_j,\dots,x_p} = \mathrm{corr}(Y, \hat Y) = \mathrm{corr}(Y,\ b_1 x_1 + b_2 x_2 + b_3 x_3 + \dots + b_j x_j + \dots + b_p x_p) = \frac{\mathrm{cov}(Y, \hat Y)}{\sqrt{\mathrm{var}(Y)\,\mathrm{var}(\hat Y)}}$$

$$= \frac{b_1 r_{y1} S_y S_1 + \dots + b_j r_{yj} S_y S_j + \dots + b_p r_{yp} S_y S_p}{S_y \sqrt{b_1^2 S_1^2 + \dots + b_j^2 S_j^2 + \dots + b_p^2 S_p^2 + 2 b_1 b_2 r_{12} S_1 S_2 + \dots + 2 b_i b_j r_{ij} S_i S_j + \dots + 2 b_{p-1} b_p r_{p-1,p} S_{p-1} S_p}}$$

NOTE: The a = Ȳ term is omitted from the normal equations and multiple R. Derivation consists of substituting each normal equation into the r_{yj} S_y S_j terms of the covariance term of the multiple R. See text for details.

To facilitate working with such a complex matrix, we will introduce summation at this point. As the first step, we count the total number of terms to be summed. An inspection of the covariance matrix above makes it evident that each row consists of p terms. Since there is a total of p such rows, the entire covariance matrix consists of p × p = p² terms. For example, in the derivation for three predictors, we worked with three rows, each of which contained three terms, for a total of 3 × 3 = 3² = 9 terms.

In the p predictor model, the covariance matrix consists of two kinds of terms: diagonal terms (b₁²S₁² to b_p²S_p²) and off diagonal terms. It is evident that there are p such diagonal terms. A little algebra will tell us how many off diagonal terms are in the covariance matrix. Let X represent the total number of off diagonal terms. Then:

TOTAL MATRIX = p² = p + X, or X = p² - p, so that X = p(p-1).

Thus, the entire covariance matrix consists of p diagonal terms and p(p-1) off diagonal terms, for a total of p² terms.

We can view the structure of the covariance matrix in another way. This view is the "trick" in understanding the expression of the matrix in summation notation. Notice that the off diagonal terms exhibit a pattern (as we saw in the two and three predictor cases). Each b_i b_j r_{ij} S_i S_j corresponds to one other term in the matrix that is identical to it. For example, the first off diagonal term in row one is b₁b₂r₁₂S₁S₂, and the first term in row two is identical to it. In general, any off diagonal term in row i, column j is identical to the term in row j, column i (e.g., row 2, column 5 = row 5, column 2). Thus, the off diagonal terms consist of a number of identical pairs; there are p(p-1)/2 such pairs of off diagonal terms.

Suppose we halve the total p² matrix of terms and consider the upper half only, which makes a right triangle. In this halved matrix, we are considering the p diagonal terms and p(p-1)/2 off diagonal terms. That is, the upper triangle consists of p + p(p-1)/2 terms. To represent the entire covariance matrix (p² terms), simply double the number of off diagonal terms in the half matrix:

p + 2[p(p-1)/2] = p + p(p-1) total terms.

(Examine the matrix of covariance terms for the three predictor case for further clarification.) As explained, the cov(Y,Ŷ) matrix consists of p × p = p² terms: there are p of the b_j²S_j² terms and p(p-1), or 2[p(p-1)/2], of the b_i b_j r_{ij} S_i S_j terms in the total matrix.
Expressing the total number of diagonal terms in summation notation:


9

2'9

2 2

P P

...+ b.S. +...+ b s

j=1 J

The off diagonal terms can be expressed in Summation notation as follows:


P

12

S.S )
r
JP J P

EbbrSS

2E

j=2

i=i

ij

1Fbr those reader_; familiar with combinatoric -; the following may assist
in clarifying the logic.
2
which are combined with all such terms
There is a t_ctal of p b.2 S
2

at a time. In combinatorial notation, this means that p


one at a time =that is:
bine-1
total number of

com-

l(p-1)(p-2)...1

11(p-1)!

For the off diagonal terms, we construct

terms are

p(p -1)(p -2);;;1

P!

C)

terms

1,4S?

b,2 S
J

terms (pairs

2-

of identical terms, each combined with all other

like terms two at a timP).

Thus:
4.m

Total number Of

bb,r:,S,S; term8
11

P(p=l)(P=2) (13-3)
2

P-

(: ;)

P(P-1)

2!(p-2)!

(p-2) (p-3)

Verne, tl-,e entire covariance matrix consists of:

4;

d9

P(P-1)

terms

34.

For example, in the three predictor model, the first off diagonal term was seen to be b₁b₂r₁₂S₁S₂ and the last was seen to be b₂b₃r₂₃S₂S₃. In the case of a 10 predictor model, the first and last terms, respectively, would be:

$$b_1 b_2 r_{12} S_1 S_2 \qquad \text{and} \qquad b_9 b_{10}\, r_{9,10}\, S_9 S_{10}$$

We can now express the full covariance matrix in summation notation as:

$$\mathrm{cov}(Y, \hat Y) = \sum_{j=1}^{p} b_j^2 S_j^2 + 2 \sum_{j=2}^{p} \sum_{i=1}^{j-1} b_i b_j r_{ij} S_i S_j$$

Equivalently,

$$\mathrm{cov}(Y, \hat Y) = b_1 r_{y1} S_y S_1 + b_2 r_{y2} S_y S_2 + \dots + b_j r_{yj} S_y S_j + \dots + b_p r_{yp} S_y S_p = \sum_{j=1}^{p} b_j r_{yj} S_y S_j$$

Thus,

$$\mathrm{cov}(Y, \hat Y) = \sum_{j=1}^{p} b_j^2 S_j^2 + 2 \sum_{j=2}^{p} \sum_{i=1}^{j-1} b_i b_j r_{ij} S_i S_j = \sum_{j=1}^{p} b_j r_{yj} S_y S_j$$

The latter equation is very important in the final steps.

If the variance terms of the multiple R are examined, we see that √var(Y) is simply S_y by definition. The term var(Ŷ) can be manipulated by covariance and variance rules to produce the following (see Table 4):

$$\mathrm{var}(\hat Y) = \mathrm{var}(b_1 x_1 + b_2 x_2 + \dots + b_j x_j + \dots + b_p x_p) = \sum_{j=1}^{p} b_j^2 S_j^2 + 2 \sum_{j=2}^{p} \sum_{i=1}^{j-1} b_i b_j r_{ij} S_i S_j$$

THEREFORE, AFTER MUCH LABOR, WE CAN STATE THE MULTIPLE CORRELATION FORMULA FOR p PREDICTORS:

$$R_{Y.x_1,x_2,\dots,x_j,\dots,x_p} = \frac{\displaystyle\sum_{j=1}^{p} b_j^2 S_j^2 + 2 \sum_{j=2}^{p} \sum_{i=1}^{j-1} b_i b_j r_{ij} S_i S_j}{S_y \sqrt{\displaystyle\sum_{j=1}^{p} b_j^2 S_j^2 + 2 \sum_{j=2}^{p} \sum_{i=1}^{j-1} b_i b_j r_{ij} S_i S_j}} = \frac{\sqrt{\displaystyle\sum_{j=1}^{p} b_j r_{yj} S_y S_j}}{S_y}$$

END OF PROOF FOR p PREDICTORS
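A numerical check of the general result may be helpful. The sketch below is mine (not part of the derivation), assuming Python with numpy and invented data: it builds the simplified normal equations from sample statistics, solves for the b_j, and verifies that √(Σ b_j r_yj S_y S_j)/S_y matches the correlation between Y and Ŷ.

    import numpy as np

    rng = np.random.default_rng(2)
    n, p = 300, 5
    X = rng.normal(size=(n, p))
    Y = X @ rng.normal(size=p) + rng.normal(size=n)

    x = X - X.mean(axis=0)                          # deviation score predictors
    S = x.std(axis=0, ddof=1)                       # S_1, ..., S_p
    Sy = Y.std(ddof=1)
    r = np.corrcoef(x, rowvar=False)                # r_ij, with r_jj = 1
    r_y = np.array([np.corrcoef(x[:, j], Y)[0, 1] for j in range(p)])

    # simplified normal equations: r_yj Sy Sj = sum over i of b_i r_ij S_i S_j
    b = np.linalg.solve(r * np.outer(S, S), r_y * Sy * S)

    R_formula = np.sqrt(np.sum(b * r_y * Sy * S)) / Sy
    R_direct = np.corrcoef(Y, x @ b)[0, 1]
    assert np.isclose(R_formula, R_direct)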

Appendix A
Normal Equations in Regression Analysis

Introduction

In this appendix, we outline a set of procedures to apply in regression analysis for finding normal equations. The procedures are appropriate when: a) the regression model is linear, and b) the measures are in raw score form.

If variables are transformed to a nonlinear form prior to regression analysis, the procedures described in this appendix would not apply. Examples of nonlinear transformations include logarithmic, exponential and square root re-expression, or, in general, whenever the exponents of the variables in the regression model are not equal to unity. For example:

$$\hat Y = a + b_1 x_1 + b_2 x_2^2$$

This is a nonlinear mathematical model since the exponent of x_2 is not equal to 1.

To derive normal equations for a given regression model requires knowledge of elementary differential calculus which makes use of partial differentiation. Students who are familiar with calculus may read any textbook of mathematical calculus for the details (for example, Hoel, Port and Stone, 1971). For students who need to review this procedure, or who know some calculus and want to learn the technique, see Goodman, 1977, for a good introduction.

To render a conceptual understanding of normal equations as they are employed in the least squares procedure, let us take an example of a two predictor model. The mathematical model applied to a distribution assumed linear in each predictor is the one given in the text, namely:

$$\hat Y = a + b_1 x_1 + b_2 x_2$$

The raw score model includes an error component, and the error made in prediction of the criterion (Y) with the above model may be negative, zero or positive. The raw score model is:

$$Y = \hat Y + e$$

Solving for e, we obtain:

$$e = Y - \hat Y$$

This represents the amount of numerical error made on a score-by-score basis when we predict Y with the idealized model, Ŷ. To obtain an overall indication of the amount of prediction error for the entire raw score distribution, we might be tempted to define:

$$\sum (Y - \hat Y) = \sum e \qquad \text{(over all n observations)}$$

The problem with this approach is that the resulting sum on the left side turns out to be exactly zero:¹

$$\sum (Y - \hat Y) = \sum e = 0$$

That is, positive errors cancel out negative errors, leaving zero as the overall sum. This is obviously problematical because no matter how good or bad a particular mathematical model (linear or nonlinear) is for empirical score prediction, we would have no way of determining its utility (using the sensible criterion of minimizing prediction error).

1. Proof. For two predictors:

$$\sum (Y - \hat Y) = \sum (Y - a - b_1 x_1 - b_2 x_2) = \sum Y - na - b_1 \sum x_1 - b_2 \sum x_2 = \sum Y - n\bar Y - 0 - 0 = 0$$

(recalling that a = Ȳ and that deviation scores sum to zero). The generalization of this for p predictors is obvious.
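The zero-sum property of least squares errors is easy to confirm; a sketch (mine; numpy assumed, data invented):

    import numpy as np

    rng = np.random.default_rng(3)
    n = 100
    x = rng.normal(size=(n, 2))
    x -= x.mean(axis=0)                             # deviation score predictors
    Y = 1.0 + x @ np.array([2.0, -1.0]) + rng.normal(size=n)

    design = np.column_stack([np.ones(n), x])
    coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
    e = Y - design @ coef
    print(e.sum())                                  # ~0, up to rounding error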

For these reasons, the most widely used and accepted procedure for finding normal equations is based on the least squares criterion:

$$\sum (Y - \hat Y)^2 = \sum (Y - a - b_1 x_1 - b_2 x_2)^2 = \sum e^2 = \text{a minimum}$$

(The summation ranges from i=1 to i=n over the entire set of observations.) In words, least squares states: find numerical values for a, b_1 and b_2 which will make the prediction error the smallest possible numerical amount upon substitution.

The reader is already aware of one type of least squares result from elementary statistics. A kind of least squares criterion (and procedure) is used in defining the sample variance of a distribution; i.e.,

$$S_y^2 = \frac{\sum (Y - \bar Y)^2}{n-1}$$

The arithmetic mean, Ȳ, is used in variance formulas (instead of medians or other numbers) because the resulting variance is the smallest possible value when the mean is used rather than any other number (or combination of numbers) in that given distribution. This is derived through the same calculus procedure used in deriving normal equations, and is based on the same principle: optimization or minimization.

Take an example. Let the distribution be Y = 2, 4, 8, 10, 11, so that Ȳ = 7. Find each of the squared sums

Σ(Y-2)², Σ(Y-4)², Σ(Y-8)², Σ(Y-10)², Σ(Y-11)²

and compare it against Σ(Y-Ȳ)² = Σ(Y-7)². (The n-1 can be ignored since it is a constant and has no material bearing on the result.) It will be seen that only Σ(Y-7)² gives the smallest squared deviation sum.
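A direct check of this example (numpy assumed):

    import numpy as np

    Y = np.array([2.0, 4.0, 8.0, 10.0, 11.0])        # mean is 7
    for c in [2, 4, 7, 8, 10, 11]:
        print(c, np.sum((Y - c) ** 2))               # the sum is smallest at c = 7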

Our task in regression analysis is to find numerical values corresponding to terms in the model so as to satisfy the least squares criterion of minimum error of prediction. The resulting values, when substituted into the regression equation, satisfy the criterion of minimization. In essence, we solve p+1 equations (p = the number of predictors, and 1 corresponds to the slope intercept term), or one equation for each term in the model. The equations are then solved simultaneously to determine computing formulas to obtain the numerical values for the p+1 terms in the model. Finally, each predictor (and the slope intercept term) is passed through the resulting prediction equation to find a unique predicted criterion for each observation in the data set. The rest is statistical theory (see Lindeman et al. for an excellent discussion of regression theory).

To take the two predictor example once again,

$$\sum (Y - a - b_1 x_1 - b_2 x_2)^2 = \sum e^2 = \text{a minimum}$$

We are not interested in finding a computational formula for b_1 and b_2. Our goal is to stop one step short of doing that. We are interested in finding the normal equations, and simplifying them to substitute into the multiple R.

Plan

We will now set down a plan for finding the normal equations. A four phase plan is used throughout this appendix for finding normal equations. This will help structure the presentation.

A. state the regression model, Ŷ
B. state the mathematical function of the least squares criterion, Σ(Y - Ŷ)²
C. derive the normal equations for each of the terms in the model
D. summarize the normal equations.

Finding Normal Equations for the Two Predictor Model

Let us apply the four phase plan first to the two predictor case.

A. The regression function is:

$$\hat Y = a + b_1 x_1 + b_2 x_2$$

B. The least squares criterion is:

$$\sum (Y - \hat Y)^2 = \sum (Y - a - b_1 x_1 - b_2 x_2)^2 = \sum e^2$$

C. The procedures for deriving the normal equations are:

1. For the slope intercept term, a, we need to:

a) drop the exponent 2 and set the function equal to 0
b) distribute the summation operator
c) apply rules of summation for constants
d) solve in terms of the criterion variable, Y
e) substitute descriptive statistics and simplify

Applying each step in a) through e) produces:

a) Σ(Y - a - b₁x₁ - b₂x₂) = 0
b) ΣY - Σa - Σb₁x₁ - Σb₂x₂ = 0
c) ΣY - na - b₁Σx₁ - b₂Σx₂ = 0
d) ΣY = na + b₁Σx₁ + b₂Σx₂
e) ΣY = na + b₁(0) + b₂(0)

Recall that Σx₁ = Σx₂ = 0. Dividing through by n gives us the normal equation for a (in simplified form):

$$a = \bar Y$$

2. The procedures for finding the normal equation for b_1 are:

a) drop the exponent 2 and set the function equal to 0
b) multiply the function by x_1
c) distribute the x_1 term
d) distribute the summation operator
e) apply rules of summation for constants
f) solve in terms of the criterion variable, Y
g) substitute descriptive statistics and simplify

Applying each step in turn produces:

a) Σ(Y - a - b₁x₁ - b₂x₂) = 0
b) Σ(Y - a - b₁x₁ - b₂x₂)x₁ = 0
c) Σ(Yx₁ - ax₁ - b₁x₁² - b₂x₁x₂) = 0
d) ΣYx₁ - Σax₁ - Σb₁x₁² - Σb₂x₁x₂ = 0
e) ΣYx₁ - aΣx₁ - b₁Σx₁² - b₂Σx₁x₂ = 0
f) ΣYx₁ = aΣx₁ + b₁Σx₁² + b₂Σx₁x₂
g) since ΣYx₁ = (n-1)r_{y1}S_yS_1, Σx₁² = (n-1)S₁², and Σx₁x₂ = (n-1)r₁₂S₁S₂, we can substitute these quantities (recall that Σx₁ = 0) and obtain:

$$(n-1) r_{y1} S_y S_1 = 0 + b_1 (n-1) S_1^2 + b_2 (n-1) r_{12} S_1 S_2$$

If we divide the last equation by (n-1), we obtain:

$$r_{y1} S_y S_1 = b_1 S_1^2 + b_2 r_{12} S_1 S_2$$

This is the normal equation in simplified form that we used in the derivation (see Table 2).

3. The steps for finding the normal equation for b_2 parallel those for b_1:

a) drop the exponent 2 and set the function equal to 0
b) multiply the function by x_2
c) distribute the x_2 term
d) distribute the summation operator
e) apply rules of summation for constants
f) solve in terms of the criterion variable, Y
g) substitute descriptive statistics and simplify

Applying each step in order:

a) Σ(Y - a - b₁x₁ - b₂x₂) = 0
b) Σ(Y - a - b₁x₁ - b₂x₂)x₂ = 0
c) Σ(Yx₂ - ax₂ - b₁x₁x₂ - b₂x₂²) = 0
d) ΣYx₂ - Σax₂ - Σb₁x₁x₂ - Σb₂x₂² = 0
e) ΣYx₂ - aΣx₂ - b₁Σx₁x₂ - b₂Σx₂² = 0
f) ΣYx₂ = aΣx₂ + b₁Σx₁x₂ + b₂Σx₂²
g) since ΣYx₂ = (n-1)r_{y2}S_yS_2, Σx₁x₂ = (n-1)r₁₂S₁S₂, and Σx₂² = (n-1)S₂², we can substitute these quantities and obtain:

$$(n-1) r_{y2} S_y S_2 = 0 + b_1 (n-1) r_{12} S_1 S_2 + b_2 (n-1) S_2^2$$

If we divide through by (n-1) we have:

$$r_{y2} S_y S_2 = b_1 r_{12} S_1 S_2 + b_2 S_2^2$$

This was the simplified form of the normal equation for b_2 that was used in the derivation (see Table 2).

D. We now recapitulate. As noted, a normal equation is derived at the point when we solve in terms of the criterion variable, Y; subsequent steps are used to simplify. The normal equations for a, b_1 and b_2 were:

For a:   ΣY = na + b₁Σx₁ + b₂Σx₂
For b₁:  ΣYx₁ = aΣx₁ + b₁Σx₁² + b₂Σx₁x₂
For b₂:  ΣYx₂ = aΣx₂ + b₁Σx₁x₂ + b₂Σx₂²

When we simplified the normal equations, we obtained the following set used in the derivation for two predictors:¹

a = Ȳ²
r_{y1}S_yS_1 = b₁S₁² + b₂r₁₂S₁S₂
r_{y2}S_yS_2 = b₁r₁₂S₁S₂ + b₂S₂²

1. Readers of the 1982c paper should recognize the remarkable similarity between raw score and standard score normal equations. If the above variables were standardized, each S = 1, Ȳ = 0 and a = 0, making each normal equation set equal.

2. We actually disregarded the term a in the derivation because it was seen to "drop out" when it was included in the algebra. It is included here because the slope intercept term is included in the regression equation for criterion score calculation. The formula used is:

$$\hat Y = \bar Y + b_1 x_1 + b_2 x_2$$

See Lindeman et al. for additional methods of writing this equation.

Finding Normal Equations for p Predictors

The rules and method for deriving a set of normal equations when the number of predictors is greater than two are generalizations of the two (or one¹) predictor case. We will show two methods for the general case. The first method will use the four phase plan. The second is a short-cut technique; but the shorter method depends on first showing the longer one.

1. What are the normal equations for the one predictor model? The reader may find it instructive to derive the normal equations for this linear model. This can be done using the above procedures as guidelines. ANSWER:

$$a = \bar Y \qquad \text{and} \qquad r_{y1} S_y S_1 = b_1 S_1^2$$

The "multiple" R in this case is the simple Pearson product moment correlation; b_1, which is equal to r_{y1}S_y/S_1, is obtained from the second equation. Thus, the regression (prediction) equation upon substitution is:

$$\hat Y = \bar Y + b_1 x_1 = \bar Y + r_{y1} \frac{S_y}{S_1} x_1$$

Applying the four phase plan gives the following results for the general case.

A. The regression model is:

$$\hat Y = a + b_1 x_1 + b_2 x_2 + \dots + b_j x_j + \dots + b_p x_p$$

B. The function to be minimized according to the least squares criterion is:

$$\sum (Y - a - b_1 x_1 - b_2 x_2 - \dots - b_j x_j - \dots - b_p x_p)^2$$

C. The procedures for finding the normal equations for a and any b_j term are as follows:

1. In deriving the normal equation for a, regardless of the number of predictors, the result is always the same: a = Ȳ.

2. Finding the normal equation for any b_j term can be done in seven steps:

a) drop the exponent 2 and set the function equal to 0
b) multiply the function by x_j
c) distribute the x_j term
d) distribute the summation operator
e) apply rules of summation for constants
f) solve in terms of the criterion variable, Y
g) substitute descriptive statistics and simplify

Applying these steps in turn produces:

a) Σ(Y - a - b₁x₁ - b₂x₂ - ... - b_jx_j - ... - b_px_p) = 0
b) Σ(Y - a - b₁x₁ - b₂x₂ - ... - b_jx_j - ... - b_px_p)x_j = 0
c) Σ(Yx_j - ax_j - b₁x₁x_j - b₂x₂x_j - ... - b_jx_j² - ... - b_px_px_j) = 0
d) ΣYx_j - Σax_j - Σb₁x₁x_j - ... - Σb_jx_j² - ... - Σb_px_px_j = 0
e) ΣYx_j - aΣx_j - b₁Σx₁x_j - ... - b_jΣx_j² - ... - b_pΣx_px_j = 0
f) ΣYx_j = aΣx_j + b₁Σx₁x_j + b₂Σx₂x_j + ... + b_jΣx_j² + ... + b_pΣx_px_j
g) (n-1)r_{yj}S_yS_j = b₁(n-1)r_{1j}S₁S_j + b₂(n-1)r_{2j}S₂S_j + ... + b_j(n-1)S_j² + ... + b_p(n-1)r_{jp}S_jS_p

Dividing through by (n-1):

$$r_{yj} S_y S_j = b_1 r_{1j} S_1 S_j + b_2 r_{2j} S_2 S_j + \dots + b_j S_j^2 + \dots + b_p r_{jp} S_j S_p$$

Thus, the normal equations for any number of predictors in the regression model consist of a = Ȳ and p normal equations of the general form defined above.

Alternate Procedure

The above normal equation for any b_j term is a general result. Now a much simpler procedure which makes use of this fact will be presented.

Recall that the simple correlation of any variable with itself is equal to 1:

$$r_{11} = r_{22} = \dots = r_{jj} = \dots = r_{pp} = 1$$

Also recall that the covariance of any variable with itself is equal to the variance of that variable; that is,

$$\mathrm{cov}(X_1, X_1) = S_1^2 \qquad \text{or, in general,} \qquad \mathrm{cov}(x_j, x_j) = S_j^2$$

(Another way to denote cov(X₁, X₁) is S₁₁; in general, we can write cov(x_j, x_j) = S_jj or S_j².)

From these facts, it is possible to write down an entire set of normal equations for any number of predictors. If

$$r_{yj} S_y S_j = b_1 r_{1j} S_1 S_j + b_2 r_{2j} S_2 S_j + \dots + b_j S_j^2 + \dots + b_p r_{jp} S_j S_p$$

holds for any b_j term, then it holds for j = 1, j = 2, j = 3, ..., j = p.

For example, assume p = 2 predictors. We know that the set of normal equations will consist of 2 × 2 = 4 terms. Thus, first write out the general result for r_{yj}S_yS_j twice:

$$r_{yj} S_y S_j = b_1 r_{1j} S_1 S_j + b_2 r_{2j} S_2 S_j$$
$$r_{yj} S_y S_j = b_1 r_{1j} S_1 S_j + b_2 r_{2j} S_2 S_j$$

Now, substitute the appropriate j value: j = 1 for line 1, and j = 2 for line 2, as follows:

$$r_{y1} S_y S_1 = b_1 r_{11} S_1 S_1 + b_2 r_{21} S_2 S_1 \qquad \text{OR} \qquad r_{y1} S_y S_1 = b_1 S_1^2 + b_2 r_{12} S_1 S_2$$
$$r_{y2} S_y S_2 = b_1 r_{12} S_1 S_2 + b_2 r_{22} S_2 S_2 \qquad \text{OR} \qquad r_{y2} S_y S_2 = b_1 r_{12} S_1 S_2 + b_2 S_2^2$$

The last set shows the subscripts of the correlations between predictors, and the predictor standard deviations, written so that the first subscript is less than the second subscript. As mentioned in the text, this convention makes it easier to read the matrix (and see the symmetry of off diagonal terms).
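In matrix terms, the shortcut writes the whole set of normal equations at once: the left side stacks the r_{yj}S_yS_j values into a vector, and the right side is the matrix of r_{ij}S_iS_j entries times the vector of b's. A minimal sketch (mine; numpy assumed, the statistic values invented):

    import numpy as np

    S  = np.array([2.0, 3.0])                        # predictor standard deviations
    Sy = 4.0
    r  = np.array([[1.0, 0.3],                       # r_jj = 1 on the diagonal
                   [0.3, 1.0]])
    r_y = np.array([0.6, 0.5])

    A = r * np.outer(S, S)                           # entries r_ij S_i S_j; diagonal S_j^2
    b = np.linalg.solve(A, r_y * Sy * S)             # solves all normal equations at once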

Example for Five Predictors

To exemplify the procedures for p predictors, we will work through the solution of normal equations for five predictors. We will show the solution by the shortcut method. The long method could be used by applying the steps listed above for any b_j term; but since the shorter method gives identical results, we will not work through the longer method.

We begin by writing out the 5² = 25 terms for the general r_{yj}S_yS_j normal equation. That is, write out the general line

$$r_{yj} S_y S_j = b_1 r_{1j} S_1 S_j + b_2 r_{2j} S_2 S_j + b_3 r_{3j} S_3 S_j + b_4 r_{4j} S_4 S_j + b_5 r_{5j} S_5 S_j$$

on five separate lines. Substitute the appropriate j value (j = 1 for line 1, j = 2 for line 2, etc.); set r₁₁ = r₂₂ = ... = r₅₅ = 1, and set S₁S₁ = S₁², S₂S₂ = S₂², etc. This yields:

$$r_{y1} S_y S_1 = b_1 S_1^2 + b_2 r_{21} S_2 S_1 + b_3 r_{31} S_3 S_1 + b_4 r_{41} S_4 S_1 + b_5 r_{51} S_5 S_1$$
$$r_{y2} S_y S_2 = b_1 r_{12} S_1 S_2 + b_2 S_2^2 + b_3 r_{32} S_3 S_2 + b_4 r_{42} S_4 S_2 + b_5 r_{52} S_5 S_2$$
$$r_{y3} S_y S_3 = b_1 r_{13} S_1 S_3 + b_2 r_{23} S_2 S_3 + b_3 S_3^2 + b_4 r_{43} S_4 S_3 + b_5 r_{53} S_5 S_3$$
$$r_{y4} S_y S_4 = b_1 r_{14} S_1 S_4 + b_2 r_{24} S_2 S_4 + b_3 r_{34} S_3 S_4 + b_4 S_4^2 + b_5 r_{54} S_5 S_4$$
$$r_{y5} S_y S_5 = b_1 r_{15} S_1 S_5 + b_2 r_{25} S_2 S_5 + b_3 r_{35} S_3 S_5 + b_4 r_{45} S_4 S_5 + b_5 S_5^2$$

If one desires, the subscripts may be reversed for variables in the upper right hand triangle to render the first subscript less than the second. The result is the same set of normal equations that would be obtained if the longer method were used to derive the normal equations.

The author would be pleased to receive comments and reactions from readers of this paper and others that appear in this series. My intention is to prepare a textbook of proofs and derivations for social science students. I have long felt the need to bridge the gap between the standard applied statistics (and psychometrics) textbooks currently on the market and mathematical statistics. The mathematical sophistication of students entering college and university is rising steadily, and a textbook such as I am contemplating would make a contribution, I feel. While it is true that a "real" understanding of statistical (and probability) theory requires substantial mathematical coursework, it is nonetheless true that more in the way of explanation and justification of results in probability and statistics is possible. It is my belief that a textbook showing detailed presentations of proofs/derivations would be a welcome addition to the market.

I would like to hear from readers (students, professors and others) regarding these papers. For example, are they clear? Are there proofs that you would like to see (statistics or psychometrics) in this format? Please remember, at this time I am limiting my selections to those which can be presented with algebra. I welcome comments on any level from readers of these papers. My mailing address is:

Francis J. O'Brien, Jr.
106 Morningside Drive, Apartment #5
New York, New York 10027

Appendix B

ERRATA for "A Derivation of the Sample Multiple Correlation Formula for Standard Scores," ED 223 429

NOTE: "Page" refers to the original page numbers in the upper right hand corner. Corrected prose is underlined in the original; formulas are rewritten with the corrections applied.

Page 1: "Let us review some concepts, math..." should read "Let us review some of the concepts, notation..."
Page 3, footnote 1: should read "If it is understood that the summations range from i=1 to i=n, then we can drop the summation limits altogether."
Page 21, first formula: should read cov(Z_y, B₁Z₁ + B₂Z₂ + B₃Z₃ + ... + B_jZ_j + ... + B_pZ_p), and the corresponding correlation term should read corr(Z_y, B₁Z₁ + B₂Z₂ + B₃Z₃ + ... + B_jZ_j + ... + B_pZ_p).
Page 27, statement under Plan: add period after "D".
Page 27, two lines under the previous erratum: "demonstrate".
Page 31, line 2: "consisdered" should read "considered".
Page 33, 4 sentences from bottom: "first".

REFERENCES

Goodman, A. W. Calculus for the Social Sciences. Philadelphia: W. B. Saunders Company, 1977.

Hoel, Paul G., Sidney C. Port and Charles J. Stone. Introduction to Statistical Theory. Boston: Houghton Mifflin, 1971.

Lindeman, Richard H., Ruth Gold and Peter Merenda. Bivariate and Multivariate Analysis. Chicago: Scott, Foresman and Co., 1982.

O'Brien, Francis J., Jr. A proof that t and F are identical: the general case, 1982a. ERIC ED 215 894.

O'Brien, Francis J., Jr. Proof that the sample bivariate correlation coefficient has limits ±1, 1982b. ERIC ED 216 874.

O'Brien, Francis J., Jr. A derivation of the sample multiple correlation formula for standard scores, 1982c. ERIC ED 223 429.

DOCUMENT RESUME

ED 280 896                                    TM 870 228

AUTHOR      O'Brien, Francis J., Jr.
TITLE       A Derivation of the Unbiased Standard Error of Estimate: The General Case.
PUB DATE    87
NOTE        54p.; For earlier monographs in this series, see ED 215 894, ED 216 874, ED 223 429, and ED 235 205.
PUB TYPE    Reports - Research/Technical (143) -- Guides - Classroom Use - Materials (For Learner) (051)
EDRS PRICE  MF01/PC03 Plus Postage.
DESCRIPTORS *Error of Measurement; *Estimation (Mathematics); Goodness of Fit; Higher Education; *Mathematical Models; *Predictor Variables; Proof (Mathematics); *Raw Scores; Regression (Statistics); Statistical Studies
IDENTIFIERS Applied Statistics; *Z Scores

ABSTRACT
This paper is part of a series of applied statistics monographs intended to provide supplementary reading for applied statistics students. In the present paper, derivations of the unbiased standard error of estimate for both the raw score and standard score linear models are presented. The derivations for raw score linear models are presented in graduated steps of generality for one, two, three, and any finite number of predictors. A brief overview of regression analysis precedes the derivations. Appendices include: (1) errata for a derivation of the sample multiple correlation formula; and (2) a discussion of linear and nonlinear regression models. (LMO)

A Derivation of the Unbiased Standard Error of Estimate: the General Case

Francis J. O'Brien, Jr., Ph.D.

1987. Francis J. O'Brien, Jr. ALL RIGHTS RESERVED

"A Derivation of the Unbiased Standard Error of


Estimate:the General Case"

ERRATA SHEET for

4,

2nd equation

(X - X )
1

6,

CHANGE TO

NOW READS

PAGE

definition
2

Y.x ,x , ...,x ,...x

Y.x ,x

of R

definition
of r

9,

3xY

91

9,

Ix

9
1

footnote b

...,x
2

align subscript for 7

9, definition
of

x
1

10,

ist equation

Y.x

Y.x1

12, Sth equation

Ix

Ex y

18,

2nd equation

2 2

2 2
2 2
+ b
(b S

1 2

2 2

2b b r
S S
1 2 91 9 2

-2(Ebr SS)
j'l

9j Y

2b b r
S S )
1 2 12 1 2

26, 3rd line


from bottom

2 2
b S
2 2

(b S +
1 1

-2(Eb r

S S

j41 j 9j 9 j

PAGE

CORRECT TO

NOW. READS

31, 2nd line of


2nd equation

34

'2

1 -

1 - 1

br SY
j 9j Y

Refers to page at top.

br
j

j.

SS
9.1

9 j

Table of Contents

Introduction
Overview of Derivation
Overview of Regression Analysis
The Standard Error of Estimate
Derivations for Raw Score Model
    Derivation for One Predictor
    Derivation for Two Predictors ........................ 14
    Derivation for Three Predictors ...................... 20
    Derivation for p Predictors .......................... 29
Derivations for Standard Score Model ..................... 36
    Introduction ......................................... 36
    Derivation for One Predictor ......................... 37
    Outline for Derivations .............................. 40
Appendix A: Errata for ED 235 205 ........................ 41
Appendix B: Discussion of Linear and Nonlinear Regression Models ... 42
Notes .................................................... 46
References ............................................... 48

List of Tables

Table                                                     Page
1. Basic Sample Descriptive Statistics for One Predictor
   Raw Score Model
2. Substitution Equations for Two Predictor Raw Score Model .. 17
3. Functions of R for Two Predictor Raw Score Model .......... 19
4. Generalized Substitution Equations for Raw Score Model .... 25
5. Functions of R for Three Predictor Raw Score Model ........ 27
6. Functions of R for p Predictor Raw Score Model ............ 34

A Derivation of the Unbiased Standard Error of Estimate: the General Case

Francis J. O'Brien, Jr., Ph.D.

Introduction

This paper represents the fifth in a series of applied statistics monographs (see O'Brien 1982a, 1982b, 1982c, 1983a). The purpose of these papers is to provide supplementary reading for applied statistics students. The intended audience is social science graduate and advanced undergraduate students. The minimum background for most of the existing and forthcoming papers is familiarity with elementary analysis of variance, and multiple correlation and regression analysis.

The unique feature of this series is detailed proofs and derivations of important formulas and relationships which are not readily available in textbooks, journal articles and similar sources. Each proof or derivation is presented in a detailed and clear fashion using well defined and consistent notation. When necessary, a review of relevant algebra is provided. Calculus is not used or assumed.

The present paper assumes familiarity with two previous papers in this series (O'Brien, 1982c, 1983a). Each paper formulated a detailed derivation of the multiple correlation formula of one criterion and p predictors for the linear model. The first paper (1982c) presented a derivation of the multiple R based on standard (Z) scores,¹ and the second showed the analogous derivation for the raw score model.²

Overview of Derivation

In the present paper, derivations of the unbiased standard error of estimate for both the raw score and standard score linear models are presented. The derivations will be presented in graduated steps of generality. First the derivation for one criterion (dependent) variable and one predictor (independent) variable is presented for the raw score model. A derivation for two raw score predictors is then presented. Next, the derivation for the three predictor case is formulated. Finally, the derivation for any (finite) number of predictors is presented. Derivations for the standard score model are then outlined.

Overview of Regression Analysis

Prior to presenting the derivations, a brief overview


3

of regression analysis will be given. Let us consider


the linear regression model for one raw score criteHon and
one predictor. Assume one is attempting to predict one
criterion with one predictor.
We assume that the model
4

is linear in form.
The mathematical model we might select
to"fit" such a distribution is the simple linear equation:

A
a + b X
1 1
Where:

A
Y

the predicted criterion,


= the slope intercept term,

the slope coefficient term,


1

the predictor variable in deviation score form ;

i.e.,

x -

where
1

"T

is the arithmetic mean.

If a scatter diagram were constructed for this hypothetical model (based on


actual data, of course), the actual rau score observations would in all
likelihood not fall on the line defined
A

by the linear equation of the idealized mai-hematical model (Y).

A
Such deviations from Y
are considered errors of prediction.
We can
conceive a raw scor :? observation as consisting of a component predicted by the
model plus an error component. That is:

+ e
Where:
Y

the actual criterion we want to predict by YA


the amount of numerical error resulting from using

A
the idealized mathematical model
actual raw score criterion (Y).

(Y)

to predict the

an actual dependent (criterion)


variable score consists of the
quantity predicted by the idealized "best fitting" line plus an error
component.
That is,

The error made in predicting the observed criterion score by the model

is

simply:

enY-Y
One of the goals of regression analysis is to minimize the prediction error
denoted by e above. It can be seen that if e = 0, then the actual criterion is
perfectly predicted by the selected mathematical model. That is to say, the
simple linear equation fitted to the observed data points,

$$\hat{Y} = a + b_1 x_1,$$

predicts every observation (Y) in the distribution. Geometrically, when
e = 0, every Y score falls on the straight line, $\hat{Y}$. For this case, the
values corresponding to a and b can be solved empirically using elementary
algebra based on the observed data. Rarely, however, do such distributions
exist in the social sciences. Consequently, we are forced to select
procedures which will provide computing formulas for calculating the a and b
terms.

The technique most often used in the social sciences to minimize the
error of prediction is the "least squares" procedure. Essentially, this
procedure seeks to maximize predictability by minimizing prediction error.
The least squares criterion or goal is summarized in the following
expression:

$$\sum e^2 = \text{a minimum}$$

If we substitute the quantity for $\hat{Y}$ previously defined, we can rewrite the
least squares criterion as:

$$\sum e^2 = \sum\left[Y - (a + b_1 x_1)\right]^2 = \sum\left(Y - a - b_1 x_1\right)^2 = \text{a minimum}$$

(As an aside, "least squares" means we determine values for a and b such that
the squared error term results in the least possible value).
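To make the least squares criterion concrete, here is a minimal numerical sketch (mine, not the author's; the data and variable names are illustrative). It uses the closed-form solution for one deviation-score predictor — $b_1 = \sum x_1 y / \sum x_1^2$, with a equal to the criterion mean — and evaluates $\sum e^2$:

```python
import numpy as np

# Hypothetical data for illustration only.
X1 = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
Y = np.array([3.0, 7.0, 8.0, 12.0, 15.0])

# Put the predictor in deviation score form, as in the paper.
x1 = X1 - X1.mean()

# Closed-form least squares estimates for one predictor:
# b1 = sum(x1*y) / sum(x1^2); a equals the criterion mean
# when the predictor is in deviation form.
b1 = np.sum(x1 * (Y - Y.mean())) / np.sum(x1 ** 2)
a = Y.mean()

Y_hat = a + b1 * x1
e = Y - Y_hat
print("a =", a, " b1 =", b1, " sum(e^2) =", np.sum(e ** 2))
```

Any other choice of a and b₁ would produce a larger value of Σe², which is precisely what the least squares criterion requires.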
The Standard Error of Estimate

The standard error of estimate provides a measure of the average amount of
error that results from using $\hat{Y}$ for Y score prediction (see
Lindeman, et al.). The unbiased standard error of estimate for one
predictor is defined as follows:

$$s_{Y \cdot x_1} = \sqrt{\frac{\sum\left(Y - \hat{Y}\right)^2}{n-2}} = \sqrt{\frac{\sum\left[Y - (a + b_1 x_1)\right]^2}{n-2}} = \sqrt{\frac{\sum\left(Y - a - b_1 x_1\right)^2}{n-2}}$$

Where:

$s_{Y \cdot x_1}$ = the unbiased standard error of estimate for one predictor,
$n$ = the sample size.

Note that the predictor variable ($x_1$) is in deviation form.

However, the criterion to be predicted (Y) is not transformed; nor do we
transform the predicted criterion ($\hat{Y}$). This is the definitional
formula for the unbiased standard error of estimate. An equivalent formula,
shown in virtually all applied statistics textbooks, is as follows:

$$s_{Y \cdot x_1} = S_Y \sqrt{\frac{(n-1)\left(1 - r^2_{x_1 y}\right)}{n-2}}$$

Where:

$S_Y$ = the standard deviation of the actual criterion score,
$r^2_{x_1 y}$ = the square of the simple Pearson correlation between the predictor in deviation form ($x_1$) and the criterion ($Y$).

This formula will be derived in this paper.


In general, the standard error of estimate can be obtained for a linear
regression model containing any finite number of predictors. If we let p
represent an indefinite number of raw score predictors, the unbiased
standard error of estimate can be expressed as:

$$s_{Y \cdot x_1, x_2, \ldots, x_p} = \sqrt{\frac{\sum\left(Y - \hat{Y}\right)^2}{n-(p+1)}} = S_Y \sqrt{\frac{(n-1)\left(1 - R^2_{Y \cdot x_1, \ldots, x_p}\right)}{n-(p+1)}}$$

Where:

$s_{Y \cdot x_1, \ldots, x_p}$ = the unbiased standard error of estimate for p predictors (in deviation score form),
$p$ = an indefinite number of predictors,
$R^2_{Y \cdot x_1, \ldots, x_p}$ = the squared linear multiple correlation between one criterion and p predictors.

This formula also will be derived in this paper.

The standard error of estimate also can be derived for regression models
in which the variables have been expressed in standard score (Z) form. The
unbiased sample standard error of estimate for a one predictor standard score
linear model is defined as:

$$s_{Z_Y \cdot Z_1} = \sqrt{\frac{\sum\left[Z_Y - (A + B_1 Z_1)\right]^2}{n-2}}$$

Where:

$s_{Z_Y \cdot Z_1}$ = the standard error of estimate for the standardized criterion ($Z_Y$) and the standardized predictor ($Z_1$),
$n$ = the sample size,
$A$ = the slope intercept term,
$Z_1$ = the standardized predictor,
$B_1$ = the beta (regression) weight,
$e$ = the prediction error.

We show that the definitional formula above is equal to:

$$s_{Z_Y \cdot Z_1} = \sqrt{\frac{(n-1)\left(1 - r^2_{Z_Y, Z_1}\right)}{n-2}}$$

Where:

$r^2_{Z_Y, Z_1}$ = the squared correlation of $Z_Y$ and $Z_1$.

For standard score variables, the unbiased standard error of estimate for p
predictors is:

$$s_{Z_Y \cdot Z_1, Z_2, \ldots, Z_p} = \sqrt{\frac{(n-1)\left(1 - R^2_{Z_Y \cdot Z_1, \ldots, Z_p}\right)}{n-(p+1)}}$$

Where:

$s_{Z_Y \cdot Z_1, \ldots, Z_p}$ = the unbiased standard error of estimate for p predictors,
$R^2_{Z_Y \cdot Z_1, \ldots, Z_p}$ = the squared multiple correlation between the criterion ($Z_Y$) and p standardized predictors.

In this paper we will concentrate on the standard error of estimate for the
raw score model. The derivations for the Z score model will be outlined.
The reader may wish to work out the derivations for the standard score model
using the detailed presentations for the raw score model as a guide.

Derivations for Raw Score Model

In the next several sections, we will show the derivations of the unbiased
standard error of estimate for raw scores. We begin with the simplest case
of one criterion and one predictor.
Derivation for One Predictor

For the reader's convenience in working through the algebra, we will
summarize relevant definitions and formulas. This is done in Table 1.

Table 1

Basic Sample Descriptive Statistics for One Predictor Raw Score Model

Regression model:  $\hat{Y} = a + b_1 x_1 = \bar{Y} + b_1 x_1$ ᵃ

Variance of Y:  $S_Y^2 = \dfrac{\sum\left(Y - \bar{Y}\right)^2}{n-1} = \dfrac{\sum y^2}{n-1}$

Variance of $X_1$:  $S_1^2 = \dfrac{\sum\left(X_1 - \bar{X}_1\right)^2}{n-1} = \dfrac{\sum x_1^2}{n-1}$

Correlation of $x_1$ and $y$:  $r_{y1} = \dfrac{\sum x_1 y}{(n-1)\, S_Y S_1}$ ᵇ

Note: All summations range from i = 1 to i = n observations.

ᵃ This is derived from the least squares criterion; i.e.,

$$\sum_{i=1}^{n} e^2 = \sum_{i=1}^{n}\left(Y - a - b_1 x_1\right)^2 = \text{minimum}$$

See O'Brien, 1983a, p. 44.

ᵇ See O'Brien, 1983a, for justification that the numerator in the correlation formula may be given as $\sum x_1 y$ or $\sum x_1 Y$, where $x_1 = X_1 - \bar{X}_1$ and $y = Y - \bar{Y}$. In this paper, we will use the correlation expression $r_{y1}$ (or $r_{y1}^2$).
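As a quick numerical sketch of the Table 1 quantities (my own illustration, not part of the original paper; the data are hypothetical):

```python
import numpy as np

Y = np.array([3.0, 7.0, 8.0, 12.0, 15.0])
X1 = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
n = len(Y)

y = Y - Y.mean()     # criterion in deviation form
x1 = X1 - X1.mean()  # predictor in deviation form

S2_Y = np.sum(y ** 2) / (n - 1)   # variance of Y
S2_1 = np.sum(x1 ** 2) / (n - 1)  # variance of X1
# Correlation of x1 and y, with the deviation-score numerator.
r_y1 = np.sum(x1 * y) / ((n - 1) * np.sqrt(S2_Y) * np.sqrt(S2_1))
print("S_Y^2 =", S2_Y, " S_1^2 =", S2_1, " r_y1 =", r_y1)
```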

We begin by repeating the definition of the unbiased standard error of
estimate and substituting for $\hat{Y}$:

$$s_{Y \cdot x_1} = \sqrt{\frac{\sum\left(Y - a - b_1 x_1\right)^2}{n-2}}$$

It will be easier if we work with the variance error of estimate. This is
simply the square of the standard error of estimate:

$$s^2_{Y \cdot x_1} = \frac{\sum\left(Y - a - b_1 x_1\right)^2}{n-2}$$

It was shown by the author that the slope intercept term, a, is equal to the
criterion mean, $\bar{Y}$ (see O'Brien, 1983a, p. 44). Making that
substitution and rearranging terms:

$$s^2_{Y \cdot x_1} = \frac{\sum\left[(Y - \bar{Y}) - b_1 x_1\right]^2}{n-2}$$

Let us express $(Y - \bar{Y})$ in deviation score form, $y = Y - \bar{Y}$, to
simplify the algebra. This gives us:

$$s^2_{Y \cdot x_1} = \frac{\sum\left(y - b_1 x_1\right)^2}{n-2}$$

Squaring out the terms inside parentheses for this binomial expression:

$$s^2_{Y \cdot x_1} = \frac{\sum\left(y^2 + b_1^2 x_1^2 - 2 y b_1 x_1\right)}{n-2}$$

Bringing the summation operator inside and factoring constants outside the
summation operator (recall that $b_1$ functions as a constant to be estimated
in the regression model):

$$s^2_{Y \cdot x_1} = \frac{\sum y^2 + b_1^2 \sum x_1^2 - 2 b_1 \sum x_1 y}{n-2}$$

Substituting the following expressions (see Table 1):

$$\sum y^2 = (n-1) S_Y^2, \qquad \sum x_1^2 = (n-1) S_1^2, \qquad \sum x_1 y = (n-1)\, r_{y1} S_Y S_1,$$

and (based on substitution from Table 1 and O'Brien, 1983a, p. 44)

$$b_1 = r_{y1}\, S_Y / S_1.$$
Thus:

$$s^2_{Y \cdot x_1} = \frac{(n-1) S_Y^2 + \left(r_{y1} S_Y / S_1\right)^2 (n-1) S_1^2 - 2\left(r_{y1} S_Y / S_1\right)(n-1)\, r_{y1} S_Y S_1}{n-2}$$

Factoring out the (n−1) term:

$$s^2_{Y \cdot x_1} = \frac{(n-1)\left[S_Y^2 + r_{y1}^2\left(S_Y/S_1\right)^2 S_1^2 - 2\, r_{y1}^2 S_Y^2\right]}{n-2}$$

Simplifying:

$$s^2_{Y \cdot x_1} = \frac{(n-1)}{(n-2)}\left[S_Y^2 + r_{y1}^2 S_Y^2 - 2\, r_{y1}^2 S_Y^2\right] = \frac{(n-1)}{(n-2)}\, S_Y^2\left(1 - r_{y1}^2\right)$$

Taking the (positive) square root, the unbiased standard error of estimate
for one raw score predictor is:

$$s_{Y \cdot x_1} = S_Y \sqrt{\frac{(n-1)\left(1 - r_{y1}^2\right)}{n-2}}$$

END OF PROOF
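As an informal numerical check (my own sketch, not part of the original paper; the data are hypothetical), the definitional and derived formulas can be compared directly:

```python
import numpy as np

Y = np.array([3.0, 7.0, 8.0, 12.0, 15.0])
X1 = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
n = len(Y)

x1 = X1 - X1.mean()
b1 = np.sum(x1 * (Y - Y.mean())) / np.sum(x1 ** 2)
a = Y.mean()

# Definitional formula: sum of squared errors divided by n-2.
see_def = np.sqrt(np.sum((Y - a - b1 * x1) ** 2) / (n - 2))

# Derived formula: S_Y * sqrt((n-1)(1 - r^2)/(n-2)).
S_Y = Y.std(ddof=1)
r = np.corrcoef(X1, Y)[0, 1]
see_alt = S_Y * np.sqrt((n - 1) * (1 - r ** 2) / (n - 2))

print(see_def, see_alt)  # the two values agree
```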

Derivation for Two Predictors

In this section we seek to show that the unbiased standard error of estimate
for two raw score predictors is:

$$s_{Y \cdot x_1, x_2} = S_Y \sqrt{\frac{(n-1)\left(1 - R^2_{Y \cdot x_1, x_2}\right)}{n-3}}$$

Where:

$S_Y$ = the observed criterion standard deviation,
$R^2_{Y \cdot x_1, x_2}$ = the squared multiple correlation between the criterion and the two raw score predictors (in deviation score form).

We begin with the definition of the unbiased standard error of estimate for
two raw score predictors:

$$s_{Y \cdot x_1, x_2} = \sqrt{\frac{\sum\left(Y - \hat{Y}\right)^2}{n-(p+1)}} = \sqrt{\frac{\sum\left(Y - \hat{Y}\right)^2}{n-3}} = \sqrt{\frac{\sum\left(Y - a - b_1 x_1 - b_2 x_2\right)^2}{n-3}}$$

As in the one predictor derivation, it will be easier to work with the
variance error of estimate:

$$s^2_{Y \cdot x_1, x_2} = \frac{\sum\left(Y - a - b_1 x_1 - b_2 x_2\right)^2}{n-3}$$

Substituting $\bar{Y}$ for the slope intercept term and rearranging:

$$s^2_{Y \cdot x_1, x_2} = \frac{\sum\left[(Y - \bar{Y}) - b_1 x_1 - b_2 x_2\right]^2}{n-3}$$

Now, expressing $Y - \bar{Y}$ in deviation form and expanding the trinomial
expression:

$$s^2_{Y \cdot x_1, x_2} = \frac{\sum\left(y^2 + b_1^2 x_1^2 + b_2^2 x_2^2 - 2 y b_1 x_1 - 2 y b_2 x_2 + 2 b_1 b_2 x_1 x_2\right)}{n-3}$$

Bringing the summation operator inside and factoring constants:

$$s^2_{Y \cdot x_1, x_2} = \frac{\sum y^2 + b_1^2 \sum x_1^2 + b_2^2 \sum x_2^2 - 2 b_1 \sum x_1 y - 2 b_2 \sum x_2 y + 2 b_1 b_2 \sum x_1 x_2}{n-3}$$

The following formulas can be used for simplification:

$$\sum y^2 = (n-1) S_Y^2, \qquad \sum x_1^2 = (n-1) S_1^2, \qquad \sum x_2^2 = (n-1) S_2^2,$$
$$\sum x_1 y = (n-1)\, r_{y1} S_Y S_1, \qquad \sum x_2 y = (n-1)\, r_{y2} S_Y S_2, \qquad \sum x_1 x_2 = (n-1)\, r_{12} S_1 S_2.$$

For ease of reference, these formulas are summarized in Table 2.

Table 2

Substitution Equations for Two Predictor Raw Score Model

$$\sum y^2 = (n-1) S_Y^2$$
$$\sum x_1^2 = (n-1) S_1^2$$
$$\sum x_2^2 = (n-1) S_2^2$$
$$\sum x_1 y = (n-1)\, r_{y1} S_Y S_1$$
$$\sum x_2 y = (n-1)\, r_{y2} S_Y S_2$$
$$\sum x_1 x_2 = (n-1)\, r_{12} S_1 S_2$$

Note: equations are expressed in deviation score form. Each equation is based
on an algebraic rearrangement of the basic sample descriptive statistics
(compare Table 1). For example, the variance of Y is
$S_Y^2 = \sum(Y - \bar{Y})^2/(n-1) = \sum y^2/(n-1)$; solving in terms of
$\sum y^2$ gives $\sum y^2 = (n-1) S_Y^2$.

Making these substitutions:

$$s^2_{Y \cdot x_1, x_2} = \frac{1}{n-3}\Big[(n-1) S_Y^2 + (n-1) b_1^2 S_1^2 + (n-1) b_2^2 S_2^2 - 2(n-1) b_1 r_{y1} S_Y S_1 - 2(n-1) b_2 r_{y2} S_Y S_2 + 2(n-1) b_1 b_2 r_{12} S_1 S_2\Big]$$

Factoring out the (n−1) term and rearranging:

$$s^2_{Y \cdot x_1, x_2} = \frac{n-1}{n-3}\Big[S_Y^2 + \left(b_1^2 S_1^2 + b_2^2 S_2^2 + 2 b_1 b_2 r_{12} S_1 S_2\right) - 2\left(b_1 r_{y1} S_Y S_1 + b_2 r_{y2} S_Y S_2\right)\Big]$$

The next step is very important. The two terms in parentheses reduce to
functions of the squared multiple R for two predictors. As was shown in the
author's 1983a paper, the derivation of R for two predictors results in
several equivalent ways to express R or R². Table 3 shows the forms of R²
which will be used in the next step (compare O'Brien, 1983a, pages 12-18,
especially p. 18).

Table 3

Functions of R² for Two Raw Score Predictors

$$R^2 S_Y^2 = b_1^2 S_1^2 + b_2^2 S_2^2 + 2 b_1 b_2 r_{12} S_1 S_2$$

$$R^2 S_Y^2 = b_1 r_{y1} S_Y S_1 + b_2 r_{y2} S_Y S_2$$

Note: $R^2 \equiv R^2_{Y \cdot x_1, x_2}$. See O'Brien, 1983a.

Thus:

$$b_1^2 S_1^2 + b_2^2 S_2^2 + 2 b_1 b_2 r_{12} S_1 S_2 = R^2 S_Y^2, \qquad b_1 r_{y1} S_Y S_1 + b_2 r_{y2} S_Y S_2 = R^2 S_Y^2.$$
Making these substitutions:

$$s^2_{Y \cdot x_1, x_2} = \frac{n-1}{n-3}\left[S_Y^2 + R^2 S_Y^2 - 2 R^2 S_Y^2\right] = \frac{n-1}{n-3}\, S_Y^2\left[1 - R^2_{Y \cdot x_1, x_2}\right]$$

Taking the positive square root, the unbiased standard error of estimate for
two raw score predictors is:

$$s_{Y \cdot x_1, x_2} = S_Y \sqrt{\frac{(n-1)\left(1 - R^2_{Y \cdot x_1, x_2}\right)}{n-3}}$$

END OF PROOF
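The two predictor result can also be checked numerically. In the sketch below (mine, with simulated data), the squared multiple correlation is obtained as the squared simple correlation between Y and Ŷ, one standard way of computing R²:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
X = rng.normal(size=(n, 2))                  # two raw score predictors
Y = 1.0 + 2.0 * X[:, 0] - X[:, 1] + rng.normal(size=n)

x = X - X.mean(axis=0)                       # predictors in deviation form
b, *_ = np.linalg.lstsq(x, Y - Y.mean(), rcond=None)
Y_hat = Y.mean() + x @ b

# Definitional formula: sum of squared errors divided by n-3.
see_def = np.sqrt(np.sum((Y - Y_hat) ** 2) / (n - 3))

R2 = np.corrcoef(Y, Y_hat)[0, 1] ** 2        # squared multiple correlation
S_Y = Y.std(ddof=1)
see_alt = S_Y * np.sqrt((n - 1) * (1 - R2) / (n - 3))
print(see_def, see_alt)                      # the two values agree
```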

Derivation for Three Predictors

Prior to showing the derivation for the general case of p predictors, we
will present the derivation for the three predictor model. This allows us to
review the logic and procedures of the derivation. In addition, we introduce
summation notation throughout all of the steps of the derivation, which
simplifies the algebra for the general case.
For three raw score predictors, we will show that:

$$s^2_{Y \cdot x_1, x_2, x_3} = \frac{n-1}{n-4}\, S_Y^2\left(1 - R^2_{Y \cdot x_1, x_2, x_3}\right)$$

We begin by presenting the definition of the unbiased standard error of
estimate for three predictors:

$$s_{Y \cdot x_1, x_2, x_3} = \sqrt{\frac{\sum\left(Y - \hat{Y}\right)^2}{n-(p+1)}} = \sqrt{\frac{\sum\left(Y - a - b_1 x_1 - b_2 x_2 - b_3 x_3\right)^2}{n-4}}$$

As before, we will work with the variance error of estimate:

$$s^2_{Y \cdot x_1, x_2, x_3} = \frac{\sum\left(Y - a - b_1 x_1 - b_2 x_2 - b_3 x_3\right)^2}{n-4}$$

Proceeding as before, we first replace a with $\bar{Y}$ and express
$Y - \bar{Y}$ as $y$:

$$s^2_{Y \cdot x_1, x_2, x_3} = \frac{\sum\left(y - b_1 x_1 - b_2 x_2 - b_3 x_3\right)^2}{n-4}$$

Expanding this quadrinomial expression:

$$s^2_{Y \cdot x_1, x_2, x_3} = \frac{\sum\left(y^2 + b_1^2 x_1^2 + b_2^2 x_2^2 + b_3^2 x_3^2 - 2 y b_1 x_1 - 2 y b_2 x_2 - 2 y b_3 x_3 + 2 b_1 b_2 x_1 x_2 + 2 b_1 b_3 x_1 x_3 + 2 b_2 b_3 x_2 x_3\right)}{n-4}$$
Bringing the summation operator inside:

$$s^2_{Y \cdot x_1, x_2, x_3} = \frac{\sum y^2 + b_1^2 \sum x_1^2 + b_2^2 \sum x_2^2 + b_3^2 \sum x_3^2 - 2 b_1 \sum x_1 y - 2 b_2 \sum x_2 y - 2 b_3 \sum x_3 y + 2 b_1 b_2 \sum x_1 x_2 + 2 b_1 b_3 \sum x_1 x_3 + 2 b_2 b_3 \sum x_2 x_3}{n-4}$$

The following substitution formulas, stated in general form, will help us to
simplify the above expression (see Table 4 for reference):

$$\sum y^2 = (n-1) S_Y^2$$

For any $x_j$: $\quad \sum x_j^2 = (n-1) S_j^2$

For any $x_j y$: $\quad \sum x_j y = (n-1)\, r_{yj} S_Y S_j$

For any $x_i x_j$: $\quad \sum x_i x_j = (n-1)\, r_{ij} S_i S_j$

Applying these substitutions:

$$s^2_{Y \cdot x_1, x_2, x_3} = \frac{1}{n-4}\Big[(n-1) S_Y^2 + (n-1) b_1^2 S_1^2 + (n-1) b_2^2 S_2^2 + (n-1) b_3^2 S_3^2 - 2(n-1) b_1 r_{y1} S_Y S_1 - 2(n-1) b_2 r_{y2} S_Y S_2 - 2(n-1) b_3 r_{y3} S_Y S_3 + 2(n-1) b_1 b_2 r_{12} S_1 S_2 + 2(n-1) b_1 b_3 r_{13} S_1 S_3 + 2(n-1) b_2 b_3 r_{23} S_2 S_3\Big]$$

Table 4

Generalized Substitution Equations for Raw Score Model

$$\sum y^2 = (n-1) S_Y^2$$
$$\sum x_j^2 = (n-1) S_j^2$$
$$\sum x_j y = (n-1)\, r_{yj} S_Y S_j$$
$$\sum x_i x_j = (n-1)\, r_{ij} S_i S_j$$

Note: for example, the second equation applies to any X variable; for the jth
X variable, the sum of squares is related to the jth variance.

Factoring out (n−1) and rearranging:

$$s^2_{Y \cdot x_1, x_2, x_3} = \frac{n-1}{n-4}\Big[S_Y^2 + \left(b_1^2 S_1^2 + b_2^2 S_2^2 + b_3^2 S_3^2 + 2 b_1 b_2 r_{12} S_1 S_2 + 2 b_1 b_3 r_{13} S_1 S_3 + 2 b_2 b_3 r_{23} S_2 S_3\right) - 2\left(b_1 r_{y1} S_Y S_1 + b_2 r_{y2} S_Y S_2 + b_3 r_{y3} S_Y S_3\right)\Big]$$

We now express the parenthesized terms in summation notation (see O'Brien,
1983a):

$$s^2_{Y \cdot x_1, x_2, x_3} = \frac{n-1}{n-4}\left[S_Y^2 + \left(\sum_{j=1}^{3} b_j^2 S_j^2 + 2\sum_{j=2}^{3}\sum_{i=1}^{j-1} b_i b_j r_{ij} S_i S_j\right) - 2\left(\sum_{j=1}^{3} b_j r_{yj} S_Y S_j\right)\right]$$

Table 5 shows equivalent forms of R² for three predictors stated in summation
notation.

Table 5

Functions of R² for Three Raw Score Predictors

$$R^2 S_Y^2 = \sum_{j=1}^{3} b_j^2 S_j^2 + 2\sum_{j=2}^{3}\sum_{i=1}^{j-1} b_i b_j r_{ij} S_i S_j$$

$$R^2 S_Y^2 = \sum_{j=1}^{3} b_j r_{yj} S_Y S_j$$

Note: $R^2 \equiv R^2_{Y \cdot x_1, x_2, x_3}$. See O'Brien, 1983a.

Thus:

$$\sum_{j=1}^{3} b_j^2 S_j^2 + 2\sum_{j=2}^{3}\sum_{i=1}^{j-1} b_i b_j r_{ij} S_i S_j = R^2 S_Y^2, \qquad \sum_{j=1}^{3} b_j r_{yj} S_Y S_j = R^2 S_Y^2.$$

Substituting:

$$s^2_{Y \cdot x_1, x_2, x_3} = \frac{n-1}{n-4}\left[S_Y^2 + R^2 S_Y^2 - 2 R^2 S_Y^2\right]$$

Simplifying:

$$s^2_{Y \cdot x_1, x_2, x_3} = \frac{n-1}{n-4}\, S_Y^2\left[1 - R^2_{Y \cdot x_1, x_2, x_3}\right]$$

Therefore, the unbiased standard error of estimate is:

$$s_{Y \cdot x_1, x_2, x_3} = S_Y \sqrt{\frac{(n-1)\left(1 - R^2_{Y \cdot x_1, x_2, x_3}\right)}{n-4}}$$

END OF PROOF

Derivation for p Predictors

In this section, we show the general form of the unbiased standard error of
estimate when the regression model contains some unknown but finite number of
predictors (p). We will follow the same steps in the derivation that we used
for one, two and three predictors. It will be seen that the derivation for
the general case of p predictors is a straightforward multivariate
generalization. Formally, we will show that the unbiased standard error of
estimate for p predictors is:

$$s_{Y \cdot x_1, x_2, \ldots, x_p} = S_Y \sqrt{\frac{(n-1)\left(1 - R^2_{Y \cdot x_1, \ldots, x_p}\right)}{n-(p+1)}}$$

Definitions for terms in the formula were given in the section "Overview of
Derivation". Starting with the definition of the unbiased standard error of
estimate:

$$s_{Y \cdot x_1, \ldots, x_p} = \sqrt{\frac{\sum\left(Y - \hat{Y}\right)^2}{n-(p+1)}} = \sqrt{\frac{\sum\left(Y - a - b_1 x_1 - b_2 x_2 - \cdots - b_j x_j - \cdots - b_p x_p\right)^2}{n-(p+1)}}$$

As in the previous derivations, we will work with the variance error of
estimate:

$$s^2_{Y \cdot x_1, \ldots, x_p} = \frac{\sum\left(Y - a - b_1 x_1 - b_2 x_2 - \cdots - b_p x_p\right)^2}{n-(p+1)}$$

Now replace a by $\bar{Y}$ and express $Y - \bar{Y}$ in deviation score form:


$$s^2_{Y \cdot x_1, x_2, \ldots, x_j, \ldots, x_p} = \frac{\sum\left(y - b_1 x_1 - b_2 x_2 - \cdots - b_p x_p\right)^2}{n-(p+1)}$$

Expanding this multinomial:

$$s^2_{Y \cdot x_1, \ldots, x_p} = \frac{\sum\left(y^2 + b_1^2 x_1^2 + \cdots + b_j^2 x_j^2 + \cdots + b_p^2 x_p^2 - 2 y b_1 x_1 - 2 y b_2 x_2 - \cdots - 2 y b_j x_j - \cdots - 2 y b_p x_p + 2 b_1 b_2 x_1 x_2 + 2 b_1 b_3 x_1 x_3 + \cdots + 2 b_i b_j x_i x_j + \cdots + 2 b_{p-1} b_p x_{p-1} x_p\right)}{n-(p+1)}$$
Bringing the summation operator inside:

$$s^2_{Y \cdot x_1, \ldots, x_p} = \frac{\sum y^2 + b_1^2 \sum x_1^2 + \cdots + b_p^2 \sum x_p^2 - 2 b_1 \sum x_1 y - 2 b_2 \sum x_2 y - \cdots - 2 b_p \sum x_p y + 2 b_1 b_2 \sum x_1 x_2 + 2 b_1 b_3 \sum x_1 x_3 + \cdots + 2 b_{p-1} b_p \sum x_{p-1} x_p}{n-(p+1)}$$

Using the generalized substitution formulas given in Table 4, we can simplify
as follows:

$$s^2_{Y \cdot x_1, \ldots, x_p} = \frac{1}{n-(p+1)}\Big[(n-1) S_Y^2 + (n-1) b_1^2 S_1^2 + \cdots + (n-1) b_j^2 S_j^2 + \cdots + (n-1) b_p^2 S_p^2 - 2(n-1) b_1 r_{y1} S_Y S_1 - 2(n-1) b_2 r_{y2} S_Y S_2 - \cdots - 2(n-1) b_j r_{yj} S_Y S_j - \cdots - 2(n-1) b_p r_{yp} S_Y S_p + 2(n-1) b_1 b_2 r_{12} S_1 S_2 + 2(n-1) b_1 b_3 r_{13} S_1 S_3 + \cdots + 2(n-1) b_i b_j r_{ij} S_i S_j + \cdots + 2(n-1) b_{p-1} b_p r_{p-1,p} S_{p-1} S_p\Big]$$
Factoring out (n−1) and rearranging:

$$s^2_{Y \cdot x_1, \ldots, x_p} = \frac{n-1}{n-(p+1)}\Big[S_Y^2 + \left(b_1^2 S_1^2 + b_2^2 S_2^2 + \cdots + b_j^2 S_j^2 + \cdots + b_p^2 S_p^2 + 2 b_1 b_2 r_{12} S_1 S_2 + 2 b_1 b_3 r_{13} S_1 S_3 + \cdots + 2 b_i b_j r_{ij} S_i S_j + \cdots + 2 b_{p-1} b_p r_{p-1,p} S_{p-1} S_p\right) - 2\left(b_1 r_{y1} S_Y S_1 + b_2 r_{y2} S_Y S_2 + \cdots + b_j r_{yj} S_Y S_j + \cdots + b_p r_{yp} S_Y S_p\right)\Big]$$

Expressing the terms in parentheses in summation notation:

$$s^2_{Y \cdot x_1, x_2, \ldots, x_j, \ldots, x_p} = \frac{n-1}{n-(p+1)}\left[S_Y^2 + \left(\sum_{j=1}^{p} b_j^2 S_j^2 + 2\sum_{j=2}^{p}\sum_{i=1}^{j-1} b_i b_j r_{ij} S_i S_j\right) - 2\left(\sum_{j=1}^{p} b_j r_{yj} S_Y S_j\right)\right]$$

Table 6 shows equivalent forms of the multiple R² for p predictors (see
O'Brien, 1983a).

Table 6

Functions of R² for p Predictors

$$R^2 S_Y^2 = \sum_{j=1}^{p} b_j^2 S_j^2 + 2\sum_{j=2}^{p}\sum_{i=1}^{j-1} b_i b_j r_{ij} S_i S_j$$

$$R^2 S_Y^2 = \sum_{j=1}^{p} b_j r_{yj} S_Y S_j$$

Note: $R^2 \equiv R^2_{Y \cdot x_1, x_2, \ldots, x_p}$. See O'Brien, 1983a.

Thus:

$$\sum_{j=1}^{p} b_j^2 S_j^2 + 2\sum_{j=2}^{p}\sum_{i=1}^{j-1} b_i b_j r_{ij} S_i S_j = R^2 S_Y^2, \qquad \sum_{j=1}^{p} b_j r_{yj} S_Y S_j = R^2 S_Y^2.$$

Substituting into the variance error of estimate above:

$$s^2_{Y \cdot x_1, \ldots, x_p} = \frac{n-1}{n-(p+1)}\left[S_Y^2 - 2 R^2_{Y \cdot x_1, \ldots, x_p} S_Y^2 + R^2_{Y \cdot x_1, \ldots, x_p} S_Y^2\right] = \frac{n-1}{n-(p+1)}\, S_Y^2\left[1 - R^2_{Y \cdot x_1, \ldots, x_p}\right]$$

Therefore:

$$s_{Y \cdot x_1, \ldots, x_p} = S_Y \sqrt{\frac{(n-1)\left(1 - R^2_{Y \cdot x_1, \ldots, x_p}\right)}{n-(p+1)}}$$

END OF PROOF
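The general result holds for any finite p. The following sketch (again mine, not from the paper; the data are simulated, with p = 4) checks it:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 40, 4
X = rng.normal(size=(n, p))
Y = X @ np.array([1.5, -2.0, 0.5, 3.0]) + rng.normal(size=n)

x = X - X.mean(axis=0)                       # deviation scores
b, *_ = np.linalg.lstsq(x, Y - Y.mean(), rcond=None)
Y_hat = Y.mean() + x @ b

sse = np.sum((Y - Y_hat) ** 2)
see_def = np.sqrt(sse / (n - (p + 1)))       # definitional form

R2 = np.corrcoef(Y, Y_hat)[0, 1] ** 2        # squared multiple correlation
see_alt = Y.std(ddof=1) * np.sqrt((n - 1) * (1 - R2) / (n - (p + 1)))
print(see_def, see_alt)                      # the two values agree
```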


Derivations for Standard Score Model

Introduction

We have presented derivations for the unbiased standard error of estimate
for the linear raw score model when the number of predictors was one, two,
three and some finite number, p. In this part of the paper we will outline
the derivations for the standard score model.

The reader may be aware of the fact that there is a simple relationship
between models in raw score form and standard score (Z) form. This
relationship obviates the need for presenting detailed derivations for the Z
score model. Therefore, we will outline the derivations for the standard
score model, and leave the proofs as an exercise for the reader. We will show
the logic behind transforming from the linear raw score model to the Z score
model. First we take the standardized model for one predictor. We then
provide an outline for generalizing the derivation for the p predictor
standard score case.

Derivation for One Predictor


Recall the derivation for the one predictor raw score model. The derivation
of the standard error of estimate was shown to be:

$$s_{Y \cdot x_1} = S_Y \sqrt{\frac{(n-1)\left(1 - r^2_{x_1 y}\right)}{n-2}}$$

Let us now consider the model in standard score form. First, recall the
following relationships for the Z score model (see O'Brien, 1982b for
proofs):

$$S_{Z_Y} = 1, \qquad r^2_{Z_Y, Z_1} = r^2_{x_1 y}$$

That is, the standard deviation of the raw score variable Y is equal to unity
when Y is standardized. Also, the square of the simple (zero order) Pearson
correlation calculated in raw score form is identical to the squared
correlation between the same variables after each has been standardized.
Taking these facts into account, we can rewrite the raw score standard error
of estimate for Z scores as follows:

$$s_{Z_Y \cdot Z_1} = S_{Z_Y}\sqrt{\frac{(n-1)\left(1 - r^2_{Z_Y, Z_1}\right)}{n-2}} = (1)\sqrt{\frac{(n-1)\left(1 - r^2_{x_1 y}\right)}{n-2}} = \sqrt{\frac{(n-1)\left(1 - r^2_{x_1 y}\right)}{n-2}}$$

If one were to extend this logic to the case of p standardized predictors,
the standard error of estimate for p standardized predictors is:

$$s_{Z_Y \cdot Z_1, Z_2, \ldots, Z_p} = \sqrt{\frac{(n-1)\left(1 - R^2_{Z_Y \cdot Z_1, \ldots, Z_p}\right)}{n-(p+1)}} = \sqrt{\frac{(n-1)\left(1 - R^2_{Y \cdot x_1, x_2, \ldots, x_p}\right)}{n-(p+1)}}$$

For the p predictor case, $S_{Z_Y}$ also is equal to 1. It remains to be
proved that the squared multiple R's are equal to one another. It can be
shown that they are equal for p predictors, although this statement is
not proved in this paper.
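Although the equality of the raw score and standardized squared multiple R's is not proved here, a brief numerical sketch (mine, with simulated data) can illustrate it:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 30, 3
X = rng.normal(size=(n, p))
Y = X @ np.array([1.0, -0.5, 2.0]) + rng.normal(size=n)

def r_squared(X, Y):
    # Squared multiple correlation: corr(Y, Y_hat)^2 from least squares.
    A = np.column_stack([np.ones(len(Y)), X])
    Y_hat = A @ np.linalg.lstsq(A, Y, rcond=None)[0]
    return np.corrcoef(Y, Y_hat)[0, 1] ** 2

Z_X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardized predictors
Z_Y = (Y - Y.mean()) / Y.std(ddof=1)                # standardized criterion
print(r_squared(X, Y), r_squared(Z_X, Z_Y))         # identical values
```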

Outline for Derivations

The reader who desires to derive the unbiased standard error of estimate for
p linear standardized predictors may use the following outline as a guide.
Essentially, the steps parallel those for the raw score model. First, the
definitional form for the standard error of estimate is stated. Second, the
terms of the regression model for p predictors are substituted (see O'Brien,
1982c). Third, square the multinomial expression. Next, a series of
equations is substituted into the squares and cross products of the squared
multinomial; the reader may refer to the author's 1982c paper for the
relevant equations. The simplified expression is then expressed in summation
notation. Functions of the squared multiple R are substituted. Upon
simplification, the result will be the unbiased standard error of estimate
for the Z score model.

Many students who work out the derivations for the Z score model prefer to
work with several predictors in succession. This was our approach for the
raw score model derivations. A careful review of the steps used in the raw
score derivations may be helpful in working through the long, tedious
algebra.⁹

Appendix A

Errata for "A derivation of the sample multiple correlation formula for raw
scores," ED 235 205

Page / location                   Now reads                    Correct to
10, footnote, 3 lines down        X Y                          X Y
10, footnote, 4 lines down        n X Y                        n X Y
13                                var(b , x )                  var(b₂x₂)
16, footnote, last 2 lines        ... and simplifying. iSee    ... and simplifying. See the
                                  text for details.            text for details.
17, footnote                      Multiple R                   multiple R
24, footnote 1                    (omit this footnote)
29, equation                      x_p                          b_p
30, 3 lines from bottom           b₂ b_p r₂ₚ S₂ S_p            b₂ b_j r₂ⱼ S₂ S_j
36, 2 lines from bottom of text   change = to +:               = ... + b_j r_yj S_Y S_j
38, 2nd equation                  1                            2
43, last line in text             mathematical calculus        mathematical statistics

Appendix B

Discussion of Linear and Nonlinear Regression Models

This appendix will clarify terminology used in two previous papers (O'Brien,
1982c, 1983a). Some readers have requested clarification of my use of the
terms "linear" and "nonlinear" as they apply to regression analysis. There
are two reasons why this should be done. First, the terminology and/or
notation used in applied social science statistics textbooks and similar
sources is quite variable. This has the potential for causing confusion in
students' minds when attempting to read the same subject matter in different
sources. Second, it is very important to be clear about the differences
between a linear and nonlinear regression model. As will be seen, "truly"
nonlinear regression models are not often used in many areas of social
science. Our aim in this appendix merely is to clarify the uses of the
terminology. References are cited at the end of the appendix for readers who
desire to learn more about nonlinear regression models.
I believe confusion exists in the use of the terminology for several
reasons. Perhaps the basic factor relates to what students learn in
nonstatistical mathematics courses. The terms linear/nonlinear as they
relate to functions or relationships discussed in mathematics textbooks are
not used in the same way by statisticians when discussing linear/nonlinear
regression models.

Consider a simple example of the parabola (or quadratic or second degree
equation):

$$Y = f(X), \qquad -3 \le X \le 3,$$

where f is a second degree (quadratic) function of X.

If this function is plotted on ordinary graphing paper for values of X
between −3 and +3, the plot would show a curve opening downward with a
maximum height of 8 Y units at the origin. This function is not linear in
form because it cannot be expressed in the form of a first degree equation:

$$f(X) = a + bX$$

Geometrically, a plot of the quadratic function above would not reveal a
straight line or linear function. For these two reasons, the parabola may be
thought of as a "nonlinear" function.
Statisticians use the terms linear/nonlinear in a different manner. In the
statistician's use of the terms, the difference between them has more to do
with the form of the regression parameters (slope terms) than with the form
of the independent or dependent variables. In addition, a plot of the raw
observed data points is not relevant to classifying a regression model as
linear or nonlinear.
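To illustrate the statistician's usage with a small sketch of my own (not an example from this appendix; the data are simulated): the quadratic model Y = a + b₁X + b₂X² + e counts as a linear regression model, because it is linear in the parameters a, b₁ and b₂ and can therefore be fit by ordinary least squares, even though its plot is a curve:

```python
import numpy as np

rng = np.random.default_rng(3)
X = np.linspace(-3, 3, 25)
Y = 8.0 - 2.0 * X ** 2 + rng.normal(scale=0.5, size=X.size)

# The design matrix treats X and X^2 as two ordinary predictors,
# so the model is linear in the parameters a, b1, b2.
A = np.column_stack([np.ones_like(X), X, X ** 2])
a, b1, b2 = np.linalg.lstsq(A, Y, rcond=None)[0]
print(a, b1, b2)  # estimates recover roughly (8, 0, -2)
```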
Let us examine some examples. Assume the following regression model (adapted
from Draper and Smith, p. 264):

$$F = \exp\left(b_1 + b_2 X + e\right) \tag{1}$$

Where:

$F$ = the dependent variable,
$\exp$ = the exponentiation operator for the mathematical constant e = 2.71828 (approx.),
$b_1, b_2$ = parameters to be estimated,
$X$ = the independent variable,
$e$ = the stochastic error term (as used in this paper).

Note that equation 1 expresses what we have been calling a "raw score
model"; e.g., for equation 1, we could write $F = \exp(\hat{F} + e)$. Is the
model in (1) a linear or nonlinear regression model? We need to examine the
terms in (1) to decide.

Let us now rework equation 1 to render the model linear. If we take the
natural logarithm of each side of equation 1, we obtain:

$$\ln F = \ln\left[\exp\left(b_1 + b_2 X + e\right)\right] = b_1 + b_2 X + e \tag{2}$$

We now redefine the terms in equation 2. Let $Y = \ln F$. Then (2) becomes:

$$Y = b_1 + b_2 X + e \tag{3}$$

The model has now been linearized. Statisticians would call the regression
model expressed in (3) a linear model despite the fact that the relationship
between the dependent and independent variables is not one of a straight
line. Draper and Smith offer useful terminology to distinguish (1) from (3).
The regression model stated in (1) may be referred to as intrinsically
linear. This means that although equation 1 is nonlinear (with respect to the
parameters $b_1$ and $b_2$), transformations can be made to express the model
in a form which is linear (with respect to the parameters).
To take a second example (also from Draper and Smith), consider the
following regression model:

$$F = \frac{\exp\left(-b_1 X\right) - \exp\left(-b_2 X\right)}{b_1 - b_2} + e \tag{4}$$

Where:

$F$ = the dependent variable,
$\exp$ = as in equation 1,
$b_1, b_2$ = the parameters,
$X$ = the independent variable.

This model is nonlinear (with respect to the parameters). In addition,
equation 4 cannot be transformed such that the parameters will be linear in
form. Draper and Smith refer to such a regression model as intrinsically
nonlinear.
Further discussion and examples of linear/nonlinear regression models may be
found in Kendall and Stuart (1967), Mosteller and Tukey (1977) and Nie, et
al. (1975). Those references provide additional source material.

Notes

1. See O'Brien (1983a, Appendix B) for an errata sheet. Page references
given in the errata pertain to the original pagination (i.e., at the top of
the page).

2. Errata for this paper are given in Appendix A of the present paper.

3. Readers who need to review regression analysis theory can refer to
standard applied statistics textbooks. One that is highly recommended for
its thoroughness and clarity is by Lindeman, Gold and Merenda (1982). A
general overview is given by Lewis-Beck (1980).

4. See Appendix B for a discussion of linear and nonlinear regression models.

5. If it is understood that the summation limits range from the first
observation (i = 1) to the last (i = n), then we can drop the summation
limits; n refers to the total number of observations for the criterion and
predictor(s). This sample size is the same regardless of the number of
predictors. Later, when the algebra becomes more complex, we use summation
limits extensively.

6. As mentioned earlier, it is assumed that the reader is familiar with the
author's 1983a paper.

7. The regression model for one standardized predictor is:

$$\hat{Z}_Y = A + B_1 Z_1$$

The observed standard score model is:

$$Z_Y = \hat{Z}_Y + e$$

Where:

$\hat{Z}_Y$ = the predicted criterion in standard score form,
$A$ = the slope intercept term (not standardized; see O'Brien, 1982c),
$Z_1$ = the standardized predictor; i.e., $Z_1 = (X_1 - \bar{X}_1)/S_1$, where $S_1$ is the standard deviation of $X_1$,
$B_1$ = the slope term (regression or beta weight),
$e$ = the prediction error.

8. The reader may wonder why we divide by the term n−2. This term represents
the degrees of freedom for the unbiased standard error of estimate for one
predictor. It can be shown that dividing by the appropriate degrees of
freedom term makes the sample standard error of estimate unbiased; i.e., the
expected value of the sample standard error of estimate equals the population
parameter. In general, the degrees of freedom for the unbiased standard
error of estimate is n−(p+1), where p = the number of predictors in the
regression model. For one predictor, n−(1+1) = n−2. The p+1 in n−(p+1)
arises from the number of parameters that must be estimated in any raw score
linear regression model: p slope (b) terms plus the slope intercept term.
For a good discussion of degrees of freedom, see the classic paper by Helen
Walker (1940, 1971). See also Stilson (1966).
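The unbiasedness property mentioned in note 8 can be stated compactly. Under the usual assumptions of the linear model with error variance $\sigma_e^2$ (a standard result, stated here without proof):

$$E\left[\frac{\sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2}{n-(p+1)}\right] = \sigma_e^2$$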
9. An alternate approach to the derivations could be used by working with
matrix algebra notation. The author intends to present the derivations of
this paper and others in this series in matrix algebra. They will be written
as part of this series for ERIC.

References

Draper, Norman and Harry Smith. Applied Regression Analysis. New York: John
Wiley & Sons, 1966.

Kendall, Maurice G. and Alan Stuart. The Advanced Theory of Statistics:
Inference and Relationship, Vol. 2 (2nd ed.). New York: Hafner Publishing
Co., 1967.

Lewis-Beck, Michael S. Applied Regression: An Introduction. Beverly Hills,
CA: Sage Publications, 1980.

Lindeman, Richard H., Ruth Gold and Peter Merenda. Bivariate and
Multivariate Analysis. Chicago: Scott, Foresman and Co., 1982.

Mosteller, Frederick and John W. Tukey. Data Analysis and Regression: A
Second Course in Statistics. Mass.: Addison-Wesley Publishing Co., 1977.

Nie, Norman H. et al. Statistical Package for the Social Sciences (2nd ed.).
NY: McGraw-Hill Book Co., 1975.

O'Brien, Francis J., Jr. A proof that t² and F are identical: the general
case, 1982a. ERIC ED 215 894.

________. Proof that the sample bivariate correlation coefficient has limits
±1, 1982b. ERIC ED 216 874.

________. A derivation of the sample multiple correlation formula for
standard scores, 1982c. ERIC ED 223 428.

________. A derivation of the sample multiple correlation formula for raw
scores, 1983a. ERIC ED 235 205.

Stilson, Donald W. Probability and Statistics in Psychological Research and
Theory. San Francisco, CA: Holden-Day, Inc., 1966.

Walker, Helen M. Degrees of freedom. Journal of Educational Psychology, 31,
1940, 253-269. Reprinted in Readings in Statistics for the Behavioral
Scientist, Joseph A. Steger (Ed.). NY: Holt, Rinehart & Winston, Inc., 1971.

DOCUMENT RESUME

ED 215 894                                                      SE 037 098

AUTHOR      O'Brien, Francis J., Jr.
TITLE       A Proof That t² and F are Identical: The General Case.
PUB DATE    24 Apr 82
NOTE        20p.

EDRS PRICE  MF01/PC01 Plus Postage.
DESCRIPTORS *College Mathematics; Equations (Mathematics); Higher
            Education; Instructional Materials; *Mathematical
            Applications; Mathematical Concepts; *Mathematical
            Formulas; Mathematics; *Proof (Mathematics); *Research
            Tools; *Statistics; Supplementary Reading Materials

ABSTRACT

This document proves that the F statistic can be obtained by squaring t-test
values, or that equivalent t-test values may be obtained by extracting the
positive square roots of F values. Proofs to varying degrees of completeness
and accessibility have been given by other scholars, but generally these
prior statements, particularly those available to students of education or
psychology, focus on the special case when sample sizes are equal. No source
could be found that provides a complete, detailed proof of the general case
that was understandable to students of applied statistics. This document
seeks to give a clear step-by-step proof, with a numerical example worked
out, and a plan is provided for proving the special case. It is felt the
reader should be able to follow the proof of the general case, and should
therefore have little difficulty in translating the acquired knowledge into
proving the special case. (MP)

A Proof that t² and F are Identical: The General Case

Francis J. O'Brien, Jr., Ph.D.

It is well known that a researcher who has collected data from two
independent groups may perform either a t-test for independent samples or a
one-way analysis of variance for two groups. This is because knowledge of
results from one type of computation can be transformed into an equivalent
result for the other type of computation. For example, if a t-test for
independent samples is calculated, the equivalent F statistic can be obtained
by squaring the t-test value. Analogously, if the researcher has available
the F statistic obtained from a one-way analysis of variance for two groups,
the equivalent t-test value may be obtained by extracting the positive square
root of the F value. That is, $t^2 = F$, or $t = F^{1/2}$.

The proof that $t^2 = F$ has been given to varying degrees of completeness
and accessibility to students by other scholars in professional journals
(see Rucci and Tweney, 1980 for citations). Statistics textbooks commonly
available to students of education or psychology occasionally provide hints
for proving the special case of the relationship (when sample sizes are
equal) (see, for example, Glass and Stanley, 1970).

The motivation for presenting the proof is twofold. First, many prior
statements of the proof for the general case (of unequal sample sizes) are
either abbreviated, mathematically inaccessible or incomplete for
understanding this important relationship. A search of the literature did
not reveal a source that provided a complete, detailed proof of the general
case that was understandable to students of applied statistics. Second, a
full step-by-step proof for the general case will give readers a sense that
statistics is not all just a babel of "Greek arithmetic". As a former
instructor of graduate level applied statistics, I know that many students
can follow well-articulated proofs and desire to see them worked out.
/

In this paper

three

tasks will be accomplished.

a clear step-by-step proof that t

Firsty

= F in the general case will be provided.

Second, a'buMerical example will be worked out.

-provided for proving the speciaLcase.

Third,

a plan will be

It is felt that the reader should

be able to follow the proof of the general case, and therefore, shOuld

have little difficulty in translating the acquired knowledge into


proving the special case.

Proof that t² = F: the General Case

First, let us lay out a table of symbolic values in order to introduce a
familiar notation and the variables used in the proof. This is done in
Table 1.

The plan for the proof is important for understanding the strategy involved
in attacking a statistical proof. The steps of the plan that will be used
here are given below:

1. State the form of the t-test statistic using the notation of Table 1.
2. Square the t in step 1.
3. State the form of the F statistic using the notation of Table 1.
4. Simplify algebraically the F statistic in step 3.
5. Observe that the simplified F of step 4 is equal to the squared t of
step 2.
Table 1

Table Layout

Notes for Table 1

for Two Independent Groups


1.

oup 1

sample sizes are assumed


unequal., That is,

Group 2

n.

X11

12
2.

21

the total sample size is

22
t..

31

41

n.

n.
2

32
3.

42

the grand mean (x..) is a


weighted mean since
sample sizes are unequal.
That is,
0

AP
+

X.

n.
1

n. 2X.2
'2

n.1 ,+

n.
2

X.

12

Ime

4.

s ?.

is not needed

for the proof. It is


included only for
completeness.

nl

Total

Sample
Size

n.

Sample
Mean

i.

n.

x..

X.2

Sample
Variance

n..

s.

s.

41

s..

N-

This is the expression we will use as a basis of comparison with the
simplified F statistic to be obtained in step 4. Note that the above squared
t value is referred to simply as t².

Step 3: the F statistic.

It will be recalled that the F statistic is the ratio of two independent
sums of squares: a between sums of squares ($SS_b$) and a within sums of
squares ($SS_w$).* Also, each sums of squares is divided by an appropriate
degrees of freedom term: $df_b$ (between sums of squares degrees of freedom)
and $df_w$ (within sums of squares degrees of freedom). The general
expression of the F statistic (for any number of groups) is:

$$F_{J-1,\; n_{..}-J} = \frac{SS_b / df_b}{SS_w / df_w}$$

The general form of $df_b$ is "the number of groups minus one" (i.e.,
$df_b = J - 1$, where J is the number of groups). For two groups, J = 2, and
so $df_b$ for two groups is $df_b = 2 - 1 = 1$. The general form of $df_w$ is
"the total sample size minus the number of groups" (i.e., $df_w = n_{..} - J$).
Since $n_{..} = n_{.1} + n_{.2}$, for J = 2 groups we can write
$df_w = n_{.1} + n_{.2} - 2$.

* Grammarians will point out that the preposition "between" refers to the
relationship of two entities while "among" refers to more than two. However,
since the reference here is ultimately to two groups, "between" will be used
instead of the correct "among"; I accept the righteous indignation of the
grammarian.
Thus, using the notation of Table 1, we can write the F statistic for two
groups as follows:

$$F_{1,\; n_{.1}+n_{.2}-2} = \frac{SS_b / 1}{SS_w / (n_{.1}+n_{.2}-2)} = \frac{SS_b}{SS_w / (n_{.1}+n_{.2}-2)}$$

In order to facilitate the proof, we will write F in the familiar terms of
means and variances. That is:

$$F = \frac{n_{.1}\left(\bar{X}_{.1}-\bar{X}_{..}\right)^2 + n_{.2}\left(\bar{X}_{.2}-\bar{X}_{..}\right)^2}{\dfrac{(n_{.1}-1)\,s_{.1}^2 + (n_{.2}-1)\,s_{.2}^2}{n_{.1}+n_{.2}-2}}$$

We will refer to this expression as simply F.

It is instructive now to compare the values of t² and F. For the reader's
convenience, we will restate t² and F so that they may be compared and
referred to later. This is done in Table 2.
7.

Table 2
L.

Restatement of t

and F

t2

F
41.

z
.-X.

(X.
1

(n: -.1)s.

(n.

+ n.2\

)s..

(A

-1)s.

:2

-R.:)2

+ (n. -1).
2
-2

,n.1

+ n.

(i.

+,n.
+

n.

(n.

)(n.

+ n.

n.

Step 4: simplify F.

This is the next to the last step in the proof. A full step-by-step proof
requires several algebraic steps. We will first simplify the numerator of F.
Notes pertaining to the algebraic manipulations accompany each step for the
reader's convenience.
We start by looking at the numerator of F ($SS_b$):

$$SS_b = n_{.1}\left(\bar{X}_{.1} - \bar{X}_{..}\right)^2 + n_{.2}\left(\bar{X}_{.2} - \bar{X}_{..}\right)^2$$

Since

$$\bar{X}_{..} = \frac{n_{.1}\bar{X}_{.1} + n_{.2}\bar{X}_{.2}}{n_{.1} + n_{.2}},$$

we can substitute for $\bar{X}_{..}$:

$$SS_b = n_{.1}\left[\bar{X}_{.1} - \frac{n_{.1}\bar{X}_{.1} + n_{.2}\bar{X}_{.2}}{n_{.1} + n_{.2}}\right]^2 + n_{.2}\left[\bar{X}_{.2} - \frac{n_{.1}\bar{X}_{.1} + n_{.2}\bar{X}_{.2}}{n_{.1} + n_{.2}}\right]^2$$

Finding common denominators for each bracketed term:

$$SS_b = n_{.1}\left[\frac{(n_{.1}+n_{.2})\bar{X}_{.1} - \left(n_{.1}\bar{X}_{.1} + n_{.2}\bar{X}_{.2}\right)}{n_{.1}+n_{.2}}\right]^2 + n_{.2}\left[\frac{(n_{.1}+n_{.2})\bar{X}_{.2} - \left(n_{.1}\bar{X}_{.1} + n_{.2}\bar{X}_{.2}\right)}{n_{.1}+n_{.2}}\right]^2$$

Removing the inside parentheses, multiplying out the group means,
subtracting, and cancelling like terms:

$$SS_b = n_{.1}\left[\frac{n_{.2}\bar{X}_{.1} - n_{.2}\bar{X}_{.2}}{n_{.1}+n_{.2}}\right]^2 + n_{.2}\left[\frac{n_{.1}\bar{X}_{.2} - n_{.1}\bar{X}_{.1}}{n_{.1}+n_{.2}}\right]^2$$

Factoring like sample size terms:

$$SS_b = n_{.1}\left[\frac{n_{.2}\left(\bar{X}_{.1} - \bar{X}_{.2}\right)}{n_{.1}+n_{.2}}\right]^2 + n_{.2}\left[\frac{n_{.1}\left(\bar{X}_{.2} - \bar{X}_{.1}\right)}{n_{.1}+n_{.2}}\right]^2$$

Squaring each term separately inside the brackets:

$$SS_b = \frac{n_{.1}\, n_{.2}^2\left(\bar{X}_{.1} - \bar{X}_{.2}\right)^2}{\left(n_{.1}+n_{.2}\right)^2} + \frac{n_{.2}\, n_{.1}^2\left(\bar{X}_{.2} - \bar{X}_{.1}\right)^2}{\left(n_{.1}+n_{.2}\right)^2}$$

Note that $\left(\bar{X}_{.2} - \bar{X}_{.1}\right)^2$ is the same as
$\left(\bar{X}_{.1} - \bar{X}_{.2}\right)^2$, because squared differences of
the same two quantities are equal. Hence we can factor within the brackets
to obtain:

$$SS_b = \frac{n_{.1}\, n_{.2}\left(n_{.1}+n_{.2}\right)\left(\bar{X}_{.1} - \bar{X}_{.2}\right)^2}{\left(n_{.1}+n_{.2}\right)^2}$$

Simplifying $\left(n_{.1}+n_{.2}\right)$ against
$\left(n_{.1}+n_{.2}\right)^2$, we obtain:

$$SS_b = \frac{n_{.1}\, n_{.2}\left(\bar{X}_{.1} - \bar{X}_{.2}\right)^2}{n_{.1}+n_{.2}}$$

This is just about it. Now substitute the value of $SS_b$ just obtained into
the F statistic, and obtain:

$$F = \frac{\dfrac{n_{.1}\, n_{.2}\left(\bar{X}_{.1} - \bar{X}_{.2}\right)^2}{n_{.1}+n_{.2}}}{\dfrac{(n_{.1}-1)\,s_{.1}^2 + (n_{.2}-1)\,s_{.2}^2}{n_{.1}+n_{.2}-2}}$$

If we divide the numerator and denominator by
$n_{.1}\, n_{.2}/(n_{.1}+n_{.2})$, we obtain:

$$F = \frac{\left(\bar{X}_{.1} - \bar{X}_{.2}\right)^2}{\left[\dfrac{(n_{.1}-1)\,s_{.1}^2 + (n_{.2}-1)\,s_{.2}^2}{n_{.1}+n_{.2}-2}\right]\left[\dfrac{n_{.1}+n_{.2}}{n_{.1}\, n_{.2}}\right]}$$

Step 5: observe that t² = F.

If the value of F above is compared with the value of t², it will be seen
that they are in the same form, signifying that they are equal. Refer to
Table 2 for this comparison. END OF PROOF.

This completes the entire proof in accordance with the five step plan. We
now turn to a numerical example for two independent groups of unequal sample
size.

Numerical Example

The analytic proof using algebraic rules for the general case was given in
great detail. A numerical example should provide additional insight.

The data and descriptive statistics are provided in Table 3, which is
modeled on Table 1. Note that the data were chosen for illustrative purposes
only.

Table 3

Data for Numerical Example

                     Group 1      Group 2
                     10           50
                     40           70
                     20           100
                     50           40
                     75
                     90

Sample size          6            4            Total: 10
Sample mean          47.5         65.0         Grand mean: 54.5
Sample variance      957.5        700.0        (not needed)

Refer to Table 2 for the t-test formula. Using the formula there, the t
statistic value is:

$$t = \frac{47.5 - 65.0}{\sqrt{\left[\dfrac{(6-1)957.5 + (4-1)700.0}{6+4-2}\right]\left[\dfrac{10}{24}\right]}} = \frac{-17.5}{\sqrt{\left[\dfrac{6887.5}{8}\right]\left[\dfrac{10}{24}\right]}} = -.9240$$

If we square this value, we obtain $(-.9240)^2 = .8537$. We place the
computed value in the margin for easy reference: ($t^2 = .8537$).

Now compute the F statistic using the formula provided in Table 2:

$$F = \frac{6(47.5-54.5)^2 + 4(65.0-54.5)^2}{\dfrac{(6-1)957.5 + (4-1)700.0}{6+4-2}} = \frac{6(49) + 4(110.25)}{6887.5/8} = .8537$$

($F = .8537$)

Thus, $t^2 = .8537$ and $F = .8537$; that is, $t^2 = F$.
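The same numerical check can be run in a few lines (a sketch using SciPy, not part of the original document):

```python
import numpy as np
from scipy import stats

g1 = np.array([10.0, 40.0, 20.0, 50.0, 75.0, 90.0])
g2 = np.array([50.0, 70.0, 100.0, 40.0])

t, _ = stats.ttest_ind(g1, g2)   # pooled-variance t-test for two groups
F, _ = stats.f_oneway(g1, g2)    # one-way ANOVA for the same two groups
print(t ** 2, F)                 # both are approximately 0.8537
```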

Proof of the Special Case

If the sample sizes are equal in a research design of two independent
groups, the proof that t² = F is somewhat easier to derive. Rather than
perform the necessary algebraic manipulations here, it may prove instructive
for the reader to actually derive it himself or herself. Working through the
analytic proof for the special case will solidify an understanding of the
proof for the general case.

Some hints will be provided for the reader in proving the special case.
They are summarized as follows:

1. $n_{.1} = n_{.2} = n$. That is, since sample sizes are equal, one symbol
for sample size may be used; it could be called n.

2. $\bar{X}_{..} = \left(\bar{X}_{.1} + \bar{X}_{.2}\right)/2$. Also, since
sample sizes are equal, the grand mean ($\bar{X}_{..}$) is simply the average
of the means of the two groups. This value should be substituted for
$\bar{X}_{..}$ in the proof.

By making these two changes and by following the five step outline used for
the general case proof, the reader should be able to derive the proof for
the special case. One may also wish to "make up" an easy-to-work-with
numerical data set to check on the process.

Note

For students or researchers who enjoy proofs in applied statistics, the
following two references may be useful:

Edwards, Allen L. Expected Values of Discrete Random Variables and
Elementary Statistics. New York: Wiley, 1964.

Guilford, J. P. and Fruchter, Benjamin. Fundamental Statistics in Psychology
and Education, 4th and 5th editions. New York: McGraw-Hill, 1973.

References

Glass, Gene V and Stanley, Julian C. Statistical Methods in Education and
Psychology. Englewood Cliffs, New Jersey: Prentice-Hall, 1970.

Rucci, Anthony J. and Tweney, Ryan D. Analysis of variance and the "second
discipline" of scientific psychology: a historical account. Psychological
Bulletin, 1980, Vol. 87, No. 1, 166-184.