
DOCUMENT RESUME

ED 216 874                                SE 037 439

AUTHOR      O'Brien, Francis J., Jr.
TITLE       Proof That the Sample Bivariate Correlation
            Coefficient Has Limits (Plus or Minus) 1.
PUB DATE    82
NOTE        32p.

EDRS PRICE  MF01/PC02 Plus Postage.
DESCRIPTORS *Correlation; *Educational Research; Higher
            Education; Mathematical Formulas; *Proof
            (Mathematics); *Research Tools; *Statistics
IDENTIFIERS *Applied Statistics

ABSTRACT
This paper presents in detail a proof of the limits of the sample bivariate correlation coefficient which requires only knowledge of algebra. Notation and basic formulas of standard (z) scores, bivariate correlation formulas in unstandardized and standardized form, and algebraic inequalities are reviewed first, since they are necessary for understanding the proof. Then the proof is presented, with an appendix containing additional proofs of related material. (NNS)


Proof That the Sample Bivariate Correlation Coefficient Has Limits ±1

Francis J. O'Brien, Jr., Ph.D.

Virtually all social science students who have studied applied statistics have been introduced to the concepts and formulas of linear correlation of two variables. Applied statistics textbooks routinely report the theoretical limits of the bivariate correlation coefficient; namely, that the coefficient is no more than +1 and no less than -1. However, no commonly used applied statistics textbook proves this. One of the best textbooks available to students of education and psychology introduces the proof (Glass and Stanley, 1970). Undoubtedly, one of the constraints placed on authors by publishers is the space available for detailed explanations, derivations and proofs.

This paper will set forth in detail a proof of the limits of the sample bivariate correlation coefficient. Since the proof requires only knowledge of algebra, most students of applied statistics at the advanced undergraduate or introductory graduate level should have little difficulty in understanding the proof. As a former instructor of graduate-level introductory applied statistics, I know that the typical student can understand the proof as it is presented here.
The key for understanding statistical proofs is a presentation of detailed steps in a well articulated and coherent manner. A review of relevant statistical and mathematical concepts is also helpful (and usually required). When students are presented important statistical proofs in detail, they feel that some of the mystery and magic of mathematics has been unveiled. My experience has been that the typical student of applied statistics can follow a good number of proofs because most proofs can be presented algebraically without use of calculus. In addition to enhancing knowledge, an occasional proof often increases academic motivation.

Some Preliminary Concepts

The proof requires knowledge of several concepts in statistics and mathematics. In order to make this paper self-contained, some preliminary concepts stated in a consistent notation will be reviewed. We will review the concepts and formulas of standard scores (z scores), bivariate correlation formulas in unstandardized and standardized form, and algebraic inequalities.

Notation and Basic Formulas

Table 1 is a layout of symbolic values written in the notation to be used in this paper. The model presented in Table 1 is of two measures in unstandardized (raw score) and standardized (z score) form. Table 2 presents some familiar formulas based on unstandardized variables that will be useful for the development of the proof.

1. This paper is one of a series contemplated for publication [see O'Brien, 1982]. Eventually I hope to present a textbook of applied statistics proofs and derivations to supplement standard applied statistics textbooks.

Table 1

Table Layout for Two Measures in Unstandardized and Standardized Form

Case   Measure X (unstandardized / standardized)          Measure Y (unstandardized / standardized)
1      $X_1$    $(X_1-\bar{X})/S_x = Z_{x_1}$             $Y_1$    $(Y_1-\bar{Y})/S_y = Z_{y_1}$
2      $X_2$    $(X_2-\bar{X})/S_x = Z_{x_2}$             $Y_2$    $(Y_2-\bar{Y})/S_y = Z_{y_2}$
3      $X_3$    $(X_3-\bar{X})/S_x = Z_{x_3}$             $Y_3$    $(Y_3-\bar{Y})/S_y = Z_{y_3}$
...
i      $X_i$    $(X_i-\bar{X})/S_x = Z_{x_i}$             $Y_i$    $(Y_i-\bar{Y})/S_y = Z_{y_i}$
...
n      $X_n$    $(X_n-\bar{X})/S_x = Z_{x_n}$             $Y_n$    $(Y_n-\bar{Y})/S_y = Z_{y_n}$

Sample Size       $n_x$       $n_{z_x}$       $n_y$       $n_{z_y}$
Sample Mean       $\bar{X}$   $\bar{Z}_x$     $\bar{Y}$   $\bar{Z}_y$
Sample Variance   $S_x^2$     $S_{z_x}^2$     $S_y^2$     $S_{z_y}^2$

NOTE: All sample size terms are equal; that is, $n_x = n_{z_x} = n_y = n_{z_y}$. Any of these sample size terms could be identified by just one symbol, such as n. We will use n when it is not important to distinguish among the sample size terms, but will use the table values above when it is necessary or important to do so.

Table 2

Relevant Formulas for Unstandardized Measures

                  Measure X                                              Measure Y

Sample Mean       $\bar{X} = \dfrac{\sum_{i=1}^{n_x} X_i}{n_x}$           $\bar{Y} = \dfrac{\sum_{i=1}^{n_y} Y_i}{n_y}$

Sum               $n_x\bar{X} = \sum_{i=1}^{n_x} X_i$                     $n_y\bar{Y} = \sum_{i=1}^{n_y} Y_i$

Sample Variance   $S_x^2 = \dfrac{\sum_{i=1}^{n_x}(X_i-\bar{X})^2}{n_x-1}$   $S_y^2 = \dfrac{\sum_{i=1}^{n_y}(Y_i-\bar{Y})^2}{n_y-1}$

Sum of Squares    $(n_x-1)S_x^2 = \sum_{i=1}^{n_x}(X_i-\bar{X})^2$         $(n_y-1)S_y^2 = \sum_{i=1}^{n_y}(Y_i-\bar{Y})^2$

NOTES:

1. The sample size terms are equal: $n_x = n_y$. Also $n_x = n_y = n$.

2. "Sum" is simply an algebraic manipulation of "Sample Mean"; i.e., multiply over the sample size term in "Sample Mean" to get "Sum". Also, "Sum of Squares" is such a manipulation based on "Sample Variance". "Sum" and "Sum of Squares" will be useful later on.

3. Descriptive statistics for standardized scores will be developed in the body of the text.
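As a supplement not in the original paper, the short Python sketch below illustrates the Table 2 formulas on a small made-up sample; the data values are hypothetical, chosen only to make the arithmetic easy to follow.

```python
# A minimal numerical illustration of the Table 2 formulas (hypothetical data).
X = [2.0, 4.0, 6.0, 8.0]

n_x = len(X)
mean_x = sum(X) / n_x                      # Sample Mean: X-bar = (sum of X_i)/n_x
sum_x = n_x * mean_x                       # Sum: n_x * X-bar = sum of X_i
ss_x = sum((x - mean_x) ** 2 for x in X)   # Sum of Squares: sum of (X_i - X-bar)^2
var_x = ss_x / (n_x - 1)                   # Sample Variance: SS / (n_x - 1)

print(mean_x, sum_x, ss_x, var_x)          # 5.0  20.0  20.0  6.666...
# Check the "Sum of Squares" identity from Table 2: (n_x - 1) * S_x^2 = SS
assert abs((n_x - 1) * var_x - ss_x) < 1e-12
```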

Standard Scores

It will be recalled that the standard score for an unstandardized measure (raw score) is "the score minus the mean divided by the standard deviation". For case 1 of measure X in Table 1, the standard (z) score is:

$$Z_{x_1} = \frac{X_1 - \bar{X}}{S_x}$$

For any (hypothetical) case i, the standard score of an X measure is:

$$Z_{x_i} = \frac{X_i - \bar{X}}{S_x}$$

The same procedure can be applied to Y measures. For case 1:

$$Z_{y_1} = \frac{Y_1 - \bar{Y}}{S_y}$$

Similarly, for the ith (hypothetical) case, we have:

$$Z_{y_i} = \frac{Y_i - \bar{Y}}{S_y}$$

Since a standard score distribution (such as in Table 1) is a distribution of variable measures, we can calculate means, standard deviations, variances, correlations, and so forth, just as we can calculate these statistics for unstandardized measures.

Most students will recall that the mean of z scores is equal to 0 and the standard deviation (and variance) of standardized scores is equal to 1. (The proof of these statements is given in the Appendix.1)

The mean of X standardized scores is defined as:

$$\bar{Z}_x = \frac{\sum_{i=1}^{n_{z_x}} Z_{x_i}}{n_{z_x}}$$

Similarly, for Y measures:

$$\bar{Z}_y = \frac{\sum_{i=1}^{n_{z_y}} Z_{y_i}}{n_{z_y}}$$

The variance of X in z score notation is defined as:

$$S_{z_x}^2 = \frac{\sum_{i=1}^{n_{z_x}} (Z_{x_i} - \bar{Z}_x)^2}{n_{z_x} - 1}$$

1. The Appendix contains proofs of certain concepts or relationships that may be of interest to the reader but are not crucial for the development of the proof in this paper (the theoretical limits of the sample bivariate correlation coefficient).

For the standardized Y measure the variance is defined as:

$$S_{z_y}^2 = \frac{\sum_{i=1}^{n_{z_y}} (Z_{y_i} - \bar{Z}_y)^2}{n_{z_y} - 1}$$

Sum of Squared Standard Scores

To understand the proof it is necessary to know the result of summing a distribution of squared standard scores. If we square each standard score for the X measure in Table 1 and sum them, we obtain:

$$\sum_{i=1}^{n_{z_x}} Z_{x_i}^2 = Z_{x_1}^2 + Z_{x_2}^2 + \cdots + Z_{x_n}^2$$

If we substitute the appropriate means and variances in the right hand side of the expression, we obtain:

$$\sum_{i=1}^{n_{z_x}} Z_{x_i}^2 = \frac{(X_1-\bar{X})^2}{S_x^2} + \frac{(X_2-\bar{X})^2}{S_x^2} + \cdots + \frac{(X_n-\bar{X})^2}{S_x^2}$$

Since $S_x^2$ is a constant, we can factor it outside and write:

$$\sum_{i=1}^{n_{z_x}} Z_{x_i}^2 = \frac{1}{S_x^2}\left[(X_1-\bar{X})^2 + (X_2-\bar{X})^2 + \cdots + (X_n-\bar{X})^2\right]$$

Rewriting the right hand side in summation notation, we obtain:

$$\sum_{i=1}^{n_{z_x}} Z_{x_i}^2 = \frac{\sum_{i=1}^{n_x}(X_i-\bar{X})^2}{S_x^2}$$

From Table 2, we know that we can substitute the sum of squares term into the numerator on the right hand side. This results in:

$$\sum_{i=1}^{n_{z_x}} Z_{x_i}^2 = \frac{(n_x-1)S_x^2}{S_x^2} = n_x - 1$$

(Recall that $n_x - 1 = n - 1$.)

If we were to work through the same steps for Y, we would obtain:

$$\sum_{i=1}^{n_{z_y}} Z_{y_i}^2 = \frac{(n_y-1)S_y^2}{S_y^2} = n_y - 1$$

(Recall that $n_y - 1 = n - 1$.)

These relationships between squared z scores and sample size are very important for the proof later on. They will be summarized later for easy reference.
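The result that the squared z scores sum to n - 1 is easy to check numerically. The following Python sketch (an added illustration, not part of the original paper) standardizes a small hypothetical sample and confirms the identity, along with the mean-of-zero property proven in the Appendix.

```python
# Verify numerically that the squared z scores sum to n - 1 (hypothetical data).
X = [10.0, 40.0, 20.0, 50.0, 75.0, 90.0]

n = len(X)
mean_x = sum(X) / n
var_x = sum((x - mean_x) ** 2 for x in X) / (n - 1)   # S_x^2 with the n-1 divisor
sd_x = var_x ** 0.5

z = [(x - mean_x) / sd_x for x in X]                  # z_i = (X_i - X-bar)/S_x

print(sum(z))                    # ~0: the mean of z scores is 0 (Appendix, proof 1)
print(sum(zi ** 2 for zi in z))  # n - 1 = 5: the key identity of this section
```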

Correlation Formulas

Unstandardized Form

Using the notation and variables in Tables 1 and 2, the unstandardized form of the correlation for two measures (X and Y) is defined as follows:

$$r_{xy} = \frac{\dfrac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{n-1}}{\sqrt{\dfrac{\sum_{i=1}^{n_x}(X_i-\bar{X})^2}{n_x-1}}\;\sqrt{\dfrac{\sum_{i=1}^{n_y}(Y_i-\bar{Y})^2}{n_y-1}}}$$

Note that the numerator contains the term $n-1$ because it is not important or necessary to distinguish between $n_x-1$ and $n_y-1$. However, in the denominator it is helpful to distinguish $n_x-1$ from $n_y-1$. In any case, all of the sample size terms would be equal to the same numerical value if the correlation coefficient were computed on a set of data ($n_x-1 = n_y-1 = n-1$).

Standardized Form

The correlation of measure X and measure Y in standard score form is defined as follows:

$$r_{z_x z_y} = \frac{\dfrac{\sum_{i=1}^{n}(Z_{x_i}-\bar{Z}_x)(Z_{y_i}-\bar{Z}_y)}{n-1}}{\sqrt{\dfrac{\sum_{i=1}^{n_{z_x}}(Z_{x_i}-\bar{Z}_x)^2}{n_{z_x}-1}}\;\sqrt{\dfrac{\sum_{i=1}^{n_{z_y}}(Z_{y_i}-\bar{Z}_y)^2}{n_{z_y}-1}}}$$

It is proven in the Appendix that this correlation formula is equal to:

$$r_{z_x z_y} = \frac{\sum_{i=1}^{n} Z_{x_i} Z_{y_i}}{n-1}$$

If we rearrange this formula by multiplying over the $n-1$ term, we obtain:

$$(n-1)\,r_{z_x z_y} = \sum_{i=1}^{n} Z_{x_i} Z_{y_i}$$

This relationship will be useful in the proof; it will be restated for easy reference later.

The reader may recall that the same correlation coefficient results whether the variables are in raw score form or standard score form. That is:

$$r_{xy} = r_{z_x z_y}$$

This statement is proven in the Appendix. We will restate it prior to the proof for the reader's convenience.
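To make the equivalence concrete, here is a short Python sketch (added for illustration; the data are hypothetical) that computes the correlation from raw scores and from z scores and shows that $r_{xy} = r_{z_x z_y} = \sum Z_{x_i} Z_{y_i}/(n-1)$.

```python
# Compare the raw score and standard score forms of the correlation (hypothetical data).
X = [1.0, 2.0, 4.0, 5.0, 8.0]
Y = [2.0, 1.0, 5.0, 4.0, 9.0]
n = len(X)

def mean(v):
    return sum(v) / len(v)

def sd(v):
    m = mean(v)
    return (sum((vi - m) ** 2 for vi in v) / (len(v) - 1)) ** 0.5

# Unstandardized form: covariance divided by the product of standard deviations.
r_raw = sum((x - mean(X)) * (y - mean(Y)) for x, y in zip(X, Y)) / (n - 1) / (sd(X) * sd(Y))

# Standardized form: sum of z score products divided by n - 1.
zx = [(x - mean(X)) / sd(X) for x in X]
zy = [(y - mean(Y)) / sd(Y) for y in Y]
r_z = sum(a * b for a, b in zip(zx, zy)) / (n - 1)

print(r_raw, r_z)   # identical, as proven in the Appendix
```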

Inequalities

Before starting the proof it is necessary to review one further topic: algebraic inequalities. In the proof we are required to manipulate inequalities of the form "greater than or equal to" and "less than or equal to". An example will serve as a refresher. For two variables, say A and B, we can write:

$$A \geq B$$

which means "A is greater than or equal to B". Equivalently, we can write:

$$B \leq A$$

which means the same thing: "B is less than or equal to A". For example, $3 \geq 1$, or equivalently, $1 \leq 3$. All of this may seem obvious. What students sometimes forget is what happens when multiplying or dividing by negative quantities. For example, if $3 > 1$ and we multiply this inequality by $-1$, we would obtain:

$$-1[(3) > (1)] \;\Rightarrow\; -3 < -1$$

That is, the inequality sign is reversed when multiplying by a negative number. The same result occurs for more complex expressions. For example:

$$1 - A \geq B$$

Multiplying each side of the inequality by $-1$, we obtain:

$$-1[(1-A) \geq (B)] \;\Rightarrow\; A - 1 \leq -B$$

Example: $1 - \tfrac{1}{4} \geq 0$. Multiplying through by $-1$, we obtain:

$$-1[(1-\tfrac{1}{4}) \geq 0] \;\Rightarrow\; \tfrac{1}{4} - 1 \leq 0$$

Summary of Important Concepts

We have reviewed standard scores (z), correlation formulas and algebraic inequalities. All of these concepts are important for understanding the proof that follows. For the reader's convenience, we will summarize these concepts for easy reference. This is done in Table 3.

Table 3

Summary of Important Concepts

1. $\sum_{i=1}^{n} Z_{x_i}^2 = n-1$

2. $\sum_{i=1}^{n} Z_{y_i}^2 = n-1$

3. $r_{xy} = r_{z_x z_y} = \dfrac{\sum_{i=1}^{n} Z_{x_i} Z_{y_i}}{n-1}$

4. $(n-1)\,r_{xy} = \sum_{i=1}^{n} Z_{x_i} Z_{y_i}$

5. $-1[(1-A) \geq (B)] \;\Rightarrow\; (A-1) \leq -B$

Proof

We are now ready to present the proof. Formally, we want to prove the following statements:

$$r_{xy} \geq -1 \qquad\text{and}\qquad r_{xy} \leq +1$$

Writing each of the statements in one linear form:

$$-1 \leq r_{xy} \leq +1$$

This states the same information as the above two separate statements. The proof consists of two parts: one part shows the lower limit of $r_{xy}$ (i.e., $r_{xy} \geq -1$), and the second part shows the upper limit of $r_{xy}$ (i.e., $r_{xy} \leq +1$). We will prove the upper limit first.

Proof that $r_{xy} \leq +1$

To prove this limit, we will perform algebraic manipulations on a statement which is mathematically true. That statement is:

$$\sum_{i=1}^{n}(Z_{x_i} - Z_{y_i})^2 \geq 0$$

In words, the statement means: the sum of squared differences of n standardized value pairs will always be equal to or greater than 0. The reader may refer to Table 1 for clarification. The squared differences are taken in each row (pairs) of Z values, starting at $Z_{x_1}, Z_{y_1}$ and continuing down to the last pair of Z's ($Z_{x_n}, Z_{y_n}$).

Most students readily agree that the squared sum will be greater than 0. But can it ever be exactly equal to 0? Yes, theoretically it can. Referring to Table 1, if one imagines each standardized X and Y measure to have the same numerical value,1 then it is apparent that each difference will be 0; so the squared value of 0 is also 0. Now, a sum of squared 0's will itself be equal to 0. While it may be unlikely to occur in practice, it is only required that

$$\sum_{i=1}^{n}(Z_{x_i} - Z_{y_i})^2 \geq 0$$

be true in a mathematical sense. Thus, the statement is true.

We will expand this squared sum, perform algebraic manipulations and substitutions, and arrive at the proof for the upper limit of the sample correlation coefficient. The actual steps in the derivation will now be presented. Notes pertaining to the algebra are provided for the reader's reference; refer to Tables 1, 2 and 3 as needed. It is suggested that the reader first examine each algebraic statement, then read the accompanying note for explanation.

1. That is, within pairs, not all pairs. Example:

   $Z_{x_i}$:  1.41   -.68   .05   etc.
   $Z_{y_i}$:  1.41   -.68   .05   etc.

Step 1. $\sum_{i=1}^{n}(Z_{x_i} - Z_{y_i})^2 \geq 0$

Note: the true statement from before.

Step 2. $\sum_{i=1}^{n}(Z_{x_i}^2 + Z_{y_i}^2 - 2Z_{x_i}Z_{y_i}) \geq 0$

Note: squaring each term, we obtain an expansion of the binomial in this form: $(A-B)^2 = A^2 + B^2 - 2AB$.

Step 3. $\sum_{i=1}^{n} Z_{x_i}^2 + \sum_{i=1}^{n} Z_{y_i}^2 - 2\sum_{i=1}^{n} Z_{x_i}Z_{y_i} \geq 0$

Note: distributing the summation operator to each term, and bringing the constant 2 outside the summation sign.

Step 4. $(n-1) + (n-1) - 2(n-1)r_{xy} \geq 0$

Note: this next step is very important. We substitute three quantities, all from Table 3:

$$\sum_{i=1}^{n} Z_{x_i}^2 = n-1, \qquad \sum_{i=1}^{n} Z_{y_i}^2 = n-1, \qquad \sum_{i=1}^{n} Z_{x_i}Z_{y_i} = (n-1)\,r_{xy}$$

Step 5. $2(n-1) - 2(n-1)r_{xy} \geq 0$

Note: collecting the like terms of $(n-1)$.

Step 6. $2(n-1)(1 - r_{xy}) \geq 0$

Note: factoring out the $2(n-1)$ term.

Step 7. $1 - r_{xy} \geq 0$

Note: dividing each side of the inequality by $2(n-1)$, which does not change the inequality sign, as $2(n-1)$ is always positive because n must always be greater than 1.

Step 8. $r_{xy} - 1 \leq 0$

Note: here we make use of multiplying an inequality by a negative number, which reverses the inequality sign (see Table 3). Multiplying each side by $-1$ reverses the inequality and the sign of $1 - r_{xy}$.

Step 9. $r_{xy} \leq +1$

Note: now add +1 to each side. This gives us the result. END OF PROOF FOR UPPER LIMIT.

Proof that $r_{xy} \geq -1$

Part two of the proof will be much simpler because the structure of the proof is very much like the first part. We will follow the same basic steps. We start out with a statement that is mathematically true, namely:

$$\sum_{i=1}^{n}(Z_{x_i} + Z_{y_i})^2 \geq 0$$

Again, this statement is true in a mathematical sense even though the "equals 0" aspect is very unlikely to occur in statistical practice. The development of the proof with appropriate notes follows.

Step 1. $\sum_{i=1}^{n}(Z_{x_i} + Z_{y_i})^2 \geq 0$

Note: the statement restated. Squaring each term results in a binomial expansion in this form: $(A+B)^2 = A^2 + B^2 + 2AB$.

Step 2. $\sum_{i=1}^{n} Z_{x_i}^2 + \sum_{i=1}^{n} Z_{y_i}^2 + 2\sum_{i=1}^{n} Z_{x_i}Z_{y_i} \geq 0$

Note: distributing the summation operator and bringing out the 2.

Step 3. $(n-1) + (n-1) + 2(n-1)r_{xy} \geq 0$

Note: making the same three substitutions as in part one.

Step 4. $2(n-1)(1 + r_{xy}) \geq 0$

Note: adding like terms and factoring.

Step 5. $1 + r_{xy} \geq 0$

Note: dividing each side by $2(n-1)$.

Step 6. $r_{xy} \geq 0 - 1$

Note: adding $-1$ to each side.

Step 7. $r_{xy} \geq -1$

Note: simplifying. END OF PROOF FOR LOWER LIMIT.

We have just proven that $-1 \leq r_{xy} \leq +1$. See the Appendix for additional proofs of related material.
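As an informal numerical companion to the proof (not in the original paper), the sketch below draws many random samples and confirms that the computed $r_{xy}$ never leaves $[-1, +1]$, and that the limits are attained for exactly linear data, where the z score pairs are identical or opposite in sign.

```python
# Empirical check of -1 <= r_xy <= +1 over random samples (illustration only).
import random

def corr(X, Y):
    n = len(X)
    mx, my = sum(X) / n, sum(Y) / n
    sx = (sum((x - mx) ** 2 for x in X) / (n - 1)) ** 0.5
    sy = (sum((y - my) ** 2 for y in Y) / (n - 1)) ** 0.5
    return sum((x - mx) * (y - my) for x, y in zip(X, Y)) / ((n - 1) * sx * sy)

random.seed(1)
for _ in range(1000):
    X = [random.gauss(0, 1) for _ in range(20)]
    Y = [random.gauss(0, 1) for _ in range(20)]
    assert -1.0 - 1e-12 <= corr(X, Y) <= 1.0 + 1e-12

# The limits are attained for exact linear relationships (Z_x = Z_y or Z_x = -Z_y):
X = [1.0, 2.0, 3.0, 4.0]
print(corr(X, [2 * x + 5 for x in X]))    # +1.0
print(corr(X, [-3 * x + 1 for x in X]))   # -1.0
```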

APPENDIX

Selected Proofs

1. That the mean of standard scores is equal to 0.

We will start with the definition of the mean of z scores for the X measure:

$$\bar{Z}_x = \frac{\sum_{i=1}^{n_{z_x}} Z_{x_i}}{n_{z_x}}$$

Expanding the right side:

$$\bar{Z}_x = \frac{1}{n_{z_x}}\left[\frac{X_1-\bar{X}}{S_x} + \frac{X_2-\bar{X}}{S_x} + \cdots + \frac{X_n-\bar{X}}{S_x}\right]$$

Factoring the constant $S_x$ outside and rewriting the sum of deviations in summation notation:

$$\bar{Z}_x = \frac{1}{n_{z_x} S_x}\sum_{i=1}^{n_x}(X_i - \bar{X})$$

Distributing the summation sign inside the parentheses:

$$\bar{Z}_x = \frac{1}{n_{z_x} S_x}\left(\sum_{i=1}^{n_x} X_i - n_x\bar{X}\right)$$

Since $\sum_{i=1}^{n_x} X_i$ is equal to $n_x\bar{X}$, and the sum of the constant $\bar{X}$ is $n_x\bar{X}$,1 we obtain:

$$\bar{Z}_x = \frac{1}{n_{z_x} S_x}\left(n_x\bar{X} - n_x\bar{X}\right) = 0$$

Thus the mean of $Z_x$ scores is equal to 0. Similar reasoning for the Y measure will produce the same result, namely:

$$\bar{Z}_y = \frac{\sum_{i=1}^{n_{z_y}} Z_{y_i}}{n_{z_y}} = \frac{1}{n_{z_y} S_y}\left(n_y\bar{Y} - n_y\bar{Y}\right) = 0$$

Therefore, variables in standardized form have mean equal to 0.

1. Recall that when taking the sum of a constant (say C) we have:

$$\sum_{i=1}^{n} C = C + C + C + \cdots + C = nC$$

That is, the sum of a constant is equal to the constant times the number of terms added (in this case, n).

2. That the variance and standard deviation of standard scores is equal to 1.

By definition, the variance for X measures in standard score form is:

$$S_{z_x}^2 = \frac{\sum_{i=1}^{n_{z_x}}(Z_{x_i} - \bar{Z}_x)^2}{n_{z_x}-1}$$

Since we know that $\bar{Z}_x = 0$, we now have:

$$S_{z_x}^2 = \frac{\sum_{i=1}^{n_{z_x}} Z_{x_i}^2}{n_{z_x}-1}$$

If we rewrite $Z_{x_i}$ in terms of the unstandardized mean and standard deviation:

$$S_{z_x}^2 = \frac{\sum_{i=1}^{n_{z_x}}\left[\dfrac{X_i-\bar{X}}{S_x}\right]^2}{n_{z_x}-1}$$

Rearranging terms:

$$S_{z_x}^2 = \frac{1}{n_{z_x}-1}\cdot\frac{\sum_{i=1}^{n_x}(X_i-\bar{X})^2}{S_x^2}$$

From Table 2, we can substitute into the numerator the "sum of squares" for the X measure. Since $n_x = n_{z_x}$, this results in:

$$S_{z_x}^2 = \frac{(n_x-1)S_x^2}{(n_x-1)S_x^2}$$

We can cancel terms, leaving:

$$S_{z_x}^2 = 1$$

Similar reasoning for Y standardized measures will produce, as the next to the last step in the derivation:

$$S_{z_y}^2 = \frac{(n_y-1)S_y^2}{(n_{z_y}-1)S_y^2} = 1 \qquad\text{since } n_y = n_{z_y}$$

In each case, the standard deviation for the appropriate variance term is simply the square root of 1. That is:

$$S_{z_x} = \sqrt{S_{z_x}^2} = \sqrt{1} = 1 \qquad\text{and}\qquad S_{z_y} = \sqrt{S_{z_y}^2} = \sqrt{1} = 1$$

Thus the variance and standard deviation of z scores are equal to 1.

3. That $r_{xy} = r_{z_x z_y}$.

We want to show that when measures X and Y are converted to standard scores and correlated, the resulting correlation is the same as the correlation between the unstandardized (raw) measures of X and Y. Let us first rewrite the correlation formula for z scores:

$$r_{z_x z_y} = \frac{\dfrac{1}{n-1}\sum_{i=1}^{n}(Z_{x_i}-\bar{Z}_x)(Z_{y_i}-\bar{Z}_y)}{\sqrt{\dfrac{\sum_{i=1}^{n_{z_x}}(Z_{x_i}-\bar{Z}_x)^2}{n_{z_x}-1}}\;\sqrt{\dfrac{\sum_{i=1}^{n_{z_y}}(Z_{y_i}-\bar{Z}_y)^2}{n_{z_y}-1}}}$$

Since $\bar{Z}_x = \bar{Z}_y = 0$, we can simplify to get:

$$r_{z_x z_y} = \frac{\dfrac{1}{n-1}\sum_{i=1}^{n} Z_{x_i} Z_{y_i}}{\sqrt{\dfrac{\sum_{i=1}^{n_{z_x}} Z_{x_i}^2}{n_{z_x}-1}}\;\sqrt{\dfrac{\sum_{i=1}^{n_{z_y}} Z_{y_i}^2}{n_{z_y}-1}}}$$

In the denominator, we recognize that:

$$\sum_{i=1}^{n_{z_x}} Z_{x_i}^2 = n_{z_x}-1 \qquad\text{and}\qquad \sum_{i=1}^{n_{z_y}} Z_{y_i}^2 = n_{z_y}-1$$

Substituting these values, we obtain:

$$r_{z_x z_y} = \frac{\dfrac{1}{n-1}\sum_{i=1}^{n} Z_{x_i} Z_{y_i}}{\sqrt{\dfrac{n_{z_x}-1}{n_{z_x}-1}}\;\sqrt{\dfrac{n_{z_y}-1}{n_{z_y}-1}}}$$

The denominator cancels out completely, leaving:

$$r_{z_x z_y} = \frac{1}{n-1}\sum_{i=1}^{n} Z_{x_i} Z_{y_i}$$

(Recall that this relationship was used in the proof for the limits of $r_{xy}$.) Now, expanding the z score terms:

$$r_{z_x z_y} = \frac{1}{n-1}\sum_{i=1}^{n}\frac{(X_i-\bar{X})}{S_x}\cdot\frac{(Y_i-\bar{Y})}{S_y}$$

This is identical to:

$$r_{z_x z_y} = \frac{\dfrac{1}{n-1}\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{S_x S_y}$$

Recognizing that $S_x = \sqrt{S_x^2}$ and $S_y = \sqrt{S_y^2}$, we can write:

$$r_{z_x z_y} = \frac{\dfrac{1}{n-1}\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{S_x^2}\sqrt{S_y^2}}$$

Rewriting the denominator of the variance product term in raw score terms (see Table 2):

$$r_{z_x z_y} = \frac{\dfrac{1}{n-1}\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\dfrac{\sum_{i=1}^{n_x}(X_i-\bar{X})^2}{n_x-1}}\;\sqrt{\dfrac{\sum_{i=1}^{n_y}(Y_i-\bar{Y})^2}{n_y-1}}}$$

This is precisely the form for $r_{xy}$ that was defined earlier in the paper. Therefore, the correlation between measures in raw score and z score forms is identical.

References

Glass, Gene V and Stanley, Julian C. Statistical Methods in Education and Psychology. Englewood Cliffs, New Jersey: Prentice-Hall, 1970.

O'Brien, Francis J., Jr. A proof that t2 and F are identical: the general case. ERIC Clearinghouse for Science, Mathematics and Environmental Education, Ohio State University, April, 1982. (Note: the ED number for this document was not available at the time of this publication.)

DOCUMENT RESUME

ED 215 894                                SE 037 098

AUTHOR      O'Brien, Francis J., Jr.
TITLE       A Proof That t2 and F are Identical: The General Case.
PUB DATE    24 Apr 82
NOTE        20p.

EDRS PRICE  MF01/PC01 Plus Postage.
DESCRIPTORS *College Mathematics; Equations (Mathematics); Higher
            Education; Instructional Materials; *Mathematical
            Applications; Mathematical Concepts; *Mathematical
            Formulas; Mathematics; *Proof (Mathematics);
            *Research Tools; *Statistics; Supplementary Reading
            Materials

ABSTRACT
This document proves that the F statistic can be obtained by squaring t-test values, or that equivalent t-test values may be obtained by extracting the positive square roots of F values. Proof to varying degrees of completeness and accessibility has been given by other scholars, but generally these prior statements, particularly those available to students of education or psychology, focus on the special case when sample sizes are equal. No source could be found that provides a complete, detailed proof of the general case that was understandable to students of applied statistics. This document seeks to give a clear step-by-step proof, with a numerical example worked out, and a plan is provided for proving the special case. It is felt the reader should be able to follow the proof of the general case, and should therefore have little difficulty in translating the acquired knowledge into proving the special case. (MP)

A Proof that $t^2$ and F are Identical: the General Case

Francis J. O'Brien, Jr., Ph.D.

It is well known that a researcher who has collected data from two independent groups may perform either a t-test for independent samples or a one-way analysis of variance for two groups. This is because knowledge of results from one type of computation can be transformed into an equivalent result for the other type of computation. For example, if a t-test for independent samples is calculated, the equivalent F statistic can be obtained by squaring the t-test value. Analogously, if the researcher has available the F statistic obtained from a one-way analysis of variance for two groups, the equivalent t-test value may be obtained by extracting the positive square root of the F value. That is, $t^2 = F$ or $t = F^{1/2}$.

The proof that $t^2 = F$ has been given to varying degrees of completeness and accessibility to students by other scholars in professional journals (see Rucci and Tweney, 1980 for citations). Statistics textbooks commonly available to students of education or psychology occasionally provide hints for proving the special case of the relationship (when sample sizes are equal). (See, for example, Glass and Stanley, 1970.)

The motivation for presenting the proof is twofold. First, many prior statements of the proof for the general case (of unequal sample sizes) are either abbreviated, mathematically inaccessible or incomplete for understanding this important relationship. A search of the literature did not reveal a source that provided a complete, detailed proof of the general case that was understandable to students of applied statistics. Second, a full step-by-step proof for the general case will give readers a sense that statistics is not all just a babel of "Greek arithmetic". As a former instructor of graduate level applied statistics, I know that many students can follow well-articulated proofs and desire to see them worked out.

In this paper three tasks will be accomplished. First, a clear step-by-step proof that $t^2 = F$ in the general case will be provided. Second, a numerical example will be worked out. Third, a plan will be provided for proving the special case. It is felt that the reader should be able to follow the proof of the general case, and therefore should have little difficulty in translating the acquired knowledge into proving the special case.

Proof that $t^2 = F$: the General Case

First, let us lay out a table of symbolic values in order to introduce a familiar notation and the variables used in the proof. This is done in Table 1.

The plan for the proof is important for understanding the strategy involved in attacking a statistical proof. The steps of the plan that will be used here are given below:

1. State the form of the t-test statistic using the notation of Table 1.
2. Square the t in step 1.
3. State the form of the F statistic using the notation of Table 1.
4. Simplify algebraically the F statistic in step 3.
5. Observe that the simplified F of step 4 is equal to the squared t of step 2.
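As a supplement to the plan (not part of the original paper), the end result of the five steps can be previewed symbolically. The sketch below uses the sympy library (an assumption; any computer algebra system would do) to confirm that the $t^2$ of step 2 and the F of step 3, with the grand mean substituted, are the same expression.

```python
# Symbolic check that t^2 = F for two independent groups (requires sympy).
import sympy as sp

n1, n2 = sp.symbols('n1 n2', positive=True)   # group sample sizes
m1, m2 = sp.symbols('m1 m2', real=True)       # group means
v1, v2 = sp.symbols('v1 v2', positive=True)   # group variances s.1^2 and s.2^2

grand_mean = (n1 * m1 + n2 * m2) / (n1 + n2)  # weighted grand mean (Table 1, note 3)
ms_within = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)

t_squared = (m1 - m2) ** 2 / (ms_within * (n1 + n2) / (n1 * n2))
F = (n1 * (m1 - grand_mean) ** 2 + n2 * (m2 - grand_mean) ** 2) / ms_within

print(sp.simplify(t_squared - F))   # 0, so t^2 = F in the general case
```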

Table 1

Table Layout for Two Independent Groups

                  Group 1          Group 2          Total
Observations      $X_{11}$         $X_{12}$
                  $X_{21}$         $X_{22}$
                  $X_{31}$         $X_{32}$
                  $X_{41}$         $X_{42}$
                  ...              ...
Sample Size       $n_{.1}$         $n_{.2}$         $n_{..}$
Sample Mean       $\bar{X}_{.1}$   $\bar{X}_{.2}$   $\bar{X}_{..}$
Sample Variance   $s_{.1}^2$       $s_{.2}^2$       $s_{..}^2$

Notes for Table 1:

1. Sample sizes are assumed unequal; that is, $n_{.1} \neq n_{.2}$.

2. The total sample size is $n_{..} = n_{.1} + n_{.2}$.

3. The grand mean ($\bar{X}_{..}$) is a weighted mean since sample sizes are unequal. That is,

$$\bar{X}_{..} = \frac{n_{.1}\bar{X}_{.1} + n_{.2}\bar{X}_{.2}}{n_{.1} + n_{.2}}$$

4. $s_{..}^2$ is not needed for the proof. It is included only for completeness.

Steps 1 and 2: the t statistic and its square

Using the notation of Table 1, the t-test statistic for two independent groups is:

$$t = \frac{\bar{X}_{.1} - \bar{X}_{.2}}{\sqrt{\dfrac{(n_{.1}-1)s_{.1}^2 + (n_{.2}-1)s_{.2}^2}{n_{.1}+n_{.2}-2}\cdot\dfrac{n_{.1}+n_{.2}}{(n_{.1})(n_{.2})}}}$$

Squaring the t of step 1 gives:

$$t^2 = \frac{(\bar{X}_{.1} - \bar{X}_{.2})^2}{\dfrac{(n_{.1}-1)s_{.1}^2 + (n_{.2}-1)s_{.2}^2}{n_{.1}+n_{.2}-2}\cdot\dfrac{n_{.1}+n_{.2}}{(n_{.1})(n_{.2})}}$$

This is the expression we will use as a basis of comparison with the simplified F statistic to be obtained in step 4. Note that the above squared t value is referred to simply as $t^2$.

Step 3: the F statistic

It will be recalled that the F statistic is the ratio of two independent sums of squares: a between1 sums of squares ($SS_b$) and a within sums of squares ($SS_w$). Also, each sums of squares is divided by an appropriate degrees of freedom term: $df_b$ (between sums of squares degrees of freedom) and $df_w$ (within sums of squares degrees of freedom). The general expression of the F statistic (for any number of groups) is:

$$F_{df_b,\,df_w} = \frac{SS_b/df_b}{SS_w/df_w}$$

The general form of $df_b$ is "the number of groups minus one" (i.e., $df_b = J-1$, where J is the number of groups). For two groups, $J = 2$, and so $df_b$ for two groups is $df_b = 2 - 1 = 1$. The general form of $df_w$ is "the total sample size minus the number of groups" (i.e., $df_w = n_{..} - J$). Since $n_{..} = n_{.1} + n_{.2}$, for $J = 2$ groups we can write $df_w = n_{.1} + n_{.2} - 2$.

1. Grammarians will point out that the preposition "between" refers to the relationship of two entities while "among" refers to more than two. However, since the reference here is ultimately to two groups, "between" will be used instead of the correct "among". I accept the righteous indignation of the grammarian.


Thus, using the notation of Table 1, we can write the F statistic for two groups as follows:

$$F_{1,\,n_{.1}+n_{.2}-2} = \frac{SS_b/1}{SS_w/(n_{.1}+n_{.2}-2)} = \frac{SS_b}{SS_w/(n_{.1}+n_{.2}-2)}$$

In order to facilitate the proof, we will write F in the familiar terms of means and variances. That is:

$$F = \frac{n_{.1}(\bar{X}_{.1}-\bar{X}_{..})^2 + n_{.2}(\bar{X}_{.2}-\bar{X}_{..})^2}{\dfrac{(n_{.1}-1)s_{.1}^2 + (n_{.2}-1)s_{.2}^2}{n_{.1}+n_{.2}-2}}$$

We will refer to this expression as simply F.

It is instructive to now compare the values of $t^2$ and F. For the reader's convenience, we will restate $t^2$ and F so that they may be compared and referred to later. This is done in Table 2.

Table 2

Restatement of $t^2$ and F

$$t^2 = \frac{(\bar{X}_{.1}-\bar{X}_{.2})^2}{\dfrac{(n_{.1}-1)s_{.1}^2 + (n_{.2}-1)s_{.2}^2}{n_{.1}+n_{.2}-2}\cdot\dfrac{n_{.1}+n_{.2}}{(n_{.1})(n_{.2})}}$$

$$F = \frac{n_{.1}(\bar{X}_{.1}-\bar{X}_{..})^2 + n_{.2}(\bar{X}_{.2}-\bar{X}_{..})^2}{\dfrac{(n_{.1}-1)s_{.1}^2 + (n_{.2}-1)s_{.2}^2}{n_{.1}+n_{.2}-2}}$$

Step 4. Simplify F.

This is the next to the last step in the proof. A full step-by-step simplification of F requires several algebraic steps. We will first simplify the numerator of F ($SS_b$). Notes pertaining to the algebraic manipulations are provided for the reader's convenience.

We start by looking at the numerator of F:

$$SS_b = n_{.1}(\bar{X}_{.1} - \bar{X}_{..})^2 + n_{.2}(\bar{X}_{.2} - \bar{X}_{..})^2$$

Since $\bar{X}_{..} = \dfrac{n_{.1}\bar{X}_{.1} + n_{.2}\bar{X}_{.2}}{n_{.1}+n_{.2}}$, we can substitute for $\bar{X}_{..}$:

$$SS_b = n_{.1}\left[\bar{X}_{.1} - \frac{n_{.1}\bar{X}_{.1} + n_{.2}\bar{X}_{.2}}{n_{.1}+n_{.2}}\right]^2 + n_{.2}\left[\bar{X}_{.2} - \frac{n_{.1}\bar{X}_{.1} + n_{.2}\bar{X}_{.2}}{n_{.1}+n_{.2}}\right]^2$$

Finding common denominators for each bracketed term:

$$SS_b = n_{.1}\left[\frac{(n_{.1}+n_{.2})\bar{X}_{.1} - (n_{.1}\bar{X}_{.1} + n_{.2}\bar{X}_{.2})}{n_{.1}+n_{.2}}\right]^2 + n_{.2}\left[\frac{(n_{.1}+n_{.2})\bar{X}_{.2} - (n_{.1}\bar{X}_{.1} + n_{.2}\bar{X}_{.2})}{n_{.1}+n_{.2}}\right]^2$$

Removing the inside parentheses, multiplying the group means through, subtracting, and cancelling like terms:

$$SS_b = n_{.1}\left[\frac{n_{.2}\bar{X}_{.1} - n_{.2}\bar{X}_{.2}}{n_{.1}+n_{.2}}\right]^2 + n_{.2}\left[\frac{n_{.1}\bar{X}_{.2} - n_{.1}\bar{X}_{.1}}{n_{.1}+n_{.2}}\right]^2$$

Factoring like sample size terms:

$$SS_b = n_{.1}\left[\frac{n_{.2}(\bar{X}_{.1} - \bar{X}_{.2})}{n_{.1}+n_{.2}}\right]^2 + n_{.2}\left[\frac{n_{.1}(\bar{X}_{.2} - \bar{X}_{.1})}{n_{.1}+n_{.2}}\right]^2$$

Squaring each term separately inside the brackets:

$$SS_b = \frac{n_{.1}n_{.2}^2(\bar{X}_{.1} - \bar{X}_{.2})^2 + n_{.2}n_{.1}^2(\bar{X}_{.2} - \bar{X}_{.1})^2}{(n_{.1}+n_{.2})^2}$$

Note that $(\bar{X}_{.2} - \bar{X}_{.1})^2$ is the same as $(\bar{X}_{.1} - \bar{X}_{.2})^2$ because squared differences of the same two terms yield the same quantity. Hence, we can factor out like sample size terms in the numerator to obtain:

$$SS_b = \frac{(n_{.1})(n_{.2})(n_{.2} + n_{.1})(\bar{X}_{.1} - \bar{X}_{.2})^2}{(n_{.1}+n_{.2})^2}$$

Simplifying $(n_{.1}+n_{.2})$ against $(n_{.1}+n_{.2})^2$, we obtain:

$$SS_b = \frac{(n_{.1})(n_{.2})(\bar{X}_{.1} - \bar{X}_{.2})^2}{n_{.1}+n_{.2}}$$

This is just about it. Now substitute the value of $SS_b$ just obtained into the F statistic, and obtain:

$$F = \frac{\dfrac{(n_{.1})(n_{.2})(\bar{X}_{.1} - \bar{X}_{.2})^2}{n_{.1}+n_{.2}}}{\dfrac{(n_{.1}-1)s_{.1}^2 + (n_{.2}-1)s_{.2}^2}{n_{.1}+n_{.2}-2}}$$

If we divide the numerator and denominator by $\dfrac{(n_{.1})(n_{.2})}{n_{.1}+n_{.2}}$, we obtain:

$$F = \frac{(\bar{X}_{.1} - \bar{X}_{.2})^2}{\dfrac{(n_{.1}-1)s_{.1}^2 + (n_{.2}-1)s_{.2}^2}{n_{.1}+n_{.2}-2}\cdot\dfrac{n_{.1}+n_{.2}}{(n_{.1})(n_{.2})}}$$

Step 5. Observe that $t^2 = F$.

If the value of F above is compared with the value of $t^2$, it will be seen that they are in the same form, signifying that they are equal. Refer to Table 2 for this comparison. END OF PROOF.

This completes the entire proof in accordance with the five step plan. We now turn to a numerical example for two independent groups of unequal sample size.

Numerical Example

The analytic proof using algebraic rules for the general case was given in great detail. A numerical example should provide additional insight. The data and descriptive statistics are provided in Table 3, which is modeled on Table 1. Note that the data were chosen for illustrative purposes only.

Table 3

Data for Numerical Example

                  Group 1    Group 2    Total
Observations      10         50
                  40         70
                  20         100
                  50         40
                  75
                  90
Sample Size       6          4          10
Sample Mean       47.5       65.0       54.5
Sample Variance   957.5      700.0      NOT NEEDED

Refer to Table 2 for the t-test formula. Using the formula there, the t statistic value is:

$$t = \frac{47.5 - 65.0}{\sqrt{\dfrac{(6-1)957.5 + (4-1)700.0}{6+4-2}\cdot\dfrac{10}{24}}} = \frac{-17.5}{\sqrt{\dfrac{6887.5}{8}\cdot\dfrac{10}{24}}} = -.9240$$

If we square this value, we obtain:

$$(-.9240)^2 = .8537 \qquad (t^2 = .8537)$$

We place the computed value in the margin for easy reference. Now compute the F statistic using the formula provided in Table 2:

$$F = \frac{6(47.5-54.5)^2 + 4(65.0-54.5)^2}{\dfrac{(6-1)957.5 + (4-1)700.0}{6+4-2}} = \frac{6(49) + 4(110.25)}{6887.5/8} = .8537 \qquad (F = .8537)$$

Thus, $t^2 = .8537$ and $F = .8537$, or $t^2 = F$.
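The arithmetic of the numerical example is easy to reproduce in a few lines of Python (an added illustration using the Table 3 data):

```python
# Reproduce the Table 3 numerical example: t^2 = F for unequal group sizes.
from statistics import mean, variance   # variance uses the n-1 divisor

g1 = [10, 40, 20, 50, 75, 90]   # Group 1
g2 = [50, 70, 100, 40]          # Group 2
n1, n2 = len(g1), len(g2)
m1, m2 = mean(g1), mean(g2)     # 47.5 and 65.0
grand = (n1 * m1 + n2 * m2) / (n1 + n2)                       # 54.5
ms_w = ((n1 - 1) * variance(g1) + (n2 - 1) * variance(g2)) / (n1 + n2 - 2)

t = (m1 - m2) / (ms_w * (n1 + n2) / (n1 * n2)) ** 0.5         # -0.9240
F = (n1 * (m1 - grand) ** 2 + n2 * (m2 - grand) ** 2) / ms_w  # 0.8537

print(round(t, 4), round(t * t, 4), round(F, 4))   # -0.924  0.8537  0.8537
```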

Proof of the Special Case

If the sample sizes are equal in a research design of two independent groups, the proof that $t^2 = F$ is somewhat easier to derive. Rather than perform the necessary algebraic manipulations, it may prove instructive for the reader to actually derive it himself or herself. Working through the analytic proof for the special case will solidify an understanding of the proof for the general case.

Some hints will be provided for the reader in proving the special case. They are summarized as follows:

1. $n_{.1} = n_{.2} = n$. That is, since sample sizes are equal, one symbol for sample size may be used; it could be called n.

2. $\bar{X}_{..} = \dfrac{\bar{X}_{.1} + \bar{X}_{.2}}{2}$. Also, since sample sizes are equal, the grand mean ($\bar{X}_{..}$) is simply the average of the means of each group. This value should be substituted for $\bar{X}_{..}$ in the proof.

By making these two changes and by following the five step outline used for the general case proof, the reader should be able to derive the proof for the special case. One may also wish to "make up" an easy-to-work-with numerical data set to check on the process.
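In that spirit, here is a minimal Python check of the special case (an added illustration; the data are hypothetical):

```python
# Special case check: with equal sample sizes, t^2 still equals F.
from statistics import mean, variance

g1 = [3, 7, 8, 10]    # hypothetical Group 1 (n = 4)
g2 = [5, 6, 9, 12]    # hypothetical Group 2 (n = 4)
n = len(g1)
m1, m2 = mean(g1), mean(g2)
grand = (m1 + m2) / 2   # equal n: grand mean is the simple average of group means
ms_w = ((n - 1) * variance(g1) + (n - 1) * variance(g2)) / (2 * n - 2)

t_sq = (m1 - m2) ** 2 / (ms_w * 2 / n)
F = (n * (m1 - grand) ** 2 + n * (m2 - grand) ** 2) / ms_w
print(t_sq, F)   # equal values
```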

Note

For students or researchers who enjoy proofs in applied statistics, the following two references may be useful.

Edwards, Allen L. Expected Values of Discrete Random Variables and Elementary Statistics. New York: Wiley, 1964.

Guilford, J. P. and Fruchter, Benjamin. Fundamental Statistics in Psychology and Education, 4th and 5th editions. New York: McGraw-Hill, 1973.

References

Glass, Gene V and Stanley, Julian C. Statistical Methods in Education and Psychology. Englewood Cliffs, New Jersey: Prentice-Hall, 1970.

Rucci, Anthony J. and Tweney, Ryan D. Analysis of variance and the "second discipline" of scientific psychology: a historical account. Psychological Bulletin, 1980, Vol. 87, No. 1, 166-184.

DOCUMENT RESUME

ED 331 706                                SE 051 972

AUTHOR      O'Brien, Francis J., Jr.
TITLE       A Derivation of the Limits of the Sample Multivariate
            Correlation Coefficient.
PUB DATE    Mar 91
NOTE        18p.
PUB TYPE    Guides - Classroom Use - Instructional Materials (For
            Learner) (051)

EDRS PRICE  MF01/PC01 Plus Postage.
DESCRIPTORS Algebra; *College Mathematics; *Correlation; Higher
            Education; Learning Activities; Mathematical
            Applications; Mathematics Education; *Multivariate
            Analysis; *Problem Solving; *Proof (Mathematics);
            *Statistics

ABSTRACT
This paper is the sixth in a series designed to supplement the statistics training of students. The intended audience is social science undergraduate and graduate students studying applied statistics. The purpose of the applied statistics monographs is to provide selected proofs and derivations of important relationships or formulas that students do not find available and/or comprehensible in journals, textbooks and similar sources. Derived are the theoretical limits of the sample multivariate (or multiple) correlation of one criterion (dependent variable) and any (finite) number of predictors (independent variables). The proof given in this paper involves deriving the individual terms of R. The lower limit and upper limit of R are derived separately. (KR)

A Derivation of the Limits of the Sample Multivariate Correlation Coefficient

Francis J. O'Brien, Jr., Ph.D.

36 Linden Street
Middletown, RI 02840
March, 1991

Copyright 1991, Francis J. O'Brien, Jr. All rights reserved.

Table of Contents

Introduction
Introduction to Proof
Proof that 0 ≤ R ≤ 1
    Proof of the Lower Limit
    Proof of the Upper Limit
References

List of Tables

Table 1. Formulas and Relationships for Sample Standardized Variables
Table 2. Formulas and Relationships for the Terms of the Multiple R

Introduction

This paper is the sixth in a series of ERIC publications designed to supplement the statistics training of students. For related documents see O'Brien (1982a; 1982b; 1982c; 1984; 1987). The intended audience for these papers is social science undergraduate and graduate students studying applied statistics.

The purpose of these applied statistics monographs is to provide selected proofs and derivations of important relationships or formulas that students do not find available and/or comprehensible in journals, textbooks and similar sources. For example, based on the author's personal experience as a former applied statistics instructor at the graduate level, few students would profit from a reading of Kendall and Stuart (1967) to understand the proof provided in the present paper. The unique feature of the papers in this series is detailed step-by-step proofs or derivations written in a consistent notation system. Calculus is neither used nor assumed. Each proof or derivation is presented algebraically in detail.

The present paper assumes familiarity with the author's 1982c paper (or equivalent knowledge). That paper formulated a detailed derivation of the sample multiple correlation formula for one dependent variable and p predictors for the linear model based on standardized (z) variables.
Introduction to Proof

In this paper we derive the theoretical limits of the sample multivariate (or multiple) correlation of one criterion (dependent variable) and any (finite) number of predictors (independent variables). To facilitate the development of the proof, we will work with standardized (z) variables. Although the proof could be presented in the unstandardized (raw score) form, using normalized variables reduces some of the algebraic detail.

Many students have learned that the multivariate correlation between one dependent variable and a finite number of independent variables can be expressed in terms of a weighted sum of regression weights and Pearson (zero-order) product-moment correlations between dependent/independent variables. This relationship holds only for standardized variables. This correlation for p independent variables can be written (see O'Brien, 1982c):

$$R_{z_y.z_1,\ldots,z_j,\ldots,z_p} = \sqrt{B_1 r_{y1} + B_2 r_{y2} + \cdots + B_j r_{yj} + \cdots + B_p r_{yp}}$$

Writing the right-hand side in summation notation:

$$R_{z_y.z_1,\ldots,z_j,\ldots,z_p} = \sqrt{\sum_{j=1}^{p} B_j r_{yj}}$$

where

$R_{z_y.z_1,\ldots,z_p}$ = multiple correlation of p standardized variables,
$Z_y$ = the standardized dependent variable,
$Z_1, Z_2, \ldots, Z_p$ = the standardized independent variables,
$B_1, B_2, \ldots, B_p$ = beta (regression) weights attached to each standardized independent variable,*
$r_{y1}, r_{y2}, \ldots, r_{yp}$ = product moment (zero-order) dependent/independent variable correlations.

Many students know that the numerical limits on the above multiple R are zero and 1 (i.e., $0 \leq R \leq 1$). The purpose of this paper is to prove that statement.

Proof that $0 \leq R \leq 1$

In this section we present a detailed proof that the limits of the multiple R are 0/1. First, a review is given of the notation and necessary definitions as well as the relevant results that were derived in O'Brien (1982c).

We can state the formal linear regression prediction equation for p standardized predictors as follows:**

$$\hat{Z}_{y_i} = B_1 Z_1 + B_2 Z_2 + \cdots + B_j Z_j + \cdots + B_p Z_p$$

This equation represents the predicted standardized criterion measure or score ($\hat{Z}_{y_i}$) for the ith subject in the sample on the p standardized variables $Z_1$ through $Z_p$.

* Technically, the beta weights ($B_j$) are called "standardized partial regression coefficients". The formal notation in some standard textbooks is more elaborate than ours (e.g., Hays, 1973 or Kendall and Stuart, 1967). As in previous papers, we have minimized the reading of the symbolism to clarify the concepts in the development of the proof.

** The coefficient "A" is not included for the reason given in O'Brien (1982c); i.e., it "drops out" in the least squares derivations and so may be ignored.

The multiple correlation (or just R for short) for this regression model of p standardized predictors may be defined conceptually as:

$$R = \mathrm{Corr}(Z_y, \hat{Z}_y) = \frac{\mathrm{Cov}(Z_y, \hat{Z}_y)}{\sqrt{\mathrm{Var}(Z_y)\,\mathrm{Var}(\hat{Z}_y)}}$$

where Corr is the correlation operator, Cov is the covariance operator, and Var is the variance operator. Note that $Z_y$ is the random variable that represents the "observed" or known information while $\hat{Z}_y$ represents the "predicted" information.

The proof that is given in this paper involves deriving the individual terms of R. Two tables are provided for reference in the development of the proof. Table 1 summarizes familiar formulas for standardized variables. Table 2 is a summary of the results derived in O'Brien (1982c) for the multiple R of p standardized variables. The information in each table provides the essential building blocks of the proof.

Table 1

Formulas and Relationships for Sample Standardized Variables

Name of Quantity   Formula                                                   Note

Sum                $\sum_{i=1}^{n} Z_j = 0$                                  n is sample size. The summation is understood to be across the sample for a given predictor j.

Sum of Squares     $\sum_{i=1}^{n} Z_j^2 = n-1$                              Above note applies.

Mean               $\bar{Z}_j = 0$                                           Mean of jth predictor for total sample. The summation is understood to be across the sample for a given predictor j.

Variance           $\dfrac{\sum_{i=1}^{n} Z_j^2}{n-1} = \mathrm{Var}(Z_j) = 1$   Variance of jth predictor for total sample. The summation is understood to be across the sample for a given predictor j.

Correlation        $\dfrac{\sum_{i=1}^{n} Z_x Z_y}{n-1} = r_{z_x z_y} = r_{xy}$   General zero-order correlation formula for any two standardized variables, $Z_x$ and $Z_y$.

Note: Proof of these formulas/relationships may be found in O'Brien (1982b, Appendix).

Table 2

Formulas and Relationships for the Sample Multiple R

$$\mathrm{Cov}(Z_y, \hat{Z}_y) = \sum_{j=1}^{p} B_j r_{yj}$$

$$\mathrm{Var}(Z_y) = 1$$

$$\mathrm{Var}(\hat{Z}_y) = \sum_{j=1}^{p} B_j^2 + 2\sum_{j<k} B_j B_k r_{jk} = \sum_{j=1}^{p} B_j r_{yj}$$

$$\mathrm{Corr}(Z_y, \hat{Z}_y) = R = \frac{\sum_{j=1}^{p} B_j r_{yj}}{\sqrt{(1)\sum_{j=1}^{p} B_j r_{yj}}} = \sqrt{\sum_{j=1}^{p} B_j r_{yj}}$$

where $r_{yj}$ = dependent/independent variable Pearson (zero-order) correlations, and $r_{jk}$ = Pearson correlations among the p independent variables.

Note: Proof of these formulas/relationships may be found in O'Brien (1982c).

As the reader can verify from Table 2, the covariance term is $\mathrm{Cov}(Z_y, \hat{Z}_y) = \mathrm{Var}(\hat{Z}_y) = R^2$. These relationships constitute the "key" to the proof for the 0/1 limits of R as developed in this paper. We now demonstrate this proof. The development of the proof will consist of two parts: one part will demonstrate the proof for the lower limit and the other will show the proof for the upper limit. The lower limit is now presented.

Proof of the lower limit

The proof of the lower limit ($R \geq 0$) is based on an algebraic inequality and the information in Table 1 and Table 2. Recall the conceptual definition of the sample variance of $\hat{Z}_y$:

$$\mathrm{Var}(\hat{Z}_y) = \frac{\sum_{i=1}^{n}(\hat{Z}_{y_i} - \bar{\hat{Z}}_y)^2}{n-1}$$

As is true for any standardized variable, the mean is $\bar{\hat{Z}}_y = 0$ (see Table 1). Thus,

$$\mathrm{Var}(\hat{Z}_y) = \frac{\sum_{i=1}^{n}\hat{Z}_{y_i}^2}{n-1}$$

The reader will agree that the following algebraic inequality is a true statement mathematically:

$$\frac{\sum_{i=1}^{n}\hat{Z}_{y_i}^2}{n-1} \geq 0$$

From Table 2, this statement is equivalent to:

$$\mathrm{Var}(\hat{Z}_y) = \sum_{j=1}^{p} B_j^2 + 2\sum_{j<k} B_j B_k r_{jk} \geq 0$$

But, as the reader can verify from Table 2, $\mathrm{Var}(\hat{Z}_y) = R^2$. Hence,

$$\mathrm{Var}(\hat{Z}_y) = R^2 \geq 0$$

Since the value of the square root of a variance term is, by definition, positive, then

$$\sqrt{\mathrm{Var}(\hat{Z}_y)} \geq 0$$

or, by substituting $R^2$,

$$\sqrt{R^2} \geq 0$$

Consequently, $R \geq 0$. The proof for the lower limit has been demonstrated.

Proof of the upper limit

The proof of the upper limit ($R \leq 1$) follows with similar logic. The reader will recall that the least squares criterion for standard scores can be stated as follows (see O'Brien, 1982c):

$$\sum_{i=1}^{n}(Z_{y_i} - \hat{Z}_{y_i})^2 = \text{a minimum}$$

We can also write the least squares criterion as:

$$\sum_{i=1}^{n}(Z_{y_i} - \hat{Z}_{y_i})^2 \geq 0$$

which is a true statement mathematically.

Our proof for the upper limit will consist of first expanding the above squared sum, substituting quantities from Tables 1 and 2, and simplifying. We then return to the inequality relation and conclude the derivation. Expanding out the left side as a binomial and bringing in the summation operator:

$$\sum_{i=1}^{n}(Z_{y_i} - \hat{Z}_{y_i})^2 = \sum_{i=1}^{n}Z_{y_i}^2 + \sum_{i=1}^{n}\hat{Z}_{y_i}^2 - 2\sum_{i=1}^{n}Z_{y_i}\hat{Z}_{y_i}$$

Each term can be simplified in turn. As shown in Table 1, the sum of squared standardized scores in a sample is:

$$\sum_{i=1}^{n}Z_{y_i}^2 = n-1$$

where n is the sample size. As for the second term in the expansion, that term reduces to

$$\sum_{i=1}^{n}\hat{Z}_{y_i}^2 = (n-1)\,\mathrm{Var}(\hat{Z}_y)$$

which is derived as an algebraic manipulation of the form given in Table 1. The last term can be obtained in several steps by expansion and manipulation as follows:

$$2\sum_{i=1}^{n}Z_{y_i}\hat{Z}_{y_i} = 2\sum_{i=1}^{n}Z_{y_i}(B_1 Z_1 + B_2 Z_2 + \cdots + B_p Z_p)$$

$$= 2\sum_{i=1}^{n}(B_1 Z_1 Z_{y_i} + B_2 Z_2 Z_{y_i} + \cdots + B_p Z_p Z_{y_i})$$

$$= 2\left(B_1\sum_{i=1}^{n}Z_1 Z_{y_i} + B_2\sum_{i=1}^{n}Z_2 Z_{y_i} + \cdots + B_p\sum_{i=1}^{n}Z_p Z_{y_i}\right)$$

From Table 1, it can be seen that any term of the form $\sum_{i=1}^{n}Z_x Z_y$ is equal to $(n-1)r_{xy}$. For correlations involving the independent/dependent variables, we have:

$$2\sum_{i=1}^{n}Z_{y_i}\hat{Z}_{y_i} = 2\big(B_1(n-1)r_{y1} + B_2(n-1)r_{y2} + \cdots + B_p(n-1)r_{yp}\big) = 2(n-1)\sum_{j=1}^{p}B_j r_{yj}$$

Collecting all terms together, we can now rewrite the least squares criterion as:

$$\sum_{i=1}^{n}(Z_{y_i} - \hat{Z}_{y_i})^2 = (n-1) + (n-1)\,\mathrm{Var}(\hat{Z}_y) - 2(n-1)\sum_{j=1}^{p}B_j r_{yj} \geq 0$$

Upon factoring out $n-1$ and dividing it through the inequality, we have

$$\frac{1}{n-1}\sum_{i=1}^{n}(Z_{y_i} - \hat{Z}_{y_i})^2 = 1 + \mathrm{Var}(\hat{Z}_y) - 2\sum_{j=1}^{p}B_j r_{yj} \geq 0$$

Now, from Table 2, we know that

$$\sum_{j=1}^{p}B_j r_{yj} = \mathrm{Var}(\hat{Z}_y)$$

Thus,

$$\frac{1}{n-1}\sum_{i=1}^{n}(Z_{y_i} - \hat{Z}_{y_i})^2 = 1 + \mathrm{Var}(\hat{Z}_y) - 2\,\mathrm{Var}(\hat{Z}_y) = 1 - \mathrm{Var}(\hat{Z}_y) \geq 0$$

Reversing the sense of the inequality, we can write the right hand side of the above as:

$$\mathrm{Var}(\hat{Z}_y) \leq 1$$

But since $\mathrm{Var}(\hat{Z}_y) = R^2$, then

$$R^2 \leq 1 \quad\text{or}\quad R \leq 1.$$

Proof is completed. We have proven in this paper that $0 \leq R \leq 1$.
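As an added numerical companion (not in the original paper), the following Python sketch builds a small regression on standardized variables, obtains the beta weights from the normal equations in correlation form, and confirms both $R^2 = \sum B_j r_{yj}$ and $0 \leq R \leq 1$. The data are hypothetical and numpy is assumed available.

```python
# Numerical check of R^2 = sum(B_j * r_yj) and 0 <= R <= 1 (hypothetical data).
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))
y = X @ np.array([0.6, -0.3, 0.2]) + rng.normal(size=n)

# Standardize every variable (mean 0, variance 1, n-1 divisor).
Zx = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
Zy = (y - y.mean()) / y.std(ddof=1)

Rxx = (Zx.T @ Zx) / (n - 1)   # correlations among the predictors (r_jk)
ryx = (Zx.T @ Zy) / (n - 1)   # criterion/predictor correlations (r_yj)

B = np.linalg.solve(Rxx, ryx)   # beta weights from the normal equations
R_squared = float(B @ ryx)      # R^2 = sum of B_j * r_yj (Table 2)

Zy_hat = Zx @ B
print(R_squared, np.var(Zy_hat, ddof=1))   # equal: Var(Z-hat_y) = R^2
assert 0.0 <= R_squared ** 0.5 <= 1.0      # the limits proven above
```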

References

Hays, W. L. Statistics for the Social Sciences (2nd ed.). NY: Holt, Rinehart & Winston, 1973.

Kendall, M. G. & A. Stuart. The Advanced Theory of Statistics. Vol. II: Inference and Relationship (2nd ed.). NY: Hafner Publishing Co., 1967.

O'Brien, F. A proof that t2 and F are identical: the general case. Washington, D.C.: Educational Resources Information Center, 1982a. (ED 215894)

O'Brien, F. Proof that sample bivariate correlation coefficient has limits ±1. Washington, D.C.: Educational Resources Information Center, 1982b. (ED 216874)

O'Brien, F. A derivation of the sample multiple correlation formula for standard scores. Washington, D.C.: Educational Resources Information Center, 1982c. (ED 223429)

O'Brien, F. A derivation of the sample multiple correlation formula for raw scores. Washington, D.C.: Educational Resources Information Center, 1984. (ED 235205)

O'Brien, F. A derivation of the unbiased standard error of estimate: the general case. Washington, D.C.: Educational Resources Information Center, 1987. (ED 280896)

DOCUMENT RESUME

ED 235 205                                TM 830 618

AUTHOR      O'Brien, Francis J., Jr.
TITLE       A Derivation of the Sample Multiple Correlation
            Formula for Raw Scores.
INSTITUTION National Opinion Research Center, New York, NY.
PUB DATE    24 Jun 83
NOTE        64p.; For related document, see ED 223 429.
PUB TYPE    Guides - Classroom Use - Materials (For Learner) (051)

EDRS PRICE  MF01/PC03 Plus Postage.
DESCRIPTORS *Correlation; Higher Education; Instructional
            Materials; *Mathematical Formulas; *Scores;
            *Statistics; *Supplementary Reading Materials
IDENTIFIERS Linear Models; *Multiple Correlation Formula

ABSTRACT
This paper, a derivation of the multiple correlation formula for unstandardized (raw) scores, is the fourth in a series of publications. The purpose of these papers is to provide supplementary reading for students of applied statistics. The intended audience is social science graduate and advanced undergraduate students familiar with applied statistics. The minimum background for most of the existing and forthcoming papers is knowledge of applied statistics through rudimentary analysis of variance, and multiple correlation and regression analysis. The unique feature of this set of papers is detailed proofs and derivations of important formulas and derivations which are not readily available in textbooks, journal articles, and other similar sources. Each proof or derivation is presented in a clear, detailed and consistent fashion. When necessary, a review of relevant algebra is provided. Calculus is not used or assumed. This series seeks to address the needs of students to see a full, comprehensible statement of a mathematical argument. (PN)

A Derivation of the Sample Multiple Correlation Formula for Raw Scores

Francis J. O'Brien, Jr., Ph.D.
Assistant Sampling Director
National Opinion Research Center

NORC
Sampling Department
902 Broadway
New York, NY 10010
June 24, 1983

Copyright 1983, Francis J. O'Brien, Jr. All rights reserved.

ERRATA SHEET

"A derivation of the sample multiple correlation formula for raw scores" by Francis J. O'Brien, Jr., June 24, 1983

Location (original page numbers, at top)   Now Reads                Corrected To
Page 10, footnote, 4 lines down            nX1Y                     nXY
Page 13                                    var(b2,x2)               var(b2x2)
Page 17, footnote                          Multiple R               multiple R
Page 29, equation (36)                     b Ex2                    Ex y
Page 36, 2 lines from bottom of text       mathematical calculus    mathematical statistics
Page 40, 2nd equation                      S1                       r
Page 43, last line in text                 Sy                       Sy2

Table of Contents

Introduction
Overview of Derivation
Brief Review of Regression Analysis and Derivation for Two Predictors
    Normal Equations
    Multiple Correlation
    Derivation
Derivation for Three Predictors
Derivation for p Predictors
    Multiple Correlation for p Predictors and Derivation
Appendix A: Normal Equations in Regression Analysis
    Introduction
    Plan
    Finding Normal Equations for the Two Predictor Model
    Finding Normal Equations for p Predictors
    Alternate Procedure
    Example for Five Predictors
Appendix B: Errata for paper ED 223 429
References

List of Tables

Table 1. Descriptive Sample Statistics
Table 2. Normal Equations and Multiple Correlation Formula for Two Raw Score Predictors
Table 3. Normal Equations and Multiple Correlation Formula for Three Raw Score Predictors
Table 4. Normal Equations and Multiple Correlation Formula for p Raw Score Predictors

A Derivation of the Sample Multiple Correlation Formula for Raw Scores

Francis J. O'Brien, Jr., Ph.D.
National Opinion Research Center, New York

Introduction

This paper is the fourth in a series of publications. The purpose of these papers is to provide supplementary reading for students of applied statistics (see O'Brien, 1982a; 1982b; 1982c). My intended audience is social science graduate and advanced undergraduate students familiar with applied statistics. The minimum background for most of the existing and forthcoming papers is knowledge of applied statistics through rudimentary analysis of variance, and multiple correlation and regression analysis.

The unique feature of this set of papers is detailed proofs and derivations of important formulas and derivations which are not readily available in textbooks, journal articles, and other similar sources. Each proof or derivation is presented in a clear, detailed and consistent fashion. When necessary, a review of relevant algebra is provided. Calculus is not used or assumed.

As a former instructor of applied statistics on the graduate level, I know that many students are very capable of understanding the proofs and derivations presented in these papers. My experience has been that many students desire to see a full, comprehensible statement of a mathematical argument. This series seeks to address such needs.

The present paper is a companion work to an earlier paper (O'Brien, 1982c). Each is a derivation of the multiple correlation formula for the linear model. The first paper formulated a detailed derivation of the multiple correlation formula for standard (z) scores. The present paper is a derivation of the multiple correlation formula for unstandardized (raw) scores. Readers should find each paper interesting and informative.

Typographical errors appeared in the earlier paper. For the reader's convenience, corrections are summarized in Appendix B of the present paper. The author would be grateful if other errors in that paper or the present paper were communicated to him.

The two papers taken together are meant to be preparatory reading for a related paper.1

1. Forthcoming with the expected title: "A Derivation of the Unbiased Sample Standard Error of Estimate: the General Case." It will appear in ERIC.

Overview of Derivation

In this paper we will present a derivation of the linear multiple correlation formula for raw scores. The basic objective is to derive this formula for one raw score criterion (dependent variable) and any finite number of raw score predictors (independent variables).

Let us first state the formula we will derive and introduce the notation used. The linear multiple correlation between one criterion and p predictors can be expressed as:

$$R_{Y.x_1,x_2,\dots,x_j,\dots,x_p} = \frac{\sqrt{b_1 r_{y1} S_y S_1 + b_2 r_{y2} S_y S_2 + \dots + b_j r_{yj} S_y S_j + \dots + b_p r_{yp} S_y S_p}}{S_y}$$

Writing the right hand side in summation notation:

$$R_{Y.x_1,x_2,\dots,x_j,\dots,x_p} = \frac{\sqrt{\sum_{j=1}^{p} b_j r_{yj} S_y S_j}}{S_y}$$

where:

R_{Y.x_1,...,x_j,...,x_p} = the multiple correlation of raw scores,
Y = the observed raw score criterion to be predicted,
x_1, x_2, ..., x_j, ..., x_p = raw score predictors of the criterion,
b_1, b_2, ..., b_j, ..., b_p = slope coefficients or regression weights,
r_{y1}, ..., r_{yj}, ..., r_{yp} = product moment criterion-predictor correlations,
S_1, ..., S_j, ..., S_p = standard deviations of the predictors,
S_y = the standard deviation of the criterion.

This is the formula that is derived in this paper. We will first present a derivation for the simplest multivariate case: one criterion and two predictors. A derivation is then presented for three predictors. The latter derivation is a useful exercise because it allows a review of the logic and procedures used in the derivation. In addition, it will motivate the use of summation notation when the algebra becomes complex. The derivation is then presented for the general case of p (finite) predictors. An integral part of this paper is Appendix A. In that appendix, a method is presented for finding the "normal equations" in regression analysis for raw score linear models.

Prior to starting the derivation for two predictors, let us outline the plan which will be followed in the derivations. The steps we will use are:

1. state the regression model
2. derive the normal equations (see Appendix A)
3. define the multiple correlation
4. apply rules of covariance and variance algebra to simplify the definitional form of the multiple correlation formula
5. substitute the normal equations into the multiple correlation formula
6. simplify.

We will refine these steps to suit a particular application.
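Before working through the algebra, it may help to see the target formula confirmed numerically. The following is a minimal sketch of my own (not part of the original derivation), assuming Python with numpy; the data and variable names are invented for illustration. It fits a two predictor model by least squares and checks that R computed from the formula above equals the simple correlation between Y and the predicted criterion.

    import numpy as np

    rng = np.random.default_rng(1)
    n, p = 200, 2
    X = rng.normal(size=(n, p))                      # gross raw score predictors
    Y = 3.0 + 1.5 * X[:, 0] - 0.8 * X[:, 1] + rng.normal(size=n)

    x = X - X.mean(axis=0)                           # deviation score predictors
    coef, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), x]), Y, rcond=None)
    a, b = coef[0], coef[1:]                         # slope intercept and slope coefficients

    Sy = Y.std(ddof=1)
    S = x.std(axis=0, ddof=1)
    r_y = np.array([np.corrcoef(x[:, j], Y)[0, 1] for j in range(p)])

    R_formula = np.sqrt(np.sum(b * r_y * Sy * S)) / Sy
    R_direct = np.corrcoef(Y, a + x @ b)[0, 1]       # corr(Y, Y-hat)
    assert np.isclose(R_formula, R_direct)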

Brief Overview of Regression Analysis and Derivation for Two Predictors

In this section we will review the basic concepts, logic and our notation for regression analysis. Introductory applied statistics textbooks can be consulted for more detailed information on regression analysis theory (see, for example, Lindeman et al., 1982). The intention in this section is to review the rationale of regression analysis.

The primary use of statistical regression analysis is controlled prediction and explanation of quantitative data. The basic principle that lies behind regression analysis involves selecting a general mathematical function that best matches the underlying form of the variables over which one desires to exercise predictability. Assume one is attempting to predict one raw score criterion by use of two raw score predictors. Assume further that the relationship between each predictor and the criterion is linear in form. The mathematical function most often selected to obtain the best linear "fit" for these conditions is provided by the following equation:

$$\hat Y = a + b_1 x_1 + b_2 x_2$$

where:

Ŷ = the predicted (not actual or observed) criterion,
a, b_1, b_2 = constants to be selected by the "least squares" procedure; a = the slope intercept, and b_1 and b_2 = slope coefficient terms,
x_1, x_2 = predictor variables in deviation score form.

It is conventional to express the predictor variables in deviation score form. That is, for each predictor, first find its mean and then subtract the mean from each score. For example, x_1 = X_1 - X̄_1. Here, for either variable, "cap X" is the actual (or gross) raw score and X̄ is its arithmetic mean. It is not necessary for any mathematical reason to re-express the predictors in deviation score form. This is done simply to force the algebra to be more tractable. As such, it is a matter of convenience. Note that we do not re-express either type of criterion, Y (or Ŷ), as deviations. We could, but we have chosen not to do this since most authors follow this convention.

Using deviation scores for the predictors, we can now write the two predictor raw score model as follows:

$$\hat Y = a + b_1(X_1 - \bar X_1) + b_2(X_2 - \bar X_2) = a + b_1 x_1 + b_2 x_2$$

As stated, we will use the second form in this paper.


The regression model stated above is an idealized mathematical model. If a variable set consisting of one criterion and two predictors can be assumed to be linear, then the model is a reasonable one to apply for prediction of actual or observed criterion scores. It is idealized in the sense that it assumes no error is made in the prediction of Y. In practice, when an actual criterion score is compared to the criterion score generated by the model, some error is likely to occur--the "fit" is less than perfect.¹ If we call the actual sample raw score criterion Y, we can state another model (an observed raw score model):

$$Y = \hat Y + e$$

where:

e = the amount of numerical error resulting from using the idealized mathematical model (Ŷ) to predict the actual criterion score (Y).

That is, an actual criterion consists of a predicted quantity plus an error component.

The error made in predicting the observed criterion score by the idealized mathematical model is:

$$e = Y - \hat Y$$

This is the quantity we want to be as small as possible in order to minimize the error in prediction. It can be seen that, if e = 0, the actual criterion is perfectly predicted by the idealized model (Y = Ŷ).

1. Readers of the 1982c paper may wonder why on page 2 thereof the raw score regression model was stated in terms of gross raw score (and not deviation score) predictors. As stated, it is not necessary mathematically to re-express. In any case, the major result we are seeking in this paper is unaffected by the initial form of the predictors. The derivation could be made without the translation of predictors into deviation score form, but the result would involve unnecessary and unwanted complexities. Practically speaking, this paper would have been very much longer if re-expression was not done.


The technique most often used in the social sciences to accomplish this goal is the "least squares" procedure. Essentially, this procedure seeks to maximize predictability by minimizing prediction error. The least squares criterion or goal is summarized in the following expression:¹

$$\sum_{i=1}^{n} (Y - \hat Y)^2 = \sum_{i=1}^{n} e^2 = \text{a minimum}$$

If we substitute the quantity for Ŷ previously defined, we can rewrite the least squares criterion as:

$$\sum (Y - a - b_1 x_1 - b_2 x_2)^2 = \sum e^2 = \text{a minimum}$$

(As an aside, "least squares" means we determine values for a, b_1 and b_2 in Ŷ such that the squared error term results in the least possible value.)

1. If it is understood that the summation limits range from the first observation (i=1) to the last (i=n), then we can drop the summation limits; n refers to the total number of observations for the criterion and predictors. This sample size is the same regardless of the number of predictors in the regression model. Later in the paper, when the algebra becomes more complex, we use summation limits extensively.

Normal Equations

Having stated the multiple regression model for two predictors, we now derive the so-called "normal equations". A discussion of the procedures and results we will need is presented in Appendix A. The reader may wish to read Appendix A at this point (or take the next step on faith).

The normal equations are derived from the least squares criterion using calculus. The basic idea that lies behind the technique for two predictors is to generate an equation for each of the constants in the regression model (a, b_1 and b_2). For the two predictor model, the normal equations for a, b_1 and b_2, respectively, are found to be:

$$\sum Y = na + b_1 \sum x_1 + b_2 \sum x_2$$
$$\sum x_1 Y = a \sum x_1 + b_1 \sum x_1^2 + b_2 \sum x_1 x_2$$
$$\sum x_2 Y = a \sum x_2 + b_1 \sum x_1 x_2 + b_2 \sum x_2^2$$

In the first normal equation (for a), n is the sample size.

These normal equations can be simplified by substituting various descriptive statistics into terms of the equations. Other terms will cancel in the process. For the reader's convenience in following the substitutions, some basic formulas for sample descriptive statistics are presented in Table 1.

Table 1
Descriptive Sample Statistics

Statistic             Raw Score Form                                    Deviation Score Form

Mean                  X̄_1 = ΣX_1 / n                                    same
Variance              S_1² = Σ(X_1 - X̄_1)² / (n-1)                      S_1² = Σx_1² / (n-1)
Standard Deviation    S_1 = √[Σ(X_1 - X̄_1)² / (n-1)]                    S_1 = √[Σx_1² / (n-1)]
Correlation of        r_{y1} = Σ(X_1 - X̄_1)(Y - Ȳ) / [(n-1)S_1S_y]      r_{y1} = Σx_1y / [(n-1)S_1S_y]
Y and x_1                                                               (where y = Y - Ȳ)

Note: For "mean" it is understood that the summation extends across all n values of X_1 (and Y for "correlation"). This applies equally to the other statistics defined in the table.
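As a quick check on Table 1, the following sketch (mine; numpy assumed, the five data values invented for illustration) computes each statistic from deviation scores and compares the correlation against numpy's built-in routine.

    import numpy as np

    X1 = np.array([2.0, 4.0, 8.0, 10.0, 11.0])       # illustrative raw scores
    Y  = np.array([1.0, 5.0, 6.0, 9.0, 14.0])
    n  = len(X1)

    x1, y = X1 - X1.mean(), Y - Y.mean()             # deviation score form
    S1 = np.sqrt(np.sum(x1 ** 2) / (n - 1))          # standard deviations
    Sy = np.sqrt(np.sum(y ** 2) / (n - 1))
    r_y1 = np.sum(x1 * y) / ((n - 1) * S1 * Sy)      # Table 1 correlation formula

    assert np.isclose(r_y1, np.corrcoef(X1, Y)[0, 1])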

In the first normal equation, we recognize that, on the right hand side:

$$\sum x_1 = \sum (X_1 - \bar X_1) = 0 \qquad \text{and} \qquad \sum x_2 = \sum (X_2 - \bar X_2) = 0$$

In the second normal equation, we can see that Σx_1² = Σ(X_1 - X̄_1)², but the sample variance is S_1² = Σx_1²/(n-1), so that (n-1)S_1² may be substituted for Σx_1². As for Σx_1x_2, we can use the definition of the sample correlation between x_1 and x_2 to simplify this term. By definition, for samples:

$$r_{12} = \frac{\sum (X_1 - \bar X_1)(X_2 - \bar X_2)}{(n-1)S_1 S_2} = \frac{\sum x_1 x_2}{(n-1)S_1 S_2} \qquad \text{or} \qquad \sum x_1 x_2 = (n-1) r_{12} S_1 S_2$$

This may be substituted. Finally, Σx_1Y may be simplified as follows. Now, Σx_1Y = Σ(X_1 - X̄_1)Y is identical to Σ(X_1 - X̄_1)(Y - Ȳ) = Σx_1y (where y = Y - Ȳ).¹ This is recognized to be the numerator of the correlation between x_1 and Y (r_{y1} or r_{1y}). Hence,

$$r_{y1} = \frac{\sum x_1 Y}{(n-1) S_1 S_y} \qquad \text{or} \qquad \sum x_1 Y = (n-1) r_{y1} S_y S_1$$

This may be substituted into the second normal equation.

1. PROOF:

$$\sum (X_1 - \bar X_1)(Y - \bar Y) = \sum (X_1 Y - \bar X_1 Y - X_1 \bar Y + \bar X_1 \bar Y)$$
$$= \sum X_1 Y - \bar X_1 \sum Y - \bar Y \sum X_1 + n \bar X_1 \bar Y = \sum X_1 Y - \bar X_1 (n \bar Y) - \bar Y (n \bar X_1) + n \bar X_1 \bar Y = \sum X_1 Y - n \bar X_1 \bar Y$$

Also,

$$\sum (X_1 - \bar X_1) Y = \sum X_1 Y - \bar X_1 \sum Y = \sum X_1 Y - n \bar X_1 \bar Y$$

Therefore,

$$\sum (X_1 - \bar X_1) Y = \sum (X_1 - \bar X_1)(Y - \bar Y)$$

End of proof.
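The identity just proved is also easy to confirm numerically; a small sketch (numpy assumed, data invented):

    import numpy as np

    X1 = np.array([2.0, 4.0, 8.0, 10.0, 11.0])
    Y  = np.array([1.0, 5.0, 6.0, 9.0, 14.0])

    x1 = np.asarray(X1) - X1.mean()
    # sum of x1*Y equals sum of x1*(Y - Ybar): centering Y changes nothing
    assert np.isclose(np.sum(x1 * Y), np.sum(x1 * (Y - Y.mean())))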

For the third equation, we can write down immediately the following simplifications:

$$\sum x_2 Y = (n-1) r_{y2} S_y S_2$$

In addition,

$$\sum x_2^2 = (n-1) S_2^2 \qquad \text{and} \qquad \sum x_1 x_2 = (n-1) r_{12} S_1 S_2$$

Making all these substitutions, we arrive at a simplified set of the originally stated normal equations:

$$\sum Y = na + b_1(0) + b_2(0)$$
$$(n-1) r_{y1} S_y S_1 = a(0) + b_1 (n-1) S_1^2 + b_2 (n-1) r_{12} S_1 S_2$$
$$(n-1) r_{y2} S_y S_2 = a(0) + b_1 (n-1) r_{12} S_1 S_2 + b_2 (n-1) S_2^2$$

To further simplify, eliminate zero terms and, for the last two normal equations, divide each term by (n-1). This gives us:

$$\sum Y = na$$
$$r_{y1} S_y S_1 = b_1 S_1^2 + b_2 r_{12} S_1 S_2$$
$$r_{y2} S_y S_2 = b_1 r_{12} S_1 S_2 + b_2 S_2^2$$

As a final simplification, we can divide through the first equation by n:

$$a = \bar Y$$

These are the normal equations we want to work with in the derivation for two predictors. For the reader's convenience in working through the derivation, we will restate them prior to the derivation.
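Note, in passing, that the two simplified normal equations are linear in b_1 and b_2, so in practice they can be solved as a 2×2 system once the sample statistics are in hand. A minimal sketch (mine; numpy assumed, the statistic values invented):

    import numpy as np

    S1, S2, Sy = 2.0, 3.0, 4.0                       # illustrative sample statistics
    r12, ry1, ry2 = 0.30, 0.60, 0.50

    A = np.array([[S1 * S1,        r12 * S1 * S2],
                  [r12 * S1 * S2,  S2 * S2      ]])
    rhs = np.array([ry1 * Sy * S1,
                    ry2 * Sy * S2])
    b1, b2 = np.linalg.solve(A, rhs)                 # slope coefficients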

Multiple Correlation

We are now ready to define the multiple correlation for one criterion and two predictors. By definition:¹

$$R_{Y.x_1,x_2} = \mathrm{corr}(Y, \hat Y) = \mathrm{corr}(Y,\ a + b_1 x_1 + b_2 x_2) = \frac{\mathrm{cov}(Y, \hat Y)}{\sqrt{\mathrm{var}(Y)\,\mathrm{var}(\hat Y)}} = \frac{\mathrm{cov}(Y,\ a + b_1 x_1 + b_2 x_2)}{\sqrt{\mathrm{var}(Y)\,\mathrm{var}(a + b_1 x_1 + b_2 x_2)}}$$

where:

corr means correlation,
cov means covariance, and
var means variance.

1. Alternative notation systems use R_{y.x_1x_2} or R_{y.12}, among others.

It is important to remember that a, b_1 and b_2 function as constants. Elementary covariance and variance operations performed on the above correlation formula yield, in the first step:

$$R_{Y.x_1,x_2} = \frac{\mathrm{cov}(Y,a) + \mathrm{cov}(Y, b_1 x_1) + \mathrm{cov}(Y, b_2 x_2)}{\sqrt{\mathrm{var}(Y)}\ \sqrt{\mathrm{var}(a) + \mathrm{var}(b_1 x_1) + \mathrm{var}(b_2 x_2) + 2\,\mathrm{cov}(a, b_1 x_1) + 2\,\mathrm{cov}(a, b_2 x_2) + 2\,\mathrm{cov}(b_1 x_1, b_2 x_2)}}$$

Applying rules of covariance and variance for variables and constants, we can achieve further simplification. This is done below.

To briefly review: the variance of any constant is zero; the variance of a product term containing a constant yields the squared constant times the variance of the variable--for example,

$$\mathrm{var}(b_1 x_1) = b_1^2\,\mathrm{var}(x_1)$$

When a covariance term contains constants, factor the constants outside the covariance operator (sometimes this reduces the covariance to zero)--for example,

$$\mathrm{cov}(a, b_1 x_1) = a b_1\,\mathrm{cov}(1, x_1) = 0$$

but

$$\mathrm{cov}(b_1 x_1, b_2 x_2) = b_1 b_2\,\mathrm{cov}(x_1, x_2)$$

By definition, the covariance is related to the simple correlation--for example,

$$\mathrm{cov}(x_1, x_2) = r_{12} S_1 S_2$$

This should appear correct since, by definition,

$$r_{12} = \frac{\mathrm{cov}(x_1, x_2)}{\sqrt{\mathrm{var}(x_1)\,\mathrm{var}(x_2)}}$$

Applying these rules:

$$R_{Y.x_1,x_2} = \frac{0 + b_1\,\mathrm{cov}(Y, x_1) + b_2\,\mathrm{cov}(Y, x_2)}{\sqrt{\mathrm{var}(Y)}\ \sqrt{0 + b_1^2\,\mathrm{var}(x_1) + b_2^2\,\mathrm{var}(x_2) + 0 + 0 + 2 b_1 b_2\,\mathrm{cov}(x_1, x_2)}}$$

As mentioned, by definition:

cov(Y, x_1) = r_{y1} S_y S_1
cov(Y, x_2) = r_{y2} S_y S_2
cov(x_1, x_2) = r_{12} S_1 S_2

One further observation should be made with respect to the variance of the predictors. For example, the variance of x_1 is:

$$\mathrm{var}(x_1) = \mathrm{var}(X_1 - \bar X_1)$$

By definition, the variance of this difference is:

$$\mathrm{var}(X_1) + \mathrm{var}(\bar X_1) - 2\,\mathrm{cov}(X_1, \bar X_1)$$

Since X̄_1 is a constant,

$$\mathrm{var}(x_1) = \mathrm{var}(X_1) + 0 - 0 = S_1^2$$

Similar results obtain for var(x_2). Therefore, when all substitutions are made:

$$R_{Y.x_1,x_2} = \frac{b_1 r_{y1} S_y S_1 + b_2 r_{y2} S_y S_2}{S_y \sqrt{b_1^2 S_1^2 + b_2^2 S_2^2 + 2 b_1 b_2 r_{12} S_1 S_2}}$$

This is the form of the multiple R we will use in the derivation. It will be restated for the reader's convenience.

Derivation

The following formula for one criterion and two predictors appears in many applied statistics textbooks:

$$R_{Y.x_1,x_2} = \frac{\sqrt{b_1 r_{y1} S_y S_1 + b_2 r_{y2} S_y S_2}}{S_y}$$

We are now able to show its derivation. For the reader's convenience, a restatement of the simplified set of normal equations and the multiple R formula is given in Table 2. The derivation involves two steps: a) substitute the normal equations into the numerator of the multiple R formula and b) simplify algebraically. See the text following Table 2.

Table 2
Normal Equations and Multiple Correlation Formula for Two Raw Score Predictors

Normal Equations

$$r_{y1} S_y S_1 = b_1 S_1^2 + b_2 r_{12} S_1 S_2$$
$$r_{y2} S_y S_2 = b_1 r_{12} S_1 S_2 + b_2 S_2^2$$

Multiple Correlation

$$R_{Y.x_1,x_2} = \frac{b_1 r_{y1} S_y S_1 + b_2 r_{y2} S_y S_2}{S_y \sqrt{b_1^2 S_1^2 + b_2^2 S_2^2 + 2 b_1 b_2 r_{12} S_1 S_2}}$$

NOTE: The term a = Ȳ is omitted because it plays no role in the derivation (other than zero). Proof involves the substitution of the normal equations into the numerator of the multiple R formula and simplifying. See text for details.

Notice that the numerator of the multiple R formula contains the terms r_{y1}S_yS_1 and r_{y2}S_yS_2. These terms are functionally related to the normal equations. If we substitute the normal equation for each term into R and rearrange terms, we obtain the following results:

$$R_{Y.x_1,x_2} = \frac{b_1\left(b_1 S_1^2 + b_2 r_{12} S_1 S_2\right) + b_2\left(b_1 r_{12} S_1 S_2 + b_2 S_2^2\right)}{S_y \sqrt{b_1^2 S_1^2 + b_2^2 S_2^2 + 2 b_1 b_2 r_{12} S_1 S_2}} = \frac{b_1^2 S_1^2 + b_2^2 S_2^2 + 2 b_1 b_2 r_{12} S_1 S_2}{S_y \sqrt{b_1^2 S_1^2 + b_2^2 S_2^2 + 2 b_1 b_2 r_{12} S_1 S_2}}$$

Hence, the numerator and the bracketed term of the denominator are identical. Now, the bracketed term of the denominator can be simplified algebraically if we remember radicals and laws of exponents.¹ Simplifying:

$$R_{Y.x_1,x_2} = \frac{\sqrt{b_1^2 S_1^2 + b_2^2 S_2^2 + 2 b_1 b_2 r_{12} S_1 S_2}}{S_y}$$

Therefore, recalling that this quantity equals the original numerator b_1 r_{y1}S_yS_1 + b_2 r_{y2}S_yS_2,

$$R_{Y.x_1,x_2} = \frac{\sqrt{b_1 r_{y1} S_y S_1 + b_2 r_{y2} S_y S_2}}{S_y}$$

END OF PROOF

1. Let the denominator (inside the brackets) be called A. Thus, the structure of the multiple R is R = A/(S_y √A). Recall the following permissible operation (rationalizing the denominator): A/√A = √A.

For readers familiar with the 1982c paper, it is possible to obtain a "cheap" proof in the analogous standard score regression model: if variables are in standard score form, then the standard deviations become unity. Thus, in the notation of the 1982c paper,

$$R_{Z_y.Z_1,Z_2} = \sqrt{B_1 r_{y1} + B_2 r_{y2}}$$

Derivation for Three Predictors

Let us now work out the derivation for a three predictor raw score linear regression model. This will allow us to review the logic and procedures of the derivation. We will also introduce the use of summation, which becomes necessary for the general case of p predictors.

The first step is to state the regression model. For three predictors:

$$\hat Y = a + b_1 x_1 + b_2 x_2 + b_3 x_3$$

We have simply added an independent variable to our prediction (idealized) mathematical model to form a four dimensional model (Ŷ and three predictors with their associated slope terms). As in the two predictor model, we make use of the least squares criterion to establish our goal of minimizing the prediction error:

$$\sum (Y - \hat Y)^2 = \sum (Y - a - b_1 x_1 - b_2 x_2 - b_3 x_3)^2 = \sum e^2 = \text{a minimum}$$

The next step is the application of partial differentiation to find derivatives for each of the terms in the prediction model (a, b_1, b_2 and b_3). This procedure produces the set of normal equations; Appendix A shows the procedures involved. Omitting the cumbersome algebra involved in simplifying the original set of normal equations, we can state the final and simplified set of normal equations as follows:¹

$$r_{y1} S_y S_1 = b_1 S_1^2 + b_2 r_{12} S_1 S_2 + b_3 r_{13} S_1 S_3$$
$$r_{y2} S_y S_2 = b_1 r_{12} S_1 S_2 + b_2 S_2^2 + b_3 r_{23} S_2 S_3$$
$$r_{y3} S_y S_3 = b_1 r_{13} S_1 S_3 + b_2 r_{23} S_2 S_3 + b_3 S_3^2$$

1. Recall that the value of a (= Ȳ) is determined in practice, but it plays no role in the derivation since it "drops out" in the covariance and variance operations of the multiple R derivation.

The above normal equations are the ones we will make use of in the derivation of the multiple R formula for three predictors. A restatement of them is presented in Table 3 for easy reference.

The third step is to define the multiple correlation of one criterion and three raw score predictors. Rules of covariance and variance algebra will allow us to simplify the definitional form of R. The term a is omitted.¹ The multiple R is defined as follows:

$$R_{Y.x_1,x_2,x_3} = \mathrm{corr}(Y, \hat Y) = \mathrm{corr}(Y,\ b_1 x_1 + b_2 x_2 + b_3 x_3) = \frac{\mathrm{cov}(Y, \hat Y)}{\sqrt{\mathrm{var}(Y)\,\mathrm{var}(\hat Y)}}$$

All of the above forms state equivalent ways to define the multiple R. The last is amenable to operations of covariance and variance. Applying rules of covariance and variance algebra:

$$R_{Y.x_1,x_2,x_3} = \frac{\mathrm{cov}(Y, b_1 x_1) + \mathrm{cov}(Y, b_2 x_2) + \mathrm{cov}(Y, b_3 x_3)}{\sqrt{\mathrm{var}(Y)}\ \sqrt{\mathrm{var}(b_1 x_1) + \mathrm{var}(b_2 x_2) + \mathrm{var}(b_3 x_3) + 2\,\mathrm{cov}(b_1 x_1, b_2 x_2) + 2\,\mathrm{cov}(b_1 x_1, b_3 x_3) + 2\,\mathrm{cov}(b_2 x_2, b_3 x_3)}}$$

$$= \frac{b_1 r_{y1} S_y S_1 + b_2 r_{y2} S_y S_2 + b_3 r_{y3} S_y S_3}{S_y \sqrt{b_1^2 S_1^2 + b_2^2 S_2^2 + b_3^2 S_3^2 + 2 b_1 b_2 r_{12} S_1 S_2 + 2 b_1 b_3 r_{13} S_1 S_3 + 2 b_2 b_3 r_{23} S_2 S_3}}$$

This is as far as we can simplify the multiple R at this point. We will retain this for easy reference; see Table 3.

1. For justification, the reader may want to include it in the definition of R and ascertain the result.

Table 3
Normal Equations and Multiple Correlation Formula for Three Raw Score Predictors

Normal Equations

$$r_{y1} S_y S_1 = b_1 S_1^2 + b_2 r_{12} S_1 S_2 + b_3 r_{13} S_1 S_3$$
$$r_{y2} S_y S_2 = b_1 r_{12} S_1 S_2 + b_2 S_2^2 + b_3 r_{23} S_2 S_3$$
$$r_{y3} S_y S_3 = b_1 r_{13} S_1 S_3 + b_2 r_{23} S_2 S_3 + b_3 S_3^2$$

Multiple Correlation

$$R_{Y.x_1,x_2,x_3} = \frac{b_1 r_{y1} S_y S_1 + b_2 r_{y2} S_y S_2 + b_3 r_{y3} S_y S_3}{S_y \sqrt{b_1^2 S_1^2 + b_2^2 S_2^2 + b_3^2 S_3^2 + 2 b_1 b_2 r_{12} S_1 S_2 + 2 b_1 b_3 r_{13} S_1 S_3 + 2 b_2 b_3 r_{23} S_2 S_3}}$$

NOTE: Again, we note that the term a (= Ȳ) is omitted from the normal equations and the multiple R. Derivation involves substituting the normal equations into the multiple R and simplifying. See the text for details.

We have stated the multiple regression model and least squares criterion, and presented the normal equations and the multiple R formula. The fourth step is to substitute the normal equations into the multiple R. If we substitute each of the normal equations for the appropriate term of the numerator of R, we obtain (see Table 3):

$$\mathrm{cov}(Y, \hat Y) = b_1 r_{y1} S_y S_1 + b_2 r_{y2} S_y S_2 + b_3 r_{y3} S_y S_3$$
$$= b_1\left(b_1 S_1^2 + b_2 r_{12} S_1 S_2 + b_3 r_{13} S_1 S_3\right) + b_2\left(b_1 r_{12} S_1 S_2 + b_2 S_2^2 + b_3 r_{23} S_2 S_3\right) + b_3\left(b_1 r_{13} S_1 S_3 + b_2 r_{23} S_2 S_3 + b_3 S_3^2\right)$$

Now let us write each parenthesized term on a separate line to form a covariance matrix:

$$\mathrm{cov}(Y, \hat Y) = b_1^2 S_1^2 + b_1 b_2 r_{12} S_1 S_2 + b_1 b_3 r_{13} S_1 S_3$$
$$\qquad\quad +\ b_1 b_2 r_{12} S_1 S_2 + b_2^2 S_2^2 + b_2 b_3 r_{23} S_2 S_3$$
$$\qquad\quad +\ b_1 b_3 r_{13} S_1 S_3 + b_2 b_3 r_{23} S_2 S_3 + b_3^2 S_3^2$$

At this point we will introduce summation to simplify the algebra. Consider the three squared terms along the northwest to southeast diagonal of the covariance matrix. It is clear that we might express these terms in summation as follows:

$$b_1^2 S_1^2 + b_2^2 S_2^2 + b_3^2 S_3^2 = \sum_{j=1}^{3} b_j^2 S_j^2$$

The remaining six terms in the matrix consist of three pairs of quantities:

$$2 b_1 b_2 r_{12} S_1 S_2 + 2 b_1 b_3 r_{13} S_1 S_3 + 2 b_2 b_3 r_{23} S_2 S_3$$

One common way to express this in summation (one of several forms often seen in multivariate statistics textbooks) is as follows:

$$2 \sum_{j=2}^{3} \sum_{i=1}^{j-1} b_i b_j r_{ij} S_i S_j \qquad \text{or} \qquad 2 \sum_{i<j} b_i b_j r_{ij} S_i S_j$$

The total number of terms to be summed is determined by multiplying the upper limits (3 × 2 = 6). In the double summation operation, hold the inside operator at i = 1 and increment the outer operator (j = 2, 3), giving ij = 12 + 13. Now increment i to 2 and complete the limits of j (with the side condition that i < j; e.g., ij = 22 is not permitted). The subscripts that result from all of the summation operations are: 12 + 13 + 23. Each value, of course, is taken twice.
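The subscript bookkeeping of the double summation can be mimicked directly in code; a tiny sketch (Python assumed):

    pairs = [(i, j) for j in range(2, 4) for i in range(1, j)]   # j = 2,3; i < j
    print(pairs)                                                 # [(1, 2), (1, 3), (2, 3)]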

Thus, the nine covariance terms of the multiple R numerator can be written in all of the following ways:

$$\mathrm{cov}(Y, \hat Y) = b_1^2 S_1^2 + b_2^2 S_2^2 + b_3^2 S_3^2 + 2 b_1 b_2 r_{12} S_1 S_2 + 2 b_1 b_3 r_{13} S_1 S_3 + 2 b_2 b_3 r_{23} S_2 S_3$$

$$= \sum_{j=1}^{3} b_j^2 S_j^2 + 2 \sum_{j=2}^{3} \sum_{i=1}^{j-1} b_i b_j r_{ij} S_i S_j$$

$$= b_1 r_{y1} S_y S_1 + b_2 r_{y2} S_y S_2 + b_3 r_{y3} S_y S_3 = \sum_{j=1}^{3} b_j r_{yj} S_y S_j$$

This last equation is simply a restatement of the multiple R numerator from Table 3. The second equation was just derived from the first equation.

Turning to the denominator of the multiple R in Table 3, it is readily apparent that it is similar to the covariance term above. That is,

$$\mathrm{var}(\hat Y) = b_1^2 S_1^2 + b_2^2 S_2^2 + b_3^2 S_3^2 + 2 b_1 b_2 r_{12} S_1 S_2 + 2 b_1 b_3 r_{13} S_1 S_3 + 2 b_2 b_3 r_{23} S_2 S_3 = \sum_{j=1}^{3} b_j^2 S_j^2 + 2 \sum_{j=2}^{3} \sum_{i=1}^{j-1} b_i b_j r_{ij} S_i S_j$$

If we now form the ratio of covariance and variance terms for the multiple R, we can complete the derivation for three predictors:

$$R_{Y.x_1,x_2,x_3} = \frac{\displaystyle\sum_{j=1}^{3} b_j^2 S_j^2 + 2 \sum_{j=2}^{3} \sum_{i=1}^{j-1} b_i b_j r_{ij} S_i S_j}{S_y \sqrt{\displaystyle\sum_{j=1}^{3} b_j^2 S_j^2 + 2 \sum_{j=2}^{3} \sum_{i=1}^{j-1} b_i b_j r_{ij} S_i S_j}}$$

Notice that the numerator and denominator (under the radical) are identical in form. If we make the same algebraic simplification we made for the two predictor derivation, we obtain:

$$R_{Y.x_1,x_2,x_3} = \frac{\sqrt{\sum_{j=1}^{3} b_j r_{yj} S_y S_j}}{S_y}$$

This completes the derivation for three predictors. END OF PROOF

We now derive the multiple R for any possible (finite) number of predictors in the linear regression model.

Derivation for p Predictors

The derivation of the multiple correlation formula for any number of predictors will be presented as a generalization of the two and three predictor cases. A rigorous mathematical proof that the generalization holds for p predictors could be provided by "mathematical induction". Our approach in this section is a straightforward multivariate generalization.

For reference, the following is a listing of the general steps for the p predictor variable case:

1. state the regression model for p predictors
2. derive the normal equations (see Appendix A)
3. define the multiple R
4. substitute normal equations into numerator of R
5. express the covariance term in summation
6. express the variance term in summation
7. simplify.

The linear regression model is:

$$\hat Y = a + b_1 x_1 + b_2 x_2 + \dots + b_j x_j + \dots + b_p x_p$$

The least squares criterion is:

$$\sum (Y - \hat Y)^2 = \sum e^2 = \text{a minimum}$$

Substituting for Ŷ:

$$\sum (Y - a - b_1 x_1 - b_2 x_2 - \dots - b_j x_j - \dots - b_p x_p)^2 = \sum e^2 = \text{a minimum}$$

Next we derive the normal equations.¹ In unsimplified form we have:

$$\sum Y = na + b_1 \sum x_1 + b_2 \sum x_2 + \dots + b_j \sum x_j + \dots + b_p \sum x_p$$
$$\sum x_1 Y = a \sum x_1 + b_1 \sum x_1^2 + b_2 \sum x_1 x_2 + b_3 \sum x_1 x_3 + \dots + b_j \sum x_1 x_j + \dots + b_p \sum x_1 x_p$$
$$\sum x_2 Y = a \sum x_2 + b_1 \sum x_1 x_2 + b_2 \sum x_2^2 + b_3 \sum x_2 x_3 + \dots + b_j \sum x_2 x_j + \dots + b_p \sum x_2 x_p$$
$$\sum x_3 Y = a \sum x_3 + b_1 \sum x_1 x_3 + b_2 \sum x_2 x_3 + b_3 \sum x_3^2 + \dots + b_j \sum x_3 x_j + \dots + b_p \sum x_3 x_p$$
$$\vdots$$
$$\sum x_p Y = a \sum x_p + b_1 \sum x_1 x_p + b_2 \sum x_2 x_p + b_3 \sum x_3 x_p + \dots + b_j \sum x_j x_p + \dots + b_p \sum x_p^2$$

1. Note that the normal equations for terms such as Σx₁x₂ are written such that the first subscript is always less than the second one. Since these products are symmetric (Σx₁x₂ = Σx₂x₁, etc.), this method simplifies the algebra. See Appendix A for more detail.

If we apply the same logic and make the same substitutions we made for 2 and 3 predictors, we obtain a simplified set of normal equations:

$$r_{y1} S_y S_1 = b_1 S_1^2 + b_2 r_{12} S_1 S_2 + b_3 r_{13} S_1 S_3 + \dots + b_j r_{1j} S_1 S_j + \dots + b_p r_{1p} S_1 S_p$$
$$r_{y2} S_y S_2 = b_1 r_{12} S_1 S_2 + b_2 S_2^2 + b_3 r_{23} S_2 S_3 + \dots + b_j r_{2j} S_2 S_j + \dots + b_p r_{2p} S_2 S_p$$
$$r_{y3} S_y S_3 = b_1 r_{13} S_1 S_3 + b_2 r_{23} S_2 S_3 + b_3 S_3^2 + \dots + b_j r_{3j} S_3 S_j + \dots + b_p r_{3p} S_3 S_p$$
$$\vdots$$
$$r_{yp} S_y S_p = b_1 r_{1p} S_1 S_p + b_2 r_{2p} S_2 S_p + b_3 r_{3p} S_3 S_p + \dots + b_j r_{jp} S_j S_p + \dots + b_p S_p^2$$

A restatement of the normal equations is given in Table 4.

Multiple Correlation Formula for p Predictors and Derivation

We are now ready to derive the multiple correlation formula for p predictors. See Table 4 for a statement of the definition of the multiple R. The covariance term is:

$$\mathrm{cov}(Y, \hat Y) = \mathrm{cov}(Y,\ a + b_1 x_1 + b_2 x_2 + \dots + b_j x_j + \dots + b_p x_p) = b_1 r_{y1} S_y S_1 + b_2 r_{y2} S_y S_2 + b_3 r_{y3} S_y S_3 + \dots + b_j r_{yj} S_y S_j + \dots + b_p r_{yp} S_y S_p$$

Now, substitute the normal equations (line for line--see Table 4):

$$\mathrm{cov}(Y, \hat Y) = b_1\left(b_1 S_1^2 + b_2 r_{12} S_1 S_2 + \dots + b_p r_{1p} S_1 S_p\right) + b_2\left(b_1 r_{12} S_1 S_2 + b_2 S_2^2 + \dots + b_p r_{2p} S_2 S_p\right) + \dots + b_p\left(b_1 r_{1p} S_1 S_p + b_2 r_{2p} S_2 S_p + \dots + b_p S_p^2\right)$$

Multiply each of the b_j terms inside the parentheses and write each parenthesized sum on a separate line:

$$\mathrm{cov}(Y, \hat Y) = b_1^2 S_1^2 + b_1 b_2 r_{12} S_1 S_2 + b_1 b_3 r_{13} S_1 S_3 + \dots + b_1 b_p r_{1p} S_1 S_p$$
$$\qquad\quad +\ b_1 b_2 r_{12} S_1 S_2 + b_2^2 S_2^2 + b_2 b_3 r_{23} S_2 S_3 + \dots + b_2 b_p r_{2p} S_2 S_p$$
$$\qquad\quad +\ b_1 b_3 r_{13} S_1 S_3 + b_2 b_3 r_{23} S_2 S_3 + b_3^2 S_3^2 + \dots + b_3 b_p r_{3p} S_3 S_p$$
$$\qquad\quad \vdots$$
$$\qquad\quad +\ b_1 b_p r_{1p} S_1 S_p + b_2 b_p r_{2p} S_2 S_p + b_3 b_p r_{3p} S_3 S_p + \dots + b_p^2 S_p^2$$

For reasons presented earlier, the term a is omitted in the derivation.

Table 4
Normal Equations and Multiple Correlation Formula for p Raw Score Predictors

Normal Equations

$$r_{y1} S_y S_1 = b_1 S_1^2 + b_2 r_{12} S_1 S_2 + b_3 r_{13} S_1 S_3 + \dots + b_j r_{1j} S_1 S_j + \dots + b_p r_{1p} S_1 S_p$$
$$r_{y2} S_y S_2 = b_1 r_{12} S_1 S_2 + b_2 S_2^2 + b_3 r_{23} S_2 S_3 + \dots + b_j r_{2j} S_2 S_j + \dots + b_p r_{2p} S_2 S_p$$
$$r_{y3} S_y S_3 = b_1 r_{13} S_1 S_3 + b_2 r_{23} S_2 S_3 + b_3 S_3^2 + \dots + b_j r_{3j} S_3 S_j + \dots + b_p r_{3p} S_3 S_p$$
$$\vdots$$
$$r_{yp} S_y S_p = b_1 r_{1p} S_1 S_p + b_2 r_{2p} S_2 S_p + b_3 r_{3p} S_3 S_p + \dots + b_j r_{jp} S_j S_p + \dots + b_p S_p^2$$

Multiple Correlation

$$R_{Y.x_1,x_2,\dots,x_j,\dots,x_p} = \mathrm{corr}(Y, \hat Y) = \mathrm{corr}(Y,\ b_1 x_1 + b_2 x_2 + b_3 x_3 + \dots + b_j x_j + \dots + b_p x_p) = \frac{\mathrm{cov}(Y, \hat Y)}{\sqrt{\mathrm{var}(Y)\,\mathrm{var}(\hat Y)}}$$

$$= \frac{b_1 r_{y1} S_y S_1 + \dots + b_j r_{yj} S_y S_j + \dots + b_p r_{yp} S_y S_p}{S_y \sqrt{b_1^2 S_1^2 + \dots + b_j^2 S_j^2 + \dots + b_p^2 S_p^2 + 2 b_1 b_2 r_{12} S_1 S_2 + \dots + 2 b_i b_j r_{ij} S_i S_j + \dots + 2 b_{p-1} b_p r_{p-1,p} S_{p-1} S_p}}$$

NOTE: The a = Ȳ term is omitted from the normal equations and multiple R. Derivation consists of substituting each normal equation into the r_{yj} S_y S_j terms of the covariance term of the multiple R. See text for details.

To facilitate working with such a complex matrix, we will introduce summation at this point. As the first step, we count the total number of terms to be summed. An inspection of the covariance matrix above makes it evident that each row consists of p terms. Since there is a total of p such rows, the entire covariance matrix consists of p × p = p² terms. For example, in the derivation for three predictors, we worked with three rows, each of which contained three terms, for a total of 3 × 3 = 3² = 9 terms.

In the p predictor model, the covariance matrix consists of two kinds of terms: diagonal terms (b₁²S₁² to b_p²S_p²) and off diagonal terms. It is evident that there are p such diagonal terms. A little algebra will tell us how many off diagonal terms are in the covariance matrix. Let X represent the total number of off diagonal terms. Then:

TOTAL MATRIX = p² = p + X, or X = p² - p, so that X = p(p-1).

Thus, the entire covariance matrix consists of p diagonal terms and p(p-1) off diagonal terms, for a total of p² terms.

We can view the structure of the covariance matrix in another way. This view is the "trick" in understanding the expression of the matrix in summation notation. Notice that the off diagonal terms exhibit a pattern (as we saw in the two and three predictor cases). Each b_i b_j r_{ij} S_i S_j corresponds to one other term in the matrix that is identical to it. For example, the first off diagonal term in row one is b₁b₂r₁₂S₁S₂, and the first term in row two is identical to it. In general, any off diagonal term in row i, column j is identical to the term in row j, column i (e.g., row 2, column 5 = row 5, column 2). Thus, the off diagonal terms consist of a number of identical pairs; there are p(p-1)/2 such pairs of off diagonal terms.

Suppose we halve the total p² matrix of terms and consider the upper half only, which makes a right triangle. In this halved matrix, we are considering the p diagonal terms and p(p-1)/2 off diagonal terms. That is, the upper triangle consists of p + p(p-1)/2 terms. To represent the entire covariance matrix (p² terms), simply double the number of off diagonal terms in the half matrix:

p + 2[p(p-1)/2] = p + p(p-1) total terms.

(Examine the matrix of covariance terms for the three predictor case for further clarification.) As explained, the cov(Y,Ŷ) matrix consists of p × p = p² terms: there are p of the b_j²S_j² terms and p(p-1), or 2[p(p-1)/2], of the b_i b_j r_{ij} S_i S_j terms in the total matrix.
Expressing the total number of diagonal terms in summation notation:


9

2'9

2 2

P P

...+ b.S. +...+ b s

j=1 J

The off diagonal terms can be expressed in Summation notation as follows:


P

12

S.S )
r
JP J P

EbbrSS

2E

j=2

i=i

ij

1Fbr those reader_; familiar with combinatoric -; the following may assist
in clarifying the logic.
2
which are combined with all such terms
There is a t_ctal of p b.2 S
2

at a time. In combinatorial notation, this means that p


one at a time =that is:
bine-1
total number of

com-

l(p-1)(p-2)...1

11(p-1)!

For the off diagonal terms, we construct

terms are

p(p -1)(p -2);;;1

P!

C)

terms

1,4S?

b,2 S
J

terms (pairs

2-

of identical terms, each combined with all other

like terms two at a timP).

Thus:
4.m

Total number Of

bb,r:,S,S; term8
11

P(p=l)(P=2) (13-3)
2

P-

(: ;)

P(P-1)

2!(p-2)!

(p-2) (p-3)

Verne, tl-,e entire covariance matrix consists of:

4;

d9

P(P-1)

terms

34.

For example, in the three predictor model, the first off diagonal term was seen to be b₁b₂r₁₂S₁S₂ and the last was seen to be b₂b₃r₂₃S₂S₃. In the case of a 10 predictor model, the first and last terms, respectively, would be:

$$b_1 b_2 r_{12} S_1 S_2 \qquad \text{and} \qquad b_9 b_{10}\, r_{9,10}\, S_9 S_{10}$$

We can now express the full covariance matrix in summation notation as:

$$\mathrm{cov}(Y, \hat Y) = \sum_{j=1}^{p} b_j^2 S_j^2 + 2 \sum_{j=2}^{p} \sum_{i=1}^{j-1} b_i b_j r_{ij} S_i S_j$$

Equivalently,

$$\mathrm{cov}(Y, \hat Y) = b_1 r_{y1} S_y S_1 + b_2 r_{y2} S_y S_2 + \dots + b_j r_{yj} S_y S_j + \dots + b_p r_{yp} S_y S_p = \sum_{j=1}^{p} b_j r_{yj} S_y S_j$$

Thus,

$$\mathrm{cov}(Y, \hat Y) = \sum_{j=1}^{p} b_j^2 S_j^2 + 2 \sum_{j=2}^{p} \sum_{i=1}^{j-1} b_i b_j r_{ij} S_i S_j = \sum_{j=1}^{p} b_j r_{yj} S_y S_j$$

The latter equation is very important in the final steps.

If the variance terms of the multiple R are examined, we see that √var(Y) is simply S_y by definition. The term var(Ŷ) can be manipulated by covariance and variance rules to produce the following (see Table 4):

$$\mathrm{var}(\hat Y) = \mathrm{var}(b_1 x_1 + b_2 x_2 + \dots + b_j x_j + \dots + b_p x_p) = \sum_{j=1}^{p} b_j^2 S_j^2 + 2 \sum_{j=2}^{p} \sum_{i=1}^{j-1} b_i b_j r_{ij} S_i S_j$$

THEREFORE, AFTER MUCH LABOR, WE CAN STATE THE MULTIPLE CORRELATION FORMULA FOR p PREDICTORS:

$$R_{Y.x_1,x_2,\dots,x_j,\dots,x_p} = \frac{\displaystyle\sum_{j=1}^{p} b_j^2 S_j^2 + 2 \sum_{j=2}^{p} \sum_{i=1}^{j-1} b_i b_j r_{ij} S_i S_j}{S_y \sqrt{\displaystyle\sum_{j=1}^{p} b_j^2 S_j^2 + 2 \sum_{j=2}^{p} \sum_{i=1}^{j-1} b_i b_j r_{ij} S_i S_j}} = \frac{\sqrt{\displaystyle\sum_{j=1}^{p} b_j r_{yj} S_y S_j}}{S_y}$$

END OF PROOF FOR p PREDICTORS
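A numerical check of the general result may be helpful. The sketch below is mine (not part of the derivation), assuming Python with numpy and invented data: it builds the simplified normal equations from sample statistics, solves for the b_j, and verifies that √(Σ b_j r_yj S_y S_j)/S_y matches the correlation between Y and Ŷ.

    import numpy as np

    rng = np.random.default_rng(2)
    n, p = 300, 5
    X = rng.normal(size=(n, p))
    Y = X @ rng.normal(size=p) + rng.normal(size=n)

    x = X - X.mean(axis=0)                          # deviation score predictors
    S = x.std(axis=0, ddof=1)                       # S_1, ..., S_p
    Sy = Y.std(ddof=1)
    r = np.corrcoef(x, rowvar=False)                # r_ij, with r_jj = 1
    r_y = np.array([np.corrcoef(x[:, j], Y)[0, 1] for j in range(p)])

    # simplified normal equations: r_yj Sy Sj = sum over i of b_i r_ij S_i S_j
    b = np.linalg.solve(r * np.outer(S, S), r_y * Sy * S)

    R_formula = np.sqrt(np.sum(b * r_y * Sy * S)) / Sy
    R_direct = np.corrcoef(Y, x @ b)[0, 1]
    assert np.isclose(R_formula, R_direct)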

Appendix A
Normal Equations in Regression Analysis

Introduction

In this appendix, we outline a set of procedures to apply in regression analysis for finding normal equations. The procedures are appropriate when: a) the regression model is linear, and b) the measures are in raw score form.

If variables are transformed to a nonlinear form prior to regression analysis, the procedures described in this appendix would not apply. Examples of nonlinear transformations include logarithmic, exponential and square root re-expression, or, in general, whenever the exponents of the variables in the regression model are not equal to unity. For example:

$$\hat Y = a + b_1 x_1 + b_2 x_2^2$$

This is a nonlinear mathematical model since the exponent of x_2 is not equal to 1.

To derive normal equations for a given regression model requires knowledge of elementary differential calculus which makes use of partial differentiation. Students who are familiar with calculus may read any textbook of mathematical calculus for the details (for example, Hoel, Port and Stone, 1971). For students who need to review this procedure, or who know some calculus and want to learn the technique, see Goodman, 1977, for a good introduction.

To render a conceptual understanding of normal equations as they are employed in the least squares procedure, let us take an example of a two predictor model. The mathematical model applied to a distribution assumed linear in each predictor is the one given in the text, namely:

$$\hat Y = a + b_1 x_1 + b_2 x_2$$

The raw score model includes an error component, and the error made in prediction of the criterion (Y) with the above model may be negative, zero or positive. The raw score model is:

$$Y = \hat Y + e$$

Solving for e, we obtain:

$$e = Y - \hat Y$$

This represents the amount of numerical error made on a score-by-score basis when we predict Y with the idealized model, Ŷ. To obtain an overall indication of the amount of prediction error for the entire raw score distribution, we might be tempted to define:

$$\sum (Y - \hat Y) = \sum e \qquad \text{(over all n observations)}$$

The problem with this approach is that the resulting sum on the left side turns out to be exactly zero:¹

$$\sum (Y - \hat Y) = \sum e = 0$$

That is, positive errors cancel out negative errors, leaving zero as the overall sum. This is obviously problematical because no matter how good or bad a particular mathematical model (linear or nonlinear) is for empirical score prediction, we would have no way of determining its utility (using the sensible criterion of minimizing prediction error).

1. Proof. For two predictors:

$$\sum (Y - \hat Y) = \sum (Y - a - b_1 x_1 - b_2 x_2) = \sum Y - na - b_1 \sum x_1 - b_2 \sum x_2 = \sum Y - n\bar Y - 0 - 0 = 0$$

(recalling that a = Ȳ and that deviation scores sum to zero). The generalization of this for p predictors is obvious.
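The zero-sum property of least squares errors is easy to confirm; a sketch (mine; numpy assumed, data invented):

    import numpy as np

    rng = np.random.default_rng(3)
    n = 100
    x = rng.normal(size=(n, 2))
    x -= x.mean(axis=0)                             # deviation score predictors
    Y = 1.0 + x @ np.array([2.0, -1.0]) + rng.normal(size=n)

    design = np.column_stack([np.ones(n), x])
    coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
    e = Y - design @ coef
    print(e.sum())                                  # ~0, up to rounding error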

For these reasons, the most widely used and accepted procedure for finding normal equations is based on the least squares criterion:

$$\sum (Y - \hat Y)^2 = \sum (Y - a - b_1 x_1 - b_2 x_2)^2 = \sum e^2 = \text{a minimum}$$

(The summation ranges from i=1 to i=n over the entire set of observations.) In words, least squares states: find numerical values for a, b_1 and b_2 which will make the prediction error the smallest possible numerical amount upon substitution.

The reader is already aware of one type of least squares result from elementary statistics. A kind of least squares criterion (and procedure) is used in defining the sample variance of a distribution; i.e.,

$$S_y^2 = \frac{\sum (Y - \bar Y)^2}{n-1}$$

The arithmetic mean, Ȳ, is used in variance formulas (instead of medians or other numbers) because the resulting variance is the smallest possible value when the mean is used rather than any other number (or combination of numbers) in that given distribution. This is derived through the same calculus procedure used in deriving normal equations, and is based on the same principle: optimization or minimization.

Take an example. Let the distribution be Y = 2, 4, 8, 10, 11, so that Ȳ = 7. Find each of the squared sums

Σ(Y-2)², Σ(Y-4)², Σ(Y-8)², Σ(Y-10)², Σ(Y-11)²

and compare it against Σ(Y-Ȳ)² = Σ(Y-7)². (The n-1 can be ignored since it is a constant and has no material bearing on the result.) It will be seen that only Σ(Y-7)² gives the smallest squared deviation sum.
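A direct check of this example (numpy assumed):

    import numpy as np

    Y = np.array([2.0, 4.0, 8.0, 10.0, 11.0])        # mean is 7
    for c in [2, 4, 7, 8, 10, 11]:
        print(c, np.sum((Y - c) ** 2))               # the sum is smallest at c = 7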

Our task in regression analysis is to find numerical values corresponding to terms in the model so as to satisfy the least squares criterion of minimum error of prediction. The resulting values, when substituted into the regression equation, satisfy the criterion of minimization. In essence, we solve p+1 equations (p = the number of predictors, and 1 corresponds to the slope intercept term), or one equation for each term in the model. The equations are then solved simultaneously to determine computing formulas to obtain the numerical values for the p+1 terms in the model. Finally, each predictor (and the slope intercept term) is passed through the resulting prediction equation to find a unique predicted criterion for each observation in the data set. The rest is statistical theory (see Lindeman et al. for an excellent discussion of regression theory).

To take the two predictor example once again,

$$\sum (Y - a - b_1 x_1 - b_2 x_2)^2 = \sum e^2 = \text{a minimum}$$

We are not interested in finding a computational formula for b_1 and b_2. Our goal is to stop one step short of doing that. We are interested in finding the normal equations, and simplifying them to substitute into the multiple R.

Plan

We will now set down a plan for finding the normal equations. A four phase plan is used throughout this appendix for finding normal equations. This will help structure the presentation.

A. state the regression model, Ŷ
B. state the mathematical function of the least squares criterion, Σ(Y - Ŷ)²
C. derive the normal equations for each of the terms in the model
D. summarize the normal equations.

Finding Normal Equations for the Two Predictor Model

Let us apply the four phase plan first to the two predictor case.

A. The regression function is:

$$\hat Y = a + b_1 x_1 + b_2 x_2$$

B. The least squares criterion is:

$$\sum (Y - \hat Y)^2 = \sum (Y - a - b_1 x_1 - b_2 x_2)^2 = \sum e^2$$

C. The procedures for deriving the normal equations are:

1. For the slope intercept term, a, we need to:

a) drop the exponent 2 and set the function equal to 0
b) distribute the summation operator
c) apply rules of summation for constants
d) solve in terms of the criterion variable, Y
e) substitute descriptive statistics and simplify

Applying each step in a) through e) produces:

a) Σ(Y - a - b₁x₁ - b₂x₂) = 0
b) ΣY - Σa - Σb₁x₁ - Σb₂x₂ = 0
c) ΣY - na - b₁Σx₁ - b₂Σx₂ = 0
d) ΣY = na + b₁Σx₁ + b₂Σx₂
e) ΣY = na + b₁(0) + b₂(0)

Recall that Σx₁ = Σx₂ = 0. Dividing through by n gives us the normal equation for a (in simplified form):

$$a = \bar Y$$

2. The procedures for finding the normal equation for b_1 are:

a) drop the exponent 2 and set the function equal to 0
b) multiply the function by x_1
c) distribute the x_1 term
d) distribute the summation operator
e) apply rules of summation for constants
f) solve in terms of the criterion variable, Y
g) substitute descriptive statistics and simplify

Applying each step in turn produces:

a) Σ(Y - a - b₁x₁ - b₂x₂) = 0
b) Σ(Y - a - b₁x₁ - b₂x₂)x₁ = 0
c) Σ(Yx₁ - ax₁ - b₁x₁² - b₂x₁x₂) = 0
d) ΣYx₁ - Σax₁ - Σb₁x₁² - Σb₂x₁x₂ = 0
e) ΣYx₁ - aΣx₁ - b₁Σx₁² - b₂Σx₁x₂ = 0
f) ΣYx₁ = aΣx₁ + b₁Σx₁² + b₂Σx₁x₂
g) since ΣYx₁ = (n-1)r_{y1}S_yS_1, Σx₁² = (n-1)S₁², and Σx₁x₂ = (n-1)r₁₂S₁S₂, we can substitute these quantities (recall that Σx₁ = 0) and obtain:

$$(n-1) r_{y1} S_y S_1 = 0 + b_1 (n-1) S_1^2 + b_2 (n-1) r_{12} S_1 S_2$$

If we divide the last equation by (n-1), we obtain:

$$r_{y1} S_y S_1 = b_1 S_1^2 + b_2 r_{12} S_1 S_2$$

This is the normal equation in simplified form that we used in the derivation (see Table 2).

3. The steps for finding the normal equation for b_2 parallel those for b_1:

a) drop the exponent 2 and set the function equal to 0
b) multiply the function by x_2
c) distribute the x_2 term
d) distribute the summation operator
e) apply rules of summation for constants
f) solve in terms of the criterion variable, Y
g) substitute descriptive statistics and simplify

Applying each step in order:

a) Σ(Y - a - b₁x₁ - b₂x₂) = 0
b) Σ(Y - a - b₁x₁ - b₂x₂)x₂ = 0
c) Σ(Yx₂ - ax₂ - b₁x₁x₂ - b₂x₂²) = 0
d) ΣYx₂ - Σax₂ - Σb₁x₁x₂ - Σb₂x₂² = 0
e) ΣYx₂ - aΣx₂ - b₁Σx₁x₂ - b₂Σx₂² = 0
f) ΣYx₂ = aΣx₂ + b₁Σx₁x₂ + b₂Σx₂²
g) since ΣYx₂ = (n-1)r_{y2}S_yS_2, Σx₁x₂ = (n-1)r₁₂S₁S₂, and Σx₂² = (n-1)S₂², we can substitute these quantities and obtain:

$$(n-1) r_{y2} S_y S_2 = 0 + b_1 (n-1) r_{12} S_1 S_2 + b_2 (n-1) S_2^2$$

If we divide through by (n-1) we have:

$$r_{y2} S_y S_2 = b_1 r_{12} S_1 S_2 + b_2 S_2^2$$

This was the simplified form of the normal equation for b_2 that was used in the derivation (see Table 2).

D. We now recapitulate. As noted, a normal equation is derived at the point when we solve in terms of the criterion variable, Y; subsequent steps are used to simplify. The normal equations for a, b_1 and b_2 were:

For a:   ΣY = na + b₁Σx₁ + b₂Σx₂
For b₁:  ΣYx₁ = aΣx₁ + b₁Σx₁² + b₂Σx₁x₂
For b₂:  ΣYx₂ = aΣx₂ + b₁Σx₁x₂ + b₂Σx₂²

When we simplified the normal equations, we obtained the following set used in the derivation for two predictors:¹

a = Ȳ²
r_{y1}S_yS_1 = b₁S₁² + b₂r₁₂S₁S₂
r_{y2}S_yS_2 = b₁r₁₂S₁S₂ + b₂S₂²

1. Readers of the 1982c paper should recognize the remarkable similarity between raw score and standard score normal equations. If the above variables were standardized, each S = 1, Ȳ = 0 and a = 0, making each normal equation set equal.

2. We actually disregarded the term a in the derivation because it was seen to "drop out" when it was included in the algebra. It is included here because the slope intercept term is included in the regression equation for criterion score calculation. The formula used is:

$$\hat Y = \bar Y + b_1 x_1 + b_2 x_2$$

See Lindeman et al. for additional methods of writing this equation.

Finding Normal Equations for p Predictors

The rules and method for deriving a set of normal equations when the number of predictors is greater than two are generalizations of the two (or one¹) predictor case. We will show two methods for the general case. The first method will use the four phase plan. The second is a short-cut technique; but the shorter method depends on first showing the longer one.

1. What are the normal equations for the one predictor model? The reader may find it instructive to derive the normal equations for this linear model. This can be done using the above procedures as guidelines. ANSWER:

$$a = \bar Y \qquad \text{and} \qquad r_{y1} S_y S_1 = b_1 S_1^2$$

The "multiple" R in this case is the simple Pearson product moment correlation; b_1, which is equal to r_{y1}S_y/S_1, is obtained from the second equation. Thus, the regression (prediction) equation upon substitution is:

$$\hat Y = \bar Y + b_1 x_1 = \bar Y + r_{y1} \frac{S_y}{S_1} x_1$$

Applying the four phase plan gives the following results for the general case.

A. The regression model is:

$$\hat Y = a + b_1 x_1 + b_2 x_2 + \dots + b_j x_j + \dots + b_p x_p$$

B. The function to be minimized according to the least squares criterion is:

$$\sum (Y - a - b_1 x_1 - b_2 x_2 - \dots - b_j x_j - \dots - b_p x_p)^2$$

C. The procedures for finding the normal equations for a and any b_j term are as follows:

1. In deriving the normal equation for a, regardless of the number of predictors, the result is always the same: a = Ȳ.

2. Finding the normal equation for any b_j term can be done in seven steps:

a) drop the exponent 2 and set the function equal to 0
b) multiply the function by x_j
c) distribute the x_j term
d) distribute the summation operator
e) apply rules of summation for constants
f) solve in terms of the criterion variable, Y
g) substitute descriptive statistics and simplify

Applying these steps in turn produces:

a) Σ(Y - a - b₁x₁ - b₂x₂ - ... - b_jx_j - ... - b_px_p) = 0
b) Σ(Y - a - b₁x₁ - b₂x₂ - ... - b_jx_j - ... - b_px_p)x_j = 0
c) Σ(Yx_j - ax_j - b₁x₁x_j - b₂x₂x_j - ... - b_jx_j² - ... - b_px_px_j) = 0
d) ΣYx_j - Σax_j - Σb₁x₁x_j - ... - Σb_jx_j² - ... - Σb_px_px_j = 0
e) ΣYx_j - aΣx_j - b₁Σx₁x_j - ... - b_jΣx_j² - ... - b_pΣx_px_j = 0
f) ΣYx_j = aΣx_j + b₁Σx₁x_j + b₂Σx₂x_j + ... + b_jΣx_j² + ... + b_pΣx_px_j
g) (n-1)r_{yj}S_yS_j = b₁(n-1)r_{1j}S₁S_j + b₂(n-1)r_{2j}S₂S_j + ... + b_j(n-1)S_j² + ... + b_p(n-1)r_{jp}S_jS_p

Dividing through by (n-1):

$$r_{yj} S_y S_j = b_1 r_{1j} S_1 S_j + b_2 r_{2j} S_2 S_j + \dots + b_j S_j^2 + \dots + b_p r_{jp} S_j S_p$$

Thus, the normal equations for any number of predictors in the regression model consist of a = Ȳ and p normal equations of the general form defined above.

Alternate Procedure

The above normal equation for any b_j term is a general result. Now a much simpler procedure which makes use of this fact will be presented.

Recall that the simple correlation of any variable with itself is equal to 1:

$$r_{11} = r_{22} = \dots = r_{jj} = \dots = r_{pp} = 1$$

Also recall that the covariance of any variable with itself is equal to the variance of that variable; that is,

$$\mathrm{cov}(X_1, X_1) = S_1^2 \qquad \text{or, in general,} \qquad \mathrm{cov}(x_j, x_j) = S_j^2$$

(Another way to denote cov(X₁, X₁) is S₁₁; in general, we can write cov(x_j, x_j) = S_jj or S_j².)

From these facts, it is possible to write down an entire set of normal equations for any number of predictors. If

$$r_{yj} S_y S_j = b_1 r_{1j} S_1 S_j + b_2 r_{2j} S_2 S_j + \dots + b_j S_j^2 + \dots + b_p r_{jp} S_j S_p$$

holds for any b_j term, then it holds for j = 1, j = 2, j = 3, ..., j = p.

For example, assume p = 2 predictors. We know that the set of normal equations will consist of 2 × 2 = 4 terms. Thus, first write out the general result for r_{yj}S_yS_j twice:

$$r_{yj} S_y S_j = b_1 r_{1j} S_1 S_j + b_2 r_{2j} S_2 S_j$$
$$r_{yj} S_y S_j = b_1 r_{1j} S_1 S_j + b_2 r_{2j} S_2 S_j$$

Now, substitute the appropriate j value: j = 1 for line 1, and j = 2 for line 2, as follows:

$$r_{y1} S_y S_1 = b_1 r_{11} S_1 S_1 + b_2 r_{21} S_2 S_1 \qquad \text{OR} \qquad r_{y1} S_y S_1 = b_1 S_1^2 + b_2 r_{12} S_1 S_2$$
$$r_{y2} S_y S_2 = b_1 r_{12} S_1 S_2 + b_2 r_{22} S_2 S_2 \qquad \text{OR} \qquad r_{y2} S_y S_2 = b_1 r_{12} S_1 S_2 + b_2 S_2^2$$

The last set shows the subscripts of the correlations between predictors, and the predictor standard deviations, written so that the first subscript is less than the second subscript. As mentioned in the text, this convention makes it easier to read the matrix (and see the symmetry of off diagonal terms).
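In matrix terms, the shortcut writes the whole set of normal equations at once: the left side stacks the r_{yj}S_yS_j values into a vector, and the right side is the matrix of r_{ij}S_iS_j entries times the vector of b's. A minimal sketch (mine; numpy assumed, the statistic values invented):

    import numpy as np

    S  = np.array([2.0, 3.0])                        # predictor standard deviations
    Sy = 4.0
    r  = np.array([[1.0, 0.3],                       # r_jj = 1 on the diagonal
                   [0.3, 1.0]])
    r_y = np.array([0.6, 0.5])

    A = r * np.outer(S, S)                           # entries r_ij S_i S_j; diagonal S_j^2
    b = np.linalg.solve(A, r_y * Sy * S)             # solves all normal equations at once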

Example for Five Predictors

To exemplify the procedures for p predictors, we will work through the solution of normal equations for five predictors. We will show the solution by the shortcut method. The long method could be used by applying the steps listed above for any b_j term; but since the shorter method gives identical results, we will not work through the longer method.

We begin by writing out the 5² = 25 terms for the general r_{yj}S_yS_j normal equation. That is, write out the general line

$$r_{yj} S_y S_j = b_1 r_{1j} S_1 S_j + b_2 r_{2j} S_2 S_j + b_3 r_{3j} S_3 S_j + b_4 r_{4j} S_4 S_j + b_5 r_{5j} S_5 S_j$$

on five separate lines. Substitute the appropriate j value (j = 1 for line 1, j = 2 for line 2, etc.); set r₁₁ = r₂₂ = ... = r₅₅ = 1, and set S₁S₁ = S₁², S₂S₂ = S₂², etc. This yields:

$$r_{y1} S_y S_1 = b_1 S_1^2 + b_2 r_{21} S_2 S_1 + b_3 r_{31} S_3 S_1 + b_4 r_{41} S_4 S_1 + b_5 r_{51} S_5 S_1$$
$$r_{y2} S_y S_2 = b_1 r_{12} S_1 S_2 + b_2 S_2^2 + b_3 r_{32} S_3 S_2 + b_4 r_{42} S_4 S_2 + b_5 r_{52} S_5 S_2$$
$$r_{y3} S_y S_3 = b_1 r_{13} S_1 S_3 + b_2 r_{23} S_2 S_3 + b_3 S_3^2 + b_4 r_{43} S_4 S_3 + b_5 r_{53} S_5 S_3$$
$$r_{y4} S_y S_4 = b_1 r_{14} S_1 S_4 + b_2 r_{24} S_2 S_4 + b_3 r_{34} S_3 S_4 + b_4 S_4^2 + b_5 r_{54} S_5 S_4$$
$$r_{y5} S_y S_5 = b_1 r_{15} S_1 S_5 + b_2 r_{25} S_2 S_5 + b_3 r_{35} S_3 S_5 + b_4 r_{45} S_4 S_5 + b_5 S_5^2$$

If one desires, the subscripts may be reversed for variables in the upper right hand triangle to render the first subscript less than the second. The result is the same set of normal equations that would be obtained if the longer method were used to derive the normal equations.

The author would be pleased to receive comments and reactions from readers of this paper and others that appear in this series. My intention is to prepare a textbook of proofs and derivations for social science students. I have long felt the need to bridge the gap between the standard applied statistics (and psychometrics) textbooks currently on the market and mathematical statistics. The mathematical sophistication of students entering college and university is rising steadily, and a textbook such as I am contemplating would make a contribution, I feel. While it is true that a "real" understanding of statistical (and probability) theory requires substantial mathematical coursework, it is nonetheless true that more in the way of explanation and justification of results in probability and statistics is possible. It is my belief that a textbook showing detailed presentations of proofs/derivations would be a welcome addition to the market.

I would like to hear from readers (students, professors and others) regarding these papers. For example, are they clear? Are there proofs that you would like to see (statistics or psychometrics) in this format? Please remember, at this time I am limiting my selections to those which can be presented with algebra. I welcome comments on any level from readers of these papers. My mailing address is:

Francis J. O'Brien, Jr.
106 Morningside Drive, Apartment #5
New York, New York 10027

Appendix B

ERRATA for "A Derivation of the Sample Multiple Correlation Formula for Standard Scores," ED 223 429

NOTE: "Page" refers to the original page numbers in the upper right hand corner. Corrected prose is underlined in the original; formulas are rewritten with the corrections applied.

Page 1: "Let us review some concepts, math..." should read "Let us review some of the concepts, notation..."
Page 3, footnote 1: should read "If it is understood that the summations range from i=1 to i=n, then we can drop the summation limits altogether."
Page 21, first formula: should read cov(Z_y, B₁Z₁ + B₂Z₂ + B₃Z₃ + ... + B_jZ_j + ... + B_pZ_p), and the corresponding correlation term should read corr(Z_y, B₁Z₁ + B₂Z₂ + B₃Z₃ + ... + B_jZ_j + ... + B_pZ_p).
Page 27, statement under Plan: add period after "D".
Page 27, two lines under the previous erratum: "demonstrate".
Page 31, line 2: "consisdered" should read "considered".
Page 33, 4 sentences from bottom: "first".

REFERENCES

Goodman, A. W. Calculus for the Social Sciences. Philadelphia: W. B. Saunders Company, 1977.

Hoel, Paul G., Sidney C. Port and Charles J. Stone. Introduction to Statistical Theory. Boston: Houghton Mifflin, 1971.

Lindeman, Richard H., Ruth Gold and Peter Merenda. Bivariate and Multivariate Analysis. Chicago: Scott, Foresman and Co., 1982.

O'Brien, Francis J., Jr. A proof that t and F are identical: the general case, 1982a. ERIC ED 215 894.

O'Brien, Francis J., Jr. Proof that the sample bivariate correlation coefficient has limits ±1, 1982b. ERIC ED 216 874.

O'Brien, Francis J., Jr. A derivation of the sample multiple correlation formula for standard scores, 1982c. ERIC ED 223 429.

DOCUMENT RESUME

ED 280 896                                    TM 870 228

AUTHOR      O'Brien, Francis J., Jr.
TITLE       A Derivation of the Unbiased Standard Error of Estimate: The General Case.
PUB DATE    87
NOTE        54p.; For earlier monographs in this series, see ED 215 894, ED 216 874, ED 223 429, and ED 235 205.
PUB TYPE    Reports - Research/Technical (143) -- Guides - Classroom Use - Materials (For Learner) (051)
EDRS PRICE  MF01/PC03 Plus Postage.
DESCRIPTORS *Error of Measurement; *Estimation (Mathematics); Goodness of Fit; Higher Education; *Mathematical Models; *Predictor Variables; Proof (Mathematics); *Raw Scores; Regression (Statistics); Statistical Studies
IDENTIFIERS Applied Statistics; *Z Scores

ABSTRACT
This paper is part of a series of applied statistics monographs intended to provide supplementary reading for applied statistics students. In the present paper, derivations of the unbiased standard error of estimate for both the raw score and standard score linear models are presented. The derivations for raw score linear models are presented in graduated steps of generality for one, two, three, and any finite number of predictors. A brief overview of regression analysis precedes the derivations. Appendices include: (1) errata for a derivation of the sample multiple correlation formula; and (2) a discussion of linear and nonlinear regression models. (LMO)

A Derivation of the Unbiased Standard Error of Estimate: the General Case

Francis J. O'Brien, Jr., Ph.D.

1987. Francis J. O'Brien, Jr. ALL RIGHTS RESERVED

"A Derivation of the Unbiased Standard Error of


Estimate:the General Case"

ERRATA SHEET for

4,

2nd equation

(X - X )
1

6,

CHANGE TO

NOW READS

PAGE

definition
2

Y.x ,x , ...,x ,...x

Y.x ,x

of R

definition
of r

9,

3xY

91

9,

Ix

9
1

footnote b

...,x
2

align subscript for 7

9, definition
of

x
1

10,

ist equation

Y.x

Y.x1

12, Sth equation

Ix

Ex y

18,

2nd equation

2 2

2 2
2 2
+ b
(b S

1 2

2 2

2b b r
S S
1 2 91 9 2

-2(Ebr SS)
j'l

9j Y

2b b r
S S )
1 2 12 1 2

26, 3rd line


from bottom

2 2
b S
2 2

(b S +
1 1

-2(Eb r

S S

j41 j 9j 9 j

PAGE

CORRECT TO

NOW. READS

31, 2nd line of


2nd equation

34

'2

1 -

1 - 1

br SY
j 9j Y

Refers to page at top.

br
j

j.

SS
9.1

9 j

Table of Contents

Introduction
Overview of Derivation
Overview of Regression Analysis
The Standard Error of Estimate
Derivations for Raw Score Model
    Derivation for One Predictor
    Derivation for Two Predictors ........................ 14
    Derivation for Three Predictors ...................... 20
    Derivation for p Predictors .......................... 29
Derivations for Standard Score Model ..................... 36
    Introduction ......................................... 36
    Derivation for One Predictor ......................... 37
    Outline for Derivations .............................. 40
Appendix A: Errata for ED 235 205 ........................ 41
Appendix B: Discussion of Linear and Nonlinear Regression Models ... 42
Notes .................................................... 46
References ............................................... 48

List of Tables

Table                                                     Page
1. Basic Sample Descriptive Statistics for One Predictor
   Raw Score Model
2. Substitution Equations for Two Predictor Raw Score Model .. 17
3. Functions of R for Two Predictor Raw Score Model .......... 19
4. Generalized Substitution Equations for Raw Score Model .... 25
5. Functions of R for Three Predictor Raw Score Model ........ 27
6. Functions of R for p Predictor Raw Score Model ............ 34

A Derivation of the Unbiased Standard Error of Estimate: the General Case

Francis J. O'Brien, Jr., Ph.D.

Introduction

This paper represents the fifth in a series of applied statistics monographs (see O'Brien 1982a, 1982b, 1982c, 1983a). The purpose of these papers is to provide supplementary reading for applied statistics students. The intended audience is social science graduate and advanced undergraduate students. The minimum background for most of the existing and forthcoming papers is familiarity with elementary analysis of variance, and multiple correlation and regression analysis.

The unique feature of this series is detailed proofs and derivations of important formulas and relationships which are not readily available in textbooks, journal articles and similar sources. Each proof or derivation is presented in a detailed and clear fashion using well defined and consistent notation. When necessary, a review of relevant algebra is provided. Calculus is not used or assumed.

The present paper assumes familiarity with two previous papers in this series (O'Brien, 1982c, 1983a). Each paper formulated a detailed derivation of the multiple correlation formula of one criterion and p predictors for the linear model. The first paper (1982c) presented a derivation of the multiple R based on standard (Z) scores,¹ and the second showed the analogous derivation for the raw score model.²

Overview of Derivation

In the present paper, derivations of the unbiased standard error of estimate for both the raw score and standard score linear models are presented. The derivations will be presented in graduated steps of generality. First the derivation for one criterion (dependent) variable and one predictor (independent) variable is presented for the raw score model. A derivation for two raw score predictors is then presented. Next, the derivation for the three predictor case is formulated. Finally, the derivation for any (finite) number of predictors is presented. Derivations for the standard score model are then outlined.

Overview of Regression Analysis

Prior to presenting the derivations, a brief overview


3

of regression analysis will be given. Let us consider


the linear regression model for one raw score criteHon and
one predictor. Assume one is attempting to predict one
criterion with one predictor.
We assume that the model
4

is linear in form.
The mathematical model we might select
to"fit" such a distribution is the simple linear equation:

A
a + b X
1 1
Where:

A
Y

the predicted criterion,


= the slope intercept term,

the slope coefficient term,


1

the predictor variable in deviation score form ;

i.e.,

x -

where
1

"T

is the arithmetic mean.

If a scatter diagram were constructed for this hypothetical model (based on


actual data, of course), the actual rau score observations would in all
likelihood not fall on the line defined
A

by the linear equation of the idealized mai-hematical model (Y).

A
Such deviations from Y
are considered errors of prediction.
We can
conceive a raw scor :? observation as consisting of a component predicted by the
model plus an error component. That is:

+ e
Where:
Y

the actual criterion we want to predict by YA


the amount of numerical error resulting from using

A
the idealized mathematical model
actual raw score criterion (Y).

(Y)

to predict the

an actual dependent (criterion)


variable score consists of the
quantity predicted by the idealized "best fitting" line plus an error
component.
That is,

The error made in predicting the observed criterion score by the model

is

simply:

enY-Y
One of the goals of regression analysis is to minimize the prediction error
denoted by e above. It can be seen that if e = 0, then the actual criterion is
perfectly predicted by the selected mathematical model. That is to say, the
simple linear equation fitted to the observed data points,

$$\hat{Y} = a + b_1 x_1,$$

predicts every observation (Y) in the distribution. Geometrically, when
e = 0, every Y score falls on the straight line, $\hat{Y}$. For this case, the
values corresponding to a and b can be solved empirically using elementary
algebra based on the observed data. Rarely, however, do such distributions
exist in the social sciences. Consequently, we are forced to select
procedures which will provide computing formulas for calculating the a and b
terms.

The technique most often used in the social sciences to minimize the
error of prediction is the "least squares" procedure. Essentially, this
procedure seeks to maximize predictability by minimizing prediction error.
The least squares criterion or goal is summarized in the following
expression:

$$\sum e^2 = \text{a minimum}$$

If we substitute the quantity for $\hat{Y}$ previously defined, we can rewrite the
least squares criterion as:

$$\sum e^2 = \sum\left[Y - (a + b_1 x_1)\right]^2 = \sum\left(Y - a - b_1 x_1\right)^2 = \text{a minimum}$$

(As an aside, "least squares" means we determine values for a and b such that
the squared error term results in the least possible value).
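To make the least squares criterion concrete, here is a minimal numerical sketch (mine, not the author's; the data and variable names are illustrative). It uses the closed-form solution for one deviation-score predictor — $b_1 = \sum x_1 y / \sum x_1^2$, with a equal to the criterion mean — and evaluates $\sum e^2$:

```python
import numpy as np

# Hypothetical data for illustration only.
X1 = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
Y = np.array([3.0, 7.0, 8.0, 12.0, 15.0])

# Put the predictor in deviation score form, as in the paper.
x1 = X1 - X1.mean()

# Closed-form least squares estimates for one predictor:
# b1 = sum(x1*y) / sum(x1^2); a equals the criterion mean
# when the predictor is in deviation form.
b1 = np.sum(x1 * (Y - Y.mean())) / np.sum(x1 ** 2)
a = Y.mean()

Y_hat = a + b1 * x1
e = Y - Y_hat
print("a =", a, " b1 =", b1, " sum(e^2) =", np.sum(e ** 2))
```

Any other choice of a and b₁ would produce a larger value of Σe², which is precisely what the least squares criterion requires.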
The Standard Error of Estimate

The standard error of estimate provides a measure of the average amount of
error that results from using $\hat{Y}$ for Y score prediction (see
Lindeman, et al.). The unbiased standard error of estimate for one
predictor is defined as follows:

$$s_{Y \cdot x_1} = \sqrt{\frac{\sum\left(Y - \hat{Y}\right)^2}{n-2}} = \sqrt{\frac{\sum\left[Y - (a + b_1 x_1)\right]^2}{n-2}} = \sqrt{\frac{\sum\left(Y - a - b_1 x_1\right)^2}{n-2}}$$

Where:

$s_{Y \cdot x_1}$ = the unbiased standard error of estimate for one predictor,
$n$ = the sample size.

Note that the predictor variable ($x_1$) is in deviation form.

However, the criterion to be predicted (Y) is not transformed; nor do we
transform the predicted criterion ($\hat{Y}$). This is the definitional
formula for the unbiased standard error of estimate. An equivalent formula,
shown in virtually all applied statistics textbooks, is as follows:

$$s_{Y \cdot x_1} = S_Y \sqrt{\frac{(n-1)\left(1 - r^2_{x_1 y}\right)}{n-2}}$$

Where:

$S_Y$ = the standard deviation of the actual criterion score,
$r^2_{x_1 y}$ = the square of the simple Pearson correlation between the predictor in deviation form ($x_1$) and the criterion ($Y$).

This formula will be derived in this paper.


In general, the standard error of estimate can be obtained for a linear
regression model containing any finite number of predictors. If we let p
represent an indefinite number of raw score predictors, the unbiased
standard error of estimate can be expressed as:

$$s_{Y \cdot x_1, x_2, \ldots, x_p} = \sqrt{\frac{\sum\left(Y - \hat{Y}\right)^2}{n-(p+1)}} = S_Y \sqrt{\frac{(n-1)\left(1 - R^2_{Y \cdot x_1, \ldots, x_p}\right)}{n-(p+1)}}$$

Where:

$s_{Y \cdot x_1, \ldots, x_p}$ = the unbiased standard error of estimate for p predictors (in deviation score form),
$p$ = an indefinite number of predictors,
$R^2_{Y \cdot x_1, \ldots, x_p}$ = the squared linear multiple correlation between one criterion and p predictors.

This formula also will be derived in this paper.

The standard error of estimate also can be derived for regression models
in which the variables have been expressed in standard score (Z) form. The
unbiased sample standard error of estimate for a one predictor standard score
linear model is defined as:

$$s_{Z_Y \cdot Z_1} = \sqrt{\frac{\sum\left[Z_Y - (A + B_1 Z_1)\right]^2}{n-2}}$$

Where:

$s_{Z_Y \cdot Z_1}$ = the standard error of estimate for the standardized criterion ($Z_Y$) and the standardized predictor ($Z_1$),
$n$ = the sample size,
$A$ = the slope intercept term,
$Z_1$ = the standardized predictor,
$B_1$ = the beta (regression) weight,
$e$ = the prediction error.

We show that the definitional formula above is equal to:

$$s_{Z_Y \cdot Z_1} = \sqrt{\frac{(n-1)\left(1 - r^2_{Z_Y, Z_1}\right)}{n-2}}$$

Where:

$r^2_{Z_Y, Z_1}$ = the squared correlation of $Z_Y$ and $Z_1$.

For standard score variables, the unbiased standard error of estimate for p
predictors is:

$$s_{Z_Y \cdot Z_1, Z_2, \ldots, Z_p} = \sqrt{\frac{(n-1)\left(1 - R^2_{Z_Y \cdot Z_1, \ldots, Z_p}\right)}{n-(p+1)}}$$

Where:

$s_{Z_Y \cdot Z_1, \ldots, Z_p}$ = the unbiased standard error of estimate for p predictors,
$R^2_{Z_Y \cdot Z_1, \ldots, Z_p}$ = the squared multiple correlation between the criterion ($Z_Y$) and p standardized predictors.

In this paper we will concentrate on the standard error of estimate for the
raw score model. The derivations for the Z score model will be outlined.
The reader may wish to work out the derivations for the standard score model
using the detailed presentations for the raw score model as a guide.

Derivations for Raw Score Model

In the next several sections, we will show the derivations of the unbiased
standard error of estimate for raw scores. We begin with the simplest case
of one criterion and one predictor.
Derivation for One Predictor

For the reader's convenience in working through the algebra, we will
summarize relevant definitions and formulas. This is done in Table 1.

Table 1

Basic Sample Descriptive Statistics for One Predictor Raw Score Model

Regression model:  $\hat{Y} = a + b_1 x_1 = \bar{Y} + b_1 x_1$ ᵃ

Variance of Y:  $S_Y^2 = \dfrac{\sum\left(Y - \bar{Y}\right)^2}{n-1} = \dfrac{\sum y^2}{n-1}$

Variance of $X_1$:  $S_1^2 = \dfrac{\sum\left(X_1 - \bar{X}_1\right)^2}{n-1} = \dfrac{\sum x_1^2}{n-1}$

Correlation of $x_1$ and $y$:  $r_{y1} = \dfrac{\sum x_1 y}{(n-1)\, S_Y S_1}$ ᵇ

Note: All summations range from i = 1 to i = n observations.

ᵃ This is derived from the least squares criterion; i.e.,

$$\sum_{i=1}^{n} e^2 = \sum_{i=1}^{n}\left(Y - a - b_1 x_1\right)^2 = \text{minimum}$$

See O'Brien, 1983a, p. 44.

ᵇ See O'Brien, 1983a, for justification that the numerator in the correlation formula may be given as $\sum x_1 y$ or $\sum x_1 Y$, where $x_1 = X_1 - \bar{X}_1$ and $y = Y - \bar{Y}$. In this paper, we will use the correlation expression $r_{y1}$ (or $r_{y1}^2$).
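As a quick numerical sketch of the Table 1 quantities (my own illustration, not part of the original paper; the data are hypothetical):

```python
import numpy as np

Y = np.array([3.0, 7.0, 8.0, 12.0, 15.0])
X1 = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
n = len(Y)

y = Y - Y.mean()     # criterion in deviation form
x1 = X1 - X1.mean()  # predictor in deviation form

S2_Y = np.sum(y ** 2) / (n - 1)   # variance of Y
S2_1 = np.sum(x1 ** 2) / (n - 1)  # variance of X1
# Correlation of x1 and y, with the deviation-score numerator.
r_y1 = np.sum(x1 * y) / ((n - 1) * np.sqrt(S2_Y) * np.sqrt(S2_1))
print("S_Y^2 =", S2_Y, " S_1^2 =", S2_1, " r_y1 =", r_y1)
```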

We begin by repeating the definition of the unbiased standard error of
estimate and substituting for $\hat{Y}$:

$$s_{Y \cdot x_1} = \sqrt{\frac{\sum\left(Y - a - b_1 x_1\right)^2}{n-2}}$$

It will be easier if we work with the variance error of estimate. This is
simply the square of the standard error of estimate:

$$s^2_{Y \cdot x_1} = \frac{\sum\left(Y - a - b_1 x_1\right)^2}{n-2}$$

It was shown by the author that the slope intercept term, a, is equal to the
criterion mean, $\bar{Y}$ (see O'Brien, 1983a, p. 44). Making that
substitution and rearranging terms:

$$s^2_{Y \cdot x_1} = \frac{\sum\left[(Y - \bar{Y}) - b_1 x_1\right]^2}{n-2}$$

Let us express $(Y - \bar{Y})$ in deviation score form, $y = Y - \bar{Y}$, to
simplify the algebra. This gives us:

$$s^2_{Y \cdot x_1} = \frac{\sum\left(y - b_1 x_1\right)^2}{n-2}$$

Squaring out the terms inside parentheses for this binomial expression:

$$s^2_{Y \cdot x_1} = \frac{\sum\left(y^2 + b_1^2 x_1^2 - 2 y b_1 x_1\right)}{n-2}$$

Bringing the summation operator inside and factoring constants outside the
summation operator (recall that $b_1$ functions as a constant to be estimated
in the regression model):

$$s^2_{Y \cdot x_1} = \frac{\sum y^2 + b_1^2 \sum x_1^2 - 2 b_1 \sum x_1 y}{n-2}$$

Substituting the following expressions (see Table 1):

$$\sum y^2 = (n-1) S_Y^2, \qquad \sum x_1^2 = (n-1) S_1^2, \qquad \sum x_1 y = (n-1)\, r_{y1} S_Y S_1,$$

and (based on substitution from Table 1 and O'Brien, 1983a, p. 44)

$$b_1 = r_{y1}\, S_Y / S_1.$$
Thus:

$$s^2_{Y \cdot x_1} = \frac{(n-1) S_Y^2 + \left(r_{y1} S_Y / S_1\right)^2 (n-1) S_1^2 - 2\left(r_{y1} S_Y / S_1\right)(n-1)\, r_{y1} S_Y S_1}{n-2}$$

Factoring out the (n−1) term:

$$s^2_{Y \cdot x_1} = \frac{(n-1)\left[S_Y^2 + r_{y1}^2\left(S_Y/S_1\right)^2 S_1^2 - 2\, r_{y1}^2 S_Y^2\right]}{n-2}$$

Simplifying:

$$s^2_{Y \cdot x_1} = \frac{(n-1)}{(n-2)}\left[S_Y^2 + r_{y1}^2 S_Y^2 - 2\, r_{y1}^2 S_Y^2\right] = \frac{(n-1)}{(n-2)}\, S_Y^2\left(1 - r_{y1}^2\right)$$

Taking the (positive) square root, the unbiased standard error of estimate
for one raw score predictor is:

$$s_{Y \cdot x_1} = S_Y \sqrt{\frac{(n-1)\left(1 - r_{y1}^2\right)}{n-2}}$$

END OF PROOF
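As an informal numerical check (my own sketch, not part of the original paper; the data are hypothetical), the definitional and derived formulas can be compared directly:

```python
import numpy as np

Y = np.array([3.0, 7.0, 8.0, 12.0, 15.0])
X1 = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
n = len(Y)

x1 = X1 - X1.mean()
b1 = np.sum(x1 * (Y - Y.mean())) / np.sum(x1 ** 2)
a = Y.mean()

# Definitional formula: sum of squared errors divided by n-2.
see_def = np.sqrt(np.sum((Y - a - b1 * x1) ** 2) / (n - 2))

# Derived formula: S_Y * sqrt((n-1)(1 - r^2)/(n-2)).
S_Y = Y.std(ddof=1)
r = np.corrcoef(X1, Y)[0, 1]
see_alt = S_Y * np.sqrt((n - 1) * (1 - r ** 2) / (n - 2))

print(see_def, see_alt)  # the two values agree
```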

Derivation for Two Predictors

In this section we seek to show that the unbiased standard error of estimate
for two raw score predictors is:

$$s_{Y \cdot x_1, x_2} = S_Y \sqrt{\frac{(n-1)\left(1 - R^2_{Y \cdot x_1, x_2}\right)}{n-3}}$$

Where:

$S_Y$ = the observed criterion standard deviation,
$R^2_{Y \cdot x_1, x_2}$ = the squared multiple correlation between the criterion and the two raw score predictors (in deviation score form).

We begin with the definition of the unbiased standard error of estimate for
two raw score predictors:

$$s_{Y \cdot x_1, x_2} = \sqrt{\frac{\sum\left(Y - \hat{Y}\right)^2}{n-(p+1)}} = \sqrt{\frac{\sum\left(Y - \hat{Y}\right)^2}{n-3}} = \sqrt{\frac{\sum\left(Y - a - b_1 x_1 - b_2 x_2\right)^2}{n-3}}$$

As in the one predictor derivation, it will be easier to work with the
variance error of estimate:

$$s^2_{Y \cdot x_1, x_2} = \frac{\sum\left(Y - a - b_1 x_1 - b_2 x_2\right)^2}{n-3}$$

Substituting $\bar{Y}$ for the slope intercept term and rearranging:

$$s^2_{Y \cdot x_1, x_2} = \frac{\sum\left[(Y - \bar{Y}) - b_1 x_1 - b_2 x_2\right]^2}{n-3}$$

Now, expressing $Y - \bar{Y}$ in deviation form and expanding the trinomial
expression:

$$s^2_{Y \cdot x_1, x_2} = \frac{\sum\left(y^2 + b_1^2 x_1^2 + b_2^2 x_2^2 - 2 y b_1 x_1 - 2 y b_2 x_2 + 2 b_1 b_2 x_1 x_2\right)}{n-3}$$

Bringing the summation operator inside and factoring constants:

$$s^2_{Y \cdot x_1, x_2} = \frac{\sum y^2 + b_1^2 \sum x_1^2 + b_2^2 \sum x_2^2 - 2 b_1 \sum x_1 y - 2 b_2 \sum x_2 y + 2 b_1 b_2 \sum x_1 x_2}{n-3}$$

The following formulas can be used for simplification:

$$\sum y^2 = (n-1) S_Y^2, \qquad \sum x_1^2 = (n-1) S_1^2, \qquad \sum x_2^2 = (n-1) S_2^2,$$
$$\sum x_1 y = (n-1)\, r_{y1} S_Y S_1, \qquad \sum x_2 y = (n-1)\, r_{y2} S_Y S_2, \qquad \sum x_1 x_2 = (n-1)\, r_{12} S_1 S_2.$$

For ease of reference, these formulas are summarized in Table 2.

Table 2

Substitution Equations for Two Predictor Raw Score Model

$$\sum y^2 = (n-1) S_Y^2$$
$$\sum x_1^2 = (n-1) S_1^2$$
$$\sum x_2^2 = (n-1) S_2^2$$
$$\sum x_1 y = (n-1)\, r_{y1} S_Y S_1$$
$$\sum x_2 y = (n-1)\, r_{y2} S_Y S_2$$
$$\sum x_1 x_2 = (n-1)\, r_{12} S_1 S_2$$

Note: equations are expressed in deviation score form. Each equation is based
on an algebraic rearrangement of the basic sample descriptive statistics
(compare Table 1). For example, the variance of Y is
$S_Y^2 = \sum(Y - \bar{Y})^2/(n-1) = \sum y^2/(n-1)$; solving in terms of
$\sum y^2$ gives $\sum y^2 = (n-1) S_Y^2$.

Making these substitutions:

$$s^2_{Y \cdot x_1, x_2} = \frac{1}{n-3}\Big[(n-1) S_Y^2 + (n-1) b_1^2 S_1^2 + (n-1) b_2^2 S_2^2 - 2(n-1) b_1 r_{y1} S_Y S_1 - 2(n-1) b_2 r_{y2} S_Y S_2 + 2(n-1) b_1 b_2 r_{12} S_1 S_2\Big]$$

Factoring out the (n−1) term and rearranging:

$$s^2_{Y \cdot x_1, x_2} = \frac{n-1}{n-3}\Big[S_Y^2 + \left(b_1^2 S_1^2 + b_2^2 S_2^2 + 2 b_1 b_2 r_{12} S_1 S_2\right) - 2\left(b_1 r_{y1} S_Y S_1 + b_2 r_{y2} S_Y S_2\right)\Big]$$

The next step is very important. The two terms in parentheses reduce to
functions of the squared multiple R for two predictors. As was shown in the
author's 1983a paper, the derivation of R for two predictors results in
several equivalent ways to express R or R². Table 3 shows the forms of R²
which will be used in the next step (compare O'Brien, 1983a, pages 12-18,
especially p. 18).

Table 3

Functions of R² for Two Raw Score Predictors

$$R^2 S_Y^2 = b_1^2 S_1^2 + b_2^2 S_2^2 + 2 b_1 b_2 r_{12} S_1 S_2$$

$$R^2 S_Y^2 = b_1 r_{y1} S_Y S_1 + b_2 r_{y2} S_Y S_2$$

Note: $R^2 \equiv R^2_{Y \cdot x_1, x_2}$. See O'Brien, 1983a.

Thus:

$$b_1^2 S_1^2 + b_2^2 S_2^2 + 2 b_1 b_2 r_{12} S_1 S_2 = R^2 S_Y^2, \qquad b_1 r_{y1} S_Y S_1 + b_2 r_{y2} S_Y S_2 = R^2 S_Y^2.$$
Making these substitutions:

$$s^2_{Y \cdot x_1, x_2} = \frac{n-1}{n-3}\left[S_Y^2 + R^2 S_Y^2 - 2 R^2 S_Y^2\right] = \frac{n-1}{n-3}\, S_Y^2\left[1 - R^2_{Y \cdot x_1, x_2}\right]$$

Taking the positive square root, the unbiased standard error of estimate for
two raw score predictors is:

$$s_{Y \cdot x_1, x_2} = S_Y \sqrt{\frac{(n-1)\left(1 - R^2_{Y \cdot x_1, x_2}\right)}{n-3}}$$

END OF PROOF
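The two predictor result can also be checked numerically. In the sketch below (mine, with simulated data), the squared multiple correlation is obtained as the squared simple correlation between Y and Ŷ, one standard way of computing R²:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
X = rng.normal(size=(n, 2))                  # two raw score predictors
Y = 1.0 + 2.0 * X[:, 0] - X[:, 1] + rng.normal(size=n)

x = X - X.mean(axis=0)                       # predictors in deviation form
b, *_ = np.linalg.lstsq(x, Y - Y.mean(), rcond=None)
Y_hat = Y.mean() + x @ b

# Definitional formula: sum of squared errors divided by n-3.
see_def = np.sqrt(np.sum((Y - Y_hat) ** 2) / (n - 3))

R2 = np.corrcoef(Y, Y_hat)[0, 1] ** 2        # squared multiple correlation
S_Y = Y.std(ddof=1)
see_alt = S_Y * np.sqrt((n - 1) * (1 - R2) / (n - 3))
print(see_def, see_alt)                      # the two values agree
```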

Derivation for Three Predictors

Prior to showing the derivation for the general case of p predictors, we
will present the derivation for the three predictor model. This allows us to
review the logic and procedures of the derivation. In addition, we introduce
summation notation throughout all of the steps of the derivation, which
simplifies the algebra for the general case.
For three raw score predictors, we will show that:

$$s^2_{Y \cdot x_1, x_2, x_3} = \frac{n-1}{n-4}\, S_Y^2\left(1 - R^2_{Y \cdot x_1, x_2, x_3}\right)$$

We begin by presenting the definition of the unbiased standard error of
estimate for three predictors:

$$s_{Y \cdot x_1, x_2, x_3} = \sqrt{\frac{\sum\left(Y - \hat{Y}\right)^2}{n-(p+1)}} = \sqrt{\frac{\sum\left(Y - a - b_1 x_1 - b_2 x_2 - b_3 x_3\right)^2}{n-4}}$$

As before, we will work with the variance error of estimate:

$$s^2_{Y \cdot x_1, x_2, x_3} = \frac{\sum\left(Y - a - b_1 x_1 - b_2 x_2 - b_3 x_3\right)^2}{n-4}$$

Proceeding as before, we first replace a with $\bar{Y}$ and express
$Y - \bar{Y}$ as $y$:

$$s^2_{Y \cdot x_1, x_2, x_3} = \frac{\sum\left(y - b_1 x_1 - b_2 x_2 - b_3 x_3\right)^2}{n-4}$$

Expanding this quadrinomial expression:

$$s^2_{Y \cdot x_1, x_2, x_3} = \frac{\sum\left(y^2 + b_1^2 x_1^2 + b_2^2 x_2^2 + b_3^2 x_3^2 - 2 y b_1 x_1 - 2 y b_2 x_2 - 2 y b_3 x_3 + 2 b_1 b_2 x_1 x_2 + 2 b_1 b_3 x_1 x_3 + 2 b_2 b_3 x_2 x_3\right)}{n-4}$$
Bringing the summation operator inside:

$$s^2_{Y \cdot x_1, x_2, x_3} = \frac{\sum y^2 + b_1^2 \sum x_1^2 + b_2^2 \sum x_2^2 + b_3^2 \sum x_3^2 - 2 b_1 \sum x_1 y - 2 b_2 \sum x_2 y - 2 b_3 \sum x_3 y + 2 b_1 b_2 \sum x_1 x_2 + 2 b_1 b_3 \sum x_1 x_3 + 2 b_2 b_3 \sum x_2 x_3}{n-4}$$

The following substitution formulas, stated in general form, will help us to
simplify the above expression (see Table 4 for reference):

$$\sum y^2 = (n-1) S_Y^2$$

For any $x_j$: $\quad \sum x_j^2 = (n-1) S_j^2$

For any $x_j y$: $\quad \sum x_j y = (n-1)\, r_{yj} S_Y S_j$

For any $x_i x_j$: $\quad \sum x_i x_j = (n-1)\, r_{ij} S_i S_j$

Applying these substitutions:

$$s^2_{Y \cdot x_1, x_2, x_3} = \frac{1}{n-4}\Big[(n-1) S_Y^2 + (n-1) b_1^2 S_1^2 + (n-1) b_2^2 S_2^2 + (n-1) b_3^2 S_3^2 - 2(n-1) b_1 r_{y1} S_Y S_1 - 2(n-1) b_2 r_{y2} S_Y S_2 - 2(n-1) b_3 r_{y3} S_Y S_3 + 2(n-1) b_1 b_2 r_{12} S_1 S_2 + 2(n-1) b_1 b_3 r_{13} S_1 S_3 + 2(n-1) b_2 b_3 r_{23} S_2 S_3\Big]$$

Table 4

Generalized Substitution Equations for Raw Score Model

$$\sum y^2 = (n-1) S_Y^2$$
$$\sum x_j^2 = (n-1) S_j^2$$
$$\sum x_j y = (n-1)\, r_{yj} S_Y S_j$$
$$\sum x_i x_j = (n-1)\, r_{ij} S_i S_j$$

Note: for example, the second equation applies to any X variable; for the jth
X variable, the sum of squares is related to the jth variance.

Factoring out (n−1) and rearranging:

$$s^2_{Y \cdot x_1, x_2, x_3} = \frac{n-1}{n-4}\Big[S_Y^2 + \left(b_1^2 S_1^2 + b_2^2 S_2^2 + b_3^2 S_3^2 + 2 b_1 b_2 r_{12} S_1 S_2 + 2 b_1 b_3 r_{13} S_1 S_3 + 2 b_2 b_3 r_{23} S_2 S_3\right) - 2\left(b_1 r_{y1} S_Y S_1 + b_2 r_{y2} S_Y S_2 + b_3 r_{y3} S_Y S_3\right)\Big]$$

We now express the parenthesized terms in summation notation (see O'Brien,
1983a):

$$s^2_{Y \cdot x_1, x_2, x_3} = \frac{n-1}{n-4}\left[S_Y^2 + \left(\sum_{j=1}^{3} b_j^2 S_j^2 + 2\sum_{j=2}^{3}\sum_{i=1}^{j-1} b_i b_j r_{ij} S_i S_j\right) - 2\left(\sum_{j=1}^{3} b_j r_{yj} S_Y S_j\right)\right]$$

Table 5 shows equivalent forms of R² for three predictors stated in summation
notation.

Table 5

Functions of R² for Three Raw Score Predictors

$$R^2 S_Y^2 = \sum_{j=1}^{3} b_j^2 S_j^2 + 2\sum_{j=2}^{3}\sum_{i=1}^{j-1} b_i b_j r_{ij} S_i S_j$$

$$R^2 S_Y^2 = \sum_{j=1}^{3} b_j r_{yj} S_Y S_j$$

Note: $R^2 \equiv R^2_{Y \cdot x_1, x_2, x_3}$. See O'Brien, 1983a.

Thus:

$$\sum_{j=1}^{3} b_j^2 S_j^2 + 2\sum_{j=2}^{3}\sum_{i=1}^{j-1} b_i b_j r_{ij} S_i S_j = R^2 S_Y^2, \qquad \sum_{j=1}^{3} b_j r_{yj} S_Y S_j = R^2 S_Y^2.$$

Substituting:

$$s^2_{Y \cdot x_1, x_2, x_3} = \frac{n-1}{n-4}\left[S_Y^2 + R^2 S_Y^2 - 2 R^2 S_Y^2\right]$$

Simplifying:

$$s^2_{Y \cdot x_1, x_2, x_3} = \frac{n-1}{n-4}\, S_Y^2\left[1 - R^2_{Y \cdot x_1, x_2, x_3}\right]$$

Therefore, the unbiased standard error of estimate is:

$$s_{Y \cdot x_1, x_2, x_3} = S_Y \sqrt{\frac{(n-1)\left(1 - R^2_{Y \cdot x_1, x_2, x_3}\right)}{n-4}}$$

END OF PROOF

Derivation for p Predictors

In this section, we show the general form of the unbiased standard error of
estimate when the regression model contains some unknown but finite number of
predictors (p). We will follow the same steps in the derivation that we used
for one, two and three predictors. It will be seen that the derivation for
the general case of p predictors is a straightforward multivariate
generalization. Formally, we will show that the unbiased standard error of
estimate for p predictors is:

$$s_{Y \cdot x_1, x_2, \ldots, x_p} = S_Y \sqrt{\frac{(n-1)\left(1 - R^2_{Y \cdot x_1, \ldots, x_p}\right)}{n-(p+1)}}$$

Definitions for terms in the formula were given in the section "Overview of
Derivation". Starting with the definition of the unbiased standard error of
estimate:

$$s_{Y \cdot x_1, \ldots, x_p} = \sqrt{\frac{\sum\left(Y - \hat{Y}\right)^2}{n-(p+1)}} = \sqrt{\frac{\sum\left(Y - a - b_1 x_1 - b_2 x_2 - \cdots - b_j x_j - \cdots - b_p x_p\right)^2}{n-(p+1)}}$$

As in the previous derivations, we will work with the variance error of
estimate:

$$s^2_{Y \cdot x_1, \ldots, x_p} = \frac{\sum\left(Y - a - b_1 x_1 - b_2 x_2 - \cdots - b_p x_p\right)^2}{n-(p+1)}$$

Now replace a by $\bar{Y}$ and express $Y - \bar{Y}$ in deviation score form:


$$s^2_{Y \cdot x_1, x_2, \ldots, x_j, \ldots, x_p} = \frac{\sum\left(y - b_1 x_1 - b_2 x_2 - \cdots - b_p x_p\right)^2}{n-(p+1)}$$

Expanding this multinomial:

$$s^2_{Y \cdot x_1, \ldots, x_p} = \frac{\sum\left(y^2 + b_1^2 x_1^2 + \cdots + b_j^2 x_j^2 + \cdots + b_p^2 x_p^2 - 2 y b_1 x_1 - 2 y b_2 x_2 - \cdots - 2 y b_j x_j - \cdots - 2 y b_p x_p + 2 b_1 b_2 x_1 x_2 + 2 b_1 b_3 x_1 x_3 + \cdots + 2 b_i b_j x_i x_j + \cdots + 2 b_{p-1} b_p x_{p-1} x_p\right)}{n-(p+1)}$$
Bringing the summation operator inside:

$$s^2_{Y \cdot x_1, \ldots, x_p} = \frac{\sum y^2 + b_1^2 \sum x_1^2 + \cdots + b_p^2 \sum x_p^2 - 2 b_1 \sum x_1 y - 2 b_2 \sum x_2 y - \cdots - 2 b_p \sum x_p y + 2 b_1 b_2 \sum x_1 x_2 + 2 b_1 b_3 \sum x_1 x_3 + \cdots + 2 b_{p-1} b_p \sum x_{p-1} x_p}{n-(p+1)}$$

Using the generalized substitution formulas given in Table 4, we can simplify
as follows:

$$s^2_{Y \cdot x_1, \ldots, x_p} = \frac{1}{n-(p+1)}\Big[(n-1) S_Y^2 + (n-1) b_1^2 S_1^2 + \cdots + (n-1) b_j^2 S_j^2 + \cdots + (n-1) b_p^2 S_p^2 - 2(n-1) b_1 r_{y1} S_Y S_1 - 2(n-1) b_2 r_{y2} S_Y S_2 - \cdots - 2(n-1) b_j r_{yj} S_Y S_j - \cdots - 2(n-1) b_p r_{yp} S_Y S_p + 2(n-1) b_1 b_2 r_{12} S_1 S_2 + 2(n-1) b_1 b_3 r_{13} S_1 S_3 + \cdots + 2(n-1) b_i b_j r_{ij} S_i S_j + \cdots + 2(n-1) b_{p-1} b_p r_{p-1,p} S_{p-1} S_p\Big]$$
Factoring out (n−1) and rearranging:

$$s^2_{Y \cdot x_1, \ldots, x_p} = \frac{n-1}{n-(p+1)}\Big[S_Y^2 + \left(b_1^2 S_1^2 + b_2^2 S_2^2 + \cdots + b_j^2 S_j^2 + \cdots + b_p^2 S_p^2 + 2 b_1 b_2 r_{12} S_1 S_2 + 2 b_1 b_3 r_{13} S_1 S_3 + \cdots + 2 b_i b_j r_{ij} S_i S_j + \cdots + 2 b_{p-1} b_p r_{p-1,p} S_{p-1} S_p\right) - 2\left(b_1 r_{y1} S_Y S_1 + b_2 r_{y2} S_Y S_2 + \cdots + b_j r_{yj} S_Y S_j + \cdots + b_p r_{yp} S_Y S_p\right)\Big]$$

Expressing the terms in parentheses in summation notation:

$$s^2_{Y \cdot x_1, x_2, \ldots, x_j, \ldots, x_p} = \frac{n-1}{n-(p+1)}\left[S_Y^2 + \left(\sum_{j=1}^{p} b_j^2 S_j^2 + 2\sum_{j=2}^{p}\sum_{i=1}^{j-1} b_i b_j r_{ij} S_i S_j\right) - 2\left(\sum_{j=1}^{p} b_j r_{yj} S_Y S_j\right)\right]$$

Table 6 shows equivalent forms of the multiple R² for p predictors (see
O'Brien, 1983a).

Table 6

Functions of R² for p Predictors

$$R^2 S_Y^2 = \sum_{j=1}^{p} b_j^2 S_j^2 + 2\sum_{j=2}^{p}\sum_{i=1}^{j-1} b_i b_j r_{ij} S_i S_j$$

$$R^2 S_Y^2 = \sum_{j=1}^{p} b_j r_{yj} S_Y S_j$$

Note: $R^2 \equiv R^2_{Y \cdot x_1, x_2, \ldots, x_p}$. See O'Brien, 1983a.

Thus:

$$\sum_{j=1}^{p} b_j^2 S_j^2 + 2\sum_{j=2}^{p}\sum_{i=1}^{j-1} b_i b_j r_{ij} S_i S_j = R^2 S_Y^2, \qquad \sum_{j=1}^{p} b_j r_{yj} S_Y S_j = R^2 S_Y^2.$$

Substituting into the variance error of estimate above:

$$s^2_{Y \cdot x_1, \ldots, x_p} = \frac{n-1}{n-(p+1)}\left[S_Y^2 - 2 R^2_{Y \cdot x_1, \ldots, x_p} S_Y^2 + R^2_{Y \cdot x_1, \ldots, x_p} S_Y^2\right] = \frac{n-1}{n-(p+1)}\, S_Y^2\left[1 - R^2_{Y \cdot x_1, \ldots, x_p}\right]$$

Therefore:

$$s_{Y \cdot x_1, \ldots, x_p} = S_Y \sqrt{\frac{(n-1)\left(1 - R^2_{Y \cdot x_1, \ldots, x_p}\right)}{n-(p+1)}}$$

END OF PROOF
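The general result holds for any finite p. The following sketch (again mine, not from the paper; the data are simulated, with p = 4) checks it:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 40, 4
X = rng.normal(size=(n, p))
Y = X @ np.array([1.5, -2.0, 0.5, 3.0]) + rng.normal(size=n)

x = X - X.mean(axis=0)                       # deviation scores
b, *_ = np.linalg.lstsq(x, Y - Y.mean(), rcond=None)
Y_hat = Y.mean() + x @ b

sse = np.sum((Y - Y_hat) ** 2)
see_def = np.sqrt(sse / (n - (p + 1)))       # definitional form

R2 = np.corrcoef(Y, Y_hat)[0, 1] ** 2        # squared multiple correlation
see_alt = Y.std(ddof=1) * np.sqrt((n - 1) * (1 - R2) / (n - (p + 1)))
print(see_def, see_alt)                      # the two values agree
```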


Derivations for Standard Score Model

Introduction

We have presented derivations for the unbiased standard error of estimate
for the linear raw score model when the number of predictors was one, two,
three and some finite number, p. In this part of the paper we will outline
the derivations for the standard score model.

The reader may be aware of the fact that there is a simple relationship
between models in raw score form and standard score (Z) form. This
relationship obviates the need for presenting detailed derivations for the Z
score model. Therefore, we will outline the derivations for the standard
score model, and leave the proofs as an exercise for the reader. We will show
the logic behind transforming from the linear raw score model to the Z score
model. First we take the standardized model for one predictor. We then
provide an outline for generalizing the derivation for the p predictor
standard score case.

Derivation for One Predictor


Recall the derivation for the one predictor raw score model. The derivation
of the standard error of estimate was shown to be:

$$s_{Y \cdot x_1} = S_Y \sqrt{\frac{(n-1)\left(1 - r^2_{x_1 y}\right)}{n-2}}$$

Let us now consider the model in standard score form. First, recall the
following relationships for the Z score model (see O'Brien, 1982b for
proofs):

$$S_{Z_Y} = 1, \qquad r^2_{Z_Y, Z_1} = r^2_{x_1 y}$$

That is, the standard deviation of the raw score variable Y is equal to unity
when Y is standardized. Also, the square of the simple (zero order) Pearson
correlation calculated in raw score form is identical to the squared
correlation between the same variables after each has been standardized.
Taking these facts into account, we can rewrite the raw score standard error
of estimate for Z scores as follows:

$$s_{Z_Y \cdot Z_1} = S_{Z_Y}\sqrt{\frac{(n-1)\left(1 - r^2_{Z_Y, Z_1}\right)}{n-2}} = (1)\sqrt{\frac{(n-1)\left(1 - r^2_{x_1 y}\right)}{n-2}} = \sqrt{\frac{(n-1)\left(1 - r^2_{x_1 y}\right)}{n-2}}$$

If one were to extend this logic to the case of p standardized predictors,
the standard error of estimate for p standardized predictors is:

$$s_{Z_Y \cdot Z_1, Z_2, \ldots, Z_p} = \sqrt{\frac{(n-1)\left(1 - R^2_{Z_Y \cdot Z_1, \ldots, Z_p}\right)}{n-(p+1)}} = \sqrt{\frac{(n-1)\left(1 - R^2_{Y \cdot x_1, x_2, \ldots, x_p}\right)}{n-(p+1)}}$$

For the p predictor case, $S_{Z_Y}$ also is equal to 1. It remains to be
proved that the squared multiple R's are equal to one another. It can be
shown that they are equal for p predictors, although this statement is
not proved in this paper.
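Although the equality of the raw score and standardized squared multiple R's is not proved here, a brief numerical sketch (mine, with simulated data) can illustrate it:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 30, 3
X = rng.normal(size=(n, p))
Y = X @ np.array([1.0, -0.5, 2.0]) + rng.normal(size=n)

def r_squared(X, Y):
    # Squared multiple correlation: corr(Y, Y_hat)^2 from least squares.
    A = np.column_stack([np.ones(len(Y)), X])
    Y_hat = A @ np.linalg.lstsq(A, Y, rcond=None)[0]
    return np.corrcoef(Y, Y_hat)[0, 1] ** 2

Z_X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardized predictors
Z_Y = (Y - Y.mean()) / Y.std(ddof=1)                # standardized criterion
print(r_squared(X, Y), r_squared(Z_X, Z_Y))         # identical values
```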

Outline for Derivations

The reader who desires to derive the unbiased standard error of estimate for
p linear standardized predictors may use the following outline as a guide.
Essentially, the steps parallel those for the raw score model. First, the
definitional form for the standard error of estimate is stated. Second, the
terms of the regression model for p predictors are substituted (see O'Brien,
1982c). Third, square the multinomial expression. Next, a series of
equations is substituted into the squares and cross products of the squared
multinomial; the reader may refer to the author's 1982c paper for the
relevant equations. The simplified expression is then expressed in summation
notation. Functions of the squared multiple R are substituted. Upon
simplification, the result will be the unbiased standard error of estimate
for the Z score model.

Many students who work out the derivations for the Z score model prefer to
work with several predictors in succession. This was our approach for the
raw score model derivations. A careful review of the steps used in the raw
score derivations may be helpful in working through the long, tedious
algebra.⁹

Appendix A

Errata for "A derivation of the sample multiple correlation formula for raw
scores," ED 235 205

Page / location                   Now reads                    Correct to
10, footnote, 3 lines down        X Y                          X Y
10, footnote, 4 lines down        n X Y                        n X Y
13                                var(b , x )                  var(b₂x₂)
16, footnote, last 2 lines        ... and simplifying. iSee    ... and simplifying. See the
                                  text for details.            text for details.
17, footnote                      Multiple R                   multiple R
24, footnote 1                    (omit this footnote)
29, equation                      x_p                          b_p
30, 3 lines from bottom           b₂ b_p r₂ₚ S₂ S_p            b₂ b_j r₂ⱼ S₂ S_j
36, 2 lines from bottom of text   change = to +:               = ... + b_j r_yj S_Y S_j
38, 2nd equation                  1                            2
43, last line in text             mathematical calculus        mathematical statistics

Appendix B

Discussion of Linear and Nonlinear Regression Models

This appendix will clarify terminology used in two previous papers (O'Brien,
1982c, 1983a). Some readers have requested clarification of my use of the
terms "linear" and "nonlinear" as they apply to regression analysis. There
are two reasons why this should be done. First, the terminology and/or
notation used in applied social science statistics textbooks and similar
sources is quite variable. This has the potential for causing confusion in
students' minds when attempting to read the same subject matter in different
sources. Second, it is very important to be clear about the differences
between a linear and nonlinear regression model. As will be seen, "truly"
nonlinear regression models are not often used in many areas of social
science. Our aim in this appendix merely is to clarify the uses of the
terminology. References are cited at the end of the appendix for readers who
desire to learn more about nonlinear regression models.
I believe confusion exists in the use of the terminology for several
reasons. Perhaps the basic factor relates to what students learn in
nonstatistical mathematics courses. The terms linear/nonlinear as they
relate to functions or relationships discussed in mathematics textbooks are
not used in the same way by statisticians when discussing linear/nonlinear
regression models.

Consider a simple example of the parabola (or quadratic or second degree
equation):

$$Y = f(X), \qquad -3 \le X \le 3,$$

where f is a second degree (quadratic) function of X.

If this function is plotted on ordinary graphing paper for values of X
between −3 and +3, the plot would show a curve opening downward with a
maximum height of 8 Y units at the origin. This function is not linear in
form because it cannot be expressed in the form of a first degree equation:

$$f(X) = a + bX$$

Geometrically, a plot of the quadratic function above would not reveal a
straight line or linear function. For these two reasons, the parabola may be
thought of as a "nonlinear" function.
Statisticians use the terms linear/nonlinear in a different manner. In the
statistician's use of the terms, the difference between them has more to do
with the form of the regression parameters (slope terms) than with the form
of the independent or dependent variables. In addition, a plot of the raw
observed data points is not relevant to classifying a regression model as
linear or nonlinear.
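To illustrate the statistician's usage with a small sketch of my own (not an example from this appendix; the data are simulated): the quadratic model Y = a + b₁X + b₂X² + e counts as a linear regression model, because it is linear in the parameters a, b₁ and b₂ and can therefore be fit by ordinary least squares, even though its plot is a curve:

```python
import numpy as np

rng = np.random.default_rng(3)
X = np.linspace(-3, 3, 25)
Y = 8.0 - 2.0 * X ** 2 + rng.normal(scale=0.5, size=X.size)

# The design matrix treats X and X^2 as two ordinary predictors,
# so the model is linear in the parameters a, b1, b2.
A = np.column_stack([np.ones_like(X), X, X ** 2])
a, b1, b2 = np.linalg.lstsq(A, Y, rcond=None)[0]
print(a, b1, b2)  # estimates recover roughly (8, 0, -2)
```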
Let us examine some examples. Assume the following regression model (adapted
from Draper and Smith, p. 264):

$$F = \exp\left(b_1 + b_2 X + e\right) \tag{1}$$

Where:

$F$ = the dependent variable,
$\exp$ = the exponentiation operator for the mathematical constant e = 2.71828 (approx.),
$b_1, b_2$ = parameters to be estimated,
$X$ = the independent variable,
$e$ = the stochastic error term (as used in this paper).

Note that equation 1 expresses what we have been calling a "raw score
model"; e.g., for equation 1, we could write $F = \exp(\hat{F} + e)$. Is the
model in (1) a linear or nonlinear regression model? We need to examine the
terms in (1) to decide.

Let us now rework equation 1 to render the model linear. If we take the
natural logarithm of each side of equation 1, we obtain:

$$\ln F = \ln\left[\exp\left(b_1 + b_2 X + e\right)\right] = b_1 + b_2 X + e \tag{2}$$

We now redefine the terms in equation 2. Let $Y = \ln F$. Then (2) becomes:

$$Y = b_1 + b_2 X + e \tag{3}$$

The model has now been linearized. Statisticians would call the regression
model expressed in (3) a linear model despite the fact that the relationship
between the dependent and independent variables is not one of a straight
line. Draper and Smith offer useful terminology to distinguish (1) from (3).
The regression model stated in (1) may be referred to as intrinsically
linear. This means that although equation 1 is nonlinear (with respect to the
parameters $b_1$ and $b_2$), transformations can be made to express the model
in a form which is linear (with respect to the parameters).
To take a second example (also from Draper and Smith), consider the
following regression model:

$$F = \frac{\exp\left(-b_1 X\right) - \exp\left(-b_2 X\right)}{b_1 - b_2} + e \tag{4}$$

Where:

$F$ = the dependent variable,
$\exp$ = as in equation 1,
$b_1, b_2$ = the parameters,
$X$ = the independent variable.

This model is nonlinear (with respect to the parameters). In addition,
equation 4 cannot be transformed such that the parameters will be linear in
form. Draper and Smith refer to such a regression model as intrinsically
nonlinear.
Further discussion and examples of linear/nonlinear regression models may be
found in Kendall and Stuart (1967), Mosteller and Tukey (1977) and Nie, et
al. (1975). Those references provide additional source material.

Notes

1. See O'Brien (1983a, Appendix B) for an errata sheet. Page references
given in the errata pertain to the original pagination (i.e., at the top of
the page).

2. Errata for this paper are given in Appendix A of the present paper.

3. Readers who need to review regression analysis theory can refer to
standard applied statistics textbooks. One that is highly recommended for
its thoroughness and clarity is by Lindeman, Gold and Merenda (1982). A
general overview is given by Lewis-Beck (1980).

4. See Appendix B for a discussion of linear and nonlinear regression models.

5. If it is understood that the summation limits range from the first
observation (i = 1) to the last (i = n), then we can drop the summation
limits; n refers to the total number of observations for the criterion and
predictor(s). This sample size is the same regardless of the number of
predictors. Later, when the algebra becomes more complex, we use summation
limits extensively.

6. As mentioned earlier, it is assumed that the reader is familiar with the
author's 1983a paper.

7. The regression model for one standardized predictor is:

$$\hat{Z}_Y = A + B_1 Z_1$$

The observed standard score model is:

$$Z_Y = \hat{Z}_Y + e$$

Where:

$\hat{Z}_Y$ = the predicted criterion in standard score form,
$A$ = the slope intercept term (not standardized; see O'Brien, 1982c),
$Z_1$ = the standardized predictor; i.e., $Z_1 = (X_1 - \bar{X}_1)/S_1$, where $S_1$ is the standard deviation of $X_1$,
$B_1$ = the slope term (regression or beta weight),
$e$ = the prediction error.

8. The reader may wonder why we divide by the term n−2. This term represents
the degrees of freedom for the unbiased standard error of estimate for one
predictor. It can be shown that dividing by the appropriate degrees of
freedom term makes the sample standard error of estimate unbiased; i.e., the
expected value of the sample standard error of estimate equals the population
parameter. In general, the degrees of freedom for the unbiased standard
error of estimate is n−(p+1), where p = the number of predictors in the
regression model. For one predictor, n−(1+1) = n−2. The p+1 in n−(p+1)
arises from the number of parameters that must be estimated in any raw score
linear regression model: p slope (b) terms plus the slope intercept term.
For a good discussion of degrees of freedom, see the classic paper by Helen
Walker (1940, 1971). See also Stilson (1966).
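The unbiasedness property mentioned in note 8 can be stated compactly. Under the usual assumptions of the linear model with error variance $\sigma_e^2$ (a standard result, stated here without proof):

$$E\left[\frac{\sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2}{n-(p+1)}\right] = \sigma_e^2$$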
9. An alternate approach to the derivations could be used by working with
matrix algebra notation. The author intends to present the derivations of
this paper and others in this series in matrix algebra. They will be written
as part of this series for ERIC.

References

Draper, Norman and Harry Smith. Applied Regression Analysis. New York: John
Wiley & Sons, 1966.

Kendall, Maurice G. and Alan Stuart. The Advanced Theory of Statistics:
Inference and Relationship, Vol. 2 (2nd ed.). New York: Hafner Publishing
Co., 1967.

Lewis-Beck, Michael S. Applied Regression: An Introduction. Beverly Hills,
CA: Sage Publications, 1980.

Lindeman, Richard H., Ruth Gold and Peter Merenda. Bivariate and
Multivariate Analysis. Chicago: Scott, Foresman and Co., 1982.

Mosteller, Frederick and John W. Tukey. Data Analysis and Regression: A
Second Course in Statistics. Mass.: Addison-Wesley Publishing Co., 1977.

Nie, Norman H. et al. Statistical Package for the Social Sciences (2nd ed.).
NY: McGraw-Hill Book Co., 1975.

O'Brien, Francis J., Jr. A proof that t² and F are identical: the general
case, 1982a. ERIC ED 215 894.

________. Proof that the sample bivariate correlation coefficient has limits
±1, 1982b. ERIC ED 216 874.

________. A derivation of the sample multiple correlation formula for
standard scores, 1982c. ERIC ED 223 428.

________. A derivation of the sample multiple correlation formula for raw
scores, 1983a. ERIC ED 235 205.

Stilson, Donald W. Probability and Statistics in Psychological Research and
Theory. San Francisco, CA: Holden-Day, Inc., 1966.

Walker, Helen M. Degrees of freedom. Journal of Educational Psychology, 31,
1940, 253-269. Reprinted in Readings in Statistics for the Behavioral
Scientist, Joseph A. Steger (Ed.). NY: Holt, Rinehart & Winston, Inc., 1971.

DOCUMENT RESUME

ED 215 894                                                      SE 037 098

AUTHOR      O'Brien, Francis J., Jr.
TITLE       A Proof That t² and F are Identical: The General Case.
PUB DATE    24 Apr 82
NOTE        20p.

EDRS PRICE  MF01/PC01 Plus Postage.
DESCRIPTORS *College Mathematics; Equations (Mathematics); Higher
            Education; Instructional Materials; *Mathematical
            Applications; Mathematical Concepts; *Mathematical
            Formulas; Mathematics; *Proof (Mathematics); *Research
            Tools; *Statistics; Supplementary Reading Materials

ABSTRACT

This document proves that the F statistic can be obtained by squaring t-test
values, or that equivalent t-test values may be obtained by extracting the
positive square roots of F values. Proofs to varying degrees of completeness
and accessibility have been given by other scholars, but generally these
prior statements, particularly those available to students of education or
psychology, focus on the special case when sample sizes are equal. No source
could be found that provides a complete, detailed proof of the general case
that was understandable to students of applied statistics. This document
seeks to give a clear step-by-step proof, with a numerical example worked
out, and a plan is provided for proving the special case. It is felt the
reader should be able to follow the proof of the general case, and should
therefore have little difficulty in translating the acquired knowledge into
proving the special case. (MP)

A Proof that t² and F are Identical: The General Case

Francis J. O'Brien, Jr., Ph.D.

It is well known that a researcher who has collected data from two
independent groups may perform either a t-test for independent samples or a
one-way analysis of variance for two groups. This is because knowledge of
results from one type of computation can be transformed into an equivalent
result for the other type of computation. For example, if a t-test for
independent samples is calculated, the equivalent F statistic can be obtained
by squaring the t-test value. Analogously, if the researcher has available
the F statistic obtained from a one-way analysis of variance for two groups,
the equivalent t-test value may be obtained by extracting the positive square
root of the F value. That is, $t^2 = F$, or $t = F^{1/2}$.

The proof that $t^2 = F$ has been given to varying degrees of completeness
and accessibility to students by other scholars in professional journals
(see Rucci and Tweney, 1980 for citations). Statistics textbooks commonly
available to students of education or psychology occasionally provide hints
for proving the special case of the relationship (when sample sizes are
equal) (see, for example, Glass and Stanley, 1970).

The motivation for presenting the proof is twofold. First, many prior
statements of the proof for the general case (of unequal sample sizes) are
either abbreviated, mathematically inaccessible or incomplete for
understanding this important relationship. A search of the literature did
not reveal a source that provided a complete, detailed proof of the general
case that was understandable to students of applied statistics. Second, a
full step-by-step proof for the general case will give readers a sense that
statistics is not all just a babel of "Greek arithmetic". As a former
instructor of graduate level applied statistics, I know that many students
can follow well-articulated proofs and desire to see them worked out.
/

In this paper

three

tasks will be accomplished.

a clear step-by-step proof that t

Firsty

= F in the general case will be provided.

Second, a'buMerical example will be worked out.

-provided for proving the speciaLcase.

Third,

a plan will be

It is felt that the reader should

be able to follow the proof of the general case, and therefore, shOuld

have little difficulty in translating the acquired knowledge into


proving the special case.

Proof that t² = F: the General Case

First, let us lay out a table of symbolic values in order to introduce a
familiar notation and the variables used in the proof. This is done in
Table 1.

The plan for the proof is important for understanding the strategy involved
in attacking a statistical proof. The steps of the plan that will be used
here are given below:

1. State the form of the t-test statistic using the notation of Table 1.
2. Square the t in step 1.
3. State the form of the F statistic using the notation of Table 1.
4. Simplify algebraically the F statistic in step 3.
5. Observe that the simplified F of step 4 is equal to the squared t of
step 2.
Table 1

Table Layout

Notes for Table 1

for Two Independent Groups


1.

oup 1

sample sizes are assumed


unequal., That is,

Group 2

n.

X11

12
2.

21

the total sample size is

22
t..

31

41

n.

n.
2

32
3.

42

the grand mean (x..) is a


weighted mean since
sample sizes are unequal.
That is,
0

AP
+

X.

n.
1

n. 2X.2
'2

n.1 ,+

n.
2

X.

12

Ime

4.

s ?.

is not needed

for the proof. It is


included only for
completeness.

nl

Total

Sample
Size

n.

Sample
Mean

i.

n.

x..

X.2

Sample
Variance

n..

s.

s.

41

s..

N-

This is the expression we will use as a basis of comparison with the
simplified F statistic to be obtained in step 4. Note that the above squared
t value is referred to simply as t².

Step 3: the F statistic.

It will be recalled that the F statistic is the ratio of two independent
sums of squares: a between sums of squares ($SS_b$) and a within sums of
squares ($SS_w$).* Also, each sums of squares is divided by an appropriate
degrees of freedom term: $df_b$ (between sums of squares degrees of freedom)
and $df_w$ (within sums of squares degrees of freedom). The general
expression of the F statistic (for any number of groups) is:

$$F_{J-1,\; n_{..}-J} = \frac{SS_b / df_b}{SS_w / df_w}$$

The general form of $df_b$ is "the number of groups minus one" (i.e.,
$df_b = J - 1$, where J is the number of groups). For two groups, J = 2, and
so $df_b$ for two groups is $df_b = 2 - 1 = 1$. The general form of $df_w$ is
"the total sample size minus the number of groups" (i.e., $df_w = n_{..} - J$).
Since $n_{..} = n_{.1} + n_{.2}$, for J = 2 groups we can write
$df_w = n_{.1} + n_{.2} - 2$.

* Grammarians will point out that the preposition "between" refers to the
relationship of two entities while "among" refers to more than two. However,
since the reference here is ultimately to two groups, "between" will be used
instead of the correct "among"; I accept the righteous indignation of the
grammarian.
Thus, using the notation of Table 1, we can write the F statistic for two
groups as follows:

$$F_{1,\; n_{.1}+n_{.2}-2} = \frac{SS_b / 1}{SS_w / (n_{.1}+n_{.2}-2)} = \frac{SS_b}{SS_w / (n_{.1}+n_{.2}-2)}$$

In order to facilitate the proof, we will write F in the familiar terms of
means and variances. That is:

$$F = \frac{n_{.1}\left(\bar{X}_{.1}-\bar{X}_{..}\right)^2 + n_{.2}\left(\bar{X}_{.2}-\bar{X}_{..}\right)^2}{\dfrac{(n_{.1}-1)\,s_{.1}^2 + (n_{.2}-1)\,s_{.2}^2}{n_{.1}+n_{.2}-2}}$$

We will refer to this expression as simply F.

It is instructive now to compare the values of t² and F. For the reader's
convenience, we will restate t² and F so that they may be compared and
referred to later. This is done in Table 2.
7.

Table 2
L.

Restatement of t

and F

t2

F
41.

z
.-X.

(X.
1

(n: -.1)s.

(n.

+ n.2\

)s..

(A

-1)s.

:2

-R.:)2

+ (n. -1).
2
-2

,n.1

+ n.

(i.

+,n.
+

n.

(n.

)(n.

+ n.

n.

Step 4: simplify F.

This is the next to the last step in the proof. A full step-by-step proof
requires several algebraic steps. We will first simplify the numerator of F.
Notes pertaining to the algebraic manipulations accompany each step for the
reader's convenience.
We start by looking at the numerator of F ($SS_b$):

$$SS_b = n_{.1}\left(\bar{X}_{.1} - \bar{X}_{..}\right)^2 + n_{.2}\left(\bar{X}_{.2} - \bar{X}_{..}\right)^2$$

Since

$$\bar{X}_{..} = \frac{n_{.1}\bar{X}_{.1} + n_{.2}\bar{X}_{.2}}{n_{.1} + n_{.2}},$$

we can substitute for $\bar{X}_{..}$:

$$SS_b = n_{.1}\left[\bar{X}_{.1} - \frac{n_{.1}\bar{X}_{.1} + n_{.2}\bar{X}_{.2}}{n_{.1} + n_{.2}}\right]^2 + n_{.2}\left[\bar{X}_{.2} - \frac{n_{.1}\bar{X}_{.1} + n_{.2}\bar{X}_{.2}}{n_{.1} + n_{.2}}\right]^2$$

Finding common denominators for each bracketed term:

$$SS_b = n_{.1}\left[\frac{(n_{.1}+n_{.2})\bar{X}_{.1} - \left(n_{.1}\bar{X}_{.1} + n_{.2}\bar{X}_{.2}\right)}{n_{.1}+n_{.2}}\right]^2 + n_{.2}\left[\frac{(n_{.1}+n_{.2})\bar{X}_{.2} - \left(n_{.1}\bar{X}_{.1} + n_{.2}\bar{X}_{.2}\right)}{n_{.1}+n_{.2}}\right]^2$$

Removing the inside parentheses, multiplying out the group means,
subtracting, and cancelling like terms:

$$SS_b = n_{.1}\left[\frac{n_{.2}\bar{X}_{.1} - n_{.2}\bar{X}_{.2}}{n_{.1}+n_{.2}}\right]^2 + n_{.2}\left[\frac{n_{.1}\bar{X}_{.2} - n_{.1}\bar{X}_{.1}}{n_{.1}+n_{.2}}\right]^2$$

Factoring like sample size terms:

$$SS_b = n_{.1}\left[\frac{n_{.2}\left(\bar{X}_{.1} - \bar{X}_{.2}\right)}{n_{.1}+n_{.2}}\right]^2 + n_{.2}\left[\frac{n_{.1}\left(\bar{X}_{.2} - \bar{X}_{.1}\right)}{n_{.1}+n_{.2}}\right]^2$$

Squaring each term separately inside the brackets:

$$SS_b = \frac{n_{.1}\, n_{.2}^2\left(\bar{X}_{.1} - \bar{X}_{.2}\right)^2}{\left(n_{.1}+n_{.2}\right)^2} + \frac{n_{.2}\, n_{.1}^2\left(\bar{X}_{.2} - \bar{X}_{.1}\right)^2}{\left(n_{.1}+n_{.2}\right)^2}$$

Note that $\left(\bar{X}_{.2} - \bar{X}_{.1}\right)^2$ is the same as
$\left(\bar{X}_{.1} - \bar{X}_{.2}\right)^2$, because squared differences of
the same two quantities are equal. Hence we can factor within the brackets
to obtain:

$$SS_b = \frac{n_{.1}\, n_{.2}\left(n_{.1}+n_{.2}\right)\left(\bar{X}_{.1} - \bar{X}_{.2}\right)^2}{\left(n_{.1}+n_{.2}\right)^2}$$

Simplifying $\left(n_{.1}+n_{.2}\right)$ against
$\left(n_{.1}+n_{.2}\right)^2$, we obtain:

$$SS_b = \frac{n_{.1}\, n_{.2}\left(\bar{X}_{.1} - \bar{X}_{.2}\right)^2}{n_{.1}+n_{.2}}$$

This is just about it. Now substitute the value of $SS_b$ just obtained into
the F statistic, and obtain:

$$F = \frac{\dfrac{n_{.1}\, n_{.2}\left(\bar{X}_{.1} - \bar{X}_{.2}\right)^2}{n_{.1}+n_{.2}}}{\dfrac{(n_{.1}-1)\,s_{.1}^2 + (n_{.2}-1)\,s_{.2}^2}{n_{.1}+n_{.2}-2}}$$

If we divide the numerator and denominator by
$n_{.1}\, n_{.2}/(n_{.1}+n_{.2})$, we obtain:

$$F = \frac{\left(\bar{X}_{.1} - \bar{X}_{.2}\right)^2}{\left[\dfrac{(n_{.1}-1)\,s_{.1}^2 + (n_{.2}-1)\,s_{.2}^2}{n_{.1}+n_{.2}-2}\right]\left[\dfrac{n_{.1}+n_{.2}}{n_{.1}\, n_{.2}}\right]}$$

Step 5: observe that t² = F.

If the value of F above is compared with the value of t², it will be seen
that they are in the same form, signifying that they are equal. Refer to
Table 2 for this comparison. END OF PROOF.

This completes the entire proof in accordance with the five step plan. We
now turn to a numerical example for two independent groups of unequal sample
size.

Numerical Example

The analytic proof using algebraic rules for the general case was given in
great detail. A numerical example should provide additional insight.

The data and descriptive statistics are provided in Table 3, which is
modeled on Table 1. Note that the data were chosen for illustrative purposes
only.

Table 3

Data for Numerical Example

                     Group 1      Group 2
                     10           50
                     40           70
                     20           100
                     50           40
                     75
                     90

Sample size          6            4            Total: 10
Sample mean          47.5         65.0         Grand mean: 54.5
Sample variance      957.5        700.0        (not needed)

Refer to Table 2 for the t-test formula. Using the formula there, the t
statistic value is:

$$t = \frac{47.5 - 65.0}{\sqrt{\left[\dfrac{(6-1)957.5 + (4-1)700.0}{6+4-2}\right]\left[\dfrac{10}{24}\right]}} = \frac{-17.5}{\sqrt{\left[\dfrac{6887.5}{8}\right]\left[\dfrac{10}{24}\right]}} = -.9240$$

If we square this value, we obtain $(-.9240)^2 = .8537$. We place the
computed value in the margin for easy reference: ($t^2 = .8537$).

Now compute the F statistic using the formula provided in Table 2:

$$F = \frac{6(47.5-54.5)^2 + 4(65.0-54.5)^2}{\dfrac{(6-1)957.5 + (4-1)700.0}{6+4-2}} = \frac{6(49) + 4(110.25)}{6887.5/8} = .8537$$

($F = .8537$)

Thus, $t^2 = .8537$ and $F = .8537$; that is, $t^2 = F$.
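The same numerical check can be run in a few lines (a sketch using SciPy, not part of the original document):

```python
import numpy as np
from scipy import stats

g1 = np.array([10.0, 40.0, 20.0, 50.0, 75.0, 90.0])
g2 = np.array([50.0, 70.0, 100.0, 40.0])

t, _ = stats.ttest_ind(g1, g2)   # pooled-variance t-test for two groups
F, _ = stats.f_oneway(g1, g2)    # one-way ANOVA for the same two groups
print(t ** 2, F)                 # both are approximately 0.8537
```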

Proof of the Special Case

If the sample sizes are equal in a research design of two independent
groups, the proof that t² = F is somewhat easier to derive. Rather than
perform the necessary algebraic manipulations here, it may prove instructive
for the reader to actually derive it himself or herself. Working through the
analytic proof for the special case will solidify an understanding of the
proof for the general case.

Some hints will be provided for the reader in proving the special case.
They are summarized as follows:

1. $n_{.1} = n_{.2} = n$. That is, since sample sizes are equal, one symbol
for sample size may be used; it could be called n.

2. $\bar{X}_{..} = \left(\bar{X}_{.1} + \bar{X}_{.2}\right)/2$. Also, since
sample sizes are equal, the grand mean ($\bar{X}_{..}$) is simply the average
of the means of the two groups. This value should be substituted for
$\bar{X}_{..}$ in the proof.

By making these two changes and by following the five step outline used for
the general case proof, the reader should be able to derive the proof for
the special case. One may also wish to "make up" an easy-to-work-with
numerical data set to check on the process.

Note

For students or researchers who enjoy proofs in applied statistics, the
following two references may be useful:

Edwards, Allen L. Expected Values of Discrete Random Variables and
Elementary Statistics. New York: Wiley, 1964.

Guilford, J. P. and Fruchter, Benjamin. Fundamental Statistics in Psychology
and Education, 4th and 5th editions. New York: McGraw-Hill, 1973.

References

Glass, Gene V and Stanley, Julian C. Statistical Methods in Education and
Psychology. Englewood Cliffs, New Jersey: Prentice-Hall, 1970.

Rucci, Anthony J. and Tweney, Ryan D. Analysis of variance and the "second
discipline" of scientific psychology: a historical account. Psychological
Bulletin, 1980, Vol. 87, No. 1, 166-184.