Michael G. Aamodt
DCI Consulting Group and Radford
University
Michael A. Surrette
Springfield College
David B. Cohen
DCI Consulting Group
© 2016 Thomson Wadsworth, a part of The Thomson Corporation. Thomson, the Star logo, and Wadsworth are trademarks used herein under license.

Thomson Higher Education
10 Davis Drive
Belmont, CA 94002-3098
USA

All Rights Reserved. No part of this work covered by the copyright hereon may be reproduced or used in any form or by any means (graphic, electronic, or mechanical, including photocopying, recording, taping, Web distribution, information networks, or information storage and retrieval systems) without the written permission of the publisher.

Printed in the United States of America
1 2 3 4 5 6 7   13 12 11 10 09

ISBN: 0-495-18663-5
Contents

4. Understanding Correlation  51
5. Understanding Regression  71
6. Meta-analysis  91
References  115
Index  119
Introduction and Acknowledgments
_______________________________
1. The Concept of Statistical Analysis
_______________________________

Reasons for Analyzing Data

TO DESCRIBE DATA

question (sample size), how the typical employee responded to each question (mean, median, mode), and the extent to which the employees answered the questions in similar ways (variance, standard deviation, range). These types of descriptive statistics will be discussed in detail in Chapter 2.
TO DETERMINE IF TWO OR MORE GROUPS DIFFER ON SOME VARIABLE

These statistics will be discussed in detail in Chapter 3.
TO DETERMINE IF TWO OR MORE VARIABLES ARE RELATED

A question often asked in research is the extent to which two or more variables are related, rather than different. For example, we might ask if a test score is related to job performance, if job satisfaction is related to employee absenteeism, or if the amount of money spent on recruitment is related to the number of qualified applicants. To determine if the variables are related, we might use a correlation. If we wanted to be a bit more precise or are interested in how several different
At times, we have a lot of data that we think can be simplified. For example, we might have a 100-item questionnaire. Rather than separately analyze all 100 questions, we might check to see if the 100 questions represent five major themes/categories/factors. To reduce data, we might use a factor analysis or a cluster analysis. Factor analysis will be discussed in detail in Chapter 6.
Significance Levels

Significance levels are one of the nice things about statistical analysis. If you are reading an article about the effectiveness of a new training technique and don't care a thing about statistics, you can move through the alphabet soup describing the type of analysis used (e.g., ANOVA, MANOVA, ANACOVA) and go right to the significance level, which will be written something like "p < .03." What this is telling you is that the difference in performance between two or more groups (e.g., trained versus untrained or men versus women) is significantly different at some level of chance.
What is the need for significance levels? Suppose that you walk into a training room and ask the people on the right side of the room how old they are and then do the same to people sitting on the left side of the room. You find that the average age of the people sitting on the right side of the room is 37.6, whereas the average age of people on the left side of the room is 39.3 years. Does this difference make you want to submit a paper on the subject? Could it be that older people sit closer to the door so they don't have to walk as far? It could be, but probably not. Any time we collect data from two or more groups, the numbers will never be identical. The question then
becomes, if the numbers are never identical, how much of a difference does it take before we can say that something is actually happening? This is where significance levels come in. Based on a variety of factors such as sample size and variance, the end result of any statistical analysis is a significance level that indicates the probability that our results occurred by chance alone. If our analysis indicates that the groups differ at p < .03, then we would conclude that there are 3 chances in 100 that the differences we obtained were the result of fate, karma, or chance. In the social sciences, we have a very dumb rule that if the probability is less than 5 in 100 that our results could be due to chance (p < .05), we say that our results are statistically significant.
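The point that two groups drawn from the same population will almost never have identical means can be demonstrated with a quick simulation. The sketch below is for illustration only (the ages, group sizes, and labels are made up, not from the text):

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the illustration is repeatable

# Forty hypothetical trainees, split into left and right halves of the room
ages = [random.randint(25, 55) for _ in range(40)]
left_side, right_side = ages[:20], ages[20:]

# Both groups come from the same population, yet their means will
# rarely match exactly; chance alone creates small differences.
print(mean(left_side), mean(right_side))
```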
CHOOSING A SIGNIFICANCE LEVEL

Although .05 is the significance level traditionally used in the social sciences, in some circumstances researchers may choose to use a more liberal or a more conservative level. This choice is a function of the cost associated with being wrong. When interpreting the results of a statistical analysis, there are two ways in which an interpretation can be wrong: a Type I error and a Type II error. To explain these errors, let's imagine a study in which a personnel analyst is trying to predict job performance by using an employment test.
With a Type I error, the researcher concludes that there is a relationship between the test and job performance when in fact there is no such relationship. With a Type II error, the researcher concludes that there is no relationship between the two variables when in fact there is one. By using a more conservative significance level such as .01 or .001, a researcher is trying to reduce the chance of a Type I error. Likewise, by using a more liberal significance level such as .10 or .15, a researcher is trying to reduce the chance of a Type II error.

The decision to use a particular significance level is determined by the cost of being wrong. If the employment test
is expensive to administer and might result in the hiring of fewer women and minorities, our personnel analyst might want to use a conservative significance level (e.g., .01) to reduce the chance of a Type I error. That is, if we are going to spend a great deal of money to use a test that might also decrease the diversity of our workforce, we want to be very sure that the test actually predicts performance. If, however, the test costs 20 cents per applicant and does not result in adverse impact, we might be more willing to risk a Type I error, using a test that doesn't actually predict performance. If this were the case, we might be willing to accept a significance level of .10.
In addition to considering the financial and social costs of being wrong, significance levels can be selected on the basis of previous research. That is, if 50 previous studies found a significant relationship between an employment test and job performance, we might be more willing to consider a probability level of .07 to be significant than we would if there were no previous studies.
Statistical significance levels only tell us if we are allowed to pay attention to our results. They do not tell us if our results are important or useful. If our results are statistically significant, we get to interpret them and make decisions about practical significance. If they are not statistically significant, we start again.
STATISTICAL SIGNIFICANCE IN JOURNAL ARTICLES

Statistical significance levels are usually presented in journal articles or conference papers in one of two ways. The first way is to list the significance level in the text. For example, an article might read:

The job satisfaction level of female employees (M = 4.21) was significantly higher than that of male employees (M = 3.50), t(60) = 2.39, p < .02.
The M = 4.21 and M = 3.50 are mean scores on a job satisfaction scale, the 60 is the degrees of freedom (you will learn about this in Chapter 3), the 2.39 is the value of the t-test (you will learn about this in Chapter 3), and the p < .02 tells us that there are only 2 chances in 100 that we would expect similar results purely by chance. In other words, the difference in satisfaction between men and women is statistically significant.
The second way to depict a significance level is to use asterisks in a table. Take for example the numbers shown in Table 1.1. The correlation of .12 between cognitive ability and commendations does not have any asterisks, indicating that it is not statistically significant. The correlation between education and commendations has one asterisk, indicating that the correlation is significant at the .05 level. The correlation between education and performance in the police academy has two asterisks, indicating that it is significant at the .01 level, and the three asterisks above the .43 indicate that the correlation between cognitive ability and academy performance is significant at the .001 level of confidence. The greater the number of asterisks, the greater the confidence we have that the number did not occur by chance.
Practical Significance

If our results are statistically significant, we then ask about the practical significance of our findings. This is usually done by looking at effect sizes, which can include d scores, correlations (r), omega squares, and a host of other awful-sounding terms. Effect sizes are important to understand because we can obtain statistical significance with large sample sizes but have results with no practical significance.
Table 1.1
Example of statistical significance

                     Academy Score   Commendations
Cognitive ability        .43***           .12
Education                .28**            .24*

* p < .05, ** p < .01, *** p < .001
For example, suppose that we conduct a study with a million people and find that women score an average of 86 on a math test and men score an average of 87. With such a big sample size, we would probably find the difference between the two scores to be statistically significant. However, what would we conclude about the practical significance of a one-point difference between men and women on a 100-point exam? Are men superior to women in math? Will we have adverse impact if we use a math test for selection? Should we discourage our daughters from a career in science? Probably not. The statistical significance allows us to confidently say that there is little difference between men and women on this variable. If we compute an effect size, we can say this in a more precise way.
Another good example of the importance of practical significance comes from the computation of adverse impact statistics. Imagine a situation in which an employer selects 99% of the men and 98% of the women applying for production jobs. From a practical significance perspective, the 1% difference suggests that the employer is essentially hiring males and females at equal rates. If there were 200 men and 200 women in the analysis, the 1% difference would not be statistically significant. If, however, there were 2,000 men and 2,000 women in the analysis, that same 1% difference would now be statistically significant. Without the consideration of practical significance, one might conclude that because the 1% difference in the large sample is statistically significant, the employer might be discriminating against women.
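The flip from "not significant" to "significant" as the sample grows can be checked with a two-proportion z-test. This is a sketch using a standard textbook formula and Python's standard library, not a computation from the text:

```python
from statistics import NormalDist

def two_proportion_z(hired_a, n_a, hired_b, n_b):
    """Two-proportion z-test; returns the z value and two-sided p value."""
    p_a, p_b = hired_a / n_a, hired_b / n_b
    pooled = (hired_a + hired_b) / (n_a + n_b)
    se = (pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (p_a - p_b) / se
    return z, 2 * (1 - NormalDist().cdf(abs(z)))

# 99% of 200 men vs. 98% of 200 women: p is well above .05
z_small, p_small = two_proportion_z(198, 200, 196, 200)

# The same 1% gap with 2,000 in each group: p now falls below .05
z_large, p_large = two_proportion_z(1980, 2000, 1960, 2000)
```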
Types of Measurement
points. Likewise, as shown in Table 1.2, the score difference between the applicants in first and second place is not necessarily the same score difference between the applicants in second and third place.

Interval data have equal intervals but not necessarily equal ratios. Examples of interval data include performance ratings, the temperature outside, and a score on a personality test. Let's use temperature to demonstrate. A thermometer has equal intervals in that the distance between 89 and 90 degrees and the distance between 54 and 55 degrees is the same (one degree). However, a temperature of 80 degrees is not twice as hot as a temperature of 40 degrees. Thus, although the intervals between points on the scale are equal, the ratio is not.
Ratio data have equal ratios and a true zero point. Examples of ratio data include salary, height, and the number of job applicants. All three have a true zero point in that someone can have no salary, there can be no job applicants, and if something doesn't exist, it can have no height. The ratios are equal in that 10 job applicants is twice as many as 5, a salary of $40,000 is twice as much as a salary of $20,000, and a desk six feet in length is twice as long as a desk that is three feet in length.

Now that you have some of the basics, the following chapters will provide information about particular types of statistics.
Table 1.2
Applicant List for the Blue Moon Detective Agency

Rank   Applicant          Score
1      Maddie Hayes         99
2      David Addison        94
3      Tom Magnum           93
4      John Shaft           89
5      Frank Cannon         88
6      Thomas Banacek       87
7      Nora Charles         80
8      Joe Mannix           75
9      Jessica Fletcher     71
2. Statistics That Describe Data
____________________________________
Table 2.01
Raw Salary Data

Employee   Hourly Rate
Jim        $15.35
Ryan       $16.37
Pam        $15.35
Dwight     $16.11
Michael    $15.10
Oscar      $17.05
Kevin      $16.80
Sample Size

An important element in interpreting the value of a piece of research is the sample size, the number of participants included in a particular study. The number of participants
may include an entire population (e.g., all employees at the Pulaski Furniture Plant) but more than likely represents a sample (100 students from Radford University) of a larger population (all college students in the United States). In most journal or technical reports, the number of people in a sample is denoted by the letter N and the number of people in a subsample (e.g., number of men, number of women) is denoted by a lowercase n.
Research results derived from studies conducted with a small number of individuals should be interpreted with a lower degree of confidence than a study conducted with a large number of participants. It is important to note, however, that we also need to be aware of the difference between a small sample size and a small population. We remember being at a conference when one of the audience members questioned the accuracy of a presenter's results, because the sample size was only 25 participants. The speaker paused for a moment and then told the audience member that the 25 participants represented everyone in his police department, that is, the entire population. The interesting part of this story is that the audience member continued to comment that the use of 25 participants was still not acceptable. What the audience member failed to understand is that, although large samples are preferred over small samples, you can never acquire a sample size larger than the population available to you.
If a sample is used rather than an entire population, it is important to consider two aspects of the sample: the extent to which it is random and the extent to which it is representative of the population. In a random sample, every member of the population has an equal chance of being chosen for the sample. For example, suppose a large organization wants to determine the satisfaction level of its 3,000 employees. Because the budget for the project is not large enough to survey all 3,000 employees, the organization decides to sample 500 employees. To choose the 500, the organization might use a random numbers table or draw employee names from a hat. The more random the sample, the lower the sample size needed to generalize the results to the entire population.
Unfortunately, in most research, the samples used are certainly not randomly selected. For example, suppose that a researcher at a university wants to study the relationship between employee personality and performance in a job interview. The population in this case would be every applicant in the world who has ever been on an employment interview. Ideally, the researcher would randomly sample from this population. However, as you can imagine, this would be impossible. So instead, the researcher might give a personality test to 250 applicants for positions at a local manufacturer and then try to generalize the results to other applicants. These 250 applicants would be called a convenience sample. Because the convenience samples used in most studies are drawn from one organization (e.g., municipal employees for the City of Mobile, Alabama) located in one region (e.g., south) of one country (e.g., U.S.), caution should be taken in generalizing the results to other organizations or cultures.
Convenience samples are fine as long as two conditions are met. The first is that the convenience sample must be similar to the population to which you want to apply your results. That is, the affirmative action opinions of 18-year-old college females in Alabama may not generalize to 50-year-old male factory workers in Ohio.
The second condition is that members of the convenience sample must be randomly assigned to the various research groups. Take for example a researcher wanting to study the effects of a training program on employee productivity. Before spending $100,000 to train all 500 employees in the plant, the researcher might take a convenience sample (30 employees on the night shift) and randomly assign 15 to receive training (experimental group) and 15 to not receive training (control group). The subsequent job performance of the two groups can then be compared.
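Random assignment of a convenience sample can be done with any random number generator. Here is a minimal Python sketch; the employee labels and the fixed seed are hypothetical details added for illustration:

```python
import random

random.seed(42)  # fixed seed so the illustration is repeatable

# A hypothetical convenience sample: 30 night-shift employees
night_shift = [f"employee_{i}" for i in range(1, 31)]

trained = random.sample(night_shift, 15)                # experimental group
control = [e for e in night_shift if e not in trained]  # control group
```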
A sample is considered to be representative of the population if it is similar to the population in such important characteristics as sex, race, and age. Random samples are typically also representative samples. If a sample is not
random, it is important to compare the percentage of women, minorities, older people, and other variables of interest to the percentages in the relevant population. If the sample differs from the population, it is difficult to generalize the findings of the study.
Although in most cases it is important to have a representative sample, there are times when it is necessary to oversample certain types of employees. A good example of such a situation might be an employee attitude survey at an organization in which only 10% of the employees are women. If a random sample of 20 employees were drawn from a population of 200, only two women would be in the sample, not enough to compare the attitudes of women to men. To ensure that gender differences in attitudes could be investigated, one might randomly select 10 of the 180 men and 10 of the 20 women.
Measures of Central Tendency

Sets of statistics that describe a set of raw data are collectively referred to as measures of central tendency. Individually, they are referred to as the mean, median, and mode.
THE MEAN

Table 2.02
Computing the Mean Salary

Employee   Hourly Rate
Michael    $15.10
Jim        $15.35
Pam        $15.35
Dwight     $16.11
Ryan       $16.37
Kevin      $16.80
Oscar      $17.05

Sum        $112.13
N          7
Mean       $16.02
As you read journal articles and technical reports, you will find that M and X̄ are the symbols most often used to represent the mean. Throughout this chapter, we will represent the mean with the symbol M.
THE MEDIAN

The median (Md) is the point in your data where 50% of your raw scores fall above and 50% of your raw scores fall below. To determine the median, you begin by ranking your raw scores from highest to lowest and then find the score that falls in the middle. Using the data from the seven employees in Table 2.02, we see that the median would be $16.11, because three salaries ($15.10, $15.35, and $15.35) are lower than $16.11 and three salaries ($16.37, $16.80, and $17.05) are higher.
In our example, the median was easy to compute because there was an odd number of scores (7). When there is an even number of scores, you take the score that would theoretically fall between the two middle scores. As an example, let's add one more salary to our data set ($16.27):

$15.10, $15.35, $15.35, $16.11, $16.27, $16.37, $16.80, $17.05

When you count up from the lowest salary, the fourth salary is $16.11, and if you count down from the highest salary, the
fourth salary is $16.27. To obtain the median salary, we would add the $16.11 and the $16.27 and divide by two. Thus the median salary would be $16.19. This is the point at which 50% of the salaries would fall above and 50% of the salaries would fall below, even though the salary of $16.19 is not an actual member of the data set.
THE MODE

The mode (Mo) represents the most frequently occurring score in a set of data. Looking at our original sample data set in Table 2.02, $15.35 would be the mode as it occurs twice, whereas each of the other salaries occurs only once. In the case where you have more than one score occurring multiple times (e.g., 16, 14, 13, 13, 10, 8, 7, 5, and 5), the data would be said to be bimodal (having two modes: 13 and 5).
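All three measures of central tendency for the salaries in Table 2.02 can be computed directly with Python's standard library. This is a sketch added for illustration, not part of the book's presentation:

```python
from statistics import mean, median, mode

# Hourly rates from Table 2.02
rates = [15.10, 15.35, 15.35, 16.11, 16.37, 16.80, 17.05]

print(round(mean(rates), 2))  # 16.02
print(median(rates))          # 16.11
print(mode(rates))            # 15.35
```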
Because there are three measures of central tendency (mean, median, mode), it is reasonable to ask which of the three is the best to use. With large sample sizes, the mean is the desired measure of central tendency. With smaller sample sizes, however, the mean can be unduly influenced by an outlier, a score that is very different from the other scores. Thus, with smaller samples, the median should probably be used. Unfortunately, there is no real rule of thumb for what constitutes a small sample; and thus, the use of the mean or median is subject to personal preference.
To see how an outlier can affect the mean, look at Table 2.03. In Sample 1, the cognitive ability scores are relatively similar and the mean and the median are the same. In Sample 2, however, the cognitive ability score of 44, the outlier, is very different from the other scores, causing the mean to be much higher than the median. If we had 100 employees instead of the 7 in the example, the effect of one outlier would be much smaller.
Table 2.03
Cognitive Ability Scores for Two Samples

         Sample 1   Sample 2
            17         17
            18         18
            19         19
            20         20
            21         21
            22         22
            23         44
Mean        20         23
Median      20         20
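Running the two samples from Table 2.03 through Python's statistics module confirms the point: the single outlier pulls the mean upward but leaves the median untouched (a minimal sketch):

```python
from statistics import mean, median

sample_1 = [17, 18, 19, 20, 21, 22, 23]
sample_2 = [17, 18, 19, 20, 21, 22, 44]  # 44 is the outlier

m1, md1 = mean(sample_1), median(sample_1)  # both are 20
m2, md2 = mean(sample_2), median(sample_2)  # mean jumps to 23; median stays 20
```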
Measures of Variability

Table 2.04
Example of Three Distributions

       Day     Evening   Night
       Shift   Shift     Shift
        4        2         3
        4        3         3
        4        4         4
        4        5         5
        4        6         5
Mean    4        4         4

Table 2.05
Sample Performance Appraisal Ratings

Geller   Tribbiani
  3          2
  3          2
  3          3
  3          3
  3          4
  3          4
STANDARD DEVIATION

The standard deviation is a statistic that, when combined with the mean, provides a range in which most scores in a distribution would fall. The standard deviation is based on something called the "normal curve" or the "bell curve." The idea behind the normal curve is that if the entire population was measured on something (e.g., intelligence, height), most people would score near the mean (the middle of the distribution) and very few would score considerably above or below the mean.
There are two ways that a standard deviation can be used to interpret data. The first is to focus on what the standard deviation tells us about a distribution. In viewing the
normal curve, we find that 68.26% of scores fall within one standard deviation of the mean, 95.44% fall within two standard deviations of the mean, and 99.73% fall within three standard deviations of the mean. Let's use an example to demonstrate why this knowledge is useful.
Suppose that you are a trainer and will be training one group of employees in the morning and another in the afternoon. Prior to starting your training, you look at the IQ scores of the employees to be trained. As shown in Table 2.06, you are pleased that the employees in both classes have a mean IQ of 100. Given that a score of 100 is the average IQ in the U.S., you feel comfortable that your trainees will be bright enough to learn the material. However, as you look at the standard deviations, you realize that your afternoon class will be a trainer's nightmare.
Table 2.06
IQ Scores for Two Training Groups

Group       Mean IQ   SD   1 SD Range   2 SD Range
Morning       100      3     97-103       94-106
Afternoon     100     15     85-115       70-130
In the morning class, the standard deviation of 3 tells you that the IQ of 68% of your trainees is within 6 points of one another and that the IQ of 95% of your trainees is within 12 points of one another. In other words, the employees in the morning section have similar IQ levels. The afternoon class is a different matter. Though the average IQ is 100, the standard deviation is 15. Some of your trainees are so bright (e.g., IQ of 130) that they probably will be bored, whereas others have such a low IQ (e.g., 70) that they will need remedial work. With such a large dispersion of IQs in the class, there is no way you could use the same material and the same pace to effectively train each employee. This is a conclusion that could not have been made with the mean alone.
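The 68% and 95% figures follow from the normal curve itself. Python's `statistics.NormalDist` can verify the arithmetic for the afternoon class (mean 100, SD 15); this sketch is added for checking, not taken from the text:

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)  # afternoon class from Table 2.06

within_1_sd = iq.cdf(115) - iq.cdf(85)   # about .6827 (the 68.26% in the text)
within_2_sd = iq.cdf(130) - iq.cdf(70)   # about .9545 (the 95.44% in the text)
```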
The second way to use a standard deviation is to focus on what the standard deviation tells us about a particular score. For example, consider a salary survey for police officers reporting a mean salary of $55,000 and a standard deviation of $3,000. From this information we would know that about two-thirds (68.26%) of police departments pay their officers between $52,000 (the mean of $55,000 minus the standard deviation of $3,000) and $58,000 (the mean of $55,000 plus the standard deviation of $3,000). On the basis of these figures, we might note that although the $54,000 salary we pay is below the mean, the fact that it is within one standard deviation of the mean indicates our salary is not extremely low.
As another example, suppose that an applicant's score on an exam is one standard deviation above the mean. Using a chart such as that shown in Table 2.07, we see that the applicant's score was equal to or higher than 84.13% of the other applicants.
Now that we have discussed the usefulness of interpreting a standard deviation, it is time for some bad news. Inferences from a standard deviation will only be accurate if your data set is fairly large and your data are normally distributed (i.e., a plot of your data would look like a normal curve). Unfortunately, this is seldom the case. Though most measures are normally distributed in the world population, seldom are they normally distributed in any given organization or job. That is, because we screen out applicants with low ability and promote those with high ability, test scores and performance evaluations seldom resemble a normal curve.
Why does this matter? Consider the data shown in Table 2.08. The table shows the number of traffic citations written by police officers in two departments. The number of citations written in Elmwood approximates a normal distribution, whereas the number written in Oakdale does not. As you can see from the table, the large standard deviation caused by a lack of a normal distribution in the Oakdale data would cause us to make the inference that an officer who is two
standard deviations below the mean would be writing a negative number of tickets!
Table 2.07
Interpreting Standard Deviations

Standard Deviation   Cumulative %
      -3.0               0.14
      -2.0               2.28
      -1.5               6.68
      -1.0              15.87
      -0.5              30.85
       0.0              50.00
      +0.5              69.15
      +1.0              84.13
      +1.5              93.32
      +2.0              97.72
      +3.0              99.86
Table 2.08
Number of Traffic Citations Written

                        Police Department
Officer              Elmwood      Oakdale
A                       1            1
B                       2            1
C                       2            1
D                       3            1
E                       3            1
F                       3            1
G                       4            1
H                       4            1
I                       4            1
J                       4            1
K                       5            1
L                       5            1
M                       5            1
N                       5            9
O                       5            9
P                       5            9
Q                       6            9
R                       6            9
S                       6            9
T                       6            9
U                       7            9
V                       7            9
W                       7            9
X                       8            9
Y                       8            9
Z                       9            9
Mean                   5.00         5.00
Standard deviation     2.00         4.08
1 SD Range             3-7          0.92-9.08
2 SD Range             1-9         -3.16-13.16
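The Oakdale problem is easy to reproduce: its 26 citation counts (thirteen 1s and thirteen 9s) give the same mean as Elmwood but a much larger standard deviation, so the mean minus two standard deviations dips below zero. A minimal Python sketch:

```python
from statistics import mean, stdev

oakdale = [1] * 13 + [9] * 13          # citation counts from Table 2.08

m, sd = mean(oakdale), stdev(oakdale)  # 5.0 and about 4.08
lower_2_sd = m - 2 * sd                # about -3.16: an impossible negative count
```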
VARIANCE

A third measure of dispersion is the variance, which is simply the square of the standard deviation. Although the variance is important because it serves as the computational basis for several statistical analyses (e.g., t-tests, analysis of variance), by itself it serves no useful interpretive purpose. Thus, the standard deviation is more commonly reported in journal articles and technical reports than is the variance.
Standard Scores

Standard scores convert raw scores into a format that tells us the relationship of the raw score to the raw scores of others. They are useful because they allow us to better interpret and compare raw data collected on different measures. That is, suppose your daughter told you that she scored a 43 on the National History Test that was administered at school. With only that raw score, you wouldn't know whether to reward her by buying the new Katy Perry CD or punish her by making her listen to your Barry Manilow collection. However, if she told you that her score of 43 put her in the top 5%, your decision would be much easier.
To make raw scores more useful, we often convert them into something that by itself has meaning. Perhaps the simplest attempt at doing this is to convert a raw score into a percentage. For example, your daughter's history test score of 43 would be divided by the number of points possible (45), resulting in a score of 95.6%. However, the problem with percentages is that they don't tell us how everyone else scored. That is, a test might be so easy that a 95.6% is the lowest score in the class. Likewise, I remember taking a physiological psychology course as an undergraduate in which the best student in the class had an average of 58% across four tests!

The two most commonly used standard scores are percentiles and z-scores.
PERCENTILES

A percentile is a score that indicates the percentage of people that scored at or below a certain score. For example, a salary survey might reveal that a salary of $26,000 is at the 71st percentile, indicating that 71% of the organizations surveyed pay $26,000 or less and 29% pay more than $26,000. Likewise,
Table 2.09
Using Percentiles in a Salary Survey
______________________________________________________

Hourly Wage   Rank   Computation   Percentile
$32.17         20       20/20          99
$30.43         19       19/20          95
$28.72         18       18/20          90
$25.25         17       17/20          85
$24.96         16       16/20          80
$24.48         15       15/20          75
$22.92         14       14/20          70
$22.75         13       13/20          65
$22.11         12       12/20          60
$21.03         11       11/20          55
$20.86         10       10/20          50
$20.79          9        9/20          45
$20.35          8        8/20          40
$20.22          7        7/20          35
$20.03          6        6/20          30
$18.93          5        5/20          25
$16.65          4        4/20          20
$16.50          3        3/20          15
$16.25          2        2/20          10
$14.24          1        1/20           5
________________________________________________________
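The computation column in Table 2.09 is just rank divided by the number of organizations. The same idea can be sketched in Python by counting the proportion of wages at or below a given wage (the function name is made up for illustration):

```python
wages = [14.24, 16.25, 16.50, 16.65, 18.93, 20.03, 20.22, 20.35,
         20.79, 20.86, 21.03, 22.11, 22.75, 22.92, 24.48, 24.96,
         25.25, 28.72, 30.43, 32.17]   # the 20 wages from Table 2.09

def percentile_rank(value, data):
    """Percentage of wages at or below the given value."""
    at_or_below = sum(1 for x in data if x <= value)
    return 100 * at_or_below / len(data)

print(percentile_rank(24.48, wages))  # 75.0, matching the table
print(percentile_rank(20.86, wages))  # 50.0
```

Note that the table lists the top wage at the 99th rather than the 100th percentile; this sketch reports the uncapped value.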
Z-SCORES

z = (raw score - mean score) / standard deviation
For example, if you scored 70 on a test that has a mean of 60 and a standard deviation of 20, your z-score would be:

z = (70 - 60) / 20
z = 10 / 20
z = 0.5
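The same computation as a small function, with `NormalDist` used to look up the cumulative percentage (the 69.15 entry in Table 2.10); this is a sketch, not code from the book:

```python
from statistics import NormalDist

def z_score(raw, mean, sd):
    """Standardize a raw score: (raw - mean) / standard deviation."""
    return (raw - mean) / sd

z = z_score(70, 60, 20)                 # 0.5, as computed above
pct_below = 100 * NormalDist().cdf(z)   # about 69.15, matching Table 2.10
```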
Table 2.10
Interpreting Z-Scores

z-Score   % falling at or below score
 -3.00          0.14
 -2.00          2.28
 -1.75          4.01
 -1.50          6.68
 -1.25         10.56
 -1.00         15.87
 -0.75         22.66
 -0.50         30.85
 -0.25         40.13
  0.00         50.00
 +0.25         59.87
 +0.50         69.15
 +0.75         77.34
 +1.00         84.13
 +1.25         89.44
 +1.50         93.32
 +1.75         95.99
 +2.00         97.72
 +3.00         99.86
OTHER STANDARD SCORES

Because many people do not like working with negative values, they choose to use a standard score format other than the z-score. For example, the Minnesota Multiphasic Personality Inventory-2 (MMPI-2) and the California Psychological Inventory (CPI) use a T score in which the standardized mean for each scale is 50 and the standard deviation is 10. Thus, with z-scores, a person scoring one standard deviation below the mean would have a standard score of -1.00, whereas with T scores, a person scoring one standard deviation below the mean would have a standard score of 40 (the mean of 50 minus one standard deviation of 10). As shown in Table 2.11, another example would be IQ scores, which have a mean of 100 and a standard deviation of 15.
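Converting between standard-score formats is just rescaling the z-score by the new mean and standard deviation. A minimal sketch (the function names are made up for illustration):

```python
def z_to_t(z):
    """T score: mean 50, standard deviation 10."""
    return 50 + 10 * z

def z_to_iq(z):
    """Deviation IQ: mean 100, standard deviation 15."""
    return 100 + 15 * z

t_low = z_to_t(-1.0)    # 40, the example in the text
iq_high = z_to_iq(2.0)  # 130, two standard deviations above the mean
```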
DECIDING WHICH STANDARD SCORE TO USE
Table 2.11
Comparison of Standard Scores

Percentile           0.14    2.28   15.87   50.00   84.13   97.72   99.86
z-score       -4      -3      -2      -1       0      +1      +2      +3      +4
T score       10      20      30      40      50      60      70      80      90
IQ            40      55      70      85     100     115     130     145     160
Statistical Symbols

Authors of journal articles and technical reports seldom include such terms as standard deviation or mean in their tables. Instead, they use symbols to represent their statistics. Table 2.12 contains the statistical symbols you are most likely to encounter in journal articles and technical reports that denote the statistics discussed in this chapter.
Table 2.12
Symbols Used to Denote Descriptive Statistics
_____________________________________________________________

Statistic                                 Common Symbols
Number of people
  In the sample (sample size)             N
  In a subsample                          n
Number of groups                          K
Mean                                      M, X̄, Mx
Median                                    Mdn, Md
Mode                                      Mo
Standard deviation
  Sample standard deviation               SD, s, Std Dev
  Population standard deviation           σ
Variance
  Sample variance                         s²
  Population variance                     σ²
Standard score                            z
Quartile
  First quartile                          Q1
  Third quartile                          Q3
_______________________________________________
Applying Your Knowledge

You can apply what you have learned in this chapter by reading the following articles in Applied H.R.M. Research (www.xavier.edu/appliedhrmresearch) and in Public Personnel Management.

Buttigieg, S. (2005). Gender and race differences in scores on the Employee Aptitude Survey: Test 5 - Space Visualization. Applied H.R.M. Research, 10(1), 45-46.

Kethley, R. B., & Terpstra, D. E. (2005). An analysis of litigation associated with the use of the application form in the selection process. Public Personnel Management, 34(4), 357-375.

Roberts, G. E. (2004). Municipal government benefits practices and personnel outcomes: Results from a national survey. Public Personnel Management, 33(1), 1-19.

Selden, S. C. (2005). Human resource management in American counties, 2002. Public Personnel Management, 34(1), 59-84.
3. Statistics That Test Differences Between Groups
______________________________

Choosing the Right Statistical Test

Researchers typically are interested in testing the significance of differences between means or frequencies. A t-test or ANOVA is used to test differences in means, and nonparametric tests are used to test differences in frequencies.
TESTING DIFFERENCES IN MEANS

Table 3.01
Hours Spent Watching ESPN

    Women          Men
  4   3   4     5   5   5
  3   4   3     5   5   5
  4   3   4     5   5   5
  3   4   3     5   6   5
  4   3   3     6   5   5
Table 3.02
Hours Spent Watching Law and Order

    Women          Men
  5   2   1     5   9   4
  0   1   3     8   5   9
  1   7   4     1   0   4
 10   6   3    10   2   7
  3   4   2     2   1  10
A very different pattern occurs, however, in Table 3.02. Although the means for men and women are the same as they were in Table 3.01, the variability within each group is much greater. Whereas in Table 3.01 the highest number of hours for women (4) is lower than the lowest number of hours for men (5), in Table 3.02 the highs and lows for men are the same as those for women. With such high variability, it is unlikely that the differences in means would be statistically significant.
Table 3.03
Statistics that test differences in means

                                       Number of Dependent Variables
Number of Independent Variables          One        Two or more
One independent variable
  Two levels                            t-test        MANOVA
  Two or more levels                    ANOVA         MANOVA
Two or more independent variables       ANOVA         MANOVA
In testing differences in means, t-tests and ANOVAs are the most commonly used statistics. As shown in Table 3.03, when there is only one independent variable (e.g., sex or race) with only two levels (e.g., male, female or minority, nonminority), a t-test is used to test group differences in means. When there is one independent variable with more than two levels (e.g., race: African American, White, Hispanic American, and Asian American) or there are two or more independent variables (e.g., sex and race), an analysis of variance (ANOVA) is used. When there are two or more dependent variables (e.g., turnover and absenteeism), a multivariate analysis of variance (MANOVA) is used.
For example, a t-test might be used to test differences in:

Salary (1 dependent variable) between males and females: 2 levels (male, female) of 1 independent variable (sex)

                             Sex
                    Male         Female
Salary              $46,000      $43,000

Assessment center scores (1 dependent variable) between nonminority and minority applicants: 2 levels of 1 independent variable (race)

                             Race
                    Nonminority  Minority
Assessment Center Score  52.6       47.3

Job satisfaction (1 dependent variable) between production and clerical employees: 2 levels of 1 independent variable (job type)

                             Job Type
                    Production   Clerical
Job Satisfaction Score   6.1        7.5
An analysis of variance (ANOVA) might be used to test differences in:

Salary (1 dependent variable) among White, African American, and Hispanic employees: 3 levels (White, African American, and Hispanic) of 1 independent variable (race)

                    Race/Ethnicity
           White      African American   Hispanic
Salary     $44,000    $41,000            $41,500

Salary (1 dependent variable) by gender and race/ethnicity: 2 independent variables

                    Race/Ethnicity
Gender     White      African American   Hispanic
Male       $44,000    $41,000            $41,500
Female     $42,000    $39,000            $40,000
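A one-way ANOVA of this kind can be run in a few lines. The salary samples below are hypothetical (the book reports only the group means, not raw data), and SciPy is an assumed dependency:

```python
from scipy import stats

# Hypothetical salary samples for three race/ethnicity groups
# (illustrative numbers only, centered near the means in the table)
white = [44000, 46000, 42000, 45000, 43000]
african_american = [41000, 40000, 42000, 41500, 40500]
hispanic = [41500, 42500, 40500, 41000, 42000]

# One-way ANOVA: one independent variable (race/ethnicity) with three levels
f, p = stats.f_oneway(white, african_american, hispanic)
print(f"F = {f:.2f}, p = {p:.4f}")
```

A significant F here would tell us only that the three group means are not all equal; post hoc tests (discussed later in the chapter) are needed to say which groups differ.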
TESTING DIFFERENCES WITH SMALL SAMPLE SIZES

As stated in Chapter 2, when sample sizes are small or when there are outliers that might skew the data, using the mean might result in a misinterpretation of the data. In such cases, a statistic that does not assume a normal distribution (non-parametric) is used. Although there are many non-parametric tests, two commonly used tests in the human resource field are the Mann-Whitney U test and Fisher's exact test.

The Mann-Whitney (also called Wilcoxon-Mann-Whitney) tests the differences in the rank order of scores from two populations. For example, suppose that a company wants to know if the salaries paid to female accountants are less than those paid to male accountants. As you can see in Table 3.04, there are only 12 accountants, probably too few to use a t-test. The first step in the Mann-Whitney is to rank order the salaries and then sum the ranks for each group. For example, the sum of ranks for our female accountants is 1 + 7 + 10 + 11 = 29, and the sum of ranks for our male accountants is 2 + 3 + 4 + 5 + 6 + 8 + 9 + 12 = 49. A U value is then computed for each of these two sums, and a table is used to determine if the difference in the two U values is statistically significant.
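The same test can be reproduced with SciPy's implementation (an assumed dependency); the salaries are from Table 3.04:

```python
from scipy import stats

# Accountant salaries from Table 3.04
female = [32000, 28000, 27000, 26500]
male = [31500, 31000, 30900, 30200, 29000, 27800, 27200, 26000]

# Mann-Whitney U: compares rank orders rather than means
u, p = stats.mannwhitneyu(female, male, alternative='two-sided')
print(f"U = {u}, p = {p:.3f}")
```

With only 12 accountants, the rank difference here is not large enough to reach significance.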
Rather than using the average rank, Fisher's exact test compares the number of female accountants whose salary is above the median salary to the number of male accountants whose salary is above the median. The actual calculations for Fisher's exact test can get complicated and are beyond the scope of this book. However, let's discuss the basic concept behind the test. As depicted in Table 3.04, the median salary for our accountants is $28,500. As shown in Table 3.05, 25% of women have salaries above the median and 75% have salaries below the median. For men, 62.5% (5/8) have salaries above the median and 37.5% (3/8) have salaries below the median. A Fisher's exact test would determine the probability that these differences are statistically significant (i.e., did not occur by chance).
Table 3.04
Salaries for accountants

Salary       Accountant Sex
$32,000 Female
$31,500 Male
$31,000 Male
$30,900 Male
$30,200 Male
$29,000 Male
$28,000 Female
$27,800 Male
$27,200 Male
$27,000 Female
$26,500 Female
$26,000 Male
Table 3.05
Number of men and women whose salary falls above and below the median

                     Women   Men   Total
Above the median       1      5      6
Below the median       3      3      6
Total                  4      8     12
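The 2 x 2 table above can be handed directly to SciPy's implementation of Fisher's exact test (an assumed dependency):

```python
from scipy import stats

# Table 3.05 as a 2x2 contingency table:
#                 Women  Men
# Above median      1     5
# Below median      3     3
table = [[1, 5], [3, 3]]

odds_ratio, p = stats.fisher_exact(table)
print(f"p = {p:.3f}")  # probability the split arose by chance
```

With only 12 people, a 25% vs. 62.5% split is well within what chance could produce, so the test does not reach significance.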
TESTING DIFFERENCES IN FREQUENCIES

At times, a researcher wants to test differences in frequencies rather than differences in means or medians. For example, as shown in Table 3.06, an HR manager might want to see if the distribution of men and women across jobs is the same. Or, as shown in Table 3.07, the HR manager might want to determine whether there are differences in the number of people hired using different recruitment methods. In situations such as these, a chi-square is most commonly used with large samples and Fisher's exact test for small samples.

Table 3.06                            Table 3.07
Position Type   Male   Female        Recruitment Method   Hired
Management       15      5           Referral              43
Clerical          2     27           Advertisement         27
Production       45     13           Job Fair              26
Interpreting Statistical Results

t-TEST

When t-tests are used in technical reports or journal articles, the results of the analysis are typically listed in the following format:

t(45) = 2.31, p < .01
A two-sample t-test is used to compare the means of two independent groups. For example, a group of 30 employees received customer service training, and the town manager wants to compare the complaint rate for these employees with that of 40 employees who did not receive the training. Another example would be that a compensation manager found that the average salary for male police officers in the town was $32,200, and the average salary for female police officers was $30,800. To determine if the average salary for men was statistically higher than the average salary for women, a two-sample t-test would be used.

A paired-difference t-test is used when you have two measures from the same sample. For example, police officers in one department averaged 1.3 complaints per officer. To reduce the number of complaints, the chief had each of the officers attend a training seminar on communication skills. In the year following the seminar, the average complaint rate for those same officers was 1.0. A paired-difference t-test would be used to determine if the decrease from 1.3 to 1.0 was statistically significant.
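A sketch of that paired-difference test, with hypothetical per-officer counts chosen to match the reported averages of 1.3 and 1.0 (SciPy is an assumed dependency):

```python
from scipy import stats

# Hypothetical complaint counts for the same 10 officers before and after
# the seminar (illustrative; the book reports only the averages)
before = [2, 1, 0, 3, 1, 2, 1, 0, 2, 1]   # mean 1.3
after  = [1, 1, 0, 2, 1, 2, 1, 0, 1, 1]   # mean 1.0

# Paired-difference t-test: two measures from the same officers
t, p = stats.ttest_rel(before, after)
print(f"t = {t:.2f}, p = {p:.4f}")
```

The pairing matters: because each officer is compared with himself or herself, the test works on the within-officer differences rather than treating the two years as independent groups.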
ANALYSIS OF VARIANCE: ONE INDEPENDENT VARIABLE

Degrees of Freedom

F Statistic

expected by chance. In the example shown in Table 3.09, the F of 35.32 for education indicates that the effect of education on academy grades is 35 times what would be expected by chance.
Significance Level

Table 3.09
Example of an ANOVA source table for one independent variable

Effect        df    SS          MS         F       p<
Education      2    6739.24     3369.62    35.32   .0001
Error        671    64011.83    95.40
Total        673    70751.07
If our F value is not statistically significant, we cannot conclude that our independent variable (e.g., education) had an effect on the dependent variable (e.g., academy grades). If our F value is significant, we have one more analysis to perform. Although the significant F value indicates that academy cadets performed differently on the basis of their education level, we don't know if academy performance differed for each of the three degree types. That is, it may be that cadets with associate's degrees or bachelor's degrees outperformed cadets with a high school diploma, but cadets with associate's degrees or bachelor's degrees performed at the same level. To get a clearer picture of which means differ from one another, post hoc tests are conducted. Examples of such tests include Scheffé, Tukey, Duncan, LSD, and Newman-Keuls.
Table 3.10
Approximate F value needed for significance at the .05 level

                 Levels in the Independent Variable
Sample Size        2       3       4       5
10               5.12    4.46    4.35    4.53
20               4.38    3.55    3.20    3.01
30               4.18    3.34    2.96    2.74
60               4.00    3.15    2.76    2.52
100              3.94    3.09    2.70    2.46
200              3.89    3.04    2.65    2.41
Infinity         3.84    2.99    2.60    2.37
In journal articles and technical reports, the results of these tests are typically depicted using superscripts next to the mean. Means that share the same superscript are not statistically different from one another. For example, the three means in Table 3.11 have different superscripts; thus they are statistically different from each other. As shown in the examples in Table 3.12:

In Example 1, cadets with bachelor's degrees and associate's degrees performed better than cadets with high school diplomas, but there was no difference between cadets with associate's degrees and bachelor's degrees.

In Example 2, cadets with bachelor's degrees outperformed those with associate's degrees, who outperformed those with high school diplomas.

In Example 3, cadets with bachelor's degrees performed better than those with associate's degrees or high school diplomas. Cadets with associate's degrees did not outperform cadets with a high school diploma.
Table 3.11
Means Table

Table 3.12
Examples of post hoc test results

Education Level          Example 1   Example 2   Example 3
High school diploma      73.24a      73.24a      73.24a
Associate's degree       77.89b      77.89b      75.99a
Bachelor's degree        78.01b      80.21c      80.21b
As shown in Table 3.13, an ANOVA produces an F value for each independent variable and combination of independent variables. Each individual variable is called a main effect and the combination of variables is called an interaction. When there are two independent variables, three outcomes are possible:

One or both main effects are statistically significant but the interaction is not (Table 3.14)

Neither main effect is significant but the interaction is (Table 3.15)

One or both main effects are significant, as is the interaction (Table 3.16)
Table 3.13
Example of an ANOVA source table for two independent variables

Effect       df     SS          MS         F      p<
Race           1    1.31708     1.31708    9.31   .003
Sex            1    0.00896     0.00896    0.06   .802
Sex*Race       1    0.51953     0.51953    3.67   .058
Error        107    15.13865    0.14148
Total        110    16.98421

Table 3.14
Example of a significant main effect with no interaction

Source Table
Effect       df     SS        MS        F       p<
Sex            1    0.5198    0.5198    14.09   .0006
Race           1    0.0073    0.0073    0.20    .6593
Sex*Race       1    0.0026    0.0026    0.07    .7937
Error         36    1.3279    0.0369
Total         39    1.8576

Means Table
                 Race
Sex       White    Minority
Male      2.42     2.46       2.44
Female    2.21     2.22       2.22
          2.31     2.34       2.33
[Figure: mean ratings for men and women by race (Minority, White); the lines are roughly parallel, reflecting a sex main effect with no interaction]
Table 3.15
Example of a significant interaction but no main effects

Source Table
Effect       df     SS        MS        F       p<
Race           1    0.0172    0.0172    1.20    .2804
Sex            1    0.0056    0.0056    0.04    .8441
Sex*Race       1    0.2873    0.2873    20.03   .0001
Error         36    0.5162    0.0143
Total         39    0.8213

Means Table
                 Race
Sex       White    Minority
Male      2.48     2.27       2.38
Female    2.30     2.43       2.43
          2.39     2.34       2.40
[Figure: mean ratings for men and women by race (Minority, White); the lines cross, reflecting the interaction]
A significant interaction indicates that the effect of one variable depends on the level of the other independent variable. For example, as shown in Table 3.15, there are no significant differences in overall performance ratings between men and women or minorities and nonminorities. The significant interaction, however, tells us that sex and race interact so that white males and minority females are receiving the highest performance evaluations. To be sure of which means are different from one another, we would use one of the post hoc tests previously mentioned.
Table 3.16
Example of a significant interaction and main effects

Source Table
Effect       df     SS        MS        F       p<
Race           1    0.2234    0.2234    5.64    .0230
Sex            1    0.5086    0.5086    12.83   .0010
Sex*Race       1    0.2576    0.2576    6.50    .0152
Error         36    1.4267    0.0396
Total         39    2.4163

Means Table
                 Race
Sex       White    Minority
Male      2.59     2.28       2.44
Female    2.21     2.22       2.22
[Figure: mean ratings for men and women by race (Minority, White); the gap between men and women is much larger for whites than for minorities]
FISHER'S EXACT TEST

CHI-SQUARE

In journal articles, chi-square results are often reported in a manner such as: χ²(2) = 2.69, p < .04. The (2) is the degrees of freedom, the 2.69 is the chi-square value, and the .04 is the probability level.
With a chi-square analysis, the degrees of freedom are the number of groups minus one. For example, if the analysis examined racial frequencies (white, African American, Hispanic, Asian), the degrees of freedom would be three: four races minus one. In the example shown back in Table 3.07, the degrees of freedom would be two (three recruitment methods minus one). When the analysis involves two variables, such as that shown back in Table 3.06, the degrees of freedom are the number of groups in the first variable minus one, multiplied by the number of groups in the second variable minus one. The degrees of freedom for Table 3.06 would be two: two for position type (3 positions minus 1) multiplied by one for sex (2 sexes minus 1).
The significance level is determined by the size of the chi-square value and the degrees of freedom. The greater the degrees of freedom, the higher the chi-square value needed for statistical significance. An example of the chi-square values needed for statistical significance is shown in Table 3.17. As you can see in the table, with 3 degrees of freedom, a chi-square value of 7.81 or higher is needed for a frequency distribution to be significantly different at the .05 level, and 11.34 for the .01 level of significance.

It is important to understand that when there are more than two levels of a variable, a significant chi-square only indicates that the distribution of frequencies is not equal. Unlike the paired comparisons available for ANOVA, there is no test with chi-square to indicate which frequencies are different from one another. For example, in Table 3.06, a significant chi-square would tell us that males and females are not represented equally across positions, but we would not know which positions are statistically different from one another. In looking at the table we would probably make the assumption that males are represented more in the management and production positions and less often in the clerical positions, but we could not state this with statistical certainty.
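Both kinds of chi-square test can be sketched with SciPy (an assumed dependency), using the frequencies from Tables 3.06 and 3.07:

```python
from scipy import stats

# Table 3.07: were the three recruitment methods equally productive?
hired = [43, 27, 26]
chi2, p = stats.chisquare(hired)   # df = 3 methods - 1 = 2
print(f"chi-square(2) = {chi2:.2f}, p = {p:.4f}")

# Table 3.06: is the sex distribution the same across position types?
positions = [[15, 5], [2, 27], [45, 13]]   # rows: management, clerical, production
chi2b, p2, df, expected = stats.chi2_contingency(positions)
print(f"chi-square({df}) = {chi2b:.2f}, p = {p2:.4f}")   # df = (3-1)(2-1) = 2
```

The first test compares the observed hires to an equal split across methods; the second compares the observed sex-by-position counts to the counts expected if sex and position were independent.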
Table 3.17
Chi-square values needed for statistical significance

                          Probability Level
Degrees of Freedom          .05       .01
 1                         3.84      6.63
 2                         5.99      9.21
 3                         7.81     11.34
 4                         9.49     13.28
 5                        11.07     15.09
 6                        12.59     16.81
 7                        14.07     18.48
 8                        15.51     20.09
 9                        16.92     21.67
10                        18.31     23.21
20                        31.41     37.57
30                        43.77     50.89
Applying Your Knowledge

You can apply what you have learned in this chapter by reading the following articles in Applied H.R.M. Research (www.xavier.edu/appliedhrmresearch) and in Public Personnel Management.

t-test

Kuroyama, J., Wright, C. W., Manson, T. M., & Sablynski, C. J. (2010). The effect of warning against faking on noncognitive test outcomes: A field study of bus operator applicants. Applied H.R.M. Research, 12(1), 59-74.

Analysis of Variance (ANOVA)

Levine, S. P., & Feldman, R. S. (2002). Women's and men's nonverbal behavior and self-monitoring in a job interview setting. Applied H.R.M. Research, 7(1), 1-14.

Roberts, L. L., Konczak, L. J., & Macan, T. H. (2004). Effects of data collection methods on organizational climate survey results. Applied H.R.M. Research, 9(1), 13-26.

Chi-square

Lee, J. A., Havighurst, L. C., & Rassel, G. (2004). Factors related to court references to performance appraisal fairness and validity. Public Personnel Management, 23(1), 61-69.

Somers, M., & Casal, J. C. (2011). Type of wrongdoing and whistle-blowing: Further evidence that type of wrongdoing affects the whistle-blowing process. Public Personnel Management, 40(2), 151-163.
4. Understanding Correlation
_______________________________

What is Correlation?

Interpreting a Correlation Coefficient

MAGNITUDE AND DIRECTION

correlation of +.30 because, even though the -.39 is negative, it is further from zero than the +.30.
Figure 4.1
Example of a Positive Correlation
[Scatterplot of Performance Rating (1-6) against Test Score (12-24), divided into quadrants A, B, C, and D]

negative correlation. Note in Figure 4.2 there are eight points in quadrants A and D and only four in quadrants C and B. If the number is the same, there is no correlation. Note in Figure 4.3 that there are six points in quadrants A and D and six in quadrants C and B.

Figure 4.2
Example of a Negative Correlation
[Scatterplot of Performance Rating against Test Score, divided into quadrants A, B, C, and D]

Figure 4.3
Example of Two Uncorrelated Variables
[Scatterplot of Performance Rating against Test Score, divided into quadrants A, B, C, and D]
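Computing a correlation of this kind takes a single call. The points below are hypothetical, not the ones plotted in Figure 4.1, and SciPy is an assumed dependency:

```python
from scipy import stats

# Hypothetical test scores and performance ratings that rise together,
# in the spirit of the positive-correlation scatterplot
test_score  = [13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24]
performance = [1,  2,  2,  3,  2,  3,  4,  3,  4,  4,  5,  6]

r, p = stats.pearsonr(test_score, performance)
print(f"r = {r:.2f}, p = {p:.4f}")   # positive: higher scores go with higher ratings
```

Reversing the direction of one variable would flip the sign of r without changing its magnitude.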
Factors Limiting the Magnitude of a Correlation

own study, we obtained a correlation of .40. What would explain this discrepancy? Probably three factors: test unreliability, criterion unreliability, and range restriction.

Reliability. The size of a correlation coefficient is limited by the reliability of the two variables being correlated (reliability is the extent to which a score is free from error). So, if our two measures (in this case, scores on a cognitive ability test and grades in the academy) have low reliability, the correlation between the test scores and academy grades (our validity coefficient) will be lower than expected. There are four types of reliability: test-retest, alternate forms, internal, and scorer.
With test-retest reliability, people take the same test twice. The scores from the first administration of the test are correlated with scores from the second to determine whether they are similar. If they are, the test is said to have temporal stability: the test scores are stable across time and not highly susceptible to such random daily conditions as illness, fatigue, stress, or uncomfortable testing conditions.

With alternate-forms reliability, two forms of the same test are constructed. The scores on the two forms are then correlated to determine whether they are similar. If they are, the test is said to have form stability. Multiple forms of a test are common in situations in which individuals might take the test more than once (e.g., a promotion exam) or when there is concern that test takers will copy answers from another test taker.

With internal reliability, the similarity of responses to test items is compared. In a test with high internal reliability, we would expect a test taker to answer similar items in a similar way. That is, we would expect that a person who rates the item "I am outgoing" as being similar to them would also rate the item "I like to talk with people" as being similar to them. Measures of internal reliability that you might encounter in a journal article include split-half reliability, Cronbach's coefficient alpha, and the Kuder-Richardson Formula 20 (K-R 20).
Scorer reliability is the extent to which two people scoring a test will obtain the same test score. Scorer reliability is an issue especially in projective or subjective tests (e.g., the Rorschach Ink Blot Test, interviews, writing samples) in which there is no one correct answer, but even tests scored with the use of keys suffer from scorer mistakes. For example, Allard, Butler, Faust, and Shea (1995) found that 53% of hand-scored personality tests contained at least one scoring error and that 19% contained enough errors to alter a clinical diagnosis. When human judgment of performance is involved, scorer reliability is discussed in terms of interrater reliability. That is, will two interviewers give an applicant similar ratings, or will two supervisors give an employee similar performance ratings?
Range Restriction. A correlation coefficient is also limited by the range of test scores and performance measures that are included in the study: the wider the range of scores, the higher the validity coefficient (the correlation between a test score and a measure of job performance). Unfortunately, in the typical validity study in which we correlate a test with some measure of performance, we usually encounter something called range restriction. That is, we don't have a full range of test scores or performance ratings. For example, in a given employment situation, few employees are at the extremes of a performance scale. Employees who would be at the bottom were either never hired or have since been terminated. Employees at the upper end of the performance scale either got promoted or went to an organization that paid more money.

Range restriction is important because it is easiest to predict the future performance of people with extremely high or extremely low test scores. For example, suppose that 10 students scored 165 on the verbal portion of the GRE and another 10 scored 140. Most people would be willing to bet that most of the students with a score of 165 will do better in graduate school. However, suppose that one student scores 153 and another scores 151. How many people would be willing to bet the mortgage on such a small difference in points?
Curvilinearity. Another problem that can lower the size of a correlation coefficient is curvilinearity. As depicted in Figure 4.4, one of the assumptions behind correlation is that the two variables being correlated are linearly related: the scores on one variable are related in a straight line to scores on the other variable.

However, many things in life are not linearly related. For example, research strongly indicates that bright people perform better than less bright people in the police academy. But is there a point at which increases in intelligence don't help? In the example shown in Figure 4.5, academy performance increases as IQ scores increase until we reach an IQ of 110. After scores of 110, increasing amounts of IQ do not result in better performance. Why would we obtain such a relationship? Because the material learned in the academy is only so difficult; at some point, being super smart may not provide any advantage over being smart.
In human resources, we see a similar relationship between years of experience and job performance. That is, the difference in the job performance of a person with two years' experience versus a person with no experience or with one year's experience is probably fairly great. However, after ten years, would an employee with 15 years of experience perform better than an employee with 12 years of experience?
would probably not result in a significant correlation, and we would incorrectly conclude that there is no correlation between the two variables. Fortunately, there are some statistical adjustments that we can make to test for this possibility (converting our measures to z scores and then squaring them). Just as fortunately, such adjustments are beyond the scope of this chapter and probably your interest as well!
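One common way to check for curvilinearity (closely related to the squared-z-score adjustment mentioned above) is to add a squared term to the model and see whether the fit improves. A sketch with hypothetical data shaped like the plateau in Figure 4.5 (NumPy and SciPy are assumed dependencies):

```python
import numpy as np
from scipy import stats

# Hypothetical IQ and academy-performance data that level off above ~110
iq = np.array([90, 95, 100, 105, 110, 115, 120, 125, 130])
performance = np.array([60, 68, 75, 82, 88, 89, 88, 89, 88])

r_linear, _ = stats.pearsonr(iq, performance)

# Fit a straight line and a curve with a squared term, then compare how
# much of the variance in performance each one accounts for
lin = np.polyfit(iq, performance, 1)
quad = np.polyfit(iq, performance, 2)
r2_lin = 1 - np.var(performance - np.polyval(lin, iq)) / np.var(performance)
r2_quad = 1 - np.var(performance - np.polyval(quad, iq)) / np.var(performance)
print(f"linear r^2 = {r2_lin:.2f}, quadratic r^2 = {r2_quad:.2f}")
```

When the quadratic fit accounts for noticeably more variance than the straight line, the linear correlation is understating the true relationship.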
An interesting example of the inverted U was provided in 1996 by the New London, Connecticut police department. New London required that applicants score between 20 and 27 on the Wonderlic Personnel Test (a cognitive ability test), reasoning that people scoring below 20 were too dumb to be cops, and those scoring above 27 were too smart and would be bored performing the day-to-day law enforcement duties. Though New London had no statistical proof to back their claim, the 2nd Circuit Court of Appeals upheld the city's practice of not hiring applicants who were "too bright" (Jordan v. City of New London, 2000). As you can imagine, New London received lots of bad publicity and a good deal of ribbing from other cities. The San Francisco Police Department went so far as to hold a press conference inviting these "too smart" applicants rejected by New London to move out west and apply for the SFPD!
STATISTICAL SIGNIFICANCE

To determine if we are even allowed to interpret a correlation coefficient, we must first compute something called a significance level (see Chapter 1 for a discussion of significance levels). Significance levels tell us the probability that our correlation coefficient occurred by chance alone. That is, if we obtain a correlation of .30 between a test score and supervisor ratings of on-the-job performance, is there really a relationship between the two variables or is our correlation a chance finding?

The significance level for a correlation coefficient is a function of two factors: the size of the correlation coefficient and the sample size used in the study. The greater the sample size, the smaller the correlation needed for statistical significance. For example, as shown in Table 4.1, a correlation of .19 would be significant if we had 100 employees in our study but not if we had only 50 employees.
Table 4.1
Sample sizes needed for statistical significance

Sample Size     Smallest Significant Correlation (p < .05)
 10                  .63
 20                  .44
 30                  .36
 40                  .31
 50                  .27
 60                  .25
 70                  .23
 80                  .22
 90                  .21
100                  .19
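The values in Table 4.1 follow from the standard t-test of a correlation. A sketch that reproduces them approximately (SciPy is an assumed dependency, and small rounding differences from the printed table are expected):

```python
from math import sqrt
from scipy.stats import t as t_dist

def smallest_significant_r(n, alpha=0.05):
    """Smallest |r| reaching two-tailed significance for sample size n,
    from the t-test of a correlation: t = r*sqrt(n-2)/sqrt(1-r^2)."""
    t_crit = t_dist.ppf(1 - alpha / 2, df=n - 2)
    return t_crit / sqrt(t_crit**2 + n - 2)

for n in (10, 30, 50, 100):
    print(f"n = {n:3d}: smallest significant r = {smallest_significant_r(n):.2f}")
```

This makes the sample-size effect concrete: the same formula that says r = .63 is needed with 10 people says roughly r = .20 is enough with 100.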
If a correlation coefficient is not statistically significant, we cannot try to interpret it as being high/low or useful/not useful. We essentially pretend that it doesn't exist. If, however, the correlation is statistically significant, we must address the issue of practical significance. That is, is the relationship high enough to be of any use?

INTERPRETING CORRELATIONS WITH NOMINAL DATA

lower code), and a negative correlation would mean that women are being paid less than men.
PRACTICAL SIGNIFICANCE: IS OUR CORRELATION ANY GOOD?

We have already discussed that the magnitude of a correlation coefficient can range from 0 to 1 and that the farther the coefficient is from zero, the higher the relationship between two variables. But what does a correlation of .40 mean? We know that a correlation of .40 indicates a stronger relationship than a correlation of .20. But is .40 a high correlation? The answer, of course, is that it depends, and there are three common ways to answer this question: variance accounted for, comparison to norms, and utility analysis.

Variance Accounted For (r²)

Comparison to Norms
Though we would like to have extremely high correlations and r-squares, as mentioned previously, in psychology and human resources this is rare. So, to interpret a correlation as being "good" or "high," one must compare the magnitude of a correlation with those that are typically obtained in similar situations. As can be seen in Table 4.2, the typical correlations found in organizational psychology research vary tremendously by topic.

In personnel selection, correlations between selection tests and measures of performance (validity coefficients) are typically in the .20 to .30 range. Thus, in personnel selection, a correlation below .20 would be considered low, .20 to .29 considered moderate, .30 to .39 high, and .40 or greater as outstanding. Validity coefficients greater than .50 probably indicate one of two things: either the correlation coefficient is suspect (e.g., calculation errors, chance due to small sample size, or cheating) or the personnel analyst deserves the Nobel Prize for science!
If we are using a particular type of selection test and want to compare it to similar tests, Table 4.3 provides an easy way to assess the magnitude of our correlation. For example, if we correlated our assessment center scores with supervisor ratings of performance and obtained a correlation of .15, we can see from Table 4.3 that our validity of .15 is well below the typical validity of .28 for assessment centers. Note to meta-analysis fans: the correlations in the table below are uncorrected. See Aamodt (2016) or Schmidt and Hunter (1998) for tables showing corrected or "true" validities. These concepts will be explained further in Chapter 6 on meta-analysis.

Unfortunately, tables such as 4.2 and 4.3 that make it easy to compare a correlation coefficient with norms are not always available.
Table 4.2
Correlation Norms in Organizational Psychology

Topic                                                  Meta-analysis                       Average Correlation
Intrinsic motivation and organizational commitment     Mathieu & Zajac (1990)                    .67
Job satisfaction and organizational commitment         Cooper-Hakim & Viswesvaran (2005)         .59
Agreement of performance ratings by two supervisors    Conway & Huffcutt (1997)                  .50
Absenteeism and lateness                               Koslowsky et al. (1997)                   .40
Job satisfaction and performance                       Judge et al. (2001)                       .30
Absenteeism and turnover                               Griffeth et al. (2000)                    .21
Enjoyment of training and actual learning              Alliger et al. (1997)                     .11
Table 4.3
Correlation Norms for Employee Selection Validity

Technique              Meta-analysis                      Average Validity
Cognitive ability      Schmidt & Hunter (1998)                 .39
Biodata                Beall (1991)                            .36
Structured interview   Huffcutt & Arthur (1994)                .34
Assessment centers     Arthur et al. (2003)                    .28
Work samples           Roth et al. (2005)                      .26
Experience             Quinones et al. (1995)                  .22
Conscientiousness      Judge et al. (2013)                     .21
Situational judgment   McDaniel et al. (2007)                  .20
References             Aamodt & Williams (2005)                .18
Grades                 Roth et al. (1996)                      .16
Integrity tests        Van Iddekinge et al. (2012)             .13
Personality            Tett et al. (1994)                      .12
Utility Analysis

Another way to determine if a validity coefficient is "any good" is to translate the correlation into terms that most people can understand. Though there are several different methods used to establish the utility of a test (e.g., Taylor-Russell Tables, expectancy charts), we will concentrate on the Brogden-Cronbach-Gleser Utility Formula. This formula computes the amount of money that an organization would save if it used a test to select employees. To use this formula, six pieces of information must be known.

1. Number of employees hired per year (n). This number is easy to determine: it is simply the number of employees who are hired for a given position in a year. This number can be the actual number in a given year or an estimate of the number in a typical year.

2. Average tenure (t). This is the average number of years that employees in the position tend to stay with the company. The number is computed by using information from company records to identify the time that each employee in that position stayed with the company. The number of years of tenure for each employee is summed and divided by the total number of employees. If actual tenure data are not available, an estimate can be used; but estimates reduce the accuracy of the utility formula.
Research has shown, however, that for jobs in which performance is normally distributed, a good estimate of the difference in performance between an average and a good worker (one standard deviation away in performance) is 40% of the employee's annual salary. To obtain this, the total salaries of current employees in the position in question can be averaged, or the salary grade midpoint for the position can be used. For example, if the salary midpoint for an electronics assembler is $25,000, SD$ would be .40 x $25,000 = $10,000.

5. Mean standardized predictor score of selected applicants (m). This number is obtained in one of two ways. The first method is to obtain the average score on the selection test both for the applicants who are hired and the applicants who are not hired. The average test score of the non-hired applicants is subtracted from the average test score of the hired applicants. This difference is divided by the standard deviation of all the test scores. For example, we administer a test of cognitive ability to a group of 50 applicants and hire the 5 with the highest scores. The average score of the 5 hired applicants is 35.2, the average test score of the other 45 applicants is 28.2, and the standard deviation of all test scores is 8.5. The desired figure would be:

(35.2 - 28.2) / 8.5 = 7.0 / 8.5 = .82

The second method uses the selection ratio, which is converted to a standard score with a table such as Table 4.4:

openings / applicants = 5 / 50 = .10
6. Cost of testing (C). This figure is obtained by multiplying the number of applicants by the cost per test.

To determine the savings to the company, we use the following formula:

Savings = (n)(t)(r)(SD$)(m) - cost of testing
Table 4.4
Selection ratio conversion table

Selection Ratio    Standard Score (m)
 .05                    2.08
 .10                    1.76
 .20                    1.40
 .30                    1.17
 .40                    0.97
 .50                    0.80
 .60                    0.64
 .70                    0.50
 .80                    0.35
 .90                    0.20
1.00                    0.00
As an example, suppose we will hire 10 auditors per year, the average person in this position stays 2 years, the validity coefficient is .40, the average annual salary for the position is $30,000, and we have 50 applicants for 10 openings. Thus,

n = 10
t = 2
r = .40
SD$ = $30,000 x .40 = $12,000
m = 10/50 = .20 = 1.40 (.20 is converted to 1.40 by using the conversion table)
Cost of testing = 50 applicants x $10 = $500

Using the utility formula, we would have

(10)(2)(.40)(12,000)(1.40) - (500) = $133,900

This means that after accounting for the cost of testing, using this particular test instead of selecting employees by chance will save a company $133,900 over the two years that auditors typically stay with the organization. Because a company seldom selects employees by chance, the same formula should be used with the validity of the test (interview, psychological test, references, and so on) that the company currently uses. The result of this computation should then be subtracted from the first.
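The worked example can be sketched as a small function (the function name is ours, not the book's):

```python
def utility_savings(n_hired, tenure, validity, sd_dollars, m, cost_of_testing):
    """Brogden-Cronbach-Gleser utility: estimated dollar savings from using
    a test, i.e. (n)(t)(r)(SD$)(m) minus the cost of testing."""
    return n_hired * tenure * validity * sd_dollars * m - cost_of_testing

# Worked example from the text: 10 auditors/year, 2-year tenure, r = .40,
# SD$ = 40% of a $30,000 salary, selection ratio .20 -> m = 1.40, 50 tests at $10
savings = utility_savings(
    n_hired=10, tenure=2, validity=0.40,
    sd_dollars=0.40 * 30000, m=1.40, cost_of_testing=50 * 10)
print(f"${savings:,.0f}")  # $133,900
```

Running the same function a second time with the validity of the company's current selection procedure, and subtracting the two results, gives the incremental savings described above.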
Applying Your Knowledge

You can apply what you have learned in this chapter by reading the following articles in Applied H.R.M. Research (www.xavier.edu/appliedhrmresearch) and in Public Personnel Management.

Cole, M. S., Feild, H. S., & Giles, W. F. (2003). What can we uncover about applicants based on their resumes? A field study. Applied H.R.M. Research, 8(2), 51-62.

Cucina, J. M., Busciglio, H. H., & Vaughn, K. (2013). Category ratings and assessments: Impact on validity, utility, veterans' preference, and managerial choice. Applied H.R.M. Research, 13(1), 51-68.

Gilbert, J. A. (2000). An empirical examination of resources in a diverse environment. Public Personnel Management, 29(2), 175-184.

O'Connell, M. S., Doverspike, D., Cober, A. B., & Philips, J. L. (2001). Forging work teams: Effects of the distribution of cognitive ability on team performance. Applied H.R.M. Research, 6(2), 115-128.

Smith, W. J., Harrington, K. V., & Houghton, J. D. (2000). Predictors of performance appraisal discomfort: A preliminary examination. Public Personnel Management, 29(1), 21-32.
5. Understanding Regression
_______________________________

Functions of Regression

MAKING PRECISE PREDICTIONS

If you remember from our discussion on correlation, we like to see correlations of at least .20 between a selection test and a measure of job performance. However, suppose that we have a personality inventory that correlates .13 with job performance and a cognitive ability test that correlates .15 with job performance. Based on these small correlations, we would probably be disappointed and lose hope of ever winning the Nobel Prize for HR Validity Studies. However, regression analysis might be able to save the day. That is, with multiple regression, we can combine two small correlations into one larger correlation (if the two predictors do not correlate with one another, we can simply add the squared correlations, but it is usually more complicated than that).
The late industrial psychologist Dan Johnson likened the use of regression to a fishing trip. During our trip, we can try to catch one huge fish to make our meal, or we can catch several small fish that, when cooked and placed on a plate, make the same size meal as one large fish. With selection tests, we try for one or two tests that will correlate with performance at a high level. Unfortunately, such big correlations are as hard to get as it is to catch a fish large enough to feed the entire family. But by combining several tests with smaller validities, we can predict performance as well as we could by using one test with a very high validity.
REMOVING UNNECESSARY VARIABLES
One of the nice things about multiple regression is that in
addition to combining small correlations, it also tells us if we
have too many variables measuring the same thing. That is,
suppose our selection battery contains a personality inventory
and an unstructured interview. The personality inventory
correlates .25 with performance and the unstructured interview
correlates .20 with performance. We start to get excited
because if we add the two together, we would have a multiple
correlation (R) of .45. However, after entering our data into a
regression analysis, we find that our equation "threw out" the
interview because it was measuring the same thing our
personality test was measuring: social skills and extroversion.
So, even though we thought we were measuring two different
constructs, our test and interview were actually measuring the
same thing (they were highly correlated).
A few years ago, we were asked by a clinical psychologist to
validate the test battery he was using to select police
officers. As we reviewed his battery, we were stunned to see
that every applicant was administered three different measures
of cognitive ability and three different personality
inventories. When we asked the psychologist why he used so many
similar tests, he said that he "got something different from
each one of them." However, since the three cognitive ability
measures were highly correlated, as were the three personality
inventories (he scores them pass/fail), we doubted that the
extra tests provided any new information.

To test this idea, we entered the test scores into two separate
regression equations: one to predict his overall ratings of
"suitability" and one to predict supervisor ratings of
on-the-job performance. As expected, one personality inventory
and one cognitive ability test predicted his suitability
ratings; the other tests did not help predict his ratings (that
is, the other tests did not account for unique variance). We
were unsuccessful in explaining the results to him in terms of
statistics, so we finally said, "What the results show is that
you can make the same decisions with two tests as you would
with six. The difference is that you will save about $100 in
testing costs per applicant." That he understood!
The moral of this story is that when selecting employees, the
test battery should not contain several measures of the same
knowledge, skill, or ability. If we go back to our example of
making a meal, once we catch enough fish (a cognitive ability
test), there is no need to catch more (more cognitive ability
tests). Instead, to make the perfect meal, we should add a
salad (personality inventory), some bread (structured
interview), and dessert (integrity test). Too much of the same
thing makes a boring meal and a wasteful selection battery.
Conducting a Regression Analysis
To conduct a regression analysis, a computer program such as
SPSS, SAS, or Excel is typically used. Though each of these
programs uses slightly different commands, the results are
almost identical. To run a regression analysis, a researcher
tells the computer which variables are the predictors (the
independent variables) and which variables are the ones to be
predicted (the dependent variables). For example,

- A personnel analyst might be interested in seeing how
  interview scores and scores on five personality dimensions
  (the predictors) predict supervisor ratings of on-the-job
  performance (the dependent variable).
- A compensation analyst might want to see how performance
  ratings, years in the organization, and education level (the
  predictors) are related to salary (the dependent variable).
- A university might want to find the best way to use SAT
  scores and high school grade point averages (the predictors)
  to predict the GPA students will earn during their freshman
  year (the dependent variable).
TYPES OF REGRESSION ANALYSES

There are two main ways to enter the predictors into the
regression analysis: stepwise and hierarchical. With a stepwise
regression analysis, the computer takes the best predictor of
the dependent variable and enters it into the equation first.
The computer then enters the second best predictor and then the
third best, and so on. The computer stops entering variables
when there are either no variables left to enter or the
remaining variables do not add a significant amount of
prediction. Stepwise regression is the most commonly used
method in employee selection.
How does the computer program know which variable is best at
each step? The first variable entered into the regression
equation is the one with the highest correlation with the
dependent variable. The next one entered is determined by two
things: how well it is related to the dependent variable and
how highly it is correlated with the variable already entered
into the equation. As an example, look at the correlations in
Table 5.1 that show the relationships between grades in
graduate school and GRE scores, undergraduate grades, and
reference letters.

In a stepwise regression, GRE scores would be the first
predictor entered into the equation, because they have the
highest correlation with the dependent variable (graduate GPA).
Although undergraduate grades have the next highest correlation
(r = .25), references (r = .20) would actually be entered next:
undergraduate grades are so highly correlated (r = .80) with
GRE scores that they would not add much unique, or incremental,
prediction. References, however, are not at all correlated with
GRE scores (r = .00) and thus will add incremental prediction.
Table 5.1
Correlations with Graduate GPA

                     Grad GPA    GRE    UG GPA    References
Graduate GPA           1.00      .30      .25        .20
GRE                             1.00      .80        .00
Undergraduate GPA                        1.00        .10
References                                          1.00
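The stepwise logic can be illustrated directly from the Table 5.1 correlations. The sketch below is a hypothetical computation, not output from SPSS or SAS; it uses the standard formula for the squared multiple correlation, R-square = c'Rxx⁻¹c, where c holds the predictor-criterion correlations and Rxx the predictor intercorrelations:

```python
import numpy as np

# Correlations from Table 5.1 with graduate GPA and among the predictors.
r_y = {"GRE": 0.30, "UGGPA": 0.25, "REF": 0.20}
r_xx = {("GRE", "UGGPA"): 0.80, ("GRE", "REF"): 0.00, ("UGGPA", "REF"): 0.10}

def r_squared(predictors):
    """Squared multiple correlation from a correlation matrix: R^2 = c' Rxx^-1 c."""
    k = len(predictors)
    rxx = np.eye(k)
    for i in range(k):
        for j in range(i + 1, k):
            key = (predictors[i], predictors[j])
            rxx[i, j] = rxx[j, i] = r_xx.get(key, r_xx.get(key[::-1], 0.0))
    c = np.array([r_y[p] for p in predictors])
    return float(c @ np.linalg.inv(rxx) @ c)

print(round(r_squared(["GRE"]), 3))           # GRE alone
print(round(r_squared(["GRE", "UGGPA"]), 3))  # barely higher than GRE alone
print(round(r_squared(["GRE", "REF"]), 3))    # references add unique prediction
```

With GRE already in the equation, undergraduate GPA adds almost nothing (R-square goes from .090 to about .090), while references raise it to .130, which is why they enter next.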
With a hierarchical regression analysis, the researcher tells
the computer program the order in which to enter the
predictors. There are many times when a researcher might want
to dictate the order. For example, a police department has
developed a structured interview and intends to use it as the
main method of selecting new officers. The department is
considering adding a cognitive ability test to its selection
battery and wants to know if the cognitive ability test will
increase the prediction accuracy above that already provided by
the interview. In such a case, the department would first enter
interview scores into the regression analysis and then the
cognitive ability scores. If the cognitive ability scores are
statistically significant (i.e., provide incremental validity),
the department would use both the structured interview and the
cognitive ability test. If the addition of the cognitive
ability test is not significant, the department would use only
the structured interview.

Another reason to use hierarchical regression is to reduce
adverse impact. For example, suppose that a police department
plans to use a structured interview and a cognitive ability
test to select new officers. The cognitive ability test
correlates .30 with performance, the structured interview
correlates .20 with performance, and the interview is
moderately correlated with the cognitive ability test. When
used together in a stepwise regression, the interview and the
test correlate .35 with police performance. Though the
department is happy with the validity of the two tests, it is
concerned because the selection battery has adverse impact
against African Americans. After analyzing the data further,
the department finds that the adverse impact is due to the
cognitive ability test: African Americans and Whites score
equally well on the structured interview.

Because of the adverse impact, the department considers
dropping the cognitive ability test but doesn't want to do that
because the test is valid. In this case, hierarchical
regression might be a partial solution. By entering the
structured interview into the equation first, it will carry
more weight than it would in a stepwise regression equation. By
increasing the weight given to the predictor with no adverse
impact, the adverse impact of the entire selection procedure
will be reduced (but not eliminated).
Hierarchical regression is also commonly used in salary equity
analysis. For example, suppose that a school system discovers
that the average salary for its 30 female janitors is $17,232
compared to $20,400 for its 50 male janitors. The Department of
Labor thinks this difference is due to discrimination. The
school system, however, thinks that the difference in average
salary is due to the number of years the employees have been
with the school system and to their performance ratings rather
than to sex discrimination.

To test its idea, the school system would first enter the
employees' tenure and performance rating data into the analysis
and then enter the employees' gender (coded as 0 for males, 1
for females). If gender did not enter the equation as a
significant predictor of salary, it could be said that the
salary differences were indeed due to tenure and performance.
If gender entered the equation as a significant predictor, two
interpretations could be made. The first is that the school
system is discriminating. The second is that there are other
unknown variables (e.g., education level) that could explain
the salary difference if they were entered into the regression
equation.
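A minimal sketch of this hierarchical logic using simulated janitor data (all numbers below are invented for illustration; salary in this toy data is driven by tenure alone, so gender should add essentially nothing at step 2):

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible
n = 80
tenure = rng.uniform(1, 20, n)
gender = (rng.random(n) < 0.4).astype(float)  # 0 = male, 1 = female, as in the text
# In this simulated data, salary depends only on tenure -- no bias built in.
salary = 15_000 + 300 * tenure + rng.normal(0, 1_000, n)

def r2(predictors, y):
    """R-squared from an ordinary least squares fit with an intercept."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return 1 - resid.var() / y.var()

r2_step1 = r2([tenure], salary)          # step 1: tenure entered first
r2_step2 = r2([tenure, gender], salary)  # step 2: gender entered second
print(round(r2_step2 - r2_step1, 4))     # tiny increment: no evidence of bias
```

In a real analysis, a statistics package would attach a significance test to the step-2 increment; a near-zero, nonsignificant increment supports the tenure-and-performance explanation.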
CONSIDERATIONS IN RUNNING A REGRESSION ANALYSIS

For a regression analysis to be accurate, three factors should
be considered: number of subjects, variables in the regression
model, and missing data.
Number of Subjects

Though computer programs will allow you to run a regression
analysis with data from two or more subjects, the results are
not as accurate (stable) when a small number of subjects (e.g.,
applicants, employees) is used. The question of how many
subjects are needed to reliably run a regression analysis is a
difficult one to answer and depends in part on the purpose of
the regression analysis. If the purpose of the regression
analysis is to explain what is happening in a series of data
(e.g., Are men and women at a particular company being paid
equitably? What factors are associated with absenteeism on the
night shift?), fewer subjects are needed than if the purpose is
to predict the behavior of people not in the sample (e.g.,
using employee data from 1999-2005 to predict how future
employees will behave).
So, how many subjects do you need? The answer depends on who
you ask. Some statisticians argue that there is a minimum
number of subjects that must be present to conduct a regression
analysis (e.g., 50), some argue that the key is the ratio of
subjects to the number of variables (e.g., 10 subjects for
every predictor), and others argue that both a minimum number
and the subjects-to-variables ratio are important (e.g., a
minimum of 30 subjects and at least 10 subjects per predictor).
In general, regression used to explain data can be comfortably
used when you have data from 50 or more people. Regressions can
be run with data from fewer people, but caution should be used
when interpreting the results.
Variables in the Regression Model

Model Specification. One of the assumptions in regression is
that all relevant variables are included in the model and no
irrelevant variables are included. For example, suppose that
you were trying to predict graduate GPA and theorized that
grades in graduate school are the result of both cognitive
ability and motivation. If your regression equation only
included GRE scores (a measure of cognitive ability) but no
measure of motivation (e.g., undergraduate GPA, letters of
recommendation, personal statement), you would have a model
specification error. Thus, it is important to first theorize or
brainstorm the relevant variables and then make a strong effort
to include them in your regression.
Missing Data

Interpreting Regression Results
Table 5.2
Regression statistics

                       Salary Study    Graduate School Admissions
Observations                33                   232
Multiple R                 0.53                  0.47
R square                   0.28                  0.22
Adjusted R2                0.21                  0.21
Standard error           $2,160                  0.39
EFFECTIVENESS OF THE REGRESSION ANALYSIS

Observations

Multiple R

R Square (R2)
R Square (R2) is the percentage of individual differences in
the dependent variable that the regression model explains. As
shown in Table 5.2, the combination of time in company,
bachelor's degree, and performance ratings accounts for 28% of
the individual differences in employee pay (.53 x .53). We are
not sure what accounts for the additional 72%. For the graduate
admissions study, the combination of undergraduate GPA, GRE
scores, and reference ratings accounts for 22% of the
variability in graduate school grades (.47 x .47).
Adjusted R2
Regression is most accurate with large sample sizes. The
adjusted R-square corrects for estimated errors caused by small
sample sizes. The larger the sample size, the smaller the
difference between the R2 and the adjusted R2. In Table 5.2,
the R2 of .28 for the salary study was adjusted downward to .21
and the R2 of .22 for the graduate admissions study was
adjusted downward to .21. Notice that because the graduate
admissions study has a much larger sample size (232) than does
the salary study (33), the adjusted R2 did not decline as much.
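The adjustment is a simple formula, so the Table 5.2 figures can be checked by hand (a sketch of the standard correction; SPSS and SAS report the same quantity):

```python
def adjusted_r2(r2, n, k):
    """Adjusted R2 = 1 - (1 - R2)(n - 1) / (n - k - 1), with k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Both studies in Table 5.2 used three predictors.
print(round(adjusted_r2(0.28, 33, 3), 2))   # salary study: 0.21
print(round(adjusted_r2(0.22, 232, 3), 2))  # graduate admissions: 0.21
```

Note how the same three-predictor model loses .07 of its R2 with 33 subjects but only .01 with 232.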
Standard Error of the Estimate
As mentioned earlier in the chapter, regression can be used to
make predictions. In the salary study, the goal was to predict
or estimate what an employee's salary should be given his/her
years in the company, education, and performance. Because
estimates made from a regression equation are just that,
estimates, most regression output includes the Standard Error
of the Estimate. The greater the R2 and the sample size, the
smaller the standard error of the estimate.
As shown in Table 5.2, for the salary study, the standard error
of $2,160 indicates that 68% (one standard deviation from the
mean) of the errors in estimating what an employee's salary
should be will fall within $2,160. Stated another way, if we
estimate that an employee should be paid $45,000, we are 68%
confident that their salary should be between $42,840 ($45,000
- $2,160) and $47,160 ($45,000 + $2,160). So, if we estimate
that an employee should make $45,000 and she is actually making
$44,000, we would probably not be concerned that the employee
is underpaid, because her actual salary ($44,000) falls within
one standard error ($42,840 to $47,160) of the estimated
salary. Though our example used one standard error, most HR
professionals use a criterion of two standard errors.
For the graduate admissions study, the standard error of the
estimate was .39, indicating that if we predict that a student
will earn a graduate GPA of 3.6, we would expect that 68% of
the time their actual graduate GPA will be between 3.21 (3.60 -
.39) and 3.99 (3.60 + .39).
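The interval arithmetic for the salary study can be sketched in a few lines, including the more cautious two-standard-error band many HR professionals prefer:

```python
estimate = 45_000
se = 2_160  # standard error of the estimate from Table 5.2

# Roughly 68% of actual salaries fall within one standard error of the estimate.
one_se_band = (estimate - se, estimate + se)
# The two-standard-error criterion covers roughly 95% of cases.
two_se_band = (estimate - 2 * se, estimate + 2 * se)
print(one_se_band)  # (42840, 47160)
print(two_se_band)  # (40680, 49320)
```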
ANALYSIS OF VARIANCE RESULTS
Table 5.3
ANOVA table for the salary study regression analysis

Source        df    Sum of Squares     Mean Square      F      p<
Regression     3     53,164,088.63    17,721,362.88    3.80   0.02
Residual      29    135,373,606.01     4,668,055.38
Total         32    188,537,694.64
Table 5.4
ANOVA table for the graduate admissions regression

Source        df    Sum of Squares    Mean Square      F       p<
Regression     3          9.97            3.32        21.96   0.000
Residual     228         34.50            0.15
Total        231         44.47
For the graduate admissions study ANOVA in Table 5.4, notice
that the probability value is 0.000. This also indicates that
the R2 is statistically significant. Most programs round
probability levels to two or three decimal places. As a result,
if you see a probability value of .00 or .000, it means that
the probability that the results occurred by chance is lower
than 1 in a hundred for the .00 figure and 1 in a thousand for
the .000 figure, both of which are statistically significant.

Although the other numbers in the tables are only important
because they provide the data necessary to get the significance
level, in case you are interested, here is what the other
columns mean.
Degrees of Freedom

Sum of Squares, Mean Square, and F
The sums of squares and the mean square are used to compute the
F value that you learned about in Chapter 3. The mean square is
computed by dividing the sum of squares by the degrees of
freedom, and the F value is computed by dividing the regression
mean square by the residual (or error) mean square.
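These two computations can be traced through with the Table 5.3 numbers (a sketch reproducing the table's F value):

```python
# Sums of squares and degrees of freedom from Table 5.3 (salary study).
ss_regression, df_regression = 53_164_088.63, 3
ss_residual, df_residual = 135_373_606.01, 29

ms_regression = ss_regression / df_regression  # mean square = SS / df
ms_residual = ss_residual / df_residual
f_value = ms_regression / ms_residual
print(round(f_value, 2))  # 3.80, matching the table
```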
INFORMATION ABOUT THE INDEPENDENT VARIABLES
The final section of the output contains information about each
of the independent variables included in the regression
analysis. The first key value for each variable is the p-value,
which is the significance level. If this value is less than or
equal to .05, the variable explains a statistically significant
percentage of the individual differences in your dependent
variable. In the salary study example shown in Table 5.5, time
in company (tenure) with a p-value of .00 is statistically
significant, but having a bachelor's degree (p = .13) and
performance ratings (p = .89) are not.

When looking at the significance levels for the independent
variables, three patterns can emerge: all variables will be
statistically significant, none of the variables will be
statistically significant, or some, but not all, of the
variables will be statistically significant. If none of the
variables are significant, you can't use the regression model
to understand current behavior or to predict/estimate
future/desired behavior.

In the graduate admissions study shown in Table 5.6, both GPA
(p = .000) and GRE scores (p = .000) are statistically
significant but reference ratings (p = .118) are not.
Table 5.5
Information about the independent variables in the salary study

Variable       Coefficient    Standard Error    t-value    p-value    Beta
Intercept      $19,756.00        $668.97         29.53       0.00     0.00
Tenure            $318.82        $103.69          3.07       0.00     0.74
Degree            $983.32        $625.57          1.57       0.13     0.39
Performance        $96.58        $693.77          0.14       0.89     0.02
Table 5.6
Information about the independent variables in the graduate
admissions study

Variable      Coefficient    Standard Error    t-value    p-value    Beta
Intercept        1.13             .321           3.52       .001
GPA              .383             .076           5.03       .000     .337
GRE              .001             .000           3.59       .000     .211
References       .136             .087           1.57       .118     .105
THE REGRESSION EQUATION

Y = c + (b1)(x1)
Where Y is the predicted value of some variable, c is a
constant (in algebra, we would call this the intercept; it
represents the predicted score on the criterion if the score on
the predictor were zero), b1 is the weight we give our
predictor (in algebra we would call this the slope; it
represents the amount of change we would expect in the
criterion for each unit of change in the predictor), and x1 is
the score on a predictor. Though the constant and the weight
can be calculated by hand, we normally let the computer do the
work by using a program such as SAS, SPSS, or Excel.
To use our regression results from Table 5.6, our regression
equation to predict graduate school grades would be:

Predicted grad GPA = 1.13 + (.383)(UG GPA) + (.001)(GRE) + (.136)(reference score)
In the equation:

- 1.13 is the constant (intercept)
- .383 is the weight that is multiplied by the undergraduate GPA
- .001 is the weight that is multiplied by the GRE score
- .136 is the weight that is multiplied by the reference rating
  (the reference rating is on a 1-4 scale with a 4 being
  excellent and a 1 being below average).
Let's use two hypothetical students as an example. Jenny Craig
has a GRE score of 1000, an undergraduate GPA of 3.60, and a
reference rating score of 3.0. Richard Simmons has a GRE score
of 900, an undergraduate GPA of 3.0, and a reference rating
score of 2.0. The formula to predict the students' graduate
GPAs would be:

Jenny's GPA   = 1.13 + (.383)(3.60) + (.001)(1000) + (.136)(3.0)
              = 1.13 + 1.38 + 1.00 + 0.41
              = 3.92

Richard's GPA = 1.13 + (.383)(3.00) + (.001)(900) + (.136)(2.0)
              = 1.13 + 1.15 + 0.90 + 0.27
              = 3.45
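The two predictions can be reproduced with a few lines (a sketch using the Table 5.6 weights):

```python
def predicted_grad_gpa(ug_gpa, gre, reference):
    # Intercept and weights taken from the Table 5.6 regression results.
    return 1.13 + 0.383 * ug_gpa + 0.001 * gre + 0.136 * reference

jenny = predicted_grad_gpa(3.60, 1000, 3.0)
richard = predicted_grad_gpa(3.00, 900, 2.0)
print(round(jenny, 2), round(richard, 2))  # 3.92 3.45
```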
Final Thought

Box 5.1
Checklist for evaluating journal articles and technical reports
using regression

Assessment Questions
- Is the sample size large enough to handle the number of
  variables in the regression?
- Are all relevant variables included in the regression?
- Is the model free of irrelevant variables?
- Is the hypothesized relationship between the independent and
  dependent variables linear?
- If the hypothesized relationship is not linear, did the
  researcher test for curvilinear relationships?
- Did the researcher search for and remove outliers?
- Are the correlations among the independent variables less
  than .90?
Applying Your Knowledge

You can apply what you have learned in this chapter by reading
the following articles in Applied H.R.M. Research
(www.xavier.edu/appliedhrmresearch) and in Public Personnel
Management.

Befort, N., & Hattrup, K. (2003). Valuing task and contextual
performance: Experience, job roles, and ratings of the
importance of job behaviors. Applied H.R.M. Research, 8(1),
17-32.

Hausdorf, P. A., & Girard, M. J. L. (2013). Predicting clerical
training performance in the Canadian forces: A comparison of
cognitive ability, education and work experience. Applied
H.R.M. Research, 13(1), 69-72.

Huang, I., Chuang, C. J., & Lin, H. (2003). The role of burnout
in the relationship between perceptions of organizational
politics and turnover intentions. Public Personnel Management,
32(4), 519-530.

Raynes, B. L. (2001). Predicting difficult employees: The
relationship between vocational interests, self-esteem, and
problem communication styles. Applied H.R.M. Research, 6(1),
33-66.

Roberts, G. E. (2003). Municipal government part-time employee
benefits practices. Public Personnel Management, 32(3),
435-454.
6. Meta-Analysis
_______________________________
Problems with Traditional Literature Reviews

SMALL BUT CONSISTENT CORRELATIONS

MODERATE RELATIONSHIPS AND SMALL SAMPLE SIZES
A second situation in which traditional literature reviews
often draw incorrect conclusions occurs when the correlations
in the previous studies are moderate or high, but the sample
sizes were too low for the relationship to be statistically
significant. Take, for example, the four studies shown in Table
6.2. Each of the correlations is at what we would consider a
high level, yet the correlations would not be statistically
significant due to the small sample sizes in each study. If we
combined the four studies, however, we would get an average
correlation of .41 with a total sample size of 80; this would
be statistically significant.
To better understand sampling error, imagine that you have a
bowl containing three red balls, three white balls, and three
blue balls. You are asked to close your eyes and pick three
balls from the bowl. Because there are equal numbers of red,
white, and blue balls in the bowl, you would expect to draw one
of each color. However, in any given draw from the bowl, it is
unlikely that you will get one of each color. If you have no
life and draw three balls at a time for ten hours, you might
get three red balls on some draws, no white balls on other
draws, and three white balls on other draws. Thus, even though
we know there are an equal number of each color of ball, any
one draw may or may not represent what we know is "the truth."
However, over the 10 hours you are drawing balls, the most
common draw will be one of each color, a finding consistent
with what we know is in the bowl.
The same is true in research. Suppose we know that the true
correlation between education level and performance …
another agency might report a correlation of .50, and yet
another agency might report a correlation of .30. If all three
studies had small samples, the differences among the studies
and differences from the "truth" might be due purely to
sampling error. This is where meta-analysis saves the day.
Meta-analysis is a statistical method for combining research
results. Since the first meta-analysis was published by Gene
Glass in 1976, the number of published meta-analyses has
increased tremendously, and the methodology has become
increasingly complex. The meta-analysis pioneers were Frank
Schmidt and John Hunter, and almost every meta-analysis uses
the methods they suggested in their 1990 book Methods of
Meta-Analysis and clarified in the book Conducting
Meta-Analysis Using SAS by Winfred Arthur, Winston Bennett, and
Allen Huffcutt.

Though meta-analyses will vary somewhat in their methods and
their purpose, most meta-analyses involving personnel selection
issues try to answer three questions:
Conducting a Meta-Analysis

FINDING STUDIES

CHOOSING STUDIES TO INCLUDE IN THE META-ANALYSIS
Once all of the relevant studies on a topic have been located,
the next step is to determine which of these studies will be
included in the meta-analysis. To be included in a
meta-analysis, an article must report the results of an
empirical investigation and include a correlation coefficient,
another statistic (e.g., F, t, chi-square) that could be
converted to a correlation coefficient, or tabular data that
can be entered into the computer to yield a correlation
coefficient (many meta-analyses use Cohen's d rather than a
correlation coefficient, but the rules to include an article
are the same). Articles that report results without the above
statistics (e.g., "we found a significant relationship between
education and academy performance" or "we didn't see any real
differences between our educated and uneducated officers")
cannot be included in a meta-analysis.

Often, meta-analysts will have other rules about keeping
studies. For example, in a meta-analysis on employee wellness
programs, the researchers' decision to include only studies
using both pre- and post-measures of absenteeism as well as
experimental and control groups resulted in only three usable
studies.
CONVERTING RESEARCH FINDINGS TO CORRELATIONS

CUMULATING VALIDITY COEFFICIENTS

CORRECTING FOR ARTIFACTS
Table 6.4
Example of Cumulating Validity Coefficients

Study             Correlation    Sample Size    Correlation x Sample Size
Briscoe (1997)        .23            150                  34.5
Green (1974)          .10            100                  10.0
Curtis (1982)         .42             50                  21.0
Logan (1991)          .27            300                  81.0
Ceretta (1995)        .01             20                   0.2
Greevy (1989)         .29            200                  58.0
TOTAL                                820                 204.8

Weighted Average = 204.8 / 820 = .25
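The cumulation in Table 6.4 is just a sample-size-weighted average, sketched below:

```python
# Correlations and sample sizes from Table 6.4.
studies = [(0.23, 150), (0.10, 100), (0.42, 50),
           (0.27, 300), (0.01, 20), (0.29, 200)]

total_n = sum(n for _, n in studies)
weighted_mean_r = sum(r * n for r, n in studies) / total_n
print(total_n, round(weighted_mean_r, 2))  # 820 0.25
```

Weighting by sample size means Logan's 300-subject study pulls the average far more than Ceretta's 20-subject study.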
Table 6.5
Correcting Correlations for Test Unreliability

Study    Validity    Test Reliability    Square Root    Corrected Validity
These adjustments can be made in one of two ways. The most
desirable way is to correct the validity coefficient from each
study based on the predictor reliability, criterion
reliability, and restriction of range associated with that
particular study. A simple example of this process is shown in
Table 6.5. To correct the validity coefficients in each study
for test unreliability, the validity coefficient is divided by
the square root of the reliability coefficient. In the Tinkers
(1985) study, the reliability of the test was .92, the square
root of .92 is .96, and the corrected validity coefficient is
.30/.96 = .31.
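The Tinkers (1985) correction can be verified in one line (a sketch of the formula just described):

```python
import math

# Tinkers (1985) example: observed validity .30, test reliability .92.
observed_validity = 0.30
test_reliability = 0.92

# Correction for unreliability: divide by the square root of the reliability.
corrected = observed_validity / math.sqrt(test_reliability)
print(round(corrected, 2))  # 0.31
```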
When the necessary information is not available for each study,
the mean validity coefficient is corrected rather than each
individual coefficient. This is the most common practice. The
numbers used to make these corrections come either from the
average of information found in the studies that provided
reliability or range restriction information or from other
meta-analyses. For example, an estimate of the reliability of
supervisor ratings of overall performance (r = .52) can be
borrowed from the meta-analysis on rating reliability by
Viswesvaran, Ones, and Schmidt (1996).
SEARCHING FOR MODERATORS

Understanding Meta-Analysis Results
Table 6.6
Sample meta-analysis results for cognitive ability

                                 95% Confidence          90% Credibility
                                 Interval                Interval
Criterion     K      N      r    Lower    Upper    ρ     Lower    Upper    SE%     Qw
Supervisor    61   16,231   .16   .12      .20    .27     .14      .40     80%    76.40
ratings

K = number of studies, N = sample size, r = mean correlation,
ρ = mean correlation corrected for range restriction, criterion
unreliability, and predictor reliability, SE% = percentage of
variance explained by sampling error and study artifacts
NUMBER OF STUDIES AND SAMPLE SIZE

The "K" column indicates the number of studies included in the
meta-analysis, and the "N" column indicates the total number of
subjects in the studies. There is not a magical number of
studies that we look for, but a meta-analysis with 20 studies
is clearly more useful than one with five.
MEAN OBSERVED VALIDITY COEFFICIENT

The "r" column represents the mean validity coefficient across
all studies (weighted by the size of the sample). This
coefficient answers our question about the typical validity
coefficient found in validation studies on the topic of
cognitive ability and police performance. On the basis of our
meta-analysis, we would conclude that the validity of cognitive
ability in predicting academy grades is .41, and the validity
of cognitive ability in predicting supervisor ratings of
on-the-job performance is .16.
CONFIDENCE INTERVAL

To determine if our observed validity coefficient is
"statistically significant," we look at the next two columns,
which represent the lower and upper limits of our 95%
confidence interval. If the interval includes zero, we cannot
say that our mean validity coefficient is significant. From the
figures in Table 6.6, we would conclude that cognitive ability
is a significant predictor of grades in the academy (our
confidence interval is .33 to .48) and performance as a police
officer (our confidence interval is .12 to .20). Using
confidence intervals, we can communicate our findings with a
sentence such as "Though our best estimate of the validity of
cognitive ability in predicting academy performance is .41, we
are 95% confident that the validity is no lower than .33 and no
higher than .48." It is important to note that some
meta-analyses use 80%, 85%, or 90% confidence intervals. The
choice of confidence interval levels is a reflection of how
conservative a meta-analyst wants to be: the more cautious one
wants to be in interpreting the meta-analysis results, the
higher the confidence interval used.
CORRECTIONS FOR ARTIFACTS

CREDIBILITY INTERVAL

PERCENTAGE OF VARIANCE DUE TO SAMPLING ERROR
Table 6.7
Meta-Analysis of Grades and Work Performance

Criterion    K    N    r    Corrected r    80% C.I.    SE%
A good example of the use of this statistic can be found in a
meta-analysis of the effect of flextime and compressed work
weeks on work-related behavior (Baltes, Briggs, Huff, Wright, &
Neuman, 1999). As you can see in Table 6.8, the asterisks in
the final column indicate a significant Qw, forcing a search
for moderators. Note that this meta-analysis used the d
statistic rather than an r (correlation) as the effect size. In
this example, a d of .30 is equivalent to an r of .15.
Table 6.8
Meta-Analysis of Flextime and Compressed Work Weeks

                                            95% CI
Variable                 K      N      d    Lower    Upper      Qw
Flextime                41    4,492   .30    .26      .35     1004.55**
Compressed work week    25    2,921   .29    .23      .34      210.58**
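The d-to-r conversion quoted above follows the standard equal-group-size formula (a sketch; meta-analysts sometimes use sample-size-weighted variants):

```python
import math

def d_to_r(d):
    """Convert a standardized mean difference d to a correlation r (equal-n case)."""
    return d / math.sqrt(d**2 + 4)

print(round(d_to_r(0.30), 2))  # about .15, as stated in the text
```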
Applying Your Knowledge

You can apply what you have learned in this chapter by reading
the following articles in Applied H.R.M. Research
(www.xavier.edu/appliedhrmresearch).

Aamodt, M. G. (2004). Special issue on using MMPI-2 scale
configurations in law enforcement selection: Introduction and
meta-analysis. Applied H.R.M. Research, 9(2), 41-52.

Godfrey, K. J., Bonds, A. S., Kraus, M. E., Wiener, M. R., &
Toth, C. S. (1990). Freedom from stress: A meta-analytic view
of treatment and intervention programs. Applied H.R.M.
Research, 1(2), 67-80.
7. Factor Analysis
____________________________
IMAGINE THE FOLLOWING SITUATIONS:

- A human resource manager asked her employees 50 questions
  about their attitudes toward work and is looking to find an
  easy way to summarize the responses to the 50 questions.
- An HR analyst has written a 100-item math test and is worried
  that the test may be tapping more than just math skills.
- A training specialist has evaluated her employees on 20
  dimensions of communication skills but is worried that giving
  feedback on 20 dimensions will take too long and be too
  difficult to understand.
- An organization development specialist created a personality
  test to use in training workshops. He thinks his 200
  questions tap five distinct personality dimensions.
In these situations, the four human resource professionals have
one of two goals. Either they want to reduce a large amount of
data (e.g., 50 attitude questions, 20 communication dimensions)
into something more manageable, or they want to ensure that
they are measuring the correct number of constructs (e.g., math
skills, 5 personality dimensions). Factor analysis will help
achieve both goals.
Factor analysis is a statistical technique that is extremely
difficult to compute by hand, fairly easy to compute with a
computer program such as SAS or SPSS, and fairly easy to
understand when reading the results in a journal article. This
chapter will focus on understanding the terminology and tables
used in journal articles using factor analysis.
A factor analysis computes the similarity of responses to a
series of questions and then determines groups of questions
that seem to generate similar responses. For example, in a
study on hobbies, interest in baseball, football, and
basketball might fall into a "sports" group, and interest in
hiking, canoeing, and fishing might fall into an "outdoor"
group.

These groupings are called factors. For each question, a factor
coefficient is generated indicating the extent to which that
question relates to each factor. These factor coefficients can
be interpreted in much the same way as we would a correlation
coefficient: the higher the coefficient, the greater the
relationship between the question and the factor.
Determining the Number of Factors

Determining the Items Belonging to Each Factor
To determine the items that belong to each factor, we look at
the size of the factor coefficients. Generally, for an item to
belong to a factor, the factor coefficient must be .30 or
higher. As you can see back in Table 7.1, popcorn (.80), potato
chips (.83), and Cracker Jack (.75) belong to Factor 1; carrots
(.64), peas (.84), and lima beans (.75) belong to Factor 2; and
apples (.60), oranges (.87), and plums (.73) belong to Factor
3. Notice that some of the factor coefficients are negative. As
you might remember from the chapter on correlation, a negative
coefficient is not a bad thing: it simply tells us the
direction of the relationship. For example, in Table 7.1, the
negative loading of apples on Factor 1 (-.21) tells us that
people who like apples are not as inclined to like the foods
that load highly on Factor 1 (popcorn, potato chips, Cracker
Jack).
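The .30 cutoff rule can be sketched as a small routine. The loadings below are hypothetical, loosely echoing Table 7.1; the cross-loadings are invented for illustration:

```python
# Hypothetical factor coefficients (Factor 1, Factor 2, Factor 3) per item.
loadings = {
    "popcorn": (0.80, 0.05, -0.10),
    "peas": (0.02, 0.84, 0.08),
    "apples": (-0.21, 0.10, 0.60),
}

def assign(loadings, cutoff=0.30):
    """Assign each item to every factor where |coefficient| >= cutoff."""
    return {item: [i + 1 for i, c in enumerate(coefs) if abs(c) >= cutoff]
            for item, coefs in loadings.items()}

print(assign(loadings))  # popcorn -> factor 1, peas -> 2, apples -> 3
```

Note that the absolute value is used: a loading of -.80 would count toward a factor just as strongly as .80, only in the opposite direction.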
Often, a negative loading is expected and helps define a
factor. For example, we might add "shy" and "introverted" to
the traits that form the extraversion factor in Table 7.2. We
would expect outgoing, talkative, funny, and sociable to have
high positive factor coefficients, whereas shy and introverted
would have high negative factor coefficients on the
extraversion factor.
When describing the results of a factor analysis,
researchers will often use the term rotation. Rotation is a
statistical procedure that makes it easier to determine the
Factor Analysis
items that belong to each factor. The most common rotations
include varimax, equimax, oblimin, and quartimax, each of
which has a different goal. For example, the goal of a varimax
rotation is to have high factor coefficients for items that are
relevant to the factor and very low coefficients for the other
items. In contrast, the aim of a quartimax rotation is to
increase the odds that an item will have a high factor
coefficient on only one factor. It should be noted that a rotation
only makes the factor analysis easier to interpret. It does not
change the actual relationships among the items.
Table 7.3 shows the factor analysis of six scales from an
ability test. Notice that without a rotation, perceptual speed
and vocabulary have high loadings on both factors. After
rotation, however, the factors are "cleaner," and each scale
loads only on one factor, making the factors easier to interpret.
Also notice that the factor loadings for the two types of
rotations are almost identical; thus, the type of rotation would
not have mattered. This is not always the case.
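The point that rotation changes the presentation but not the relationships among items can be illustrated with a minimal sketch. The two-factor loading matrix below is made up, and the rotation angle is arbitrary rather than chosen by a varimax-style criterion; the sketch only shows that each item's communality (its sum of squared loadings) survives an orthogonal rotation unchanged.

```python
import math

def rotate(loadings, theta):
    """Orthogonally rotate a two-column loading matrix by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [(a * c - b * s, a * s + b * c) for a, b in loadings]

def communalities(loadings):
    """Sum of squared loadings for each item, rounded for comparison."""
    return [round(a * a + b * b, 6) for a, b in loadings]

# Hypothetical unrotated loadings where items load on both factors
unrotated = [(0.70, 0.50), (0.65, 0.55), (0.60, -0.45), (0.55, -0.50)]
rotated = rotate(unrotated, math.radians(40))

# The individual loadings change, but the communalities do not
print(communalities(unrotated))
print(communalities(rotated))
```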
Naming the Factors

Once we see the items that belong to a factor, we try to make
sense of that factor by giving it a name. The example back in
Table 7.1 should be easy: the three factors represent the food
groups of vegetables, fruits, and junk food. Sometimes,
however, naming the factors can be more challenging. A good
example comes from the factor analysis of difficult-employee
types that was conducted by Raynes (1997). Raynes noticed
that popular books on dealing with difficult people (e.g.,
Bramson, 1981; Brinkman & Kirshner, 1994) suggested that
there are several types of difficult people, such as tanks,
snipers, whiners, yes people, no people, maybe people,
gossipers, and know-it-alls. Raynes, who questioned whether
there were actually so many types, factor analyzed supervisor
ratings of employees' difficult behaviors and found that these
behaviors could be reduced to the two factors shown in Table
7.4. What would you call these two factors? Raynes (1997)
labeled Factor I "aggressive behaviors" and Factor II "passive
behaviors."
The Raynes study is a good example of the usefulness of
factor analysis. By reducing the number of variables from 10
to 2, Raynes was able to conduct a more efficient study of the
validity of a test battery in screening applicants who might
become problem employees. But more importantly, the factor
analysis demonstrated that the popular belief that there are 10
separate types of difficult employees is not true. Instead, there
is one type (aggressive) who gossips, disagrees, whines, throws
tantrums, and uses sarcasm, and a second type (passive) who
can't say no and won't speak up. Thus, rather than learning
how to deal with 10 types of difficult people, we only need to
learn how to handle two types.
Determining if the Factor Analysis is Any Good

With most statistics, we get a significance level that helps us
determine the confidence we can place in our findings. With
factor analysis, determining if the results are significant is a
bit more difficult. One way to evaluate an exploratory factor
analysis is to consider the amount of variance that is explained
by the factors. This is done by summing the eigenvalues and
then dividing by the number of items that were factor
analyzed. For example, for the factor analysis shown in Table
7.1, after summing the eigenvalues (1.96 + 1.79 + 1.69) and
dividing by the number of items (9), we see that 60.4% of the
variance (5.44/9) was accounted for by the three factors. The
two factors in Table 7.3 account for 55.9% of the variance. At a
minimum, we want this percentage to be above 50.
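The arithmetic above can be checked in a few lines of Python:

```python
# Eigenvalues reported for the three factors in Table 7.1
eigenvalues = [1.96, 1.79, 1.69]
n_items = 9  # number of items that were factor analyzed

# Proportion of variance accounted for = sum of eigenvalues / number of items
variance_explained = sum(eigenvalues) / n_items
print(round(100 * variance_explained, 1))  # percent of variance
```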
If a factor analysis is done to confirm that a certain
number of factors exist, there are a host of statistics to test how
well the obtained factor analysis results fit the expected
results. These goodness-of-fit indexes range in value from 0 to
1: the closer to 1, the better the fit. An index of .90 is
considered the acceptable level (Bryant & Yarnold, 1995).
Table 7.4
Factor Analysis of Difficult People

Difficult Behavior            Factor I   Factor II
Gossiping                        .85        .07
Disagreeing with everyone        .84        .00
Yelling                          .74        .10
Acting like a know-it-all        .73        .00
Whining                          .73        .02
Saying no to everything          .66        .38
Using sarcasm                    .50        .09
Not making up their mind         .43        .69
Not speaking up                  .04        .78
Agreeing to everything           .21        .63
Applying Your Knowledge

You can apply what you have learned in this chapter by
reading the following articles in Applied H.R.M. Research
(www.xavier.edu/appliedhrmresearch).

Conte, J. M., Ringenbach, K. L., Moran, S. K., & Landy, F. J.
(2001). Criterion validity evidence for time urgency:
Associations with burnout, organizational commitment,
and job involvement in travel agents. Applied H.R.M.
Research, 6(2), 129–134.

Franz, T. M., & Norton, S. D. (2001). Investigating business
casual dress policies: Questionnaire development and
exploratory research. Applied H.R.M. Research, 6(2),
79–94.
Raynes, B. L. (2001). Predicting difficult employees: The
relationship between vocational interests, self-esteem,
and problem communication styles. Applied H.R.M.
Research, 6(1), 33–66.
References
________________________
subordinate, supervisor, peer, and self-ratings. Human
Performance, 10(4), 331–360.
Cooper-Hakim, A., & Viswesvaran, C. (2005). The construct of work
commitment: Testing an integrative framework.
Psychological Bulletin, 131(2), 241–259.
Gaugler, B. B., Rosenthal, D. B., Thornton, G. C., & Bentson, C.
(1987). Meta-analysis of assessment center validity. Journal
of Applied Psychology, 72, 493–511.
Griffeth, R. W., Hom, P. W., & Gaertner, S. (2000). A meta-analysis
of antecedents and correlates of employee turnover: Update,
moderator tests, and research implications for the next
millennium. Journal of Management, 26(3), 463–488.
Huffcutt, A. I., & Arthur, W. (1994). Hunter and Hunter (1984)
revisited: Interview validity for entry-level jobs. Journal of
Applied Psychology, 79(2), 184–190.
Hunter, J. E., & Schmidt, F. L. (1990). Methods of meta-analysis:
Correcting error and bias in research findings. Newbury
Park, CA: Sage Publications.
Judge, T. A., Rodell, J. B., Klinger, R. L., Simon, L. S., & Crawford,
E. R. (2013). Hierarchical representations of the Five-Factor
Model of personality in predicting job performance:
Integrating three organizing frameworks with two theoretical
perspectives. Journal of Applied Psychology, 98(6), 875–925.
Judge, T. A., Thoresen, C. J., Bono, J. E., & Patton, G. K. (2001).
The job satisfaction–job performance relationship: A
qualitative and quantitative review. Psychological Bulletin,
127(3), 376–407.
Kachigan, S. K. (1986). Statistical analysis. NY: Radius Press.
Koslowsky, M., Sagie, A., Krausz, M., & Singer, A. H. (1997).
Correlates of employee lateness: Some theoretical
considerations. Journal of Applied Psychology, 82(1), 79–88.
Mathieu, J. E., & Zajac, D. M. (1990). A review and meta-analysis
of the antecedents, correlates, and consequences of
organizational commitment. Psychological Bulletin, 108(1),
171–194.
McDaniel, M. A., Hartman, N. S., Whetzel, D. L., & Grubb, W. L.
(2007). Situational judgment tests, response instructions,
and validity: A meta-analysis. Personnel Psychology, 60(1),
63–91.
Quinones, M. A., Ford, J. K., & Teachout, M. S. (1995). The
relationship between work experience and job performance: A
conceptual and meta-analytic review. Personnel Psychology,
48(4), 887–910.
Index
________________________

Analysis of variance, 39–46
  Regression, 83–85
Central tendency, 14–16
Chi-square, 47–48
Confidence interval, 101–102
Credibility interval, 102
Convenience sample, 13
Correlation, 51–70
  Magnitude, 52–59
  Norms, 64–65
  Significance, 60–69
Curvilinearity, 57–60
Eigenvalues, 108
Factor analysis, 107–114
Fisher's exact test, 36, 37, 46–47
Frequencies, 36
Hierarchical regression, 74–77
Interval data, 8
Intervening variable, 51–52
Inverted-U, 59–60
Lanning v. SEPTA, 61–62
Mann-Whitney U, 35
Mean, 14–15, 31–35
Measurement scales, 7–9
Median, 15, 35
Meta-analysis, 91–105
  Artifacts, 97–98
  Finding studies, 95–96
  Moderators, 98–99
  Results, 99–104
  Sampling error, 92–94
Mode, 15–16
Model specification, 78
Multicollinearity, 78–79
Multiple R, 81–82
Nominal data, 7, 62
Ordinal data, 8
Percentiles, 24–25
Practical significance, 6–7, 63–69
Qw statistic, 103–104
Random assignment, 13
Random sample, 12
Range, 18–19
Range restriction, 57
Ratio data, 8
Regression, 71–90
  Considerations, 77–80
  Equation, 86–88
  Interpreting, 80–88
  Purpose, 71–73
  Types, 74–77
Reliability, 55–57
R-square, 63, 82
Sample size, 11–14
Sampling error, 92–94, 103–104
Significance levels, 37
Standard deviation, 19–21
Standard error, 82–83
Standard scores, 23–28
Statistics symbols, 29
Stepwise regression, 74–76
t-tests, 37–39
Type I errors, 4
Type II errors, 4
Utility analysis, 66–69
Validity, 64–69
Variability, 17–23
Variance, 23
Z-scores, 25–26