
Understanding Statistics:
A Guide for I/O Psychologists and Human Resource Professionals

Michael G. Aamodt
DCI Consulting Group and Radford University

Michael A. Surrette
Springfield College

David B. Cohen
DCI Consulting Group
© 2016 Thomson Wadsworth, a part of The Thomson Corporation. Thomson, the Star logo, and Wadsworth are trademarks used herein under license.

Thomson Higher Education
10 Davis Drive
Belmont, CA 94002-3098
USA

All Rights Reserved. No part of this work covered by the copyright hereon may be reproduced or used in any form or by any means (graphic, electronic, or mechanical, including photocopying, recording, taping, Web distribution, information networks, or information storage and retrieval systems) without the written permission of the publisher.

Printed in the United States of America
1 2 3 4 5 6 7   13 12 11 10 09

ISBN: 0-495-18663-5
Contents

Introduction and Acknowledgments  iii
1. The Concept of Statistical Analysis  1
2. Statistics that Describe Data  11
3. Statistics that Test Differences Between Groups  31
4. Understanding Correlation  51
5. Understanding Regression  71
6. Meta-analysis  91
7. Factor Analysis  107
References  115
Index  119
Introduction and Acknowledgments
_______________________________

THE PURPOSE OF THIS BOOK is to provide students and human resource professionals with a brief guide to understanding the statistics they encounter in journal articles, technical reports, and conference presentations. If we accomplish our goals, you won't panic when someone uses such terms as t test or analysis of variance, you won't have a puzzled look during conference presentations, and you will actually comprehend most of the statistics you encounter. What you won't be able to do after reading this book is compute these statistics by hand. If that is your goal, there are plenty of good statistics books available that can teach you how to do this.

Chapter 1 provides an overview of why statistical analysis is conducted and covers a few important points such as significance levels. Chapter 2 explains the basic statistics used to describe data, such as measures of central tendency (mean, median, mode), measures of dispersion (variance, standard deviation), and standard scores. Chapter 3 covers statistics used to determine group differences, such as t tests, analysis of variance, and chi-square. Chapter 4 discusses correlation and how to interpret correlation coefficients. Chapter 5 covers regression analysis. Chapter 6 explains meta-analysis, a statistical method for reviewing the literature. Chapter 7 concludes the discussion of statistics by covering factor analysis.
Though there are plenty of other statistics out there, we wanted to cover the statistics most frequently encountered by human resource professionals. To help readers apply what they have learned, we have included references in each chapter to journal articles that have used the statistic covered by the chapter. When possible, we tried to use journal articles from Public Personnel Management, the journal published by the International Personnel Management Association for Human Resources, or Applied H.R.M. Research, the online journal (www.xavier.edu/appliedhrmresearch) published by Xavier University. When possible, we also tried to use some humor, at least as much as is possible when talking about statistics.

To break the monotony of reading about statistics, the names of television or movie characters are listed as employees in the various tables. Try to identify the television shows or movies that they represent.
We would like to thank Johnny Fuller of Marriott Sodexo, Keli Wilson of DCI Consulting, and Bobbie Raynes of New River Community College for their help in reviewing earlier versions of this book and providing valuable feedback.

If you have any questions, would like to comment on the book, or find an error, please feel free to contact Mike Aamodt at maamodt@dciconsult.com.

1. The Concept of Statistical Analysis
_______________________________

OVER THE PAST DECADE, the field of human resources has clearly become more complex. It seems that we can't read a journal article, listen to a conference presentation, or talk to a consultant without encountering some form of statistical analysis. Though actually computing many statistics can be a complex process, understanding most journal articles or conference presentations should not be. At times it may appear that there are five million types of statistics, but statistical analyses are conducted for only four reasons: to describe data, to determine if two or more groups differ on some variable (e.g., test scores, job satisfaction), to determine if two or more variables are related, or to reduce data. This chapter will briefly explain these reasons as well as some of the basics associated with statistical analysis. Each of the following chapters will explain in greater detail the statistics mentioned in this chapter.

Reasons for Analyzing Data

TO DESCRIBE DATA

The simplest type of statistical analysis, descriptive statistics, is conducted to describe a series of data. For example, if an employee survey was conducted, one might want to report the number of employees who responded to each question (sample size), how the typical employee responded to each question (mean, median, mode), and the extent to which the employees answered the questions in similar ways (variance, standard deviation, range). These types of descriptive statistics will be discussed in detail in Chapter 2.

TO DETERMINE IF TWO OR MORE GROUPS DIFFER ON SOME VARIABLE

Once descriptive statistics are obtained, a commonly asked question is whether two groups differ. For example, did women perform better in training than men? Were older employees as likely to accept a new benefit plan as their younger counterparts? To answer such questions, we would use:

- A t test if there were only two groups (e.g., male, female) and our descriptive statistic was a mean.
- An analysis of variance if our descriptive statistic was a mean and there were more than two groups (e.g., south, north, east, west) or more than one independent variable (e.g., race and gender).
- A chi-square if our descriptive statistic was a frequency count.

These statistics will be discussed in detail in Chapter 3.
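The decision rules above can be sketched as a small helper function. This is only an illustration of the logic; the function name and its string labels are made up here, not taken from the book:

```python
def choose_test(descriptive_statistic, n_groups, n_independent_vars=1):
    """Pick the significance test suggested by the decision rules (a sketch).

    descriptive_statistic: "mean" or "frequency count"
    """
    if descriptive_statistic == "frequency count":
        return "chi-square"                 # frequency counts call for chi-square
    if n_groups == 2 and n_independent_vars == 1:
        return "t test"                     # two groups, one variable, comparing means
    return "analysis of variance"           # more than two groups or more than one IV

# Did women perform better in training than men? (two groups, means)
print(choose_test("mean", 2))               # t test
# Do the four regions differ? (four groups, means)
print(choose_test("mean", 4))               # analysis of variance
# Did acceptance counts differ? (frequency data)
print(choose_test("frequency count", 2))    # chi-square
```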

TO DETERMINE IF TWO OR MORE VARIABLES ARE RELATED

A question often asked in research is the extent to which two or more variables are related, rather than different. For example, we might ask if a test score is related to job performance, if job satisfaction is related to employee absenteeism, or if the amount of money spent on recruitment is related to the number of qualified applicants. To determine if the variables are related, we might use a correlation. If we wanted to be a bit more precise or are interested in how several different variables predict performance, we might use regression or causal modeling. Correlation will be discussed in greater detail in Chapter 4 and regression in Chapter 5.
TO REDUCE DATA

At times, we have a lot of data that we think can be simplified. For example, we might have a 100-item questionnaire. Rather than separately analyze all 100 questions, we might check to see if the 100 questions represent five major themes/categories/factors. To reduce data, we might use a factor analysis or a cluster analysis. Factor analysis will be discussed in detail in Chapter 7.

Significance Levels

Significance levels are one of the nice things about statistical analysis. If you are reading an article about the effectiveness of a new training technique and don't care a thing about statistics, you can move through the alphabet soup describing the type of analysis used (e.g., ANOVA, MANOVA, ANACOVA) and go right to the significance level, which will be written something like "p < .03." What this is telling you is that the difference in performance between two or more groups (e.g., trained versus untrained or men versus women) is significantly different at some level of chance.

What is the need for significance levels? Suppose that you walk into a training room and ask the people on the right side of the room how old they are and then do the same to people sitting on the left side of the room. You find that the average age of the people sitting on the right side of the room is 37.6 years, whereas the average age of people on the left side of the room is 39.3 years. Does this difference make you want to submit a paper on the subject? Could it be that older people sit closer to the door so they don't have to walk as far? It could be, but probably not. Any time we collect data from two or more groups, the numbers will never be identical. The question then becomes, if the numbers are never identical, how much of a difference does it take before we can say that something is actually happening? This is where significance levels come in. Based on a variety of factors such as sample size and variance, the end result of any statistical analysis is a significance level that indicates the probability that our results occurred by chance alone. If our analysis indicates that the groups differ at p < .03, then we would conclude that there are 3 chances in 100 that the differences we obtained were the result of fate, karma, or chance. In the social sciences, we have a very dumb rule that if the probability is less than 5 in 100 that our results could be due to chance (p < .05), we say that our results are statistically significant.
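In code, the p < .05 convention amounts to a one-line comparison. A minimal sketch (the function name is ours; .05 is only the conventional default, as the next section explains):

```python
def is_significant(p, alpha=0.05):
    """Return True if the result clears the chosen significance level."""
    return p < alpha

# p < .03: 3 chances in 100 that the difference is due to chance alone
print(is_significant(0.03))              # True under the conventional .05 rule
print(is_significant(0.07))              # False at .05 ...
print(is_significant(0.07, alpha=0.10))  # ... but True at a more liberal .10
```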

CHOOSING A SIGNIFICANCE LEVEL

Although .05 is the significance level traditionally used in the social sciences, in some circumstances researchers may choose to use a more liberal or a more conservative level. This choice is a function of the cost associated with being wrong. When interpreting the results of a statistical analysis, there are two ways in which an interpretation can be wrong: a Type I error and a Type II error. To explain these errors, let's imagine a study in which a personnel analyst is trying to predict job performance by using an employment test.

With a Type I error, the researcher concludes that there is a relationship between the test and job performance when in fact there is no such relationship. With a Type II error, the researcher concludes that there is no relationship between the two variables when in fact there is one. By using a more conservative significance level such as .01 or .001, a researcher is trying to reduce the chance of a Type I error. Likewise, by using a more liberal significance level such as .10 or .15, a researcher is trying to reduce the chance of a Type II error.
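The two error types can be summarized by classifying each possible outcome of a decision. A small illustrative helper (hypothetical, for this example only):

```python
def classify_decision(relationship_exists, concluded_relationship):
    """Label the outcome of a significance decision.

    relationship_exists: whether a real relationship exists in the population
    concluded_relationship: whether the researcher concluded one exists
    """
    if concluded_relationship and not relationship_exists:
        return "Type I error"    # claimed a relationship that isn't there
    if not concluded_relationship and relationship_exists:
        return "Type II error"   # missed a relationship that is there
    return "correct decision"

print(classify_decision(False, True))   # Type I error
print(classify_decision(True, False))   # Type II error
print(classify_decision(True, True))    # correct decision
```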
The decision to use a particular significance level is determined by the cost of being wrong. If the employment test is expensive to administer and might result in the hiring of fewer women and minorities, our personnel analyst might want to use a conservative significance level (e.g., .01) to reduce the chance of a Type I error. That is, if we are going to spend a great deal of money to use a test that might also decrease the diversity of our workforce, we want to be very sure that the test actually predicts performance. However, if the test costs 20 cents per applicant and does not result in adverse impact, we might be more willing to risk a Type I error, that is, using a test that doesn't actually predict performance. If this were the case, we might be willing to accept a significance level of .10.

In addition to considering the financial and social costs of being wrong, significance levels can be selected on the basis of previous research. That is, if 50 previous studies found a significant relationship between an employment test and job performance, we might be more willing to consider a probability level of .07 to be significant than we would if there were no previous studies.

Statistical significance levels only tell us if we are allowed to pay attention to our results. They do not tell us if our results are important or useful. If our results are statistically significant, we get to interpret them and make decisions about practical significance. If they are not statistically significant, we start again.

STATISTICAL SIGNIFICANCE IN JOURNAL ARTICLES

Statistical significance levels are usually presented in journal articles or conference papers in one of two ways. The first way is to list the significance level in the text. For example, an article might read:

    The job satisfaction level of female employees (M = 4.21) was significantly higher than that of male employees (M = 3.50), t(60) = 2.39, p < .02.

The M = 4.21 and M = 3.50 are mean scores on a job satisfaction scale, the 60 is the degrees of freedom (you will learn about this in Chapter 3), the 2.39 is the value of the t test (you will learn about this in Chapter 3), and the p < .02 tells us that there are only 2 chances in 100 that we would expect similar results purely by chance. In other words, the difference in satisfaction between men and women is statistically significant.
The second way to depict a significance level is to use asterisks in a table. Take for example the numbers shown in Table 1.1. The correlation of .12 between cognitive ability and commendations does not have any asterisks, indicating that it is not statistically significant. The correlation between education and commendations has one asterisk, indicating that the correlation is significant at the .05 level. The correlation between education and performance in the police academy has two asterisks, indicating that it is significant at the .01 level, and the three asterisks beside the .43 indicate that the correlation between cognitive ability and academy performance is significant at the .001 level of confidence. The greater the number of asterisks, the greater the confidence we have that the number did not occur by chance.
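The asterisk convention in Table 1.1 maps directly to p-value thresholds. A sketch of that conventional mapping (the p-values fed in below are made up to illustrate each band):

```python
def significance_stars(p):
    """Return the asterisk marker conventionally attached to a statistic."""
    if p < 0.001:
        return "***"   # significant at the .001 level
    if p < 0.01:
        return "**"    # significant at the .01 level
    if p < 0.05:
        return "*"     # significant at the .05 level
    return ""          # not statistically significant

# Rebuilding the markers from Table 1.1 with illustrative p-values:
print(".43" + significance_stars(0.0005))  # .43***
print(".28" + significance_stars(0.005))   # .28**
print(".24" + significance_stars(0.03))    # .24*
print(".12" + significance_stars(0.40))    # .12
```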

Practical Significance

If our results are statistically significant, we then ask about the practical significance of our findings. This is usually done by looking at effect sizes, which can include d scores, correlations (r), omega squares, and a host of other awful-sounding terms. Effect sizes are important to understand because we can obtain statistical significance with large sample sizes but have results with no practical significance.

Table 1.1
Example of statistical significance

                    Academy Score   Commendations
Cognitive ability      .43***           .12
Education              .28**            .24*

* p < .05, ** p < .01, *** p < .001

For example, suppose that we conduct a study with a million people and find that women score an average of 86 on a math test and men score an average of 87. With such a big sample size, we would probably find the difference between the two scores to be statistically significant. However, what would we conclude about the practical significance of a one-point difference between men and women on a 100-point exam? Are men superior to women in math? Will we have adverse impact if we use a math test for selection? Should we discourage our daughters from a career in science? Probably not. The statistical significance allows us to confidently say that there is little difference between men and women on this variable. If we compute an effect size, we can say this in a more precise way.

Another good example of the importance of practical significance comes from the computation of adverse impact statistics. Imagine a situation in which an employer selects 99% of the men and 98% of the women applying for production jobs. From a practical significance perspective, the 1% difference suggests that the employer is essentially hiring males and females at equal rates. If there were 200 men and 200 women in the analysis, the 1% difference would not be statistically significant. If, however, there were 2,000 men and 2,000 women in the analysis, that same 1% difference would now be statistically significant. Without the consideration of practical significance, one might conclude that because the 1% difference in the large sample is statistically significant, the employer might be discriminating against women.
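The hiring-rate example can be checked with a standard two-proportion z test (a technique beyond this chapter's scope, sketched here only to show how the same 1% difference flips to statistical significance as the sample grows):

```python
import math

def two_proportion_z(p1, p2, n1, n2):
    """z statistic for the difference between two independent proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)   # pooled selection rate
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# 99% of men vs. 98% of women selected; |z| > 1.96 means p < .05 (two-tailed)
print(round(two_proportion_z(0.99, 0.98, 200, 200), 2))    # about 0.82: not significant
print(round(two_proportion_z(0.99, 0.98, 2000, 2000), 2))  # about 2.6: significant
```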


Types of Measurement

Statistical data come in four measurement types: nominal, ordinal, interval, and ratio, which can easily be remembered with the acronym NOIR (if for some reason you actually want to remember these four types). Understanding the four measurement types is important because they are often mentioned in journal articles and certain types of statistical analyses can only be performed on certain types of data.

Nominal data consist of categories or dimensions and have no numerical meaning by themselves. Examples of nominal data include race, hair color, and marital status. When nominal data are included in a data set, it is common to assign a numerical code to each of the categories. An example of assigning numbers to nominal data for hair color might be 1 = blond, 2 = brunette, 3 = brown, 4 = black, 5 = red. As you can see from this example, the numbers assigned to categories have no real meaning. That is, a hair color of 5 is not a better hair color than a hair color of 1. Likewise, saying that the mean hair color of our sample is 3.3 would be meaningless. Instead, the numerical code is just shorthand for the category description. In human resources we often see such coding for race (e.g., 1 = White, 2 = African American, 3 = Hispanic, 4 = Asian) and sex (1 = male, 2 = female).
Ordinal data are rank orders. Examples of ordinal data include baseball standings (Who's in first place?), seniority lists ("She is third from the top"), and budget requests ("Put in rank order your list of needed equipment and we will see what we can purchase."). Ordinal data tell us the relative difference between people or categories but do not tell us anything about the absolute difference between the people or the categories. For example, if applicants are placed on a hiring list on the basis of their test scores, we know that the person ranked first has a higher score than the person ranked second; but we don't know if the difference between the two is one point or 50 points. Likewise, as shown in Table 1.2, the score difference between the applicants in first and second place is not necessarily the same as the score difference between the applicants in second and third place.
Interval data have equal intervals but not necessarily equal ratios. Examples of interval data include performance ratings, the temperature outside, and a score on a personality test. Let's use temperature to demonstrate. A thermometer has equal intervals in that the distance between 89 and 90 degrees and the distance between 54 and 55 degrees is the same (one degree). However, a temperature of 80 degrees is not twice as hot as a temperature of 40 degrees. Thus, although the intervals between points on the scale are equal, the ratios are not.
Ratio data have equal ratios and a true zero point. Examples of ratio data include salary, height, and the number of job applicants. All three have a true zero point in that someone can have no salary, there can be no job applicants, and if something doesn't exist, it can have no height. The ratios are equal in that 10 job applicants is twice as many as 5, a salary of $40,000 is twice as much as a salary of $20,000, and a desk six feet in length is twice as long as a desk that is three feet in length.
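One practical consequence of the NOIR distinction is which central-tendency statistics make sense at each level (the mean of nominal hair-color codes was meaningless above). A sketch of the usual textbook guidance (this mapping is our summary, not a rule stated verbatim in this chapter):

```python
MEANINGFUL_STATS = {
    "nominal": ["mode"],                     # categories: only counts make sense
    "ordinal": ["mode", "median"],           # ranks: order, but not distance
    "interval": ["mode", "median", "mean"],  # equal intervals: means are fine
    "ratio": ["mode", "median", "mean"],     # equal intervals plus a true zero
}

def meaningful_stats(level):
    """Return the central-tendency measures that make sense for a data level."""
    return MEANINGFUL_STATS[level.lower()]

# A mean hair color of 3.3 is meaningless because hair color is nominal:
print("mean" in meaningful_stats("nominal"))   # False
print(meaningful_stats("ordinal"))             # ['mode', 'median']
```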
Now that you have some of the basics, the following chapters will provide information about particular types of statistics.

Table 1.2
Applicant List for the Blue Moon Detective Agency

Rank   Applicant          Score
1      Maddie Hayes        99
2      David Addison       94
3      Tom Magnum          93
4      John Shaft          89
5      Frank Cannon        88
6      Thomas Banacek      87
7      Nora Charles        80
8      Joe Mannix          75
9      Jessica Fletcher    71

2. Statistics That Describe Data
____________________________________

IMAGINE THAT YOU are reading a technical report or a journal article and the author states, "As you can see from Table 2.01, our employees are well paid." As you glance at Table 2.01, you realize that the table contains raw data and is difficult to interpret. Because looking at raw data is not particularly meaningful, the first step in a statistical analysis is to summarize the raw data into a form that is meaningful. This initial summarization is called descriptive statistics and generally includes the sample size, a measure of central tendency, and a measure of dispersion.

Table 2.01
Raw Salary Data

Employee   Hourly Rate
Jim          $15.35
Ryan         $16.37
Pam          $15.35
Dwight       $16.11
Michael      $15.10
Oscar        $17.05
Kevin        $16.80

Sample Size

An important element in interpreting the value of a piece of research is the sample size: the number of participants included in a particular study. The number of participants may include an entire population (e.g., all employees at the Pulaski Furniture Plant) but more than likely represents a sample (100 students from Radford University) of a larger population (all college students in the United States). In most journal articles or technical reports, the number of people in a sample is denoted by the letter N and the number of people in a subsample (e.g., number of men, number of women) is denoted by a lowercase n.
Research results derived from studies conducted with a small number of individuals should be interpreted with a lower degree of confidence than a study conducted with a large number of participants. It is important to note, however, that we also need to be aware of the difference between a small sample size and a small population. We remember being at a conference when one of the audience members questioned the accuracy of a presenter's results because the sample size was only 25 participants. The speaker paused for a moment and then told the audience member that the 25 participants represented everyone in his police department, that is, the entire population. The interesting part of this story is that the audience member continued to comment that the use of 25 participants was still not acceptable. What the audience member failed to understand is that, although large samples are preferred over small samples, you can never acquire a sample size larger than the population available to you.
If a sample is used rather than an entire population, it is important to consider two aspects of the sample: the extent to which it is random and the extent to which it is representative of the population. In a random sample, every member of the population has an equal chance of being chosen for the sample. For example, suppose a large organization wants to determine the satisfaction level of its 3,000 employees. Because the budget for the project is not large enough to survey all 3,000 employees, the organization decides to sample 500 employees. To choose the 500, the organization might use a random numbers table or draw employee names from a hat. The more random the sample, the lower the sample size needed to generalize the results to the entire population.
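A random sample like the one described (500 of 3,000 employees, each with an equal chance of selection) can be drawn with Python's standard library. The numeric employee IDs below are made up for illustration:

```python
import random

random.seed(42)  # fixed seed so the example is reproducible

employee_ids = list(range(1, 3001))               # hypothetical IDs for 3,000 employees
survey_sample = random.sample(employee_ids, 500)  # 500 drawn without replacement

print(len(survey_sample))       # 500
print(len(set(survey_sample)))  # 500: no employee is chosen twice
```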

Unfortunately, in most research, the samples used are certainly not randomly selected. For example, suppose that a researcher at a university wants to study the relationship between employee personality and performance in a job interview. The population in this case would be every applicant in the world who has ever been on an employment interview. Ideally, the researcher would randomly sample from this population. However, as you can imagine, this would be impossible. So instead, the researcher might give a personality test to 250 applicants for positions at a local manufacturer and then try to generalize the results to other applicants. These 250 applicants would be called a convenience sample. Because the convenience samples used in most studies are drawn from one organization (e.g., municipal employees for the City of Mobile, Alabama) located in one region (e.g., the South) of one country (e.g., the U.S.), caution should be taken in generalizing the results to other organizations or cultures.

Convenience samples are fine as long as two conditions are met. The first is that the convenience sample must be similar to the population to which you want to apply your results. That is, the affirmative action opinions of 18-year-old college females in Alabama may not generalize to 50-year-old male factory workers in Ohio.
The second condition is that members of the convenience sample must be randomly assigned to the various research groups. Take for example a researcher wanting to study the effects of a training program on employee productivity. Before spending $100,000 to train all 500 employees in the plant, the researcher might take a convenience sample (30 employees on the night shift) and randomly assign 15 to receive training (experimental group) and 15 to not receive training (control group). The subsequent job performance of the two groups can then be compared.
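Random assignment of the 30 night-shift employees to training and control groups can be sketched the same way (the employee IDs are again hypothetical):

```python
import random

random.seed(7)  # reproducible assignment for the example

night_shift = list(range(1, 31))   # the 30 night-shift employees
random.shuffle(night_shift)        # randomize the order

experimental_group = night_shift[:15]  # first 15 receive training
control_group = night_shift[15:]       # remaining 15 do not

print(len(experimental_group), len(control_group))   # 15 15
print(set(experimental_group) & set(control_group))  # set(): no one is in both groups
```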
A sample is considered to be representative of the population if it is similar to the population in such important characteristics as sex, race, and age. Random samples are typically also representative samples. If a sample is not random, it is important to compare the percentage of women, minorities, older people, and other variables of interest to the percentages in the relevant population. If the sample differs from the population, it is difficult to generalize the findings of the study.

Although in most cases it is important to have a representative sample, there are times when it is necessary to oversample certain types of employees. A good example of such a situation might be an employee attitude survey at an organization in which only 10% of the employees are women. If a random sample of 20 employees were drawn from a population of 200, only two women would be in the sample, not enough to compare the attitudes of women to men. To ensure that gender differences in attitudes could be investigated, one might randomly select 10 of the 180 men and 10 of the 20 women.

Measures of Central Tendency

Statistics that describe the typical score in a set of raw data are collectively referred to as measures of central tendency. Individually, they are referred to as the mean, median, and mode.

THE MEAN

The mean represents the mathematical average of a set of data. To compute the mean, you sum all of the scores obtained from your participants and then divide this sum by the total number of participants. For example, as you can see in Table 2.02, the mean salary from the raw data first presented in Table 2.01 is $16.02.

Table 2.02
Computing the Mean Salary

Employee   Hourly Rate
Michael      $15.10
Jim          $15.35
Pam          $15.35
Dwight       $16.11
Ryan         $16.37
Kevin        $16.80
Oscar        $17.05

Sum         $112.13
N                 7
Mean         $16.02

As you read journal articles and technical reports, you will find that M and X̄ are the symbols most often used to represent the mean. Throughout this chapter, we will represent the mean with the symbol M.
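The computation shown in Table 2.02 can be reproduced with Python's standard statistics module; a minimal sketch using the seven hourly rates:

```python
from statistics import mean

hourly_rates = [15.10, 15.35, 15.35, 16.11, 16.37, 16.80, 17.05]

total = sum(hourly_rates)      # 112.13, the Sum row in Table 2.02
m = total / len(hourly_rates)  # the sum divided by N = 7

print(round(total, 2))               # 112.13
print(round(m, 2))                   # 16.02
print(round(mean(hourly_rates), 2))  # 16.02: same result via statistics.mean
```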

THE MEDIAN

The median (Md) is the point in your data where 50% of your raw scores fall above and 50% of your raw scores fall below. To determine the median, you begin by ranking your raw scores from highest to lowest and then find the score that falls in the middle. Using the data from the seven employees in Table 2.02, we see that the median would be $16.11, because three salaries ($15.10, $15.35, and $15.35) are lower than $16.11 and three salaries ($16.37, $16.80, and $17.05) are higher.

In our example, the median was easy to compute because there was an odd number of scores (7). When there is an even number of scores, you take the score that would theoretically fall between the two middle scores. As an example, let's add one more salary to our data set ($16.27):

$15.10, $15.35, $15.35, $16.11, $16.27, $16.37, $16.80, $17.05

When you count up from the lowest salary, the fourth salary is $16.11, and if you count down from the highest salary, the fourth salary is $16.27. To obtain the median salary, we would add the $16.11 and the $16.27 and divide by two. Thus the median salary would be $16.19. This is the point at which 50% of the salaries would fall above and 50% of the salaries would fall below, even though the salary of $16.19 is not an actual member of the data set.
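Both cases can be checked with statistics.median, which averages the two middle scores exactly as described when the count is even:

```python
from statistics import median

# The original seven salaries (odd count): the middle score is the median
seven = [15.10, 15.35, 15.35, 16.11, 16.37, 16.80, 17.05]
print(median(seven))            # 16.11

# After adding the eighth salary ($16.27): average of the two middle scores
eight = sorted(seven + [16.27])
print(round(median(eight), 2))  # 16.19, i.e. (16.11 + 16.27) / 2
```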

THE MODE

The mode (Mo) represents the most frequently occurring score in a set of data. Looking at our original sample data set in Table 2.02, $15.35 would be the mode, as it occurs twice, whereas each of the other salaries occurs only once. In the case where you have more than one score occurring multiple times (e.g., 16, 14, 13, 13, 10, 8, 7, 5, and 5), the data would be said to be bimodal (having two modes: 13 and 5).
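The mode and the bimodal case can be checked with statistics.mode and statistics.multimode (the latter is available in Python 3.8 and up and returns every score tied for most frequent):

```python
from statistics import mode, multimode

salaries = [15.35, 16.37, 15.35, 16.11, 15.10, 17.05, 16.80]
print(mode(salaries))    # 15.35: the only salary that occurs twice

scores = [16, 14, 13, 13, 10, 8, 7, 5, 5]
print(multimode(scores)) # [13, 5]: a bimodal distribution
```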

DECIDING WHICH CENTRAL TENDENCY MEASURE TO USE

Because there are three measures of central tendency (mean, median, mode), it is reasonable to ask which of the three is the best to use. With large sample sizes, the mean is the desired measure of central tendency. With smaller sample sizes, however, the mean can be unduly influenced by an outlier: a score that is very different from the other scores. Thus, with smaller samples, the median should probably be used. Unfortunately, there is no real rule of thumb for what constitutes a small sample; and thus, the use of the mean or median is subject to personal preference.
To see how an outlier can affect the mean, look at Table 2.03. In Sample 1, the cognitive ability scores are relatively similar, and the mean and the median are the same. In Sample 2, however, the cognitive ability score of 44 (the outlier) is very different from the other scores, causing the mean to be much higher than the median. If we had 100 employees instead of the 7 in the example, the effect of one outlier would not result in the mean and the median being substantially different from one another.
Why would this matter? Suppose that you have just
conducted a salary survey with the goal of adjusting your
salaries to match the industry standard. Your survey
indicates a mean salary of $46,000 and a median salary of
$42,000.Ifyourorganizationiscurrentlypaying$44,000,use
of the mean would suggest your employees are underpaid;
whereas, use of the median would suggest that they are
overpaid.
Themodeshouldbeusedwhenthegoaloftheanalysis
is to determine the most likely event that will occur. For
example, if a district attorney and a defense attorney were
tryingtoreachapleaagreement,thedefenseattorneywould
probably be most interested in the sentence most commonly
administeredbyaparticularjudge(mode)thanthemeanor
mediansentence.

Table 2.03
Cognitive Ability Scores for Two Samples

         Sample 1   Sample 2
            17         17
            18         18
            19         19
            20         20
            21         21
            22         22
            23         44

Mean        20         23
Median      20         20
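The effect of the outlier in Table 2.03 is easy to verify with the statistics module:

```python
from statistics import mean, median

sample_1 = [17, 18, 19, 20, 21, 22, 23]
sample_2 = [17, 18, 19, 20, 21, 22, 44]  # 44 is the outlier

print(mean(sample_1), median(sample_1))  # 20 20: no outlier, mean equals median
print(mean(sample_2), median(sample_2))  # 23 20: the outlier pulls up only the mean
```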

Measures of Variability

Though measures of central tendency provide useful information regarding the typical score in a data set, they do not provide information about the distribution of scores. In fact, two data sets can have the same mean but also have very different distributions. Take for example the three distributions shown in Table 2.04. All three have a mean and median of 4.0, yet all of the day-shift scores are the same (4), the night-shift scores range from 3 to 5, and the evening-shift scores range from 2 to 6. Measures of variability, or dispersion, are useful for determining the similarity of scores in a data set.

Table 2.04
Example of Three Distributions

       Day Shift   Evening Shift   Night Shift
           4             2              3
           4             3              3
           4             4              4
           4             5              5
           4             6              5

Mean       4             4              4

Table 2.05
Sample Performance Appraisal Ratings

       Geller   Tribbiani
          3         2
          3         2
          3         3
          3         3
          3         4
          3         4

Let's use the performance appraisal data shown in Table 2.05 to demonstrate why we might care about measures of dispersion. Imagine that supervisors in River City rate their employees' performance on a five-point scale. A rating of 1 is terrible, 2 is needs improvement, 3 is satisfactory, 4 is good, and 5 is excellent. As the department head, you are pleased to see that the mean employee rating given by your two supervisors is 3.0, a number indicating that the typical employee was rated as satisfactory. However, in looking at the ratings, you notice that one of your supervisors rated every employee as performing at a satisfactory level (3), whereas another supervisor assigned a rating of 2 to two employees, a rating of 3 to two employees, and a rating of 4 to two employees. The lack of dispersion in Geller's ratings and the use of only 3 of the 5 scale points by Tribbiani indicate either that the rating scale has too many points or that the supervisors did not properly evaluate their employees.

The most common measures of dispersion are the range, variance, and standard deviation.

RANGE

The range of a data set represents the spread of the data from the highest to the lowest score. To obtain the range, the lowest score in the data set is subtracted from the highest score. If we use Tribbiani's performance ratings from Table 2.05, the range is calculated by taking 4 (highest score) and subtracting 2 (lowest score). The range in performance ratings would be 2. Notice that our range does not include two of the points (1, 5) on the performance appraisal scale described in the previous paragraph. When reporting the range in a technical report, it is a good idea to list the lowest and highest scores obtained as well as the lowest and highest possible scores. For example, as you can see in Table 2.05, even though River City designed its performance appraisal ratings with a 5-point scale, in reality it has a 3-point scale.
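Computing the range from Tribbiani's ratings (highest minus lowest) takes one line:

```python
ratings = [2, 2, 3, 3, 4, 4]  # Tribbiani's ratings from Table 2.05

rating_range = max(ratings) - min(ratings)  # highest score minus lowest
print(rating_range)                         # 2

# Good practice: report the obtained extremes alongside the possible ones
print(min(ratings), max(ratings))           # 2 4 (on a possible 1-5 scale)
```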

STANDARD DEVIATION

The standard deviation is a statistic that, when combined with the mean, provides a range in which most scores in a distribution would fall. The standard deviation is based on something called the normal curve, or the bell curve. The idea behind the normal curve is that if the entire population were measured on something (e.g., intelligence, height), most people would score near the mean (the middle of the distribution) and very few would score considerably above or below the mean.

There are two ways that a standard deviation can be used to interpret data. The first is to focus on what the standard deviation tells us about a distribution. In viewing the normal curve, we find that 68.26% of scores fall within one standard deviation of the mean, 95.44% fall within two standard deviations of the mean, and 99.73% fall within three standard deviations of the mean. Let's use an example to demonstrate why this knowledge is useful.
Supposethatyouareatrainerandwillbetrainingone
group of employees in the morning and another in the
afternoon. Priortostartingyourtraining,youlookattheIQ
scoresoftheemployeestobetrained.AsshowninTable2.06,
youarepleasedthattheemployeesinbothclasseshaveamean
IQof100. Giventhatascoreof100istheaverageIQinthe
U.S., you feel comfortable that your trainees will be bright
enough to learn the material. However, as you look at the
standarddeviations,yourealizethatyourafternoonclasswill
beatrainersnightmare.

Table 2.06
IQ Scores for Two Training Groups
Group        Mean IQ   SD   1 SD Range   2 SD Range
Morning        100      3     97-103       94-106
Afternoon      100     15     85-115       70-130

In the morning class, the standard deviation of 3 tells you that the IQ of 68% of your trainees is within 6 points of one another and that the IQ of 95% of your trainees is within 12 points of one another. In other words, the employees in the morning section have similar IQ levels. The afternoon class is a different matter. Though the average IQ is 100, the standard deviation is 15. Some of your trainees are so bright (e.g., IQ of 130) that they probably will be bored, whereas others have such a low IQ (e.g., 70) that they will need remedial work. With such a large dispersion of IQs in the class, there is no way you could use the same material and the same pace to effectively train each employee. This is a conclusion that could not have been made with the mean alone.
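The 1 SD and 2 SD ranges in Table 2.06 come straight from the mean and standard deviation; a minimal sketch:

```python
# Mean plus or minus k standard deviations, reproducing Table 2.06.
def sd_range(mean, sd, k):
    """Return the interval from mean - k*SD to mean + k*SD."""
    return (mean - k * sd, mean + k * sd)

for group, mean, sd in [("Morning", 100, 3), ("Afternoon", 100, 15)]:
    low1, high1 = sd_range(mean, sd, 1)
    low2, high2 = sd_range(mean, sd, 2)
    print(f"{group}: 1 SD = {low1:g}-{high1:g}, 2 SD = {low2:g}-{high2:g}")
```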


The second way to use a standard deviation is to focus on what the standard deviation tells us about a particular score. For example, consider a salary survey for police officers reporting a mean salary of $55,000 and a standard deviation of $3,000. From this information we would know that about two-thirds (68.26%) of police departments pay their officers between $52,000 (the mean of $55,000 minus the standard deviation of $3,000) and $58,000 (the mean of $55,000 plus the standard deviation of $3,000). On the basis of these figures, we might note that although the $54,000 salary we pay is below the mean, the fact that it is within one standard deviation of the mean indicates our salary is not extremely low.
As another example, suppose that an applicant's score on an exam is one standard deviation above the mean. Using a chart such as that shown in Table 2.07, we see that the applicant's score was equal to or higher than 84.13% of the other applicants.
Now that we have discussed the usefulness of interpreting a standard deviation, it is time for some bad news. Inferences from a standard deviation will only be accurate if your data set is fairly large and your data are normally distributed (i.e., a plot of your data would look like a normal curve). Unfortunately, this is seldom the case. Though most measures are normally distributed in the world population, seldom are they normally distributed in any given organization or job. That is, because we screen out applicants with low ability and promote those with high ability, test scores and performance evaluations seldom resemble a normal curve.
Why does this matter? Consider the data shown in Table 2.08. The table shows the number of traffic citations written by police officers in two departments. The number of citations written in Elmwood approximates a normal distribution, whereas the number written in Oakdale does not. As you can see from the table, the large standard deviation caused by the lack of a normal distribution in the Oakdale data would cause us to make the inference that an officer who is two standard deviations below the mean would be writing a negative number of tickets!

Table 2.07
Interpreting Standard Deviations
Standard Deviation   Cumulative %
-3.0    0.14
-2.0    2.28
-1.5    6.68
-1.0   15.87
-0.5   30.85
 0.0   50.00
+0.5   69.15
+1.0   84.13
+1.5   93.32
+2.0   97.72
+3.0   99.86

Table 2.08
Number of Traffic Citations Written
              Police Department
Officer      Elmwood     Oakdale
A 1 1
B 2 1
C 2 1
D 3 1
E 3 1
F 3 1
G 4 1
H 4 1


I 4 1
J 4 1
K 5 1
L 5 1
M 5 1
N 5 9
O 5 9
P 5 9
Q 6 9
R 6 9
S 6 9
T 6 9
U 7 9
V 7 9
W 7 9
X 8 9
Y 8 9
Z 9 9
Mean                   5.00       5.00
Standard deviation     2.00       4.08
1 SD Range            3 to 7     0.92 to 9.08
2 SD Range            1 to 9    -3.16 to 13.16
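The Oakdale column of Table 2.08 (13 officers who wrote 1 citation and 13 who wrote 9) shows how a non-normal distribution breaks the usual inference; a quick check with Python's statistics module:

```python
import statistics

# Oakdale citation counts from Table 2.08: 13 ones and 13 nines.
oakdale = [1] * 13 + [9] * 13

mean = statistics.mean(oakdale)
sd = statistics.stdev(oakdale)  # sample standard deviation
lower_2sd = mean - 2 * sd       # two SDs below the mean

# The lower bound is negative: an impossible citation count, and a
# warning that these data are far from normally distributed.
print(f"mean = {mean}, SD = {sd:.2f}, mean - 2 SD = {lower_2sd:.2f}")
```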

VARIANCE

A third measure of dispersion is the variance, which is simply the square of the standard deviation. Although the variance is important because it serves as the computational basis for several statistical analyses (e.g., t-tests, analysis of variance), by itself it serves no useful interpretative purpose. Thus, the standard deviation is more commonly reported in journal articles and technical reports than is the variance.


Standard Scores

Standard scores convert raw scores into a format that tells us the relationship of the raw score to the raw scores of others. They are useful because they allow us to better interpret and compare raw data collected on different measures. That is, suppose your daughter told you that she scored a 43 on the National History Test that was administered at school. With only that raw score, you wouldn't know whether to reward her by buying the new Katy Perry CD or punish her by making her listen to your Barry Manilow collection. However, if she told you that her score of 43 put her in the top 5%, your decision would be much easier.
To make raw scores more useful, we often convert them into something that by itself has meaning. Perhaps the simplest attempt at doing this is to convert a raw score into a percentage. For example, your daughter's history test score of 43 would be divided by the number of points possible (45), resulting in a score of 95.6%. However, the problem with percentages is that they don't tell us how everyone else scored. That is, a test might be so easy that a 95.6% is the lowest score in the class. Likewise, I remember taking a physiological psychology course as an undergraduate in which the best student in the class had an average of 58% across four tests!
The two most commonly used standard scores are percentiles and z-scores.

PERCENTILES

A percentile is a score that indicates the percentage of people that scored at or below a certain score. For example, a salary survey might reveal that a salary of $26,000 is at the 71st percentile, indicating that 71% of the organizations surveyed pay $26,000 or less and 29% pay more than $26,000. Likewise, a student's score of 960 on the SAT might indicate that he scored at the 45th percentile: 45% of the students scored at or below 960 and 55% scored higher. Because there are several formulas for determining percentiles, software programs such as Excel and SPSS often will arrive at different percentiles for the same set of data. We will discuss the method that is easiest to calculate and interpret.
As shown in Table 2.09, percentiles are computed by first ranking the raw scores from bottom to top. Then, the rank associated with each score is divided by the total number of scores, resulting in the percentile for the score. Notice that the highest score will always be at the 99th percentile; there is never a 100th percentile. The 25th percentile is also called the first quartile (Q1) and the 75th percentile is also called the third quartile (Q3).
Though some authors have written that the 50th percentile and the median are the same, this is not usually the case. Remember that by definition, the median is the point at which 50% of the scores fall below and 50% fall above. The 50th percentile, however, is the point at which 50% of the scores fall at or below. For example, if you have 6 scores with no ties (e.g., 20, 22, 24, 26, 28, 30), the 50th percentile would be the third lowest score (24), whereas the median would be 25 as it falls between the third lowest (24) and fourth lowest (26) scores.
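The rank method and the median distinction can be checked with the same six scores:

```python
import statistics

# Rank method: rank scores from bottom to top, then divide rank by N.
scores = [20, 22, 24, 26, 28, 30]
n = len(scores)
percentile = {s: round(100 * rank / n)
              for rank, s in enumerate(sorted(scores), start=1)}

print(percentile[24])             # 3 of 6 scores fall at or below 24
print(statistics.median(scores))  # the median falls between 24 and 26
```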

Table 2.09
Using Percentiles in a Salary Survey
______________________________________________________

Hourly Wage   Rank   Computation   Percentile
___________   ____   ___________   __________

$32.17 20 20/20 99
$30.43 19 19/20 95
$28.72 18 18/20 90
$25.25 17 17/20 85
$24.96 16 16/20 80
$24.48 15 15/20 75
$22.92 14 14/20 70
$22.75 13 13/20 65
$22.11 12 12/20 60
$21.03 11 11/20 55
$20.86 10 10/20 50
$20.79 9 9/20 45
$20.35 8 8/20 40
$20.22 7 7/20 35
$20.03 6 6/20 30
$18.93 5 5/20 25
$16.65 4 4/20 20
$16.50 3 3/20 15
$16.25 2 2/20 10
$14.24 1 1/20 5
________________________________________________________

Z-Scores

Whereas percentiles are based on the actual distribution of scores in a data set (e.g., the salaries you obtained in your salary survey), z-scores use the mean and standard deviation of a set of scores to project where a score would fall in a normal distribution. When a data set is large and is normally distributed, percentiles and z-scores will yield similar interpretations.
To obtain a z-score for any given raw score, the following formula is used:

z = (raw score - mean score) / standard deviation


For example, if you scored 70 on a test that has a mean of 60 and a standard deviation of 20, your z-score would be:
z = (70 - 60) / 20
z = 10 / 20
z = 0.5

A positive z-score indicates an above-average score, whereas a negative z-score indicates a below-average score. An average score would have a z of zero. In the previous example, our z-score of .5 indicates that our raw score of 70 is half a standard deviation (.5) above the mean. As shown in Table 2.10, a z-score of .5 would mean that our raw score of 70 is equal to or higher than the scores of 69.15% of the population.
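The z-score formula as a one-line function:

```python
# z = (raw score - mean) / standard deviation
def z_score(raw, mean, sd):
    return (raw - mean) / sd

print(z_score(70, 60, 20))  # half a standard deviation above the mean
print(z_score(50, 60, 20))  # half a standard deviation below the mean
```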

Table 2.10
Interpreting z-Scores
z-Score   % falling at or below score
-3.00    0.14
-2.00    2.28
-1.75    4.01
-1.50    6.68
-1.25   10.56
-1.00   15.87
-0.75   22.66
-0.50   30.85
-0.25   40.13
 0.00   50.00
+0.25   59.87
+0.50   69.15
+0.75   77.34
+1.00   84.13
+1.25   89.44
+1.50   93.32
+1.75   95.99
+2.00   97.72
+3.00   99.86

OTHER STANDARD SCORES

Because many people do not like working with negative values, they choose to use a standard score format other than the z-score. For example, the Minnesota Multiphasic Personality Inventory-2 (MMPI-2) and the California Psychological Inventory (CPI) use a T score in which the standardized mean for each scale is 50 and the standard deviation is 10. Thus, with z-scores, a person scoring one standard deviation below the mean would have a standard score of -1.00; whereas, with T scores, a person scoring one standard deviation below the mean would have a standard score of 40 (mean of 50 minus one standard deviation of 10). As shown in Table 2.11, another example would be IQ scores that have a mean of 100 and a standard deviation of 15.
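Converting a z-score onto any of these scales is the same arithmetic in reverse: multiply by the new standard deviation and add the new mean. A sketch:

```python
# Rescale a z-score to a target mean and standard deviation.
def rescale(z, new_mean, new_sd):
    return new_mean + z * new_sd

z = -1.0  # one standard deviation below the mean
print(rescale(z, 50, 10))   # T score of 40
print(rescale(z, 100, 15))  # IQ-scale score of 85
```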

DECIDING WHICH STANDARD SCORE TO USE

Now that we have discussed percentiles, z-scores, and other standard scores, an important question becomes, Which is best? As with many questions like this, the answer depends on what you are trying to accomplish.
Percentiles are best when the person reading your analysis is not statistically inclined. Percentiles are also best when you are describing a specific data set that will not be generalized to other organizations. For example, suppose that you were conducting a study to determine which of your employees were out of line in terms of days absent or which of your police officers were out of line in the number of traffic citations they issued. Creating a percentile chart would probably result in a more accurate interpretation than would the use of z-scores.


Z-scores are best when standardizing scores for the purpose of conducting certain statistical analyses. In fact, it would be inappropriate to use percentiles for such a purpose. Converting z-scores to T-scores is best when your audience consists of people who are used to using tests such as the MMPI-2 (e.g., clinical psychologists).

Table 2.11
Comparison of Standard Scores
z-Score   Percentile   T Score    IQ
  -4                      10      40
  -3        0.14          20      55
  -2        2.28          30      70
  -1       15.87          40      85
   0       50.00          50     100
  +1       84.13          60     115
  +2       97.72          70     130
  +3       99.86          80     145
  +4                      90     160


Statistical Symbols

Authors of journal articles and technical reports seldom include such terms as standard deviation or mean in their tables. Instead, they use symbols to represent their statistics. Table 2.12 contains the statistical symbols you are most likely to encounter in journal articles and technical reports that denote the statistics discussed in this chapter.

Table2.12
SymbolsUsedtoDenoteDescriptiveStatistics
_____________________________________________________________

Statistic                          Common Symbols
___________________                ____________________

Number of people
  In the sample (sample size)      N
  In a subsample                   n
Number of groups                   K
Mean                               M, X̄, Mx
Median                             Mdn, Md
Mode                               Mo
Standard deviation
  Sample standard deviation        SD, s, Std Dev
  Population standard deviation    σ
Variance
  Sample variance                  s²
  Population variance              σ²
Standard score                     z
Quartile
  First quartile                   Q1
  Third quartile                   Q3
_______________________________________________


Applying Your Knowledge

You can apply what you have learned in this chapter by reading the following articles in Applied H.R.M. Research (www.xavier.edu/appliedhrmresearch) and in Public Personnel Management.

Buttigieg, S. (2005). Gender and race differences in scores on the Employee Aptitude Survey: Test 5, Space Visualization. Applied H.R.M. Research, 10(1), 45-46.
Kethley, R. B., & Terpstra, D. E. (2005). An analysis of litigation associated with the use of the application form in the selection process. Public Personnel Management, 34(4), 357-375.
Roberts, G. E. (2004). Municipal government benefits practices and personnel outcomes: Results from a national survey. Public Personnel Management, 33(1), 1-19.
Selden, S. C. (2005). Human resource management in American counties, 2002. Public Personnel Management, 34(1), 59-84.

3. Statistics That Test
Differences
Between Groups
______________________________

A COMMONLY ASKED question in human resources is whether two or more groups differ on some variable. That is, do the average salaries of men and women differ? Are the average performance appraisal scores of minorities and nonminorities different? Which of five recruitment methods produces the greatest number of hires? The most commonly used statistics to answer these questions are the t-test, chi-square, analysis of variance (ANOVA), and Fisher's exact test. The statistic used is a function of the number of groups, the type of measurement, the number of independent variables, and the number of dependent variables.

Choosing the Right Statistical Test

Researchers typically are interested in testing the significance of differences between means or frequencies. A t-test or ANOVA is used to test differences in means, and nonparametric tests are used to test differences in frequencies.

TESTING DIFFERENCES IN MEANS

Three factors are used to determine if differences between means are statistically significant: the size of the difference between the means, the sample size, and the variability of scores within each group being compared. The greater the difference in means, the larger the sample size, and the lower the variability of scores within groups, the greater the chance of two means being statistically different from one another. For example, if data from 500 employees (large sample size) indicate there is a $15,000 difference between men's and women's salaries (large difference between means), the difference will probably be statistically significant. However, in a situation in which there is a $300 difference (small difference between means) in the salaries of 16 men and 10 women (small sample size), it is unlikely that the difference would be statistically significant. Although the effects of a large difference in means and a larger sample size probably make sense, let's spend a moment discussing the variability issue.
As mentioned in Chapter 2, variability is the extent to which scores differ from one another. The statistics used to test group differences in means (e.g., t-test, ANOVA) compare the variability of scores within a group with the variability of scores between groups. For example, suppose we conduct a study in which we compare the number of hours per week that men and women spend watching ESPN. The results of our study are shown in Table 3.01. There is little variability in scores within each sex (i.e., the scores are similar), but there is much variability in scores between the groups. When such a situation occurs, the difference between the mean for men (5.13) and the mean for women (3.46) is likely to be statistically significant.

Table 3.01: Hours spent watching ESPN
   Women          Men
4   3   4      5   5   5
3   4   3      5   5   5
4   3   4      5   5   5
3   4   3      5   6   5
4   3   3      6   5   5

Table 3.02: Hours spent watching Law and Order
    Women            Men
 5   2   1       5   9    4
 0   1   3       8   5    9
 1   7   4       1   0    4
10   6   3      10   2    7
 3   4   2       2   1   10

A very different pattern occurs, however, in Table 3.02. Although the means for men and women are the same as they were in Table 3.01, the variability within each group is much greater. In Table 3.01, the highest number of hours for women (4) is lower than the lowest number of hours for men (5); in Table 3.02, the highs and lows for men are the same as those for women. With such high variability, it is unlikely that the differences in means would be statistically significant.
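This within-group versus between-group logic can be verified by computing a pooled two-sample t for both tables. The formula below is the standard pooled t, not taken from the book; the data are the 15 scores per group from Tables 3.01 and 3.02.

```python
import statistics

# Pooled two-sample t statistic: the mean difference divided by a
# standard error built from the pooled within-group variance.
def pooled_t(a, b):
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * statistics.variance(a) +
           (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / \
           (sp2 * (1 / na + 1 / nb)) ** 0.5

# Table 3.01 (ESPN): little variability within each sex.
men_espn = [5] * 13 + [6] * 2
women_espn = [4] * 7 + [3] * 8
# Table 3.02 (Law and Order): same means, far more variability.
men_lo = [5, 9, 4, 8, 5, 9, 1, 0, 4, 10, 2, 7, 2, 1, 10]
women_lo = [5, 2, 1, 0, 1, 3, 1, 7, 4, 10, 6, 3, 3, 4, 2]

print(f"ESPN t = {pooled_t(men_espn, women_espn):.2f}")       # large t
print(f"Law and Order t = {pooled_t(men_lo, women_lo):.2f}")  # small t
```

The identical mean difference produces a large t in the first case and a small one in the second, purely because of the change in within-group variability.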

Table 3.03: Statistics that test differences in means
                                      Number of Dependent Variables
Number of Independent Variables         One          Two or more
One independent variable
  Two levels                            t-test       MANOVA
  Two or more levels                    ANOVA        MANOVA
Two or more independent variables       ANOVA        MANOVA

In testing differences in means, t-tests and ANOVAs are the most commonly used statistics. As shown in Table 3.03, when there is only one independent variable (e.g., sex or race) with only two levels (e.g., male, female or minority, nonminority), a t-test is used to test group differences in means. When there is one independent variable with more than two levels (e.g., race: African American, White, Hispanic American, and Asian American) or there are two or more independent variables (e.g., sex and race), an analysis of variance (ANOVA) is used. When there are two or more dependent variables (e.g., turnover and absenteeism), a multivariate analysis of variance (MANOVA) is used.
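The decision rule in Table 3.03 can be written as a small helper (the function name is ours, not a standard term):

```python
# Pick the test implied by Table 3.03.
def choose_test(n_independent_vars, n_levels, n_dependent_vars):
    if n_dependent_vars >= 2:
        return "MANOVA"
    if n_independent_vars == 1 and n_levels == 2:
        return "t-test"
    return "ANOVA"

print(choose_test(1, 2, 1))  # sex -> salary
print(choose_test(1, 4, 1))  # race with four levels -> salary
print(choose_test(2, 2, 2))  # sex and race -> turnover and absenteeism
```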
For example, a t-test might be used to test differences in:
Salary (1 dependent variable) between males and females: 2 levels (male, female) of 1 independent variable (sex)

                       Sex
                 Male        Female
Salary         $46,000      $43,000

Assessment center scores (1 dependent variable) between minorities and nonminorities: 2 levels (minority, nonminority) of 1 independent variable (minority status)

                                  Race
                          Nonminority   Minority
Assessment Center Score       52.6        47.3

Job satisfaction levels (1 dependent variable) between clerical and production workers: 2 levels (clerical, production) of 1 independent variable (job type)

                            Job Type
                       Production   Clerical
Job Satisfaction Score     6.1        7.5

An analysis of variance (ANOVA) might be used to test differences in:
Salary (1 dependent variable) among White, African American, and Hispanic employees: 3 levels (White, African American, and Hispanic) of 1 independent variable (race)

                   Race/Ethnicity
          White   African American   Hispanic
Salary   $44,000      $41,000        $41,500

Salary (1 dependent variable) on the basis of race and gender: 2 independent variables (race and gender)

                   Race/Ethnicity
Gender    White   African American   Hispanic
Male     $44,000      $41,000        $41,500
Female   $42,000      $39,000        $40,000

TESTING DIFFERENCES WITH SMALL SAMPLE SIZES

As stated in Chapter 2, when sample sizes are small or when there are outliers that might skew the data, using the mean might result in a misinterpretation of the data. In such cases, a statistic that does not assume a normal distribution (nonparametric) is used. Although there are many nonparametric tests, two commonly used tests in the human resource field are the Mann-Whitney U test and the Fisher's exact test.
The Mann-Whitney (also called Wilcoxon-Mann-Whitney) tests the differences in the rank order of scores from two populations. For example, suppose that a company wants to know if the salaries paid to female accountants are less than those paid to male accountants. As you can see in Table 3.04, there are only 12 accountants, probably too few to use a t-test. The first step in the Mann-Whitney is to rank order the salaries and then sum the ranks for each group. For example, the sum of ranks for our female accountants is 1 + 7 + 10 + 11 = 29 and the sum of ranks for our male accountants is 2 + 3 + 4 + 5 + 6 + 8 + 9 + 12 = 49. A U value is then computed for each of these two sums and a table is used to determine if the difference in the two U values is statistically significant.

Rather than using the average rank, the Fisher's exact test compares the number of female accountants whose salary is above the median salary to the number of male accountants whose salary is above the median. The actual calculations for the Fisher's exact test can get complicated and are beyond the scope of this book. However, let's discuss the basic concept behind the test. As depicted in Table 3.04, the median salary for our accountants is $28,500. As shown in Table 3.05, 25% of women have salaries above the median and 75% have salaries below the median. For men, 62.5% (5/8) have salaries above the median and 37.5% (3/8) have salaries below the median. A Fisher's exact test would determine the probability that these differences are statistically significant (i.e., did not occur by chance).

Table 3.04
Salaries for accountants
Salary      Accountant Sex
$32,000 Female
$31,500 Male
$31,000 Male
$30,900 Male
$30,200 Male
$29,000 Male
$28,000 Female
$27,800 Male
$27,200 Male
$27,000 Female
$26,500 Female
$26,000 Male


Table 3.05
Number of men and women whose salary falls above and below the median
                    Women   Men   Total
Above the median      1      5      6
Below the median      3      3      6
Total                 4      8     12
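Both nonparametric ideas (the rank sums behind the Mann-Whitney and the median split behind Fisher's exact test) can be reproduced from Table 3.04:

```python
import statistics

# Table 3.04 salaries, highest first; rank 1 = highest, as in the text.
salaries = [
    (32000, "F"), (31500, "M"), (31000, "M"), (30900, "M"),
    (30200, "M"), (29000, "M"), (28000, "F"), (27800, "M"),
    (27200, "M"), (27000, "F"), (26500, "F"), (26000, "M"),
]

# Mann-Whitney first step: sum the ranks within each group.
rank_sum = {"F": 0, "M": 0}
for rank, (_, sex) in enumerate(salaries, start=1):
    rank_sum[sex] += rank
print(rank_sum)  # {'F': 29, 'M': 49}

# Fisher's exact test starting point: counts above the median salary.
median = statistics.median(s for s, _ in salaries)
above = {sex: sum(1 for s, x in salaries if x == sex and s > median)
         for sex in ("F", "M")}
print(median, above)  # 28500.0 {'F': 1, 'M': 5}
```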

TESTING DIFFERENCES IN FREQUENCIES

At times, a researcher wants to test differences in frequencies rather than differences in means or medians. For example, as shown in Table 3.06, an HR manager might want to see if the distribution of men and women across jobs is the same. Or, as shown in Table 3.07, the HR manager might want to determine whether there are differences in the number of people hired using different recruitment methods. In situations such as these, a chi-square is most commonly used with large samples and the Fisher's exact test for small samples.

Table 3.06                         Table 3.07
Position Type   Male   Female      Recruitment Method   Hired
Management       15       5        Referral               43
Clerical          2      27        Advertisement          27
Production       45      13        Job Fair               26

Interpreting Statistical Results

t-TEST

When t-tests are used in technical reports or journal articles, the results of the analysis are typically listed in the following format:

t(45) = 2.31, p < .01

The number in the parentheses, in this case 45, represents the degrees of freedom. For a t-test, the degrees of freedom are the number of people in the sample minus 2. Thus, in our example above, our 45 degrees of freedom indicate that our t-test was conducted on scores from 47 people.
The next number, the 2.31, is the value of our t-test. The larger the t-value, the greater the difference in scores between the two groups. With sample sizes of 120 or more, the t-value can be interpreted as approximately the number of standard errors by which the two groups differ. For example, a t of 2.0 would indicate that the salary for males is about 2 standard errors higher than the salary for females. Likewise, a t of 1.5 would indicate a difference of approximately 1.5 standard errors. With sample sizes of less than 120, the interpretation of a t-value is not as precise.
The significance level is indicated by the notation p < .01, with the .01 indicating that the probability of our results occurring by chance is 1 in 100 (.01). Traditionally, when a significance level is .05 or lower (e.g., .04, .02, .001), our results are considered to be statistically significant. As shown in Table 3.08, the significance level is a function of the t-value and the degrees of freedom. The higher the degrees of freedom (the greater the sample size), the lower the t-value needed to be considered statistically significant.
When reading the results of a t-test in a journal or technical report, you might find that the article mentioned one of three types of t-tests: one sample, two samples, or paired difference.
A one-sample t-test is used when a researcher wants to compare the mean from a sample with a particular mean. For example, suppose that a police department found that the average number of complaints received for each officer was 1.3 per year. The national average for complaints is 1.2. A one-sample t-test could be used to determine if the rate of 1.3 for the town was statistically higher than the national rate of 1.2.

A two-sample t-test is used to compare the means of two independent groups. For example, a group of 30 employees received customer service training, and the town manager wants to compare the complaint rate for these employees with that of 40 employees who did not receive the training. Another example would be that a compensation manager found that the average salary for male police officers in the town was $32,200, and the average salary for female police officers was $30,800. To determine if the average salary for men was statistically higher than the average salary for women, a two-sample t-test would be used.
A paired-difference t-test is used when you have two measures from the same sample. For example, police officers in one department averaged 1.3 complaints per officer. To reduce the number of complaints, the chief had each of the officers attend a training seminar on communication skills. In the year following the seminar, the average complaint rate for those same officers was 1.0. A paired-difference t-test would be used to determine if the decrease from 1.3 to 1.0 was statistically significant.
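A paired-difference t works on the within-officer change scores. The complaint counts below are invented for illustration; the formula (the mean of the difference scores divided by their standard error) is the standard one:

```python
import statistics

# Illustrative before/after complaint counts for ten officers
# (not from the book's example).
before = [2, 1, 0, 3, 1, 2, 1, 0, 2, 1]
after  = [1, 1, 0, 2, 1, 1, 1, 0, 1, 1]

# Paired t: mean of the difference scores over its standard error.
diffs = [b - a for b, a in zip(before, after)]
n = len(diffs)
t = statistics.mean(diffs) / (statistics.stdev(diffs) / n ** 0.5)
print(f"t({n - 1}) = {t:.2f}")
```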

ANALYSIS OF VARIANCE: ONE INDEPENDENT VARIABLE

When the results of an ANOVA are reported in a technical report or journal, two tables are usually provided: a means table and a source table. The source table provides the results of the ANOVA, and the means table provides the descriptive statistics that serve as the basis for the source table.
As shown in Table 3.09, the source table provides five pieces of statistical information, only three of which (degrees of freedom [df], the F value [F], and the probability level [p <]) are important for interpreting the results of the ANOVA.

Degrees of Freedom

In an ANOVA source table, the degrees of freedom for an independent variable are the number of groups in the variable minus one. For example, as shown in Table 3.09, in the Education variable, there were three groups: high school diploma, associate's degree, and bachelor's degree. The degrees of freedom, then, would be 3 groups minus 1 = 2 degrees of freedom. Had there been four education levels (i.e., high school diploma, associate's degree, bachelor's degree, master's degree), there would have been three degrees of freedom (4 education levels minus 1 = 3).
The total degrees of freedom represent the number of people in our analysis minus one. The 673 total degrees of freedom in Table 3.09 indicate that our analysis was based on data from 674 employees (674 minus 1 = 673 degrees of freedom).

F Statistic

An ANOVA with one independent variable will yield one F value. Statistically, the F value is computed by dividing the mean square (MS) for the variable by the mean square error. For example, the F of 35.32 for education was computed by dividing the mean square (MS) for education (3369.62) by the mean square error (95.40). An F of 1.0 or less indicates that the independent variable had the effect size that we would have expected by chance. An F value greater than 1 indicates that the effect of our variable was greater than what would be expected by chance. In the example shown in Table 3.09, the F of 35.32 for education indicates that the effect of education on academy grades is 35 times what would be expected by chance.
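The arithmetic behind Table 3.09 can be retraced directly:

```python
# Recompute the F value in Table 3.09: MS = SS / df, F = MS / MS error.
ms_education = 6739.24 / 2    # mean square for education
ms_error = 64011.83 / 671     # mean square error
f_value = ms_education / ms_error

print(f"MS education = {ms_education:.2f}")
print(f"MS error = {ms_error:.2f}")
print(f"F = {f_value:.2f}")
```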

Significance Level

As with t-test results, the significance level of our F value is indicated by the notation p < .0001, indicating that the probability of our results occurring by chance is 1 in 10,000 (.0001). As mentioned earlier, a probability level less than .05 is considered statistically significant. As you can see from Table 3.09, the effect for education is significant at the .0001 level; because this probability is less than .05, the main effect for education is statistically significant.
As shown in Table 3.10, the significance of an F value is determined by the sample size and the number of levels in the independent variable. The greater the sample size, the lower the F value needed for statistical significance. As you can see from the table, we would need an F value of about 3 (2.99) for the effect of education on academy grades to be statistically significant. Our value of 35.32 greatly exceeds that value.

Table 3.09
Example of an ANOVA source table for one independent variable
Effect       df    SS         MS        F       p<
Education     2    6739.24    3369.62   35.32   .0001
Error       671    64011.83     95.40
Total       673    70751.07

If our F value is not statistically significant, we cannot conclude that our independent variable (e.g., education) had an effect on the dependent variable (e.g., academy grades). If our F value is significant, we have one more analysis to perform. Although the significant F value indicates that academy cadets performed differently on the basis of their education level, we don't know if academy performance differed for each of the three degree types. That is, it may be that cadets with associate's degrees or bachelor's degrees outperformed cadets with a high school diploma, but cadets with associate's degrees and bachelor's degrees performed at the same level. To get a clearer picture of which means differ from one another, post hoc tests are conducted. Examples of such tests include Scheffe, Tukey, Duncan, LSD, and Newman-Keuls.

Table 3.10
Approximate F value needed for significance at the .05 level
                Levels in the Independent Variable
Sample Size        2      3      4      5
10               5.12   4.46   4.35   4.53
20               4.38   3.55   3.20   3.01
30               4.18   3.34   2.96   2.74
60               4.00   3.15   2.76   2.52
100              3.94   3.09   2.70   2.46
200              3.89   3.04   2.65   2.41
Infinity         3.84   2.99   2.60   2.37

In journal articles and technical reports, the results of these tests are typically depicted using superscripts next to the mean. Means that share the same superscript are not statistically different from one another. For example, the three means in Table 3.11 have different superscripts; thus they are statistically different from each other. As shown in the examples in Table 3.12:
In Example 1, cadets with bachelor's degrees and associate's degrees performed better than cadets with high school diplomas, but there was no difference between cadets with associate's degrees and bachelor's degrees.
In Example 2, cadets with bachelor's degrees outperformed those with associate's degrees, who outperformed those with high school diplomas.
In Example 3, cadets with bachelor's degrees performed better than those with associate's degrees or high school diplomas. Cadets with associate's degrees did not outperform cadets with a high school diploma.

Table 3.11
Means Table
Education Level        Mean Academy Score   Standard Deviation
High school diploma          77.09a               10.83
Associate's degree           81.31b                9.09
Bachelor's degree            83.94c                8.25

Table 3.12
Examples of post hoc test results
Education Level        Example 1   Example 2   Example 3
High school diploma     73.24a      73.24a      73.24a
Associate's degree      77.89b      77.89b      75.99a
Bachelor's degree       78.01b      80.21c      80.21b

ANALYSIS OF VARIANCE: TWO OR MORE INDEPENDENT VARIABLES

As shown in Table 3.13, an ANOVA produces an F value for each independent variable and combination of independent variables. Each individual variable is called a main effect and the combination of variables is called an interaction. When there are two independent variables, three outcomes are possible:
One or both main effects are statistically significant but the interaction is not (Table 3.14)
Neither main effect is significant but the interaction is (Table 3.15)
One of the main effects is significant, as is the interaction (Table 3.16)


Table3.13
Example of an ANOVA source table for two independent
variables
Effect df SS MS F p<
Race 1 1.31708 1.31708 9.31 .003
Sex 1 0.00896 0.00896 0.06 .802
Sex*Race 1 0.51953 0.51953 3.67 .058
Error 107 15.13865 0.14148
Total 110 16.98421

Table 3.14
Example of a significant main effect with no interaction
Source Table
Effect      df   SS       MS       F       p<
Sex          1   0.5198   0.5198   14.09   .0006
Race         1   0.0073   0.0073    0.20   .6593
Sex*Race     1   0.0026   0.0026    0.07   .7937
Error       36   1.3279   0.0369
Total       39   1.8576
Means Table
Sex        White   Minority
Male        2.42     2.46     2.44
Female      2.21     2.22     2.22
            2.31     2.34     2.33


[Figure: line graph of the mean ratings in Table 3.14 (vertical axis 2.0 to 2.5) for men and women across race (minority, white).]

Table 3.15
Example of a significant interaction but no main effects
SourceTable
Effect df SS MS F p<
Race 1 0.0172 0.0172 1.20 .2804
Sex 1 0.0056 0.0056 0.04 .8441
Sex*Race 1 0.2873 0.2873 20.03 .0001
Error 36 0.5162 0.0143
Total 39 0.8213
Means Table
Sex        White   Minority
Male        2.48     2.27     2.38
Female      2.30     2.43     2.37
            2.39     2.35     2.37


[Figure: line graph of the mean ratings in Table 3.15 (vertical axis 2.1 to 2.5) for men and women across race (minority, white); the lines cross, reflecting the interaction.]

A significant interaction indicates that the effect of one variable depends on the level of the other independent variable. For example, as shown in Table 3.15, there are no significant differences in overall performance ratings between men and women or minorities and nonminorities. The significant interaction, however, tells us that sex and race interact so that white males and minority females are receiving the highest performance evaluations. To be sure of which means are different from one another, we would use one of the post hoc tests previously mentioned.

Table 3.16
Example of a significant interaction and main effects
Source Table
Effect      df   SS       MS       F       p<
Race         1   0.2234   0.2234    5.64   .0230
Sex          1   0.5086   0.5086   12.83   .0010
Sex*Race     1   0.2576   0.2576    6.50   .0152
Error       36   1.4267   0.0396
Total       39   2.4163
Means Table
Sex        White   Minority
Male        2.59     2.28     2.44
Female      2.21     2.22     2.22
            2.40     2.25     2.33

[Figure: line graph of the mean ratings in Table 3.16 (vertical axis 2.0 to 2.7) for men and women across race (minority, white).]

FISHER'S EXACT TEST

Interpreting a journal article or technical report using a Fisher's exact test is relatively easy because the results are usually reported in a manner such as, FET = .023. The .023 represents the probability that the difference between the two groups occurred by chance. With most software output, a two-tailed value of .05 or lower is considered statistically significant at the .05 level and a value of .01 or lower is considered significant at the .01 level. Thus, an FET of .042 would be significant at the .05 level, whereas an FET of .067 would not be statistically significant. If a one-tailed test is used, an FET of .025 or less is needed for the group difference to be considered significant.

CHI-SQUARE

In journal articles, chi-square results are often reported in a manner such as, χ²(2) = 6.69, p < .04. The (2) is the degrees of freedom, the 6.69 is the chi-square value, and the .04 is the probability level.
Withachisquareanalysis,thedegreesoffreedomare

48
Statistics that Test Differences Between Groups

the number of groups minus one. For example, if the analysis examined racial frequencies (white, African American, Hispanic, Asian), the degrees of freedom would be three: four races minus one. In the example shown back in Table 3.07, the degrees of freedom would be two (three recruitment methods minus one). When the analysis involves two variables, such as that shown back in Table 3.06, the degrees of freedom are the number of groups in the first variable minus one, multiplied by the number of groups in the second variable minus one. The degrees of freedom for Table 3.06 would be two: two for position type (3 positions minus 1) multiplied by one for sex (2 sexes minus 1).
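The two degrees-of-freedom rules above are easy to express in code. A minimal sketch (the function names are ours, not from any statistics package):

```python
def df_one_variable(n_groups):
    """Degrees of freedom for a one-variable chi-square:
    number of groups minus one."""
    return n_groups - 1

def df_two_variables(n_groups_var1, n_groups_var2):
    """Degrees of freedom for a two-variable chi-square:
    (groups in variable 1 - 1) * (groups in variable 2 - 1)."""
    return (n_groups_var1 - 1) * (n_groups_var2 - 1)

print(df_one_variable(4))      # four races -> 3
print(df_two_variables(3, 2))  # three positions x two sexes -> 2
```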
The significance level is determined by the size of the chi-square value and the degrees of freedom. The greater the degrees of freedom, the higher the chi-square value needed for statistical significance. An example of the chi-square values needed for statistical significance is shown in Table 3.17. As you can see in the table, with 3 degrees of freedom, a chi-square value of 7.81 or higher is needed for a frequency distribution to be significantly different at the .05 level and 11.34 for the .01 level of significance.
It is important to understand that when there are more than two levels of a variable, a significant chi-square only indicates that the distribution of frequencies is not equal. Unlike the paired comparisons available for ANOVA, there is no test with chi-square to indicate which frequencies are different from one another. For example, in Table 3.06, a significant chi-square would tell us that males and females are not represented equally across positions, but we would not know which positions are statistically different from one another. In looking at the table we would probably make the assumption that males are represented more in the management and production positions and less often in the clerical positions, but we could not state this with statistical certainty.

Table 3.17
Chi-square values needed for statistical significance

                       Probability Level
Degrees of Freedom      .05      .01
  1                     3.84     6.63
  2                     5.99     9.21
  3                     7.81    11.34
  4                     9.49    13.28
  5                    11.07    15.09
  6                    12.59    16.81
  7                    14.07    18.48
  8                    15.51    20.09
  9                    16.92    21.67
 10                    18.31    23.21
 20                    31.41    37.57
 30                    43.77    50.89
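The critical values in Table 3.17 come from the chi-square distribution, so they can be reproduced with standard statistical software. A minimal sketch using Python's SciPy library:

```python
# Reproduce a few entries of Table 3.17: the chi-square value needed
# for significance is the (1 - p) quantile of the chi-square distribution.
from scipy.stats import chi2

for df in (1, 3, 10):
    crit_05 = chi2.ppf(0.95, df)  # .05 significance level
    crit_01 = chi2.ppf(0.99, df)  # .01 significance level
    print(f"df={df}: .05 -> {crit_05:.2f}, .01 -> {crit_01:.2f}")
```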


Applying Your Knowledge

You can apply what you have learned in this chapter by reading the following articles in Applied H.R.M. Research (www.xavier.edu/appliedhrmresearch) and in Public Personnel Management.

t-test

Kuroyama, J., Wright, C. W., Manson, T. M., & Sablynski, C. J. (2010). The effect of warning against faking on noncognitive test outcomes: A field study of bus operator applicants. Applied H.R.M. Research, 12(1), 59-74.

Analysis of Variance (ANOVA)

Levine, S. P., & Feldman, R. S. (2002). Women and men's nonverbal behavior and self-monitoring in job interview setting. Applied H.R.M. Research, 7(1), 1-14.
Roberts, L. L., Konczak, L. J., & Macan, T. H. (2004). Effects of data collection methods on organizational climate survey results. Applied H.R.M. Research, 9(1), 13-26.

Chi-square

Lee, J. A., Havighurst, L. C., & Rassel, G. (2004). Factors related to court references to performance appraisal fairness and validity. Public Personnel Management, 23(1), 61-69.
Somers, M., & Casal, J. C. (2011). Type of wrongdoing and whistle-blowing: Further evidence that type of wrongdoing affects the whistle-blowing process. Public Personnel Management, 40(2), 151-163.

4. Understanding Correlation
_______________________________

What is Correlation?

CORRELATION IS A statistical procedure that allows a researcher to determine the relationship between two variables. For example, we might want to know the relationship between an employment test and future employee performance, job satisfaction and job attendance, or education level and performance in a training program. Though correlations show the extent to which two variables are related, it is important to understand that correlational analysis does not necessarily say anything about whether one variable causes another.
Why does a correlation coefficient (the result of a correlational analysis) not indicate a cause-and-effect relationship? Because a third variable, an intervening variable, often accounts for the relationship between two variables. Take the example often used by psychologist David Schroeder. Suppose there is a very high correlation between the number of ice cream cones sold in New York during August and the number of babies that die during August in India. Does eating ice cream in New York kill babies in another nation? No, that would not make sense. Instead, we look for that third variable that would explain our high correlation. In this case, the answer is clearly the summer heat.
Another interesting example of an intervening variable was provided by psychologist Wayman Mullins in a conference presentation about the incorrect interpretation of correlation coefficients. Mullins pointed out that data show a strong negative correlation between the number of cows per square mile and the crime rate. With his tongue firmly planted in his cheek, Mullins suggested that New York City could rid itself of crime by importing millions of head of cattle. Of course, the real interpretation for the negative correlation is that crime is greater in urban areas than in rural areas.
As shown above, a good researcher should always be cautious about variables that seem related. Several years ago, People magazine reported on a minister who conducted a "study" of 500 pregnant teenage girls and found that rock music was being played when 450 of them became pregnant. The minister concluded that because the two factors are related (that is, they occurred at the same time), rock music must cause pregnancy. His solution? Outlaw rock music, and teenage pregnancy would disappear. However, suppose we found that in all 500 cases of teenage pregnancy, a pillow was also present. By using the same logic as the minister, the real solution would be to outlaw pillows, not rock music. Although both "solutions" are certainly strange, the point should be clear: just because two events occur at the same time or seem to be related does not mean that one event or variable causes another.

Interpreting a Correlation Coefficient

MAGNITUDE AND DIRECTION

The result of correlational analysis is a number called a correlation coefficient. The values of this coefficient range from 0 to +1 and from 0 to −1. The further the coefficient is from zero, the greater the relationship between two variables. That is, a correlation of .40 shows a stronger relationship between two variables than does a correlation of .20. Likewise, a correlation of −.39 shows a stronger relationship than a correlation of +.30 because, even though the −.39 is negative, it is further from zero than the +.30.

The + and − signs indicate the direction of the correlation. A positive (+) correlation means that as the values of one variable increase, the values of a second variable also increase. For example, we might find a positive correlation between intelligence and scores on a classroom exam. This would mean that the more intelligent the student, the higher his or her score on the exam.
A negative (−) correlation means that as the values of one variable increase, the values of a second variable decrease. For example, we would probably find a negative correlation between the number of beers students drink the night before a test and their scores on that test. As the number of beers increases, their test scores are likely to decrease. In human resources, we find negative correlations between job satisfaction and absenteeism and between nervousness and interview success.
Though correlation coefficients are computed using a statistical formula, they are most easily understood through scatterplots such as those found in Figures 4.1, 4.2, and 4.3. Each of these figures depicts the scores of 12 applicants on an employment test and their supervisor ratings after 6 months on the job. Figure 4.1 shows how the scatterplot would look if test scores and performance ratings had a high positive correlation, Figure 4.2 shows a plot of a negative correlation, and Figure 4.3 shows a plot of two uncorrelated variables.
To interpret a scatterplot, use the rule of thumb taught by psychologist Tom Pierce. Count the number of people falling in quadrants C and B and then count the number falling in quadrants A and D. If the number in quadrants C and B is higher than in A and D, there is a positive correlation: the greater the difference in numbers, the stronger the correlation. Note in Figure 4.1 there are nine points in quadrants B and C and only three in quadrants A and D. If the number in quadrants A and D is higher than in C and B, there is a negative correlation. Note in Figure 4.2 there are eight points

Figure 4.1
Example of a Positive Correlation
[Scatterplot: Test Score (12-24) on the x-axis, Performance Rating (1-6) on the y-axis; most points fall in quadrants B and C]

Figure 4.2
Example of a Negative Correlation
[Scatterplot: Test Score (12-24) on the x-axis, Performance Rating (1-6) on the y-axis; most points fall in quadrants A and D]

Figure 4.3
Example of Two Uncorrelated Variables
[Scatterplot: Test Score (12-24) on the x-axis, Performance Rating (1-6) on the y-axis; points are spread evenly across the four quadrants]

in quadrants A and D and only four in quadrants C and B. If the number is the same, there is no correlation. Note in Figure 4.3 that there are six points in quadrants A and D and six in quadrants C and B.
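Pierce's quadrant rule of thumb can be sketched in code. A minimal illustration with hypothetical data, assuming the quadrants are formed by splitting each variable at its mean (the chapter does not specify where the dividing lines fall):

```python
# Count concordant points (quadrants B and C: both values on the same
# side of their means) versus discordant points (quadrants A and D).
from statistics import mean

def quadrant_counts(xs, ys):
    x_mid, y_mid = mean(xs), mean(ys)
    b_and_c = sum(1 for x, y in zip(xs, ys)
                  if (x > x_mid and y > y_mid) or (x < x_mid and y < y_mid))
    a_and_d = sum(1 for x, y in zip(xs, ys)
                  if (x > x_mid and y < y_mid) or (x < x_mid and y > y_mid))
    return b_and_c, a_and_d

test_scores = [13, 14, 16, 21, 22, 24]
ratings     = [2,  1,  3,  4,  6,  5]
bc, ad = quadrant_counts(test_scores, ratings)
print(bc, ad)  # -> 6 0: far more points in B and C, a positive correlation
```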


Factors Limiting the Magnitude of a Correlation

Suppose that a previous validity study reported a strong correlation between cognitive ability test scores and academy grades but, in our own study, we obtained a correlation of .40. What would explain this discrepancy? Probably three factors: test unreliability, criterion unreliability, and range restriction.

Reliability. The size of a correlation coefficient is limited by the reliability of the two variables being correlated (reliability is the extent to which a score is free from error). So, if our two measures (in this case, scores on a cognitive ability test and grades in the academy) have low reliability, the correlation between the test scores and academy grades (our validity coefficient) will be lower than expected. There are four types of reliability: test-retest, alternate-forms, internal, and scorer.
With test-retest reliability, people take the same test twice. The scores from the first administration of the test are correlated with scores from the second to determine whether they are similar. If they are, the test is said to have temporal stability: the test scores are stable across time and not highly susceptible to such random daily conditions as illness, fatigue, stress, or uncomfortable testing conditions.
With alternate-forms reliability, two forms of the same test are constructed. The scores on the two forms are then correlated to determine whether they are similar. If they are, the test is said to have form stability. Multiple forms of a test are common in situations in which individuals might take the test more than once (e.g., a promotion exam) or when there is concern that test takers will copy answers from another test taker.
With internal reliability, the similarity of responses to test items is compared. In a test with high internal reliability, we would expect a test taker to answer similar items in a similar way. That is, we would expect a person who rates the item "I am outgoing" as being similar to them to also rate the item "I like to talk with people" as being similar to them. Measures of internal reliability that you might encounter in a journal article include split-half reliability,
Cronbach's coefficient alpha, and the Kuder-Richardson Formula 20 (K-R 20).
Scorer reliability is the extent to which two people scoring a test will obtain the same test score. Scorer reliability is an issue especially in projective or subjective tests (e.g., the Rorschach Ink Blot Test, interviews, writing samples) in which there is no one correct answer, but even tests scored with the use of keys suffer from scorer mistakes. For example, Allard, Butler, Faust, and Shea (1995) found that 53% of hand-scored personality tests contained at least one scoring error and that 19% contained enough errors to alter a clinical diagnosis. When human judgment of performance is involved, scorer reliability is discussed in terms of interrater reliability. That is, will two interviewers give an applicant similar ratings, or will two supervisors give an employee similar performance ratings?

Range Restriction. A correlation coefficient is also limited by the range of test scores and performance measures that are included in the study: the wider the range of scores, the higher the validity coefficient (the correlation between a test score and a measure of job performance). Unfortunately, in the typical validity study in which we correlate a test with some measure of performance, we usually encounter something called range restriction. That is, we don't have a full range of test scores or performance ratings. For example, in a given employment situation, few employees are at the extremes of a performance scale. Employees who would be at the bottom were either never hired or have since been terminated. Employees at the upper end of the performance scale either got promoted or went to an organization that paid more money.
Range restriction is important because it is easiest to predict the future performance of people with extremely high or extremely low test scores. For example, suppose that 10 students scored 165 on the verbal portion of the GRE and another 10 scored 140. Most people would be willing to bet that most of the students with a score of 165 will do better in graduate school than the students who scored 140. However, suppose that one student scores 153 and another scores 151. How many people would be willing to bet the mortgage on such a small difference in points?

Curvilinearity. Another problem that can lower the size of a correlation coefficient is curvilinearity. As depicted in Figure 4.4, one of the assumptions behind correlation is that the two variables being correlated are linearly related: the scores on one variable are related in a straight line to scores on the other variable.

However, many things in life are not linearly related. For example, research strongly indicates that bright people perform better than less bright people in the police academy. But, is there a point at which increases in intelligence don't help? In the example shown in Figure 4.5, academy performance increases as IQ scores increase until we reach an IQ of 110. After scores of 110, increasing amounts of IQ do not result in better performance. Why would we obtain such a relationship? Because the material learned in the academy is only so difficult; and at some point, being super smart may not provide any advantage over being smart.

In human resources, we see a similar relationship between years of experience and job performance. That is, the difference in the job performance of a person with two years' experience versus a person with no experience or with one year of experience is probably fairly great. However, after ten years, would an employee with 15 years of experience perform better than an employee with 12 years of experience?

Curvilinearity also occurs in situations in which too little or too much of a variable could actually result in decreased performance, something called the "inverted U" that is depicted in Figure 4.6. The relationship between arousal and performance provides the perfect example. A person who has very low levels of arousal is probably not motivated enough to do well on a task. A person with very high levels of arousal will become nervous and perform poorly. However, a person with a moderate level of arousal has enough energy to be motivated but not so much that performance will decrease (too nervous and uptight). So, even though there is a relationship between arousal and performance, the relationship is not linear. Thus, a simple correlation between arousal levels and performance would probably not result in a significant correlation, and we would incorrectly conclude that there is no correlation between the two variables. Fortunately, there are some statistical adjustments that we can make to test for this possibility (converting our measures to z scores and then squaring them). Just as fortunately, such adjustments are beyond the scope of this chapter and probably your interest as well!

An interesting example of the inverted U was provided in 1996 by the New London, Connecticut police department. New London required that applicants score between 20 and 27 on the Wonderlic Personnel Test (a cognitive ability test), reasoning that people scoring below 20 were too dumb to be cops, and those scoring above 27 were too smart and would be bored performing the day-to-day law enforcement duties. Though New London had no statistical proof to back their claim, the 2nd Circuit Court of Appeals upheld the city's practice of not hiring applicants who were too bright (Jordan v. City of New London, 2000). As you can imagine, New London received lots of bad publicity and a good deal of ribbing from other cities. The San Francisco Police Department went so far as to hold a press conference inviting these "too smart" applicants rejected by New London to move out west and apply for the SFPD!

STATISTICAL SIGNIFICANCE

To determine if we are even allowed to interpret a correlation coefficient, we must first compute something called a significance level (see Chapter 1 for a discussion of significance levels). Significance levels tell us the probability that our correlation coefficient occurred by chance alone. That is, if we obtain a correlation of .30 between a test score and supervisor ratings of on-the-job performance, is there really a relationship between the two variables or is our correlation a chance finding?
The significance level for a correlation coefficient is a function of two factors: the size of the correlation coefficient and the sample size used in the study. The greater the sample size, the smaller the correlation needed for statistical significance. For example, as shown in Table 4.1, a correlation of .19 would be significant if we had 100 employees in our study but not if we had only 50 employees.

Table 4.1
Sample sizes needed for statistical significance

Sample Size    Smallest Significant Correlation (p < .05)
 10                 .63
 20                 .44
 30                 .36
 40                 .31
 50                 .27
 60                 .25
 70                 .23
 80                 .22
 90                 .21
100                 .19
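Values like those in Table 4.1 can be derived from the t distribution, since the significance test for a correlation is t = r√(n − 2)/√(1 − r²). A minimal sketch using Python's SciPy library, which reproduces several entries of the table (small rounding differences are possible for other sample sizes):

```python
# Smallest correlation significant at p < .05 (two-tailed) for a given n.
from math import sqrt
from scipy.stats import t

def smallest_significant_r(n, alpha=0.05):
    df = n - 2
    t_crit = t.ppf(1 - alpha / 2, df)          # two-tailed critical t
    return sqrt(t_crit**2 / (t_crit**2 + df))  # invert t = r*sqrt(df)/sqrt(1-r^2)

for n in (10, 20, 30, 40):
    print(n, round(smallest_significant_r(n), 2))
```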

If a correlation coefficient is not statistically significant, we cannot try to interpret it as being high/low or useful/not useful. We essentially pretend that it doesn't exist. If, however, the correlation is statistically significant, we must address the issue of practical significance. That is, is the relationship high enough to be of any use?

INTERPRETING CORRELATIONS WITH NOMINAL DATA

Correlations are normally conducted between variables that are measured on ordinal, ratio, and interval scales. That is, variables whose numbers suggest something about their standing relative to other numbers. For example, if we correlate the number of years that employees have been employed by the organization with their current salaries, it would be easy to interpret the correlation coefficient. However, with one exception, correlations cannot be conducted on variables measured with a nominal scale (numbers that represent discrete categories that provide no relative information). For example, suppose that we coded our office locations as (1) Los Angeles, (2) New York, (3) Miami, and (4) Dallas. We then correlate location with salary and obtain a correlation of .40. What would that mean? As location went up, so did salary? Such a correlation would not make sense. Similarly, if we coded employee race as 1 = White, 2 = African American, 3 = Hispanic, and 4 = Asian American, what would a correlation mean between salary and race? That as one has more race, his or her salary increases?
The only time we can use a nominal variable in a correlation is when there are only two levels of that variable. An excellent example of this would be employee sex. If we coded men as 0 and women as 1, we can correlate sex with salary and actually have the correlation make sense. That is, with sex coded with men as 0 and women as 1, a positive correlation would mean that the average salary for women (the higher code) is greater than the average salary for men (the lower code), and a negative correlation would mean that women are being paid less than men.

PRACTICAL SIGNIFICANCE: IS OUR CORRELATION ANY GOOD?

We have already discussed that the magnitude of a correlation coefficient can range from 0 to 1 and that the farther the coefficient is from zero, the higher the relationship between two variables. But what does a correlation of .40 mean? We know that a correlation of .40 indicates a stronger relationship than a correlation of .20. But, is .40 a high correlation? The answer, of course, is that it depends, and there are three common ways to answer this question: variance accounted for, comparison to norms, and utility analysis.

Variance Accounted For (r²)

One way to add meaning to a correlation coefficient (r) is to square the coefficient. This squared coefficient is most often referred to as r² (r squared) but is also called the coefficient of determination. As an example, let's imagine that we obtain a correlation of .40 between scores on the SAT and grades in college. If we square our coefficient of .40, we would get an r² of .16. The .16 indicates that we can explain or predict 16% of the variability in college grades by students' SAT scores. The remaining 84% is explained by such other factors as motivation, interest, luck, and illness.
In general, the r² values in psychology and in human resources tend to be relatively low. Why? Because life is complicated, and success in school, on a job, or in relationships is due to more than one factor. So, when interpreting the value of r², the norms associated with a given field or topic must be considered.

Comparison to Norms

Though we would like to have extremely high correlations and r² values, as mentioned previously, in psychology and human resources this is rare. So, to interpret a correlation as being "good" or "high," one must compare the magnitude of a correlation with those that are typically obtained in similar situations. As can be seen in Table 4.2, the typical correlations found in organizational psychology research vary tremendously by topic.
In personnel selection, correlations between selection tests and measures of performance (validity coefficients) are typically in the .20 to .30 range. Thus, in personnel selection, a correlation below .20 would be considered low, .20 to .29 considered moderate, .30 to .39 high, and .40 or greater as outstanding. Validity coefficients greater than .50 probably indicate one of two things: either the correlation coefficient is suspect (e.g., calculation errors, chance due to small sample size, or cheating) or the personnel analyst deserves the Nobel Prize for science!
If we are using a particular type of selection test and want to compare it to similar tests, Table 4.3 provides an easy way to assess the magnitude of our correlation. For example, if we correlated our assessment center scores with supervisor ratings of performance and obtained a correlation of .15, we can see from Table 4.3 that our validity of .15 is well below the typical validity of .28 for assessment centers. Note to meta-analysis fans: the correlations in the table below are uncorrected. See Aamodt (2016) or Schmidt and Hunter (1998) for tables showing corrected or "true" validities. These concepts will be explained further in Chapter 6 on meta-analysis.
Unfortunately, tables such as 4.2 and 4.3 are not always available to make it easy to compare a correlation coefficient with norms.


Table 4.2
Correlation Norms in Organizational Psychology

Topic                                         Meta-analysis                        Average Correlation
Intrinsic motivation and
  organizational commitment                   Mathieu & Zajac (1990)                      .67
Job satisfaction and
  organizational commitment                   Cooper-Hakim & Viswesvaran (2005)           .59
Agreement of performance
  ratings by two supervisors                  Conway & Huffcutt (1997)                    .50
Absenteeism and lateness                      Koslowsky et al. (1997)                     .40
Job satisfaction and performance              Judge et al. (2001)                         .30
Absenteeism and turnover                      Griffeth et al. (2000)                      .21
Enjoyment of training and actual learning     Alliger et al. (1997)                       .11

Table 4.3
Correlation Norms for Employee Selection Validity

Technique                 Meta-analysis                     Average Validity
Cognitive ability         Schmidt & Hunter (1998)                .39
Biodata                   Beall (1991)                           .36
Structured interview      Huffcutt & Arthur (1994)               .34
Assessment centers        Arthur et al. (2003)                   .28
Work samples              Roth et al. (2005)                     .26
Experience                Quinones et al. (1995)                 .22
Conscientiousness         Judge et al. (2013)                    .21
Situational judgment      McDaniel et al. (2007)                 .20
References                Aamodt & Williams (2005)               .18
Grades                    Roth et al. (1996)                     .16
Integrity tests           Van Iddekinge et al. (2012)            .13
Personality               Tett et al. (1994)                     .12
Unstructured interview    Huffcutt & Arthur (1994)               .11

Utility Analysis

Another way to determine if a validity coefficient is "any good" is to translate the correlation into terms that most people can understand. Though there are several different methods used to establish the utility of a test (e.g., Taylor-Russell tables, expectancy charts), we will concentrate on the Brogden-Cronbach-Gleser utility formula. This formula computes the amount of money that an organization would save if it used a test to select employees. To use this formula, six pieces of information must be known.

1. Number of employees hired per year (n). This number is easy to determine: it is simply the number of employees who are hired for a given position in a year. This number can be the actual number in a given year or an estimate of the number in a typical year.

2. Average tenure (t). This is the average number of years that employees in the position tend to stay with the company. The number is computed by using information from company records to identify the time that each employee in that position stayed with the company. The number of years of tenure for each employee is summed and divided by the total number of employees. If actual tenure data are not available, an estimate can be used; but estimates reduce the accuracy of the utility formula.

3. Test validity (r). This figure is the criterion validity coefficient that was obtained through a validity study, the technical manual that accompanies a commercially available test, or validity generalization.

4. Standard deviation of performance in dollars (SD$). For many years, this number was difficult to compute. Research has shown, however, that for jobs in which performance is normally distributed, a good estimate of the difference in performance between an average and a good worker (one standard deviation away in performance) is 40% of the employee's annual salary. To obtain this, the total salaries of current employees in the position in question can be averaged, or the salary grade midpoint for the position can be used. For example, if the salary midpoint for an electronics assembler is $25,000, SD$ would be .40 × $25,000 = $10,000.

5. Mean standardized predictor score of selected applicants (m). This number is obtained in one of two ways. The first method is to obtain the average score on the selection test both for the applicants who are hired and the applicants who are not hired. The average test score of the nonhired applicants is subtracted from the average test score of the hired applicants. This difference is divided by the standard deviation of all the test scores. For example, we administer a test of cognitive ability to a group of 50 applicants and hire the 5 with the highest scores. The average score of the 5 hired applicants is 35.2, the average test score of the other 45 applicants is 28.2, and the standard deviation of all test scores is 8.5. The desired figure would be:

(35.2 − 28.2) ÷ 8.5 = 7.0 ÷ 8.5 = .82

The second way to find m is to compute the proportion of applicants who are hired and then use a conversion table to convert the proportion into a standard score. This second method is used when an organization plans to use a test, knows the probable selection ratio based on previous hiring periods, but does not know the average test scores because the organization has never used the test. Using the above example, the proportion of applicants hired would be:

openings ÷ applicants = 5 ÷ 50 = .10

From Table 4.4, we see that the standard score associated with a selection ratio of .10 is 1.76.

6. Cost of testing (C). This figure is obtained by multiplying the number of applicants by the cost per test.

To determine the savings to the company, we use the following formula:

Savings = (n)(t)(r)(SD$)(m) − cost of testing

Table 4.4
Selection ratio conversion table

Selection Ratio    Standard Score (m)
     .05                2.08
     .10                1.76
     .20                1.40
     .30                1.17
     .40                0.97
     .50                0.80
     .60                0.64
     .70                0.50
     .80                0.35
     .90                0.20
    1.00                0.00

As an example, suppose we will hire 10 auditors per year, the average person in this position stays 2 years, the validity coefficient is .40, the average annual salary for the position is $30,000, and we have 50 applicants for 10 openings. Thus,

n = 10
t = 2
r = .40
SD$ = $30,000 × .40 = $12,000
m = 10/50 = .20 = 1.40 (.20 is converted to 1.40 by using the conversion table)
Cost of testing = 50 applicants × $10 = $500

Using the utility formula, we would have

(10)(2)(.40)($12,000)(1.40) − $500 = $133,900

This means that after accounting for the cost of testing, using this particular test instead of selecting employees by chance will save the company $133,900 over the two years that auditors typically stay with the organization. Because a company seldom selects employees by chance, the same formula should be used with the validity of the test (interview, psychological test, references, and so on) that the company currently uses. The result of this computation should then be subtracted from the first.
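The auditor example can be checked with a few lines of code. A minimal sketch of the Brogden-Cronbach-Gleser computation (the function name is ours):

```python
# Brogden-Cronbach-Gleser utility: dollars saved by using a test
# instead of selecting by chance, net of testing costs.
def utility_savings(n, t, r, sd_dollars, m, cost_of_testing):
    return n * t * r * sd_dollars * m - cost_of_testing

# Auditor example from the text: 10 hires/year, 2-year tenure,
# validity .40, SD$ = $12,000, m = 1.40, testing cost = $500.
savings = utility_savings(n=10, t=2, r=0.40, sd_dollars=12_000,
                          m=1.40, cost_of_testing=500)
print(f"${savings:,.0f}")  # -> $133,900
```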


Applying Your Knowledge

You can apply what you have learned in this chapter by reading the following articles in Applied H.R.M. Research (www.xavier.edu/appliedhrmresearch) and in Public Personnel Management.

Cole, M. S., Feild, H. S., & Giles, W. F. (2003). What can we uncover about applicants based on their resumes? A field study. Applied H.R.M. Research, 8(2), 51-62.
Cucina, J. M., Busciglio, H. H., & Vaughn, K. (2013). Category ratings and assessments: Impact on validity, utility, veterans preference, and managerial choice. Applied H.R.M. Research, 13(1), 51-68.
Gilbert, J. A. (2000). An empirical examination of resources in a diverse environment. Public Personnel Management, 29(2), 175-184.
O'Connell, M. S., Doverspike, D., Cober, A. B., & Philips, J. L. (2001). Forging work teams: Effects of the distribution of cognitive ability on team performance. Applied H.R.M. Research, 6(2), 115-128.
Smith, W. J., Harrington, K. V., & Houghton, J. D. (2000). Predictors of performance appraisal discomfort: A preliminary examination. Public Personnel Management, 29(1), 21-32.

5. Understanding Regression
_______________________________

IN THE LAST CHAPTER, we discussed how correlation is used to show relationships between two variables. Though correlation is the basis for regression, regression analysis allows us to do three things that simple correlations do not: make precise predictions, combine small correlation coefficients, and remove unnecessary variables.

Functions of Regression

MAKING PRECISE PREDICTIONS

Suppose that you conduct a validity study and find that the correlation between scores on a cognitive ability test and performance in the police academy is .45. From these results, you would conclude that the two are highly related and that applicants with high scores on the test will perform better in the academy than applicants with low scores on the test. Though such information is useful, it may not tell us all we need to know. That is, we know that on a relative basis, an applicant scoring 85 on the exam should do better in the academy than an applicant scoring 75. What we don't know, however, is on an absolute basis, how well the applicant will perform in the academy. Will she average 90%? Will she even pass the academy? Regression analysis can help answer such questions by allowing us to take a score on a test, enter it into a regression equation, and obtain an applicant's predicted score on some measure of work performance (e.g., academy grades, supervisor ratings).
COMBINING SMALL CORRELATIONS

If you remember from our discussion on correlation, we like to see correlations of at least .20 between a selection test and a measure of job performance. However, suppose that we have a personality inventory that correlates .13 with job performance and a cognitive ability test that correlates .15 with job performance. Based on these small correlations, we would probably be disappointed and lose hope of ever winning the Nobel Prize for HR Validity Studies. However, regression analysis might be able to save the day. That is, with multiple regression, we can combine two small correlations into one larger correlation (if the two predictors do not correlate with one another, we can simply add the squared correlations, but it is usually more complicated than that).
The late industrial psychologist Dan Johnson likened the use of regression to a fishing trip. During our trip, we can try to catch one huge fish to make our meal, or we can catch several small fish that, when cooked and placed on a plate, make the same size meal as one large fish. With selection tests, we try for one or two tests that will correlate with performance at a high level. Unfortunately, such big correlations are as hard to get as it is to catch a fish large enough to feed the entire family. But by combining several tests with smaller validities, we can predict performance as well as we could by using one test with a very high validity.
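For the special case the text mentions, in which the two predictors are uncorrelated with each other, the combined multiple correlation can be computed directly as R = √(r₁² + r₂²). A minimal sketch using the validities above:

```python
# Combining two uncorrelated predictors: R^2 is the sum of the
# individual squared validities, so R is the square root of that sum.
from math import sqrt

r_personality = 0.13   # personality inventory vs. job performance
r_cognitive   = 0.15   # cognitive ability test vs. job performance

R = sqrt(r_personality**2 + r_cognitive**2)
print(round(R, 2))  # -> 0.2: two small validities combine into a larger one
```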

REMOVING UNNECESSARY VARIABLES

One of the nice things about multiple regression is that in addition to combining small correlations, it also tells us if we have too many variables measuring the same thing. That is, suppose our selection battery contains a personality inventory and an unstructured interview. The personality inventory correlates .25 with performance and the unstructured interview correlates .20 with performance. We start to get excited because if we add the two together, we would have a multiple correlation (R) of .45. However, after entering our data into a regression analysis, we find that our equation "threw out" the interview because it was measuring the same thing our personality test was measuring: social skills and extroversion. So, even though we thought we were measuring two different constructs, our test and interview were actually measuring the same thing (they were highly correlated).
A few years ago, we were asked by a clinical psychologist to validate the test battery he was using to select police officers. As we reviewed his battery, we were stunned to see that every applicant was administered three different measures of cognitive ability and three different personality inventories. When we asked the psychologist why he used so many similar tests, he said that he "got something different from each one of them." However, since the three cognitive ability measures were highly correlated, as were the three personality inventories (he scores them pass/fail), we doubted that the extra tests provided any new information.

To test this idea, we entered the test scores into two separate regression equations: one to predict his overall ratings of "suitability" and one to predict supervisor ratings of on-the-job performance. As expected, one personality inventory and one cognitive ability test predicted his suitability ratings; the other tests did not help predict his ratings (that is, the other tests did not account for unique variance). We were unsuccessful in explaining the results to him in terms of statistics, so we finally said, "What the results show is that you can make the same decisions with two tests as you would with six. The difference is that you will save about $100 in testing costs per applicant." That he understood!
The moral of this story is that when selecting employees, the test battery should not contain several measures of the same knowledge, skill, or ability. If we go back to our example of making a meal, once we catch enough fish (a cognitive ability test), there is no need to catch more (more cognitive ability tests). Instead, to make the perfect meal, we should add a salad (personality inventory), some bread (structured interview), and dessert (integrity test). Too much of the same thing makes a boring meal and a wasteful selection battery.

Conducting a Regression Analysis

To conduct a regression analysis, a computer program such as SPSS, SAS, or Excel is typically used. Though each of these programs uses slightly different commands, the results are almost identical. To run a regression analysis, a researcher tells the computer which variables are the predictors (the independent variables) and which variables are the ones to be predicted (the dependent variables). For example,

A personnel analyst might be interested in seeing how interview scores and scores on five personality dimensions (the predictors) predict supervisor ratings of on-the-job performance (the dependent variable).

A compensation analyst might want to see how performance ratings, years in the organization, and education level (the predictors) are related to salary (the dependent variable).

A university might want to find the best way to use SAT scores and high school grade point averages (the predictors) to predict the GPA students will earn during their freshman year (the dependent variable).
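Under the hood, programs such as SPSS, SAS, and Excel all do the same thing with the predictors and the dependent variable: they solve the least-squares "normal equations" for the intercept and weights. The Python sketch below (our illustration, not the book's; the data are made up so that the true weights are known) shows the idea for two predictors.

```python
# A minimal least-squares fit: solve (X'X)b = X'y for the intercept and weights.

def solve(A, b):
    # Gauss-Jordan elimination with partial pivoting for a small linear system.
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def regress(X, y):
    # Return [intercept, b1, b2, ...] for predictor rows X and criterion y.
    rows = [[1.0] + list(x) for x in X]   # prepend an intercept column of 1s
    k = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(XtX, Xty)

# Made-up data built so that y = 1 + 2*x1 + 0.5*x2 exactly:
X = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]
y = [4.0, 5.5, 9.0, 10.5, 14.0]
coefs = regress(X, y)
print([round(c, 3) for c in coefs])  # → [1.0, 2.0, 0.5]
```

Because the toy data fit the line exactly, the recovered intercept and weights match the values used to build the data; with real, noisy data the same machinery returns the best-fitting (least-squares) weights instead.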

TYPES OF REGRESSION ANALYSES

There are two main ways to enter the predictors into the regression analysis: stepwise and hierarchical. With a stepwise regression analysis, the computer takes the best predictor of the dependent variable and enters it into the equation first. The computer then enters the second best predictor, then the third best, and so on. The computer stops entering variables when there are either no variables left to enter or the remaining variables do not add a significant amount of prediction. Stepwise regression is the most commonly used method in employee selection.

How does the computer program know which variable is best at each step? The first variable entered into the regression equation is the one with the highest correlation with the dependent variable. The next one entered is determined by two things: how well it is related to the dependent variable and how highly it is correlated with the variable already entered into the equation. As an example, look at the correlations in Table 5.1 that show the relationships between grades in graduate school and GRE scores, undergraduate grades, and reference letters.

In a stepwise regression, GRE scores would be the first predictor entered into the equation, because they have the highest correlation with the dependent variable (graduate GPA). Although undergraduate grades have the next highest correlation (r = .25), references (r = .20) would actually be entered next: undergraduate grades are so highly correlated (r = .80) with GRE scores that they would not add much unique or incremental prediction. References, however, are not at all correlated with GRE scores (r = .00) and thus will add incremental prediction.

Table 5.1
Correlations with Graduate GPA

                     Grad GPA    GRE    UG GPA    References
Graduate GPA           1.00      .30      .25        .20
GRE                             1.00      .80        .00
Undergraduate GPA                        1.00        .10
References                                          1.00
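The stepwise logic can be checked with the standard two-predictor formula for multiple R. The Python sketch below (our illustration) uses the Table 5.1 correlations to show why references beat undergraduate GPA for the second step.

```python
import math

def multiple_r(ry1, ry2, r12):
    # Multiple R for two predictors, from their validities (ry1, ry2)
    # and their intercorrelation (r12).
    r_squared = (ry1**2 + ry2**2 - 2 * ry1 * ry2 * r12) / (1 - r12**2)
    return math.sqrt(r_squared)

# Correlations from Table 5.1:
gre_plus_ugpa = multiple_r(0.30, 0.25, 0.80)  # UG GPA overlaps heavily with GRE
gre_plus_refs = multiple_r(0.30, 0.20, 0.00)  # references are independent of GRE
print(round(gre_plus_ugpa, 2), round(gre_plus_refs, 2))  # → 0.3 0.36
```

Adding undergraduate GPA to GRE leaves R essentially at GRE's own .30, while adding the weaker but non-redundant references raises R to about .36, which is exactly why the stepwise procedure enters references second.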
With a hierarchical regression analysis, the researcher tells the computer program the order in which to enter the predictors. There are many times when a researcher might want to dictate the order. For example, a police department has developed a structured interview and intends to use it as the main method of selecting new officers. The department is considering adding a cognitive ability test to its selection battery and wants to know if the cognitive ability test will increase the prediction accuracy above that already provided by the interview. In such a case, the department would first enter interview scores into the regression analysis and then the cognitive ability scores. If the cognitive ability scores are statistically significant (i.e., provide incremental validity), the department would use both the structured interview and the cognitive ability test. If the addition of the cognitive ability test is not significant, the department would use only the structured interview.

Another reason to use hierarchical regression is to reduce adverse impact. For example, suppose that a police department plans to use a structured interview and a cognitive ability test to select new officers. The cognitive ability test correlates .30 with performance, and the structured interview correlates .20 with performance. The cognitive ability test is moderately correlated with the structured interview. When used together in a stepwise regression, the interview and the test correlate .35 with police performance.

Though the department is happy with the validity of the two tests, it is concerned because the selection battery has adverse impact against African Americans. After analyzing the data further, the department finds that the adverse impact is due to the cognitive ability test: African Americans and Whites score equally well on the structured interview.

Because of the adverse impact, the department considers dropping the cognitive ability test but doesn't want to do that because the test is valid. In this case, hierarchical regression might be a partial solution. By entering the structured interview into the equation first, it will carry more weight than it would in a stepwise regression equation. By increasing the weight given to the predictor with no adverse impact, the adverse impact of the entire selection procedure will be reduced (but not eliminated).

Hierarchical regression is also commonly used in salary equity analysis. For example, suppose that a school system discovers that the average salary for its 30 female janitors is $17,232, compared to $20,400 for its 50 male janitors. The Department of Labor thinks this difference is due to discrimination. The school system, however, thinks that the difference in average salary is due to the number of years the employees have been with the school system and to their performance ratings rather than to sex discrimination.

To test its idea, the school system would first enter the employees' tenure and performance rating data into the analysis and then enter the employees' gender (coded as 0 for males, 1 for females). If gender did not enter the equation as a significant predictor of salary, it could be said that the salary differences were indeed due to tenure and performance. If gender entered the equation as a significant predictor, two interpretations could be made. The first is that the school system is discriminating. The second is that there are other unknown variables (e.g., education level) that could explain the salary difference if they were entered into the regression equation.
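The test behind each hierarchical step is an F test on the change in R-squared when the new block of predictors is added. The Python sketch below (our illustration; the R-squared values and sample size are hypothetical, not from the janitor example) shows the standard formula.

```python
def f_for_r2_change(r2_reduced, r2_full, n, k_full, k_added):
    # F statistic for the increment in R-squared when k_added predictors
    # are entered after the reduced model (n observations, k_full total
    # predictors in the full model).
    numerator = (r2_full - r2_reduced) / k_added
    denominator = (1 - r2_full) / (n - k_full - 1)
    return numerator / denominator

# Hypothetical numbers: tenure + performance explain 40% of salary variance;
# adding gender as a third predictor raises that to 42% with n = 80.
F = f_for_r2_change(0.40, 0.42, n=80, k_full=3, k_added=1)
print(round(F, 2))  # → 2.62
```

With these hypothetical numbers, the F of about 2.62 falls short of the .05 critical value for F(1, 76) (roughly 3.97), so gender would not be a significant predictor once tenure and performance are already in the equation.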

CONSIDERATIONS IN RUNNING A REGRESSION ANALYSIS

For a regression analysis to be accurate, three factors should be considered: the number of subjects, the variables in the regression model, and missing data.

Number of Subjects

Though computer programs will allow you to run a regression analysis with data from two or more subjects, the results are not as accurate (stable) when a small number of subjects (e.g., applicants, employees) is used. The question of how many subjects are needed to reliably run a regression analysis is a difficult one to answer and depends in part on the purpose of the regression analysis. If the purpose of the regression analysis is to explain what is happening in a series of data (e.g., Are men and women at a particular company being paid equitably? What factors are associated with absenteeism on the night shift?), fewer subjects are needed than if the purpose is to predict the behavior of people not in the sample (e.g., using employee data from 1999-2005 to predict how future employees will perform).
So, how many subjects do you need? The answer depends on who you ask. Some statisticians argue that there is a minimum number of subjects that must be present to conduct a regression analysis (e.g., 50), some argue that the key is the ratio of subjects to the number of variables (e.g., 10 subjects for every predictor), and others argue that both a minimum number and the subjects-to-variables ratio are important (e.g., a minimum of 30 subjects and at least 10 subjects per predictor). In general, regression used to explain data can be comfortably used when you have data from 50 or more people. Regressions can be run with data from fewer people, but caution should be used when interpreting the results.

Variables in the Regression Model

Model Specification. One of the assumptions in regression is that all relevant variables are included in the model and no irrelevant variables are included. For example, suppose that you were trying to predict graduate GPA and theorized that grades in graduate school are the result of both cognitive ability and motivation. If your regression equation only included GRE scores (a measure of cognitive ability) but no measure of motivation (e.g., undergraduate GPA, letters of recommendation, personal statement), you would have a model specification error. Thus, it is important to first theorize or brainstorm the relevant variables and then make a strong effort to include them in your regression.

Singularity and Multicollinearity. There are times when two variables in a regression equation are either perfectly correlated (singularity) or so highly correlated (multicollinearity) with each other that problems in the regression equation occur. An example of singularity would be if you were trying to predict employee salary and your variables included years with the company and total years of experience. Normally, these values for most employees would be different. That is, an employee might have 5 years with the company and 10 years of total experience (5 with the company and 5 with other companies). In the case of an entry-level job, however, it might be that the number of years with the company is the same as the total years of experience. In such a case, one of the two variables would need to be removed for the regression equation to run.

With a case of singularity, the regression equation will not even run. With multicollinearity, however, the equation will run, but the regression weights will not be accurate. These inaccurate regression weights might result in an important variable appearing to be unrelated to the dependent variable or appearing to be related in a negative direction even though the actual relationship is positive.

Though there is some disagreement among statisticians, two variables need to be correlated at least .90, and probably higher, for multicollinearity to be a concern. If two variables do correlate that highly, the easiest solution is to simply remove one of the variables from the regression analysis.

Missing Data

Suppose you have 100 employees in your organization and want to see how well the employees' cognitive ability, education level, and interview scores predict their supervisors' ratings of their performance. Of the 100 employees, you have interview scores for all of them, education levels for 95 employees, and cognitive ability scores for 70 employees. If you ran a regression, only the 70 employees with all three scores could be included in the analysis.

There are three common ways to handle missing data. The easiest is to remove the employees with missing data and run the regression only with the employees who have no missing data. Another approach is to remove the variables that have missing data and run the regression using only those variables for which data are not missing. The third approach is to substitute the mean value of the variable for the missing data. That is, if the mean in the company is 5 years of prior experience, and experience data are missing for Joe and Sue, we would give both Joe and Sue a value of 5 years.

The choice among the three methods is largely dependent on the percentage of data that are missing. For example, if you have data from 100 employees and data are missing on a variable for 2 of the 100 employees, the best solution would probably be to remove the employees with the missing data. If, however, data on a variable are missing from half of the employees, it would make more sense to remove the variable from the analysis.
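The first and third approaches are easy to sketch in code. Below is a small Python illustration (ours, not the book's; the names and values are hypothetical) of listwise deletion and mean substitution, with missing values marked as None.

```python
# Hypothetical years-of-experience data; None marks a missing value.
scores = {"Joe": None, "Sue": None, "Ann": 4, "Bob": 6, "Kim": 5}

# Approach 1 (listwise deletion): keep only employees with complete data.
complete = {name: s for name, s in scores.items() if s is not None}

# Approach 3 (mean substitution): give Joe and Sue the mean of observed values.
mean = sum(complete.values()) / len(complete)
imputed = {name: (s if s is not None else mean) for name, s in scores.items()}

print(complete)        # only Ann, Bob, and Kim remain
print(imputed["Joe"])  # → 5.0 (the mean of 4, 6, and 5)
```

Listwise deletion shrinks the sample (here from five employees to three), while mean substitution keeps everyone but flattens the variability of the imputed variable, which is one reason the choice depends on how much data are missing.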

Interpreting Regression Results

Most statistical programs such as SAS and SPSS produce output with similar information and tables. The output can best be interpreted by breaking it into three sets of information: effectiveness of the regression analysis, analysis of variance results, and information about the independent variables.

To explain how to interpret these three sets of information, we will use two examples. One of the examples comes from a company that wanted to determine the extent to which three factors (years with the company, having a bachelor's degree, and performance ratings) explained how much its employees were being paid. The other example is a regression analysis used at Radford University to select I/O psychology graduate students using the students' undergraduate GPAs, scores on the Graduate Record Exam (GRE), and reference ratings provided by faculty members.

Table 5.2
Regression statistics

                               Example
                   Salary Study   Graduate School Admissions
Observations            33                 232
Multiple R             0.53                0.47
R-square               0.28                0.22
Adjusted R2            0.21                0.21
Standard error       $2,160                0.39

EFFECTIVENESS OF THE REGRESSION ANALYSIS

The first set of information that is reported in most journal articles or technical reports summarizes the overall effectiveness of the regression analysis. As shown in Table 5.2, five pieces of information are usually provided: number of observations, multiple R, R-square, adjusted R2, and the standard error of the estimate.

Observations

Observations are the number of employees included in the analysis. As mentioned previously, the greater the number of observations, the more accurate and stable the results of the regression. In the salary study, our regression analysis is based on data from only 33 employees, whereas in our graduate admissions study, our regression analysis is based on data from 232 students.

Multiple R

The multiple R is the correlation between the combination of the independent variables and the dependent variable. From Table 5.2, you can see that for the salary study, the combination of time-in-company, bachelor's degree, and performance ratings correlates .53 with employee salaries. For the graduate admissions study, the combination of undergraduate GPA, GRE scores, and reference ratings correlates .47 with the GPA students obtained in graduate school.

R-Square (R2)

R-square (R2) is the percentage of individual differences in the dependent variable that the regression model explains. As shown in Table 5.2, the combination of time-in-company, bachelor's degree, and performance ratings accounts for 28% of the individual differences in employee pay (.53 x .53). We are not sure what accounts for the additional 72%. For the graduate admissions study, the combination of undergraduate GPA, GRE scores, and reference ratings accounts for 22% of the variability in graduate school grades (.47 x .47).

Adjusted R2

Regression is most accurate with large sample sizes. The adjusted R-square corrects for estimated errors caused by small sample sizes. The larger the sample size, the smaller the difference between the R2 and the adjusted R2. In Table 5.2, the R2 of .28 for the salary study was adjusted downward to .21, and the R2 of .22 for the graduate admissions study was adjusted downward to .21. Notice that because the graduate admissions study has a much larger sample size (232) than does the salary study (33), its adjusted R2 did not decline as much.
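The standard small-sample correction reproduces the Table 5.2 values. Here is a short Python sketch (ours) of the usual adjusted R-square formula, applied to both studies (each used three predictors).

```python
def adjusted_r2(r2, n, k):
    # The common adjusted R-square formula for n observations and k predictors:
    # 1 - (1 - R^2) * (n - 1) / (n - k - 1)
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Reproducing Table 5.2 (both regressions have k = 3 predictors):
salary_adj = adjusted_r2(0.28, n=33, k=3)
admissions_adj = adjusted_r2(0.22, n=232, k=3)
print(round(salary_adj, 2), round(admissions_adj, 2))  # → 0.21 0.21
```

With n = 33, the salary study's R2 drops from .28 to about .21, while the much larger admissions sample (n = 232) barely shrinks at all, which is the pattern the text describes.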

Standard Error of the Estimate

As mentioned earlier in the chapter, regression can be used to make predictions. In the salary study, the goal was to predict or estimate what an employee's salary should be given his or her years in the company, education, and performance. Because estimates made from a regression equation are just that, estimates, most regression output includes the standard error of the estimate. The greater the R2 and the sample size, the smaller the standard error of the estimate.

As shown in Table 5.2, for the salary study, the standard error of $2,160 indicates that 68% of the errors in estimating what an employee's salary should be (that is, errors within one standard error) will fall within $2,160. Stated another way, if we estimate that an employee should be paid $45,000, we are 68% confident that her salary should be between $42,840 ($45,000 - $2,160) and $47,160 ($45,000 + $2,160). So, if we estimate that an employee should make $45,000 and she is actually making $44,000, we would probably not be concerned that the employee is underpaid, because her actual salary ($44,000) falls within one standard error ($42,840-$47,160) of the estimated salary. Though our example used one standard error, most HR professionals use a criterion of two standard errors.

For the graduate admissions study, the standard error of the estimate was .39, indicating that if we predict that a student will earn a graduate GPA of 3.6, we would expect that 68% of the time the actual graduate GPA will be between 3.21 (3.60 - 0.39) and 3.99 (3.60 + 0.39).
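The interval arithmetic is simple enough to put in a few lines. This Python sketch (ours) builds the one- and two-standard-error bands for the salary example above.

```python
def band(estimate, standard_error, k=1):
    # Interval of k standard errors around a regression estimate.
    return (estimate - k * standard_error, estimate + k * standard_error)

# The salary example: a predicted salary of $45,000 with SE = $2,160.
low, high = band(45_000, 2_160)
print(low, high)  # → 42840 47160

# Many HR professionals use two standard errors instead:
print(band(45_000, 2_160, k=2))  # → (40680, 49320)
```

An actual salary of $44,000 falls inside even the tighter one-standard-error band, so it would not raise a flag; the two-standard-error criterion is simply a wider, more conservative band.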

ANALYSIS OF VARIANCE RESULTS

The next section of the output depicts the results of the analysis of variance (ANOVA) that tests the statistical significance of the R2. The key part of this section is the significance level (the p < column). If the significance level is less than or equal to .05, the R2 discussed in the previous section is considered statistically significant. In the output shown in Table 5.3, the significance level of .02 indicates that the R2 of .28 from the salary study is statistically significant.

Table 5.3
ANOVA table for the salary study regression analysis

Source        df   Sum of Squares    Mean Square       F      p <
Regression     3    53,164,088.63    17,721,362.88    3.80    0.02
Residual      29   135,373,606.01     4,668,055.38
Total         32   188,537,694.64

Table 5.4
ANOVA table for the graduate admissions regression

Source        df   Sum of Squares    Mean Square       F      p <
Regression     3         9.97             3.32        21.96   0.000
Residual     228        34.50             0.15
Total        231        44.47

For the graduate admissions study ANOVA in Table 5.4, notice that the probability value is 0.000. This also indicates that the R2 is statistically significant. Most programs round probability levels to two or three decimal places. As a result, if you see a probability value of .00 or .000, it means that the probability that the results occurred by chance is lower than 1 in a hundred for the .00 figure and 1 in a thousand for the .000 figure, both of which are statistically significant.

Although the other numbers in the tables are only important because they provide the data necessary to get the significance level, in case you are interested, here is what the other columns mean.

Degrees of Freedom

The df column indicates the degrees of freedom. The regression degrees of freedom are the number of independent variables in the analysis. In both examples, we have three independent variables: years with the company, education, and performance ratings for the salary study, and undergraduate GPA, GRE scores, and reference ratings for the graduate admissions study. Thus, for both studies, we have three regression degrees of freedom. The total degrees of freedom are the number of observations minus one. In the salary study, because we have 33 observations, we have 32 total degrees of freedom. In the graduate admissions study, because we have 232 observations, we have 231 total degrees of freedom. The residual degrees of freedom are simply the total degrees of freedom minus the regression degrees of freedom.

Sum of Squares, Mean Square, and F

The sums of squares and the mean squares are used to compute the F value that you learned about in Chapter 3. The mean square is computed by dividing the sum of squares by the degrees of freedom, and the F value is computed by dividing the regression mean square by the residual (or error) mean square.
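Those two divisions are all there is to it. The Python sketch below (ours) rebuilds the mean squares and the F value in Table 5.3 from its sums of squares and degrees of freedom.

```python
# Sums of squares and degrees of freedom from Table 5.3 (salary study):
ss_regression, df_regression = 53_164_088.63, 3
ss_residual, df_residual = 135_373_606.01, 29

ms_regression = ss_regression / df_regression  # mean square = SS / df
ms_residual = ss_residual / df_residual
F = ms_regression / ms_residual                # F = MS_regression / MS_residual

print(round(ms_regression, 2))  # → 17721362.88
print(round(F, 2))              # → 3.8
```

The computed values match the Table 5.3 entries, and the F of 3.80 on (3, 29) degrees of freedom is what yields the p < .02 significance level reported there.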

INFORMATION ABOUT THE INDEPENDENT VARIABLES

The final section of the output contains information about each of the independent variables included in the regression analysis. The first key value for each variable is the p-value, which is the significance level. If this value is less than or equal to .05, the variable explains a statistically significant percentage of the individual differences in your dependent variable. In the salary study example shown in Table 5.5, time in company (tenure), with a p-value of .00, is statistically significant, but having a bachelor's degree (p < .13) and performance ratings (p < .89) are not.

When looking at the significance levels for the independent variables, three patterns can emerge: all variables will be statistically significant, none of the variables will be statistically significant, or some, but not all, of the variables will be statistically significant. If none of the variables are significant, you can't use the regression model to understand current behavior or to predict/estimate future/desired behavior.

In the graduate admissions study shown in Table 5.6, both GPA (p < .000) and GRE scores (p < .000) are statistically significant, but reference ratings (p < .118) are not.
Table 5.5
Information about the independent variables in the salary study

Variable       Coefficient   Standard Error   t-value   p-value   Beta
Intercept       $19,756.00        $668.97      29.53      0.00    0.00
Tenure             $318.82        $103.69       3.07      0.00    0.74
Degree             $983.32        $625.57       1.57      0.13    0.39
Performance         $96.58        $693.77       0.14      0.89    0.02

Table 5.6
Information about the independent variables in the graduate admissions study

Variable       Coefficient   Standard Error   t-value   p-value   Beta
Intercept          1.13           .321          3.52      .001
GPA                .383           .076          5.03      .000    .337
GRE                .001           .000          3.59      .000    .211
References         .136           .087          1.57      .118    .105

The second part of the output that provides useful information is the coefficient column. For each variable, the coefficient indicates the amount of change in the dependent variable for each unit of change in the independent variable. For example, in Table 5.5, the coefficient of $318.82 for tenure indicates that for every year the employee has been with the company, his or her salary would be expected to be $318.82 above the intercept coefficient of $19,756. So, if Marcus has been with the company for 10 years, we would estimate that his salary should be $22,944.20 [$19,756 + (10 x $318.82)], and if Mary has been with the company for 5 years, we would estimate that her salary should be $21,350.10 [$19,756 + (5 x $318.82)].

For the graduate admissions example, the coefficient of .383 indicates that for every full point of undergraduate GPA, we would expect an increase in graduate GPA of .383. That is, the expected difference in graduate GPAs between an applicant with an undergraduate GPA of 3.00 and one with an undergraduate GPA of 4.00 would be .383.
Notice in Tables 5.5 and 5.6 that there is a column with the heading Beta. Though Beta is not commonly used to interpret regression results, it is the standardized regression coefficient. The further the Beta value is from zero, the stronger the relationship between the independent variable and the dependent variable.

THE REGRESSION EQUATION

From our salary study and graduate admissions examples above, we know we can predict employee salaries and graduate GPAs. To use our regression results to make specific predictions, we need to create a regression equation. In its simplest form, the results of a regression analysis yield a regression equation that looks something like:

Y = c + (b1)(x1)

where Y is the predicted value of some variable, c is a constant (in algebra, we would call this the intercept, which represents the predicted score on the criterion if the score on the predictor were zero), b1 is the weight we give our predictor (in algebra, we would call this the slope, which represents the amount of change we would expect in the criterion for each unit of change in the predictor), and x1 is the score on a predictor. Though the constant and the weight can be calculated by hand, we normally let the computer do the work by using a program such as SAS, SPSS, or Excel.

To use our regression results from Table 5.6, our regression equation to predict graduate school grades would be:

Predicted grad GPA = 1.13 + (.383)(UG GPA) + (.001)(GRE) + (.136)(reference score)

In the equation:

1.13 is the constant (intercept)
.383 is the weight that is multiplied by the undergraduate GPA
.001 is the weight that is multiplied by the GRE score
.136 is the weight that is multiplied by the reference rating (the reference rating is on a 1-4 scale, with a 4 being excellent and a 1 being below average).

Let's use two hypothetical students as an example. Jenny Craig has a GRE score of 1000, an undergraduate GPA of 3.60, and a reference rating score of 3.0. Richard Simmons has a GRE score of 900, an undergraduate GPA of 3.00, and a reference rating score of 2.0. The formulas to predict the students' graduate GPAs would be:

Jenny's GPA   = 1.13 + (.383)(3.60) + (.001)(1000) + (.136)(3.0)
              = 1.13 + 1.38 + 1.00 + 0.41
              = 3.92

Richard's GPA = 1.13 + (.383)(3.00) + (.001)(900) + (.136)(2.0)
              = 1.13 + 1.15 + 0.90 + 0.27
              = 3.45
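The same arithmetic is easy to wrap in a function so the equation can be applied to any applicant. Here is a short Python sketch (ours) of the Table 5.6 equation applied to the two hypothetical students.

```python
def predicted_grad_gpa(ug_gpa, gre, reference):
    # The regression equation built from the Table 5.6 coefficients.
    return 1.13 + 0.383 * ug_gpa + 0.001 * gre + 0.136 * reference

jenny = predicted_grad_gpa(3.60, 1000, 3.0)
richard = predicted_grad_gpa(3.00, 900, 2.0)
print(round(jenny, 2), round(richard, 2))  # → 3.92 3.45
```

The function reproduces the hand-worked predictions above, and a selection rule (such as the 3.60 cutoff described next) is then just a comparison against its output.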

At Radford University, we typically have about 80 students apply for our 12 openings, so we only accept students whose predicted graduate GPA is at least a 3.60. Using the data from the above example, we would accept Jenny and her 3.92 predicted GPA and reject Richard and his 3.45 predicted GPA.

Final Thought

When running a regression or reading about a regression analysis in a journal or technical report, you can use the questions in Box 5.1 to evaluate the analysis. If the answer to any of the questions is no, there may be a problem with the analysis.

Box 5.1
Checklist for evaluating journal articles and technical reports using regression

Assessment Questions
  Is the sample size large enough to handle the number of variables in the regression?
  Are all relevant variables included in the regression?
  Is the model free of irrelevant variables?
  Is the hypothesized relationship between the independent and dependent variables linear?
  If the hypothesized relationship is not linear, did the researcher test for curvilinear relationships?
  Did the researcher search for and remove outliers?
  Are the correlations among the independent variables less than .90?

Applying Your Knowledge

You can apply what you have learned in this chapter by reading the following articles in Applied H.R.M. Research (www.xavier.edu/appliedhrmresearch) and in Public Personnel Management.

Befort, N., & Hattrup, K. (2003). Valuing task and contextual performance: Experience, job roles, and ratings of the importance of job behaviors. Applied H.R.M. Research, 8(1), 17-32.
Hausdorf, P. A., & Girard, M. J. L. (2013). Predicting clerical training performance in the Canadian forces: A comparison of cognitive ability, education and work experience. Applied H.R.M. Research, 13(1), 69-72.
Huang, I., Chuang, C. J., & Lin, H. (2003). The role of burnout in the relationship between perceptions of organizational politics and turnover intentions. Public Personnel Management, 32(4), 519-530.
Raynes, B. L. (2001). Predicting difficult employees: The relationship between vocational interests, self-esteem, and problem communication styles. Applied H.R.M. Research, 6(1), 33-66.
Roberts, G. E. (2003). Municipal government part-time employee benefits practices. Public Personnel Management, 32(3), 435-454.
6. Meta-Analysis
_______________________________

IN THE OLD DAYS (prior to 1980), research was reviewed by reading all of the articles on a topic and then drawing a conclusion. For example, suppose that a personnel analyst was asked to review the literature to see if education was related to police performance. The analyst would find every article on the topic, perhaps count the articles that showed significant results, and then reach a conclusion such as, "Given that eight articles showed significant results and nine did not, we must conclude that education is not related to police performance."

Problems with Traditional Literature Reviews

Unfortunately, there are three common situations in which such a conclusion might be inaccurate: small but consistent relationships, moderate relationships but small sample sizes, and large differences in sample sizes across studies.

SMALL BUT CONSISTENT CORRELATIONS

Suppose that you find four studies investigating the relationship between education and police performance, but none of the four studies reported a significant relationship between the two variables. With a traditional review, you would probably conclude that education is not a significant predictor of police performance. However, it might be that the actual relationship between education and performance is relatively small, and a large number of subjects would have been needed in each study to detect this small relationship. Take for example the studies shown in Table 6.1. You have four studies, each with samples of 50 officers. The correlations between education level and performance in the four studies are .20, .17, .19, and .16. Though the size of the coefficients is consistent across the four studies, none of the correlations by itself is statistically significant due to the combination of small correlations and small sample sizes in each study. If all four studies are combined in a meta-analysis, however, we find that the average correlation is .18, and with a sample size of 200, the correlation would be statistically significant.
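The arithmetic behind that conclusion can be sketched in a few lines of Python (our illustration; a full Hunter-Schmidt meta-analysis also corrects for artifacts such as unreliability and range restriction).

```python
import math

def mean_r(correlations, ns):
    # Sample-size-weighted average correlation: a bare-bones meta-analytic mean.
    return sum(r * n for r, n in zip(correlations, ns)) / sum(ns)

def t_for_r(r, n):
    # t statistic for testing a correlation against zero.
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# The four education studies described above, each with n = 50:
rs, ns = [0.20, 0.17, 0.19, 0.16], [50, 50, 50, 50]
rbar, total_n = mean_r(rs, ns), sum(ns)
print(round(rbar, 2), round(t_for_r(rbar, total_n), 2))  # → 0.18 2.57
```

No single study's r of .16-.20 is significant with n = 50, but the pooled r of .18 with n = 200 gives a t of about 2.57, which exceeds the .05 critical value of roughly 1.97, so the combined result is statistically significant.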

MODERATE RELATIONSHIPS AND SMALL SAMPLE SIZES

A second situation in which traditional literature reviews often draw incorrect conclusions occurs when the correlations in the previous studies are moderate or high, but the sample sizes were too low for the relationship to be statistically significant. Take for example the four studies shown in Table 6.2. Each of the correlations is at what we would consider a high level, yet the correlations would not be statistically significant due to the small sample sizes in each study. If we combined the four studies, however, we would get an average correlation of .41; with a total sample size of 80, this would be statistically significant.

LARGE DIFFERENCES IN SAMPLE SIZES ACROSS STUDIES

Another reason that we might incorrectly conclude that education does not predict performance is that differences in correlations across studies may be due to large differences in sample sizes across studies. As you can see in the example shown in Table 6.3, the reason our traditional review would find mixed results is that the two studies showing a low correlation between education and performance had very small sample sizes. Thus, what seem to be huge differences in validity are actually differences due to sampling error caused by small sample sizes.

To better understand sampling error, imagine that you have a bowl containing three red balls, three white balls, and three blue balls. You are asked to close your eyes and pick three balls from the bowl. Because there are equal numbers of red, white, and blue balls in the bowl, you would expect to draw one of each color. However, in any given draw from the bowl, it is unlikely that you will get one of each color. If you have no life and draw three balls at a time for ten hours, you might get three red balls on some draws, no white balls on other draws, and three white balls on still other draws. Thus, even though we know there are an equal number of each color of ball, any one draw may or may not represent what we know is "the truth." However, over the 10 hours you are drawing balls, the most common draw will be one of each color, a finding consistent with what we know is in the bowl.
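The bowl can be simulated directly. The Python sketch below (ours) repeats the three-ball draw many times; individual draws vary wildly, but one of each color is still the single most frequent result.

```python
import random
from collections import Counter

# Simulating the bowl: three red, three white, and three blue balls.
# Seeded so the run is repeatable.
random.seed(1)
bowl = ["red"] * 3 + ["white"] * 3 + ["blue"] * 3

draws = [tuple(sorted(random.sample(bowl, 3))) for _ in range(10_000)]
counts = Counter(draws)

most_common_draw, freq = counts.most_common(1)[0]
print(most_common_draw, freq / len(draws))
# one of each color is the most frequent draw (about 32% of draws)
```

Any single draw can look unrepresentative (all red, no white), which is exactly the sampling error that makes individual small-sample studies disagree; only across many draws does the "truth" in the bowl show through.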
The same is true in research. Suppose we know the true correlation between education level and performance. One agency might report a correlation well below that value, another agency might report a correlation of .50, and yet another agency might report a correlation of .30. If all three studies had small samples, the differences among the studies and the differences from the "truth" might be due purely to sampling error. This is where meta-analysis saves the day.
Meta-analysis is a statistical method for combining research results. Since the first meta-analysis was published by Gene Glass in 1976, the number of published meta-analyses has increased tremendously, and the methodology has become increasingly complex. The meta-analysis pioneers were Frank Schmidt and John Hunter, and almost every meta-analysis uses the methods they suggested in their 1990 book Methods of Meta-Analysis and clarified in the book Conducting Meta-Analysis Using SAS by Winfred Arthur, Winston Bennett, and Allen Huffcutt.
Though meta-analyses will vary somewhat in their methods and their purpose, most meta-analyses involving personnel selection issues try to answer three questions:

1. What is the mean validity coefficient found in the literature for a given predictor (e.g., interviews, assessment centers, cognitive ability)?

2. If we had a perfect measure of the predictor (e.g., intelligence, computer knowledge), a perfect measure of performance, and no restriction in range, what would be the "true correlation" between our construct and performance?

3. Can we generalize the meta-analysis results to every agency (validity generalization), or is our construct a better predictor of performance in some situations than in others (e.g., large vs. small departments, police departments vs. sheriff's offices)?

Conducting a Meta-Analysis

FINDING STUDIES

The first step in a meta-analysis is to locate studies on the topic of interest. It is common to use both an "active search" and a "passive search." An active search tries to identify every research study within a given parameter. For example, a meta-analyst might concentrate her active search on journal articles and dissertations published between 1970 and 2001 and referenced in one of three computerized literature databases (PsycInfo, InfoTrac, Dissertation Abstracts International) or referenced in an article found during the computer search. A passive search might include queries to professionals known to be experts in the area, papers presented at conferences, or technical reports known to the author.

The major difference between an active and a passive search is that the goal of an active search is to include every relevant study within the given parameters, whereas the goal of the passive search is to find other relevant research without any thought that every study on the topic was found. Though this may not seem much of a difference, it is. These days, there are so many potential sources for research (thousands of journals, conference presentations, theses, dissertations, technical reports, and unpublished research articles) that relevant studies are going to be missed. Thus, the credibility of a meta-analysis hinges on the scope and inclusion accuracy of its active search.

CHOOSING STUDIES TO INCLUDE IN THE META-ANALYSIS

Once all of the relevant studies on a topic have been located, the next step is to determine which of these studies will be included in the meta-analysis. To be included in a meta-analysis, an article must report the results of an empirical investigation and include a correlation coefficient, another statistic (e.g., F, t, chi-square) that could be converted to a correlation coefficient, or tabular data that can be entered into the computer to yield a correlation coefficient (many meta-analyses use Cohen's d rather than a correlation coefficient, but the rules to include an article are the same). Articles that report results without the above statistics (e.g., "we found a significant relationship between education and academy performance" or "we didn't see any real differences between our educated and uneducated officers") cannot be included in a meta-analysis.

Often, meta-analysts will have other rules about keeping studies. For example, in a meta-analysis on employee wellness programs, the researchers' decision to include only studies using both pre- and post-measures of absenteeism as well as experimental and control groups resulted in only three usable studies.

CONVERTING RESEARCH FINDINGS TO CORRELATIONS

Once research articles have been located and the decision is made as to which articles to include, statistical results (e.g., F, t, chi-square) that need to be converted into correlation coefficients are converted using the formulas provided in Rosenthal (1985). In some cases, raw data or data listed in tables can be entered into a statistical program (e.g., SAS, SPSS) to directly determine a correlation coefficient.
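For readers who want to see what such conversions look like, here is a minimal sketch of the widely used formulas for t, single-degree-of-freedom F, and single-degree-of-freedom chi-square (see Rosenthal's book for the full set and the assumptions behind each):

```python
import math

def r_from_t(t, df):
    """Convert a t statistic to a correlation coefficient."""
    return t / math.sqrt(t**2 + df)

def r_from_f(f, df_error):
    """Convert an F statistic (1 df in the numerator) to a correlation."""
    return math.sqrt(f / (f + df_error))

def r_from_chi2(chi2, n):
    """Convert a 1-df chi-square statistic to a correlation (phi)."""
    return math.sqrt(chi2 / n)

# A hypothetical study reporting t(58) = 2.10 yields roughly r = .27
print(round(r_from_t(2.10, 58), 2))  # 0.27
```

Because F = t-squared when the F has one degree of freedom in the numerator, the first two functions give identical answers for the same result.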

CUMULATING VALIDITY COEFFICIENTS

As shown in Table 6.4, after the individual correlation coefficients have been computed, the validity coefficient for each study is weighted by the size of the sample and summed. This procedure ensures that larger studies, presumed to be more accurate, carry more weight than smaller studies. For example, in Table 6.4, the .23 correlation reported by Briscoe is multiplied by the sample size of 150 to get 34.5. This procedure is then done for each of the studies. In addition to the mean validity coefficient, the observed variance, the amount of variance expected due to sampling error, and a 95% confidence interval are calculated (it is beyond the scope of this chapter to discuss these calculations).

CORRECTING FOR ARTIFACTS

When conducting a meta-analysis, it is desirable to adjust correlation coefficients to correct for error associated with predictor unreliability, criterion unreliability, restriction of range, and a host of other artifacts (see Hunter & Schmidt, 1990, for a thorough discussion). These adjustments answer the second question: "If we had a perfect measure of the construct, a perfect measure of performance, and no restriction in range, what would be the true correlation between our construct and performance?"

Table 6.4
Example of Cumulating Validity Coefficients

Study            Correlation   Sample Size   Correlation x Sample Size
Briscoe (1997)       .23           150                34.5
Green (1974)         .10           100                10.0
Curtis (1982)        .42            50                21.0
Logan (1991)         .27           300                81.0
Ceretta (1995)       .01            20                 0.2
Greevy (1989)        .29           200                58.0
TOTAL                              820               204.7

Weighted Average = 204.7 / 820 = .25
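The weighted average in Table 6.4 takes only a few lines to reproduce; this sketch uses the table's own numbers:

```python
# (correlation, sample size) pairs from Table 6.4
studies = [(.23, 150), (.10, 100), (.42, 50),
           (.27, 300), (.01, 20), (.29, 200)]

total_n = sum(n for _, n in studies)
weighted_sum = sum(r * n for r, n in studies)  # each r weighted by its N
mean_r = weighted_sum / total_n
print(round(mean_r, 2))  # 0.25, matching the weighted average in the table
```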

Table 6.5
Correcting Correlations for Test Unreliability

Study            Validity   Test Reliability   Square Root   Corrected Validity
Tinkers (1985)      .30           .92               .96              .31
Evers (1990)        .23           .80               .89              .26
Chance (1995)       .25           .65               .81              .31

These adjustments can be made in one of two ways. The most desirable way is to correct the validity coefficient from each study based on the predictor reliability, criterion reliability, and restriction of range associated with that particular study. A simple example of this process is shown in Table 6.5. To correct the validity coefficients in each study for test unreliability, the validity coefficient is divided by the square root of the reliability coefficient. In the Tinkers (1985) study, the reliability of the test was .92, the square root of .92 is .96, and the corrected validity coefficient is .30 / .96 = .31.
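As a quick sketch, the correction illustrated in Table 6.5 is just a division by the square root of the reliability:

```python
import math

def correct_for_unreliability(r, reliability):
    """Disattenuate a validity coefficient for test unreliability."""
    return r / math.sqrt(reliability)

# Values from Table 6.5 (Tinkers, 1985): r = .30, test reliability = .92
print(round(correct_for_unreliability(.30, .92), 2))  # 0.31
```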
When the necessary information is not available for each study, the mean validity coefficient is corrected rather than each individual coefficient. This is the most common practice. The numbers used to make these corrections come either from the average of information found in the studies that provided reliability or range restriction information or from other meta-analyses. For example, an estimate of the reliability of supervisor ratings of overall performance (r = .52) can be borrowed from the meta-analysis on rating reliability by Viswesvaran, Ones, and Schmidt (1996).

SEARCHING FOR MODERATORS

Being able to generalize meta-analysis findings across all similar organizations and settings (validity generalization) is an important goal of any meta-analysis. It is standard practice in meta-analysis to generalize results when at least 75% of the observed variability in validity coefficients can be attributed to sampling error. When less than 75% can be attributed to sampling error, a search is conducted to find variables that might moderate the size of the validity coefficient. For example, education might predict performance better in larger police departments than in smaller ones.

The idea behind this 75% rule is that due to sampling error, we expect correlations to differ from study to study. The question is, are the differences we observe just sampling error, or do they represent real differences in studies? That is, is the difference between the correlation of .30 found in one study and the correlation of .20 found in another study due to sampling error, or is the difference due to one study being conducted in an urban police department and the other being conducted in a rural department?
To answer this question, there are formulas that tell us how much variability in studies we have in our meta-analysis and how much of that variability would be expected due to sampling error.
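One common version of this calculation can be sketched in a few lines. The code below is a bare-bones illustration of the Hunter-Schmidt comparison (the correlations and sample sizes are made up, and a full analysis would also correct for study artifacts):

```python
def percent_due_to_sampling_error(studies):
    """Compare the observed variance of correlations with the variance
    expected from sampling error alone (Hunter-Schmidt style sketch).
    `studies` is a list of (correlation, sample_size) pairs."""
    total_n = sum(n for _, n in studies)
    k = len(studies)
    mean_r = sum(r * n for r, n in studies) / total_n
    # sample-size-weighted observed variance of the correlations
    var_obs = sum(n * (r - mean_r) ** 2 for r, n in studies) / total_n
    # variance expected from sampling error alone
    mean_n = total_n / k
    var_error = (1 - mean_r ** 2) ** 2 / (mean_n - 1)
    return 100 * var_error / var_obs

studies = [(.45, 50), (.10, 40), (.30, 60)]  # hypothetical studies
pct = percent_due_to_sampling_error(studies)
print(f"{pct:.0f}% of the observed variance is sampling error")
```

In this made-up example the percentage comes out above 75, so by the rule described above we would generalize the findings rather than search for moderators.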

Understanding Meta-Analysis Results

Now that you have an idea about how a meta-analysis is conducted, let's talk about how to understand meta-analysis results that you might find in a published article. In Table 6.6, you will find the partial results of a meta-analysis we conducted on the relationship between cognitive ability and police performance. The numbers in the table represent the validity of cognitive ability in predicting academy grades and supervisor ratings of performance as a police officer.
Table 6.6
Sample meta-analysis results for cognitive ability

                                    95% Confidence       90% Credibility
                                       Interval              Interval
Criterion         K      N      r    Lower   Upper    ρ    Lower   Upper   SE%     Qw

Academy grades   61   14,437   .41    .33     .48    .62    .44     .81    78%   77.82
Supervisor       61   16,231   .16    .12     .20    .27    .14     .40    80%   76.40
 ratings

K = number of studies, N = sample size, r = mean correlation, ρ = mean correlation corrected for range restriction, criterion unreliability, and predictor unreliability, SE% = percentage of variance explained by sampling error and study artifacts
NUMBER OF STUDIES AND SAMPLE SIZE

The "K" column indicates the number of studies included in the meta-analysis, and the "N" column indicates the total number of subjects in the studies. There is not a magical number of studies that we look for, but a meta-analysis with 20 studies is clearly more useful than one with five.

MEAN OBSERVED VALIDITY COEFFICIENT

The "r" column represents the mean validity coefficient across all studies (weighted by the size of the sample). This coefficient answers our question about the typical validity coefficient found in validation studies on the topic of cognitive ability and police performance. On the basis of our meta-analysis, we would conclude that the validity of cognitive ability in predicting academy grades is .41, and the validity of cognitive ability in predicting supervisor ratings of on-the-job performance is .16.

CONFIDENCE INTERVAL

To determine if our observed validity coefficient is "statistically significant," we look at the next two columns, which represent the lower and upper limits of our 95% confidence interval. If the interval includes zero, we cannot say that our mean validity coefficient is significant. From the figures in Table 6.6, we would conclude that cognitive ability is a significant predictor of grades in the academy (our confidence interval is .33 to .48) and performance as a police officer (our confidence interval is .12 to .20). Using confidence intervals, we can communicate our findings with a sentence such as "Though our best estimate of the validity of cognitive ability in predicting academy performance is .41, we are 95% confident that the validity is no lower than .33 and no higher than .48." It is important to note that some meta-analyses use 80%, 85%, or 90% confidence intervals. The choice of confidence interval levels is a reflection of how conservative a meta-analyst wants to be: the more cautious one wants to be in interpreting the meta-analysis results, the higher the confidence interval used.

CORRECTIONS FOR ARTIFACTS

The column labeled ρ (rho) represents our mean validity coefficient corrected for criterion unreliability, predictor unreliability, and range restriction. This coefficient represents what the "true validity" of cognitive ability would be if we had a perfectly reliable measure of cognitive ability, a perfectly reliable measure of academy grades and supervisor ratings of performance, and no range restriction. Notice how our observed correlations of .41 and .16 increased to .62 and .27 after being corrected for artifacts. When encountering ρ, it is important to consider how many of the artifacts were corrected for. That is, two meta-analyses on the same topic might yield different results if one meta-analysis corrected for all three artifacts while another corrected only for criterion unreliability.

CREDIBILITY INTERVAL

Credibility intervals are used to determine if the corrected correlation coefficient (ρ) is statistically significant and if there are moderators present. Whereas the standard error is used to compute a confidence interval, a standard deviation is used to compute a credibility interval. As with confidence intervals, if a credibility interval includes zero, the corrected correlation coefficient is not statistically significant. If a credibility interval contains zero or is large, the conclusion to be drawn is that the corrected validity coefficient cannot be generalized and that moderators are operating (Arthur, Bennett, & Huffcutt, 2001). When reading a meta-analysis table, be careful, because the abbreviation CI is often used both for confidence and credibility intervals.
PERCENTAGE OF VARIANCE DUE TO SAMPLING ERROR

The next column in a meta-analysis table represents the percentage of observed variance that is due to sampling error and study artifacts (SE%). Notice that for grades and performance, these percentages are 78% and 80%, respectively. Because the percentage is greater than 75, we can generalize our findings and do not have to search for moderators. Such a finding is desired but is unusual. More typical are the meta-analysis results shown in Table 6.7. These results are from the excellent meta-analysis of the relationship between grades in school and work performance that was conducted by Roth, BeVier, Switzer, and Schippmann (1996).

Roth and his colleagues found that only 54% of the observed variance in correlations would have been expected by sampling error and study artifacts. Because of this, they were forced to search for moderators. They hypothesized that the level of education where the grades were earned (undergraduate, master's, or doctoral program) might moderate the validity of how well grades predicted work performance. As you can see from the table, the validity of grades in master's degree programs was higher than in doctoral programs, and sampling error and study artifacts explained 100% of the variability across studies for these two levels. However, sampling error and study artifacts accounted for only 66% of the observed variance in correlations for grades earned at the bachelor's level. So, the researchers further broke the bachelor's-level grades down by the years since graduation.

Rather than using the 75% rule, some meta-analyses will report a Qw or Hw statistic. If this statistic is significant, then a search for moderators must be made. If the statistic is not significant, we can generalize our findings. As shown back in Table 6.6, the Qw statistic was not significant for either academy grades or supervisor ratings of performance. This lack of significance is consistent with the fact that sampling error and study artifacts accounted for at least 75% of the observed variance.

Table 6.7
Meta-Analysis of Grades and Work Performance

Criterion                K       N       r    r_cr   r_cr,rr    ρ     80% C.I.   SE%

Overall                 71    13,984    .16    .23     .32     .35    .30-.41     54

Education level
  B.A.                  49     9,458    .16    .23     .33     .36    .30-.42     66
  M.A.                   4       446    .23    .33     .46     .50    .31-.56    100
  Ph.D./M.D.             6     1,755    .07    .10     .14     .15    .08-.25    100

Years since graduation
  1 year                13     1,288    .23    .32     .45     .49    .40-.62     89
  2-5 years             11     1,562    .15    .21     .30     .33    .23-.48     80
  6+ years               4       866    .05    .08     .11     .12    .00-.41     59

r = mean observed correlation, r_cr = r corrected for criterion unreliability, r_cr,rr = r corrected for criterion unreliability and range restriction, ρ = r corrected for criterion unreliability, range restriction, and predictor unreliability

A good example of the use of this statistic can be found in a meta-analysis of the effect of flextime and compressed workweeks on work-related behavior (Baltes, Briggs, Huff, Wright, & Neuman, 1999). As you can see in Table 6.8, the asterisks in the final column indicate a significant Qw, forcing a search for moderators. Note that this meta-analysis used the d statistic rather than an r (correlation) as the effect size. In this example, a d of .30 is equivalent to an r of .15.
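The d-to-r conversion mentioned above uses a standard formula; the version sketched below assumes roughly equal group sizes:

```python
import math

def r_from_d(d):
    """Convert a d effect size to r (assumes roughly equal group sizes)."""
    return d / math.sqrt(d**2 + 4)

print(round(r_from_d(0.30), 2))  # 0.15, as noted in the text
```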

Table 6.8
Meta-Analysis of Flextime and Compressed Work Weeks

                                          95% CI
Variable                  K      N      d    Lower   Upper       Qw

Flextime                 41   4,492    .30    .26     .35     1004.55**
Compressed workweek      25   2,921    .29    .23     .34      210.58**
Meta-Analysis

Applying Your Knowledge

You can apply what you have learned in this chapter by reading the following articles in Applied H.R.M. Research (www.xavier.edu/appliedhrmresearch).

Aamodt, M. G. (2004). Special issue on using MMPI-2 scale configurations in law enforcement selection: Introduction and meta-analysis. Applied H.R.M. Research, 9(2), 41-52.

Godfrey, K. J., Bonds, A. S., Kraus, M. E., Wiener, M. R., & Toth, C. S. (1990). Freedom from stress: A meta-analytic view of treatment and intervention programs. Applied H.R.M. Research, 1(2), 67-80.
7. Factor Analysis
____________________________

IMAGINE THE FOLLOWING SITUATIONS:

- A human resource manager asked her employees 50 questions about their attitudes toward work and is looking to find an easy way to summarize the responses to the 50 questions.
- An HR analyst has written a 100-item math test and is worried that the test may be tapping more than just math skills.
- A training specialist has evaluated her employees on 20 dimensions of communication skills but is worried that giving feedback on 20 dimensions will take too long and be too difficult to understand.
- An organization development specialist created a personality test to use in training workshops. He thinks his 200 questions tap five distinct personality dimensions.

In these situations, the four human resource professionals have one of two goals. Either they want to reduce a large amount of data (e.g., 50 attitude questions, 20 communication dimensions) into something more manageable, or they want to ensure that they are measuring the correct number of constructs (e.g., math skills, 5 personality dimensions). Factor analysis will help achieve both goals.
Factor analysis is a statistical technique that is extremely difficult to compute by hand, fairly easy to compute with a computer program such as SAS or SPSS, and fairly easy to understand when reading the results in a journal article. This chapter will focus on understanding the terminology and tables used in journal articles using factor analysis.

A factor analysis computes the similarity of responses to a series of questions and then determines groups of questions that seem to generate similar responses. For example, in a study on hobbies, interest in baseball, football, and basketball might fall into a "sports" group, and interest in hiking, canoeing, and fishing might fall into an "outdoor" group.

These groupings are called factors. For each question, a factor coefficient is generated indicating the extent to which that question relates to each factor. These factor coefficients can be interpreted in much the same way as we would a correlation coefficient: the higher the coefficient, the greater the relationship between the question and the factor.

Determining the Number of Factors

If you have nine items in your factor analysis, you can potentially have nine factors if none of the items are related to each other. However, the goal of factor analysis is to reduce the number of items to a smaller number of meaningful factors. For a factor to be meaningful, it must explain a significant amount of the relationships among the items. This amount is called an eigenvalue; the higher the eigenvalue, the greater the amount of variance that is explained by that factor, and thus, the more important the factor. Most computer programs use, and most texts suggest, an eigenvalue of one as a default for keeping a factor. The decision regarding the ideal number of factors is guided by a need to balance simplicity (fewer factors) with precision (many factors accounting for the majority of the variance).

For example, suppose that we asked 100 students to use a five-point scale to rate how well they like nine different foods. As shown in Table 7.1, a factor analysis reveals that these nine foods represent three distinct factors, with eigenvalues for the factors of 1.96, 1.79, and 1.69. Note that these three eigenvalues are similar in magnitude, indicating that the three factors are of about equal importance. Contrast this similarity with the difference in eigenvalues shown in Tables 7.3 and 7.4.

For a factor analysis to be reliable, there needs to be data from a sufficient number of people. Almost every expert agrees that there should be data from at least 100 people. However, in determining a sufficient sample size, one also needs to consider the number of variables. Most experts agree that data from at least five people are needed for every variable. That is, if we are factor analyzing responses to 40 foods, we would need data from at least 200 people (40 items multiplied by 5 people). Some authors (e.g., Kachigan, 1986) suggest that there be ten people for every variable, or, if there are more than 10 variables, that the square of the number of variables is the minimum number of people needed. That is, if you have 15 variables, you need data from 225 people (15 times 15). These rules of thumb are the minimums: the more subjects you have, the better the reliability of your factor analysis.
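These rules of thumb are simple enough to put in a helper function. The sketch below (our own illustration, not from the text) returns both the five-per-variable minimum and Kachigan's stricter figure:

```python
def minimum_sample_size(n_variables):
    """Rules of thumb discussed above: at least 100 people and at least
    5 per variable; Kachigan's stricter rule uses 10 per variable, or
    the square of the number of variables when there are more than 10."""
    five_per_variable = max(100, 5 * n_variables)
    kachigan = n_variables ** 2 if n_variables > 10 else 10 * n_variables
    return five_per_variable, kachigan

print(minimum_sample_size(15))  # (100, 225): 15 x 15 = 225 under Kachigan's rule
```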
There are times when a journal article will describe a study as a confirmatory factor analysis. In such cases, rather than exploring how many factors emerge from the data, the researcher is trying to confirm that a set of items will yield a certain number of factors. The presumed number of factors is based either on previous research or on a theory. For example, suppose that a consultant created a 12-item personality test that he thought would yield scores on the three factors shown in Table 7.2. He would conduct his analysis to see if there were actually three factors and if the items loaded (had high factor coefficients) on the factors he thought they would.

Determining the Items Belonging to Each Factor

To determine the items that belong to each factor, we look at the size of the factor coefficients. Generally, for an item to belong to a factor, the factor coefficient must be .30 or higher. As you can see back in Table 7.1, popcorn (.80), potato chips (.83), and Cracker Jack (.75) belong to Factor 1; carrots (.64), peas (.84), and lima beans (.75) belong to Factor 2; and apples (.60), oranges (.87), and plums (.73) belong to Factor 3. Notice that some of the factor coefficients are negative. As you might remember from the chapter on correlation, a negative coefficient is not a bad thing: it simply tells us the direction of the relationship. For example, in Table 7.1, the negative loading of apples on Factor 1 (-.21) tells us that people who like apples are not as inclined to like the foods that load highly on Factor 1 (popcorn, potato chips, Cracker Jack).
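Applying the .30 rule is mechanical once the loading matrix is in hand. The sketch below uses the loadings quoted from Table 7.1; the cross-loadings the text does not report are filled in with small invented values purely for illustration:

```python
def items_for_factor(loadings, factor, threshold=0.30):
    """Return the items whose coefficient on `factor` is at least
    `threshold` in absolute size."""
    return [item for item, coefs in loadings.items()
            if abs(coefs[factor]) >= threshold]

# Loadings quoted from Table 7.1 on factors (1, 2, 3); the small
# off-factor values are made up here for illustration only.
loadings = {
    "popcorn":      (.80, .05, -.10),
    "potato chips": (.83, .02, .08),
    "Cracker Jack": (.75, .10, .04),
    "carrots":      (.06, .64, .12),
    "peas":         (.03, .84, .07),
    "lima beans":   (.09, .75, .02),
    "apples":       (-.21, .11, .60),
    "oranges":      (.05, .08, .87),
    "plums":        (.02, .04, .73),
}
print(items_for_factor(loadings, 0))  # ['popcorn', 'potato chips', 'Cracker Jack']
```

Note that the absolute value matters: apples' -.21 on the first factor falls below the .30 cutoff, so apples stay out of the junk-food group.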
Often, a negative loading is expected and helps define a factor. For example, we might add "shy" and "introverted" to the traits that form the extraversion factor in Table 7.2. We would expect "outgoing," "talkative," "funny," and "sociable" to have high positive factor coefficients, and we would expect "shy" and "introverted" to have high negative factor coefficients on the extraversion factor.
When describing the results of a factor analysis, researchers will often use the term rotation. Rotation is a statistical procedure that makes it easier to determine the items that belong to each factor. The most common rotations include varimax, equimax, oblimin, and quartimax, each of which has a different goal. For example, the goal of a varimax rotation is to have high factor coefficients for items that are relevant to the factor and very low coefficients for the other items. In contrast, the aim of a quartimax rotation is to increase the odds that an item will have a high factor coefficient on only one factor. It should be noted that a rotation only makes the factor analysis easier to interpret. It does not change the actual relationships among the items.

Table 7.3 shows the factor analysis of six scales from an ability test. Notice that without a rotation, perceptual speed and vocabulary have high loadings on both factors. After rotation, however, the factors are "cleaner," and each scale loads on only one factor, making the factors easier to interpret. Also notice that the factor loadings for the two types of rotations are almost identical; thus, the type of rotation would not have mattered. This is not always the case.

Naming the Factors

Once we see the items that belong to a factor, we try to make sense of that factor by giving it a name. The example back in Table 7.1 should be easy: the three factors represent the food groups of vegetables, fruits, and junk food. Sometimes, however, naming the factors can be more challenging. A good example comes from the factor analysis of difficult employee types that was conducted by Raynes (1997). Raynes noticed that popular books on dealing with difficult people (e.g., Bramson, 1981; Brinkman & Kirschner, 1994) suggested that there are several types of difficult people, such as tanks, snipers, whiners, yes people, no people, maybe people, gossipers, and know-it-alls. Raynes, who questioned whether there were actually so many types, factor analyzed supervisor ratings of employees' difficult behaviors and found that these behaviors could be reduced to the two factors shown in Table 7.4. What would you call these two factors? Raynes (1997) labeled Factor I "aggressive behaviors" and Factor II "passive behaviors."

The Raynes study is a good example of the usefulness of factor analysis. By reducing the number of variables from 10 to 2, Raynes was able to conduct a more efficient study of the validity of a test battery in screening applicants who might become problem employees. But more importantly, the factor analysis demonstrated that the popular belief that there are 10 separate types of difficult employees is not true. Instead, there is one type (aggressive) who gossips, disagrees, whines, throws tantrums, and uses sarcasm, and a second type (passive) who can't say no and won't speak up. Thus, rather than learning how to deal with 10 types of difficult people, we only need to learn how to handle two types.

Determining if the Factor Analysis is Any Good

With most statistics, we get a significance level that helps us determine the confidence we can place in our findings. With factor analysis, determining if the results are significant is a bit more difficult. One way to evaluate an exploratory factor analysis is to consider the amount of variance that is explained by the factors. This is done by summing the eigenvalues and then dividing by the number of items that were factor analyzed. For example, for the factor analysis shown in Table 7.1, after summing the eigenvalues (1.96 + 1.79 + 1.69) and dividing by the number of items (9), we see that 60.4% of the variance (5.44 / 9) was accounted for by the three factors. The two factors in Table 7.3 account for 55.9% of the variance. At a minimum, we want this percentage to be above 50.
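The percentage-of-variance check is a one-liner; this sketch uses the eigenvalues reported for Tables 7.1 and 7.4:

```python
def percent_variance_explained(eigenvalues, n_items):
    """Sum the retained eigenvalues and divide by the number of items."""
    return 100 * sum(eigenvalues) / n_items

print(round(percent_variance_explained([1.96, 1.79, 1.69], 9), 1))  # 60.4 (Table 7.1)
print(round(percent_variance_explained([3.97, 1.62], 10), 1))       # 55.9 (Table 7.4)
```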
If a factor analysis is done to confirm that a certain number of factors exist, there are a host of statistics to test how well the obtained factor analysis results fit the expected results. These "goodness of fit" indexes range in value from 0 to 1: the closer to one, the better the fit. An index of .90 is considered the acceptable level (Bryant & Yarnold, 1995).

Table 7.4
Factor Analysis of Difficult People

Difficult Behavior              Factor I   Factor II
Gossiping                          .85        .07
Disagreeing with everyone          .84        .00
Yelling                            .74        .10
Acting like a know-it-all          .73        .00
Whining                            .73        .02
Saying no to everything            .66        .38
Using sarcasm                      .50        .09
Not making up their mind           .43        .69
Not speaking up                    .04        .78
Agreeing to everything             .21        .63

Eigenvalues                       3.97       1.62

Applying Your Knowledge

You can apply what you have learned in this chapter by reading the following articles in Applied H.R.M. Research (www.xavier.edu/appliedhrmresearch).

Conte, J. M., Ringenbach, K. L., Moran, S. K., & Landy, F. J. (2001). Criterion-validity evidence for time urgency: Associations with burnout, organizational commitment, and job involvement in travel agents. Applied H.R.M. Research, 6(2), 129-134.

Franz, T. M., & Norton, S. D. (2001). Investigating business casual dress policies: Questionnaire development and exploratory research. Applied H.R.M. Research, 6(2), 79-94.

Raynes, B. L. (2001). Predicting difficult employees: The relationship between vocational interests, self-esteem, and problem communication styles. Applied H.R.M. Research, 6(1), 33-66.

References
________________________

Aamodt, M. G. (2016). Industrial/organizational psychology: An applied approach (8th ed.). Belmont, CA: Wadsworth.

Aamodt, M. G., & Williams, F. (2005, April). Reliability, validity, and adverse impact of references and letters of recommendation. Paper presented at the 20th annual meeting of the Society for Industrial and Organizational Psychology, Los Angeles, CA.

Allard, G., Butler, J., Faust, D., & Shea, T. M. (1995). Errors in hand scoring objective personality tests: The case of the Personality Diagnostic Questionnaire. Professional Psychology: Research and Practice, 26(3), 304-308.

Alliger, G. M., Tannenbaum, S. I., Bennett, W., Traver, H., & Shotland, A. (1997). A meta-analysis of the relations among training criteria. Personnel Psychology, 50(2), 341-358.

Arthur, W., Bennett, W., & Huffcutt, A. I. (2001). Conducting meta-analysis using SAS. Mahwah, NJ: Erlbaum.

Baltes, B. B., Briggs, T. E., Huff, J. W., Wright, J. A., & Neuman, G. A. (1999). Flexible and compressed workweek schedules: A meta-analysis of their effects on work-related criteria. Journal of Applied Psychology, 84(4), 496-513.

Beall, G. E. (1991). Validity of weighted application blanks across four job criteria. Applied H.R.M. Research, 2(1), 18-26.

Bramson, R. (1981). Dealing with difficult people. New York: Dell Publishing.

Brinkman, R., & Kirschner, R. (1994). Dealing with people you can't stand. New York: McGraw-Hill.

Bryant, F. B., & Yarnold, P. R. (1995). Principal-components analysis and exploratory and confirmatory factor analysis. In L. G. Grimm & P. R. Yarnold (Eds.), Reading and understanding multivariate statistics. Washington, DC: American Psychological Association.

Conway, J. M., & Huffcutt, A. I. (1997). Psychometric properties of multisource performance ratings: A meta-analysis of subordinate, supervisor, peer, and self-ratings. Human Performance, 10(4), 331-360.

Cooper-Hakim, A., & Viswesvaran, C. (2005). The construct of work commitment: Testing an integrative framework. Psychological Bulletin, 131(2), 241-259.

Gaugler, B. B., Rosenthal, D. B., Thornton, G. C., & Bentson, C. (1987). Meta-analysis of assessment center validity. Journal of Applied Psychology, 72, 493-511.

Griffeth, R. W., Hom, P. W., & Gaertner, S. (2000). A meta-analysis of antecedents and correlates of employee turnover: Update, moderator tests, and research implications for the next millennium. Journal of Management, 26(3), 463-488.

Huffcutt, A. I., & Arthur, W. (1994). Hunter and Hunter (1984) revisited: Interview validity for entry-level jobs. Journal of Applied Psychology, 79(2), 184-190.

Hunter, J. E., & Schmidt, F. L. (1990). Methods of meta-analysis: Correcting error and bias in research findings. Newbury Park, CA: Sage Publications.

Judge, T. A., Rodell, J. B., Klinger, R. L., Simon, L. S., & Crawford, E. R. (2013). Hierarchical representations of the Five-Factor Model of personality in predicting job performance: Integrating three organizing frameworks with two theoretical perspectives. Journal of Applied Psychology, 98(6), 875-925.

Judge, T. A., Thoresen, C. J., Bono, J. E., & Patton, G. K. (2001). The job satisfaction-job performance relationship: A qualitative and quantitative review. Psychological Bulletin, 127(3), 376-407.

Kachigan, S. K. (1986). Statistical analysis. New York: Radius Press.

Koslowsky, M., Sagie, A., Krausz, M., & Singer, A. H. (1997). Correlates of employee lateness: Some theoretical considerations. Journal of Applied Psychology, 82(1), 79-88.

Mathieu, J. E., & Zajac, D. M. (1990). A review and meta-analysis of the antecedents, correlates, and consequences of organizational commitment. Psychological Bulletin, 108(1), 171-194.

McDaniel, M. A., Hartman, N. S., Whetzel, D. L., & Grubb, W. L. (2007). Situational judgment tests, response instructions, and validity: A meta-analysis. Personnel Psychology, 60(1), 63-91.

Quinones, M. A., Ford, J. K., & Teachout, M. S. (1995). The relationship between work experience and job performance: A conceptual and meta-analytic review. Personnel Psychology, 48(4), 887-910.
Index
________________________

Analysis of variance, 39-46
  Regression, 83-85
Central tendency, 14-16
Chi-square, 47-48
Confidence interval, 101-102
Credibility interval, 102
Convenience sample, 13
Correlation, 51-70
  Magnitude, 52-59
  Norms, 64-65
  Significance, 60-69
Curvilinearity, 57-60
Eigenvalues, 108
Factor analysis, 107-114
Fisher's exact test, 36, 37, 46-47
Frequencies, 36
Hierarchical regression, 74-77
Interval data, 8
Intervening variable, 51-52
Inverted-U, 59-60
Lanning v. SEPTA, 61-62
Mann-Whitney U, 35
Mean, 14-15, 31-35
Measurement scales, 7-9
Median, 15, 35
Meta-analysis, 91-105
  Artifacts, 97-98
  Finding studies, 95-96
  Moderators, 98-99
  Results, 99-104
  Sampling error, 92-94
Mode, 15-16
Model specification, 78
Multicollinearity, 78-79
Multiple R, 81-82
Nominal data, 7, 62
Ordinal data, 8
Percentiles, 24-25
Practical significance, 6-7, 63-69
Qw statistic, 103-104
Random assignment, 13
Random sample, 12
Range, 18-19
Range restriction, 57
Ratio data, 8
Regression, 71-90
  Considerations, 77-80
  Equation, 86-88
  Interpreting, 80-88
  Purpose, 71-73
  Types, 74-77
Reliability, 55-57
R-square, 63, 82
Sample size, 11-14
Sampling error, 92-94, 103-104
Significance levels, 3-7
Standard deviation, 19-21
Standard error, 82-83
Standard scores, 23-28
Statistics symbols, 29
Stepwise regression, 74-76
t-tests, 37-39
Type I errors, 4
Type II errors, 4
Utility analysis, 66-69
Validity, 64-69
Variability, 17-23
Variance, 23
Z-scores, 25-26
