

CHAPTER 13, Describing Data

linear regression, bivariate linear regression, least-squares regression line, regression weight,
standard error of estimate, coefficient of nondetermination, correlation matrix


CHAPTER 14
Using Inferential Statistics

Chapter Outline

Inferential Statistics: Basic Concepts
Sampling Distribution
Sampling Error
Degrees of Freedom
Parametric Versus Nonparametric Statistics
The Logic Behind Inferential Statistics
Statistical Errors
Statistical Significance
One-Tailed Versus Two-Tailed Tests
Parametric Statistics
Assumptions Underlying a Parametric Statistic
Inferential Statistics With Two Samples
The t Test
An Example From the Literature: Contrasting Two Groups
The z Test for the Difference Between Two Proportions
Beyond Two Groups: Analysis of Variance (ANOVA)
The One-Factor Between-Subjects ANOVA
The One-Factor Within-Subjects ANOVA
The Two-Factor Between-Subjects ANOVA
The Two-Factor Within-Subjects ANOVA
Mixed Designs
Higher-Order and Special-Case ANOVAs
Nonparametric Statistics
Chi-Square
The Mann-Whitney U Test
The Wilcoxon Signed Ranks Test
Parametric Versus Nonparametric Statistics
Special Topics in Inferential Statistics
Power of a Statistical Test
Statistical Versus Practical Significance
The Meaning of the Level of Significance
Data Transformations
Alternatives to Inferential Statistics
Summary
Review Questions
Key Terms

Chapter 13 reviewed descriptive statistics that help you characterize and describe your data. However, they do not help you assess the reliability of your findings. A reliable finding is repeatable, whereas an unreliable one may not be. Statistics that assess the reliability of your findings are called inferential statistics because they let you infer the characteristics of a population from the characteristics of the samples comprising your data.

This chapter reviews the most widely used inferential statistics. Rather than focusing on how to calculate these statistics, this discussion focuses on issues of application and interpretation. Consequently, computational formulas or worked examples are not presented.

INFERENTIAL STATISTICS: BASIC CONCEPTS

Before exploring some of the more popular inferential statistics, we present some of the basic concepts underlying these statistics. You should understand these concepts before tackling the discussion on inferential statistics that follows. If you need a more comprehensive refresher on these concepts, consult a good introductory statistics text.

Sampling Distribution

Chapter 13 introduced the notion of a distribution of scores. Such a distribution results from collecting data across a series of observations and then plotting the frequency of each score or range of scores. It is also possible to create a distribution by repeatedly taking samples of a given size (e.g., n = 10 scores) from the population. The means of these samples could be used to form a distribution of sample means. If you could take every possible sample of n scores from the population, you would have what is known as the sampling distribution of the mean. Statistical theory reveals that this distribution will tend to closely approximate the normal distribution, even when the population of scores from which the samples were drawn is far from normal in shape. Thus, you can use the normal distribution as a theoretical model that will allow you to make inferences about the likely value of the population mean, given the mean of a single sample from that population.
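The idea of a sampling distribution of the mean can be approximated by simulation. The sketch below is an illustration only: it assumes a hypothetical, strongly skewed population (exponential scores), draws repeated samples of n = 10 with replacement, and summarizes the resulting sample means. The names `sample_means` and `population`, the seed, and the number of samples are arbitrary choices made for this example.

```python
import random
import statistics


def sample_means(population, n, n_samples, rng):
    """Draw n_samples random samples of size n (with replacement) and return their means."""
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(n_samples)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: exponential scores, far from normal in shape.
population = [rng.expovariate(1.0) for _ in range(100_000)]

means = sample_means(population, n=10, n_samples=5_000, rng=rng)

pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

print("population mean:        ", round(pop_mean, 3))
print("mean of sample means:   ", round(statistics.fmean(means), 3))
print("SD of sample means:     ", round(statistics.stdev(means), 3))
print("population SD / sqrt(n):", round(pop_sd / 10 ** 0.5, 3))
```

If the theory holds, the mean of the sample means should sit close to the population mean, and the spread of the sample means should be close to the population standard deviation divided by the square root of n.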

The sample mean is not the only statistic for which you can obtain a sampling distribution. In fact, each sample statistic has its own theoretical sampling distribution. For example, the tabled values for the z statistic, Student's t, the F ratio, and chi-square represent the sampling distributions of those statistics. Using these sampling distributions, you can determine the probability that a value of a statistic as large as or larger than the obtained value could have occurred by chance. This probability is called the obtained p.

Sampling Error

When you draw a sample from a population of scores, the mean of the sample, M, will probably differ from the population mean, μ. An estimate of the amount of variability in the expected sample means across a series of such samples is provided by the standard error of the mean (or standard error for short). It may be calculated from the standard deviation of the sample as follows:

s_M = s / √n

where s is the standard deviation of the sample and n is the number of scores in the sample. The standard error is used to estimate the standard deviation of the sampling distribution of the mean for the population from which the sample was drawn.

Degrees of Freedom

In any distribution of scores with a known mean, a limited number of data points yield independent information. For example, if you have a sample of 10 scores and a known mean (e.g., 6.5), only 9 scores are free to vary. That is, once you have selected 9 scores from the population, the 10th must have a particular value that will yield the mean. Thus, the degrees of freedom (df) for a single sample are n - 1 (where n is the total number of scores in the sample).

Degrees of freedom come into play when you use any inferential statistic. You can extend this logic to the analysis of an experiment. If you have three groups in your experiment with means of 2, 5, and 10, the grand mean (the sum of all the scores divided by n) is then 5.7. If you know the grand mean and you know the means from two of your groups, the final mean is set. Hence, the degrees of freedom for a three-group experiment are k - 1 (where k is the number of levels of the independent variable). The degrees of freedom are then used to find the appropriate tabled value of a statistic against which the computed value is compared.

Parametric Versus Nonparametric Statistics

Inferential statistics can be classified as either parametric or nonparametric. A parameter in this context is a characteristic of a population, whereas a statistic is a characteristic of your sample (Gravetter & Wallnau, 2007). A parametric statistic estimates the value of a population parameter from the characteristics of a sample. When you use a parametric statistic, you are making certain assumptions about the population from which your sample was drawn. A key assumption of a parametric test is that your sample was drawn from a normally distributed population.

In contrast to a parametric statistic, a nonparametric statistic makes no assumptions about the distribution of scores underlying your sample. Nonparametric statistics are used if your data do not meet the assumptions of a parametric test.

THE LOGIC BEHIND INFERENTIAL STATISTICS

Whenever you conduct an experiment, you expose subjects to different levels of your independent variable. Although a given experiment may contain several groups, assume for the present discussion that the experiment in question includes only two. The data from each group can be viewed as a sample of the scores obtained if all subjects in the target population were tested under the conditions to which the group was exposed. For example, the treatment group mean represents a population of subjects exposed to your experimental treatment. Each treatment mean is assumed to represent the mean of the underlying population.

In all respects except for treatment, the treatment and control groups were exposed to equivalent conditions. Assume that the treatment had no effect on the scores. In that case, each group's scores could be viewed as an independent sample taken from the same population. Figure 14-1 illustrates this situation.

FIGURE 14-1 Line graphs showing the relationship between samples and population, assuming that the treatment had no effect on the dependent variable (M1, mean of Sample 1; M2, mean of Sample 2).




Each sample mean provides an independent estimate of the population mean. Each sample standard error provides an independent estimate of the standard deviation of sample means in the sampling distribution of means. Because the two means were drawn from the same population, you would expect them to differ only because of sampling error. You also have estimates of the standard deviation of this distribution (the standard errors). From this information, you can calculate the probability that the two sample means would differ as much as or more than they do simply because of chance factors. This probability is the obtained p.

Let's review these points. If the treatment had no effect on the scores, then you would expect the scores from the two groups to provide independent samples from the same population. From these samples, you can estimate the characteristics of that population; from this estimate, you can assess the probability of obtaining the observed difference between the two treatment means.

Consider the case in which the treatment does affect the scores, perhaps by shifting them upward. Figure 14-2 illustrates this situation. In the upper part of the figure are two population distributions: a population underlying the control group sample distribution and another one underlying the treatment group sample distribution. The population distribution underlying the treatment group is shifted upward and away from the control group population distribution. This shift could be obtained by simply adding a constant to each value in the control group distribution. This new shifted distribution resembles the old unshifted distribution in standard deviation, but its mean is higher.

The bottom part of the figure shows two possible sample distributions, one for the control group and one for the treatment group. The scores from the control group still constitute a sample from the unshifted distribution (left-hand upper curve in Figure 14-2), but the scores from the treatment group now constitute a sample from the shifted distribution (right-hand upper curve in Figure 14-2). The two sample means provide estimates of two different population means. Because of sampling error, the two sample means might or might not differ even though a difference exists between the underlying population means.

Your problem (as a researcher) is that you do not know whether the treatment really had an effect on the scores. You must decide this based on your observed sample means (which may differ by a certain amount) and the sample standard deviations. From this information, you must decide whether the two sample means were drawn from the same population (the treatment had no effect on the sample scores) or from two different populations (the treatment shifted the scores relative to scores from the control group). Inferential statistics help you make this decision.

FIGURE 14-2 Line graphs showing the relationship between samples and population, assuming that the treatment had an effect on the dependent variable.

These two possibilities (different or the same populations) can be viewed as statistical hypotheses to be tested. The hypothesis that the means were drawn from the same population (i.e., μ1 = μ2) is referred to as the null hypothesis (H0). The hypothesis that the means were drawn from different populations (μ1 ≠ μ2) is called the alternative hypothesis (H1).

Inferential statistics use the characteristics of the two samples to evaluate the validity of the null hypothesis. Put another way, they assess the probability that the means of the two samples would differ by the observed amount or more if they had been drawn from the same population of scores. If this probability is sufficiently small (i.e., if it is very unlikely that two samples this different would be drawn by chance from the same population), then the difference between the sample means is said to be statistically significant, and the null hypothesis is rejected.

Statistical Errors







When making a comparison between two sample means, there are two possible states of affairs (the null hypothesis is true or it is false) and two possible decisions you can make (not to reject the null hypothesis or to reject it). In combination, these conditions lead to four possible outcomes, as shown in Table 14-1. The labels across the top of Table 14-1 indicate the two states of affairs, and those in the left-hand column indicate the two possible decisions. Each box represents a different combination of the two conditions.

TABLE 14-1 Statistical Errors

                    H0 True             H0 False
Reject H0           Type I error        Correct decision
Do Not Reject H0    Correct decision    Type II error

The lower left-hand box represents the situation in which the null hypothesis is true (the independent variable had no effect), and you correctly decide not to reject the null hypothesis. This is a disappointing outcome, but at least you made the right decision.

The upper left-hand box represents a more disturbing outcome. Here the null hypothesis is again true, but you have incorrectly decided to reject the null hypothesis. In other words, you decided that your independent variable had an effect when in fact it did not. In statistics this mistake is called a Type I error. In signal-detection experiments, the same kind of mistake is called a "false alarm" (saying that a stimulus was present when actually it was not).

The lower right-hand box represents another kind of error. In this case, the null hypothesis is false (the independent variable did have an effect), but you have incorrectly decided not to reject the null hypothesis. This is called a Type II error: You concluded that your independent variable had no effect when it really did have one. In signal-detection experiments, such an outcome is called a "miss" (not detecting a stimulus that was present).

Ideally, you would like to minimize both kinds of error. However, steps taken to reduce the probability of a Type I error actually increase the probability of a Type II error, and vice versa.

Statistical Significance

If both samples came from the same population (or from populations having the same mean), then the null hypothesis is true, and any difference between the sample means reflects nothing more than sampling error. The actual difference between your sample means may be just such a chance difference, or it may reflect a real difference between the means of the populations from which the samples were drawn. Which of these is the case? To help you decide, you can compute an inferential statistic to determine the probability of obtaining a difference between sample means as large as or larger than the difference you actually got, under the assumption that the null hypothesis is true.

To determine this probability, you calculate an observed value of your inferential statistic. This observed value is compared to a critical value of that statistic (normally found in a statistical table such as those in the Appendix). Your decision depends on whether or not the observed value of the statistic meets or exceeds the critical value.

As stated, you want to be able to reduce the probability of committing a Type I error. The probability of committing a Type I error depends on the criterion you use to accept or reject the null hypothesis. This criterion, known as the alpha level (α), represents the probability that a difference at least as large as the observed difference could have occurred by chance. The alpha level that you adopt (along with the degrees of freedom) also determines the critical value of the statistic that you are using.

By convention, the minimum acceptable alpha level is .05 (that is, 5 chances in 100 that sampling error alone could have produced a difference at least as large as the one observed). You could make no more than one Type I error in 1 million experiments by choosing an alpha level of .000001. There are good reasons, discussed later, why you do not ordinarily adopt such a conservative alpha level.

A difference between means yielding an observed value of a statistic that meets or exceeds the critical value of your inferential statistic is said to be statistically significant. If the observed value is as large as or larger than the critical value, you reject the null hypothesis because you would be unlikely to have obtained the difference by chance.

The strategy of looking up the critical value of a statistic in a table and comparing the obtained value with this critical value was developed in an era when exact probabilities could not easily be computed. Statistical software now provides the exact probability value p along with the obtained value of the test statistic, so you can compare the obtained p directly with the alpha level and avoid having to use the relevant table. If the obtained p is less than or equal to alpha, the comparison is statistically significant.

One-Tailed Versus Two-Tailed Tests

The critical values of a statistic depend on such factors as the number of observations and the alpha level. They also depend on whether the test is one-tailed or two-tailed.

Figure 14-3 shows critical regions in a sampling distribution of the mean. The left distribution shows the critical region (shaded area) for a one-tailed test, assuming alpha has been set to .05. This region contains 5% of the total area under the curve, representing the 5% of cases whose z scores occur by chance with a probability of .05 or less. Values falling in this region are judged to be statistically significant.
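For the z statistic, the boundaries of these critical regions can be computed directly rather than read from a table. The following is a minimal sketch, assuming a standard normal sampling distribution and an alpha level of .05; the observed z of 1.80 is a hypothetical value chosen only to show how the two kinds of test can disagree.

```python
from statistics import NormalDist

alpha = 0.05
z_dist = NormalDist()  # standard normal distribution: mean 0, SD 1

# One-tailed test: the whole critical region (5%) sits in the upper tail.
one_tailed_cutoff = z_dist.inv_cdf(1 - alpha)

# Two-tailed test: the critical region is split, 2.5% in each tail.
two_tailed_cutoff = z_dist.inv_cdf(1 - alpha / 2)

print(round(one_tailed_cutoff, 3))  # 1.645
print(round(two_tailed_cutoff, 3))  # 1.96

observed_z = 1.80  # hypothetical observed value
significant_one_tailed = observed_z >= one_tailed_cutoff
significant_two_tailed = abs(observed_z) >= two_tailed_cutoff
print(significant_one_tailed, significant_two_tailed)  # True False
```

Because the two-tailed cutoff is farther from zero, the same observed value can be significant under a one-tailed test but not under a two-tailed test.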

You would conduct a one-tailed test if you were interested only in whether the obtained value of the statistic falls in one tail of the sampling distribution for that statistic. This is usually the case when your research hypotheses are directional. For example, you may want to know whether a new therapy is measurably better than the standard one. However, if the new therapy is not better, then you really do not care whether it is simply as good as the standard method or is actually worse. You would not use it in either case.

In contrast, you would conduct a two-tailed test if you wanted to know whether the new therapy was either better or worse than the standard method. In that case, you need to check whether your obtained statistic falls into either tail of the distribution.

FIGURE 14-3 Graphs showing critical regions for one-tailed and two-tailed tests of statistical significance. (Panels: one-tailed test; two-tailed test.)

The major implication of all this is that for a given alpha level you must obtain a greater difference between the means of your two treatment groups to reach statistical significance if you use a two-tailed test than if you use a one-tailed test. The one-tailed test is therefore more likely to detect a real difference if one is present (i.e., it is more powerful). However, using the one-tailed test means giving up any information about the reliability of a difference in the other, untested direction.

The use of one-tailed versus two-tailed tests has been a controversial topic among statisticians. Strictly speaking, you must choose which version you will use before you see the data. You must base your decision on such factors as practical considerations (as in the therapy example), your hypothesis, or previous knowledge. If you wait until after you have seen the data and then base your decision on the direction of the obtained outcome, your actual probability of falsely rejecting the null hypothesis will be greater than the stated alpha value. You have used information contained in the data to make your decision, but that information may itself be the result of chance processes and unreliable.

If you conduct a two-tailed test and then fail to obtain a statistically significant result, the temptation is to find some excuse why you "should have done" a one-tailed test. You can avoid this temptation if you adopt the following rule of thumb: Always use a two-tailed test unless there are compelling a priori reasons not to.

PARAMETRIC STATISTICS

As noted above, there are two types of inferential statistics: parametric and nonparametric. The type that you apply to your data depends on the scale of measurement used and how your data are distributed. This section discusses parametric inferential statistics.

Assumptions Underlying a Parametric Statistic

Three assumptions underlie parametric inferential tests (Gravetter & Wallnau, 2007): (1) The scores have been sampled randomly from the population, (2) the sampling distribution of the mean is normal, and (3) the within-groups variances are homogeneous. Assumption 3 means the variances of the different groups are highly similar. In statistical inference, the independent variable is assumed to affect the mean but not the variance.

Serious violation of one or more of these assumptions may bias the statistical test. Such bias will lead you to commit a Type I error either more or less often than the stated alpha probability and thus undermine the value of the statistic as a guide to decision making. The effects of violations of these assumptions are examined later in more detail during a discussion of the statistical technique known as the analysis of variance.

Inferential Statistics With Two Samples

Imagine that you have conducted a two-group experiment on whether "death-qualifying" a jury (i.e., removing any jurors who could not vote for the death penalty) affects how simulated jurors perceive a criminal defendant. Participants in your experimental group were death qualified, whereas those in your control group were not. Participants then rated on a scale from 0 to 10 the likelihood that the defendant was guilty as charged of the crime. You run your experiment and then compute a mean for each group. You find the two means differ from one another (the experimental group mean is 7.2, and the control group mean is 4.9).

Your means may represent a single population and differ only because of sampling error. Or your means may reliably represent two different populations. Your task is to determine which of these two conditions is true. Is the observed difference between means reliable, or does it merely reflect sampling error? This question can be answered by applying the appropriate statistical test, which in this case is a t test.

The t Test

The t test is used when your experiment includes only two levels of the independent variable (as in the jury example). Special versions of the t test exist for designs involving independent samples (e.g., randomized groups) and for those involving correlated samples (e.g., matched-pairs designs and within-subjects designs).

The t Test for Independent Samples. You use the t test for independent samples when you have data from two groups of participants who were assigned at random to the two groups. The test comes in two versions, depending on the error term selected. The unpooled version computes an error term based on the standard error of the mean provided separately by each sample. The pooled version computes an error term based on the two samples combined, under the assumption that both samples come from populations having the same variance. The pooled version may be more sensitive to any effect of the independent variable, but it should be avoided if there are large differences in sample sizes and standard errors. Under these conditions, the probability estimates provided by the pooled version may be misleading.
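The difference between the two error terms can be made concrete with a short sketch. The ratings below are hypothetical values chosen only so that the group means match the jury example (7.2 and 4.9). The function names are illustrative, and the Welch-Satterthwaite degrees of freedom used for the unpooled version are an assumption about how that version is usually computed, not something stated in the text.

```python
import math
import statistics


def t_pooled(a, b):
    """t statistic with a pooled error term; df = N - 2."""
    n1, n2 = len(a), len(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.fmean(a) - statistics.fmean(b)) / se, n1 + n2 - 2


def t_unpooled(a, b):
    """t statistic with separate standard errors; Welch-Satterthwaite df (assumed)."""
    n1, n2 = len(a), len(b)
    q1 = statistics.variance(a) / n1  # squared standard error of group 1
    q2 = statistics.variance(b) / n2  # squared standard error of group 2
    se = math.sqrt(q1 + q2)
    df = (q1 + q2) ** 2 / (q1 ** 2 / (n1 - 1) + q2 ** 2 / (n2 - 1))
    return (statistics.fmean(a) - statistics.fmean(b)) / se, df


# Hypothetical guilt ratings (0 to 10); means are 7.2 and 4.9.
experimental = [8, 7, 6, 9, 7, 6, 8, 7, 7, 7]
control = [5, 4, 6, 3, 5, 6, 4, 5, 6, 5]

t_p, df_p = t_pooled(experimental, control)
t_u, df_u = t_unpooled(experimental, control)
print("pooled:   t =", round(t_p, 2), "df =", df_p)
print("unpooled: t =", round(t_u, 2), "df =", round(df_u, 1))
```

With equal group sizes, as here, the two versions give the same t value and differ only in their degrees of freedom. The two t values diverge when the group sizes and variances differ, which is the situation the text warns about.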


The t Test for Correlated Samples. When the two means being compared come from samples that are not independent of one another, the formula for the t test must be modified. This is the case when the scores come either from multiple observations of a single participant or from single observations taken on each of a matched pair of participants. Matched-pairs and within-subjects designs meet this requirement.

The t test for correlated samples produces a larger t value than the t test for independent samples when applied to the same data if the scores from the two samples are at least moderately correlated, and this tends to make the correlated samples test more powerful. However, this advantage tends to be offset by the correlated samples t test's smaller degrees of freedom (equal to n - 1, where n is the number of pairs of scores). If the scores from the two samples are uncorrelated, the t values computed by the correlated samples and independent samples t tests (pooled version) are identical; with its reduced degrees of freedom, the correlated samples t test is then less able than the independent samples t test to detect any effect of the independent variable.

An Example From the Literature: Contrasting Two Groups

Spinal cord injuries (SCI) represent a major source of physical disabilities. Such injuries often involve rapid deceleration of the body and may result in mild traumatic brain injury (MTBI). Hess et al. note that when a patient with an SCI is rushed [...] with processing information (Hess et al., 2003). The problem is that it is sometimes [...] emotional trauma associated with SCI.

David Hess, Jennifer Marwitz, and Jeffrey Kreutzer (2003) conducted a study to differentiate between patients with MTBI (without SCI) and patients with SCI. Participants were patients with SCI or MTBI who had been treated at a medical center. They completed tests assessing attention (two tests), motor speed, verbal learning, verbal memory (two tests), visuospatial skills, and word fluency. Mean scores were computed on each measure. The results showed that, as a rule, patients with SCI performed better than patients with MTBI [...] as emotional well-being.

TABLE 14-2 Means and t Values From the Five Significant Differences Found by Hess et al. (2003)

Test                                  t (df)
Written attention test                2.40 (18)
Motor speed                           -2.20 (31)
Verbal learning                       2.40 (34)
Verbal memory (immediate recall)      3.16 (49)
Verbal memory (delayed recall)        4.73 (44)

Means (group and row assignment not recoverable): 41.6, 91.4, 47.1, 37.9, 187

As presented, the data in Table 14-2 do not make much sense. All that you have are means and a t value (with its degrees of freedom) for each measure. You must decide if the t values are large enough to warrant a conclusion that the observed differences are statistically significant.

After calculating a t score, you compare its value with a critical value of t found in Table 2 of the Appendix. Before you can evaluate your obtained t value, however, you must obtain the degrees of freedom (for the between-subjects t test, df = N - 2, where N is the total number of subjects in the experiment). Once you have obtained the degrees of freedom (these are shown in parentheses in the fourth column of Table 14-2), you compare the obtained t score with the tabled critical value, a process requiring two steps. In Table 2 of the Appendix, first read down the column labeled "Degrees of Freedom" and find the number matching your degrees of freedom. Next, find the column corresponding to the desired alpha level (labeled "Alpha Level"). The critical value of t is found at the intersection of the degrees of freedom (row) and alpha level (column) of your test. If your obtained t score is equal to or greater than the tabled t score, then the difference between your sample means is statistically significant at the selected alpha level.

In some instances, you may find that the table you have does not include the degrees of freedom that you have calculated (e.g., 44). If this occurs, you can use the next lower degrees of freedom in the table. With 44 degrees of freedom, you would use the entry for 40 degrees of freedom in the table.

If you are conducting your t tests on a computer, most statistical packages will compute the exact p value for the test, given the obtained t and degrees of freedom. In that case, simply compare your obtained p values to your chosen alpha level. If p is less than or equal to alpha, the difference between your groups is statistically significant at the stated alpha level.

The z Test for the Difference Between Two Proportions. In some research, you may have to determine whether two proportions are significantly different. In a jury simulation in which participants return verdicts of guilty or not guilty, for example, your dependent variable might be expressed as the proportion of participants who voted guilty. A relatively easy way to analyze data of this type is to use a z test for the difference between two proportions. The logic behind this test is essentially the same as for the t tests. The difference between the two proportions is evaluated against an estimate of error variance.
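A z test for two proportions can be sketched in a few lines. The text gives no formula, so the version below uses the common pooled-proportion form as an assumption, and the verdict counts (30 of 50 versus 18 of 50 voting guilty) are hypothetical. The function name `two_proportion_z` is illustrative only.

```python
import math
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Return (z, two-tailed p) for the difference between two independent proportions.

    Uses the pooled proportion to estimate the standard error (assumed form).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_tailed


# Hypothetical jury simulation: number of guilty verdicts in each group of 50.
z, p = two_proportion_z(30, 50, 18, 50)
print("z =", round(z, 2), "p =", round(p, 4))

alpha = 0.05
print("significant at alpha = .05:", p <= alpha)
```

As with the t test, the decision rule is to compare the obtained p with the chosen alpha level.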

mental error, or by a combinarioiT of these (Graverrer 6< WallnaLi, 2007). The second component, the within-groups wormbility, Inay be attributed to error. This error can arise

froin either or both of two SOLirces: individual differences between subjects treated alike
within groLips and experimental error (Gravelter & Wallnati, 2007). Take ITore that

Beyond Two Groups: Analysis of Variance (ANOVA)

}eilyour experiment inclLides Inore ThaiT twoThaiT twostatistical the of ch ' your experiment inclLides Inore groups, the Tou s, test star'

variability caused by your Treatment effects is Liniqtie to the between-grotips vanahility The F Ratio The statistic LISed ii\ ANOVA to deterIn ine statistical significance Is the F ratio. The F ratio is simply The ratio of between-groups variahility to within groups variability. Both types of variahility that constitute the ratio are expressed as variances. (Chapter 13 described the variance as a Ineasure of spread. ) However statisticians perversely Insist o1\ calling the variance the mean square perhaps because the Tenri is more descriptive. ItISI as with The I statistic, once yon l\ave obtained youI F ratio, yoti compare it against a table of critical values to determine whether YouI
results are statisticalIy significant

p aria yzing t e variance that appears in the data. Forthis analysis, the va t'

, ariance sthei\amelmplies, ANOVAisbasedontheco-

scri e Tow variation is partitioned into sonrces and how the resultin so variationsareusedrocjj eTeSLltingsource
riation among means Is statisticalIy significant.

'ecrs experiment is rs: caracteristicsofthesubjecr arthetjm, ,hdetermined by three factors (1) characteristics of the sub'e arthetimethescorewasineasured, (Z)I 'resLijecr

artitioning Vuritttion The value of any particularscore obt ' d '

The One'Factor Between-Subjects ANOVA The one'factor between-subjects ANCVA is LISed when your experiment Includes only one factor (will\ Two or Inure levels) and has different subjects in eacl\ experi mental condition. As ai\ example, imagine yotil\ave conducted an experiment on
how well participants calT detect a signal against a background of noise, measured

\ w Tel\ a SII. jects are exposed to the same treatment conditions. S

independent variable is effective.

Figtire14-4showshowtherotalvariarioninrhescoresf t' lily among scores. f ' ''' OvaTiaiitymaybeattributable ina be artrib t 'ooneorm - fh Again, this total amount of variabilityy eattrjut,bl, Tooneorjnoreofthreefactors: d d .Y
experimenralerror(Gravener&Wallnati, 2007). '

p I itione into Two sources variability (between-groups variabiliry andwjjhin-groupsvariability).ofNoticetharth, , I, b g Upsvarja il, of wit in-groups variability). Notice that the example begins with a total amount ,
Theftrsrcomponentresulringfromthepartirionisthebt ' e etween-grotips variability may be catised by The variation in our ind d
, Tencesamongt e Ierentsubjectsinyourgroups, byexperi-

in decibels (db). Participants were exposed to different levels of background noise (no noise, 20 db, or 40 db) and asked to indicate whether or not they heard a tone. The number of times that the participant correctly stated that a tone was present represents your dependent variable. You found that participants in the no-noise group detected more of the tones (36.4) than participants in either the 20-db (23.8) or 40-db (16.0) groups. Table 14-3 shows the distributions for the three groups.

Submitting your data to a one-factor between-subjects ANOVA, you obtain an F ratio of 48.91. This F ratio is now compared with the appropriate critical value of F in Tables 3A and 3B in the Appendix. To find the critical value, you need to use the degrees of freedom for both the numerator (k - 1, where k is the number of groups) and denominator [k(s - 1), where s is the number of subjects in each group] of your F ratio. In this case, the degrees of freedom for the numerator and denominator are 2 and 12, respectively.

To identify the appropriate critical value for F (at α = .05), first locate the appropriate degrees of freedom for the numerator across the top of Table 3A. Then read down the left-hand column to find the degrees of freedom for the denominator. In this example, the critical value for F(2, 12) at α = .05 is 3.89. Because your obtained F ratio is greater than the tabled value, you have an effect significant at p < .05. In fact, if you look at the critical value for F(2, 12) at α = .01 (found in Table 3B), you will find your obtained F ratio is also significant at p < .01.

FIGURE 14-4 Partitioning total variation into between-groups and within-groups sources
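The degrees-of-freedom rule and the comparison against a tabled critical value can be sketched in a few lines of Python. This is only an illustration: the function name `one_way_anova` and the scores are hypothetical (they are not the Table 14-3 data), and the only value taken from the text is the critical value of 3.89 for F(2, 12) at α = .05.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-factor between-subjects design."""
    k = len(groups)                          # number of groups
    n_total = sum(len(g) for g in groups)    # total number of scores
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-groups sum of squares: group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: scores around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k                  # equals k(s - 1) when every group has s subjects
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical scores for three groups of five subjects (illustration only)
scores = [
    [8, 9, 7, 10, 9],
    [6, 5, 7, 6, 6],
    [3, 4, 2, 4, 3],
]

f_ratio, df_between, df_within = one_way_anova(scores)
CRITICAL_F_2_12_AT_05 = 3.89                 # tabled value quoted in the text

print(f"F({df_between}, {df_within}) = {f_ratio:.2f}")
print("significant at p < .05:", f_ratio > CRITICAL_F_2_12_AT_05)
```

With three groups of five subjects the sketch yields 2 and 12 degrees of freedom, matching the example in the text; the F value itself depends entirely on the made-up scores.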


When you report a significant effect, typically you express it in terms of a p value. Alpha refers to the cutoff point that you adopt. In contrast, the p value is the probability, computed from your data, of obtaining a result at least as extreme as the one you observed if the null hypothesis is true. Hence, for this example, you would report that your finding was significant at p < .05 or p < .01. The discussion in the following sections assumes the "p <" notation.







TABLE 14-3 Data From Hypothetical Signal-Detection Study

Condition | Scores | Sum | Sum of squares | Mean
No noise | 32, 37, ... | ... | 6,684 | 36.4
20 decibels | 25, 21, 27, ... | 119 | 2,855 | 23.8
40 decibels | 17, 14, 19, 11, 19 | 80 | 1,328 | 16.0

Sometimes the table of the critical values of F does not list the exact degrees of freedom for your denominator. If this happens, you can approximate the critical value of F by choosing the next lower degrees of freedom for the denominator in the table. Choosing this lower value provides a more conservative test of your F ratio.

Interpreting Your F Ratio. A significant F ratio tells you that at least some of the differences among your means are probably not caused by chance but rather by variation in your independent variable. The only problem, at this point, is that the F ratio fails to tell you where among the possible comparisons the reliable differences actually occur. To isolate which means differ significantly, you must conduct specific comparisons between pairs of means. These comparisons can be either planned or unplanned.

Planned Comparisons. Planned comparisons (also known as a priori comparisons) are used when you have specific preexperimental hypotheses. For example, you may have hypothesized that the no-noise group would differ from the 40-db group but not from the 20-db group. In this case, you would compare the no-noise and 40-db groups and then the no-noise and 20-db groups. These comparisons are made using information from your overall ANOVA (see Keppel, 1982). Separate F ratios (each having 1 degree of freedom) or t tests are computed for each pair of means. The resulting F ratios are then compared with the critical values of F in Tables 3A and 3B in the Appendix.

You can conduct as many of these planned comparisons as necessary. However, only a limited number of such comparisons yield unique information. For example, if you found that the no-noise and 20-db groups did not differ significantly and that the 40- and 20-db groups did, you have no reason to compare the no-noise and 40-db groups. You can logically infer that the no-noise and 40-db groups differ significantly. Those comparisons that yield new information are known as orthogonal comparisons. Any set of means has (k - 1) orthogonal comparisons, where k is the number of means.

Planned comparisons can be used in lieu of an overall ANOVA if you have highly specific preexperimental hypotheses. An alternative is to conduct multiple t tests. You should not perform too many of these comparisons even if the relationships were predicted before you conducted your experiment. Performing multiple tests on the same data increases the probability of making a Type I error across comparisons through a process called probability pyramiding.

Unplanned Comparisons. If you do not have a specific preexperimental hypothesis concerning your results, you must conduct unplanned comparisons (also known as post hoc comparisons). Unplanned comparisons are often "fishing expeditions" in which you are simply looking for differences, and you may need to perform a fairly large number of unplanned comparisons to fully analyze the data.

Two types of error rate must be considered with unplanned comparisons: the per-comparison error rate and the familywise error rate. The per-comparison error rate is the alpha for each comparison between means. If you set an alpha level of .05, the per-comparison error rate is .05. The familywise error rate (Keppel, 1982) takes into account the increasing probability of making at least one Type I error as the number of comparisons increases. It is computed with the following formula:

$\alpha_{FW} = 1 - (1 - \alpha)^{c}$

where c is the number of comparisons made. If you make four comparisons (c = 4) and α = .05, then

$\alpha_{FW} = 1 - (1 - .05)^{4} = 1 - .95^{4} = 1 - .815 = .185$

Special tests can be applied to control familywise error, but it is beyond the scope of this chapter to discuss each of them individually. Table 14-4 lists the tests most often used to control familywise error and gives a brief description of each. For more information about these tests, see Keppel (1982, chap. 8).

Sample Size. You can still use an ANOVA if your groups contain unequal numbers of subjects, but you must use adjusted computational formulas. The adjustments can take one of two forms, depending on the reasons for unequal sample sizes.

Unequal sample sizes may be simply a by-product of the way that you conducted your experiment. If you conducted your experiment by randomly distributing your materials or participants, for example, the group sizes may not come out equal. In such cases, unequal sample sizes do not result from the properties of your treatment conditions.

Unequal sample sizes also may result from the effects of your treatments. If one of your treatments is painful or stressful, participants may drop out of your experiment because of the aversive nature of that treatment. Death of animals in a group receiving stressful conditions is another example of subject loss related to experimental manipulations that results in unequal sample sizes.

If you end up with unequal sample sizes for reasons not related to the effects of your treatments, one option involves discarding data to equalize the groups; another is an unweighted means analysis, which adjusts the ANOVA. This analysis gives each group in your design equal weight in the analysis, despite unequal group sizes. If the inequality in sample sizes was planned or reflects actual differences in the population, you should instead use a weighted means analysis, in which each mean is weighted according to the number of subjects in its group, so that groups with fewer subjects provide means with lower weights. See Keppel (1973, 1982) or Gravetter and Wallnau (2007) for more on unequal sample size in ANOVA.

TABLE 14-4 Post Hoc Tests

TEST | USE | COMMENTS
Scheffé test | To keep familywise error rate constant regardless of the number of comparisons to be made | Very conservative test; Scheffé correction factor corrects for all possible comparisons, even if not all are made
Dunnett test | To contrast several experimental groups with a single control group | Not as conservative as the Scheffé test because only the number of comparisons made is considered in the familywise error rate correction
Tukey-a HSD test | To hold the familywise error rate constant over an entire set of two-group comparisons | Not as conservative as the Scheffé test for comparisons between pairs of means; less powerful than the Scheffé for more complex comparisons
Tukey-b WSD test | Alternative Tukey test | Not as conservative as Tukey's HSD test but more conservative than the Newman-Keuls test
Newman-Keuls test | To compare all possible pairs of means and control per-comparison error rate | Less conservative than the Tukey test; critical value varies according to the number of comparisons made
Ryan's test (REGWQ) | Modified Newman-Keuls test in which critical values decrease as the range between the highest and lowest means decreases | Controls familywise error better than the Newman-Keuls test but is less powerful than the Newman-Keuls test
Duncan test | To compare all possible pairs of means | Computed in the same way as the Newman-Keuls test; with more than two means to be compared, it is less conservative than the Newman-Keuls
Fisher test | To compare all possible combinations of means | Powerful test that does not overcompensate to control familywise error rate; no special correction factor used; significant overall F ratio justifies comparisons

A conservative test is one with which it is more difficult to achieve statistical significance than with a less conservative test. "Power" refers to the ability of a test to reject the null hypothesis when the null hypothesis is false.
SOURCE: Information in this table was summarized from Keppel, 1982, pp. 153-159; Pagano, 2007; Winer, 1971; and information found at http://www2.chass.ncsu.edu/garson/pa765/anova.htm

The One-Factor Within-Subjects ANOVA

If you used a one-factor within-subjects design in your experiment, the statistical test to use is the one-factor within-subjects ANOVA. As in the between-subjects analysis, the between-treatments variance reflects the effect of the level of the independent variable plus error. The within-subjects source of variance (S) also can be partitioned into two components: differences among subjects receiving the same treatment and experimental error. Subjects are therefore treated as a factor in the analysis (S). You then subtract S from the usual within-groups variance to obtain the denominator of the F ratio, thus making the F ratio more sensitive to the effects of the independent variable.

Latin square designs are used to counterbalance the order in which subjects receive the treatments. For information on the Latin square ANOVA, see Keppel.

A significant overall F ratio tells you that significant differences exist among your means, but, as usual, it does not tell you where these significant differences occur. To determine which means differ, you must further analyze your data. The tests used to compare individual means are similar to those used in the between-subjects analysis.
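The familywise error rate formula from the discussion of unplanned comparisons, 1 - (1 - α)^c, is easy to check numerically. The sketch below is illustrative only; the function name `familywise_error_rate` is an assumption of this example, and the formula treats the comparisons as independent.

```python
def familywise_error_rate(alpha, comparisons):
    """Probability of at least one Type I error across `comparisons` independent tests."""
    return 1 - (1 - alpha) ** comparisons


# Per-comparison alpha of .05 with an increasing number of comparisons
for c in (1, 2, 4, 10):
    print(f"c = {c:2d}: familywise error rate = {familywise_error_rate(0.05, c):.3f}")
```

For c = 4 this reproduces the .185 worked out in the text, and it shows how quickly the familywise rate grows as comparisons are added.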
The Two-Factor Between-Subjects ANOVA

Chapter 10 discussed the two-factor between-subjects design. In this design, you include two independent variables and randomly assign different subjects to each condition. You can then evaluate the effect of each independent variable (the main effects) and the combined effect of the two factors (the interaction) on the dependent variable. (If you are unclear about the meanings of these terms, review Chapter 10.) The analysis appropriate for this design, the two-factor ANOVA, is more involved than the one-factor analysis because it must determine the statistical significance of each main effect and of the interaction.

Main Effects and Interactions. If you find both significant main effects and interactions in your experiment, you must be careful about interpreting the main effects. A main effect is normally interpreted as an effect of one independent variable on the dependent variable, regardless of the level of your other independent variable. A significant interaction shows that neither of your independent variables has such a simple, independent effect. Consequently, you should avoid interpreting main effects when an interaction is present.

You should also be aware that certain kinds of interactions can cancel out the main effects, so that the statistical analysis will fail to reveal statistically significant main effects for these factors. The independent variables may have been effective, and yet the statistical analysis shows no main effects. Figure 14-5 shows the cell means for this hypothetical case.

The diagonal lines depict the functional relationship between Factor A and the dependent variable at the two levels of Factor B. The fact that the lines form an X (rather than being parallel) indicates the presence of an interaction. Notice that Factor A strongly affects the level of the dependent variable at both levels of Factor B but that these effects run in opposite directions.

The dashed line in Figure 14-5 represents the main effect of Factor A, obtained by averaging the upper and lower points to collapse across the levels of Factor B. This dashed line is horizontal, indicating that there is no change in the dependent variable across the levels of Factor A. Although Factor A affects the dependent variable at each level of Factor B, its average (main) effect is zero.

Logically, if the interaction of two variables is significant, then the two variables themselves have reliable effects. Consequently, if you have a significant interaction, ignore the main effects. The factors involved in the interaction are reliable whether or not the main effects are statistically significant.

Finally, most of the time you are more interested in a significant interaction than in main effects, even before your experiment is conducted. Hypothesized relationships among variables are often stated in terms of interactions. Interactions tend to be inherently more interesting than main effects. They show how changes in one variable alter the effects on behavior of other variables.

FIGURE 14-5 Graph showing a two-way interaction that masks main effects (horizontal axis: Factor A, Level 1 and Level 2; separate lines for Factor B, Level 1 and Level 2; dashed line: main effect of Factor A)

Sample Size. Just as with a one-factor ANOVA, you can compute a multifactor ANOVA with unequal sample sizes. The unweighted means analysis can be conducted on a design with two or more factors (the logic is the same). For details on modifications to the basic two-factor ANOVA formulas for weighted means and unweighted means analyses, see Keppel (1973, 1982).

ANOVA for a Two-Factor Between-Subjects Design: An Example. An experiment conducted by Doris Chang and Stanley Sue (2003) provides an excellent example of the application of ANOVA to the analysis of data from a two-factor experiment. Chang and Sue were interested in investigating how the race of a student affected a teacher's assessments of the student's behavior and whether those assessments were specific to certain types of issues. Teachers (163 women and 34 men) completed a survey on which they were asked to evaluate the behavior of three hypothetical children. Each survey included a photograph of either an Asian-American, an African-American, or a Caucasian child. The survey also included a short description of the child's behavior. The child's behavior was depicted as falling into one of three "problem" types: (1) "overcontrolled" (anxious to please and afraid of making mistakes), (2) "undercontrolled" (disobedient, disruptive, and easily frustrated), or (3) "normal" (generally follows rules, fidgets only occasionally, etc.). These two variables comprise the two independent variables in a 3 (race of child) × 3 (problem type) factorial design. The survey also included several measures on which teachers evaluated the child's behavior (e.g., seriousness, how typical the behavior was, attributions for the causes of the behavior, and academic performance).

We limit our discussion of the results to one of the dependent variables: typicality of the behavior. The data were analyzed with a two-factor ANOVA. The results showed a significant main effect of problem type, F(2, 368) = 46.19, p < .0001. Normal behavior (M = 6.10) was seen as more typical than either undercontrolled (M = 4.08) or overcontrolled (M = 4.34) behavior. The ANOVA also showed a statistically significant race by problem-type interaction, F(4, 368) = 7.37, p < .0001.

Interpreting the Results. This example shows how to interpret the results from a two-factor ANOVA. First, consider the two main effects. There was a significant effect of problem type on typicality ratings. Normal behavior was rated as more typical than overcontrolled or undercontrolled behavior. If this were the only significant effect, you could then conclude that race of the child had no effect on typicality ratings because the main effect of race was not statistically significant. However, this conclusion is not warranted because of the presence of a significant interaction between race of child and problem type.

The presence of a significant interaction suggests that the relationship between the two independent variables and your dependent variable is complex. Figure 14-6 shows the data contributing to the significant interaction in the Chang and Sue (2003) experiment. Analyzing a significant interaction like this one involves making comparisons among the means involved.

Because Chang and Sue (2003) predicted the interaction, they used planned comparisons (t tests) to contrast the relevant means. The results showed that the typicality of the Asian-American child's behavior was evaluated very differently from that of the Caucasian child and African-American child. Teachers saw the normal behavior of the Asian-American child as less typical than the normal behavior of either the Caucasian or African-American child. Teachers saw the overcontrolled behavior by the Asian-American child as more typical than the same behavior attributed to the African-American or Caucasian child. The undercontrolled behavior was seen as less typical for the Asian-American child than for the African-American and Caucasian children, respectively. So the race of the child did affect how participants rated the typicality of a behavior, but the nature of that effect depended on the type of behavior attributed to the child.

FIGURE 14-6 Graph showing an interaction between race and problem type (horizontal axis: problem type, with Normal, Overcontrolled, and Undercontrolled; separate lines for Asian-American, African-American, and Caucasian children)
SOURCE: Chang and Sue, 2003; reprinted with permission

The Two-Factor Within-Subjects ANOVA

All subjects in a within-subjects design with two factors are exposed to every possible combination of levels of your two independent variables. These designs are analyzed using a two-factor within-subjects ANOVA. This analysis applies the same logic developed for the one-factor within-subjects ANOVA. As in the one-factor case, subjects are treated as a factor along with your manipulated independent variables.

The major difference between the one- and two-factor within-subjects ANOVA is that you must consider the interaction between each of your independent variables and the subjects factor (A × S and B × S), in addition to the interaction between your independent variables (A × B). Because the basic logic and interpretation of results from a within-subjects ANOVA are essentially the same as for the between-subjects ANOVA, a complete example isn't given here. A complete example of the two-factor within-subjects ANOVA can be found in Keppel (1973).

Mixed Designs

In some situations, your research may call for a design mixing between-subjects and within-subjects components. This design was discussed briefly in Chapter 11. If you use such a design (known as a mixed or split-plot design), you can analyze your data with an ANOVA. The computations involve calculating sums of squares for the between factor and for the within factor.

The most complex part of the analysis is the selection of an error term to calculate the F ratios. The within-groups mean square is used to calculate the between-subjects F, whereas the interaction of the within factor with the within-groups variance is used to evaluate both the within-subjects factor and the interaction between the within-subjects and between-subjects factors. Keppel (1973, 1982) provides an excellent discussion of this analysis and a complete worked example.

Higher-Order and Special-Case ANOVAs

Variations of ANOVA exist for just about any design used in research. For example, you can include three or four factors in a single experiment and analyze the data with a higher-order ANOVA. In a three-factor ANOVA, for example, you can test three main effects (A, B, and C), three two-way interactions (AB, AC, and BC), and a three-way interaction (ABC). As you add factors, however, the computations become more complex and probably should not be done by hand. In addition, as discussed in Chapter 10, it may be difficult to interpret the higher-order interactions with more than four factors.

A special ANOVA is used when you have included a continuous correlational variable in your experiment (such as age). This type of ANOVA, called the analysis of covariance (ANCOVA), allows you to examine the relationship between experimentally manipulated variables while controlling another variable that may be correlated with them. Keppel (1973, 1982) provides clear discussions of these analyses and other issues relating to ANCOVA.
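The growth in the number of testable effects as factors are added to a higher-order ANOVA can be enumerated directly. The following is a small illustrative sketch; the function name `anova_effects` is hypothetical, and the factor labels A, B, and C follow the text's three-factor example.

```python
from itertools import combinations


def anova_effects(factors):
    """Group the testable effects of a fully crossed design by order.

    Order 1 holds the main effects, order 2 the two-way interactions, and so on.
    """
    return {
        order: ["".join(combo) for combo in combinations(factors, order)]
        for order in range(1, len(factors) + 1)
    }


effects = anova_effects(["A", "B", "C"])
for order, names in effects.items():
    print(order, names)

# A design with n factors has 2**n - 1 effects in total
print("total effects with four factors:",
      sum(len(v) for v in anova_effects(["A", "B", "C", "D"]).values()))
```

For three factors this lists the three main effects, three two-way interactions, and one three-way interaction named in the text; with four factors the total already reaches 15, which is one way to see why interpretation becomes difficult.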

To summarize, ANOVA is a powerful parametric statistic used to analyze one-factor experiments (either within-subjects or between-subjects) with more than two treatments and to analyze multifactor experiments. It is intended for use when your dependent variable is scaled on at least an interval scale. The assumptions that apply to the use of parametric statistics in general (such as homogeneity of variance and normally distributed sampling distribution) apply to ANOVA.

ANOVA involves forming a ratio between the variance caused by your independent variable plus experimental error and the variance (mean square) caused by experimental error alone. The resulting score is called an F ratio. A significant F ratio tells you that at least one of your means differs from the other means. Once a significant effect is found, you then perform more detailed analyses of the means contributing to the significant effect in order to determine where the significant differences occur. These tests become more complicated as the design of your experiment becomes more complex.

Thus far, this discussion has centered on parametric statistical tests. In some situations, however, you may not be able to use a parametric test. When your data do not meet the assumptions of a parametric test or when your dependent variable was scaled on a nominal or ordinal scale, consider a nonparametric test. This section discusses three nonparametric tests: chi-square, the Mann-Whitney U test, and the Wilcoxon signed-ranks test. You might consider using many other nonparametric tests. For a complete description of these, see Siegel and Castellan (1988). Table 14-5 summarizes some information on these and other nonparametric tests.

TABLE 14-5 Nonparametric Tests

One-Sample Tests
Tests: Binomial, ...
Scale of measurement: nominal, nominal, ordinal
Comments: Can be used as a more powerful alternative to chi-square

Two Independent Samples
Tests: Fisher exact probability, Kolmogorov-Smirnov, Wald-Wolfowitz runs, Moses test of extreme reactions, Randomization test, Mann-Whitney U
Scale of measurement: nominal, nominal, ordinal, ordinal, ordinal, interval; ordinal or above (Mann-Whitney U)
Comments: Alternative to chi-square when expected frequencies are small; more powerful than the Mann-Whitney U test; less powerful than the Mann-Whitney U test; tests the difference between means without assuming normality of data or homogeneity of variance (randomization test); good alternative to the t test when assumptions are violated (Mann-Whitney U)

Two Related Samples
Tests: ..., Wilcoxon matched pairs, Walsh test
Scale of measurement: nominal, ordinal, ordinal, interval
Comments: Good test when you have a before-after hypothesis; good when quantitative measures are not possible, but you can rank; good alternative to the t test when normality assumption is violated; good nonparametric alternative to the t test, data must be distributed symmetrically

Chi-Square

When your dependent variable is a dichotomous decision (such as yes/no or guilty/not guilty) or a frequency count (such as how many people voted for Candidate A and how many for Candidate B), the statistic of choice is chi-square (χ²). Versions of chi-square exist for studies with one and two variables. This discussion is limited to the two-variable case. For further information on the one-variable analysis, see either Siegel and Castellan (1988) or Roscoe (1975).

Chi-Square for Contingency Tables. Chi-square for contingency tables (also called the chi-square test for independence) is designed for frequency data in which the relationship, or contingency, between two variables is to be determined. In a voter preference study, for example, you might have measured sex of respondent in addition to candidate preference. You may want to know whether the two variables are related or independent. The chi-square test for contingency tables compares your observed cell frequencies (those you obtained in your study) with the expected cell frequencies (those you would expect to find if chance alone were operating).
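The comparison of observed with expected cell frequencies can be sketched as follows. This is a minimal illustration, not a full implementation: the function name `chi_square_independence` and the voter-preference counts are hypothetical, and no continuity correction is applied.

```python
def chi_square_independence(observed):
    """Chi-square test statistic for a contingency table given as a list of rows.

    Returns (chi_square, degrees_of_freedom, expected_frequencies).
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    # Expected frequency under independence: (row total * column total) / N
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    chi_square = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi_square, df, expected


# Hypothetical counts: rows = sex of respondent, columns = Candidate A, Candidate B
observed = [
    [30, 20],
    [15, 35],
]

chi_square, df, expected = chi_square_independence(observed)
print(f"chi-square({df}) = {chi_square:.2f}")
print("expected frequencies:", expected)
```

The obtained value would then be compared with a tabled critical value of chi-square for the same degrees of freedom, in the same way the F ratio is compared with its critical value.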


A study reported by Herbert Harari, Oren Harari, and Robert White (1985) provides an excellent example of the application of the chi-square test to the analysis of frequency data. Harari et al. investigated whether male participants would help the victim of a simulated rape. Previous research on helping behavior suggested that individuals are less likely to help someone in distress if they are with others than if they are alone. Harari et al. conducted a field investigation in which participants (walking alone or in noninteracting groups) were exposed to a mock rape (a male, one of the experimenters, grabs a female confederate and drags her away). Table 14-6 shows the frequencies of participants helping under the two conditions.

TABLE 14-6 Number of Participants Helping Mock Rape Victim, in Two Conditions
... | DID NOT INTERVENE | ...
... 26 ... 60
... 40 ... 40
SOURCE: Data from Harari, Harari, and White, 1985

The results from a chi-square test performed on these data showed a significant relationship between the decision to offer help and whether participants were alone or in groups. Participants in groups were actually more likely to help than those who were alone.

Limitations of Chi-Square. A problem arises if any of your expected cell frequencies is small: the value of chi-square may be artificially inflated (Gravetter & Wallnau, 2007). The Fisher exact probability test (see Roscoe, 1975, or Siegel & Castellan, 1988) is an alternative for a contingency table with small expected frequencies (Roscoe, 1975).

A significant chi-square tells you that your two variables are significantly related. In the example, all you know is that group size and helping are related. As with ANOVA, however, chi-square does not tell you where the significant differences occur when more than two categories of each variable exist. To determine the locus of the effect, you must conduct further tests on the contingency table.

TABLE 14-5 Nonparametric Tests (continued)

Two Related Samples (continued)
Tests: Randomization test for matched pairs

More Than Two Related Samples
Tests: Cochran Q test, Friedman two-way
Scale of measurement: nominal, ordinal
Comments: Most useful when data fall into natural dichotomous categories

More Than Two Independent Samples
Tests: ..., Kruskal-Wallis one-way
Scale of measurement: nominal, ordinal
Comments: Good alternative to a one-factor ANOVA when assumptions are violated

SOURCE: Data from Roscoe, 1975, and Siegel and Castellan, 1988

The Mann-Whitney U Test

Another powerful nonparametric test is the Mann-Whitney U test. The Mann-Whitney U test can be used when your dependent variable is scaled on at least an ordinal scale. It is also a good alternative to the t test when your data do not meet the assumptions of the t test (such as when the scores are not normally distributed, when the variances are heterogeneous, or when you have small sample sizes).

Calculation of the Mann-Whitney U test is fairly simple. The first step is to combine the data from your two groups. Scores are ranked (from highest to lowest) and labeled according to the group to which they belong. If there is a difference between your groups, the ranks will not be evenly distributed across the two groups. A U score is calculated for each group in your experiment. The lower of the two U scores obtained is then evaluated against critical values of U. If the lower of the two U scores is smaller than the tabled U value, you then conclude your two groups differ significantly.

The Wilcoxon Signed Ranks Test

If you conducted a single-factor experiment using a correlated-samples (related) or matched-pairs design, the Wilcoxon signed ranks test would be a good statistic to analyze your data. A difference score is computed for each pair of scores, and the difference scores are ranked (disregarding the sign of the difference score) from smallest to largest. Next, each rank is assigned a positive or negative sign, depending on whether the difference score was positive or negative. The positive and negative ranks are then summed. If the null hypothesis is true, the sums of the positive and negative ranks should be about equal. However, if the sums of the positive and negative ranks are very different, then the null hypothesis can be rejected. For more information on the Wilcoxon signed ranks test, see Siegel and Castellan (1988).
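The ranking steps behind the Mann-Whitney U test and the Wilcoxon signed ranks test can be sketched in code. Everything here is illustrative: the function names and scores are hypothetical, tied scores receive the average of their ranks, and zero differences are dropped in the Wilcoxon sketch (a common convention that the text does not discuss). The sketch ranks from smallest to largest; ranking in the opposite direction, as described for the Mann-Whitney test, simply swaps the two U scores and leaves the lower one unchanged.

```python
def average_ranks(values):
    """Rank values from smallest (rank 1) to largest, averaging the ranks of ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1        # mean of ranks i+1 .. j+1
        for position in range(i, j + 1):
            ranks[order[position]] = shared_rank
        i = j + 1
    return ranks


def mann_whitney_u(group1, group2):
    """Return the lower of the two U scores for two independent groups."""
    ranks = average_ranks(list(group1) + list(group2))
    n1, n2 = len(group1), len(group2)
    rank_sum_1 = sum(ranks[:n1])
    u1 = rank_sum_1 - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    return min(u1, u2)


def wilcoxon_signed_ranks(first, second):
    """Return (sum of positive ranks, sum of negative ranks) for paired scores."""
    diffs = [b - a for a, b in zip(first, second) if b != a]   # drop zero differences
    ranks = average_ranks([abs(d) for d in diffs])
    positive = sum(r for d, r in zip(diffs, ranks) if d > 0)
    negative = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return positive, negative


# Hypothetical data for illustration only
u = mann_whitney_u([12, 15, 11, 18, 14], [22, 25, 17, 24, 20])
signed = wilcoxon_signed_ranks([10, 12, 9, 14, 11, 13], [13, 11, 14, 14, 15, 19])
print("lower U:", u)
print("positive and negative rank sums:", signed)
```

In both cases the resulting statistic would still have to be evaluated against the appropriate tabled critical value, which this sketch does not include.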

Parametric Versus Nonparametric Statistics

N\IT arametric statistics are LISeful when yoLir data do ITot meet the assumptions
f atametric statistics. If yoLi have a choice. cl\QOSe a parametric statistic over a



IJ, ,._.,,. . .," ..,'.' ' '~



,,..,. ....... ~

"-^ ~,.* .~- "-~ ,...*. - .. ~

' 'fLL. ,\. ,..,. .., A .., ~"'

..". *


CHAPTER 14 , Using InferentialStaristics



nonparametric one becaLise parametric statistics are generally more powerful. That ,
ua y provi us a more sensitive rest of the null hypothesis

secon problein with noriparametric statistics is that appropriate versions

tailed test. This can be easily demonstrated by looking at the critical values oftfound in Table 2 in the Appendix. At 20 degrees offTeedom, the critical value at u = .05 for
a one' tailed test is 1.73. For a two-tailed test, the critical value is 2.09. It is thus easier

One' Tailed VetsMS Two, Toiled Tests A two, tailed restisless powerful than a one'

Ouhjd I q Y, Wenesigningyourstudy,
parametric statistic calT be used.

to Teect the ITUll hypothesis with the one' tailed test than with the two-tailed test.

chan us the value of the dependent variable is Termed the effect size. To facilitate

Effect Size The degree to which the manipulation of your independent variable Ie, the effecrsize for the difference betweei\ two treatment means might be reported as (Mz - Mj)/s, where s is the pooled sample standard deviation (Cohen, 1988).


comparison across variables and experiments, effect size is usually reported as a PTO-

ortion of the variation in scores within the treatments under comparison; for exam-

forward. Howe , If YPP"'SimPeansrraight,

amerric or nonparametric statistic, when using any inferential statistic. This section

The application of The appropriare Inferentialstatisric ina a co

Measured in this way, effect size estimates the amount of overlap between the two
o ulation distributions from which the samples were drawn. Large effect sizes ind'cate relativeI little overlap: The mean of Population Z lies far into one tail of the

iscusses some special topics to consider when deciding on a ,,. , evaluated, strategy to statisticalI
Power of a Statistical Test

distribution of Population I, so a Teal difference in population means is likely to be detected in the inferentialtest (good power). Small effect sizes indicate great overlap
in the o Ination distributions and Thus, everything else being equal, relatively little ower. However, because inferGrillaltesrs rely on the sampling distribution of the test
statistic rather than the population distributions, you may be able to improve power in such cases by, for example, increasing the sample size.

Inferential statistics are designed to help you determine the validity of the null hypothesis. Consequently, you want your statistic to detect differences in your data that are inconsistent with the null hypothesis. The power of a statistical test is its ability to detect these differences. Put in statistical terms, power is a statistic's ability to correctly reject the null hypothesis (Gravetter & Wallnau, 2007). A powerful statistic is more sensitive to differences in your data than a less powerful one.

The issue of the power of your statistical test is an important one. Rejection of the null hypothesis implies that your independent variable affected your dependent variable. [...] You want to be sure that a failure to reject the null hypothesis is not caused by a lack of power in your statistical test. The power of your statistical test is affected by your chosen alpha level, the size of your sample, [...]

Alpha Level

As you reduce your alpha level (e.g., from .05 to .01), you reduce the probability of making a Type I error. Adopting a more conservative alpha level makes it more difficult to reject the null hypothesis. Unfortunately, it also reduces power. Given a constant error variance, a larger difference between means is required to obtain statistical significance with a more conservative alpha level.

Sample Size

The power of your statistical test increases with the size of your sample [...] In particular, the standard errors of the means from your treatments will be lower, so the [...] positions of the population means fall within narrower bounds. Consequently, you are more likely to reject the null hypothesis when it is false.

Determining Power

Because the business of inferential statistics is to allow you to decide whether or not to reject the null hypothesis, the issue of power is important. You want to be reasonably sure that your decision is correct. Failure to achieve statistical significance in your experiment (thus not rejecting the null hypothesis) can be caused by many factors. Your independent variable actually may have no effect, or your experiment may have been carried out so poorly that the effect was buried in variance. Or maybe your statistic simply was not powerful enough to detect the difference, or you did not use enough subjects.

Although alpha (the probability of rejecting the null hypothesis when it is true) can be set directly, it is not so easy to determine what the power of your analysis will be. However, you can work backward from a desired amount of power to estimate the sample sizes required for a study. To calculate these estimates, you must be willing to state the amount of power required, the magnitude of the difference that you expect to find in your experiment, and the expected error variance.

The expected difference between means and the expected error variance can be estimated from pilot research, from theory, or from previous research in your area. For example, if previous research has found a small effect of your independent variable (e.g., 2 points), you can use this value as your estimate. There is no agreed-on acceptable or desirable level of power (Keppel, 1982). If you are willing and able to specify the values mentioned, however, you can estimate the size of the sample needed to detect differences of a given magnitude in your research. (See Gravetter & Wallnau, 2007, or Keppel, 1982, for a discussion on how to estimate the required sample size.)
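
Working backward from a desired level of power to a sample size can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses a common normal-approximation formula for a two-tailed comparison of two independent groups, not the procedure given by Gravetter and Wallnau or by Keppel. The 2-point difference echoes the example in the text; the error standard deviation of 5 points and the power of .80 are hypothetical values, as is the function name `n_per_group`.

```python
import math
from statistics import NormalDist


def n_per_group(expected_diff, error_sd, alpha=0.05, power=0.80):
    """Approximate subjects needed per group (two groups, two-tailed test).

    Normal approximation: n = 2 * ((z_alpha + z_power) * sd / diff) ** 2.
    All arguments are values the researcher must be willing to state.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the chosen alpha
    z_power = z.inv_cdf(power)          # value corresponding to the desired power
    n = 2 * ((z_alpha + z_power) * error_sd / expected_diff) ** 2
    return math.ceil(n)


# Hypothetical inputs: expected difference of 2 points, error SD of 5 points.
n_05 = n_per_group(expected_diff=2.0, error_sd=5.0, alpha=0.05)
n_01 = n_per_group(expected_diff=2.0, error_sd=5.0, alpha=0.01)
print("alpha = .05:", n_05, "per group")
print("alpha = .01:", n_01, "per group")
```

Under these assumptions the more conservative alpha level calls for a larger sample to keep the same power, which is the trade-off described under Alpha Level.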


Too much power can be as bad as too little. If you ran enough subjects, you could conceivably find statistical significance in even the most minute and trivial of differences. Similarly, when you use a correlation, you can achieve statistical significance even with small correlations if you include enough subjects. Consequently, your sample should be large enough to be sensitive to differences between treatments but not so large as to produce significant but trivial results.

The possibility of your results being statistically significant and yet trivial may seem strange to you. If so, the next section may clarify this concept.
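
The point about small correlations can be made concrete with a short sketch. It assumes the usual t statistic for testing a correlation against zero and, for simplicity, approximates the t distribution with the normal distribution (reasonable for the large samples of interest here). The correlation of .05 and the sample sizes are hypothetical.

```python
import math
from statistics import NormalDist


def correlation_p_value(r, n):
    """Two-tailed p for H0: rho = 0, normal approximation to the t test for r."""
    t = r * math.sqrt((n - 2) / (1 - r ** 2))
    return 2 * (1 - NormalDist().cdf(abs(t)))


r = 0.05  # hypothetical correlation; it accounts for only r**2 = 0.25% of the variance
for n in (100, 1_000, 10_000):
    p = correlation_p_value(r, n)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"n = {n:>6}  r = {r}  p = {p:.4f}  {verdict} at .05")
```

The correlation itself never changes; only the number of subjects does, and with enough of them the same trivial relationship crosses the .05 threshold.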

[...] that you determine is reasonable for your purposes [...]

[...] more reliable than significant results obtained [...]

Statistical Versus Practical Significance

To say that results are significant (statistically speaking) merely indicates that the observed differences between sample means are probably reliable, not the result of chance. Confusion arises when you give the word significant its more common meaning. Something "significant" in this more common sense is important or worthy of note.

The fact that the treatment means of your experiment differ significantly may or may not be important. If the difference is predicted by a particular theory and not by others, then the finding may be important because it supports the theory over the others. The finding also may be important if it shows that one variable strongly affects another. Such findings may have practical implications by demonstrating, for example, the superiority of a new therapeutic technique. In such cases, a statistically significant (i.e., reliable) finding also may have practical significance.

Advertisers sometimes purposely blur the distinction between statistical and practical significance. A few years ago, Bayer aspirin announced the results of a "hospital study on pain other than headache." Evidently, groups of hospital patients were treated with Bayer aspirin and with several other brands. The advertisement glossed over the details of the study, but apparently the patients were asked to rate the severity of their pain at some point after taking Bayer or Brand X (the identities of both brands were probably concealed). According to the ad, "the results were significant - Bayer was better." However, the ad did not say in what way the results were significant. Evidently, the results were statistically significant and thus probably not caused by chance. Without any information about the pain ratings, however, you do not know if this finding has any practical significance. It may be that the Bayer and Brand X group ratings differed by less than 1 point on a 10-point scale. Although this average difference may have been reliable, it also may be the case that no individual could tell the difference between two pains so close together on the scale. In that case, the statistically significant difference would have no practical significance and would provide no reason for choosing Bayer over other brands of aspirin.
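
One way to keep the two kinds of significance apart is to report a standardized effect size alongside the test result. The sketch below is an illustration only. The ratings are hypothetical numbers, not data from the aspirin study, and the calculation assumes two equal-sized groups with a known common standard deviation, so that a simple z test applies. The function name `z_and_d` and the use of Cohen's d as the effect-size measure are choices made for this example.

```python
import math
from statistics import NormalDist


def z_and_d(mean_a, mean_b, sd, n_per_group):
    """Return (z, two-tailed p, Cohen's d) for two groups with a common SD."""
    diff = mean_a - mean_b
    d = diff / sd                                  # effect size: does not depend on n
    z = diff / (sd * math.sqrt(2 / n_per_group))   # test statistic: grows with n
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p, d


# Hypothetical pain ratings on a 10-point scale: the groups differ by half a point.
for n in (20, 1000):
    z, p, d = z_and_d(mean_a=4.5, mean_b=5.0, sd=2.0, n_per_group=n)
    print(f"n per group = {n:>4}  z = {z:6.2f}  p = {p:.4f}  d = {d:.2f}")
```

With 1,000 patients per group the half-point difference is statistically significant, yet the effect size is exactly what it was with 20 patients per group. Whether half a point matters to a patient is a separate, practical question.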

[...] were testing the effectiveness of [...] more serious than a Type [...]

[...] not. If you retain the null hypothesis when it is [...] convicted as a result [...] balance between Type I [...]

[...] Unfortunately, most journals will not publish [...] significant at least at the p < .05 level. Chapter 3 [...] discussion of publication practices.
Data Transformations

[...] transform your data with the appropriate data transformation. Transforming data means converting your original data into a new set of scores. Simple transformations can be accomplished by adding or subtracting a constant from each score. Subtracting a constant from each score can make the numbers more manageable, and adding a constant to each score might remove negative numbers. [...] the shape of the distribution doesn't change. The mean of the distribution changes, but the standard deviation does not. These transformations, called linear transformations, simply change the magnitude of the numbers representing your data, not the scale of measurement.

[...] assumptions. If your data do not meet these assumptions, you could choose a different statistic [...]
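
The effect of adding a constant is easy to verify for yourself. The following sketch uses a small set of made-up scores (an assumption for illustration) and Python's standard `statistics` module.

```python
from statistics import mean, stdev

scores = [-3, -1, 0, 2, 7]            # hypothetical raw scores, some negative
shifted = [x + 3 for x in scores]     # add a constant so that no score is negative

print("raw:    ", scores, " mean =", mean(scores), " sd =", round(stdev(scores), 3))
print("shifted:", shifted, " mean =", mean(shifted), " sd =", round(stdev(shifted), 3))
```

The mean moves by exactly the constant that was added, while the standard deviation is untouched, because every score keeps the same distance from every other score.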

The Meaning of the Level of Significance

In the behavioral sciences, an alpha level of .05 (or 1 chance in 20) is usually considered the maximum acceptable rate for Type I errors. This level provides reasonable protection against Type I errors while also maintaining a reasonable level of power for most analyses. Of course, if you want to guard more strongly against Type I errors, you can adopt a more stringent alpha level, such as the .01 level (1 chance in 100).
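
What "1 chance in 20" means can be illustrated with a small simulation. The sketch below is built on assumptions chosen for simplicity: both groups are drawn from the same normal population (so the null hypothesis is true by construction), the population standard deviation is treated as known, and a two-tailed z test is used. The function name, group size, number of simulated experiments, and random seed are all arbitrary choices for this example.

```python
import random
from statistics import NormalDist


def type_i_error_rate(alpha, n_per_group=20, n_experiments=2000, seed=1):
    """Proportion of simulated null experiments that reject H0 at the given alpha."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    standard_error = (2 / n_per_group) ** 0.5   # population sigma = 1 is known here
    rejections = 0
    for _ in range(n_experiments):
        group_a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        group_b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        diff = sum(group_a) / n_per_group - sum(group_b) / n_per_group
        if abs(diff / standard_error) > z_crit:
            rejections += 1                     # a Type I error: H0 is true here
    return rejections / n_experiments


rate_05 = type_i_error_rate(0.05)
rate_01 = type_i_error_rate(0.01)
print("proportion rejected at .05:", rate_05)
print("proportion rejected at .01:", rate_01)
```

Because the same seed is used for both calls, the two rates come from the same simulated experiments, so the stricter alpha level can only reject a subset of the cases rejected at .05.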

[...] transformations and the conditions under which [...]

[...] single experiment. When you reject the null hypothesis at p < .05, it means that a [...] would occur only [...] probably not due to chance but rather to the effect of the [...]

[...] have data that badly violate the assumptions of a parametric statistic, with no appropriate nonparametric statistic to use instead, [...] the reliability of your data by replication. [...] each replication. Replication does [...] original experiment [...] parameters within the original context. The new experiment will provide a [...]

[...] small-n designs or situations in which violations of assumptions occur. You can include an element of replication in [...] your own [...]

TABLE 14-7 Data Transformations and Uses

Square root
  Formula: $X' = \sqrt{X}$
  Use: When cell means and variances are related, this transformation makes variances more homogeneous; also, if data show a moderate positive skew

Arcsine
  Formula: $X' = 2 \arcsin \sqrt{X}$
           $X' = 2 \arcsin \sqrt{X \pm 1/(2n)}$ (b)
  Use: When basic observations are proportions and have a binomial distribution

Log
  Formula: $X' = \log X$
           $X' = \log(X + 1)$ (c)
  Use: Normalizes data with severe positive skew

(a) Formula used if basic observations are frequencies or if values of X are small.
(b) Formula used if values of X are close to 0 or 1.
(c) Formula used if value of X is equal to or near 0.
SOURCE: Information summarized from Tabachnick and Fidell, 2001, and Winer, 1971.
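
The basic formulas in Table 14-7 translate directly into code. The sketch below is illustrative only: the function names are invented for this example, the table does not state the base of the logarithm (base 10 is assumed here), and the arcsine function expects a proportion between 0 and 1.

```python
import math


def square_root(x):
    """X' = sqrt(X)"""
    return math.sqrt(x)


def arcsine(proportion):
    """X' = 2 arcsin sqrt(X), for proportions between 0 and 1."""
    return 2 * math.asin(math.sqrt(proportion))


def log_transform(x, offset=0.0):
    """X' = log(X + offset); use offset=1 when X is equal to or near 0. Base 10 assumed."""
    return math.log10(x + offset)


print([square_root(x) for x in (0, 1, 4, 9)])        # frequencies or skewed scores
print([round(arcsine(p), 3) for p in (0.1, 0.5, 0.9)])  # proportions
print([log_transform(x, offset=1) for x in (0, 9, 99)])  # scores that include 0
```

Whichever transformation you apply, it must be applied to every score in the data set, and the analysis is then carried out on the transformed values.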

Data transformations to make data conform to the assumptions of a statistic are being used less and less frequently (Keppel, 1973). ANOVA, perhaps the most commonly used inferential statistic, appears to be very robust against even moderately serious violations of the assumptions underlying the test. For example, Winer (1971) has demonstrated that even if the within-cell variances vary by a 3:1 ratio, the F test is not seriously biased. Transformations of the data may not be necessary in these cases. Also, when you transform your data, your conclusions must be based on the transformed scale and not the original. In most cases, this is not a problem. However, Keppel (1973) provides an example in which a square root transformation changed significantly the relationship between two means. Prior to transformation, the mean for Group 1 was lower than the mean for Group 2. The opposite was true after transformation.

Use data transformations only when absolutely necessary because they can be tricky. Sometimes transformations of data correct one aspect of the data (such as restoring normality) but induce new violations of assumptions (such as heterogeneity of variance). If you must use a data transformation, before going forward with your analysis, check to be sure that the transformation had the intended effect.
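
Both cautions can be checked numerically before you commit to an analysis. The sketch below uses made-up scores throughout (they are not Keppel's data). The first part computes a simple moment-based skewness coefficient before and after a log transformation to confirm that the skew was actually reduced. The second part shows how a square root transformation can reverse the order of two group means when one group contains an extreme score.

```python
import math


def skewness(values):
    """Moment-based skewness: third central moment divided by variance ** 1.5."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((x - m) ** 2 for x in values) / n
    m3 = sum((x - m) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


def average(values):
    return sum(values) / len(values)


# Part 1: did the log transformation have the intended effect on skew?
raw = [1, 2, 4, 8, 16, 32, 64, 128, 256]          # hypothetical, strongly skewed
logged = [math.log10(x) for x in raw]
print("skew before:", round(skewness(raw), 2))
print("skew after: ", round(skewness(logged), 2))

# Part 2: a transformation can change which group mean is larger.
group_1 = [9, 9, 9, 9]                             # hypothetical, homogeneous
group_2 = [1, 1, 1, 36]                            # hypothetical, one extreme score
sqrt_1 = [math.sqrt(x) for x in group_1]
sqrt_2 = [math.sqrt(x) for x in group_2]
print("raw means: ", average(group_1), average(group_2))
print("sqrt means:", average(sqrt_1), average(sqrt_2))
```

On the raw scale Group 1 has the lower mean; on the square root scale it has the higher one. This is why conclusions must be stated in terms of the transformed scale.
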

Alternatives to Inferential Statistics

Inferential statistics are tools to help you make a decision about the null hypothesis. Essentially, inferential statistics provide you with a way to test the reliability of a [...]

[...] the result of "noise." Inferential statistics can [...] human tendency to interpret every apparent [...] were meaningful [...] therefore may fail [...] clearly shown by replication. A case in point is provided by [...] series of experiments conducted by [...]




[...] effect of predictable versus unpredictable shock schedules on pain sensitivity. In each experiment, three groups of eight rats were exposed to a schedule of predictable shock, [...] for pain sensitivity by means of the "tail-flick" test. In the tail-flick test, a hot beam of light was focused on the rat's tail. The length of time elapsing until the rat flicked its tail out of the beam (a protective reflex) indicated the degree of pain sensitivity.

[...] sensitive than the group exposed to predictable shock. However, this effect was not statistically significant (p > .05). The parameters of the experiment were twice altered in ways that were expected to increase the size of the predictability effect (if it existed), and the experiment was replicated. However, each replication produced virtually the identical result. On each occasion, the unpredictable shock group demonstrated less sensitivity to pain than the predictable shock group, and each time this difference was not statistically significant.

The problem could be dealt with by taking measures to increase the [...] to do so would appear to be a waste of resources. In this case, the reliability [...] analysis itself indicated that the results were probably not reliable. [...] replicating a well-established finding [...]

[...] in a particular way simply because a particular inferential statistic is available to analyze [...] such a design. Much like designing your experiment before developing hypotheses, [...] your independent variable the way [...] then select the method of analysis (whether inferential statistic or replication) that works best for that design.

This chapter has reviewed some of the basics of inferential statistics. Inferential statistics go beyond simple description of results. They allow you to determine whether the differences observed in your sample are reliable. Inferential statistics allow you to make a decision about the viability of the null hypothesis (which states that there is [...] null hypothesis [...] probability of rejecting the [...]

[...] ANOVA) make assumptions about [...] For example, these tests assume that the sampling distribution of means is normal and [...] parametric statistics are designed [...] violate the assumptions of a parametric test or your data are scaled on a nominal or ordinal scale, a nonparametric statistic can be used (such as chi-square or the Mann-Whitney U test). These tests are usually easier to compute than parametric tests. However, they are less powerful and more limited in application. Nonparametric statistics may not be available for higher-order factorial designs.

Statistical significance indicates that the difference between your means was unlikely if only chance were at work. It suggests that your independent variable had an effect. Two factors contribute to a statistically significant effect: the size of the difference between means and the variability among the scores. You can have a large difference between means, but if the variability is high, you may not find statistical significance. Conversely, you may have a very small difference and find a significant effect if the variability is low.

Consider the power of your statistical test when evaluating your results. If you do not find statistical significance, perhaps no differences exist. Or it could mean that your test was not sensitive enough to pick up small differences that do exist. Sample size is an important contributor to power. Generally, the larger the sample, the more powerful the statistic. This is because larger samples are more representative of the underlying populations than are small samples. Use a sample that is large enough to be sensitive to differences but not so large as to be oversensitive. There are methods for determining optimal sample sizes for a given level of power. However, you must be willing and able to specify an expected magnitude of the treatment effect, an estimate of error variance, and the desired power. The first two can be estimated from pilot data or previous research. Unfortunately, there is no agreed-on acceptable level of power.

An alpha level of .05 is the largest generally acceptable level for Type I errors. This value has been chosen because it represents a reasonable compromise between Type I and Type II errors. In some cases (such as in applied research), the .05 level may be too conservative. However, journals probably will not publish results that fail to reach the conventional level of significance.

Data transformations are available for those situations in which your data are in some way abnormal. You may transform data if the numbers are large and unmanageable or if your data do not meet the assumptions of a statistical test. The transformation of data to meet assumptions of a test, however, is being done less frequently because inferential statistics tend to be robust against the effects of even moderately severe violations of assumptions. Transformations should be used sparingly because they change the nature of the variables of your study.

1. Why are sampling distributions important in inferential statistics?
2. What is sampling error, and why is it important to know about?
3. What are degrees of freedom, and how do they relate to inferential statistics?
4. How do parametric and nonparametric statistics differ?
5. What is the general logic behind inferential statistics?
6. How are Type I and Type II errors related?
7. What does statistical significance mean?
8. When should you use a one-tailed or a two-tailed test?
9. What are the assumptions underlying parametric statistics?
10. Which parametric statistics would you use to analyze data from an experiment with two groups? Identify which statistic would be used for a particular type of design or data.
11. Which parametric statistic is most appropriate for designs with more than one level of a single independent variable?
12. When would you do a planned versus an unplanned comparison, and why?
13. What is the difference between weighted and unweighted means analysis, and when would you use each?
14. What are a main effect and an interaction, and how are they analyzed?
15. Under what conditions would you use a nonparametric statistic?
16. What is meant by the power of a statistical test, and what factors can affect it?
17. Does a statistically significant finding always have practical significance? Why or why not?
18. When are data transformations used, and what should you consider when using one?
19. What are the alternatives to inferential statistics for evaluating the reliability of data?

Key Terms

inferential statistics
standard error of the mean
degrees of freedom (df)
Type I error
Type II error
alpha level (α)
critical region
t test
t test for independent samples
t test for correlated samples
z test for the difference between two proportions
F ratio
p value
planned comparisons
unplanned comparisons
per-comparison error
familywise error
analysis of covariance (ANCOVA)
chi-square (χ²)
Mann-Whitney U test
Wilcoxon signed ranks test
effect size
data transformation
analysis of variance (ANOVA)

Using Multivariate Design and Analysis

Correlational a
Multivariate De
Correlational
Causal Inferer
Assumptions a
Multivariate St
Normality ant
Multicollinear
Error of Meas
Sample Size
Multivariate St
Factor Analys
Partial and P
Multiple Regr
Discriminant
Canonical Co
Multivariate I
Multiway Fret
Path Analysis
Structural Eq
Multivariate Ar
Note
Review Questio

During discussions of experimental and nonexperimental design, previous chapters assumed that only one dependent variable was included in a design or that multiple dependent variables were treated separately in any statistical tests. This approach to analysis is called a univariate strategy. Although many research questions can be addressed with a univariate strategy, others are best addressed by considering variables together in a single analysis. When you include two or more dependent measures in a single analysis, you are using a multivariate strategy.

This chapter introduces the major multivariate analysis techniques. Keep in mind that providing an in-depth introduction to these techniques in the confines of one chapter is impossible. Such a task is better suited to an entire book. Also, the complex and laborious calculations needed to compute multivariate statistics are better left to computers. Consequently, this chapter does not discuss the mathematics behind these statistical tests except for those cases in which some mathematical analysis is required to understand the issues. Instead, this chapter focuses on practical issues: applications of the various statistics, the assumptions that must be met, and interpretation of results. If you want to use any of the statistics discussed in this chapter, read Using Multivariate Statistics (Tabachnick & Fidell, 2001) or one of the many monographs published by Sage Publications [...]

A multivariate design is a research design in which multiple dependent or multiple predictor and/or criterion variables are included. Analysis of data from such designs requires special statistical procedures. Multivariate design and analysis apply to both experimental and correlational research studies. The following sections describe some of the available multivariate statistical tests.