


CHAPTER 14
Using Inferential Statistics

CHAPTER OUTLINE
Inferential Statistics: Basic Concepts
Sampling Distribution
Sampling Error
Degrees of Freedom
Parametric Versus Nonparametric Statistics
The Logic Behind Inferential Statistics
Statistical Errors
Statistical Significance
One-Tailed Versus Two-Tailed Tests
Parametric Statistics
Assumptions Underlying Parametric Statistics
Inferential Statistics With Two Samples
The t Test
Contrasting Two Groups
The z Test for the Difference Between Two Proportions
The One-Factor Between-Subjects ANOVA
The One-Factor Within-Subjects ANOVA
The Two-Factor Between-Subjects ANOVA
The Two-Factor Within-Subjects ANOVA
Mixed Designs ANOVAs
Nonparametric Statistics
Chi-Square
The Mann-Whitney U Test
Data Transformations
Alternatives to Inferential Statistics

Chapter 13 reviewed descriptive statistics that help you characterize and describe your data. However, they do not help you determine whether your results are reliable: a reliable finding is one that would emerge again if the study were repeated, whereas an unreliable one may not be. Statistics that assess the reliability of your findings are called inferential statistics because they let you infer the characteristics of a population from the characteristics of the samples comprising your data. This chapter reviews the most widely used inferential statistics. Rather than focusing on how to calculate these statistics, this discussion focuses on issues of application and interpretation.

INFERENTIAL STATISTICS: BASIC CONCEPTS

Before exploring some of the more popular inferential statistics, we present some of the basic concepts underlying these statistics. You should understand these concepts before tackling the discussion on inferential statistics that follows. If you need a more comprehensive refresher on these concepts, consult a good introductory statistics text.

Sampling Distribution

A distribution can be created by taking each score from a population and then plotting the frequency of each score or range of scores. It is also possible to create a distribution by repeatedly taking samples of a given size (e.g., n = 10 scores) from the population. The means of these samples could be used to form a distribution of sample means. If you could take every possible sample of n scores from the population, you would have what is known as the sampling distribution of the mean. Statistical theory reveals that this distribution will tend to closely approximate the normal distribution, even when the population of scores from which the samples were drawn is far from normal


in shape. Thus, you can use the normal distribution as a theoretical model that will allow you to make inferences about the likely value of the population mean, given the mean of a single sample from that population.

The sample mean is not the only statistic for which you can obtain a sampling distribution. In fact, each sample statistic has its own theoretical sampling distribution. For example, the tabled values for the z statistic, Student's t, the F ratio, and chi-square represent the sampling distributions of those statistics. Using these sampling distributions, you can determine the probability that a value of a statistic as large as or larger than the obtained value could have occurred by chance. This probability is called the obtained p.

Sampling Error

When you draw a sample from a population of scores, the mean of the sample, M, will probably differ from the population mean, μ. An estimate of the amount of variability in the expected sample means across a series of such samples is provided by the standard error of the mean (or standard error for short). It may be calculated from the standard deviation of the sample as follows:

    s_M = s / √n

where s is the standard deviation of the sample and n is the number of scores in the sample. The standard error is used to estimate the standard deviation of the sampling distribution of the mean for the population from which the sample was drawn.

Degrees of Freedom

In any distribution of scores with a known mean, a limited number of data points yield independent information. For example, if you have a sample of 10 scores and a known mean (e.g., 6.5), only 9 scores are free to vary. That is, once you have selected 9 scores from the population, the 10th must have a particular value that will yield the mean. Thus, the degrees of freedom (df) for a single sample are n - 1 (where n is the total number of scores in the sample).

Degrees of freedom come into play when you use any inferential statistic. You can extend this logic to the analysis of an experiment. If you have three groups in your experiment with means of 2, 5, and 10, the grand mean (the sum of all the scores divided by n) is then 5.7. If you know the grand mean and you know the means from two of your groups, the final mean is set. Hence, the degrees of freedom for a three-group experiment are k - 1 (where k is the number of levels of the independent variable). The degrees of freedom are then used to find the appropriate tabled value of a statistic against which the computed value is compared.

Parametric Versus Nonparametric Statistics

A parametric statistic estimates the value of a population parameter from the characteristics of a sample (Gravetter & Wallnau, 2007). When you use a parametric statistic, you are making certain assumptions about the population from which your sample was drawn. A key assumption of a parametric test is that your sample was drawn from a normally distributed population. Nonparametric statistics, in contrast, make no assumptions about the distribution of scores underlying your sample. Nonparametric statistics are used if your data do not meet the assumptions of a parametric test.

THE LOGIC BEHIND INFERENTIAL STATISTICS

Whenever you conduct an experiment, you expose subjects to different levels of your independent variable. Although a given experiment may contain several groups, assume for the present discussion that the experiment in question includes only two. The data from each group can be viewed as a sample of the scores obtained if all subjects in the target population were tested under the conditions to which the group was exposed. For example, the treatment group mean represents a population of subjects exposed to your experimental treatment. Each treatment mean is assumed to represent the mean of the underlying population.

In all respects except for treatment, the treatment and control groups were exposed to equivalent conditions. Assume that the treatment had no effect on the scores. In that case, each group's scores could be viewed as an independent sample taken from the same population. Figure 14-1 illustrates this situation.

FIGURE 14-1 Line graphs showing the relationship between samples and population, assuming that the treatment had no effect on the dependent variable (M1, mean of Sample 1; M2, mean of Sample 2).
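The idea of a sampling distribution of the mean can be made concrete with a short simulation. The sketch below (plain Python; the exponential population and all specific numbers are invented for illustration) draws repeated samples of n = 10 from a decidedly non-normal population and checks that the spread of the resulting sample means comes close to the standard error formula s_M = s/√n:

```python
import random
import statistics

random.seed(42)

# Hypothetical population: exponentially distributed scores (far from normal).
population = [random.expovariate(1 / 20) for _ in range(100_000)]

# Draw many samples of n = 10 and record each sample mean.
n = 10
sample_means = [
    statistics.mean(random.sample(population, n)) for _ in range(5_000)
]

# The standard deviation of the sample means (the empirical standard error)
# should lie close to sigma / sqrt(n), per the formula in the text.
sigma = statistics.pstdev(population)
theoretical_se = sigma / n ** 0.5
empirical_se = statistics.stdev(sample_means)

print(f"theoretical SE: {theoretical_se:.2f}")
print(f"empirical SE:   {empirical_se:.2f}")
```

Plotting `sample_means` as a histogram would also show the near-normal shape the text describes, even though the parent population is heavily skewed.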


Each sample mean provides an independent estimate of the population mean. Each sample standard error provides an independent estimate of the standard deviation of sample means in the sampling distribution of means. Because the two means were drawn from the same population, you would expect them to differ only because of sampling error. From this information, you can calculate the probability that the two sample means would differ as much as or more than they do simply because of chance factors. This probability is the obtained p.

Let's review these points. If the treatment had no effect on the scores, then you would expect the scores from the two groups to provide independent samples from the same population. From these samples, you can estimate the characteristics of that population; from these estimates, you can determine the probability of obtaining the observed difference between the two treatment means.

Consider the case in which the treatment does affect the scores, perhaps by shifting them upward. Figure 14-2 illustrates this situation. In the upper part of the figure are two population distributions: one underlying the control group sample distribution and another one underlying the treatment group sample distribution. The population distribution underlying the treatment group is shifted upward and away from the control group population distribution. This shift could be obtained by simply adding a constant to each value in the control group distribution. This new shifted distribution resembles the old unshifted distribution in standard deviation, but its mean is higher.

The bottom part of the figure shows two possible sample distributions: one for the control group and one for the treatment group. The scores from the control group still constitute a sample from the unshifted distribution (left-hand upper curve in Figure 14-2), but the scores from the treatment group now constitute a sample from the shifted distribution (right-hand upper curve in Figure 14-2). The two sample means provide estimates of two different population means. Because of sampling error, the two sample means might or might not differ even though a difference exists between the underlying population means.

Your problem (as a researcher) is that you do not know whether the treatment really had an effect on the scores. You must decide this based on your observed sample means (which may differ by a certain amount) and the sample standard deviations. From this information, you must decide whether the two sample means were drawn from the same population (the treatment had no effect on the sample scores) or from two different populations (the treatment shifted the scores relative to scores from the control group). Inferential statistics help you make this decision.

These two possibilities (different or the same populations) can be viewed as statistical hypotheses to be tested. The hypothesis that the means were drawn from the same population (i.e., μ1 = μ2) is referred to as the null hypothesis (H0). The hypothesis that the means were drawn from different populations (μ1 ≠ μ2) is called the alternative hypothesis (H1). Inferential statistics use the characteristics of the two samples to evaluate the validity of the null hypothesis. Put another way, they assess the probability that the means of the two samples would differ by the observed amount or more if they had been drawn from the same population of scores. If this probability is sufficiently small (i.e., if it is very unlikely that two samples this different would be drawn by chance from the same population), then the difference between the sample means is said to be statistically significant, and the null hypothesis is rejected.

FIGURE 14-2 Line graphs showing the relationship between samples and population, assuming that the treatment had an effect on the dependent variable.

Statistical Errors

When making a comparison between two sample means, there are two possible states of affairs (the null hypothesis is true or it is false) and two possible decisions you can make (not to reject the null hypothesis or to reject it). In combination, these conditions lead to four possible outcomes, as shown in Table 14-1. The labels across the top of Table 14-1 indicate the two states of affairs, and those in the left-hand column indicate the two possible decisions. Each box represents a different combination of the two conditions.


The lower left-hand box represents the situation in which the null hypothesis is true (the independent variable had no effect), and you correctly decide not to reject the null hypothesis. This is a disappointing outcome, but at least you made the right decision.


The upper left-hand box represents a more disturbing outcome. Here the null hypothesis is again true, but you have incorrectly decided to reject the null hypothesis. In other words, you decided that your independent variable had an effect when in fact it did not. In statistics this mistake is called a Type I error. In signal-detection


experiments, the same kind of mistake is called a "false alarm" (saying that a stimulus was present when actually it was not).

The lower right-hand box represents the opposite mistake. In this case, the null hypothesis is false (the independent variable did have an effect), but you have decided not to reject it. In other words, you decided that your independent variable had no effect when it really did have one. In statistics this mistake is called a Type II error; in signal-detection experiments, such an outcome is called a "miss" (not detecting a stimulus that was present). Ideally, you would like to avoid both types of error, but attempts to reduce the probability of a Type I error actually increase the probability of a Type II error, and vice versa.

TABLE 14-1 Statistical Errors

                           TRUE STATE OF AFFAIRS
DECISION                   H0 TRUE             H0 FALSE
Reject H0                  Type I error        Correct decision
Do Not Reject H0           Correct decision    Type II error

Statistical Significance

If both samples came from the same population (or from populations having the same mean), then the null hypothesis is true, and any difference between the sample means reflects nothing more than sampling error. The actual difference between your sample means may be just such a chance difference, or it may reflect a real difference between the means of the populations from which the samples were drawn. You do not know which of these is the case. To help you decide, you can compute an inferential statistic to estimate the probability of obtaining a difference between sample means as large as or larger than the difference you actually got, under the assumption that the null hypothesis is true.

Computing the statistic yields an observed value. This observed value is compared to a critical value of that statistic (normally found in a statistical table such as those in the Appendix). Whether the difference is statistically significant depends on whether or not the observed value of the statistic meets or exceeds the critical value.

As stated, you want to be able to reduce the probability of committing a Type I error. The probability of committing a Type I error depends on the criterion you use to accept or reject the null hypothesis. This criterion, known as the alpha level (α), is adopted by you. The alpha level that you adopt (along with the degrees of freedom) also determines the critical value of the statistic that you are using. By convention, the minimum acceptable alpha level is .05. You could reduce the probability of a Type I error still further by adopting a far more conservative alpha level, such as .000001. There are good reasons, discussed later, why you do not ordinarily adopt such a conservative alpha level.

A difference between means yielding an observed value of a statistic that meets or exceeds the critical value of your inferential statistic is said to be statistically significant. The strategy of looking up the critical value of a statistic in a table and comparing the obtained value with this critical value was developed in an era when exact probabilities were impractical to compute by hand. Most statistical software now provides the exact probability value p along with the obtained value of the test statistic, so you can compare the exact p with the chosen alpha level and avoid having to use the relevant table. If the obtained p is less than or equal to alpha, your comparison is statistically significant.

One-Tailed Versus Two-Tailed Tests

Where the critical region of the sampling distribution lies will depend on whether the test is one-tailed or two-tailed. Figure 14-3 shows the critical regions for the two kinds of tests on a sampling distribution of the mean. The left distribution shows the critical region (shaded area) for a one-tailed test, assuming alpha was set to .05. This region contains 5% of the total area under the curve. Differences between means extreme enough that the observed value of the statistic falls within the critical region are judged to be statistically significant, because you would be unlikely to have obtained such a difference by chance if the null hypothesis were true. A one-tailed test thus checks only one tail of the distribution of the statistic. This is usually the case when your research hypotheses are directional.
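The 5% regions just described can be verified numerically. This sketch (Python standard library only; the function name normal_cdf is mine) checks that roughly 5% of the standard normal distribution lies above z = 1.65, and roughly 5% lies beyond z = ±1.96:

```python
from math import erf, sqrt

def normal_cdf(z: float) -> float:
    """P(Z <= z) for a standard normal variable, via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# One-tailed test at alpha = .05: the whole 5% critical region sits in one
# tail, above z = 1.65 (more precisely, 1.645).
one_tailed_area = 1 - normal_cdf(1.65)

# Two-tailed test at alpha = .05: the critical region is split, with 2.5%
# in each tail, beyond z = +/-1.96.
two_tailed_area = 2 * (1 - normal_cdf(1.96))

print(f"area above z = 1.65:      {one_tailed_area:.4f}")
print(f"area beyond z = +/-1.96:  {two_tailed_area:.4f}")
```

Note that the two-tailed cutoff (1.96) is farther from zero than the one-tailed cutoff (1.65), which is why a two-tailed test demands a larger difference to reach significance at the same alpha level.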

FIGURE 14-3 Graphs showing critical regions for one-tailed and two-tailed tests of statistical significance (one-tailed critical value: z = 1.65; two-tailed critical values: z = ±1.96; α = .05).

A one-tailed test is appropriate when you care about a difference in only one direction. For example, you may want to know whether a new therapy is measurably better than the standard one. However, if the new therapy is not better, then you really do not care whether it is simply as good as the standard method or is actually worse. You would not use it in either case.

In contrast, you would conduct a two-tailed test if you wanted to know whether the new therapy was either better or worse than the standard method. In that case, you need to check whether your obtained statistic falls into either tail of the distribution. The major implication of all this is that for a given alpha level you must obtain a greater difference between the means of your two treatment groups to reach statistical significance if you use a two-tailed test than if you use a one-tailed test. The one-tailed test is therefore more likely to detect a real difference if one is present (i.e., it is more powerful). However, using the one-tailed test means giving up any information about the reliability of a difference in the other, untested direction.

The use of one-tailed versus two-tailed tests has been a controversial topic among statisticians. Strictly speaking, you must choose which version you will use before you see the data. You must base your decision on such factors as practical considerations (as in the therapy example), your hypothesis, or previous knowledge. If you wait until after you have seen the data and then base your decision on the direction of the obtained outcome, your actual probability of falsely rejecting the null hypothesis will be greater than the stated alpha value. You have used information contained in the data to make your decision, but that information may itself be the result of chance processes and unreliable. If you conduct a two-tailed test and then fail to obtain a statistically significant result, the temptation is to find some excuse why you "should have done" a one-tailed test. You can avoid this temptation if you adopt the following rule of thumb: Always use a two-tailed test unless there are compelling a priori reasons not to.

PARAMETRIC STATISTICS

As noted earlier, there are two types of inferential statistics: parametric and nonparametric. The type that you apply to your data depends on the scale of measurement used and how your data are distributed. This section discusses parametric inferential statistics.

Assumptions Underlying Parametric Statistics

Three assumptions underlie parametric inferential tests (Gravetter & Wallnau, 2007): (1) the scores have been sampled randomly from the population, (2) the sampling distribution of the mean is normal, and (3) the within-groups variances are homogeneous. Assumption 3 means the variances of the different groups are highly similar. (In statistical inference, the independent variable is assumed to affect the mean but not the variance.) Serious violation of one or more of these assumptions may bias the statistical test. Such bias will lead you to commit a Type I error either more or less often than the stated alpha probability and thus undermine the value of the statistic as a guide to decision making. The effects of violations of these assumptions are examined later in more detail during the discussion of the statistical technique known as the analysis of variance.

Inferential Statistics With Two Samples

Imagine you have conducted an experiment to determine whether "death qualifying" a jury (i.e., removing any jurors who could not vote for the death penalty) affects how simulated jurors perceive a criminal defendant. Participants in your experimental group were death qualified whereas those in your control group were not. Participants then rated, on a scale from 0 to 10, the likelihood that the defendant was guilty as charged of the crime. You run your experiment and then compute a mean for each group. You find the two means differ from one another (the experimental group mean is 7.2, and the control group mean is 4.9).

Your means may represent a single population and differ only because of sampling error. Or your means may reliably represent two different populations. Your task is to determine which of these two conditions is true. Is the observed difference between means reliable, or does it merely reflect sampling error? This question can be answered by applying the appropriate statistical test, which in this case is a t test.

The t Test

The t test is used when your experiment includes only two levels of the independent variable (as in the jury example). Special versions of the t test exist for designs involving independent samples (e.g., randomized groups) and for those involving correlated samples (e.g., matched-pairs designs and within-subjects designs).

The t Test for Independent Samples  You use the t test for independent samples when you have data from two groups of participants who were assigned at random to the two groups. The test comes in two versions, depending on the error term selected. The unpooled version computes an error term based on the standard error of the mean provided separately by each sample. The pooled version computes an error term based on the two samples combined, under the assumption that both samples come from populations having the same variance. The pooled version may be more sensitive to the effect of the independent variable, but it should be avoided if there are large differences in sample sizes and standard errors. Under these conditions, the probability estimates provided by the pooled version may be misleading.
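The pooled error term described above can be sketched in a few lines of plain Python. The group means below match the jury example (7.2 vs. 4.9), but the individual ratings are invented for illustration, as is the helper name pooled_t:

```python
from statistics import mean, stdev

def pooled_t(sample1, sample2):
    """Independent-samples t test with a pooled error term.

    Assumes both samples come from populations with the same variance,
    as the pooled version described in the text does.
    """
    n1, n2 = len(sample1), len(sample2)
    m1, m2 = mean(sample1), mean(sample2)
    v1, v2 = stdev(sample1) ** 2, stdev(sample2) ** 2
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se = (pooled_var * (1 / n1 + 1 / n2)) ** 0.5
    df = n1 + n2 - 2          # df = N - 2 for the between-subjects t test
    return (m1 - m2) / se, df

# Hypothetical guilt ratings (0-10) for death-qualified vs. control jurors.
death_qualified = [8, 7, 6, 8, 7]    # mean 7.2
control = [5, 4, 6, 4, 5.5]          # mean 4.9

t, df = pooled_t(death_qualified, control)
print(f"t({df}) = {t:.2f}")
```

The resulting t would then be compared with the critical value of t for the obtained degrees of freedom, exactly as described in the text.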


The t Test for Correlated Samples  When the two means being compared come from samples that are not independent of one another, the formula for the t test must take the correlation between the samples into account. Scores may be correlated because they come from the same participant or from single observations taken on each of a matched pair of participants; matched-pairs designs and within-subjects designs meet this requirement.

If the scores from the two samples are at least moderately correlated, the t test for correlated samples produces a larger t value than would be obtained by applying an independent-samples t test to the same data. If the two samples are uncorrelated, the correlated samples and independent samples t tests (pooled version) are identical; because of its reduced degrees of freedom, however, the correlated samples t test is then less able than the independent samples t test to detect any effect of the independent variable.

An Example From the Literature: Contrasting Two Groups

Spinal cord injuries (SCI) represent a major source of physical disabilities. Accidents producing SCI often involve rapid deceleration of the body and may result in mild traumatic brain injury (MTBI), which can bring problems with processing information (Hess et al., 2003). Hess et al. note that when a patient with an SCI is rushed into treatment, an accompanying MTBI may go unrecognized, in part because it is sometimes difficult to distinguish the effects of MTBI from the emotional trauma associated with SCI.

David Hess, Jennifer Marwitz, and Jeffrey Kreutzer (2003) conducted a study to differentiate between patients with MTBI (without SCI) and patients with SCI. Participants were patients with SCI or MTBI who had been treated at a medical center. They completed tests assessing attention (two tests), motor speed, verbal learning, verbal memory (two tests), visuospatial skills, and word fluency, as well as emotional well-being. Mean scores were computed on each measure; Table 14-2 presents the five significant differences found.

TABLE 14-2 Means and t Values From the Five Significant Differences Found by Hess et al. (2003)

TEST                               SCI      MTBI     t (df)
Attention                                            2.40 (18)
Motor speed                                          -2.20 (31)
Verbal learning                                      2.40 (34)
Verbal memory (immediate recall)                     3.16 (49)
Verbal memory (delayed recall)                       4.73 (44)

As presented, the data in Table 14-2 do not make much sense. All that you have are means and a t value (with its degrees of freedom) for each measure. You must decide if the t values are large enough to warrant a conclusion that the observed differences are statistically significant.

After calculating a t score, you compare its value with a critical value of t found in Table 2 of the Appendix. Before you can evaluate your obtained t value, however, you must obtain the degrees of freedom (for the between-subjects t test, df = N - 2, where N is the total number of subjects in the experiment). Once you have obtained the degrees of freedom (these are shown in parentheses in the fourth column of Table 14-2), you compare the obtained t score with the tabled critical value, a process requiring two steps. In Table 2 of the Appendix, first read down the column labeled "Degrees of Freedom" and find the number matching your degrees of freedom. Next, find the column corresponding to the desired alpha level (labeled "Alpha Level"). The critical value of t is found at the intersection of the degrees of freedom (row) and alpha level (column) of your test. If your obtained t score is equal to or greater than the tabled t score, then the difference between your sample means is statistically significant at the selected alpha level.

In some instances, you may find that the table you have does not include the degrees of freedom that you have calculated (e.g., 44). If this occurs, you can use the next lower degrees of freedom in the table. With 44 degrees of freedom, you would use the entry for 40 degrees of freedom in the table.

If you are conducting your t tests on a computer, most statistical packages will compute the exact p value for the test, given the obtained t and degrees of freedom. In that case, simply compare your obtained p values to your chosen alpha level. If p is less than or equal to alpha, the difference between your groups is statistically significant at the stated alpha level.

The z Test for the Difference Between Two Proportions  In some research, you may have to determine whether two proportions are significantly different. In a jury simulation in which participants return verdicts of guilty or not guilty, for example, your dependent variable might be expressed as the proportion of participants who voted guilty. A relatively easy way to analyze data of this type is to


use a z test for the difference between two ro ortion . evaluatedagainstanestimateof Poporrionsjs against an estimate of error variance.

mental error, or by a combinarioiT of these (Graverrer 6< WallnaLi, 2007). The second component, the within-groups wormbility, Inay be attributed to error. This error can arise

froin either or both of two SOLirces: individual differences between subjects treated alike

within groLips and experimental error (Gravelter & Wallnati, 2007). Take ITore that

}eilyour experiment inclLides Inore ThaiT twoThaiT twostatistical the of ch ' your experiment inclLides Inore groups, the Tou s, test star'

variability caused by your Treatment effects is Liniqtie to the between-grotips vanahility The F Ratio The statistic LISed ii\ ANOVA to deterIn ine statistical significance Is the F ratio. The F ratio is simply The ratio of between-groups variahility to within groups variability. Both types of variahility that constitute the ratio are expressed as variances. (Chapter 13 described the variance as a Ineasure of spread. ) However statisticians perversely Insist o1\ calling the variance the mean square perhaps because the Tenri is more descriptive. ItISI as with The I statistic, once yon l\ave obtained youI F ratio, yoti compare it against a table of critical values to determine whether YouI

results are statisticalIy significant

p aria yzing t e variance that appears in the data. Forthis analysis, the va t'

scri e Tow variation is partitioned into sonrces and how the resultin so variationsareusedrocjj eTeSLltingsource

riation among means Is statisticalIy significant.

'ecrs experiment is rs: caracteristicsofthesubjecr arthetjm, ,hdetermined by three factors (1) characteristics of the sub'e arthetimethescorewasineasured, (Z)I 'resLijecr

The One'Factor Between-Subjects ANOVA The one'factor between-subjects ANCVA is LISed when your experiment Includes only one factor (will\ Two or Inure levels) and has different subjects in eacl\ experi mental condition. As ai\ example, imagine yotil\ave conducted an experiment on

how well participants calT detect a signal against a background of noise, measured

independent variable is effective.

Figtire14-4showshowtherotalvariarioninrhescoresf t' lily among scores. f ' ''' OvaTiaiitymaybeattributable ina be artrib t 'ooneorm - fh Again, this total amount of variabilityy eattrjut,bl, Tooneorjnoreofthreefactors: d d .Y

experimenralerror(Gravener&Wallnati, 2007). '

p I itione into Two sources variability (between-groups variabiliry andwjjhin-groupsvariability).ofNoticetharth, , I, b g Upsvarja il, of wit in-groups variability). Notice that the example begins with a total amount ,

Theftrsrcomponentresulringfromthepartirionisthebt ' e etween-grotips variability may be catised by The variation in our ind d

, Tencesamongt e Ierentsubjectsinyourgroups, byexperi-

in decibels (db). Participants were exposed to different levels of background noise (no noise, 20 db, or 40 db) and asked to indicate whether or not they heard a tone. The number of times that the participant correctly stated that a tone was present represents your dependent variable. You found that participants in the no-noise group detected more of the tones (M = 36.4) than participants in either the 20-db (M = 23.8) or 40-db (M = 16.0) groups. Table 14-3 shows the distributions for the three groups.

Submitting your data to a one-factor between-subjects ANOVA, you obtain an F ratio of 48.91. This F ratio is now compared with the appropriate critical value of F in Tables 3A and 3B in the Appendix. To find the critical value, you need to use the degrees of freedom for both the numerator (k - 1, where k is the number of groups) and denominator [k(s - 1), where s is the number of subjects in each group] of your F ratio. In this case, the degrees of freedom for the numerator and denominator are 2 and 12, respectively.

[Margin figure: total variation is partitioned into between-groups and within-groups sources of variability.]

To identify the appropriate critical value for F (at alpha = .05), first locate the appropriate degrees of freedom for the numerator across the top of Table 3A. Then read down the left-hand column to find the degrees of freedom for the denominator. In this example, the critical value for F(2, 12) at alpha = .05 is 3.89. Because your obtained F ratio is greater than the tabled value, you have an effect significant at p < .05. In fact, if you look at the critical value for F(2, 12) at alpha = .01 (found in Table 3B), you will find your obtained F ratio is also significant at p < .01.

When you report a significant effect, you typically express it in terms of a p value. Alpha refers to the cutoff point that you adopt. In contrast, the p value refers to the actual probability of making a Type I error given that the null hypothesis is true. Hence, for this example, you would report that your finding was significant at p < .05 or p < .01. The discussion in the following sections assumes the "p <" notation.
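The one-factor between-subjects ANOVA just described can be sketched directly from its defining sums of squares. The code below uses the Table 14-3 scores; the hand-computed F of 48.90 matches the chapter's reported 48.91 within rounding.

```python
# One-factor between-subjects ANOVA computed by hand on the Table 14-3 data.
groups = {
    "no noise": [33, 39, 41, 32, 37],
    "20 db":    [22, 24, 25, 21, 27],
    "40 db":    [17, 14, 19, 11, 19],
}
scores = [x for g in groups.values() for x in g]
grand_mean = sum(scores) / len(scores)

# Between-groups SS: squared deviations of each group mean from the grand
# mean, weighted by group size.
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                 for g in groups.values())
# Within-groups SS: squared deviations of scores from their own group mean.
ss_within = sum((x - sum(g) / len(g)) ** 2
                for g in groups.values() for x in g)

k = len(groups)        # number of groups
n = len(scores) // k   # subjects per group
df_between, df_within = k - 1, k * (n - 1)
ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_ratio = ms_between / ms_within
print(f"F({df_between}, {df_within}) = {f_ratio:.2f}")  # F(2, 12) = 48.90
```

Because 48.90 exceeds the tabled critical value of 3.89 for F(2, 12) at alpha = .05, the decision agrees with the one reached from the tables.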

TABLE 14-3  Data for the Three Noise Conditions

        NO NOISE    20 DECIBELS    40 DECIBELS
        33          22             17
        39          24             14
        41          25             19
        32          21             11
        37          27             19
Sum X   182         119            80
Sum X2  6,684       2,855          1,328
M       36.4        23.8           16.0

Sometimes the table of the critical values of F does not list the exact degrees of freedom for your denominator. If this happens, you can approximate the critical value of F by choosing the next lower degrees of freedom for the denominator in the table. Choosing this lower value provides a more conservative test of your F ratio.

Interpreting Your F Ratio  A significant F ratio tells you that at least some of the differences among your means are probably not caused by chance but rather by variation in your independent variable. The only problem, at this point, is that the F ratio fails to tell you where among the possible comparisons the reliable differences actually occur. To isolate which means differ significantly, you must conduct specific comparisons between pairs of means. These comparisons can be either planned or unplanned.

Planned Comparisons  Planned comparisons (also known as a priori comparisons) are used when you have specific preexperimental hypotheses. For example, you may have hypothesized that the no-noise group would differ from the 40-db group but not from the 20-db group. In this case, you would compare the no-noise and 40-db groups and then the no-noise and 20-db groups. These comparisons are made using information from your overall ANOVA (see Keppel, 1982). Separate F ratios (each having 1 degree of freedom) or t tests are computed for each pair of means. The resulting F ratios are then compared with the critical values of F in Tables 3A and 3B in the Appendix. Planned comparisons can be used in lieu of an overall ANOVA if you have highly specific preexperimental hypotheses, that is, if the relationships were predicted before you conducted your experiment.

You can conduct as many of these planned comparisons as necessary. However, only a limited number of such comparisons yield unique information. For example, if you found that the no-noise and 20-db groups did not differ significantly and that the 40- and 20-db groups did, you have no reason to compare the no-noise and 40-db groups. You can logically infer that the no-noise and 40-db groups differ significantly. Those comparisons that yield new information are known as orthogonal comparisons. You should not perform too many of these comparisons even so: Performing multiple tests on the same data increases the probability of making a Type I error across comparisons through a process called probability pyramiding (see the next section).

Unplanned Comparisons  If you do not have a specific preexperimental hypothesis concerning your results, you must conduct unplanned comparisons (also known as post hoc comparisons) after obtaining a significant overall F. You may need to conduct a large number of unplanned comparisons to fully analyze the data. Each comparison carries a risk of a Type I error. If you set an alpha level of .05, the per-comparison error rate is .05. The familywise error rate (Keppel, 1982) takes into account the increasing probability of making a Type I error as the number of comparisons increases. It is calculated with the following formula, where c is the number of comparisons made:

    alpha(familywise) = 1 - (1 - alpha)^c

For example, with four comparisons each conducted at alpha = .05,

    1 - (1 - .05)^4 = 1 - .95^4 = 1 - .815 = .185

Special tests can be applied to control familywise error, but it is beyond the scope of this chapter to discuss each of them individually. Table 14-4 lists the tests most often used to control familywise error and gives a brief description of each. For more information about these tests, see Keppel (1982, chap. 8).

Sample Size  You can still use an ANOVA if your groups contain unequal numbers of subjects, but you must use adjusted computational formulas. The adjustments can take one of two forms, depending on the reasons for the unequal sample sizes. Unequal sample sizes may be a simple by-product of the way that you conducted your experiment, as when you randomly distribute a pool of participants among your groups without forcing the group sizes to be equal. In such cases, unequal sample sizes do not result from the properties of your treatment conditions. Unequal sample sizes also may result from the effects of your treatments. If one of your treatments is painful or stressful, participants may drop out of your experiment because of the aversive nature of that treatment; the death of animals in a group receiving a dangerous treatment can have the same effect.
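The familywise error formula above is easy to check numerically. This short sketch reproduces the chapter's four-comparison example.

```python
# Familywise Type I error rate for c comparisons each run at a given alpha,
# following the formula in the text: alpha_fw = 1 - (1 - alpha)^c.
def familywise_error(alpha, c):
    return 1 - (1 - alpha) ** c

# With alpha = .05, four comparisons push the familywise rate to about .185,
# the value computed in the chapter.
print(round(familywise_error(0.05, 4), 3))  # 0.185
```

Note how quickly the rate grows: even a handful of uncorrected comparisons more than triples the nominal .05 risk.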

When sample sizes are unequal, you must also decide whether to weight each group mean by its sample size. If the unequal sample sizes were planned or simply reflect how participants happened to be distributed, each mean can be given equal weight in the analysis despite the unequal sample sizes (an unweighted means analysis). If the unequal sample sizes reflect actual differences in the population, you should weight each mean according to the number of subjects in its group, so that means based on fewer subjects receive lower weights (a weighted means analysis). See Keppel (1973, 1982) for discussions of weighted and unweighted means analyses and other issues surrounding unequal sample size in ANOVA.

TABLE 14-4  Post Hoc Tests

TEST               USE                                        COMMENTS
Scheffe test       To keep the familywise error rate          Very conservative test; the Scheffe
                   constant regardless of the number          correction considers all possible
                   of comparisons                             comparisons, even if not all are made
Dunnett test       To contrast several experimental           Not as conservative as the Scheffe test
                   groups with a single control group         because only the number of comparisons
                                                              actually made is considered in the
                                                              familywise error rate correction
Tukey HSD test     To compare all possible pairs of           More powerful than the Scheffe test for
                   means while holding the familywise         comparisons between pairs of means; less
                   error rate constant                        powerful than the Scheffe for more complex
                                                              comparisons; controls familywise error
                                                              better than the Newman-Keuls test but is
                                                              more conservative
Newman-Keuls test  To compare all possible pairs of           Less conservative than the Tukey test;
                   means                                      the critical value varies according to the
                                                              number of comparisons made and decreases
                                                              as the distance between the highest and
                                                              lowest means decreases
Duncan test        To compare all possible pairs of           Similar to the Newman-Keuls test but with
                   means                                      more power; it is less conservative than
                                                              the Newman-Keuls test
Fisher test        To compare all possible                    Does not compensate to control the
                   combinations of means                      familywise error rate; no special
                                                              correction factor is used; a significant
                                                              overall F ratio justifies the comparisons

Note: A conservative test is one with which it is more difficult to achieve statistical significance than with a less conservative test. "Power" refers to the ability of a test to reject the null hypothesis when the null hypothesis is false.
SOURCE: Information in this table was summarized from Keppel, 1982, pp. 153-159; Pagano, 2007; Winer, 1971; and information found at http://www2.chass.ncsu.edu/garson/pa765/anova.htm

The One-Factor Within-Subjects ANOVA  If you used a single-factor within-subjects design, you analyze your data with the one-factor within-subjects ANOVA. It differs from the one-factor between-subjects ANOVA in how the error term is formed. Because every subject is exposed to every level of the independent variable, the variability attributable to individual differences among subjects can be identified: The variance not accounted for by the treatments can be partitioned into two sources, one reflecting overall differences among subjects (S) and one reflecting the interaction of subjects with treatments. You then subtract S from the usual within-groups variance used in the denominator of the F ratio, thus making the F ratio more sensitive to the effects of the treatments.

As in the between-subjects analysis, a significant F ratio tells you only that differences exist among your means. To determine which means differ, you must further analyze the data with specific comparisons similar to those used in the between-subjects ANOVA.
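The pairwise follow-up tests in Table 14-4 all share one ingredient: a t-like statistic built on the pooled error term from the overall ANOVA. The sketch below follows the logic of the Fisher test (no special correction; the significant overall F justifies the comparisons), applied to the Table 14-3 means. It is an illustration of the protected-t idea, not a full implementation of any packaged procedure; 2.179 is the standard two-tailed critical t for 12 degrees of freedom at alpha = .05.

```python
import math

# Pairwise comparisons after a significant overall F, using the pooled
# error term (MS within = 130/12, df = 12) from the Table 14-3 ANOVA.
ms_within, df_error, n = 130.0 / 12, 12, 5
means = {"no noise": 36.4, "20 db": 23.8, "40 db": 16.0}

def pairwise_t(m1, m2):
    # Standard error of the difference between two independent group means
    # that share the pooled error term.
    se = math.sqrt(ms_within * (1 / n + 1 / n))
    return (m1 - m2) / se

t_crit = 2.179  # two-tailed critical t at alpha = .05, df = 12
pairs = [("no noise", "20 db"), ("no noise", "40 db"), ("20 db", "40 db")]
for a, b in pairs:
    t = pairwise_t(means[a], means[b])
    flag = "*" if abs(t) > t_crit else ""
    print(f"{a} vs {b}: t(12) = {t:.2f} {flag}")
```

All three obtained t values (6.05, 9.80, and 3.75) exceed the critical value, so every pair of noise conditions differs reliably in this data set.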

The Two-Factor Between-Subjects ANOVA  Chapter 10 discussed the two-factor between-subjects design, in which you manipulate two independent variables and randomly assign different subjects to each combination of their levels. The analysis of such a design must determine the statistical significance of the effect of each independent variable acting alone (the main effects) and of the combined effect of the two factors (the interaction) on the dependent variable. (If you are unclear about the meanings of these terms, review Chapter 10.)

When you interpret the results of a two-factor ANOVA, you must be careful about interpreting the main effects. A main effect is the effect of one independent variable on the dependent variable, regardless of the level of your other independent variable. When an interaction is present, the effect of one factor depends on the level of the other, and the main effects can be misleading.

[Figure 14-5: Graph showing a two-way interaction that masks main effects. Cell means are plotted for two levels of Factor A at each of two levels of Factor B; the two lines cross, and a dashed horizontal line shows the main effect of Factor A.]

Figure 14-5 shows the cell means for a hypothetical experiment. Notice that Factor A affects the level of the dependent variable at the two levels of Factor B. The fact that the lines form an X (rather than being parallel) indicates the presence of an interaction. Notice also that Factor A affects the level of the dependent variable at both levels of Factor B but that these effects run in opposite directions. The dashed line in Figure 14-5 represents the main effect of Factor A, obtained by collapsing across the levels of Factor B. This dashed line is horizontal, indicating that there is no change in the dependent variable across the levels of Factor A: Because Factor A has opposite effects on the dependent variable at each level of Factor B, its average (main) effect is zero. The statistical analysis will therefore fail to reveal statistically significant main effects for these factors even though the independent variables were effective. Consequently, if you have a significant interaction, interpret your main effects with great caution, or ignore them.

Finally, most of the time you are more interested in interactions than in main effects, even before your experiment is conducted. Hypothesized relationships among variables are often stated in terms of interactions. Interactions tend to be inherently more interesting than main effects: They show how changes in one variable alter the effects on behavior of other variables.

Sample Size  Just as with a one-factor ANOVA, you can compute a multifactor ANOVA with unequal sample sizes. The unweighted means analysis can be conducted on a design with two or more factors (the logic is the same). For details on modifications to the basic two-factor ANOVA formulas for weighted means and unweighted means analyses, see Keppel (1973, 1982).

ANOVA for a Two-Factor Between-Subjects Design: An Example  An experiment conducted by Doris Chang and Stanley Sue (2003) provides an excellent example of the application of ANOVA to the analysis of data from a two-factor experiment. Chang and Sue were interested in investigating how the race of a student affected a teacher's assessments of the student's behavior and whether those assessments were specific to certain types of issues. Teachers (163 women and 34 men) completed a survey on which they were asked to evaluate the behavior of three hypothetical children. Each survey included a photograph of either an Asian-American, an African-American, or a Caucasian child. The survey also included a short description of the child's behavior. The child's behavior was depicted as falling into one of three "problem" types: (1) "overcontrolled" (anxious to please and afraid of making mistakes), (2) "undercontrolled" (disobedient, disruptive, and easily frustrated), or (3) "normal" (generally follows rules, fidgets only occasionally, etc.). These two variables created a 3 (race of child) by 3 (problem type) factorial design. The survey also included several measures on which teachers evaluated the child's behavior (e.g., seriousness, how typical the behavior was, attributions for the causes of the behavior, and academic performance).

We limit our discussion of the results to one of the dependent variables: typicality of the behavior. The data were analyzed with a two-factor ANOVA. The results showed a significant main effect of problem type, F(2, 368) = 46.19, p < .0001. Normal behavior (M = 6.10) was seen as more typical than either undercontrolled


(M = 4.08) or overcontrolled (M = 4.34) behavior. The ANOVA also showed a statistically significant race by problem-type interaction, F(4, 368) = 7.37, p < .0001.
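Before turning to the interpretation of such results, it is worth seeing numerically how a crossover interaction can hide real effects, as in Figure 14-5. The cell means below are invented to mirror that figure's pattern; the marginal means estimate the main effects.

```python
# Cell means for a hypothetical 2 x 2 design with a complete crossover
# interaction, patterned after Figure 14-5 (values are invented).
cells = {
    ("A1", "B1"): 8.0, ("A1", "B2"): 2.0,
    ("A2", "B1"): 2.0, ("A2", "B2"): 8.0,
}

# Marginal means for Factor A: average each level of A across the levels
# of B. These are what the main effect of A compares.
mean_a1 = (cells[("A1", "B1")] + cells[("A1", "B2")]) / 2
mean_a2 = (cells[("A2", "B1")] + cells[("A2", "B2")]) / 2
print(mean_a1, mean_a2)  # 5.0 5.0 -> no main effect of Factor A

# The simple effects of A run in opposite directions, so they cancel
# when averaged:
print(cells[("A1", "B1")] - cells[("A2", "B1")])  # 6.0 at B1
print(cells[("A1", "B2")] - cells[("A2", "B2")])  # -6.0 at B2
```

Factor A clearly matters at each level of Factor B, yet its main effect is exactly zero, which is why a significant interaction should always be examined before the main effects are interpreted.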

Interpreting the Results  This example shows how to interpret the results from a two-factor ANOVA. First, consider the two main effects. There was a significant effect of problem type on typicality ratings: Normal behavior was rated as more typical than overcontrolled or undercontrolled behavior. If this were the only significant effect, you could then conclude that race of the child had no effect on typicality ratings because the main effect of race was not statistically significant. However, this conclusion is not warranted because of the presence of a significant interaction between race and problem type.

The presence of a significant interaction suggests that the relationship between the two independent variables and your dependent variable is complex. Figure 14-6 shows the data contributing to the significant interaction in the Chang and Sue (2003) experiment. Analyzing a significant interaction like this one involves making comparisons between the relevant means. Because Chang and Sue (2003) predicted the interaction, they used planned comparisons (t tests) to contrast the relevant means. The results showed that the typicality of the Asian-American child's behavior was evaluated very differently from that of the Caucasian child and African-American child. Teachers saw the normal behavior of the Asian-American child as less typical than the normal behavior of either of the other two children. In contrast, they saw the overcontrolled behavior of the Asian-American child as more typical than the same behavior attributed to the African-American or Caucasian child. The undercontrolled behavior was seen as less typical for the Asian-American child than for the African-American and Caucasian children, respectively. So the race of the child did affect how participants rated the typicality of a behavior, but the nature of that effect depended on the type of behavior attributed to the child.

[Figure 14-6: Graph showing an interaction between race and problem type. Mean typicality ratings are plotted for the Asian-American, African-American, and Caucasian child across the normal, overcontrolled, and undercontrolled problem types. SOURCE: Chang and Sue, 2003; reprinted with permission.]

The Two-Factor Within-Subjects ANOVA  All subjects in a within-subjects design with two factors are exposed to every possible combination of levels of your two independent variables. These designs are analyzed using a two-factor within-subjects ANOVA. This analysis applies the same logic developed for the one-factor within-subjects ANOVA. As in the one-factor case, subjects are treated as a factor along with your manipulated independent variables. The major difference between the one- and two-factor within-subjects ANOVA is that the two-factor analysis includes interactions between each independent variable and the subjects factor (A x S and B x S), in addition to the interaction between your independent variables (A x B). Because the basic logic and interpretation of results from a within-subjects ANOVA are essentially the same as for the between-subjects ANOVA, a complete example isn't given here. A complete example of the two-factor within-subjects ANOVA can be found in Keppel (1973).

Mixed Designs  In some situations, your research may call for a design mixing between-subjects and within-subjects components. This design was discussed briefly in Chapter 11. If you use such a design (known as a mixed or split-plot design), you can analyze your data with an ANOVA. The computations involve calculating sums of squares for the between factor and for the within factor. The most complex part of the analysis is the selection of an error term to calculate the F ratios. The within-groups mean square is used to calculate the between-subjects F, whereas the interaction of the within factor with the within-groups variance is used to evaluate both the within-subjects factor and the interaction between the within-subjects and between-subjects factors. Keppel (1973, 1982) provides an excellent discussion of this analysis and a complete worked example.

Higher-Order and Special-Case ANOVAs  Variations of ANOVA exist for just about any design used in research. For example, you can include three or four factors in a single experiment and analyze the data with a higher-order ANOVA. In a three-factor ANOVA, for example, you can test three main effects (A, B, and C), three two-way interactions (AB, AC, and BC), and a three-way interaction (ABC). As you add factors, however, the computations become more complex and probably should not be done by hand. In addition, as discussed in Chapter 10, it may be difficult to interpret the higher-order interactions with more than four factors.

A special ANOVA is used when you have included a continuous correlational variable in your experiment (such as age). This type of ANOVA, called the analysis of covariance (ANCOVA), allows you to examine the relationship between experimentally manipulated variables while controlling another variable that may be correlated with them. Keppel (1973, 1982) provides clear discussions of these analyses and other issues relating to ANCOVA.


In summary, ANOVA allows you to compare more than two treatments and to analyze multifactor experiments. It is intended for use when your dependent variable is scaled on at least an interval scale. The assumptions that apply to the use of parametric statistics in general (such as homogeneity of variance and a normally distributed sampling distribution) apply to ANOVA. ANOVA involves forming a ratio between the variance caused by your independent variable plus experimental error and the variance (mean square) caused by experimental error alone. The resulting score is called an F ratio. A significant F ratio tells you that at least one of your means differs from the other means. Once a significant effect is found, you then perform more detailed analyses of the means contributing to the significant effect in order to determine where the significant differences occur. These tests become more complicated as the design of your experiment becomes more complex.

NONPARAMETRIC STATISTICS

Thus far, this discussion has centered on parametric statistical tests. In some situations, however, you may not be able to use a parametric test. When your data do not meet the assumptions of a parametric test or when your dependent variable was scaled on a nominal or ordinal scale, consider a nonparametric test. This section discusses three nonparametric tests: chi-square, the Mann-Whitney U test, and the Wilcoxon signed ranks test. You might consider using many other nonparametric tests. For a complete description of these, see Siegel and Castellan (1988). Table 14-5 summarizes some information on these and other nonparametric tests.

TABLE 14-5  Nonparametric Tests

TEST                         MINIMUM SCALE OF MEASUREMENT   COMMENTS
One-sample tests
  Binomial                   Nominal
  Chi-square                 Nominal
  Kolmogorov-Smirnov         Ordinal                        Alternative to chi-square
Two-sample tests, independent samples
  Chi-square                 Nominal
  Wald-Wolfowitz runs        Ordinal                        Less powerful than the Mann-Whitney U test
  Moses test of extreme      Ordinal
    reactions
  Mann-Whitney U             Ordinal or above               Tests the difference between means without
                                                            assuming normality of data or homogeneity
                                                            of variance; good alternative to the t test
                                                            when its assumptions are violated
  Randomization test         Interval                       More powerful than the Mann-Whitney U test
Two-sample tests, related samples
  McNemar                    Nominal
  Sign                       Ordinal                        Good when quantitative measures cannot be
                                                            obtained
  Wilcoxon matched pairs     Ordinal                        Good nonparametric alternative to the t
                                                            test; data must be distributed symmetrically
  Walsh test                 Interval
                                                            (continued)

Chi-Square
If your dependent measure is a dichotomous decision (such as guilty or not guilty) or a frequency count (such as how many people voted for Candidate A and how many for Candidate B), the statistic of choice is chi-square. Versions of chi-square exist for studies with one and two variables. This discussion is limited to the two-variable case. For further information on the one-variable analysis, see either Siegel and Castellan (1988) or Roscoe (1975).

Chi-Square for Contingency Tables  Chi-square for contingency tables (also called the chi-square test for independence) is designed for frequency data in which the relationship, or contingency, between two variables is to be determined. In a voter preference study, for example, you might have measured sex of respondent in addition to candidate preference. You may want to know whether the two variables are related or independent. The chi-square test for contingency tables compares your observed cell frequencies (those you obtained in your study) with the expected cell frequencies (those you would expect to find if chance alone were operating).

A study reported by Herbert Harari, Oren Harari, and Robert White (1985) provides an excellent example of the application of the chi-square test to the analysis of frequency data. Harari et al. investigated whether male participants would help the victim of a simulated rape. Previous research on helping behavior suggested that individuals are less likely to help someone in distress if they are with others than if they are alone.

TABLE 14-5  Nonparametric Tests (continued)

TEST                         MINIMUM SCALE OF MEASUREMENT   COMMENTS
Tests for three or more samples
  Chi-square                 Nominal
  Kruskal-Wallis one-way     Ordinal                        Alternative to the one-factor
    ANOVA                                                   between-subjects ANOVA when its
                                                            assumptions are violated
  Cochran Q test             Nominal
  Friedman two-way ANOVA     Ordinal
SOURCE: Data from Roscoe, 1975, and Siegel and Castellan, 1988.

TABLE 14-6  Frequencies of Participants Helping in the Two Conditions

                          INTERVENED    DID NOT INTERVENE    TOTAL
Participants in groups    34            6                    40
Participants alone        26            14                   40
Total                     60            20                   80

SOURCE: Data from Harari, Harari, and White, 1985.
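The chi-square statistic for Table 14-6 can be computed directly from its definition. The value below (about 4.27) is our own computation from the published frequencies, not a figure quoted in this chapter; it exceeds the tabled critical value of 3.84 for 1 degree of freedom at alpha = .05, consistent with the significant relationship reported in the text.

```python
# Chi-square test for the 2 x 2 contingency table in Table 14-6
# (Harari, Harari, & White, 1985), computed from the definition.
observed = [[34, 6],   # participants in groups: intervened, did not
            [26, 14]]  # participants alone:     intervened, did not

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Expected frequency under independence (chance alone operating).
        expected = row_totals[i] * col_totals[j] / n
        chi_square += (obs - expected) ** 2 / expected

print(round(chi_square, 2))  # 4.27
```

Note that every expected frequency here (30, 10, 30, 10) is well above 5, so the small-expected-frequency problem discussed below does not arise for these data.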

Table 14-6 shows the frequencies of participants helping under the two conditions. A chi-square test performed on these data showed a significant relationship between the decision to offer help and whether participants were alone or in groups. Contrary to what the earlier research on helping would suggest, participants in groups were actually more likely to help than those who were alone.

Limitations of Chi-Square  A problem arises if any of your expected cell frequencies are less than 5 (Gravetter & Wallnau, 2007). In that case, your obtained value of chi-square may be artificially inflated. For a 2 x 2 contingency table, the Fisher exact probability test (see Roscoe, 1975, or Siegel & Castellan, 1988) is an alternative (Roscoe, 1975). As with ANOVA, interpretation also becomes more difficult when more than two categories of each variable exist: A significant chi-square tells you only that the two variables are related. In the present example, all you know is that group size and helping are related. To determine the locus of the effect, you must inspect the cell frequencies in the contingency table (Roscoe, 1975).

The Mann-Whitney U Test
The Mann-Whitney U test can be used when your dependent variable is scaled on at least an ordinal scale. It is also a good alternative to the t test when your data do not meet the assumptions of the t test (such as when the scores are not normally distributed, when the variances are heterogeneous, or when you have small sample sizes).

Calculation of the Mann-Whitney U test is fairly simple. The first step is to combine the data from your two groups. Scores are ranked (from highest to lowest) and labeled according to the group to which they belong. If there is a difference between your two groups, the scores of one group should generally receive higher ranks than the scores of the other. A U score is then calculated for each group in your experiment. The lower of the two U scores is then evaluated against critical values of U. If the lower of the two U scores is smaller than the tabled U value, you then conclude your two groups differ significantly.

The Wilcoxon Signed Ranks Test
If you have a matched-pairs design, the Wilcoxon signed ranks test would be a good statistic to use when your data do not meet parametric assumptions. To conduct the test, you first rank the difference scores (ignoring the sign of the difference score) from smallest to largest. Next, each rank is assigned a positive or negative sign, depending on whether the difference score was positive or negative. The positive and negative ranks are then summed. If the sums of the positive and negative ranks are very different, then the null hypothesis can be rejected. For more information on the Wilcoxon signed ranks test, see Siegel and Castellan (1988).
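The two ranking procedures just described can be sketched in plain Python. The helper functions and the sample scores below are invented for illustration; in practice the resulting statistics would be compared against tabled critical values. (The text ranks from highest to lowest; ranking from lowest to highest, as here, yields the same pair of U values.)

```python
def midranks(pool):
    """Map each score to its rank in the pooled, sorted data (1 = lowest);
    tied scores share the average of the ranks they would occupy."""
    s = sorted(pool)
    return {v: s.index(v) + 1 + (s.count(v) - 1) / 2 for v in set(pool)}

def mann_whitney_u(a, b):
    """Return both U statistics for two independent groups; the smaller
    one is compared against a table of critical U values."""
    ranks = midranks(a + b)
    r_a = sum(ranks[x] for x in a)
    u_a = len(a) * len(b) + len(a) * (len(a) + 1) / 2 - r_a
    return u_a, len(a) * len(b) - u_a

def wilcoxon_t(before, after):
    """Wilcoxon signed ranks: rank the absolute difference scores, attach
    the sign of each difference, and sum the positive and negative ranks."""
    diffs = [b - a for a, b in zip(before, after) if b != a]  # drop zeros
    ranks = midranks([abs(d) for d in diffs])
    pos = sum(ranks[abs(d)] for d in diffs if d > 0)
    neg = sum(ranks[abs(d)] for d in diffs if d < 0)
    return pos, neg

print(mann_whitney_u([12, 15, 9, 17, 14], [8, 10, 7, 11, 6]))  # (2.0, 23.0)
print(wilcoxon_t([10, 12, 9, 14, 11, 13], [13, 11, 12, 18, 14, 17]))  # (20.0, 1.0)
```

In the Mann-Whitney example the smaller U is 2; in the Wilcoxon example the positive and negative rank sums (20 and 1) are very different, the pattern that leads to rejecting the null hypothesis when the smaller sum falls at or below the tabled value.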

Nonparametric statistics are useful when your data do not meet the assumptions of parametric statistics. If you have a choice, choose a parametric statistic over a


nonparametric one because parametric statistics are generally more powerful. That is, a parametric statistic usually provides a more sensitive test of the null hypothesis.

SPECIAL TOPICS IN INFERENTIAL STATISTICS

Several issues arise whether you use a parametric or nonparametric statistic, that is, when using any inferential statistic. This section discusses some special topics to consider when deciding on a strategy to statistically evaluate your data.

Power of a Statistical Test
Inferential statistics are designed to help you determine the validity of the null hypothesis, so you want your test to reject the null hypothesis when your data are truly inconsistent with it. The power of a statistical test is its ability to detect such differences. Put in statistical terms, power is a statistic's ability to correctly reject the null hypothesis (Gravetter & Wallnau, 2007). A powerful statistic is more sensitive to differences in your data than a less powerful one.

Because you use the results of your inferential statistics to decide whether or not to reject the null hypothesis, the issue of power is important: You want to be reasonably sure that your decision is correct. Failure to achieve statistical significance in your experiment (thus not rejecting the null hypothesis) can be caused by many factors. Your independent variable actually may have no effect. Or your experiment may have been carried out so poorly that the effect was buried in variance. Or maybe your statistic simply was not powerful enough to detect the difference, or you did not use enough subjects. Before accepting a null result, you want to be sure that the failure to reject the null hypothesis was not caused by a lack of power in your statistical test. The power of your statistical test is affected by your chosen alpha level, the size of your sample, and whether you use a one-tailed or two-tailed test.

Alpha Level  As you reduce your alpha level (e.g., from .05 to .01), you reduce the probability of making a Type I error. Adopting a more conservative alpha level makes it more difficult to reject the null hypothesis. Unfortunately, it also reduces power: Given a constant error variance, a larger difference between means is required to achieve statistical significance.

Sample Size  The power of your statistical test increases with the size of your sample, because larger samples yield more precise estimates of the population parameters. In particular, the standard errors of the means from your treatments will be lower, so the positions of the population means fall within narrower bounds. Consequently, smaller real differences between means can be detected.

Although alpha (the probability of rejecting the null hypothesis when it is true) can be set directly, it is not so easy to determine what the power of your analysis will be. However, you can work backward from a desired amount of power to estimate the sample sizes required for a study. To calculate these estimates, you must be willing to state the amount of power required, the magnitude of the difference that you expect to find in your experiment, and the expected error variance. The expected difference between means and the expected error variance can be estimated from pilot research, from theory, or from previous research in your area. For example, if previous research has found a small effect of your independent variable, you should plan on a relatively large sample. There is no generally agreed-on acceptable or desirable level of power (Keppel, 1982), but if you are willing and able to specify the values just mentioned, you can estimate the size of the sample required. (See Gravetter & Wallnau, 2007, or Keppel, 1982, for a discussion of how to estimate the required sample size.)

One-Tailed Versus Two-Tailed Tests  A two-tailed test is less powerful than a one-tailed test. This can be easily demonstrated by looking at the critical values of t found in Table 2 in the Appendix. At 20 degrees of freedom, the critical value at alpha = .05 for a one-tailed test is 1.73. For a two-tailed test, the critical value is 2.09. It is thus easier to reject the null hypothesis with the one-tailed test than with the two-tailed test. You should decide whether a one-tailed or a two-tailed test is appropriate when designing your study.

Effect Size  The degree to which the manipulation of your independent variable changes the value of the dependent variable is termed the effect size. To facilitate comparison across variables and experiments, effect size is usually reported as a proportion of the variation in scores within the treatments under comparison; for example, the effect size for the difference between two treatment means might be reported as (M2 - M1)/s, where s is the pooled sample standard deviation (Cohen, 1988). Measured in this way, effect size estimates the amount of overlap between the two population distributions from which the samples were drawn. Large effect sizes indicate relatively little overlap: The mean of Population 2 lies far into one tail of the distribution of Population 1, so a real difference in population means is likely to be detected in the inferential test (good power). Small effect sizes indicate great overlap in the population distributions and thus, everything else being equal, relatively little power. However, because inferential tests rely on the sampling distribution of the test statistic rather than the population distributions, you may be able to improve power in such cases by, for example, increasing the sample size.
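The (M2 - M1)/s measure described above is simple to compute. The sketch below applies it to the no-noise and 20-db groups from Table 14-3; using those two groups is our own choice of illustration, not a comparison reported in the chapter.

```python
import math

# Effect size for two independent groups, in the (M2 - M1)/s form given
# in the text (Cohen, 1988), with s the pooled sample standard deviation.
no_noise = [33, 39, 41, 32, 37]
db20 = [22, 24, 25, 21, 27]

def cohens_d(g1, g2):
    m1, m2 = sum(g1) / len(g1), sum(g2) / len(g2)
    ss1 = sum((x - m1) ** 2 for x in g1)
    ss2 = sum((x - m2) ** 2 for x in g2)
    # Pooled standard deviation: combined SS over combined degrees of freedom.
    pooled_s = math.sqrt((ss1 + ss2) / (len(g1) + len(g2) - 2))
    return (m1 - m2) / pooled_s

print(round(cohens_d(no_noise, db20), 2))  # 3.94
```

An effect size near 4 means the two sample distributions barely overlap at all, which is why even these small groups (n = 5 each) yielded such a decisive F test.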

,

., ,, ,. ."

*, " ,.

...,

,.

..,. ,

~.,"

., .... ,

^ ...~...

.. .,

A . ~

,.-.,

,, ,

*,. .-

,,

444

CHAPTER 14

Too much power can be as bad as too little. If you Tai\ enotigl\ subjects, yoti could conceivably find statistical significance in even the InOSI minute and trivial of diffei ences. Similarly, when you use a correlation, you can achieve statistical significance even with small correlations tryou include enough subjects. Consequently, yoursam PIG should be large enough to be sensitive to differences between treatments but ITot so large as to produce significant but trivial results The possibility of your results being statisticalIy significant and Yet trivial 11THy seem strange co you. Ifso, The next section may clarify this concept

'tl. ht , Ithar yoLiderermine reasonable for yoLn' PUTpos , --,, 'tl. ht II ,elrhar oLiderermine IsIs reasonable for yoLn' PUTp\ ,

, ". t"inureTeliablC"Thai\SignificantTeSLiiSOtrainC\, ,,

To say that results are significant (statistically speaking) merely indicates that the observed differences between sample means are probably reliable, not the result of chance. Confusion arises when you give the word significant its more common meaning. Something "significant" in this more common sense is important or worthy of note. The fact that the treatment means of your experiment differ significantly may or may not be important. If the difference is predicted by a particular theory and not by others, then the finding may be important because it supports the theory over the others. The finding also may be important if it shows that one variable strongly affects another. Such findings may have practical implications by demonstrating, for example, the superiority of a new therapeutic technique. In such cases, a statistically significant (i.e., reliable) finding also may have practical significance.

Advertisers sometimes purposely blur the distinction between statistical and practical significance. A few years ago, Bayer aspirin announced the results of a "hospital study on pain other than headache." Evidently, groups of hospital patients were treated with Bayer aspirin and with several other brands. The advertisement glossed over the details of the study, but apparently the patients were asked to rate the severity of their pain at some point after taking Bayer or Brand X (the identities of both brands were probably concealed). According to the ad, "the results were significant - Bayer was better." However, the ad did not say in what way the results were significant. Evidently, the results were statistically significant and thus probably not caused by chance. Without any information about the pain ratings, however, you do not know if this finding has any practical significance. It may be that the Bayer and Brand X group ratings differed by less than 1 point on a 10-point scale. Although this average difference may have been reliable, it also may be the case that no individual could tell the difference between two pains so close together on the scale. In that case, the statistically significant difference would have no practical significance and would provide no reason for choosing Bayer over other brands of aspirin.
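The Bayer scenario can be sketched numerically. The figures below are hypothetical (the ad reported no data): with large enough groups, even a 0.2-point difference on a 10-point pain scale comes out statistically significant, although it is practically trivial.

```python
import math

# Hypothetical summary statistics (NOT the actual Bayer study data):
# two large groups rate pain on a 10-point scale.
n = 5000                    # patients per group
mean_a, mean_b = 5.0, 5.2   # a 0.2-point difference -- practically trivial
sd = 2.0                    # common standard deviation

# Two-sample test statistic for the difference between means
se = sd * math.sqrt(2 / n)
t_stat = (mean_b - mean_a) / se

print(f"t = {t_stat:.2f}")  # t = 5.00, well beyond the 1.96 cutoff for p < .05
```

A result like this is reliable in the statistical sense, yet it says nothing about whether any individual patient could feel the difference.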

The situation resembles a jury trial: an innocent defendant may be convicted as a result of misleading evidence. If you reject the null hypothesis when it is true, you have made a Type I error; if you retain the null hypothesis when it is false, you have made a Type II error. In practice, you must strike a balance between Type I and Type II errors. Journals generally require that results be significant at least at the p < .05 level (see the discussion of publication practices in Chapter 3).

Data Transformations

Sometimes the numbers representing your data are awkward to work with. Subtracting a constant from each score can make the numbers in an analysis more manageable, and adding a constant to each score might remove negative numbers. Such operations do not change the shape of the distribution: the mean of the distribution changes, but the standard deviation does not. These transformations, called linear transformations, simply change the magnitude of the numbers representing your data without altering the scale of measurement.
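As a quick illustration, here is a minimal sketch (with made-up scores) showing that a linear transformation shifts the mean but leaves the standard deviation, and hence the shape of the distribution, untouched:

```python
import statistics

scores = [12.0, -3.0, 7.0, 0.0, 4.0]

# Adding a constant to every score removes the negative numbers...
shifted = [x + 10 for x in scores]

# ...the mean shifts by exactly that constant, but the standard
# deviation (and the shape of the distribution) is unchanged.
print(statistics.mean(scores), statistics.mean(shifted))      # 4.0 14.0
print(statistics.pstdev(scores), statistics.pstdev(shifted))  # equal values
```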

In the behavioral sciences, an alpha level of .05 (or 1 chance in 20) is usually considered the maximum acceptable rate for Type I errors. This level provides reasonable protection against Type I errors while also maintaining a reasonable level of power for most analyses. Of course, if you want to guard more strongly against Type I errors, you can adopt a more stringent alpha level, such as the .01 level (1 chance in 100).
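The meaning of the .05 alpha level can be checked by simulation. In the sketch below (a hypothetical setup with known population sigma, so a simple z criterion of 1.96 applies), both groups are always drawn from the same population; the null hypothesis is true, so every "significant" result is a Type I error, and the false-alarm rate comes out near the nominal .05:

```python
import random

random.seed(42)

# One simulated "experiment" in which the null hypothesis is TRUE:
# both groups come from the same normal population (mean 0, sd 1).
def z_for_null_experiment(n=30):
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    diff = sum(b) / n - sum(a) / n
    se = (2 / n) ** 0.5          # sigma is known to be 1, so a z test applies
    return diff / se

trials = 2000
false_alarms = sum(abs(z_for_null_experiment()) > 1.96 for _ in range(trials))
rate = false_alarms / trials
print(rate)                      # close to .05, the nominal alpha level
```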

SPECIAL TOPICS IN INFERENTIAL STATISTICS


CHAPTER 14

TABLE 14-7 Data Transformations and Uses

TRANSFORMATION   FORMULA                                  USE
Square root      X' = sqrt(X)  or  X' = sqrt(X + 1) (a)   Makes variances more homogeneous; normalizes data with a moderate positive skew
Arcsine          X' = 2 arcsin(sqrt(X)) (b)               Normalizes data expressed as proportions
Log              X' = log X  or  X' = log(X + 1) (c)      Normalizes data with a severe positive skew

(a) Formula used if basic observations are frequencies or if values of X are small.
(b) Formula used if values of X are close to 0 or 1.
(c) Formula used if value of X is equal to or near 0.
SOURCE: Information summarized from Tabachnick and Fidell, 2001, and Winer, 1971.
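The formulas in Table 14-7 translate directly into code. Here is a minimal sketch (the function names are ours, and the log base, here base 10, is a conventional choice):

```python
import math

# The three transformations from Table 14-7, written as plain functions.
def sqrt_transform(x):        # X' = sqrt(X + 1): frequencies or small X
    return math.sqrt(x + 1)

def arcsin_transform(p):      # X' = 2 * arcsin(sqrt(p)): proportions in [0, 1]
    return 2 * math.asin(math.sqrt(p))

def log_transform(x):         # X' = log(X + 1): values equal to or near 0
    return math.log10(x + 1)

counts = [0, 3, 8, 99]
print([round(sqrt_transform(x), 3) for x in counts])  # [1.0, 2.0, 3.0, 10.0]
print([round(log_transform(x), 3) for x in counts])   # [0.0, 0.602, 0.954, 2.0]
print(round(arcsin_transform(0.5), 3))                # 1.571 (= pi/2)
```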

Transforming data to meet the assumptions of parametric tests is being used less and less frequently (Keppel, 1973). ANOVA, perhaps the most commonly used inferential statistic, appears to be very robust against even moderately serious violations of the assumptions underlying the test. For example, Winer (1971) has demonstrated that even if the within-cell variances vary by a 3:1 ratio, the F test is not seriously biased. Transformations of the data may not be necessary in these cases. Also, when you transform your data, your conclusions must be based on the transformed scale and not the original. In most cases, this is not a problem. However, a transformation can significantly alter the relationship between two means. Prior to transformation, the mean for Group 1 was lower than the mean for Group 2. The opposite was true after transformation.

Use data transformations only when absolutely necessary because they can be tricky. Sometimes transformations of data correct one aspect of the data (such as restoring normality) but induce new violations of assumptions (such as heterogeneity of variance). If you must use a data transformation, check before going forward with your analysis to be sure that the transformation had the intended effect.
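One way to check that a transformation had its intended effect is to compare a skewness index before and after. Below is a rough sketch with hypothetical, positively skewed scores; the skewness formula used here (the mean of cubed z scores) is one of several common conventions:

```python
import math

def skewness(data):
    """Simple sample skewness: the mean of the cubed z-scores."""
    n = len(data)
    m = sum(data) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in data) / n)
    return sum(((x - m) / sd) ** 3 for x in data) / n

# Hypothetical scores with a strong positive skew (note the outlier)
raw = [1, 2, 2, 3, 3, 3, 4, 4, 5, 40]
logged = [math.log10(x + 1) for x in raw]

print(round(skewness(raw), 2))     # large positive skew
print(round(skewness(logged), 2))  # smaller after the log transform
```

If the "after" value were no closer to zero, or if a new problem such as heterogeneity of variance had appeared, the transformation would not have served its purpose.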

Alternatives to Inferential Statistics

Inferential statistics are tools to help you make a decision about the null hypothesis. Essentially, inferential statistics provide you with a way to test the reliability of a single finding; they guard against the human tendency to interpret every apparent difference as though it were meaningful. In some cases, however, you may have data that badly violate the assumptions of parametric tests, with no appropriate nonparametric statistic to use instead. In such designs or situations, you can instead establish the reliability of your data by replication. You can include an element of replication within the original experiment, or you can repeat the experiment while altering its parameters within the original context; if the original finding is reliable, the new experiment will provide a chance to confirm it. Consider a series of experiments conducted by one of the authors.


In these experiments, groups of rats were exposed to either predictable or unpredictable shock, and the effect of this treatment on pain sensitivity was assessed by means of the "tail-flick" test. In the tail-flick test, a hot beam of light was focused on the rat's tail. The length of time elapsing until the rat flicked its tail out of the beam (a protective reflex) indicated the degree of pain sensitivity. The group exposed to unpredictable shock appeared less sensitive to pain than the group exposed to predictable shock. However, this effect was not statistically significant (p > .05). The parameters of the experiment were twice altered in ways that were expected to increase the size of the predictability effect (if it existed), and the experiment was replicated. However, each replication produced virtually the identical result. On each occasion, the unpredictable shock group demonstrated less sensitivity to pain than the predictable shock group, and each time this difference was not statistically significant. The problem might have been dealt with by taking measures to increase the power of the test, but the analysis itself indicated that the results were probably not reliable.

Choosing a statistic only after the data have been collected is much like designing your experiment before developing hypotheses. Plan the analysis along with the design, and pick the statistic that works best for that design.

SUMMARY

This chapter has reviewed some of the basics of inferential statistics. Inferential statistics go beyond simple description of results. They allow you to determine whether the differences observed in your sample are reliable. Inferential statistics allow you to make a decision about the viability of the null hypothesis (which states that there is no difference between means). The alpha level you adopt sets the probability of rejecting the null hypothesis when it is actually true.

Parametric statistics are designed for data that meet certain assumptions. For example, these tests assume that the sampling distribution of means is normal and that variances are homogeneous. If your data violate the assumptions of a parametric test, or your data are scaled on a nominal or ordinal scale, a nonparametric statistic can be used (such as chi-square or the Mann-Whitney U test). These tests are usually easier to compute than parametric tests. However, they are less powerful and more limited in application. Nonparametric statistics may not be available for higher-order factorial designs.

Statistical significance indicates that the difference between your means was unlikely if only chance were at work. It suggests that your independent variable affected the dependent variable. Two factors contribute to a statistically significant effect: the size of the difference between means and the variability among the scores. You can have a large difference between means, but if the variability is high, you may not find statistical significance. Conversely, you may have a very small difference and find a significant effect if the variability is low.

Consider the power of your statistical test when evaluating your results. If you do not find statistical significance, perhaps no differences exist. Or it could mean that your test was not sensitive enough to pick up small differences that do exist. Sample size is an important contributor to power. Generally, the larger the sample, the more powerful the statistic. This is because larger samples are more representative of the underlying populations than are small samples. Use a sample that is large enough to be sensitive to differences but not so large as to be oversensitive. There are methods for determining optimal sample sizes for a given level of power. However, you must be willing and able to specify an expected magnitude of the treatment effect, an estimate of error variance, and the desired power. The first two can be estimated from pilot data or previous research. Unfortunately, there is no agreed-on acceptable level of power.

An alpha level of .05 is the largest generally acceptable level for Type I errors. This value has been chosen because it represents a reasonable compromise between Type I and Type II errors. In some cases (such as in applied research), the .05 level may be too conservative. However, journals probably will not publish results that fail to reach the conventional level of significance.

Data transformations are available for those situations in which your data are in some way abnormal. You may transform data if the numbers are large and unmanageable or if your data do not meet the assumptions of a statistical test. The transformation of data to meet assumptions of a test, however, is being done less frequently because inferential statistics tend to be robust against the effects of even moderately severe violations of assumptions. Transformations should be used sparingly because they change the nature of the variables of your study.
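The claim that larger samples yield more power can be illustrated by simulation. In this hypothetical sketch (known sigma, a z criterion of 1.96, and a true effect of 0.5 standard deviations), the larger sample detects the effect far more often:

```python
import random

random.seed(1)

# Monte Carlo sketch: power of a two-group z test (known sigma = 1)
# for a true effect of 0.5 SD, at two different sample sizes.
def power(n, effect=0.5, trials=2000):
    hits = 0
    for _ in range(trials):
        a = [random.gauss(0, 1) for _ in range(n)]
        b = [random.gauss(effect, 1) for _ in range(n)]
        z = (sum(b) / n - sum(a) / n) / ((2 / n) ** 0.5)
        hits += abs(z) > 1.96
    return hits / trials

p_small = power(10)
p_large = power(100)
print(p_small)   # small sample: misses the real effect most of the time
print(p_large)   # large sample: detects it almost every time
```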

REVIEW QUESTIONS

1. Why are sampling distributions important in inferential statistics?
2. What is sampling error, and why is it important to know about?
3. What are degrees of freedom, and how do they relate to inferential statistics?
4. How do parametric and nonparametric statistics differ?
5. What is the general logic behind inferential statistics?
6. How are Type I and Type II errors related?
7. What does statistical significance mean?
9. What are the assumptions underlying parametric statistics?


10. Which parametric statistics would you use to analyze data from an experiment with two groups? Identify which statistic would be used for a particular type of design or data.
11. Which parametric statistic is most appropriate for designs with more than one level of a single independent variable?
12. When would you do a planned versus an unplanned comparison, and why?
13. What is the difference between weighted and unweighted means analysis, and when would you use each?
14. What are a main effect and an interaction, and how are they analyzed?
15. Under what conditions would you use a nonparametric statistic?
16. What is meant by the power of a statistical test, and what factors can affect it?
17. ... or why not?

During discussions of experimental and nonexperimental design, previous chapters assumed that only one dependent variable was included in a design or that multiple dependent variables were treated separately in any statistical tests. This approach to analysis is called a univariate strategy. Although many research questions can be addressed with a univariate strategy, others are best addressed by considering variables together in a single analysis. When you include two or more dependent measures in a single analysis, you are using a multivariate strategy.


18. When are data transformations used, and what should you consider when using them?
19. What are the alternatives to inferential statistics for evaluating the reliability of data?


KEY TERMS


inferential statistics
t test
z test for the difference between two proportions
F ratio
per-comparison error
familywise error
analysis of covariance (ANCOVA)
Type I error
Type II error
alpha level (α)
critical region
power
chi-square (χ²)
Mann-Whitney U test
Wilcoxon signed ranks test

This chapter introduces the major multivariate analysis techniques. Keep in mind that providing an in-depth introduction to these techniques in the confines of one chapter is impossible. Such a task is better suited to an entire book. Also, the complex and laborious calculations needed to compute multivariate statistics are better left to computers. Consequently, this chapter does not discuss the mathematics behind these statistical tests except for those cases in which some mathematical analysis is required to understand the issues. Instead, this chapter focuses on practical issues: applications of the various statistics, the assumptions that must be met, and interpretation of results. If you want to use any of the statistics discussed in this chapter, read Using Multivariate Statistics (Tabachnick & Fidell, 2001) or one of the many monographs published by Sage Publications (such as Asher, 1976, or Levine, 1977).

CORRELATIONAL AND EXPERIMENTAL MULTIVARIATE DESIGNS


A multivariate design is a research design in which multiple dependent or multiple predictor and/or criterion variables are included. Analysis of data from such designs requires special statistical procedures. Multivariate design and analysis apply to both experimental and correlational research studies. The following sections describe some of the available multivariate statistical tests.

