ML

© All Rights Reserved


f1 is the frequency of the modal class

f0 is the frequency of the class before the modal class in the frequency table

f2 is the frequency of the class after the modal class in the frequency table

h is the class interval of the modal class

THE CHI-SQUARE TEST

Introduction: The chi-square test is a statistical test that can be used to determine whether observed frequencies are significantly different from expected frequencies. For example, after we calculated expected frequencies for different allozymes in the HARDY-WEINBERG module we would use a chi-square test to compare the observed and expected frequencies and determine whether there is a statistically significant difference between the two. As in other statistical tests, we begin by stating a null hypothesis (H0: there is no significant difference between observed and expected frequencies) and an alternative hypothesis (H1: there is a significant difference). Based on the outcome of the chi-square test we will either reject or fail to reject the null hypothesis.

Importance: Chi-square tests enable us to compare observed and expected frequencies objectively, since it is not always possible to tell just by looking at them whether they are "different enough" to be considered statistically significant. Statistical significance in this case implies that the differences are unlikely to be due to chance alone, and may instead be indicative of other processes at work.

Question: How is the chi-square test used to compare samples or populations? What does a comparison of observed and expected frequencies tell us about these samples?

Variables:

χ² the chi-square test statistic

o observed count or frequency

e expected count or frequency

n total number of observations

RT row total

CT column total

Methods: Shaklee et al. (1993) collected data to study genetic variation within a species of fish called the barramundi perch (Lates calcarifer). Many fish species are composed of breeding groups called stocks, which are populations that are genetically distinct from one another. One of the goals of Shaklee et al.'s study was to identify individual stocks of the barramundi perch on the basis of significant genetic differentiation. Of the 25 collections examined, those that were not significantly genetically distinct from one another were considered to be from the same stock; collections that were genetically distinct were considered to be from different stocks. Understanding species subdivision into stocks has important implications for conservation and fisheries management, since maintaining the genetic diversity of the species as a whole will require conservation of the different stocks.

We'll use some of their data here to illustrate the application of a simple chi-square test. Below are data showing allele frequencies at seven loci for eight collections of perch from different parts of the Australian coast (table adapted from Shaklee et al. 1993; all errors due to rounding are mine).

Locus & allele    # 1    # 2   # 14   # 15   # 18   # 21   # 22   # 25

EST-2*
  *100+           249     78     97    115    101    242    128    116
  *98              26      4      0      1      2      0      2     30
  *95             126     41     60     60     52    226    125     70

ESTD*
  *100+           390    120    155    176    171    465    335    210
  *114             15      4      0      0      0      9      2      6

mIDHP*
  *100            387    123    152    167    152    474    333    216
  *78               0      0      5     10      4      1      0      0

sIDHP*
  *100            354    113    111    137    143    432    310    177
  *121+            37      7     44     33     27     39     18     28
  *83               9      3      0      0      0      1      1      3

LDH-C*
  *100            373    115    156    175    154    400    245    208
  *90+             29      9      1      1      1     75     25      5

PGDH*
  *100            382    122    130    145    153    378    240    199
  *88+              5      2     21     18     16     95     89      3

PROT*
  *100+           399    120    149    168    147    453    326    207
  *97               8      4      8      9      9     22      5      9

We can use the chi-square test to compare collections # 1 and # 25 at the EST-2* locus. The expected values are the allele frequencies we would expect if there were no difference between the two collections at this locus. We can calculate the expected allele frequencies using the row and column totals from a table of the observed frequencies for these two collections.

For the first cell (collection #1, allele *100+) we begin by calculating the probability of an observation being in the first row, regardless of column. To do this, take the row total (365) and divide it by n (617) (note that n changes depending on which locus and which pair of populations is being compared). Based on these two collections, the probability of a barramundi perch having the *100+ allele at the EST-2* locus is 0.5916 (365/617). Next, we calculate the probability of an observation being in the first column, regardless of row, by taking the column total (401) and dividing it by n (617). The probability of an observation coming from collection #1 as opposed to collection #25 is 0.6499 (401/617).

We have now determined the probability of a perch having a given allele at this locus, and the probability of being in a given collection. But what is the probability that an individual observation will have the *100+ allele at the EST-2* locus and be from collection #1? The probability of two outcomes occurring together is called the joint probability, and is calculated by multiplying the two separate probabilities: 0.5916 x 0.6499 = 0.3845. It follows that in a sample of 617 fish we would expect 617 x 0.3845 = 237 individuals to be from collection #1 and have the *100+ allele, and we have now calculated our expected value for the first cell in the table. This calculation can be simplified with the following formula:

e = (RT/n)(CT/n)*n

Verify that the other expected frequencies have been calculated correctly.

Observed frequencies

allele     # 1    # 25     RT
*100+      249     116    365
*98         26      30     56
*95        126      70    196
CT         401     216    n=617

Expected frequencies

allele     # 1    # 25     RT
*100+      237     128    365
*98         36      20     56
*95        127      69    196
CT         401     216    n=617
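The expected-frequency calculation above can be sketched in a few lines of Python. This is only an illustration, not code from the original module: the function name `expected_counts` and the list-of-rows table layout are assumptions made for the example. It applies e = RT x CT / n to every cell and rounds the results for comparison with the table.

```python
def expected_counts(observed):
    """Return expected counts e = RT * CT / n for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]


# Observed counts at EST-2* (rows: *100+, *98, *95; columns: #1, #25).
observed = [[249, 116], [26, 30], [126, 70]]

expected = expected_counts(observed)
rounded = [[round(e) for e in row] for row in expected]
print(rounded)  # [[237, 128], [36, 20], [127, 69]]
```

Because each expected count is rounded separately, the rounded column entries need not add up exactly to the column totals (the first column sums to 400 rather than 401).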

Note also that the row and column totals remain the same. Now we can use the chi-square test to compare the observed and expected frequencies. The chi-square test statistic is calculated with the following formula:

χ² = Σ [ (o - e)² / e ]

For each cell, the expected frequency is subtracted from the observed frequency, the difference is squared, and the result is divided by the expected frequency. The values are then summed across all cells. This sum is the chi-square test statistic. For the example here,

χ² = 0.608 + 2.778 + 0.008 + 1.125 + 5.000 + 0.014 = 9.533.
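As a check on the arithmetic, the sum can be reproduced with a short Python sketch. The function name `chi_square_statistic` is an assumption for this example. The first call uses the rounded expected frequencies from the table above; the second uses unrounded expected frequencies, which is an addition here and not part of the original worked example.

```python
def chi_square_statistic(observed, expected):
    """Sum (o - e)^2 / e over all cells of two equally shaped tables."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Rows: *100+, *98, *95; columns: collection #1, collection #25.
observed = [[249, 116], [26, 30], [126, 70]]
expected_rounded = [[237, 128], [36, 20], [127, 69]]

stat_rounded = chi_square_statistic(observed, expected_rounded)
print(round(stat_rounded, 3))  # 9.533

# Same test with unrounded expected counts e = RT * CT / n.
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)
expected_exact = [[rt * ct / n for ct in col_totals] for rt in row_totals]
stat_exact = chi_square_statistic(observed, expected_exact)
```

With unrounded expected counts the statistic comes out somewhat larger (about 10.2). Either way it exceeds the critical value of 5.991 used in this example.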

Interpretation: The critical value for the chi-square in this case (significance level 0.05, 2 degrees of freedom) is 5.991; if the calculated chi-square value is equal to or greater than this critical value, the probability of obtaining a difference this large by chance alone, if the null hypothesis were true, is 0.05 or less, which is a small probability. Our calculated value of 9.533 is greater than the critical value of 5.991. We therefore reject the null hypothesis, and conclude that there is a significant difference between the observed and expected frequencies of alleles at the EST-2* locus for these two collections of barramundi perch. (Critical values for the chi-square are determined from a statistical table based on the significance level at which the test is being performed [0.05 in our case] and a number called degrees of freedom [2 in this example], but the details are beyond the scope of this module).
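For readers curious where 5.991 comes from: with exactly 2 degrees of freedom the upper-tail probability of the chi-square distribution has the closed form P(χ² >= x) = exp(-x/2), so the critical value at significance level alpha is -2 ln(alpha). The sketch below relies on that special case only; the function names are assumptions for the example, and other degrees of freedom still need a statistical table or software.

```python
import math


def upper_tail_df2(x):
    """P(chi-square >= x) for 2 degrees of freedom (closed form, df = 2 only)."""
    return math.exp(-x / 2.0)


def critical_value_df2(alpha):
    """Value x with P(chi-square >= x) = alpha, for 2 degrees of freedom only."""
    return -2.0 * math.log(alpha)


critical = critical_value_df2(0.05)
p_value = upper_tail_df2(9.533)

print(round(critical, 3))  # 5.991
print(p_value < 0.05)      # True
```

The calculated value of 9.533 corresponds to a tail probability of roughly 0.0085, well below 0.05, which matches the decision to reject the null hypothesis.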

Conclusions: Our rejection of the null hypothesis allows us to conclude that the two collections of barramundi perch compared here are genetically distinct at the EST-2* locus. In other words, the frequencies of the three alleles at this locus are significantly different between the two populations. Using somewhat more complicated applications of the chi-square test, the authors concluded that the 25 collections they analyzed came from seven genetically distinct stocks, or populations, from adjacent stretches of the northeastern Australian coast. One of the goals of conservation and/or management is the preservation of genetic diversity within a species. Management decisions based on the assumption that a species' genetic variation is distributed across populations could have disastrous consequences for the future of the species if the populations are indeed genetically distinct. Techniques for identifying amounts and patterns of genetic variation within a species are critical tools for biologists.

Additional Questions:

1) Are the allele frequencies at the other six loci also significantly different between collections #1 and #25? (For loci with two alleles instead of three, the critical value of the chi-square is 3.841, but otherwise the procedure is the same).

2) Use the chi-square test to compare allele frequencies for collections #14 and #15. Can you determine whether or not these two collections are from the same stock?

Sources: Sokal, R. R. and F. J. Rohlf. 1995. Biometry, 3rd ed. W. H. Freeman and Company, New York, NY.

Rohlf, F. J. and R. R. Sokal. 1995. Statistical Tables, 3rd ed. W. H. Freeman and Company, New York, NY.

Shaklee, J. B., J. Salini, and R. N. Garrett. 1993. Electrophoretic characterization of multiple genetic stocks of barramundi perch in Queensland, Australia. Transactions of the American Fisheries Society 122:685-701.

copyright 1999 by M. Beals, L. Gross, and S. Harrell


tiem.utk.edu


Chi-Square Test for Independence

This lesson explains how to conduct a chi-square test for independence. The test is applied when you have two categorical variables from a single population. It is used to determine whether there is a significant association between the two variables.

For example, in an election survey, voters might be classified by gender (male or female) and voting preference (Democrat, Republican, or Independent). We could use a chi-square test for independence to determine whether gender is related to voting preference. The sample problem at the end of the lesson considers this example.

When to Use Chi-Square Test for Independence

The test procedure described in this lesson is appropriate when the following conditions are met:

The sampling method is simple random sampling.

Each population is at least 10 times as large as its respective sample.

The variables under study are each categorical.

If sample data are displayed in a contingency table, the expected frequency count for each cell of the table is at least 5.
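The last condition is the only one that can be checked mechanically from the data. The sketch below shows one way to do so, assuming expected counts are computed as (row total x column total) / n. The helper names and the second, sparse table are hypothetical and used only for illustration; the first table is the one from the sample problem at the end of the lesson.

```python
def min_expected_count(observed):
    """Smallest expected count (row total * column total / n) in the table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return min(rt * ct / n for rt in row_totals for ct in col_totals)


def meets_expected_count_rule(observed, threshold=5):
    """True when every cell's expected count is at least `threshold`."""
    return min_expected_count(observed) >= threshold


# Gender by voting preference table from the sample problem.
voting = [[200, 150, 50], [250, 300, 50]]
# Hypothetical sparse table, invented here to show a failing case.
sparse = [[8, 2], [3, 1]]

print(min_expected_count(voting), meets_expected_count_rule(voting))  # 40.0 True
print(meets_expected_count_rule(sparse))                              # False
```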

This approach consists of four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) analyze sample data, and (4) interpret results.

State the Hypotheses

Suppose that Variable A has r levels, and Variable B has c levels. The null hypothesis states that knowing the level of Variable A does not help you predict the level of Variable B. That is, the variables are independent.

H0: Variable A and Variable B are independent.

Ha: Variable A and Variable B are not independent.

The alternative hypothesis is that knowing the level of Variable A can help you predict the level of Variable B.

Note: Support for the alternative hypothesis suggests that the variables are related; but the relationship is not necessarily causal, in the sense that one variable "causes" the other.

Formulate an Analysis Plan

The analysis plan describes how to use sample data to decide whether to reject the null hypothesis. The plan should specify the following elements.

Significance level. Often, researchers choose significance levels equal to 0.01, 0.05, or 0.10; but any value between 0 and 1 can be used.

Test method. Use the chi-square test for independence to determine whether there is a significant relationship between two categorical variables.

Analyze Sample Data

Using sample data, find the degrees of freedom, expected frequencies, test statistic, and the P-value associated with the test statistic. The approach described in this section is illustrated in the sample problem at the end of this lesson.

Degrees of freedom. The degrees of freedom (DF) is equal to:

DF = (r - 1) * (c - 1)

where r is the number of levels for one categorical variable, and c is the number of levels for the other categorical variable.

Expected frequencies. The expected frequency counts are computed separately for each level of one categorical variable at each level of the other categorical variable. Compute r * c expected frequencies, according to the following formula.

E_{r,c} = (n_r * n_c) / n

where E_{r,c} is the expected frequency count for level r of Variable A and level c of Variable B, n_r is the total number of sample observations at level r of Variable A, n_c is the total number of sample observations at level c of Variable B, and n is the total sample size.

Test statistic. The test statistic is a chi-square random variable (χ²) defined by the following equation.

χ² = Σ [ (O_{r,c} - E_{r,c})² / E_{r,c} ]

where O_{r,c} is the observed frequency count at level r of Variable A and level c of Variable B, and E_{r,c} is the expected frequency count at level r of Variable A and level c of Variable B.

P-value. The P-value is the probability, assuming the null hypothesis is true, of observing a sample statistic at least as extreme as the test statistic. Since the test statistic is a chi-square, use the Chi-Square Distribution Calculator to assess the probability associated with the test statistic. Use the degrees of freedom computed above.
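Where the lesson's calculator is not at hand, the same upper-tail probability can be approximated with Python's standard library. The sketch below is not the calculator's own method; it is one possible approach, using the recurrence Q(a + 1, x) = Q(a, x) + x^a e^(-x) / Γ(a + 1) for the regularized upper incomplete gamma function, started from the known closed forms for 1 and 2 degrees of freedom. The function name `chi_square_p_value` is an assumption, and only positive integer degrees of freedom are handled.

```python
import math


def chi_square_p_value(statistic, df):
    """P(X >= statistic) for a chi-square variable with integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if statistic <= 0:
        return 1.0
    half = statistic / 2.0
    if df % 2 == 0:
        a = 1.0                          # df = 2: tail is exp(-x/2)
        q = math.exp(-half)
    else:
        a = 0.5                          # df = 1: tail is erfc(sqrt(x/2))
        q = math.erfc(math.sqrt(half))
    # Raise the shape parameter a = df/2 one unit at a time.
    while 2 * a < df:
        q += math.exp(a * math.log(half) - half - math.lgamma(a + 1.0))
        a += 1.0
    return q


print(round(chi_square_p_value(16.2, 2), 4))  # 0.0003
```

If SciPy is available, `scipy.stats.chi2.sf(statistic, df)` computes the same tail probability and is the more usual choice in practice.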

Interpret Results

If the sample findings are unlikely, given the null hypothesis, the researcher rejects the null hypothesis. Typically, this involves comparing the P-value to the significance level, and rejecting the null hypothesis when the P-value is less than the significance level.

Test Your Understanding of This Lesson

Problem

A public opinion poll surveyed a simple random sample of 1000 voters. Respondents were classified by gender (male or female) and by voting preference (Republican, Democrat, or Independent). Results are shown in the contingency table below.

                Voting Preferences
                Republican   Democrat   Independent   Row total
Male                   200        150            50         400
Female                 250        300            50         600
Column total           450        450           100        1000

Is there a gender gap? Do the men's voting preferences differ significantly from the women's preferences? Use a 0.05 level of significance.

Solution

The solution to this problem takes four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) analyze sample data, and (4) interpret results. We work through those steps below:

State the hypotheses. The first step is to state the null hypothesis and an alternative hypothesis.

H0: Gender and voting preferences are independent.

Ha: Gender and voting preferences are not independent.

Formulate an analysis plan. For this analysis, the significance level is 0.05. Using sample data, we will conduct a chi-square test for independence.

Analyze sample data. Applying the chi-square test for independence to sample data, we compute the degrees of freedom, the expected frequency counts, and the chi-square test statistic. Based on the chi-square statistic and the degrees of freedom, we determine the P-value.

DF = (r - 1) * (c - 1) = (2 - 1) * (3 - 1) = 2

E_{r,c} = (n_r * n_c) / n

E_{1,1} = (400 * 450) / 1000 = 180000/1000 = 180

E_{1,2} = (400 * 450) / 1000 = 180000/1000 = 180

E_{1,3} = (400 * 100) / 1000 = 40000/1000 = 40

E_{2,1} = (600 * 450) / 1000 = 270000/1000 = 270

E_{2,2} = (600 * 450) / 1000 = 270000/1000 = 270

E_{2,3} = (600 * 100) / 1000 = 60000/1000 = 60

χ² = Σ [ (O_{r,c} - E_{r,c})² / E_{r,c} ]

χ² = (200 - 180)²/180 + (150 - 180)²/180 + (50 - 40)²/40 + (250 - 270)²/270 + (300 - 270)²/270 + (50 - 60)²/60

χ² = 400/180 + 900/180 + 100/40 + 400/270 + 900/270 + 100/60

χ² = 2.22 + 5.00 + 2.50 + 1.48 + 3.33 + 1.67 = 16.2

where DF is the degrees of freedom, r is the number of levels of gender, c is the number of levels of the voting preference, n_r is the number of observations from level r of gender, n_c is the number of observations from level c of voting preference, n is the number of observations in the sample, E_{r,c} is the expected frequency count when gender is level r and voting preference is level c, and O_{r,c} is the observed frequency count when gender is level r and voting preference is level c.

The P-value is the probability that a chi-square statistic having 2 degrees of freedom is more extreme than 16.2.

We use the Chi-Square Distribution Calculator to find P(χ² > 16.2) = 0.0003.

Interpret results. Since the P-value (0.0003) is less than the significance level (0.05), we reject the null hypothesis. Thus, we conclude that there is a relationship between gender and voting preference.
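The whole worked solution can be reproduced end to end with a short Python sketch, shown here as a cross-check and not as part of the original lesson. Variable names are assumptions. The P-value line uses the closed form exp(-χ²/2), which holds only because this problem has exactly 2 degrees of freedom.

```python
import math

# Rows: Male, Female. Columns: Republican, Democrat, Independent.
observed = [[200, 150, 50], [250, 300, 50]]

row_totals = [sum(row) for row in observed]          # n_r
col_totals = [sum(col) for col in zip(*observed)]    # n_c
n = sum(row_totals)

# DF = (r - 1) * (c - 1)
df = (len(row_totals) - 1) * (len(col_totals) - 1)

# E_{r,c} = (n_r * n_c) / n
expected = [[nr * nc / n for nc in col_totals] for nr in row_totals]

# Chi-square statistic: sum of (O - E)^2 / E over all cells.
statistic = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Upper-tail probability; this closed form is valid only when df == 2.
p_value = math.exp(-statistic / 2.0)

alpha = 0.05
reject_null = p_value < alpha

print(df)                    # 2
print(expected)              # [[180.0, 180.0, 40.0], [270.0, 270.0, 60.0]]
print(round(statistic, 1))   # 16.2
print(round(p_value, 4))     # 0.0003
print(reject_null)           # True
```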

Note: If you use this approach on an exam, you may also want to mention why this approach is appropriate. Specifically, the approach is appropriate because the sampling method was simple random sampling, each population was more than 10 times larger than its respective sample, the variables under study were categorical, and the expected frequency count was at least 5 in each cell of the contingency table.
