
How to Perform a Chi Square Goodness-of-Fit Test

My hypothesis is that a particular penny is a fair penny. In other words, that it is not weighted or in any other way designed
to favor falling with heads up or to favor falling with tails up. If this is true of my coin, then my prediction is that the
probability of flipping heads (P(H)) is 0.5, and the probability of flipping tails (P(T)) is also 0.5. This means that I am
predicting that ½ of the time the coin will come up heads, and ½ of the time it will come up tails. Therefore, if I flip a coin
300 times, my hypothesis predicts:
Expected:   Heads: 150   Tails: 150   Total: 300
To test this hypothesis, I flip my penny 300 times. Here are the numbers I get:
Observed:   Heads: 162   Tails: 138   Total: 300
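The expected counts above are just each predicted probability multiplied by the number of flips. A minimal Python sketch of that arithmetic follows; the function name `expected_counts` and the dictionary layout are illustrative choices, not part of the original handout.

```python
def expected_counts(probabilities, trials):
    """Return the count each outcome should show if the hypothesis holds."""
    return {outcome: p * trials for outcome, p in probabilities.items()}


# The hypothesis: a fair penny, so P(H) = P(T) = 0.5.
fair_penny = {"heads": 0.5, "tails": 0.5}
expected = expected_counts(fair_penny, 300)
print(expected)
```

The same helper would give the expected counts for a fair die if it were passed six outcomes with probability 1/6 each.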
There are several factors which are important in determining the significance of the difference between the observed (O)
and expected (E) values.
The absolute difference in numbers is important. This is obtained by subtracting the E value from the O value (O - E).
For heads: O - E = 162 - 150 = 12        For tails: O - E = 138 - 150 = -12
To get rid of the plus and minus signs, and for other esoteric statistical reasons, these values are squared, giving us
(O - E)² for each of our data classes.
For heads: (O - E)² = 12² = 144        For tails: (O - E)² = (-12)² = 144
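One way to see why the signs have to go: the raw deviations always cancel out when the totals match, so they cannot be summed directly. The short Python sketch below uses the coin counts from this example (the variable names are illustrative) to show the deviations, their sum, and the squared values.

```python
observed = {"heads": 162, "tails": 138}
expected = {"heads": 150, "tails": 150}

# Signed deviation O - E for each class of data.
deviations = {k: observed[k] - expected[k] for k in observed}

# Squaring removes the sign, so the two classes no longer cancel.
squared = {k: d ** 2 for k, d in deviations.items()}

print(deviations)                # signed deviations per class
print(sum(deviations.values()))  # the signed deviations cancel
print(squared)                   # squared deviations per class
```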
The number of trials is also very important. A particular deviation from perfect means a lot more if there are only a few
trials than it would if there were many trials. We take this into account by dividing our (O - E)² values by the expected
values (which reflect the number of trials).
For heads: (O - E)²/E = 144/150 = 0.96*        For tails: (O - E)²/E = 144/150 = 0.96*
*These values won't always work out to be the same for all of the categories. In this case
they do because we have only two categories of data, and our expectations for the two
categories are identical.
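To illustrate why dividing by the expected value matters, the sketch below compares the same 12-head surplus in 300 flips and in a hypothetical 3000-flip experiment. The 3000-flip counts are invented purely for illustration and do not come from the handout.

```python
def scaled_deviation(observed, expected):
    """Return (O - E)^2 / E for a single class of data."""
    return (observed - expected) ** 2 / expected


# 300 flips: 162 heads where 150 were expected (the handout's numbers).
small_experiment = scaled_deviation(162, 150)

# Hypothetical 3000 flips: the same 12-head surplus, 1512 where 1500 were expected.
large_experiment = scaled_deviation(1512, 1500)

print(round(small_experiment, 3))
print(round(large_experiment, 3))
```

The surplus of 12 heads contributes ten times less in the larger experiment, which matches the point that a given deviation means more when there are few trials.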
To calculate the chi-square value for our experiment, we add together all of the (O - E)²/E values, one for each of the
categories of results. (In this experiment, our categories of results are "heads" and "tails"; for the dice you will be using in
class, there would be six categories of results: 1, 2, 3, 4, 5, and 6.)
Sum of the χ² = 0.96 + 0.96 = 1.92
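The whole calculation can be sketched as one small Python function that sums (O - E)²/E over all categories. The name `chi_square` and the list-based interface are assumptions made for this example.

```python
def chi_square(observed, expected):
    """Sum (O - E)^2 / E over all classes of data."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need one value per class")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Coin example: [heads, tails]
coin_chi_square = chi_square([162, 138], [150, 150])
print(round(coin_chi_square, 2))
```

Because the function only loops over paired values, it would work unchanged for six dice categories.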
Note some important features of this number. It's the sum of two numbers derived from fractions. The absolute differences
between expected and observed results are in the numerators of those fractions, so the more you miss, the bigger the
chi-square number will turn out to be. The expected values, reflecting the number of trials, are in the denominators of those
fractions, and thus, for a given deviation, the bigger your sample size, the smaller the χ² numbers will turn out to be.
All of this information can be laid out in a χ² data table:

Class (of data)   Expected   Observed   (O - E)   (O - E)²   (O - E)²/E
Heads               150        162         12       144        0.96
Tails               150        138        -12       144        0.96
Total               300        300                  Sum of χ² = 1.92
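A table like this can be produced programmatically. The following is a rough Python sketch; the function name, column widths, and ASCII column labels are arbitrary choices made for this example.

```python
def chi_square_table(rows):
    """Build the data table as text lines; rows are (class, expected, observed)."""
    header = f"{'Class':<8}{'E':>6}{'O':>6}{'O-E':>6}{'(O-E)^2':>10}{'(O-E)^2/E':>12}"
    lines = [header]
    total = 0.0
    for name, e, o in rows:
        term = (o - e) ** 2 / e  # this class's contribution to chi-square
        total += term
        lines.append(
            f"{name:<8}{e:>6}{o:>6}{o - e:>6}{(o - e) ** 2:>10}{term:>12.2f}"
        )
    lines.append(f"Sum of chi-square = {total:.2f}")
    return lines


table = chi_square_table([("Heads", 150, 162), ("Tails", 150, 138)])
print("\n".join(table))
```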
NOTE that the greater the deviation of any observed value from its expected value, the larger the χ² value will be, and that
the larger the sample size, the smaller the χ² value will be. Thus, in general, the smaller the sum of the χ² value, the better
the fit between our prediction and our actual data.
Now that you have a sum of the χ² value, you must determine how significant that value is. Remember that the question is, are
your actual data different enough from your predicted data to cast your hypothesis in doubt? For the next step, you need one
additional bit of information: the degrees of freedom (df). Degrees of freedom reflect how many of the class counts are free
to vary independently once the total number of trials is fixed. To calculate the degrees of freedom, we need to know the
number of classes of data. In the case of this example, that number would be two ("heads" and "tails"). If you were doing an
exercise with dice, rather than coins, the number of classes of data would be six (the six possible sides of a die). Degrees
of freedom will generally be the number of classes of data minus one. In this case, 2 - 1 = 1 degree of freedom. Again, if we
were dealing with dice rather than coins, degrees of freedom would be 6 - 1 = 5.
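The rule of thumb above (number of classes minus one) is simple enough to express in a couple of lines. This sketch assumes a plain goodness-of-fit test like the coin and dice examples; the function name is illustrative.

```python
def degrees_of_freedom(num_classes):
    """Degrees of freedom for a goodness-of-fit test: classes of data minus one."""
    if num_classes < 2:
        raise ValueError("need at least two classes of data")
    return num_classes - 1


coin_df = degrees_of_freedom(2)  # heads, tails
dice_df = degrees_of_freedom(6)  # faces 1 through 6
print(coin_df, dice_df)
```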
Now we have two different numbers, the sum of the χ² and the degrees of freedom: 1.92 and 1, respectively, for our coin
tossing example. The final step in our process is to refer to a professionally prepared table of the probabilities of χ² values.
Such a table is reproduced at the end of this document. These tables come in a variety of sizes, depending upon how
many subdivisions (columns) are present, and how high the degrees of freedom go. This particular table is rather small
compared to many available tables. The table lists the degrees of freedom as the headings to the rows. Across the top are
probability figures, the "probability of the Chi-Square." The interior of the table consists of the sum of the χ² values
themselves. Remember, the point of the exercise is to decide whether our actual data are far enough away from the numbers
which we predicted to justify throwing out our hypothesis.
To Use the Table
1. Find the degrees of freedom for your data (1 in this case) in the left-hand column of the table.
2. Scan across the row of χ² values beside the df number until you find two values which bracket your calculated
number (1.92 in this case). This means that one of the figures will be larger, and the other will be smaller. If the
table were subdivided into enough columns, you might have found your exact calculated value on the table, but you
should easily be able to see why that happens only very rarely. Generally, you have to be satisfied with finding the
bracketing numbers. In this case, 1.92 falls between the numbers 0.455 and 2.706.
3. Look up at the top of the table to see which probabilities correspond to your bracketing χ² values, in this case 0.50
and 0.10 respectively. If you had found your exact χ² value on this table, its probability would have fallen somewhere
between these two. So we could say that 0.10 < P(χ²) < 0.50. This mathematical statement means "the probability
of our Chi-Square falls between 0.10 and 0.50."
4. So what does that mean? A probability of 0.10 corresponds to a "chance" of 10%; a probability of 0.50 to a "chance"
of 50%. This chi-square result means that, if our hypothesis is correct, and we performed exactly this experiment
over and over again, 10% to 50% of the time, our results would be at least this far from what we predicted. Or, the
probability that we would get results at least as bad as these, even though our hypothesis is correct, is between
0.10 and 0.50.
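5. The usual "level of discrimination" used by investigators is P(χ²) = 0.05. Thus, if your chi-square value has a
probability of 0.05 or lower, it is very likely (but not certain) that your hypothesis is not correct. In our coin
example the probability lies between 0.10 and 0.50, well above 0.05, so the data give us no reason to reject the
hypothesis that the penny is fair.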
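The lookup steps above can be sketched in Python. The probabilities and the df = 1 row below are the standard critical values of the χ² distribution; the function name `bracket_probability` and the decision rule written at the end are illustrative assumptions, not a definitive implementation.

```python
# Column headings (probabilities) and the df = 1 row of the critical-value table.
PROBABILITIES = [0.995, 0.975, 0.9, 0.5, 0.1, 0.05, 0.025, 0.01, 0.005]
DF1_ROW = [0.000, 0.000, 0.016, 0.455, 2.706, 3.841, 5.024, 6.635, 7.879]


def bracket_probability(chi_sq, row, probabilities):
    """Return (higher_p, lower_p) for the two table entries that bracket chi_sq."""
    for i in range(len(row) - 1):
        if row[i] <= chi_sq < row[i + 1]:
            return probabilities[i], probabilities[i + 1]
    return None  # chi_sq lies beyond the last column of the table


higher_p, lower_p = bracket_probability(1.92, DF1_ROW, PROBABILITIES)
print(f"{lower_p} < P(chi-square) < {higher_p}")

# The hypothesis is in doubt only if the whole bracket sits at or below 0.05.
reject = higher_p <= 0.05
print("reject hypothesis:", reject)
```

For the coin example the bracket is 0.10 to 0.50, so `reject` comes out false and the fair-penny hypothesis stands.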
Critical Values of the χ² Distribution

Probability of the Chi-Square [P(χ²)]
df    0.995    0.975    0.9      0.5      0.1      0.05     0.025    0.01     0.005
 1    0.000    0.000    0.016    0.455    2.706    3.841    5.024    6.635    7.879
 2    0.010    0.051    0.211    1.386    4.605    5.991    7.378    9.210   10.597
 3    0.072    0.216    0.584    2.366    6.251    7.815    9.348   11.345   12.838
 4    0.207    0.484    1.064    3.357    7.779    9.488   11.143   13.277   14.860
 5    0.412    0.831    1.610    4.351    9.236   11.070   12.832   15.086   16.750
 6    0.676    1.237    2.204    5.348   10.645   12.592   14.449   16.812   18.548
 7    0.989    1.690    2.833    6.346   12.017   14.067   16.013   18.475   20.278
 8    1.344    2.180    3.490    7.344   13.362   15.507   17.535   20.090   21.955
 9    1.735    2.700    4.168    8.343   14.684   16.919   19.023   21.666   23.589
10    2.156    3.247    4.865    9.342   15.987   18.307   20.483   23.209   25.188
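A printed table only brackets the probability. For the special case of one degree of freedom, the exact tail probability has a closed form, P = erfc(sqrt(χ²/2)), which the Python standard library can evaluate. The sketch below applies only to df = 1; the function name is an assumption, and other degrees of freedom would need a statistics library.

```python
import math


def chi_square_probability_df1(chi_sq):
    """Probability of a chi-square value at least this large, for df = 1 only."""
    if chi_sq < 0:
        raise ValueError("chi-square values cannot be negative")
    return math.erfc(math.sqrt(chi_sq / 2))


p_coin = chi_square_probability_df1(1.92)
print(round(p_coin, 3))
```

For the coin example this gives a probability of roughly 0.17, which sits inside the 0.10 to 0.50 bracket read from the table.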
