Methods For Proportions

Relations between Categorical Variables

Chapter 10
Goals for Chapter 10
1. Standard deviations for proportion differences
2. Confidence intervals and hypothesis tests for proportion differences
3. Contingency tables for several proportions
4. Statistical significance: the chi-square statistic for a contingency table
5. Relative risk, increased risk, odds ratio
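Difference in Sample Proportions: standard deviation, confidence interval
Notation: $p_i = x_i / N_i$ is the observed proportion for sample $i$, with $i = 1$ or $2$ for the two samples. (Note: the text writes this as p with a caret over it, "p-hat".)
Estimated s.d. for the proportion difference:
$$ s_{1-2} = \sqrt{ \frac{p_1(1-p_1)}{N_1} + \frac{p_2(1-p_2)}{N_2} } $$
Confidence interval for the proportion difference (with $\pi_1 - \pi_2$ denoting the population difference):
$$ (p_1 - p_2) - z_{1-\alpha/2}\, s_{1-2} \;\le\; \pi_1 - \pi_2 \;\le\; (p_1 - p_2) + z_{1-\alpha/2}\, s_{1-2} $$
Hypothesis test for the proportion difference: use the z-statistic
$$ z = \frac{p_1 - p_2}{s_{1-2}} $$
The formula above assumes the null hypothesis, $H_0$: the population proportion difference is 0.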
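The formulas above can be sketched in a few lines of Python. This is only an illustration: the function name `diff_proportions` is hypothetical, the counts (29 of 100 smokers, 198 of 486 nonsmokers) are taken from the smoking example in this chapter, and the critical value 1.96 assumes a 95% confidence level.

```python
import math

def diff_proportions(x1, n1, x2, n2, z_crit=1.96):
    """Return (difference, s.d., z, confidence interval) for two sample proportions."""
    p1, p2 = x1 / n1, x2 / n2
    # Estimated s.d. of the difference, exactly as in the formula above
    s = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    z = diff / s  # z-statistic for H0: population difference is 0
    ci = (diff - z_crit * s, diff + z_crit * s)
    return diff, s, z, ci

# Smokers: 29 of 100; nonsmokers: 198 of 486 (counts from the chapter's example)
diff, s, z, ci = diff_proportions(29, 100, 198, 486)
print(f"difference = {diff:.3f}, s.d. = {s:.4f}, z = {z:.2f}")
print(f"95% CI: ({ci[0]:.3f}, {ci[1]:.3f})")
```

With these counts the whole interval lies below zero and |z| exceeds 1.96, which agrees with the chi-square result obtained later for the same table.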
Example for Proportion Differences
A study classified pregnant women according to whether they smoked and whether they were able to get pregnant during the first cycle they tried.
RESULTS:
              1st cycle   2nd cycle   Total   Percent 1st cycle
Smoker            29          71       100         29.0%
Nonsmoker        198         288       486         40.7%
Total            227         359       586         38.7%
Calculating Conditional Percentages
What proportion of women who smoke became pregnant during the first cycle? 29/100 = 29%
What proportion of women who don't smoke became pregnant during the first cycle? 198/486 = 40.7%
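As a small illustration, the same conditional (row) percentages can be computed directly from the counts. The dictionary layout and the helper name `row_percent` are assumptions made for this sketch.

```python
# Counts from the smoking example: (1st cycle, 2nd cycle or later)
table = {"smoker": (29, 71), "nonsmoker": (198, 288)}

def row_percent(counts):
    """Percent of a row's total that falls in its first column."""
    return 100 * counts[0] / sum(counts)

percents = {group: row_percent(counts) for group, counts in table.items()}
for group, pct in percents.items():
    print(f"{group}: {pct:.1f}% pregnant in 1st cycle")
```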
[Figure: stacked bar chart (0% to 100%) of "1st cycle?" versus "2nd+ cycle?" for smokers (29 vs. 71), nonsmokers (198 vs. 288), and all women (227 vs. 359).]
Statistical Significance of 2x2 Tables--1
Strength of relation: compare the percents or rates of those who do with those who don't.
Example: smokers, % pregnant in 1st cycle: 29%
nonsmokers, % pregnant in 1st cycle: 41%
Therefore nonsmokers are 41/29 = 1.4 times as likely to become pregnant during the 1st cycle.
Size of study sample: how does the number of subjects affect the significance of the result?
For the same observed proportions, the result becomes more significant (less likely to be the result of chance) as the sample size increases.
If there had been only 59 women in the study, the same difference in proportions would be much less significant.
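The effect of sample size can be illustrated with the z-statistic for a proportion difference, z = (p1 - p2) / s, where s is the estimated s.d. of the difference. The sketch below holds the two observed proportions fixed and shrinks both groups to one tenth of their size (about 59 women in total). The fractional group size 48.6 is an artifact of that scaling, used only for illustration, and `z_stat` is a hypothetical helper name.

```python
import math

def z_stat(p1, n1, p2, n2):
    """z-statistic for a difference in proportions (unpooled s.d.)."""
    s = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / s

p_smoker, p_non = 29 / 100, 198 / 486

z_full = z_stat(p_smoker, 100, p_non, 486)    # the actual study, 586 women
z_small = z_stat(p_smoker, 10, p_non, 48.6)   # same proportions, about 59 women

print(f"z with 586 women: {z_full:.2f}")
print(f"z with ~59 women: {z_small:.2f}")
```

Because the s.d. grows as the groups shrink, z falls by a factor of the square root of 10 and no longer exceeds 1.96 in absolute value.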
Assessing Statistical Significance of Tables--2
We use the chi-square statistic to determine whether differences between proportions are real or due to chance.
The chi-square statistic measures how far the observed counts depart from those expected on the basis of pure chance; for example, if we tossed snake-eyes in craps on every throw, we might think the dice were loaded.
For the previous example, if there were no difference between smokers and nonsmokers, we would expect the proportions for both to be the same as in the total:
First cycle: 227/586 = 0.387, or 38.7%. On the basis of this expected proportion, we calculate the expected numbers:
Calculating the Chisquare Statistic-Expected Values
Since the total number of smokers is 100, there would be 100 x 0.387 = 38.7 smokers pregnant in the first cycle if there were no difference between smokers and nonsmokers.
Since the total number pregnant during the first cycle is 227, there would be 227 - 38.7 = 188.3 nonsmokers pregnant during the first cycle.
Expected values:
              1st cycle   2nd cycle   Total
Smokers          38.7        61.3      100
Nonsmokers      188.3       297.7      486
Total           227         359        586
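The expected values can also be obtained for every cell at once as (row total x column total) / grand total, which is equivalent to the step-by-step subtraction used here. A minimal sketch, with `expected_counts` as a hypothetical helper name:

```python
def expected_counts(observed):
    """Expected cell counts under no association:
    row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]

# Observed counts: rows = smokers, nonsmokers; columns = 1st cycle, 2nd cycle
observed = [[29, 71], [198, 288]]
expected = expected_counts(observed)
for label, row in zip(("smokers", "nonsmokers"), expected):
    print(label, [round(e, 1) for e in row])
```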
Calculating the Chisquare Statistic--Differences
Once the expected values for each cell are calculated, we take the difference between the observed and expected values for each cell i, observed_i - expected_i.
Note that we only have to calculate one difference; the differences in each row and each column have to sum to zero.
Differences (observed - expected):
              1st cycle   2nd cycle
Smokers          -9.7        +9.7
Nonsmokers       +9.7        -9.7
Calculating the Chisquare Statistic-2x2 Tables
Once the differences, $D_i$, and expected values, $E_i$, for each cell are calculated, the chi-square statistic is evaluated from the formula
$$ \chi^2 = \sum_i \frac{D_i^2}{E_i} $$
$$ \chi^2 = (9.7)^2 \left[ \frac{1}{38.7} + \frac{1}{61.3} + \frac{1}{188.3} + \frac{1}{297.7} \right] = 4.78 $$
This value is greater than 3.84, the critical value for chi-square with 1 degree of freedom at the 5% significance level (95% confidence).
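The 2x2 calculation can be sketched as follows, using the fact that all four differences share the same magnitude. The function name is hypothetical, and the code keeps full precision instead of rounding the difference to 9.7.

```python
def chi_square_2x2_from_difference(observed):
    """Chi-square for a 2x2 table as D^2 * sum(1 / E_i),
    where D is the common magnitude of observed - expected."""
    (a, b), (c, d) = observed
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    expected = [r * col / n for r in rows for col in cols]
    diff = a - expected[0]  # the other three cells differ by the same amount
    return diff ** 2 * sum(1 / e for e in expected)

# Rows = smokers, nonsmokers; columns = 1st cycle, 2nd cycle
chi2 = chi_square_2x2_from_difference([[29, 71], [198, 288]])
print(f"chi-square = {chi2:.2f}")
print("exceeds 3.84:", chi2 > 3.84)
```

The unrounded result is about 4.82, slightly above the hand-calculated 4.78, because the hand calculation rounds the difference and the expected values to one decimal place. The conclusion is the same.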
Two X Two Tables and Chi-Square Statistics
Example: Are males more likely to be underachievers?
Students were classified as underachievers if their grades in high school were below the prediction given by a reading test at age 12.
          Under   Over   Total   % Under
Boys        26     13      39       67
Girls        8     22      30       27
Total       34     35      69       49
[Figure: stacked bar chart (0% to 100%) of under- and overachievers for boys, girls, and total.]
2x2 Table and Chisquare Statistic Example,cont.
Calculation of the chi-square statistic for the previous example:
1. Compute expected values: boys under: (39/69) x 34 = 19.2; girls under: 34 - 19.2 = 14.8; boys over: 39 - 19.2 = 19.8; girls over: 30 - 14.8 = 15.2.
2. Take the difference between observed and expected, square it, and divide by expected for each cell: boys under: (+6.8)^2 / 19.2 = 2.41; girls under: (-6.8)^2 / 14.8 = 3.12; boys over: (-6.8)^2 / 19.8 = 2.34; girls over: (+6.8)^2 / 15.2 = 3.04.
3. Sum the terms calculated in step 2 to get the chi-square statistic: chi-square = 2.41 + 3.12 + 2.34 + 3.04 = 10.91.
4. Compare the calculated chi-square statistic with 3.84 to determine significance (at the 5% significance level, i.e. 95% confidence). In this example, 10.91 is much greater than 3.84, so the difference in proportions is statistically significant.
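As a cross-check on the hand calculation, a 2x2 chi-square can also be computed with the shortcut N(ad - bc)^2 / (product of the four marginal totals). This shortcut is not part of the slides; it is algebraically equivalent to summing D^2 / E over the four cells. The function name is hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square for the 2x2 table [[a, b], [c, d]] via the shortcut formula
    N * (ad - bc)^2 / (row1 * row2 * col1 * col2)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Boys: 26 under, 13 over; girls: 8 under, 22 over
chi2 = chi_square_2x2(26, 13, 8, 22)
print(f"chi-square = {chi2:.2f}")
```

The unrounded value is about 10.85; the hand calculation gives 10.91 because each intermediate value is rounded. Either way the statistic is far above 3.84.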
Risk and Odds
Both the risk and the odds give information about the likelihood of a positive response to a categorical variable, but their numerical values differ. Example: the 2x2 table below gives results for stopping smoking after eight weeks' use of either a nicotine patch or a placebo. Note that the risk of continuing to smoke after using the nicotine patch is 64/120 = 0.53, or 53%, compared with the greater risk for placebo use, 96/120 = 0.80, or 80%. Thus the RISK is the conditional probability of an outcome, given a category of the explanatory variable. The ODDS is the ratio of the conditional probabilities of the two outcomes (here, smokes versus stops) and can be less than or greater than one.
            Smokes   Stops   Total   Risk             Odds
Nicotine      64       56     120    64/120 = 0.53    64/56 = 1.1
Placebo       96       24     120    96/120 = 0.80    96/24 = 4.0
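A short sketch of the two definitions, using the counts in the table (64 smoke and 56 stop with the nicotine patch; 96 smoke and 24 stop with the placebo). The helper name `risk_and_odds` is hypothetical.

```python
def risk_and_odds(event, no_event):
    """Risk = event / total; odds = event / no_event."""
    total = event + no_event
    return event / total, event / no_event

# (still smokes, stopped) after eight weeks
groups = {"nicotine": (64, 56), "placebo": (96, 24)}
results = {name: risk_and_odds(*counts) for name, counts in groups.items()}

for name, (risk, odds) in results.items():
    print(f"{name}: risk of still smoking = {risk:.2f}, odds = {odds:.2f}")
```

Evaluated from these counts, the risk of continuing to smoke is 64/120, about 0.53, for the patch and 0.80 for the placebo.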
Relative Risk and 2x2 Tables
Every 2x2 table has an explanatory variable with two categories (e.g., for the previous slide, whether a nicotine patch or a placebo was used).
The ratio of the risks for these two categories is called the RELATIVE RISK.
Example: RR, the relative risk of continuing to smoke if the placebo rather than the nicotine patch is used:
RR = (96/120) / (64/120) = 0.80 / 0.53 = 1.5
Odds Ratio and 2x2 Tables
The ratio of the odds for the two categories of the explanatory variable is called, as might be expected, the Odds Ratio (OR). If the odds are very small, then the Odds Ratio and Relative Risk are approximately equal.
Example: stopping smoking with the nicotine patch:
Odds (placebo) = 96/24 = 4.0;
Odds (nicotine) = 64/56 = 1.1;
Thus OR = (96/24) / (64/56) = 3.5
(compared with RR = (96/120) / (64/120) = 1.5).
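The two ratios can be compared side by side in a few lines. The function names are hypothetical, and the second pair of counts (2 of 1000 versus 1 of 1000) is invented purely to illustrate the small-odds case; it is not data from any study.

```python
def relative_risk(a, b, c, d):
    """Risk in group 1 divided by risk in group 2.
    a, b = events, non-events in group 1; c, d = the same in group 2."""
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    """Odds in group 1 divided by odds in group 2."""
    return (a / b) / (c / d)

# Placebo (96 smoke, 24 stop) versus nicotine patch (64 smoke, 56 stop)
rr = relative_risk(96, 24, 64, 56)
odds_r = odds_ratio(96, 24, 64, 56)
print(f"RR = {rr:.2f}, OR = {odds_r:.2f}")

# Hypothetical rare outcome: 2 events in 1000 versus 1 event in 1000
rare_rr = relative_risk(2, 998, 1, 999)
rare_or = odds_ratio(2, 998, 1, 999)
print(f"rare outcome: RR = {rare_rr:.3f}, OR = {rare_or:.3f}")
```

When the outcome is common, as in the smoking data, the two measures differ substantially; for the rare hypothetical outcome they nearly coincide.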
Simpson's Paradox and Hidden Variables
Example: 1972 admission rates for graduate programs, UC (Berkeley). Overall, the percent of women applicants admitted was less than the percent of men applicants admitted, even though the women's percentages were higher within individual departments (see Exercise 13). The paradox can be explained by the different overall selectivity of each program and the different proportions of men and women applying to each program.
Program | Sex     Admit   Deny   Percent Admit
A | Men            400     250   (400/650)   61.5
A | Women           50      25   ( 50/75 )   66.7
B | Men             50     300   ( 50/350)   14.3
B | Women          125     300   (125/425)   29.4
A+B | Men          450     550   (450/1000)  45.0
A+B | Women        175     325   (175/500)   35.0
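The reversal can be reproduced from the table's counts. The data layout and helper name below are assumptions made for this sketch.

```python
# (admitted, denied) for each program and sex, from the table above
admissions = {
    "A": {"men": (400, 250), "women": (50, 25)},
    "B": {"men": (50, 300), "women": (125, 300)},
}

def admit_rate(admit, deny):
    """Percent admitted."""
    return 100 * admit / (admit + deny)

# Within each program, women are admitted at the higher rate
program_rates = {
    prog: {sex: admit_rate(*counts) for sex, counts in by_sex.items()}
    for prog, by_sex in admissions.items()
}

# Combine the programs by adding counts, not by averaging percentages
combined_rates = {}
for sex in ("men", "women"):
    admit = sum(by_sex[sex][0] for by_sex in admissions.values())
    deny = sum(by_sex[sex][1] for by_sex in admissions.values())
    combined_rates[sex] = admit_rate(admit, deny)

for prog, rates in program_rates.items():
    print(prog, {sex: round(r, 1) for sex, r in rates.items()})
print("A+B", {sex: round(r, 1) for sex, r in combined_rates.items()})
```

Most men applied to program A, which admits most applicants, while most women applied to the more selective program B. That difference in where people applied drives the combined rates in the opposite direction.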
Goodness of Fit-Comparing Observed with Theoretical Proportions
Procedure:
tabulate observed frequencies, $N_i$, for each category;
tabulate expected (theoretical) frequencies, $E_i$;
take the difference between corresponding observed and theoretical frequencies, $D_i = N_i - E_i$;
calculate the chi-square statistic by the formula
$$ \chi^2 = \sum_i \frac{D_i^2}{E_i} $$
with degrees of freedom (df) = k - 1, where k is the number of categories (one proportion).
Example: Exercises 10.14, 10.15 (p. 402).
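The procedure can be sketched as below. The die-roll counts are hypothetical, invented only to exercise the formula (60 rolls of a die assumed fair, so 10 expected per face), and `goodness_of_fit` is a hypothetical helper name. The critical value 11.07 is the standard 5% chi-square value for 5 degrees of freedom.

```python
def goodness_of_fit(observed, expected):
    """Return (chi-square, df) comparing observed with theoretical frequencies."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    chi2 = sum((n - e) ** 2 / e for n, e in zip(observed, expected))
    return chi2, len(observed) - 1

# Hypothetical counts for faces 1..6 in 60 rolls
observed = [8, 12, 9, 11, 6, 14]
expected = [10] * 6

chi2, df = goodness_of_fit(observed, expected)
print(f"chi-square = {chi2:.1f}, df = {df}")
print("significant at the 5% level:", chi2 > 11.07)
```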
Comparing Several Proportions, Categories
Procedure:
Set up a k (number of explanatory categories) by r (number of response categories) contingency table;
tabulate the count for each cell in the table, the marginal totals for each explanatory category, $N_k$, and each response category, $N_r$, and the grand total, $N$.
For the cell corresponding to the kth category and rth response, the expected number (if the response proportions were the same for every category) is
$$ E_{kr} = \frac{N_k N_r}{N} $$
Calculate the difference $D_{kr} = N_{kr} - E_{kr}$ and calculate a chi-square statistic from the formula
$$ \chi^2 = \sum_{k,r} \frac{D_{kr}^2}{E_{kr}} $$
Degrees of freedom, df = (k-1)(r-1)
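A minimal sketch of this procedure for a table of any size follows. The 2x3 counts are hypothetical and chosen so the arithmetic is easy to follow by hand, and the function name is an assumption. The critical value 5.99 is the standard 5% chi-square value for 2 degrees of freedom.

```python
def chi_square_table(observed):
    """Return (chi-square, df) for a k x r contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, n_kr in enumerate(row):
            e_kr = row_totals[i] * col_totals[j] / grand  # E_kr = N_k * N_r / N
            chi2 += (n_kr - e_kr) ** 2 / e_kr             # D_kr^2 / E_kr
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df

# Hypothetical counts: 2 explanatory categories x 3 response categories
chi2, df = chi_square_table([[10, 20, 30], [20, 20, 20]])
print(f"chi-square = {chi2:.2f}, df = {df}")
print("significant at the 5% level:", chi2 > 5.99)
```

For a 2x2 table the same function reduces to the earlier calculation with df = 1.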
Comparing Several Proportions, Categories

Example, Exercise 10.18, p.408:
A study of potential age discrimination considers promotions among middle managers in a large company. The data are
promoted: 9 under 30; 29 in the 30 to 39 category; 32 in 40 to 49; and 10 in the 50 and over category (total 80);
not promoted: 41 under 30; 41 in the 30 to 39 category; 48 in 40 to 49; and 49 in 50 and over (total 170)
