La Morte

Main Menu - The hyperlinks below take you to the appropriate worksheet.
Basic Concepts
Normal Distribution & Standard Deviation
Skewed Distribution
Epidemic curve (how to create one)
Descriptive Statistics (mean, median, mode, 95% confidence interval for a mean, standard deviation, standard error, range)
Epidemiology/Biostatistics Tools
Wayne W. LaMorte, MD, PhD, MPH
Copyright 2006

Statistical Tests
ANOVA (Analysis of Variance)
Chi Squared Test
Confidence Interval for a Proportion
Correlation & Linear Regression
T-test (Unpaired)
T-test (Paired)
Standardized Rates (Proportions) - Direct Standardization

Standardized Incidence Ratio
Fisher's Exact Test (You need to be online to use this.)

Study Analysis
Case-Control
Cohort Studies
Screening Test Performance - Sensitivity/Specificity
Sample Size Calculations

Survival Curves
Random Assignment to Groups

The Normal Distribution: Mean, Variance, and Standard Deviation
Many biological characteristics that are measurements follow a Normal distribution fairly closely, meaning their frequency distributions are bell-shaped and symmetrical around a mean or average value. The shape of the bell will vary between tall & skinny for samples with relatively little variability to short & wide for samples that have a lot of variability. The two datasets shown in green on the worksheet (data set 1 and data set 2) contain values of body mass index (BMI). I can graph the frequency distribution of each dataset by following these steps:
1) Select the block of data.
2) Click on "Data" from the tool bar above, and choose "Sort".
3) Indicate that it is to be sorted according to the column the data is located in, and select "Ok."
4) With the data sorted, I can easily determine the minimum and maximum values and the frequency of each value in the range. Put these tallies in the smaller table entitled "Counts for Bar Chart".
5) Select the 2 column block of data in the "Counts for Bar Chart" and click on the Graph icon from the toolbar above (the miniature, multicolored vertical bar chart). Indicate a vertical bar chart.
Note: If it graphs the BMI and Frequency as two separate entities, you may have to first create the chart as an "XY Scatter" to indicate that they are related, and then convert the chart to a vertical bar. Scroll down to view the graph and for information about mean, standard deviation, etc.

Data set 1:
BMI (44 values)
22, 23, 23, 23, 24, 24, 25, 25, 25, 25, 25,
26, 26, 26, 26, 26, 26, 26,
27, 27, 27, 27, 27, 27, 27, 27, 27,
28, 28, 28, 28, 28, 28,
29, 29, 29, 29, 29,
30, 30, 30, 31, 32, 33

Counts for Bar Chart
BMI   frequency
22    1
23    3
24    2
25    5
26    7
27    9
28    6
29    5
30    3
31    1
32    1
33    1

[Bar chart: frequency (0 to 10) by BMI (22 to 33), annotated +/- 1 SD = 68% and +/- 2 SD = 95%]

27.00   Mean
5.77    Variance
2.40    Standard Deviation
Variability in the data can be quantified from the variance, which basically calculates the average squared distance between each individual value and the mean (the x with the "bar" over it):

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

Standard deviation is just the square root of the variance, and it is convenient because, for normally distributed data, the mean +/- 1 SD captures about 68% of the observations, and the mean +/- 2 SD captures about 95% of the observations:

$$SD = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

Note the functions that are used to calculate variance and SD in Excel. Then compare the standard deviations for datasets 1 and 2, and see how these affect the shape of the frequency distributions.

Using SD versus SEM: A standard deviation from a sample is an estimate of the population SD, e.g. the degree of variability of body weight in the population. The SEM is a measure of the precision of our estimate of the population's mean. The precision of this estimate will increase as the sample size increases, i.e. the SEM will be smaller with larger samples.

If the purpose is to describe a group of patients, for example, to see if they are typical in their variability, one should use SD (e.g. see Tables 2 & 3 in Gottlieb et al.: N. Engl. J. Med. 1981; 305:1425-31). However, if the purpose is to estimate the mean in a group or the prevalence of disease, one should use SEM or a confidence interval.
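
The variance, SD, and SEM described above can also be checked outside Excel. The following is a minimal Python sketch, not part of the workbook; the function names `expand` and `describe` are illustrative, and the frequencies are those in the two "Counts for Bar Chart" tables.

```python
import math
import statistics

# Frequencies from the two "Counts for Bar Chart" tables (BMI -> count).
counts_1 = {22: 1, 23: 3, 24: 2, 25: 5, 26: 7, 27: 9,
            28: 6, 29: 5, 30: 3, 31: 1, 32: 1, 33: 1}
counts_2 = {23: 1, 24: 2, 25: 5, 26: 7, 27: 14,
            28: 6, 29: 5, 30: 3, 31: 1}


def expand(counts):
    """Turn a {value: frequency} table back into the list of raw values."""
    return [value for value, freq in counts.items() for _ in range(freq)]


def describe(values):
    """Return n, mean, sample variance, sample SD, and SEM."""
    n = len(values)
    mean = statistics.mean(values)
    variance = statistics.variance(values)  # divides by n - 1
    sd = math.sqrt(variance)
    sem = sd / math.sqrt(n)                 # SEM = SD / sqrt(n)
    return {"n": n, "mean": mean, "variance": variance, "sd": sd, "sem": sem}


summary_1 = describe(expand(counts_1))
summary_2 = describe(expand(counts_2))
for name, s in (("Data set 1", summary_1), ("Data set 2", summary_2)):
    print(name, {k: round(v, 2) for k, v in s.items()})
```

Both datasets have nearly the same mean, but data set 2 has the smaller SD, which corresponds to the taller, narrower bar chart.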
Data set 2:
BMI (44 values)
23, 24, 24, 25, 25, 25, 25, 25,
26, 26, 26, 26, 26, 26, 26,
27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27,
28, 28, 28, 28, 28, 28,
29, 29, 29, 29, 29,
30, 30, 30, 31

Counts for Bar Chart
BMI   frequency
22    0
23    1
24    2
25    5
26    7
27    14
28    6
29    5
30    3
31    1
32    0
33    0

[Bar chart: frequency (0 to 16) by BMI (22 to 33), annotated +/- 1 SD = 68% and +/- 2 SD = 95%]

27.05   Mean
3.02    Variance
1.74    Standard Deviation
Skewed Distributions

Examining the frequency distribution of a data set is an important first step in analysis. It gives an overall picture of the data, and the shape of the distribution determines the appropriate statistical analysis. Many statistical tests rely on the assumption that the data are normally distributed, and this isn't always the case. Below in the green cells is a data set with hospital length of stay (days) for two sets of patients who had femoral bypass surgery. One data set was collected before instituting a new clinical pathway and one set was collected after instituting it.

Question: Was LOS different after instituting the pathway?

LOS (days), 28 patients in each set
Before: 3, 12, 2, 1, 11, 4, 2, 2, 3, 1, 8, 2, 3, 6, 1, 13, 3, 8, 10, 6, 4, 12, 9, 7, 1, 3, 3, 2
After:  3, 1, 1, 5, 1, 6, 1, 5, 2, 3, 3, 1, 5, 2, 2, 2, 3, 3, 7, 3, 4, 1, 3, 3, 2, 2, 2, 4
Mean:   5.07 (before), 2.86 (after)

We can rapidly get a feel for what is going on here by creating a frequency histogram. The first step is to sort each of the data sets. Begin by selecting the "before" values of LOS. Then, from the top toolbar, click on "Data", "Sort" (if you get a warning about adjacent data, just indicate you want to continue with the current selection). Also, indicate that there is no "header" row and that you want to sort in ascending order. Repeat this procedure for the other data set.

Your data should now look like this:

LOS (days), sorted
Before: 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 6, 6, 7, 8, 8, 9, 10, 11, 12, 12, 13
After:  1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 5, 5, 5, 6, 7

And you can summarize it by counting the frequency of each LOS.

Summary: Frequency of each LOS
LOS     # of people before   # of people after
1       4                    6
2       5                    7
3       6                    8
4       2                    2
5       0                    3
6       2                    1
7       1                    1
8       2                    0
9       1                    0
10      1                    0
11      1                    0
12      2                    0
13      1                    0
14      0                    0
15      0                    0
total   28                   28

[Histogram: number of people (0 to 9) by LOS (1 to 15 days), before and after, with the median, "Mean before" and "Mean after" marked]

           Before   After
Mean       5.07     2.86
Variance   14.74    2.57
SD         3.84     1.60
Median     3        3

Note that the data is not normally distributed; it is a skewed distribution. As a result the standard deviation is large, relative to the mean. In situations where the distribution is quite skewed the mean and standard deviation are misleading parameters to describe the data, and it is better simply to state the median (half of the observations are above the median and half are below) and the range (minimum & maximum values). Note that in this case one mean is almost twice as large as the other, but the median values are the same. Consequently, it is not clear whether institution of the clinical pathway produced an improvement in hospital stay. With skewed data like this, a common mistake is to compare them using a t-test. This is NOT appropriate, because the validity of the t-test relies on the assumption that the data are normally distributed.
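
To see how the mean and median diverge for skewed data, the summary values can be recomputed from the raw LOS values. This is a small Python sketch for illustration only; the helper name `summarize` is an assumption and not part of the worksheet.

```python
import statistics

# Length of stay (days) before and after the clinical pathway.
before = [3, 12, 2, 1, 11, 4, 2, 2, 3, 1, 8, 2, 3, 6,
          1, 13, 3, 8, 10, 6, 4, 12, 9, 7, 1, 3, 3, 2]
after = [3, 1, 1, 5, 1, 6, 1, 5, 2, 3, 3, 1, 5, 2,
         2, 2, 3, 3, 7, 3, 4, 1, 3, 3, 2, 2, 2, 4]


def summarize(values):
    """Mean and SD (sensitive to the long tail) next to median and range."""
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
        "median": statistics.median(values),
        "range": (min(values), max(values)),
    }


before_summary = summarize(before)
after_summary = summarize(after)
print("before:", before_summary)
print("after: ", after_summary)
```

The long right tail in the "before" group pulls its mean up to about 5 days, while both medians stay at 3 days.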
Descriptive Statistics: Mean, Median, Mode, 95% confidence interval for a mean, Standard Deviation, Standard Error, Range (minimum and maximum)

Note: This worksheet is currently under development.

Example Data: 14, 17, 22, 18, 22, 17, 12, 7, 20, 21, 21, 23

N                        12       Median    19
Mean                     17.83    Mode      17
STD                      4.80     Minimum   7
Std Error                1.39     Maximum   23
T-critical               2.12
T-critical * Std Error   2.94
CONFIDENCE               2.72
1.96 * Std Error         2.72

Note: Using the Excel 'CONFIDENCE' function gives the same thing as 1.96 x Std Error.

Confidence interval for a mean = X +/- t critical * SD/sqrt(n)
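
The confidence interval formula can be traced step by step with the example data. The Python sketch below is illustrative only. It assumes a tabulated t value of 2.201 for a two-sided 95% interval with n - 1 = 11 degrees of freedom, entered by hand; this is not the 2.12 shown on the worksheet, which is marked as under development.

```python
import math
import statistics

data = [14, 17, 22, 18, 22, 17, 12, 7, 20, 21, 21, 23]

n = len(data)
mean = statistics.mean(data)
sd = statistics.stdev(data)            # sample SD (n - 1 in the denominator)
std_error = sd / math.sqrt(n)

# Normal-approximation half-width (the quantity labelled CONFIDENCE).
z = statistics.NormalDist().inv_cdf(0.975)   # about 1.96
z_half_width = z * std_error

# t-based half-width.
# ASSUMPTION: 2.201 is the tabulated two-sided 95% critical value for 11 df.
T_CRITICAL_DF11 = 2.201
t_half_width = T_CRITICAL_DF11 * std_error
ci = (mean - t_half_width, mean + t_half_width)

print(f"mean={mean:.2f} sd={sd:.2f} se={std_error:.2f}")
print(f"z-based +/- {z_half_width:.2f}, t-based +/- {t_half_width:.2f}")
print(f"95% CI (t-based): {ci[0]:.2f} to {ci[1]:.2f}")
```

With only 12 observations the t-based interval is noticeably wider than the 1.96 x Std Error version.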

The Unpaired T-Test

Unpaired t-tests (comparing two independent means):

For continuous data one is frequently asking the question "Is the mean different for these two groups?" In other words, the null hypothesis is that the groups have the same mean. If the sample size is relatively large (>30) this can be done using z scores and the normal distribution. However, authors frequently use a t-test (even with large samples), and this is particularly appropriate if the sample size is small.

T-tests calculate a "t" statistic that takes into account the difference between the means, the variability in the data, and the number of observations in each group. Based on the "t" statistic and the degrees of freedom (total observations in the two groups minus 2) one can look up the probability of observing a difference this great or greater if the null hypothesis were true.

T-tests are based on several assumptions:

1) that the data are reasonably close to being normally distributed
2) that the two samples have similar variance & standard deviation
3) that the observations are independent of each other.

Consider the WBC counts (in thousands) in two groups of patients:

Group 1   Group 2
4.5       4.2
5.0       7.2
5.3       8.0
5.3       3.5
6.0       6.3
6.0       5.1
7.6       4.6
7.7       4.8
6.4       2.0
7.2       5.0
7.0       5.4
5.6
8.4
8.3
9.5
15        11        N
6.7       5.1       Mean
2.10      2.77      Variance
1.45      1.66      SD
0.37      0.50      SEM (standard error of the mean)
0.02                Two-tailed p-value by t-test for equal variance
0.02                Two-tailed p-value by t-test for unequal variance

From a practical point of view Excel provides built in functions that make t-tests easy. Click on cell C44 to see the function used for a t-test with equal variance. One specifies:
• the cells where the first group's data is found,
• the cells where the second group's data is found,
• then whether it is a 2-tailed test or a 1-tailed test, and
• finally a "2" to indicate a test for equal variance.
If the variance is unequal, there is a modified calculation that one can get by specifying "3" as the last parameter in the function (compare the formulae in cells C44 & C45). As a rule of thumb, if one standard deviation is more than twice the other, you should use the unequal variance test.

Note also that the two groups do not have to have the same number of subjects.

Finally, note that in this case we are estimating the means in each group to test whether they are different; consequently, it is appropriate to calculate SEM, which is SD divided by the square root of N.

The t-test is a "parametric" test, because it relies on the legitimate use of the means and standard deviations, which are the parameters that define normally distributed continuous variables. If the groups you want to compare are clearly skewed (i.e. do not conform to a Normal distribution), you have two options:
1) Sometimes you can "transform" the data, e.g. by taking the log of each observation; if the log values are normally distributed, you can then do a t-test on the transformed data; this is legitimate.
2) You can use a "non-parametric" statistical test.
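
For readers who want to see what goes into the "t" statistic, here is a minimal Python sketch of the equal-variance (pooled) calculation on the WBC data. It is an illustration, not the workbook's own formula; the function name `pooled_t` is hypothetical. It also assumes a tabulated critical value of 2.064 for a two-sided 5% test with 24 degrees of freedom, because the standard library has no t distribution for computing the p-value directly.

```python
import math
import statistics

group_1 = [4.5, 5.0, 5.3, 5.3, 6.0, 6.0, 7.6, 7.7, 6.4, 7.2, 7.0, 5.6, 8.4, 8.3, 9.5]
group_2 = [4.2, 7.2, 8.0, 3.5, 6.3, 5.1, 4.6, 4.8, 2.0, 5.0, 5.4]


def pooled_t(a, b):
    """Equal-variance two-sample t statistic and its degrees of freedom."""
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * statistics.variance(a)
                  + (n_b - 1) * statistics.variance(b)) / df
    se_diff = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(a) - statistics.mean(b)) / se_diff
    return t, df


t_stat, df = pooled_t(group_1, group_2)
sem_1 = statistics.stdev(group_1) / math.sqrt(len(group_1))
sem_2 = statistics.stdev(group_2) / math.sqrt(len(group_2))

# ASSUMPTION: 2.064 is the tabulated two-sided 5% critical value for 24 df.
T_CRITICAL_DF24 = 2.064
significant = abs(t_stat) > T_CRITICAL_DF24

print(f"t = {t_stat:.2f} with {df} df; SEM = {sem_1:.2f} and {sem_2:.2f}")
print("significant at 0.05:", significant)
```

The statistic exceeds the critical value, which agrees with the two-tailed p-value of 0.02 reported for the equal-variance test.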

Second example (age in the "failed" and "ok" groups):

failed   ok
56       19
37       25
57       38
39
35
40
66
19
43.6     27.3    Mean
227.4    94.3    Variance
15.1     9.7     SD
0.08             Two-tailed p-value; ttest with unequal variance

Age     Freq. failed   Freq OK
10-20   1              1
21-30                  1
31-40   4              1
41-50
51-60   2
61-70   1

[Chart: Freq. failed and Freq OK (0 to 4.5) by age group (10-20 to 61-70)]

Analysis of Variance (ANOVA)

In order to perform analysis of variance you must first install the Excel "Analysis Tool-Pak". Click on "Tools" (above) and then on "Add-Ins" and select "Analysis Tool Pack" and "Analysis Tool-Pak - VBA"; then click "Ok". After installation, when you click on "Tools," you will see a new selection ("Data Analysis") at the bottom of the Tools menu. When you select "Data Analysis" you will see options for analysis of variance and other procedures.

The columns of data below are serum creatinine levels among 4 groups of subjects. A one-factor analysis of variance can be performed to determine whether there are significant differences in the means of these groups.

Controls   Aortoiliac   Fem-AK Pop   Fem-Distal
0.7        1.1          1.5          1.2
1.2        1.3          1.1          0.8
1.1        0.9          0.8          0.7
0.7        0.7          0.9          0.7
1.0        0.8          1.1          8.4
0.5        1.4          0.9          1.8
1.6        0.5          7.0          0.8
0.8        1.1          1.4          1.0
0.6        2.0          0.8          0.7
0.6        0.8          1.1          2.8
0.6        0.7          0.6          1.5
1.3        1.4          1.2          0.6
0.5        1.1          0.6          1.3
1.0        1.5          1.2          0.5
1.0        1.0          0.6          1.2
0.8        0.9          0.8          8.2
0.8        0.9          0.8          0.4
0.6        0.6          1.3          0.6
0.5        0.9          1.3          1.6
0.9        0.9          1.5          0.5
0.7        1.2          1.5          11.4
0.7        1.2          0.4          0.8
0.7        1.3          12.9         0.7
0.7        0.4          1.1          0.6
1.1        0.7          8.6          0.9
Means:
0.828      1.012        2.040        1.988

Select the block of data (including column labels) from B2:E27. Then, from the upper menu, select "Tools", then "Data Analysis", then "Single Factor Analysis of Variance". Check the box for labels, and specify the Output Range as G12. The result is shown in the box below.

The p-value (0.0764) indicates differences in means that do not quite meet the alpha=0.05 criterion for statistical significance.

Anova: Single Factor

SUMMARY
Groups       Count   Sum    Average   Variance
Controls     25      20.7   0.828     0.07626667
Aortoiliac   25      25.3   1.012     0.1261
Fem-AK Pop   25      51     2.04      8.77333333
Fem-Distal   25      49.7   1.988     8.20193333

ANOVA
Source of Variation   SS         df   MS         F            P-value        F crit
Between Groups        30.3779    3    10.12597   2.35794221   0.0764786914   2.699393
Within Groups         412.2632   96   4.294408
Total                 442.6411   99
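
The ANOVA table can be rebuilt from the group summaries alone, which shows where each entry comes from. This is an illustrative Python sketch using the counts, averages, and variances printed in the SUMMARY box; it does not compute the p-value.

```python
# Group summaries from the SUMMARY table: (count, mean, variance).
groups = {
    "Controls":   (25, 0.828, 0.07626667),
    "Aortoiliac": (25, 1.012, 0.1261),
    "Fem-AK Pop": (25, 2.040, 8.77333333),
    "Fem-Distal": (25, 1.988, 8.20193333),
}

total_n = sum(n for n, _, _ in groups.values())
grand_mean = sum(n * mean for n, mean, _ in groups.values()) / total_n

# Between-groups SS: spread of the group means around the grand mean.
ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups.values())
# Within-groups SS: pooled spread of observations around their own group mean.
ss_within = sum((n - 1) * var for n, _, var in groups.values())

df_between = len(groups) - 1
df_within = total_n - len(groups)

ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_statistic = ms_between / ms_within

print(f"SS between = {ss_between:.4f}, SS within = {ss_within:.4f}")
print(f"F({df_between}, {df_within}) = {f_statistic:.4f}")
```

The F statistic is the ratio of the two mean squares; it falls below the tabulated F crit of 2.699, consistent with the p-value of 0.0764.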
Case-Control Studies

Enter data into the blue cells to calculate a p-value with the chi squared test.

Observed Data
              Cases   Controls   Total
Exposed       2       3          5
Non-exposed   19      27         46
Total         21      30         51

Expected Under H0
              Cases   Controls   Total
Exposed       2.06    2.94       5
Non-exposed   18.94   27.06      46
Total         21      30         51

Odds Ratio=   0.95
Chi Sq=       0.003      Chi square inappropriate; an expected cell is <5.
p-value=      0.955117

Confidence Interval (precision-based)
Conf. Level=  0.95
Upper CI=     6.23
Lower CI=     0.14
se(lnOR)      0.96073

Stratified Analysis (for 2-6 Substrata)

Mantel-Haenszel OR=                   1.57
Mantel-Haenszel chi square p-value=   0.09459915

Observed Data
              Stratum 1                   Stratum 2
              Cases   Controls   Total    Cases   Controls   Total
Exposed       9       6          15       20      21         41
Non-exposed   115     73         188      596     1171       1767
Total         124     79         203      616     1192       1808

               Stratum 1   Stratum 2
Odds Ratio=    0.95        1.87
Chi Sq=        0.008       4.041
p-value=       0.928719    0.044407
Conf. Level=   0.95        0.95
Upper CI=      2.79        3.48
Lower CI=      0.33        1.01
se(lnOR)       0.54788     0.3164662
ad/T=          3.236453    12.95354
bc/T=          3.399015    6.9225664

Expected Under H0
              Stratum 1                   Stratum 2
Exposed       9.16     5.84    15.00      13.97    27.03     41.00
Non-exposed   114.84   73.16   188.00     602.03   1164.97   1767.00
Total         124      79      203        616      1192      1808

                        Stratum 1   Stratum 2
[(ad-bc)/n]=            -0.16256    6.0309734513
n0n1m0m1/(n^2(n-1))=    3.318596    9.0058014722

Squared sums row 47   34.43826
sums of row 48        12.3244
MH chi sq             2.794316
MH p value            0.094599

Strata 3 to 6 are empty on the worksheet (all cells 0), so their results appear as #DIV/0!.
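
The odds ratio, its precision-based confidence interval, and the Mantel-Haenszel pooled odds ratio can be reproduced with a few lines of Python. The sketch below is illustrative; the function names are assumptions, and each 2x2 table is passed as (a, b, c, d) = (exposed cases, exposed controls, non-exposed cases, non-exposed controls).

```python
import math
from statistics import NormalDist


def odds_ratio_ci(a, b, c, d, conf_level=0.95):
    """Odds ratio with a precision-based (log scale) confidence interval."""
    odds_ratio = (a * d) / (b * c)
    se_ln_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    lower = math.exp(math.log(odds_ratio) - z * se_ln_or)
    upper = math.exp(math.log(odds_ratio) + z * se_ln_or)
    return odds_ratio, se_ln_or, lower, upper


def mantel_haenszel_or(strata):
    """Pooled OR: sum of ad/T over strata divided by sum of bc/T."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


crude = odds_ratio_ci(2, 3, 19, 27)
mh_or = mantel_haenszel_or([(9, 6, 115, 73), (20, 21, 596, 1171)])

print("OR = %.2f, se(lnOR) = %.5f, 95%% CI %.2f to %.2f" % crude)
print(f"Mantel-Haenszel OR = {mh_or:.2f}")
```

Only strata that contain data are passed to `mantel_haenszel_or`; an all-zero stratum would cause a division by zero, which is the same thing the #DIV/0! cells report.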
Cohort Studies - Cumulative Incidence

Observed Data
              Diseased   No Disease   Total
Exposed       156        656          812
Non-exposed   229        1958         2187
Total         385        2614         2999

Expected Under H0
              Diseased   No Disease   Total
Exposed       104.24     707.76       812
Non-exposed   280.76     1906.24      2187
Total         385        2614         2999

Confidence Level         0.95
Incidence exposed=       0.1921
Incidence nonexposed=    0.1047
Relative Risk=           1.83
Risk Difference=         0.0874
AR%=                     45.5
Chi Sq=                  40.432
p-value=                 0.000000000
# Needed to Treat=       11

Confidence Interval for the Relative Risk (precision-based)
Upper CI=   2.21
Lower CI=   1.52
se(lnRR)    0.095332545

Test-based 95% CI for the Relative Risk
Upper CI=   2.21
Lower CI=   1.52

Confidence Interval for the Risk Difference
Upper CI=   0.11739
Lower CI=   0.05743

Cohort Studies - Incidence Rate

Observed Data
              Diseased   No Disease   Person-Time observation
Exposed       156        -            17352
Non-exposed   268        -            54048
Total         424                     71400

Expected Under H0
              Diseased   No Disease
Exposed       103.04     -
Non-exposed   320.96     -
Total         424

Incidence exposed=       0.0090
Incidence nonexposed=    0.0050
Relative Risk=           1.81
Risk Difference=         0.004032
Chi Sq=                  35.954
p-value=                 0.000000

95% Confidence Interval for the Relative Risk (test-based)
Upper 95% CI=   2.20
Lower 95% CI=   1.49

Stratified Analysis for Cumulative Incidence (2-6 Substrata)

Mantel-Haenszel RR=                   1.03
Mantel-Haenszel chi square p-value=   0.937257

Observed Data
              Stratum 1                       Stratum 2
              Diseased   Not    Total         Diseased   Not    Total
Exposed       40         60     100           51         36     87
Non-exposed   20         80     100           169        68     237
Total         60         140    200           220        104    324

               Stratum 1   Stratum 2
Risk Ratio=    2.00        0.82
Chi Sq=        9.524       4.700
p-value=       0.002028    0.030163
Upper CI=      3.76        1.37
Lower CI=      1.06        0.49
se(lnOR)       0.322749    0.2607847063
a(c+d)         20          37.305555556
c(a+b)         10          45.37962963

Expected Under H0
              Stratum 1                   Stratum 2
Exposed       30.00   70.00   100.00      59.07    27.93   87.00
Non-exposed   30.00   70.00   100.00      160.93   76.07   237.00
Total         60      140     200         220      104     324

For Chi sq:             Stratum 1      Stratum 2
[(ad-bc)/n]=            10             -8.07407407
n0n1m0m1/(n^2(n-1))=    10.55276382    13.91332968

Squared sums row 47   3.709190672
sums of row 48        598.5897312
MH chi sq             0.006196549
MH p value            0.937256799

For RR (Buring):   Stratum 1   Stratum 2
a*Nu/Nt            20          37.30555556
c*Ne/Nt            10          45.37962963

Sums row 64   57.30555556
Sums row 65   55.37962963
RRmh          1.034776793

Strata 3 to 6 are empty on the worksheet (all cells 0), so their results appear as #DIV/0!.
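
The cumulative-incidence measures and the Mantel-Haenszel pooled risk ratio follow directly from the 2x2 cell counts. The Python sketch below is for illustration; the function names are assumptions, and each table is passed as (a, b, c, d) = (exposed diseased, exposed not diseased, non-exposed diseased, non-exposed not diseased).

```python
import math
from statistics import NormalDist


def cumulative_incidence(a, b, c, d, conf_level=0.95):
    """Relative risk, risk difference, AR%, and a precision-based CI for the RR."""
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    rr = risk_exposed / risk_unexposed
    se_ln_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    return {
        "rr": rr,
        "risk_difference": risk_exposed - risk_unexposed,
        "ar_percent": 100 * (risk_exposed - risk_unexposed) / risk_exposed,
        "se_ln_rr": se_ln_rr,
        "rr_ci": (math.exp(math.log(rr) - z * se_ln_rr),
                  math.exp(math.log(rr) + z * se_ln_rr)),
    }


def mantel_haenszel_rr(strata):
    """Pooled RR: sum of a*Nu/Nt over strata divided by sum of c*Ne/Nt."""
    num = sum(a * (c + d) / (a + b + c + d) for a, b, c, d in strata)
    den = sum(c * (a + b) / (a + b + c + d) for a, b, c, d in strata)
    return num / den


result = cumulative_incidence(156, 656, 229, 1958)
rr_mh = mantel_haenszel_rr([(40, 60, 20, 80), (51, 36, 169, 68)])

print({k: v for k, v in result.items()})
print(f"Mantel-Haenszel RR = {rr_mh:.2f}")
```

In the stratified example the two stratum-specific risk ratios (2.00 and 0.82) point in opposite directions, so the pooled value near 1.0 should be read with that in mind.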
Chi Squared Test

Example

Observed Data
              + Outcome   - Outcome   Total
Exposed       7           124         131
Non-exposed   1           78          79
Total         8           202         210

Expected Under H0
              + Outcome   - Outcome   Total
Exposed       4.99        126.01      131
Non-exposed   3.01        75.99       79
Total         8           202         210

p-value=   0.13481471
8/210=     0.03809524

The Chi Squared Test is based on the difference between the frequency distribution that was observed and the frequency distribution that would have been expected under the null hypothesis. In the example above, only 8 of 210 subjects had the outcome of interest (3.8095%). Under the null hypothesis, we would expect 3.8095% of the exposed group to have the outcome, and we would expect 3.8095% of the non-exposed group to have the outcome as well. The second 2x2 table gives the frequencies expected under the null hypothesis.

The Chi Squared statistic is calculated from the difference between observed and expected values for each cell. The difference is squared and then divided by the expected value for the cell. This calculation is repeated for each cell and the results are summed:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

Note that the chi squared test is a "large sample test"; it should not be used if the number of expected observations in any of the cells is <5, because it gives falsely low p-values. In this case, Fisher's Exact Test should be used.

The chi squared test can also be applied to situations with multiple groups and outcomes. For example, the number of runners who finished a marathon in less than 4 hours among those who trained not at all, a little, moderately, or a lot. The Excel function CHITEST will calculate the p-value automatically, if you specify the range of actual (observed) frequencies and the range of expected observations. For example,

              Observed                            Expected
              Finished   Didn't finish   Total    Finished   Didn't finish   Total
Not at all    2          5               7        3.29       3.71            7
A little      8          30              38       17.86      20.14           38
Moderately    20         15              35       16.45      18.55           35
A lot         25         12              37       17.39      19.61           37
Total         55         62              117      55         62              117

p-value=   0.000280
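
The expected counts, the chi squared statistic, and the p-value for the 2x2 example can be worked through in Python. This is a minimal sketch, not the workbook's formula; the function name `chi_squared_2x2` is an assumption. It relies on the fact that, for 1 degree of freedom, the upper-tail chi squared probability equals erfc(sqrt(x/2)), so no statistics package is needed.

```python
import math


def chi_squared_2x2(a, b, c, d):
    """a, b = exposed with/without outcome; c, d = non-exposed with/without."""
    n = a + b + c + d
    observed = [a, b, c, d]
    row_totals = [a + b, a + b, c + d, c + d]
    col_totals = [a + c, b + d, a + c, b + d]
    # Expected count under H0 = row total * column total / grand total.
    expected = [r * col / n for r, col in zip(row_totals, col_totals)]
    chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Valid for 1 degree of freedom only (a 2x2 table).
    p_value = math.erfc(math.sqrt(chi_sq / 2))
    return expected, chi_sq, p_value


expected, chi_sq, p_value = chi_squared_2x2(7, 124, 1, 78)
small_expected_cell = min(expected) < 5

print("expected:", [round(e, 2) for e in expected])
print(f"chi squared = {chi_sq:.3f}, p = {p_value:.4f}")
print("an expected cell is <5:", small_expected_cell)
```

Two expected cells here fall below 5, so by the rule stated above Fisher's Exact Test would be the safer choice for these data.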
Confidence Intervals for a Proportion

                                         Interval                  Interval                  Interval
Numerator   Denominator   Estimated      +/-     Lower   Upper     +/-     Lower   Upper     +/-     Lower   Upper
            "N"           proportion             Limit   Limit             Limit   Limit             Limit   Limit
1           79            0.01265823     0.021   0.00    0.03      0.025   0.00    0.04      0.032   0.00    0.05
12          100           0.12           0.053   0.07    0.17      0.064   0.06    0.18      0.084   0.04    0.20
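
The confidence levels for the three intervals are not labelled in the table. The half-widths shown are consistent with the usual normal-approximation formula p +/- z * sqrt(p(1-p)/N) at 90%, 95%, and 99%; that reading is an inference from the numbers, not something the worksheet states. A Python sketch under that assumption:

```python
import math
from statistics import NormalDist


def proportion_ci(numerator, denominator, conf_level):
    """Normal-approximation CI for a proportion, clamped to the range 0 to 1."""
    p = numerator / denominator
    se = math.sqrt(p * (1 - p) / denominator)
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    half_width = z * se
    # The simple formula can dip below zero for small proportions, so clamp it.
    return p, half_width, max(0.0, p - half_width), min(1.0, p + half_width)


# ASSUMPTION: the three interval columns correspond to 90%, 95%, and 99%.
levels = (0.90, 0.95, 0.99)
table = {
    (num, den): [proportion_ci(num, den, level) for level in levels]
    for num, den in ((1, 79), (12, 100))
}

for (num, den), rows in table.items():
    for level, (p, hw, lower, upper) in zip(levels, rows):
        print(f"{num}/{den} at {level:.0%}: +/- {hw:.3f} ({lower:.2f} to {upper:.2f})")
```

For 1/79 the lower limit is clamped at zero, which is a sign that the normal approximation is strained when the numerator is this small.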
Correlation, Linear Regression, and the Line of Best Fit

Example
"X"     "Y"
Weeks   Savings   Prediction
1       200       50.88679
2       850       566.2075
3       1300      1081.528
4       1500      1596.849
5       1578      2112.17
8       3000      3658.132
9       3600      4173.453
10      5900      4688.774

m=         515.32
b=         -464.43
r=         0.942109
r2=        0.89
N=         8
t=         6.882303
p-value=   0.000235

[Chart: Savings (0 to 7000) versus Weeks (0 to 12)]
I used the graphing tool to plot the individual data points (blue diamonds) and the line of best fit (pink line).

The "dependent" variable (outcome of interest) here is savings and the independent variable is time. The data (X and Y values) are contained in the block of cells from B3 to C10. The analysis is performed using several functions that are built into Excel. The SLOPE "m" is calculated in cell H3 using the Excel function "=SLOPE(C3:C10,B3:B10)". So basically you need to specify where the data is, with the "Y" values specified first.

The Y-INTERCEPT "b" is calculated in H4 from the Excel function "=INTERCEPT(C3:C10,B3:B10)"; again, the data block is specified, giving the location of the "Y" values first. From these two parameters, one can now specify the line of best fit using the form Y=b + mX.

To calculate the correlation coefficient for this relationship one would use the Excel function "=CORREL(B3:B10,C3:C10)" which is located in H6. Finally, in H7, I squared what I got in H6 in order to calculate "r-squared", which indicates what percentage of the variability in savings is explained by time.
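
The same slope, intercept, and correlation can be computed from first principles, which makes clear what SLOPE, INTERCEPT, and CORREL are doing. The Python sketch below is illustrative and uses only sums of squares and cross-products; the variable names are assumptions.

```python
import math

weeks = [1, 2, 3, 4, 5, 8, 9, 10]
savings = [200, 850, 1300, 1500, 1578, 3000, 3600, 5900]

n = len(weeks)
mean_x = sum(weeks) / n
mean_y = sum(savings) / n

# Sums of squares and cross-products around the means.
sxx = sum((x - mean_x) ** 2 for x in weeks)
syy = sum((y - mean_y) ** 2 for y in savings)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(weeks, savings))

slope = sxy / sxx                      # Excel: =SLOPE(C3:C10,B3:B10)
intercept = mean_y - slope * mean_x    # Excel: =INTERCEPT(C3:C10,B3:B10)
r = sxy / math.sqrt(sxx * syy)         # Excel: =CORREL(B3:B10,C3:C10)
r_squared = r ** 2

# t statistic for testing whether the correlation differs from zero (n - 2 df).
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r_squared)

# Line of best fit, Y = b + mX, evaluated at each observed week.
predictions = [intercept + slope * x for x in weeks]

print(f"m = {slope:.2f}, b = {intercept:.2f}, r = {r:.6f}, r2 = {r_squared:.2f}")
print(f"t = {t_stat:.4f}")
```

The `predictions` list corresponds to the "Prediction" column in the table.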
Making an Epidemic Curve for a Disease Outbreak

1) The cases are sorted by date of disease onset.

Date of Onset (one entry per case)
9/1, 11/24, 12/15, 2/2, 2/10, 3/15, 4/14,
4/28, 4/29, 4/29, 4/30, 4/30, 5/1, 5/2, 5/2,
5/4, 5/4, 5/4, 5/4, 5/5, 5/5, 5/5, 5/6, 5/6, 5/6,
5/7, 5/7, 5/7, 5/7, 5/7, 5/8, 5/8, 5/8, 5/9, 5/9, 5/9,
5/10, 5/10, 5/10, 5/10, 5/11, 5/11, 5/12,
5/13, 5/13, 5/13, 5/13, 5/14, 5/15

2) Then, tally the number of cases by day.

Onset       # Cases
4/14/2005   1
28-Apr      1
29-Apr      2
30-Apr      2
1-May       1
2-May       2
3-May       0
4-May       4
5-May       3
6-May       3
7-May       5
8-May       3
9-May       3
10-May      4
11-May      2
12-May      1
13-May      4
14-May      1
15-May      3
16-May      2
17-May      1
18-May      2
19-May      1
20-May      1
21-May      2
22-May      2
23-May      2
24-May      2
25-May
26-May
27-May      1
28-May
29-May      1
30-May
31-May
1-Jun       2
2-Jun
3-Jun
4-Jun
5-Jun       2
6-Jun       1
7-Jun       3
8-Jun       1
9-Jun       1
10-Jun      2
11-Jun      3
12-Jun      3
13-Jun      4
14-Jun      2

3) Starting with 4/28, I then tallied the number of cases at 4 day intervals.

Onset      # Cases
9/1/04     1
11/24/04   1
12/15/04   1
2/2        1
2/10       1
3/15       1
4/14       1
4/28       6
5/2        9
5/6        14
5/10       11
5/14       7
5/18       4

4) I selected the block of data with dates and tallies, and used the graphing tool (beneath the "Help" menu at the top toolbar) to create a vertical column chart.

[Column chart: number of cases (0 to 16) by date of onset]
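
The 4-day tally in step 3 can be automated. The Python sketch below is an illustration only: it uses the onset dates from 4/28 through 5/13 and assumes they all fall in 2005, the year shown for the 4/14 onset. It prints a rough text version of the epidemic curve.

```python
from collections import Counter
from datetime import date, timedelta

# Onset dates from 4/28 through 5/13 as (month, day), one entry per case.
# ASSUMPTION: all of these fall in 2005.
onsets_md = (
    [(4, 28)] + [(4, 29)] * 2 + [(4, 30)] * 2 + [(5, 1)] + [(5, 2)] * 2
    + [(5, 4)] * 4 + [(5, 5)] * 3 + [(5, 6)] * 3 + [(5, 7)] * 5
    + [(5, 8)] * 3 + [(5, 9)] * 3 + [(5, 10)] * 4 + [(5, 11)] * 2
    + [(5, 12)] + [(5, 13)] * 4
)
onsets = [date(2005, month, day) for month, day in onsets_md]

start = date(2005, 4, 28)
interval_days = 4

# Map each onset to the first day of its 4-day interval, then count per interval.
bins = Counter(
    start + timedelta(days=((d - start).days // interval_days) * interval_days)
    for d in onsets
)
curve = sorted(bins.items())

for first_day, cases in curve:
    print(first_day.strftime("%m/%d"), "#" * cases, cases)
```

Changing `interval_days` to 1 gives the by-day tally from step 2 instead.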
Random Number Generator

Number of groups=   4
Enter a seed #      7
Assign to Group:    3

Intermediate values shown on the worksheet:
random #   0.696398
           69
           276
/100       2.76
trunc      3

This program uses a random number generator to assign subjects randomly to a group. You need to specify how many groups you want in the first blue cell. You then need to "spark" the random number generator by entering some number (ANY number) in the 2nd blue cell. Enter a number and click outside the cell; this will generate a random number and specify to which group the subject should be assigned, based on how many groups you specified.
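
The idea behind the worksheet, scaling a random number between 0 and 1 by the number of groups and truncating, can be sketched in Python. This is an illustration, not the worksheet's exact cell formulas (which first truncate the random number to two decimals); the function name `assign_group` is an assumption.

```python
import math
import random


def assign_group(random_value, number_of_groups):
    """Map a random value in [0, 1) to a group number from 1 to number_of_groups."""
    return math.floor(random_value * number_of_groups) + 1


# The worksheet's example: random # 0.696398 with 4 groups.
example_group = assign_group(0.696398, 4)

# Assign ten hypothetical subjects; the seed only makes the run repeatable.
rng = random.Random(7)
assignments = [assign_group(rng.random(), 4) for _ in range(10)]

print("example:", example_group)
print("assignments:", assignments)
```

With 0.696398 and 4 groups the scaled value falls between 2 and 3, so the subject goes to group 3, matching the worksheet.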
T-Tests
Unpaired T-test
Group 1 Group 2
Consider the values of body mass index for the two groups to
BMI BMI
25 23 represents values in a group that was treated with a regimen
25 26 variability from person to person. Values range from 22-34, a
27 24 two groups.
34 32 40
38 34 Not suprisingly, when I perform an

30 30 unpaired t-test on these data, the 35
25 24 differences are not statistically
28 26 significant (p=0.18). 30
29 22
32 31
25
27 28
28 25
30 27 20
0.8
31 30
29.21 Mean 27.29
3.70 SD 3.65
p-value for unpaired t-test 0.17682508

However, suppose these were not two independe
Paired T-test
groups of individuals, but a single group whose
(before) (after)
Subject BMI 1 BMI 2 difference BMIs were measured before and after the 4 month
1 25 23 -2 treatment. In other words, the data were "paired" in
2 25 26 1 the sense that each person acted as their own
3 27 24 -3 control. Much of the "variability" that we are dealing
4 34 32 -2 with in the setting of two independent groups is due
5 38 34 -4 to the fact that there is substantial person-to-perso
6 30 30 0
7 25 24 -1
8 28 26 -2
9 29 22 -7
10 32 31 -1
11 27 28 1
12 28 25 -3
13 30 27 -3
14 31 30 -1
Mean difference -1.9
p-value with paired t-test 0.004

variability to begin with. However, what we are really interested in is the response to treatment.

In this case, it looks like just about all subjects reduced their BMI somewhat, and if you factor out the person-to-person differences, it looks like the treatment regimen had an effect.

In the unpaired t-test the null hypothesis is that the means are the same, but in a paired t-test the null hypothesis is that the mean difference between the pairs is zero.

In Excel a paired t-test is specified just like an unpaired t-test, except that the last parameter is set to 1.

A paired t-test can be used in two circumstances:

1) When doing a "before and after" comparison in each subject or comparing two treatments in each subject in a clinical trial.
2) In matched case-control studies it is sometimes possible to make comparisons in pairs. [See the methods section in the case-control study by Herbst et al.: Adenocarcinoma of the vagina: association of maternal stilbestrol therapy with tumor appearance in young women, N. Engl. J. Med 1971; 284:878-883.]

A paired t-test relies on the following assumptions:

1) The data are quantitative.
2) The differences (e.g. after-before) are normally distributed.
3) The differences are independent of one another.

ss index for the two groups to the left; group1 was untreated & group 2
at was treated with a regimen of diet and exercise for 4 months. There is
n. Values range from 22-34, and there is considerable overlap between the
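The paired t-test described above can be sketched in a few lines of Python. This is only an illustration, not the worksheet's own formula: the function name `paired_t_statistic` is hypothetical, and the data are subjects 6-14 from the table (a partial subset, so the result will not match the worksheet's mean difference of -1.9 or its p-value). The sketch stops at the t statistic, because the Python standard library has no t distribution from which to get a p-value.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(before, after):
    """Return (mean difference, t statistic, degrees of freedom) for paired data."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    # Work on the within-subject differences (after - before).
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    mean_diff = mean(diffs)
    # Standard error of the mean difference.
    se = stdev(diffs) / sqrt(n)
    return mean_diff, mean_diff / se, n - 1


# Subjects 6-14 only (partial illustration, not the full data set).
before = [30, 25, 28, 29, 32, 27, 28, 30, 31]
after = [30, 24, 26, 22, 31, 28, 25, 27, 30]
mean_diff, t_stat, df = paired_t_statistic(before, after)
print(f"mean difference = {mean_diff:.2f}, t = {t_stat:.2f}, df = {df}")
```

The null hypothesis is that the mean of the differences is zero, so the test reduces to a one-sample t-test on the differences.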
Sample Size Calculations
Part I - Sample Size Calculations for Means
Anticipated Values: Put your anticipated means and standard deviations in the blue boxes.

          Mean   Stan. Dev
Group 1   25     15
Group 2   15     15

Difference in means= 40 %

The cells in the table below show the estimated number of subjects needed in each group to demonstrate a statistically significant difference at "p" values ranging from 0.10-0.01 and at varying levels of "power." [Power is the probability of finding a statistically significant difference, assuming it exists, at a given "p" value.]

Sample Size Needed in Each Group

alpha level    Power
("p" value)    95%    90%    80%    50%
0.10           49     39     28     12
0.05           59     47     35     17
0.02           71     59     45     24
0.01           80     67     53     30

The red cells indicate the two most commonly used estimates, i.e. based on 90% or 80% power.
=================================================================
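The worksheet does not show its formula for Part I, so the following is a sketch under an assumption: the common normal-approximation formula n = (SD1^2 + SD2^2) x (Z_{1-alpha/2} + Z_{1-beta})^2 / (difference in means)^2. The function names are hypothetical. With exact normal quantiles it gives values close to the table for the example inputs; small differences are possible if the worksheet uses rounded Z values.

```python
from statistics import NormalDist


def z_factor(alpha, beta):
    """(Z_{1-alpha/2} + Z_{1-beta})^2 for a two-sided alpha and power 1 - beta."""
    nd = NormalDist()
    return (nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(1 - beta)) ** 2


def n_per_group_means(mean1, mean2, sd1, sd2, alpha=0.05, power=0.80):
    """Approximate (unrounded) number of subjects needed in each group."""
    beta = 1 - power
    return (sd1 ** 2 + sd2 ** 2) * z_factor(alpha, beta) / (mean1 - mean2) ** 2


# Example inputs from Part I: means 25 vs 15, SD 15 in both groups.
for power in (0.90, 0.80):
    n = n_per_group_means(25, 15, 15, 15, alpha=0.05, power=power)
    print(f"power {power:.0%}: about {round(n)} per group")
```

Returning the unrounded value leaves the rounding rule (nearest integer or always rounding up) to the caller.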
Part II - Sample Size Calculations for a Difference in Proportions (frequency)

Anticipated Values: Put your anticipated proportions in the blue boxes.

          Proportion with   (without)
Group 1   0.12              0.88
Group 2   0.06              0.94

The cells in the table below show the estimated number of subjects needed in each group to demonstrate a statistically significant difference at "p" values ranging from 0.10-0.01 and at varying levels of "power." [Power is the probability of finding a statistically significant difference, assuming it exists, at a given "p" value.]

Sample Size Needed in Each Group

alpha level    Power
("p" value)    95%    90%    80%    50%
0.10           486    387    279    122
0.05           585    473    351    171
0.02           711    585    450    243
0.01           801    671    527    297

The red cells indicate the two most commonly used estimates, i.e. using a p-value < 0.05 and either 90 or 80% power.

Table for (Z_{1-alpha/2} + Z_{1-beta}) squared

         beta
alpha    0.05    0.1     0.2     0.5
0.1      10.8    8.6     6.2     2.7
0.05     13      10.5    7.8     3.8
0.02     15.8    13      10      5.4
0.01     17.8    14.9    11.7    6.6
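As a rough check on Part II, the sketch below combines the worksheet's lookup table of (Z_{1-alpha/2} + Z_{1-beta})^2 with an assumed formula, n = [p1(1-p1) + p2(1-p2)] x (table value) / (p1 - p2)^2. The formula is not shown in the worksheet; it is an assumption that happens to match the tabulated sample sizes for the example proportions. The names `Z_SQUARED` and `n_per_group_proportions` are hypothetical.

```python
# (Z_{1-alpha/2} + Z_{1-beta})^2, copied from the worksheet's lookup table.
# Outer key: alpha; inner key: beta (power = 1 - beta).
Z_SQUARED = {
    0.10: {0.05: 10.8, 0.10: 8.6, 0.20: 6.2, 0.50: 2.7},
    0.05: {0.05: 13.0, 0.10: 10.5, 0.20: 7.8, 0.50: 3.8},
    0.02: {0.05: 15.8, 0.10: 13.0, 0.20: 10.0, 0.50: 5.4},
    0.01: {0.05: 17.8, 0.10: 14.9, 0.20: 11.7, 0.50: 6.6},
}


def n_per_group_proportions(p1, p2, alpha, beta):
    """Approximate (unrounded) subjects per group to detect p1 vs p2."""
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return variance_sum * Z_SQUARED[alpha][beta] / (p1 - p2) ** 2


# Example inputs from Part II: 12% vs 6%, alpha 0.05, 80% power (beta 0.20).
n = n_per_group_proportions(0.12, 0.06, alpha=0.05, beta=0.20)
print(f"about {n:.1f} subjects per group")
```

Because the lookup values are rounded to one decimal, results can differ slightly from a calculation that uses exact normal quantiles.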
Survival Curves
(Adapted from Kenneth Rothman's "Episheet".)
In the blue cells enter the initial # of subjects at risk (C8), and then the # of events and losses to follow up for each period.

Period  Initial No.  Events  Lost to    Effective    Risk    Survival  Cumulative   95% Lower  95% Upper  sum q/pL  Effective
        at Risk              Follow-up  No. at Risk          Prob.     Surv. Prob.  Bound      Bound                Size
0       100          6       4          98.0         0.0612  0.9388    0.9388       0.8728     0.9716     0.000665  98.0000
1       90           6       5          87.5         0.0686  0.9314    0.8744       0.7931     0.9267     0.001507  95.3235
2       79           3       2          78.0         0.0385  0.9615    0.8408       0.7535     0.9012     0.002020  93.7696
3       74           5       7          70.5         0.0709  0.9291    0.7811       0.6854     0.8540     0.003102  90.3081
4       62           4       7          58.5         0.0684  0.9316    0.7277       0.6254     0.8106     0.004357  85.8686
5       51           5       2          50.0         0.1000  0.9000    0.6550       0.5459     0.7498     0.006579  80.0720
6       44           3       6          41.0         0.0732  0.9268    0.6070       0.4947     0.7091     0.008505  76.1161
7       35           0       3          33.5         0.0000  1.0000    0.6070       0.4947     0.7091     0.008505  76.1161
8       32           7       3          30.5         0.2295  0.7705    0.4677       0.3493     0.5899     0.018271  62.2871
9       22           5       4          20.0         0.2500  0.7500    0.3508       0.2364     0.4854     0.034938  52.9724
10      13           6       7          9.5          0.6316  0.3684    0.1292       0.0517     0.2879     0.215389  31.2817
11      0

[Chart: Survival Curve. Cumulative Surv. Prob. with 95% Lower Bound and 95% Upper Bound; y-axis: Survival Probability (0.0000 to 1.0000); x-axis: Time Period.]
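The columns of the survival table can be approximated with the sketch below. It rests on assumptions inferred from the tabulated numbers rather than from formulas shown in the worksheet: the effective number at risk is taken as the number at risk minus half the losses, the variance uses the running Greenwood sum q/(p x L), and the 95% bounds use a Wilson-type interval with an "effective size" M = S(1-S)/Var(S). The function name `life_table` is hypothetical.

```python
from math import sqrt

Z = 1.96  # two-sided 95% normal quantile


def life_table(initial_at_risk, periods):
    """periods: list of (events, lost_to_follow_up) tuples, one per period."""
    rows = []
    at_risk = initial_at_risk
    cum_surv = 1.0
    greenwood_sum = 0.0  # running sum of q / (p * L)
    for events, lost in periods:
        effective = at_risk - lost / 2   # L: losses count as half a period
        risk = events / effective        # q
        surv = 1 - risk                  # p
        cum_surv *= surv
        if surv > 0:
            greenwood_sum += risk / (surv * effective)
        lower = upper = cum_surv
        if greenwood_sum > 0 and 0 < cum_surv < 1:
            # Effective size M = S(1-S) / (S^2 * greenwood_sum).
            m = (1 - cum_surv) / (cum_surv * greenwood_sum)
            scale = m / (m + Z ** 2)
            center = cum_surv + Z ** 2 / (2 * m)
            half = Z * sqrt(cum_surv * (1 - cum_surv) / m + Z ** 2 / (4 * m ** 2))
            lower, upper = scale * (center - half), scale * (center + half)
        rows.append({"at_risk": at_risk, "effective": effective, "risk": risk,
                     "cum_surv": cum_surv, "lower": lower, "upper": upper})
        at_risk -= events + lost         # carry survivors into the next period
    return rows


# First two periods of the table: (events, lost to follow-up).
for row in life_table(100, [(6, 4), (6, 5)]):
    print(f"{row['at_risk']:>4} {row['cum_surv']:.4f} "
          f"{row['lower']:.4f} {row['upper']:.4f}")
```

Periods with no events so far have zero estimated variance, so the sketch simply returns the point estimate as both bounds in that case.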
Screening

                  Gold Standard
                  +       -       Total
Test Result  +    29      79      108      PPV= 0.269
             -    42      1344    1386     NPV= 0.970
Total             71      1423    1494

Sensitivity    Specificity
0.408          0.944
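The four screening measures in the table follow directly from the 2x2 counts. A minimal sketch, using the standard definitions (the function name `screening_metrics` and its argument names are hypothetical, not taken from the worksheet):

```python
def screening_metrics(tp, fp, fn, tn):
    """Standard screening-test measures from a 2x2 table.

    tp: test +, gold standard +    fp: test +, gold standard -
    fn: test -, gold standard +    tn: test -, gold standard -
    """
    return {
        "sensitivity": tp / (tp + fn),  # diseased who test positive
        "specificity": tn / (tn + fp),  # non-diseased who test negative
        "ppv": tp / (tp + fp),          # positive tests that are true
        "npv": tn / (tn + fn),          # negative tests that are true
    }


# Counts from the table above.
for name, value in screening_metrics(tp=29, fp=79, fn=42, tn=1344).items():
    print(f"{name}: {value:.3f}")
```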
Standardized Incidence Ratios
SIR is useful for evaluating whether the number of observed cancers in a community exceeds the overall average rate for the entire state.

Calculation of Standardized Incidence Ratios (SIRs)
To determine whether elevated numbers of cancer cases occurred in a community, cancer incidence data are tabulated by age group and gender to compare the observed number of cancer cases to the number that would be expected based on the statewide cancer rate.

The SIR is the ratio of the observed # cases divided by the expected #. The ratio is then multiplied X 100.

e.g. age  Stratum  State Cancer Rate  # People in       Expected # Community    Observed # Community
                   (Standard)         Community Strata  Cancers (Column C x D)  Cancers
<20       1        0.00010            74657             7.5                     11
20-44     2        0.00020            134957            27.0                    25
45-64     3        0.00050            54463             27.2                    30
65-74     4        0.00150            25136             37.7                    40
75-84     5        0.00180            17012             30.6                    30
85+       6        0.00100            6337              6.3                     8
          7        0.00000            0                 0.0
          8        0.00000            0                 0.0
Totals                                312562            136.4                   144

Standardized Incidence Ratio (SIR): 106
Lower 95% Confidence Limit: 88
Upper 95% Confidence Limit: 123

Example:

Age Group  State Rate  Community population  Expected Cases  Observed Cases
0-19       .0001       74,657                7.47            11
20-44      .0002       134,957               26.99           25
45-64      .0005       54,463                27.23           30
65-74      .0015       25,136                37.70           40
75-84      .0018       17,012                30.62           30
85+        .0010       6,337                 6.34            8

If the observed count is >30, the confidence interval for the observed count is calculated using the Poisson distribution to approximate the distribution of the observed counts. If the observed count is >30, exact confidence limits are calculated using the Poisson function to find the first count that is significantly less than the observed count.

Confidence Limits for Observed Count
gap     lower   upper
23.5    120.5   167.5

Calculation by serial nested if statements
120.48  120  120

obs    p value
144    0.522
143    0.489
142    0.456
141    0.423
140    0.390
139    0.358
138    0.327
137    0.297
136    0.269
135    0.242
134    0.216
133    0.192
132    0.169
131    0.148
130    0.129
129    0.112
128    0.096
127    0.083
126    0.070
125    0.059
124    0.050
123    0.041
122    0.034
121    0.028
120    0.023
119    0.018
118    0.015
117    0.012
116    0.009
115    0.007
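The SIR calculation can be sketched as follows. The expected count and the SIR follow the definition given in the text (observed divided by expected, times 100). The confidence limits are an assumption: the sketch uses observed +/- 1.96 x sqrt(observed), which matches the worksheet's "gap / lower / upper" values of 23.5, 120.5 and 167.5 for 144 observed cases, but it is not necessarily the only method the worksheet applies. The function name `sir_with_ci` is hypothetical.

```python
from math import sqrt


def sir_with_ci(rates, populations, observed, z=1.96):
    """Return (expected count, SIR, lower limit, upper limit), SIR scaled x 100."""
    # Expected cases: stratum-specific standard rate times stratum population.
    expected = sum(rate * n for rate, n in zip(rates, populations))
    # Assumed normal approximation to the Poisson-distributed observed count.
    gap = z * sqrt(observed)
    sir = 100 * observed / expected
    lower = 100 * (observed - gap) / expected
    upper = 100 * (observed + gap) / expected
    return expected, sir, lower, upper


# State rates and community populations from the example table.
rates = [0.0001, 0.0002, 0.0005, 0.0015, 0.0018, 0.0010]
populations = [74657, 134957, 54463, 25136, 17012, 6337]
expected, sir, lower, upper = sir_with_ci(rates, populations, observed=144)
print(f"expected = {expected:.1f}, SIR = {sir:.0f} ({lower:.0f} to {upper:.0f})")
```

A confidence interval that includes 100 suggests the observed count is compatible with the statewide rate.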
Direct Standardization (for Adjusted Rates)
Adapted from Dr. Tim Heeren, Boston University School of Public Health, Dept. of Biostatistics
For specific strata of a population (e.g. age groups) indicate the number of observed events and the number of people in the stratum in columns E and F. Indicate the distribution of some standard reference population in column C. [Leave a "1" in column F for extra strata to prevent calculation error.]

e.g. age  Stratum  Distribution of       Number of  Number of  Proportion  SE
                   Reference Population  Events     Subjects   or "Rate"
<5        1        0.07                  2414       850000     0.00284     0.00006
5-19      2        0.22                  1300       2280000    0.00057     0.00002
20-44     3        0.40                  8732       4410000    0.00198     0.00002
45-64     4        0.19                  21190      2600000    0.00815     0.00006
65+       5        0.12                  97350      2200000    0.04425     0.00014
          6        0.00                  0          1          0.00000     0.00000
          7        0.00                  0          1          0.00000     0.00000
          8        0.00                  0          1          0.00000     0.00000
Totals             1.00                  130986     12340003

Crude Rate                          0.01061
Standardized Proportion or "Rate"   0.00797
Standard Error                      0.00002
95% CI for Standardized Rate        0.00793  0.00802

Example:
Suppose you want to compare Florida and Alaska with respect to death rates from cancer. The problem is that death rates are markedly affected by age, and Florida and Alaska have different age distributions. However, we can calculate age-adjusted rates by using a reference or "standard" distribution to determine what the overall rates for Florida and Alaska would have been if their populations had similar distributions. The calculation uses the age-specific rates observed for each population and calculates a weighted average using the "standard" population's distribution for weighting. In this case, the US age distribution in 1988 was used as a standard, but you can use any other standard. Note that the crude rates for Florida and Alaska differ substantially (1,061 per 100,000 vs. 391 per 100,000), but Florida has a higher percentage of old people. The standardized (age-adjusted) rates are very similar (797 vs. 750 per 100,000).

Florida

e.g. age  Stratum  Distribution of US   Number of  Number of  Proportion  SE       Weight x Rate
                   Population in 1988   Events     Subjects   or "Rate"
<5        1        0.07                 2414       850000     0.00284     0.00006  0.00019880
5-19      2        0.22                 1300       2280000    0.00057     0.00002  0.00012544
20-44     3        0.40                 8732       4410000    0.00198     0.00002  0.00079202
45-64     4        0.19                 21190      2600000    0.00815     0.00006  0.00154850
65+       5        0.12                 97350      2200000    0.04425     0.00014  0.00531000
          6        0.00                 0          1          0.00000     0.00000  0.00000000
          7        0.00                 0          1          0.00000     0.00000  0.00000000
          8        0.00                 0          1          0.00000     0.00000  0.00000000
Totals             1.00                 130986     12340003                        0.00797476

Crude Rate 0.01061

Age    Deaths   Pop.        % of total (Weight)  Rate per 100,000
<5     2,414    850,000     7%                   284
5-19   1,300    2,280,000   18%                  57
20-44  8,732    4,410,000   36%                  198
45-64  21,190   2,600,000   21%                  815
>65    97,350   2,200,000   18%                  4,425
Tot.   130,986  12,340,000  100%
Crude Rate= 130,986/12,340,000=1,061 per 100,000

Alaska

e.g. age  Stratum  Distribution of US   Number of  Number of  Proportion  SE       Weight x Rate
                   Population in 1988   Events     Subjects   or "Rate"
<5        1        0.07                 164        60000      0.00273     0.00021  0.00019133
5-19      2        0.22                 85         130000     0.00065     0.00007  0.00014385
20-44     3        0.40                 450        240000     0.00188     0.00009  0.00075000
45-64     4        0.19                 503        80000      0.00629     0.00028  0.00119463
65+       5        0.12                 870        20000      0.04350     0.00144  0.00522000
          6        0.00                 0          1          0.00000     0.00000  0.00000000
          7        0.00                 0          1          0.00000     0.00000  0.00000000
          8        0.00                 0          1          0.00000     0.00000  0.00000000
Totals             1.00                 2072       530003                          0.00749980

Crude Rate 0.00391

Age    Deaths  Pop.     % of total (Weight)  Rate per 100,000
<5     164     60,000   11%                  274
5-19   85      130,000  25%                  65
20-44  450     240,000  45%                  188
45-64  503     80,000   15%                  629
>65    870     20,000   4%                   4,350
Tot.   2,072   530,000  100%
Crude Rate= 2,072/530,000=391 per 100,000
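The Florida and Alaska comparison can be reproduced with a short sketch. The standardized rate is the weighted average of stratum-specific rates described in the example. The standard error is an assumption: the sketch uses sqrt(sum of weight^2 x p(1-p)/n), which is consistent with the Florida confidence interval shown (0.00793 to 0.00802), but the worksheet's formula is not visible. The function name `direct_standardization` is hypothetical.

```python
from math import sqrt


def direct_standardization(weights, events, subjects, z=1.96):
    """Return (crude rate, standardized rate, SE, (lower, upper))."""
    rates = [e / n for e, n in zip(events, subjects)]
    crude = sum(events) / sum(subjects)
    # Weighted average of stratum rates, weights from the standard population.
    adjusted = sum(w * r for w, r in zip(weights, rates))
    # Assumed binomial variance of each stratum rate, weighted and summed.
    variance = sum(w ** 2 * r * (1 - r) / n
                   for w, r, n in zip(weights, rates, subjects))
    se = sqrt(variance)
    return crude, adjusted, se, (adjusted - z * se, adjusted + z * se)


us_1988 = [0.07, 0.22, 0.40, 0.19, 0.12]  # standard age distribution
florida = direct_standardization(
    us_1988, [2414, 1300, 8732, 21190, 97350],
    [850000, 2280000, 4410000, 2600000, 2200000])
alaska = direct_standardization(
    us_1988, [164, 85, 450, 503, 870],
    [60000, 130000, 240000, 80000, 20000])

for name, (crude, adjusted, se, ci) in (("Florida", florida), ("Alaska", alaska)):
    print(f"{name}: crude {crude * 1e5:.0f}, adjusted {adjusted * 1e5:.0f} per 100,000")
```

The empty strata from the worksheet (weight 0, one subject) are omitted here because they contribute nothing to either rate.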
