Beruflich Dokumente
Kultur Dokumente
Theresa A Scott, MS
Vanderbilt University Department of Biostatistics theresa.scott@vanderbilt.edu http://biostat.mc.vanderbilt.edu/TheresaScott
Sample Size
1 / 24
Introduction
After youve decided what and whom youre going to study and the design to be used, you must decide how many subjects to sample. Even the most rigorously executed study may fail to answer its research question if the sample size is too small. If the sample size is too large, the study will be more dicult and costly than necessary while unnecessarily exposing a number of subjects to possible harm. Goal: to estimate an appropriate number of subjects for a given study design. ie, the number needed to nd the results youre looking for. IMPORTANT: Although a useful guide, sample size calculations give a deceptive impression of statistical objectivity. Really only making a ballpark estimate.
Theresa A Scott, MS (Vandy Biostatistics) Sample Size 2 / 24
Introduction, contd
Only as accurate as the data and estimates on which they are based, which are often just informed guesses. Often reveals that the research design is not feasible or that dierent predictor or outcome variables are needed. TAKE HOME MESSAGE: Sample size should be estimated early in the design phase of the study, when major changes are still possible. In addition to the statistical analysis plan, the sample size section is critical to an IRB proposal and any kind of grant. 42% of R01s examined in one review paper were criticized for the sample size justications or analysis plans.1 Much more involved than a cut-and-paste paragraph.
1
Inouye & Fiellin, An Evidence-Based Guide to Writing Grant Proposals for Clinical Research, Annals of Internal
Medicine, 142.4 (2005): 274-282. Theresa A Scott, MS (Vandy Biostatistics) Sample Size 3 / 24
Underlying principles
Research hypothesis: Specic version of the research question that summarizes the main elements of the study the sample, and the predictor and outcome variables in a form that establishes the basis for the statistical hypothesis tests.2 Should be simple (ie, contain one predictor and one outcome variable); specic (ie, leave no ambiguity about the subjects and variables or about how the statistical hypothesis will be applied); and stated in advance. Example: Use of tricyclic antidepressant medications, assessed with pharmacy records, is more common in patients hospitalized with an admission of myocardial infarction at Longview Hospital in the past year than in controls hospitalized for pneumonia.
2
NOTE: Hypotheses are not needed for descriptive studies more to come. Sample Size 4 / 24 Theresa A Scott, MS (Vandy Biostatistics)
Good rule of thumb: choose the smallest eect size that would be clinically meaningful (and you would hate to miss).
Will be okay if true eect size ends up being larger.
Theresa A Scott, MS (Vandy Biostatistics) Sample Size 7 / 24
Additional considerations
Variability of the eect size: Statistical tests depend on being able to show a dierence between the groups being compared. The greater the variability (spread) in the outcome variable among the subjects, the more likely it is that the values in the groups will overlap, and the more dicult it will be to demonstrate an overall dierence between them. Use the most precise measurements/variables possible. Often have >1 hypothesis, but should specify a single primary hypothesis for sample size planning. Helps to focus the study on its main objective and provides a clear basis for the main sample size calculation. Useful to rank other research questions/specic aims as secondary, etc.
Theresa A Scott, MS (Vandy Biostatistics) Sample Size 9 / 24
Assumptions:
Whether the 2 groups are paired or independent.
Example paired: Comparing the mean BMI of subjects before and after a weight loss program. Example independent: Comparing the mean depression score in subjects treated with 2 dierent antidepressants.
Rules of thumb:
Smaller sample size needed for paired groups SD of the dierence in a variable usually smaller than the SD of a variable. Sample size decreases as the dierence in the mean values increases (holding SD constant). Sample size increases as SD increases (holding the dierence in the mean values constant).
eectsize . SD
Rule of thumb: both the value of the proportions and the dierence between them aect sample size sample size much larger for small proportions. Can also calculate assuming a relative risk (instead of 2 proportions).
IMPORTANT: a relative risk (ie, risk ratio) equals an odds ratio in only certain cases.
Theresa A Scott, MS (Vandy Biostatistics) Sample Size 15 / 24
Sample Size
16 / 24
Can also calculate the sample size required to determine whether the correlation coecient between 2 continuous variables is signicant (ie, signicantly diers from 0) see Designing Clinical Research. Adjusting for covariates: when designing a study, you may decide that 1 variable will confound the association between the predictor and outcome, and plan to use regression analysis to adjust for these confounders. Calculated sample size will need to be increased (10:1 rule) see a statistician.
Theresa A Scott, MS (Vandy Biostatistics) Sample Size 17 / 24
For both:
(1) the desired precision (total width) of the CI. (2) the condence level for the interval (eg, 95 or 99%).
Example: A sample size of 166 subjects is needed to estimate the mean IQ among 3rd graders in an urban area with a 99% CI of 3 points (ie, a total with of 6 points), assuming a standard deviation of 15 points.
Theresa A Scott, MS (Vandy Biostatistics) Sample Size 19 / 24
Additional thoughts/considerations
Consider strategies for minimizing sample size and maximizing power, which include using Continuous variables, Paired measurements, Unequal group sizes, and A more common (ie, prevalent) binary outcome. Useful to calculate (and report) a range of sample sizes by assuming dierent combinations of parameter values take the largest sample size to cover all bases. Always justify the feasibility of the calculated sample size. How long would it take to accrue/enroll the subjects? Need to consider the source of subjects, the inclusion/exclusion criteria, the prevalence of the outcome, etc.
Theresa A Scott, MS (Vandy Biostatistics) Sample Size 22 / 24