Introduction to Biostatistics
To provide an overview of basic concepts in the design and analysis of biostatistical investigations; to unify the thought process, as many students take courses under different circumstances at various times; and to kindle imaginations for your course work and refresh memory.

Grade

The letter grade will be based on a multiple-choice test on the last day of the lecture.
What is biostatistics?
Key Concepts

The goal of statistical analysis is to draw inferences or conclusions in an unbiased fashion. The target population should be clearly defined (that is, the population for which inference is drawn should be clearly stated). It is important to recognize the variations in the population, and these should be reflected in the inferences.
The design and conduct of experiments (or studies) to collect observations (or data). Display and analyze the data to draw inferences about the population, duly acknowledging the uncertainty in the stated conclusions. Biostatistics is the study of populations, the study of variations, and the study of methods to reduce the data.
The same experiment conducted on two different populations may yield different results (systematic component of variation) Two different conditions or experiments conducted on the same population may yield different results (systematic component of variation) The experiment replicated under the same conditions on the same population may yield different results (random component of variation).
Always find ways to succinctly describe the results using graphical and numerical summaries
Examples
Example 4
Example 1
Medi-Cal is California's medical assistance program funded by federal, state and county taxes. Roughly 50% of sick visits to a pediatric clinic are made by children covered under the Medi-Cal program and the remaining 50% are covered by other insurance programs or by private payment. The objective is to study whether the health care given to Medi-Cal patients at the clinic differed from the health care given to non-Medi-Cal patients. Observations have indicated that stimulation of walking and placing reflexes in the newborn promotes increased walking and placing. That is, if a newborn infant is held under his arms and his bare feet are permitted to touch a flat surface, he will perform well-coordinated walking movements similar to those of an adult. If the dorsa of his feet are drawn against the edge of a flat surface, he will perform placing movements much like those of a kitten (Zelazo, Zelazo and Kolb (1972), Science). How do we conduct a study to test the generality of this observation? Increasing understanding of the etiology of disease has led to the development of new and improved drugs. Through clinical observations and experience, the evidence is mounting that a new drug is better than the current standard drug. How do we test the generality of these observations? What are the consequences of switching to this new drug? How much does the immediate cost of switching to a new drug offset the benefits over the long haul?
Improvements in medical technology and public health practice during the last one hundred years have increased longevity and improved health. Despite such progress, the disparities in health among various racial and ethnic groups continue to be a daunting problem. One of the goals of Healthy People 2010 is to eliminate health disparities. How do we measure health disparities? How do we decide that we have eliminated health disparities?
Biostatistical Investigations
1.
Example 2
Design an experiment or study to collect information on a set of individuals from the target population to address a set of issues or hypotheses.
2.
Randomized experiments
Observational studies
Explore the data by computing graphical and numerical summaries
Example 3
3.
Numerical measures to quantify central tendency, spread and shape; graphical summaries to study the distributions and associations. Translate these findings into inference about the target population.
Estimates, standard errors, confidence intervals and p-values. Do these inferences pertain to causation or simply to association?
Stem-and-Leaf
Random blood glucose levels (mmol/liter) from a group of first-year medical students
Stem-and-Leaf
The stem is all the digits except the last; the leaf is the last digit. Multiply the numbers by 10 if there is one decimal place, by 100 for two decimal places, etc. Split the stems if the number of leaves is large.
4.7 3.6 3.8 2.2 4.7 4.1 3.6 4.0 4.4 5.1 4.2 4.1 4.4 5.0 3.7 3.6 2.9 3.7 4.7 3.4 3.9 4.8 3.3 3.3 3.6 4.6 3.4 4.5 3.3 4.0 3.4 4.0 3.8 4.1 3.8 4.4 4.9 4.9 4.3 6.0

An alternative Stem-and-Leaf Display (stems split):
2 | 29
3 | 333444
3 | 6666778889
4 | 00011123444
4 | 56777899
5 | 01
6 | 0
The stem-and-leaf display is useful for small data sets, to look at the shape of the distribution and identify any outliers; it displays the entire data. For large data sets its close cousin, the histogram, is used (but it is coarser). Stem-and-leaf displays can also be used to compare distributions.
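Following the rules above (stem = all digits except the last, leaf = last digit, scale by 10 for one decimal place), a minimal Python sketch of the glucose display might look like this; stem splitting, used in the display above, is omitted for brevity:

```python
from collections import defaultdict

def stem_and_leaf(values, scale=10):
    """Build a stem-and-leaf table: multiply by `scale` so the
    leaf is the last digit and the stem is everything before it."""
    table = defaultdict(list)
    for v in sorted(values):
        n = round(v * scale)
        table[n // 10].append(n % 10)
    return dict(table)

glucose = [4.7, 3.6, 3.8, 2.2, 4.7, 4.1, 3.6, 4.0, 4.4, 5.1,
           4.2, 4.1, 4.4, 5.0, 3.7, 3.6, 2.9, 3.7, 4.7, 3.4,
           3.9, 4.8, 3.3, 3.3, 3.6, 4.6, 3.4, 4.5, 3.3, 4.0,
           3.4, 4.0, 3.8, 4.1, 3.8, 4.4, 4.9, 4.9, 4.3, 6.0]

for stem, leaves in sorted(stem_and_leaf(glucose).items()):
    print(f"{stem} | {''.join(map(str, leaves))}")
```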
Survival times (beyond day 10) for guinea pigs in the treatment group (high vitamin C):

Stem | Leaf
  1  | 4466
  2  | 0446888
  3  | 012
  4  | 4

Two Displays

A back-to-back display (treatment leaves to the left of the stems, control leaves to the right):

Treatment | Stem | Control
     6644 |  1   |
  8886440 |  2   | 1345
      210 |  3   | 133
        4 |  4   |
          |  5   | 4

Such displays are useful to visually inspect the extent to which the distributions overlap as well as the magnitude of the differences. The combined display:

1 | 4466
2 | 01344456888
3 | 011233
4 | 4
5 | 4
Neoplasms, Respiratory system, Injury/Poisoning, Digestive system, Nervous system, Others, Total
Dot chart
Bar chart
You can create these and other types of plots using PROC GPLOT in SAS or in Excel. These graphs were produced using R, free software which can be downloaded from www.r-project.org.
We will discuss some more graphical displays after introducing some numerical summaries Numerical Summaries
Central tendency: Represents a typical or middle value. Spread: Extent of variability across observations. Shape: The structure of the distribution. Mean: Sum all the observations and divide by the number of observations (arithmetic mean).
Central tendency
(21+23+24+25+31+33+33+54)/8=244/8=30.5
Median: The number such that 50% of the observations are less than the number and 50% are greater than the number
21,23,24,25,
31,33,33,54
Technically a number between 25 and 31. Sometimes the average (25+31)/2=28 is used.
Mode: The most frequent value among the set of observations. Geometric mean: The product of the numbers raised to the power 1/n (the n-th root of the product).
Harmonic mean: Reciprocal of the arithmetic mean of the reciprocals (useful when rates are being averaged)
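As a quick sketch (not part of the original slides), the measures above can be computed for the control-group survival times used earlier; the geometric and harmonic means are written out from their definitions:

```python
import math
from statistics import mean, median, mode

x = [21, 23, 24, 25, 31, 33, 33, 54]  # control-group survival times

arithmetic = mean(x)                       # 244/8 = 30.5
med = median(x)                            # average of 25 and 31 = 28
most_frequent = mode(x)                    # 33 appears twice
geometric = math.prod(x) ** (1 / len(x))   # n-th root of the product
harmonic = len(x) / sum(1 / v for v in x)  # reciprocal of mean reciprocal

print(arithmetic, med, most_frequent, round(geometric, 2), round(harmonic, 2))
```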
Variance: s^2 = Σ_{i=1}^{n} (x_i − x̄)^2 / (n − 1)
Quartiles: Values such that 25% of observations lie below the first quartile, 25% lie between the first and the second quartile, 25% between the second and third quartile and 25% lie above the third quartile.
Interquartile range (IQR): Third quartile − First quartile. The 1.5·IQR rule is used to detect outliers.
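These spread measures can be checked with Python's statistics module; note that quartile conventions vary slightly across packages (the "inclusive" method is used here):

```python
from statistics import variance, stdev, quantiles

x = [21, 23, 24, 25, 31, 33, 33, 54]

s2 = variance(x)   # sum of squared deviations from the mean / (n - 1)
s = stdev(x)       # square root of the variance
q1, q2, q3 = quantiles(x, n=4, method="inclusive")
iqr = q3 - q1
# 1.5*IQR rule: flag points outside (Q1 - 1.5*IQR, Q3 + 1.5*IQR)
outliers = [v for v in x if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr]
print(s2, s, (q1, q2, q3), iqr, outliers)
```

With these data the 1.5·IQR rule flags 54, the same observation the notes later single out in the control group.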
Box plot
Median, IQR and MAD are robust to outliers; they are less influenced by extreme observations. For example, suppose the observation 54 in the control group in the guinea pig example were 94: the median remains unchanged, but the mean changes dramatically. The mean and standard deviation are popular because they are tied to the normal distribution, a typical distribution used in many statistical analyses. If normality is not valid, then more robust methods involving the median and quartiles are used. Some more graphical summaries based on these additional summaries:
The box plot captures and displays the most important features of the data: measures of central tendency, spread and shape. Its elements: first quartile, second quartile (median), third quartile; whiskers extending out to 1.5·IQR beyond the quartiles; points outside the 1.5·IQR range plotted individually. There are other versions of box plots adding more features such as the mean; some box plots extend the line from the edges of the box to the minimum and maximum values (M&M).
Histogram
Divide the data into groups by choosing intervals (mutually exclusive, equal width and exhaustive). Some rules for the number of intervals: n/3 or n^(1/2)·log10(n). Count the observations in each group. Draw a bar chart with the area of each bar proportional to the count.
Symmetric: Mean, median and mode are the same; values on either side of the mean are equally likely. Skewed: Mean is larger or smaller than the median.
Normal distribution
If you fix the width of the bar to 1, then the height of the bar is proportional to the count. Characterized by the mean and standard deviation: 68% of observations lie within 1 SD, 95% of observations lie within 2 SD, 99.7% of observations lie within 3 SD.
Key Concepts
Probability
All people living in the United States. In the study of a treatment for diabetes, all people with diabetes. Blood pressure for a person: all possible measurements of blood pressure in that person.
Empirical definition: Probability of an event A is the relative frequency with which the events occur in a long sequence of trials in which A is one of the outcomes.
Generally, it is impossible to measure each and every unit in the population (if we could, it is called a census).
It only makes sense to talk about probability when the event under question can be thought of as a result of an experiment that could be performed repeatedly: tossing a coin or throwing a die. Suppose the median height in the population is 168 cm, and suppose we keep drawing one individual at a time and measuring height. Over the long run, as the sample size gets large, half the people in our sample will have heights below 168 cm.
A practical approach: A sample is usually a very small subset of units in the population. The sample is measured and studied to draw conclusions about the population. The method used to draw the sample is the key step in a biostatistical investigation.
Sample should be representative of the population (probability or random sampling designs assure such unbiased representativeness)
A random person chosen from this population will have height below 168cm with probability 1/2.
Due to sampling from the population there is uncertainty in the inferences. Statistical analysis expresses these uncertainties in terms of probabilistic statements
Probability (contd.)
Key Concepts
It is a degree of belief expressing the certainty with which the event is expected to occur. This broader definition allows probabilistic statements without necessarily contemplating a series of trials Anything that is not known to you means that you are uncertain about it. The probability is simply an expression of that uncertainty.
Tossing a coin: S = {H, T}. Study on health insurance: a random sample of n subjects, assessing how many have health insurance: S = {0, 1, 2, …, n}. Events: Tossing a coin: E = {H}. Study on health insurance: none have insurance (E = {0}); at least 60% have health insurance:
Statistical inference based on the empirical definition of probability is called frequentist or repeated-sampling inference. Statistical inference based on the subjective interpretation of probability is called Bayesian inference. Fortunately, for large samples the numerical results under both systems of inference are very similar, but the interpretations differ. Frequentist inference is the focus of this course.
E = {X ∈ S | X ≥ 0.6·n}
Probability Distribution
Sample space: An interval on the real line. Assume (0, ∞) or (−∞, ∞) with almost zero mass outside the appropriate interval (a mathematical convenience). Example: X = systolic blood pressure.
Probability mass function: Probability assignment to each individual element of the sample space (discrete sample space): Pr(X = x) = f(x) for x ∈ S; Pr(X = x) = 0 for x ∉ S. Probability density function: Probability assignment to an arbitrarily small interval around each potential value of a continuous variable (continuous sample space).
Distribution function
Pr(X ≤ u) = F(u) = ∫ from a to u of f(x) dx, where a is the lower limit of the sample space
Rules of Probability
A = Event; 0 ≤ Pr(A) ≤ 1
Pr(A) = 0: A will not occur in the entire sequence of experiments
Pr(A) = 1: Only A will occur in the entire sequence of experiments
A^c (or "not A") = Complement of A; Pr(A^c) = 1 − Pr(A)
Two events A and B are mutually exclusive when occurrence of A rules out the occurrence of B in a trial Pr (A or B)=Pr(A)+Pr(B) Two events A and B are independent when occurrence of A has no bearing on the occurrence or non-occurrence of B Pr (A and B)=Pr(A)*Pr(B)
Example 1: The median height of the population is 168cm. Two individuals are chosen at random, independently. What is the probability that the height of both is less than 168cm? A = first person's height is less than 168cm; B = second person's height is less than 168cm; A and B = both have height less than 168cm. Because of independence, Pr(A and B) = Pr(A)·Pr(B) = 1/2 · 1/2 = 1/4.
Example 2: Suppose that 10% of the population has height exceeding 180cm. What is the probability that exactly one person's height exceeds 180cm? Two possible scenarios: C1: A ≤ 180 and B > 180; C2: A > 180 and B ≤ 180. C1 and C2 cannot both occur, so Pr(C1 or C2) = Pr(C1) + Pr(C2) = 9/10·1/10 + 1/10·9/10 = 2·9/100 = 9/50. Note: Pr(C1) was calculated using the independence of A and B.
Class exercise: What is the probability that at least one person's height exceeds 180cm? What is the probability that at most one person's height exceeds 180cm?
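The two worked examples can be verified by brute-force enumeration of the four possible outcomes for two independent people, using exact fractions (a sketch, not part of the original slides):

```python
from fractions import Fraction
from itertools import product

def enumerate_pr(p, event):
    """Probability of `event` over two independent trials that each
    succeed with probability p (exact arithmetic with fractions)."""
    total = Fraction(0)
    for a, b in product([True, False], repeat=2):
        weight = (p if a else 1 - p) * (p if b else 1 - p)
        if event(a, b):
            total += weight
    return total

# Example 1: success = "height below 168cm", p = 1/2; both below
both_below = enumerate_pr(Fraction(1, 2), lambda a, b: a and b)
# Example 2: success = "height above 180cm", p = 1/10; exactly one above
exactly_one = enumerate_pr(Fraction(1, 10), lambda a, b: a != b)
print(both_below, exactly_one)
```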
Extension
Binomial Distribution
Suppose that 5 individuals are selected at random. What is the probability that there is only one person among 5 whose height exceeds 180cm?
Suppose that a very large population has an unknown proportion P of subjects who have a disease. Suppose that a random sample of size n is drawn from this population. The number of diseased subjects in the sample will be x with probability
Pr(X = x) = C(n, x) · P^x · (1 − P)^(n − x), x = 0, 1, …, n
T = person's height exceeds 180cm; F = person's height is less than or equal to 180cm. Possible scenarios:
TFFFF: 1/10 · 9/10 · 9/10 · 9/10 · 9/10
FTFFF: 9/10 · 1/10 · 9/10 · 9/10 · 9/10
FFTFF: 9/10 · 9/10 · 1/10 · 9/10 · 9/10
FFFTF: 9/10 · 9/10 · 9/10 · 1/10 · 9/10
FFFFT: 9/10 · 9/10 · 9/10 · 9/10 · 1/10
Total: 5·(9·9·9·9)/(10·10·10·10·10) = 0.32805
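The enumeration above is the binomial probability C(5,1)·(1/10)·(9/10)^4; a small helper makes the general formula concrete (a sketch, not from the slides):

```python
from math import comb
from fractions import Fraction

def binomial_pmf(n, x, p):
    """Pr(X = x) = C(n, x) p^x (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

p = Fraction(1, 10)           # Pr(height > 180cm)
answer = binomial_pmf(5, 1, p)
print(answer, float(answer))  # 6561/20000 = 0.32805
```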
Poisson Distribution
Poisson distribution is a close cousin of Binomial distribution. If P is very small (rare disease) and n is very large then probability that in a sample of size n, you will find x diseased subjects is
In a typical sample of size n, you may expect nP subjects to have the disease. If you take several samples of size n from this population and note down the number of diseased subjects, the variance among these numbers will be nP(1 − P). Inferential problem: Given n and x, how do we infer about P? Intuitively the estimate of P is x/n. This estimate turns out to be a really good estimate. How do we decide that it is good? We will see later.
Pr(X = x) = (nP)^x · e^(−nP) / x! = λ^x · e^(−λ) / x!, where λ = nP
λ = expected number of diseased people in the sample
If you take a large number of very large samples and count the number of diseased people in each sample, the variance among these numbers will be approximately λ (for the Poisson distribution, the mean and the variance are both equal to λ).
Inferential problem: Given x, how do we draw inference about λ?
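The Poisson approximation to the binomial can be checked numerically; the values n = 5000 and P = 0.001 below are illustrative assumptions, not from the slides:

```python
from math import comb, exp, factorial

def binom_pmf(n, x, p):
    """Exact binomial probability C(n, x) p^x (1-p)^(n-x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pmf(x, lam):
    """Poisson probability lambda^x e^(-lambda) / x!."""
    return lam**x * exp(-lam) / factorial(x)

# Rare disease: P = 0.001, large sample n = 5000, so lambda = nP = 5
n, P = 5000, 0.001
lam = n * P
for x in range(4):
    print(x, round(binom_pmf(n, x, P), 5), round(poisson_pmf(x, lam), 5))
```

The two columns agree to several decimal places, which is the sense in which the Poisson is a "close cousin" of the binomial for rare events.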
Normal Distribution
Conditional Probability
A popular model for many continuous variables It is a symmetric bell shaped curve characterized by two parameters: Mean and Standard Deviation Mean: Center of the distribution (the same as the median and mode) 90% of observations lie between mean-1.64*SD and mean+1.64*SD 95% of observations lie between mean-1.96*SD and mean+1.96*SD
How likely that an Event A will happen given that the event B has occurred?
Pr(A | B) = Pr(A ∩ B) / Pr(B)
Key concept in statistical inference: Sampling distribution. How do we judge whether the estimator x/n (the sample proportion) is a good estimator of the population proportion P? Imagine that you draw several samples, each of size n. Each will give you a different estimate. Variation in the estimates from sample to sample is called the sampling variance. The square root of the sampling variance is called the standard error. Two important criteria:
Diagnostic test indicates T+ or T−. True state of the disease: D+ or D−. Properties of diagnostic tests:
Sensitivity: Pr(T+|D+) Specificity: Pr(T-|D-) Positive Predictive Value (PPV): Pr(D+|T+) Negative Predictive Value (NPV): Pr(D-|T-)
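Sensitivity and specificity are fixed properties of the test, but PPV depends on disease prevalence via Bayes' rule. A sketch with hypothetical numbers (95% sensitivity, 90% specificity, 4% prevalence are assumptions for illustration, not from the slides):

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value Pr(D+|T+) via Bayes' rule:
    Pr(T+|D+)Pr(D+) / [Pr(T+|D+)Pr(D+) + Pr(T+|D-)Pr(D-)]."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Hypothetical test: 95% sensitive, 90% specific, 4% prevalence
print(round(ppv(0.95, 0.90, 0.04), 3))
```

Even a quite accurate test can have a low PPV when the disease is rare, because most positives come from the large disease-free group.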
You would want the estimates to be the same as the estimand, on the average. Such estimates are called unbiased The sample to sample variation in the estimates should be as small as possible. That is, the standard error should be as small as possible The most desirable estimate: An unbiased estimate that has the smallest sampling variance. In this sense the sample proportion is the most desirable estimate of the population proportion
Sample Proportion
The sampling variance of the sample proportion, p, is approximately p(1 − p)/n. The standard error: SE = √(p(1 − p)/n)

Confidence intervals

90%: p ± 1.64·SE
99%: p ± 2.57·SE
Instead of using a single value to estimate the population proportion, sometimes it is desirable to provide a range of plausible values for the unknown population proportion with a reasonable degree of confidence. A confidence interval is a summary measure that provides such a set of plausible values. Usual confidence levels are 90%, 95% or even 99%. An approximate 95% confidence interval for the unknown population proportion is
Example: In a random sample of 2,837 children in the State of Michigan, 118 said they usually coughed first thing in the morning. What can you infer about the prevalence of this condition in the entire state? The sample prevalence is 118/2837 = 0.0416, the estimated prevalence rate for the entire state. The uncertainty in the estimate is
√(0.0416·(1 − 0.0416)/2837) = 0.0037
95% confidence interval: 0.0416 ± 1.96·0.0037 = (0.034, 0.049)
With reasonable confidence one could conclude that the population prevalence rate is between 3.4% and 4.9%.
p ± 1.96·SE = p ± 1.96·√(p(1 − p)/n)
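The Michigan cough example can be reproduced directly from these formulas (a sketch):

```python
from math import sqrt

def proportion_ci(x, n, z=1.96):
    """Point estimate, standard error, and large-sample CI for a proportion."""
    p = x / n
    se = sqrt(p * (1 - p) / n)      # standard error of the sample proportion
    return p, se, (p - z * se, p + z * se)

# Michigan cough example: 118 of 2,837 children
p, se, (lo, hi) = proportion_ci(118, 2837)
print(round(p, 4), round(se, 4), (round(lo, 3), round(hi, 3)))
```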
Confidence intervals
90%: mean ± 1.64·SE
95%: mean ± 1.96·SE
99%: mean ± 2.57·SE
These results are valid even if the outcome measures in the population are not normal Property called Central limit theorem: Suppose you take several samples of size n from the population. Compute the mean from each sample The histogram of these means will look normal as the sample size gets large, regardless of the distribution of values in the population.
The same principles apply when the outcome measure is continuous. Suppose that a population consists of a very large number of individuals. The objective is to infer about the mean glucose level across all subjects in the population. Suppose that the glucose levels in the population are reasonably normally distributed. A random sample of size n is taken and their glucose levels are measured. The mean of these n individuals is the best estimate of the population mean. The standard error of the sample mean is SD/√n.
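A small simulation illustrates the central limit theorem: sample means drawn from a skewed exponential population (mean 1, SD 1) cluster around the population mean with spread close to SD/√n. The choice of population is an assumption for illustration only:

```python
import random
from statistics import mean, stdev

random.seed(1)  # fixed seed so the run is reproducible

# Draw many samples of size n from a decidedly non-normal population
n, reps = 50, 2000
sample_means = [mean(random.expovariate(1.0) for _ in range(n))
                for _ in range(reps)]

# The means center near 1.0 with spread near 1/sqrt(50) = 0.141
print(round(mean(sample_means), 3), round(stdev(sample_means), 3))
```

A histogram of `sample_means` would look approximately normal even though the underlying population is strongly skewed.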
Types of Studies
Observational studies: An existing situation is observed, as in a survey or clinical case reports.
There are two treatments A and B for a disease. Is a patient's survival longer if treatment A were administered instead of treatment B?
The data is used to infer how the observed state of affairs has come about.
Two possible outcomes for each patient: survival time under treatment A (say, YA) and survival time under B (say, YB). Only one survival time can be observed, but we want to infer about YA − YB. More importantly, if there are N subjects in the population, the quantity of interest is the population average of the differences, Σ(YA − YB)/N.
Experimental studies: One or more conditions are manipulated under which the state of affairs can be observed
The state of affairs under each manipulated condition is then studied to infer how the population will change when the conditions change.
Causal inferences are direct in experimental studies whereas Association or correlational inferences are more natural in observational studies
Though a strong association in the absence of any other explanation can be construed as causal
Take a random sample of n subjects from the population, administer A and observe the mean survival. Take another random sample of n subjects from the same population, administer B and observe the mean survival. The difference between the two sample means is the estimate of the population-averaged causal effect.
An equivalent study: Take a sample of 2n subjects from the population; assign treatment A at random to n subjects and treatment B to the remaining n subjects (completely randomized design). Sometimes several factors other than treatment can affect the outcome.
Cross-over designs: Suppose you have two treatments A and B. Take a sample of n subjects. For half the subjects, administer treatment A and measure the outcome; after a wash-out period, administer treatment B and measure the outcome. For the remaining half, administer treatment B and measure the outcome; after the same wash-out period, administer treatment A and measure the outcome. Question: Why can't we simply administer A to all n subjects and observe the outcome, then after a wash-out period administer treatment B and measure the outcome?
While drawing the 2n subjects, n pairs are selected so that the subjects in each pair are similar in terms of these other factors. A random subject in each pair is given treatment A and the other is given treatment B. That is, the members of each pair are alike except for treatment. The difference in outcome within each pair is then the effect of treatment (randomized block or matched design).
Before-after designs: A sample of n subjects is drawn from the population, the outcome is measured, they are then given a treatment, and the outcome is observed again. The difference between post- and pre-treatment outcomes is the effect of treatment. That is, each person acts as his/her own control.
This design makes a number of assumptions: The effect of treatment is reversible. The washout period is enough so that there is no carry-over effect of Treatment A on B or treatment B on A.
Matched or block designs, before-after designs and cross-over designs are trying to control for factors other than treatment that could affect the outcome
An alternative approach is to measure all the factors that could affect the outcome and use statistical models to adjust for differences in these factors between treatment groups. Regression analyses are useful to achieve this goal. The actual design will be a mix of the two: some factors will be used in blocking and some variables will be used in adjustment. Nevertheless, thinking about extraneous factors or variables, other than the variable under question (treatment), is very important at the design stage. Non-compliance with the treatment regimen can be a problem. Randomization is the key step in experimental studies and is the justification for interpreting the observed mean difference, for example, as a causal effect.
Randomization
Random number tables (Several books provide this as an appendix) Computer programs to generate random orderings.
Suppose that you want to assign two treatments, A and B, to 20 patients at random, with 10 getting treatment A and the other 10 getting treatment B. Most statistical packages have routines to generate a random ordering of the numbers 1, 2, …, 20. Assign one of these orderings to the 20 subjects. One such ordering generated using SAS is (1,5,9,3,13,18,16,12,4,2,11,10,8,17,6,15,14,7,20,19). Subjects assigned 1 to 10 get treatment A and subjects assigned 11 to 20 get treatment B. Alternatively, subjects assigned an odd number get treatment A and those assigned an even number get treatment B.
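The SAS-generated ordering described above can be reproduced in spirit with any random permutation routine (a sketch; the seed value is arbitrary):

```python
import random

random.seed(42)  # in a real trial, record the seed for reproducibility

subjects = list(range(1, 21))  # 20 patients, labeled 1..20
ordering = random.sample(subjects, k=len(subjects))  # random permutation

# First half of the ordering gets treatment A, second half gets B
group_A = sorted(ordering[:10])
group_B = sorted(ordering[10:])
print("A:", group_A)
print("B:", group_B)
```

This is a formal chance mechanism in the sense of the slide: every split of the 20 patients into two groups of 10 is equally likely.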
Randomization does not mean haphazard or arbitrary. Randomization uses a formal chance or probabilistic mechanism to determine who gets treatment A and who gets treatment B
Cross-sectional studies: These are based on a sample drawn from a population at one time point.
Cohort studies: This starts with a sample of subjects from the population. They are then followed over time to assess one or more disease outcomes. Subjects in the study are periodically assessed on several key risk factors.
These are useful to study associations between disease status (say, self-report on hypertension) and a risk factor (say, race/ethnicity). It is important that we select the sample to be a miniature population in several key aspects. It is important that we use a probabilistic mechanism to select which members of the population are included in the sample and which are not. This is like randomization in experimental studies. Population can be defined as a geographical area or a collection of geographical areas.
These are also called prospective studies or longitudinal studies. If the sample is a probability sample, then the results are generalizable to the entire population. Sometimes special cohorts, such as nurses willing to participate (Nurses' Health Study) or medical professionals returning the questionnaires (British Medical Doctors Study), are used in such studies. Generalizability of results from such studies is questionable. People lost during follow-up can make the cohort lose its representativeness of the population.
A cohort study is practically infeasible for rare diseases. Case-control studies: Two samples. Sample diseased subjects (cases) from the population; take a sample of non-diseased subjects (controls) from the same population. Compare the exposure to risk factors in the two groups.
Inference from experimental and observational studies: The statistical analysis technique is similar for both experimental and observational studies.
Population-based case-control studies: Sample diseased subjects from a well-defined geographical area and take a probability sample of controls from the same population. Typically, a census of all cases is used in population-based case-control studies. Hospital-based case-control studies: Sample diseased subjects from one or more hospitals and sample non-diseased subjects from the same hospitals who visit for some other reason.
If the sample size in each group is 50 or larger, we will call it a large sample.
Unmatched versus Matched or before-after (cross-over designs will be considered much later) Continuous (normal), binary and count outcomes Non-normal but continuous outcomes
Example
Questions of Interest
The following data were collected in a study of plasma magnesium in diabetic patients. The diabetic subjects were all insulin-dependent subjects attending a diabetic clinic over a 5-month period. The non-diabetic controls were a mixture of blood donors and people attending day centers for the elderly, to give a wide age distribution. Plasma magnesium follows a Normal distribution very closely. The summary data are as follows:
Calculate an interval which would include 95% of plasma magnesium measurements from the control population. This is called the reference interval. It gives information about the distribution of plasma magnesium in the population.
Number of diabetic subjects=227 Mean plasma magnesium=0.719 Standard deviation =0.068 Number of non-diabetic controls=140 Mean plasma magnesium=0.810 Standard deviation=0.057
Given that the distribution of plasma magnesium is normal, the mean and standard deviation completely specify the distribution. Thus we would expect 95% of the observations to lie between 0.810-1.96*0.057 and 0.810+1.96*0.057. That is, between 0.698 and 0.922.
The plasma magnesium level for diabetic subject is normal with mean 0.719 and standard deviation 0.068. What is the area under this normal curve between 0.698 and 0.922?
Pr(−0.31 ≤ Z ≤ 2.99) = Pr(Z ≤ 2.99) − Pr(Z ≤ −0.31) = 0.9986 − 0.3783 = 0.6203
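The area can also be computed without tables, using the error function; the result differs slightly from the table-based 0.6203 because the z values are not rounded (a sketch):

```python
from math import erf, sqrt

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Pr(X <= x) for a normal distribution, via the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

# Reference interval from the non-diabetic controls
lo = 0.810 - 1.96 * 0.057
hi = 0.810 + 1.96 * 0.057

# Area of the diabetic distribution (mean 0.719, SD 0.068) inside it
inside = normal_cdf(hi, 0.719, 0.068) - normal_cdf(lo, 0.719, 0.068)
print(round(lo, 3), round(hi, 3), round(inside, 3))
```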
Diabetic population: n1 = 227, s1 = 0.068; SE1 = s1/√n1 = 0.068/√227 = 0.0045
Non-diabetic population: n2 = 140, s2 = 0.057; SE2 = s2/√n2 = 0.057/√140 = 0.0048
Only about 62% of diabetic patients will lie in the reference interval. What are the estimates of the population means of plasma magnesium for the diabetic and non-diabetic populations? The estimate of the population mean for diabetic subjects is 0.719 mmol/liter; the estimate for non-diabetic subjects is 0.810 mmol/liter.
Sample-to-sample variation in estimated mean for the population diabetic subjects is 0.0045 and for the control population it is 0.0048.
Find 95% confidence interval for the population mean for the control population.
Find the standard error of the difference in mean plasma magnesium between the diabetic and non-diabetic populations:
SE(x̄1 − x̄2) = √(SE1² + SE2²) = √(0.0045² + 0.0048²) = 0.0066

Find the 95% confidence interval for the population mean for the control population:
x̄2 = 0.810, SE2 = 0.0048
95% confidence interval: (x̄2 − 1.96·SE2, x̄2 + 1.96·SE2) = (0.810 − 1.96·0.0048, 0.810 + 1.96·0.0048) = (0.801, 0.819)
How does the confidence interval differ from the 95% reference interval? Why are they different?

Find the 95% confidence interval for the difference in the means between the diabetic and non-diabetic populations.

More than 95% confident that the difference in the population means is negative. That is, the mean magnesium level for diabetic subjects is smaller than the mean magnesium level for non-diabetic subjects.
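The standard errors and the interval for the difference can be verified numerically; the interval for the difference is computed explicitly here, since the slides only state its sign (a sketch):

```python
from math import sqrt

n1, xbar1, s1 = 227, 0.719, 0.068   # diabetic subjects
n2, xbar2, s2 = 140, 0.810, 0.057   # non-diabetic controls

se1 = s1 / sqrt(n1)
se2 = s2 / sqrt(n2)
se_diff = sqrt(se1**2 + se2**2)     # SE of the difference in means

diff = xbar1 - xbar2
ci = (diff - 1.96 * se_diff, diff + 1.96 * se_diff)
print(round(se_diff, 4), round(diff, 3), tuple(round(v, 3) for v in ci))
```

The whole interval lies below zero, which is the basis for the statement that the diabetic mean is smaller than the non-diabetic mean.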
Would plasma magnesium be a good diagnostic test for diabetes? The method discussed so far can be used to compare two population proportions. Note that a proportion is simply the average of 0s and 1s: the proportion is the mean of a binary variable. Example: A study was conducted to determine to what extent children with bronchitis in infancy get more respiratory symptoms in later life than others. 273 children who had bronchitis before age 5 (group 1) were compared to 1,046 children who did not (group 2). The outcome was whether or not these children coughed during the day or night at age 14. 26 of 273 reported coughing in group 1 and 44 of 1,046 reported coughing in group 2.
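For the bronchitis example, the same large-sample machinery applies with proportions in place of means (a sketch; the slides give only the counts):

```python
from math import sqrt

# Bronchitis example: coughing at age 14
x1, n1 = 26, 273     # had bronchitis before age 5
x2, n2 = 44, 1046    # did not

p1, p2 = x1 / n1, x2 / n2
diff = p1 - p2
# SE of the difference in two independent sample proportions
se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
ci = (diff - 1.96 * se, diff + 1.96 * se)
print(round(p1, 3), round(p2, 3), round(diff, 3),
      tuple(round(v, 3) for v in ci))
```

The interval excludes zero, suggesting a higher coughing rate among the children who had bronchitis in infancy.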
Data
When the sample size is large, the central limit theorem applies and the sample mean has a normal distribution regardless of the original distribution of the outcome variable. When the sample size is small, the distribution of the standardized sample mean is not normal even if the distribution of the outcome is normal, because the standard deviation must be estimated from the sample; an adjustment (the t distribution) is needed when the sample size is small. Example: Does increasing the amount of calcium in our diet reduce blood pressure? In a randomized experiment, 10 black men were given a calcium supplement for 12 weeks and 11 black men received a placebo that appeared identical. The experiment was double blind. The outcome was the change in blood pressure over the 12-week period.
Calcium group: n=10, mean=5 and standard deviation =8.743 Placebo group: n=11, mean=-0.273 and standard deviation=5.901 Suppose the population standard deviations in the two populations are the same
Two situations
SE(x̄1 − x̄2) = √(s_p²/n1 + s_p²/n2), where s_p² = [(n1 − 1)s1² + (n2 − 1)s2²]/(n1 + n2 − 2) is the pooled variance
What if the two population standard deviations are not the same?
SE(x̄1 − x̄2) = √(s1²/n1 + s2²/n2)

With the pooled SE: s_p = 7.385, SE = 7.385·√(1/10 + 1/11) = 3.227
95% confidence interval: 5.273 ± t·3.227, with degrees of freedom = 9 + 10 = 19 and t = 2.093:
(5.273 − 2.093·3.227, 5.273 + 2.093·3.227) = (−1.48, 12.03)

With the unpooled SE: √((8.743)²/10 + (5.901)²/11) = √(7.65 + 3.17) = 3.29
df = (s1²/n1 + s2²/n2)² / [ (s1²/n1)²/(n1 − 1) + (s2²/n2)²/(n2 − 1) ]
= [(8.743)²/10 + (5.901)²/11]² / [ ((8.743)²/10)²/9 + ((5.901)²/11)²/10 ]
= (7.64 + 3.16)² / (7.64²/9 + 3.16²/10) = 116.64 / (6.49 + 1.00) = 15.57
Given the considerable uncertainty, no change in the population mean difference between the calcium and placebo groups is plausible. Based on these data, we are confident that the mean difference is between −1.5 and 12.0.
Interpolating the t value between 15 and 16 degrees of freedom (df = 15.57):
t ≈ 2.131 + (2.120 − 2.131)·0.57 = 2.125
95% confidence interval: (5.273 − 2.125·3.29, 5.273 + 2.125·3.29) = (−1.72, 12.26)
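The pooled and unpooled standard errors and the Welch degrees of freedom can be reproduced as follows; the small difference from the slide's 15.57 comes from rounding of intermediate values (a sketch; the t multipliers themselves still come from tables):

```python
from math import sqrt

n1, m1, s1 = 10, 5.0, 8.743      # calcium group
n2, m2, s2 = 11, -0.273, 5.901   # placebo group
diff = m1 - m2                   # 5.273

# Pooled: assumes equal population standard deviations
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
se_pooled = sqrt(sp2) * sqrt(1 / n1 + 1 / n2)
df_pooled = n1 + n2 - 2

# Unpooled (Welch): no equal-variance assumption
v1, v2 = s1**2 / n1, s2**2 / n2
se_welch = sqrt(v1 + v2)
df_welch = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

print(round(se_pooled, 3), df_pooled, round(se_welch, 2), round(df_welch, 2))
```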
So far we have concentrated on estimation of population quantities such as proportions, means, and differences in two means or proportions. The sample is used to derive a single estimate (point estimate). Uncertainty is expressed through standard errors or confidence intervals (interval estimates). Point and interval estimates are very important quantities to communicate inference about the population or scientific phenomenon. Sometimes a decision to be made is explicitly tied to the inferential process.
Decisions
Based on the data in hand, the decisions are usually of the type Yes/No. Sometimes additional new information may be sought before making the decision, but ultimately the decision has to be made by choosing either Yes or No. An inferential technique that explicitly leads one to make a decision is called a Test of Statistical Hypothesis or Test of Statistical Significance. Use of this approach when no such explicit decision-making process is involved is questionable. Sometimes people tend to frame every inferential problem as a decision-making process so that they can use these techniques, which is also questionable. Note also the implicit adversarial nature of the problem.
FDA has to decide whether to approve or disapprove a drug A new intravenous procedure is touted to reduce infection rate. The hospital has to decide whether to implement the new procedure or stick with the current one.
A company that conducts coaching for an examination claims that its new method of learning results in higher score than any of its competitor. A competitor disputes this claim. Who is right?
Null hypothesis: the hypothesis against which one wants to find evidence; a candidate hypothesis for rejection. Usually the null hypothesis represents a null or no effect. Alternative hypothesis: the opposite of the null hypothesis; the hypothesis that is favored in light of evidence against the null hypothesis.
An experiment was conducted to assess the effect of pronethalol in the treatment of angina pectoris. A sample of 12 patients was observed for the number of pain attacks during a two-week period. They were then put on pronethalol for the next two weeks, and the number of pain attacks was assessed while on pronethalol. The claim is that pronethalol reduces the number of pain attacks. Data:
Example
Baseline: 71, 323, 8, 14, 23, 34, 79, 60, 2, 3, 17, 7 Pronethalol: 29, 348, 1, 7, 16, 25, 65, 41, 0, 0, 15, 2 Difference (baseline - pronethalol): 42, -25, 7, 7, 7, 9, 14, 19, 2, 3, 2, 5
Hypotheses

Let μ denote the expected (population mean) difference in the number of pain attacks.
Null hypothesis: H0: μ = 0
Alternative hypothesis: HA: μ ≠ 0 (this is called a two-sided hypothesis)
One-sided alternatives: (1) HA: μ > 0, (2) HA: μ < 0
How can we check whether the population standard deviations are the same?
General principle: Check whether the data is consistent with the null hypothesis.
It can be argued that equality of population standard deviations can never be empirically verified, especially if the sample size is small. Some therefore argue that one should always use the procedure that does not assume equality of population standard deviations.
We will answer the question by assessing how likely the observed data would have been if the null hypothesis were true. If the null hypothesis were true, then the differences between pronethalol and baseline values would on average be 0, with roughly half of them positive and the rest negative. That is, under the null hypothesis a negative sign occurs with probability 0.5. But only one negative value was observed.
A simple procedure
The above test is called the sign test. We performed a one-sided test because only small values of the number of negative signs were considered. Finding a large number of negative values, say 11 or 12, would also be evidence against the null hypothesis. The probability of obtaining 11 or 12 negatives is also 0.00317 (verify!). The two-sided p-value is 0.00317 + 0.00317 = 0.00634. How do we define extreme values that constitute evidence against the null hypothesis?
An even more extreme observation would be 0 negatives and 12 positives, which has probability 1/4096 = 0.00024.
Test statistic: Number of negative values P-value: The probability of observing the test statistic which is as or more extreme than the observed, if the null hypothesis were true
P-value=0.00293+0.00024=0.00317
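The exact sign-test calculation above can be checked with a short script; a minimal sketch using only the standard library (the function name is illustrative):

```python
from math import comb

def sign_test_p(n_neg, n, two_sided=True):
    """Exact sign-test p-value: lower-tail P(X <= n_neg) for X ~ Binomial(n, 1/2),
    doubled for the two-sided version."""
    tail = sum(comb(n, k) for k in range(n_neg + 1)) / 2 ** n
    return min(1.0, 2 * tail) if two_sided else tail

# Pronethalol example: 1 negative difference out of n = 12
print(round(sign_test_p(1, 12, two_sided=False), 5))  # one-sided p = 0.00317
print(round(sign_test_p(1, 12), 5))                   # two-sided p ≈ 0.00635
```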
Type 1 error: rejecting the null hypothesis when it is actually true. Type 2 error: failing to reject the null hypothesis when it is actually false (equivalently, accepting the null hypothesis when it is actually false). The chance of making a type 1 error is called the significance level and is denoted by α. The chance of making a type 2 error is denoted by β.
1 - β is called power. Power is the chance of rejecting the null hypothesis when the alternative is true.
Suppose we specify that the chance of making a type 1 error is 0.05. For a two-sided alternative, the extreme values are determined by choosing c and d so that the probability that the number of negative values is less than or equal to c, or greater than or equal to d, is 0.05. For a one-sided alternative, the extreme value is determined by choosing c so that the probability that the number of negative values is less than or equal to c is 0.05. Looking at the binomial table on page T-9 (last column, n = 12): if we choose c = 2 and d = 10, the probability of a type 1 error is 0.0193 + 0.0193 = 0.0386. It is not possible to choose c and d to achieve a significance level of exactly 0.05. For the one-sided hypothesis, choose c = 3 (this gives a level slightly larger than the specified significance level 0.05).
Objective is to control chances of making either types of errors Strategy: For a fixed significance level, we will define the extreme values.
Usually power is calculated instead of the chance of making a type 2 error. We need a specific alternative value taken to be the truth. Suppose that approximately 1/12 (taking p = 0.08) is indeed the true probability of a negative difference. The chance of rejecting the null hypothesis is then 0.3677 + 0.3837 + 0.1835 + 0.0000 + 0.0000 + 0.0000 = 0.9349. Try calculating power for the alternatives 0.05, 0.10, 0.20, etc. A power curve is a plot of power against the alternative values. The sign test considers only the sign of the difference and not its magnitude. Let us consider some alternatives.
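The power calculation above can be reproduced from binomial probabilities; a sketch assuming, as above, the two-sided rejection region X ≤ 2 or X ≥ 10 and true probability p = 0.08 of a negative difference:

```python
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def sign_test_power(n, p, c, d):
    """P(reject) = P(X <= c) + P(X >= d) when a negative difference has probability p."""
    return (sum(binom_pmf(k, n, p) for k in range(c + 1))
            + sum(binom_pmf(k, n, p) for k in range(d, n + 1)))

print(round(sign_test_power(12, 0.08, 2, 10), 4))  # ≈ 0.9348 (0.9349 with the slide's term rounding)
```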
Suppose that the differences can be assumed to be normally distributed. The estimate of the mean difference in the population is 7.7 and standard deviation is 15.1. The standard error is 4.4. If the null hypothesis is true then the sample mean should be distributed around 0. The extent to which the observed sample mean is different from 0 is the evidence against the null hypothesis. One way to measure the distance between the observed sample mean and the null hypothesis value is in terms of standard error unit.
t = (x̄ - 0) / SE(x̄)

The calculated value of the t-statistic is 7.7/4.4 = 1.75. What is the probability of observing this extreme or an even more extreme value of the test statistic under the null hypothesis?
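The one-sample (paired) t statistic for the pronethalol differences can be reproduced as follows; a sketch using the standard library:

```python
from math import sqrt
from statistics import mean, stdev

diffs = [42, -25, 7, 7, 7, 9, 14, 19, 2, 3, 2, 5]  # baseline minus pronethalol

m = mean(diffs)                        # about 7.7
se = stdev(diffs) / sqrt(len(diffs))   # about 4.4
t = m / se                             # about 1.76 (1.75 with the slide's rounding)
print(round(m, 1), round(se, 1), round(t, 2))
```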
Two-Sample Tests
Revisit the plasma magnesium example
The following data was collected in a study of plasma magnesium in diabetic patients. The diabetic subjects were all insulin dependent subjects attending a diabetic clinic over a 5 month period. The non-diabetic controls were mixture of blood donors and people attending day centers for elderly, to give wide age distribution. Plasma magnesium follows a Normal distribution very closely. The summary data is as follows:
Number of diabetic subjects=227 Mean plasma magnesium=0.719 Standard deviation =0.068 Number of non-diabetic controls=140 Mean plasma magnesium=0.810 Standard deviation=0.057
If the null hypothesis were true, then the statistic has a t-distribution with 11 degrees of freedom. [Figure: t-distribution with the observed values -1.75 and 1.75 marked.] For a fixed significance level, say 0.05, the value of the test statistic considered to be large is 2.201.
Are the means of plasma magnesium in the two populations (diabetic and non-diabetic) the same?
Diabetic: n1 = 227, x̄1 = 0.719, s1 = 0.068
Non-diabetic: n2 = 140, x̄2 = 0.810, s2 = 0.057
Mean for the diabetic population: μ1. Mean for the non-diabetic population: μ2. Null hypothesis: H0: μ1 = μ2. Alternative hypothesis: HA: μ1 ≠ μ2. x̄1 - x̄2 is an estimate of μ1 - μ2. If the null hypothesis were true, then x̄1 - x̄2 should be distributed around mean 0. The extent to which it is away from 0 is evidence against the null hypothesis. Test statistic:
t = [(x̄1 - x̄2) - (μ1 - μ2)] / SE(x̄1 - x̄2), which under H0 equals (x̄1 - x̄2) / SE(x̄1 - x̄2)
The sampling distribution is normal given the large sample sizes from each population. If the null hypothesis were true, 68% of samples would give a test statistic between -1 and 1, 90% between -1.64 and 1.64, and 95% between -1.96 and 1.96. What we have observed is very unlikely under the null hypothesis; therefore, the null hypothesis is suspect.
The observed value is t = (0.719 - 0.810)/SE = -13.78, far outside (-1.96, 1.96).
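The magnesium comparison uses only the summary statistics; a sketch of the two-sample computation (variable names are mine):

```python
from math import sqrt

n1, m1, s1 = 227, 0.719, 0.068   # diabetic
n2, m2, s2 = 140, 0.810, 0.057   # non-diabetic

se = sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
t = (m1 - m2) / se
print(round(se, 4), round(t, 1))  # SE ≈ 0.0066, t ≈ -13.8
```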
Small-sample example revisited. Example: Does increasing the amount of calcium in our diet reduce blood pressure? In a randomized experiment, 10 black men were given a calcium supplement for 12 weeks and 11 black men received a placebo that appeared identical. The experiment was double-blind. The outcome was the change in blood pressure over the 12-week period. Data: Calcium group: sample size = 10, mean = 5, standard deviation = 8.743. Placebo group: sample size = 11, mean = -0.273, standard deviation = 5.901. Population mean if everybody in the population were given the calcium supplement: μ1. Population mean if everybody in the population were given only the placebo: μ2. Null hypothesis: H0: μ1 = μ2. One-sided alternative hypothesis: HA: μ1 > μ2; a large positive mean difference x̄1 - x̄2 is evidence against the null hypothesis in favor of this alternative. Two-sided alternative hypothesis: HA: μ1 ≠ μ2; a large positive or negative mean difference is evidence against the null hypothesis in favor of this alternative.
SE(x̄1 - x̄2) = s_pooled × sqrt(1/10 + 1/11) = 7.385 × sqrt(1/10 + 1/11) = 3.227
Observed test statistic: t = 5.273/3.227 = 1.63. P-value (one-sided alternative): from Table D on page T-11, the area is between 0.05 and 0.10; computer software gives 0.0598.
SE(x̄1 - x̄2) = sqrt(s1²/n1 + s2²/n2)

Test statistic: t = (x̄1 - x̄2) / SE(x̄1 - x̄2)

df = (s1²/n1 + s2²/n2)² / [ (1/(n1 - 1))(s1²/n1)² + (1/(n2 - 1))(s2²/n2)² ]
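A sketch of the unequal-variance (Welch) standard error and degrees of freedom for the calcium example:

```python
from math import sqrt

n1, s1 = 10, 8.743   # calcium
n2, s2 = 11, 5.901   # placebo

v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
se = sqrt(v1 + v2)
df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
print(round(se, 2), round(df, 2))  # SE ≈ 3.29, df ≈ 15.59 (15.57 with the slide's rounding)
```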
Paired Design. An experimenter was interested in dieting and weight loss among men and women. It was believed that in the first two weeks of a standard dieting program, women would tend to lose more weight than men. As a check on this notion, a random sample of 10 husband-wife pairs was put on the same strenuous diet. Their weight losses after two weeks are shown in the table.
Pair   Husband    Wife
1      5.0 lbs    2.7 lbs
2      3.3        4.4
3      4.3        3.5
4      6.1        3.7
5      2.5        5.6
6      1.9        5.1
7      3.2        3.8
8      4.1        3.5
9      4.5        5.6
10     2.7        4.2
There are numerous aspects of shared environment (extraneous factors) that could affect the weight loss for husband-wife pairs. Assuming that these effects are additive, the difference between husband and wife in any given pair represents the gender effect. If, on the other hand, the extraneous effects are multiplicative, then the ratio represents the gender effect.
d̄ = Σ d_s / 10 = -0.45

s_d² = Σ (d_s - d̄)² / 9 = 3.88   (sums over s = 1, ..., 10)
Pair   Difference (d)
1       2.3
2      -1.1
3       0.8
4       2.4
5      -3.1
6      -3.2
7      -0.6
8       0.6
9      -1.1
10     -1.5
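The paired summary statistics can be verified directly; a sketch using the standard library:

```python
from statistics import mean, variance

d = [2.3, -1.1, 0.8, 2.4, -3.1, -3.2, -0.6, 0.6, -1.1, -1.5]  # husband - wife

print(round(mean(d), 2))      # -0.45
print(round(variance(d), 2))  # 3.88
```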
Some issues
Power Calculations
Consider the two-sample problem. Assume: significance level α = 0.05; equal variances; a two-sided alternative; sample sizes large enough to use the standard normal curve; the standard deviation σ known; and alternative δ = μ1 - μ2. The cut-off value of the test statistic is then 1.96. That is, if the value of the test statistic based on a data set falls outside the interval (-1.96, 1.96), the null hypothesis will be rejected.
P-value: The probability of observing outcomes (or test-statistics) that are more inconsistent with the null hypothesis relative to alternative hypothesis.
It depends upon the null hypothesis, alternative hypothesis and the teststatistic
Sometimes it is wrongly interpreted as the probability of the null hypothesis being true. Of course, one will never know whether any hypothesis is true or not; in fact, it is very unlikely that any null hypothesis is ever exactly true. Failure to reject the null hypothesis at a pre-specified significance level is to be used as guidance for acting as though the null hypothesis is true. One should choose a significance level small enough that failing to reject the null makes one comfortable acting as though the null hypothesis is true. 5% is one example and by no means the only number. If the sample size is very large, then the standard error will be small. The test statistic, being the ratio of the difference in means to its standard error, can then be large even when the mean difference has no practical or clinical consequence.
The question is how likely that we will reject the null hypothesis when actually some alternative is true. Pr(test-statistic > 1.96 or test-statistic < -1.96|Alternative is true)
Power = 1 - Φ(1.96 - δ / (σ sqrt(1/n1 + 1/n2))) + Φ(-1.96 - δ / (σ sqrt(1/n1 + 1/n2)))

where Φ denotes the standard normal cumulative distribution function.
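This power expression can be evaluated with the normal CDF built from math.erf; a sketch (the function names and the σ value used for illustration are mine; σ = 7.385 is the pooled SD from the calcium example):

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def power_two_sample(delta, sigma, n1, n2, z_alpha=1.96):
    """Power of the two-sided two-sample z-test at alternative delta = mu1 - mu2."""
    lam = delta / (sigma * sqrt(1 / n1 + 1 / n2))
    return 1 - phi(z_alpha - lam) + phi(-z_alpha - lam)

print(round(power_two_sample(0.0, 7.385, 30, 30), 3))  # 0.05: at delta = 0 power equals the significance level
print(round(power_two_sample(5.0, 7.385, 30, 30), 3))  # larger delta gives larger power
```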
If the sample size is sufficiently large any null hypothesis will be rejected!
Analysis of Variance
Suppose now that we want to compare more than 2 populations. One could do pair-wise comparisons, but this is cumbersome and not easy to summarize when the number of populations compared is large. The analysis of variance frames the question in terms of an in-depth investigation of the variation in the observed data. Analysis of variance partitions the overall variability into one or more assignable causes; what is left unassigned is called residual variability. Based on this partition, the relative merits of the assignable causes are investigated. Generally, the variation due to an assignable cause relative to the residual variability is used as a yardstick for judging the importance of that cause.
Power (%) of the two-sided test for selected per-group sample sizes n and alternatives δ:

n      δ = 2.5   δ = 5    δ = 7.5
30     25.6      74.4     97.5
40     32.7      85.6     99.4
50     39.3      92.2     99.9
60     45.6      95.9     100
70     51.5      97.9     100
100    66.6      99.8     100
The assignable causes can be carefully planned or manipulated through an experimental design The assignable causes are based on substantive reasoning in an observational study design Example:
Data
Active exercise: 9.00, 9.50, 9.75, 10.00, 13.00, 9.50
Passive exercise: 11.00, 10.00, 10.00, 11.75, 10.50, 15.00
No exercise:
Control:
A randomized study was conducted to test the generality of the observation that stimulation of the walking and placing reflexes in the newborn promotes increased walking and placing (Zelazo, Zelazo and Kolb (1972), Science, pages 314-315). A total of 29 one-week-old males were randomized to four groups. 1: Active exercise, 2: Passive exercise, 3: No exercise, and 4: 8-week control group. Age at which the infant walked alone (in months) was the outcome variable of interest. The assignable cause is the level of exercise. Is the variation caused by this assignable cause substantial?
y_ij = observation for subject j in group i, j = 1, 2, ..., n_i; i = 1, 2, ..., k
ȳ_++ = overall mean

Total variation = Σ_{i=1}^{k} Σ_{j=1}^{n_i} (y_ij - ȳ_++)²
ANOVA
An alternative Expression
Between Groups
y_ij = μ + α_i + ε_ij

where μ is the overall mean, α_i is the deviation of the group i mean from the overall mean, and ε_ij is the residual.
Df for Total SS = 22 (every observation is used, but the sum of deviations is zero)
Df for Within SS = 19 (every observation is used, but the sum of deviations within each group is zero)
Df for Between SS = 3 (four means are used, but the sum of their deviations from the overall mean is zero)
ANOVA Example
MS(Between) = 14.78 / 3 = 4.93
MS(Within) = 43.69 / 19 = 2.30
F = MS(Between) / MS(Within) = 2.14
Is 2.14 large?
To compare the sums of squares, the differences in degrees of freedom have to be taken into account. Mean square = Sum of squares / Degrees of freedom.
Use the F-distribution with (numerator df = 3, denominator df = 19) to determine how likely a value of 2.14 or larger is when in actuality there are no differences among the four groups. P-value: 0.1228.
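The F ratio from the sums-of-squares partition can be computed generically; a sketch (the function name is mine):

```python
def anova_f(groups):
    """One-way ANOVA F statistic: MS(Between) / MS(Within)."""
    all_obs = [y for g in groups for y in g]
    grand = sum(all_obs) / len(all_obs)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum((y - sum(g) / len(g)) ** 2 for g in groups for y in g)
    df_b, df_w = len(groups) - 1, len(all_obs) - len(groups)
    return (ss_between / df_b) / (ss_within / df_w)

# From the slide's sums of squares: F = (14.78/3) / (43.69/19) = 2.14
print(round((14.78 / 3) / (43.69 / 19), 2))
```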
Regression Analysis
Terminology
X = Independent variable: a variable that an investigator can change in an experiment; amenable to intervention in an observational study; or simply the variable whose impact is to be assessed. There can be more than one independent variable of interest. Other names: predictors, correlates, right-hand-side variables, exogenous variables.
Y = Dependent variable: the variable for which you want to assess the effect of X. Other names: outcome, endogenous variables, left-hand-side variables.
Causal relationship: if one changes the variable X by a certain amount, how much does the variable Y change? Association or correlational relationship: do subjects with different values of X also tend to have different values of Y? What is the nature of these relationships in the population? How do you quantify these relationships in the population? How do you estimate the quantities describing these population relationships? How accurate are those estimates? How much uncertainty is there in assessing these relationships?
The two sample tests and ANOVA also fit into this category
Are the population means related to the treatments assigned or to the observed grouping? We will later see that the two-sample t-tests and ANOVA F-tests are particular cases of the general regression framework.
Example
Scatter plot
Scatter plot: A graphical device to assess the type of relationship. Each point is a pair (X,Y) Dependent variable on the vertical axis Independent variable on the horizontal axis Inspection of the graph suggests a linear relationship
The following table gives data collected by a group of medical students in a physiology class. The objective is to assess the association between height and FEV1.

Height  FEV1    Height  FEV1    Height  FEV1
164.0   3.54    172.0   3.78    178.0   2.98
167.0   3.54    174.0   4.32    180.7   4.80
170.4   3.19    176.0   3.75    181.0   3.96
171.2   2.85    177.0   3.09    183.1   4.78
171.2   3.42    177.0   4.05    183.6   4.56
171.3   3.20    177.0   5.43    183.7   4.68
172.0   3.60    177.4   3.60
Linear relationship
Slope of the line through two points: (y1 - y2) / (x1 - x2)
Method of Least Squares: find a and b that minimize the residual sum of squares:

Σ_{i=1}^{n} e_i² = Σ_{i=1}^{n} (y_i - a - b·x_i)²

Representation: y = a + b·x (the line-value or expected value)

y_i = a + b·x_i + e_i, where e_i is the residual.

b = Σ (x_i - x̄)(y_i - ȳ) / Σ (x_i - x̄)²
a = ȳ - b·x̄
Simplified formulas

Slope: b = (Σ x_i y_i - Σ x_i Σ y_i / n) / (Σ x_i² - (Σ x_i)² / n)
Intercept: a = Σ y_i / n - b Σ x_i / n

Example: y = FEV1, x = Height. Needed quantities: Σ x_i, Σ y_i, Σ x_i y_i, Σ x_i².
Σ x_i = 3,507.6, Σ y_i = 77.12, Σ x_i y_i = 13,568.18, Σ x_i² = 615,739.24, n = 20.

b = (13,568.18 - 3,507.6 × 77.12 / 20) / (615,739.24 - (3,507.6)² / 20) = 0.074389
a = 77.12 / 20 - 0.074389 × 3,507.6 / 20 = -9.19

Prediction equation: FEV1 = -9.19 + 0.0744 × Height
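The least-squares fit can be reproduced from the summary sums; a sketch (variable names are mine, sums taken from the slide):

```python
n = 20
sx, sy = 3507.6, 77.12          # sum of x_i, sum of y_i
sxy, sxx = 13568.18, 615739.24  # sum of x_i*y_i, sum of x_i^2

b = (sxy - sx * sy / n) / (sxx - sx ** 2 / n)
a = sy / n - b * sx / n
print(round(b, 4), round(a, 2))  # 0.0744 and -9.19
```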
Interpretation
Interpretation (Contd.)
Slope
Intercept
b = expected difference in y for a unit positive difference in x. Consider two individuals: individual 1 with x = h and individual 2 with x = h + 1. Expected (line) value for individual 1: a + b·h. Expected (line) value for individual 2: a + b·(h + 1). Difference = b.
a = expected value of y when x = 0. It is not very interpretable in this particular problem: the value of FEV1 when height is 0!
Modification: Centering
y = c + d·(x - x̄)
Residuals: e_i = y_i - a - b·x_i

Residuals represent deviations from the expected (line) value. Large residuals reflect unreliability or uncertainty. One way to measure this uncertainty is through the variance of the residuals (or their standard deviation).

Computational formulas:

s_e² = Σ e_i² / (n - 2)
     = [ Σ (y_i - ȳ)² - b² Σ (x_i - x̄)² ] / (n - 2)
     = (n - 1)(s_y² - b² s_x²) / (n - 2)

where s_y = SD of the y's and s_x = SD of the x's.

Covariance: s_xy = Σ (x_i - x̄)(y_i - ȳ) / (n - 1), and note that b = s_xy / s_x².
Example: s_x = 5.51, s_y = 0.71

s_e² = (19/18) × (0.71² - 0.0744² × 5.51²) = 0.35

Residual variance from a horizontal line (ignoring x): s_y² = 0.50, degrees of freedom = 19. Residual variance from the regression line on x: s_e² = 0.35, degrees of freedom = 18.
R² = [ (n - 1) s_y² - (n - 2) s_e² ] / [ (n - 1) s_y² ]

Correlation coefficient: r = s_xy / (s_x s_y)

Another form of R²: R² = r²
Inference
How much the slope and intercept estimates vary from sample to sample? Standard error of the estimates
R-square is a simple measure of how much of the variability in y is explained by the variation in x. Large values of R-square indicate that substantial variation in y is due to variation in x; a small R-square indicates the opposite. This measure also has disadvantages, which we will discuss when we consider multiple predictors.

SE(b) = sqrt( s_e² / ((n - 1) s_x²) )
Confidence interval for the slope: b ± t_{0.025, n-2} × SE(b)
Suppose x = f, not necessarily one of the observed values in the data set. What would one expect y to be, on average?

ŷ_f = a + b·f
SE(ŷ_f) = sqrt( s_e² [ 1/n + (f - x̄)² / ((n - 1) s_x²) ] )

Example: f = 175.
ŷ_f = -9.19 + 0.0744 × 175 = 3.83
SE(ŷ_f) = sqrt( 0.35 × [ 1/20 + (175 - 175.38)² / (19 × 5.51²) ] ) = 0.133
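A sketch checking the fitted value and its standard error at f = 175 (using the fitted coefficients and summary statistics above):

```python
from math import sqrt

a, b = -9.19, 0.0744
n, xbar, sx, se2 = 20, 175.38, 5.51, 0.35

f = 175
y_f = a + b * f
se_yf = sqrt(se2 * (1 / n + (f - xbar) ** 2 / ((n - 1) * sx ** 2)))
print(round(y_f, 2), round(se_yf, 3))  # 3.83 and 0.133
```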
Prediction Interval
This refers to a confidence interval for a single observation on the outcome variable at a given value of the independent variable, x = f. Because a single observation also carries its own residual variation, the prediction standard error adds s_e² to the variance of the fitted line: SE_pred(y_f) = sqrt( s_e² [ 1 + 1/n + (f - x̄)² / ((n - 1) s_x²) ] ).
In the example considered so far, the independent variable was a continuous variable. Suppose now the independent variable is a binary coded as x = 0 or 1. Interpretation of regression coefficients:
Discrete Predictors
E(y | x) = a + b·x
E(y | x = 0) = a
E(y | x = 1) = a + b

a = mean for the reference group, defined as subjects with x = 0
b = difference in means between the two groups, x = 1 versus x = 0
t-test: with a binary x, testing H0: b = 0 is equivalent to the two-sample t-test.
Multiple Predictors
Often in practice several variables might influence the dependent variable. Some common examples
b = unadjusted estimate; d = estimate adjusted for C (usually d will be smaller than b, but it can be larger than b). C = confounding variable.
E(Y | X) = a + bX
E(Y | X, I) = c + dX + eI, with d ≠ 0
[Diagram: regression lines for the two groups, M = 0 (line-value Y0) and M = 1 (line-value Y1).]
The statistical effect of I or C on the regression coefficient of X will be the same; the conceptual understanding has to distinguish between confounding and intervening variables.
E(Y | X, M) = a + bX + cM + d·X·M
E(Y | X, M = 0) = a + bX
E(Y | X, M = 1) = (a + c) + (b + d)X
d = the extent to which the effect of X is modified by the presence of M (that is, M = 1).
d = 0, c arbitrary: parallel lines for the two groups M = 1 and M = 0.
d = 0, c = 0: coincident lines.
c = 0, d arbitrary: same intercept, with the lines for M = 0 and M = 1 fanning out.
d, c arbitrary: different intercepts and slopes.
So far we have concentrated on analyzing relationships between a continuous dependent variable and continuous or discrete (or categorical) independent variables.
Discrete dependent (Yes/No for a disease), continuous independent (Age, BMI etc)
Logistic regression
Regression ANOVA
Many times the dependent and independent variables are both discrete.
Number of events as dependent (Number of seizures among epileptic patients over a fixed or variable period of time)
Poisson Regression
Qualitative categories (Such as Gender, Race/Ethnicity, geographical location, type of health insurance) Quantitative or ordered categories
low, medium and high socioeconomic status none, very low, low, medium, high doses of environmental exposure
Continuous dependent but truncated (or censored). For example, failure time, time to death, time to symptoms. These may be known for some individuals and for others it is only known to exceed some known value.
Survival analysis
Example:
Such a table, based on classification of subjects on two or more variables, is called a contingency table or cross-classification. Each entry is a frequency: the number of individuals having those characteristics. If the 1443 births are a representative sample from the population, 50/1443 is the estimated population proportion of owner-occupier births that are also preterm. 50/1443 can be viewed as an estimated probability that a subject is an owner-occupier and has a preterm birth.
A study was conducted to assess the relationship between socio-economic position (as measured by housing tenure) and preterm delivery. A sample of 1443 births was chosen and the following table was created.
Housing Tenure        Preterm   Term    Total
Owner-occupier        50        849     899
Council tenant        29        229     258
Private tenant        11        164     175
Lives with parents    6         66      72
Other                 3         36      39
Total                 99        1344    1443
Similarly,
Suppose that A (housing tenure category) and B (preterm birth) are not associated or related. The observed estimate is 50/1443 and the expected proportion (under independence, or no association) is (899/1443) × (99/1443). Equivalently, the observed and expected frequencies are 50 and 1443 × (899/1443) × (99/1443) = 61.68. If the observed and expected frequencies are discrepant, then the hypothesis of no association is questionable. One way to assess the reasonableness of the hypothesis of no association is by measuring the distance between the expected and observed frequencies.
Observed frequencies:

Housing Tenure        Preterm   Term     Total
Owner-occupier        50        849      899
Council tenant        29        229      258
Private tenant        11        164      175
Lives with parents    6         66       72
Other                 3         36       39
Total                 99        1344     1443

Expected frequencies:

Housing Tenure        Preterm   Term     Total
Owner-occupier        61.7      837.3    899
Council tenant        17.7      240.3    258
Private tenant        12.0      163.0    175
Lives with parents    4.9       67.1     72
Other                 2.7       36.3     39
Total                 99        1344     1443
T has a chi-square distribution with df = (r - 1)(c - 1) degrees of freedom. Here r = 5, c = 2, so df = 4. The critical value at significance level 0.05 is 9.49. The data are not consistent with the hypothesis of no association between housing tenure and time of delivery; that is, there is good evidence of association between housing tenure and time of delivery. The chi-square statistic is not a measure of association: if we double the frequencies in each cell, the association remains unchanged but the chi-square doubles. Chi-square is a large-sample test and is questionable if any expected frequency is less than 5. Alternatives are:
r = Number of rows c = Number of columns Oij = Observed frequency in row i and column j Eij = Expected frequency in row i and column j
T = Σ_ij (O_ij - E_ij)² / E_ij
  = (50 - 61.7)²/61.7 + (849 - 837.3)²/837.3 + (29 - 17.7)²/17.7 + ... + (3 - 2.7)²/2.7 + (36 - 36.3)²/36.3 = 10.5
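The chi-square statistic for the housing-tenure table can be computed directly; a sketch (the function name is mine):

```python
def chi_square(observed):
    """Pearson chi-square for an r x c table of observed frequencies."""
    row_tot = [sum(row) for row in observed]
    col_tot = [sum(col) for col in zip(*observed)]
    n = sum(row_tot)
    t = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_tot[i] * col_tot[j] / n  # expected count under no association
            t += (o - e) ** 2 / e
    return t

table = [[50, 849], [29, 229], [11, 164], [6, 66], [3, 36]]
print(round(chi_square(table), 1))  # about 10.5, with df = (5-1)(2-1) = 4
```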
Yates Correction
T_Y = Σ (|O - E| - 0.5)² / E
Fisher's exact test. It is based on computing the probability of observing a particular contingency table, together with the tables that are more inconsistent with the hypothesis of no association. It is a complicated algorithm and is usually performed using a computer. Example: The following table is from a trial investigating the efficacy of streptomycin for the treatment of pulmonary tuberculosis. The data are for the subgroup of patients with an initial temperature of 100-100.9°F. The two variables are radiological assessment of the disease 6 months later and treatment.
Pooled table
Radiological assessment     Streptomycin   Control
Improvement                 13 (a)         5 (b)
Deterioration or death      2 (c)          12 (d)

Odds of improvement in the streptomycin group = 13/2 = a/c
Odds of improvement in the control group = 5/12 = b/d
Odds ratio = (13/2)/(5/12) = 15.6 = (a·d)/(b·c)
Confidence interval: (exp[log(OR) - z × SE(log(OR))], exp[log(OR) + z × SE(log(OR))])
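The odds ratio and its confidence interval can be sketched as follows, using the usual large-sample standard error of the log odds ratio, sqrt(1/a + 1/b + 1/c + 1/d):

```python
from math import exp, log, sqrt

a, b, c, d = 13, 5, 2, 12   # streptomycin improved, control improved, etc.

orat = (a * d) / (b * c)                 # 15.6
se_log = sqrt(1/a + 1/b + 1/c + 1/d)     # SE of log(OR)
lo = exp(log(orat) - 1.96 * se_log)
hi = exp(log(orat) + 1.96 * se_log)
print(round(orat, 1), round(lo, 2), round(hi, 1))  # 15.6, with a wide interval
```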
Analysis so far involved only two variables. What to do if we have more than two variables? For example, suppose we want to adjust for Age and other confounding variables while assessing association between treatment and outcome (or home ownership and time of delivery).
χ² = [f_yn - (f_yn + f_ny)/2]² / [(f_yn + f_ny)/2] + [f_ny - (f_yn + f_ny)/2]² / [(f_yn + f_ny)/2]
   = (f_yn - f_ny)² / (f_yn + f_ny) = 31.4, df = 1
The chi-square value is highly significant. The proportions at two ages are not the same.
Nonparametric Approaches
Most statistical approaches discussed so far assume some distribution for the population (mostly normal). Approaches such as the one- and two-sample t-tests and linear regression remain valid unless the departure from normality is very severe. Nevertheless, it is useful to have a set of techniques that can be applied without any distributional assumptions. The sign test discussed earlier in the course is an example of a nonparametric test. However, it can have low power because it uses only the signs and not the magnitudes. An alternative is to use the magnitudes in some way while still maintaining the nonparametric nature of the test. Rank-based procedures are quite popular.
Paired Designs
Pair   Difference (d)   Rank of |d| (r)
1       2.3              7
2      -1.1              4.5
3       0.8              3
4       2.4              8
5      -3.1              9
6      -3.2              10
7      -0.6              1.5
8       0.6              1.5
9      -1.1              4.5
10     -1.5              6
Wilcoxon signed rank procedure Step 1: Rank the absolute values of the differences Step 2: Take the difference in the sums of the ranks of the positive and negative differences
w = (7 + 3 + 8 + 1.5) - (4.5 + 9 + 10 + 1.5 + 4.5 + 6) = 19.5 - 35.5 = -16
E(w) = 0; Var(w) = n(n + 1)(2n + 1)/6 = 10 × 11 × 21/6 = 385
z = (w - E(w)) / sqrt(Var(w)) = -16 / sqrt(385) = -0.82
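The signed-rank arithmetic (with tied absolute differences sharing their average rank) can be checked with a short script; a sketch:

```python
from math import sqrt

d = [2.3, -1.1, 0.8, 2.4, -3.1, -3.2, -0.6, 0.6, -1.1, -1.5]

# Average ranks of the absolute differences (ties share the mean rank)
abs_sorted = sorted(abs(x) for x in d)
rank = {}
for v in set(abs_sorted):
    idx = [i + 1 for i, u in enumerate(abs_sorted) if u == v]
    rank[v] = sum(idx) / len(idx)

# w = (sum of ranks of positive d) - (sum of ranks of negative d)
w = sum(rank[abs(x)] if x > 0 else -rank[abs(x)] for x in d)
n = len(d)
var_w = n * (n + 1) * (2 * n + 1) / 6
print(w, round(w / sqrt(var_w), 2))
```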
Null hypothesis: the median of the distribution of differences is zero. Nonparametric procedures generally formulate hypotheses in terms of medians rather than means.
Take a sample of size n from population 1 and a sample of size m from population 2. Rank all (n + m) observations regardless of population. Sum the ranks of the subjects in sample 1 and call it T. Define U = T - n(n + 1)/2.
z = [U - mn/2] / sqrt(mn(m + n + 1)/12)
Alternatively, one can sum the ranks of subjects in sample 2 and then replace n by m
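The steps above can be sketched as a reusable function (names are mine), demonstrated on a tiny made-up example:

```python
from math import sqrt

def rank_sum_z(sample1, sample2):
    """Normal approximation to the Wilcoxon rank-sum / Mann-Whitney test."""
    pooled = sorted(sample1 + sample2)
    # Average ranks, with ties sharing the mean rank
    rank = {}
    for v in set(pooled):
        idx = [i + 1 for i, u in enumerate(pooled) if u == v]
        rank[v] = sum(idx) / len(idx)
    n, m = len(sample1), len(sample2)
    t = sum(rank[v] for v in sample1)      # rank sum for sample 1
    u = t - n * (n + 1) / 2
    return (u - m * n / 2) / sqrt(m * n * (m + n + 1) / 12)

print(round(rank_sum_z([1, 2], [3, 4]), 2))  # -1.55: sample 1 holds the smallest ranks
```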
Null hypothesis: the distributions in the two populations are the same. Example: The following data give biceps skinfold measurements for 20 patients with Crohn's disease and 9 patients with Coeliac disease. The objective is to assess whether the distributions of the biceps measurements are the same.

Crohn's disease: 1.8, 2.8, 4.2, 6.2, 2.2, 3.2, 4.4, 6.6, 2.4, 3.6, 4.8, 7.0, 2.5, 3.8, 5.6, 10.0, 2.8, 4.0, 6.0, 10.4

[Table of joint ranks; only partially recoverable: 3.8 → 14.5, 4.0 → 16, 4.2 → 17.5, 6.2 → 24, 6.6 → 25, 7.0 → 26, 7.6 → 27, 10.0 → 28, 10.6 → 29.]
Generalizations
What if you have more than two groups? Rank all the observations regardless of group and then perform a one-way analysis of variance on the ranks (the Kruskal-Wallis approach). The null hypothesis is that the distributions of the populations defined by the groups are the same. You can get ranks by using PROC RANK in SAS; see the handout for an example.