
Epidemiology Notes

Stats and clinical trials

PDF generated using the open source mwlib toolkit on Wed, 23 Jun 2010 02:43:06 UTC.

Epidemiological methods
Study design
Blind experiment
Randomized controlled trial
Statistical inference
Statistical hypothesis testing
Meta-analysis
Clinical trial

Article Sources and Contributors
Image Sources, Licenses and Contributors

Article Licenses
License

Epidemiological methods

The science of epidemiology has matured significantly since the times of Hippocrates and John Snow. The techniques for gathering and analyzing epidemiological data vary depending on the type of disease being monitored, but each study shares overarching similarities.

Outline of the Process of an Epidemiological Study[1]

1. Establish that a problem exists. Full epidemiological studies are expensive and laborious undertakings. Before any study is started, a case must be made for the importance of the research.
2. Confirm the homogeneity of the events. Any conclusions drawn from inhomogeneous cases will be suspect. All events or occurrences of the disease must be true cases of the disease.
3. Collect all the events. It is important to collect as much information as possible about each event in order to inspect a large number of possible risk factors. The events may be collected from varied methods of epidemiological study or from censuses or hospital records. The events can be characterized by incidence rates and prevalence rates.
4. Characterize the events as to epidemiological factors:
   1. Predisposing factors: non-environmental factors that increase the likelihood of getting a disease. Genetic history, age, and gender are examples.
   2. Enabling/disabling factors: factors relating to the environment that either increase or decrease the likelihood of disease. Exercise and a good diet are examples of disabling factors; a weakened immune system and poor nutrition are examples of enabling factors.
   3. Precipitating factors: the most important category, in that it identifies the source of exposure. It may be a germ, toxin, or gene.
   4. Reinforcing factors: factors that compound the likelihood of getting a disease. They may include repeated exposure or excessive environmental stresses.
5. Look for patterns and trends. Here one looks for similarities in the cases which may identify major risk factors for contracting the disease. Epidemic curves may be used to identify such risk factors.
6. Formulate a hypothesis. If a trend has been observed in the cases, the researcher may postulate as to the nature of the relationship between the potential disease-causing agent and the disease.
7. Test the hypothesis. Because epidemiological studies can rarely be conducted in a laboratory, the results are often polluted by uncontrollable variations in the cases, which makes them difficult to interpret. Two sets of criteria have evolved to assess the strength of the relationship between the disease-causing agent and the disease:
   - Koch's postulates were the first criteria developed for epidemiological relationships. Because they only work well for highly contagious bacteria and toxins, this method is largely out of favor.
   - The Bradford Hill criteria are the current standard for epidemiological relationships. A relationship may fill all, some, or none of the criteria and still be true.
8. Publish the results.

Epidemiologists are known for their use of rates. Each measure serves to characterize the disease, giving valuable information about the contagiousness, incubation period, duration, and mortality of the disease.

Measures of occurrence
1. Incidence measures
   1. Incidence rate, where cases included are defined using a case definition
   2. Hazard rate
   3. Cumulative incidence
2. Prevalence measures
   1. Point prevalence
   2. Period prevalence
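As a rough sketch of how the occurrence measures above are computed (all counts and populations below are invented for illustration):

```python
# Hypothetical sketch: basic measures of disease occurrence.
# All numbers are invented for illustration only.

def incidence_rate(new_cases, person_years):
    """New cases per unit of person-time at risk."""
    return new_cases / person_years

def cumulative_incidence(new_cases, population_at_risk):
    """Proportion of an initially disease-free population that
    develops the disease over a defined period."""
    return new_cases / population_at_risk

def point_prevalence(existing_cases, total_population):
    """Proportion of the population with the disease at one point in time."""
    return existing_cases / total_population

# Example: 15 new cases observed over 5,000 person-years of follow-up
print(incidence_rate(15, 5000))        # 0.003 cases per person-year
# 15 new cases among 1,000 initially disease-free people in one year
print(cumulative_incidence(15, 1000))  # 0.015
# 40 existing cases in a town of 10,000 on the survey day
print(point_prevalence(40, 10000))     # 0.004
```

Note the distinction the list draws: incidence measures count new cases over time, while prevalence measures count existing cases at (or over) a given time.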

Measures of association
1. Relative measures
   1. Risk ratio
   2. Rate ratio
   3. Odds ratio
   4. Hazard ratio
2. Absolute measures
   1. Absolute risk reduction
   2. Attributable risk
      1. Attributable risk in exposed
      2. Percent attributable risk
      3. Levin's attributable risk
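Several of the measures listed above can be computed directly from a 2x2 exposure-by-disease table; the sketch below uses invented cell counts:

```python
# Hypothetical sketch: association measures from a 2x2 table.
# Cell counts (a, b, c, d) are invented for illustration:
#                 disease   no disease
#   exposed          a           b
#   unexposed        c           d

def risk_ratio(a, b, c, d):
    """Risk in the exposed divided by risk in the unexposed."""
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    """Odds of disease in the exposed divided by odds in the unexposed."""
    return (a * d) / (b * c)

def attributable_risk(a, b, c, d):
    """Risk difference: excess risk in the exposed group."""
    return a / (a + b) - c / (c + d)

# Example: 20/100 exposed and 10/100 unexposed develop disease
print(risk_ratio(20, 80, 10, 90))         # 2.0
print(odds_ratio(20, 80, 10, 90))         # 2.25
print(attributable_risk(20, 80, 10, 90))  # 0.1
```

The example shows the familiar pattern that the odds ratio (2.25) exceeds the risk ratio (2.0) when the outcome is not rare.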

Other measures
1. Virulence and Infectivity
2. Mortality rate and Morbidity rate
3. Case fatality
4. Sensitivity (tests) and Specificity (tests)

See also
Study design
Epidemiology
OpenEpi
Epi Info


External links
Epidemiologic Inquiry [2] - online weblog for epidemiology researchers
Epidemiology Forum [3] - a discussion and forum community for epi analysis support and for fostering questions, debates, and collaborations in epidemiology
The Centre for Evidence Based Medicine [4] - at Oxford; maintains an online "Toolbox" of evidence-based medicine methods
Epimonitor [5] - has a comprehensive list of links to associations, agencies, bulletins, etc.
Epidemiology for the Uninitiated [6] - online text with easy explanations
North Carolina Center for Public Health Preparedness Training [7] - online training classes for epidemiology and related topics

[1] Austin, Donald F., and S. B. Werner. Epidemiology for the Health Sciences: A Primer on Epidemiologic Concepts and Their Uses. Springfield, Ill.: C. C. Thomas, 1974. Print.
[2] http://www.epidemiologic.org/
[3] http://www.epidemiologic.org/forum/
[4] http://www.cebm.net/toolbox.asp
[5] http://www.epimonitor.net/index.htm
[6] http://bmj.bmjjournals.com/epidem/epid.html
[7] http://www.sph.unc.edu/nccphp/training/


Study design
A number of different study designs are indicated below.

Treatment studies
Randomized controlled trial
Double-blind randomized trial
Single-blind randomized trial
Non-blind trial
Nonrandomized trial (quasi-experiment)
Interrupted time series design (measures on a sample or a series of samples from the same population are obtained several times before and after a manipulated event or a naturally occurring event) - considered a type of quasi-experiment


Observational studies
Cohort study
   Prospective cohort
   Retrospective cohort
Time series study
Case-control study
   Nested case-control study
Cross-sectional study
   Community survey (a type of cross-sectional study)
Ecological study

Important considerations
When choosing a study design, many factors must be taken into account. Different types of studies are subject to different types of bias. For example, recall bias is likely to occur in cross-sectional or case-control studies where subjects are asked to recall exposure to risk factors. Subjects with the relevant condition (e.g. breast cancer) may be more likely to recall the relevant exposures that they had undergone (e.g. hormone replacement therapy) than subjects who don't have the condition. The ecological fallacy may occur when conclusions about individuals are drawn from analyses conducted on grouped data. The nature of this type of analysis tends to overestimate the degree of association between variables.
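The ecological fallacy mentioned above can be demonstrated with a toy dataset; the numbers below are invented so that the group-level (ecological) correlation is perfect even though the individual-level association is much weaker:

```python
# Toy illustration of the ecological fallacy. All data are invented:
# within each region, exposure and outcome are unrelated, yet the two
# regions' *averages* line up perfectly.

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    return cov / (vx * vy) ** 0.5

# Individual exposure/outcome values in two hypothetical regions.
region_1 = ([0, 0, 1, 1], [1, 0, 1, 0])
region_2 = ([1, 1, 2, 2], [2, 1, 2, 1])

x_all = region_1[0] + region_2[0]
y_all = region_1[1] + region_2[1]
print(pearson(x_all, y_all))  # 0.5 at the individual level

means_x = [sum(r[0]) / 4 for r in (region_1, region_2)]
means_y = [sum(r[1]) / 4 for r in (region_1, region_2)]
print(pearson(means_x, means_y))  # 1.0 when only group means are analysed
```

An analyst seeing only the regional averages would report a perfect association, overstating the individual-level correlation of 0.5, exactly the overestimation the paragraph above describes.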

Seasonal studies
Conducting studies in seasonal indications (such as allergies, Seasonal Affective Disorder, influenza, and others) can complicate a trial, as patients must be enrolled quickly. Additionally, seasonal variations and weather patterns can affect a seasonal study.[1] [2]

Other terms
The term retrospective study is sometimes used as another term for a case-control study. This use of the term "retrospective study" is misleading, however, and should be avoided, because other research designs besides case-control studies are also retrospective in orientation.

Superiority trials are designed to demonstrate that one treatment is more effective than another. Non-inferiority trials are designed to demonstrate that a treatment is at least not appreciably worse than another. Equivalence trials are designed to demonstrate that one treatment is as effective as another.

When using "parallel groups", each patient receives one treatment; in a "crossover study", each patient receives several treatments. A longitudinal study follows research subjects over two or more points in time; by contrast, a cross-sectional study assesses research subjects at one point in time.


See also
Clinical trial
Conceptual framework
Design of experiments
Epidemiological methods
Experimental control
Hypothesis
Meta-analysis
Operationalization

External links
Epidemiologic Inquiry [2] - online weblog for epidemiology researchers
Epidemiology Forum [3] - an epidemiology discussion and forum community to foster debates and collaborations in epidemiology
Some aspects of study design [3] - Tufts University web site
Comparison of strength [4] - description of study designs from the National Cancer Institute
Political Science Research Design Handbook [5] - Truman State University website
Study Design Tutorial [6] - Cornell University College of Veterinary Medicine

[1] Yamin Khan and Sarah Tilly. "Flu, Season, Diseases Affect Trials" (http://appliedclinicaltrialsonline.findpharma.com/appliedclinicaltrials/Drug+Development/Flu-Season-Diseases-Affect-Trials/ArticleStandard/Article/detail/652128). Applied Clinical Trials Online. Retrieved 26 February 2010.
[2] Yamin Khan and Sarah Tilly. "Seasonality: The Clinical Trial Manager's Logistical Challenge" (http://www.pharm-olam.com/pdfs/POI-Seasonality.pdf). Pharm-Olam International (http://www.pharm-olam.com/). Retrieved 26 April 2010.
[3] http://www.jerrydallal.com/LHSP/STUDY.HTM
[4] http://imsdd.meb.uni-bonn.de/cancernet/902570.html
[5] http://www2.truman.edu/polisci/design.htm
[6] http://www.vet.cornell.edu/imaging/tutorial/index.html

Blind experiment

A blind or blinded experiment is a scientific experiment in which some of the persons involved are prevented from knowing certain information that might lead to conscious or unconscious bias on their part, invalidating the results. For example, when asking consumers to compare the tastes of different brands of a product, the brands' identities should be concealed; otherwise, consumers will generally tend to prefer the brand they are familiar with. Similarly, when evaluating the effectiveness of a medical drug, both the patients and the doctors who administer the drug may be kept in the dark about the dosage being applied in each case, to forestall any chance of a placebo effect, observer bias, or conscious deception.

Blinding can be imposed on researchers, technicians, subjects, funders, or any combination of them. The opposite of a blind trial is an open trial.

Blind experiments are an important tool of the scientific method in many fields of research, from medicine, forensics, psychology, and the social sciences to basic sciences such as physics and biology, and to market research. In some disciplines, such as drug testing, blind experiments are considered essential.

The terms blind (adjective) and to blind (transitive verb), when used in this sense, are figurative extensions of the literal idea of blindfolding someone. The terms masked or to mask may be used for the same concept; this is commonly the case in ophthalmology, where the word "blind" is often used in the literal sense.

One of the earliest suggestions that a blinded approach to experiments would be valuable came from Claude Bernard, who recommended that any scientific experiment be split between the theorist who conceives the experiment and a naive (and preferably uneducated) observer who registers the results without foreknowledge of the theory or hypothesis being tested. This suggestion contrasted starkly with the prevalent Enlightenment-era attitude that scientific observation can only be objectively valid when undertaken by a well-educated, informed scientist.[1]

Single-blind trials
Single-blind describes experiments in which information that could introduce bias or otherwise skew the results is withheld from the participants, but the experimenter is in full possession of the facts. In a single-blind experiment, the individual subjects do not know whether they are "test" subjects or members of an "experimental control" group. Single-blind experimental design is used where the experimenters either must know the full facts (for example, when comparing sham to real surgery) and so cannot themselves be blind, or where the experimenters will not introduce further bias and so need not be blind. However, there is a risk that subjects are influenced by interaction with the researchers, known as experimenter's bias. Single-blind trials are especially risky in psychology and social science research, where the experimenter has an expectation of what the outcome should be and may consciously or subconsciously influence the behavior of the subjects.

A classic example of a single-blind test is the "Pepsi challenge." A marketing person prepares several cups of cola labeled "A" and "B". One set of cups is filled with Pepsi, the other with Coca-Cola. The marketing person knows which soda is in which cup but is not supposed to reveal that information to the subjects. Volunteer subjects are encouraged to try the two cups of soda and are polled on which they prefer. The problem with a single-blind test like this is that the marketing person can give (intentionally or not) subconscious cues which bias the volunteer. In addition, it is possible that the marketing person could prepare the two sodas differently (more ice in one cup, pushing one cup in front of the volunteer, etc.), which can cause bias. If the marketing person is employed by the company staging the challenge, there is also the possibility of a conflict of interest, since the marketing person is aware that future income will be based on the results of the test.


Double-blind trials
Double-blind describes an especially stringent way of conducting an experiment, usually on human subjects, in an attempt to eliminate subjective bias on the part of both experimental subjects and the experimenters. In most cases, double-blind experiments are held to achieve a higher standard of scientific rigor. In a double-blind experiment, neither the individuals nor the researchers know who belongs to the control group and who belongs to the experimental group. Only after all the data have been recorded (and in some cases, analyzed) do the researchers learn which individuals are which. Performing an experiment in double-blind fashion is a way to lessen the influence of prejudices and unintentional physical cues on the results (the placebo effect, observer bias, and experimenter's bias).

Random assignment of subjects to the experimental and control groups is a critical part of double-blind research design. The key that identifies the subjects and which group they belong to is kept by a third party and not given to the researchers until the study is over.

Double-blind methods can be applied to any experimental situation where there is the possibility that the results will be affected by conscious or unconscious bias on the part of the experimenter. Computer-controlled experiments are sometimes also erroneously referred to as double-blind experiments, since software may not cause the type of direct bias that arises between researcher and subject. Development of surveys presented to subjects through computers shows that bias can easily be built into the process, and voting systems are examples where bias can easily be constructed into an apparently simple machine-based system. In analogy to the human researcher described above, the part of the software that provides interaction with the human is presented to the subject as the blinded researcher, while the part of the software that defines the key is the third party.
An example is the ABX test, where the human subject has to identify an unknown stimulus X as being either A or B.
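The ABX setup just described can be sketched as a toy simulation; the stimulus labels, guessing strategies, and random seed below are all invented for illustration:

```python
import random

# Minimal ABX sketch: the subject is given A, B, and an unknown X
# (secretly a copy of A or B) and must say which one X is.

def abx_trial(stimulus_a, stimulus_b, guess_fn, rng):
    """Run one ABX trial; returns True if the guess was correct."""
    answer = rng.choice(["A", "B"])                 # third party secretly picks X
    x = stimulus_a if answer == "A" else stimulus_b
    return guess_fn(stimulus_a, stimulus_b, x) == answer

# A subject who genuinely cannot tell A from B is reduced to guessing,
# so over many trials they should be right about half the time.
rng = random.Random(42)
random_guesser = lambda a, b, x: rng.choice(["A", "B"])
hits = sum(abx_trial("cable_1", "cable_2", random_guesser, rng)
           for _ in range(10000))
print(hits / 10000)  # close to 0.5
```

A subject who can truly distinguish the stimuli (a guess function that matches X against A) would score near 100%, which is what makes the chance-level score above meaningful.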

In medicine
Double-blinding is relatively easy to achieve in drug studies by formulating the investigational drug and the control (either a placebo or an established drug) to have identical appearance (color, taste, etc.). Patients are randomly assigned to the control or experimental group and given random numbers by a study coordinator, who also encodes the drugs with matching random numbers. Neither the patients nor the researchers monitoring the outcome know which patient is receiving which treatment until the study is over and the random code is broken.

Effective blinding can be difficult to achieve where the treatment is notably effective (indeed, studies have been suspended in cases where the tested drug combinations were so effective that it was deemed unethical to continue withholding the findings from the control group, and the general population),[2] [3] or where the treatment is very distinctive in taste or has unusual side effects that allow the researcher and/or the subject to guess which group they were assigned to. It is also difficult to use the double-blind method to compare surgical and non-surgical interventions (although sham surgery, involving a simple incision, might be ethically permitted). A good clinical protocol will foresee these potential problems to ensure blinding is as effective as possible. It has also been argued[4] that even in a double-blind experiment, general attitudes of the experimenter, such as skepticism or enthusiasm towards the tested procedure, can be subconsciously transferred to the test subjects.

Evidence-based medicine practitioners prefer blinded randomized controlled trials (RCTs) where that is a possible experimental design. These are high on the hierarchy of evidence; only a meta-analysis of several well-designed RCTs is considered more reliable.
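The coordinator's coding step described above (random assignment plus a key held apart from the researchers) might be sketched as follows; patient IDs, group labels, and the seed are invented for illustration:

```python
import random

# Hypothetical sketch of a study coordinator's blinded coding:
# patients are shuffled, half assigned to drug and half to placebo,
# and the resulting key is held by a third party until unblinding.

def assign_and_encode(patient_ids, rng):
    """Return a sealed key mapping patient ID -> treatment arm."""
    ids = list(patient_ids)
    rng.shuffle(ids)                 # random order determines assignment
    half = len(ids) // 2
    return {pid: ("drug" if i < half else "placebo")
            for i, pid in enumerate(ids)}

rng = random.Random(7)
key = assign_and_encode(["P01", "P02", "P03", "P04"], rng)
# Researchers and patients see only the study IDs; the key stays sealed
# until the study is over and the code is broken.
print(sorted(key.values()))  # ['drug', 'drug', 'placebo', 'placebo']
```

The essential property is that nothing visible to the researchers (the patient IDs) carries any information about the arm; only the sealed key does.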


In physics
Modern nuclear physics and particle physics experiments often involve large numbers of data analysts working together to extract quantitative data from complex datasets. In particular, the analysts want to report accurate systematic error estimates for all of their measurements; this is difficult or impossible if one of the errors is observer bias. To remove this bias, the experimenters devise blind analysis techniques, where the experimental result is hidden from the analysts until they have agreed (based on properties of the data set other than the final value) that the analysis techniques are fixed.

One example of a blind analysis occurs in neutrino experiments, like the Sudbury Neutrino Observatory (SNO), where the experimenters wish to report the total number N of neutrinos seen. The experimenters have preexisting expectations about what this number should be, and these expectations must not be allowed to bias the analysis. Therefore, the experimenters are allowed to see only an unknown fraction f of the dataset. They use these data to understand the backgrounds, signal-detection efficiencies, detector resolutions, etc. However, since no one knows the blinding fraction f, no one has preexisting expectations about the meaningless neutrino count N' = N x f in the visible data; therefore, the analysis does not introduce any bias into the final number N which is reported.

Another blinding scheme is used in B meson analyses in experiments like BaBar and CDF; here, the crucial experimental parameter is a correlation between certain particle energies and decay times (which require an extremely complex and painstaking analysis) and particle charge signs, which are fairly trivial to measure. Analysts are allowed to work with all of the energy and decay data, but are forbidden from seeing the sign of the charge, and thus are unable to see the correlation (if any). At the end of the experiment, the correct charge signs are revealed; the analysis software is run once (with no subjective human intervention), and the resulting numbers are published.

Searches for rare events, like electron neutrinos in MiniBooNE or proton decay in Super-Kamiokande, require a different class of blinding schemes. The "hidden" part of the experiment (the fraction f for SNO, the charge-sign database for CDF) is usually called the "blindness box". At the end of the analysis period, one is allowed to "unblind the data" and "open the box".
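The hidden-fraction scheme can be illustrated with a toy simulation; the event list, the value of f, and the seed below are all invented, and real experiments hide f far more carefully than a script variable:

```python
import random

# Toy sketch of the "blinding fraction" idea: analysts see only an
# unknown fraction f of events, so the visible count N' = N x f carries
# no information about the true total N until f is revealed.

def blind_sample(events, f, rng):
    """Third party exposes each event independently with probability f."""
    return [e for e in events if rng.random() < f]

rng = random.Random(0)
true_events = list(range(1000))   # true N = 1000 (invented)
f = 0.37                          # hidden from the analysts
visible = blind_sample(true_events, f, rng)
n_prime = len(visible)            # meaningless to the analysts until f is known

# After unblinding, the total is estimated by scaling back up: N ~ N' / f.
print(round(n_prime / f))
```

Because the analysts tune cuts and calibrations against `visible` alone, their expectations about N cannot leak into the procedure; the scaling happens only once, at unblinding.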

In forensics
In a police photo lineup, an officer shows a group of photos to a witness or crime victim and asks him or her to pick out the suspect. This is basically a single-blind test of the witness' memory, and may be subject to subtle or overt influence by the officer. There is a growing movement in law enforcement to move to a double blind procedure in which the officer who shows the photos to the witness does not know which photo is of the suspect.[5] [6]

External links
The Skeptics' Dictionary: control group study [7] - more on why double-blind is important
PharmaSchool JargonBuster Clinical Trial Terminology Dictionary [8]


[1] Daston, Lorraine. "Scientific Error and the Ethos of Belief". Social Research 72 (Spring 2005): 18.
[2] "Male circumcision 'cuts' HIV risk" (http://news.bbc.co.uk/2/hi/health/6176209.stm). BBC News. 2006-12-13. Retrieved 2009-05-18.
[3] McNeil Jr, Donald G. (2006-12-13). "Circumcision Reduces Risk of AIDS, Study Finds" (http://www.nytimes.com/2006/12/13/health/13cnd-hiv.html?pagewanted=print). The New York Times. Retrieved 2009-05-18.
[4] "Skeptical Comment About Double-Blind Trials". The Journal of Alternative and Complementary Medicine (http://www.liebertonline.com/doi/abs/10.1089/acm.2009.0515). Retrieved 2010-05-04.
[5] "Psychological sleuths: Accuracy and the accused" (http://www.apa.org/monitor/julaug04/accuracy.html).
[6] "Under the Microscope: For more than 90 years, forensic science has been a cornerstone of criminal law. Critics and judges now ask whether it can be trusted." (http://www.legalaffairs.org/issues/July-August-2002/review_koerner_julaug2002.msp)
[7] http://skepdic.com/control.html
[8] http://www.pharmaschool.co.uk/

Randomized controlled trial

A randomized controlled trial (RCT) is a type of scientific experiment most commonly used in testing the efficacy or effectiveness of healthcare services (such as medicine or nursing) or health technologies (such as pharmaceuticals, medical devices or surgery). RCTs involve the random allocation of different interventions (treatments or conditions) to subjects. The most important advantage of proper randomization is that "it eliminates selection bias, balancing both known and unknown prognostic factors, in the assignment of treatments."[2]

[Figure: Flowchart of the four phases (enrollment, intervention allocation, follow-up, and data analysis) of a parallel randomized trial of two groups, modified from the CONSORT (Consolidated Standards of Reporting Trials) 2010 Statement.[1]]

The terms "RCT" and "randomized trial" are often used synonymously, but some authors distinguish between "RCTs", which compare treatment groups with control groups not receiving treatment (as in a placebo-controlled study), and "randomized trials", which can compare multiple treatment groups with each other.[3] RCTs are sometimes known as randomized control trials.[4] RCTs are also called randomized clinical trials or randomized controlled clinical trials when they concern clinical research;[5] [6] [7] however, RCTs are also employed in other research areas such as criminology, education, and international development.



It is claimed that the first published RCT was a 1948 paper entitled "Streptomycin treatment of pulmonary tuberculosis. A Medical Research Council investigation."[8] [9] [10] One of the authors of that paper was Austin Bradford Hill, who is credited as having conceived the modern RCT.[11] By the late 20th century, RCTs had become the "gold standard" for "rational therapeutics" in medicine.[12] As of 2004, more than 150,000 RCTs were in the Cochrane Library.[11] To improve the reporting of RCTs in the medical literature, an international group of scientists and editors published Consolidated Standards of Reporting Trials (CONSORT) Statements in 1996, 2001, and 2010 which have become widely accepted.[1] [2]

Although the principle of clinical equipoise ("genuine uncertainty within the expert medical community... about the preferred treatment") common to clinical trials[13] has been applied to RCTs, the ethics of RCTs have special considerations. For one, it has been argued that equipoise itself is insufficient to justify RCTs.[14] For another, "collective equipoise" can conflict with a lack of personal equipoise (e.g., a personal belief that an intervention is effective).[15] Finally, Zelen's design, which has been used for some RCTs, randomizes subjects before they provide informed consent, which may be ethical for RCTs of screening and selected therapies, but is likely unethical "for most therapeutic trials."[16] [17]

Classifications of RCTs
By study design
One way to classify RCTs is by study design. From most to least common in the medical literature, the major categories of RCT study designs are:[18]

Parallel-group - each participant is randomly assigned to a group, and all participants in the group receive (or do not receive) an intervention
Crossover - over time, each participant receives (or does not receive) an intervention in a random sequence
Split-body - separate parts of the body of each participant (e.g., the left and right sides of the face) are randomized to receive (or not receive) an intervention
Cluster - pre-existing groups of participants (e.g., villages, schools) are randomly selected to receive (or not receive) an intervention
Factorial - each participant is randomly assigned to a group that receives a particular combination of interventions or non-interventions (e.g., group 1 receives vitamin X and vitamin Y, group 2 receives vitamin X and placebo Y, group 3 receives placebo X and vitamin Y, and group 4 receives placebo X and placebo Y)

An analysis of the 616 RCTs indexed in PubMed during December 2006 found that 78% were parallel-group trials, 16% were crossover, 2% were split-body, 2% were cluster, and 2% were factorial.[18]
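The factorial example above (vitamin X/placebo X crossed with vitamin Y/placebo Y) can be sketched as a simple random assignment to the four combinations; subject names and the seed are invented for illustration:

```python
import itertools
import random

# Sketch of a 2x2 factorial assignment: each participant is randomized
# to one of the four combinations of the two interventions, following
# the hypothetical vitamin X / vitamin Y example in the text.

groups = list(itertools.product(["vitamin X", "placebo X"],
                                ["vitamin Y", "placebo Y"]))

def assign(participants, rng):
    """Independently randomize each participant to one of the 4 combinations."""
    return {p: rng.choice(groups) for p in participants}

rng = random.Random(1)
allocation = assign([f"subject_{i}" for i in range(8)], rng)
for subject, combo in sorted(allocation.items()):
    print(subject, combo)
```

One appeal of the factorial layout is that the same participants answer two questions at once: comparing everyone on vitamin X versus placebo X, and separately everyone on vitamin Y versus placebo Y.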



By outcome of interest (efficacy vs. effectiveness)

RCTs can be classified as "explanatory" or "pragmatic."[19] Explanatory RCTs test efficacy in a research setting with highly selected participants and under highly controlled conditions.[19] In contrast, pragmatic RCTs test effectiveness in everyday practice with relatively unselected participants and under flexible conditions; in this way, pragmatic RCTs can "inform decisions about practice."[19]

By hypothesis (superiority vs. noninferiority vs. equivalence)

Another classification of RCTs categorizes them as "superiority trials," "noninferiority trials," and "equivalence trials," which differ in methodology and reporting.[20] Most RCTs are superiority trials, in which one intervention is hypothesized to be superior to another in a statistically significant way.[20] Some RCTs are noninferiority trials "to determine whether a new treatment is no worse than a reference treatment."[20] Other RCTs are equivalence trials in which the hypothesis is that two interventions are indistinguishable from each other.[20]
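As a rough illustration of the noninferiority logic described above, the sketch below checks whether the lower bound of a normal-approximation 95% confidence interval for a difference of two proportions clears a margin. The margin of -0.10 and all counts are invented; real trials use prespecified margins and more careful statistical methods:

```python
import math

# Hedged sketch of a noninferiority check on two success proportions,
# using a normal-approximation 95% CI for the risk difference
# (new treatment minus reference). Margin and counts are invented.

def noninferior(success_new, n_new, success_ref, n_ref, margin=-0.10):
    p_new, p_ref = success_new / n_new, success_ref / n_ref
    diff = p_new - p_ref
    se = math.sqrt(p_new * (1 - p_new) / n_new + p_ref * (1 - p_ref) / n_ref)
    lower = diff - 1.96 * se
    # Noninferiority is concluded only if even the lower CI bound
    # stays above the prespecified margin.
    return lower > margin

# 390/500 vs 400/500: the new treatment is slightly worse (78% vs 80%),
# but the whole CI for the difference stays above -0.10.
print(noninferior(390, 500, 400, 500))  # True
# With only 100 per arm the CI is too wide to rule out the margin.
print(noninferior(78, 100, 80, 100))    # False
```

The two calls show why noninferiority is a statement about precision as much as about the point estimate: the same 2% deficit passes with 500 patients per arm but not with 100.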

The advantages of proper randomization in RCTs include:[21]

"It eliminates bias in treatment assignment," specifically selection bias and confounding.
"It facilitates blinding (masking) of the identity of treatments from investigators, participants, and assessors."
"It permits the use of probability theory to express the likelihood that any difference in outcome between treatment groups merely indicates chance."

There are two processes involved in randomizing patients to different interventions. First is choosing a randomization procedure to generate an unpredictable sequence of allocations; this may be a simple random assignment of patients to any of the groups at equal probabilities, may be "restricted," or may be "adaptive." A second and more practical issue is allocation concealment, which refers to the stringent precautions taken to ensure that the group assignment of patients is not revealed prior to definitively allocating them to their respective groups. Non-random "systematic" methods of group assignment, such as alternating subjects between one group and the other, can cause "limitless contamination possibilities" and a breach of allocation concealment.[21]

Randomization procedures
An ideal randomization procedure would achieve the following goals:[22]

Equal group sizes, for adequate statistical power, especially in subgroup analyses.
Low selection bias; that is, the procedure should not allow an investigator to predict the next subject's group assignment by examining which group has been assigned the fewest subjects up to that point.
Low probability of confounding (i.e., a low probability of "accidental bias"),[21] [23] which implies a balance in covariates across groups. If the randomization procedure causes an imbalance in covariates related to the outcome across groups, estimates of effect may be biased if not adjusted for the covariates (which may be unmeasured and therefore impossible to adjust for).

However, no single randomization procedure meets those goals in every circumstance, so researchers must select a procedure for a given study based on its advantages and disadvantages.

Simple randomization

This is a commonly used and intuitive procedure, similar to "repeated fair coin-tossing."[21] Also known as "complete" or "unrestricted" randomization, it is robust against both selection and accidental biases. However, its main drawback is the possibility of imbalanced group sizes in small RCTs. It is therefore recommended only for RCTs with over 200 subjects.[24]

Restricted randomization

To balance group sizes in smaller RCTs, some form of "restricted" randomization is recommended.[24] The major types of restricted randomization used in RCTs are:

Permuted-block randomization or blocked randomization: a "block size" and "allocation ratio" (number of subjects in one group versus the other group) are specified, and subjects are allocated randomly within each block.[21] For example, a block size of 6 and an allocation ratio of 2:1 would lead to random assignment of 4 subjects to one group and 2 to the other. This type of randomization can be combined with "stratified randomization," for example by center in a multicenter trial, to "ensure good balance of participant characteristics in each group."[2] A special case of permuted-block randomization is random allocation, in which the entire sample is treated as one block.[21] The major disadvantage of permuted-block randomization is that even if the block sizes are large and randomly varied, the procedure can lead to selection bias.[22] Another disadvantage is that "proper" analysis of data from permuted-block-randomized RCTs requires stratification by blocks.[24]
Adaptive biased-coin randomization methods (of which urn randomization is the most widely known type): in these relatively uncommon methods, the probability of being assigned to a group increases if the group is under-represented and decreases if the group is over-represented.[21] The methods are thought to be less affected by selection bias than permuted-block randomization.[24]

Adaptive randomization

At least two types of "adaptive" randomization procedures have been used in RCTs, but much less frequently than simple or restricted randomization:

Covariate-adaptive randomization, of which one type is minimization: the probability of being assigned to a group varies in order to minimize "covariate imbalance."[24] Minimization is reported to have "supporters and detractors";[21] because only the first subject's group assignment is truly chosen at random, the method does not necessarily eliminate bias on unknown factors.[2]
Response-adaptive randomization, also known as outcome-adaptive randomization: the probability of being assigned to a group increases if the responses of the prior patients in the group were favorable.[24] Although arguments have been made that this approach is more ethical than other types of randomization when the probability that a treatment is effective or ineffective increases during the course of an RCT, ethicists have not yet studied the approach in detail.[25]
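The permuted-block procedure described above (block size 6, 2:1 allocation ratio) can be sketched in a few lines; the group labels and the seed are invented for illustration:

```python
import random

# Sketch of permuted-block randomization: with a block size of 6 and a
# 2:1 allocation ratio, each block contains 4 "A" and 2 "B" assignments
# in a random order, keeping group sizes balanced as the trial fills.

def permuted_block_sequence(n_blocks, rng,
                            block=("A", "A", "A", "A", "B", "B")):
    """Concatenate n_blocks independently shuffled copies of the block."""
    sequence = []
    for _ in range(n_blocks):
        b = list(block)
        rng.shuffle(b)          # random permutation within each block
        sequence.extend(b)
    return sequence

rng = random.Random(2024)
seq = permuted_block_sequence(3, rng)
print(seq.count("A"), seq.count("B"))  # 12 6: the 2:1 ratio holds exactly
```

Note how the balance guarantee is structural: after every complete block of 6, the running allocation is exactly 2:1, which is precisely what simple randomization cannot promise in a small trial.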


Allocation concealment
"Allocation concealment" (defined as "the procedure for protecting the randomisation process so that the treatment to be allocated is not known before the patient is entered into the study") is considered desirable in RCTs.[26] In practice, in taking care of individual patients, clinical investigators in RCTs often find it difficult to maintain impartiality. Stories abound of investigators holding up sealed envelopes to lights or ransacking offices to determine group assignments in order to dictate the assignment of their next patient.[21] Such practices introduce selection bias and confounders (both of which should have minimized by randomization), thereby possibly distorting the results of the study.[21] Some standard methods of ensuring allocation concealment include sequentially-numbered, opaque, sealed envelopes (SNOSE); sequentially-numbered containers; pharmacy controlled randomization; and central

Randomized controlled trial randomization.[21] It is recommended that allocation concealment methods be included in an RCT's protocol, and that the allocation concealment methods should be reported in detail in a publication of an RCT's results; however, 2005 study determined that most RCTs have unclear allocation concealment in their protocols, in their publications, or both.[27] On the other hand, a 2008 study of 146 meta-analyses concluded that the results of RCTs with inadequate or unclear allocation concealment tended to be biased toward beneficial effects only if the RCTs' outcomes were subjective as opposed to objective.[28]
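As a toy illustration of why central randomization conceals allocation, the sketch below (class and method names are hypothetical, not from the article) keeps the allocation list with a central service and reveals an assignment only after a patient's enrolment has been irrevocably logged:

```python
import random

class CentralRandomizer:
    """Toy model of central randomization for allocation concealment.

    The allocation sequence is held by a central service; an assignment is
    revealed only after enrolment is committed, so an investigator cannot
    learn the next assignment before entering the patient into the study.
    """

    def __init__(self, n_subjects, seed=None):
        rng = random.Random(seed)
        allocations = ["A", "B"] * (n_subjects // 2)
        rng.shuffle(allocations)
        self._allocations = allocations   # never shown to the investigator
        self._enrolled = []

    def register_and_assign(self, patient_id):
        self._enrolled.append(patient_id)  # enrolment logged before the reveal
        return self._allocations[len(self._enrolled) - 1]

service = CentralRandomizer(4, seed=7)
print(service.register_and_assign("pt-001"))  # "A" or "B", unknowable in advance
```

The same commit-before-reveal idea underlies SNOSE envelopes: the envelope is opened only after the patient's details are written on it.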


Blinding

An RCT may be "blinded" (also called "masked") by "procedures that prevent study participants, caregivers, or outcome assessors from knowing which intervention was received."[28] Unlike allocation concealment, blinding is sometimes inappropriate or impossible to perform in an RCT; for example, if an RCT involves a treatment in which active participation of the patient is necessary (e.g., physical therapy), participants cannot be blinded to the intervention. Traditionally, blinded RCTs have been classified as "single-blind," "double-blind," or "triple-blind"; however, in 2001 and 2006 two studies showed that these terms have different meanings for different people.[29] [30] The 2010 CONSORT Statement specifies that authors and editors should not use the terms "single-blind," "double-blind," and "triple-blind"; instead, reports of blinded RCTs should discuss "If done, who was blinded after assignment to interventions (for example, participants, care providers, those assessing outcomes) and how."[2] RCTs without blinding are referred to as "unblinded",[31] "open",[32] or (if the intervention is a medication) "open-label".[33] In 2008 a study concluded that the results of unblinded RCTs tended to be biased toward beneficial effects only if the RCTs' outcomes were subjective as opposed to objective;[28] for example, in an RCT of treatments for multiple sclerosis, unblinded neurologists (but not blinded neurologists) felt that the treatments were beneficial.[34] In pragmatic RCTs, although the participants and providers are often unblinded, it is "still desirable and often possible to blind the assessor or obtain an objective source of data for evaluation of outcomes."[19]

Analysis of data from RCTs

The types of statistical methods used in RCTs depend on the characteristics of the data and include:

For dichotomous (binary) outcome data, logistic regression (e.g., to predict sustained virological response after receipt of peginterferon alfa-2a for hepatitis C[35] ) and other methods can be used.

For continuous outcome data, analysis of covariance (e.g., for changes in blood lipid levels after receipt of atorvastatin after acute coronary syndrome[36] ) tests the effects of predictor variables.

For time-to-event outcome data that may be censored, survival analysis (e.g., Kaplan–Meier estimators and Cox proportional hazards models for time to coronary heart disease after receipt of hormone replacement therapy in menopause[37] ) is appropriate.

Regardless of the statistical methods used, important considerations in the analysis of RCT data include:

Whether an RCT should be stopped early due to interim results. For example, RCTs may be stopped early if an intervention produces "larger than expected benefit or harm," or if "investigators find evidence of no important difference between experimental and control interventions."[2]

The extent to which the groups can be analyzed exactly as they existed upon randomization (i.e., whether a so-called "intention-to-treat analysis" is used). A "pure" intention-to-treat analysis is "possible only when complete outcome data are available" for all randomized subjects;[38] when some outcome data are missing, options include analyzing only cases with known outcomes and using imputed data.[2] Nevertheless, the more that analyses can include all participants in the groups to which they were randomized, the less bias an RCT will be subject to.[2]

Whether subgroup analyses should be performed. They are "often discouraged" because multiple comparisons may produce false positive findings that cannot be confirmed by other studies.[2]
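The Kaplan–Meier estimator mentioned above for censored time-to-event data can be sketched in a few lines of plain Python (an illustrative implementation, not code from any cited trial):

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate for right-censored time-to-event data.

    times:  follow-up time for each subject
    events: 1 if the event was observed at that time, 0 if censored
    Returns a list of (time, survival_probability) steps at each event time.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        if deaths:
            survival *= 1 - deaths / at_risk  # multiply in conditional survival
            curve.append((t, survival))
        at_risk -= sum(1 for ti in times if ti == t)  # drop failures and censorings
    return curve

# Five subjects: events at t=1, 2, 3; censored at t=2 and t=4.
print(kaplan_meier([1, 2, 2, 3, 4], [1, 0, 1, 1, 0]))
```

The key property illustrated here is that censored subjects contribute to the risk set up to their censoring time without counting as failures, which a naive "fraction surviving" calculation would get wrong.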


Reporting of RCT results

The CONSORT 2010 Statement is "an evidence-based, minimum set of recommendations for reporting RCTs."[39] The CONSORT 2010 checklist contains 25 items (many with sub-items) focusing on "individually randomised, two group, parallel trials", which are the most common type of RCT.[1] For other RCT study designs, "CONSORT extensions" have been published.[1]

RCTs are considered by most to be the most reliable form of scientific evidence in the hierarchy of evidence that influences healthcare policy and practice because RCTs reduce spurious causality and bias. Results of RCTs may be combined in systematic reviews, which are increasingly being used in the conduct of evidence-based medicine. Some examples of scientific organizations considering RCTs or systematic reviews of RCTs to be the highest-quality evidence available:

As of 1998, the National Health and Medical Research Council of Australia designated "Level I" evidence as that "obtained from a systematic review of all relevant randomised controlled trials" and "Level II" evidence as that "obtained from at least one properly designed randomised controlled trial."[40]

Since at least 2001, in making clinical practice guideline recommendations the United States Preventive Services Task Force has considered both a study's design and its internal validity as indicators of its quality.[41] It has recognized "evidence obtained from at least one properly randomized controlled trial" with good internal validity (i.e., a rating of "I-good") as the highest-quality evidence available to it.[41]

The GRADE Working Group concluded in 2008 that "randomised trials without important limitations constitute high quality evidence."[42]

For issues involving "Therapy/Prevention, Aetiology/Harm," the Oxford Centre for Evidence-based Medicine as of 2009 defined "Level 1a" evidence as a systematic review of RCTs that are consistent with each other, and "Level 1b" evidence as an "individual RCT (with narrow Confidence Interval)."[43]

Notable RCTs with unexpected results that contributed to changes in clinical practice include:

After Food and Drug Administration approval, the antiarrhythmic agents flecainide and encainide came to market in 1986 and 1987 respectively.[44] The non-randomized studies concerning the drugs were characterized as "glowing",[45] and their sales increased to a combined total of approximately 165,000 prescriptions per month in early 1989.[44] In that year, however, a preliminary report of an RCT concluded that the two drugs increased mortality.[46] Sales of the drugs then decreased.[44]

Prior to 2002, based on observational studies, it was routine for physicians to prescribe hormone replacement therapy for post-menopausal women to prevent myocardial infarction.[45] In 2002 and 2004, however, published RCTs from the Women's Health Initiative claimed that women taking hormone replacement therapy with estrogen plus progestin had a higher rate of myocardial infarctions than women on a placebo, and that estrogen-only hormone replacement therapy caused no reduction in the incidence of coronary heart disease.[37] [47] Possible explanations for the discrepancy between the observational studies and the RCTs involved differences in methodology, in the hormone regimens used, and in the populations studied.[48] [49] The use of hormone replacement therapy decreased after publication of the RCTs.[50]



Disadvantages

Many papers discuss the disadvantages of RCTs.[51] [52] Among the most frequently cited drawbacks are:

Limitations of external validity

The extent to which RCTs' results are applicable outside the RCTs varies; that is, RCTs' external validity may be limited.[51] [53] Factors that can affect RCTs' external validity include:[53]

Where the RCT was performed (e.g., what works in one country may not work in another)

Characteristics of the patients (e.g., an RCT may include patients whose prognosis is better than average, or may exclude "women, children, the elderly, and those with common medical conditions"[54] )

Study procedures (e.g., in an RCT patients may receive intensive diagnostic procedures and follow-up care difficult to achieve in the "real world")

Outcome measures (e.g., RCTs may use composite measures infrequently used in clinical practice)

Incomplete reporting of adverse effects of interventions

Costs

RCTs can be expensive;[52] one study found 28 Phase III RCTs funded by the National Institute of Neurological Disorders and Stroke prior to 2000 with a total cost of US$335 million,[55] for a mean cost of US$12 million per RCT. Nevertheless, the return on investment of RCTs may be high: the same study projected that the 28 RCTs produced a "net benefit to society at 10-years" of 46 times the cost of the trials program, based on valuing a quality-adjusted life year as equal to the prevailing mean per capita gross domestic product.[55]

Relative importance of RCTs and observational studies

Two studies published in The New England Journal of Medicine in 2000 found that observational studies and RCTs overall produced similar results.[56] [57] The authors of the 2000 findings cast doubt on the ideas that "observational studies should not be used for defining evidence-based medical care" and that RCTs' results are "evidence of the highest grade."[56] [57] However, a 2001 study published in the Journal of the American Medical Association concluded that "discrepancies beyond chance do occur and differences in estimated magnitude of treatment effect are very common" between observational studies and RCTs.[58] Two other lines of reasoning question RCTs' contribution to scientific knowledge beyond other types of studies:

If study designs are ranked by their potential for new discoveries, then anecdotal evidence would be at the top of the list, followed by observational studies, followed by RCTs.[59]

RCTs may be unnecessary for treatments that have dramatic and rapid effects relative to the expected stable or progressively worse natural course of the condition treated.[51] [60] One example is combination chemotherapy including cisplatin for metastatic testicular cancer, which increased the cure rate from 5% to 60% in a 1977 non-randomized study.[60] [61]



Difficulty in studying rare events

Interventions to prevent events that occur only infrequently (e.g., sudden infant death syndrome) and uncommon adverse outcomes (e.g., a rare side effect of a drug) would require RCTs with extremely large sample sizes and may therefore best be assessed by observational studies.[51]

Difficulty in studying outcomes in distant future

It is costly to maintain RCTs for the years or decades that would be ideal for evaluating some interventions.[51] [52]

Pro-industry findings in industry-funded RCTs

Some RCTs are fully or partly funded by the health care industry (e.g., the pharmaceutical industry) as opposed to government, nonprofit, or other sources. A systematic review published in 2003 found four 1986-2002 articles comparing industry-sponsored and nonindustry-sponsored RCTs, and in all the articles there was a correlation between industry sponsorship and positive study outcome.[62] A 2004 study of 1999-2001 RCTs published in leading medical and surgical journals determined that industry-funded RCTs "are more likely to be associated with statistically significant pro-industry findings."[63] One possible reason for the pro-industry results in industry-funded published RCTs is publication bias.[63]

Therapeutic misconception
Although subjects almost always provide informed consent for their participation in an RCT, studies since 1982 have documented that many RCT subjects believe that they are certain to receive treatment that is best for them personally; that is, they do not understand the difference between research and treatment.[64] [65] Further research is necessary to determine the prevalence of and ways to address this "therapeutic misconception."[65]

Statistical error
RCTs are subject to both type I ("false positive") and type II ("false negative") statistical errors. Regarding type I errors, a typical RCT will use 0.05 (i.e., 1 in 20) as the probability that the RCT will falsely find two equally effective treatments significantly different.[66] Regarding type II errors, despite the publication of a 1978 paper noting that the sample sizes of many "negative" RCTs were too small to support definitive conclusions about the negative results,[67] by 2005-2006 a sizeable proportion of RCTs still had inaccurate or incompletely reported sample size calculations.[68]
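The type II error discussion above can be made concrete with the standard textbook sample-size calculation for comparing two proportions (a normal-approximation sketch; the function name and defaults are our own, not from the article):

```python
from math import ceil, sqrt
from statistics import NormalDist

def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided comparison of two proportions,
    using the usual normal approximation. Illustrative, not a substitute
    for a proper trial-design consultation."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # ~1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)            # ~0.84 for 80% power
    p_bar = (p1 + p2) / 2
    n = ((z_alpha * sqrt(2 * p_bar * (1 - p_bar))
          + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
         / (p1 - p2) ** 2)
    return ceil(n)

# Detecting a 50% vs 60% response rate at alpha=0.05 with 80% power:
print(sample_size_two_proportions(0.5, 0.6))  # 388 per group
```

A trial enrolling far fewer subjects than this kind of calculation requires is exactly the "negative but underpowered" situation the 1978 paper criticized.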

Cultural effects
The RCT method creates cultural effects that have not been well understood.[69] For example, patients with terminal illness may attempt to join trials as a last-ditch attempt at treatment, even when treatments are unlikely to be successful.

RCTs in criminology, education, and international development

Criminology

A 2005 review found 83 randomized experiments in criminology published in 1982-2004, compared with only 35 published in 1957-1981.[70] The authors classified the studies they found into five categories: "policing", "prevention", "corrections", "court", and "community".[70] Focusing only on offending-behavior programs, Hollin (2008) argued that RCTs may be difficult to implement (e.g., if an RCT required "passing sentences that would randomly assign offenders to programmes") and therefore that experiments with quasi-experimental designs are still necessary.[71]



Education

RCTs have been used in evaluating a number of educational interventions. For example, a 2009 study randomized 260 elementary school teachers' classrooms to receive or not receive a program of behavioral screening, classroom intervention, and parent training, and then measured the behavioral and academic performance of their students.[72] Another 2009 study randomized classrooms for 678 first-grade children to receive a classroom-centered intervention, a parent-centered intervention, or no intervention, and then followed their academic outcomes through age 19.[73]

International development
RCTs are currently being used by a number of international development experts to measure the impact of development interventions worldwide. Development economists at research organizations including the Abdul Latif Jameel Poverty Action Lab[74] [75] and Innovations for Poverty Action[76] have used RCTs to measure the effectiveness of poverty, health, and education programs in the developing world. RCTs can be highly effective in policy evaluation because they allow researchers to isolate the impacts of a specific program from other factors, such as other programs offered in the region, general macroeconomic growth, short-term events (such as a favorable harvest), and differences in personal qualities that might make one individual more successful than another. For development economists, the main benefit of using RCTs compared with other research methods is that randomization guards against selection bias, a problem present in many current studies of development policy.

In one notable example of a cluster RCT in the field of development economics, Olken (2007) randomized 608 villages in Indonesia in which roads were about to be built into six groups (no audit vs. audit, and no invitations to accountability meetings vs. invitations to accountability meetings vs. invitations to accountability meetings along with anonymous comment forms).[77] After estimating "missing expenditures" (a measure of corruption), Olken concluded that government audits were more effective than "increasing grassroots participation in monitoring" in reducing corruption.[77]
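The cluster randomization used in studies like Olken (2007) can be sketched as follows (a simplified illustration; the function name is hypothetical, and the six real arms came from crossing the audit and invitation treatments rather than a flat six-way split):

```python
import random
from collections import Counter

def randomize_clusters(cluster_ids, n_arms, seed=None):
    """Randomly assign whole clusters (e.g. villages) to trial arms while
    keeping arm sizes as equal as possible. Everyone in a cluster receives
    that cluster's assigned treatment."""
    rng = random.Random(seed)
    ids = list(cluster_ids)
    rng.shuffle(ids)                                        # random cluster order
    return {cid: i % n_arms for i, cid in enumerate(ids)}   # round-robin arms

arms = randomize_clusters(range(608), n_arms=6, seed=42)
print(sorted(Counter(arms.values()).values()))  # [101, 101, 101, 101, 102, 102]
```

Randomizing at the cluster level (rather than the individual level) is what lets such designs handle interventions, like village audits, that inherently apply to a whole community at once.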

See also
Drug development
Hypothesis testing
Impact evaluation
Jadad scale
Statistical inference

Further reading
Domanski MJ, McKinlay S. Successful randomized trials: a handbook for the 21st century. Philadelphia: Lippincott Williams & Wilkins, 2009. ISBN 9780781779456.
Jadad AR, Enkin M. Randomized controlled trials: questions, answers, and musings. 2nd ed. Malden, Mass.: Blackwell, 2007. ISBN 9781405132664.
Matthews JNS. Introduction to randomized controlled clinical trials. 2nd ed. Boca Raton, Fla.: CRC Press, 2006. ISBN 1584886242.
Nezu AM, Nezu CM. Evidence-based outcome research: a practical guide to conducting randomized controlled trials for psychosocial interventions. Oxford: Oxford University Press, 2008. ISBN 9780195304633.
Solomon PL, Cavanaugh MM, Draine J. Randomized controlled trials: design and implementation for community-based psychosocial interventions. New York: Oxford University Press, 2009. ISBN 9780195333190.
Torgerson DJ, Torgerson C. Designing randomised trials in health, education and the social sciences: an introduction. Basingstoke, England, and New York: Palgrave Macmillan, 2008. ISBN 9780230537354.



External links
Bland M. Directory of randomisation software and services.[78] University of York, 2008 March 19.
Evans I, Thornton H, Chalmers I. Testing treatments: better research for better health care.[79] London: Pinter & Martin, 2010. ISBN 9781905177356.
Gelband H. The impact of randomized clinical trials on health policy and medical practice: background paper.[80] Washington, DC: U.S. Congress, Office of Technology Assessment, 1983. (Report OTA-BP-H-22.)
REFLECT (Reporting guidElines For randomized controLled trials for livEstoCk and food safeTy) Statement[81]
Wathen JK, Cook JD. Power and bias in adaptively randomized clinical trials.[82] M. D. Anderson Cancer Center, University of Texas, 2006 July 12.

[1] Schulz KF, Altman DG, Moher D; for the CONSORT Group (2010). "CONSORT 2010 Statement: updated guidelines for reporting parallel group randomised trials" (http:/ / www. pubmedcentral. nih. gov/ articlerender. fcgi?tool=pmcentrez& artid=2844940). Br Med J 340: c332. doi:10.1136/bmj.c332. PMID20332509. PMC2844940. [2] Moher D, Hopewell S, Schulz KF, Montori V, Gtzsche PC, Devereaux PJ, Elbourne D, Egger M, Altman DG (2010). "CONSORT 2010 explanation and elaboration: updated guidelines for reporting parallel group randomised trials" (http:/ / www. pubmedcentral. nih. gov/ articlerender. fcgi?tool=pmcentrez& artid=2844943). Br Med J 340: c869. doi:10.1136/bmj.c869. PMID20332511. PMC2844943. [3] Ranjith G (2005). "Interferon--induced depression: when a randomized trial is not a randomized controlled trial". Psychother Psychosom 74 (6): 387. doi:10.1159/000087787. PMID16244516. [4] Chalmers TC, Smith H Jr, Blackburn B, Silverman B, Schroeder B, Reitman D, Ambroz A (1981). "A method for assessing the quality of a randomized control trial". Control Clin Trials 2 (1): 3149. doi:10.1016/0197-2456(81)90056-8. PMID7261638. [5] Peto R, Pike MC, Armitage P, Breslow NE, Cox DR, Howard SV, Mantel N, McPherson K, Peto J, Smith PG (1976). "Design and analysis of randomized clinical trials requiring prolonged observation of each patient. I. Introduction and design" (http:/ / www. pubmedcentral. nih. gov/ articlerender. fcgi?tool=pmcentrez& artid=2025229/ ). Br J Cancer 34 (6): 585612. PMID795448. PMC2025229/. [6] Peto R, Pike MC, Armitage P, Breslow NE, Cox DR, Howard SV, Mantel N, McPherson K, Peto J, Smith PG (1977). "Design and analysis of randomized clinical trials requiring prolonged observation of each patient. II. Analysis and examples" (http:/ / www. pubmedcentral. nih. gov/ articlerender. fcgi?tool=pmcentrez& artid=2025310). Br J Cancer 35 (1): 139. PMID831755. PMC2025310. 
[7] Wollert KC, Meyer GP, Lotz J, Ringes-Lichtenberg S, Lippolt P, Breidenbach C, Fichtner S, Korte T, Hornig B, Messinger D, Arseniev L, Hertenstein B, Ganser A, Drexler H (2004). "Intracoronary autologous bone-marrow cell transfer after myocardial infarction: the BOOST randomised controlled clinical trial". Lancet 364 (9429): 1418. doi:10.1016/S0140-6736(04)16626-9. PMID15246726. [8] Streptomycin in Tuberculosis Trials Committee (1948). "Streptomycin treatment of pulmonary tuberculosis. A Medical Research Council investigation" (http:/ / www. pubmedcentral. nih. gov/ articlerender. fcgi?tool=pmcentrez& artid=2091872). Br Med J 2 (4582): 76982. doi:10.1136/bmj.2.4582.769. PMID18890300. PMC2091872. [9] Brown D (1998-11-02). "Landmark study made research resistant to bias". Washington Post. [10] Shikata S, Nakayama T, Noguchi Y, Taji Y, Yamagishi H (2006). "Comparison of effects in randomized controlled trials with observational studies in digestive surgery" (http:/ / www. pubmedcentral. nih. gov/ articlerender. fcgi?tool=pmcentrez& artid=1856609). Ann Surg 244 (5): 66876. doi:10.1097/01.sla.0000225356.04304.bc. PMID17060757. PMC1856609. [11] Stolberg HO, Norman G, Trop I (2004). "Randomized controlled trials" (http:/ / www. ajronline. org/ cgi/ content/ full/ 183/ 6/ 1539). Am J Roentgenol 183 (6): 153944. PMID15547188. . [12] Meldrum ML (2000). "A brief history of the randomized controlled trial. From oranges and lemons to the gold standard". Hematol Oncol Clin North Am 14 (4): 74560, vii. doi:10.1016/S0889-8588(05)70309-9. PMID10949771. [13] Freedman B (1987). "Equipoise and the ethics of clinical research". N Engl J Med 317 (3): 1415. PMID3600702. [14] Gifford F (1995). "Community-equipoise and the ethics of randomized clinical trials". Bioethics 9 (2): 12748. doi:10.1111/j.1467-8519.1995.tb00306.x. PMID11653056. [15] Edwards SJL, Lilford RJ, Hewison J (1998). 
"The ethics of randomised controlled trials from the perspectives of patients, the public, and healthcare professionals" (http:/ / www. bmj. com/ cgi/ content/ full/ 317/ 7167/ 1209). Br Med J 317 (7167): 120912. PMID9794861. PMC1114158. . [16] Zelen M (1979). "A new design for randomized clinical trials" (http:/ / content. nejm. org/ cgi/ content/ abstract/ 300/ 22/ 1242). N Engl J Med 300 (22): 12425. PMID431682. . [17] Torgerson DJ, Roland M (1998). "What is Zelen's design?" (http:/ / www. bmj. com/ cgi/ content/ full/ 316/ 7131/ 606). Br Med J 316 (7131): 606. PMID9518917. PMC1112637. . [18] Hopewell S, Dutton S, Yu LM, Chan AW, Altman DG (2010). "The quality of reports of randomised trials in 2000 and 2006: comparative study of articles indexed in PubMed" (http:/ / www. bmj. com/ cgi/ content/ full/ 340/ mar23_1/ c723). BMJ 340: c723. doi:10.1136/bmj.c723. PMID20332510. PMC2844941. .


[19] Zwarenstein M, Treweek S, Gagnier JJ, Altman DG, Tunis S, Haynes B, Oxman AD, Moher D; CONSORT group; Pragmatic Trials in Healthcare (Practihc) group (2008). "Improving the reporting of pragmatic trials: an extension of the CONSORT statement" (http:/ / www. bmj. com/ cgi/ content/ full/ 337/ nov11_2/ a2390). BMJ 337: a2390. doi:10.1136/bmj.a2390. PMID19001484. . [20] Piaggio G, Elbourne DR, Altman DG, Pocock SJ, Evans SJ; CONSORT Group (2006). "Reporting of noninferiority and equivalence randomized trials: an extension of the CONSORT statement" (http:/ / jama. ama-assn. org/ cgi/ content/ full/ 295/ 10/ 1152). JAMA 295 (10): 115260. doi:10.1001/jama.295.10.1152. PMID16522836. . [21] Schulz KF, Grimes DA (2002). "Generation of allocation sequences in randomised trials: chance, not choice" (http:/ / www. who. int/ entity/ rhl/ LANCET_515-519. pdf). Lancet 359 (9305): 5159. doi:10.1016/S0140-6736(02)07683-3. PMID11853818. . [22] Lachin JM (1988). "Statistical properties of randomization in clinical trials". Control Clin Trials 9 (4): 289311. doi:10.1016/0197-2456(88)90045-1. PMID3060315. [23] Buyse ME (1989). "Analysis of clinical trial outcomes: some comments on subgroup analyses". Control Clin Trials 10 (4 Suppl): 187S194S. doi:10.1016/0197-2456(89)90057-3. PMID2605967. [24] Lachin JM, Matts JP, Wei LJ (1988). "Randomization in clinical trials: conclusions and recommendations". Control Clin Trials 9 (4): 36574. doi:10.1016/0197-2456(88)90049-9. PMID3203526. [25] Rosenberger WF, Lachin JM (1993). "The use of response-adaptive designs in clinical trials". Control Clin Trials 14 (6): 47184. doi:10.1016/0197-2456(93)90028-C. PMID8119063. [26] Forder PM, Gebski VJ, Keech AC (2005). "Allocation concealment and blinding: when ignorance is bliss" (http:/ / www. mja. com. au/ public/ issues/ 182_02_170105/ for10877_fm. html). Med J Aust 182 (2): 879. PMID15651970. . [27] Pildal J, Chan AW, Hrbjartsson A, Forfang E, Altman DG, Gtzsche PC (2005). 
"Comparison of descriptions of allocation concealment in trial protocols and the published reports: cohort study" (http:/ / www. bmj. com/ cgi/ content/ full/ 330/ 7499/ 1049). BMJ 330 (7499): 1049. doi:10.1136/bmj.38414.422650.8F. PMID15817527. PMC557221. . [28] Wood L, Egger M, Gluud LL, Schulz KF, Jni P, Altman DG, Gluud C, Martin RM, Wood AJ, Sterne JA (2008). "Empirical evidence of bias in treatment effect estimates in controlled trials with different interventions and outcomes: meta-epidemiological study" (http:/ / www. bmj. com/ cgi/ content/ full/ 336/ 7644/ 601). BMJ 336 (7644): 6015. doi:10.1136/bmj.39465.451748.AD. PMID18316340. PMC2267990. . [29] Devereaux PJ, Manns BJ, Ghali WA, Quan H, Lacchetti C, Montori VM, Bhandari M, Guyatt GH (2001). "Physician interpretations and textbook definitions of blinding terminology in randomized controlled trials" (http:/ / jama. ama-assn. org/ cgi/ content/ full/ 285/ 15/ 2000). J Am Med Assoc 285 (15): 20003. doi:10.1001/jama.285.15.2000. PMID11308438. . [30] Haahr MT, Hrbjartsson A (2006). "Who is blinded in randomized clinical trials? A study of 200 trials and a survey of authors" (http:/ / ctj. sagepub. com/ cgi/ content/ abstract/ 3/ 4/ 360). Clin Trials 3 (4): 3605. doi:10.1177/1740774506069153. PMID17060210. . [31] Marson AG, Al-Kharusi AM, Alwaidh M, Appleton R, Baker GA, Chadwick DW, et al (2007). "The SANAD study of effectiveness of valproate, lamotrigine, or topiramate for generalised and unclassifiable epilepsy: an unblinded randomised controlled trial" (http:/ / www. pubmedcentral. nih. gov/ articlerender. fcgi?tool=pmcentrez& artid=2039891). Lancet 369 (9566): 101626. doi:10.1016/S0140-6736(07)60461-9. PMID17382828. PMC2039891. [32] Chan R, Hemeryck L, O'Regan M, Clancy L, Feely J (1995). "Oral versus intravenous antibiotics for community acquired lower respiratory tract infection in a general hospital: open, randomised controlled trial" (http:/ / www. bmj. com/ cgi/ content/ full/ 310/ 6991/ 1360). 
BMJ 310 (6991): 13602. PMID7787537. PMC2549744. . [33] Fukase K, Kato M, Kikuchi S, Inoue K, Uemura N, Okamoto S, Terao S, Amagai K, Hayashi S, Asaka M; Japan Gast Study Group (2008). "Effect of eradication of Helicobacter pylori on incidence of metachronous gastric carcinoma after endoscopic resection of early gastric cancer: an open-label, randomised controlled trial". Lancet 372 (9636): 3927. doi:10.1016/S0140-6736(08)61159-9. PMID18675689. [34] Noseworthy JH, Ebers GC, Vandervoort MK, Farquhar RE, Yetisir E, Roberts R (1994). "The impact of blinding on the results of a randomized, placebo-controlled multiple sclerosis clinical trial" (http:/ / www. neurology. org/ cgi/ content/ abstract/ 44/ 1/ 16). Neurology 44 (1): 1620. PMID8290055. . [35] Manns MP, McHutchison JG, Gordon SC, Rustgi VK, Shiffman M, Reindollar R, Goodman ZD, Koury K, Ling M, Albrecht JK (2001). "Peginterferon alfa-2b plus ribavirin compared with interferon alfa-2b plus ribavirin for initial treatment of chronic hepatitis C: a randomised trial". Lancet 358 (9286): 95865. doi:10.1016/S0140-6736(01)06102-5. PMID11583749. [36] Schwartz GG, Olsson AG, Ezekowitz MD, Ganz P, Oliver MF, Waters D, Zeiher A, Chaitman BR, Leslie S, Stern T; Myocardial Ischemia Reduction with Aggressive Cholesterol Lowering (MIRACL) Study Investigators (2001). "Effects of atorvastatin on early recurrent ischemic events in acute coronary syndromes: the MIRACL study: a randomized controlled trial" (http:/ / jama. ama-assn. org/ cgi/ content/ full/ 285/ 13/ 1711). J Am Med Assoc 285 (13): 17118. doi:10.1001/jama.285.13.1711. PMID11277825. . [37] Rossouw JE, Anderson GL, Prentice RL, LaCroix AZ, Kooperberg C, Stefanick ML, Jackson RD, Beresford SA, Howard BV, Johnson KC, Kotchen JM, Ockene J; Writing Group for the Women's Health Initiative Investigators (2002). 
"Risks and benefits of estrogen plus progestin in healthy postmenopausal women: principal results from the Women's Health Initiative randomized controlled trial" (http:/ / jama. ama-assn. org/ cgi/ content/ full/ 288/ 3/ 321). J Am Med Assoc 288 (3): 32133. doi:10.1001/jama.288.3.321. PMID12117397. . [38] Hollis S, Campbell F (1999). "What is meant by intention to treat analysis? Survey of published randomised controlled trials" (http:/ / www. bmj. com/ cgi/ content/ full/ 319/ 7211/ 670). Br Med J 319 (7211): 6704. PMID10480822. PMC28218. . [39] CONSORT Group. "Welcome to the CONSORT statement Website" (http:/ / www. consort-statement. org). . Retrieved 2010-03-29. [40] National Health and Medical Research Council (1998-11-16). A guide to the development, implementation and evaluation of clinical practice guidelines (http:/ / www. nhmrc. gov. au/ _files_nhmrc/ file/ publications/ synopses/ cp30. pdf). Canberra: Commonwealth of Australia. p.56. ISBN1864960485. . Retrieved 2010-03-28.



[41] Harris RP, Helfand M, Woolf SH, Lohr KN, Mulrow CD, Teutsch SM, Atkins D; Methods Work Group, Third US Preventive Services Task Force (2001). "Current methods of the US Preventive Services Task Force: a review of the process" (http:/ / www. ahrq. gov/ clinic/ ajpmsuppl/ review. pdf). Am J Prev Med 20 (3 Suppl): 2135. doi:10.1016/S0749-3797(01)00261-6. PMID11306229. . [42] Guyatt GH, Oxman AD, Kunz R, Vist GE, Falck-Ytter Y, Schnemann HJ; GRADE Working Group (2008). "What is "quality of evidence" and why is it important to clinicians?" (http:/ / www. bmj. com/ cgi/ content/ full/ 336/ 7651/ 995). BMJ 336 (7651): 9958. doi:10.1136/bmj.39490.551019.BE. PMID18456631. . [43] Oxford Centre for Evidence-based Medicine (2009 March). "Levels of evidence" (http:/ / www. cebm. net/ index. aspx?o=1025). . Retrieved 2010-03-28. [44] Anderson JL, Pratt CM, Waldo AL, Karagounis LA (1997). "Impact of the Food and Drug Administration approval of flecainide and encainide on coronary artery disease mortality: putting "Deadly Medicine" to the test" (http:/ / www. ajconline. org/ article/ S0002-9149(96)00673-X/ abstract). Am J Cardiol 79 (1): 437. doi:10.1016/S0002-9149(96)00673-X. PMID9024734. . [45] Rubin R (2006-10-16). "In medicine, evidence can be confusing - deluged with studies, doctors try to sort out what works, what doesn't" (http:/ / www. usatoday. com/ news/ health/ 2006-10-15-medical-evidence-cover_x. htm). USA Today. . Retrieved 2010-03-22. [46] "Preliminary report: effect of encainide and flecainide on mortality in a randomized trial of arrhythmia suppression after myocardial infarction. The Cardiac Arrhythmia Suppression Trial (CAST) Investigators" (http:/ / content. nejm. org/ cgi/ content/ abstract/ 321/ 6/ 406). N Engl J Med 321 (6): 40612. 1989. PMID2473403. . [47] Anderson GL, Limacher M, Assaf AR, Bassford T, Beresford SA, Black H, et al (2004). 
"Effects of conjugated equine estrogen in postmenopausal women with hysterectomy: the Women's Health Initiative randomized controlled trial" (http:/ / jama. ama-assn. org/ cgi/ content/ full/ 291/ 14/ 1701). JAMA 291 (14): 170112. doi:10.1001/jama.291.14.1701. PMID15082697. . [48] Grodstein F, Clarkson TB, Manson JE (2003). "Understanding the divergent data on postmenopausal hormone therapy" (http:/ / content. nejm. org/ cgi/ content/ extract/ 348/ 7/ 645). N Engl J Med 348 (7): 64550. doi:10.1056/NEJMsb022365. PMID12584376. . [49] Vandenbroucke JP (2009). "The HRT controversy: observational studies and RCTs fall in line" (http:/ / linkinghub. elsevier. com/ retrieve/ pii/ S0140-6736(09)60708-X). Lancet 373 (9671): 12335. doi:10.1016/S0140-6736(09)60708-X. PMID19362661. . [50] Hsu A, Card A, Lin SX, Mota S, Carrasquillo O, Moran A (2009). "Changes in postmenopausal hormone replacement therapy use among women with high cardiovascular risk" (http:/ / ajph. aphapublications. org/ cgi/ content/ full/ 99/ 12/ 2184). Am J Public Health 99 (12): 21847. doi:10.2105/AJPH.2009.159889. PMID19833984. . [51] Black N (1996). "Why we need observational studies to evaluate the effectiveness of health care" (http:/ / www. bmj. com/ cgi/ content/ full/ bmj;312/ 7040/ 1215). BMJ 312 (7040): 12158. PMID8634569. PMC2350940. . [52] Sanson-Fisher RW, Bonevski B, Green LW, D'Este C (2007). "Limitations of the randomized controlled trial in evaluating population-based health interventions" (http:/ / www. ajpm-online. net/ article/ S0749-3797(07)00225-5/ abstract). Am J Prev Med 33 (2): 15561. doi:10.1016/j.amepre.2007.04.007. PMID17673104. . [53] Rothwell PM (2005). "External validity of randomised controlled trials: "to whom do the results of this trial apply?"" (http:/ / apps. who. int/ rhl/ Lancet_365-9453. pdf). Lancet 365 (9453): 8293. doi:10.1016/S0140-6736(04)17670-8. PMID15639683. . [54] Van Spall HG, Toren A, Kiss A, Fowler RA (2007). 
"Eligibility criteria of randomized controlled trials published in high-impact general medical journals: a systematic sampling review" (http:/ / jama. ama-assn. org/ cgi/ content/ full/ 297/ 11/ 1233). JAMA 297 (11): 123340. doi:10.1001/jama.297.11.1233. PMID17374817. . [55] Johnston SC, Rootenberg JD, Katrak S, Smith WS, Elkins JS (2006). "Effect of a US National Institutes of Health programme of clinical trials on public health and costs" (http:/ / www. chrp. org/ pdf/ HSR20070511. pdf). Lancet 367 (9519): 131927. doi:10.1016/S0140-6736(06)68578-4. PMID16631910. . [56] Benson K, Hartz AJ (2000). "A comparison of observational studies and randomized, controlled trials" (http:/ / content. nejm. org/ cgi/ content/ full/ 342/ 25/ 1878). N Engl J Med 342 (25): 187886. doi:10.1056/NEJM200006223422506. PMID10861324. . [57] Concato J, Shah N, Horwitz RI (2000). "Randomized, controlled trials, observational studies, and the hierarchy of research designs" (http:/ / nejm. highwire. org/ cgi/ content/ full/ 342/ 25/ 1887). N Engl J Med 342 (25): 188792. doi:10.1056/NEJM200006223422507. PMID10861325. PMC1557642. . [58] Ioannidis JP, Haidich AB, Pappa M, Pantazis N, Kokori SI, Tektonidou MG, Contopoulos-Ioannidis DG, Lau J (2001). "Comparison of evidence of treatment effects in randomized and nonrandomized studies" (http:/ / jama. ama-assn. org/ cgi/ content/ full/ 286/ 7/ 821). J Am Med Assoc 286 (7): 82130. doi:10.1001/jama.286.7.821. PMID11497536. . [59] Vandenbroucke JP (2008). "Observational research, randomised trials, and two views of medical science" (http:/ / www. pubmedcentral. nih. gov/ articlerender. fcgi?tool=pmcentrez& artid=2265762). PLoS Med 5 (3): e67. doi:10.1371/journal.pmed.0050067. PMID18336067. PMC2265762. [60] Glasziou P, Chalmers I, Rawlins M, McCulloch P (2007). "When are randomised trials unnecessary? Picking signal from noise" (http:/ / www. pubmedcentral. nih. gov/ articlerender. fcgi?tool=pmcentrez& artid=1800999). Br Med J 334 (7589): 34951. 
doi:10.1136/bmj.39070.527986.68. PMID17303884. PMC1800999. [61] Einhorn LH (2002). "Curing metastatic testicular cancer" (http:/ / www. pubmedcentral. nih. gov/ articlerender. fcgi?tool=pmcentrez& artid=123692). Proc Natl Acad Sci U S A 99 (7): 45925. doi:10.1073/pnas.072067999. PMID11904381. PMC123692. [62] Bekelman JE, Li Y, Gross CP (2003). "Scope and impact of financial conflicts of interest in biomedical research: a systematic review" (http:/ / jama. ama-assn. org/ cgi/ content/ full/ 289/ 4/ 454). J Am Med Assoc 289 (4): 45465. doi:10.1001/jama.289.4.454. PMID12533125. .


Randomized controlled trial

[63] Bhandari M, Busse JW, Jackowski D, Montori VM, Schnemann H, Sprague S, Mears D, Schemitsch EH, Heels-Ansdell D, Devereaux PJ (2004). "Association between industry funding and statistically significant pro-industry findings in medical and surgical randomized trials" (http:/ / ecmaj. com/ cgi/ content/ full/ 170/ 4/ 477). Can Med Assoc J 170 (4): 47780. PMID14970094. PMC332713. . [64] Appelbaum PS, Roth LH, Lidz C (1982). "The therapeutic misconception: informed consent in psychiatric research". Int J Law Psychiatry 5 (3-4): 31929. doi:10.1016/0160-2527(82)90026-7. PMID6135666. [65] Henderson GE, Churchill LR, Davis AM, Easter MM, Grady C, Joffe S, Kass N, King NM, Lidz CW, Miller FG, Nelson DK, Peppercorn J, Rothschild BB, Sankar P, Wilfond BS, Zimmer CR (2007). "Clinical trials and medical care: defining the therapeutic misconception" (http:/ / www. pubmedcentral. nih. gov/ articlerender. fcgi?tool=pmcentrez& artid=2082641). PLoS Med 4 (11): e324. doi:10.1371/journal.pmed.0040324. PMID18044980. PMC2082641. [66] Wittes J (2002). "Sample size calculations for randomized controlled trials". Epidemiol Rev 24 (1): 3953. doi:10.1093/epirev/24.1.39. PMID12119854. [67] Freiman JA, Chalmers TC, Smith H Jr, Kuebler RR (1978). "The importance of beta, the type II error and sample size in the design and interpretation of the randomized control trial. Survey of 71 "negative" trials" (http:/ / content. nejm. org/ cgi/ content/ abstract/ 299/ 13/ 690). N Engl J Med 299 (13): 6904. PMID355881. . [68] Charles P, Giraudeau B, Dechartres A, Baron G, Ravaud P (2009 May 12). "Reporting of sample size calculation in randomised controlled trials: review" (http:/ / www. pubmedcentral. nih. gov/ articlerender. fcgi?tool=pmcentrez& artid=2680945). Br Med J 338: b1732. doi:10.1136/bmj.b1732. PMID19435763. PMC2680945. [69] Jain SL (2010). "The mortality effect: counting the dead in the cancer trial". Public Culture 21 (1): 89117. doi:10.1215/08992363-2009-017. 
[70] Farrington DP, Welsh BC (2005). "Randomized experiments in criminology: What have we learned in the last two decades?". Journal of Experimental Criminology 1 (1): 938. doi:10.1007/s11292-004-6460-0. [71] Hollin CR (2008). "Evaluating offending behaviour programmes: does only randomization glister?". Criminology and Criminal Justice 8 (1): 89106. doi:10.1177/1748895807085871. [72] Walker HM, Seeley JR, Small J, Severson HH, Graham BA, Feil EG, Serna L, Golly AM, Forness SR (2009). "A randomized controlled trial of the First Step to Success early intervention. Demonstration of program efficacy outcomes in a diverse, urban school district". Journal of Emotional and Behavioral Disorders 17 (4): 197212. doi:10.1177/1063426609341645. [73] Bradshaw CP, Zmuda JH, Kellam SG, Ialongo NS (2009). "Longitudinal impact of two universal preventive interventions in first grade on educational outcomes in high school". Journal of Educational Psychology 101 (4): 926937. doi:10.1037/a0016586. [74] Duflo E, Glennerster R, Kremer M (2006 December 12). "Using randomization in development economics research: a toolkit" (http:/ / web. archive. org/ web/ 20070119193234/ http:/ / www. povertyactionlab. com/ papers/ Using+ Randomization+ in+ Development+ Economics. pdf) (PDF). MIT Department of Economics. . Working Paper No. 06-36 [75] Banerjee AV, Cole S, Duflo E, Linden L (2007). "Remedying education: evidence from two randomized experiments in India". Quarterly Journal of Economics 122 (3): 12351264. doi:10.1162/qjec.122.3.1235. [76] Karlan D, Zinman J (2010). "Expanding credit access: using randomized supply decisions to estimate the impacts". Review of Financial Studies 23 (1): 433464. doi:10.1093/rfs/hhp092. [77] Olken BA (2007). "Monitoring corruption: evidence from a field experiment in Indonesia". Journal of Political Economy 115 (2): 200249. doi:10.1086/517935. [78] http:/ / www-users. york. ac. uk/ ~mb55/ guide/ randsery. htm [79] http:/ / www. jameslindlibrary. 
org/ pdf/ testing-treatments. pdf [80] http:/ / www. fas. org/ ota/ reports/ 8310. pdf [81] http:/ / reflect-statement. org/ [82] http:/ / www. mdanderson. org/ education-and-research/ departments-programs-and-labs/ departments-and-divisions/ division-of-quantitative-sciences/ research/ biostats-utmdabtr-002-06. pdf


Statistical inference


Statistical inference is the process of drawing conclusions from data that are subject to random variation, for example, observational errors or sampling variation.[1] More substantially, the terms statistical inference, statistical induction and inferential statistics are used to describe systems of procedures that can be used to draw conclusions from datasets arising from systems affected by random variation.[2] Initial requirements of such a system of procedures for inference and induction are that the system should produce reasonable answers when applied to well-defined situations and that it should be general enough to be applied across a range of situations. The outcome of statistical inference may be an answer to the question "what should be done next?", where this might be a decision about making further experiments or surveys, or about drawing a final conclusion before implementing some organizational or governmental policy.

For the most part, statistical inference makes propositions about populations, using data drawn from the population of interest via some form of random sampling. More generally, data about a random process are obtained from its observed behavior during a finite period of time. Given a parameter or hypothesis about which one wishes to make inference, statistical inference most often uses: a statistical model of the random process that is supposed to generate the data, and a particular realization of the random process, i.e. a set of data.

The conclusion of a statistical inference is a statistical proposition. Some common forms of statistical proposition are:
- an estimate, i.e. a particular value that best approximates some parameter of interest;
- a confidence interval (or set estimate), i.e. an interval constructed from the data in such a way that, under repeated sampling of datasets, such intervals would contain the true parameter value with the probability at the stated confidence level;
- a credible interval, i.e. a set of values containing, for example, 95% of posterior belief;
- rejection of a hypothesis;[3]
- clustering or classification of data points into groups.
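The first two forms of proposition can be made concrete with a short sketch. The data and the use of the normal approximation below are illustrative, not part of the original text:

```python
import math
import statistics

def mean_confidence_interval(data, z=1.96):
    """Point estimate and an approximate 95% confidence interval
    for the population mean, using the normal approximation."""
    n = len(data)
    xbar = statistics.fmean(data)               # point estimate of the mean
    se = statistics.stdev(data) / math.sqrt(n)  # estimated standard error
    return xbar, (xbar - z * se, xbar + z * se)

# Illustrative sample drawn from some population of interest
sample = [4.9, 5.1, 5.0, 4.8, 5.3, 5.2, 4.7, 5.0, 5.1, 4.9]
estimate, (lo, hi) = mean_confidence_interval(sample)
```

Under repeated sampling, intervals constructed this way would contain the true mean roughly 95% of the time; the single interval `(lo, hi)` is one realization of that procedure.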

Comparison to descriptive statistics

Statistical inference is generally distinguished from descriptive statistics. In simple terms, descriptive statistics can be thought of as being just a straightforward presentation of facts, in which modeling decisions made by a data analyst have had minimal influence. A complete statistical analysis will nearly always include both descriptive statistics and statistical inference, and will often progress in a series of steps where the emphasis moves gradually from description to inference.
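A minimal sketch of the distinction, on illustrative data: the descriptive step merely summarizes the observed sample, while the inferential step makes a statement about the wider population:

```python
import math
import statistics

sample = [3.2, 3.8, 3.5, 3.9, 3.4, 3.6, 3.7, 3.3]  # illustrative data

# Descriptive statistics: plain facts about the observed data, no model.
n = len(sample)
sample_mean = statistics.fmean(sample)
sample_sd = statistics.stdev(sample)

# Statistical inference: a statement about the unobserved population,
# here via the estimated standard error of the sample mean.
standard_error = sample_sd / math.sqrt(n)
```

The analysis would typically report the descriptive summaries first and then move to inferential quantities such as the standard error or an interval estimate.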

Models and assumptions


Any statistical inference requires some assumptions. A statistical model is a set of assumptions concerning the generation of the observed data and similar data. Descriptions of statistical models usually emphasize the role of population quantities of interest, about which we wish to draw inference.[4]

Degree of models/assumptions
Statisticians distinguish between three levels of modeling assumptions:
- Fully parametric: The probability distributions describing the data-generation process are assumed to be fully described by a family of probability distributions involving only a finite number of unknown parameters.[4] For example, one may assume that the distribution of population values is truly Normal, with unknown mean and variance, and that datasets are generated by 'simple' random sampling. The family of generalized linear models is a widely used and flexible class of parametric models.
- Non-parametric: The assumptions made about the process generating the data are much weaker than in parametric statistics and may be minimal.[5] For example, every continuous probability distribution has a median, which may be estimated using the sample median or the Hodges-Lehmann-Sen estimator, which has good properties when the data arise from simple random sampling.
- Semi-parametric: This term typically implies assumptions 'between' the fully and non-parametric approaches. For example, one may assume that a population distribution has a finite mean. Furthermore, one may assume that the mean response level in the population depends in a truly linear manner on some covariate (a parametric assumption) but not make any parametric assumption describing the variance around that mean (i.e. about the presence or possible form of any heteroscedasticity). More generally, semi-parametric models can often be separated into 'structural' and 'random variation' components. One component is treated parametrically and the other non-parametrically. The well-known Cox model is a set of semi-parametric assumptions.
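The parametric and non-parametric routes can be contrasted on the same data. In this sketch (illustrative numbers, one outlier), the one-sample Hodges-Lehmann estimator is computed as the median of all pairwise averages (Walsh averages):

```python
import itertools
import statistics

data = [2.1, 2.4, 1.9, 2.2, 8.0, 2.3, 2.0, 2.2]  # illustrative data, one outlier

# Parametric route: if the population is assumed Normal, the sample mean
# estimates the centre, but it is pulled upward by the outlier.
mean_est = statistics.fmean(data)

# Non-parametric routes: the sample median, and the Hodges-Lehmann estimator
# (median of all pairwise averages), require no Normality assumption.
median_est = statistics.median(data)
walsh_averages = [(x + y) / 2
                  for x, y in itertools.combinations_with_replacement(data, 2)]
hodges_lehmann = statistics.median(walsh_averages)
```

With the outlier present, the mean lands well above the bulk of the data while both non-parametric estimates stay near the centre, illustrating the trade-off between assumptions and robustness.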

Importance of valid models/assumptions

Whatever level of assumption is made, correctly calibrated inference in general requires these assumptions to be correct, i.e. that the data-generating mechanism really has been correctly specified. Incorrect assumptions of 'simple' random sampling can invalidate statistical inference.[6] More complex semi- and fully parametric assumptions are also cause for concern. For example, incorrectly assuming the Cox model can in some cases lead to faulty conclusions.[7] Incorrect assumptions of Normality in the population also invalidate some forms of regression-based inference.[8] The use of any parametric model is viewed skeptically by most experts in sampling human populations: "most sampling statisticians, when they deal with confidence intervals at all, limit themselves to statements about [estimators] based on very large samples, where the central limit theorem ensures that these [estimators] will have distributions that are nearly normal."[9] Here, the central limit theorem states that the distribution of the sample mean "for very large samples" is approximately normally distributed, if the distribution is not heavy-tailed.

Approximate distributions

Given the difficulty in specifying exact distributions of sample statistics, many methods have been developed for approximating these. With finite samples, approximation results measure how close a limiting distribution approaches the statistic's sample distribution: for example, with 10,000 independent samples the normal distribution approximates (to two digits of accuracy) the distribution of the sample mean for many population distributions, by the Berry-Esseen theorem.[10] Yet for many practical purposes, the normal approximation provides a good approximation to the sample mean's distribution when there are 10 (or more) independent samples, according to simulation studies and statisticians' experience.[11] Following Kolmogorov's work in the 1950s, advanced statistics uses approximation

theory and functional analysis to quantify the error of approximation: in this approach, the metric geometry of probability distributions is studied; this approach quantifies approximation error with, for example, the Kullback-Leibler distance, Bregman divergence, and the Hellinger distance.[12] [13] [14] With infinite samples, limiting results like the central limit theorem describe the sample statistic's limiting distribution, if one exists. Limiting results are not statements about finite samples, and indeed are logically irrelevant to finite samples.[15] However, the asymptotic theory of limiting distributions is often invoked for work in estimation and testing. For example, limiting results are often invoked to justify the generalized method of moments and the use of generalized estimating equations, which are popular in e.g. econometrics and biostatistics. The magnitude of the difference between the limiting distribution and the true distribution (formally, the 'error' of the approximation) can be assessed using simulation.[16] The use of limiting results in this way works well in many applications, especially with low-dimensional models with log-concave likelihoods (such as with one-parameter exponential families).
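Such a simulation check can be sketched directly. The settings below are illustrative: the simulated sampling distribution of the mean of Exponential(1) draws is compared against the centre and spread the central limit theorem predicts:

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

def sample_mean_distribution(n, reps=2000):
    """Simulate the sampling distribution of the mean of n Exponential(1)
    draws, and summarize it by its centre and spread."""
    means = [statistics.fmean(random.expovariate(1.0) for _ in range(n))
             for _ in range(reps)]
    return statistics.fmean(means), statistics.stdev(means)

# The CLT predicts the simulated means centre near 1 with spread near 1/sqrt(n).
center, spread = sample_mean_distribution(n=50)
```

Comparing `center` and `spread` against the theoretical values 1 and 1/sqrt(50) gives a direct, if crude, assessment of the approximation error at this sample size.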


Randomization-based models
For a given dataset that was produced by a randomization design, the randomization distribution of a statistic (under the null hypothesis) is defined by evaluating the test statistic for all of the plans that could have been generated by the randomization design. In frequentist inference, randomization allows inferences to be based on the randomization distribution rather than a subjective model, and this is important especially in survey sampling and design of experiments.[17] [18] Statistical inference from randomized studies is also more straightforward than in many other situations.[19] [20] [21] In Bayesian inference, randomization is also of importance: in survey sampling, sampling without replacement ensures the exchangeability of the sample with the population; in randomized experiments, randomization warrants a missing-at-random assumption for covariate information.[22]

Objective randomization allows properly inductive procedures.[23] [24] [25] [26] Many statisticians prefer randomization-based analysis of data that was generated by well-defined randomization procedures.[27] (However, it is true that in fields of science with developed theoretical knowledge and experimental control, randomized experiments may increase the costs of experimentation without improving the quality of inferences.[28] [29]) Similarly, results from randomized experiments are recommended by leading statistical authorities as allowing inferences with greater reliability than do observational studies of the same phenomena.[30] However, a good observational study may be better than a bad randomized experiment. The statistical analysis of a randomized experiment may be based on the randomization scheme stated in the experimental protocol and does not need a subjective model.[31] [32] However, not all hypotheses can be tested by randomized experiments or random samples, which often require a large budget, a lot of expertise and time, and may raise ethical problems.
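The randomization distribution described above can be approximated by re-running the random assignment many times. A minimal sketch of a two-group randomization (permutation) test on hypothetical data:

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical outcomes from a randomized two-group experiment
treatment = [5.2, 6.1, 5.8, 6.4, 5.9]
control = [4.8, 5.0, 5.1, 4.7, 5.2]
observed = statistics.fmean(treatment) - statistics.fmean(control)

# Re-randomize the group labels to approximate the randomization
# distribution of the difference in means under the null hypothesis.
pooled = treatment + control
reps = 10000
count = 0
for _ in range(reps):
    random.shuffle(pooled)
    diff = statistics.fmean(pooled[:5]) - statistics.fmean(pooled[5:])
    if diff >= observed:
        count += 1
p_value = count / reps
```

Here the p-value is computed from the randomization scheme itself, not from a parametric model; with only 10 units one could even enumerate all 252 possible assignments exactly.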
Model-based analysis of randomized experiments

It is standard practice to refer to a statistical model, often a normal linear model, when analyzing data from randomized experiments. However, the randomization scheme guides the choice of a statistical model, and it is not possible to choose an appropriate model without knowing the randomization scheme.[33] Seriously misleading results can be obtained by analyzing data from randomized experiments while ignoring the experimental protocol; common mistakes include forgetting the blocking used in an experiment and confusing repeated measurements on the same experimental unit with independent replicates of the treatment applied to different experimental units.[34]
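The blocking mistake mentioned above can be made concrete. In this hypothetical paired experiment (illustrative numbers), ignoring the pairing lets the block-to-block variation inflate the standard error and shrink the test statistic:

```python
import math
import statistics

# Hypothetical paired experiment: each block (e.g. one subject) receives
# both treatments, so outcomes within a block are correlated.
treatment = [12.1, 14.3, 11.8, 15.2, 13.5, 12.9]
control = [11.5, 13.6, 11.2, 14.5, 12.8, 12.1]
n = len(treatment)

# Correct analysis respects the blocking: work with within-block differences.
diffs = [t - c for t, c in zip(treatment, control)]
t_paired = statistics.fmean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))

# Mistaken analysis ignores the pairing and treats the groups as independent,
# so the large block-to-block variation dominates the standard error.
se_indep = math.sqrt(statistics.variance(treatment) / n
                     + statistics.variance(control) / n)
t_unpaired = (statistics.fmean(treatment) - statistics.fmean(control)) / se_indep
```

The paired statistic is many times larger than the unpaired one on the same data: the treatment effect is consistent within blocks but small relative to the variation between blocks.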



Modes of inference
Different schools of statistical inference have become established. These schools (or 'paradigms') are not mutually-exclusive, and methods which work well under one paradigm often have attractive interpretations under other paradigms. The two main paradigms in use are frequentist and Bayesian inference, which are both summarized below.

Frequentist inference
This paradigm calibrates the production of propositions by considering (notional) repeated sampling of datasets similar to the one at hand. By considering its characteristics under repeated sampling, the frequentist properties of any statistical inference procedure can be described, although in practice this quantification may be challenging.

Examples of frequentist inference
- P-value
- Confidence interval

Frequentist inference, objectivity, and decision theory

Frequentist inference calibrates procedures, such as tests of hypotheses and constructions of confidence intervals, in terms of frequency probability; that is, in terms of repeated sampling from a population. (In contrast, Bayesian inference calibrates procedures with regard to epistemological uncertainty, described as a probability measure.) The frequentist calibration of procedures can be done without regard to utility functions. However, some elements of frequentist statistics, such as statistical decision theory, do incorporate utility functions. In particular, frequentist developments of optimal inference (such as minimum-variance unbiased estimators, or uniformly most powerful testing) make use of loss functions, which play the role of (negative) utility functions. Loss functions must be explicitly stated for statistical theorists to prove that a statistical procedure has an optimality property. For example, median-unbiased estimators are optimal under absolute-value loss functions, and least squares estimators are optimal under squared-error loss functions. While statisticians using frequentist inference must choose for themselves the parameters of interest and the estimators/test statistics to be used, the absence of obviously explicit utilities and prior distributions has helped frequentist procedures to become widely viewed as 'objective'.
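The repeated-sampling calibration of a frequentist procedure can itself be checked by simulation. A sketch (illustrative settings) estimating the coverage of the usual normal-theory interval for a mean:

```python
import math
import random
import statistics

random.seed(2)  # fixed seed so the sketch is reproducible

def covers(true_mean, n=30, z=1.96):
    """Draw one dataset and check whether the usual normal-theory
    interval for the mean captures the true value."""
    data = [random.gauss(true_mean, 1.0) for _ in range(n)]
    xbar = statistics.fmean(data)
    half_width = z * statistics.stdev(data) / math.sqrt(n)
    return abs(xbar - true_mean) <= half_width

# Frequentist calibration: over many repeated samples, roughly 95% of
# the intervals should contain the true mean.
coverage = sum(covers(10.0) for _ in range(2000)) / 2000
```

If the procedure is well calibrated, `coverage` sits near the nominal 95% level; systematic shortfalls would indicate a calibration problem (here, a small one from using z = 1.96 rather than the t quantile).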

Bayesian inference
The Bayesian calculus describes degrees of belief using the 'language' of probability; beliefs are positive, integrate to one, and obey probability axioms. Bayesian inference uses the available posterior beliefs as the basis for making statistical propositions. There are several different justifications for using the Bayesian approach.

Examples of Bayesian inference
- Credible intervals for interval estimation
- Bayes factors for model comparison

Bayesian inference, subjectivity and decision theory

Many informal Bayesian inferences are based on "intuitively reasonable" summaries of the posterior. For example, the posterior mean, median and mode, highest posterior density intervals, and Bayes factors can all be motivated in this way. While a user's utility function need not be stated for this sort of inference, these summaries do all depend (to some extent) on stated prior beliefs, and are generally viewed as subjective conclusions. (Methods of prior construction which do not require external input have been proposed but not yet fully developed.)
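The posterior summaries described above can be computed directly in a conjugate sketch (hypothetical data, Beta-Binomial model):

```python
import random

random.seed(3)  # fixed seed so the sketch is reproducible

# Hypothetical data: 7 successes in 20 trials, with a uniform Beta(1, 1) prior.
prior_a, prior_b = 1.0, 1.0
successes, trials = 7, 20
post_a = prior_a + successes              # conjugate Beta-Binomial update
post_b = prior_b + (trials - successes)

# Summarize posterior belief by simulation: posterior mean and a
# central 95% credible interval for the success probability.
n_draws = 20000
draws = sorted(random.betavariate(post_a, post_b) for _ in range(n_draws))
post_mean = sum(draws) / n_draws
cred_lo = draws[int(0.025 * n_draws)]
cred_hi = draws[int(0.975 * n_draws)]
```

The credible interval is a direct statement of posterior belief: given the prior and the data, the success probability lies in `(cred_lo, cred_hi)` with posterior probability 0.95, in contrast to the repeated-sampling reading of a confidence interval.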

Formally, Bayesian inference is calibrated with reference to an explicitly stated utility, or loss function; the 'Bayes rule' is the one which maximizes expected utility, averaged over the posterior uncertainty. Formal Bayesian inference therefore automatically provides optimal decisions in a decision-theoretic sense. Given assumptions, data and utility, Bayesian inference can be made for essentially any problem, although not every statistical inference need have a Bayesian interpretation. Analyses which are not formally Bayesian can be (logically) incoherent; a feature of Bayesian procedures which use proper priors (i.e. those integrable to one) is that they are guaranteed to be coherent. Some advocates of Bayesian inference assert that inference must take place in this decision-theoretic framework, and that Bayesian inference should not conclude with the evaluation and summarization of posterior beliefs.


Other modes of inference (besides frequentist and Bayesian)

Information and computational complexity

Other forms of statistical inference have been developed from ideas in information theory[35] and the theory of Kolmogorov complexity.[36] For example, the minimum description length (MDL) principle selects statistical models that maximally compress the data; inference proceeds without assuming counterfactual or non-falsifiable 'data-generating mechanisms' or probability models for the data, as might be done in frequentist or Bayesian approaches. However, if a 'data-generating mechanism' does exist in reality, then according to Shannon's source coding theorem it provides the MDL description of the data, on average and asymptotically.[37] In minimizing description length (or descriptive complexity), MDL estimation is similar to maximum likelihood estimation and maximum a posteriori estimation (using maximum-entropy Bayesian priors). However, MDL avoids assuming that the underlying probability model is known; the MDL principle can also be applied without assumptions that e.g. the data arose from independent sampling.[37] [38] The MDL principle has been applied in communication-coding theory in information theory, in linear regression, and in time-series analysis (particularly for choosing the degrees of the polynomials in autoregressive moving average (ARMA) models).[38]

Information-theoretic statistical inference has been popular in data mining, which has become a common approach for very large observational and heterogeneous datasets made possible by the computer revolution and the internet.[36] The evaluation of statistical inferential procedures often uses techniques or criteria from computational complexity theory or numerical analysis.[39]

Fiducial inference

Fiducial inference was an approach to statistical inference based on fiducial probability, also known as a "fiducial distribution".
In subsequent work, this approach has been recognized as being ill-defined, extremely limited in applicability, and even fallacious.[40]

Structural inference

Developing ideas of Fisher and of Pitman from 1938-1939,[41] George A. Barnard developed "structural inference" or "pivotal inference",[42] an approach using invariant probabilities on group families. Barnard reformulated the arguments behind fiducial inference on a restricted class of models on which "fiducial" procedures would be well-defined and useful.



Inference topics
The topics below are usually included in the area of statistical inference.
1. Statistical assumptions
2. Statistical decision theory
3. Estimation theory
4. Statistical hypothesis testing
5. Revising opinions in statistics
6. Design of experiments, the analysis of variance, and regression
7. Survey sampling
8. Summarizing statistical data

See also
- Predictive inference
- Induction (philosophy)
- Philosophy of statistics
- Algorithmic inference

References

- Bickel, Peter J.; Doksum, Kjell A. (2001). Mathematical Statistics: Basic and Selected Topics. Volume 1 (second ed., updated printing 2007). Pearson Prentice-Hall. MR 443141.
- Cox, D. R. (2006). Principles of Statistical Inference. CUP. ISBN 0-521-68567-2.
- Fisher, Ronald (1955). "Statistical methods and scientific induction". J. Roy. Statist. Soc. Ser. B 17: 69-78. (criticism of statistical theories of Jerzy Neyman and Abraham Wald)
- Freedman, David A. (2009). Statistical Models: Theory and Practice [43] (revised ed.). Cambridge University Press. pp. xiv+442. MR 2489600. ISBN 978-0-521-74385-3.
- Hansen, Mark H.; Yu, Bin (June 2001). "Model Selection and the Principle of Minimum Description Length: Review paper" [44]. Journal of the American Statistical Association 96 (454): pp. 746-774. JSTOR 2670311. MR 1939352.
- Kolmogorov, Andrei N. (1963). "On Tables of Random Numbers". Sankhyā Ser. A 25: pp. 369-375. MR 178484.
- Kolmogorov, Andrei N. (1998). "On Tables of Random Numbers". Theoretical Computer Science 207 (2): pp. 387-395. doi:10.1016/S0304-3975(98)00075-9. MR 1643414.
- Neyman, Jerzy (1956). "Note on an Article by Sir Ronald Fisher" [45]. Journal of the Royal Statistical Society. Series B (Methodological) 18 (2): pp. 288-294. JSTOR 2983716. (reply to Fisher 1955)
- Peirce, C. S. (1877-1878), "Illustrations of the Logic of Science" (series), Popular Science Monthly, vols. 12-13. Relevant individual papers:
  - (1878 March), "The Doctrine of Chances", Popular Science Monthly, v. 12, March issue, pp. 604 [46]-615. Reprinted (CLL 61-81), (CP 2.645-668), (W 3:276-290), (EP 1:142-154). Internet Archive Eprint [47]. Selections plus CP 2.661-668 and CP 2.758, published as "The Doctrine of Chances With Later Reflections", PWP 157-173.
  - (1878 April), "The Probability of Induction", Popular Science Monthly, v. 12, pp. 705 [48]-718. Reprinted (CLL 82-105), (CP 2.669-693), (PWP 174-189), (EP 1:155-169). Internet Archive Eprint [49].
  - (1878 June), "The Order of Nature", Popular Science Monthly, v. 13, pp. 203 [50]-217. Reprinted (CLL 106-130), (CP 6.395-427), (EP 1:170-185). Internet Archive Eprint [51].
  - (1878 August), "Deduction, Induction, and Hypothesis", Popular Science Monthly, v. 13, pp. 470 [52]-482. Reprinted (CLL 131-156), (CP 2.619-644), (EP 1:186-199). Internet Archive Eprint [53].

- Peirce, C. S. (1883), "A Theory of Probable Inference", Studies in Logic, pp. 126-181 [54], Little, Brown, and Company. (Reprinted 1983, John Benjamins Publishing Company, ISBN 9027232717) (W 4:408-453)
- Pfanzagl, Johann; with the assistance of R. Hamböker (1994). Parametric Statistical Theory. Berlin: Walter de Gruyter. MR 1291393. ISBN 3-11-013863-8, 3-11-014030-6.
- Rissanen, Jorma (1989). Stochastic Complexity in Statistical Inquiry. Series in Computer Science. 15. Singapore: World Scientific. MR 1082556.
- Soofi, Ehsan S. (December 2000). "Principal Information-Theoretic Approaches (Vignettes for the Year 2000: Theory and Methods, ed. by George Casella)" [55]. Journal of the American Statistical Association 95 (452): pp. 1349-1353. JSTOR 2669786. MR 1825292.
- Zabell, S. L. (August 1992). "R. A. Fisher and the Fiducial Argument" [56]. Statistical Science 7 (3): pp. 369-387. JSTOR 2246073.


Further reading
- Casella, G., Berger, R. L. (2001). Statistical Inference. Duxbury Press. ISBN 0534243126.
- Freedman, David A. (1991). "Statistical Models and Shoe Leather". Sociological Methodology, vol. 21, pp. 291-313.
- Freedman, David A. (2010). Statistical Models and Causal Inference: A Dialogue with the Social Sciences. Edited by David Collier, Jasjeet S. Sekhon, and Philip B. Stark. Cambridge University Press.
- Hinkelmann, Klaus and Kempthorne, Oscar (2008). Design and Analysis of Experiments, Volume I: Introduction to Experimental Design [57] (Second ed.). Wiley [58]. ISBN 978-0-471-72756-9.
- Kruskal, William (December 1988). "Miracles and Statistics: The Casual Assumption of Independence (ASA Presidential address)" [59]. Journal of the American Statistical Association 83 (404): pp. 929-940. JSTOR 2290117.
- Lenhard, Johannes (2006). "Models and Statistical Inference: The Controversy between Fisher and Neyman-Pearson". British Journal for the Philosophy of Science, Vol. 57, Issue 1, pp. 69-91.
- Lindley, D. (1958). "Fiducial distribution and Bayes' theorem". Journal of the Royal Statistical Society, Series B, 20, 102-107.
- Sudderth, William D. (1994). "Coherent Inference and Prediction in Statistics", in Dag Prawitz, Bryan Skyrms, and Dag Westerståhl (eds.), Logic, Methodology and Philosophy of Science IX: Proceedings of the Ninth International Congress of Logic, Methodology and Philosophy of Science, Uppsala, Sweden, August 7-14, 1991. Amsterdam: Elsevier.
- Trusted, Jennifer (1979). The Logic of Scientific Inference: An Introduction. London: The Macmillan Press, Ltd.
- Young, G. A., Smith, R. L. (2005). Essentials of Statistical Inference. CUP. ISBN 0-521-83971-8.

External links
MIT OpenCourseWare [60]: Statistical Inference

[1] Upton, G., Cook, I. (2008) Oxford Dictionary of Statistics, OUP. ISBN 978-0-19-954145-4
[2] Dodge, Y. (2003) The Oxford Dictionary of Statistical Terms, OUP. ISBN 0-19-920613-9 (entry for "inferential statistics")
[3] According to Peirce, acceptance means that inquiry on this question ceases for the time being. In science, all scientific theories are revisable.
[4] Cox (2006), page 2
[5] van der Vaart, A. W. (1998) Asymptotic Statistics. Cambridge University Press. ISBN 0-521-78450-6 (page 341)
[6] Kruskal, William (December 1988). "Miracles and Statistics: The Casual Assumption of Independence (ASA Presidential address)" (http:/ / www. jstor. org/ stable/ 2290117). Journal of the American Statistical Association 83 (404): pp. 929-940. JSTOR 2290117

[7] Freedman, D.A. (2008) "Survival analysis: An Epidemiological hazard?". The American Statistician (2008) 62: 110-119. (Reprinted as Chapter 11 (pages 169192) of: Freedman, D.A. (2010) Statistical Models and Causal Inferences: A Dialogue with the Social Sciences (Edited by David Collier, Jasjeet S. Sekhon, and Philip B. Stark.) Cambridge University Press. ISBN 9780521123907)

Statistical inference
[8] Berk, R. (2003) Regression Analysis: A Constructive Critique (Advanced Quantitative Techniques in the Social Sciences) (v. 11) Sage Publications. ISBN 0-761-92904-5 [9] Page 6 in Brewer, Ken (2002). Combined Survey Sampling Inference: Weighing of Basu's Elephants. Hodder Arnold. ISBN0340692294, 978-0340692295.: In particular, a normal distribution "would be a totally unrealistic and catastraphoicaly unwise assumption to make if we were dealing with any kind of economic population" (page 6 again). [10] Jrgen Hoffman-Jrgensen's Probability With a View Towards Statistics, Volume I. Page 399 [11] Op. cit. [12] Lucien Le Cam. Asymptotic Methods of Statistical Decision Theory. [13] Erik Torgerson (1991) Comparison of Statistical Experiments, volume 36 of Encyclopedia of Mathematics. Cambridge University Press. [14] Liese, Friedrich and Miescke, Klaus-J. (2008). Statistical Decision Theory: Estimation, Testing, and Selection. Springer. [15] Kolmogorov, Andrei N. (1963). "On Tables of Random Numbers". Sankhy Ser. A. 25: pp.369375. (Page 369): "The frequency concept, based on the notion of limiting frequency as the number of trials increases to infinity, does not contribute anything to substantiate the applicability of the results of probability theory to real practical problems where we have always to deal with a finite number of trials". (page 369) Lucien Le Cam. Asymptotic Methods of Statistical Decision Theory: "Indeed, limit theorems 'as tends to infinity' are logically devoid of content about what happens at any particular . All they can do is suggest [definite] approaches whose performance must then be checked on the case at hand." (page xiv) Pfanzagl, Johann; with the assistance of R. Hambker (1994). Parametric Statistical Theory. Walter de Gruyter. MR1291393. ISBN3-11-01-3863-8.: "The crucial drawback of asymptotic theory: What we [wish] from asymptotic theory are results which hold approximately . . . . 
What asymptotic theory has to offer are limit theorems."(page ix) "What counts for applications are approximations, not limits." (page 188) [16] Pfanzagl, Johann; with the assistance of R. Hambker (1994). Parametric Statistical Theory. Walter de Gruyter. MR1291393. ISBN3-11-01-3863-8.: "By taking a limit theorem as being approximately true for large sample sizes, we commit an error the size of which is unknown. [. . .] Realistic information about the remaining errors may be obtained by simulations." (page ix) [17] Jerzy Neyman. "On the Two Different Aspects of the Representative Method: The Method of Stratified Sampling and the Method of Purposive Selection" given at the Royal Statistical Society on 19 June 1934. [18] Hinkelmann and Kempthorne. [19] ASA Guidelines for a first course in statistics for non-statisticians. (available at the ASA website) [20] David A. Freedman et alia's Statistics. [21] David S. Moore and George McCabe. Introduction to the Practice of Statistics. [22] Gelman, Rubin. Bayesian Data Analysis. [23] Peirce (1877-1878) [24] Peirce (1883) [25] David Freedman et alia Statistics and David A. Freedman Statistical Models. [26] Rao, C.R. (1997) Statistics and Truth: Putting Chance to Work, World Scientific. ISBN 9810231113 [27] Peirce, Freedman, Moore and McCabe. [28] Box, G.E.P. and Friends (2006) Improving Almost Anything: Ideas and Essays, Revised Edition, Wiley. ISBN 978-0-471-72755-2 [29] Cox (2006), page 196 [30] ASA Guidelines for a first course in statistics for non-statisticians. (available at the ASA website) David A. Freedman et alia's Statistics. David S. Moore and George McCabe. Introduction to the Practice of Statistics. [31] Neyman, Jerzy. 1923 [1990]. On the Application of Probability Theory to AgriculturalExperiments. Essay on Principles. Section 9. Statistical Science 5 (4): 465472. Trans. Dorota M. Dabrowska and Terence P. Speed. [32] Hinkelmann, Klaus and Kempthorne, Oscar (2008). 
Design and Analysis of Experiments, Volume I: Introduction to Experimental Design (http:/ / books. google. com/ books?id=T3wWj2kVYZgC& printsec=frontcover& hl=sv& source=gbs_book_other_versions_r& cad=4_0) (Second ed.). Wiley (http:/ / eu. wiley. com/ WileyCDA/ WileyTitle/ productCd-0471727563. html). ISBN978-0-471-72756-9. [33] Hinkelmann and Kempthorne. [34] Hinkelmann and Kempthorne, chapter 6. Bailey, etc. [35] Soofi (2000) [36] Hansen & Yu (2001) [37] Hansen and Yu (2001), page 747. [38] Rissanen (1989), page 84 [39] Joseph F. Traub, G. W. Wasilkowski, and H. Wozniakowski. Judin and Nemirovski. [40] Neyman 1956. Zabell. [41] Davison, op. cit. page 12. [42] Barnard, G.A. (1995) "Pivotal Models and the Fiducial Argument", International Statistical Review, 63 (3), 309323. Stable URL: http:/ / www. jstor. org/ stable/ 1403482 [43] http:/ / www. cambridge. org/ catalogue/ catalogue. asp?isbn=9780521743853 [44] http:/ / www. jstor. org/ stable/ 2670311


Statistical inference
[45] [46] [47] [48] [49] [50] [51] [52] [53] [54] [55] [56] [57] [58] [59] [60] http:/ / www. jstor. org/ stable/ 2983716 http:/ / books. google. com/ books?id=ZKMVAAAAYAAJ& jtp=604 http:/ / www. archive. org/ stream/ popscimonthly12yoummiss#page/ 612/ mode/ 1up http:/ / books. google. com/ books?id=ZKMVAAAAYAAJ& jtp=705 http:/ / www. archive. org/ stream/ popscimonthly12yoummiss#page/ 715/ mode/ 1up http:/ / books. google. com/ books?id=u8sWAQAAIAAJ& jtp=203 http:/ / www. archive. org/ stream/ popularsciencemo13newy#page/ 203/ mode/ 1up http:/ / books. google. com/ books?id=u8sWAQAAIAAJ& jtp=470 http:/ / www. archive. org/ stream/ popularsciencemo13newy#page/ 470/ mode/ 1up http:/ / books. google. com/ books?id=V7oIAAAAQAAJ& pg=PA126 http:/ / www. jstor. org/ stable/ 2669786 http:/ / www. jstor. org/ stable/ 2246073 http:/ / books. google. com/ books?id=T3wWj2kVYZgC& printsec=frontcover& hl=sv& source=gbs_book_other_versions_r& cad=4_0 http:/ / eu. wiley. com/ WileyCDA/ WileyTitle/ productCd-0471727563. html http:/ / www. jstor. org/ stable/ 2290117 http:/ / ocw. mit. edu/ OcwWeb/ Mathematics/ 18-441Statistical-InferenceSpring2002/ CourseHome/


Statistical hypothesis testing

A statistical hypothesis test is a method of making decisions using experimental data. In statistics, a result is called statistically significant if it is unlikely to have occurred by chance. The phrase "test of significance" was coined by Ronald Fisher: "Critical tests of this kind may be called tests of significance, and when such tests are available we may discover whether a second sample is or is not significantly different from the first."[1]

Hypothesis testing is sometimes called confirmatory data analysis, in contrast to exploratory data analysis. In frequency probability, these decisions are almost always made using null-hypothesis tests, i.e., tests that answer the question: assuming that the null hypothesis is true, what is the probability of observing a value for the test statistic that is at least as extreme as the value that was actually observed?[2]

One use of hypothesis testing is deciding whether experimental results contain enough information to cast doubt on conventional wisdom. Statistical hypothesis testing is a key technique of frequentist statistical inference, and is widely used, but also much criticized. While controversial,[3] the Bayesian approach to hypothesis testing is to base rejection of the hypothesis on the posterior probability.[4] Other approaches to reaching a decision based on data are available via decision theory and optimal decisions.

The critical region of a hypothesis test is the set of all outcomes which, if they occur, will lead us to decide that there is a difference, that is, cause the null hypothesis to be rejected in favor of the alternative hypothesis. The critical region is usually denoted by C. The following examples should solidify these ideas.

Example 1 - Court Room Trial

A statistical test procedure is comparable to a criminal trial: a defendant is considered innocent as long as his guilt is not proven. The prosecutor tries to prove the guilt of the defendant, and only when there is enough incriminating evidence is the defendant convicted.

At the start of the procedure there are two hypotheses, H0: "the defendant is innocent" and H1: "the defendant is guilty". The first, called the null hypothesis, is for the time being accepted; the second, called the alternative hypothesis, is the hypothesis one tries to prove. The hypothesis of innocence is rejected only when an error is very unlikely, because one does not want to convict an innocent defendant. Such an error is called an error of the first kind (i.e., the conviction of an innocent person), and its occurrence is controlled to be rare. As a consequence of this asymmetric behaviour, the error of the second kind (setting free a guilty person) is often rather large.
                                    Accept H0 (let him go)              Reject H0 (guilty - hang 'em high)
  H0 true: he truly is innocent     GOOD                                BAD - incorrectly reject the null:
                                                                        Type I error (false positive)
  H1 true: he truly is guilty       BAD - incorrectly accept the null:  GOOD
                                    Type II error (false negative)

Example 2 - Clairvoyant Card Game

A person (the subject) is tested for clairvoyance. He is shown the reverse of a randomly chosen playing card 25 times and asked which of the four suits it belongs to. The number of hits, or correct answers, is called X.

As we try to find evidence of his clairvoyance, for the time being the null hypothesis is that the person is not clairvoyant. The alternative is, of course: the person is (more or less) clairvoyant.

If the null hypothesis is valid, the only thing the test person can do is guess. For every card, the probability (relative frequency) of guessing correctly is 1/4. If the alternative is valid, the test subject will predict the suit correctly with probability greater than 1/4. We will call the probability of guessing correctly p. The hypotheses, then, are:

    H0: p = 1/4   (null hypothesis: just guessing)
    H1: p > 1/4   (alternative hypothesis: true clairvoyant)

When the test subject correctly predicts all 25 cards, we will consider him clairvoyant, and reject the null hypothesis; thus also with 24 or 23 hits. With only 5 or 6 hits, on the other hand, there is no cause to consider him so. But what about 12 hits, or 17 hits? What is the critical number, c, of hits, at which point we consider the subject to be clairvoyant, versus coincidental? How do we determine the critical value c? It is obvious that with the choice c = 25 (i.e., we only accept clairvoyance when all cards are predicted correctly) we're more critical than with c = 10. In the first case almost no test subjects will be recognised to be clairvoyant; in the second case, some number more will pass the test. In practice, one decides how critical one will be. That is, one decides how often one accepts an error of the first kind, a false positive, or Type I error. With c = 25 the probability of such an error is:

    P(reject H0 | H0 valid) = P(X = 25 | p = 1/4) = (1/4)^25 ≈ 10^-15

Hence, very small: the probability of a false positive is the probability of randomly guessing correctly all 25 times. Less critical, with c = 10, gives:

    P(reject H0 | H0 valid) = P(X >= 10 | p = 1/4) ≈ 0.07

which yields a much greater probability of false positive.

Before the test is actually performed, the desired probability of a Type I error is determined. Typically, values in the range of 1% to 5% are selected. Depending on this desired Type I error rate, the critical value c is calculated. For example, if we select an error rate of 1%, c is calculated thus:

    P(reject H0 | H0 valid) = P(X >= c | p = 1/4) <= 0.01

From all the numbers c with this property, we choose the smallest, in order to minimize the probability of a Type II error, a false negative. For the above example, we select c = 13.
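The search for the smallest critical value c can be carried out directly from the binomial distribution. A minimal sketch in plain Python (the function names are illustrative, not from any library):

```python
from math import comb

def binom_tail(c, n=25, p=0.25):
    """P(X >= c) for X ~ Binomial(n, p): the Type I error rate if we declare
    clairvoyance at c or more hits while the subject is in fact guessing."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c, n + 1))

def critical_value(alpha, n=25, p=0.25):
    """Smallest cutoff c whose false-positive probability stays below alpha."""
    for c in range(n + 1):
        if binom_tail(c, n, p) <= alpha:
            return c

print(critical_value(0.01))       # smallest c with tail probability <= 1%
print(round(binom_tail(10), 3))   # false-positive rate of the looser cutoff c = 10
```

Running this confirms the numbers in the example: the 1% criterion gives c = 13, while the looser cutoff c = 10 lets roughly 7% of pure guessers pass.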
                                         Accept H0 (you believe the subject      Reject H0 (you believe the subject
                                         is just guessing; send him home          is clairvoyant; take him to the
                                         to Mama!)                                horse races!)
  H0 true: subject is just guessing      A: GOOD                                  B: BAD - incorrectly reject the null:
                                                                                  Type I error (false positive)
  H1 true: subject is truly a gifted     C: BAD - incorrectly accept the null:    D: GOOD
  clairvoyant                            Type II error (false negative)
Note that the sum of all four possible outcomes must be 1. A + B + C + D = 1. When designing a statistical test, one wants to maximize the good probabilities (in this case A and D) and minimize the bad probabilities (B and C).

Example 3 - Radioactive Suit Case

As an example, consider determining whether a suitcase contains some radioactive material. Placed under a Geiger counter, it produces 10 counts per minute. The null hypothesis is that no radioactive material is in the suitcase and that all measured counts are due to ambient radioactivity typical of the surrounding air and harmless objects. We can then calculate how likely it is that we would observe 10 counts per minute if the null hypothesis were true.

If the null hypothesis predicts (say) on average 9 counts per minute with a standard deviation of 1 count per minute, then we say that the suitcase is compatible with the null hypothesis (this does not guarantee that there is no radioactive material, just that we don't have enough evidence to suggest there is). On the other hand, if the null hypothesis predicts 3 counts per minute with a standard deviation of 1 count per minute, then the suitcase is not compatible with the null hypothesis, and there are likely other factors responsible for the measurements.

The test described here is more fully the null-hypothesis statistical significance test. The null hypothesis represents what we would believe by default, before seeing any evidence. Statistical significance is a possible finding of the test, declared when the observed sample is unlikely to have occurred by chance if the null hypothesis were true. The name of the test describes its formulation and its possible outcome. One characteristic of the test is its crisp decision: to reject or not reject the null hypothesis. A calculated value is compared to a threshold, which is determined from the tolerable risk of error.
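The compatibility judgement in this example amounts to asking how many standard deviations the observation lies from the null prediction. A minimal sketch (the observation and the two hypothetical backgrounds are the article's numbers; the 3-sigma cutoff is a common convention used here for illustration, not something the text prescribes):

```python
def standard_score(observed, null_mean, null_sd):
    """How many standard deviations the observation lies above the null mean."""
    return (observed - null_mean) / null_sd

# Background of 9 +/- 1 counts/min: observing 10 is unremarkable.
z_quiet = standard_score(10, 9, 1)   # 1.0 -> compatible with H0

# Background of 3 +/- 1 counts/min: observing 10 is far out in the tail.
z_hot = standard_score(10, 3, 1)     # 7.0 -> not compatible with H0

for z in (z_quiet, z_hot):
    print("compatible" if abs(z) < 3 else "incompatible")
```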
                                       Accept H0 (TSA believes the suitcase     Reject H0 (TSA believes the suitcase
                                       contains only approved items;            contains radioactive material;
                                       allow person to board aircraft)          detain at airport security)
  H0 true: suitcase contains only      GOOD                                     BAD - incorrectly reject the null:
  approved items (clothes,                                                      Type I error (false positive).
  toothpaste, shoes...)                                                         Cavity searches; scandal ensues;
                                                                                lawsuits to follow.
  H1 true: suitcase contains           BAD - incorrectly accept the null:       GOOD
  radioactive material                 Type II error (false negative).
                                       Terrorist gets on board!

Again, the designer of a statistical test wants to maximize the good probabilities and minimize the bad probabilities.



The testing process

Hypothesis testing is defined by the following general procedure:

1. The first step in any hypothesis test is to state the relevant null and alternative hypotheses to be tested. This is important, as mis-stating the hypotheses will muddy the rest of the process.
2. The second step is to consider the statistical assumptions being made about the sample in doing the test; for example, assumptions about the statistical independence or the form of the distributions of the observations. This is equally important, as invalid assumptions will mean that the results of the test are invalid.
3. Decide which test is appropriate, and state the relevant test statistic T.
4. Derive the distribution of the test statistic under the null hypothesis from the assumptions. In standard cases this will be a well-known result; for example, the test statistic may follow a Student's t distribution or a normal distribution.
5. The distribution of the test statistic partitions the possible values of T into those for which the null hypothesis is rejected (the so-called critical region) and those for which it is not.
6. Compute from the observations the observed value t_obs of the test statistic T.
7. Decide either to fail to reject the null hypothesis or to reject it in favor of the alternative. The decision rule is to reject the null hypothesis H0 if the observed value t_obs is in the critical region, and to accept or "fail to reject" the hypothesis otherwise.

It is important to note the philosophical difference between accepting the null hypothesis and simply failing to reject it. The "fail to reject" terminology highlights the fact that the null hypothesis is assumed to be true from the start of the test; if there is a lack of evidence against it, it simply continues to be assumed true. The phrase "accept the null hypothesis" may suggest it has been proved simply because it has not been disproved, a logical fallacy known as the argument from ignorance.
Unless a test with particularly high power is used, the idea of "accepting" the null hypothesis may be dangerous. Nonetheless the terminology is prevalent throughout statistics, where its meaning is well understood.
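The general procedure can be traced in a small worked example. The data, the hypothesised mean, and the known standard deviation below are invented for illustration; the test itself is the standard one-sample z-test:

```python
from math import erfc, sqrt

# Step 1: H0: mu = 12.0 vs H1: mu != 12.0 (two-sided), for some measured quantity.
mu_0 = 12.0
# Step 2: assume independent observations from a normal population with known sigma.
sigma = 0.5
sample = [12.3, 12.1, 12.4, 11.9, 12.5, 12.2, 12.6, 12.0]   # invented data
n = len(sample)

# Steps 3-4: the z statistic, which follows a standard normal under H0.
x_bar = sum(sample) / n
z_obs = (x_bar - mu_0) / (sigma / sqrt(n))

# Step 5: critical region |z| > 1.96 for a 5% significance level (two-sided).
critical = 1.96

# Steps 6-7: compute the observed statistic and decide.
p_value = erfc(abs(z_obs) / sqrt(2))   # two-sided tail probability
decision = "reject H0" if abs(z_obs) > critical else "fail to reject H0"
print(round(z_obs, 3), round(p_value, 4), decision)
```

Here z is about 1.414, well inside the acceptance region, so by the decision rule of step 7 we fail to reject H0.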

Definition of terms
The following definitions are mainly based on the exposition in the book by Lehmann and Romano:[5]

Simple hypothesis
    Any hypothesis which specifies the population distribution completely.
Composite hypothesis
    Any hypothesis which does not specify the population distribution completely.
Statistical test
    A decision function that takes its values in the set of hypotheses.
Region of acceptance
    The set of values for which we fail to reject the null hypothesis.
Region of rejection / Critical region
    The set of values of the test statistic for which the null hypothesis is rejected.
Power of a test (1 - β)
    The test's probability of correctly rejecting the null hypothesis. The complement of the false negative rate, β.
Size / Significance level of a test (α)
    For simple hypotheses, this is the test's probability of incorrectly rejecting the null hypothesis: the false positive rate. For composite hypotheses this is the upper bound of the probability of rejecting the null hypothesis over all cases covered by the null hypothesis.

Most powerful test
    For a given size or significance level, the test with the greatest power.
Uniformly most powerful test (UMP)
    A test with the greatest power for all values of the parameter being tested.
Consistent test
    When considering the properties of a test as the sample size grows, a test is said to be consistent if, for a fixed size of test, the power against any fixed alternative approaches 1 in the limit.[6]
Unbiased test
    For a specific alternative hypothesis, a test is said to be unbiased when the probability of rejecting the null hypothesis is not less than the significance level when the alternative is true, and is less than or equal to the significance level when the null hypothesis is true.
Uniformly most powerful unbiased (UMPU)
    A test which is UMP in the set of all unbiased tests.
p-value
    The probability, assuming the null hypothesis is true, of observing a result at least as extreme as the test statistic.


The direct interpretation is that if the p-value is less than the required significance level, then we say the null hypothesis is rejected at the given level of significance. Criticism of this interpretation can be found in the corresponding section.
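In the clairvoyance example above, for instance, the p-value of an observed count of hits is just the binomial tail probability under the guessing hypothesis. A small sketch (plain Python; the choice of 12 hits is an arbitrary illustration):

```python
from math import comb

def p_value(hits, n=25, p_guess=0.25):
    """P(X >= hits | H0): probability of a result at least as extreme as the
    observed count, assuming the subject is just guessing (p = 1/4)."""
    return sum(comb(n, k) * p_guess**k * (1 - p_guess)**(n - k)
               for k in range(hits, n + 1))

print(round(p_value(12), 4))   # about 0.0107: rejected at the 5% level, not at 1%
```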

Common test statistics

In the list below, the symbols used are defined at the end of this section. Many other tests can be found in other articles.

One-sample z-test
    z = (x̄ - μ0) / (σ/√n)
    (Normal population or n > 30) and σ known. (z is the distance from the mean in relation to the standard deviation of the mean.) For non-normal distributions it is possible to calculate a minimum proportion of a population that falls within k standard deviations for any k (see: Chebyshev's inequality).

Two-sample z-test
    z = ((x̄1 - x̄2) - d0) / √(σ1²/n1 + σ2²/n2)
    Normal populations and independent observations, with σ1 and σ2 known.

Two-sample pooled t-test, equal variances*
    t = ((x̄1 - x̄2) - d0) / (sp √(1/n1 + 1/n2)),   df = n1 + n2 - 2,
    where sp² = ((n1 - 1)s1² + (n2 - 1)s2²) / (n1 + n2 - 2) is the pooled variance.
    (Normal populations or n1 + n2 > 40) and independent observations, with σ1 = σ2 and both unknown.[7]

Two-sample unpooled t-test, unequal variances*
    t = ((x̄1 - x̄2) - d0) / √(s1²/n1 + s2²/n2),
    df = (s1²/n1 + s2²/n2)² / [ (s1²/n1)²/(n1 - 1) + (s2²/n2)²/(n2 - 1) ]   (Welch-Satterthwaite)
    (Normal populations or n1 + n2 > 40) and independent observations, with σ1 ≠ σ2 and both unknown.[8]

One-proportion z-test
    z = (p̂ - p0) / √(p0(1 - p0)/n)
    n p0 > 10 and n(1 - p0) > 10, and it is a SRS (Simple Random Sample); see notes.

Two-proportion z-test, pooled (for d0 = 0)
    z = (p̂1 - p̂2) / √(p̂(1 - p̂)(1/n1 + 1/n2)),   where p̂ = (x1 + x2)/(n1 + n2)
    n1 p1 > 5 and n1(1 - p1) > 5 and n2 p2 > 5 and n2(1 - p2) > 5, and independent observations; see notes.

Two-proportion z-test, unpooled (for |d0| > 0)
    z = ((p̂1 - p̂2) - d0) / √(p̂1(1 - p̂1)/n1 + p̂2(1 - p̂2)/n2)
    n1 p1 > 5 and n1(1 - p1) > 5 and n2 p2 > 5 and n2(1 - p2) > 5, and independent observations; see notes.

One-sample chi-square test
    χ² = Σ (observed - expected)² / expected
    One of the following: all expected counts are at least 5; or all expected counts are > 1 and no more than 20% of expected counts are less than 5.

*Two-sample F test for equality of variances
    F = s1² / s2²
    Normal populations; arrange so that s1² ≥ s2², and reject H0 for F > F(α/2, n1 - 1, n2 - 1).[9]

In general, the subscript 0 indicates a value taken from the null hypothesis, H0, which should be used as much as possible in constructing its test statistic.

Definitions of other symbols:
    α = the probability of Type I error (rejecting a null hypothesis when it is in fact true)
    n = sample size;  n1 = sample 1 size;  n2 = sample 2 size
    x̄ = sample mean;  μ0 = hypothesized population mean;  μ1 = population 1 mean;  μ2 = population 2 mean
    σ = population standard deviation;  σ² = population variance
    s = sample standard deviation;  s² = sample variance;  s1 = sample 1 standard deviation;  s2 = sample 2 standard deviation
    t = t statistic;  df = degrees of freedom
    d̄ = sample mean of differences;  d0 = hypothesized population mean difference;  sd = standard deviation of differences
    χ² = chi-squared statistic;  F = F statistic
    p̂ = x/n = sample proportion, unless specified otherwise;  p0 = hypothesized population proportion;  p1 = proportion 1;  p2 = proportion 2
    d0 = hypothesized difference in proportion;  min(n1, n2) = minimum of n1 and n2
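The pooled and unpooled two-sample t statistics above can be computed directly. The sketch below uses invented data; note that when the two samples happen to have equal sizes the two statistics coincide (only the degrees of freedom differ):

```python
from math import sqrt

def mean_var(xs):
    """Sample mean and unbiased sample variance."""
    n = len(xs)
    m = sum(xs) / n
    return m, sum((x - m) ** 2 for x in xs) / (n - 1)

def t_pooled(xs, ys, d0=0.0):
    """Two-sample pooled t statistic (assumes equal population variances)."""
    (m1, v1), (m2, v2) = mean_var(xs), mean_var(ys)
    n1, n2 = len(xs), len(ys)
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)   # pooled variance
    return (m1 - m2 - d0) / sqrt(sp2 * (1 / n1 + 1 / n2))

def t_unpooled(xs, ys, d0=0.0):
    """Welch's two-sample t statistic (no equal-variance assumption)."""
    (m1, v1), (m2, v2) = mean_var(xs), mean_var(ys)
    return (m1 - m2 - d0) / sqrt(v1 / len(xs) + v2 / len(ys))

a = [5.1, 4.9, 5.3, 5.0, 5.2]   # invented sample 1
b = [4.8, 4.7, 5.0, 4.6, 4.9]   # invented sample 2
print(round(t_pooled(a, b), 3), round(t_unpooled(a, b), 3))
```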

Hypothesis testing is largely the product of Ronald Fisher, Jerzy Neyman, Karl Pearson and (son) Egon Pearson. Fisher was an agricultural statistician who emphasized rigorous experimental design and methods to extract a result from few samples assuming Gaussian distributions. Neyman (who teamed with the younger Pearson) emphasized mathematical rigor and methods to obtain more results from many samples and a wider range of distributions. Modern hypothesis testing is an (extended) hybrid of the Fisher vs Neyman/Pearson formulation, methods and terminology developed in the early 20th century.

Example 4 - Lady Tasting Tea

The following example is summarized from Fisher, and is known as the Lady tasting tea example.[10] Fisher thoroughly explained his method in a proposed experiment to test a Lady's claimed ability to determine the means of tea preparation by taste. The article is less than 10 pages in length and is notable for its simplicity and completeness regarding terminology, calculations and design of the experiment. The example is loosely based on an event in Fisher's life. The Lady proved him wrong.[11]

1. The null hypothesis was that the Lady had no such ability.
2. The test statistic was a simple count of the number of successes in 8 trials.
3. The distribution associated with the null hypothesis was the binomial distribution familiar from coin-flipping experiments.
4. The critical region was the single case of 8 successes in 8 trials, based on a conventional probability criterion (< 5%).
5. Fisher asserted that no alternative hypothesis was (ever) required.

If and only if the 8 trials produced 8 successes was Fisher willing to reject the null hypothesis, effectively acknowledging the Lady's ability with > 98% confidence (but without quantifying her ability). Fisher later discussed the benefits of more trials and repeated tests.
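Under the binomial simplification used in this summary (each of the 8 trials treated as an independent 50/50 guess under the null), the size of Fisher's single-point critical region is easy to check:

```python
# Probability of 8 successes in 8 trials by pure guessing (p = 1/2 each),
# i.e. the size of the critical region {X = 8} under the null hypothesis.
p_all_correct = 0.5 ** 8
print(p_all_correct)   # 0.00390625, comfortably below the 5% criterion

# Confidence in the rejection, in the article's informal sense:
print(1 - p_all_correct)   # greater than 0.98
```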


Statistical hypothesis testing plays an important role in the whole of statistics and in statistical inference. For example, Lehmann (1992) in a review of the fundamental paper by Neyman and Pearson (1933) says: "Nevertheless, despite their shortcomings, the new paradigm formulated in the 1933 paper, and the many developments carried out within its framework continue to play a central role in both the theory and practice of statistics and can be expected to do so in the foreseeable future".

Some statisticians have commented that pure "significance testing" has what is actually a rather strange goal of detecting the existence of a "real" difference between two populations. In practice a difference can almost always be found given a large enough sample. The typically more relevant goal of science is a determination of causal effect size. The amount and nature of the difference, in other words, is what should be studied.[12] Many researchers also feel that hypothesis testing is something of a misnomer. In practice a single statistical test in a single study never "proves" anything.[13] Rejection of the null hypothesis at some effect size has no bearing on the practical significance of the observed effect size. A statistically significant finding may not be relevant in practice due to other, larger effects of more concern, whilst a true effect of practical significance may not appear statistically significant if the test lacks the power to detect it. Appropriate specification of both the hypothesis and the test of said hypothesis is therefore important to provide inference of practical utility.

Criticism is of the application, or of the interpretation, rather than of the method. Criticism of null-hypothesis significance testing is available in other articles (for example "Statistical significance") and their references. Attacks on and defenses of the null-hypothesis significance test are collected in Harlow et al.[14]

The original purpose of Fisher's formulation, as a tool for the experimenter, was to plan the experiment and to easily assess the information content of the small sample. There is little criticism, Bayesian in nature, of the formulation in its original context. In other contexts, complaints focus on flawed interpretations of the results and over-dependence/emphasis on one test. Numerous attacks on the formulation have failed to supplant it as a criterion for publication in scholarly journals. The most persistent attacks originated from the field of psychology. After review, the American Psychological Association did not explicitly deprecate the use of null-hypothesis significance testing, but adopted enhanced publication guidelines which implicitly reduced the relative importance of such testing.

The International Committee of Medical Journal Editors recognizes an obligation to publish negative (not statistically significant) studies under some circumstances. The applicability of null-hypothesis testing to the publication of observational (as contrasted to experimental) studies is doubtful.


Philosophical criticism
Philosophical criticism of hypothesis testing includes consideration of borderline cases. Any process that produces a crisp decision from uncertainty is subject to claims of unfairness near the decision threshold. (Consider close election results.) The premature death of a laboratory rat during testing can impact doctoral theses and academic tenure decisions. "... surely, God loves the .06 nearly as much as the .05"[15]

The statistical significance required for publication has no mathematical basis, but is based on long tradition. "It is usual and convenient for experimenters to take 5% as a standard level of significance, in the sense that they are prepared to ignore all results which fail to reach this standard, and, by this means, to eliminate from further discussion the greater part of the fluctuations which chance causes have introduced into their experimental results."[10]

Ambivalence attacks all forms of decision making. A mathematical decision-making process is attractive because it is objective and transparent. It is repulsive because it allows authority to avoid taking personal responsibility for decisions.

Pedagogic criticism
Pedagogic criticism of null-hypothesis testing includes the counter-intuitive formulation, the terminology, and confusion about the interpretation of results. "Despite the stranglehold that hypothesis testing has on experimental psychology, I find it difficult to imagine a less insightful means of transiting from data to conclusions."[16]

Students find it difficult to understand the formulation of statistical null-hypothesis testing. In rhetoric, examples often support an argument, but a mathematical proof "is a logical argument, not an empirical one". A single counterexample results in the rejection of a conjecture. Karl Popper defined science by its vulnerability to disproof by data. Null-hypothesis testing shares the mathematical and scientific perspective rather than the more familiar rhetorical one. Students expect hypothesis testing to be a statistical tool for illumination of the research hypothesis by the sample; it is not. The test asks indirectly whether the sample can illuminate the research hypothesis.

Students also find the terminology confusing. While Fisher disagreed with Neyman and Pearson about the theory of testing, their terminologies have been blended. The blend is not seamless or standardized. While this article teaches a pure Fisher formulation, even it mentions Neyman and Pearson terminology (Type II error and the alternative hypothesis). The typical introductory statistics text is less consistent. The Sage Dictionary of Statistics would not agree with the title of this article, which it would call null-hypothesis testing.[2] "...there is no alternate hypothesis in Fisher's scheme: Indeed, he violently opposed its inclusion by Neyman and Pearson."[17] In discussing test results, "significance" often has two distinct meanings in the same sentence: one is a probability, the other is a subject-matter measurement (such as currency). The significance (meaning) of (statistical) significance is significant (important).
There is widespread and fundamental disagreement on the interpretation of test results. "A little thought reveals a fact widely understood among statisticians: The null hypothesis, taken literally (and that's the only way you can take it in formal hypothesis testing), is almost always false in the real world.... If it is false, even to a tiny degree, it must be the case that a large enough sample will produce a significant result and lead to its rejection. So if the null hypothesis is always false, what's the big deal about rejecting it?"[17] (The above criticism only applies to point hypothesis tests. If one were testing, for example, whether a parameter is greater than zero, it would not apply.)

"How has the virtually barren technique of hypothesis testing come to assume such importance in the process by which we arrive at our conclusions from our data?"[16] Null-hypothesis testing just answers the question of "how well the findings fit the possibility that chance factors alone might be responsible."[2]

Null-hypothesis significance testing does not determine the truth or falsity of claims. It determines whether confidence in a claim based solely on a sample-based estimate exceeds a threshold. It is a research quality assurance test, widely used as one requirement for publication of experimental research with statistical results. It is uniformly agreed that statistical significance is not the only consideration in assessing the importance of research results. Rejecting the null hypothesis is not a sufficient condition for publication. "Statistical significance does not necessarily imply practical significance!"[18]


Practical criticism
Practical criticism of hypothesis testing includes the sobering observation that published test results are often contradicted. Mathematical models support the conjecture that most published medical research test results are flawed. Null-hypothesis testing has not achieved the goal of a low error probability in medical journals.[19] [20] Many authors have expressed a strong skepticism, sometimes labeled as postmodernism, about the general unreliability of statistical hypothesis testing to explain many social and medical phenomena. For example, modern statistics do not reliably link exposures of carcinogens to spatial-temporal patterns of cancer incidence. There is not a strong convention in statistical hypothesis testing to consider alternate units of scale. With temporal data the units chosen for temporal aggregation (hour, day, week, year, decade) can completely alter the trends and cycles. With spatial data, the units chosen for analysis (the modifiable areal unit problem) can alter or reverse relationships between variables. If the issue of analysis scale is ignored in a hypothesis test then skepticism about the results is justified.

Straw man
Hypothesis testing is controversial when the alternative hypothesis is suspected to be true at the outset of the experiment, making the null hypothesis the reverse of what the experimenter actually believes; it is put forward as a straw man only to allow the data to contradict it. Many statisticians have pointed out that rejecting the null hypothesis says nothing or very little about the likelihood that the null is true. Under traditional null hypothesis testing, the null is rejected when the conditional probability P(Data as or more extreme than observed | Null) is very small, say 0.05. However, some say researchers are really interested in the probability P(Null | Data as actually observed), which cannot be inferred from a p-value: some like to present these as inverses of each other, but the events "Data as or more extreme than observed" and "Data as actually observed" are very different. In some cases, P(Null | Data) approaches 1 while P(Data as or more extreme than observed | Null) approaches 0; in other words, we can reject the null when it is virtually certain to be true. For this and other reasons, Gerd Gigerenzer has called null hypothesis testing "mindless statistics",[21] while Jacob Cohen described it as a ritual conducted to convince ourselves that we have the evidence needed to confirm our theories.[22]



Bayesian criticism
Bayesian statisticians reject classical null hypothesis testing, since it violates the likelihood principle and is thus incoherent and leads to sub-optimal decision-making; the Jeffreys-Lindley paradox illustrates this. Along with many frequentist statisticians, Bayesians prefer to provide an estimate along with a confidence interval (although Bayesian confidence intervals are different from classical ones). Some Bayesians (James Berger in particular) have developed Bayesian hypothesis testing methods, though these are not accepted by all Bayesians (notably, Andrew Gelman). Given a prior probability distribution for one or more parameters, sample evidence can be used to generate an updated posterior distribution. In this framework, but not in the null hypothesis testing framework, it is meaningful to make statements of the general form "the probability that the true value of the parameter is greater than 0 is p". According to Bayes' theorem, we have:

P(Null | Data) = P(Data | Null) P(Null) / P(Data)

Thus P(Null | Data) may approach 1 while P(Data | Null) approaches 0 only when P(Null)/P(Data) approaches infinity, i.e. (for instance) when the a priori probability of the null hypothesis, P(Null), is also approaching 1 while P(Data) approaches 0: then P(Data | Null) is low because the data are extremely unlikely, but the null hypothesis is extremely likely to be true.
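A small numerical sketch of this point (all numbers assumed: a point null with prior probability 1/2, a standard-normal prior on the mean under the alternative, and data landing exactly on the p = 0.05 boundary) shows the p-value and the posterior probability of the null diverging sharply.

```python
import math

def normal_pdf(x, var):
    """Density of a zero-mean normal with variance var, evaluated at x."""
    return math.exp(-x * x / (2 * var)) / math.sqrt(2 * math.pi * var)

# Illustrative setup: n observations with unit variance; the sample mean
# is chosen to sit exactly at the two-sided p = 0.05 boundary.
n = 10_000
xbar = 1.96 / math.sqrt(n)          # z = 1.96

# Frequentist tail probability P(data as or more extreme | Null).
z = xbar * math.sqrt(n)
p_value = math.erfc(z / math.sqrt(2))

# Bayesian posterior P(Null | data) with P(Null) = 1/2 and, under the
# alternative, a standard-normal prior on the mean.
m0 = normal_pdf(xbar, 1 / n)        # marginal density of xbar under the null
m1 = normal_pdf(xbar, 1 + 1 / n)    # marginal density under the alternative
posterior_null = m0 / (m0 + m1)

print(f"p-value        = {p_value:.3f}")         # ~0.05: "reject the null"
print(f"P(Null | data) = {posterior_null:.2f}")  # yet the null is probably true
```

A result that is exactly "significant at the 5% level" can still leave the null hypothesis more probable than not, which is the Jeffreys-Lindley phenomenon in miniature.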

Publication bias
In 2002, a group of psychologists launched a new journal dedicated to experimental studies in psychology which support the null hypothesis. The Journal of Articles in Support of the Null Hypothesis [23] (JASNH) was founded to address a scientific publishing bias against such articles. According to the editors, "other journals and reviewers have exhibited a bias against articles that did not reject the null hypothesis. We plan to change that by offering an outlet for experiments that do not reach the traditional significance levels (p < 0.05). Thus, reducing the file drawer problem, and reducing the bias in psychological literature. Without such a resource researchers could be wasting their time examining empirical questions that have already been examined. We collect these articles and provide them to the scientific community free of cost." The "file drawer problem" arises because academics tend not to publish results indicating that the null hypothesis could not be rejected; this does not mean the relationship they were looking for did not exist, only that they could not demonstrate it. Even though these papers can often be interesting, they tend to end up unpublished, in "file drawers." Ioannidis has inventoried factors that should alert readers to risks of publication bias.[20]
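The file drawer effect can be simulated in a few lines (a toy model with invented parameters): when only studies that are significant in the expected direction are "published", the published literature grossly overstates the true effect.

```python
import math
import random
import statistics

random.seed(0)

# Assumed toy setup: 2000 small two-arm studies of a true standardized
# effect d = 0.1. Each study's effect estimate has standard error
# sqrt(2/n_per_arm); only estimates that are positive and reach p < 0.05
# get "published".
true_effect, n_per_arm, n_studies = 0.1, 20, 2000
se = math.sqrt(2 / n_per_arm)

all_estimates, published = [], []
for _ in range(n_studies):
    est = random.gauss(true_effect, se)
    all_estimates.append(est)
    if est / se > 1.96:            # significant in the expected direction
        published.append(est)

print(f"true effect:              {true_effect}")
print(f"mean over all studies:    {statistics.mean(all_estimates):.3f}")
print(f"mean over published only: {statistics.mean(published):.3f}")
```

The mean across all studies recovers the true effect, while the mean across "published" studies is several times larger, which is exactly the bias a meta-analysis of the published record inherits.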

Jones and Tukey suggested a modest improvement in the original null-hypothesis formulation to formalize the handling of one-tailed tests. Fisher ignored the 8-failure case (equally improbable as the 8-success case) in the example test involving tea, which altered the claimed significance by a factor of 2.[24]

See also
Comparing means test decision tree
Counternull
Multiple comparisons
Omnibus test
Behrens-Fisher problem
Bootstrapping (statistics)
Checking if a coin is fair
Falsifiability
Fisher's method for combining independent tests of significance
Null hypothesis
P-value
Statistical theory
Statistical significance
Type I error, Type II error
Exact test


[1] R. A. Fisher (1925). Statistical Methods for Research Workers. Edinburgh: Oliver and Boyd. p. 43.
[2] Cramer, Duncan; Dennis Howitt (2004). The Sage Dictionary of Statistics. p. 76. ISBN 076194138X.
[3] Spiegelhalter, D. and Rice, K. (2009). Bayesian statistics. Scholarpedia, 4(8):5230.
[4] Schervish, M. Theory of Statistics, p. 218. Springer, 1995.
[5] Lehmann, E.L.; Joseph P. Romano (2005). Testing Statistical Hypotheses (3rd ed.). New York: Springer. ISBN 0387988645.
[6] Cox, D.R.; D.V. Hinkley (1974). Theoretical Statistics. ISBN 0412124293.
[7] NIST handbook: Two-Sample t-Test for Equal Means (http://www.itl.nist.gov/div898/handbook/eda/section3/eda353.htm)
[8] Ibid.
[9] NIST handbook: F-Test for Equality of Two Standard Deviations (http://www.itl.nist.gov/div898/handbook/eda/section3/eda359.htm) (should say "Variances")
[10] Fisher, Sir Ronald A. (1956) [1935]. "Mathematics of a Lady Tasting Tea" (http://books.google.com/?id=oKZwtLQTmNAC&pg=PA1512&dq="mathematics+of+a+lady+tasting+tea"). In James Roy Newman, The World of Mathematics, volume 3 [Design of Experiments]. Courier Dover Publications. ISBN 9780486411514.
[11] Box, Joan Fisher (1978). R.A. Fisher, The Life of a Scientist. New York: Wiley. p. 134.
[12] McCloskey, Deirdre (2008). The Cult of Statistical Significance. Ann Arbor: University of Michigan Press. ISBN 0472050079.
[13] Wallace, Brendan; Alastair Ross (2006). Beyond Human Error. Florida: CRC Press. ISBN 978-0849327186.
[14] Harlow, Lisa Lavoie; Stanley A. Mulaik; James H. Steiger (1997). What If There Were No Significance Tests?. Mahwah, N.J.: Lawrence Erlbaum Associates Publishers. ISBN 978-0-8058-2634-0.
[15] Rosnow, R.L.; Rosenthal, R. (October 1989). "Statistical procedures and the justification of knowledge in psychological science" (http://ist-socrates.berkeley.edu/~maccoun/PP279_Rosnow.pdf). American Psychologist 44 (10): 1276–1284. doi:10.1037/0003-066X.44.10.1276. ISSN 0003-066X.
[16] Loftus, G.R. (1991). "On the tyranny of hypothesis testing in the social sciences". Contemporary Psychology 36: 102–105.
[17] Cohen, Jacob (December 1990). "Things I have learned (so far)". American Psychologist 45 (12): 1304–1312. doi:10.1037/0003-066X.45.12.1304. ISSN 0003-066X.
[18] Weiss, Neil A. (1999). Introductory Statistics (5th ed.). Reading, Mass.: Addison Wesley. ISBN 0-201-59877-9. p. 521.
[19] Ioannidis, John P. A. (July 2005). "Contradicted and initially stronger effects in highly cited clinical research" (http://jama.ama-assn.org/cgi/content/full/294/2/218). JAMA 294 (2): 218–228. doi:10.1001/jama.294.2.218. PMID 16014596.
[20] Ioannidis, John P. A. (August 2005). "Why most published research findings are false" (http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=1182327). PLoS Med. 2 (8): e124. doi:10.1371/journal.pmed.0020124. PMID 16060722. PMC 1182327.
[21] Gigerenzer, Gerd (2004). "Mindless statistics". The Journal of Socio-Economics 33 (5): 587–606. doi:10.1016/j.socec.2004.09.033.
[22] Cohen, Jacob (December 1994). "The earth is round (p < .05)". American Psychologist 49 (12): 997–1003.
[23] http://www.jasnh.com/
[24] Jones LV, Tukey JW (December 2000). "A sensible formulation of the significance test" (http://content.apa.org/journals/met/5/4/411). Psychol Methods 5 (4): 411–414. doi:10.1037/1082-989X.5.4.411. PMID 11194204.

Lehmann, E.L. (1992). Introduction to Neyman and Pearson (1933) "On the Problem of the Most Efficient Tests of Statistical Hypotheses". In: Breakthroughs in Statistics, Volume 1 (Eds. Kotz, S., Johnson, N.L.), Springer-Verlag. ISBN 0-387-94037-5 (followed by a reprinting of the paper).
Neyman, J.; Pearson, E.S. (1933). "On the Problem of the Most Efficient Tests of Statistical Hypotheses". Phil. Trans. R. Soc., Series A, 231, 289–337.



External links
A Guide to Understanding Hypothesis Testing by Laerd Statistics ( hypothesis-testing)
Wilson González, Georgina; Karpagam Sankaran (September 10, 1997). "Hypothesis Testing" (http://www.cee. Environmental Sampling & Monitoring Primer. Virginia Tech.
Bayesian critique of classical hypothesis testing ( html)
Critique of classical hypothesis testing highlighting long-standing qualms of statisticians (http://www.npwrc.
Dallal GE (2007). The Little Handbook of Statistical Practice ( (A good tutorial)
References for arguments for and against hypothesis testing ( NHST-SHIT.htm)
Statistical Tests Overview ( stat_overview_table.html): How to choose the correct statistical test
An Interactive Online Tool to Encourage Understanding Hypothesis Testing ( ?l=en)

Meta-analysis

In statistics, a meta-analysis combines the results of several studies that address a set of related research hypotheses. This is normally done by identifying a common measure of effect size, which is modelled using a form of meta-regression. The resulting overall averages, when controlling for study characteristics, can be considered meta-effect sizes, which are more powerful estimates of the true effect size than those derived in a single study under a given single set of assumptions and conditions. From the perspective of the systemic TOGA (Top-down Object-based Goal-oriented Approach, 1993) meta-theory, a meta-analysis is an analysis performed on the level of tools applied to a particular or more general domain-oriented analysis. Roughly speaking, any analysis A relates to a problem P in an arbitrarily selected ontological domain D and is performed using some analysis tools T; therefore we may write A(P, D, T). In such a declarative formalization, for a given Ax, the meta-analysis domain is Tx (such as a method, algorithm or methodology) in the Px, Dx context, i.e. MA(Pm, Dm = (Px, Tx), Tm), where MA denotes a meta-analysis operator. The above systemic definition is congruent with the definitions of meta-theory, meta-knowledge and meta-system.

History

The first meta-analysis was performed by Karl Pearson in 1904, in an attempt to overcome the problem of reduced statistical power in studies with small sample sizes; analyzing the results from a group of studies can allow more accurate data analysis.[1] [2] However, the first meta-analysis of all conceptually identical experiments concerning a particular research issue, conducted by independent researchers, has been identified as the 1940 book-length publication Extra-Sensory Perception After Sixty Years, authored by Duke University psychologists J. G. Pratt, J. B. Rhine, and associates.[3] This encompassed a review of 145 reports on ESP experiments published from 1882 to 1939, and included an estimate of the influence of unpublished papers on the overall effect (the file drawer problem). Although meta-analysis is widely used in epidemiology and evidence-based medicine today, a meta-analysis of a medical treatment was not published until 1955. In the 1970s, more sophisticated analytical techniques were introduced in educational research, starting with the work of Gene V. Glass, Frank L. Schmidt and

John E. Hunter. The online Oxford English Dictionary lists the first usage of the term in the statistical sense as 1976 by Glass.[4] The statistical theory surrounding meta-analysis was greatly advanced by the work of Nambury S. Raju, Larry V. Hedges, Harris Cooper, Ingram Olkin, John E. Hunter, Jacob Cohen, Thomas C. Chalmers, and Frank L. Schmidt.


Advantages of meta-analysis
Advantages of meta-analysis (e.g. over classical literature reviews or simple overall means of effect sizes) include:
Shows whether the results are more varied than expected from the sample diversity
Derivation and statistical testing of overall factors / effect-size parameters in related studies
Generalization to the population of studies
Ability to control for between-study variation
Inclusion of moderators to explain variation
Higher statistical power to detect an effect than in a single study

Steps in a meta-analysis
1. Search of the literature
2. Selection of studies (incorporation criteria):
Based on quality criteria, e.g. the requirement of randomization and blinding in a clinical trial
Selection of specific studies on a well-specified subject, e.g. the treatment of breast cancer
A decision on whether unpublished studies are included, to avoid publication bias (file drawer problem; see below)
3. Decide which dependent variables or summary measures are allowed. For instance:
Differences (discrete data)
Means (continuous data)
Hedges' g is a popular summary measure for continuous data that is standardized in order to eliminate scale differences, but it incorporates an index of variation between groups: g = (x̄t − x̄c) / s, in which x̄t is the treatment mean, x̄c is the control mean, and s² the pooled variance.

4. Model selection (see next paragraph). For reporting guidelines, see the QUOROM statement.[5] [6]
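The Hedges' g summary measure from step 3 can be sketched in a few lines (the sample data are invented; the version below also applies the small-sample bias correction that distinguishes g from Cohen's d):

```python
import math

def hedges_g(treat, ctrl):
    """Standardized mean difference with Hedges' small-sample correction."""
    n1, n2 = len(treat), len(ctrl)
    m1 = sum(treat) / n1
    m2 = sum(ctrl) / n2
    var1 = sum((x - m1) ** 2 for x in treat) / (n1 - 1)
    var2 = sum((x - m2) ** 2 for x in ctrl) / (n2 - 1)
    # Pooled standard deviation across both groups.
    s_pooled = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    g = (m1 - m2) / s_pooled
    correction = 1 - 3 / (4 * (n1 + n2) - 9)   # small-sample bias correction
    return g * correction

# Hypothetical trial data (made-up scores):
treatment = [5.1, 6.0, 5.8, 6.4, 5.5]
control = [4.2, 4.9, 5.0, 4.4, 4.8]
print(f"Hedges' g = {hedges_g(treatment, control):.2f}")
```

Because g is expressed in pooled standard-deviation units, studies that measured the outcome on different scales become directly comparable.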

Meta-regression models
Generally, three types of models can be distinguished in the literature on meta-analysis: simple regression, fixed effects meta-regression and random effects meta-regression.

Simple regression
The model can be specified as

yj = β0 + β1 x1j + β2 x2j + … + ε

where yj is the effect size in study j and β0 (the intercept) is the estimated overall effect size. The variables xij specify different study characteristics, and ε specifies the between-study variation. Note that this model does not allow specification of within-study variation.

Fixed-effects meta-regression

Fixed-effects meta-regression assumes that the true effect size θ is normally distributed with N(θ, σθ), where σθ² is the within-study variance of the effect size. A fixed-effects meta-regression model thus allows for within-study variability, but no between-study variability, because all studies have the same expected fixed effect size:

yj = β0 + β1 x1j + β2 x2j + … + εj,  with εj ~ N(0, σ²εj)

where σ²εj is the variance of the effect size in study j. Fixed-effects meta-regression ignores between-study variation. As a result, parameter estimates are biased if between-study variation cannot be ignored. Furthermore, generalizations to the population are not possible.

Random effects meta-regression

Random effects meta-regression rests on the assumption that θ in N(θ, σθ) is a random variable following a (hyper-)distribution N(θ, τ²):

yj = β0 + β1 x1j + β2 x2j + … + η + εj,  with η ~ N(0, τ²) and εj ~ N(0, σ²εj)

where again σ²εj is the variance of the effect size in study j. The between-study variance τ² is estimated using common estimation procedures for random-effects models, such as restricted maximum likelihood (REML) estimators.
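A random-effects pooling computation can be sketched with the DerSimonian-Laird method-of-moments estimator of τ², a common, simpler alternative to the REML estimators mentioned above (all effect sizes and within-study variances below are invented):

```python
# Sketch of random-effects pooling via the DerSimonian-Laird
# method-of-moments estimate of the between-study variance tau^2.
# The effect sizes and within-study variances are invented.

effects = [0.60, 0.05, 0.45, 0.25, -0.10]
within_var = [0.04, 0.02, 0.09, 0.03, 0.01]

# Fixed-effect (inverse-variance) pooling, needed for Cochran's Q.
w = [1 / v for v in within_var]
pooled_fe = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
q = sum(wi * (y - pooled_fe) ** 2 for wi, y in zip(w, effects))

# DerSimonian-Laird tau^2, truncated at zero.
k = len(effects)
c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
tau2 = max(0.0, (q - (k - 1)) / c)

# Random-effects weights add tau^2 to each study's variance.
w_re = [1 / (v + tau2) for v in within_var]
pooled_re = sum(wi * y for wi, y in zip(w_re, effects)) / sum(w_re)

print(f"tau^2 = {tau2:.4f}")
print(f"fixed-effect pooled estimate:   {pooled_fe:.3f}")
print(f"random-effects pooled estimate: {pooled_re:.3f}")
```

With these heterogeneous inputs τ² comes out positive, so the random-effects weights are more even across studies and the pooled estimate shifts noticeably away from the fixed-effect value.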

Applications in modern science

Modern statistical meta-analysis does more than just combine the effect sizes of a set of studies. It can test whether the outcomes of studies show more variation than the variation expected from sampling different research participants. If that is the case, study characteristics such as the measurement instrument used, the population sampled, or aspects of the studies' design are coded. These characteristics are then used as predictor variables to analyze the excess variation in the effect sizes. Some methodological weaknesses in studies can be corrected statistically; for example, it is possible to correct effect sizes or correlations for the downward bias due to measurement error or restriction of score ranges.

Meta-analysis can be done with single-subject designs as well as group research designs. This is important because much of the research on low-incidence populations has been done with single-subject research designs. Considerable dispute exists over the most appropriate meta-analytic technique for single-subject research.[7]

Meta-analysis leads to a shift of emphasis from single studies to multiple studies. It emphasizes the practical importance of the effect size instead of the statistical significance of individual studies. This shift in thinking has been termed "meta-analytic thinking". The results of a meta-analysis are often shown in a forest plot.

Results from studies are combined using different approaches. One approach frequently used in meta-analysis in health care research is termed the 'inverse variance method'. The average effect size across all studies is computed as a weighted mean, whereby the weights are equal to the inverse variance of each study's effect estimator. Larger studies and studies with less random variation are given greater weight than smaller studies. Other common approaches include the Mantel-Haenszel method[8] and the Peto method.

A recent approach to studying the influence that weighting schemes can have on results has been proposed through the construct of gravity, which is a special case of combinatorial meta-analysis. Signed differential mapping is a statistical technique for meta-analyzing studies on differences in brain activity or structure which used neuroimaging techniques such as fMRI, VBM or PET.



Weaknesses

Meta-analysis can never follow the rules of hard science, for example being double-blind or controlled, or proposing a way to falsify the theory in question. It can only be a statistical examination of scientific studies, not an actual scientific study itself. A weakness of the method is that sources of bias are not controlled by the method: a good meta-analysis of badly designed studies will still result in bad statistics. Robert Slavin has argued that only methodologically sound studies should be included in a meta-analysis, a practice he calls 'best evidence meta-analysis'. Other meta-analysts would include weaker studies, and add a study-level predictor variable that reflects the methodological quality of the studies, to examine the effect of study quality on the effect size.

File drawer problem

Another weakness of the method is the heavy reliance on published studies, which may create exaggerated outcomes, as it is very hard to publish studies that show no significant results. For any given research area, one cannot know how many studies have been conducted but never reported, their results filed away.[9] This file drawer problem results in a distribution of effect sizes that is biased, skewed or completely cut off, creating a serious base rate fallacy in which the significance of the published studies is overestimated. For example, if there were fifty tests and only ten got results, then the real outcome is only 20% as significant as it appears, because the other 80% were not submitted for publishing or were thrown out by publishers as uninteresting. This should be seriously considered when interpreting the outcomes of a meta-analysis.[9] [10] The problem can be visualized with a funnel plot, a scatter plot of sample size against effect size. There are several procedures available that attempt to correct for the file drawer problem once it is identified, such as guessing at the cut-off part of the distribution of study effects.

Other weaknesses are Simpson's paradox (two smaller studies may point in one direction, and the combination study in the opposite direction); the coding of an effect is subjective; the decision to include or reject a particular study is subjective; there are two different ways to measure effect (correlation or standardized mean difference); the interpretation of effect size is purely arbitrary; it has not been determined whether the statistically most accurate method for combining results is the fixed effects model or the random effects model; and, for medicine, the underlying risk in each studied group is of significant importance, and there is no universally agreed-upon way to weight the risk. The example provided by the Rind et al. controversy illustrates an application of meta-analysis which has been the subject of subsequent criticisms of many of the components of the meta-analysis.
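One classic correction procedure is Rosenthal's "fail-safe N",[9] which asks how many unpublished studies averaging null results (mean z = 0) would be needed to drag a combined Stouffer z-score below the significance threshold. A minimal sketch, with invented z-scores:

```python
import math

def fail_safe_n(z_scores, z_crit=1.645):
    """Rosenthal's fail-safe N: the number of hidden null-result studies
    needed to make the Stouffer combination of z_scores non-significant
    at the one-tailed critical value z_crit."""
    k = len(z_scores)
    z_sum = sum(z_scores)
    n = (z_sum / z_crit) ** 2 - k
    return max(0, math.floor(n))

# Invented z-scores from five "published" studies:
published_z = [2.1, 1.8, 2.5, 1.6, 2.9]
print(f"fail-safe N = {fail_safe_n(published_z)}")
```

A small fail-safe N relative to the number of published studies suggests the combined result is fragile to the file drawer; a large one suggests robustness.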

Dangers of agenda-driven bias

The most severe weakness and abuse of meta-analysis often occurs when the person or persons doing the meta-analysis have an economic, social, or political agenda, such as the passage or defeat of legislation. Persons with these types of agenda have a high likelihood of abusing meta-analysis due to personal bias. For example, researchers favorable to the author's agenda are likely to have their studies "cherry-picked" while those not favorable will be ignored or labeled as "not credible". In addition, the favored authors may themselves be biased or paid to produce results that support their overall political, social, or economic goals, for example by selecting small favorable data sets and not incorporating larger unfavorable data sets. If a meta-analysis is conducted by an individual or organization with a bias or predetermined desired outcome, it should be treated as highly suspect or as having a high likelihood of being "junk science". From an integrity perspective, researchers with a bias should avoid meta-analysis and use a less abuse-prone (or independent) form of research.



[Figure: A funnel plot expected without the file drawer problem]

[Figure: A funnel plot expected with the file drawer problem]

See also
Epidemiologic methods
Meta-analytic thinking
Newcastle-Ottawa scale
Review journal
Study heterogeneity
Systematic review

Further reading
Thompson, Simon G; Pocock, Stuart J (2 November 1991), "Can meta-analysis be trusted?" [11], The Lancet 338: 1127–1130, retrieved 15 June 2010. Explores two contrasting views: does meta-analysis provide "objective, quantitative methods for combining evidence from separate but similar studies", or merely "statistical tricks which make unjustified assumptions in producing oversimplified generalisations out of a complex of disparate studies"?

[1] O'Rourke, Keith (2007-12-01). "An historical perspective on meta-analysis: dealing quantitatively with varying study results" (http://jrsm.rsmjournals.com). J R Soc Med 100 (12): 579–582. doi:10.1258/jrsm.100.12.579. PMID 18065712. Retrieved 2009-09-10.
[2] Egger, M; G D Smith (1997-11-22). "Meta-Analysis. Potentials and promise" (http://www.bmj.com/archive/7119/7119ed.htm). BMJ (Clinical Research Ed.) 315 (7119): 1371–1374. ISSN 0959-8138. Retrieved 2009-09-10.
[3] Bösch, H. (2004). Reanalyzing a meta-analysis on extra-sensory perception dating from 1940, the first comprehensive meta-analysis in the history of science. In S. Schmidt (Ed.), Proceedings of the 47th Annual Convention of the Parapsychological Association, University of Vienna, pp. 1–13.
[4] meta-analysis. (http://dictionary.oed.com/cgi/entry/00307098?single=1&query_type=word&queryword=meta-analysis) Oxford English Dictionary. Oxford University Press. Draft Entry June 2008. Accessed 28 March 2009. "1976 G. V. Glass in Educ. Res. Nov. 3/2 My major interest currently is in what we have come to call..the meta-analysis of research. The term is a bit grand, but it is precise and apt... Meta-analysis refers to the analysis of analyses."
[5] http://www.consort-statement.org/resources/related-guidelines-and-initiatives/
[6] http://www.consort-statement.org/index.aspx?o=1346
[7] Van den Noortgate, W. & Onghena, P. (2007). Aggregating Single-Case Results. The Behavior Analyst Today, 8(2), 196–209. BAO (http://www.baojournal.com)
[8] Mantel, N.; Haenszel, W. (1959). "Statistical aspects of the analysis of data from retrospective studies of disease". Journal of the National Cancer Institute 22 (4): 719–748. PMID 13655060.
[9] Rosenthal, Robert (1979). "The "File Drawer Problem" and the Tolerance for Null Results". Psychological Bulletin 86 (3): 638–641.
[10] Hunter, John E; Schmidt, Frank L (1990). Methods of Meta-Analysis: Correcting Error and Bias in Research Findings. Newbury Park, California; London; New Delhi: SAGE Publications.
[11] http://tobaccodocuments.org/pm/2047231315-1318.html


Cooper, H. & Hedges, L.V. (1994). The Handbook of Research Synthesis. New York: Russell Sage.
Cornell, J. E. & Mulrow, C. D. (1999). Meta-analysis. In: H. J. Adèr & G. J. Mellenbergh (Eds), Research Methodology in the Social, Behavioral and Life Sciences (pp. 285–323). London: Sage.
Normand, S.-L. T. (1999). Tutorial in Biostatistics. Meta-Analysis: Formulating, Evaluating, Combining, and Reporting. Statistics in Medicine, 18, 321–359.
Sutton, A.J., Jones, D.R., Abrams, K.R., Sheldon, T.A., & Song, F. (2000). Methods for Meta-analysis in Medical Research. London: John Wiley. ISBN 0-471-49066-0.
Higgins JPT, Green S (editors). Cochrane Handbook for Systematic Reviews of Interventions Version 5.0.1 [updated September 2008]. The Cochrane Collaboration, 2008. Available from the Cochrane Collaboration.

Further reading
Owen, A.B. (2009). Karl Pearson's meta-analysis revisited. Annals of Statistics, 37 (6B), 3867–3892.

External links
Cochrane Handbook for Systematic Reviews of Interventions ( index.htm)
Effect Size and Meta-Analysis ( (ERIC Digest)
Meta-Analysis at 25 (Gene V Glass) (
Meta-Analysis in Educational Research ( (ERIC Digest)
Meta-Analysis: Methods of Accumulating Results Across Research Domains ( MetaA/) (article by Larry Lyons)
Meta-analysis ( ( article)

ClinTools ( (commercial)
Comprehensive Meta-Analysis ( (commercial)
MIX 2.0 ( Software for professional meta-analysis in Excel (free and commercial versions available)
What meta-analysis features are available in Stata ( (free add-ons to commercial package)
The Meta-Analysis Calculator ( free on-line tool for conducting a meta-analysis
Metastat ( (Free)
Meta-Analyst ( Free Windows-based tool for meta-analysis of binary, continuous and diagnostic data
RevMan ( Free software for meta-analysis and preparation of Cochrane protocols and reviews, available from the Cochrane Collaboration

Clinical trial


Clinical trials are conducted to allow safety and efficacy data to be collected for health interventions (e.g., drugs, diagnostics, devices, therapy protocols). These trials can take place only after satisfactory information has been gathered on the quality of the non-clinical safety, and Health Authority/Ethics Committee approval is granted in the country where the trial is taking place. Depending on the type of product and the stage of its development, investigators enroll healthy volunteers and/or patients into small pilot studies initially, followed by larger scale studies in patients that often compare the new product with the currently prescribed treatment. As positive safety and efficacy data are gathered, the number of patients is typically increased. Clinical trials can vary in size from a single center in one country to multicenter trials in multiple countries. Due to the sizable cost a full series of clinical trials may incur, the burden of paying for all the necessary people and services is usually borne by the sponsor who may be a governmental organization, a pharmaceutical, or biotechnology company. Since the diversity of roles may exceed resources of the sponsor, often a clinical trial is managed by an outsourced partner such as a contract research organization or a clinical trials unit in the academic sector.

In planning a clinical trial, the sponsor or investigator first identifies the medication or device to be tested. Usually, one or more pilot experiments are conducted to gain insights for the design of the clinical trial to follow. In medical jargon, effectiveness is how well a treatment works in practice, and efficacy is how well it works in a clinical trial. In the U.S., the elderly comprise only 14% of the population but they consume over one-third of drugs.[1] Despite this, they are often excluded from trials because their more frequent health issues and drug use produce unreliable data. Women, children, and people with unrelated medical conditions are also frequently excluded.[2] In coordination with a panel of expert investigators (usually physicians well known for their publications and clinical experience), the sponsor decides what to compare the new agent with (one or more existing treatments or a placebo), and what kind of patients might benefit from the medication or device. If the sponsor cannot obtain enough patients with this specific disease or condition at one location, then investigators at other locations who can obtain the same kind of patients to receive the treatment would be recruited into the study. During the clinical trial, the investigators recruit patients with the predetermined characteristics, administer the treatment(s), and collect data on the patients' health for a defined time period. These data include measurements like vital signs, concentration of the study drug in the blood, and whether the patient's health improves or not. The researchers send the data to the trial sponsor, who then analyzes the pooled data using statistical tests.
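The final pooled analysis can be sketched, for a hypothetical two-arm trial with a binary outcome, as a two-proportion z-test (all counts invented; real trials use pre-specified and usually more elaborate analyses):

```python
import math

def two_proportion_ztest(success_a, n_a, success_b, n_b):
    """Two-sided z-test comparing two response proportions."""
    p_a, p_b = success_a / n_a, success_b / n_b
    p_pool = (success_a + success_b) / (n_a + n_b)   # pooled proportion under H0
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))       # two-sided p-value
    return z, p_value

# Hypothetical counts: new treatment 78/120 responders vs. standard 60/118.
z, p = two_proportion_ztest(78, 120, 60, 118)
print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```

Here the response rate difference (65% vs. about 51%) reaches conventional significance, which is the kind of summary a sponsor's statistical analysis would report alongside effect estimates and confidence intervals.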
Some examples of what a clinical trial may be designed to do:
Assess the safety and effectiveness of a new medication or device on a specific kind of patient (e.g., patients who have been diagnosed with Alzheimer's disease)
Assess the safety and effectiveness of a different dose of a medication than is commonly used (e.g., a 10 mg dose instead of a 5 mg dose)
Assess the safety and effectiveness of an already marketed medication or device for a new indication, i.e. a disease for which the drug is not specifically approved
Assess whether the new medication or device is more effective for the patient's condition than the already used, standard medication or device ("the gold standard" or "standard therapy")
Compare the effectiveness in patients with a specific disease of two or more already approved or common interventions for that disease (e.g., Device A vs. Device B, Therapy A vs. Therapy B)

Note that while most clinical trials compare two medications or devices, some trials compare three or four medications, doses of medications, or devices against each other. Except for very small trials limited to a single location, the clinical trial design and objectives are written into a document called a clinical trial protocol. The protocol is the 'operating manual' for the clinical trial, and ensures that researchers in different locations all perform the trial in the same way on patients with the same characteristics. (This uniformity is designed to allow the data to be pooled.) A protocol is always used in multicenter trials. Because the clinical trial is designed to test hypotheses and rigorously monitor and assess what happens, clinical trials can be seen as the application of the scientific method to understanding human or animal biology. Synonyms for 'clinical trials' include clinical studies, research protocols and clinical research. The most commonly performed clinical trials evaluate new drugs, medical devices (like a new catheter), biologics, psychological therapies, or other interventions. Clinical trials may be required before the national regulatory authority[3] approves marketing of the drug or device, or a new dose of the drug, for use on patients. Beginning in the 1980s, harmonization of clinical trial protocols was shown to be feasible across countries of the European Union. At the same time, coordination between Europe, Japan and the United States led, in 1990, to a joint regulatory-industry initiative on international harmonization, the International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use (ICH).[4] Currently, most clinical trial programs follow ICH guidelines, aimed at "ensuring that good quality, safe and effective medicines are developed and registered in the most efficient and cost-effective manner.
These activities are pursued in the interest of the consumer and public health, to prevent unnecessary duplication of clinical trials in humans and to minimize the use of animal testing without compromising the regulatory obligations of safety and effectiveness."[5]


The history of clinical trials before 1750 is easily summarized: there were no clinical trials.[6] [7] Clinical trials were first introduced in Avicenna's The Canon of Medicine in 1025 AD, in which he laid down rules for the experimental use and testing of drugs and wrote a precise guide for practical experimentation in the process of discovering and proving the effectiveness of medical drugs and substances.[8] He laid out the following rules and principles for testing the effectiveness of new drugs and medications, which still form the basis of modern clinical trials:[9] [10]
1. The drug must be free from any extraneous accidental quality.
2. It must be used on a simple, not a composite, disease.
3. The drug must be tested with two contrary types of diseases, because sometimes a drug cures one disease by its essential qualities and another by its accidental ones.
4. The quality of the drug must correspond to the strength of the disease. For example, there are some drugs whose heat is less than the coldness of certain diseases, so that they would have no effect on them.
5. The time of action must be observed, so that essence and accident are not confused.
6. The effect of the drug must be seen to occur constantly or in many cases, for if this did not happen, it was an accidental effect.
7. The experimentation must be done with the human body, for testing a drug on a lion or a horse might not prove anything about its effect on man.
One of the most famous clinical trials was James Lind's demonstration in 1747 that citrus fruits cure scurvy.[11] He compared the effects of various acidic substances, ranging from vinegar to cider, on groups of afflicted sailors, and found that the group who were given oranges and lemons had largely recovered from scurvy after 6 days. Frederick Akbar Mahomed (d. 
1884), who worked at Guy's Hospital in London,[12] made substantial contributions to the process of clinical trials during his detailed clinical studies, where "he separated chronic nephritis with secondary hypertension from what we now term essential hypertension." He also founded "the Collective Investigation Record for the British Medical Association; this organization collected data from physicians practicing outside the hospital setting and was the precursor of modern collaborative clinical trials."[13]


One way of classifying clinical trials is by the way the researchers behave. In an observational study, the investigators observe the subjects and measure their outcomes. The researchers do not actively manage the experiment. An example is the Nurses' Health Study. In an interventional study, the investigators give the research subjects a particular medicine or other intervention. Usually, they compare the treated subjects to subjects who receive no treatment or standard treatment. Then the researchers measure how the subjects' health changes.
Another way of classifying trials is by their purpose. The U.S. National Institutes of Health (NIH) organizes trials into five different types:[14]
Prevention trials: look for better ways to prevent disease in people who have never had the disease or to prevent a disease from returning. These approaches may include medicines, vitamins, vaccines, minerals, or lifestyle changes.
Screening trials: test the best way to detect certain diseases or health conditions.
Diagnostic trials: conducted to find better tests or procedures for diagnosing a particular disease or condition.
Treatment trials: test experimental treatments, new combinations of drugs, or new approaches to surgery or radiation therapy.
Quality of life trials: explore ways to improve comfort and the quality of life for individuals with a chronic illness (a.k.a. Supportive Care trials).
Compassionate use trials or expanded access: provide partially tested, unapproved therapeutics to a small number of patients who have no other realistic options. Usually, this involves a disease for which no effective therapy exists, or a patient who has already attempted and failed all other standard treatments and whose health is so poor that he does not qualify for participation in randomized clinical trials. Usually, case-by-case approval must be granted by both the FDA and the pharmaceutical company for such exceptions.

A fundamental distinction in evidence-based medicine is between observational studies and randomized controlled trials. Types of observational studies in epidemiology such as the cohort study and the case-control study provide less compelling evidence than the randomized controlled trial. In observational studies, the investigators only observe associations (correlations) between the treatments experienced by participants and their health status or diseases. A randomized controlled trial is the study design that can provide the most compelling evidence that the study treatment causes the expected effect on human health. Currently, some Phase II and most Phase III drug trials are designed as randomized, double blind, and placebo-controlled. Randomized: Each study subject is randomly assigned to receive either the study treatment or a placebo. Blind: The subjects involved in the study do not know which study treatment they receive. If the study is double-blind, the researchers also do not know which treatment is being given to any given subject. This 'blinding' is to prevent biases, since if a physician knew which patient was getting the study treatment and which patient was getting the placebo, he/she might be tempted to give the (presumably helpful) study drug to a patient who could more easily benefit from it. In addition, a physician might give extra care to only the patients who receive the placebos to compensate for their ineffectiveness. A form of double-blind study called a "double-dummy"

design allows additional insurance against bias or placebo effect. In this kind of study, all patients are given both placebo and active doses in alternating periods of time during the study. Placebo-controlled: The use of a placebo (fake treatment) allows the researchers to isolate the effect of the study treatment. Although the term "clinical trials" is most commonly associated with the large, randomized studies typical of Phase III, many clinical trials are small. They may be "sponsored" by single physicians or a small group of physicians, and are designed to test simple questions. In the field of rare diseases sometimes the number of patients might be the limiting factor for a clinical trial. Other clinical trials require large numbers of participants (who may be followed over long periods of time), and the trial sponsor is a private company, a government health agency, or an academic research body such as a university.
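In practice, the 1:1 random assignment described above is often implemented with permuted-block randomization, which keeps the treatment and placebo arms balanced throughout enrollment. A minimal sketch (the function name and arm labels are illustrative, not from any real trial system):

```python
import random

def block_randomize(n_subjects, block_size=4, seed=7):
    """Permuted-block 1:1 randomization: within every block of
    `block_size` consecutive subjects, exactly half are assigned
    to active treatment and half to placebo, in random order."""
    assert block_size % 2 == 0, "block size must be even for 1:1 allocation"
    rng = random.Random(seed)
    schedule = []
    while len(schedule) < n_subjects:
        block = ["active"] * (block_size // 2) + ["placebo"] * (block_size // 2)
        rng.shuffle(block)           # randomize order within the block
        schedule.extend(block)
    return schedule[:n_subjects]
```

In a blinded trial the schedule would be held by the sponsor or pharmacy, and subjects and investigators would see only coded kit numbers rather than these arm labels.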


Active comparator studies

Over roughly the last decade, it has become common practice to conduct "active comparator" studies (also known as "active control" trials). In other words, when a treatment exists that is clearly better than doing nothing for the subject (i.e. giving them the placebo), the alternative treatment is a standard-of-care therapy, and the study compares the 'test' treatment to the standard-of-care therapy. A growing trend in the pharmacology field involves the use of third-party contractors to obtain the required comparator compounds. Such third parties provide expertise in the logistics of obtaining, storing, and shipping the comparators. As an advantage to the manufacturer of the comparator compounds, a well-established comparator sourcing agency can alleviate the problem of parallel importing (importing a patented compound for sale in a country outside the patenting agency's sphere of influence).

Clinical trial protocol

A clinical trial protocol is a document used to gain confirmation of the trial design by a panel of experts and adherence by all study investigators, even if conducted in various countries. The protocol describes the scientific rationale, objective(s), design, methodology, statistical considerations, and organization of the planned trial. Details of the trial are also provided in other documents referenced in the protocol such as an Investigator's Brochure. The protocol contains a precise study plan for executing the clinical trial, not only to assure safety and health of the trial subjects, but also to provide an exact template for trial conduct by investigators at multiple locations (in a "multicenter" trial) to perform the study in exactly the same way. This harmonization allows data to be combined collectively as though all investigators (referred to as "sites") were working closely together. The protocol also gives the study administrators (often a contract research organization) as well as the site team of physicians, nurses and clinic administrators a common reference document for site responsibilities during the trial. The format and content of clinical trial protocols sponsored by pharmaceutical, biotechnology or medical device companies in the United States, European Union, or Japan have been standardized to follow Good Clinical Practice guidance[15] issued by the International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use (ICH).[16] Regulatory authorities in Canada and Australia also follow ICH guidelines. Some journals, e.g. Trials, encourage trialists to publish their protocols in the journal.



Design features
Informed consent
An essential component of initiating a clinical trial is to recruit study subjects by following procedures that use a signed document called "informed consent."[17] Informed consent is a legally defined process of a person being told about key facts involved in a clinical trial before deciding whether or not to participate. To fully describe participation to a candidate subject, the doctors and nurses involved in the trial explain the details of the study using terms the person will understand. Foreign-language translation is provided if the participant's native language is not the same as the study protocol's. The research team provides an informed consent document that includes trial details, such as its purpose, duration, required procedures, risks, potential benefits and key contacts. The participant then decides whether or not to sign the document in agreement. Informed consent is not an immutable contract, as the participant can withdraw at any time without penalty.
Statistical power
In designing a clinical trial, a sponsor must decide on the target number of patients who will participate. The sponsor's goal usually is to obtain a statistically significant result showing a significant difference in outcome (e.g., number of deaths after 28 days in the study) between the groups of patients who receive the study treatments. The number of patients required to give a statistically significant result depends on the question the trial wants to answer. For example, to show the effectiveness of a new drug in a non-curable disease such as metastatic kidney cancer requires many fewer patients than in a highly curable disease such as seminoma, if the drug is compared to a placebo. The number of patients enrolled in a study has a large bearing on the ability of the study to reliably detect the size of the effect of the study intervention. This is described as the "power" of the trial. 
The larger the sample size or number of participants in the trial, the greater the statistical power. However, in designing a clinical trial, this consideration must be balanced with the fact that more patients make for a more expensive trial. The power of a trial is not a single, unique value; it estimates the ability of a trial to detect a difference of a particular size (or larger) between the treated (tested drug/device) and control (placebo or standard treatment) groups. For example, a trial of a lipid-lowering drug versus placebo with 100 patients in each group might have a power of 0.90 to detect a difference between patients receiving study drug and patients receiving placebo of 10 mg/dL or more, but only a power of 0.70 to detect a difference of 5 mg/dL.
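The relationship between sample size, effect size, and power can be made concrete with a standard two-sample z-test power calculation. This is a simplified sketch; the 22 mg/dL standard deviation below is an assumed value chosen for illustration, not a figure from the text:

```python
from math import sqrt, erf

def normal_cdf(x):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def two_sample_power(delta, sd, n_per_group):
    """Approximate power of a two-sided, two-sample z-test (alpha = 0.05)
    to detect a true mean difference `delta` between two groups with
    common standard deviation `sd` and `n_per_group` patients per arm."""
    se = sd * sqrt(2 / n_per_group)   # standard error of the difference in means
    z_crit = 1.96                     # critical value for alpha = 0.05, two-sided
    return normal_cdf(delta / se - z_crit)

# With an assumed sd of 22 mg/dL and 100 patients per arm, the power to
# detect a 10 mg/dL difference is much higher than for a 5 mg/dL one,
# and doubling enrollment to 200 per arm raises power further.
```

This one-sided approximation ignores the small chance of rejecting in the wrong direction, which is standard practice for power calculations.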

Placebo groups
Merely giving a treatment can have nonspecific effects, and these are controlled for by the inclusion of a placebo group. Subjects in the treatment and placebo groups are assigned randomly and blinded as to which group they belong. Since researchers can behave differently toward subjects given treatments or placebos, trials are also double-blinded so that the researchers do not know to which group a subject is assigned. Assigning a person to a placebo group can pose an ethical problem if it violates his or her right to receive the best available treatment. The Declaration of Helsinki provides guidelines on this issue.



Clinical trials involving new drugs are commonly classified into four phases. Each phase of the drug approval process is treated as a separate clinical trial. The drug-development process will normally proceed through all four phases over many years. If the drug successfully passes through Phases I, II, and III, it will usually be approved by the national regulatory authority for use in the general population. Phase IV trials are 'post-approval' studies. Before pharmaceutical companies start clinical trials on a drug, they conduct extensive pre-clinical studies.

Pre-clinical studies
Pre-clinical studies involve in vitro (test tube or cell culture) and in vivo (animal) experiments using wide-ranging doses of the study drug to obtain preliminary efficacy, toxicity and pharmacokinetic information. Such tests assist pharmaceutical companies in deciding whether a drug candidate has scientific merit for further development as an investigational new drug.

Phase 0
Phase 0 is a recent designation for exploratory, first-in-human trials conducted in accordance with the United States Food and Drug Administration's (FDA) 2006 Guidance on Exploratory Investigational New Drug (IND) Studies.[18] Phase 0 trials are also known as human microdosing studies and are designed to speed up the development of promising drugs or imaging agents by establishing very early on whether the drug or agent behaves in human subjects as was expected from preclinical studies. Distinctive features of Phase 0 trials include the administration of single subtherapeutic doses of the study drug to a small number of subjects (10 to 15) to gather preliminary data on the agent's pharmacokinetics (how the body processes the drug) and pharmacodynamics (how the drug works in the body).[19] A Phase 0 study gives no data on safety or efficacy, being by definition a dose too low to cause any therapeutic effect. Drug development companies carry out Phase 0 studies to rank drug candidates in order to decide which has the best pharmacokinetic parameters in humans to take forward into further development. They enable go/no-go decisions to be based on relevant human models instead of relying on sometimes inconsistent animal data. Questions have been raised by experts about whether Phase 0 trials are useful, ethically acceptable, feasible, speed up the drug development process or save money, and whether there is room for improvement.[20]

Phase I
Phase I trials are the first stage of testing in human subjects. Normally, a small (20-100) group of healthy volunteers will be selected. This phase includes trials designed to assess the safety (pharmacovigilance), tolerability, pharmacokinetics, and pharmacodynamics of a drug. These trials are often conducted in an inpatient clinic, where the subject can be observed by full-time staff. The subject who receives the drug is usually observed until several half-lives of the drug have passed. Phase I trials also normally include dose-ranging, also called dose escalation, studies so that the appropriate dose for therapeutic use can be found. The tested range of doses will usually be a fraction of the dose that causes harm in animal testing. Phase I trials most often include healthy volunteers. However, there are some circumstances when real patients are used, such as patients who have terminal cancer or HIV and lack other treatment options. Volunteers are paid an inconvenience fee for their time spent in the volunteer centre. Pay ranges from a small amount of money for a short period of residence, to a larger amount of up to approximately $6,000, depending on length of participation. There are different kinds of Phase I trials:
SAD: Single Ascending Dose studies are those in which small groups of subjects are given a single dose of the drug while they are observed and tested for a period of time. If they do not exhibit any adverse side effects, and the

pharmacokinetic data is roughly in line with predicted safe values, the dose is escalated, and a new group of subjects is then given a higher dose. This is continued until pre-calculated pharmacokinetic safety levels are reached, or intolerable side effects start showing up (at which point the drug is said to have reached the maximum tolerated dose, or MTD).
MAD: Multiple Ascending Dose studies are conducted to better understand the pharmacokinetics and pharmacodynamics of multiple doses of the drug. In these studies, a group of patients receives multiple low doses of the drug, while samples (of blood and other fluids) are collected at various time points and analyzed to understand how the drug is processed within the body. The dose is subsequently escalated for further groups, up to a predetermined level.
Food effect: A short trial designed to investigate any differences in absorption of the drug by the body caused by eating before the drug is given. These studies are usually run as a crossover study, with volunteers being given two identical doses of the drug on different occasions; one while fasted, and one after being fed.
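The single ascending dose escalation described above can be summarized as a simple loop. This is a schematic sketch only; `tolerated` stands in for the clinical safety review of each cohort and is a hypothetical placeholder, not a real API:

```python
def single_ascending_dose(dose_levels, tolerated):
    """Sketch of SAD escalation: each cohort receives one pre-planned
    dose level; escalation continues only while cohorts tolerate the
    dose. Returns the maximum tolerated dose (MTD), or None if even
    the lowest dose is intolerable."""
    mtd = None
    for dose in dose_levels:          # ascending, pre-calculated levels
        if tolerated(dose):           # cohort cleared its safety review
            mtd = dose
        else:                         # intolerable side effects: stop escalating
            break
    return mtd

# e.g. dose levels in mg, with toxicity first appearing above 40 mg:
# single_ascending_dose([5, 10, 20, 40, 80], lambda d: d <= 40)  # returns 40
```

Real escalation designs also encode cohort sizes and stopping rules (e.g. the "3+3" scheme), which this sketch omits.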


Phase II
Once the initial safety of the study drug has been confirmed in Phase I trials, Phase II trials are performed on larger groups (20-300) and are designed to assess how well the drug works, as well as to continue Phase I safety assessments in a larger group of volunteers and patients. When the development process for a new drug fails, this usually occurs during Phase II trials, when the drug is discovered not to work as planned or to have toxic effects. Phase II studies are sometimes divided into Phase IIA and Phase IIB. Phase IIA is specifically designed to assess dosing requirements (how much drug should be given). Phase IIB is specifically designed to study efficacy (how well the drug works at the prescribed dose(s)). Some trials combine Phase I and Phase II, and test both efficacy and toxicity.
Trial design
Some Phase II trials are designed as case series, demonstrating a drug's safety and activity in a selected group of patients. Other Phase II trials are designed as randomized clinical trials, where some patients receive the drug/device and others receive placebo/standard treatment. Randomized Phase II trials have far fewer patients than randomized Phase III trials.

Phase III
Phase III studies are randomized controlled multicenter trials on large patient groups (300–3,000 or more, depending upon the disease/medical condition studied) and are aimed at being the definitive assessment of how effective the drug is, in comparison with the current 'gold standard' treatment. Because of their size and comparatively long duration, Phase III trials are the most expensive, time-consuming and difficult trials to design and run, especially in therapies for chronic medical conditions. It is common practice that certain Phase III trials will continue while the regulatory submission is pending at the appropriate regulatory agency. This allows patients to continue to receive possibly lifesaving drugs until the drug can be obtained by purchase. Other reasons for performing trials at this stage include attempts by the sponsor at "label expansion" (to show the drug works for additional types of patients/diseases beyond the original use for which the drug was approved for marketing), to obtain additional safety data, or to support marketing claims for the drug. Studies in this phase are by some companies categorised as "Phase IIIB studies."[21] [22] While not required in all cases, it is typically expected that there be at least two successful Phase III trials, demonstrating a drug's safety and efficacy, in order to obtain approval from the appropriate regulatory agencies such as the FDA (USA) or the EMA (European Union).

Once a drug has proved satisfactory after Phase III trials, the trial results are usually combined into a large document containing a comprehensive description of the methods and results of human and animal studies, manufacturing procedures, formulation details, and shelf life. This collection of information makes up the "regulatory submission" that is provided for review to the appropriate regulatory authorities[3] in different countries. They will review the submission, and, it is hoped, give the sponsor approval to market the drug. Most drugs undergoing Phase III clinical trials can be marketed under FDA norms with proper recommendations and guidelines, but if any adverse effects are reported anywhere, the drugs must be recalled immediately from the market. While most pharmaceutical companies refrain from this practice, it is not unusual to see drugs still undergoing Phase III clinical trials on the market.[23]


Phase IV
A Phase IV trial is also known as a post-marketing surveillance trial. Phase IV trials involve the safety surveillance (pharmacovigilance) and ongoing technical support of a drug after it receives permission to be sold. Phase IV studies may be required by regulatory authorities or may be undertaken by the sponsoring company for competitive (finding a new market for the drug) or other reasons (for example, the drug may not have been tested for interactions with other drugs, or on certain population groups such as pregnant women, who are unlikely to subject themselves to trials). The safety surveillance is designed to detect any rare or long-term adverse effects over a much larger patient population and longer time period than was possible during the Phase I-III clinical trials. Harmful effects discovered by Phase IV trials may result in a drug being no longer sold, or restricted to certain uses: recent examples involve cerivastatin (brand names Baycol and Lipobay), troglitazone (Rezulin) and rofecoxib (Vioxx).

Clinical trials are only a small part of the research that goes into developing a new treatment. Potential drugs, for example, first have to be discovered, purified, characterized, and tested in labs (in cell and animal studies) before ever undergoing clinical trials. In all, about 1,000 potential drugs are tested before just one reaches the point of being tested in a clinical trial. For example, a new cancer drug has, on average, 6 years of research behind it before it even makes it to clinical trials. But the major holdup in making new cancer drugs available is the time it takes to complete clinical trials themselves. On average, about 8 years pass from the time a cancer drug enters clinical trials until it receives approval from regulatory agencies for sale to the public. Drugs for other diseases have similar timelines. Some reasons a clinical trial might last several years:
For chronic conditions like cancer, it takes months, if not years, to see if a cancer treatment has an effect on a patient.
For drugs that are not expected to have a strong effect (meaning a large number of patients must be recruited to observe any effect), recruiting enough patients to test the drug's effectiveness (i.e., getting statistical power) can take several years.
Only certain people who have the target disease condition are eligible to take part in each clinical trial. Researchers who treat these particular patients must participate in the trial. Then they must identify the desirable patients and obtain consent from them or their families to take part in the trial.
The biggest barrier to completing studies is the shortage of people who take part. All drug and many device trials target a subset of the population, meaning not everyone can participate. Some drug trials require patients to have unusual combinations of disease characteristics. 
It is a challenge to find the appropriate patients and obtain their consent, especially when they may receive no direct benefit (because they are not paid, the study drug is not yet proven to work, or the patient may receive a placebo). In the case of cancer patients, fewer than 5% of adults with cancer will participate in drug trials. According to the Pharmaceutical Research and Manufacturers of America (PhRMA), about 400 cancer medicines were being tested in clinical trials in 2005. Not all of these will prove to be useful, but those that are may be delayed in getting approved because the number of participants is so low.[24]

For clinical trials involving a seasonal indication (such as airborne allergies, seasonal affective disorder, influenza, and others), the study can only be done during a limited part of the year (such as spring for pollen allergies), when the drug can be tested. This can add to the length of the study, yet proper planning and the use of trial sites in the southern as well as northern hemispheres allow for year-round trials and can reduce the length of the studies.[25] [26] Clinical trials that do not involve a new drug usually have a much shorter duration. (Exceptions are epidemiological studies like the Nurses' Health Study.)


Clinical trials designed by a local investigator and (in the U.S.) federally funded clinical trials are almost always administered by the researcher who designed the study and applied for the grant. Small-scale device studies may be administered by the sponsoring company. Phase III and Phase IV clinical trials of new drugs are usually administered by a contract research organization (CRO) hired by the sponsoring company. (The sponsor provides the drug and medical oversight.) A CRO is a company that is contracted to perform all the administrative work on a clinical trial. It recruits participating researchers, trains them, provides them with supplies, coordinates study administration and data collection, sets up meetings, monitors the sites for compliance with the clinical protocol, and ensures that the sponsor receives 'clean' data from every site. Recently, site management organizations have also been hired to coordinate with the CRO to ensure rapid IRB/IEC approval and faster site initiation and patient recruitment. At a participating site, one or more research assistants (often nurses) do most of the work in conducting the clinical trial. The research assistant's job can include some or all of the following: providing the local Institutional Review Board (IRB) with the documentation necessary to obtain its permission to conduct the study, assisting with study start-up, identifying eligible patients, obtaining consent from them or their families, administering study treatment(s), collecting and statistically analyzing data, maintaining and updating data files during followup, and communicating with the IRB, as well as the sponsor and CRO.

Ethical conduct
Clinical trials are closely supervised by appropriate regulatory authorities. All studies that involve a medical or therapeutic intervention on patients must be approved by a supervising ethics committee before permission is granted to run the trial. The local ethics committee has discretion on how it will supervise noninterventional studies (observational studies or those using already collected data). In the U.S., this body is called the Institutional Review Board (IRB). Most IRBs are located at the local investigator's hospital or institution, but some sponsors allow the use of a central (independent/for profit) IRB for investigators who work at smaller institutions. To be ethical, researchers must obtain the full and informed consent of participating human subjects. (One of the IRB's main functions is ensuring that potential patients are adequately informed about the clinical trial.) If the patient is unable to consent for him/herself, researchers can seek consent from the patient's legally authorized representative. In California, the state has prioritized[27] the individuals who can serve as the legally authorized representative. In some U.S. locations, the local IRB must certify researchers and their staff before they can conduct clinical trials. They must understand the federal patient privacy (HIPAA) law and good clinical practice. The International Conference on Harmonisation Guidelines for Good Clinical Practice (ICH GCP) is a set of standards used internationally for the conduct of clinical trials. The guidelines aim to ensure that the "rights, safety and well-being of trial subjects are protected". The notion of informed consent of participating human subjects exists in many countries all over the world, but its precise definition may still vary.

Informed consent is clearly a necessary condition for ethical conduct but does not ensure ethical conduct. The final objective is to serve the community of patients or future patients in a best-possible and most responsible way. However, it may be hard to turn this objective into a well-defined, quantified objective function. In some cases this can be done, however, as for instance for questions of when to stop sequential treatments (see Odds algorithm), and then quantified methods may play an important role.
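The Odds algorithm mentioned above (due to F. T. Bruss) gives the optimal rule for stopping on the last "success" in a sequence of independent events, which is one way such sequential stopping questions can be quantified. A sketch, assuming each event's success probability is known in advance and strictly less than 1:

```python
def odds_algorithm(p):
    """Bruss's odds algorithm: given success probabilities p[0..n-1]
    (each < 1) of independent events observed in order, return the
    index s from which one should stop on the first success, together
    with the resulting probability of stopping on the last success."""
    r = [pi / (1 - pi) for pi in p]        # odds of each event
    total, s = 0.0, 0
    for i in range(len(p) - 1, -1, -1):    # sum odds backwards from the end
        total += r[i]
        if total >= 1:
            s = i                          # stop summing once odds reach 1
            break
    q_prod = 1.0
    for pi in p[s:]:
        q_prod *= (1 - pi)                 # probability all events from s on fail
    win = q_prod * sum(r[s:])              # optimal win probability
    return s, win
```

For three events each with success probability 0.5, the rule says to wait for the last event, and the win probability is 0.5; the theorem guarantees this optimal probability is always at least 1/e when the summed odds reach 1.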


Responsibility for the safety of the subjects in a clinical trial is shared between the sponsor, the local site investigators (if different from the sponsor), the various IRBs that supervise the study, and (in some cases, if the study involves a marketable drug or device) the regulatory agency for the country where the drug or device will be sold. For safety reasons, many clinical trials of drugs are designed to exclude women of childbearing age, pregnant women, and/or women who become pregnant during the study. In some cases the male partners of these women are also excluded or required to take birth control measures.

Throughout the clinical trial, the sponsor is responsible for accurately informing the local site investigators of the true historical safety record of the drug, device or other medical treatments to be tested, and of any potential interactions of the study treatment(s) with already approved medical treatments. This allows the local investigators to make an informed judgment on whether to participate in the study or not. The sponsor is responsible for monitoring the results of the study as they come in from the various sites, as the trial proceeds. In larger clinical trials, a sponsor will use the services of a Data Monitoring Committee (DMC, known in the U.S. as a Data Safety Monitoring Board). This is an independent group of clinicians and statisticians. The DMC meets periodically to review the unblinded data that the sponsor has received so far. The DMC has the power to recommend termination of the study based on their review, for example if the study treatment is causing more deaths than the standard treatment, or seems to be causing unexpected and study-related serious adverse events. The sponsor is responsible for collecting adverse event reports from all site investigators in the study, and for informing all the investigators of the sponsor's judgment as to whether these adverse events were related or not related to the study treatment. This is an area where sponsors can slant their judgment to favor the study treatment. The sponsor and the local site investigators are jointly responsible for writing a site-specific informed consent that accurately informs the potential subjects of the true risks and potential benefits of participating in the study, while at the same time presenting the material as briefly as possible and in ordinary language. FDA regulations and ICH guidelines both require that "the information that is given to the subject or the representative shall be in language understandable to the subject or the representative." 
If the participant's native language is not English, the sponsor must translate the informed consent into the language of the participant.[28]
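A DMC's interim safety review can be pictured as a comparison of event rates between arms. The sketch below is purely illustrative and assumes a one-sided two-proportion z-test with a conservative critical value; a real DMC charter would instead use formal group-sequential stopping boundaries.

```python
from math import sqrt

def interim_safety_check(deaths_trt, n_trt, deaths_ctl, n_ctl, z_crit=2.576):
    """Flag an excess of deaths in the treatment arm.

    Uses a one-sided two-proportion z-test against a conservative
    critical value (z_crit is an assumption for illustration).
    Returns True if the DMC sketch would recommend stopping for harm.
    """
    p_trt = deaths_trt / n_trt
    p_ctl = deaths_ctl / n_ctl
    pooled = (deaths_trt + deaths_ctl) / (n_trt + n_ctl)
    se = sqrt(pooled * (1 - pooled) * (1 / n_trt + 1 / n_ctl))
    z = (p_trt - p_ctl) / se if se > 0 else 0.0
    return z > z_crit
```

For example, 30 deaths out of 100 on treatment versus 5 out of 100 on the standard arm crosses the threshold, while 6 versus 5 does not.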



Local site investigators

A physician's first duty is to his or her patients, and if a physician investigator believes that the study treatment may be harming subjects in the study, the investigator can stop participating at any time. On the other hand, investigators often have a financial interest in recruiting subjects, and can act unethically in order to obtain and maintain their participation.

The local investigators are responsible for conducting the study according to the study protocol, and for supervising the study staff throughout the duration of the study. The local investigator or his or her study staff are also responsible for ensuring that potential subjects in the study understand the risks and potential benefits of participating; in other words, that they (or their legally authorized representatives) give truly informed consent.

The local investigators are responsible for reviewing all adverse event reports sent by the sponsor. (These adverse event reports contain the opinions of both the investigator at the site where the adverse event occurred and the sponsor regarding the relationship of the adverse event to the study treatments.) The local investigators are responsible for making an independent judgment of these reports, and for promptly informing the local IRB of all serious and study-treatment-related adverse events. When a local investigator is the sponsor, there may not be formal adverse event reports, but study staff at all locations are responsible for informing the coordinating investigator of anything unexpected. The local investigator is responsible for being truthful to the local IRB in all communications relating to the study.

Institutional review boards (IRBs)

Approval by an IRB, or ethics board, is necessary before all but the most informal medical research can begin. In commercial clinical trials, the study protocol is not approved by an IRB before the sponsor recruits sites to conduct the trial; however, the study protocol and procedures have been tailored to fit generic IRB submission requirements. In this case, and where there is no independent sponsor, each local site investigator submits the study protocol, the consent(s), the data collection forms, and supporting documentation to the local IRB. Universities and most hospitals have in-house IRBs; other researchers (such as those in walk-in clinics) use independent IRBs. The IRB scrutinizes the study both for medical safety and for protection of the patients involved, before it allows the researcher to begin the study. It may require changes in study procedures or in the explanations given to the patient. A required yearly "continuing review" report from the investigator updates the IRB on the progress of the study and any new safety information related to the study.

Regulatory agencies
If a clinical trial concerns a new regulated drug or medical device (or an existing drug for a new purpose), the appropriate regulatory agency for each country where the sponsor wishes to sell the drug or device is supposed to review all study data before allowing the drug/device to proceed to the next phase, or to be marketed. However, if the sponsor withholds negative data, or misrepresents data it has acquired from clinical trials, the regulatory agency may make the wrong decision.

In the U.S., the FDA can audit the files of local site investigators after they have finished participating in a study, to see if they were correctly following study procedures. This audit may be random, or for cause (because the investigator is suspected of fraudulent data). Avoiding an audit is an incentive for investigators to follow study procedures.

Different countries have different regulatory requirements and enforcement abilities. "An estimated 40 percent of all clinical trials now take place in Asia, Eastern Europe, and Central and South America. There is no compulsory registration system for clinical trials in these countries and many do not follow European directives in their operations," says Dr. Jacob Sijtsma of the Netherlands-based WEMOS, an advocacy health organisation tracking clinical trials in developing countries.[29]


In March 2006, the drug TGN1412 caused catastrophic systemic organ failure in the six volunteers receiving it during its first-in-human (Phase I) clinical trial in Great Britain. Following this, an Expert Group on Phase One Clinical Trials published a report.[30]

The cost of a study depends on many factors, especially the number of sites conducting the study, the number of patients required, and whether the study treatment is already approved for medical use. Clinical trials follow a standardized process. The costs to a pharmaceutical company of administering a Phase III or IV clinical trial may include, among others:

- manufacturing the drug(s)/device(s) tested
- staff salaries for the designers and administrators of the trial
- payments to the contract research organization, the site management organization (if used) and any outside consultants
- payments to local researchers (and their staffs) for their time and effort in recruiting patients and collecting data for the sponsor
- study materials and shipping
- communication with the local researchers, including onsite monitoring by the CRO before and (in some cases) multiple times during the study
- one or more investigator training meetings
- costs incurred by the local researchers, such as pharmacy fees, IRB fees and postage
- any payments to patients enrolled in the trial (all payments are strictly overseen by the IRBs to ensure that patients do not feel coerced to take part in the trial by overly attractive payments)

These costs are incurred over several years. In the U.S. there is a 50% tax credit for sponsors of certain clinical trials.[31]

National health agencies such as the U.S. National Institutes of Health offer grants to investigators who design clinical trials that attempt to answer research questions that interest the agency. In these cases, the investigator who writes the grant and administers the study acts as the sponsor, and coordinates data collection from any other sites. These other sites may or may not be paid for participating in the study, depending on the amount of the grant and the amount of effort expected from them. Clinical trials are traditionally expensive and difficult to undertake. Using internet resources can, in some cases, reduce the economic burden.[32]
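The cost components listed above roll up by simple arithmetic: fixed costs, plus per-site costs times the number of sites, plus per-patient costs times total enrollment. A minimal sketch, where every bucket name and figure is hypothetical and chosen only to show the roll-up, not to reflect real prices:

```python
def estimate_trial_cost(n_sites, patients_per_site,
                        fixed_costs, per_site_costs, per_patient_costs):
    """Sum fixed, per-site, and per-patient cost buckets for a trial.

    Each dict maps a cost-bucket name to an amount. All figures are
    caller-supplied assumptions, not actual industry prices.
    """
    n_patients = n_sites * patients_per_site
    return (sum(fixed_costs.values())
            + n_sites * sum(per_site_costs.values())
            + n_patients * sum(per_patient_costs.values()))

# Hypothetical illustration: 10 sites, 50 patients each.
total = estimate_trial_cost(
    n_sites=10, patients_per_site=50,
    fixed_costs={"drug_manufacturing": 1_000_000, "training_meetings": 50_000},
    per_site_costs={"irb_fees": 5_000, "monitoring_visits": 20_000},
    per_patient_costs={"recruitment_payment": 1_000, "materials_shipping": 200},
)
```

With these made-up figures the model yields 1,050,000 fixed + 250,000 site-level + 600,000 patient-level costs.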

Many clinical trials do not involve any money. However, when the sponsor is a private company or a national health agency, investigators are almost always paid to participate. These amounts can be small, just covering a partial salary for research assistants and the cost of any supplies (usually the case with national health agency studies), or be substantial and include 'overhead' that allows the investigator to pay the research staff during times in between clinical trials.



In Phase I drug trials, participants are paid because they give up their time (sometimes away from their homes) and are exposed to unknown risks, without the expectation of any benefit. In most other trials, however, patients are not paid, in order to ensure that their motivation for participating is the hope of getting better or contributing to medical knowledge, without their judgment being skewed by financial considerations. However, they are often given small payments for study-related expenses like travel or as compensation for their time in providing follow-up information about their health after they are discharged from medical care.

Participating in a clinical trial

Phase 0 and Phase I drug trials seek healthy volunteers. Most other clinical trials seek patients who have a specific disease or medical condition.

Locating trials
Depending on the kind of participants required, sponsors of clinical trials use various recruitment strategies, including patient databases, newspaper and radio advertisements, flyers, posters in places the patients might go (such as doctors' offices), and personal recruitment of patients by investigators.

[Image: Newspaper advertisements seeking patients and healthy volunteers to participate in clinical trials.]

Volunteers with specific conditions or diseases have additional online resources to help them locate clinical trials. For example, people with Parkinson's disease can use PDtrials to find up-to-date information on Parkinson's disease trials currently enrolling participants in the U.S. and Canada, and search for specific Parkinson's clinical trials using criteria such as location, trial type, and symptom.[33] Other disease-specific services exist for volunteers to find trials related to their condition.[34] Volunteers may also search directly on ClinicalTrials.gov to locate trials using a registry run by the U.S. National Institutes of Health and National Library of Medicine. However, many clinical trials will not accept participants who contact them directly to volunteer, as it is believed this may bias the characteristics of the population being studied. Such trials typically recruit via networks of medical professionals who ask their individual patients to consider enrollment.
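A registry search by criteria such as location, trial type, and condition is, at its core, record filtering. A minimal sketch, in which the registry entries and field names are invented for illustration and do not mirror any real registry's schema:

```python
def find_trials(trials, **criteria):
    """Return the trials whose fields match every supplied criterion."""
    return [t for t in trials
            if all(t.get(field) == wanted for field, wanted in criteria.items())]

# Hypothetical registry entries.
registry = [
    {"condition": "Parkinson's disease", "location": "US", "type": "interventional"},
    {"condition": "Parkinson's disease", "location": "Canada", "type": "observational"},
    {"condition": "asthma", "location": "US", "type": "interventional"},
]

matches = find_trials(registry, condition="Parkinson's disease", location="US")
```

Adding more keyword arguments narrows the search, mirroring how a volunteer would stack filters on a registry site.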

Steps for volunteers

Before participating in a clinical trial, interested volunteers should speak with their doctors, family members, and others who have participated in trials in the past. After locating a trial, volunteers will often have the opportunity to speak or e-mail the clinical trial coordinator for more information and to get answers to any questions. After receiving consent from their doctors, volunteers then arrange an appointment for a screening visit with the trial coordinator.[35]

All volunteers being considered for a trial are required to undertake a medical screen. There are different requirements for different trials, but typically volunteers will have the following tests:[36]

- Measurement of the electrical activity of the heart (ECG)
- Measurement of blood pressure, heart rate and temperature
- Blood sampling
- Urine sampling
- Weight and height measurement
- Drug abuse testing
- Pregnancy testing (females only)
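Screening measurements like these are typically checked against reference ranges. The sketch below illustrates such a check; the ranges are generic adult values chosen for illustration and are not the limits any particular trial protocol would use.

```python
# Illustrative adult reference ranges (lower, upper); assumptions only,
# not protocol-specific eligibility limits.
REFERENCE_RANGES = {
    "systolic_bp_mmhg": (90, 140),
    "heart_rate_bpm": (50, 100),
    "temperature_c": (36.1, 37.8),
}

def out_of_range(measurements):
    """Return the names of measurements falling outside their range."""
    flagged = []
    for name, value in measurements.items():
        bounds = REFERENCE_RANGES.get(name)
        if bounds and not (bounds[0] <= value <= bounds[1]):
            flagged.append(name)
    return flagged
```

A coordinator-style workflow would record the vitals, run the check, and follow up on anything flagged before enrolling the volunteer.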


Marcia Angell has been a stern critic of U.S. health care in general and the pharmaceutical industry in particular, and she is scathing on the topic of how clinical trials are conducted in America:

"Many drugs that are assumed to be effective are probably little better than placebos, but there is no way to know because negative results are hidden.... Because favorable results were published and unfavorable results buried ... the public and the medical profession believed these drugs were potent.... Clinical trials are also biased through designs for research that are chosen to yield favorable results for sponsors. For example, the sponsor's drug may be compared with another drug administered at a dose so low that the sponsor's drug looks more powerful. Or a drug that is likely to be used by older people will be tested in young people, so that side effects are less likely to emerge. A common form of bias stems from the standard practice of comparing a new drug with a placebo, when the relevant question is how it compares with an existing drug. In short, it is often possible to make clinical trials come out pretty much any way you want, which is why it's so important that investigators be truly disinterested in the outcome of their work.... It is simply no longer possible to believe much of the clinical research that is published, or to rely on the judgment of trusted physicians or authoritative medical guidelines. I take no pleasure in this conclusion, which I reached slowly and reluctantly over my two decades as an editor of the New England Journal of Medicine."[37]

Angell believes that members of medical school faculties who conduct clinical trials should not accept any payments from drug companies except research support, and that that support should have no strings attached, including control by the companies over the design, interpretation, and publication of research results.
She has speculated that "perhaps most" of the clinical trials are viewed by critics as "excuses to pay doctors to put patients on a company's already-approved drug".[38] Seeding trials are particularly controversial.[39]

See also
Academic clinical trials
Bioethics
CIOMS Guidelines
Clinical Data Interchange Standards Consortium
Clinical data acquisition
Clinical site
Clinical trial management
Community-based clinical trial
Contract Research Organization
Data Monitoring Committees
Drug development
Drug recall
Electronic Common Technical Document
Ethical problems using children in clinical trials
European Medicines Agency
FDA Special Protocol Assessment
Health care
Health care politics
IFPMA
Investigational Device Exemption
ISO 10006
Medical ethics
Nocebo
Nursing ethics
Odds algorithm
Orphan drug
Philosophy of Healthcare
Randomized controlled trial
World Medical Association

Further reading


Rang HP, Dale MM, Ritter JM, Moore PK (2003). Pharmacology, 5th ed. Edinburgh: Churchill Livingstone. ISBN 0-443-07145-4
Finn R (1999). Cancer Clinical Trials: Experimental Treatments and How They Can Help You. Sebastopol: O'Reilly & Associates. ISBN 1-56592-566-1
Chow S-C and Liu JP (2004). Design and Analysis of Clinical Trials: Concepts and Methodologies. ISBN 0-471-24985-8
Pocock SJ (2004). Clinical Trials: A Practical Approach. John Wiley & Sons. ISBN 0-471-90155-5

External links
Clinical Trials [40] at the Open Directory Project
The International Conference on Harmonization of Technical Requirements for Registration of Pharmaceuticals for Human Use (ICH) [41]
The International Clinical Trials Registry Platform (ICTRP) [42]
IFPMA Clinical Trials Portal (IFPMA CTP) [43]
ClinicalTrials.gov [44] to find ongoing and completed trials of new medicines
PDtrials [45] for Parkinson's disease clinical trials
ICH GCP Guidelines [46]

References

[1] Avorn J. (2004). Powerful Medicines, pp. 129-133. Alfred A. Knopf.
[2] Van Spall HG, Toren A, Kiss A, Fowler RA (March 2007). "Eligibility criteria of randomized controlled trials published in high-impact general medical journals: a systematic sampling review". JAMA 297 (11): 1233-40. doi:10.1001/jama.297.11.1233. PMID 17374817.
[3] The regulatory authority in the USA is the Food and Drug Administration; in Canada, Health Canada; in the European Union, the European Medicines Agency; and in Japan, the Ministry of Health, Labour and Welfare.
[4] http://www.pmda.go.jp/ich/s/s1b_98_7_9e.pdf (Japanese)
[5] ICH: http://www.ich.org/cache/compo/276-254-1.html
[6] Stephanie Green, Jacqueline Benedetti, John Crowley (2003). Clinical Trials in Oncology. CRC Press. p. 1. ISBN 1584883022.
[7] Shayne Cox Gad (2009). Clinical Trials Handbook. John Wiley and Sons. p. 118. ISBN 0471213888.
[8] Toby E. Huff (2003). The Rise of Early Modern Science: Islam, China, and the West, p. 218. Cambridge University Press. ISBN 0521529948.
[9] Tschanz, David W. (May/June 1997). "The Arab Roots of European Medicine". Saudi Aramco World 48 (3): 20-31.
[10] D. Craig Brater and Walter J. Daly (2000). "Clinical pharmacology in the Middle Ages: Principles that presage the 21st century". Clinical Pharmacology & Therapeutics 67 (5): 447-450 [448].
[11] "James Lind: A Treatise of the Scurvy (1754)" (2001). http://www.bruzelius.info/Nautica/Medicine/Lind(1753).html. Retrieved 2007-09-09.
[12] O'Rourke, Michael F. (1992). "Frederick Akbar Mahomed". Hypertension (American Heart Association) 19: 212-217 [213].
[13] O'Rourke, Michael F. (1992). "Frederick Akbar Mahomed". Hypertension (American Heart Association) 19: 212-217 [212].
[14] Glossary of Clinical Trial Terms, NIH: http://clinicaltrials.gov/ct2/info/glossary
[15] ICH Guideline for Good Clinical Practice: Consolidated Guidance: http://www.fda.gov/cder/guidance/959fnl.pdf
[16] International Conference on Harmonization of Technical Requirements for Registration of Pharmaceuticals for Human Use: http://www.ich.org
[17] What is informed consent? US National Institutes of Health: http://clinicaltrials.gov/ct2/info/understand
[18] Food and Drug Administration. January 2006.
[19] The Lancet (2009). "Phase 0 trials: a platform for drug development?". Lancet 374 (9685): 176. doi:10.1016/S0140-6736(09)61309-X.
[20] Silvia Camporesi (October 2008). "Phase 0 workshop at the 20th EORTC-NCI-AACR symposium, Geneva". ecancermedicalscience. http://www.ecancermedicalscience.com/blog.asp?postId=27. Retrieved 2008-11-07.
[21] "Guidance for Institutional Review Boards and Clinical Investigators". Food and Drug Administration. 1999-03-16. http://www.fda.gov/oc/ohrt/irbs/drugsbiologics.html. Retrieved 2007-03-27.
[22] "Periapproval Services (Phase IIIb and IV programs)". Covance Inc. 2005. http://www.covance.com/periapproval/svc_phase3b.php. Retrieved 2007-03-27.
[23] Arcangelo, Virginia Poole; Andrew M. Peterson (2005). Pharmacotherapeutics for Advanced Practice: A Practical Approach. Lippincott Williams & Wilkins. ISBN 0781757843.
[24] Web Site Editor; Crossley, MJ; Turner, P; Thordarson, P (2007). "Clinical Trials - What You Need to Know". American Cancer Society 129 (22): 7155. doi:10.1021/ja0713781. PMID 17497782. http://www.cancer.org/docroot/ETO/content/ETO_6_3_Clinical_Trials_-_Patient_Participation.asp.
[25] Yamin Khan and Sarah Tilly. "Seasonality: The Clinical Trial Manager's Logistical Challenge". Pharm-Olam International. http://www.pharm-olam.com/pdfs/POI-Seasonality.pdf. Retrieved 26 April 2010.
[26] Yamin Khan and Sarah Tilly. "Flu, Season, Diseases Affect Trials". Applied Clinical Trials Online. http://appliedclinicaltrialsonline.findpharma.com/appliedclinicaltrials/Drug+Development/Flu-Season-Diseases-Affect-Trials/ArticleStandard/Article/detail/652128. Retrieved 26 February 2010.
[27] http://irb.ucsd.edu/ab_2328_bill_20020826_enrolled.pdf
[28] Back Translation for Quality Control of Informed Consent Forms: http://www.gts-translation.com/medicaltranslationpaper.pdf
[29] Common Dreams: http://www.commondreams.org/archive/2007/12/14/5838/
[30] Expert Group on Phase One Clinical Trials (Chairman: Professor Gordon W. Duff) (2006-12-07). "Expert Group on Phase One Clinical Trials: Final report". The Stationery Office. http://www.dh.gov.uk/en/Publicationsandstatistics/Publications/PublicationsPolicyAndGuidance/DH_063117. Retrieved 2007-05-24.
[31] "Tax Credit for Testing Expenses for Drugs for Rare Diseases or Conditions". Food and Drug Administration. 2001-04-17. http://www.fda.gov/orphan/taxcred.htm. Retrieved 2007-03-27.
[32] Paul, J.; Seib, R.; Prescott, T. (March 2005). "The Internet and clinical trials: background, online resources, examples and issues". Journal of Medical Internet Research 7 (1): e5. doi:10.2196/jmir.7.1.e5. PMID 15829477. PMC 1550630. http://www.jmir.org/2005/1/e5/.
[33] http://www.pdtrials.org/en/about_PDtrials_what
[34] http://www.mlanet.org/resources/hlth_tutorial/mod4c.html
[35] http://www.pdtrials.org/en/participate_clinicalresearch_how
[36] Life on a Trial - What to Expect: http://www.beavolunteer.co.uk/index.php?option=com_content&view=article&id=25&Itemid=21
[37] Angell, Marcia (2009). "Drug Companies & Doctors: A Story of Corruption". New York Review of Books, Vol 56, No 1; 15 January 2009.
[38] Angell M. (2004). The Truth About Drug Companies, p. 30.
[39] Sox HC, Rennie D (August 2008). "Seeding trials: just say "no"". Ann. Intern. Med. 149 (4): 279-80. PMID 18711161. http://www.annals.org/cgi/pmidlookup?view=long&pmid=18711161. Retrieved 2008-08-21.
[40] http://www.dmoz.org/Business/Biotechnology_and_Pharmaceuticals/Pharmaceuticals/Products_Evaluation/Clinical_Trials/
[41] http://www.ich.org
[42] http://www.who.int/trialsearch
[43] http://clinicaltrials.ifpma.org
[44] http://clinicaltrials.gov
[45] http://pdtrials.org
[46] http://www.gcphelpdesk.com






License: Creative Commons Attribution-Share Alike 3.0 Unported (http://creativecommons.org/licenses/by-sa/3.0/)