.N.B.:
- Incidence measures (e.g. relative risk or relative rate) can't be measured directly in a case-control study,
- Because the subjects are selected on the basis of already having (or not having) the disease.
STATISTICS STEP2 TikiTaka
- Relative risk and relative rate are calculated in cohort studies, where people are followed over time for the
occurrence of the disease.
- Prevalence odds ratio is calculated in cross-sectional studies to compare the prevalence of the disease in different
populations.
. Divides the study group into "exposed" and "non-exposed" to the risk factor(s).
. Each subject is then followed prospectively for the development of the disease.
. It is a prospective observational study in which groups are chosen based on the presence or absence of one or
more risk factors.
. All subjects are then observed over time for the development of the disease of interest,
. Thus allowing estimation of the incidence within the total population and comparison of incidences between
subgroups.
. It is best for determining the incidence of a disease and comparing the incidence in 2 populations
(one with and one without a given risk factor), which allows calculation of a relative risk.
. It provides stronger evidence than case-control and cross-sectional studies.
. Loss to follow-up in prospective studies creates a potential for selection bias (selective loss of high-risk or
low-risk subjects).
. E.g. if a substantial number of subjects are lost to follow-up in the exposed and/or unexposed groups,
. The lost subjects may differ from the remaining subjects in their risk of developing the outcome,
. Such loss may result in either overestimation or underestimation of the association between exposure and
disease.
. Example: if 30% of subjects were lost to follow-up in a prospective study of the relation between alcohol and
breast cancer,
. There is no information on whether these subjects developed breast cancer or not.
. This number (30%) is substantial and will influence the outcome if the risk of developing breast cancer differs
between the lost subjects and the remaining subjects.
. For example, if the subjects lost from the exposed group developed more breast cancer than those who remained
under follow-up (selective loss of high-risk subjects),
. The measure of association might be underestimated.
. To reduce the potential for selection bias in prospective studies, investigators try to achieve high rates of
follow-up.
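A minimal Python sketch of how selective loss of high-risk exposed subjects can pull the relative risk toward the null. All counts are hypothetical, and unlike a real study the sketch "knows" how many lost subjects developed the disease, which is exactly the information investigators lack:

```python
def risk_ratio(exp_cases, exp_total, unexp_cases, unexp_total):
    """Relative risk = risk in exposed / risk in unexposed."""
    return (exp_cases / exp_total) / (unexp_cases / unexp_total)

# Hypothetical cohort with complete follow-up: 1000 exposed, 1000 unexposed.
true_rr = risk_ratio(100, 1000, 50, 1000)  # 0.10 / 0.05

# 30% (300 subjects) lost from each group.
# Exposed: the lost subjects include 60 of the 100 cases (selective loss of high-risk subjects).
# Unexposed: losses are proportional, so 15 of the 50 cases are lost.
observed_rr = risk_ratio(100 - 60, 1000 - 300, 50 - 15, 1000 - 300)

print(f"RR with complete follow-up: {true_rr:.2f}")         # 2.00
print(f"RR among subjects followed up: {observed_rr:.2f}")  # 1.14
```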
.N.B.:
- Median survival: used to compare the median survival times in two or more groups of patients (e.g. receiving new
treatment or placebo).
- Median survival is calculated in cohort studies or clinical trials.
.N.B.:
- Prevalence odds ratio: is calculated in cross-sectional studies to compare the prevalence of the disease between two
different populations.
. INCIDENCE: is the frequency of new cases of a disease arising in a population at risk over a specified time period.
. It measures the appearance of new cases.
. PREVALENCE: is the measure of those with the disease in the population at a particular point in time.
. The relation between them in a stable population (little migration) can be expressed as:
Prevalence ≈ incidence × average duration of the disease.
. So if the incidence is fixed in a stable population, the prevalence increases when factors
prolong survival (i.e. disease duration), e.g. improved quality of care.
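A minimal Python sketch of the relation above, assuming a stable population and a fairly uncommon disease (the approximation becomes poor when prevalence is high); the incidence and durations are hypothetical:

```python
def approximate_prevalence(incidence_per_year, mean_duration_years):
    """Steady-state approximation: prevalence ≈ incidence × average disease duration.
    Assumes a stable population and a relatively uncommon disease."""
    return incidence_per_year * mean_duration_years

incidence = 0.002  # hypothetical: 2 new cases per 1,000 people per year

before = approximate_prevalence(incidence, 5)   # average duration 5 years
after = approximate_prevalence(incidence, 10)   # better care doubles survival
print(f"Prevalence before: {before:.3f}, after: {after:.3f}")  # 0.010, 0.020
```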
. In a cohort study, the study subjects are free of the outcome at the time a study begins.
. A CASE SERIES:
. A study involving only patients already diagnosed with the condition of interest.
. It is helpful in determining the natural history of uncommon conditions.
. But provides no information about the disease incidence.
. CLINICAL TRIALS:
. Compare the therapeutic benefit of different interventions in patients already diagnosed with a particular disease.
. Usually subjects are randomly assigned to the treatment group or a placebo (control) group and then followed to detect
the development of the outcome of interest.
. Can't be used to determine disease incidence.
. CLUSTER ANALYSIS:
___________________
. Is the grouping of different data points into similar categories.
. In cluster-randomized trials, randomization is done at the level of groups (clusters) rather than at the level of individuals.
. EFFECT MODIFICATION:
_____________________
. Occurs when the effect of a main exposure on an outcome is modified by another variable.
. It is neither a bias nor confounding; it is a natural phenomenon that should be described, not corrected.
. Example: the effect of oral contraceptives on breast cancer is modified by the family history.
. i.e. women with a +ve family history have an increased risk, while women without a +ve family history don't have an
increased risk.
. Other examples: the effect of estrogen on the risk of venous thrombosis (modified by smoking).
. The risk of lung cancer in people exposed to asbestos (strongly modified by smoking).
. For example:
. The effect of a new estrogen receptor agonist drug on the incidence of DVT is modified by smoking status:
. Smokers taking the drug have an increased risk of developing DVT, while nonsmokers taking the drug don't.
. Effect modification may be confused with confounding; the two can be differentiated by dividing the whole cohort into
subgroups (stratified analysis).
. Imagine instead that smoking is a confounder that by itself is associated with a higher risk of DVT; if more smokers are
taking the drug,
it might appear that the drug causes DVT, but when a stratified analysis is performed (analyzing smokers and
nonsmokers separately),
the drug is no longer associated with DVT.
. In short: with confounding, the stratum-specific estimates are similar to each other but differ from the crude estimate;
with effect modification, the stratum-specific estimates differ from each other.
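The stratified analysis described above can be sketched in Python. The helper names and both data sets are hypothetical; the first is built so that smoking confounds the drug-DVT association, the second so that smoking modifies it:

```python
from dataclasses import dataclass

@dataclass
class Stratum:
    """2x2 counts for one stratum: DVT cases and totals in the drug / no-drug groups."""
    drug_cases: int
    drug_total: int
    nodrug_cases: int
    nodrug_total: int

def risk_ratio(s: Stratum) -> float:
    return (s.drug_cases / s.drug_total) / (s.nodrug_cases / s.nodrug_total)

def pooled(strata) -> Stratum:
    """Collapse the strata into one crude table (ignores the stratifying variable)."""
    strata = list(strata)
    return Stratum(
        sum(s.drug_cases for s in strata),
        sum(s.drug_total for s in strata),
        sum(s.nodrug_cases for s in strata),
        sum(s.nodrug_total for s in strata),
    )

# Data set 1 (confounding): smokers take the drug more often,
# but within each stratum the drug does not change DVT risk.
confounding = {"smokers": Stratum(10, 100, 5, 50), "nonsmokers": Stratum(1, 50, 2, 100)}

# Data set 2 (effect modification): the drug raises DVT risk only in smokers.
modification = {"smokers": Stratum(10, 100, 2, 100), "nonsmokers": Stratum(2, 100, 2, 100)}

for name, data in [("Confounding", confounding), ("Effect modification", modification)]:
    crude = risk_ratio(pooled(data.values()))
    strata = {k: round(risk_ratio(v), 2) for k, v in data.items()}
    print(f"{name}: crude RR = {crude:.2f}, stratum RRs = {strata}")
```

In the first data set the crude RR (about 1.57) disappears within strata (both 1.0), which points to confounding; in the second the stratum RRs differ (5.0 vs 1.0), which points to effect modification.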
. LATENT PERIOD:
________________
. Is the time period between exposure and the appearance of its effect (the outcome).
. In infectious diseases it is relatively short, while in chronic diseases (e.g. cancer or CAD)
it may be very long, and an extended period of exposure may be required to affect the outcome.
. The latent period also applies to exposure to a risk modifier, which may need to be continuous over a certain
period of time before it influences the outcome.
. The latent period is a natural phenomenon, not a bias.
. The median is much more resistant to outliers than the mean, because it depends only on the middle value(s) of the
ordered data and not on the size of the extreme values.
. One weakness of the RR is that, on its own, it gives no indication of whether the finding can be explained by chance alone.
. The confidence interval and the "p" value help assess the strength of the study's findings.
. For the study to be statistically significant:
1- The confidence interval must not contain the null value (1).
2- The "p" value should be less than 0.05 (i.e. if there were truly no association, a result at least this extreme would
occur less than 5% of the time by chance alone).
3- The RR is not the null value (1).
- The "p" value is used to assess the role of chance in the results of the study; it is the probability of obtaining a
result at least as extreme as the one observed by chance alone (i.e. if the null hypothesis were true).
- e.g. a "p" value of 0.01 means that the probability of obtaining such a result by chance alone is 1%.
- The commonly accepted upper limit (cut-off point) of the "p" value for a study
to be considered statistically significant is 0.05 (i.e. less than 5%).
- The "p" value deals with random variability, not bias.
- If the "p" value is less than 0.05 (i.e. the study is statistically significant), the 95% confidence interval doesn't contain
1.0 (the null value for RR).
- A relative risk of 0.71 shows that the drug decreased the risk of mortality by 29% (the null value for RR is 1).
e.g.: an RR of 1.6 (greater than 1) with a confidence interval of 1.02-2.15 (doesn't contain the null value 1):
the study is statistically significant, so the "p" value must be less than 0.05.
N.B: Very important to know how to calculate relative risk from the 2×2 table:
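A minimal Python sketch of the calculation, assuming the usual 2×2 layout (a, b = exposed with / without the outcome; c, d = unexposed with / without). It also adds a 95% confidence interval using the common log-RR normal approximation, which is one standard method rather than the only one; the counts are hypothetical:

```python
import math

def relative_risk_ci(a, b, c, d, z=1.96):
    """a = exposed with outcome, b = exposed without,
    c = unexposed with outcome, d = unexposed without.
    Returns (RR, lower, upper) using the log-RR normal approximation."""
    rr = (a / (a + b)) / (c / (c + d))
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper

rr, lo, hi = relative_risk_ci(30, 70, 15, 85)  # hypothetical counts
print(f"RR = {rr:.2f}, 95% CI {lo:.2f}-{hi:.2f}")
print("Statistically significant" if lo > 1 or hi < 1 else "Not significant (CI contains 1)")
```

Here RR = 2.0 and the whole interval lies above 1, so the result would be read as statistically significant (p < 0.05).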
.N.B.:
- The power of a study is the ability to detect a difference between two groups (treated versus non treated, exposed
versus non exposed).
- Increasing the sample size --> increases the power of the study and consequently makes
the confidence interval of the point estimate (e.g. relative risk) tighter.
- If the sample size is small --> the study has low power to detect a difference between exposed and non-exposed subjects,
and the confidence interval is wide (e.g. 0.8-3.1), so the study may be statistically insignificant.
- If the sample size is increased --> the confidence interval becomes tighter, and a true difference is more likely to
reach statistical significance.
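A minimal Python sketch of why larger samples tighten the confidence interval. It uses a single proportion (assumed p = 0.5) rather than an RR, but the principle is the same: the width shrinks with the square root of n, so quadrupling the sample size halves it:

```python
import math

def ci_width_for_proportion(p, n, z=1.96):
    """Width of the normal-approximation 95% CI for a proportion p estimated from n subjects."""
    return 2 * z * math.sqrt(p * (1 - p) / n)

for n in (100, 400, 1600):
    # prints widths 0.196, 0.098 and 0.049
    print(f"n = {n}: 95% CI width = {ci_width_for_proportion(0.5, n):.3f}")
```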
- Relative risk reduction (RRR) = (risk in control group - risk in treatment group) / risk in control group (= 1 - RR).
. For example: drug X (deaths=60 & living=20) placebo drug (deaths=38 & living=38).
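A minimal Python sketch applying the RRR formula to the counts in the example, as written. With these numbers the death risk is higher on drug X (0.75) than on placebo (0.50), so ARR and RRR come out negative, i.e. a 50% relative risk increase rather than a reduction:

```python
def risk(deaths, living):
    """Risk (cumulative incidence) of death in one group."""
    return deaths / (deaths + living)

risk_drug = risk(60, 20)      # drug X: 60 / 80
risk_placebo = risk(38, 38)   # placebo: 38 / 76

rr = risk_drug / risk_placebo
arr = risk_placebo - risk_drug     # absolute risk reduction
rrr = arr / risk_placebo           # relative risk reduction, equal to 1 - RR

print(f"Risk: drug {risk_drug:.2f}, placebo {risk_placebo:.2f}")
print(f"RR = {rr:.2f}, ARR = {arr:.2f}, RRR = {rrr:.2f}")  # RR = 1.50, ARR = -0.25, RRR = -0.50
```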
. SELECTION BIAS:
_________________
. Results from the manner in which subjects are selected for the study or from selective losses to follow-up.
. BERKSON'S BIAS:
_________________
. It is a selection bias that can be created by selecting hospitalized patients as the control group.
. INFORMATION BIAS:
___________________
. Occurs due to imperfect assessment of the association between exposure and outcome,
. As a result of errors in the measurement of exposure or outcome status.
. It can be minimized by using standardized techniques for surveillance and measurement of outcomes.
. MEASUREMENT BIAS:
___________________
. Results from poor data collection that produces inaccurate results.
. LEAD-TIME BIAS:
_________________
. Lead-time bias should be considered while evaluating any screening test.
. It happens when two methods of diagnosing a disease are compared,
and one of them diagnoses the disease earlier than the other without any effect on the outcome (survival).
. What actually happens is that the disease is detected at an earlier point in time,
. But the course of the disease and its prognosis do not change.
. So the screened patients appear to live longer from the time of diagnosis to the time of death.
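A minimal arithmetic sketch of lead-time bias in Python, using a hypothetical timeline: earlier detection lengthens survival measured from diagnosis by exactly the lead time, while the time of death does not change:

```python
# Hypothetical timeline (years after biological onset of the disease).
onset = 0
death = 10                # screening does not change the time of death
clinical_diagnosis = 7    # diagnosed when symptoms appear
screening_diagnosis = 4   # diagnosed earlier by the new screening test

survival_clinical = death - clinical_diagnosis    # 3 years
survival_screened = death - screening_diagnosis   # 6 years
lead_time = clinical_diagnosis - screening_diagnosis

print(f"Survival after clinical diagnosis: {survival_clinical} years")
print(f"Survival after screening diagnosis: {survival_screened} years")
print(f"Apparent gain = lead time = {lead_time} years; death still occurs {death - onset} years after onset")
```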
N.B.: IN USMLE:
. Think of LEAD-TIME BIAS when you see "a new screening test" for diseases with a poor prognosis, like lung cancer or
pancreatic cancer.
. RECALL BIAS:
______________
. Occurs when a study participant's answers are influenced by prior knowledge (e.g. of their own outcome).
. Results from inaccurate recall of past exposure by people in the study and applies mostly to retrospective studies such
as the case-control study.
. People who have suffered an adverse event (such as having a child with congenital anomalies) are more likely to recall
previous risk factors than
people who have not experienced a poor outcome.
. This is more common in case-control studies than in randomized clinical trials.
. Detection Bias:
_________________
. Refers to the fact that a risk factor itself may lead to extensive diagnostic investigations and increase the probability
that a disease is identified.
. For example: patients who smoke may undergo increased imaging surveillance due to their smoking status, which
would detect more cases of cancer in general.
. RESPONDENT BIAS:
_________________
. Occurs when the outcome of the test is obtained by the patient's response not by objective diagnostic methods (e.g.
migraine headache).
. SUSCEPTIBILITY BIAS:
______________________
. Is a type of selection bias in which a treatment regimen is selected for a patient based on the severity of their condition,
without taking into account other possible confounding variables.
. Offline case 20.
. Allocation bias:
__________________
. It may result from the way that the treatment and control groups are assembled.
. It may occur if subjects are assigned to the study groups of a clinical trial in a non-random fashion.
. For example, in a study comparing oral NSAIDs and intra-articular corticosteroid injections for the treatment
of osteoarthritis,
obese patients may be preferentially assigned to the corticosteroid group (which can affect the outcome).
. Beta error:
_____________
. Refers to the conclusion that there is no difference between the groups studied when a difference truly exists.
. It is a random error, not a systematic error (i.e. bias).
. CONFOUNDING:
______________
. Occurs when at least part of the exposure-disease relationship can be explained by another variable (the confounder).
. Due to presence of one or more variables associated independently with both the exposure and the outcome.
. For example: cigarette smoking can be a confounding factor in studying the association between maternal alcohol
drinking and low birth weight babies.
. As cigarette smoking is independently associated with alcohol consumption and low birth weight babies.
. Hawthorne effect:
___________________
. It is the tendency of a study population to affect the outcome because these people are aware that they are being
studied.
. This awareness leads to a change in behavior while under observation --> seriously affecting the validity
of the study.
. It is usually seen in studies that concern behavioral outcomes or outcomes that can be influenced by behavioral
changes.
. In order to minimize the Hawthorne effect, the studied subjects can be kept unaware that they are being studied.
. Pygmalion EFFECT:
___________________
. It describes how a researcher's belief in the efficacy of a treatment can potentially affect the outcome.
1- Selection bias can be controlled by choosing a representative sample of the population for the study & achieving a
high rate of follow-up.
2- Observer's bias can be controlled by blinding.
3- Ascertainment bias can be controlled by following a strict protocol for case ascertainment.
4- Confounders can be avoided by 3 methods in the design stage of the study: matching, restriction and
randomization.
- Matching is used in case-control studies: variables that could be confounders (age, race, ...) are selected, then
cases and controls are selected based on these matching variables.
- Randomization is commonly employed in clinical trials; its purpose is to balance the various factors (confounders) that
can influence the estimate of association between the treatment and placebo groups, so that the unconfounded effect of
the exposure can be isolated.
- A very important advantage of randomization compared to other methods is that it can control
known risk factors (e.g. age, severity of the disease) as well as unknown and difficult-to-measure confounders
(e.g. level of stress, socioeconomic status), distributing all confounders evenly between the treatment group
and the placebo group.
- In clinical trials, randomization is considered successful when the distribution of
baseline characteristics (age, race, prevalence, ...) is similar between the treatment and placebo groups,
i.e. the confounders are evenly distributed between the treatment and placebo groups.
. HAZARD RATIO:
_______________
. It is the ratio of the chance (hazard) of an event occurring in the treatment arm (drug or group of interest)
compared to the chance of that event occurring in the control arm (the other drug or group) during a set period of
time.
. Hazard ratio = hazard of the event in the test group / hazard of the event in the control group.
. So the lower the hazard ratio, the less likely the event is to occur in the treatment arm.
. The higher the ratio, the more likely the event is to occur in the treatment arm.
. A ratio close to 1 indicates little or no difference between the 2 groups.
. Example: hazard ratios of 2 drugs A & B for bleeding complications:
. Hazard ratio for major bleeding = 0.93, i.e. close to 1, meaning both groups are similar for this event.
. Hazard ratio for intracranial bleeding = 0.41 (indicates a lower chance of drug "A" causing intracranial bleeding
than drug "B").
. Hazard ratio for GI bleeding = 1.50 (indicates a higher chance of drug "A" causing GI bleeding than drug "B").
. Hazard ratio for life-threatening bleeding = 0.80 (indicates a lower chance of drug "A" causing life-threatening
bleeding than drug "B").
. Hazard ratio for total bleeding = 0.91 (indicates a slightly lower chance of drug "A" causing any bleeding
than drug "B").
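A minimal Python sketch for reading hazard ratios. The first function assumes a constant hazard in each arm, in which case the hazard ratio equals the ratio of event rates per person-time; published HRs are usually estimated with survival models such as Cox regression, which this sketch does not implement. The event counts are hypothetical; the three HRs are taken from the bleeding example above:

```python
def hazard_ratio_constant(events_trt, person_years_trt, events_ctl, person_years_ctl):
    """Assuming a constant hazard in each arm, the HR equals the ratio of
    event rates (events per person-time). Not a Cox model."""
    return (events_trt / person_years_trt) / (events_ctl / person_years_ctl)

def describe(hr):
    """Express a hazard ratio as a % change in hazard for the treatment arm vs control."""
    change = (hr - 1) * 100
    direction = "higher" if change > 0 else "lower"
    return f"HR {hr:.2f}: hazard {abs(change):.0f}% {direction} in treatment arm"

hr = hazard_ratio_constant(20, 1000, 40, 1000)  # hypothetical counts
print(describe(hr))  # HR 0.50: hazard 50% lower in treatment arm

# Hazard ratios (drug A vs drug B) from the bleeding example above
for outcome, value in [("major", 0.93), ("intracranial", 0.41), ("GI", 1.50)]:
    print(f"{outcome} bleeding -> {describe(value)}")
```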
. In offline case 11, focus on the patient's baseline values in the case, take the corresponding hazard ratio
from the study, then
. Decide which drug carries the greater hazard of hyperkalemia (N.B. Ca channel blockers affect GFR).
. SUCCESSFUL RANDOMIZATION:
___________________________
. In any randomized clinical study, the goals of successful randomization are:
1- To eliminate bias in treatment assignments.
2- To blind the investigators to which patients receive the treatment arm.
3- To minimize confounding variables.
. Ideal randomization allows for adequate statistical power and should include:
1- Equal patient group sizes.
2- Low selection bias.
3- Low probability of confounding variables.
. A listing of the baseline characteristics of the patients in each arm would demonstrate
whether the two arms had patients with similar characteristics, and thus whether proper randomization occurred in the
study.
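A minimal Python sketch of simple 1:1 randomization followed by a baseline comparison. The patients and their ages are hypothetical, and real trials often use block or stratified randomization instead of a single shuffle:

```python
import random
import statistics

def randomize(patient_ids, seed=None):
    """Simple 1:1 randomization: shuffle the patients, then split the list in half."""
    rng = random.Random(seed)
    ids = list(patient_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:]

# Hypothetical baseline characteristic: age of 200 patients (patient id -> age)
age_rng = random.Random(1)
ages = {pid: age_rng.randint(40, 80) for pid in range(200)}

treatment, placebo = randomize(ages, seed=42)

# Baseline table: with successful randomization the arms should look similar
print("Group sizes:", len(treatment), len(placebo))
print("Mean age, treatment:", round(statistics.mean(ages[p] for p in treatment), 1))
print("Mean age, placebo:", round(statistics.mean(ages[p] for p in placebo), 1))
```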
. ANOVA test:
_____________
. I.e. analysis of variance (ANOVA).
. Used to compare two or more means (i.e. to determine whether there are significant differences between the means of 2 or
more independent groups); it is typically used for 3 or more groups.
. e.g. ANOVA can be used to assess differences in mean blood pressure among three samples of a population
grouped by exercise status (never exercise, exercise occasionally and exercise frequently).
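A minimal Python sketch of the one-way ANOVA F statistic (between-group mean square divided by within-group mean square), using hypothetical blood pressure readings for the three exercise groups. A large F suggests that the group means differ; in practice `scipy.stats.f_oneway` returns the F statistic together with its p value:

```python
def one_way_anova_f(*groups):
    """One-way ANOVA F statistic = between-group mean square / within-group mean square."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical systolic BP (mmHg) by exercise status
never = [140, 142, 138]
occasional = [132, 130, 134]
frequent = [124, 126, 122]
print("F =", one_way_anova_f(never, occasional, frequent))  # F = 48.0
```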
. META-ANALYSIS:
________________
. Is an epidemiologic method for pooling the data from several studies into one analysis with relatively high
statistical power.
. For example: individual studies assessing the effects of aspirin on certain cardiovascular events may be inconclusive,
. However, analysis of the data compiled from multiple clinical trials may reveal a significant benefit.
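A minimal Python sketch of one common pooling method, inverse-variance fixed-effect meta-analysis of log relative risks (the notes do not specify a method; random-effects models are another option). The three trials are hypothetical: each 95% CI crosses 1 on its own, while the pooled CI lies below 1:

```python
import math

def fixed_effect_pool(log_estimates, standard_errors):
    """Inverse-variance fixed-effect pooling: weight each study by 1 / SE^2."""
    weights = [1 / se ** 2 for se in standard_errors]
    pooled = sum(w * y for w, y in zip(weights, log_estimates)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se

def ci_95(log_estimate, se):
    """95% CI back-transformed from the log scale to the RR scale."""
    return math.exp(log_estimate - 1.96 * se), math.exp(log_estimate + 1.96 * se)

# Hypothetical trials: (relative risk, standard error of log RR)
trials = [(0.80, 0.14), (0.88, 0.12), (0.86, 0.13)]
log_rrs = [math.log(rr) for rr, _ in trials]
ses = [se for _, se in trials]

for (rr, se), log_rr in zip(trials, log_rrs):
    low, high = ci_95(log_rr, se)
    print(f"Trial RR {rr:.2f}: 95% CI {low:.2f}-{high:.2f}")

pooled, pooled_se = fixed_effect_pool(log_rrs, ses)
low, high = ci_95(pooled, pooled_se)
print(f"Pooled RR {math.exp(pooled):.2f}: 95% CI {low:.2f}-{high:.2f}")
```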
# Patient Randomization:
------------------------
1) ACEIs:
2) Beta blocker:
3) Ca channel blocker:
N.B.:
- In contrast to the normal distribution curve, many real-world datasets in statistical analysis have asymmetrical (skewed)
distributions:
. SENSITIVITY:
______________
. Sensitivity --> the proportion of true +ve cases among all diseased cases (Sensitivity = true +ve by the test/all
patients that are actually diseased).
. Indicates the ability of a test to detect patients with the disease.
. The higher the sensitivity, the better the test detects patients with the disease --> fewer false negatives.
. Screening tests (especially for diseases with severe sequelae) should have a high sensitivity.
. SPECIFICITY:
______________
. Specificity --> the proportion of true -ve cases among all non-diseased cases (Specificity = true -ve by the test / all
patients that are actually free).
. Is a measure of the true negative rate and indicates how well a test can rule out a given condition (exclude those
without the disease).
. The higher the specificity, the more likely that healthy patients will have -ve test results.
. The higher the specificity --> the fewer the false +ves.
. Sensitivity and specificity are fixed properties of a test that don't vary with the pre-test probability or the prevalence
of the disease.
. The ideal diagnostic test should have high sensitivity and specificity.
N.B.:
- Raising the cutoff point of a diagnostic test --> decreases its sensitivity but increases its specificity.
- Lowering the cutoff point of a diagnostic test --> increases its sensitivity but decreases its specificity.
. OR = (ad)/(bc).
. RR = [a/(a+b)]/[c/(c+d)].
. Direct calculation of RR in a case-control study is not possible, because the study design doesn't involve following
people over time; the OR is calculated instead (when the disease is rare, the OR approximates the RR).
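A minimal Python sketch comparing the OR and the RR on the same 2×2 layout (hypothetical counts). In a real case-control study only the OR is valid; the RR is shown purely for comparison, treating the table as if it came from a cohort. When the disease is common the OR (about 2.67) overstates the RR (2.0), and when the disease is rare the two nearly coincide:

```python
def odds_ratio(a, b, c, d):
    """2x2 layout: a, b = exposed with / without disease; c, d = unexposed with / without disease."""
    return (a * d) / (b * c)

def relative_risk(a, b, c, d):
    return (a / (a + b)) / (c / (c + d))

tables = {
    "common disease": (40, 60, 20, 80),  # hypothetical counts
    "rare disease": (4, 996, 2, 998),    # hypothetical counts
}
for label, table in tables.items():
    print(f"{label}: OR = {odds_ratio(*table):.3f}, RR = {relative_risk(*table):.3f}")
```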
.N.B:
- Attributable risk percent (ARP): represents the excess risk in a population that can be attributed to the exposure to a
particular risk factor.
. It can be calculated by subtracting the risk in the unexposed population (baseline risk) from the risk in the
exposed population
and dividing the result by the risk in the exposed population.
. ARP = (Risk in exposed - Risk in nonexposed)/Risk in exposed.
or
. ARP can be calculated from the relative risk as follows:
. ARP = (RR-1)/RR
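A minimal Python check, using hypothetical risks of 0.30 in the exposed and 0.10 in the unexposed, that the two ARP formulas above give the same answer (about 0.67, i.e. about 67% of the risk in the exposed is attributable to the exposure):

```python
def arp_from_risks(risk_exposed, risk_unexposed):
    """ARP = (risk in exposed - risk in unexposed) / risk in exposed."""
    return (risk_exposed - risk_unexposed) / risk_exposed

def arp_from_rr(rr):
    """ARP = (RR - 1) / RR."""
    return (rr - 1) / rr

risk_exposed, risk_unexposed = 0.30, 0.10  # hypothetical risks
rr = risk_exposed / risk_unexposed         # about 3
print(round(arp_from_risks(risk_exposed, risk_unexposed), 3), round(arp_from_rr(rr), 3))
```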
. Pre- and post-test probabilities (+ve predictive value (PPV) & -ve predictive value (NPV)):
____________________________________________________________________________________
Positive predictive value (PPV):
----------------------------------------
. Describes the probability of having the disease if the test result is +ve
(if the patient has a +ve test result, what is the likelihood that he actually has the disease?).
. The post-test probability of having the disease is directly related to the PPV.
. If the PPV is 25%, i.e. low, then even if the test result is positive, the post-test probability of having the
disease is low.
. The post-test probability is also dependent on the sensitivity, specificity and pre-test probability of having the
disease.
Negative predictive value (NPV):
----------------------------------------
. Describes the probability of not having the disease if the test result is -ve.
. NPV varies with the pre-test probability of a disease (important), i.e.
. A patient with a high pre-test probability of having a disease will have a low NPV.
. And a patient with a low pre-test probability of having a disease will have a high NPV.
. If the NPV is 96%, this means that if the test result is -ve, the chance that the patient does not have the disease is high
(96%).
. And the chance that the patient has the disease is low (100 - 96 = 4%).
Example:
--------
1- BREAST CANCER & FNA test results:
. A patient with a high pre-test probability of having the disease (1st-degree relative with breast cancer or age > 40
years) has a low NPV.
. A patient with a low pre-test probability of having breast cancer (less than 40 years old) has a high NPV.
NOTE:
----
. The prevalence of the disease determines the pre-test probability of having the disease: increased prevalence -->
higher PPV but lower NPV, and vice versa.
. Sensitivity and specificity are not affected by the prevalence of the disease, and neither are the likelihood ratios,
e.g. the positive likelihood ratio = sensitivity/(1 - specificity).
N.B.:
----
. If the test result is -ve, the probability that the patient has the disease = 1 - NPV.
. The following diseases and diagnostic tests are high-yield USMLE questions on probabilities:
- Coronary artery disease and ECG stress test.
- Pulmonary embolism and ventilation-perfusion scanning.
- Prostate cancer and serum PSA level.
N.B.: Sensitivity and specificity are determined by comparing a test's results with those obtained by the gold standard test.
. RELIABILITY:
______________
. Test-retest reliability.
. A reliable test is reproducible; gives similar or very close results on repeat measurements.
. Reliability can be quantified by the coefficient of variation (CV).
. Coefficient of variation: the standard deviation of a set of repeated measurements divided by their mean,
expressed as a percentage.
. Reliability is maximal when random error is minimal.
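A minimal Python sketch of the coefficient of variation, assuming the sample standard deviation (n - 1 denominator) and hypothetical repeated readings of the same sample; a lower CV indicates a more reliable test:

```python
import statistics

def coefficient_of_variation(measurements):
    """CV (%) = sample standard deviation / mean x 100."""
    return statistics.stdev(measurements) / statistics.mean(measurements) * 100

repeat_readings = [98, 100, 102, 100]  # hypothetical repeated measurements of one sample
print(f"CV = {coefficient_of_variation(repeat_readings):.2f}%")
```

For these readings the CV is about 1.63%.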
. ROC (RECEIVER OPERATING CHARACTERISTIC) CURVE:
________________________________________________
. It emphasizes the importance of choosing the appropriate cutoff value, which is made difficult by the overlap between
normal & abnormal results.
. Any cutoff point represents a trade-off between SENSITIVITY and 1-SPECIFICITY.
. Sensitivity (positivity in disease) --> is the proportion of subjects who have the target condition and test positive.
. Sensitivity = TP/(TP + FN).
. Specificity (negativity in health) --> is the proportion of subjects without the target condition who test negative.
. Specificity = TN/(TN + FP).
. ++ Sensitivity --> ++ true +ve & -- false -ve (diagnosed as normal but actually diseased).
. ++ Sensitivity --> helps not to miss any diseased patient (not to miss any true +ve).
. ++ Specificity --> ++ true -ve & -- false +ve (diagnosed as diseased but actually normal).
. ROC --> aims at decreasing false -ve and false +ve results (i.e. increasing sensitivity and specificity).
.N.B.:
- In the ROC curve: sensitivity = true positive rate, while (1-specificity) = false positive rate.
. Positive predictive value (PPV) --> is the probability of having the disease if the test result is +ve.
. PPV = TP/(TP + FP).
. Negative predictive value (NPV) --> is the probability of not having the disease if the test result is -ve.
. NPV = TN/(TN + FN).
. Positive likelihood ratio (LR+) = sensitivity/(1-specificity).
. (LR+) --> is the ratio of the proportion of patients who have the target condition & test positive to
. the proportion of patients without the target condition who also test positive.
. Negative likelihood ratio (LR-) = (1-sensitivity)/specificity.
. (LR-) --> is the ratio of the proportion of patients who have the target condition who test negative to
the proportion of patients without the target condition who also test negative.
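A minimal Python sketch of the likelihood ratio formulas, plus the standard odds form of Bayes' theorem for turning a pre-test probability into a post-test probability (post-test odds = pre-test odds × LR). The test characteristics and the 10% pre-test probability are hypothetical:

```python
def likelihood_ratios(sensitivity, specificity):
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg

def post_test_probability(pre_test_probability, lr):
    """Convert probability to odds, multiply by the LR, convert back to probability."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)

lr_pos, lr_neg = likelihood_ratios(0.90, 0.95)  # hypothetical test
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.3f}")
print(f"Pre-test 10% -> after a positive result: {post_test_probability(0.10, lr_pos):.2f}")
print(f"Pre-test 10% -> after a negative result: {post_test_probability(0.10, lr_neg):.3f}")
```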
. The ROC curve has 2 axes: the vertical axis (Y) for sensitivity and the horizontal axis (X) for 1-specificity.
. Large Y values --> indicate high sensitivity.
. A shift of the ROC curve upwards for a given cutoff indicates increased sensitivity and vice versa.
. A shift of the curve to the right for a given cutoff (higher value) indicates decreased sensitivity and vice versa.
. The curve usually shows that an increase in sensitivity is offset by decrease in specificity.
. As mentioned before, sensitivity = TP/(TP+FN) & specificity = TN/(TN+FP), so decreasing the overlap between the
healthy and diseased population curves
--> decreases both the number of FP & FN --> thus increases both sensitivity and specificity (i.e. allows for a test with
both higher sensitivity and specificity).
. In overlap curve: moving the cutoff value to the right (higher value) would increase specificity at the expense of
sensitivity, while
Moving the cutoff to the left (lower value) would increase sensitivity at the expense of specificity.
. A cutoff value just outside the overlapping portion would give 100% sensitivity (if to the left) or 100% specificity (if to
the right).
. Both sensitivity and specificity depend on the cutoff value of a given test for example:
. Raising the cutoff value makes it more difficult to diagnose the condition i.e
it makes it harder to obtain +ve results and easier to obtain -ve results --> this will increase specificity but decrease
sensitivity.
. Lowering the cutoff value makes it easier to obtain +ve results and harder to obtain -ve results --> this will increase
sensitivity but decrease specificity.
. Increase sensitivity --> increase -ve predictive value (NPV) due to (decrease false -ve results).
. Increase specificity --> increase +ve predictive value (PPV) due to (decrease false +ve results).
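A minimal Python sketch of the cutoff trade-off described above, using hypothetical marker values whose healthy and diseased distributions overlap (a result at or above the cutoff counts as +ve). A cutoff to the right of the overlap (9) gives 100% specificity, and one to the left of it (5) gives 100% sensitivity:

```python
def sens_spec_at_cutoff(diseased, healthy, cutoff):
    """The test is called positive when the value is >= cutoff."""
    tp = sum(v >= cutoff for v in diseased)
    tn = sum(v < cutoff for v in healthy)
    return tp / len(diseased), tn / len(healthy)

# Hypothetical marker values with overlapping distributions
diseased = [6, 7, 8, 8, 9, 10, 11, 12]
healthy = [2, 3, 4, 5, 5, 6, 7, 8]

for cutoff in (5, 7, 9):
    sens, spec = sens_spec_at_cutoff(diseased, healthy, cutoff)
    print(f"cutoff {cutoff}: sensitivity {sens:.3f}, specificity {spec:.3f}")
```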
. PRECISION:
___________
. Is the proportion of true +ve results out of all +ve results of the test (-ve results are not taken
into account).
. Precision is equivalent to the +ve predictive value, i.e. true +ve / all +ve.
. In a study, precision is the measure of random error.
. The study is precise if the results are not scattered widely; this is reflected by a tight confidence interval.
. So, if the first study has a wider confidence interval than the second study --> the second study is more precise.
. ACCURACY:
___________
. Is the proportion of the true results (true +ve and true -ve) out of all results that are predicted by the test.
. The closer the plotted curve approaches the left and top borders of the ROC curve, the more accurate the test.
. Accuracy can also be measured by the total area under the plotted curve on ROC curve.
. Increase of the total area under the curve --> increases the accuracy of the test.
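A minimal Python sketch that builds an empirical ROC curve (one point per candidate cutoff) and measures the area under it with the trapezoidal rule. The data are hypothetical: overlapping distributions give an AUC below 1 (59/64, about 0.92, here), while perfectly separated ones give 1.0:

```python
def roc_points(diseased, healthy):
    """(1 - specificity, sensitivity) for every candidate cutoff, from strictest to most lenient."""
    cutoffs = sorted(set(diseased) | set(healthy), reverse=True)
    points = [(0.0, 0.0)]
    for c in cutoffs:
        sensitivity = sum(v >= c for v in diseased) / len(diseased)
        false_positive_rate = sum(v >= c for v in healthy) / len(healthy)
        points.append((false_positive_rate, sensitivity))
    return points

def auc_trapezoid(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))

overlapping = auc_trapezoid(roc_points([6, 7, 8, 8, 9, 10, 11, 12], [2, 3, 4, 5, 5, 6, 7, 8]))
separated = auc_trapezoid(roc_points([9, 10, 11], [2, 3, 4]))
print(f"AUC overlapping: {overlapping:.3f}, AUC separated: {separated:.3f}")
```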
.N.B:
. Both accuracy and precision depend upon sensitivity and specificity of the test as well as the prevalence of the
condition in the population tested.
. Validity and accuracy are measures of systematic errors (bias).
. Accuracy is reduced if the sample doesn't reflect the true value of the parameter measured.
. Increasing the sample size --> increases the precision of the study, but doesn't affect the accuracy.
. Risk:
_______
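. Median --> is the middle observation in a series of observations after arranging them in ascending or descending
order.
. If the number of observations (n) is odd --> the median is the observation at position (n+1)/2.
. If n is even --> the median is the average of the two middle observations (positions n/2 and n/2 + 1).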
. EXAMPLE: 5, 6, 7, 5, 10, 3.
. EX2: 5, 6, 8, 9, 11.
. EX3: 5, 6, 8, 9.
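A minimal Python sketch that works through the three examples with the standard library's `statistics.median`, which averages the two middle values when n is even, and also shows the median's resistance to an outlier (the extra data set is hypothetical):

```python
import statistics

examples = {"EXAMPLE": [5, 6, 7, 5, 10, 3], "EX2": [5, 6, 8, 9, 11], "EX3": [5, 6, 8, 9]}
for name, data in examples.items():
    print(name, sorted(data), "median =", statistics.median(data))

# The median resists an outlier; the mean does not
with_outlier = [5, 6, 8, 9, 100]
print("median =", statistics.median(with_outlier), "mean =", statistics.mean(with_outlier))
```

The medians are 5.5, 8 and 7 respectively; adding the outlier 100 moves the mean to 25.6 while the median stays at 8.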
.N.B.:
. Range: is a measure of variation (dispersion).
. Range: is the difference between the largest and the smallest values
. Range = largest value - smallest value
e.g. Range = 9-5 = 4.
.N.B.:
. Average: it is the sum of all the observations divided by the sample size.
. e.g. in a random sample of 100 children, the numbers of UTI episodes per year are as follows (50 children (0), 30 children (1),
10 children (2), 10 children (3)).
. The average number of UTI episodes per year per child is:
- The total number of UTI episodes per year is: (50×0) + (30×1) + (10×2) + (10×3) = 80 UTI episodes per year.
- The average number of UTI episodes per year per child = 80/100 = 0.8 (between 0 and 1),
- i.e. a child experiences, on average, less than one UTI episode per year.
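The same calculation as a minimal Python sketch, treating the example as a frequency table (number of episodes mapped to the number of children with that many episodes):

```python
# Frequency table from the example: UTI episodes per year -> number of children
frequency = {0: 50, 1: 30, 2: 10, 3: 10}

total_episodes = sum(episodes * children for episodes, children in frequency.items())
total_children = sum(frequency.values())
average = total_episodes / total_children
print(total_episodes, total_children, average)  # 80 100 0.8
```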
. SCATTER PLOTS:
________________
. They are useful for crude analysis of data.
. They can demonstrate the type of association (linear or non linear).
. If a linear association is present, the correlation coefficient can be calculated.
. The association is positive (if the outcome increases with increasing exposure) --> +ve correlation coefficient, while
the association is negative (if the outcome decreases with increasing exposure) --> -ve correlation coefficient.
. The correlation coefficient in an almost perfect linear association is close to +1 (positive) or -1 (negative).
. Crude analysis of association using the scatter plots doesn't account for possible confounders.
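A minimal Python sketch of Pearson's correlation coefficient on hypothetical data; Python 3.10+ also provides `statistics.correlation`, but the formula is written out here to show the calculation:

```python
import math

def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

exposure = [1, 2, 3, 4, 5]
print(round(pearson_r(exposure, [2, 4, 6, 8, 10]), 3))  # perfect positive linear association
print(round(pearson_r(exposure, [10, 8, 6, 4, 2]), 3))  # perfect negative linear association
print(round(pearson_r(exposure, [2, 5, 3, 8, 6]), 3))   # weaker positive association
```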
.N.B:
1- It is very important to consider the natural history of a disease when evaluating the effectiveness of a drug in a trial,
e.g. the common cold --> natural resolution within one week should be taken into consideration when evaluating
an antiviral drug used to treat the common cold.
2- It is difficult to comment on a drug's effectiveness unless a comparison is made with a control group and
statistical significance is tested.
. NULL HYPOTHESIS:
-------------------
. Is always the statement of NO relationship between the exposure and the outcome.
. To state the null hypothesis correctly, you should recognize the study design first.
. In a cross-sectional study, the 2 variables (e.g. CRP & colon cancer) are studied at the same point in time, so
the temporal relationship between the 2 variables can't be evaluated.
. So a causal relationship between the 2 variables can't be established --> the null hypothesis is better considered.
. ALTERNATIVE HTPOTHESIS:
--------------------------
. It Opposes the Null hypothesis.
. It States that there is a relationship between the exposure and the outcome.
. It is better for studies in which a relationship between the 2 variables is existing to consider the Alternative
hypothesis.
. External validity answers the question "How generalizable are the results of a study to other populations?"
. For example: if the cohort is restricted to middle-aged women, the results of the study are applicable only to middle-aged
women & not to elderly men.
====================================================================================
=====================================
2- How to calculate:
. Sensitivity = true +ve / (true +ve + false -ve), i.e. true positives divided by all patients who actually have the disease.
. True positive = sensitivity × (true +ve + false -ve), i.e. sensitivity × (no. of patients actually with the disease).
. False negative = (1 - sensitivity) × (true +ve + false -ve), i.e. (1 - sensitivity) × (no. of patients actually with the disease).
. Specificity = true -ve / (true -ve + false +ve), i.e. true negatives divided by all patients who are actually disease-free.
. True negative = specificity × (true -ve + false +ve), i.e. specificity × (no. of patients actually free of the disease).
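The sensitivity and specificity formulas can be checked with a short Python sketch. The function names and the 2x2 counts (100 diseased, 200 disease-free patients) are hypothetical. The last function runs the formulas backwards, rebuilding the cells from the two rates; its false-positive line, (1 - specificity) × disease-free patients, is the mirror image of the false-negative formula.

```python
def sensitivity(tp, fn):
    """True positives / all patients who actually have the disease."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negatives / all patients who are actually disease-free."""
    return tn / (tn + fp)


def cells_from_rates(sens, spec, n_diseased, n_free):
    """Rebuild the 2x2 cells from sensitivity, specificity and group sizes."""
    tp = sens * n_diseased          # true positives
    fn = (1 - sens) * n_diseased    # false negatives
    tn = spec * n_free              # true negatives
    fp = (1 - spec) * n_free        # false positives
    return tp, fn, tn, fp


# Hypothetical study: 100 diseased and 200 disease-free patients.
sens = sensitivity(tp=90, fn=10)
spec = specificity(tn=160, fp=40)
print(sens, spec)
print(cells_from_rates(sens, spec, n_diseased=100, n_free=200))
```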
. So if the researchers want to detect a difference between a tested drug and the standard of care (if one exists), they need to
maximize the power (1 - β).
. Power depends on sample size and the difference in outcome between the 2 groups being tested.
. Type II error:
_______________
. Occurs when the researchers fail to reject the null hypothesis when the null hypothesis is really false,
(they say there is no difference when actually there is one).
. It causes the investigators to miss true relationships.
. An example: a study concluding that a drug does not affect platelet function when, in fact, it does.
. Beta (β): is the probability of committing a type II error.
. If β is set at 0.2 (20%), i.e. there is a 20% chance of failing to reject the null hypothesis when it is false -->
the power (1 - β) will be 0.8 (80%), i.e. there is an 80% chance of rejecting the null hypothesis when it is truly
false.
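To see how power (1 - β) depends on sample size and on the size of the difference between the 2 groups, here is a rough Python sketch. It assumes a two-sided comparison of two proportions using the normal approximation with an unpooled standard error; the function name, event rates and group sizes are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(p1, p2, n_per_group, alpha=0.05):
    """Approximate power of a two-sided test comparing two proportions.

    Normal approximation with an unpooled standard error; the tiny chance
    of rejecting in the 'wrong' tail is ignored.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    se = sqrt(p1 * (1 - p1) / n_per_group + p2 * (1 - p2) / n_per_group)
    shift = abs(p1 - p2) / se  # expected z statistic if the difference is real
    return 1 - NormalDist().cdf(z_crit - shift)


# Hypothetical event rates: 30% on standard care vs 20% on the new drug.
for n in (100, 200, 300):
    power = approx_power(0.30, 0.20, n)
    print(f"n = {n} per group: power = {power:.2f}, beta = {1 - power:.2f}")
```

Increasing the group size, or comparing rates that are further apart (e.g. 40% vs 20%), raises the power and lowers β.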
. Type I error:
_______________
. Occurs when the researchers reject the null hypothesis when the null hypothesis is really true,
(they say there is a difference when actually there is no difference).
. i.e. the study finds a statistically significant difference between 2 groups when none actually exists.
. An example: a study concluding that hard candy improves heart failure mortality when it doesn't.
. Alpha (α): is the maximum probability of making a type I error that a researcher is willing to accept.
. It is the cut-off against which the P value is compared: a result is declared statistically significant when P < α.
. α is typically set at 0.05, meaning that the researchers accept a 5% chance of concluding that a difference is real
when it is actually due to chance.
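The meaning of α can be illustrated with a small Monte Carlo simulation in Python, a sketch under stated assumptions: both groups are drawn from the same normal population with a known SD of 1 (so the null hypothesis is true by construction), and a two-sided z-test is used. The function name, sample size, number of trials and seed are arbitrary choices. Every "significant" result here is a type I error, and their share should come out close to α.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def simulate_type_i_error(trials=10_000, n=30, alpha=0.05, seed=1):
    """Fraction of trials in which a TRUE null hypothesis is rejected."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        # Both groups come from the SAME population: any 'difference' is chance.
        group_a = [rng.gauss(0, 1) for _ in range(n)]
        group_b = [rng.gauss(0, 1) for _ in range(n)]
        # Known SD = 1 in each group, so the SE of the difference is sqrt(2 / n).
        z = (fmean(group_a) - fmean(group_b)) / sqrt(2 / n)
        if abs(z) > z_crit:
            rejections += 1  # a type I error
    return rejections / trials


print(f"Observed type I error rate: {simulate_type_i_error():.3f}")
```

Lowering α (e.g. to 0.01) makes type I errors rarer, at the cost of lower power.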
.N.B:
- There are 4 basic payment methods between health insurers and physicians:
Capitation:
______________
. Physicians are paid a fixed amount of money per enrollee, not per service.
. So they have an incentive to contain (decrease) costs per enrollee because of the fixed budget allocated to them.
. If many enrollees seek care, or some enrollees need extensive care, physicians' costs may exceed their
payments.
. So physicians are motivated to provide more preventive care and catch illness early, so that patients stay healthier and need
fewer tests and procedures as they age.
4) Salary:
__________
. Physicians are paid a fixed amount, and their pay is not tied to the number of enrollees or services rendered (provided).
. Unless their contracts include withholds or bonuses, salaried physicians face no financial risk.
. So they have no financial incentive to change their treatment patterns, either in the services provided or in the number of
follow-up visits.
N.B.:
- A state with a population of 4,000,000 contains 20,000 people who have disease A, a fatal neurodegenerative
condition. There are 7,000 new cases of the disease a year and 1,000 deaths attributable to disease A.
There are 40,000 deaths per year from all causes. What is the:
1- Incidence of the disease: is the number of new cases of a disease per year divided by population at risk.
Incidence = 7000 / (4,000,000 - 20,000).
2- The disease specific mortality: is the number of deaths attributable to the disease per year divided by the total
population.
The disease specific mortality = 1000/4,000,000.
3- The rate of increase of a disease: is the number of new cases per year minus the number of deaths (or cures) per
year divided by the total population.
The rate of increase of a disease = (7000-1000)/4,000,000.
4- The prevalence of a disease: is the number of persons with the disease divided by the total population at a specific
point in time.
The prevalence of a disease = 20,000 / 4,000,000.
5- The mortality rate: is the number of deaths per year divided by the total population.
The mortality rate = 40,000 / 4,000,000.
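The five calculations above can be reproduced with a short Python sketch (the function and key names are hypothetical). Only incidence uses the population at risk (total population minus existing cases) as its denominator; the other measures divide by the total population, as in the worked answers.

```python
def epidemiology_measures(population, prevalent_cases, new_cases_per_year,
                          disease_deaths_per_year, all_deaths_per_year):
    """Return the five measures from the disease A example as proportions."""
    at_risk = population - prevalent_cases  # people who can still become new cases
    return {
        "incidence": new_cases_per_year / at_risk,
        "disease_specific_mortality": disease_deaths_per_year / population,
        "rate_of_increase": (new_cases_per_year - disease_deaths_per_year) / population,
        "prevalence": prevalent_cases / population,
        "mortality_rate": all_deaths_per_year / population,
    }


m = epidemiology_measures(4_000_000, 20_000, 7_000, 1_000, 40_000)
for name, value in m.items():
    print(f"{name}: {value:.6f} ({value * 1000:.2f} per 1,000)")
```

With these inputs the incidence is about 1.76 per 1,000 per year, the disease-specific mortality 0.25 per 1,000, the rate of increase 1.5 per 1,000, the prevalence 5 per 1,000 and the overall mortality rate 10 per 1,000.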
1- An increase in lung cancer incidence and mortality has been observed in women over the last four decades due to
increased cigarette smoking.
2- Breast cancer is the most common non-skin cancer among women in the USA, but breast cancer mortality is
comparatively low.
3- Mortality from breast cancer has stayed relatively stable over time, whereas colon cancer mortality has decreased
somewhat over the last decades.
4- Stomach cancer incidence and mortality have decreased drastically over the last decades, so it is now an
uncommon cancer.
5- Mortality of ovarian cancer is stable over time.
6- Apart from skin cancer, the most common cancers in women, in descending order of incidence, are: breast
cancer, lung cancer, then colon cancer.
7- In order of mortality: lung cancer, followed by breast cancer, then colon cancer.
N.B.:
- Case-Fatality rate: is calculated by dividing the fatal cases by the total number of people with the disease.
- Case-fatality = Number of fatal cases/total number of people with the disease.
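A small Python sketch contrasting case-fatality with disease-specific mortality: the numerator (deaths from the disease) is the same, and only the denominator changes (people with the disease vs the total population). The function names and outbreak numbers are hypothetical.

```python
def case_fatality(fatal_cases, total_cases):
    """Deaths from the disease / all people WITH the disease."""
    return fatal_cases / total_cases


def disease_specific_mortality(fatal_cases, population):
    """Same deaths, divided by the WHOLE population."""
    return fatal_cases / population


# Hypothetical outbreak: 200 cases and 50 deaths in a town of 100,000 people.
print(case_fatality(50, 200))                       # share of cases who die
print(disease_specific_mortality(50, 100_000))      # deaths per head of population
```

A disease can therefore have a high case-fatality rate but a low population mortality if few people get it.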
N.B:
- If events are independent, the probability that all events will turn out the same (e.g. -ve) is the product of the
separate probabilities for each event.
- The probability of at least 1 event turning out differently is given as: 1- (the probability of all events being the
same).
- For example:
A new serological test for detecting prostate cancer is negative in 95% of patients who don't have the disease. If the
test is used on 8 blood samples taken from patients without prostate cancer, what is the probability of getting at least 1 positive test?
- In this case, each test has a 0.95 (95%) probability of giving a true negative result and a 0.05 (5%) probability of giving a
false positive result.
- To calculate the chance of all 8 tests being negative: Probability (all negative) = (0.95)^8.
- Remember that the total probability is always equal to 1.0 (100%), so the probability that at least 1 test turns out positive is:
- Probability (at least 1 positive) = 1 - Probability (all negative) = 1 - (0.95)^8 ≈ 0.34 (about 34%).
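The product rule and the complement rule used above fit in a few lines of Python. This sketch assumes, as the example does, that the 8 tests are independent; the helper names are hypothetical.

```python
def prob_all_same(p_each, n_events):
    """Independent events: P(all give the same result) = product of the probabilities."""
    return p_each ** n_events


def prob_at_least_one_different(p_each, n_events):
    """Complement rule: 1 - P(all give the same result)."""
    return 1 - prob_all_same(p_each, n_events)


# Prostate cancer example: each test is negative with probability 0.95
# in patients without the disease; 8 independent samples are tested.
p_all_negative = prob_all_same(0.95, 8)
p_any_positive = prob_at_least_one_different(0.95, 8)
print(f"P(all 8 negative)      = {p_all_negative:.4f}")  # 0.6634
print(f"P(at least 1 positive) = {p_any_positive:.4f}")  # 0.3366
```

So even a test with 95% specificity gives at least one false positive in about a third of such 8-sample series.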