
School of Mathematics and Statistics The University of Sydney

Statistics component of PHAR2813 Therapeutic Principles Lecture Notes - Revision

John T. Ormerod

1. Probability on Sets
Formulae:
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$.
If $A$ and $B$ are independent then $P(A \cap B) = P(A)P(B)$; otherwise $A$ and $B$ are dependent and $P(A \cap B) \neq P(A)P(B)$.
If $A$ and $B$ are mutually exclusive then $P(A \cap B) = 0$.
Bayes' rule: $P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$ and $P(B \mid A) = \dfrac{P(A \cap B)}{P(A)}$.
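A quick way to check these set formulae is to count outcomes in a small, equally likely sample space. The sketch below assumes one roll of a fair six-sided die; the events A, B and C are illustrative choices, not taken from the notes.

```python
from fractions import Fraction

# Assumed sample space: one roll of a fair six-sided die.
omega = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Equally likely outcomes: P(E) = |E| / |Omega|."""
    return Fraction(len(event & omega), len(omega))

A = {2, 4, 6}      # "roll is even"
B = {1, 2, 3, 4}   # "roll is at most 4"
C = {1, 3, 5}      # "roll is odd" (cannot happen together with A)

# Addition rule: P(A u B) = P(A) + P(B) - P(A n B)
p_union = prob(A) + prob(B) - prob(A & B)
assert p_union == prob(A | B)

# Independence check: does P(A n B) equal P(A)P(B)?
independent = prob(A & B) == prob(A) * prob(B)

# Conditional probability: P(A|B) = P(A n B) / P(B)
p_A_given_B = prob(A & B) / prob(B)
```

Here $P(A \cap B) = 1/3 = P(A)P(B)$, so A and B happen to be independent, and A and C are mutually exclusive, so $P(A \cap C) = 0$.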

2. Medical Jargon used in Diagnostics

Know the definitions of medical jargon in terms of probabilities, e.g.
Sensitivity: $P(S^+ \mid D^+)$.
Specificity: $P(S^- \mid D^-)$.
False negative: $P(S^- \mid D^+)$.
False positive: $P(S^+ \mid D^-)$.
$PV^+$: $P(D^+ \mid S^+)$.
$PV^-$: $P(D^- \mid S^-)$.
Prevalence: $P(D^+)$.
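Each of these quantities can be estimated as a simple proportion from counts in a screened sample. The sketch below uses hypothetical counts and assumes the sample is a random sample from the whole population; only then do the $PV^\pm$ and prevalence estimates make sense.

```python
from fractions import Fraction

def diagnostic_measures(tp, fn, fp, tn):
    """Diagnostic probabilities estimated from counts.

    tp: S+ and D+    fn: S- and D+
    fp: S+ and D-    tn: S- and D-
    """
    n = tp + fn + fp + tn
    return {
        "sensitivity": Fraction(tp, tp + fn),     # P(S+|D+)
        "specificity": Fraction(tn, tn + fp),     # P(S-|D-)
        "false_negative": Fraction(fn, tp + fn),  # P(S-|D+)
        "false_positive": Fraction(fp, tn + fp),  # P(S+|D-)
        "pv_plus": Fraction(tp, tp + fp),         # P(D+|S+)
        "pv_minus": Fraction(tn, tn + fn),        # P(D-|S-)
        "prevalence": Fraction(tp + fn, n),       # P(D+)
    }

# Hypothetical sample of 1000 people, for illustration only.
m = diagnostic_measures(tp=90, fn=10, fp=45, tn=855)
```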

3. Calculate PV+
Calculate $P(D^+ \mid S^+)$. You will be given the formula:
$PV^+ = P(D^+ \mid S^+) = \dfrac{P(S^+ \mid D^+)P(D^+)}{P(S^+ \mid D^-)P(D^-) + P(S^+ \mid D^+)P(D^+)}$.
You will need to read the problem description and identify the constituent parts of the formula from it. A little algebra might be needed; for example, know that $P(S^+ \mid D^-) = 1 - P(S^- \mid D^-)$ and $P(D^-) = 1 - P(D^+)$.
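The formula above can be written as a small function. This is a sketch that assumes the inputs are given as prevalence, sensitivity and specificity, so it applies the two "little algebra" identities internally; the function name `pv_plus` is hypothetical.

```python
def pv_plus(prevalence, sensitivity, specificity):
    """P(D+|S+) by Bayes' rule.

    prevalence = P(D+), sensitivity = P(S+|D+), specificity = P(S-|D-).
    """
    false_positive = 1 - specificity   # P(S+|D-) = 1 - P(S-|D-)
    p_no_disease = 1 - prevalence      # P(D-) = 1 - P(D+)
    numerator = sensitivity * prevalence
    return numerator / (false_positive * p_no_disease + numerator)

# Same test applied to a common and a rare disease.
common = pv_plus(prevalence=0.10, sensitivity=0.95, specificity=0.98)
rare = pv_plus(prevalence=0.001, sensitivity=0.95, specificity=0.98)
```

With the same sensitivity and specificity, `common` is about 0.84 while `rare` is below 0.05: a low prevalence drags $PV^+$ down sharply.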

4. Binomial and Poisson Probabilities


Binomial probabilities: $X \sim \text{Binomial}(n, p)$, $0 \le X \le n$ and $X$ is an integer.
$P(X = x)$ = BINOMDIST(x, n, p, 0) in EXCEL.
$P(X \le x)$ = BINOMDIST(x, n, p, 1) in EXCEL.
$E(X) = np$ and $\text{Var}(X) = np(1-p)$.

Poisson probabilities: $X \sim \text{Poisson}(\lambda)$, $X \ge 0$ and $X$ is an integer.
$P(X = x)$ = POISSON(x, λ, 0) in EXCEL.
$P(X \le x)$ = POISSON(x, λ, 1) in EXCEL.
$E(X) = \lambda$ and $\text{Var}(X) = \lambda$.

Be careful about the following, since $X$ only takes integer values:
$P(X > x) = 1 - P(X \le x)$,
$P(X < x) = P(X \le x - 1)$,
$P(X \ge x) = 1 - P(X \le x - 1)$.
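To check EXCEL answers, the two functions can be mimicked directly from the probability mass functions. The helpers below are hypothetical Python analogues written for this sketch, not EXCEL itself; the last-argument convention (0 = exact, 1 = cumulative) follows the EXCEL calls above.

```python
from math import comb, exp, factorial

def binomdist(x, n, p, cumulative):
    """Analogue of EXCEL's BINOMDIST(x, n, p, cumulative)."""
    def pmf(k):
        return comb(n, k) * p**k * (1 - p)**(n - k)
    if cumulative:
        return sum(pmf(k) for k in range(0, x + 1))  # P(X <= x)
    return pmf(x)                                     # P(X = x)

def poisson(x, lam, cumulative):
    """Analogue of EXCEL's POISSON(x, mean, cumulative)."""
    def pmf(k):
        return exp(-lam) * lam**k / factorial(k)
    if cumulative:
        return sum(pmf(k) for k in range(0, x + 1))  # P(X <= x)
    return pmf(x)                                     # P(X = x)

# Integer adjustments for X ~ Binomial(10, 0.3):
p_gt_3 = 1 - binomdist(3, 10, 0.3, 1)   # P(X > 3)  = 1 - P(X <= 3)
p_lt_3 = binomdist(2, 10, 0.3, 1)       # P(X < 3)  = P(X <= 2)
p_ge_3 = 1 - binomdist(2, 10, 0.3, 1)   # P(X >= 3) = 1 - P(X <= 2)
```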

5. Normal Probabilities

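Normal probabilities: $X \sim N(\mu, \sigma^2)$, $X$ is continuous. $E(X) = \mu$ and $\text{Var}(X) = \sigma^2$.
$P(X = x) = 0$ (but "impossible" things happen all the time!), so many statements are equivalent, e.g. $P(X \le x) = P(X < x)$.
$P(X \le x)$ = NORMDIST(x, μ, σ, 1) in EXCEL, and if $Z = \dfrac{X - \mu}{\sigma}$ then
$P(Z \le z) = \Phi(z)$ = NORMSDIST(z) in EXCEL.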
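Python's standard library has a normal distribution class, `statistics.NormalDist`, that can stand in for NORMDIST and NORMSDIST when checking answers. The sketch below, with illustrative values of μ, σ and x, shows that computing $P(X \le x)$ directly and via the standardised $Z$ gives the same result.

```python
from statistics import NormalDist

def normdist_cdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma^2); EXCEL: NORMDIST(x, mu, sigma, 1)."""
    return NormalDist(mu, sigma).cdf(x)

def normsdist(z):
    """Phi(z) = P(Z <= z) for Z ~ N(0, 1); EXCEL: NORMSDIST(z)."""
    return NormalDist().cdf(z)

# Illustrative values only.
mu, sigma, x = 100, 15, 130
direct = normdist_cdf(x, mu, sigma)
standardised = normsdist((x - mu) / sigma)   # Z = (X - mu) / sigma
```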

6. Normal Probabilities Continued


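Some formulae:
$P(Z \ge a) = 1 - P(Z \le a) = 1 - \Phi(a)$.
$P(Z \le -a) = 1 - P(Z \le a) = 1 - \Phi(a) = \Phi(-a)$.
Assuming $a < b$, $P(a \le Z \le b) = \Phi(b) - \Phi(a)$.
If $X_1 \sim N(\mu_1, \sigma_1^2)$ and $X_2 \sim N(\mu_2, \sigma_2^2)$ are independent then
$a_1 X_1 + a_2 X_2 \sim N(a_1\mu_1 + a_2\mu_2,\ a_1^2\sigma_1^2 + a_2^2\sigma_2^2)$.

Means and totals: if $X_1, \ldots, X_n$ are independent $N(\mu, \sigma^2)$ then
$T = \sum_{i=1}^{n} X_i \sim N(n\mu, n\sigma^2)$ and $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \sim N\left(\mu, \frac{\sigma^2}{n}\right)$.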
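The rules above only involve arithmetic on means and variances, so they translate directly into code. In the sketch below the function names are hypothetical and the numbers are illustrative; note that for a difference ($a_2 = -1$) the variances still add, because $a_2^2 = 1$.

```python
from math import sqrt
from statistics import NormalDist

def linear_combination(a1, mu1, var1, a2, mu2, var2):
    """Mean and variance of a1*X1 + a2*X2 for independent normal X1, X2."""
    mean = a1 * mu1 + a2 * mu2
    var = a1**2 * var1 + a2**2 * var2
    return mean, var

def total_and_mean(n, mu, var):
    """Distributions of T (sum of n iid N(mu, var)) and of X-bar."""
    total = NormalDist(n * mu, sqrt(n * var))   # N(n mu, n sigma^2)
    mean = NormalDist(mu, sqrt(var / n))        # N(mu, sigma^2 / n)
    return total, mean

# X1 - X2 with X1 ~ N(10, 4) and X2 ~ N(7, 9)
diff_mean, diff_var = linear_combination(1, 10, 4, -1, 7, 9)
t_dist, xbar_dist = total_and_mean(25, 2, 16)
```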

7. Approximating Distributions
Poisson distribution on different time lengths: if the number of occurrences per unit interval is Poisson($\lambda$), and $W$ counts the occurrences over an interval of length $t$, then $W \sim \text{Poisson}(\lambda t)$.

Approximating a Binomial distribution by a Poisson distribution: if $X \sim \text{Binomial}(n, p)$ with $n$ large and $p$ small, a good approximating variable is $Y \sim \text{Poisson}(\lambda)$ with $\lambda = np$, so that $P(X \le r) \approx P(Y \le r)$.
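The quality of the Poisson approximation can be checked numerically. The sketch below compares the exact Binomial(10000, 0.0003) probability of at most 2 cases with its Poisson(3) approximation, and then rescales a rate to a longer interval; the rate of 0.5 per hour over an 8-hour shift is an invented example.

```python
from math import comb, exp, factorial

def binom_cdf(r, n, p):
    """P(X <= r) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(r + 1))

def poisson_cdf(r, lam):
    """P(Y <= r) for Y ~ Poisson(lam)."""
    return sum(exp(-lam) * lam**k / factorial(k) for k in range(r + 1))

# Binomial(10000, 0.0003) approximated by Poisson(np) = Poisson(3)
n, p, r = 10000, 0.0003, 2
exact = binom_cdf(r, n, p)
approx = poisson_cdf(r, n * p)

# Hypothetical rate: 0.5 events per hour, so W over 8 hours is Poisson(4)
lam_per_hour, hours = 0.5, 8
p_none_in_shift = poisson_cdf(0, lam_per_hour * hours)   # P(W = 0)
```

Here `exact` and `approx` agree to about four decimal places, which is what "n large and p small" buys.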

8. Approximating Distributions

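Approximating the distribution of a total or mean by a Normal distribution, i.e. the central limit theorem: if $X_1, \ldots, X_n$ are independent with $E(X_i) = \mu$ and $\text{Var}(X_i) = \sigma^2$, and $n$ is large, then approximately
$T = \sum_{i=1}^{n} X_i \sim N(n\mu, n\sigma^2)$ and $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \sim N\left(\mu, \frac{\sigma^2}{n}\right)$.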
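A small simulation makes the central limit theorem concrete. The sketch below assumes the $X_i$ are Uniform(0, 1), which is clearly not normal but has $\mu = 1/2$ and $\sigma^2 = 1/12$; the sample size, number of repetitions and seed are arbitrary choices.

```python
import random
from statistics import fmean, pvariance

random.seed(1)  # fixed seed so the sketch is reproducible

n, reps = 30, 20000
mu, var = 0.5, 1 / 12   # mean and variance of Uniform(0, 1)

# Many sample means, each from n independent Uniform(0, 1) draws.
means = [fmean(random.random() for _ in range(n)) for _ in range(reps)]

centre = fmean(means)        # should be close to mu
spread = pvariance(means)    # should be close to sigma^2 / n

# If X-bar is roughly N(mu, sigma^2 / n), about 95% of means fall
# within 1.96 standard errors of mu.
se = (var / n) ** 0.5
coverage = sum(abs(m - mu) <= 1.96 * se for m in means) / reps
```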

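9. Confidence Intervals

Know how to calculate a confidence interval for the cases:
Normal/constant $\sigma^2 = \sigma_0^2$ case: $\bar{x} \pm z^* \dfrac{\sigma_0}{\sqrt{n}}$
Normal/unknown $\sigma^2$ case: $\bar{x} \pm t^* \dfrac{s}{\sqrt{n}}$
Proportions: $\hat{p} \pm z^* \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$

where $P(|Z| \le z^*) = 1 - \alpha$, $P(|t_{n-1}| \le t^*) = 1 - \alpha$ and $\alpha$ is typically 5%, e.g. $P(|Z| \le 1.96) = 0.95$. Note $z^*$ = ABS(NORMSINV(α/2)) and $t^*$ = ABS(TINV(α, n-1)) in EXCEL. Also know how to interpret confidence intervals.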
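The three intervals can be sketched as functions. Python's standard library has no Student t quantile function, so in this sketch $t^*$ is passed in as an argument (for example, the value EXCEL's TINV returns); all function names and the numbers in the test are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_star(alpha=0.05):
    """z* with P(|Z| <= z*) = 1 - alpha; EXCEL: ABS(NORMSINV(alpha/2))."""
    return abs(NormalDist().inv_cdf(alpha / 2))

def ci_known_sigma(xbar, sigma0, n, alpha=0.05):
    """Normal/constant sigma^2 case: xbar +/- z* sigma0 / sqrt(n)."""
    half = z_star(alpha) * sigma0 / sqrt(n)
    return xbar - half, xbar + half

def ci_unknown_sigma(xbar, s, n, t_star):
    """Normal/unknown sigma^2 case: xbar +/- t* s / sqrt(n).

    t_star must be supplied, e.g. from EXCEL's TINV(alpha, n-1).
    """
    half = t_star * s / sqrt(n)
    return xbar - half, xbar + half

def ci_proportion(p_hat, n, alpha=0.05):
    """Proportions: p_hat +/- z* sqrt(p_hat (1 - p_hat) / n)."""
    half = z_star(alpha) * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half
```

For example, `ci_known_sigma(100, 15, 25)` gives roughly (94.12, 105.88).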

10. p-values
Definition: The p-value is the probability of obtaining observations at least as extreme (unusual) as those actually observed. The p-value is calculated assuming that $H_0$ is true.

Interpretation: Small p-values (< 0.05), for example a p-value of 0.01, mean that either 1. or 2. is true (but we can't tell which):
1. $H_0$ is true and the observed sample is improbable.
2. $H_0$ is not true.

Large p-values (> 0.05), for example a p-value of 0.2, mean:
1. The observed sample is consistent with $H_0$.
2. This does not mean $H_0$ is actually true (the sample could have come from a different distribution, for example).

The smaller the p-value, the stronger the evidence against $H_0$ in favour of $H_A$.
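The definition "at least as extreme as observed, assuming $H_0$" can be computed directly for a count. In the hypothetical example below, $H_0$ says the number of cases is Poisson(3), 7 cases are observed, and the alternative is one-sided (more cases than expected), so "at least as extreme" means 7 or more.

```python
from math import exp, factorial

def poisson_cdf(r, lam):
    """P(Y <= r) for Y ~ Poisson(lam); EXCEL: POISSON(r, lam, 1)."""
    return sum(exp(-lam) * lam**k / factorial(k) for k in range(r + 1))

# Hypothetical example: under H0 the count is Poisson(3); we observe 7.
lam0, observed = 3, 7

# P(Y >= 7) = 1 - P(Y <= 6); EXCEL: =1-POISSON(6,3,1)
p_value = 1 - poisson_cdf(observed - 1, lam0)
```

The p-value comes out at about 0.034, which is below 0.05, so by the interpretation above this would count as evidence against $H_0$.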

12. Short Answer Hypothesis Testing

Given a problem description:
Select an appropriate null and alternative hypothesis.
Select an appropriate test statistic for the problem (and know its distribution), i.e. choose the correct test.
State the EXCEL command to calculate the p-value.
Given the p-value, draw a conclusion (again, interpret the p-value in relation to the hypotheses).
Note that the test statistics and their distributions are on the formula sheet. There are too many tests to go through now; consult the lecture notes.
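As one instance of these steps, the sketch below runs a two-sided one-sample z-test with known $\sigma_0$. The sample values are borrowed from sample question 19 ($\bar{x} = 217$, $\sigma_0 = 46$, $n = 12$), but the null value $\mu_0 = 190$ is invented purely for illustration, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(xbar, mu0, sigma0, n):
    """Two-sided test of H0: mu = mu0 vs HA: mu != mu0, sigma0 known.

    Test statistic Z = (xbar - mu0) / (sigma0 / sqrt(n)) ~ N(0, 1) under H0.
    p-value = 2 * (1 - Phi(|z|)); EXCEL: =2*(1-NORMSDIST(ABS(z))).
    """
    z = (xbar - mu0) / (sigma0 / sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# mu0 = 190 is hypothetical.
z, p = one_sample_z_test(xbar=217, mu0=190, sigma0=46, n=12)
conclusion = "evidence against H0" if p < 0.05 else "consistent with H0"
```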

13. Short Answer Study Types


What is the difference between a prospective and a retrospective study?

A prospective study is based on subjects who are initially identified as disease-free and classified by presence or absence of the risk factor. A random sample from each group is followed in time (prospectively) until eventually classified by disease outcome. In a prospective study the row totals are fixed.

A retrospective study is based on random samples from each of the two outcome categories, which are followed back (retrospectively) to determine the presence or absence of the risk factor for each individual. In a retrospective study the column totals are fixed.

14. Short Answer Relative Risk/Odds Ratios


Consider the general table:

        D+      D−      Total
S+      a       b       a+b
S−      c       d       c+d
Total   a+c     b+d     a+b+c+d

Relative risk: the risk of the disease given the risk factor divided by the risk of the disease without the risk factor. Formula:
$RR = \dfrac{P(D^+ \mid S^+)}{P(D^+ \mid S^-)} = \dfrac{a/(a+b)}{c/(c+d)} = \dfrac{a(c+d)}{c(a+b)}$.
This only makes sense if the data come from a prospective study or from a sample of completed records.

Odds ratio: there are many definitions, but for the general $2 \times 2$ table they all come down to
$OR = \dfrac{ad}{bc}$.
It can be calculated regardless of the type of study used.
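Both measures are one-line calculations on the four cell counts. The sketch below uses the cell layout of the general table (rows $S^+$, $S^-$; columns $D^+$, $D^-$) and hypothetical counts.

```python
def relative_risk(a, b, c, d):
    """RR = P(D+|S+) / P(D+|S-) = [a/(a+b)] / [c/(c+d)] = a(c+d) / (c(a+b)).

    Cells: a = S+ D+, b = S+ D-, c = S- D+, d = S- D-.
    """
    return a * (c + d) / (c * (a + b))

def odds_ratio(a, b, c, d):
    """OR = ad / bc for the same 2 x 2 table."""
    return (a * d) / (b * c)

# Hypothetical counts, for illustration only.
rr = relative_risk(a=30, b=70, c=10, d=90)
or_ = odds_ratio(a=30, b=70, c=10, d=90)
```

Here the risk of disease is 30% with the risk factor and 10% without, so RR = 3, while OR is about 3.86. The two agree closely only when the disease is rare.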

15. Sample Question

The presence of a symptom ($S^+$) is used to diagnose the presence of a certain disease ($D^+$). The probability $P(D^- \mid S^-)$ is known as:
(a) sensitivity
(b) specificity
(c) $PV^-$ (This one is correct)
(d) odds ratio
(e) relative risk.

16. Sample Question


In a certain community, 10% of all adults have depression ($P(D^+)$, the prevalence). Suppose that a social worker in this community correctly diagnoses 95% of all adults with depression as having depression ($P(S^+ \mid D^+)$, the sensitivity). This same social worker also incorrectly diagnoses 2% of all adults without depression as having depression ($P(S^+ \mid D^-)$, the false positive rate). What is the probability that an adult diagnosed by the social worker as having depression actually has depression?
(a) 0.492
(b) 0.995
(c) 0.010
(d) 0.841 (This one is correct)
(e) none of the above
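Substituting the three given probabilities, together with $P(D^-) = 1 - 0.10 = 0.90$, into the $PV^+$ formula shows where answer (d) comes from:

```latex
PV^{+} = \frac{P(S^+ \mid D^+)\,P(D^+)}{P(S^+ \mid D^-)\,P(D^-) + P(S^+ \mid D^+)\,P(D^+)}
       = \frac{0.95 \times 0.10}{0.02 \times 0.90 + 0.95 \times 0.10}
       = \frac{0.095}{0.113} \approx 0.841
```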

17. Sample Question

If the prevalence of an infection is 0.0003, the probability of at most 2 cases in a random sample of 10000 is (using the Poisson approximation):
(a) $P(Y < 3)$, with $Y \sim \text{Poisson}(3)$ (This one is correct)
(b) $P(Y \le 3)$, with $Y \sim \text{Poisson}(3)$
(c) =1-POISSON(0,3,1) (in EXCEL)
(d) =POISSON(0,3,1)+POISSON(1,3,1)+POISSON(2,3,1) (in EXCEL)
(e) none of these.

19. Sample Question


Consider the distribution of serum cholesterol levels for all males in the U.S. who smoke. The distribution is normal with an unknown mean ($\mu$) and a known standard deviation $\sigma = \sigma_0 = 46$ mg/100 ml. Suppose we draw a random sample of size 12 from the population of male U.S. smokers and these men have a mean serum cholesterol level $\bar{x} = 217$ mg/100 ml. Based on this sample, an appropriate 95% confidence interval for the population mean is:
(a) $(217 - 1.96 \times 46,\ 217 + 1.96 \times 46)$.
(b) $\left(217 - t^* \frac{46}{\sqrt{12}},\ 217 + t^* \frac{46}{\sqrt{12}}\right)$ where $P(|t_{11}| < t^*) = 0.95$.
(c) $\left(217 - 1.96 \frac{46}{\sqrt{12}},\ 217 + 1.96 \frac{46}{\sqrt{12}}\right)$. (This one is correct)
(d) $\left(217 - t^* \frac{46}{\sqrt{12}},\ 217 + t^* \frac{46}{\sqrt{12}}\right)$ where $P(|t_{11}| > t^*) = 0.95$.
(e) none of these.
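Because $\sigma_0$ is known, the Normal/constant $\sigma^2$ formula applies (so $z^* = 1.96$ rather than a $t$ quantile), and evaluating answer (c) gives:

```latex
\bar{x} \pm 1.96 \frac{\sigma_0}{\sqrt{n}}
  = 217 \pm 1.96 \times \frac{46}{\sqrt{12}}
  = 217 \pm 1.96 \times 13.28
  \approx 217 \pm 26.03
  = (190.97,\ 243.03)
```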

20. Sample Question

A p-value of 0.01 means:
(a) there is a 1% chance that $H_0$ is true
(b) there is a 1% chance that $H_1$ is true
(c) the data are consistent with $H_0$
(d) there is evidence against $H_0$ (This one is correct)
(e) none of these.
