Summary Statistics

Summarizing Data
Mean
ΣX = 2127, N = 28
M = ΣX / N = 2127 / 28 = 75.96
Summarizing Data
Ordered data (N = 28):
32, 34, 40, 62, 63, 65, 68, 68, 70, 76, 76, 77, 78, 79, 80, 80, 81, 81, 87, 88, 88, 88, 89, 90, 93, 95, 97, 102
Summarizing Data
Median
With the 28 scores in order, the two middle scores (the 14th and 15th) are 79 and 80:
Mdn = (79 + 80) / 2 = 159 / 2 = 79.5
Summarizing Data
Mode

Score   Frequency
32      1
34      1
40      1
62      1
63      1
65      1
68      2
70      1
76      2
77      1
78      1
79      1
80      2
81      2
87      1
88      3
89      1
90      1
93      1
95      1
97      1
102     1

Mode = 88 (the score with the highest frequency, 3)
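The three measures of central tendency above can be checked with Python's standard library; a minimal sketch using the 28 ordered scores from the example:

```python
import statistics

# The 28 ordered scores from the example
scores = [32, 34, 40, 62, 63, 65, 68, 68, 70, 76, 76, 77, 78, 79,
          80, 80, 81, 81, 87, 88, 88, 88, 89, 90, 93, 95, 97, 102]

mean = sum(scores) / len(scores)     # M = ΣX / N = 2127 / 28
median = statistics.median(scores)   # average of the 14th and 15th ordered scores
mode = statistics.mode(scores)       # most frequent score

print(round(mean, 2))   # 75.96
print(median)           # 79.5
print(mode)             # 88
```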
[Histograms of the score distribution, Frequency vs. SCORE: one with 8 categories (bins centered 35.0 to 105.0) and one with 16 categories (bins centered 32.5 to 107.5)]
Histogram
Graph of frequencies
[Histogram of the scores: Frequency vs. SCORE, bins centered 32.5 to 107.5]
Salaries in 3 Companies

Company A   Company B   Company C
$10,000     $10,000     $10,000
$10,000     $10,000     $10,000
$12,000     $12,000     $12,000
$12,000     $12,000     $12,000
$12,000     $12,000     $12,000
$12,000     $12,000     $12,000
$12,000     $12,000     $12,000
$12,000     $12,000     $12,000
$14,000     $14,000     $15,000
$14,000     $380,000    $380,000

        Company A   Company B   Company C
M       $12,000     $48,600     $48,700
Mdn     $12,000     $12,000     $12,000
Mo      $12,000     $12,000     $12,000

Note the dramatic effect of one salary on the mean, especially comparing A to B. The means for Companies B and C are pulled in the direction of the one unusual score, but the Medians and Modes remain unaffected: the impact of a few unusually high or low scores is seen most clearly on the MEAN. Company A's distribution is not skewed; B's and C's are skewed.
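The outlier effect described above is easy to reproduce; a sketch using the Company A and Company B salaries, where one $380,000 salary moves the mean but not the median or mode:

```python
import statistics

# Company A: no unusual salary; Company B: one $380,000 outlier
company_a = [10_000, 10_000] + [12_000] * 6 + [14_000, 14_000]
company_b = [10_000, 10_000] + [12_000] * 6 + [14_000, 380_000]

for name, salaries in (("A", company_a), ("B", company_b)):
    m = statistics.mean(salaries)
    mdn = statistics.median(salaries)
    mo = statistics.mode(salaries)
    print(name, m, mdn, mo)
# A: mean 12000, median 12000, mode 12000
# B: mean 48600, median 12000, mode 12000
```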
[Histogram of IQ scores, Frequency vs. IQ (95.0 to 155.0): Median = 120.5, with the Mean marked separately]
[Histogram of murder rate (0.0 to 20.0): Mean = 7.3, N = 50.00]
[Histogram of AGE (15.0 to 90.0): Mode = 21, Median = 25, Mean = 31.34; the Mean gets dragged to the right]
[Histogram of score on exam (70.0 to 90.0): Mean = 82.48, Median = 84, Mode = 88.8]
[Histogram of GPA (2.81 to 3.94): Mean = 3.51, Median = 3.55, Mode = 3.71]
[Histogram of GPA (3.19 to 4.00): Mean = 3.78, Median = 3.82, Mode = 3.98; the Mean gets dragged to the left]
Skewed Distributions
Relative positions of measures of central tendency

Salaries in 2 Companies
The Means and Medians suggest that the employees at these 2 companies have similar pay:
M = $60,000, Mdn = $60,000 (in both companies)
Salaries in 2 Companies
But looking at the distribution of pay, the shapes are different.
[Two salary histograms with different shapes; both have M = $60,000 and Mdn = $60,000]
Range
R = Maximum − Minimum
For example: R = $110,000 − $10,000 = $100,000 in one company, and R = $80,000 − $40,000 = $40,000 in the other.
Standard Deviation
A more precise measure of dispersion
Comparable to the mean in that it takes all of the data points into account (not just the highest and lowest)
More commonly used than the Range
Is the basis for many later measures and tests we'll be using
Standard = average
Deviation = away from normal
Standard Deviation
Scores (X): 3, 4, 5, 6, 7
We are going to calculate a standard deviation (SD) for this set of 5 scores.
ΣX = 25, N = 5, M = ΣX / N = 25 / 5 = 5
The mean is 5, so how far away (on average) do these 5 scores deviate from the mean?
Standard Deviation

Scores (X)   X − M (deviation score)
3            3 − 5 = −2
4            4 − 5 = −1
5            5 − 5 = 0
6            6 − 5 = 1
7            7 − 5 = 2

Subtracting the mean from each score to get the deviation score.
Standard Deviation

Scores (X)   X − M (deviation score)
3            −2
4            −1
5            0
6            1
7            2

...then taking the average of these 5 numbers:
Σ(X − M) / N = 0 / 5 = 0
This always results in 0: the positive & negative numbers cancel each other out.
Standard Deviation
This method won't work: the positives & negatives always cancel each other out to zero!
Σ(X − M) / N = 0 / 5 = 0
We need another method to get the information we want.
Standard Deviation
Deviation method

Scores (X)   X − M (deviation score)   (X − M)²
3            −2                        4
4            −1                        1
5            0                         0
6            1                         1
7            2                         4

SS = Σ(X − M)² = 4 + 1 + 0 + 1 + 4 = 10
s² = SS / (n − 1) = 10 / 4 = 2.5
Standard Deviation
Deviation method
s² = 2.5
Taking the square root of the variance gets this measure back into the original scale:
s = √2.5 = 1.58
So for the scores (X) 3, 4, 5, 6, 7, the standard deviation is 1.58.
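The deviation-method steps above translate directly into code; a minimal sketch, using the n − 1 "sample" denominator the slides use:

```python
# Deviation method for the sample standard deviation
scores = [3, 4, 5, 6, 7]
n = len(scores)
m = sum(scores) / n                    # mean = 5.0

deviations = [x - m for x in scores]   # [-2, -1, 0, 1, 2]; sums to 0
ss = sum(d ** 2 for d in deviations)   # SS = 4 + 1 + 0 + 1 + 4 = 10
variance = ss / (n - 1)                # s² = 10 / 4 = 2.5
s = variance ** 0.5                    # s = √2.5 ≈ 1.58

print(sum(deviations), ss, variance, round(s, 2))  # 0.0 10.0 2.5 1.58
```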
Standard Deviation
Computational method for SS
The deviation method (the last few pages) will give you the standard deviation, and it is easier to see where that end number comes from (a measure of how far from the mean the numbers are).
However, the deviation method can be a bit tedious in hand calculations, especially with certain types of raw data.
We have another, more direct method (the computational method) of getting to the SS when using hand calculations:
SS = ΣX² − (ΣX)² / N
Standard Deviation
Computational method

Scores (X)   X²
3            9
4            16
5            25
6            36
7            49

ΣX = 25, N = 5, ΣX² = 135
SS = ΣX² − (ΣX)² / N = 135 − 25² / 5 = 135 − 125 = 10
s = √(SS / (n − 1)) = √(10 / 4) = 1.58
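The computational shortcut SS = ΣX² − (ΣX)²/N reaches the same SS without first computing any deviation scores; a sketch:

```python
scores = [3, 4, 5, 6, 7]
n = len(scores)

sum_x = sum(scores)                    # ΣX = 25
sum_x2 = sum(x ** 2 for x in scores)   # ΣX² = 9 + 16 + 25 + 36 + 49 = 135

ss = sum_x2 - sum_x ** 2 / n           # 135 - 625/5 = 10
s = (ss / (n - 1)) ** 0.5              # √(10/4) ≈ 1.58
print(ss, round(s, 2))                 # 10.0 1.58
```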
Standard Deviation
Second example: deviation method
Scores (X): 10, 12, 13, 15, 18, 20, 22, 25
Another dataset, showing you how to calculate the standard deviation using both methods.
Standard Deviation
Second example: deviation method

Scores (X)   X − M     (X − M)²
10           −6.875    47.27
12           −4.875    23.77
13           −3.875    15.02
15           −1.875    3.52
18           1.125     1.27
20           3.125     9.77
22           5.125     26.27
25           8.125     66.02

M = 135 / 8 = 16.875
SS = Σ(X − M)² = 192.91
s² = SS / (n − 1) = 192.91 / 7 = 27.56
s = √27.56 = 5.25
Standard Deviation
Second example: computational method

Scores (X)   X²
10           100
12           144
13           169
15           225
18           324
20           400
22           484
25           625

M = 135 / 8 = 16.875
ΣX² = 2471
SS = ΣX² − (ΣX)² / N = 2471 − 135² / 8 = 2471 − 2278.125 = 192.875
s = √(SS / (n − 1)) = √(192.875 / 7) = 5.249
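The two methods for this dataset can be checked against each other; a sketch confirming both routes give the same SS (the slides' 192.91 reflects rounding in the squared deviations; the exact value is 192.875):

```python
scores = [10, 12, 13, 15, 18, 20, 22, 25]
n = len(scores)
m = sum(scores) / n                        # 135 / 8 = 16.875

# Deviation method
ss_dev = sum((x - m) ** 2 for x in scores)
# Computational method
ss_comp = sum(x ** 2 for x in scores) - sum(scores) ** 2 / n

s = (ss_dev / (n - 1)) ** 0.5
print(ss_dev, ss_comp, round(s, 2))        # 192.875 192.875 5.25
```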
Standard Deviation
Third example: deviation method

Scores (X)   X − M     (X − M)²
6            0.857     0.735
2            −3.143    9.878
8            2.857     8.163
5            −0.143    0.020
4            −1.143    1.306
4            −1.143    1.306
7            1.857     3.449

M = 36 / 7 = 5.143
SS = Σ(X − M)² = 24.857
s² = SS / (n − 1) = 24.857 / 6 = 4.143
s = √4.143 = 2.035
Standard Deviation
Third example: computational method

Scores (X)   X²
6            36
2            4
8            64
5            25
4            16
4            16
7            49

M = 36 / 7 = 5.143
ΣX² = 210
SS = ΣX² − (ΣX)² / N = 210 − 36² / 7 = 210 − 185.143 = 24.857
s = √(SS / (n − 1)) = √(24.857 / 6) = 2.035
Kurtosis
Another check on normality
[Three frequency histograms of Score with different peakedness: one roughly normal (scores 16.0 to 32.0), one flatter and more spread out (scores 1.5 to 9.5), one sharply peaked (scores 23.00 to 25.00)]
Assessing Kurtosis
In a normal curve, s / R ≈ .17.
Anything LARGER than that (such as .38 or .47) would indicate a platykurtic distribution, one that has MORE variability than normal.
Anything SMALLER than that (such as .11 or .09) would suggest that the data are more closely packed than normal, a leptokurtic distribution.
Assessing Kurtosis
In a normal curve, the standard deviation is approximately 1/6 of the range, so s / R ≈ .17.
[Frequency histogram of scores 16.0 to 32.0]
In this example:
s / R = 2.93 / 15 ≈ .19
Verdict: Mesokurtic
Assessing Kurtosis
Normal: s / R ≈ .17
[Frequency histogram of Score 1.5 to 9.5]
In this example:
s / R = 2.42 / 8 ≈ .30
Verdict: Platykurtic
Assessing Kurtosis
Normal: s / R ≈ .17
[Frequency histogram of Score 22.95 to 25.35]
In this example:
s / R = .226 / 2 ≈ .11
Verdict: Leptokurtic
Assessing Kurtosis
Full example, ordered scores (N = 40):
40, 45, 50, 60, 60, 65, 70, 74, 75, 75, 77, 80, 80, 80, 85, 85, 85, 90, 90, 90, 93, 93, 93, 93, 94, 94, 95, 100, 100, 105, 105, 105, 105, 110, 120, 127, 130, 137, 140, 180

ΣX = 3675
ΣX² = 365971

M = ΣX / N = 3675 / 40 = 91.875
Mdn = 91.5
Mo = 93 & 105

SS = ΣX² − (ΣX)² / N = 365971 − 3675² / 40 = 365971 − 337640.625 = 28330.375
s = √(SS / (n − 1)) = √(28330.375 / 39) = 26.952

s / R = 26.952 / 140 = .193
Conclusion: Mesokurtic

[Histogram of the 40 scores, 40.0 to 180.0]
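The full example above brings all the pieces together, and every number can be reproduced from the raw scores; a sketch:

```python
import statistics

scores = [40, 45, 50, 60, 60, 65, 70, 74, 75, 75, 77, 80, 80, 80,
          85, 85, 85, 90, 90, 90, 93, 93, 93, 93, 94, 94, 95, 100,
          100, 105, 105, 105, 105, 110, 120, 127, 130, 137, 140, 180]
n = len(scores)                            # 40

m = sum(scores) / n                        # 3675 / 40 = 91.875
mdn = statistics.median(scores)            # 91.5
modes = statistics.multimode(scores)       # [93, 105] -- bimodal

ss = sum(x ** 2 for x in scores) - sum(scores) ** 2 / n   # 28330.375
s = (ss / (n - 1)) ** 0.5                                 # about 26.952
ratio = s / (max(scores) - min(scores))                   # about .193 -> mesokurtic

print(m, mdn, modes, ss, round(s, 3), round(ratio, 3))
```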
Chapter 5:
Probability Concepts and Screening

Definitions
Random variable: a numerical quantity that takes on different values depending on chance
Population: the set of all possible values for a random variable
Event: an outcome or set of outcomes
Probability: the proportion of times an event is expected to occur in the population
Ideas about probability are founded on relative frequencies (proportions) in populations.
Classical Model
A priori, or "before the fact": predictions, ideas about what will happen, using reason alone.
(Based on the conditions that all possible outcomes are equally likely and that only 1 can occur)
Probability = # of specific events out of total possible events:
p = s / t
This will give us a number between 0 and 1:
0 ------------------ .50 ------------------ 1.00
no chance of occurring                      certainty of occurring
Classical/Theoretical Model
Tossing a coin or rolling dice are good examples.
In flipping a coin:
Heads is just as likely as tails
Can only have 1 outcome (can't be both H & T)
In rolling a die:
Getting a 3 is just as likely as getting a 5
Can only have 1 outcome
Classical/Theoretical Model
What is the probability of flipping a coin and getting a Head as the outcome?
How many specific outcomes satisfy this condition? 1 (getting a Head)
How many total possible outcomes are there? 2 (H or T)
p = s / t = 1 / 2 = .50
The classical model predicts a 50% chance of getting a Head.
Classical/Theoretical Model
What is the probability of rolling a die and getting a 3?
How many specific outcomes satisfy this condition? 1 (i.e., rolling a 3)
How many total possible outcomes are there? 6
p = s / t = 1 / 6 ≈ .167
Classical Model
What is the probability of tossing a die and getting an even number as an outcome?
How many specific outcomes satisfy this condition? 3 (i.e., getting a 2, 4 or 6: mutually exclusive outcomes)
How many total possible outcomes are there? 6
p = s / t = 3 / 6 = .50
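The classical p = s/t counts can be enumerated rather than reasoned out by hand; a sketch for the two die examples, using exact fractions:

```python
from fractions import Fraction

outcomes = range(1, 7)   # the 6 faces of a die
t = len(outcomes)

# p = s / t: count the outcomes satisfying each condition
p_three = Fraction(sum(1 for o in outcomes if o == 3), t)       # 1/6
p_even = Fraction(sum(1 for o in outcomes if o % 2 == 0), t)    # 3/6 = 1/2

print(p_three, float(p_even))   # 1/6 0.5
```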
Long-run/Empirical model
A posteriori viewpoint: "after the fact"; the data have already been collected.
Rather than looking at what SHOULD occur, this is what HAS actually occurred over a large number of events.
Can compare this number to the expected outcomes as predicted by the classical model.
"Long-run" because the more data you have, the more accurate the results are going to be.
Long-run/Empirical model
Coin flip game: a coin will be flipped
Heads = I'll give you $5
Tails = you give me $5
So you'd be hoping for heads.
With multiple tosses, you might generally expect to break even, or maybe one person would be $5 ahead of the other.
Long-run/Empirical model
On 10 flips, the outcome is:
5 tails (sounds good)
6 tails (I got a bit lucky)
7 tails (more lucky)
8 tails (hm)
9 tails (???)
At what point do you (should you) become suspicious?
Long-run/Empirical model
Same scenario:
Heads = I'll give you $5
Tails = you give me $5
BUT instead of there being 10 tosses, there will be 100.
Long-run/Empirical model
On 100 flips, the outcome is:
50 tails
60 tails
70 tails
80 tails
90 tails
The percentages are the same as with 10 tosses, but now each is based on a larger number:
60% based on 100 is a lot more suspicious than 60% based on 10.
Expected long-run proportion: 50 / 100 = .50
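The long-run idea, that the observed proportion settles near the classical .50 as the number of flips grows, can be simulated; a sketch (the seed and flip counts are arbitrary choices of mine):

```python
import random

random.seed(42)  # arbitrary seed, for reproducibility only

for n_flips in (10, 100, 10_000):
    tails = sum(random.random() < 0.5 for _ in range(n_flips))
    # The proportion of tails drifts toward .50 as n grows
    print(n_flips, tails / n_flips)
```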
Conditional probability:
Pr(B | A) = Pr(A and B) / Pr(A)   (as long as Pr(A) > 0)
Pr(A and B) = Pr(A) × Pr(B | A)
Combining Probabilities
1 outcome
Finding the probability that ONE of several acceptable outcomes may occur.
For example, what is the probability that you will get a Head OR a Tail when you flip a coin?
In this type of situation, you add the probabilities together:
p = Head OR Tail
p = .50 + .50 = 1.00
Combining Probabilities
1 outcome
p(Head OR Tail) = .50 + .50 = 1.00
Combining Probabilities
1 outcome
Finding the probability that one of several acceptable outcomes may occur.
The outcomes must all be mutually exclusive.
What is the probability of rolling a die and getting an even number OR a 3?
ADD-OR rule:
p = even number OR 3 = .50 + .167 = .667
Combining Probabilities
Multiple outcomes
Finding the probability that multiple events will occur.
For example, what is the probability that you will flip 2 coins and get 2 Heads?
In this type of situation, you multiply the probabilities:
p = Head AND Head
p = .50 × .50 = .25
Combining Probabilities
Multiple outcomes
p(Head AND Head) = .50 × .50 = .25
Combining Probabilities
Multiple outcomes
You will draw one card from a deck, replace that card, and draw a second card from the deck (the replacement makes the two draws independent).
What is the probability that you will get an Ace on the first draw AND an Ace on the second draw?
MULT-AND rule:
p = (Ace) AND (Ace) = (4/52) × (4/52) = .077 × .077 = .006
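Both rules above can be checked numerically; a sketch (exact fractions avoid rounding, so the slides' .667 and .006 appear as 2/3 and 1/169):

```python
from fractions import Fraction

# ADD-OR rule: mutually exclusive outcomes on one die roll
p_even = Fraction(3, 6)
p_three = Fraction(1, 6)
p_even_or_three = p_even + p_three   # 2/3, about .667

# MULT-AND rule: independent draws with replacement
p_ace = Fraction(4, 52)
p_two_aces = p_ace * p_ace           # (4/52)^2 = 1/169, about .006

print(float(p_even_or_three), float(p_two_aces))
```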
Combining probabilities
When you have one outcome desired, and more than one way to achieve that outcome, it makes the outcome easier to achieve (higher probability).
When you have multiple outcomes that need to occur together, it makes the result more difficult to achieve than any one of the outcomes alone.
Applying probability
Vital statistics rates
We can combine these rules and properties to determine certain information of interest.
Mortality rate: the standard way to compare death rates across different circumstances.
Numerator: # of people who died during a given period of time
Denominator: # of people who were at risk of dying during the same period
The denominator may be difficult to calculate, so the number of people alive in the population halfway through the time period is sometimes used as an estimate.
Applying probability
Vital statistics rates
Morbidity rate: calculated like mortality, but conveys the rate of disease in relation to the population.
Prevalence: the number of individuals with the disease at one point in time divided by the population at risk at that time
Incidence: the number of new cases during a given time span divided by the population at risk (at the beginning of the interval)
Applying probability
Vital statistics rates
NOTE:
(Because prevalence does not involve a period of time, it is actually a proportion, but it is often mistakenly termed a rate.)
The term "incidence" is sometimes used erroneously when the term "prevalence" is meant. One way to distinguish between them is to look for units: an incidence rate should always be expressed in terms of a unit of time.
Applying probability
Vital statistics rates
Prevalence and incidence can be compared to cross-sectional and longitudinal studies.
Prevalence is like a snapshot (like a cross-sectional study); you may see cross-sectional studies referred to as prevalence studies.
Incidence needs a period of time to pass, like cohort studies (which begin at a given time and continue to examine outcomes over the specific span of the study).
Applying probability
Screening
Screening is used to distinguish those who are apparently well from those who have a decently high probability of having the disease or condition under study (with the goal of further testing and/or treatment).
Generally employed when:
The target disease is serious enough to warrant it
The test is proven and acceptable, and able to detect the disease early enough for intervention
There is treatment available
Two probabilities are used to measure the ability of a test to distinguish between those who have the disease and those who do not.
Compare the screening results to definitive diagnosis results.
Applying probability
Screening
Sensitivity: does the test return a positive result on those who actually have the disease? (Missing out on finding people who have it would reduce the sensitivity.)

Sensitivity = (# of people who tested (+) at screening / total # of people screened who have the disease) × 100

Specificity = (# of people without the disease who screened (−) / total # of people screened who are without the disease) × 100
Applying probability
Screening

Screening: Higher cutoff (27)
[Score distributions with the cutoff at 27: true negatives; some false negatives (misses) and some correct negatives]

Screening: Lower cutoff (22)
[Score distributions with the cutoff at 22: true negatives; some hits and some false positives]
Applying probability
Screening
[False positives with the low cutoff]
Which is worse: a false positive or a false negative? Where should we err in putting the cutoff?
It depends:
Is the disease rare? Then high sensitivity is valuable.
Is the disease silent for a while before symptoms?
Is the disease lethal?
Is there effective treatment available?
We will see a parallel between these questions and those relating to Type I/II errors and power (related to hypothesis/significance testing).
Applying probability
Predictive value (positive)
Predictive value of a test: the probability of disease given a positive result; the chance that a patient with a positive test has the disease.
This is a conditional probability in which the event of the disease being present is dependent (i.e., conditional) on having a positive test result:

P(D | T+) = [P(T+ | D) × P(D)] / [P(T+ | D) × P(D) + P(T+ | no D) × P(no D)]

The numerator uses the probability that a test is positive given that the disease is present (i.e., the sensitivity of the test).
Applying probability
Predictive value (negative)
Predictive value of a test: the probability of absence of disease given a negative result; the chance that a patient with a negative test is free of the disease.
This is a conditional probability in which the event of the disease being absent is dependent (i.e., conditional) on having a negative test result:

P(no D | T−) = [P(T− | no D) × P(no D)] / [P(T− | no D) × P(no D) + P(T− | D) × P(D)]
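Both predictive values are applications of Bayes' theorem to the sensitivity, specificity, and prevalence; a sketch (the numeric inputs are made-up values for illustration, not from the slides):

```python
def predictive_value_positive(sens, spec, prevalence):
    # P(D | T+) = P(T+|D)P(D) / [P(T+|D)P(D) + P(T+|no D)P(no D)]
    num = sens * prevalence
    return num / (num + (1 - spec) * (1 - prevalence))

def predictive_value_negative(sens, spec, prevalence):
    # P(no D | T-) = P(T-|no D)P(no D) / [P(T-|no D)P(no D) + P(T-|D)P(D)]
    num = spec * (1 - prevalence)
    return num / (num + (1 - sens) * prevalence)

# Illustrative inputs only
pvp = predictive_value_positive(sens=0.9, spec=0.9, prevalence=0.1)
pvn = predictive_value_negative(sens=0.9, spec=0.9, prevalence=0.1)
print(round(pvp, 3), round(pvn, 3))  # 0.5 0.988
```

Even with a quite accurate test (90% sensitivity and specificity), a low prevalence pulls the positive predictive value down, which is why prevalence matters when interpreting a positive screen.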
Applying probability
Screening exercise
A newly developed test produced positive results in 138 of 150 known diabetics and in 24 of 150 persons known not to have diabetes.
a.
b.
c.
d.
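Whatever parts (a) through (d) ask, the basic screening quantities for these counts follow from the definitions above; a sketch (a predictive value would additionally require a prevalence, which the problem as excerpted does not give):

```python
# 138 of 150 known diabetics tested positive;
# 24 of 150 persons known not to have diabetes tested positive
true_pos, diseased = 138, 150
false_pos, disease_free = 24, 150

sensitivity = 100 * true_pos / diseased                        # 138/150 -> 92.0%
specificity = 100 * (disease_free - false_pos) / disease_free  # 126/150 -> 84.0%
print(sensitivity, specificity)   # 92.0 84.0
```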