
VALIDITY AND RELIABILITY OF
SCREENING TESTS

SCREENING OR DIAGNOSTIC TESTS
Epidemiology is about (among other things)
determining the prevalence or incidence of
disease in populations.
Usually, a population is examined to decide
whether a condition is present or not.
A screening procedure is for the early
detection of a disease process.
Both the procedure and the examiner must be
valid and reliable.

VALIDITY & RELIABILITY:
Validity (accuracy) is about correctly
recognizing a condition as present or absent.
There must also be reliability (consistency).
Reliability is about producing the same
finding when the examination is done more
than once.

VALIDITY:
Simply the ability of a test to do what it
purports to do (to be ACCURATE),
- i.e. correctly categorize those that are +ve,
and correctly categorize those that are -ve.
Consider a diagnostic test (e.g. Dipstix)
with dichotomous results.

                         DISEASE STATUS / DIAGNOSIS (TRUTH)
                         POSITIVE    NEGATIVE    TOTAL
SCREENING   POSITIVE     a (TP)      b (FP)      a+b
TEST        NEGATIVE     c (FN)      d (TN)      c+d
            TOTAL        a+c         b+d         a+b+c+d

VALIDITY Cont.:
a = Those with disease detected by the test
(True positives - TP)
b = Those without disease that the test says
have it (False positives - FP)
c = Those with disease that the test says
don't have it (False negatives - FN)
d = Those without disease that the test says
don't have it (True negatives - TN)

Measuring (Quantifying) validity:
Sensitivity:
= Proportion of positives (those with disease)
that the test is able to detect,
i.e. a/(a+c) (probability that a +ve will be
called +ve).
(Able to give +ve findings when the person
has the disease.)

Measures of validity cont:
Specificity:
= Proportion of those without disease that
the test is able to detect,
i.e. d/(b+d) (probability of a -ve being
called -ve).
(Able to give -ve findings when the person
has no disease.)
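As a minimal sketch of these two formulas in code (the function names and cell counts below are illustrative, not from the slides):

```python
# Sensitivity and specificity from the 2x2 cell counts defined above
# (a = TP, b = FP, c = FN, d = TN). The counts are hypothetical.

def sensitivity(a: int, c: int) -> float:
    """a/(a+c): proportion of diseased persons the test calls +ve."""
    return a / (a + c)

def specificity(b: int, d: int) -> float:
    """d/(b+d): proportion of non-diseased persons the test calls -ve."""
    return d / (b + d)

a, b, c, d = 90, 160, 10, 640          # hypothetical screening results
print(f"Sensitivity = {sensitivity(a, c):.1%}")   # 90/100  = 90.0%
print(f"Specificity = {specificity(b, d):.1%}")   # 640/800 = 80.0%
```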

Measures of validity cont:
Accuracy is thus sensitivity and specificity.
As sensitivity ↑, false negatives (FN) ↓.
As specificity ↑, false positives (FP) ↓.

Measures of validity cont:
In setting the test cut-off point (favouring
sensitivity or specificity), one must
consider the consequences of missing a
positive or a negative.
↑ Sensitivity (and ↓ specificity) when the
disease is serious and treatment exists, or
when it is spreading at a high rate (HIV!).

Measures of validity cont:
It is desirable to have a high (100%)
sensitivity and specificity.
In real life it isn't so, especially with
continuous variables.
Lowering the criterion for +ve means more
people with disease will test +ve
(↑ sensitivity).
But people without disease will also be among
those testing positive (↓ specificity). (Thus
the test will be very sensitive but less
specific.)
When the criterion ↑, those without disease
will ↑ among the negatives (↑ specificity),
but those detected with disease will ↓.
Thus it will be more specific but less
sensitive. (See the sketch below.)
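A minimal sketch of this trade-off on a continuous measurement; the values and cut-offs below are invented purely for illustration:

```python
# Trade-off between sensitivity and specificity as the cut-off moves.
# Hypothetical continuous measurements; a reading at or above the
# cut-off is called positive.

diseased     = [82, 88, 91, 95, 99, 104, 110]   # truly diseased persons
non_diseased = [70, 74, 78, 81, 85, 89, 96]     # truly healthy persons

for cutoff in (80, 90, 100):
    tp = sum(x >= cutoff for x in diseased)
    fn = len(diseased) - tp
    tn = sum(x < cutoff for x in non_diseased)
    fp = len(non_diseased) - tn
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    print(f"cut-off {cutoff}: sensitivity {sens:.0%}, specificity {spec:.0%}")
# Lower cut-off -> higher sensitivity, lower specificity; and vice versa.
```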

Measuring (Quantifying) validity:
Predictive Values:
The accuracy of a test is alternatively
described as:
The extent to which being categorized as
positive or negative predicts the presence
or absence of the disease.
This is given as the positive or negative
predictive value.

Measures of validity cont:
Positive Predictive Value (PV+):
= (Predictive value of a positive test) is
the percentage of persons who are deemed
positive by the new test and confirmed so by
the standard.

Measures of validity cont:
Negative Predictive Value (PV-):
= (Predictive value of a negative test) is
the percentage of persons who are deemed
negative by the new test and confirmed so by
the standard.
(This is the proportion of people correctly
labeled as diseased or not diseased.)

                      GOLD STANDARD (DIAGNOSTIC) TEST
                      +           -           TOTAL
SCREENING   +         TP (a)      FP (b)      TP+FP
TEST        -         FN (c)      TN (d)      FN+TN
            TOTAL     TP+FN       FP+TN       TP+FP+FN+TN

Validity:
Sensitivity = TP / (TP + FN)
Specificity = TN / (FP + TN)
PV+ = TP / (TP + FP)
PV- = TN / (FN + TN)

Measures of validity cont:
PV+ = a/(a+b) (proportion of +ves by the test
who actually have the disease).
PV- = d/(c+d) (proportion of -ves by the test
who are actually without the disease).
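A minimal sketch of the two predictive values, reusing the same hypothetical cell counts as the earlier sensitivity/specificity sketch:

```python
# Predictive values from the same hypothetical 2x2 cell counts
# (a = TP, b = FP, c = FN, d = TN).

def ppv(a: int, b: int) -> float:
    """PV+ = a/(a+b): proportion of test +ves who truly have disease."""
    return a / (a + b)

def npv(c: int, d: int) -> float:
    """PV- = d/(c+d): proportion of test -ves truly without disease."""
    return d / (c + d)

a, b, c, d = 90, 160, 10, 640
print(f"PV+ = {ppv(a, b):.1%}")   # 90/250  = 36.0%
print(f"PV- = {npv(c, d):.1%}")   # 640/650 = 98.5%
```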

Measures of validity cont:
In a rare disease, PV- is high because most
of those tested will be -ve.
Predictive values depend not only on the
validity of the test (sensitivity,
specificity) but also on the prevalence of
the disease.

Measures of validity cont:
A test that is more specific makes a person
with a +ve test more likely to have the
disease, and thus the greater the PV+ (more
accurately spotting the +ves).
A test that is more sensitive makes a person
with a -ve test more likely to have no
disease, and thus the greater the PV-.
No matter how specific a test is, the
positives in a disease with low prevalence
are likely to be false positives.

PREDICTIVE VALUE & SPECIFICITY OF TEST
Specificity is one factor that affects the
predictive value of a test.
An increase in specificity results in a much
greater increase in PV+ than does the same
increase in sensitivity.

Exercises: for each scenario, set up the
screening test vs. diagnostic test 2x2 table
for a population of 1000 and calculate PV+.
1. Prev. = 50%, Sens. = 50%, Spec. = 50%, PV+ = ??
2. Prev. = 20%, Sens. = 50%, Spec. = 50%, PV+ = ??
3. Prev. = 20%, Sens. = 90%, Spec. = 50%, PV+ = ??
4. Prev. = 20%, Sens. = 50%, Spec. = 90%, PV+ = ??
(A sketch for working these is given below.)
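One way to work these exercises, sketched in code (the helper name ppv_from_rates is ours, not from the slides):

```python
# Working the four PV+ exercises above from prevalence, sensitivity
# and specificity in a population of 1000.

def ppv_from_rates(prev: float, sens: float, spec: float,
                   n: int = 1000) -> float:
    diseased = prev * n
    healthy = n - diseased
    tp = sens * diseased          # true positives among the diseased
    fp = (1 - spec) * healthy     # false positives among the healthy
    return tp / (tp + fp)

scenarios = [(0.50, 0.50, 0.50), (0.20, 0.50, 0.50),
             (0.20, 0.90, 0.50), (0.20, 0.50, 0.90)]
for prev, sens, spec in scenarios:
    print(f"Prev {prev:.0%}, Sens {sens:.0%}, Spec {spec:.0%}"
          f" -> PV+ = {ppv_from_rates(prev, sens, spec):.1%}")
# -> 50.0%, 20.0%, 31.0%, 55.6%
```

Note how scenarios 3 and 4 illustrate the point above: raising specificity from 50% to 90% lifts PV+ far more than raising sensitivity by the same amount.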

PREDICTIVE VALUE & SPECIFICITY OF TEST
Relationship between disease prevalence and
predictive value in a test with 95%
sensitivity and 95% specificity:
At 0 prevalence, the chance that a -ve test
has no disease is 100% (PV-) and the chance
that a +ve test has disease is 0% (PV+).
A rise in prevalence is accompanied by a rise
in PV+ and a decrease in PV-. At 40%
prevalence, PV+ rises towards its peak while
PV- declines lower.
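A short sketch tracing these curves, assuming the 95%/95% test described above:

```python
# PV+ and PV- across prevalences for a test with 95% sensitivity and
# 95% specificity, using per-person probabilities of each cell.

def predictive_values(prev: float, sens: float = 0.95,
                      spec: float = 0.95) -> tuple[float, float]:
    tp = sens * prev
    fp = (1 - spec) * (1 - prev)
    fn = (1 - sens) * prev
    tn = spec * (1 - prev)
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    npv = tn / (tn + fn) if (tn + fn) else 1.0
    return ppv, npv

for prev in (0.0, 0.01, 0.05, 0.20, 0.40):
    ppv, npv = predictive_values(prev)
    print(f"prevalence {prev:5.0%}: PV+ = {ppv:5.1%}, PV- = {npv:6.1%}")
# PV+ climbs steeply at low prevalences; PV- falls only slowly.
```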

PREDICTIVE VALUE & SPECIFICITY OF TEST
Most of the gain in PV+ with increasing
prevalence occurs at the lowest rates of
disease prevalence, i.e. 1%-5% prevalence is
associated with roughly 17%-51% positive
predictive value (see the worked example
below).
Exercise: Prev. = 20%; Pop. = 1000;
Sensitivity = 90%; Specificity = 80%.
Calculate PV+.
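A worked answer (not filled in on the slide): of 1000 people at 20% prevalence, 200 are diseased and 800 are not. TP = 0.90 x 200 = 180; FP = (1 - 0.80) x 800 = 160; so PV+ = 180/(180 + 160) = 180/340 ≈ 52.9%.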

RELATIONSHIP OF DISEASE PREV. TO PREDICTIVE VALUE
EXAMPLE: SENSITIVITY = 99%, SPECIFICITY = 95%

DISEASE   TEST      SICK   NOT    TOTALS   PREDICTIVE
PREV.     RESULTS          SICK            (+VE) VALUE
1%        +         __     __     __       ??
          -         __     __     __
          TOTALS    __     __     10,000
5%        +         __     __     __       ??
          -         __     __     __
          TOTALS    __     __     10,000
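Worked answers (the slide leaves the cells blank): at 1% prevalence, 100 of the 10,000 are sick, TP = 0.99 x 100 = 99, FP = 0.05 x 9,900 = 495, so PV+ = 99/594 ≈ 16.7%. At 5% prevalence, 500 are sick, TP = 495, FP = 0.05 x 9,500 = 475, so PV+ = 495/970 ≈ 51.0%.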

RELATION BETWEEN SPECIFICITY AND PREDICTIVE VALUE
EXAMPLE: PREVALENCE = 10%, SENSITIVITY = 100%

SPECIFICITY  TEST      SICK   NOT    TOTALS   PREDICTIVE
             RESULTS          SICK            VALUE (+VE)
70%          +         __     __     __       ??
             -         __     __     __
             TOTALS    __     __     10,000
95%          +         __     __     __       ??
             -         __     __     __
             TOTALS    __     __     10,000
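Worked answers (the slide leaves the cells blank): with 10% prevalence, 1,000 of the 10,000 are sick and TP = 1,000 (100% sensitivity). At 70% specificity, FP = 0.30 x 9,000 = 2,700, so PV+ = 1,000/3,700 ≈ 27.0%. At 95% specificity, FP = 0.05 x 9,000 = 450, so PV+ = 1,000/1,450 ≈ 69.0%.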

Validity Cont:
Why worry about disease prevalence?
The higher the prevalence, the higher the
positive predictive value.
A screening test is more efficient if
targeted at a high-risk population.
Screening low-prevalence populations can be
wasteful, yielding few detected cases for the
large effort applied.

SUMMARY Cont:
Sensitivity:
- Is calculated from the test results of
diseased persons.
- It is totally independent of the test
results of the non-diseased.

SUMMARY Cont.:
Specificity:
- Is calculated from the test results of
non-diseased persons.
- It is totally independent of the test
results of the diseased.
Predictive values rely on the results of both
the diseased and the non-diseased. A high
predictive value is always preferred.

SUMMARY Cont:
Altering the cut-off point of a diagnostic
test may affect sensitivity and specificity,
e.g. in BP for hypertension.
High BP is defined as diastolic 90 mmHg or
more, but some hypertensives fall between
80 mmHg and 90 mmHg.
If the cut-off is reduced to 80 mmHg, i.e.
all with 80 mmHg or more are labeled
hypertensive, then:

SUMMARY Cont.:
All with hypertension (true +ves) will be
detected (↑ sensitivity).
But those without will also test positive
(false +ves), which ↓ specificity.
So the test will be very sensitive but not
specific.
When we ↑ the cut-off point to 100 mmHg
diastolic, those without hypertension will
all be detected as true negatives
(↑ specificity).
But those with disease will ↓ (↓ in true
positives), which is a ↓ in sensitivity.
So the test will be very specific but not
sensitive.

SUMMARY Cont:
In setting sensitivity or specificity levels,
one must consider the consequences of:
Missing actual cases (positives, e.g. Ca.
cervix).
Missing actual negatives (HIV).
↑ Sensitivity when the disease is serious and
treatment exists, or when it is spreading at
a high rate and is serious.
↑ Specificity (↑ PV+) when the treatment
procedure is cumbersome and expensive (e.g.
mastectomy).
But when early detection is important for
complete cure and treatment is invasive,
balance the two.

RELIABILITY (REPEATABILITY,
PRECISION, REPRODUCIBILITY)
A test gives consistent results when repeated
on the same person under the same conditions.
Four sources of variability can affect the
reproducibility of a screening test:
Inherent biological variability in the person
being tested, e.g. BP varies in individuals
under differing circumstances.
Reliability of the instrument or test method
being used, e.g. with changes in temperature
or a tilted instrument.
Intra-observer variability.

Reliability Cont.:
Inter-observer variability:
- Two observers.
- The extent to which observers agree or
disagree can be put in quantitative terms.

Calculating Overall (%) Agreement

X-RAYS:                     RADIOLOGIST (OBSERVER 2)
                       NORMAL   SUSPECT   DOUBTFUL   ABNORMAL
RADIOLOGIST  NORMAL    (A)
(OBSERVER 1) SUSPECT             (F)
             DOUBTFUL                      (L)
             ABNORMAL                                 (Q)

(The diagonal cells A, F, L and Q are the
readings on which the two observers agree.)

Overall (%) Agreement
Percent agreement
= (A + F + L + Q) / Total readings (total
x-rays read) x 100.
In general, most people who are tested have
negative results.
Considerable agreement is therefore found
between two observers on negative or normal
tests, i.e. when there is no disease it is
easier for both observers to agree.
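A minimal sketch of the overall-agreement calculation; the reading counts in the matrix are invented for illustration:

```python
# Overall percent agreement between two observers over a 4x4 grid of
# readings. Rows are observer 1's calls, columns observer 2's, in the
# order normal/suspect/doubtful/abnormal.

readings = [
    [60, 4, 2, 0],   # observer 1: normal
    [3, 10, 2, 1],   # observer 1: suspect
    [1, 2, 5, 2],    # observer 1: doubtful
    [0, 1, 2, 5],    # observer 1: abnormal
]

agreements = sum(readings[i][i] for i in range(4))  # diagonal: A+F+L+Q
total = sum(sum(row) for row in readings)
print(f"Overall agreement = {agreements / total:.1%}")  # 80/100 = 80.0%
```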

% Agreement:
When one calculates percent agreement on all
subjects (the whole population), it may be
high simply because of the high agreement
among negative tests.
(Those with obvious disease are few; doubtful
cases are more difficult, and also few.)

                  OBSERVER 2
                  +        -
OBSERVER 1   +    a        b
             -    c        d (can ignore)

% Agreement:
This high value of percent agreement, driven
by the -ve tests, tends to conceal
significant disagreements between the
observers in identifying subjects as
positive.
Hence a / (a + b + c)
addresses percent agreement only in regard to
identifying the sick.

Kappa Statistic (coefficient):
Agreement between two observers can occur
purely by chance, e.g. if there is no
standard or criterion for reading x-rays,
agreement in many cases is purely by chance.
The question we ask is:
To what extent do their readings agree beyond
what we would expect by chance alone?
Or:
To what extent does agreement between the two
observers exceed the level of agreement that
would result just from chance?

Kappa Statistic (coefficient):
The Kappa statistic is used to calculate this
extent.
Kappa:
Numerator: percent observed agreement minus
percent agreement expected by chance alone
(deals with the actual observations).

Kappa Statistic (coefficient):
Denominator: difference between full (100%)
agreement and percent agreement expected by
chance alone.
Thus Kappa quantifies the extent to which
observed agreement exceeds what would be
expected by chance alone.

Kappa Statistic (coefficient):
To calculate Kappa, first calculate observed
agreement (table below).
A identifies 45 slides, i.e. 60% of the 75
total, as grade II.
B identifies 44, or 58.6% of all slides, as
grade II.
To calculate % agreement, the formula is:
(a + d) / (a + b + c + d) x 100%.
In this case % observed agreement is:
(41 + 27) / 75 x 100 = 90.7%.

                    PATHOLOGIST A GRADE
                    II          III         TOTAL
PATHOL. B   II      41 (a)      3 (b)       44 (58.6%)
GRADE       III     4 (c)       27 (d)      31 (41.4%)
            TOTAL   45 (60%)    30 (40%)    75

Kappa Statistic (coefficient):
If the 2 pathologists used entirely different
sets of criteria, how much agreement would be
expected solely on the basis of chance?
A read 60% of all 75 slides as grade II.

Kappa Statistic (coefficient):
If A applied criteria independent of those
used by B, then A would read as grade II 60%
of those that B called grade II, and 60% of
those that B called grade III would also be
grade II by A.
Thus, of the slides called grade II by B:
60/100 x 44 = 26.4 expected grade II by A.

                    GRADE A (EXPECTED BY CHANCE)
                    II          III         TOTAL
GRADE B     II      26.4 (a)    17.6 (b)    44 (58.6%)
            III     18.6 (c)    12.4 (d)    31 (41.4%)
            TOTAL   45 (60%)    30 (40%)    75

Kappa Statistic (coefficient):
60% of slides called grade III by B would be
grade II by A:
60/100 x 31 = 18.6.
Thus agreement expected by chance alone
= (26.4 + 12.4) / 75 x 100 = 51.7%.

Kappa Statistic (coefficient):
Kappa is calculated by the formula:
Kappa
= (% observed agreement - % agreement
expected by chance)
/ (100% - % agreement expected by chance)
= (90.7% - 51.7%) / (100% - 51.7%)
= 39% / 48.3%
= 0.81
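The same calculation sketched in code, using the pathologist counts from the tables above (the chance-expected agreement is computed from the marginal totals, which reproduces the 51.7% on the slide):

```python
# Cohen's kappa for the two-pathologist example. Rows = pathologist B,
# columns = pathologist A, categories grade II and grade III.

table = [[41, 3],   # B grade II:  A grade II, A grade III
         [4, 27]]   # B grade III: A grade II, A grade III

n = sum(sum(row) for row in table)                 # 75 slides
observed = sum(table[i][i] for i in range(2)) / n  # 68/75 = 0.907

# Chance agreement: sum over categories of the product of the two
# observers' marginal proportions.
row_totals = [sum(row) for row in table]           # [44, 31]
col_totals = [sum(col) for col in zip(*table)]     # [45, 30]
expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
# (44*45 + 31*30) / 75^2 = 2910/5625 = 0.517

kappa = (observed - expected) / (1 - expected)
print(f"kappa = {kappa:.2f}")  # (0.907 - 0.517)/(1 - 0.517) ≈ 0.81
```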

Kappa Statistic (coefficient):
It is suggested that a Kappa of:
0.75 and above is excellent agreement beyond
chance;
below 0.40 is poor agreement;
between 0.40 and 0.75 is intermediate
agreement.
