
VALIDITY AND RELIABILITY OF
SCREENING TESTS

SCREENING OR DIAGNOSTIC TESTS
Epidemiology is about (among other things)
determining the prevalence or incidence of
disease in populations.
Usually, a population is examined to decide
whether a condition is present or not.
A screening procedure is for the early
detection of a disease process.
Both the procedure and the examiner must be
valid and reliable.

VALIDITY & RELIABILITY:
Validity (accuracy) is about correctly
recognizing a condition as present or absent.
There must also be reliability (consistency).
Reliability is about producing the same
finding when the examination is done more
than once.

VALIDITY:
Simply the ability of a test to do what it
purports to do (to be ACCURATE),
- i.e. correctly categorize those that are +ve,
and correctly categorize those that are -ve.
Consider a diagnostic test (e.g. Dipstix)
with dichotomous results.

                         DISEASE STATUS / DIAGNOSIS (TRUTH)
                         POSITIVE    NEGATIVE    TOTAL
SCREENING   POSITIVE     a (TP)      b (FP)      a+b
TEST        NEGATIVE     c (FN)      d (TN)      c+d
            TOTAL        a+c         b+d         a+b+c+d

VALIDITY Cont.:
a = Those with disease detected by the test
(True positives - TP)
b = Those without disease that the test says
have it (False positives - FP)
c = Those with disease that the test says
don't have it (False negatives - FN)
d = Those without disease that the test says
don't have it (True negatives - TN)

Measuring (Quantifying) validity:
Sensitivity:
= Proportion of positives (those with disease)
that the test is able to detect,
i.e. a/(a+c) (probability that a +ve will be
called +ve).
(Able to give +ve findings when the person
has the disease.)

Measures of validity cont:
Specificity:
= Proportion of those without disease that
the test is able to detect,
i.e. d/(b+d) (probability of a -ve being
called -ve).
(Able to give -ve findings when the person
has no disease.)
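As a minimal sketch of these two formulas in code (the function names and cell counts below are illustrative, not from the slides):

```python
# Sensitivity and specificity from the 2x2 cell counts defined above
# (a = TP, b = FP, c = FN, d = TN). The counts are hypothetical.

def sensitivity(a: int, c: int) -> float:
    """a/(a+c): proportion of diseased persons the test calls +ve."""
    return a / (a + c)

def specificity(b: int, d: int) -> float:
    """d/(b+d): proportion of non-diseased persons the test calls -ve."""
    return d / (b + d)

a, b, c, d = 90, 160, 10, 640          # hypothetical screening results
print(f"Sensitivity = {sensitivity(a, c):.1%}")   # 90/100  = 90.0%
print(f"Specificity = {specificity(b, d):.1%}")   # 640/800 = 80.0%
```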

Measures of validity cont:
Accuracy is thus sensitivity and specificity.
As sensitivity ↑, false negatives (FN) ↓.
As specificity ↑, false positives (FP) ↓.

Measures of validity cont:
In setting the test cut-off point (favouring
sensitivity or specificity), one must
consider the consequences of missing a
positive or a negative.
↑ Sensitivity (and ↓ specificity) when the
disease is serious and treatment exists, or
when it is spreading at a high rate (HIV!).

Measures of validity cont:
It is desirable to have a high (100%)
sensitivity and specificity.
In real life it isn't so, especially with
continuous variables.
Lowering the criterion for +ve means more
people with disease will test +ve
(↑ sensitivity).
But people without disease will also be among
those testing positive (↓ specificity). (Thus
the test will be very sensitive but less
specific.)
When the criterion ↑, those without disease
will ↑ among the negatives (↑ specificity),
but those detected with disease will ↓.
Thus it will be more specific but less
sensitive. (See the sketch below.)
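A minimal sketch of this trade-off on a continuous measurement; the values and cut-offs below are invented purely for illustration:

```python
# Trade-off between sensitivity and specificity as the cut-off moves.
# Hypothetical continuous measurements; a reading at or above the
# cut-off is called positive.

diseased     = [82, 88, 91, 95, 99, 104, 110]   # truly diseased persons
non_diseased = [70, 74, 78, 81, 85, 89, 96]     # truly healthy persons

for cutoff in (80, 90, 100):
    tp = sum(x >= cutoff for x in diseased)
    fn = len(diseased) - tp
    tn = sum(x < cutoff for x in non_diseased)
    fp = len(non_diseased) - tn
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    print(f"cut-off {cutoff}: sensitivity {sens:.0%}, specificity {spec:.0%}")
# Lower cut-off -> higher sensitivity, lower specificity; and vice versa.
```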

Measuring (Quantifying) validity:
Predictive Values:
The accuracy of a test is alternatively
described as:
The extent to which being categorized as
positive or negative predicts the presence
or absence of the disease.
This is given as the positive or negative
predictive value.

Measures of validity cont:
Positive Predictive Value (PV+):
= (Predictive value of a positive test) is
the percentage of persons who are deemed
positive by the new test and confirmed so by
the standard.

Measures of validity cont:
Negative Predictive Value (PV-):
= (Predictive value of a negative test) is
the percentage of persons who are deemed
negative by the new test and confirmed so by
the standard.
(This is the proportion of people correctly
labeled as diseased or not diseased.)

                      GOLD STANDARD (DIAGNOSTIC) TEST
                      +           -           TOTAL
SCREENING   +         TP (a)      FP (b)      TP+FP
TEST        -         FN (c)      TN (d)      FN+TN
            TOTAL     TP+FN       FP+TN       TP+FP+FN+TN

Validity:
Sensitivity = TP / (TP + FN)
Specificity = TN / (FP + TN)
PV+ = TP / (TP + FP)
PV- = TN / (FN + TN)

Measures of validity cont:
PV+ = a/(a+b) (proportion of +ves by the test
who actually have the disease).
PV- = d/(c+d) (proportion of -ves by the test
who are actually without the disease).
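A minimal sketch of the two predictive values, reusing the same hypothetical cell counts as the earlier sensitivity/specificity sketch:

```python
# Predictive values from the same hypothetical 2x2 cell counts
# (a = TP, b = FP, c = FN, d = TN).

def ppv(a: int, b: int) -> float:
    """PV+ = a/(a+b): proportion of test +ves who truly have disease."""
    return a / (a + b)

def npv(c: int, d: int) -> float:
    """PV- = d/(c+d): proportion of test -ves truly without disease."""
    return d / (c + d)

a, b, c, d = 90, 160, 10, 640
print(f"PV+ = {ppv(a, b):.1%}")   # 90/250  = 36.0%
print(f"PV- = {npv(c, d):.1%}")   # 640/650 = 98.5%
```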

Measures of validity cont:
In a rare disease, PV- is high because most
of those tested will be -ve.
Predictive values depend not only on the
validity of the test (sensitivity,
specificity) but also on the prevalence of
the disease.

Measures of validity cont:
A test that is more specific makes a person
with a +ve test more likely to have the
disease, and thus the greater the PV+ (more
accurately spotting the +ves).
A test that is more sensitive makes a person
with a -ve test more likely to have no
disease, and thus the greater the PV-.
No matter how specific a test is, the
positives in a disease with low prevalence
are likely to be false positives.

PREDICTIVE VALUE & SPECIFICITY OF TEST
Specificity is one factor that affects the
predictive value of a test.
An increase in specificity results in a much
greater increase in PV+ than does the same
increase in sensitivity.

Exercises: for each scenario, set up the
screening test vs. diagnostic test 2x2 table
for a population of 1000 and calculate PV+.
1. Prev. = 50%, Sens. = 50%, Spec. = 50%, PV+ = ??
2. Prev. = 20%, Sens. = 50%, Spec. = 50%, PV+ = ??
3. Prev. = 20%, Sens. = 90%, Spec. = 50%, PV+ = ??
4. Prev. = 20%, Sens. = 50%, Spec. = 90%, PV+ = ??
(A sketch for working these is given below.)
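One way to work these exercises, sketched in code (the helper name ppv_from_rates is ours, not from the slides):

```python
# Working the four PV+ exercises above from prevalence, sensitivity
# and specificity in a population of 1000.

def ppv_from_rates(prev: float, sens: float, spec: float,
                   n: int = 1000) -> float:
    diseased = prev * n
    healthy = n - diseased
    tp = sens * diseased          # true positives among the diseased
    fp = (1 - spec) * healthy     # false positives among the healthy
    return tp / (tp + fp)

scenarios = [(0.50, 0.50, 0.50), (0.20, 0.50, 0.50),
             (0.20, 0.90, 0.50), (0.20, 0.50, 0.90)]
for prev, sens, spec in scenarios:
    print(f"Prev {prev:.0%}, Sens {sens:.0%}, Spec {spec:.0%}"
          f" -> PV+ = {ppv_from_rates(prev, sens, spec):.1%}")
# -> 50.0%, 20.0%, 31.0%, 55.6%
```

Note how scenarios 3 and 4 illustrate the point above: raising specificity from 50% to 90% lifts PV+ far more than raising sensitivity by the same amount.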

PREDICTIVE VALUE & SPECIFICITY OF TEST
Relationship between disease prevalence and
predictive value in a test with 95%
sensitivity and 95% specificity:
At 0 prevalence, the chance that a -ve test
has no disease is 100% (PV-) and the chance
that a +ve test has disease is 0% (PV+).
A rise in prevalence is accompanied by a rise
in PV+ and a decrease in PV-. At 40%
prevalence, PV+ rises towards its peak while
PV- declines lower.
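A short sketch tracing these curves, assuming the 95%/95% test described above:

```python
# PV+ and PV- across prevalences for a test with 95% sensitivity and
# 95% specificity, using per-person probabilities of each cell.

def predictive_values(prev: float, sens: float = 0.95,
                      spec: float = 0.95) -> tuple[float, float]:
    tp = sens * prev
    fp = (1 - spec) * (1 - prev)
    fn = (1 - sens) * prev
    tn = spec * (1 - prev)
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    npv = tn / (tn + fn) if (tn + fn) else 1.0
    return ppv, npv

for prev in (0.0, 0.01, 0.05, 0.20, 0.40):
    ppv, npv = predictive_values(prev)
    print(f"prevalence {prev:5.0%}: PV+ = {ppv:5.1%}, PV- = {npv:6.1%}")
# PV+ climbs steeply at low prevalences; PV- falls only slowly.
```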

PREDICTIVE VALUE & SPECIFICITY OF TEST
Most of the gain in PV+ with increasing
prevalence occurs at the lowest rates of
disease prevalence, i.e. 1%-5% prevalence is
associated with roughly 17%-51% positive
predictive value (see the worked example
below).
Exercise: Prev. = 20%; Pop. = 1000;
Sensitivity = 90%; Specificity = 80%.
Calculate PV+.
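A worked answer (not filled in on the slide): of 1000 people at 20% prevalence, 200 are diseased and 800 are not. TP = 0.90 x 200 = 180; FP = (1 - 0.80) x 800 = 160; so PV+ = 180/(180 + 160) = 180/340 ≈ 52.9%.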

RELATIONSHIP OF DISEASE PREV. TO PREDICTIVE VALUE
EXAMPLE: SENSITIVITY = 99%, SPECIFICITY = 95%

DISEASE   TEST      SICK   NOT    TOTALS   PREDICTIVE
PREV.     RESULTS          SICK            (+VE) VALUE
1%        +         __     __     __       ??
          -         __     __     __
          TOTALS    __     __     10,000
5%        +         __     __     __       ??
          -         __     __     __
          TOTALS    __     __     10,000
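Worked answers (the slide leaves the cells blank): at 1% prevalence, 100 of the 10,000 are sick, TP = 0.99 x 100 = 99, FP = 0.05 x 9,900 = 495, so PV+ = 99/594 ≈ 16.7%. At 5% prevalence, 500 are sick, TP = 495, FP = 0.05 x 9,500 = 475, so PV+ = 495/970 ≈ 51.0%.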

RELATION BETWEEN SPECIFICITY AND PREDICTIVE VALUE
EXAMPLE: PREVALENCE = 10%, SENSITIVITY = 100%

SPECIFICITY  TEST      SICK   NOT    TOTALS   PREDICTIVE
             RESULTS          SICK            VALUE (+VE)
70%          +         __     __     __       ??
             -         __     __     __
             TOTALS    __     __     10,000
95%          +         __     __     __       ??
             -         __     __     __
             TOTALS    __     __     10,000
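Worked answers (the slide leaves the cells blank): with 10% prevalence, 1,000 of the 10,000 are sick and TP = 1,000 (100% sensitivity). At 70% specificity, FP = 0.30 x 9,000 = 2,700, so PV+ = 1,000/3,700 ≈ 27.0%. At 95% specificity, FP = 0.05 x 9,000 = 450, so PV+ = 1,000/1,450 ≈ 69.0%.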

Validity Cont:
Why worry about disease prevalence?
The higher the prevalence, the higher the
positive predictive value.
A screening test is more efficient if
targeted at a high-risk population.
Screening low-prevalence populations can be
wasteful, yielding few detected cases for the
large effort applied.

SUMMARY Cont:
Sensitivity:
- Is calculated from the test results of
diseased persons.
- It is totally independent of the test
results of the non-diseased.

SUMMARY Cont.:
Specificity:
- Is calculated from the test results of
non-diseased persons.
- It is totally independent of the test
results of the diseased.
Predictive values rely on the results of both
the diseased and the non-diseased. A high
predictive value is always preferred.

SUMMARY Cont:
Altering the cut-off point of a diagnostic
test may affect sensitivity and specificity,
e.g. in BP for hypertension.
High BP is defined as diastolic 90 mmHg or
more, but some hypertensives fall between
80 mmHg and 90 mmHg.
If the cut-off is reduced to 80 mmHg, i.e.
all with 80 mmHg or more are labeled
hypertensive, then:

SUMMARY Cont.:
All with hypertension (true +ves) will be
detected (↑ sensitivity).
But those without will also test positive
(false +ves), which ↓ specificity.
So the test will be very sensitive but not
specific.
When we ↑ the cut-off point to 100 mmHg
diastolic, those without hypertension will
all be detected as true negatives
(↑ specificity).
But those with disease will ↓ (↓ in true
positives), which is a ↓ in sensitivity.
So the test will be very specific but not
sensitive.

SUMMARY Cont:
In setting sensitivity or specificity levels,
one must consider the consequences of:
Missing actual cases (positives, e.g. Ca.
cervix).
Missing actual negatives (HIV).
↑ Sensitivity when the disease is serious and
treatment exists, or when it is spreading at
a high rate and is serious.
↑ Specificity (↑ PV+) when the treatment
procedure is cumbersome and expensive (e.g.
mastectomy).
But when early detection is important for
complete cure and treatment is invasive,
balance the two.

RELIABILITY (REPEATABILITY,
PRECISION, REPRODUCIBILITY)
A test gives consistent results when repeated
on the same person under the same conditions.
Four sources of variability can affect the
reproducibility of a screening test:
Inherent biological variability in the person
being tested, e.g. BP varies in individuals
under differing circumstances.
Reliability of the instrument or test method
being used, e.g. with changes in temperature
or a tilted instrument.
Intra-observer variability.

Reliability Cont.:
Inter-observer variability:
- Two observers.
- The extent to which observers agree or
disagree can be put in quantitative terms.

Calculating Overall (%) Agreement

X-RAYS:                     RADIOLOGIST (OBSERVER 2)
                       NORMAL   SUSPECT   DOUBTFUL   ABNORMAL
RADIOLOGIST  NORMAL    (A)
(OBSERVER 1) SUSPECT             (F)
             DOUBTFUL                      (L)
             ABNORMAL                                 (Q)

(The diagonal cells A, F, L and Q are the
readings on which the two observers agree.)

Overall (%) Agreement
Percent agreement
= (A + F + L + Q) / Total readings (total
x-rays read) x 100.
In general, most people who are tested have
negative results.
Considerable agreement is therefore found
between two observers on negative or normal
tests, i.e. when there is no disease it is
easier for both observers to agree.
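A minimal sketch of the overall-agreement calculation; the reading counts in the matrix are invented for illustration:

```python
# Overall percent agreement between two observers over a 4x4 grid of
# readings. Rows are observer 1's calls, columns observer 2's, in the
# order normal/suspect/doubtful/abnormal.

readings = [
    [60, 4, 2, 0],   # observer 1: normal
    [3, 10, 2, 1],   # observer 1: suspect
    [1, 2, 5, 2],    # observer 1: doubtful
    [0, 1, 2, 5],    # observer 1: abnormal
]

agreements = sum(readings[i][i] for i in range(4))  # diagonal: A+F+L+Q
total = sum(sum(row) for row in readings)
print(f"Overall agreement = {agreements / total:.1%}")  # 80/100 = 80.0%
```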

% Agreement:
When one calculates percent agreement on all
subjects (the whole population), it may be
high simply because of the high agreement
among negative tests.
(Those with obvious disease are few; doubtful
cases are more difficult, and also few.)

                  OBSERVER 2
                  +        -
OBSERVER 1   +    a        b
             -    c        d (can ignore)

% Agreement:
This high value of percent agreement, driven
by the -ve tests, tends to conceal
significant disagreements between the
observers in identifying subjects as
positive.
Hence a / (a + b + c)
addresses percent agreement only in regard to
identifying the sick.

Kappa Statistic (coefficient):
Agreement between two observers can occur
purely by chance, e.g. if there is no
standard or criterion for reading x-rays,
agreement in many cases is purely by chance.
The question we ask is:
To what extent do their readings agree beyond
what we would expect by chance alone?
Or:
To what extent does agreement between the two
observers exceed the level of agreement that
would result just from chance?

Kappa Statistic (coefficient):
The Kappa statistic is used to calculate this
extent.
Kappa:
Numerator: percent observed agreement minus
percent agreement expected by chance alone
(deals with the actual observations).

Kappa Statistic (coefficient):
Denominator: difference between full (100%)
agreement and percent agreement expected by
chance alone.
Thus Kappa quantifies the extent to which
observed agreement exceeds what would be
expected by chance alone.

Kappa Statistic (coefficient):
To calculate Kappa, first calculate observed
agreement (table below).
A identifies 45 slides, i.e. 60% of the 75
total, as grade II.
B identifies 44, or 58.6% of all slides, as
grade II.
To calculate % agreement, the formula is:
(a + d) / (a + b + c + d) x 100%.
In this case % observed agreement is:
(41 + 27) / 75 x 100 = 90.7%.

                    PATHOLOGIST A GRADE
                    II          III         TOTAL
PATHOL. B   II      41 (a)      3 (b)       44 (58.6%)
GRADE       III     4 (c)       27 (d)      31 (41.4%)
            TOTAL   45 (60%)    30 (40%)    75

Kappa Statistic (coefficient):
If the 2 pathologists used entirely different
sets of criteria, how much agreement would be
expected solely on the basis of chance?
A read 60% of all 75 slides as grade II.

Kappa Statistic (coefficient):
If A applied criteria independent of those
used by B, then A would read as grade II 60%
of those that B called grade II, and 60% of
those that B called grade III would also be
grade II by A.
Thus, of the slides called grade II by B:
60/100 x 44 = 26.4 expected grade II by A.

                    GRADE A (EXPECTED BY CHANCE)
                    II          III         TOTAL
GRADE B     II      26.4 (a)    17.6 (b)    44 (58.6%)
            III     18.6 (c)    12.4 (d)    31 (41.4%)
            TOTAL   45 (60%)    30 (40%)    75

Kappa Statistic (coefficient):
60% of slides called grade III by B would be
grade II by A:
60/100 x 31 = 18.6.
Thus agreement expected by chance alone
= (26.4 + 12.4) / 75 x 100 = 51.7%.

Kappa Statistic (coefficient):
Kappa is calculated by the formula:
Kappa
= (% observed agreement - % agreement
expected by chance)
/ (100% - % agreement expected by chance)
= (90.7% - 51.7%) / (100% - 51.7%)
= 39% / 48.3%
= 0.81
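The same calculation sketched in code, using the pathologist counts from the tables above (the chance-expected agreement is computed from the marginal totals, which reproduces the 51.7% on the slide):

```python
# Cohen's kappa for the two-pathologist example. Rows = pathologist B,
# columns = pathologist A, categories grade II and grade III.

table = [[41, 3],   # B grade II:  A grade II, A grade III
         [4, 27]]   # B grade III: A grade II, A grade III

n = sum(sum(row) for row in table)                 # 75 slides
observed = sum(table[i][i] for i in range(2)) / n  # 68/75 = 0.907

# Chance agreement: sum over categories of the product of the two
# observers' marginal proportions.
row_totals = [sum(row) for row in table]           # [44, 31]
col_totals = [sum(col) for col in zip(*table)]     # [45, 30]
expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
# (44*45 + 31*30) / 75^2 = 2910/5625 = 0.517

kappa = (observed - expected) / (1 - expected)
print(f"kappa = {kappa:.2f}")  # (0.907 - 0.517)/(1 - 0.517) ≈ 0.81
```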

Kappa Statistic (coefficient):
It is suggested that a Kappa of:
0.75 and above is excellent agreement beyond
chance;
below 0.40 is poor agreement;
between 0.40 and 0.75 is intermediate
agreement.
