Tutorial 9

Screening and diagnostic test

Overview
In this session we will learn about the value of a type of secondary prevention called
screening. Screening is a way of improving patient outcomes by detecting a disease at an
earlier, more treatable stage, or by avoiding recurrence of disease. To provide
effective curative or preventive health care, it is necessary to distinguish between
individuals who have a disease and those who do not.
For this purpose, several tests are used, such as physical examination; biochemical assay
of blood, urine and other body fluids; radiography; ultrasonography; cytology; and
histopathology. One question we need to answer is how good these tests are at separating
individuals with and without the disease in question. Unfortunately, several screening and
diagnostic tests are liable to error. In this session, you will learn about certain
statistical methods for assessing the quality of screening and diagnostic tests to help you
make informed decisions about their use and interpretation.

Learning objective
After working through this session, students are expected to be able to:
• Describe and calculate the measures of validity of a diagnostic test
• Explain the relationship between prevalence and predictive values
• List the World Health Organization guidelines for assessing appropriateness of
screening
• Describe and calculate the measure of reliability of a test.

Definition and purpose of screening


The aim of screening is to identify asymptomatic disease, or risk factors for disease, by
testing a population that has not yet developed clinical symptoms. Screening tests are
often not diagnostic and usually seek to identify small numbers of individuals at high risk
of a particular condition. Further tests are needed to confirm a diagnosis. Efficacious
screening rests on the premise that the detection of early disease, and subsequent
effective treatment, will beneficially alter the natural course of the disease and thus
improve patient outcomes.
Screening is usually considered as an example of secondary prevention although primary
prevention screening can be used to identify patients with an exposure to a risk factor,
instead of a disease. For example, screening individuals for high blood cholesterol levels
seeks to identify those at higher risk of coronary heart disease for targeted health
promotion or cholesterol lowering drug treatment. Screening is also used for other
purposes such as selection of people fit enough for a job or containment of infection (e.g.
screening new nurses or teachers for tuberculosis or food handlers for salmonella).
Screening is not universally beneficial and the course of certain diseases may not be
altered through early identification especially if, for example, there is no available and
effective treatment. Screening programs need to be properly evaluated before they are
implemented, using the methods already described in this book. The ethics of screening
must also be considered.

Reliability and validity of a screening test


An effective screening program will use a test that is able to differentiate between
individuals with a disease, or its precursor, and those without. This property of a test is
known as its validity. A screening test should also ideally be inexpensive, easy to
administer and impose minimal discomfort on those to whom it is administered. It also
needs to be reliable in that it measures a variable consistently and is free of random
error. A clinical test has yet to be developed that is able to determine with 100% accuracy
all those with and without a particular sign or symptom.
The sensitivity of a screening test is the proportion of individuals with the disease (as
confirmed by a subsequent diagnostic test) whom the test correctly identifies as positive,
the ‘true positives’. If sensitivity is low, a number of cases have been missed. These are
termed the ‘false negatives’. The specificity of a test is the proportion of individuals
without the disease whom the test correctly identifies as negative, the ‘true negatives’.
If specificity is low, many disease-free individuals test positive. These ‘false positives’
can be costly for both the service provider and the patient.

We cannot expect sensitivity and specificity values to be equally high for a given test, and
the importance of each measure will depend on the disease in question. In the case of a
communicable disease, for example, sensitivity may be considered more important, as a
false positive case may have less of a public health impact than a false negative, which
could result in continued transmission of the disease. Estimation of sensitivity and
specificity will depend on the definition that is used for a true positive. This may be
relatively easy when the test is for a dichotomous variable where a disease is considered
to be either present or absent. For a continuous variable, such as blood pressure, the
definition of a positive case needs to be determined and be evidence-based; this may be
by carrying out a further ‘gold standard’ diagnostic test, or by following up participants to
see who develops clinical manifestations of disease.

Predictive values
Another important measure for a screening test is the predictive value. The positive
predictive value of mammography, for example, tells a woman how likely it is that she
has breast cancer after a positive mammogram. The negative predictive value tells a
woman the probability that she truly does not have breast cancer if the mammogram is
negative. Predictive values measure the probability that the individual actually has (or
does not have) the disease, given the result of the screening test, and are determined by
the validity of the test (specificity and sensitivity) and the characteristics of the
population being tested (particularly the prevalence of preclinical disease). The more
sensitive a test, the less likely it is that an individual with a negative result will have
the disease, so the greater the negative predictive value. The more specific a test, the
less likely an individual with a positive test will be free from disease and the greater
the positive predictive value.
However, if the disease is rare, and the population is at a low risk of disease, the positive
results are likely to be mostly false positives. Table 1 summarizes the relationship
between the results of a screening test and the actual presence of disease as determined
by the result of a subsequent confirmatory diagnostic test (the ‘gold standard’).
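
The effect of prevalence on predictive values can be sketched numerically. The snippet below is only an illustration and is not part of the original tutorial: the function names and the example values (sensitivity 90%, specificity 95%, prevalence 10% versus 1%) are assumptions chosen to show the pattern.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability of disease given a positive test (Bayes' theorem)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def negative_predictive_value(sensitivity, specificity, prevalence):
    """Probability of no disease given a negative test (Bayes' theorem)."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


if __name__ == "__main__":
    # Assumed, illustrative test characteristics (not taken from the tutorial).
    sensitivity, specificity = 0.90, 0.95
    for prevalence in (0.10, 0.01):
        ppv = positive_predictive_value(sensitivity, specificity, prevalence)
        npv = negative_predictive_value(sensitivity, specificity, prevalence)
        print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

With these assumed values the positive predictive value falls from about 67% to about 15% as prevalence drops from 10% to 1%, even though sensitivity and specificity are unchanged.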

In the table, a is the number of subjects who have the condition and are found positive
by the test (true positives), b the number of subjects who do not have the condition but
are found positive by the test (false positives), c the number of subjects who have the
condition but are found negative by the test (false negatives) and d the number of
subjects who do not have the condition and are found negative by the test (true
negatives).

Table 1. Measuring the effectiveness of a screening test
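
Using the cell labels a, b, c and d defined above, the usual measures can be written as follows. This is a sketch of the standard definitions, given here as an aid, and not a reproduction of the original table.

```latex
\begin{align*}
\text{Sensitivity} &= \frac{a}{a + c} \\
\text{Specificity} &= \frac{d}{b + d} \\
\text{Positive predictive value} &= \frac{a}{a + b} \\
\text{Negative predictive value} &= \frac{d}{c + d} \\
\text{False negative error rate} &= \frac{c}{a + c} = 1 - \text{Sensitivity}
\end{align*}
```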

Reliability Test
Reliability means that the results of a test or measure are identical or closely similar each
time it is conducted. When two tests (performed with similar equipment or tools) give
different results, there is variation between the first and second test. There are three
kinds of variation:
1. Intra subject variation
The values obtained in measuring many human characteristics often vary over
time, even during a short period. Variability over time is considerable. This, as
well as the conditions under which certain tests are conducted (e.g.,
postprandially or postexercise, at home or in a physician's office), clearly can lead
to different results in the same individual. Therefore, in evaluating any test result,
it is important to consider the conditions under which the test was performed,
including the time of day.
2. Intra observer variation
Sometimes variation occurs between two or more readings of the same test
results made by the same observer. For example, a radiologist who reads the
same group of X-rays at two different times may read one or more of the X-rays
differently the second time. Tests and examinations differ in the degree to which
subjective factors enter into the observer's conclusions, and the greater the
subjective element in the reading, the greater the intra observer variation in
readings is likely to be.
3. Inter observer variation
Another important consideration is variation between observers. Two examiners
often do not derive the same result. The extent to which observers agree or
disagree is an important issue, whether we are considering physical examinations,
laboratory tests, or other means of assessing human characteristics. We therefore
need to be able to express the extent of agreement in quantitative terms. We
measure this variation using the kappa statistic.

Kappa Method
The kappa statistic measures the agreement between two observers beyond what would be
expected by chance. Simple percent agreement is inflated by chance: even if two observers
used completely different criteria to identify subjects as positive or negative, we would
expect them to agree on some subjects solely as a function of chance.

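                 Observer A
Observer B       +         -         Total
+                a         b         a+b
-                c         d         c+d
Total            a+c       b+d       a+b+c+d

Kappa = (observed agreement - agreement expected by chance) / (1 - agreement expected by chance)

$$\kappa = \frac{P_o - P_e}{1 - P_e}, \qquad P_o = \frac{a + d}{a + b + c + d}, \qquad P_e = \frac{(a+b)(a+c) + (c+d)(b+d)}{(a+b+c+d)^2}$$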

Landis and Koch suggested that a kappa greater than 0.75 represents excellent
agreement beyond chance, a kappa below 0.40 represents poor agreement, and a kappa
of 0.40 to 0.75 represents intermediate to good agreement.
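
The calculation and the interpretation bands above can be sketched in code. This is a minimal illustration under stated assumptions: the function names `kappa` and `interpret` and the example counts are hypothetical, and exact fractions are used so that a value lying exactly on a band boundary is not misclassified by floating-point rounding.

```python
from fractions import Fraction


def kappa(a, b, c, d):
    """Kappa for a 2x2 agreement table between two observers.

    a = both observers positive, d = both observers negative,
    b and c = the two kinds of disagreement.
    """
    n = a + b + c + d
    if n == 0:
        raise ValueError("the table is empty")
    observed = Fraction(a + d, n)
    # Chance agreement: product of the marginal totals for each category.
    expected = Fraction((a + b) * (a + c) + (c + d) * (b + d), n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 100%")
    return (observed - expected) / (1 - expected)


def interpret(k):
    """Bands quoted in the text: >0.75 excellent, <0.40 poor, otherwise intermediate to good."""
    if k > Fraction(3, 4):
        return "excellent"
    if k < Fraction(2, 5):
        return "poor"
    return "intermediate to good"


if __name__ == "__main__":
    k = kappa(45, 5, 5, 45)  # hypothetical counts, for illustration only
    print(float(k), interpret(k))
```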

Criteria for screening


We have discussed some of the potential advantages and disadvantages of participation
in a screening programme both for the individual and for society. To ensure that the
potential for harm is minimized, programmes need to fulfil a number of criteria that
should be considered before implementation. The World Health Organization criteria for
assessing the appropriateness of screening, first published by Wilson and Jungner (1968),
are summarized in Table 2.

Table 2. Wilson and Jungner criteria for screening (1968)

Activity 1
In a hypothetical study, 1000 patients attending a hospital general outpatient department
were tested for diabetes using the following two tests:
• fasting blood sugar (FBS)
• glucose tolerance test (GTT)

There were 100 patients who had a positive GTT, and they were classified as true cases of
diabetes. There were also 140 patients with an FBS of at least 6 mmol/l (the cut-off point
to distinguish people with diabetes from those who do not have diabetes). Among these
140 patients, only 98 were true cases of diabetes (i.e. only 98 had a positive GTT as well).

1. What are the sensitivity, specificity, and positive and negative predictive values of
the FBS test in this study population?

Table 1. Diabetes status (GTT) against FBS test results (cut-off 6 mmol/l)

Test result (FBS)    Diabetes (GTT)            Total
                     Positive     Negative
Positive             98           42           140
Negative             2            858          860
Total                100          900          1000

FBS sensitivity = 98/100 x 100 = 98%
FBS specificity = 858/900 x 100 = 95%
FBS positive predictive value = 98/140 x 100 = 70%
FBS negative predictive value = 858/860 x 100 = 99.8%
2. When the cut-off point for the FBS was raised to 7 mmol/l, the sensitivity of the
test decreased to 95% and the specificity increased to 98% in the hypothetical
study population.
Calculate the positive predictive value and false negative error rate of FBS at this
cut-off point.
Table 2. Diabetes status (GTT) against FBS test results (cut-off 7 mmol/l)

Test result (FBS)    Diabetes (GTT)            Total
                     Positive     Negative
Positive             95           18           113
Negative             5            882          887
Total                100          900          1000

FBS positive predictive value = 95/113 x 100 = 84%
FBS false negative error rate = 5/100 x 100 = 5%
3. The FBS test and GTT were used in a hypothetical community survey to screen for
diabetes. Among 1000 people surveyed, 40 people had a positive GTT for
diabetes and were classified as true cases of diabetes. An FBS cut-off value of 6
mmol/l was used to distinguish between people with and without diabetes; you
can assume that at this cutoff point the FBS had a sensitivity of 98% and
specificity of 95%.
What are the positive predictive value and false negative error rate of FBS in this
survey?
Table 3. Diabetes status (GTT) against FBS test results (cut-off 6 mmol/l)

Test result (FBS)    Diabetes (GTT)            Total
                     Positive     Negative
Positive             39           48           87
Negative             1            912          913
Total                40           960          1000

FBS positive predictive value = 39/87 x 100 = 45%
FBS false negative error rate = 1/40 x 100 = 2.5%

4. Why is the positive predictive value different from that observed in the
hypothetical hospital-based study?
Although the sensitivity and specificity of FBS at this cut-off point are the same
in the hypothetical hospital-based study and in the community survey, the positive
predictive value of FBS is markedly lower in the community survey. This is
because the prevalence of diabetes is higher in the hospital population (10%)
than in the community (4%).
Assume that if the cut-off point of FBS is increased to 7.5 mmol/l, the sensitivity is
90% and the specificity is 99% for diagnosing diabetes.
5. What are the positive predictive value and the false negative error rate of FBS if
the cut-off point of 7.5 mmol/l is used to screen for diabetes in this community?
Table 4. Diabetes status (GTT) against FBS test results (cut-off 7.5 mmol/l)

Test result (FBS)    Diabetes (GTT)            Total
                     Positive     Negative
Positive             36           10           46
Negative             4            950          954
Total                40           960          1000

FBS positive predictive value = 36/46 x 100 = 78%
FBS false negative error rate = 4/40 x 100 = 10%
6. If you were asked to fix the cut-off point of FBS for a survey of your community
would you select 6 mmol/l or 7 mmol/l? Give reasons for your answer.
I recommend 6 mmol/l as the appropriate cut-off point for FBS because the
false negative error rate is lower (2.5%) at 6 mmol/l than at 7 mmol/l. A lower
false negative error rate is preferred for the following reasons:
a. The physical and psychological stress after a false positive test is minimal,
because further diagnostic tests are available to confirm or refute the
diagnosis of diabetes.
b. There is effective treatment for diabetes that can prevent the more severe
complications of untreated diabetes.
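
The tables in this activity can be rebuilt from the stated sensitivity, specificity and number of true cases. The sketch below is an illustration only: the function names are hypothetical, and rounding counts to whole people is an assumption made to match the way the worked tables are laid out.

```python
def expected_table(n_total, n_cases, sensitivity, specificity):
    """Rebuild the 2x2 counts (TP, FP, FN, TN) implied by sensitivity and specificity.

    Counts are rounded to whole people (an assumption, see the lead-in).
    """
    n_noncases = n_total - n_cases
    tp = round(sensitivity * n_cases)
    fn = n_cases - tp
    tn = round(specificity * n_noncases)
    fp = n_noncases - tn
    return tp, fp, fn, tn


def ppv_and_fn_rate(tp, fp, fn, tn):
    """Positive predictive value and false negative error rate from the counts."""
    ppv = tp / (tp + fp)
    fn_rate = fn / (tp + fn)
    return ppv, fn_rate


if __name__ == "__main__":
    # (total screened, true cases, sensitivity, specificity) as stated in the questions
    scenarios = {
        "hospital, cut-off 7 mmol/l": (1000, 100, 0.95, 0.98),
        "community, cut-off 6 mmol/l": (1000, 40, 0.98, 0.95),
        "community, cut-off 7.5 mmol/l": (1000, 40, 0.90, 0.99),
    }
    for label, args in scenarios.items():
        table = expected_table(*args)
        ppv, fn_rate = ppv_and_fn_rate(*table)
        print(f"{label}: TP, FP, FN, TN = {table}; "
              f"PPV = {ppv:.1%}; false negative rate = {fn_rate:.1%}")
```

For the community survey at 6 mmol/l this gives 39 true positives and 48 false positives, so the positive predictive value is 39/87, about 45%.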

Activity 2
A physical examination was used to screen for breast cancer in 2500 women with biopsy
proven adenocarcinoma of the breast and in 5000 age and race matched control women.
The results of the physical examination were positive (mass was palpated) in 1800 cases
and 800 control women.

Calculate the sensitivity, specificity and positive predictive value of the physical
examination.

Table 1. Physical examination results against biopsy-proven adenocarcinoma of the breast

Physical examination    Breast cancer             Total
                        Positive     Negative
Positive                1800         800          2600
Negative                700          4200         4900
Total                   2500         5000         7500

Sensitivity = 1800/2500 x 100 = 72%
Specificity = 4200/5000 x 100 = 84%
Positive predictive value = 1800/2600 x 100 = 69%
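
Working directly from the counts in the problem statement (2500 cases, 5000 controls, 1800 positive cases, 800 positive controls), the three measures can be checked with a short sketch. The function name `validity_measures` is hypothetical and introduced only for this illustration.

```python
def validity_measures(tp, fp, fn, tn):
    """Sensitivity, specificity and positive predictive value from 2x2 counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
    }


if __name__ == "__main__":
    cases, controls = 2500, 5000                  # from the problem statement
    positive_cases, positive_controls = 1800, 800  # mass palpated
    measures = validity_measures(
        tp=positive_cases,
        fp=positive_controls,
        fn=cases - positive_cases,        # cases with a negative examination
        tn=controls - positive_controls,  # controls with a negative examination
    )
    for name, value in measures.items():
        print(f"{name}: {value:.1%}")
```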

Activity 3
A study was carried out in Hospital X to investigate the reliability of a nutrition screening
tool. Two dietitians were asked to determine the risk of malnutrition among 100 pre-surgery
patients in Hospital X using the Nutritional Risk Screening 2002 (NRS 2002). Patients were
classified as at risk or not at risk. The comparison of their classifications is
shown in the following table:

Classification of malnutrition risk by NRS 2002

                Dietitian 2
Dietitian 1     Risk       Not risk    Total
Risk            40         20          60
Not risk        10         30          40
Total           50         50          100

1. The simple, overall percent agreement between the two dietitians out of the total is:
(a+d)/(a+b+c+d) = (40+30)/(40+20+10+30) = 70/100 = 0.7 (70%)
2. The overall percent agreement between the two dietitians, excluding the patients
that both dietitians classified as not at risk, is:
a/(a+b+c) = 40/(40+20+10) = 40/70 = 0.57 (57%)
3. The value of kappa is:
(0.7-0.5)/(1-0.5) = 0.2/0.5 = 0.4
where 0.5 is the agreement expected by chance: (60x50 + 40x50)/(100x100) = 0.5
4. Which kind of agreement does this kappa represent (excellent, intermediate to good, or
poor)? Intermediate to good
