
Revue d'Épidémiologie et de Santé Publique 58 (2010) 59–63

Methodological note

Using an analytical hierarchy process (AHP) for weighting items of a measurement scale: A pilot study
Use of the analytic hierarchy process for weighting the items of a measurement scale: a pilot study
C. Benaïm a,*,b,c, D.-A. Pérennou d, J.-Y. Pélissier e, J.-P. Daurès f

a Pôle rééducation-réadaptation, CHU de Dijon, 23, rue Gaffarel, 21000 Dijon, France
b Inserm U887, faculté des sciences du sport, université de Bourgogne, campus universitaire Montmuzard, 21078 Dijon, France
c CIC-P Inserm 803, 23, rue Gaffarel, 21000 Dijon, France
d UMR UJF CNRS 5525, laboratoire TIMC-IMAG, clinique MPR, institut de rééducation, hôpital Sud, CHU de Grenoble, BP 338, 38434 Échirolles cedex, France
e Service de rééducation neurologique, centre hélio-marin, CHU de Nîmes, 30240 Le Grau-du-Roi, France
f Laboratoire d'épidémiologie et de biostatistiques, IURC, 75, rue de la Cardonille, 34093 Montpellier, France

Received 28 December 2008; accepted 1 September 2009


Available online 21 January 2010

Abstract
Background. Many clinical scales contain items that are scored separately before being compiled into a single score. However, if the items differ in importance, they should be weighted differently before being compiled. The principal aims of this study were to show how the analytic hierarchy process (AHP), which to our knowledge has never been used for this purpose, can be applied to weighting the six items of the London handicap scale, and to compare AHP with conjoint analysis (CA), which was previously implemented by Harwood et al. (1994) [1].
Design. To assess the relative importance of the six items, we administered AHP and CA to a group of 10 physiatrists. We compared the methods in terms of item ranking by importance, assessment of fictitious patients based on the weights determined by each method, and difficulty as perceived by the physiatrists.
Results. With both techniques, physical independence (PHY) was the most heavily weighted item, but the other ranks varied depending on the technique. AHP outperformed CA in terms of accuracy (global assessment of the clinical status) and was perceived as less difficult.
Conclusion. AHP may be used to reveal the importance that experts assign to the items of a multidimensional scale and to calculate the appropriate weights for specific items. For this purpose, AHP seems to be more accurate than CA.
© 2010 Elsevier Masson SAS. All rights reserved.
Keywords: Outcome assessment (health care); Validation studies; Scoring methods; Decision theory

Résumé
Background. The items of a multidimensional clinical scale are usually scored separately and then simply added together to form the total score. However, before this addition, they should in theory be weighted according to their respective importance. The objectives of the present work were: (1) to show that the analytic hierarchy process (AHP), never before used in this setting, makes it possible to assign weights to the six items of the London Handicap Scale according to their importance, and (2) to compare it with conjoint analysis (CA), previously used by Harwood et al. (1994) [1].
Method. To assess the relative importance of the items, ten specialists in physical medicine and rehabilitation completed both methods, AHP and CA. We compared the two techniques with regard to the ranking of items by order of importance, the assessment of the status of fictitious patients using the calculated weights, and the difficulty of implementing each technique as perceived by the physicians.
Results. With both methods, the item physical independence received the highest weight, but the other items were weighted differently depending on the technique used. AHP was superior to CA in terms of accuracy for the global assessment of the patient and of ease of implementation.
Conclusion. AHP can be used to assess the importance that experts assign to the dimensions of a multidimensional scale and to calculate item weights. In this setting, AHP appears superior to CA.
Keywords: Evaluation; Validation study; Scoring method; Decision theory

* Corresponding author.
E-mail address: charles.benaim@chu-dijon.fr (C. Benaïm).
0398-7620/$ – see front matter © 2010 Elsevier Masson SAS. All rights reserved.
doi:10.1016/j.respe.2009.09.004

1. Introduction
Health measurement is a fundamental issue in medicine, and clinicians increasingly use measurement scales to assess patients' health. In the PubMed database, we identified a considerable increase in articles devoted to scale validation (keywords "scale" and "validation"): 34 articles in 1988, 152 in 1998, and 1288 in 2008. Most measurement scales comprise several ordinal items, which are summed to produce a single score. Consequently, all items have the same mathematical weight. However, when one item is more important than another, its weight should be increased accordingly. For instance, consider a two-dimensional scale made up of two items scored 0 to 5: mobility (MOB) and physical independence (PHY). If MOB is twice as important as PHY, then the handicap score should be $H = 2 \times \mathrm{MOB} + \mathrm{PHY}$ instead of $H = \mathrm{MOB} + \mathrm{PHY}$, or MOB should be changed into an item scored 0 to 10.
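The following minimal Python sketch shows how weighting changes the two-item example above. It assumes, for illustration only, that higher scores mean more handicap. The function `handicap_score` and the two patient profiles are hypothetical.

```python
def handicap_score(mob, phy, w_mob=1.0, w_phy=1.0):
    """Weighted handicap score H = w_mob * MOB + w_phy * PHY (unweighted by default)."""
    return w_mob * mob + w_phy * phy


# Hypothetical patients (MOB, PHY), each item scored 0 to 5,
# assuming that higher scores mean more handicap.
patient_a = (4, 1)  # mainly a mobility problem
patient_b = (1, 4)  # mainly a physical-independence problem

# Unweighted sum: both patients receive the same score.
print(handicap_score(*patient_a), handicap_score(*patient_b))  # 5.0 5.0

# MOB counted twice as much as PHY: patient A now scores higher.
print(handicap_score(*patient_a, w_mob=2.0), handicap_score(*patient_b, w_mob=2.0))  # 9.0 6.0
```

Doubling the weight of MOB gives the same result as rescoring MOB from 0 to 10, because $2 \times \mathrm{MOB}$ ranges from 0 to 10.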
To date, only a few authors have questioned how item weights should be determined in a multi-item scale, and they have drawn their solutions from the non-medical literature. Harwood et al. used conjoint analysis (CA) to develop a 6-item handicap measure [1]. They built a matrix of scale weights (or "part utilities") for the different levels of each item. A patient's overall handicap score was the sum of the scale weights corresponding to his or her clinical status.
Many other techniques, known as criteria-weighting techniques, have been developed to reveal the importance that experts give to criteria. The analytic hierarchy process (AHP) is one of the most widely recognized criteria-weighting techniques in the field of multiple-criteria decision-making [2]. It is based on a matrix of pairwise comparisons between the criteria that reflects the decision maker's preferences, and the criteria weights are derived from this matrix. AHP has been used in a variety of contexts, ranging from strategic planning to the resolution of international conflicts [3]. It has also been used successfully in medical decision making [4–6], in involving patients in decisions about their care [7], in hospital management [8] and in the selection of medical staff [9]. To our knowledge, however, it has never been used to calculate the weights of clinical scale items.
The principal aims of the present study were:
• to show how AHP, which to our knowledge has never been used for this purpose, can be used to weight the items of a multi-item scale;
• to compare AHP with CA.
2. Methods
We chose the London handicap scale as an example of a multi-item scale. It is made up of six ordinal items: MOB, PHY, occupation (OCC), social integration (SOC), orientation (ORI) and economic self-sufficiency (ECO) [10]. This scale was validated before the World Health Organization approved the new International Classification of Functioning, Disability and Health (2002). Under that classification, the six items of this scale should now be classified as function items (ORI), participation items (OCC, SOC, ECO) and activity items (MOB, PHY).
AHP and CA were administered to a group of 10 physiatrists experienced in the care of patients with disabilities. They were interviewed as described below, and the two techniques were administered in random order.
2.1. Analytic hierarchy process
Each physiatrist was interviewed in order to complete a comparison matrix. For each pair of items, he or she indicated which item was the more important and to what extent, using a 1–9 ordinal scale. For example, if the preference of MOB over PHY was absolute, then the entry (MOB, PHY) of the matrix was set at 9 and the entry (PHY, MOB) at 1/9. If MOB and OCC were of equal importance, then (MOB, OCC) and (OCC, MOB) were both set at 1, and so on (Table 1). The preference for one criterion over another can be absolute (9), very strong (7), strong (5), moderate (3) or equal (1). Intermediate values (2, 4, 6, 8) are also possible. Once the comparison matrix is obtained, its principal eigenvector (the eigenvector associated with the largest eigenvalue) is calculated using mathematical software. The components of this eigenvector are the estimated criteria weights (see technical details in [2]). The global score of a patient is then the mean of his or her weighted item scores.
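The AHP step just described can be sketched in plain Python and applied to the example matrix in Table 1. This is a minimal sketch, not the authors' MATLAB code [13]. It assumes that the principal eigenvector is obtained by power iteration and normalized to sum to 1. For a matrix with strictly positive entries, power iteration converges to the principal eigenvector (Perron–Frobenius). The names `TABLE1_UPPER`, `reciprocal_matrix` and `principal_eigenvector` are hypothetical.

```python
# Item order used throughout the paper.
ITEMS = ["MOB", "PHY", "OCC", "SOC", "ORI", "ECO"]

# Upper-triangle judgments from Table 1: (row, col) -> preference of row over col.
TABLE1_UPPER = {
    ("MOB", "PHY"): 2, ("MOB", "OCC"): 5, ("MOB", "SOC"): 7,
    ("MOB", "ORI"): 1, ("MOB", "ECO"): 9,
    ("PHY", "OCC"): 3, ("PHY", "SOC"): 6, ("PHY", "ORI"): 1 / 3, ("PHY", "ECO"): 8,
    ("OCC", "SOC"): 3, ("OCC", "ORI"): 1 / 7, ("OCC", "ECO"): 5,
    ("SOC", "ORI"): 1 / 7, ("SOC", "ECO"): 3,
    ("ORI", "ECO"): 9,
}


def reciprocal_matrix(items, upper):
    """Build the full AHP matrix: a[i][i] = 1 and a[j][i] = 1 / a[i][j]."""
    n = len(items)
    a = [[1.0] * n for _ in range(n)]
    for (row, col), value in upper.items():
        i, j = items.index(row), items.index(col)
        a[i][j] = float(value)
        a[j][i] = 1.0 / float(value)
    return a


def principal_eigenvector(a, tol=1e-12, max_iter=1000):
    """Power iteration; returns the dominant eigenvector rescaled to sum to 1."""
    n = len(a)
    w = [1.0 / n] * n
    for _ in range(max_iter):
        aw = [sum(a[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(aw)
        new_w = [x / total for x in aw]
        if max(abs(x - y) for x, y in zip(new_w, w)) < tol:
            return new_w
        w = new_w
    return w


A = reciprocal_matrix(ITEMS, TABLE1_UPPER)
weights = principal_eigenvector(A)
for item, weight in sorted(zip(ITEMS, weights), key=lambda p: p[1], reverse=True):
    print(f"{item}: {weight:.3f}")
```

For this single interview, ORI ranks first and ECO last, because the ORI row of Table 1 dominates every other row and the ECO row is dominated by all of them. This is one physiatrist's matrix, not the group result reported in Section 3.1. The sketch normalizes the weights to sum to 1 rather than applying the 0–100 rescaling used there.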
2.2. Conjoint analysis
With six 5-level items, $5^6$ = 15,625 profiles can be described, and direct measurements (utilities) of all of them obviously cannot be taken. However, CA requires direct measurements of only 30 profiles (five levels × six items). From these, a multilinear regression technique produces a matrix of 30 scale weights associated with the levels of each item (for technical details, see [1]). The status of any patient can then be estimated by combining the scale weights corresponding to that patient's profile. Measuring 30 profiles would have made the interviews too burdensome to be feasible, so we considered only three levels of each item (1, 3 and 5). This reduced the number of profiles to be measured by the physiatrists to 18. They were asked to assess the status of the 18 profiles on 0–100 visual analogue scales. An example of a profile is given in Table 2. One matrix of 18 scale weights (six items × three levels) was estimated for each physiatrist, along with an aggregated matrix containing the mean weights. Because scale weights were estimated only for levels 1, 3 and 5, the two missing levels (2 and 4) were interpolated linearly.

Table 1
An example of a reciprocal pairwise comparison matrix filled in during a physiatrist interview. Each entry gives the preference of the row item over the column item.

        MOB    PHY    OCC    SOC    ORI    ECO
MOB     1      2      5      7      1      9 a
PHY     1/2    1      3      6      1/3    8
OCC     1/5    1/3    1      3      1/7    5
SOC     1/7    1/6    1/3    1      1/7    3
ORI     1      3      7      7      1      9
ECO     1/9    1/8    1/5    1/3    1/9    1

MOB: mobility; PHY: physical independence; OCC: occupation; SOC: social integration; ORI: orientation; ECO: economic self-sufficiency.
a For instance, this physiatrist judged the preference for MOB over ECO to be absolute.

Table 2
An example of a fictitious profile to be assessed by the physiatrists (5,5,3,3,1,1). Item range 1–5.

Item                        Description
Mobility                    5: confined to a bed or a chair. Cannot move around at all. There is no one to move the patient
Physical independence       5: need help with everything. Need constant attention, day and night
Occupation                  3: being unable to do many things, but busy most of the time
Social integration          3: feeling comfortable only with a few close friends or family. Making new acquaintances is not easy
Orientation                 1: normal
Economic self-sufficiency   1: complete independence

2.3. Comparing AHP and CA
The comparison of AHP and CA followed a procedure close to the one proposed by Schoemaker and Carter Waid for comparing criteria-weighting techniques, which they called the "10 binary choices" [11]. Physiatrists were presented with 10 pairs of randomly selected profiles (fictitious patients) whose sums of item scores were close to one another, for example 1,1,5,1,3,1 and 5,1,1,1,1,3. For each pair, the physiatrist was asked to identify the worse profile. These responses were then compared with the choices obtained by comparing the profile scores calculated using:
• the item weights (AHP);
• the scale weights (CA).
Concordance between the two sets of responses was measured by Kappa coefficients.
The time each physiatrist took to complete each method was recorded, and each physiatrist was asked which method was more difficult.
Calculations were carried out with NCSS for CA [12] and MATLAB for AHP [13].

3. Results
Ten physiatrists were interviewed. All had at least 5 years of clinical experience in physical medicine and rehabilitation.
3.1. Analytic hierarchy process
Item weights were rescaled so that calculated profile scores range from 0 to 100. Item weights were 5.1 (1.9), 7.4 (2.1), 3.1 (1.9), 3.3 (2.3), 3.4 (1.9) and 2.7 (1.9) for MOB, PHY, OCC, SOC, ORI and ECO, respectively, and the intercept was −25. The score of a patient with the profile 1,3,1,3,4,2 was therefore $-25 + 5.1 \times 1 + 7.4 \times 3 + 3.1 \times 1 + 3.3 \times 3 + 3.4 \times 4 + 2.7 \times 2 = 34.3$.
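The rescaled AHP score above is an intercept plus a weighted sum of item levels. The following Python sketch reproduces the worked example. `ahp_score` is a hypothetical helper; the weights and the intercept are the mean values reported above. The weights sum to 25, so the intercept of −25 places the all-1 profile at 0 and the all-5 profile at $-25 + 5 \times 25 = 100$.

```python
ITEMS = ["MOB", "PHY", "OCC", "SOC", "ORI", "ECO"]
# Mean AHP item weights reported in Section 3.1, already rescaled.
AHP_WEIGHTS = {"MOB": 5.1, "PHY": 7.4, "OCC": 3.1, "SOC": 3.3, "ORI": 3.4, "ECO": 2.7}
INTERCEPT = -25.0  # the weights sum to 25, so the all-1 profile scores 0


def ahp_score(profile):
    """Intercept plus the weighted sum of item levels (each level from 1 to 5)."""
    if len(profile) != len(ITEMS) or not all(1 <= x <= 5 for x in profile):
        raise ValueError("expected six item levels between 1 and 5")
    return INTERCEPT + sum(AHP_WEIGHTS[item] * level for item, level in zip(ITEMS, profile))


print(round(ahp_score((1, 3, 1, 3, 4, 2)), 1))  # 34.3
```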
3.2. Conjoint analysis
Table 3 reports the mean scale weights, along with the importance of each item, defined as the difference between the scale weights of its highest and lowest levels (level 5 − level 1). Values have been rescaled so that calculated profile scores range from 0 to 100. For example, a patient with the profile 1,3,1,3,4,2 would be scored $0.43 - 2.88 - 0.82 + 7.17 + 11.44 + 4.71 = 20.05$. The most important item was PHY (25.38), followed by OCC (19.35), ORI (16.77), SOC (15.56), ECO (11.55) and MOB (11.40).
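The additive CA scoring rule can be sketched as a simple table lookup over the mean scale weights of Table 3. In this minimal Python sketch, the names `SCALE_WEIGHTS`, `ca_score`, `importance` and `interpolated_levels` are hypothetical. The sketch reproduces the worked example, the importance ranking, and the linear interpolation of levels 2 and 4 described in Section 2.2.

```python
ITEMS = ["MOB", "PHY", "OCC", "SOC", "ORI", "ECO"]

# Mean scale weights from Table 3, indexed by level 1..5
# (levels 2 and 4 were obtained by linear interpolation).
SCALE_WEIGHTS = {
    "MOB": [0.43, 5.29, 10.14, 10.99, 11.83],
    "PHY": [-0.06, -1.47, -2.88, 11.22, 25.32],
    "OCC": [-0.82, 1.94, 4.71, 11.62, 18.53],
    "SOC": [-0.18, 3.49, 7.17, 11.27, 15.38],
    "ORI": [-0.54, 3.06, 6.66, 11.44, 16.22],
    "ECO": [1.17, 4.71, 8.24, 10.48, 12.73],
}


def ca_score(profile):
    """Additive CA score: sum of the scale weights of the observed levels (1 to 5)."""
    return sum(SCALE_WEIGHTS[item][level - 1] for item, level in zip(ITEMS, profile))


def importance(item):
    """Importance as defined under Table 3: level 5 minus level 1."""
    weights = SCALE_WEIGHTS[item]
    return weights[4] - weights[0]


def interpolated_levels(item):
    """Levels 2 and 4 as midpoints of the estimated levels 1, 3 and 5."""
    w = SCALE_WEIGHTS[item]
    return (w[0] + w[2]) / 2, (w[2] + w[4]) / 2


print(round(ca_score((1, 3, 1, 3, 4, 2)), 2))  # 20.05
print(sorted(ITEMS, key=importance, reverse=True))  # ['PHY', 'OCC', 'ORI', 'SOC', 'ECO', 'MOB']
```

As a consistency check, the all-1 profile scores about 0 and the all-5 profile about 100. The midpoints returned by `interpolated_levels` match the tabulated levels 2 and 4 to within rounding.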
3.3. Comparing AHP and CA
With both methods, the physiatrists gave PHY a clearly higher weight than the other items, but the rankings of the five remaining items were not concordant.
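The rankings can be compared directly by sorting the mean AHP weights (Section 3.1) and the CA importances (Table 3). The values come from the text; the helper `rank_items` is hypothetical. The two scales differ, so only the orderings are comparable.

```python
# Mean AHP weights (Section 3.1) and CA importances (Table 3).
AHP_MEAN_WEIGHTS = {"MOB": 5.1, "PHY": 7.4, "OCC": 3.1, "SOC": 3.3, "ORI": 3.4, "ECO": 2.7}
CA_IMPORTANCE = {"MOB": 11.40, "PHY": 25.38, "OCC": 19.35, "SOC": 15.56, "ORI": 16.77, "ECO": 11.55}


def rank_items(values):
    """Items ordered from most to least important."""
    return sorted(values, key=values.get, reverse=True)


ahp_rank = rank_items(AHP_MEAN_WEIGHTS)
ca_rank = rank_items(CA_IMPORTANCE)
print(ahp_rank)  # ['PHY', 'MOB', 'ORI', 'SOC', 'OCC', 'ECO']
print(ca_rank)   # ['PHY', 'OCC', 'ORI', 'SOC', 'ECO', 'MOB']
# Rank positions at which the two methods disagree:
print([i + 1 for i, (a, c) in enumerate(zip(ahp_rank, ca_rank)) if a != c])  # [2, 5, 6]
```

MOB moves from second place with AHP to last place with CA, and OCC moves from fifth to second.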

Table 3
Means of scale weights associated with item levels.

Item                        Level 1   Level 2   Level 3   Level 4   Level 5   Importance
Mobility                      0.43      5.29     10.14     10.99     11.83      11.40
Physical independence        −0.06     −1.47     −2.88     11.22     25.32      25.38
Occupation                   −0.82      1.94      4.71     11.62     18.53      19.35
Social integration           −0.18      3.49      7.17     11.27     15.38      15.56
Orientation                  −0.54      3.06      6.66     11.44     16.22      16.77
Economic self-sufficiency     1.17      4.71      8.24     10.48     12.73      11.55

Values have been rescaled so that calculated profile scores range from 0 to 100. Importance = level 5 − level 1.
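Section 2.3 compares each method with the physiatrists' 10 binary choices using Kappa coefficients, which Table 4 reports per physiatrist. The following Python sketch shows how such a comparison could be computed. It assumes two things: that a method's "choice" is the profile with the higher weighted score, and that Cohen's Kappa is the coefficient used. The paper does not name the Kappa variant. The functions `worse_profile` and `cohen_kappa` and the 10 answers below are hypothetical; only the example pair and the AHP weights come from the text.

```python
# Mean AHP item weights from Section 3.1, in the order MOB, PHY, OCC, SOC, ORI, ECO.
AHP_WEIGHTS = [5.1, 7.4, 3.1, 3.3, 3.4, 2.7]


def worse_profile(profile_a, profile_b, weights):
    """Predict which profile is worse: the one with the higher weighted score.

    Ties are reported as "B" in this sketch.
    """
    score_a = sum(w * x for w, x in zip(weights, profile_a))
    score_b = sum(w * x for w, x in zip(weights, profile_b))
    return "A" if score_a > score_b else "B"


def cohen_kappa(rater_a, rater_b):
    """Cohen's Kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    categories = set(rater_a) | set(rater_b)
    expected = sum(
        (list(rater_a).count(c) / n) * (list(rater_b).count(c) / n) for c in categories
    )
    if expected == 1.0:
        return float("nan")  # undefined: both raters always chose the same category
    return (observed - expected) / (1.0 - expected)


# Example pair from Section 2.3: profile B scores higher, so it is predicted worse.
print(worse_profile((1, 1, 5, 1, 3, 1), (5, 1, 1, 1, 1, 3), AHP_WEIGHTS))  # B

# Hypothetical answers of one physiatrist to 10 pairs, and a method's predictions.
direct = ["A", "B", "B", "A", "A", "B", "A", "B", "A", "A"]
predicted = ["A", "B", "A", "A", "A", "B", "A", "B", "B", "A"]
print(round(cohen_kappa(direct, predicted), 2))  # 0.58
```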


Table 4
Concordance between the 10 binary choices and methods CA and AHP for the 10 physiatrists (Kappa coefficients).

Physiatrist   CF     PG     RE     VC     AA     EC     BM     JP     AB     SL
CA            0.19   0.42   0.57   0.55   0.07   0.03   0.15   0.03   0.03   0.40
AHP           0.40   0.36   0.57   0.31   1.00   0.74   1.00   0.74   0.18   0.21

The concordance between AHP and the 10 binary choices was good (Kappa ≥ 0.40) for six physiatrists and very poor (Kappa close to 0 or negative) for two. The concordance between CA and the 10 binary choices was good for only three physiatrists and very poor for six (Table 4).
The time required to carry out the interviews was significantly shorter for AHP than for CA: 3.7 (1.0) minutes versus 6.9 (1.6) minutes (Wilcoxon test, p < 0.02). Eight of the 10 physiatrists perceived AHP as less difficult than CA, and two perceived the two methods as equally difficult.
4. Discussion
Many mathematical techniques have been proposed in multiple-criteria decision-making and in marketing to compile criteria scores [14]. However, there have been very few attempts to rescale multi-item scales using such techniques. For example, Harwood et al. used CA to estimate the utilities attached to the levels of handicap items (scale weights) [1], and Stineman et al. used feature-resource trade-off games for the items of the Functional Independence Measure [15]. Both techniques rely on assessments or comparisons of profiles representing fictitious patients. We are not aware of any previous study that used criteria-weighting techniques based on assessments or comparisons of clinical items (instead of profiles) for this purpose. In the present study, we compared AHP, a criteria-weighting technique based on pairwise item comparisons, with CA for weighting the items of the London handicap scale.
Most of the physiatrists found AHP interviews easier than CA interviews. This result is consistent with that of Schoemaker and Carter Waid, who compared five criteria-weighting techniques, including AHP and two techniques involving the assessment of fictitious profiles (multiple linear regression analysis of holistic assessments and direct decomposed trade-offs) [11]. They found AHP to be significantly better in terms of perceived difficulty and trustworthiness. A possible explanation for this result is that profiles are very restricted representations of (fictitious) patients, and physicians may find such representations difficult to assess. Only the levels of six items are described, while other pertinent data, such as age, sex or profession, are not included. Moreover, comparing one profile with another requires considering many item scores at the same time (12 item scores for the comparison of two 6-item profiles), which is also problematic in fictitious cases. Pairwise comparisons of items, as in AHP, thus seem easier to make.
In our study, AHP predicted the binary choices better than CA, whereas Schoemaker and Carter Waid found no difference between the methods. The small number of criteria in their study (four) was probably partly responsible for this absence of difference. It is well known that the fewer the criteria, the easier the weight estimation and the smaller the differences between methods [16].
In this study, the importance of the items was quite different from that calculated by Harwood et al. using CA with the same handicap items (in a 6-level version) [1]. We found PHY to be the most important item, whereas Harwood et al. found ECO to be the most important, followed by ORI, MOB, PHY, OCC and SOC. This difference is due to the different populations included in the two studies (physiatrists vs general population). It reminds us that CA (or any criteria-weighting technique) is not intended to estimate the "real" scale weights of a multidimensional scale, but to reveal the preferences of the subjects interviewed about the scale. As a consequence, the weights calculated above are suited (exclusively) to an objective measurement of a patient's status by physiatrists. For another purpose, such as self-assessment by consumers, a new set of weights should be estimated. Stineman et al. found such a difference in the assessment of functional states: clinicians placed greater value on cognitive skills than consumers did [15]. As stressed in the new International Classification of Functioning, Disability and Health, the patient's environment has to be taken into account, and it is thus another possible major determinant of item weights. For example, the ability to climb stairs is crucial for a stroke patient living at home without an elevator, whereas it is less important for a patient living in a single-storey house. The item "climbing stairs" (or the level "able to climb stairs" in a MOB item) should then be weighted accordingly.
Although it seems obvious to us that weighting items is essential before using a multi-item scale in a particular context, weighting techniques are admittedly not always easy to implement. In the present study, we asked experts to make 15 pairwise comparisons (AHP) and to assess 18 fictitious profiles (CA) in order to estimate the weights of six items. The interview conditions were acceptable, but they would have been much more difficult with a greater number of items. For instance, a 10-item scale would require 45 pairwise comparisons (AHP), which is nearly impossible to implement. A large number of items is thus a serious limitation for criteria-weighting techniques. In such cases, grouping the items into a small number of homogeneous clusters and then applying weighting techniques to the clusters may be an acceptable solution.
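The interview burden described above follows from simple counting. AHP needs one judgment per distinct pair of items, $n(n-1)/2$. The reduced CA design used here rates one profile per item level that is kept. The following Python sketch uses hypothetical function names. `ca_profiles` assumes that the items × levels count described in Section 2.2 would carry over unchanged to larger scales, which the paper does not state.

```python
def ahp_pairwise_comparisons(n_items):
    """Distinct item pairs an expert must judge in AHP: n(n - 1) / 2."""
    return n_items * (n_items - 1) // 2


def ca_profiles(n_items, n_levels):
    """Profiles rated in a reduced CA design like the one used here: items x levels."""
    return n_items * n_levels


def all_profiles(n_items, n_levels):
    """Every possible profile, e.g. 5**6 = 15,625 for six 5-level items."""
    return n_levels ** n_items


for n in (6, 10):
    print(n, ahp_pairwise_comparisons(n), ca_profiles(n, 3))  # 6 15 18, then 10 45 30
```

The quadratic growth of $n(n-1)/2$ is what makes clustering items before weighting attractive. Ten items need 45 pairwise judgments, but five clusters of two items would need only 10 comparisons between clusters.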
Validity and reliability are the main qualities to be assessed when validating a multidimensional clinical scale [17]. However, item weighting should also be an important stage in the validation process, since the items will be aggregated into a single global score. Weighting can be done either along with the assessment of validity and reliability [1] or afterwards, on a previously validated scale [15]. Many techniques that are well known in multiple-criteria decision-making, economics, marketing or psychometrics can accurately reveal subjects' preferences concerning criteria, and they may therefore be applied to multi-item clinical assessment. However, some of them have different names but are mathematically identical. For example, judgment analysis in psychometrics [18,19] and policy capturing in multiple-criteria decision-making [16] are both based on multilinear regression. The choice of a weighting technique has been extensively analyzed and has been the subject of intense debate [14]. For some authors, it has little importance with respect to the final result and the decision made [20], while for others it can dramatically influence weight estimation [11,21–25]. We found AHP to be better than CA for rescaling a 6-item scale, but this may not be the case in another context. Moreover, the cognitive aspects of weighting procedures remain insufficiently understood [26,27].
References
[1] Harwood R, Rogers A, Dickinson E, Ebrahim S. Measuring handicap: the London handicap scale, a new outcome measure for chronic disease. Qual Health Care 1994;3:11–6.
[2] Saaty T. A scaling method for priorities in hierarchical structures. J Math Psychol 1977;15:234–81.
[3] Zahedi F. The analytic hierarchy process: a survey of the method and its applications. Interfaces 1986;16:96–108.
[4] Dolan JG, Isselhardt Jr BJ, Cappuccio JD. The analytic hierarchy process in medical decision making: a tutorial. Med Decis Making 1989;9:40–50.
[5] Dolan JG. Medical decision making using the analytic hierarchy process: choice of initial antimicrobial therapy for acute pyelonephritis. Med Decis Making 1989;9:51–6.
[6] Hatcher M. Voting and priorities in health care decision making, portrayed through a group decision support system, using analytic hierarchy process. J Med Syst 1994;18:267–88.
[7] Dolan JG, Bordley DR. Involving patients in complex decisions about their care: an approach using the analytic hierarchy process. J Gen Intern Med 1993;8:204–9.
[8] Tarimcilar MM, Khaksari SZ. Capital budgeting in hospital management using the analytic hierarchy process. Socioecon Plann Sci 1991;25:27–34.
[9] Weingarten MS, Erlich F, Nydick RL, Liberatore MJ. A pilot study of the use of the analytic hierarchy process for the selection of surgery residents. Acad Med 1997;72:400–2.
[10] World Health Organization. International classification of impairments, disabilities and handicaps. Geneva: WHO; 1980.
[11] Schoemaker P, Carter Waid C. An experimental comparison of different approaches to determining weights in additive utility models. Mgmt Sci 1982;28:182–96.
[12] NCSS: Number cruncher statistical system (computer program), version 97. Kaysville, Utah; 1997.
[13] Matlab (computer program), version 5.1. USA; 1997.
[14] Mousseau V. Analyse et classification de la littérature traitant de l'importance relative des critères en aide multicritère à la décision. Rech Oper 1992;26:367–89.
[15] Stineman M, Maislin G, Nosek M, Fiedler R. Comparing consumer and clinician values for alternative functional states: application of a new feature trade-off consensus building tool. Arch Phys Med Rehabil 1998;79:1522–9.
[16] Adelman L, Sticha P, Donnell M. The role of task properties in determining the relative effectiveness of multi-attribute weighting techniques. Organiz Behav Hum Perf 1984;33:243–62.
[17] McDowell I, Newell C. Measuring health: a guide to rating scales and questionnaires. New York: Oxford University Press; 1987.
[18] Fisch H, Hammond K, Joyce C, O'Reilly M. An experimental study of the clinical judgment of general physicians in evaluating and prescribing for depression. Brit J Psychiat 1981;138:100–9.
[19] O'Boyle C, McGee H, Hickey A, O'Malley K, Joyce C. Individual quality of life in patients undergoing hip replacement. Lancet 1992;339:1088–91.
[20] Wainer H. Estimating coefficients in linear models: it don't make no never mind. Psychological Bull 1976;83:213–7.
[21] Eckenrode R. Weighting multiple criteria. Mgmt Sci 1965;12:180–91.
[22] Nutt C. Comparing methods for weighting decision criteria. Omega 1980;8:163–72.
[23] Belton V. A comparison of the analytic hierarchy process and a simple multi-attribute value function. EJOR 1986;26:7–21.
[24] Borcherding K, Eppel T, von Winterfeldt D. Comparisons of weighting judgments in multiattribute utility measurements. Mgmt Sci 1991;37:1603–19.
[25] Olson D, Moshkovich H, Schellenberg R, Mechitov A. Consistency and accuracy in decision aids: experiments with four multiattribute systems. Decis Sci 1996;26:723–48.
[26] Weber M, Borcherding K. Behavioral influences on weight judgments in multiattribute decision making. EJOR 1993;67:1–12.
[27] Roy B. Multicriteria methodology for decision aiding. Dordrecht: Kluwer Academic; 1996.
