Ann. Rev. Public Health. 1985. 6:295-324

Mark C. Hornbrook
Kaiser Permanente Center for Health Research, Portland, Oregon 97215; and Department
of Community Health Care Systems, School of Nursing, The Oregon Health
Sciences University, Portland, Oregon 97201


Defining Hospital Output

The output of a hospital has three fundamental dimensions: volume, case mix,
and quality. Volume refers to the total number of patients treated. Case mix is
the array of different diseases and health problems treated. Quality is the
hospital's contribution to the successful outcome or resolution of patients'
illnesses or health problems. Both case mix and quality present extremely
thorny issues for measurement; this article focuses on those that should be
addressed in selecting an appropriate case-mix measure.
Defining and measuring the output of the hospital is necessary for many
reasons, including research, policy, and administration. The purpose for which
a measure is needed is the fundamental criterion on which to base selection of a
measure. For example, development of a payment system for hospitals requires
a case-mix method that defines groups of cases that are similar in their efficient
use of hospital resources. For the purpose of research on the number, size, and
specialization of hospitals, case-mix measures are needed to differentiate
degree of specialization and level of complexity of types of diseases and
conditions treated by different hospitals over time. The purpose for which a
measure is needed establishes the logical criterion for judging the validity and
consistency of alternative measures. Without a clearly defined purpose,
measurement becomes arbitrary. For this reason, specification of purpose is the
first step in selecting a case-mix method.


Goals of Case-Mix Measurement

One of the most crucial applications of case mix is payment of hospitals.
Prospective output-based reimbursement programs for inpatient care are being
developed by all third-party payers, public and private. In this application, a
case-mix measure should have a number of desirable properties. It should not
be subject to manipulation independent of the actual diseases of the persons
admitted. It should reflect the patient's condition, not merely the provider's
response. It should classify particular diseases similarly across hospitals and
over time (horizontal equity), yet it should be sensitive to changes in the mix
of diseases over time and place (vertical equity). It should evolve to keep pace
with new definitions of disease and newly discovered diseases. It should be
acceptable to the medical profession as representative of medical technology,
and to third-party payers as administratively feasible. It should be a reliable and
valid measure of hospital output.

Disease versus Illness

Case mix is defined as the proportion of cases of each disease and health
problem treated in the hospital. The reason for emphasizing disease as a basic
classification criterion for case-mix measurement is that it represents the
primary motivation for consumption of hospital services. Use of the hospital is
driven essentially by the need to alleviate manifestations of illnesses. "Illnesses"
are the actual disruptions in psychological and physiological processes
experienced by individuals. In contrast, "diseases" constitute conceptual models
of the causes and courses of illnesses observed to be similar in their
manifestations in most individuals. Disease models also incorporate specification
of appropriate therapy and expected responses to that therapy. To the
extent that a disease model is an accurate representation of the reality of an
illness, patients will receive appropriate care. Thus, diagnoses, which are the
titles of disease models, are relevant both to consumers, who require satisfactory
resolution of symptoms, and to physicians, who require identification of
appropriate treatment regimens.
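As a minimal sketch of the definition given at the start of this section, case mix can be computed as the share of cases carrying each diagnostic label. The function name `case_mix` and the sample labels are illustrative assumptions and do not belong to any of the methods reviewed here.

```python
from collections import Counter


def case_mix(diagnoses):
    """Return the proportion of cases falling under each diagnostic label."""
    counts = Counter(diagnoses)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}


# ASSUMPTION: hypothetical discharges for one hospital, for illustration only.
discharges = ["otitis media", "lung cancer", "otitis media", "cellulitis"]
mix = case_mix(discharges)
```

The proportions sum to one by construction, so two hospitals of different size can be compared on the same scale. How the labels themselves are defined is the harder question taken up in the rest of the article.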
Diseases, even though theoretical in nature, are not simple entities. Indeed,
it may be appropriate to say that no two people ever have the same disease
because each person has a unique physiology and psychology that interact with
the disease to create a unique occurrence. The sheer complexity of human
psychophysiology and the variety of possible disruptions present an almost
overwhelming obstacle to taxonomic efforts, especially for purposes of
measuring case mix. Nevertheless, generalization is necessary to accomplish
research, policy, and management objectives.

Splitting and Lumping

Case-mix measurement can be said to be a problem in aggregation and
differentiation: grouping diseases into "clusters" that are sufficiently similar with
respect to the purpose for which the measure is needed, such as cost (in
designing a payment system), and splitting disease clusters according to significant
differences in their nature that relate to the primary purpose. Determination
of how much splitting or lumping is required is a problem in valuation, not
measurement precision. This issue is discussed in a later section. Development
of case-mix measures entails derivation of alternative means of grouping cases
into meaningful classes, which must then be evaluated for consistency with the
measurement goal, be it research, management, or payment.

Inpatient Case Mix

A substantial body of literature on measurement of case mix has evolved over
the past few years. It is not possible to present a comprehensive synthesis of this
literature within the space of this article. Instead, this overview acquaints the
reader with case-mix approaches tailored to the inpatient setting. Hospital
case-mix methods are the most well-developed, and the recent adoption of
prospective reimbursement by the Medicare program brings case-mix measurement
to the forefront of policy debate and makes it a high priority among
hospital administrators. This article illustrates the variety of approaches with
six leading case-mix methods, specifies the outstanding issues in the case-mix
field via comparison of these alternative methods, and identifies the most
important conceptual requirements for defining and selecting a measure of case
mix. A more comprehensive treatment of the conceptual issues in this area is
available in Hornbrook (26, 27). The case-mix techniques reviewed here are
Diagnosis-Related Groups, Disease Staging, Patient Management Categories,
APACHE, Patient Severity Index, and Acuity.


The general aims of this review are to provide an overview of the leading
approaches to case-mix measurement and to illustrate the similarities and
differences among alternative approaches in order to highlight the fundamental
principles of case-mix measurement. This is not a cookbook wherein the ideal
measure for any application is outlined. It is an attempt to foster increased
understanding of the complexities involved in this area for those who are not
familiar with the field and require a general orientation. This article does not
describe how to compute or apply particular methods. The reader must consult
the literature cited to learn the details of each technique.
The specific aims of this article are as follows:

1. To describe the leading techniques for measuring case mix in inpatient settings.
2. To highlight the purpose, strengths, and limitations of each measure and to
give examples of their applications in research, policy, and management.
3. To indicate new methodological advances in case-mix measurement.
4. To identify the critical limitations and methodological barriers to advances
in case-mix methodology and to suggest priorities for future research.


Six case-mix methods developed primarily for hospital care are reviewed in this
section. One method, Diagnosis-Related Groups (DRGs), is now the basis for
reimbursing hospitals for care of Medicare beneficiaries. Another method,
Acuity, is being used to determine nurse staffing levels in a majority of
hospitals in this country. The other four methods are being studied for their
potential contributions to understanding patterns of health care use and to
correcting problems inherent in the DRGs.

Diagnosis-Related Groups
PURPOSE Diagnosis-Related Groups are perhaps the most well-known and
widely applied case-mix measure. They now serve as the basis for state-wide
hospital prospective reimbursement systems in New Jersey and Maryland and
for the Medicare program (8, 10, 11). The purpose of the DRGs is to relate "the
demographic, diagnostic, and therapeutic characteristics of patients to the
output they are provided so that cases are differentiated by only those variables
related to the condition of the patient (e.g. age, primary diagnosis) and
treatment process (e.g. operations) that affect his utilization of the hospital's
facilities" (13, p. 2). The developers of the DRGs rejected patient groups
defined on the basis of principal diagnosis alone because such groups do not
show sufficiently homogeneous utilization patterns. This is because principal
diagnosis does not subsume all of the clinically pertinent aspects of patients'
illnesses, or any relevant socioeconomic characteristics of patients, or any
characteristics of providers or the health care system. Not all of these factors
should be incorporated into a case-mix measure, however, as is discussed

DESCRIPTION The developers of DRGs reacted to these problems in two
ways. First, in addition to the principal diagnosis, other clinical aspects of the
patient were included in the structure of the classification system: secondary
diagnoses, major surgical procedures, age of the patient, discharge status (alive
vs dead), and discharge disposition to another facility or to home. Second,
these diagnostically related groupings were collapsed on the basis of length of
stay, total charges, and medical meaningfulness. The goals of the DRG scheme
were (a) homogeneous medical categories with a relatively similar case
management process for all patients within a given category; (b) classes defined in
terms of variables typically available from the hospital discharge abstract
system; (c) mutually exclusive, exhaustive classification with a manageable
number of classes; and (d) similar length of hospital stay and total resource use
for patients within any class.
Primary emphasis in the development of the DRGs was placed on clinical
judgments, whereas secondary emphasis was placed on statistical partitioning
(14). The first step in the DRG classification is assignment to one of the Major
Diagnostic Categories (MDCs). The MDCs represent mutually exclusive and
exhaustive groupings of the International Classification of Diseases, 9th Revision,
Clinical Modification (ICD-9-CM) (7). The codes are grouped by organ system rather than
etiology, because this method follows the organization of medical specialties
and, hence, better represents medical technology. A supplemental criterion is
used to define MDC 12, "diseases and disorders of the male reproductive
system," and MDC 13, "diseases and disorders of the female reproductive
system." Here, urogenital conditions common to both sexes were split on the
basis of sex of the patient.
The second step in the DRG classification is to make the initial split within
most MDCs on the basis of performance of an operating room procedure.
Procedures not typically performed in an operating room were not included in
this determination. Note that the DRG definitions include lists of specific
procedures that qualify the patient; the actual place of performance of the
surgery for any individual is not relevant. Surgical procedures related to the
MDC are ranked from lowest to highest resource intensity; the patient is then
classified on the basis of the most resource intensive procedure received that is
related to the principal diagnosis. Exceptions to the initial major procedure split
are the following MDCs:

1. MDC 14, "pregnancy, childbirth and the puerperium." Initial split: Delivery
during this admission?
2. MDC 15, "normal newborns and other neonates with certain conditions
originating in the perinatal period." Initial split: Died or transferred?
3. MDC 17, "myeloproliferative disorders and poorly differentiated malignancy
and other neoplasms not elsewhere classified." Initial split: Lymphoma
or leukemia principal diagnosis?
4. MDC 20, "substance use disorders and substance-induced organic
disorders." Initial split: Left against medical advice?
5. MDC 22, "burns." Initial split: Transferred to another acute care hospital?
The third level in the DRG hierarchy is specification of lists of medically
significant complications and secondary diagnoses for grouping patients and
identifying case types. Complications and simultaneous diseases were defined
as significant if average length of stay was observed to increase by at least one
day in 75% of the cases in the original DRG database. Finally, age and death
were included as criterion variables for the DRGs.
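The hierarchy described above can be sketched as a sequence of lookups. This is a simplified illustration under stated assumptions, not the actual DRG grouper logic: the lookup tables, the `Discharge` record, and the function `classify` are hypothetical, real definitions rest on lists of ICD-9-CM codes, and the exceptions for MDCs 14, 15, 17, 20, and 22 are omitted. The age-70 break is the usual one for the DRGs.

```python
from dataclasses import dataclass, field

# ASSUMPTION: hypothetical lookup tables standing in for ICD-9-CM code lists.
MDC_BY_DIAGNOSIS = {"breast malignancy": 9, "cellulitis": 9}
# Operating room procedures ranked by resource intensity within an MDC
# (a higher rank means a more resource-intensive procedure).
OR_PROCEDURE_RANK = {9: {"excision": 1, "total mastectomy": 2}}
SIGNIFICANT_CC = {"diabetes with complications"}


@dataclass
class Discharge:
    principal_diagnosis: str
    procedures: list = field(default_factory=list)
    secondary_diagnoses: list = field(default_factory=list)
    age: int = 0


def classify(d):
    """Walk the hierarchy: MDC, surgical/medical split, then age and/or CC."""
    mdc = MDC_BY_DIAGNOSIS[d.principal_diagnosis]
    ranks = OR_PROCEDURE_RANK.get(mdc, {})
    qualifying = [p for p in d.procedures if p in ranks]
    if qualifying:
        partition = "surgical"
        group = max(qualifying, key=ranks.get)  # most resource-intensive
    else:
        partition = "medical"
        group = d.principal_diagnosis
    has_cc = any(s in SIGNIFICANT_CC for s in d.secondary_diagnoses)
    older_or_cc = d.age >= 70 or has_cc
    return (mdc, partition, group, older_or_cc)
```

Returning a tuple rather than a DRG number keeps the sketch honest: the mapping from these attributes to a numbered group is defined by the published DRG tables, which are not reproduced here.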
A total of 470 DRGs were defined. However, only 468 of them represent
actual case types. DRG 469 is "patients with a principal diagnosis that is invalid
as a discharge diagnosis," and DRG 470 is "patients that are unclassifiable
because of missing or invalid information."
An illustration of the DRG classification scheme is shown in Figure 1. The
patient's principal diagnosis (the condition that after study is determined to be
the reason for admission to the hospital) determines which MDC applies, in
this case, skin and breast conditions. The surgical procedures performed, if
any, determine placement in the surgical/medical partitioning. Principal
diagnosis determines placement in the medical partitioning. In some cases,
diagnosis and procedure criteria are applied jointly, as with cancer of the breast
and mastectomy versus excision versus no procedure. Then, the patient is
classified by age (usually, age 69 and below versus age 70 and above) and by
presence of significant complications or comorbidities, which are classified as
secondary diagnoses. For trauma and cellulitis without an operating room procedure,
an additional age break is introduced: age 17 and below versus age 18 and
above.
The partitioning of MDC 9, "diseases of the skin, subcutaneous tissue and
breast," results in 28 DRGs. Breast disorders comprise nine DRGs-six
surgical groups and three medical groups. The effects of age, complications,
and secondary diseases on resource use within a procedure group are illustrated
by DRGs 257 and 258. Under the Medicare program, total mastectomy for
malignancy carries a relative weight of 1.1085 for the older and/or complicated
group, and a weight of 1.0729 for the younger, uncomplicated group (10). The
most resource-intensive DRG within this MDC is skin grafts for skin ulcer in
the older and/or complicated group, with a weight of 2.4737 (10). The least
intensive DRG is trauma to the skin in persons less than 18 years old, with a
weight of 0.3460 (10). Note that these resource weights were developed with
data from the Medicare population; a younger population may reveal a different
pattern of resource use across the DRGs.
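To show how such relative weights can summarize a hospital's mix, the sketch below averages the four weights quoted above over a set of discharges. The weights come from the text; the discharge counts, the group labels used as dictionary keys, and the function name `average_weight` are assumptions made for illustration.

```python
# Medicare relative weights quoted in the text for four groups in MDC 9.
WEIGHTS = {
    "total mastectomy, older and/or complicated": 1.1085,
    "total mastectomy, younger, uncomplicated": 1.0729,
    "skin graft for skin ulcer, older and/or complicated": 2.4737,
    "skin trauma, age under 18": 0.3460,
}


def average_weight(discharge_counts):
    """Discharge-weighted mean of the relative weights."""
    total = sum(discharge_counts.values())
    weighted = sum(WEIGHTS[group] * n for group, n in discharge_counts.items())
    return weighted / total


# ASSUMPTION: hypothetical discharge counts for one hospital.
counts = {
    "total mastectomy, older and/or complicated": 10,
    "total mastectomy, younger, uncomplicated": 20,
    "skin graft for skin ulcer, older and/or complicated": 5,
    "skin trauma, age under 18": 15,
}
index = average_weight(counts)
```

On these weights, the most intensive group in the MDC is valued at roughly seven times the least intensive one, so even small shifts in the mix of groups move the average noticeably.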
The DRGs, as they have been implemented in reimbursement systems in
New Jersey and the Medicare program, contain another overarching classification
criterion: outliers. In any given DRG, patients whose lengths of stay are
above a specified threshold are considered to be a distinct class of patients. In
New Jersey, patients with very short lengths of stay were treated as separate
classes of patients. Finally, under the new Medicare prospective payment
system, patients with very high costs, but who do not meet day-outlier criteria,
are also considered to be separate sets of patients under each DRG. These
modifications essentially double and triple the number of classification rubrics
under the DRG scheme.
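The outlier rules just described amount to a small decision procedure applied within each DRG. The sketch below is one possible reading: every threshold value is hypothetical, the function name `outlier_class` is an assumption, and the optional short-stay test reflects the New Jersey variant only.

```python
def outlier_class(length_of_stay, cost, day_threshold, cost_threshold,
                  short_stay_threshold=None):
    """Assign a discharge within one DRG to an outlier class or to 'inlier'."""
    if length_of_stay > day_threshold:
        return "day outlier"
    if short_stay_threshold is not None and length_of_stay < short_stay_threshold:
        return "short-stay outlier"  # New Jersey variant
    if cost > cost_threshold:
        # High cost, but day-outlier criteria not met (Medicare variant).
        return "cost outlier"
    return "inlier"
```

Because the day test runs first, a cost outlier here is by construction a case that did not qualify as a day outlier, which matches the ordering in the description above.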
The data base used in construction of the DRGs was developed from a
sample of 332 hospitals subscribing to the Professional Activities Study of the
Commission on Professional and Hospital Activities (CPHA). The sample was
selected to be as nationally representative as possible within the voluntary
subscriber membership of CPHA. A total of 1.4 million discharges were
obtained from these hospitals for the third and fourth quarters of 1979. A
sample of 394,814 records was selected according to a stratified random
sampling procedure, with MDCs as the strata.
DRGs represent "clinically coherent groups based on consumption of
resources" (13). Here, emphasis is placed on treatment patterns rather than
disease processes. The former is assumed to be an acceptable representation of
the latter. This feature is crucial to understanding the essential nature of this
classification scheme and how it differs from other systems.

Disease Staging

PURPOSE The purpose of Disease Staging is to provide a more complete
specification of the patient's disease so that any application requiring a
case-mix measure will not confound differences in patient condition with differences
in therapeutic response (20).

DESCRIPTION Staging refers to the measurement of the severity of the
patient's condition at any given point, where severity is defined as the likelihood
of death or residual impairment as the result of a disease, without consideration
of type or appropriateness of treatment. Severity is related to the patient's
prognosis (the expected outcome to the illness episode) as well as to his/her use
of medical resources-admission to the hospital, length of stay, use of nursing
and other hospital services. Sicker patients can be expected to require a higher
intensity of services, so that costs will be greater to treat these types of patients.
The staging concept was initiated in clinical medicine, specifically oncology.
During the course of neoplastic diseases, discrete "stages" were observed
or inferred that could be defined and detected clinically, reflected the severity
of the disease, and had clinical significance for prognosis and choice of
therapeutic modality. This concept has now been applied to other medical and
surgical problems in order to classify essentially all hospitalized patients.
In staging, diseases are generically divided into categories of increasing
levels of severity, as follows:

Stage 1: Conditions with no complications or problems of minimal severity.
Stage 2: Problems limited to an organ or system; significantly greater
risk of complications than in Stage 1.
Stage 3: Multiple site involvement; generalized systemic involvement;
poor prognosis.
Stage 4: Death.

Figure 1 Major Diagnostic Category 9: "diseases of the skin, subcutaneous tissue, and breast." A. Surgical partitioning. B. Medical partitioning. Legend: OR procedure: operating room procedure. Age 70 CC: patient's age > 69 years and/or presence of complications or comorbidities. CC: presence of complications and/or comorbidities. Source: Fetter, R. B., et al. 1981. The New ICD-9-CM Diagnosis Related Groups Classification System. User's Manual. New Haven: Health Systems Management Group, Yale Univ.

Staging is not a function of actual utilization patterns, or expected response
to therapy. It is derived from a conceptual model of the disease process itself,
rather than from the relative efficacy of medical technology. Even though
disease technically ceases to exist upon death of the patient, it is very useful,
from both heuristic and practical viewpoints, to include death as an endpoint of
the severity spectrum. This provides explicit consideration of patients who die
during the course of their treatments.
Medical staging criteria for two conditions, otitis media and lung cancer, are
shown in Table 1. These illustrate the progression of disease from a simple,
localized lesion to compromise of critical organs.
Staging criteria for over 400 diseases have been developed (17). These,
along with some "catch-all" criteria sets, cover nearly all the discharges from
typical community hospitals. A panel of 23 medical consultants was formed to
assist in specification of the medical staging criteria. Each disease was assigned
to two members of the panel to be staged independently. The specific instruc­
tions to the panel called for every condition to be divided into at least four
primary stage categories, according to the generic definitions specified above.

Table 1 Disease staging criteria for otitis media and cancer of the lung^a

Criteria for each condition are listed from most severe to least severe.

Otitis media
  Death
  Sigmoid sinus thrombosis
  Petrositis
  Brain abscess; subdural abscess
  Meningitis
  Otitis media and/or mastoiditis with CNS complications
  Facial nerve paralysis
  Labyrinthitis
  Perforated eardrum or mastoiditis
  Acute otitis media with complications
  Acute otitis media
  Serous otitis media

Cancer of the lung
  Death
  Distal metastasis present
  Tumor involving regional lymph nodes
  Tumor of any size with direct extension into an adjacent structure such as
    the parietal pleura or chest wall, the diaphragm, or the mediastinum and
    its contents; or a tumor demonstrable bronchoscopically to involve a main
    bronchus less than 2.0 cm distal to the carina; or any tumor associated
    with atelectasis or obstructive pneumonitis of an entire lung or pleural
    effusion
  Tumor more than 3.0 cm in greatest diameter, or a tumor of any size that
    either invades the visceral pleura or has associated atelectasis or
    obstructive pneumonitis extending to the hilar region. At bronchoscopy,
    the proximal extent of demonstrable tumor must be within a lobar bronchus
    or at least 2.0 cm distal to the carina. Any associated atelectasis or
    obstructive pneumonitis must involve less than an entire lung, and there
    must be no pleural effusion.
  Tumor that is 3.0 cm or less in greatest diameter, surrounded by lung or
    visceral pleura, and without evidence of invasion proximal to a lobar
    bronchus at bronchoscopy.

^a SOURCE: Gonnella, J. S., ed. 1983. Clinical Criteria for Disease Staging. pp. 192, 262-63. Santa Barbara,
Calif.: SysteMetrics, Inc.

The panelists were encouraged to develop as many substages within each
primary category as deemed appropriate. Each substage should place the
patient at a significantly higher risk of morbidity and/or mortality than the
previous substage and should be clinically differentiable from other substages.
When complications within each primary stage or substage could not be
differentiated in their severity, then they were classified as equivalent. Thus,
operationalization of the concept of severity was based on the subjective, a priori
judgment of physicians, who inferred severity level from the particular body
system involved, their clinical experience, and the relevant medical literature
on that condition. Only single disease entities, without concern for
complications arising from a simultaneous, unrelated condition or factor (e.g. skin
infection in a patient with hypertension, or age of patient), were considered.
Finally, stages were defined primarily in terms of biological complications;
functional manifestations of severity were not included.
The Staging panelists were instructed as to the importance of specifying
synonyms, so that the criteria could be made as general as possible for use with
coded diagnostic data to capture all patients who may have the condition
regardless of variations in use of medical terminology by physicians, nurses,
and medical records technicians. The panelists were asked to furnish criteria for
objectively validating the presence of the stage in the patient, either quantitatively
or qualitatively, and to provide a list of references to support the medical
judgments implicit and explicit in the criteria.
A team of medical records technicians was employed to translate each stage
and substage definition into ICD-9-CM diagnostic codes. Operationally, a
procedure similar to that used for medical criteria development was used for the
coding process. Each medical staging criteria set was coded independently and
then reviewed by a third medical records technician to resolve discrepancies.
When necessary, physician panel members were consulted to assist in the final
decision making. A computer software system is available to apply Staging on
large-scale data bases. An individual patient record from a computerized
discharge abstract file is read by the program, which systematically searches
for the principal and associated diagnoses. If one of the conditions for which
Staging criteria exist appears among this vector of diagnostic codes, the patient
is "staged" for that condition. For each medical (or surgical) problem, patients
are assigned to the highest stage specifically justified when all diagnoses listed
are taken into account. The patient is staged for each diagnosis listed (if the
diagnosis corresponds to a Staging criteria set) and the highest stage related to
the principal diagnosis is selected as the primary staged condition for the
patient. This software will classify patients into any of 420 diagnostic categories
and assign an ordinal stage within each category. In addition to diagnostic
data, the software also employs data on selected elective procedures, sex, and
discharge status.
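The selection rule described above (stage every listed diagnosis, keep the highest stage per condition, and report the condition tied to the principal diagnosis as primary) can be sketched as follows. This is not the Staging software itself: the diagnosis codes are placeholders rather than ICD-9-CM codes, the stage numbers in `CRITERIA` are illustrative, and assigning Stage 4 to the primary condition when the patient died is an assumption about how discharge status is used.

```python
# ASSUMPTION: placeholder codes and illustrative stages, not real criteria.
CRITERIA = {
    "DX_OTITIS_ACUTE": ("otitis media", 1),
    "DX_MASTOIDITIS": ("otitis media", 2),
    "DX_MENINGITIS_OTITIC": ("otitis media", 3),
    "DX_LUNG_CA_LOCAL": ("cancer of the lung", 1),
}


def stage_patient(principal, secondary=(), died=False):
    """Return (primary staged condition, {condition: highest stage})."""
    stages = {}
    for code in (principal, *secondary):
        if code in CRITERIA:
            condition, stage = CRITERIA[code]
            # Keep the highest stage justified by any listed diagnosis.
            stages[condition] = max(stage, stages.get(condition, 0))
    primary = CRITERIA[principal][0] if principal in CRITERIA else None
    if died and primary is not None:
        stages[primary] = 4  # ASSUMPTION: death is recorded on the primary condition
    return primary, stages
```

A patient admitted with acute otitis media who also carries a code for otitic meningitis is therefore staged at the higher level for otitis media, while an unrelated listed condition is staged separately.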

Other than principal diagnosis, the staging software does not require any
ordering of multiple diagnoses. Moreover, the software will overrule the
principal diagnosis listed if it is a complication or manifestation of a listed
secondary diagnosis, according to the medical criteria. For example, neuropathy
is considered a manifestation of diabetes mellitus by the software when
both conditions are listed on a discharge abstract, regardless of the order in
which they appear. Staging can be done manually from medical records as
well. Experienced medical records technicians can easily and reliably assign
the stage of disease from a full medical chart. Staging from the record gives
more valid results than from the discharge abstract or face sheet because of the
greater amounts of data available.
Staging has been used in a number of studies to evaluate health care quality
and costs (5, 15, 18, 19). It provides a useful framework for organizing clinical
data on the patient for purposes of analysis of utilization patterns.

Patient Management Categories

PURPOSE Young et al (34) have designed a case-mix classification scheme to
address three problems: inconsistencies in the classification and coding of
diagnoses arising out of dependence on the ICD-9-CM; inadequate attention to
the diversity of appropriate types of stays within diagnostic categories; and lack
of a priori clinical input in defining appropriate management of homogeneous
groups of hospitalized patients. These problems can be illustrated by noting
that among hospitalized patients with a principal diagnosis of breast carcinoma
with metastasis are a number of distinct subgroups: (a) patients admitted with
suspected breast cancer whose disease is found to be malignant with metastases;
(b) patients who have had a previous mastectomy for breast cancer and who
have been readmitted because of metastatic complications in other sites; (c)
patients readmitted for chemotherapy evaluation subsequent to a mastectomy;
and (d) patients readmitted for immunotherapy only, subsequent to a
mastectomy. Among these four case types, the expected lengths of stay and use of
ancillary services vary from one day and one procedure to ten or more days and
an elaborate series of invasive diagnostic tests and therapeutic procedures. For
this reason, Young et al conclude that a case-mix measure must be based on
more than just the principal discharge diagnosis. The innovative contribution of
Young's approach is to incorporate stage of treatment (initial work-up versus
return follow-up) as a classification criterion in their case-mix measure.

DESCRIPTION Young et al used a three-step procedure to develop clinically
meaningful Patient Management Categories (PMCs) that are sufficiently
homogeneous to enable physicians to specify associated diagnostic and therapeutic
strategies. First, clinically homogeneous patient categories were defined
by medical consultants on the basis of principal diagnosis and reason for
admission to the hospital. Reason for admission included factors other than the
diagnosis that were pertinent to use of the hospital at a particular point in time.
Second, the medical consultants specified patient management paths (PMPs),
which describe the essential components of the diagnostic and treatment
regimen for each PMC. These management paths are the key services that are
expected to be provided to a typical patient; they are neither optimal nor
complete care of every individual case, nor are they the most frequently
observed patterns of care for a set of hospitals. The PMPs are not intended to be
exhaustive of disease or therapy, but rather generally indicative. Third, the
vector of services comprising a PMP was converted to a scalar resource value
via assignment of dollar costs to each service component. These cost estimates
were derived from an accounting cost-finding methodology applied in six
Western Pennsylvania hospitals. This resource scalar serves as the costliness
weight for the PMC. The PMPs are not used to classify patients, but only to
derive the costliness weights.
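The third step, collapsing a vector of services into a single dollar value, can be illustrated with a short sketch. The service names, quantities, and unit costs are hypothetical and are not drawn from the Western Pennsylvania cost study; the function name `costliness_weight` is likewise an assumption.

```python
# ASSUMPTION: hypothetical unit costs in dollars.
UNIT_COSTS = {"routine bed day": 200.0, "chest x-ray": 40.0, "bone scan": 150.0}


def costliness_weight(path, unit_costs):
    """Collapse a management path (service -> expected quantity) into dollars."""
    return sum(unit_costs[service] * quantity for service, quantity in path.items())


# ASSUMPTION: a hypothetical management path for one PMC.
path = {"routine bed day": 3, "chest x-ray": 1, "bone scan": 1}
weight = costliness_weight(path, UNIT_COSTS)
```

Because the weight depends only on the expected services in the path, it describes the typical patient in a category and not the resources any individual patient actually consumed.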
Specification of the patient's reason for admission (RFA) is based on two
factors for which information can be found on the patient's medical record: the
patient's chief problem; and performance of an elective procedure. The RFAs
are defined specific to each underlying disease. The medical consultants
defined which procedures are considered to be elective in determining a PMC.
A basic assumption of the PMCs is that the clinical status of the patient at time
of admission, in addition to the discharge diagnosis, should be assessed for
predicting patterns of resource use during the course of a hospital stay. Data on
the chief problem or presenting condition are not routinely available on most
discharge abstracts. To address this problem, Young et al developed algorithms
for mapping the vector of diagnostic codes and other discharge abstract data
onto the PMCs (which include RFA specifications), so that patients can be
classified into the PMCs without collecting new data on reason for admission.
Validation of this algorithm with medical record data has shown a very high
degree of match in assignment of PMCs between the two approaches. It is
important to point out that the reasons for admission are not defined relative to
the admitting diagnosis; often this diagnosis is assigned by the physician with
the object of gaining a bed for the patient rather than accurately describing the
patient's clinical status at that point. PMCs are defined after discharge on the
basis of all information available regarding the patient's illness and hospital
stay. PMCs do not represent a hierarchical mapping from RFA to principal
diagnosis to PMC; rather, all of these elements were considered simultaneously
by the medical consultants in order to avoid specifying multiple PMCs that lead
to the same PMP, and defining single heterogeneous PMCs that lead to
completely divergent PMPs.
An example of a set of PMPs, and their associated PMCs, is presented in
Figure 2. The paths incorporate initial diagnostic work-ups, which may lead to
more than one PMC. More than one PMC may be defined if differences in
therapy warrant the split, as with fractures due to bone metastasis.


PURPOSE The Acute Physiology and Chronic Health Evaluation (APACHE)
index is a measure of severity of illness that incorporates data from all seven
major body systems. This index was originally developed for the purpose of
assessing severity among patients in the Intensive Care Unit in order to control
for risk of poor outcome in trials of new therapies (29).

DESCRIPTION APACHE is a three-part classification system. The first part is
the Acute Physiology Score (APS) and consists of a weighted sum of 33
potential physiologic measurements obtained from the patient's clinical record.
It is not necessary to have observations on all 33 factors because the assumption
is made that if an observation was not made on a particular factor its value was
normal. A weight ranging from 0 to 4 is assigned to each factor according to a
predetermined scale. A heart rate between 70 and 1 10 is assigned a value of 0,
while very fast (over 180) and very slow (less than 40) heart rates are assigned a
value of 4. These assessments reflect the degree of derangement of the seven
vital systems: neurologic, cardiovascular, respiratory, gastrointestinal, renal,
metabolic, and hematologic. The APS does not measure the severity of illness
per se, but rather the body's response to the acute illness. The internal scales of
the APS were defined by a panel of ICU physicians. Each scale is exhaustive of
the range of potential values on the factor. The factors include vital signs (blood
pressure, temperature, pulse, respiratory rate), urine output, results of routine
blood tests (arterial blood gases, serum pH, and SMA-6), results of cultures,
urinalysis, and the Glasgow coma score . The scores on these various physio­
logic abnormalities are assumed to be additive, so that the APS is treated as a
single cardinal measure of severity. When multiple observations on a given
factor are available, the value with the greatest divergence from normal is
The APS has been shown to have a strong and stable relationship with
various measures of resource use and outcome ( 1 2, 30, 3 1).
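The scoring rule described above can be sketched in Python. Only the two heart-rate bands quoted in the text (70 to 110 scores 0; below 40 or above 180 scores 4) come from the source. The intermediate band, the function names, and the sample observations are assumptions made for illustration and are not the published APACHE scales. The sketch also interprets "greatest divergence from normal" as the observation carrying the highest weight.

```python
from typing import Callable, Dict, List


def heart_rate_weight(bpm: float) -> int:
    """Weight for one heart-rate observation (beats per minute)."""
    if bpm < 40 or bpm > 180:
        return 4  # from the text: very slow or very fast
    if 70 <= bpm <= 110:
        return 0  # from the text: normal range
    # ASSUMPTION: the text does not give the intermediate bands, so a
    # single placeholder weight of 2 stands in for weights 1 to 3.
    return 2


def acute_physiology_score(
    observations: Dict[str, List[float]],
    scales: Dict[str, Callable[[float], int]],
) -> int:
    """Sum, over all factors, the highest weight observed for that factor.

    A factor with no recorded observations is assumed to be normal and
    therefore contributes 0, as the text describes.
    """
    total = 0
    for factor, weight_of in scales.items():
        values = observations.get(factor, [])
        total += max((weight_of(v) for v in values), default=0)
    return total


# Hypothetical patient with three heart-rate readings in the first 24 hours.
scales = {"heart_rate": heart_rate_weight}
observations = {"heart_rate": [88, 150, 185]}
score = acute_physiology_score(observations, scales)  # 4
```

A full implementation would register one scale function per factor; the additive structure stays the same.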
The second part of the APACHE index is an assessment of preadmission or
chronic health status. This scale has four categories: (a) good health; (b) mild to
moderate limitations; (c) serious limitations; and (d) severe restrictions of
activity. Patients are assigned one of these categories by the attending clinician
after reviewing the patient's medical record, including the history and physical,
and answering ten multiple-choice questions about the patient. These are
general health status items that are designed to reflect the patient's chronic
health status approximately six months prior to ICU admission. Research has
shown that patients with severe chronic illnesses are four times as likely to die
during their ICU stays as patients with no chronic conditions (29). Examples of
the component scales of the APS are shown in Table 2.

Figure 2 Patient management categories and paths: breast neoplasms. [Flow diagram with columns Reason for Admission, Diagnostic Workup, Diagnoses, Components of Therapy, and Lengths of Stay. *Metastatic workup for breast malignancy: bone marrow biopsy, bone scan, liver function studies, liver scan, brain scan, mammography/xeroradiography, chest x-ray.] Source: Young, W. W., Swinkola, R. B., Zorn, D. M. 1982. The measurement of hospital case mix. Med. Care 20:509.

Table 2 APACHE: Components of acute physiology score

Vital systems       Component scales

Cardiovascular      Heart rate ventricular response
                    Mean blood pressure
                    CPK-MB or EKG evidence of acute MI
                    Lactate meq/L (serum)
                    pH (blood)
Respiratory         Respiratory rate total nonventilated
                    A-aDO2 (100%)
Renal               Urine output/day
                    Serum BUN
                    Serum creatinine
Gastrointestinal    Serum amylase IU
                    Serum albumin
                    Bilirubin (total)
                    Alkaline phosphatase (serum)
                    Anergy (response to skin tests)
Hematologic         Hematocrit
                    White blood count (total)
                    Protime, no anticoagulants
Septic              CSF-positive culture
                    Blood-positive culture
                    Fungal-positive culture
                    Temperature (rectal)
Metabolic           Serum calcium
                    Serum glucose
                    Serum sodium
                    Serum potassium
                    Serum HCO3
                    Serum osmolarity
Neurologic          Glasgow coma score

Unlike other case-mix methods, APACHE does not incorporate specific
diagnosis as a classification dimension because it is not usually relevant for
explaining use of the ICU. Need for life support or close monitoring is the
major reason for ICU admission, and this is related to level of functioning of the
critical organ systems. The developers of APACHE collapsed principal
diagnoses at ICU admission into the seven major organ systems, which constitute
the third part of the APACHE classification system.
The APACHE classification system is typically applied within 24 hours of
admission to the ICU, because this is the period when the greatest degree of
derangement is likely to be observed and when the necessary data are likely to
be available. While APACHE can be applied at any time subsequent to this, it
will reflect the results of therapy.

APACHE appears to have limited usefulness as an approach for measuring
overall hospital case mix because of its focus on intensive care. However, the
principle underlying the APS, objective physiologic measurement, can be
extended to all case types admitted to the hospital in order to achieve greater
precision in classification within general disease areas. Preliminary work by the
developers of APACHE has resulted in identification of 12 factors included in
the APACHE that can serve as a useful subset of the 33 factors with wide
applicability (31a). These are heart rate, mean blood pressure, respiratory rate,
temperature, Glasgow coma score, hematocrit, white blood cell count, serum
potassium, serum sodium, serum creatinine, serum pH, and serum PaO2.
These variables are presently measured on nearly all hospital admissions, and
can be scored and summed in the same fashion as with APACHE.

Patient Severity Index

PURPOSE The purpose of the Patient Severity Index (PSI) is to measure the
severity of the patient's illness, so that the performance of physicians with
respect to prescribing appropriate lengths of stay or ancillary services can be
examined and the overall costliness of the hospital can be predicted (25).

DESCRIPTION The PSI is based on clinical and utilization data contained in
the patient's medical record and has seven components: (a) stage of disease,
from preclinical to life-threatening; (b) severity of complications; (c) concurrent
interacting conditions that are likely to affect the hospital course of the
illness; (d) level of nursing care required; (e) performance of procedures
outside of the operating room, from noninvasive diagnostic procedures to
emergency life-sustaining procedures performed in the emergency room; (f)
rate of response to therapy; and (g) completeness of response to therapy. At the
time of discharge, the patient's medical record is reviewed and the patient is
rated on a four-point scale for each of the above seven factors. The rater then
subjectively integrates the values of these variables into an overall severity
score ranging from one to four. Thus, the inter-item weighting scheme is not
specified a priori, but is determined by each rater.
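The structure of a single PSI rating can be sketched as a small record type. The class name and field names are hypothetical shorthand for the seven components listed above. The sketch deliberately contains no formula linking the components to the overall score, because the text states that each rater integrates them subjectively.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class PSIRating:
    """One discharge rated on the seven PSI components (each 1 to 4)."""

    stage_of_disease: int
    complications: int
    interactions: int
    dependency: int          # level of nursing care required
    procedures: int          # procedures outside the operating room
    response_rate: int
    residual: int            # completeness of response to therapy
    overall: int             # assigned by the rater, NOT computed

    def __post_init__(self) -> None:
        # Every item, including the overall score, is on a four-point scale.
        for f in fields(self):
            value = getattr(self, f.name)
            if value not in (1, 2, 3, 4):
                raise ValueError(f"{f.name} must be 1, 2, 3, or 4; got {value!r}")


# Hypothetical chart: mostly level 1 findings, judged level 2 overall.
rating = PSIRating(
    stage_of_disease=2, complications=1, interactions=1, dependency=2,
    procedures=1, response_rate=1, residual=1, overall=2,
)
```

Because the overall score is stored rather than derived, two raters can record identical component values and still differ on the overall rating, which is the property the text points out.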

The PSI is illustrated in Figure 3. The complete set of definitions for the
various levels of each factor is contained in Horn (25).
Classification of patients with the PSI requires data from the history and
physical examination, progress notes, radiological findings, laboratory results,
and nursing notes. Thus, a comprehensive review of the patient's course is
required. Reliability studies of the PSI have shown a high degree of inter-rater
concordance in assigning severity scores (23). Validity studies have revealed
positive association between PSI and total charges, length of stay, laboratory
charges, routine charges, and radiological charges (22, 24).

Characteristic        1                     2                         3                         4

Stage of principal    Asymptomatic          Moderate manifestations   Major manifestations      Catastrophic
diagnosis
Interactions          None                  Low                       Moderate                  Major
Response rate         Prompt                Moderate delay            Serious delay             No response
Therapy residual      None                  Minor                     Moderate                  Major
Complications         None or very minor    Moderate (less important  Major (as or more         Catastrophic
                                            than principal            important than
                                            diagnosis)                principal diagnosis)
Dependency            Low                   Moderate                  Major                     Extreme
Procedures            Noninvasive           Therapeutic or invasive   Nonemergency              Emergency
(non-O.R.)            diagnostic                                      life sustaining           life sustaining

Severity rating (circle one): 1 2 3 4

Figure 3 Patient severity index. Source: Horn, S. D. 1983. Measuring severity of illness:
Comparison across institutions. Am. J. Public Health 73:26.

PURPOSE A plethora of patient classification systems have been developed
over the past 15 years for the purpose of identifying the intensity of nursing
services that corresponds to client characteristics (1-4, 6, 16, 21). Clients are
typically placed into one of three to five ordinal categories based on various
dependency, need, or acuity typologies. These classification systems have
primarily been utilized as management tools for designating nurse staffing mix
and level. Typically, hospital administrators have developed or modified
classification tools and have independently established standards of nursing
resource allocation based on time-on-task patterns in their own institutions.
These methods presume that nursing intensity and resource consumption are a
function of organizationally unique practice patterns as well as volume
changes, including patient census and nurse availability. The tools are not
standardized to account for the effects of relevant client, organizational, or
environmental characteristics on use of nursing resources over time and place.

DESCRIPTION The essence of an acuity classification system is as follows:
first, the nursing staff develops a comprehensive inventory of nursing services
and procedures. Then, a period of very intensive data collection follows in
which every nursing service provided to each patient is logged, together with
the time required to perform that service. The average nursing resources
required per patient-by-shift are computed. A system for classifying patients by
level of nursing needs (acuity) is developed by clustering selected activities and
their average performance time. Most acuity systems incorporate level of
dependence (as indicated by activities of daily living), treatments (dressing
changes, isolation procedures, irrigations, etc), medications (oral and IV),
monitoring (instability, suicide precautions, etc), patient education, and other
nursing activities. Each nursing acuity class is assigned an intensity weight
equivalent to hours of care. The range of weights is subdivided into four or five
ordinal classes. The sum (or average) of these ordinal scores, together with the
number of patients, determines the unit staffing level required, based on a
predetermined set of coefficients for that unit. An example of an acuity system
is shown in Figure 4.
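The chain from logged activities to a unit staffing requirement can be sketched as follows. The activity point values are ones that appear on the Figure 4 form. The class cut points, the hours attached to each class, and all function names are assumptions chosen for illustration, since each unit sets its own coefficients.

```python
from bisect import bisect_right
from typing import Dict, List

# Point values for a few activities as listed on the Figure 4 form.
ACTIVITY_POINTS: Dict[str, int] = {
    "traction": 10,
    "simple_dressing": 5,
    "complex_dressing": 15,
    "repetitive_requests": 15,
    "transfer_room_to_room": 10,
}

# ASSUMPTION: cut points that divide total points into acuity classes 1-4.
CLASS_CUTPOINTS: List[int] = [15, 30, 50]

# ASSUMPTION: nursing hours per patient per shift for each acuity class.
HOURS_PER_CLASS: Dict[int, float] = {1: 1.0, 2: 2.0, 3: 3.5, 4: 5.0}


def acuity_class(activities: List[str]) -> int:
    """Total the points for a patient's activities and map them to a class."""
    points = sum(ACTIVITY_POINTS[a] for a in activities)
    return bisect_right(CLASS_CUTPOINTS, points) + 1


def unit_hours(patients: List[List[str]]) -> float:
    """Nursing hours the unit needs for the coming shift."""
    return sum(HOURS_PER_CLASS[acuity_class(p)] for p in patients)


# Hypothetical three-patient unit.
patients = [
    ["simple_dressing"],                                   # 5 points, class 1
    ["traction", "complex_dressing"],                      # 25 points, class 2
    ["traction", "complex_dressing",
     "repetitive_requests", "transfer_room_to_room"],      # 50 points, class 4
]
required = unit_hours(patients)  # 1.0 + 2.0 + 5.0 = 8.0
```

Changing only `CLASS_CUTPOINTS` or `HOURS_PER_CLASS` shifts the staffing level without changing the ordering of patients, which is why locally tailored systems can rank patients alike yet imply different nurse-to-bed ratios.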
Patient acuity systems are prospective judgments regarding the level of
nursing care required for a patient during the next eight-hour shift. They are not
a measure of severity of illness, although more severely ill patients generally
require more nursing care. Acuity is not a measure of actual services provided
to patients, as it is a prospective index. However, it can be assumed that over
large numbers of patients, the nursing care plans imbedded in acuity judgments
are actually implemented, so that acuity is likely to be a good proxy measure of
nursing intensity. Curtin (9) has suggested that acuity and DRGs be combined
into a single classification scheme, with each DRG subdivided into four acuity
levels for the purposes of reimbursing for nursing services.

Comparison of Case-mix Methods

The similarities and differences among the six case-mix measures described
above are highlighted in Table 3. One of the important conceptual differences is
whether the scheme is objective, based on diagnostic and other clinically
pertinent attributes of the patient, or subjective, based on direct observation of
the patient. Another important difference is whether the scheme is "generic," in
the sense of being applicable at different levels of analysis, or is a mutually
exclusive and exhaustive classification of all patients.

Figure 4 Acuity classification system. [Point-scoring form covering assistance with activities of daily living*, treatments, medications and IV therapy, monitoring, teaching, emotional needs, and transfers. Legible entries include: minimal assistance (needs help in 1 or 2 areas), 5; moderate assistance (3 or 4 areas), 15; total care (5 or 6 areas), 20; traction, 10; simple dressing, 5; complex dressing, 15; repetitive requests, 15; transfer (room to room), 10; transfer (unit to unit or outside), 15.] *Activities of daily living: bathing, dressing, eating,
toileting, transferring, ambulating. Source: Bess Kaiser Medical Center, Kaiser Permanente,
Northwest Region, Portland, Oregon.

Diagnostic Classification
Three of the systems are based on a preexisting diagnostic classification
system: DRGs, Staging, and PMCs. The International Classification of
Diseases, 9th Revision, Clinical Modification (ICD-9-CM) (7), a classification of
diseases, injuries, impairments, symptoms, and causes of death, is now widely
used by US hospitals. The scheme includes over 10,000 categories. The
ICD-9-CM is a comprehensive coding system. It attempts to code the structural
aspects of disease for the pathologist, the clinical manifestations of disease for
the clinician, and the vagaries of human behavior for the statistician. This
comprehensiveness is both a strength and a weakness. There is likely to be a
code number to represent every possible reason for contact with the health care
system: disease, health problem, prevention, consultation, administrative
requirement, and so on. The ICD-9-CM probably represents the most
comprehensive, finely divided classification system for inpatient care. In this sense, it
represents the ultimate frame of reference for all other classifications related to
in-hospital activities. There are, however, some critical shortcomings with this
scheme, especially for use as the foundation of a case-mix measure.
The ICD-9-CM is not consistent in its coding criteria across diseases. For
example, the dimension of severity, i.e. mild, moderate, severe, may be
applied to any diagnosis, but it is specifically allowed in this system for only a
few selected diagnoses. At any given level of classification, the cells are not
conceptually homogeneous. Other dimensions not uniformly applied to each
diagnosis include manner of detection, etiologic agent, and specific location.
When these factors are not consistently coded, a case-mix measure cannot
incorporate them on a uniform basis across all classes.

Utilization Criteria
Two of the case-mix approaches are dependent on utilization data for
classification: DRGs and PSI. Cardiac surgery DRGs, for example, do not require that
the indications for a specific procedure be present, other than a principal
diagnosis of cardiac disease, only that the procedure was actually performed.
Classification of burn cases by whether or not they were transferred to another
hospital is, literally, an assessment about how the case was handled in the
health care system, rather than an assessment of the intrinsic nature of the burn
injury or the physiologic response. One can presume that most transfers of burn
cases represent transfers to burn units where the patient can receive more
intensive care, appropriate to the severity of the burn, and that the transferring
hospital would receive less payment because less had been done for these cases.
Hospitals can affect the classification of patients in this MDC by whether or not
the patient is transferred, independent of the actual severity of the burn or the
medical need for the transfer. Patients with nonextensive burns who do not
receive an operating room procedure have a relative payment weight of 1.4225
under the Medicare program; if such a case is transferred, the payment weight
rises by 47% to 2.0902 (10). Full payment is received in the event of a transfer
because the definition of the DRG includes the transfer.
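The size of the incentive can be checked directly from the two payment weights quoted above. The function name is hypothetical; the weights are the ones cited in the text.

```python
def relative_increase(old_weight: float, new_weight: float) -> float:
    """Fractional change in payment weight from old_weight to new_weight."""
    return new_weight / old_weight - 1


# Medicare relative payment weights quoted in the text (10).
WEIGHT_NOT_TRANSFERRED = 1.4225
WEIGHT_TRANSFERRED = 2.0902

increase = relative_increase(WEIGHT_NOT_TRANSFERRED, WEIGHT_TRANSFERRED)
print(f"{increase:.1%}")  # 46.9%, which the text rounds to 47%
```

Because payment is proportional to the weight, the same percentage applies to the dollar amount whatever base rate the payer uses.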
Ideally, a case-mix classification system should be independent of the
utilization patterns because the nature of the disease is exogenous to the
hospital. Cases should not be transformed (other than cured) by how physicians
decide to treat them. The rationale for employing utilization as a measure of
output is two-fold: one is that under the assumption of constant efficiency,
inputs can serve as proxy measures of outputs. If the patient received surgery
and if the physician is providing care of acceptable quality, then it can be
assumed that the patient needed the surgery. Since utilization data are more
readily available than clinical data, this approach is quite attractive. The other
attraction of the DRG and PSI approaches is their greater acceptability to
providers because output is counted by their use of inputs, so that no independent
judgment of appropriateness of services is implied by the system.

Table 3 Comparison of six case-mix measures

Characteristic          Diagnosis-Related     Disease Staging       Patient Management    APACHE                Patient Severity      Acuity
                        Groups                                      Categories                                  Index

Purpose                 Iso-resource use      Iso-disease           Iso-disease           Iso-disease           Iso-severity          Iso-nursing needs

No. of classes          467 x 3               1600+                 1000+                 Very large            4 and up              4

Source data             Discharge abstract    Medical record or     Discharge abstract    Medical record or     Medical record        Direct observation
                        or medical record     discharge abstract    or medical record     direct observ.

Type of classification  Objective             Objective             Objective             Objective             Subjective            Subjective

Actual utilization      Yes                   No                    No                    No                    Yes                   No
in class defs.

Classification          Principal diagnosis;  Principal diagnosis;  Principal diagnosis;  Principal diagnosis;  Diagnoses; history    Nursing assessment
variables               sec. diag.; age;      sec. diag.; sex;      sec. diag.; reason    lab tests; history    and physical; lab
                        prin. procedure;      surgery; discharge    for admission;        and physical          tests; x-rays;
                        sec. procs.; sex;     disposition           chief problem;                              progress and
                        discharge disposition                       elective proc.                              nursing notes

Comorbidity included?   Yes: specific         Yes: secondary diag.  Yes: secondary diag.  Yes: multiple system  Yes: any factor that  Yes: any factor that
                        diagnoses             as comorbidity, and   as comorbidity and    impacts               affects length of     affects ...
                                              complications         complications                               stay

Iatrogenesis            No                    No                    No                    No                    No                    No

Deaths omitted?         No                    No                    No                    Yes                   No                    Yes

Outliers omitted?       Yes                   No                    No                    No                    No                    No

Discharge to other      No                    No                    No                    No                    No                    No
facility or against
advice

Affected by errors in   Yes                   Yes                   Yes                   No                    No                    No
diagnosis coding on
abstracts?

Automated?              Yes                   Yes                   Yes                   No                    No                    No

Mutually exclusive and  Yes                   Yes                   Yes                   No                    Yes                   Yes
exhaustive?

When applied?           When data become      When data become      After discharge       24 hr after admit     After discharge       Admission and
                        available             available                                                                               every 8 hr

Need for revisions      Shifts in patterns    New diseases          Shifts in patterns    New diagnostic        None                  None
                        of care; new                                of care; new          techniques
                        diseases                                    diseases

Use of observed utilization patterns in a case-mix scheme will incorporate
differences in efficiency, quality of care, availability of inputs, and patient
compliance with treatment regimens, as well as the nature of the illness. It is
only the last factor that should be included in any case-mix measurement
scheme. Moreover, definition of classes on the basis of observed resource use
combines diseases that are very different from a medical perspective. DRG 1,
"patients over age 17 who received a craniotomy and who did not have trauma
as the principal diagnosis," does not distinguish between aneurysm, tumor, or
abscess. To a physician, it is not self-evident why these different diseases were
combined into a single DRG. It is difficult to understand what some DRGs are
measuring. This problem may be alleviated somewhat by creating a set of
iso-illness categories prior to mapping them onto a set of iso-resource categories.

Iso-resource Classification
From a theoretical perspective, the DRGs are not based on any definition of the
concept of resource use. Resource heterogeneity can be defined in terms of
length of stay, total costs, type or mode of treatment, type of manpower inputs
required, minutes of physician time required, and so on. Different case-mix
measures are likely to be implied by these differing indicators of resource use.
Users of the DRGs should always bear in mind the fact that they were derived
on the basis of length of stay as the measure of resource use.
The DRGs are not organized by etiology or stage of disease. Complications
and comorbidities enter the scheme as having effects or not; no attempt is made
to scale these factors in terms of their expected influence on patterns of care.
Interactions among multiple complications are not detected by the DRGs.
One of the most important drawbacks to DRGs is that they depend on the
specific data base and analysis employed in their derivation. Data from a
different set of hospitals would probably contain different patterns of practice,
yielding different splits. Moreover, Fetter et al (13, 14) do not define "clinical
meaningfulness." Rather, clinical meaningfulness and the appropriate degree
of variance reduction were defined operationally by the DRG physician
consultants, that is, meaningfulness was what they found to be meaningful among
themselves. Even with identical input data, another group of physicians might
define DRGs differently. This was demonstrated by Young et al (33).
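The dependence of a split on the data base can be illustrated with a small variance-reduction calculation on length of stay. This is a generic sketch, not the procedure Fetter et al used; the split variable ("with_cc" versus "without_cc"), the function names, and both sets of lengths of stay are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def sum_of_squares(values: List[float]) -> float:
    """Sum of squared deviations from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def variance_reduction(groups: Dict[str, List[float]]) -> float:
    """Share of total length-of-stay variation removed by the split."""
    pooled = [v for g in groups.values() for v in g]
    total = sum_of_squares(pooled)
    within = sum(sum_of_squares(g) for g in groups.values())
    return 1 - within / total


# Hypothetical lengths of stay (days) for the same candidate split,
# drawn from two different sets of hospitals.
hospitals_a = {"with_cc": [8, 10, 12], "without_cc": [3, 4, 5]}
hospitals_b = {"with_cc": [5, 6, 7], "without_cc": [4, 6, 8]}

reduction_a = variance_reduction(hospitals_a)  # 0.84375
reduction_b = variance_reduction(hospitals_b)  # 0.0
```

In the first data set the split removes most of the variation and would likely be kept; in the second it removes none and would be dropped, so the resulting groups differ even though the candidate split is the same.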

Need for Revisions

Patterns of practice change over time, so any DRG scheme would begin to
accumulate a number of exceptions, anomalies, and problems from its
inception. Moreover, new diseases are discovered; leukemia used to be considered a
single disease entity, but now it is known to be at least eight or nine. The DRGs
must be periodically reviewed and reevaluated in terms of new diseases and
within-group homogeneity of resource use. A new procedure that reduces costs
and length of stay for one specific condition within a particular DRG may
render the DRG too heterogeneous to serve as the basis for reimbursement. In
contrast, Staging criteria need be revised only in light of new developments in
understanding of disease processes. APACHE would be revised as new
laboratory tests are devised to replace component scales in the APS.

Despite their problems, DRGs do have at least four advantages: (a) they
represent considerably fewer case types than the ICD-9-CM; (b) they can be
applied by means of a simple classification algorithm (now coming to be known
as the "DRG Grouper") to a few data on computerized hospital discharge
abstracts; (c) they represent a sufficiently large number of case types so that
case-mix differences among most hospitals can be detected; and (d) DRGs have
become familiar to many persons in hospitals and reimbursement programs.
Thus, DRGs are the most widely accepted case-mix measure.
Disease staging is likely to be acceptable to physicians because it was
derived by physicians for use by the medical profession and because of their
previous exposure to cancer staging. The medical staging criteria have not been
directly validated to determine whether patients in Stage 2 of a given disease do
actually have a higher risk of adverse outcome than patients in Stage 1.
Moreover, the face validity of stage definitions has not been determined via
review by an independent group of physicians or critique from the various
specialty societies. [Several sets of criteria related to otological conditions have
been through a peer review process and published in a specialty journal (32).]
The Staging criteria would be much more acceptable to administrators,
researchers, and policymakers if high interphysician agreement on stage rankings
and definition could be demonstrated.

Reason for Admission

PMPs are unique in their recognition of reason for admission (RFA) as a
classification criterion. Obviously, this factor is important for reducing
heterogeneity of resource use within a general disease category. The RFA contains
two concepts that should be distinguished for purposes of case-mix
classification. One is "stage of treatment." It is important to discriminate between
patients admitted for initial workups and those admitted for follow-up.
The other concept included in the RFA is "elective procedures." While
Young et al (34) use this concept to mean "schedulable," in contrast with urgent
and emergent admissions, there are some cases where it is the patient's demand
for care that defines the episode. Some of these are disease-related and others
are not.

Subjective Classification
The two so-called "generic" case-mix methods, PSI and acuity, are limited
only by the discriminatory powers of the raters, i.e. nurses and medical records
technicians, in achieving precision. It is quite possible for a clinical rater to
identify thousands of classes, but the meaning of those classes depends on the
assumptions made by the raters in assessing severity and nursing needs.
Application of the PSI is relatively labor intensive, which reduces its usefulness
as a case-mix measure in applications to large-scale databases. It has not
been computerized, and approximately seven to eight minutes are required to
rate a chart for severity using the PSI. In preliminary applications of the PSI
paired with medical record coding for the discharge abstract, the incremental
time required to assign the severity score was around two minutes (S. D. Horn,
personal communication).
The classes of the PSI are ordinal in nature, in that the difference between
level 1 and level 2 patients is not the same as between level 2 and level 3
patients; moreover, level 2 patients are not twice as severely ill as level 1
patients.
When applied within specific diseases, the PSI provides more homogeneous
groups of patients with respect to resource consumption than the DRGs,
Staging, and PMCs (25). The PSI is meant to be "generic" in that it can be
applied across, as well as within, diagnostic categories. That is, all level 1
patients can be grouped, regardless of diagnosis. It is not surprising that the PSI
performs well in classifying patients because the rater must read the entire
medical record for a given episode to produce a severity rating. To duplicate
this intuitive process on the computer would require an extremely large set of
variables and a very complex decision logic. Thus, the subjective approach
enables classification on the basis of complex criteria that would not otherwise
be feasible. However, it is not possible to observe or replicate this process for
any given patient, so that one cannot be sure on what basis the rater arrived at a
severity assignment. For example, the rater may use resource use as an
indicator of severity: "the patient stayed 20 days so he must have been very
sick." Also, the medical record is, in essence, a justification for what was done
for the patient. Evaluation of the patient's response to therapy, when assessing
severity, assumes that the treatment itself was effective and appropriate. The
medical record does not provide an exogenous set of criteria to discriminate
between resistant diseases and inappropriate diagnosis and treatment.

Five of the case-mix measures described herein represent standardized
measures that are designed to be replicable across multiple applications. Acuity,
however, represents thousands of tailored variants. Standardization of acuity
systems has not been pursued because nursing staffs have felt the need to
account for regional, institutional, and clinical specialty differences in patterns
of practice. In addition to interhospital variation, there is often considerable
intrahospital variation as each nursing unit or clinical department modifies the
rating criteria to fit its own standards of practice. It is hypothesized that
different nursing acuity systems are likely to maintain the same relative
discrimination among a group of patients but that they will differ in the implied
baseline for the acuity scale, thus giving rise to significant interhospital and
interunit differences in the ratio of nurses per occupied bed. Hospitals with
higher resources may not use a different technology altogether but may
maintain the same relative ordering of services within and across case-types. The
alternative hypothesis is that acuity systems do not have any correspondence
with one another, which seems highly unlikely given that all of the systems are
looking at approximately the same nursing tasks with the same technology.
Acuity systems are likely to be rather robust in their sensitivity to case-type
differences in nursing resource use.
One of the oft-cited limitations of acuity ratings is the bias toward inflating
the ratings by nurses to acquire additional staffing for the unit. Nursing
administrators report that they know who are the most flagrant abusers of the
acuity system and these persons are routinely counselled to bring their ratings
into line. Thus, the acuity system may be subject to some upward creep, but
administrators exert countervailing pressures. However, it is not likely that
acuity creep will be systematically different between cases. Creep is likely to be
shift-specific and, perhaps, unit-specific, so that over the course of the stay, the
acuity rating will vary randomly around the "true" acuity level. That is,
manipulation of acuity ratings is presumed to add random disturbances to the
data, rather than confounding effects.

Valuation of Output
A disadvantage of all the case-mix systems developed to date is that they are not
true measures of output. The ultimate output of a hospital is its contribution to
social welfare. This means that case-types should be valued according to their
social worth, not according to their costs of treatment. An expensive case-type
is merely one that is costly to treat, not one that deserves to be treated. While it
is true that sicker patients usually require more care and services, and that
society usually stipulates that the sickest get higher priority in resource allocation,
the risk in case-mix measurement is that the "black box" of the complicated
classification algorithms obscures an essential feature: greater case-mix
complexity is a reflection of the nature of cost, rather than of the nature of social
welfare. For example, the cost of treating a complicated closed head injury can
be computed and can be shown to be higher than that for uncomplicated
pediatric otitis media. But, it is a different issue to say that treatment of a head
injury is worth more than treatment of otitis media. This caveat is issued to
warn of the potential trap of any case-mix measure extant in the literature.
Social welfare criteria are required to valuate case mix in order to arrive at a true
measure of output.


Multiple Conditions
One of the critical problems not addressed by most of the above classification
systems is that patients often present with more than one diagnosis or symptom.
This is usually handled by forcing a choice of the principal diagnosis or chief
complaint. While this is necessary, the analytic problem of distinguishing
among patients with multiple problems remains. The APACHE combines
different approaches to assess simultaneous acute and chronic problems. A
major theoretical and empirical task for future research is to develop appropriate
methods to handle multiple, simultaneous illness episodes in the same

A second research need derives from the imperatives of the Medicare program.
The manipulability of DRGs and the interhospital differences in within-DRG
variation in use of resources create an imperative for accelerated development
of objective case-mix measures, preferably based on common, inexpensive
laboratory tests. APACHE and Staging show the greatest promise for extension
in this direction.

Dimensions of Resource Use

A third area for research concerns the structure of resource use across and
within the various case types within each system. For purposes of reimbursement,
an iso-resource case-mix scheme is desired. However, the various
components of hospital services should be examined: nursing, radiology,
drugs, surgery, and so on. This includes dynamic patterns of resource use over
the stay. Two cases may use identical amounts of services overall, but one
receives the services concentrated within the first three days after admission,
whereas the other receives the services distributed uniformly over the stay.
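The contrast between the two cases can be illustrated with a short sketch. The daily service units, the nine-day length of stay, and the function name are hypothetical and chosen only so that the totals match.

```python
def early_share(daily_services, first_days=3):
    """Fraction of a stay's total services delivered in the first days."""
    return sum(daily_services[:first_days]) / sum(daily_services)


# Hypothetical service units per day for two nine-day stays.
front_loaded = [30, 30, 30, 0, 0, 0, 0, 0, 0]  # all care in first three days
uniform = [10] * 9                              # care spread evenly

# An iso-resource scheme based on totals alone cannot tell the cases apart.
totals_match = sum(front_loaded) == sum(uniform)

print(totals_match)
print(early_share(front_loaded), round(early_share(uniform), 3))
```

The two stays have identical totals, so only a measure of the time profile, such as the share of services delivered in the first three days, separates them.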

Comprehensive Episode Approach

Finally, all of the case-mix methods discussed here focus on inpatient stays.
The development of a classification scheme for entire illness episodes is needed
for measuring the output of the health care system. Such a scheme, when used
as the foundation for a payment system, can provide improved incentives, as
compared to DRG-based payment, regarding use of the hospital.


The six case-mix measures presented here (DRGs, Staging, PMPs,
APACHE, PSI, and Acuity) each have significant strengths and weaknesses.
Progress in the case-mix field will result from development of eclectic
approaches that capitalize on the strengths of each approach. Further development
will also create a system of nested measures whose properties are well
understood. For the present, creative manipulation of DRGs by hospitals will
continue to evolve. The DRGs were developed under the assumption that
diagnostic and procedure coding were not related to hospital financial position.
Many hospitals are now acquiring software to optimize assignment of DRGs at
the point of coding the patients' medical records. If all hospitals have equal
opportunity and are equally proficient in this process, then the relative ordering
of case mixes among hospitals is likely to be preserved. However, it is possible
that some hospitals will play the "DRG game" better than others, thus prolonging
the difficulty of attaining horizontal and vertical equity in hospital reim­

