Single-Case Experimental Designs

Michel Hersen

Michel Hersen • Department of Psychiatry, Western Psychiatric Institute and Clinic, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania 15213.

A. S. Bellack et al. (eds.), International Handbook of Behavior Modification and Therapy
© Plenum Press, New York 1990

Introduction

Behavior modification and therapy perhaps are best distinguished from other therapeutic and educational approaches by their dependence on the experimental-empirical methods for solving human problems. Thus, in evaluating the efficacy of emerging therapeutic and educational techniques, a large variety of experimental strategies has been carried out by behavioral researchers. Included, of course, are both group-comparison designs (cf. Kazdin, 1980) and single-case experimental designs (Barlow & Hersen, 1984; Hersen & Barlow, 1976).

Although group comparison and single-case design strategies are both well within the armamentarium of behavioral researchers, the single-case design approach to evaluating technical efficacy is almost uniquely tied in with the behavioral movement of the last two decades. That is, the single-case research approach has been followed not only in clinical psychology (Hayes, 1981), psychiatry (Barlow & Hersen, 1973), and education (Risley & Wolf, 1972; Thoresen, 1972), but also in the practice of social work (Thomas, 1978), rehabilitation (Martin & Epstein, 1976; Van Hasselt & Hersen, 1981), behavioral medicine (Barlow, Blanchard, Hayes, & Epstein, 1977), and, most recently, internal medicine (Anonymous Editorial, 1986; Guyatt et al., 1986; Louis, Lavori, Bailar, & Polansky, 1984; McLeod, Cohen, Taylor, & Cullen, 1986; Robin & Burke, 1986). Thus, the widespread application attests to both its popularity and its utility.

The importance of the single-case experimental study to behavior therapy was best illustrated by Yates (1970). Indeed, he felt compelled to define behavior therapy specifically in relation to single-case methodology. He argued that

    behavior therapy is the attempt to utilize systematically that body of empirical and theoretical knowledge which has resulted from the application of the experimental method in psychology and its closely related disciplines (physiology and neurophysiology) in order to explain the genesis and maintenance of abnormal patterns of behavior; and to apply that knowledge to the treatment or prevention of those abnormalities by means of controlled experimental studies of single cases, both descriptive and remedial. (p. 18)

This strong link between behavior therapy and research is reflected in the single-case publications in psychological (e.g., Journal of Consulting and Clinical Psychology), psychiatric (American Journal of Psychiatry, Archives of General Psychiatry), and, of course, the behavioral journals (e.g., Behavior Modification, Behaviour Research and Therapy, Behavior Therapy, Journal of Applied Behavior Analysis, Journal of Behavior Therapy and Experimental Psychiatry).

In this chapter, we first briefly trace the history of the single-case approach, particularly as it relates to the problems and limitations of the group comparison
method. This is to be followed by a discussion of more general issues involved in research, such as variability, intrasubject averaging, and the generality of findings. Next, we outline the basic procedures followed in single-case evaluations: repeated measurement, choice of a baseline, changing of one variable at a time, length of phases, distinction between reversal and withdrawal, and evaluation of irreversible procedures. Then we discuss and illustrate A-B-A designs and their extensions (e.g., interaction designs, drug evaluations). This is followed by our examination of additional design strategies (e.g., the three types of multiple-baseline designs, the multiple-schedule and simultaneous-treatment designs, and the changing criterion design). Next, we tackle the thorny issue of the role of statistical analyses in evaluating treatment efficacy in single-case studies. Highlighted are the arguments in support of and against the use of such statistical techniques. Finally, we discuss the importance of replication in single-case research. Three types of replication methods are to be considered: direct, clinical, and systematic.

History

An historical perusal clearly shows that the single-case approach, as currently applied, owes its heritage to many disciplines (cf. Barlow & Hersen, 1984; Hersen & Barlow, 1976, Chapter 1; Kazdin, 1978). There can be no doubt that the single-case study has been important in the development of physiology, medicine, early experimental psychology, and psychoanalysis. In all of these disciplines, many critical findings have emerged from the careful study of individual organisms and subjects.

The tradition of single-case research dates back to the 1830s, as exemplified by the work of Johannes Müller and Claude Bernard in physiology. More important from a historical perspective, however, is the contribution of Paul Broca in 1861. At that time, Broca was treating a patient who had suffered a severe speech loss. However, the patient died while still under his care. Broca subsequently performed an autopsy and discovered a lesion in the man's cerebral cortex (i.e., in the third frontal convolution). He correctly assumed that this part of the brain controlled speech functions. As pointed out by Barlow and Hersen (1984), Broca's clinical method was an extension of prior work done in laboratories in which parts of the brains of experimental animals were systematically excised (i.e., the extirpation of parts). The relationship of such surgical excisions and subsequent behavioral changes in single organisms was meticulously studied, thus providing "an anatomical map of brain functions." The critical point to be underscored here is that findings of wide generality were gleaned on the basis of experimental work with very few research subjects.

It is generally agreed that Fechner's publication in 1860 of Elemente der Psychophysik heralded the beginning of experimental psychology. In this treatise, Fechner described studies he had conducted, using individual subjects, to determine sensory thresholds and just-noticeable differences in a variety of sense modalities. Although he did apply some statistical methods in evaluating his work, such statistics were employed to ascertain variability within a given subject. Following Fechner's studies in psychophysics, Wundt and his colleagues evaluated sensation and perception, while Ebbinghaus assessed the processes of learning, developing a new tool for conducting such research: the nonsense syllable. Both of these giants in the history of psychology accomplished their goals by studying individual subjects. Later, in the early part of the twentieth century, Pavlov's classical experiments in physiology, learning, and conditioning were all conducted with single organisms.

With the emergence of the group comparison methods, bolstered by the statistical genius of R. A. Fisher in the 1930s (i.e., inferential statistics), interest in the single-case approach during the middle part of this century waned considerably. Of course, the psychoanalysts did (and continue to) publish their descriptions of protracted treatments of individual patients. Probably the first was Breuer and Freud's case history published in 1895 (1957), describing the systematic treatment of Anna O.'s hysterical symptoms. (Parenthetically, we might note that Hersen and Barlow, 1976, Chapter 1, have likened Breuer and Freud's approach to the multiple-baseline design across behaviors.) Nonetheless, these reports, albeit of tremendous therapeutic import, generally had subjective interpretations of results, in that usually no hard data were presented. However, the psychoanalytic case study certainly may be considered one of the antecedents to the single-case experimental tactic.

In the 1920s and 1930s, there were some sporadic descriptions of the behavioral treatment of individual cases of unusual interest (cf. Max, 1935; J. B. Watson & Rayner, 1920). But these single-case descriptions
appear to have had little impact on therapeutic attitudes of the day and on subsequent strategies developed to assess therapeutic efficacy.

There are several other historical antecedents that warrant our attention. Most outstanding, of course, is the operant work of B. F. Skinner and his students in the 1940s and 1950s. Skinner (1966) has stated his philosophy of research in very succinct form: "Instead of studying a thousand rats for one hour each, or a hundred rats for ten hours each, the investigator is likely to study one rat for a thousand hours" (p. 21). The specific experimental strategies used in the experimental analysis of behavior (with special emphasis on research with animals) were compiled and elucidated in Sidman's (1960) now-classic tome entitled Tactics of Scientific Research. However, this book was written prior to the plethora of behavior therapy studies that appeared in the 1960s, the 1970s, and now in the 1980s. (For a more comprehensive description of the use of single-case strategies as applied to humans in therapeutic endeavors, the reader is referred to Hersen and Barlow, 1976, and Barlow and Hersen, 1984.)

In the more clinical realm, the contribution of Shapiro (1966) and Chassan (1967) must not be overlooked. Both were committed to the intensive study of the single case in a methodologically rigorous manner. Although neither of the two used the current nomenclature (e.g., A-B-A) for describing their single-case strategies, a number of the reported cases (e.g., Shapiro & Ravenette, 1959) bear a striking similarity to the prototypical A-B-A design. However, for the most part, the work of Shapiro and Chassan may be described as correlational. That is, the experimental control of therapeutic variables over dependent measures is not as clearly specified as in the reports of today's behavior analysts (cf. Hersen & Barlow, 1976; Kazdin, 1975).

Group Comparison Designs

It was in the late 1940s and 1950s that the effects of psychotherapy began to be evaluated in large-scale group comparison designs (see Rubenstein & Parloff, 1959, for a review of the issues). However, very quickly some of the major shortcomings were pointed out in both the therapeutic techniques themselves (cf. Eysenck, 1952) and the design strategies carried out by clinical researchers (cf. Bergin, 1966). Eysenck (1952) compared the improvement rates of treated patients and "spontaneous remission" rates (evaluated from insurance company records) and concluded that the effects of psychotherapy (as then practiced) were negligible at best. This finding, of course, sparked a tremendous controversy in the psychological world, which still rages at times. Bergin (1966) reevaluated the disappointing results of psychotherapy when contrasted with control group procedures and discovered that some patients improved as a function of treatment, whereas others actually worsened. Indeed, the statistical averaging of results (employing the group comparison method) led to a canceling of treatment effects for a fairly substantial number of patients.

Bergin's (1966) work in particular clearly indicated some of the limitations of the group comparison approach to studying the efficacy of psychotherapy. As noted by Hersen and Barlow (1976), "These difficulties or objections, which tend to limit the usefulness of a group comparison approach in applied research, can be classified under five headings: (1) ethical objectives, (2) practical problems in collecting large numbers of patients, (3) averaging of results over the group, (4) generality of findings, and (5) inter-subject variability" (p. 14).

We briefly comment here on each of these limitations in turn. First, with regard to ethical concerns, the primary one is that in the group comparison strategy, the control group subjects do not receive treatment and, of consequence, are denied potential benefits. This objection, naturally, is predicated on the notion (albeit erroneous at times) that the treatment being evaluated is efficacious in the first place (cf. Eysenck, 1952). Second, the practical problems in identifying and matching subjects in large-scale group comparison studies, in addition to selecting and remunerating suitable therapists, are overwhelming. Moreover, this approach to research is time-consuming and usually requires large federal allocations. It is not at all uncommon for a 3- to 4-year outcome study in psychotherapy to cost the National Institute of Mental Health upward of $600,000. Third, already discussed, are the pitfalls involved in the statistical averaging of patients who improve or worsen as a function of treatment. Such problems led Paul (1967) to conclude that psychotherapy researchers should identify the patient who would profit from a specific therapy under very specific circumstances. Fourth is the issue of generality of findings. Inasmuch as group averaging may "wash out" the individual effects of particular treatments, the practicing clinician in the community cannot ascertain
which specific patient characteristics may be correlated with improvement. Nonetheless, if a study of this kind is planned prospectively (usually a factorial design), such information may be teased out statistically. Fifth is the concern with intersubject variability. Although ideally in the group comparison study a frequent objective is to contrast homogeneous groups of patients, in practice this often is neither feasible nor practicable. (It is obviously impossible to control for the individual learning histories of patients, irrespective of whether the presenting symptoms are identical. That is, different etiologies may result in identical symptom patterns.) Thus, again, the unique response of the individual patient to treatment is lost. Also, in most group comparison studies, the effects of treatment are indicated on a pre-post basis. As a result, the vicissitudes of therapeutic response throughout the full course of treatment are not clarified. This certainly is one area of marked import to every practicing clinician, who knows through experience about the "ups and downs" of responsivity to treatment, regardless of the theoretical approach represented.

Experimental Analysis of Behavior

In addition to the problems inherent in evaluating the effects of psychotherapy in group comparison designs, some other factors contributed to the growing importance of the single-case approach in the late 1960s and through the 1980s. First was the then-prevalent scientist-practitioner split. That is, many clinical psychologists pursued esoteric research interests that had little or no bearing on the work they conducted with their patients. Indeed, often the research carried out by such clinicians was only of academic import. As late as 1972, Matarazzo pointed out that "even after 15 years, few of my research findings affect my practice. Psychological science per se doesn't guide me one bit. I still read avidly but this is of little direct practical help. My clinical experience is the only thing that has helped me in my practice to date" (Bergin & Strupp, 1972, p. 340). As earlier argued by Hersen and Barlow (1976),

    Since this view prevailed among prominent clinicians who were well acquainted with research methodology, it follows that clinicians without research training or expertise were largely unaffected by the promise or substance of scientific evaluation of behavior change procedures. (p. 22)

With the advent of behavior therapy and the emergence of a new journal devoted to the experimental study of the individual in depth (Journal of Applied Behavior Analysis), much of the scientist-practitioner split was bridged. Although initially the great majority of the work was operant in nature, more recently other types of therapeutic strategies (e.g., systematic desensitization) have also been assessed by means of single-case methodology (see Van Hasselt, Hersen, Bellack, Rosenblum, & Lamparski, 1979). Generally, single-case research as now practiced is referred to as the experimental analysis of behavior (Baer, Wolf, & Risley, 1968, 1987).

General Issues

Intrasubject Variability

To determine the sources of variability in the subject is probably the most important task of the single-case researcher. The assessment of variability, of course, is facilitated by observing the individual over time under highly standardized conditions (i.e., repeated measurement). In the quest to determine the causes of variability, the greater the control over the subject's environment (external and internal), the greater the likelihood of accurately identifying such variability. As noted by Hersen and Barlow (1976), the task is made easier by studying lower organisms (e.g., the white rat):

    In response to this, many scientists choose to work with lower life forms in the hope that laws of behavior will emerge more rapidly and be generalized to the infinitely more complex area of human behavior. Applied researchers do not have this luxury. The task of the investigator in the area of human behavior disorders is to discover functional relations among treatments and specific behavior disorders over and above the welter of environmental and biological variables impinging on the patient at any given time. Given these complexities, it is small wonder that most treatments, when tested, produce small effects. (p. 35)

In identifying sources of variability at the human level, the researcher needs to consider biological, cognitive, and environmental variables. Although these three systems are obviously interconnected, each has some unique contributions to the problem. Biological or cyclical variability in human beings (and animals, for that matter) is best represented by the female's estrus cycle. As is well known clinically and is equally well documented empirically (see Hersen &
Barlow, 1976, Chapter 4), the dramatic hormonal changes that occur in women throughout the entire cycle (be it 24, 28, or 30 days) often yield equally dramatic changes in mood, affect, and behavior. Applied behavioral researchers evaluating effects of therapeutic interventions, particularly in female subjects whose menstrual changes in behavior are extreme, need to consider this factor when deriving conclusions from their data. Thus, it is quite conceivable that a behavioral intervention may coincide with a given part of the cycle, yielding changes in behavior (either improvement or worsening) and thus confounding the possible controlling effects of the specific behavioral technique. Indeed, what behavioral change does take place simply may be due to biological (internal) mechanisms. Certainly, in the case of the woman whose post-menses mood typically improves, improved mood after introduction of a behavioral treatment for depression may have nothing to do with the behavioral intervention. To the contrary, improved mood most likely is the progression of natural biological events rather than therapeutic efficacy.

Although of somewhat more recent interest to applied behavioral researchers (cf. Bellack & Schwartz, 1976; Hersen, 1979, 1981; Meichenbaum, 1976), the importance of the subject's cognitions can be neither ignored nor discounted. The strict operant interpretation of behavior, albeit more parsimonious, probably fails to reflect completely what truly distinguishes human beings from the lower species. Thus, when repeated measurements are conducted, the subject's emotional-cognitive state requires attention, both as to how he or she feels and thinks over time (a dependent measure) and also as to how such thinking and feeling themselves can be causative agents for altering overt behavior.

Finally, but hardly least of all, we must consider the contribution of the external environment (i.e., the contingencies of reinforcement) to specific behavioral manifestations. To date, most of the work in single-case methodology has been devoted to elucidating the environmental variables that control directly observable motor responses (cf. Kazdin, 1975). For example, in a case of conversion reaction where the patient presented himself as unable to walk, Kallman, Hersen, and O'Toole (1975) clearly documented how the family's reactions to the symptoms resulted directly in the patient's continued symptomatology. That is, the family tended to reinforce the patient's verbalizations about symptoms (as well as serving him meals in bed and absolving him from all household responsibilities) while concurrently ignoring any of the few positive verbal and motoric initiatives he did take. It was only when the family was instructed and taught to reverse the contingencies (i.e., to ignore symptomatic presentation and to reinforce positive verbal and motor behaviors) that there was a marked change in the patient's behavior that maintained itself through a lengthy posttreatment follow-up period.

Intersubject Variability

To this point, we have focused our discussion on the attempt to ascertain the sources of variability within the individual subject. However, another type of variability that concerns the single-case researcher involves the differences between and among subjects in reaction to a therapeutic or educational procedure. Small and large differences in responding between and among subjects are termed intersubject variability. We have already touched on this issue when discussing the limitations and problems of the group comparison approach to research. There we pointed out how some patients may improve as a function of treatment, whereas others may worsen. But when the entire treatment group's data are averaged and contrasted with the control condition, no statistically significant differences emerge. From the aforementioned, it is clear that intersubject variability poses an enormous problem for the group comparison researcher, even if homogeneous groups are to be contrasted. For the single-case researcher who is conducting replications of treatments in a series of patients (presumably homogeneous with regard to a particular disorder), intersubject variability is also a problem but may result in subsequent refinements of procedures. In addition, with extensive intersubject variability, the power of a particular procedure may be determined in addition to an evaluation of its possible limitations.

More specifically, a behavioral treatment for depression may prove efficacious for both males and females who have had no prior episodes of the disorder. On the other hand, for those patients who have suffered several prior depressive episodes, the same technique may be only partially effective. Under these circumstances, the single-case researcher may alter some aspect of the treatment strategy in the hope of getting improved results. As stated in Hersen and Barlow (1976),

    the task confronting the applied researcher at this point is to devise experimental designs to isolate the cause of change, or lack of change. One advantage of single case experimental designs is that the investigator can begin an immediate search for the cause of an experimental behavior trend by altering his experimental designs on the spot. This feature, when properly employed, can provide immediate information on hypothesized sources of variability. (p. 40)

On the other hand, the same behavioral treatment applied to depressives who have had prior episodes may yield absolutely no change if the patients in addition are severely obsessive. At this point, the upper limits of the behavioral strategy may have been discovered, and it behooves the single-case researcher to consider either a different behavioral strategy or the combined (synergistic) effects of the behavioral-pharmacological approach (cf. Hersen, 1979, 1986).

Magnitude of Change

In the section on statistical analysis, we consider in some detail the advantages and disadvantages of the statistical versus the visual evaluation of data trends. However, here it is important to consider the magnitude of change brought about by a particular intervention. Because of the frequently exploratory nature of single-case work, it is especially important to document the power of the technique under consideration. For example, in the treatment of a depressed individual, one may be able to document a statistically significant change if the Beck Depression Inventory (BDI) score decreases, say, from 20 at baseline to about 15 following treatment. However, the question to be raised at this juncture is: How meaningful (clinically and socially) is this 5-point diminution? Certainly, a score of 15 on the BDI still represents a considerable residue of depression. Thus, although potentially of statistical significance, the therapeutic technique would have to yield a much greater change if it is to be considered of value to the practicing clinician. In his incisive review of this issue, Kazdin (1977) has argued about the importance of providing social validation. That is, to be given clinical credence a therapeutic technique should be able to bring about sufficient change so that the treated individual approaches the social norm. Thus, in the case of our depressive, the change brought about should lead to a posttreatment score of 0-5 on the BDI. Otherwise, the norm for the nonpathological population will not have been closely enough approached.

Generality of Findings

As will be apparent in the following sections of this chapter, the main objective of the experimental analysis-of-behavior model (i.e., the single-case design) is to demonstrate the functions of the therapeutic or educational strategy that control the target behavior of interest. However, single-case researchers are intent on demonstrating this functional relationship not only in the individual case but also for other individuals who bear similar characteristics. This, then, is referred to as subject generality. That is, the same therapeutic strategy should prove effective over a number of patients with homogeneous features. Such features may relate to sex of the patients, their age, their diagnosis, their premorbid personality structure, or the family history of the disorder under investigation.

A second kind of generality concerns the behavior change agent (i.e., the therapist or educator). Given the same type of patient and the identical therapeutic strategy, do the unique characteristics of the therapist affect the outcome? More specifically, is the male therapist who carries out assertion training with an unassertive female client as effective as the female therapist doing the same treatment who is also highly committed to the goals of the women's movement? This naturally is an empirical question whose answer can be determined only via careful replication across different therapists (see the sections on direct, clinical, and systematic replication).

A third type of generality deals with the setting in which the therapeutic or educational technique is being applied. That is, will a given intervention work as well in one type of setting as in another? For example, if the flooding treatment of an agoraphobic in a rural setting appears to work, will the same therapy for an agoraphobic living in an urban center prove as efficacious?

The three types of generality discussed above are problems for the single-case researcher. Indeed, critics of the single-case approach most often point to generality of findings as one of the weaker features of this research strategy:

    The most obvious limitation of studying a single case is that one does not know if the results from this case would be relevant to other cases. Even if one isolates the active therapeutic variable in a given client through a rigorous single case experimental design, critics . . . note that there is little basis for inferring that this therapeutic procedure would be equally effective when applied to clients with similar behavior disorders (client generality) or that different therapists using this technique would achieve the same results (therapist generality). Finally, one does not know if
    the technique would work in a different setting (setting generality). This issue, more than any other, has retarded the development of single case methodology in applied research and has caused many authorities on research to deny the utility of studying a single case for any other purpose than the generation of hypotheses . . . . Conversely, in the search for generality of applied research findings, the group comparison approach appeared to be the logical answer. (Hersen & Barlow, 1976, p. 53)

However, as the old adage goes, "Appearances can be misleading." A careful scrutiny of the group comparison strategy reveals that there are limitations here, too, on establishing generality of findings. These have been discussed in considerable detail in Barlow and Hersen (1984, Chapter 2), and the interested reader is referred to that source. However, for purposes of exposition at this point, there are two problems that warrant our attention. The first is that one cannot automatically infer that the results from a homogeneous group of subjects are necessarily representative of the population of such subjects. Second, and probably of paramount importance, it is difficult to take the average response of a group of subjects and generalize to the individual case. In the section on replication (direct, clinical, and systematic), we examine how single-case researchers have attempted to document the general applicability of their findings.

Variability as Related to Generality

On the surface, one would think that variability and generality are unrelated. However, on closer inspection it is clear that by identifying as many sources of variability as possible, single-case researchers are able to improve and refine their techniques so that overall treatments have greater applicability to a wider range of subjects. Sidman (1960) contended that

    tracking down sources of variability is then a primary technique for establishing generality. Generality and variability are basically antithetical concepts. If there are major undiscovered sources of variability in a given set of data, any attempt to achieve subject or principle generality is likely to fail. Every time we discover control of a factor that contributes to variability, we increase the likelihood that our data will be reproducible with new subjects and in different situations. Experience has taught us that precision of control leads to more extensive generalization of data. (pp. 50-51)

Of course, Sidman was referring primarily to work with infrahuman species. Although the same principles hold in clinical and educational investigation with human subjects, the methods for achieving control and ferreting out sources of variability are both more difficult and more time-consuming. It is highly unlikely that the kind of environmental control that one is capable of obtaining in the animal laboratory will ever be possible in the clinical situation, because of the multiplicity of variables impinging on human beings as well as because of the more obvious ethical

General Procedures

Repeated Measurement

The hallmark of the single-case experimental design is that dependent measures are repeatedly taken during baseline and treatment phases. Such measures may involve the observation of motor behavior (e.g., the number of social interactions per half hour of free play in a socially isolated child), the assessment of physiological functioning (e.g., the heart rate of a phobic patient on presentation of the feared stimulus), or the evaluation of the cognitive-attitudinal state of the subject (e.g., the score on a self-report anxiety or depression scale).

Irrespective of the measurement system under consideration, "the operations involved in obtaining such measurements . . . must be clearly specified, observable, public, and replicable in all respects . . . . Secondly, measurements taken repeatedly, especially over extended periods of time must be done under exacting and totally standardized conditions with respect to measurement devices used, personnel involved, time or times of day, . . . instructions to the subject, and the specific environmental conditions (e.g., location) where the measurement session occurs" (Hersen & Barlow, 1976, p. 71).

Each of the measurement systems poses some unique challenges to the single-case researcher. When motoric measures are taken and human observers are used, independent reliability checks are required. These reliability checks can be expressed either as a percentage of agreement for interval data (with 80% considered minimally acceptable) or as a correlation for continuous data (with r = .80 considered minimally acceptable). (For a more comprehensive survey of behavioral assessment strategies, see Hersen & Bellack, 1988.)

When physiological measures are repeatedly taken, this too must be done under totally standardized conditions. Here, investigators must be concerned not only with the functioning of the electronic devices but with
the subject's adaptation to the equipment. In addition, concern for fatigability is of some importance, particularly if intertrial time periods are not sufficiently long. In cases where sexual responding in males is being repeatedly evaluated, assessment sessions should be programmed to guarantee maximum possible responding. Thus, the fatigue factor, if not properly attended to in this instance, might lead to a confounding of conclusions. This could occur if decreased deviant sexual responding to stimuli is erroneously attributed to treatment but is, in fact, simply a function of satiation.

A major problem in using repeated self-reports of subjects in single-case evaluations of treatments involves the external validity of such data. As noted by Hersen and Barlow (1976), "When using this type of assessment technique, the possibility always exists, even in clinical subjects, that the subject's natural responsivity will not be tapped, but the data in conformity to experimental demand are being recorded" (p. 73). That is, the subject verbally responds to what he or she perceives as the therapist's expectation at that point in the treatment. Of course, the use of alternate forms of the scale and the establishment of external validity by correlating self-report with motoric and physiological indexes are two methods for avoiding some of the pitfalls of attitudinal measures. However, there is ample evidence in the behavioral literature that desynchrony exists among the three response systems (i.e., motoric, physiological, and cognitive) (Hersen, 1973, 1978). Indeed, there is the suggestion that clinical subjects' self-reports of how they feel should be given credence, irrespective of how motoric and physiological data change during the course of treatment (cf. Hersen, 1978). It is assumed that if sufficient improvements take place in motoric and physiological areas, but the cognitive-attitudinal system remains unmodified, treatment should be considered only partially successful.

A specific issue faced by the single-case researcher who works in the psychiatric setting (Hersen & Bellack, 1978) that obviously can affect the standardization of data is the different composition of the staff at various times. Not only may variable levels of staff cooperation yield a differing quality in the data collected within a stated time period (e.g., the morning), but the marked staff differences in number and attitude during day, evening, and weekend shifts are variables that may lead to confounded data. Thus, when conducting single-case research in the psychiatric setting, standardization of data collection times and data collectors (e.g., nursing personnel) assumes even greater importance.

Choosing a Baseline

With the exception of the B-A-B design, where treatment precedes baseline assessment, in most single-case experimental designs the initial period of observation involves the natural frequency of occurrence of the behavior of interest. This initial phase is referred to as baseline and is labeled A. Baseline serves as the standard by which subsequent treatment phases are contrasted.

In the ideal case, the assessment of baseline functioning yields a stable pattern of data, thus facilitating the interpretation of treatment effects in the B phase. However, more often than not, such stability of data is not to be found. This is less of a problem for the basic animal researcher, who is in a position to program the subject's responding through the application of a variety of interval- and ratio-scheduling methodologies. It is understandable, then, why Sidman's (1960) definition of stability is a 5% range of variability. If variability exceeds that range, it is recommended that the experimenter evaluate sources of variability systematically.

In evaluating human subjects, the experimenter's flexibility in creating and choosing a baseline is much more constricted. Generally, the applied researcher does not have the luxury of "creating" ideal baseline conditions and is compelled to accept the baseline pattern as a given. Furthermore, the applied researcher is usually under time constraints; hence, there is less opportunity to search for the manifold causes of variability. However, sometimes adjustment in the measurement scale being used may reduce extensive variability. That is, at times, the measurement interval may not be appropriate for the behavior under study and therefore leads to extraneous variability.

In the following discussion, we illustrate some of the baseline patterns typically encountered when conducting applied research with human subjects. Problems inherent in each of the patterns and methods for dealing with them are outlined.

Hersen and Barlow (1976) have identified and illustrated eight specific baseline patterns (see Table 1). These, of course, are the most representative, but many other possibilities, combinations, and permutations exist. Each of the baseline patterns illustrated contains six data points. In single-case research, an
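As a rough illustration of the 5% stability criterion attributed to Sidman (1960) in this section, the following Python sketch checks whether a set of baseline observations stays within a given range of variability. The way the range is computed here (the spread between the highest and lowest observation, expressed as a percentage of the baseline mean) is an assumption made for the example, since the text does not spell out a formula; the function name and the sample values are hypothetical.

```python
from statistics import mean


def within_stability_range(baseline, max_range_pct=5.0):
    """Return True if (max - min) is at most max_range_pct percent of the mean.

    Assumption: "range of variability" is read as the spread of the
    observations relative to their mean; other readings are possible.
    """
    if len(baseline) < 2:
        raise ValueError("need at least two baseline observations")
    center = mean(baseline)
    if center == 0:
        # With a zero mean the percentage is undefined; only a flat series passes.
        return max(baseline) == min(baseline)
    spread_pct = 100.0 * (max(baseline) - min(baseline)) / abs(center)
    return spread_pct <= max_range_pct


# Hypothetical response rates from six baseline sessions.
steady = [100, 101, 99, 100, 102, 100]
erratic = [24, 255, 40, 230, 35, 240]

print(within_stability_range(steady))   # spread is about 3% of the mean
print(within_stability_range(erratic))  # spread is far larger than 5%
```

A check of this kind only flags excess variability; identifying its sources remains a matter for the experimenter.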
Table 1. Baseline Patterns

1. Stable baseline
2. Increasing baseline (target behavior worsening)
3. Decreasing baseline (target behavior improving)
4. Variable baseline
5. Variable-stable baseline
6. Increasing-decreasing baseline
7. Decreasing-increasing baseline
8. Unstable baseline

Figure 1. The stable baseline. Hypothetical data for mean numbers of facial tics averaged over three daily 15-min videotaped sessions. (From Fig. 3-1 in Single Case Experimental Designs by M. Hersen and D. H. Barlow, 1976, New York: Pergamon Press. Copyright 1976 by Pergamon Press. Reprinted by permission.)

oft-raised question is: "How many points do I need for an appropriate baseline assessment?" Although this is a straightforward question, the answer to it is a bit complex. The first issue, of course, is how many data points are required in order to ascertain some trend in the data. Barlow and Hersen (1973) argued that "a minimum of three separate observation points, plotted on the graph, during this baseline phase are required to establish a trend in the data" (p. 320). Sometimes more data points are needed if the baseline is initially variable or unstable. But, of course, the exigencies of a treatment situation may, at times, demand that the investigator forego experimental purity and institute treatment as rapidly as possible.

An upward trend in the data is represented by three successively increasing points. Conversely, a decreasing trend in the data is represented by three successively decreasing points. However, the power of the trend is dictated by the slope of the curve, with steeper slopes indicating greater power. The statistical methods for assessing slopes and trends in single-case research have been reviewed by Kazdin (1976). However, to date, despite considerable controversy in the field (Baer, 1977; Blumberg, 1984; Huitema, 1985; Jones, Vaught, & Weinrott, 1977; Wampold & Worsham, 1986), most applied behavioral researchers rely on a visual analysis of the data.

The stable baseline is depicted in Figure 1. As is quite apparent, there is some minor variability in tic frequency, but this variability is minimal, with the data essentially representing a straight line (i.e., no upward or downward trend). The application of treatment following such a baseline would permit an unambiguous interpretation of its effect (no change, improvement, or worsening).

As indicated in Table 1, the second pattern is the increasing baseline, where the target behavior is worsening. This, of course, is an acceptable pattern that could lead to a meaningful interpretation if subsequent treatment were to reverse the trend. However, if treatment were ineffective, then no difference in the slope of the curve might be noted. On the other hand, if treatment were detrimental to the patient, it would be difficult to determine whether the data in the intervention phase simply represent a continuation of the trend begun in baseline or whether they indicate further deterioration because of the treatment itself. However, a marked change in the slope of the curve could be interpreted as a deterioration effect because of the treatment.

The third pattern is one where the baseline is decreasing and the target behavior is improving. This pattern is problematic inasmuch as subsequent treatment application might just result in a continuation of the trend begun in baseline. If there were a marked change in the slope of the curve, the improvement might be attributed to treatment, but this would be difficult to evaluate via visual inspection. Generally, in this instance, treatment would have to be withdrawn and reinstituted if its controlling effects are to be established. If treatment were to lead to a worsening of the patient's condition, then a reversed trend in the data would be apparent.

The fourth pattern, portrayed in Figure 2, is the variable baseline. We should note that this is a pattern frequently encountered in applied clinical research. The figure shows a tic frequency ranging from 24 to 255; no clear trend is apparent in the data. Nonetheless, there is a clear pattern of alternating low and high data points. Some investigators who obtain this pattern block the data by averaging tic frequency over a 2-day period. This would lead to an apparently stable pat-
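The trend rule described in this section (three successively increasing or decreasing points, with the slope indicating the power of the trend) can be sketched in Python. This is only an aid to visual analysis, not one of the statistical methods reviewed by Kazdin (1976). The function names and sample data are hypothetical, and two choices are assumptions made for the example: the rule is applied to the last three observations, and the slope is an ordinary least-squares fit against session number.

```python
def trend_direction(points):
    """Classify the last three observations as 'up', 'down', or 'none'."""
    if len(points) < 3:
        raise ValueError("at least three observation points are required")
    a, b, c = points[-3:]
    if a < b < c:
        return "up"
    if a > b > c:
        return "down"
    return "none"


def ols_slope(points):
    """Least-squares slope of the observations against session number 1..n."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points to fit a slope")
    xs = range(1, n + 1)
    x_bar = sum(xs) / n
    y_bar = sum(points) / n
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, points))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


worsening = [50, 75, 100, 125, 150, 175]  # hypothetical increasing baseline
stable = [100, 98, 101, 99, 101, 100]     # hypothetical stable baseline

print(trend_direction(worsening), ols_slope(worsening))
print(trend_direction(stable), ols_slope(stable))
```

Comparing the fitted slope of the baseline with that of the following phase is one way to put a number on the "marked change in the slope" that the text treats as interpretable.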

Figure 2. The variable baseline. Hypothetical data for mean number of facial tics averaged over three 15-min videotaped sessions. (From Fig. 3-4 in Single Case Experimental Designs by M. Hersen and D. H. Barlow, 1976, New York: Pergamon Press. Copyright 1976 by Pergamon Press. Reprinted by permission.)

Figure 3. The increasing-decreasing baseline. Hypothetical data for mean number of facial tics averaged over three daily 15-min videotaped sessions. (From Fig. 3-6 in Single Case Experimental Designs by M. Hersen and D. H. Barlow, 1976, New York: Pergamon Press, 1976. Reprinted by permission.)
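Blocking, as described for the variable baseline, amounts to averaging consecutive observations. The Python sketch below uses hypothetical daily tic frequencies that alternate between low and high values within the 24 to 255 range mentioned in the text; the function names and the data are assumptions for illustration only.

```python
def block_means(values, block_size=2):
    """Average consecutive, non-overlapping blocks of observations.

    A trailing partial block is dropped so every mean covers block_size days.
    """
    if block_size < 1:
        raise ValueError("block_size must be at least 1")
    usable = len(values) - len(values) % block_size
    return [
        sum(values[i:i + block_size]) / block_size
        for i in range(0, usable, block_size)
    ]


def value_range(values):
    """Spread between the highest and lowest value."""
    return max(values) - min(values)


# Hypothetical daily tic frequencies alternating low and high.
daily = [24, 255, 30, 250, 26, 252]
blocked = block_means(daily, block_size=2)

print(blocked)                                   # three 2-day means
print(value_range(daily), value_range(blocked))  # raw spread vs. blocked spread
```

The blocked series is nearly flat, yet the raw observations still swing by the same amount from day to day, so the apparent stability comes from the averaging rather than from any change in the behavior.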

tern, at least visually. However, this is an artificial manner of dealing with variability that is "cosmetic" but does not alter the basic pattern. In light of the extreme variability seen in this pattern, subsequent interpretations of a treatment effect could be quite difficult unless the treatment itself were successful in reducing both variability and tic frequency. As recommended by Sidman (1960), if time permits, the researcher should attempt to identify the source(s) of variability. But in the clinical situation, this usually is not the case.

As can be seen in Table 1, the fifth pattern is the variable-stable baseline. Here, despite initial variability, the investigator extends the baseline observation until the data are less variable. Indeed, this is one of the methods for dealing with the fourth pattern: the variable baseline. After stability is achieved, the institution of a given treatment should once again lead to an unambiguous interpretation of the resulting data. The only problem here is that extensive baseline observation may not be possible or ethical in certain clinical situations (e.g., severe head banging or severe depression where suicidal ideation is present).

The sixth pattern (increasing-decreasing baseline), presented in Figure 3, is one where after an initial period of deterioration, improvement is quite apparent in the subject's condition. But as in the case of the decreasing baseline, the subsequent treatment application might only result in the continuation of the trend in the second part of baseline (i.e., continued improvement). Therefore, here, as in the case of the decreasing baseline, withdrawal and reinstatement of treatment are needed to document the controlling effects of the intervention strategy.

The seventh pattern (decreasing-increasing baseline) is the converse of the sixth: improvement followed by deterioration. In this instance, application of treatment that results in a reversal of data trends permits a clear interpretation of the effect. However, in the event that treatment is detrimental to the patient, visual inspection should prove extremely difficult unless there is a marked change in the slope of the curve.

Figure 4. The unstable baseline. Hypothetical data for mean number of facial tics averaged over three daily 15-min videotaped sessions. (From Fig. 3-8 in Single Case Experimental Designs by M. Hersen and D. H. Barlow, 1976, New York: Pergamon Press. Copyright 1976 by Pergamon Press. Reprinted by permission.)

The final pattern, the unstable baseline, is graphically portrayed in Figure 4. In this example, we have an extended baseline assessment that fails to reveal any particular pattern in the data. Thus, even the cosmetics of blocking would fail to yield visual improvement as to stability. As noted by Hersen and Barlow (1976), "to date, no completely satisfactory strategy for dealing with this type of baseline has appeared; at best, the

kinds of strategies for dealing with variable baseline are also recommended here" (p. 82).

Changing One Variable at a Time

One of the basic tenets of the single-case approach is that only one variable is altered at a time when proceeding from one phase to the next (Barlow & Hersen, 1973, 1984; Hersen & Barlow, 1976). It should be noted that if two variables are manipulated simultaneously from one phase to another, then it is impossible to determine which of the two was responsible for or contributed most to behavioral change. This one-variable stricture holds irrespective of whether the beginning, the middle, or the end phase is being evaluated.

Let us examine this basic tenet in greater detail. In the A-B-A-B design, for example, only one variable is changed from one adjacent phase to the next. Baseline is followed by treatment, which is succeeded by baseline and then treatment again. If treatment consists of a single therapeutic strategy (say, social reinforcement), then only one variable is altered from A to B. However, many treatments (e.g., social skills training) consist of a melange of techniques (i.e., instructions, feedback, modeling, and social reinforcement). Thus, in an A-B-A-B design involving the application of such treatment, B represents the full combination of techniques. In this analysis, it is not possible to evaluate the separate contribution of each technique. However, in an A-B-A-B-BC-B design, where A is baseline, B is feedback, and C is social reinforcement, the separate contributions of feedback and social reinforcement to the overall treatment effect can be determined.

Although the one-variable rule is generally adhered to by behavioral researchers, examples in the literature may be found where incorrect applications have been carried out and published. Two prime examples are the A-B-A-C design, where the investigator erroneously assumes that the differential effects of B and C can be determined, and the A-B-A-BC design, where the investigator assumes that the combined effects of BC may be contrasted with the original B phase. Not only is this idea erroneous in terms of the one-variable tenet, but the investigator has failed to consider the additional factor of the sequencing of possible treatment effects and the time lapses between treatment applications. With respect to these two examples, it should be pointed out that the experimental error is most frequently committed toward the latter part of the experimental analysis.

We might also note that in drug evaluations, the one-variable rule also holds but has some additional implications. Instead of progressing from a baseline phase (where no treatment is being administered) to a treatment phase (active drug), an additional step (i.e., placebo) is needed to control for the mere fact that the subject is ingesting a substance. Thus, a typical drug evaluation accomplished in the experimental single-case design might involve the following sequence: (1) no drug, (2) placebo, (3) active drug, (4) placebo, and (5) active drug. This design, labeled A-A'-B-A'-B, allows for evaluation of the contribution of the placebo over baseline and the drug over and above placebo alone.

Length of Phases

A number of factors need to be considered when determining length of baseline and treatment phases in single-case research. Included are time limitations, staff reactions, the relative length of adjacent phases, and ethical considerations. Johnston (1972) argued that

it is necessary that each phase be sufficiently long to demonstrate stability (lack of trend and constant range of variability) and to dispel any doubts of the reader that the data shown are sensitive to and representative of what was happening under the described conditions. (p. 1036)

In the ideal case, of course, the investigator attempts to secure a relatively equal number of data points per phase. This is especially important in the A-B-A-B design; otherwise, if, for example, a treatment phase were substantially longer than the preceding baseline, effects could be attributed to the extended time factor rather than to the treatment per se (see Hersen & Barlow, 1976, p. 101).

An excellent example of an A-B-A-B design with equal phases (with the exception of the last B phase) was presented by Miller (1973). In this study, the effects of retention control training were evaluated in a secondary enuretic child, with two targets (number of enuretic episodes and frequency of daily urination) selected as dependent measures (see Figure 5). The reader will note the relative stability in the baseline, the initial effects of the treatment, the return to baseline stability, and the renewed effects of the treatment during the second B phase. However, the second B phase was extended to 5 data points (instead of 3) to ensure the permanence of the treatment effects. This is a procedure commonly carried out in the last phase of the A-
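The one-variable rule lends itself to a simple mechanical check. In the Python sketch below, each phase label is treated as a set of treatment components (A is the empty set, BC is the set containing B and C), and a design passes only if every pair of adjacent phases differs by exactly one component. This encoding, the function names, and the restriction to single-letter component labels are assumptions made for illustration.

```python
def components(phase):
    """Map a phase label to its set of treatment components.

    'A' denotes baseline (no treatment); any other label is read as a
    combination of single-letter components, e.g. 'BC' -> {'B', 'C'}.
    """
    return frozenset() if phase == "A" else frozenset(phase)


def changes_one_variable(design):
    """Return True if each adjacent pair of phases differs by one component."""
    phases = design.split("-")
    for before, after in zip(phases, phases[1:]):
        # Symmetric difference = components added or removed at this transition.
        if len(components(before) ^ components(after)) != 1:
            return False
    return True


for design in ("A-B-A-B", "A-B-A-B-BC-B", "A-B-A-BC", "A-B-A-C"):
    print(design, changes_one_variable(design))
```

The check covers adjacent phases only. It rejects A-B-A-BC, where two components are introduced at once, but it accepts A-B-A-C, because the error in that design lies in comparing the nonadjacent B and C phases rather than in any single transition.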

Figure 5. Number of enuretic episodes per week and mean number of daily urinations per week for Subject 1. (From "An Experimental Analysis of Retention Control Training in the Treatment of Nocturnal Enuresis in Two Institutionalized Adolescents" by P. M. Miller, 1973, Behavior Therapy, 4, 288-294, Fig. 1. Copyright 1973 by Behavior Therapy. Reprinted by permission.)

B-A-B design, and it has clinical implications, but the importance of the equality of the data points in the A-B-A phase of the study is clear.

Sometimes, when the targeted behavior is potentially injurious to the subject under study (e.g., head banging) and/or the staff in the institution are eager to get the behavior under control very quickly because it is annoying, the initial baseline and the subsequent withdrawals of treatment (second and third A phases) may be very brief as contrasted with the intervention phases. Here, it is quite clear that ethical considerations have precedence over experimental rigor.

Still another factor related to length of phase is the carry-over effects of treatment to baseline. In the A-B-A-B design evaluating a behavioral strategy, this occurs in the second A phase, where the experimenter is unable to recover the initial baseline level that appeared in the first A phase. This is one of the primary reasons that Bijou, Peterson, Harris, Allen, and Johnston (1969) stated that "in studies involving stimuli with reinforcing properties, relatively short experimental periods are advocated, since long ones might allow enough time for the establishment of new conditioned reinforcers" (p. 202).

A special problem concerning carry-over effects involves evaluations of pharmacological treatments in single-case designs. Whereas with a behavioral intervention it is possible to terminate treatment (with minimal carry-over effects, it is hoped, from treatment to baseline), in pharmacological applications the biological effects of the drug may actually persist into the placebo and baseline phases. Thus, it generally is not feasible to evaluate the long-term effects of drugs in single-case studies without the use of additional phases ("washout" phases, where there is no intervention) interposed between treatment and placebo. However, for the short-term evaluation of drugs, where they are rapidly introduced and removed, the single-case strategy is quite satisfactory (see Liberman & Davis, 1975).

In the behavioral literature (e.g., Baer et al., 1968; Barlow & Hersen, 1973, 1984; Kazdin, 1973), the A-B-A-B design is considered prototypical of the reversal strategy:

When speaking of a reversal, one typically refers to the removal [...] baseline measurement has been concluded. In practice, the reversal involves a withdrawal of the B phase (in the A-B-A design) after behavioral change has been successfully demonstrated. If the treatment (B phase) indeed exerts control over the targeted behavior under study, a decreased or increased trend (depending on which direction indicates deterioration) in the data should follow its removal. (Hersen & Barlow, 1976, p. 92)

However, although the word reversal is used to describe the A-B-A-B design and the removal of treatment in the second A phase, Hersen and Barlow (1976) argued that the term withdrawal better describes the technical operation carried out by the applied behavioral researcher. Hersen and Barlow (1976) contended that there is a specific experimental strategy that is to be labeled the reversal design. An illustration of this design appears in Figure 6. K. E. Allen, Hart, Buell, Harris, and Wolf (1964) evaluated the effects of social reinforcement in a 4½-year-old withdrawn girl attending a preschool nursery. The target behaviors selected for study were the percentage of interaction with adults and the percentage of interaction with children. As can be seen in Figure 6, during baseline, a greater percentage of social interaction took place with adults than with children. In the second phase, the teacher was instructed to reinforce the child socially when she was interacting with other children and to ignore her when she was interacting with adults. In the next phase, the teacher was instructed to reverse the contingencies (i.e., to reinforce interaction with adults and to ignore interactions with children). Again, interaction with


adults increased while interaction with children decreased. Indeed, this is a true reversal (of differential attention) and is vastly different from simple withdrawal of treatment in the second A phase of the A-B-A-B design. In the fourth phase of the K. E. Allen et al. (1964) study, the contingencies were once more reversed, this reversal leading to increased interaction with children and decreased interaction with adults.

Figure 6. Daily percentages of time spent in social interaction with adults and with children during approximately 2 hours of each morning session. (From "Effects of Social Reinforcement on Isolate Behavior of a Nursery School Child" by K. E. Allen et al., 1964, Child Development, 35, 511-518, Fig. 2. Copyright 1964 by The Society for Research in Child Development, Inc. Reprinted by permission.)

We should note parenthetically, however, that despite this distinction drawn between withdrawal and reversal, most applied behavioral researchers persist in referring to the A-B-A-B design as a reversal strategy. In short, the distinction made has not been reinforced by journal editors.

Evaluating Irreversible Procedures

In single-case research, such variables as feedback, social reinforcement, and a variety of punishment techniques can be readily introduced and withdrawn in a number of designs (withdrawal and reversal). However, one variable, instructions, cannot be withdrawn in the technical sense. That is, once an instructional set has been given to the subject a number of times, simply ceasing to remind him or her of the instructions cannot be equated with cessation of feedback or reinforcement. Indeed, there is an analogy here to the physical discontinuation of a drug whose biological effect in the body may persist through the placebo and baseline phases of experimentation. Nonetheless, it is possible to study the very short-term effects of an instructional set that is periodically introduced and removed (Eisler, Hersen, & Agras, 1973) and the effects of changing the instructional set from a positive to a negative expectation (Barlow, Agras, Leitenberg, Callahan, & Moore, 1972). But usually, instructions tend to be maintained as a constant across the various phases of treatment, so that only one therapeutic variable is changed at a time (cf. Kallman et al., 1975). When this is not possible, alternative experimental strategies, such as the multiple-baseline design, may be used to evaluate specific instructional effects on targeted behaviors (e.g., Hersen & Bellack, [...]).

Basic A-B-A Designs

A-B Design

The A-B design is the simplest of the single-case strategies, with the exception of the so-called B design, where measures are repeatedly taken throughout the course of treatment (i.e., in an uncontrolled case

study with repeated measures). On the other hand, in the A-B design, the natural frequency of the behavior under study is first assessed in baseline (A). Then, in the B phase, treatment is instituted.

Of the single-case strategies, the A-B design is one of the weakest in terms of inferring causality. Indeed, the design is often referred to as correlational in that the effects of treatment that control the dependent measures are not completely documented unless treatment is withdrawn subsequent to B (i.e., the A-B-A design). Thus, in the A-B design, it is possible that changes in B are not the direct result of treatment per se but perhaps of some other factor, such as passage of time, that is correlated with the treatment.

Nonetheless, the A-B design does have its use and certainly represents a vast improvement over the uncontrolled case study. Also, for therapeutic or educational problems that have long proved recalcitrant, if intervention in B yields behavioral improvement, then with some degree of confidence one may attribute the effects to the specific intervention. However, only demonstrating such change in a withdrawal or reversal design will dispel the remaining doubts of the more "hard-headed" operant researcher.

An example of an A-B design that also includes follow-up was presented by Epstein and Hersen (1974). The subject was a 26-year-old psychiatric inpatient who had suffered from gagging episodes for about 2 years in spite of numerous medical interventions. However, the problem appeared to have no direct medical etiology; hence, the patient was admitted to the psychiatric service of a Veterans Administration hospital. During baseline (A), the patient was asked to record on an index card the specific time and frequency of each gagging episode. During treatment (B), the patient was given $2 in canteen books (exchangeable at the hospital commissary) for an n - 1 decrease in his gagging rate from the previous day. In treatment, the emphasis was on the patient's managing his disorder himself, with canteen booklets serving as the incentive. During the 12-week follow-up, the patient continued recording his gagging rate at home, with self-reports corroborated by his wife.

Figure 7 baseline data reveal a gagging frequency of 8-17 instances per day. Institution of treatment led to a marked decrease, to 0 on Day 14. However, renewed symptomatology was evidenced on Day 15, and treatment was continued, with the criterion for Day 15 reset to that originally used for Day 13. Improvements were noted between Days 15 and 18, and treatment was continued an additional 6 days.

Figure 7. Frequency of gagging during baseline, treatment, and follow-up. (From "Behavioral Control of Hysterical Gagging" by L. H. Epstein and M. Hersen, 1974, Journal of Clinical Psychology, 30, 102-104, Fig. 1. Copyright 1974 by the American Psychological Association. Reprinted by permission.)

From a design perspective, the reader should note the initial variability in baseline, which then stabilized to some degree. Further, the length of the treatment phase was double that of the baseline; it was extended for obvious clinical considerations (i.e., renewed symptomatology on Day 15).

Although the treatment appeared to be efficacious, it is possible that some unidentified variable, correlated with reinforcement procedures, led to behavioral change. But as previously noted, the A-B design does not allow for a completely unambiguous interpretation of causality. However, given the longevity of this patient's disorder and the repeated failure of medical interventions, there is a good likelihood that the treatment per se caused the improvement.

Pretest-Posttest Design

Thyer and Curtis (1983) have proposed a variant of the A-B design that they term the "repeated pretest-posttest" single-subject experiment. This design, to date, has not enjoyed widespread popularity, but its value to individual practitioners is considerable, given the usual experimental constraints of their practices. According to Thyer and Curtis, this design does not interfere with the delivery of treatment and has "high internal validity." Basically, the design involves taking measurements of the client's or patient's behavior at the beginning and at the end of each treatment session. Improvements, of course, can be readily seen from pre- to postmeasurement for self-report, motoric, or physiological indices. As noted by Thyer and Curtis,
    maintenance of the changes induced in one treatment session are assessed by measuring the behavior at the beginning of the next session. Visual inspection of the data, both pre and post each treatment session and across sessions, permits a robust behavior analysis of the effects of intervention. (p. 312)

To illustrate this design, Thyer and Curtis presented the results of exposure treatment (6 sessions) for a woman suffering from simple phobia, reflecting some improvements in pulse rate and very marked improvements of subjective units of discomfort (SUDS) when confronted with the phobic object.
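The session-by-session logic of the repeated pretest-posttest design can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a procedure given by Thyer and Curtis: the function name `pretest_posttest_summary` and the SUDS values are hypothetical. The sketch computes the change within each session (post minus pre) and the carryover between sessions (the next session's pre score minus the current session's post score), which is one simple way to express maintenance.

```python
from typing import List, Tuple


def pretest_posttest_summary(sessions: List[Tuple[float, float]]):
    """Summarize repeated pre/post measurements, one (pre, post) pair per session.

    within[i]    = post_i - pre_i        (change during session i)
    carryover[i] = pre_(i+1) - post_i    (drift between session i and the next)
    """
    within = [post - pre for pre, post in sessions]
    carryover = [
        sessions[i + 1][0] - sessions[i][1] for i in range(len(sessions) - 1)
    ]
    return within, carryover


# Hypothetical SUDS ratings (0-100) for six sessions; NOT data from
# Thyer and Curtis (1983).
suds = [(90, 70), (75, 55), (60, 40), (45, 30), (30, 15), (20, 10)]

within, carryover = pretest_posttest_summary(suds)
print("within-session change:", within)
print("between-session carryover:", carryover)
```

For a measure on which lower scores mean improvement, negative within-session values indicate gains during the session, and carryover values near zero indicate that those gains held until the next session. These numbers only supplement the visual inspection that the design relies on.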

A-B-A Design

The A-B-A design corrects for one of the major shortcomings of the A-B design: lack of experimental control. Removal of treatment in the second A phase is used to confirm experimental control over the dependent measure initially suggested when improvement occurs in B. However, the A-B-A design is not completely adequate either, as it terminates in a no-treatment phase. For very obvious clinical and ethical reasons, this is problematic; at times, the experimenter may have intended to follow the more complete A-B-A-B strategy, but for any number of reasons, the subject terminates the treatment prematurely. Even under these circumstances, data from A-B-A designs are of value.

Let us consider an example of an A-B-A design by Hersen, Eisler, Alford, & Agras (1973) some years ago. In this study the investigators evaluated the effects of a token economy on neurotic depression in a married, white, 52-year-old farmer who had become depressed following the sale of his farm. The two dependent measures selected for study were the number of points earned and the behavioral ratings of depression (talking, smiling, and motor activity), with higher ratings indicating less depression. During baseline (A), the patient was able to earn points, but they had no exchange value. In B (token economy), the patient had to purchase privileges on the ward with points earned. Then, in the third phase (A), baseline procedures were reinstated.

The results of this experimental analysis are presented graphically in Figure 8. Inspection of baseline shows that the number of points earned was increasing, whereas decreased behavioral ratings of depression indicated a slight worsening of the patient's condition. It is quite clear that with introduction of token reinforcement in the B phase, there was a sharp increase both in the number of points earned and in the behavioral ratings, suggesting efficacy of the intervention. Removal of treatment in the second A phase led to a marked diminution of points earned; decreased behavioral ratings also confirmed the controlling effects of the treatment.

Figure 8. Number of points earned and mean behavioral ratings for Subject 1. (From "Effects of Token Economy on Neurotic Depression: An Experimental Analysis" by Michel Hersen et al., 1973, Behavior Therapy, 4, 392-397, Fig. 1. Copyright 1973 by Behavior Therapy. Reprinted by permission.)

From a design perspective, the fact that the number of points earned in baseline was on the increase makes it a bit more difficult to interpret the greater number of points earned during token reinforcement. However, the marked change in the slope of the curve during the token economy phase is highly suggestive. Moreover, data in the second A phase (showing a marked decrease in the number of points earned) confirmed the initial impression of the token economy's controlling effects. The data for behavioral ratings, because of the specific trends obtained, are definitely less ambiguous to interpret and clearly indicate the controlling effects of the token economy.

A-B-A-B Design

As previously noted, the A-B-A-B design controls for deficiencies inherent in the A-B-A strategy, and elsewhere it has been termed the "equivalent time-samples design" (Campbell & Stanley, 1966). Not only does this design end on a treatment phase (B), but
it provides two opportunities for showing the controlling effects of treatment over the dependent measure (B to A and A to B).

Let us now examine an example of the successful use of an A-B-A-B single-case design. Lombardo and Turner (1979) evaluated the effects of thought stopping in a 26-year-old male psychiatric inpatient who was severely obsessive. Obsessions focused on "imaginal relationships" he had had with other patients on the ward during previous hospitalizations. Although the patient attempted to control obsessive ruminations through distraction, this approach failed to reduce the disorder's full intensity.

In all phases of the experimental analysis, the patient was instructed to note the beginning and ending times of each obsessive episode, thus allowing a determination of both the rate of ruminations and the total time per day. Baseline (A) consisted of 6 days of observation. Treatment (thought stopping) began on Day 7 and consisted of the patient raising his right index finger whenever he had obtained a vivid obsessive image. At that point, the therapist shouted, "Stop," and the patient lowered his finger:

    Fading of "STOP" intensity and transfer of control from therapist to patient in all training was accomplished as follows. Initially, the therapist provided the "STOP" first shouting, then saying it loudly, then using a normal speaking voice, then saying it softly, and finally whispering "STOP." The patient then verbalized "STOP" in the same manner with an additional final step of saying "STOP" covertly. Depending upon how rapidly the patient gained control, four to six repetitions of stopping were used at each voice intensity. (Lombardo & Turner, 1979, p. 269)

Treatment was discontinued on Day 18 and recommenced on Day 28. In addition, a 6-week follow-up was carried out.

The results of this study appear in Figure 9. Following a period of baseline stability (the modal response was 40 min), thought stopping led to a marked decrease of obsessions to a 0 level. When treatment was then withdrawn in the second A phase (baseline), obsessions increased considerably, well over baseline levels, albeit in a very unstable fashion. However, reintroduction of the treatment led to renewed improvement to a 0 level, maintained through Days 33-40. Furthermore, improvement continued throughout the 6-week follow-up period.

Figure 9. Duration of obsessive ruminations during baseline, treatment, and 6-week follow-up. (From "Thought-Stopping in the Control of Obsessive Ruminations" by T. W. Lombardo and S. M. Turner, 1979, Behavior Modification, 3, 267-272, Fig. 1. Copyright by Sage Publications. Reprinted by permission.)

Although the functional effects of treatment appeared to be documented, "this conclusion must be tempered by the fact that controls were not provided for the possible therapeutic effects of instructions and therapist as well as patient expectancies" (Lombardo & Turner, 1979, p. 270). This kind of problem is definitely more prevalent when self-report data are used as opposed to motoric and physiological measures. As noted earlier, motoric and physiological measures are less susceptible to such confounding.
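The withdrawal logic of the A-B-A-B design (improvement on each A to B change, deterioration on the B to A change) can be made concrete with a short Python sketch. Everything in it is an assumption made for illustration: the function names `phase_means` and `shows_control` are hypothetical, the daily values are invented rather than taken from Lombardo and Turner (1979), and comparing phase means is only a crude numerical stand-in for the visual inspection of level, trend, and stability used in the studies described here.

```python
from itertools import groupby
from statistics import mean


def phase_means(labels, values):
    """Return [(label, mean)] for each run of consecutive identical phase labels."""
    runs = groupby(zip(labels, values), key=lambda pair: pair[0])
    return [(label, mean(value for _, value in run)) for label, run in runs]


def shows_control(means, lower_is_better=True):
    """True if every A->B change is an improvement and every B->A change a worsening."""
    for (_, before), (label, after) in zip(means, means[1:]):
        # Positive gain means the behavior moved in the desired direction.
        gain = before - after if lower_is_better else after - before
        if label == "B" and gain <= 0:
            return False  # treatment introduced, but no improvement
        if label == "A" and gain >= 0:
            return False  # treatment withdrawn, but no deterioration
    return True


# Hypothetical daily minutes of rumination (made-up numbers for illustration).
labels = ["A"] * 4 + ["B"] * 4 + ["A"] * 4 + ["B"] * 4
values = [40, 42, 38, 40, 20, 10, 0, 0, 60, 45, 70, 55, 10, 0, 0, 0]

means = phase_means(labels, values)
print(means)
print(shows_control(means))
```

Because the check walks over adjacent phases, the same function can be applied to A-B-A or B-A-B sequences; the A-B-A-B sequence simply supplies more phase changes at which control can be demonstrated.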


B-A-B Design

Although not as complete an experimental analysis as the A-B-A-B design, the B-A-B design is superior to the A-B-A strategy, as it ends in a treatment phase. Since the experimental analysis begins in a treatment phase in the B-A-B design, the natural frequency (i.e., rate) of the behavior under investigation is not initially obtained. On the other hand, the B-A-B design may be useful for experimentation in institutional settings, particularly if the staff are eager to get some disruptive or unpleasant behavior under quick control. In such instances, the staff will undoubtedly require persuasion with regard to withdrawal of the treatment in the second phase (i.e., in A).

Let us consider a historical example of a B-A-B design in which the effects of token economic procedures on work performance were evaluated for 44 chronic schizophrenic patients (Ayllon & Azrin, 1965). In the first phase (B), the patients were awarded tokens contingently for engaging in a variety of hospital-ward work activities. Tokens, of course, were exchangeable for a large menu of "backup" reinforcers. In the second phase (A), the patients were given tokens noncontingently, based on the individual rates obtained in B. In the third phase (B), treatment was reinstated.

The results of this study are depicted in Figure 10. During the first B phase, the group of patients averaged a total of 45 work hours per day. When the contingency was removed in A, the work level dropped to 1 hour by Day 36. Reinstatement of the treatment in the second B phase led to a marked increase of work output similar to that in the first phase. The data in the second B phase clearly document the controlling effects of the token economy on the work performance of these chronic schizophrenic patients.

Figure 10. The total number of hours of the on-ward performance by a group of 44 patients. (From "The Measurement and Reinforcement of Behavior of Psychotics" by T. Ayllon and N. H. Azrin, 1965, Behaviour Research and Therapy, 8, 357-383, Fig. 4. Copyright 1965 by Pergamon Press, Ltd. Reprinted by permission.)

Inasmuch as group data were averaged in this experimental analysis, Ayllon and Azrin (1965) also presented individual data, indicating that 36 of the 44 patients were affected by the contingency in force; 8 of the patients did not respond to token economic procedures. As argued by Hersen and Barlow (1976), when group data are presented graphically, the investigator should also display data for selected subjects:

    Individual data presented for selected subjects can be quite useful, particularly if data trends differ. Otherwise, difficulties inherent in the traditional group comparison approach (e.g., averaging out of effects, effects due to a small minority while the majority remains unaffected by treatment) will be carried over to the experimental analysis procedure. (p. 190)

Extensions of the A-B-A Design

Extensions of the basic A-B-A design have appeared in numerous behavioral publications. In this section, we will consider three categories of such extensions. The first involves a more extended replication of the basic A-B pattern (e.g., A-B-A-B-A-B: Mann, 1972) or the A-B-A-C-A design, where the controlling effects of B and C on A are examined in one study (e.g., Wincze, Leitenberg, & Agras, 1972). However, in the A-B-A-C-A design, it is not possible to make a comparison of the relative effects of B and C, since these two interventions are confounded by a third factor:

The second category we will look at involves the additive or interactive effects of two therapeutic variables (e.g., A-B-A-B-BC-B design). Here, given appropriate data trends, it is possible to evaluate the contribution of C above and beyond that of B.

Finally, the third category is concerned with the assessment of pharmacological treatments. As already noted, there are some unique problems in evaluating
the effects of drugs in a single-case design (e.g., the need for placebo phases and the carry-over effects). Also, it should be noted that at this juncture the use of single-case analyses for pharmacological interventions is not as widespread as that carried out in evaluating behavioral interventions. Thus, in our discussion, we also will highlight possibilities for the future.

A-B-A-B-A-B and A-B-A-C-A-C' Designs

Mann (1972) repeatedly evaluated the effects of contingency contracting (A-B-A-B-A-B design) in his efforts to treat an overweight subject. At the beginning of the study, the subject surrendered a number of prized possessions (i.e., valuables) to the investigator, which could be regained (one at a time), contingent on a 2-pound weight loss over a previous low within a designated time period. By contrast, a 2-pound weight gain led to the subject's permanent loss of the valuable, to be disposed of by the investigator in equitable fashion. That is, he did not profit in any way from the subject's loss.

As can be seen in Figure 11, institution of the treatment (contingency contracting) led to marked decreases in weight, with interposed baseline data evincing a plateauing effect or an upward trend. In short, the controlling effects of the contingency contract on weight loss were firmly demonstrated several times in this experimental analysis.

Wincze et al. (1972) evaluated the effects of feedback and token reinforcement on the verbal behavior of a delusional psychiatric inpatient using an A-B-A-C-A-C'-A design. During each of the phases of study, the patient was asked daily to respond to 15 questions selected at random from a pool of 105. The proportion of the responses containing delusional material was recorded for the individual sessions, as was the percentage of delusional talk on the ward monitored by nurses 20 times a day.

During A (baseline), no contingencies were in effect, and the patient received "free" tokens. Feedback (B) involved the patient's being corrected whenever he responded delusionally. Tokens were still given to him noncontingently in this phase. In A, baseline procedures were reinstituted. In the fourth phase (C), tokens were earned contingently for nondelusional


Figure 11. A record of the weight of Subject 1 during all conditions. Each open circle (connected by the thin solid line) represents a 2-week minimum-weight-loss requirement. Each of the solid dots (connected by the thick solid line) represents the subject's weight on each of the days he was measured. Each triangle indicates the point at which the subject was penalized by a loss of valuables, either for gaining weight or for not meeting the 2-week minimum-weight-loss requirement. Note: The subject was ordered by his physician to consume at least 2,500 calories per day for 10 days, in preparation for medical tests. (From "The Behavior Therapeutic Use of Contingency Contracting to Control an Adult Behavior Problem: Weight Control" by R. A. Mann, 1972, Journal of Applied Behavior Analysis, 5, 99-109, Fig. 1. Copyright 1972 by the Society for the Experimental Analysis of Behavior, Inc. Reprinted by permission.)
talk. This was followed by a return to baseline conditions. In C', tokens were awarded contingently for nondelusional talk that exceeded a given criterion (nondelusional talk more than 90%). Finally, in the last phase (A), baseline procedures were reinstated for the fourth time.

The results of this study appear in Figure 12. These data indicate that none of the treatment variables applied effected any change in delusional talk on the ward. Similarly, feedback (B) yielded no effects on delusional talk in individual sessions. But token sessions (Phase 4) and token bonuses (Phase 6) procedures led to decreased delusional talk in individual sessions, thus demonstrating the controlling power of these treatments over the dependent measure. However, as has already been underscored, this design does not permit an analysis of the relative effects of token sessions and the token bonus treatment.

Interaction Designs

As previously pointed out in Hersen and Barlow (1976),

    most treatments contain a number of therapeutic components. One task of the clinical researcher is to experimentally analyze these components to determine which are effective and which can be discarded, resulting in a more efficient treatment. Analyzing the separate effects of single therapeutic variables is a necessary way to begin to build therapeutic programs, but it is obvious that these variables may have different effects when interacting with other treatment variables. In advanced states of the construction of complex treatments it becomes necessary to determine the nature of these interactions. (p. 213)

As clearly noted in an earlier section, the importance of the one-variable rule (i.e., changing one variable across phases) holds in particular in interaction designs. In some instances, introduction of one therapeutic variable will lead to some behavioral change, but addition of a second variable will lead to still further increases, as marked by a significant change in the slope of the curve (see Hersen & Barlow, 1976, p. 217). In other instances, the first variable may lead to a minimal effect, while the second suggests considerable additional effects. Let us consider one such example.

Kallman et al. (1975) evaluated the effects of reinforcing standing and walking on the mean distance in yards walked per instruction in a white, 42-year-old, married patient suffering from a conversion reaction (i.e., an inability to walk). Figure 13 shows that in the first phase, when standing was reinforced with verbal praise, only minimal efforts were made to walk. In the second phase, when walking and standing were both reinforced, a marked linear increase in walking was noted. In the third phase, standing alone was reinforced; the result was a plateauing effect. However, when reinforcement for standing and walking was reinstituted in the fourth phase, further improvements in walking appeared. In the next two phases, reinforcement for standing and walking was maintained, but with the addition of a walker in the fifth phase and its


Figure 13. Mean distances walked during all phases of treatment and follow-up. (From "The Use of Social Reinforcement in a Case of Conversion Reaction" by W. M. Kallman et al., 1975, Behavior Therapy, 6, 411-413, Fig. 1. Copyright 1975 by Behavior Therapy. Reprinted by permission.)

removal in the sixth. The nomenclature for the first six phases of this study is as follows: (1) B; (2) BC; (3) B; (4) BC; (5) BCD; and (6) BC. An evaluation of the analysis clearly indicates the controlling effects of C (reinforcing walking) over B (reinforcing standing), but it does not reveal the controlling effects of the walker (BCD) over no walker (BC), inasmuch as improvements in walking continued after the walker was removed.

Let us consider still another example of the interaction design: an evaluation of the effects of feedback and reinforcement on the eating behavior of an anorexia nervosa patient (Agras, Barlow, Chapin, Abel, & Leitenberg, 1974). This study was done in an A-B-BC-B-BC design, with A as baseline, B as reinforcement, and C as feedback. Throughout the study, the patient was provided four meals daily, each consisting of 1,500 calories. Reinforcement consisted of granting the patient privileges, contingent on weight gain. Feedback, on the other hand, involved giving the patient specific information as to weight, caloric intake, and actual mouthfuls consumed.

The data presented in Figure 14 show a slight increase in weight during baseline but decreased caloric intake. When feedback was added to reinforcement in the third phase, a marked increase in weight and caloric intake was noted. This leveled off when feedback was removed in Phase 4, but increased when feedback once again was added to reinforcement in the final phase. In summary, this study failed to document the controlling effects of reinforcement on weight gain and caloric intake, but it definitely reflects the controlling effects of feedback on these two dependent measures.

Drug Evaluations

So far in this chapter, we have touched on some of the issues related to the evaluation of pharmacological agents in single-case designs (viz., the placebo phase and the carry-over effects from adjacent phases). A third important issue in drug research, of course, is the use of double-blind assessments; that is, neither the patient nor the assessor is aware of whether a placebo or an active drug is being administered. In the single-blind assessment, only the patient typically is unaware of whether he or she is receiving a drug or a placebo.

Hersen and Barlow (1976) have pointed out the difficulties inherent in conducting the double-blind assessment in single-case analysis:

    A major difficulty in obtaining a "true" double-blind trial in single case research is related to the experimental monitoring of data (i.e., making decisions as to when baseline observation is to be concluded and when various phases are to be introduced and withdrawn) throughout the course of investigation. It is possible to program phase lengths on an a priori basis, but then one of the
Figure 14. Data from an experiment examining the effect of feedback on the eating behavior of a patient with anorexia nervosa. (From "Behavior Modification of Anorexia Nervosa" by W. S. Agras et al., 1974, Archives of General Psychiatry, 30, 279-286, Fig. 4. Copyright 1974 by the American Medical Association. Reprinted by permission.)

    major advantages of the single case strategy (i.e., its flexibility) is lost. However, even though the experimenter is fully aware of treatment changes, the spirit of the double-blind trial can be maintained by keeping the observer . . . unaware of drug and placebo changes .... We might note here additionally that despite the use of the double-blind procedure, the side effects of drugs in some cases . . . and the marked changes in behavior resulting from removal of active drug therapy in other cases often betray to nursing personnel whether a placebo or drug condition is currently in operation. (p. 206)

In spite of the aforementioned difficulties, which equally plague the group comparison researcher, there are some good examples of single-case work using drugs. In some, the drug is a constant across phases while behavioral strategies are evaluated (cf. Wells, Turner, Bellack, & Hersen, 1978); in others, the addition of a drug to a behavioral intervention is assessed (Turner, Hersen, & Alford, 1974; Turner, Hersen, Bellack, & Wells, 1979); in still others (Liberman, Davis, Moon, & Moore, 1973; Williamson, Calpin, DeLorenzo, Garris, & Petti, 1981), the primary effects of the drug are evaluated.

Listed in Table 2 are some of the possible design strategies for assessing drugs. Designs 4-15 are all experimental in that the controlling effects of the drug on targeted behaviors may be ascertained. Also indicated is whether a single- or double-blind procedure is possible. Let us consider a published example of one of the designs (Number 13).

Liberman et al. (1973) assessed the effects of placebo and Stelazine on the social interaction of a 21-year-old chronic schizophrenic patient who was quite withdrawn. Social interaction was evaluated by noting the patient's willingness to engage in 18 daily half-minute chats with nursing personnel on the ward. Refusals to engage in such chats were labeled asocial responses. In the first phase (A), the patient was withdrawn from all medication. In the next phase (A'), he was administered a placebo, followed by 60 mg per day of Stelazine (Phase B). Next, he was withdrawn from Stelazine (Phase A'), and then Stelazine was reinstated (Phase B).

As can be seen in Figure 15, removal of the drugs in

Table 2. Single-Case Experimental Drug Strategies^a

No.   Design^b              Type                 Blind possible
 1.   A-A'                  Quasi-experimental   None
 2.   A-B                   Quasi-experimental   None
 3.   A'-B                  Quasi-experimental   Single or double
 4.   A-A'-A                Experimental         None
 5.   A-B-A                 Experimental         None
 6.   A'-B-A'               Experimental         Single or double
 7.   A'-A-A'               Experimental         Single or double
 8.   B-A-B                 Experimental         None
 9.   B-A'-B                Experimental         Single or double
10.   A-A'-A-A'             Experimental         Single or double
11.   A-B-A-B               Experimental         None
12.   A'-B-A'-B             Experimental         Single or double
13.   A-A'-B-A'-B           Experimental         Single or double
14.   A-A'-A-A'-B-A'-B      Experimental         Single or double
15.   A'-B-A'-C-A'-C        Experimental         Single or double

^a From Single Case Experimental Designs (Table 6.1, p. 207) by M. Hersen and D. H. Barlow, 1976. New York: Pergamon Press. Copyright 1976 by Pergamon Press. Reprinted by permission.
^b A = no drug; A' = placebo; B = drug 1; C = drug 2.

the first phase led to increased asocial behavior. With the introduction of placebo, there was initial improvement and then a return of asocial behavior. Introduction of Stelazine in the third phase led to a marked improvement, followed by deterioration when Stelazine was removed in the fourth phase. Reinstitution of Stelazine in the fifth phase clearly documents the controlling effects of the drug on improved social responding.

Figure 15. Average number of refusals to engage in a brief conversation. (From "Research Design for Analyzing Drug-Environmental-Behavior Interactions" by R. P. Liberman et al., 1973, Journal of Nervous and Mental Disease, 156, 432-439, Fig. 2. Copyright 1973 by Williams & Wilkins. Reprinted by permission.)

In a more recent drug trial, Field, Aman, White, and Vaithianathan (1986) evaluated the effects of placebo and imipramine in a 22-year-old woman suffering from moderate mental retardation and deafness and visual impairment in one eye as a result of congenital rubella. The subject had very limited communication skills and had lived in an institution for 11 years. Problems prior to intervention of particular concern were crying, poor appetite, and screaming outbursts.

An A-B-A'-B-A'-B design was followed, with changeover phases between A and B when imipramine was increased, and on two occasions between B and A', when imipramine was reduced to 0.

Behaviors targeted were evaluated twice weekly for 30 min each in the ward dayroom by two nursing staff members, using 10-sec recording intervals. As can be seen in Figure 16, percentage of time crying was low in baseline (A) and the first imipramine phase (B), much higher (albeit variable) in the placebo phase (A'), low again in imipramine (B), increasing in the second placebo phase (A'), and low again in the third imipramine (B) phase. However, the data are not clear as to whether crying increased in placebo because of imipramine withdrawal, especially given the low rate of crying during the initial baseline phase (A).

On the other hand, data for percentage of meals consumed clearly show the positive controlling effects of imipramine (58.7% on placebo; 71.0% on imipramine). Similarly, data for number of screaming outbursts show the controlling effects of imipramine

(7.5 per week on placebo; 2.4 per week on imipramine).

Figure 16. Percentage of time crying, percentage of meals consumed, and number of screaming outbursts during baseline, imipramine, and placebo periods. There was a 1-week transition phase, providing for dosage changes, between most drug phases of the experiment. (From "A Single-Subject Study of Imipramine in a Mentally Retarded Woman with Depressive Symptoms" by C. J. Field et al., 1986, Journal of Mental Deficiency Research, 30, 191-198, Fig. 1. Copyright 1986. Reprinted by permission.)

A nice feature of the Field et al. study is the use of a changeover phase of a week in which drug dosage was increased or decreased. But a weakness of the study is that in some of the phases only two data points were plotted, thus allowing for only a partial confirmation of the data trend.

Additional Designs

Although the basic A-B-A design and its numerous permutations have been used extensively and successfully by applied behavioral researchers to evaluate many therapeutic and educational problems, at times some of these designs simply are not appropriate. Inappropriateness may be due to practical, ethical, or design considerations. For example, if a given therapeutic procedure cannot be reversed or withdrawn (e.g., therapeutic instructions), then a different design (such as the multiple-baseline design across behaviors) could be employed to document the controlling effects of instructions on independent target behaviors. On the other hand, if an investigator is intent on showing the effects of some kind of shaping procedure where behavior is to be accelerated or decelerated, then the changing-criterion design would be more suitable. Finally, if the relative efficacy of two treatment strategies is to be contrasted in a single subject, then the simultaneous treatment design (sometimes referred to as the multielement or alternating-treatment design) is the design strategy of choice.

Let us now consider each of these designs in turn, beginning with the three varieties of the multiple-baseline strategy.

Multiple Baseline

Baer et al. (1968) first described the multiple-baseline design as follows:

    In the multiple-baseline technique, a number of responses are identified and measured over time to provide baselines against which changes can be evaluated. With these baselines established, the experimenter then applies an experimental variable to one of the behaviors, produces a change in it, and perhaps notes little or no change in the other baselines. (p. 94)

The investigator subsequently applies treatment to succeeding behaviors until some criterion point has been achieved. Generally, the treatment is then withheld until baseline stability has been achieved.

The strategy described above is referred to as the multiple-baseline design across behaviors. An assumption, of course, is that the targeted behaviors are

independent of one another. Otherwise, treatment for one may lead to covariation in a second, thus obfuscating the controlling effects of the treatment. In essence, the multiple-baseline design across behaviors is a series of A-B designs, with every succeeding A phase continued until treatment has finally been applied to each. Treatment effects are inferred from the untreated baselines. That is, the controlling effects of treatment on dependent measures are documented if, and only if, change occurs when treatment is directly applied. In this respect, the design certainly is weaker than that in the A-B-A-B design, where the effects of controlling variables are directly shown.

Let us consider an example of the multiple-baseline design across behaviors. Bornstein, Bellack, and Hersen (1977) assessed the effects of social skills training on the role-played performance of an unassertive 8-year-old female third-grader (Jane). During baseline, specific behaviors were assessed (ratio of eye contact to speech duration, loudness of speech, number of requests, and overall assertiveness) in role-played scenarios requiring assertive responding. As can be seen in Figure 17, the baseline levels of responding for target behaviors were low. Treatment applied to each baseline under time-lagged and cumulative conditions led to marked increases in responding. The reader should note that only when social skills treatment was directly applied to each of the first three targeted behaviors did changes take place. There was no evidence that the targeted behaviors were correlated, nor did concurrent change take place in untreated target measures. In short, the controlling effects of social skills treatment were demonstrated. It also should be noted that although overall assertiveness was not directly treated, independent ratings of overall assertiveness reflected improvement throughout the course of treatment, with all treatment gains generally maintained in follow-up.

Figure 17. Probe sessions during baseline, social skills treatment, and follow-up for training scenes for Jane. A multiple-baseline analysis of ratio of eye contact to speech duration while speaking, loudness of speech, number of requests, and overall assertiveness. (From "Social-Skills Training for Unassertive Children: A Multiple-Baseline Analysis" by M. R. Bornstein et al., 1977, Journal of Applied Behavior Analysis, 10, 183-195, Fig. 1. Copyright 1977 by the Society for the Experimental Analysis of Behavior, Inc. Reprinted by permission.)

sometimes encountered in applied clinical research. They argue that
Unless there is a specific theoretical rationale or the in case of ambiguity with the effects of a multiple-baseline de-
investigator has had prior experience working with a sign, it often is possible to include a partial reversal in the design
given set of target behaviors, there is no accurate way for one of the behaviors. The reversal phase, or return to base-
line, need not be employed for all of the behaviors (i.e., base-
to predict whether the three or more targeted behaviors lines) for which data are collected. Indeed, one of the reasons for
selected for treatment truly are independent of one using a multiple-baseline design is to avoid the ABAB design and
another. Following the initial logic of the multiple- its temporary removal of treatment. However, when the specific
baseline design across behaviors, if change in target effect of the intervention is not evident in a multiple-baseline
design, one may have to resort to a temporary withdrawal of the
behaviors 1 and 2 occur as a result of treatment applica- intervention for one of the baselines to determine the effect of the
tion to only the first, then the controlling effects of the intervention. (p. 607)
treatment will not have been demonstrated. The base-
lines are correlated, but that does not necessarily imply A problem with the Kazdin and Kopel solution is
that the treatment in general is ineffective. Kazdin and that in the case of instructions, a true reversal or with-
Kopel (1975) have offered a solution to this dilemma drawal is not possible. Thus, their recommendations
apply best to the assessment of such techniques as ---
!l0 8Dlthne
feedback, reinforcement, and modeling.

A second type of multiple-baseline strategy is the one across settings. That is, a given treatment is applied to one subject (or groups of subjects) across several different settings (e.g., different classroom periods). The logic of the design, however, remains the same. Baselines for separate settings increase in length, with treatment applied under time-lagged and cumulative conditions. Generally, only one behavior is targeted for time-lagged treatment. But there is no reason that concurrent changes in other behaviors should not be monitored.
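
The time-lagged, cumulative application of treatment across settings can be sketched as a simple schedule. The function name, the setting labels, and the baseline and lag values below are hypothetical and chosen only for illustration; in practice the investigator would decide when to start each treatment phase from the stability of the data.

```python
def staggered_schedule(settings, first_baseline_len, lag, total_sessions):
    """Map each setting to a per-session list of 'A' (baseline) or 'B' (treatment).

    Each successive setting stays in baseline `lag` sessions longer than the
    one before it; once treatment starts in a setting it remains in force.
    """
    schedule = {}
    for i, setting in enumerate(settings):
        onset = min(first_baseline_len + i * lag, total_sessions)
        schedule[setting] = ["A"] * onset + ["B"] * (total_sessions - onset)
    return schedule

plan = staggered_schedule(
    ["period_1", "period_2", "period_3"],
    first_baseline_len=3, lag=2, total_sessions=9,
)
for setting, phases in plan.items():
    print(setting, "".join(phases))
# period_1 AAABBBBBB
# period_2 AAAAABBBB
# period_3 AAAAAAABB
```

Each row is an A-B design, and the lengthening baselines in the later rows are what allow change to be attributed to the treatment rather than to the passage of time.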

An example of multiple-baseline design across settings was presented by Allen (1973). In his study, the subject was an 8-year-old boy with minimal brain damage who was attending a special summer camp. The target selected for modification was the child's high rate of bizarre verbalizations in four separate camp settings: walking on a trail, in the dining hall, in the cabin, and during education sessions. Treatment simply involved instructing the camp counselors to systematically ignore such bizarre verbalizations. (Previously, these verbalizations had attracted considerable social reinforcement from the counselors.)

The results of this experimental analysis appear in Figure 18. Following 7 days of baseline, treatment was implemented for walking on the trail, with a resultant decrease in bizarre talk. But no concurrent changes were observed in the dining hall. Only when treatment was specifically applied to the dining hall did bizarre talk decrease. Note, however, that when treatment was applied in the dining hall, there were some concurrent decreases in bizarre talk in the cabin. Similarly, when treatment was applied in the cabin, there were some concurrent decreases noted during education sessions. Thus, the last two baselines were not totally independent. Indeed, this was an instance in which Kazdin and Kopel's (1975) recommendation of a partial reversal (withdrawal) for Baselines 3 and 4 would have added confirmatory evidence to the treatment's effectiveness.

[Figure 18 graph not reproduced; panel labels: Walking on Trail, Evening Activity, Dining Hall, Cabin; x-axis: Days.]

Figure 18. Daily number of bizarre verbalizations in specific camp settings. (From "Case Study: Implementation of Behavior Modification Techniques in Summer Camp Setting" by G. J. Allen, 1973, Behavior Therapy, 4, 570-575, Fig. 1. Copyright 1973 by Behavior Therapy. Reprinted by permission.)

A third type of multiple-baseline design strategy is the one across subjects. Although not strictly a single-case study, the general principles of the multiple-baseline strategy apply. As described by Hersen and Barlow (1976),

a particular treatment is applied in sequence across matched subjects presumably exposed to "identical" environmental conditions. Thus, as the same treatment variable is applied to succeeding subjects, the baseline for each subject increases in length. In contrast to the multiple baseline design across behaviors (the within-subject multiple baseline design), in the multiple baseline across subjects a single targeted behavior serves as the primary focus of inquiry. However, there is no experimental contraindication to monitoring concurrent . . . behaviors as well. (p. 228)

A good example of the multiple-baseline design across subjects appeared in a paper by Ortega (1978). In this study, Ortega evaluated the effects of relaxation training on the spasticity level of four cerebral palsied adults. The dependent measures involved two timed trials of the Placing Test and the Turning Test from the Minnesota Rate of Manipulation Tests, which test the speed and dexterity of finger, hand, and arm movements.

Figure 19 shows that all four subjects' performance
on the two tests was slow, but that slight improvements generally occurred throughout baseline as a function of repeated trials. However, only when progressive relaxation exercises were practiced by each subject did marked changes in speed take place. Moreover, follow-up data indicate that performance improvement was maintained for at least three weeks. Performance improvement from baseline to treatment (averaged over the four subjects) was 28% on the Turning Test and 21% on the Placing Test.

[Figure 19 graph not reproduced; labels: Treatment, Subject 3, Turning Test, Placing Test.]

Figure 19. The time required to complete two trials of both the Placing Test and the Turning Test, from the Minnesota Rate of Manipulation Tests, during baseline, treatment, and follow-up phases of research. Testing sessions, which measured the amount of time required to complete various manual manipulations involving pegs and pegboard, were conducted every third working day throughout the experiment. During the treatment condition, relaxation exercises were performed every working day. Subject 1 was absent throughout Test Period 26 because of illness. Subject 4 was vacationing during Periods 5 and 6. Severe cold weather and heavy snows forced the closure of both sheltered workshop facilities for eight working days, so Test Period 30 was canceled. (From "Relaxation Exercise with Cerebral Palsied Adults Showing Spasticity" by D. F. Ortega, 1978, Journal of Applied Behavior Analysis, 11, 447-451, Fig. 1. Copyright 1978 by the Society for the Experimental Analysis of Behavior, Inc. Reprinted by permission.)

Variations in the Multiple Baseline

In our prior discussion of the multiple-baseline design across subjects, each individual treated had been exposed to the same environment but treated in succession in time-lagged fashion. In this manner, the experimenter is able to control for the individual's history (see Campbell & Stanley, 1966). However, under certain circumstances treating several subjects concurrently may not be possible, especially if the disorder in question is rare. In order to deal with this problem, P. J. Watson and Workman (1981) have proposed an alternative strategy, referred to as the nonconcurrent multiple baseline across individuals.

In this research design, the researcher initially determines the length of each of several baseline phases (e.g. 5, 10, 15 days). When a given subject becomes available (e.g. a client is referred who has the target behavior of interest and is amenable to the use of a specific treatment of interest), (s)he is randomly assigned to one of the pre-determined baseline lengths. Baseline observations are then carried out; and assuming that responding has reached acceptable stability criteria, treatment is implemented at the pre-determined point in time. Observations are continued throughout the treatment phase, as in a simple AB design. Subjects who fail to display stable responding would be dropped from the formal investigation; however, their eventual reaction to treatment might serve as useful replication data. (p. 258)

Although Watson and Workman's strategy represents a viable approach, it should be used only as a last resort, since history cannot be controlled. But, of course, with increased numbers of replications, one's confidence in the procedure under evaluation is enhanced.

Another variation in the basic multiple-baseline design is the use of the multiple-probe technique (Horner & Baer, 1978). This strategy was designed to deal with those instances where reactivity may occur simply as a result of repetition of assessment in the successive baselines (e.g., Bellack, Hersen, & Turner, 1976). Thus, instead of having 8, 12, and 16 assessments in 3 successive baselines, fewer probes may be carried out, resulting in, for example, 2, 3, and 4 measurement points. As a consequence, a somewhat weaker design results, especially if baseline data are not stable, therefore providing suggestive rather than confirmatory evidence of the controlling effects of treatment. Indeed, "as in the case of the nonconcurrent multiple baseline design, it should not be employed as a substitute for continuous measurement when that is feasible" (Barlow & Hersen, 1984, p. 248).

Changing-Criterion Design

The changing-criterion design (cf. Hartmann & Hall, 1976) appears to be ideal for assessing shaping programs to accelerate or decelerate behaviors (e.g.,
increasing activity in overweight individuals; decreasing alcohol consumption in heavy drinkers). As a strategy, it bears characteristics similar to those of the A-B design and has some features of the multiple-baseline strategy. Following initial baseline observation, treatment is applied until a given criterion is achieved and stability at that level appears. Then, a more rigorous criterion is selected, and treatment is applied until the performance level is met. Changes in criterion level as a result of the second treatment are contrasted with the lower criterion in Treatment 1. Treatment is thereby continued in this stepwise fashion until the final criterion is met. "Thus, each phase of the design provides a baseline for the following phase. When the rate of the target behavior changes with each stepwise change in the criterion, therapeutic change is replicated and experimental control is demonstrated" (Hartmann & Hall, 1976, p. 527).

[Figure 20 graph not reproduced; x-axis: Days.]

Figure 20. Data from a smoking-reduction program used to illustrate the stepwise criterion-change design. The solid horizontal lines indicate the criterion for each treatment phase. (From "The Changing Criterion Design" by D. P. Hartmann and R. V. Hall, 1976, Journal of Applied Behavior Analysis, 9, 527-532, Fig. 2. Copyright 1976 by the Society for the Experimental Analysis of Behavior, Inc. Reprinted by permission.)

An excellent example of the changing-criterion design was provided by Hartmann and Hall (1976) in their evaluation of a smoking-deceleration program. The baseline smoking level is graphically depicted in Panel A of Figure 20. In B (treatment), the criterion rate was established as 95% of baseline (i.e., 45 cigarettes per day). An escalating response cost of $1 was set for smoking Cigarette 47, $2 for Cigarette 48, and so forth. If the subject smoked fewer than the criterion number of cigarettes, an escalating bonus of 10 cents per cigarette was established. Subsequent treatment in C through G involved the same contingencies, with the criterion for each succeeding phase set at 94% of the previous one.

The experimental analysis clearly shows the efficacy of the contingencies established in reducing cigarette smoking by 6% or more from the preceding phase. In addition, within the individual analysis, there were six clear replications of the treatment's effect. In short, we agree with Hartmann and Hall (1976)
that "the changing criterion design is capable of providing convincing demonstrations of experimental control, seems applicable to a wide range of problematic behaviors, and should be a useful addition to applied individual subject methodology" (p. 532).
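
The stepwise logic of the changing-criterion design lends itself to a short numerical sketch. The code below is illustrative only. The starting criterion of 50 is a hypothetical value, not a figure from the smoking study; the function names are invented here; and reading the escalating response cost as a running total ($1 for the first response over criterion, plus $2 for the second, and so on) is an assumption about how the charges accumulate.

```python
def criterion_schedule(first_criterion, ratio, n_phases):
    """Criterion for each treatment phase; each one is `ratio` times the previous."""
    criteria = [first_criterion]
    for _ in range(n_phases - 1):
        criteria.append(criteria[-1] * ratio)
    return criteria

def escalating_cost(count, criterion, unit=1.0):
    """Total cost when the k-th response over criterion costs k * unit."""
    excess = max(0, count - criterion)
    return unit * excess * (excess + 1) / 2

# Six treatment phases, each criterion set at 94% of the preceding one.
criteria = criterion_schedule(first_criterion=50, ratio=0.94, n_phases=6)
print([round(c, 1) for c in criteria])

# Two responses over a criterion of 50 cost 1 + 2 units.
print(escalating_cost(count=52, criterion=50))  # 3.0
```

Because each phase's criterion serves as the baseline for the next, a record in which the observed rate tracks every successive value in `criteria` amounts to a series of replications within one subject.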

Simultaneous Treatment Design

In the simultaneous treatment design (cf. Kazdin & Geesey, 1977; Kazdin & Hartmann, 1978), there is the opportunity to compare two or more treatments within a single subject. This, of course, is in marked contrast to the other strategies we have discussed to this point, wherein design limitations do not allow for such comparisons.

As with all single-case designs, there are particular circumstances under which the simultaneous treatment design may be implemented. Thus, there must be the opportunity to evaluate at least two stimulus dimensions (e.g., different times of day, locations, or treatment agents). In a classroom study, different time periods may be the stimulus dimension (morning vs. afternoon). During the baseline phase, the targeted behavior is evaluated in each of the stimulus dimensions. Then two (or possibly more) interventions (e.g., individual vs. group contingencies) are applied concurrently in each of the stimulus dimensions. In order to avoid a possible treatment-stimulus dimension confound, each of the two interventions is counterbalanced across dimensions. For example, on the first day, Treatment A is administered in the morning; Treatment B is administered in the afternoon. On the second day, Treatment B is administered in the morning; Treatment A is administered in the afternoon, and so forth. The results of the two treatments are plotted and visually examined. (It is possible to evaluate the effects of counterbalanced treatment with statistical analyses similar to those employed in the analysis of a Latin square design: see Benjamin, 1965.) In the third phase of the study, the most efficacious treatment is applied across each of the stimulus conditions.

Let us look at an example of this design in a study carried out by Kazdin and Geesey (1977). In this investigation of classroom behavior, the effects of token reinforcement for the subject alone versus token reinforcement for the subject and the rest of his class were evaluated, with percentage of attentive behavior as the dependent measure. This study was done in counterbalanced fashion for two separate classroom periods. Figure 21 (bottom part) reveals that percentage of attentive behavior during baseline ranged from 40% to 60%. Implementation of the token program for the subject alone (i.e., self) led to an average percentage of attentive behavior of 72.5%. By contrast, the token program for the subject and the rest of the class (i.e., class: backup reinforcers were earned for himself and the entire class) led to 91% attentive behavior. Thus, in the third phase, the superior procedure was continued across both class periods, with a mean percentage of attentive behavior of 91.2% attained.

[Figure 21 graph not reproduced; phase label: Token Rft; legend: Self, Class; x-axis: Days.]

Figure 21. Attentive behavior of Max across experimental conditions. Baseline (base): no experimental intervention. Token reinforcement (token rft): implementation of the token program, in which tokens earned could purchase events for himself (self) or the entire class (class). Second phase of token reinforcement (token rft 2): implementation of the class exchange intervention across both time periods. The upper panel presents the overall data collapsed across time periods and interventions. The lower panel presents the data according to the time periods across which the interventions were balanced, although the interventions were presented only in the last two phases. (From "Simultaneous-Treatment Design Comparisons of the Effects of Earning Reinforcers for One's Peers versus for Oneself" by A. E. Kazdin and S. Geesey, 1977, Behavior Therapy, 8, 682-693, Fig. 2. Copyright 1977 by Behavior Therapy. Reprinted by permission.)

In further considering the simultaneous treatment design, Kazdin and Hartmann (1978) pointed out that the behaviors selected for study must be those that can rapidly shift and that do not evince carry-over effects after termination. By necessity, this would preclude the evaluation of certain drugs in this kind of design. Also, because of the counterbalancing requirement, relatively few behaviors can be evaluated (probably not more than three). Finally, "The client must make at least two sorts of discriminations. First, the client must
discriminate that the treatment agents and time periods are not associated with a particular intervention because the interventions vary across each of the dimensions. Second, the client must be able to distinguish the separate interventions. One would expect that the greater the discrimination made by the client the more likely there will be clear effects or discrepancies between (among) treatments" (Kazdin & Hartmann, 1978, p. 919).

Statistical Analysis

There has probably been no aspect of single-case research in recent time more fraught with controversy than that involving statistical analysis (cf. Baer, 1977; Hartmann, 1974; Huitema, 1985; Jones et al., 1977; Kazdin, 1984; Keselman & Leventhal, 1974; Kratochwill et al., 1974; Michael, 1974a, b; Wampold & Workman, 1986). The critics and the advocates of the use of statistics are equally intense about attempting to persuade colleagues and students to their respective positions. In this section, it is not our purpose to attempt to resolve the controversy. Rather, it is our explicit intention to look at the arguments and to ferret out the data in relation to these arguments. In so doing, we will briefly examine the opposing positions while considering some of the recommended statistical techniques.

The Case Against

The basic argument against the use of statistics in single-case research involves the distinction between clinical and statistical significance. Indeed, one of the specific arguments against the group comparison method is that statistics do not give the experimenter a "true" picture of the individual pattern of results. That is, positive and negative treatment effects cancel out; in addition, statistics may possibly yield significance from very weak overall treatment strategies. Thus, if the effect of treatment is not sufficiently substantial to be detected by visual inspection (i.e., considerable overlap of data between baseline and treatment phases), then the treatment applied is not clinically potent and its controlling effects have not been clearly documented. On the other hand, if treatment is of sufficient potency to yield considerable clinical change, then there is the expectation that such change may approach the social norm (i.e., social rather than statistical validation) (see Kazdin, 1977). This being the case, statistical analysis should prove superfluous.

Kazdin (1976) has summarized the case against statistical analysis in single-subject research as follows:

Individuals who advocate non-statistical criteria for evaluation caution against "teasing out" subtle effects because these effects are least likely to be replicable. Moreover, involving statistical significance as the only criterion for evaluation does not encourage the investigator to obtain clear unequivocal experimental control over behavior. Finally, many investigators believe that in clinical work statistical evaluation is simply not relevant for assessing therapeutic change. (p. 272)

The Case For

The advocates of statistical analyses for single-case research recommend them for several reasons. The most persuasive argument has been presented by Jones et al. (1977). In contrasting the statistical approach with visual analysis for a number of studies published in the Journal of Applied Behavior Analysis, it was found that in some instances, time-series analyses (cf. Glass, Willson, & Gottman, 1975) confirmed the experimenters' conclusions based on visual inspection. In other instances, time-series analyses did not confirm the experimenters' conclusions. In still other cases, time-series analyses indicated the presence of statistically significant findings not identified by the experimenters. Consequently, Jones et al. (1977) concluded that

all three kinds of supplementary information provided by time-series analysis are useful. It is rewarding to have one's visual impressions supported by statistical analysis. It is humbling and/or educational to have other impressions not supported. And it is clearly beneficial to have unseen changes in the data detected by a supplementary method of analysis. It is difficult to see how operant researchers can lose in the application of time-series analysis to their data. (p. 166)

Statistical analysis may prove helpful when baseline stability is difficult to establish and considerable overlap exists between the baseline and the treatment phases. As pointed out by Kazdin (1976), "whereas visual inspection of the data often entails noting distinct changes in trends across phases, statistical analysis can scrutinize continuous shift across phases where there is not change in trend" (p. 270).

A third use advocated for statistical analysis is for investigations in so-called new areas of research. Presumably, in these newer areas, therapeutic techniques are unlikely to be fully refined and developed; hence, there is a lesser likelihood that marked clinical differences will appear on visual inspection. Thus, in the early stages of research, it is argued that statistics may
reveal small but important differences with clinical implications.

A fourth reason offered for the use of statistical analyses (cf. Kazdin, 1976, 1984) is the increased intrasubject variability in uncontrolled research settings (e.g., in the natural environment). Again, the argument put forth is that the statistical approach may detect changes that could eventually have some clinical impact when the specific therapeutic or educational strategy is later refined.

T-Test and ANOVA

A number of t-test and analysis-of-variance (ANOVA) techniques have been adapted for use in single-case research across the different phases of a given study (cf. Gentile, Roden, & Klein, 1972; Shine & Bower, 1971). If we compare the ANOVA in single-case research and group comparison designs, the treatment factor in the single-case study is analogous to the between-group factor. Similarly, the number of observations within a phase is comparable to the within-group factor. In developing their ANOVA technique, Gentile et al. (1972) assumed that the performance of a response within a phase is independent of each other response. However, it should be noted that they were aware of "the high autocorrelation of adjacent observations" (Kazdin, 1975, p. 276). To control for this factor, Gentile et al. suggested combining nonadjacent phases in the A-B-A-B design (i.e., A1 + A2; B1 + B2) in computing the statistical analysis.

Despite the correction factor suggested by Gentile et al., there are two basic problems in using the ANOVA model. First and foremost is the issue of dependency. As argued by Kazdin (1976), "combining phases does not at all affect the problem of non-independent data points and the decreased variability among observations within phases, two factors that can positively bias F tests" (p. 277). The second problem is that the ANOVA essentially contrasts the means of each phase. Thus, the statistical model proposed fails to take into account data trends as represented by the slope of the curve. In short, it would appear that the criticisms of applications of traditional group statistics to the single-case study are warranted (see Hartmann, 1974; Keselman & Leventhal, 1974).

Time-Series Analysis

Time-series analysis controls for the problems alluded to above in that the statistical strategy takes into account change in the level, change in the slope of the curve, and the presence or absence of drift or slope in the curve (see Jones et al., 1977). Indeed, Figure 22 depicts six illustrative treatment effects that may be ascertained through the use of time-series analyses. As noted by Jones et al., in some instances the mere visual analysis of such data might yield erroneous conclusions.

[Figure 22 graphs not reproduced. Legend: t0 = start of time series; ti = interruption point; tp = end of time series. Each panel shows a Baseline phase followed by a Treatment phase. Panels: (a) change in level, zero trend, no change in trend; (b) no change in level, nonzero trend, no change in trend; (c) change in level, nonzero trend, no change in trend; (d) no change in level, nonzero trend, change in trend; (e) change in level, nonzero trend, change in trend; (f) no change in level, zero trend, no change in trend.]

Figure 22. Six illustrative treatment effects: combinations of level and trend changes detectable by time-series analysis. (From "Time-Series Analysis in Operant Research" by R. R. Jones et al., 1977, Journal of Applied Behavior Analysis, 6, 517-531, Fig. 1. Copyright 1977 by the Society for the Experimental Analysis of Behavior, Inc. Reprinted by permission.)

Despite the obvious utility of the time-series approach, it is not without its limitations. First, to meet the requirements of the analysis, a fairly large number of observations may be required (i.e., 50-100). Although feasible in some investigations, this number would preclude the use of statistics in many others where short-term treatment effects are being evaluated. More recently, Huitema (1985) has presented evidence questioning whether in many of our single-case studies "data have highly autocorrelated residuals" (p. 107). Indeed, he argues that complicated time-series analyses are neither "appropriate" nor necessary in such instances. However, Huitema's (1985) assertion has been challenged by Suen (1987; Suen & Ary, 1987), who noted "that the findings of no or little autocorrelation in behavior analyses data can best be described as inconclusive" (Suen & Ary, 1987, p. 113). At this point in time it is safe to say that the philosophical and technical controversies concerning use of statistics in single-case research continue unabated.

Additional Comments

There can be no doubt that statistical analysis for single-case research has its merits and should proliferate in the future. Also, the reader should keep in mind that many other statistical strategies (not discussed in this chapter) have appeared and undoubtedly will continue to appear in the press. (For a more comprehensive coverage of the area, the reader is referred to Huitema, 1985; Kazdin, 1976; Kratochwill, 1978).

Replication

In the previous section, we looked at some of the statistical techniques that might serve to confirm (or even to supplant) the experimenter's visual analysis of his or her data. The objective in using a statistical technique is to guarantee that the visual inspection of trends indicating controlling effects of treatment variables on dependent measures is indeed valid. Assuming a high concordance between a visual and a statistical analysis of the data (thus confirming the treatment's efficacy for the one subject), the question, of course, remains whether the same effect can be repeated in a different subject. Thus, replication is concerned with the reliability and the generality of findings.

Elsewhere, Hersen and Barlow (1976) noted that

Replication is at the heart of any science. In all science, replication serves at least two purposes: First, to establish the reliability of previous findings and, second, to determine the generality of these findings under differing conditions. These goals, of course, are intrinsically interrelated. Each time that certain results are replicated under different conditions, this not only establishes
generality of findings, but also increases confidence in the reliability of these findings. (p. 317)

The importance of replication in single-case research should be quite obvious from a strictly scientific standpoint. Also, however, critics of the single-case research approach have chastised applied behavioral researchers for reporting chance findings in single cases, despite the apparent demonstration of experimental control in each. Thus, as in the use of statistics in the experimental analysis of behavior, replication (or its absence) is a controversial point.

In this section, we are concerned with the description of three types of replication strategies referred to in Hersen and Barlow (1976): (1) direct, (2) clinical, and (3) systematic. For each type of replication series, the specific issues and guidelines are considered.

Direct Replication

Sidman (1960) has defined direct replication as "repetition of a given experiment by the same investigator" (p. 72). This could involve replication of a procedure within the same subject or across several similar subjects. As argued by Hersen and Barlow (1976), "while repetition on the same subject increases confidence in the reliability of findings and is used occasionally in applied research, generality of findings across clients can be ascertained only by replication on different subjects" (p. 310). Generally, the same investigator (or research team) repeats the study in the same setting (e.g., school, hospital, or clinic) with a set of clients who present with a similar educational or psychological disorder (e.g., unipolar nonpsychotic depression). Although such clients may differ to some extent on demographic variables, such as age, education, and sex, it is better for a direct replication series if these are closely matched. This is of special importance for instances where failure to replicate occurs. In such cases, failure, then, should not be attributed to differences in demographic variables. Of paramount concern is that the identical procedure be applied across the different subjects. Otherwise, possible

example (H. L. Mills, Agras, Barlow, & J. R. Mills, 1973), there were four successful replications with design modifications during replications. In still another series (Wincze et al., 1972), there were mixed results in nine replications.

When mixed results occur, the investigator should search for the causes of failure and refine the treatment procedures. According to Hersen and Barlow (1976), if one successful experiment is followed by three successful replications, then it is useful to begin a systematic replication series, in which different behaviors in the same setting or similar behavior in different settings are treated by different therapists. If, on the other hand, one successful treatment is followed by two failures to replicate, the investigator should carefully study the variables that account for the failure. This can be as important as a successful attempt at direct replication, inasmuch as new hypotheses may be generated, leading to vastly improved treatments.

Clinical Replication

Hersen and Barlow (1976) have defined clinical replication as the "administration of a treatment package containing two or more distinct treatment procedures by the same investigator or group of investigators. These procedures would be administered in a specific setting to a series of clients presenting similar combinations of multiple behavioral and emotional problems, which cluster together" (p. 336). Examples might be schizophrenia or childhood autism (e.g., Lovaas, Koegel, Simmons, & Long, 1973).

The usefulness of this effort also depends to some extent on the consistency or reliability of the diagnostic category. If the clustering of the target behaviors is inconsistent, then the patients within the series would be so heterogeneous that the same treatment package could not be applied to successive patients. For this reason, and because of the advanced nature of the research effort, clinical replications are presently not common in the literature. (Hersen & Barlow, 1976, p. 336)

Guidelines for clinical replication are essentially the same as for direct replication. However, interclient characteristics are, by definition, likely to be more
failures or unusual successes may occur, with attribu- heterogeneous, thus necessitating a longer replication
tion to subject characteristics erroneously assumed. series than in the case of direct replication. Also, suc-
Hersen and Barlow (1976) have described several cessful clinical replication should lead to a systematic
series where direct replications have been undertaken. replication series.
In one, the initial experiment was successful followed
by two replications in the treatment of agoraphobia
Systematic Replication
(Agras, Leitenberg, & Barlow, 1968). (The same held
true for a study reported by Hersen et aI., 1973, on Systematic replication is defined "as any attempt to
token economy and neurotic depression.) In a second replicate findings from a direct replication series, vary-

ing settings, behavior change agents, behavior disor- research strategy, the single-case approach has had a
ders, or any combination thereof. It would appear that long and interesting history. In this chapter, we first
any successful systematic replication series in which examined the historical roots of the current experimen-
one or more of the above-mentioned factors is varied tal analysis-of-behavior model. Then, we considered
also provides further information on generality of find- general issues, such as intrasubject variability, inter-
ings across clients since new clients are usually in- subject variability, generality of findings, and vari-
cluded in these efforts" (Hersen & Barlow, 1976, p. ability as related to generality. We next looked at some
339). of the general procedures that characterize most sin-
As for specific guidelines, systematic replication gle-case research (repeated measurement, choice of a
ideally begins after one successful initial experiment baseline, the changing of one variable at a time, length
and three direct replications have been carried out. of phases, reversal and withdrawal, and evaluation of
However, although the word systematic is included, irreversible procedures). This was followed by a dis-
usually such a series is carried out by researchers, ei- cussion of basic A-B-A designs, their extensions, and
ther concurrently or in succession, in a number of set- the additional designs required when conditions for the
tings. Some researchers may be in direct communica- A-B-A strategies cannot be met. We briefly looked at
tion with one another, but more frequently, it turns out the thorny issue of statistical analysis in single-case
that they are simply working on similar problems; research and ended with a discussion of direct,
hence, the possibility (albeit not really systematic) of clinical, and systematic replication.
replications with variation. Probably the largest such In conclusion, we should point out that the field is
series in the behavioral literature is the one involving not static and that newer design and statistical tech-
dozens of single-case studies showing the efficacy of niques will undoubtedly emerge over the course of the
differential attention procedures for adults and chil- next few years (see Barrios, 1984). This, of course, is a
dren (cf. Hersen & Barlow, 1976, pp. 344-352). healthy phenomenon that we can only applaud. More-
In examining a systematic replication series, it is over, we should acknowledge that there are some in-
important to note differences in therapists, treatment herent limitations in all design strategies, including
settings, and clients. In that sense, the objective of a single-case analysis, that all researchers need to recog-
systematic replication series is to determine exceptions nize (Cavell, Frentz, & Kelley, 1986). Thus, we see
to the rule (i.e., those instances in which a given treat- nothing inherently wrong in, at times, using the single-
ment strategy will not work for a given client or for a case approach to generate treatment hypotheses that
particular therapist). This certainly was the case when subsequently may be refined and then pitted against
Wahler (1969) found that differential attention was not one another in larger scaled group comparison studies.
an effective treatment for dealing with oppositional
children. Thus, the limits of applicability of differen-
