Methodological Considerations
ABSTRACT: Conducting research with persons who have advanced dementia is
more difficult than during the earlier stages of the disease because of severe cognitive
and language difficulties that preclude abstract thinking, being interviewed, or self-reporting. In order to directly obtain data on subjects, researchers require observational scales that can measure agitation, discomfort, engagement, pain, or resistiveness to care. Family or staff respondents are necessary to measure other factors, such
as disease severity, quality of life, or satisfaction with family visits or end-of-life care.
All scales must meet criteria of conceptual, methodological, operational, and empirical adequacy. This chapter describes approaches used in developing such scales and
presents evaluation criteria in reporting how a scale was developed (purpose, framework, literature review, design, and methods) and in critiquing a scale's merits. The
chapter also describes observer training and evaluation and discusses data management and recoding strategies. The goal is to overcome any potential inaccuracy in
using scales. The accuracy of a scale depends on the reliability and validity of the
scale and on the accuracy of data collection and management.

Four pillars, ranging from abstract to concrete, support excellence in research: conceptual,
methodological, operational, and empirical adequacy. A flaw in any of these pillars jeopardizes the entire project. Each pillar, alone and in combination, needs to be considered
during the phases of planning, conducting, analyzing, and utilizing research.
Conceptual adequacy relates to the abstract underpinnings of the research topic. A topic
does not exist alone, unconnected to other topics; instead, it exists theoretically as an intangible component of a conceptual framework or as a facet of a concept within a model.
The framework provides an organized way of thinking about the total project and all of
its parts, thereby creating a lens through which the problem is viewed as well as a roadmap that guides methodological choices and connects the study with existing knowledge
of the topic. Whether one is conducting a multisite clinical trial or a unit-based quality-
improvement initiative, the framework will logically connect all components of the project.
How scales are conceptualized and concepts/topics are defined determines how scales
are developed, helps in the selection of specific scales, and guides how scales should be
used. Some scale selections are easy to make. For example, when the person with dementia is not interacting with a caregiver, one would not use a scale to measure resistiveness,
which is invoked by a caregiving encounter and is defined as "the repertoire of behaviors with which persons with dementia withstand or oppose the efforts of a caregiver"
(Mahoney et al., 1999, p. 28). Similarly, one would select a pain scale, such as the PAINAD,
if planning an intervention to alleviate symptoms associated with an invasive procedure
that will cause tissue damage (Warden, Hurley, & Volicer, 2003).

Methodological adequacy is achieved when researchers have rigorously used the correct
methods to answer the research questions in order to ensure the quality of the research
project. The type of study design must be determined (experimental, quasi-experimental,
or nonexperimental), which in turn determines what measures and statistical tests should
be used. To guarantee that the data are accurate and that error is minimized or controlled, one must identify procedures and techniques that are appropriate for the project
and consistent with its conceptual underpinnings.
For example, what technique(s) should be used to study a concept, and what are the
best methods for collecting data? When studying agitation in a person with advanced
dementia, it is clear that self-report is impossible, but should one observe the person directly or use a proxy respondent to provide a retrospective summation and evaluation?
Or should researchers conduct direct observation as well as use retrospective reports? The
use of multiple measures of the same concept strengthens a study and is recommended
when feasible. For example, Camberg et al.'s study of enhancing well-being, in which agitation was considered the inverse state of well-being, used both direct observation and
retrospective recall to provide comprehensive results about the efficacy of using Simulated Presence as an intervention for persons with dementia (Camberg et al., 1999). (For
more details about this study, see Chapter 10.) It is important to keep in mind that a caregiver cannot simultaneously provide care and collect objective data using a behavioral
observation scale.
Operational adequacy means that the mechanisms to achieve the project's goals are all
set; that is, the research staff, equipment, devices, instruments, and so forth work properly. Operational adequacy relates to the infrastructure of the project, such as the intervention (treatment fidelity) and outcome measures (scales). One can determine a scale's
operational adequacy by examining its psychometric properties (reliability and validity).
It is important to examine the initial values obtained when the scale was developed, the
scale's psychometric performance in subsequent projects, and the stated values in the
study being reported. One may view psychometrics as both the "paper" evaluation and
the "people" evaluation, the latter having to do with whether the scale is used with accuracy and consistency.
Two other related factors, range and sensitivity, should be considered, each having to
do with the scale's capacity to detect and quantify the concept being studied. While scales
developed in a norm-referenced (versus a criterion-referenced) framework have a range, it
is important to consider that the range of the scale may not fit the range of the actual phenomenon. For instance, the Mini-Mental State Examination (MMSE) is commonly used
as a measure to rate disease severity (Folstein, Folstein, & McHugh, 1975). Yet, the MMSE
has a "bottom" effect, meaning that once the lowest scoring option of 0 is reached, the
MMSE cannot detect any more changes, although there may be vast differences between
persons scoring 0 in terms of disease severity. The BANS-S (Volicer, Hurley, Lathi, &
Kowall, 1994) was developed in order to detect differences in disease severity among persons with advanced dementia who "bottom" on the MMSE (see Chapter 2). Sensitivity in
this context (versus the context of sensitivity and specificity in criterion-referenced scales)
refers to the scale's capacity to detect clinically significant differences when they do exist. For instance, the RTC-DAT (Mahoney et al., 1999) uses a scoring system whereby the
severity of each item is calculated by multiplying the intensity (3-point scale) by duration
(4-point scale) to yield a severity score range of 1 to 12 for each item rated as present (see
Chapter 11). Thus, individual items are sensitive enough to detect wide gradations in the
target behavior and contribute to the overall sensitivity of the RTC-DAT.
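
To make this intensity-by-duration scoring concrete, the following minimal Python sketch computes item severities for one hypothetical observation. The behavior names, ratings, and the summing of item severities are invented for illustration and do not reproduce the published RTC-DAT item set.

```python
# Illustrative sketch only (invented items, not the published RTC-DAT items):
# each behavior rated as present gets an intensity rating (1-3) multiplied by a
# duration rating (1-4), yielding an item severity of 1-12.
observation = {
    "turns away":  {"present": True,  "intensity": 2, "duration": 3},
    "pushes away": {"present": True,  "intensity": 3, "duration": 4},
    "cries":       {"present": False, "intensity": 0, "duration": 0},
}

item_severity = {
    item: r["intensity"] * r["duration"]
    for item, r in observation.items()
    if r["present"]
}
print(item_severity)                # {'turns away': 6, 'pushes away': 12}
print(sum(item_severity.values()))  # 18 (summing across items is an assumption here)
```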

Scales' reliability and validity, as well as raters' accuracy, also must be reviewed and
evaluated. Whether manually extracting or downloading data from an electronic record/
database or observing specific expressions or actions, raters must be accurate and consistent time after time. Another rater should be able to obtain nearly identical results during
simultaneous observations. Additionally, several traditions are followed when judging
psychometric adequacy. When a scale is being developed, one tradition is that 10 subjects
are required per scale item to conduct a factor analysis, but this rule has been challenged, with 5 subjects
per item considered acceptable (Knapp & Brown, 1995). Also, the benchmark of .8 Cronbach's alpha
to ensure adequate internal consistency can be reduced to .7 alpha for a new (versus a
mature) scale (Nunnally & Bernstein, 1994).
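
The internal consistency benchmark mentioned above is easy to verify for a given dataset. The short Python sketch below implements the standard Cronbach's alpha formula; the item scores are invented for illustration.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_subjects x n_items) matrix of item scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Invented example: 6 subjects rated on a 4-item scale
scores = np.array([
    [2, 3, 2, 3],
    [1, 1, 2, 1],
    [3, 3, 3, 2],
    [0, 1, 1, 0],
    [2, 2, 3, 3],
    [1, 2, 1, 1],
])
print(f"alpha = {cronbach_alpha(scores):.2f}")  # compare with the .7 (new) / .8 benchmarks
```
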
Empirical adequacy means that researchers use appropriate mechanisms to answer research questions or test hypotheses. The techniques used to manage data must be thorough and complete, and decisions for manipulating data should be justified and reported.
The statistical tests selected need to be suitable for doing the job and must be used
correctly. Both the absolute value/s and probability of error must be reported. If a value
falls outside the standard range for accepting a hypothesis or judging psychometric properties to be adequate, and there is a rationale for declaring the results to be satisfactory,
the rationale must be cited. While it is often important and necessary for researchers to
follow formal procedures and maintain standards, pragmatic considerations should also
have weight. For example, it may not be logistically possible to obtain the desired number
of subjects required according to a power analysis, but the topic may still be important,
timely, and worthy of study. In this case, although the study may lack the power to detect
statistically significant differences, it may still detect clinically meaningful differences or
identify a trend.
In conclusion, we suggest that research consumers follow the chronological steps of a
project's instrument development and ask: Is this the best way to view this overall issue,
to plan this specific project, and to address and manage issues that are likely to surface?
To answer these questions, one must review the overall assumptions and inner workings
of the project, including the following:



1. Identification and delimitation of the problem
2. Specification of the purpose and aims
3. Synthesis of the literature review
4. Determination of methods and procedures:
   a. Identification of type/s of sites and subjects
   b. Selection of scales
   c. Establishment of data collection strategies
   d. Planning and conducting the analysis
   e. Interpreting the data
   f. Evaluating the findings
5. Placement of the results in the context of existing knowledge
Since conceptual, methodological, operational, and empirical adequacies are so closely
linked, it would be artificial to discuss evaluating scales under those headings. Our approach is to align the next sections of this chapter with the typical outline of how a scale is
developed (initial report) and subsequently used (research projects). This book is neither
an instrument development nor research methods book, but rather a practical guide to
help potential users evaluate a scale's merits. We provide some citations to reference specific statements, but other information on measurement and research terms or statistical
tests can be found in classic texts (Tabachnick & Fidell, 2013; Waltz, Strickland, & Lenz,
2010), or via the Internet.

Purpose, Framework, and Literature Review


The purpose for developing a scale (need for making a specific concept operational)
should be clear, linked to the framework, and supported by the literature. One must
carefully read both the initial scale development article and reports of projects that used
the scale to verify that the scale's topic (concept) does not overlap with other concepts.
The topic should be clearly defined, consistently used, and visibly linked to its theoretical roots. One needs to look for consistencies and/or discrepancies in how the scale was
conceptualized, reviewing theoretical and operational definitions of the topic measured.
Some studies using the scale may offer additional psychometric support, providing evidence suggesting use of the scale. Other studies may find that even when researchers use
the scale as the developers specified, psychometrics are inadequate and do not provide
evidence supporting further use. Or, a study might use the scale in ways inconsistent with its
conceptualization or alter the scale without providing reliability and validity results to
suggest that the original and altered scales are equivalent.
The framework drives the approach for developing and using a scale. A framework
may have been used for several decades (see Chapter 6) or may be developed empirically as the project is conceptualized and carried out (see example of the model developed for the RTC-DAT scale in Chapter 11). When frameworks are depicted graphically, relationships among concepts are explained in a straightforward manner, making
it easy to visualize the place of the concept within the framework and its relationships
with other concepts (see model in Chapter 6). When a framework is not depicted graphically, draw
your own model to help determine whether or not the framework informed the development of the scale, from the conceptualization of research questions, through the selection and
use of appropriate methods and the elimination of potential validity threats, to the final results
and conclusions.
The literature review should provide a concise synthesis and critical analysis of relevant research using both seminal (possibly decades old) and current work to explicate
the concept measured by the scale and to identify relationships among different works.
Published works should inform readers about the state of the science regarding the studied concept, illuminating what is known, unknown, generally accepted, disputed, or inconsistent about the topic (e.g., major methodological flaws, gaps in research, or issues
that need further study). The literature review for the original scale report should support
why the new scale is needed. This original report should also define where and how to
collect the data and should indicate who should collect data when administering the scale
(e.g., archived database or electronic health record, self-report, proxy caregiver or family respondent, direct or recorded observation). In addition to providing the results of
hypothesis testing, subsequent articles using or reviewing the scale should provide the
scales psychometrics in that study and enough data so that others can compute an effect
size for planning follow-up studies.

Design and Methods


Instrument development reports should share the steps used in developing an instrument
and follow a logical plan with a series of consecutive phases. The scales presented in this
book adhered to the tenets of classical measurement theory as suggested in Measurement in Nursing and Health Research (Waltz et al., 2010). Look for a blueprint of how the
investigators proceeded and criteria to ensure conceptual and empirical adequacy, then
investigate how these drove scale development procedures. For instance, did the investigators determine that the scale was empirically grounded (Tilden, Nelson, & May, 1990);
judged to have content validity (Lynn, 1986); and accepted by potential users (Wyatt &
Altman, 1995)? Did the scale have adequate initial reliability estimates of internal consistency (Nunnally & Bernstein, 1994)? Were stages of the development process delineated
and followed? The development process usually consists of four basic stages: 1) determining the blueprint, 2) creating the item pool, 3) examining content validity, and 4) critiquing readability and procedures.
A scale blueprint is a plan that sets forth how the scale will be developed, similar to
how a building blueprint informs construction workers and tradespersons as they translate a paper diagram into the specified edifice. The blueprint provides guidelines that help
researchers identify potential scale items (whether from the literature, from potential subjects or users, or by interview or observation). Look for both conceptual and operational
definitions of the concept and examine them to judge if they meet criteria of conceptual
and operational adequacy. Then look to see how the item pool was developed and how
items were combined, refined, and edited. To what degree does the item pool represent
the universe of empirical indicators of the concept? Review how content validity was
judged. An established method of judging content validity (Waltz et al., 2010) involves
using content experts to rate each item and its description for relevance and congruence
with the concept's operational definition. One of the seminal articles to guide this process
is Lynn's "Determination and Quantification of Content Validity" (Lynn, 1986), in which
a grid guides investigators on the number of judges and agreements needed to provide a
content validity index beyond the .05 level of significance for retaining items.
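
As an illustration of this step, the brief Python sketch below computes an item-level content validity index (I-CVI) as the proportion of expert judges rating an item 3 or 4 on a 4-point relevance scale. The items and ratings are invented; the Lynn (1986) grid determines how many judges and what level of agreement are actually required.

```python
# Invented expert ratings (1-4 relevance scale) from six hypothetical judges.
expert_ratings = {
    "grabs caregiver": [4, 4, 3, 4, 4, 3],
    "clenches teeth":  [4, 3, 4, 4, 3, 4],
    "hums loudly":     [2, 3, 1, 2, 3, 2],
}

def item_cvi(ratings):
    """Proportion of judges rating the item as relevant (3 or 4)."""
    return sum(1 for r in ratings if r >= 3) / len(ratings)

for item, ratings in expert_ratings.items():
    print(f"{item}: I-CVI = {item_cvi(ratings):.2f}")
# Items with a low I-CVI (here "hums loudly") are candidates for revision or deletion.
```
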
One should also review how administration and scoring procedures were determined. For instance, why was a visual analogue scale with a 100-mm line or a Likert-type
scoring system with anchoring descriptors selected? Anchors can reflect polar opposites of the item, degrees of intensity, numbers of defining characteristics, or duration
during a specific rating period. A "not applicable" (NA) box should be available for raters
to separate NA from a scoring of 0. Typically, high scores equal a high presence of the
concept being studied. Items can be reverse scored so that highly positive is represented
by high scores. Occasionally, low scores mean a high presence of the concept, as in the
QUALID (Weiner et al., 2000) (see Chapter 4). When low scores indicate a high presence
of the concept being studied and two scales are correlated to examine convergent validity,
such as the EOLD-SM (Volicer, Hurley, & Blasi, 2001) and the QUALID, the correlation
(in this case, r = -.64) needs to show a minus sign, since low QUALID scores indicate good
quality of life. Also, it is typical to use the mean scores of items, subscales, and scale totals.
When examining re-test stability (assessed with the paired t-test), researchers need to devise and use a system (specified in the administration instructions) that describes how to
compare data obtained from the same person during two different time periods when it is
necessary to maintain that person's privacy.
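
The scoring conventions described above can be illustrated with a short Python sketch: reverse-scoring items so that high scores consistently indicate a high presence of the concept, and checking the sign of a correlation when one scale is scored in the opposite direction. The scale ranges and values below are invented.

```python
import numpy as np

def reverse_score(item_scores, scale_min=1, scale_max=5):
    """Reverse-score items on a Likert-type scale (e.g., 1-5)."""
    return [scale_min + scale_max - s for s in item_scores]

print(reverse_score([1, 2, 5, 4]))  # -> [5, 4, 1, 2]

# When low scores on one measure (a QUALID-like scale) mean high quality of life,
# its correlation with a positively scored measure should carry a minus sign.
qualid_like = np.array([12, 15, 20, 30, 40])   # lower = better (invented values)
other_scale = np.array([45, 40, 35, 25, 15])   # higher = better (invented values)
r = np.corrcoef(qualid_like, other_scale)[0, 1]
print(f"r = {r:.2f}")  # expect a negative value
```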

Testing
Look for phases and types of reliability testing. Wyatt uses the criterion of credibility (Wyatt
& Altman, 1995) to learn empirically if the items and scale administration procedures are
acceptable to potential users. To be a valuable outcome measure, a scale must be considered reasonable by those who would respond (or not) to individual items. We do not use
the term "face validity," but instead suggest seeking input beyond the research team, such as
a focused group discussion including potential end-point users or their representatives.
While clarifying the teams intent for each of the items, the group can discuss how others
understand each item, resolving concerns and incorporating suggestions (e.g., rewording
items considered to address two distinct areas so that they become separate items).
Several steps should then follow to make empirically derived item-reduction decisions, explore conceptual dimensions, compute reliability estimates for internal consistency and retest stability (if used), and confirm the final scale and administration
procedures. Look for selection of sites and subjects to avoid potential confounders (nonrepresentative settings or subjects) and to obtain diverse subjects with desired characteristics for examining the scale. While instrument development projects are not powered to
identify the minimum required number of subjects (after data cleaning), a rule of thumb is
to plan for a minimum of five participants per item for examining factor structure (Knapp
& Brown, 1995).
One should make an a priori decision as to the percentage of items that must be completed in order to retain a subject's data. Researchers should review demographic characteristics to ensure that they are consistent with the proposed use of the scale (e.g., subjects
with advanced dementia who would not interact with an observer versus a cognitively
intact person). In general, a parsimonious yet reliable scale is desired, so only items considered to perform well are retained. When reviewing internal consistency, delete those
items without a corrected item–total correlation between .3 and .7, because they would
not contribute to a cohesive set of items. Review the remaining items to see if the alpha
value is above the internal consistency criterion for a new scale, which is generally
set at a minimum of .7 (Nunnally & Bernstein, 1994).
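
The item-reduction rule described above can be checked with a few lines of Python. The sketch below computes corrected item–total correlations (each item against the total of the remaining items) for an invented dataset and flags items outside the .3–.7 band.

```python
import numpy as np

def corrected_item_total(items: np.ndarray) -> list:
    """Correlation of each item with the total of the remaining items."""
    items = np.asarray(items, dtype=float)
    out = []
    for j in range(items.shape[1]):
        rest_total = np.delete(items, j, axis=1).sum(axis=1)
        out.append(np.corrcoef(items[:, j], rest_total)[0, 1])
    return out

scores = np.array([  # invented data: 8 subjects x 5 items
    [2, 3, 2, 3, 0],
    [1, 1, 2, 1, 3],
    [3, 3, 3, 2, 1],
    [0, 1, 1, 0, 2],
    [2, 2, 3, 3, 0],
    [1, 2, 1, 1, 3],
    [3, 2, 3, 3, 1],
    [0, 0, 1, 1, 2],
])
for j, r in enumerate(corrected_item_total(scores), start=1):
    keep = 0.3 <= r <= 0.7
    print(f"item {j}: r = {r:.2f}  {'retain' if keep else 'review/delete'}")
```
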
If the concept is not one-dimensional, the presence of potential subscales (factorial
dimensions) can be examined by computing a Principal Components Analysis (PCA). A
PCA can also confirm that there is one construct in the scale. Examine how the researchers addressed missing values (e.g., the percentage of missing data tolerated and whether mean substitution was used). Look for 1) how many factors a reported scree plot indicated and the percent of variance
explained, 2) cut-off scores, and 3) the required difference to indicate (or not) side loading.
Side-loading items should be placed with the factor that has the most conceptual congruence with the item. To judge congruity, review the definitions of the items and compare
them with the conceptual and operational definitions of the concept being measured. Review the reliability and variability of subscales. If a scale has high reliability but minimal
variability (frequency distribution), it is unlikely to detect differences between groups. If
retest stability was examined, look to see that the proper statistic was used and that the
value with its probability value demonstrated that the scores were similar. If there were
differences in sites or conditions where one might expect the concept to differ, compare
results by site(s).
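
As a minimal sketch of this dimensionality check, the Python code below runs a PCA on an item matrix and prints the proportion of variance explained by each component, standing in for a scree plot. The data here are random placeholder values, so no dominant component should be expected; with real, cleaned scale data the pattern of variance and the loadings would be inspected as described above.

```python
import numpy as np
from sklearn.decomposition import PCA

# Invented data: 60 subjects x 8 items; missing values would need handling first.
rng = np.random.default_rng(0)
scores = rng.integers(0, 4, size=(60, 8)).astype(float)

pca = PCA()
pca.fit(scores)
for i, var in enumerate(pca.explained_variance_ratio_, start=1):
    print(f"component {i}: {var:.1%} of variance")  # printed in place of a scree plot
# Loadings (pca.components_) can then be reviewed against a chosen cut-off
# (e.g., |loading| >= .40), with side-loading items assigned on conceptual grounds.
```
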
There are advantages and disadvantages to both direct and recorded observations.
Direct observations do not require expensive equipment, which can be difficult to incorporate into the setting without changing the natural environment. A skilled rater can be
unobtrusive, and while observing for the targeted behavior(s), the rater can also obtain
data on the context in which the behavior(s) occurred. These data, whether scored on a
paper form or electronic device, can be immediately entered into the database. The disadvantage of direct observation is the potential for error by missing or incorrectly classifying behavior(s). Rating recorded observations allows greater control and decreases
potential bias. For example, in one study, the Sloane group (2004) recorded observational
data on digitized videotapes and then randomly presented the data for coding to raters
who were blinded to the study aims, the assignment of subjects, and pre-intervention or
post-intervention status. However, problems can arise when recruiting subjects for studies. One example would be a study that includes videotaping persons either dressing
or being bathed. For instance, the wife of a potential subject expressed concern about her
husband being recorded as he was bathing as part of a study on resistiveness. She would
not allow researchers to record him. She told the project director that although she understood from personal experience exactly why the study was being done, she also realized
how embarrassed her husband would be if he knew how he was behaving. She just could
not allow him to be in the study. There are often justifiable reasons for why the numbers
of potentially available subjects, numbers of enrolled subjects, and numbers of subjects
who completed a project differ.
In addition to reviewing the overall summary in instrument development projects,
look to see if the authors provide recommendations for how the scale should be used and
describe how the scale was developed and tested. Do the authors disclose study limitations or explain why there may have been score differences among subgroups of subjects?
Do they share whether additional testing is suggested to enhance the use of the scale or to
examine use of the scale with other populations? The authors should relate the new scale
to the existing literature (some of which may be more recent than the literature review
conducted before the study was done). Do they explain what high or low scores could
mean and offer suggestions for improving conditions so as to obtain desired scores?
The authors need to provide enough overall information for readers to feel confident that
their interpretation of the results means that the scale is appropriate for use with the desired population.
Since five of the scales described in this book rely on direct observation, we are including a section on observers. The type of project and available resources will determine
observer criteria. For a unit-based quality-improvement project, caregiving staff may be
taught to observe (a plus being the cost efficiency of using existing resources, while potential minuses include insufficient time to teach staff members or relieve them from clinical
responsibilities to observe). For a multisite controlled trial, researchers may hire specific
observers, who should be carefully selected based on background and interests consistent
with the project and the scales being used. For instance, when rating pain in residents
with advanced dementia for a quality-improvement project in a long-term care site, nursing assistants were reliable after 2 hours of training on the PAINAD (Warden, Hurley, &
Volicer, 2003). To observe agitation, college graduates with a background in psychology
or nursing and a special interest in older adults were reliable after 20 hours of classroom
and field training (Hurley et al., 1999).

Observer Training and Evaluation


Whether scoring rating forms while in the field setting or rating videotaped segments in
an office, observers will need to be accurate and consistent. The content and complexity of
the scale and the rating and scoring required determine the curriculum that should begin
in the classroom and proceed to the field (where real-time direct observation is used).
Researchers should develop a rater training program with handouts that highlight components that need to be memorized. Didactic content should include a description of the
overall project, the conceptual basis of the scale and the rationale for its use, item labels with
operational definitions and defining characteristics, and scoring schema. During classroom
training, raters should be tested on how accurately they list defining characteristics and
describe the rating scheme. Videotapes of specific behaviors can be used for teaching the
observers how to rate as well as for ongoing competency testing. For instance, to rate observed discomfort (Hurley, Volicer, Hanrahan, Houde, & Volicer, 1992), we were able
to use videotapes of actual persons with advanced dementia. To rate observed agitation
(Hurley et al., 1999), we videotaped clinical staff and research team members acting out
specific behaviors on the scale with varying degrees of intensity as well as those behaviors
that were not manifestations of agitation.
In the classroom, the observers-in-training should be evaluated on their knowledge
of the research project. Classroom activities should consist of mastery testing of behavior definitions, rating scheme, and scoring decision rules; practice in rating videotaped
examples of behavior; debriefing discussions to clarify rating ambiguities; and measurement and feedback on individual performance. Reliability estimates during classroom
training should be based on a comparison of the observers score against the criterion
score for the videotaped segment established by the research team. Field training should
consist of operationalizing the protocol and observing actual research subjects. Each rater
should conduct observations concurrently with the study master trainer. There are several
statistical options for obtaining an index of rater reliability that correct for agreement occurring by chance (for this reason, simple percent agreement is not recommended).
Cohen's kappa is a widely used statistic. While .8 is desired, values of .6 are considered
acceptable. The paired t-test is another option. If raters are reliable, there would be a high, statistically significant Pearson correlation and low, statistically nonsignificant t-test values.
Intraclass correlation, a form of ANOVA in which raters are the grouping variable and
scale scores are the dependent variable, can also be used.
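
A brief Python sketch of how the first three of these checks might be computed follows, using invented ratings from an observer-in-training and a master trainer; this is illustrative only and is not the authors' training protocol.

```python
import numpy as np
from scipy import stats
from sklearn.metrics import cohen_kappa_score

# Invented ratings for ten simultaneously observed segments.
trainee = np.array([1, 0, 2, 2, 1, 0, 3, 1, 2, 0])
master  = np.array([1, 0, 2, 1, 1, 0, 3, 1, 2, 0])

kappa = cohen_kappa_score(trainee, master)           # chance-corrected agreement
t_stat, p_value = stats.ttest_rel(trainee, master)   # paired t-test on the scores
r = np.corrcoef(trainee, master)[0, 1]               # Pearson correlation

print(f"kappa = {kappa:.2f}")                        # >= .8 desired, >= .6 acceptable
print(f"paired t = {t_stat:.2f}, p = {p_value:.2f}") # want low t, nonsignificant p
print(f"Pearson r = {r:.2f}")                        # want high and significant
```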

Data Management
For both scale development and utilization projects, issues of data management must
be considered. While many of the specific details are not reported in research reports,
a behind-the-scenes process to ensure the accuracy of the dataset should precede any
analysis (i.e., the data must pass a quality assurance check). Embedding error-prevention strategies into the project can reduce many problems, but no matter how carefully
the project is designed and carried out, it is impossible to eliminate all issues. Researchers
should plan to actively search for potential errors and correct them. They might encounter errors with coding (e.g., a missing or incorrect recode) or with values (e.g., missing or
out-of-range entries). Data cleaning is the process of detecting, diagnosing, and editing faulty data,
while data editing involves changing the value of incorrect data (Van den Broeck, Argeseanu Cunningham, Eeckels, & Herbst, 2005). The Van den Broeck team recommends that
reports provide details of data-cleaning methods, error types and rates, and error deletion
and correction rates.
Some have viewed data editing as suspect, even bordering on data manipulation.
There are justifiable concerns about where to draw the line between manipulation and
responsible editing. Good practice guidelines for data management require transparency
and proper documentation of all procedures. Once errors (e.g., extreme or missing values) are identified, it is important to discern and follow the appropriate course of action.
Researchers must decide whether to correct, transform, delete, or do nothing with the
raw data. If patterns of missing data are random and deleting a few subjects does not adversely affect power, then cases can be deleted. Otherwise, values can be estimated from
prior knowledge or imputed (mean substitution).
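
As a minimal sketch of one such editing decision, the Python code below performs mean substitution for scattered missing item values and renames the edited variables so the raw and edited datasets can be compared and the decision reported. The variable names and values are invented.

```python
import pandas as pd

# Invented item data with scattered missing values.
data = pd.DataFrame({
    "item1": [2, 3, None, 1, 2],
    "item2": [1, None, 2, 2, 3],
})
cleaned = data.fillna(data.mean())        # impute column (item) means
cleaned = cleaned.add_suffix("_imputed")  # rename edited variables for later comparison
print(cleaned)
# The raw dataset is retained alongside the edited one so that both can be examined
# and the editing decisions documented in the research report.
```
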
Research projects that use the scales described in this book will either answer research
questions or test hypotheses and will be classified as experimental, quasi-experimental, or
nonexperimental. These research projects need to use appropriate statistical tests. Several
factors determine the selection of suitable tests, including study design, types and numbers of subjects, scale characteristics, and error prevention.
The researcher must ask and answer several questions during the initial planning of
the project to prevent Type I and Type II errors. To avoid a Type I error, avoid common
mistakes in using statistics. For instance, use multivariate analysis of variance (MANOVA)
instead of sequential t-tests. An analysis of covariance (ANCOVA) should be used to account for differences between groups and to test for changes over time as well as interactions between independent variables. A multivariate analysis of covariance (MANCOVA)
tests for differences when there are many dependent variables (with any number of independent variables) to examine the effect of one or more independent variables on several
dependent variables simultaneously. A MANCOVA is used to examine how a dependent
variable varies over time, measuring that variable several times during a given time period. A MANCOVA can also determine whether or not other variables predict variability
in the dependent variable over time.
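
To illustrate a multivariate model of this kind, the Python sketch below fits a MANCOVA-style analysis (two dependent variables, a group factor, and a covariate) with statsmodels; the data and variable names are invented, and this is a sketch rather than a recommended analysis plan.

```python
import numpy as np
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

# Invented data: two groups, a baseline covariate, and two outcome scales.
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "group":      np.repeat(["control", "intervention"], 30),
    "baseline":   rng.normal(10, 2, 60),
    "agitation":  rng.normal(8, 2, 60),
    "discomfort": rng.normal(5, 1.5, 60),
})

model = MANOVA.from_formula("agitation + discomfort ~ group + baseline", data=df)
print(model.mv_test())  # multivariate tests (e.g., Wilks' lambda) for each term
```
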
To avoid a Type II error, a power analysis must be performed to determine the
number of subjects required to detect a statistically significant difference (if indeed one
exists) in the outcome measure (scale used) between control and intervention groups.
Once the statistical test has been determined, the researcher then uses a table or computer program to calculate the sample size. Three numbers are placed in the formula.
Two numbers are set by convention: 1) alpha = .05, to avoid a Type I error, and 2) power =
.8, to avoid a Type II error. Effect size is the third number. The researcher selects an effect size based on the best estimate of anticipated score differences between groups. For
previously used scales, a review of published work would indicate meaningful score
differences between groups to help determine an estimated effect size. When there are
no data from previous research, as in the case of a newly developed scale, Munro (1999)
suggests using one-half of the standard deviation reported in the development article to
estimate a moderate effect size.
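
The sample-size calculation just described can be reproduced with a few lines of Python. The sketch below assumes a two-group comparison of scale scores and uses an assumed moderate standardized effect size of 0.5 alongside the conventional alpha and power values.

```python
from statsmodels.stats.power import TTestIndPower

# Conventions: alpha = .05 (Type I error), power = .8 (Type II error).
# The effect size of 0.5 is an assumed, moderate standardized difference.
analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(f"approximately {n_per_group:.0f} subjects per group")  # about 64
```
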
Prior to using any of these statistical tests, the data should be closely examined and
managed. Descriptive statistics should be computed on all study variables and reviewed
for potential issues that need addressing, especially systematic missing data, skewness,
and outliers. When conducting research with persons who have advanced dementia, it is
not uncommon for the data to be skewed, since so many subjects may score 0 because
a scale "bottoms" or none of the defining characteristics of a behavioral observation scale
are present. In this case, since the data are not normally distributed, nonparametric statistics could be used. If one wants to use skewed data with parametric statistics, the data
can be transformed by using the square root of the value or a log10 transformation. Also,
one should consider data management when there are one or two severe outliers in a
large sample and when the mean differs greatly from the median. If a few outliers' scores
cause skewed data, consider recoding those scores to make them less extreme by assigning the outliers a score of one unit larger or smaller than the next extreme score in the
distribution. We suggest renaming edited variables for subsequent examination with the
raw dataset (i.e., to compare the two datasets, raw and transformed, and to report these
decisions and outcomes in the research report). Internal consistency reliabilities should be
computed on all subscales as well as scale totals for all time periods. If satisfactory alphas
(≥ .7) are obtained, the scales can be used in subsequent analyses. If not, one must consider
whether or not to delete that variable from further analyses.
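
A short Python sketch of the data-management steps described above follows: transforming skewed scores and recoding an extreme outlier to one unit beyond the next most extreme score. The values are invented, and the raw variable would be kept alongside the edited one.

```python
import numpy as np

# Invented, positively skewed scores with one extreme outlier.
scores = np.array([0, 0, 1, 2, 2, 3, 4, 5, 6, 40], dtype=float)

sqrt_scores = np.sqrt(scores)        # square-root transformation
log_scores = np.log10(scores + 1)    # log10 transformation (shifted to handle zeros)

next_extreme = np.sort(scores)[-2]   # largest non-outlier value (here 6)
recoded = np.where(scores > next_extreme, next_extreme + 1, scores)
print(recoded)                       # the 40 is recoded to 7; report this decision
```
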
Before computing any analysis, the dataset needs to be tested to ensure that assumptions underlying the statistic are met. For example, in addition to testing for and addressing
the normality of the sampling distribution, homogeneity of variance–covariance matrices, linearity, multicollinearity, and singularity need to be checked before using MANCOVA.
Tests for univariate and multivariate outliers should be computed separately for each cell
in the design in each hypothesis so that appropriate transformations or deletions of outlying cases can be carried out.
When evaluating how scales performed after the initial report, look to see if the scale
was used as intended and check the methods section to review whether the settings, sites,
and subjects in the present study are congruent with those in the initial study where
validity/reliability were first obtained. Is there a reason why the scale should have been
validated with the new population? Was the scale changed in any way that would require
additional psychometric testing to ensure its accuracy? Were procedures and controls in
place to ensure that the data were collected and scored accurately and that the dataset is
ready to be used in the analysis?
Since this book focuses on scales, we concentrated on those design and methods
issues specifically related to measurement. The goal is to overcome any potential inaccuracy in using scales. The accuracy of a scale depends both on the reliability and validity
of the scale and on the accuracy with which data are collected. One also has to weigh the
plusses and minuses of selecting a newly developed scale versus choosing a scale used in
many studies in order to make comparisons. Regardless, to ensure that the research using
the scales in this book will contribute to the overall goal of improving care for persons
with advanced dementia, studies' outcome variables must be sensitive, reliable, and valid.
The 11 scales described in this book are suggested for use as outcome measures for
examining interventions to provide optimal care to persons with advanced dementia.
Each scale has adequate psychometric properties specific for this population. Readers are
provided with a description of the concepts measured by the scales, the original research
report outlining development and testing of each scale, and summaries of how the scales
have been used by others. We have also provided administration procedures and copies
of the scales that are suitable for duplication, making the scales easier to use.

References
Camberg, L., Woods, P., Ooi, W.L., Hurley, A., Volicer, L., Ashley, J., et al. (1999). Evaluation of simulated presence: A personalized approach to enhance well-being in persons with Alzheimer's disease. Journal of the American Geriatrics Society, 47(4), 446–452.
Folstein, M., Folstein, S., & McHugh, P.R. (1975). "Mini-mental state": A practical method for grading the cognitive state of patients for the clinician. Journal of Psychiatric Research, 12, 189–198.
Hurley, A.C., Volicer, B.J., Hanrahan, P., Houde, S., & Volicer, L. (1992). Assessment of discomfort in advanced Alzheimer patients. Research in Nursing and Health, 15, 309–317.
Hurley, A.C., Volicer, L., Camberg, L., Ashley, J., Woods, P., Odenheimer, G., et al. (1999). Measurement of observed agitation in patients with dementia of the Alzheimer type. Journal of Mental Health and Aging, 5(2), 117–133.
Knapp, T.R., & Brown, J.K. (1995). Focus on psychometrics: Ten measurement commandments that often should be broken. Research in Nursing and Health, 18, 465–469.
Lynn, M.R. (1986). Determination and quantification of content validity. Nursing Research, 35, 382–385.
Mahoney, E.K., Hurley, A.C., Volicer, L., Bell, M., Gianotis, P., Harsthorn, M., et al. (1999). Development and testing of the resistiveness to care scale. Research in Nursing and Health, 22, 27–38.
Munro, B.H. (1999). Quantitative research methods. Alzheimer Disease and Associated Disorders, 13(Suppl. 1), S50–S53.
Nunnally, J.C., & Bernstein, I.H. (1994). Psychometric theory (3rd ed.). New York: McGraw-Hill.
Sloane, P.D., Hoeffer, B., Mitchell, C.M., McKenzie, D.A., Barrick, A.L., Rader, J., et al. (2004). Effect of person-centered showering and the towel bath on bathing-associated aggression, agitation, and discomfort in nursing home residents with dementia: A randomized, controlled trial. Journal of the American Geriatrics Society, 52(11), 1795–1804.
Tabachnick, B.G., & Fidell, L.S. (2013). Using multivariate statistics. Boston: Allyn and Bacon.
Tilden, V.P., Nelson, C.A., & May, B.A. (1990). Use of qualitative methods to enhance content validity. Nursing Research, 39, 172–175.
Van den Broeck, J., Argeseanu Cunningham, S., Eeckels, R., & Herbst, K. (2005). Data cleaning: Detecting, diagnosing, and editing data abnormalities. PLoS Medicine, 2(10), e267.
Volicer, L., Hurley, A.C., & Blasi, Z.V. (2001). Scales for evaluation of end-of-life care in dementia. Alzheimer Disease & Associated Disorders, 15, 194–200.
Volicer, L., Hurley, A.C., Lathi, D.C., & Kowall, N.W. (1994). Measurement of severity in advanced Alzheimer's disease. Journal of Gerontology, 49, M223–M226.
Waltz, C.F., Strickland, O.L., & Lenz, E.R. (2010). Measurement in nursing and health research (4th ed.). New York: Springer Publishing Company.
Warden, V., Hurley, A.C., & Volicer, L. (2003). Development and psychometric evaluation of the pain assessment in advanced dementia (PAINAD) scale. Journal of the American Medical Directors Association, 4(1), 9–15.
Weiner, M.F., Martin-Cook, K., Svetlik, D.A., Saine, K., Foster, B., & Fontaine, C.S. (2000). The quality of life in late-stage dementia (QUALID) scale. Journal of the American Medical Directors Association, 1, 114–116.
Wyatt, J., & Altman, D. (1995). Prognostic models: Clinically useful or quickly forgotten? British Medical Journal, 311, 1539–1541.
