
Resources

* Knowledge Base
* Selecting Statistics
* Research Synthesis Gallery
* The Simulation Book
* Research Methods Tutorials

Cornell Sites

* Cornell
* CU Human Subjects Training

Concept Mapping

* Concept Systems, Inc.

Welcome. This website is for people involved in applied social research and evaluation.
You'll find lots of resources and links to other locations on the Web that deal in applied
social research methods. Some highlights of what is available:

● the Knowledge Base -- an online hypertext textbook on applied social research methods that covers everything you want to know about defining a research question, sampling, measurement, research design and data analysis.
● Selecting Statistics -- an online statistical advisor! Answer the questions and it will lead you to an appropriate statistical test for your data.
● The Simulation Book -- a previously unpublished book of manual (i.e., dice-rolling) and computer simulation exercises of common research designs, for students and researchers to learn how to do simple simulations.

Copyright © 2004, 2002, 1999, 1997, William M.K. Trochim, All Rights Reserved

Research Methods Knowledge Base
by William M. Trochim, Cornell University

- What is the KB?
- Purchase
- Citing the KB
- Using the KB in a Course
- Copyright Information
- About the Author
- Acknowledgements
- Dedication

What is the Research Methods Knowledge Base?

The Research Methods Knowledge Base is a comprehensive web-based textbook that addresses all of the topics in a typical introductory undergraduate or graduate course in social research methods. It covers the entire research process including: formulating research questions; sampling (probability and nonprobability); measurement (surveys, scaling, qualitative, unobtrusive); research design (experimental and quasi-experimental); data analysis; and, writing the research paper. It also addresses the major theoretical and philosophical underpinnings of research including: the idea of validity in research; reliability of measures; and ethics. The Knowledge Base was designed to be different from the many typical commercially-available research methods texts. It uses an informal, conversational style to engage both the newcomer and the more experienced student of research. It is a fully hyperlinked text that can be integrated easily into an existing course structure or used as a sourcebook for the experienced researcher who simply wants to browse. [Back to Top]

Purchasing

You can purchase a complete printed copy of the Research Methods Knowledge Base over the web by selecting the link Purchase the complete printed text of the Knowledge Base online at the bottom of any page. [Back to Top]

Using the KB in a Course

The latest editions of the Knowledge Base are published exclusively by Atomic Dog Publishing. In addition to providing a unique updated
online version of the Knowledge Base text (much more sophisticated
than this one), they are the exclusive publishers of the print version.
Through Atomic Dog Publishing you can expect the finest in web-
based course support for the Knowledge Base including workbooks,
study guides, online testing, test item data banks, and much more. If
you have any questions about use of the Knowledge Base in your
course, please check their website at http://www.atomicdog.com/trochim or contact them directly by e-mail at help@atomicdogpub.com.

Atomic Dog Publishing has agreed to allow me to continue to make this version of the Knowledge Base available indefinitely at no cost to
any instructor who wishes to use it as part of their course. However,
there are several conditions which must be met for such use:

1. You must notify me by e-mail (wmt1@cornell.edu) each time you use part or all of this site for a course. Describe briefly
what page(s) you will use, the name of the course or
seminar, the expected number of students, and your contact
information.

2. You may not reproduce these webpages in part or in whole on any other web server or alternative media (e.g., on a disk
or CD) -- the Knowledge Base must be accessed over the
web at this site. Unfortunately, this means that no mirroring of
this site or any pages on it can be allowed.

3. You must cite this website appropriately in any list of readings or course syllabus (see Citing the KB for details).

Please see the latest edition of the Knowledge Base at http://www.atomicdog.com/trochim.

About the Author

William M.K. Trochim is a Professor in the Department of Policy Analysis and Management at Cornell University. He has taught both
the undergraduate and graduate required courses in applied social
research methods since joining the faculty at Cornell in 1980. He
received his Ph.D. in 1980 from the program in Methodology and
Evaluation Research of the Department of Psychology at
Northwestern University. His research interests include the theory
and practice of research, conceptualization methods (including
concept mapping and pattern matching), strategic and operational
planning methods, performance management and measurement, and
change management. He is the developer of The Concept System®
and founder of Concept Systems Incorporated. He lives in Ithaca,
New York with his wife Mary and daughter Nora. [Back to Top]

Acknowledgements

This work, as is true for all significant efforts in life, is a collaborative achievement. I want to thank especially the students and friends who
assisted and supported me in various ways over the years. I
especially want to thank Dominic Cirillo who has labored tirelessly
over several years on both the web and printed versions of the
Knowledge Base and without whom I simply would not have survived.
There are also the many graduate Teaching Assistants who helped
make the transition to a web-based course and have contributed their
efforts and insights to this work and the teaching of research
methods. And, of course, I want to thank all of the students, both
undergraduate and graduate, who participated in my courses over
the years and used the Knowledge Base in its various incarnations.
You have been both my challenge and inspiration.[Back to Top]

Dedication

For Mary and Nora

who continue to astonish me with their resilience, patience, and love

[Back to Top]

Copyright ©2002, William M.K. Trochim, All Rights Reserved


Purchase a printed copy of the Research Methods Knowledge Base
Last Revised: 01/16/2005
[ Home ] [ Contents ] [ Navigating ] [ Foundations ] [ Sampling ] [ Measurement ] [ Design ] [ Analysis ] [ Write-Up ] [ Appendices ] [ Search ]

Research Methods Knowledge Base

Contents
Navigating

Yin-Yang Map
The Road Map

Foundations

Language Of Research
Five Big Words
Types of Questions
Time in Research
Types of Relationships
Variables
Hypotheses
Types of Data
Unit of Analysis
Two Research Fallacies
Philosophy of Research
Structure of Research
Deduction & Induction
Positivism & Post-Positivism
Introduction to Validity
Ethics in Research
Conceptualizing
Problem Formulation
Concept Mapping
Evaluation Research
Introduction to Evaluation
The Planning-Evaluation Cycle
An Evaluation Culture
Sampling

External Validity
Sampling Terminology
Statistical Terms in Sampling
Probability Sampling
Nonprobability Sampling

Measurement

Construct Validity
Measurement Validity Types
Idea of Construct Validity
Convergent & Discriminant Validity
Threats to Construct Validity
The Nomological Network
The Multitrait-Multimethod Matrix
Pattern Matching for Construct Validity
Reliability
True Score Theory
Measurement Error
Theory of Reliability
Types of Reliability
Reliability & Validity
Levels of Measurement
Survey Research
Types of Surveys
Selecting the Survey Method
Constructing the Survey
Types Of Questions
Question Content
Response Format
Question Wording
Question Placement
Interviews
Plus & Minus of Survey Methods
Scaling
General Issues in Scaling
Thurstone Scaling
Likert Scaling
Guttman Scaling
Qualitative Measures
The Qualitative Debate
Qualitative Data
Qualitative Approaches
Qualitative Methods
Qualitative Validity
Unobtrusive Measures
Design

Internal Validity
Establishing Cause & Effect
Single Group Threats
Regression to the Mean
Multiple-Group Threats
Social Interaction Threats
Introduction to Design
Types of Designs
Experimental Design
Two-Group Experimental Designs
Probabilistic Equivalence
Random Selection & Assignment
Classifying Experimental Designs
Factorial Designs
Factorial Design Variations
Randomized Block Designs
Covariance Designs
Hybrid Experimental Designs
Quasi-Experimental Design
The Nonequivalent Groups Design
The Regression-Discontinuity Design
Other Quasi-Experimental Designs
Relationships Among Pre-Post Designs
Designing Designs for Research
Advances in Quasi-Experimentation

Analysis

Conclusion Validity
Threats to Conclusion Validity
Improving Conclusion Validity
Statistical Power
Data Preparation
Descriptive Statistics
Correlation
Inferential Statistics
The t-Test
Dummy Variables
General Linear Model
Posttest-Only Analysis
Factorial Design Analysis
Randomized Block Analysis
Analysis of Covariance
Nonequivalent Groups Analysis
Regression-Discontinuity Analysis
Regression Point Displacement Analysis

Write-Up

Key Elements
Formatting
Sample Paper

Appendices

Citing the KB
Order the KB
Copyright Notice

Search

Yin-Yang Map
The Road Map

Navigating the Knowledge Base
There are at least five options that I can think of for getting to relevant online material in the Knowledge
Base:

The Border Contents

Every page of the Knowledge Base has links in the margins. These links are based on
the hierarchical structure of the website and change depending on the position of the
page in that structure. The links at the top (repeated at the bottom) on each page show
the other pages at the same level of the hierarchy as the page you are looking at. The
links in the left border always include:

The Home Page
The parent page for the page you are viewing
The child pages for the page you are viewing

The Table of Contents

This is a standard hierarchical table of contents like the type you would expect in a
textbook. It is the only navigational device that at a glance shows every page in the
Knowledge Base.

The Yin-Yang Map

This map is based on a graphic that, at a glance, provides an organizing rubric for the
entire Knowledge Base content. It separates the theory of research from the practice of
research and shows how theory and practice are related. This might be an especially
useful launch pad for an advanced or graduate research methods course because of the
strong emphasis on the link between theory and practice.

The Road Map

This map is based on a graphic that shows the typical stages in a research project. It
uses the metaphor of research as a journey down the research road from initial
conceptualization and problem formation through the write-up and reporting. This might
be an especially useful launch pad for an introductory undergraduate course because it
concentrates primarily on the practice of research.
The Search Page

In the top and bottom margins on every page in the Knowledge Base there is a link to the
Search Page. When you need to find information on a specific topic rapidly you should
use this page. The Search Page is linked to an index of every word in the Knowledge
Base, allows you to perform simple and Boolean searches, and returns resulting links
sorted from most to least relevant.


Language Of Research
Philosophy of Research
Ethics in Research
Conceptualizing
Evaluation Research

This section provides an overview of the major issues in research and in evaluation. This is probably the best place for you to begin learning about research.

We have to begin somewhere. (Although, if you think about it, the whole idea of hyperlinked text
sort of runs contrary to the notion that there is a single place to begin -- you can begin anywhere,
go anywhere, and leave anytime. Unfortunately, you can only be in one place at a time and, even
less fortunately for you, you happen to be right here right now, so we may as well consider this a
place to begin.) And what better place to begin than an introduction? Here's where we take care of
all the stuff you think you already know, and probably should already know, but most likely don't
know as well as you think you do.

The first thing we have to get straight is the language of research. If we don't, we're going to have
a hard time discussing research.

With the basic terminology under our belts, we can look a little more deeply at some of the
underlying philosophical issues that drive the research endeavor.

We also need to recognize that social research always occurs in a social context. It is a human
endeavor. Therefore, it's important to consider the critical ethical issues that affect the researcher,
research participants, and the research effort generally.

Where do research problems come from? How do we develop a research question? We consider
these issues under conceptualization.

Finally, we look at a specific, and very applied, type of social research known as evaluation
research.

That ought to be enough to get you started. At least it ought to be enough to get you thoroughly
confused. But don't worry, there's stuff that's far more confusing than this yet to come.


External Validity
Sampling Terminology
Statistical Sampling Terms
Probability Sampling
Nonprobability Sampling

Sampling is the process of selecting units (e.g., people, organizations) from a population of interest so that by studying the sample we may fairly generalize our results back to the population from which they were chosen. Let's begin by covering some of the key terms in sampling like "population" and "sampling frame." Then, because some types of sampling rely upon quantitative models, we'll talk about some of the statistical terms used in sampling. Finally, we'll discuss the major distinction between probability and nonprobability sampling methods and work through the major types in each.
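
A minimal Python sketch of drawing a simple random sample from a sampling frame may help fix the terms; the frame, the sample size of 100, and the seed below are invented for illustration and are not taken from the Knowledge Base:

    import random

    # Hypothetical sampling frame: an enumerated list of unit identifiers.
    sampling_frame = [f"respondent_{i:04d}" for i in range(1, 1001)]

    random.seed(42)  # fixed seed so the draw is reproducible

    # Simple random sample: every unit in the frame has the same
    # probability of selection (here 100/1000 = 10%).
    sample = random.sample(sampling_frame, k=100)

    print(len(sample), sample[:5])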


Construct Validity
Reliability
Levels of Measurement
Survey Research
Scaling
Qualitative Measures
Unobtrusive Measures

Measurement is the process of observing and recording the observations that are collected as part of a research effort. There are two major issues that will be considered here.

First, you have to understand the fundamental ideas involved in measuring. Here we consider two of the major measurement concepts. In Levels of Measurement, I explain the meaning of the four major levels of measurement: nominal, ordinal, interval and ratio. Then we move on to the reliability of measurement, including consideration of true score theory and a variety of reliability estimators.

Second, you have to understand the different types of measures that you might use in social research. We consider four broad categories of measurements. Survey research includes the design and implementation of interviews and questionnaires. Scaling involves consideration of the major methods of developing and implementing a scale. Qualitative research provides an overview of the broad range of non-numerical measurement approaches. And unobtrusive measures presents a variety of measurement methods that don't intrude on or interfere with the context of the research.
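
As a preview of what that reliability discussion rests on, true score theory models each observed score as a true score plus random error, and treats reliability as the share of observed-score variance that is true-score variance. The notation below is a standard rendering added here for orientation, not text from this site:

    X = T + e_X
    \mathrm{reliability} = \frac{\operatorname{var}(T)}{\operatorname{var}(X)} = \frac{\operatorname{var}(T)}{\operatorname{var}(T) + \operatorname{var}(e_X)}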


Internal Validity
Introduction to Design
Types of Designs
Experimental Design
Quasi-Experimental Design
Relationships Among Pre-Post Designs
Designing Designs for Research
Advances in Quasi-Experimentation

Research design provides the glue that holds the research project together. A design is used to structure the research, to show how all of the major parts of the research project -- the samples or groups, measures, treatments or programs, and methods of assignment -- work together to try to address the central research questions. Here, after a brief introduction to research design, I'll show you how we classify the major types of designs. You'll see that a major distinction is between the experimental designs that use random assignment to groups or programs and the quasi-experimental designs that don't use random assignment. [People often confuse what is meant by random selection with the idea of random assignment. You should make sure that you understand the distinction between random selection and random assignment.] Understanding the relationships among designs is important in making design choices and thinking about the strengths and weaknesses of different designs. Then, I'll talk about the heart of the art form of designing designs for research and give you some ideas about how you can think about the design task. Finally, I'll consider some of the more recent advances in quasi-experimental thinking -- an area of special importance in applied social research and program evaluation.
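
To make the bracketed caution about random selection versus random assignment concrete, here is a small Python sketch; the population, sample size, and group sizes are invented, and the point is only which step does what:

    import random

    random.seed(7)

    # Random selection: draw study participants from a larger population,
    # which supports generalizing back to that population (external validity).
    population = [f"person_{i}" for i in range(500)]
    participants = random.sample(population, k=40)

    # Random assignment: split the selected participants between treatment
    # and control, which supports the causal comparison (internal validity).
    shuffled = participants[:]
    random.shuffle(shuffled)
    treatment, control = shuffled[:20], shuffled[20:]

    print(len(treatment), len(control))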


Conclusion Validity
Data Preparation
Descriptive Statistics
Inferential Statistics

By the time you get to the analysis of your data, most of the really difficult work has been done. It's much more difficult to: define the research problem; develop and implement a sampling plan; conceptualize, operationalize and test your measures; and develop a design structure. If you have done this work well, the analysis of the data is usually a fairly straightforward affair.

In most social research the data analysis involves three major steps, done in roughly this order:

Cleaning and organizing the data for analysis (Data Preparation)
Describing the data (Descriptive Statistics)
Testing Hypotheses and Models (Inferential Statistics)

Data Preparation involves checking or logging the data in; checking the data for accuracy; entering
the data into the computer; transforming the data; and developing and documenting a database
structure that integrates the various measures.

Descriptive Statistics are used to describe the basic features of the data in a study. They provide
simple summaries about the sample and the measures. Together with simple graphics analysis, they
form the basis of virtually every quantitative analysis of data. With descriptive statistics you are simply
describing what is, what the data shows.

Inferential Statistics investigate questions, models and hypotheses. In many cases, the conclusions
from inferential statistics extend beyond the immediate data alone. For instance, we use inferential
statistics to try to infer from the sample data what the population thinks. Or, we use inferential
statistics to make judgments of the probability that an observed difference between groups is a
dependable one or one that might have happened by chance in this study. Thus, we use inferential
statistics to make inferences from our data to more general conditions; we use descriptive statistics
simply to describe what's going on in our data.
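
A compressed sketch of those three steps on made-up data is below; the group scores are fabricated, and the use of pandas and scipy is just one possible toolchain, not something this text prescribes:

    import pandas as pd
    from scipy import stats

    # Data Preparation: enter the raw records and check/clean them.
    raw = pd.DataFrame({
        "group": ["treatment"] * 5 + ["control"] * 5,
        "score": [12, 15, 14, 16, 13, 10, 9, 11, 10, 12],
    })
    clean = raw.dropna()  # e.g., drop records with missing scores

    # Descriptive Statistics: simple summaries of the sample and the measure.
    print(clean.groupby("group")["score"].agg(["count", "mean", "std"]))

    # Inferential Statistics: is the observed group difference dependable,
    # or one that might have happened by chance in this study?
    t, p = stats.ttest_ind(
        clean.loc[clean.group == "treatment", "score"],
        clean.loc[clean.group == "control", "score"],
    )
    print(f"t = {t:.2f}, p = {p:.3f}")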

In most research studies, the analysis section follows these three phases of analysis. Descriptions of how the data were prepared tend to be brief and to focus only on the more unique aspects of your study, such as specific data transformations that are performed. The descriptive statistics that you
actually look at can be voluminous. In most write-ups, these are carefully selected and organized into
summary tables and graphs that only show the most relevant or important information. Usually, the
researcher links each of the inferential analyses to specific research questions or hypotheses that
were raised in the introduction, or notes any models that were tested that emerged as part of the
analysis. In most analysis write-ups it's especially critical to not "miss the forest for the trees." If you
present too much detail, the reader may not be able to follow the central line of the results. Often
extensive analysis details are appropriately relegated to appendices, reserving only the most critical
analysis summaries for the body of the report itself.

Key Elements
Formatting
Sample Paper

So now that you've completed the research project, what do you do? I know you won't want to hear this, but your work is still far from done. In fact, this final stage -- writing up your research -- may be one of the most difficult. Developing a good, effective and concise report is an art form in itself. And, in many research projects you will need to write multiple reports that present the results at different levels of detail for different audiences.

There are several general considerations to keep in mind when generating a report:

The Audience

Who is going to read the report? Reports will differ considerably depending on whether the
audience will want or require technical detail, whether they are looking for a summary of results,
or whether they are about to examine your research in a Ph.D. exam.

The Story

I believe that every research project has at least one major "story" in it. Sometimes the story
centers around a specific research finding. Sometimes it is based on a methodological problem
or challenge. When you write your report, you should attempt to tell the "story" to your reader.
Even in very formal journal articles where you will be required to be concise and detailed at the
same time, a good "storyline" can help make an otherwise very dull report interesting to the
reader.

The hardest part of telling the story in your research is finding the story in the first place. Usually
when you come to writing up your research you have been steeped in the details for weeks or
months (and sometimes even for years). You've been worrying about sampling response,
struggling with operationalizing your measures, dealing with the details of design, and wrestling
with the data analysis. You're a bit like the ostrich that has its head in the sand. To find the story
in your research, you have to pull your head out of the sand and look at the big picture. You have
to try to view your research from your audience's perspective. You may have to let go of some of
the details that you obsessed so much about and leave them out of the write up or bury them in
technical appendices or tables.

Formatting Considerations

Are you writing a research report that you will submit for publication in a journal? If so, you should be aware that every journal requires that articles follow specific formatting guidelines. Thinking of writing a book? Again, every publisher will require specific formatting. Writing a term paper? Most faculty will require that you follow specific guidelines. Doing your thesis or dissertation? Every university I know of has very strict policies about formatting and style. There are legendary stories that circulate among graduate students about the dissertation that was rejected because the page margins were a quarter inch off or the figures weren't labeled correctly.

To illustrate what a set of research report specifications might include, I present in this section general
guidelines for the formatting of a research write-up for a class term paper. These guidelines are very
similar to the types of specifications you might be required to follow for a journal article. However, you
need to check the specific formatting guidelines for the report you are writing -- the ones presented here
are likely to differ in some ways from any other guidelines that may be required in other contexts.

I've also included a sample research paper write-up that illustrates these guidelines. This sample paper is
for a "make-believe" research project. But it illustrates how a final research report might look using the
guidelines given here.


Citing the KB
Order the KB
Copyright Notice

The appendices include information about how to order printed copies of the Research Methods Knowledge Base and how to use the text as part of an undergraduate or graduate-level course in social research methods.


Knowledge Base Search Page

Use the form below to search for documents in the Research Methods Knowledge Base containing specific words or
combinations of words. The text search engine will display a weighted list of matching documents, with better
matches shown first. Each list item is a link to a matching document; if the document has a title it will be shown,
otherwise only the document's file name is displayed. A brief explanation of the query language is available, along
with examples.


Query Language

The text search engine allows queries to be formed from arbitrary Boolean expressions containing the keywords
AND, OR, and NOT, and grouped with parentheses. For example:

information retrieval
finds documents containing 'information' or 'retrieval'

information or retrieval
same as above

information and retrieval
finds documents containing both 'information' and 'retrieval'

information not retrieval
finds documents containing 'information' but not 'retrieval'

(information not retrieval) and WAIS
finds documents containing 'WAIS', plus 'information' but not 'retrieval'

web*
finds documents containing words starting with 'web'
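
The set logic these queries describe can be sketched in a few lines of Python. This is purely an illustration of the behavior listed above, with invented documents and words; it is not the search engine this site actually uses:

    # Each document is reduced to the set of words it contains
    # (documents and words here are invented for illustration).
    docs = {
        "page1.htm": {"information", "retrieval", "wais"},
        "page2.htm": {"information", "sampling"},
        "page3.htm": {"retrieval", "webstart"},
    }

    def search(predicate):
        """Return the documents whose word sets satisfy the query predicate."""
        return [name for name, words in docs.items() if predicate(words)]

    print(search(lambda w: "information" in w and "retrieval" in w))      # AND
    print(search(lambda w: "information" in w and "retrieval" not in w))  # NOT
    print(search(lambda w: any(word.startswith("web") for word in w)))    # web*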

Back to Top

Order the Enhanced and Revised KB

Whether you are an individual interested in using the Knowledge Base on your own, are a student using it as part of
an online course, or are an instructor who wishes to adopt it for a course, you can order the revised and expanded
version of the Knowledge Base at http://www.atomicdog.com/trochim.

PLEASE NOTE: The printed version of the Research Methods Knowledge Base is a revised and enhanced version of this website. It is also available on a protected website, which you can access for free if you purchase the hardcopy version. The printed version is in greyscale, not in color (as the website is). To print the entire volume in color would raise costs considerably.

Thanks for your interest in the Research Methods Knowledge Base.


If you quote material from the Knowledge Base in your work, please cite it accurately. An appropriate citation for the
online home page would be:

Trochim, William M. The Research Methods Knowledge Base, 2nd Edition. Internet WWW
page, at URL: <http://trochim.human.cornell.edu/kb/index.htm> (version current as of
August 16, 2004).

The date that each page was last edited is given at the bottom of the page and can be used for "version current as
of..."

If you are citing the printed version, the citation would be:

Trochim, W. (2000). The Research Methods Knowledge Base, 2nd Edition. Atomic Dog
Publishing, Cincinnati, OH.


COPYRIGHT

©Copyright, William M.K. Trochim 1998-2000. All Rights Reserved.

LICENSE DISCLAIMER

Nothing on the Research Methods Knowledge Base Web Site or in the printed version shall be construed as conferring any license under any of William M.K. Trochim's or any third party's intellectual property rights, whether by estoppel, implication, or otherwise.

CONTENT AND LIABILITY DISCLAIMER

William M.K. Trochim shall not be responsible for any errors or omissions contained on the Research Methods
Knowledge Base Web Site or in the printed version, and reserves the right to make changes without notice.
Accordingly, all original and third party information is provided "AS IS". In addition, William M.K. Trochim is not
responsible for the content of any other Web Site linked to the Research Methods Knowledge Base Web Site or
cited in the printed version. Links are provided as Internet navigation tools only.

WILLIAM M.K. TROCHIM DISCLAIMS ALL WARRANTIES WITH REGARD TO THE INFORMATION
(INCLUDING ANY SOFTWARE) PROVIDED, INCLUDING THE IMPLIED WARRANTIES OF MERCHANTABILITY
AND FITNESS FOR A PARTICULAR PURPOSE, AND NON-INFRINGEMENT. Some jurisdictions do not allow the
exclusion of implied warranties, so the above exclusion may not apply to you.

In no event shall William M.K. Trochim be liable for any damages whatsoever, and in particular William M.K. Trochim
shall not be liable for special, indirect, consequential, or incidental damages, or damages for lost profits, loss of
revenue, or loss of use, arising out of or related to the Research Methods Knowledge Base Web Site or the printed
version or the information contained in these, whether such damages arise in contract, negligence, tort, under
statute, in equity, at law or otherwise.

FEEDBACK INFORMATION

Any information provided to William M.K. Trochim in connection with the Research Methods Knowledge Base Web
Site or the printed version shall be provided by the submitter and received by William M.K. Trochim on a non-
confidential basis. William M.K. Trochim shall be free to use such information on an unrestricted basis.


This page describes the elements or criteria that you must typically address in a research paper. The assumption
here is that you are addressing a causal hypothesis in your paper.

I. Introduction

1. Statement of the problem: The general problem area is stated clearly and unambiguously. The importance
and significance of the problem area is discussed.
2. Statement of causal relationship: The cause-effect relationship to be studied is stated clearly and is
sensibly related to the problem area.
3. Statement of constructs: Each key construct in the research/evaluation project is explained (minimally,
both the cause and effect). The explanations are readily understandable (i.e., jargon-free) to an intelligent
reader.
4. Literature citations and review: The literature cited is from reputable and appropriate sources (e.g.,
professional journals, books and not Time, Newsweek, etc.) and you have a minimum of five references.
The literature is condensed in an intelligent fashion with only the most relevant information included.
Citations are in the correct format (see APA format sheets).
5. Statement of hypothesis: The hypothesis (or hypotheses) is clearly stated and is specific about what is
predicted. The relationship of the hypothesis to both the problem statement and literature review is readily
understood from reading the text.

II. Methods

Sample section:

1. Sampling procedure specifications: The procedure for selecting units (e.g., subjects, records) for the
study is described and is appropriate. The author states which sampling method is used and why. The
population and sampling frame are described. In an evaluation, the program participants are frequently self-
selected (i.e., volunteers) and, if so, should be described as such.
2. Sample description: The sample is described accurately and is appropriate. Problems in contacting and
measuring the sample are anticipated.
3. External validity considerations: Generalizability from the sample to the sampling frame and population is
considered.

Measurement section:

1. Measures: Each outcome measurement construct is described briefly (a minimum of two outcome
constructs is required). For each construct, the measure or measures are described briefly and an
appropriate citation and reference is included (unless you created the measure). You describe briefly the
measure you constructed and provide the entire measure in an Appendix. The measures which are used
are relevant to the hypotheses of the study and are included in those hypotheses. Wherever possible,
multiple measures of the same construct are used.
2. Construction of measures: For questionnaires, tests and interviews: questions are clearly worded,
specific, appropriate for the population, and follow in a logical fashion. The standards for good questions are
followed. For archival data: original data collection procedures are adequately described and indices (i.e.,
combinations of individual measures) are constructed correctly. For scales, you must describe briefly which
scaling procedure you used and how you implemented it. For qualitative measures, the procedures for
collecting the measures are described in detail.
3. Reliability and validity: You must address both the reliability and validity of all of your measures. For
reliability, you must specify what estimation procedure(s) you used. For validity, you must explain how you
assessed construct validity. Wherever possible, you should minimally address both convergent and
discriminant validity. The procedures which are used to examine reliability and validity are appropriate for
the measures.
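
As one concrete possibility for the reliability requirement above, internal consistency could be estimated with Cronbach's alpha along these lines; the item responses are fabricated, and alpha is only one of several estimators discussed in the Knowledge Base:

    import numpy as np

    # Fabricated responses: 6 respondents x 4 items on a single scale.
    items = np.array([
        [4, 5, 4, 4],
        [2, 2, 3, 2],
        [5, 4, 5, 5],
        [3, 3, 3, 4],
        [4, 4, 5, 4],
        [1, 2, 2, 1],
    ])

    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)

    # Cronbach's alpha: a common estimate of internal-consistency reliability.
    alpha = (k / (k - 1)) * (1 - item_variances / total_variance)
    print(round(alpha, 3))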

Design and Procedures section:

1. Design: The design is clearly presented in both notational and text form. The design is appropriate for the
problem and addresses the hypothesis.
2. Internal validity: Threats to internal validity and how they are addressed by the design are discussed. Any
threats to internal validity which are not well controlled are also considered.
3. Description of procedures: An overview of how the study will be conducted is included. The sequence of
events is described and is appropriate to the design. Sufficient information is included so that the essential
features of the study could be replicated by a reader.

III. Results

1. Statement of Results: The results are stated concisely and are plausible for the research described.
2. Tables: The table(s) is correctly formatted and accurately and concisely presents part of the analysis.
3. Figures: The figure(s) is clearly designed and accurately describes a relevant aspect of the results.

IV. Conclusions, Abstract and Reference Sections

1. Implications of the study: Assuming the expected results are obtained, the implications of these results
are discussed. The author mentions briefly any remaining problems which are anticipated in the study.
2. Abstract: The Abstract is 125 words or less and presents a concise picture of the proposed research. Major
constructs and hypotheses are included. The Abstract is the first section of the paper. See the format sheet
for more details.
3. References: All citations are included in the correct format and are appropriate for the study described.

Stylistic Elements

I. Professional Writing

First person and sex-stereotyped forms are avoided. Material is presented in an unbiased and
unemotional (e.g., no "feelings" about things), but not necessarily uninteresting, fashion.

II. Parallel Construction

Tense is kept parallel within and between sentences (as appropriate).


III. Sentence Structure

Sentence structure and punctuation are correct. Incomplete and run-on sentences are avoided.

IV. Spelling and Word Usage

Spelling and use of words are appropriate. Words are capitalized and abbreviated correctly.

V. General Style.

The document is neatly produced and reads well. The format for the document has been correctly
followed.


Overview

The instructions provided here are for a research article or a research report (generally these guidelines follow the
formatting guidelines of the American Psychological Association documented in Publication Manual of the American
Psychological Association, 4th Edition). Please consult the specific guidelines that are required by the publisher for
the type of document you are producing.

All sections of the paper should be typed, double-spaced on white 8 1/2 x 11 inch paper with 12 pitch typeface with
all margins set to 1 inch. REMEMBER TO CONSULT THE APA PUBLICATION MANUAL, FOURTH EDITION,
PAGES 258 - 264 TO SEE HOW TEXT SHOULD APPEAR. Every page must have a header in the upper right
corner with the running header right-justified on the top line and the page number right-justified and double-spaced
on the line below it. The paper must have all the sections in the order given below, following the specifications
outlined for each section (all page numbers are approximate):

Title Page
Abstract (on a separate single page)
The Body (no page breaks between sections in the body)
    Introduction (2-3 pages)
    Methods (7-10 pages)
        Sample (1 page)
        Measures (2-3 pages)
        Design (2-3 pages)
        Procedures (2-3 pages)
    Results (2-3 pages)
    Conclusions (1-2 pages)
References
Tables (one to a page)
Figures (one to a page)
Appendices

Title Page

On separate lines and centered, the title page has the title of the study, the author's name, and the institutional
affiliation. At the bottom of the title page you should have the words (in caps) RUNNING HEADER: followed by a
short identifying title (2-4 words) for the study. This running header should also appear on the top right of every page
of the paper.

Abstract
The abstract is limited to one page, double-spaced. At the top of the page, centered, you should have the word
'Abstract'. The abstract itself should be written in paragraph form and should be a concise summary of the entire
paper including: the problem; major hypotheses; sample and population; a brief description of the measures; the
name of the design or a short description (no design notation here); the major results; and, the major conclusions.
Obviously, to fit this all on one page you will have to be very concise.

Body

The first page of the body of the paper should have, centered, the complete title of the study.

Introduction

The first section in the body is the introduction. There is no heading that says 'Introduction'; you simply begin the paper in paragraph form following the title. Every introduction will have the following (roughly in this order): a
statement of the problem being addressed; a statement of the cause-effect relationship being studied; a description
of the major constructs involved; a brief review of relevant literature (including citations); and a statement of
hypotheses. The entire section should be in paragraph form with the possible exception of the hypotheses, which
may be indented.

Methods

The next section of the paper has four subsections: Sample; Measures; Design; and, Procedure. The Methods
section should begin immediately after the introduction (no page break) and should have the centered title 'Methods'.
Each of the four subsections should have an underlined left justified section heading.

Sampling

This section should describe the population of interest, the sampling frame, the method for selecting the sample, and
the sample itself. A brief discussion of external validity is appropriate here, that is, you should state the degree to
which you believe results will be generalizable from your sample to the population. (Link to Knowledge Base on
sampling).

Measures

This section should include a brief description of your constructs and all measures that will be used to operationalize
them. You may present short instruments in their entirety in this section. If you have more lengthy instruments you
may present some "typical" questions to give the reader a sense of what you will be doing (and include the full
measure in an Appendix). You may include any instruments in full in appendices rather than in the body.
Appendices should be labeled by letter (e.g., 'Appendix A') and cited appropriately in the body of the text. For pre-
existing instruments you should cite any relevant information about reliability and validity if it is available. For all
instruments, you should briefly state how you will determine reliability and validity, report the results and discuss.
For reliability, you must describe the methods you used and report results. A brief discussion of how you have
addressed construct validity is essential. In general, you should try to demonstrate both convergent and
discriminant validity. You must discuss the evidence in support of the validity of your measures. (Link to Knowledge
Base on measurement).

Design
You should state the name of the design that is used and tell whether it is a true or quasi-experiment, nonequivalent
group design, and so on. You should also present the design structure in X and O notation (this should be indented
and centered, not put into a sentence). You should also include a discussion of internal validity that describes the
major likely threats in your study and how the design accounts for them, if at all. (Be your own study critic here and
provide enough information to show that you understand the threats to validity, whether you've been able to account
for them all in the design or not.) (Link to Knowledge Base on design).
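
For example, a randomized pretest-posttest control group design would conventionally be written in this notation as shown below, where R indicates random assignment, O an observation or measure, and X the program or treatment; this particular layout is only an illustration, not a required template:

    R   O   X   O
    R   O       O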

Procedures

Generally, this section ties together the sampling, measurement, and research design. In this section you should
briefly describe the overall plan of the research, the sequence of events from beginning to end (including sampling,
measurement, and use of groups in designs), how participants will be notified, and how their confidentiality will be
protected (where relevant). An essential part of this subsection is a description of the program or independent
variable that you are studying. (Link to Knowledge Base discussion of validity).

Results

The heading for this section is centered with upper and lower case letters. You should indicate concisely what
results you found in this research. Your results don't have to confirm your hypotheses. In fact, the common
experience in social research is the finding of no effect.

Conclusions

Here you should describe the conclusions you reach (assuming you got the results described in the Results section
above). You should relate these conclusions back to the level of the construct and the general problem area which
you described in the Introduction section. You should also discuss the overall strength of the research proposed (e.g., general discussion of the strong and weak validity areas) and should present some suggestions for possible
future research which would be sensible based on the results of this work.

References

There are really two parts to a reference citation. First, there is the way you cite the item in the text when you are
discussing it. Second, there is the way you list the complete reference in the reference section in the back of the
report.

Reference Citations in the Text of Your Paper

Cited references appear in the text of your paper and are a way of giving credit to the source of the information or
quote you have used in your paper. They generally consist of the following bits of information:

The author's last name, unless first initials are needed to distinguish between two authors with the
same last name. If there are six or more authors, the first author is listed followed by the term et al.
The year of publication is given in parentheses. Page numbers are given with a quotation or when
only a specific part of a source was used.

"To be or not to be" (Shakespeare, 1660, p. 241)

One Work by One Author:


Rogers (1994) compared reaction times...

One Work by Multiple Authors:

Wasserstein, Zappulla, Rosen, Gerstman, and Rock (1994) [first time you cite in text]

Wasserstein et al. (1994) found [subsequent times you cite in text]

Reference List in Reference Section

There are a wide variety of reference citation formats. Before submitting any research report you should check to
see which type of format is considered acceptable for that context. If there is no official format requirement then the
most sensible thing is for you to select one approach and implement it consistently (there's nothing worse than a
reference list with a variety of formats). Here, I'll illustrate by example some of the major reference items and how
they might be cited in the reference section.

The References lists all the articles, books, and other sources used in the research and preparation of the paper and
cited with a parenthetical (textual) citation in the text. These items are entered in alphabetical order according to the
authors' last names; if a source does not have an author, alphabetize according to the first word of the title,
disregarding the articles "a", "an", and "the" if they are the first word in the title.

EXAMPLES

BOOK BY ONE AUTHOR:

Jones, T. (1940). My life on the road. New York: Doubleday.

BOOK BY TWO AUTHORS:

Williams, A., & Wilson, J. (1962). New ways with chicken. New York: Harcourt.

BOOK BY THREE OR MORE AUTHORS:

Smith, J., Jones, J., & Williams, S. (1976). Common names. Chicago: University of
Chicago Press.

BOOK WITH NO GIVEN AUTHOR OR EDITOR:

Handbook of Korea (4th ed.). (1982). Seoul: Korean Overseas Information, Ministry
of Culture & Information.

TWO OR MORE BOOKS BY THE SAME AUTHOR:

Oates, J.C. (1990). Because it is bitter, and because it is my heart. New York:
Dutton.

Oates, J.C. (1993). Foxfire: Confessions of a girl gang. New York: Dutton.
Note: Entries by the same author are arranged chronologically by the year of
publication, the earliest first. References with the same first author and
different second and subsequent authors are listed alphabetically by the
surname of the second author, then by the surname of the third author.
References with the same authors in the same order are entered
chronologically by year of publication, the earliest first. References by the
same author (or by the same two or more authors in identical order) with the
same publication date are listed alphabetically by the first word of the title
following the date; lower case letters (a, b, c, etc.) are included after the
year, within the parentheses.

BOOK BY A CORPORATE (GROUP) AUTHOR:

President's Commission on Higher Education. (1977). Higher education for American democracy. Washington, D.C.: U.S. Government Printing Office.

BOOK WITH AN EDITOR:

Bloom, H. (Ed.). (1988). James Joyce's Dubliners. New York: Chelsea House.

A TRANSLATION:

Dostoevsky, F. (1964). Crime and punishment (J. Coulson Trans.). New York:
Norton. (Original work published 1866)

AN ARTICLE OR READING IN A COLLECTION OF PIECES BY SEVERAL AUTHORS (ANTHOLOGY):

O'Connor, M.F. (1975). Everything that rises must converge. In J.R. Knott, Jr. & C.
R. Raeske (Eds.), Mirrors: An introduction to literature (2nd ed., pp. 58-67). San
Francisco: Canfield.

EDITION OF A BOOK:

Tortora, G.J., Funke, B.R., & Case, C.L. (1989). Microbiology: An introduction (3rd
ed.). Redwood City, CA: Benjamin/Cummings.

DIAGNOSTIC AND STATISTICAL MANUAL OF MENTAL DISORDERS:

American Psychiatric Association. (1994). Diagnostic and statistical manual of mental disorders (4th ed.). Washington, D.C.: Author.

A WORK IN SEVERAL VOLUMES:

Churchill, W.S. (1957). A history of the English speaking peoples: Vol. 3. The Age
of Revolution. New York: Dodd, Mead.

ENCYCLOPEDIA OR DICTIONARY:
Cockrell, D. (1980). Beatles. In The new Grove dictionary of music and musicians
(6th ed., Vol. 2, pp. 321-322). London: Macmillan.

ARTICLE FROM A WEEKLY MAGAZINE:

Jones, W. (1970, August 14). Today's kids. Newsweek, 76, 10-15.

ARTICLE FROM A MONTHLY MAGAZINE:

Howe, I. (1968, September). James Baldwin: At ease in apocalypse. Harper's, 237, 92-100.

ARTICLE FROM A NEWSPAPER:

Brody, J.E. (1976, October 10). Multiple cancers termed on increase. New York
Times (national ed.). p. A37.

ARTICLE FROM A SCHOLARLY ACADEMIC OR PROFESSIONAL JOURNAL:

Barber, B.K. (1994). Cultural, family, and personal contexts of parent-adolescent conflict. Journal of Marriage and the Family, 56, 375-386.

GOVERNMENT PUBLICATION:

U.S. Department of Labor. Bureau of Labor Statistics. (1980). Productivity. Washington, D.C.: U.S. Government Printing Office.

PAMPHLET OR BROCHURE:

Research and Training Center on Independent Living. (1993). Guidelines for reporting and writing about people with disabilities (4th ed.) [Brochure]. Lawrence, KS: Author.

Tables

Any Tables should have a heading with 'Table #' (where # is the table number), followed by the title for the heading
that describes concisely what is contained in the table. Tables and Figures are typed on separate sheets at the end
of the paper after the References and before the Appendices. In the text you should put a reference where each
Table or Figure should be inserted using this form:

_________________________________________

Insert Table 1 about here

_________________________________________
Figures

Figures are drawn on separate sheets at the end of the paper after the References and Tables, and before the
Appendices. In the text you should put a reference where each Figure will be inserted using this form:

_________________________________________

Insert Figure 1 about here

_________________________________________

Appendices

Appendices should be used only when absolutely necessary. Generally, you will only use them for presentation of
extensive measurement instruments, for detailed descriptions of the program or independent variable and for any
relevant supporting documents which you don't include in the body. Even if you include such appendices, you
should briefly describe the relevant material in the body and give an accurate citation to the appropriate appendix (e.g., 'see Appendix A').


This paper should be used only as an example of a research paper write-up. Horizontal rules signify the top and bottom edges of pages.
For sample references which are not included with this paper, you should consult the Publication Manual of the American Psychological
Association, 4th Edition.

This paper is provided only to give you an idea of what a research paper might look like. You are not allowed to copy any of the text of
this paper in writing your own report.

Because word processor copies of papers don't translate well into web pages, you should note that an actual paper should be formatted
according to the formatting rules for your context. Note especially that there are three formatting rules you will see in this sample paper
which you should NOT follow. First, except for the title page, the running header should appear in the upper right corner of every page
with the page number below it. Second, paragraphs and text should be double spaced and the start of each paragraph should be
indented. Third, horizontal lines are used to indicate a mandatory page break and should not be used in your paper.

The Effects of a Supported Employment Program on Psychosocial Indicators

for Persons with Severe Mental Illness

William M.K. Trochim

Cornell University

Running Head: SUPPORTED EMPLOYMENT

Abstract

This paper describes the psychosocial effects of a program of supported employment (SE) for persons with severe mental illness. The SE program
involves extended individualized supported employment for clients through a Mobile Job Support Worker (MJSW) who maintains contact with the
client after job placement and supports the client in a variety of ways. A 50% simple random sample was taken of all persons who entered the
Thresholds Agency between 3/1/93 and 2/28/95 and who met study criteria. The resulting 484 cases were randomly assigned to either the SE
condition (treatment group) or the usual protocol (control group) which consisted of life skills training and employment in an in-house sheltered
workshop setting. All participants were measured at intake and at 3 months after beginning employment, on two measures of psychological
functioning (the BPRS and GAS) and two measures of self esteem (RSE and ESE). Significant treatment effects were found on all four measures,
but they were in the opposite direction from what was hypothesized. Instead of functioning better and having more self esteem, persons in SE had
lower functioning levels and lower self esteem. The most likely explanation is that people who work in low-paying service jobs in real world settings
generally do not like them and experience significant job stress, whether they have severe mental illness or not. The implications for theory in
psychosocial rehabilitation are considered.
The Effects of a Supported Employment Program on Psychosocial Indicators for Persons with Severe Mental Illness

Over the past quarter century a shift has occurred from traditional institution-based models of care for persons with severe mental illness (SMI) to
more individualized community-based treatments. Along with this, there has been a significant shift in thought about the potential for persons with
SMI to be "rehabilitated" toward lifestyles that more closely approximate those of persons without such illness. A central issue is the ability of a
person to hold a regular full-time job for a sustained period of time. There have been several attempts to develop novel and radical models for
program interventions designed to assist persons with SMI to sustain full-time employment while living in the community. The most promising of
these have emerged from the tradition of psychiatric rehabilitation with its emphases on individual consumer goal setting, skills training, job
preparation and employment support (Cook, Jonikas and Solomon, 1992). These are relatively new and field evaluations are rare or have only
recently been initiated (Cook and Razzano, 1992; Cook, 1992). Most of the early attempts to evaluate such programs have naturally focused almost
exclusively on employment outcomes. However, theory suggests that sustained employment and living in the community may have important
therapeutic benefits in addition to the obvious economic ones. To date, there have been no formal studies of the effects of psychiatric rehabilitation
programs on key illness-related outcomes. To address this issue, this study seeks to examine the effects of a new program of supported
employment on psychosocial outcomes for persons with SMI.

Over the past several decades, the theory of vocational rehabilitation has experienced two major stages of evolution. Original models of vocational
rehabilitation were based on the idea of sheltered workshop employment. Clients were paid a piece rate and worked only with other individuals who
were disabled. Sheltered workshops tended to be "end points" for persons with severe and profound mental retardation since few ever moved from
sheltered to competitive employment (Woest, Klein & Atkins, 1986). Controlled studies of sheltered workshop performance of persons with mental
illness suggested only minimal success (Griffiths, 1974) and other research indicated that persons with mental illness earned lower wages,
presented more behavior problems, and showed poorer workshop attendance than workers with other disabilities (Whitehead, 1977; Ciardiello,
1981).

In the 1980s, a new model of services called Supported Employment (SE) was proposed as less expensive and more normalizing for persons
undergoing rehabilitation (Wehman, 1985). The SE model emphasizes first locating a job in an integrated setting for minimum wage or above, and
then placing the person on the job and providing the training and support services needed to remain employed (Wehman, 1985). Services such as
individualized job development, one-on-one job coaching, advocacy with co-workers and employers, and "fading" support were found to be effective
in maintaining employment for individuals with severe and profound mental retardation (Revell, Wehman & Arnold, 1984). The idea that this model
could be generalized to persons with all types of severe disabilities, including severe mental illness, became commonly accepted (Chadsey-Rusch
& Rusch, 1986).

One of the more notable SE programs was developed at Thresholds, the site for the present study, which created a new staff position called the
mobile job support worker (MJSW) and removed the common six month time limit for many placements. MJSWs provide ongoing, mobile support
and intervention at or near the work site, even for jobs with high degrees of independence (Cook & Hoffschmidt, 1993). Time limits for many
placements were removed so that clients could stay on as permanent employees if they and their employers wished. The suspension of time limits
on job placements, along with MJSW support, became the basis of SE services delivered at Thresholds.

There are two key psychosocial outcome constructs of interest in this study. The first is the overall psychological functioning of the person with SMI.
This would include the specification of severity of cognitive and affective symptomatology as well as the overall level of psychological functioning.
The second is the level of self-reported self esteem of the person. This was measured both generally and with specific reference to employment.

The key hypothesis of this study is:

H0: A program of supported employment will result in either no change or negative effects on psychological functioning and self
esteem.

which will be tested against the alternative:

HA: A program of supported employment will lead to positive effects on psychological functioning and self esteem.

Method

Sample

The population of interest for this study is all adults with SMI residing in the U.S. in the early 1990s. The population that is accessible to this study
consists of all persons who were clients of the Thresholds Agency in Chicago, Illinois between the dates of March 1, 1993 and February 28, 1995
who met the following criteria: 1) a history of severe mental illness (e.g., either schizophrenia, severe depression or manic-depression); 2) a
willingness to achieve paid employment; 3) their primary diagnosis must not include chronic alcoholism or hard drug use; and 4) they must be 18
years of age or older. The sampling frame was obtained from records of the agency. Because of the large number of clients who pass through the
agency each year (e.g., approximately 500 who meet the criteria) a simple random sample of 50% was chosen for inclusion in the study. This
resulted in a sample size of 484 persons over the two-year course of the study.
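
The 50% simple random sampling step described here is straightforward to reproduce in code. The sketch below (in Python) is purely illustrative: the client IDs, the size of the sampling frame, and the random seed are hypothetical, chosen only so that a 50% sample comes out near the 484 cases reported.

import random

def draw_simple_random_sample(frame, fraction=0.5, seed=42):
    """Return a simple random sample containing `fraction` of the sampling frame."""
    rng = random.Random(seed)
    k = round(len(frame) * fraction)
    return rng.sample(frame, k)

# hypothetical sampling frame of 968 eligible clients drawn from agency records
sampling_frame = ["client_%04d" % i for i in range(1, 969)]
study_sample = draw_simple_random_sample(sampling_frame)
print(len(study_sample))  # 484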

On average, study participants were 30 years old and high school graduates (average education level = 13 years). The majority of participants
(70%) were male. Most had never married (85%), few (2%) were currently married, and the remainder had been formerly married (13%). Just over
half (51%) were African American, with the remainder Caucasian (43%) or from other minority groups (6%). In terms of illness history, the members of the
sample averaged 4 prior psychiatric hospitalizations and spent a lifetime average of 9 months as patients in psychiatric hospitals. The primary
diagnoses were schizophrenia (42%) and severe chronic depression (37%). Participants had spent an average of almost two and one-half years (29
months) at the longest job they ever held.

While the study sample cannot be considered representative of the original population of interest, generalizability was not a primary goal -- the
major purpose of this study was to determine whether a specific SE program could work in an accessible context. Any effects of SE evident in this
study can be generalized to urban psychiatric agencies that are similar to Thresholds, have a similar clientele, and implement a similar program.

Measures

All but one of the measures used in this study are well-known instruments in the research literature on psychosocial functioning. All of the
instruments were administered as part of a structured interview that an evaluation social worker had with study participants at regular intervals.

Two measures of psychological functioning were used. The Brief Psychiatric Rating Scale (BPRS)(Overall and Gorham, 1962) is an 18-item scale
that measures perceived severity of symptoms ranging from "somatic concern" and "anxiety" to "depressive mood" and "disorientation." Ratings are
given on a 0-to-6 Likert-type response scale where 0="not present" and 6="extremely severe" and the scale score is simply the sum of the 18 items.
The Global Assessment Scale (GAS)(Endicott et al, 1976) is a single 1-to-100 rating on a scale where each ten-point increment has a detailed
description of functioning (higher scores indicate better functioning). For instance, one would give a rating between 91-100 if the person showed "no
symptoms, superior functioning..." and a value between 1-10 if the person "needs constant supervision..."

Two measures of self esteem were used. The first is the Rosenberg Self Esteem (RSE) Scale (Rosenberg, 1965), a 10-item scale rated on a 6-
point response format where 1="strongly disagree" and 6="strongly agree" and there is no neutral point. The total score is simply the sum across
the ten items, with five of the items being reversals. The second measure was developed explicitly for this study and was designed to measure the
Employment Self Esteem (ESE) of a person with SMI. This is a 10-item scale that uses a 4-point response format where 1="strongly disagree" and
4="strongly agree" and there is no neutral point. The final ten items were selected from a pool of 97 original candidate items, based upon high item-
total score correlations and a judgment of face validity by a panel of three psychologists. This instrument was deliberately kept simple -- a shorter
response scale and no reversal items -- because of the difficulties associated with measuring a population with SMI. The entire instrument is
provided in Appendix A.
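
To illustrate the item-selection step just described, here is a hedged sketch of how corrected item-total correlations might be computed for a pool of candidate items. The response data are simulated; the actual 97-item pilot pool is not reproduced here, and the study also relied on face-validity judgments by a panel, which no purely statistical screen can replace.

import numpy as np

def corrected_item_total(responses):
    """responses: (n_respondents, n_items) array of item scores.
    Returns the correlation of each item with the total of the *other* items."""
    n_items = responses.shape[1]
    total = responses.sum(axis=1)
    corrs = []
    for j in range(n_items):
        rest = total - responses[:, j]          # total score excluding item j
        corrs.append(np.corrcoef(responses[:, j], rest)[0, 1])
    return np.array(corrs)

rng = np.random.default_rng(0)
pilot = rng.integers(1, 5, size=(60, 97))       # simulated 4-point responses to 97 candidate items
r_it = corrected_item_total(pilot)
best_ten = np.argsort(r_it)[-10:]               # indices of the ten highest item-total correlations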

All four of the measures evidenced strong reliability and validity. Internal consistency reliability estimates using Cronbach's alpha ranged from .76
for the ESE to .88 for the RSE. Test-retest reliabilities were nearly as high, ranging from .72 for the ESE to .83 for the BPRS. Convergent validity was evidenced
by the correlations within construct. For the two psychological functioning scales the correlation was .68 while for the self esteem measures it was
somewhat lower at .57. Discriminant validity was examined by looking at the cross-construct correlations, which ranged from .18 (BPRS-ESE) to .41
(GAS-RSE).
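
As a concrete illustration of the internal-consistency estimates reported above, the following sketch computes Cronbach's alpha for a simulated 10-item scale. Only the standard alpha formula is assumed; the data are made up and will not reproduce the study's coefficients.

import numpy as np

def cronbach_alpha(items):
    """items: (n_respondents, n_items) array of item scores. Returns coefficient alpha."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()   # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)     # variance of the total score
    return (k / (k - 1)) * (1 - item_vars / total_var)

rng = np.random.default_rng(1)
true_score = rng.normal(size=(200, 1))
scale_items = true_score + rng.normal(scale=1.0, size=(200, 10))  # 10 noisy indicators of one construct
print(round(cronbach_alpha(scale_items), 2))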

Design

A pretest-posttest two-group randomized experimental design was used in this study. In notational form, the design can be depicted as:

R   O   X   O

R   O       O

where:

R = the groups were randomly assigned

O = the four measures (i.e., BPRS, GAS, RSE, and ESE)

X = supported employment

The comparison group received the standard Thresholds protocol which emphasized in-house training in life skills and employment in an in-house
sheltered workshop. All participants were measured at intake (pretest) and at three months after intake (posttest).
This type of randomized experimental design is generally strong in internal validity. It rules out threats of history, maturation, testing,
instrumentation, mortality and selection interactions. Its primary weaknesses are in the potential for treatment-related mortality (i.e., a type of
selection-mortality) and for problems that result from the reactions of participants and administrators to knowledge of the varying experimental
conditions. In this study, the drop-out rate was 4% (N=9) for the control group and 5% (N=13) in the treatment group. Because these rates are low
and are approximately equal in each group, it is not plausible that there is differential mortality. There is a possibility that there were some
deleterious effects due to participant knowledge of the other group's existence (e.g., compensatory rivalry, resentful demoralization). Staff were
debriefed at several points throughout the study and were explicitly asked about such issues. There were no reports of any apparent negative
feelings from the participants in this regard. Nor is it plausible that staff might have equalized conditions between the two groups. Staff were given
extensive training and were monitored throughout the course of the study. Overall, this study can be considered strong with respect to internal
validity.

Procedure

Between 3/1/93 and 2/28/95 each person admitted to Thresholds who met the study inclusion criteria was immediately assigned a random number
that gave them a 50/50 chance of being selected into the study sample. For those selected, the purpose of the study was explained, including the
nature of the two treatments, and the need for and use of random assignment. Participants were assured confidentiality and were given an
opportunity to decline to participate in the study. Only 7 people (out of 491) refused to participate. At intake, each selected sample member was
assigned a random number giving them a 50/50 chance of being assigned to either the Supported Employment condition or the standard in-agency
sheltered workshop. In addition, all study participants were given the four measures at intake.
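
The two coin-flip randomizations in this procedure -- selection into the study at admission, then assignment to a condition at intake -- can be sketched as follows. The client identifiers and the seed are illustrative only; the study itself generated random numbers at the agency as clients were admitted.

import random

rng = random.Random(1993)

def admit(client_id):
    """Return the client's study status using two independent 50/50 random draws."""
    if rng.random() >= 0.5:                                   # step 1: 50% chance of selection into the sample
        return client_id, "not selected"
    condition = "SE" if rng.random() < 0.5 else "control"     # step 2: random assignment to condition
    return client_id, condition

for cid in ["client_0001", "client_0002", "client_0003"]:
    print(admit(cid))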

All participants spent the initial two weeks in the program in training and orientation. This consisted of life skill training (e.g., handling money, getting
around, cooking and nutrition) and job preparation (employee roles, coping strategies). At the end of that period, each participant was assigned to a
job site -- at the agency sheltered workshop for those in the control condition, and to an outside employer if in the Supported Employment group.
Control participants were expected to work full-time at the sheltered workshop for a three-month period, at which point they were posttested and
given an opportunity to obtain outside employment (either Supported Employment or not). The Supported Employment participants were each
assigned a case worker -- called a Mobile Job Support Worker (MJSW) -- who met with the person at the job site two times per week for an hour
each time. The MJSW could provide any support or assistance deemed necessary to help the person cope with job stress, including counseling or
working beside the person for short periods of time. In addition, the MJSW was always accessible by cellular telephone, and could be called by the
participant or the employer at any time. At the end of three months, each participant was post-tested and given the option of staying with their
current job (with or without Supported Employment) or moving to the sheltered workshop.

Results

There were 484 participants in the final sample for this study, 242 in each treatment. There were 9 drop-outs from the control group and 13 from the
treatment group, leaving a total of 233 and 229 in each group respectively from whom both pretest and posttest were obtained. Due to unexpected
difficulties in coping with job stress, 19 Supported Employment participants had to be transferred into the sheltered workshop prior to the posttest. In
all 19 cases, the transfer occurred no earlier than week 6 of employment, and 15 of the 19 were transferred after week 8. In all analyses, these cases were
included with the Supported Employment group (intent-to-treat analysis) yielding treatment effect estimates that are likely to be conservative.

The major results for the four outcome measures are shown in Figure 1.

_______________________________________

Insert Figure 1 about here

_______________________________________

It is immediately apparent that in all four cases the null hypothesis has to be accepted -- contrary to expectations, Supported Employment cases did
significantly worse on all four outcomes than did control participants.

The mean gains, standard deviations, sample sizes and t-values (t-test for differences in average gain) are shown for the four outcome measures in
Table 1.

_______________________________________

Insert Table 1 about here

_______________________________________

The results in the table confirm the impressions in the figures. Note that all t-values are negative except for the BPRS where high scores indicate
greater severity of illness. For all four outcomes, the t-values were statistically significant (p<.05).
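
As a check on the reported analysis, the t-test for differences in average gain can be recomputed directly from the summary statistics in Table 1. The sketch below uses the unequal-variance form of the two-sample t, which happens to reproduce the reported BPRS value almost exactly; whether the original analysis pooled the variances is not stated, and with groups this similar in size and spread the two forms barely differ.

from math import sqrt

def t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic on group means, computed from summary statistics (unequal-variance form)."""
    se = sqrt(sd1**2 / n1 + sd2**2 / n2)
    return (mean1 - mean2) / se

t_bprs = t_from_summary(1.9, 2.55, 229, -0.4, 2.4, 233)   # treatment vs. control gain on the BPRS
print(round(t_bprs, 2))   # about 9.98, matching the value reported in Table 1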
Conclusions

The results of this study were clearly contrary to initial expectations. The alternative hypothesis suggested that SE participants would show
improved psychological functioning and self esteem after three months of employment. Exactly the reverse happened -- SE participants showed
significantly worse psychological functioning and self esteem.

There are two major possible explanations for this outcome pattern. First, it seems reasonable that there might be a delayed positive or
"boomerang" effect of employment outside of a sheltered setting. SE cases may have to go through an initial difficult period of adjustment (longer
than three months) before positive effects become apparent. This "you have to get worse before you get better" theory is commonly held in other
treatment-contexts like drug addiction and alcoholism. But a second explanation seems more plausible -- that people working full-time jobs in real-
world settings are almost certainly going to be under greater stress and experience more negative outcomes than those who work in the relatively
safe confines of an in-agency sheltered workshop. Put more succinctly, the lesson here might very well be that work is hard. Sheltered workshops
are generally very nurturing work environments where virtually all employees share similar illness histories and where expectations about
productivity are relatively low. In contrast, getting a job at a local hamburger shop or as a shipping clerk puts the person in contact with co-workers
who may not be sympathetic to their histories or forgiving with respect to low productivity. This second explanation seems even more plausible in
the wake of informal debriefing sessions held as focus groups with the staff and selected research participants. It was clear in the discussion that
SE persons experienced significantly higher job stress levels and more negative consequences. However, most of them also felt that the experience
was a good one overall and that even their "normal" co-workers "hated their jobs" most of the time.

One lesson we might take from this study is that much of our contemporary theory in psychiatric rehabilitation is naive at best and, in some cases,
may be seriously misleading. Theory led us to believe that outside work was a "good" thing that would naturally lead to "good" outcomes like
increased psychological functioning and self esteem. But for most people (SMI or not) work is at best tolerable, especially for the types of low-
paying service jobs available to study participants. While people with SMI may not function as well or have high self esteem, we should balance this
with the desire they may have to "be like other people" including struggling with the vagaries of life and work that others struggle with.

Future research in this area needs to address the theoretical assumptions about employment outcomes for persons with SMI. It is especially
important that attempts to replicate this study also try to measure how SE participants feel about the decision to work, even if traditional outcome
indicators suffer. It may very well be that negative outcomes on traditional indicators can be associated with a "positive" impact for the participants
and for the society as a whole.

References

Chadsey-Rusch, J. and Rusch, F.R. (1986). The ecology of the workplace. In J. Chadsey-Rusch, C. Haney-Maxwell, L. A. Phelps and F. R. Rusch
(Eds.), School-to-Work Transition Issues and Models. (pp. 59-94), Champaign IL: Transition Institute at Illinois.

Ciardiello, J.A. (1981). Job placement success of schizophrenic clients in sheltered workshop programs. Vocational Evaluation and Work
Adjustment Bulletin, 14, 125-128, 140.

Cook, J.A. (1992). Job ending among youth and adults with severe mental illness. Journal of Mental Health Administration, 19(2), 158-169.

Cook, J.A. & Hoffschmidt, S. (1993). Psychosocial rehabilitation programming: A comprehensive model for the 1990's. In R.W. Flexer and P.
Solomon (Eds.), Social and Community Support for People with Severe Mental Disabilities: Service Integration in Rehabilitation and Mental Health.
Andover, MA: Andover Publishing.

Cook, J.A., Jonikas, J., & Solomon, M. (1992). Models of vocational rehabilitation for youth and adults with severe mental illness. American
Rehabilitation, 18(3), 6-32.

Cook, J.A. & Razzano, L. (1992). Natural vocational supports for persons with severe mental illness: Thresholds Supported Competitive
Employment Program, in L. Stein (ed.), New Directions for Mental Health Services, San Francisco: Jossey-Bass, 56, 23-41.

Endicott, J.R., Spitzer, J.L., Fleiss, J.L. and Cohen, J. (1976). The Global Assessment Scale: A procedure for measuring overall severity of
psychiatric disturbance. Archives of General Psychiatry, 33, 766-771.

Griffiths, R.D. (1974). Rehabilitation of chronic psychotic patients. Psychological Medicine, 4, 316-325.

Overall, J. E. and Gorham, D. R. (1962). The Brief Psychiatric Rating Scale. Psychological Reports, 10, 799-812.

Rosenberg, M. (1965). Society and Adolescent Self Image. Princeton, NJ: Princeton University Press.

Wehman, P. (1985). Supported competitive employment for persons with severe disabilities. In P. McCarthy, J. Everson, S. Monn & M. Barcus
(Eds.), School-to-Work Transition for Youth with Severe Disabilities, (pp. 167-182), Richmond VA: Virginia Commonwealth University.

Whitehead, C.W. (1977). Sheltered Workshop Study: A Nationwide Report on Sheltered Workshops and their Employment of Handicapped
Individuals. (Workshop Survey, Volume 1), U.S. Department of Labor Service Publication. Washington, DC: U.S. Government Printing Office.

Woest, J., Klein, M. and Atkins, B.J. (1986). An overview of supported employment strategies. Journal of Rehabilitation Administration, 10(4), 130-
135.

Table 1. Means, standard deviations and Ns for the pretest, posttest and gain scores for the four outcome variables and t-test for difference
between average gains.

BPRS                 Pretest   Posttest    Gain

Treatment   Mean        3.2        5.1      1.9
            sd          2.4        2.7      2.55
            N           229        229      229
Control     Mean        3.4        3.0     -0.4
            sd          2.3        2.5      2.4
            N           233        233      233
t = 9.979625, p<.05

GAS                  Pretest   Posttest    Gain

Treatment   Mean         59         43      -16
            sd         25.2       24.3    24.75
            N           229        229      229
Control     Mean         61         63        2
            sd         26.7       22.1     24.4
            N           233        233      233
t = -7.87075, p<.05

RSE                  Pretest   Posttest    Gain

Treatment   Mean         42         31      -11
            sd         27.1       26.5     26.8
            N           229        229      229
Control     Mean         41         43        2
            sd         28.2       25.9    27.05
            N           233        233      233
t = -5.1889, p<.05

ESE                  Pretest   Posttest    Gain

Treatment   Mean         27         16      -11
            sd         19.3       21.2    20.25
            N           229        229      229
Control     Mean         25         24       -1
            sd         18.6       20.3    19.45
            N           233        233      233
t = -5.41191, p<.05

Figure 1. Pretest and posttest means for treatment (SE) and control groups for the four outcome measures.

Appendix A

The Employment Self Esteem Scale

Please rate how strongly you agree or disagree with each of the following statements.

1. I feel good about my work on the job.


Strongly Disagree Somewhat Disagree Somewhat Agree Strongly Agree

2. On the whole, I get along well with others at work.


Strongly Disagree Somewhat Disagree Somewhat Agree Strongly Agree

3. I am proud of my ability to cope with difficulties at work.


Strongly Disagree Somewhat Disagree Somewhat Agree Strongly Agree

4. When I feel uncomfortable at work, I know how to handle it.


Strongly Disagree Somewhat Disagree Somewhat Agree Strongly Agree

5. I can tell that other people at work are glad to have me there.
Strongly Disagree Somewhat Disagree Somewhat Agree Strongly Agree

6. I know I'll be able to cope with work for as long as I want.


Strongly Disagree Somewhat Disagree Somewhat Agree Strongly Agree

7. I am proud of my relationship with my supervisor at work.


Strongly Disagree Somewhat Disagree Somewhat Agree Strongly Agree
8. I am confident that I can handle my job without constant assistance.

Strongly Disagree Somewhat Disagree Somewhat Agree Strongly Agree

9. I feel like I make a useful contribution at work.

Strongly Disagree Somewhat Disagree Somewhat Agree Strongly Agree

10. I can tell that my co-workers respect me.

Strongly Disagree Somewhat Disagree Somewhat Agree Strongly Agree


Validity:
the best available approximation to the truth of a given proposition, inference, or conclusion

The first thing we have to ask is: "validity of what?" When we think about validity in research, most of us think about
research components. We might say that a measure is a valid one, or that a valid sample was drawn, or that the design
had strong validity. But all of those statements are technically incorrect. Measures, samples and designs don't 'have'
validity -- only propositions can be said to be valid. Technically, we should say that a measure leads to valid conclusions or
that a sample enables valid inferences, and so on. It is a proposition, inference or conclusion that can 'have' validity.

We make lots of different inferences or conclusions while conducting research. Many of these are related to the process of
doing research and are not the major hypotheses of the study. Nevertheless, like the bricks that go into building a wall,
these intermediate process and methodological propositions provide the foundation for the substantive conclusions that we
wish to address. For instance, virtually all social research involves measurement or observation. And, whenever we
measure or observe we are concerned with whether we are measuring what we intend to measure or with how our
observations are influenced by the circumstances in which they are made. We reach conclusions about the quality of our
measures -- conclusions that will play an important role in addressing the broader substantive issues of our study. When
we talk about the validity of research, we are often referring to the many conclusions we reach about the quality of
different parts of our research methodology.

We subdivide validity into four types. Each type addresses a specific methodological question. In order to understand the
types of validity, you have to know something about how we investigate a research question. Because all four validity types
are really only operative when studying causal questions, we will use a causal study to set the context.
The figure shows that there are really two realms that are involved in research. The first, on the top, is the land of theory. It
is what goes on inside our heads as researchers. It is where we keep our theories about how the world operates. The
second, on the bottom, is the land of observations. It is the real world into which we translate our ideas -- our programs,
treatments, measures and observations. When we conduct research, we are continually flitting back and forth between
these two realms, between what we think about the world and what is going on in it. When we are investigating a cause-
effect relationship, we have a theory (implicit or otherwise) of what the cause is (the cause construct). For instance, if we
are testing a new educational program, we have an idea of what it would look like ideally. Similarly, on the effect side, we
have an idea of what we are ideally trying to affect and measure (the effect construct). But each of these, the cause and
the effect, has to be translated into real things, into a program or treatment and a measure or observational method. We
use the term operationalization to describe the act of translating a construct into its manifestation. In effect, we take our
idea and describe it as a series of operations or procedures. Now, instead of it only being an idea in our minds, it becomes
a public entity that anyone can look at and examine for themselves. It is one thing, for instance, for you to say that you
would like to measure self-esteem (a construct). But when you show a ten-item paper-and-pencil self-esteem measure that
you developed for that purpose, others can look at it and understand more clearly what you intend by the term self-esteem.

Now, back to explaining the four validity types. They build on one another, with two of them (conclusion and internal)
referring to the land of observation on the bottom of the figure, one of them (construct) emphasizing the linkages between
the bottom and the top, and the last (external) being primarily concerned about the range of our theory on the top. Imagine
that we wish to examine whether use of a World Wide Web (WWW) Virtual Classroom improves student understanding of
course material. Assume that we took these two constructs, the cause construct (the WWW site) and the effect
(understanding), and operationalized them -- turned them into realities by constructing the WWW site and a measure of
knowledge of the course material. Here are the four validity types and the question each addresses:

Conclusion Validity: In this study, is there a relationship between the two variables?
In the context of the example we're considering, the question might be worded: in this study, is there a relationship
between the WWW site and knowledge of course material? There are several conclusions or inferences we might draw to
answer such a question. We could, for example, conclude that there is a relationship. We might conclude that there is a
positive relationship. We might infer that there is no relationship. We can assess the conclusion validity of each of these
conclusions or inferences.

Internal Validity: Assuming that there is a relationship in this study, is the relationship a
causal one?

Just because we find that use of the WWW site and knowledge are correlated, we can't necessarily assume that WWW
site use causes the knowledge. Both could, for example, be caused by the same factor. For instance, it may be that
wealthier students who have greater resources would be more likely to have access to a WWW site and would excel
on objective tests. When we want to make a claim that our program or treatment caused the outcomes in our study, we can
consider the internal validity of our causal claim.

Construct Validity: Assuming that there is a causal relationship in this study, can we claim
that the program reflected well our construct of the program and that our measure reflected
well our idea of the construct of the measure?

In simpler terms, did we implement the program we intended to implement and did we measure the outcome we wanted to
measure? In yet other terms, did we operationalize well the ideas of the cause and the effect? When our research is over,
we would like to be able to conclude that we did a credible job of operationalizing our constructs -- we can assess the
construct validity of this conclusion.

External Validity: Assuming that there is a causal relationship in this study between the
constructs of the cause and the effect, can we generalize this effect to other persons,
places or times?

We are likely to make some claims that our research findings have implications for other groups and individuals in other
settings and at other times. When we do, we can examine the external validity of these claims.
Notice how the question that each validity type addresses presupposes an affirmative answer to the previous one. This is
what we mean when we say that the validity types build on one another. The figure shows the idea of cumulativeness as a
staircase, along with the key question for each validity type.

For any inference or conclusion, there are always possible threats to validity -- reasons the conclusion or inference might
be wrong. Ideally, one tries to reduce the plausibility of the most likely threats to validity, thereby leaving as
most plausible the conclusion reached in the study. For instance, imagine a study examining whether there is a relationship
between the amount of training in a specific technology and subsequent rates of use of that technology. Because the
interest is in a relationship, it is considered an issue of conclusion validity. Assume that the study is completed and no
significant correlation between amount of training and adoption rates is found. On this basis it is concluded that there is no
relationship between the two. How could this conclusion be wrong -- that is, what are the "threats to validity"? For one, it's
possible that there isn't sufficient statistical power to detect a relationship even if it exists. Perhaps the sample size is too
small or the measure of amount of training is unreliable. Or maybe assumptions of the correlational test are violated given
the variables used. Perhaps there were random irrelevancies in the study setting or random heterogeneity in the
respondents that increased the variability in the data and made it harder to see the relationship of interest. The inference
that there is no relationship will be stronger -- have greater conclusion validity -- if one can show that these alternative
explanations are not credible. The distributions might be examined to see if they conform with assumptions of the statistical
test, or analyses conducted to determine whether there is sufficient statistical power.
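
The statistical-power point in this example can be made concrete with a small simulation. The sketch below estimates, under arbitrary illustrative assumptions (a true correlation of .20, a sample of 50, a two-tailed .05 test), how often a study of that size would detect the relationship; none of the numbers are taken from any real training study.

import numpy as np
from scipy import stats

def estimated_power(true_r=0.2, n=50, alpha=0.05, reps=2000, seed=7):
    """Fraction of simulated studies that detect a true correlation of true_r at level alpha."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        x = rng.normal(size=n)                                           # e.g., amount of training
        y = true_r * x + np.sqrt(1 - true_r**2) * rng.normal(size=n)     # e.g., later use of the technology
        _, p = stats.pearsonr(x, y)
        hits += p < alpha
    return hits / reps

print(estimated_power())

With these settings the estimate typically lands around .3, meaning roughly two of every three such studies would fail to detect a relationship that is really there -- exactly the kind of threat to conclusion validity described above.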

The theory of validity, and the many lists of specific threats, provide a useful scheme for assessing the quality of research
conclusions. The theory is general in scope and applicability, well-articulated in its philosophical suppositions, and virtually
impossible to explain adequately in a few minutes. As a framework for judging the quality of evaluations it is indispensable
and well worth understanding.


Most research projects share the same general structure. You might think of this structure as following the shape of
an hourglass. The research process usually starts with a broad area of interest, the initial problem that the
researcher wishes to study. For instance, the researcher could be interested in how to use computers to improve the
performance of students in mathematics. But this initial interest is far too broad to study in any single research project
(it might not even be addressable in a lifetime of research). The researcher has to narrow the question down to one
that can reasonably be studied in a research project. This might involve formulating a hypothesis or a focus
question. For instance, the researcher might hypothesize that a particular method of computer instruction in math
will improve the ability of elementary school students in a specific district. At the narrowest point of the research
hourglass, the researcher is engaged in direct measurement or observation of the question of interest.

Once the basic data is collected, the researcher begins to try to understand it, usually by analyzing it in a variety of
ways. Even for a single hypothesis there are a number of analyses a researcher might typically conduct. At this
point, the researcher begins to formulate some initial conclusions about what happened as a result of the
computerized math program. Finally, the researcher often will attempt to address the original broad question of
interest by generalizing from the results of this specific study to other related situations. For instance, on the basis of
strong results indicating that the math program had a positive effect on student performance, the researcher might
conclude that other school districts similar to the one in the study might expect similar results.

Components of a Study

What are the basic components or parts of a research study? Here, we'll describe the basic components involved in
a causal study. Because causal studies presuppose descriptive and relational questions, many of the components of
causal studies will also be found in those others.
Most social research originates from some general problem or question. You might, for instance, be interested in
what programs enable the unemployed to get jobs. Usually, the problem is broad enough that you could not hope to
address it adequately in a single research study. Consequently, we typically narrow the problem down to a more
specific research question that we can hope to address. The research question is often stated in the context of
some theory that has been advanced to address the problem. For instance, we might have the theory that ongoing
support services are needed to assure that the newly employed remain employed. The research question is the
central issue being addressed in the study and is often phrased in the language of theory. For instance, a research
question might be:

Is a program of supported employment more effective (than no program at all) at keeping newly
employed persons on the job?

The problem with such a question is that it is still too general to be studied directly. Consequently, in most research
we develop an even more specific statement, called an hypothesis that describes in operational terms exactly what
we think will happen in the study. For instance, the hypothesis for our employment study might be something like:

The Metropolitan Supported Employment Program will significantly increase rates of employment
after six months for persons who are newly employed (after being out of work for at least one year)
compared with persons who receive no comparable program.

Notice that this hypothesis is specific enough that a reader can understand quite well what the study is trying to
assess.

In causal studies, we have at least two major variables of interest, the cause and the effect. Usually the cause is
some type of event, program, or treatment. We make a distinction between causes that the researcher can control
(such as a program) versus causes that occur naturally or outside the researcher's influence (such as a change in
interest rates, or the occurrence of an earthquake). The effect is the outcome that you wish to study. For both the
cause and effect we make a distinction between our idea of them (the construct) and how they are actually
manifested in reality. For instance, when we think about what a program of support services for the newly employed
might be, we are thinking of the "construct." On the other hand, the real world is not always what we think it is. In
research, we remind ourselves of this by distinguishing our view of an entity (the construct) from the entity as it
exists (the operationalization). Ideally, we would like the two to agree.

Social research is always conducted in a social context. We ask people questions, or observe families interacting, or
measure the opinions of people in a city. An important component of a research project is the units that participate
in the project. Units are directly related to the question of sampling. In most projects we cannot involve all of the
people we might like to involve. For instance, in studying a program of support services for the newly employed we
can't possibly include in our study everyone in the world, or even in the country, who is newly employed. Instead, we
have to try to obtain a representative sample of such people. When sampling, we make a distinction between the
theoretical population of interest to our study and the final sample that we actually measure in our study. Usually the
term "units" refers to the people that we sample and from whom we gather information. But for some projects the
units are organizations, groups, or geographical entities like cities or towns. Sometimes our sampling strategy is
multi-level: we sample a number of cities and within them sample families.

In causal studies, we are interested in the effects of some cause on one or more outcomes. The outcomes are
directly related to the research problem -- we are usually most interested in outcomes that are most reflective of the
problem. In our hypothetical supported employment study, we would probably be most interested in measures of
employment -- is the person currently employed, or what is their rate of absenteeism?

Finally, in a causal study we usually are comparing the effects of our cause of interest (e.g., the program) relative to
other conditions (e.g., another program or no program at all). Thus, a key component in a causal study concerns
how we decide what units (e.g., people) receive our program and which are placed in an alternative condition. This
issue is directly related to the research design that we use in the study. One of the central questions in research
design is determining how people wind up in or are placed in various programs or treatments that we are comparing.

These, then, are the major components in a causal study:

The Research Problem
The Research Question
The Program (Cause)
The Units
The Outcomes (Effect)
The Design

Deductive and Inductive Thinking

In logic, we often refer to the two broad methods of reasoning as the deductive and inductive approaches.

Deductive reasoning works from the more general to the more specific. Sometimes this is informally called a "top-down"
approach. We might begin with thinking up a theory about our topic of interest. We then narrow that down into more
specific hypotheses that we can test. We narrow down even further when we collect observations to address the
hypotheses. This ultimately leads us to be able to test the hypotheses with specific data -- a confirmation (or not) of our original theories.

Inductive reasoning works the other way, moving from specific observations to broader generalizations and theories.
Informally, we sometimes call this a "bottom up" approach (please note that it's "bottom up" and not "bottoms up" which is
the kind of thing the bartender says to customers when he's trying to close for the night!). In inductive reasoning, we begin
with specific observations and measures, begin to detect patterns and regularities, formulate some tentative hypotheses
that we can explore, and finally end up developing some general conclusions or theories.

These two methods of reasoning have a very different "feel" to them when you're conducting research. Inductive
reasoning, by its very nature, is more open-ended and exploratory, especially at the beginning. Deductive reasoning
is more narrow in nature and is concerned with testing or confirming hypotheses. Even though a particular study
may look like it's purely deductive (e.g., an experiment designed to test the hypothesized effects of some treatment
on some outcome), most social research involves both inductive and deductive reasoning processes at some time in
the project. In fact, it doesn't take a rocket scientist to see that we could assemble the two graphs above into a
single circular one that continually cycles from theories down to observations and back up again to theories. Even in
the most constrained experiment, the researchers may observe patterns in the data that lead them to develop new
theories.


Let's start our very brief discussion of philosophy of science with a simple distinction between epistemology and
methodology. The term epistemology comes from the Greek word epistêmê, their term for knowledge. In simple
terms, epistemology is the philosophy of knowledge or of how we come to know. Methodology is also concerned
with how we come to know, but is much more practical in nature. Methodology is focused on the specific ways -- the
methods -- that we can use to try to understand our world better. Epistemology and methodology are intimately
related: the former involves the philosophy of how we come to know the world and the latter involves the practice.

When most people in our society think about science, they think about some guy in a white lab coat working at a lab
bench mixing up chemicals. They think of science as boring, cut-and-dry, and they think of the scientist as narrow-
minded and esoteric (the ultimate nerd -- think of the humorous but nonetheless mad scientist in the Back to the
Future movies, for instance). A lot of our stereotypes about science come from a period where science was
dominated by a particular philosophy -- positivism -- that tended to support some of these views. Here, I want to
suggest (no matter what the movie industry may think) that science has moved on in its thinking into an era of post-
positivism where many of those stereotypes of the scientist no longer hold up.

Let's begin by considering what positivism is. In its broadest sense, positivism is a rejection of metaphysics (I leave it to
you to look up that term if you're not familiar with it). It is a position that holds that the goal of knowledge is simply to
describe the phenomena that we experience. The purpose of science is simply to stick to what we can observe and
measure. Knowledge of anything beyond that, a positivist would hold, is impossible. When I think of positivism (and
the related philosophy of logical positivism) I think of the behaviorists in mid-20th Century psychology. These were
the mythical 'rat runners' who believed that psychology could only study what could be directly observed and
measured. Since we can't directly observe emotions, thoughts, etc. (although we may be able to measure some of
the physical and physiological accompaniments), these were not legitimate topics for a scientific psychology. B.F.
Skinner argued that psychology needed to concentrate only on the positive and negative reinforcers of behavior in
order to predict how people will behave -- everything else in between (like what the person is thinking) is irrelevant
because it can't be measured.

In a positivist view of the world, science was seen as the way to get at truth, to understand the world well enough so
that we might predict and control it. The world and the universe were deterministic -- they operated by laws of cause
and effect that we could discern if we applied the unique approach of the scientific method. Science was largely a
mechanistic or mechanical affair. We use deductive reasoning to postulate theories that we can test. Based on the
results of our studies, we may learn that our theory doesn't fit the facts well and so we need to revise our theory to
better predict reality. The positivist believed in empiricism -- the idea that observation and measurement was the
core of the scientific endeavor. The key approach of the scientific method is the experiment, the attempt to discern
natural laws through direct manipulation and observation.

OK, I am exaggerating the positivist position (although you may be amazed at how close to this some of them
actually came) in order to make a point. Things have changed in our views of science since the middle part of the
20th century. Probably the most important has been our shift away from positivism into what we term post-
positivism. By post-positivism, I don't mean a slight adjustment to or revision of the positivist position -- post-
positivism is a wholesale rejection of the central tenets of positivism. A post-positivist might begin by recognizing that
the way scientists think and work and the way we think in our everyday life are not distinctly different. Scientific
reasoning and common sense reasoning are essentially the same process. There is no difference in kind between
the two, only a difference in degree. Scientists, for example, follow specific procedures to assure that observations
are verifiable, accurate and consistent. In everyday reasoning, we don't always proceed so carefully (although, if you
think about it, when the stakes are high, even in everyday life we become much more cautious about measurement.
Think of the way most responsible parents keep continuous watch over their infants, noticing details that non-
parents would never detect).

One of the most common forms of post-positivism is a philosophy called critical realism. A critical realist believes
that there is a reality independent of our thinking about it that science can study. (This is in contrast with a
subjectivist who would hold that there is no external reality -- we're each making this all up!). Positivists were also
realists. The difference is that the post-positivist critical realist recognizes that all observation is fallible and has error
and that all theory is revisable. In other words, the critical realist is critical of our ability to know reality with certainty.
Where the positivist believed that the goal of science was to uncover the truth, the post-positivist critical realist
believes that the goal of science is to hold steadfastly to the goal of getting it right about reality, even though we can
never achieve that goal! Because all measurement is fallible, the post-positivist emphasizes the importance of
multiple measures and observations, each of which may possess different types of error, and the need to use
triangulation across these multiple errorful sources to try to get a better bead on what's happening in reality. The
post-positivist also believes that all observations are theory-laden and that scientists (and everyone else, for that
matter) are inherently biased by their cultural experiences, world views, and so on. This is not cause to give up in
despair, however. Just because I have my world view based on my experiences and you have yours doesn't mean
that we can't hope to translate from each other's experiences or understand each other. That is, post-positivism
rejects the relativist idea of the incommensurability of different perspectives, the idea that we can never
understand each other because we come from different experiences and cultures. Most post-positivists are
constructivists who believe that we each construct our view of the world based on our perceptions of it. Because
perception and observation are fallible, our constructions must be imperfect. So what is meant by objectivity in a post-
positivist world? Positivists believed that objectivity was a characteristic that resided in the individual scientist.
Scientists are responsible for putting aside their biases and beliefs and seeing the world as it 'really' is. Post-
positivists reject the idea that any individual can see the world perfectly as it really is. We are all biased and all of our
observations are affected (theory-laden). Our best hope for achieving objectivity is to triangulate across multiple
fallible perspectives! Thus, objectivity is not the characteristic of an individual, it is inherently a social phenomenon. It
is what multiple individuals are trying to achieve when they criticize each other's work. We never achieve objectivity
perfectly, but we can approach it. The best way for us to improve the objectivity of what we do is to do it within the
context of a broader contentious community of truth-seekers (including other scientists) who criticize each other's
work. The theories that survive such intense scrutiny are a bit like the species that survive in the evolutionary
struggle. (This is sometimes called the natural selection theory of knowledge and holds that ideas have 'survival
value' and that knowledge evolves through a process of variation, selection and retention). They have adaptive
value and are probably as close as our species can come to being objective and understanding reality.

Clearly, all of this stuff is not for the faint-of-heart. I've seen many a graduate student get lost in the maze of
philosophical assumptions that contemporary philosophers of science argue about. And don't think that I believe this
is not important stuff. But, in the end, I tend to turn pragmatist on these matters. Philosophers have been debating
these issues for thousands of years and there is every reason to believe that they will continue to debate them for
thousands of years more. Those of us who are practicing scientists should check in on this debate from time to time
(perhaps every hundred years or so would be about right). We should think about the assumptions we make about
the world when we conduct research. But in the meantime, we can't wait for the philosophers to settle the matter.
After all, we do have our own work to do!


Of the four types of validity (see also internal validity, construct validity and external validity), conclusion validity is
undoubtedly the least considered and most misunderstood. That's probably due to the fact that it was originally labeled
'statistical' conclusion validity and you know how even the mere mention of the word statistics will scare off most of the
human race!

In many ways, conclusion validity is the most important of the four validity types because it is
relevant whenever we are trying to decide if there is a relationship in our observations (and
that's one of the most basic aspects of any analysis). Perhaps we should start with an attempt
at a definition:

Conclusion validity is the degree to which conclusions we reach about relationships in our data are reasonable.

For instance, if we're doing a study that looks at the relationship between socioeconomic
status (SES) and attitudes about capital punishment, we eventually want to reach some
conclusion. Based on our data, we may conclude that there is a positive relationship, that
persons with higher SES tend to have a more positive view of capital punishment while those
with lower SES tend to be more opposed. Conclusion validity is the degree to which the
conclusion we reach is credible or believable.

Although conclusion validity was originally thought to be a statistical inference issue, it has
become more apparent that it is also relevant in qualitative research. For example, in an
observational field study of homeless adolescents the researcher might, on the basis of field
notes, see a pattern that suggests that teenagers on the street who use drugs are more likely
to be involved in more complex social networks and to interact with a more varied group of
people. Although this conclusion or inference may be based entirely on impressionistic data,
we can ask whether it has conclusion validity, that is, whether it is a reasonable conclusion
about a relationship in our observations.

Whenever you investigate a relationship, you essentially have two possible conclusions --
either there is a relationship in your data or there isn't. In either case, however, you could be
wrong in your conclusion. You might conclude that there is a relationship when in fact there is
not, or you might infer that there isn't a relationship when in fact there is (but you didn't detect
it!). So, we have to consider all of these possibilities when we talk about conclusion validity.

It's important to realize that conclusion validity is an issue whenever you conclude there is a
relationship, even when the relationship is between some program (or treatment) and some
outcome. In other words, conclusion validity also pertains to causal relationships. How do we
distinguish it from internal validity which is also involved with causal relationships? Conclusion
validity is only concerned with whether there is a relationship. For instance, in a program
evaluation, we might conclude that there is a positive relationship between our educational
program and achievement test scores -- students in the program get higher scores and
students not in the program get lower ones. Conclusion validity is essentially whether that
relationship is a reasonable one or not, given the data. But it is possible that we will conclude
that, while there is a relationship between the program and outcome, the program didn't cause
the outcome. Perhaps some other factor, and not our program, was responsible for the
outcome in this study. For instance, the observed differences in the outcome could be due to
the fact that the program group was smarter than the comparison group to begin with. Our
observed posttest differences between these groups could be due to this initial difference and
not be the result of our program. This issue -- the possibility that some other factor than our
program caused the outcome -- is what internal validity is all about. So, it is possible that in a
study we can conclude that our program and outcome are related (conclusion validity) and
also conclude that the outcome was caused by some factor other than the program (i.e., we
don't have internal validity).

We'll begin this discussion by considering the major threats to conclusion validity, the different
reasons you might be wrong in concluding that there is or isn't a relationship. You'll see that
there are several key reasons why reaching conclusions about relationships is so difficult. One
major problem is that it is often hard to see a relationship because our measures or
observations have low reliability -- they are too weak relative to all of the 'noise' in the
environment. Another issue is that the relationship we are looking for may be a weak one and
seeing it is a bit like looking for a needle in the haystack. Sometimes the problem is that we
just didn't collect enough information to see the relationship even if it is there. All of these
problems are related to the idea of statistical power and so we'll spend some time trying to
understand what 'power' is in this context. One of the most interesting introductions to the idea
of statistical power is given in the 'OJ' Page which was created by Rob Becker to illustrate how
the decision a jury has to reach (guilty vs. not guilty) is similar to the decision a researcher
makes when assessing a relationship. The OJ Page uses the infamous OJ Simpson murder
trial to introduce the idea of statistical power and illustrate how manipulating various factors (e.
g., the amount of evidence, the "effect size", and the level of risk) affects the validity of the
verdict. Finally, we need to recognize that we have some control over our ability to detect
relationships, and we'll conclude with some suggestions for improving conclusion validity.


Data Preparation involves checking or logging the data in; checking the data for accuracy; entering the data into the
computer; transforming the data; and developing and documenting a database structure that integrates the various
measures.

Logging the Data

In any research project you may have data coming from a number of different sources at different times:

mail survey returns
coded interview data
pretest or posttest data
observational data

In all but the simplest of studies, you need to set up a procedure for logging the information and keeping track of it
until you are ready to do a comprehensive data analysis. Researchers differ in how they prefer to keep track
of incoming data. In most cases, you will want to set up a database that enables you to assess at any time what data
is already in and what is still outstanding. You could do this with any standard computerized database program (e.g.,
Microsoft Access, Claris Filemaker), although this requires familiarity with such programs. Or, you can accomplish
this using standard statistical programs (e.g., SPSS, SAS, Minitab, Datadesk) and running simple descriptive
analyses to get reports on data status. It is also critical that the data analyst retain the original data records for a
reasonable period of time -- returned surveys, field notes, test protocols, and so on. Most professional researchers
will retain such records for at least 5-7 years. For important or expensive studies, the original data might be stored in
a data archive. The data analyst should always be able to trace a result from a data analysis back to the original
forms on which the data was collected. A database for logging incoming data is a critical component in good
research record-keeping.
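As an illustration only (the file layout, variable names, and the use of the pandas library are assumptions of this sketch, not part of the text), a minimal logging database might be kept as a simple table and summarized with a scripting tool:

import pandas as pd

# Hypothetical tracking table: one row per expected return, updated as data arrive.
log = pd.DataFrame(
    [
        {"case_id": 101, "instrument": "mail survey", "received": True},
        {"case_id": 102, "instrument": "mail survey", "received": False},
        {"case_id": 101, "instrument": "posttest", "received": True},
    ]
)

# A simple descriptive report: how much of each instrument is in versus expected.
print(log.groupby("instrument")["received"].agg(["sum", "count"]))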

Checking the Data For Accuracy

As soon as data is received you should screen it for accuracy. In some circumstances doing this right away will allow
you to go back to the sample to clarify any problems or errors. There are several questions you should ask as part of
this initial data screening:

Are the responses legible/readable?
Are all important questions answered?
Are the responses complete?
Is all relevant contextual information included (e.g., date, time, place, researcher)?

In most social research, quality of measurement is a major issue. Assuring that the data collection process does not
contribute inaccuracies will help assure the overall quality of subsequent analyses.

Developing a Database Structure

The database structure is the manner in which you intend to store the data for the study so that it can be accessed
in subsequent data analyses. You might use the same structure you used for logging in the data or, in large complex
studies, you might have one structure for logging data and another for storing it. As mentioned above, there are
generally two options for storing data on computer -- database programs and statistical programs. Usually database
programs are the more complex of the two to learn and operate, but they allow the analyst greater flexibility in
manipulating the data.

In every research project, you should generate a printed codebook that describes the data and indicates where and
how it can be accessed. Minimally the codebook should include the following items for each variable:

variable name
variable description
variable format (number, date, text)
instrument/method of collection
date collected
respondent or group
variable location (in database)
notes

The codebook is an indispensable tool for the analysis team. Together with the database, it should provide
comprehensive documentation that enables other researchers who might subsequently want to analyze the data to
do so without any additional information.
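As a sketch (all of the field values here are hypothetical, invented only to show the shape of an entry), a single codebook entry might look like this if kept in a simple structured format:

# One hypothetical codebook entry, stored here as a Python dictionary for illustration.
codebook_entry = {
    "variable_name": "selfest1",
    "variable_description": "Self-esteem item 1: 'I generally feel good about myself.'",
    "variable_format": "number (1-5)",
    "instrument": "mail survey",
    "date_collected": "2004-10-15",
    "respondent_or_group": "program participants",
    "variable_location": "column 12 of the survey data file",
    "notes": "Reverse-scored during data transformation.",
}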

Entering the Data into the Computer

There are a wide variety of ways to enter the data into the computer for analysis. Probably the easiest is to just type
the data in directly. In order to assure a high level of data accuracy, the analyst should use a procedure called
double entry. In this procedure you enter the data once. Then, you use a special program that allows you to enter
the data a second time and checks each second entry against the first. If there is a discrepancy, the program notifies
the user and allows the user to determine the correct entry. This double entry procedure significantly reduces entry
errors. However, these double entry programs are not widely available and require some training. An alternative is to
enter the data once and set up a procedure for checking the data for accuracy. For instance, you might spot check
records on a random basis. Once the data have been entered, you will use various programs to summarize the data
that allow you to check that all the data are within acceptable limits and boundaries. For instance, such summaries
will enable you to easily spot whether there are persons whose age is 601 or who have a 7 entered where you
expect a 1-to-5 response.
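A minimal sketch of both checks is shown below. The file names and column names ("id", "q1", "age") are hypothetical, and the ranges are examples only; the point is simply that the second entry pass is compared field by field against the first, and that summaries can flag out-of-range values.

import csv

def load(path):
    # Read one entry pass into a dictionary keyed by the record id.
    with open(path, newline="") as f:
        return {row["id"]: row for row in csv.DictReader(f)}

first, second = load("entry_pass1.csv"), load("entry_pass2.csv")

# Double entry check: flag any field where the two passes disagree.
for rec_id, row in first.items():
    for field, value in row.items():
        if second.get(rec_id, {}).get(field) != value:
            print(f"Discrepancy in record {rec_id}, field {field}")

# Simple range checks, e.g., a 1-to-5 response item and an age field.
for rec_id, row in first.items():
    if not 1 <= int(row["q1"]) <= 5:
        print(f"Out-of-range q1 value in record {rec_id}: {row['q1']}")
    if not 0 <= int(row["age"]) <= 110:
        print(f"Implausible age in record {rec_id}: {row['age']}")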

Data Transformations

Once the data have been entered it is almost always necessary to transform the raw data into variables that are
usable in the analyses. There are a wide variety of transformations that you might perform. Some of the more
common are:
missing values

Many analysis programs automatically treat blank values as missing. In others, you need to
designate specific values to represent missing values. For instance, you might use a value of -99 to
indicate that the item is missing. You need to check the specific program you are using to determine
how to handle missing values.

item reversals

On scales and surveys, we sometimes use reversal items to help reduce the possibility of a
response set. When you analyze the data, you want all scores for scale items to be in the same
direction where high scores mean the same thing and low scores mean the same thing. In these
cases, you have to reverse the ratings for some of the scale items. For instance, let's say you had a
five point response scale for a self esteem measure where 1 meant strongly disagree and 5 meant
strongly agree. One item is "I generally feel good about myself." If the respondent strongly agrees
with this item they will put a 5 and this value would be indicative of higher self esteem. Alternatively,
consider an item like "Sometimes I feel like I'm not worth much as a person." Here, if a respondent
strongly agrees by rating this a 5 it would indicate low self esteem. To compare these two items, we
would reverse the scores of one of them (probably we'd reverse the latter item so that high values
will always indicate higher self esteem). We want a transformation where if the original value was 1
it's changed to 5, 2 is changed to 4, 3 remains the same, 4 is changed to 2 and 5 is changed to 1.
While you could program these changes as separate statements in most programs, it's easier to do
this with a simple formula like:

New Value = (High Value + 1) - Original Value

In our example, the High Value for the scale is 5, so to get the new (transformed) scale value, we
simply subtract each Original Value from 6 (i.e., 5 + 1).
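As a minimal sketch (the variable names are made up for illustration), the reversal formula can be applied directly in a script:

# Reverse-score an item on a 1-to-5 scale: 1 -> 5, 2 -> 4, 3 stays 3, 4 -> 2, 5 -> 1.
HIGH_VALUE = 5

def reverse_item(original_value):
    return (HIGH_VALUE + 1) - original_value

ratings = [1, 2, 3, 4, 5]
print([reverse_item(r) for r in ratings])  # prints [5, 4, 3, 2, 1]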

scale totals

Once you've transformed any individual scale items you will often want to add or average across
individual items to get a total score for the scale.

categories

For many variables you will want to collapse them into categories. For instance, you may want to
collapse income estimates (in dollar amounts) into income ranges.


Descriptive statistics are used to describe the basic features of the data in a study. They provide simple
summaries about the sample and the measures. Together with simple graphics analysis, they form the basis
of virtually every quantitative analysis of data.

Descriptive statistics are typically distinguished from inferential statistics. With descriptive statistics you are
simply describing what is or what the data shows. With inferential statistics, you are trying to reach
conclusions that extend beyond the immediate data alone. For instance, we use inferential statistics to try to
infer from the sample data what the population might think. Or, we use inferential statistics to make
judgments of the probability that an observed difference between groups is a dependable one or one that
might have happened by chance in this study. Thus, we use inferential statistics to make inferences from our
data to more general conditions; we use descriptive statistics simply to describe what's going on in our data.

Descriptive Statistics are used to present quantitative descriptions in a manageable form. In a research study
we may have lots of measures. Or we may measure a large number of people on any measure. Descriptive
statistics help us to simplify large amounts of data in a sensible way. Each descriptive statistic reduces lots of
data into a simpler summary. For instance, consider a simple number used to summarize how well a batter is
performing in baseball, the batting average. This single number is simply the number of hits divided by the
number of times at bat (reported to three significant digits). A batter who is hitting .333 is getting a hit one
time in every three at bats. One batting .250 is hitting one time in four. The single number describes a large
number of discrete events. Or, consider the scourge of many students, the Grade Point Average (GPA). This
single number describes the general performance of a student across a potentially wide range of course
experiences.

Every time you try to describe a large set of observations with a single indicator you run the risk of distorting
the original data or losing important detail. The batting average doesn't tell you whether the batter is hitting
home runs or singles. It doesn't tell whether she's been in a slump or on a streak. The GPA doesn't tell you
whether the student was in difficult courses or easy ones, or whether they were courses in their major field or
in other disciplines. Even given these limitations, descriptive statistics provide a powerful summary that may
enable comparisons across people or other units.

Univariate Analysis

Univariate analysis involves the examination across cases of one variable at a time. There are three major
characteristics of a single variable that we tend to look at:

the distribution
the central tendency
the dispersion

In most situations, we would describe all three of these characteristics for each of the variables in our study.
The Distribution. The distribution is a summary of the frequency of individual values or ranges of values for
a variable. The simplest distribution would list every value of a variable and the number of persons who had
each value. For instance, a typical way to describe the distribution of college students is by year in college,
listing the number or percent of students at each of the four years. Or, we describe gender by listing the
number or percent of males and females. In these cases, the variable has few enough values that we can list
each one and summarize how many sample cases had the value. But what do we do for a variable like
income or GPA? With these variables there can be a large number of possible values, with relatively few
people having each one. In this case, we group the raw scores into categories according to ranges of values.
For instance, we might look at GPA according to the letter grade ranges. Or, we might group income into four
or five ranges of income values.

Table 1. Frequency distribution table.

One of the most common ways to describe a single variable is with a frequency distribution. Depending on
the particular variable, all of the data values may be represented, or you may group the values into
categories first (e.g., with age, price, or temperature variables, it would usually not be sensible to determine
the frequencies for each value. Rather, the values are grouped into ranges and the frequencies determined).
Frequency distributions can be depicted in two ways, as a table or as a graph. Table 1 shows an age
frequency distribution with five categories of age ranges defined. The same frequency distribution can be
depicted in a graph as shown in Figure 2. This type of graph is often referred to as a histogram or bar chart.

Figure 2. Frequency distribution bar chart.


Distributions may also be displayed using percentages. For example, you could use percentages to describe
the:

percentage of people in different income levels
percentage of people in different age ranges
percentage of people in different ranges of standardized test scores

Central Tendency. The central tendency of a distribution is an estimate of the "center" of a distribution of
values. There are three major types of estimates of central tendency:

Mean
Median
Mode

The Mean or average is probably the most commonly used method of describing central tendency. To
compute the mean all you do is add up all the values and divide by the number of values. For example, the
mean or average quiz score is determined by summing all the scores and dividing by the number of students
taking the exam. For example, consider the test score values:

15, 20, 21, 20, 36, 15, 25, 15

The sum of these 8 values is 167, so the mean is 167/8 = 20.875.

The Median is the score found at the exact middle of the set of values. One way to compute the median is to
list all scores in numerical order, and then locate the score in the center of the sample. For example, if there
are 500 scores in the list, score #250 would be the median. If we order the 8 scores shown above, we would
get:

15,15,15,20,20,21,25,36

There are 8 scores and score #4 and #5 represent the halfway point. Since both of these scores are 20, the
median is 20. If the two middle scores had different values, you would have to interpolate to determine the
median.

The mode is the most frequently occurring value in the set of scores. To determine the mode, you might
again order the scores as shown above, and then count each one. The most frequently occurring value is the
mode. In our example, the value 15 occurs three times and is the mode. In some distributions there is more
than one modal value. For instance, in a bimodal distribution there are two values that occur most frequently.

Notice that for the same set of 8 scores we got three different values -- 20.875, 20, and 15 -- for the mean,
median and mode respectively. If the distribution is truly normal (i.e., bell-shaped), the mean, median and
mode are all equal to each other.

Dispersion. Dispersion refers to the spread of the values around the central tendency. There are two
common measures of dispersion, the range and the standard deviation. The range is simply the highest
value minus the lowest value. In our example distribution, the high value is 36 and the low is 15, so the range
is 36 - 15 = 21.

The Standard Deviation is a more accurate and detailed estimate of dispersion because an outlier can
greatly exaggerate the range (as was true in this example where the single outlier value of 36 stands apart
from the rest of the values). The Standard Deviation shows the relation that a set of scores has to the mean of
the sample. Again, let's take the set of scores:

15,20,21,20,36,15,25,15

To compute the standard deviation, we first find the distance between each value and the mean. We know
from above that the mean is 20.875. So, the differences from the mean are:

15 - 20.875 = -5.875
20 - 20.875 = -0.875
21 - 20.875 = +0.125
20 - 20.875 = -0.875
36 - 20.875 = 15.125
15 - 20.875 = -5.875
25 - 20.875 = +4.125
15 - 20.875 = -5.875

Notice that values that are below the mean have negative discrepancies and values above it have positive
ones. Next, we square each discrepancy:

-5.875 * -5.875 = 34.515625
-0.875 * -0.875 = 0.765625
+0.125 * +0.125 = 0.015625
-0.875 * -0.875 = 0.765625
15.125 * 15.125 = 228.765625
-5.875 * -5.875 = 34.515625
+4.125 * +4.125 = 17.015625
-5.875 * -5.875 = 34.515625

Now, we take these "squares" and sum them to get the Sum of Squares (SS) value. Here, the sum is
350.875. Next, we divide this sum by the number of scores minus 1. Here, the result is 350.875 / 7 = 50.125.
This value is known as the variance. To get the standard deviation, we take the square root of the variance
(remember that we squared the deviations earlier). This would be SQRT(50.125) = 7.079901129253.

Although this computation may seem convoluted, it's actually quite simple. To see this, consider the formula
for the standard deviation:
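In symbols (a reconstruction in LaTeX notation, with each score written as x_i, the mean as x-bar, and the number of scores as n), the formula is:

\[
s = \sqrt{\frac{\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^{2}}{n - 1}}
\]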
In the top part of the ratio, the numerator, we see that each score has the mean subtracted from it, the
difference is squared, and the squares are summed. In the bottom part, we take the number of scores minus
1. The ratio is the variance and the square root is the standard deviation. In English, we can describe the
standard deviation as:

the square root of the sum of the squared deviations from the mean divided by the number of scores
minus one

Although we can calculate these univariate statistics by hand, it gets quite tedious when you have more than
a few values and variables. Every statistics program is capable of calculating them easily for you. For
instance, I put the eight scores into SPSS and got the following table as a result:

N 8
Mean 20.8750
Median 20.0000
Mode 15.00
Std. Deviation 7.0799
Variance 50.1250
Range 21.00

which confirms the calculations I did by hand above.
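The same summary can be reproduced with almost any scripting tool. For instance, a minimal sketch using Python's built-in statistics module (shown here purely as an illustration, not as the package used in the text) is:

import statistics

scores = [15, 20, 21, 20, 36, 15, 25, 15]

print(statistics.mean(scores))      # 20.875
print(statistics.median(scores))    # 20.0
print(statistics.mode(scores))      # 15
print(statistics.stdev(scores))     # 7.0799... (sample standard deviation, n - 1 in the denominator)
print(statistics.variance(scores))  # 50.125
print(max(scores) - min(scores))    # 21 (the range)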

The standard deviation allows us to reach some conclusions about specific scores in our distribution.
Assuming that the distribution of scores is normal or bell-shaped (or close to it!), the following conclusions
can be reached:
approximately 68% of the scores in the sample fall within one standard deviation of the mean
approximately 95% of the scores in the sample fall within two standard deviations of the mean
approximately 99% of the scores in the sample fall within three standard deviations of the mean

For instance, since the mean in our example is 20.875 and the standard deviation is 7.0799, we can from the
above statement estimate that approximately 95% of the scores will fall in the range of 20.875-(2*7.0799) to
20.875+(2*7.0799) or between 6.7152 and 35.0348. This kind of information is a critical stepping stone to
enabling us to compare the performance of an individual on one variable with their performance on another,
even when the variables are measured on entirely different scales.


With inferential statistics, you are trying to reach conclusions that extend beyond the immediate data alone. For instance, we use inferential statistics to try to infer from the sample data what the population might think. Or, we use inferential statistics to make judgments of the probability that an observed difference between groups is a dependable one or one that might have happened by chance in this study. Thus, we use inferential statistics to make inferences from our data to more general conditions; we use descriptive statistics simply to describe what's going on in our data.

Here, I concentrate on inferential statistics that are useful in experimental and quasi-experimental research design or in program outcome evaluation. Perhaps one of the simplest inferential tests is used when you want to compare the average performance of two groups on a single measure to see if there is a difference. You might want to know whether eighth-grade boys and girls differ in math test scores or whether a program group differs on the outcome measure from a control group. Whenever you wish to compare the average performance between two groups you should consider the t-test for differences between groups.

Most of the major inferential statistics come from a general family of statistical models
known as the General Linear Model. This includes the t-test, Analysis of Variance
(ANOVA), Analysis of Covariance (ANCOVA), regression analysis, and many of the
multivariate methods like factor analysis, multidimensional scaling, cluster analysis,
discriminant function analysis, and so on. Given the importance of the General Linear
Model, it's a good idea for any serious social researcher to become familiar with its
workings. The discussion of the General Linear Model here is very elementary and
only considers the simplest straight-line model. However, it will get you familiar with
the idea of the linear model and help prepare you for the more complex analyses
described below.

One of the keys to understanding how groups are compared is embodied in the notion
of the "dummy" variable. The name doesn't suggest that we are using variables that
aren't very smart or, even worse, that the analyst who uses them is a "dummy"!
Perhaps these variables would be better described as "proxy" variables. Essentially a
dummy variable is one that uses discrete numbers, usually 0 and 1, to represent
different groups in your study. Dummy variables are a simple idea that enable some
pretty complicated things to happen. For instance, by including a simple dummy
variable in a model, I can model two separate lines (one for each treatment group)
with a single equation. To see how this works, check out the discussion on dummy
variables.

One of the most important analyses in program outcome evaluations involves comparing the program and non-program group on the outcome variable or variables. How we do this depends on the research design we use. Research designs are divided into two major types: experimental and quasi-experimental. Because the analyses differ for each, they are presented separately.
Experimental Analysis. The simple two-group posttest-only randomized experiment
is usually analyzed with the simple t-test or one-way ANOVA. The factorial
experimental designs are usually analyzed with the Analysis of Variance (ANOVA)
Model. Randomized Block Designs use a special form of ANOVA blocking model that
uses dummy-coded variables to represent the blocks. The Analysis of Covariance
Experimental Design uses, not surprisingly, the Analysis of Covariance statistical
model.

Quasi-Experimental Analysis. The quasi-experimental designs differ from the experimental ones in that they don't use random assignment to assign units (e.g.,
people) to program groups. The lack of random assignment in these designs tends to
complicate their analysis considerably. For example, to analyze the Nonequivalent
Groups Design (NEGD) we have to adjust the pretest scores for measurement error in
what is often called a Reliability-Corrected Analysis of Covariance model. In the
Regression-Discontinuity Design, we need to be especially concerned about
curvilinearity and model misspecification. Consequently, we tend to use a
conservative analysis approach that is based on polynomial regression that starts by
overfitting the likely true function and then reducing the model based on the results.
The Regression Point Displacement Design has only a single treated unit.
Nevertheless, the analysis of the RPD design is based directly on the traditional
ANCOVA model.

When you've investigated these various analytic models, you'll see that they all come
from the same family -- the General Linear Model. An understanding of that model will
go a long way to introducing you to the intricacies of data analysis in applied and
social research contexts.


The t-test assesses whether the means of two groups are statistically different from each other. This analysis is appropriate whenever you want to compare the means of two groups, and especially appropriate as the
analysis for the posttest-only two-group randomized experimental design.

Figure 1. Idealized distributions for treated and comparison group posttest values.

Figure 1 shows the distributions for the treated (blue) and control (green) groups in a study. Actually, the figure shows the idealized distribution -- the actual distribution would usually be depicted with a histogram or bar
graph. The figure indicates where the control and treatment group means are located. The question the t-test addresses is whether the means are statistically different.

What does it mean to say that the averages for two groups are statistically different? Consider the three situations shown in Figure 2. The first thing to notice about the three situations is that the difference between the
means is the same in all three. But, you should also notice that the three situations don't look the same -- they tell very different stories. The top example shows a case with moderate variability of scores within each
group. The second situation shows the high variability case. The third shows the case with low variability. Clearly, we would conclude that the two groups appear most different or distinct in the bottom or low-variability
case. Why? Because there is relatively little overlap between the two bell-shaped curves. In the high variability case, the group difference appears least striking because the two bell-shaped distributions overlap so much.
Figure 2. Three scenarios for differences between means.

This leads us to a very important conclusion: when we are looking at the differences between scores for two groups, we have to judge the difference between their means relative to the spread or variability of their
scores. The t-test does just this.

Statistical Analysis of the t-test

The formula for the t-test is a ratio. The top part of the ratio is just the difference between the two means or averages. The bottom part is a measure of the variability or dispersion of the scores. This formula is essentially
another example of the signal-to-noise metaphor in research: the difference between the means is the signal that, in this case, we think our program or treatment introduced into the data; the bottom part of the formula is
a measure of variability that is essentially noise that may make it harder to see the group difference. Figure 3 shows the formula for the t-test and how the numerator and denominator are related to the distributions.

Figure 3. Formula for the t-test.

The top part of the formula is easy to compute -- just find the difference between the means. The bottom part is called the standard error of the difference. To compute it, we take the variance for each group and divide
it by the number of people in that group. We add these two values and then take their square root. The specific formula is given in Figure 4:
Figure 4. Formula for the Standard error of the difference between the means.

Remember that the variance is simply the square of the standard deviation.

The final formula for the t-test is shown in Figure 5:

Figure 5. Formula for the t-test.
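In symbols (a reconstruction in LaTeX notation of the formulas shown in Figures 4 and 5, with subscripts T and C denoting the treatment and control groups, var each group's variance, and n each group's sample size):

\[
SE(\bar{x}_T - \bar{x}_C) = \sqrt{\frac{\mathrm{var}_T}{n_T} + \frac{\mathrm{var}_C}{n_C}}
\qquad\qquad
t = \frac{\bar{x}_T - \bar{x}_C}{SE(\bar{x}_T - \bar{x}_C)}
\]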

The t-value will be positive if the first mean is larger than the second and negative if it is smaller. Once you compute the t-value you have to look it up in a table of significance to test whether the ratio is large enough to
say that the difference between the groups is not likely to have been a chance finding. To test the significance, you need to set a risk level (called the alpha level). In most social research, the "rule of thumb" is to set the
alpha level at .05. This means that five times out of a hundred you would find a statistically significant difference between the means even if there was none (i.e., by "chance"). You also need to determine the degrees of
freedom (df) for the test. In the t-test, the degrees of freedom is the sum of the persons in both groups minus 2. Given the alpha level, the df, and the t-value, you can look the t-value up in a standard table of significance
(available as an appendix in the back of most statistics texts) to determine whether the t-value is large enough to be significant. If it is, you can conclude that the means for the two groups are
different (even given the variability). Fortunately, statistical computer programs routinely print the significance test results and save you the trouble of looking them up in a table.

The t-test, one-way Analysis of Variance (ANOVA) and a form of regression analysis are mathematically equivalent (see the statistical analysis of the posttest-only randomized experimental design) and would yield
identical results.
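As a minimal sketch (the score values here are invented for illustration), statistical packages do all of this in one call; in Python's SciPy library, for example:

from scipy import stats

treated = [24, 25, 28, 30, 27, 26]   # hypothetical posttest scores, program group
control = [20, 22, 19, 24, 21, 23]   # hypothetical posttest scores, control group

result = stats.ttest_ind(treated, control)   # independent-groups t-test
print(result.statistic, result.pvalue)       # the t-value and its probability value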


A dummy variable is a numerical variable used in regression analysis to represent subgroups of the sample in your study. In research design, a dummy variable is often used to distinguish different treatment groups. In
the simplest case, we would use a 0,1 dummy variable where a person is given a value of 0 if they are in the control group or a 1 if they are in the treated group. Dummy variables are useful because they enable us to
use a single regression equation to represent multiple groups. This means that we don't need to write out separate equation models for each subgroup. The dummy variables act like 'switches' that turn various
parameters on and off in an equation. Another advantage of a 0,1 dummy-coded variable is that even though it is a nominal-level variable you can treat it statistically like an interval-level variable (if this made no sense to
you, you probably should refresh your memory on levels of measurement). For instance, if you take an average of a 0,1 variable, the result is the proportion of 1s in the distribution.

To illustrate dummy variables, consider the simple regression model for a posttest-only two-group randomized experiment. This model is essentially the same as conducting a t-test on the posttest means for two groups
or conducting a one-way Analysis of Variance (ANOVA). The key term in the model is b1, the estimate of the difference between the groups. To see how dummy variables work, we'll use this simple model to show you
how to use them to pull out the separate sub-equations for each subgroup. Then we'll show how you estimate the difference between the subgroups by subtracting their respective equations. You'll see that we can pack
an enormous amount of information into a single equation using dummy variables. All I want to show you here is that b1 is the difference between the treatment and control groups.

To see this, the first step is to compute what the equation would be for each of our two groups separately. For the control group, Z = 0. When we substitute that into the equation, and recognize that by assumption the
error term averages to 0, we find that the predicted value for the control group is b0, the intercept. Now, to figure out the treatment group line, we substitute the value of 1 for Z, again recognizing that by assumption the
error term averages to 0. The equation for the treatment group indicates that the treatment group value is the sum of the two beta values.
Now, we're ready to move on to the second step -- computing the difference between the groups. How do we determine that? Well, the difference must be the difference between the equations for the two groups that we
worked out above. In other words, to find the difference between the groups we just find the difference between the equations for the two groups! It should be obvious from the figure that the difference is b1. Think about
what this means. The difference between the groups is b1. OK, one more time just for the sheer heck of it. The difference between the groups in this model is b1!
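In symbols, a sketch of the two steps looks like this (using the simple model with a single 0,1 dummy variable Z and assuming the error term averages to zero):

\[
y_i = b_0 + b_1 Z_i + e_i
\]
\[
\text{control group } (Z = 0):\; \hat{y} = b_0
\qquad
\text{treatment group } (Z = 1):\; \hat{y} = b_0 + b_1
\]
\[
\text{difference} = (b_0 + b_1) - b_0 = b_1
\]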

Whenever you have a regression model with dummy variables, you can always see how the variables are being used to represent multiple subgroup equations by following the two steps described above:

create separate equations for each subgroup by substituting the dummy values
find the difference between groups by finding the difference between their equations


The General Linear Model (GLM) underlies most of the statistical analyses that are used in applied and social research. It is the foundation for the t-test, Analysis of Variance (ANOVA), Analysis of Covariance
(ANCOVA), regression analysis, and many of the multivariate methods including factor analysis, cluster analysis, multidimensional scaling, discriminant function analysis, canonical correlation, and others. Because of its
generality, the model is important for students of social research. Although a deep understanding of the GLM requires some advanced statistics training, I will attempt here to introduce the concept and provide a non-
statistical description.

The Two-Variable Linear Model

The easiest point of entry into understanding the GLM is with the two-variable case. Figure 1 shows a bivariate plot of two variables. These may be any two
continuous variables but, in the discussion that follows we will think of them as a pretest (on the x-axis) and a posttest (on the y-axis). Each dot on the plot represents
the pretest and posttest score for an individual. The pattern clearly shows a positive relationship because, in general, people with higher pretest scores also have
higher posttests, and vice versa.

The goal in our data analysis is to summarize or describe accurately what is happening in the data. The bivariate
plot shows the data. How might we best summarize these data? Figure 2 shows that a straight line through the
"cloud" of data points would effectively describe the pattern in the bivariate plot. Although the line does not
perfectly describe any specific point (because no point falls precisely on the line), it does accurately describe the
pattern in the data. When we fit a line to data, we are using what we call a linear model. The term "linear" refers
to the fact that we are fitting a line. The term model refers to the equation that summarizes the line that we fit. A
line like the one shown in Figure 2 is often referred to as a regression line and the analysis that produces it is
often called regression analysis.

Figure 1. Bivariate plot.

Figure 2. A straight-line summary of the data.

Figure 3 shows the equation for a straight line. You may remember this equation from your high school algebra classes where it is often stated in the form y = mx + b. In this equation, the components are:

y = the y-axis variable, the outcome or posttest
x = the x-axis variable, the pretest
b0 = the intercept (value of y when x=0)
b1 = the slope of the line

Figure 3. The straight-line model.

The slope of the line is the change in the posttest given in pretest units. As mentioned above, this equation does not perfectly fit the cloud of points in Figure 1. If it did, every point would fall on the line. We need one more component to describe the way this line is fit to the bivariate plot.

Figure 4 shows the equation for the two variable or bivariate linear model. The component that we have added to the equation in Figure 3 is an error term, e, that describes the vertical distance from the straight line to each point. This term is called "error" because it is the degree to which the line is in error in describing each point. When we fit the two-variable linear model to our data, we have an x and y score for each person in our study. We input these value pairs into a computer program. The program estimates the b0 and b1 values for us as indicated in Figure 5. We will actually get two numbers back that are estimates of those two values.

Figure 4. The two-variable linear model.

Figure 5. What the model estimates.

You can think of the two-variable regression line like any other descriptive statistic -- it is simply describing the relationship between two variables much as a mean describes the central tendency of a single variable. And, just as the mean does not accurately represent every value in a distribution, the regression line does not accurately represent every value in the bivariate distribution. We use these summaries because they show the general patterns in our data and allow us to describe these patterns in more concise ways than showing the entire distribution allows.

The General Linear Model

Given this brief introduction to the two-variable case, we are able to extend the model to its most general case. Essentially the GLM looks the same as the two variable model shown in Figure 4 -- it is just an equation. But the big difference is that each of the four terms in the GLM can represent a set of variables, not just a single one. So, the general linear model can be written:

y = b0 + bx + e

where:

y = a set of outcome variables
x = a set of pre-program variables or covariates
b0 = the set of intercepts (value of each y when each x=0)
b = a set of coefficients, one each for each x

You should be able to see that this model allows us to include an enormous amount of information. In an experimental or quasi-experimental study, we would represent the program or treatment with one or more dummy
coded variables, each represented in the equation as an additional x-value (although we usually use the symbol z to indicate that the variable is a dummy-coded x). If our study has multiple outcome variables, we can
include them as a set of y-values. If we have multiple pretests, we can include them as a set of x-values. For each x-value (and each z-value) we estimate a b-value that represents an x,y relationship. The estimates of
these b-values, and the statistical testing of these estimates, is what enables us to test specific research hypotheses about relationships between variables or differences between groups.
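For instance, a sketch of one common special case -- a single pretest covariate x and one dummy-coded treatment variable z -- would be written:

\[
y_i = b_0 + b_1 x_i + b_2 z_i + e_i
\]

Here b2 carries the program-versus-comparison difference while b1 carries the pretest-posttest relationship.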

The GLM allows us to summarize a wide variety of research outcomes. The major problem for the researcher who uses the GLM is model specification. The researcher is responsible for specifying the exact equation
that best summarizes the data for a study. If the model is misspecified, the estimates of the coefficients (the b-values) are likely to be biased (i.e., wrong) and the resulting equation will not describe the data accurately. In
complex situations, this model specification problem can be a serious and difficult one (see, for example, the discussion of model specification in the statistical analysis of the regression-discontinuity design).

The GLM is one of the most important tools in the statistical analysis of data. It represents a major achievement in the advancement of social research in the twentieth century.


To analyze the two-group posttest-only randomized experimental design we need an analysis that meets the following requirements:

has two groups
uses a post-only measure
has two distributions (measures), each with an average and variation
assesses the treatment effect as a statistical (i.e., non-chance) difference between the groups

Before we can proceed to the analysis itself, it is useful to understand what is meant by the term "difference" as in "Is there a difference between the groups?" Each group can be represented by a "bell-shaped" curve
that describes the group's distribution on a single variable. You can think of the bell curve as a smoothed histogram or bar graph describing the frequency of each possible measurement response. In the figure, we show
distributions for both the treatment and control group. The mean values for each group are indicated with dashed lines. The difference between the means is simply the horizontal difference between where the control
and treatment group means hit the horizontal axis.
Now, let's look at three different possible outcomes, labeled medium, high and low variability. Notice that the difference between the means in all three situations is exactly the same. The only thing that differs between
these is the variability or "spread" of the scores around the means. In which of the three cases would it be easiest to conclude that the means of the two groups are different? If you answered the low variability case, you
are correct! Why is it easiest to conclude that the groups differ in that case? Because that is the situation with the least amount of overlap between the bell-shaped curves for the two groups. If you look at the high
variability case, you should see that there are quite a few control group cases that score in the range of the treatment group and vice versa. Why is this so important? Because, if you want to see if two groups are "different"
it's not good enough just to subtract one mean from the other -- you have to take into account the variability around the means! A small difference between means will be hard to detect if there is lots of variability or noise.
A large difference between means will be easily detectable if variability is low. This way of looking at differences between groups is directly related to the signal-to-noise metaphor -- differences are more apparent
when the signal is high and the noise is low.

With that in mind, we can now examine how we estimate the differences between groups, often called the "effect" size. The top part of the ratio is the actual difference between means. The bottom part is an estimate of
the variability around the means. In this context, we would calculate what is known as the standard error of the difference between the means. This standard error incorporates information about the standard deviation
(variability) that is in each of the two groups. The ratio that we compute is called a t-value and describes the difference between the groups relative to the variability of the scores in the groups.

There are actually three different ways to estimate the treatment effect for the posttest-only randomized experiment. All three yield mathematically equivalent results, a fancy way of saying that they give you the exact
same answer. So why are there three different ones? In large part, these three approaches evolved independently and, only after that, was it clear that they are essentially three ways to do the same thing. So, what are
the three ways? First, we can compute an independent t-test as described above. Second, we could compute a one-way Analysis of Variance (ANOVA) between two independent groups. Finally, we can use
regression analysis to regress the posttest values onto a dummy-coded treatment variable. Of these three, the regression analysis approach is the most general. In fact, you'll find that I describe the statistical models
for all the experimental and quasi-experimental designs in regression model terms. You just need to be aware that the results from all three methods are identical.

OK, so here's the statistical model in notational form. You may not realize it, but essentially this formula is just the equation for a
straight line with a random error term thrown in (ei). Remember high school algebra? Remember high school? OK, for those of you
with faulty memories, you may recall that the equation for a straight line is often given as:

y = mx + b

which, when rearranged can be written as:

y = b + mx

(The complexities of the commutative property make you nervous? If this gets too tricky you may need to stop for a break. Have
something to eat, make some coffee, or take the poor dog out for a walk.). Now, you should see that in the statistical model yi is the
same as y in the straight line formula, b0 is the same as b, b1 is the same as m, and Zi is the same as x. In other words, in the
statistical formula, b0 is the intercept and b1 is the slope.
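Putting that mapping together, the statistical model being described can be written (a reconstruction in LaTeX notation) as:

\[
y_i = b_0 + b_1 Z_i + e_i
\]

where y_i is the posttest value for person i, Z_i is the 0,1 dummy variable for group membership, and e_i is the random error term.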

It is critical that you understand that the slope, b1, is the same thing as the posttest difference between the means for the two groups. How can a slope
be a difference between means? To see this, you have to take a look at a graph of what's going on. In the
graph, we show the posttest on the vertical axis. This is exactly the same as the two bell-shaped curves shown
in the graphs above except that here they're turned on their side. On the horizontal axis we plot the Z variable.
This variable only has two values, a 0 if the person is in the control group or a 1 if the person is in the program
group. We call this kind of variable a "dummy" variable because it is a "stand in" variable that represents the
program or treatment conditions with its two values (note that the term "dummy" is not meant to be a slur
against anyone, especially the people participating in your study). The two points in the graph indicate the
average posttest value for the control (Z=0) and treated (Z=1) cases. The line that connects the two dots is only
included for visual enhancement purposes -- since there are no Z values between 0 and 1 there can be no
values plotted where the line is. Nevertheless, we can meaningfully speak about the slope of this line, the line
that would connect the posttest means for the two values of Z. Do you remember the definition of slope? (Here
we go again, back to high school!). The slope is the change in y over the change in x (or, in this case, Z). But we
know that the "change in Z" between the groups is always equal to 1 (i.e., 1 - 0 = 1). So, the slope of the line
must be equal to the difference between the average y-values for the two groups. That's what I set out to show
(reread the first sentence of this paragraph). b1 is the same value that you would get if you just subtract the two
means from each other (in this case, because we set the treatment group equal to 1, this means we are
subtracting the control group out of the treatment group value. A positive value implies that the treatment group
mean is higher than the control, a negative means it's lower). But remember at the very beginning of this
discussion I pointed out that just knowing the difference between the means was not good enough for
estimating the treatment effect because it doesn't take into account the variability or spread of the scores. So
how do we do that here? Every regression analysis program will give, in addition to the beta values, a report on
whether each beta value is statistically significant. They report a t-value that tests whether the beta value differs
from zero. It turns out that the t-value for the b1 coefficient is the exact same number that you would get if you
did a t-test for independent groups. And, it's the same as the square root of the F value in the two group one-way ANOVA (because t² = F).
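A quick numerical sketch makes the point (the data here are simulated, and the use of NumPy is just one convenient way to fit the line, not the method prescribed in the text):

import numpy as np

rng = np.random.default_rng(0)
z = np.repeat([0, 1], 50)                          # dummy variable: 0 = control, 1 = program
y = 50 + 10 * z + rng.normal(0, 5, size=100)       # posttest scores with a built-in 10-point effect

slope, intercept = np.polyfit(z, y, 1)             # fit the straight line y = b0 + b1*z
mean_difference = y[z == 1].mean() - y[z == 0].mean()

print(slope, mean_difference)                      # the two numbers are identical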

Here's a few conclusions from all this:

the t-test, one-way ANOVA and regression analysis all yield the same results in this case
the regression analysis method utilizes a dummy variable (Z) for treatment
regression analysis is the most general model of the three.


Here is the regression model statement for a simple 2 x 2 Factorial Design. In this design, we have one factor
for time in instruction (1 hour/week versus 4 hours/week) and one factor for setting (in-class or pull-out). The
model uses a dummy variable (represented by a Z) for each factor. In two-way factorial designs like this, we
have two main effects and one interaction. In this model, the main effects are the statistics associated with the
beta values that are adjacent to the Z-variables. The interaction effect is the statistic associated with b3 (i.e., the
t-value for this coefficient) because it is adjacent in the formula to the multiplication of (i.e., interaction of) the
dummy-coded Z variables for the two factors. Because there are two dummy-coded variables, each having two
values, you can write out 2 x 2 = 4 separate equations from this one general model. You might want to see if
you can write out the equations for the four cells. Then, look at some of the differences between the groups.
You can also write out two equations for each Z variable. These equations represent the main effect equations.
To see the difference between levels of a factor, subtract the equations from each other. If you're confused
about how to manipulate these equations, check the section on how dummy variables work.
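In symbols, the 2 x 2 factorial model described here (a reconstruction in LaTeX notation, with Z1 standing for the time-in-instruction factor and Z2 for the setting factor) is:

\[
y_i = b_0 + b_1 Z_{1i} + b_2 Z_{2i} + b_3 Z_{1i} Z_{2i} + e_i
\]

The coefficients b1 and b2 carry the two main effects and b3, attached to the product of the two dummy variables, carries the interaction.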


I've decided to present the statistical model for the Randomized Block Design in regression analysis notation. Here is the model for a case where there are four blocks or homogeneous subgroups.

Notice that we use a number of dummy variables in specifying this model. We use the dummy variable Z1 to represent the treatment group. We use the dummy variables Z2, Z3 and Z4 to indicate blocks 2, 3 and 4
respectively. Analogously, the beta values (b's) reflect the treatment and blocks 2, 3 and 4. What happened to Block 1 in this model? To see what the equation for the Block 1 comparison group is, fill in your dummy
variables and multiply through. In this case, all four Zs are equal to 0 and you should see that the intercept (b0) is the estimate for the Block 1 control group. For the Block 1 treatment group, Z1 = 1 and the estimate is
equal to b0 + b1. By substituting the appropriate dummy variable "switches" you should be able to figure out the equation for any block or treatment group.
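In symbols, the model described here (a reconstruction in LaTeX notation) can be written as:

\[
y_i = b_0 + b_1 Z_{1i} + b_2 Z_{2i} + b_3 Z_{3i} + b_4 Z_{4i} + e_i
\]

where Z1 is the treatment dummy variable and Z2 through Z4 indicate blocks 2, 3 and 4; Block 1 is the reference category absorbed into the intercept b0.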

The data matrix that is entered into this analysis would consist of five columns and as many rows as you have participants: the posttest data, and one column of 0's or 1's for each of the four dummy variables.


I've decided to present the statistical model for the Analysis of Covariance design in regression analysis
notation. The model shown here is for a case where there is a single covariate and a treated and control
group. We use a dummy variable in specifying this model. We use the dummy variable Zi to represent the
treatment group. The beta values (b's) are the parameters we are estimating. The value b0 represents the
intercept. In this model, it is the intercept of the control group regression line (i.e., the predicted posttest
value for a control group case when X=0). Why? Because a control group case has a Z=0
and since the Z variable is multiplied with b2, that whole term would drop out.
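In symbols, the model described here (a reconstruction in LaTeX notation, with X the covariate and Z the 0,1 treatment dummy variable) is:

\[
y_i = b_0 + b_1 X_i + b_2 Z_i + e_i
\]

so b2 estimates the treatment effect adjusted for the covariate, and for a control case (Z = 0) the model reduces to y_i = b0 + b1 X_i + e_i.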

The data matrix that is entered into this analysis would consist of three columns and as many rows as you
have participants: the posttest data, one column of 0's or 1's to indicate which treatment group the participant
is in, and the covariate score.

This model assumes that the data in the two groups are well described by straight lines that have the same
slope. If this does not appear to be the case, you have to modify the model appropriately.


Analysis Requirements

The design notation for the Non-Equivalent Groups Design (NEGD) shows that we have two groups, a program and comparison group, and that each is measured
pre and post. The statistical model that we would intuitively expect could be used in this situation would have a pretest variable, posttest variable, and a dummy
variable that describes which group the person is in. These three variables would be the input for the statistical analysis. We would be interested in
estimating the difference between the groups on the posttest after adjusting for differences on the pretest. This is essentially the Analysis of Covariance (ANCOVA)
model as described in connection with randomized experiments (see the discussion of Analysis of Covariance and how we adjust for pretest differences). There's
only one major problem with this model when used with the NEGD -- it doesn't work! Here, I'll tell you the story of why the ANCOVA model fails and what we can do
to adjust it so it works correctly.

A Simulated Example

To see what happens when we use the ANCOVA analysis on data from a NEGD, I created a computer simulation to
generate hypothetical data. I created 500 hypothetical persons, with 250 in the program and 250 in the comparison
condition. Because this is a nonequivalent design, I made the groups nonequivalent on the pretest by adding five points to
each program group person's pretest score. Then, I added 15 points to each program person's posttest score. When we
take the initial 5-point advantage into account, we should find a 10 point program effect. The bivariate plot shows the data
from this simulation.

I then analyzed the data with the ANCOVA model. Remember that the way I set this up I should observe approximately a
10-point program effect if the ANCOVA analysis works correctly. The results are presented in the table.

In this analysis, I put in three scores for each person: a pretest score (X), a posttest score (Y) and either a 0 or 1 to
indicate whether the person was in the program (Z=1) or comparison (Z=0) group. The table shows the equation that the
ANCOVA model estimates. The equation has the three values I put in, (X, Y and Z) and the three coefficients that the
program estimates. The key coefficient is the one next to the program variable Z. This coefficient estimates the average
difference between the program and comparison groups (because it's the coefficient paired with the dummy variable indicating what group the person is in). The value should be 10 because I
put in a 10 point difference. In this analysis, the actual value I got was 11.3 (or 11.2818, to be more
precise). Well, that's not too bad, you might say. It's fairly close to the 10-point effect I put in. But we need
to determine if the obtained value of 11.2818 is statistically different from the true value of 10. To see
whether it is, we have to construct a confidence interval around our estimate and examine the difference
between 11.2818 and 10 relative to the variability in the data. Fortunately the program does this
automatically for us. If you look in the table, you'll see that the third line shows the coefficient associated
with the difference between the groups, the standard error for that coefficient (an indicator of variability),
the t-value, and the probability value. All the t-value shows is that the coefficient of 11.2818 is statistically
different from zero. But we want to know whether it is different from the true treatment effect value of 10.
To determine this, we can construct a confidence interval around the coefficient, using its standard error. We
know that the 95% confidence interval is the coefficient plus or minus two times the standard error value.
The calculation shows that the 95% confidence interval for our 11.2818 coefficient is 10.1454 to 12.4182.
Any value falling within this range can't be considered different beyond a 95% level from our obtained
value of 11.2818. But the true value of 10 points falls outside the range. In other words, our estimate of
11.2818 is significantly different from the true value. In still other words, the results of this analysis are biased -- we got the wrong answer. In this example, our estimate of the program effect is significantly larger than the
true program effect (even though the difference between 10 and 11.2818 doesn't seem that much larger, it exceeds chance levels). So, we have a problem when we apply the analysis model that our intuition tells us
makes the most sense for the NEGD. To understand why this bias occurs, we have to look a little more deeply at how the statistical analysis works in relation to the NEGD.
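The confidence-interval arithmetic itself is simple enough to check by hand; the sketch below just reproduces it from the numbers reported above (the standard error is implied by the reported interval rather than shown directly):

coef = 11.2818                      # estimated program effect from the ANCOVA table
se = (12.4182 - 10.1454) / 4        # standard error implied by the reported 95% CI
lower, upper = coef - 2 * se, coef + 2 * se
print(round(lower, 4), round(upper, 4))  # 10.1454 12.4182
print(lower > 10)                        # True: the true 10-point effect falls outside the CI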

The Problem

Why is the ANCOVA analysis biased when used with the NEGD? And, why isn't it biased when used with a pretest-posttest randomized experiment? Actually, there are several things happening to produce the bias,
which is why it's somewhat difficult to understand (and counterintuitive). Here are the two reasons we get a bias:

pretest measurement error which leads to the attenuation or "flattening" of the slopes in the regression lines
group nonequivalence

The first problem actually also occurs in randomized studies, but it doesn't lead to biased treatment effects because the groups are equivalent (at least probabilistically). It is the combination of both these conditions that
causes the problem. And, understanding the problem is what leads us to a solution in this case.
Regression and Measurement Error. We begin our attempt to understand the source of the bias by considering how error in measurement affects
regression analysis. We'll consider three different measurement error scenarios to see what error does. In all three scenarios, we assume that there
is no true treatment effect, that the null hypothesis is true. The first scenario is the case of no measurement error at all. In this hypothetical case, all
of the points fall right on the regression lines themselves. The second scenario introduces measurement error on the posttest, but not on the
pretest. The figure shows that when we have posttest error, we are dispersing the points vertically -- up and down -- from the regression lines.
Imagine a specific case, one person in our study. With no measurement error the person would be expected to score on the regression line itself.
With posttest measurement error, they would do better or worse on the posttest than they should. And, this would lead their score to be displaced
vertically. In the third scenario we have measurement error only on the pretest. It stands to reason that in this case we would be displacing cases
horizontally -- left and right -- off of the regression lines. For these three hypothetical cases, none of which would occur in reality, we can see how
data points would be dispersed.

How Regression Fits Lines. Regression analysis is a least squares analytic procedure. The actual criterion for fitting the line is to fit it so that you
minimize the sum of the squares of the residuals from the regression line. Let's deconstruct this sentence a bit. The key term is "residual." The
residual is the vertical distance from the regression line to each point.

The graph shows four residuals, two for each group. Two of the residuals fall above their regression line and two fall below. What is the criterion for fitting a line through the cloud of data points? Take all of the residuals
within a group (we'll fit separate lines for the program and comparison group). If they are above the line they will be positive and if they're below they'll be negative values. Square all the residuals in the group. Compute
the sum of the squares of the residuals -- just add them. That's it. Regression analysis fits a line through the data that yields the smallest sum of the squared residuals. How it does this is another matter. But you should
now understand what it's doing. The key thing to notice is that the regression line is fit in terms of the residuals and the residuals are vertical displacements from the regression line.
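A tiny numeric sketch (with made-up values) of the criterion being described -- vertical residuals, squared and summed:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.2, 1.9, 3.2, 3.9])

slope, intercept = np.polyfit(x, y, 1)   # ordinary least squares fit of a straight line

residuals = y - (intercept + slope * x)  # vertical distances from each point to the line
print(np.sum(residuals ** 2))            # the sum of squared residuals that the fit minimizes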

How Measurement Error Affects Slope. Now we're ready to put the ideas of the previous two sections together. Again, we'll consider the three measurement error scenarios described above. When there is no
measurement error, the slopes of the regression lines are unaffected. The figure shown earlier shows the regression lines in this no error condition. Notice that there is no treatment effect in any of the three graphs
shown in the figure (there would be a treatment effect only if there was a vertical displacement between the two lines). Now, consider the case where there is measurement error on the posttest. Will the slopes be
affected? The answer is no. Why? Because in regression analysis we fit the line relative to the vertical displacements of the points. Posttest measurement error affects the vertical dimension, and, if the errors are
random, we would get as many residuals pushing up as down and the slope of the line would, on average, remain the same as in the null case. There would, in this posttest measurement error case, be more variability
of data around the regression line, but the line would be located in the same place as in the no error case.
Now, let's consider the case of measurement error on the pretest. In this scenario, errors are added along the horizontal dimension.
But regression analysis fits the lines relative to vertical displacements. So how will this affect the slope? The figure illustrates what
happens. If there was no error, the lines would overlap as indicated for the null case in the figure. When we add in pretest
measurement error, we are in effect elongating the horizontal dimension without changing the vertical. Since regression analysis fits to
the vertical, this would force the regression line to stretch to fit the horizontally elongated distribution. The only way it can do this is by
rotating around its center point. The result is that the line has been "flattened" or "attenuated" -- the slope of the line will be lower when
there is pretest measurement error than it should actually be. You should be able to see that if we flatten the line in each group by
rotating it around its own center that this introduces a displacement between the two lines that was not there originally. Although there
was no treatment effect in the original case, we have introduced a false or "pseudo" effect. The biased estimate of the slope that
results from pretest measurement error introduces a phony treatment effect. In this example, it introduced an effect where there was
none. In the simulated example shown earlier, it exaggerated the actual effect that we had constructed for the simulation.
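The attenuation itself is easy to demonstrate numerically. The sketch below (my own illustrative numbers, not from the text) regresses a posttest on a pretest measured with and without error; with pretest error, the fitted slope is flattened -- here to roughly half its error-free value:

import numpy as np

rng = np.random.default_rng(0)
true_score = rng.normal(50, 10, 100_000)               # variance of true scores = 100
posttest = true_score + rng.normal(0, 3, 100_000)      # posttest error only
pretest_err = true_score + rng.normal(0, 10, 100_000)  # pretest with error variance 100

print(np.polyfit(true_score, posttest, 1)[0])   # ~1.0: no pretest error, no attenuation
print(np.polyfit(pretest_err, posttest, 1)[0])  # ~0.5: slope attenuated by pretest error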

Why Doesn't the Problem Occur in Randomized Designs? So, why doesn't this pseudo-effect occur in the randomized Analysis of
Covariance design? The next figure shows that even in the
randomized design, pretest measurement error does cause the
slopes of the lines to be flattened. But, we don't get a pseudo-effect in the randomized case even though the attenuation occurs.
Why? Because in the randomized case the two groups are
equivalent on the pretest -- there is no horizontal difference
between the lines. The lines for the two groups overlap perfectly in
the null case. So, when the attenuation occurs, it occurs the same
way in both lines and there is no vertical displacement introduced
between the lines. Compare this figure to the one above. You
should now see that the difference is that in the NEGD case above we have the attenuation of slopes and the initial nonequivalence between the
groups. Under these circumstances the flattening of the lines introduces a displacement. In the randomized case we also get the flattening, but there is
no displacement because there is no nonequivalence between the groups initially.

Summary of the Problem. So where does this leave us? The ANCOVA statistical model seemed at first glance to have all of the right components to
correctly model data from the NEGD. But we found that it didn't work correctly -- the estimate of the treatment effect was biased. When we examined
why, we saw that the bias was due to two major factors: the attenuation of slope that results from pretest measurement error coupled with the initial
nonequivalence between the groups. The problem is not caused by posttest measurement error because of the criterion that is used in regression
analysis to fit the line. It does not occur in randomized experiments because there is no pretest nonequivalence. We might also guess from these arguments that the bias will be greater with greater nonequivalence
between groups -- the less similar the groups the bigger the problem. In real-life research, as opposed to simulations, you can count on measurement error on all measurements -- we never measure perfectly. So, in
nonequivalent groups designs we now see that the ANCOVA analysis that seemed intuitively sensible can be expected to yield incorrect results!

The Solution

Now that we understand the problem in the analysis of the NEGD, we can go about trying to fix it. Since the problem is caused in part by measurement error on the pretest, one way to deal with it would be to address the
measurement error issue. If we could remove the pretest measurement error and approximate the no pretest error case, there would be no attenuation or flattening of the regression lines and no pseudo-effect
introduced. To see how we might adjust for pretest measurement error, we need to recall what we know about measurement error and its relation to reliability of measurement.

Recall from reliability theory and the idea of true score theory that reliability can be defined as the ratio:

reliability = var(T) / (var(T) + var(e))

where T is the true ability or level on the measure and e is measurement error. It follows that the reliability of the pretest is a direct function of the amount of measurement error: the more measurement error, the lower the reliability. If there is no measurement error on the pretest, the var(e) term in the denominator is zero and the reliability is 1. If the pretest is nothing but measurement error, the var(T) term is zero and the reliability is 0. That is, if the measure is nothing but measurement error, it is totally unreliable. If half of the measure is true score and half is measurement error, the reliability is .5. In other words, reliability reflects the proportion of your measure that is true score rather than measurement error. Since measurement error on the pretest is a necessary condition for bias in the NEGD (if there is no pretest measurement error there is no bias even in the NEGD), if
we correct for the measurement error we correct for the bias. But, we can't see measurement error directly in our data (remember, only God can see how much of a score is True Score and how much is error). However,
we can estimate the reliability. Since reliability is directly related to measurement error, we can use the reliability estimate as a proxy for how much measurement error is present. And, we can adjust pretest scores using
the reliability estimate to correct for the attenuation of slopes and remove the bias in the NEGD.
The Reliability-Corrected ANCOVA. We're going to solve the bias in ANCOVA treatment effect estimates for the NEGD using a "reliability"
correction that will adjust the pretest for measurement error. The figure shows what a reliability correction looks like. The top graph shows the
pretest distribution as we observe it, with measurement error included in it. Remember that I said above that adding measurement error widens
or elongates the horizontal dimension in the bivariate distribution. In the frequency distribution shown in the top graph, we know that the
distribution is wider than it would be if there was no error in measurement. The second graph shows that what we really want to do in adjusting
the pretest scores is to squeeze the pretest distribution inwards by an amount proportionate to the amount that measurement error elongated or widened it. We will do this adjustment separately for the program and comparison groups. The third graph shows what effect "squeezing" the pretest would have on the regression lines -- it would increase their slopes, rotating them back to where they truly belong and removing the bias
that was introduced by the measurement error. In effect, we are doing the opposite of what measurement error did so that we can correct for
the measurement error.

All we need to know is how much to squeeze the pretest distribution in to correctly adjust for measurement error. The answer is in the reliability
coefficient. Since reliability is an estimate of the proportion of your measure that is true score relative to error, it should tell us how much we
have to "squeeze." In fact, the formula for the adjustment is very simple:

The idea in this formula is that we are going to construct new pretest scores for each person. These new scores will be "adjusted" for pretest
unreliability by an amount proportional to the reliability. Each person's score will be closer to the pretest mean for that group. The formula tells
us how much closer. Let's look at a few examples. First, let's look at the case where there is no pretest measurement error. Here, reliability
would be 1. In this case, we actually don't want to adjust the data at all. Imagine that we have a person with a pretest score of 40, where the
mean of the pretest for the group is 50. We would get an adjusted score of:

Xadj = 50 + 1(40-50)
Xadj = 50 + 1(-10)
Xadj = 50 - 10
Xadj = 40

Or, in other words, we wouldn't make any adjustment at all. That's what we want in the no measurement error case.

Now, let's assume that reliability was relatively low, say .5. For a person with a pretest score of 40 where the group mean is 50, we would get:

Xadj = 50 + .5(40-50)
Xadj = 50 + .5(-10)
Xadj = 50 - 5
Xadj = 45

Or, when reliability is .5, we would move the pretest score halfway in towards the mean (halfway from its original value of 40 towards the mean of 50, or to 45).

Finally, let's assume that for the same case the reliability was stronger at .8. The reliability adjustment would be:

Xadj = 50 + .8(40-50)
Xadj = 50 + .8(-10)
Xadj = 50 - 8
Xadj = 42

That is, with reliability of .8 we would want to move the score 20% of the way in towards its mean (because if reliability is .8, the amount of the score due to error is 1 - .8 = .2).

You should be able to see that if we make this adjustment to all of the pretest scores in a group, we would be "squeezing" the pretest distribution in by an amount proportionate to the measurement error (1 - reliability).
It's important to note that we need to make this correction separately for our program and comparison groups.
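A minimal sketch of that per-group adjustment (the function and variable names are mine, not the text's):

import numpy as np

def adjust_pretest(pretest, group, reliability):
    """Shrink each pretest score toward its own group's mean by a factor of `reliability`."""
    pretest = np.asarray(pretest, dtype=float)
    group = np.asarray(group)
    adjusted = np.empty_like(pretest)
    for g in np.unique(group):
        mask = group == g
        group_mean = pretest[mask].mean()
        adjusted[mask] = group_mean + reliability * (pretest[mask] - group_mean)
    return adjusted

# Example: with reliability .5, a score of 40 in a group whose mean is 50 becomes 45
print(adjust_pretest([40, 60], [1, 1], reliability=0.5))  # [45. 55.]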

We're now ready to take this adjusted pretest score and substitute it for the original pretest score in our ANCOVA model:
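Assuming the same regression form used for the ANCOVA model described earlier, the reliability-corrected model can be written as:

yi = β0 + β1*Xadj,i + β2*Zi + ei

where yi is the posttest score, Xadj,i is the adjusted pretest score, Zi is the 0/1 treatment dummy, and ei is the residual.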

Notice that the only difference is that we've changed the X in the original ANCOVA to the term Xadj.

The Simulation Revisited.

So, let's go see how well our adjustment works. We'll use the same simulated data that we used earlier. The results are:

This time we get an estimate of the treatment effect of 9.3048 (instead of 11.2818). This estimate is closer to the true value of 10 points that we put into the simulated data. And, when we construct a 95% confidence
interval for our adjusted estimate, we see that the true value of 10 falls within the interval. That is, the analysis estimated a treatment effect that is not statistically different from the true effect -- it is an unbiased estimate.
You should also compare the slope of the lines in this adjusted model with the original slope. Now, the slope is nearly 1 at 1.06316, whereas before it was .626 -- considerably lower or "flatter." The slope in our adjusted
model approximates the expected true slope of the line (which is 1). The original slope showed the attenuation that the pretest measurement error caused.

So, the reliability-corrected ANCOVA model is used in the statistical analysis of the NEGD to correct for the bias that would occur as a result of measurement error on the pretest.

Which Reliability To Use?

There's really only one more major issue to settle in order to finish the story. We know from reliability theory that we can't calculate the true reliability; we can only estimate it. There are a variety of reliability estimates, and
they're likely to give you different values. Cronbach's Alpha tends to be a high estimate of reliability. The test-retest reliability tends to be a lower-bound estimate of reliability. So which do we use in our correction
formula? The answer is: both! When analyzing data from the NEGD it's safest to do two analyses, one with an upper-bound estimate of reliability and one with a lower-bound one. If we find a significant treatment effect
estimate with both, we can be fairly confident that we would have found a significant effect in data that had no pretest measurement error.

This certainly doesn't feel like a very satisfying conclusion to our rather convoluted story about the analysis of the NEGD, and it's not. In some ways, I look at this as the price we pay when we give up random assignment
and use intact groups in a NEGD -- our analysis becomes more complicated as we deal with adjustments that are needed, in part, because of the nonequivalence between the groups. Nevertheless, there are also
benefits in using nonequivalent groups instead of randomly assigning. You have to decide whether the tradeoff is worth it.


Analysis Requirements

The basic RD Design is a two-group pretest-posttest model as indicated in the design notation. As in other versions of this design structure (e.g., the Analysis of
Covariance Randomized Experiment, the Nonequivalent Groups Design), we will need a statistical model that includes a term for the pretest, one for the posttest, and a
dummy-coded variable to represent the program.

Assumptions in the Analysis

It is important before discussing the specific analytic model to understand the assumptions which must be met. This presentation assumes that we are dealing with the
basic RD design as described earlier. Variations in the design will be discussed later. There are five central assumptions which must be made in order for the analytic model which is presented to be appropriate, each of
which is discussed in turn:

1. The Cutoff Criterion. The cutoff criterion must be followed without exception. When there is misassignment relative to the cutoff value (unless it is known to be random), a selection threat arises and estimates of
the effect of the program are likely to be biased. Misassignment relative to the cutoff, often termed a "fuzzy" RD design, introduces analytic complexities that are outside the scope of this discussion.
2. The Pre-Post Distribution. It is assumed that the pre-post distribution is describable as a polynomial function. If the true pre-post relationship is logarithmic, exponential or some other function, the model given
below is misspecified and estimates of the effect of the program are likely to be biased. Of course, if the data can be transformed to create a polynomial distribution prior to analysis the model below may be
appropriate although it is likely to be more problematic to interpret. It is also sometimes the case that even if the true relationship is not polynomial, a sufficiently high-order polynomial will adequately account for
whatever function exists. However, the analyst is not likely to know whether this is the case.
3. Comparison Group Pretest Variance. There must be a sufficient number of pretest values in the comparison group to enable adequate estimation of the true relationship (i.e., pre-post regression line) for that
group. It is usually desirable to have variability in the program group as well although this is not strictly required because one can project the comparison group line to a single point for the program group.
4. Continuous Pretest Distribution. Both groups must come from a single continuous pretest distribution with the division between groups determined by the cutoff. In some cases one might be able to find intact
groups (e.g., two groups of patients from two different geographic locations) which serendipitously divide on some measure so as to imply some cutoff. Such naturally discontinuous groups must be used with
caution because of the greater likelihood that if they differed naturally at the cutoff prior to the program such a difference could reflect a selection bias which could introduce natural pre-post discontinuities at that
point.
5. Program Implementation. It is assumed that the program is uniformly delivered to all recipients, that is, that they all receive the same dosage, length of stay, amount of training, or whatever. If this is not the
case, it is necessary to model explicitly the program as implemented, thus complicating the analysis somewhat.

The Curvilinearity Problem

The major problem in analyzing data from the RD design is model misspecification. As will be shown below, when you misspecify the statistical model, you are likely to get biased estimates of the treatment effect. To
introduce this idea, let's begin by considering what happens if the data (i.e., the bivariate pre-post relationship) are curvilinear and we fit a straight-line model to the data.

Figure 1. A curvilinear relationship.


Figure 1 shows a simple curvilinear relationship. If the curved line in Figure 1 describes the pre-post relationship, then we need to take this into account in our statistical model. Notice that, although there is a cutoff value
at 50 in the figure, there is no jump or discontinuity in the line at the cutoff. This indicates that there is no effect of the treatment.

Figure 2. A curvilinear relationship fit with a straight-line model.

Now, look at Figure 2. The figure shows what happens when we fit a straight-line model to the curvilinear relationship of Figure 1. In the model, we restricted the slopes of both straight lines to be the same (i.e., we did
not allow for any interaction between the program and the pretest). You can see that the straight line model suggests that there is a jump at the cutoff, even though we can see that in the true function there is no
discontinuity.

Figure 3. A curvilinear relationship fit with a straight-line model with different slopes for each line (an
interaction effect).
Even allowing the straight line slopes to differ doesn't solve the problem. Figure 3 shows what happens in this case. Although the pseudo-effect in this case is smaller than when the slopes are forced to be equal, we still
obtain a pseudo-effect.

The conclusion is a simple one. If the true model is curved and we fit only straight-lines, we are likely to conclude wrongly that the treatment made a difference when it did not. This is a specific instance of the more
general problem of model specification.

Model Specification

To understand the model specification issue and how it relates to the RD design, we must distinguish three types of specifications. Figure 4 shows the case where we exactly specify the true model. What does "exactly
specify" mean? The top equation describes the "truth" for the data. It describes a simple straight-line pre-post relationship with a treatment effect. Notice that it includes terms for the posttest Y, the pretest X, and the
dummy-coded treatment variable Z. The bottom equation shows the model that we specify in the analysis. It too includes a term for the posttest Y, the pretest X, and the dummy-coded treatment variable Z. And that's all
it includes -- there are no unnecessary terms in the model that we specify. When we exactly specify the true model, we get unbiased and efficient estimates of the treatment effect.

Figure 4. An exactly specified model.

Now, let's look at the situation in Figure 5. The true model is the same as in Figure 4. However, this time we specify an analytic model that includes an extra and unnecessary term. In this case, because we included all of
the necessary terms, our estimate of the treatment effect will be unbiased. However, we pay a price for including unneeded terms in our analysis -- the treatment effect estimate will not be efficient. What does this mean?
It means that the chance that we will conclude our treatment doesn't work when it in fact does is increased. Including an unnecessary term in the analysis is like adding unnecessary noise to the data -- it makes it harder
for us to see the effect of the treatment even if it's there.

Figure 5. An overspecified model.


Finally, consider the example described in Figure 6. Here, the truth is more complicated than our model. In reality, there are two terms that we did not include in our analysis. In this case, we will get a treatment effect
estimate that is both biased and inefficient.

Figure 6. An underspecified model.

Analysis Strategy

Given the discussion of model misspecification, we can develop a modeling strategy that is designed, first, to guard against biased estimates and, second, to assure maximum efficiency of estimates. The best option
would obviously be to specify the true model exactly. But this is often difficult to achieve in practice because the true model is often obscured by the error in the data. If we have to make a mistake -- if we must misspecify
the model -- we would generally prefer to overspecify the true model rather than underspecify. Overspecification assures that we have included all necessary terms even at the expense of unnecessary ones. It will yield
an unbiased estimate of the effect, even though it will be inefficient. Underspecification is the situation we would most like to avoid because it yields both biased and inefficient estimates.

Given this preference sequence, our general analysis strategy will be to begin by specifying a model that we are fairly certain is overspecified. The treatment effect estimate for this model is likely to be unbiased although
it will be inefficient. Then, in successive analyses, gradually remove higher-order terms until the treatment effect estimate appears to differ from the initial one or until the model diagnostics (e.g., residual plots) indicate
that the model fits poorly.

Steps in the Analysis

The basic RD analysis involves five steps:

1. Transform the Pretest.


Figure 7. Transforming the pretest by subtracting the cutoff value.

The analysis begins by subtracting the cutoff value from each pretest score, creating the modified pretest term shown in Figure 7. This is done in order to set the intercept equal to the cutoff value. How does this work? If we subtract the cutoff from every pretest value, the modified pretest will be equal to 0 where it was originally at the cutoff value. Since the intercept is by definition the y-value when x=0, what we have done is set X to 0 at the cutoff, making the cutoff the intercept point. (A code sketch following step 5 below illustrates this and the later steps.)

2. Examine Relationship Visually.

There are two major things to look for in a graph of the pre-post relationship. First, it is important to determine whether there is any visually discernible discontinuity in the relationship at the cutoff. The
discontinuity could be a change in level vertically (main effect), a change in slope (interaction effect), or both. If it is visually clear that there is a discontinuity at the cutoff then one should not be satisfied with
analytic results which indicate no program effect. However, if no discontinuity is visually apparent, it may be that variability in the data is masking an effect and one must attend carefully to the analytic results.

The second thing to look for in the bivariate relationship is the degree of polynomial which may be required as indicated by the bivariate slope of the distribution, particularly in the comparison group. A good
approach is to count the number of flexion points (i.e., number of times the distribution "flexes" or "bends") which are apparent in the distribution. If the distribution appears linear, there are no flexion points. A
single flexion point could be indicative of a second (quadratic) order polynomial. This information will be used to determine the initial model which will be specified.

3. Specify Higher-Order Terms and Interactions.

Depending on the number of flexion points detected in step 2, one next creates transformations of the modified assignment variable, X. The rule of thumb here is that you go two orders of polynomial higher than
was indicated by the number of flexion points. Thus, if the bivariate relationship appeared linear (i.e., there were no flexion points), one would want to create transformations up to a second-order (0 + 2)
polynomial. This is shown in Figure 8. There do not appear to be any flexion points or "bends" in the bivariate distribution of Figure 8.

Figure 8. Bivariate distribution with no flexion points.

The first order polynomial already exists in the model (X) and so one would only have to create the second-order polynomial by squaring X to obtain X2. For each transformation of X one also creates the
interaction term by multiplying the polynomial by Z. In this example there would be two interaction terms: XiZi and Xi2Zi. Each transformation can be easily accomplished through straightforward multiplication on
the computer. If there appeared to be two flexion points in the bivariate distribution, one would create transformations up to the fourth (2 + 2) power and their interactions.

Visual inspection need not be the only basis for the initial determination of the degree of polynomial which is needed. Certainly, prior experience modeling similar data should be taken into account. The rule of
thumb given here implies that one should err on the side of overestimating the true polynomial function which is needed for reasons outlined above in discussing model specification. For whatever power is
initially estimated from visual inspection one should construct all transformations and their interactions up to that power. Thus if the fourth power is chosen, one should construct all four terms X to X4 and their
interactions.

4. Estimate Initial Model.

At this point, one is ready to begin the analysis. Any acceptable multiple regression program can be used to accomplish this on the computer. One simply regresses the posttest scores, Y, on the modified pretest
X, the treatment variable Z, and all higher-order transformations and interactions created in step 3 above. The regression coefficient associated with the Z term (i.e., the group membership variable) is the
estimate of the main effect of the program. If there is a vertical discontinuity at the cutoff it will be estimated by this coefficient. One can test the significance of the coefficient (or any other) by constructing a
standard t-test using the standard error of the coefficient which is invariably supplied in the computer program output.

Figure 9. The initial model for the case of no flexion points (full quadratic model specification).
If the analyst at step 3 correctly overestimated the polynomial function required to model the distribution then the estimate of the program effect will at least be unbiased. However, by including terms which may
not be needed in the true model, the estimate is likely to be inefficient, that is, standard error terms will be inflated and hence the significance of the program effect may be underestimated. Nevertheless, if at this
point in the analysis the coefficient is highly significant, it would be reasonable to conclude that there is a program effect. The direction of the effect is interpreted based on the sign of the coefficient and the
direction of scale of the posttest. Interaction effects can also be examined. For instance, a linear interaction would be implied by a significant regression coefficient for the XZ term.

5. Refining the Model.

On the basis of the results of step 4 one might wish to attempt to remove apparently unnecessary terms and reestimate the treatment effect with greater efficiency. This is a tricky procedure and should be
approached cautiously if one wishes to minimize the possibility of bias. To accomplish this one should certainly examine the output of the regression analysis in step 4 noting the degree to which the overall
model fits the data, the presence of any insignificant coefficients and the pattern of residuals. A conservative way to decide how to refine the model would be to begin by examining the highest-order term in the
current model and its interaction. If both coefficients are nonsignificant, and the goodness-of-fit measures and pattern of residuals indicate a good fit one might drop these two terms and reestimate the resulting
model. Thus, if one estimated up to a fourth-order polynomial, and found the coefficients for X4 and X4Z were nonsignificant, these terms can be dropped and the third-order model respecified. One would repeat
this procedure until: a) either of the coefficients is significant; b) the goodness-of-fit measure drops appreciably; or c) the pattern of residuals indicates a poorly fitting model. The final model may still include unnecessary terms, but there are likely to be fewer of these and, consequently, efficiency should be greater. Model specification procedures which involve dropping any term at any stage of the analysis are more dangerous and more likely to yield biased estimates because of the considerable multicollinearity which will exist between the terms in the model.
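A minimal Python sketch of steps 1, 3, and 4 on hypothetical data (the variable names, cutoff, and simulated values are mine, and statsmodels is just one convenient way to run the regression):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
pre = rng.uniform(30, 70, 500)                 # assignment (pretest) scores
z = (pre >= 50).astype(float)                  # treatment assigned at or above the cutoff of 50
post = pre + 10 * z + rng.normal(0, 3, 500)    # straight-line truth with a 10-point effect

# Step 1: transform the pretest so the cutoff becomes the intercept
x = pre - 50

# Step 3: higher-order and interaction terms (the full quadratic specification of Figure 9)
terms = np.column_stack([x, z, x * z, x ** 2, (x ** 2) * z])

# Step 4: estimate the initial, deliberately overspecified model
model = sm.OLS(post, sm.add_constant(terms)).fit()
print(model.params[2])   # coefficient on z -- the estimated program (main) effect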

Example Analysis

It's easier to understand how data from an RD Design are analyzed by showing an example. The data for this example are shown in Figure 10.

Figure 10. Bivariate distribution for example RD analysis.


Several things are apparent visually. First, there is a whopping treatment effect. In fact, Figure 10 shows simulated data where the true treatment effect is 10 points. Second, both groups are well described by straight
lines -- there are no flexion points apparent. Thus, the initial model we'll specify is the full quadratic one shown above in Figure 9.

The results of our initial specification are shown in Figure 11. The treatment effect estimate is the one next to the "group" variable. This initial estimate is 10.231 (SE = 1.248) -- very close to the true value of 10 points.
But notice that there is evidence that several of the higher-order terms are not statistically significant and may not be needed in the model. Specifically, the linear interaction term "linint" (XZ), and both the quadratic (X2)
and quadratic interaction (X2Z) terms are not significant.

Figure 11. Regression results for the full quadratic model.

Although we might be tempted (and perhaps even justified) to drop all three terms from the model, if we follow the guidelines given above in Step 5 we will begin by dropping only the two quadratic terms "quad" and
"quadint". The results for this model are shown in Figure 12.

Figure 12. Regression results for initial model without quadratic terms.
We can see that in this model the treatment effect estimate is now 9.89 (SE = .95). Again, this estimate is very close to the true 10-point treatment effect. Notice, however, that the standard error (SE) is smaller than it
was in the original model. This is the gain in efficiency we get when we eliminate the two unneeded quadratic terms. We can also see that the linear interaction term "linint" is still nonsignificant. This term would be
significant if the slopes of the lines for the two groups were different. Visual inspection shows that the slopes are the same and so it makes sense that this term is not significant.

Finally, let's drop out the nonsignificant linear interaction term and respecify the model. These results are shown in Figure 13.

Figure 13. Regression results for final model.

We see in these results that the treatment effect and SE are almost identical to the previous model and that the treatment effect estimate is an unbiased estimate of the true effect of 10 points. We can also see that all of
the terms in the final model are statistically significant, suggesting that they are needed to model the data and should not be eliminated.

So, what does our model look like visually? Figure 14 shows the original bivariate distribution with the fitted regression model.

Figure 14. Bivariate distribution with final regression model.


Clearly, the model fits well, both statistically and visually.


Statistical Requirements

The notation for the Regression Point Displacement (RPD) design shows that the statistical analysis requires:

a posttest score
a pretest score
a variable to represent the treatment group (where 0=comparison and 1=program)

These requirements are identical to the requirements for the Analysis of Covariance model. The only difference is that the RPD design only has a single treated group score.

The figure shows a bivariate (pre-post) distribution for a hypothetical RPD design of a community-based AIDS education program. The new AIDS education program is piloted in one particular county in a state, with the
remaining counties acting as controls. The state routinely publishes annual HIV positive rates by county for the entire state. The x-values show the HIV-positive rates per 1000 people for the year preceding the program
while the y-values show the rates for the year following it. Our goal is to estimate the size of the vertical displacement of the treated unit from the regression line of all of the control units, indicated on the graph by the
dashed arrow. The model we'll use is the Analysis of Covariance (ANCOVA) model stated in regression model form:
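Assuming the same regression notation used for the ANCOVA model elsewhere in the Knowledge Base, that form is:

yi = β0 + β1*Xi + β2*Zi + ei

where yi is the post-program HIV positive rate for county i, Xi is its pre-program rate, Zi is the treatment dummy (1 = program county, 0 = control), and ei is the residual.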
When we fit the model to our simulated data, we obtain the regression table shown below:

The coefficient associated with the dichotomous treatment variable is the estimate of the vertical displacement from the line. In this example, the results show that the program lowers HIV positive rates by .019 and that
this amount is statistically significant. This displacement is shown in the results graph:

For more details on the statistical analysis of the RPD design, you can view an entire paper on the subject entitled " The Regression Point Displacement Design for Evaluating Community-Based Pilot Programs and
Demonstration Projects."


The Proxy Pretest Design
The Separate Pre-Post Samples Design
The Double Pretest Design
The Switching Replications Design
The Nonequivalent Dependent Variables (NEDV) Design
The Regression Point Displacement (RPD) Design

There are many different types of quasi-experimental designs that have a variety of applications in specific contexts.
Here, I'll briefly present a number of the more interesting or important quasi-experimental designs. By studying the
features of these designs, you can come to a deeper understanding of how to tailor design components to address
threats to internal validity in your own research contexts.

The Proxy Pretest Design

The proxy pretest design looks like a standard pre-post design. But there's
an important difference. The pretest in this design is collected after the
program is given! But how can you call it a pretest if it's collected after the
program? Because you use a "proxy" variable to estimate where the groups
would have been on the pretest. There are essentially two variations of this
design. In the first, you ask the participants to estimate where their pretest
level would have been. This can be called the "Recollection" Proxy Pretest
Design. For instance, you might ask participants to complete your measures "estimating how you would have
answered the questions six months ago." This type of proxy pretest is not very good for estimating actual pre-post
changes because people may forget where they were at some prior time or they may distort the pretest estimates to
make themselves look better. However, there may be times when you are interested not so much in where they
were on the pretest but rather in where they think they were. The recollection proxy pretest would be a sensible way
to assess participants' perceived gain or change.

The other proxy pretest design uses archived records to stand in for the pretest. We might call this the "Archived"
Proxy Pretest design. For instance, imagine that you are studying the effects of an educational program on the math
performance of eighth graders. Unfortunately, you were brought in to do the study after the program had already
been started (a too-frequent case, I'm afraid). You are able to construct a posttest that shows math ability after
training, but you have no pretest. Under these circumstances, your best bet might be to find a proxy variable that
would estimate pretest performance. For instance, you might use the student's grade point average in math from the
seventh grade as the proxy pretest.

The proxy pretest design is not one you should ever select by choice. But, if you find yourself in a situation where
you have to evaluate a program that has already begun, it may be the best you can do and would almost certainly
be better than relying only on a posttest-only design.
The Separate Pre-Post Samples Design

The basic idea in this design (and its variations) is that the people you use for the
pretest are not the same as the people you use for the posttest. Take a close look at
the design notation for the first variation of this design. There are four groups
(indicated by the four lines) but two of these groups come from a single
nonequivalent group and the other two also come from a single nonequivalent group
(indicated by the subscripts next to N). Imagine that you have two agencies or
organizations that you think are similar. You want to implement your study in one
agency and use the other as a control. The program you are looking at is an agency-
wide one and you expect that the outcomes will be most noticeable at the agency level. For instance, let's say the
program is designed to improve customer satisfaction. Because customers routinely cycle through your agency, you
can't measure the same customers pre-post. Instead, you measure customer satisfaction in each agency at one
point in time, implement your program, and then measure customer satisfaction in the agency at another point in
time after the program. Notice that the customers will be different within each agency for the pre and posttest. This
design is not a particularly strong one. Because you cannot match individual participant responses from pre to post,
you can only look at the change in average customer satisfaction. Here, you always run the risk that you have
nonequivalence not only between the agencies but that within agency the pre and post groups are nonequivalent.
For instance, if you have different types of clients at different times of the year, this could bias the results. You could
also look at this as having a proxy pretest on a different group of people.

The second example of the separate pre-post sample design is shown in design notation at the right. Again, there are four groups in the study. This time,
however, you are taking random samples from your agency or organization at
each point in time. This is essentially the same design as above except for the
random sampling. Probably the most sensible use of this design would be in
situations where you routinely do sample surveys in an organization or
community. For instance, let's assume that every year two similar communities
do a community-wide survey of residents to ask about satisfaction with city
services. Because of costs, you randomly sample each community each year. In one of the communities you decide
to institute a program of community policing and you want to see whether residents feel safer and have changed in
their attitudes towards police. You would use the results of last year's survey as the pretest in both communities, and
this year's results as the posttest. Again, this is not a particularly strong design. Even though you are taking random
samples from each community each year, it may still be the case that the community changes fundamentally from
one year to the next and that the random samples within a community cannot be considered "equivalent."

The Double Pretest Design

The Double Pretest is a very strong quasi-experimental design with respect to internal validity. Why? Recall that the Pre-Post Nonequivalent Groups Design
(NEGD) is especially susceptible to selection threats to internal validity. In other
words, the nonequivalent groups may be different in some way before the program
is given and you may incorrectly attribute posttest differences to the program.
Although the pretest helps to assess the degree of pre-program similarity, it does
not tell us if the groups are changing at similar rates prior to the program. Thus, the
NEGD is especially susceptible to selection-maturation threats.

The double pretest design includes two measures prior to the program. Consequently, if the program and
comparison group are maturing at different rates you should detect this as a change from pretest 1 to pretest 2.
Therefore, this design explicitly controls for selection-maturation threats. The design is also sometimes referred to as
a "dry run" quasi-experimental design because the double pretests simulate what would happen in the null case.
The Switching Replications Design

The Switching Replications quasi-experimental design is also very strong with respect to internal validity. And, because it allows for two independent
implementations of the program, it may enhance external validity or
generalizability. The design has two groups and three waves of
measurement. In the first phase of the design, both groups are pretested, one is given the program, and both are posttested. In the second phase of the design, the original comparison group is given the program while the original program group serves as the "control". This design is identical in structure to its
randomized experimental version, but lacks the random assignment to group. It is certainly superior to the simple
pre-post nonequivalent groups design. In addition, because it assures that all participants eventually get the
program, it is probably one of the most ethically feasible quasi-experiments.

The Nonequivalent Dependent Variables (NEDV) Design

The Nonequivalent Dependent Variables (NEDV) Design is a deceptive one. In its simple form, it is an extremely weak design with respect to internal
validity. But in its pattern matching variations, it opens the door to an entirely
different approach to causal assessment that is extremely powerful. The
design notation shown here is for the simple two-variable case. Notice that
this design has only a single group of participants! The two lines in the
notation indicate separate variables, not separate groups.

The idea in this design is that you have a program designed to change a specific outcome. For instance, let's
assume you are doing training in algebra for first-year high-school students. Your training program is designed to
affect algebra scores. But it is not designed to affect
geometry scores. And, pre-post geometry performance
might be reasonably expected to be affected by other
internal validity factors like history or maturation. In this
case, the pre-post geometry performance acts like a
control group -- it models what would likely have
happened to the algebra pre-post scores if the program
hadn't been given. The key is that the "control" variable
has to be similar enough to the target variable to be
affected in the same way by history, maturation, and the other single group internal validity threats, but not so similar
that it is affected by the program. The figure shows the results we might get for our two-variable algebra-geometry
example. Note that this design only works if the geometry variable is a reasonable proxy for what would have
happened on the algebra scores in the absence of the program. The real allure of this design is the possibility that
we don't need a control group -- we can give the program to all of our sample! The problem is that in its two-variable
simple version, the assumption of the control variable is a difficult one to meet. (Note that a double-pretest version of
this design would be considerably stronger).

The Pattern Matching NEDV Design. Although the two-variable NEDV design is quite weak, we can make it
considerably stronger by adding multiple outcome variables. In this variation, we need many outcome variables and
a theory that tells how affected (from most to least) each variable will be by the program. Let's reconsider the
example of our algebra program above. Now, instead of having only an algebra and geometry score, we have ten
measures that we collect pre and post. We expect that the algebra measure would be most affected by the program
(because that's what the program was most designed to affect). But here, we recognize that geometry might also be
affected because training in algebra might be relevant, at least tangentially, to geometry skills. On the other hand,
we might theorize that creativity would be much less affected, even indirectly, by training in algebra and so our
creativity measure is predicted to be least affected of the ten measures.
Now, let's line up our
theoretical expectations against our pre-post gains for each variable. The graph we'll use is called a "ladder graph"
because if there is a correspondence between expectations and observed results we'll get horizontal lines and a
figure that looks a bit like a ladder. You can see in the figure that the expected order of outcomes (on the left) is mirrored well in the actual outcomes (on the right).

Depending on the circumstances, the Pattern Matching NEDV design can be quite strong with respect to internal
validity. In general, the design is stronger if you have a larger set of variables and you find that your expectation
pattern matches well with the observed results. What are the threats to internal validity in this design? Only a factor
(e.g., an historical event or maturational pattern) that would yield the same outcome pattern can act as an alternative
explanation. And, the more complex the predicted pattern, the less likely it is that some other factor would yield it.
The problem is, the more complex the predicted pattern, the less likely it is that you will find that it matches your observed data as well.

The Pattern Matching NEDV design is especially attractive for several reasons. It requires that the researcher
specify expectations prior to institution of the program. Doing so can be a sobering experience. Often we make naive
assumptions about how our programs or interventions will work. When we're forced to look at them in detail, we
begin to see that our assumptions may be unrealistic. The design also requires a detailed measurement net -- a
large set of outcome variables and a detailed sense of how they are related to each other. Developing this level of
detail about your measurement constructs is liable to improve the construct validity of your study. Increasingly, we
have methodologies that can help researchers empirically develop construct networks that describe the expected
interrelationships among outcome variables (see Concept Mapping for more information about how to do this).
Finally, the Pattern Matching NEDV is especially intriguing because it suggests that it is possible to assess the
effects of programs even if you only have a treated group. Assuming the other conditions for the design are met,
control groups are not necessarily needed for causal assessment. Of course, you can also couple the Pattern
Matching NEDV design with standard experimental or quasi-experimental control group designs for even more
enhanced validity. And, if your experimental or quasi-experimental design already has many outcome measures as
part of the measurement protocol, the design might be considerably enriched by generating variable-level
expectations about program outcomes and testing the match statistically.
One of my favorite questions to my statistician friends goes to the heart of the potential of the Pattern Matching
NEDV design. "Suppose," I ask them, "that you have ten outcome variables in a study and that you find that all ten
show no statistically significant treatment effects when tested individually (or even when tested as a multivariate set).
And suppose, like the desperate graduate student who finds in their initial analysis that nothing is significant, you decide to look at the direction of the effects across the ten variables. You line up the variables in terms of which
should be most to least affected by your program. And, miracle of miracles, you find that there is a strong and
statistically significant correlation between the expected and observed order of effects even though no individual
effect was statistically significant. Is this finding interpretable as a treatment effect?" My answer is "yes." I think the
graduate student's desperation-driven intuition to look at order of effects is a sensible one. I would conclude that the
reason you did not find statistical effects on the individual variables is that you didn't have sufficient statistical power.
Of course, the results will only be interpretable as a treatment effect if you can rule out any other plausible factor that
could have caused the ordering of outcomes. But the more detailed the predicted pattern and the stronger the
correlation to observed results, the more likely the treatment effect becomes the most plausible explanation. In such
cases, the expected pattern of results is like a unique fingerprint -- and the observed pattern that matches it can only
be due to that unique source pattern.
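If you wanted to test such an expected-versus-observed ordering, one simple sketch (with entirely hypothetical numbers; a rank-order correlation is just one reasonable choice of statistic) might be:

from scipy.stats import spearmanr

# Theorized relative impact of ten outcome measures (10 = most affected, 1 = least),
# lined up against the observed pre-post gains on those same measures.
expected_impact = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
observed_gain = [12.1, 10.3, 9.8, 7.5, 6.9, 5.6, 5.1, 3.2, 2.8, 1.1]

rho, p_value = spearmanr(expected_impact, observed_gain)
print(rho, p_value)   # a strong positive rho supports a program-effect interpretation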

I believe that the pattern matching notion implicit in the NEDV design opens the way to an entirely different approach
to causal assessment, one that is closely linked to detailed prior explication of the program and to detailed mapping
of constructs. It suggests a much richer model for causal assessment than one that relies only on a simplistic
dichotomous treatment-control model. In fact, I'm so convinced of the importance of this idea that I've staked a major
part of my career on developing pattern matching models for conducting research!

The Regression Point Displacement (RPD) Design

The Regression Point Displacement (RPD) design is a simple quasi-experimental strategy that has important implications, especially for community-based research. The problem with community-level interventions is that it is difficult to do causal assessment, to determine if your program made a difference as opposed to other potential factors. Typically, in community-level interventions, program costs preclude our implementing the program in more than one community. We look at pre-post indicators for the program community and see whether there is a change. If we're relatively enlightened, we seek out another similar community and use it as a comparison. But, because the intervention is at the community level, we only have a single "unit" of measurement for our program and comparison groups.

The RPD design attempts to enhance the single program unit situation by comparing the performance on that single
unit with the performance of a large set of comparison units. In community research, we would compare the pre-post
results for the intervention community with a large set of other communities. The advantage of doing this is that we
don't rely on a single nonequivalent community, we attempt to use results from a heterogeneous set of
nonequivalent communities to model the comparison condition, and then compare our single site to this model. For
typical community-based research, such an approach may greatly enhance our ability to make causal inferences.
I'll illustrate the RPD design with an example of a community-based AIDS education program. We decide to pilot our new AIDS education program in one particular community in a state, perhaps a county. The state routinely publishes annual HIV positive rates by county for the entire state. So, we use the remaining counties in the state as control counties. But instead of averaging all of the control counties to obtain a single control score, we use them as separate units in the analysis. The first figure shows the bivariate pre-post distribution of HIV positive rates per 1000 people for all the counties in the state. The program county -- the one that gets the AIDS education program -- is shown as an X and the remaining control counties are shown as Os. We compute a regression line for the control cases (shown in blue on the figure). The regression line models our predicted outcome for a county with any specific pretest rate. To estimate the effect of the program we test whether the displacement of the program county from the control county regression line is statistically significant.

The second figure shows why the RPD design was given its name. In this design, we know we have a treatment
effect when there is a significant displacement of the program point from the control group regression line.

The RPD design is especially applicable in situations where a treatment or program is applied in a single
geographical unit (e.g., a state, county, city, hospital, hospital unit) instead of an individual, where there are lots of
other units available as control cases, and where there is routine measurement (e.g., monthly, annually) of relevant
outcome variables.

The analysis of the RPD design turns out to be a variation of the Analysis of Covariance model (see the Statistical
Analysis of the Regression Point Displacement Design). I had the opportunity to be the co-developer with Donald T.
Campbell of the RPD design. You can view the entire original paper entitled " The Regression Point Displacement
Design for Evaluating Community-Based Pilot Programs and Demonstration Projects."
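As a rough sketch of that ANCOVA-style test (my own illustration with made-up county data, not code from the paper), you regress the posttest rates on the pretest rates plus a dummy variable that flags the single program county, and then test that dummy's coefficient:

# A rough sketch of the RPD analysis: regress posttest rates on pretest rates plus
# a dummy for the single program county, then test the dummy's coefficient.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Hypothetical data: 40 control counties plus 1 program county
pre = rng.normal(5.0, 1.0, size=41)                 # pretest HIV+ rate per 1000
post = 0.9 * pre + rng.normal(0.0, 0.3, size=41)    # posttest rates under no effect
treated = np.zeros(41)
treated[0] = 1                                      # county 0 gets the program
post[0] -= 1.0                                      # assume the program lowers its rate

X = sm.add_constant(np.column_stack([pre, treated]))
model = sm.OLS(post, X).fit()
print(model.params)    # [intercept, pretest slope, displacement of program county]
print(model.pvalues)   # the p-value on the displacement term is the RPD test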

The Basic Design

The Non-Equivalent Groups Design (hereafter NEGD) is probably the most frequently used design in social research. It is structured like a pretest-posttest randomized experiment, but it lacks the key feature of the randomized designs -- random assignment. In the NEGD, we most often use intact groups that we think are similar as the treatment and control groups. In education, we might pick two comparable classrooms or schools. In community-based research, we might use two similar communities. We try to select groups that are as similar as possible so we can fairly compare the treated one with the comparison one. But we can never be sure the groups are comparable. Or, put another way, it's unlikely that the two groups would be as similar as they would if we assigned them through a random lottery. Because it's often likely that the groups are not equivalent, this design was named the nonequivalent groups design to remind us.

So, what does the term "nonequivalent" mean? In one sense, it just means that assignment to group was not
random. In other words, the researcher did not control the assignment to groups through the mechanism of random
assignment. As a result, the groups may be different prior to the study. That is, the NEGD is especially susceptible
to the internal validity threat of selection. Any prior differences between the groups may affect the outcome of the
study. Under the worst circumstances, this can lead us to conclude that our program didn't make a difference when
in fact it did, or that it did make a difference when in fact it didn't.

The Bivariate Distribution


Let's begin our exploration of the NEGD by looking at some hypothetical results. The first figure shows a bivariate distribution in the simple pre-post, two group study. The treated cases are indicated with Xs while the comparison cases are indicated with Os. A couple of things should be obvious from the graph. To begin, we don't even need statistics to see that there is a whopping treatment effect (although statistics would help us estimate the size of that effect more precisely). The program cases (Xs) consistently score better on the posttest than the comparison cases (Os) do. If positive scores on the posttest are "better" then we can conclude that the program improved things. Second, in the NEGD the biggest threat to internal validity is selection -- that the groups differed before the program. Does that appear to be the case here? Although it may be harder to see, the program group does appear to be a little further to the right on average. This suggests that they did have an initial advantage and that the positive results may be due in whole or in part to this initial difference.

We can see the initial difference, the selection bias, when we look at the next graph. It shows that the program group scored about five points higher than the comparison group on the pretest. The comparison group had a pretest average of about 50 while the program group averaged about 55. It also shows that the program group scored about fifteen points higher than the comparison group on the posttest. That is, the comparison group posttest score was again about 50, while this time the program group scored around 65. These observations suggest that there is a potential selection threat, although the initial five point difference doesn't explain why we observe a fifteen point difference on the posttest. It may be that there is still a legitimate treatment effect here, even given the initial advantage of the program group.
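If it helps to see those numbers generated rather than read off a graph, here is a small simulation sketch (hypothetical data of my own, not the data behind the figures) that produces roughly the same pattern: a five point pretest difference and a fifteen point posttest difference.

# A small illustrative simulation that reproduces the pattern described above.
import numpy as np

rng = np.random.default_rng(1)
n = 200

comparison_pre = rng.normal(50, 5, n)
program_pre = rng.normal(55, 5, n)                       # initial five-point advantage

comparison_post = comparison_pre + rng.normal(0, 5, n)   # essentially no change
program_post = program_pre + rng.normal(10, 5, n)        # gains about ten points

print("pretest means: ", comparison_pre.mean(), program_pre.mean())    # ~50 vs ~55
print("posttest means:", comparison_post.mean(), program_post.mean())  # ~50 vs ~65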

Possible Outcome #1

Let's take a look at several different possible outcomes from a NEGD to see how they might be interpreted. The
important point here is that each of these outcomes has a different storyline. Some are more susceptible to threats to
internal validity than others. Before you read through each of the descriptions, take a good look at the graph and try
to figure out how you would explain the results. If you were a critic, what kinds of problems would you be looking for?
Then, read the synopsis and see if it agrees with my perception.

Sometimes it's useful to look at the means for the two groups. The figure shows these means with the pre-post means of the program group joined with a blue line and the pre-post means of the comparison group joined with a green one. This first outcome shows the situation in the two bivariate plots above. Here, we can see much more clearly both the original pretest difference of five points, and the larger fifteen point posttest difference.

How might we interpret these results? To begin, you need to recall that with the NEGD we are usually most
concerned about selection threats. Which selection threats might be operating here? The key to understanding this
outcome is that the comparison group did not change between the pretest and the posttest. Therefore, it would be
hard to argue that the outcome is due to a selection-maturation threat. Why? Remember that a selection-
maturation threat means that the groups are maturing at different rates and that this creates the illusion of a program
effect when there is not one. But because the comparison group didn't mature (i.e., change) at all, it's hard to argue
that it was differential maturation that produced the outcome. What could have produced the outcome? A selection-
history threat certainly seems plausible. Perhaps some event occurred (other than the program) that the program
group reacted to and the comparison group didn't. Or, maybe a local event occurred for the program group but not
for the comparison group. Notice how much more likely it is that outcome pattern #1 is caused by such a history
threat than by a maturation difference. What about the possibility of selection-regression? This one actually works a
lot like the selection-maturation threat. If the jump in the program group is due to regression to the mean, it would
have to be because the program group was below the overall population pretest average and, consequently,
regressed upwards on the posttest. But if that's true, it should be even more the case for the comparison group who
started with an even lower pretest average. The fact that they don't appear to regress at all helps rule out the possibility that outcome #1 is the result of regression to the mean.
Possible Outcome #2

Our second hypothetical outcome presents a very different picture. Here, both the program and comparison groups gain from pre to post, with the program group gaining at a slightly faster rate. This is almost the definition of a selection-maturation threat. The fact that the two groups differed to begin with suggests that they may already be maturing at different rates. And the posttest
scores don't do anything to help rule that possibility out. This outcome might also arise from a selection-history
threat. If the two groups, because of their initial differences, react differently to some historical event, we might
obtain the outcome pattern shown. Both selection-testing and selection-instrumentation are also possibilities,
depending on the nature of the measures used. This pattern could indicate a selection-mortality problem if there are
more low-scoring program cases that drop out between testings. What about selection-regression? It doesn't seem
likely, for much the same reasoning as for outcome #1. If there was an upwards regression to the mean from pre to
post, we would expect that regression to be greater for the comparison group because they have the lower pretest
score.

Possible Outcome #3
This third possible outcome cries out "selection-regression!" Or, at least it would if it could cry out. The regression scenario is that the program group was selected so that they were extremely high (relative to the population) on the pretest. The fact that they scored lower, approaching the comparison group on the posttest, may simply be due to their regressing toward the population mean. We might observe an outcome like this
when we study the effects of giving a scholarship or an award for academic performance. We give the award
because students did well (in this case, on the pretest). When we observe their posttest performance, relative to an "average" group of students, they appear to perform more poorly. Pure regression! Notice how this outcome
doesn't suggest a selection-maturation threat. What kind of maturation process would have to occur for the highly
advantaged program group to decline while a comparison group evidences no change?

Possible Outcome #4

Our fourth possible outcome also suggests a selection-regression threat. Here, the program group is disadvantaged to begin with. The fact that they appear to pull closer to the comparison group on the posttest may be due to regression. This outcome pattern may be suspected in
studies of compensatory programs -- programs designed to help address some problem or deficiency. For instance,
compensatory education programs are designed to help children who are doing poorly in some subject. They are
likely to have lower pretest performance than more average comparison children. Consequently, they are likely to
regress to the mean in much the pattern shown in outcome #4.

Possible Outcome #5

This last hypothetical outcome is sometimes referred to as a "cross-over" pattern. Here, the comparison group doesn't appear to change from pre to post. But the program group does, starting out lower than the comparison group and ending up above them. This is the clearest pattern of evidence for the effectiveness of the program of all five of the hypothetical
outcomes. It's hard to come up with a threat to internal validity that would be plausible here. Certainly, there is no
evidence for selection-maturation here unless you postulate that the two groups are involved in maturational
processes that just tend to start and stop and just coincidentally you caught the program group maturing while the
comparison group had gone dormant. But, if that was the case, why did the program group actually cross over the
comparison group? Why didn't they approach the comparison group and stop maturing? How likely is this outcome
as a description of normal maturation? Not very. Similarly, this isn't a selection-regression result. Regression might
explain why a low scoring program group approaches the comparison group posttest score (as in outcome #4), but it
doesn't explain why they cross over.

Although this fifth outcome is the strongest evidence for a program effect, you can't very well construct your study
expecting to find this kind of pattern. It would be a little bit like saying "let's give our program to the toughest cases
and see if we can improve them so much that they not only become like 'average' cases, but actually outperform
them." That's an awfully big expectation to saddle any program with. Typically, you wouldn't want to subject your
program to that kind of expectation. But if you happen to find that kind of result, you really have a program effect that
has beaten the odds.

Statistical Analysis of The Nonequivalent Group Design


The regression-discontinuity design. What a terrible name! In everyday language both parts of the term have connotations that are
primarily negative. To most people "regression" implies a reversion backwards or a return to some earlier, more primitive state
while "discontinuity" suggests an unnatural jump or shift in what might otherwise be a smoother, more continuous process. To a
research methodologist, however, the term regression-discontinuity (hereafter labeled "RD") carries no such negative meaning.
Instead, the RD design is seen as a useful method for determining whether a program or treatment is effective.

The label "RD design" actually refers to a set of design variations. In its simplest, most traditional form, the RD design is a pretest-
posttest program-comparison group strategy. The unique characteristic which sets RD designs apart from other pre-post group
designs is the method by which research participants are assigned to conditions. In RD designs, participants are assigned to
program or comparison groups solely on the basis of a cutoff score on a pre-program measure. Thus the RD design is
distinguished from randomized experiments (or randomized clinical trials) and from other quasi-experimental strategies by its
unique method of assignment. This cutoff criterion implies the major advantage of RD designs -- they are appropriate when we
wish to target a program or treatment to those who most need or deserve it. Thus, unlike its randomized or quasi-experimental
alternatives, the RD design does not require us to assign potentially needy individuals to a no-program comparison group in order
to evaluate the effectiveness of a program.

The RD design has not been used frequently in social research. The most common implementation has been in compensatory
education evaluation where school children who obtain scores which fall below some predetermined cutoff value on an
achievement test are assigned to remedial training designed to improve their performance. The low frequency of use may be
attributable to several factors. Certainly, the design is a relative latecomer. Its first major field tests did not occur until the mid-1970s
when it was incorporated into the nationwide evaluation system for compensatory education programs funded under Title I of the
Elementary and Secondary Education Act (ESEA) of 1965. In many situations, the design has not been used because one or more
key criteria were absent. For instance, RD designs force administrators to assign participants to conditions solely on the basis of
quantitative indicators, thereby often unpalatably restricting the degree to which judgment, discretion or favoritism may be used.
Perhaps the most telling reason for the lack of wider adoption of the RD design is that at first glance the design doesn't seem to
make sense. In most research, we wish to have comparison groups that are equivalent to program groups on pre-program
indicators so that post-program differences may be attributed to the program itself. But because of the cutoff criterion in RD
designs, program and comparison groups are deliberately and maximally different on pre-program characteristics, an apparently
insensible anomaly. An understanding of how the design actually works depends on at least a conceptual familiarity with regression
analysis thereby making the strategy a difficult one to convey to nonstatistical audiences.

Despite its lack of use, the RD design has great potential for evaluation and program research. From a methodological point of
view, inferences which are drawn from a well-implemented RD design are comparable in internal validity to conclusions from
randomized experiments. Thus, the RD design is a strong competitor to randomized designs when causal hypotheses are being
investigated. From an ethical perspective, RD designs are compatible with the goal of getting the program to those most in need. It
is not necessary to deny the program from potentially deserving recipients simply for the sake of a scientific test. From an
administrative viewpoint, the RD design is often directly usable with existing measurement efforts such as the regularly collected
statistical information typical of most management information systems. The advantages of the RD design warrant greater
educational efforts on the part of the methodological community to encourage its use where appropriate.

The Basic Design

The "basic" RD design is a pretest-posttest two group design. The term "pretest-posttest" implies that the same measure (or
perhaps alternate forms of the same measure) is administered before and after some program or treatment. (In fact, the RD design
does not require that the pre and post measures are the same.) The term "pretest" implies that the same measure is given twice
while the term "pre-program measure" implies more broadly that before and after measures may be the same or different. It is
assumed that a cutoff value on the pretest or pre-program measure is being used to assign persons or other units to the program.
Two group versions of the RD design might imply either that some treatment or program is being contrasted with a no-program
condition or that two alternative programs are being compared. The description of the basic design as a two group design implies
that a single pretest cutoff score is used to assign participants to either the program or comparison group. The term "participants"
refers to whatever unit is assigned. In many cases, participants are individuals, but they could be any definable units such as
hospital wards, hospitals, counties, and so on. The term "program" will be used throughout to refer to any program, treatment or
manipulation whose effects we wish to examine. In notational form, the basic RD design might be depicted as shown in the figure
where:

C indicates that groups are assigned by means of a cutoff score,


an O stands for the administration of a measure to a group,
an X depicts the implementation of a program,
and each group is described on a single line (i.e., program group on top, control group on the bottom).

To make this initial presentation more concrete, we can imagine a hypothetical study where the interest is in examining the effect of
a new treatment protocol for inpatients with a particular diagnosis. For simplicity, we can assume that we wish to try the new
protocol on patients who are considered most ill and that for each patient we have a continuous quantitative indicator of health that
is a composite rating which can take values from 1 to 100 where high scores indicate greater health. Furthermore, we can assume
that a pretest cutoff score of 50 was (more or less arbitrarily) chosen as the assignment criterion, so that all those scoring lower than
50 on the pretest are to be given the new treatment protocol while those with scores greater than or equal to 50 are given the
standard treatment.

It is useful to begin by considering what the data might look like if we did not administer the treatment protocol but instead only measured all participants at two points in time. Figure 1 shows the hypothetical bivariate distribution for this situation. Each dot on the figure indicates a single person's pretest and posttest scores. The blue Xs to the left of the cutoff show the program cases. They are more severely ill on both the pretest and posttest. The green circles show the comparison group that is comparatively healthy on both measures. The vertical line at the pretest score of 50 indicates the cutoff point (although for Figure 1 we are assuming that no treatment has been given). The solid line through the bivariate distribution is the linear regression line. The distribution depicts a strong positive relationship between the pretest and posttest -- in general, the more healthy a person is at the pretest, the more healthy they'll be on the posttest, and, the more severely ill a person is at the pretest, the more ill they'll be on the posttest.

Figure 1. Pre-Post distribution with no treatment effect.

Now we can consider what the outcome might look like if the new treatment protocol is administered and has a positive effect. For
simplicity, we will assume that the treatment had a constant effect which raised each treated person's health score by ten points.
This is portrayed in Figure 2.

Figure 2. Regression-Discontinuity Design with Ten-point Treatment Effect.

Figure 2 is identical to Figure 1 except that all points to the left of the cutoff (i.e., the treatment group) have been raised by 10
points on the posttest. The dashed line in Figure 2 shows what we would expect the treated group's regression line to look like if
the program had no effect (as was the case in Figure 1).

It is sometimes difficult to see the forest for the trees in these types of bivariate plots. So, let's remove the individual data points and
look only at the regression lines. The plot of regression lines for the treatment effect case of Figure 2 is shown in Figure 3.
Figure 3. Regression lines for the data shown in Figure 2.

On the basis of Figure 3, we can now see how the RD design got its name -- a program effect is suggested when we observe a
"jump" or discontinuity in the regression lines at the cutoff point. This is illustrated in Figure 4.

Figure 4. How the Regression-Discontinuity Design got its name.
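To see the jump in a worked example, here is an illustrative simulation (mine, not the text's) of the hypothetical treatment protocol study: patients scoring below 50 get the program, the program is assumed to add a constant ten points, and the discontinuity at the cutoff is estimated by regressing the posttest on the centered pretest plus a treatment dummy.

# An illustrative simulation of the basic RD design with a ten-point constant effect.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n, cutoff, effect = 500, 50, 10

pretest = rng.uniform(1, 100, n)
treated = (pretest < cutoff).astype(float)          # assignment strictly by cutoff
posttest = 10 + 0.8 * pretest + effect * treated + rng.normal(0, 5, n)

# Regress posttest on the centered pretest and the treatment dummy; the dummy's
# coefficient is the estimated jump (discontinuity) at the cutoff.
X = sm.add_constant(np.column_stack([pretest - cutoff, treated]))
fit = sm.OLS(posttest, X).fit()
print(fit.params[2])     # should be close to 10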

The Logic of the RD Design

The discussion above indicates what the key feature of the RD design is: assignment based on a cutoff value on a pre-program
measure. The cutoff rule for the simple two-group case is essentially:

all persons on one side of the cutoff are assigned to one group...
all persons on the other side of the cutoff are assigned to the other group.
To apply such a rule, you need a continuous quantitative pre-program measure.
Selection of the Cutoff. The choice of cutoff value is usually based on one of two factors. It can be made solely on the basis of the
program resources that are available. For instance, if a program only has the capability of handling 25 persons and 70 people
apply, one can choose a cutoff point that distinguishes the 25 most needy persons from the rest. Alternatively, the cutoff can be
chosen on substantive grounds. If the pre-program assignment measure is an indication of severity of illness measured on a 1 to 7
scale and physicians or other experts believe that all patients scoring 5 or more are critical and fit well the criteria defined for
program participants then a cutoff value of 5 may be used.
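For the resource-based case, the arithmetic is simple enough to sketch in a few lines (hypothetical scores, and assuming here that lower scores mean greater need):

# Choosing a cutoff from program capacity: with room for 25 of 70 applicants, pick
# the score that separates the 25 most needy (here, the 25 lowest scores) from the rest.
import numpy as np

rng = np.random.default_rng(3)
scores = rng.normal(50, 10, 70)          # hypothetical need/severity scores
cutoff = np.sort(scores)[24]             # 25th lowest score
program_group = scores <= cutoff
print(cutoff, program_group.sum())       # cutoff value and program group size (25)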

Interpretation of Results. In order to interpret the results of an RD design, one must know the nature of the assignment variable,
who received the program and the nature of the outcome measure. Without this information, there is no distinct outcome pattern
which directly indicates whether an effect is positive or negative.

To illustrate this, we can construct a new hypothetical example of an RD design. Let us assume that a hospital administrator would
like to improve the quality of patient care through the institution of an intensive quality of care training program for staff. Because of
financial constraints, the program is too costly to implement for all employees and so instead it will be administered to the entire
staff from specifically targeted units or wards which seem most in need of improving quality of care. Two general measures of
quality of care are available. The first is an aggregate rating of quality of care based on observation and rating by an administrative
staff member and will be labeled here the QOC rating. The second is the ratio of the number of recorded patient complaints relative
to the number of patients in the unit over a fixed period of time and will be termed here the Complaint Ratio. In this scenario, the
administrator could use either the QOC rating or Complaint Ratio as the basis for assigning units to receive the training. Similarly,
the effects of the training could be measured on either variable. Figure 5 shows four outcomes of alternative RD implementations
possible under this scenario.

Only the regression lines are shown in the figure. It is worth noting that even though all four outcomes have the same pattern of
regression lines, they do not imply the same result. In Figures 5a and 5b, hospital units were assigned to training because they
scored below some cutoff score on the QOC rating. In Figures 5c and 5d units were given training because they scored above the
cutoff score value on the Complaint Ratio measure. In each figure, the dashed line indicates the regression line we would expect to
find for the training group if the training had no effect. This dashed line represents the no-discontinuity projection of the comparison
group regression line into the region of the program group pretest scores.

We can clearly see that even though the outcome regression lines are the same in all four groups, we would interpret the four
graphs differently. Figure 5a depicts a positive effect because training raised the program group regression line on the QOC rating
over what would have been expected. Figure 5b however shows a negative effect because the program raised training group
scores on the Complaint Ratio indicating increased complaint rates. In Figure 5c we see a positive effect because the regression
line has been lowered on the Complaint Ratio relative to what we would have expected. Finally, Figure 5d shows a negative effect
where the training resulted in lower QOC ratings than we would expect otherwise. The point here is a simple one. A discontinuity in
regression lines indicates a program effect in the RD design. But the discontinuity alone is not sufficient to tell us whether the effect
is positive or negative. In order to make this determination, we need to know who received the program and how to interpret the
direction of scale values on the outcome measures.

The Role of the Comparison Group in RD Designs. With this introductory discussion of the design in mind, we can now see
what constitutes the benchmark for comparison in the RD design. In experimental or other quasi-experimental designs we either
assume or try to provide evidence that the program and comparison groups are equivalent prior to the program so that post-
program differences can be attributed to the manipulation. The RD design involves no such assumption. Instead, with RD designs
we assume that in the absence of the program the pre-post relationship would be equivalent for the two groups. Thus, the strength
of the RD design is dependent on two major factors. The first is the assumption that there is no spurious discontinuity in the pre-
post relationship which happens to coincide with the cutoff point. The second factor concerns the degree to which we can know and
correctly model the pre-post relationship and constitutes the major problem in the statistical analysis of the RD design which will be
discussed below.

The Internal Validity of the RD Design. Internal validity refers to whether one can infer that the treatment or program being
investigated caused a change in outcome indicators. Internal validity as conceived is not concerned with our ability to generalize
but rather focuses on whether a causal relationship can be demonstrated for the immediate research context. Research designs
which address causal questions are often compared on their relative ability to yield internally valid results.

In most causal hypothesis tests, the central inferential question is whether any observed outcome differences between groups are
attributable to the program or instead to some other factor. In order to argue for the internal validity of an inference, the analyst
must attempt to demonstrate that the program -- and not some plausible alternative explanation -- is responsible for the effect. In
the literature on internal validity, these plausible alternative explanations or factors are often termed "threats" to internal validity. A
number of typical threats to internal validity have been identified. For instance, in a one-group pre-post study a gain from pretest to
posttest may be attributable to the program or to other plausible factors such as historical events occurring between pretest and
posttest, or natural maturation over time.

Many threats can be ruled out with the inclusion of a control group. Assuming that the control group is equivalent to the program
group prior to the study, the control group pre-post gain will provide evidence for the change which should be attributed to all
factors other than the program. A different rate of gain in the program group provides evidence for the relative effect of the program
itself. Thus, we consider randomized experimental designs to be strong in internal validity because of our confidence in the
probabilistic pre-program equivalence between groups which results from random assignment and helps assure that the control
group will provide a legitimate reflection of all non-program factors that might affect outcomes.

In designs that do not use random assignment, the central internal validity concern revolves around the possibility that groups may
not be equivalent prior to the program. We use the term "selection bias" to refer to the case where pre-program differences
between groups are responsible for post-program differences. Any non-program factor which is differentially present across groups
can constitute a selection bias or a selection threat to internal validity.

In RD designs, because of the deliberate pre-program differences between groups, there are several selection threats to internal
validity which might, at first glance, appear to be a problem. For instance, a selection-maturation threat implies that different rates
of maturation between groups might explain outcome differences. For the sake of argument, let's consider a pre-post distribution
with a linear relationship having a slope equal to two units. This implies that on the average a person with a given pretest score will
have a posttest score two times as high. Clearly there is maturation in this situation, that is, people are getting consistently higher
scores over time. If a person has a pretest score of 10 units, we would predict a posttest score of 20 for an absolute gain of 10. But,
if a person has a pretest score of 50 we would predict a posttest score of 100 for an absolute gain of 50. Thus the second person
naturally gains or matures more in absolute units (although the rate of gain relative to the pretest score is constant). Along these
lines, in the RD design we expect that all participants may mature and that in absolute terms this maturation may be different for
the two groups on average. Nevertheless, a program effect in the RD design is not indicated by a difference between the posttest
averages of the groups, but rather by a change in the pre-post relationship at the cutoff point. In this example, although we expect
different absolute levels of maturation, a single continuous regression line with a slope equal to 2 would describe these different
maturational rates. More to the point, in order for selection-maturation to be a threat to internal validity in RD designs, it must
induce a discontinuity in the pre-post relationship which happens to coincide with the cutoff point -- an unlikely scenario in most
studies.

Another selection threat to internal validity which might intuitively seem likely concerns the possibility of differential regression to the
mean or a selection-regression threat. The phenomenon of regression to the mean arises when we asymmetrically sample groups
from a distribution. On any subsequent measure the obtained sample group mean will be closer to the population mean for that
measure (in standardized units) than the sample mean from the original distribution is to its population mean. In RD designs we
deliberately create asymmetric samples and consequently expect regression towards the mean in both groups. In general we
expect the low-scoring pretest group to evidence a relative gain on the posttest and the high-scoring pretest group to show a
relative loss. As with selection-maturation, even though we expect to see differential regression to the mean this poses no problem
for the internal validity of the RD design. We don't expect that regression to the mean will result in a discontinuity in the bivariate
relationship coincidental with the cutoff point. In fact, the regression to the mean that will occur is expected to be continuous across
the range of the pretest scores and is described by the regression line itself. (We should recall that the term "regression" was
originally used by Galton to refer to the fact that a regression line describes regression to the mean.)
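A quick simulation sketch (my own, under a simple true-score-plus-error model) illustrates the point: both asymmetric groups show regression to the mean, yet a single regression line describes the pre-post relationship with no displacement near the cutoff.

# Regression to the mean, by itself, produces no discontinuity at the cutoff.
import numpy as np

rng = np.random.default_rng(4)
true_score = rng.normal(50, 10, 100_000)
pre = true_score + rng.normal(0, 5, 100_000)     # pretest = true score + error
post = true_score + rng.normal(0, 5, 100_000)    # posttest = true score + new error

low = pre < 50                                   # asymmetric samples around a cutoff
print(post[low].mean() - pre[low].mean())        # low scorers gain on average
print(post[~low].mean() - pre[~low].mean())      # high scorers lose on average

# Fit one line to all cases and check the mean residual just below the cutoff;
# it is essentially zero, i.e., no displacement that mimics a program effect.
b, a = np.polyfit(pre, post, 1)                  # slope, intercept
near_cutoff_low = (pre > 45) & (pre < 50)
print((post[near_cutoff_low] - (a + b * pre[near_cutoff_low])).mean())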

Although the RD design may initially seem susceptible to selection biases, it is not. The above discussion demonstrates that only
factors that would naturally induce a discontinuity in the pre-post relationship could be considered threats to the internal validity of
inferences from the RD design. In principle then the RD design is as strong in internal validity as its randomized experimental
alternatives. In practice, however, the validity of the RD design depends directly on how well the analyst can model the true pre-
post relationship, certainly a nontrivial statistical problem as is discussed in the statistical analysis of the regression-discontinuity
design.

The RD Design and Accountability. It makes sense intuitively that the accountability of a program is largely dependent on the
explicitness of the assignment or allocation of the program to recipients. Lawmakers and administrators need to recognize that
programs are more evaluable and accountable when the allocation of the program is more public and verifiable. The three major
pre-post designs -- the Pre-Post Randomized Experiments, the RD Design, and the Nonequivalent Groups Design -- are analogous
to the three types of program allocation schemes which legislators or administrators might choose. Randomized experiments are
analogous to the use of a lottery for allocating the program. RD designs can be considered explicit, accountable methods for
assigning program recipients on the basis of need or merit. Nonequivalent group designs might be considered a type of political
allocation because they enable the use of unverifiable, subjective or politically-motivated assignment. Most social programs are
politically allocated. Even when programs are allocated primarily on the basis of need or merit, the regulatory agency usually
reserves some discretionary capability in deciding who receives the program. Without debating the need for such discretion, it is
clear that the methodological community should encourage administrators and legislators who wish their programs to be
accountable to make explicit their criteria for program eligibility by either using probabilistically based lotteries or by relying on
quantitative eligibility ratings and cutoff values as in the RD design. To the extent that legislators and administrators can be
convinced to move toward more explicit assignment criteria, both the potential utility of the RD design and the accountability of the
programs will be increased.

Ethics and the RD Design

The discussion above argues that the RD Design is strong in internal validity, certainly stronger than the Nonequivalent Groups
Design, and perhaps as strong as the Randomized Experiments. But we know that the RD Designs are not as statistically powerful
as the Randomized Experiments. That is, in order to achieve the same level of statistical accuracy, an RD Design needs as many as 2.75 times the participants of a randomized experiment. For instance, if a Randomized Experiment needs 100 participants to
achieve a certain level of power, the RD design might need as many as 275.

So why would we ever use the RD Design instead of a randomized one? The real allure of the RD Design is that it allows us to
assign the treatment or program to those who most need or deserve it. Thus, the real attractiveness of the design is ethical -- we
don't have to deny the program or treatment to participants who might need it as we do in randomized studies.
Statistical Analysis of The Regression-Discontinuity Design


Internal Validity is the approximate truth about inferences regarding cause-effect or causal relationships. Thus, internal validity is only relevant in studies that try to establish a causal relationship. It's not relevant in most observational or descriptive studies, for instance. But for studies that assess the effects of social programs or interventions, internal validity is perhaps the primary consideration. In those contexts, you would like to be able to conclude that your program or treatment made a difference -- it improved test scores or reduced symptomology. But there may be lots of reasons, other than your program, why test scores may improve or symptoms may reduce. The key question in internal validity is whether observed changes can be attributed to your program or intervention (i.e., the cause) and not to other possible causes (sometimes described as "alternative explanations" for the outcome).

One of the things that's most difficult to grasp about internal validity is that it is only relevant to the specific
study in question. That is, you can think of internal
validity as a "zero generalizability" concern. All that
internal validity means is that you have evidence that
what you did in the study (i.e., the program) caused what
you observed (i.e., the outcome) to happen. It doesn't tell
you whether what you did for the program was what you
wanted to do or whether what you observed was what
you wanted to observe -- those are construct validity
concerns. It is possible to have internal validity in a study
and not have construct validity. For instance, imagine a
study where you are looking at the effects of a new
computerized tutoring program on math performance in
first grade students. Imagine that the tutoring is unique in
that it has a heavy computer game component and you
think that's what will really work to improve math
performance. Finally, imagine that you were wrong (hard,
isn't it?) -- it turns out that math performance did improve, and that it was because of something you did, but that it had nothing to do with the computer program.
What caused the improvement was the individual attention that the adult tutor gave to the child -- the computer program didn't make any difference. This study
would have internal validity because something that you did affected something that you observed -- you did cause something to happen. But the study would not
have construct validity, specifically, the label "computer math program" does not accurately describe the actual cause (perhaps better described as "personal
adult attention").

Since the key issue in internal validity is the causal one, we'll begin by considering what conditions need to be met in order to establish a causal relationship in
your project. Then we'll consider the different threats to internal validity -- the kinds of criticisms your critics will raise when you try to conclude that your program
caused the outcome. For convenience, we divide the threats to validity into three categories. The first involve the single group threats -- criticisms that apply when
you are only studying a single group that receives your program. The second consists of the multiple group threats -- criticisms that are likely to be raised when
you have several groups in your study (e.g., a program and a comparison group). Finally, we'll consider what I call the social threats to internal validity -- threats
that arise because social research is conducted in real-world human contexts where people will react to not only what affects them, but also to what is happening
to others around them.

What is Research Design?

Research design can be thought of as the structure of research -- it is the "glue" that holds all of the elements in a research project together. We often describe a design using a
concise notation that enables us to summarize a complex design structure efficiently. What are the "elements" that a design includes? They are:

Observations or Measures

These are symbolized by an 'O' in design notation. An O can refer to a single measure (e.g., a measure of body weight), a single instrument with
multiple items (e.g., a 10-item self-esteem scale), a complex multi-part instrument (e.g., a survey), or a whole battery of tests or measures given out
on one occasion. If you need to distinguish among specific measures, you can use subscripts with the O, as in O1, O2, and so on.

Treatments or Programs

These are symbolized with an 'X' in design notations. The X can refer to a simple intervention (e.g., a one-time surgical technique) or to a complex
hodgepodge program (e.g., an employment training program). Usually, a no-treatment control or comparison group has no symbol for the treatment
(some researchers use X+ and X- to indicate the treatment and control respectively). As with observations, you can use subscripts to distinguish
different programs or program variations.

Groups

Each group in a design is given its own line in the design structure. If the design notation has three lines, there are three groups in the design.

Assignment to Group

Assignment to group is designated by a letter at the beginning of each line (i.e., group) that describes how the group was assigned. The major types
of assignment are:

R = random assignment
N = nonequivalent groups
C = assignment by cutoff

Time

Time moves from left to right. Elements that are listed on the left occur before elements that are listed on the right.
Design Notation Examples

It's always easier to explain design notation through examples than it is to describe it in words. The figure shows the design notation for a pretest-
posttest (or before-after) treatment versus comparison group
randomized experimental design. Let's go through each of the parts of
the design. There are two lines in the notation, so you should realize that
the study has two groups. There are four Os in the notation, two on each
line and two for each group. When the Os are stacked vertically on top of
each other it means they are collected at the same time. In the notation
you can see that we have two Os that are taken before (i.e., to the left of)
any treatment is given -- the pretest -- and two Os taken after the
treatment is given -- the posttest. The R at the beginning of each line
signifies that the two groups are randomly assigned (making it an
experimental design). The design is a treatment versus comparison group
one because the top line (treatment group) has an X while the bottom line (control group) does not. You should be able to see why many of my students have called this type of notation the "tic-tac-toe" method of design notation -- there are lots of Xs and Os!
Sometimes we have to be more specific in describing the Os or Xs than just using a single letter. In the second figure, we
have the identical research design with some subscripting of the Os. What does this mean? Because all of the Os have a
subscript of 1, there is some measure or set of measures that is collected for both groups on both occasions. But the design also has two Os with a subscript of 2, both taken at the
posttest. This means that there was some measure or set of measures that were collected only at the posttest.

With this simple set of rules for describing a research design in notational form, you can concisely explain even complex design structures. And, using a notation helps to show
common design sub-structures across different designs that we might not recognize as easily without the notation.
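If you like to tinker, the notation is simple enough to represent directly. The following toy sketch (not a standard tool, just an illustration) encodes the R O X O / R O O design as strings, one line per group, and reads off a few facts about it.

# A playful sketch that encodes design notation as simple strings and derives facts.
design = [
    "R O X O",   # treatment group: random assignment, pretest, program, posttest
    "R O   O",   # control group: random assignment, pretest, posttest
]

n_groups = len(design)                                          # one line per group
randomized = all(line.strip().startswith("R") for line in design)
has_pretest = all("O" in line.split("X")[0] for line in design)  # an O before any X

print(n_groups, randomized, has_pretest)   # 2 True True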


What are the different major types of research designs? We can classify
designs into a simple threefold classification by asking some key
questions. First, does the design use random assignment to groups?
[Don't forget that random assignment is not the same thing as random
selection of a sample from a population!] If random assignment is used,
we call the design a randomized experiment or true experiment. If
random assignment is not used, then we have to ask a second question:
Does the design use either multiple groups or multiple waves of
measurement? If the answer is yes, we would label it a quasi-
experimental design. If no, we would call it a non-experimental design.
This threefold classification is especially useful for describing the design
with respect to internal validity. A randomized experiment generally is the
strongest of the three designs when your interest is in establishing a
cause-effect relationship. A non-experiment is generally the weakest in
this respect. I have to hasten to add here, that I don't mean that a non-
experiment is the weakest of the three designs overall, but only with
respect to internal validity or causal assessment. In fact, the simplest form
of non-experiment is a one-shot survey design that consists of nothing but
a single observation O. This is probably one of the most common forms of
research and, for some research questions -- especially descriptive ones
-- is clearly a strong design. When I say that the non-experiment is the
weakest with respect to internal validity, all I mean is that it isn't a
particularly good method for assessing the cause-effect relationship that you think might exist between a program and its outcomes.
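The two questions can even be written as a tiny decision rule; the sketch below is just my paraphrase of the classification, not an official procedure.

# A small sketch of the threefold classification described above.
def classify_design(random_assignment: bool,
                    multiple_groups_or_waves: bool) -> str:
    """Return the design type implied by the two key questions."""
    if random_assignment:
        return "randomized (true) experiment"
    if multiple_groups_or_waves:
        return "quasi-experimental design"
    return "non-experimental design"

print(classify_design(True, True))    # randomized (true) experiment
print(classify_design(False, True))   # quasi-experimental design
print(classify_design(False, False))  # non-experimental design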
To illustrate the different types of designs, consider one of each in design
notation. The first design is a posttest-only randomized experiment. You
can tell it's a randomized experiment because it has an R at the beginning
of each line, indicating random assignment. The second design is a pre-
post nonequivalent groups quasi-experiment. We know it's not a
randomized experiment because random assignment wasn't used. And
we know it's not a non-experiment because there are both multiple groups
and multiple waves of measurement. That means it must be a quasi-
experiment. We add the label "nonequivalent" because in this design we
do not explicitly control the assignment and the groups may be
nonequivalent or not similar to each other (see nonequivalent group
designs). Finally, we show a posttest-only nonexperimental design. You
might use this design if you want to study the effects of a natural disaster
like a flood or tornado and you want to do so by interviewing survivors.
Notice that in this design, you don't have a comparison group (e.g.,
interview in a town down the road that didn't have the tornado to
see what differences the tornado caused) and you don't have multiple
waves of measurement (e.g., a pre-tornado level of how people in the
ravaged town were doing before the disaster). Does it make sense to do
the non-experimental study? Of course! You could gain lots of valuable
information by well-conducted post-disaster interviews. But you may have
a hard time establishing which of the things you observed are due to the
disaster rather than to other factors like the peculiarities of the town or pre-
disaster characteristics.


Experimental designs are often touted as the most "rigorous" of all research designs, or as the "gold standard" against which all other designs are judged. In one sense, they probably are. If you can implement an experimental design well (and that is a big "if" indeed), then the experiment is probably the strongest design with respect to internal validity. Why? Recall that internal validity is at the center of all causal or cause-effect inferences. When you want to determine whether some program or treatment causes some outcome or outcomes to occur, then you are interested in having strong internal validity. Essentially, you want to assess the proposition:

If X, then Y

or, in more colloquial terms:

If the program is given, then the outcome occurs

Unfortunately, it's not enough just to show that when the program or treatment occurs the expected outcome also happens. That's because there may be
lots of reasons, other than the program, for why you observed the outcome. To really show that there is a causal relationship, you have to simultaneously
address the two propositions:

If X, then Y

and

If not X, then not Y

Or, once again more colloquially:

If the program is given, then the outcome occurs

and

If the program is not given, then the outcome does not occur

If you are able to provide evidence for both of these propositions, then you've in effect isolated the program from all of the other potential causes of the
outcome. You've shown that when the program is present the outcome occurs and when it's not present, the outcome doesn't occur. That points to the
causal effectiveness of the program.

Think of all this like a fork in the road. Down one path, you implement the program and observe the outcome. Down the other path, you don't implement the
program and the outcome doesn't occur. But, how do we take both paths in the road in the same study? How can we be in two places at once? Ideally,
what we want is to have the same conditions -- the same people, context, time, and so on -- and see whether when the program is given we get the outcome
and when the program is not given we don't. Obviously, we can never achieve this hypothetical situation. If we give the program to a group of people, we
can't simultaneously not give it! So, how do we get out of this apparent dilemma?

Perhaps we just need to think about the problem a little differently. What if we could create two groups or contexts that are as similar as we can possibly
make them? If we could be confident that the two situations are comparable, then we could administer our program in one (and see if the outcome occurs)
and not give the program in the other (and see if the outcome doesn't occur). And, if the two contexts are comparable, then this is like taking both forks in
the road simultaneously! We can have our cake and eat it too, so to speak.

That's exactly what an experimental design tries to achieve. In the simplest type of experiment, we create two groups that are "equivalent" to each other.
One group (the program or treatment group) gets the program and the other group (the comparison or control group) does not. In all other respects, the
groups are treated the same. They have similar people, live in similar contexts, have similar backgrounds, and so on. Now, if we observe differences in
outcomes between these two groups, then the differences must be due to the only thing that differs between them -- that one got the program and the other
didn't.

OK, so how do we create two groups that are "equivalent"? The approach used in experimental design is to assign people randomly from a common pool of
people into the two groups. The experiment relies on this idea of random assignment to groups as the basis for obtaining two groups that are similar. Then,
we give one the program or treatment and we don't give it to the other. We observe the same outcomes in both groups.

The key to the success of the experiment is in the random assignment. In fact, even with random assignment we never expect that the groups we create
will be exactly the same. How could they be, when they are made up of different people? We rely on the idea of probability and assume that the two
groups are "probabilistically equivalent" or equivalent within known probabilistic ranges.

So, if we randomly assign people to two groups, and we have enough people in our study to achieve the desired probabilistic equivalence, then we may
consider the experiment to be strong in internal validity and we probably have a good shot at assessing whether the program causes the outcome(s).
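Here is a brief sketch (with made-up pretest scores) of what random assignment from a common pool looks like in practice, and of the "probabilistic equivalence" you can expect: the two group means come out close, but never identical.

# Random assignment from a common pool, and a check on probabilistic equivalence.
import numpy as np

rng = np.random.default_rng(5)
pool = rng.normal(100, 15, 200)          # pretest scores for a common pool of people

shuffled = rng.permutation(pool)         # random assignment: shuffle, then split
program, control = shuffled[:100], shuffled[100:]

print(program.mean(), control.mean())    # close, but not exactly the same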

But there are lots of things that can go wrong. We may not have a large enough sample. Or, we may have people who refuse to participate in our study or
who drop out part way through. Or, we may be challenged successfully on ethical grounds (after all, in order to use this approach we have to deny the
program to some people who might be equally deserving of it as others). Or, we may get resistance from the staff in our study who would like some of their
"favorite" people to get the program. Or, they mayor might insist that her daughter be put into the new program in an educational study because it may
mean she'll get better grades.

The bottom line here is that experimental design is intrusive and difficult to carry out in most real world contexts. And, because an experiment is often an
intrusion, you are to some extent setting up an artificial situation so that you can assess your causal relationship with high internal validity. If so, then you
are limiting the degree to which you can generalize your results to real contexts where you haven't set up an experiment. That is, you have reduced your
external validity in order to achieve greater internal validity.

In the end, there is just no simple answer (no matter what anyone tells you!). If the situation is right, an experiment can be a very strong design to use. But
it isn't automatically so. My own personal guess is that randomized experiments are probably appropriate in no more than 10% of the social research
studies that attempt to assess causal relationships.

Experimental design is a fairly complex subject in its own right. I've been discussing the simplest of experimental designs -- a two-group program versus
comparison group design. But there are lots of experimental design variations that attempt to accomplish different things or solve different problems. In this
section you'll explore the basic design and then learn some of the principles behind the major variations.

Copyright ©2002, William M.K. Trochim, All Rights Reserved



A quasi-experimental design is one that looks a bit like an experimental design but lacks the key ingredient -- random assignment. My mentor, Don
Campbell, often referred to them as "queasy" experiments because they give the experimental purists a queasy feeling. With respect to internal validity,
they often appear to be inferior to randomized experiments. But there is something compelling about these designs; taken as a group, they are easily
more frequently implemented than their randomized cousins.

I'm not going to try to cover the quasi-experimental designs comprehensively. Instead, I'll present two of the classic quasi-experimental designs in some
detail and show how we analyze them. Probably the most commonly used quasi-experimental design (and it may be the most commonly used of all
designs) is the nonequivalent groups design. In its simplest form it requires a pretest and posttest for a treated and comparison group. It's identical to the
Analysis of Covariance design except that the groups are not created through random assignment. You will see that the lack of random assignment, and
the potential nonequivalence between the groups, complicates the statistical analysis of the nonequivalent groups design.
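
To see what that analysis looks like in its simplest form, here is a minimal Python sketch (the simulated data, the 5-point program effect, and the variable names are illustrative assumptions, not the Knowledge Base's own example). It fits the ANCOVA-style model -- posttest regressed on pretest plus a group indicator -- and, with error-free pretests, recovers the effect; the complications mentioned above arise when the groups are nonequivalent and the pretest contains measurement error.

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative nonequivalent groups: the treated group starts out lower
    # on the pretest (self-selection), and the program adds 5 points.
    n = 100
    group = np.r_[np.ones(n), np.zeros(n)]            # 1 = program, 0 = comparison
    pretest = np.r_[rng.normal(45, 10, n), rng.normal(55, 10, n)]
    posttest = 10 + 0.8 * pretest + 5 * group + rng.normal(0, 5, 2 * n)

    # ANCOVA-style model: posttest = b0 + b1*pretest + b2*group
    X = np.column_stack([np.ones(2 * n), pretest, group])
    b, *_ = np.linalg.lstsq(X, posttest, rcond=None)
    print(f"estimated program effect (group coefficient): {b[2]:.2f}")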

The second design I'll focus on is the regression-discontinuity design. I'm not including it just because I did my dissertation on it and wrote a book about it
(although those were certainly factors weighing in its favor!). I include it because I believe it is an important and often misunderstood alternative to
randomized experiments because its distinguishing characteristic -- assignment to treatment using a cutoff score on a pretreatment variable -- allows us to
assign to the program those who need or deserve it most. At first glance, the regression discontinuity design strikes most people as biased because of
regression to the mean. After all, we're assigning low scorers to one group and high scorers to the other. In the discussion of the statistical analysis of the
regression discontinuity design, I'll show you why this isn't the case.
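
As a preview of that argument, here is a minimal simulated sketch (the cutoff of 50, the 10-point effect, and the straight-line model are illustrative assumptions, not the full analysis presented later). Because assignment is completely determined by the pretest, modeling the pretest-posttest relationship and estimating the jump at the cutoff recovers the program effect even though the two groups differ sharply on the pretest.

    import numpy as np

    rng = np.random.default_rng(1)

    # Illustrative regression-discontinuity setup: everyone scoring below the
    # cutoff (50) gets the program, and the program adds 10 points.
    pretest = rng.normal(50, 10, 500)
    treated = (pretest < 50).astype(float)
    posttest = 20 + 0.9 * pretest + 10 * treated + rng.normal(0, 5, 500)

    # Model the pre-post relationship (pretest centered at the cutoff) and
    # estimate the discontinuity -- the jump -- at the cutoff itself.
    X = np.column_stack([np.ones_like(pretest), pretest - 50, treated])
    b, *_ = np.linalg.lstsq(X, posttest, rcond=None)
    print(f"estimated jump at the cutoff: {b[2]:.2f}")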

Finally, I'll briefly present an assortment of other quasi-experiments that have specific applicability or noteworthy features, including the Proxy Pretest
Design, Double Pretest Design, Nonequivalent Dependent Variables Design, Pattern Matching Design, and the Regression Point Displacement design. I
had the distinct honor of co-authoring a paper with Donald T. Campbell that first described the Regression Point Displacement Design. At the time of his
death in Spring 1996, we had gone through about five drafts each over a five-year period. The paper includes numerous
examples of this newest of quasi-experiments, and provides a detailed description of the statistical analysis of the regression point displacement design.

There is one major class of quasi-experimental designs that are not included here -- the interrupted time series designs. I plan to include them in later
rewrites of this material.

Copyright ©2002, William M.K. Trochim, All Rights Reserved



There are three major types of pre-post program-comparison group designs all sharing the basic design structure shown in the notation above:

The Randomized Experimental (RE) Design


The Nonequivalent Group (NEGD) Design
The Regression-Discontinuity (RD) Design

The designs differ in the method by which participants are assigned to the two groups. In the RE, participants are assigned randomly. In the RD design, they are assigned using a
cutoff score on the pretest. In the NEGD, assignment of participants is not explicitly controlled -- they may self select into either group, or other unknown or unspecified factors may
determine assignment.

Because these three designs differ so critically in their assignment strategy, they are often considered distinct or unrelated. But it is useful to look at them as forming a continuum,
both in terms of assignment and in terms of their strength with respect to internal validity.

We can look at the similarity of the three designs in terms of their assignment by graphing their assignment functions with respect to the pretest variable. In the figure, the vertical axis is
the probability that a specific unit (e.g., person) will be assigned to the treatment group. These values, because they are probabilities, range from 0 to 1. The horizontal axis is an
idealized pretest score.

Let's first examine the assignment function for the simple pre-post randomized experiment. Because units are assigned randomly, we know that the probability that a unit will be
assigned to the treatment group is always 1/2 or .5 (assuming equal assignment probabilities are used). This function is indicated by the horizontal red line at .5 in the figure. For the
RD design, we arbitrarily set the cutoff value at the midpoint of the pretest variable and assume that we assign units scoring below that value to the treatment and those scoring at or
above that value to the control condition (the arguments made here would generalize to the case of high-scoring treatment cases as well). In this case, the assignment function is a
simple step function, with the probability of assignment to the treatment = 1 for the pretest scores below the cutoff and = 0 for those above. It is important to note that for both the RE
and RD designs it is an easy matter to plot their assignment functions because assignment is explicitly controlled. This is not the case for the NEGD. Here, the idealized assignment
function differs depending on the degree to which the groups are nonequivalent on the pretest. If they are extremely nonequivalent (with the treatment group scoring lower on the
pretest), the assignment function would approach the step function of the RD design. If the groups are hardly nonequivalent at all, the function would approach the flat-line function
of the randomized experiment.
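
Here is a small Python sketch of the three idealized assignment functions just described. The flat line and the step function follow directly from the text; the logistic curve used for the NEGD is only one convenient way to represent "somewhere between the flat line and the step" and is an assumption, not something the text specifies.

    import math

    def p_treatment_re():
        """Randomized experiment: every unit has the same .5 chance of treatment."""
        return 0.5

    def p_treatment_rd(pretest, cutoff=50):
        """Regression-discontinuity: a step function -- below the cutoff means treatment."""
        return 1.0 if pretest < cutoff else 0.0

    def p_treatment_negd(pretest, cutoff=50, steepness=0.2):
        """One idealized NEGD: between the flat line and the step, depending on
        how strongly the pretest drives (self-)selection into the groups."""
        return 1.0 / (1.0 + math.exp(steepness * (pretest - cutoff)))

    for score in (30, 45, 50, 55, 70):
        print(score, p_treatment_re(), p_treatment_rd(score), round(p_treatment_negd(score), 2))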

The graph of assignment functions points to an important issue about the relationships among these designs -- the designs are not distinct with respect to their assignment functions;
they form a continuum. On one end of the continuum is the RE design and at the other is the RD. The NEGD can be viewed as a degraded RD or RE depending on whether the
assignment function more closely approximates one or the other.

We can also view the designs on a continuum with respect to the degree to which they generate a pretest difference between the groups.

The figure shows that the RD design induces the maximum possible pretest difference. The RE design induces the smallest pretest difference (the most equivalent). The NEGD fills
in the gap between these two extreme cases. If the groups are very nonequivalent, the design is closer to the RD design. If they're very similar, it's closer to the RE design.

Finally, we can also distinguish the three designs in terms of the a priori knowledge they give about assignment. It should be clear that in the RE design we know perfectly the
probability of assignment to treatment -- it is .5 for each participant. Similarly, with the RD design we also know perfectly the probability of assignment. In this case it is precisely
dependent on the cutoff assignment rule. It is dependent on the pretest, whereas the RE design is not. In both these designs, we know the assignment function perfectly, and it is this
knowledge that enables us to obtain unbiased estimates of the treatment effect with these designs. This is why we conclude that, with respect to internal validity, the RD design is as
strong as the RE design. With the NEGD, however, we do not know the assignment function perfectly. Because of this, we need to model this function either directly or indirectly (e.g., through reliability corrections).

The major point is that we should not look at these three designs as entirely distinct. They are related by the nature of their assignment functions and the degree of pretest
nonequivalence between groups. This continuum has important implications for understanding the statistical analyses of these designs.

Copyright ©2002, William M.K. Trochim, All Rights Reserved



Reprinted from Trochim, W. and Land, D. (1982). Designing Designs for Research. The Researcher, 1, 1, 1-6.

Much contemporary social research is devoted to examining whether a program, treatment, or manipulation causes some outcome or result. For example, we might wish to know
whether a new educational program causes subsequent achievement score gains, whether a special work release program for prisoners causes lower recidivism rates, whether a
novel drug causes a reduction in symptoms, and so on. Cook and Campbell (1979) argue that three conditions must be met before we can infer that such a cause-effect relation
exists:

1. Covariation. Changes in the presumed cause must be related to changes in the presumed effect. Thus, if we introduce, remove, or change the level of a treatment or
program, we should observe some change in the outcome measures.
2. Temporal Precedence. The presumed cause must occur prior to the presumed effect.
3. No Plausible Alternative Explanations. The presumed cause must be the only reasonable explanation for changes in the outcome measures. If there are other factors
which could be responsible for changes in the outcome measures we cannot be confident that the presumed cause-effect relationship is correct.

In most social research the third condition is the most difficult to meet. Any number of factors other than the treatment or program could cause changes in outcome measures.
Campbell and Stanley (1966) and later, Cook and Campbell (1979) list a number of common plausible alternative explanations (or, threats to internal validity). For example, it may
be that some historical event which occurs at the same time that the program or treatment is instituted was responsible for the change in the outcome measures; or, changes in
record keeping or measurement systems which occur at the same time as the program might be falsely attributed to the program. The reader is referred to standard research
methods texts for more detailed discussions of threats to validity.

This paper is primarily heuristic in purpose. Standard social science methodology textbooks (Cook and Campbell 1979; Judd and Kenny, 1981) typically present an array of research
designs and the alternative explanations which these designs rule out or minimize. This tends to foster a "cookbook" approach to research design - an emphasis on the selection of
an available design rather than on the construction of an appropriate research strategy. While standard designs may sometimes fit real-life situations, it will often be necessary to
"tailor" a research design to minimize specific threats to validity. Furthermore, even if standard textbook designs are used, an understanding of the logic of design construction in
general will improve the comprehension of these standard approaches. This paper takes a structural approach to research design. While this is by no means the only strategy for
constructing research designs, it helps to clarify some of the basic principles of design logic.

Minimizing Threats to Validity

Good research designs minimize the plausible alternative explanations for the hypothesized cause-effect relationship. But such explanations may be ruled out or minimized in a
number of ways other than by design. The discussion which follows outlines five ways to minimize threats to validity, one of which is by research design:

1. By Argument. The most straightforward way to rule out a potential threat to validity is to simply argue that the threat in question is not a reasonable one. Such an argument
may be made either a priori or a posteriori, although the former will usually be more convincing than the latter. For example, depending on the situation, one might argue that
an instrumentation threat is not likely because the same test is used for the pre- and posttest measurements and the testing did not involve observers who might improve, or other such
factors. In most cases, ruling out a potential threat to validity by argument alone will be weaker than the other approaches listed below. As a result, the most plausible threats
in a study should not, except in unusual cases, be ruled out by argument only.
2. By Measurement or Observation. In some cases it will be possible to rule out a threat by measuring it and demonstrating that either it does not occur at all or occurs so
minimally as to not be a strong alternative explanation for the cause-effect relationship. Consider, for example, a study of the effects of an advertising campaign on
subsequent sales of a particular product. In such a study, history (i.e., the occurrence of other events which might lead to an increased desire to purchase the product) would
be a plausible alternative explanation. For example, a change in the local economy, the removal of a competing product from the market, or similar events could cause an
increase in product sales. One might attempt to minimize such threats by measuring local economic indicators and the availability and sales of competing products. If there
is no change in these measures coincident with the onset of the advertising campaign, these threats would be considerably minimized. Similarly, if one is studying the effects
of special mathematics training on math achievement scores of children, it might be useful to observe everyday classroom behavior in order to verify that students were not
receiving any math training in addition to that provided in the study.
3. By Design. Here, the major emphasis is on ruling out alternative explanations by adding treatment or control groups, waves of measurement, and the like. This topic will be
discussed in more detail below.
4. By Analysis. There are a number of ways to rule out alternative explanations using statistical analysis. One interesting example is provided by Jurs and Glass (1971). They
suggest that one could study the plausibility of an attrition or mortality threat by conducting a two-way analysis of variance (a brief worked sketch follows this list). One factor in this study would be the original
treatment group designations (i.e., program vs. comparison group), while the other factor would be attrition (i.e., dropout vs. non-dropout group). The dependent measure
could be the pretest or other available pre-program measures. A main effect on the attrition factor would be indicative of a threat to external validity or generalizability, while
an interaction between group and attrition factors would point to a possible threat to internal validity. Where both effects occur, it is reasonable to infer that there is a threat to
both internal and external validity.

The plausibility of alternative explanations might also be minimized using covariance analysis. For example, in a study of the effects of "workfare" programs on social welfare
case loads, one plausible alternative explanation might be the status of local economic conditions. Here, it might be possible to construct a measure of economic conditions
and include that measure as a covariate in the statistical analysis. One must be careful when using covariance adjustments of this type -- "perfect" covariates do not exist in
most social research and the use of imperfect covariates will not completely adjust for potential alternative explanations. Nevertheless, causal assertions are likely to be
strengthened by demonstrating that treatment effects occur even after adjusting on a number of good covariates.

5. By Preventive Action. When potential threats are anticipated they can often be ruled out by some type of preventive action. For example, if the program is a desirable one,
it is likely that the comparison group would feel jealous or demoralized. Several actions can be taken to minimize the effects of these attitudes including offering the program
to the comparison group upon completion of the study or using program and comparison groups which have little opportunity for contact and communication. In addition,
auditing methods and quality control can be used to track potential experimental dropouts or to insure the standardization of measurement.
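
To make the Jurs and Glass check described in item 4 concrete, here is a minimal Python sketch using simulated data (the sample sizes, the column names, and the use of the pandas and statsmodels libraries are illustrative assumptions, not part of the original paper).

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf
    from statsmodels.stats.anova import anova_lm

    rng = np.random.default_rng(2)

    # Illustrative data: pretest scores for a program and a comparison group,
    # crossed with whether each person later dropped out of the study.
    df = pd.DataFrame({
        "group":   np.repeat(["program", "comparison"], 100),
        "dropout": np.tile(np.repeat(["yes", "no"], 50), 2),
        "pretest": rng.normal(50, 10, 200),
    })

    # Two-way ANOVA on the pretest: a main effect of dropout suggests a threat
    # to external validity; a group-by-dropout interaction suggests a threat to
    # internal validity.
    model = smf.ols("pretest ~ C(group) * C(dropout)", data=df).fit()
    print(anova_lm(model, typ=2))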

The five categories listed above should not be considered mutually exclusive. The inclusion of measurements designed to minimize threats to validity will obviously be related to the
design structure and is likely to be a factor in the analysis. A good research plan should, where possible, make use of multiple methods for reducing threats. In general, reducing a
particular threat by design or preventive action will probably be stronger than by using one of the other three approaches. The choice of which strategy to use for any particular threat
is complex and depends at least on the cost of the strategy and on the potential seriousness of the threat.

Design Construction

Basic Design Elements. Most research designs can be constructed from four basic elements:

1. Time. A causal relationship, by its very nature, implies that some time has elapsed between the occurrence of the cause and the consequent effect. While for some
phenomena the elapsed time might be measured in microseconds and therefore might be unnoticeable to a casual observer, we normally assume that the cause and effect
in social science arenas do not occur simultaneously. In design notation we indicate this temporal element horizontally - whatever symbol is used to indicate the presumed
cause would be placed to the left of the symbol indicating measurement of the effect. Thus, as we read from left to right in design notation we are reading across time.
Complex designs might involve a lengthy sequence of observations and programs or treatments across time.
2. Program(s) or Treatment(s). The presumed cause may be a program or treatment under the explicit control of the researcher or the occurrence of some natural event or
program not explicitly controlled. In design notation we usually depict a presumed cause with the symbol "X". When multiple programs or treatments are being studied using
the same design, we can keep the programs distinct by using subscripts such as "X1" or "X2". For a comparison group (i.e., one which does not receive the program under
study) no "X" is used.
3. Observation(s) or Measure(s). Measurements are typically depicted in design notation with the symbol "O". If the same measurement or observation is taken at every point
in time in a design, then this "O" will be sufficient. Similarly, if the same set of measures is given at every point in time in this study, the "O" can be used to depict the entire
set of measures. However, if different measures are given at different times it is useful to subscript the "O" to indicate which measurement is being given at which point in
time.
4. Groups or Individuals. The final design element consists of the intact groups or the individuals who participate in various conditions. Typically, there will be one or more
program and comparison groups. In design notation, each group is indicated on a separate line. Furthermore, the manner in which groups are assigned to the conditions can
be indicated by an appropriate symbol at the beginning of each line. Here, "R" will represent a group which was randomly assigned, "N" will depict a group which was
nonrandomly assigned (i.e., a nonequivalent group or cohort) and a "C" will indicate that the group was assigned using a cutoff score on a measurement.

Perhaps the easiest way to understand how these four basic elements become integrated into a design structure is to give several examples. One of the most commonly used
designs in social research is the two-group pre-post design which can be depicted as:
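N  O  X  O
N  O     O
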
There are two lines in the design indicating that the study was comprised of two groups. The two groups were nonrandomly assigned as indicated by the "N". Both groups were
measured before the program or treatment occurred as indicated by the first "O" in each line. Following this preobservation, the group in the first line received a program or treatment
while the group in the second line did not. Finally, both groups were measured subsequent to the program. Another common design is the posttest-only randomized experiment. The
design can be depicted as:
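R  X  O
R     O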

Here, two groups are randomly selected with one group receiving the program and one acting as a comparison. Both groups are measured after the program is administered.

Expanding a Design. We can combine the four basic design elements in a number of ways in order to arrive at a specific design which is appropriate for the setting at hand. One
strategy for doing so begins with the basic causal relationship:
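X  O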

This is the most simple design in causal research and serves as a starting point for the development of better strategies. When we add to this basic design we are essentially
expanding one of the four basic elements described above. Each possible expansion has implications both for the cost of the study and for the threats which might be ruled out.

1. Expanding Across Time. We can add to the basic design by including additional observations either before or after the program or, by adding or removing the program or
different programs. For example, we might add one or more pre-program measurements and achieve the following design:

The addition of such pretests provides a "baseline" which, for instance, helps to assess the potential of a maturation or testing threat. If a change occurs between the first
and second pre-program measures, it is reasonable to expect that similar change might be seen between the second pretest and the posttest even in the absence of the
program. However, if no change occurs between the two pretests, one might be more confident in assuming that maturation or testing is not a likely alternative explanation
for the cause-effect relationship which was hypothesized. Similarly, additional postprogram measures could be added. This would be useful for determining whether an
immediate program effect decays over time, or whether there is a lag in time between the initiation of the program and the occurrence of an effect. We might also add and
remove the program over time:

This is one form of the ABAB design which is frequently used in clinical psychology and psychiatry. The design is particularly strong against a history threat. When the
program is repeated it is less likely that unique historical events can be responsible for replicated outcome patterns.

2. Expanding Across Programs. We have just seen that we can expand the program by adding it or removing it across time. Another way to expand the program would be to
partition it into different levels of treatment. For example, in a study of the effect of a novel drug on subsequent behavior, we might use more than one dosage of the drug:

This design is an example of a simple factorial design with one factor having two levels. Notice that group assignment is not specified, indicating that any type of assignment
might have been used. This is a common strategy in a "sensitivity" or "parametric" study where the primary focus is on the effects obtained at various program levels. In a
similar manner, one might expand the program by varying specific components of it across groups. This might be useful if one wishes to study different modes of the delivery
of the program, different sets of program materials and the like. Finally, we can expand the program by using theoretically polarized or "opposite" treatments. A comparison
group can be considered one example of such a polarization. Another might involve use of a second program which is expected to have an opposite effect on the outcome
measures. A strategy of this sort provides evidence that the outcome measure is sensitive enough to differentiate between different programs.

3. Expanding Across Observations. At any point in time in a research design it is usually desirable to collect multiple measurements. For example, we might add a number of
similar measures in order to determine whether the results of these converge. Or, we might wish to add measurements which theoretically should not be affected by the
program in question in order to demonstrate that the program discriminates between effects. Strategies of this type are useful for achieving convergent and discriminant
validity of measures as discussed in Campbell and Fiske (1959). Another way to expand the observations is by proxy measurements. Assume that we wish to study a new
educational program but neglected to take pre-program measurements. We might use a standardized achievement test for the posttest and grade point average records as
a proxy measure of student achievement prior to the initiation of the program. Finally, we might also expand the observations through the use of "recollected" measures.
Again, if we were conducting a study and had neglected to administer a pretest or desired information in addition to the pretest information, we might ask participants to
recall how they felt or behaved prior to the study and use this information as an additional measure. Different measurement approaches obviously yield data of different
quality. What is advocated here is the use of multiple measurements rather than reliance on only a single strategy.
4. Expanding Across Groups. Often, it will be to our advantage to add additional groups to a design in order to rule out specific threats to validity. For example, consider the
following pre-post two-group randomized experimental design:

If this design were implemented within a single institution where members of the two groups were in contact with each other, one might expect that intergroup
communication, group rivalry, or demoralization of a group which gets denied a desirable treatment or gains an undesirable one might pose threats to the validity of the
causal inference. In such a case, one might add an additional nonequivalent group from a similar institution which consists of persons unaware of the original two groups:

In a similar manner, whenever nonequivalent groups are used in a study it will usually be advantageous to have multiple replications of each group. The use of many
nonequivalent groups helps to minimize the potential of a particular selection bias affecting the results. In some cases it may be desirable to include the norm group as an
additional group in the design. Norming group averages are available for most standardized achievement tests for example, and might comprise an additional nonequivalent
control group. Cohort groups might also be used in a number of ways. For example, one might use a single measure of a cohort group to help rule out a testing threat:

In this design, the randomized groups might be sixth graders from the same school year while the cohort might be the entire sixth grade from the previous academic year.
This cohort group did not take the pretest and, if they are similar to the randomly selected control group, would provide evidence for or against the notion that taking the
pretest had an effect on posttest scores. We might also use pre-post cohort groups:

Here, the treatment group consists of sixth graders, the first comparison group of seventh graders in the same year, and the second comparison group consists of the
following year's sixth graders (i.e., the fifth graders during the study year). Strategies of this sort are particularly useful in nonequivalent designs where selection bias is a
potential problem and where routinely-collected institutional data is available. Finally, one other approach for expanding the groups involves partitioning groups with different
assignment strategies. For example, one might randomly divide nonequivalent groups, or select nonequivalent subgroups from randomly assigned groups. An example of
this sort involving the combination of random assignment and assignment by a cutoff is discussed in detail below.

A Simple Strategy for Design Construction

Considering the basic elements of a research design or the possibilities for expansion is not by itself sufficient. We need to be able to integrate these elements with an overall
strategy. Furthermore, we need to decide which potential threats are best handled by design rather than by argument, measurement, analysis, or preventive action.
While no definitive approach for designing designs exists, we might suggest a tentative strategy based on the notion of expansion discussed above. First, we begin the designing
task by setting forth a design which depicts the simple hypothesized causal relationship. Second, we deliberately over-expand this basic design by expanding across time, program,
observations, and groups. At this step, the emphasis is on accounting for as many likely alternative explanations as possible using the design. Finally, we then scale back this over-
expanded version considering the effect of eliminating each design component. It is at this point that we face the difficult decisions concerning the costs of each design component
and the advantages of ruling out specific threats using other approaches.

There are several advantages which result from using this type of approach to design construction. First, we are forced to be explicit about the decisions which are made. Second,
the approach is "conservative" in nature. The strategy minimizes the chance of our overlooking a major threat to validity in constructing our design. Third, we arrive at a design which
is "tailored" to the situation at hand. Finally, the strategy is cost-efficient. Threats which can be accounted for by some other, less costly, approach need not be accounted for in the
design itself.

An Example of a Hybrid Design

Some of the ideas discussed above can be illustrated in an example. The design in question is drawn from an earlier discussion by Boruch (1975). To our knowledge, this design
has never been used, although it has strong features to commend it.

Let us assume that we wish to study the effects of a new compensatory education program on subsequent student achievement. The program is designed to help students who are
poor in reading to improve in those skills. We can begin then with the simple hypothesized cause-effect relationship:
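X  O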

Here, the "X" represents the reading program and the "O" stands for a reading achievement test. We decide that it is desirable to add a pre-program measure so that we might
investigate whether the program "improves" reading test scores. We also decide to expand across groups by adding a comparison group. At this point we have the typical:
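O  X  O
O     O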

The next problem concerns how the two groups will be assigned. Since the program is specifically designed to help students who need special assistance in reading, we rule out
random assignment because it would require denying the program to students in need. We had considered the possibility of offering the program to one randomly assigned group in
the first year and to the control group in the second, but ruled that out on the grounds that it would require two years of program expenses and the denial of a potentially helpful
program for half of the students for a period of a year. Instead we decide to assign students by means of a cutoff score on the pretest. All students scoring below a preselected
percentile on the reading pretest would be given the program while those above that percentile would act as controls (i.e., the regression-discontinuity design). However, previous
experience with this strategy (Trochim, 1982) has shown us that it is difficult to adhere to a single cutoff score for assignment to group. We are especially concerned that teachers or
administrators will allow students who score slightly above the cutoff point into the program because they have little confidence in the ability of the achievement test to make fine
distinctions in reading skills for children who score very close to the cutoff. To deal with this potential problem, we decide to partition the groups using a particular combination of
assignment by a cutoff and random assignment:
In this design we have set up two cutoff points. All those scoring below a certain percentile are assigned to the treatment group automatically by this cutoff. All those scoring above
another higher percentile are automatically assigned to the comparison group by this cutoff. Finally, all those who fall in the interval between the cutoffs on the pretest are randomly
assigned to either the treatment or comparison groups.
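
A minimal Python sketch of this hybrid assignment rule (the particular percentile cutoffs are illustrative assumptions; the paper does not fix them):

    import random

    def assign(pretest_percentile, lower=25, upper=40):
        """Hybrid rule: clear cases are assigned by cutoff, borderline cases by lottery."""
        if pretest_percentile < lower:
            return "program"       # well below the lower cutoff: assigned by cutoff
        if pretest_percentile >= upper:
            return "comparison"    # at or above the upper cutoff: assigned by cutoff
        return random.choice(["program", "comparison"])   # borderline: random lottery

    # A student at the 10th percentile always gets the program; one at the 30th
    # percentile falls in the lottery interval and is assigned at random.
    print(assign(10), assign(30), assign(60))

The weighted lottery mentioned below would simply replace the simple coin flip with score-dependent probabilities.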

There are several advantages to this strategy. It directly addresses the concern of teachers and administrators that the test may not be able to discriminate well between students
who score immediately above or below a cutoff point. For example, a student whose true ability in reading would place him near the cutoff might have a bad day and therefore might
be placed into the treatment or comparison group by chance factors. The design outlined above is defensible. We can agree with the teachers and administrators that the test is
fallible. Nevertheless, since we need some criterion to assign students to the program, we can argue that the fairest approach would be to assign borderline cases by lottery. In
addition, by combining two excellent strategies (i.e., the randomized experiment and the regression-discontinuity) we can analyze results separately for each and address the
possibility that design factors might bias results.

There are many other worthwhile considerations not mentioned in the above scenario. For example, instead of using simple randomized assignment within the cutoff interval, we
might use a weighted random assignment so that students scoring lower in the interval have a greater probability of being assigned to the program. In addition, we might consider
expanding the design in a number of other ways, by including double pretests or multiple posttests; multiple measures of reading skills; additional replications of the program or
variations of the programs and additional groups such as norming groups, controls from other schools, and the like. Nevertheless, this brief example serves to illustrate the
advantages of explicitly constructing a research design to meet the specific needs of a particular situation.

The Nature of Good Design

Throughout the design construction task, it is important to have in mind some endpoint, some criteria which we should try to achieve before finally accepting a design strategy. The
criteria discussed below are only meant to be suggestive of the characteristics found in good research design. It is worth noting that all of these criteria point to the need to
individually tailor research designs rather than accepting standard textbook strategies as is.

1. Theory-Grounded. Good research strategies reflect the theories which are being investigated. Where specific theoretical expectations can be hypothesized these are
incorporated into the design. For example, where theory predicts a specific treatment effect on one measure but not on another, the inclusion of both in the design improves
discriminant validity and demonstrates the predictive power of the theory.
2. Situational. Good research designs reflect the settings of the investigation. This was illustrated above where a particular need of teachers and administrators was explicitly
addressed in the design strategy. Similarly, intergroup rivalry, demoralization, and competition might be assessed through the use of additional comparison groups who are
not in direct contact with the original group.
3. Feasible. Good designs can be implemented. The sequence and timing of events are carefully thought out. Potential problems in measurement, adherence to assignment,
database construction and the like, are anticipated. Where needed, additional groups or measurements are included in the design to explicitly correct for such problems.
4. Redundant. Good research designs have some flexibility built into them. Often, this flexibility results from duplication of essential design features. For example, multiple
replications of a treatment help to insure that failure to implement the treatment in one setting will not invalidate the entire study.
5. Efficient. Good designs strike a balance between redundancy and the tendency to overdesign. Where it is reasonable, other, less costly, strategies for ruling out potential
threats to validity are utilized.

This is by no means an exhaustive list of the criteria by which we can judge good research design. Nevertheless, goals of this sort help to guide the researcher toward a final design
choice and emphasize important components which should be included.

The development of a theory of research methodology for the social sciences has largely occurred over the past half century and most intensively within the past two decades. It is
not surprising, in such a relatively recent effort, that an emphasis on a few standard research designs has occurred. Nevertheless, by moving away from the notion of "design
selection" and towards an emphasis on design construction, there is much to be gained in our understanding of design principles and in the quality of our research.

References

Boruch, R.F. (1975). Coupling randomized experiments and approximations to experiments in social program evaluation. Sociological Methods and Research, 4, 1, 31-53.

Campbell, D.T. and Fiske, D.W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56, 81-105.

Campbell, D.T. and Stanley, J.C. (1963, 1966). Experimental and Quasi-Experimental Designs for Research. Rand McNally, Chicago, Illinois.

Cook, T.D. and Campbell, D.T. (1979). Quasi-Experimentation: Design and Analysis for Field Settings. Rand McNally, Chicago, Illinois.

Judd, C.M. and Kenny, D.A. (1981). Estimating the Effects of Social Interventions. Cambridge University Press, Cambridge, MA.

Jurs, S.G. and Glass, G.V. (1971). The effect of experimental mortality on the internal and external validity of the randomized comparative experiment. The Journal of Experimental
Education, 40, 1, 62-66.

Trochim, W. (1982). Methodologically-based discrepancies in compensatory education evaluations. Evaluation Review, 6, 4, 443-480.

Copyright ©2002, William M.K. Trochim, All Rights Reserved



Reprinted from Trochim, W. (Ed.), (1986). Editor's Notes. Advances in quasi-experimental design and analysis. New Directions for Program Evaluation Series, Number 31, San Francisco, CA:
Jossey-Bass.

The intent of this volume is to update, perhaps even to alter, our thinking about quasi-experimentation in applied social research and program evaluation. Since Campbell and
Stanley (1963) introduced the term quasi-experiment, we have tended to see this area as involving primarily two interrelated topics: the theory of the validity of causal inferences and
a taxonomy of the research designs that enable us to examine causal hypotheses. We can see this in the leading expositions of quasi-experimentation (Campbell and Stanley, 1963,
1966; Cook and Campbell, 1979) as well as in the standard textbook presentations of the topic (Kidder and Judd, 1986; Rossi and Freeman, 1985), where it is typical to have
separate sections or chapters that discuss validity issues first and then proceed to distinguishable quasi-experimental designs (for example, the pretest-posttest nonequivalent group
design, the regression-discontinuity design, the interrupted time series design). My first inclination in editing this volume was to emulate this tradition, beginning the volume with a
chapter on validity and following it with a chapter for each of the major quasi-experimental designs that raised the relevant conceptual and analytical issues and discussed recent
advances. But, I think, such an approach would have simply contributed to a persistent confusion about the nature of quasi-experimentation and its role in research.

Instead, this volume makes the case that we have moved beyond the traditional thinking on quasi-experiments as a collection of specific designs and threats to validity toward a
more integrated, synthetic view of quasi-experimentation as part of a general logical and epistemological framework for research. To support this view that the notion of quasi-
experimentation is evolving toward increasing integration, I will discuss a number of themes that seem to characterize our current thinking and that cut across validity typologies and
design taxonomies. This list of themes may also be viewed as a tentative description of the advances in our thinking about quasi-experimentation in social research.

The Role of Judgment

One theme that underlies most of the others and that illustrates our increasing awareness of the tentativeness and frailty of quasi-experimentation concerns the importance of
human judgment in research. Evidence bearing on a causal relationship emerges from many sources, and it is not a trivial matter to integrate or resolve conflicts or discrepancies. In
recognition of this problem of evidence, we are beginning to address causal inference as a psychological issue that can be illuminated by cognitive models of the judgmental process
(see Chapter One of this volume and Einhorn and Hogarth, 1986). We are also recognizing more clearly the sociological bases of scientific thought (Campbell, 1984) and the fact
that science is at root a human enterprise. Thus, a positivist, mechanistic view is all but gone from quasi-experimental thinking, and what remains is a more judgmental and more
scientifically sensible perspective.

The Case for Tailored Designs

Early expositions of quasi-experimentation took a largely taxonomic approach, laying out a collection of relatively discrete research designs and discussing how weak or strong they
were for valid causal inference. Almost certainly, early proponents recognized that there was a virtual infinity of design variations and that validity was more complexly related to
theory and context than their presentations implied. Nonetheless, what seemed to evolve was a "cookbook" approach to quasi-experimentation that involved "choosing" a design that
fit the situation and checking off lists of validity threats.

In an important paper on the coupling of randomized and nonrandomized design features, Boruch (1975) explicitly encouraged us to construct research designs as combinations of
more elemental units (for example, assignment strategies, measurement occasions) based on the specific contextual needs and plausible alternative explanations for a treatment
effect. This move toward hybrid, tailored, or patched-up designs, which involved suggesting how such designs could be accomplished, is one in which I have been a minor
participant (Trochim and Land, 1982; Trochim, 1984). It is emphasized by Cordray in Chapter One of this volume. The implication for current practice is that we should focus on the
advantages of different combinations of design features rather than on a relatively restricted set of prefabricated designs. In teaching quasi-experimental methods, we need to break
away from a taxonomic design mentality and emphasize design principles and issues that cut across the traditional distinctions between true experiments, nonexperiments, and
quasi-experiments.

The Crucial Role of Theory

Quasi-experimentation and its randomized experimental parent have been criticized for encouraging an atheoretical "black box" mentality of research (see, for instance, Chen and
Rossi, 1984; Cronbach, 1982). Persons are assigned to either complex molar program packages or (often) to equally complex comparison conditions. The machinery of random
assignment (or our quasi-experimental attempts to approximate random assignment) is the primary means of defining whether the program has an effect. This ceteris paribus
mentality is inherently atheoretical and noncontextual: It assumes that the same mechanism works in basically the same way whether we apply it in mental health or criminal justice,
income maintenance or education.

There is nothing inherently wrong with this program-group-versus-comparison-group logic. The problem is that it may be a rather crude, uninformative approach. In the two-group
case, we are simply creating a dichotomous input into reality. If we observe a posttest difference between groups, it could be explained by this dichotomous program-versus-
comparison-group input or by any number of alternative explanations, including differential attrition rates, intergroup rivalry and communication, initial selection differences among
groups, or different group histories. We usually try to deal with these alternative explanations by ruling them out through argument, additional measurement, patched-up design
features, and auxiliary analysis. Cook and Campbell (1979), Cronbach (1982), and others strongly favor replication of treatment effects as a standard for judging the validity of a
causal assertion, but this advice does little to enhance the validity and informativeness within individual studies or program evaluations.

Chen and Rossi (1984, p. 339) approached this issue by advocating increased attention to social science theory: "not the global conceptual schemes of the grand theorists but much
more prosaic theories that are concerned with how human organizations work and how social problems are generated." Evaluators have similarly begun to stress the importance of
program theory as the basis for causal assessment (for example, Bickman, in press). These developments allow increased emphasis to be placed on the role of pattern matching
(Trochim, 1985) through the generation of more complex theory-driven predictions that, if corroborated, allow fewer plausible alternative explanations for the effect of a program.
Because appropriate theories may not be readily available, especially for the evaluation of contemporary social programs, we are developing methods and processes that facilitate
the articulation of the implicit theories which program administrators and stakeholder groups have in mind and which presumably guide the formation and implementation of the
program (Trochim, 1985). This theory-driven perspective is consonant with Mark's emphasis in Chapter Three on the study of causal process and with Cordray's discussion in
Chapter One on ruling in the program as opposed to ruling out alternative explanations.

Attention to Program Implementation

A theory-driven approach to quasi-experimentation will be futile unless we can demonstrate that the program was in fact carried out or implemented as the theory intended.
Consequently, we have seen the development of program implementation theory (for example, McLaughlin, 1984) that directly addresses the process of program execution. One
approach emphasizes the development of organizational procedures and training systems that accurately transmit the program and that anticipate likely institutional sources of
resistance. Another strategy involves the assessment of program delivery through program audits, management information systems, and the like. This emphasis on program
implementation has further obscured the traditional distinction between process and outcome evaluation. At the least, it is certainly clear that good quasi-experimental outcome
evaluation cannot be accomplished without attending to program processes, and we are continuing to develop better notions of how to combine these two efforts.

The Importance of Quality Control

Over and over, our experience with quasi-experimentation has shown that even the best-laid research plans often go awry in practice, sometimes with disastrous results. Thus, over
the past decade we have begun to pay increasing attention to the integrity and quality of our research methods in real-world settings. One way of achieving this goal is to incorporate
techniques used by other professions -- accounting, auditing, industrial quality control -- that have traditions in data integrity and quality assurance (Trochim and Visco, 1985). For
instance, double bookkeeping can be used to keep verifiable records of research participation. Acceptance sampling can be an efficient method for checking accuracy in large data
collection efforts, where an exhaustive examination of records is impractical or excessive in cost. These issues are particularly important in quasi-experimentation, where it is
incumbent upon the researcher to demonstrate that sampling, measurement, group assignment, and analysis decisions do not interact with program participation in ways that can
confound the final interpretation of results.

The Advantages of Multiple Perspectives

We have long recognized the importance of replication and systematic variation in research. In the past few years, Cook (1985) and colleagues Shadish and Houts (Chapter Two in
this volume) have articulated a rationale for achieving systematic variation that they term critical multiplism. This perspective rests on the notion that no single realization will ever be
sufficient for understanding a phenomenon with validity. Multiple realizations -- of research questions, measures, samples, designs, analyses, replications, and so on -- are essential
for convergence on the truth of a matter. However, such a varied approach can become a methodological and epistemological Pandora's box unless we apply critical judgment in
deciding which multiples we will emphasize in a study or set of studies (Chapter Two in this volume and Mark and Shotland, 1985).

Evolution of the Concept of Validity

The history of quasi-experimentation is inseparable from the development of the theory of the validity of causal inference. Much of this history has been played out through the
ongoing dialogue between Campbell and Cronbach concerning the definition of validity and the relative importance that should be attributed on the one hand to the establishment of
a causal relationship and on the other hand to its generalizability. In the most recent major statement in this area, Cronbach (1982) articulated the UTOS model, which conceptually
links the units, treatments, observing operations and settings in a study into a framework that can be used for establishing valid causal inference. The dialogue continues in Chapter
Four of this volume, where Campbell attempts to dispel persistent confusion about the types of validity by tentatively relabeling internal validity as local molar causal validity and
external validity as the principle of proximal similarity. It is reasonable to hope that we might achieve a clearer consensus on this issue, as Mark argues in Chapter Three, where he
attempts to resolve several different conceptions of validity, including those of Campbell and Cronbach.

Development of Increasingly Complex Realistic Analytic Models

In the past decade, we have made considerable progress toward complicating our statistical analyses to account for increasingly complex contexts and designs. One such advance
involves the articulation of causal models of the sort described by Reichardt and Gollob in Chapter Six, especially models that allow for latent variables and that directly model
measurement error (Joreskog and Sorbom, 1979).

Another important recent development involves analyses that address the problem of selection bias or group nonequivalence -- a central issue in quasi-experiments because
random assignment is not used and there is no assurance that comparison groups are initially equivalent (Rindskopf's discussion in Chapter Five). At the same time, there is
increasing recognition of the implications of not attending to the correct unit of analysis when analyzing the data and of the advantages and implications of conducting analyses at
multiple levels. Thus, when we assign classrooms to conditions but analyze individual student data rather than classroom aggregates, we are liable to get a different view of program
effects than we are when we analyze at the classroom level, as Shadish, Cook, and Houts argue in Chapter Two. Other notable advances that are not explicitly addressed in this
volume include the development of log linear, probit, and logit models for the analysis of qualitative or nominal level outcome variables (Feinberg, 1980; Forthofer and Lehnen, 1981)
and the increasing proliferation of Bayesian statistical approaches to quasi-experimental contexts (Pollard, 1986).

Parallel to the development of these increasingly complex, realistic analytic models, cynicism has deepened about the ability of any single model or analysis to be sufficient. Thus, in
Chapter Six Reichardt and Gollob call for multiple analyses to bracket bias, and in Chapter Five Rindskopf recognizes the assumptive notions of any analytic approach to selection
bias. We have virtually abandoned the hope of a single correct analysis, and we have accordingly moved to multiple analyses that are based on systematically distinct assumptional
frameworks and that rely in an increasingly direct way on the role of judgment.

Conclusion

All the developments just outlined point to an increasingly realistic and complicated life for quasi-experimentalists. The overall picture that emerges is that all quasi-experimentation
is judgmental. It is based on multiple and varied sources of evidence, it should be multiplistic in realization, it must attend to process as well as to outcome, it is better off when
theory driven, and it leads ultimately to multiple analyses that attempt to bracket the program effect within some reasonable range.

In one sense, this is hardly a pretty picture. Our views about quasi-experimentation and its role in causal inference are certainly more tentative and critical than they were in 1965 or
perhaps even in 1979. But, this more integrated and complex view of quasi-experimentation has emerged directly from our experiences in the conduct of such studies. As such, it
realistically represents our current thinking about one of the major strands in the evolution of social research methodology in this century.

References

Bickman, L. (ed.). Program Theory and Program Evaluation. New Directions for Program Evaluation, no. 33. San Francisco: Jossey-Bass, in press.

Boruch, R. F. "Coupling Randomized Experiments and Approximations to Experiments in Social Program Evaluation." Sociological Methods and Research, 1975, 4 (1), 31-53.

Campbell, D. T. "Can We Be Scientific in Applied Social Science?" In R. F. Conner and others (eds.), Evaluation Studies Review Annual. Vol. 9. Beverly Hills, Calif.: Sage, 1984.

Campbell, D. T., and Stanley, J. C. "Experimental and Quasi-Experimental Designs for Research on Teaching." In N. L. Gage (ed.), Handbook of Research on Teaching. Chicago:
Rand McNally, 1963.

Campbell, D. T., and Stanley, J. C. Experimental and Quasi-experimental Designs for Research. Chicago: Rand McNally, 1966.

Chen, H., and Rossi, P. H. "Evaluating with Sense: The Theory-Driven Approach." In R. F. Conner and others (eds.), Evaluation Studies Review Annual. Vol. 9. Beverly Hills, Calif.:
Sage, 1985.

Cook, T.D. "Postpositivist Critical Multiplism." In R. L. Shotland and M. M, Mark (eds.), Social Science and Social Policy. Beverly Hills, Calif.: Sage, 1985.

Cook, T. D., and Campbell, D. T. Quasi-Experimentation: Design and Analysis Issues for Field Settings. Chicago: Rand McNally, 1979.

Cronbach, L.J. Designing Evaluations of Educational and Social Programs. San Francisco: Jossey-Bass, 1982.

Einhorn, H. J., and Hogarth, R. M. "Judging Probable Cause." Psychological Bulletin, 1986, 99, 3-19.

Feinberg, S. E. The Analysis of Cross-Classified Categorical Data. (2nd ed.) Cambridge, Mass.: M.I.T. Press, 1980.

Forthofer, R. N., and Lehnen, R. G. Public Program Analysis: A New Categorical Data Approach. Belmont, Calif.: Wadsworth, 1981.

Joreskog, K. G., and Sorbom, D. Advances in Factor Analysis and Structural Equation Models. Cambridge, Mass.: Abt Books, 1979.

Kidder, L. H., and Judd, C. M. Research Methods in Social Relations. (5th ed.) New York: Holt, Rinehart & Winston, 1986.

Mark, M. M., and Shotland, R. L. "Toward More Useful Social Science." In R. L. Shotland and M. M. Mark (eds.), Social Science and Social Policy. Beverly Hills, Calif.: Sage, 1985.

McLaughlin, M. W. "Implementation Realities and Evaluation Design." In R. L. Shotland and M. M. Mark (eds.), Social Science and Social Policy. Beverly Hills, Calif.: Sage, 1985.

Pollard, W. E. Bayesian Statistics for Evaluation Research. Beverly Hills, Calif.: Sage, 1986.

Rossi, P. H., and Freeman, H. E. Evaluation: A Systematic Approach. (3rd ed.) Beverly Hills, Calif.: Sage, 1985.

Trochim, W. Research Design for Program Evaluation: The Regression-Discontinuity Approach. Beverly Hills, Calif.: Sage, 1984.

Trochim, W. "Pattern Matching, Validity, and Conceptualization in Program Evaluation." Evaluation Review, 1985, 9 (5), 575-604.

Trochim, W., and Land, D. "Designing Designs for Research." The Researcher, 1982, 1 (1), 1-6.

Trochim, W., and Visco, R. "Quality Control in Evaluation." In D. S. Cordray (ed.), Utilizing Prior Research in Evaluation Planning. New Directions for Program Evaluation, no. 27.
San Francisco: Jossey-Bass, 1985.

What is the Multitrait-Multimethod Matrix?

The Multitrait-Multimethod Matrix (hereafter labeled MTMM) is an approach to assessing the construct validity of a set of measures in a study. It was developed in 1959 by Campbell
and Fiske (Campbell, D. and Fiske, D. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56, 2, 81-105.) in part as an attempt to provide a practical
methodology that researchers could actually use (as opposed to the nomological network idea which was theoretically useful but did not include a methodology). Along with the
MTMM, Campbell and Fiske introduced two new types of validity -- convergent and discriminant -- as subcategories of construct validity. Convergent validity is the degree to which
concepts that should be related theoretically are interrelated in reality. Discriminant validity is the degree to which concepts that should not be related theoretically are, in fact, not
interrelated in reality. You can assess both convergent and discriminant validity using the MTMM. In order to be able to claim that your measures have construct validity, you have to
demonstrate both convergence and discrimination.

The MTMM is simply a matrix or table of correlations arranged to facilitate the interpretation of the assessment of construct validity. The MTMM assumes that you measure each of
several concepts (called traits by Campbell and Fiske) by each of several methods (e.g., a paper-and-pencil test, a direct observation, a performance measure). The MTMM is a very
restrictive methodology -- ideally you should measure each concept by each method.
To construct an MTMM, you need to arrange the correlation matrix by methods within concepts. The figure shows an MTMM for three concepts (traits A, B and C), each of which is
measured with three different methods (1, 2 and 3). Note that you lay the matrix out in blocks by method. Essentially, the MTMM is just a correlation matrix between your measures,
with one exception -- instead of 1's along the diagonal (as in the typical correlation matrix) we substitute an estimate of the reliability of each measure as the diagonal.
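
To make the layout concrete, here is a minimal sketch in Python (not part of the original text) of an MTMM-style matrix for two traits (A, B) measured by two methods (1, 2); the reliability and correlation values are invented, loosely echoing a few of the correlations quoted later in this page.

    # Hedged sketch: a small MTMM-style matrix, arranged in blocks by method,
    # with invented reliability estimates on the diagonal instead of 1.0.
    import pandas as pd

    labels = ["A1", "B1", "A2", "B2"]      # trait letter, method number; grouped by method
    r = pd.DataFrame([[0.89, 0.51, 0.57, 0.22],
                      [0.51, 0.85, 0.22, 0.46],
                      [0.57, 0.22, 0.88, 0.38],
                      [0.22, 0.46, 0.38, 0.86]],
                     index=labels, columns=labels)
    print(r)                               # e.g., r.loc["A1", "A2"] is a monotrait-heteromethod (validity) value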

Before you can interpret an MTMM, you have to understand how to identify the different parts of the matrix. First, you should note that the matrix consists of nothing but
correlations. It is a square, symmetric matrix, so we only need to look at half of it (the figure shows the lower triangle). Second, these correlations can be grouped into three kinds of
shapes: diagonals, triangles, and blocks. The specific shapes are:

The Reliability Diagonal


(monotrait-monomethod)

Estimates of the reliability of each measure in the matrix. You can estimate reliabilities a number of different ways (e.g., test-retest, internal consistency). There are
as many correlations in the reliability diagonal as there are measures -- in this example there are nine measures and nine reliabilities. The first reliability in the
example is the correlation of Trait A, Method 1 with Trait A, Method 1 (hereafter, I'll abbreviate this relationship A1-A1). Notice that this is essentially the correlation
of the measure with itself. In fact such a correlation would always be perfect (i.e., r=1.0). Instead, we substitute an estimate of reliability. You could also consider
these values to be monotrait-monomethod correlations.

The Validity Diagonals


(monotrait-heteromethod)

Correlations between measures of the same trait measured using different methods. Since the MTMM is organized into method blocks, there is one validity diagonal
in each method block. For example, look at the A1-A2 correlation of .57. This is the correlation between two measures of the same trait (A) measured with two
different methods (1 and 2). Because the two measures are of the same trait or concept, we would expect them to be strongly correlated. You could also consider
these values to be monotrait-heteromethod correlations.

The Heterotrait-Monomethod Triangles

These are the correlations among measures that share the same method of measurement. For instance, A1-B1 = .51 in the upper left heterotrait-monomethod
triangle. Note that what these correlations share is method, not trait or concept. If these correlations are high, it is because measuring different things with the same
method results in correlated measures. Or, in more straightforward terms, you've got a strong "methods" factor.

Heterotrait-Heteromethod Triangles

These are correlations that differ in both trait and method. For instance, A1-B2 is .22 in the example. Generally, because these correlations share neither trait nor
method we expect them to be the lowest in the matrix.

The Monomethod Blocks

These consist of all of the correlations that share the same method of measurement. There are as many blocks as there are methods of measurement.

The Heteromethod Blocks

These consist of all correlations that do not share the same methods. There are (K(K-1))/2 such blocks, where K = the number of methods. In the example, there are
3 methods and so there are (3(3-1))/2 = (3(2))/2 = 6/2 = 3 such blocks.

Principles of Interpretation
Now that you can identify the different parts of the MTMM, you can begin to understand the rules for interpreting it. You should realize that MTMM interpretation requires the
researcher to use judgment. Even though some of the principles may be violated in an MTMM, you may still wind up concluding that you have fairly strong construct validity. In other
words, you won't necessarily get perfect adherence to these principles in applied research settings, even when you do have evidence to support construct validity. To me,
interpreting an MTMM is a lot like a physician's reading of an x-ray. A practiced eye can often spot things that the neophyte misses! A researcher who is experienced with MTMM
can use it to identify weaknesses in measurement as well as to assess construct validity.

To help make the principles more concrete, let's make the example a bit more realistic. We'll imagine that we are going to conduct a study of sixth grade students and that we want
to measure three traits or concepts: Self Esteem (SE), Self Disclosure (SD) and Locus of Control (LC). Furthermore, let's measure each of these in three different ways: a Paper-and-
Pencil (P&P) measure, a Teacher rating, and a Parent rating. The results are arrayed in the MTMM. As the principles are presented, try to identify the appropriate coefficients in the
MTMM and make a judgment yourself about the strength of construct validity claims.

The basic principles or rules for the MTMM are:

Coefficients in the reliability diagonal should consistently be the highest in the matrix.

That is, a trait should be more highly correlated with itself than with anything else! This is uniformly true in our example.

Coefficients in the validity diagonals should be significantly different from zero and high enough to warrant further investigation.

This is essentially evidence of convergent validity. All of the correlations in our example meet this criterion.

A validity coefficient should be higher than values lying in its column and row in the same heteromethod block.

In other words, (SE P&P)-(SE Teacher) should be greater than (SE P&P)-(SD Teacher), (SE P&P)-(LC Teacher), (SE Teacher)-(SD P&P) and (SE Teacher)-(LC
P&P). This is true in all cases in our example.

A validity coefficient should be higher than all coefficients in the heterotrait-monomethod triangles.
This essentially emphasizes that trait factors should be stronger than methods factors. Note that this is not true in all cases in our example. For instance, the (LC
P&P)-(LC Teacher) correlation of .46 is less than (SE Teacher)-(SD Teacher), (SE Teacher)-(LC Teacher), and (SD Teacher)-(LC Teacher) -- evidence that there
might be a methods factor, especially on the Teacher observation method.

The same pattern of trait interrelationship should be seen in all triangles.

The example clearly meets this criterion. Notice that in all triangles the SE-SD relationship is approximately twice as large as the relationships that involve LC.
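
As a rough illustration of how a few of these rules could be checked programmatically, here is a Python sketch (not part of the original text); it reuses the invented two-trait, two-method matrix from the earlier sketch rather than the SE/SD/LC example, so the values and labels are assumptions.

    # Hedged sketch: checking two MTMM rules on an invented 2-trait, 2-method matrix.
    import numpy as np
    import pandas as pd

    labels = ["A1", "B1", "A2", "B2"]
    r = pd.DataFrame([[0.89, 0.51, 0.57, 0.22],
                      [0.51, 0.85, 0.22, 0.46],
                      [0.57, 0.22, 0.88, 0.38],
                      [0.22, 0.46, 0.38, 0.86]],
                     index=labels, columns=labels)

    # Principle: reliability diagonal entries should be the highest values in the matrix.
    reliabilities = np.diag(r.values)
    off_diagonal = r.values[~np.eye(len(r), dtype=bool)]
    print("Reliabilities highest:", reliabilities.min() > off_diagonal.max())

    # Principle: a validity value (same trait, different method) should exceed the
    # heterotrait-monomethod values that involve either of its two measures.
    validity_A = r.loc["A1", "A2"]
    monomethod = [r.loc["A1", "B1"], r.loc["A2", "B2"]]
    print("Trait A validity beats its monomethod triangles:", validity_A > max(monomethod))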

Advantages and Disadvantages of MTMM

The MTMM idea provided an operational methodology for assessing construct validity. In the one matrix it was possible to examine both convergent and discriminant validity
simultaneously. By its inclusion of methods on an equal footing with traits, Campbell and Fiske stressed the importance of looking for the effects of how we measure in addition to
what we measure. And, MTMM provided a rigorous framework for assessing construct validity.

Despite these advantages, MTMM has received little use since its introduction in 1959. There are several reasons. First, in its purest form, MTMM requires that you have a fully-
crossed measurement design -- each of several traits is measured by each of several methods. While Campbell and Fiske explicitly recognized that one could have an incomplete
design, they stressed the importance of multiple replication of the same trait across method. In some applied research contexts, it just isn't possible to measure all traits with all
desired methods (would you use an "observation" of weight?). In most applied social research, it just wasn't feasible to make methods an explicit part of the research design.
Second, the judgmental nature of the MTMM may have worked against its wider adoption (although it should actually be perceived as a strength). Many researchers wanted a test
for construct validity that would result in a single statistical coefficient that could be tested -- the equivalent of a reliability coefficient. It was impossible with MTMM to quantify the
degree of construct validity in a study. Finally, the judgmental nature of MTMM meant that different researchers could legitimately arrive at different conclusions.

A Modified MTMM -- Leaving out the Methods Factor

As mentioned above, one of the most difficult aspects of MTMM from an implementation point of
view is that it required a design that included all combinations of both traits and methods. But the
ideas of convergent and discriminant validity do not require the methods factor. To see this, we
have to reconsider what Campbell and Fiske meant by convergent and discriminant validity.

What is convergent validity?

It is the principle that measures of theoretically similar constructs should be highly intercorrelated.
We can extend this idea further by thinking of a measure that has multiple items, for instance, a
four-item scale designed to measure self-esteem. If each of the items actually does reflect the
construct of self-esteem, then we would expect the items to be highly intercorrelated as shown in
the figure. These strong intercorrelations are evidence in support of convergent validity.
And what is discriminant validity?

It is the principle that measures of theoretically different constructs should not correlate highly
with each other. We can see this in the example, which shows two constructs -- self-esteem and
locus of control -- each measured with two instruments. We would expect that, because these are
measures of different constructs, the cross-construct correlations would be low, as shown in the
figure. These low correlations are evidence for discriminant validity. Finally, we can put this all together to see
how we can address both convergent and discriminant validity simultaneously. Here, we have
two constructs -- self-esteem and locus of control -- each measured with three instruments. The
red and green correlations are within-construct ones. They are a reflection of convergent validity
and should be strong. The blue correlations are cross-construct and reflect discriminant validity.
They should be uniformly lower than the convergent coefficients.

The important thing to notice about this matrix is that it does not explicitly include a methods
factor as a true MTMM would. The matrix examines both convergent and discriminant validity
(like the MTMM) but it only explicitly looks at construct intra- and interrelationships. We can see
in this example that the MTMM idea really had two major themes. The first was the idea of
looking simultaneously at the pattern of convergence and discrimination. This idea is similar in
purpose to the notions implicit in the nomological network -- we are looking at the pattern of
interrelationships based upon our theory of the nomological net. The second idea in MTMM was
the emphasis on methods as a potential confounding factor.

While methods may confound the results, they won't necessarily do so in any given study. And, while we need to examine our results for the potential for methods factors, it may be
that combining this desire to assess the confound with the need to assess construct validity is more than one methodology can feasibly handle. Perhaps if we split the two agendas,
we will find it easier to examine convergent and discriminant validity. But what do we do about methods factors? One way to deal with them is through
replication of research projects, rather than trying to incorporate a methods test into a single research study. Thus, if we find a particular outcome in a study using several measures,
we might see if that same outcome is obtained when we replicate the study using different measures and methods of measurement for the same constructs. The methods issue is
considered more an issue of generalizability (across measurement methods) than one of construct validity.

When viewed this way, we have moved from the idea of a MTMM to that of the multitrait matrix that enables us to examine convergent and discriminant validity, and hence construct
validity. We will see that when we move away from the explicit consideration of methods and when we begin to see convergence and discrimination as differences of degree, we
essentially have the foundation for the pattern matching approach to assessing construct validity.


Measurement Validity Types

There's an awful lot of confusion in the methodological literature that stems from the wide variety of labels that are used to describe the validity of measures. I want to make two
cases here. First, it's dumb to limit our scope only to the validity of measures. We really want to talk about the validity of any operationalization. That is, any time you translate a
concept or construct into a functioning and operating reality (the operationalization), you need to be concerned about how well you did the translation. This issue is as relevant
when we are talking about treatments or programs as it is when we are talking about measures. (In fact, come to think of it, we could also think of sampling in this way. The
population of interest in your study is the "construct" and the sample is your operationalization. If we think of it this way, we are essentially talking about the construct validity of the
sampling!). Second, I want to use the term construct validity to refer to the general case of translating any construct into an operationalization. Let's use all of the other validity terms
to reflect different ways you can demonstrate different aspects of construct validity.

With all that in mind, here's a list of the validity types that are typically mentioned in texts and research papers when talking about the quality of measurement:

Construct validity

    Translation validity
        Face validity
        Content validity

    Criterion-related validity
        Predictive validity
        Concurrent validity
        Convergent validity
        Discriminant validity

I have to warn you here that I made this list up. I've never heard of "translation" validity before, but I needed a good name to summarize what both face and content validity are
getting at, and that one seemed sensible. All of the other labels are commonly known, but the way I've organized them is different than I've seen elsewhere.

Let's see if we can make some sense out of this list. First, as mentioned above, I would like to use the term construct validity to be the overarching category. Construct validity is
the approximate truth of the conclusion that your operationalization accurately reflects its construct. All of the other terms address this general issue in different ways. Second, I
make a distinction between two broad types: translation validity and criterion-related validity. That's because I think these correspond to the two major ways you can assure/assess
the validity of an operationalization. In translation validity, you focus on whether the operationalization is a good reflection of the construct. This approach is definitional in nature --
it assumes you have a good detailed definition of the construct and that you can check the operationalization against it. In criterion-related validity, you examine whether the
operationalization behaves the way it should given your theory of the construct. This is a more relational approach to construct validity. It assumes that your operationalization should
function in predictable ways in relation to other operationalizations based upon your theory of the construct. (If all this seems a bit dense, hang in there until you've gone through the
discussion below -- then come back and re-read this paragraph). Let's go through the specific validity types.

Translation Validity

I just made this one up today! (See how easy it is to be a methodologist?) I needed a term that described what both face and content validity are getting at. In essence, both of those
validity types are attempting to assess the degree to which you accurately translated your construct into the operationalization, and hence the choice of name. Let's look at the two
types of translation validity.

Face Validity
In face validity, you look at the operationalization and see whether "on its face" it seems like a good translation of the construct. This is probably the weakest way to
try to demonstrate construct validity. For instance, you might look at a measure of math ability, read through the questions, and decide that yep, it seems like this is
a good measure of math ability (i.e., the label "math ability" seems appropriate for this measure). Or, you might observe a teenage pregnancy prevention program
and conclude that, "Yep, this is indeed a teenage pregnancy prevention program." Of course, if this is all you do to assess face validity, it would clearly be weak
evidence because it is essentially a subjective judgment call. (Note that just because it is weak evidence doesn't mean that it is wrong. We need to rely on our
subjective judgment throughout the research process. It's just that this form of judgment won't be very convincing to others.) We can improve the quality of face
validity assessment considerably by making it more systematic. For instance, if you are trying to assess the face validity of a math ability measure, it would be more
convincing if you sent the test to a carefully selected sample of experts on math ability testing and they all reported back with the judgment that your measure
appears to be a good measure of math ability.

Content Validity

In content validity, you essentially check the operationalization against the relevant content domain for the construct. This approach assumes that you have a good
detailed description of the content domain, something that's not always true. For instance, we might lay out all of the criteria that should be met in a program that
claims to be a "teenage pregnancy prevention program." We would probably include in this domain specification the definition of the target group, criteria for
deciding whether the program is preventive in nature (as opposed to treatment-oriented), and lots of criteria that spell out the content that should be included like
basic information on pregnancy, the use of abstinence, birth control methods, and so on. Then, armed with these criteria, we could use them as a type of checklist
when examining our program. Only programs that meet the criteria can legitimately be defined as "teenage pregnancy prevention programs." This all sounds fairly
straightforward, and for many operationalizations it will be. But for other constructs (e.g., self-esteem, intelligence), it will not be easy to decide on the criteria that
constitute the content domain.

Criterion-Related Validity

In criterion-related validity, you check the performance of your operationalization against some criterion. How is this different from content validity? In content validity, the criteria are
the construct definition itself -- it is a direct comparison. In criterion-related validity, we usually make a prediction about how the operationalization will perform based on our theory of
the construct. The difference among the criterion-related validity types is in the criteria they use as the standard for judgment.

Predictive Validity

In predictive validity, we assess the operationalization's ability to predict something it should theoretically be able to predict. For instance, we might theorize that a
measure of math ability should be able to predict how well a person will do in an engineering-based profession. We could give our measure to experienced
engineers and see if there is a high correlation between scores on the measure and their salaries as engineers. A high correlation would provide evidence for
predictive validity -- it would show that our measure can correctly predict something that we theoretically think it should be able to predict.

Concurrent Validity

In concurrent validity, we assess the operationalization's ability to distinguish between groups that it should theoretically be able to distinguish between. For
example, if we come up with a way of assessing manic-depression, our measure should be able to distinguish between people who are diagnosed as manic-
depressive and those diagnosed as paranoid schizophrenic. If we want to assess the concurrent validity of a new measure of empowerment, we might give the
measure to both migrant farm workers and to the farm owners, theorizing that our measure should show that the farm owners are higher in empowerment. As in any
discriminating test, the results are more powerful if you are able to show that you can discriminate between two groups that are very similar.

Convergent Validity

In convergent validity, we examine the degree to which the operationalization is similar to (converges on) other operationalizations that it theoretically should be
similar to. For instance, to show the convergent validity of a Head Start program, we might gather evidence that shows that the program is similar to other Head
Start programs. Or, to show the convergent validity of a test of arithmetic skills, we might correlate the scores on our test with scores on other tests that purport to
measure basic math ability, where high correlations would be evidence of convergent validity.

Discriminant Validity
In discriminant validity, we examine the degree to which the operationalization is not similar to (diverges from) other operationalizations that it theoretically should
not be similar to. For instance, to show the discriminant validity of a Head Start program, we might gather evidence that shows that the program is not similar to
other early childhood programs that don't label themselves as Head Start programs. Or, to show the discriminant validity of a test of arithmetic skills, we might
correlate the scores on our test with scores on tests of verbal ability, where low correlations would be evidence of discriminant validity.


Idea of Construct Validity

Construct validity refers to the degree to which inferences can legitimately be made from the operationalizations in your study to the theoretical constructs on which those
operationalizations were based. I find that it helps me to divide the issues into two broad territories that I call the "land of theory" and the "land of observation." The land of theory is
what goes on inside your mind, and your attempt to explain or articulate this to others. It is all of the ideas, theories, hunches and hypotheses that you have about the world. In the
land of theory you will find your idea of the program or treatment as it should be. You will find the idea or construct of the outcomes or measures that you believe you are trying to
affect. The land of observation consists of what you see happening in the world around you and the public manifestations of that world. In the land of observation you will find your
actual program or treatment, and your actual measures or observational procedures. Presumably, you have constructed the land of observation based on your theories. You
developed the program to reflect the kind of program you had in mind. You created the measures to get at what you wanted to get at.

Construct validity is an assessment of how well you translated your ideas or theories into actual programs or measures. Why is this important? Because when you think about the
world or talk about it with others (land of theory) you are using words that represent concepts. If you tell someone that a special type of math tutoring will help their child do better in
math, you are communicating at the level of concepts or constructs. You aren't describing in operational detail the specific things that the tutor will do with their child. You aren't
describing the specific questions that will be on the math test that their child will do better on. You are talking in general terms, using constructs. If you based your recommendation
on research that showed that the special type of tutoring improved children's math scores, you would want to be sure that the type of tutoring you are referring to is the same as what
that study implemented and that the type of outcome you're saying should occur was the type they measured in their study. Otherwise, you would be mislabeling or misrepresenting
the research. In this sense, construct validity can be viewed as a "truth in labeling" kind of issue.
There really are two broad ways of looking at the idea of construct validity. I'll call the first the "definitionalist" perspective because it essentially holds that the way to assure
construct validity is to define the construct so precisely that you can operationalize it in a straightforward manner. In a definitionalist view, you have either operationalized the
construct correctly or you haven't -- it's an either/or type of thinking. Either this program is a "Type A Tutoring Program" or it isn't. Either you're measuring self esteem or you aren't.

The other perspective I'd call "relationalist." To a relationalist, things are not either/or or black-and-white -- concepts are more or less related to each other. The meaning of terms or
constructs differs relatively, not absolutely. The program in your study might be a "Type A Tutoring Program" in some ways, while in others it is not. It might be more that type of
program than another program. Your measure might be capturing a lot of the construct of self esteem, but it may not capture all of it. There may be another measure that is closer to
the construct of self esteem than yours is. Relationalism suggests that meaning changes gradually. It rejects the idea that we can rely on operational definitions as the basis for
construct definition.

To get a clearer idea of this distinction, you might think about how the law approaches the construct of "truth." Most of
you have heard the standard oath that a witness in a U.S. court is expected to swear. They are to tell "the truth, the
whole truth and nothing but the truth." What does this mean? If we only had them swear to tell the truth, they might
choose to interpret that as "make sure that what you say is true." But that wouldn't guarantee that they would tell
everything they knew to be true. They might leave some important things out. They would still be telling the truth.
They just wouldn't be telling everything. On the other hand, they are asked to tell "nothing but the truth." This suggests
that the legal model assumes statements can be cleanly sorted into true and false ones -- that we can say simply that Statement X is true and Statement Y is not true.

Now, let's see how this oath translates into a measurement and construct validity context. For instance, we might
want our measure to reflect "the construct, the whole construct, and nothing but the construct." What does this mean?
Let's assume that we have five distinct concepts that are all conceptually related to each other -- self esteem, self
worth, self disclosure, self confidence, and openness. Most people would say that these concepts are similar,
although they can be distinguished from each other. If we were trying to develop a measure of self esteem, what
would it mean to measure "self esteem, all of self esteem, and nothing but self esteem?" If the concept of self esteem
overlaps with the others, how could we possibly measure all of it (that would presumably include the part that overlaps
with others) and nothing but it? We couldn't! If you believe that meaning is relational in nature -- that some concepts
are "closer" in meaning than others -- then the legal model discussed here does not work well as a model for
construct validity.

In fact, we will see that most social research methodologists have (whether they've thought about it or not!) rejected
the definitionalist perspective in favor of a relationalist one. In order to establish construct validity you have to meet
the following conditions:

You have to set the construct you want to operationalize (e.g., self esteem) within a semantic net (or "net of meaning"). This means that you have to tell us what your
construct is more or less similar to in meaning.
You need to be able to provide direct evidence that you control the operationalization of the construct -- that your operationalizations look like what they should theoretically
look like. If you are trying to measure self esteem, you have to be able to explain why you operationalized the questions the way you did. If all of your questions are addition
problems, how can you argue that your measure reflects self esteem and not adding ability?
You have to provide evidence that your data support your theoretical view of the relations among constructs. If you believe that self esteem is closer in meaning to self
worth than it is to anxiety, you should be able to show that measures of self esteem are more highly correlated with measures of self worth than with ones of anxiety.


Convergent & Discriminant Validity

Convergent and discriminant validity are both considered subcategories or subtypes of construct validity. The important thing to recognize is that they work together -- if you can
demonstrate that you have evidence for both convergent and discriminant validity, then you've by definition demonstrated that you have evidence for construct validity. But, neither
one alone is sufficient for establishing construct validity.

I find it easiest to think about convergent and discriminant validity as two inter-locking propositions. In simple words I would describe what they are doing as follows:

measures of constructs that theoretically should be related to each other are, in fact, observed to be related to each other
(that is, you should be able to show a correspondence or convergence between similar constructs)

and

measures of constructs that theoretically should not be related to each other are, in fact, observed to not be related to
each other (that is, you should be able to discriminate between dissimilar constructs)

To estimate the degree to which any two measures are related to each other we typically use the correlation coefficient. That is, we look at the patterns of intercorrelations among
our measures. Correlations between theoretically similar measures should be "high" while correlations between theoretically dissimilar measures should be "low".

The main problem that I have with this convergent-discrimination idea has to do with my use of the quotations around the terms "high" and "low" in the sentence above. The
question is simple -- how "high" do correlations need to be to provide evidence for convergence and how "low" do they need to be to provide evidence for discrimination? And the
answer is -- we don't know! In general we want convergent correlations to be as high as possible and discriminant ones to be as low as possible, but there is no hard and fast rule.
Well, let's not let that stop us. One thing that we can say is that the convergent correlations should always be higher than the discriminant ones. At least that helps a bit.

Before we get too deep into the idea of convergence and discrimination, let's take a look at each one using a simple example.

Convergent Validity

To establish convergent validity, you need to show that measures that should be related are in reality related. In the figure below, we see four measures (each is an item on a scale)
that all purport to reflect the construct of self esteem. For instance, Item 1 might be the statement "I feel good about myself" rated using a 1-to-5 Likert-type response format. We
theorize that all four items reflect the idea of self esteem (this is why I labeled the top part of the figure Theory). On the bottom part of the figure (Observation) we see the
intercorrelations of the four scale items. This might be based on giving our scale out to a sample of respondents. You should readily see that the item intercorrelations for all item
pairings are very high (remember that correlations range from -1.00 to +1.00). This provides evidence that our theory that all four items are related to the same construct is
supported.
Notice, however, that while the high intercorrelations demonstrate that the four items are probably related to the same construct, that doesn't automatically mean that the construct is
self esteem. Maybe there's some other construct that all four items are related to (more about this later). But, at the very least, we can assume from the pattern of correlations that
the four items are converging on the same thing, whatever we might call it.
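
As a rough numerical sketch (not from the original text; the respondents and ratings are invented), computing such an item intercorrelation matrix might look like this in Python:

    # Hedged sketch: intercorrelations among four invented self-esteem items,
    # each rated 1-5 by a handful of invented respondents. Strong positive
    # correlations among the items would be read as evidence of convergence.
    import numpy as np

    ratings = np.array([[4, 4, 5, 4],     # rows = respondents, columns = items 1-4
                        [2, 3, 2, 2],
                        [5, 5, 4, 5],
                        [3, 3, 3, 4],
                        [1, 2, 1, 2],
                        [4, 5, 4, 4]])
    item_corr = np.corrcoef(ratings, rowvar=False)   # 4 x 4 item intercorrelation matrix
    print(np.round(item_corr, 2))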

Discriminant Validity

To establish discriminant validity, you need to show that measures that should not be related are in reality not related. In the figure below, we again see four measures (each is an
item on a scale). Here, however, two of the items are thought to reflect the construct of self esteem while the other two are thought to reflect locus of control. The top part of the
figure shows our theoretically expected relationships among the four items. If we have discriminant validity, the relationship between measures from different constructs should be
very low (again, we don't know how low "low" should be, but we'll deal with that later). There are four correlations between measures that reflect different constructs, and these are
shown on the bottom of the figure (Observation). You should see immediately that these four cross-construct correlations are very low (i.e., near zero) and certainly much lower than
the convergent correlations in the previous figure.
As above, just because we've provided evidence that the two sets of two measures each seem to be related to different constructs (because their intercorrelations are so low)
doesn't mean that the constructs they're related to are self esteem and locus of control. But the correlations do provide evidence that the two sets of measures are discriminated
from each other.

Putting It All Together

OK, so where does this leave us? I've shown how we go about providing evidence for convergent and discriminant validity separately. But as I said at the outset, in order to argue
for construct validity we really need to be able to show that both of these types of validity are supported. Given the above, you should be able to see that we could put both
principles together into a single analysis to examine both at the same time. This is illustrated in the figure below.

The figure shows six measures, three that are theoretically related to the construct of self esteem and three that are thought to be related to locus of control. The top part of the
figure shows this theoretical arrangement. The bottom of the figure shows what a correlation matrix based on a pilot sample might show. To understand this table, you need to first
be able to identify the convergent correlations and the discriminant ones. There are two sets or blocks of convergent coefficients (in red), one 3x3 block for the self esteem
intercorrelations and one 3x3 block for the locus of control correlations. There are also two 3x3 blocks of discriminant coefficients (shown in green), although if you're really sharp
you'll recognize that they are the same values in mirror image (Do you know why? You might want to read up on correlations to refresh your memory).

How do we make sense of the patterns of correlations? Remember that I said above that we don't have any firm rules for how high or low the correlations need to be to provide
evidence for either type of validity. But we do know that the convergent correlations should always be higher than the discriminant ones. Take a good look at the table and you will
see that in this example the convergent correlations are always higher than the discriminant ones. I would conclude from this that the correlation matrix provides evidence for both
convergent and discriminant validity, all in one analysis!
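
A minimal sketch of that single-analysis check in Python (not from the original text; all six measures and their correlations are invented) might be:

    # Hedged sketch: comparing the convergent (within-construct) blocks against the
    # discriminant (cross-construct) block for three invented self-esteem (SE)
    # measures and three invented locus-of-control (LC) measures.
    import numpy as np

    corr = np.array([[1.00, 0.83, 0.89, 0.02, 0.12, 0.09],   # SE1
                     [0.83, 1.00, 0.85, 0.05, 0.11, 0.03],   # SE2
                     [0.89, 0.85, 1.00, 0.04, 0.00, 0.06],   # SE3
                     [0.02, 0.05, 0.04, 1.00, 0.84, 0.93],   # LC1
                     [0.12, 0.11, 0.00, 0.84, 1.00, 0.91],   # LC2
                     [0.09, 0.03, 0.06, 0.93, 0.91, 1.00]])  # LC3

    upper = np.triu_indices(3, k=1)
    convergent = np.concatenate([corr[:3, :3][upper], corr[3:, 3:][upper]])
    discriminant = corr[:3, 3:].ravel()                      # the cross-construct block
    print("lowest convergent:", convergent.min())
    print("highest discriminant:", discriminant.max())
    print("pattern supports construct validity:", convergent.min() > discriminant.max())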
But while the pattern supports discriminant and convergent validity, does it show that the three self esteem measures actually measure self esteem or that the three locus of control
measures actually measure locus of control? Of course not. That would be much too easy.

So, what good is this analysis? It does show that, as you predicted, the three self esteem measures seem to reflect the same construct (whatever that might be), the three locus of
control measures also seem to reflect the same construct (again, whatever that is) and that the two sets of measures seem to be reflecting two different constructs (whatever they
are). That's not bad for one simple analysis.

OK, so how do we get to the really interesting question? How do we show that our measures are actually measuring self esteem or locus of control? I hate to disappoint you, but
there is no simple answer to that (I bet you knew that was coming). There are a number of things we can do to address that question. First, we can use other ways to address
construct validity to help provide further evidence that we're measuring what we say we're measuring. For instance, we might use a face validity or content validity approach to
demonstrate that the measures reflect the constructs we say they are (see the discussion on types of construct validity for more information).

One of the most powerful approaches is to include even more constructs and measures. The more complex our theoretical model (if we find confirmation of the correct pattern in the
correlations), the more we are providing evidence that we know what we're talking about (theoretically speaking). Of course, it's also harder to get all the correlations to give you the
exact right pattern as you add lots more measures. And, in many studies we simply don't have the luxury to go adding more and more measures because it's too costly or
demanding. Despite the impracticality, if we can afford to do it, adding more constructs and measures will enhance our ability to assess construct validity using approaches like the
multitrait-multimethod matrix and the nomological network.

Perhaps the most interesting approach to getting at construct validity involves the idea of pattern matching. Instead of viewing convergent and discriminant validity as differences of
kind, pattern matching views them as differences in degree. This seems a more reasonable idea, and helps us avoid the problem of how high or low correlations need to be to say
that we've established convergence or discrimination.

Threats to Construct Validity

Before we launch into a discussion of the most common threats to construct validity, let's recall what a threat to validity is. In a research study you are likely to reach a conclusion
that your program was a good operationalization of what you wanted and that your measures reflected what you wanted them to reflect. Would you be correct? How will you be
criticized if you make these types of claims? How might you strengthen your claims? The kinds of questions and issues your critics will raise are what I mean by threats to construct
validity.

I take the list of threats from the discussion in Cook and Campbell (Cook, T.D. and Campbell, D.T. Quasi-Experimentation: Design and Analysis Issues for Field Settings. Houghton
Mifflin, Boston, 1979). While I love their discussion, I do find some of their terminology less than straightforward -- a lot of what I'll do here is try to explain this stuff in terms that the
rest of us might hope to understand.

Inadequate Preoperational Explication of Constructs

This one isn't nearly as ponderous as it sounds. Here, preoperational means before translating constructs into measures or treatments, and explication means explanation -- in
other words, you didn't do a good enough job of defining (operationally) what you mean by the construct. How is this a threat? Imagine that your program consisted of a new type of
approach to rehabilitation. Your critic comes along and claims that, in fact, your program is neither new nor a true rehabilitation program. You are being accused of doing a poor job
of thinking through your constructs. Some possible solutions:

think through your concepts better
use methods (e.g., concept mapping) to articulate your concepts
get experts to critique your operationalizations

Mono-Operation Bias

Mono-operation bias pertains to the independent variable, cause, program or treatment in your study -- it does not pertain to measures or outcomes (see Mono-method Bias below).
If you only use a single version of a program in a single place at a single point in time, you may not be capturing the full breadth of the concept of the program. Every
operationalization is flawed relative to the construct on which it is based. If you conclude that your program reflects the construct of the program, your critics are likely to argue that
the results of your study only reflect the peculiar version of the program that you implemented, and not the actual construct you had in mind. Solution: try to implement multiple
versions of your program.

Mono-Method Bias

Mono-method bias refers to your measures or observations, not to your programs or causes. Otherwise, it's essentially the same issue as mono-operation bias. With only a single
version of a self esteem measure, you can't provide much evidence that you're really measuring self esteem. Your critics will suggest that you aren't measuring self esteem -- that
you're only measuring part of it, for instance. Solution: try to implement multiple measures of key constructs and try to demonstrate (perhaps through a pilot or side study) that the
measures you use behave as you theoretically expect them to.

Interaction of Different Treatments

You give a new program designed to encourage high-risk teenage girls to go to school and not become pregnant. The results of your study show that the girls in your treatment
group have higher school attendance and lower birth rates. You're feeling pretty good about your program until your critics point out that the targeted at-risk treatment group in your
study is also likely to be involved simultaneously in several other programs designed to have similar effects. Can you really label the program effect as a consequence of your
program? The "real" program that the girls received may actually be the combination of the separate programs they participated in.

Interaction of Testing and Treatment

Does testing or measurement itself make the groups more sensitive or receptive to the treatment? If it does, then the testing is in effect a part of the treatment; it's inseparable from
the effect of the treatment. This is a labeling issue (and, hence, a concern of construct validity) because you want to use the label "program" to refer to the program alone, but in fact
it includes the testing.

Restricted Generalizability Across Constructs

This is what I like to refer to as the "unintended consequences" threat to construct validity. You do a study and conclude that Treatment X is effective. In fact, Treatment X does cause
a reduction in symptoms, but what you failed to anticipate was the drastic negative consequences of the side effects of the treatment. When you say that Treatment X is effective,
you have defined "effective" as only the directly targeted symptom. This threat reminds us that we have to be careful about whether our observed effects (Treatment X is effective)
would generalize to other potential outcomes.

Confounding Constructs and Levels of Constructs

Imagine a study to test the effect of a new drug treatment for cancer. A fixed dose of the drug is given to a randomly assigned treatment group and a placebo to the other group. No
treatment effects are detected. Perhaps the result that's observed is only true for that dosage level. Slight increases or decreases of the dosage may radically change the results. In
this context, it is not "fair" for you to use the label for the drug as a description for your treatment because you only looked at a narrow range of dosages. Like the other construct validity
threats, this is essentially a labeling issue -- your label is not a good description for what you implemented.

The "Social" Threats to Construct Validity

I've set aside the other major threats to construct validity because they all stem from the social and human nature of the research endeavor.

Hypothesis Guessing

Most people don't just participate passively in a research project. They are trying to figure out what the study is about. They are "guessing" at what the real purpose of the study is.
And, they are likely to base their behavior on what they guess, not just on your treatment. In an educational study conducted in a classroom, students might guess that the key
dependent variable has to do with class participation levels. If they increase their participation not because of your program but because they think that's what you're studying, then
you cannot label the outcome as an effect of the program. It is this labeling issue that makes this a construct validity threat.

Evaluation Apprehension

Many people are anxious about being evaluated. Some are even phobic about testing and measurement situations. If their apprehension makes them perform poorly (and not your
program conditions) then you certainly can't label that as a treatment effect. Another form of evaluation apprehension concerns the human tendency to want to "look good" or "look
smart" and so on. If, in their desire to look good, participants perform better (and not as a result of your program!) then you would be wrong to label this as a treatment effect. In both
cases, the apprehension becomes confounded with the treatment itself and you have to be careful about how you label the outcomes.

Experimenter Expectancies

These days, where we engage in lots of non-laboratory applied social research, we generally don't use the term "experimenter" to describe the person in charge of the research. So,
let's relabel this threat "researcher expectancies." The researcher can bias the results of a study in countless ways, both consciously and unconsciously. Sometimes the researcher
can communicate what the desired outcome for a study might be (and participants' desire to "look good" leads them to react that way). For instance, the researcher might look
pleased when participants give a desired answer. If this is what causes the response, it would be wrong to label the response as a treatment effect.

What is the Nomological Net?

The nomological network is an idea that was developed by Lee Cronbach and Paul Meehl in 1955 (Cronbach, L. and Meehl, P. (1955). Construct validity in psychological tests,
Psychological Bulletin, 52, 4, 281-302.) as part of the American Psychological Association's efforts to develop standards for psychological testing. The term "nomological" is derived
from Greek and means "lawful", so the nomological network can be thought of as the "lawful network." The nomological network was Cronbach and Meehl's view of construct
validity. That is, in order to provide evidence that your measure has construct validity, Cronbach and Meehl argued that you had to develop a nomological network for your measure.
This network would include the theoretical framework for what you are trying to measure, an empirical framework for how you are going to measure it, and specification of the
linkages among and between these two frameworks.

The nomological network is founded on a number of principles that guide the researcher when trying to establish construct validity. They are:
Scientifically, to make clear what something is or means, so that laws can be set forth in which that something occurs.
The laws in a nomological network may relate:
observable properties or quantities to each other
different theoretical constructs to each other
theoretical constructs to observables
At least some of the laws in the network must involve observables.
"Learning more about" a theoretical construct is a matter of elaborating the nomological network in which it occurs or of increasing the definiteness of its components.
The basic rule for adding a new construct or relation to a theory is that it must generate laws (nomologicals) confirmed by observation or reduce the number of nomologicals
required to predict some observables.
Operations which are qualitatively different "overlap" or "measure the same thing" if their positions in the nomological net tie them to the same construct variable.

What Cronbach and Meehl were trying to do is to link the conceptual/theoretical realm with the observable one, because this is the central concern of construct validity. While the
nomological network idea may work as a philosophical foundation for construct validity, it does not provide a practical and usable methodology for actually assessing construct
validity. The next phase in the evolution of the idea of construct validity -- the development of the multitrait-multimethod matrix -- moved us a bit further toward a methodological
approach to construct validity.


Pattern Matching for Construct Validity

The idea of using pattern matching as a rubric for assessing construct validity is an area where I have tried to make a contribution (Trochim, W. (1985). Pattern matching, validity,
and conceptualization in program evaluation. Evaluation Review, 9, 5, 575-604 and Trochim, W. (1989). Outcome pattern matching and program theory. Evaluation and Program
Planning, 12, 355-366.), although my work was very clearly foreshadowed, especially in much of Donald T. Campbell's writings. Here, I'll try to explain what I mean by pattern
matching with respect to construct validity.

The Theory of Pattern Matching

A pattern is any arrangement of objects or entities. The term "arrangement" is used here to indicate that a pattern is by definition non-random and at least potentially describable. All
theories imply some pattern, but theories and patterns are not the same thing. In general, a theory postulates structural relationships between key constructs. The theory can be
used as the basis for generating patterns of predictions. For instance, E = mc² can be considered a theoretical formulation. A pattern of expectations can be developed from this
formula by generating predicted values for one of these variables given fixed values of the others. Not all theories are stated in mathematical form, especially in applied social
research, but all theories provide information that enables the generation of patterns of predictions.
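
To make that last point concrete, here is a minimal sketch (in Python, with mass values chosen purely for illustration) of how a theoretical formulation such as E = mc² generates a pattern of predictions: fixing a set of masses yields a corresponding set of predicted energies, and that set of predicted values is the theoretical pattern.

    # Minimal illustration: a theory (E = m * c^2) generates a pattern of predicted
    # values for one variable (energy) given fixed values of another (mass).
    # The mass values below are arbitrary and chosen only for illustration.
    c = 299_792_458.0               # speed of light in meters per second

    for m in [0.001, 0.01, 0.1]:    # masses in kilograms
        e = m * c ** 2              # predicted energy in joules
        print(f"m = {m} kg  ->  E = {e:.3e} J")
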
Pattern matching always involves an attempt to link two patterns where one
is a theoretical pattern and the other is an observed or operational one. The
top part of the figure shows the realm of theory. The theory might originate
from a formal tradition of theorizing, might be the ideas or "hunches" of the
investigator, or might arise from some combination of these. The
conceptualization task involves the translation of these ideas into a
specifiable theoretical pattern indicated by the top shape in the figure. The
bottom part of the figure indicates the realm of observation. This is broadly
meant to include direct observation in the form of impressions, field notes,
and the like, as well as more formal objective measures. The collection or
organization of relevant operationalizations (i.e., relevant to the theoretical
pattern) is termed the observational pattern and is indicated by the lower
shape in the figure. The inferential task involves the attempt to relate, link or
match these two patterns as indicated by the double arrow in the center of
the figure. To the extent that the patterns match, one can conclude that the
theory and any other theories which might predict the same observed
pattern receive support.

It is important to demonstrate that there are no plausible alternative theories that account for the observed pattern and this task is made much easier
when the theoretical pattern of interest is a unique one. In effect, a more
complex theoretical pattern is like a unique fingerprint which one is seeking
in the observed pattern. With more complex theoretical patterns it is usually
more difficult to construe sensible alternative patterns that would also predict
the same result. To the extent that theoretical and observed patterns do not
match, the theory may be incorrect or poorly formulated, the observations
may be inappropriate or inaccurate, or some combination of both states may
exist.

All research employs pattern matching principles, although this is seldom done consciously. In the traditional two-group experimental context, for
instance, the typical theoretical outcome pattern is the hypothesis that there
will be a significant difference between treated and untreated groups. The observed outcome pattern might consist of the averages for the two groups on one or more measures. The
pattern match is accomplished by a test of significance such as the t-test or ANOVA. In survey research, pattern matching forms the basis of generalizations across different
concepts or population subgroups. In qualitative research pattern matching lies at the heart of any attempt to conduct thematic analyses.
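
As a hedged sketch of that simplest case (Python with NumPy and SciPy; the scores below are simulated rather than real data), the hypothesis of a treated-versus-untreated difference is the theoretical pattern, the two group means are the observed pattern, and the t-test performs the match.

    # A minimal sketch of the two-group case described above. The scores are simulated;
    # in a real study they would be the measured outcomes for the two groups.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    treated = rng.normal(loc=55, scale=10, size=40)      # hypothetical outcomes, treated group
    untreated = rng.normal(loc=50, scale=10, size=40)    # hypothetical outcomes, untreated group

    # Theoretical pattern: the groups differ. Observed pattern: the two group means.
    # The t-test assesses how well the observed pattern matches the prediction.
    t_stat, p_value = stats.ttest_ind(treated, untreated)
    print(f"means: treated = {treated.mean():.1f}, untreated = {untreated.mean():.1f}")
    print(f"t = {t_stat:.2f}, p = {p_value:.4f}")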

While current research methods can be described in pattern matching terms, the idea of pattern matching implies more, and suggests how one might improve on these current
methods. Specifically, pattern matching implies that more complex patterns, if matched, yield greater validity for the theory. Pattern matching does not differ fundamentally from
traditional hypothesis testing and model building approaches. A theoretical pattern is a hypothesis about what is expected in the data. The observed pattern consists of the data that
are used to examine the theoretical model. The major differences between pattern matching and more traditional hypothesis testing approaches are that pattern matching
encourages the use of more complex or detailed hypotheses and treats the observations from a multivariate rather than a univariate perspective.

Pattern Matching and Construct Validity

While pattern matching can be used to address a variety of questions in social research, the emphasis here is on its use in assessing construct validity.
The accompanying figure shows the pattern matching structure for an example involving five measurement constructs -- arithmetic, algebra, geometry, spelling, and reading. In this
example, we'll use concept mapping to develop the theoretical pattern among these constructs. In the concept mapping we generate a large set of potential arithmetic, algebra,
geometry, spelling, and reading questions. We sort them into piles of similar questions and develop a map that shows each question in relation to the others. On the map, questions
that are more similar are closer to each other, those less similar are more distant. From the map, we can find the straight-line distances between all pairs of points (i.e., all questions).
This is the matrix of interpoint distances. We might use the questions from the map in constructing our measurement instrument, or we might sample from these questions. On the
observed side, we have one or more test instruments that contain a number of questions about arithmetic, algebra, geometry, spelling, and reading. We analyze the data and
construct a matrix of inter-item correlations.

What we want to do is compare the matrix of interpoint distances from our concept map (i.e., the theoretical pattern) with the correlation matrix of the questions (i.e., the observed
pattern). How do we achieve this? Let's assume that we had 100 prospective questions on our concept map, 20 for each construct. Correspondingly, we have 100 questions on our
measurement instrument, 20 in each area. Thus, both matrices are 100x100 in size. Because both matrices are symmetric, we actually have (N(N-1))/2 = (100(99))/2 = 9900/2 =
4,950 unique pairs (excluding the diagonal). If we "string out" the values in each matrix we can construct a vector or column of 4,950 numbers for each matrix. The first number is the
value comparing pair (1,2), the next is (1,3) and so on to (N-1, N) or (99, 100). Now, we can compute the overall correlation between these two columns, which is the correlation
between our theoretical and observed patterns, the "pattern matching correlation." In this example, let's assume it is -.93. Why would it be a negative correlation? Because we are
correlating distances on the map with the similarities in the correlations and we expect that greater distance on the map should be associated with lower correlation and less
distance with greater correlation.
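
A minimal sketch of this computation follows (Python with NumPy). The two 100x100 matrices here are simulated stand-ins; in an actual study the first would be the interpoint distance matrix from the concept map and the second the matrix of inter-item correlations, and the resulting value would be expected to be strongly negative, as in the -.93 example above.

    # Sketch of the "pattern matching correlation": string out the unique off-diagonal
    # pairs of the theoretical (distance) and observed (correlation) matrices and
    # correlate the two resulting vectors. All data here are simulated for illustration.
    import numpy as np

    n = 100
    rng = np.random.default_rng(1)

    # Stand-ins for the two matrices (in practice: concept map distances and item correlations).
    coords = rng.normal(size=(n, 2))
    map_distances = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    item_corr = np.corrcoef(rng.normal(size=(n, 200)))

    # "String out" the (100 * 99) / 2 = 4,950 unique pairs of each matrix...
    iu = np.triu_indices(n, k=1)
    theoretical = map_distances[iu]    # vector of 4,950 interpoint distances
    observed = item_corr[iu]           # vector of 4,950 inter-item correlations

    # ...and correlate the two vectors.
    pattern_match_r = np.corrcoef(theoretical, observed)[0, 1]
    print(f"pattern matching correlation = {pattern_match_r:.2f}")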

The pattern matching correlation is our overall estimate of the degree of construct validity in this example because it estimates the degree to which the operational measures reflect
our theoretical expectations.
Advantages and Disadvantages of Pattern Matching

There are several disadvantages of the pattern matching approach to construct validity. The most obvious is that pattern matching requires that you specify your theory of the
constructs rather precisely. This is typically not done in applied social research, at least not to the level of specificity implied here. But perhaps it should be done. Perhaps the more
restrictive assumption is that you are able to structure the theoretical and observed patterns the same way so that you can directly correlate them. We needed to quantify both
patterns and, ultimately, describe them in matrices that had the same dimensions. In most research as it is currently done it will be relatively easy to construct a matrix of the inter-
item correlations. But we seldom currently use methods like concept mapping that enable us to estimate theoretical patterns that can be linked with observed ones. Again, perhaps
we ought to do this more frequently.

There are a number of advantages of the pattern matching approach, especially relative to the multitrait-multimethod matrix (MTMM). First, it is more general and flexible than
MTMM. It does not require that you measure each construct with multiple methods. Second, it treats convergence and discrimination as a continuum. Concepts are more or less
similar and so their interrelations would be more or less convergent or discriminant. This moves the convergent/discriminant distinction away from the simplistic dichotomous
categorical notion to one that is more suitably post-positivist and continuous in nature. Third, the pattern matching approach does make it possible to estimate the overall construct
validity for a set of measures in a specific context. Notice that we don't estimate construct validity for a single measure. That's because construct validity, like discrimination, is
always a relative metric. Just as we can only ask whether you have distinguished something if there is something to distinguish it from, we can only assess construct validity in terms
of a theoretical semantic or nomological net, the conceptual context within which it resides. The pattern matching correlation tells us, for our particular study, whether there is a
demonstrable relationship between how we theoretically expect our measures will interrelate and how they do in practice. Finally, because pattern matching requires a more specific
theoretical pattern than we typically articulate, it requires us to specify what we think about the constructs in our studies. Social research has long been criticized for conceptual
sloppiness, for re-packaging old constructs in new terminology and failing to develop an evolution of research around key theoretical constructs. Perhaps the emphasis on theory
articulation in pattern matching would encourage us to be more careful about the conceptual underpinnings of our empirical work. And, after all, isn't that what construct validity is all
about?

Social scientists have developed a number of methods and processes that might be useful in helping you to formulate a
research project. I would include among these at least the following -- brainstorming, brainwriting, nominal group
techniques, focus groups, affinity mapping, Delphi techniques, facet theory, and qualitative text analysis. Here, I'll show
you a method that I have developed, called concept mapping, which is especially useful for research problem formulation.

Concept mapping is a general method that can be used to help any individual or group to describe their ideas about some
topic in a pictorial form. There are several different types of methods that all currently go by names like "concept mapping",
"mental mapping" or "concept webbing." All of them are similar in that they result in a picture of someone's ideas. But the
kind of concept mapping I want to describe here is different in a number of important ways. First, it is primarily a group
process and so it is especially well-suited for situations where teams or groups of stakeholders have to work together. The
other methods work primarily with individuals. Second, it uses a very structured facilitated approach. There are specific
steps that are followed by a trained facilitator in helping a group to articulate its ideas and understand them more clearly.
Third, the core of concept mapping consists of several state-of-the-art multivariate statistical methods that analyze the
input from all of the individuals and yield an aggregate group product. And fourth, the method requires the use of
specialized computer programs that can handle the data from this type of process and accomplish the correct analysis and
mapping procedures.

Although concept mapping is a general method, it is particularly useful for helping social researchers and research teams
develop and detail ideas for research. And, it is especially valuable when researchers want to involve relevant stakeholder
groups in the act of creating the research project. Although concept mapping is used for many purposes -- strategic
planning, product development, market analysis, decision making, measurement development -- we concentrate here on
its potential for helping researchers formulate their projects.

So what is concept mapping? Essentially, concept mapping is a structured process, focused on a topic or construct
of interest, involving input from one or more participants, that produces an interpretable pictorial view (concept
map) of their ideas and concepts and how these are interrelated. Concept mapping helps people to think more
effectively as a group without losing their individuality. It helps groups to manage the complexity of their ideas without
trivializing them or losing detail.
A concept mapping process involves six steps that can take place in a single day or can be spread out over weeks or
months depending on the situation. The first step is the Preparation Step. There are three things done here. The
facilitator of the mapping process works with the initiator(s) (i.e., whoever requests the process initially) to identify who the
participants will be. A mapping process can have hundreds or even thousands of stakeholders participating, although we
usually have a relatively small group of between 10 and 20 stakeholders involved. Second, the initiator works with the
stakeholders to develop the focus for the project. For instance, the group might decide to focus on defining a program or
treatment. Or, they might choose to map all of the outcomes they might expect to see as a result. Finally, the group
decides on an appropriate schedule for the mapping. In the Generation Step the stakeholders develop a large set of
statements that address the focus. For instance, they might generate statements that describe all of the specific activities
that will constitute a specific social program. Or, they might generate statements describing specific outcomes that might
occur as a result of participating in a program. A wide variety of methods can be used to accomplish this including
traditional brainstorming, brainwriting, nominal group techniques, focus groups, qualitative text analysis, and so on. The
group can generate up to 200 statements in a concept mapping project. In the Structuring Step the participants do two
things. First, each participant sorts the statements into piles of similar ones. Most times they do this by sorting a deck of
cards that has one statement on each card. But they can also do this directly on a computer by dragging the statements
into piles that they create. They can have as few or as many piles as they want. Each participant names each pile with a
short descriptive label. Second, each participant rates each of the statements on some scale. Usually the statements are
rated on a 1-to-5 scale for their relative importance, where a 1 means the statement is relatively unimportant compared to
all the rest, a 3 means that it is moderately important, and a 5 means that it is extremely important. The Representation
Step is where the analysis is done -- this is the process of taking the sort and rating input and "representing" it in map
form. There are two major statistical analyses that are used. The first -- multidimensional scaling -- takes the sort data
across all participants and develops the basic map where each statement is a point on the map and statements that were
piled together by more people are closer to each other on the map. The second analysis -- cluster analysis -- takes the
output of the multidimensional scaling (the point map) and partitions the map into groups of statements or ideas, into
clusters. If the statements describe activities of a program, the clusters show how these can be grouped into logical groups
of activities. If the statements are specific outcomes, the clusters might be viewed as outcome constructs or concepts. In
the fifth step -- the Interpretation Step -- the facilitator works with the stakeholder group to help them develop their own
labels and interpretations for the various maps. Finally, the Utilization Step involves using the maps to help address the
original focus. On the program side, the maps can be used as a visual framework for operationalizing the program. On the
outcome side, they can be used as the basis for developing measures and displaying results.

This is only a very basic introduction to concept mapping and its uses. If you want to find out more about this method, you
might look at some of the articles I've written about concept mapping, including An Introduction to Concept Mapping,
Concept Mapping: Soft Science or Hard Art?,or the article entitled Using Concept Mapping to Develop a Conceptual
Framework of Staff's Views of a Supported Employment Program for Persons with Severe Mental Illness.


"Well begun is half done" --Aristotle, quoting an old proverb

Where do research topics come from?

So how do researchers come up with the idea for a research project? Probably one of the most common sources of
research ideas is the experience of practical problems in the field. Many researchers are directly engaged in
social, health or human service program implementation and come up with their ideas based on what they see
happening around them. Others aren't directly involved in service contexts, but work with (or survey) people who are
in order to learn what needs to be better understood. Many of the ideas would strike the outsider as silly or worse.
For instance, in health services areas, there is great interest in the problem of back injuries among nursing staff. It's
not necessarily the thing that comes first to mind when we think about the health care field. But if you reflect on it for
a minute longer, it should be obvious that nurses and nursing staff do an awful lot of lifting in performing their jobs.
They lift and push heavy equipment, and they lift and push oftentimes heavy patients! If 5 or 10 out of every hundred
nursing staff were to strain their backs on average over the period of one year, the costs would be enormous -- and
that's pretty much what's happening. Even minor injuries can result in increased absenteeism. Major ones can result
in lost jobs and expensive medical bills. The nursing industry figures that this is a problem that costs tens of millions
of dollars annually in increased health care. And, the health care industry has developed a number of approaches,
many of them educational, to try to reduce the scope and cost of the problem. So, even though it might seem silly at
first, many of these practical problems that arise in practice can lead to extensive research efforts.

Another source for research ideas is the literature in your specific field. Certainly, many researchers get ideas for
research by reading the literature and thinking of ways to extend or refine previous research. Another type of
literature that acts as a source of good research ideas is the Requests For Proposals (RFPs) that are published by
government agencies and some companies. These RFPs describe some problem that the agency would like
researchers to address -- they are virtually handing the researcher an idea! Typically, the RFP describes the
problem that needs addressing, the contexts in which it operates, the approach they would like you to take to
investigate to address the problem, and the amount they would be willing to pay for such research. Clearly, there's
nothing like potential research funding to get researchers to focus on a particular research topic.

And let's not forget the fact that many researchers simply think up their research topic on their own. Of course, no
one lives in a vacuum, so we would expect that the ideas you come up with on your own are influenced by your
background, culture, education and experiences.

Is the study feasible?

Very soon after you get an idea for a study reality begins to kick in and you begin to think about whether the study is
feasible at all. There are several major considerations that come into play. Many of these involve making tradeoffs
between rigor and practicality. To do a study well from a scientific point of view may force you to do things you
wouldn't do normally. You may have to control the implementation of your program more carefully than you
otherwise might. Or, you may have to ask program participants lots of questions that you usually wouldn't if you
weren't doing research. If you had unlimited resources and unbridled control over the circumstances, you would
always be able to do the best quality research. But those ideal circumstances seldom exist, and researchers are
almost always forced to look for the best tradeoffs they can find in order to get the rigor they desire.
There are several practical considerations that almost always need to be considered when deciding on the
feasibility of a research project. First, you have to think about how long the research will take to accomplish.
Second, you have to question whether there are important ethical constraints that need consideration. Third, can
you achieve the needed cooperation to take the project to its successful conclusion? And fourth, how significant are
the costs of conducting the research? Failure to consider any of these factors can mean disaster later.

The Literature Review

One of the most important early steps in a research project is the conducting of the literature review. This is also one
of the most humbling experiences you're likely to have. Why? Because you're likely to find out that just about any
worthwhile idea you will have has been thought of before, at least to some degree. Every time I teach a research
methods course, I have at least one student come to me complaining that they couldn't find anything in the literature
that was related to their topic. And virtually every time they have said that, I was able to show them that was only
true because they only looked for articles that were exactly the same as their research topic. A literature review is
designed to identify related research, to set the current research project within a conceptual and theoretical context.
When looked at that way, there is almost no topic that is so new or unique that we can't locate relevant and
informative related research.

Some tips about conducting the literature review. First, concentrate your efforts on the scientific literature. Try to
determine what the most credible research journals are in your topical area and start with those. Put the greatest
emphasis on research journals that use a blind review system. In a blind review, authors submit potential articles to
a journal editor who solicits several reviewers who agree to give a critical review of the paper. The paper is sent to
these reviewers with no identification of the author so that there will be no personal bias (either for or against the
author). Based on the reviewers' recommendations, the editor can accept the article, reject it, or recommend that the
author revise and resubmit it. Articles in journals with blind review processes can be expected to have a fairly high
level of credibility. Second, do the review early in the research process. You are likely to learn a lot in the literature
review that will help you in making the tradeoffs you'll need to face. After all, previous researchers also had to face
tradeoff decisions.

What should you look for in the literature review? First, you might be able to find a study that is quite similar to the
one you are thinking of doing. Since all credible research studies have to review the literature themselves, you can
check their literature review to get a quick-start on your own. Second, prior research will help assure that you include
all of the major relevant constructs in your study. You may find that other similar studies routinely look at an outcome
that you might not have included. If you did your study without that construct, it would not be judged credible if it
ignored a major construct. Third, the literature review will help you to find and select appropriate measurement
instruments. You will readily see what measurement instruments researchers use themselves in contexts similar to
yours. Finally, the literature review will help you to anticipate common problems in your research context. You can
use the prior experiences of others to avoid common traps and pitfalls.

An Introduction to Concept Mapping for Planning and Evaluation

William M.K. Trochim


Cornell University

Abstract

Concept mapping is a type of structured conceptualization which can be used by groups to develop a conceptual framework which can guide
evaluation or planning. In the typical case, six steps are involved: 1) Preparation (including selection of participants and development of focus for the
conceptualization); 2) the Generation of statements; 3) the Structuring of statements; 4) the Representation of Statements in the form of a concept map
(using multidimensional scaling and cluster analysis); 5) the Interpretation of maps; and, 6) the Utilization of Maps. Concept mapping encourages the
group to stay on task; results relatively quickly in an interpretable conceptual framework; expresses this framework entirely in the language of the
participants; yields a graphic or pictorial product which simultaneously shows all major ideas and their interrelationships; often improves group or
organizational cohesiveness and morale. This paper describes each step in the process, considers major methodological issues and problems, and
discusses computer programs which can be used to accomplish the process.

An Introduction to Concept Mapping for Planning and Evaluation

Probably the most difficult step in a planning or evaluation project is the first one -- everything which follows depends on how well the project is
initially conceptualized. Conceptualization in this sense refers to the articulation of thoughts, ideas, or hunches and the representation of these in some
objective form. In a planning process, we typically wish to conceptualize the major goals and objectives, needs, resources and capabilities or other
dimensions which eventually constitute the elements of a plan. In evaluation, we may want to conceptualize the programs or treatments, samples,
settings, measures and outcomes which we believe are relevant.

This special section of Evaluation and Program Planning extends earlier work by Trochim and Linton (1986) who proposed a general framework for
structured conceptualization and showed how specific conceptualization processes can be devised to assist groups in the theory and concept formation
stages of planning and evaluation. The papers presented here focus on one specific type of structured conceptualization process which we term
"concept mapping". In concept mapping, ideas are represented in the form of a picture or map. To construct the map, ideas first have to be described
or generated, and the interrelationships between them articulated. Multivariate statistical techniques -- multidimensional scaling and cluster analysis --
are then applied to this information and the results are depicted in map form. The content of the map is entirely determined by the group. They
brainstorm the initial ideas, provide information about how these ideas are related, interpret the results of the analyses, and decide how the map is to
be utilized.

The process described here is not the only way to accomplish concept mapping. For instance, Novak and Gowin (1984) suggest that concept maps be
drawn "free-hand" after an initial articulation of the major ideas and classification of them into hierarchical concepts. In a similar manner, Rico (1983)
has advocated "free-hand" concept mapping or drawing as a useful method for developing a conceptual framework for writing. These and other
approaches have value for planning and evaluation, but fall outside of the scope of this paper. The major differences between the method described
here and other concept mapping processes are: this method is particularly appropriate for group use -- the method generates a group aggregate map; it
utilizes multivariate data analyses to construct the maps; and it generates interval-level maps which have some advantages for planning and
evaluation, especially through pattern matching as described later. Despite these differences, this paper should be viewed as a clear call for the
importance of further exploration of any processes which improve conceptualization in planning and evaluation. Throughout the papers in this
volume, however, the term "concept mapping" should be understood to refer only to the process described here, and its variations.

Group concept mapping is consistent with the growing interest in the role of theory in planning and evaluation. In evaluation, for instance, this interest
is evidenced in writings on the importance of program theory (Bickman, 1986; Chen and Rossi, 1983, 1987; Rossi and Chen, in press); in the
increased emphasis on the importance of studying causal process (Mark, 1986); in the recognition of the central role of judgment -- especially theory-
based judgment -- in research (Cordray, 1986; Einhorn and Hogarth, 1986); and, in the thinking of critical multiplism (Shadish et al, 1986) which
emphasizes the role of theory in selecting and guiding the analysis of multiple operationalizations. Concept mapping can be viewed as one way to
articulate theory in these contexts. In planning, conceptualization has had somewhat more attention and is evidenced in the sometimes daunting
proliferation of different planning models and methods of conceptualizing (Dunn, 1981).

This paper introduces the concept mapping process, suggests some of the major technical or methodological issues which are involved, and offers
some suggestions about computer programs which can be used to accomplish concept mapping. The remaining papers provide numerous examples of
the use of this process and its variations in a wide variety of contexts and consider in greater detail some of the methodological issues which face
researchers in this area.

The Concept Mapping Process

The term "structured conceptualization" refers to any process which can be described as a sequence of concrete operationally-defined steps and which
yields a conceptual representation (Trochim and Linton, 1986). The specific concept mapping process described here and discussed throughout this
volume is considered only one of many possible structured conceptualization processes. This process can be used whenever there is a group of people
who wish to develop a conceptual framework for evaluation or planning, where the framework is displayed in the form of a concept map. A concept
map is a pictorial representation of the group's thinking which displays all of the ideas of the group relative to the topic at hand, shows how these
ideas are related to each other and, optionally, shows which ideas are more relevant, important, or appropriate.

The scenario within which concept mapping is applied assumes that there is an identifiable group responsible for guiding the evaluation or planning
effort. Depending on the situation, this group might consist of the administrators, staff or members of the board of an organization; community leaders
or representatives of relevant constituency groups; academicians or members of the policy making community; funding agents or representatives of
groups with oversight responsibility; representatives of relevant client populations; or combinations of these. The concept mapping process is guided
by a facilitator who could be an outside consultant or an internal member of the group responsible for the planning or evaluation effort. The
facilitator's role is only to manage the process -- the content, interpretation and utilization of the concept map are determined entirely by the group.

An overview of the concept mapping process is provided in Figure 1.


Figure 1. The concept mapping process.

The figure shows six steps which are followed in developing a useful group concept map. Each of these steps will be discussed in some detail and
illustrated with data from a concept mapping process which was conducted in York County, Maine, to assist representatives of a number of county
human service agencies to develop a conceptual framework for planning services for the elderly. While the major focus of this project was on
planning, some comments will be offered regarding the potential use of this conceptualization for evaluation purposes.

Step 1: Preparation

There are two major tasks which must be undertaken prior to commencement of the actual group process. First, the facilitator must work with the
parties involved to decide on who will participate in the process. Second, the facilitator must then work with the participants or a subgroup to decide
on the specific focus for the conceptualization.

Selecting the Participants. One of the most important tasks which the facilitator addresses is who will participate in the concept mapping process.
Our experience has been that a conceptualization is best when it includes a wide variety of relevant people. If we are conducting strategic planning for
a human service organization, we might include administrative staff, service staff, board members, clients, and relevant members of community
groups. In a program evaluation context, we might similarly include administrators, program staff, clients, social science theorists, community
members, and relevant funding agent representatives. Broad heterogeneous participation helps to ensure that a wide variety of viewpoints will be
considered and encourages a broader range of people to "buy into" the conceptual framework which results.

In some situations, however, we have used relatively small homogeneous groups for the conceptualization process. For instance, if an organization is
beginning a strategic planning effort and would like to lay out quickly some of the major concepts around which the planning will be based, they
might prefer to use a relatively small group of administrators and organizational staff members. The obvious advantage of doing this is that it is
logistically simpler to get people together for meetings if they are all on the staff of the organization. This type of group works well when a quick
conceptualization framework is desired, but in general we would recommend a broader sampling of opinion.

In some contexts it might be reasonable to use some random sampling scheme to select participants from a larger defined population. This is most
useful when one wishes to argue that the resulting concept map is generalizable to some larger population of interest. Simple random sampling
schemes, of course, run the risk of underrepresenting minority groups from the population and so, if sampling is used, it will typically be best to
attempt either some form of stratified random sampling or purposive sampling for heterogeneity.

There is no strict limit on the number of people who can be involved in concept mapping. It is feasible, with some process modifications, for an
individual to conduct a conceptualization alone (see Dumont, this volume, for an example). At the other extreme, we have worked with groups as
large as 75-80 people in this process. Typically, we have had between 10 and 20 people in most of our studies and this seems to be a workable
number. Groups of that size ensure a variety of opinions and still enable good group discussion and interpretation.

It is also not necessary that all participants take part in every step of the process. One might, for instance, have a relatively small group do the
generation (e.g., brainstorming) step, a much larger group perform the structuring (i.e., sorting and rating) and a small group for interpretation and
utilization. In general, however, we have found that concept maps are better understood by people who have participated in all phases of the process
than by those who have only taken part in one or two steps.

In the York County study, the purpose of the conceptualization was to bring together a small group of representatives from a number of local agencies
which provide services to the elderly in order to develop a framework for planning. There was also a strong interest in piloting the concept mapping
process with the idea that it might possibly be applied later with a broader constituency group which included elderly persons from the county.
Between 10 and 15 people participated in the two meetings including representatives of the United Way and several health and mental health
organizations.

Developing the Focus. The second major preparatory step involves developing the focus or domain of the conceptualization. There are two separate
products which are desired here. First, the participants must define the focus for the brainstorming session. Second, the focus for ratings which are
performed during the structuring step of the process needs to be developed. This essentially involves defining the dimension(s) on which each of the
brainstormed statements will be rated.
It is essential that the focus for both the brainstorming and the ratings be worded as statements which give the specific instruction intended so that all
of the participants can agree in advance. In developing both the brainstorming and rating focus statements, the facilitator usually has a meeting with
the participants or some representative subgroup. In this meeting the facilitator discusses various alternatives for wording each focus and attempts to
achieve a group consensus on final choices. For example, the brainstorming focus in a strategic planning process might be worded: "Generate short
phrases or sentences which describe specific services which your organization might provide." Similarly, a rating focus for a program evaluation
might be worded: "Rate each potential outcome on a seven point scale in terms of how strongly you think it will be affected by the program, where '1'
means 'Not at all affected', '4' means 'Moderately affected' and '7' means 'Extremely affected'." The group should agree on the specific wording for
each of these focus statements.

The Brainstorming Focus. For any brainstorming session, there are a variety of ways in which the focus can be stated. For instance, if we are
interested in strategic planning, participants might focus on the goals of the organization, the mission of the organization, or the activities or services
which the organization might provide. Similarly, in program evaluations they might focus on the nature of the program, the outcomes they would like
to measure, or the types of people they would like to participate in the evaluation.

In defining the brainstorming focus, it is important to try to anticipate the types of statements which will result. For instance, one usually wants to
avoid giving double-barrelled focus statements like: "Generate short statements or sentences which describe the goals of our organization and the
needs of our clients" because, when sorting, participants are likely to perceive these two categories as particularly distinct and consequently, they are
likely to show up as two major clusters on the final concept map and obscure some of the finer relationships which might be of interest (see Keith, this
volume, for a discussion of this issue). If both emphases are desired, it would probably be better to conduct two separate conceptualizations to address
each issue. In general, it is best if the final set of brainstormed statements are "of a kind", that is, share the same level of conceptual generality and
grammatical structure. For example, if the interest is in conceptualizing a taxonomy of potential services, the best brainstormed set would be one which
consists entirely of statements that describe specific services which an organization might provide and which are all worded in a similar manner (e.g., not
a mix in which some are phrased as questions while others are worded as statements).

In the York County study, the brainstorming focus statement was a broad one: "Generate statements which describe the issues, problems, concerns or
needs which the elderly have in York County." Although this may seem to violate the advice above on avoiding double-barrelled statements, the
participants felt that "issues, problems, concerns or needs" were essentially indistinguishable and that the brainstormed statements which resulted
would constitute an interpretable, relatively homogeneous set.

The Rating Focus. In developing the focus for the ratings, one needs to consider what kind of information will be most useful. In planning, it is often
useful to ask the participants to rate how important each brainstormed item is, or how much emphasis should be placed upon it in the planning
process. In evaluation, it might be useful to ask them to rate how much effort should be given to various program components, or how much they
believe each outcome is likely to be affected by the program.

In the York County study, the rating focus statement was: "Rate each statement on a 1 to 5 scale for how much priority it should be given in the
planning process, where '1' equals the lowest priority and '5' equals the highest priority."

Step 2: Generation of Statements

Once the participants and focus statements have been defined, the actual concept mapping process begins with the generation of a set of statements
which ideally should represent the entire conceptual domain for the topic of interest. In the typical case, brainstorming is used and the focus statement
constitutes the prompt for the brainstorming session. The usual rules for brainstorming apply (Osborn, 1948; Dunn, 1981). That is, people are
encouraged to generate lots of statements and are told that there should be no criticism or discussion regarding the legitimacy of statements which are
generated during the session. Participants are encouraged to ask for clarification of any unfamiliar terms or jargon so that all who participate may
understand what was intended by a given statement. Usually, the facilitator records the statements as they are generated so that all members of the
group can see the set of statements as they evolve. We have done this by writing the statements on a blackboard, on sheets of newsprint, or by
directly entering them into a computer program which is displayed on a large screen so that everyone can see the statements. If the facilitator believes
that there may be some participants who would be reluctant to state publicly some idea because of its controversial or potentially embarrassing nature,
it may be desirable to also allow each participant to submit several statements anonymously on paper so that confidentiality will be preserved.

Theoretically, there is no limit to the number of statements which can be generated. However, large numbers of statements impose serious practical
constraints. Based on our experience, we now limit the number of statements to one hundred or less. If the brainstorming session generates more than
a hundred statements, we reduce the set. There are a number of ways in which this can be accomplished. The group as a whole or some subgroup can
examine the set of statements for redundancies or ones which can be chosen to represent a set of others. In some instances, we have taken a simple
random sample of statements from the larger set and had participants examine the selected set to be sure that no key ideas were being omitted. For
instance, Linton (this volume) randomly selected 150 statements out of a larger set of 710 statements that were generated. On several occasions we
have tried more formal thematic analysis of the text statements using a "key words in context" approach (Stone et al, 1966; Krippendorf, 1980) which
seems promising.
Once a final set of statements has been generated, it is valuable for the group to examine the statements for editing considerations. Sometimes the
wording of statements generated in a brainstorming session is awkward or technical jargon is not clear. In general, each statement should be consistent
with what was called for in the brainstorming prompt and should be detailed enough so that every member of the group can understand the essential
meaning of the statement.

There are many other ways to generate the conceptual domain than brainstorming. Sometimes a set of statements can be abstracted from existing text
documents such as annual reports, internal organizational memos, interviews or field notes. For instance, Dumont (this volume) utilized the
"documentary coding method" described by Wrightson (1976) to abstract statements or entities from interview records. In some cases there was no
need to generate statements at all because the nature of the conceptualization dictates the elements of the conceptual domain. Thus, if the goal is to
conceptualize the interrelationships between the set of 10 departments in an organization, we might simply use the 10 department names as the set of
statements. Marquart (this volume) used a set of concepts derived from the literature on employer-sponsored child care as the set of statements for
mapping. Caracelli (this volume) used the 100 items from the California Q-sort as the set of statements. We have also done some preliminary work to
examine the use of outlines for generation where each entry in the outline would be considered a separate statement. One potential advantage of doing
this is that we may be able to use the implicit structure of the outline headings and subheadings to structure the conceptual domain directly without
asking people to sort the statements. Cooksy (this volume) describes some preliminary attempts to develop models which take outlines as input and
estimate the similarity between statements (based on outline structure) as output.

In the York County study, a set of 95 statements was generated in the brainstorming session and is shown in Table 1. The statements describe a
broad range of issues which were thought by participants to be salient to the elderly in York County.

Step 3: Structuring of Statements

Once we have a set of statements which describes the conceptual domain for a given focus, we minimally need to provide information about how the
statements are related to each other. In addition, we often want to rate each statement on some dimension which is defined by the rating focus
statement. Both of these tasks constitute the structuring of the conceptual domain.

Typically, we obtain information about interrelationships using an unstructured card sorting procedure (Rosenberg and Kim, 1975). Each of the
brainstormed statements is printed on a separate 3x5 index card and the complete set of cards is given to each participant. Each person is then
instructed to sort the cards into piles "in a way that makes sense to you." There are several restrictions placed on this procedure: each statement can
only be placed in one pile (i.e., an item can't be placed in two piles simultaneously); all statements cannot be put into a single pile; and, all statements
cannot be put into their own pile (although some items may be sorted by themselves). Except for these conditions, people may pile the cards in any
way that makes sense to them. Often the participants perceive that there may be several different ways to sort the cards, all of which make sense. To
address this, we have either instructed participants to select the most sensible arrangement or, in some studies, have had each participant sort the cards
several times.
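
When the sorting is done on a computer, the restrictions above can be checked automatically. The sketch below (Python) assumes each sort is recorded as a list of piles, each pile a list of statement numbers; that data structure is an assumption for illustration, not part of the method itself.

    # Hypothetical helper that enforces the card-sort restrictions described above.
    def check_sort(piles, n_statements):
        placed = sorted(s for pile in piles for s in pile)
        assert placed == list(range(1, n_statements + 1)), \
            "each statement must be placed in exactly one pile"
        assert len(piles) > 1, "all statements cannot be put into a single pile"
        assert len(piles) < n_statements, "all statements cannot each be put into their own pile"
        return True

    # Example: ten statements sorted into four piles.
    check_sort([[1, 2, 10], [3, 4, 7], [5, 8], [6, 9]], n_statements=10)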

When each person has completed the sorting task, the results must be combined across people. This is accomplished in two steps. First, the results of
the sort for each person are put into a square table or matrix which has as many rows and columns as there are statements. All of the values of this
matrix are either zero or one. A '1' indicates that the statements for that row and column were placed by that person together in a pile while a '0'
indicates that they were not. This is illustrated in Figure 2 for a hypothetical person who sorted ten statements into 4 piles.
Figure 2. Procedure for computing the binary, symmetric similarity matrix for one person from their card sort.

We can see in the figure that statements 5 and 8 were sorted together in a pile. Therefore, in the table the row 5 - column 8 and row 8 - column 5
entries are '1'. Because statement 5 was not sorted with statement 6, the row 5 - column 6 and row 6 - column 5 entries are '0'. This individual matrix
is termed a binary symmetric similarity matrix. Notice that all of the diagonal values are equal to '1' because a statement is always considered to be
sorted into the same pile as itself.

Second, the individual sort matrices are added together to obtain a combined group similarity matrix. This matrix also has as many rows and columns
as there are statements. Here, however, the value in the matrix for any pair of statements indicates how many people placed that pair of statements
together in a pile regardless of what the pile meant to each person or what other statements were or were not in that pile. Values along the diagonal are
equal to the number of people who sorted. Thus, in this square group similarity matrix, values can range from zero to the number of people who
sorted. This final similarity matrix is considered the relational structure of the conceptual domain because it provides information about how the
participants grouped the statements. A high value in this matrix indicates that many of the participants put that pair of statements together in a pile and
implies that the statements are conceptually similar in some way. A low value indicates that the statement pair was seldom put together in the same
pile and implies that they are conceptually more distinct. There are many other ways than sorting to structure the conceptual domain, some of which
are briefly described in Trochim and Linton (1986). The major advantages of the sorting procedure are that it is easily understandable by participants
and that it takes little time to accomplish.
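
The two combining steps can be sketched in code as follows (Python with NumPy). The pile assignments are invented for illustration; the text and Figure 2 specify only that there were ten statements in four piles, with statements 5 and 8 sorted together and statements 5 and 6 apart.

    # Step 3 aggregation: build each person's binary symmetric similarity matrix
    # (1 where two statements share a pile, with 1s on the diagonal) and sum the
    # individual matrices into the group similarity matrix.
    import numpy as np

    def sort_to_matrix(piles, n_statements):
        m = np.zeros((n_statements, n_statements), dtype=int)
        for pile in piles:
            for i in pile:
                for j in pile:
                    m[i - 1, j - 1] = 1      # same pile (including i == j) -> 1
        return m

    sorts = [
        [[1, 2, 10], [3, 4, 7], [5, 8], [6, 9]],    # person 1 (consistent with Figure 2)
        [[1, 2], [3, 4, 5, 8], [6, 9, 10], [7]],    # person 2 (invented)
    ]

    group_similarity = sum(sort_to_matrix(s, 10) for s in sorts)
    print(group_similarity)   # diagonal equals the number of sorters; off-diagonal values 0..2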

The second task in structuring the conceptual domain involves rating each statement on some dimension as described in the rating focus statement.
Usually this rating is accomplished using a Likert-type response scale (e.g., 1-to-5 or 1-to-7 rating) to indicate how much importance, priority, effort
or expected outcome is associated with each statement. For each statement one then obtains at least the arithmetic mean of the ratings and sometimes
other descriptive statistical information.

Step 4: Representation of Statements

There are three steps involved in the way in which we typically represent the conceptual domain. First, we conduct an analysis which locates each
statement as a separate point on a map (i.e., the point map). Statements which are closer to each other on this map were likely to have been sorted
together more frequently; more distant statements on the map were in general sorted together less frequently. Second, we group or partition the
statements on this map into clusters (i.e., the cluster map) which represent higher order conceptual groupings of the original set of statements. Finally,
we can construct maps which overlay the averaged ratings either by point (i.e., the point rating map) or by cluster (i.e., the cluster rating map).

To accomplish the first step, the mapping process, we typically conduct a two-dimensional nonmetric multidimensional scaling of the similarity
matrix obtained from Step 3. Nonmetric multidimensional scaling is a technique which takes a proximity matrix and represents it in any number of
dimensions as distances between the original items in the matrix. A good introductory discussion of multidimensional scaling can be found in Kruskal
and Wish (1978) and a more technical description of the algorithm which is used is given in Davison (1983).

A simple example of the principle which underlies multidimensional scaling can be given. If you were given a geographical map of the United States
and asked to construct a table of distances between three major cities, say New York, Chicago, and Los Angeles, you could accomplish this fairly
easily. You might take a ruler and measure the distances between each pair of cities and enter them into a 3 x 3 table of ruler-scale relative distances.
However, if you were given only a table of distances between the three cities and were asked to draw a map which located the three cities on it as
points in a way that fairly represented the relative distances in the table, the task would be slightly more difficult. You might begin by arbitrarily
placing two points on a page to represent two of the cities and then trying to draw in a third point so that its distances to the first two cities were
proportionate to the distances given in the table. You would be able to accomplish this if the table consisted of three cities, but for more cities this task
would become extremely complex. Multidimensional scaling is a multivariate analysis which accomplishes this task. It takes a table of similarities or
distances and iteratively places points on a map so that the original table is as fairly represented as possible. In concept mapping, the multidimensional
scaling analysis creates a map of points which represent the set of statements which were brainstormed based on the similarity matrix which resulted
from the sorting task.
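
As a sketch of how this first analysis might be carried out with current tools (scikit-learn's nonmetric MDS is one option; the original work predates it), the group similarity matrix is converted into a dissimilarity matrix and scaled into two dimensions. The small simulated similarity matrix and the particular similarity-to-dissimilarity conversion are illustrative assumptions.

    # Two-dimensional nonmetric multidimensional scaling of a group similarity matrix.
    import numpy as np
    from sklearn.manifold import MDS

    # Simulated stand-in for a 10-statement group similarity matrix from 15 sorters.
    rng = np.random.default_rng(2)
    sim = rng.integers(0, 16, size=(10, 10))
    sim = np.minimum(sim, sim.T)            # make the matrix symmetric
    np.fill_diagonal(sim, 15)               # diagonal = number of people who sorted

    # Convert similarities into dissimilarities with a zero diagonal.
    dissim = (sim.max() - sim).astype(float)
    np.fill_diagonal(dissim, 0.0)

    mds = MDS(n_components=2, metric=False, dissimilarity="precomputed", random_state=0)
    xy = mds.fit_transform(dissim)          # one (x, y) point per statement -- the point map
    print(xy.round(2))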

Typically, when a multidimensional scaling analysis is conducted, the analyst has to specify how many dimensions the set of points is to be fit into. If a
one-dimensional solution is requested, all of the points will be arrayed along a single line. A two-dimensional solution places the set of points into a
bivariate distribution which is suitable for plotting on an X-Y graph. The analyst could ask for any number of solutions from 1 to N-1 dimensions.
However, it is difficult to graph and interpret solutions which are higher than three-dimensional. The literature on multidimensional scaling
discusses this dimensionality issue extensively. One view is that the analyst should fit a number of solutions (e.g., one- to five-dimensional solutions)
and examine diagnostic statistics to see whether a particular dimensional solution is compelling. This is analogous to examining scree plots of eigenvalues
in factor analysis in order to decide on the number of factors. Another view suggests that in certain contexts automatic use of two-dimensional
configurations might make sense. For instance, Kruskal and Wish (1978) state that

"Since it is generally easier to work with two-dimensional configurations than with those involving more dimensions, ease of use
considerations are also important for decisions about dimensionality. For example when an MDS configuration is desired primarily
as the foundation on which to display clustering results, then a two-dimensional configuration is far more useful than one involving
3 or more dimensions" (p. 58).

In studies where we have examined other than two-dimensional solutions, we have almost universally found the two-dimensional solution to be
acceptable, especially when coupled with cluster analysis as Kruskal and Wish suggest. Therefore, in concept mapping we usually use a two-
dimensional multidimensional scaling analysis to map the brainstormed statements into a two-dimensional plot.

The second analysis which is conducted to represent the conceptual domain is called hierarchical cluster analysis (Anderberg, 1973; Everitt, 1980).
This analysis is used to group individual statements on the map into clusters of statements which presumably reflect similar concepts. There are a
wide variety of ways to conduct cluster analysis and there is considerable debate in the literature about the relative advantages of different methods.
The discussion centers around ambiguity in the definition of the term "cluster." Everitt (1980) and Anderberg (1973) present more extensive
discussions of this issue. We have tried a number of different cluster analysis approaches. Originally, we used an oblique principal components factor
analysis approach to hierarchical cluster analysis where the input for the analysis consisted of the similarity matrix. The problem with this approach
was that it often led to results which did not visually correspond with the way in which multidimensional scaling mapped the points. This is because
differences in the two algorithms (i.e., multidimensional scaling and cluster analysis) when applied to the same similarity matrix, sometimes meant
that points which were close to each other on the map were placed in separate clusters by the cluster analysis. These results were hard to interpret and
seemed to give equal weight to multidimensional scaling and cluster analysis. Instead, it makes sense to view the mathematical basis for
multidimensional scaling as stronger than that for cluster analysis and, accordingly, to rely on the multidimensional scaling rather than the cluster
analysis to depict the basic inter-statement conceptual similarities. What we wanted was a cluster analysis which grouped or partitioned the statements
on the map as they were placed by multidimensional scaling. We found that this could be accomplished by using the X-Y multidimensional scaling
coordinate values for each point (rather than the original similarity matrix) as input to the cluster analysis. In addition, we also found that Ward's
algorithm for cluster analysis generally gave more sensible and interpretable solutions than other approaches (e.g., single linkage, centroid). Therefore
we have moved to an approach which uses Ward's hierarchical cluster analysis on the X-Y coordinate data obtained from multidimensional scaling as
the standard procedure. This in effect partitions the multidimensional scaling map into any number of clusters.
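
A minimal sketch of this standard procedure (illustrative Python with SciPy; the coordinates below are random placeholders standing in for real multidimensional scaling output so the fragment runs on its own) applies Ward's algorithm to the X-Y coordinates and cuts the resulting tree into a chosen number of clusters:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Placeholder for the (n_statements x 2) array of MDS coordinates.
rng = np.random.default_rng(0)
coords = rng.normal(size=(95, 2))

# Ward's hierarchical cluster analysis on the map coordinates themselves,
# so the clusters partition the map exactly as drawn by the scaling.
tree = linkage(coords, method="ward")

# Cut the tree into, say, 18 clusters (the number used in the York County study).
labels = fcluster(tree, t=18, criterion="maxclust")
for k in range(1, 19):
    members = np.where(labels == k)[0] + 1   # statement ID numbers
    print(f"Cluster {k}: statements {members.tolist()}")
```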

Just as deciding on the number of dimensions is an essential issue for multidimensional scaling analysis, deciding on the number of clusters is
essential for cluster analysis. All hierarchical cluster analysis procedures give as many possible cluster solutions as there are statements. In principle,
these clustering methods begin by considering each statement to be its own cluster (i.e., an N-cluster solution). At each stage in the analysis, the
algorithm combines two clusters until, at the end, all of the statements are in a single cluster. The task for the analyst is to decide how many clusters
the statements should be grouped into for the final solution. There is no simple way to accomplish this task. Essentially, the analyst must use
discretion in examining different cluster solutions to decide on which makes sense for the case at hand. Usually, assuming a set of a hundred or fewer
statements, we begin by looking at all cluster solutions from about 20 to 3 clusters. Each time the analysis moves from one cluster level to the next
lowest (e.g., from 13 to 12 clusters) we examine which statements were grouped together at that step and attempt to decide whether that grouping
makes sense for the statements in the conceptualization. In examining different cluster solutions we have found it useful to use a cluster tree which
shows pictorially all possible cluster solutions and mergers. In general, we attempt to decide on a cluster solution which, if anything, errs on the side
of more clusters rather than fewer. Clearly this is a task which requires discretion on the part of the analyst. It would be ideal if we could involve the
participants directly in this decision-making process, but as yet we have not determined an easy way to accomplish this within a group
process.
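
The examination of successive cluster solutions can also be supported programmatically. The sketch below (illustrative Python; the coordinates are again placeholders for real multidimensional scaling output) walks from a 20-cluster solution down to a 3-cluster solution, reporting which statements are joined at each step, and then draws the full cluster tree as a dendrogram:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, fcluster, dendrogram

rng = np.random.default_rng(0)
coords = rng.normal(size=(95, 2))       # placeholder MDS coordinates
tree = linkage(coords, method="ward")

def report_mergers(tree, n_items, start=20, stop=3):
    """For each move from k to k-1 clusters, list the statements that are joined."""
    prev = fcluster(tree, t=start, criterion="maxclust")
    for k in range(start - 1, stop - 1, -1):
        cur = fcluster(tree, t=k, criterion="maxclust")
        groups = {}
        for item in range(n_items):
            groups.setdefault(cur[item], set()).add(prev[item])
        for label, old_labels in groups.items():
            if len(old_labels) > 1:     # two former clusters now share one label
                joined = [i + 1 for i in range(n_items) if cur[i] == label]
                print(f"{k}-cluster solution: statements {joined} were joined")
        prev = cur

report_mergers(tree, n_items=coords.shape[0])

# The cluster tree pictorially shows all possible cluster solutions and mergers.
dendrogram(tree, no_labels=True)
plt.show()
```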

Our experience shows that, in general, the cluster analysis results are less interpretable than the results from multidimensional scaling. The cluster
analysis is viewed as suggestive and, in some cases, one may want to "visually adjust" the clusters into more sensibly interpretable partitions of the
multidimensional space. The key operative rule here would be to maintain the integrity of the multidimensional scaling results, that is, try to achieve a
clustering solution which does not allow any overlapping clusters (i.e., a true partitioning of the space).

Once we have conducted the multidimensional scaling and cluster analysis, we are able to generate a point and a cluster map. The final analysis
involves obtaining average ratings across participants for each statement and for each cluster. These can then be overlaid graphically on the maps to
produce the point rating map and the cluster rating map as will be shown later.
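
A brief sketch of this final computation (illustrative Python; the ratings, the 1-to-5 scale, and the cluster labels are all fabricated assumptions) averages each statement's ratings across participants and then averages those values within clusters:

```python
import numpy as np

rng = np.random.default_rng(1)
ratings = rng.integers(1, 6, size=(20, 95))   # participants x statements, 1-5 scale
labels = rng.integers(1, 19, size=95)         # cluster label assigned to each statement

# Average rating for each statement, across participants.
statement_means = ratings.mean(axis=0)

# Average rating for each cluster: the mean of its statements' averages.
cluster_means = {k: statement_means[labels == k].mean() for k in np.unique(labels)}

print("Statement 1 average rating:", round(statement_means[0], 2))
for k, m in sorted(cluster_means.items()):
    print(f"Cluster {k}: average rating {m:.2f}")
```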

We have several products at the end of the representation step. First, we have the two-dimensional point or statement map which locates each of the
brainstormed statements as a point. Next to each point we place the number of the statement so that participants can identify each point as a statement.
Second, we have a cluster map which shows how the cluster analysis grouped the points. Third, we have the point rating map which shows the
average ratings for each statement on the point map. Finally, we also have the cluster rating map which shows the average rating for each cluster on
the cluster map. This information forms the basis of the interpretation in the next step.

Step 5: Interpretation of Maps

To interpret the conceptualization, we usually assemble a specific set of materials and follow a specific sequence of steps -- a process which has been
worked out largely on the basis of our experiences on many different projects. The materials consist of:

1. The Statement List. The original list of brainstormed statements, each of which is shown with an identifying number.
2. The Cluster List. A listing of the statements as they were grouped into clusters by the cluster analysis.
3. The Point Map. The numbered point map which shows the statements as they were placed by multidimensional scaling.
4. The Cluster Map. The cluster map which shows how statements were grouped by the cluster analysis.
5. The Point Rating Map. The numbered point map with average statement ratings overlaid.
6. The Cluster Rating Map. The cluster map with average cluster ratings overlaid.

Notice that there are four different types of maps here. Which of them is the concept map? In fact, they are all concept maps. Each of these maps tells
us something about the major ideas and how they are interrelated. Each of them emphasizes a different part of the conceptual information. While the
maps are distinctly different ways of portraying or representing the conceptual structure (and consequently, different names are used to distinguish
them), it is important to remember that they are all related to each other and are simply reflecting different sides of the same underlying conceptual
phenomenon. In the remainder of this paper, if the type of concept map is not specified, it is usually fair to assume that the discussion pertains to the
cluster map because that is usually the most directly interpretable map.

The facilitator begins by giving the group the original set of brainstormed statements (for the York County example, the statement list is shown in
Table 1 above) and recalling that the statements were generated by them in the brainstorming session. Participants are then reminded that they
grouped these statements into piles and told that the individual sortings were combined for the entire group. The statements as they were grouped by
the cluster analysis (the cluster list) are then presented. For the York County study, the cluster listing is given in Table 2.

Each participant is asked to read through the set of statements for each cluster and come up with a short phrase or word which seems to describe or
name the set of statements as a cluster. This is analogous to naming factors in factor analysis. When each person has a tentative name for each cluster,
the group works cluster-by-cluster in an attempt to achieve group agreement on an acceptable name for each cluster. This is often an interesting
negotiating task. As each person in turn gives their name for a given cluster, the group can often readily see the consensus that exists. For some
clusters, the group may have difficulty in arriving at a single name. This is because the statements in that cluster might actually contain several
different ideas and, had a solution with more clusters been selected, the statements would have been subdivided into subclusters. In these cases, the
facilitator might suggest that the group use a hybrid name, perhaps by combining titles from several individuals. In any event, the group is told that
these names are tentative and may be revised later.

When the group has reached consensus on the names for each cluster, they are presented with the numbered point map. The York County map is
shown in Figure 3.

Figure 3. Numbered point map for the York County Elderly project.

They are told that the analysis placed all of the statements on the map in such a way that statements which were piled together frequently should be
closer to each other on the map than statements which were not piled together frequently. Usually it is a good idea to give them a few minutes to
identify a few statements on the map which are close together and examine the wording of those statements on the original brainstormed statement list
as a way to reinforce the notion that the analysis is placing the statements sensibly. When they have become familiar with this numbered point map,
they are told that the analysis also organized the points into groups as shown on the list of clustered statements which they just named. The cluster
map is presented and participants are shown that the map portrays visually the exact same clustering which they just looked at on the cluster list. They
are then asked to write the cluster names which the group arrived at next to the appropriate cluster on the cluster map. They are then asked to examine
this named cluster map to see whether it makes any sense. The facilitator should remind participants that in general, clusters which are closer together
on the cluster map should be more similar conceptually than clusters which are farther apart and ask them to assess whether this seems to be true or
not. Participants might even begin at some point on the map and, thinking of a geographic map, "take a trip" across the map reading each cluster in
turn to see whether or not the visual structure makes any sense. For the York County example, the named cluster map is given in Figure 4.
Figure 4. Named cluster map for the York County Elderly project.

The participants are then asked to see whether there are any sensible groups or clusters of clusters. Usually, the group is able to perceive several major
regions. These are discussed and partitions are drawn on the map to indicate the different regions. Just as in naming the clusters, the group then
attempts to arrive at a consensus concerning names for these regions. In the York County study, people were satisfied with the clustering arrangement
and did not wish to define any regions.

This final named cluster map constitutes the conceptual framework and the basic result of the concept mapping process. The facilitator should remind
the participants that this final map is their own product. It was entirely based on statements which they generated in their own words and which they
grouped. The labels on the map represent categories which they named. While in general the computer analysis will yield sensible final maps, the
group should feel free to change or rearrange the final map until it makes sense for them and for the conceptualization task at hand. At this point it is
useful for the facilitator to engage the participants in a general discussion about what the map tells them about their ideas for evaluation or planning.

If ratings were done in the structuring step, the facilitator then presents the point rating and cluster rating maps. The point rating map for the York
County study is given in Figure 5, and the cluster rating map is shown in Figure 6.

Figure 5. Point rating map for the York County Elderly project.

Figure 6. Cluster rating map for the York County Elderly project.

Participants examine these and attempt to determine whether they make sense and what they imply about the ideas which underlie their evaluation or
planning task.

In the York County study some interesting insights arose from the interpretation process. If we look at either of the cluster maps (Figures 4 or 6) we
can see that the clusters on the right side of the map -- Personal Growth and Education, Stereotyping, Socialization Needs, and Political Strength and
Advocacy -- tend to be the types of issues of most concern to the "well elderly", people who are not ill or institutionalized. As we move counter-
clockwise on the map, the clusters on the top represent concerns of those who are becoming ill or are in need of service, those on the left are most
relevant for people who are already ill and/or in need of intensive home or center-based care, and those on the bottom pertain most to persons who are
severely ill or dying. The group perceived this counter-clockwise cycle as a good description of their implicit theory of the aging process.
Furthermore, they believed that most of the political strength and advocacy work that exists around aging issues tends to be done by the "well
elderly", as the map implies. This led them to discuss the desirability of working within the community to encourage the well elderly to perceive their
position within the entire aging cycle and begin to engage in more active advocacy for the full range of concerns which the map describes. Thus, the
map provided a foundation for an approach to the elderly which addresses major concerns and emphasizes more active involvement, especially of
the well elderly.

Step 6: Utilization of Maps


At this point in the process we turn our attention back to the original reason for conducting the structured conceptualization. The group discusses how
the final concept map might be used to enhance either the planning or evaluation effort. The uses of the map are limited only by the creativity and
motivation of the group. A number of straightforward applications suggest themselves. For instance, if the conceptualization was done as the basis for
planning, the final map might be used for structuring the subsequent planning effort. The planning group might use it for dividing up into subgroups
or task forces, each of which is assigned a specific cluster or region. Each task group could then examine issues like: the organizational budget
allocation for each cluster, how organizational personnel are distributed within each cluster, how important each cluster is relative to the others, what
resources might be brought to bear in addressing each cluster, what level of competition exists from other organizations providing services in each
cluster, and so on. The task forces can use the individual statements within a cluster as cues or prompts concerning what they should consider
specifically within each cluster. One major advantage to having the concept map is that the results of these task force investigations can often be
usefully displayed directly on the concept map, as were the priority ratings for the York County example in Figures 5 and 6. Thus, any number of maps
can be created showing such variables as budget allocations, staff effort, or degree of need, and displayed by individual statement and/or by cluster.

For planning purposes, the concept map can also be used as the framework for an outline of a planning report. Regional headings would constitute the
highest level of indentation for the outline, clusters would be subheaded within their appropriate regions, individual statements could be subheadings
within clusters, and any statement-level information relevant to planning could be subheaded within this structure. Thus for planning, the concept map
provides a framework for understanding important issues in a way which enables sensible pictorial and outline representations.

The concept map is also extremely useful in evaluation contexts. Here, its utilization depends on what the focus was for the conceptualization. If the
focus was on planning a program or service which would then be evaluated, the concept map can act as an organizing device for operationalizing and
implementing the program. For instance, if the program is a training program in a human service agency, the training can be constructed based on the
concept map with different training sessions designed to address each cluster and the individual brainstormed statements acting as cues for what kinds
of information should be covered in each session. The concept map is the framework for the program construct and can form the basis of a process
evaluation of the program. In this case or in the case where the focus of the conceptualization was on the outcomes of some program, the concept map
can guide measurement development. Each cluster can be viewed as a measurement construct and the individual statements can suggest specific
operationalizations of measures within constructs. For instance, if the group wished to develop a questionnaire, they could use the concept map by
having each cluster represented with questions on the questionnaire. Furthermore, the original brainstormed statements might provide question
prompts which either directly or with some revision could be included on the questionnaire along with some rating response format. Alternatively, if a
more multimethod approach to measurement was desired, the group could make sure that within each cluster several different types of measures were
constructed to reflect the cluster. The exciting prospect here is that the concept map provides a useful way to operationalize the multitrait-multimethod
approach to measurement which was outlined by Campbell and Fiske (1959) and is described in greater detail in the paper by Davis (this volume). In
this example, the concept map represents the group's theoretical expectations about how the major measurement constructs are conceptually
interrelated. From the concept map we can predict the rank order which we expect in the correlations between measures. These expectations (the
theoretical pattern) could then be directly compared with a matrix of correlations as obtained in the study (the observed pattern) and the degree to
which the two match can constitute evidence for the construct validity of the measures. This "pattern matching" approach to construct validity is
discussed in detail in Trochim (1985; in press).
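
A minimal sketch of the pattern matching idea (illustrative Python; the data are invented, and the operationalization assumed here -- nearer clusters on the map predict higher correlations between the corresponding measures -- is only one way to derive the theoretical pattern) compares the rank order of inter-cluster map distances with the observed correlations:

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

# Map coordinates of four cluster centroids (the theoretical pattern).
cluster_xy = np.array([[0.1, 0.9], [0.3, 0.8], [0.9, 0.2], [0.8, 0.1]])

# Observed correlation matrix among the four corresponding measures.
observed_r = np.array([
    [1.00, 0.62, 0.15, 0.10],
    [0.62, 1.00, 0.20, 0.18],
    [0.15, 0.20, 1.00, 0.55],
    [0.10, 0.18, 0.55, 1.00],
])

# Expectation: small map distance goes with large correlation, so the rank
# correlation between distances and correlations should be strongly negative.
map_distances = pdist(cluster_xy)                 # pairwise distances, vectorized
iu = np.triu_indices_from(observed_r, k=1)        # matching pairs of measures
rho, p = spearmanr(map_distances, observed_r[iu])

print(f"Pattern match (Spearman rho): {rho:.2f} (p = {p:.3f})")
# A strongly negative rho is evidence that the observed pattern of correlations
# matches the theoretical pattern implied by the concept map.
```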

A number of papers in this volume illustrate the use of structured conceptualization for evaluation. For instance, Valentine (this volume) used concept
mapping to construct an instrument which could be used to assess caring in a nursing context. Galvin (this volume) used concept mapping to describe
the central concepts in a Big Brother/ Big Sister program, constructed an instrument from the map, and used the instrument to explore how well the
program appeared to be performing.

Computer Programs

An introduction to concept mapping would be incomplete without some consideration of the computer programs which can be used to accomplish this
process. Essentially, there are two options: use some combination of standard general-purpose word processing and statistics packages; or, use the
computer package which was designed by the author specifically for accomplishing concept mapping. Each is discussed in turn.

Using General-Purpose Software Packages

When using available general-purpose software packages, the analyst will have to be prepared to experiment with different processing options until a
suitable procedure can be constructed. Minimally, it is desirable to have a good word processing program; a statistics package which has routines for
multidimensional scaling and cluster analysis, and which has fairly flexible data manipulation capabilities; and, a graphics program to plot the final
maps.

On a mainframe computer, brainstormed statements can be entered into any standard editor. They would then need to be formatted and printed onto
cards, labels, or sheets in a manner which allows the analyst to assemble them into sorting decks. In addition, the statements would need some minor
formatting in order to produce an instrument for the rating task. Multidimensional scaling and cluster analysis are available in most large scale
statistical systems including SAS and SPSSx. The major difficulty with this approach will involve the entry and manipulation of the sort data. The
sort results could be entered into a data matrix using a standard editor where each participant constitutes a single record (row) and there are as many
variables (columns) as there are statements. The analyst would arbitrarily assign a unique number to each pile of sorted statements and enter this
number in the appropriate row and column. There are two main difficulties with this procedure, both of which are resolvable. First, since the data for
any participant will be entered in the order of the statements, the sort data will have to be coded. That is, the analyst will have to set up a coding sheet
with the statements numbered sequentially and enter the pile number next to each statement before entering the data. Second, and perhaps more
troublesome, the analyst will have to write a computer program which takes the sort data matrix as input, constructs the binary square similarity
matrix for each person and then adds these to get the group similarity matrix. This matrix can then be ported to the statistics package for input to the
analysis. On the graphics side, it is possible to use the plotting routines available with statistical packages to obtain the point map. However, it may be
difficult to obtain automatically the cluster drawings and any 3-D or pseudo-3-D plots which overlay ratings as shown in Figures 5 and 6. Listings of
statements by cluster will almost certainly have to be accomplished either by re-entering the statements in cluster order or by editing the original file.
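
A sketch of such a program, written here in Python rather than in the mainframe tools of the period (the variable names and the small data matrix are illustrative assumptions), builds the binary square similarity matrix for each participant from that person's row of pile numbers and sums these matrices to obtain the group similarity matrix:

```python
import numpy as np

# sort_data: one row per participant, one column per statement; each cell holds
# the arbitrary pile number that the participant assigned to that statement.
sort_data = np.array([
    [1, 1, 2, 2, 3],   # participant 1 sorted five statements into three piles
    [1, 2, 2, 3, 3],   # participant 2
    [1, 1, 1, 2, 2],   # participant 3
])

n_statements = sort_data.shape[1]
group_similarity = np.zeros((n_statements, n_statements), dtype=int)

for piles in sort_data:
    # Binary matrix for this participant: 1 if two statements share a pile.
    same_pile = (piles[:, None] == piles[None, :]).astype(int)
    group_similarity += same_pile

print(group_similarity)
# Each off-diagonal cell counts how many participants sorted that pair of
# statements into the same pile; the diagonal equals the number of participants.
```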

The concept mapping analyses can also be accomplished on microcomputers, although there are some trade-offs involved. For instance, there are few
microcomputer implementations of multidimensional scaling available, although there are many graphics packages which might be easier to use than
mainframe counterparts to produce the maps. Again, one could use virtually any word processing program for entering the statements (e.g., Microsoft
WORD, WordPerfect). As with the mainframe option, the statements would need to be formatted for printing as sorting decks and as an instrument
for the rating task. The SYSTAT program, available for both MS/DOS and Macintosh environments, is one of the few microcomputer programs at this
time which includes both multidimensional scaling and cluster analysis (although the cluster analysis package does not include Ward's algorithm). As
with the mainframe option, the analyst will have to write a program to construct the group similarity matrix. The maps can be produced using a
combination of plotting and painting programs. For instance, one might use CricketGraph on the MAC (or even the SYSTAT Graph option) to
generate the x-y point plot and then load that picture into a painting program (e.g., SuperPaint, FullPaint, CANVAS) to manually draw cluster
boundaries and construct pseudo-3-D rating plots like those in Figures 5 and 6. Again, lists of statements by cluster will have to be constructed by
editing the original brainstormed statement file.

Using The CONCEPT SYSTEM

Because of the inconvenience of using general-purpose programs for accomplishing concept mapping, the author has written a computer program,
called The Concept System, specifically to accomplish this task. The program is available for both the MS/DOS and Macintosh microcomputer
environments. The program is interactive and has separate menu-selected options for: entering the brainstormed statements; printing decks of cards or
pre-formatted rating sheets; entering sort data (the data can be entered by cluster simply by typing in the ID numbers of the statements in any given
cluster); entering the rating data; conducting the analysis (including construction of the similarity matrix, multidimensional scaling, cluster analysis,
and averaging of ratings); and, interactive graphing of the results in a wide variety of ways to produce any of the maps discussed above. The Concept
System allows the user to examine interactively any possible clustering solution, print a cluster tree and lists of statements by any selected number of
clusters. The program limits the user to no more than 100 statements in any concept mapping project. Anyone interested in obtaining information
about The Concept System program may do so by contacting the author directly. Virtually all of the projects reported in this volume utilized The
Concept System to compute maps. While this is undoubtedly the easier way to accomplish concept mapping because the program was written
specifically for this purpose, it is certainly also feasible to do so using available general-purpose programs.

Conclusions

Concept mapping of the type described here is designed to bring order to a task which is often extremely difficult for groups or organizations to
accomplish. The process has several distinct advantages (many of which are discussed in greater detail in the other papers in this volume). First, it
encourages the participant group to stay on task and to lay out relatively quickly a framework for a planning or evaluation study. Second, it expresses
the conceptual framework in the language of the participants rather than in terms of the evaluator or planner's language or the language of social
science theorizing. Third, it results in a graphic representation which at a glance shows all of the major ideas and their interrelationships. Fourth, this
graphic product is comprehensible to all of the participants and can be presented to other audiences relatively easily. Finally, we have observed over
many concept mapping projects that one of the major effects of the process is that it appears to increase group cohesiveness and morale. Especially in
groups which have previously tried to accomplish conceptualizing through committee discussions, we have found that they readily appreciate the
structure of the process and the ease with which it produces an interpretable starting point for subsequent evaluation or planning work.

This concept mapping process is by no means the only way in which group conceptualization can be accomplished nor is it necessarily the best way
for any given situation. In situations where a group can achieve consensus relatively easily on their own or where a pictorial representation of their
thinking is not desired or deemed useful, this approach would not be recommended.

As the other papers in this volume will show, we are only just beginning our efforts to devise sensible group conceptualization processes. There are
still many methodological issues which need to be explored in order to improve the process described here, and that work is continuing. However, as
it now stands, the concept mapping process is a useful procedure which helps a group to focus on the conceptualization task and results in an easily
understandable pictorial representation of their thinking, emerging from a group process which ensures input from all participants.
References

Anderberg, M.R. (1973). Cluster analysis for applications. New York, NY: Academic Press.

Bickman, L. (Ed.). (1986). Using program theory in evaluation. New Directions for Program Evaluation. San Francisco, CA: Jossey-Bass.

Chen, H.T. and Rossi, P.H. (1983). Evaluating with sense: The theory-driven approach. Evaluation Review. 7, 283-302.

Chen, H.T. and Rossi, P.H. (1987). The theory-driven approach to validity. Evaluation and Program Planning. 10, 95-103.

Cordray, D.S. (1986). Quasi-experimental analysis: A mixture of methods and judgment. In W. Trochim. (Ed.). Advances in quasi-experimental
design and analysis. New Directions in Program Evaluation. San Francisco, CA: Jossey-Bass.

Davison, M.L. (1983). Multidimensional Scaling. New York, NY: John Wiley and Sons.

Dunn, W. (1981). Public policy analysis: An introduction. Englewood Cliffs, NJ: Prentice Hall.

Einhorn, H.J. and Hogarth, R.M. (1986). Judging probable cause. Psychological Bulletin. 99, 1, 3-19.

Everitt, B. (1980). Cluster Analysis (2nd Edition). New York, NY: Halsted Press, A Division of John Wiley and Sons.

Krippendorf, K. (1980). Content analysis: An introduction to its methodology. Beverly Hills, CA: Sage Publications.

Kruskal, J.B. and Wish, M. (1978). Multidimensional scaling. Beverly Hills, CA: Sage Publications.

Mark, M.M. (1986). Validity typologies and the logic and practice of quasi-experimentation. In W. Trochim. (Ed.). Advances in quasi-experimental
design and analysis. New Directions in Program Evaluation. San Francisco, CA: Jossey-Bass.

Novak, J.D. and Gowin, D.B. (1984). Learning how to learn. Cambridge, England: Cambridge University Press.

Osborn, A.F. (1948). Your creative power. New York, NY: Charles Scribner.

Rico, G.L. (1983). Writing the natural way: Using right-brain techniques to release your expressive powers. Los Angeles, CA: J.P. Tarcher.

Rosenberg, S. and Kim, M.P. (1975). The method of sorting as a data-gathering procedure in multivariate research. Multivariate Behavioral Research, 10, 489-502.

Rossi, P.H. and Chen, H. (in press). Issues and overview of the theory-driven approach. Evaluation and Program Planning.

Shadish, W.R., Cook, T.D. and Houts, A.C. (1986). Quasi-experimentation in a critical multiplist mode. In W. Trochim. (Ed.). Advances in quasi-
experimental design and analysis. New Directions in Program Evaluation. San Francisco, CA: Jossey-Bass.

Stone, P.J., Dunphy, D.C., Smith, M.S. and Ogilvie, D.M. (1966). The general inquirer: A computer approach to content analysis. Cambridge, MA:
The Massachusetts Institute of Technology.

Trochim, W. (1985). Pattern matching, validity, and conceptualization in program evaluation. Evaluation Review, 9, 5, 575-604.

Trochim, W. and Linton, R. (1986). Conceptualization for evaluation and planning. Evaluation and Program Planning, 9, 289-308.

Trochim, W. (in press) Pattern matching and program theory. Evaluation and Program Planning.

Wrightson, M. (1976). The documentary coding method. In R. Axelrod (Ed.). The structure of decision: The cognitive maps of political elites.
Princeton, NJ: The Princeton University Press.
Table 1. Brainstormed statements from York County Elderly Project.

1) lack of public transportation


2) lack of money
3) high cost of medication
4) high cost of health care
5) stigmatization of elderly persons
6) isolation
7) dependency -- interpersonal
8) loss of close support
9) deterioration of mental capacity
10) physical disabilities
11) late life role changes
12) depression
13) inadequate home care services
14) lack of nursing homes
15) decreased functional status
16) loss of recreation
17) concern for mental wellness
18) emergency supervised housing
19) lack of socialization
20) fears about not being able to protect themselves
21) inability to cope with change
22) effects of the weather on the elderly
23) death of spouse
24) lack of funding for community care
25) lack of expertise to treat mentally ill elderly
26) confusing social service system
27) exploitation
28) physical abuse and neglect by caretakers
29) duplication of services
30) lack of outreach
31) losing your driver's license
32) Medicare limitations
33) nutrition services
34) expenses of maintaining family contacts
35) need for political advocacy
36) housing costs
37) different housing alternatives
38) over self-medication
39) over-medication by physicians
40) inability to understand or read medication instructions
41) need for independence
42) lack of awareness in mixing drugs and alcohol
43) expanding (exploding) elderly population
44) continuation of vocational satisfaction
45) lack of preparation for aging (including retirement)
46) issues around self-worth and self-esteem
47) changing role in the community
48) deterioration of family life
49) need for respite for caregivers
50) awareness of community resources
51) long-term care insurance
52) Medicare coverage being gradually "squeezed down"
53) lack of state and federal funding commitment to elderly
54) fear of prolonged dying
55) concern over being a burden
56) feeling of uselessness
57) feeling of being excluded by other generations or groups
58) desire for access to educational/recreational programs
59) desire to be treated like everybody else
60) assistance with managing finances
61) job discrimination
62) lack of education on life changes
63) exclusion of elderly as an educational resource
64) desire to pass on traditions and knowledge/experiences
65) how do some elderly function so well? (exceptional elderly)
66) premature categorization as "old" or elderly
67) care for elderly persons with dementia
68) confusion between aging and illness
69) improved use of educational opportunities
70) access issues for disabled elderly
71) accessible education through creative programming
72) more diverse social opportunities
73) placement outside of home
74) means to maintain independence
75) in-home visits by physicians
76) clergy that are aware of community resources
77) improved public awareness and support of elderly needs
78) clergy who can make home visits
79) understanding and respect for sexuality
80) fear of being crazy
81) suicide among the elderly
82) alcoholism among the elderly
83) infantilization of elderly by providers
84) an institutional end (decline and death)
85) giving elderly persons the right to choose to die
86) fear of being abandoned
87) loss of legal control
88) loss of financial control
89) better understanding of elderly by legal profession
90) distress about the family issues of their children
91) high cost of home nursing care
92) role models for the elderly
93) depletion of resources by children
94) lack of support for family caregivers
95) law enforcement intervention when elderly refuse to be hospitalized

Table 2. Statements grouped by cluster, York County Elderly Project.

Cluster 1

1) lack of public transportation


13) inadequate home care services
2) lack of money
75) in-home visits by physicians
78) clergy who can make home visits

Cluster 2

3) high cost of medication


4) high cost of health care
51) long-term care insurance
52) Medicare coverage being gradually "squeezed down"
91) high cost of home nursing care
53) lack of state and federal funding commitment to elderly

Cluster 3

24) lack of funding for community care


32) Medicare limitations
36) housing costs
70) access issues for disabled elderly
76) clergy that are aware of community resources

Cluster 4

26) confusing social service system


29) duplication of services
89) better understanding of elderly by legal profession
50) awareness of community resources

Cluster 5

14) lack of nursing homes


49) need for respite for caregivers
94) lack of support for family caregivers
18) emergency supervised housing
37) different housing alternatives
83) infantilization of elderly by providers

Cluster 6

27) exploitation
28) physical abuse and neglect by caretakers

Cluster 7

30) lack of outreach


33) nutrition services
34) expenses of maintaining family contacts

Cluster 8

25) lack of expertise to treat mentally ill elderly


95) law enforcement intervention when elderly refuse to be hospitalized
73) placement outside of home
39) over-medication by physicians
60) assistance with managing finances

Cluster 9

48) deterioration of family life


90) distress about the family issues of their children
67) care for elderly persons with dementia
93) depletion of resources by children

Cluster 10

5) stigmatization of elderly persons


61) job discrimination
66) premature categorization as "old" or elderly

Cluster 11

44) continuation of vocational satisfaction


69) improved use of educational opportunities
62) lack of education on life changes
74) means to maintain independence
65) how do some elderly function so well? (exceptional elderly)
92) role models for the elderly
64) desire to pass on traditions and knowledge/experiences
72) more diverse social opportunities

Cluster 12

35) need for political advocacy


58) desire for access to educational/recreational programs
63) exclusion of elderly as an educational resource
71) accessible education through creative programming
77) improved public awareness and support of elderly needs
43) expanding (exploding) elderly population

Cluster 13

11) late life role changes


45) lack of preparation for aging (including retirement)
57) feeling of being excluded by other generations or groups
21) inability to cope with change
85) giving elderly persons the right to choose to die
46) issues around self-worth and self-esteem

Cluster 14

16) loss of recreation


19) lack of socialization
59) desire to be treated like everybody else
79) understanding and respect for sexuality
41) need for independence
47) changing role in the community

Cluster 15

6) isolation
23) death of spouse
68) confusion between aging and illness
17) concern for mental wellness
54) fear of prolonged dying
86) fear of being abandoned
20) fears about not being able to protect themselves

Cluster 16

9) deterioration of mental capacity


38) over self-medication
12) depression
80) fear of being crazy
81) suicide among the elderly
55) concern over being a burden
56) feeling of uselessness

Cluster 17
7) dependency -- interpersonal
8) loss of close support
87) loss of legal control
40) inability to understand or read medication instructions
42) lack of awareness in mixing drugs and alcohol
82) alcoholism among the elderly

Cluster 18

10) physical disabilities


88) loss of financial control
15) decreased functional status
31) losing your driver's license
22) effects of the weather on the elderly
84) an institutional end (decline and death)
Concept Mapping: Soft Science or Hard Art?

William M.K. Trochim


Cornell University

Abstract

Is concept mapping "science" or "art"? Can we legitimately claim that concept maps represent reality, or are they primarily suggestive devices which might
stimulate new ways to look at our experiences? Here, the scientific side of concept mapping is viewed as "soft science" and the artistic one as "hard art" to imply
that the process has some qualities of both, but probably does not fall exclusively within either's domain. In the spirit of hard art, a "gallery" of final concept maps
from twenty projects is presented, partly to illustrate more examples of the process when used in a variety of subject areas and for different purposes, and partly for
their aesthetic value alone. In the spirit of soft science, two major issues are considered. First, the evidence for the validity and reliability of concept mapping is
introduced, along with some suggestions for further research which might be undertaken to examine those characteristics. Second, the role of concept mapping is
discussed, with special emphasis on its use in a pattern matching framework.

Concept Mapping: Soft Science or Hard Art?

One of the lingering questions regarding concept mapping is: should it be considered primarily a "scientific" process or an "artistic" one? Without getting into the
voluminous literature on the distinctions and commonalities between art and science, we might at the least assume that science strives for objectivity -- for an
agreed upon consensus about reality -- whereas art is more subjective in nature -- striving to reach and challenge our perspective on reality in a unique, personal
manner. Perhaps even posing the issue dichotomously obscures some of the commonalities between art and science. After all, like art, science requires creativity
and insight especially in theory development stages. And like science, art requires a mastery of skills, methods, tools and technologies.

This paper does not provide an answer to the question. As with most issues which are dichotomously framed, it may very well be that the truth of the matter lies
somewhere in the middle. Concept mapping may result in both a representation of reality and an interesting suggestive device. In anticipation of such a non-
conclusion the title of this paper has been tempered somewhat, describing the scientific side of concept mapping as "soft science" and the artistic one as "hard art"
to imply that the process has some qualities of both, but probably does not fall exclusively within either artistic or scientific domains. This paper will examine the
concept mapping process from each perspective -- viewing it as an artistic procedure which yields interpretable, suggestive conceptual pictures and as a scientific
one based upon sound evidence regarding its validity, reliability and theory-enhancing value.

The question of whether concept mapping is primarily hard art or soft science is an important one on several counts. First, it is essential that those who use the
process not misrepresent it to clients. It is easy to believe that the maps which result have a quality of finality to them -- they imply objectivity and scientific truth;
they seem to represent some group conceptual reality. If this is not true -- and our clients believe that it is -- then we are deceiving them and practicing scientism of
a most insidious type. Second, concept mapping would have different uses, depending on how we choose to answer the question. If it is a credible scientific
procedure, we might use it as the basis for developing theory, as a way to develop theoretical patterns for pattern matching, or as the basis for generalizability of
inferences (Trochim, in press). If concept maps are not justifiable as accurate representations of some underlying reality, they might alternatively be considered
primarily as suggestive devices, useful for their stimulative or creative value. In this latter case, the focus would be more aesthetic in nature -- how do we generate
interesting maps which suggest new ways of looking at familiar problems? Finally, the way in which we evaluate the effects of concept mapping would differ
depending on whether it is hard art or soft science. From a scientific viewpoint, we would want to know the degree to which concept maps are reliable, valid and
parsimonious in their representation of theory or data. From an artistic point of view we would be more concerned about the aesthetic judgment of the participants
-- the degree to which the map is pleasing or interpretable -- and we would look at the effects of the process on stimulating new ideas, or facilitating subsequent
planning or evaluation efforts.

The next section is in the spirit of artistry and aesthetics and presents a brief gallery of concept maps from a variety of different projects. These maps provide more
examples of the concept mapping process and will illustrate a variety of techniques for graphing concept maps. The section after that presents some of the scientific
issues which have been raised and suggests some of the research which needs to be included on our agenda for further investigations of the concept mapping
process.

Concept Mapping as Hard Art: A Gallery of Projects

As with all art, beauty is in the eye of the beholder. Because concept maps are the product of a fairly complex set of group interactions, it is reasonable to expect
that even for people within the participating group, appreciation and understanding of a concept map will differ. If so, then it is almost certainly the case that
someone who did not participate will have more difficulty in understanding what a concept map means to the participants. Every mapping process involves a
unique group of individuals who come to the process with their own experiences, jargon, and motivations. The outside viewers of a map are not privy to this and,
instead, apprehend the picture with their own perspectives and biases. It is worth recognizing, then, that the concept map has one effect upon the participant
group and organization(s) and another upon an outside viewer, and that these two may not be in agreement.

Here, we review briefly the final concept maps from twenty different projects (only one of these has already been seen in the introductory chapter of this volume).
These projects are included here for a number of reasons. They illustrate a wider variety of subject areas for the application of the concept mapping process. They
show different graphing techniques which can be used to display concept maps. They depict different purposes and uses of concept mapping. But perhaps most of
all, you may find that some of the maps are simply "pretty" and interesting to look at. Unfortunately, only final maps can be presented here due to space limitations.
More detailed descriptions of any project may be requested from the author.

For several projects, the anonymity of the organization is preserved (usually at their request). The twenty projects and their major characteristics are described in
Table 1.

They span a five-year period, beginning in the summer of 1983, with the most recent conducted in the spring of 1988. They range in size from
projects with as few as 4 participants to as many as 75, and from as few as 11 statements to as many as 137. The participants include organizational staff, board members and
client groups. These projects represent an evolution in our understanding of concept mapping from some of the cruder early attempts to more sophisticated recent
ones. The twenty projects are classified by subject areas in Table 2.

Most of the projects are related in some way to education, but there are also other fields well represented. Some projects are pertinent to multiple subject areas. For
instance, project 10, the Community School of Music and Arts, is classified in four subject areas -- education, educational administration, children and youth, and
the arts -- even though its primary emphasis was on the improvement of administration of the school. The projects are classified by purpose in Table 3.

The vast majority of projects were conducted for planning and/or management purposes. Although all of these projects generally followed the procedure described
in the introductory chapter of this volume, there were slight variations over time. For instance, early projects did not incorporate rating data into the maps. Also, the
quality of map production generally improved over time. Each project will be described briefly and its final concept map presented.

1. Multicultural Awareness Camp. In the summer of 1983 a group of youth workers constructed a camp program for local high school children. But this day
camp had a unique twist to it -- it was designed to help the children become more aware of different groups and cultures, and to raise issues of class, race, gender
and sexual orientation for individual and group thought, discussion and action. The day camp was an intensive 4 1/2 day program with ongoing follow-up in the
school and community.

The purpose of this project was to develop a map of the staff's goals for the program so that the staff might be better able to understand what the students might get
from it and how they themselves as staff might benefit. To generate the statements for concept mapping, four core staff members responded to two questions:

1. What are your personal hopes for the multicultural program in terms of what it will accomplish?
2. How do you want the adolescents' lives to be different during the camp and afterwards?

The final cluster map for the 74 generated statements is shown in Figure 1.

The map seems divisible into three separate regions of clusters, one each for participant effects, school and community effects, and effects of the program on the
staff. The participant effects are, not surprisingly, the most detailed, with a total of four clusters grouped in the upper left of the map, one each for the effects on:
individual students (e.g., reach a clearer sense of self; feel safe to explore difficult or confusing topics); communication skills (e.g., talk to each other in more
respectful, less hurtful ways); on-going action (e.g., continue having these kinds of dialogues with each other and other adults); and, action regarding institutional
oppression (e.g., learn effective ways of challenging institutional oppression with support from each other). Staff effects are shown in the cluster at the bottom of
the map (e.g., want to know what kinds of approaches and processes work best in enabling young people and adults to explore issues of racism, sexism, etc.). The
effects of the program on the school and community (e.g., teachers will examine the kinds of information which gets transmitted to kids) are shown in the cluster
on the right of the map.

The map gave the staff a visual image of what they hoped to achieve and helped assure that all staff had some common sense of what the program was about. The
map was also used as the basis of a qualitative interview assessment of the program conducted subsequently.

2. Division of Campus Life. The Division of Campus Life (DCL) is an administrative unit at Cornell University responsible for delivering a great variety of
services (e.g., student residences, transportation, safety, dining, counseling, health, etc.) to the University community. It is comprised of eleven different
departments which vary according to size, organizational structure, and type of function performed. The goal of this project was to produce a map which could be
used as an organizing device for the long-range planning effort of the DCL. This project and the broader planning effort have been described in greater detail in
Trochim and Linton (1986) and Gurowitz, Trochim and Kramer (1988).

This was a very large project compared with the others reported here. At any given stage, there were approximately 45 people involved representing the eleven
departments. The focus for the brainstorming was the mission statement of the DCL. The statement logically divided into three major phrases and one
brainstorming session was held for each. Because of the number of people, 876 statements were originally brainstormed. This was clearly too many and so a
subcommittee of four participants was appointed to examine the set of statements for redundancies and they reduced them to a final set of 137. The final map is
shown in Figure 2.

The map was divided into four general regions. On the left side, and considerably distant from the other three regions is the one labelled "human development and
values." Most of the items which fell into this category were short general statements. The three regions on the right (management, community, programs and
services) contained statements which tended to be longer and more concrete in nature. One reason for this left-to-right split might be because of the three-part focus
(based on the mission statement) which was used. One part of the mission statement seemed to call for more general value statements while other parts implied
more concrete actions.

The map for this project formed the basis of subsequent long-range planning as described in Trochim and Linton (1986) and Gurowitz, Trochim and Kramer
(1988). In addition, the next two projects described below were direct offshoots of this original project and were also connected with DCL planning.

3. University Health Services. This project was conducted to provide a framework for long-range planning for the University Health Services (UHS) which is a
department in the Division of Campus Life described above. At any given stage there were between 50 and 75 participants, with virtually everyone in the
department participating in at least one stage. For the brainstorming, participants were asked to generate statements which describe what the UHS "should be or
should do." Because of the number of participants, three separate brainstorming sessions were held with about 25 persons at each. A total of 315 statements were
generated and 100 were randomly selected from these to produce the final set. The map which resulted is presented in Figure 3.

The map is divided into four regions and twelve clusters. In this map, directions appear to be directly interpretable. Moving from the top to the bottom implies
moving from more managerial or administrative issues to ones that are more educational or service related. Movement from left to right denotes a change from
external, service and client-related issues to the more internal concerns of the staff. To utilize the map, the participants were divided into small groups which were
responsible for generating specific recommendations for action (action statements) for different regions and clusters within regions. As a result, 145 action
statements were generated and each was addressed by the planning committee in subsequent meetings. This project is described more fully in Trochim and Linton
(1986).

One especially interesting interaction occurred in connection with the naming of Cluster 11: Meetings and Scheduling. This cluster contained a number of
statements which reflected staff dissatisfaction with the current arrangement of staggered lunch times (thereby preventing some staff from going to lunch together)
and frequent required "sack lunch" meetings (thereby preventing them from leaving the building). For whatever reason, the staff had not previously felt
comfortable in stating their displeasure with these arrangements, perhaps because they did not feel powerful enough to affect the policies. During the naming of the
clusters -- a session in which everyone in the organization (including the director and manager) participated -- one of the participants kiddingly suggested that the
cluster be named "Eat -- not meet!" There was an immediate and vigorous round of applause to this tongue-in-cheek (but very heartfelt) suggestion. Afterwards the
director confided that he had had no idea that there was a problem -- from his perspective, the lunch times had been staggered to provide continuous coverage and
the "sack lunches" had been added to provide more inservice for staff -- both goals which he assumed the staff supported. The cluster naming session enabled the
staff to make their feelings about these policies felt in a very direct way and, at the same time, preserved their anonymity and mitigated their fears about
confronting their bosses.

4. DCL Subcommittee. Following the major concept mapping process for the DCL as described earlier, a subcommittee was formed whose responsibility it was to
write the long-range plan for the Division. This subcommittee consisted of eleven members, one representing each department. One of the first issues which they
needed to address was the question of whether all of the eleven departments in the Division actually made sense there or whether some should be moved under
another branch of the University administration. Obviously, long-range planning could not proceed until some agreement was reached about what departments
were in the Division. In addition, it seemed to subcommittee members that some departments had more in common than others and that they could be involved in
more active collaboration, planning, staff interaction and exchange, and so on.

To examine these issues, we decided to do a concept mapping of the eleven departments to see how the subcommittee members would group them. Instead of using
brainstorming to generate statements, the names of the eleven departments were used. Instead of sorting the statements, all pairs of departments were rated on a
scale from 1 (least similar) to 100 (most similar) and these ratings were standardized and averaged across subcommittee members. The map which resulted is
shown in the top part of Figure 4.
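
A hedged sketch of this variation (illustrative Python; the ratings are fabricated, and the standardization and similarity-to-dissimilarity conversion are assumptions about one reasonable way to proceed) standardizes each member's pairwise ratings, averages them across members, and scales the eleven departments in two dimensions:

```python
import numpy as np
from sklearn.manifold import MDS

n_depts, n_raters = 11, 11
iu = np.triu_indices(n_depts, k=1)                 # the 55 department pairs

# Each rater judges every pair of departments on a 1 (least) to 100 (most similar) scale.
rng = np.random.default_rng(2)
ratings = rng.integers(1, 101, size=(n_raters, iu[0].size)).astype(float)

# Standardize within each rater, then average across raters.
z = (ratings - ratings.mean(axis=1, keepdims=True)) / ratings.std(axis=1, keepdims=True)
avg_similarity = z.mean(axis=0)

# Convert to a square dissimilarity matrix and scale the departments in two dimensions.
dissim = np.zeros((n_depts, n_depts))
dissim[iu] = avg_similarity.max() - avg_similarity
dissim += dissim.T
coords = MDS(n_components=2, dissimilarity="precomputed", random_state=0).fit_transform(dissim)
print(coords.round(2))
```
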
The map shows some interesting distinctions. The departments on the left are by far the largest in terms of budgets, number of FTEs, and so on. They also tend to
be devoted to providing basic services such as transportation, health, safety, food (dining), and general goods (campus store). Departments on the right of the map
emphasize services and counseling. The departments in the lower right corner in general tend to deal primarily with students whereas other departments tend to
serve the broader university community. The major reorganization question centered around whether the three departments on the far right should be moved to a
different administrative part of the university where they would be with other departments which provided services and counseling directly to students.

In the subcommittee discussions, it became apparent that most members were content with the current structure -- there were two advocates for reorganization, both
of whom represented student service departments on the right of the map. The two advocates were outspoken, articulate proponents of their position -- most others
in the group did not reveal a strong position in the face of this advocacy. After many lengthy meetings discussing reorganization, we decided to use the data from
the concept map in Figure 4 to construct a "person" map to see whether it might help move the issue along. Again, we used the rating data, but this time we scaled
people (using the INDSCAL model for multidimensional scaling). The map which resulted is shown on the bottom of Figure 4. The identities of the subcommittee
members were not indicated on the map -- only arbitrary ID numbers known only to the facilitator were used. What was particularly remarkable was that everyone
in the group immediately identified the two persons in the lower right corner correctly as the two strong advocates for reorganization. There was a laugh of
recognition, very little discussion, and the reorganization issue was never seriously raised again! While it is purely conjecture, it seems reasonable that the person
map made it clear that there was a strong majority opposed to reorganizing and the two vocal advocates simply dropped the issue given that they were clearly
outnumbered.
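The person map can be approximated in the same spirit. The sketch below is not the INDSCAL model itself (INDSCAL additionally estimates per-person dimension weights); it simply scales participants by how similarly they rated the department pairs, which conveys the general idea. All variable names and data are placeholders.

import numpy as np
from sklearn.manifold import MDS

rng = np.random.default_rng(0)
ratings = rng.integers(1, 101, (11, 55))   # 11 raters x 55 department pairs (placeholder data)

corr = np.corrcoef(ratings)                # person-by-person agreement in rating profiles
person_dissim = 1.0 - corr                 # higher agreement -> smaller distance
np.fill_diagonal(person_dissim, 0.0)

person_coords = MDS(n_components=2, dissimilarity='precomputed',
                    random_state=0).fit_transform(person_dissim)
# Plotting person_coords labeled with anonymous ID numbers gives the kind of
# display described in the text.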

5. The Mental Health Association. This project was undertaken to provide a framework for designing a training program for volunteers who were to work on a
one-to-one basis with deinstitutionalized mental patients. A new staff person had been hired for this purpose by the Advocacy Committee of the Board. The
dilemma which faced this new staff member was that it was not clear what the Advocacy Committee wanted the volunteers trained to do. If the staff member went
ahead and designed something that seemed sensible, it might not adequately represent the original thinking of the Board Members. Therefore, the Advocacy
Committee was encouraged to use concept mapping as a way to represent their thinking for the new staff member. The map which resulted is shown in Figure 5.
The training program was constructed so that each cluster was represented in the sessions. Individual statements within clusters were used to suggest specific skills
which needed to be taught. For instance, the cluster Supportive Communication Skills had the statements "listen attentively", and "be comfortable with hanging out
together without talking." The cluster Assisting Toward Getting Resources had the statement "help clients find housing." Each of these statements was covered in
some way in the training sessions. Although it wasn't done here, the same map structure could be used to evaluate the implementation and effectiveness of the
program. In this project, consequently, the new staff member was able to solicit specific, concrete guidance from the relevant Board members regarding the nature
of the program as they saw it -- certainly better than trying to second-guess what they might have wanted.

6. Teaching Measurement. This project was accomplished in two class sessions as part of a graduate level course in measurement. It was conducted early in the
semester in order to determine what the group perceived as the major issues in measurement and the interrelationships between these issues. Students were
prompted simply to generate statements which describe what they thought about "measurement." The map which resulted is shown in Figure 6.

The students identified six clusters for the fifty statements. What is especially interesting is that they perceived a counter-clockwise pattern across clusters which
described the measurement process from beginning to end. Measurement begins on the far left of the map with the cluster Theory/Conceptualization (e.g.,
theoretical framework), then to Practical Considerations (e.g., how time consuming, budgetary factors), onto Tools of Measurement and Data Collection (e.g.,
surveys, questionnaires, etc.), to Scaling and Testing (e.g., ranking or ordering things), to Quality of Measures / Analysis (e.g., reliability, validity, precision), and
finally onto Presentation (e.g., summary, recommendations, publishing). This process showed the students that they already (collectively) knew a considerable
amount about measurement, and at the same time it introduced them to the ideas of multidimensional scaling and cluster analysis -- topics which were covered later
in the course.

7. Cooperative Extension. Cooperative Extension has a rich history in community issues programming (CIP) which has become known as a specialty often
separate from the more traditional subject areas of agriculture, home economics and youth development (4-H). Many of the community issues selected for
educational programs have been in land use, agriculture districts, economic development and local government. While these are important areas for local policy
making, topics more germane to home economics or human ecology have not been addressed by most CIP specialists despite the fact that Cooperative Extension
home economics agents have undertaken many programs that include a "community issues" component (e.g., daycare, housing for the elderly, water quality, and
consumer rights).

A steering committee was appointed to strengthen awareness of community issues within home economics subject areas and to provide a strategy for increasing the skills and resources of family-focused faculty and agents so that more policy education would be incorporated into this subject area programming.
Concept mapping was used to facilitate the planning process of the steering committee; to provide a benchmark for future evaluation; and, to represent the
perceptions of county agents, faculty at the state university with at least 50% extension time, and administrators. Participants were randomly selected from subsets
of Cooperative Extension personnel whose main focus is in home economics or human ecology programs. They were asked to brainstorm ideas which "describe
your view of what a policy education program for individuals and families in Cooperative Extension should be." A total of 273 items was generated, with 75
randomly selected for use in concept mapping. Items were mailed to participants for sorting. The steering committee was responsible for interpreting the final map
shown in Figure 7.

The map is divided into nine clusters: Citizens and Policy ("citizens learn how to analyze concerns" and "...impact on political process"); Governmental Processes
("state government legislative process", "local government decision-making..."); Research and Policy Analysis ("What are the likely effects on decision outcomes
from different forms of citizen participation?"); Local Program Delivery ("Explain how program can help individuals and families"); Cooperative Extension
Programming ("'Packaged' ideas for agents"); Legitimization for Extension ("aim to develop public awareness", "success stories from other counties"); Criteria/
Approaches ("program should aim at pre-school, youth, adult and aging populations in the context of the family"); Family Empowerment ("helps families
understand tax problems", "parent forum: address problems of today and how handled"); and, Family Policy Issues ("support of child care", "environmental issues:
water policy, safety"). The map is useful for specific planning efforts. For instance, one committee member suggested that the cluster categories be used as guides
for developing materials for Cooperative Extension staff to have for in-service education opportunities and reference. The specific statements within clusters would
provide concrete examples of the topics which materials could address.

8. Student Life (R.A.s). This concept mapping project was conducted to explore the issues which college students perceive as important. Twelve Residence
Advisors (R.A.s) -- students who live in the dormitories and have some supervisory and peer counseling responsibilities -- were engaged as participants. The R.A.s
were asked to generate statements which "represent what you perceive to be important in the lives of undergraduate students." A total of 129 statements was
generated. It was hoped that the concept map would provide a framework which would help the R.A.s to plan better and more comprehensive programs for their
students. The final map is shown in Figure 8.

The map is divided into three general regions (Academic Issues, Life Issues, and Self and Others) and 15 clusters. There are a number of uses for the map. R.A.s
could examine currently planned programs and mark all areas on the map which are addressed in some way. This would show, at a glance, which topics are
emphasized and which, if any, are neglected. The specific statements within clusters might provide concrete suggestions for what types of programs might be
useful. The map may also suggest interesting relationships which could be emphasized in programs. For instance, the clusters Goal Setting, Managing Stress, and
Pressures From Home are close together on the map. This suggests that the topics are strongly related and it might be valuable to construct programs which attempt
to get at this. Finally, the map itself might be an interesting device for stimulating discussion among students regarding the issues which are important for them. In
a sense, the map could be used as the basis for a discussion program.

9. Student Life (dorm students). Like the previous study, this one was undertaken to explore what issues students see as important in their lives. Ten students
were asked to brainstorm statements which "describe the issues in your lives as students." This study was also designed to examine what kind of map might result
from a relatively short process, and consequently, the students were stopped when they had generated only 46 statements. The map which resulted is shown in
Figure 9.

What is perhaps most striking is the degree to which this map resembles the structure of the map in the previous student life study shown in Figure 8. The three
regions were very similar: Identity and Social Issues (Figure 9) is similar to Self and Others (Figure 8); School (Figure 9) is similar to Academic Issues (Figure 8);
and Outside University (Figure 9) is similar to Life Issues (Figure 8). Even at the cluster level there seems to be a great deal of similarity. For instance, in both
figures clusters related to stress turned up near the center of the map. In fact, if the two maps are rotated (while preserving the interrelational structure) it is possible
to actually overlay the two figures and achieve a fairly good correspondence. This occurs in spite of the fact that the studies were conducted with two different
groups of students -- the older R.A.s and the younger residents -- and that they involved completely independent processes (students in one were not aware of the
other) from brainstorming through interpretation. This correspondence in concept maps may suggest that there is some generalizable, consistent conceptual
similarity which would be useful for building educational theory, a topic which will be discussed in greater detail later.

10. Community School of Music and Arts. The Community School of Music and Arts (CSMA) is a resource for instruction in music, art and dance, and is a
center for cultural activity in the community which it serves. The goal of this concept mapping process was to involve the Board of Trustees in the development of
a framework for CSMA programs and services. Ten members of the Board of Trustees participated in two sessions. To generate the statements they used the focus
prompt: "Knowing what you know about the Community School, identify items which will help insure its continued growth." Fifty-four statements were generated
and the final map which resulted is shown in Figure 10.

There were four major regions -- Organization; Finances; Program; and, Benefits, Equipment and Physical Plant -- and ten clusters. The map was used to plan for
program development and fund raising. In addition, one of the major uses of the map was to encourage greater unity and cohesiveness between the trustees and
faculty as they jointly participated in additional planning.

11. Alumni Affairs. The alumni Regional Directors supervise offices throughout the country which maintain contact with and provide services to graduates of a
major university. This project was designed to help them conceptualize the major issues which their jobs and offices address. Between 10 and 15 regional directors
generated 72 statements which describe "activities and services which your office provides or might provide." The final map, shown in Figure 11, is divided into
five major regions and fifteen clusters.
The clusters at the bottom of the map were related to program issues while the others were more concerned with the management of the alumni offices themselves
(i.e., marketing, professional staff, and volunteer management). On the basis of this map, the alumni directors could devise new programs and approaches for
improving services.

12. Student Life (student assembly). The student assembly is the central student governance body in a major university. They used concept mapping to help them
to identify the issues which they might address during the next academic year. Approximately 20 elected student representatives generated 87 statements which
"describe issues in students' lives which the student assembly might address." The final map is shown in Figure 12.
One striking feature of this map is the degree to which it resembles the two maps discussed earlier (at least at the regional level) in connection with student life
issues (Projects 8 and 9 above). There are four identifiable regions -- Academics, Personal Growth, Social Interactions, and University and Community Issues --
and 15 clusters. More will be said about similarities between concept maps below.

13. Employment. Between 8 and 10 persons representing various groups involved in employment participated in this project, including members of the County
Board, the Department of Social Services, and various agencies which provide employment services. They generated 90 statements which described "issues related
to employment services in the county." The final map is shown in Figure 13.
Five regions and 16 clusters were identified. The largest region in the upper left of the map described the needs and issues of the target population. The "context"
region in the lower left was related to legislative and funding issues. The region which extended from the center of the map to the lower right referred to inter-
agency coordination -- an issue of clear importance to the representatives of the different agencies who participated in this process and compete for clients,
resources, and support. The region in the upper right is related to planning such things as employment training and to better understanding the demographics of the
working population in that area. Finally, the "service and support" region had to do with the direct services -- such as job placement -- which various agencies
provide.

14. Personnel Management. In this project, a four-person department at a water filtration plant wished to examine the current job responsibilities for persons in
the department and the need for additional personnel. All four men (including the supervisor) participated in the brainstorming session which yielded 74 statements
describing the different tasks which they perform in their jobs. Because there were only four people, each was asked to sort the statements twice. In addition, each
was asked to check on a rating sheet any of the tasks which they themselves routinely perform on the job. The final map is shown in Figure 14.
There were five major regions and twenty clusters identified. In addition, maps were produced for each of the four individuals (not shown) showing the statements
on the map which they had some responsibility for. It was immediately apparent that there were few well-delineated areas of responsibility -- each person wound
up doing tasks from all over the map. On the basis of this map, the participants decided to do one additional step to assist them in planning. First, they rated each
cluster for how many man-hours they would ideally need to accomplish that task in a typical month. Second, they rated (individually and then in total) how many man-
hours they currently spend on each task in a typical month. These two ratings could then be graphed onto the map in Figure 14 to provide a visual indicator of
where they believe they are most discrepant from the ideal in terms of personnel resources. Assuming that there are discrepancies, they might then reallocate
current job responsibilities and/or use that information as justification for an additional personnel request. A mapping of job responsibilities like this one may
enable an organization to identify areas where they have over or under-committed personnel and resources.
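A computation along these lines might look like the following sketch; the cluster names and hour figures are invented for illustration only.

# Compare ideal vs. current monthly person-hours per cluster and flag the largest shortfalls.
ideal   = {"plant maintenance": 120, "lab tests": 80, "reporting": 40}   # hypothetical clusters
current = {"plant maintenance": 150, "lab tests": 50, "reporting": 35}

discrepancy = {c: ideal[c] - current[c] for c in ideal}   # positive = under-resourced
for cluster, gap in sorted(discrepancy.items(), key=lambda kv: -kv[1]):
    print(f"{cluster}: {gap:+d} hours/month relative to the ideal")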

15. Counseling Services. A counseling agency which provides a wide range of mental health services wished to develop a conceptual framework for its long-term
strategic planning effort. A small group consisting of 10-15 key staff members and members of the Board of Directors generated 80 statements which described
services which the agency does or might provide. In addition to sorting these statements, each person rated them on a 1 (lowest priority) to 5 (highest priority)
scale. The final rating concept map is shown in Figure 15.
Four regions and 15 clusters were identified. At the top of the map are clusters related to major mental illness which in the past has been the primary service focus
for this organization. On the left are medically related services. Community services are on the bottom and specially targeted outpatient services on the right. The
highest priorities were given to the clusters for substance abuse, family oriented outpatient services and services for the elderly. While the first two had already
been addressed to some extent by the agency, there was currently no program for the elderly. Consequently, one outcome of this process was the recognition that
they wished to consider what services might be needed by the elderly. They decided that the best way to begin to address this task would be to involve other local
agencies with responsibilities for the aged to join in a concept mapping process on that topic (see project 17 below). In addition, they intend to use the map as the
basis for examining planning data such as budgetary and staff time allocations by cluster, competition from other agencies for services in each cluster, and so on.
Just as with the priority ratings, each of these additional variables can be overlaid on the original concept map to provide a visual display of the data.

16. Senior Citizens. A consortium of organizations which deal with senior citizens wished to use concept mapping to identify the issues which needed to be
addressed in their area. Between 10 and 15 staff and Board members participated. They generated statements which represented what they believed needed to be
addressed in order to improve services to the elderly. The final map is shown in Figure 16.
Five regions and 12 clusters were identified. Clearly, clusters on the left of the map (and especially on the lower left) were judged as more important. Thus, training
and development, community education, and volunteerism were considered areas which needed special attention, whereas self-help services and recreational
activities did not.

17. Elderly. This project emerged from project number 15 described above. On the basis of that earlier project, an agency identified services for the elderly as a
priority for planning. In order to begin addressing this, they assembled a consortium of agencies which deal with the elderly in their county, including
representatives of the United Way and of health and mental health services. The 10-15 participants generated 95 statements which described "the issues, problems,
concerns or needs which the elderly have in the county." In addition to sorting, each participant rated the statements on a 1 (lowest priority) to 5 (highest priority)
scale. The final map is shown in Figure 17.
There were 18 clusters identified and the group did not wish to define regions on the map. The most intriguing result of this process was the sense within the group
that they had hit upon a "theory" regarding the development of the elderly. They suggested that the map could be interpreted nicely in a counter-clockwise manner
beginning with the rightmost cluster, personal growth and education. The "healthy" elderly would view this as a priority (along with socialization) and are also the
most likely to be politically active and powerful. Consequently, much of the political power available to the elderly is spent on advocacy for issues for the well
elderly. As people grow older (continuing counter-clockwise) they increasingly run into issues related to access to resources (e.g., coordination of local services,
costs). Eventually, housing issues become more predominant involving questions about home care, outreach, supervised care, and the potential for abuse. Clearly at
this point the family is extremely relevant and likely to be involved in the decision making. Eventually with advanced age there are issues of competence,
vulnerability, depression, fears and helplessness and, ultimately, decreased functioning and death. This counter-clockwise interpretation led to a stimulating
discussion where participants concluded that it was important to intervene early in this cycle and involve the well-elderly as advocates for issues which are on their
horizon. In addition, the map is being used as the basis of an examination of current service patterns in the county (by cluster) to determine whether the agency
which initiated this project should continue to consider developing services in this area.

18. Arts Council. An Arts Council which is responsible for fostering and encouraging cultural and artistic efforts in their county wished to use concept mapping as
the basis of their long-term planning process. Approximately 10-15 members of the Board of Directors generated 63 statements which described "what should be
done by an effective Arts Council." In addition, each participant rated each statement on a 1 (lowest) to 5 (highest) priority scale. The final map which resulted is
shown in Figure 18.
The Arts Council is a relatively small organization (1 FTE professional, a part time secretary, and volunteers) which means that it relies on its Board Members
more directly to be active in addressing the mission of the organization. Prior to the concept mapping, there was little consensus among Board Members
concerning what their roles and functions should be. On the basis of this project, they were able to identify their major tasks as helping to seek funding,
encouraging the educational function of the Council, long-term community involvement, and public relations for the Arts Council and the arts in general. In
addition, they clearly saw the need for ongoing and expanded Board development efforts.

19. Planned Parenthood. A Planned Parenthood organization wished to use concept mapping as the basis for long-term planning. They involved a relatively large
group of between 30 and 35 staff and board members in generating 89 statements which described issues which they needed to address in the longer term future.
The final map (with priority ratings) is shown in Figure 19.
There were 7 regions and 18 clusters identified. In the interpretation, it became apparent that the region pertaining to education on the lower left was given the
highest priority, closely followed by financial issues and public relations concerns.

20. Music and Arts in Daycare. In this project, concept mapping was used to develop the framework for constructing training sessions for daycare providers in
music and art activities for preschool children. Between 10 and 15 daycare providers generated 61 statements which described the types of issues which they
wished to see addressed in training in music and arts. In addition, each participant rated each statement on a 1 (not at all) to 5 (extremely) importance scale. The
final map is shown in Figure 20.
Six clusters were identified in the analysis. The highest importance was assigned to teacher training issues, followed by issues related to the skills and attitudes
which teachers were expected to have. The clusters and individual statements were used to plan for the workshops which were later administered. Thus, in this
case, concept mapping enabled the client group to devise the issue structure for their own training.

Concept Mapping as Soft Science: Some Lingering Issues

The initial motivation for the development of the concept mapping process described here was largely scientific. The thought was that concept mapping could help
in the articulation of the concepts used in social research and in their translation into operationalizations. From the outset it was important to establish: 1) that the
concept mapping process provided an accurate representation of what people were thinking (i.e., reliability and validity), and; 2) that the concept maps could be
integrated into scientific theory-building and experimentation. These two issues are considered separately below. At this point, it is not clear how well the concept
mapping process addresses these two issues and so the discussion here will be preliminary in nature and largely suggestive of some of the methodological research
which might be undertaken to investigate these concerns further.

Reliability and Validity of Concept Mapping. To date, there have been no major attempts to investigate the reliability and validity of concept mapping. For
purposes of discussion, reliability will be understood here to mean the degree to which a map is "repeatable." Validity is meant to refer to the degree to which a
map accurately reflects reality.

In terms of reliability, there are a number of questions which could be asked -- one could look for overall replicability (e.g., similarity across maps), or look at the
reliability of a specific step in the process. A number of studies suggest themselves:

1. Reliability of brainstorming
2. Reliability of sorting
3. Reliability of ratings
4. Reliability of cluster labeling
5. Reliability of final concept maps

For each of these, one could look at the degree to which the same individual or group gets similar results on multiple occasions or the degree to which several
equivalent groups (i.e., randomly assigned) independently produce similar results. For instance, we could assess the degree to which we get similar maps when we
perform the same process twice on the same group of participants at two different times (a type of test-retest reliability) or we could look at the degree of similarity
between maps based on separate processes carried out simultaneously by random subgroups of the same population (actually, a type of convergent validity). Each
type of study would pose its own methodological difficulties. For instance, how would one assess the degree of similarity between two sets of brainstormed
statements, two sets of cluster labels, or two maps which were constructed using entirely separate processes? For the reliability of sorting it would be possible to
correlate participants' binary similarity matrices. Reliability of ratings could be assessed with a simple correlation between the ratings.
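As an illustration of how two of these reliability checks might be computed, here is a small Python sketch; the sorts, piles, and rating vectors are invented, and the function name is ours rather than part of any published procedure.

import numpy as np

def sort_to_matrix(piles, n_items):
    """Convert one participant's sort (a list of piles of item indices)
    into an n_items x n_items binary co-occurrence matrix."""
    m = np.zeros((n_items, n_items), dtype=int)
    for pile in piles:
        for i in pile:
            for j in pile:
                m[i, j] = 1
    return m

n = 5
sort_a = [[0, 1], [2, 3, 4]]          # the same participant (or group) on two occasions
sort_b = [[0, 1, 2], [3, 4]]
ma, mb = sort_to_matrix(sort_a, n), sort_to_matrix(sort_b, n)

# Reliability of sorting: correlate the off-diagonal entries of the two binary matrices.
iu = np.triu_indices(n, k=1)
sort_reliability = np.corrcoef(ma[iu], mb[iu])[0, 1]

# Reliability of ratings: a simple correlation between two rating vectors.
ratings_t1 = np.array([3, 5, 2, 4, 1])
ratings_t2 = np.array([4, 5, 2, 3, 1])
rating_reliability = np.corrcoef(ratings_t1, ratings_t2)[0, 1]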

While no direct evidence is available on reliability, we can get a rough indication by visually examining maps from similar populations on similar topics. For
instance, it is clear from the presentation of projects above that several of the maps which describe issues in students' college lives (Figures 8, 9 and 12) have some
striking similarities in features despite the fact that they were the results of three entirely different processes with three very different groups of students at three
different times. Or, one could look at the similarity between maps from similar organizations, such as the projects from the Community School of Music and Arts (Figure 10) and the Arts Council (Figure 18); or from similar topic areas such as aging (Figures 16 and 17). In all of these cases, however, the projects differed in important ways which might minimize the degree to which we would expect similarity in concept maps. For instance, the purpose of Project 8 was to have Residence Advisors project what they thought the issues were in undergraduates' lives whereas in Project 9 the students were asked directly what the issues are.
Lack of agreement between maps could thus be attributable either to low reliability or to the different nature of the projects. Clearly, there is a need for research
which is explicitly constructed to examine reliability issues. In terms of validity of concept mapping, the only direct evidence on the question comes from the work
undertaken by Dumont (this volume) who looked at the degree to which computed concept maps correlated with hand-placed maps (a type of convergent validity).
The evidence is somewhat ambiguous and seems to indicate that estimates of validity depend largely on the level at which one looks for agreement and the manner
in which the validity estimate is computed. Although the preliminary results are promising, Dumont's work needs to be replicated with more people, different types
of participants, and in different subject areas.

There are several general approaches which might be taken in investigations of the validity of concept mapping. One method would be to compare concept maps
(or results of any step in the process) with comparable information generated by some other method, as Dumont (this volume) did when looking at computed versus
hand-placed maps. For instance, one could compare a set of brainstormed statements with transcripts of interviews on the same topic to see whether similar issues
arise.

A second method for examining validity would be to see whether participants could identify the "correct" concept map from a set, much like a witness identifying
an accused criminal in a line-up. For instance, let's say that in addition to generating the computed concept map for a project we also generate three more maps
which have the same statements on them but where the statements are randomly placed on the map. The validity question is whether the participants could identify
the computed map as the one which most accurately reflects their thinking. Incidentally, this type of study, more than almost any other, addresses the distinction
between soft science and hard art raised here. If participants cannot distinguish the map computed from their data from randomly generated ones, there would be no
effective argument for the validity of this process. On the other hand, if we asked people to tell us which maps were the most suggestive, interesting, or creative, it
might very well turn out that randomly-generated ones would be chosen. After all, if people can sensibly interpret randomly generated ink blots in psychological
testing, why wouldn't they be able to form interpretations of random concept maps (which would still use the statements which they brainstormed)? In fact, there is
some reason to think that deliberate random arrangement of statements on a map -- while not meeting the standard of validity (or accuracy in representing what the
participants actually think) -- might be a good method for getting people to see new relationships and to think creatively.

Finally, it would be possible to examine validity by looking at whether concept maps confirm theoretically expected differences. For instance, we might have two
groups of participants -- say, teachers and students -- involved in a study where we have some clear idea of how we expect these groups to differ in their
conceptualizations. Comparison of their concept maps could help to confirm or deny our expectation. Similarly, we might have students do a conceptual map at the
beginning and end of a course and see whether any changes in the structure of the map correspond to what was communicated in the course or whether the final
map is more like the teacher's than the initial one was.

Concept Mapping in Theory Building.

How might concept mapping be useful in the social sciences? Consider the plight of the graduate student who needs to define the major constructs for a dissertation
project. While all of the texts on research say that it is important to define constructs, there is no concrete advice given on how to articulate a conceptual framework. The concept mapping approach views concept definition as a measurement task -- much like that of developing a scale. The student generates lots of statements which describe the construct(s) in question, and then organizes those statements in some way. Or, consider the lack of conceptual clarity in much of the
psychological research which is published -- a sentiment well articulated by Sartori (1984) for psychology and the other social sciences. For instance, what are the
distinctions/relationships between terms like "self esteem", "self worth", "self image", "locus of control", "dependency", and so on? Or, what are the distinctions/
relationships between the terms "intelligence", "achievement", "academic performance", and so on? In addition, over the past few years there has been a growing
recognition in the field of evaluation research that improvement is needed in the types of theories which guide evaluation projects. In particular, Bickman (1986)
has called for more theory-driven evaluation and Chen and Rossi (1983; 1987) have argued that "grand" social science theory about social programs needs to be
augmented by or replaced with more concrete theories about how programs function in real-world settings. All of these considerations suggest that concept
mapping might be useful for improving theory-building in the social sciences.

It is important to distinguish between theories and concepts. One thing to recognize is that while theories are built upon concepts, concepts are not, in and of
themselves, theories. The concept maps shown in this paper do not necessarily constitute theories. A theory postulates a relationship -- usually causal -- between
two or more concepts. A concept map provides a framework within which a theory might be stated. Perhaps an example would help to clarify the distinction. Let's
assume that we are interested in evaluating a program designed to improve the self esteem of a certain group of high school students. In trying to define self esteem
we use concept mapping to describe all of the terms we can think of which are related to our notion of self esteem. Presumably, terms which are more similar in our
minds will be closer on the map. In this way we might distinguish several sub-aspects of self esteem and might set self-esteem within a broader framework of other
concepts related to the self. However, the concept map itself does not constitute a theory regarding the effect of our program on self esteem. To achieve such a
theory we need to state how the independent variable (i.e., the program) is related to the concepts on the map. For instance, after reviewing the program in detail,
we might conclude that some aspects of self esteem on the map will most likely be more strongly affected than others. Specifically, we have overlaid our
expectations about program effects onto the conceptual structure, showing where it will affect some concepts and not others. Thus, concept maps can act as the
framework for a statement of theory, but are usually not considered a theory in and of themselves.

This discussion suggests that concept mapping may be particularly useful for theory-driven social research because of its detailed, visual, pattern-based
representation of concepts. Trochim (1985; in press) has described this under the framework of "pattern matching." In pattern matching, one needs a theoretical
pattern and an observed one. The theoretical pattern should describe the relationships or outcomes which are expected. The observed pattern consists of the
relationships or outcomes which are measured. To the extent that these patterns match and there are no other theories which would account for the observations as
well, one can conclude that the theory in question is supported. Pattern matching works best when there is a clearly articulated, detailed theoretical pattern because
detailed patterns are more likely to be unique and a match will, consequently, be attributable to this unique theoretical "fingerprint." Concept mapping is
particularly valuable for pattern matching because it can help researchers to generate (scale) their theoretical expectations in detail.

Pattern matching is a general approach to research. In evaluation, it can help to guide the development and assessment of the program, sample, measures and
outcomes. If concept mapping has utility here, its value will be far-ranging. The theory of pattern matching is discussed in greater detail in Trochim (1985) and Chen and Rossi (1983; 1987) and is illustrated by the papers in this volume by Davis, Marquart and Caracelli. We have only begun to explore the use of concept mapping
for pattern matching methods. For instance, Trochim (1988) shows how concept maps can be used as the framework for exploring patterns in randomized clinical
drug trial data. More research is needed on the utility of concept mapping for scientific theory building in general and for pattern matching in particular.
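As a minimal illustration of the basic pattern matching idea, one might correlate a theoretical pattern with an observed one, as in the sketch below; the values are invented and a simple correlation is only one of several possible matching statistics.

import numpy as np

# The theoretical pattern states the expected outcome for each concept/cluster;
# the observed pattern is what was actually measured.
theoretical = np.array([4.5, 3.0, 2.0, 4.0, 1.5])   # expected effect per cluster (invented)
observed    = np.array([4.2, 2.8, 2.5, 3.9, 1.7])   # measured effect per cluster (invented)

pattern_match = np.corrcoef(theoretical, observed)[0, 1]
print(f"pattern match r = {pattern_match:.2f}")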

References

Bickman, L. (Ed.). (1986). Using program theory in evaluation. New Directions for Program Evaluation. San Francisco, CA: Jossey-Bass.

Chen, H.T. and Rossi, P.H. (1983). Evaluating with sense: The theory-driven approach. Evaluation Review. 7, 283-302.

Chen, H.T. and Rossi, P.H. (1987). The theory-driven approach to validity. Evaluation and Program Planning. 10, 95-103.

Gurowitz, W.D., Trochim, W. and Kramer, H.C. (1988). A process for planning. National Association of Student Personnel Administrators Journal. 25, 4, 226-235.

Sartori, G. (1984). Social Science Concepts: A Systematic Analysis. Beverly Hills, CA: Sage Publications.

Trochim, W. (1985). Pattern matching, validity, and conceptualization in program evaluation. Evaluation Review, 9, 5, 575-604.

Trochim, W. (1988). The effect of Alprazolam on panic: Patterns across symptoms. Unpublished manuscript, Cornell University.

Trochim, W. (in press). Pattern matching and program theory. Evaluation and Program Planning.

Trochim, W. and Linton, R. (1986). Conceptualization for evaluation and planning. Evaluation and Program Planning, 9, 289-308.

Table 1: Characteristics of Projects

Project | Date | Approx. Number of Persons | Types of Participants | Number of Statements
1. Multicultural Awareness Summer 83 12* core staff 74
2. Division of Campus Life Winter 83 45* staff - all levels 137
3. University Health Services Spring 84 50-75 all staff 100
4. DCL Subcommittee Fall 84 11 dept. representatives 11
5. Mental Health Association Fall 84 8-12 board members 80
6. Teaching Measurement Fall 85 20* graduate students 50
7. Cooperative Extension Fall 85 -- staff 75
8. Student Life (R.A.s) Fall 85 12 undergraduate students 129
9. Student Life (dorm residents) Spring 86 10 undergraduate students 46
10. Community School of Music and Arts Summer 86 10 board members 54
11. Alumni Affairs Summer 86 10-15 staff 72
12. Student Life (Student Assembly) Summer 86 20* undergraduate students 87
13. Employment Summer 86 8-10 agency representatives 90
14. Personnel Management Fall 86 4 all staff 74
15. Counseling Services Fall 86 10-15 staff and board 80
16. Senior Citizens Spring 87 10-15 staff and board --
17. Elderly Winter 87 10-15 agency representatives 95
18. Arts Council Winter 87 10-15 staff and board 63
19. Planned Parenthood Spring 88 30-35 staff and board 89
20. Music and Arts in Daycare Spring 88 10-15 daycare providers 61

Table 2: Classification of Projects by Subject

Project | Education | Educational Admin. | Elderly | Children & Youth | Health | Mental Health | The Arts
1. Multicultural Awareness X X
2. Division of Campus Life X
3. University Health Services X
4. DCL Subcommittee X
5. Mental Health Association X
6. Teaching Measurement X
7. Cooperative Extension
8. Student Life (R.A.s) X X
9. Student Life (dorm residents) X X
10. Community School of Music and Arts X X X X
11. Alumni Affairs X X
12. Student Life (Student Assembly) X X
13. Employment
14. Personnel Management
15. Counseling Services X X X
16. Senior Citizens X
17. Elderly X
18. Arts Council X X X
19. Planned Parenthood X X X
20. Music and Arts in Daycare X X X X

Table 3: Classification of Projects by Purpose

Project | Planning | Evaluation | Survey Design | Management | Curriculum Development | Theory Building
1. Multicultural Awareness X X X
2. Division of Campus Life X
3. University Health Services X X
4. DCL Subcommittee X X
5. Mental Health Association X X
6. Teaching Measurement X
7. Cooperative Extension X X X
8. Student Life (R.A.s) X X
9. Student Life (dorm residents) X
10. Community School of Music and Arts X X X
11. Alumni Affairs X X
12. Student Life (Student Assembly) X
13. Employment X X
14. Personnel Management X X
15. Counseling Services X X
16. Senior Citizens X X
17. Elderly X X
18. Arts Council X X
19. Planned Parenthood X X
20. Music and Arts in Daycare X X X X
The following document was published in the Journal of Consulting and Clinical Psychology, 1994, Vol. 62, No. 4, 766-775.

Using Concept Mapping to Develop a Conceptual Framework of Staff's Views of a
Supported Employment Program for Persons with Severe Mental Illness

William M.K. Trochim
Cornell University

Judith A. Cook
Thresholds National Research and Training Center, Chicago, IL

Rose J. Setze
Cornell University

This research was supported in part through NIMH Grant R01MH46712-01A1, William M.K. Trochim, Principal Investigator; and by the Center
for Mental Health Services, Substance Abuse and Mental Health Services Administration and the US Dept. of Education, National Institute on
Disability and Rehabilitation Research (Cooperative Agreement H133B00011), Judith A. Cook, Principal Investigator. The opinions expressed
herein do not reflect the position or policy of any federal agency and no official endorsement should be inferred.

Abstract

This paper describes the use of concept mapping to develop a pictorial multivariate conceptual framework of staff views of a program of supported employment (SE) for persons with severe mental illness. The SE program involves extended individualized supported employment for clients through a Mobile Job Support Worker (MJSW) who maintains contact with the client after job placement and supports the client in a variety of ways. All fourteen staff members of a psychiatric rehabilitation agency with assignments associated with the SE program took part in the process. They brainstormed a large number of specific program activity statements (N=96), sorted and rated the statements, and interpreted the map that was produced through multidimensional scaling and hierarchical cluster analysis. The resulting map enabled identification of four issues that should be included in any theory of SE programs -- the specific activity sequences that characterize the program itself; the pattern of local program evolution; the definition of program staff roles; and the influence of key contextual factors such as the client's family or the program's administrative structure. The implications of concept mapping methodology for theory development and program evaluation are considered.

Using Concept Mapping to Develop a Conceptual Framework of Staff's Views of a Supported Employment
Program for Persons with Severe Mental Illness

Over the past quarter century a shift has occurred from traditional institution-based models of care for persons with severe mental illness to more
individualized community-based treatments. Along with this, the theories and methods that guide psychiatric care necessarily change to address
these shifting realities. This is perhaps most apparent in the field of psychiatric rehabilitation with its emphases on individual consumer goal
setting, skills training, job preparation and employment support (Cook, Solomon and Mock, 1989; Cook, Jonikas and Solomon, 1992). The
theoretical models available to guide psychiatric rehabilitation programs are relatively new and theory-driven field evaluations of these models are
rare or have only recently been initiated (Cook and Razzano, 1992; Cook, 1992). A certain amount of conceptual "looseness" exists in the
psychiatric rehabilitation field, especially in the area of vocational rehabilitation, regarding what constitutes services under various service delivery
models. To date there has been no significant systematic theoretical attempt to unify the disparate concepts involved in vocational rehabilitation
services in order to develop a comprehensive perspective that might guide further empirical work.

This study demonstrates how psychiatric rehabilitation programs as they exist in practice might be represented theoretically. A relatively new
methodology -- termed concept mapping -- is used to represent the implicit constructs and theories of agency staff members directly involved in
service delivery. The concept maps that result can help inform theoreticians about the thinking of practicing professionals and can act as a
framework for more rigorous evaluation of programs that are being implemented.

Models of Vocational Rehabilitation

Over the past several decades, the theory of vocational rehabilitation has experienced several stages of evolution. Original models of vocational
rehabilitation were based on the idea of sheltered workshop employment. Clients were paid a piece rate and worked only with other individuals
who were disabled. Sheltered workshops tended to be "end points" for persons with severe and profound mental retardation since few ever moved
from sheltered to competitive employment (Woest, Klein & Atkins, 1986). Controlled studies of sheltered workshop performance of persons with
mental illness suggested only minimal success (Griffiths, 1974; Weinberg & Lustig, 1968) and other research indicated that persons with mental
illness earned lower wages, presented more behavior problems, and showed poorer workshop attendance than workers with other disabilities
(Whitehead, 1977; Ciardiello, 1981; Olshansky, 1973; Olshansky & Beach, 1974; 1975).

Partly in reaction to sheltered workshops, the field of psychiatric rehabilitation services developed a model called transitional employment (TE).
TE uses a "train-then-place" approach with initial training on prevocational work crews, followed by a series of temporary community job
placements at minimum wage or above, with support that gradually tapers as the client moves through jobs with increasing independence, pay,
hours, and responsibilities (Dincin, 1975; Robinault & Weidinger, 1978).

In the 1980s, a new model of services called Supported Employment (SE) was proposed as less expensive and more normalizing for persons
undergoing rehabilitation (Wehman, 1985). Using a "place-then-train" approach, the SE model emphasizes first locating a job in an integrated
setting for minimum wage or above, and then placing the person on the job and providing the training and support services needed to remain
employed (Wehman, 1985). Services such as individualized job development, one-on-one job coaching, advocacy with co-workers and employers,
and "fading" support were found to be effective in maintaining employment for individuals with severe and profound mental retardation (Revell,
Wehman & Arnold, 1984). The idea that this model could be generalized to persons with all types of severe disabilities, including severe mental
illness, became commonly accepted (Wehman, 1982; Chadsey-Rusch & Rusch, 1986; Roessler, 1980).

Currently, the most popular vocational rehabilitation models for persons with severe mental illness are TE, SE, and variations on these. The major
differences between the TE and SE models are: 1) TE is a train-then-place model whereas SE is a place-then-train model; 2) TE generally has a
limit on the total amount of time clients can hold a particular position while SE has no limit; and, 3) TE jobs are filled by the agency as it sees fit
and do not technically belong to the client while SE positions are held directly by the client.

Over time, some TE programs have moved to a more place-then-train SE approach, eliminating their prevocational crews and dropping transitional
positions. Others have rejected SE, arguing for the necessity of transitional steps, for assessing clients' vocational strengths and weaknesses, and for
creating a sense of immediate participation in productive, if unpaid, work. Still others have integrated principles of SE into ongoing TE programs,
creating "hybrid" models that consist of an "array of services" (Cook, Jonikas & Solomon, 1992).

One such hybrid approach was developed at Thresholds, the site for the present study, which created a new staff position called the mobile job
support worker (MJSW) and removed the existing six month time limit for many placements. MJSWs provide ongoing, mobile support and
intervention at or near the work site, even for jobs with high degrees of independence (Cook & Hoffschmidt, 1993). Time limits for many
placements were removed so that clients could stay on as permanent employees if they and their employers wished. The suspension of time limits
on job placements, along with MJSW support, became the basis of SE services delivered at Thresholds.

Vocational rehabilitation programs vary in how they operationalize services or programs -- different programs often handle vocational, clinical,
programmatic and administrative aspects differently. There is no comprehensive theory that considers the definitions of and relationships among
these elements as they are implemented in practice, thus hampering efforts to evaluate program effects. One way to develop a more comprehensive
theory would begin with observations of programs as implemented in the field and work inductively toward clearer delineation of the implicit
theories that currently guide them. This inductive strategy requires a methodology that can elicit the central constructs of program staff members in
a way that is both theoretically and operationally sensible.

A methodology developed by Trochim (1989a), called "concept mapping," seems well-suited for an inductive approach to developing such a
theory, and is employed in this study. Concept mapping combines a group process (brainstorming, unstructured sorting and rating of the
brainstormed items) with several multivariate statistical analyses (multidimensional scaling and hierarchical cluster analysis) and concludes with a
group interpretation of the conceptual maps that result. This paper illustrates the use of concept mapping for developing a graphic conceptual
framework of the views of the staff of a psychiatric rehabilitation agency that provides a vocational rehabilitation program of SE for persons with
severe mental illness.

Method

Subjects
All participants were staff members at the Thresholds in Chicago, Illinois. Thresholds is a mental health service provider and its Research Institute
serves as the site for a National Research and Training Center on Rehabilitation and Mental Illness, with several programs of ongoing mental health
research and training. Thresholds has two urban rehabilitation service branches, one on the north side and one on the south side of Chicago. The
focus for this study was on the agency's specific SE Program. This program is designed to provide extended individualized SE for clients through a
Mobile Job Support Worker (MJSW) who maintains contact with the client after job placement and supports the client in a variety of ways,
including going to the job site if necessary to work with the client, employer, or fellow employees. Along with the longer-term post-placement
provision of SE through the MJSW once placed in a job, all program participants receive in-agency life skills and employment training and
assistance in job development and placement. The program operates at both branches of the agency.

Participants consisted of all agency staff members identified with assignments relevant to the SE program. This included the directors of both
branches, vocational staff who provide in-agency job skills training, job coaches who support clients at community work sites, and the MJSWs who
provide post-placement employment support. Fourteen agency staff members met these criteria and attended the two concept mapping sessions;
two were persons with disabilities.

Procedure

The general procedure for concept mapping is described in detail in Trochim (1989a). Examples of results of numerous concept mapping projects are given in Trochim (1989b). The process implemented here was accomplished in two successive evening sessions in January 1992, with each
session lasting about three hours. A portable IBM computer was used throughout the sessions along with an overhead projector panel so that all
participants could observe the computer operations. All analyses were conducted and maps produced using the Concept System© computer
software1 that was designed for this process.

Session 1: Generation and Structuring of Conceptual Domain. At the first session, participants generated statements using a structured
brainstorming process (Osborn, 1948) guided by a specific focus prompt that limits the types of statements that are acceptable. The focus statement
or criterion for generating statements was operationalized in the form of the instruction to the participants:

Generate statements (short phrases or sentences) which describe specific activities that are part of the supported employment program at Thresholds.

The general rules of brainstorming applied. Participants were encouraged to generate as many statements as possible (with an upper limit of 100);
no criticism or discussion of others' statements was allowed (except for purposes of clarification); and all participants were encouraged to take part.
The group brainstormed ninety-six statements in approximately a half-hour.

Participants were given a short break while the statements were printed and duplicated for use in the structuring stage. Structuring involved two
distinct tasks, the sorting and rating of the brainstormed statements. For the sorting (Rosenberg and Kim, 1975; Weller and Romney, 1988), each
participant was given a listing of the statements laid out in mailing label format with twelve to a page and asked to cut the listing into slips with one
statement (and its identifying number) on each slip. They were instructed to group the ninety-six statement slips into piles "in a way that makes
sense to you." The only restrictions in this sorting task were that there could not be: (a) N piles (in this case 96 piles of one item each); (b) one pile
consisting of all 96 items; or (c) a "miscellaneous" pile (any item thought to be unique was to be put in its own separate pile). Weller and Romney
(1988) point out why unstructured sorting (in their terms, the pile sort method) is appropriate in this context:

The outstanding strength of the pile sort task is the fact that it can accommodate a large number of items. We know of no other
data collection method that will allow the collection of judged similarity data among over 100 items. This makes it the method of
choice when large numbers are necessary. Other methods that might be used to collect similarity data, such as triads and paired
comparison ratings, become impractical with a large number of items (p. 25).

After sorting the statements, each participant recorded the contents of each pile by listing the statement identifying numbers on the back of the
rating sheet. For the rating task, the brainstormed statements were listed in questionnaire form and each participant was asked to rate each
statement on a 5-point Likert-type response scale in terms of how important the statement is to their idea of supported employment where
1=relatively unimportant (compared with the rest of the statements); 2=somewhat important; 3=moderately important; 4=very important, and,
5=extremely important. Because participants were unlikely to brainstorm statements that were totally unimportant with respect to SE, it was
stressed that the rating should be considered a relative judgment of the importance of each item to all the other items brainstormed.
This concluded the first session. Between the two sessions, the sorting and rating data were entered into the computer, the MDS and cluster
analysis were conducted, and materials were produced for the second session.

Data Analysis. Examination of the data and preliminary results indicated problems in the sorts of several participants. For one of the fourteen
participants, the sort was not completed. For two others, the sorts had a single pile with an unusually large number of statements in it (i.e., more
than one-third of all brainstormed statements). This might have been due to participant misunderstanding of the sorting task, resistance to doing this
task conscientiously, or it may legitimately represent the way they saw the categories. Weller and Romney (1988) point out that problems that can
affect the final MDS configuration can arise when including data from persons who create larger, more generic categories ("lumpers") with data
from those who create smaller categories with finer distinctions ("splitters"). With a small overall sample size, the inclusion of sort data from
extreme lumpers could easily lessen the interpretability of the maps. Consequently, a decision rule was used where any sort that had a single pile
including more than one-third of the brainstormed statements would be eliminated from the final analysis. According to this rule, two participant
sorts were eliminated. In addition, the incomplete sort was also excluded, yielding a final sorting sample size of eleven participants. All fourteen
participants completed the ratings and their ratings were included in the analysis.

The concept mapping analysis begins with construction from the sort information of an NxN binary, symmetric matrix of similarities, Xij. For any
two items i and j, a 1 was placed in Xij if the two items were placed in the same pile by the participant, otherwise a 0 was entered (Weller and
Romney, 1988, p. 22). The total NxN similarity matrix, Tij was obtained by summing across the individual Xij matrices. Thus, any cell in this
matrix could take integer values between 0 and 11 (i.e., the 11 people who sorted the statements); the value indicates the number of people who
placed the i,j pair in the same pile.
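
To make the aggregation step concrete, here is a minimal Python sketch (the study itself used the Concept System software; the tiny data set and the function name below are hypothetical stand-ins for the real 96 statements and 11 sorters):

```python
import numpy as np

def total_similarity(sorts, n_statements):
    """Sum the individual binary co-occurrence matrices X into T.

    `sorts` has one entry per participant; each entry is a list of piles,
    and each pile is a list of statement indices (0-based here).
    """
    T = np.zeros((n_statements, n_statements), dtype=int)
    for piles in sorts:
        X = np.zeros((n_statements, n_statements), dtype=int)
        for pile in piles:
            for i in pile:
                for j in pile:
                    X[i, j] = 1          # i and j share a pile for this sorter
        T += X
    return T

# Hypothetical example: 3 sorters, 6 statements
sorts = [
    [[0, 1, 2], [3, 4], [5]],
    [[0, 1], [2, 3, 4], [5]],
    [[0, 1, 2, 3], [4, 5]],
]
print(total_similarity(sorts, 6))   # each cell ranges from 0 to the number of sorters
```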

The total similarity matrix Tij was analyzed using nonmetric multidimensional scaling (MDS) analysis with a two-dimensional solution. The
solution was limited to two dimensions because, as Kruskal and Wish (1978) point out:

Since it is generally easier to work with two-dimensional configurations than with those involving more dimensions, ease of use
considerations are also important for decisions about dimensionality. For example, when an MDS configuration is desired
primarily as the foundation on which to display clustering results, then a two-dimensional configuration is far more useful than
one involving three or more dimensions (p. 58).

The analysis yielded a two-dimensional (x,y) configuration of the set of statements based on the criterion that statements piled together most often
are located more proximately in two-dimensional space while those piled together less frequently are further apart.
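
The same MDS step can be sketched with a general-purpose library. This is an illustration rather than the actual analysis: it assumes the similarities are converted to dissimilarities by subtracting them from the number of sorters (the diagonal of T holds that maximum), and uses scikit-learn's nonmetric MDS in place of the study's software.

```python
import numpy as np
from sklearn.manifold import MDS

# A tiny symmetric matrix stands in here for the real 96 x 96 total
# similarity matrix T built in the earlier sketch.
T = np.array([[3, 3, 2, 1, 0, 0],
              [3, 3, 2, 1, 0, 0],
              [2, 2, 3, 2, 1, 0],
              [1, 1, 2, 3, 2, 1],
              [0, 0, 1, 2, 3, 2],
              [0, 0, 0, 1, 2, 3]])
n_sorters = T.max()                 # the diagonal equals the number of sorters

# Convert similarities to dissimilarities so frequently co-sorted pairs
# end up close together on the map.
D = n_sorters - T
np.fill_diagonal(D, 0)

mds = MDS(n_components=2, metric=False, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(D)       # one (x, y) point per statement
print("stress:", round(mds.stress_, 3))   # lower stress = better fit
```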

This configuration was the input for the hierarchical cluster analysis utilizing Ward's algorithm (Everitt, 1980) as the basis for defining a cluster.
Using the MDS configuration as input to the cluster analysis in effect forces the cluster analysis to partition the MDS configuration into non-
overlapping clusters in two-dimensional space. There is no simple mathematical criterion by which a final number of clusters can be selected. The
procedure followed here was to examine an initial cluster solution that on average placed five statements in each cluster. Then, successively lower
and higher cluster solutions were examined, with a judgment made at each level about whether the merger/split seemed substantively reasonable.
The pattern of judgments of the suitability of different cluster solutions was examined and resulted in acceptance of the eighteen cluster solution as
the one that preserved the most detail and yielded substantively interpretable clusters of statements.
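
A rough sketch of this clustering step, assuming the two-dimensional MDS coordinates are available as an array (random points stand in for them here), and using SciPy's implementation of Ward's method:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Random two-dimensional points stand in for the MDS coordinates
# (one row per statement).
rng = np.random.default_rng(0)
coords = rng.normal(size=(96, 2))

# Ward's method merges, at each step, the two clusters whose merger gives the
# smallest increase in total within-cluster variance.
Z = linkage(coords, method="ward")

# Start from a solution averaging about five statements per cluster and
# inspect nearby solutions, judging each merge/split substantively.
start = coords.shape[0] // 5
for k in range(start + 3, start - 4, -1):
    labels = fcluster(Z, t=k, criterion="maxclust")
    sizes = np.bincount(labels)[1:]          # statements per cluster
    print(k, "clusters, sizes:", sorted(sizes.tolist(), reverse=True))
```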

The MDS configuration of the ninety-six points was graphed in two dimensions. This "point map" displayed the location of all the brainstormed
statements with statements closer to each other generally expected to be more similar in meaning. A "cluster map" was also generated. It displayed
the original ninety-six points enclosed by boundaries for the eighteen clusters.

The 1-to-5 rating data was averaged across persons for each item and each cluster. This rating information was depicted graphically in a "point
rating map" showing the original point map with average rating per item displayed as vertical columns in the third dimension, and in a "cluster
rating map" which showed the cluster average rating using the third dimension. The following materials were prepared for use in the second
session:

1. the list of the brainstormed statements grouped by cluster


2. the point map showing the MDS placement of the brainstormed statements and their identifying numbers
3. the cluster map showing the eighteen cluster solution
4. the point rating map showing the MDS placement of the brainstormed statements and their identifying numbers, with average statement
ratings overlaid
5. the cluster rating map showing the eighteen cluster solution, with average cluster ratings overlaid
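
The averages that feed the point rating and cluster rating maps (items 4 and 5 in the list above) are simple means; a minimal sketch with hypothetical rating data:

```python
import numpy as np

# Hypothetical ratings: rows = the 14 raters, columns = statements (1-5 scale),
# plus a cluster label for each statement from the cluster analysis.
rng = np.random.default_rng(0)
ratings = rng.integers(1, 6, size=(14, 6))
labels = np.array([1, 1, 2, 2, 3, 3])

item_means = ratings.mean(axis=0)                 # average importance per item
cluster_means = {int(c): round(float(item_means[labels == c].mean()), 2)
                 for c in np.unique(labels)}      # average of item means per cluster
print(np.round(item_means, 2))
print(cluster_means)
```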

Session 2: Interpretation of the Concept Maps. The second session convened to interpret the results of the concept mapping analysis. This session
followed a structured process described in detail in Trochim (1989a) . The facilitator began the session by giving the participants the listing of
clustered statements and reminding them of the brainstorming, sorting and rating tasks performed the previous evening. Each participant was asked
to read silently through the set of statements in each cluster and generate a short phrase or word to describe or label the set of statements as a
cluster. The facilitator then led the group in a discussion where they worked cluster-by-cluster to achieve group consensus on an acceptable label
for each cluster. In most cases, when persons suggested labels for a specific cluster, the group readily came to a consensus. Where the group had
difficulty achieving a consensus, the facilitator suggested they use a hybrid name, combining key terms or phrases from several individuals' labels.

Once the clusters were labeled, the group was given the point map and told that the analysis placed the statements on the map so that statements
frequently piled together are generally closer to each other on the map than statements infrequently piled together. To reinforce the notion that the
analysis placed the statements sensibly, participants were given a few minutes to identify statements close together on the map and examine the
contents of those statements. After becoming familiar with the numbered point map, they were told that the analysis also organized the points (i.e.,
statements) into groups as shown on the list of clustered statements they had already labeled. The cluster map was presented and participants were
told that it was simply a visual portrayal of the cluster list. Each participant wrote the cluster labels next to the appropriate cluster on their cluster
map.

Participants then examined the labeled cluster map to see whether it made sense to them. The facilitator reminded participants that in general,
clusters closer together on the map should be conceptually more similar than clusters farther apart and asked them to assess whether this seemed to
be true or not. Participants were asked to think of a geographic map, and "take a trip" across the map reading each cluster in turn to see whether or
not the visual structure seemed sensible. They were then asked to identify any interpretable groups of clusters or "regions." These were discussed
and partitions drawn on the map to indicate the different regions. Just as in labeling the clusters, the group then arrived at a consensus label for each
of the identified regions.

The facilitator noted that all of the material presented to this point used only the sorting data. The results of the rating task were then presented
through the point rating and cluster rating maps. It was explained that the height of a point or cluster represented the average importance rating for
that statement or cluster of statements. Again, participants were encouraged to examine these maps to determine whether they made intuitive sense
and to discuss what the maps might imply about the ideas that underlie their conceptualization. Given this new context for meaning, cluster labels,
regions and region labels were open for revision, although none were suggested. The remainder of the session was devoted to summarizing the
process.

Results

To illustrate the types of statements that were brainstormed and how these were clustered in the analysis, the statements comprising two of the final
eighteen clusters are shown in Table 1 along with their identifying numbers and average importance ratings (see Footnote 2).

For the two-dimensional solution of the MDS analysis, the final Stress value was .31. The two-dimensional configuration of the ninety-six
brainstormed statements is graphed in Figure 1.
Figure 1. Two-dimensional concept map showing the ninety-six brainstormed statements and the eighteen clusters.

In the figure, each statement is indicated by a dot with the statement identifying number next to it. Clusters of statements are enclosed in polygons
and numbered. The statements given in Table 1 can be found in cluster 3 on the lower right and cluster 9 on the bottom of the map. There are
several pairs of statements that fall in virtually the same place on the maps. If the assumptions underlying the analysis are correct, these should be
very similar statements. For instance, on the left of the map, statements 77 (helping families adjust to member independence) and 78 (helping
family members accept realistic placement of member) are located in the same place, as expected given their similar content. On the bottom of the
map, statements 33 (providing vocational support groups) and 96 (having an employment group) overlap, while in the lower right, statements 7
(modeling job activities) and 13 (modeling appropriate behavior for the work environment) fall together. In all of these, the proximity of the
statement pairs makes sense given the high degree of similarity in meaning.

The final concept map that resulted from the group interpretation is shown in Figure 2.
Figure 2. Final concept map showing the eighteen labeled clusters with cluster layers indicating average importance ratings.

Here, each cluster is shown with its label. Clusters have been divided into four general regions that were also labeled by the participants. The
number of layers used to depict the cluster borders indicates the average importance rating across all items in the cluster. A cluster border is drawn
only to show the statement points that fall within that cluster -- there is no meaningful interpretation to the size or specific shape of a cluster.
Although the distances among points and clusters are fixed in MDS, the directionality of the map is entirely arbitrary. The map could be rotated in
any direction or flipped horizontally or vertically without changing the distances among items or clusters. In other words, there is no substantive
meaning to the fact that the Administration region is located at the top while the Prevocational region is on the bottom of the page.

Reliability and Consistency of Maps

It is important to consider the reliability or consistency of the results of the concept mapping process. No single estimate of the reliability of this
complex multi-step process is possible. However, the reliability or consistency of the key part of the concept mapping analysis -- from sorting
through MDS -- can be examined in several ways.

For instance, an estimator analogous to an average inter-item correlation for the sort data can be computed by looking at the average
interrelationships among the sorts of the participants. This can be accomplished by computing the contingency coefficient (McNemar, 1955) for all
pairs of sorts. For the thirteen participants who had complete sort data (two of these were subsequently excluded from the final map analysis as
described earlier), there are 78 pairs ((13x12)/2) for which the contingency coefficient can be estimated. The resulting coefficients ranged from .62
to .95, with an average of .85. All coefficients were statistically significant with p<.01 as determined by a chi-square test. The contingency
coefficient is not without interpretative complication (Siegel, 1956), but nevertheless indicates that there is considerable consistency among the
sorters.
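
As an illustration of this consistency check (not the study's exact computation), the contingency coefficient for one pair of sorters can be estimated from the cross-tabulation of their pile assignments; the pile labels below are hypothetical:

```python
import numpy as np
from scipy.stats import chi2_contingency

def contingency_coefficient(sort_a, sort_b):
    """Pearson's contingency coefficient C = sqrt(chi2 / (chi2 + n)) between
    two sorters, computed from the cross-tabulation of their pile labels."""
    a, b = np.asarray(sort_a), np.asarray(sort_b)
    table = np.array([[np.sum((a == i) & (b == j)) for j in np.unique(b)]
                      for i in np.unique(a)])
    chi2, p, _, _ = chi2_contingency(table)
    n = table.sum()
    return np.sqrt(chi2 / (chi2 + n)), p

# Hypothetical pile assignments for two sorters over ten statements
c, p = contingency_coefficient([1, 1, 2, 2, 3, 3, 3, 1, 2, 3],
                               [1, 1, 1, 2, 2, 2, 3, 1, 2, 3])
print(round(c, 2), round(p, 3))
```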

Perhaps the best way to assess the reliability of the analysis is to estimate a coefficient analogous to a split-half reliability. Here, the eleven
participants were randomly divided into two groups (five in one and six in the other). Similarity matrices were constructed separately for each
random group, and separate MDS configurations were computed. The correlation between the two halves was estimated and the Spearman-Brown
correction was applied to estimate reliability. The reliability estimate for the similarity matrices was .79 (df=4,559, p<.001) and the reliability
estimate for the final MDS configuration was .56 (df=4,559, p<.001). These significant positive correlations imply that even with the extremely
small number of participant sorts aggregated, there is a clear discernible statistical consistency in the results.
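
A sketch of the similarity-matrix half of this procedure, using simulated sort data in place of the real sorts (the study also correlated the two half-sample MDS configurations, which is not shown here):

```python
import numpy as np

def similarity_matrix(sorters, n_statements):
    """Sum binary co-occurrence matrices across sorters. Each sorter is an
    array giving the pile number assigned to every statement."""
    T = np.zeros((n_statements, n_statements), dtype=int)
    for piles in sorters:
        piles = np.asarray(piles)
        T += (piles[:, None] == piles[None, :]).astype(int)
    return T

# Simulated sort data: 11 sorters filing 96 statements into 10 piles, each
# mostly following a common underlying grouping but re-filing ~20% of items.
rng = np.random.default_rng(0)
true_cluster = rng.integers(0, 10, size=96)
sorters = []
for _ in range(11):
    piles = true_cluster.copy()
    noise = rng.random(96) < 0.2
    piles[noise] = rng.integers(0, 10, size=noise.sum())
    sorters.append(piles)

# Build a similarity matrix for each half of the sorters, correlate the
# off-diagonal cells, and apply Spearman-Brown: r_sb = 2r / (1 + r).
half_a = similarity_matrix(sorters[:5], 96)
half_b = similarity_matrix(sorters[5:], 96)
iu = np.triu_indices(96, k=1)
r = np.corrcoef(half_a[iu], half_b[iu])[0, 1]
print("split-half reliability estimate:", round(2 * r / (1 + r), 3))
```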

Two cases were excluded because their sorts contained at least one pile with over one-third of all the brainstormed statements. The effects of these
exclusions can be examined in two ways. First, the correlation of the similarity matrices with and without exclusions was estimated. This
correlation was .95 (n=4,560 pairs of 96 similarities, p<.001). Second, as one would expect, the relationship between the final MDS configurations
with and without the exclusions was lower (r=.56, df=4,559, p<.001), although still highly significant. While excluding the two "lumpers" from
the analysis may make the result more interpretable, the final maps with and without exclusions are highly and significantly correlated, suggesting
that no substantial meaning was lost.

All three analyses taken together suggest that there is an internal consistency to the final map -- the results cannot plausibly be attributed to chance
or systematic error.

Discussion

The final map is complex and could be interpreted in a wide variety of ways. This discussion focuses on four salient aspects of the map and their
implications for improving our understanding of the theory of SE and subsequent evaluations of SE programs.

The Activity Sequence. The final map shows a progression of services moving from the bottom left quadrant counter-clockwise along the outside
toward the right side. In this "activity sequence", SE has a prevocational phase (i.e., training) involving goal setting, money management, job
preparation skills, and self-reliance training. Next, work logistics appear (i.e., placement), followed by the clusters of job coaching, job site liaison
services, mobile job support, assessing/processing work experiences, and caseworker responsibilities (i.e., training). In the terms presented earlier,
this sequence does not perfectly fit either of the two dominant psychiatric vocational rehabilitation models (i.e., the transitional train-then-place or
the supported place-then-train). The model reflected in the map is more a hybrid of the two idealized models, probably best described as a "train-
then-place-then-train" variation.

The map also describes the major concepts that make up the various training and placement activities in the sequence. For instance, the initial
prevocational training includes the clusters "Goal Setting", "Money Management", "Self Reliance", and "Job Preparation and Skills." The specific
brainstormed statements within each of these clusters describe operationalized activities involved in implementing those aspects of the program.
For example, the Goal Setting cluster includes brainstormed statements 14 (recruiting members for job site) and 30 (discussing appropriate long
and short term goals). The Job Preparation and Skills cluster includes statements 11 (teaching a member to look for a job) and 35 (assisting with
resume writing).

Thus, the map shows that a unique hybrid sequence of activities (train-then-place-then-train) represents the model followed in the program at
Thresholds. In addition, the map suggests some of the major theoretical concepts (i.e., the clusters) involved in each of these three activity stages.
Finally, the map provides considerable detail about how these concepts are operationalized in the program from the perspective of the staff
members who provide the services.

Local Program Evolution. The evidence in the map of a train-then-place-then-train hybrid model reflects the evolution of the SE program at this
agency. As described in Cook and Razzano (1992) , Thresholds originally used a traditional train-then-place transitional employment (TE) model.
Clients started out in a prevocational crew where they learned generic work skills such as punctuality and productivity while undergoing a series of
vocational assessments designed to target work-related difficulties. At the same time, clients began participating in a program offering an extensive
set of services, including social skills, housing and independent living training, leisure and recreation activities, medication management, and
education services. Services were meant to complement each other and to prepare clients for employment in the community. This traditional train-
then-place model of TE is described on the map by the four Prevocational clusters (train) and the cluster Work Logistics (place).

With the advent of supported employment, Thresholds combined several elements of SE models with its ongoing TE program to produce its hybrid
model. The two most important features added to the TE program were: 1) the suspension of time limits on job
placements, and 2) the development of the Mobile Job Support Worker (MJSW) position to provide ongoing training and support after placement.
A different map would be expected from an agency that introduced an SE program without a TE program already in place. In this case, one might
expect that staff would abbreviate or omit the Prevocational region shown in the lower left of the map and begin with immediate job placement
followed by specific skills training and workplace-based support, etc.

This is an important issue for both theory development and evaluation of SE programs. Simple distinctions between place-then-train and train-then-
place models may not describe program practice well. In methodological terms, such simple theoretical distinctions do not ensure the construct
validity of the treatment. Generalizing from hybrid programs grossly classified as either TE or SE may obscure theoretical implications and confuse
the interpretation of observed results. The quality of cross-study meta-evaluations will rest in part on our ability to describe the central program
activity sequences accurately.

In a larger sense, it is also worth noting that one can read the map from left to right as a description of the transition or evolution from facility-
based care to community-based job placement. On the facility side are familiar traditional issues of family involvement, clinical feedback, and even
staff training and development. On the far right are the newer community-based arenas related to the MJSW role, job coach and job site liaison
issues, and employer concerns. The map depicts this facility-based to community-based dimensionality and suggests some of the key constructs
that bridge between the two (e.g., work logistics).

Staff Roles. The map provides evidence that the staff perceives a strong relationship between case management (the Case Worker Responsibility
cluster) and vocational services (the MJSW cluster). Traditionally, case management and vocational services are considered to be distinct and are
delivered by separate staff members. The proximity of these clusters on the map suggests that staff do not perceive this role distinction. This
finding is corroborated by Cook and Razzano's (1992, p. 37) analysis of three years of logged descriptions of MJSW support where the SE services
included many of the services on the right-hand, outside quadrant of the map (e.g., casework, job site liaison, and providing verbal support as well
as resolving vocational problems).

The close relationship of these two staff roles is important for the theory of SE programs. In the traditional SE model they are viewed as distinct,
but there may be compelling reasons for combining them. As Cook and Razzano (1992) note, as clients with mental illness succeed at their jobs
they may confront a set of social and interpersonal demands that require case management support. As a result, SE for persons with psychiatric
disabilities may need to extend beyond skills training and workplace support (the vocational component) to other life areas reflecting the nature of
rehabilitation as a multi-dimensional process. During the implementation phase of SE services at Thresholds, considerable time was spent
discussing caseworkers' concerns that MJSWs would encroach on their turf, while MJSWs reported that clients were calling upon them for "non-
vocational" needs (i.e., case management and clinical assistance). It does appear that these services are closely related both in the ways staff think
about their jobs (as reflected in the maps) and in the amount of time they actually devoted to these services (as reflected in the logged data reported
by Cook and Razzano, 1992) . Theories of SE need to address how these roles are defined now and could be redefined in the future.

Contextual Factors. The concept map shows two major areas that consist primarily of what might best be termed contextual factors -- psychosocial
issues on the left side and administrative ones on the top. Both have important roles to play in any theory of SE.

The distance of the family and psychiatric issue clusters from both the Prevocational and the Vocational Service regions suggests that staff do not
view them as intimately involved with SE services. The relatively strong importance rating given to these clusters suggests that staff view family
and psychiatric issues as important in this context. Several investigators have commented on the exclusion of families in the process of psychiatric
rehabilitation (Cook, 1988; Spaniol, Zipple and Lockwood, 1992). The Thresholds agency whose staff participated in the mapping has no formal
mechanism for family involvement in vocational service delivery. This may explain why family issues were located in the extreme left-hand
section of the map away from all other service clusters. The location of the family and psychiatric clusters near the Prevocational region suggests
that if families and the traditional psychiatric treatment community do have a role to play in SE, it may be most important and relevant in the early
formative stages of the transitional process. Or, if their involvement in later phases is desired, it must be explicitly structured into the community-
based component because staff may not naturally see them as part of SE.
The other notable contextual area of the map concerns administrative issues. It comes as no surprise that program staff members would perceive
administrative matters as integral to their idea of SE programs. What is perhaps surprising is that evaluations of social programs so seldom
incorporate administrative matters into their theoretical frameworks. Chen (1990) emphasizes the need to do so, describing the implementation
environment as one of six important domains that need to be included in program theory. The concept map identifies some of the major
administrative areas of relevance. The specific brainstormed statements in the clusters begin to delineate how these areas are operationalized in the
management of the program. Information of this nature is essential for the development of implementation theory for SE programs.

Conclusions

The concept map that resulted from this process suggests a number of central constructs that ought to be included in any theory about SE programs.
These fall into four major areas. First, one must address the variety of activity sequences that characterize SE and related programs. In addition to
the train-then-place (TE) and place-then-train (SE) models, it is especially important to investigate whether programs as implemented follow these
idealized models or manifest more complex variations, as was evident in the program studied here. The theory should spell out the different
possible sequences and delineate the specific types of activities that are implied by "train" and "place." Second, SE theory should consider the local
program evolution that provider agencies follow in developing SE programs out of existing ones. Providers seldom start new programs in a
vacuum. Those who develop SE programs will often continue to implement some current program efforts while phasing out others. The transition
itself is likely to affect the nature of the resulting SE program and contribute to the development of various hybrid models, as was evident in the
concept map developed here. Third, theory development must consider the implications of SE programs with respect to staff roles. The potential
for role confusion or conflict is especially great between staff who represent newer community-based SE models and those who were engaged in
more agency-based vocational programs. Role issues are also more likely to arise in agencies that are transitioning from other models to SE
programs than in agencies that are developing totally new programs. Fourth, a theory of SE must include consideration of contextual factors that
affect program development and implementation. Important among these factors would be the role of the family in SE, the influence of the broader
health and mental health system, the influence of local employment factors, and the administrative structures and processes of the provider
agencies.

Of course, there are likely to be other important elements for a theory of SE programs beyond the ones identified through this concept mapping
process alone, although the four outlined above would almost certainly need to be included in any credible theory. This study was conducted at
only one agency with a small group of staff members. Results may not be generalizable beyond that immediate context. Replication of the concept
mapping process described here with the staffs of other SE providers would permit a better assessment of the validity of these findings. Replication
with different relevant populations, especially other mental health and rehabilitation professionals, social scientists, family and community
members, and the consumers of the services, would likely help identify additional key constructs for a comprehensive theory of SE programs.

The concept mapping process offers the consulting psychologist and other professional facilitators a practical tool that has broad applicability. In
addition to its value for theory development as described in this study, concept maps can be used in planning the future implementation of a
program. Participants might divide into small task groups to consider how to improve different aspects of a program. Each task group could be
given one or two cluster areas from the map (concentrating first on those clusters that were rated most important). They might read the statements
that fell into their assigned clusters and generate a few specific action statements that could help improve the quality of the program in that area.
These can then be brought back to the group as a whole for consideration at a single meeting with a concrete action being taken on each
recommended action statement (e.g., reject the action statement with reasons detailed; accept a modified form of the action statement with rationale
for the changes; or accept the action as suggested). This type of process, with specific feedback, helps to reinforce the importance of staff input in
assuring the ongoing quality of the program.

Concept mapping is also useful in periodic implementation assessment. On a regular basis (quarterly, semi-annually) program staff can meet to go
over each of the major clusters on the map and discuss how well the program is being implemented. The map essentially acts as an organizing
device or agenda for the discussion. It is used to review progress and spot implementation problems before they become serious. The maps might
also be used directly with clients to review the program.

Concept mapping can be used in reporting and describing the program. Sometimes, it is useful to organize records (manual files, databases) based
on the taxonomy of issues that appear on the map. Cluster topics can be major headings, with the statements in each cluster serving as potential
folders or sub-files. Or, the structure of the map can be used when summarizing program activities. This can be done in graphic form on the map
itself, or the cluster labels can be used as the elements in an outline or text presentation.

Finally, concept mapping can be used to enhance the evaluation of programs. The map provides a multivariate framework that describes the central
constructs of the program and, through the specific brainstormed statements, suggests operationalizations that can be implemented or observed.
Clusters on the map can be used as a guide for setting up program implementation checks that assess the degree to which the various activities are
carried out in each area. Results from such checks can be displayed on the original map and compared with the importance ratings of staff to
determine whether the activities deemed most important are most salient in actual implementation. Thus, the map can be viewed as an
operationalization of program theory, essential for examining the construct validity of the cause (Cook and Campbell, 1979). An analogous process
directed toward mapping program outcomes would improve our ability to assess the construct validity of the effect and make possible a much
richer multivariate pattern matching approach (Trochim, 1989c; Trochim and Cook, 1992) to evaluating social programs.
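
As a small illustration of such a comparison (all cluster names and numbers below are hypothetical), one might flag clusters whose staff importance ratings substantially exceed their implementation-check scores:

```python
import numpy as np

# Hypothetical cluster-level values on a 1-5 scale: staff importance ratings
# from the map, and scores from a later implementation check.
clusters       = ["Job Coaching", "Work Logistics", "Goal Setting", "Money Management"]
importance     = np.array([4.2, 3.6, 3.9, 2.8])
implementation = np.array([3.1, 3.8, 2.5, 3.0])

# Flag clusters rated important by staff but scoring low on implementation.
gap = importance - implementation
for name, g in sorted(zip(clusters, gap), key=lambda pair: -pair[1]):
    note = "  <- review" if g > 0.5 else ""
    print(f"{name:<18} gap = {g:+.1f}{note}")
```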

References

Chadsey-Rusch, J. and Rusch, F.R. (1986). The ecology of the workplace. In J. Chadsey-Rusch, C. Haney-Maxwell, L. A. Phelps and F. R. Rusch (Eds.), School-to-Work Transition Issues and Models (pp. 59-94). Champaign, IL: Transition Institute at Illinois.
Chen, H. (1990). Theory-Driven Evaluations. Newbury Park, CA: Sage Publications.
Ciardiello, J.A. (1981). Job placement success of schizophrenic clients in sheltered workshop programs. Vocational Evaluation and Work Adjustment Bulletin, 14, 125-128, 140.
Cook, J.A. (1988). Who "mothers" the chronically mentally ill? Family Relations, 37(1), 42-49.
Cook, J.A. (1992). Job ending among youth and adults with severe mental illness. Journal of Mental Health Administration, 19(2), 158-169.
Cook, J.A., & Hoffschmidt, S. (1993). Psychosocial rehabilitation programming: A
comprehensive model for the 1990's. In R.W. Flexer and P. Solomon (Eds.), Social and Community Support for People with Severe Mental
Disabilities: Service Integration in Rehabilitation and Mental Health. Andover, MA: Andover Publishing.
Cook, J.A., Jonikas, J., & Solomon, M. (1992). Models of vocational rehabilitation for youth and adults with severe mental illness. American
Rehabilitation, 18, 3, 6-32.
Cook, J.A. & Razzano, L. (1992). Natural vocational supports for persons with severe mental illness: Thresholds Supported Competitive
Employment Program, in L. Stein (ed.), New Directions for Mental Health Services, San Francisco: Jossey-Bass, 56, 23-41.
Cook, J.A., Solomon, M., & Mock, L. (1989). What happens after the first job placement: Vocational transitions among severely emotionally
disturbed and behavior disordered adolescents. Programming for Children and Adolescents with Behavioral Disorders, 4, 71-93.
Cook, T.D. and Campbell, D.T. (1979). Quasi-Experimentation: Design and analysis issues for field settings. Boston: Houghton-Mifflin.
Dincin, J. (1975). Psychiatric rehabilitation. Schizophrenia Bulletin, 1, 131-138.
Everitt, B. (1980). Cluster Analysis. 2nd Edition, New York, NY: Halsted Press, A Division of John Wiley and Sons.
Griffiths, R.D. (1974). Rehabilitation of chronic psychotic patients. Psychological Medicine, 4, 316-325.
Kruskal, J.B. and Wish, M. (1978). Multidimensional Scaling. Beverly Hills, CA: Sage Publications.
McNemar, Q. (1955). Psychological Statistics (2nd edition). New York, NY: Wiley.
Olshansky, S. (1973). A five-year follow-up of psychiatrically disabled clients. Rehabilitation Literature, 34, 15-16.
Olshansky, S. & Beach, D. (1974). A five-year follow-up of mentally retarded clients. Rehabilitation Literature, 35, 48-49.
Olshansky, S. & Beach, D. (1975). A five-year follow-up of psychiatrically disabled clients. Rehabilitation Literature, 36, 251-252, 258.
Osborn, A.F. (1948). Your Creative Power. New York, NY: Charles Scribner.
Revell, G., Wehman, P. & Arnold, S. (1984). Supported work model of competitive employment for persons with mental retardation: Implications
of rehabilitative services. Journal of Rehabilitation, 50, 33-38.
Robinault, I. and Weidinger, M. (1978). Mobilization of Community Resources: A Multi-Faceted Model for Rehabilitation of Post-Hospitalized
Mentally Ill. New York NY: ICD Rehabilitation and Research Center.
Roessler, R. T. (1980). Factors affecting client achievement of rehabilitation goals. Journal of Applied Rehabilitation Counseling, 11, 169-172.
Rosenberg, S. and Kim, M.P. (1975). The method of sorting as a data gathering procedure in multivariate research. Multivariate Behavioral
Research, 10, 489-502.
Siegel, S. (1956). Nonparametric Statistics for the Behavioral Sciences. New York, NY: McGraw-Hill.
Spaniol, L., Zipple, A. M. & Lockwood, D. (1992). The role of the family in psychiatric rehabilitation. Schizophrenia Bulletin, 18(3), 341-348.
Trochim, W. (1989a). An introduction to concept mapping for planning and evaluation. Evaluation and Program Planning, 12, 1, 1-16.
Trochim, W. (1989b). Concept mapping: Soft science or hard art? Evaluation and Program Planning, 12, 1, 87-110.
Trochim, W. (1989c). Outcome pattern matching and program theory. Evaluation and Program Planning, 12, 4, 355-366.
Trochim, W. and Cook, J. (1992). Pattern matching in theory-driven evaluation: A field example from psychiatric rehabilitation. in H. Chen and P.
H. Rossi (Eds.) Using Theory to Improve Program and Policy Evaluations. Greenwood Press, New York, 49-69.
Wehman, P. (1982). Competitive Employment: New Horizons for Severely Disabled Individuals. Baltimore MD: Paul H. Brookes.
Wehman, P. (1985). Supported competitive employment for persons with severe disabilities. In P. McCarthy, J. Everson, S. Monn & M. Barcus
(Eds.), School-to-Work Transition for Youth with Severe Disabilities, (pp. 167-182), Richmond VA: Virginia Commonwealth University.
Weinberg, J.L. & Lustig, P. (1968). A workshop experience for post-hospitalized schizophrenics. In G. N. Wright and A. B. Trotter, Rehabilitation
Research (pp. 72-78). Madison, WI.
Weller, S.C. & Romney, A.K. (1988). Systematic Data Collection. Newbury Park, CA: Sage Publications.
Whitehead, C.W. (1977). Sheltered Workshop Study: A Nationwide Report on Sheltered Workshops and their Employment of Handicapped
Individuals. (Workshop Survey, Volume 1), U.S. Department of Labor Service Publication. Washington, DC: U.S. Government Printing Office.
Woest, J., Klein, M. and Atkins, B.J. (1986). An overview of supported employment strategies. Journal of Rehabilitation Administration, 10(4),
130-135.

Table 1
Two of the eighteen clusters from the final map solution showing the original brainstormed statements (ID number indicates the order of
brainstorming; number in parenthesis is the average importance rating for the item or cluster)
Cluster 3: MJSW (Mobile Job Support Worker) (3.56)
12. accompanying a member to a job initially (3.214286)
85. fostering the development and maintenance of member's healthy relationships with peers/coworkers (3.857143)
94. holding group meetings at the job site (3.142857)
38. interpreting for hearing impaired members (4.214286)
49. assisting in job enhancement (3.357143)
Cluster 9: Job Preparation and Skills (3.64)
11. teaching a member to look for a job (3.357143)
80. teaching basic skills on a crew to prepare members for jobs (4.000000)
17. having a good idea of job expectations (3.500000)
92. helping members view jobs as positive transitional placements (3.642857)
33. providing vocational support groups (4.214286)
96. having an employment group (4.357143)
35. assisting with resume writing (2.571429)
58. instructing members on the purpose and value of supported employment (3.000000)
89. assisting adjustment after job loss (4.142857)

Footnotes

1 The Concept System © computer software is available for IBM-PC and compatible computers. The program is a complete user-friendly package
for implementing the concept mapping process. It is used to enter brainstormed statements, print these for sorting and rating, enter sorting and
rating data, conduct the statistical analysis (including multidimensional scaling and hierarchical cluster analysis) and display a wide variety of map
results. The user can interact directly with the program when creating and examining maps. Information about the software may be obtained by
writing to Concept Systems, P.O. Box 4721, Ithaca NY 14853 or calling (607) 257-2375.
2 This abbreviated listing is presented for illustrative purposes only. The complete list of all ninety-six brainstormed statements and eighteen
clusters can be obtained from the first author: William M.K. Trochim, Department of Human Service Studies, MVR Hall, Cornell University,
Ithaca NY 14853.

Construct validity refers to the degree to which inferences can legitimately be made from the operationalizations in your study to the theoretical constructs on which those operationalizations were based. Like external validity, construct validity is related to generalizing. But, where external validity involves generalizing from your study context to other people, places or times, construct validity involves generalizing from your program or measures to the concept of your program or measures. You might think of construct validity as a "labeling" issue. When you implement a program that you call a "Head Start" program, is your label an accurate one? When you measure what you term "self esteem," is that what you were really measuring?

I would like to tell two major stories here. The first is the more straightforward one. I'll
discuss several ways of thinking about the idea of construct validity, several metaphors
that might provide you with a foundation in the richness of this idea. Then, I'll discuss
the major construct validity threats, the kinds of arguments your critics are likely to raise
when you make a claim that your program or measure is valid. In most research
methods texts, construct validity is presented in the section on measurement. And, it is
typically presented as one of many different types of validity (e.g., face validity,
predictive validity, concurrent validity) that you might want to be sure your measures
have. I don't see it that way at all. I see construct validity as the overarching quality with
all of the other measurement validity labels falling beneath it. And, I don't see construct
validity as limited only to measurement. As I've already implied, I think it is as much a
part of the independent variable -- the program or treatment -- as it is the dependent
variable. So, I'll try to make some sense of the various measurement validity types and
try to move you to think instead of the validity of any operationalization as falling within
the general category of construct validity, with a variety of subcategories and subtypes.

The second story I want to tell is more historical in nature. During World War II, the U.S.
government involved hundreds (and perhaps thousands) of psychologists and
psychology graduate students in the development of a wide array of measures that
were relevant to the war effort. They needed personality screening tests for prospective
fighter pilots, personnel measures that would enable sensible assignment of people to
job skills, psychophysical measures to test reaction times, and so on. After the war,
these psychologists needed to find gainful employment outside of the military context,
and it's not surprising that many of them moved into testing and measurement in a
civilian context. During the early 1950s, the American Psychological Association began
to become increasingly concerned with the quality or validity of all of the new measures
that were being generated and decided to convene an effort to set standards for
psychological measures. The first formal articulation of the idea of construct validity
came from this effort and was couched under the somewhat grandiose idea of the
nomological network. The nomological network provided a theoretical basis for the idea
of construct validity, but it didn't provide practicing researchers with a way to actually
establish whether their measures had construct validity. In 1959, an attempt was made
to develop a method for assessing construct validity using what is called a multitrait-
multimethod matrix, or MTMM for short. In order to argue that your measures had
construct validity under the MTMM approach, you had to demonstrate that there was
both convergent and discriminant validity in your measures. You demonstrated
convergent validity when you showed that measures that are theoretically supposed to
be highly interrelated are, in practice, highly interrelated. And, you showed discriminant
validity when you demonstrated that measures that shouldn't be related to each other in
fact were not. While the MTMM did provide a methodology for assessing construct
validity, it was a difficult one to implement well, especially in applied social research
contexts and, in fact, has seldom been formally attempted. When we examine carefully
the thinking about construct validity that underlies both the nomological network and the
MTMM, one of the key themes we can identify in both is the idea of "pattern." When we
claim that our programs or measures have construct validity, we are essentially claiming
that we as researchers understand how our constructs or theories of the programs and
measures operate in theory and we claim that we can provide evidence that they
behave in practice the way we think they should. The researcher essentially has a
theory of how the programs and measures related to each other (and other theoretical
terms), a theoretical pattern if you will. And, the researcher provides evidence through
observation that the programs or measures actually behave that way in reality, an
observed pattern. When we claim construct validity, we're essentially claiming that our
observed pattern -- how things operate in reality -- corresponds with our theoretical
pattern -- how we think the world works. I call this process pattern matching, and I
believe that it is the heart of construct validity. It is clearly an underlying theme in both
the nomological network and the MTMM ideas. And, I think that we can develop
concrete and feasible methods that enable practicing researchers to assess pattern
matches -- to assess the construct validity of their research. The section on pattern
matching lays out my idea of how we might use this approach to assess construct
validity.
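
As a purely simulated illustration of the convergent/discriminant logic (this is not the full MTMM, which crosses multiple traits with multiple methods in a single matrix), correlations among measures of the same construct should be higher than correlations across constructs:

```python
import numpy as np

# Simulate two constructs, each measured by two hypothetical instruments
# ("survey" and "interview" versions) that share the construct plus noise.
rng = np.random.default_rng(1)
n = 200
self_esteem = rng.normal(size=n)
locus = rng.normal(size=n)
measures = {
    "esteem_survey":    self_esteem + rng.normal(scale=0.5, size=n),
    "esteem_interview": self_esteem + rng.normal(scale=0.5, size=n),
    "locus_survey":     locus + rng.normal(scale=0.5, size=n),
    "locus_interview":  locus + rng.normal(scale=0.5, size=n),
}
names = list(measures)
R = np.corrcoef([measures[k] for k in names])

# Convergent: same-construct correlations should be high.
# Discriminant: cross-construct correlations should be low.
same = [R[i, j] for i in range(4) for j in range(i + 1, 4)
        if names[i].split("_")[0] == names[j].split("_")[0]]
cross = [R[i, j] for i in range(4) for j in range(i + 1, 4)
         if names[i].split("_")[0] != names[j].split("_")[0]]
print("convergent (same construct):", np.round(same, 2))
print("discriminant (cross construct):", np.round(cross, 2))
```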


Reliability has to do with the quality of measurement. In its everyday sense, reliability is the "consistency" or "repeatability" of your measures. Before we can define reliability precisely we have to lay the groundwork. First, you have to learn about the foundation of reliability, the true score theory of measurement. Along with that, you need to understand the different types of measurement error, because errors in measures play a key role in degrading reliability. With this foundation, you can consider the basic theory of reliability, including a precise definition of reliability. There you will find out that we cannot calculate reliability -- we can only estimate it. Because of this, there are a variety of different types of reliability, each with multiple ways to estimate reliability for that type. In the end, it's important to integrate the idea of reliability with the other major criterion for the quality of measurement -- validity -- and develop an understanding of the relationships between reliability and validity in measurement.


The level of measurement refers to the relationship among the values that are assigned to the attributes for a variable. What does that mean? Begin with the idea of the variable, in this example "party affiliation." That variable has a number of attributes. Let's assume that in this particular election context the only relevant attributes are "republican", "democrat", and "independent". For purposes of analyzing the results of this variable, we arbitrarily assign the values 1, 2 and 3 to the three attributes. The level of measurement describes the relationship among these three values. In this case, we simply are using the numbers as shorter placeholders for the lengthier text terms. We don't assume that higher values mean "more" of something and lower numbers signify "less". We don't assume that the value of 2 means that democrats are twice something that republicans are. We don't assume that republicans are in first place or have the highest priority just because they have the value of 1. In this case, we only use the values as a shorter name for the attribute. Here, we would describe the level of measurement as "nominal".

Why is Level of Measurement Important?

First, knowing the level of measurement helps you decide how to interpret the data from that variable. When you
know that a measure is nominal (like the one just described), then you know that the numerical values are just short
codes for the longer names. Second, knowing the level of measurement helps you decide what statistical analysis is
appropriate on the values that were assigned. If a measure is nominal, then you know that you would never average
the data values or do a t-test on the data.
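
A small illustration of the point, using the party affiliation codes from above (the data are made up): averaging nominal codes produces a number, but not a meaningful one, whereas a frequency count is the appropriate summary; for a variable measured at the interval level (discussed below), the mean is interpretable.

```python
import numpy as np

# Hypothetical nominal codes: 1=republican, 2=democrat, 3=independent
party = np.array([1, 2, 2, 3, 1, 2, 3, 3, 2, 1])
print("mean of nominal codes:", party.mean())       # 2.0 -- a number, but meaningless
values, counts = np.unique(party, return_counts=True)
print("frequencies:", dict(zip(values.tolist(), counts.tolist())))   # the appropriate summary

# For an interval-level variable, the mean is interpretable.
temps = np.array([30.0, 40.0, 70.0, 80.0])          # degrees Fahrenheit
print("mean temperature:", temps.mean())
```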

There are typically four levels of measurement that are defined:

Nominal
Ordinal
Interval
Ratio

In nominal measurement the numerical values just "name" the attribute uniquely. No ordering of the cases is
implied. For example, jersey numbers in basketball are measures at the nominal level. A player with number 30 is
not more of anything than a player with number 15, and is certainly not twice whatever number 15 is.

In ordinal measurement the attributes can be rank-ordered. Here, distances between attributes do not have any
meaning. For example, on a survey you might code Educational Attainment as 0=less than H.S.; 1=some H.S.; 2=H.
S. degree; 3=some college; 4=college degree; 5=post college. In this measure, higher numbers mean more
education. But is the distance from 0 to 1 the same as the distance from 3 to 4? Of course not. The interval between values is not
interpretable in an ordinal measure.

In interval measurement the distance between attributes does have meaning. For example, when we measure temperature (in Fahrenheit), the distance from 30-40 is the same as the distance from 70-80. The interval between values is interpretable. Because of this, it makes sense to compute an average of an interval variable, where it doesn't make sense to do so for ordinal scales. But note that in interval measurement ratios don't make any sense - 80 degrees is not twice as hot as 40 degrees (although the attribute value is twice as large).

Finally, in ratio measurement there is always an absolute zero that is meaningful. This means that you can construct a meaningful fraction (or
ratio) with a ratio variable. Weight is a ratio variable. In applied social research most "count" variables are ratio, for
example, the number of clients in past six months. Why? Because you can have zero clients and because it is
meaningful to say that "...we had twice as many clients in the past six months as we did in the previous six months."

It's important to recognize that there is a hierarchy implied in the level of measurement idea. At lower levels of
measurement, assumptions tend to be less restrictive and data analyses tend to be less sensitive. At each level up
the hierarchy, the current level includes all of the qualities of the one below it and adds something new. In general, it
is desirable to have a higher level of measurement (e.g., interval or ratio) rather than a lower one (nominal or
ordinal).

Survey research is one of the most important areas of measurement in applied social research. The broad area of survey research encompasses any measurement procedures that involve asking questions of respondents. A "survey" can be anything from a short paper-and-pencil feedback form to an intensive one-on-one in-depth interview.

We'll begin by looking at the different types of surveys that are possible. These are roughly
divided into two broad areas: Questionnaires and Interviews. Next, we'll look at how you
select the survey method that is best for your situation. Once you've selected the survey
method, you have to construct the survey itself. Here, we will address a number of issues
including: the different types of questions; decisions about question content; decisions about
question wording; decisions about response format; and, question placement and sequence
in your instrument. We turn next to some of the special issues involved in administering a
personal interview. Finally, we'll consider some of the advantages and disadvantages of
survey methods.


Scaling is the branch of measurement that involves the construction of an instrument that associates qualitative constructs with quantitative metric units. Scaling evolved out of efforts in psychology and education to measure "unmeasurable" constructs like authoritarianism and self esteem. In many ways, scaling remains one of the most arcane and misunderstood aspects of social research measurement. And, it attempts to do one of the most difficult of research tasks --
measure abstract concepts.

Most people don't even understand what scaling is. The basic idea of scaling is described in
General Issues in Scaling, including the important distinction between a scale and a response
format. Scales are generally divided into two broad categories: unidimensional and
multidimensional. The unidimensional scaling methods were developed in the first half of the
twentieth century and are generally named after their inventor. We'll look at three types of
unidimensional scaling methods here:

Thurstone or Equal-Appearing Interval Scaling
Likert or "Summative" Scaling
Guttman or "Cumulative" Scaling

In the late 1950s and early 1960s, measurement theorists developed more advanced techniques
for creating multidimensional scales. Although these techniques are not considered here, you may
want to look at the method of concept mapping that relies on that approach to see the power of
these multivariate methods.


Qualitative research is a vast and complex area of methodology that can easily take up whole textbooks on its own. The purpose of this section is to introduce you to the idea of qualitative research (and how it is related to quantitative research) and give you some orientation to the major types of qualitative research data, approaches and methods.

There are a number of important questions you should consider before undertaking qualitative
research:

Do you want to generate new theories or hypotheses?

One of the major reasons for doing qualitative research is to become more
experienced with the phenomenon you're interested in. Too often in applied social
research (especially in economics and psychology) we have our graduate students
jump from doing a literature review on a topic of interest to writing a research
proposal complete with theories and hypotheses based on current thinking. What
gets missed is the direct experience of the phenomenon. We should probably
require of all students that before they mount a study they spend some time living
with the phenomenon. Before doing that multivariate analysis of gender-based
differences in wages, go observe several work contexts and see how gender tends
to be perceived and seems to affect wage allocations. Before looking at the
effects of a new psychotropic drug for the mentally ill, go spend some time visiting
several mental health treatment contexts to observe what goes on. If you do, you
are likely to approach the existing literature on the topic with a fresh perspective
born of your direct experience. You're likely to begin to formulate your own ideas
about what causes what else to happen. This is where most of the more
interesting and valuable new theories and hypotheses will originate. Of course,
there's a need for balance here as in anything else. If this advice was followed
literally, graduate school would be prolonged even more than is currently the case.
We need to use qualitative research as the basis for direct experience, but we also
need to know when and how to move on to formulate some tentative theories and
hypotheses that can be explicitly tested.

Do you need to achieve a deep understanding of the issues?

I believe that qualitative research has special value for investigating complex and
sensitive issues. For example, if you are interested in how people view topics like
God and religion, human sexuality, the death penalty, gun control, and so on, my
guess is that you would be hard-pressed to develop a quantitative methodology
that would do anything more than summarize a few key positions on these issues.
While this does have its place (and it's done all the time), if you really want to try to
achieve a deep understanding of how people think about these topics, some type
of in-depth interviewing is probably called for.
Are you willing to trade detail for generalizability?

Qualitative research certainly excels at generating information that is very detailed. Of course, there are quantitative studies that are detailed also in that they involve
collecting lots of numeric data. But in detailed quantitative research, the data
themselves tend to both shape and limit the analysis. For example, if you collect a
simple interval-level quantitative measure, the analyses you are likely to do with it
are fairly delimited (e.g., descriptive statistics, use in correlation, regression or
multivariate models, etc.). And, generalizing tends to be a fairly straightforward
endeavor in most quantitative research. After all, when you collect the same
variable from everyone in your sample, all you need to do to generalize to the
sample as a whole is to compute some aggregate statistic like a mean or median.

Things are not so simple in most qualitative research. The data are more "raw"
and are seldom pre-categorized. Consequently, you need to be prepared to
organize all of that raw detail. And there are almost an infinite number of ways this
could be accomplished. Even generalizing across a sample of interviews or written
documents becomes a complex endeavor.

The detail in most qualitative research is both a blessing and a curse. On the
positive side, it enables you to describe the phenomena of interest in great detail,
in the original language of the research participants. In fact, some of the best
"qualitative" research is often published in book form, often in a style that almost
approaches a narrative story. One of my favorite writers (and, I daresay, one of the
finest qualitative researchers) is Studs Terkel. He has written intriguing accounts
of the Great Depression (Hard Times), World War II (The Good War) and
socioeconomic divisions in America (The Great Divide), among others. In each
book he follows a similar qualitative methodology, identifying informants who
directly experienced the phenomenon in question, interviewing them at length, and
then editing the interviews heavily so that they "tell a story" that is different from
what any individual interviewee might tell but addresses the question of interest. If
you haven't read one of Studs' works yet, I highly recommend them.

On the negative side, when you have that kind of detail, it's hard to determine what
the generalizable themes may be. In fact, many qualitative researchers don't even
care about generalizing -- they're content to generate rich descriptions of their
phenomena.

That's why there is so much value in mixing qualitative research with quantitative.
Quantitative research excels at summarizing large amounts of data and reaching
generalizations based on statistical projections. Qualitative research excels at
"telling the story" from the participant's viewpoint, providing the rich descriptive
detail that sets quantitative results into their human context.

Is funding available for this research?

I hate to be crass, but in most social research we do have to worry about how it will
get paid for. There is little point in proposing any research that would be unable to
be carried out for lack of funds. For qualitative research this is an often especially
challenging issue. Because much qualitative research takes an enormous amount
of time, is very labor intensive, and yields results that may not be as generalizable
for policy-making or decision-making, many funding sources view it as a "frill" or as
simply too expensive.

There's a lot that you can (and should) do in proposing qualitative research that
will often enhance its fundability. My pet peeve with qualitative research proposals
is when the author says something along these lines (Of course, I'm paraphrasing
here. No good qualitative researcher would come out and say something like this
directly.):

    This study uses an emergent, exploratory, inductive qualitative approach. Because the basis of such an approach is that one does not predetermine or delimit the directions the investigation might take, there is no way to propose specific budgetary or time estimates.

Of course, this is just silly! There is always a way to estimate (especially when we
view an estimate as simply an educated guess!). I've reviewed proposals that say
almost this kind of thing and let me assure you that I and other reviewers don't
judge the researcher's credibility as very high under these circumstances. As an
alternative that doesn't hem you in or constrain the methodology, you might reword
the same passage something like:

    This study uses an emergent, exploratory, inductive qualitative approach. Because the basis of such an approach is that one does not predetermine or delimit the directions the investigation might take, it is especially important to detail the specific stages that this research will follow in addressing the research questions. [Insert detailed description of data collection, coding, analysis, etc. Especially note where there may be iterations of the phases.] Because of the complexities involved in this type of research, the proposal is divided into several broad stages with funding and time estimates provided for each. [Provide detail.]

Notice that the first approach is almost an insult to the reviewer. In the second, the
author acknowledges the unpredictability of qualitative research but does as
reasonable a job as possible to anticipate the course of the study, its costs, and
milestones. Certainly more fundable.


Unobtrusive measures are measures that don't require the researcher to intrude in the research context. Direct and
participant observation require that the researcher be physically present. This can lead the respondents to alter their
behavior in order to look good in the eyes of the researcher. A questionnaire is an interruption in the natural stream
of behavior. Respondents can get tired of filling out a survey or resentful of the questions asked.

Unobtrusive measurement presumably reduces the biases that result from the intrusion of the researcher or
measurement instrument. However, unobtrusive measures reduce the degree of control the researcher has over the type of
data collected. For some constructs there may simply not be any available unobtrusive measures.

Three types of unobtrusive measurement are discussed here.

Indirect Measures

An indirect measure is an unobtrusive measure that occurs naturally in a research context. The researcher is able
to collect the data without introducing any formal measurement procedure.

The types of indirect measures that may be available are limited only by the researcher's imagination and
inventiveness. For instance, let's say you would like to measure the popularity of various exhibits in a museum. It
may be possible to set up some type of mechanical measurement system that is invisible to the museum patrons.
In one study, the system was simple. The museum installed new floor tiles in front of each exhibit they wanted a
measurement on and, after a period of time, measured the wear-and-tear of the tiles as an indirect measure of
patron traffic and interest. We might be able to improve on this approach considerably using electronic measures.
We could, for instance, construct an electrical device that senses movement in front of an exhibit. Or we could place
hidden cameras and code patron interest based on videotaped evidence.

One of my favorite indirect measures occurred in a study of radio station listening preferences. Rather than
conducting an obtrusive survey or interview about favorite radio stations, the researchers went to local auto dealers
and garages and checked all cars that were being serviced to see what station the radio was currently tuned to. In
a similar manner, if you want to know magazine preferences, you might rummage through the trash of your sample
or even stage a door-to-door magazine recycling effort.

These examples illustrate one of the most important points about indirect measures -- you have to be very careful
about the ethics of this type of measurement. In an indirect measure you are, by definition, collecting information
without the respondent's knowledge. In doing so, you may be violating their right to privacy and you are certainly not
using informed consent. Of course, some types of information may be public and therefore not involve an invasion
of privacy.

There may be times when an indirect measure is appropriate, readily available and ethical. Just as with all
measurement, however, you should be sure to attempt to estimate the reliability and validity of the measures. For
instance, collecting radio station preferences at two different time periods and correlating the results might be useful
for assessing test-retest reliability. Or, you can include the indirect measure along with other direct measures of the
same construct (perhaps in a pilot study) to help establish construct validity.
Content Analysis

Content analysis is the analysis of text documents. The analysis can be quantitative, qualitative or both. Typically,
the major purpose of content analysis is to identify patterns in text. Content analysis is an extremely broad area of
research. It includes:

Thematic analysis of text

The identification of themes or major ideas in a document or set of documents. The documents
can be any kind of text including field notes, newspaper articles, technical papers or organizational
memos.

Indexing

There are a wide variety of automated methods for rapidly indexing text documents. For instance,
Key Words in Context (KWIC) analysis is a computer analysis of text data. A computer program
scans the text and indexes all key words. A key word is any term in the text that is not included in
an exception dictionary. Typically you would set up an exception dictionary that includes all non-
essential words like "is", "and", and "of". All key words are alphabetized and are listed with the text
that precedes and follows them so the researcher can see each word in the context in which it occurred in
the text. In an analysis of interview text, for instance, one could easily identify all uses of the term
"abuse" and the context in which they were used.

Quantitative descriptive analysis

Here the purpose is to describe features of the text quantitatively. For instance, you might want to
find out which words or phrases were used most frequently in the text. Again, this type of analysis
is most often done directly with computer programs.
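
As a rough sketch of this kind of descriptive analysis, the fragment below simply counts how often each non-excepted word appears in a made-up piece of response text.

    import re
    from collections import Counter

    # Hypothetical exception list of non-essential words.
    exception_dictionary = {"is", "and", "of", "the", "a", "to", "in", "was", "were"}

    def word_frequencies(text):
        """Count how often each non-excepted word appears in the text."""
        words = re.findall(r"[a-z']+", text.lower())
        return Counter(w for w in words if w not in exception_dictionary)

    comments = "The program was helpful and the staff were helpful and friendly."
    print(word_frequencies(comments).most_common(3))
    # e.g. [('helpful', 2), ('program', 1), ('staff', 1)]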

Content analysis has several problems you should keep in mind. First, you are limited to the types of information
available in text form. If you are studying the way a news story is being handled by the news media, you probably
would have a ready population of news stories from which you could sample. However, if you are interested in
studying people's views on capital punishment, you are less likely to find an archive of text documents that would be
appropriate. Second, you have to be especially careful with sampling in order to avoid bias. For instance, a study of
current research on methods of treatment for cancer might use the published literature as the population. This
would leave out both the writing on cancer that did not get published for one reason or another as well as the most
recent work that has not yet been published. Finally, you have to be careful about interpreting results of automated
content analyses. A computer program cannot determine what someone meant by a term or phrase. It is relatively
easy in a large analysis to misinterpret a result because you did not take into account the subtleties of meaning.

However, content analysis has the advantage of being unobtrusive and, depending on whether automated methods
exist, can be a relatively rapid method for analyzing large amounts of text.

Secondary Analysis of Data

Secondary analysis, like content analysis, makes use of already existing sources of data. However, secondary
analysis typically refers to the re-analysis of quantitative data rather than text.
In our modern world there is an unbelievable mass of data that is routinely collected by governments, businesses,
schools, and other organizations. Much of this information is stored in electronic databases that can be accessed
and analyzed. In addition, many research projects store their raw data in electronic form in computer archives so
that others can also analyze the data. Among the data available for secondary analysis are:

census bureau data
crime records
standardized testing data
economic data
consumer data

Secondary analysis often involves combining information from multiple databases to examine research questions.
For example, you might join crime data with census information to assess patterns in criminal behavior by
geographic location and group.
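
Here is a minimal sketch of what that kind of linkage might look like in Python using pandas. The file names, variable names, and the idea of joining on a county identifier are assumptions invented for the example, not a description of any actual archive.

    import pandas as pd

    # Hypothetical archived extracts; the file names and variables are invented.
    crime = pd.read_csv("crime_records.csv")      # columns assumed: county_id, burglaries, population
    census = pd.read_csv("census_extract.csv")    # columns assumed: county_id, median_income, pct_urban

    # Link the two databases on a shared geographic identifier.
    merged = crime.merge(census, on="county_id", how="inner")

    # A simple derived rate and a descriptive comparison by one grouping variable.
    merged["burglary_rate"] = merged["burglaries"] / merged["population"] * 1000
    merged["more_urban"] = merged["pct_urban"] > merged["pct_urban"].median()
    print(merged.groupby("more_urban")["burglary_rate"].describe())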

Secondary analysis has several advantages. First, it is efficient. It makes use of data that were already collected by
someone else. It is the research equivalent of recycling. Second, it often allows you to extend the scope of your
study considerably. In many small research projects it is impossible to consider taking a national sample because of
the costs involved. Many archived databases are already national in scope and, by using them, you can leverage a
relatively small budget into a much broader study than if you collected the data yourself.

However, secondary analysis is not without difficulties. Frequently it is no trivial matter to access and link data from
large complex databases. Often the researcher has to make assumptions about what data to combine and which
variables are appropriately aggregated into indexes. Perhaps more importantly, when you use data collected by
others you often don't know what problems occurred in the original data collection. Large, well-financed national
studies are usually documented quite thoroughly, but even detailed documentation of procedures is often no
substitute for direct experience collecting data.

One of the most important and least utilized purposes of secondary analysis is to replicate prior research findings.
In any original data analysis there is the potential for errors. In addition, each data analyst tends to approach the
analysis from their own perspective using analytic tools they are familiar with. In most research the data are
analyzed only once by the original research team. It seems an awful waste. Data that might have taken months or
years to collect are only examined once in a relatively brief way and from one analyst's perspective. In social
research we generally do a terrible job of documenting and archiving the data from individual studies and making
these available in electronic form for others to re-analyze. And, we tend to give little professional credit to studies
that are re-analyses. Nevertheless, in the hard sciences the tradition of replicability of results is a critical one and we
in the applied social sciences could benefit by directing more of our efforts to secondary analysis of existing data.


The Qualitative-Quantitative Debate

There has probably been more energy expended on debating the differences between and relative advantages of qualitative and
quantitative methods than almost any other methodological topic in social research. The "qualitative-quantitative debate" as it is
sometimes called is one of those hot-button issues that almost invariably will trigger an intense debate in the hotel bar at any social
research convention. I've seen friends and colleagues degenerate into academic enemies faster than you can say "last call."

After years of being involved in such verbal brawling, as an observer and direct participant, the only conclusion I've been able to reach
is that this debate is "much ado about nothing." To say that one or the other approach is "better" is, in my view, simply a trivializing of
what is a far more complex topic than a dichotomous choice can settle. Both quantitative and qualitative research rest on rich and
varied traditions that come from multiple disciplines and both have been employed to address almost any research topic you can think
of. In fact, in almost every applied social research project I believe there is value in consciously combining both qualitative and
quantitative methods in what is referred to as a "mixed methods" approach.

I find it useful when thinking about this debate to distinguish between the general assumptions involved in undertaking a research
project (qualitative, quantitative or mixed) and the data that are collected. At the level of the data, I believe that there is little difference
between the qualitative and the quantitative. But at the level of the assumptions that are made, the differences can be profound and
irreconcilable (which is why there's so much fighting that goes on).

Qualitative and Quantitative Data

It may seem odd that I would argue that there is little difference between qualitative and quantitative data. After all, qualitative data
typically consists of words while quantitative data consists of numbers. Aren't these fundamentally different? I don't think so, for the
following reasons:

All qualitative data can be coded quantitatively.

What I mean here is very simple. Anything that is qualitative can be assigned meaningful numerical values. These values can then be
manipulated to help us achieve greater insight into the meaning of the data and to help us examine specific hypotheses. Let's consider
a simple example. Many surveys have one or more short open-ended questions that ask the respondent to supply text responses.
The simplest example is probably the "Please add any additional comments" question that is often tacked onto a short survey. The
immediate responses are text-based and qualitative. But we can always (and usually will) perform some type of simple classification of
the text responses. We might sort the responses into simple categories, for instance. Often, we'll give each category a short label that
represents the theme in the response.

What we don't often recognize is that even the simple act of categorizing can be viewed as a quantitative one as well. For instance,
let's say that we develop five themes that each respondent could express in their open-ended response. Assume that we have ten
respondents. We could easily set up a simple coding table like the one in the figure below to represent the coding of the ten responses
into the five themes.

Person     Theme 1  Theme 2  Theme 3  Theme 4  Theme 5
1             X        X                 X
2             X                 X
3             X        X                 X
4                      X                 X
5                      X                 X        X
6             X        X                          X
7                               X        X        X
8                      X                 X
9                               X                 X
10                                       X        X

This is a simple qualitative thematic coding analysis. But, we can represent exactly the same information quantitatively as in the
following table:

Person     Theme 1  Theme 2  Theme 3  Theme 4  Theme 5  Totals
1             1        1        0        1        0        3
2             1        0        1        0        0        2
3             1        1        0        1        0        3
4             0        1        0        1        0        2
5             0        1        0        1        1        3
6             1        1        0        0        1        3
7             0        0        1        1        1        3
8             0        1        0        1        0        2
9             0        0        1        0        1        2
10            0        0        0        1        1        2
Totals        4        6        3        7        5

Notice that this is the exact same data. The first would probably be called a qualitative coding while the second is clearly quantitative.
The quantitative coding gives us additional useful information and makes it possible to do analyses that we couldn't do with the
qualitative coding. For instance, from just the table above we can say that Theme 4 was the most frequently mentioned and that all
respondents touched on two or three of the themes. But we can do even more. For instance, we could look at the similarities among
the themes based on which respondents addressed them. How? Well, why don't we do a simple correlation matrix for the table
above. Here's the result:

           Theme 1   Theme 2   Theme 3   Theme 4
Theme 2     0.250
Theme 3    -0.089    -0.802
Theme 4    -0.356     0.356    -0.524
Theme 5    -0.408    -0.408     0.218    -0.218

The analysis shows that Themes 2 and 3 are strongly negatively correlated -- People who said Theme 2 seldom said Theme 3 and vice
versa (check it for yourself). We can also look at the similarity among respondents as shown below:
P1 P2 P3 P4 P5 P6 P7 P8 P9
P2 -0.167
P3 1.000 -0.167
P4 0.667 -0.667 0.667
P5 0.167 -1.000 0.167 0.667
P6 0.167 -0.167 0.167 -0.167 0.167
P7 -0.667 -0.167 -0.667 -0.167 0.167 -0.667
P8 0.667 -0.667 0.667 1.000 0.667 -0.167 -0.167
P9 -1.000 0.167 -1.000 -0.667 -0.167 -0.167 0.667 -0.667
P10 -0.167 -0.667 -0.167 0.167 0.667 -0.167 0.667 0.167 0.167

We can see immediately that Persons 1 and 3 are perfectly correlated (r = +1.0) as are Persons 4 and 8. There are also a few perfect
opposites (r = -1.0) -- P1 and P9, P2 and P5, and P3 and P9.

We could do much more. If we had more respondents (and we often would with a survey), we could do some simple multivariate
analyses. For instance, we could draw a similarity "map" of the respondents based on their intercorrelations. The map would have
one dot per respondent and respondents with more similar responses would cluster closer together.
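
If you want to see these numbers for yourself, here is a small sketch in Python that reproduces the theme and respondent correlations from the 0/1 coding table above. Using pandas is my choice for the example, not something required; the data values are the ones shown in the tables.

    import pandas as pd

    # The ten respondents' 0/1 theme codes from the table above.
    coding = pd.DataFrame(
        [[1, 1, 0, 1, 0],
         [1, 0, 1, 0, 0],
         [1, 1, 0, 1, 0],
         [0, 1, 0, 1, 0],
         [0, 1, 0, 1, 1],
         [1, 1, 0, 0, 1],
         [0, 0, 1, 1, 1],
         [0, 1, 0, 1, 0],
         [0, 0, 1, 0, 1],
         [0, 0, 0, 1, 1]],
        index=[f"P{i}" for i in range(1, 11)],
        columns=[f"Theme {j}" for j in range(1, 6)],
    )

    theme_corr = coding.corr()       # correlations among themes (Theme 2 vs. Theme 3 is about -0.80)
    person_corr = coding.T.corr()    # correlations among respondents (P1 vs. P3 is +1.0)

    print(theme_corr.round(3))
    print(person_corr.round(3))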

The point is that the line between qualitative and quantitative is less distinct than we sometimes imagine. All qualitative data can be
quantitatively coded in an almost infinite variety of ways. This doesn't detract from the qualitative information. We can still do any
kind of judgmental synthesis or analysis we want. But recognizing the similarities between qualitative and quantitative information
opens up new possibilities for interpretation that might otherwise go unutilized.

Now to the other side of the coin...

All quantitative data is based on qualitative judgment.

Numbers in and of themselves can't be interpreted without understanding the assumptions which underlie them. Take, for example, a
simple 1-to-5 rating variable responding to the statement "Capital punishment is the best way to deal with convicted
murderers," rated on a scale from 1=Strongly Disagree to 5=Strongly Agree.

Here, the respondent answered 2=Disagree. What does this mean? How do we interpret the value "2" here? We can't really
understand this quantitative value unless we dig into some of the judgments and assumptions that underlie it:

Did the respondent understand the term "capital punishment"?
Did the respondent understand that a "2" means that they are disagreeing with the statement?
Does the respondent have any idea about alternatives to capital punishment (otherwise how can they judge what's "best")?
Did the respondent read carefully enough to determine that the statement was limited only to convicted murderers (for instance, rapists were not included)?
Does the respondent care or were they just circling anything arbitrarily?
How was this question presented in the context of the survey (e.g., did the questions immediately before this one bias the response in any way)?
Was the respondent mentally alert (especially if this is late in a long survey or the respondent had other things going on earlier in the day)?
What was the setting for the survey (e.g., lighting, noise and other distractions)?
Was the survey anonymous? Was it confidential?
In the respondent's mind, is the difference between a "1" and a "2" the same as between a "2" and a "3" (i.e., is this an interval scale?)?

We could go on and on, but my point should be clear. All numerical information involves numerous judgments about what the number
means.

The bottom line here is that quantitative and qualitative data are, at some level, virtually inseparable. Neither exists in a vacuum or can
be considered totally devoid of the other. To ask which is "better" or more "valid" or has greater "verisimilitude" or whatever ignores the
intimate connection between them. To do good research we need to use both the qualitative and the quantitative.

Qualitative and Quantitative Assumptions

To say that qualitative and quantitative data are similar only tells half the story. After all, the intense academic wrangling of the
qualitative-quantitative debate must have some basis in reality. My sense is that there are some fundamental differences, but that they
lie primarily at the level of assumptions about research (epistemological and ontological assumptions) rather than at the level of the
data.

First, let's do away with the most common myths about the differences between qualitative and quantitative research. Many people
believe the following:

Quantitative research is confirmatory and deductive in nature.


Qualitative research is exploratory and inductive in nature.

I think that while there's a shred of truth in each of these statements, they are not exactly correct. In general, a lot of quantitative
research tends to be confirmatory and deductive. But there's lots of quantitative research that can be classified as exploratory as well.
And while much qualitative research does tend to be exploratory, it can also be used to confirm very specific deductive hypotheses.
The problem I have with these kinds of statements is that they don't acknowledge the richness of both traditions. They don't recognize
that both qualitative and quantitative research can be used to address almost any kind of research question.

So, if the difference between qualitative and quantitative is not along the exploratory-confirmatory or inductive-deductive dimensions,
then where is it?

My belief is that the heart of the quantitative-qualitative debate is philosophical, not methodological. Many qualitative researchers
operate under different epistemological assumptions from quantitative researchers. For instance, many qualitative researchers
believe that the best way to understand any phenomenon is to view it in its context. They see all quantification as limited in nature,
looking only at one small portion of a reality that cannot be split or unitized without losing the importance of the whole phenomenon.
For some qualitative researchers, the best way to understand what's going on is to become immersed in it. Move into the culture or
organization you are studying and experience what it is like to be a part of it. Be flexible in your inquiry of people in context. Rather
than approaching measurement with the idea of constructing a fixed instrument or set of questions, allow the questions to emerge and
change as you become familiar with what you are studying. Many qualitative researchers also operate under different ontological
assumptions about the world. They don't assume that there is a single unitary reality apart from our perceptions. Since each of us
experiences from our own point of view, each of us experiences a different reality. Conducting research without taking this into account
violates their fundamental view of the individual. Consequently, they may be opposed to methods that attempt to aggregate across
individuals on the grounds that each individual is unique. They also argue that the researcher is a unique individual and that all
research is essentially biased by each researcher's individual perceptions. There is no point in trying to establish "validity" in any
external or objective sense. All that we can hope to do is interpret our view of the world as researchers.

Let me end this brief excursion into the qualitative-quantitative debate with a few personal observations. Any researcher steeped in the
qualitative tradition would certainly take issue with my comments above about the similarities between quantitative and qualitative data.
They would argue (with some correctness I fear) that it is not possible to separate your research assumptions from the data. Some
would claim that my perspective on data is based on assumptions common to the quantitative tradition. Others would argue that it
doesn't matter if you can code data thematically or quantitatively because they wouldn't do either -- both forms of analysis impose
artificial structure on the phenomena and, consequently, introduce distortions and biases. I have to admit that I would see the point in
much of this criticism. In fact, I tend to see the point on both sides of the qualitative-quantitative debate.

In the end, people who consider themselves primarily qualitative or primarily quantitative tend to be almost as diverse as those from the
opposing camps. There are qualitative researchers who fit comfortably into the post-positivist tradition common to much contemporary
quantitative research. And there are quantitative researchers (albeit, probably fewer) who use quantitative information as the basis for
exploration, recognizing the inherent limitations and complex assumptions beneath all numbers. In either camp, you'll find intense and
fundamental disagreement about both philosophical assumptions and the nature of data. And, increasingly, we find researchers who
are interested in blending the two traditions, attempting to get the advantages of each. I don't think there's any resolution to the
debate. And, I believe social research is richer for the wider variety of views and methods that the debate generates.


Qualitative data is extremely varied in nature. It includes virtually any information that can be captured that is not
numerical in nature. Here are some of the major categories or types:

In-Depth Interviews

In-Depth Interviews include both individual interviews (e.g., one-on-one) as well as "group"
interviews (including focus groups). The data can be recorded in a wide variety of ways including
stenography, audio recording, video recording or written notes. In-depth interviews differ from
direct observation primarily in the nature of the interaction. In interviews it is assumed that there is
a questioner and one or more interviewees. The purpose of the interview is to probe the ideas of
the interviewees about the phenomenon of interest.

Direct Observation

Direct observation is meant very broadly here. It differs from interviewing in that the observer does
not actively query the respondent. It can include everything from field research where one lives in
another context or culture for a period of time to photographs that illustrate some aspect of the
phenomenon. The data can be recorded in many of the same ways as interviews (stenography,
audio, video) and through pictures, photos or drawings (e.g., those courtroom drawings of witnesses
are a form of direct observation).

Written Documents

Usually this refers to existing documents (as opposed to transcripts of interviews conducted for the
research). It can include newspapers, magazines, books, websites, memos, transcripts of
conversations, annual reports, and so on. Usually written documents are analyzed with some form
of content analysis.


A qualitative "approach" is a general way of thinking about conducting qualitative research. It describes, either
explicitly or implicitly, the purpose of the qualitative research, the role of the researcher(s), the stages of research,
and the method of data analysis. Here, four of the major qualitative approaches are introduced.

Ethnography

The ethnographic approach to qualitative research comes largely from the field of anthropology. The emphasis in
ethnography is on studying an entire culture. Originally, the idea of a culture was tied to the notion of ethnicity and
geographic location (e.g., the culture of the Trobriand Islands), but it has been broadened to include virtually any
group or organization. That is, we can study the "culture" of a business or defined group (e.g., a Rotary club).

Ethnography is an extremely broad area with a great variety of practitioners and methods. However, the most
common ethnographic approach is participant observation as a part of field research. The ethnographer becomes
immersed in the culture as an active participant and records extensive field notes. As in grounded theory, there is
no preset limiting of what will be observed and no real ending point in an ethnographic study.

Phenomenology

Phenomenology is sometimes considered a philosophical perspective as well as an approach to qualitative


methodology. It has a long history in several social research disciplines including psychology, sociology and social
work. Phenomenology is a school of thought that emphasizes a focus on people's subjective experiences and
interpretations of the world. That is, the phenomenologist wants to understand how the world appears to others.

Field Research

Field research can also be considered either a broad approach to qualitative research or a method of gathering
qualitative data. The essential idea is that the researcher goes "into the field" to observe the phenomenon in its
natural state or in situ. As such, it is probably most related to the method of participant observation. The field
researcher typically takes extensive field notes which are subsequently coded and analyzed in a variety of ways.

Grounded Theory

Grounded theory is a qualitative research approach that was originally developed by Glaser and Strauss in the
1960s. The self-defined purpose of grounded theory is to develop theory about phenomena of interest. But this is
not just abstract theorizing they're talking about. Instead the theory needs to be grounded or rooted in observation --
hence the term.

Grounded theory is a complex iterative process. The research begins with the raising of generative questions which
help to guide the research but are not intended to be either static or confining. As the researcher begins to gather
data, core theoretical concept(s) are identified. Tentative linkages are developed between the theoretical core
concepts and the data. This early phase of the research tends to be very open and can take months. Later on the
researcher is more engaged in verification and summary. The effort tends to evolve toward one core category that
is central.

There are several key analytic strategies:

Coding is a process for both categorizing qualitative data and for describing the implications and details of
these categories. Initially one does open coding, considering the data in minute detail while developing
some initial categories. Later, one moves to more selective coding where one systematically codes with
respect to a core concept.
Memoing is a process for recording the thoughts and ideas of the researcher as they evolve throughout the
study. You might think of memoing as extensive marginal notes and comments. Again, early in the
process these memos tend to be very open while later on they tend to increasingly focus in on the core
concept.
Integrative diagrams and sessions are used to pull all of the detail together, to help make sense of the data
with respect to the emerging theory. The diagrams can be any form of graphic that is useful at that point in
theory development. They might be concept maps or directed graphs or even simple cartoons that can act
as summarizing devices. This integrative work is best done in group sessions where different members of
the research team are able to interact and share ideas to increase insight.

Eventually one approaches conceptually dense theory as new observation leads to new linkages which lead to
revisions in the theory and more data collection. The core concept or category is identified and fleshed out in detail.

When does this process end? One answer is: never! Clearly, the process described above could continue
indefinitely. Grounded theory doesn't have a clearly demarcated point for ending a study. Essentially, the project
ends when the researcher decides to quit.

What do you have when you're finished? Presumably you have an extremely well-considered explanation for some
phenomenon of interest -- the grounded theory. This theory can be explained in words and is usually presented
with much of the contextually relevant detail collected.


There are a wide variety of methods that are common in qualitative measurement. In fact, the methods are largely
limited by the imagination of the researcher. Here I discuss a few of the more common methods.

Participant Observation

One of the most common methods for qualitative data collection, participant observation is also one of the most
demanding. It requires that the researcher become a participant in the culture or context being observed. The
literature on participant observation discusses how to enter the context, the role of the researcher as a participant,
the collection and storage of field notes, and the analysis of field data. Participant observation often requires
months or years of intensive work because the researcher needs to become accepted as a natural part of the culture
in order to assure that the observations are of the natural phenomenon.

Direct Observation

Direct observation is distinguished from participant observation in a number of ways. First, a direct observer
doesn't typically try to become a participant in the context. However, the direct observer does strive to be as
unobtrusive as possible so as not to bias the observations. Second, direct observation suggests a more detached
perspective. The researcher is watching rather than taking part. Consequently, technology can be a useful part of
direct observation. For instance, one can videotape the phenomenon or observe from behind one-way mirrors.
Third, direct observation tends to be more focused than participant observation. The researcher is observing
certain sampled situations or people rather than trying to become immersed in the entire context. Finally, direct
observation tends not to take as long as participant observation. For instance, one might observe child-mother
interactions under specific circumstances in a laboratory setting from behind a one-way mirror, looking especially for
the nonverbal cues being used.

Unstructured Interviewing

Unstructured interviewing involves direct interaction between the researcher and a respondent or group. It differs
from traditional structured interviewing in several important ways. First, although the researcher may have some
initial guiding questions or core concepts to ask about, there is no formal structured instrument or protocol. Second,
the interviewer is free to move the conversation in any direction of interest that may come up. Consequently,
unstructured interviewing is particularly useful for exploring a topic broadly. However, there is a price for this lack of
structure. Because each interview tends to be unique with no predetermined set of questions asked of all
respondents, it is usually more difficult to analyze unstructured interview data, especially when synthesizing across
respondents.

Case Studies

A case study is an intensive study of a specific individual or specific context. For instance, Freud developed case
studies of several individuals as the basis for the theory of psychoanalysis and Piaget did case studies of children to
study developmental phases. There is no single way to conduct a case study, and a combination of methods (e.g.,
unstructured interviewing, direct observation) can be used.

Depending on their philosophical perspectives, some qualitative researchers reject the framework of validity that is
commonly accepted in more quantitative research in the social sciences. They reject the basic realist assumption
that there is a reality external to our perception of it. Consequently, it doesn't make sense to be concerned with the
"truth" or "falsity" of an observation with respect to an external reality (which is a primary concern of validity). These
qualitative researchers argue for different standards for judging the quality of research.

For instance, Guba and Lincoln proposed four criteria for judging the soundness of qualitative research and explicitly
offered these as an alternative to more traditional quantitatively-oriented criteria. They felt that their four criteria
better reflected the underlying assumptions involved in much qualitative research. Their proposed criteria and the
"analogous" quantitative criteria are listed in the table.

Traditional Criteria for Judging      Alternative Criteria for Judging
Quantitative Research                 Qualitative Research

internal validity                     credibility
external validity                     transferability
reliability                           dependability
objectivity                           confirmability

Credibility

The credibility criterion involves establishing that the results of qualitative research are credible or believable from the
perspective of the participant in the research. Since from this perspective, the purpose of qualitative research is to
describe or understand the phenomena of interest from the participant's eyes, the participants are the only ones who
can legitimately judge the credibility of the results.

Transferability

Transferability refers to the degree to which the results of qualitative research can be generalized or transferred to
other contexts or settings. From a qualitative perspective transferability is primarily the responsibility of the one
doing the generalizing. The qualitative researcher can enhance transferability by doing a thorough job of describing
the research context and the assumptions that were central to the research. The person who wishes to "transfer"
the results to a different context is then responsible for making the judgment of how sensible the transfer is.

Dependability

The traditional quantitative view of reliability is based on the assumption of replicability or repeatability. Essentially it
is concerned with whether we would obtain the same results if we could observe the same thing twice. But we can't
actually measure the same thing twice -- by definition if we are measuring twice, we are measuring two different
things. In order to estimate reliability, quantitative researchers construct various hypothetical notions (e.g., true
score theory) to try to get around this fact.

The idea of dependability, on the other hand, emphasizes the need for the researcher to account for the ever-
changing context within which research occurs. The researcher is responsible for describing the changes that occur in
the setting and how these changes affected the way the researcher approached the study.

Confirmability

Qualitative research tends to assume that each researcher brings a unique perspective to the study. Confirmability
refers to the degree to which the results could be confirmed or corroborated by others. There are a number of
strategies for enhancing confirmability. The researcher can document the procedures for checking and rechecking
the data throughout the study. Another researcher can take a "devil's advocate" role with respect to the results, and
this process can be documented. The researcher can actively search for and describe any negative instances that
contradict prior observations. And, after the study, one can conduct a data audit that examines the data collection
and analysis procedures and makes judgments about the potential for bias or distortion.

There has been considerable debate among methodologists about the value and legitimacy of this alternative set of
standards for judging qualitative research. On the one hand, many quantitative researchers see the alternative
criteria as just a relabeling of the very successful quantitative criteria in order to accrue greater legitimacy for
qualitative research. They suggest that a correct reading of the quantitative criteria would show that they are not
limited to quantitative research alone and can be applied equally well to qualitative data. They argue that the
alternative criteria represent a different philosophical perspective that is subjectivist rather than realist in nature.
They claim that research inherently assumes that there is some reality that is being observed and can be observed
with greater or less accuracy or validity. If you don't make this assumption, they would contend, you simply are not
engaged in research (although that doesn't mean that what you are doing is not valuable or useful).

Perhaps there is some legitimacy to this counter argument. Certainly a broad reading of the traditional quantitative
criteria might make them appropriate to the qualitative realm as well. But historically the traditional quantitative
criteria have been described almost exclusively in terms of quantitative research. No one has yet done a thorough
job of translating how the same criteria might apply in qualitative research contexts. For instance, the discussions of
external validity have been dominated by the idea of statistical sampling as the basis for generalizing. And,
considerations of reliability have traditionally been inextricably linked to the notion of true score theory.

But qualitative researchers do have a point about the irrelevance of traditional quantitative criteria. How could we
judge the external validity of a qualitative study that does not use formalized sampling methods? And, how can we
judge the reliability of qualitative data when there is no mechanism for estimating the true score? No one has
adequately explained how the operational procedures used to assess validity and reliability in quantitative research
can be translated into legitimate corresponding operations for qualitative research.

While alternative criteria may not in the end be necessary (and I personally hope that more work is done on
broadening the "traditional" criteria so that they legitimately apply across the entire spectrum of research
approaches), and while they certainly can be confusing for students and newcomers to this discussion, these alternatives
do serve to remind us that qualitative research cannot easily be considered only an extension of the quantitative
paradigm into the realm of nonnumeric data.

What is reliability? We hear the term used a lot in research contexts, but what does it really mean? If you think
about how we use the word "reliable" in everyday language, you might get a hint. For instance, we often speak
about a machine as reliable: "I have a reliable car." Or, news people talk about a "usually reliable source". In both
cases, the word reliable usually means "dependable" or "trustworthy." In research, the term "reliable" also means
dependable in a general sense, but that's not a precise enough definition. What does it mean to have a dependable
measure or observation in a research context? The reason "dependable" is not a good enough description is that it
can be confused too easily with the idea of a valid measure (see Measurement Validity). Certainly, when we speak
of a dependable measure, we mean one that is both reliable and valid. So we have to be a little more precise when
we try to define reliability.

In research, the term reliability means "repeatability" or "consistency". A measure is considered reliable if it would give us the same result over and over again (assuming that what we are measuring isn't changing!).

Let's explore in more detail what it means to say that a measure is "repeatable" or "consistent". We'll begin by defining a measure that we'll arbitrarily label X. It might be a person's score on a math achievement test or a measure of severity of illness. It is the value (numerical or otherwise) that we observe in our study. Now, to see how repeatable or consistent an observation is, we can measure it twice. We'll use subscripts to indicate the first and second observation of the same measure. If we assume that what we're measuring doesn't change between the time of our first and second observation, we can begin to understand how we get at reliability. While we observe a score for what we're measuring, we usually think of that score as consisting of two parts, the 'true' score or actual level for the person on that measure, and the 'error' in measuring it (see True Score Theory).

It's important to keep in mind that we observe the X score -- we never actually see the true (T) or error (e) scores.
For instance, a student may get a score of 85 on a math achievement test. That's the score we observe, an X of 85.
But the reality might be that the student is actually better at math than that score indicates. Let's say the student's
true math ability is 89 (i.e., T=89). That means that the error for that student is -4. What does this mean? Well, while
the student's true math ability may be 89, he/she may have had a bad day, may not have had breakfast, may have
had an argument, or may have been distracted while taking the test. Factors like these can contribute to errors in
measurement that make the student's observed ability appear lower than their true or actual ability.

OK, back to reliability. If our measure, X, is reliable, we should find that if we measure or observe it twice on the
same persons that the scores are pretty much the same. But why would they be the same? If you look at the figure
you should see that the only thing that the two observations have in common is their true scores, T. How do you
know that? Because the error scores (e1 and e2) have different subscripts indicating that they are different values.
But the true score symbol T is the same for both observations. What does this mean? That the two observed scores,
X1 and X2 are related only to the degree that the observations share true score. You should remember that the
error score is assumed to be random. Sometimes errors will lead you to perform better on a test than your true
ability (e.g., you had a good day guessing!) while other times it will lead you to score worse. But the true score --
your true ability on that measure -- would be the same on both observations (assuming, of course, that your true
ability didn't change between the two measurement occasions).

With this in mind, we can now define reliability more precisely. Reliability is a ratio or fraction. In layperson terms we
might define this ratio as:

true level on the measure / the entire measure


You might think of reliability as the proportion of "truth" in your measure. Now, we don't speak of the reliability of a
measure for an individual -- reliability is a characteristic of a measure that's taken across individuals. So, to get
closer to a more formal definition, let's restate the definition above in terms of a set of observations. The easiest way
to do this is to speak of the variance of the scores. Remember that the variance is a measure of the spread or
distribution of a set of scores. So, we can now state the definition as:

the variance of the true score / the variance of the measure


We might put this into slightly more technical terms by using the abbreviated name for the variance and our variable
names:

var(T) / var(X)

We're getting to the critical part now. If you look at the equation above, you should recognize that we can easily
determine or calculate the bottom part of the reliability ratio -- it's just the variance of the set of scores we observed
(You remember how to calculate the variance, don't you? It's just the sum of the squared deviations of the scores
from their mean, divided by the number of scores). But how do we calculate the variance of the true scores? We
can't see the true scores (we only see X)! Only God knows the true score for a specific observation. And, if we can't
calculate the variance of the true scores, we can't compute our ratio, which means we can't compute reliability!
Everybody got that? The bottom line is...
we can't compute reliability because we can't calculate the variance of the true scores

Great. So where does that leave us? If we can't compute reliability, perhaps the best we can do is to estimate it.
Maybe we can get an estimate of the variability of the true scores. How do we do that? Remember our two
observations, X1 and X2? We assume (using true score theory) that these two observations would be related to each
other to the degree that they share true scores. So, let's calculate the correlation between X1 and X2. Here's a
simple formula for the correlation:

covariance(X1, X2) / (sd(X1) * sd(X2))
where the 'sd' stands for the standard deviation (which is the square root of the variance). If we look carefully at this
equation, we can see that the covariance, which simply measures the "shared" variance between measures, must be
an indicator of the variability of the true scores because the true scores in X1 and X2 are the only thing the two
observations share! So, the top part is essentially an estimate of var(T) in this context. And, since the bottom part of
the equation multiplies the standard deviation of one observation with the standard deviation of the same measure at
another time, we would expect that these two values would be the same (it is the same measure we're taking) and
that this is essentially the same thing as squaring the standard deviation for either observation. But, the square of
the standard deviation is the same thing as the variance of the measure. So, the bottom part of the equation
becomes the variance of the measure (or var(X)). If you read this paragraph carefully, you should see that the
correlation between two observations of the same measure is an estimate of reliability.

It's time to reach some conclusions. We know from this discussion that we cannot calculate reliability because we
cannot measure the true score component of an observation. But we also know that we can estimate the true score
component as the covariance between two observations of the same measure. With that in mind, we can estimate
the reliability as the correlation between two observations of the same measure. It turns out that there are several
ways we can estimate this reliability correlation. These are discussed in Types of Reliability.
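
One way to convince yourself of this is with a small simulation: generate hypothetical true scores, add independent random errors to produce two observations of the same measure, and compare the correlation between the two observations with the ratio var(T)/var(X). The sketch below does this in Python; the sample size and the variances of the true scores and errors are arbitrary choices for illustration.

    import numpy as np

    rng = np.random.default_rng(seed=0)
    n = 10_000                                   # number of simulated respondents

    T = rng.normal(loc=80, scale=6, size=n)      # hypothetical 'true' scores, var(T) = 36
    e1 = rng.normal(loc=0, scale=4, size=n)      # random error at the first observation, var(e) = 16
    e2 = rng.normal(loc=0, scale=4, size=n)      # random error at the second observation
    X1, X2 = T + e1, T + e2                      # the two scores we would actually observe

    ratio = T.var() / X1.var()                   # var(T) / var(X)
    test_retest = np.corrcoef(X1, X2)[0, 1]      # correlation between the two observations

    print(f"var(T)/var(X) = {ratio:.3f}   corr(X1, X2) = {test_retest:.3f}")
    # Both should come out near 36 / (36 + 16), or about .69, for these settings.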

There's only one other issue I want to address here. How big is an estimate of reliability? To figure this out, let's go
back to the equation given earlier:

var(T) / var(X)
and remember that because X = T + e, we can substitute in the bottom of the ratio:

var(T) / (var(T) + var(e))
With this slight change, we can easily determine the range of a reliability estimate. If a measure is perfectly reliable,
there is no error in measurement -- everything we observe is true score. Therefore, for a perfectly reliable measure,
the equation would reduce to:

var(T) / var(T)
and reliability = 1. Now, if we have a perfectly unreliable measure, there is no true score -- the measure is entirely
error. In this case, the equation would reduce to:

0 / var(e)
and the reliability = 0. From this we know that reliability will always range between 0 and 1. The value of a reliability
estimate tells us the proportion of variability in the measure attributable to the true score. A reliability of .5 means
that about half of the variance of the observed score is attributable to truth and half is attributable to error. A
reliability of .8 means the variability is about 80% true ability and 20% error. And so on.


True Score Theory is a theory about measurement. Like all theories, you need to recognize that it is not proven -- it is postulated as a model of how the world operates. Like many very powerful models, the true score theory is a very simple one. Essentially, true score theory maintains that every measurement is an additive composite of two components: true ability (or the true level) of the respondent on that measure; and random error. In equation form:

X = T + eX

We observe the measurement -- the score on the test, the total for a self-esteem instrument, the scale value for a person's weight. We don't observe what's on the right side of the equation (only God knows what those values are!); we assume that there are two components to the right side.

The simple equation of X = T + eX has a parallel equation at the level of the variance or variability of a measure.
That is, across a set of scores, we assume that:

var(X) = var(T) + var(eX)

In more human terms this means that the variability of your measure is the sum of the variability due to true score
and the variability due to random error. This will have important implications when we consider some of the more
advanced models for adjusting for errors in measurement.

Why is true score theory important? For one thing, it is a simple yet powerful model for measurement. It reminds us
that most measurement has an error component. Second, true score theory is the foundation of reliability theory. A
measure that has no random error (i.e., is all true score) is perfectly reliable; a measure that has no true score (i.e.,
is all random error) has zero reliability. Third, true score theory can be used in computer simulations as the basis for
generating "observed" scores with certain known properties.

You should know that the true score model is not the only measurement model available. Measurement theorists
continue to come up with more and more complex models that they think represent reality even better. But these
models are complicated enough that they lie outside the boundaries of this document. In any event, true score
theory should give you an idea of why measurement models are important at all and how they can be used as the
basis for defining key research ideas.
Measurement Error

The true score theory is a good simple model for measurement, but it may not always be an accurate reflection of
reality. In particular, it assumes that any observation is composed of the true value plus some random error value.
But is that reasonable? What if all error is not random? Isn't it possible that some errors are systematic, that they
hold across most or all of the members of a group? One way to deal with this notion is to revise the simple true
score model by dividing the error component into two subcomponents, random error and systematic error. Here,
we'll look at the differences between these two types of errors and try to diagnose their effects on our research.

What is Random Error?

Random error is caused by any factors that randomly affect measurement of the variable across the sample. For
instance, each person's mood can inflate or deflate their performance on any occasion. In a particular testing, some
children may be feeling in a good mood and others may be depressed. If mood affects their performance on the
measure, it may artificially inflate the observed scores for some children and artificially deflate them for others. The
important thing about random error is that it does not have any consistent effects across the entire sample. Instead,
it pushes observed scores up or down randomly. This means that if we could see all of the random errors in a
distribution they would have to sum to 0 -- there would be as many negative errors as positive ones. The important
property of random error is that it adds variability to the data but does not affect average performance for the group.
Because of this, random error is sometimes considered noise.
What is Systematic Error?

Systematic error is caused by any factors that systematically affect measurement of the variable across the sample.
For instance, if there is loud traffic going by just outside of a classroom where students are taking a test, this noise is
liable to affect all of the children's scores -- in this case, systematically lowering them. Unlike random error,
systematic errors tend to be consistently either positive or negative -- because of this, systematic error is sometimes
considered to be bias in measurement.
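
A short simulation can make the contrast concrete. This sketch is not from the original text; the error sizes are invented purely for illustration. Random error leaves the group mean roughly unchanged while adding spread; systematic error (like the traffic noise) shifts every score in the same direction.

import numpy as np

rng = np.random.default_rng(2)
true = rng.normal(70, 8, 10_000)           # true test scores for a sample (assumed)

random_err = rng.normal(0, 5, true.size)   # mood, guessing, etc. -- sums to roughly 0
systematic_err = -4.0                      # e.g., loud traffic lowers every score (assumed size)

x_random = true + random_err
x_systematic = true + systematic_err

print(true.mean(), x_random.mean())        # means nearly identical: random error adds no bias
print(true.std(), x_random.std())          # spread increases: random error adds variability
print(x_systematic.mean() - true.mean())   # a consistent -4 point bias
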
Reducing Measurement Error

So, how can we reduce measurement errors, random or systematic? One thing you can do is to pilot test your
instruments, getting feedback from your respondents regarding how easy or hard the measure was and information
about how the testing environment affected their performance. Second, if you are gathering measures using people
to collect the data (as interviewers or observers) you should make sure you train them thoroughly so that they aren't
inadvertently introducing error. Third, when you collect the data for your study you should double-check the data
thoroughly. All data entry for computer analysis should be "double-punched" and verified. This means that you enter
the data twice, the second time having your data entry machine check that you are typing the exact same data you
did the first time. Fourth, you can use statistical procedures to adjust for measurement error. These range from
rather simple formulas you can apply directly to your data to very complex procedures for modeling the
error and its effects. Finally, one of the best things you can do to deal with measurement errors, especially
systematic errors, is to use multiple measures of the same construct. Especially if the different measures don't share
the same systematic errors, you will be able to triangulate across the multiple measures and get a more accurate
sense of what's going on.

Types of Reliability

You learned in the Theory of Reliability that it's not possible to calculate reliability exactly. Instead, we have to
estimate reliability, and this is always an imperfect endeavor. Here, I want to introduce the major reliability estimators
and talk about their strengths and weaknesses.

There are four general classes of reliability estimates, each of which estimates reliability in a different way. They are:

Inter-Rater or Inter-Observer Reliability
Used to assess the degree to which different raters/observers give consistent estimates of the same phenomenon.

Test-Retest Reliability
Used to assess the consistency of a measure from one time to another.

Parallel-Forms Reliability
Used to assess the consistency of the results of two tests constructed in the same way from the same content domain.

Internal Consistency Reliability
Used to assess the consistency of results across items within a test.

Let's discuss each of these in turn.

Inter-Rater or Inter-Observer Reliability

Whenever you use humans as a part of your measurement procedure, you have to worry about whether the results you get
are reliable or consistent. People are notorious for their inconsistency. We are easily distractible. We get tired of
doing repetitive tasks. We daydream. We misinterpret.

So how do we determine whether two observers are being consistent in their observations? You probably should establish
inter-rater reliability outside of the context of the measurement in your study. After all, if you use data from your
study to establish reliability, and you find that reliability is low, you're kind of stuck. Probably it's best to do
this as a side study or pilot study. And, if your study goes on for a long time, you may want to reestablish inter-rater
reliability from time to time to assure that your raters aren't changing.

There are two major ways to actually estimate inter-rater reliability. If your measurement consists of categories -- the
raters are checking off which category each observation falls in -- you can calculate the percent of agreement
between the raters. For instance, let's say you had 100 observations that were being rated by two raters. For each
observation, the rater could check one of three categories. Imagine that on 86 of the 100 observations the raters
checked the same category. In this case, the percent of agreement would be 86%. OK, it's a crude measure, but it
does give an idea of how much agreement exists, and it works no matter how many categories are used for each
observation.

The other major way to estimate inter-rater reliability is appropriate when the measure is a continuous one. There, all
you need to do is calculate the correlation between the ratings of the two observers. For instance, they might be
rating the overall level of activity in a classroom on a 1-to-7 scale. You could have them give their rating at regular
time intervals (e.g., every 30 seconds). The correlation between these ratings would give you an estimate of the
reliability or consistency between the raters.
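
Here is a minimal sketch of both estimates in Python. The ratings are made-up numbers used only to show the calculations: percent agreement for categorical ratings and a correlation for continuous ratings.

import numpy as np

rater_a = np.array([1, 2, 2, 3, 1, 2, 3, 3, 1, 2])   # categories checked by rater A (invented)
rater_b = np.array([1, 2, 3, 3, 1, 2, 3, 2, 1, 2])   # categories checked by rater B (invented)
percent_agreement = (rater_a == rater_b).mean() * 100
print(f"{percent_agreement:.0f}% agreement")          # 80% in this toy example

# Continuous 1-to-7 activity ratings taken every 30 seconds by two observers (invented)
obs_1 = np.array([3.0, 4.0, 5.0, 2.0, 6.0, 4.0, 3.0, 5.0])
obs_2 = np.array([3.5, 4.0, 4.5, 2.5, 6.0, 3.5, 3.0, 5.5])
print(np.corrcoef(obs_1, obs_2)[0, 1])                # inter-rater reliability estimate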

You might think of this type of reliability as "calibrating" the observers. There are other things you could do to
encourage reliability between observers, even if you don't estimate it. For instance, I used to work in a psychiatric
unit where every morning a nurse had to do a ten-item rating of each patient on the unit. Of course, we couldn't
count on the same nurse being present every day, so we had to find a way to assure that any of the nurses would
give comparable ratings. The way we did it was to hold weekly "calibration" meetings where we would have all of the
nurses' ratings for several patients and discuss why they chose the specific values they did. If there were
disagreements, the nurses would discuss them and attempt to come up with rules for deciding when they would give
a "3" or a "4" for a rating on a specific item. Although this was not an estimate of reliability, it probably went a long
way toward improving the reliability between raters.

Test-Retest Reliability

We estimate test-retest reliability when we administer the same test to the same (or a similar) sample on two
different occasions. This approach assumes that there is no substantial change in the construct being measured
between the two occasions. The amount of time allowed between measures is critical. We know that if we measure
the same thing twice, the correlation between the two observations will depend in part on how much time
elapses between the two measurement occasions. The shorter the time gap, the higher the correlation; the longer
the time gap, the lower the correlation. This is because the two observations are related over time -- the closer in
time we get the more similar the factors that contribute to error. Since this correlation is the test-retest estimate of
reliability, you can obtain considerably different estimates depending on the interval.
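
The following sketch illustrates that dependence on the interval. It is an assumption-laden toy model (the "drift" of true scores standing in for elapsed time), not an analysis from the text, but it shows how more change between occasions lowers the test-retest correlation.

import numpy as np

rng = np.random.default_rng(3)
n = 20_000
t1_true = rng.normal(0, 1, n)               # true scores at time 1

def retest_correlation(drift_sd):
    # true scores at time 2 = time-1 true scores plus some drift; both occasions
    # also get independent measurement error
    t2_true = t1_true + rng.normal(0, drift_sd, n)
    x1 = t1_true + rng.normal(0, 0.5, n)
    x2 = t2_true + rng.normal(0, 0.5, n)
    return np.corrcoef(x1, x2)[0, 1]

for drift in (0.1, 0.5, 1.0):               # larger drift stands in for a longer time gap
    print(drift, round(retest_correlation(drift), 2))   # correlation drops as drift grows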

Parallel-Forms Reliability

In parallel forms reliability you first have to create two parallel forms. One way to accomplish this is to create a large
set of questions that address the same construct and then randomly divide the questions into two sets. You
administer both instruments to the same sample of people. The correlation between the two parallel forms is the
estimate of reliability. One major problem with this approach is that you have to be able to generate lots of items
that reflect the same construct. This is often no easy feat. Furthermore, this approach makes the assumption that
the randomly divided halves are parallel or equivalent. Even by chance this will sometimes not be the case. The
parallel forms approach is very similar to the split-half reliability described below. The major difference is that
parallel forms are constructed so that the two forms can be used independent of each other and considered
equivalent measures. For instance, we might be concerned about a testing threat to internal validity. If we use
Form A for the pretest and Form B for the posttest, we minimize that problem. It would be even better if we
randomly assigned individuals to receive Form A or B on the pretest and then switched them on the posttest. With split-
half reliability we have an instrument that we wish to use as a single measurement instrument and only develop
randomly split halves for purposes of estimating reliability.
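
As a rough sketch of the parallel-forms procedure (simulated items and respondents, all assumed), the following divides a pool of items at random into Form A and Form B, administers both to the same sample, and correlates the two total scores.

import numpy as np

rng = np.random.default_rng(4)
n_people, n_items = 500, 20
ability = rng.normal(0, 1, n_people)
# each item = ability on the construct plus item-specific noise (assumed model)
items = ability[:, None] + rng.normal(0, 1, (n_people, n_items))

order = rng.permutation(n_items)                    # random division of the item pool
form_a = items[:, order[:n_items // 2]].sum(axis=1) # Form A total score
form_b = items[:, order[n_items // 2:]].sum(axis=1) # Form B total score

print(np.corrcoef(form_a, form_b)[0, 1])            # parallel-forms reliability estimate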

Internal Consistency Reliability

In internal consistency reliability estimation we use our single measurement instrument administered to a group of
people on one occasion to estimate reliability. In effect we judge the reliability of the instrument by estimating how
well the items that reflect the same construct yield similar results. We are looking at how consistent the results are
for different items for the same construct within the measure. There are a wide variety of internal consistency
measures that can be used.

Average Inter-item Correlation

The average inter-item correlation uses all of the items on our instrument that are designed to measure the same
construct. We first compute the correlation between each pair of items, as illustrated in the figure. For example, if
we have six items we will have 15 different item pairings (i.e., 15 correlations). The average inter-item correlation is
simply the average or mean of all these correlations. In the example, we find an average inter-item correlation of .90
with the individual correlations ranging from .84 to .95.
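
A small sketch of the calculation (with simulated six-item data rather than the figure's actual values) looks like this; it forms the 6 x 6 correlation matrix, pulls out the 15 pairings, and averages them.

import numpy as np
from itertools import combinations

rng = np.random.default_rng(5)
ability = rng.normal(0, 1, 400)
items = ability[:, None] + rng.normal(0, 0.6, (400, 6))   # six items, one construct (assumed)

r = np.corrcoef(items, rowvar=False)                       # 6 x 6 inter-item correlation matrix
pairs = list(combinations(range(6), 2))                    # the 15 item pairings
avg_inter_item = np.mean([r[i, j] for i, j in pairs])
print(len(pairs), round(avg_inter_item, 2))
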
Average Item-Total Correlation

This approach also uses the inter-item correlations. In addition, we compute a total score for the six items and use
that as a seventh variable in the analysis. The figure shows the six item-to-total correlations at the bottom of the
correlation matrix. They range from .82 to .88 in this sample analysis, with the average of these at .85.

Split-Half Reliability

In split-half reliability we randomly divide all items that purport to measure the same construct into two sets. We
administer the entire instrument to a sample of people and calculate the total score for each randomly divided half.
The split-half reliability estimate, as shown in the figure, is simply the correlation between these two total scores. In
the example it is .87.
Cronbach's Alpha (α)

Imagine that we compute one split-half reliability and then randomly divide the items into another set of split halves
and recompute, and keep doing this until we have computed all possible split half estimates of reliability. Cronbach's
Alpha is mathematically equivalent to the average of all possible split-half estimates, although that's not how we
compute it. Notice that when I say we compute all possible split-half estimates, I don't mean that each time we go
and measure a new sample! That would take forever. Instead, we calculate all split-half estimates from the same
sample. Because we measured all of our sample on each of the six items, all we have to do is have the computer
analysis do the random subsets of items and compute the resulting correlations. The figure shows several of the
split-half estimates for our six item example and lists them as SH with a subscript. Just keep in mind that although
Cronbach's Alpha is equivalent to the average of all possible split half correlations we would never actually calculate
it that way. Some clever mathematician (Cronbach, I presume!) figured out a way to get the mathematical
equivalent a lot more quickly.
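
The following sketch computes two of these internal consistency estimates for the same simulated six-item data. The data are invented; the split-half and Cronbach's Alpha formulas are the standard ones described above.

import numpy as np

rng = np.random.default_rng(6)
ability = rng.normal(0, 1, 400)
items = ability[:, None] + rng.normal(0, 0.6, (400, 6))   # six items, one construct (assumed)

# split-half: correlate the totals of two randomly chosen halves
order = rng.permutation(6)
half1 = items[:, order[:3]].sum(axis=1)
half2 = items[:, order[3:]].sum(axis=1)
print("split-half r:", round(np.corrcoef(half1, half2)[0, 1], 2))

# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of the total score)
k = items.shape[1]
alpha = k / (k - 1) * (1 - items.var(axis=0, ddof=1).sum()
                       / items.sum(axis=1).var(ddof=1))
print("Cronbach's alpha:", round(alpha, 2))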

Comparison of Reliability Estimators

Each of the reliability estimators has certain advantages and disadvantages. Inter-rater reliability is one of the best
ways to estimate reliability when your measure is an observation. However, it requires multiple raters or observers.
As an alternative, you could look at the correlation of ratings of the same single observer repeated on two different
occasions. For example, let's say you collected videotapes of child-mother interactions and had a rater code the
videos for how often the mother smiled at the child. To establish inter-rater reliability you could take a sample of
videos and have two raters code them independently. To estimate test-retest reliability you could have a single rater
code the same videos on two different occasions. You might use the inter-rater approach especially if you were
interested in using a team of raters and you wanted to establish that they yielded consistent results. If you get a
suitably high inter-rater reliability you could then justify allowing them to work independently on coding different
videos. You might use the test-retest approach when you only have a single rater and don't want to train any
others. On the other hand, in some studies it is reasonable to do both to help establish the reliability of the raters or
observers.

The parallel forms estimator is typically only used in situations where you intend to use the two forms as alternate
measures of the same thing. Both the parallel forms and all of the internal consistency estimators have one major
constraint -- you have to have multiple items designed to measure the same construct. This is relatively easy to
achieve in certain contexts like achievement testing (it's easy, for instance, to construct lots of similar addition
problems for a math test), but for more complex or subjective constructs this can be a real challenge. If you do have
lots of items, Cronbach's Alpha tends to be the most frequently used estimate of internal consistency.

The test-retest estimator is especially feasible in most experimental and quasi-experimental designs that use a no-
treatment control group. In these designs you always have a control group that is measured on two occasions
(pretest and posttest). The main problem with this approach is that you don't have any information about reliability
until you collect the posttest and, if the reliability estimate is low, you're pretty much sunk.

Each of the reliability estimators will give a different value for reliability. In general, the test-retest and inter-rater
reliability estimates will be lower in value than the parallel forms and internal consistency ones because they involve
measuring at different times or with different raters. Since reliability estimates are often used in statistical analyses
of quasi-experimental designs (e.g., the analysis of the nonequivalent group design), the fact that different estimates
can differ considerably makes the analysis even more complex.

Reliability & Validity

We often think of reliability and validity as separate ideas but, in fact, they're related to each other. Here, I want to
show you two ways you can think about their relationship.

One of my favorite metaphors for the relationship between reliability and validity is that of the target. Think of the center of the
target as the concept that you are trying to measure. Imagine that for each person you are measuring, you are
taking a shot at the target. If you measure the concept perfectly for a person, you are hitting the center of the target.
If you don't, you are missing the center. The more you are off for that person, the further you are from the center.

The figure above shows four possible situations. In the first one, you are hitting the target consistently, but you are
missing the center of the target. That is, you are consistently and systematically measuring the wrong value for all
respondents. This measure is reliable, but not valid (that is, it's consistent but wrong). The second shows hits that
are randomly spread across the target. You seldom hit the center of the target but, on average, you are getting the
right answer for the group (but not very well for individuals). In this case, you get a valid group estimate, but you are
inconsistent. Here, you can clearly see that reliability is directly related to the variability of your measure. The third
scenario shows a case where your hits are spread across the target and you are consistently missing the center.
Your measure in this case is neither reliable nor valid. Finally, we see the "Robin Hood" scenario -- you consistently
hit the center of the target. Your measure is both reliable and valid (I bet you never thought of Robin Hood in those
terms before).

Another way we can think about the relationship between reliability and validity is shown in the figure below. Here,
we set up a 2x2 table. The columns of the table indicate whether you are trying to measure the same or different
concepts. The rows show whether you are using the same or different methods of measurement. Imagine that we
have two concepts we would like to measure, student verbal and math ability. Furthermore, imagine that we can
measure each of these in two ways. First, we can use a written, paper-and-pencil exam (very much like the SAT or
GRE exams). Second, we can ask the student's classroom teacher to give us a rating of the student's ability based
on their own classroom observation.
The first cell on the upper left shows the comparison of the verbal written test score with the verbal written test
score. But how can we compare the same measure with itself? We could do this by estimating the reliability of the
written test through a test-retest correlation, parallel forms, or an internal consistency measure (See Types of
Reliability). What we are estimating in this cell is the reliability of the measure.

The cell on the lower left shows a comparison of the verbal written measure with the verbal teacher observation
rating. Because we are trying to measure the same concept, we are looking at convergent validity (See
Measurement Validity Types).

The cell on the upper right shows the comparison of the verbal written exam with the math written exam. Here, we
are comparing two different concepts (verbal versus math) and so we would expect the relationship to be lower than
a comparison of the same concept with itself (e.g., verbal versus verbal or math versus math). Thus, we are trying to
discriminate between two concepts and we would consider this discriminant validity.

Finally, we have the cell on the lower right. Here, we are comparing the verbal written exam with the math teacher
observation rating. Like the cell on the upper right, we are also trying to compare two different concepts (verbal
versus math) and so this is a discriminant validity estimate. But here, we are also trying to compare two different
methods of measurement (written exam versus teacher observation rating). So, we'll call this very discriminant to
indicate that we would expect the relationship in this cell to be even lower than in the one above it.

The four cells incorporate the different values that we examine in the multitrait-multimethod approach to estimating
construct validity.

When we look at reliability and validity in this way, we see that, rather than being distinct, they actually form a
continuum. On one end is the situation where the concepts and methods of measurement are the same (reliability)
and on the other is the situation where concepts and methods of measurement are different (very discriminant
validity).
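
To make the 2x2 pattern concrete, here is a simulated sketch (the abilities, error sizes, and the assumption that teacher ratings are noisier than written exams are all invented for illustration). The four correlations typically order themselves from reliability down to the "very discriminant" comparison.

import numpy as np

rng = np.random.default_rng(7)
n = 5_000
verbal = rng.normal(0, 1, n)
math = 0.4 * verbal + rng.normal(0, 1, n)       # the two concepts are modestly related (assumed)

def measure(trait, error_sd):                   # observed score = trait + method-specific error
    return trait + rng.normal(0, error_sd, n)

verbal_written_1 = measure(verbal, 0.5)         # written exam, two administrations
verbal_written_2 = measure(verbal, 0.5)
verbal_teacher = measure(verbal, 0.9)           # teacher rating assumed noisier
math_written = measure(math, 0.5)
math_teacher = measure(math, 0.9)

r = lambda a, b: round(np.corrcoef(a, b)[0, 1], 2)
print("reliability (verbal written vs itself):", r(verbal_written_1, verbal_written_2))
print("convergent (verbal written vs teacher):", r(verbal_written_1, verbal_teacher))
print("discriminant (verbal vs math, written):", r(verbal_written_1, math_written))
print("'very discriminant' (written vs teacher, verbal vs math):", r(verbal_written_1, math_teacher))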

Single Group Threats

The Single Group Case

What is meant by a "single group" threat? Let's consider two single group designs and then consider the threats that are
most relevant with respect to internal validity. The top design in the figure shows a "posttest-only" single group
design. Here, a group of people receives your program and afterwards is given a posttest. In the bottom part of
the figure we see a "pretest-posttest" single group design. In this case, we give the participants a pretest or
baseline measure, give them the program or treatment, and then give them a posttest.

To help make this a bit more concrete, let's imagine that we are studying the effects of a compensatory
education program in mathematics for first grade students on a measure of math performance such as a
standardized math achievement test. In the post-only design, we would give the first graders the program and
then give a math achievement posttest. We might choose not to give them a baseline measure because we
have reason to believe they have no prior knowledge of the math skills we are teaching. It wouldn't make sense
to pretest them if we expect they would all get a score of zero. In the pre-post design we are not willing to
assume that they have no prior knowledge. We measure the baseline in order to determine where the students
start out in math achievement. We might hypothesize that the change or gain from pretest to posttest is due to
our special math tutoring program. This is a compensatory program because it is only given to students who are
identified as potentially low in math ability on the basis of some screening mechanism.

The Single Group Threats

With either of these scenarios in mind, consider what would happen if you observe a certain level of posttest
math achievement or a change or gain from pretest to posttest. You want to conclude that the outcome is due to
your math program. How could you be wrong? Here are some of the ways, some of the threats to internal
validity that your critics might raise, some of the plausible alternative explanations for your observed effect:

History Threat

It's not your math program that caused the outcome, it's something else, some historical event
that occurred. For instance, we know that lots of first graders watch the public TV program
Sesame Street. And, we know that in every Sesame Street show they present some very
elementary math concepts. Perhaps these shows cause the outcome and not your math
program.

Maturation Threat

The children would have had the exact same outcome even if they had never had your special
math training program. All you are doing is measuring normal maturation or growth in
understanding that occurs as part of growing up -- your math program has no effect. How is this
maturation explanation different from a history threat? In general, if we're talking about a
specific event or chain of events that could cause the outcome, we call it a history threat. If
we're talking about all of the events that typically transpire in your life over a period of time
(without being specific as to which ones are the active causal agents) we call it a maturation
threat.

Testing Threat

This threat only occurs in the pre-post design. What if taking the pretest made some of the
children more aware of that kind of math problem -- it "primed" them for the program so that
when you began the math training they were ready for it in a way that they wouldn't have been
without the pretest. This is what is meant by a testing threat -- taking the pretest (not getting
your program) affects how participants do on the posttest.

Instrumentation Threat

Like the testing threat, this one only operates in the pretest-posttest situation. What if the
change from pretest to posttest is due not to your math program but rather to a change in the
test that was used? This is what's meant by an instrumentation threat. In many schools when
they have to administer repeated testing they don't use the exact same test (in part because
they're worried about a testing threat!) but rather give out "alternate forms" of the same tests.
These alternate forms were designed to be "equivalent" in the types of questions and level of
difficulty, but what if they aren't? Perhaps part or all of any pre-post gain is attributable to the
change in instrument, not to your program. Instrumentation threats are especially likely when
the "instrument" is a human observer. The observers may get tired over time or bored with the
observations. Conversely, they might get better at making the observations as they practice
more. In either event, it's the change in instrumentation, not the program, that leads to the
outcome.

Mortality Threat

Mortality doesn't mean that people in your study are dying (although if they are, it would be
considered a mortality threat!). Mortality is used metaphorically here. It means that people are
"dying" with respect to your study. Usually, it means that they are dropping out of the study.
What's wrong with that? Let's assume that in our compensatory math tutoring program we have
a nontrivial dropout rate between pretest and posttest. And, assume that the kids who are
dropping out are the low pretest math achievement test scorers. If you look at the average gain
from pretest to posttest using all of the scores available to you at each occasion, you would
include these low pretest subsequent dropouts in the pretest and not in the posttest. You'd be
dropping out the potential low scorers from the posttest; in effect, you'd be artificially inflating the
posttest average over what it would have been if no students had dropped out. And, you won't
necessarily solve this problem by comparing pre-post averages for only those kids who stayed
in the study. This subsample would certainly not be representative even of the original entire
sample. Furthermore, we know that because of regression threats (see below) these students
may appear to actually do worse on the posttest, simply as an artifact of the non-random
dropout or mortality in your study. When mortality is a threat, the researcher can often gauge
the degree of the threat by comparing the dropout group against the nondropout group on
pretest measures. If there are no major differences, it may be more reasonable to assume that
mortality was happening across the entire sample and is not biasing results greatly. But if the
pretest differences are large, one must be concerned about the potential biasing effects of
mortality.

Regression Threat

A regression threat, also known as a "regression artifact" or "regression to the mean," is a
statistical phenomenon that occurs whenever you have a nonrandom sample from a population
and two measures that are imperfectly correlated. OK, I know that's gibberish. Let me try again.
Assume that your two measures are a pretest and posttest (and you can certainly bet these
aren't perfectly correlated with each other). Furthermore, assume that your sample consists of
low pretest scorers. The regression threat means that the pretest average for the group in your
study will appear to increase or improve (relative to the overall population) even if you don't do
anything to them -- even if you never give them a treatment. Regression is a confusing threat to
understand at first. I like to think about it as the "you can only go up from here" phenomenon. If
you include in your program only the kids who constituted the lowest ten percent of the class on
the pretest, what are the chances that they would constitute exactly the lowest ten percent on
the posttest? Not likely. Most of them would score low on the posttest, but they aren't likely to
be the lowest ten percent twice. For instance, maybe there were a few kids on the pretest who
got lucky on a few guesses and scored at the eleventh percentile who won't get so lucky next
time. No, if you choose the lowest ten percent on the pretest, they can't get any lower than
being the lowest -- they can only go up from there, relative to the larger population from which
they were selected. This purely statistical phenomenon is what we mean by a regression threat.
A more detailed discussion of why regression threats occur and how to estimate them is given in the
Knowledge Base discussion of regression to the mean.
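
A quick simulation shows the artifact (the numbers are assumptions, not data from any study): select the lowest ten percent on a pretest and, with no treatment at all, their posttest mean moves back toward the population mean.

import numpy as np

rng = np.random.default_rng(8)
n = 100_000
true = rng.normal(50, 10, n)
pretest = true + rng.normal(0, 5, n)    # pretest and posttest are the same measure
posttest = true + rng.normal(0, 5, n)   # with independent error on each occasion

cutoff = np.percentile(pretest, 10)     # pick the lowest ten percent on the pretest
low = pretest <= cutoff

print("population mean:         ", round(pretest.mean(), 1))       # about 50
print("selected group, pretest: ", round(pretest[low].mean(), 1))  # well below 50
print("selected group, posttest:", round(posttest[low].mean(), 1)) # closer to 50, no treatment given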

How do we deal with these single group threats to internal validity? While there are several
ways to rule out threats, one of the most common approaches to ruling out the ones listed
above is through your research design. For instance, instead of doing a single group study, you
could incorporate a control group. In this scenario, you would have two groups: one receives
your program and the other one doesn't. In fact, the only difference between these groups
should be the program. If that's true, then the control group would experience all the same
history and maturation threats, would have the same testing and instrumentation issues, and
would have similar rates of mortality and regression to the mean. In other words, a good control
group is one of the most effective ways to rule out the single-group threats to internal validity.
Of course, when you add a control group, you no longer have a single group design. And, you
will still have to deal with two major types of threats to internal validity: the multiple-
group threats to internal validity and the social threats to internal validity.


Establishing a Cause-Effect Relationship

How do we establish a cause-effect (causal) relationship? What criteria do we have to meet? Generally, there are three
criteria that you must meet before you can say that you have evidence for a causal relationship:

Temporal Precedence

First, you have to be able to show that your cause happened before your effect. Sounds easy, huh? Of
course my cause has to happen before the effect. Did you ever hear of an effect happening before its
cause? Before we get lost in the logic here, consider a classic example from economics: does inflation
cause unemployment? It certainly seems plausible that as inflation increases, more employers find that
in order to meet costs they have to lay off employees. So it seems that inflation could, at least partially,
be a cause for unemployment. But both inflation and employment rates are occurring together on an
ongoing basis. Is it possible that fluctuations in employment can affect inflation? If we have an increase
in the work force (i.e., lower unemployment) we may have more demand for goods, which would tend
to drive up the prices (i.e., inflate them) at least until supply can catch up. So which is the cause and
which the effect, inflation or unemployment? It turns out that in this kind of cyclical situation involving
ongoing processes that interact, both may cause and, in turn, be affected by the other. This makes
it very hard to establish a causal relationship in this situation.

Covariation of the Cause and Effect

What does this mean? Before you can show that you have a causal relationship you have to show that
you have some type of relationship. For instance, consider the syllogism:

if X then Y
if not X then not Y
If you observe that whenever X is present, Y is also present, and whenever X is absent, Y is too, then
you have demonstrated that there is a relationship between X and Y. I don't know about you, but
sometimes I find it's not easy to think about X's and Y's. Let's put this same syllogism in program
evaluation terms:

if program then outcome
if not program then not outcome

Or, in colloquial terms: if you give a program you observe the outcome but if you don't give the program
you don't observe the outcome. This provides evidence that the program and outcome are related.
Notice, however, that this syllogism doesn't provide evidence that the program caused the outcome
-- perhaps there was some other factor present with the program that caused the outcome, rather than
the program. The relationships described so far are rather simple binary relationships. Sometimes we
want to know whether different amounts of the program lead to different amounts of the outcome -- a
continuous relationship:

if more of the program then more of the outcome
if less of the program then less of the outcome

No Plausible Alternative Explanations

Just because you show there's a relationship doesn't mean it's a causal one. It's possible that there is some
other variable or factor that is causing the outcome. This is sometimes referred to as the "third variable" or
"missing variable" problem and it's at the heart of the issue of internal validity. What are some of the possible
plausible alternative explanations? Just go look at the threats to internal validity (see single group threats,
multiple group threats or social threats) -- each one describes a type of alternative explanation.

In order for you to argue that you have demonstrated internal validity -- that you have shown there's a causal
relationship -- you have to "rule out" the plausible alternative explanations. How do you do that? One of the
major ways is with your research design. Let's consider a simple single group threat to internal validity, a
history threat. Let's assume you measure your program group before they start the program (to establish a
baseline), you give them the program, and then you measure their performance afterwards in a posttest. You
see a marked improvement in their performance which you would like to infer is caused by your program. One
of the plausible alternative explanations is that you have a history threat -- it's not your program that caused the
gain but some other specific historical event. For instance, it's not your anti-smoking campaign that caused the
reduction in smoking but rather the Surgeon General's latest report that happened to be issued between the
time you gave your pretest and posttest. How do you rule this out with your research design? One of the
simplest ways would be to incorporate the use of a control group -- a group that is comparable to your program
group with the only difference being that they didn't receive the program. But they did experience the Surgeon
General's latest report. If you find that they didn't show a reduction in smoking even though they did experience
the same Surgeon General report you have effectively "ruled out" the Surgeon General's report as a plausible
alternative explanation for why you observed the smoking reduction.

In most applied social research that involves evaluating programs, temporal precedence is not a difficult criterion to
meet because you administer the program before you measure effects. And, establishing covariation is relatively simple
because you have some control over the program and can set things up so that you have some people who get it and
some who don't (if X and if not X). Typically the most difficult criterion to meet is the third -- ruling out alternative
explanations for the observed effect. That is why research design is such an important issue and why it is intimately
linked to the idea of internal validity.
Multiple-Group Threats

The Central Issue

A multiple-group design typically involves at least two groups and before-after measurement. Most often, one group
receives the program or treatment while the other does not and constitutes the "control" or comparison group. But
sometimes one group gets the program and the other gets either the standard program or another program you
would like to compare. In this case, you would be comparing two programs for their relative outcomes. Typically you
would construct a multiple-group design so that you could compare the groups directly. In such designs, the key
internal validity issue is the degree to which the groups are comparable before the study. If they are comparable,
and the only difference between them is the program, posttest differences can be attributed to the program. But
that's a big if. If the groups aren't comparable to begin with, you won't know how much of the outcome to attribute to
your program or to the initial differences between groups.

There really is only one multiple group threat to internal validity: that the groups were not comparable before the
study. We call this threat a selection bias or selection threat. A selection threat is any factor other than the
program that leads to posttest differences between groups. Whenever we suspect that outcomes differ between
groups not because of our program but because of prior group differences we are suspecting a selection bias.
Although the term 'selection bias' is used as the general category for all prior differences, when we know specifically
what the group difference is, we usually hyphenate it with the 'selection' term. The multiple-group selection threats
directly parallel the single group threats. For instance, while we have 'history' as a single group threat, we have
'selection-history' as its multiple-group analogue.

As with the single group threats to internal validity, we'll assume a simple example involving a new compensatory
mathematics tutoring program for first graders. The design will be a pretest-posttest design, and we will divide the
first graders into two groups, one getting the new tutoring program and the other not getting it.

Here are the major multiple-group threats to internal validity for this case:

Selection-History Threat

A selection-history threat is any other event that occurs between pretest and posttest that the
groups experience differently. Because this is a selection threat, it means the groups differ in some
way. Because it's a 'history' threat, it means that the way the groups differ is with respect to their
reactions to history events. For example, what if the children in one group differ from those in the
other in their television habits? Perhaps the program group children watch Sesame Street more
frequently than those in the control group do. Since Sesame Street is a children's show that
presents simple mathematical concepts in interesting ways, it may be that a higher average posttest
math score for the program group doesn't indicate the effect of our math tutoring -- it's really an
effect of the two groups differentially experiencing a relevant event -- in this case Sesame Street --
between the pretest and posttest.

Selection-Maturation Threat
A selection-maturation threat results from differential rates of normal growth between pretest and
posttest for the groups. In this case, the two groups differ in their rates of maturation
with respect to math concepts. It's important to distinguish between history and maturation threats.
In general, history refers to a discrete event or series of events whereas maturation implies the
normal, ongoing developmental process that would take place. In any case, if the groups are
maturing at different rates with respect to the outcome, we cannot assume that posttest differences
are due to our program -- they may be selection-maturation effects.

Selection-Testing Threat

A selection-testing threat occurs when there is a differential effect between groups on the posttest of
taking the pretest. Perhaps the test "primed" the children in each group differently or they may have
learned differentially from the pretest. In these cases, an observed posttest difference can't be
attributed to the program; it could be the result of selection-testing.

Selection-Instrumentation Threat

Selection-instrumentation refers to any differential change in the test used for each group from
pretest to posttest. In other words, the test changes differently for the two groups. Perhaps the
test consists of observers who rate the class performance of the children. What if the program group
observers, for example, get better at doing the observations while, over time, the comparison group
observers get fatigued and bored? Differences on the posttest could easily be due to this differential
instrumentation -- selection-instrumentation -- and not to the program.

Selection-Mortality Threat

Selection-mortality arises when there is differential nonrandom dropout between pretest and
posttest. In our example, different types of children might drop out of each group, or more may drop
out of one than the other. Posttest differences might then be due to the different types of dropouts --
the selection-mortality -- and not to the program.

Selection-Regression Threat

Finally, selection-regression occurs when there are different rates of regression to the mean in the
two groups. This might happen if one group is more extreme on the pretest than the other. In the
context of our example, it may be that the program group is getting a disproportionate number of
low math ability children because teachers think they need the math tutoring more (and the teachers
don't understand the need for 'comparable' program and comparison groups!). Since the tutoring
group has the more extreme lower scorers, their mean will regress a greater distance toward the
overall population mean and they will appear to gain more than their comparison group
counterparts. This is not a real program gain -- it's just a selection-regression artifact.

When we move from a single group to a multiple group study, what do we gain from the rather significant investment
in a second group? If the second group is a control group and is comparable to the program group, we can rule out
the single group threats to internal validity because they will all be reflected in the comparison group and cannot
explain why posttest group differences would occur. But the key is that the groups must be comparable. How can we
possibly hope to create two groups that are truly "comparable"? The only way we know of doing that is to randomly
assign persons in our sample into the two groups -- we conduct a randomized or "true" experiment. But in many
applied research settings we can't randomly assign, either because of logistical or ethical factors. In that case, we
typically try to assign two groups nonrandomly so that they are as equivalent as we can make them. We might, for
instance, have one classroom of first graders assigned to the math tutoring program while the other class is the
comparison group. In this case, we would hope the two are equivalent, and we may even have reasons to believe
that they are. But because they may not be equivalent and because we did not use a procedure like random
assignment to at least assure that they are probabilistically equivalent, we call such designs quasi-experimental
designs. If we measure them on a pretest, we can examine whether they appear to be similar on key measures
before the study begins and make some judgement about the plausibility that a selection bias exists.

Even if we move to a multiple group design and have confidence that our groups are comparable, we cannot
assume that we have strong internal validity. There are a number of social threats to internal validity that arise from
the human interaction present in applied social research that we will also need to address.


What are "Social" Threats?

Applied social research is a human activity. And, the results of such research are affected by the human interactions
involved. The social threats to internal validity refer to the social pressures in the research context that can lead to
posttest differences that are not directly caused by the treatment itself. Most of these threats occur because the
various groups (e.g., program and comparison), or key people involved in carrying out the research (e.g., managers
and administrators, teachers and principals) are aware of each other's existence and of the role they play in the
research project or are in contact with one another. Many of these threats can be minimized by isolating the two
groups from each other, but this leads to other problems (e.g., it's hard to randomly assign and then isolate; this is
likely to reduce generalizability or external validity). Here are the major social interaction threats to internal validity:

Diffusion or Imitation of Treatment

This occurs when a comparison group learns about the program either directly or indirectly from program
group participants. In a school context, children from different groups within the same school might share
experiences during lunch hour. Or, comparison group students, seeing what the program group is getting,
might set up their own experience to try to imitate that of the program group. In either case, if the
diffusion or imitation affects the posttest performance of the comparison group, it can jeopardize your
ability to assess whether your program is causing the outcome. Notice that this threat to validity tends to
equalize the outcomes between groups, minimizing the chance of seeing a program effect even if there is
one.

Compensatory Rivalry
Here, the comparison group knows what the program group is getting and develops a competitive attitude
with them. The students in the comparison group might see the special math tutoring program the program
group is getting and feel jealous. This could lead them to decide to compete with the program group "just
to show them" how well they can do. Sometimes, in contexts like these, the participants are even
encouraged by well-meaning teachers or administrators to compete with each other (while this might make
educational sense as a motivation for the students in both groups to work harder, it works against
our ability to see the effects of the program). If the rivalry between groups affects posttest
performance, it could make it more difficult to detect the effects of the program. As with diffusion
and imitation, this threat generally works in the direction of equalizing the posttest performance
across groups, increasing the chance that you won't see a program effect, even if the program is
effective.

Resentful Demoralization

This is almost the opposite of compensatory rivalry. Here, students in the comparison group know what the
program group is getting. But here, instead of developing a rivalry, they get discouraged or angry
and they give up (sometimes referred to as the "screw you" effect!). Unlike the previous two threats, this
one is likely to exaggerate posttest differences between groups, making your program look even more
effective than it actually is.

Compensatory Equalization of Treatment

This is the only threat of the four that primarily involves the people who help manage the research
context rather than the participants themselves. When program and comparison group participants are aware
of each other's conditions they may wish they were in the other group (depending on the perceived
desirability of the program it could work either way). Often they or their parents or teachers will put
pressure on the administrators to have them reassigned to the other group. The administrators may begin
to feel that the allocation of goods to the groups is not "fair" and may be pressured to, or independently
undertake to, compensate one group for the perceived advantage of the other. If the special math tutoring
program was being done with state-of-the-art computers, you can bet that the parents of the children
assigned to the traditional non-computerized comparison group will pressure the principal to "equalize"
the situation. Perhaps the principal will give the comparison group some other good, or let them have
access to the computers for other subjects. If these "compensating" programs equalize the groups on
posttest performance, it will tend to work against your detecting an effective program even when it does
work. For instance, a compensatory program might improve the self-esteem of the comparison group and
eliminate your chance to discover whether the math program would cause changes in self-esteem
relative to traditional math training.

As long as we engage in applied social research we will have to deal with the realities of human interaction and its
effect on the research process. The threats described here can often be minimized by constructing multiple groups
that are not aware of each other (e.g., program group from one school, comparison group from another) or by
training administrators in the importance of preserving group membership and not instituting equalizing programs.
But we will never be able to entirely eliminate the possibility that human interactions are making it more difficult for
us to assess cause-effect relationships.

External Validity

External validity is related to generalizing. That's the major thing you need to keep in mind. Recall that validity refers
to the approximate truth of propositions, inferences, or conclusions. So, external validity refers to the approximate
truth of conclusions that involve generalizations. Put in more pedestrian terms, external validity is the degree to which
the conclusions in your study would hold for other persons in other places and at other times.

In science there are two major approaches to how we provide evidence for a generalization. I'll call the first approach
the Sampling Model. In the sampling model, you start by identifying the population you would like to
generalize to. Then, you draw a fair sample from that population and conduct your research with the sample. Finally,
because the sample is representative of the population, you can automatically generalize your results back to the
population. There are several problems with this approach. First, perhaps you don't know at the time of your study
who you might ultimately like to generalize to. Second, you may not be easily able to draw a fair or representative
sample. Third, it's impossible to sample across all times that you might like to generalize to (like next year).

I'll call the second approach to generalizing the Proximal Similarity Model. 'Proximal' means 'nearby' and 'similarity'
means... well, it means 'similarity'. The term proximal similarity was suggested by Donald T. Campbell as an
appropriate relabeling of the term external validity (although he was the first to admit that it probably wouldn't catch
on!). Under this model, we begin by thinking about different generalizability contexts and developing a theory about
which contexts are more like our study and which are less so. For instance, we might imagine several settings that
have people who are more similar to the people in our study or people who are less similar. This also holds for times
and places. When we place different contexts in terms of their relative similarities, we can call this implicit theoretical
dimension a gradient of similarity. Once we have developed this proximal similarity framework, we are able to generalize.
How? We conclude that we can generalize the results of our study to other persons, places or times that are more
like (that is, more proximally similar to) our study. Notice that here, we can never generalize with certainty -- it is
always a question of more or less similar.

Threats to External Validity

A threat to external validity is an explanation of how you might be wrong in making a generalization. For instance, you
conclude that the results of
your study (which was done in a specific place, with certain types of people, and at a specific time) can be
generalized to another context (for instance, another place, with slightly different people, at a slightly later time).
There are three major threats to external validity because there are three ways you could be wrong -- people, places
or times. Your critics could come along, for example, and argue that the results of your study are due to the unusual
type of people who were in the study. Or, they could argue that it might only work because of the unusual place you
did the study in (perhaps you did your educational study in a college town with lots of high-achieving educationally-
oriented kids). Or, they might suggest that you did your study in a peculiar time. For instance, if you did your
smoking cessation study the week after the Surgeon General issued the well-publicized results of the latest smoking
and cancer studies, you might get different results than if you had done it the week before.

Improving External Validity

How can we improve external validity? One way, based on the sampling model, suggests that you do a good job of
drawing a sample from a population. For instance, you should use random selection, if possible, rather than a
nonrandom procedure. And, once selected, you should try to assure that the respondents participate in your study
and that you keep your dropout rates low. A second approach would be to use the theory of proximal similarity more
effectively. How? Perhaps you could do a better job of describing the ways your contexts and others differ, providing
lots of data about the degree of similarity between various groups of people, places, and even times. You might
even be able to map out the degree of proximal similarity among various contexts with a methodology like concept
mapping. Perhaps the best approach to criticisms of generalizations is simply to show them that they're wrong -- do
your study in a variety of places, with different people and at different times. That is, your external validity (ability to
generalize) will be stronger the more you replicate your study.

Sampling Terminology

As with anything else in life you have to learn the language of an area if you're going to ever hope to use it. Here, I want to
introduce several different terms for the major groups that are involved in a sampling process and the role that each group plays in
the logic of sampling.

The major question that motivates sampling in the first place is: "Who do you want to generalize to?" Or should it be: "To whom do
you want to generalize?" In most social research we are interested in more than just the people who directly participate in our study.
We would like to be able to talk in general terms and not be confined only to the people who are in our study. Now, there are times
when we aren't very concerned about generalizing. Maybe we're just evaluating a program in a local agency and we don't care
whether the program would work with other people in other places and at other times. In that case, sampling and generalizing might
not be of interest. In other cases, we would really like to be able to generalize almost universally. When psychologists do research,
they are often interested in developing theories that would hold for all humans. But in most applied social research, we are
interested in generalizing to specific groups. The group you wish to generalize to is often called the population in your study. This
is the group you would like to sample from because this is the group you are interested in generalizing to. Let's imagine that you
wish to generalize to urban homeless males between the ages of 30 and 50 in the United States. If that is the population of interest,
you are likely to have a very hard time developing a reasonable sampling plan. You are probably not going to find an accurate
listing of this population, and even if you did, you would almost certainly not be able to mount a national sample across hundreds of
urban areas. So we probably should make a distinction between the population you would like to generalize to, and the population
that will be accessible to you. We'll call the former the theoretical population and the latter the accessible population. In this
example, the accessible population might be homeless males between the ages of 30 and 50 in six selected urban areas across the
U.S.
Once you've identified the theoretical and accessible populations, you have to do one more thing before you can actually draw a
sample -- you have to get a list of the members of the accessible population. (Or, you have to spell out in detail how you will contact
them to assure representativeness). The listing of the accessible population from which you'll draw your sample is called the
sampling frame. If you were doing a phone survey and selecting names from the telephone book, the book would be your
sampling frame. That wouldn't be a great way to sample because significant subportions of the population either don't have a phone
or have moved in or out of the area since the last book was printed. Notice that in this case, you might identify the area code and all
three-digit prefixes within that area code and draw a sample simply by randomly dialing numbers (cleverly known as random-digit-
dialing). In this case, the sampling frame is not a list per se, but is rather a procedure that you follow as the actual basis for
sampling. Finally, you actually draw your sample (using one of the many sampling procedures). The sample is the group of people
who you select to be in your study. Notice that I didn't say that the sample was the group of people who are actually in your study.
You may not be able to contact or recruit all of the people you actually sample, or some could drop out over the course of the study.
The group that actually completes your study is a subsample of the sample -- it doesn't include nonrespondents or dropouts. The
problem of nonresponse and its effects on a study will be addressed elsewhere.

People often confuse what is meant by random selection with the idea of random assignment. You should make sure that you
understand the distinction between random selection and random assignment.

At this point, you should appreciate that sampling is a difficult multi-step process and that there are lots of places you can go wrong.
In fact, as we move from each step to the next in identifying a sample, there is the possibility of introducing systematic error or bias.
For instance, even if you are able to identify perfectly the population of interest, you may not have access to all of them. And even if
you do, you may not have a complete and accurate enumeration or sampling frame from which to select. And, even if you do, you
may not draw the sample correctly or accurately. And, even if you do, they may not all come and they may not all stay. Depressed
yet? This is a very difficult business indeed. At times like this I'm reminded of what Donald Campbell used to say (I'll paraphrase
here): "Cousins to the amoeba, it's amazing that we know anything at all!"

Statistical Sampling Terms

Let's begin by defining some very simple terms that are relevant here. First, let's look at the results of our sampling efforts. When we sample, the units that we sample -- usually people -- supply us with one or more responses. In this sense, a response is a specific measurement value that a sampling unit supplies. In the figure, the person is responding to a survey instrument and gives a response of '4'. When we look across the responses that we get for our entire sample, we use a statistic. There are a wide variety of statistics we can use -- mean, median, mode, and so on. In this example, we see that the mean or average for the sample is 3.72. But the reason we sample is so that we might get an estimate for the population we sampled from. If we could, we would much prefer to measure the entire population. If we measure the entire population and calculate a value like a mean or average, we don't refer to this as a statistic; we call it a parameter of the population.

The Sampling Distribution

So how do we get from our sample statistic to an estimate of the population parameter? A crucial midway concept
you need to understand is the sampling distribution. In order to understand it, you have to be able and willing to do
a thought experiment. Imagine that instead of just taking a single sample like we do in a typical study, you took three
independent samples of the same population. And furthermore, imagine that for each of your three samples, you
collected the responses and computed a single statistic, say, the mean of the responses. Even though all three
samples came from the same population, you wouldn't expect to get the exact same statistic from each. They would
differ slightly just due to the random "luck of the draw" or to the natural fluctuations or vagaries of drawing a sample.
But you would expect that all three samples would yield a similar statistical estimate because they were drawn from
the same population. Now, for the leap of imagination! Imagine that you did an infinite number of samples from the
same population and computed the average for each one. If you plotted them on a histogram or bar graph you
should find that most of them converge on the same central value and that you get fewer and fewer samples that
have averages farther away up or down from that central value. In other words, the bar graph would be well
described by the bell curve shape that is an indication of a "normal" distribution in statistics. The distribution of an
infinite number of samples of the same size as the sample in your study is known as the sampling distribution. We
don't ever actually construct a sampling distribution. Why not? You're not paying attention! Because to construct it we would have to take an infinite number of samples and, at least the last time I checked, on this planet infinite is not a number we know how to reach. So why do we even talk about a sampling distribution? Now that's a good question! Because we need to realize that our sample is just one of a potentially infinite number of samples that we could have taken. When we keep the sampling distribution in mind, we realize that while the statistic we got from our sample is probably near the center of the sampling distribution (because most of the samples would be there) we could have gotten one of the extreme samples just by the luck of the draw. If we take the average of the sampling distribution -- the average of the averages of an infinite number of samples -- we would be much closer to the true population average -- the parameter of interest. So the average of the sampling distribution is essentially equivalent to the parameter. But what is the standard deviation of the sampling distribution? (OK, never had statistics? There are any number of places on the web where you can learn about them or even just brush up if you've gotten rusty. This isn't one of them. I'm going to assume that you at least know what a standard deviation is, or that you're capable of finding out relatively quickly.) The standard deviation of the sampling distribution tells us something about how different samples would be distributed. In statistics it is referred to as the standard error (so we can keep it separate in our minds from standard deviations. Getting confused? Go get a cup of coffee and come back in ten minutes... OK, let's try once more... A standard deviation is the spread of the scores around the average in a single sample. The standard error is the spread of the averages around the average of averages in a sampling distribution. Got it?)

Sampling Error

In sampling contexts, the standard error is called sampling error. Sampling error gives us some idea of the
precision of our statistical estimate. A low sampling error means that we had relatively less variability or range in the
sampling distribution. But here we go again -- we never actually see the sampling distribution! So how do we
calculate sampling error? We base our calculation on the standard deviation of our sample. The greater the sample
standard deviation, the greater the standard error (and the sampling error). The standard error is also related to the
sample size. The greater your sample size, the smaller the standard error. Why? Because the greater the sample
size, the closer your sample is to the actual population itself. If you take a sample that consists of the entire
population you actually have no sampling error because you don't have a sample, you have the entire population. In
that case, the mean you estimate is the parameter.
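
The logic here is easy to check with a small simulation. The sketch below is not from the original text; it uses a hypothetical population of scores and shows that the spread of many sample means (the standard error) comes out close to the standard deviation divided by the square root of the sample size:

import random
import statistics

# Hypothetical population of 10,000 scores centered on 3.75 (not from the text).
random.seed(1)
population = [random.gauss(3.75, 0.25) for _ in range(10_000)]

n = 100            # sample size
n_samples = 5000   # a stand-in for the "infinite" number of samples

# Draw many independent samples and record each sample's mean (the statistic).
sample_means = [statistics.mean(random.sample(population, n)) for _ in range(n_samples)]

# The spread of those means approximates the standard error, which should be
# close to the standard deviation divided by the square root of n.
print("SD of the sample means (empirical standard error):",
      round(statistics.stdev(sample_means), 4))
print("population SD / sqrt(n):",
      round(statistics.stdev(population) / n ** 0.5, 4))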

The 68, 95, 99 Percent Rule

You've probably heard this one before, but it's so important that it's always worth repeating... There is a general rule that applies whenever we have a normal or bell-shaped distribution. Start with the average -- the center of the distribution. If you go up and down (i.e., left and right) one standard unit, you will include approximately 68% of the cases in the distribution (i.e., 68% of the area under the curve). If you go up and down two standard units, you will include approximately 95% of the cases. And if you go plus-or-minus three standard units, you will include about 99% of the cases. Notice that I didn't specify in the previous few sentences whether I was talking about standard deviation units or standard error units. That's because the same rule holds for both types of distributions (i.e., the raw data and sampling distributions). For instance, in the figure, the mean of the distribution is 3.75 and the standard unit is .25 (if this were a distribution of raw data, we would be talking in standard deviation units; if it's a sampling distribution, we'd be talking in standard error units). If we go up and down one standard unit from the mean, we would be going up and down .25 from the mean of 3.75. Within this range -- 3.5 to 4.0 -- we would expect to see approximately 68% of the cases. This section is marked in red on the figure. I leave it to you to figure out the other ranges. But what does this all mean, you ask? If we are dealing with raw data and we know the mean and standard deviation of a sample, we can predict the intervals within which 68, 95 and 99% of our cases would be expected to fall. We call these intervals the -- guess what -- 68, 95 and 99% confidence intervals.
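
If you want to convince yourself of the rule, you can read the areas directly off a normal distribution. A minimal check (not part of the original text), using only Python's standard library:

from statistics import NormalDist

# Standard normal distribution: mean 0, standard unit 1.
z = NormalDist(mu=0, sigma=1)

for k in (1, 2, 3):
    coverage = z.cdf(k) - z.cdf(-k)   # area within +/- k standard units
    print(f"within +/- {k} standard units: {coverage:.1%}")

# Prints roughly 68.3%, 95.4%, and 99.7% -- the rule of thumb rounds these
# to 68, 95, and 99 percent.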

Now, here's where everything should come together in one great aha! experience if you've been following along. If we had a sampling distribution, we would be able to predict the 68, 95 and 99% confidence intervals for where the population parameter should be! And isn't that why we sampled in the first place? So that we could predict where the population is on that variable? There's only one hitch. We don't actually have the sampling distribution (now this is the third time I've said this in this essay)! But we do have the distribution for the sample itself. And we can from that distribution estimate the standard error (the sampling error) because it is based on the standard deviation and we have that. And, of course, we don't actually know the population parameter value -- we're trying to find that out -- but we can use our best estimate for that -- the sample statistic. Now, if we have the mean of the sampling distribution (or set it to the mean from our sample) and we have an estimate of the standard error (we calculate that from our sample) then we have the two key ingredients that we need for our sampling distribution in order to estimate confidence intervals for the population parameter.

Perhaps an example will help. Let's assume we did a study and drew a single sample from the population.
Furthermore, let's assume that the average for the sample was 3.75 and the standard deviation was .25. This is the
raw data distribution depicted above. Now, what would the sampling distribution be in this case? Well, we don't
actually construct it (because we would need to take an infinite number of samples) but we can estimate it. For
starters, we assume that the mean of the sampling distribution is the mean of the sample, which is 3.75. Then, we
calculate the standard error. To do this, we use the standard deviation for our sample and the sample size (in this
case N=100) and we come up with a standard error of .025 (just trust me on this). Now we have everything we need
to estimate a confidence interval for the population parameter. We would estimate that the probability is 68% that the
true parameter value falls between 3.725 and 3.775 (i.e., 3.75 plus and minus .025); that the 95% confidence
interval is 3.700 to 3.800; and that we can say with 99% confidence that the population value is between 3.675 and
3.825. The real value (in this fictitious example) was 3.72, which falls within the 95% and 99% intervals (though just outside the 68% interval), so our sample gave us a reasonable estimate of the population parameter.
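
Here is the same arithmetic spelled out in a minimal sketch, using only the numbers from the example above:

# Reproducing the example: sample mean 3.75, standard deviation .25, N = 100.
mean = 3.75
sd = 0.25
n = 100

standard_error = sd / n ** 0.5   # .25 / 10 = .025

for k, label in ((1, "68%"), (2, "95%"), (3, "99%")):
    low = mean - k * standard_error
    high = mean + k * standard_error
    print(f"{label} confidence interval: {low:.3f} to {high:.3f}")

# 68%: 3.725 to 3.775, 95%: 3.700 to 3.800, 99%: 3.675 to 3.825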

Probability Sampling

A probability sampling method is any method of sampling that utilizes some form of random selection. In order to
have a random selection method, you must set up some process or procedure that assures that the different units in
your population have equal probabilities of being chosen. Humans have long practiced various forms of random
selection, such as picking a name out of a hat, or choosing the short straw. These days, we tend to use computers
as the mechanism for generating random numbers as the basis for random selection.

Some Definitions

Before I can explain the various probability methods we have to define some basic terms. These are:

N = the number of cases in the sampling frame
n = the number of cases in the sample
NCn = the number of combinations (subsets) of n from N
f = n/N = the sampling fraction

That's it. With those terms defined we can begin to define the different probability sampling methods.

Simple Random Sampling

The simplest form of random sampling is called simple random sampling. Pretty tricky, huh? Here's the quick
description of simple random sampling:

Objective: To select n units out of N such that each NCn has an equal chance of being selected.
Procedure: Use a table of random numbers, a computer random number generator, or a mechanical
device to select the sample.

A somewhat stilted, if accurate, definition. Let's see if we can make it a little more real. How do we select a simple
random sample? Let's assume that we are doing some research with a small service agency that wishes to assess clients' views of quality of service over the past year. First, we have to get the sampling frame organized. To accomplish this, we'll go through agency records to identify every client over the past 12 months. If we're lucky, the agency has good, accurate computerized records and can quickly produce such a list. Then, we have to actually draw the sample. Decide on the number of clients you would like to have in the
final sample. For the sake of the example, let's say you want to select 100 clients to survey and that there were 1000
clients over the past 12 months. Then, the sampling fraction is f = n/N = 100/1000 = .10 or 10%. Now, to actually
draw the sample, you have several options. You could print off the list of 1000 clients, tear them into separate strips,
put the strips in a hat, mix them up real good, close your eyes and pull out the first 100. But this mechanical
procedure would be tedious and the quality of the sample would depend on how thoroughly you mixed them up and
how randomly you reached in. Perhaps a better procedure would be to use the kind of ball machine that is popular
with many of the state lotteries. You would need three sets of balls numbered 0 to 9, one set for each of the digits
from 000 to 999 (if we select 000 we'll call that 1000). Number the list of names from 1 to 1000 and then use the ball
machine to select the three digits that selects each person. The obvious disadvantage here is that you need to get
the ball machines. (Where do they make those things, anyway? Is there a ball machine industry?).

Neither of these mechanical procedures is very feasible and, with the development of inexpensive computers there
is a much easier way. Here's a simple procedure that's especially useful if you have the names of the clients already
on the computer. Many computer programs can generate a series of random numbers. Let's assume you can copy
and paste the list of client names into a column in an EXCEL spreadsheet. Then, in the column right next to it paste
the function =RAND() which is EXCEL's way of putting a random number between 0 and 1 in the cells. Then, sort
both columns -- the list of names and the random number -- by the random numbers. This rearranges the list in
random order from the lowest to the highest random number. Then, all you have to do is take the first hundred
names in this sorted list. Pretty simple. You could probably accomplish the whole thing in under a minute.
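
The same random-sort procedure is easy to do outside of a spreadsheet. A small sketch (the client names here are hypothetical placeholders):

import random

# Hypothetical sampling frame: 1000 client names.
clients = [f"client_{i:04d}" for i in range(1, 1001)]

# The spreadsheet procedure: attach a random number to every name,
# sort by that number, and keep the first 100.
keyed = [(random.random(), name) for name in clients]
keyed.sort()
sample_by_sorting = [name for _, name in keyed[:100]]

# The same result in one step with the standard library.
sample_direct = random.sample(clients, 100)

print(len(sample_by_sorting), len(sample_direct))  # 100 100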

Simple random sampling is simple to accomplish and is easy to explain to others. Because simple random sampling
is a fair way to select a sample, it is reasonable to generalize the results from the sample back to the population.
Simple random sampling is not the most statistically efficient method of sampling and you may, just because of the
luck of the draw, not get good representation of subgroups in a population. To deal with these issues, we have to
turn to other sampling methods.

Stratified Random Sampling

Stratified Random Sampling, also sometimes called proportional or quota random sampling, involves dividing your
population into homogeneous subgroups and then taking a simple random sample in each subgroup. In more formal
terms:

Objective: Divide the population into non-overlapping groups (i.e., strata) N1, N2, N3, ... Ni, such
that N1 + N2 + N3 + ... + Ni = N. Then do a simple random sample of f = n/N in each stratum.

There are several major reasons why you might prefer stratified sampling over simple random sampling. First, it
assures that you will be able to represent not only the overall population, but also key subgroups of the population,
especially small minority groups. If you want to be able to talk about subgroups, this may be the only way to
effectively assure you'll be able to. If the subgroup is extremely small, you can use different sampling fractions (f)
within the different strata to randomly over-sample the small group (although you'll then have to weight the within-
group estimates using the sampling fraction whenever you want overall population estimates). When we use the
same sampling fraction within strata we are conducting proportionate stratified random sampling. When we use
different sampling fractions in the strata, we call this disproportionate stratified random sampling. Second, stratified
random sampling will generally have more statistical precision than simple random sampling. This will only be true if
the strata or groups are homogeneous. If they are, we expect that the variability within-groups is lower than the
variability for the population as a whole. Stratified sampling capitalizes on that fact.

For example, let's say that the population of clients for our agency can be divided into three groups: Caucasian, African-American and Hispanic-American. Furthermore, let's assume that both the African-Americans and Hispanic-Americans are relatively small minorities of the clientele (10% and 5% respectively). If we just did a simple random
sample of n=100 with a sampling fraction of 10%, we would expect by chance alone that we would only get 10 and 5
persons from each of our two smaller groups. And, by chance, we could get fewer than that! If we stratify, we can do
better. First, let's determine how many people we want to have in each group. Let's say we still want to take a
sample of 100 from the population of 1000 clients over the past year. But we think that in order to say anything about
subgroups we will need at least 25 cases in each group. So, let's sample 50 Caucasians, 25 African-Americans, and
25 Hispanic-Americans. We know that 10% of the population, or 100 clients, are African-American. If we randomly
sample 25 of these, we have a within-stratum sampling fraction of 25/100 = 25%. Similarly, we know that 5% or 50
clients are Hispanic-American. So our within-stratum sampling fraction will be 25/50 = 50%. Finally, by subtraction
we know that there are 850 Caucasian clients. Our within-stratum sampling fraction for them is 50/850 = about
5.88%. Because the groups are more homogeneous within-group than across the population as a whole, we can
expect greater statistical precision (less variance). And, because we stratified, we know we will have enough cases
from each group to make meaningful subgroup inferences.
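
A quick sketch of that allocation in code (the client lists are hypothetical placeholders; only the stratum sizes and sample sizes come from the example above):

import random

# Hypothetical client lists for the three strata in the example.
strata = {
    "Caucasian":         [f"c_{i}" for i in range(850)],
    "African-American":  [f"a_{i}" for i in range(100)],
    "Hispanic-American": [f"h_{i}" for i in range(50)],
}
# Disproportionate allocation: 50 / 25 / 25, as in the text.
allocation = {"Caucasian": 50, "African-American": 25, "Hispanic-American": 25}

sample = []
for group, members in strata.items():
    n_g = allocation[group]
    sample.extend(random.sample(members, n_g))   # simple random sample within the stratum
    print(f"{group}: within-stratum fraction {n_g}/{len(members)} = {n_g / len(members):.2%}")

print("total sample size:", len(sample))  # 100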

Systematic Random Sampling

Here are the steps you need to follow in order to achieve a systematic random sample:

number the units in the population from 1 to N
decide on the n (sample size) that you want or need
k = N/n = the interval size
randomly select an integer between 1 and k
then take every kth unit

All of this will be much clearer with an example. Let's assume that we have a population that only has N=100 people in it and that you want to take a sample of n=20. To use systematic sampling, the population must be listed in a random order. The sampling fraction would be f = 20/100 = 20%. In this case, the interval size, k, is equal to N/n = 100/20 = 5. Now, select a random integer from 1 to 5. In our example, imagine that you chose 4. Now, to select the sample, start with the 4th unit in the list and take every k-th unit (every 5th, because k=5). You would be sampling units 4, 9, 14, 19, and so on up through 99, and you would wind up with 20 units in your sample.
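
As a minimal sketch, here is the same procedure in code (assuming, as in the example, that N divides evenly by n):

import random

def systematic_sample(units, n):
    """Take every k-th unit after a random start, where k = N / n."""
    N = len(units)
    k = N // n                      # interval size
    start = random.randint(1, k)    # random integer between 1 and k
    return [units[i - 1] for i in range(start, N + 1, k)]

population = list(range(1, 101))    # N = 100 units, numbered 1 to N
sample = systematic_sample(population, 20)
print(sample)  # e.g. [4, 9, 14, 19, ..., 99] when the random start is 4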

For this to work, it is essential that the units in the population are randomly ordered, at least with respect to the
characteristics you are measuring. Why would you ever want to use systematic random sampling? For one thing, it
is fairly easy to do. You only have to select a single random number to start things off. It may also be more precise
than simple random sampling. Finally, in some situations there is simply no easier way to do random sampling. For
instance, I once had to do a study that involved sampling from all the books in a library. Once selected, I would have
to go to the shelf, locate the book, and record when it last circulated. I knew that I had a fairly good sampling frame
in the form of the shelf list (which is a card catalog where the entries are arranged in the order they occur on the
shelf). To do a simple random sample, I could have estimated the total number of books and generated random
numbers to draw the sample; but how would I find book #74,329 easily if that is the number I selected? I couldn’t
very well count the cards until I came to 74,329! Stratifying wouldn’t solve that problem either. For instance, I could
have stratified by card catalog drawer and drawn a simple random sample within each drawer. But I’d still be stuck
counting cards. Instead, I did a systematic random sample. I estimated the number of books in the entire collection.
Let’s imagine it was 100,000. I decided that I wanted to take a sample of 1000 for a sampling fraction of
1000/100,000 = 1%. To get the sampling interval k, I divided N/n = 100,000/1000 = 100. Then I selected a random
integer between 1 and 100. Let’s say I got 57. Next I did a little side study to determine how thick a thousand cards
are in the card catalog (taking into account the varying ages of the cards). Let’s say that on average I found that two
cards that were separated by 100 cards were about .75 inches apart in the catalog drawer. That information gave
me everything I needed to draw the sample. I counted to the 57th by hand and recorded the book information. Then,
I took a compass. (Remember those from your high-school math class? They’re the funny little metal instruments
with a sharp pin on one end and a pencil on the other that you used to draw circles in geometry class.) Then I set
the compass at .75”, stuck the pin end in at the 57th card and pointed with the pencil end to the next card
(approximately 100 books away). In this way, I approximated selecting the 157th, 257th, 357th, and so on. I was
able to accomplish the entire selection procedure in very little time using this systematic random sampling approach.
I’d probably still be there counting cards if I’d tried another random sampling method. (Okay, so I have no life. I got
compensated nicely, I don’t mind saying, for coming up with this scheme.)

Cluster (Area) Random Sampling

The problem with random sampling methods when we have to sample a population that's dispersed across a wide
geographic region is that you will have to cover a lot of ground geographically in order to get to each of the units you
sampled. Imagine taking a simple random sample of all the residents of New York State in order to conduct personal
interviews. By the luck of the draw you will wind up with respondents who come from all over the state. Your
interviewers are going to have a lot of traveling to do. It is for precisely this problem that cluster or area random
sampling was invented.

In cluster sampling, we follow these steps:

divide population into clusters (usually along geographic boundaries)
randomly sample clusters
measure all units within sampled clusters

For instance, in the figure we see a map of the counties in New York State. Let's say that we have to do a survey of town governments that will require us to go to the towns personally. If we do a simple random sample state-wide we'll have to cover the entire state geographically. Instead, we decide to do a cluster sampling of five counties (marked in red in the figure). Once these are selected, we go to every town government in
the five areas. Clearly this strategy will help us to economize on our mileage. Cluster or area sampling, then, is
useful in situations like this, and is done primarily for efficiency of administration. Note also, that we probably don't
have to worry about using this approach if we are conducting a mail or telephone survey because it doesn't matter
as much (or cost more or raise inefficiency) where we call or send letters to.
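
A minimal sketch of the two cluster-sampling steps (the counties and towns below are hypothetical placeholders; New York actually has 62 counties):

import random

# Hypothetical frame: towns grouped by county (the cluster is the county).
towns_by_county = {f"county_{i}": [f"town_{i}_{j}" for j in range(1, 11)]
                   for i in range(1, 63)}   # 62 counties, 10 towns each

# Step 1: randomly sample five clusters (counties).
sampled_counties = random.sample(list(towns_by_county), 5)

# Step 2: measure every unit (town government) within the sampled clusters.
sampled_towns = [town for county in sampled_counties
                 for town in towns_by_county[county]]

print(sampled_counties)
print("towns to visit:", len(sampled_towns))  # 5 counties x 10 towns = 50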

Multi-Stage Sampling

The four methods we've covered so far -- simple, stratified, systematic and cluster -- are the simplest random
sampling strategies. In most real applied social research, we would use sampling methods that are considerably
more complex than these simple variations. The most important principle here is that we can combine the simple
methods described earlier in a variety of useful ways that help us address our sampling needs in the most efficient
and effective manner possible. When we combine sampling methods, we call this multi-stage sampling.

For example, consider the idea of sampling New York State residents for face-to-face interviews. Clearly we would
want to do some type of cluster sampling as the first stage of the process. We might sample townships or census
tracts throughout the state. But in cluster sampling we would then go on to measure everyone in the clusters we
select. Even if we are sampling census tracts we may not be able to measure everyone who is in the census tract.
So, we might set up a stratified sampling process within the clusters. In this case, we would have a two-stage
sampling process with stratified samples within cluster samples. Or, consider the problem of sampling students in
grade schools. We might begin with a national sample of school districts stratified by economics and educational
level. Within selected districts, we might do a simple random sample of schools. Within schools, we might do a
simple random sample of classes or grades. And, within classes, we might even do a simple random sample of
students. In this case, we have three or four stages in the sampling process and we use both stratified and simple
random sampling. By combining different sampling methods we are able to achieve a rich variety of probabilistic
sampling methods that can be used in a wide range of social research contexts.
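
For instance, a two-stage version of the New York example might sample counties first and then draw a simple random sample of residents within each selected county. A hypothetical sketch:

import random

# Hypothetical frame: residents grouped by county.
residents_by_county = {f"county_{i}": [f"resident_{i}_{j}" for j in range(500)]
                       for i in range(1, 63)}

stage1 = random.sample(list(residents_by_county), 5)      # stage 1: sample clusters
stage2 = [random.sample(residents_by_county[c], 40)        # stage 2: 40 residents per cluster
          for c in stage1]

sample = [person for group in stage2 for person in group]
print(len(sample))  # 5 x 40 = 200 face-to-face interviews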

Nonprobability Sampling

The difference between nonprobability and probability sampling is that nonprobability sampling does not involve
random selection and probability sampling does. Does that mean that nonprobability samples aren't representative
of the population? Not necessarily. But it does mean that nonprobability samples cannot depend upon the rationale
of probability theory. At least with a probabilistic sample, we know the odds or probability that we have represented
the population well. We are able to estimate confidence intervals for the statistic. With nonprobability samples, we
may or may not represent the population well, and it will often be hard for us to know how well we've done so. In
general, researchers prefer probabilistic or random sampling methods over nonprobabilistic ones, and consider them
to be more accurate and rigorous. However, in applied social research there may be circumstances where it is not
feasible, practical or theoretically sensible to do random sampling. Here, we consider a wide range of
nonprobabilistic alternatives.

We can divide nonprobability sampling methods into two broad types: accidental or purposive. Most sampling
methods are purposive in nature because we usually approach the sampling problem with a specific plan in mind.
The most important distinctions among these types of sampling methods are the ones between the different types of
purposive sampling approaches.

Accidental, Haphazard or Convenience Sampling

One of the most common methods of sampling goes under the various titles listed here. I would
include in this category the traditional "man on the street" (of course, now it's probably the "person
on the street") interviews conducted frequently by television news programs to get a quick (although
nonrepresentative) reading of public opinion. I would also argue that the typical use of college
students in much psychological research is primarily a matter of convenience. (You don't really
believe that psychologists use college students because they believe they're representative of the
population at large, do you?). In clinical practice, we might use clients who are available to us as our
sample. In many research contexts, we sample simply by asking for volunteers. Clearly, the
problem with all of these types of samples is that we have no evidence that they are representative
of the populations we're interested in generalizing to -- and in many cases we would clearly suspect
that they are not.

Purposive Sampling

In purposive sampling, we sample with a purpose in mind. We usually would have one or more
specific predefined groups we are seeking. For instance, have you ever run into people in a mall or
on the street who are carrying a clipboard and who are stopping various people and asking if they
could interview them? Most likely they are conducting a purposive sample (and most likely they are
engaged in market research). They might be looking for Caucasian females between 30-40 years
old. They size up the people passing by and anyone who looks to be in that category they stop to
ask if they will participate. One of the first things they're likely to do is verify that the respondent
does in fact meet the criteria for being in the sample. Purposive sampling can be very useful for
situations where you need to reach a targeted sample quickly and where sampling for
proportionality is not the primary concern. With a purposive sample, you are likely to get the
opinions of your target population, but you are also likely to overweight subgroups in your
population that are more readily accessible.
All of the methods that follow can be considered subcategories of purposive sampling methods. We might sample
for specific groups or types of people as in modal instance, expert, or quota sampling. We might sample for diversity
as in heterogeneity sampling. Or, we might capitalize on informal social networks to identify specific respondents
who are hard to locate otherwise, as in snowball sampling. In all of these methods we know what we want -- we are
sampling with a purpose.

Modal Instance Sampling

In statistics, the mode is the most frequently occurring value in a distribution. In sampling, when we
do a modal instance sample, we are sampling the most frequent case, or the "typical" case. In a lot
of informal public opinion polls, for instance, they interview a "typical" voter. There are a number of
problems with this sampling approach. First, how do we know what the "typical" or "modal" case is?
We could say that the modal voter is a person who is of average age, educational level, and income
in the population. But, it's not clear that using the averages of these is the fairest (consider the
skewed distribution of income, for instance). And, how do you know that those three variables --
age, education, income -- are the only or even the most relevant for classifying the typical voter?
What if religion or ethnicity is an important discriminator? Clearly, modal instance sampling is only
sensible for informal sampling contexts.

Expert Sampling

Expert sampling involves the assembling of a sample of persons with known or demonstrable
experience and expertise in some area. Often, we convene such a sample under the auspices of a
"panel of experts." There are actually two reasons you might do expert sampling. First, because it
would be the best way to elicit the views of persons who have specific expertise. In this case, expert
sampling is essentially just a specific subcase of purposive sampling. But the other reason you
might use expert sampling is to provide evidence for the validity of another sampling approach
you've chosen. For instance, let's say you do modal instance sampling and are concerned that the
criteria you used for defining the modal instance are subject to criticism. You might convene an
expert panel consisting of persons with acknowledged experience and insight into that field or topic
and ask them to examine your modal definitions and comment on their appropriateness and validity.
The advantage of doing this is that you aren't out on your own trying to defend your decisions -- you
have some acknowledged experts to back you. The disadvantage is that even the experts can be,
and often are, wrong.

Quota Sampling

In quota sampling, you select people nonrandomly according to some fixed quota. There are two
types of quota sampling: proportional and nonproportional. In proportional quota sampling you
want to represent the major characteristics of the population by sampling a proportional amount of
each. For instance, if you know the population has 40% women and 60% men, and that you want a
total sample size of 100, you will continue sampling until you get those percentages and then you
will stop. So, if you've already got the 40 women for your sample, but not the sixty men, you will
continue to sample men but even if legitimate women respondents come along, you will not sample
them because you have already "met your quota." The problem here (as in much purposive
sampling) is that you have to decide the specific characteristics on which you will base the quota.
Will it be by gender, age, education, race, religion, etc.?
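
As a rough sketch of how a proportional quota fills up in the field (the 40/60 split and the stream of passers-by are hypothetical):

import random

# Hypothetical stream of passers-by: about 40% women, 60% men.
def next_passerby():
    return "woman" if random.random() < 0.4 else "man"

quota = {"woman": 40, "man": 60}     # proportional quota for a sample of 100
sample = {"woman": 0, "man": 0}

while sum(sample.values()) < 100:
    person = next_passerby()
    if sample[person] < quota[person]:   # once a quota is met, skip that group
        sample[person] += 1

print(sample)  # {'woman': 40, 'man': 60}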

Nonproportional quota sampling is a bit less restrictive. In this method, you specify the minimum
number of sampled units you want in each category. Here, you're not concerned with having
numbers that match the proportions in the population. Instead, you simply want to have enough to
assure that you will be able to talk about even small groups in the population. This method is the
nonprobabilistic analogue of stratified random sampling in that it is typically used to assure that
smaller groups are adequately represented in your sample.

Heterogeneity Sampling

We sample for heterogeneity when we want to include all opinions or views, and we aren't
concerned about representing these views proportionately. Another term for this is sampling for
diversity. In many brainstorming or nominal group processes (including concept mapping), we would
use some form of heterogeneity sampling because our primary interest is in getting a broad spectrum
of ideas, not identifying the "average" or "modal instance" ones. In effect, what we would like to be
sampling is not people, but ideas. We imagine that there is a universe of all possible ideas relevant
to some topic and that we want to sample this population, not the population of people who have the
ideas. Clearly, in order to get all of the ideas, and especially the "outlier" or unusual ones, we have
to include a broad and diverse range of participants. Heterogeneity sampling is, in this sense,
almost the opposite of modal instance sampling.

Snowball Sampling

In snowball sampling, you begin by identifying someone who meets the criteria for inclusion in your
study. You then ask them to recommend others who they may know who also meet the criteria.
Although this method would hardly lead to representative samples, there are times when it may be
the best method available. Snowball sampling is especially useful when you are trying to reach
populations that are inaccessible or hard to find. For instance, if you are studying the homeless, you
are not likely to be able to find good lists of homeless people within a specific geographical area.
However, if you go to that area and identify one or two, you may find that they know very well who
the other homeless people in their vicinity are and how you can find them.

Random Selection & Assignment

Random selection is how you draw the sample of people for your study from a population. Random assignment is
how you assign the sample that you draw to different groups or treatments in your study.

It is possible to have both random selection and assignment in a study. Let's say you drew a random sample of 100
clients from a population list of 1000 current clients of your organization. That is random sampling. Now, let's say
you randomly assign 50 of these clients to get some new additional treatment and the other 50 to be controls. That's
random assignment.
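
That two-step example is easy to express in code. A minimal sketch (the client list is a hypothetical placeholder):

import random

# Hypothetical list of 1000 current clients.
clients = [f"client_{i}" for i in range(1, 1001)]

# Random selection: draw the 100 people who will be in the study.
study_sample = random.sample(clients, 100)

# Random assignment: split that sample into treatment and control groups.
shuffled = study_sample[:]
random.shuffle(shuffled)
treatment, control = shuffled[:50], shuffled[50:]

print(len(treatment), len(control))  # 50 50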

It is also possible to have only one of these (random selection or random assignment) but not the other in a study.
For instance, if you do not randomly draw the 100 cases from your list of 1000 but instead just take the first 100 on
the list, you do not have random selection. But you could still randomly assign this nonrandom sample to treatment
versus control. Or, you could randomly select 100 from your list of 1000 and then nonrandomly (haphazardly) assign
them to treatment or control.

And, it's possible to have neither random selection nor random assignment. In a typical nonequivalent groups design
in education you might nonrandomly choose two 5th grade classes to be in your study. This is nonrandom selection.
Then, you could arbitrarily assign one to get the new educational program and the other to be the control. This is
nonrandom (or nonequivalent) assignment.

Random selection is related to sampling. Therefore it is most related to the external validity (or generalizability) of
your results. After all, we would randomly sample so that our research participants better represent the larger group
from which they're drawn. Random assignment is most related to design. In fact, when we randomly assign
participants to treatments we have, by definition, an experimental design. Therefore, random assignment is most
related to internal validity. After all, we randomly assign in order to help assure that our treatment groups are similar
to each other (i.e., equivalent) prior to the treatment.


What is Probabilistic Equivalence?

What do I mean by the term probabilistic equivalence? Well, to begin with, I certainly don't mean that two groups are equal to each other. When we deal with human beings it is impossible to ever say that any two individuals or groups are equal or equivalent. Clearly the important term in the
phrase is "probabilistic". This means that the type of equivalence we have is based on the notion of probabilities. In
more concrete terms, probabilistic equivalence means that we know perfectly the odds that we will find a difference
between two groups. Notice, it doesn't mean that the means of the two groups will be equal. It just means that we
know the odds that they won't be equal. The figure shows two groups, one having a mean of 49 and the other with a
mean of 51. Could these two groups be probabilistically equivalent? Certainly!

We achieve probabilistic equivalence through the mechanism of random assignment to groups. When we randomly
assign to groups, we can calculate the chance that the two groups will differ just because of the random assignment
(i.e., by chance alone). Let's say we are assigning a group of first grade students to two groups. Further, let's
assume that the average test scores for these children for a standardized test with a population mean of 50 were 49
and 51 respectively. We might conduct a t-test to see if the means of our two randomly assigned groups are
statistically different. We know -- through random assignment and the law of large numbers -- that the chance that
they will be different is 5 out of 100 when we set our significance level to .05 (i.e., alpha = .05). In other words, 5
times out of every 100, when we randomly assign two groups, we can expect to get a significant difference at the .05
level of significance.

When we assign randomly, the only reason the groups can differ is chance, because their assignment is based entirely on randomness. If, by chance, the groups differ on one variable, we
have no reason to believe that they will automatically be different on any other. Even if we find that the groups differ
on a pretest, we have no reason to suspect that they will differ on a posttest. Why? Because their pretest difference
had to be a chance one. So, when we randomly assign, we are able to assume that the groups do have a form of
equivalence. We don't expect them to be equal. But we do expect that they are "probabilistically" equal.
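
You can see this logic at work in a small simulation (not from the original text). The sketch below assumes SciPy is available for the t-test; it repeatedly assigns a hypothetical pool of children to two groups at random and counts how often the groups differ "significantly" at the .05 level:

import random
from scipy.stats import ttest_ind   # assumes SciPy is installed

random.seed(0)
# Hypothetical pool of first graders with standardized scores (mean 50, SD 10).
children = [random.gauss(50, 10) for _ in range(100)]

significant = 0
trials = 2000
for _ in range(trials):
    shuffled = children[:]
    random.shuffle(shuffled)              # random assignment to two groups
    group_a, group_b = shuffled[:50], shuffled[50:]
    result = ttest_ind(group_a, group_b)
    if result.pvalue < 0.05:
        significant += 1

# With alpha = .05, roughly 5% of purely random assignments should show a
# "significant" difference by chance alone.
print(f"significant at .05: {significant / trials:.1%}")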

Two-Group Experimental Designs

The simplest of all experimental designs is the two-group posttest-only randomized experiment. In design notation, it has two lines -- one for each group -- with an R at the beginning of each line to indicate that the groups were randomly assigned. One group gets the treatment or program (the X) and the other group is the comparison group and doesn't get the program (note that you could alternatively have the comparison group receive the standard or typical treatment, in which case this study would be a relative comparison).

Notice that a pretest is not required for this design. Usually we include a pretest in order to determine whether groups are comparable prior to the program, but because we are using random assignment we can assume that the two groups are probabilistically equivalent to begin with and the pretest is not required (although you'll see with covariance designs that a pretest may still be desirable in this context).

In this design, we are most interested in determining whether the two groups are different after the program. Typically we measure the groups on one or more measures (the Os in notation) and we compare them by testing for the differences between the means using a t-test or one-way Analysis of Variance (ANOVA).

The posttest-only randomized experiment is strong against the single-group threats to internal validity because it's not a single group design! (Tricky, huh?) It's strong against all of the multiple-group threats except for selection-mortality. For instance, it's strong against selection-testing and selection-instrumentation because it doesn't use repeated measurement. The selection-mortality threat is especially salient if there are differential rates of dropouts in the two groups. This could result if the treatment or program is a noxious or negative one (e.g., a painful medical procedure like chemotherapy) or if the control group condition is painful or intolerable. This design is susceptible to all of the social interaction threats to internal validity. Because the design requires random assignment, in some institutional settings (e.g., schools) it is more likely to utilize persons who would be aware of each other and of the conditions they've been assigned to.

The posttest-only randomized experimental design is, despite its simple structure, one of the best research
designs for assessing cause-effect relationships. It is easy to execute and, because it uses only a posttest, is
relatively inexpensive. But there are many variations on this simple experimental design. You can begin to explore
these by looking at how we classify the various experimental designs.

Statistical Analysis of The Posttest-Only Randomized Experiment
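
A minimal sketch of that analysis (the posttest scores below are hypothetical, and SciPy is assumed to be available):

from scipy.stats import ttest_ind   # assumes SciPy is installed

# Hypothetical posttest scores for the two randomly assigned groups.
program = [82, 75, 88, 91, 79, 85, 90, 77, 84, 86]
comparison = [74, 70, 80, 83, 72, 78, 81, 69, 76, 75]

# Compare the posttest means with a t-test; with only two groups,
# a one-way ANOVA would give the equivalent result.
result = ttest_ind(program, comparison)
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.4f}")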


Classifying Experimental Designs

Although there are a great variety of experimental design variations, we can classify and organize them using a simple signal-to-noise ratio metaphor. In this metaphor, we assume that what we observe or see can be divided into two components, the signal and the noise (by the way, this is directly analogous to the true score theory of measurement). The figure, for instance, shows a time series with a slightly downward slope. But because there is so much variability or noise in the series, it is difficult even to detect the downward slope. When we divide the series into its two components, we can clearly see the slope.

In most research, the signal is related to the key variable of interest -- the construct you're trying to measure, the program or treatment that's being implemented. The noise consists of all of the random factors in the situation that make it harder to see the signal -- the lighting in the room, local distractions, how people felt that day, etc. We can construct a ratio of these two by dividing the signal by the noise. In research, we want the signal to be high relative to the noise. For instance, if you have a very powerful treatment or program (i.e., strong signal) and very good measurement (i.e., low noise) you will have a better chance of seeing the effect of the program than if you have either a strong program and weak measurement or a weak program and strong measurement.

With this in mind, we can now classify the experimental designs into two categories: signal enhancers or noise
reducers. Notice that doing either of these things -- enhancing signal or reducing noise -- improves the quality of
the research. The signal-enhancing experimental designs are called the factorial designs. In these designs, the
focus is almost entirely on the setup of the program or treatment, its components and its major dimensions. In a
typical factorial design we would examine a number of different variations of a treatment.

There are two major types of noise-reducing experimental designs: covariance designs and blocking designs. In these designs we typically
use information about the makeup of the sample or about pre-program variables to remove some of the noise in our study.


Factorial Design Variations


A Simple Example

Probably the easiest way to begin understanding factorial designs is by looking at an example. Let's imagine a design
where we have an educational program where we would like to look at a variety of program variations to see which
works best. For instance, we would like to vary the amount of time the children receive instruction with one group
getting 1 hour of instruction per week and another getting 4 hours per week. And, we'd like to vary the setting with one group getting the instruction in-class (probably pulled off into a corner of the classroom) and the other group being pulled out of the classroom for instruction in another room. We could think about having four separate groups to do this, but when we are varying the amount of time in instruction, what setting would we use: in-class or pull-out? And, when we were studying setting, what amount of instruction time would we use: 1 hour, 4 hours, or something else?

With factorial designs, we don't have to compromise when answering these questions. We can have it both ways if we cross each of our two time-in-instruction conditions with each of our two settings. Let's begin by doing some defining of terms. In factorial designs, a factor is a major independent variable. In this example we have two factors: time in instruction and setting. A level is a subdivision of a factor. In this example, time in instruction has two levels and setting has two levels. Sometimes we depict a factorial design with a numbering notation. In this example, we can say that we have a 2 x 2 (spoken "two-by-two") factorial design. In this notation, the number of numbers tells you how many factors there are and the number values tell you how many levels. If I said I had a 3 x 4 factorial design, you would know that I had 2 factors and that one factor had 3 levels while the other had 4. Order of the numbers makes no difference and we could just as easily term this a 4 x 3 factorial design. The number of different treatment groups that we have in any factorial design can easily be determined by multiplying through the number notation. For instance, in our example we have 2 x 2 = 4 groups. In our notational example, we would need 3 x 4 = 12 groups.

We can also depict a factorial design in design notation. Because of the treatment level combinations, it is useful to use subscripts on the treatment (X) symbol. We can see in the figure that there are four groups, one for each combination of levels of factors. It is also immediately apparent that the groups were randomly assigned and that this is a posttest-only design.

Now, let's look at a variety of different results we might get from this simple 2 x 2 factorial design. Each of the following figures describes a different possible outcome. And each outcome is shown in table form (the 2 x 2 table with the row and column averages) and in graphic form (with each factor taking a turn on the horizontal axis). You should convince yourself that the information in the tables agrees with the information in both of the graphs. You should also convince yourself that the pair of graphs in each figure show the exact same information graphed in two different ways. The lines that are shown in the graphs are technically not necessary -- they are used as a visual aid to enable you to easily track where the averages for a single level go across levels of another factor. Keep in mind that the values shown in the tables and graphs are group averages on the outcome variable of interest. In this example, the outcome might be a test of achievement in the subject being taught. We will assume that scores on this test range from 1 to 10 with higher values indicating greater achievement. You should study carefully the outcomes in each figure in order to understand the differences between these cases.

The Null Outcome

Let's begin by looking at the "null" case. The null case is a situation where the treatments have no effect. This figure assumes that even if we didn't give the training we could expect that students would score a 5 on average on the outcome test. You can see in this hypothetical case that all four groups score an average of 5 and therefore the row and column averages must be 5. You can't see the lines for both levels in the graphs because one line falls right on top of the other.

The Main Effects

A main effect is an outcome that is a consistent difference between levels of a factor. For instance, we would say there's a main effect for setting if we find a statistical difference between the averages for the in-class and pull-out groups, at all levels of time in instruction. The first figure depicts a main effect of time. For all settings, the 4 hour/week condition worked better than the 1 hour/week one. It is also possible to have a main effect for setting (and none for time). In the second main effect graph we see that in-class training was better than pull-out training for all amounts of time.

Finally, it is possible to have a main effect on both variables simultaneously, as depicted in the third main effect figure. In this instance 4 hours/week always works better than 1 hour/week, and the in-class setting always works better than pull-out.

Interaction Effects

If we could only look at main effects, factorial designs would be useful. But, because of the way we combine levels in factorial designs, they also enable us to examine the interaction effects that exist between factors. An interaction effect exists when differences on one factor depend on which level of another factor you are on. It's important to recognize that an interaction is between factors, not levels. We wouldn't say there's an interaction between 4 hours/week and in-class treatment. Instead, we would say that there's an interaction between time and setting, and then we would go on to describe the specific levels involved.

How do you know if there is an interaction in a factorial design? There are three ways you can determine whether there's an interaction. First, when you run the statistical analysis, the statistical table will report on all main effects and interactions. Second, you know there's an interaction when you can't talk about the effect of one factor without mentioning the other factor. If you can say at the end of our study that time in instruction makes a difference, then you know that you have a main effect and not an interaction (because you did not have to mention the setting factor when describing the results for time). On the other hand, when you have an interaction it is impossible to describe your results accurately without mentioning both factors. Finally, you can always spot an interaction in the graphs of group means -- whenever there are lines that are not parallel, an interaction is present! If you check out the main effect graphs above, you will notice that all of the lines within a graph are parallel. In contrast, for all of the interaction graphs, you will see that the lines are not parallel.
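To illustrate the first of these three ways, here is a sketch of what the statistical table might look like in practice. It assumes simulated achievement scores (the cell means are invented to mimic the cross-over pattern described below) and uses the Python statsmodels package to fit the 2 x 2 model and print the main effects and the interaction.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

rng = np.random.default_rng(0)

# Invented cell means on a 1-10 achievement scale: a "cross-over" interaction,
# where which setting works best depends on the amount of instruction time.
cell_means = {("1 hour", "in-class"): 5, ("1 hour", "pull-out"): 7,
              ("4 hours", "in-class"): 7, ("4 hours", "pull-out"): 5}

rows = []
for (time, setting), mu in cell_means.items():
    for score in rng.normal(mu, 1.0, size=25):          # 25 simulated cases per group
        rows.append({"time": time, "setting": setting, "score": score})
data = pd.DataFrame(rows)

# The '*' crosses the factors: the ANOVA table reports both main effects
# and the time-by-setting interaction.
model = ols("score ~ C(time) * C(setting)", data=data).fit()
print(sm.stats.anova_lm(model, typ=2))
```

With cell means like these, the interaction row of the table is large while the main effect rows are small -- exactly the situation where you cannot describe the results for one factor without mentioning the other.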

In the first interaction effect graph, we see that one combination of levels -- 4 hours/week and in-class setting -- does better than the other three. In the second interaction we have a more complex "cross-over" interaction. Here, at 1 hour/week the pull-out group does better than the in-class group, while at 4 hours/week the reverse is true. Furthermore, both of these combinations of levels do equally well.

Summary

Factorial design has several important features. First, it has great flexibility for exploring or enhancing the "signal" (treatment) in our studies. Whenever we are interested in examining treatment variations, factorial designs should be strong candidates as the designs of choice. Second, factorial designs are efficient. Instead of conducting a series of independent studies, we are effectively able to combine these studies into one. Finally, factorial designs are the only effective way to examine interaction effects.

So far, we have only looked at a very simple 2 x 2 factorial design structure. You may want to look at some factorial
design variations to get a deeper understanding of how they work. You may also want to examine how we approach
the statistical analysis of factorial experimental designs.

Statistical Analysis of Factorial Designs

Randomized Block Designs

The Randomized Block Design is research design's equivalent to stratified random sampling. Like stratified sampling, randomized block
designs are constructed to reduce noise or variance in the data (see Classifying the Experimental Designs). How do they do it? They
require that the researcher divide the sample into relatively homogeneous subgroups or blocks (analogous to "strata" in stratified
sampling). Then, the experimental design you want to implement is implemented within each block or homogeneous subgroup. The key
idea is that the variability within each block is less than the variability of the entire sample. Thus each estimate of the treatment effect
within a block is more efficient than estimates across the entire sample. And, when we pool these more efficient estimates across blocks,
we should get an overall more efficient estimate than we would without blocking.

Here, we can see a simple example. Let's assume that we originally intended to conduct a simple posttest-only randomized experimental design. But, we recognize that our sample has several intact or homogeneous subgroups. For instance, in a study of college students, we might expect that students are relatively homogeneous with respect to class or year. So, we decide to block the sample into four groups: freshman, sophomore, junior, and senior. If our hunch is correct -- that the variability within class is less than the variability for the entire sample -- we will probably get more powerful estimates of the treatment effect within each block (see the discussion on Statistical Power). Within each of our four blocks, we would implement the simple posttest-only randomized experiment.

Notice a couple of things about this strategy. First, to an external observer, it may not be apparent that you are blocking. You would be implementing the same design in each block. And, there is no reason that the people in different blocks need to be segregated or separated from each other. In other words, blocking doesn't necessarily affect anything that you do with the research participants. Instead, blocking is a strategy for grouping people in your data analysis in order to reduce noise -- it is an analysis strategy. Second, you will only benefit from a blocking design if you are correct in your hunch that the blocks are more homogeneous than the entire sample is. If you are wrong -- if different college-level classes aren't relatively homogeneous with respect to your measures -- you will actually be hurt by blocking (you'll get a less powerful estimate of the treatment effect). How do you know if blocking is a good idea? You need to consider carefully whether the groups are relatively homogeneous. If you are measuring political attitudes, for instance, is it reasonable to believe that freshmen are more like each other than they are like sophomores or juniors? Would they be more homogeneous with respect to measures related to drug abuse? Ultimately the decision to block involves judgment on the part of the researcher.

How Blocking Reduces Noise


So how does blocking work to reduce noise in the data? To see how it works, you have to begin by thinking about the non-blocked study. The figure shows the pretest-posttest distribution for a hypothetical pre-post randomized experimental design. We use the 'X' symbol to indicate a program group case and the 'O' symbol for a comparison group member. You can see that for any specific pretest value, the program group tends to outscore the comparison group by about 10 points on the posttest. That is, there is about a 10-point posttest mean difference.

Now, let's consider an example where we divide the sample into three relatively homogeneous blocks. To see what happens graphically, we'll use the pretest measure to block. This will assure that the groups are very homogeneous. Let's look at what is happening within the third block. Notice that the mean difference is still the same as it was for the entire sample -- about 10 points within each block. But also notice that the variability of the posttest is much less than it was for the entire sample. Remember that the treatment effect estimate is a signal-to-noise ratio. The signal in this case is the mean difference. The noise is the variability. The two figures show that we haven't changed the signal in moving to blocking -- there is still about a 10-point posttest difference. But, we have changed the noise -- the variability on the posttest is much smaller within each block than it is for the entire sample. So, the treatment effect will have less noise for the same signal.
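A small simulation can make the same point numerically. The sketch below (Python, with invented numbers) blocks cases on the pretest and compares the posttest spread within each block to the spread in the whole sample; the roughly 10-point treatment signal is unchanged while the within-block noise shrinks.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300

# Hypothetical pre-post data: the posttest tracks the pretest, plus a
# 10-point program effect for treated ('X') cases and some random noise.
pre = rng.normal(50, 10, n)
treated = rng.integers(0, 2, n).astype(bool)
post = pre + 10 * treated + rng.normal(0, 3, n)

# Block on the pretest (three roughly homogeneous blocks).
blocks = np.digitize(pre, np.quantile(pre, [1 / 3, 2 / 3]))

print("posttest SD, whole sample:", post.std().round(1))
for b in range(3):
    in_block = blocks == b
    sd = post[in_block].std().round(1)
    diff = (post[in_block & treated].mean() - post[in_block & ~treated].mean()).round(1)
    print(f"block {b}: posttest SD = {sd}, treatment difference = {diff}")
```

In a run like this the treatment difference stays near 10 points in every block, while the posttest standard deviation within a block is much smaller than for the whole sample -- less noise for the same signal.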

It should be clear from the graphs that the blocking design in this case will yield the
stronger treatment effect. But this is true only because we did a good job assuring that
the blocks were homogeneous. If the blocks weren't homogeneous -- their variability
was as large as the entire sample's -- we would actually get worse estimates than in
the simple randomized experimental case. We'll see how to analyze data from a
randomized block design in the Statistical Analysis of the Randomized Block Design.

Statistical Analysis of The Randomized Block Design

Covariance Designs

Design Notation

The basic Analysis of Covariance Design (ANCOVA or ANACOVA) is just a pretest-posttest randomized experimental design. The notation shown here suggests that the pre-program measure is the same one as the post-program measure (otherwise we would use subscripts to distinguish the two), and so we would call this a pretest. But you should note that the pre-program measure doesn't have to be a pretest -- it can be any variable measured prior to the program intervention. It is also possible for a study to have more than one covariate.

The pre-program measure or pretest is sometimes also called a "covariate" because of the way it's used in the data analysis -- we "covary"
it with the outcome variable or posttest in order to remove variability or noise. Thus, the ANCOVA design falls in the class of a "noise
reduction" experimental design (see Classifying the Experimental Designs).

In social research we frequently hear about statistical "adjustments" that attempt to control for important factors in our study. For instance,
we might read that an analysis "examined posttest performance after adjusting for the income and educational level of the participants."
In this case, "income" and "education level" are covariates. Covariates are the variables you "adjust for" in your study. Sometimes the
language that will be used is that of "removing the effects" of one variable from another. For instance, we might read that an analysis
"examined posttest performance after removing the effect of income and educational level of the participants."

How Does A Covariate Reduce Noise?

One of the most important ideas in social research is how we make a statistical adjustment -- adjust one variable based on its covariance
with another variable. If you understand this idea, you'll be well on your way to mastering social research. What I want to do here is to
show you a series of graphs that illustrate pictorially what we mean by adjusting for a covariate.

Let's begin with data from a simple ANCOVA design as described above. The first figure shows the pre-post bivariate distribution. Each
"dot" on the graph represents the pretest and posttest score for an individual. We use an 'X' to signify a program or treated case and an 'O'
to describe a control or comparison case. You should be able to see a few things immediately. First, you should be able to see a whopping
treatment effect! It's so obvious that you don't even need statistical analysis to tell you whether there's an effect (although you may want to
use statistics to estimate its size and probability). How do I know there's an effect? Look at any pretest value (value on the horizontal axis).
Now, look up from that value -- you are looking up the posttest scale from lower to higher posttest scores. Do you see any pattern with
respect to the groups? It should be obvious to you that the program cases (the 'X's) tend to score higher on the posttest at any given
pretest value. Second, you should see that the posttest variability has a range of about 70 points.
Now, let's fit some straight lines to the data. The lines on the graph are regression lines that describe the pre-post relationship for each of
the groups. The regression line shows the expected posttest score for any pretest score. The treatment effect is even clearer with the
regression lines. You should see that the line for the treated group is about 10 points higher than the line for the comparison group at any
pretest value.

What we want to do is remove some of the variability in the posttest while preserving the difference between the groups. Or, in other
terms, we want to "adjust" the posttest scores for pretest variability. In effect, we want to "subtract out" the pretest. You might think of this
as subtracting the line for each group from the data for each group. How do we do that? Well, why don't we actually subtract?!? Find the
posttest difference between the line for a group and each actual value. We call each of these differences a residual -- it's what's left over
when you subtract a line from the data.
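In code, "subtracting the line" is just computing residuals. Here is a minimal sketch (Python, with invented pre-post data) that fits a regression line within each group and compares the spread of the raw posttest values with the spread of the residuals.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200

# Invented data: posttest follows the pretest, with a 10-point program effect.
pre = rng.normal(50, 10, n)
treated = rng.integers(0, 2, n).astype(bool)           # 'X' vs 'O' cases
post = 0.9 * pre + 10 * treated + rng.normal(0, 5, n)

residuals = np.empty(n)
for group in (True, False):
    m = treated == group
    slope, intercept = np.polyfit(pre[m], post[m], 1)   # regression line for this group
    residuals[m] = post[m] - (slope * pre[m] + intercept)

print("posttest range:", (post.max() - post.min()).round(1))
print("residual range:", (residuals.max() - residuals.min()).round(1))  # much smaller
```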
Now, here comes the tricky part. What does the data look like when we subtract out a line? You might think of it almost like turning the
above graph clockwise until the regression lines are horizontal. The figures below show this in two steps. First, I construct an x-y axis
system where the x dimension is parallel to the regression lines.

Then, I actually turn the graph clockwise so that the regression lines are now flat horizontally. Now, look at how big the posttest variability
or range is in the figure (as indicated by the red double arrow). You should see that the range is considerably smaller than the 70 points we
started out with above. You should also see that the difference between the lines is the same as it was before. So, we have in effect
reduced posttest variability while maintaining the group difference. We've lowered the noise while keeping the signal at its original
strength. The statistical adjustment procedure will result in a more efficient and more powerful estimate of the treatment effect.
You should also note the shape of the pre-post relationship. Essentially, the plot now looks like a zero correlation between the pretest and posttest and,
in fact, it is. How do I know it's a zero correlation? Because any line that can be fitted through the data well would be horizontal. There's no
slope or relationship. And, there shouldn't be. This graph shows the pre-post relationship after we've removed the pretest! If we've
removed the pretest from the posttest there will be no pre-post correlation left.

Finally, let's redraw the axes to indicate that the pretest has been removed. Here, the posttest values are the original posttest values minus
the line (the predicted posttest values). That's why we see that the new posttest axis has 0 at its center. Negative values on the posttest
indicate that the original point fell below the regression line on the original axis. Here, we can better estimate that the posttest range is
about 50 points instead of the original 70, even though the difference between the regression lines is the same. We've lowered the noise
while retaining the signal.
[DISCLAIMER: OK, I know there's some statistical hot-shot out there fuming about the inaccuracy in my description above. My picture rotation is not
exactly what we do when we adjust for a covariate. My description suggests that we drop perpendicular lines from the regression line to each point to
obtain the subtracted difference. In fact, we drop lines that are perpendicular to the horizontal axis, not the regression line itself (in Least Squares
regression we are minimizing the sum of squares of the residuals on the dependent variable, not jointly on the independent and dependent
variable). In any event, while my explanation may not be perfectly accurate from a statistical point of view, it's not very far off, and I think it conveys
more clearly the idea of subtracting out a relationship. I thought I'd just put this disclaimer in to let you know I'm not dumb enough to believe that the
description above is perfectly accurate.]

The adjustment for a covariate in the ANCOVA design is accomplished with the statistical analysis, not through rotation of graphs. See the
Statistical Analysis of the Analysis of Covariance Design for details.
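For the curious, a minimal sketch of how that analysis is typically specified in software (here the Python statsmodels package, with made-up variable names and values) looks like this: the pretest enters the model as a covariate alongside the treatment indicator, and the coefficient on the treatment term is the covariate-adjusted effect.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: a pretest, a posttest, and a 0/1 treatment flag.
data = pd.DataFrame({
    "pre":   [48, 52, 55, 61, 47, 50, 58, 63],
    "group": [0, 0, 0, 0, 1, 1, 1, 1],
    "post":  [50, 54, 57, 62, 59, 61, 68, 74],
})

# ANCOVA: "covarying out" the pretest while estimating the group difference.
model = smf.ols("post ~ pre + C(group)", data=data).fit()
print(model.params)
```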

Summary

Some thoughts to conclude this topic. The ANCOVA design is a noise-reducing experimental design. It "adjusts" posttest scores for
variability on the covariate (pretest). This is what we mean by "adjusting" for the effects of one variable on another in social research. You
can use any continuous variable as a covariate, but the pretest is usually best. Why? Because the pretest is usually the variable that would
be most highly correlated with the posttest (a variable should correlate highly with itself, shouldn't it?). Because it's so highly correlated,
when you "subtract it out" or "remove' it, you're removing more extraneous variability from the posttest. The rule in selecting covariates is
to select the measure(s) that correlate most highly with the outcome and, for multiple covariates, have little intercorrelation (otherwise,
you're just adding in redundant covariates and you will actually lose precision by doing that). For example, you probably wouldn't want to
use both gross and net income as two covariates in the same analysis because they are highly related and therefore redundant as
adjustment variables.

Statistical Analysis of the Analysis of Covariance Design

Hybrid Experimental Designs

Hybrid experimental designs are just what the name implies -- new strains that are formed by combining features of more established
designs. There are lots of variations that could be constructed from standard design features. Here, I'm going to introduce two hybrid
designs. I'm featuring these because they illustrate especially well how a design can be constructed to address specific threats to internal
validity.

The Solomon Four-Group Design

The Solomon Four-Group Design is designed to deal with a potential testing threat. Recall
that a testing threat occurs when the act of taking a test affects how people score on a retest
or posttest. The design notation is shown in the figure. It's probably not a big surprise that
this design has four groups. Note that two of the groups receive the treatment and two do
not. Further, two of the groups receive a pretest and two do not. One way to view this is as a
2x2 (Treatment Group X Measurement Group) factorial design. Within each treatment
condition we have a group that is pretested and one that is not. By explicitly including testing
as a factor in the design, we are able to assess experimentally whether a testing threat is
operating.

Possible Outcomes. Let's look at a couple of possible outcomes from this design. The first outcome graph shows what the data might
look like if there is a treatment or program effect and there is no testing threat. You need to be careful in interpreting this graph to note that
there are six dots -- one to represent the average for each O in the design notation. To help you visually see the connection between the
pretest and posttest average for the same group, a line is used to connect the dots. The two dots that are not connected by a line
represent the two post-only groups. Look first at the two pretest means. They are close to each other because the groups were randomly
assigned. On the posttest, both treatment groups outscored both controls. Now, look at the posttest values. There appears to be no
difference between the treatment groups, even though one got a pretest and the other did not. Similarly, the two control groups scored
about the same on the posttest. Thus, the pretest did not appear to affect the outcome. But both treatment groups clearly outscored both
controls. There is a main effect for the treatment.

Now, look at a result where there is evidence of a testing threat. In this outcome, the pretests are again equivalent (because the groups
were randomly assigned). Each treatment group outscored its comparable control group. The pre-post treatment outscored the pre-post
control. And, the post-only treatment outscored the post-only control. These results indicate that there is a treatment effect. But here, both
groups that had the pretest outscored their comparable non-pretest group. That's evidence for a testing threat.

Switching Replications Design

The Switching Replications design is one of the strongest of the


experimental designs. And, when the circumstances are right for this design,
it addresses one of the major problems in experimental designs -- the need
to deny the program to some participants through random assignment. The
design notation indicates that this is a two group design with three waves of
measurement. You might think of this as two pre-post treatment-control
designs grafted together. That is, the implementation of the treatment is repeated or replicated. And in the repetition of the treatment, the
two groups switch roles -- the original control group becomes the treatment group in phase 2 while the original treatment group acts as the
control. By the end of the study all participants have received the treatment.

The switching replications design is most feasible in organizational contexts where programs are repeated at regular intervals. For
instance, it works especially well in schools that are on a semester system. All students are pretested at the beginning of the school year.
During the first semester, Group 1 receives the treatment and during the second semester Group 2 gets it. The design also enhances
organizational efficiency in resource allocation. Schools only need to allocate enough resources to give the program to half of the students
at a time.

Possible Outcomes. Let's look at two possible outcomes. In the first example, we see that when the program is given to the first group,
the recipients do better than the controls. In the second phase, when the program is given to the original controls, they "catch up" to the
original program group. Thus, we have a converge, diverge, reconverge outcome pattern. We might expect a result like this when the
program covers specific content that the students master in the short term and where we don't expect that they will continue getting better
as a result.
Now, look at the other example result. During the first phase we see the same result as before -- the program group improves while the
control does not. And, as before, during the second phase we see the original control group, now the program group, improve as much as
did the first program group. But now, during phase two, the original program group continues to increase even though the program is no
longer being given to them. Why would this happen? It could happen in circumstances where the program has continuing and longer-term
effects. For instance, if the program focused on learning skills, students might continue to improve even after the formal program period
because they continue to apply the skills and improve in them.

I said at the outset that both the Solomon Four-Group and the Switching Replications designs addressed specific threats to internal
validity. It's obvious that the Solomon design addressed a testing threat. But what does the switching replications design address?
Remember that in randomized experiments, especially when the groups are aware of each other, there is the potential for social threats --
compensatory rivalry, compensatory equalization and resentful demoralization are all likely to be present in educational contexts where
programs are given to some students and not to others. The switching replications design helps mitigate these threats because it assures
that everyone will eventually get the program. And, it allocates who gets the program first in the fairest possible manner, through the lottery
of random assignment.

Statistical Power

One of the most interesting introductions to the idea of statistical power is given in the 'OJ' Page which was created
by Rob Becker to illustrate how the decision a jury has to reach (guilty vs. not guilty) is similar to the decision a
researcher makes when assessing a relationship. The OJ Page uses the infamous OJ Simpson murder trial to
introduce the idea of statistical power and illustrate how manipulating various factors (e.g., the amount of evidence,
the "effect size", and the level of risk) affects the validity of the verdict.

There are four interrelated components that influence the conclusions you might reach from a statistical test in a
research project. The logic of statistical inference with respect to these components is often difficult to understand
and explain. This paper attempts to clarify the four components and describe their interrelationships.

The four components are:

sample size, or the number of units (e.g., people) accessible to the study
effect size, or the salience of the treatment relative to the noise in measurement
alpha level (α, or significance level), or the odds that the observed result is due to chance
power, or the odds that you will observe a treatment effect when it occurs

Given values for any three of these components, it is possible to compute the value of the fourth. For instance, you
might want to determine what a reasonable sample size would be for a study. If you could make reasonable
estimates of the effect size, alpha level and power, it would be simple to compute (or, more likely, look up in a table)
the sample size.
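Here is a rough illustration of that computation, using the power routines in the Python statsmodels package and assuming a simple two-group comparison with made-up values for the other three components.

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Given effect size, alpha, and desired power, solve for the sample size per group.
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80)
print(round(n_per_group))   # roughly 64 per group for a medium-sized effect

# Given a fixed sample size instead, solve for the power you can expect.
power = analysis.solve_power(effect_size=0.5, nobs1=25, alpha=0.05)
print(round(power, 2))
```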

Some of these components will be more manipulable than others depending on the circumstances of the project. For
example, if the project is an evaluation of an educational program or counseling program with a specific number of
available consumers, the sample size is set or predetermined. Or, if the drug dosage in a program has to be small
due to its potential negative side effects, the effect size may consequently be small. The goal is to achieve a balance
of the four components that allows the maximum level of power to detect an effect if one exists, given programmatic,
logistical or financial constraints on the other components.

Figure 1 shows the basic decision matrix involved in a statistical conclusion. All statistical conclusions involve
constructing two mutually exclusive hypotheses, termed the null (labeled H0) and alternative (labeled H1)
hypothesis. Together, the hypotheses describe all possible outcomes with respect to the inference. The central
decision involves determining which hypothesis to accept and which to reject. For instance, in the typical case, the
null hypothesis might be:

H0: Program Effect = 0

while the alternative might be:

H1: Program Effect ≠ 0

The null hypothesis is so termed because it usually refers to the "no difference" or "no effect" case. Usually in social
research we expect that our treatments and programs will make a difference. So, typically, our theory is described in
the alternative hypothesis.

Figure 1 below is a complex figure that you should take some time studying. First, look at the header row (the
shaded area). This row depicts reality -- whether there really is a program effect, difference, or gain. Of course, the
problem is that you never know for sure what is really happening (unless you’re God). Nevertheless, because we
have set up mutually exclusive hypotheses, one must be right and one must be wrong. Therefore, consider this the
view from God’s position, knowing which hypothesis is correct. The first column of the 2x2 table shows the case
where our program does not have an effect; the second column shows where it does have an effect or make a
difference.

The left header column describes the world we mortals live in. Regardless of what’s true, we have to make decisions
about which of our hypotheses is correct. This header column describes the two decisions we can reach -- that our
program had no effect (the first row of the 2x2 table) or that it did have an effect (the second row).

Now, let’s examine the cells of the 2x2 table. Each cell shows the Greek symbol for that cell. Notice that the columns
sum to 1 (i.e., α + (1-α) = 1 and β + (1-β) = 1). Why can we sum down the columns, but not across the rows?
Because if one column is true, the other is irrelevant -- if the program has a real effect (the right column) it can’t at
the same time not have one. Therefore, the odds or probabilities have to sum to 1 for each column because the two
rows in each column describe the only possible decisions (accept or reject the null/alternative) for each possible
reality.

Below the Greek symbol is a typical value for that cell. You should especially note the values in the bottom two cells.
The value of α is typically set at .05 in the social sciences. A newer, but growing, tradition is to try to achieve a
statistical power of at least .80. Below the typical values is the name typically given for that cell (in caps). If you
haven’t already, you should note that two of the cells describe errors -- you reach the wrong conclusion -- and in the
other two you reach the correct conclusion. Sometimes it’s hard to remember which error is Type I and which is
Type II. If you keep in mind that Type I is the same as the α or significance level, it might help you to remember that
it is the odds of finding a difference or effect by chance alone. People are more likely to be susceptible to a Type I
error, because they almost always want to conclude that their program works. If they find a statistical effect, they
tend to advertise it loudly. On the other hand, people probably check more thoroughly for Type II errors because
when you find that the program was not demonstrably effective, you immediately start looking for why (in this case,
you might hope to show that you had low power and high β -- that the odds of saying there was no treatment effect
even when there was were too high). Following the capitalized common name are several different ways of
describing the value of each cell, one in terms of outcomes and one in terms of theory-testing. In italics, we give an
example of how to express the numerical value in words.

To better understand the strange relationships between the two columns, think about what happens if you want to
increase your power in a study. As you increase power, you increase the chances that you are going to find an effect
if it’s there (wind up in the bottom row). But, if you increase the chances that you wind up in the bottom row, you
must at the same time be increasing the chances of making a Type I error! Although we can’t sum to 1 across rows,
there is clearly a relationship. Since we usually want high power and low Type I Error, you should be able to
appreciate that we have a built-in tension here.
In reality, either the null hypothesis (H0) is true and the alternative (H1) is false -- there is no relationship, no difference or gain, and our theory is wrong -- or H0 is false and H1 is true -- there is a relationship, there is a difference or gain, and our theory is correct. Crossing these two possible realities with the two decisions we can reach gives the four cells of the matrix.

We accept the null hypothesis (H0) and reject the alternative hypothesis (H1). We say "There is no relationship," "There is no difference, no gain," "Our theory is wrong."

    If in reality there is no relationship (H0 true): 1-α (e.g., .95) -- THE CONFIDENCE LEVEL. The odds of saying there is no relationship, difference, or gain when in fact there is none; the odds of correctly not confirming our theory. 95 times out of 100, when there is no effect, we'll say there is none.

    If in reality there is a relationship (H0 false): β (e.g., .20) -- TYPE II ERROR. The odds of saying there is no relationship, difference, or gain when in fact there is one; the odds of not confirming our theory when it's true. 20 times out of 100, when there is an effect, we'll say there isn't.

We reject the null hypothesis (H0) and accept the alternative hypothesis (H1). We say "There is a relationship," "There is a difference or gain," "Our theory is correct."

    If in reality there is no relationship (H0 true): α (e.g., .05) -- TYPE I ERROR (SIGNIFICANCE LEVEL). The odds of saying there is a relationship, difference, or gain when in fact there is not; the odds of confirming our theory incorrectly. 5 times out of 100, when there is no effect, we'll say there is one. We should keep this small when we can't afford to risk wrongly concluding that our program works.

    If in reality there is a relationship (H0 false): 1-β (e.g., .80) -- POWER. The odds of saying that there is a relationship, difference, or gain when in fact there is one; the odds of confirming our theory correctly. 80 times out of 100, when there is an effect, we'll say there is. We generally want this to be as large as possible.

Figure 1. The Statistical Inference Decision Matrix

We often talk about alpha (α) and beta (β) using the language of "higher" and "lower." For instance, we might talk
about the advantages of a higher or lower α-level in a study. You have to be careful about interpreting the meaning
of these terms. When we talk about higher α-levels, we mean that we are increasing the chance of a Type I Error.
Therefore, a lower α-level actually means that you are conducting a more rigorous test. With all of this in mind, let’s
consider a few common associations evident in the table. You should convince yourself of the following:

the lower the α, the lower the power; the higher the α, the higher the power
the lower the α, the less likely it is that you will make a Type I Error (i.e., reject the null when it’s true)
the lower the α, the more "rigorous" the test
an α of .01 (compared with .05 or .10) means the researcher is being relatively careful, s/he is only willing to
risk being wrong 1 in a 100 times in rejecting the null when it’s true (i.e., saying there’s an effect when there
really isn’t)
an α of .01 (compared with .05 or .10) limits one’s chances of ending up in the bottom row, of concluding
that the program has an effect. This means that both your statistical power and the chances of making a
Type I Error are lower.
an α of .01 means you have a 99% chance of saying there is no difference when there in fact is no
difference (being in the upper left box)
increasing α (e.g., from .01 to .05 or .10) increases the chances of making a Type I Error (i.e., saying there
is a difference when there is not), decreases the chances of making a Type II Error (i.e., saying there is no
difference when there is) and decreases the rigor of the test
increasing α (e.g., from .01 to .05 or .10) increases power because one will be rejecting the null more often
(i.e., accepting the alternative) and, consequently, when the alternative is true, there is a greater chance of
accepting it (i.e., power)
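You can check the first of these relationships numerically. The sketch below (again assuming the statsmodels power routines and made-up numbers) holds the effect size and sample size fixed and shows power rising as α rises.

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Fixed medium effect and 50 cases per group; only alpha changes.
for alpha in (0.01, 0.05, 0.10):
    power = analysis.solve_power(effect_size=0.5, nobs1=50, alpha=alpha)
    print(f"alpha = {alpha:.2f}  ->  power = {power:.2f}")
```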


Threats to Conclusion Validity

A threat to conclusion validity is a factor that can lead you to reach an incorrect conclusion about a relationship in
your observations. You can essentially make two kinds of errors about relationships:

1. conclude that there is no relationship when in fact there is (you missed the relationship or
didn't see it)
2. conclude that there is a relationship when in fact there is not (you're seeing things that
aren't there!)

Most threats to conclusion validity have to do with the first problem. Why? Maybe it's because it's so hard in most
research to find relationships in our data at all that it's not as big or frequent a problem -- we tend to have more
problems finding the needle in the haystack than seeing things that aren't there! So, I'll divide the threats by the type
of error they are associated with.

Finding no relationship when there is one (or, "missing the needle in the haystack")

When you're looking for the needle in the haystack you essentially have two basic problems: the tiny
needle and too much hay. You can view this as a signal-to-noise ratio problem. The "signal" is the
needle -- the relationship you are trying to
see. The "noise" consists of all of the
factors that make it hard to see the
relationship. There are several important
sources of noise, each of which is a threat
to conclusion validity. One important threat
is low reliability of measures (see
reliability). This can be due to many
factors including poor question wording,
bad instrument design or layout, illegibility
of field notes, and so on. In studies where you are evaluating a program you can introduce noise
through poor reliability of treatment implementation. If the program doesn't follow the prescribed
procedures or is inconsistently carried out, it will be harder to see relationships between the
program and other factors like the outcomes. Noise that is caused by random irrelevancies in the
setting can also obscure your ability to see a relationship. In a classroom context, the traffic outside
the room, disturbances in the hallway, and countless other irrelevant events can distract the
researcher or the participants. The types of people you have in your study can also make it harder
to see relationships. The threat here is due to random heterogeneity of respondents. If you have
a very diverse group of respondents, they are likely to vary more widely on your measures or
observations. Some of their variety may be related to the phenomenon you are looking at, but at
least part of it is likely to just constitute individual differences that are irrelevant to the relationship
being observed.
All of these threats add variability into the research context and contribute to the "noise" relative to
the signal of the relationship you are looking for. But noise is only one part of the problem. We also
have to consider the issue of the signal -- the true strength of the relationship. There is one broad
threat to conclusion validity that tends to subsume or encompass all of the noise-producing factors
above and also takes into account the strength of the signal, the amount of information you collect,
and the amount of risk you're willing to take in making a decision about whether a relationship
exists. This threat is called low statistical power. Because this idea is so important in
understanding how we make decisions about relationships, we have a separate discussion of
statistical power.

Finding a relationship when there is not one (or "seeing things that aren't there")

In anything but the most trivial research study, the researcher will spend a considerable amount of
time analyzing the data for relationships. Of course, it's important to conduct a thorough analysis,
but most people are well aware of the fact that if you play with the data long enough, you can often
"turn up" results that support or corroborate your hypotheses. In more everyday terms, you are
"fishing" for a specific result by analyzing the data repeatedly under slightly differing conditions or
assumptions.

In statistical analysis, we attempt to determine the probability that the finding we get is a "real" one
or could have been a "chance" finding. In fact, we often use this probability to decide whether to
accept the statistical result as evidence that there is a relationship. In the social sciences,
researchers often use the rather arbitrary value known as the 0.05 level of significance to decide
whether their result is credible or could be considered a "fluke." Essentially, the value 0.05 means
that the result you got could be expected to occur by chance at least 5 times out of every 100 times
you run the statistical analysis. The probability assumption that underlies most statistical analyses
assumes that each analysis is "independent" of the other. But that may not be true when you
conduct multiple analyses of the same data. For instance, let's say you conduct 20 statistical tests
and for each one you use the 0.05 level criterion for deciding whether you are observing a
relationship. For each test, the odds are 5 out of 100 that you will see a relationship even if there is
not one there (that's what it means to say that the result could be "due to chance"). Odds of 5 out of
100 are equal to the fraction 5/100 which is also equal to 1 out of 20. Now, in this example, you
conduct 20 separate analyses. Let's say that you find that of the twenty results, only one is
statistically significant at the 0.05 level. Does that mean you have found a statistically significant
relationship? If you had only done the one analysis, you might conclude that you've found a
relationship in that result. But if you did 20 analyses, you would expect to find one of them
significant by chance alone, even if there is no real relationship in the data. We call this threat to
conclusion validity fishing and the error rate problem. The basic problem is that you were
"fishing" by conducting multiple analyses and treating each one as though it was independent.
Instead, when you conduct multiple analyses, you should adjust the error rate (i.e., significance
level) to reflect the number of analyses you are doing. The bottom line here is that you are more
likely to see a relationship when there isn't one when you keep reanalyzing your data and don't take
that fishing into account when drawing your conclusions.
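The arithmetic behind this threat is easy to verify. A minimal sketch in plain Python computes the chance of at least one "significant" result across 20 independent tests when no real relationship exists, along with the simple Bonferroni-style adjustment to the per-test significance level.

```python
alpha = 0.05
tests = 20

# Chance of at least one false positive across 20 independent tests.
familywise = 1 - (1 - alpha) ** tests
print(round(familywise, 2))        # about 0.64 -- far higher than 0.05

# One common adjustment: divide the significance level by the number of tests.
adjusted_alpha = alpha / tests
print(adjusted_alpha)              # 0.0025
```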

Problems that can lead to either conclusion error

Every analysis is based on a variety of assumptions about the nature of the data, the procedures
you use to conduct the analysis, and the match between these two. If you are not sensitive to the
assumptions behind your analysis you are likely to draw erroneous conclusions about relationships.
In quantitative research we refer to this threat as the violated assumptions of statistical tests.
For instance, many statistical analyses assume that the data are distributed normally -- that the
population from which they are drawn would be distributed according to a "normal" or "bell-shaped"
curve. If that assumption is not true for your data and you use that statistical test, you are likely to
get an incorrect estimate of the true relationship. And, it's not always possible to predict what type of
error you might make -- seeing a relationship that isn't there or missing one that is.

I believe that the same problem can occur in qualitative research as well. There are assumptions,
some of which we may not even realize, behind our qualitative methods. For instance, in interview
situations we may assume that the respondent is free to say anything s/he wishes. If that is not true
-- if the respondent is under covert pressure from supervisors to respond in a certain way -- you
may erroneously see relationships in the responses that aren't real and/or miss ones that are.

The threats listed above illustrate some of the major difficulties and traps that are involved in one of the most basic
of research tasks -- deciding whether there is a relationship in your data or observations. So, how do we attempt to
deal with these threats? The researcher has a number of strategies for improving conclusion validity through
minimizing or eliminating the threats described above.


Improving Conclusion Validity

So you may have a problem assuring that you are reaching credible conclusions about relationships in your data.
What can you do about it? Here are some general guidelines you can follow in designing your study that will help
improve conclusion validity.

Guidelines for Improving Conclusion Validity

Good Statistical Power. The rule of thumb in social research is that you want statistical
power to be greater than 0.8 in value. That is, you want to have at least 80 chances out of
100 of finding a relationship when there is one. As pointed out in the discussion of
statistical power, there are several factors that interact to affect power. One thing you can
usually do is to collect more information -- use a larger sample size. Of course, you have to
weigh the gain in power against the time and expense of having more participants or
gathering more data. The second thing you can do is to increase your risk of making a
Type I error -- increase the chance that you will find a relationship when it's not there. In
practical terms you can do that statistically by raising the alpha level. For instance, instead
of using a 0.05 significance level, you might use 0.10 as your cutoff point. Finally, you can
increase the effect size. Since the effect size is a ratio of the signal of the relationship to the
noise in the context, there are two broad strategies here. To up the signal, you can
increase the salience of the relationship itself. This is especially true in experimental
contexts where you are looking at the effects of a program or treatment. If you increase the
dosage of the program (e.g., increase the hours spent in training or the number of training
sessions), it will be easier to see the effect when the treatment is stronger. The other option
is to decrease the noise (or, put another way, increase reliability).
Good Reliability. Reliability is related to the idea of noise or "error" that obscures your
ability to see a relationship. In general, you can improve reliability by doing a better job of
constructing measurement instruments, by increasing the number of questions on a scale
(there is a short numerical sketch of this point after these guidelines), or by reducing
situational distractions in the measurement context.
Good Implementation. When you are studying the effects of interventions, treatments or
programs, you can improve conclusion validity by assuring good implementation. This can
be accomplished by training program operators and standardizing the protocols for
administering the program.
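On the second guideline, one way to see why adding comparable items to a scale helps is the Spearman-Brown prophecy formula, which projects the reliability of a lengthened scale from the reliability of the current one. Here is a minimal sketch in plain Python (the starting reliability and the lengthening factor are made up):

```python
def spearman_brown(reliability, length_factor):
    """Projected reliability when a scale is lengthened by `length_factor`
    (e.g., 2.0 doubles the number of comparable items)."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

# A hypothetical scale with reliability .70, doubled in length.
print(round(spearman_brown(0.70, 2.0), 2))   # about 0.82
```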

The 'OJ' Page
©1995 Robert M. Becker, Cornell University

...with your host, Judge Lance Ito

Good afternoon, this court is now in session...I want to welcome you to the OJ trial decision matrix analogy...Although I was quite
busy and all with the "trial of the century" going on, I thought I'd share a few ideas I was kicking around while daydreaming during
Kato's testimony...man, that Kato was quite a character, don't you think?

Anyway, when I wasn't busy breaking up the latest melee between Marcia and Johnny, I was busily concocting this analogy...so sit
back, review the testimony, and if you don't have any further objections, let's get on with the case...

The Double Murders


Unless you've been stuck under a proverbial rock, you're aware that Mr. Simpson was on trial for double murder. While the judicial
system, Mr. Simpson, and the "dream team" stood tough on "innocent until proven guilty," Ms. Clark and Mr. Darden worked
feverishly to prove what they believed to be the truth...that Mr. Simpson did in fact commit the murders...

"The Social Scientists"

...Let's imagine that Ms. Clark and Mr. Darden...aw heck, we've become so close these past months, I'll call them
Marcia and Chris...let's imagine that Marcia and Chris are social scientists instead of just slimy lawyers...they are
social scientists with a theory they want to prove...
The Null and Alternative Hypotheses

As the prosecuting attorneys, their theory was that OJ committed the murders...that OJ Simpson
is guilty...their theory was actually the alternative hypothesis (Ha), with the null hypothesis
(Ho) being OJ Simpson is innocent...

Marcia and Chris labeled their hypotheses and presented them to us as prosecution Exhibit A...

Think of Me as a Facilitator...

While Marcia and Chris presented their theory in court, remember that my role was that of judge, not
jury...similar to a program evaluator facilitating information for stakeholders, I was the facilitator of the
information within the courtroom...I may have made important decisions about how information was
presented within the courtroom (which could have potentially affected the outcome of the case), but I did
not make the final decisions or verdict on the case...

...and the Jury as Stakeholders


...That role was left to the jury...you can think of the jury as a sample of the
population who represented the stakeholders known, in this case, as the State of
California (all citizens of the State.) There were multiple stakeholders in this case --
the victims families, friends, the defendant and his family, friends, the jury, the
State of California, the United States, the judicial system, the lawyers involved in
the case, etc., BUT only the jury made the decision (drew a conclusion) on whether
to accept the null hypothesis (OJ is innocent) or to reject (OJ is guilty)....

The Jury as Critics

What were the criteria that the jury used to make conclusions about Marcia and Chris' hypothesis? Similar to the social sciences, the
jury made decisions based on how well the lawyers presented their evidence...in this sense, we can think of the lawyers as the
conductors and publishers of the research, and the jury as the critics or other social scientists who judge the ultimate validity of the
presented theories or program effects...the stronger the evidence or proof Marcia and Chris can provide for their theory, the more likely
the jury would be to reject the null hypothesis and validate Marcia and Chris' belief that OJ is guilty...

...But just as there are professional standards for providing significant evidence for a social science theory, so too
did the jury have to make a decision based on judicial standards....
Reasonable Doubt as Alpha Level

Recall that social scientists provide evidence based on probability or alpha levels -- the lower the alpha
level, the more evidence needed to prove a theory -- a common alpha level for the social sciences would
be .05, meaning that a researcher will make a conclusion that their theory is correct when they feel 95% or
more certain about their evidence...

The OJ jury would accept the prosecution's theory that OJ is guilty only if all twelve members
conclude beyond a reasonable doubt that OJ did indeed commit the murders...

As in the social sciences, there is a tension in the judicial system between the prosecutor's ability to
provide evidence for their case (power) and the jury's feelings about returning a guilty verdict
(significance or alpha level)....

Standard of Innocent Until Proven Guilty

The standard of reasonable doubt may vary from jury to jury and case to case, but generally, juries, unlike social scientists, may be
more likely to make (or feel comfortable with) a Type II Error based on the notion of "innocent until proven guilty."

The Jury's Low Alpha Level

In such a high profile double murder trial as this, the jury most likely abided by very strict and rigid standards when assessing levels
of reasonable doubt concerning the evidence linking OJ to the murders...this means the jury's alpha level was probably very low,
maybe .01 -- meaning they wanted to be 99% sure that their conclusion matches reality...
...As a result, Marcia and Chris had a harder time getting their theory accepted by the jury -- they had to increase the amount (sample
size) and the persuasiveness (effect size) of their evidence in order to increase the chances that the jury would conclude that their
theory is indeed the correct theory (power)...
Increasing Power through Witnesses

One way the prosecution tried to increase the "power" of their case was to introduce as much testimony as possible linking OJ to the
murders...they did this by calling key witnesses like Alan Park, the limo driver, and Kato Kaelin, OJ's surfer doofus buddy...

Increasing Power through Physical Evidence

They also tried to increase power by introducing as much physical evidence as possible (increasing sample size)...

Increasing Power and Effect Size through DNA

...And they increased power by getting MORE blood samples for DNA testing...and getting BETTER samples to improve effect size
-- in turn improving the overall POWER...
Increasing Power and Effect Size through Motive

The prosecution also attempted to introduce a motive, thus increasing power through increases in sample size (more testimony,
evidence) and effect size (providing more salience to the evidence, theory).

Linking All the Evidence

The goal for the prosecution was to prove that all the evidence linking OJ to the murders was not just coincidental or due to chance...
that the evidence provided a strong enough link between OJ and the murders...that the evidence was not just circumstantial...that the
jury would be making a Type II error if they did not return a guilty verdict...
...While Marcia and Chris made a strong argument for their theory, the defense had a few tricks up their sleeve....

The Dream Team


Reasonable Doubt as Noise

The defense introduced reasonable doubt by introducing as much noise as possible into the prosecution's theory...they called their
own witnesses to refute testimony, thus decreasing effect size...

JOHNNY: "Did he [Fuhrman] ever use the 'N' word?"

MCKINNY : "Yes he did."

Reducing Effect Size

They cross-examined and called into question the credibility of the prosecution's witnesses...once again, introducing noise and reducing
effect size...

--"Detective Fuhrman, did you ever plant or manufacture evidence in this case?"

"I wish to assert my fifth amendment privilege"--

More Noise

The defense provided reasonable doubt (created noise) for the physical evidence...
"If it doesn't fit, you must acquit!"

Any way the defense can introduce noise -- can provide for some kind of alternative explanation for the evidence (i.e. the program
effect, outcome, etc.), the effect size of the prosecution's evidence is reduced...

Drawing a Conclusion

Once all the evidence was presented and contested, the jury then deliberated the matter...
The OJ Decision Matrix

If the jury were to construct their own decision matrix, it might look something like this...
Time for a Vacation!

Well folks, there you have it...and you were wondering what I was doing all day on my
laptop computer (no solitaire for me!) Man, I need a vacation...maybe a nice trip to
Hawaii...so I can begin writing my book on the beach while working on my tan!

All right, ladies and gentlemen, we are going to take our recess for this session . . . please remember all my admonitions to you . . .
don't discuss the case amongst yourselves, don't form any opinions about the case, don't conduct any deliberations until the matter has been
submitted to you, do not allow anybody to communicate with you with regard to the case . . . you may step down, you are ordered to
visit this site again, but don't discuss your testimony with anybody except the lawyers . . . we'll stand in recess until then . . .
The four cells of the OJ decision matrix correspond to four possible headlines:
OJ Receives Justice...OJ is a Free Man! (not-guilty verdict when OJ is in fact innocent -- the correct acquittal)
OJ Gets Off...America Stunned! (not-guilty verdict when OJ is in fact guilty -- a Type II error)
OJ Wrongly Sent to Prison...Free OJ! (guilty verdict when OJ is in fact innocent -- a Type I error)
OJ Found Guilty...OJ Fans are Stunned...He's Staying in Jail! (guilty verdict when OJ is in fact guilty -- power)
Hypotheses

An hypothesis is a specific statement of prediction. It describes in concrete (rather than theoretical) terms what you expect will happen in
your study. Not all studies have hypotheses. Sometimes a study is designed to be exploratory (see inductive research). There is no
formal hypothesis, and perhaps the purpose of the study is to explore some area more thoroughly in order to develop some specific
hypothesis or prediction that can be tested in future research. A single study may have one or many hypotheses.

Actually, whenever I talk about an hypothesis, I am really thinking simultaneously about two hypotheses. Let's say that you predict that
there will be a relationship between two variables in your study. The way we would formally set up the hypothesis test is to formulate two
hypothesis statements, one that describes your prediction and one that describes all the other possible outcomes with respect to the
hypothesized relationship. Your prediction is that variable A and variable B will be related (you don't care whether it's a positive or
negative relationship). Then the only other possible outcome would be that variable A and variable B are not related. Usually, we call the
hypothesis that you support (your prediction) the alternative hypothesis, and we call the hypothesis that describes the remaining
possible outcomes the null hypothesis. Sometimes we use a notation like HA or H1 to represent the alternative hypothesis or your
prediction, and HO or H0 to represent the null case. You have to be careful here, though. In some studies, your prediction might very well
be that there will be no difference or change. In this case, you are essentially trying to find support for the null hypothesis and you are
opposed to the alternative.

If your prediction specifies a direction, and the null therefore is the no difference prediction and the prediction of the opposite direction,
we call this a one-tailed hypothesis. For instance, let's imagine that you are investigating the effects of a new employee training
program and that you believe one of the outcomes will be that there will be less employee absenteeism. Your two hypotheses might be
stated something like this:

The null hypothesis for this study is:

HO: As a result of the XYZ company employee training program, there will either be no significant difference in employee
absenteeism or there will be a significant increase.

which is tested against the alternative hypothesis:

HA: As a result of the XYZ company employee training program, there will be a significant decrease in employee
absenteeism.

In the figure on the left, we see this situation illustrated graphically. The
alternative hypothesis -- your prediction that the program will decrease
absenteeism -- is shown there. The null must account for the other two
possible conditions: no difference, or an increase in absenteeism. The
figure shows a hypothetical distribution of absenteeism differences. We
can see that the term "one-tailed" refers to the tail of the distribution on
the outcome variable.

When your prediction does not specify a direction, we say you have a two-
tailed hypothesis. For instance, let's assume you are studying a new
drug treatment for depression. The drug has gone through some initial
animal trials, but has not yet been tested on humans. You believe (based
on theory and the previous research) that the drug will have an effect, but
you are not confident enough to hypothesize a direction and say the drug will reduce depression (after all, you've seen more than enough
promising drug treatments come along that eventually were shown to have severe side effects that actually worsened symptoms). In this
case, you might state the two hypotheses like this:

The null hypothesis for this study is:


HO: As a result of 300mg./day of the ABC drug, there will be no significant difference in depression.

which is tested against the alternative hypothesis:

HA: As a result of 300mg./day of the ABC drug, there will be a significant difference in depression.

The figure on the right illustrates this two-tailed prediction for this case.
Again, notice that the term "two-tailed" refers to the tails of the distribution
for your outcome variable.

The important thing to remember about stating hypotheses is that you formulate your prediction (directional or not), and then you
formulate a second hypothesis that is mutually exclusive of the first and incorporates all possible alternative outcomes for that case.
When your study analysis is completed, the idea is that you will have to choose between the two hypotheses. If your prediction was
correct, then you would (usually) reject the null hypothesis and accept the alternative. If your original prediction was not supported in
the data, then you will accept the null hypothesis and reject the alternative. The logic of hypothesis testing is based on these two basic
principles:

● the formulation of two mutually exclusive hypothesis statements that, together, exhaust all possible outcomes
● the testing of these so that one is necessarily accepted and the other rejected

OK, I know it's a convoluted, awkward and formalistic way to ask research questions. But it encompasses a long tradition in statistics
called the hypothetico-deductive model, and sometimes we just have to do things because they're traditions. And anyway, if all of this
hypothesis testing was easy enough so anybody could understand it, how do you think statisticians would stay employed?
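To make the one-tailed versus two-tailed distinction concrete, here is a minimal sketch in Python. The simulated change scores and the use of scipy's one-sample t-test are my own illustrative assumptions, not part of the original examples.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical change scores (posttest minus pretest) for 40 employees in the XYZ training program.
absenteeism_change = rng.normal(loc=-2.0, scale=4.0, size=40)

# One-tailed test: HA predicts a decrease, so only a negative change counts as evidence against H0.
t_stat, p_two_sided = stats.ttest_1samp(absenteeism_change, popmean=0.0)
p_one_tailed = p_two_sided / 2 if t_stat < 0 else 1 - p_two_sided / 2
print("one-tailed p =", round(p_one_tailed, 4))

# Hypothetical change in depression scores for 40 patients on 300mg./day of the ABC drug.
depression_change = rng.normal(loc=1.0, scale=5.0, size=40)

# Two-tailed test: HA predicts a significant difference in either direction.
t_stat, p_two_tailed = stats.ttest_1samp(depression_change, popmean=0.0)
print("two-tailed p =", round(p_two_tailed, 4))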


Research involves an eclectic blending of an enormous range of skills and activities. To be a good social researcher, you have to be able
to work well with a wide variety of people, understand the specific methods used to conduct research, understand the subject that you
are studying, be able to convince someone to give you the funds to study it, stay on track and on schedule, speak and write persuasively,
and on and on.

Here, I want to introduce you to five terms that I think help to describe some of the key aspects of contemporary social research. (This list
is not exhaustive. It's really just the first five terms that came into my mind when I was thinking about this and thinking about how I might
be able to impress someone with really big/complex words to describe fairly straightforward concepts).

I present the first two terms -- theoretical and empirical -- together because they are often contrasted with each other. Social research
is theoretical, meaning that much of it is concerned with developing, exploring or testing the theories or ideas that social researchers
have about how the world operates. But it is also empirical, meaning that it is based on observations and measurements of reality -- on
what we perceive of the world around us. You can even think of most research as a blending of these two terms -- a comparison of our
theories about how the world operates with our observations of its operation.

The next term -- nomothetic -- comes (I think) from the writings of the psychologist Gordon Allport. Nomothetic refers to laws or rules
that pertain to the general case (nomos in Greek) and is contrasted with the term "idiographic" which refers to laws or rules that relate to
individuals (idiots in Greek???). In any event, the point here is that most social research is concerned with the nomothetic -- the general
case -- rather than the individual. We often study individuals, but usually we are interested in generalizing to more than just the individual.

In our post-positivist view of science, we no longer regard certainty as attainable. Thus, the fourth big word that describes much
contemporary social research is probabilistic, or based on probabilities. The inferences that we make in social research have
probabilities associated with them -- they are seldom meant to be considered covering laws that pertain to all cases. Part of the reason
we have seen statistics become so dominant in social research is that it allows us to estimate probabilities for the situations we study.

The last term I want to introduce is causal. You've got to be very careful with this term. Note that it is spelled causal not casual. You'll
really be embarrassed if you write about the "casual hypothesis" in your study! The term causal means that most social research is
interested (at some point) in looking at cause-effect relationships. This doesn't mean that most studies actually study cause-effect
relationships. There are some studies that simply observe -- for instance, surveys that seek to describe the percent of people holding a
particular opinion. And, there are many studies that explore relationships -- for example, studies that attempt to see whether there is a
relationship between gender and salary. Probably the vast majority of applied social research consists of these descriptive and
correlational studies. So why am I talking about causal studies? Because for most social sciences, it is important that we go beyond just
looking at the world or looking at relationships. We would like to be able to change the world, to improve it and eliminate some of its
major problems. If we want to change the world (especially if we want to do this in an organized, scientific way), we are automatically
interested in causal relationships -- ones that tell us how our causes (e.g., programs, treatments) affect the outcomes of interest.


There are three basic types of questions that research projects can address:

1. Descriptive. When a study is designed primarily to describe what is going on or what exists. Public opinion polls that seek only to
describe the proportion of people who hold various opinions are primarily descriptive in nature. For instance, if we want to know
what percent of the population would vote for a Democratic or a Republican in the next presidential election, we are simply
interested in describing something.
2. Relational. When a study is designed to look at the relationships between two or more variables. A public opinion poll that
compares what proportion of males and females say they would vote for a Democratic or a Republican candidate in the next
presidential election is essentially studying the relationship between gender and voting preference.
3. Causal. When a study is designed to determine whether one or more variables (e.g., a program or treatment variable) causes or
affects one or more outcome variables. If we did a public opinion poll to try to determine whether a recent political advertising
campaign changed voter preferences, we would essentially be studying whether the campaign (cause) changed the proportion of
voters who would vote Democratic or Republican (effect).

The three question types can be viewed as cumulative. That is, a relational study assumes that you can first describe (by measuring or
observing) each of the variables you are trying to relate. And, a causal study assumes that you can describe both the cause and effect
variables and that you can show that they are related to each other. Causal studies are probably the most demanding of the three.


Time is an important element of any research design, and here I want to introduce one of the most fundamental distinctions in research
design nomenclature: cross-sectional versus longitudinal studies. A cross-sectional study is one that takes place at a single point in
time. In effect, we are taking a 'slice' or cross-section of whatever it is we're observing or measuring. A longitudinal study is one that
takes place over time -- we have at least two (and often more) waves of measurement in a longitudinal design.

A further distinction is made between two types of longitudinal designs: repeated measures and time series. There is no universally
agreed upon rule for distinguishing these two terms, but in general, if you have two or a few waves of measurement, you are using a
repeated measures design. If you have many waves of measurement over time, you have a time series. How many is 'many'? Usually,
we wouldn't use the term time series unless we had at least twenty waves of measurement, and often far more. Sometimes the way we
distinguish these is with the analysis methods we would use. Time series analysis requires that you have at least twenty or so
observations. Repeated measures analyses (like repeated measures ANOVA) aren't often used with as many as twenty waves of
measurement.


A relationship refers to the correspondence between two variables. When we talk about types of relationships, we can mean that in at
least two ways: the nature of the relationship or the pattern of it.

The Nature of a Relationship

While all relationships tell about the correspondence between two variables, there is a special type of relationship that holds that the two
variables are not only in correspondence, but that one causes the other. This is the key distinction between a simple correlational
relationship and a causal relationship. A correlational relationship simply says that two things perform in a synchronized manner. For
instance, we often talk of a correlation between inflation and unemployment. When inflation is high, unemployment also tends to be high.
When inflation is low, unemployment also tends to be low. The two variables are correlated. But knowing that two variables are
correlated does not tell us whether one causes the other. We know, for instance, that there is a correlation between the number of roads
built in Europe and the number of children born in the United States. Does that mean that if we want fewer children in the U.S., we
should stop building so many roads in Europe? Or, does it mean that if we don't have enough roads in Europe, we should encourage
U.S. citizens to have more babies? Of course not. (At least, I hope not). While there is a relationship between the number of roads built and
the number of babies, we don't believe that the relationship is a causal one. This leads to consideration of what is often termed the third
variable problem. In this example, it may be that there is a third variable that is causing both the building of roads and the birthrate, that
is causing the correlation we observe. For instance, perhaps the general world economy is responsible for both. When the economy is
good more roads are built in Europe and more children are born in the U.S. The key lesson here is that you have to be careful when you
interpret correlations. If you observe a correlation between the number of hours students use the computer to study and their grade point
averages (with high computer users getting higher grades), you cannot
assume that the relationship is causal: that computer use improves
grades. In this case, the third variable might be socioeconomic status --
richer students who have greater resources at their disposal tend to both use computers and do better in their grades. It's the resources
that drive both use and grades, not computer use that causes the change in the grade point average.

Patterns of Relationships

We have several terms to describe the major different types of patterns one might find in a relationship. First, there is the case of no
relationship at all. If you know the values on one variable, you don't know anything about the values on the other. For instance, I
suspect that there is no relationship between the length of the lifeline on your hand and your grade point average. If I know your GPA, I
don't have any idea how long your lifeline is.

Then, we have the positive relationship. In a positive relationship, high values on one variable are associated with high values on the
other and low values on one are associated with low values on the other. In this example, we assume an idealized positive relationship
between years of education and the salary one might expect to be making. On the other hand, a negative relationship implies that high
values on one variable are associated with low values on the other. This is also sometimes termed an inverse relationship. Here, we
show an idealized negative relationship between a measure of self esteem and a measure of paranoia in psychiatric patients.

These are the simplest types of relationships we might typically estimate in research. But the pattern of a relationship can be more
complex than this. For instance, the figure on the left shows a relationship that changes over the range of both variables, a curvilinear
relationship. In this example, the horizontal axis represents dosage of a drug for an illness and the vertical axis represents a severity of
illness measure. As dosage rises, severity of illness goes down. But at some point, the patient begins to experience negative side effects
associated with too high a dosage, and the severity of illness begins to increase again.


You won't be able to do very much in research unless you know how to talk about variables. A variable is any entity that can take on
different values. OK, so what does that mean? Anything that can vary can be considered a variable. For instance, age can be considered
a variable because age can take different values for different people or for the same person at different times. Similarly, country can be
considered a variable because a person's country can be assigned a value.

Variables aren't always 'quantitative' or numerical. The variable 'gender' consists of two text values: 'male' and 'female'. We can, if it is
useful, assign quantitative values instead of (or in place of) the text values, but we don't have to assign numbers in order for something to
be a variable. It's also important to realize that variables aren't only things that we measure in the traditional sense. For instance, in much
social research and in program evaluation, we consider the treatment or program to be made up of one or more variables (i.e., the
'cause' can be considered a variable). An educational program can have varying amounts of 'time on task', 'classroom settings', 'student-
teacher ratios', and so on. So even the program can be considered a variable (which can be made up of a number of sub-variables).

An attribute is a specific value on a variable. For instance, the variable sex or gender has two attributes: male and female. Or, the
variable agreement might be defined as having five attributes:

1 = strongly disagree
2 = disagree
3 = neutral
4 = agree
5 = strongly agree

Another important distinction having to do with the term 'variable' is the distinction between an independent and dependent variable.
This distinction is particularly relevant when you are investigating cause-effect relationships. It took me the longest time to learn this
distinction. (Of course, I'm someone who gets confused about the signs for 'arrivals' and 'departures' at airports -- do I go to arrivals
because I'm arriving at the airport or does the person I'm picking up go to arrivals because they're arriving on the plane!). I originally
thought that an independent variable was one that would be free to vary or respond to some program or treatment, and that a dependent
variable must be one that depends on my efforts (that is, it's the treatment). But this is entirely backwards! In fact the independent
variable is what you (or nature) manipulates -- a treatment or program or cause. The dependent variable is what is affected by the
independent variable -- your effects or outcomes. For example, if you are studying the effects of a new educational program on student
achievement, the program is the independent variable and your measures of achievement are the dependent ones.

Finally, there are two traits of variables that should always be achieved. Each variable should be exhaustive: it should include all
possible answerable responses. For instance, if the variable is "religion" and the only options are "Protestant", "Jewish", and "Muslim",
there are quite a few religions I can think of that haven't been included. The list does not exhaust all possibilities. On the other hand, if
you exhaust all the possibilities with some variables -- religion being one of them -- you would simply have too many responses. The way
to deal with this is to explicitly list the most common attributes and then use a general category like "Other" to account for all remaining
ones. In addition to being exhaustive, the attributes of a variable should be mutually exclusive: no respondent should be able to have
two attributes simultaneously. While this might seem obvious, it is often rather tricky in practice. For instance, you might be tempted to
represent the variable "Employment Status" with the two attributes "employed" and "unemployed." But these attributes are not
necessarily mutually exclusive -- a person who is looking for a second job while employed would be able to check both attributes! But
don't we often use questions on surveys that ask the respondent to "check all that apply" and then list a series of categories? Yes, we do,
but technically speaking, each of the categories in a question like that is its own variable and is treated dichotomously as either
"checked" or "unchecked", attributes that are mutually exclusive.


We'll talk about data in lots of places in The Knowledge Base, but here I just want to make a fundamental distinction between two types
of data: qualitative and quantitative. The way we typically define them, we call data 'quantitative' if it is in numerical form and
'qualitative' if it is not. Notice that qualitative data could be much more than just words or text. Photographs, videos, sound recordings
and so on, can be considered qualitative data.

Personally, while I find the distinction between qualitative and quantitative data to have some utility, I think most people draw too hard a
distinction, and that can lead to all sorts of confusion. In some areas of social research, the qualitative-quantitative distinction has led to
protracted arguments with the proponents of each arguing the superiority of their kind of data over the other. The quantitative types argue
that their data is 'hard', 'rigorous', 'credible', and 'scientific'. The qualitative proponents counter that their data is 'sensitive', 'nuanced',
'detailed', and 'contextual'.

For many of us in social research, this kind of polarized debate has become less than productive. And, it obscures the fact that qualitative
and quantitative data are intimately related to each other. All quantitative data is based upon qualitative judgments; and all
qualitative data can be described and manipulated numerically. For instance, think about a very common quantitative measure in
social research -- a self esteem scale. The researchers who develop such instruments had to make countless judgments in constructing
them: how to define self esteem; how to distinguish it from other related concepts; how to word potential scale items; how to make sure
the items would be understandable to the intended respondents; what kinds of contexts it could be used in; what kinds of cultural and
language constraints might be present; and on and on. The researcher who decides to use such a scale in their study has to make
another set of judgments: how well does the scale measure the intended concept; how reliable or consistent is it; how appropriate is it for
the research context and intended respondents; and on and on. Believe it or not, even the respondents make many judgments when
filling out such a scale: what is meant by various terms and phrases; why is the researcher giving this scale to them; how much energy
and effort do they want to expend to complete it, and so on. Even the consumers and readers of the research will make lots of judgments
about the self esteem measure and its appropriateness in that research context. What may look like a simple, straightforward, cut-and-
dried quantitative measure is actually based on lots of qualitative judgments made by lots of different people.

On the other hand, all qualitative information can be easily converted into quantitative, and there are many times when doing so would
add considerable value to your research. The simplest way to do this is to divide the qualitative information into units and number them! I
know that sounds trivial, but even that simple nominal enumeration can enable you to organize and process qualitative information more
efficiently. Perhaps more to the point, we might take text information (say, excerpts from transcripts) and pile these excerpts into piles of
similar statements. When we do something even as easy as this simple grouping or piling task, we can describe the results
quantitatively. For instance, if we had ten statements and we grouped these into five piles (as shown in the figure), we could describe the
piles using a 10 x 10 table of 0's and 1's. If two statements
were placed together in the same pile, we would put a 1 in
their row-column juncture. If two statements were placed in
different piles, we would use a 0. The resulting matrix or table
describes the grouping of the ten statements in terms of their
similarity. Even though the data in this example consists of
qualitative statements (one per card), the result of our simple
qualitative procedure (grouping similar excerpts into the
same piles) is quantitative in nature. "So what?" you ask.
Once we have the data in numerical form, we can manipulate
it numerically. For instance, we could have five different
judges sort the 10 excerpts and obtain a 0-1 matrix like this
for each judge. Then we could average the five matrices into
a single one that shows the proportions of judges who
grouped each pair together. This proportion could be
considered an estimate of the similarity (across independent
judges) of the excerpts. While this might not seem too
exciting or useful, it is exactly this kind of procedure that I use
as an integral part of the process of developing 'concept
maps' of ideas for groups of people (something that is useful!).
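A minimal sketch of that piling procedure in Python; the pile assignments below are invented for illustration, and the function name is my own.

import numpy as np

def cooccurrence(piles, n_statements=10):
    """Return an n x n matrix with 1 wherever two statements were placed in the same pile."""
    m = np.zeros((n_statements, n_statements), dtype=float)
    for pile in piles:
        for i in pile:
            for j in pile:
                m[i, j] = 1.0
    return m

# Five hypothetical judges, each sorting statements 0-9 into piles of similar statements.
judges = [
    [[0, 1, 2], [3, 4], [5, 6], [7, 8], [9]],
    [[0, 1], [2, 3, 4], [5, 6, 7], [8, 9]],
    [[0, 2], [1, 3], [4, 5, 6], [7, 8, 9]],
    [[0, 1, 2, 3], [4, 5], [6, 7], [8, 9]],
    [[0, 1], [2, 4], [3, 5], [6, 7, 8, 9]],
]

# Averaging the five 0/1 matrices gives the proportion of judges who grouped each pair together,
# an estimate of the similarity of the statements across independent judges.
similarity = np.mean([cooccurrence(p) for p in judges], axis=0)
print(similarity[:3, :3])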

One of the most important ideas in a research project is the unit of analysis. The unit of analysis is the major entity that you are
analyzing in your study. For instance, any of the following could be a unit of analysis in a study:

individuals
groups
artifacts (books, photos, newspapers)
geographical units (town, census tract, state)
social interactions (dyadic relations, divorces, arrests)

Why is it called the 'unit of analysis' and not something else (like, the unit of sampling)? Because it is the analysis you do in your study
that determines what the unit is. For instance, if you are comparing the children in two classrooms on achievement test scores, the unit is
the individual child because you have a score for each child. On the other hand, if you are comparing the two classes on classroom
climate, your unit of analysis is the group, in this case the classroom, because you only have a classroom climate score for the class as a
whole and not for each individual student. For different analyses in the same study you may have different units of analysis. If you decide
to base an analysis on student scores, the individual is the unit. But you might decide to compare average classroom performance. In
this case, since the data that goes into the analysis is the average itself (and not the individuals' scores) the unit of analysis is actually
the group. Even though you had data at the student level, you use aggregates in the analysis. In many areas of social research these
hierarchies of analysis units have become particularly important and have spawned a whole area of statistical analysis sometimes
referred to as hierarchical modeling. This is true in education, for instance, where we often compare classroom performance but
collect achievement data at the individual student level.
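A minimal sketch of that shift in the unit of analysis, using hypothetical scores (the numbers and column names are my own):

import pandas as pd

scores = pd.DataFrame({
    "classroom": ["A", "A", "A", "B", "B", "B"],
    "student": [1, 2, 3, 4, 5, 6],
    "achievement": [72, 85, 90, 65, 70, 80],
})

# Unit of analysis = the individual student: one score per child.
print(scores)

# Unit of analysis = the classroom: the analysis uses the aggregate (the class mean), not the individual scores.
print(scores.groupby("classroom")["achievement"].mean())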


A fallacy is an error in reasoning, usually based on mistaken assumptions. Researchers are very familiar with all the ways they could go
wrong, with the fallacies they are susceptible to. Here, I discuss two of the most important.

The ecological fallacy occurs when you make conclusions about individuals based only on analyses of group data. For instance,
assume that you measured the math scores of a particular classroom and found that they had the highest average score in the district.
Later (probably at the mall) you run into one of the kids from that class and you think to yourself "she must be a math whiz." Aha! Fallacy!
Just because she comes from the class with the highest average doesn't mean that she is automatically a high-scorer in math. She could
be the lowest math scorer in a class that otherwise consists of math geniuses!

An exception fallacy is sort of the reverse of the ecological fallacy. It occurs when you reach a group conclusion on the basis of
exceptional cases. This is the kind of fallacious reasoning that is at the core of a lot of sexism and racism. The stereotype is of the guy
who sees a woman make a driving error and concludes that "women are terrible drivers." Wrong! Fallacy!

Both of these fallacies point to some of the traps that exist in both research and everyday reasoning. They also point out how important it
is that we do research. We need to determine empirically how individuals perform (not just rely on group averages). Similarly, we need to
look at whether there are correlations between certain behaviors and certain groups (you might look at the whole controversy around the
book The Bell Curve as an attempt to examine whether the supposed relationship between race and IQ is real or a fallacy).


Here, we'll look at a number of different factorial designs. We'll begin with a two-factor design where one of the
factors has more than two levels. Then we'll introduce the three-factor design. Finally, we'll present the idea of the
incomplete factorial design.

A 2x3 Example

For these examples, let's construct an example where we wish to study the effect of different treatment combinations for cocaine
abuse. Here, the dependent measure is a severity-of-illness rating done by the treatment staff. The outcome ranges from 1 to 10 where
higher scores indicate more severe illness: in this case, more severe cocaine addiction. Furthermore, assume that the levels of
treatment are:

Factor 1: Treatment
    psychotherapy
    behavior modification
Factor 2: Setting
    inpatient
    day treatment
    outpatient

Note that the setting factor in this example has three levels.

The first figure shows what an effect for setting on the outcome might look like. You have to be very careful in interpreting these
results because higher scores mean the patient is doing worse. It's clear that inpatient treatment works best, day treatment is next best,
and outpatient treatment is worst of the three. It's also clear that there is no difference between the two treatment levels (psychotherapy
and behavior modification). Even though both graphs in the figure depict the exact same data, I think it's easier to see the main effect
for setting in the graph on the lower left where setting is depicted with different lines on the graph rather than at different points along
the horizontal axis.

The second figure shows a main effect for treatment, with psychotherapy performing better than behavior modification in all settings
(remember the direction of the outcome variable). The effect is clearer in the graph on the lower right where treatment levels are used
for the lines. Note that in both this and the previous figure the lines in all graphs are parallel, indicating that there are no interaction
effects.

Now, let's look at a few of the possible interaction effects. In the first case, we see that day treatment is never the best condition.
Furthermore, we see that psychotherapy works best with inpatient care and behavior modification works best with outpatient care.

The other interaction effect example is a bit more complicated. Although there may be some main effects mixed in with the
interaction, what's important here is that there is a unique combination of levels of factors that stands out as superior: psychotherapy
done in the inpatient setting. Once we identify a "best" combination like this, it is almost irrelevant what is going on with main effects.
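As a minimal sketch (the cell means below are invented, not the values from the figures), here is how you might lay out and inspect a 2x3 table of cell means in Python; non-parallel rows are the numerical signature of an interaction.

import pandas as pd

# Hypothetical severity-of-illness cell means (1-10, higher = worse) for treatment x setting.
cell_means = pd.DataFrame(
    {
        "inpatient": [3.0, 6.0],
        "day treatment": [6.0, 6.5],
        "outpatient": [7.0, 6.5],
    },
    index=["psychotherapy", "behavior modification"],
)
print(cell_means)

# If the difference between the two treatment rows changes across settings,
# the lines in the corresponding graph are not parallel -- an interaction.
row_difference = cell_means.loc["psychotherapy"] - cell_means.loc["behavior modification"]
print(row_difference)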

A Three-Factor Example

Now let's examine what a three-factor study might look like. We'll use the same factors as above for the first two factors. But here
we'll include a new factor for dosage that has two levels. The factor structure in this 2 x 2 x 3 factorial experiment is:

Factor 1: Dosage
    100 mg.
    300 mg.
Factor 2: Treatment
    psychotherapy
    behavior modification
Factor 3: Setting
    inpatient
    day treatment
    outpatient

Notice that in this design we have 2x2x3 = 12 groups! Although it's tempting in factorial studies to add more factors, the number of
groups always increases multiplicatively (is that a real word?). Notice also that in order to even show the tables of means we have to
have two tables that each show a two-factor relationship. It's also difficult to graph the results in a study like this because there will be
a large number of different possible graphs. In the statistical analysis you can look at the main effects for each of your three factors,
you can look at the three two-way interactions (e.g., treatment vs. dosage, treatment vs. setting, and setting vs. dosage), and you can
look at the one three-way interaction. Whatever else may be happening, it is clear that one combination of three levels works best: 300
mg. and psychotherapy in an inpatient setting. Thus, we have a three-way interaction in this study. If you were an administrator having
to make a choice among the different treatment combinations you would be best advised to select that one (assuming your patients and
setting are comparable to the ones in this study).

Incomplete Factorial Design

It's clear that factorial designs can become cumbersome and have too many groups even with only a few factors. In much research,
you won't be interested in a fully-crossed factorial design like the ones we've been showing that pair every combination of levels of
factors. Some of the combinations may not make sense from a policy or administrative perspective, or you simply may not have
enough funds to implement all combinations. In this case, you may decide to implement an incomplete factorial design. In this
variation, some of the cells are intentionally left empty -- you don't assign people to get those combinations of factors.

One of the most common uses of incomplete factorial design is to allow for a control or placebo group that receives
no treatment. In this case, it is actually impossible to implement a group that simultaneously has several levels of
treatment factors and receives no treatment at all. So, we consider the control group to be its own cell in an
incomplete factorial rubric (as shown in the figure). This allows us to conduct both relative and absolute treatment
comparisons within a single study and to get a fairly precise look at different treatment combinations.


A regression threat, also known as a "regression artifact" or "regression to the mean," is a statistical phenomenon that occurs
whenever you have a nonrandom sample from a population and two measures that are imperfectly correlated. The figure shows the
regression to the mean phenomenon.
The top part of the figure shows the
pretest distribution for a population.
Pretest scores are "normally"
distributed, the frequency distribution
looks like a "bell-shaped" curve.
Assume that the sample for your study
was selected exclusively from the low
pretest scorers. You can see on the top
part of the figure where their pretest
mean is -- clearly, it is considerably
below the population average. What
would we predict the posttest to look
like? First, let's assume that your
program or treatment doesn't work at all
(the "null" case). Our naive assumption
would be that our sample would score
just as badly on the posttest as they did
on the pretest. But they don't! The
bottom of the figure shows where the
sample's posttest mean would have
been without regression and where it
actually is. In actuality, the sample's
posttest mean wound up closer to the
posttest population mean than their
pretest mean was to the pretest
population mean. In other words, the
sample's mean appears to regress
toward the mean of the population from
pretest to posttest.

Why Does It Happen?

Let's start with a simple explanation and work from there. To see why regression to the mean happens, consider a
concrete case. In your study you select the lowest 10% of the population based on their pretest score. What are the
chances that on the posttest that exact group will once again constitute the lowest ten percent? Not likely. Most of
them will probably be in the lowest ten percent on the posttest, but if even just a few are not, then their group's mean
will have to be closer to the population's posttest than it was to the pretest. The same thing is true on the other end.
If you select as your sample the highest ten percent pretest scorers, they aren't likely to be the highest ten percent
on the posttest (even though most of them may be in the top ten percent). If even just a few score below the top ten
percent on the posttest their group's posttest mean will have to be closer to the population posttest mean than to
their pretest mean.
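To see this concretely, here is a minimal simulation sketch in Python (the true-score model, sample size, and numbers are my own assumptions, not the text's):

import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# A simple true-score model: each measure is the same true score plus its own random error.
true_score = rng.normal(50, 10, n)
pretest = true_score + rng.normal(0, 5, n)
posttest = true_score + rng.normal(0, 5, n)   # no program effect at all (the "null" case)

# Select the lowest 10% of pretest scorers, as in the example above.
lowest = pretest < np.percentile(pretest, 10)

print("population posttest mean:    ", round(posttest.mean(), 1))
print("selected group pretest mean: ", round(pretest[lowest].mean(), 1))
print("selected group posttest mean:", round(posttest[lowest].mean(), 1))  # closer to the population mean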

Here are a few things you need to know about the regression to the mean phenomenon:

It is a statistical phenomenon.

Regression toward the mean occurs for two reasons. First, it results because you asymmetrically
sampled from the population. If you randomly sample from the population, you would observe
(subject to random error) that the population and your sample have the same pretest average.
Because the sample is already at the population mean on the pretest, it is impossible for them to
regress towards the mean of the population any more! Second, it results because the two measures
are imperfectly correlated, a point taken up further below.

It is a group phenomenon.

You cannot tell which way an individual's score will move based on the regression to the mean
phenomenon. Even though the group's average will move toward the population's, some individuals
in the group are likely to move in the other direction.

It happens between any two variables.

Here's a common research mistake. You run a program and don't find any overall group effect. So,
you decide to look at those who did best on the posttest (your "success" stories!?) and see how
much they gained over the pretest. You are selecting a group that is extremely high on the posttest.
They won't likely all be the best on the pretest as well (although many of them will be). So, their
pretest mean has to be closer to the population mean than their posttest one. You describe this nice
"gain" and are almost ready to write up your results when someone suggests you look at your
"failure" cases, the people who score worst on your posttest. When you check on how they were
doing on the pretest you find that they weren't the worst scorers there. If they had been the worst
scorers both times, you would have simply said that your program didn't have any effect on them.
But now it looks worse than that -- it looks like your program actually made them worse relative to
the population! What will you do? How will you ever get your grant renewed? Or your paper
published? Or, heaven help you, how will you ever get tenured?

What you have to realize, is that the pattern of results I just described will happen anytime you
measure two measures! It will happen forwards in time (i.e., from pretest to posttest). It will happen
backwards in time (i.e., from posttest to pretest)! It will happen across measures collected at the
same time (e.g., height and weight)! It will happen even if you don't give your program or treatment.

It is a relative phenomenon.

It has nothing to do with overall maturational trends. Notice in the figure above that I didn't bother
labeling the x-axis in either the pretest or posttest distribution. It could be that everyone in the
population gains 20 points (on average) between the pretest and the posttest. But regression to the
mean would still be operating, even in that case. That is, the low scorers would, on average, be
gaining more than the population gain of 20 points (and thus their mean would be closer to the
population's).
You can have regression up or down.

If your sample consists of below-population-mean scorers, the regression to the mean will make it
appear that they move up on the other measure. But if your sample consists of high scorers, their
mean will appear to move down relative to the population. (Note that even if their mean increases,
they could be losing ground to the population. So, if a high-pretest-scoring sample gains five points
on the posttest while the overall sample gains 15, we would suspect regression to the mean as an
alternative explanation [to our program] for that relatively low change).

The more extreme the sample group, the greater the regression to the mean.

If your sample differs from the population by only a little bit on the first measure, there won't be much
regression to the mean because there isn't much room for them to regress -- they're already near
the population mean. So, if you have a sample, even a nonrandom one, that is a pretty good
subsample of the population, regression to the mean will be inconsequential (although it will be
present). But if your sample is very extreme relative to the population (e.g., the lowest or highest x
%), their mean is further from the population's and has more room to regress.

The less correlated the two variables, the greater the regression to the mean.

The other major factor that affects the amount of regression to the mean is the correlation between
the two variables. If the two variables are perfectly correlated -- the highest scorer on one is the
highest on the other, next highest on one is next highest on the other, and so on -- there will be no
regression to the mean. But this is unlikely to ever occur in practice. We know from measurement
theory that there is no such thing as "perfect" measurement -- all measurement is assumed (under
the true score model) to have some random error in measurement. It is only when the measure has
no random error -- is perfectly reliable -- that we can expect it will be able to correlate perfectly.
Since that just doesn't happen in the real world, we have to assume that measures have some
degree of unreliability, and that relationships between measures will not be perfect, and that there
will appear to be regression to the mean between these two measures, given asymmetrically
sampled subgroups.

The Formula for the Percent of Regression to the Mean

You can estimate exactly the percent of regression to the mean in any given situation. The formula is:

Prm = 100(1 - r)

where:

Prm = the percent of regression to the mean
r = the correlation between the two measures

Consider the following four cases:

if r = 1, there is no (i.e., 0%) regression to the mean
if r = .5, there is 50% regression to the mean
if r = .2, there is 80% regression to the mean
if r = 0, there is 100% regression to the mean

In the first case, the two variables are perfectly correlated and there is no regression to the mean. With a correlation
of .5, the sampled group moves fifty percent of the distance from the no-regression point to the mean of the
population. If the correlation is a small .20, the sample will regress 80% of the distance. And, if there is no correlation
between the measures, the sample will "regress" all the way back to the population mean! It's worth thinking about
what this last case means. With zero correlation, knowing a score on one measure gives you absolutely no
information about the likely score for that person on the other measure. In that case, your best guess for how any
person would perform on the second measure will be the mean of that second measure.

Estimating and Correcting Regression to the Mean

Given our percentage formula, for any given situation we can estimate the regression to the mean. All we need to know is the mean of
the sample on the first measure, the population mean on both measures, and the correlation between the two measures. Consider a
simple example.
Here, we'll assume that the pretest
population mean is 50 and that we select a
low-pretest scoring sample that has a
mean of 30. To begin with, let's assume
that we do not give any program or
treatment (i.e., the null case) and that the
population is not changing over time on the
characteristic being measured (i.e., steady-
state). Given this, we would predict that the
population mean would be 50 and that the
sample would get a posttest score of 30 if
there was no regression to the mean. Now,
assume that the correlation is .50 between
the pretest and posttest for the population.
Given our formula, we would expect that
the sampled group would regress 50% of
the distance from the no-regression point
to the population mean, or 50% of the way
from 30 to 50. In this case, we would
observe a score of 40 for the sampled
group, which would constitute a 10-point
pseudo-effect or regression artifact.
Now, let's relax some of the initial assumptions. For instance, let's assume that between the pretest and posttest the
population gained 15 points on average (and that this gain was uniform across the entire distribution, that is, the
variance of the population stays the same across the two measurement occasions). In this case, a sample that had
a pretest mean of 30 would be expected to get a posttest mean of 45 (i.e., 30+15) if there is no regression to the
mean (i.e., r=1). But here, the correlation between pretest and posttest is .5 so we expect to see regression to the
mean that covers 50% of the distance from the mean of 45 to the population posttest mean of 65. That is, we would
observe a posttest average of 55 for our sample, again a pseudo-effect of 10 points.
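The arithmetic of both scenarios can be packaged in a few lines of Python; the function name and arguments are my own, but the numbers reproduce the two worked examples above (an equal-variance, steady-shape population is assumed).

def expected_posttest(sample_pre_mean, pop_pre_mean, pop_post_mean, r):
    """Expected sample posttest mean once regression to the mean is taken into account."""
    percent_regression = 100 * (1 - r)                                 # Prm = 100(1 - r)
    no_regression_point = pop_post_mean - (pop_pre_mean - sample_pre_mean)
    return no_regression_point + (percent_regression / 100) * (pop_post_mean - no_regression_point)

print(expected_posttest(30, 50, 50, 0.5))  # steady-state case: 40.0, a 10-point pseudo-effect
print(expected_posttest(30, 50, 65, 0.5))  # population gains 15 points: 55.0, again a 10-point artifact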

Regression to the mean is one of the trickiest threats to validity. It is subtle in its effects, and even excellent researchers sometimes fail
to catch a potential regression artifact. You might want to learn more about the regression to the mean phenomenon. One good way to
do that would be to simulate the phenomenon. If you're not familiar with simulation, you can get a good introduction in The Simulation
Book. If you already understand the basic idea of simulation, you can do a manual (dice rolling) simulation of regression artifacts or a
computerized simulation of regression artifacts.

Computer Simulations for Research Design

written by

William Trochim
Sarita Davis

This is a complete online workbook that introduces the use of computer simulations in applied social research designs. There are two
versions of each of the simulations, one accomplished manually (by rolling dice) and the other done using the MINITAB statistical
package. The major two-group, program-comparison designs (randomized experiment, regression-discontinuity, nonequivalent group
design) are simulated and the issue of regression artifacts is studied. Please direct any comments to Bill Trochim.

● Acknowledgments
● Introduction to Simulations
● Manual Simulations

● Generating Data
● The Randomized Experimental Design
● The Nonequivalent Group Design
● The Regression Discontinuity Design
● Regression Artifacts

Computer Simulations

● Generating Data
● The Randomized Experimental Design
● The Nonequivalent Group Design

● (Part I)
● (Part II)

● The Regression Discontinuity Design


● Regression Artifacts

● Applications of Simulations in Social Research


● Conclusion
● References

Copyright © 1996, William M.K. Trochim


Acknowledgments

These simulation exercises have evolved from an earlier set of dice rolling exercises that Donald T. Campbell used (and still uses, we
hear) in the 1970s in teaching research methodology to undergraduate and graduate students. Over the years, those exercises helped
introduce many a struggling graduate student to the joys both of simulation and methodology. We hope that much of the spirit of
those earlier simulations is retained here. Certainly, none of the problems in our simulations can be attributed to Campbell's efforts.
He was able to achieve a blend of congeniality and rigor that we have tried to emulate.

The computer versions of these simulations came out of Bill Trochim's efforts in the early 1980s to translate some of those Campbell
dice rolling exercises into increasingly available computer technologies. Previous versions were implemented in a number of the
graduate and undergraduate research methods courses at Cornell over the years. We owe a great debt to the many students who
struggled with earlier drafts and offered their valuable criticisms and suggestions.

During the mid-80s Trochim began working with these exercises with James Davis who, at the time, was T.A. for his graduate-level
methods courses. James improved on them considerably, taking what were separate exercises and integrating them into a single
computerized simulation that illustrated the three major pre/post research designs. His efforts led to two co-authored articles on
simulation cited in this workbook.

This current set of exercises was resurrected in the Spring of 1993 initially to provide an interesting and challenging problem area for
Sarita Tyler's Ph.D. qualifying examinations. Essentially she took a set of file folders that had some poorly xeroxed copies of the old
dice rolling and computer exercises on them, and integrated these into the coherent package contained here. We had no idea when
she began that this process was going to result in an integrated workbook -- all she originally intended was to learn something about
simulations. Clearly the present volume would not have happened without her considerable efforts.



Introduction to Simulations

Simulation (sim' yoo la' shen): an imitation or counterfeit. This definition, according to Webster's Dictionary, implies the presence of
a replication so well constructed that the product can pass for the real thing. When applied to the study of research design,
simulations can serve as a suitable substitute for constructing and understanding field research. Trochim and Davis (1986) posit that
simulations are useful for (1) improving student understanding of basic research principles and analytic techniques; (2) investigating
the effects of problems that arise in the implementation of research; and (3) exploring the accuracy and utility of novel analytic
techniques applied to problematic data structures.

As applied to the study of research design, simulations can serve as a tool to help the teacher, evaluator, and methodologist address
the complex interaction of data construction and analysis, statistical theory, and the violation of key assumptions. In a simulation, the
analyst first creates data according to a known model and then examines how well the model can be detected through data analysis.
Teachers can show students that measurement, sampling, design, and analysis issues are dependent on the model that is assessed.
Students can directly manipulate the simulation model and try things out to see immediately how results change and how analyses are
affected. The evaluator can construct models of evaluation problems -- making assumptions about the pretest or type of attrition,
group nonequivalence, or program implementation -- and see whether the results of any data analyses are seriously distorted. The
methodologist can systematically violate assumptions of statistical procedures and immediately assess the degree to which the
estimates of program effect are biased (Trochim and Davis, 1986, p. 611).

Simulations are better for some purposes than is the analysis of real data. With real data, the analyst never perfectly knows the real-
world processes that caused the particular measured values to occur. In a simulation, the analyst controls all of the factors making up
the data and can manipulate these systematically to see directly how specific problems and assumptions affect the analysis.
Simulations also have some advantages over abstract theorizing about research issues. They enable the analyst to come into direct
contact with the assumptions that are made and to develop a concrete "feel" for their implications on different techniques.

Simulations have been widely used in contemporary social research (Guetzkow, 1962; Bradley, 1977; Heckman, 1981). They have
been used in program evaluation contexts, but to a much lesser degree (Mandeville, 1978; Raffeld et al., 1979; Mandell and Blair,
1980). Most of this work has been confined to the more technical literature in these fields.

Although the simulations described here can certainly be accomplished on mainframe computers, this workbook will illustrate their
use in manual and microcomputer contexts. There are several advantages to using simulations in these two contexts. The major
advantage to manual simulations is that they cost almost nothing to implement. The materials needed for this process are: dice, paper,
and pencils. Computer simulations are also relatively low in cost. Once you have purchased the microcomputer and necessary
software there are virtually no additional costs for running as many simulations as are desired. As it is often advantageous to have a
large number of runs of any simulation problem, the costs in mainframe computer time can become prohibitive. A second advantage
is the portability and accessibility. Manual simulations can be conducted anywhere there is a flat surface on which to roll dice.
Microcomputers are also portable in that one can easily move from home to office to classroom or into an agency either to conduct
the simulations or to illustrate their use. Students increasingly arrive at colleges and universities with microcomputers that enable
them to conduct simulations on their own.

This workbook illustrates some basic principles of manual and computer simulations and shows how they may be used to improve
the work of teachers, evaluators, and methodologists. The series of exercises contained in this manual are designed to illuminate a
number of concepts that are important in contemporary social research methodology including:

● simulations and their role in research
● basic measurement theory concepts
● the elements of pretest/posttest group designs, including nonequivalent, regression-discontinuity and randomized
experimental designs
● some major threats to internal validity, especially regression artifacts and selection threats

The basic model for research design presented in this simulation workbook is the program or outcome evaluation. In program
evaluation the goal is to assess the effect or impact of some program on the participants. Typically, two groups are studied. One
group (the program group) receives the program while the other does not (the comparison group). Measurements of both groups are
gathered before and after the program. The effect of the program is determined by looking at whether the program group gains more
than the comparison group from pretest to posttest. The exercises in this workbook describe how to simulate the three most
commonly used program evaluation designs, the Randomized Experiment, the pretest/posttest Nonequivalent Group Design, and the
Regression-Discontinuity design. Additional exercises are presented on regression artifacts, which can pose serious threats to internal
validity in research designs that involve within-subject treatment comparisons.

We can differentiate between these research designs by considering the way in which assignment of units to treatment conditions is
conducted - in other words, what rule has determined treatment assignment. In the randomized experimental (RE) design, persons are
randomly assigned to either the program or comparison group. In the regression-discontinuity (RD) design (Trochim, 1984), all
persons who score on one side of a chosen preprogram measure cutoff value are assigned to one group, with the remaining persons
being assigned to the other. In the nonequivalent group design (NEGD) (Cook and Campbell, 1979; Reichardt, 1979), persons or
intact groups (classes, wards, jails) are "arbitrarily" assigned to either the program or comparison condition. These designs have been
used extensively in program evaluations where one is interested in determining whether the program had an effect on one or more
outcome measures. The technical literature on these designs is extensive (see for instance, Cook and Campbell, 1979; Trochim,
1986). The general wisdom is that if one is interested in establishing a causal relationship (that is, in internal validity), RE
designs are most preferred, the RD design (because of its clear assignment-by-cutoff rule) is next in order of preference, and the
NEGD is least preferable.

All three of the program evaluation designs (RE, RD, and NEGD) have a similar structure, which can be described using the
notation:

O X O
O O

where the Os indicate measures and the X indicates that a program is administered. Each line represents a different group; the first
line depicts the program participants whereas the second shows the comparison group. The passage of time is indicated by movement
from left to right on a line. Thus, the program group is given a preprogram measure (indicated by the first O), is then given the
program (X), and afterward is given the postprogram measure (the last O). The vertical similarity in the measurement structure
implies that both the pre and postmeasures are given to both groups at the same time. Model-building considerations will be
discussed separately for each design.

The simulations are presented in two parts. The first part contains the manual simulations, including the basic randomized
experiment, nonequivalent group and regression-discontinuity research designs with an additional exercise presented on regression
artifacts. Part two of this manual contains the computer simulation equivalents of the research designs presented in part one. Also
included in this section is a computer analog to the regression artifacts simulation.

Both Parts I and II begin with an exercise called Generating Data. This exercise describes how to construct the data that will be used
in subsequent exercises. Because this exercise lays the foundation on which subsequent simulations are based, it is extremely
important that you do it first and follow the instructions very carefully.



Copyright © 1996, William M.K. Trochim
PART I: Manual Simulations

The manual simulations described here rely on the use of dice to create data that mimic the types of information you might collect in
certain research situations. Essentially all you need to complete these exercises is a pair of dice, several different colored pens or
pencils, and some paper. You should begin these exercises with the first one, Generating Data. You cannot do the subsequent
exercises without doing this one first, because you will use the data generated in the first exercise as the basis for all the others. For
the most part, it is best if you go through the exercises in the order presented, although you may skip exercises if desired.

While there are advantages to using dice to simulate data, there are also various shortcomings. Rolling dice can take some time. In
the time it takes you to roll and record the value of a pair of dice, most computers can create hundreds or even thousands of random
numbers. But manual simulations allow you to observe carefully how a simulation is constructed. There is a certain tactile quality to
them that cannot be equaled on a computer. This is especially valuable for students who are new to simulation or to the research
design topics covered here. Because dice rolling takes considerably longer than computer data generation, the total number of cases
you can create is limited. Simulations work best -- show results most clearly -- when there are more cases rather than fewer.
Consequently, the results you obtain may not be as clear from these manual simulations as from the computer ones. Because of this
limitation, it would be desirable for you to do these manual simulations in concert with others, perhaps in connection with a class you
are taking or with a group of friends interested in social research methods. After completing each exercise you can compare results to
get a clearer picture of whether your data patterns are typical or more unusual.

Another disadvantage of dice rolling for generating data is in the distribution that results. Much of the statistical analysis in
contemporary social research assumes that the data come from a normal or bell-shaped distribution. The roll of a pair of dice
approximates such a distribution, but not exactly. In fact, the distribution of a pair of dice is a triangular one with a minimum value of
2, a maximum of 12, and an average of 7. You can see that by looking at a table of all possible sums of a pair of dice shown in Table
1.

            Second die
            1    2    3    4    5    6
First die
    1       2    3    4    5    6    7
    2       3    4    5    6    7    8
    3       4    5    6    7    8    9
    4       5    6    7    8    9   10
    5       6    7    8    9   10   11
    6       7    8    9   10   11   12

Table 1. All possible sums of the roll of two dice.

You can see the theoretical distribution that results in the histogram in Figure 1. While this is not exactly a bell-shaped curve, it is
similar in nature, especially in that it has its highest value in the center of the distribution, with values declining in frequency towards
the tails. For all practical purposes, this difference in distribution has no effect on the results of the manual simulations. However, if
you tried to use dice to generate large amounts of data for analysis by statistical procedures that assume normal distributions, you
would be violating that assumption and might get erroneous results.
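
If you have a computer handy, you can verify the triangular shape of this distribution without rolling anything. The short Python sketch below (an optional illustration, not part of the manual exercises) enumerates all 36 equally likely outcomes for a pair of dice and tallies how often each sum from 2 to 12 occurs.

    from collections import Counter

    # Enumerate all 36 equally likely outcomes for a pair of dice and tally the sums
    sums = Counter(d1 + d2 for d1 in range(1, 7) for d2 in range(1, 7))

    for total in range(2, 13):
        count = sums[total]
        print(f"sum {total:2d}: {count:2d}/36 ({count / 36:.3f})")

The counts rise steadily to a peak of 6/36 at a sum of 7 and then fall symmetrically, which is the triangular shape described above.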

In these manual simulations, we have kept statistical jargon to an absolute minimum. We don't require you to calculate any formal
statistics beyond an average. For many statistical analyses common in social research, we have you try to estimate what you would
get. For instance, we have you try to fit a straight line through a pre/post data plot by hand. In statistical analysis (and in the
simulations in part two) we would fit a regression line. We don't have you calculate statistical formulas in these exercises because the
calculations would often be cumbersome and time-consuming and would most likely detract from your understanding of the
simulation principles involved.

However, you could do the calculations on your own or by entering the dice-rolling data into a computer for analysis.

Figure 1. Frequency distribution of all possible sums of the roll of two dice.

One of the advantages and distinct pleasures of doing simulations is that they allow you to experiment with assumptions about data
and find out what happens. Once you have completed the manual exercises, we encourage you to use your creativity to explore
assumptions that you find questionable. Making up your own simulations is a relatively simple task and can lead to greater
understanding of how social research operates.


Copyright © 1996, William M.K. Trochim


Generating Data

This exercise will illustrate how simulated data can be created by rolling dice to generate random numbers. The data you create in
this exercise will be used in all of the subsequent manual simulation exercises. Think about some test or measure that you might like
to take on a group of individuals. You administer the test and observe a single numerical score for each person. This score might be
the number of questions the person answered correctly or the average of their ratings on a set of attitude items, or something like that,
depending on what you are trying to measure. However you measure it, each individual has a single number that represents their
performance on that measure. In a simulation, the idea is that you want to create, for a number of imaginary people, hypothetical test
scores that look like the kinds of scores you might obtain if you actually measured these people. To do this, you will generate data
according to a simple measurement model, called the "true score" model. This model assumes that any observed score, such as a
pretest or a posttest score, is made up of two components: true ability and random error. You don't see these two components when
you measure people in real life, you just assume that they are there.

We can describe the measurement model with the formula

O = T + eO

where O is the observed score, T is the person's true ability or response level on the characteristic being measured and eO represents
random error on this measure. In real life, all we see is the person's score -- the O in our formula above. We assume that part of this
number or score tells us about the true ability or attitude of the person on that measure. But, we also assume that part of what we
observe in their score may reflect things other than what we are trying to measure. We call this the error in measurement and use the
symbol eO to represent it in the formula. This error reflects all the situational factors (e.g., bad lighting, not enough sleep the night
before, noise in the testing room, lucky guesses, etc.) which can cause a person to score higher or lower on the test than his/her true
ability or level alone would yield. In the true score measurement model, we assume that this error is random in nature, that for any
individual these factors are as likely to inflate or deflate their observed score. There are models for simulating data that make
different assumptions about what influences observed scores, but the true score model is one of the simplest and is the most
commonly assumed.

You will use this true score model to generate imaginary pretest and posttest scores for 50 hypothetical persons. This will be
accomplished using a pair of dice. For each person you will roll the pair of dice once to generate a score representing true ability,
once to generate pretest measurement error and once to generate posttest measurement error. These values should be entered for each
person in the appropriate columns in Table 1-1. You will then construct a pretest using the simple formula

X = T + eX

where X is the pretest, T is the true ability (simply the sum of the roll of a pair of dice) and eX is pretest measurement error (also
based on the sum of the roll of a pair of dice). In real life this is all you would be given, and you would assume that each test score is
a reflection of some true ability and random error. You would not see the two components; you only see the observed score.
Similarly, you will then construct a posttest score using the formula

Y = T + eY

where Y is the posttest, T is the same true score used for the pretest, and eY is posttest measurement error (based on the sum of yet
another roll of the pair of dice).
This procedure can be made clearer by doing it. Notice that the first column in Table 1-1 lists the numbers of the persons in the study,
from 1 to 50. You will begin by generating a pretest and posttest score for person 1. First, roll the pair of dice once and sum the
values (this will be a score between 2 and 12). This is called the true score. Enter the value in the first row of column 2. This score
represents the true ability or level (T) of person 1 on this measure. Repeat this step for all 50 persons.

Second, roll two dice and place their sum in the first row of column 3. This number represents the error in measurement on the
pretest (eX). Repeat this for all 50 persons. Third, roll the pair of dice again and place their sum in the first row of column 4. This
value represents error in measurement on the posttest (eY). Again, repeat this for all 50 persons. You have now created an imaginary
true score and errors in measurement for all 50 persons, recording the results in the appropriate columns.

Now you are going to construct the observed pretest and posttest scores. This requires only simple addition. For the first person (row)
take the true score (T) from column 2 and add it to the pretest error value (eX) from column 3. Place this sum in column 5 (the
pretest, X). Do this for all 50 people. Now, for the first person, add the true score (T) from column 2 to the posttest error value (eY)
from column 4. Place this sum in column 6 (the posttest, Y). Do this for all 50 people.
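
If you would like to check your hand-rolled worksheet against a computer-generated version, the following Python sketch builds a table like Table 1-1 the same way you just did: the true score and the two error terms are each the sum of a simulated pair of dice, and the observed scores are formed by addition. The variable names and the choice of Python are our illustration, not part of the exercise.

    import random

    def roll_pair():
        """Sum of one roll of a pair of dice (2 to 12)."""
        return random.randint(1, 6) + random.randint(1, 6)

    people = []
    for person in range(1, 51):
        T = roll_pair()      # true ability (column 2)
        e_x = roll_pair()    # pretest measurement error (column 3)
        e_y = roll_pair()    # posttest measurement error (column 4)
        X = T + e_x          # observed pretest (column 5)
        Y = T + e_y          # observed posttest (column 6)
        people.append((person, T, e_x, e_y, X, Y))

    print("Person   T  eX  eY   X   Y")
    for row in people:
        print("{:6d}  {:2d}  {:2d}  {:2d}  {:2d}  {:2d}".format(*row))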

It would be worth stopping at this point to think about what you have done. You have been creating imaginary test scores. You have
constructed two tests called X and Y. Both of these imaginary tests measure the same trait because both of them share the same true
score. The true score reflects the true ability of each person on this imaginary or simulated test. In addition, each test has its own
random error. If this were real life, of course, you would not be constructing test scores like this. Instead, you would simply be given
the two sets of observed test scores, X and Y. You would assume that the two measures have a common true score and independent
errors but would not see these. Thus, you have generated simulated data. The advantage of using such data is that, unlike with real
data, you know how the X and Y tests are constructed because you constructed them. You will see in later simulations that this
enables you to test different analyses to see if they give the results that you put into the data. If the analyses work on simulated data,
then, you may assume that they will also work for real data as long as the real data meet the assumptions of the measurement model
used in the simulations.

Next, you are going to look at the pretest and posttest data you simulated. Let's do this by graphing the pretest and posttest
histograms. Figure 1-1 can be used to graph the pretest. Begin with the first person's pretest (X) value in column 5. Locate the
column on Figure 1-1 for that value and make an 'X' in the first row of that column on the figure. For instance, if the first person has
a pretest score of 7, your graph should look like:

Now continue plotting the pretest values for the 50 people. If you come to a value that you already had before, place your 'X' in the
row above the last 'X' you made for that value. For instance, if the second person had a pretest score of 9 and the third had a score of
7, your graph for these first three people would look like:

Repeat this for the pretest scores for all 50 people. Now, using Figure 1-2, repeat this process to draw the histogram for the posttest
values in column 6.
Now let's estimate the central tendency for the pretest distribution shown in Figure 1-1. The best way to do this would be to calculate
the mean or average of the 50 scores. But a quicker way to get a rough idea would be to locate the middle of the distribution by
counting. Starting with the lowest column in which there is an 'X' in Figure 1-1, count the lowest 25 'Xs' in the figure. What column
of Figure 1-1 is the 25th 'X' in? Simply put a mark at the bottom of the figure under this column to show that this is where the
"center" of the distribution is located. Then, use the same counting procedure to estimate where the center is on the posttest histogram
of Figure 1-2.

Now, let's look at the pretest and posttest scores together. You will graph their bivariate (i.e., two-variable) distribution on the graph
in Figure 1-3. To do this, begin with the pretest and posttest score for person 1. Notice that the pretest is shown on the horizontal axis
while the posttest is the vertical one. Go along the horizontal axis in Figure 1-3 until you come to the value for the pretest score for
the first person. Now go up in that column until you come to the row that has the value for the posttest score for the first person. You
are going to make a mark in the box that represents the pretest (column) and posttest (row) value for the first person. But because
there may be more than one person who has the same pretest and posttest score, you will want to use a system to mark the box that
allows you to see how many people of the fifty have a pre-post pair in any box. We recommend that you use the following system.

For the first mark in a specific box, draw a single diagonal line.

The second time you find a person with the same pre/post pair, add another diagonal, crossing the first.

For a third case, add a vertical line.

If there is a fourth, add a horizontal line.

It is not likely that you will have any more than four cases in any given box, but if you do, create a way to indicate this. In this
manner, plot all of the pre/post pairs for the 50 persons in your simulation.

Now let’s try to fit a line through this bivariate distribution in Figure 1-3. To do this, begin with the leftmost column on the graph.
For each column, you are going to try to estimate its central tendency. If there are no marks in a column, skip that column and move
to the next column to the right. If there are marks in the column, place a dot (•) halfway between the lowest and highest mark in that
column. If there is only one mark in a column, just place the dot in that row. Note that there will only be one dot per column. (This is,
admittedly, a rough and simplified way to estimate central tendency. If you want to be more accurate, you can calculate the average
posttest score for all persons having the same pretest score and place your mark accordingly.) Nevertheless, our rough estimation
procedure should approximate the central tendency well enough for our purposes here. Now, beginning with the dot farthest to the
left, connect the dots in adjacent columns with a line. Because it may be hard to distinguish this line from the bivariate marks you
made in the boxes, you might want to connect these dots using a different colored pen. The figure below shows how a part of your
bivariate plot with the dots and connecting lines might look.
Is the line that connects the dots in your graph relatively smooth or very jagged? Is it a flat (horizontal) line or not? Does this line
tell you anything about the relationship between the pretest and posttest? It should be clear that the X and Y tests are positively
related to each other, that is, higher scores on one test tend to be associated with higher scores on the other.
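
The dots you placed by hand are rough estimates of the average posttest score at each pretest value. If you enter your pre/post pairs into a computer, a few lines of Python (an optional sketch; the pairs shown are made-up placeholders, not your data) will compute those column averages exactly.

    from collections import defaultdict

    # Replace these illustrative pairs with your own (pretest, posttest) values from Table 1-1
    pairs = [(7, 9), (9, 12), (7, 8), (11, 13), (9, 10)]

    by_pretest = defaultdict(list)
    for x, y in pairs:
        by_pretest[x].append(y)

    # The computer analog of the dot you placed in each column of Figure 1-3
    for x in sorted(by_pretest):
        ys = by_pretest[x]
        print(f"pretest {x:2d}: average posttest = {sum(ys) / len(ys):.2f} (n = {len(ys)})")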

Now, you should again stop to consider what you have done. In the first part of the exercise you generated two imaginary tests--X
and Y. In the second part, the bivariate graph showed you that the two tests are positively related to each other. You set them up to be
related by including the same true ability score in both tests. You should think about the following points:

• If you had generated data for thousands of persons, the pretest and posttest distributions would look nearly identical. Furthermore,
the estimates of pretest and posttest central tendency (e.g., averages) would be nearly identical and both distributions would have
equal numbers of persons on either side of the central values. You can get a better sense of this if you compare your graphs with
those of other persons who do this exercise.

• Each score (pretest and posttest) is composed of equal parts of true ability and random error. This is a common (although simplistic)
measurement model called the “true score” model. Because we only have one true score for each test, we are assuming that each test
is unidimensional, that is, measures only one trait. A factor analysis of both tests should yield one factor.

• The amounts of true score and error which are present in a test determine the reliability of the test. If you had used two parts true
score to one part error, you would have more reliable tests; if you had used one part true score to two parts error, less reliable tests.
(Specifically, reliability is defined as the ratio of the variance of the true scores to the variance of the total or observed score.)

• The pretest and posttest are related because they both share the same true score. (If you had generated separate pretest and posttest
true scores there would be no relationship between the two tests.) But the relationship between pretest and posttest is far from perfect
because each test has independent measurement error. In this example, if you computed the correlation it would be about .5.

• The line that you fit to the bivariate distribution is a very rough approximation to a regression line. You should be convinced that if
you had thousands of persons, the best line through the data would be a straight line with a slope equal to about .5. (If the variances
of the two variables are equal, as in this example, the correlation would be equal to the slope of the regression line. You can see
whether the variances appear equal by looking at the spread of the scores around the central values in the pretest and posttest
frequency distributions.)
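
You can also check the claim about the correlation by simulation. The sketch below (again an optional Python illustration, with the sample size chosen arbitrarily large) generates many persons under the same true score model and computes the correlation between X and Y; because the two tests share one of two equal-variance components, the result should come out near .5.

    import random

    def roll_pair():
        return random.randint(1, 6) + random.randint(1, 6)

    def correlation(xs, ys):
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sxx = sum((x - mx) ** 2 for x in xs)
        syy = sum((y - my) ** 2 for y in ys)
        return sxy / (sxx * syy) ** 0.5

    N = 100_000
    T = [roll_pair() for _ in range(N)]
    X = [t + roll_pair() for t in T]   # pretest = true score + pretest error
    Y = [t + roll_pair() for t in T]   # posttest = same true score + posttest error
    print(f"correlation(X, Y) = {correlation(X, Y):.3f}")   # should be close to .5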

Generating Data

Table 1-1
Columns: 1 Person; 2 True Score (T); 3 Pretest Error (eX); 4 Posttest Error (eY); 5 Pretest X; 6 Posttest Y

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25

Generating Data

Table 1-1

(cont.)
Columns as above: 1 Person; 2 True Score (T); 3 Pretest Error (eX); 4 Posttest Error (eY); 5 Pretest X; 6 Posttest Y

26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50

Generating Data
Figure 1-1
Generating Data
Figure 1-2
Generating Data
Figure 1-3

Copyright © 1996, William M.K. Trochim


The Randomized Experimental Design

In this exercise you will simulate a simple pre/post randomized experimental design. This design can be depicted in notational form as

R O X O
R O O

where each O indicates an observation or measure on a group of people, the X indicates the implementation of some treatment or program,
separate lines are used to depict the two groups in the study, the R indicates that persons were randomly assigned to either the treatment or control
group, and the passage of time is indicated by moving from left to right. We will assume that we are comparing a program and comparison group
(instead of two programs or different levels of the same program).

You will use the data you created in the first exercise. Copy the pretest scores from the first exercise (Table 1-1, column 5) into column 2 of Table
2-1. Now we need to randomly assign the 50 persons into two groups. To do this, roll one die for each person. If you get a 1,2, or 3, consider that
person to be in the program group and place a '1' in column 3 of Table 2-1, labeled "Group Assignment (Z)". If you get a 4, 5, or 6,
consider that person to be in the comparison group and place a '0' in column 3. Now, imagine that you give the program or treatment to the
people who have a '1' for Group Assignment and that the program has a positive effect. In this simulation, we will assume that the program has an
effect of 7 points for each person who receives it. This "Hypothetical Program Effect" is shown in column 4 of Table 2-1. To determine the
treatment effect for each person, multiply column 3 by column 4 and place the result in column 5, labeled "Effect of Program (G)". (The G stands
for how much each person Gains as a result of the program.) You should have a value of '7' for all persons randomly assigned to the program
group and a value of '0' for those in the comparison group. Why did we multiply the column of 0 and 1 values by the column of 7s when we
could just as easily have told you to simply put a 7 next to each program recipient? We do it this way because it illustrates how a 0,1 variable,
called a "dummy variable," can be used in a simple formula to show how the Effect of the Program is created. In this simulation, for instance, we
can summarize how the program effect is created using the formula

G=Zx7

where G is the gain or program effect, Z is the dummy coded (0,1) variable in column 3 and the 7 represents the constant shown in column 5.

Next, copy the posttest values (Y) from the first exercise (Table 1-1, column 6) to column 6 of Table 2-1, labeled "Posttest (Y) from Table 1-1".
Finally, to create the observed posttest value that has a program effect built in, add the values in columns 5 and 6 of Table 2-1 and put them into
column 7, labeled "Posttest (Y) for Randomized Experimental Design". You should recognize that this last column in Table 2-1 has the same
posttest scores as in the first simulation exercise, except that each randomly assigned program person has 7 extra points that represent the effect or
gain of the program.
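
To see the whole construction at once, here is a Python sketch of Table 2-1 (optional, and only one of many ways you might program it): it generates the pretest and posttest as before, assigns each person by a simulated die roll, adds the 7-point effect for program persons, and reports the posttest means of the two groups.

    import random

    def roll_pair():
        return random.randint(1, 6) + random.randint(1, 6)

    EFFECT = 7   # hypothetical program effect (column 4 of Table 2-1)

    rows = []
    for person in range(1, 51):
        T = roll_pair()
        X = T + roll_pair()                        # pretest (column 2)
        Z = 1 if random.randint(1, 6) <= 3 else 0  # die roll: 1-3 program, 4-6 comparison (column 3)
        G = Z * EFFECT                             # effect of program (column 5)
        Y = T + roll_pair() + G                    # posttest with program effect (column 7)
        rows.append((person, X, Z, G, Y))

    prog = [y for _, _, z, _, y in rows if z == 1]
    comp = [y for _, _, z, _, y in rows if z == 0]
    mp, mc = sum(prog) / len(prog), sum(comp) / len(comp)
    print(f"posttest means: program {mp:.2f}, comparison {mc:.2f}, difference {mp - mc:.2f}")

With only 50 persons the difference will bounce around, but it should usually land near the 7 points you put in.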

As before, you should graph the univariate distributions for the pretest and posttest in Figures 2-1 and 2-2. But here, unlike in the first exercise, the
50 people are randomly divided into two groups. It would be nice if you could distinguish the scores of these two groups in your graphs of the
distributions. You should do this by using different colored pens or pencils and lines that slant in different directions for the program and
comparison cases. For instance, let's say that the first four persons in Table 2-1 have the following pretest scores and group assignments in Table 2-
1:

Person   Pretest X from Table 1-1   Group Assignment Z
   1               12                        1
   2               12                        0
   3               10                        0
   4               12                        1
In this case, the histogram for these four persons would look like

Now plot the data for all 50 persons for both the pretest (Figure 2-1) and posttest (Figure 2-2) making sure to distinguish between the two
randomly assigned groups both in color and in the angle of the mark you make. As in the first simulation, you should estimate the central tendency
for both the pretest and posttest, but here you should do it separately for the program and comparison cases.

Now, plot the bivariate distribution in Figure 2-3. Again, you need to have a system for distinguishing between program and comparison cases
when they fall in the same box. Use different colored pens or pencils for each. In addition, give the first program case, the first comparison case, the second program case, and the second comparison case each their own distinctive mark (for example, lines slanting in different directions), and apply the system consistently.

So, if you have a pre-post pair that happens to include two program and two comparison cases, the box should contain all four marks.

Now, plot the lines on the bivariate plot that describe the pre/post relationship (as described in the first simulation exercise), doing a separate line
for the program and comparison groups.

There are several things you should note. First, the pretest central values for the two groups should be similar (although they are not likely to be
exactly the same). This is, of course, because the groups were randomly assigned. Second, the posttest central values should clearly differ. In fact,
we expect that the difference between the average posttest values should be approximately the 7 units that you put in. In addition, the vertical
difference between the relationship lines in the bivariate plot (Figure 2-3) should also be about 7 units.

At this point you should be convinced of the following:

• The design simulated here is a very simple single-factor randomized experiment. The single factor is represented by the 0,1 treatment variable
that represents which group people are in. You could simulate more complex designs. For example, to simulate a randomized block design you
would first rank all persons on the pretest. Then you could set the block size, for example at n = 2. Then, beginning with the lowest two pretest
scorers, you would randomly assign one to the program group and the other to the comparison group. You could do this by rolling a die--if you get
a 1, 2, or 3 the lowest scorer is a program participant; if you get a 4, 5, or 6 the higher scorer is. Continuing in this manner for all twenty-five pairs
would result in a block design. (A sketch of this pairing procedure follows this list.)

• You could also simulate a 2 x 2 factorial design. Let's say that you wanted to evaluate an educational program that was implemented in several
different ways. You decide to manipulate the manner in which you teach the material (Lecture versus Printed Material) and where the program is
given (In-class or Pull-out). Here you have two factors -- Delivery of Material and Location -- and each factor has two levels. We could summarize
this design with a simple table that shows the four different program variations you are evaluating:

                        In-class                        Pull-out
Lecture                 Lecture / In-class              Lecture / Pull-out
Printed Material        Printed Material / In-class     Printed Material / Pull-out

Notice that we need a 2x2 table to summarize this study. Not surprisingly, we would call this a 2x2 factorial experimental design. If you were
simulating this, you would actually have to randomly assign persons into one of the four groups. One advantage of this kind of design is that it
allows you to examine how different program characteristics operate together (or "interact") to produce a program effect.

• You should recognize that the design simulated here can be considered a repeated measures experimental design because a pretest and posttest
are used. You could simulate a posttest-only experimental design as well.
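
As promised in the first bullet, here is a sketch of the pairwise (block size n = 2) randomization procedure in Python. The pretest values are generated on the spot purely for illustration; in practice you would substitute your own scores.

    import random

    roll_pair = lambda: random.randint(1, 6) + random.randint(1, 6)

    # Illustrative (person, pretest) pairs; pretest = true score + error, as in Table 1-1
    people = [(i, roll_pair() + roll_pair()) for i in range(1, 51)]

    # Rank everyone on the pretest, then work through adjacent pairs (blocks of 2)
    ranked = sorted(people, key=lambda p: p[1])
    assignment = {}
    for low, high in zip(ranked[0::2], ranked[1::2]):
        if random.randint(1, 6) <= 3:        # die roll 1-3: the lower scorer gets the program
            assignment[low[0]], assignment[high[0]] = 1, 0
        else:                                # die roll 4-6: the higher scorer gets the program
            assignment[high[0]], assignment[low[0]] = 1, 0

    print(sum(assignment.values()), "of", len(assignment), "persons assigned to the program group")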

Randomized Experimental Design


Table 2-1

Columns: 1 Person; 2 Pretest (X) from Table 1-1; 3 Group Assignment (Z); 4 Hypothetical Program Effect; 5 Effect of Program (G); 6 Posttest (Y) from Table 1-1; 7 Posttest (Y) for Randomized Experimental Design
(Each row below gives the person number and, in column 4, the hypothetical program effect of 7; fill in the remaining columns as you work through the exercise.)
1 7
2 7
3 7
4 7
5 7
6 7
7 7
8 7
9 7
10 7
11 7
12 7
13 7
14 7
15 7
16 7
17 7
18 7
19 7
20 7
21 7
22 7
23 7
24 7
25 7

Randomized Experimental Design


Table 2-1
(cont.)

Columns as above: 1 Person; 2 Pretest (X) from Table 1-1; 3 Group Assignment (Z); 4 Hypothetical Program Effect; 5 Effect of Program (G); 6 Posttest (Y) from Table 1-1; 7 Posttest (Y) for Randomized Experimental Design
26 7
27 7
28 7
29 7
30 7
31 7
32 7
33 7
34 7
35 7
36 7
37 7
38 7
39 7
40 7
41 7
42 7
43 7
44 7
45 7
46 7
47 7
48 7
49 7
50 7

Randomized Experimental Design


Figure 2-1

Randomized Experimental Design


Figure 2-2

Randomized Experimental Design


Figure 2-3

Copyright © 1996, William M.K. Trochim


The Nonequivalent Group Design

In this exercise you are going to create a nonequivalent group or an untreated control group design of the form

N O X O
N O O

where each O indicates an observation or measure on a group of people, the X indicates the implementation of some treatment or program,
separate lines are used to depict the two groups in the study, the N indicates that assignment to either the treatment or control group is not
controlled by the researcher (the groups may be naturally formed or persons may self-select the group they are in), and the passage of time is
indicated by moving from left to right. We will assume that we are comparing a program and comparison group (instead of two programs or
different levels of the same program).

This design has several important characteristics. First, the design has pretest and posttest measures for all participants. Second, the design calls for
two groups, one which gets some program or treatment and one which does not (termed the "program" and "comparison" groups respectively).
Third, the two groups are nonequivalent, that is, we expect that they may differ prior to the study. Often, nonequivalent groups are simply two
intact groups which are convenient to the researcher (e.g., two classrooms, two states, two cities, two mental health centers, etc.).

You will use the pretest and posttest scores from the first exercise as the basis for this exercise. The first thing you need to do is to copy the pretest
scores from column 5 of Table 1-1 into column 2 of Table 3-1. Now, you have to divide the 50 participants into two nonequivalent groups. We can
do this in several ways, but the simplest would be to consider the first 25 persons as being in the program group and the second 25 as being in the
comparison group. The pretest and posttest scores of these 50 participants were formed from random rolls of pairs of dice. Be assured that, on
average, these two subgroups should have very similar pretest and posttest means. But in this exercise we want to assume that the two groups are
nonequivalent and so we will have to make them nonequivalent. The easiest way to make the groups nonequivalent on the pretest is to add some
constant value to all the pretest scores for persons in one of the groups. To see how you will do this, look at Table 3-1. You should have already
copied the pretest scores (X) for each participant into column 2. Notice that column 3 of Table 3-1 has a number "5" in it for the first 25
participants and a "0" for the second set of 25 persons. These numbers describe the initial pretest differences between these groups (i.e., the groups
are nonequivalent on the pretest). To create the pretest scores for this exercise add the pretest scores from column 2 to the constant values in
column 3 and place the results in column 4 of Table 3-1 under the heading "Pretest (X) for Nonequivalent Groups". Note that the choice of a
difference of 5 points between the groups was arbitrary. Also note that in this simulation we have let the program group have the pretest advantage
of 5 points.

Now you need to create posttest scores. You should copy the posttest scores from column 6 of Table 1-1 directly into column 5 of Table 3-1. In
this simulation, we will assume that the program has an effect and you will add 7 points to the posttest score of each person in the program group.
In Table 3-1, the initial group difference (i.e., 5 points difference) is listed again in column 6 and the program effect or gain (i.e., 7 points) in
column 7. Therefore, you get the final posttest score by adding the posttest score from the first exercise (column 5), the group differences (column
6) and the program effect or gain (column 7). The sum of these three components should be placed in column 8 of Table 3-1 labeled "Posttest Y
for Nonequivalent Groups".

It is useful at this point to stop and consider what you have done. When you combine the measurement model from the first exercise with what you
have done here, we can represent each person's pretest score with the formula

X = T + D + eX

where

X = the pretest score for a person

T = the true ability or true score (based on the roll of a pair of dice)

D = initial group difference (D = 5 if the person is in the program group; D = 0 if in comparison group)
eX = pretest measurement error (based on the roll of a pair of dice)

Similarly, we can now represent the posttest for each person as

Y = T + D + G + eY

Y = the posttest score for a person

T = the same true ability as for the pretest

D = the same initial group difference as on the pretest

G = the effect of the program or the Gain (G = 7 for persons in the program; G = 0 for comparison persons)

eY = posttest measurement error (based on a different roll of the dice than pretest error)
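
Putting the two formulas together, the following Python sketch (optional, and using our own variable names) builds a Table 3-1 of its own: the first 25 persons get the 5-point group difference and the 7-point program effect, the rest get neither, and the group differences on the pretest and posttest are printed at the end.

    import random

    def roll_pair():
        return random.randint(1, 6) + random.randint(1, 6)

    rows = []
    for person in range(1, 51):
        T = roll_pair()
        D = 5 if person <= 25 else 0   # initial group difference (program group has the advantage)
        G = 7 if person <= 25 else 0   # program effect or gain
        X = T + D + roll_pair()        # pretest for nonequivalent groups
        Y = T + D + G + roll_pair()    # posttest for nonequivalent groups
        rows.append((person <= 25, X, Y))

    mean = lambda v: sum(v) / len(v)
    pre_diff = mean([x for p, x, _ in rows if p]) - mean([x for p, x, _ in rows if not p])
    post_diff = mean([y for p, _, y in rows if p]) - mean([y for p, _, y in rows if not p])
    print(f"pretest difference  (should be near 5):  {pre_diff:.2f}")
    print(f"posttest difference (should be near 12): {post_diff:.2f}")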

It is important to get a visual impression of the data and so, as in the first two exercises, you should graph the univariate and bivariate distributions.
Remember that as in the randomized experimental simulation you need to distinguish the program group scores from the comparison group scores
on all graphs. Graph the pretest distribution in Figure 3-1, the posttest in Figure 3-2, and the bivariate distribution in Figure 3-3. As before, you
should also estimate the central tendency in the univariate distributions, taking care to do this separately for each group. And, you should visually
fit a line through the bivariate data, fitting separate lines for the program and comparison groups.

When all of this is completed you should be convinced of the following:

● There are differences between the program and comparison groups on the pretest. If you examine the pretest distributions in Figure 3-1,
you should see that the central score for the program group is about 5 points higher than the central score of the comparison group (this is
no surprise because you added in the 5 points). This difference is typical of what we expect when we use nonequivalent groups in
research and simply tells us that prior to the study one group is higher than the other on the pretest characteristic.

● There are even larger differences between the groups on the posttest. In fact, the posttest difference between groups should be about 12
points (again, this is no surprise because you added in 5 + 7 points). If this were real data and you were going to analyze it, you would
probably begin to suspect that your program may have had an effect because the posttest difference exceeds the pretest difference.

● If you were to graph the central values for the pretest and posttest for the two groups, you would probably get a picture that looks
something like this:
One alternative explanation (to a program effect) that you would have to consider is the possibility of a selection-maturation threat, that
is, that your two groups are maturing at different rates. However, you know this is not the case because you specifically put in the
same-size group difference of 5 points on both the pretest and posttest (i.e., in the absence of the program, the groups did not mature at
different rates). Nevertheless, if you were analyzing data like this in real life, you would have to assume that in the absence of the program
the differences between the groups were the same on the pretest and posttest and that any additional difference (in this case 7 points) must
be due to the program. You might know from previous research that a maturational pattern like the one in the above figure would be
unlikely and rule out the threat as improbable on that basis. Nevertheless, it should be apparent that you would be much better off
if you had a better idea of how the two groups would have changed from pre to post in the absence of the program. If you had taken an
additional pretest observation (i.e., a double pretest or "dry run" experiment), you would have a much better idea of whether selection-
maturation is a legitimate threat. In any event, you should be more firmly convinced of the importance of selection bias threats in
nonequivalent group designs of this type.

● You should also note what would happen if you analyzed the data in other ways. Obviously a simple t-test of differences on the posttest
would give an inappropriately large estimate of program effect -- in this example, it would tell you that the groups differ by about 12
points, but you know that a good deal of that is due to initial differences. On the other hand, an analysis of variance (or t-test) on gain
scores would work here but only because you know that without the program (i.e., if you had not added the 7 point program effect) the
two groups would have gained, on the average, exactly the same amount (in this simulation, they would have gained nothing!). You
should be convinced then that the analysis of variance on gain scores relies on the assumption of equal gain in both groups in the absence
of the program. (A sketch contrasting the posttest-only and gain-score analyses follows this list.)

● You have only simulated one possible outcome of many. You could, for example, simulate a null case (i.e., no effect of the program)
simply by omitting the 7 points added to the program group persons. You could have a constant maturation rate by adding a constant
value to all posttest scores. Or, you could simulate a selection-maturation problem by adding different constants to the posttest scores (or
true scores) of the two groups. Or you could start out with an inferior program group by adding the group difference to the comparison
group instead.

● Finally, you should also recognize an important fact about selection bias which is not illustrated in this exercise. When we select
nonequivalent groups we expect that they may differ on one or more characteristics prior to the study. If we find that the pretest scores of
our two groups are equal, we cannot assume that there is no selection bias or difference between the groups. The pretest averages could
be equal by chance or the groups could differ on any number of other characteristics that are not measured by the pretest but nevertheless
affect the posttest scores. We cannot conduct a t-test on pretest differences, find that there is no significant difference and conclude that
selection bias is not a problem. Selection bias occurs whenever our groups differ on some pre-study characteristic that affects the posttest
and when this pre-study difference is not perfectly described or "accounted for" by the difference on the pretest.
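
As noted in the bullet on analysis above, a posttest-only comparison and a gain-score comparison tell very different stories with these data. The sketch below (an optional Python illustration using the same construction as the exercise) computes both: the posttest-only difference absorbs the 5-point head start, while the gain-score difference stays close to the 7-point effect, provided the equal-gain assumption holds.

    import random

    roll_pair = lambda: random.randint(1, 6) + random.randint(1, 6)

    prog, comp = [], []
    for person in range(1, 51):
        T = roll_pair()
        D = 5 if person <= 25 else 0       # initial group difference
        G = 7 if person <= 25 else 0       # program effect
        x, y = T + D + roll_pair(), T + D + G + roll_pair()
        (prog if person <= 25 else comp).append((x, y))

    mean = lambda v: sum(v) / len(v)
    post_diff = mean([y for _, y in prog]) - mean([y for _, y in comp])
    gain_diff = mean([y - x for x, y in prog]) - mean([y - x for x, y in comp])
    print(f"posttest-only difference: {post_diff:.2f}  (inflated by the 5-point head start)")
    print(f"gain-score difference:    {gain_diff:.2f}  (close to the 7-point effect you put in)")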
Nonequivalent Group Design
Table 3-1

Columns: 1 Person; 2 Pretest (X) from Table 1-1; 3 Pretest Group Difference; 4 Pretest (X) for Nonequivalent Groups; 5 Posttest (Y) from Table 1-1; 6 Posttest Group Difference; 7 Effect of Program (G); 8 Posttest (Y) for Nonequivalent Groups
(Each row below gives the person number and the constants for columns 3, 6, and 7; fill in the remaining columns as you work through the exercise.)
1 5 5 7
2 5 5 7
3 5 5 7
4 5 5 7
5 5 5 7
6 5 5 7
7 5 5 7
8 5 5 7
9 5 5 7
10 5 5 7
11 5 5 7
12 5 5 7
13 5 5 7
14 5 5 7
15 5 5 7
16 5 5 7
17 5 5 7
18 5 5 7
19 5 5 7
20 5 5 7
21 5 5 7
22 5 5 7
23 5 5 7
24 5 5 7
25 5 5 7

Nonequivalent Group Design


Table 3-1
(cont.)

Columns as above: 1 Person; 2 Pretest (X) from Table 1-1; 3 Pretest Group Difference; 4 Pretest (X) for Nonequivalent Groups; 5 Posttest (Y) from Table 1-1; 6 Posttest Group Difference; 7 Effect of Program (G); 8 Posttest (Y) for Nonequivalent Groups
26 0 0 0
27 0 0 0
28 0 0 0
29 0 0 0
30 0 0 0
31 0 0 0
32 0 0 0
33 0 0 0
34 0 0 0
35 0 0 0
36 0 0 0
37 0 0 0
38 0 0 0
39 0 0 0
40 0 0 0
41 0 0 0
42 0 0 0
43 0 0 0
44 0 0 0
45 0 0 0
46 0 0 0
47 0 0 0
48 0 0 0
49 0 0 0
50 0 0 0

Nonequivalent Group Design


Figure 3-1
Nonequivalent Group Design
Figure 3-2

Nonequivalent Group Design


Figure 3-3

Copyright © 1996, William M.K. Trochim


The Regression Discontinuity Design

In this exercise you are going to create data for a regression-discontinuity design. It can be depicted in notational form as:

C O X O
C O O

where each O indicates an observation or measure on a group of people, the X indicates the implementation of some treatment or program,
separate lines are used to depict the two groups in the study, the C indicates that assignment to either the treatment or control group is done using a
cutoff score on the pretest assignment measure, and the passage of time is indicated by moving from left to right. We will assume that we are
comparing a program and comparison group (rather than a relative comparison of two programs or different levels of the same program).

The regression-discontinuity design is a type of nonequivalent group design that is characterized by its method of assigning persons to groups
using a cutoff score on an assignment measure -- all persons who score above the cutoff are assigned to one group while those scoring on the other
side are assigned to the other. Two things need to be decided when selecting a cutoff value. First we need to decide whether the high or low pretest
scorers will receive the program. We might give the program to the high pretest scorers if we are studying the effects of scholarships (high
achievement), awards (high performance), novel medical treatments or therapies (high on measures of illness) and so on. We might give the
program to the low pretest scorers when studying compensatory education (low achievement), poverty (low income), and so on. In this exercise
we will simulate a program given to the low pretest scorers. Second, we need to decide the specific value of the pretest cutoff. In the real world,
cutoff values are selected in a number of ways. When there are a limited number of program openings, the cutoff score can be selected so that
exactly the desired number of persons score either above or below it (depending on whether the program goes to high or low scorers). In other
situations, some theoretical value is appropriate for the cutoff. For example, the pretest average might be chosen as the cutoff because in a
particular context it makes sense to give the program to those who are "below average" or "above average". In this exercise we arbitrarily use a
cutoff equal to the theoretical pretest average.

You will again make use of the pretest and posttest scores that you generated in the first exercise. If you recall that the pretest scores (as generated
in the first exercise) can range from 4 to 24, it should be clear that the expected pretest average is 14 units. Thus, we will assign all cases having a
pretest score less than or equal to 14 units to the program group and all others to the comparison group (remember that in this simulation the
program is given to the low pretest scorers). The assignment strategy can be summarized as follows:

Z = 1 if X <= 14
Z = 0 otherwise

where Z is the 0,1 "dummy" assignment variable.

You will generate the data for this exercise using Table 4-1. First, copy the pretest scores from the first exercise (Table 1-1, column
5) into column 2 of Table 4-1. Now, examine the pretest score for person 1. If it is less than or equal to 14, enter a '1' in
Column 3 of Table 4-1 labeled Group Assignment (Z). If it is 15 or higher, enter a '0'. Continue doing this for all 50 persons. When you have
finished, notice that the next column, labeled "Hypothetical Program Effect" consists entirely of '7's, that is, the program will increase the posttest
scores of each program participant by 7 units. But not everyone gets the program and so not everyone should get the effect of 7 units. You only
want those persons who have a Z = 1 (program persons) to get the 7.

An easy way to accomplish this is to multiply the assignment variable (Column 3) by the effect size (Column 4) and put the result in Column 5,
labeled "Effect of Program". So, the fourth column should have '7's for all program persons and '0's for all comparison group persons. Next, you
should copy the posttest scores from the first exercise (Column 6 of Table 1-1) into column 6 of Table 4-1. Finally, to get the posttest scores with
the program effect included you simply add the "Effect of Program" (Column 5) and posttest scores (Column 6) and place the result in Column 7
of Table 4-1 labeled "Posttest (Y) for Regression-Discontinuity Design."

It is useful at this point to stop and consider what you have done. In the first exercise you generated the pretest according to

X = T + eX
and in this exercise you constructed the program assignment variable Z using a cutoff rule. Then, using a hypothetical program effect of G = 7
units (G for Gain), you constructed the effect of the program by multiplying GZ. You then copied the posttest from the first exercise and you
should recall that it was generated by the model:

Y = T + eY

Finally, you added the effect of the program to this posttest value and obtained the posttest for this exercise:

Y = T + GZ + eY
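
The same construction is easy to program. The Python sketch below (optional; the names are ours) generates the pretest, applies the cutoff rule to form Z, and adds the 7-point effect only for the low scorers who receive the program.

    import random

    def roll_pair():
        return random.randint(1, 6) + random.randint(1, 6)

    CUTOFF, G = 14, 7   # theoretical pretest average and hypothetical program effect

    rows = []
    for person in range(1, 51):
        T = roll_pair()
        X = T + roll_pair()              # pretest, ranging from 4 to 24
        Z = 1 if X <= CUTOFF else 0      # assignment by cutoff: low scorers get the program
        Y = T + roll_pair() + G * Z      # posttest with the effect added for program persons
        rows.append((person, X, Z, Y))

    n_prog = sum(z for _, _, z, _ in rows)
    print(f"{n_prog} of 50 persons scored at or below {CUTOFF} and were assigned to the program group")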

Again, it is always important to examine the data visually, and so you should graph the univariate pretest distribution in Figure 4-1 and the univariate
posttest distribution in Figure 4-2. As in previous exercises, be sure to use a different colored pen or pencil for the program and comparison
groups. Also, estimate the central tendency for each group on both graphs using either the counting method or by computing the averages. You
should also graph the bivariate distribution as you did before, remembering to keep the marks for the two groups distinct both in color and symbol.
Also, estimate the line that fits through the bivariate data.

Let's consider the univariate distributions. Clearly, the pretest distribution in this exercise is identical to the pretest distribution of the first exercise.
The only difference is that the program group has scores of 14 or less and the comparison group has scores of 15 or more. Notice that because of
this the pretest averages for the two groups are very different. This is what we mean when we say that the regression-discontinuity design induces
maximal pretest differences between the groups. Now look at the posttest distribution. If this was all the information you had (i.e., you did not
know the pretest information) you would probably conclude that the program and comparison groups don't differ much --that is, the program is not
effective. It is only when you consider how different they are on the pretest that you can see there is a program effect, that is, the program group
did much better than would have been expected on the basis of their pretest scores.

Next, look at the bivariate distribution. As in the previous exercise, you visually fit separate lines for the program and comparison groups. Let's use
these jagged lines to try to estimate a straight line that fits through the data. You will have to do this visually. The figure below shows the dots
estimated in each column from a hypothetical example and the lines connecting them. It also shows the straight line that we visually estimated to
fit through the jagged one. You should estimate the line for the program and comparison groups separately. The program group line should be to
the left of column 14 (and include it) and the comparison group one should be to the right.

You can easily estimate the slopes of these lines. First, take the program group line. Place a dot somewhere on this line at a point where one of the
column lines intersects the straight line. Now move exactly two columns to the right and place a dot where the straight line intersects the column
line. At this point, you should have something that resembles the following:
You know that the horizontal line is exactly 2 units wide. Measure the vertical distance between the
two dots in your graph. Be sure that you measure this distance in terms of the units of the graph. Let's
say that you find that it is about 1-1/2 units high. To estimate the slope, you simply construct a ratio
where the vertical distance is the numerator and the horizontal distance is the denominator. In this
example, you would calculate:

slope = 1½ / 2

= 1.5/2

= .75 or ¾

The slope enables us to say how much change in the vertical direction we get for each 1-unit change in
the horizontal direction. In this example, for every increase of 1 unit in the X direction we get an
increase of .75 units or 3/4 unit in the Y direction. The estimates of slope for the program and comparison group lines should be very similar.

Now let's estimate the size of the program effect. First, draw a vertical line through the entire bivariate distribution at the cutoff point (i.e., X = 14).
Place a dot where the program group straight line intersects the cutoff line. Similarly, place a dot where the comparison group line intersects the
cutoff line. Now count the number of vertical units between these two dots. This is the regression-discontinuity estimate of the program effect.
You should find that this estimate is about 7 units which is, of course, what you put in. This is illustrated in the figure below.

After completing the previous exercise, you should be convinced of the following:

Although in these dice rolling simulations we have avoided presenting statistical terminology as much as possible, our discussion of the regression-
discontinuity design would not be complete without it. After all, the first half of the name of this design is "regression." It should be no surprise
that when we statistically analyze this design in the real world, we use regression analysis. Here, we consider some of the major issues involved in
such an analysis.

A crucial step in the analysis of data from the regression-discontinuity design involves guessing the true shape of the regression line. In our
example this is easy to do because we created the data and we know that the true shape is a straight line in each group. This is because the pretest
and posttest both share the same true score. In real life, we don't often know what the true regression shape is, and we have to guess at it. Thus, if
you were conducting a real data analysis, you might try a variety of regression lines until you were confident that you had captured this true shape.

Since we know that the true shape in this case is linear, we could construct the appropriate regression model as follows:
Y = b0 + b1X* + b2Z + eY

where:

Y = the posttest

X* = the pretest minus the cutoff value (i.e., X - 14)

Z = the 0,1 group assignment variable

b0 = the intercept, that is, the y value at which the comparison group regression line meets the cutoff line

b1 = the slope (we assume it's the same in both groups)

b2 = the program effect, that is, the amount you must add or subtract to b0 in order to find where the program group regression
line meets the cutoff line.

eY = random error

In regression-discontinuity analysis, we usually subtract the cutoff value from each pretest score before the analysis so that the cutoff is at a value
of X = 0 which is the intercept in the model. Notice that the term b2Z is simply the program effect b2 times the assignment variable (Z) which is
exactly what we put in as GZ. You should be able to estimate all of the b's in the formula above from the bivariate graph. First, b0 is the posttest
value for the point you marked on the cutoff line where the comparison group line intersects it. Second, b1 is the estimate of the slope. If your
program and comparison group slope estimates differed considerably, take the average of the two. Finally, b2 is the program effect -- the posttest
(Y) distance between the two regression lines at the cutoff. Let's say that you estimate b0 = 14, b1 = .5, and b2 = 7. You could then write out the
regression formula as:

Y = 14 + .5X* + 7Z

(We drop the eY term out because that describes deviations from the regression lines.) Basically, when you run a regression-discontinuity analysis
you enter in the values for Y, X* (remember to subtract the cutoff from each X) and Z and the regression program gives you the estimates of b0,
b1, and b2.
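
If you want to see the regression analysis itself, here is one way to do it in Python with numpy's least-squares routine. This is our own illustrative sketch (it is not the analysis program used in Part II): it regenerates a larger regression-discontinuity data set under the same model and then estimates b0, b1, and b2 from the design matrix [1, X*, Z].

    import numpy as np

    rng = np.random.default_rng()
    roll_pair = lambda n: rng.integers(1, 7, n) + rng.integers(1, 7, n)

    N, CUTOFF, EFFECT = 500, 14, 7
    T = roll_pair(N)
    X = T + roll_pair(N)                   # pretest
    Z = (X <= CUTOFF).astype(float)        # assignment by cutoff (low scorers get the program)
    Y = T + roll_pair(N) + EFFECT * Z      # posttest with the program effect

    Xstar = X - CUTOFF                     # center the pretest at the cutoff
    design = np.column_stack([np.ones(N), Xstar, Z])
    b, *_ = np.linalg.lstsq(design, Y, rcond=None)
    print(f"b0 (intercept at the cutoff) = {b[0]:.2f}")   # should be near 14 with these dice
    print(f"b1 (common slope)            = {b[1]:.2f}")   # should be near .5
    print(f"b2 (program effect)          = {b[2]:.2f}")   # should be near 7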

You should be convinced that this single formula describes the regression lines for both groups as well as the program effect. To see this, you can
construct the formula for each regression line separately. First, construct the formula for the program group line (substituting your own estimates
instead of these) by setting Z = 1 (remember this is the program group):

YP = 14 + .5X* + 7(1)
YP = 14 + .5X* + 7
YP = 21 + .5X*

Now you can construct the formula for the comparison group line by substituting Z = 0.

YC = 14 + .5X* + 7(0)
YC = 14 + .5X*

Now, to convince yourself that the program effect is correctly estimated, construct the program effect at the cutoff. Remember that we subtracted
the cutoff from each pretest value and so the cutoff is at X* = 0. Therefore, the Y estimate for the program group value at the cutoff in this
example would be
YP = 21 + .5(0)
YP = 21

and the comparison group y value at the cutoff would be

YC = 14 + .5(0)
YC = 14

and, therefore, the program effect would be the difference between the two groups or

YP - YC = 21 -14
YP - YC = 7

You should get a value close to the value of 7 units which is, of course, what you put in when you constructed the data. It should also be clear that
when a dichotomous dummy variable (e.g., Z) is used in a regression equation, you are essentially telling the analysis that you want to fit two
lines, one for each group having each value of Z.

Regression-Discontinuity Design
Table 4-1

Columns: 1 Person; 2 Pretest (X) from Table 1-1; 3 Group Assignment (Z); 4 Hypothetical Program Effect; 5 Effect of Program; 6 Posttest (Y) from Table 1-1; 7 Posttest (Y) for Regression-Discontinuity Design
(Each row below gives the person number and, in column 4, the hypothetical program effect of 7; fill in the remaining columns as you work through the exercise.)
1 7
2 7
3 7
4 7
5 7
6 7
7 7
8 7
9 7
10 7
11 7
12 7
13 7
14 7
15 7
16 7
17 7
18 7
19 7
20 7
21 7
22 7
23 7
24 7
25 7

Regression-Discontinuity Design
Table 4-1
(cont)

Columns as above: (1) Person; (2) Pretest (X) from Table 1-1; (3) Group Assignment (Z); (4) Hypothetical Program Effect; (5) Effect of Program; (6) Posttest (Y) from Table 1-1; (7) Posttest (Y) for Regression-Discontinuity Design

Person   Hypothetical Program Effect
26 7
27 7
28 7
29 7
30 7
31 7
32 7
33 7
34 7
35 7
36 7
37 7
38 7
39 7
40 7
41 7
42 7
43 7
44 7
45 7
46 7
47 7
48 7
49 7
50 7

Regression-Discontinuity Design
Figure 4-1
Regression-Discontinuity Design
Figure 4-2

Regression-Discontinuity Design
Figure 4-3


Regression Artifacts

In this exercise we are going to look at the phenomenon of regression artifacts or "regression to the mean." First, you will use the data from the
original simulation and create nonequivalent groups just like you did in the Nonequivalent Group Design exercise. Then you will "match" persons
from the program and comparison groups who have the same pretest scores, dropping out all persons for whom there is no match. You do this
because you are concerned that the groups have different pretest averages, and you would like to obtain "equivalent" groups. Second, you are
going to regraph the data for all 50 persons from the Generating Data (GD) exercise, to gain a deeper understanding of regression artifacts.

To begin, review what you did in the NEGD exercise. Starting with 50 pretest and posttest scores (each composed of a common true score and
unique error components), you first made the groups nonequivalent on the pretest by adding 5 to each program person's pretest value. This initial
difference was the same on the posttest, and so you added the same 5 points there. Finally, you included a program effect of 7 points, added to
each program person's posttest score.

In this exercise, you will start with the data in the GD exercise, and will do the same thing you did in the NEGD exercise except that we will not
add in a program effect. That is, in this simulation we assume that the program either was never given or did not work (i.e., the null case). The first
thing you need to do is to copy the pretest scores from column 5 of Table 1-1 into column 2 of Table 5-1. Now, you have to divide the 50
participants into two nonequivalent groups. We can do this in several ways, but the simplest would be to consider the first 25 persons as being in
the program group and the second 25 as being in the comparison group. The pretest and posttest scores of these 50 participants were formed from
random rolls of pairs of dice. Be assured that, on average, these two subgroups should have very similar pretest and posttest means. But in this
exercise we want to assume that the two groups are nonequivalent and so we will have to make them nonequivalent. The easiest way to make the
groups nonequivalent on the pretest is to add some constant value to all the pretest scores for persons in one of the groups. To see how you will do
this, look at Table 5-1. You should have already copied the pretest scores (X) for each participant into column 2. Notice that column 3 of Table 5-1
has a number "5" in it for the first 25 participants and a "0" for the second set of 25 persons. These numbers describe the initial pretest differences
between these groups (i.e., the groups are nonequivalent on the pretest). To create the pretest scores for this exercise, add the pretest scores from
column 2 to the constant values in column 3 and place the results in column 4 of Table 5-1 under the heading "Pretest (X) for Regression
Artifacts". Note that the choice of a difference of 5 points between the groups was arbitrary. Also note that in this simulation we have let the
program group have the pretest advantage of 5 points.

Now you need to create posttest scores. You should copy the posttest scores from column 6 of Table 1-1 directly into column 5 of Table 5-1. In
this simulation, we will assume that the program either has no effect or was never given, and so you will not add any points to the posttest score
for the effect of the program. But we assume that the initial difference between the groups persists over time, and so you will add to the posttest
the 5 points that describes the nonequivalence between groups. In Table 5-1, the initial group difference (i.e., 5 points difference) is listed again in
column 6. Therefore, you get the final posttest score by adding the posttest score in column 5 and the group differences in column 6. The sum
should be placed in column 7 of Table 5-1 labeled "Posttest Y for Regression Artifacts".

Now, just as you have done in previous exercises, plot the pretest and posttest frequency distributions in Figures 5-1 and 5-2, being sure to use
different colors for the program (persons 1-25) and comparison (persons 26-50) groups. Also, estimate the central tendency for each group on both
the pretest and posttest. You should notice that the average of the program group is about 5 points higher than the average of the comparison
group on both measures.

If you were conducting a nonequivalent group design quasi-experiment and obtained the pretest distribution in Figure 5-1, you would rightly be
concerned that the two groups differ prior to getting the program. To remedy this, you might think it is a good idea to look for persons in both
groups who have similar pretest scores, and use only these matched cases as the program and comparison groups. You might conclude that by only
using persons "matched" on the pretest you can obtain "equivalent" groups.

You will match persons on their pretest scores, and put the matched cases in Table 5-2. To do this, first look at the pretest frequency distribution in
Figure 5-1. Notice again that the comparison group tended to score lower. Beginning at the lowest pretest score and moving upwards, find the
lowest pretest score at which there are both program and comparison persons. Most likely there will be more comparison persons than program
ones at the first score that has both. For instance, let's imagine that the pretest score of 9 is the first score that has persons from both groups and
that at this value there are two cases from the comparison group and one from the program group. Obviously you will only be able to find one
matched pair--you will have to throw out the data from one of the comparison group persons because there is only a single program group case
available for matching. Since the dice used to generate the data yield random scores, you can simply take the first person in the comparison group
(Table 5-1, persons 26-50) who scored a 9 on the pretest. Record that person's ID number in column 1 of Table 5-2, their pretest in column 2 and
their posttest score in column 3. Next, find the program person (in Table 5-1, persons 1-25) who also scored a 9 on the pretest and enter that
person's ID number in column 4 of Table 5-2, their pretest in column 5 and their posttest score in column 6. Then move to the next highest pretest
score in Figure 5-1 for which there are persons from both groups. Again, find matched pairs, and enter them into Table 5-2. Continue doing this
until you have obtained all possible matched pairs. Notice that you should never use the same person more than once in Table 5-2.

At this point, you have created two groups matched on the pretest. To do so, you had to eliminate persons from the original sample of 50 for whom
no pretest matches were available. You may now be convinced that you have indeed created "equivalent" groups. To confirm this, you might
calculate the pretest averages of the program and comparison groups. They should be identical.

Have you in fact, created "equivalent" groups? Have you removed the selection bias (of 5 points) by matching on the pretest? Remember that you
have not added in a program effect in this exercise. If you successfully removed the selection difference on the pretest by matching, you should
find no difference between the two groups on the posttest (because you only put in the selection difference between the two groups on the
posttest). Calculate the posttest averages for the program and comparison groups in Table 5-2. What do you find?

Most of you will find that on the posttest the program group scored higher on average than the comparison group did. If you were conducting this
study, you might conclude that although the matched groups start out with equal pretest averages, they differ on the posttest. In fact, you would be
tempted to conclude that the program is successful because the program group scored higher than the comparison group on the posttest. But
something is obviously wrong here--you never put in a program effect! Therefore, the posttest difference that you are finding must be wrong.

To discover what is wrong you will plot the data in Table 5-2 in a new way. Look at Figure 5-4 labeled "Pair-Link Diagram". Starting with only
the comparison persons in Table 5-2, draw a straight line between the pretest and posttest scores of each person. Do the lines tend to go up, down,
or stay the same from pretest to posttest? Next, using a different colored pen, draw the lines for the program group persons in Table 5-2. In which
direction do these lines go? You should find that most of the program group lines go down while most of the comparison group lines go up from
pretest to posttest. As a result of what you have seen, you should be convinced of the following:

● The average posttest difference between the program and comparison group is entirely due to regression artifacts that result from the
matching procedure. Recall that because of the pretest difference of 5 points, which you put in, the entire program group had a higher
pretest average than the entire comparison group. When you matched persons on the pretest, you were actually selecting the higher
scoring comparison persons and the lower scoring program persons. Therefore, we expect the matched comparison group to regress down
toward the entire group's mean and the matched program group to regress up toward the entire group's mean.

● In this simulation you made the program group higher on the pretest by adding 5 points. You should recognize that if the comparison
group had been given this initial "advantage" the results of matching would have been reversed. In this case the matched comparison
group would have had a higher posttest average than the matched program group. You would mistakenly conclude that the program was
harmful--that is, even though the two matched groups start with equal pretest averages, the program group loses relative to the
comparison group. Of course, any gain or loss is due to regression artifacts which result from a matching process that selects persons
from the higher end of the distribution in one group and the lower end in the other.

● Matching should not be confused with blocking. If you had taken persons from two groups which differ on the pretest, matched them on
pretest scores and then randomly assigned one of each pair to the program and comparison group, you would have equal numbers of
advantaged and disadvantaged persons in each group. In this case, regression artifacts would cancel out and would not affect results.

Why do regression artifacts occur? We can get some idea by looking at a pair-link diagram for the entire set of 50 persons in the original
Generating Data exercise. Draw the pair-links for each of the 50 persons of Table 1-1 on Figure 5-5. Recall that for this original set of data we had
only one group (i.e., no program and comparison group), no selection biases and no program effects. You should be convinced of the following:

● Persons who score extremely high or extremely low on the pretest seldom do as extremely on the posttest. That is, there should be very
few pair-link lines which go from a low pretest score to an equally low posttest score or which go from a high pretest score to an equally
high posttest score.

● Recall that the pretest and posttest consist of two components: a true score, which is the same on both tests, and separate error scores for each. You should know that the regression artifact cannot be due to the true score. If you were to draw a pair-link diagram between the pretest and posttest true score, you would obtain nothing but horizontal lines (no regression) because it is the same for both tests. However, if you drew a pair-link diagram between the pretest error score and the posttest error score, you would see a clear regression effect. People with low pretest errors would tend to have higher posttest error scores and vice versa. Because the pretest and posttest error scores were based on independent dice rolls, the two sets of error scores are random or uncorrelated. We can conclude that regression artifacts must be due to the error in measurement, not to the true scores.
● We can also view this in terms of correlations. First, assume that we have no measurement error -- persons always get the same score on the pretest and posttest. In this case, the pair-link diagram would only have horizontal lines, as stated above, and there would be no regression artifact. Furthermore, if people scored exactly the same on both tests, there would be a perfect correlation between the two tests (i.e., r = 1). Next, assume that our pretest and posttest are terrible measures that only reflect error (i.e., they do not measure true ability, but do reflect random errors, at two points in time). Here, the two tests would be random or uncorrelated (i.e., r = 0), and we would expect maximum regression to the mean (i.e., no matter what subgroup you select on the pretest, the posttest average of that subgroup will always tend to equal the posttest average of the entire group). You should recognize that the more measurement error you have in the measures, the lower the correlation between the measures. Finally, you should also see that the lower the correlation between two measures, the greater the regression artifact, and the higher the correlation, the lower the regression.

● Finally, you should recognize that regression artifacts are purely a statistical phenomenon that results from asymmetric subgroup selection and imperfect correlation. This means that when we select a subgroup from the extreme of a distribution, we will find regression to the mean on any variable that is not perfectly correlated with the selection measure. This can lead the unwary analyst to some bizarre conclusions. For example, let us say you wanted to look at the effect of a special educational program that was given to all students in a school. Assume that you have pretest and posttest scores for everyone (but there is no control group). You would like to know whether subgroups in the school improved. First, you look at the students who scored low on the pretest. They appear to improve on the posttest (regression artifacts, of course). Next, you look at the students who scored high on the pretest. They appear to lose ground on the posttest. You might incorrectly conclude that the program helps low-scoring students but hurts high-scoring students. Now let us say you decide to look at groups who differ on the posttest. The low posttest scorers did much better on the pretest. The high posttest scorers did much worse on the pretest. It almost appears as if students regress backwards in time. But by now you should recognize that this is simply a regression artifact that results from selecting groups on the extremes of the posttest and the imperfect correlation between the pretest and posttest. (A short simulation sketch after this list illustrates the effect numerically.)
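Those last points are easy to check numerically. The following is a rough Python/NumPy stand-in for the dice rolls (my own illustration, not part of the exercise): it builds a pretest and posttest from a shared true score plus independent errors, selects the lowest-scoring quarter of people on the pretest, and shows that this subgroup's posttest average moves back toward the overall average even though no program was ever given.

import numpy as np

rng = np.random.default_rng(0)

n = 10000
true = rng.normal(50, 5, n)          # shared true score
pre = true + rng.normal(0, 5, n)     # pretest = true score + independent error
post = true + rng.normal(0, 5, n)    # posttest = true score + independent error

# Select an extreme subgroup on the pretest (the lowest-scoring 25%).
low = pre <= np.percentile(pre, 25)

print(f"overall means:    pre {pre.mean():.1f}   post {post.mean():.1f}")
print(f"low-pre subgroup: pre {pre[low].mean():.1f}   post {post[low].mean():.1f}")

# The regression is driven entirely by the error component and the imperfect
# correlation between the two measures (about .5 with these standard deviations).
print(f"pre-post correlation: {np.corrcoef(pre, post)[0, 1]:.2f}")

The subgroup's posttest mean sits well above its pretest mean, which is exactly the pattern the pair-link diagrams reveal.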

Regression Artifacts
Table 5-1
Columns: (1) Person; (2) Pretest (X) from Table 1-1; (3) Group Difference; (4) Pretest (X) for Regression Artifacts; (5) Posttest (Y) from Table 1-1; (6) Group Difference; (7) Posttest (Y) for Regression Artifacts. The group-difference columns (3 and 6) are already filled in; complete the remaining columns as described in the text.

Person   Group Difference (col. 3)   Group Difference (col. 6)
1 5 5
2 5 5
3 5 5
4 5 5
5 5 5
6 5 5
7 5 5
8 5 5
9 5 5
10 5 5
11 5 5
12 5 5
13 5 5
14 5 5
15 5 5
16 5 5
17 5 5
18 5 5
19 5 5
20 5 5
21 5 5
22 5 5
23 5 5
24 5 5
25 5 5

Regression Artifacts
Table 5-1
(cont.)
Columns as above: (1) Person; (2) Pretest (X) from Table 1-1; (3) Group Difference; (4) Pretest (X) for Regression Artifacts; (5) Posttest (Y) from Table 1-1; (6) Group Difference; (7) Posttest (Y) for Regression Artifacts

Person   Group Difference (col. 3)   Group Difference (col. 6)
26 0 0
27 0 0
28 0 0
29 0 0
30 0 0
31 0 0
32 0 0
33 0 0
34 0 0
35 0 0
36 0 0
37 0 0
38 0 0
39 0 0
40 0 0
41 0 0
42 0 0
43 0 0
44 0 0
45 0 0
46 0 0
47 0 0
48 0 0
49 0 0
50 0 0

Regression Artifacts
Figure 5-1
Regression Artifacts
Figure 5-2

Regression Artifacts
Figure 5-3
Table 5-2
Matched Cases from Table 5-1
Columns: (1) Comparison Group Person Number; (2) Pretest (X) from Table 5-1; (3) Posttest (Y) from Table 5-1; (4) Program Group Person Number; (5) Pretest (X) from Table 5-1; (6) Posttest (Y) from Table 5-1

Summary rows to compute at the bottom of the table: Comparison Group Pretest Average = ___; Comparison Group Posttest Average = ___; Program Group Pretest Average = ___; Program Group Posttest Average = ___

Regression Artifacts
Figure 5-4
Pair-Link Diagram
Regression Artifacts
Figure 5-5
Pair-Link Diagram


Computer Simulations

The computer exercises that follow utilize a computer program known as MINITAB. It has several distinct advantages for simulation
exercises. This statistical package is widely available and relatively inexpensive. The exercises shown here were tested under Release
10.2 of MINITAB for Windows and should run with that or any later release on either Macintosh, IBM-compatible, or even
mainframe computers. It is highly interactive -- you enter a command and can immediately see the results. But it can also be run in
"batch" mode which allows you to enter a long sequence of commands and have them run at once as a program. Batch programs can
also be put in a loop that allows you to run the same sequence of commands many times, accumulating the results from each run. The
exercises in this workbook are all set up for interactive mode because that is the best way to learn about computer simulation (see
Trochim and Davis (1986) for more details on running programs in batch mode). MINITAB is also very easy to learn. There are a
number of readily available instruction manuals and tutorials that introduce the novice to statistical computing.

But MINITAB is not the only computer program that can be used for simulation, nor is it the program of choice for professional
simulators. Virtually any statistical package -- SPSS, SAS, SYSTAT, Statview, DataDesk -- can be used for simulations, and each
has some advantages and disadvantages. For all its strengths as a teaching tool, MINITAB is not often viewed as a serious tool for
advanced statistical analysis in social research because it is slower than others, lacks many of the features of advanced packages, and
may not be as precise. However, for the kinds of exercises described in this workbook, MINITAB is an excellent choice.

In order to do these exercises, all you need to know is how to install and start the MINITAB program. This information can be found
in the manual that comes with the program. The exercises are "machine independent" -- we do not describe them in terms of any
particular operating system or machine configuration. The exercises don't assume any prior knowledge or use of the MINITAB
language, although that would be extremely helpful. We encourage you to work through the tutorial in the manual that comes with
the program.

We hope that the exercises given here will provide you with a solid and interesting introduction to computer simulation for social
research. We believe that you will find that simulation is an important tool for increasing your understanding of social research
methodology.



Generating Data

Introduction

This first computer exercise introduces you to many of the basic computer and statistical concepts that will be used in later exercises.
In this exercise you are going to create some data and then perform some simple analyses of it. If you follow the steps outlined below
you should be able to work through the entire exercise without a problem. However, in order to really benefit, it is important that you
work slowly and think about what you are doing at each step.

After completing the exercise you are certainly encouraged to play with variations of your own and some that will be suggested. You
can ask for a short or long description of any command by typing HINT or HELP followed by the command name. When you
encounter a new command you are encouraged to do this. You might also take a moment to see if the command is listed in the
MINITAB Handbook.

Now, get into the MINITAB program. If you don't know how to do this, you have to look it up in the MINITAB manual that came
with the program. You should see the MINITAB prompt (which looks like this MTB>). Now you are ready to enter the following
commands:

MTB> Random 10 C1;


SUBC> Normal 0 1.

You begin by having the computer create or generate 10 random numbers. We want these numbers to be normally distributed (i.e., to
come from a "bell-shaped" distribution) with a mean or average of zero and a standard deviation of 1. (Remember that the standard
deviation is a measure of the "spread" of scores around the mean). You told the computer to put these ten numbers in variable C1. To
get an idea of what the RANDOM command does type:

MTB> Help Random

Before getting to more serious matters, you should play a little with the ten observations you created. First, print them out to your
screen.....

MTB> Print C1.

Or get means and standard deviations....

MTB> Describe C1

The mean should be near zero and the standard deviation near one. Or, draw a histogram or bar graph....

MTB> Histogram C1

Does it look like a bell-shaped curve? Probably not, because you are only dealing with 10 observations. Why don't you start over, this
time generating 50 numbers instead of 10....
MTB> Random 50 C1;
SUBC> Normal 0 1.

Notice that you have erased or overwritten the original 10 observations in C1. If you do....

MTB> Print C1

all you see are the newest 50 numbers. To describe the data...

MTB> Describe C1

Notice that the mean and standard deviation are probably closer to 0 and 1 than was the case with 10 observations. Why?

MTB> Histogram C1

This should look a little more like a normal curve than the first time (although it may still look pretty bizarre).

Simulation: Generation of Two Variables

The above commands were included to familiarize you with the Random/Normal command. Now you will conduct a real simulation.
You'll create data according to a simple measurement model. You will generate two imaginary test scores for 500 individuals. If you
like, you can imagine that you have given two achievement tests to 500 school children.

The measurement model we'll use assumes that a test score is made up of two parts - true ability and random error. We can depict the
model as:

O = T + eo

Here, O is the observed score on a test, T is true ability on that test and eo is random error.

Notice what we're doing here. We will create 500 test scores or Os for two separate tests. In real life this is all we would be given and
we would assume that each test score is a reflection of some true ability and random error. We would not (in real life) see the two
components on the right side of the equation - we only see the observed score. We'll call our first test the X achievement test, or just
plain X. It has the model....

X = T + eX

which just says that the X test score is assumed to have both true ability and error in measurement. Similarly, we'll call the second
test the Y achievement test, or Y, and assume the model....

Y = T + eY

Notice that both of our tests are measuring the same construct, for example, achievement. For any given child, we assume this true
ability is the same on both tests (i.e., T). Further, we assume that a child gets different scores on X and Y entirely because of the
random error on either test - if the tests both measured achievement perfectly (i.e., without error) both tests would yield the same
score for every child. OK, now try the following....
MTB> Random 500 C1;
SUBC> Normal 0 3.
MTB> Random 500 C2;
SUBC> Normal 0 1.
MTB> Random 500 C3;
SUBC> Normal 0 1.

Be sure to enter these exactly as shown. The first command created 500 numbers which we'll call the true scores or T for the 500
imaginary students. The second command generated the 500 random errors for the X test while the final command generated the 500
errors for the Y test. All three (C1-C3) will have a mean near zero and the true score will have a bigger standard deviation than the
two random errors. How do we know that this will be the case? We set it up this way because we wanted to create an X and Y test
that were fairly accurate - reflected more true ability than error. Now, name the three variables so you can keep track of them.

MTB> Name C1 = 'true' C2 = 'x error' C3 = 'y error'

Don't forget the apostrophe. Now get descriptive statistics for these three variables....

MTB> Describe C1-C3

Note that the means and standard deviations should be close to what you specified. Now construct the X test...

MTB> Add C1 C2 C4.

Remember, C1 is the true score and C2 is random error on the X test. You are actually creating 500 new scores by adding together a
true score, C1, and random error, C2. Now, construct the Y test...

MTB> Add C1 C3 C5.

Notice that you use the same true ability, C1 (both tests are assumed to measure the same thing) but a different random error.

It would be worth stopping at this point to think about what you have done. You have been creating imaginary test scores. You have
constructed two tests which you labeled X and Y. Both of these imaginary tests measure the same trait because both of them share
the same true score. This true score (C1) reflects the true ability of each child on an imaginary achievement test, for example. In
addition, each test has its own random error (C2 for X and C3 for Y). This random error reflects all the situational factors (e.g., bad
lighting, not enough sleep the night before, noise in the testing room, lucky guesses, etc.) that can cause a child to score better or
worse on the test than his true ability alone would yield. One more word about the scores. Because the true score and error variables
were constructed to all have zero means, it should be obvious that the X and Y tests will also have means near zero. This might seem
like an unusual kind of test score, but it was done for technical reasons. If you feel more comfortable doing so, you may think of
these scores as achievement test scores where a positive value indicates a child who scores above average for his/her age or grade,
and a negative score indicates a child who scores below average.

If this were real life, of course, you would not be constructing test scores like this. Instead, you would measure the two sets of scores,
X and Y, and would do an analysis of them. You would assume that the two measures have a common true score and independent
errors, but you would not see these. Thus, you have generated what we call simulated data. The advantage of using such data is that,
unlike with real data, you know how the X and Y tests are constructed because you constructed them. You will see in later
simulations that this enables you to test different analysis approaches to see if they give back the results that you put into the data. If
the analyses work on simulated data then you might assume that they will also work for real data if the real data meet the
assumptions of the measurement model used in the simulations.
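If you would like to see the same measurement model outside of MINITAB, here is a rough Python/NumPy sketch of the data-generation step just described (my own illustration, not part of the exercise); the variable names mirror columns C1 through C5.

import numpy as np

rng = np.random.default_rng(42)

true = rng.normal(0, 3, 500)      # C1: true ability, standard deviation 3
x_error = rng.normal(0, 1, 500)   # C2: random error on the X test
y_error = rng.normal(0, 1, 500)   # C3: random error on the Y test

x = true + x_error                # C4: observed X test score
y = true + y_error                # C5: observed Y test score

# Describe the two observed tests and their relationship.
print(f"X: mean {x.mean():.2f}, sd {x.std(ddof=1):.2f}")
print(f"Y: mean {y.mean():.2f}, sd {y.std(ddof=1):.2f}")
print(f"correlation(X, Y) = {np.corrcoef(x, y)[0, 1]:.2f}")   # should be near .90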
Now, pretend that you didn't create the X and Y tests but, rather, that you were given these two sets of test scores and asked to do a
simple analysis of them. You might begin by exploring the data to see what it looks like. First, name the two tests....

MTB> Name C4= 'X' C5 = 'Y'

Try this command...

MTB> Info

This just tells you how many variables, what names (if any) and how many observations you have. Now, describe the data....

MTB> Describe C4-C5

By the way, you might also try the other column operations, listed in the MINITAB Handbook. For example....

MTB> Count C4

tells you there are 500 observations in C4,

MTB> Sum C4

gives the sum,

MTB> Average C4

gives the mean (which should be near zero),

MTB> Medi C4

gives the median,

MTB> Standard C4

gives the standard deviation, and

MTB> Maxi C4
MTB> Mini C4

give the highest and lowest value in C4.

Now look at the distributions....

MTB> Histogram C4
MTB> Histogram C5

These should look a lot more like bell-shaped or normal curves than the earlier graphs did. Look at the bivariate relationship between
X and Y....

MTB> Plot C5 * C4;
SUBC> symbol.

Notice a few things. You plotted C5 on the vertical axis and C4 on the horizontal. Each point on the graph indicates an X score
paired with a Y score. It should be clear that the X and Y tests are positively correlated, that is, higher scores on one test tend to be
associated with higher scores on the other. To confirm this, do...

MTB> Correlation C4 C5

The correlation should be near .90. You can predict scores on one test using scores on the other. To do this you will use regression
analysis. Fit the straight-line regression of Y on X...

MTB> Regress C5 1 C4

For now, don't worry about what all the output means (although you might want to start looking at Chapter 10, Correlation and
Regression in the MINITAB Handbook. In case you don't already know, you can stop a listing on the screen at any point by holding
down the CTRL key and at the same time pressing the S key. To begin the listing again press both CTRL and Q. You may want to
play with this a little bit.) The regression equation describes the best-fitting straight line for the regression of Y on X. You could draw
this line on the graph you did earlier. Just substitute some values in for X (try X = 0, 1, -1, 2, and -2) and calculate Y using the
equation which the regression analysis gives you. Then plot the X, Y, pairs and you will see that they fall on a straight line. Recall
from your high school algebra days that the number immediately to the right of the equal sign is the intercept and tells you where the
line hits the Y axis (i.e., when x = 0). The number next to the X variable name is the slope. It is possible to look at a plot of the
residuals and the regression line if we use the subcommand form of the regress statement....

MTB> Regress C5 1 C4;


SUBC> Residuals C20;
SUBC> Coefficients C22.
MTB> Let C21=C5-C20

We have arbitrarily chosen columns C20-C22 to store the residuals, predicted values and coefficients, respectively. The predicted Y
value is simply the observed Y minus the residual. The LET command is used to construct the predicted Y value. (Try 'Help regress'
to get information about the command or consult the MINITAB Handbook.) Now, to do a plot of the regression line you plot the
predicted values against the X variable....

MTB> Plot C21 * C4;
SUBC> symbol.

This is actually a plot of the straight line that you fit with the regression analysis. It doesn't look like a "perfect" straight line because
it is done on a line printer and there is rounding error, but it should give you some idea of the type of line that you fit. Now, you can
also look at the residuals (i.e., the Y-distance from the fitted regression line to each of the data points). To do this type....

MTB> Plot C20 * C4;
SUBC> symbol.

Notice that the bivariate distribution is circular in shape indicating that the residuals are uncorrelated with the X variable (remember
the assumption in regression that these must be uncorrelated?). This graph shows that the regression line fits the data well - there
appear to be about as many residuals that are positive (i.e., above the regression line) as negative. You might also want to examine
the assumption that the residuals are normally distributed. Can you figure out a way to do this?
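The fitting, prediction, and residual checks can also be sketched outside MINITAB. The short Python/NumPy example below is an illustration only (it regenerates data under the same assumptions rather than reusing your MINITAB worksheet):

import numpy as np

rng = np.random.default_rng(42)
true = rng.normal(0, 3, 500)
x = true + rng.normal(0, 1, 500)
y = true + rng.normal(0, 1, 500)

# Fit the straight-line regression of Y on X.
slope, intercept = np.polyfit(x, y, 1)
print(f"Y = {intercept:.2f} + {slope:.2f} X")

predicted = intercept + slope * x   # the fitted regression line
residuals = y - predicted           # Y-distance from the line to each point

# The residuals should be essentially uncorrelated with X ...
print(f"corr(residuals, X) = {np.corrcoef(residuals, x)[0, 1]:.3f}")

# ... and a crude normality check is simply a text histogram of the residuals.
counts, edges = np.histogram(residuals, bins=10)
for count, left_edge in zip(counts, edges[:-1]):
    print(f"{left_edge:6.2f} | " + "*" * (count // 5))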

Now, you should again stop to consider what you have done. In the first part of the exercise you generated two imaginary tests, X
and Y. In the second part you did some analyses of these tests. The analyses told you that the means of the tests were near zero,
which is no surprise because that's the way you set things up. Similarly, the bivariate graph and the correlation showed you that the
two tests were positively related to each other. Again, you set them up to be correlated by including the same true ability score in
both tests. Thus, in this first simulation exercise, you have confirmed through simulation that these statistical procedures do tell you
something about what is in the data.

It would probably be worth your time to play around with variations on this exercise. This would help familiarize you with
MINITAB and with basic simulation ideas. For example, try some of the following....

● Change the reliability of the X and Y tests. Recall that reliability is simply the ratio of true score variance to total score
variance. You can increase reliability by increasing the standard deviation in the first Random/Normal statement from 3 to
some higher number (while leaving the error variable standard deviations at 1). Similarly, you can lower the reliability by
lowering the true score standard deviation to some value less than 3. Look at what happens to the correlation between X and
Y when you do this. Also, look at what happens to the slope estimate in the regression equation. (A short sketch after this list shows how the X-Y correlation tracks the reliability.)
● Construct tests with more "realistic" averages. You can do this very simply by putting in a different mean than 0 in the first
Random/Normal statement. However, you should note that the true score measurement model always assumes that the mean
of the error variables is zero. Confirm that the statistical analyses can detect the mean that you put in.
● You can always generate more than two tests. Just make sure that each test has some element of true score and error. If you
really want to get fancy, why not try generating three variables using two different true scores. Have one variable get the
first true score, one get only the second true score, and one have both true scores (you'll have to add in both in a Let
statement). Of course, all three variables should have their own independent errors. Look at the correlations between the
three variables. Also, try to run a regression analysis with all three (look at the Chapter in the MINITAB Handbook or try
the Help command to see how to do this).
● One concept that we will discuss later in the simulation on nonequivalent group designs involves what happens in a
regression analysis when we have error in the independent variable. To begin exploring this idea you might want to rerun
the simulation with the one difference being that you don't add in the error on the X test (i.e., when you construct the X test,
use only the true score C1 and leave out the C2 error variable). Here, the X measure would have nothing but true score - it would be a perfect
measure of true ability. See what happens to the correlation between X and Y. Also, see how the slope in the regression
differs from the slope in the original analysis. Now, run the simulation again, this time including the error in the X test but
not in the Y test. Again, observe what happens to the correlation and regression coefficient. The key question is whether
there is bias in the regression analysis. How similar are the results from the three simulations (i.e., the original simulation,
the perfect X simulation and the perfect Y simulation)? The hard question is why the results of the three simulations come
out the way they do. If you are concerned that the results differ because the random numbers you generate are different, run
the original three Random/Normal commands, do the original simulation as is, and then run the X-perfect and Y-perfect
simulations beginning with the commands that construct the tests, eliminating either the X or Y error term. In this case you are using the
same set of random numbers for all three runs and differences in results can be attributed to use of either perfect or
imperfect measurement. At any rate, this is a complex issue that will be discussed in more detail later.
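The first variation above -- changing the reliability -- is easy to preview numerically. The Python sketch below is my own illustration (not part of the exercise); with equal error variances on the two tests, the X-Y correlation is simply the reliability, the ratio of true score variance to total variance.

import numpy as np

rng = np.random.default_rng(7)

def xy_correlation(true_sd, error_sd=1.0, n=5000):
    # Generate two tests that share a true score and return their correlation.
    true = rng.normal(0, true_sd, n)
    x = true + rng.normal(0, error_sd, n)
    y = true + rng.normal(0, error_sd, n)
    return np.corrcoef(x, y)[0, 1]

for sd in (1, 2, 3, 5):
    reliability = sd**2 / (sd**2 + 1.0)   # true variance / total variance
    print(f"true sd = {sd}: reliability = {reliability:.2f}, "
          f"observed corr(X, Y) = {xy_correlation(sd):.2f}")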

One of the best ways to learn MINITAB is to do it. Don't hesitate to sit at the computer and use the Help and Hint commands to
explore the system.

Downloading the MINITAB Commands

If you wish, you can download a text file that has all of the MINITAB commands in this exercise and you can run the exercise simply
by executing this macro file. To find out how to call the macro check the MINITAB help system on the machine you're working on.
You may want to run the exercise several times -- each time will generate an entirely new set of data and slightly different results.
Click here if you would like to download the MINITAB macro for this simulation.


The Randomized Experimental Design

In this exercise you will simulate a simple pretest-posttest randomized experimental design. This design is of the form

R O X O
R O O

and thus has a pretest, a posttest, and two groups that have been randomly assigned. Note that in randomized designs a pretest is
technically not required although one is often included as a covariate to increase the precision of program effect estimates. We will
assume that we are comparing a program and comparison group (instead of two programs or different levels of the same program).

To begin, get into MINITAB in the usual manner. You should see the MTB prompt (which looks like this MTB>). Now you are
ready to enter the following commands.

You will create two hypothetical tests as in previous exercises. Here, one test will be considered the pretest, the other the posttest.
Assume that both tests measure the same true ability and that they each have their own unreliability or error:

MTB> Random 500 C1;


SUBC> Normal 50 5.
MTB> Random 500 C2;
SUBC> Normal 0 5.
MTB> Random 500 C3;
SUBC> Normal 0 5.

Here C1 represents the true ability on the tests for 500 people. C2 and C3 represent random error for the pretest and posttest
respectively. Notice that the mean ability score for the tests will initially be set to 50 test score units. Next, construct the observed test
scores:

MTB> Add C1 C2 C4.


MTB> Add C1 C3 C5.

You should notice that each test has about equal amounts of true score and error (because all three Random/Normal statements above
use a 5 unit standard deviation). Now, name the columns:

MTB> Name C1 = 'true' C2 ='x error' C3 ='y error' C4 = 'pretest' C5 = 'posttest'

So far you have created a pretest and posttest for 500 hypothetical persons. Next, you need to randomly assign half of the people to the
treated group and half to the control. One way to do this is to create a new random number for each individual. You will then use this
variable to assign cases randomly. Since we want equal size groups (250 in each) you can assign all persons less than or equal to the
median on this random number to one group, and all above the median to the other. Here is the way to do this:

MTB> random 500 C6;


SUBC> normal 0 5.

creates the random assignment number


MTB> let k1=min(C6)
MTB> let k2=median(C6)
MTB> let k3=max(C6)

gets the minimum, median and maximum values on this random assignment number. And

MTB> code (k1:k2) 0 (k2:k3) 1 c6 c7

creates the two equal size groups. To confirm that they are equal in size, do

MTB> table c7

and you should see that there are 250 0's and 1's.

Now, to be consistent with other exercises and to get rid of the unnecessary variable, put C7 into C6 and erase C7

MTB> let C6=C7
MTB> erase C7

Then, name C6

MTB> name C6='group'

Try the following two statements to verify that you have two groups of 250 persons:

MTB> Sign C6
MTB> Histogram 'Group'

Each of these presents slightly different information but both verify that you have two equal sized groups.

Now that you have created two groups, let's say that your treatment had an effect. To put in an effect you have to create a posttest
score that has something added into it for those people who received the treatment, and does not add this in for the control cases.
Remember that to create the posttest originally, you just added together the True Score and Posttest Error for each individual. To
create the posttest with a 10-point treatment effect built in, you would use the following formula

Y = T + eY + (10* Z)

where Z is the 0,1 group variable (C6) you just created. To do this in MINITAB do

MTB> let c7=c1 + c3 + (10*c6)


MTB> name c7='postgain'

Now, c5 is the posttest when there is no treatment effect and c7 is the posttest when there is a 10-point treatment effect.

At this point, it's worth stopping and thinking about what you've done. You created a random True Score (C1) and added it to
independent error (C2) to create a pretest (C4) and to other independent error (C3) to create a posttest (C5). Then you randomly
assigned half of the people to a treatment (C6=1) and to a control (C6=0) condition. Finally, you created a posttest that has a 10-point
treatment effect in it (C7). If this were a real study (and not a simulation), you would observe only three variables: the pretest (X,
C4), the group (Z, C6) and the posttest with a treatment effect in it (Y, C7).
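For readers who want to mirror these steps outside MINITAB, here is a rough Python/NumPy sketch of the whole data-generation sequence (an illustration under the same assumptions, not the MINITAB session itself; it assigns groups with a random permutation rather than the median-split trick, which yields the same kind of equal-sized random groups).

import numpy as np

rng = np.random.default_rng(3)

true = rng.normal(50, 5, 500)            # C1: true ability
pretest = true + rng.normal(0, 5, 500)   # C4 = C1 + C2
posttest = true + rng.normal(0, 5, 500)  # C5 = C1 + C3

# Randomly assign exactly 250 people to treatment (1) and 250 to control (0).
group = rng.permutation(np.repeat([0, 1], 250))

# Build the posttest with a 10-point treatment effect: Y = T + eY + (10 * Z).
postgain = posttest + 10 * group

for g in (0, 1):
    print(f"group {g}: pretest mean {pretest[group == g].mean():.1f}, "
          f"postgain mean {postgain[group == g].mean():.1f}")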

Let's imagine how we might analyze the data using these three variables, in order to see whether the treatment has an effect. One of
the first things we might do is to look at some simple distributions for the pretest and posttest. First, look at some histograms:

MTB> Histogram 'pretest'.

MTB> Histogram 'postgain'.

MTB> Histogram 'pretest';


SUBC> MidPoint;
SUBC> Bar 'group'.

MTB> Histogram 'postgain';


SUBC> MidPoint;
SUBC> Bar 'group'.

The first two commands show the histograms for all 500 cases while the last two show histograms for the two groups separately. Can
you see that the two groups differ on average on the posttest?

Now, look at the bivariate distribution

MTB> Plot 'postgain' * 'pretest';


SUBC> Symbol 'group'.

You should see that the treated group has lots more high posttest scorers than the control group.

Now, look at some descriptive statistics tables.

MTB> Table 'Group';


SUBC> Means 'pretest' 'postgain';
SUBC> StDev 'pretest' 'postgain';
SUBC> N 'pretest' 'postgain'.

Here you should see clearly that while the two groups are very similar in average value on the pretest, they differ by nearly 10 points
on the posttest.

In a randomized experiment, you technically don't need to measure a pretest. You could have the design:

R X O
R O

If you did, all you would be able to do to look for treatment effects is to compare the groups on the posttest. This might best be
accomplished by conducting a simple t-test on the posttest

MTB> TwoT 95.0 c7 c6;


SUBC> alternative 0.
You can get the same result by using regression analysis with the following formula

Y = b0 + b1Z + eY

where

Y = posttest

Z = the 0,1 assignment variable

b0 = posttest mean of the comparison group

b1 = difference between the program and comparison group posttest means

eY = random error

This model can be run in MINITAB using

MTB> Regress 'postgain' 1 'Group'.

This regresses the posttest score onto the 0,1 group variable Z. The results for both the t-test and regression versions should be
identical, but you have to know where to look to see this. In the t-test results, the last line will include 'T=' and report a t-value. The
way you set up the simulation, this t-value should be negative in value (because it tests the control-treatment group difference which
should be negative because the treatment group mean is larger by about ten points). Now look at the regression table under the
heading 't-ratio'. The t-ratio for Group should be the same as the t-test result (except that the sign is reversed).

In general, the regression analysis method of testing for differences is easier to use and interpret than the t-test results. In the regression
results, b0 is the coefficient for the Constant and b1 is the coefficient for Group. The b0 in this case is actually the average posttest
value for the control group. The b1 is the amount you add to the control group average to get the treatment group posttest average,
that is, the estimate of the difference between the two groups on the posttest. This should be somewhere around 10 points. Both
coefficients are tested with a t-test. The p-value tells you the probability of obtaining a coefficient that large by chance alone if the true coefficient were zero.

So far, all you've done is to look at the difference between groups on the posttest. But you also have a pretest measured. How does
this pretest help in analyzing the data? In a randomized experiment, the pretest (or any other covariate) is used to reduce variability in
the posttest that is unrelated to the treatment. If you reduce posttest variability in this way, it should be easier to see a treatment
effect. In other terms, for the very same posttest, including a good pretest should yield a higher t-value associated with the coefficient
for differences between groups. To see this, you have to run a regression model that includes the pretest values in it. This model is:

Y = b0 + b1X + b2Z + eY

where

Y = the posttest

X = the pretest
Z = the assignment variable

b0 = the intercept of the comparison group line

b1 = slope of regression lines

b2 = the program effect

eY = random error

You can run this in MINITAB by doing:

MTB> Regress 'postgain' 2 'pretest' 'Group'.

Now, if you look at the t-ratio associated with the Group variable you should see that it is higher than it was in the original regression
equation you ran. Even though you used the exact same posttest variable, you are able to see the treatment effect more clearly (i.e.,
got a higher t-value) because you included a good covariate (the pretest) that reduced some of the noise in the posttest that might
obscure the treatment effect.
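A quick way to see this gain in precision is to fit both models side by side and compare the t-ratio on the group variable. The Python sketch below uses the statsmodels OLS routine as a stand-in for the MINITAB Regress command (an illustration only; the data are regenerated under the same assumptions as above).

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)

true = rng.normal(50, 5, 500)
pretest = true + rng.normal(0, 5, 500)
group = rng.permutation(np.repeat([0, 1], 250))
postgain = true + rng.normal(0, 5, 500) + 10 * group

# Model 1: posttest on group only (equivalent to the t-test / one-way ANOVA).
m1 = sm.OLS(postgain, sm.add_constant(group)).fit()

# Model 2: posttest on pretest and group (the ANCOVA form).
m2 = sm.OLS(postgain, sm.add_constant(np.column_stack([pretest, group]))).fit()

print(f"group effect without covariate: {m1.params[1]:.2f} (t = {m1.tvalues[1]:.1f})")
print(f"group effect with pretest:      {m2.params[2]:.2f} (t = {m2.tvalues[2]:.1f})")

Both estimates should be near the 10 points that were built in, but the t-ratio is larger when the pretest soaks up some of the posttest noise.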

At this point you should be convinced of the following:

● One way to analyze data from this design is to conduct a t-test or one-way ANOVA on the difference between the posttest
means. This can be accomplished using the simple model given earlier:

Y = b0 + b1Z + eY

Notice several things. First, this model fits regression lines for both groups, but because X is not included the lines have no
slope (i.e., they are flat lines). You can construct the predicted line for both groups by substituting the appropriate values for
Z. The regression line for the program group is:

YP = b0 + b1(1)
YP = b0 + b1

and for the comparison groups it is:

YC = b0 + b1(0)
YC = b0

Therefore, the effect of the program is the difference between the two lines or

YP - YC = (b0 + b1) - b0
YP - YC = b1

You should be convinced that this is the difference between the posttest means for the two groups.

● You also analyzed the data using a model called the analysis of covariance (ANCOVA):

Y = b0 + b1X + b2Z + eY

This analysis is almost identical to the analysis used later for the regression-discontinuity design. Theoretically, one should
get a similar estimate of the program effect with the ANCOVA and the ANOVA but the ANCOVA estimate will in general
be more precise. Specifically, the ANCOVA tests for posttest differences after "adjusting for" variance in the pretest. In
general, then, given the same data and significance level it will be easier to find a significant effect (i.e., b2 is not equal
to 0) when using ANCOVA.

● The design simulated here is a very simple single-factor randomized experiment. You could simulate more complex
designs. For example, to simulate a randomized block design you would first rank all persons on the pretest. Then you could
set the block size, for example at n = 2. Then, beginning with the lowest two pretest scorers, you would randomly assign one
to the program group and the other to the comparison group. You could do this by rolling a die--if you get a 1, 2, or 3 the
lowest scorer is a program participant; if you get a 4, 5, or 6 the higher scorer is. Continuing in this manner for all twenty-
five pairs would result in a block design. Designing a randomized block simulation of this type is difficult in MINITAB --
see if you can figure it out.

You could also simulate a 2 x 2 factorial design. Here, you simply need to randomly assign four groups. You might want to
assume that both programs are not equally effective and hence have different effect sizes for each. Also, in this case you
would need to consider the interaction of the two factors and put in a specific effect size to simulate it. To develop such a
factorial design in MINITAB you have to create two dummy-coded (0,1) variables to represent groups. You should also
construct a variable representing their interaction (the easiest way is to multiply the two dummy-coded treatment variables
together). You can then put in a treatment effect for either factor or for the interaction by adding some value associated with
those terms into the posttest score. Your regression model will have to include the dummy-coded group variables and the
interaction term. (A sketch of this dummy-coding setup appears after this list.)

● You might also change several of the key parameters of the simulation to see what happens. For instance, you might change
the size of the error terms (C2 and C3) holding everything else constant, and look at what happens to the t-value associated
with the treatment effect. Or, you might change the size of the treatment effect from 10 to 3 points and see how this affects
the analysis.
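As a sketch of the dummy-coding idea mentioned in the factorial suggestion above (my own illustration, with made-up effect sizes and a simple random assignment to the four cells), the setup is just two 0/1 factors plus their product as the interaction term:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)

n = 500
true = rng.normal(50, 5, n)
a = rng.integers(0, 2, n)   # dummy code for factor A
b = rng.integers(0, 2, n)   # dummy code for factor B
ab = a * b                  # interaction term

# Hypothetical effects: 10 points for A, 3 for B, and 5 extra when both occur.
posttest = true + rng.normal(0, 5, n) + 10 * a + 3 * b + 5 * ab

X = sm.add_constant(np.column_stack([a, b, ab]))
res = sm.OLS(posttest, X).fit()
for name, coef, t in zip(["const", "A", "B", "A*B"], res.params, res.tvalues):
    print(f"{name:5s} {coef:7.2f}  (t = {t:.1f})")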

Downloading the MINITAB Commands

If you wish, you can download a text file that has all of the MINITAB commands in this exercise and you can run the exercise simply
by executing this macro file. To find out how to call the macro check the MINITAB help system on the machine you're working on.
You may want to run the exercise several times -- each time will generate an entirely new set of data and slightly different results.
Click here if you would like to download the MINITAB macro for this simulation.

The Nonequivalent Group Design
Part I

In this exercise you are going to create data for and analyze a nonequivalent group design. The design has several important
characteristics. First, a pretest and posttest are given to all participants. Second, the design usually has two groups, one which gets
some program or treatment and one which does not (usually termed the "program" and "comparison" groups, respectively). Third, the
two groups are "nonequivalent groups", that is, we expect that they may differ prior to the study. Often, nonequivalent groups are
simply two intact groups that are accessible to the researcher (e.g., two classrooms, two states, two cities, two mental health centers,
etc.). We can depict the design using the following notation:

N O X O
N O O

where the N indicates that the groups are nonequivalent, the first O represents the pretest, the X indicates administration of some
program or treatment, and the last O signifies the posttest. Notice that the top line represents the program group while the bottom line
signifies the comparison group.

To begin, get into MINITAB in the usual manner. You should see the MTB prompt (which looks like this MTB>). Now you are
ready to enter the following commands.

You will create two hypothetical tests as in previous exercises. Here, one test will be considered the pretest, the other the posttest. We
will assume that both tests measure the same true ability and that they each have their own unreliability or error:

MTB> Random 500 C1;


SUBC> Normal 50 5.
MTB> Random 500 C2;
SUBC> Normal 0 5.
MTB> Random 500 C3;
SUBC> Normal 0 5.

Here C1 represents the true ability on the tests for 500 people. C2 and C3 represent random error for the pretest and posttest
respectively. Notice that the mean ability score for the tests will initially be set to 50 test score units. Next, construct the observed
test scores:

MTB> Add C1 C2 C4.


MTB> Add C1 C3 C5.

You should notice that each test has about equal amounts of true score and error (because all three Random/Normal statements above
use a 5 unit standard deviation). Now, name the columns:

MTB> Name C1 ='true' C2 ='x error' C3 ='y error' C4 ='pretest' C5 ='posttest'

What you have done so far is to create a pretest and posttest for 500 hypothetical persons. Next, you have to create "nonequivalent"
groups. For convenience, you will create groups of 250 persons each. To do this enter:
MTB> Set C6
DATA> 1:500
DATA> End
MTB> Code (1:250) 0 C6 C6
MTB> Code (251:500) 1 C6 C6

The SET statement (and the two associated DATA statements) simply numbers each person from 1 to 500 and puts this sequence of
numbers in C6. The first code statement essentially says "change all the numbers from 1 to 250 in C6 to 0's and put these 0's back
into C6." The second code replaces the numbers from 251 to 500 with a 1. You have created two groups of 250 persons each. You
know which group a person is in by looking at their value in C6. If they have a 0, they are in one group; if they have a 1, they are in
the other. For convenience, the persons having a zero will be the comparison group and those having a one will be the program
group. You should name this new variable:

MTB> Name C6 = 'Group'

Try the following three statements to verify that you have two groups of 250 persons:

MTB> Table C6
MTB> Sign C6
MTB> Histogram 'Group'

Each of these presents slightly different information but all of them verify that you have two equal sized groups.

But you have still not created "nonequivalent" groups. To see this, you will use the subcommand form of the TABLE command:

MTB> Table C6;


SUBC> means C4 C5.

The first row of the table gives the pretest and posttest means for the comparison group (C6 = 0) while the second row gives these
values for the program group. At this point, all four means should be near 50 test score units.

In the nonequivalent group design we typically select two groups which we hope are similar or equivalent. Nevertheless, because we
don't select these groups randomly, we expect that one group may be better or worse than the other before our study. You saw from
the table command above that both groups appear to be similar on the pretest. Therefore, you can create nonequivalent groups by
making the program group slightly better in test ability. This situation might occur in real life if we chose two classrooms of students
that we thought were pretty similar, only to find out that one group scores on the average a few points better than the other. To create
the "advantaged" program group do the following:

MTB> Let C4 = C4 + (5 * C6)


MTB> Let C5 = C5 + (5 * C6)

It is important to think about what these statements are doing. The first let command operates on the pretest scores. You add five test
score points to each program group pretest score. How does this work? Remember that C6 has a 0 for all the comparison group
persons and a 1 for the program people. When you multiply 5 times this C6 variable the result will be a zero for each comparison
person and a 5 for each program person. You then add these 0 or 5 points to the original pretest score and put the result right back
into C4. The second Let command does the same thing for the posttest scores and, as a result, this "advantage" should be seen on
both the pre and posttest. Now verify that you have an "advantaged" program group (that is, that you have nonequivalent groups).
You will again use the table command but will add another subcommand to give the standard deviations:

MTB> Table C6;


SUBC> means C4 C5;
SUBC> stdev C4 C5.

Clearly, the program group has pre and posttest averages in the vicinity of 55 test score units.

So far, you have created two nonequivalent groups having a pretest and posttest. But one of these groups received your program or
treatment. Did it work? It would appear from the data that it did not. The difference between the group means on the pretest is about
the same as their posttest difference. About the only way that you could claim that the program had an effect is if you had reason to
believe that without it the posttest difference between the means would have been different than it is. This would be possible, for
example, if the groups had been maturing at different rates (a selection - maturation threat) but without any other evidence than these
test scores this would be a hard argument to accept. On the basis of this data you would probably conclude that the program was
ineffective. This makes sense especially because you did not build into the data any program effect. Now add 10 test score points to
the posttest for each program person - a treatment effect of 10 points:

MTB> Let C5 = C5 + (10*C6)

which you should recognize as the same type of command that you used above to create nonequivalent groups in the first place. Now,
look at the means and standard deviations:

MTB> Table C6;
SUBC> means C4 C5;
SUBC> stdev C4 C5.

Now, the pretest difference between the two groups is still about 5 points on the average, but the posttest difference is about 15
points. The "gain" of the program group over what you might expect on the basis of pretest scores appears to be about 10 points
(which, of course, is exactly what you set it up to be).
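
If it helps to see the whole data-generation sequence in one place, here is a rough Python equivalent (this is not part of the original MINITAB exercise, just a sketch; the variable names mirror the MINITAB columns, and the sample size, standard deviations, 5-point nonequivalence, and 10-point effect are the values used above):

import numpy as np

rng = np.random.default_rng()
n = 500                                      # 250 comparison + 250 program cases
true = rng.normal(50, 5, n)                  # C1: true ability
x_err = rng.normal(0, 5, n)                  # C2: pretest error
y_err = rng.normal(0, 5, n)                  # C3: posttest error
group = np.repeat([0, 1], n // 2)            # C6: 0 = comparison, 1 = program
pretest = true + x_err + 5 * group           # C4 after the first Let command
posttest = true + y_err + 5 * group + 10 * group   # C5 after both Let commands and the program effect

# group means: roughly a 5-point pretest and a 15-point posttest difference
for g in (0, 1):
    print(g, pretest[group == g].mean(), posttest[group == g].mean())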

At this point it is worth reflecting on what you have done. If you had conducted a study using the nonequivalent group design, you
could have obtained data like that which is described in the last table. You would notice that the groups appear to be different on the
pretest, with the program group having the advantage. You would also notice that the difference between the groups is considerably
larger on the posttest. In fact, you have simulated data for a nonequivalent group design and (whether you realize it or not) you have
explicitly controlled the size of the correlation between the measures, the number of persons in each group, the amount of
nonequivalence between the groups, the size of the program effect, and so on. One reason we run simulations of this type is to
determine whether the statistical analyses we use give us accurate estimates of the effects of our programs. Since we
specifically put in a 10 point program effect, we would expect that an accurate analysis would tell us that the effect was about that
large. Let's find out if our analysis will work.

The typical strategy for analyzing pretest-posttest group designs is based on the Analysis of Covariance (ANCOVA).
Essentially, we want to look at the difference between the two groups on the posttest after we have "adjusted for" the initial
differences between the groups as measured on our covariate - the pretest. The ANCOVA can be run as a multiple regression
analysis (the ANCOVA is simply a special case of the multiple regression model - we would get exactly the same results whether we
use a computer program that does ANCOVA or one that does regression, as long as we tell the regression program the correct model
to estimate). We will generally use the regression command in MINITAB to conduct the Analysis of Covariance.

Before actually running the analysis you ought to look at plots of the data. Try some of the following:

MTB> Histogram C4
MTB> Histogram C5
MTB> Plot C5 * C4
MTB> Plot C5 * C4;
SUBC> symbol C6.

The Histogram commands show the distributions for the pretest and posttest. The first plot command shows the pre-post bivariate
distribution. The second plot command shows this same distribution but uses different symbols for the program and comparison
groups. Unfortunately, it may be difficult to see the program and comparison groups distinctly. As a side exercise, you might try to
use the choose command to create separate columns of pre and posttest scores for the two groups so these distributions can be plotted
separately. Now run the ANCOVA using the MINITAB regression command. The regression model form of the ANCOVA can be
stated as:

Y = b0 + b1X + b2Z + eY

where

Y = the posttest (C5)

X = the pretest (C4)

Z = the assignment variable (C6)

b0 = the intercept of the comparison group line

b1 = slope of regression lines

b2 = the program effect

eY = random error

To do this analysis enter:

MTB> Regress C5 2 C4 C6

The computer will first print out the regression equation. The first number on the right of the equal sign is the intercept (b0) of the
comparison group regression line (because you included C6, a dummy 0,1 variable in the regression, in effect two lines are being fit
to the data, one for each group). The second number in the equation gives the slope (b1) for the program and comparison group
regression lines (recall that the Analysis of Covariance assumes that the slopes of the two groups are equal - thus, we only simulate a
single value). The third number after the equal sign is the estimate of the program effect (b2). Recall that you put in a program effect
of 10 points. Is this value close? The table below the equation tests whether these three values are significantly different from zero.
Since you are particularly interested in determining whether this analysis gives an accurate estimate of the program effect, you should
look in the table for the line for variable C6, the "group" variable. The coefficient or estimate that was shown in the equation is
repeated first on this line. Then the standard deviation of the estimate is shown. You know that you put in a program effect of 10
points. To see whether the estimate given by the analysis is accurate at a .05 level of significance, you have to construct a confidence
interval for the estimate or coefficient. To do this, first multiply the standard deviation for that coefficient by 2 and then add and
subtract this value from the estimate. For example, let's say the analysis tells you that the estimate or coefficient of the C6 variable is
11.3 and that the standard deviation is 0.5 units. Given this, the 95% confidence interval ranges from 10.3 to 12.3 (that is, 11.3 plus or
minus 2 times .5). This analysis would be telling you that the best estimate of the program effect is 11.3 and that the odds are less
than 5 out of 100 that the true effect is outside of that range. Recall that you have simulated that the true effect is ten points. In this
example, you would wonder whether the analysis we used (ANCOVA) is working correctly because the program effect that you put
in doesn't fall within the 95% confidence interval. When you construct the confidence interval do you find that 10 is included within
it or not? Is the estimate of effect above or below 10?
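
If you would like a cross-check on what the Regress command is estimating, the same ANCOVA model can be fit by ordinary least squares outside MINITAB. The following is only a sketch in Python; it assumes the pretest, posttest and group arrays from the data-generation sketch above and uses statsmodels simply to get the coefficients and their standard errors:

import numpy as np
import statsmodels.api as sm

# design matrix: intercept (b0), pretest (b1), group dummy (b2)
X = sm.add_constant(np.column_stack([pretest, group]))
fit = sm.OLS(posttest, X).fit()

b2 = fit.params[2]     # estimated program effect (coefficient for the group dummy)
se = fit.bse[2]        # standard error of that coefficient
print("effect estimate:", b2)
print("approximate 95% CI:", b2 - 2 * se, "to", b2 + 2 * se)
# with measurement error on the pretest, the true effect of 10 will often fall outside this interval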

If you have followed the instructions, you will find that most of the time you will not get an accurate estimate of the effect. In fact,
ANCOVA yields biased estimates of effect for this type of nonequivalent group design. We do have better analysis strategies, but in
order to understand them well it is important to understand why the ANCOVA strategy fails. You should try to get some idea of why
ANCOVA fails by conducting simulations like the one above. Some variations are suggested below. The next exercise will present
an analysis strategy which can often be used to obtain correct estimates of program effect.

● A key reason for the failure of ANCOVA is unreliability or error in the measures. You explicitly controlled the reliability by
setting the standard deviations of the true and error scores in the Random/Normal statements. Try the simulation again
setting the true score standard deviation to 10 and the error standard deviations to 1.

● Try the variation above but make the pretest more reliable than the posttest. To do this, use a small standard deviation for
the pretest error (C2) and a larger one for posttest error (C3).

● Try to construct a simulation where the treatment group is disadvantaged relative to the comparison group. To do this, you
will have to multiply the C6 variable by a negative number in the appropriate let statement above.

● Put in a negative program effect. To do this you will have to use a negative number where you used the +10 above. A
negative effect implies that your program actually hurt rather than helped the program group relative to the comparison
group.

When we use simulation techniques to investigate the accuracy of a statistical analysis we never rely on the results of a single run
because the results could be wrong simply by chance. Typically, we would run the simulation several hundred times and average the
estimates of program effect to see if the analysis is biased or not. Although that many runs is probably not feasible for you, it might
be worthwhile for you to compare the estimates of effect that you got with estimates which others obtain. If you average these
estimates, you should see more clearly that ANCOVA yields a biased estimate for the nonequivalent group design.
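
Although running the MINITAB exercise several hundred times by hand is impractical, the looping version is easy to sketch in another environment. Here is a hedged Python version (not part of the original exercise; the values mirror the ones used above) that repeats the whole Part I simulation and averages the ANCOVA estimates:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng()
estimates = []
for _ in range(500):                                    # several hundred replications
    true = rng.normal(50, 5, 500)
    group = np.repeat([0, 1], 250)
    pre = true + rng.normal(0, 5, 500) + 5 * group      # nonequivalent pretest
    post = true + rng.normal(0, 5, 500) + 15 * group    # 5-point selection difference + 10-point effect
    X = sm.add_constant(np.column_stack([pre, group]))
    estimates.append(sm.OLS(post, X).fit().params[2])

print("average estimated effect:", np.mean(estimates))  # typically biased (for this setup, above 10)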

Downloading the MINITAB Commands

If you wish, you can download a text file that has all of the MINITAB commands in this exercise and you can run the exercise simply
by executing this macro file. To find out how to call the macro check the MINITAB help system on the machine you're working on.
You may want to run the exercise several times -- each time will generate an entirely new set of data and slightly different results.
Click here if you would like to download the MINITAB macro for this simulation.


Copyright © 1996, William M.K. Trochim


The Nonequivalent Group Design
Part II

This is the second exercise in the analysis of nonequivalent group designs. In the first exercise you learned how to simulate data for a
pretest - posttest two-group design. You put in an initial difference between the two groups (i.e., a selection threat) and also put in a
treatment effect. You then analyzed the data with an Analysis of Covariance (ANCOVA) model and, if all went well, you found that
the estimate of program effect was biased (i.e., the true effect was not within a 95% confidence interval of the obtained estimate).

In this exercise, you will create data exactly as you did in the last exercise. This time however, you will adjust the data for
unreliability on the pretest before conducting the ANCOVA. Because unreliability or measurement error on the pretest causes the
bias when using ANCOVA, this correction should result in an unbiased estimate of the program effect.

By now you should be fairly familiar with what most of these MINITAB commands are doing (remember, you should consult the
MINITAB Handbook or use the HELP command if you want information about a command) and so the initial command will be
presented without explanation. Get into MINITAB and you should see the MINITAB prompt (which looks like this MTB>). Now
you are ready to enter the following commands:

MTB> Random 500 C1;
SUBC> Normal 50 5.
MTB> Random 500 C2;
SUBC> Normal 0 5.
MTB> Random 500 C3;
SUBC> Normal 0 5.
MTB> Add C1 C2 C4.
MTB> Add C1 C3 C5.
MTB> Name C1 ='true' C2 ='x error' C3 ='y error' C4 ='pretest' C5 ='posttest'
MTB> Set C6
DATA> 1:500
DATA> End
MTB> Code (1:250) 0 C6 C6
MTB> Code (251:500) 1 C6 C6
MTB> Name C6 = 'Group'
MTB> Let C4 = C4 + (5 * C6)
MTB> Let C5 = C5 + (15 * C6)

You have constructed the data exactly as in Part I. Notice that the data commands from the previous exercise have been condensed
here. If you don't understand why you used these commands, go back and read the instructions for Part I again. Be sure that you
understand that you now have data for two groups, one of which received a program or treatment (i.e., those with a '1' for C6) and the
other a comparison group (i.e., with a '0' for C6). The groups differ on the pretest by an average of five points, with the program
group being advantaged. On the posttest, the groups differ by fifteen points on the average. It is assumed that five of this fifteen point
difference is due to the initial difference between groups and that the program had a ten point average effect. You should verify this
by examining the group means:

MTB> Table C6;
SUBC> means C4 C5.

You should also look at histograms and at the bivariate distribution:

MTB> Histogram C4
MTB> Histogram C5
MTB> Plot C5 * C4
MTB> Plot C5 * C4;
SUBC> Symbol C6.

Recall that the ANCOVA yields biased estimates of effect because there is measurement error on the pretest. You added in the
measurement error when you added C2 (x-error) into the pretest score. In fact, you added in about as much error as true score
because the standard deviations used in all three Random/Normal commands are the same (and equal to 5). In order to adjust the
pretest scores for measurement error or unreliability, we need to have an estimate of how much variance or deviation in the pretest is
due to error (we know that it is about half because we set it up that way). Recall that reliability is defined as the ratio of true score
variance to total score variance and that we estimate reliability using a correlation. Traditionally, we can use either a split-half
reliability (the correlation between two randomly-selected subsets of the same test) or a test-retest correlation. In these simulations
we will correct the pretest scores using a test-retest correlation. Since the test-retest correlation may differ between groups it is often
advisable to calculate this correlation for the program and comparison groups separately. Before we can do this we have to separate
the data for the two groups and put it in separate columns:

MTB> Copy C6 C4 C5 C20-C22;
SUBC> use C6 = 0.
MTB> Copy C6 C4 C5 C30-C32;
SUBC> use C6 = 1.
MTB> Name C21 = 'contpre' C22 = 'contpost' C31= 'progpre' C32 = 'progpost'

The first statement copies all cases having a '0' in C6 (i.e., all the comparison group cases) and their pretest and posttest scores (C4
and C5) and puts these scores in C20 through C22. It is important for you to notice that the order in which you copy the variables
(i.e., C6, C4, C5) is the order in which they are put into the other columns. Therefore, C20 should consist entirely of '0' values. The
next statement copies all the program group cases and puts them in C30 through C32. We are not interested in C20 or C30 because
these consist of the '0' and '1' values so we don't name them. We name C21 'contpre' to stand for control pretest, C31 'progpre' to
stand for program group pretest, and so on. Now, we are ready to estimate the test-retest reliabilities for each group. These are
simply the correlations between the pretest and posttest for each group:

MTB> Correlation C21 C22 M1


MTB> Correlation C31 C32 M2

Each correlation is stored in a 2x2 matrix variable. In order for us to use these correlations later, we have to copy them from the M
matrix into a K-variable constant. We can do this with the following commands:

MTB> copy M1 C41 C42


MTB> copy C41 K1;
SUBC> use 2.
MTB> copy M2 C43 C44
MTB> copy C43 K2;
SUBC> use 2.

Now, we will adjust the pretest scores for the comparison group. All you need to do is apply the following formula:

MTB> Let C23 = aver (C21) + (K1 * (C21 - aver (C21)))


Be sure to type this statement exactly as is. If you make a mistake, just re-enter the command. Now, let's adjust the scores for the
program group. Use the following formula:

MTB> Let C33 = aver (C31) + (K2 * (C31 - aver (C31)))

You should recognize that for each group you are taking each person's pretest score, subtracting out the group pretest average,
multiplying this difference by the within-group pre-post correlation (the K-variable), and then adding the group pretest mean back in.
For instance, the formula for this reliability correction for the control group is:

contadj = aver(contpre) + (r * (contpre - aver(contpre)))

where r is the pre-post correlation for the control group (the constant K1).

We should probably name these new adjusted pretests:

MTB> Name C23='contadj' C33='progadj'

Now, compare the pretest means for each group before and after making the adjustment:

MTB> Describe C21 C22 C23 C31 C32 C33

What do you notice about the difference between unadjusted and adjusted scores? Did the mean change because of the adjustment
(compare the means for C21 and C23 for instance)? Did the standard deviations? Remember that about half of the standard deviation
of the original scores was due to error. How much have we reduced the standard deviations by adjusting? How big is the pre-post
correlation for each group? Is the size of the correlation related to the standard deviations in any way?

Now, we want to combine the pretest scores for the two groups back into one set of scores (i.e., one column of data). To do this we
use the Stack command:

MTB> Stack (C23) (C33) (C7).


which means 'stack the scores in C23 on top of the scores in C33 and put these into C7'. Now let's name the adjusted pretest scores:

MTB> Name C7 = 'adjpre'

Now take a look at the means and standard deviations of the original pretest, adjusted pretest and posttest by group:

MTB> Table C6;
SUBC> means C4 C7 C5;
SUBC> stdev C4 C7 C5.

It should be clear to you that the adjustment for measurement error on the pretest did not affect the means, but did affect the standard
deviations. You are now ready to conduct the analysis. First, do the ANCOVA on the original pretest scores (C4) just like in Part I.
You should see that the estimate of effect is biased. Then, conduct the analysis using the adjusted pretest scores (C7). This time you
should see that the estimate of effect is much closer to the ten points that you originally put in. First, the biased analysis:

MTB> Regress C5 2 C4 C6

Remember that the estimate of the program effect is the COEFFICIENT in the table which is on the C6 GROUP line. Is it near a
value of ten? You can construct a 95% confidence interval for this estimate using the ST.DEV. OF COEF. which is on the same line
as the estimate. To construct this interval, first multiply the standard deviation of the coefficient by two (remember from statistics
that the 95% confidence interval is plus or minus approximately two standard error units). To get the lower limit of the interval
subtract this value from the coefficient in the table; to get the upper limit, add it. Does the true program effect (10 points) fall
within the bounds of this interval? For most of you it will not (just by chance, it may for a few of you; even if it does fall within the
interval it will probably not be near the center of the interval). Now, conduct the same ANCOVA analysis, this time substituting the
adjusted pretest scores (C7) for the original ones:

MTB> Regress C5 2 C7 C6

Again, look at the coefficient in the table that is on the same line as the C6 GROUP variable. Is this value closer to the true program
effect of 10 points than it was for the previous analysis? It should be (again, for some of you this may not be true just by chance
alone). Once again, construct the 95% confidence interval for the coefficient using the standard deviation of the coefficient given in
the table. You should see that this time the true program effect of 10 points falls within the interval.

You have just conducted a reliability-corrected Analysis of Covariance. In practice, we know that test-retest and split-half reliability
estimates will often differ, sometimes considerably. Because of this, we will often conduct two analyses like the one above -- once
using the test-retest correlations and once using the split-half correlations. Although this will give us two estimates of the program
effect, we expect that one is a conservative adjustment while the other is a liberal one. We can therefore be fairly confident that the
true program effect probably lies somewhere between the two estimated effects.
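
For reference, the entire correction-and-reanalysis sequence can be compressed into a short sketch in Python (this is not the original MINITAB macro; the data are rebuilt with the same values used above, and the within-group pre-post correlation serves as the reliability estimate, exactly as in the exercise):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng()
true = rng.normal(50, 5, 500)
group = np.repeat([0, 1], 250)
pre = true + rng.normal(0, 5, 500) + 5 * group          # nonequivalent pretest
post = true + rng.normal(0, 5, 500) + 15 * group        # 5-point selection difference + 10-point effect

adj = np.empty_like(pre)
for g in (0, 1):
    m = group == g
    r = np.corrcoef(pre[m], post[m])[0, 1]              # within-group test-retest estimate
    adj[m] = pre[m].mean() + r * (pre[m] - pre[m].mean())

for covariate, label in ((pre, "unadjusted"), (adj, "corrected")):
    X = sm.add_constant(np.column_stack([covariate, group]))
    print(label, "ANCOVA effect estimate:", sm.OLS(post, X).fit().params[2])
# the corrected estimate should be much closer to the true 10-point effect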

At this point, you should stop and consider the steps that were involved in conducting this analysis. You might want to try one or
more of the following variations:

● Change the reliability of the pretest. Recall that the reliability of the pretest depends on how much true score and error you
add into it. Try the simulation with extremely low reliability (high standard deviation for error, C2, low for true score, C1)
and high reliability. You might even want to try a perfectly reliable pretest (i.e., don't add C2 in at all). In this case, we
would expect that the unadjusted ANCOVA yields an unbiased estimate of the program effect.
● Change the reliability of the posttest. Here you would alter the ratio of the standard deviations for the Y error, C3, and true
score, C1 variables. Recall that unreliability on the posttest should not result in biased estimates with unadjusted ANCOVA.
However, the estimates will be less precise when you have more error on the posttest. You will see this by looking at the
standard deviation of the coefficient and comparing it with the standard deviation of the coefficient which you obtained in
this simulation. With higher error variance on the posttest, the 95% confidence interval should be wider.

● As in the previous exercise variations, try to construct a simulation where the program group is disadvantaged relative to the
comparison group.

● Conduct the simulation above but put in a negative program effect. To do this, just put in a -15 for the 15 in the LET
statement which constructed the posttest.

Downloading the MINITAB Commands

If you wish, you can download a text file that has all of the MINITAB commands in this exercise and you can run the exercise simply
by executing this macro file. To find out how to call the macro check the MINITAB help system on the machine you're working on.
You may want to run the exercise several times -- each time will generate an entirely new set of data and slightly different results.
Click here if you would like to download the MINITAB macro for this simulation.


Copyright © 1996, William M.K. Trochim


The Regression Discontinuity Design

In this exercise we are going to create and analyze data for a regression discontinuity design. Recall that in its simplest form the
design has a pretest, a posttest, and two groups, usually a program and comparison group. The distinguishing feature of the design is
its procedure for assignment to groups -- persons or units are assigned to one or the other group solely on the basis of a cutoff score
on the pre-program measure. Thus, all persons having a pre-program score on one side of the cutoff value are put into one group and
all remaining persons are put in the other. We can depict the design using the following notation:

C O X O
C O O

where the C indicates that groups are assigned by a cutoff score, the first O represents the pretest, the X depicts the administration of
some program or treatment and the second O signifies the posttest. Notice that the top line represents the program group while the
second line indicates the comparison group.

In this simulation you will create data for a "compensatory" program case. We assume that both the pretest and posttest are fallible
measures of ability where higher scores indicate generally higher ability. We also assume that we want the program being studied to
be given to the low pretest scorers - those who are low in pretest ability.

Get into MINITAB as you normally would. You should see the MINITAB prompt (which looks like this MTB>). Now you are ready to enter the commands below.

The first step is to create two hypothetical tests, the pretest and posttest. Before you can do this you need to create a measure of true
ability and separate error measures for each test:

MTB> Random 500 C1;
SUBC> Normal 50 5.0.
MTB> Random 500 C2;
SUBC> Normal 0 5.0.
MTB> Random 500 C3;
SUBC> Normal 0 5.0.

Now you can construct the pretest by adding true ability (C1) to pretest error (C2):

MTB> Add C1 C2 C4

Before constructing the posttest it is useful to create the variable that describes the two groups. The pretest mean will be about 50 and
you will use 50 as the cutoff score in this simulation. Because this is a compensatory case, we want all those who score lower than or
equal to 50 to be program cases, with all those scoring above 50 to be in the comparison group. The following two code statements
will create a new dummy variable (C5) with a value of 1 for program cases and 0 for comparison cases:

MTB> Code (0:50) 1 C4 C5


MTB> Code (50:100) 0 C5 C5

To check on how many persons you have in each condition do:


MTB> Table C5

Notice that you probably don't have exactly 250 people in each group (although in the long run, that is how many you would expect
if you divide a normal distribution at the mean). Now you are ready to construct the posttest. We would like to simulate an effective
program so we will add in 10 points for all program cases (recall that you accomplish this by multiplying 10 by the dummy-coded
treatment variable - for all program cases this product is 10, for comparison cases, 0 -- this is then added into the posttest):

MTB> Let C6= C1 + C3 + (10*C5)

It is convenient to name the variables:

MTB> Name C1 = 'true' C2 ='x error' C3 ='y error' C4 ='pretest' C5='group' C6='posttest'

To get some idea of what the data look like try:

MTB> Table C5;
SUBC> means C4 C6.

and don't forget to put the period at the end of the second line. This command gives pre and post means for the two groups. Note that
the program group starts off at a distinct disadvantage - we deliberately selected the lower scorers on the pre-program measure.
Notice also that the comparison group actually regresses back toward the overall mean of 50 between the pretest and posttest. This is
to be expected because you selected both groups from the extremes of the pretest distribution. Finally, notice that the program group
scores as well or better than the comparison group on the posttest. This is because of the sizable 10 point program effect which you
put in. You might examine pre and post histograms, correlations, and the like. Now, look at the bivariate distribution:

MTB> Plot C6 * C4;
SUBC> symbol C5.

You should be able to see that the bivariate distribution looks like it "jumps" at the pretest value of 50 points. This is the discontinuity
that we expect in a regression-discontinuity design when the program has an effect (note that if the program has no effect we expect a
bivariate distribution that is continuous or does not jump).
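
The same data-construction steps can be written compactly outside MINITAB. The following Python sketch is for comparison only and is not part of the original exercise; the cutoff of 50 and the 10-point effect are the values used above:

import numpy as np

rng = np.random.default_rng()
true = rng.normal(50, 5, 500)                        # C1: true ability
pre = true + rng.normal(0, 5, 500)                   # C4: pretest
group = (pre <= 50).astype(int)                      # C5: 1 = program (at or below the cutoff)
post = true + rng.normal(0, 5, 500) + 10 * group     # C6: posttest with a 10-point effect

# the program group starts lower on the pretest but the distribution "jumps" at the cutoff
for g in (0, 1):
    print(g, pre[group == g].mean(), post[group == g].mean())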

At this point you have finished creating the data. The distribution that you see might be what you would get if you conducted a real
study (although real data seldom behaves as well as this). The first step in analyzing this data is to examine the data to try to
determine what the "likely" pre-post function is. We know that the true function here is linear (that the same straight-line fit in both
groups is appropriate) but with real data it will often be difficult to tell by visual inspection alone whether straight or curved lines are
needed. Therefore, even though we might think that the most likely function or distribution is linear, we will deliberately over-fit or
over-specify this likely function a bit to be on the safe side.

The first thing you need to do to set up the analysis is to set up a new variable that will assure the program effect will be estimated at
the cutoff point. To do this, you simply create a new variable that is equal to the pretest minus the cutoff score. You should see that
this new variable will now be equal to zero at the cutoff score and that all program cases will have negative values on this score while
the comparison group will have positive ones. Since the regression program would automatically estimate the vertical difference or
"jump" between the two groups at the intercept (i.e., where the pretest equals 0), when you create this new variable you are setting
the cutoff equal to a pretest value of 0 and the regression program will correctly estimate the jump at the cutoff. Put this new
variable in C7:

MTB> Let C7 = C4 - 50
MTB> Name C7 = 'pre-cut'

and name it appropriately. You will see that we always substitute this variable for the pretest in the analyses.

Now you need to set up some additional variables that will enable you to over-specify the "likely" true linear function:

MTB> Let C8 = C7 * C5

This new variable is simply the product of the corrected pretest and the dummy assignment variable. Thus C8 will be equal to zero
for each comparison group case and equal to the corrected pretest for each program case. When this variable is added into the
analysis we are in effect telling the regression program to see if there is any interaction between the pretest (C7) and the program
(C5). This is equivalent to asking whether the linear slopes in the two groups are equal or whether they are different (which implies
that the effect of the program differs depending on what pretest score a person had). Now, construct quadratic (second-order) terms:

MTB> Let C9 = C7 * C7
MTB> Let C10 = C9 * C5

For C9 you simply square the pretest. When this variable is entered into the analysis we are in effect asking whether the bivariate
distribution looks curved in a quadratic pattern (consult an introductory algebra book if you don't recall what a quadratic or squared
function looks like). The second variable, C10, allows the quadratic elements in each group to differ, and therefore, can be
considered a quadratic interaction term. You should name the variables:

MTB> Name C8 ='I1' C9 = 'pre2' C10 = 'I2'

where I1 stands for 'linear interaction', PRE2 for the 'squared pretest', and I2 for the 'quadratic interaction'. You could continue
generating even higher-order terms and their interactions (cubic, quartic, quintic, etc.) but these will suffice for this demonstration.
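
For comparison, the same sequence of modeling steps can be sketched in Python (assuming the pre, post and group arrays from the regression-discontinuity data sketch above; the names pre_cut, i1, pre2 and i2 simply mirror columns C7 through C10 and are not from the original text):

import numpy as np
import statsmodels.api as sm

pre_cut = pre - 50           # C7: pretest centered at the cutoff
i1 = pre_cut * group         # C8: linear interaction
pre2 = pre_cut ** 2          # C9: quadratic term
i2 = pre2 * group            # C10: quadratic interaction

# successive models add higher-order terms; the group coefficient should stay near 10
# while its standard error slowly grows
for cols in ([pre_cut, group],
             [pre_cut, group, i1],
             [pre_cut, group, i1, pre2],
             [pre_cut, group, i1, pre2, i2]):
    fit = sm.OLS(post, sm.add_constant(np.column_stack(cols))).fit()
    print(round(fit.params[2], 2), round(fit.bse[2], 2))   # effect estimate and its standard error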

You are now ready to begin the analysis. You will do this in a series of regression steps, each time adding in higher-order terms.
Because of the length of the standard regression output, you might want to request briefer output with the command

MTB> Brief 1

In the first step, you fit a model which assumes that the bivariate distribution is best described by straight lines with the same slopes
in each group and a jump at the cutoff:

MTB> Regress C6 2 C7 C5

This is simply the standard Analysis of Covariance (ANCOVA) model. The coefficient associated with the GROUP variable in the
table is the estimate of the program effect. Since you created the data you know that this regression analysis exactly specifies the true
bivariate function - you created the data to have the same slope in each group and to have a program effect of 10 points. Is the
estimate that you obtained near the true effect of ten? You can construct a 95% confidence interval (using plus or minus 2 times the
standard deviation of the coefficient for the GROUP variable). Does the true effect of ten points fall within this interval (it should for
almost all of you)?

With real data we would not be sure that the model we fit in this first step includes all the necessary terms. If we have left out a
necessary term (for instance, if there was in fact a linear interaction) then it would be very likely that the estimate we obtained in this
first analysis would be biased (you will see this in the next simulation). To be on the safe side, we will add in a few more terms to the
analysis in successive regression steps. If we have already included all necessary terms (as in the analysis above) then these
additional terms should be superfluous. They should not bias the estimate of program effect, but there will be less precision. For the
next step in the analysis you will allow the slopes in the two groups to differ by adding in I1, the linear interaction term:

MTB> Regress C6 3 C7 C5 C8

The coefficient for the GROUP variable is, as usual, the estimate of program effect. We know that the new variable, C8, is
unnecessary because you set up the simulation so that the slopes in both groups are the same. You should see that the coefficient for
this I1 variable is near zero and that a zero value almost surely falls within the 95% confidence interval of this coefficient. Because
this term is unnecessary, you should still have an unbiased estimate of the program effect. Is the coefficient for GROUP near the true
value of 10 points? Does the value of 10 fall within the 95% confidence interval of the coefficient? You should also note that the
estimate of the program effect is less precise in this analysis than in the previous one - the standard error of the coefficient for the
GROUP variable should be larger in this case than in the previous run. Now, add in the quadratic term:

MTB> Regress C6 4 C7 C5 C8 C9

Again, you should see that the coefficients for the superfluous terms (I1 and PRE2) are near zero. Similarly, the estimate of program
effect should still be unbiased and near a value of ten. This time the standard error of the GROUP coefficient will be a little larger
than last time - again indicating that there is some loss of precision as higher-order terms are added in. Finally, you will allow the
quadratic terms to differ between groups by adding in the quadratic interaction term, I2:

MTB> Regress C6 5 C7 C5 C8 C9 C10

By now you should be able to see the pattern across analyses. Unnecessary terms will have coefficients near zero. The program effect
estimate should still be near ten, but the 95% confidence interval will be slightly wider indicating that there is a loss of precision as
you add in more terms.

In an analysis of real data, you would by now be more convinced that your initial guess that the bivariate distribution was linear was
a sensible one. You might decide to continue fitting higher order terms or you might stop with the quadratic terms. This whole
procedure may strike you as somewhat wasteful. If we think the correct function is linear, why not just fit that? The procedure
outlined here is a conservative one. It is designed to minimize the chances of obtaining a biased estimate of program effect by
increasing your chances of overspecifying the true function. At this point you should stop and consider the steps that were involved
in conducting this analysis. You might want to try one or more of the following variations:

● Change the reliability of the pretest. Recall that the reliability of the pretest depends on how much true score and error you
add into it. Try the simulation with extremely low reliability (high standard deviation for error, C2, low for true score, C1)
and high reliability. You might even want to try a perfectly reliable pretest (i.e., don't add C2 in at all). What effect does
pretest reliability have on estimates of the treatment effect? How is this different from or similar to the nonequivalent group
design?

● Change the reliability of the posttest. Here you would alter the ratio of the standard deviations for the Y error, C3, and true
score, C1 variables. How does posttest reliability relate to the estimate of the effect?
● Construct a simulation where the program group consists of the high pretest scorers. In what kinds of real-world situations
would the high scorers be likely to receive a new program?

● Put in a negative program effect. To do this, just put in a -10 for the 10 in the LET statement which constructed the posttest.
How does the shape of the bivariate distribution change when you introduce a negative rather than a positive effect?

Downloading the MINITAB Commands

If you wish, you can download a text file that has all of the MINITAB commands in this exercise and you can run the exercise simply
by executing this macro file. To find out how to call the macro check the MINITAB help system on the machine you're working on.
You may want to run the exercise several times -- each time will generate an entirely new set of data and slightly different results.
Click here if you would like to download the MINITAB macro for this simulation.


Copyright © 1996, William M.K. Trochim


Regression Artifacts

In this exercise you are going to look at how regression to the mean operates. It is designed to convince you that regression effects do
occur, to show when they can be expected to occur, and to hint at why they occur.

To begin, get into MINITAB as usual. You should see the MINITAB prompt (which looks like this MTB>). Now you are ready to
enter the following commands. You will begin by creating two variables similar to the ones in the Generating Data exercise:

MTB> Random 500 C1;
SUBC> Normal 50 10.
MTB> Random 500 C2;
SUBC> Normal 0 5.
MTB> Random 500 C3;
SUBC> Normal 0 5.

You have created three random variables. The first will have a mean or average of 50 and a standard deviation of 10. This will be a
measure of the true ability of people on some characteristic. You also created two random error scores having a mean of zero
(remember that we always assume that errors have zero mean) and a standard deviation half that of the true score. These errors will
be used to construct two separate "tests":

MTB> Add C1 C2 C4.


MTB> Add C1 C3 C5.

Thus, you have created two tests of some ability. The tests are related or correlated because they share the same true score (i.e., they
measure the same true ability). Name the five variables you have created so far:

MTB> Name C1 = 'true' C2 = 'x error' C3 = 'y error' C4 = 'X' C5 = 'Y'

Check on the distributions:

MTB> Describe C1-C5

You should find that the mean of the TRUE variable is near 50, the means of the two error variables are near 0 and the means of x
and y are near 50. Now look at the distributions of the two tests to see if they appear to be normal in shape:

MTB> Histogram C4
MTB> Histogram C5

The graphs should look like bell-shaped curves. You should also look at the correlation between x and y:

MTB> Correlation C4 C5

and at the bivariate distribution:


MTB> Plot C5 * C4;
SUBC> Symbol 'x'.

Up to now, this is pretty much what you did in the first exercise. Now let's look at regression to the mean. Remember that we said
that regression will occur whenever we select a group asymmetrically from a distribution. Let's say that the X test is a measure of
math ability and that we would like to give a special math program to all those children who score below average on this test. We
would then like to select a group of all those scoring below the mean of 50:

MTB> Copy C4 C5 C1-C3 C6-C10;
SUBC> use C4 = 0:50.

Be sure to type these commands in exactly as written. In words, what you have done is to "COPY all those with scores between 0 and
50 on variable C4 (the X test) and to also COPY the scores in C5 (the Y test) and C1 through C3 for those cases; and put the copied
cases in C6 through C10 respectively." Now, name the new columns:

MTB> Name C6='New x' C7='New y' C8='New t' C9='New xe' C10='New ye'

Notice that the order of the variables has been rearranged. The X test score for the group you chose is in C6 and the Y test score is in
C7.

Now, assume that you never gave your math program (your school district lost all funding for special training - sometimes
simulations can be realistic!) but that you measured your selected group later on a similar math test, the Y test. Look at the before-
and-after means for your selected group:

MTB> Describe C6 C7

It looks like your group improved slightly between the two tests! Does this mean that your program might not have been necessary?
Is this improvement merely normal maturation in math ability? Of course not! All you are witnessing is regression to the mean. You
selected a group that consisted of low-scorers on the basis of the X test. On any other measure which is imperfectly related to the X
test this group will appear to score better simply because of regression to the mean. Look at the distributions of the selected group:

MTB> Histogram C6
MTB> Histogram C7

Notice that the distribution of the X test for this group looks like one half of a bell-shaped curve. It should because you selected the
persons scoring on the lower half of the X test. But the Y test distribution is not as clearly cut as the X test. Why not? Persons have
the same true score on both tests (remember that you added in the same true score to the X and Y test). It must have something to do
with the errors then. Obviously, since every person has the same true score on both tests, and since persons appeared on average to
score a little higher on the Y test than on the X test, we should expect to see more negative errors on the X test (i.e., x errors) than on
the Y test. Let's see if that is true. You can use the SIGN command to see how many negative, zero, and positive values a variable
has:

MTB> Sign C9
MTB> Sign C10

There should be a few more negative errors for the X test and a few more positive errors for the Y test. Are there? At this point you
should stop to reflect on what you've done. You created two tests that measure the same ability. The tests are imperfectly measured (i.
e., they have error). You then selected an asymmetrical sample on the basis of scores on one of the tests - you selected the "low"
scorers on the X test. Even though you didn't do anything to this group, when you measured them again on the Y test you found that
they appear to improve slightly. If you worked in a school district, people might begin to question your suggestion that those children
needed special training. Let's say that you decided to show them that the apparent gain in this group really isn't accurate. You decide
that you will look at the change between the X and Y test for those children who score above the mean:

MTB> COPY C4 C5 C1-C3 C6-C10;
SUBC> use C4 = 50:100.

Now you look at the means for this above average group on the X and Y tests:

MTB> Describe C6 C7

What happened? It appears that the above average group lost ground between the X and Y tests. Now your critics are really
convinced (or should we say confused?). They argue that the low scorers improved just fine without your special math program but
that the high scorers were the ones who lost ground. Maybe they say, you should be giving your program to the high scorers to help
prevent further decline.

What is going on here? What you have witnessed is a statistical phenomenon called regression to the mean. It occurs because we
have imperfect measurement. Recall that a child's true ability in math is indicated by the true score. The tests partially show us the
true score but they also have error in them. For any given child the error could work for or against them - they could have a good day
(i.e., a positive error) or a bad one (i.e., negative error). If they have a bad day, their test score will be below their true ability. If they
have a good day, the test score will be above their true ability (i.e., they got lucky or guessed well). When you selected a group that
was below the overall average for the entire population of 500 children, you chose a lot of children who really did have below
average true ability - but you also chose a number of children who scored low because they had a bad day (i.e., had a negative x
error). When you tested them again, their true ability hadn't changed, but the odds that they would have as bad a day as the first time
were much lower. Thus, it was likely that on the next test (i.e., the Y test) the group would do better on the average.
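
If you would like to replay this selection effect outside MINITAB, the following short Python sketch (not part of the original exercise; the means and standard deviations are the ones used above) reproduces it:

import numpy as np

rng = np.random.default_rng()
true = rng.normal(50, 10, 500)
x = true + rng.normal(0, 5, 500)      # X test
y = true + rng.normal(0, 5, 500)      # Y test: same true score, new error

low = x < 50                          # asymmetric selection: below-average X scorers
print("selected group, X mean:", x[low].mean())   # well below 50
print("selected group, Y mean:", y[low].mean())   # closer to 50 - regression to the mean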

It is possible that sometimes when you do the above simulation, the results will not turn out as we have said. This is because you
have selected a group that is really not too extreme. Let's really stack the deck now and see a clear cut regression artifact. We will
choose a really extreme low scoring group:

MTB> COPY C4 C5 C1-C3 C6-C10;
SUBC> use C4 = 0:40.

We have selected all those who scored lower than 40 on the X test. Look at the distributions:

MTB> Histogram C6
MTB> Histogram C7

Here, the X test distribution looks like it was sharply cut, the Y test looks much less so. Now, look at the means:

MTB> Describe C6 C7

Here we have clear regression to the mean. The selected group scored much higher on the Y test than on the X test.

To get some idea of why this occurred look at the positive and negative values of the errors for the two tests:

MTB> Sign C9
MTB> Sign C10
There should be far more negative values on the x error than on the y error (new xe and new ye).

We can predict how much regression to the mean will occur in any case. To do so we need to know the correlation between the two
measures for the entire population (the first correlation you calculated above) and the means for each measure for the entire
population (in this case, both means will be near 50). The percent of regression to the mean is simply 100 (1-r) where r is the
correlation. For example, assume that the correlation between the X and Y tests for all 500 cases is .80 and that the means are both
equal to 50. Further, assume you select a low scoring group from the X test and when you look at the mean for this group you find
that it equals 30 points. By the formula you would expect this mean to regress 100 (1-.8) or 20% of the way back towards the mean
on the other measure. The mean that we would expect on the other measure (the Y test) in the absence of regression is the same as for
the X test -- 30 points. But with regression we expect the actual mean to be 20% closer to the overall Y test mean of 50. Twenty
percent of the distance between 30 and 50 would put the Y test mean at 34 and it would appear that the group gained four points, but
this gain would instead be due to regression. Now, assume the same situation, but this time assume the correlation between the X and
Y test is .50. Here we would expect that there would be 100 (1-.5) = 50% regression to the mean and if the group which was selected
had a 30 on the X test we would expect them to get a 40 on the Y test just because of the regression phenomenon.
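
The arithmetic in the last paragraph can be packaged as a one-line prediction formula. Here it is as a small Python sketch (the numbers are the ones from the example above; the function name is just for illustration):

def expected_other_mean(group_mean, overall_mean, r):
    # the selected group's expected mean on the other measure regresses
    # 100 * (1 - r) percent of the way back toward the overall mean
    return group_mean + (1 - r) * (overall_mean - group_mean)

print(expected_other_mean(30, 50, 0.8))   # 34.0 with a correlation of .80
print(expected_other_mean(30, 50, 0.5))   # 40.0 with a correlation of .50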

There are several variations on this basic simulation that you might want to try:

● You should vary the strength of the correlation between the X and Y tests. You can do this by varying the standard
deviations of the x and y errors (in the original three Random/Normal statements). You might want to try the simulation
with the error variances very small (for example, Random 500 C2; Normal 0 1.) or you could simply lower the variance of
the true score (Random 500 C1; Normal 50 1.). Whichever you do, you should realize that you will be affecting the range of
the scores that you obtain on the X and Y tests (the smaller the variances, the smaller the range). Make sure that when you
do a copy statement you are actually choosing some cases. You should verify for yourself that the higher the correlation
between X and Y, the lower the regression, and vice versa. You can even simulate the extreme cases. To simulate a perfect
correlation, set both X and Y equal to the true score only (e.g., Let C4=C1). To simulate a zero correlation, set X and Y
equal to their respective error terms (and no true score).

● Regression to the mean occurs in two directions. Try selecting persons based on extremes on the Y test and verify that they
regress toward the mean on the X test. While it might seem obvious to you now that regression will occur, it has not always
been so obvious in practice. Consider a school district that has conducted a pre-post evaluation of a program and would like
to look more closely at the results. They might for example, want to look at how the children who scored low or high on the
posttest had done on the pretest. They might look at a group of high posttest scorers and would find that these children did
"worse" on the pretest. Similarly, they might look at the low posttest scorers and they would find that these children had
done "betterÓ on the pretest. They might conclude that children who did fairly well on the pretest were hurt by their
program while poor pretest scorers were helped by it. But this kind of pattern in the results occurs whether a program is
given or not - these evaluators would be basing their conclusions entirely on regression artifacts. Try to look at regression in
both directions - from the X to the Y test, and vice versa.

Downloading the MINITAB Commands

If you wish, you can download a text file that has all of the MINITAB commands in this exercise and you can run the exercise simply
by executing this macro file. To find out how to call the macro check the MINITAB help system on the machine you're working on.
You may want to run the exercise several times -- each time will generate an entirely new set of data and slightly different results.
Click here if you would like to download the MINITAB macro for this simulation.



Copyright © 1996, William M.K. Trochim
Applications of Simulations in Social Research

There are a number of ways in which the simulation exercises can be useful in program evaluation contexts. First, they provide a
powerful teaching tool (Eamon, 1980; Lehman, 1980). Students of program evaluation can explore the relative advantages of these
designs under a wide variety of conditions. In addition, the simulations show the student exactly how an analysis of these designs
could be accomplished using real data. Second, the simulations provide a way to examine the possible effects of evaluation
implementation problems on estimates of program effect (Mandeville, 1978; Raffeld et al., 1979; Trochim, 1984). Just as NASA
explores difficulties in a space shuttle flight using an on-ground simulator, the data analyst can examine the possible effects of
attrition rates, floor or ceiling measurement patterns, and other implementation factors. Finally, simulations make it possible to
examine the potential of new data analysis techniques. When bias is detected in traditional analysis and analytic solutions are
forthcoming, simulations can be a useful adjunct to statistical theory.

Applications For Teaching

Simulations offer several advantages for teaching program evaluation. First, students can construct as well as observe the simulation
program in progress and get an idea of how a real data analysis might unfold. In addition, the simulation presents the same
information in a number of ways. The student can come to a better understanding of the relationships between within-group pretest
and posttest means and standard deviations, bivariate plots of pre-and postmeasures that also depict group membership, and the
results of the ANCOVA regression analyses. Second, the simulations illustrate clearly some of the key assumptions that are made in
these designs and allow the student to examine what would happen if these assumptions are violated. For instance, the simulations
are based on the assumption that within-group pre-post slopes are linear and that the slopes are equal between groups. The effects of
allowing the true models to have treatment interaction terms or nonlinear relationships can be examined directly with small
modifications to the simulation program as Trochim (1984) illustrated for the RD design. Third, the simulations demonstrate the
importance of reliable measurement. By varying the ratio of true score and error term variances, the student can directly manipulate
reliability and show that estimates of effect become less efficient as measures become less reliable. Finally, simulations are an
excellent way to illustrate that apparently sensible analytic procedures can yield biased estimates under certain conditions. This is
shown most clearly in the simulations on the NEGD. Although the apparent similarity between the design structures of the RD and
NEGD might suggest that traditional ANCOVA regression models are appropriate, the simulations clearly show this to be false and
thereby confirm the statistical literature in the area (Reichardt, 1979).

Applications for the Study of Design Implementation

The validity of estimates from the research designs simulated in this manual depends on how well those designs are executed or
implemented in the field. There are many implementation problems occurring in typical program evaluations--attrition problems,
data coding errors, floor and ceiling effects on measures, poor program implementation, and so on--that degrade the theoretical
quality of these designs (Trochim, 1984). Clearly, there is a need for improved evaluation quality control (Trochim and Visco, 1985),
but when implementation problems cannot be contained, it is important for the analyst to examine the potential effects of such
problems on estimates of program gains. This application of simulation is analogous to simulation studies that NASA conducts to try
to determine the effects of problems in the functioning of the space shuttle or a communications satellite. There, an exact duplicate of
the shuttle or satellite is used to try to recreate the problem and explore potential solutions. In a similar way, the program evaluator
can attempt to recreate attrition patterns or measurement difficulties to examine their effects on the analysis and discover analytic
corrections that may be appropriate. The analyst can directly manipulate the models of the problems in order to approximate their
reality more accurately and to examine the performance of a design under more varied situations. Such simulations are useful in that
they can alert the analyst to potential bias and even indicate the direction of bias under various assumptions.

Applications For the Investigation of New Analyses


One of the most exciting uses of simulation involves the examination of the accuracy and viability of "new" statistical techniques that
are designed to address the deficiencies of previous models. There are two reasons why simulations are particularly valuable here.
First, simulations allow the analyst to check that, under the conditions the technique assumes, the analysis will yield unbiased
estimates. Second, simulations allow the analyst to examine the performance of the analysis under degraded conditions or conditions
that do not perfectly match the mathematical ideal. Thus, simulations can act
as a proving ground for new analyses that supplement and extend what is possible through mathematical argument alone.

This application of simulations can be illustrated well by reflecting on the NEGD simulations, where the estimates of program effect
were clearly biased. This bias is well known in the methodological literature (Reichardt, 1979) and results from unreliability
(measurement error) in the preprogram measure under conditions of nonspecifiable group nonequivalence. One suggestion for
addressing this problem analytically is to conduct what is usually called a reliability-corrected Analysis of Covariance to adjust for
pretest unreliability in the NEGD. The analysis involves correcting the pretest scores separately for each group using the following
formula:

Xadj = xmean + rxx(xi - xmean)

where:

Xadj = the adjusted or reliability corrected pretest

xmean = the within-group pretest mean

xi = pretest score for case i

rxx = an estimate of pretest reliability

The analyst must use an estimate of reliability and there is considerable discussion in the literature (Reichardt, 1979; Campbell and
Boruch, 1975) about the assumptions underlying various estimates (for example, test-retest or internal consistency). The reader is
referred to this literature for more detailed consideration of these issues. The choice of reliability estimate is simplified in simulations
because the analyst knows the true reliability (as discussed earlier). This adjusted pretest is then used in place of the unadjusted
pretest for the NEGD simulations.

To illustrate the correction, simulations were conducted under the same conditions as in the NEGD exercises using the reliability-
corrected ANCOVA. It is clear that the reliability corrected NEGD analysis yields unbiased estimates, thus lending support to the
idea that this correction procedure is appropriate, at least for the conditions of these simulations.

Simulations have been used to explore and examine the accuracy of a wide range of statistical analyses for program evaluation
including models for adjusting for selection biases in NEGD (Trochim and Spiegelman, 1980; Muthen and Joreskog, 1984); for
correcting for misassignment with respect to the cutoff in RD designs (Campbell et al., 1979; Trochim, 1984), and for assessing the
effects of attrition in evaluations (Trochim, 1982).


Copyright © 1996, William M.K. Trochim


Conclusion

This workbook offers several simulation models that are appropriate for use with three common research designs for evaluating
program effects. The logic of these simulations can be easily extended to other relevant research contexts. For instance, many
agencies routinely conduct sample surveys to identify needs and target populations, assess services that are provided, and compare
agency functioning with the performance of other similar agencies or with some standard. One would construct simulation models
for survey instruments for the same reasons that they are constructed for evaluation designs--to improve teaching and general
understanding, to explore problems in implementing the survey (such as nonresponse patterns), or to examine the probable effect of
various analytic strategies. The key to doing this would again rest on the statistical model used to generate hypothetical survey
responses. A "true score" measurement model is useful, at least for simple simulations, but may have to be modified. For instance,
assume that one question on a survey deals with client satisfaction with a particular service and that the response is a 7-point Likert-
type format where 1=very dissatisfied, 7=very satisfied, and 4=neutral. The analyst could make the assumption that for some sample
or subsample the true average response is a scale value equal to 5 points (somewhat satisfied), and that the true distribution of
responses is normal around these values, with some standard deviation. At some point, the analyst will have to convert this
hypothetical underlying continuous true distribution to the 7-point integer response format either by rounding or by generating
normally distributed random integers in the first place. Such a variable could then be correlated or cross-tabulated with other
generated responses to explore analytic strategies for that survey. Similar extensions of the models discussed here can be made for
simulations of routinely collected management information system (MIS) information, for data for correlational studies, or for time-
series situations, among others.
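
To make the conversion step concrete, here is a minimal Python sketch (a supplement, not part of the original text) of a single simulated satisfaction item; the true mean, the standard deviations, and the sample size are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(2)
n = 1000
true = rng.normal(5.0, 1.0, n)                 # hypothetical true satisfaction (continuous)
observed = true + rng.normal(0, 0.8, n)        # add measurement error ("true score" model)
likert = np.clip(np.rint(observed), 1, 7).astype(int)   # round and force into the 1-7 response format

values, counts = np.unique(likert, return_counts=True)
for v, c in zip(values, counts):
    print(v, c)                                # simulated frequency distribution of responses

The likert variable could then be correlated or cross-tabulated with other generated responses, exactly as described above.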

Simulations are assumptive in nature and vary in quality to the degree that the reality is correctly modeled. When constructing a
simulation, it is important that the analyst seek out empirical evidence to support the assumptions that are made whenever this is
feasible. For instance, it should be clear that the simulations described here could be greatly enhanced if we had more specific data on
how much and what type of attrition typically occurs, what type of floor or ceiling effects are common, what patterns of
misassignment relative to the cutoff value typically arise for the RD design, what the typical test-retest reliabilities (for use in
reliability-corrected ANCOVA) might be, and so on. Although some relevant data will be available in the methodological literature,
all of these issues are context specific and demand that the analyst know the setting in some detail if the simulations are to be
reasonable.

One way to approach the assumptive nature of the simulation task is to recognize that reality conditions or constraints in the models
need to be examined systematically across a range of plausible conditions. This implies that multiple analyses under systematically
varied conditions that are based upon principles of parametric experimental design are needed in state-of-the-art simulation work.
This point is made well by Heiberger et al. (1983:585):

The computer has become a source of experimental data for modern statisticians much as the farm field was to the
developers of experimental design. However, many "field" experiments have largely ignored fundamental
principles of experimental design by failing to identify factors clearly and to control them independently. When
some aspects of test problems were varied, others usually changed as well--often in unpredictable ways. Other
computer-based experiments have been ad hoc collections of anecdotal results at sample points selected with little
or no design.

Heiberger et al. (1983) go on to describe a general model for simulation design that allows the analyst to control systematically a
large number of relevant parameters across some multidimensional reality space, including the sample size, number of endogenous
variables, number of "key points" or condition values, matrix eigenvalues and eigenvectors, intercorrelations, least squares regression
coefficients, means, standard errors, and so on.
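
In that spirit, a set of simulation runs can be laid out as a small designed experiment. The sketch below (a supplement, not from Heiberger et al.) simply enumerates a full factorial crossing of three illustrative factors; each cell would then be run many times and the bias of the resulting estimates summarized cell by cell.

from itertools import product

sample_sizes  = (50, 100, 500)                 # illustrative factor levels, chosen for this sketch
reliabilities = (0.6, 0.8, 1.0)
true_effects  = (0.0, 5.0)

for n, rxx, effect in product(sample_sizes, reliabilities, true_effects):
    print(f"run cell: n={n}, pretest reliability={rxx}, true effect={effect}")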

Although rigorous, experimentally based simulations are essential for definitive analysis of complex problems, they will not always
be feasible or even desirable in many program evaluation contexts. It is important to recognize that simulations are a generally useful
tool even when they are not used to conduct definitive statistical studies: simulations in program evaluation can provide the analyst
with the means to explore and probe simple, relevant data structures for the purposes of improving the instruction of research,
examining research implementation issues, and pilot testing analytic approaches for problematic data.

Simulation Home Page

Copyright © 1996, William M.K. Trochim


References

Bradley, D. R. (1977) "Monte Carlo simulations and the chi-square test of independence." Behavior Research Methods and
Instrumentation 9: 193-201.

Campbell, D.T. and R.F. Boruch (1975) "Making the case for randomized assignment of treatments by considering the alternatives:
Six ways in which quasi-experimental evaluations tend to underestimate effects," in C.A. Bennett and A.A. Lumsdaine (eds.)
Evaluation and Experience: Some Critical Issues in Assessing Social Programs. New York: Academic Press.

Campbell, D.T., C.S. Reichardt, and W. Trochim (1979) "The analysis of fuzzy regression discontinuity design: Pilot Simulations."
Unpublished manuscript, Department of Psychology, Northwestern University, Chicago.

Cook, T.D. and D. Campbell (1979) Quasi-experimentation: Design and Analysis Issues for Field Settings. Chicago: Rand-McNally.

Eamon, D.E. (1980) "Labsim: A data driven simulation program for instruction in research and statistics." Behavior Research
Methods and Instrumentation 12: 160-164.

Goldberger, A.S. (1972) "Selection bias in evaluating treatment effects: Some formal illustrations." Discussion paper, Institute for
Research on Poverty, University of Wisconsin, Madison.

Guetzkow, H. (1962) Simulation in Social Science. Englewood Cliffs, NJ: Prentice-Hall.

Heckman, J. (1981) "The incidental parameters problem and the initial conditions in estimating a discrete time-discrete data
stochastic process." in C.F. Manski and D. McFadden (eds.) Structural Analysis of Discrete Data with Economic Applications.
Cambridge, MA: MIT Press.

Heiberger, R.M., P.F. Velleman, and A.M. Ypelaar (1983) "Generating test data with independent controllable features for
multivariate general linear forms." Journal of the American Statistical Association 78:585-595.

Lehman, R.S. (1980) "What simulations can do to the statistics and design course." Behavior Research Methods and Instrumentation
12:157-159.

Mandell, L.M. and E.L. Blair (1980) "Forecasting and evaluating human service system performance through computer simulation,"
pp. 60-67 in American Statistical Association Proceedings, Washington, DC: American Statistical Association.

Mandeville, G.K. (1978) "An evaluation of Title I and model C1: The special regression model." American Educational Research
Association Proceedings, Washington, D.C., April.

Muthen, B. and K.G. Joreskog (1984) "Selectivity problems in quasi-experimental studies," in R.F. Conner et al. (eds.) Evaluation
Studies Review Annual, Vol. 9 Beverly Hills, CA: Sage.

Raffeld, P., D. Stamman, and G. Powell (1979) "A simulation study of the effectiveness of two estimates of regression in the Title I
model A Procedure." American Educational Research Association Proceedings, Washington, D.C., April.

Reichardt, C. (1979) "The design and analysis of the non-equivalent group quasi-experiment." Unpublished doctoral dissertation,
Northwestern University, Chicago.

Ryan, T.A., B.L. Joiner, and B.F. Ryan (1991) MINITAB Handbook. PWS-Kent Publishing Company, Boston.

Trochim, W. (1982) "Methodologically based discrepancies in compensatory education evaluation." Evaluation Review 6, 3: 443-
480.

Trochim, W. (1984) Research Design for Program Evaluation: The Regression Discontinuity Approach. Beverly Hills, CA: Sage.

Trochim, W. (1986) Advances in Quasi-Experimental Design and Analysis. San Francisco: Jossey-Bass.

Trochim, W. and J. Davis (1986) "Computer Simulation for Program Evaluation." Evaluation Review, Vol. 10, No. 5, October, pp.
609-634.

Trochim, W. and C.H. Spiegelman (1980) "The relative assignment variable approach to selection bias in pretest-posttest group
designs," p. 102 in Proceedings of the Social Statistics Section, American Statistical Association, Washington, DC, September.

Trochim, W. and R. Visco (1985) "Quality control in evaluation," in D.S. Cordray (ed.) Utilizing Prior Research in Evaluation
Planning. San Francisco: Jossey-Bass.

Simulation Home Page

Copyright © 1996, William M.K. Trochim


Home
Glossary
References
Selecting Statistics
About
Help

How many variables does the problem involve?

One Variable

Two Variables

More than two variables


Home
Glossary
References
Selecting Statistics
About
Help

Glossary

ADDITIVE. A situation in which the best estimate of a dependent variable is obtained by simply
adding together the appropriately computed effects of each of the independent variables.
Additivity implies the absence of interactions. See also INTERACTION.

AGREEMENT. Agreement measures the extent to which two sets of scores (e.g., scores obtained
from two raters) are identical. Agreement involves a more stringent matching of two variables
than does covariation, which implicitly allows one to change the mean (by adding a constant)
and/or to change the variance (by multiplying by a constant) for either or both variables
before checking the match.

BIAS. The difference between the expected value of a statistic and the population value it is
intended to estimate. See EXPECTED VALUE.

BIASED ESTIMATOR. A statistic whose expected value is not equal to the population value. See
EXPECTED VALUE.

BIVARIATE NORMALITY. A particular form of distribution of two variables that has the traditional
"bell" shape (but not all bell-shaped distributions are normal). If plotted in three-dimensional
space, with the vertical axis showing the number of cases, the shape would be that of a three-
dimensional bell (if the variances on both variables were equal) or a "fireman's hat" (if the
variances were unequal). When perfect bivariate normality obtains, the distribution of one
variable is normal for each and every value of the other variable. See also NORMAL
DISTRIBUTION.

BRACKETING. The operation of combining categories or ranges of values of a variable so as to
produce a small number of categories. Sometimes referred to as 'collapsing' or 'grouping.'

CAPITALIZATION ON CHANCE. When one is searching for a maximally powerful prediction
equation, chance fluctuations in a given sample act to increase the predictive power
obtained; since data from another sample from the same population will show different
chance fluctuations, the equation derived for one sample is likely to work less well in any other
sample.

CAUSAL MODEL. An abstract quantitative representation of real-world dynamics (i.e., of the
causal dependencies and other interrelationships among observed or hypothetical variables).

COMPLEX SAMPLE DESIGN. Any sample design that uses something other than simple random
selection. Complex sample designs include multi-stage selection, and/or stratification, and/or
clustering.

COVARIATE. A variable that is used in an analysis to correct, adjust, or modify the scores on a
dependent variable before those scores are related to one or more independent variables. For
example, in an analysis of how demographic factors (age, sex, education, etc.) relate to wage
rates, monthly earnings might first be adjusted to take account of (i.e., remove effects
attributable to) number of hours worked, which in this example would be the covariate.

COVARIATION. Covariation measures the extent to which cases (e.g., persons) have the same
relative positions on two variables. See also AGREEMENT.

DEPENDENT VARIABLE. A variable which the analyst is trying to explain in terms of one or more
independent variables. The distinction between dependent and independent variables is
typically made on theoretical grounds-in terms of a particular causal model or to test a
particular hypothesis. Synonym: criterion variable.

DESIGN MATRIX. A specification, expressed in matrix format, of the particular effects and
combinations of effects that are to be considered in an analysis.

DICHOTOMOUS VARIABLE. A variable that has only two categories. Gender (male/female) is an
example. See also TWO-POINT SCALE.

DUMMY VARIABLE. A variable with just two categories that reflects only part of the information
actually available in a more comprehensive variable. For example, the four-category variable
Region (Northeast, Southeast, Central, West) could be the basis for a two-category dummy
variable that would distinguish Northeast from all other regions. Dummy variables often come in
sets so as to reflect all of the original information. In our example, the four-category region
variable defines four dummy variables: (1) Northeast vs. all other; (2) Southeast vs. all other; (3)
Central vs. all other; and (4) West vs. all other. Alternative coding procedures (which are
equivalent in terms of explanatory power but which may produce more easily interpretable
estimates) are effect coding and orthogonal polynomials.
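
As a supplementary illustration (not part of the original glossary), the coding described above can be written out directly in Python; the region labels follow the example and the names are arbitrary.

regions = ["Northeast", "Southeast", "Central", "West", "Northeast"]   # observed values for five cases
dummies = {r: [1 if obs == r else 0 for obs in regions]
           for r in ("Northeast", "Southeast", "Central", "West")}     # one dummy per category
print(dummies["Northeast"])   # [1, 0, 0, 0, 1]: Northeast vs. all other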

EXPECTED VALUE. A theoretical average value of a statistic over an infinite number of samples
from the same population.

HETEROSCEDASTICITY. The absence of homogeneity of variance. See HOMOGENEITY OF
VARIANCE.

HIERARCHICAL ANALYSIS. In the context of multidimensional contingency table analysis, a
hierarchical analysis is one in which inclusion of a higher order interaction term implies the
inclusion of all lower order terms. For example, if the interaction of two independent variables is
included in an explanatory model, then the main effects for both of those variables are also
included in the model.

HOMOGENEITY OF VARIANCE. A situation in which the variance on a dependent variable is the
same (homogeneous) across all levels of the independent variables. In analysis of variance
applications, several statistics are available for testing the homogeneity assumption (see Kirk,
1968, page 61); in regression applications, a lack of homogeneity can be detected by
examination of residuals (see Draper and Smith, 1966, page 86). In either case, a variance-
stabilizing transformation may be helpful (see Kruskal, 1978, page 1052). Synonym:
homoscedasticity. Antonym: heteroscedasticity.

HOMOSCEDASTICITY. See HOMOGENEITY OF VARIANCE.


INDEPENDENT VARIABLE. A variable used to explain a dependent variable. Synonyms: predictor
variable, explanatory variable. See also DEPENDENT VARIABLE.

INTERACTION. A situation in which the direction and/or magnitude of the relationship between
two variables depends on (i.e., differs according to) the value of one or more other variables.
When interaction is present, simple additive techniques are inappropriate; hence, interaction is
sometimes thought of as the absence of additivity. Synonyms: nonadditivity, conditioning
effect, moderating effect, contingency effect. See also PATTERN VARIABLE, PRODUCT
VARIABLE.

INTERVAL SCALE. A scale consisting of equal-sized units (dollars, years, etc.). On an interval scale
the distance between any two positions is of known size. Results from analytic techniques
appropriate for interval scales will be affected by any non-linear transformation of the scale
values. See also SCALE OF MEASUREMENT.

INTERVENING VARIABLE. A variable which is postulated to be a predictor of one or more
dependent variables, and simultaneously predicted by one or more independent variables.
Synonym: mediating variable.

KURTOSIS. Kurtosis indicates the extent to which a distribution is more peaked or flat-topped
than a normal distribution.

LINEAR. The form of a relationship among variables such that when any two variables are
plotted, a straight line results. A relationship is linear if the effect on a dependent variable of a
change of one unit in an independent variable is the same for all possible such changes.

MATCHED SAMPLES. Two (or more) samples selected in such a way that each case (e.g.,
person) in one sample is matched-i.e., identical within specified limits-on one or more
preselected characteristics with a corresponding case in the other sample. One example of
matched samples is having repeated measures on the same individuals. Another example is
linking husbands and wives. Matched samples are different from independent samples, where
such case-by-case matching on selected characteristics has not been assured.

MEASURE OF ASSOCIATION. A number (a statistic) whose magnitude indicates the degree of
correspondence-i.e., strength of relationship-between two variables. An example is the Pearson
product-moment correlation coefficient. Measures of association are different from statistical
tests of association (e.g., Pearson chi-square, F test) whose primary purpose is to assess the
probability that the strength of a relationship is different from some preselected value (usually
zero). See also STATISTICAL MEASURE, STATISTICAL TEST.

MISSING DATA. Information that is not available for a particular case (e.g., person) for which at
least some other information is available. This can occur for a variety of reasons, including a
person's refusal or inability to answer a question, nonapplicability of a question, etc. For useful
discussions of how to overcome problems caused by missing data in surveys see Hertel (1976)
and Kim and Curry (1977).

MULTIVARIATE NORMALITY. The form of a distribution involving more than two variables in which
the distribution of one variable is normal for each and every combination of categories of all
other variables. See Harris (1975, page 231) for a discussion of multivariate normality. See also
NORMAL DISTRIBUTION.

NOMINAL SCALE. A classification of cases which defines their equivalence and non-
equivalence, but implies no quantitative relationships or ordering among them. Analytic
techniques appropriate for nominally scaled variables are not affected by any one-to-one
transformation of the numbers assigned to the classes. See also SCALE OF MEASUREMENT.

NONADDITIVE. Not additive. See ADDITIVE, INTERACTION.

NORMAL DISTRIBUTION. A particular form for the distribution of a variable which, when plotted,
produces a "bell" shaped curve: symmetrical, rising smoothly from a small number of cases at
both extremes to a large number of cases in the middle. Not all symmetrical bell-shaped
distributions meet the definition of normality. See Hays (1973, page 296).

NORMALITY. See NORMAL DISTRIBUTION.

ORDINAL SCALE. A classification of cases into a set of ordered classes such that each case is
considered equal to, greater than, or less than every other case. Analytic techniques
appropriate for ordinally scaled variables are not affected by any monotonic transformation of
the numbers assigned to the classes. See also SCALE OF MEASUREMENT.

OUTLYING CASE (OUTLIER). A case (e.g., person) whose score on a variable deviates
substantially from the mean (or other measure of central tendency). Such cases can have
disproportionately strong effects on statistics.

PATTERN VARIABLE. A nominally scaled variable whose categories identify particular
combinations (patterns) of scores on two or more other variables. For example, a party-by-
gender pattern variable might be developed by classifying people into the following six
categories: (1) Republican males, (2) Independent males, (3) Democratic males, (4)
Republican females, (5) Independent females, (6) Democratic females. A pattern variable can
be used to incorporate interaction in multivariate analysis.

PRODUCT VARIABLE. An intervally scaled variable whose scores are equal to the product
obtained when the values of two other variables are multiplied together. A product variable
can be used to incorporate certain types of interaction in multivariate analysis.

RANKS. The position of a particular case (e.g., person) relative to other cases on a defined
scale-as in '1st place,' '2nd place,' etc. Note that when the actual values of the numbers
designating the relative positions (the ranks) are used in analysis they are being treated as an
interval scale, not an ordinal scale. See also INTERVAL SCALE, ORDINAL SCALE.

SCALE OF MEASUREMENT. As used here, scale of measurement refers to the nature of the
assumptions one makes about the properties of a variable; in particular, whether that variable
meets the definition of nominal, ordinal, or interval measurement. See also NOMINAL SCALE,
ORDINAL SCALE, INTERVAL SCALE.

SKEWNESS. Skewness is a measure of lack of symmetry of a distribution.

STANDARDIZED COEFFICIENT. When an analysis is performed on variables that have been
standardized so that they have variances of 1.0, the estimates that result are known as
standardized coefficients; for example, a regression run on original variables produces
unstandardized regression coefficients known as b's, while a regression run on standardized
variables produces standardized regression coefficients known as betas. (In practice, both
types of coefficients can be estimated from the original variables.) Blalock (1967), Hargens
(1976), and Kim and Mueller (1976) provide useful discussions on the use of standardized
coefficients.

STANDARDIZED VARIABLE. A variable that has been transformed by multiplication of all scores
by a constant and/or by the addition of a constant to all scores. Often these constants are
selected so that the transformed scores have a mean of zero and a variance (and standard
deviation) of 1.0.

STATISTICAL INDEPENDENCE. A complete lack of covariation between variables; a lack of
association between variables. When used in analysis of variance or covariance, statistical
independence between the independent variables is sometimes referred to as a balanced
design.

STATISTICAL MEASURE. A number (a statistic) whose size indicates the magnitude of some
quantity of interest-e.g., the strength of a relationship, the amount of variation, the size of a
difference, the level of income, etc. Examples include means, variances, correlation
coefficients, and many others. Statistical measures are different from statistical tests. See also
STATISTICAL TEST.

STATISTICAL TEST. A number (a statistic) that can be used to assess the probability that a
statistical measure deviates from some preselected value (often zero) by no more than would
be expected due to the operation of chance if the cases (e.g., persons) studied were
randomly selected from a larger population. Examples include Pearson chi-square, F test, t test,
and many others. Statistical tests are different from statistical measures. See also STATISTICAL
MEASURE.

TRANSFORMATION. A change made to the scores of all cases (e.g., persons) on a variable by
the application of the same mathematical operation(s) to each score. (Common operations
include addition of a constant, multiplication by a constant, taking logarithms, ranking,
bracketing, etc.)

TWO-POINT SCALE. If each case is classified into one of two categories (e.g., yes/no, male/
female, dead/alive), the variable is a two-point scale. For analytic purposes, two-point scales
can be treated as nominal scales, ordinal scales, or interval scales.

WEIGHTED DATA. Weights are applied when one wishes to adjust the impact of cases (e.g.,
persons) in the analysis, e.g., to take account of the number of population units that each case
represents. In sample surveys weights are most likely to be used with data derived from sample
designs having different selection rates or with data having markedly different subgroup
response rates.
Home
Glossary
References
Selecting Statistics
About
Help

References

Andrews, D. F.; Bickel, P. J.; Hampel, F. R.; Huber, P. J.; Rogers, W. H.; and Tukey, J. W. Robust
Estimates of Location: Survey and Advances. Princeton: Princeton University Press, 1972.

Andrews, F. M., and Messenger, R. C. Multivariate Nominal Scale Analysis. Ann Arbor: Institute
for Social Research, The University of Michigan, 1973.

Andrews, F. M.; Morgan, J. N.; Sonquist, J. A.; and Klem, L. Multiple Classification Analysis.
Second edition. Ann Arbor: Institute for Social Research, The University of Michigan, 1973.

Blalock, H. M., Jr. Causal inferences, closed populations, and measures of association.
American Political Science Review 61 (1967): 130-136.

Blalock, H. M., Jr. Can we find a genuine ordinal slope analogue? In Sociological Methodology
1976, edited by D. R. Heise. San Francisco: Jossey-Bass, 1975.

Blalock, H. M., Jr. Social Statistics. Second edition, revised, New York: McGraw-Hill, 1979.

[BMDP] Dixon, W. J., editor. BMDP Statistical Software 1981 Manual. Berkeley, California:
University of California Press, 1981.

Bock, R. D. Multivariate Statistical Methods in Behavioral Research. New York: McGraw-Hill, 1975.

Bock, R. D., and Haggard, E. A. The use of multivariate analysis of variance in behavioral
research. In Handbook of Measurement and Assessment in Behavioral Sciences, edited by D. K.
Whitla. Reading, Massachusetts: Addison-Wesley, 1968.

Bock, R. D., and Yates, G. MULTIQUAL: Log-Linear Analysis of Nominal or Ordinal Qualitative Data
by the Method of Maximum Likelihood. User's Guide. Chicago: National Educational Resources,
1973.

Borg, I., and Lingoes, J. C. A model and algorithm for multidimensional scaling with external
constraints on the distances. Psychometrika 45 (1980):25-38.

Bowker, A. H., A test for symmetry in contingency tables. Journal of the American Statistical
Association 43 (1948): 572-574.

Bradley, D. R.; Bradley, T. D.; McGrath, S. G.; and Cutcomb, S. D. Type I error rate of the chi-
square test of independence in RxC tables that have small expected frequencies.
Psychological Bulletin 86 (1979): 1290-1297.
Bradley, J. V. Distribution-Free Statistical Tests. Englewood Cliffs, New Jersey: Prentice-Hall, 1968.

Brown, M. B., and Forsythe, A. B. The small sample behavior of some statistics which test the
equality of several means. Technometrics 16 (1974a): 129-132.

Brown, M. B., and Forsythe, A. B. Robust tests for the equality of variances. Journal of the
American Statistical Association 69 (1974b): 364-367.

Camilli, G., and Hopkins, K. D. Applicability of chi-square to 2 x 2 contingency tables with small
expected cell frequencies. Psychological Bulletin 85 (1978): 163-167.

Carroll, J. D., and Chang, J. J. Analysis of individual differences in multidimensional scaling via
an N-way generalization of "Eckart-Young" decomposition. Psychometrika 35 (1970): 283-319.

Carroll, J. D.; Pruzansky, S.; and Kruskal, J. B. CANDELINC: a general approach to
multidimensional analysis of many-way arrays with linear constraints on parameters.
Psychometrika 45 (1980): 3-24.

Cohen, J. A coefficient of agreement for nominal scales. Educational and Psychological
Measurement 20 (1960): 37-46.

Cohen, J. Weighted kappa: nominal scale agreement with provision for scaled disagreement
or partial credit. Psychological Bulletin 70 (1968): 213-220.

Conover, W. J. Practical Nonparametric Statistics. New York: John Wiley, 1971.

Cooley, W. W., and Lohnes, P. R. Multivariate Data Analysis. New York: Wiley, 1971.

D'Agostino, R. B. Simple compact portable test of normality: Geary's test revisited. Psychological
Bulletin 74 (1970):138-140.

Darlington, R. B. Reduced variance regression. Psychological Bulletin 85 (1978): 1238-1255.

Dempster, P.; Schatzoff, M.; and Wermuth, N. A simulation study of alternatives to ordinary least
squares. Journal of the American Statistical Association 72 (1977): 77-102.

Dixon, W. J., and Massey, F. J., Jr. Introduction to Statistical Analysis. Third edition. New York:
McGraw-Hill, 1969.

Draper, N. R., and Smith, H. Applied Regression Analysis. New York: Wiley, 1966.

DuMouchel, W. H. The regression of a dichotomous variable. Unpublished. Survey Research
Center Computer Support Group, Institute for Social Research, University of Michigan, 1974.

DuMouchel, W. H. On the analogy between linear and log-linear regression. Technical Report
No. 67. Unpublished. Department of Statistics, University of Michigan, March 1976.

Fienberg, S. E. The Analysis of Cross-Classified Data. Cambridge, Massachusetts: The MIT Press,
1977.

Fennessey, J., and d'Amico, R. Collinearity, ridge regression, and investigator judgement.
Sociological Methods and Research 8 (1980): 309-340.

Fleiss, J. L.; Cohen, J.; and Everitt, B. S. Large sample standard errors of kappa and weighted
kappa. Psychological Bulletin 72 (1969): 323-327.

Freeman, L. C. Elementary Applied Statistics for Students in Behavioral Science. New York: Wiley,
1965.

Gillo, M. W. MAID: A Honeywell 600 program for an automatised survey analysis. Behavioral
Science 17 (1972): 251-252.

Gillo, M. W., and Shelley, M. W. Predictive modelling of multivariable and multivariate data.
Journal of the American Statistical Association 69 (1974):646-653.

Glass, G. V., and Hakstian, A. R. Measures of association in comparative experiments: their
development and interpretation. American Educational Research Journal 6 (1969): 403-414.

Glass, G. V.; Willson, V. L.; and Gottman, J. M. Design and Analysis of Time Series Experiments.
Boulder, Colorado: Colorado Associated University Press, 1975.

Gokhale, D. V., and Kullback, S. The Information in Contingency Tables. New York: Marcel
Dekker, 1978.

Goodman, L. A., and Kruskal, W. H. Measures of association for cross classifications. Journal of
the American Statistical Association 49 (1954): 732-764.

Goodman, L. A., and Kruskal, W. H. Measures of association for cross classifications III:
approximate sampling theory. Journal of the American Statistical Association 58 (1963): 310-364.

Goodman, L. A., and Kruskal, W. H. Measures of association for cross classification IV:
simplification of asymptotic variances. Journal of the American Statistical Association 67 (1972):
415-421.

Gorsuch, R. L. Factor Analysis. Philadelphia: W. B. Saunders, 1974.

Gross, A. J., and Clark, V. A. Survival Distributions: Reliability Applications in the Biomedical
Sciences. New York: Wiley, 1975.

Guttman, L. A general nonmetric technique for finding the smallest coordinate space for a
configuration of points. Psychometrika 33 (1968): 469-506.

Hannan, M. T., and Tuma, N. B. Methods for temporal analysis. In Annual Review of Sociology:
1979, edited by A. Inkeles. Palo Alto: Annual Reviews, 1979.

Hargens, L. A note on standardized coefficients as structural parameters. Sociological Methods
and Research 5 (1976): 247-256.

Harris, R. J. A Primer of Multivariate Statistics. New York: Academic Press, 1975.

Harshbarger, T. R. Introductory Statistics: A Decision Map. New York: Macmillan, 1971.

Harshman, R. A. PARAFAC: Foundations of the PARAFAC procedure: models and conditions for
an 'explanatory' multi-modal factor analysis. Working Papers in Phonetics 16. Los Angeles:
University of California at Los Angeles, 1970.

Hartwig, F. Exploratory Data Analysis. Beverly Hills, California: Sage, 1979.

Hays, W. L. Statistics for the Social Sciences. Second edition. New York: Holt, Rinehart, and
Winston, 1973.

Hertel, B. R. Minimizing error variance introduced by missing data routines in survey analysis.
Sociological Methods and Research 4 (1976): 459-474.

Isaac, P. D., and Poor, D. D. S. On the determination of appropriate dimensionality in data with
error. Psychometrika 39 (1974): 91-109.

Joreskog, K. G., and Sorbom, D. LISREL: Analysis of Linear Structural Relationships by the Method
of Maximum Likelihood. Version IV. User's Guide, Chicago: National Educational Resources,
1978.

Kalbfleisch, J. D., and Prentice, R. L. The Statistical Analysis of Failure Time Data. New York: Wiley,
1980.

Kelley, T. L. An unbiased correlation ratio measure. Proceedings of the National Academy of
Sciences 21 (1935): 554-559.

Kendall, M. G. Rank Correlation Methods. Fourth edition. London: Griffin, 1970.

Kendall, M. G., and Stuart, A. The Advanced Theory of Statistics, Volume 2. New York: Hafner,
1961.

Kerlinger, F. N., and Pedhazur, E. J. Multiple Regression in Behavioral Research. New York: Holt,
Rinehart and Winston, 1973.

Kim, J. Predictive measures of ordinal association. American Journal of Sociology 76 (1971): 891-
907.

Kim, J. Multivariate analysis of ordinal variables. American Journal of Sociology 81 (1975): 261-
298.

Kim, J., and Curry, J. The treatment of missing data in multivariate analysis. Sociological
Methods and Research 6 (1977): 215-240.

Kim, J., and Mueller, C. W. Standardized and unstandardized coefficients in causal analysis.
Sociological Methods and Research 4 (1976): 423-438.

Kirk, R. E. Experimental Design: Procedures for the Behavioral Sciences. Belmont, California:
Brooks/Cole, 1968.

Krippendorff, K. Bivariate agreement coefficients for reliability of data. In Sociological
Methodology: 1970, edited by E. F. Borgatta and G. W. Bohrnstedt. San Francisco: Jossey-Bass,
1970.

Kruskal, J. B. Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis.
Psychometrika 29 (1964a): 1-27.

Kruskal, J. B. Nonmetric multidimensional scaling: a numerical method. Psychometrika 29
(1964b): 115-130.

Kruskal, J. B. Transformations of data. In International Encyclopedia of Statistics, Volume 2,
edited by W. H. Kruskal and J. M. Tanur. New York: Crowell Collier and Macmillan. Originally
published 1968. Copyright renewed in 1978 by The Free Press.

Kruskal, J. B., and Wish, M. Multidimensional Scaling. Beverly Hills, California: Sage, 1978.

Kruskal, J. B.; Young, F. W.; and Seery, J. B. How to use KYST, a very flexible program to do
multidimensional scaling and unfolding. Unpublished. Bell Laboratories, Murray Hill, New Jersey,
1973.

Landis, J. R.; Stanish, W. M.; Freeman, J. L.; and Koch, G. G. A computer program for the
generalized chi-square analysis of categorical data using weighted least squares (GENCAT).
Computer Programs in Bio-medicine 6 (1976): 196-231.

Langeheine, R. Erwartete fitwerte fur Zufallskonfigurationen in PINDIS. Zeitschrift fur
Sozialpsychologie 11 (1980): 38-49.

Leinhardt, S., and Wasserman, S. S. Exploratory data analysis: an introduction to selected
methods. In Sociological Methodology 1979, edited by K. F. Schuessler. San Francisco:
Jossey-Bass, 1978.

Light, R. J. Measures of response agreement for qualitative data: some generalizations and
alternatives. Psychological Bulletin 76 (1971): 365-377.

Lingoes, J. C., and Borg, I. Procrustean individual difference scaling. Journal of Marketing
Research 13 (1976): 406-407.

Lingoes, J. C.; Roskam, E. E.; and Borg, I. Geometric Representations of Relational Data. Second
edition. Ann Arbor: Mathesis Press, 1979.

MacCallum, R. C., and Cornelius, E. T. A Monte Carlo investigation of recovery of structure by
ALSCAL. Psychometrika 42 (1977): 401-428.

Mayer, L. S., and Robinson, J. A. Measures of association for multiple regression models with
ordinal predictor variables. In Sociological Methodology 1978, edited by K. F. Schuessler. San
Francisco: Jossey-Bass, 1977.

McCleary, R., and Hay, R. A., Jr., with Meidinger, E. E., and McDowall, D. Applied Time Series
Analysis for the Social Sciences. Beverly Hills, California: Sage, 1980.

McNemar, Q. Psychological Statistics. Fourth edition. New York: Wiley 1969.

[MIDAS] Fox, D. J., and Guire, K. E. Documentation for MIDAS. Third edition. Ann Arbor: Statistical
Research Laboratory, The University of Michigan, 1976.

Morrison, D.F. Multivariate Statistical Methods. Second edition. New York: McGraw-Hill, 1976.

Mosteller, F., and Tukey, J. W. Data Analysis and Regression. Reading, Massachusetts: Addison-
Wesley, 1977.

Neter, J., and Wasserman, W. Applied Linear Statistical Models. Homewood, Illinois: Richard D.
Irwin, 1974.

Nunnally, J. C. Psychometric Theory. Second edition. New York: McGraw-Hill, 1978.

Olson, C. L. On choosing a test statistic in multivariate analysis of variance. Psychological
Bulletin 83 (1976): 579-586.

Olsson, U. Maximum likelihood estimation of the polychoric correlation coefficient.
Psychometrika 44 (1979): 443-460.

Olsson, U. Measuring correlation in ordered two-way contingency tables. Journal of Marketing
Research 17 (1980): 391-394.

[OSIRIS] Survey Research Center Computer Support Group. OSIRIS IV User's Manual. Seventh
edition. Ann Arbor: Institute for Social Research, The University of Michigan, 1981.

Overall, J. E. and Klett, C. J. Applied Multivariate Analysis. New York: McGraw-Hill, 1972.

Ramsay, J. O. Maximum likelihood estimation in multidimensional scaling. Psychometrika 42
(1977): 241-266.

Rao, C. R. Linear Statistical Inference and its Applications. New York: Wiley, 1965.

Robinson, W. S. The statistical measurement of agreement. American Sociological Review 22
(1957): 17-25.

Rozeboom, W. W. Ridge regression: bonanza or beguilement? Psychological Bulletin 86 (1979):
242-249.

Sands, R., and Young, F. W. Component models for three-way data: an alternating least
squares algorithm with optimal scaling features. Psychometrika 45 (1980): 39-68.

[SAS] SAS Institute, Inc. SAS User's Guide, 1979 Edition. Raleigh, North Carolina: SAS Institute, 1979.

[SAS] SAS Institute, Inc. The SAS Supplemental Library User's Guide, 1980 Edition. Cary, North
Carolina: SAS Institute, 1980.

Siegel, S. Nonparametric Methods for the Behavioral Sciences. New York: McGraw-Hill, 1956.

Smith, G., and Campbell, F. A critique of ridge regression methods. Journal of the American
Statistical Association 75 (1980): 74-81.

Sneath, P. H. A., and Sokal, R. R. Numerical Taxonomy. San Francisco: W. H. Freeman, 1973.

Snedecor, G. W., and Cochran, W. G. Statistical Methods. Sixth edition. Ames, Iowa: The Iowa
State University Press, 1967.

Somers, R. H. A new asymmetric measure of association for ordinal variables. American
Sociological Review 27 (1962): 799-811.

Sonquist, J. A.; Baker, E. L.; and Morgan, J. N. Searching for Structure. Revised edition. Ann
Arbor: Institute for Social Research, The University of Michigan, 1974.

Sorbom, D. and Joreskog, K.G. COFAMM: Confirmatory Factor Analysis with Model
Modification. User's Guide. Chicago: National Educational Resources, 1976.

Spence, I., and Graef, J. The determination of the underlying dimensionality of an empirically
obtained matrix of proximities. Multivariate Behavioral Research 9 (1974): 331-342.

Spence, I., and Ogilvie, J. C. A table of expected stress values for random rankings in nonmetric
multidimensional scaling. Multivariate Behavioral Research 8 (1973): 511-517.

[SPSS] Nie, N. H.; Hull, C. H.; Jenkins, J. G.; Steinbrenner, K.; and Bent, D. H. SPSS: Statistical
Package for the Social Sciences. Second edition. New York: McGraw-Hill, 1975.

[SPSS] Hull, C. H., and Nie, N. H. SPSS UPDATE 7-9: New Procedures and Facilities for Releases 7-9.
New York: McGraw-Hill, 1981.

Srikantan, K. S. Canonical association between nominal measurements. Journal of the
American Statistical Association 65 (1970): 284-292.

Statistical Research Laboratory. Elementary Statistics Using MIDAS. Second edition. Ann Arbor:
Statistical Research Laboratory, The University of Michigan, 1976.

Statistics Department, University of Chicago. ECTA program: description for users.
Mimeographed paper, 1973.

Stuart, A. The estimation and comparison of strengths of association in contingency tables.
Biometrika 40 (1953): 105-110.

Takane, Y.; Young, F. W.; and DeLeeuw, J. Nonmetric individual differences multidimensional
scaling: an alternating least squares method with optimal scaling features. Psychometrika 42
(1977): 7-67.

Tukey, J. W. Exploratory Data Analysis. Reading, Massachusetts: Addison-Wesley, 1977.

Young, F. W., and Torgerson, W. S. TORSCA, a FORTRAN IV program for Shepard-Kruskal
multidimensional scaling analysis. Behavioral Science 12 (1967): 498.

Yule, G. U., and Kendall, M. G. An Introduction to the Theory of Statistics. Fourteenth edition.
London: Griffin, 1957.
Home
Glossary
References
Selecting Statistics
About
Help

This program was written by:

William M.K. Trochim


Cornell University

Based on:

Andrews, F.M., Klem, L., Davidson, T.N., O'Malley, P.M., and Rodgers, W.L. (1981).
A Guide for Selecting Statistical Techniques for Analyzing Social Science Data,
2nd Ed. Survey Research Center, Institute for Social Research, The University of
Michigan, Ann Arbor MI, all rights reserved. Used with permission of the Survey
Research Center, Institute for Social Research, The University of Michigan. Any
modifications of the guide are the responsibility of the present developer and
have not been checked with the original authors.

PLEASE NOTE: The 1998 printed edition of Selecting Statistics was published by SAS under the title:

Andrews, Frank M., Klem, Laura, O'Malley, Patrick M., Rodgers, Willard L., Welch, Kathleen B.,
and Davidson, Terrence N. (1998). Selecting Statistical Techniques for Social
Science Data: A Guide for SAS. SAS Institute. The new edition may be ordered
through amazon.com or by phoning 1-800-727-3228. The order number is P55854.

The complete original text version may be obtained from the Survey Research Center [(313)
764-8370]

The development of this program was supported in part through NIMH Grant R01MH46712-
01A1, William M.K. Trochim, Principal Investigator; and US Dept. of Education and NIMH Grant
H133B00011, Judith A. Cook, Principal Investigator.

Everyone who uses this program owes a great debt to the people who painstakingly put
together the text document on which this is based.

This program should not be viewed as a substitute for advanced study of statistics or
consultation with a professional statistician. This program is best used as an exploratory tool
when first thinking about statistical analyses, and as an aid to educating students on the
selection of the correct statistical technique. The author of this program, the distributor, and the
authors of the text on which it is based assume no liability for any errors contained herein.
Home
Glossary
References
Selecting Statistics
About
Help

Instructions

This program is designed to be simple to use. You can navigate through the program using a
mouse. The program opens at the first screen in a search. Generally, if you just click on the
options in answering the questions that are posed, you will be led to a statistical technique.
Please send any comments, corrections or suggestions to Bill Trochim

Buttons

The ReStart button

Use the ReStart button when you want to begin another search.

The Glossary button

Use the Glossary button to take you to a scrolling window that has a glossary of terms used in
the questions. This is especially useful if you don't understand a technical term in the question
you are being asked. Use the Back button in your browser to return to your search.

The Reference button

Use the Reference button to take you to a scrolling window that lists the references given in the
original text on which this program is based. Use the Back button in your browser to return to
your search.

The Help button

Use the Help button to bring you to this page. Use the Back button in your browser to return to
your search.

The About button

Use the About button to bring you to a page that explains how this program was authored and
where you can request printed copies. Use the Back button in your browser to return to your
search.

The Results Pages

A Results Page for a search shows you the statistical test and/or statistical measure for your
situation, provides notes where relevant, and usually suggests a statistical program that can be
used to accomplish the analysis. A Results Page can only be reached at the end of a search.
Miscellaneous Notes

This program does not contain all of the information given in the original text document. That
document, for instance, gives more extensive notes about specific statistical analyses and their
limitations. It also gives information about a wider range of statistical computer programs. This
program only lists analyses that are available in SPSS and SAS. However, you should note that
some of the references for a given statistical technique may be about a special-purpose
statistical program. The names of such programs are indicated in capital letters enclosed in
parentheses at the end of a citation. For instance, if the recommended analysis can be
accomplished by the special-purpose LISREL program, you will see (LISREL) at the end of the
citation. When computer programs are included in the citations, a citation that has the term
(General) at the end of it is one that generally describes the statistical technique.
Home
Glossary
References
Selecting Statistics
About
Help

How do you want to treat the variable with respect to scale of measurement?

Nominal

Ordinal

Interval
Home
Glossary
References
Selecting Statistics
About
Help

What do you want to know about the distribution of the variable?

Central Tendency

Frequencies

Dispersion
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

The Mode

Statistical Measure

(none)

Notes

McNemar, Q. Psychological Statistics. Fourth Edition. New York: Wiley 1969, p.14
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Relative frequencies (e.g., percentages), or Absolute Frequencies

Statistical Measure

(none)

Notes

Relative Frequencies: Blalock, H. M., Jr. Social Statistics. Second edition, revised, New York:
McGraw-Hill, 1979. p. 31

Absolute Frequencies: McNemar, Q. Psychological Statistics. Fourth edition. New York: Wiley
1969. p. 5

SPSS: FREQUENCIES

SAS: UNIVARIATE, CHART


Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

The relative frequency of modal value or class

Statistical Measure

(none)

Notes

Blalock, H. M., Jr. Social Statistics. Second edition, revised, New York: McGraw-Hill, 1979. , p. 31

SPSS: FREQUENCIES

SAS: UNIVARIATE, CHART


Home
Glossary
References
Selecting Statistics
About
Help

What do you want to know about the distribution of the variable?

Central Tendency

Frequencies

Dispersion
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

The Median

Statistical Measure

(none)

Notes

McNemar, Q. Psychological Statistics. Fourth edition. New York: Wiley 1969. p. 14

SPSS: FREQUENCIES

SAS: UNIVARIATE
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Relative frequencies (e.g., percentages),

or Absolute frequencies,

or N-tiles

Statistical Measure

(none)

Notes

N-tiles: McNemar, Q. Psychological Statistics. Fourth edition. New York: Wiley 1969. p. 19

SAS: UNIVARIATE
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Inter-quartile deviation

Statistical Measure

(none)

Notes

McNemar, Q. Psychological Statistics. Fourth edition. New York: Wiley 1969. p. 19

SAS: UNIVARIATE (SAS prints Q3-Q1; our reference refers to [Q3-Q1]/2)


Home
Glossary
References
Selecting Statistics
About
Help

What do you want to know about the distribution of the variable?

Central Tendency

Frequencies

Dispersion

Symmetry

Peakedness

Normality
Home
Glossary
References
Selecting Statistics
About
Help

Do you want to treat outlying cases differently from others?

Yes

No
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Winsorized mean,

or Trimmed mean,

or Hampel estimate of location,

or Biweight mean

Statistical Measure

(none)

Notes

Winsorized Mean: Dixon, W. J., and Massey, F. J., Jr. Introduction to Statistical Analysis. Third
edition. New York: McGraw-Hill, 1969. p. 330

Trimmed mean: Andrews, D. F.; Bickel, P. J.; Hampel, F. R.; Huber, P. J.; Rogers, W. H.; and Tukey,
J. W. Robust Estimates of Location: Survey and Advances. Princeton: Princeton University Press,
1972. p. 2B1

Hampel Estimate: Andrews, D. F.; Bickel, P. J.; Hampel, F. R.; Huber, P. J.; Rogers, W. H.; and
Tukey, J. W. Robust Estimates of Location: Survey and Advances. Princeton: Princeton University
Press, 1972. p. 2C3

Biweight Mean: Mosteller, F., and Tukey, J. W. Data Analysis and Regression. Reading,
Massachusetts: Addison-Wesley, 1977. p. 205
Home
Glossary
References
Selecting Statistics
About
Help

What is the form of the distribution?

Symmetric

Skewed
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

The Mean

Statistical Measure

(none)

Notes

McNemar, Q. Psychological Statistics. Fourth edition. New York: Wiley 1969. p. 16

SPSS: CONDESCRIPTIVE, FREQUENCIES

SAS: UNIVARIATE, MEANS


Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

The Median or the mean

Statistical Measure

(none)

Notes

Median: McNemar, Q. Psychological Statistics. Fourth edition. New York: Wiley 1969. p. 14

Mean: McNemar, Q. Psychological Statistics. Fourth edition. New York: Wiley 1969. p. 16

SPSS: FREQUENCIES

SAS: UNIVARIATE
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Relative frequencies (e.g., percentages),

or Absolute frequencies,

or N-tiles

Statistical Measure

(none)

Notes

Relative frequencies: Blalock, H. M., Jr. Social Statistics. Second edition, revised, New York:
McGraw-Hill, 1979. p. 31

Absolute frequencies: Blalock, H. M., Jr. Social Statistics. Second edition, revised, New York:
McGraw-Hill, 1979. p. 5

N-tiles: Blalock, H. M., Jr. Social Statistics. Second edition, revised, New York: McGraw-Hill, 1979.
p.19

Relative & Absolute Frequencies:

SPSS: FREQUENCIES

SAS: UNIVARIATE, CHART

N-tiles:

SAS: UNIVARIATE
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Standard deviation,

or coefficient of variation,

or range

Statistical Measure

(none)

Notes

Biased estimators

Standard Deviation: Hays, W. L. Statistics for the Social Sciences. Second edition. New York: Holt,
Rinehart, and Winston, 1973. p. 238

Coefficient of Variation: Blalock, H. M., Jr. Social Statistics. Second edition, revised, New York:
McGraw-Hill, 1979. p. 84

Range: McNemar, Q. Psychological Statistics. Fourth edition. New York: Wiley 1969. p. 19

Standard deviation:

SPSS: CONDESCRIPTIVE, FREQUENCIES

SAS: UNIVARIATE, MEANS

Coefficient of Variation:

SAS: UNIVARIATE, MEANS

Range:

SPSS: CONDESCRIPTIVE, FREQUENCIES

SAS: UNIVARIATE
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Skewness

Statistical Measure

(none)

Notes

To test departures from normality: for N greater than 150, refer the critical ratio of the skewness
measure to a table of the unit normal curve; for N between 25 and 150, refer the skewness
measure to a table for testing skewness. This is a biased estimator.

Skewness: McNemar, Q. Psychological Statistics. Fourth edition. New York: Wiley 1969. p. 25

Critical ratio of skewness: Snedecor, G. W., and Cochran, W. G. Statistical Methods. Sixth
edition. Ames, Iowa: The Iowa State University Press, 1967. p. 86

Table for testing skewness: Snedecor, G. W., and Cochran, W. G. Statistical Methods. Sixth
edition. Ames, Iowa: The Iowa State University Press, 1967. p. 552

SPSS: CONDESCRIPTIVE, FREQUENCIES

SAS: UNIVARIATE, MEANS
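
As a supplementary illustration (not from the original guide), the critical-ratio procedure described in the notes can be computed directly in Python; the generated data and the large-sample standard error of sqrt(6/N) are assumptions made only for this sketch.

import numpy as np

rng = np.random.default_rng(3)
x = rng.exponential(scale=2.0, size=400)       # a deliberately skewed sample with N greater than 150

m2 = np.mean((x - x.mean()) ** 2)
m3 = np.mean((x - x.mean()) ** 3)
skew = m3 / m2 ** 1.5                          # moment (biased) estimator of skewness
critical_ratio = skew / np.sqrt(6 / len(x))    # refer this value to a table of the unit normal curve
print(round(skew, 2), round(critical_ratio, 2))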


Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Kurtosis

Statistical Measure

(none)

Notes

To test departures from normality: for N greater than 1000, refer the critical ratio of the kurtosis
measure to a table of the unit normal curve; for N between 200 and 1000, refer the kurtosis
measure to a table for testing kurtosis; for N less than 200, use Geary's criterion. This is a biased
estimator.

Kurtosis: McNemar, Q. Psychological Statistics. Fourth edition. New York: Wiley 1969. p. 25

Critical Ratio of Kurtosis: Snedecor, G. W., and Cochran, W. G. Statistical Methods. Sixth edition.
Ames, Iowa: The Iowa State University Press, 1967. p. 86

Table for testing kurtosis: Snedecor, G. W., and Cochran, W. G. Statistical Methods. Sixth edition.
Ames, Iowa: The Iowa State University Press, 1967. p. 552

Geary's criterion for kurtosis: D'Agostino, R. B. Simple compact portable test of normality: Geary's
test revisited. Psychological Bulletin 74 (1970):138-140.

SPSS: CONDESCRIPTIVE, FREQUENCIES

SAS: UNIVARIATE, MEANS


Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Kolmogorov-Smirnov one sample test,

or Lilliefors extension of the Kolmogorov-Smirnov test,

or the Chi-square goodness of fit test

Statistical Measure

(none)

Notes

Kolmogorov-Smirnov: Siegel, S. Nonparametric Methods for the Behavioral Sciences. New York:
McGraw-Hill, 1956. p. 47

Lilliefors: Conover, W. J. Practical Nonparametric Statistics. New York: John Wiley,1971. p. 302

Chi-square: Hays, W. L. Statistics for the Social Sciences. Second edition. New York: Holt,
Rinehart, and Winston, 1973. p. 725

Kolmogorov-Smirnov:

SPSS: NPAR

Lilliefors:

SAS: UNIVARIATE

Chi-square:

SPSS: NPAR

SAS: FREQ

see also: Skewness, Kurtosis


Home
Glossary
References
Selecting Statistics
About
Help

How do you want to treat the variables with respect to scale of measurement?

Both Interval

Both Nominal

Both Ordinal

One Interval, One Ordinal

One Interval, One Nominal

One Ordinal, One Nominal


Home
Glossary
References
Selecting Statistics
About
Help

Is a distinction made between a dependent and an independent variable?

Yes

No
Home
Glossary
References
Selecting Statistics
About
Help

Do you want to treat the relationship as linear?

Yes

No
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Regression coefficient (b or beta)

Statistical Measure

F test (F equals t-squared)

Notes

Beta is a standardized version of b.

Regression coefficient: Hays, W. L. Statistics for the Social Sciences. Second edition. New York:
Holt, Rinehart, and Winston, 1973. p. 623, 630

F-test for regression: Hays, W. L. Statistics for the Social Sciences. Second edition. New York: Holt,
Rinehart, and Winston, 1973. p. 647

SPSS: REGRESSION

SAS: GLM, REG


Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Coefficients from curvilinear regression (b or beta)

Statistical Measure

F test (F equals t-squared for each coefficient)

Notes

Beta is the standardized version of b. The type of curvilinear regression referred to here is also
known as polynomial regression.

Coefficients from curvilinear regression: Draper, N. R., and Smith, H. Applied Regression Analysis.
New York: Wiley, 1966. p. 129; Hays, W. L. Statistics for the Social Sciences. Second edition. New
York: Holt, Rinehart, and Winston, 1973. p. 675

F-test: Hays, W. L. Statistics for the Social Sciences. Second edition. New York: Holt, Rinehart, and
Winston, 1973. p. 680

SPSS: REGRESSION, ONEWAY

SAS: GLM
Home
Glossary
References
Selecting Statistics
About
Help

Do you want to test whether the means on the two variables are equal?

Yes

No
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

(none)

Statistical Measure

t-test for paired observations

Notes

The t test for paired observations is appropriate for parallel measures from matched cases as
well as for repeated measures on a single set of cases.

Hays, W. L. Statistics for the Social Sciences. Second edition. New York: Holt, Rinehart, and
Winston, 1973. p. 424

SPSS: T-TEST

SAS: MEANS (Requires that the data analyzed be the differences between the paired
observations)
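
As a supplementary illustration (not from the original guide), the paired t test can be computed from the difference scores, which is also the form the SAS note above requires; the data values are invented for this sketch.

import math

before = [12, 15, 11, 14, 13, 16, 12, 15]
after  = [14, 16, 13, 15, 15, 17, 13, 16]
d = [a - b for a, b in zip(after, before)]     # difference score for each matched pair

n = len(d)
mean_d = sum(d) / n
sd_d = math.sqrt(sum((v - mean_d) ** 2 for v in d) / (n - 1))
t = mean_d / (sd_d / math.sqrt(n))             # compare to t with n - 1 degrees of freedom
print(round(t, 2))
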
Home
Glossary
References
Selecting Statistics
About
Help

Do you want to treat the relationship as linear?

Yes

No
Home
Glossary
References
Selecting Statistics
About
Help

What do you want to measure?

Agreement

Covariation
Home
Glossary
References
Selecting Statistics
About
Help

Should there be a penalty if the variables do not have the same distributions?

Yes

No
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Robinson's A,

or the intraclass correlation coefficient

Statistical Measure

F test

Notes

The intraclass correlation coefficient is a biased estimator.

Robinson's A: Robinson, W. S. The statistical measurement of agreement. American Sociological
Review 22 (1957): 17-25.

Intraclass Correlation: McNemar, Q. Psychological Statistics. Fourth edition. New York: Wiley
1969. p. 322

F-test: McNemar, Q. Psychological Statistics. Fourth edition. New York: Wiley 1969. p. 322
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Krippendorff's coefficient of agreement

Statistical Measure

(none)

Notes

Krippendorff, K. Bivariate agreement coefficients for reliability of data. In Sociological
Methodology: 1970, edited by E. F. Borgatta and G. W. Bohrnstedt. San Francisco: Jossey-Bass,
1970. p. 143
Home
Glossary
References
Selecting Statistics
About
Help

How many of the variables are dichotomous?

None

One

Both
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Pearson's product moment r

Statistical Measure

Do Fisher's r to Z transformation and refer the critical ratio of Z to a table of the unit normal curve.

Notes

This is a biased estimator.

r: Hays, W. L. Statistics for the Social Sciences. Second edition. New York: Holt, Rinehart, and
Winston, 1973. p. 623

z-transformation test: Hays, W. L. Statistics for the Social Sciences. Second edition. New York:
Holt, Rinehart, and Winston, 1973. p. 662

SPSS: PEARSON CORR, CROSSTABS

SAS: CORR
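
As a supplementary illustration (not from the original guide), the r-to-Z test described above can be computed directly in Python; the values of r and N are invented for this sketch.

import math

r, N = 0.35, 103
z_r = math.atanh(r)                            # Fisher's r-to-Z: 0.5 * ln((1 + r) / (1 - r))
critical_ratio = z_r * math.sqrt(N - 3)        # refer this value to a table of the unit normal curve
print(round(z_r, 3), round(critical_ratio, 2))
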
Home
Glossary
References
Selecting Statistics
About
Help

Is the dichotomous variable a collapsing of a continuous variable and do you want to estimate
what the correlation would be if it were continuous?

Yes

No
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Biserial r

Statistical Measure

Refer critical ratio for biserial r to a table of the unit normal curve

Notes

This measure depends on a strict assumption of the normality of the continuous variable that has been dichotomized. Furthermore, the sampling error is large when dichotomies are extreme. Nunnally (1978, pages 135-137) advises against the use of these coefficients.

Biserial r: McNemar, Q. Psychological Statistics. Fourth edition. New York: Wiley 1969. p. 215;
Nunnally, J. C. Psychometric Theory. Second edition. New York: McGraw-Hill, 1978. p. 135

Critical ratio test: McNemar, Q. Psychological Statistics. Fourth edition. New York: Wiley 1969. p.
217
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Pearson's product moment r (equals point biserial r)

Statistical Measure

Refer critical ratio for point biserial r to a table of the unit normal curve

Notes

Pearson's r in this case is mathematically equivalent to a point biserial r; the tests are almost equivalent.

Critical ratio: McNemar, Q. Psychological Statistics. Fourth edition. New York: Wiley 1969.p. 219
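
For illustration only (not part of the original advisor), SciPy computes the point biserial r directly, and the Pearson correlation on the same hypothetical data returns the identical value, as the note above indicates.

    from scipy import stats

    group = [0, 0, 0, 0, 1, 1, 1, 1]                    # dichotomous variable
    score = [3.1, 2.7, 3.5, 2.9, 4.2, 4.8, 3.9, 4.5]    # interval variable

    print(stats.pointbiserialr(group, score))
    print(stats.pearsonr(group, score))                 # same correlation value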
Home
Glossary
References
Selecting Statistics
About
Help

Are the variables collapsings of continuous variables and do you want to estimate
what the correlation would be if they were continuous?

Yes

No
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Tetrachoric r

Statistical Measure

Refer critical ratio for tetrachoric r to a table of the unit normal curve

Notes

This measure depends on a strict assumption of the normality of the continuous variables that
have been dichotomized. Furthermore, the sampling error is large when dichotomies are
extreme. Nunnally (1978, pages 135-137) advises against the use of these coefficients.

Tetrachoric: McNemar, Q. Psychological Statistics. Fourth edition. New York: Wiley 1969. p. 221;
Nunnally, J. C. Psychometric Theory. Second edition. New York: McGraw-Hill, 1978. p. 136

Critical ratio: McNemar, Q. Psychological Statistics. Fourth edition. New York: Wiley 1969. p. 223
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Pearson's product moment r (equals phi)

Statistical Measure

Refer critical ratio for phi to a table of the unit normal curve

Notes

Pearson's r in this case is mathematically equivalent to phi; the tests are almost equivalent.

Critical ratio for phi: McNemar, Q. Psychological Statistics. Fourth edition. New York: Wiley 1969.
p. 227

Critical ratio for phi:

SPSS: CROSSTABS

SAS: FREQ
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Sorry, there is no known analysis for this case.

Statistical Measure

(none)

Notes
Home
Glossary
References
Selecting Statistics
About
Help

Are the variables both two-point scales?

Yes

No
Home
Glossary
References
Selecting Statistics
About
Help

What do you want to measure?

Symmetry

Covariation
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

(none)

Statistical Measure

McNemar's test of symmetry

Notes

McNemar's test of symmetry is appropriate for parallel measures from matched cases as well as
for repeated measures on a single set of cases. In this case, McNemar's test of symmetry is
equivalent to Cochran's Q.

Siegel, S. Nonparametric Methods for the Behavioral Sciences. New York: McGraw-Hill, 1956. p.
63 (when both variables are two-point scales, McNemar's test of symmetry and McNemar's test
for the significance of changes are equivalent); Bowker, A. H., A test for symmetry in
contingency tables. Journal of the American Statistical Association 43 (1948): 572-574.

SPSS: NPAR
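
Here is a small illustrative sketch (not from the original advisor) of McNemar's chi-square for a hypothetical 2x2 table of matched dichotomous responses; only the two discordant cells enter the statistic.

    from scipy import stats

    # Hypothetical 2x2 table of paired dichotomous responses
    #                  time 2: yes   time 2: no
    table = [[30,           12],      # time 1: yes
             [5,            28]]      # time 1: no

    b, c = table[0][1], table[1][0]        # discordant cells
    chi_sq = (b - c) ** 2 / (b + c)        # McNemar chi-square, 1 df (no continuity correction)
    print(chi_sq, stats.chi2.sf(chi_sq, df=1))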
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Yule's Q,

or Phi

Statistical Measure

Fisher's exact test,

or Refer critical ratio of phi to a table of the unit normal curve,

or Pearson chi-square

Notes

In this case, Yule's Q is equivalent to Goodman and Kruskal's Gamma, and Phi is equivalent to Pearson's product moment r. In general, Q will be higher in absolute value than Phi because Q ignores pairs of cases which fall in the same category on one or both of the variables. The Pearson chi-square can be corrected for continuity (Yates' correction), but this is controversial. See Camilli and Hopkins (1978).

Yule's Q: Yule, G. U., and Kendall, M. G. An Introduction to the Theory of Statistics. Fourteenth edition. London: Griffin, 1957. p. 30

Phi: McNemar, Q. Psychological Statistics. Fourth edition. New York: Wiley 1969. p. 225

Critical ratio of phi: McNemar, Q. Psychological Statistics. Fourth edition. New York: Wiley 1969.
p. 227

Fisher's exact test: Siegel, S. Nonparametric Methods for the Behavioral Sciences. New York:
McGraw-Hill, 1956. p. 96

Pearson chi-square: Hays, W. L. Statistics for the Social Sciences. Second edition. New York: Holt,
Rinehart, and Winston, 1973. p. 735

phi:

SPSS: CROSSTABS
SAS: FREQ (For two dichotomous variables, Cramer's V is equivalent to phi)

Critical ratio of phi:

SPSS: CROSSTABS

SAS: FREQ

Fisher's exact test:

SPSS: CROSSTABS

Pearson chi square:

SPSS: CROSSTABS

SAS: FREQ
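
To make the Q-versus-Phi comparison concrete, here is a hypothetical 2x2 table worked in Python; the counts are invented purely for illustration and are not part of the original advisor.

    import math

    # Hypothetical 2x2 table of counts
    a, b = 40, 10      # row 1
    c, d = 15, 35      # row 2

    q   = (a * d - b * c) / (a * d + b * c)                                    # Yule's Q
    phi = (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))   # phi
    print(q, phi)      # |Q| is typically larger than |phi|, as noted above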
Home
Glossary
References
Selecting Statistics
About
Help

Is a distinction made between a dependent and an independent variable?

Yes

No
Home
Glossary
References
Selecting Statistics
About
Help

Do you want a statistic based on the number of cases in each category or on the
number of cases in the modal categories?

Number in Each Category

Number in Modal Categories


Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Goodman and Kruskal's tau b

Statistical Measure

Refer critical ratio of tau b to a table of the unit normal curve.

Notes

tau b: Blalock, H. M., Jr. Social Statistics. Second edition, revised, New York: McGraw-Hill, 1979.
p. 307

Critical ratio of tau b: Goodman, L. A., and Kruskal, W. H. Measures of association for cross
classification IV: simplification of asymptotic variances. Journal of the American Statistical
Association 67 (1972): 415-421. p. 417
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Asymmetric lambda

Statistical Measure

Refer critical ratio of lambda to a table of the unit normal curve

Notes

Asymmetric lambda: Hays, W. L. Statistics for the Social Sciences. Second edition. New York:
Holt, Rinehart, and Winston, 1973. p. 747

Critical ratio of lambda: Goodman, L. A., and Kruskal, W. H. Measures of association for cross classifications III: approximate sampling theory. Journal of the American Statistical Association 58 (1963): 310-364. p. 316

Asymmetric lambda:

SPSS: CROSSTABS

SAS: FREQ

Critical ratio of lambda:

SAS: FREQ
Home
Glossary
References
Selecting Statistics
About
Help

What do you want to measure?

Agreement

Symmetry

Covariation
Home
Glossary
References
Selecting Statistics
About
Help

Should there be a penalty if the variables do not have the same distributions?

Yes

No
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Scott's coefficient of agreement, pi

Statistical Measure

(none)

Notes

Krippendorf, K. Bivariate agreement coefficients for reliability of data. In Sociological Methodology: 1970, edited by E. F. Borgatta and G. W. Bohrnstedt. San Francisco: Jossey-Bass, 1970. p. 142
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Cohen's agreement coefficients, kappas

Statistical Measure

Refer critical ratios for Cohen's coefficients to a table of the unit normal curve.

Notes

Cohen's agreement coefficients (kappas): Cohen, J. A coefficient of agreement for nominal scales. Educational and Psychological Measurement 20 (1960): 37-46; Cohen, J. Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit. Psychological Bulletin 70 (1968): 213-220.

Critical ratio: Fleiss, J. L.; Cohen, J.; and Everitt, B. S. Large sample standard errors of kappa and
weighted kappa. Psychological Bulletin 72 (1969): 323-327.
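
As an illustrative sketch (not part of the original advisor), Cohen's kappa can be computed in Python with scikit-learn; the two raters' codes below are hypothetical.

    from sklearn.metrics import cohen_kappa_score

    # Hypothetical nominal codes assigned by two raters to the same ten cases
    rater_1 = ["a", "b", "a", "c", "b", "b", "a", "c", "c", "a"]
    rater_2 = ["a", "b", "a", "b", "b", "b", "a", "c", "c", "b"]

    print(cohen_kappa_score(rater_1, rater_2))   # unweighted kappa
    # For ordered codes, weights="linear" or "quadratic" gives weighted kappa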
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

(none)

Statistical Measure

McNemar's test of symmetry

Notes

McNemar's test of symmetry is appropriate for parallel measures from matched cases as well as
for repeated measures on a single set of cases.

Bowker, A. H., A test for symmetry in contingency tables. Journal of the American Statistical
Association 43 (1948): 572-574.

SPSS: NPAR
Home
Glossary
References
Selecting Statistics
About
Help

Do you want a statistic based on the number of cases in each category or on the
number of cases in the modal categories?

Number in Each Category

Number in Modal Categories


Home
Glossary
References
Selecting Statistics
About
Help

Do you want a statistic whose upper limit varies with the number of categories and
whose upper limit may be less than one?

Yes

No
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Contingency coefficient

Statistical Measure

Pearson chi-square

Notes

Contingency coefficient: Hays, W. L. Statistics for the Social Sciences. Second edition. New York:
Holt, Rinehart, and Winston, 1973. p. 745

Pearson chi-square: Hays, W. L. Statistics for the Social Sciences. Second edition. New York: Holt,
Rinehart, and Winston, 1973. p. 730

SPSS: CROSSTABS

SAS: FREQ
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Cramer's V

Statistical Measure

Pearson chi-square

Notes

Hays, W. L. Statistics for the Social Sciences. Second edition. New York: Holt, Rinehart, and
Winston, 1973. p. 745 (Hays called it Cramer's statistic); Srikantan, K. S. Canonical association
between nominal measurements. Journal of the American Statistical Association 65 (1970): 284-
292.

SPSS: CROSSTABS

SAS: FREQ
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Symmetric lambda

Statistical Measure

Refer critical ratio of symmetric lambda to a table of the unit normal curve

Notes

Symmetric lambda: Hays, W. L. Statistics for the Social Sciences. Second edition. New York: Holt,
Rinehart, and Winston, 1973. p. 749

Critical ratio: Goodman, L. A., and Kruskal, W. H. Measures of association for cross classifications
III: approximate sampling theory. Journal of the American Statistical Association 58 (1963): 310-
364. p. 321
Home
Glossary
References
Selecting Statistics
About
Help

Is a distinction made between a dependent and an independent variable?

Yes

No
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Somers' d

Statistical Measure

For N greater than 10, refer the critical ratio of S to a table of the unit normal curve; for N less
than or equal to 10, refer d to a table of critical values of S.

Notes

Somers' d: Somers, R. H. A new asymmetric measure of association for ordinal variables. American Sociological Review 27 (1962): 799-811.

Critical Ratio of S: Kendall, M. G. Rank Correlation Methods. Fourth edition. London: Griffin, 1970.
p.52

Standard Error of S assuming ties: Kendall, M. G. Rank Correlation Methods. Fourth edition.
London: Griffin, 1970. p. 55

Table of critical values of S assuming ties: Harshbarger, T. R. Introductory Statistics: A Decision Map. New York: Macmillan, 1971. p. 535

Somers' d:

SPSS: CROSSTABS

SAS: FREQ

Critical ratio of S:

SPSS: CROSSTABS, NONPAR CORR

SAS: FREQ
Home
Glossary
References
Selecting Statistics
About
Help

What do you want to measure?

Agreement

Covariation
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

(none)

Statistical Measure

(none)

Notes

The data may be transformed to ranks, and then r or Krippendorff's r may be used.


Home
Glossary
References
Selecting Statistics
About
Help

Do you want to treat the ranks of the ordered categories as interval scales?

Yes

No
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Spearman's rho

Statistical Measure

When N is 10 or larger, refer the critical ratio of rho to a table of the t distribution; for N less than 10, refer rho to a table of critical values of rho.

Notes

This is a biased estimator.

Spearman's rho: Siegel, S. Nonparametric Methods for the Behavioral Sciences. New York:
McGraw-Hill, 1956. p. 202

Critical ratio: Siegel, S. Nonparametric Methods for the Behavioral Sciences. New York: McGraw-
Hill, 1956. p. 212

Table of critical values of rho: Siegel, S. Nonparametric Methods for the Behavioral Sciences.
New York: McGraw-Hill, 1956. p. 284

SPSS: NONPAR CORR

SAS: FREQ
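
A minimal SciPy sketch with hypothetical ordered scores, purely for illustration (not part of the original advisor); for larger samples the p-value rests on the t approximation mentioned above.

    from scipy import stats

    x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    y = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9]

    rho, p = stats.spearmanr(x, y)
    print(rho, p)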
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Kendall's tau a, tau b or tau c,

Or Goodman And Kruskal's Gamma,

Or Kim's d

Statistical Measure

For N greater than 10 refer the critical ratio of S to a table of the unit normal curve; for N less
than or equal to 10, refer these statistics to a table of the critical values of S.

Notes

These statistics differ with respect to how they treat pairs of cases that fall in the same category
on one or both of the variables. Except in extreme cases (i.e., where any of the statistics equals
0 or 1) the absolute value of Gamma will be the highest of the five statistics, tau a will be the
smallest, and tau b, tau c, and Kim's d will be intermediate. This ordering is because Gamma
ignores all ties (when present in the data - as is usually the case), whereas the other four
statistics penalize for ties in the sense of reducing the absolute value of the statistic obtained.
Unlike tau b or Kim's d, tau c can attain plus or minus 1 even if the two variables do not have
the same number of categories. If there are no ties on either variable the five measures are
identical. See Goodman and Kruskal (1954), Kendall (1970), Kendall and Stuart (1961), Stuart
(1953), and Kim (1971).

Kendall's tau a: Kendall, M. G. Rank Correlation Methods. Fourth edition. London: Griffin, 1970.
p. 5

Standard Error of S assuming no ties: Kendall, M. G. Rank Correlation Methods. Fourth edition.
London: Griffin, 1970. p. 51

Table of critical values of S assuming no ties: Kendall, M. G. Rank Correlation Methods. Fourth
edition. London: Griffin, 1970. p. 173

Kendall's tau b: Kendall, M. G. Rank Correlation Methods. Fourth edition. London: Griffin, 1970.
p. 35

Kendall's tau c: Kendall, M. G. Rank Correlation Methods. Fourth edition. London: Griffin, 1970.
p. 47

Goodman and Kruskal's Gamma: Hays, W. L. Statistics for the Social Sciences. Second edition.
New York: Holt, Rinehart, and Winston, 1973. p. 800

Kim's d: Kim, J. Predictive measures of ordinal association. American Journal of Sociology 76 (1971): 891-907. p. 899

Kendall's tau a:

SPSS: NONPAR CORR

Kendall's tau b:

SPSS: CROSSTABS

SAS: FREQ, CORR

Kendall's tau c:

SPSS: CROSSTABS

SAS: FREQ (referred to as Stuart's tau c)

Goodman & Kruskal Gamma:

SPSS: CROSSTABS

SAS: FREQ
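
To illustrate how ties separate these statistics, here is a hypothetical example in Python (not part of the original advisor): SciPy's kendalltau returns tau b, and Gamma is computed directly from the concordant and discordant pairs, ignoring ties as described above. The data are invented.

    from scipy import stats

    x = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]   # ordered categories with ties
    y = [1, 2, 2, 3, 3, 3, 4, 5, 5, 5]

    tau_b, p = stats.kendalltau(x, y)    # tau b (penalizes ties)
    print(tau_b, p)

    # Gamma = (C - D) / (C + D), counting only untied pairs
    c = d = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                c += 1
            elif s < 0:
                d += 1
    print((c - d) / (c + d))             # usually larger in absolute value than tau b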
Home
Glossary
References
Selecting Statistics
About
Help

Is the ordinal variable a two-point variable?

Yes

No
Home
Glossary
References
Selecting Statistics
About
Help

Any two-point variable meets the criteria for an intervally-scaled variable. You will
be branched to the interval variable branch.

Continue
Home
Glossary
References
Selecting Statistics
About
Help

Do you want to treat the ordinal variable as if it were based on an underlying normally distributed variable?

Yes

No
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Jaspen's coefficient of multiserial correlation

Statistical Measure

Do Fisher's r to Z transformation and refer the critical ratio of Z to a table of the unit normal curve

Notes

This is a biased estimator. Jaspen's coefficient is the product moment correlation between the
interval variable and a transformation of the ordinal variable. The magnitude of this statistic is
sensitive to the assumption of normality.

Jaspen's coefficient: Freeman, L. C. Elementary Applied Statistics for Students in Behavioral Science. New York: Wiley, 1965. p. 131

Fisher's Z transformation: Hays, W. L. Statistics for the Social Sciences. Second edition. New York:
Holt, Rinehart, and Winston, 1973. p. 662; Harshbarger, T. R. Introductory Statistics: A Decision
Map. New York: Macmillan, 1971. p. 395
Home
Glossary
References
Selecting Statistics
About
Help

Do you want to treat the ordinal variable as if it were a monotonic transformation of an underlying interval variable?

Yes

No
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Mayer and Robinson's M

Statistical Measure

Do Fisher's r to Z transformation and refer critical ratio of Z to a table of the unit normal curve

Notes

This is a biased estimator.

Mayer and Robinson's M: Mayer, L. S., and Robinson, J. A. Measures of association for multiple
regression models with ordinal predictor variables. In Sociological Methodology 1978, edited by
K. F. Schuessler. San Francisco: Jossey-Bass, 1977.

Fisher's Z transformation: Mayer, L. S., and Robinson, J. A. Measures of association for multiple
regression models with ordinal predictor variables. In Sociological Methodology 1978, edited by
K. F. Schuessler. San Francisco: Jossey-Bass, 1977; Hays, W. L. Statistics for the Social Sciences.
Second edition. New York: Holt, Rinehart, and Winston, 1973. p. 662
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Sorry, there is no known analysis for this case

Statistical Measure

(none)

Notes
Home
Glossary
References
Selecting Statistics
About
Help

Is the interval variable dependent?

Yes

No
Home
Glossary
References
Selecting Statistics
About
Help

Do you want a measure of the strength of relationship between the variables or a test of the statistical significance of differences between groups?

Measure of Strength

Test of Significance
Home
Glossary
References
Selecting Statistics
About
Help

Do you want to describe the relationship in your data or to estimate it in the population which you have sampled?

Describe

Estimate
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Eta-Squared

Statistical Measure

F test

Notes

If the nominal variable is a two-point scale, the t-test is an alternative (because in such cases, F
= t-squared).

Eta-squared: Hays, W. L. Statistics for the Social Sciences. Second edition. New York: Holt,
Rinehart, and Winston, 1973. p. 683

SPSS: BREAKDOWN, ANOVA

SAS: GLM, ANOVA
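
An illustrative Python sketch (not part of the original advisor) computing eta-squared from the between-groups and total sums of squares for hypothetical groups, with the F test from SciPy.

    import numpy as np
    from scipy import stats

    groups = [np.array([4.0, 5.5, 6.1, 5.0]),      # hypothetical scores by category
              np.array([7.2, 8.0, 6.9, 7.7]),
              np.array([5.9, 6.4, 7.1, 6.6])]

    grand_mean = np.concatenate(groups).mean()
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_total = ((np.concatenate(groups) - grand_mean) ** 2).sum()
    eta_squared = ss_between / ss_total

    f, p = stats.f_oneway(*groups)                 # one-way ANOVA F test
    print(eta_squared, f, p)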


Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

• Omega-Squared

• Intraclass Correlation Coefficient

• Kelley's epsilon

Statistical Measure

F test

Notes

If the nominal variable is a two-point scale, the t-test is an alternative (because in such cases, F
= t-squared).

Omega-squared: Hays, W. L. Statistics for the Social Sciences. Second edition. New York: Holt,
Rinehart, and Winston, 1973. p. 484

Intraclass correlation: Hays, W. L. Statistics for the Social Sciences. Second edition. New York:
Holt, Rinehart, and Winston, 1973. p. 535

Kelley's epsilon: Kelley, T. L. An unbiased correlation ratio measure. Proceedings of the National
Academy of Sciences 21 (1935): 554-559.; Glass, G. V., and Hakstian, A. R. Measures of
association in comparative experiments: their development and interpretation. American
Educational Research Journal 6 (1969): 403-414.

F-test for all of the above: Hays, W. L. Statistics for the Social Sciences. Second edition. New
York: Holt, Rinehart, and Winston, 1973. p. 471

Omega-squared: SAS: GLM, ANOVA

Intraclass correlation: SAS: GLM, ANOVA

F-test for omega squared, intraclass correlation and Kelley's epsilon: SPSS: BREAKDOWN, ANOVA

SAS: GLM, ANOVA


Home
Glossary
References
Selecting Statistics
About
Help

Are you willing to assume that the intervally scaled variable is normally distributed
in the population?

Yes

No
Home
Glossary
References
Selecting Statistics
About
Help

Do you want to assume homoscedasticity across levels of the independent variable?

Yes

No
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Analysis of Variance

Statistical Measure

F test

Notes

Analysis of Variance: Hays, W. L. Statistics for the Social Sciences. Second edition. New York:
Holt, Rinehart, and Winston, 1973. p. 457

F-test for the Analysis of Variance: Hays, W. L. Statistics for the Social Sciences. Second edition. New York: Holt, Rinehart, and Winston, 1973. p. 471

SPSS: ANOVA, ONEWAY, BREAKDOWN, MANOVA

SAS: GLM, ANOVA


Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Analysis of Variance

Statistical Measure

• Welch statistic

• Brown-Forsythe statistic

• t-test

Notes

Analysis of Variance: Hays, W. L. Statistics for the Social Sciences. Second edition. New York:
Holt, Rinehart, and Winston, 1973. p. 457

Welch statistic & Brown-Forsythe: Brown, M. B., and Forsythe, A. B. The small sample behavior of
some statistics which test the equality of several means. Technometrics 16 (1974a): 129-132.

t-test: Hays, W. L. Statistics for the Social Sciences. Second edition. New York: Holt, Rinehart, and
Winston, 1973. p. 404, 410

Analysis of Variance:

SPSS: ANOVA, ONEWAY, BREAKDOWN, MANOVA

SAS: GLM, ANOVA

t-test:

SPSS: T-TEST

SAS: T-TEST
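
For two groups, the Welch procedure is available directly in SciPy as the unequal-variance t-test; the data below are hypothetical and the example is illustrative only, not part of the original advisor.

    from scipy import stats

    group_a = [10.1, 12.4, 11.8, 13.0, 9.7, 12.2]
    group_b = [14.9, 15.6, 20.3, 13.1, 22.4, 16.8]   # visibly larger variance

    # Welch's t-test does not assume equal variances across groups
    print(stats.ttest_ind(group_a, group_b, equal_var=False))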
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Analysis of Variance

Statistical Measure

Bartlett's test

Notes

Analysis of Variance: Hays, W. L. Statistics for the Social Sciences. Second edition. New York:
Holt, Rinehart, and Winston, 1973. p. 457

Bartlett's Test: Kirk, R. E. Experimental Design: Procedures for the Behavioral Sciences. Belmont,
California: Brooks/Cole, 1968. p. 61

SPSS: ANOVA, ONEWAY, BREAKDOWN, MANOVA

SAS: GLM, ANOVA


Home
Glossary
References
Selecting Statistics
About
Help

Is the nominal variable a two-point variable?

Yes

No
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Sorry, there is no known analysis for this case.

Statistical Measure

(none)

Notes
Home
Glossary
References
Selecting Statistics
About
Help

Is a distinction made between a dependent and an independent variable?

Yes

No
Home
Glossary
References
Selecting Statistics
About
Help

Is the ordinal variable dependent?

Yes

No
Home
Glossary
References
Selecting Statistics
About
Help

Is the nominal variable two-point?

Yes

No
Home
Glossary
References
Selecting Statistics
About
Help

Are the cases (e.g., people) in one category of the nominal variable matched to
the cases in the other category of that variable?

Yes

No
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

(none)

Statistical Measure

Sign test

Wilcoxon signed-rank test

Notes

Sign test: Siegel, S. Nonparametric Methods for the Behavioral Sciences. New York: McGraw-Hill,
1956. p. 68

Wilcoxon: Siegel, S. Nonparametric Methods for the Behavioral Sciences. New York: McGraw-
Hill, 1956. p. 75

Sign test:

SPSS: NPAR

SAS: MRANK

Wilcoxon:

SPSS: NPAR

SAS: UNIVARIATE
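
An illustrative sketch with hypothetical matched scores (not part of the original advisor): SciPy provides the Wilcoxon signed-rank test, and the sign test is computed by referring the count of positive differences to the binomial distribution with p = 0.5.

    from scipy import stats

    before = [3, 5, 2, 6, 4, 7, 5, 6, 3, 4]
    after  = [4, 6, 2, 8, 6, 7, 7, 8, 5, 6]

    # Wilcoxon signed-rank test on the paired differences (zero differences dropped)
    print(stats.wilcoxon(before, after))

    # Sign test: two-tailed binomial probability of the observed split of signs
    diffs = [a - b for a, b in zip(after, before) if a != b]
    n_pos, n = sum(d > 0 for d in diffs), len(diffs)
    p = 2 * stats.binom.cdf(min(n_pos, n - n_pos), n, 0.5)
    print(min(p, 1.0))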
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Somers' d

Statistical Measure

If N > 10, refer the critical ratio of S to a table of the unit normal curve; if N <= 10, refer d to a table of critical values of S;

also: • Median test; • Mann-Whitney U test; • Kolmogorov-Smirnov two-sample test; • runs test

Notes

Somers, R. H. A new asymmetric measure of association for ordinal variables. American Sociological Review 27 (1962): 799-811.

Critical ratio of S: Kendall, M. G. Rank Correlation Methods. Fourth edition. London: Griffin, 1970.
p. 52

Standard error of S assuming ties: Kendall, M. G. Rank Correlation Methods. Fourth edition.
London: Griffin, 1970. p. 55

Table of critical values of S assuming ties: Harshbarger, T. R. Introductory Statistics: A Decision Map. New York: Macmillan, 1971. p. 535

Median test: Siegel, S. Nonparametric Methods for the Behavioral Sciences. New York: McGraw-
Hill, 1956. p. 111

Mann-Whitney U: Siegel, S. Nonparametric Methods for the Behavioral Sciences. New York:
McGraw-Hill, 1956. p. 116

Kolmogorov-Smirnov: Siegel, S. Nonparametric Methods for the Behavioral Sciences. New York:
McGraw-Hill, 1956. p. 127

runs test: Siegel, S. Nonparametric Methods for the Behavioral Sciences. New York: McGraw-Hill,
1956. p. 136

Somers' d and Critical Ratio of S:


SPSS: CROSSTABS

SAS: FREQ

Median test:

SPSS: NPAR

SAS: NPAR1WAY, MRANK

Mann-Whitney U:

SPSS: NPAR

SAS: NPAR1WAY, MRANK

Kolmogorov-Smirnov:

SPSS: NPAR

Runs test:

SPSS: NPAR (in SPSS, this test is called Wald-Wolfowitz)


Home
Glossary
References
Selecting Statistics
About
Help

Are the cases (e.g., people) in one category of the nominal variable matched to
the cases in each of the other categories of that variable?

Yes

No
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

(none)

Statistical Measure

Friedman test

Notes

Hays, W. L. Statistics for the Social Sciences. Second edition. New York: Holt, Rinehart, and
Winston, 1973. p. 785

SPSS: NPAR, RELIABILITY

SAS: RANK, MRANK
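
A minimal SciPy sketch with hypothetical ratings of the same cases under three matched conditions; illustrative only and not part of the original advisor.

    from scipy import stats

    cond_1 = [1, 2, 1, 1, 3, 2, 1, 2]
    cond_2 = [2, 3, 3, 2, 2, 3, 2, 3]
    cond_3 = [3, 1, 2, 3, 1, 1, 3, 1]

    print(stats.friedmanchisquare(cond_1, cond_2, cond_3))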


Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Freeman's coefficient of differentiation

Statistical Measure

• Kruskal-Wallis test

• Median test (for more than 2 groups)

Notes

Freeman's coefficient: Freeman, L. C. Elementary Applied Statistics for Students in Behavioral Science. New York: Wiley, 1965. p. 112

Kruskal-Wallis test: Siegel, S. Nonparametric Methods for the Behavioral Sciences. New York:
McGraw-Hill, 1956. p. 184

Median test: Siegel, S. Nonparametric Methods for the Behavioral Sciences. New York: McGraw-
Hill, 1956. p. 179

SPSS: NPAR

SAS: NPAR1WAY, MRANK


Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Sorry, there is no known analysis for this case.

Statistical Measure

(none)

Notes
Home
Glossary
References
Selecting Statistics
About
Help

Is the nominal variable two-point?

Yes

No
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Sorry, there is no known analysis for this case.

Statistical Measure

(none)

Notes
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Sorry, there is no known analysis for this case.

Statistical Measure

(none)

Notes
Home
Glossary
References
Selecting Statistics
About
Help

Is a distinction made between dependent and independent variables?

Yes

No
Home
Glossary
References
Selecting Statistics
About
Help

Is there more than one dependent variable?

Yes

No
Home
Glossary
References
Selecting Statistics
About
Help

Is there more than one independent variable?

Yes

No
Home
Glossary
References
Selecting Statistics
About
Help

Do you want to treat the relationships among the variables as additive?

Yes

No
Home
Glossary
References
Selecting Statistics
About
Help

Do you want to treat all the dependent and independent variables as interval?

Yes

No
Home
Glossary
References
Selecting Statistics
About
Help

Do you want to treat all the relationships as linear?

Yes

No
Home
Glossary
References
Selecting Statistics
About
Help

Does the analysis include at least one intervening variable?

Yes

No
Home
Glossary
References
Selecting Statistics
About
Help

Does your analysis include at least one latent (i.e., unmeasured) variable?

Yes

No
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Structural models with latent variables

Statistical Measure

(none)

Notes

Joreskog, K. G., and Sorbom, D. LISREL: Analysis of Linear Structural Relationships by the Method
of Maximum Likelihood. Version IV. User's Guide, Chicago: National Educational Resources,
1978.
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Path analysis

Statistical Measure

(none)

Notes

Kerlinger, F. N., and Pedhazur, E. J. Multiple Regression in Behavioral Research. New York: Holt,
Rinehart and Winston, 1973 p. 305

SPSS: REGRESSION

SAS: SYSREG
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Canonical correlation

Statistical Measure

• Wilks' lambda

• Roy's greatest root criterion

• Pillai-Bartlett V

Notes

Canonical correlation: Cooley, W. W., and Lohnes, P. R. Multivariate Data Analysis. New York:
Wiley,1971. p. 168; Harris, R. J. A Primer of Multivariate Statistics. New York: Academic Press,
1975. p. 132

Wilks' lambda: Cooley, W. W., and Lohnes, P. R. Multivariate Data Analysis. New York:
Wiley,1971. p. 175; Morrison, D.F. Multivariate Statistical Methods. Second edition. New York:
McGraw-Hill, 1976.p. 222; Harris, R. J. A Primer of Multivariate Statistics. New York: Academic
Press, 1975. p. 143

Roy's greatest root: Morrison, D.F. Multivariate Statistical Methods. Second edition. New York:
McGraw-Hill, 1976.p. 178; Harris, R. J. A Primer of Multivariate Statistics. New York: Academic
Press, 1975. p. 143

Pillai-Bartlett: Morrison, D.F. Multivariate Statistical Methods. Second edition. New York: McGraw-
Hill, 1976.p. 223

SPSS: CANCORR

SAS: CANCORR
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Sorry, there is no known analysis for this case.

Statistical Measure

(none)

Notes
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Sorry, there is no known analysis for this case.

Statistical Measure

(none)

Notes
Home
Glossary
References
Selecting Statistics
About
Help

Do you want to treat all the dependent variables as interval?

Yes

No
Home
Glossary
References
Selecting Statistics
About
Help

Do you want to explore the relationships among a set of variables in two or more
groups simultaneously or do you want to compare the similarity of the patterns of
the relationships among a set of variables either (a) across two or more groups or
(b) with a prespecified pattern?

Explore Relationships

Compare Patterns
Home
Glossary
References
Selecting Statistics
About
Help

Do you want to treat the variables as measured on interval scales and the
relationships among them as linear?

Yes

No
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Three-mode factor analysis

Statistical Measure

(none)

Notes

Gorsuch, R. L. Factor Analysis. Philadelphia: W. B. Saunders, 1974. p. 283


Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Three-way non-metric multidimensional scaling techniques

Statistical Measure

(none)

Notes

Kruskal, J. B., and Wish, M. Multidimensional Scaling. Beverly Hills, California: Sage, 1978. p. 60
(General)

Carroll, J. D., and Chang, J. J. Analysis of individual differences in multidimensional scaling via
an N-way generalization of Eckart-Young decomposition. Psychometrika 35 (1970): 283-319.
(INDSCAL)

Harshman, R. A. Foundations of the PARAFAC procedure: models and conditions for an 'explanatory' multi-modal factor analysis. Working papers in phonetics 16. Los Angeles: University of California at Los Angeles, 1970. (PARAFAC)

Lingoes, J. C., and Borg, I. Procrustean individual difference scaling. Journal of Marketing
Research 13 (1976): 406-407. (PINDIS)

Carroll, J. D.; Pruzansky, S.; and Kruskal, J. B. CANDELINC: a general approach to multidimensional analysis of many-way arrays with linear constraints on parameters. Psychometrika 45 (1980): 3-24. (CANDELINC)

Ramsay, J. O. Maximum likelihood estimation in multidimensional scaling. Psychometrika 42 (1977): 241-266. (MULTISCAL)

Takane, Y.; Young, F. W.; and DeLeeuw, J. Nonmetric individual differences multidimensional
scaling: an alternating least squares method with optimal scaling features. Psychometrika 42
(1977): 7-67. (ALSCAL)

Sands, R., and Young, F. W. Component models for three-way data: an alternating least
squares algorithm with optimal scaling features. Psychometrika 45 (1980): 39-68. (ALSCOMP3)

SAS: ALSCAL
Home
Glossary
References
Selecting Statistics
About
Help

Do you want to preserve the metric units in which the variables were measured or
to standardize them by the observed variance of each?

Standardize

Original Metric
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Confirmatory factor analysis of standardized variance-covariance matrices

Statistical Measure

Maximum likelihood chi-square

Notes

Confirmatory factor analysis: Gorsuch, R. L. Factor Analysis. Philadelphia: W. B. Saunders, 1974. pp. 116, 251 (General)

Joreskog, K. G., and Sorbom, D. LISREL: Analysis of Linear Structural Relationships by the Method
of Maximum Likelihood. Version IV. User's Guide, Chicago: National Educational Resources,
1978. (COFAMM)

Maximum likelihood chi-square: Gorsuch, R. L. Factor Analysis. Philadelphia: W. B. Saunders, 1974. pp. 118, 139; Joreskog, K. G., and Sorbom, D. LISREL: Analysis of Linear Structural Relationships by the Method of Maximum Likelihood. Version IV. User's Guide, Chicago: National Educational Resources, 1978. (COFAMM)

SAS: FACTOR
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Confirmatory factor analysis of variance-covariance matrices

Statistical Measure

Maximum likelihood chi-square

Notes

Confirmatory factor analysis: Gorsuch, R. L. Factor Analysis. Philadelphia: W. B. Saunders, 1974. pp. 116, 251 (General)

Joreskog, K. G., and Sorbom, D. LISREL: Analysis of Linear Structural Relationships by the Method
of Maximum Likelihood. Version IV. User's Guide, Chicago: National Educational Resources,
1978. (COFAMM)

Maximum likelihood chi-square: Gorsuch, R. L. Factor Analysis. Philadelphia: W. B. Saunders, 1974. pp. 118, 139; Joreskog, K. G., and Sorbom, D. LISREL: Analysis of Linear Structural Relationships by the Method of Maximum Likelihood. Version IV. User's Guide, Chicago: National Educational Resources, 1978. (COFAMM)

SAS: FACTOR
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Sorry, there is no known analysis for this case.

Statistical Measure

(none)

Notes
Home
Glossary
References
Selecting Statistics
About
Help

Do you want to treat the independent variable as nominally scaled and all of the
dependent variables as intervally scaled?

Yes

No
Home
Glossary
References
Selecting Statistics
About
Help

Do you want to test only whether the vectors of means are equal for all categories
of the independent variables?

Yes

No
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Multivariate analysis of variance

Statistical Measure

• Wilks' lambda

• Roy's greatest root criterion

• Pillai-Bartlett V

Notes

Multivariate Analysis of variance: Cooley, W. W., and Lohnes, P. R. Multivariate Data Analysis.
New York: Wiley,1971. p. 223; Harris, R. J. A Primer of Multivariate Statistics. New York: Academic
Press, 1975. p. 101; Bock, R. D., and Haggard, E. A. The use of multivariate analysis of variance in
behavioral research. In Handbook of Measurement and Assessment in Behavioral Sciences,
edited by D. K. Whitla. Reading, Massachusetts: Addison-Wesley, 1968.

Wilks' lambda: Cooley, W. W., and Lohnes, P. R. Multivariate Data Analysis. New York:
Wiley,1971. p. 175; Morrison, D.F. Multivariate Statistical Methods. Second edition. New York:
McGraw-Hill, 1976.p. 222; Harris, R. J. A Primer of Multivariate Statistics. New York: Academic
Press, 1975. p. 109; Olson, C. L. On choosing a test statistic in multivariate analysis of variance.
Psychological Bulletin 83 (1976): 579-586.

Roy's greatest root: Morrison, D.F. Multivariate Statistical Methods. Second edition. New York:
McGraw-Hill, 1976.p. 178; Harris, R. J. A Primer of Multivariate Statistics. New York: Academic
Press, 1975. p. 103, 109; Olson, C. L. On choosing a test statistic in multivariate analysis of
variance. Psychological Bulletin 83 (1976): 579-586.

Pillai-Bartlett: Morrison, D.F. Multivariate Statistical Methods. Second edition. New York: McGraw-
Hill, 1976.p. 223; Olson, C. L. On choosing a test statistic in multivariate analysis of variance.
Psychological Bulletin 83 (1976): 579-586.

SPSS: MANOVA

SAS: GLM, ANOVA


Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Profile analysis

Statistical Measure

• Wilks' lambda

• Roy's greatest root criterion

• Pillai-Bartlett V

Notes

Profile analysis: Morrison, D.F. Multivariate Statistical Methods. Second edition. New York:
McGraw-Hill, 1976.pp. 153, 205

Wilks' lambda: Morrison, D.F. Multivariate Statistical Methods. Second edition. New York:
McGraw-Hill, 1976.p. 222

Roy's greatest root: Morrison, D.F. Multivariate Statistical Methods. Second edition. New York:
McGraw-Hill, 1976.p. 178

Pillai-Bartlett: Morrison, D.F. Multivariate Statistical Methods. Second edition. New York: McGraw-
Hill, 1976.p. 223

SPSS: MANOVA

SAS: GLM, ANOVA


Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Sorry, there is no known analysis for this case.

Statistical Measure

(none)

Notes
Home
Glossary
References
Selecting Statistics
About
Help

Do you want to statistically remove the linear effects of one or more covariates
from the dependent variable?

Yes

No
Home
Glossary
References
Selecting Statistics
About
Help

Do you want to treat the relationships involving the covariate(s) as additive?

Yes

No
Home
Glossary
References
Selecting Statistics
About
Help

Do you want to treat the dependent variable and the covariate(s) as interval and
the independent variable(s) as nominal?

Yes

No
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Covariance Analysis

Statistical Measure

F test

Notes

Covariance analysis: Snedecor, G. W., and Cochran, W. G. Statistical Methods. Sixth edition. Ames, Iowa: The Iowa State University Press, 1967. p. 419

F test: Snedecor, G. W., and Cochran, W. G. Statistical Methods. Sixth edition. Ames, Iowa: The Iowa State University Press, 1967. p. 179

SPSS: ANOVA, MANOVA

SAS: GLM
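
As an illustrative sketch (not part of the original advisor), an additive covariance analysis can be fit in Python with the statsmodels formula interface; the data frame, its column names, and the values are all hypothetical.

    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    # Hypothetical data: interval outcome y, nominal group, interval covariate x
    df = pd.DataFrame({
        "y":     [10.2, 11.5, 9.8, 13.1, 14.0, 12.6, 15.2, 16.1, 14.9],
        "group": ["a", "a", "a", "b", "b", "b", "c", "c", "c"],
        "x":     [3.0, 3.5, 2.8, 3.2, 3.9, 3.1, 3.4, 4.0, 3.3],
    })

    model = smf.ols("y ~ C(group) + x", data=df).fit()   # additive covariance model
    print(sm.stats.anova_lm(model, typ=2))               # F tests for group and covariate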
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Sorry, there is no known analysis for this case.

Statistical Measure

(none)

Notes
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Sorry, there is no known analysis for this case.

Statistical Measure

(none)

Notes
Home
Glossary
References
Selecting Statistics
About
Help

Do you want to treat the relationships involving the variables as additive?

Yes

No
Home
Glossary
References
Selecting Statistics
About
Help

How do you want to treat the dependent variable with respect to scale of
measurement?

Ordinal

Interval

Nominal
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Sorry, there is no known analysis for this case.

Statistical Measure

(none)

Notes
Home
Glossary
References
Selecting Statistics
About
Help

Do you want to treat all the independent variables as interval?

Yes

No
Home
Glossary
References
Selecting Statistics
About
Help

Do you want to treat all the relationships as linear?

Yes

No
Home
Glossary
References
Selecting Statistics
About
Help

Does the analysis include at least one intervening variable?

Yes

No
Home
Glossary
References
Selecting Statistics
About
Help

Does the analysis include at least one latent (i.e., unmeasured) variable?

Yes

No
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Structural models with latent variables

Statistical Measure

(none)

Notes

Joreskog, K. G., and Sorbom, D. LISREL: Analysis of Linear Structural Relationships by the Method
of Maximum Likelihood. Version IV. User's Guide, Chicago: National Educational Resources,
1978.
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Path analysis

Statistical Measure

(none)

Notes

Kerlinger, F. N., and Pedhazur, E. J. Multiple Regression in Behavioral Research. New York: Holt,
Rinehart and Winston, 1973 p. 305

SPSS: REGRESSION

SAS: SYSREG
Home
Glossary
References
Selecting Statistics
About
Help

Do you want a single measure of the relationship between the dependent


variable and all the independent variables taken together?

Yes

No
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Multiple correlation (multiple regression)

Statistical Measure

F test

Notes

Multiple correlation: Hays, W. L. Statistics for the Social Sciences. Second edition. New York: Holt,
Rinehart, and Winston, 1973. p. 707

F test: Hays, W. L. Statistics for the Social Sciences. Second edition. New York: Holt, Rinehart, and
Winston, 1973. p. 709

SPSS: REGRESSION

SAS: GLM, REG
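
An illustrative statsmodels sketch (not part of the original advisor) of the squared multiple correlation and its overall F test; the variables x1, x2, and y are hypothetical.

    import numpy as np
    import statsmodels.api as sm

    x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
    x2 = np.array([2.1, 1.8, 3.3, 2.9, 4.6, 4.1, 5.8, 5.2])
    y  = np.array([3.2, 4.1, 6.8, 7.0, 9.9, 9.1, 12.4, 11.8])

    fit = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
    print(fit.rsquared)               # squared multiple correlation
    print(fit.fvalue, fit.f_pvalue)   # overall F test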


Home
Glossary
References
Selecting Statistics
About
Help

Do you want a statistic which assigns to each independent variable some of the
explainable variance in the dependent variable which that independent variable
shares with other independent variables?

Yes

No
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Regression coefficient (b or beta)

Statistical Measure

F test (F equals t-squared)

Notes

Regression coefficient: Hays, W. L. Statistics for the Social Sciences. Second edition. New York:
Holt, Rinehart, and Winston, 1973. pp. 704, 708; Kerlinger, F. N., and Pedhazur, E. J. Multiple
Regression in Behavioral Research. New York: Holt, Rinehart and Winston, 1973 pp. 56, 61

F test: Kerlinger, F. N., and Pedhazur, E. J. Multiple Regression in Behavioral Research. New York:
Holt, Rinehart and Winston, 1973 p. 66

SPSS: REGRESSION

SAS: GLM, REG


Home
Glossary
References
Selecting Statistics
About
Help

Do you want a statistic that measures the additional proportion of the total variance in the dependent variable explainable by each independent variable, over and above what the other independent variables can explain?

Yes

No
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Part correlation

Statistical Measure

F test (F equals t-squared)

Notes

Part correlation: McNemar, Q. Psychological Statistics. Fourth edition. New York: Wiley 1969. p.
185

F test: McNemar, Q. Psychological Statistics. Fourth edition. New York: Wiley 1969. p. 321

SPSS: REGRESSION
Home
Glossary
References
Selecting Statistics
About
Help

Do you want a statistic that measures the additional proportion of the total variance in the dependent variable explainable by each independent variable, over and above what the other independent variables can explain, expressed relative to the proportion of variance in the dependent variable unexplainable by the other independent variables?

Yes

No
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Partial correlation

Statistical Measure

Do Fisher's r to Z transformation and refer critical ratio of Z to a table of the unit normal curve

F test (F equals t-squared)

Notes

Partial correlation: McNemar, Q. Psychological Statistics. Fourth edition. New York: Wiley 1969. p.
183

Fisher's Z transformation: McNemar, Q. Psychological Statistics. Fourth edition. New York: Wiley
1969. p. 185

F test: McNemar, Q. Psychological Statistics. Fourth edition. New York: Wiley 1969. p. 185

SPSS: PARTIAL CORR, REGRESSION

SAS: GLM, REG


Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Sorry, there is no known analysis for this case.

Statistical Measure

(none)

Notes
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Multiple curvilinear regression

Statistical Measure

(none)

Notes

Neter, J., and Wasserman, W. Applied Linear Statistical Models. Homewood, Illinois: Richard D. Irwin, 1974. p. 273

SPSS: REGRESSION, MANOVA

SAS: GLM
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Dummy variable regression or multiple classification analysis

Statistical Measure

(none)

Notes

Draper, N. R., and Smith, H. Applied Regression Analysis. New York: Wiley, 1966. p. 134

Andrews, F. M., and Messenger, R. C. Multivariate Nominal Scale Analysis. Ann Arbor: Institute
for Social Research, The University of Michigan, 1973.

Kerlinger, F. N., and Pedhazur, E. J. Multiple Regression in Behavioral Research. New York: Holt,
Rinehart and Winston, 1973 p. 101

SPSS: REGRESSION, ANOVA

SAS: GLM
Home
Glossary
References
Selecting Statistics
About
Help

Do you want to treat all the independent variables as interval?

Yes

No
Home
Glossary
References
Selecting Statistics
About
Help

Do you want to treat the relationships among the independent variables as linear?

Yes

No
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Multiple discriminant function

Statistical Measure

• Wilks' lambda

• Roy's greatest root criterion

• Pillai-Bartlett V

Notes

Multiple discriminant function: Cooley, W. W., and Lohnes, P. R. Multivariate Data Analysis. New
York: Wiley,1971. p. 243

Wilks' lambda: Cooley, W. W., and Lohnes, P. R. Multivariate Data Analysis. New York:
Wiley,1971. p. 248

Roy's greatest root: Morrison, D.F. Multivariate Statistical Methods. Second edition. New York:
McGraw-Hill, 1976.p. 178; Harris, R. J. A Primer of Multivariate Statistics. New York: Academic
Press, 1975. p. 103, 109

Pillai-Bartlett: Morrison, D.F. Multivariate Statistical Methods. Second edition. New York: McGraw-
Hill, 1976.p. 223

SPSS: DISCRIMINANT (doesn't include Roy's greatest root or Pillai-Bartlett)

SAS: CANDISC, DISCRIM (doesn't include any of the statistical tests)


Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Sorry, there is no known analysis for this case.

Statistical Measure

(none)

Notes
Home
Glossary
References
Selecting Statistics
About
Help

Is the dependent variable two-point?

Yes

No
Home
Glossary
References
Selecting Statistics
About
Help

Is there a very high proportion in one category of the dependent variable (e.g.,
90%)?

Yes

No
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Dummy variable regression using weighted least squares or maximum likelihood, usually on a
transformed dependent variable (e.g., on logits)

Statistical Measure

(none)

Notes

Draper, N. R., and Smith, H. Applied Regression Analysis. New York: Wiley, 1966. p. 77, 134
(Weighted least squares-General)

DuMouchel, W. H. The regression of a dichotomous variable. Unpublished. Survey Research Center Computer Support Group, Institute for Social Research, University of Michigan, 1974; DuMouchel, W. H. On the analogy between linear and log-linear regression. Technical Report No. 67. Unpublished. Department of Statistics, University of Michigan, March 1976. (Maximum likelihood-DREG)

DuMouchel, W. H. On the analogy between linear and log-linear regression. Technical Report
No. 67. Unpublished. Department of Statistics, University of Michigan, March 1976. (GENCAT)

SAS: FUNCAT
Home
Glossary
References
Selecting Statistics
About
Help

Do you want to assume homoscedasticity?

Yes

No
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Dummy variable regression or multiple classification analysis

Statistical Measure

(none)

Notes

Draper, N. R., and Smith, H. Applied Regression Analysis. New York: Wiley, 1966. p. 134

Andrews, F. M., and Messenger, R. C. Multivariate Nominal Scale Analysis. Ann Arbor: Institute
for Social Research, The University of Michigan, 1973.

Kerlinger, F. N., and Pedhazur, E. J. Multiple Regression in Behavioral Research. New York: Holt,
Rinehart and Winston, 1973 p. 101

SPSS: REGRESSION, ANOVA

SAS: GLM
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Dummy variable regression using weighted least squares or maximum likelihood, usually on a
transformed dependent variable (e.g., on logits)

Statistical Measure

(none)

Notes

Draper, N. R., and Smith, H. Applied Regression Analysis. New York: Wiley, 1966. p. 77, 134
(Weighted least squares-General)

DuMouchel, W. H. The regression of a dichotomous variable. Unpublished. Survey Research Center Computer Support Group, Institute for Social Research, University of Michigan, 1974; DuMouchel, W. H. On the analogy between linear and log-linear regression. Technical Report No. 67. Unpublished. Department of Statistics, University of Michigan, March 1976. (Maximum likelihood-DREG)

DuMouchel, W. H. On the analogy between linear and log-linear regression. Technical Report
No. 67. Unpublished. Department of Statistics, University of Michigan, March 1976. (GENCAT)

SAS: FUNCAT
Home
Glossary
References
Selecting Statistics
About
Help

Do you want to treat all of the independent variables as nominal?

Yes

No
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Multidimensional contingency table analysis

Statistical Measure

Chi-square tests

Notes

Andrews, F. M., and Messenger, R. C. Multivariate Nominal Scale Analysis. Ann Arbor: Institute
for Social Research, The University of Michigan, 1973. (MNA)

Statistics Department, University of Chicago. ECTA program: description for users. Mimeographed paper, 1973. (ECTA)

Landis, J. R.; Stanish, W. M.; Freeman, J. L.; and Koch, G. G. A computer program for the generalized chi-square analysis of categorical data using weighted least squares (GENCAT). Computer Programs in Biomedicine 6 (1976): 196-231. (GENCAT)

Feinberg, S. E. The Analysis of Cross-Classified Data. Cambridge, Massachusetts: The MIT Press,
1977. (General)

Chi-square tests: Feinberg, S. E. The Analysis of Cross-Classified Data. Cambridge, Massachusetts: The MIT Press, 1977. p. 36 (Pearson and maximum likelihood)

SAS: FUNCAT
Home
Glossary
References
Selecting Statistics
About
Help

Do you want to do an empirical search for strong relationships or to test a set of prespecified relationships?

Search

Test
Home
Glossary
References
Selecting Statistics
About
Help

How do you want to treat the variables with respect to scale of measurement?

Dependent: Nominal or Interval

Independent: Nominal or Ordinal

Other
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Binary segmentation techniques

Statistical Measure

Notes

Sonquist, J. A.; Baker, E. L.; and Morgan, J. N. Searching for Structure. Revised edition. Ann
Arbor: Institute for Social Research, The University of Michigan, 1974. (SEARCH, formerly known
as AID)
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Sorry, there is no known analysis for this case.

Statistical Measure

(none)

Notes
Home
Glossary
References
Selecting Statistics
About
Help

Do you want to treat the dependent variable as ordinal?

Yes

No
Home
Glossary
References
Selecting Statistics
About
Help

Do you want to treat all the independent variables as nominal?

Yes

No
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Multidimensional contingency table analysis based on the cumulative logistic distribution

Statistical Measure

Chi-square tests

Notes

Multidimensional contingency table analysis:

Bock, R. D. Multivariate Statistical Methods in Behavioral Research. New York: McGraw-Hill, 1975.
p. 541 (General)

Bock, R. D., and Yates, G. MULTIQUAL: Log-Linear Analysis of Nominal or Ordinal Qualitative Data by the Method of Maximum Likelihood. User's Guide. Chicago: National Educational Resources, 1973. (MULTIQUAL)

Chi-square tests:

Bock, R. D. Multivariate Statistical Methods in Behavioral Research. New York: McGraw-Hill, 1975.
p. 518 (Pearson and maximum likelihood)
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Sorry, there is no known analysis for this case.

Statistical Measure

(none)

Notes
Home
Glossary
References
Selecting Statistics
About
Help

Do you want to treat any of the independent variables as ordinal?

Yes

No
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Sorry, there is no known analysis for this case.

Statistical Measure

(none)

Notes
Home
Glossary
References
Selecting Statistics
About
Help

Do you want to treat the dependent variable as interval and all the independent
variables as nominal and do you want to assume homoscedasticity?

Yes

No
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Analysis of variance

Statistical Measure

F test

Notes

Analysis of variance: McNemar, Q. Psychological Statistics. Fourth edition. New York: Wiley 1969.
p. 325

F test: McNemar, Q. Psychological Statistics. Fourth edition. New York: Wiley 1969. p. 349

SPSS: ANOVA, MANOVA

SAS: GLM, ANOVA


Home
Glossary
References
Selecting Statistics
About
Help

Do you want to treat all of the variables as nominal?

Yes

No
Home
Glossary
References
Selecting Statistics
About
Help

Do you want to do a hierarchical analysis?

Yes

No
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Multidimensional contingency table analysis

Statistical Measure

Chi-square tests

Notes

Multidimensional Contingency Table Analysis:

Statistics Department, University of Chicago. ECTA program: description for users. Mimeographed paper, 1973. (ECTA)

Feinberg, S. E. The Analysis of Cross-Classified Data. Cambridge, Massachusetts: The MIT Press,
1977. (General)

Chi-square tests: Feinberg, S. E. The Analysis of Cross-Classified Data. Cambridge, Massachusetts: The MIT Press, 1977. p. 36 (Pearson and maximum likelihood)

SAS: FUNCAT
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Multidimensional contingency table analysis technique allowing an unconstrained design matrix

Statistical Measure

Chi-square tests

Notes

Multidimensional contingency table analysis: Landis, J. R.; Stanish, W. M.; Freeman, J. L.; and Koch, G. G. A computer program for the generalized chi-square analysis of categorical data using weighted least squares (GENCAT). Computer Programs in Biomedicine 6 (1976): 196-231. (GENCAT)

Chi-square tests: Feinberg, S. E. The Analysis of Cross-Classified Data. Cambridge, Massachusetts: The MIT Press, 1977. p. 36 (Pearson and maximum likelihood)

SAS: FUNCAT
Home
Glossary
References
Selecting Statistics
About
Help

Do you want to treat the dependent variable as interval and all of the
independent variables as nominal?

Yes

No
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Analysis of variance using weighted least squares

Statistical Measure

(none)

Notes

Draper, N. R., and Smith, H. Applied Regression Analysis. New York: Wiley, 1966. p. 77

SAS: GLM
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Sorry, there is no known analysis for this case.

Statistical Measure

(none)

Notes

Multidimensional contingency table analysis using weighted least squares may be appropriate.
Home
Glossary
References
Selecting Statistics
About
Help

Do you want to measure agreement?

Yes

No
Home
Glossary
References
Selecting Statistics
About
Help

How do you want to treat the variables with respect to scale of measurement?

All Nominal

All Ordinal

All Interval

Other
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Light's agreement coefficient

Statistical Measure

Refer critical ratio of k to a table of the unit normal curve

Notes

Light, R. J. Measures of response agreement for qualitative data: some generalizations and
alternatives. Psychological Bulletin 76 (1971): 365-377.
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Kendall's coefficient of concordance (W)

Statistical Measure

For N > 7, use the chi-square test for W; for N <= 7, refer s to a table of critical values of s

Notes

Kendall's W: Siegel, S. Nonparametric Methods for the Behavioral Sciences. New York: McGraw-
Hill, 1956. p. 229

Chi-square test for W: Siegel, S. Nonparametric Methods for the Behavioral Sciences. New York:
McGraw-Hill, 1956. p. 236

Table of critical values of s in the Kendall W: Siegel, S. Nonparametric Methods for the
Behavioral Sciences. New York: McGraw-Hill, 1956. p. 286
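
To illustrate the computation, here is a small Python sketch of Kendall's W and its chi-square approximation for hypothetical ranks; it assumes no tied ranks and is not part of the original advisor.

    import numpy as np

    # Hypothetical ranks: each row is one variable (judge) ranking the same N cases
    ranks = np.array([[1, 2, 3, 4, 5, 6],
                      [2, 1, 3, 5, 4, 6],
                      [1, 3, 2, 4, 6, 5]], dtype=float)

    k, n = ranks.shape                       # k judges, N cases
    rank_sums = ranks.sum(axis=0)
    s = ((rank_sums - rank_sums.mean()) ** 2).sum()
    w = 12.0 * s / (k ** 2 * (n ** 3 - n))   # Kendall's coefficient of concordance
    chi_sq = k * (n - 1) * w                 # chi-square approximation, N - 1 df
    print(w, chi_sq)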
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

• Intraclass correlation coefficient

• Robinson's A

Statistical Measure

F test

Notes

Intraclass correlation: McNemar, Q. Psychological Statistics. Fourth edition. New York: Wiley
1969. p. 322

Robinson's A: Robinson, W. S. The statistical measurement of agreement. American Sociological Review 22 (1957): 17-25.

F test for intraclass correlation: McNemar, Q. Psychological Statistics. Fourth edition. New York:
Wiley 1969. p. 322

F test for Robinson's A: Robinson, W. S. The statistical measurement of agreement. American Sociological Review 22 (1957): 17-25. p. 23; McNemar, Q. Psychological Statistics. Fourth edition. New York: Wiley 1969. p. 322
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

Sorry, there is no known analysis for this case.

Statistical Measure

(none)

Notes
Home
Glossary
References
Selecting Statistics
About
Help

Do you want to test whether the means (or proportions) on all variables are equal?

Yes

No
Home
Glossary
References
Selecting Statistics
About
Help

Are all the variables two-point?

Yes

No
Home
Glossary
References
Selecting Statistics
About
Help

Statistical Test

(none)

Statistical Measure

Cochran's Q

Notes

Siegel, S. Nonparametric Methods for the Behavioral Sciences. New York: McGraw-Hill, 1956. p.
161

SPSS: NPAR, RELIABILITY
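
As an illustration only (not part of the original tool), here is a minimal Python sketch of Cochran's Q computed from its textbook formula on a hypothetical subjects-by-variables 0/1 matrix, referred to chi-square with k-1 degrees of freedom.

import numpy as np
from scipy.stats import chi2

# x[i, j] = 1 if subject i "succeeds" on dichotomous variable j (hypothetical data)
x = np.array([[1, 1, 0],
              [1, 0, 0],
              [1, 1, 1],
              [0, 1, 0],
              [1, 1, 0],
              [1, 0, 1]])
n, k = x.shape
col = x.sum(axis=0)                      # totals per variable
row = x.sum(axis=1)                      # totals per subject

q = (k - 1) * (k * (col ** 2).sum() - col.sum() ** 2) / (k * row.sum() - (row ** 2).sum())
p_value = chi2.sf(q, df=k - 1)
print(f"Q = {q:.3f}, p = {p_value:.3f}")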



Statistical Test

Analysis of variance with repeated measures

Statistical Measure

F test

Notes

Analysis of variance with repeated measures: McNemar, Q. Psychological Statistics. Fourth edition. New York: Wiley 1969. p. 338

F test: McNemar, Q. Psychological Statistics. Fourth edition. New York: Wiley 1969. p. 340

SPSS: RELIABILITY, MANOVA

SAS: GLM, ANOVA
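
For illustration (not part of the original tool), a minimal Python sketch of a one-way repeated-measures analysis of variance and its F test, computed by hand on hypothetical subject-by-condition scores:

import numpy as np
from scipy.stats import f

# x[i, j] = score of subject i under condition j (hypothetical data)
x = np.array([[3., 5., 6.],
              [2., 4., 5.],
              [4., 6., 7.],
              [3., 4., 6.]])
n, k = x.shape
grand = x.mean()

ss_cond = n * ((x.mean(axis=0) - grand) ** 2).sum()   # between-conditions sum of squares
ss_subj = k * ((x.mean(axis=1) - grand) ** 2).sum()   # between-subjects sum of squares
ss_total = ((x - grand) ** 2).sum()
ss_error = ss_total - ss_cond - ss_subj               # residual (condition x subject)

df_cond, df_error = k - 1, (n - 1) * (k - 1)
F = (ss_cond / df_cond) / (ss_error / df_error)
p_value = f.sf(F, df_cond, df_error)
print(f"F({df_cond}, {df_error}) = {F:.2f}, p = {p_value:.4f}")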



Do you want to treat the relationships among the variables as additive?

Yes

No

Do you want to analyze patterns existing among variables or among individual cases (e.g., persons)?

Variables

Cases

Do you have two or more sets of variables and do you want to measure the
strength of the association between those sets?

Yes

No

Do you want to treat the variables as measured on interval scales and relationships among them as linear?

Yes

No

Statistical Test

Canonical Correlation

Statistical Measure

• Wilks' lambda

• Roy's greatest root criterion

• Pillai-Bartlett V

Notes

Canonical correlation: Cooley, W. W., and Lohnes, P. R. Multivariate Data Analysis. New York:
Wiley,1971. p. 168; Harris, R. J. A Primer of Multivariate Statistics. New York: Academic Press,
1975. p. 132

Wilks' lambda: Cooley, W. W., and Lohnes, P. R. Multivariate Data Analysis. New York:
Wiley,1971. p. 175; Morrison, D.F. Multivariate Statistical Methods. Second edition. New York:
McGraw-Hill, 1976.p. 222; Harris, R. J. A Primer of Multivariate Statistics. New York: Academic
Press, 1975. p. 143

Roy's greatest root: Morrison, D.F. Multivariate Statistical Methods. Second edition. New York:
McGraw-Hill, 1976.p. 178; Harris, R. J. A Primer of Multivariate Statistics. New York: Academic
Press, 1975. p. 143

Pillai-Bartlett: Morrison, D.F. Multivariate Statistical Methods. Second edition. New York: McGraw-
Hill, 1976.p. 223

SPSS: CANCORR

SAS: CANCORR
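
As a hedged illustration (not part of the original tool), canonical correlations between two hypothetical sets of interval-scaled variables can be obtained with scikit-learn's CCA; the simulated data and the choice of two components are assumptions of the sketch.

import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                                 # first set of variables
Y = X @ rng.normal(size=(3, 2)) + rng.normal(size=(100, 2))   # second set, related to X

cca = CCA(n_components=2).fit(X, Y)
X_c, Y_c = cca.transform(X, Y)                                # canonical variates
can_corrs = [np.corrcoef(X_c[:, i], Y_c[:, i])[0, 1] for i in range(2)]
print("canonical correlations:", np.round(can_corrs, 3))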

Statistical Test

Sorry, there is no known analysis for this case.

Statistical Measure

(none)

Notes

Does the analysis involve (a) one group of individual cases or (b) two or more groups?

One Group

Two or More Groups



Do you want to explore covariation among the variables (e.g., to examine their
relationships to underlying dimensions) or do you want to find clusters of variables
that are more strongly related to one another than to the remaining variables?

Explore Covariation

Find Clusters

Do you want to treat the variables as measured on interval scales and the
relationships among them as linear?

Yes

No

Do you want to explore the relationships among the set of variables or do you
want to compare the pattern of the relationships with a prespecified pattern?

Explore Relationships

Compare Patterns

Do you want to preserve the metric units in which the variables were measured or
to standardize them by the observed variance of each?

Standardize

Original Metric

Statistical Test

Factor analysis of correlation matrix

Statistical Measure

(none)

Notes

Gorsuch, R. L. Factor Analysis. Philadelphia: W. B. Saunders, 1974.

SAS: FACTOR
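
For illustration only (not part of the original tool), a minimal scikit-learn sketch of factoring standardized variables, which amounts to a factor analysis of the correlation matrix; the simulated data and the two-factor, varimax-rotation choices are assumptions.

import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
latent = rng.normal(size=(200, 2))                            # two hypothetical factors
data = latent @ rng.normal(size=(2, 6)) + 0.5 * rng.normal(size=(200, 6))

z = StandardScaler().fit_transform(data)                      # standardizing => correlation metric
fa = FactorAnalysis(n_components=2, rotation="varimax").fit(z)
print(np.round(fa.components_.T, 2))                          # loadings: variables x factors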

Statistical Test

Factor analysis of variance-covariance matrix

Statistical Measure

(none)

Notes

Gorsuch, R. L. Factor Analysis. Philadelphia: W. B. Saunders, 1974. p. 271



Do you want to preserve the metric units in which the variables were measured or
to standardize them by the observed variance of each?

Standardize

Original Metric

Statistical Test

Confirmatory factor analysis of a standardized variance-covariance matrix

Statistical Measure

Maximum likelihood chi-square

Notes

Confirmatory factor analysis: Gorsuch, R. L. Factor Analysis. Philadelphia: W. B. Saunders, 1974. pp. 116, 166 (General)

Joreskog, K. G., and Sorbom, D. LISREL: Analysis of Linear Structural Relationships by the Method
of Maximum Likelihood. Version IV. User's Guide, Chicago: National Educational Resources,
1978. (COFAMM)

Maximum likelihood chi-square: Gorsuch, R. L. Factor Analysis. Philadelphia: W. B. Saunders, 1974. pp. 118, 139

Joreskog, K. G., and Sorbom, D. LISREL: Analysis of Linear Structural Relationships by the Method
of Maximum Likelihood. Version IV. User's Guide, Chicago: National Educational Resources,
1978. (COFAMM)

Statistical Test

Confirmatory factor analysis of variance-covariance matrix

Statistical Measure

Maximum likelihood chi-square

Notes

Confirmatory factor analysis: Gorsuch, R. L. Factor Analysis. Philadelphia: W. B. Saunders, 1974. pp. 116, 166 (General)

Joreskog, K. G., and Sorbom, D. LISREL: Analysis of Linear Structural Relationships by the Method
of Maximum Likelihood. Version IV. User's Guide, Chicago: National Educational Resources,
1978. (COFAMM)

Maximum likelihood chi-square: Gorsuch, R. L. Factor Analysis. Philadelphia: W. B. Saunders, 1974. pp. 118, 139; Joreskog, K. G., and Sorbom, D. LISREL: Analysis of Linear Structural Relationships by the Method of Maximum Likelihood. Version IV. User's Guide, Chicago: National Educational Resources, 1978. (COFAMM)

Do you want to locate each of the variables in multidimensional space?

Yes

No

Statistical Test

Non-metric multidimensional scaling techniques

Statistical Measure

(none)

Notes

Kruskal, J. B., and Wish, M. Multidimensional Scaling. Beverly Hills, California: Sage, 1978.
(General)

Kruskal, J. B. Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika 29 (1964a): 1-27. (MDSCAL)

Kruskal, J. B. Nonmetric multidimensional scaling: a numerical method. Psychometrika 29 (1964b): 115-130. (MDSCAL)

Guttman, L. A general nonmetric technique for finding the smallest coordinate space for a
configuration of points. Psychometrika 33 (1968): 469-506.

Lingoes, J. C.; Roskam, E. E.; and Borg, I. Geometric Representations of Relational Data. Second
edition. Ann Arbor: Mathesis Press, 1979. (MINISSA)

Young, F. W., and Torgerson, W. S. TORSCA, a FORTRAN IV program for Shepard-Kruskal multidimensional scaling analysis. Behavioral Science 12 (1976): 498. (TORSCA)

Takane, Y.; Young, F. W.; and DeLeeuw, J. Nonmetric individual differences multidimensional
scaling: an alternating least squares method with optimal scaling features. Psychometrika 42
(1977): 7-67. (ALSCAL)

Kruskal, J. B.; Young, F. W.; and Seery, J. B. How to use KYST, a very flexible program to do
multidimensional scaling and unfolding. Unpublished. Bell Laboratories, Murray Hills, New Jersey,
1973. (KYST)

SAS: ALSCAL
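
As an illustration (not part of the original tool), here is a minimal Python sketch of nonmetric multidimensional scaling that locates variables in two-dimensional space; the simulated data are hypothetical, and turning correlations into dissimilarities with 1 - |r| is just one of several reasonable choices.

import numpy as np
from sklearn.manifold import MDS

rng = np.random.default_rng(2)
data = rng.normal(size=(150, 5))                        # hypothetical interval data
corr = np.corrcoef(data, rowvar=False)                  # variable-by-variable correlations
dissim = 1.0 - np.abs(corr)                             # convert to dissimilarities

mds = MDS(n_components=2, metric=False, dissimilarity="precomputed",
          random_state=0)
coords = mds.fit_transform(dissim)                      # each variable located in 2-D space
print(np.round(coords, 2), "stress:", round(mds.stress_, 3))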

Do you want to treat all variables as nominal?

Yes

No

Statistical Test

Multidimensional contingency table analysis

Statistical Measure

Chi-square tests

Notes

Multidimensional Contingency Table Analysis:

Statistics Department, University of Chicago. ECTA program: description for users. Mimeographed paper, 1973. (ECTA)

Landis, J. R.; Stanish, W. M.; Freeman, J. L.; and Koch, G. G. A computer program for the generalized chi-square analysis of categorical data using weighted least squares (GENCAT). Computer Programs in Bio-medicine 6 (1976): 196-231. (GENCAT)

Feinberg, S. E. The Analysis of Cross-Classified Data. Cambridge, Massachusetts: The MIT Press,
1977. (General)

Chi-square tests: Feinberg, S. E. The Analysis of Cross-Classified Data. Cambridge, Massachusetts: The MIT Press, 1977. p. 36 (Pearson and maximum likelihood)

SAS: FUNCAT
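
For illustration only (not part of the original tool), the simplest special case of the above, a Pearson chi-square test on a two-way contingency table, can be run in Python; the counts are hypothetical.

import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[20, 15, 5],     # hypothetical counts: rows = groups,
                  [10, 25, 25]])   # columns = response categories
stat, p_value, dof, expected = chi2_contingency(table)
print(f"chi2({dof}) = {stat:.2f}, p = {p_value:.4f}")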

Statistical Test

Sorry, there is no known analysis for this case.

Statistical Measure

(none)

Notes

Statistical Test

Clustering techniques such as single linkage, complete linkage, average linkage, K-means

Statistical Measure

(none)

Notes

Sneath, P. H. A., and Sokal, R. R. Numerical Taxonomy. San Francisco: W. H. Freeman, 1973.

SAS: CLUSTER, FASTCLUS
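
As a hedged illustration (not part of the original tool), a minimal Python sketch of average-linkage hierarchical clustering and K-means on hypothetical cases:

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)
cases = np.vstack([rng.normal(0, 1, size=(20, 4)),
                   rng.normal(3, 1, size=(20, 4))])     # two hypothetical groups of cases

tree = linkage(cases, method="average")                 # also: "single", "complete"
hier_labels = fcluster(tree, t=2, criterion="maxclust") # cut the tree into 2 clusters

km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(cases)
print(hier_labels, km_labels)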



Do you want to treat the variables as measured on interval scales and relationships among them as linear?

Yes

No

Statistical Test

Q-type factor analysis

Statistical Measure

(none)

Notes

Overall, J. E. and Klett, C. J. Applied Multivariate Analysis. New York: McGraw-Hill, 1972. p. 201;
Gorsuch, R. L. Factor Analysis. Philadelphia: W. B. Saunders, 1974. p. 279

SPSS: FACTOR

SAS: FACTOR

Do you want to treat all of the variables as nominal?

Yes

No

Statistical Test

Multidimensional contingency table analysis

Statistical Measure

Chi-square tests

Notes

Multidimensional Contingency Table Analysis:

Statistics Department, University of Chicago. ECTA program: description for users. Mimeographed paper, 1973. (ECTA)

Landis, J. R.; Stanish, W. M.; Freeman, J. L.; and Koch, G. G. A computer program for the generalized chi-square analysis of categorical data using weighted least squares (GENCAT). Computer Programs in Bio-medicine 6 (1976): 196-231. (GENCAT)

Feinberg, S. E. The Analysis of Cross-Classified Data. Cambridge, Massachusetts: The MIT Press,
1977. (General)

Chi-square tests: Feinberg, S. E. The Analysis of Cross-Classified Data. Cambridge, Massachusetts: The MIT Press, 1977. p. 36 (Pearson and maximum likelihood)

SAS: FUNCAT

Statistical Test

Sorry, there is no known analysis for this case.

Statistical Measure

Notes
Research Methods Tutorials

This page consists of tutorial Web projects put together by the graduate students in Program Evaluation and Planning at Cornell
University. Each of these projects teaches a specific research methods topic that the student selected. Students were instructed to
write these tutorials for undergraduate or graduate student audiences that are new to social research methods. Please send any
comments or questions to Bill Trochim.

Foundations

● The Layman's Guide to Social Research Methods (Colosi)


● Steps for a Successful Policy Analysis (Barrientos)
● You're the Evaluator (Durgan)
● Validity (Rymarchyk)
● In Search of Truth Through Quantitative Reasoning (Maldonado)
● Ethical Dilemmas in Program Evaluation and Research Design (LaPolt)
● Before The Inquiry: Scoping the Research Terrain (Raymer)
● Identifying & Focusing Research Questions (Ruiz-Casares)
● Q & A: What is Concept Mapping? (Katsumoto)
● Epidemiology (Kim)
● Action Research (SenGupta)
● Qualitative Research Methods (Mensah-Dartey)
● Research Inferences: Understanding Validity, Validity Types, and Threats to Validity (Lo)
● The Use of Pattern Matching and Concept Mapping in Community Organizing Projects (King)

Sampling & External Validity

● Procedures in Sampling (Ojwaya)


● Sampling in Research (Mugo)

Measurement & Construct Validity

● Phenomenologists Behind Bars: How to do qualitative research interviews with incarcerated youths (Bedard)
● Observational Field Research (Brown)
● Threats to Construct Validity (Dreibe)
● How Stable and Consistent Is Your Instrument? (Johnson)
● The NEP and Measurement Validity (Pelstring)
● Reliability Learning Maze (Levell)
● Validity Issues in Measuring Psychological Constructs: The Case of Emotional Intelligence (Young)

Research Design & Internal Validity


● Introduction to Research Design (Abrahams)
● Choosing a Research Design (Belue)
● Research Design and Mixed Methods Approaches: A Hands-On Experience (Sydenstricker)
● Threats to Internal Validity (Wells)
● Designs to Rule Out Threats to Internal Validity (Kong)
● Securing Internal Validity - How to Avoid the Threats (Burns)
● Single Group Threats to Internal Validity (Martin)
● Regression Toward the Mean (Cheng)
● Randomized Experiments (Hussain)
● Factorial Design (Coffin)
● Analysis of Covariance Experimental Designs (Lee)
● Correlational Research Designs (LaMar)
● The Basics of Regression-Discontinuity Designs (Nieves)
● The Regression-Discontinuity Design (Rwampororo)
● The Regression-Discontinuity Design in Teacher Education (Parapi)
● Validity and the Regression-Discontinuity (RD) Design (Walkley)
● The Nonequivalent Dependent Variables Design with Pattern Matching (Volz)
● Non-Experimental Design (Blakesley)
● STS Science Curriculum Research Designs (Chakane)
● Comparative Research Design Simulation for Program Evaluation (Rueda)
● Longitudinal Research (Cho)

Data Analysis & Conclusion Validity

● Statistical Analysis (Scott-Pierce)


● Introduction to Statistical Power (Biswas)
● Multivariate Statistics: An Introduction (Flynn)
● The Concept of Analysis of Variance (Rehberg)
● Categorical Data Analysis (Cho)
● Analyzing the Multitrait Multimethod Matrix (Jabs)

Social Research Methods Home Page

Copyright © 2004, 1996, William M.K. Trochim

The Layman's Guide to Social Research Methods

Introduction:
Understanding the basics of social research methods can, at times, feel as if one is walking a very fine line between complete comprehension and abysmal failure. The terminology is complex, the concepts are highly interrelated, and, as we all know, every little detail matters when it comes to a successful thesis or dissertation. As such, I decided that a sort of "cliff notes" to my experience in two good courses on this topic may be useful (if not amusing) to others who are in the same position as I am - a weary-eyed student about to begin a dissertation. It is, as you know, a daunting task filled with both anticipation and a lot of fear.

Contents: This web page provides an overview of the key elements of social research methods and consists of four sections. The first section is my attempt to explain (in English) the primary differences between reliability and validity. This section includes a discussion of threats to many types of validity - as all Cornell students in Bill Trochim's (check out his Center for Social Research Methods) classes have been brainwashed to obsess about them (but we know it is for our own good). The second section provides a brief discussion of two fundamental ways to increase the strength of your analysis: statistical power and research design. I've also compiled a glossary of some of the terminology that we must face (and know) to appear credible as fledgling social scientists. Finally, I've tried to pull together some resources I have gathered over the last two years - both printed materials and additional web sites that may be helpful in understanding these concepts.

Menu:

● Reliability and Validity: What's the Difference?


● Strengthening Your Analysis
● Glossary
● Additional Resources

I hope this page helps you in your quest for understanding. If, at the very least, you know you are not alone in your struggles, then I
have succeeded somewhat!

Please e-mail Laura Colosi with any comments or questions.

Copyright 1997, Laura A. Colosi. All Rights Reserved.


Reliability and Validity: What's the Difference?

Reliability

Definition: Reliability is the consistency of your measurement, or the degree to which an instrument measures the same way each time it is used
under the same condition with the same subjects. In short, it is the repeatability of your measurement. A measure is considered reliable if a
person's score on the same test given twice is similar. It is important to remember that reliability is not measured, it is estimated.

There are two ways that reliability is usually estimated: test/retest and internal consistency.

Test/Retest
Test/retest is the more conservative method to estimate reliability. Simply put, the idea behind test/retest is that you should get the same score on test 1 as you do on test 2. The three main components of this method are as follows (a small computational sketch follows the list):

1) implement your measurement instrument at two separate times for each subject;
2) compute the correlation between the two separate measurements; and
3) assume there is no change in the underlying condition (or trait you are trying to measure) between test 1 and test 2.
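
Here is a minimal sketch of that idea (the scores are hypothetical and the sketch is mine, not part of the original page): correlate the two administrations to estimate reliability.

from scipy.stats import pearsonr

test1 = [12, 15, 9, 20, 17, 11, 14, 18]   # scores at time 1 (hypothetical)
test2 = [13, 14, 10, 19, 18, 10, 15, 17]  # scores at time 2, same subjects
r, p = pearsonr(test1, test2)
print(f"test/retest reliability estimate r = {r:.2f}")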

Internal Consistency
Internal consistency estimates reliability by grouping questions in a questionnaire that measure the same concept. For example, you could write
two sets of three questions that measure the same concept (say class participation) and after collecting the responses, run a correlation between
those two groups of three questions to determine if your instrument is reliably measuring that concept.

One common way of computing correlation values among the questions on your instruments is by using Cronbach's Alpha. In short, Cronbach's
alpha splits all the questions on your instrument every possible way and computes correlation values for them all (we use a computer program
for this part). In the end, your computer output generates one number for Cronbach's alpha - and just like a correlation coefficient, the closer it is
to one, the higher the reliability estimate of your instrument. Cronbach's alpha is a less conservative estimate of reliability than test/retest.
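
For illustration only (this sketch is not from the original page), Cronbach's alpha can also be computed directly from its usual formula, using the item variances and the variance of the total score; the item responses below are hypothetical.

import numpy as np

# rows = respondents, columns = items intended to measure one concept (hypothetical)
items = np.array([[4, 5, 4, 3, 4, 5],
                  [2, 3, 3, 2, 2, 3],
                  [5, 5, 4, 5, 4, 4],
                  [3, 2, 3, 3, 3, 2],
                  [4, 4, 5, 4, 5, 4]], dtype=float)
k = items.shape[1]
item_vars = items.var(axis=0, ddof=1).sum()       # sum of the item variances
total_var = items.sum(axis=1).var(ddof=1)         # variance of the total score
alpha = (k / (k - 1)) * (1 - item_vars / total_var)
print(f"Cronbach's alpha = {alpha:.2f}")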

The primary difference between test/retest and internal consistency estimates of reliability is that test/retest involves two administrations of the
measurement instrument, whereas the internal consistency method involves only one administration of that instrument.

Validity

Definition: Validity is the strength of our conclusions, inferences or propositions. More formally, Cook and Campbell (1979) define it as the
"best available approximation to the truth or falsity of a given inference, proposition or conclusion." In short, were we right? Let's look at a
simple example. Say we are studying the effect of strict attendance policies on class participation. In our case, we saw that class participation
did increase after the policy was established. Each type of validity would highlight a different aspect of the relationship between our treatment
(strict attendance policy) and our observed outcome (increased class participation).

Types of Validity:
There are four types of validity commonly examined in social research.

1. Conclusion validity asks is there a relationship between the program and the observed outcome? Or, in our example, is there a
connection between the attendance policy and the increased participation we saw?

2. Internal Validity asks if there is a relationship between the program and the outcome we saw, is it a causal relationship? For
example, did the attendance policy cause class participation to increase?
3. Construct validity is the hardest to understand, in my opinion. It asks whether there is a relationship between how I operationalized my concepts in this study and the actual causal relationship I'm trying to study. Or, in our example, did our treatment (attendance policy) reflect the construct of attendance, and did our measured outcome - increased class participation - reflect the construct of participation? Overall, we are trying to generalize our conceptualized treatment and outcomes to broader constructs of the same concepts.

4. External validity refers to our ability to generalize the results of our study to other settings. In our example, could we generalize our
results to other classrooms?

Threats To Internal Validity

There are three main types of threats to internal validity - single group, multiple group and social interaction threats.

Single Group Threats apply when you are studying a single group receiving a program or treatment. Thus, all of these threats can be greatly
reduced by adding a control group that is comparable to your program group to your study.

A History Threat occurs when an historical event affects your program group such that it causes the outcome you observe (rather than your treatment being the cause). In our earlier example, this would mean that the stricter attendance policy did not cause an increase in class participation; rather, the expulsion from school of several students due to low participation affected your program group such that the remaining students increased their participation as a result.

A Maturation Threat to internal validity occurs when standard events over the course of time cause your outcome. For example, if by chance,
the students who participated in your study on class participation all "grew up" naturally and realized that class participation increased their
learning (how likely is that?) - that could be the cause of your increased participation, not the stricter attendance policy.

A Testing Threat to internal validity is simply when the act of taking a pre-test affects how that group does on the post-test. For example, if in
your study of class participation, you measured class participation prior to implementing your new attendance policy, and students became
forewarned that there was about to be an emphasis on participation, they may increase it simply as a result of involvement in the pretest measure
- and thus, your outcome could be a result of a testing threat - not your treatment.

An Instrumentation Threat to internal validity could occur if the effect of increased participation could be due to the way in which that pretest
was implemented.

A Mortality Threat to internal validity occurs when subjects drop out of your study, and this leads to an inflated measure of your effect. For
example, if as a result of a stricter attendance policy, most students drop out of a class, leaving only those more serious students in the class
(those who would participate at a high level naturally) - this could mean your effect is overestimated and suffering from a mortality threat.

The last single group threat to internal validity is a Regression Threat. This is the most intimidating of them all (just its name alone makes one panic). Don't panic. Simply put, a regression threat means that a sample selected because of extreme scores (those students you study, for example) will tend to score closer to the average (or mean) of the larger population on the posttest than it did on the pretest. This is a common occurrence, and will happen between almost any two variables on which you take two measures. Because it is common, it is easily remedied through either the inclusion of a control group or through a carefully designed research plan (this is discussed later). For a great discussion of regression threats, go to Bill Trochim's Center for Social Research Methods.

In sum, these single group threats must be addressed in your research for it to remain credible. One primary way to accomplish this is to include
a control group comparable to your program group. This however, does not solve all our problems, as I'll now highlight the multiple group
threats to internal validity.

Multiple Group Threats to internal validity involve the comparability of the two groups in your study, and whether or not any other factor
other than your treatment causes the outcome. They also (conveniently) mirror the single group threats to internal validity.

A Selection-History threat occurs when an event occurring between the pre and post test affects the two groups differently.

A Selection-Maturation threat occurs when there are different rates of growth between the two groups between the pre and post test.

A Selection-Testing threat occurs when taking a test has a different effect on the two groups.

A Selection-Instrumentation threat occurs when the test implementation affects the groups differently between the pre and post test.

A Selection-Mortality Threat occurs when there are different rates of dropout between the groups which leads to you detecting an effect that
may not actually occur.

Finally, a Selection-Regression threat occurs when the two groups regress towards the mean at different rates.

Okay, so now that you have dragged yourself through these extensive lists of threats to validity, you're wondering how to make sense of it all. How do we minimize these threats without going insane in the process? The best advice I've been given is to use two groups when possible, and if you do, make sure they are as comparable as is humanly possible. Whether you conduct a randomized experiment or a non-random study --> YOUR GROUPS MUST BE AS EQUIVALENT AS POSSIBLE! This is the best way to strengthen the internal validity of your research. The last type of threat to discuss involves the social pressures in the research context that can impact your results. These are known as social interaction threats to internal validity.

Diffusion or "Imitation of Treatment occurs when the comparison group learns about the program group and imitates them, which will lead to
an equalization of outcomes between the groups (you will not see an effect as easily).

Compensatory Rivalry means that the comparison group develops a competitive attitude towards the program group, and this also makes it harder to detect an effect due to your treatment rather than to the comparison group's reaction to the program group.

Resentful Demoralization is a threat to internal validity that exaggerates the posttest differences between the two groups. This is because the
comparison group (upon learning of the program group) gets discouraged and no longer tries to achieve on their own.

Compensatory Equalization of Treatment is the only threat that is a result of the actions of the research staff - it occurs when the staff begins to
compensate the comparison group to be "fair" in their opinion, and this leads to an equalization between the groups and makes it harder to detect
an effect due to your program.

Threats to Construct Validity

I know, I know - you're thinking - no I just can't go on. Let's take a deep breath and I'll remind you what construct validity is, and then we'll look
at the threats to it one at a time. OK? OK.

Construct validity is the degree to which inferences we have made from our study can be generalized to the concepts underlying our program in
the first place. For example, if we are measuring self-esteem as an outcome, can our definition (operationalization) of that term in our study be
generalized to the rest of the world's concept of self-esteem?

Ok, let's address the threats to construct validity slowly - don't be intimidated by their lengthy academic names - I'll provide an English
translation.

Inadequate Preoperational Explication of Constructs simply means we did not define our concepts very well before we measured them or
implemented our treatment. The solution? Define your concepts well before proceeding to the measurement phase of your study.

Mono-operation bias simply means we only used one version of our independent variable (our program or treatment) in our study, and hence,
limit the breadth of our study's results. The solution? Try to implement multiple versions of your program to increase your study's utility.

Mono-method bias simply put, means that you only used one measure or observation of an important concept, which in the end, reduces the
evidence that your measure is a valid one. The solution? Implement multiple measures of key concepts and do pilot studies to try to demonstrate
that your measures are valid.
Interaction of Testing and Treatment occurs when the testing in combination with the treatment produces an effect. Thus you have
inadequately defined your "treatment," as testing becomes part of it due to its influence on the outcome. The solution? Label your treatment
accurately.

Interaction of Different Treatments means that it was a combination of our treatment and other things that brought about the effect. For
example, if you were studying the ability of Tylenol to reduce headaches and in actuality it was a combination of Tylenol and Advil or Tylenol
and exercise that reduced headaches -- you would have an interaction of different treatments threatening your construct validity.

Restricted Generalizability Across Constructs simply put, means that there were some unanticipated effects from your program, that may make
it difficult to say your program was effective.

Confounding Constructs occurs when you are unable to detect an effect from your program because you may have mislabeled your constructs
or because the level of your treatment wasn't enough to cause an effect.

As with internal validity, there are a few social threats to construct validity also. These include:

1. Hypothesis Guessing: when participants base their behavior on what they think your study is about - so your outcome is really not due solely to the program, but also to the participants' reaction to you and your study.
2. Evaluator Apprehension: when participants are fearful of your study to the point that it influences the treatment effect you detect.
3. Experimenter Expectancies: when researcher reactions shape the participants' responses - so you mislabel the treatment effect you see as due to the program when it is more likely due to the researcher's behavior.

See, that wasn't so bad. We broke things down and attacked them one at a time. You may be wondering why I haven't given you a long list of threats to conclusion and external validity - the simple answer is that the more critical threats seem to involve internal and construct validity. And the means by which we improve conclusion and external validity will be highlighted in the section on Strengthening Your Analysis.

Summary

The real difference between reliability and validity is mostly a matter of definition. Reliability estimates the consistency of your measurement, or more simply the degree to which an instrument measures the same way each time it is used under the same conditions with the same subjects. Validity, on the other hand, involves the degree to which you are measuring what you are supposed to - more simply, the accuracy of your measurement. It is my belief that validity is more important than reliability, because if an instrument does not accurately measure what it is supposed to, there is no reason to use it even if it measures consistently (reliably).

Back to the Home Page


Strengthening Your Analysis

There are several ways to strengthen your analysis. The first, discussed in Reliability and Validity: What's the Difference?, is to
address as many threats to the validity of your research as possible. In addition, employing the methods discussed to achieve an
accurate estimate of the reliability of your measure is also a good idea. This section will provide you with a brief overview of two
other ways in which you can strengthen your analysis: basic statistics and research design.

Statistics

Conclusion Validity relates to the statistical power of your study, and is the first type of validity to deal with in conducting any study
- as it addresses the attempt to determine a relationship between your variables and your outcome.

There are three primary ways to improve conclusion validity:

1. Good reliability of your measure, as discussed in the first section of this web page;

2. Good implementation, usually achieved by standardizing the way in which your program is administered; and

3. Good statistical power.

I'd like to focus a bit on statistical power. There are four primary components of statistical power: the sample size, the effect size,
alpha level and power.

Sample size is simply the number of people or units available to be studied.

Effect Size is simply the ability to detect an effect relative to the other factors that appear in your study.

Alpha level refers to the likelihood that what you observed is due to chance rather than your program.

Power is the likelihood that you will detect an effect from your program when it actually happens.

For additional terms that are related to statistical inferences go to the Glossary of Terms. Finally, remember that the purpose of most
research is to assess or explore relationships among a set of variables, and that the use of some straightforward, thought-out basic
statistical methods can really enhance the strength of your findings.
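
To make the four components concrete, here is a small sketch (mine, not from the original page) that approximates the power of a two-group comparison from an assumed effect size, sample size, and alpha level, using a normal approximation.

from math import sqrt
from scipy.stats import norm

effect_size = 0.5      # standardized difference between groups (Cohen's d, hypothetical)
n_per_group = 64       # sample size in each group (hypothetical)
alpha = 0.05           # two-sided significance level

z_crit = norm.ppf(1 - alpha / 2)
noncentrality = effect_size * sqrt(n_per_group / 2)
power = norm.sf(z_crit - noncentrality)    # chance of detecting the effect if it is real
print(f"approximate power = {power:.2f}")  # roughly 0.80 for these values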

Research Design

The other way to strengthen your analysis is through a thought-out research design. There are three primary types of research designs:
1. Randomized or true experimental designs, which are the strongest design for establishing a cause and effect relationship,
as random assignment is used - and the groups involved are considered equivalent. This helps to reduce all of the multiple
group threats to internal validity discussed earlier.

2. Quasi-experimental designs are those which employ multiple measures or a control group without randomly assigning participants to groups. The ability of these designs to establish a cause-effect relationship depends upon the degree to which the two groups in the study are equivalent.

3. Non-experimental designs do not employ multiple measures, do not use a control group, and do not use any random assignment. These are usually descriptive studies conducted using a simple survey instrument only once. While they are useful in their own right, they are weak in establishing cause/effect relationships.

For a comprehensive discussion of the variety of research designs that fit into all these categories, check out Bill Trochim's Center for
Social Research Methods. Finally, a good research design is realistic in its scope relative to the resources of your study, appropriate
for the context in which your treatment will occur, and involves a few good steps to measure your effect efficiently.

Back to Home Page

Copyright 1997, Laura A. Colosi. All rights reserved.


Glossary of Terms

Concurrent validity- the ability to distinguish between groups that should be theoretically distinguishable.

Content validity - whether or not your instrument reflects the content you are trying to measure.

Convergent validity - measures that should be related are related.

Discriminant validity - measures that should not be related are not.

Face Validity - addresses whether or not a measurement instrument is valid on its face.

Predictive validity - the ability to predict something you want to predict.

Basic Statistical Terms

Causal relationship- a relationship where variation in one variable causes variation in another.

Correlation - a measure of the association between two variables; the closer its absolute value is to 1, the stronger the association.

Covariation - a measure of how two variables both vary relative to one another.

Deviation - the difference of a score from the mean.

Error Component - the part of the variance of an observed variable that is due to random measurement errors.

Hypothesis - a theory or prediction made about the relationship between two variables.

Interaction - when the effect of one variable (or factor) is not the same at each level of the other variable (or factor).

Linear Correlation - a statistical measure of the strength of the relationship between variables (e.g., treatment and outcome). The closer the coefficient is to +1 or
-1, the stronger the relationship - a positive correlation implies a direct relationship between the variables, a negative correlation implies an inverse relationship.

Linear Regression - the prediction equation which estimates the value of the outcome variable ("y") for any given treatment variable ("x").

Main Effect - the effect of a factor on the dependent variable (response) measured without regard to other factors in the analysis.

Mean - the average of your sample, computed by taking the sum of the individual scores and dividing them by the total number of individuals (sample size, "n").

Median - if you rank the observations according to size, the median is the observation that divides the list into equal halves.

Mode - the observation that occurs most frequently.

Null Hypothesis - the prediction that there is no relationship between your treatment and your outcome.

Random sample - a sample of a population where each member of the population has an equal chance of being in the sample.

Significance level - the probability of finding a relationship between your treatment and effect when there isn't one in reality.

Type I Error - rejecting the null hypothesis when it is true.

Type II Error - accepting the null hypothesis when it is false.

Variation - a measure of the spread of the variable, usually used to describe the deviation from a central value (e.g., the mean). Numerically, it's the sum of the squared deviations from the mean.
Back to home page.

Copyright 1997, Laura A. Colosi. All rights reserved.


Additional Resources

There are several good resources available in print and on line that deal with social research methodology and statistical analysis.

Printed Materials

Carmines, E. and Zeller, R. (1979). Reliability and Validity Assessment. Newbury Park: Sage Publications.
This is a short booklet, easy to read and full of information on reliability and validity.

Converse, J. and Presser, S. (1986). Survey Questions: Handcrafting the Standardized Questionnaire. Newbury Park: Sage Publications.
Again a short booklet, easy to read with a great deal of basic information on developing questionnaires. Also provides some relevant material on
reliability and validity threats specific to questionnaires.

Fowler, F.J. (1988). Survey Research Methods, Second Edition. Newbury Park: Sage Publications.
A relatively short book. I have never read it straight through, but rather use it as a reference for questions on sampling, data collection and survey issues.

Hinton, P. (1995). Statistics Explained: A Guide for Social Science Students. London: Routledge Publishing.
I hate the title of this book - it automatically deflates the confidence of the fledgling social scientist. This book is great - it is like a big version of cliff
notes for some pretty advanced statistical concepts. I highly recommend it if you are looking for a refresher in these topics.

Hosmer, D. and Lemeshow, S. (1991). The importance of assessing the fit of logistic regression models: a case study. American Journal of Public Health,
V. 81, No. 12, 1630-1635.
A nice, short - real world - example of using logistic regression analysis.

Kim, J.O. and Mueller, C. (1978). Factor Analysis: Statistical Methods and Practical Issues. Newbury Park: Sage Publications.
Another short, easy to read booklet.

Kleinman, D., Kupper, L. and Muller, K. (1988). Applied Regression Analysis and Other Multivariable Methods. California: Duxbury Press.
This is a big, ugly, and frightening text book. It is written like a textbook and does get confusing at times. The thing I like about it is it provides lengthy,
real-world examples of almost each major topic in each chapter. It also has a good glossary and a lot of tables that can be useful in understanding
regression, etc.

McIver, J. and Carmines, E. (1981). Unidimensional Scaling. Newbury Park: Sage Publications.
Unidimensional what? This is another good, short booklet that explains the topic pretty well.

Sproull, N. (1995). Handbook of Social Research Methods: A Guide for Practitioners and Students in the Social Sciences. New Jersey: The Scarecrow Press, Inc.
Another textbook? I must be crazy - but this one is actually written in English; I found it quite useful as a reference book.

Trochim, W.M.K. (1989). Introduction to concept mapping for planning and evaluation. Evaluation and Program Planning, 12, 1-16.
What can I say about Bill Trochim? He is the new father of social research methodology - a must read!

Web sites: Social Science Research & Methodology

Bill Trochim's Center for Social Research Methods


By far the most comprehensive and straightforward information you will find on a variety of difficult topics. The site also contains simulation exercises,
references and additional web resources.

Project Gallery: Program Evaluation and Research Design


This site is the baby of Bill Trochim's site - a combination of many brilliant grad students' thoughts and insights into various topics in social research methodology.

Academic Press
This site is best used to find other sources of printed materials on selected topics in social science research.

Social Science Research Council


This site is a grab bag of printed materials, conference schedules and other sites to look at.

Back to the Home Page

Copyright 1997, Laura A. Colosi. All rights reserved.


Steps for a Successful Policy Analysis

What is Policy Analysis?


"The process through which we identify and evaluate alternative policies or programs that are
intended to lessen or resolve social, economic, or physical problems."

- Carl V. Patton

Policy Analysis in Six easy steps.


Based on the ideas and approach followed by Carl V. Patton, there is a very simple pattern of ideas and points to be considered in doing an actual policy analysis. The six steps are as follows:

1. Verify, define, and detail the problem. This is the most relevant and important step of all, because many times the objectives are not clear or are even contradictory to each other. A successful policy analysis will have clearly identified the problem to be resolved in the following steps. This is the foundation for an efficient and effective outcome of the whole process. The analyst must question both the interested parties involved and their agendas for the outcome, locating the problem in a way that eliminates any ambiguity for future reference.

2. Establish evaluation criteria. In order to compare, measure and select among alternatives, relevant evaluation criteria must be established. In this step, cost, net benefit, effectiveness, efficiency, equity, administrative ease, legality, and political acceptability must all be considered. Economic benefits must be considered in evaluating the policy. How the policy will harm or benefit a particular group or groups will depend on which options are viable; some options are more difficult than others, but the choice among them is ultimately decided by analyzing the parties involved with the policy. Political and other variables go hand in hand with the evaluation criteria to be followed. Most of the time the client - the person or group interested in the policy analysis - will dictate the direction or evaluation criteria to follow.

3. Identify alternative policies. In order to reach this third step, the other two must have been successfully completed. As can be seen, policy analysis involves an incrementalist approach: reaching one step in order to go on to the next. In this third step, understanding what is sought is very important. In order to generate alternatives, it is important to have a clear understanding of the problem and how to go about it. Possible alternatives include the "do nothing" approach (status quo) and any other that can benefit the outcome. Combining alternatives generates better solutions not thought of before. Relying on past experiences from other groups or policy analyses helps to create a more thorough analysis and understanding. It is important to avoid settling prematurely on a certain number of options in this step; many options must be considered before settling on a reduced number of alternatives. Brainstorming, research, experiments, writing scenarios, and concept mapping greatly help in finding new alternatives that will help reach an "optimal" solution.

4. Evaluate alternative policies. Packaging alternatives into strategies is the next step in accomplishing a thorough policy analysis. It becomes necessary to evaluate how well each possible alternative meets the criteria previously established. Additional data need to be collected to analyze the different levels of influence: the economic, political and social dimensions of the problem. These dimensions are analyzed through quantitative and qualitative analysis, that is, the benefits and costs of each alternative. Political questions about attaining the goals are analyzed to see whether they satisfy the interested parties of the policy analysis. In doing this more detailed analysis, the problem may turn out not to exist as originally identified; the actual problem statement from the first step may undergo a transformation once the alternatives have been evaluated in greater detail. New aspects of the problem may be found to be transient or even different from the original problem statement. This modification process allows this method of policy analysis to "recycle" information through all of the steps. Several fast iterations through the policy analysis may well be more efficient and effective than a single detailed one: efficiency is greatly increased when several projects are analyzed and evaluated rather than just one in great detail, allowing for a wider scope of possible solutions. Patton further suggests avoiding the toolbox approach - attacking options with a favorite analysis method; it is important to take a heterogeneous approach in analyzing the different possible alternatives. It is inefficient to view each alternative from a single perspective; each alternative should be evaluated with diverse approaches singled out according to its uniqueness.

5. Display and distinguish among alternative policies. The results of the evaluation of possible alternatives list the degree to which the criteria are met by each of them. Numerical results don't speak for themselves, but they are of great help in reaching a satisfying decision. Comparison schemes used to summarize virtues are of great help in distinguishing among several options; scenarios with quantitative methods, qualitative analysis, and complex political considerations can be melded into general alternatives that contain many more elements than the original ones. In making the comparison and distinction among alternatives it is necessary to play out the economic, political, legal, and administrative ramifications of each option. Political analysis is a major factor in distinguishing among the choices; display the positive and negative effects for the parties interested in implementing the policy. This political approach will ultimately analyze how the number of participants will improve or diminish the implementation. It will also consider how the internal cooperation of the interested units or parties will play an important role in the outcome of the policy analysis. Mixing two or more alternatives is a very common and practiced approach to attaining a very reasonably justified policy analysis.

6. Monitor the implemented policy. Assure continuity and determine whether the policy is having an impact. "Even after a policy has been implemented, there may be some doubt whether the problem was resolved appropriately and even whether the selected policy is being implemented properly. These concerns require that policies and programs be maintained and monitored during implementation to assure that they do not change unintentionally, to measure the impact that they are having, to determine whether they are having the impact intended, and to decide whether they should be continued, modified or terminated."

Mainly, we are talking about internal validity: whether our program makes a difference, and whether there are no other alternative explanations. This step is very important because of the special role that program evaluation and research design play in it. William Trochim presents a very complete explanation of this concept. His Home Page will be of great help in this matter.

Related Sites
● Political Science and Public Policy
● Administrative Science Quarterly
● Advanced Management Journal
● Institute of Cognitive and Decision Sciences
● Public Affairs Information for US Department
● Evaluation Review
● Harvard Business Review
● Public Administration Review
● Policy Sciences
Back to Top

Contact Information
Electronic mail address
eb32@cornell.edu

Back to Top




You're the Evaluator!
Is your head in the clouds (perhaps the virtual variety)? You've read
through the Knowledge Base and you've studied the 3 major types of
Research Designs. Right?

Here's your chance to test your knowledge of the Regression Discontinuity (RD) Design with a 'real life' scenario (you can also go to Knowledge Base links to review topics along the way).

Congratulations! You've been commissioned by the US General Accounting Office (GAO) to evaluate the Federal Special Supplemental Food Program for Women, Infants and Children (better known as WIC) in New York State.

CLICK HERE TO ACCEPT THE CHALLENGE

Tutorial Home Page | Trochim Home Page

Copyright ©: 1997. Sally R. Durgan. All Rights Reserved


The WIC Evaluation
First of all, WIC is a USDA program that provides nutrition education and nutritious foods, such as milk, juice and cereal, to pregnant, lactating and post-partum women and young children up to age five.

The WIC Criteria: Women, infants and children whose household income is at or below 185% of the federal poverty level and who are at "nutritional risk" are eligible for WIC in NY State.

The Evaluation Question: Does participation in the WIC program improve birth outcomes for participating mothers?*

*You will just be considering this aspect of the program. Birth outcomes will be measured by birth weight.

You want to find a design that is strong in both internal validity and ethical considerations.

The 'criteria' (think cutoff) may lead you to consider the Regression
Discontinuity (RD) design.

Click below to examine the benefits and drawbacks of the
RD DESIGN FOR A WIC EVALUATION

Top of Page

Previous Page

Tutorial Home Page | Trochim Home Page


The Regression-Discontinuity

(RD) Design For A WIC Evaluation


Criteria

Advantages/Disadvantages

Criteria

Does the WIC program meet "assignment based on a cutoff value on a preprogram measure"?

Are all persons on one side of the cutoff assigned to one group? Yes, women with low income and "nutritional risk" are enrolled in WIC.

Are all persons on the other side of the cutoff assigned to the other group? Yes, women not meeting the criteria are not enrolled in WIC.

Is there a continuous quantitative preprogram measure? Yes, women are assigned based on low income (185% of the poverty line). The amount (in dollars) is a continuous quantitative preprogram measure. Birthweight, the outcome of interest, is a continuous quantitative postprogram measure.

Advantages/Disadvantages:

What are some of the advantages of using the RD design for an evaluation of the WIC program? (Remember internal validity and ethics.)
Program participants are assigned to groups based on cutoff criteria (low
income and nutritional risk).

A well-implemented RD design yields conclusions comparable in internal validity to those from randomized experiments.

From an ethical perspective, RD gets the program to those most in need.


It is not necessary to deny the program for some pregnant women who
meet the criteria simply for the sake of a scientific test.

An RD design could point out whether or not the program assignment is really based on the established criteria. This is important for accountability.

What are some of the disadvantages?

In practice, the criteria, particularly "nutritional risk", may not be defined in the same way at all WIC sites in NY State or for all individuals at a single WIC site. This would create a "fuzzy" design.

In sum, the main advantage of the RD design in this "case study" is for
both ethical and practical (program improvement, accountability) reasons.
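
To make the design concrete, here is a small simulated sketch (mine, not part of the original tutorial) of how an RD-style analysis might estimate a program effect at an income cutoff; all numbers, including the cutoff coding and the simulated effect, are hypothetical.

import numpy as np

rng = np.random.default_rng(4)
income = rng.uniform(50, 300, size=500)          # percent of poverty level (hypothetical)
treated = (income <= 185).astype(float)          # assignment strictly by the cutoff
birthweight = 3000 + 2 * (income - 185) + 150 * treated + rng.normal(0, 200, 500)

# Regress the outcome on the centered assignment variable and the treatment dummy;
# the coefficient on `treated` estimates the program effect at the cutoff.
X = np.column_stack([np.ones_like(income), income - 185, treated])
coef, *_ = np.linalg.lstsq(X, birthweight, rcond=None)
print(f"estimated program effect at the cutoff: {coef[2]:.1f} grams")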

I hope this increased your understanding of the RD design (or at least raised awareness of the complexities in the 'real world').
Top of Page

Previous Page

Tutorial Home Page | Trochim Home Page


Research Synthesis Gallery

This page consists of a gallery of Web projects put together by the students in HSS691: Program Evaluation and Research Design at
Cornell University. Each of these projects is a critical synthesis of currently available web sites (current as of the course date) in a
specific topical area of interest to the student. Students were instructed to search the web for the best web resources to address their
topical interest and to emphasize especially any research issues or web sites that were relevant. Papers are listed by topical area.
Please send any comments or questions to Bill Trochim.

Communication

● Information Communication Technology (SenGupta) (2000)


● Computer-Mediated Communication Resource Page (Martin) (1997)

Community Development and Service

● Community Development Connections and Resources (King) (1997)


● Preparing 4-H for the Twenty-First Century (LaPolt) (1997)

Education

● Attention Deficit Hyperactivity Disorder (Scott-Pierce) (2000)


● High School Dropouts (Wells) (2000)
● Moral Development: Reasoning, Judgment, and Action (LaMar) (1997)
● STS School Science Programs (Chakane) (1996)
● Institute for Education Research in Central Asia (Samid Hussain) (1996)
● Emotions: Intelligence and Decision Making (Young) (1996)

Environment

● Environmental Sociology: A Resource Page (Sydenstricker-Neto) (1997)


● Indigenous Peoples and Environmental Initiatives in the U.S. (Maldonado) (1996)
● Agroforestry Research for Woodfuel Development (Mugo) (1996)
● Evaluation Tools for Integrated Pest Management Projects(Rueda) (1996)

Health

● BREAST FEEDING and HIV in DEVELOPING COUNTRIES (Biswas) (2000)


● THE LEAD RESOURCE PAGE (Jusko) (2000)
● Aids In Sub-Saharan Africa (Mensah-Darty) (2000)
● AIDS ORPHANS IN SUB-SAHARAN AFRICA (Ruiz-Casares) (2000)
● Today's Vision of Public Health (Canaj) (1997)
● Worksite Wellness Programs: A Health Promotion Initiative (Nieves) (1997)
● Self-Efficacy in Health Related Behavior Change (Walkley) (1997)
● Managed Care (Blakesley) (1996)
● Health Care in a Managed Care Environment (Levell) (1996)

International Development

● Women in Agriculture in Sub-Sahara Africa (Lo) (2000)


● BEEKEEPING AND SUSTAINABLE DEVELOPMENT (Ojwaya) (2000)
● Participation in International Development (Katsumoto) (1997)
● Poverty and Hunger in Sub-Saharan Africa (Rwampororo) (1997)
● Sampling in Developing Countries (Flynn) (1996)
● Evaluation of Agricultural Education Programs (Parapi) (1996)
● Spotlight on Evaluation Techniques in International Development Programs (Volz) (1996)

Management/Business

● The Internet Advertising Page (Cho) (2000)


● Contingent Valuation (Kong) (2000)
● Organizational Learning and Memory (Cho) (1997)
● Research in the Global Marketplace of Ideas (Rehberg) (1996)

Nutrition

● Evaluating Nutrition Programs (Durgan) (1997)


● Web Resources of Nutrition for the Elderly (Lee) (1997)
● Nutrition Education Resources (Jabs) (1996)

Research Methods

● Participatory Evaluation (Coffin) (2000)


● Participatory Evaluation Internet Resources (Raymer) (2000)
● Decision and Judgement Analysis (Belue) (1997)
● The PopStop: Topics on Population and Demography (Burns) (1997)
● Web Advertising and Statistics (Cheng) (1997)
● Measuring Environmental Attitudes - The New Environmental Paradigm (Pelstring) (1997)
● Integration of Qualitative and Quantitative Research Methods to Strengthen Internal Validity (Bowen) (1996)
● Welcome to Surveys On-line! (Brown) (1996)

Social Welfare

● Children's Rights Website (Bedard) (2000)


● Information Systems Technology for Human Services (Abrahams) (1997)
● Homeless Children in Mexico (Barrientos) (1997)
● Understanding Welfare Reform (Colosi) (1997)
● Child Welfare Resources on the Web (Johnson) (1997)
● Solution Focused Therapy for Child Welfare (Rymarchyk) (1997)
● Service Initiatives (Dreibe) (1996)
● Violence Against Women: Research on the Web (Ward) (1996)

Theology

● Who Is Jesus? (Kim) (2000)

Trochim Home Page

Copyright © 1996, William M.K. Trochim


Information Communication Technology - ICT

Aiming Towards Global Connectivity...

Information Communication Technology (ICT) is a powerful tool - a tool that can help build social networks
and contribute toward progressive social change. Increased access to information goes hand in hand with
socio-economic and overall human development.

Though the right to seek, receive and impart information is a basic human right, many people in the
developing world are deprived of it. Using ICT for development can be an effective way of reaching out to
rural and remote 'unconnected masses'.

When we talk about ICT we include telephones, faxes, audio-video equipment, radio, television and the
Internet. In the true spirit of creating an information society that fits the information age, efforts are being
made to cross digital divides and provide equal access to everybody. Development agencies and
communicators worldwide have taken up the challenge of bridging information gaps between the North
and the South.

This website introduces its readers to the concept of using ICT for development and provides links to sites
which deal with "Reaching the Unreached". The United Nations and its agencies have included ICT on
their agendas. Telecenters are mushrooming in the developing world, providing access and information on
issues like agriculture, education, health, human rights and legal counseling.

Success Stories
Grameen Telecom's village phone project is a landmark case study of ICT for development. The
Grameen Bank promoted micro-enterprise among women in villages in Bangladesh by enabling them to
retail phone calls on their cellular phones. This is also a classic case of 'leapfrogging': it shows how stages
of development can be skipped by adopting new technology.

Literature on ICT

If you want to read about the rapid development of information and communication technologies and their
sociocultural impacts, this book is a must. Issues such as freedom of the media, the role of public service
broadcasting, editorial independence, the use of the Internet in education, cultural pluralism, worldwide
access to information resources, challenges to intellectual property and censorship on the Internet are
discussed by eminent specialists from many backgrounds. Most importantly, the impact of information and
communication technologies on human development, and the role that governments should play in this
respect, is addressed. Regional chapters examine the extent to which telecommunications, computers and the
Internet reach developed and developing countries, urban and rural areas, the literate and the illiterate, the rich
and the poor.

● Publications on Information and Communication by IDRC


● Books on Development Communication

Telephone, fax, email and Internet access; telemedicine, distance education, news distribution and
telecommuting: these are some of the services offered by community Telecenters. At Cornell, the Department
of Communication is actively involved in the research and design of Telecenters in developing nations. Prof.
Royal D. Colle is a pioneer in the field of ICT and is presently looking into setting up Telecenters in rural
China, graduate student Raul Roman is studying the Telecenter movement in South Africa, and Ami Sengupta
is working on women and ICT in rural Nepal. We would be happy to share with you any information on ICT
as our projects develop.

Links:

● International Development Research Center


● Telecommons Development Group
● Digital Divide Network
● Telecenters in Developing Nations
● Association for Progressive Communications
● Communication for Development

The ICT movement is rapidly gaining popularity, and as the world moves towards becoming a true Global
Village, I am sure Information Communication Technology will play a very vital role in connecting the
world. However, when talking of ICTs and development there are some important considerations to be made.
Primary among them are: Do Telecenters truly respond to the communication and information
needs of the communities they are intended to serve? What impact do they have on social equity and
economic development? Are the privileged members of rural communities the first ones to get access to
ICTs, thereby continuing the hegemony of the information haves? How big an impediment is illiteracy
to the effectiveness of ICTs? Having considered these problem areas of the ICT discourse, promoters of new
information technology can hopefully overcome the barriers and make the world a real information
highway!

Website created by:


Ami Sengupta
4/16/00
If you do a search in AltaVista on the phrase "computer-mediated communication" you will be presented with a list of over 10,000
sites! While this is certainly better than coming up with no sites, a researcher could spend hours, days even, sifting through the
wealth of material available throughout the web that relates to this field. The question is, what is relevant from a research
perspective? While I can't presume to speak for all researchers, I have accumulated a number of links that I hope will prove useful to
people who are interested in exploring this field.

The study of Computer-Mediated Communication examines the nature of interactions that occur when people encounter each other in
the mediated spaces created by computer technology. CMC media include e-mail, video-conferences, multi-user domains (MUDs),
community networks, shared databases and a variety of tools that allow users to communicate and share information over computer
networks. The particular focus of my own research deals with CMC as it relates to collaboration and learning in applied contexts.
Because of this, the information that I have found useful often has an educational slant to it.

● CMC Scholarship
● Internet Methodology
● Research Labs and Institutes
● Virtual Communities

Copyright © 1997 Wendy Martin. All rights reserved.


CMC scholarship on the Web is rich and varied. It deals with many subtopics, some of which include CMC and
learning, Computer-Supported Collaborative Work (CSCW), CMC and Situated Cognition, CMC and work groups
and many more. You can find huge amounts of scholarly work by searching on any of these topics or by following
links from the sites I link to throughout this site, especially in the research laboratory and institutes section. Here, I
will just provide some links to places I have found useful.

● On-Line Journals
● On-Line Conference Proceedings
● Scholarly Resources

On-Line Journals

The Journal of Computer-Mediated Communication (JCMC) is a joint project of the Annenberg School for
Communication at the University of Southern California and the Information Systems Division of the School of
Business Administration, Hebrew University of Jerusalem. A good example of Web publishing, JCMC took the
idea of collaborative knowledge bases to heart by deciding not only to publish completely on-line (enabling free
access to the content), but to make the information interactive. Readers are able to submit comments about any of
the material that can then be read by others browsing through this site.

The CMC Studies Center was created by John December. This site not only provides students, teachers and
researchers with access to CMC Magazine, an on-line publication about human-computer interaction and
communication issues and policy, but also offers a huge collection of resources and information about programs and
courses that relate to the field of computer-mediated communication. This site is worth exploring in detail if you
want a comprehensive introduction to this field.

An example of an on-line educational technology journal is From Now On. This journal provides on-line articles
about integrating technology into educational environments and using network communications as an educational
resource.

The Information Society web site is more descriptive than the other on-line journals: it mainly provides
information about the content of the "traditionally" published journal. However, it does contain some of the content
of the actual journal, and can serve as a resource for pointing researchers toward valuable information and
highlighting the pertinent issues in the field.

Top of page

On-Line Conference Proceedings

The 1995 Computer-Supported Collaborative Learning (CSCL) Conference Proceedings are available on-line.
These were papers presented at this conference, which examine issues related to using technology to enhance
learning.

The ACM's 1996 Computer-Human Interaction (CHI) Conference Proceedings are on-line as well. This is a
comprehensive listing of not only the papers presented but also panel discussions, poster sessions and a great deal more.

Top of page

Scholarly Resources

This area is just sort of a melange of sites that I found interesting and useful for research.

A good introduction to the kinds of issues that CMC researchers deal with is a paper by John December entitled
"Un its of Analysis for Internet Communication."

John Seely Brown, of Xerox Palo Alto Research Center (PARC), and Rob Kling, of the University of California,
Irvine, are two of the researchers I have studied in the field of Computer-Mediated Communication. On-line
versions of their publications are available from these sites.

Who's Who in Educational Technology is a list of major projects in educational technology research.

A very good on-line resource for researchers is Theory and Practice, published by the Centre for Computer
Information Systems and Mathematics at Athabasca University, in Canada. This is a fully accredited open
university, specializing in distance education programs. This site provides information about many of the important
learning theories that inform researchers and educators who are interested in incorporating technology into learning
environments.

The Evaluation of Cooperative Systems project examines methods for evaluating computer-supported cooperative
work (CSCW).

Following are two bibliographies (unfortunately linkless) that list scholarly articles related to the field of
Computer-Mediated Communication.

Bibliography of Organizational Computer-Mediated Communication

ProjectH's Guide to Resources in Computer-Mediated Communication

Top of page

Copyright © 1997 Wendy Martin. All rights reserved.


The following research institutes and laboratories are doing important work in the field of computer-mediated
communication. Some concentrate on the development of technologies and products, others are more involved in
spearheading programs that integrate technology into learning environments, while still others are more interested
in the evaluation of communication technologies.

● Xerox PARC
● Institute for Learning Technologies
● Center for Children and Technology
● The Media Lab

● Center for Technology in Learning


● Interactive Multimedia Group

Xerox PARC

Headed by John Seely Brown, the Xerox Corporation's Palo Alto Research Center (Xerox PARC) is one of the
most important research institutes in the field of communication technologies. Its interdisciplinary focus
(researchers on staff include not only computer scientists and human-factors specialists, but anthropologists and
linguists) has enabled it to engage in a wide array of research projects that have produced valuable information in
the area of situated learning and cognition and collaborative work processes. The site states that:

"The Xerox Palo Alto Research Center (PARC) performs pioneering research that covers a broad spectrum of
research fields ranging from electronic materials and device research through computer-based systems and
software, to research into work practices and technologies in use. The center's mission is to pursue those
technologies that relate to Xerox's current and emerging businesses."

Top of page

Institute for Learning Technologies


Directed by Robbie McClintock, the Institute for Learning Technologies, "founded in 1986 at Teachers College,
Columbia University, works to advance the role of computers and other information technologies in education and
society." The site states that:

"The Institute is engaged in a number of large scale research projects intended to develop, test and implement
effective pedagogical approaches to the use of new information and communications technologies in education.
The Institute's subject focus areas address K-12 and undergraduate education in math, science, and engineering, the
social sciences, the arts and humanities, and graduate level professional studies in a variety of fields. The Institute's
scope of new media development efforts includes curriculum development, design and evaluation, faculty and
teacher development, and dissemination."

Top of page

Center for Children and Technology

The Education Development Center's Center for Children and Technology site states:

"For fifteen years, the Center for Children and Technology (CCT) has been at the forefront of creating and
researching new ways to foster learning and improve teaching through the development and thoughtful
implementation of new educational technologies. CCT has extensive knowledge of multiple technologies,
including:
Inter- and Intranet systems; distance learning applications, large systems in computer labs, and stand alone
software on individual computers.
CCT's work is centered in three areas: research, including basic, formative, and program evaluation; design and
development of innovative technology prototypes and products; and the implementation and operation of large-
scale technology integration efforts."

Top of page

The Media Lab

Boasting a staff of some of the most innovative researchers in technology studies, the Massachusetts Institute of
Technology's Media Lab is one of the leading institutions in researching and developing new communication
technologies. Their site says:

"MIT's Media Laboratory, founded in 1985, carries on advanced research into a broad range of information
technologies including digital television, holographic imaging, computer music, computer vision, electronic
publishing, artificial intelligence, human/machine interface design, and education-related technologies. Our charter
is to invent and creatively exploit new media for human well-being and individual satisfaction without regard to
present-day constraints. We employ supercomputers and extraordinary input/output devices to experiment today
with notions that will be commonplace tomorrow. The not-so-hidden agenda is to drive technological inventions
and break engineering deadlocks with new perspectives and demanding applications."
Top of page

Center for Technology in Learning

Stanford Research Institute's Center for Technology in Learning is headed by Roy Pea, the former education dean
at Northwestern University. The site claims:

"SRI's Center for Technology in Learning (CTL) focuses on significant issues in learning,teaching, and training
and the ways that the innovative use of technology can help address those issues. CTL's three main areas of work
are: advanced technology research, mathematics and science education, and evaluation of technology applications.

"Staff in the Center for Technology in Learning include cognitive scientists, education researchers,
trainingspecialists, and specialists in computer and telecommunications networking. The Center also draws on staff
from many specialties across SRI, including research engineers and scientists who are working with advanced
technologies."

Top of page

Interactive Multimedia Group

Headed by Geri Gay, Cornell University's Interactive Multimedia Group happens to be where I work now. While I
could provide my own description of the work the lab is engaged in, I believe that the site's description does a better
job than I would:

"The Interactive Multimedia Group (IMG) at Cornell University is an interdisciplinary research and design team
directed by Dr. Geri Gay in the Department of Communication. The goal of our research is to understand and
improve the expanding role of computers in communicating, learning, working, and playing. We study how
humans interact with computers, and how technology can mediate communication.

"Since 1985, the IMG has focused on the use of computers in schools and universities in areas as diverse as
engineering, entomology, language learning and art history. We have also explored the use of computers in
museums, libraries, and corporations.

"The IMG is currently involved with national and international collaborations such as the Getty Art History
Project, Digital Library: Making of America, and the Global Digital Museum."

Top of page

Copyright © 1997 Wendy Martin. All rights reserved.


The internet is a new and daunting territory for researchers. It is very difficult to even know how to amass the
quantitative and qualitative data researchers need to study internet usage. Furthermore, researchers are debating the
methodological approaches that have been taken in the field.

● The Marty Rimm Methodology Debate


● Internet Survey Data

● Software for Web Evaluation

The Marty Rimm Methodology Debate

An interesting debate over methodology used to study the internet took place as a reaction to a now infamous and
highly questionable study conducted at Carnegie-Mellon University by an undergraduate named Marty Rimm. The
study itself was published in The Georgetown Law Journal as "Marketing Pornography on the Information
Superhighway: A Survey of 917,410 Images, Description, Short Stories and Animations Downloaded 8.5 Million
Times by Consumers in Over 2000 Cities in Forty Countries, Provinces and Territories" by Marty Rimm.

But even before the study came out, Time Magazine published an article called "On a Screen Near You:
Cyberporn" by Philip Elmer-DeWitt, which used the results of Marty Rimm's research as the foundation for its
highly controversial claims about pornography on the internet.

This research project, and the resulting publications, set off a tumult of criticism about how the study was
conducted, the methods used and the reliability of the data that was generated. While the study itself has been
pretty well discredited, the issues it brought to the forefront have forced researchers to articulate their concerns
about methodologies for studying the Internet. Donna L. Hoffman and Thomas P. Novak of Vanderbilt University
have created a comprehensive site--The Cyberporn Debate--which presents both of the articles above and the many
critiques written in response to this flawed research project, including their own articles:

A Detailed Analysis of the Conceptual, Logical, and Methodological Flaws in the Article: "Marketing
Pornography on the Information Superhighway"
and
A Detailed Critique of the TIME Article: "On a Screen Near You: Cyberporn (DeWitt, 7/3/95)"
There are also links to critiques of the ethics of this particular study:
THE ETHICS OF CARNEGIE MELLON'S "CYBER-PORN" STUDY
by Jim Thomas, as well as a critique of Rimm's method for gathering scholarly support for his research:
THE RIMMJOB METHOD (alternatively, THE MARTY METHOD)
by Mike Godwin.

Top of page

Internet Survey Data

Vanderbilt University's Project 2000 offers a resource for web measurement data to researchers, including the
CommerceNet/Nielsen Internet Demographics Survey, which many researchers feel is the best study available of
internet usage (though it comes at a fairly high price!). The Project 2000 site states that:
"Project 2000 is a five-year sponsored research effort devoted to the scholarly and rigorous investigation of the
marketing implications of commercializing hypermedia computer-mediated environments (CMEs) like the World
Wide Web and other emerging electronic environments."

For free Internet Survey information, check out The Graphics, Visualization, & Usability (GVU) Center's World
Wide Web User Surveys.

The Flashlight Project "is developing survey items, interview plans, cost analysis methods, and other procedures
that educational institutions can use to monitor the success of their educational strategies that use technology."

Top of page

Software for Web Evaluation

The following software allows you to obtain statistical data about the activity on your web site from your site logs--
the kinds of users who are visiting the site, which browsers they use, where they are linking from--and presents the
information in graphs and charts that are easy to understand. Of course, such convenience often comes at a price,
though some of these products allow you to "test drive" the software free of charge for a limited time, and some are
free! (A rough sketch of what this kind of log analysis involves appears after the list below.)

Surf Report Web Site Statistics


Web Manage
WWWstat--this one is free!
MKStats--free for personal use!
Statisphere
LogDoor

Top of page

Copyright © 1997 Wendy Martin. All rights reserved.
So, what is a "Virtual Community"? Well, in short, it's an area of cyberspace that connects users and enables them
to interact with each other in a variety of ways. Is that vague enough for you? The problem with defining Virtual
Communities is that they come in many shapes and sizes. Another problem with my defining the concept of Virtual
Communities is that, as is true of almost anything in this information-rich world, other people have already done it,
and probably better than I ever could.

Howard Rheingold's book, appropriately named, The Virtual Community, is online and at your disposal to peruse
so you can learn more about the kinds of communities I am referring to here. It is an excellent account of how
people have used computers and network communications not to isolate themselves, but rather to build
relationships with others.

Another good resource is a Master's Thesis called Communities On-Line: Community-Based Computer Networks
by a woman named Anne Beamish at MIT. She distinguishes between "Virtual Community" and "Community
Network" because, as she states, "community networks are located in and support a specific physical place." These
networks make up in relevance what they lack in reach, providing information about the community to local
residents and encouraging their participation in community affairs.

To find out more about Virtual Communities, see the following:

● Community Networks
● MUDs and MOOs
● Virtual Communities in Education
● The Problem with Virtual Community

Community Networks

One of the most comprehensive community networks is the Blacksburg Electronic Village , based in Blacksburg,
Virginia, where Virginia Tech is located. This network provides, for residents and visitors alike, a wealth of
information about the town: maps, history, weather forecasts, schools, etc. The Blacksburg project set as its goal to
wire every household in the town, so that the network would truly be a reflection of the community. What makes
this network unique, and useful to researchers, is that the project administrators are committed to evaluating this
project and putting that research information on-line so others who are interested in forming electronic
communities can learn from the Blacksburg community's efforts.
The Well is one of the oldest Virtual Communities. It began (not surprisingly) in the San Francisco Bay area, but it
has members from all over the world. The Well is a vast collection of conferences where members can discuss a
variety of topics and gather information from the community members themselves. The "residents" of The Well
consider this space a true community, and value the relationships that are created here. Rheingold's book The
Virtual Community goes into detail about the activities of Well denizens.

Another well-established virtual community is the Echo Virtual Salon, based in New York City's West Village. As
with The Well, the 3,000 members here can meet other members on-line and discuss a variety of topics. The focus
is on information about New York City, with a concentration on the arts and culture of the city, and the members
often meet face-to-face at events organized through Echo.

If you're not quite ready to join a full-fledged virtual community, you might just want to test the virtual waters by
joining a listserv or newsgroup that discusses topics of interest to you. Tile Net is a search engine that allows you
to search all the listservs and newsgroups available on the Internet, in order to find one that appeals to you.
Researchers who are interested in examining communication behavior on the Internet may find this particularly
useful for randomly selecting various groups for study.

Top of page

MUDs and MOOs

What on earth is a MUD, or a MOO for that matter? Not surprisingly, this question has been answered already by
many other people. I will simply define a MUD as a multi-user dungeon, or multi-user domain (for those who
would like to forget the adventure game beginnings of MUDs). MOO stands for MUD object-oriented. What does
all this mean? MUDs and MOOs are spaces where users can engage in synchronous communication with other
users. These spaces are usually text descriptions of rooms or areas through which the participants can move,
though there now are a number of graphical MOOs, and even some 3-dimensional virtual reality spaces. For more
comprehensive information about MUDs and MOOs, see the MUD/MOO Support Page from Pacific Coast Net.
For information about the history of MUDs, see the MUDdex.

Whew! Now that I've passed that baton, I can get on to what I want to talk about in relation to MUDs and MOOs,
which is that they are another kind of Virtual Community. If as a researcher you simply want to investigate this
culture, there is a list of Web MOOs that you can access. However, if you would like to participate on a MOO that
is specifically designed for media researchers, go to MIT's MediaMoo. Instructions for access and use of this
MOO can be found at the Netoric Project site.

Top of page

Virtual Communities in Education

Not surprisingly, educators who are interested in integrating technology into the classroom have seized on the idea
of Virtual Community as a way for teachers and students to engage in constructive, participatory learning. Other
educators have seen the topic of Virtual Communities as a subject worthy of study and have designed whole
courses about it.
The Well-Connected Educator is a site where teachers, administrators, parents and students can not only access
information but also serve as content providers themselves. They can read articles written by others in
the education community and submit their own articles. They can also join conferences for discussing issues
related to education.

The Knowledge Network provides a site for Virtual School Field Trips, which is a rich central resource for access
to numerous on-line museums and educational environments.

Joan Mazur and Traci Bliss from the Department of Curriculum and Instruction at the University of Kentucky
conducted a study of a network system called CASENET, which is designed to help create a feeling of community
and support among teachers as they attempt to bring technology into their school's classes.

WebCT "is a tool that facilitates the creation of sophisticated World Wide Web-based educational environments. It
can be used to create entire on-line courses, or to simply publish materials that supplement existing courses."

Diversity University is an educational MOO which is designed to be used in classrooms. Teachers can use this
MOO as a framework for creating on-line teaching environments. These can be specifically for the use of a single
group of students who are physically located in the same space, or they can be used to create virtual school rooms
that connect students from all over the world.

Often the virtual community picture is presented as quite rosy, but M. A. Syverson, the Associate Director of the
Computer Writing and Research Lab at the University of Texas at Austin, has examined the use of MOOs in
education and finds there are still challenges to overcome. The on-line article Problems in Evaluating Learning in
MOOs and MUDs: Preliminary Working Models discusses the issues that need to be addressed.

The University of Rochester has a seminar series on Digital Culture given by David Rodowick. The online class
overview is a useful resource for links to literature about this field that can be accessed on the Web. In particular,
check out the unit on Virtual Communities, and be sure to look through the bibliography at the end of the page.

Top of page

The Problem with Virtual Community

Not everyone views the concept of Virtual Community in a completely positive light. Langdon Winner, in his
article "Who Will Be in Cyberspace?", questions the oft-cited notion that Virtual Communities will create a
culture of participatory democracy.

"Virtual Communities: Abort, Retry, Failure?" by Jan Fernback & Brad Thompson criticizes what the authors
believe is a misunderstanding of the concept of community.

"Virtuality and Its Discontents: Searching for Community in Cyberspace" is adapted from the book Life on the
Screen by Sherry Turkle. It examines the character of the kinds of community that are created with network
communications.

"A Rape in Cyberspace: How an Evil Clown, a Haitian Trickster Spirit, Two Wizards, and a Cast of Dozens
Turned a Database Into a Society" by Julian Dibbell is a bizarre story about the difficulty in drawing the line
between real life and virtual life. The residents of LambdaMOO experienced a virtual rape right in their "living
room" and this forced them to evaluate the meaning of freedom, ethics and morality in virtual spaces.

Top of page

Copyright © 1997 Wendy Martin. All rights reserved.


COMMUNITY DEVELOPMENT
ORGANIZING AND ACTIVISTS
CONNECTIONS AND RESOURCES
● Introduction
● Contents Table

Across the country and all over the Web individuals who are working to re-build civic connections and develop strong
communities have a wealth of information to share. The Community Development, Organizing and Activists Connections and
Resources page is provided to help community members and community development professionals access the information
available on the World Wide Web and to build connections through access to Web Sites, Listserves and On-line Publications,
and a Directory of Program Evaluation Resources.

This site is designed to serve a more specific purpose. . .

It is true that all communities must have access to certain basic resources to remain healthy and
strong. Many communities that have faced economic decay have lost direct connections with
those resources. Parents and others who can afford to pay for resources often access them freely
and appropriately. Whether it is through access to high quality education or health systems, or
access to drug rehabilitation and counseling for a troubled family member, many financially
secure families can access these services through paying school tax, writing a check or utilizing
their employer's health insurance benefits. Unfortunately families without financial resources
are often made to feel a sense of shame when they try to access the very same resources.

Inability to pay for the resources privately may often mean that the resource gets relabeled as a 'service'.
Individuals who pay for services are seen as appropriately
accessing resources in order to realize their full potential.
Lower income individuals are seen as relying on 'services'
when they too attempt to access the same information,
counseling, training, etc.

In order to begin leveling the community health playing field, we must work on making resources equally available to ALL
individuals and families, no matter what their income levels or method of payment may be. . .
But that is still only one piece of the puzzle. . .

In addition to providing links to vital resources and helping encourage community self-direction,
community development and organizing work must focus on building ethnic pride and
respecting and utilizing cultural strengths and self-identification. Key issues such as meaningful
political involvement, fostering economic development through personal efficacy, skill building
instead of continuing dependency, and developing spiritual growth for individuals, families and
communities as a whole must also be addressed. The philosophical ideologies, mythology and
cultural identities of each ethnic group must be studied and respected in order to respond to the
needs found within each community. Designing programs that are based on someone else's culture and then
attempting diversity by forcing another culture to bend, twist and conform to the standards of the majority
culture is not practicing diversity. It may be slow genocide, but it is not diversity. One must learn to listen to and respect the
internal views of the culture and the sociological and psychological development of an ethnic group. In the past, diversity has
often meant that people were classified by the ways that they seemed to vary from the 'norms' of the ruling majority culture.
That viewpoint is no longer viable. (In fact it never was!)

I have therefore begun the process of incorporating links to resources such as ethnic and spiritual development and
community organizing training resources along with others that address general topics, such as job training, job search,
childcare, health resources, community policing, and connecting with other community development and organizing groups,
etc.

The outer perimeter of the Community Development resources table lists ten areas of vital community resources. The inner
section of the table lists links to reference information that may help in putting those resources to work in your community.

As you will see the site is still in its first stages of development. Furthermore, because the information available on the web
grows and changes on a daily basis, your assistance in keeping this site up to date is greatly appreciated. Please send
information about changes and/or additions to gmk5@cornell.edu. Include the URL address and title of the recommended
link and a brief description of the resource. Your input is greatly appreciated!

Strengthening Spirituality, Ethnic Identity and Understanding
● Religious and Morality Building Resources
● Ethnic Pride
● Developing Understanding

Kinship and Fictive Kinship
● Networks of Support for Parents
● Parenting Education Resources

Political and Civic Involvement
● Federal Resources
● State / Local Resources
● Voter Participation Resources
● Legal Assistance and Information

Building Strong, Youth Focused In-School and After School Activities
● Youth Education Resources
● Teen focused and Child focused lists of computer skill building resources
● Teen Employment and Career Development
● School-to-Work Resources
● Accessing College
Includes links to: College Access Program Resources, College Scholarship Information (for students of all ages),
Lists of On-line College Resources, Technical Training and Job Preparation Resources.

Personal Efficacy / Economic Opportunity Development
● Self-employment and Resources to Start your own Business
● Adult Employment Resources
● On-line Help Wanted Lists
● Adult Welfare-to-Work Resources
● Links to Technical Training and Job Preparation Resources
● Adult Education Resources
● Lists of On-line College Resources
● College Scholarship Information (for students of all ages)
● Teen focused computer skill building resources
● Child focused computer skill building resources
● Computer Skill Building Resources (Adult Focused)

DIRECTORY OF COMMUNITY DEVELOPMENT WEB SITES, LISTSERVES AND ON-LINE PUBLICATIONS

DIRECTORY OF PROGRAM EVALUATION RESOURCES

Personal Efficacy / Access to Health Care
● Health Care Resources
● Substance Use Recovery and Prevention Resources

Who's Who in Community Development, Organizing and Activism
● Community Centers
● Community Development, Organizing and Activism Projects
● Community Development, Organizing and Activism Training Resources

Safe Child Care
● Child Care Resources
● The planning for this future site addition includes links to Head Start Web Pages and Evaluations from across the country.
● Web Resources for Young Children
SITE UNDER CONSTRUCTION

Adequate and Affordable Housing
● Housing and Public Works Projects
● Human Rights Resources

Personal Efficacy / Community Safety and Security
● Neighborhood Watch Resources
● Police / Civilian Review Board Resources
● Environmental Justice Resources
SITE UNDER CONSTRUCTION

Recreational Opportunities
● Camping Resources
● Summer College Resources
● Family Vacation Resources
● Web 'Recreational' Activities Resources
SITE UNDER CONSTRUCTION

Return to top of page

WEB SITE, LISTSERVE DIRECTORY | EVALUATION DIRECTORY | Spirituality / Ethnic Identity | Kinship | Political and Civic Involvement | School /
Youth | Personal Efficacy / Economic Opportunity | Personal Efficacy / Health Care | Community Development, Organizing and Activism | Child Care | Housing
| Personal Efficacy / Community Safety and Security | Recreational Opportunities


COMMUNITY DEVELOPMENT CONNECTIONS AND RESOURCES WEB SITE Copyright © 1997 Georgette M. King gmk5@cornell.edu. All Rights
Reserved.

DATE LAST UPDATED: 4/18/97


COMMUNITY DEVELOPMENT
ORGANIZING AND ACTIVISTS
CONNECTIONS AND RESOURCES
DIRECTORY OF COMMUNITY DEVELOPMENT
WEB SITES, LISTSERVES AND ON-LINE PUBLICATIONS
This directory summarizes the links that appear in the COMMUNITY DEVELOPMENT CONNECTIONS AND
RESOURCES WEB SITE and contains a listing of LISTSERVES and ON-LINE PUBLICATIONS that may be of interest to
families and communities. An extensive list of related LISTSERVES can be found at www.cas.psu.edu/docs/cyfernet/handout4.html.

Because the information available on the web grows and changes on a daily basis, your assistance in keeping this site up to
date is greatly appreciated. Please send information about changes and/or additions to gmk5@cornell.edu. Include the URL
address and title of the recommended link and a brief description of the resource.

● A
● WEB SITES

Adams JobBank Online


http://www.adamsonline.com/
The Adams Media Co. specializes in self-help books for those seeking work and job mobility. This online resource provides
leads to jobs in computers, "general" work areas, education, social work, sales, health care, etc. It also has a database
searchable by state.

African American Women: Breast cancer and mammography facts


http://www.graylab.ac.uk/cancernet/600616.html

African American Women Online


http://www.uoknor.edu:80/jmc/home/gmccauley/text.html
Includes links for:
African American Women's Health Issues
Famous African American Women Writers
Links for African American Women Who Write

African Centered Educational Homepage


http://home.earthlink.net/~bcd227/index.html

The Afro-American Newspaper Company of Baltimore, Inc.


http://www.afroam.org/index.html
Along with a vast array of news and information for children, teens, and adults alike, this site also contains a list of
Web Links to Africa http://www.afroam.org/culture/africa/africa.html

American Evaluation Association (AEA)


http://www.eval.org/
"We are an international professional association of evaluators devoted to the application and exploration of program
evaluation, personnel evaluation, technology, and many other forms of evaluation. "
America's Job Bank
http://www.ajb.dni.us./
This is a MASSIVE site with information on about 250,000 jobs. Listings are from 1,800 state employment service offices
around the country and represent every line of work from professional and technical to blue collar and entry level.

AVON'S BREAST CANCER AWARENESS CRUSADE


http://www.pmedia.com/Avon/library.html
Includes:
Glossary of Terms
Breast Cancer FAQ
More Facts about Breast Cancer
What You Should Know About Mammograms
Breast Cancer Support Groups

● LISTSERVES
● ON-LINE PUBLICATIONS

Return to top of page

● B
● WEB SITES

"Base Communities: Citizen Action at the Grassroots"


http://www.cpn.org/sections/topics/religion/stories-studies/black_baltimore8.html
A study of religious "base communities" in Baltimore. Base Communities are small, intimate peer groups of a dozen or two
dozen people, in which participants can evaluate the day's struggles, commiserate with one another's failures, celebrate
success, and plan for the next day's fight in larger public arenas. Excerpted from Harold A. McDougall's Black Baltimore: A
New Theory of Community. Case study plus. Baltimore, MD.

Biddeford, Maine, Police Department


http://www.lamere.net/bpd

Boldface Jobs
http://www.boldfacejobs.com/
This site not only allows you to search various databases for jobs but also to enter your name and resume in the database as
well.

Building Diverse Communities


http://www.cpn.org/sections/topics/community/stories-studies/pew_diverse_com.html
From the Sea Islands, SC. Excerpts from "Black Baltimore, A New Theory of Community", by Harold A. McDougall:
http://www.cpn.org/sections/topics/community/stories-studies/black_baltimore7.html

● LISTSERVES
● ON-LINE PUBLICATIONS

Return to top of page


● C
● WEB SITES

CES Welfare Reform Site


http://www.cyfernet.mes.umn.edu/welfare.html
". . . a compilation of reports explaining the Welfare Reform legislation and its impact, and Cooperative Extension System
resources to help individuals, families, agencies and the community respond to these changes. "

CNN Financial Information


http://www.cnnfn.com

CareerPath
http://www.careerpath.com/info.html
This site allows you to check out the want ads in the Sunday editions of the New York Times, The Washington Post, the
Chicago Tribune, the L.A. Times, the Boston Globe, and other newspapers. If you are targeting employment in a particular
city, here's the place to go. Unfortunately, other than the New York Times, no other New York papers are listed
here.

Careers and Jobs


http://www.starthere.com/jobs/
An exhaustive set of databases, this site carries hundreds of job listings and links to other sites in the areas of academia, law
enforcement, chemistry, industry, computer, and the environment. It also provides links to groups offering resume services,
job fairs, and training and development.

Center for Civic Education Programs and Publications


http://www.primenet.com/%7Ecce/catalog.html

Chicago, Ill., Police Department


http://www.ci.chi.il.us/CommunityPolicing

Child Welfare Resources on the Web


http://trochim.human.cornell.edu/gallery/johnson/johnson1.htm

Children's Behavior, Development, Emotions and Psychology Site Listings


http://kidshealth.org/parent/behavior/index.html
A resource for parents provided by KidsHealth http://kidshealth.org/

Christmas and Hanukkah Cards and Letter to Santa Page


http://www.marlo.com/allages.htm
This is an interactive page for adults and kids to send letters to anyone they want--including Santa! This site has lots of
other, non-seasonal fun things for kids, too.

Christmas Crafts Page


http://www.merry-christmas.com/crafts.htm
This page has numerous holiday craft ideas for parents and kids to do together.
College Bound
http://www.cbound.com/
College Bound is a non-profit organization working with D.C. public school students who want to go to college -- but need
a helping hand. The program was started in 1991 as "The D.C. Partnership for Education" by a teacher, a Capitol Hill staffer
and an officer of the Congressional Black Caucus. College Bound services uniquely combine mentoring, tutoring, and
positive life experiences with the opportunity for scholarship assistance.

College Funding Site


http://http.tamu.edu/~gac3280/.money_html/dollar.html
Thinking about those college application deadlines? This web site contains information on various scholarships, internships,
grants, etc. for minority students.

Community Policing Consortium


http://www.communitypolicing.org
Community policing topics are updated monthly and include an array of information such as training sessions and training
curricula, versions of the Community Policing Exchange, Sheriff Times and Information Access Guide, and a growing list of
related web sites.

Cornell Legal Information


http://www.law.cornell.edu

Cornell University's John Henrik Clarke Africana Library


http://www.library.cornell.edu/afri cana/
The John Henrik Clarke Africana Library provides a special collection focusing on the history and culture of peoples of
African ancestry. The library serves as a bibliographic reference and referral center by providing access to African and
African American resources available either in the Cornell University Library or collections at other institutions.
This site includes links to:
ABOUT THE CLARKE AFRICANA LIBRARY
SELECTED AFRICANA LIBRARY RESOURCES
SELECTED NEW BOOKS
AFRICANA STUDIES & RESEARCH CENTER
CORNELL AFRICAN & AFRICAN-AMERICAN COLLECTIONS
MBATA BOOK FUND
UPCOMING LOCAL EVENTS
LIBRARY CATALOGS
INSTITUTE FOR AFRICAN DEVELOPMENT
AFRICANA STUDIES WEB SERVERS

Cornell Youth and Work Program


http://www.human.cornell.edu/youthwork/
Work Force Preparedness and School to Work program information.

CYFERNet
http://www.cyfernet.mes.umn.edu:2400/
The Cooperative Extension System's children, youth and family information
service. Contains links to community and family development areas including:
Health
Child Care
Building Organizational Collaborations
Promoting Family Strength
Science and Technology Literacy
Strengthening Community
Welfare Reform
Diversity
Work Force Preparedness and School to Work
● LISTSERVES

CHILDREN ACCESSING CONTROVERSIAL INFORMATION


A mailing list for discussing children accessing controversial information through computer networks. See
http://www.zen.org/~brendan/caci.html for more information.
Instructions:
SUBSCRIBE: send email to caci-request@media.mit.edu with the command: subscribe
UNSUBSCRIBE: send email to caci-request@media.mit.edu with the command: unsubscribe
TO SEND MAIL TO THE GROUP: (information not available at this time) Contact: caci-owner@media.mit.edu
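
For those curious what this step looks like in practice, here is a rough, illustrative Python sketch: the subscription "command" is simply the body of an ordinary e-mail sent to the list's request address (taken from the instructions above). The outgoing mail server and the sender address below are placeholders, not part of the instructions; replace them with your own.

    import smtplib
    from email.message import EmailMessage

    # Illustrative only: subscribing to a listserv means mailing the list's
    # request address a message whose body contains the command.
    SMTP_HOST = "localhost"            # assumption: your own outgoing mail server
    YOUR_ADDRESS = "you@example.edu"   # assumption: your own e-mail address

    msg = EmailMessage()
    msg["From"] = YOUR_ADDRESS
    msg["To"] = "caci-request@media.mit.edu"  # request address from the entry above
    msg["Subject"] = ""                        # the subject is typically ignored
    msg.set_content("subscribe")               # the command goes in the message body

    with smtplib.SMTP(SMTP_HOST) as server:
        server.send_message(msg)

Sending the same one-word message by hand from any mail program accomplishes exactly the same thing.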

CHILDREN, YOUTH & FAMILY DISCUSSION


A general discussion of issues related to the health, education and well being of children, youth and families.
Instruction:
SUBSCRIBE: send email to listserv@tc.umn.edu with the command: subscribe CYF-L your name
UNSUBSCRIBE: send email to listserv@vm1.spcs.umn.edu with the command: unsubscribe CYF-L your name
TO SEND MAIL TO THE GROUP: CYF-L@vm1.spcs.umn.edu
Contact: Lori Bock, cyfcec@maroon.tc.umn.edu

COMMUNITIES IN ECONOMIC TRANSITION DISCUSSION GROUP


The Communities in Economic Transition national initiative focuses on providing two types of assistance to communities:
strategic planning, and business assistance and enterprise development. This forum is open for practitioners, academics, and
others to share ideas, programs, and opinions. It is open to everyone, regardless of area of specialty or discipline.
Instructions:
SUBSCRIBE: send email to almanac@esusda.gov with the command: subscribe cet-mg
UNSUBSCRIBE: send email to almanac@esusda.gov with the command: unsubscribe cet-mg
TO SEND MAIL TO THE GROUP: cet-mg@esusda.gov
Contact: Todd Landfried, CSREES-USDA (tlandfried@reeusda.gov)

● ON-LINE PUBLICATIONS

Return to top of page

● D
● WEB SITES

Decisions for Health Network (NDHN)


http://www.oznet.ksu.edu/ndhn/index.htm
"The mission of the National Decisions for Health Network (NDHN) is to support and empower individuals, children,
youths, families and communities as they make decisions about health and health care and adopt healthy lifestyles. This
network will provide technical assistance to the children, youth and family customers and programs focused on health issues
across the life span."

● LISTSERVES

DIVERSITY FORUM
Discussion and dissemination of information about program and audience diversity within Cooperative Extension.
Instructions:
SUBSCRIBE: send email to majordomo@reeusda.gov with the command: subscribe usda.diversity
UNSUBSCRIBE: send email to majordomo@reeusda.gov with the command: unsubscribe usda.diversity
TO SEND MAIL TO THE GROUP: usda.diversity@reeusda.gov
Contact: Manju Seal, seal@admin.uwex.edu, 608-265-3589

● ON-LINE PUBLICATIONS

Return to top of page

● E
● WEB SITES

English teacher - MARC ZICARI


http://www.webster.monroe.edu/marc
You might want to take a look at his page - it contains some interesting stuff for English teachers.

Evaluators, Links of Interest to


http://www.eval.org/links.htm

● LISTSERVES

EDNET
A discussion of educational uses of the Internet. This is a very active list... you'll get a lot of mail from it!
Instructions:
SUBSCRIBE: send email to listproc@lists.umass.edu with the command: subscribe EdNet Your Name
UNSUBSCRIBE: send email to listproc@lists.umass.edu with the command: unsubscribe EdNet
TO SEND MAIL TO THE GROUP: ednet@nic.umass.edu
Contact: ednetmgr@educ.umass.edu

● ON-LINE PUBLICATIONS

Return to top of page

● F
● WEB SITES

The Family Village Web site


http://www.familyvillage.wisc.edu/lib_sids.htm#Articles
Waisman Center
University of Wisconsin--Madison
1500 Highland Ave.
Madison, WI 53705-2280
Includes links to other SIDS related WEB sites.
Fathers' Resource Center
http://freenet.msp.mn.us/org/frc/lst.html

Federal Jobs
http://www.fedworld.gov/jobs/jobsearch.html
For those interested in working for Uncle Sam, here's a listing of all government agency jobs around the country. You can
limit your search by region or engage in more general searches by job types.

● LISTSERVES

FATHERNet DISCUSSION GROUP


A forum for discussing the importance of fathers in children's lives. Thoughtful dialog on spanking, child support, time
management, equality in parenting, and men concerned about being good fathers.
Instructions:
SUBSCRIBE: send email to listserv@tc.umn.edu with the command: subscribe father-L your name
UNSUBSCRIBE: send email to listserv@vm1.spcs.umn.edu with the command: unsubscribe father-L
TO SEND MAIL TO THE GROUP: father-L@vm1.spcs.umn.edu
Contact: Lori Bock, cyfcec@maroon.tc.umn.edu

● ON-LINE PUBLICATIONS

Return to top of page

● G
● WEB SITES

Glossary of Internet Terms


http://www.matisse.net/files/glossary.html
Wondering what people are talking about when they say words and phrases like ASCII, Java, TCP/IP, and Spamming?
Here's the site to help you make sense of it all.

● LISTSERVES
● ON-LINE PUBLICATIONS

Return to top of page

● H
● WEB SITES

HUD HOME PAGE


http://www.huduser.org/ and the text version can be found at: gopher://huduser.org
The U.S. Department of Housing and Urban Development (HUD) is expanding its use of the "information superhighway" to
give the public quick access to HUD programs and services and to make it easier for people to do business with the
Department.
HUD's home page is not just about HUD; it's a clearinghouse of information on homes and communities. It provides
information from other federal agencies, non-profits and the private sector through links to about 175 other Web sites.
Individuals can access information on job skills training, community services, employment opportunities and education.
You will also learn how HUD funds are being spent by local governments, along with other data summaries and maps on
nearly 1,000 communities.

The Human Rights Web


http://www.gen.emory.edu/medweb/www.hrweb.org

Humanus Employment Site


http://www.humanus.com/employment.html
Contains links to several sites for jobs especially within New York State. You can link to the "Monster Board," Career
Plaza, Federal Jobs Digest, NY Search Job Descriptions, Syracuse Online, Western NY Jobs, and the "Silicon Alley Job
Board" through this site.

● LISTSERVES
● ON-LINE PUBLICATIONS

Return to top of page

● I
● WEB SITES

IGC's Human rights information on the Internet


http://www.igc.apc.org/igc/issues/hr/#orgs
Includes:
Relief and humanitarian aid efforts
Actions Alerts
Human Rights News and Recent Developments
Documents & Clearinghouses
Organizations, including Children's Rights

The Innocence Project


http://www.criminaljustice.org:80/PUBLIC/innocent.htm
The Innocence Project has helped to obtain the release of more than eight innocent prisoners with new DNA tests and
evidence which excluded them as participants in crimes for which they had been convicted.

IthacaNet
http://www.ithaca.ny.us/

● LISTSERVES
● ON-LINE PUBLICATIONS

Return to top of page


● J
● WEB SITES

Juvenile Justice World Wide Web (WWW) Site Listings


http://www.ncjrs.org/jjwww.htm

● LISTSERVES
● ON-LINE PUBLICATIONS

Return to top of page

● K
● WEB SITES

Kids Internet Delight


http://allison.clark.net/pub/journalism/kid.html
This list of online resources includes 80 places for kids to go with and without their parents. Definitely worth the trip.

KidsHealth
http://kidshealth.org/
Health education for kids and parents from the Nemours Foundation with separate browsing areas for kids, parents and
professionals. Includes many lists of links to family health related sites.

● LISTSERVES
● ON-LINE PUBLICATIONS

Return to top of page

● L
● WEB SITES

Lawyers Committee for Human Rights


http://WWW.LCHR.ORG/

● LISTSERVES

LIMITED RESOURCE AUDIENCES DISCUSSION GROUP


This discussion group is designed for Extension staff at county, state, and federal levels to share information about reaching
and teaching limited resource audiences.
Instructions:
SUBSCRIBE: send email to almanac@esusda.gov with the command: subscribe lra-mg
UNSUBSCRIBE: send email to almanac@esusda.gov with the command: unsubscribe lra-mg
TO SEND MAIL TO THE GROUP: lra-mg@esusda.gov
Contact: Jane Schuchardt, CSREES-USDA, (jschuchardt@reeusda.gov)

● ON-LINE PUBLICATIONS

Return to top of page

● M
● WEB SITES

MedWeb: Public Health


http://www.gen.emory.edu/medweb/medweb.ph.html#Sites

Moral Development: Reasoning, Judgment, and Action


http://trochim.human.cornell.edu/gallery/lamar/lamar1.htm

● LISTSERVES
● ON-LINE PUBLICATIONS

Return to top of page

● N
● WEB SITES

NYSDA's Home Page


http://www.nysda.org/

NATIONAL ACTION PLAN ON BREAST CANCER (NAPBC) Web site


http://www.napbc.org/

National Center for Diversity


http://www.cyfernet.mes.umn.edu/
Touches on some areas of diversity without going very far in depth, or towards building an understanding of the viewpoints
and issues of non-white Americans. Still, there is a little information here that may be of some use. . .

National Criminal Justice Reference Service (NCJRS)


http://www.ncjrs.org
gopher://ncjrs.org
NCJRS is the largest criminal justice information network in the world. The National Institute of Justice (NIJ) created
NCJRS to furnish research findings to professionals who use the knowledge to improve the criminal justice system. NCJRS
operates specialized clearinghouses that are staffed by information specialists who offer reference, referral and distribution
services.
National Network for Collaboration
http://www.cyfernet.mes.umn.edu:2400/build.html
"The information presented here utilizes the knowledge and expertise of specialists from the National Network for
Collaboration to provide a guide to begin, strengthen and sustain the collaborative journey, for the building and sustaining
of positive change."

National Network for Family Resiliency (NNFR)


http://www.agnr.umd.edu/users/nnfr/

The National Network for Family Resiliency (NNFR) Evaluation Work Team
http://www.agnr.umd.edu/users/nnfr/evaluation.html
". . . (P)rovides abstracts of evaluation tools and a bibliography of evaluation resources that address issues identified by
Special Interest Groups."

National SIDS Resource Center Web site


http://www.circsol.com/SIDS/
2070 Chain Bridge Rd., Suite 450
Vienna, VA 22182
Telephone: 703-821-8955
(703) 821-2098 (Fax)
info@circsol.com

Neighborhood Assoc. Manual


http://www.open.org/scserv/cid.html
From Salem, Oregon

● LISTSERVES
● ON-LINE PUBLICATIONS

The Net Magazine Online


http://www.thenet-usa.com
This magazine contains many interesting developments and web sites. Especially interesting and helpful are the "Sites of the
month" and the "blue pages."

Return to top of page

● O
● WEB SITES

100 BMC NEWS


http://www.govirtual.com/100bmc/
News, Information, and Events of 100 Black Men of Chicago.
The 100 Black Men of America, Inc. Mission Statement:
"To improve the quality of life of our citizens and enhance educational opportunities for African-Americans and minorities,
through its chapters, in all communities -- with a particular emphasis on young African-American males."
100 Black Men Of America, Inc. Charlotte Chapter
http://www.charweb.org/organizations/nonprofits/100bm/index100.htm
News, Information, and Events of 100 Black Men of Chicago.
The 100 Black Men of America, Inc. Mission Statement:
"A group of concerned African American men whose goal is to improve the quality of life in the African American
community through their collective resources, abilities, and experiences.
In developing its programmatic agenda, the 100 Black Men recognized the challenges confronting our African American
Youth, and strive to provide leadership and support in addressing these and other critical needs of the Black Community."

100 Black Men Of America, Inc. Tallahassee Area Chapter


http://www.freenet.tlh.fl.us/~tacot1bm/index.html

100 Black Men Of Western Pennsylvania


http://hillhouse.ckp.edu/services/100bm.html
"Dedicated to providing our youth positive role models, educational assistance and creative alternatives to achieving
personal success."

Online Resources for African American Women and Womanist Studies


http://www.uic.edu/~vjpitch/

● LISTSERVES
● ON-LINE PUBLICATIONS

Return to top of page

● P
● WEB SITES

Print Publications about the Internet


http://www.hec.ohio-state.edu/famlife/technol/printpub.htm#intro
There are numerous print resources about the Internet and the World Wide Web. This site briefly reviews many to help you
make purchasing decisions.

Project AIYME
http://www-leland.stanford.edu/group/aiyme/
"Project AIYME stands for Asian American Initiative for Youth Motivation and Empowerment. Project AIYME is a fun
and exciting program that promotes youth leadership and empowerment through mentoring and peer interaction.
Our primary mission is to provide Asian American junior high students throughout the South Bay with positive role models
from the Stanford community and from their own peers. Students attend weekend retreats at Stanford University and special
outings and day programs with their mentors and AIYME coordinators."

● LISTSERVES

PARENTING DISCUSSION GROUP


A discussion group on topics related to parenting children (including child development, education, and care) from birth
through adolescence.
Instructions:
SUBSCRIBE: send email to listserv@postoffice.cso.uiuc.edu with the command: subscribe PARENTING-L yourname
UNSUBSCRIBE: send email to listserv@postoffice.cso.uiuc.edu with the command: unsubscribe PARENTING-L
TO SEND MAIL TO THE GROUP: PARENTING-L@postoffice.cso.uiuc.edu

Project Vote Smart


http://vote-smart.org/
National nonpartisan, nonprofit political information service providing educational materials. Currently, Project Vote Smart
provides information for the general public through:
* The Voter's Research Hotline 1-800-622-SMART
* The Vote-Smart Web http://www.vote-smart.org
* The Voter's Self-Defense Manual

● ON-LINE PUBLICATIONS

PC Magazine Online
http://www.pcmag.com/pcmag.htm
Another good resource online is PC Magazine. The top 100 web sites section is a sure winner.

PEDIATRICS FOR PARENTS


A Source of Children's Health Information. Pediatrics for Parents is a monthly E-zine covering advances in children's health
care, in-depth articles on selected topics, and practical advice. The articles, written by health care professionals, discuss
many different topics. The editor, Rich Sagall, M.D., is a board certified family practitioner. This is not a mail discussion
group; you only receive mail by subscribing to it.
Instructions:
SUBSCRIBE: send email to pediatricsforparents@pobox.com with the command: subscribe SAC your name (for example:
subscribe hlh2@cornell.edu)
TO SEND MAIL TO THE GROUP: NEW-LIST@LISTSERV.NODAK.EDU

Return to top of page

● Q
● WEB SITES
● LISTSERVES
● ON-LINE PUBLICATIONS

Return to top of page

● R
● WEB SITES

Reducing the risks of SIDS: Some steps parents can take.


http://sids-network.org/risk.htm
(1996, November 29). Sudden Infant Death Syndrome Network. [On-line].
● LISTSERVES
● ON-LINE PUBLICATIONS

Return to top of page

● S
● WEB SITES

Sioux Heritage / Lakhota Language


http://lakhota.com/
On this site, you will discover Lakhota translations of over 100 common American-English words and phrases. If you prefer
audio, check out "Lakhota Online," professionally recorded audio of native-speaking elders.

Social Security Administration


http://

● LISTSERVES

SCHOOL AGE CARE DISCUSSION GROUP


A forum for discussing school-age child care issues and topics of concern, including planning, resources, activities, funding,
staff and staff development, and related subjects. The list is operated by the ERIC Clearinghouse on Elementary and Early
Childhood Education (ERIC/EECE) and the School-Age Child Care Project (SACCProject) at the Center for the Study of
Women, Wellesley College.
Instructions:
SUBSCRIBE: send email to Majordomo@ux1.cso.uiuc.edu with the command: subscribe SAC your name
UNSUBSCRIBE: send email to Majordomo@ux1.cso.uiuc.edu with the command: unsubscribe SAC
TO SEND MAIL TO THE GROUP: sac@ux1.cso.uiuc.edu

SCIENCE & TECHNOLOGY (Part of the CYFAR Initiative)


A discussion of science and technology nonformal education issues.
Instructions:
SUBSCRIBE: send email to almanac@mes.umn.edu with the command: subscribe sciteced
UNSUBSCRIBE: send email to listproc@listproc.wsu.edu with the command: unsubscribe sciteced
TO SEND MAIL TO THE GROUP: sciteced@mes.umn.edu

● ON-LINE PUBLICATIONS

Return to top of page

● T
● WEB SITES

Tales of Wonder
http://itpubs.ucdavis.edu/richard/tales/
This site is a collection of folk and fairy tales from around the world, from Russia, Central Asia, Japan, the Middle East,
Scandinavia, Scotland, England, Africa, North (Native) America, etc.

Tioga County Web Site


http://www.spectra.net/~tioga/

● LISTSERVES
● ON-LINE PUBLICATIONS

Return to top of page

● U
● WEB SITES

U.S. Children's Rights Law from Cornell Law School.


http://www.law.cornell.edu/topics/childrens_rights.html

The United Nations Online Crime Prevention and Criminal Justice Network Clearinghouse
http://www.ncjrs.org/unojust

United States Holocaust Memorial Museum Washington, DC


http://www.ushmm.org/

University of Minnesota Human Rights Library.


http://www.umn.edu/humanrts/
An extensive listing of U.S. and international human rights resources, including a page listing links to Human Rights and
Related Sources Available Through the Internet.

● LISTSERVES
● ON-LINE PUBLICATIONS

Return to top of page

● V
● WEB SITES
● LISTSERVES
● ON-LINE PUBLICATIONS

Return to top of page


● W
● WEB SITES

WWW Frequently Asked Questions


http://sunsite.unc.edu/boutell/faq/www_faq.html (new) OR
http://www.arn.net/faq/wwwfaq1.html (old)
Head here to read basic questions and answers about the Web. It is supposed to have a new address (listed first), but I could
only find it at the old address (listed second). This page is updated frequently with the latest developments.

Washington State Institute for Community Oriented Policing


http://www.idi.wsu.edu/wsicop/
The mission of the Washington State Institute for Community Oriented Policing (WSICOP) is to broker relevant community
policing training programs to community members, police and/or other government officials that will provide technical
assistance to law enforcement agencies, and to conduct research on the implementation and results of community oriented
policing.

WebDiva InfoCenter
http://www.afrinet.net/%7Ehallh/
An extensive, dynamic African American resource site. Many, many links to other Afrocentric sites.

Web Links to Africa


http://www.afroam.org/culture/africa/africa.html
Links to web sites originating in Africa.

Western New York Law Clinic


http://www.wnylc.org
Excellent source of information about welfare reform legislation!

Women's Health Weekly


http://www.newsfile.com/homepage/1w.htm

● LISTSERVES
● ON-LINE PUBLICATIONS

Return to top of page

● X
● WEB SITES
● LISTSERVES
● ON-LINE PUBLICATIONS

Return to top of page


● Y
● WEB SITES
● LISTSERVES
● ON-LINE PUBLICATIONS

Return to top of page

● Z
● WEB SITES
● LISTSERVES
● ON-LINE PUBLICATIONS

Return to top of page

Return to:
EVALUATION DIRECTORY I Spirituality / Ethnic Identity I Kinship I Political and Civic Involvement I School / Youth I Personal Efficacy / Economic
Opportunity I Personal Efficacy / Health Care I Community Development, Organizing and Activism I Child Care I Housing I Personal Efficacy / Community
Safety and Security I Recreational Opportunities

Community Development Home Page Trochim Home Page Gallery Home Page

COMMUNITY DEVELOPMENT CONNECTIONS AND RESOURCES WEB SITE Copyright © 1997 Georgette M. King gmk5@cornell.edu. All Rights Reserved.

DATE LAST UPDATED: 4/22/97


Moral Development: Reasoning, Judgment, and Action

Yvonne L. LaMar Cornell University

HSS 691

Moral Development

The practical applications of morality theories can only be fully appreciated in the context of moral development. Several social
scientists have theorized about the chronology and order in which people develop the ability to know right from wrong. This paper
has two purposes. The first is to briefly explore the educational implications of two popular theories of moral development. The
second is to examine the formation and applications of moral judgment and moral action, along with the educator’s role in this area
of development.

Theories of Moral Development

Lawrence Kohlberg’s (1969) Theory of Moral Development is one of the most widely used approaches to the examination of moral
reasoning. This stage theory is based on responses to scenarios that involve a moral dilemma. Kohlberg recognized three levels of
moral development which encompass six stages. A brief description of the levels and stages follows:

Level 1, The Preconventional Level. Children at this level respond to moral cues from their social reference group, most commonly
parents. At this level children are extremely self-involved, and moral behavior occurs only in response to sanctions and rewards based
on behavior, hence the description of Stage 1, “The Obedience and Punishment Orientation.” Stage 2 is the Instrumental Relativism
Orientation; at this stage moral behavior depends on the desires of the individual.

Level 2, The Conventional or Moral Level. Moral reasoning is now based on existing social norms as well as the rights of others.
Kohlberg asserts that most adolescents and some adults operate at this level of reasoning. Stage 3, the Interpersonal Concordance
Orientation, indicates that the individual has developed the ability to empathize and is no longer selfish in moral reasoning. Stage 4
is the Orientation Toward Authority, Law, and Duty; at this stage moral activity becomes a function of following rules and has no
association with the need for personal approval.

Level 3, The Postconventional or Autonomous Level. This is the most advanced level of moral reasoning, which relies on universal
principles in approaching moral problems. Stage 5 is the Social Contract Orientation, which relies heavily on noble principles such as
equality and human dignity. Stage 6 is the Universal Ethical Principles Orientation, which is rarely reached; this orientation relies on
principles that are self-generated and universally applicable.

This brief description of levels and stages was provided as a backdrop for the educational implications of moral development
theories. According to Kohlberg (1969), education plays a major role in moral development. His strongest statement to this effect is
that moral reasoning stops at the same point that formal education stops. Although this assertion has stimulated much-needed
discussion about moral education, there has been a great deal of resistance.

Parents tend to fear that formal moral education may contradict the religious or philosophical values that are taught in the home and
by social agencies other than the school. Even in communities where moral education is welcome, there is disagreement about which
values and what topics to cover in the classroom. Children are encouraged to behave in a moralistic fashion whether the school has
embraced a moral education agenda or not. Middle-class, conformist values are rewarded, and other behavior is punished or
pathologized. Behavior that can be characterized as immoral will be discussed later in this paper, along with practical ways for
educators to approach it.

Carol Gilligan’s Feminist Theory of Moral Development

According to Kohlberg (1969), more males than females move beyond stage 4 of their moral development. There appeared to be
deficiencies in females’ ability to reason morally. This deficiency was accepted and confirmed by other theorists until Carol Gilligan
of Harvard University supplied an alternative explanation. Gilligan demonstrated through several studies that males and females
approach moral issues from completely different perspectives. The most significant difference in the females’ approach was the
inclination to emphasize interpersonal relationships (Gilligan, 1977). Females tend to base their moral actions on responsibility
toward specific others more than on abstract principles. Gilligan described the stages of female moral development in terms of levels
and transitions.

Level 1: Orientation Toward Self-Interest
The First Transition: from selfishness to responsibility
Level 2: Identification of Goodness with Responsibility for Others
The Second Transition: from conformity to a new inner judgment
Level 3: Focusing on the Dynamics Between Self and Others

This stage theory suggests that we seriously consider gender differences when evaluating moral reasoning, judgment, and actions.
The implications of Gilligan’s research go beyond the domain of psychology to reach into any field that theorizes about people, as an
intelligent and respectful effort to provide an empirical basis for differences that are discovered between genders. In an educational
context this would suggest that different types of behavior should be expected from boys and girls. There is also the implication that
moral education interventions should be tailored to the appropriate gender.

Morality and Identity

Morality and identity are interrelated aspects of the individual personality. Spontaneous moral judgment is an integral part of social
conduct and reflects a dimension of one's personality. Piaget (1954) alludes to this connection with the concepts of egocentrism and
decentering. Egocentrism is the inability to see the perspective of others. Decentering is the process of moving from egocentrism to
sociocentrism which involves role-taking ability and empathic reactions to others. Decentering depends highly on conscious social
activity and an individual's personal sense of autonomy. Decentering coincides with Piaget's Concrete Operational Stage (age 7 or 8)
when children's language reveals more concern about others and less self-involvement (Piaget, 1954).

In the context of morality, identity formation embraces two central concepts. The first is primary personal identity, which is
phenomenal thinking that occurs with a sense of ownership (conscious thought). The second is autonomous identity, which consists
of phenomenal experiences that are not accompanied by that sense of ownership (unconscious processes). The primary identity is
prominent and continuous through the life span. The autonomous identity exists as a conscience or inner entity (Baldwin, 1906)
which is perceived or summoned during moments of ethical deliberation. The autonomous identity is accessed only when conflict is
present.

According to Davidson and Youniss (1991), the “experiencing self” slips between the primary and autonomous identities without a
break in phenomenal continuity due to the way that thought seamlessly skips over temporal gaps. This has several implications for
the study of moral development. First, it suggests that social activity is conducive to moral development. Second, it describes what it
is that develops - the seamless internal dialogue between the primary and autonomous identities. Finally, it predicts the course of
moral development into adulthood coinciding with Piaget’s stages of cognitive development.

Another implication of Davidson and Youniss’ (1991) view of the “experiencing self” is the clarification of the distinction between
spontaneous moral judgment and moral theorizing. They propose that spontaneous moral judgments are universal, whereas moral
theorizing is bound to include elements derived from cultural (or other pluralistic) assumptions and practices. The link between moral
judgment and moral identity is that both begin from the same underlying structures of thought (cognition).

The Socialization of Moral Judgment and Behavior: A Cross-Cultural Perspective

Bronfenbrenner and Garbarino (1976) developed a typology of moral orientation that consists of five stages:
1. Self-oriented - the individual is motivated by impulses of self-gratification without regard to, or expectations of, others.
2. Authority-oriented - the individual generalizes parental structures and values to include moral standards of other adults and authority figures.
3. Peer-oriented - the individual becomes an adaptive conformist who goes along with the peer group.
4. Collective-oriented - the individual is committed to a set of enduring group goals which take precedence over individual desires.
5. Objectively-oriented - the individual responds to situations on the basis of principles.

Smooth transition from stage to stage requires extensive social interaction. Movement from Type 1 to Type 2 is stimulated by
attachment patterns and responsiveness but is mainly associated with parental prohibitions (Bowlby, 1946). The individual stops
responding to personal impulses in order to please parents or authority figures. Lack of discipline or authority figures hinders progress
from the first to the second type. Long-term consequences of early social neglect indicate a pattern of psychopathology which might
be categorized as amoral (Bowlby, 1946).
Movement from Type 2 to Type 3 presupposes that the individual has connections to several social agencies (e.g. school, church,
peers, and family) and is being pulled in different, contradictory directions. Moderate dissonance is typical at this stage of life when
the individual is not ready to conform but must make independent decisions that can be applied to concrete situations. There must be
enough tension to require resolution but not enough to be overwhelming. Attainment of Type 3 status also requires a setting where
the individual has opportunities and social support for developing abstract thinking because of overlapping social allegiances.
Unfortunately, these social conditions do not occur in every culture.

The kind of social structure capable of generating Type 3 morality must be pluralistic where the different social agencies represent
different expectations, sanctions, and rewards for members of society. These differences often generate inter-group conflict that is
largely regulated by a set of ground rules (e.g. laws, The Constitution). Inter-group conflict is also subdued by some common
commitment to unifying goals, such as religious ethics. The drawbacks of a pluralistic society are less troubling when compared to its
alternatives. In a monolithic setting there is one set of goals which requires an authority orientation only. In an anomic setting there
are no goals, meaning no integration and no variety. Pluralism occurs on several levels - within families (two parents vs. one,
extended vs. nuclear), peer group, neighborhood, community, work world, and civic and political organizations; each of these settings
varies according to social class.

Social Cognitive Theory of Moral Thought and Action

When children are young there are physical boundaries to control their actions. As they mature, social relations are designed to elicit
culturally acceptable behavior. According to Bandura (1991), cognitive guides are formed in our social relationships which regulate
our conduct under changing circumstances. This view reinforces Reiss’ (1965) Familial and Social Transmission Models, which
assert that values and standards of conduct arise from diverse sources of influence and are promoted by institutional backing.
Internalization of standards of moral conduct requires modeling from various sources. Children observe parents, siblings, peers, and
other adults to help them determine what behavior is appropriate for what situation.

Several unfortunate circumstances exist that underscore the need for intentional modeling from those who have direct contact with
children. First, families that are estranged from the mainstream do not heed institutional values, causing confusing inconsistencies
in the moral development of the children involved. Second, social change often arises from a breakdown of transmission between
generations, leaving younger generations to find moral models among their contemporaries, which may not be appropriate. The last
unfortunate reality is that television provides extensive opportunity for modeling aggressive and inappropriately sexualized behavior.
Without adequate discussion of adult material that is readily available to them, children are likely to develop misconceptions about
important issues.

Moral Judgment and Action

If an individual does not perceive evident fact in a predicament, they experience cognitive conflict. Cognitive conflict serves as an
equilibrium mechanism which motivates cognitive change (Bandura, 1991). However, discrepancies between events and mental
structures are not automatically motivating. A limit to the balancing effect of cognitive conflict is that events that are too bewildering
or too familiar do not arouse interest or exploration.

Interplay of Personal and Social Sanctions

Bandura’s (1991) Social Cognitive Theory identifies three sources of influence: behavior, cognition, and environment. Behavior
usually produces self-evaluative reactions and social effects which may be complementary or opposing. To enhance the compatibility
between personal and social influences, people generally select associates with similar standards of conduct (although interacting
with people of differing standards does not necessarily create personal conflict). In some cases, people who are not strongly
committed to personal standards adopt a pragmatic orientation, tailoring their behavior to whatever the situation calls for. Selective
association and the pragmatic orientation of moral behavior require certain abilities, according to Bandura (1991). The ability to
selectively activate or disengage moral control requires that an individual be capable of self-regulating behavior, which could be
considered purposeful access to the autonomous self.

Moral Disengagement

Moral disengagement allows for different types of behavior under the same moral standards. This behavior is not culturally
acceptable and is, in some cases, reprehensible. The following is a list of moral disengagement practices (Bandura, 1991), along with
practical guidelines for educators to follow to counteract their effects.

1. Moral Justification - in this practice people do not engage in reprehensible conduct until they have justified to themselves
the morality of their actions. The conduct is made socially and personally acceptable by portraying it in the service of moral
purposes (e.g. military conduct, killing abortion doctors). The most effective way to counteract this behavior is not through
the individual’s personality structures, aggressive drives, or moral standards but by helping the individual to cognitively
reconstruct the moral value of the action (Bandura, 1991).

2. Euphemistic Labeling is an example of how language shapes the thought patterns on which people base many of their
actions. Diener et al. (1975) found that language possesses a disinhibitory power that allows people to behave in ways that
they would normally find unacceptable. The use of ethnic slurs is an example of euphemistic labeling. An effective
way to deal with this behavior is to reveal the purpose of using this language. If people are unable to apply negative
generalizations, they are more likely to recognize the universal, human traits that others possess, which allows for
sympathetic behavior.

3. Advantageous Comparison exploits the contrast principle. The contrast principle declares that when an option is presented
and then compared with another more attractive option, the second option appears to be more appealing than when viewed
alone. A classic example of advantageous comparison is the presentation of an outrageously expensive item by a
salesperson, followed by a lower-priced but still expensive item. The less expensive item seems cheap by
comparison but may have been perceived as expensive if it were presented alone. In an educational context this practice
should be acknowledged and avoided. Correcting this behavior involves simply recognizing its use in order to make morally
sound decisions.

4. Displacement of Responsibility involves distortion of the relationship between actions and the effects they cause. In many
cases, people blame authorities for their actions rather than accept personal responsibility (Rule and Nesdale, 1976). Use of
this practice depends highly on the legitimacy of the authority being obeyed; the higher the authority, the more people defer.
This external attribution is also affected by other justifications such as social consensus about the morality of the enterprise.
Stressing the value of autonomy and self-direction is a useful way to prevent or diffuse this behavior.

5. In the practice of Disregard or Distortion of Consequences people readily recall information about the potential benefits
of an action but are less able to remember its harmful effects. This response is especially strong when damage is done and
the effects can not be avoided, although evidence of harm will likely be discredited (Mynatt and Herman, 1975).
Personalization is an effective way to counteract this behavior. In Milgram’s (1974) classic obedience study, subjects were
less likely to inflict pain when they had personal contact with their victims, even when threatened.

6. Diffusion of Responsibility is reached by division of labor in a moral enterprise. Group decision-making obscures direct
responsibility for particular acts. In some cases people go to great lengths to conceal their part in immoral activities. This
practice is especially evident in bureaucracies where no one person claims responsibility for situations that may be a result
of company policies or government regulations. Individuals may experience intense pressure to participate in such behavior
but must be aware of the consequences that they will face individually.

7. The most common example of Attribution of Blame is victim-blaming. Victim-blaming involves the devaluation of people
incurring harm and makes perpetrators even more prone to maltreatment. Victim-blaming makes perpetrators’
actions excusable or even self-righteous. There is sometimes trivialization and distortion of consequences that are so
convincing that victims come to believe the degrading characterizations of themselves. Realistic representations of people
can be useful in limiting the use of this device.

The aforementioned devices do not spontaneously transform a person. Moral disengagement requires conducive social conditions
rather than monstrous people to produce heinous deeds (Bandura, 1991). The practices are usually preceded by gradual diminution of
self-sanctions and are often accompanied by self-exonerations that are necessary to maintain self-esteem. A person’s values and
beliefs affect what information they seek and how they interpret it (Bandura, 1991). The distorted interpretation of morally
disengaged behavior is most likely the result of an individual’s need to possess positive self-regard in light of their immoral behavior.

Dehumanization

Dehumanization can be used as an umbrella term to describe all of Bandura’s (1991) morally disengaged behaviors. To perceive
another as human activates empathic emotional reactions through perceived similarities (Bandura, 1991). The joys and suffering of
similar people are more arousing than those of strangers or individuals who have been divested of human qualities (Bandura, 1991).
Dehumanization is an essential ingredient in the perpetration of inhumanities (Kipnis, 1974). Kipnis (1974) found that those who are
seen as subhuman were regarded not only as lacking sensitivity but as responsive only to harsh treatment.

A later study found more hopeful results. Helm and Morelli (1979) found that people steadfastly refuse to behave punitively, even in
response to strong authoritarian commands, if the situation is personalized by having powerholders see the oppressed people or by
having them behave directly toward potential victims instead of remotely. This finding might be called “The Power of
Humanization!”.

Summary

The theories of moral development were briefly introduced to serve as a conceptual bridge for the theories of moral judgment and
action that followed. I could only attempt to relate the aforementioned concepts to current school settings. Without a clear moral
education agenda, the role of educators in students’ formation of moral judgment is vague. What is clear is that any efforts at
intervention or social modeling should be done in collaboration with the other social systems that influence the lives of students.

So many opportunities exist for modeling of negative attributes that much consideration should be given to the active pursuit of a
moral education curriculum which involves deliberate role modeling and opportunities for ethical discussions. The obligations of
educators are ever increasing, and their efforts at modeling culturally acceptable behavior are more important than ever. In some
cases, educators may be the only mainstream influence in a child’s life and may serve as a representative of a world that the child
might not otherwise know. Whether the child chooses to enter the world that the educator represents is an individual decision; the
availability of and access to this world is what must become mandatory.

Knowing the order and descriptions of the stages of moral development is a small part of the educator’s obligation to students.
Understanding the application of these theories to various situations in everyday life and throughout history could possibly be the
beginning of a process of advanced moral reasoning for both teacher and student.

WWW Links

Here are some interesting web links for those who want to explore the issue of morality more thoroughly. Also, the "References"
section provides links to publishers and journals whose books and articles were used for this paper.

Citizens for Community Values - a grassroots educational organization that aims to promote ethical discussions through various ventures.
Morality and Spirituality in Children's and Young Adult's Literature - this organization explores morality in classic and contemporary literature. This site also provides lesson plans for character education courses.
Character Education Institute - this organization also provides curriculum enhancers for grades K-12.
Character Education Resources Home Page - this non-profit organization provides resources, workshops, and continuing education opportunities for educators, administrators, and anyone else who is interested in this issue.
Saint Pius Faith Formation Online! - religious education program links and information for Christian youth and adults.
Global Street Corner - a bulletin board for issues that concern adolescents and the people who love them.
The Opportunity for Adolescents - research and commentary about psychological health and effective identity formation.
Family Research Center - this Judeo-Christian organization promotes religious values and gives advice for parents.
Youth and Children's Resource Net - this site includes resources and information about various aspects of psychological development, including current research.
References

Baldwin, J.M. (1906). Social and ethical interpretations in mental development (4th edition). New York: Macmillan. (First published
1897.)

Bandura, A. (1991). Social cognitive theory and social referencing. In Feinman, S. (Ed.), Social referencing and the social
construction of reality. New York: Plenum.

Bowlby, J. (1946). Forty-four juvenile thieves: their characters and home life. London: Hogarth.

Brock, T.C. and Buss, A.H. (1962) Dissonance, aggression, and evaluation of pain. Journal of Abnormal and Social Psychology, 65,
197-202.

Brock, T.C. and Buss, A.H. (1964) Effects of justification for aggression and communication with the victim on postaggression
dissonance. Journal of Abnormal and Social Psychology, 68, 403-412.

Darley, J.M., Klosson, E.C. and Zanna, M.P. (1978). Intentions and their contexts in the moral judgments of children and adults.
Child Development, 49, 66-74.

Davidson, P. and Youniss, J. (1991). Which comes first, morality or identity? In Kurtines, W.M. and Gewirtz, J.L. (Eds.), Handbook
of moral behavior and development, Volume 1: Theory. Hillsdale, NJ: Lawrence Erlbaum Associates.

Diener, E., Dineen, J., Endresen, K., Beaman, A.L. and Fraser, S.C. (1975). Effects of altered responsibility, cognitive set, and
modeling on physical aggression and deindividuation. Journal of Personality and Social Psychology, 31, 328-337.

Garbarino, J. and Bronfenbrenner, U. (1976). The socialization of moral judgment and behavior in cross-cultural perspective. In
Lickona, T. (Ed.), Moral development and behavior. New York: Holt, Rinehart and Winston. pp. 70-83.

Gilligan, C. (1977). In a different voice: women’s conceptions of self and of morality. Harvard Educational Review. pp.481-517.

Haney, C., Banks, C., and Zimbardo, P. (1973). Interpersonal dynamics in a simulated prison. International Journal of Criminology
and Penology, 1, 69-97.

Helm, C. and Morelli, M. (1979). Stanley Milgram and the obedience experiment: authority, legitimacy, and human action. Political
Theory, 7, 321-346.

Kipnis, D. (1974). The powerholders. In J.T. Tedeschi (Ed.), Perspectives on social power (pp. 82-122). Chicago: Aldine.

Kohlberg, L. (1969). Stage and sequence: the cognitive-developmental approach to socialization. In Goslin, D.A. (Ed.), Handbook of
socialization theory and research. Chicago: Rand McNally.

Milgram, S. (1974). Obedience to authority: an experimental view. New York: Harper and Row.

Piaget, J. (1932). The moral judgment of the child. London: Routledge and Kegan Paul.

Piaget, J. (1981). Intelligence and affectivity: their relationship during child development. Palo Alto, CA: Annual Review
Monographs. (First published 1954.)

Rule, B.G., and Nesdale, A.R. (1976). Moral judgments of aggressive behavior. In Geen, R.G. and O’Neal, E.C. (Eds). Perspectives
on Aggression (pp.37-60) New York: Academic Press.

Stayton, D., Hogan, R., and Ainsworth, M. (1971). Infant obedience and maternal behavior: origins of socialization reconsidered.
Child Development, 42, 1057-1069.

YOUR QUESTIONS & COMMENTS ARE WELCOME

Yvonne L. LaMar

© 1997 Copyright All Rights Reserved, Yvonne L. LaMar


COMMUNITY DEVELOPMENT
ORGANIZING AND ACTIVISTS
CONNECTIONS AND RESOURCES

DIRECTORY OF PROGRAM EVALUATION RESOURCES

American Evaluation Association (AEA)


http://www.eval.org/
"We are an international professional association of evaluators devoted to the application and exploration of program evaluation,
personnel evaluation, technology, and many other forms of evaluation. "

Evaluators, Links of Interest to


http://www.eval.org/links.htm

Evaluating Nutrition Programs


http://trochim.human.cornell.edu/gallery/durgan/srdurgan.htm

National Network for Collaboration


http://www.cyfernet.mes.umn.edu:2400/build.html
"The information presented here utilizes the knowledge and expertise of specialists from the National Network for Collaboration to
provide a guide to begin, strengthen and sustain the collaborative journey, for the building and sustaining of positive change."

The National Network for Family Resiliency (NNFR) Evaluation Work Team
http://www.agnr.umd.edu/users/nnfr/evaluation.html
". . . (P)rovides abstracts of evaluation tools and a bibliography of evaluation resources that address issues identified by Special
Interest Groups."

Return to:
DIRECTORY I Spirituality / Ethnic Identity I Kinship I Political and Civic Involvement I School / Youth I Personal Efficacy / Economic Opportunity I Personal
Efficacy / Health Care I Community Development, Organizing and Activism I Child Care I Housing I Personal Efficacy / Community Safety and Security I
Recreational Opportunities

Community Development Home Page Trochim Home Page Gallery Home Page

COMMUNITY DEVELOPMENT CONNECTIONS AND RESOURCES WEB SITE Copyright © 1997 Georgette M. King gmk5@cornell.edu. All Rights Reserved.

DATE LAST UPDATED: 4/22/97


COMMUNITY DEVELOPMENT
ORGANIZING AND ACTIVISTS
CONNECTIONS AND RESOURCES

Strengthening Spirituality, Ethnic Identity and Understanding

Strengthening Spirituality, Ethnic Identity and Understanding

● Spiritual and Morality Building Resources


● Ethnic Pride
● Developing Understanding

● Spiritual and Morality Building Resources

● WEB SITES

Christmas and Hanukkah Cards and Letter to Santa Page


http://www.marlo.com/allages.htm
This is an interactive page for adults and kids to send letters to anyone they want--including Santa! This site has lots of
other, non-seasonal fun things for kids, too.

Christmas Crafts Page


http://www.merry-christmas.com/crafts.htm
This page has numerous holiday craft ideas for parents and kids to do together.

Moral Development: Reasoning, Judgment, and Action


http://trochim.human.cornell.edu/gallery/lamar/lamar1.htm

Return to top of page

● Building Ethnic Identity and Pride: African American Resources

African American Women: Breast cancer and mammography facts


http://www.graylab.ac.uk/cancernet/600616.html

African Centered Educational Homepage


http://home.earthlink.net/~bcd227/index.html

The Afro-American Newspaper Company of Baltimore, Inc.


http://www.afroam.org/index.html
Along with a vast array of news and information for children, teens, and adults alike, this site also contains a list of
Web Links to Africa http://www.afroam.org/culture/africa/africa.html

African American Women Online


http://www.uoknor.edu:80/jmc/home/gmccauley/text.html
Includes links for:
African American Women's Health Issues
Famous African American Women Writers
Links for African American Women Who Write

College Bound
http://www.cbound.com/
College Bound is a non-profit organization working with D.C. public school students who want to go to college -- but need
a helping hand. The program was started in 1991 as "The D.C. Partnership for Education" by a teacher, a Capitol Hill staffer
and an officer of the Congressional Black Caucus. College Bound services uniquely combine mentoring, tutoring, and
positive life experiences with the opportunity for scholarship assistance.

Cornell University's John Henrik Clarke Africana Library


http://www.library.cornell.edu/africana/
The John Henrik Clarke Africana Library provides a special collection focusing on the history and culture of peoples of
African ancestry. The library serves as a bibliographic reference and referral center by providing access to African and
African American resources available either in the Cornell University Library or collections at other institutions.
This site includes links to:
ABOUT THE CLARKE AFRICANA LIBRARY
SELECTED AFRICANA LIBRARY RESOURCES
SELECTED NEW BOOKS
AFRICANA STUDIES & RESEARCH CENTER
CORNELL AFRICAN & AFRICAN-AMERICAN COLLECTIONS
MBATA BOOK FUND
UPCOMING LOCAL EVENTS
LIBRARY CATALOGS
INSTITUTE FOR AFRICAN DEVELOPMENT
AFRICANA STUDIES WEB SERVERS

100 BMC NEWS


http://www.govirtual.com/100bmc/
News, Information, and Events of 100 Black Men of Chicago.
The 100 Black Men of America, Inc. Mission Statement:
"To improve the quality of life of our citizens and enhance educational opportunities for African-Americans and minorities,
through its chapters, in all communities -- with a particular emphasis on young African-American males."

100 Black Men Of America, Inc. Charlotte Chapter


http://www.charweb.org/organizations/nonprofits/100bm/index100.htm
News, Information, and Events of 100 Black Men of Chicago.
The 100 Black Men of America, Inc. Mission Statement:
"A group of concerned African American men whose goal is to improve the quality of life in the African American
community through their collective resources, abilities, and experiences.
In developing its programmatic agenda, the 100 Black Men recognized the challenges confronting our African American
Youth, and strive to provide leadership and support in addressing these and other critical needs of the Black Community."

100 Black Men Of America, Inc. Tallahassee Area Chapter


http://www.freenet.tlh.fl.us/~tacot1bm/index.html

100 Black Men Of Western Pennsylvania


http://hillhouse.ckp.edu/services/100bm.html
"Dedicated to providing our youth positive role models, educational assistance and creative alternatives to achieving
personal success."

Online Resources for African American Women and Womanist Studies


http://www.uic.edu/~vjpitch/
Tales of Wonder
http://itpubs.ucdavis.edu/richard/tales/
This site is a collection of folk and fairy tales from around the world, from Russia, Central Asia, Japan, the Middle East,
Scandinavia, Scotland, England, Africa, North (Native) America, etc.

Web Links to Africa


http://www.afroam.org/culture/africa/africa.html
Links to web sites originating in Africa.

WebDiva InfoCenter
http://www.afrinet.net/%7Ehallh/
An extensive, dynamic African American resource site. Many, many links to other Afrocentric sites.

● Building Ethnic Identity and Pride: Asian American Resources

Project AIYME
http://www-leland.stanford.edu/group/aiyme/
"Project AIYME stands for Asian American Initiative for Youth Motivation and Empowerment. Project AIYME is a fun
and exciting program that promotes youth leadership and empowerment through mentoring and peer interaction.
Our primary mission is to provide Asian American junior high students throughout the South Bay with positive role models
from the Stanford community and from their own peers. Students attend weekend retreats at Stanford University and special
outings and day programs with their mentors and AIYME coordinators."

Tales of Wonder
http://itpubs.ucdavis.edu/richard/tales/
This site is a collection of folk and fairy tales from around the world, from Russia, Central Asia, Japan, the Middle East,
Scandinavia, Scotland, England, Africa, North (Native) America, etc.

❍ Building Ethnic Identity and Pride: Native American Resources

Sioux Heritage / Lakhota Language


http://lakhota.com/
On this site, you will discover Lakhota translations of over 100 common American-English words and phrases. If
you prefer audio, check out "Lakhota Online," professionally recorded audio of native-speaking elders.

Tales of Wonder
http://itpubs.ucdavis.edu/richard/tales/
This site is a collection of folk and fairy tales from around the world, from Russia, Central Asia, Japan, the Middle
East, Scandinavia, Scotland, England, Africa, North (Native) America, etc.

Return to top of page

❍ Developing Understanding

National Center for Diversity


http://www.cyfernet.mes.umn.edu/
Touches on some areas of diversity without going very far in depth, or towards building an understanding of the
viewpoints and issues of non-white Americans. Still, there is a little information here that may be of some use. . .

United States Holocaust Memorial Museum Washington, DC


http://www.ushmm.org/

DIVERSITY FORUM
Discussion and dissemination of information about program and audience diversity within Cooperative Extension.
Instructions:
SUBSCRIBE: send email to majordomo@reeusda.gov with the command: subscribe usda.diversity
UNSUBSCRIBE: send email to majordomo@reeusda.gov with the command: unsubscribe usda.diversity
TO SEND MAIL TO THE GROUP: usda.diversity@reeusda.gov
Contact: Manju Seal, seal@admin.uwex.edu, 608-265-3589

Return to top of page

Return to:
WEB SITE, LISTSERVE DIRECTORY I EVALUATION DIRECTORY I Kinship I Political and Civic Involvement I School / Youth I Personal
Efficacy / Economic Opportunity I Personal Efficacy / Health Care I Community Development, Organizing and Activism I Child Care I Housing I
Personal Efficacy / Community Safety and Security I Recreational Opportunities

Community Development Home Page Trochim Home Page Gallery Home Page

COMMUNITY DEVELOPMENT CONNECTIONS AND RESOURCES WEB SITE Copyright © 1997 Georgette M. King gmk5@cornell.edu. All Rights Reserved.

DATE LAST UPDATED: 4/18/97


COMMUNITY DEVELOPMENT CONNECTIONS AND
RESOURCES
Kinship and Fictive Kinship

Kinship and Fictive Kinship

● Networks of Support for Parents


● Parenting Education Resources

● Web Resources for Parents

College Funding Site


http://http.tamu.edu/~gac3280/.money_html/dollar.html
Thinking about those college application deadlines? This web site contains information on various scholarships,
internships, grants, etc. for minority students.

Dr. Toy
http://www.drtoy.com
For toy recommendations, consult Dr. Toy. Child psychologist Stevanne Auerbach reviews and rates out-of-the-mainstream
toys, and gives contact information for the manufacturers. Parenting Online, on America Online
(keyword: Parenting), features forums where parents find out what kids like.

Internet for Parents Site


http://www.respress.com/kids_parents/rlist.htm/
This site has stories and games for younger children and older children to surf. Scroll down from the initial screen to
Chapter Eight to get to the fun stuff for kids, including Around the world in 10 minutes, Surfing with young kids,
Surfing with older kids, and Knowledge surfing.

Kids Internet Delight


http://allison.clark.net/pub/journalism/kid.html
This list of online resources includes 80 places for kids to go with and without their parents. Definitely worth the trip.

Tales of Wonder
http://www.ece.ucdavis.edu/~darsie/tales.html
This site is a collection of folk and fairy tales from around the world, from Russia, Central Asia, Japan, the Middle
East, Scandinavia, Scotland, England, Africa, North (Native) America, etc. One of my personal favorites.

Return to top of page


● LISTSERVES

Below are listed several mail groups to which you may want to subscribe. For more possibilities, visit
www.cas.psu.edu/docs/cyfernet/handout4.html and look at others listed there.

CHILDREN ACCESSING CONTROVERSIAL INFORMATION


A mailing list for discussing children accessing controversial information through computer networks. See
http://www.zen.org/~brendan/caci.html for more information.
Instructions:
SUBSCRIBE: send email to caci-request@media.mit.edu with the command: subscribe
UNSUBSCRIBE: send email to caci-request@media.mit.edu with the command: unsubscribe
TO SEND MAIL TO THE GROUP: (information not available at this time) Contact: caci-owner@media.mit.edu

CHILDREN, YOUTH & FAMILY DISCUSSION


A general discussion of issues related to the health, education and well being of children, youth and families.
Instruction:
SUBSCRIBE: send email to listserv@tc.umn.edu with the command: subscribe CYF-L your name
UNSUBSCRIBE: send email to listserv@vm1.spcs.umn.edu with the command: unsubscribe CYF-L your name
TO SEND MAIL TO THE GROUP: CYF-L@vm1.spcs.umn.edu
Contact: Lori Bock, cyfcec@maroon.tc.umn.edu
EDNET
A discussion of educational uses of the Internet. This is a very active list... you'll get a lot of mail from it!
Instructions:
SUBSCRIBE: send email to listproc@lists.umass.edu with the command: subscribe EdNet Your Name
UNSUBSCRIBE: send email to listproc@lists.umass.edu with the command: unsubscribe EdNet
TO SEND MAIL TO THE GROUP: ednet@nic.umass.edu
Contact: ednetmgr@educ.umass.edu

FATHERNet DISCUSSION GROUP


A forum for discussing the importance of fathers in children's lives. Thoughtful dialog on spanking, child support, time
management, equality in parenting, and men concerned about being good fathers.
Instructions:
SUBSCRIBE: send email to listserv@tc.umn.edu with the command: subscribe father-L your name
UNSUBSCRIBE: send email to listserv@vm1.spcs.umn.edu with the command: unsubscribe father-L
TO SEND MAIL TO THE GROUP: father-L@vm1.spcs.umn.edu
Contact: Lori Bock, cyfcec@maroon.tc.umn.edu

PARENTING DISCUSSION GROUP


A discussion group on topics related to parenting children (including child development, education, and care) from birth
through adolescence.

Instructions:
SUBSCRIBE: send email to listserv@postoffice.cso.uiuc.edu with the command: subscribe PARENTING-L yourname
UNSUBSCRIBE: send email to listserv@postoffice.cso.uiuc.edu with the command: unsubscribe PARENTING-L
TO SEND MAIL TO THE GROUP: PARENTING-L@postoffice.cso .uiuc.edu

Return to top of page


● ON-LINE PUBLICATIONS

PEDIATRICS FOR PARENTS


A Source of Children's Health Information. Pediatrics for Parents is a monthly E-zine covering advances in children's health
care, in-depth articles on selected topics, and practical advice. The articles, written by health care professionals, discuss many
different topics. The editor, Rich Sagall, M.D., is a board certified family practitioner. This is not a mail discussion group;
you only receive mail by subscribing to it.
Instructions:
SUBSCRIBE: send email to pediatricsforparents@pobox.com with the command: subscribe SAC your name (for example:
subscribe hlh2@cornell.edu)
TO SEND MAIL TO THE GROUP: NEW-LIST@LISTSERV.NODAK.EDU

Return to top of page

Return to:

Community Development Home Page Trochim Home Page Gallery Home Page

COMMUNITY DEVELOPMENT CONNECTIONS AND RESOURCES WEB SITE Copyright © 1997 Georgette M. King gmk5@cornell.edu. All Rights Reserved.

DATE LAST UPDATED: 3/30/97


COMMUNITY DEVELOPMENT CONNECTIONS AND RESOURCES
Political and Civic Involvement

● Federal Resources
● State / Local Resources
● Voter Participation Resources
● Legal Assistance and Information

● Federal Resources

● WEB SITES

HUD HOME PAGE


http://www.huduser.org/ and the text version can be found at: gopher://huduser.org
The U.S. Department of Housing and Urban Development (HUD) is expanding its use of the "information superhighway" to
give the public quick access to HUD programs and services and to make it easier for people to do business with the
Department.
HUD's home page is not just about HUD; it's a clearinghouse of information on homes and communities. It provides
information from other federal agencies, non-profits, and the private sector through links to about 175 other Web sites.
Individuals can access information on job skills training, community services, employment opportunities, and education.
You will also learn how HUD funds are being spent by local governments, along with other data summaries and maps on
nearly 1,000 communities.

Social Security Administration


http://

Return to top of page

● State / Local Resources

New York State - Counties, Cities, Towns, School Districts, and Schools Web sites

Town of Caroline, NY
http://munex.arme.cornell.edu/caroline

Danby, NY
http://munex.arme.cornell.edu/danby

Groton
http://munex.arme.cornell.edu/groton

Ithaca, NY, City of


http://www.ci.ithaca.ny.us
Maintained by the city clerk's office. Tracks Common Council, boards, commissions, and their schedules. Community
members can read Common Council minutes and find out who represents areas of the community. Info on parking, noise
pollution regs, and marriage licenses is also available.

Ithaca, NY - Cayuga Heights Elementary


http://k12.cnidr.org/gsh/schools/ny/che.html

IthacaNet
http://www.ithaca.ny.us/

Lansing, NY
http://www.ithaca.ny.us/Govt/LansingTown

Lansing, NY - Lansing Central Schools


http://www.ithaca.ny.us/Education/LCS

Tioga County Web Site


http://www.spectra.net/~tioga/

Trumansburg, NY Central Schools


http://www.trumansburg.ny.us/tcs

Ulysses, NY
http://www.trumansburg.ny.us

LISTSERVES

COMMUNITIES IN ECONOMIC TRANSITION DISCUSSION GROUP


The Communities in Economic Transition national initiative focuses on providing two types of assistance to communities:
strategic planning, and business assistance and enterprise development. This forum is open for practitioners, academics, and
others to share ideas, programs, and opinions. It is open to everyone, regardless of area of specialty or discipline.
Instructions:
SUBSCRIBE: send email to almanac@esusda.gov with the command: subscribe cet-mg
UNSUBSCRIBE: send email to almanac@esusda.gov with the command: unsubscribe cet-mg
TO SEND MAIL TO THE GROUP: cet-mg@esusda.gov
Contact: Todd Landfried, CSREES-USDA (tlandfried@reeusda.gov)

Return to top of page

● Voter Participation Resources

Project Vote Smart


http://vote-smart.org/
National nonpartisan, nonprofit political information service providing educational materials. Currently, Project Vote Smart
provides information for the general public through:
* The Voter's Research Hotline 1-800-622-SMART
* The Vote-Smart Web http://www.vote-smart.org
* The Voter's Self-Defense Manual

Return to top of page

● Legal Assistance and Information

Center for Civic Education Programs and Publications


http://www.primenet.com/%7Ecce/catalog.html

Cornell Legal Information


http://www.law.cornell.edu

The Innocence Project


http://www.criminaljustice.org:80/PUBLIC/innocent.htm
The Innocence Project has helped to obtain the release of more than eight innocent prisoners with new DNA tests and
evidence which excluded them as participants in crimes for which they had been convicted.

Juvenile Justice World Wide Web (WWW) Site Listings


http://www.ncjrs.org/jjwww.htm

NYSDA's Home Page


http://www.nysda.org/

U.S. Children's Rights Law from Cornell Law School.


http://www.law.cornell.edu/topics/childrens_rights.html

Western New York Law Clinic


http://www.wnylc.org
Excellent source of information about welfare reform legislation!

Return to top of page

Return to:

Community Development Home Page Trochim Home Page Gallery Home Page

COMMUNITY DEVELOPMENT CONNECTIONS AND RESOURCES WEB SITE Copyright © 1997 Georgette M. King gmk5@cornell.edu. All Rights Reserved.
DATE LAST UPDATED: 4/22/97
COMMUNITY DEVELOPMENT CONNECTIONS AND
RESOURCES

Building Strong, Youth Focused


In-School and After School Activities

Building Strong, Youth Focused In-School and After School Activities

● Youth Education Resources


● Teen focused and Child focused lists of computer skill building resources
● Teen Employment and Career Development
● School-to-Work Resources
● Accessing College
Includes links to College Access Program Resources, College Scholarship Information (for students of all ages), Lists
of On-line College Resources, and Links to Technical Training and Job Preparation Resources.

● Youth Education Resources

● WEB SITES

● LISTSERVES

Below are listed several mail groups to which you may want to subscribe. For more possibilities, visit
www.cas.psu.edu/docs/cyfernet/handout4.html and look at others listed there.

CHILDREN ACCESSING CONTROVERSIAL INFORMATION


A mailing list for discussing children accessing controversial information through computer networks. See
http://www.zen.org/~brendan/caci.html for more information.
Instructions:
SUBSCRIBE: send email to caci-request@media.mit.edu with the command: subscribe
UNSUBSCRIBE: send email to caci-request@media.mit.edu with the command: unsubscribe
TO SEND MAIL TO THE GROUP: (information not available at this time) Contact: caci-owner@media.mit.edu

CHILDREN, YOUTH & FAMILY DISCUSSION


A general discussion of issues related to the health, education and well being of children, youth and families.
Instruction:
SUBSCRIBE: send email to listserv@tc.umn.edu with the command: subscribe CYF-L your name
UNSUBSCRIBE: send email to listserv@vm1.spcs.umn.edu with the command: unsubscribe CYF-L your name
TO SEND MAIL TO THE GROUP: CYF-L@vm1.spcs.umn.edu
Contact: Lori Bock, cyfcec@maroon.tc.umn.edu
EDNET
A discussion of educational uses of the Internet. This is a very active list... you'll get a lot of mail from it!
Instructions:
SUBSCRIBE: send email to listproc@lists.umass.edu with the command: subscribe EdNet Your Name
UNSUBSCRIBE: send email to listproc@lists.umass.edu with the command: unsubscribe EdNet
TO SEND MAIL TO THE GROUP: ednet@nic.umass.edu
Contact: ednetmgr@educ.umass.edu

SCHOOL AGE CARE DISCUSSION GROUP


A forum for discussing school-age child care issues and topics of concern, including planning, resources, activities, funding,
staff and staff development, and related subjects. The list is operated by the ERIC Clearinghouse on Elementary and Early
Childhood Education (ERIC/EECE) and the School-Age Child Care Project (SACCProject) at the Center for the Study of
Women, Wellesley College.
Instructions:
SUBSCRIBE: send email to Majordomo@ux1.cso.uiuc.edu with the command: subscribe SAC your name
UNSUBSCRIBE: send email to Majordomo@ux1.cso.uiuc.edu with the command: unsubscribe SAC
TO SEND MAIL TO THE GROUP: sac@ux1.cso.uiuc.edu

SCIENCE & TECHNOLOGY (Part of the CYFAR Initiative)


A discussion of science and technology nonformal education issues.
Instructions:
SUBSCRIBE: send email to almanac@mes.umn.edu with the command: subscribe sciteced
UNSUBSCRIBE: send email to listproc@listproc.wsu.edu with the command: unsubscribe sciteced
TO SEND MAIL TO THE GROUP: sciteced@mes.umn.edu
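
The subscription commands above are simply one-line email messages. As a minimal illustrative sketch (assuming Python's standard smtplib and email modules, an SMTP server reachable on localhost, and a hypothetical sender address and subscriber name; the list addresses and commands are the ones quoted in the entries above), such a command could also be sent programmatically:

# Minimal sketch: sending a listserv command by email.
# Assumes an SMTP server on localhost and a hypothetical sender address;
# substitute the list address and command from the entries above.
import smtplib
from email.message import EmailMessage

def send_list_command(list_address, command, sender="you@example.org"):
    """Send a one-line command (e.g. 'subscribe CYF-L Jane Doe') to a list server."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = list_address
    msg["Subject"] = ""        # most list servers read the command from the message body
    msg.set_content(command)
    with smtplib.SMTP("localhost") as server:
        server.send_message(msg)

# Example: subscribing to the Children, Youth & Family discussion list above.
send_list_command("listserv@tc.umn.edu", "subscribe CYF-L Jane Doe")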

Return to top of page

● Teen focused and Child focused lists of computer skill building resources

Return to top of page

● Teen Employment and Career Development

Return to top of page

● School-to-Work

Return to top of page


● Accessing College

College Bound
http://www.cbound.com/
College Bound is a non-profit organization working with D.C. public school students who want to go to college -- but need
a helping hand. The program was started in 1991 as "The D.C. Partnership for Education" by a teacher, a Capitol Hill staffer
and an officer of the Congressional Black Caucus. College Bound services uniquely combine mentoring, tutoring, and
positive life experiences with the opportunity for scholarship assistance.

College Funding Site


http://http.tamu.edu/~gac3280/.money_html/dollar.html
Thinking about those college application deadlines? This web site contains information on various scholarships, internships,
grants, etc. for minority students.

Return to top of page

Return to:

Community Development Home Page Trochim Home Page Gallery Home Page

COMMUNITY DEVELOPMENT CONNECTIONS AND RESOURCES WEB SITE Copyright © 1997 Georgette M. King, gmk5@cornell.edu. All Rights Reserved.

DATE LAST UPDATED: 3/30/97


COMMUNITY DEVELOPMENT CONNECTIONS AND RESOURCES
Economic Opportunity

● Self-employment and Resources to Start your own Business


● Adult Employment Resources
● On-line Help Wanted Lists
● Adult Welfare-to-Work Resources
● Links to Technical Training and Job Preparation Resources
● Adult Education Resources
● Lists of On-line College Resources
● College Scholarship Information (for students of all ages)
● Teen focused computer skill building resources
● Child focused computer skill building resources
● Computer Skill Building Resources (Adult Focused)

● Self-employment and Resources to Start your own Business

HUD HOME PAGE


http://www.huduser.org/ and the text version can be found at: gopher://huduser.org
The U.S. Department of Housing and Urban Development (HUD) is expanding its use of the "information
superhighway" to give the public quick access to HUD programs and services and to make it easier for people to do
business with the Department.
HUD's home page is not just about HUD; it's a clearinghouse of information on homes and communities. It
provides information from other federal agencies, non-profits and the private sector through links to about 175 other
Web sites.
Individuals can access information on job skills training, community services, employment opportunities, and
education. You will also learn how HUD funds are being spent by local governments, and find other data summaries and
maps on nearly 1,000 communities.

LISTSERVES

COMMUNITIES IN ECONOMIC TRANSITION DISCUSSION GROUP


The Communities in Economic Transition national initiative focuses on providing two types of assistance to
communities; strategic planning; and business assistance and enterprise development. This forum is open for
practitioners, academics, and others to share ideas, programs, and opinions. It is open to everyone, regardless of area
of specialty or discipline.
Instructions:
SUBSCRIBE: send email to almanac@esusda.gov with the command: subscribe cet-mg
UNSUBSCRIBE: send email to almanac@esusda.gov with the command: unsubscribe cet-mg
TO SEND MAIL TO THE GROUP: cet-mg@esusda.gov
Contact: Todd Landfried, CSREES-USDA (tlandfried@reeusda.gov)

Return to top of page


● Adult Employment Resources

Adams JobBank Online


http://www.adamsonline.com/
The Adams Media Co. specializes in self-help books for those seeking work and job mobility. This online resource
provides leads to jobs in computers, "general" work areas, education, social work, sales, health care, and more. It also
has a database searchable by state.

America's Job Bank


http://www.ajb.dni.us./
This is a MASSIVE site with information on about 250,000 jobs. Listings are from 1,800 state employment service
offices around the country and represent every line of work from professional and technical to blue collar and entry
level.

Boldface Jobs
http://www.boldfacejobs.com/
This site not only allows you to search various databases for jobs but also to enter your name and resume in the
database as well.

CareerPath
http://www.careerpath.com/info.html
This site allows you to check out the want ads in the Sunday editions of the New York Times, The Washington Post,
the Chicago Tribune, the L.A. Times, the Boston Globe, and other newspapers. If you are targeting employment in a
particular city, here's the place to go. Unfortunately, other than the New York Times, there are no other New York
cities' papers listed here.

Careers and Jobs


http://www.starthere.com/jobs/
An exhaustive set of databases, this site carries hundreds of job listings and links to other sites in the areas of
academia, law enforcement, chemistry, industry, computer, and the environment. It also provides links to groups
offering resume services, job fairs, and training and development.

Federal Jobs
http://www.fedworld.gov/jobs/jobsearch.html
For those interested in working for Uncle Sam, here's a listing of all government agency jobs around the country.
You can limit your search by region or engage in more general searches by job types.

Humanus Employment Site


http://www.humanus.com/employment.html
Contains links to several sites for jobs, especially within New York State. You can link to the "Monster Board,"
Career Plaza, Federal Jobs Digest, NY Search Job Descriptions, Syracuse Online, Western NY Jobs, and the "Silicon
Alley Job Board" through this site.

● LISTSERVES

Return to top of page


● On-line Help Wanted Lists

Careers and Jobs


http://www.starthere.com/jobs/
An exhaustive set of databases, this site carries hundreds of job listings and links to other sites in the areas of
academia, law enforcement, chemistry, industry, computer, and the environment. It also provides links to groups
offering resume services, job fairs, and training and development.

America's Job Bank


http://www.ajb.dni.us./
This is a MASSIVE site with information on about 250,000 jobs. Listings are from 1,800 state employment service
offices around the country and represent every line of work from professional and technical to blue collar and entry
level.

Adams JobBank Online


http://www.adamsonline.com/
The Adams Media Co. specializes in self-help books for those seeking work and job mobility. This online resource
provides leads to jobs in computers, "general" work areas, education, social work, sales, health care, and more. It also
has a database searchable by state.

Boldface Jobs
http://www.boldfacejobs.com/
This site not only allows you to search various databases for jobs but also to enter your name and resume in the
database as well.

Federal Jobs
http://www.fedworld.gov/jobs/jobsearch.html
For those interested in working for Uncle Sam, here's a listing of all government agency jobs around the country.
You can limit your search by region or engage in more general searches by job types.

CareerPath
http://www.careerpath.com/info.html
This site allows you to check out the want ads in the Sunday editions of the New York Times, The Washington Post,
the Chicago Tribune, the L.A. Times, the Boston Globe, and other newspapers. If you are targeting employment in a
particular city, here's the place to go. Unfortunately, other than the New York Times, there are no other New York
cities' papers listed here.

Humanus Employment Site


http://www.humanus.com/employment.html
This site, maintained by RRIC's own Dan Childers, contains links to several sites for jobs, especially within New
York State. More specifically, you can link to the "Monster Board," Career Plaza, Federal Jobs Digest, NY Search
Job Descriptions, Syracuse Online, Western NY Jobs, and the "Silicon Alley Job Board."

Return to top of page


● Adult Welfare-to-Work Resources

LISTSERVES

COMMUNITIES IN ECONOMIC TRANSITION DISCUSSION GROUP


The Communities in Economic Transition national initiative focuses on providing two types of assistance to
communities; strategic planning; and business assistance and enterprise development. This forum is open for
practitioners, academics, and others to share ideas, programs, and opinions. It is open to everyone, regardless of area
of specialty or discipline.

Instructions:
SUBSCRIBE: send email to almanac@esusda.gov with the command: subscribe cet-mg
UNSUBSCRIBE: send email to almanac@esusda.gov with the command: unsubscribe cet-mg
TO SEND MAIL TO THE GROUP: cet-mg@esusda.gov
Contact: Todd Landfried, CSREES-USDA (tlandfried@reeusda.gov)

Return to top of page

● Links to Technical Training and Job Preparation Resources (SEE ALSO: Adult Education Resources)
Careers and Jobs
http://www.starthere.com/jobs/
An exhaustive set of databases, this site carries hundreds of job listings and links to other sites in the areas of
academia, law enforcement, chemistry, industry, computer, and the environment. It also provides links to groups
offering resume services, job fairs, and training and development.

Return to top of page

HUD HOME PAGE

http://www.huduser.org and the text version can be found at: gopher://huduser.org
The U.S. Department of Housing and Urban Development (HUD) is expanding its use of the "information
superhighway" to give the public quick access to HUD programs and services and to make it easier for people to do
business with the Department.
HUD's home page is not just about HUD; it's a clearinghouse of information on homes and communities. It
provides information from other federal agencies, non-profits and the private sector through links to about 175 other
Web sites.
Individuals can access information on job skills training, community services, employment opportunities, and
education. You will also learn how HUD funds are being spent by local governments, and find other data summaries and
maps on nearly 1,000 communities.
(Information about this site provided by Pat Pollak, CEH, Cornell University College of Human Ecology.)

● Adult Education Resources

Return to top of page


● Lists of On-line College Resources

Return to top of page

● College Scholarship Information (for students of all ages)

College Funding Site
http://http.tamu.edu/~gac3280/.money_html/dollar.html
Thinking about those college application deadlines? This web site contains information on various scholarships,
internships, grants, etc. for minority students.

Return to top of page

● Teen focused computer skill building resources

Return to top of page

● Child focused computer skill building resources

Kids Internet Delight


http://allison.clark.net/pub/journalism/kid.html
This list of online resources includes 80 places for kids to go with and without their parents. Definitely worth the trip.

Return to top of page

● Computer Skill Building Resources (Adult Focused)

Print Publications about the Internet


http://www.hec.ohio-state.edu/famlife/technol/printpub.htm#intro
There are numerous print resources about the Internet and the World Wide Web. This site briefly reviews a number of
them to help you make purchasing decisions.

Glossary of Internet Terms


http://www.matisse.net/files/glossary.html
Wondering what people are talking about when they use words and phrases like ASCII, Java, TCP/IP, and
spamming? Here's the site to help you make sense of it all.

WWW Frequently Asked Questions


http://sunsite.unc.edu/boutell/faq/www_faq.html (new) OR
http://www.arn.net/faq/wwwfaq1.html (old)
Head here to read basic questions and answers about the Web. It is supposed to have a new address (listed first), but I
could only find it at the old address (listed second). This page is updated frequently with the latest developments.

The Net Magazine Online


http://www.thenet-usa.com
This magazine contains many interesting developments and web sites. Especially interesting and helpful are the
"Sites of the month" and the "blue pages."

PC Magazine Online
http://www.pcmag.com/pcmag.htm
Another good resource online is PC Magazine. The top 100 web sites section is a sure winner.

Return to top of page

Return to:

Community Development Home Page Trochim Home Page Gallery Home Page

COMMUNITY DEVELOPMENT CONNECTIONS AND RESOURCES WEB SITE Copyright © 1997 Georgette M. King, gmk5@cornell.edu. All Rights Reserved.

DATE LAST UPDATED: 3/30/97


COMMUNITY DEVELOPMENT
ORGANIZING AND ACTIVISTS CONNECTIONS AND RESOURCES

Personal Efficacy / Access to Health Care

● Health Care Resources


Cancer
General Topics
Sudden Infant Death Syndrome Information:
● Substance Use Recovery and Prevention Resources
● Evaluations of health / nutrition programs - See Directory of Program Evaluation Web Sites

● Health Care Resources

WEB SITES

Cancer

African American Women: Breast cancer and mammography facts


http://www.graylab.ac.uk/cancernet/600616.html

AVON'S BREAST CANCER AWARENESS CRUSADE


http://www.pmedia.com/Avon/library.html
Includes:
Glossary of Terms
Breast Cancer FAQ
More Facts about Breast Cancer
What You Should Know About Mammograms
Breast Cancer Support Groups

NATIONAL ACTION PLAN ON BREAST CANCER (NAPBC) Web site


http://www.napbc.org/

General Topics

Decisions for Health Network (NDHN)


http://www.oznet.ksu.edu/ndhn/index.htm
"The mission of the National Decisions for Health Network (NDHN) is to support and empower individuals, children,
youths, families and communities as they make decisions about health and health care and adopt healthy lifestyles. This
network will provide technical assistance to the children, youth and family customers and programs focused on health issues
across the life span."

KidsHealth
http://kidshealth.org/
Health education for kids and parents from the Nemours Foundation with separate browsing areas for kids, parents and
professionals. Includes many lists of links to family health-related sites.

Women's Health Weekly


http://www.newsfile.com/homepage/1w.htm

Sudden Infant Death Syndrome Information

The Family Village Web site


http://www.familyvillage.wisc.edu/lib_sids.htm#Articles
Waisman Center
University of Wisconsin--Madison
1500 Highland Ave.
Madison, WI 53705-2280
Includes links to other SIDS related WEB sites.

National SIDS Resource Center Web site


http://www.circsol.com/SIDS/
2070 Chain Bridge Rd., Suite 450
Vienna, VA 22182
Telephone: 703-821-8955
(703) 821-2098 (Fax)
info@circsol.com

Reducing the risks of SIDS: Some steps parents can take.


http://sids-network.org/risk.htm
(1996, November 29). Sudden Infant Death Syndrome. Network. [On-line].

LISTSERVES

CHILDREN, YOUTH & FAMILY DISCUSSION


A general discussion of issues related to the health, education and well being of children, youth and families.
Instructions:
SUBSCRIBE: send email to listserv@tc.umn.edu with the command: subscribe CYF-L your name
UNSUBSCRIBE: send email to listserv@vm1.spcs.umn.edu with the command: unsubscribe CYF-L your name
TO SEND MAIL TO THE GROUP: CYF-L@vm1.spcs.umn.edu
Contact: Lori Bock, cyfcec@maroon.tc.umn.edu
PEDIATRICS FOR PARENTS
A Source of Children's Health Information. Pediatrics for Parents is a monthly E-zine covering advances in children's health
care, in depth articles on selected topics, and practical advice. The articles, written by health care professionals, discuss
many different topics. The editor, Rich Sagall, M.D., is a board certified family practitioner. This is not a mail discussion
group; you only receive mail by subscribing to it.
Instructions:
SUBSCRIBE: send email to pediatricsforparents@pobox.com with the command: subscribe your email address (for example:
subscribe hlh2@cornell.edu)
TO SEND MAIL TO THE GROUP: NEW-LIST@LISTSERV.NODAK.EDU

Return to top of page


● Substance Use Recovery and Prevention Resources

Return to top of page

Return to:
WEB SITE, LISTSERVE DIRECTORY | EVALUATION DIRECTORY | Spirituality / Ethnic Identity | Kinship | Political and Civic Involvement | School /
Youth | Personal Efficacy / Economic Opportunity | Community Development, Organizing and Activism | Child Care | Housing | Personal Efficacy / Community
Safety and Security | Recreational Opportunities

Community Development Home Page Trochim Home Page Gallery Home Page

COMMUNITY DEVELOPMENT CONNECTIONS AND RESOURCES WEB SITE Copyright © 1997 Georgette M. King, gmk5@cornell.edu. All Rights Reserved.

DATE LAST UPDATED: 4/22/97


COMMUNITY DEVELOPMENT, ORGANIZING AND ACTIVIST CONNECTIONS
AND RESOURCES

Who's Who in Community Development, Organizing and Activism

Who's Who in Community Development, Organizing and Activism

● Community Centers
● Community Development, Organizing and Activism Projects
● Community Development, Organizing and Activism Training Resources

● Community Centers

New York State

Ithaca, NY - Southside Community Center


http://tam.cornell.e du/faculty/ruina/www_ribs/sscs.html

Return to top of page

● Community Development, Organizing and Activism Projects

New York State

Return to top of page

● Community Development, Organizing and Activism Training Resources

Building Community: African American / Diversity Focused

Neighborhood Assoc. Manual


http://www.open.org/scserv/cid.html
From Salem, Oregon

Building Diverse Communities


http://www.cpn.org/sections/topics/community/stories-studies/pew_diverse_com.html
From the Sea Islands, SC.

Excerpts from "Black Baltimore: A New Theory of Community," by Harold A. McDougall
http://www.cpn.org/sections/topics/community/stories-studies/black_baltimore7.html

"Base Communities: Citizen Action at the Grassroots"
http://www.cpn.org/sections/topics/religion/stories-studies/black_baltimore8.html
Excerpted from Harold A. McDougall's Black Baltimore: A New Theory of Community. Case study plus. Baltimore,
MD.

Building Community: Religious Focus

"Base Communities: Citizen Action at the Grassroots"


http://www.cpn.org/sections/topics/religion/stories-studies/black_baltimore8.html
A study of religious "base communities" in Baltimore. Base Communities are small, intimate peer groups of a dozen
or two dozen people, in which participants can evaluate the day's struggles, commiserate with one another's failures,
celebrate success, and plan for the next day's fight in larger public arenas. Excerpted from Harold A. McDougall's
Black Baltimore: A New Theory of Community. Case study plus. Baltimore, MD.

Return to top of page

Return to:
WEB SITE, LISTSERVE DIRECTORY I EVALUATION DIRECTORY I Spirituality / Ethnic Identity I Kinship I Political and Civic Involvement I School /
Youth I Personal Efficacy / Economic Opportunity I Personal Efficacy / Health Care I Child Care I Housing I Personal Efficacy / Community Safety and Security I
Recreational Opportunities

Community Development Home Page Trochim Home Page Gallery Home Page

COMMUNITY DEVELOPMENT CONNECTIONS AND RESOURCES WEB SITE Copyright © 1997 Georgette M. King, gmk5@cornell.edu. All Rights Reserved.

DATE LAST UPDATED: 4/22/97


COMMUNITY DEVELOPMENT CONNECTIONS AND RESOURCES

Community Safety and Security

● Neighborhood Watch and Child Safety Resources


● Human Rights Resources
● Police/Civilian Review Board Resources

● Neighborhood Watch and Child Safety Resources

Biddeford, Maine, Police Department


http://www.lamere.net/bpd

Chicago, Ill., Police Department


http://www.ci.chi.il.us/CommunityPolicing

Community Policing Consortium


http://www.communitypolicing.org
Community policing topics are updated monthly and include an array of information such as training sessions and training
curricula, versions of the Community Policing Exchange, Sheriff Times and Information Access Guide, and a growing list of
related web sites.

HUD HOME PAGE


http://www.huduser.org/ and the text version can be found at: gopher://huduser.org
The U.S. Department of Housing and Urban Development (HUD) is expanding its use of the "information superhighway" to
give the public quick access to HUD programs and services and to make it easier for people to do business with the
Department.
HUD's home page is not just about HUD; it's a clearinghouse of information on homes and communities. It provides
information from other federal agencies, non-profits and the private sector through links to about 175 other Web sites.
Individuals can access information on job skills training, community services, employment opportunities, and education.
You will also learn how HUD funds are being spent by local governments, and find other data summaries and maps on nearly
1,000 communities.

National Criminal Justice Reference Service (NCJRS)


http://www.ncjrs.org
gopher://ncjrs.org
NCJRS is the largest criminal justice information network in the world. The National Institute of Justice (NIJ) created
NCJRS to furnish research findings to professionals who use the knowledge to improve the criminal justice system. NCJRS
operates specialized clearinghouses that are staffed by information specialists who offer reference, referral and distribution
services.

The United Nations Online Crime Prevention and Criminal Justice Network Clearinghouse
http://www.ncjrs.org/unojust

Washington State Institute for Community Oriented Policing


http://www.idi.wsu.edu/wsicop/
The mission of the Washington State Institute for Community Oriented Policing (WSICOP) is to broker relevant community
policing training programs to community members, police and/or other government officials that will provide technical
assistance to law enforcement agencies, and to conduct research on the implementation and results of community oriented
policing.

LISTSERVES

CHILDREN ACCESSING CONTROVERSIAL INFORMATION


A mailing list for discussing children accessing controversial information through computer networks. See
http://www.zen.org/~brendan/caci.html for more information.
Instructions:
SUBSCRIBE: send email to caci-request@media.mit.edu with the command: subscribe
UNSUBSCRIBE: send email to caci-request@media.mit.edu with the command: unsubscribe
TO SEND MAIL TO THE GROUP: (information not available at this time) Contact: caci-owner@media.mit.edu

CHILDREN, YOUTH & FAMILY DISCUSSION


A general discussion of issues related to the health, education and well being of children, youth and families.
Instructions:
SUBSCRIBE: send email to listserv@tc.umn.edu with the command: subscribe CYF-L your name
UNSUBSCRIBE: send email to listserv@vm1.spcs.umn.edu with the command: unsubscribe CYF-L your name
TO SEND MAIL TO THE GROUP: CYF-L@vm1.spcs.umn.edu
Contact: Lori Bock, cyfcec@maroon.tc.umn.edu

COMMUNITIES IN ECONOMIC TRANSITION DISCUSSION GROUP


The Communities in Economic Transition national initiative focuses on providing two types of assistance to communities;
strategic planning; and business assistance and enterprise development. This forum is open for practitioners, academics, and
others to share ideas, programs, and opinions. It is open to everyone, regardless of area of specialty or discipline.
Instructions:
SUBSCRIBE: send email to almanac@esusda.gov with the command: subscribe cet-mg
UNSUBSCRIBE: send email to almanac@esusda.gov with the command: unsubscribe cet-mg
TO SEND MAIL TO THE GROUP: cet-mg@esusda.gov
Contact: Todd Landfried, CSREES-USDA (tlandfried@reeusda.gov)

Return to top of page

● Human Rights Resources

LISTSERVES

DIVERSITY FORUM
Discussion and dissemination of information about program and audience diversity within Cooperative Extension.
Instructions:
SUBSCRIBE: send email to majordomo@reeusda.gov with the command: subscribe usda.diversity

UNSUBSCRIBE: send email to majordomo@reeusda.gov with the command: unsubscribe usda.diversity


TO SEND MAIL TO THE GROUP: usda.diversity@reeusda.gov
Contact: Manju Seal, seal@admin.uwex.edu, 608-265-3589

Return to top of page

● Police/Civilian Review Board Resources

Return to top of page

Return to:

Community Development Home Page Trochim Home Page Gallery Home Page

COMMUNITY DEVELOPMENT CONNECTIONS AND RESOURCES WEB SITE Copyright © 1997 Georgette M. King, gmk5@cornell.edu. All Rights Reserved.

DATE LAST UPDATED: 3/30/97


PREPARING 4-H
FOR THE TWENTY-FIRST CENTURY

While researchers and evaluators spend most of their time looking back into the past, their intention is to use their resulting understanding
and greater knowledge to make a contribution to the future. Policy makers, funders, administrators, staff, and clients of social programs often use
evaluation results in order to improve their programs and move them successfully into tomorrow. 4-H is no exception; it has seen evaluations of its programs at the local, state
and national levels. This web site is designed to serve as a central web resource for people interested in researching or evaluating 4-H programs. Web researchers will find it a
valuable tool as they prepare 4-H for its journey into the twenty-first century.

What is 4-H?

4-H is an organization which provides educational extra-curricular programs for children and teenagers. A brief explanation of 4-H is available at the USDA
website. The United States Department of Agriculture maintains a relationship with each state's land grant university through the Cooperative Extension
System. Each state has county extension offices which are responsible for organizing local 4-H clubs, programs and events. For links to state, county and club
4-H websites, click on the image of the United States to the right.

Visit Nancy's 4-H website for a great explanation of what 4-H is all about from a 4-H'er herself.

The National 4-H Council also maintains a website which will provide you with "more than you ever imagined".

For more information on the history of 4-H take a trip to your library and check out the following two books:

Reck, Franklin M. (1951) The 4-H Story: A History of 4-H Club Work. Iowa: The Iowa State College Press.

Wessel, Thomas and Marilyn Wessel. (1982) 4-H: An American Idea: A History of 4-H. Michigan: Bookcrafters.

Both of these books provide a great deal of information on the early years of 4-H. Reck's book goes into great detail about the agricultural and home economics foundations of
the organization. Today, the program still maintains a significant population of youth who participate in this same genre of activities. However, the program has grown
tremendously during the past few decades to include an almost infinite list of program areas. Rocketry, computer science, natural resources, and community service are only a
few of the activities that 4-H members now learn about and participate in. The subject areas that 4-H covers continue to expand, but the philosophy of the organization remains
the same: to follow the 4-H motto of "learning by doing," while at the same time contributing to positive youth development.

4-H Around the World


Today, 4-H can be found internationally. 4-H clubs in Norway, Sweden, and Finland are presently maintaining websites. However, you
must be warned, they are in their respective languages. There is a second Finnish website which English speakers will find more
accessible.

Click on the logo to the left to visit 4-H Canada.

Because 4-H has grown in many different directions, evaluations of 4-H are difficult to conceptualize and implement. Some 4-H participants are members of traditional 4-H
clubs, where there are regularly scheduled meetings and events. Often, these clubs specialize in one subject area (horses, dairy farming, community service, sewing, etc.).
However, there are many youth who are reached by non-traditional 4-H programming. For example, they may have participated in a one day, 4-H organized "Down on the Farm
Day" or they may have been a part of a "4-H International Night". 4-H staff may have presented workshops in schools or camps on nutrition, water conservation or bicycle
safety. This broad scope of programs and differing levels of participation leaves 4-H evaluators wondering where to begin. The following list of 4-H research articles and
abstracts is representative of the different methodologies and focuses that evaluations of 4-H programs can have.

4-H Research Articles and Abstracts

[Extension] Philosophy for Evaluation

4-H On the Internet

Pennsylvania's Impact Study

Notes on a 4-H Ropes Course Evaluation

4-H Cares Evaluation Instrument Review

4-H Strikes a Positive Note

Measuring self-esteem of 4-H'ers

Life Skill Development Related to Participation in 4-H Animal Science Projects

Unobtrusive Evaluation Tools for 4-H

4-H After School Assistance Program

4-H Research links


The majority of the above articles were published in the Journal of Extension. This Cooperative Extension System publication
is a peer-reviewed journal which is now available only on-line. Note: When conducting a search for 4-H information within the
Journal of Extension, the gopher search engine provides the best results. Their other search engine does not read "4-H" with
much success. The Cooperative Extension System also contributes to CYFERNet, the Cooperative Extension System Children,
Youth and Family Network. CYFERNet is an Internet information site which provides up-to-date, research-based
children, youth and family information. You can connect to CYFERNet and the Journal of Extension by clicking on their logos.

Another excellent research and evaluation source for 4-H is maintained by the Michigan State University Extension System. Their research and
evaluation site is worth visiting.

For more information about Cooperative Extension visit the Cooperative State Research, Education, and Extension Service homepage as well as the USDA homepage.

You may want to check out the homepage of WestEd, a non-profit research, development and service agency. Another potentially useful site is The National Center for
Education Statistics. There are many other sites which can contribute to 4-H related research. Because 4-H is an all encompassing program, all related links could not possibly
be listed here.

Researchers and evaluators of any program will find the resources and links at Bill Trochim's Center for Social Research Methods
extremely valuable.

4-H Point of Interest

Those of you familiar with 4-H might be aware of a new advertising campaign that is underway. This campaign is being organized by National 4-H Council and
the Ad Council (the people who introduced America to Smokey the Bear, Woodsy the Owl, and McGruff the Crime Dog). This campaign is a part of the Ad
Council's Commitment 2000. Visit their site for more information.

This page was designed by Elizabeth K. LaPolt for HSS 691-Program Evaluation and Research Design, a graduate course at Cornell University.

Notes: Cornell University is the land grant university of New York State.

4-H in New York State is a part of Cornell Cooperative Extension.


Please e-mail any comments or questions to ekl5@cornell.edu.

Copyright © 1997, Elizabeth K. LaPolt, All Rights Reserved


Purpose of this page

The purpose of this page is to act as a resource for people who are beginning to search for answers regarding the issue of
Attention Deficit Hyperactivity Disorder (ADHD) in children. This page will allow the reader to access information on both
sides of the argument: mental disorder or behavior problem. Under each section of this web page, the reader will be able to
link to further information supporting the topic area.

Good luck and happy hunting!

Need for a page on this topic

In recent years, the debate regarding Attention Deficit Hyperactivity Disorder (ADHD) has exploded.
While it is believed that this disorder affects youth and adults, the increased frequency with which the
diagnosis is being made in children and adolescents is causing concern and controversy. By 1996,
prescriptions for Ritalin, the number one medication used to treat ADHD in children, had increased by
nearly 400% in less than 10 years (Attention Deficit Disaster). With this trend to medicate our children,
proponents on both sides of this issue have defended their positions by citing research, quoting
statistics and case studies, and invoking testimonials.

CORNELL UNIVERSITY

Policy Analysis and Management 613

Michelle L. Scott-Pierce
History

It is only since the 1950's that children have been classified as having "hyperkinetic impulse disorder,"
which was accepted as a brain damage syndrome. In the 1960's, Stella Chess described the "hyperactive
child syndrome," which described children who displayed hyperactive tendencies but who had not
suffered a brain injury. In the 1970's, children who displayed hyperkinesis demonstrated more
prevalence toward attention deficits than toward hyperactivity. However, it was not until the
DSM-II (Diagnostic and Statistical Manual of Mental Disorders, version II) that Hyperkinetic
Reaction in Children was recognized as a diagnosable mental disorder (Classification of ADHD throughout History).

Since the American Psychiatric Association (APA) recognized "ADHD" as a diagnosable mental
illness, the criteria for making a diagnosis have been updated and published in the DSM manual
revisions. The first of these revisions occurred in the DSM-III (1980), which titled the diagnosis
"Attention Deficit Disorder without Hyperactivity." In 1987, the DSM-IIIR was released, changing the
diagnosis to "Undifferentiated Attention Deficit Disorder." The final version, the DSM-IV, was
released in 1994. This version labeled the diagnosis "Attention Deficit Hyperactivity Disorder" and
contained additional criteria for assessing a child as ADHD. See also: Problems in Identification and
Assessment of ADHD.

Child and Adolescent Psychiatrists

Throughout history, prominent theorists have devoted a great deal of time and research to the intellectual
development of children. Four prominent figures in the area of adolescent development and educational
theory are Piaget, Erikson, Kohlberg, and Krathwohl. This information is expanded upon in the article Research
By Well-Known Educational Theorists.

Education Theorists

Piaget asserts that intellectual development includes stages of reasoning: Sensorimotor, Preoperational, Concrete Operational, and
Formal Operational. Putting these theories into practice, Piaget asserts that children are active knowers, learning
through a variety of active tasks which promote structure and organization, as well as paradoxes to motivate a child
beyond what is known.

Erikson's theory of development holds that individuals seek identity throughout their lives, moving in and out of
several different stages. Two of these stages are Identity Diffused, in which no decisions have been made regarding one's
identity, and Foreclosed, in which premature decisions have been made regarding identity, usually based on others'
opinions.

Kohlberg's theory is one of moral development, based on seven basic stages. Each of these stages builds upon the
foundation laid in the previous one, until the individual reaches a point where justice and individual dignity are
important.

Krathwohl's theory asserts that each person will develop their own belief system. The belief system develops
as an individual receives information, responds to stimuli, values an idea, organizes values, and internalizes those
values.

Other Psychiatrists

Diagnosis of ADHD and Related Sites

As the research from the experts documents, children and youth are active individuals seeking to expand their
knowledge, define their boundaries, and assert their individuality, all within a time in their physical development
when hormones and emotions are often in conflict. By virtue of these characteristics, it is not surprising that
comments such as "I wish I had a third of their energy" or "Please pay attention to the task at hand" are said to
almost all young people at one time in their lives.

Not all young people who "seem to be hyper" or who have "difficulty staying focused" have ADHD. There are many
sites on the web available to help explain the identification and assessment of ADHD. Listed below are
several sites with brief descriptions of the information which can be obtained by visiting each site.

"Problems in Identification and Assessment of ADHD" -- This site discusses the diagnostic history of ADHD and the
new diagnostic criteria in the DSM-IV, and offers some alternatives to an ADHD diagnosis.

"North Country Psychiatric Association" -- This site contains a great deal of information on the diagnosis of children
with mental disabilities.

CHADD (Children and Adults with Attention-Deficit/Hyperactivity Disorder) -- This site provides information and
support for people who have been diagnosed with Attention Deficit Hyperactivity Disorder.

Misdiagnosis

Two prominent representatives from the medical community against the diagnosis of ADHD are Fred A. Baughman,
Jr., M.D. and David Keirsey, Ph.D. Dr. Baughman maintains that the diagnosis of ADHD is doing more harm than good
for our children: the diagnosis allows our children to be medicated instead of challenged to perform at
age-appropriate levels. Dr. Keirsey maintains that children and youth who are diagnosed as ADHD are
demonstrating behaviors outside of the "norm" but are not mentally disabled. These young people are "on the
lookout for fun" and "are not concerned with 'pleasing' teachers or parents."

Fred A. Baughman, Jr., M.D.'s articles and books on this issue can be obtained through local libraries and bookstores.
The web address for the article "What Every Parent Needs to Know About ADD" is
'wysiwyg://36/http://www.geocities.com/HotSprings/8568/Baughman_MD_ADD_ADHD_Ritalin.htm'; it cannot be
accessed as a hyperlink, but rather must be typed into the Location field.

David Keirsey, Ph.D. has several articles published on the web. The main access page for Dr. Keirsey is
http://Keirsey.com. The Great ADD Hoax examines the way in which a youth's temperament is misunderstood and
contributes to the misdiagnosis of a child with ADHD when in actuality, according to Dr. Keirsey, who is a
temperament psychologist, it is the child's temperament which is truly the problem.

Attention Deficit Disaster is a case study of a young boy diagnosed with ADHD.

One article in support of the diagnosis of ADHD which clearly eliminates the "behavioral excuses" ADHD persons
have traditionally been allowed to use in "explaining" their behavior is Neurobiological Diagnosis and Personal
Responsibility: How does morality fit in with ADHD? This article is well worth the time to explore.

"Born to Explore! The other side of ADD" is a paper written by Teresa Gallagher. In this paper the
co-occurrence of ADHD and creativity in children is explored. The paper describes the many behaviors which are
associated with both ADHD and creativity, and encourages parents to consider alternative diagnoses to
ADHD.

"ADHD and Children who are gifted" -- This site examines the connection between children who are diagnosed with
ADHD and who could also be considered gifted.
Web Search

There are many search engines available to locate information on the issue of ADHD. A few of these
engines can be accessed from this location by selecting the hyperlinks on this page. At the bottom of
the page, please find key words which will assist you in your search.

Netscape

Excite

Google

GoTo

LookSmart

Lycos

Snap

Other Key Word Searches:

ADHD
ADD
Adolescent Development
Adolescent Behavior
Parenting
Education

CORNELL UNIVERSITY

Policy Analysis and Management 613

Michelle L. Scott-Pierce
The Children

High School Dropouts


Table of Contents

Introduction To The Problem

Defining The Problem

Characteristics Of The Problem

Who Cares About the Problem

Back to Bill Trochim's Home Page

Copyright © 2000

Contact Me Here: Arlean Wells


INTRODUCTION

Why are Children Leaving High School Without a Diploma?

High school dropout is a major problem facing America today. In 1996, 11.1 percent of 16-to-24-year-olds were no
longer enrolled and had not completed high school. In that year the dropout rate was higher for Hispanics (29.4
percent) than for non-Hispanic Blacks (13 percent) or non-Hispanic Whites (7.3 percent) (The Information Series on
Current Topics 1999). In 1997, 11.0 percent of students between the ages of 16 and 24 had dropped out of high school.
Again the dropout rate was higher for Hispanics (25.3 percent) than for non-Hispanic Blacks (13.4 percent) or non-
Hispanic Whites (7.6 percent) (Digest of Education Statistics 1998). Although there was a slight
decrease in the school dropout rate between 1996 and 1997, America still suffers from the large number of
students who leave high school before receiving a high school diploma.

For many years researchers have struggled with the phenomenon of America's high school dropout problem.
Questions such as who is at risk of becoming a school dropout, what factors lead students to decide to drop out
of school, and what can be done to prevent high school dropouts, to name only a few, have been
addressed by researchers for years and are still a big concern among researchers today.

Back to the Table of Contents

Copyright © 2000 Arlean Wells, Ithaca, New York. All rights reserved.
Revised: August 23, 2004.
Defining The Problem
Types of Dropout Rates

Event rates:

Event rates describe the proportion of students who leave school each year without completing a high school program. This annual
measure of recent dropout occurrences provides important information about how effective educators are in keeping students enrolled
in school. Event Rates

Status rates:

Status rates provide cumulative data on dropouts among all young adults within a specified age range. Status rates are higher than
event rates because they include all dropouts, regardless of when they last attended school. Since status rates reveal the extent of the
dropout problem in the population, this rate also can be used to estimate the need for further education and training that will help
dropouts participate fully in the economy and life of the nation. Status rates.

Cohort rates:

Cohort rates measure what happens to a cohort of students over a period of time. This rate is based on repeated measures of a group
of students with shared experiences and reveals how many students starting in a specific grade drop out over time. Cohort rates
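
To make the distinction between the three rates concrete, here is a minimal worked sketch in Python. The counts below are made-up illustrations (only the 11.0 percent status figure echoes the 1997 statistic cited in the Introduction); they are not numbers from any of the studies referenced above.

# Illustrative only: made-up counts, not figures from the studies cited above.

# Event rate: students who left school during one year / students enrolled that year.
enrolled_this_year = 4000
dropped_out_this_year = 180
event_rate = dropped_out_this_year / enrolled_this_year            # 4.5%

# Status rate: all 16-to-24-year-olds who are not enrolled and hold no diploma,
# regardless of when they left school, / all 16-to-24-year-olds.
population_16_to_24 = 30000
not_enrolled_no_diploma = 3300
status_rate = not_enrolled_no_diploma / population_16_to_24        # 11.0%

# Cohort rate: members of one starting class who dropped out at any point over
# the follow-up period / size of that starting class.
cohort_size = 1000
cohort_dropouts_over_period = 150
cohort_rate = cohort_dropouts_over_period / cohort_size            # 15.0%

for name, rate in [("event", event_rate), ("status", status_rate), ("cohort", cohort_rate)]:
    print(name, "rate:", format(rate, ".1%"))

Note that, as described above, the status rate will generally exceed the event rate because its numerator accumulates dropouts from all previous years.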

Back to the Table of Contents

Copyright © 2000, Arlean Wells, Ithaca, New York.


Characteristics Of The Problem
Education At Risk

CRITICS OF EDUCATION OFTEN EXPLAIN AMERICA'S DROPOUT PROBLEM IN TERMS OF THEIR OWN
PHILOSOPHIES OR SOCIAL OUTLOOK. THEY (RESEARCHERS) OFTEN FOCUS MORE ON
DEMOGRAPHIC FACTORS AS THE UNDERLYING REASONS WHY STUDENTS LEAVE SCHOOL, AND
THEY BELIEVE ONCE THOSE FACTORS ARE UNDERSTOOD, INFORMATION ABOUT HOW TO
REDUCE THE SCHOOL DROPOUT RATE WILL BE AVAILABLE. FOR EXAMPLE, STUDENTS WHO ARE
"PLACED" AT RISK ARE IDENTIFIED AS BEING AT RISK OF DROPPING OUT OF SCHOOL. Characteristic of
High School Dropout

Back to the Table of Contents

Copyright ©2000, Arlean Wells, Ithaca, New York.


Who Cares About The Problem

When identifying the factors that are assumed to push students to drop out of school, one
need not focus only on demographic issues, but should also address the personal factors that
are behind students' decisions to leave school at a premature age.

There are many organizations that focus on America's dropout problem, such
as the Center for Research on The Education of Students Placed At Risk (CRESPAR). The
philosophy behind CRESPAR is that students are not inherently at risk but rather are placed at
risk of educational failure by many adverse practices and situations.

The cornerstone of CRESPAR's programmatic efforts is built on the premise that all students
can succeed with a demanding school curriculum and a high-expectations approach. Below are
projects conducted by researchers at CRESPAR, in the hope that one day all children will
succeed in school.

Exposure To Violence and School

Cultural Factors In school

Classroom Cultural Ecology

Talent Development Middle school

Talent Development High School

Opportunities To Learn

The Safe Start

Super School

Another important organization that focuses on the development of children and their education is
the Search Institute. Search Institute is an independent, nonprofit,
nonsectarian organization whose mission is to advance the well-being of adolescents and
children by generating knowledge and promoting its application. Search Institute identified 40
assets that are critical for young people's growth and development. The first 20 developmental
assets focus on positive experiences that young people receive from the people and institutions
in their lives. The last 20 address the community's responsibility for its young. The 40 Assets
and Their Definitions

Communities That Care (CTC) is another institution that emphasizes the well-being of children.
CTC is a community operating system that provides research-based tools to help communities
promote the positive development of children and youth and prevent adolescent substance
abuse, delinquency, teen pregnancy, school dropout and violence.

Back to the Table of Contents

Copyright © 2000 Arlean Wells, Ithaca, New York.


STS Science Curricula on the Net
Compiled by Morrison Chakane

Introduction

Science, Technology, and Society (STS) is a means for learning science. Scientific principles and concepts are studied within
technological and societal contexts. For example, students can learn the principle of "solubility" by studying the processes involved
in a water treatment plant (American Chemical Society, 1993). In this way, emphasis is on studying science as a process (National
Academy Press, 1996).

Already there are STS curricula completely developed in the United States appearing on the Net. The American Association for the
Advancement of Science (AAAS), through its Project 2061, has set the national goal of "scientific literacy" for STS curricula. (See
below for further discussion.) Content covered by these curricula can be categorised as "science-controlled," "technology-
controlled," or "society-controlled" (Fensham, 1992). I have found the ones on the Net to be "science-controlled"; that is, the content
emphasized is science. Also, the content is organized according to what Aikenhead (1992) calls "Singular Discipline through STS
Content," meaning "STS Content serves as an organizer for the science content and its sequence" (p. 55). Lastly, each curriculum
follows "authentic evaluation." (See below for further discussion.)

Online Curricula

ChemCom is a high school course designed by the American Chemical Society for college-bound non-science majors. Resources on
the Net include ChemCom Club WWW Page, CHEMCOM: Chemistry in the Community Discussion List, and Mr. Leiseth's
ChemCom Class which provides a report of the course implementation.

Science Education for Public Understanding Program, SEPUP, is a course developed for middle school students at the Lawrence
Hall of Science with funding from National Science Foundation. Resources on the Net include SEPUP Assessment System,
SEPUP Module Orders, LSU SEPUP Implementation Center, and a student activity Investigating Ground Water.

National Science Education Standards, simply referred to as the Standards, is a program developed by the National Academy of Sciences
for all K-12 students and published by the National Academy Press. The online resource, Standards, covers the full text of the chapters on
science teaching, science content, and science assessment.

Benchmarks is a K-12 science program developed from Science For All Americans (SFAA),
the Project 2061 report. It provides teachers with strands to develop an STS-based curriculum. The
actual activity involved in designing such a curriculum is also available on Benchmarks on Disk. The
Resources for Science Literacy is a computer-based tool to help educators enhance their own
science literacy. The Designs for Science Literacy provides educators with a variety of design
principles to build K-12 curricula.
The Goal of STS Programs

All these programs strive for the national goal of scientific literacy set by the American Association for the Advancement of
Science (AAAS) in its report Project 2061. Students should

● know the scientific principles and concepts of their natural world


● understand the social and historical interactions of science with technology and society, and
● acquire the abilities necessary to do scientific inquiry.

These goals call for different science teaching, content and assessment.

Assessment

Assessment should be authentic. Data should be collected for

● student achievement and attitude


● teacher preparation and quality
● program characteristics
● resource allocation and policy instruments.

Also, the following different methods should be used to collect data:

● Paper and pencil


● Performance testing
● Interviews
● Portfolios
● Performances
● Observing programs, students, and teachers in the classroom
● Transcript analysis, and
● Expert reviews of educational materials.

Teachers will be faced with, among other problems, the problem of organizing data gathered from the different methods above. I suggest the
multitrait-multimethod matrix developed by Campbell to organize the data. This matrix not only helps in organizing data, but also provides
evidence for convergent and discriminant validity, and for reliability.
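
As a minimal sketch of what such a matrix looks like in practice, the fragment below builds a multitrait-multimethod correlation matrix from simulated scores. The two concepts, the two methods, and all numbers are hypothetical illustrations rather than data from any STS program, and the only assumed dependency is the numpy package.

# Minimal multitrait-multimethod (MTMM) sketch with simulated data.
# Two traits ("inquiry", "knowing"), each measured by two methods
# (paper-and-pencil test and performance task). Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
n_students = 200

inquiry = rng.normal(size=n_students)    # true "ability to do scientific inquiry"
knowing = rng.normal(size=n_students)    # true "knowing principles and concepts"

measures = {
    "inquiry_paper":       inquiry + 0.4 * rng.normal(size=n_students),
    "inquiry_performance": inquiry + 0.4 * rng.normal(size=n_students),
    "knowing_paper":       knowing + 0.4 * rng.normal(size=n_students),
    "knowing_performance": knowing + 0.4 * rng.normal(size=n_students),
}

mtmm = np.corrcoef(np.vstack(list(measures.values())))

# Convergent validity: same trait, different methods -> high correlations.
# Discriminant validity: different traits -> low correlations.
print("Measures:", ", ".join(measures))
print(np.round(mtmm, 2))

In this layout, high same-trait correlations across methods and low cross-trait correlations are exactly the pattern the evaluation discussion below calls for.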

Evaluation

Data collected for one concept, like the ability to do scientific inquiry, should show high correlations on the matrix, and data collected for
different concepts, like understanding and knowing, should show low correlations on the matrix, regardless of the methods used. This
is a condition required for construct validity. For conclusion validity, statistical power should be high. For internal validity, there
must be no alternative explanations, such as mortality, testing, maturation, or instrumentation, for the cause of the effect. And for external validity,
the sample should be randomly selected.
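
For the conclusion-validity requirement that statistical power be high, a quick sample-size calculation makes the point concrete. This is a minimal sketch assuming the statsmodels package and conventional, arbitrarily chosen targets (a medium effect size, an alpha of .05, and 80% power); these values are not drawn from any of the programs above.

# Minimal power-analysis sketch (assumes the statsmodels package is installed).
# Effect size, alpha, and power are conventional illustrative choices.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print("Students needed per group:", round(n_per_group))   # roughly 64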

However, Terri Miles calls for new criteria for validity. The "old" criteria above should be modified to include consequences,
fairness, transfer and generalizability, cognitive complexity, content quality, meaningfulness, and cost and efficiency.
References

❍ American Chemical Society. 1993. ChemCom: Chemistry in the Community, 2nd ed. Iowa: Kendall Hunt
Publishers.
❍ Aikenhead, G. 1992. What is STS Science Teaching? In STS Education: International Perspectives on Reform,
edited by Joan Solomon and Glen Aikenhead. New York: Teachers College Press.
❍ Fensham, J. 1992. Science and Technology. In Handbook of Research on Curriculum, edited by Philip Jackson. New
York: Macmillan Publishing Company.

mc70@cornell.edu
Institute for Education Research in Central Asia

"You are never alone in the centre of the desert, if you are in Central Asia."

-An old regional proverb

International Organizations Central Asian Links

Research Methodologies
Recommended Readings

This institute has been set up as a site to assist users in accessing the limited resources on the WWW regarding education research
in Central Asia. Central Asia is an emerging area of interest in terms of global politics. The potential for Central Asia to become
an economic giant within Asia is tremendous. This page will explore the role that research in education can play in human resource
development in Central Asian Republics.

Historic Background on Education in Central Asia

Adapted from: "The dilemmas of choosing" by Sanyal, Bikas C. and Kitaev, Igor, in IIEP Newsletter, October-December 1993, and "Synergy and
continuum" by Kitaev, Igor, in IIEP Newsletter, October-December 1995.

The collapse of the Soviet Union four years ago not only brought independence to the countries of the region, it also accelerated the
development of new attitudes towards education. Free education and an extremely high (99%) literacy level were major
achievements of the Central Asian countries as part of the former Soviet Union in terms of human development. The new challenge
was to maintain these achievement levels within relatively less centralized administrative structures. But as soon as public
education in Central Asia lost its compulsory and universal nature after 1991, internal efficiency plunged under the budgetary
constraints of the transition period. It took three to four painful years for public opinion in Central Asian countries to accept that, in a
decentralised context, their education systems were faced with imminent collapse because the regional and local authorities who had
taken over the running of schools simply did not have the necessary human or financial resources to ensure state backing at the same
levels as previously. This structural adjustment effectively blocked the quantitative expansion of the systems, and double and triple
shift systems were brought in to compensate. In order to stop the decline of internal efficiency, governments were forced to undertake
harsh but necessary decisions. In Kazakhstan, in 1994, 1,057 kindergartens, 30 vocational and technical schools, 20 boarding schools,
65 elementary and 157 multigrade schools in sparsely populated areas were closed for cost-efficiency reasons. Several specialized
higher education institutions were closed in Tadjikistan, and evening classes and correspondence courses were discontinued in
Turkmenistan. Therefore the states realized that they could no longer maintain previous patterns of educational management and
finance based on social demand and unlimited free supply of public education.

New Directions for Education

There is an
enormous concern to develop adequate means for realizing the region's potential in education. After an initial period of rejecting
everything from the recent past, and a rush to indiscriminately adopt everything foreign, the transition is gradually moving from the
present centralized system to one which emphasizes national identity and diversity. Following are some of the new directions that
have been adopted as new educational laws within the Central Asian Countries:

Reducing the length of compulsory education from eleven to nine years, in order to avoid educational wastage

Allowing private and other types of non-public education at all levels

Gradually introducing tuition fees in higher education and various types of user fees at lower levels

Discontinuing the practice of guaranteed employment for graduates

Converting from the Cyrillic to the Latin alphabet and promoting the use of local languages for instruction

Exploration Links

The Institute
for Education Research will provide a research framework for exploring these new directions through the lens of external and
internal efficiency and equity within the education sector in Central Asia. The four links at the top of this page are sites that are
useful for those investigating education research within a broad context.

The first link is to International Organizations. These include the major organizations who are doing work in Central Asia. Examples
are: World Bank , United Nations Development Programme (UNDP), World Health Organization (WHO), International Monetary
Fund (IMF), and Asian Development Bank. There are others as well, but these are the most well known organizations.

The next link will take you to Central Asian Links. The first links on this page are country home pages that offer a wide range of information about each country. There are two pages for Kazakhstan (Kazakhstan 1 and Kazakhstan 2), two for Kyrgyzstan (Kyrgyzstan 1 and Kyrgyzstan 2), one for Tajikistan, two for Turkmenistan (Turkmenistan 1 and Turkmenistan 2), and two for Uzbekistan (Uzbekistan 1 and Uzbekistan 2).

For those interested in more detailed information about Central Asia, there are links on this page to Central Asian Experts, Central
Asian References, Related Peoples, Places & Topics, Inner Asia in Cyberspace, Central Asian Studies @ UW, Newly Independent
States of Central Asia, and the Forum for Central Asian Studies at Harvard University.

The third link provides Research Methodologies used for conducting research at two levels. The first level deals with establishing
and evaluating the external efficiency and equity in education. This is the macrolevel research model. The second level uses refined
measures for gauging the internal efficiency and equity. This is the microlevel research model. An appropriate interface will act as a
bridge between the two levels of research. This is the macro/microlink model.

The last link, Recommended Readings, will give you a list of several readings relating to education and Central Asia in general.
Some of these have abstracts that you may find helpful.
If you have
any comments about this page and its related links, or have information about Central Asian educational research that has not
been mentioned here, please drop me a line.

Samid Hussain, Dept. of Education, Cornell University


Links to International Organizations

World Bank

Asian Development Bank

International Monetary Fund (IMF)

United Nations Development Programme (UNDP)

UNDP Links

World Health Organization (WHO)

Asian Institute of Technology (Bangkok, Thailand)

Inter-American Development Bank

International Institute for Sustainable Development (IISD)

World-Wide Web Virtual Library: United Nations Information Services

World-Wide Web Virtual Library: Asian Studies (Australian National University)

WWW sites in Asia (List of W3 Consortium)


Central Asian Links

● Kazakhstan 1

● Kazakhstan 2

● Kyrgyzstan 1

● Kyrgyzstan 2

● Tajikistan

● Turkmenistan 1

● Turkmenistan 2

● Uzbekistan 1

● Uzbekistan 2

● Central Asian Experts

● Central Asian References

● Related Peoples, Places & Topics

● Inner Asia in Cyberspace

● Central Asian Studies @ UW

● Newly Independent States of Central Asia

● Forum for Central Asian Studies at Harvard University


Research Methodologies

For policy makers to address reform in Central Asia effectively, they first need to know the desired outcomes. These can be projected through the macrolevel model of education research. Second, the linkages and interactions between different levels of organization have to be understood. This can be accomplished with the microlevel research model. Third, this information has to be aligned with national priorities while remaining sensitive to local needs. This is done by identifying the regularities within similar types of schools and linking them to education policies at different organizational levels. Ideally, this should result in the design of the most suitable interventions at the level of inputs, processes, and/or outputs. These interventions can take several different forms, such as reallocation of resources, shifts in autonomy, or greater outcome incentives.

Following are the recommended research methodologies for evaluating the efficiency and equity concerns in education. These can
apply to all developing countries.

Macrolevel Research in Education

Microlevel Research in Education

Interface between Macrolevel and Microlevel Research in Education


Macrolevel Research in Education

This research model lays the foundation for human resource development. It prioritizes alternative strategies for contributing to economic growth. Its function is to evaluate the external efficiency and equity within the education sector.
Microlevel Research in Education

This research model uses the classroom as the unit of analysis and draws upon similarities across classrooms within similar
types of schools.
Interface between Macrolevel and Microlevel Research in Education

The idea presented by this concept map is to balance microlevel research with macrolevel phenomena. This can be accomplished through a contextual aggregation of microlevel research and by linking it to macrolevel decision making. The regularities seen within classrooms across similar types of schools can therefore provide better insights for an effective policy response than decontextualized, aggregated school information that is not sensitive to the enormous diversity existing between schools, regions, and countries.
Recommended Readings

Resources for Teaching about Inner Asia (A Somewhat Annotated Guide).

Author: Weston, David C. (1988).

ERIC Abstract: This partially annotated bibliography contains selected resource materials about Central Asia/Inner Asia for use
with the secondary school social studies curriculum. Part 1 provides a list of eight organizations that specialize in the study of Central
Asia, while Part 2 contains a collection of seven reference works. Part 3 offers six English language periodicals and newsletters that
are published in a number of different countries, and a list of 21 periodical and magazine articles appear in part 4. Part 5 annotates 17
books for students that are either about Central Asia or recount the journeys of people who have traveled there. Eleven teaching aids,
prepared by the Research Institute for Inner Asian Studies, Bloomington (Indiana), are included in part 6.

Teaching Techniques and Materials for the Study of Inner Asia: The Peoples of the Steppe. A Historical and Cultural Perspective.

Author: Cornelius, Martha J. et al. (1978).

ERIC Abstract: This curriculum unit is intended to serve as a general introduction to the study of the traditions and culture of the
vast heartland of the Eurasian land mass called Inner Asia. Objectives are to stimulate student and teacher interest in Inner Asian
studies, as well as to encourage students to learn about the historical experience of other peoples. Learning activities and resource
materials are included to provide maximum flexibility for teachers. A selected bibliography and a reference section on teaching aids have been included to facilitate any additional research which may be needed. Contained in this curriculum guide are units dealing with: (1) the culture and customs of Inner Asian peoples; (2) comparative religious beliefs; (3) geographic exercises; (4)
myths and legends; (5) Russian eastward expansion; (6) Mongol revolution; (7) economics and government; (8) roles of modern
women; (9) influence of communism; and (10) simulation exercises on modern China. Short plays and games are included to
reinforce the concepts.

Education and Social Change--A Study of the Role of the School in a Technically Developing Society in Central Asia.

Author: Medlin, William K. et al. (1965).

ERIC Abstract: A definition of the range of influence of the Uzbek teacher as an agent of sociocultural change in Soviet Uzbekistan and a determination of the role of the teacher in transmitting new values and reinterpreting traditional cultures were the major purposes of the study. An interdisciplinary approach was used wherein historical-cultural, psychological, and sociological research methods were combined. Policies and methods which Soviet authorities pursued over a 40-year period in Uzbekistan have met with qualified success. The major conclusion of the study was that the school was performing specific and vital roles of change. The formulation of the underlying principles of this position into a theory that defines a model for sociocultural change is a genuine possibility, but will depend upon further work in this field. (HB).

Kazakh: Language Competencies for Peace Corps Volunteers in Kazakhstan.

Author: Cirtautas, Ilse. (1992).

ERIC Abstract: The text is designed for classroom and self-study of Kazakh by Peace Corps volunteers training to serve in
Kazakhstan. It consists of language and culture lessons on 13 topics: personal identification; classroom communication; conversation
with a host counterpart or family; general communication; food; money; transportation; getting and giving directions; shopping at a
bazaar; reception by a host family; workplace language; medical and health issues; and interaction with officials. An introductory
section outlines major phonological and grammatical characteristics of the Kazakh language and features of the Cyrillic alphabet.
Subsequent sections contain the language lessons, organized by topic. Each lesson consists of a prescribed competency, a brief
dialogue, vocabulary list, and grammatical and vocabulary notes. Many sections also contain cultural notes. Appended materials
include a translation of the dialogues, glossary, word list, and brief bibliography on Kazakh language, history, and literature and
culture.

On the Development of Methods and Forms of Teaching and Upbringing Work in the Schools of Siberia and Eastern Kazakhstan
(1917-1931).

Author: Romanov, A. P. (1981).

In: Soviet-Education. v24 n1 p16-28 Nov 1981.

ERIC Abstract: Describes the public education system implemented in Siberia and Eastern Kazakhstan from 1917 to 1931.
Discussed are curricula, teaching methods, and extracurricular work. All students were required to make practical applications of the
knowledge acquired at school. (AM).

Central Asia: Emerging New Order

PUBLISHED: New Delhi: Har Anand Publications, c1995.

Bibliography of Islamic Central Asia

AUTHOR: Bregel, Yuri, 1925-

PUBLISHED: Bloomington, Indiana: Indiana University, Research Institute for Inner Asian Studies, 1995.

Guide to Scholars of the History and Culture of Central Asia

AUTHOR: Schoeberlein-Engel, John S. (John Samuel)

PUBLISHED: Cambridge: Harvard Central Asia Forum, 1995.

Central Asia in Historical Perspective

PUBLISHED: Boulder: Westview Press, 1994.

Central Asia: Its Strategic Importance and Future Prospects

PUBLISHED: New York: St. Martin's Press, 1994.

Political and Economic Trends in Central Asia

PUBLISHED: London: British Academic Press, c1994.

Central Asia: A Survey of Libraries and Publishing in the Region

AUTHOR: Johnson, Eric A. (Eric Alan), 1963-

PUBLISHED: Washington, D.C.: International Research & Exchanges Board, [1993]


EMOTIONS and EMOTIONAL INTELLIGENCE
This page was written in 1996 and since then has continued to be the most popular emotional intelligence resource on the web. Since
this website was published there have been many new and exciting findings in the EI community.

CLICK HERE FOR THE LATEST INFORMATION ON EMOTIONAL INTELLIGENCE INCLUDING:

● Emotional intelligence tests


● Emotional intelligence books
● Emotional intelligence resources
● Emotional intelligence research and case studies

The original page from 1996 is below.

This page is an on-line bibliography in the area of emotions and emotional intelligence, describing current research findings and
notes of interest. The main areas covered are:

● Emotional Intelligence
❍ What is emotional intelligence?

❍ Why is emotional intelligence important?

❍ Tests of emotional intelligence

● Emotions
❍ Affect, Mood and Emotions

❍ The Brain and the Neuropsychology of Emotions

● Methods for Researching Emotions


● References

EMOTIONAL INTELLIGENCE

What is emotional intelligence?

Recent discussions of EI proliferate across the American landscape -- from the cover of Time, to a best selling book by Daniel
Goleman, to an episode of the Oprah Winfrey show. But EI is not some easily dismissed "neopsycho-babble." EI
has its roots in the concept of "social intelligence," first identified by E.L. Thorndike in 1920. Psychologists have been uncovering other intelligences for some time now, grouping them mainly into three clusters: abstract intelligence (the ability to understand and manipulate verbal and mathematical symbols), concrete intelligence (the ability to understand and manipulate objects), and social intelligence (the ability to understand and relate to people) (Ruisel, 1992). Thorndike (1920: 228) defined social intelligence as "the ability to understand and manage men and women, boys and girls -- to act wisely in human relations." And Gardner (1983) includes inter- and intrapersonal intelligences in his theory of multiple intelligences (see Gardner for an interesting interview with the Harvard University professor). These two intelligences comprise social intelligence. He defines them as follows:

Interpersonal intelligence is the ability to understand other people: what motivates them, how they work, how to
work cooperatively with them. Successful salespeople, politicians, teachers, clinicians, and religious leaders are all
likely to be individuals with high degrees of interpersonal intelligence. Intrapersonal intelligence ... is a correlative
ability, turned inward. It is a capacity to form an accurate, veridical model of oneself and to be able to use that
model to operate effectively in life.

Emotional intelligence, on the other hand, "is a type of social intelligence that involves the ability to monitor one's own and others'
emotions, to discriminate among them, and to use the information to guide one's thinking and actions" (Mayer & Salovey, 1993:
433). According to Salovey & Mayer (1990), EI subsumes Gardner's inter- and intrapersonal intelligences, and involves abilities that
may be categorized into five domains:

Self-awareness:
Observing yourself and recognizing a feeling as it happens.
Managing emotions:
Handling feelings so that they are appropriate; realizing what is behind a feeling; finding ways to handle fears and
anxieties, anger, and sadness.
Motivating oneself:
Channeling emotions in the service of a goal; emotional self control; delaying gratification and stifling impulses.
Empathy:
Sensitivity to others' feelings and concerns and taking their perspective; appreciating the differences in how people
feel about things.
Handling relationships:
Managing emotions in others; social competence and social skills.

Self-awareness (intrapersonal intelligence), empathy and handling relationships (interpersonal intelligence) are essentially
dimensions of social intelligence. See the Time magazine piece for an overview of emotional intelligence. Their article basically
summarizes Daniel Goleman's Emotional Intelligence book in a few simple pages, interjecting other experts' opinions and pieces of
research to lend a more balanced critique of emotional intelligence. In addition, look at the piece on emotional intelligence from a Hindu newspaper article. It offers a more theoretical and historical perspective on emotional
intelligence.

Why is emotional intelligence important?

Researchers investigated dimensions of emotional intelligence (EI) by measuring related concepts, such as social
skills, interpersonal competence, psychological maturity and emotional awareness, long before the term "emotional intelligence"
came into use. Grade school teachers have been teaching the rudiments of emotional intelligence since 1978, with the development of
the Self Science Curriculum and the teaching of classes such as "social development," "social and emotional learning," and "personal
intelligence," all aimed at "raise[ing] the level of social and emotional competence" (Goleman, 1995: 262). Social scientists are just
beginning to uncover the relationship of EI to other phenomenon, e.g., leadership (Ashforth and Humphrey, 1995), group
performance (Williams & Sternberg, 1988), individual performance, interpersonal/social exchange, managing change, and
conducting performance evaluations (Goleman, 1995). And according to Goleman (1995: 160), "Emotional intelligence, the skills
that help people harmonize, should become increasingly valued as a workplace asset in the years to come."
Tests of Emotional Intelligence

Although no validated paper-and-pencil tests of emotional intelligence exist, two "fun" versions of emotional intelligence tests have
been developed. Test yourself to see how you rate on emotional intelligence with a test from "USA Weekend" or the test from Utne
Reader. Because no one has yet developed a good scale for emotional intelligence, you may want to investigate the Web page on
personality, temperament, psychopathology, and emotion scales developed by Albert Mehrabian, professor of psychology at the
University of California, Los Angeles. You may be able to piece together a few of these scales for a rough approximation of the
dimensions researchers hypothesize characterize emotional intelligence.

Emotions

Affect, Mood and Emotions

"It is clear, however, that, without the preferences reflected by positive and negative affect, our experiences would
be a neutral gray. We would care no more what happens to us or what we do with our time than does a computer."

C. Daniel Batson, Laura L. Shaw & Kathryn C. Oleson (Differentiating Affect, Mood, and Emotion: Toward Functionally Based Conceptual Distinctions, 1992)

The terms affect, mood, and emotion are used interchangeably throughout much of the literature, without distinguishing between
them (Batson, Shaw, & Oleson, 1992: 294). Some of the confusion or lack of clarity may be a result of the overlap among the
concepts (Morris, 1992). Some researchers have attempted to distinguish these concepts based on structural differences and
functional differences. Schwarz and Clore (1988) differentiated emotion from mood based on structural differences, such as the
specificity of the targets (e.g., emotions are specific and intense and are a reaction to a particular event, whereas moods are diffuse and unfocused; George & Brief, 1995; Frijda, 1987; Clark & Isen, 1982) and timing (e.g., emotions are caused by something more immediate in time than moods). Batson and colleagues (1992) differentiated mood, affect, and emotion based on functional differences, like changes in value state (affect), beliefs about future affective states (mood), and the existence of a specific goal (emotion).

"Affect seems to reveal preference (Zajonc, 1980); it informs the organism experiencing it about those states of
affairs that it values more than others. Change from a less valued to a more valued state is accompanied by positive
affect; change from a more valued to a less valued state is accompanied by negative affect. Intensity of the affect
reveals the magnitude of the value preference."

If you are seriously interested in the area of emotion, affect, and/or mood, investigate the Geneva Emotion Research Group. Located
at the University of Geneva, this group conducts research in the area of emotions, including experimental studies on emotion-
antecedent appraisal, emotion induction, physiological reactions and expression of emotion (including both facial and vocal) and
emotional behavior in autonomous agents. The University of Amsterdam's experimental psychology department is conducting
research in the area of emotions as well.

The Brain and the Neuropsychology of Emotions


Double click on the hot flames for a hotbed of information from The Beckman Institute for Advanced Science and Technology's Cognitive Neuroscience Group at the University of Illinois at Urbana-Champaign. The Cognitive Neuroscience Group is a group of researchers investigating how the brain and emotions work. In addition, if you are interested in books on neuroscience and want a little light reading for over the weekend, investigate books on the subject. Neuropsychology Central is an on-line resource for everyone interested in the area. The primary objectives of the homepage are:

● To describe the importance of neuropsychology as a science of brain and behavior


● To increase public knowledge of neuropsychology as a branch of practical medicine
● To indicate the contribution which neuropsychology is making to the neurosciences
● To act as a resource for the professional and layperson, alike

Here's just a sampling of what the page includes:

Neuropsychological Assessment
Resources directly related to the assessment of mental function in various neuropsychologically impaired
populations.

Brain Imaging
Resources covering all aspects of neuroimaging with a special emphasis on functional imaging techniques.

Cognitive Neuropsychology
Neuropsychological theory and resources from the cognitive orientation.

Homepages
Personal pages of individuals actively pursuing careers in neuropsychology and closely related fields.

Laboratories
University and medical school labs dedicated to the study of neuropsychology.

Neuropsychology Central Forum


Neuropsychology Central's www discussion group for practitioners, academics, and interested parties.

Newsgroups
Professional and support newsgroups closely related to the study of neuropsychology and neuropsychological
difficulties.

Professional Organizations
Links to organizations and professional conferences.

Publications
Printed material available on the internet related to neuropsychology.

General Neuroscience
A hodgepodge of interesting and superbly crafted links related to the neurosciences.

Various Psychology Links


A great place to jump off from this page into other worlds of psychology.
Another great resource in neuropsychology comes from Brown University's Department of Psychiatry and Human Behavior's page of
neuropsychology links on the World Wide Web. This page is overflowing with information, and is a great starting point for venturing
through the neuropsychology world on the Web.

Methods for Researching Emotions

The Beckman Institute for Advanced Science and Technology's Cognitive Neuroscience Group at the University of Illinois at Urbana-Champaign continues its research on the brain and emotions. On their page they indicate the various methodologies they use to investigate cognition and emotion. Look at all the abstracts of technical reports produced by this group, or select a specific abstract you would like to view by clicking on the abstract number; for example, [CNS-94-02] Russell A. Poldrack, On Testing for Stochastic Independence between Memory Tests. If you are having problems conceiving a research design appropriate for investigating some aspect of emotion, just contact the Geneva Emotion Research Group. Their Geneva Emotion Week conference is being held May 16-19, 1996. The conference has two major themes:

1. a colloquium focusing on major topics in the psychology of emotion


2. workshops on advanced research methods in the field of emotion

And finally, for an interesting little piece in the spirit of "how NOT to lie with statistics," check out the piece by Clay Helberg of the University of Wisconsin Schools of Nursing and Medicine entitled Pitfalls of Data Analysis (or, in other words, How to Avoid Lies and Damned Lies), from an applied statistics conference.

References

Ashforth, B.E. & Humphrey, R.H. (1995). Emotion in the workplace: A reappraisal. Human Relations, 48(2), 97-125.

Eysenck, S.B., Pearson, P.R., Easting, G. & Allsopp, J.F. (1985). Age norms for impulsiveness, venturesomeness and empathy in
adults. Personality and Individual Differences, 6(5), 613-619.

Gardner, H. (1993). Multiple Intelligences. New York: BasicBooks.

Goleman, D. (1995). Emotional intelligence. New York: Bantam Books.

Greenberg, M.T., Kusche, C.A., Cook, E.T. & Quamma, J.P. (1995). Promoting emotional competence in school-aged children: The
effects of the PATHS curriculum. Development and Psychopathology, 7, 117-136.

Mayer, J.D. & Salovey, P. (1993). The intelligence of emotional intelligence. Intelligence, 17, 433-442.

Ruisel, I. (1992). Social intelligence: Conception and methodological problems. Studia Psychologica, 34(4-5), 281-296.

Salovey, P. & Mayer, J.D. (1990). Emotional intelligence. Imagination, Cognition, and Personality, 9(1990), 185-211.

Thorndike, E.L. (1920). Intelligence and its uses. Harper's Magazine, 140, 227-235.

Watson, M. & Greer, S. (1983). Development of a questionnaire measure of emotional control. Journal of Psychosomatic Research, 27(4), 299-305.

Williams, W.M. & Sternberg, R.J. (1988). Group intelligence: Why some groups are better than others. Intelligence, 12, 351-377.

Although I will attempt to keep this information accurate, I cannot guarantee the accuracy of the information provided. Copyright ©
1996, Cheri A. Young. All rights reserved.
Environmental Sociology
A Resource Page
John Sydenstricker-Neto

This web site provides an overview of environmental sociology and hyperlinks to resources on the Web. The main purpose of this
page is to be a resource where viewers can find more information about this growing subdiscipline and connect to interesting sites.
This page is periodically updated, so your questions, comments, and ideas are most welcome.

● What is Environmental Sociology?


● Working Groups in Professional Associations
● Teaching Environmental Sociology
● Some Relevant Themes
● Future Perspectives
● Sociology Journals
● Links of Interest
● Cited References

What is Environmental Sociology?

Environmental sociology is the study of the reciprocal interactions between the physical environment, social organization, and social
behavior. Within this approach, environment encompasses all physical and material bases of life, on a scale ranging from the most
micro level to the biosphere.

An important development of this subdiscipline was the shift from a "sociology of environment" to an "environmental sociology."
While the former refers to the study of environmental issues through the lens of traditional sociology, the latter encompasses the
societal-environmental relations (Dunlap and Catton, 1979; Dunlap and Catton, 1994).

A diversity of paradigms, themes, and levels of analysis has characterized environmental sociology. Despite this diversity, however, a minimal identity for the subdiscipline has been established through significant empirical research and a theoretical contribution "self-consciously fashioned as a critique to 'mainstream' sociology" (Buttel, 1987:468). Two key contributions to this critique are the joint work of Riley Dunlap and William Catton Jr. and that of Allan Schnaiberg. While the work of Dunlap and Catton has been more influential within the subdiscipline, Schnaiberg's work has shaped the discipline as a whole (Buttel, 1987).

Early work of Catton and Dunlap (1978; 1980) emphasized the narrow anthropocentrism of classical sociology. The HEP-NEP distinction--the "human exemptionalism paradigm" and the "new ecological paradigm"--contrasts traditional sociological thought with emerging environmental sociology. Schnaiberg's contribution came with the development of the notions of the "societal-environmental dialectic" and the "treadmill of production" (1975; 1980). Contrary to Dunlap and Catton, his work is rooted in Marxist political economy and neo-Marxist and neo-Weberian political sociology.

[Back to Contents]

Working Groups in Professional Associations

Environmental sociology has existed for approximately twenty-five years as a subdiscipline in the United States. The initial efforts
that led to the transition from a sociology of environment to an environmental sociology, however, go back to the mid 1960s. Three
working groups within scientific associations synthesize this process (Dunlap and Catton, 1979; Freudenburg and Gramling, 1989).

In 1964, within the Rural Sociological Society (RSS), sociologists formed the "Sociological Aspects of Forestry Research
Committee." The next year, this committee was renamed "Research Committee of Natural Resource Development" and later evolved
to become the current "Natural Resources Research Group," one of the largest and most active research groups of the RSS, which shares common interests with other research groups such as "Sociology of Agriculture."

In 1972, the "Environmental Problems Division" was added to the Society for the Study of Social Problems (SSSP). It formally
organized in 1973 based on a broad range of interests with particular attention to environmentalism and environment as a social
problem. Later, this SSSP division was named "Environment and Technology."

Following this trend, in 1973, a committee "to develop guidelines for sociological contributions to environmental impact statements"
was created within the American Sociological Association (ASA). The next year, this committee became the "Ad Hoc Committee on
Environmental Sociology," and two years later a "Section on Environmental Sociology" was officially recognized. Today's "ASA
Section on Environment and Technology" plays an important role in the greening of the sociological community. It has stimulated
new areas of investigation and is responsible for the ENVTECSOC electronic listserv.

At an international level, the "Research Committee on Environment and Sociology" (established in 1971) of the International
Sociological Association (ISA) has played an important role. The wide range of interests within this research committee is reflected
in the sections planned for the XIV World Congress of Sociology (1998). The environment has also received the attention of other
ISA research committees.

[Back to Contents]

Teaching Environmental Sociology

The Department of Sociology at Washington State University (WSU) was the first program to offer a specialization in environmental
sociology at the Ph.D. level, reflecting the leadership of its faculty in the construction of the subdiscipline. Today, many other
departments around the country have environmental sociology as a concentration in their graduate programs and/or offer courses at
the undergraduate level, including the following departments: Rural Sociology at Cornell University; Rural Sociology at the
University of Wisconsin - Madison; Sociology at the University of California at Santa Cruz; Sociology at Montana State University;
Sociology at Northwestern University; and Human Ecology at Cook College, Rutgers University. A sample of syllabi of courses on
environmental sociology or related areas gives a flavor of the dynamism and wide range of interests within the subdiscipline.

[Back to Contents]

Some Relevant Themes

As with other subdisciplines, over the years environmental sociology has undergone fragmentation. This reflects, in part, the inherent
process of development of reductionist science today and also the wide range of themes and research questions encompassed by
societal-environmental interactions. Without attempting to be comprehensive, the sections below highlight important issues that have been part of the research agenda of environmental sociologists.

Links to internet resources are listed. Instead of a list of specific sites, directories of sites are provided. These directories comprise a
comprehensive list of internet sites on specific topics, including organizations, projects and activities, electronic journals, libraries,
references, documents, or metadatabases.

Agriculture - Agriculture, and more precisely, sustainable agriculture, has been an important research area in
environmental sociology. In a broad sense, sustainable agriculture implies a concern with the economic,
environmental, and social or community dimensions of farming within a local and regional context (Beus and
Dunlap, 1990). Agricultural studies within this ecological perspective have opened new areas of research and
contributed to cross-fertilization within sociology and other disciplines. Toxicology, environmental health, and
social movements, just to mention a few, are some of these areas.

Internet resources

● Agriculture - Virtual Library


● Not Just Cows
● Sustainable Agriculture - Virtual Library
● Sustainable Agriculture (Yahoo!)

Energy and Fuels - Sociologists have paid little attention to energy, despite its importance to social life. The
current state of the art of sociological reflection is polarized between macro and micro approaches, which has made it quite difficult to integrate variables into broad theoretical perspectives and to bring more theoretical grounding into empirical studies. In addition, most development of this area of research has paralleled the waves of energy crises.
Like other niches within environmental sociology, studies of energy are interdisciplinary and closely related to
policy analysis and planning (Rosa et al., 1988).

Internet resources

● Global Energy Marketplace


● Energy (Yahoo!)

Environmental Movement - Studies on the environmental movement are rooted in the social movements'
research tradition. This has been an important and growing area of research within environmental sociology.
Topics of interest include origins of the environmental movement, internal organization and network formation,
global and local movements, and the political role of environmental organizations. Research on environmental
attitudes, values, and behaviors has greatly influenced studies on ideologies and values shared by environmental
movements (Dunlap and Catton, 1979; Buttel, 1987).

Internet Resources

● Environmental Movement (Scarce)

Hazards and Risks - In general, "hazards" refers to natural disasters such as earthquakes, hurricanes,
droughts, etc., and "risk" to human transformation of the environment or technological disasters. While humans
have been exposed to hazards for centuries, risk is considered more of a twentieth-century phenomenon. Initial
sociological work on risk dates from the mid 1980s. Though studies on natural disasters have a longer history
within the field, they have been very much concentrated on "emergency adjustments" instead of adjustments of
humans to their physical environments. More recent sociological work addressing technology has narrowed the gap between studies on hazards and risk and promoted a focus on long-term human adjustments or environmental consequences (Dunlap and Catton, 1979; Rosa, 1998).

Internet Resources

● Hazards and Risk - Virtual Library

Leisure/Recreation - This area of research grew out of traditional sociological research on leisure behavior and was key in the transition from a "sociology of environment" to an "environmental sociology." Today, it comprises studies on parks and forests, land management and planning, ecotourism, and resource management
(Dunlap and Catton, 1979). The International Symposiums on Society and Resource Management have been an
important forum of discussion of these issues within the social sciences.

Internet resources

● Recreation - Sociosite
● ISSRM: Culture, Environment, and Society

Natural Resources - Studies of resource-dependent communities are one of the important roots of
environmental sociology. Studies departing from this tradition have called attention to the human dimensions of the processes that deplete natural resources. These studies have furnished alternative perspectives on the causes and consequences of, as well as solutions to, these processes. On the one hand, these studies offer alternatives to ones restricted to the biological dimensions; on the other hand, they are alternatives in terms of acknowledging indigenous and other local peoples. Interest in themes such as deforestation, soil conservation practices, and agroforestry systems has grown very rapidly.

Internet resources
● Forestry - Virtual Library
● Agroforestry

Social Impact Assessment (SIA) - Studies of impacts have a long tradition within sociology and other
sciences, but social or socioeconomic impact assessment as a field emerged in the early 1970s. Its origin--response
to environmental legislation--and its development have combined science and policy-making. In general, SIAs aim
to anticipate the likely consequences of a project before it is executed. The field is interdisciplinary by definition
and has mainly focused on large-scale construction projects such as energy projects, highways, etc. Two current
important challenges in the field of SIA, and hopefully future contributions of it, are to produce new data, both
comprehensive and integrated, and to define mechanisms to incorporate scientific knowledge into the political
decision-making process (Freudenburg, 1986).

Internet resources

● International Association for Impact Assessment (IAIA)

Sustainable Development - Though development studies have a long tradition in sociology, studies on
sustainable development emerged in the late 1980s. Sociologists were influenced by the same facts and trends that
led to the notion of sustainability defended in Our Common Future, the 1987 report of the United Nations World Commission on Environment and Development. Despite the breadth of sociological literature on sustainable development, it is still a contested concept within sociology. This fact might have more to do with
contemporary sociological thinking forged within the rise of the western developmental project (McMichael, 1996)
than with inherent challenges of the concept itself.

Internet resources

● Sustainable Development - CESSE


● Sustainable Development Dimensions - FAO
● Sustainable Development - Virtual Library
● Sustainable Development (Yahoo!)

[Back to Contents]

Future Perspectives

In the mid 1980s, a time of retraction of the environmental momentum, Buttel (1987) noted that in the 1970s, "environmental
sociologists sought nothing less than the reorientation of sociology toward a more holistic perspective that would conceptualize
social processes within the context of the biosphere. These lofty intentions, however, have largely failed to come to fruition . . . [In
addition,] environmental sociology has become routinized and is now viewed--both by its practitioners and other sociologists--less as
a scholarly 'cause' or movement than as just another sociological specialization" (1987:466).

This change of route in the development of environmental sociology reflects two orders of issues: 1) the historical circumstances at
the national and international level, such as the emergence of the environmental critique and the energy crisis in the early 1970s and
waves of declining interest in the 1980s, and 2) inherent difficulties in addressing theoretical issues that sociologists have struggled with
since the 1800s: the nature of society, the nature of social stratification, and the means through which social change can come about.

Zavestoski's (1997) discussion of the theoretical status of current environmental sociology recognizes its limitations. According to
this author, a solid theoretical basis for environmental sociology should:

1. "acknowledge, although not necessarily account for, both substructurally environmental phenomena and intentional
environmental phenomena;
2. account for the unique position of humans as both a part of the web of life as well as social, self-reflective, and moral
beings;
3. strive to avoid biological reductionism and social determinism;
4. establish the proper relation between social constructivism and logical positivism/empirical realism;
5. determine the usefulness of ecological concepts; and
6. acknowledge the role of the social psychological process of the self in micro-level decision-making about behaviors that
affect the environment" (1997:6).

It is interesting to note that these guidelines parallel some of the early challenges and aims that led to the emergence of environmental
sociology in the 1970s and that remain broader challenges to the discipline as a whole. Current areas of research within
environmental sociology and emerging ones such as environmental justice, global environmental change, and urban environment,
would greatly benefit from these theoretical advances. More than that, however, if environmental sociologists are able to address
these issues, "traditional sociology could as easily be seen as a more limited form of environmental sociology--a form of sociology
that deliberately limits its vision. . ." (Gramling and Freudenburg, 1996).

[Back to Contents]

Sociology Journals

Environmental sociology is poorly represented in mainstream American sociology journals. Fewer than two percent of all articles
published in nine mainstream sociology journals from 1969 through 1994 discussed the environment. Higher-prestige journals were
even less likely to publish environmental articles. In the 1990s, the number of articles on the environment increased (Krogman and Darlington, 1996). Some sociology or social sciences journals more open to environmental analysis include:

● The American Sociologist (special issue, 1994 vol. 25(1))


● Human Ecology
● Rural Sociology
● Social Forces
● Social Problems (special issue, 1993, vol. 40)
● Social Science Quarterly (special issues, 1996 vol. 77(3); 1997, vol. 78(1), and vol. 78(4))
● Society and Natural Resources
● Sociological Forum
● Sociological Inquiry (special issue, 1993, vol. 53)
● Sociological Perspectives
● The Sociological Quarterly
● Sociological Spectrum (special issue, 1993, vol.13)
[Back to Contents]

Links of Interest

Environmental Sociology Pages

1. Environmental Sociology Page - Rik Scarce (MSU)

Sociology - General

1. Sociology Internet Resources (W. Conn. State Univ.)


2. Sociology - Sociosite
3. Sociology - Virtual Library
4. SocioWeb
5. Sociology (Yahoo!)

Ecology/Environmental Sciences

1. Ecology (Yahoo!)
2. Environment - CESSE
3. Environment - Sociosite
4. Environment - Virtual Library
5. Environmental Studies (Yahoo!)

Institutes and Research Centers

1. CGIAR Research Centers


2. Institutes - Environmental Studies (Yahoo!)
3. CEEP - University of Delaware
4. IISDnet, Manitoba, Canada
5. NIE, Washington, DC
6. OCEES, Oxford, UK

Databases

1. CIESIN
2. Envirofacts Warehouse
3. SEDAC (CIESIN)
4. Statistical Resources on the Web - University of Michigan

Other links - Miscellaneous


1. Amanaka'a Amazon Network
2. Envirolink
3. Environment - Latin America
4. Environmental Health Information Service
5. Environmental Organization Webdirectory
6. Environmental Working Group
7. The Green Disk
8. Rainforest Action Network
9. Science, Technology and Society (Yahoo!)

Electronic Lists

ELAN (Environment in Latin America Network) - Provides useful updates on breaking environmental issues in Latin America and a networking forum for students, scholars, and activists around the world. To subscribe, send the message "SUB ELAN" to: listproc@csf.colorado.edu.

ENVTECSOC - Section on Environment and Technology (ASA) discussion list. An active (but not overburdened)
forum for all kinds of news in the world of environmental sociology and related areas. To subscribe, send the
message "sub envtecsoc yourfirstname yourlastname" to: listproc@csf.colorado.edu.

Infoterra - For general communication on environmental topics, posting and responding to queries to the Infoterra network, and requesting information from UNEP. To subscribe, send the message "subscribe infoterra your@email.address" to: Majordomo@cedar.univie.ac.at.

Rainforest (Rainforest@UMIAMI.IR.MIAMI.EDU) - All aspects of rain forests. Any inquiries, problems, or issues about the list can be sent to: administrator@gdarwin.cox.miami.edu. To subscribe, write to: listserv@gdarwin.cox.miami.edu.

SANet - The sustainable agriculture bulletin board; a worldwide forum for all topics related to sustainable
agriculture. To subscribe, send the message "subscribe sanet-mg" to: almanac@ces.ncsu.edu.

Sustainable Agriculture listservs (FSR, University of Guelph, Ontario)
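For readers who would rather script these subscription requests than compose them by hand in a mail program, the sketch below shows one possible way to send such a message. It is only an illustration under stated assumptions: the use of Python, the sender address, and the outgoing mail host are placeholders of my own and are not part of any list's instructions; the recipient address and message body are copied from the ENVTECSOC entry above.

# Minimal sketch (assumption: Python 3 and a reachable outgoing mail server).
# The recipient address and message body follow the ENVTECSOC entry above;
# the sender address and SMTP host are placeholders to replace with your own.
import smtplib
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "you@example.edu"              # placeholder sender address
msg["To"] = "listproc@csf.colorado.edu"      # list processor address from the entry above
msg["Subject"] = "subscription request"      # the list processor reads the body, not the subject
msg.set_content("sub envtecsoc Yourfirstname Yourlastname")

with smtplib.SMTP("mail.example.edu") as smtp:   # placeholder outgoing mail host
    smtp.send_message(msg)

The same pattern applies to the other lists above; only the recipient address and the body of the message change.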

[Back to Contents]

Cited References

Beus, Curtis, E. and Dunlap, Riley, E. 1990. "Conventional versus alternative agriculture: The paradigmatic roots of the debate."
Rural Sociology, 55(4), pp. 590-616.

Buttel, Frederick H. 1987. "New directions in environmental sociology." Annual Review of Sociology, 13, pp. 465-88.

Catton, William, Jr. and Dunlap, Riley, E. 1978. "Environmental sociology: A new paradigm." The American Sociologist, 13, pp. 41-49.

Catton, William, Jr. and Dunlap, Riley, E. 1980. "A new ecological paradigm for post-exuberant sociology." American Behavioral Scientist, 24(1), pp. 15-47.

Dunlap, Riley, E. and Catton, William, Jr. 1979. "Environmental sociology." Annual Review of Sociology, 5, pp. 243-73.

Dunlap, Riley, E. and Catton, William, Jr. 1994. "Struggling with human exemptionalism: The rise, decline, and revitalization of
environmental sociology." The American Sociologist, 25(1), pp. 5-30.

Freudenburg, William, R. 1986. "Social impact assessment." Annual Review of Sociology, 12, pp. 451-78.

Freudenburg, William R. and Gramling, Robert. 1989. "The emergence of environmental sociology: Contributions of Riley E.
Dunlap and William Catton, Jr." Sociological Inquiry, 59(4), pp. 439-52.

Gramling, Robert and Freudenburg, William R. 1996. "Environmental sociology: toward a paradigm for the 21st century."
Sociological Spectrum, 16, pp. 347-70.

Krogman, Naomi T. and Darlington, JoAnne DeRouen. 1996. "Sociology and the environment: An analysis of journal coverage." The American Sociologist, 27(3), pp. 39-55.

McMichael, Philip. 1996. Development and social change: A global perspective. Thousand Oaks, CA: Pine Forge Press.

Rosa, Eugene, A. 1998. "Risk and environmental sociology." Environment, Technology, and Society (newsletter, Section on
Environment and Technology, American Sociological Association), 88, p. 8.

Rosa, Eugene, A., Machlis, Gary, E. and Keating, Kenneth, M. 1988. "Energy and society." Annual Review of Sociology, 14, pp.
149-72.

Schnaiberg, Allan. 1975. "Social synthesis of the societal-environmental dialectic: The role of distributional impacts." Social Science
Quarterly, 56, pp. 5-20.

Schnaiberg, Allan. 1980. The environment. New York, NY: Oxford University Press.

The World Commission on Environment and Development (United Nations). 1987. Our Common Future. Oxford, UK: Oxford
University Press.

Zavestoski, Steve. 1997. "Emerging theoretical parameters in environmental sociology." Environment, Technology, and Society
(newsletter, Section on Environment and Technology, American Sociological Association), 85, pp. 5-6.

[Back to Contents]

Top of Page Send your Comments Student Gallery Trochim's Home Page

Special thanks to William Trochim, who introduced me to the Web world; Gilbert Gillespie Jr. and Barbara Lynch, who offered some of the links on this page; and Annette Finney, who revised the text.

jms56@cornell.edu

Copyright © 1997 John Sydenstricker-Neto


Sample of Syllabi

Arunas Juska - Society, Environment, and Resource Conservation, University of Nebraska at Omaha

Max Pfeffer - Environment and Society, Cornell University

Rik Scarce - Environmental Sociology, Montana State University

Timmons Roberts - Teaching Resources on the Environment, University of California at Santa Cruz

Tom Rudel - Environmental Problems in Historical Perspective, Cook College, Rutgers University

[ Back to Environmental Sociology ]


ENVIRONMENT AND SOCIETY

Rural Sociology 324/Science and Technology Studies 324

Cornell University

Spring 1998

Professor: Max J. Pfeffer

Office: Warren Hall 331


E-mail: mjp5@cornell.edu
Telephone: 255-1676
Office hours: 3:00-5:00 PM, Wednesdays or by appointment

Graduate Teaching Assistant: Alan Barton

Office: Warren Hall 132


E-mail: awb4@cornell.edu
Telephone: 255-2155
Office hours: 12:30-1:15, Mondays and Wednesdays or by appointment

Undergraduate Teaching Assistant: Clayton Summers

Office: 133 Warren Hall


E-mail: cfs2@cornell.edu
Telephone: 256-5145
Office hours: 3:00-4:00PM, Thursday

Course Objectives

The main objective of the course is to help you develop a critical understanding of the dominant trends in modern U.S. environmental
thought. These include preservationism, conservationism, deep ecology, eco-feminism, social ecology, NIMBYism, and
environmental justice. Many of these areas of thought are part of a set of loosely integrated ideas some consider an alternative social
paradigm rooted in ecological concerns and principles. We will compare environmentalist thought with liberal, or free market,
economics and positivist science. Another course objective is to familiarize you with the sociological dimensions of some major
contemporary substantive environmental problems and policies. These topics include air and water quality, public lands management,
biodiversity, deforestation, climate change, and ozone depletion.

Required Textbooks

Harper, Charles. Environment and Society: Human Perspectives on Environmental Issues. Prentice-Hall, 1996.

Kline, Benjamin. First Along the River: A Brief History of the U.S. Environmental Movement. Acada Books, 1997.

A packet of additional required reading is available at the Cornell Campus Store.

Requirements

1. Three examinations (two preliminaries and one final)

These will consist primarily of essays administered through a combination of take-home and in-class questions. We
will hold review sessions prior to each exam during the regular class meeting time.

2. Regular attendance and completion of all assigned readings.

Grading

Requirement          Percentage of Grade

Examination 1        30
Examination 2        30
Final examination    40
TOTAL                100

[ Back to Top | Back to Syllabi | Back to Environmental Sociology ]

Schedule, Topic and Reading Assignment

January 19 Introduction and Orientation

Reading required:

Kline, Chapters 1-2.

Reading suggested:

Timothy O'Riordan. "Frameworks for Choice: Core Beliefs and the Environment." Environment 37(8):4-29, 1993.

January 21 Environmentalism and Conservation

Reading required:

Harper, Chapter 1.

Kline, Chapters 3-4.

Reading suggested:

McKibben, Bill. "What Good is a Forest?" Audubon 98(3):54-63, 1996.

Babbitt, Bruce. "Science: Opening the Next Chapter of Conservation History." Science 267(March):1954-1955,
1995.

Brick, Phil. "Determined Opposition: The Wise Use Movement Challenges Environmentalism." Environment 37
(8):17-41, 1995

Orr, David W. "Conservation and Conservatism." Conservation Biology 9(2):242-245, 1995

Tokar, Brian. "Between the Loggers and the Owls: The Clinton Northwest Forest Plan." The Ecologist 24(4):149-
153, 1994.

Caufield, Henry P. "The Conservation and Environmental Movements: An Historical Analysis." Pp. 13-55 in
James P. Lester (ed.), Environmental Politics and Policy: Theories and Evidence. Durham: Duke University Press,
1989.

Koppes, Clayton R. "Efficiency, Equity, Esthetics: Shifting Themes in American Conservation." Pp. 230-251 in
Donald Worster (ed.), The Ends of the Earth: Perspectives on Modern Environmental History. Cambridge:
Cambridge University Press, 1988.

Short, Brant C. "America's Conservation Consensus: Origins of the Public Lands Debate." Pp. 4-9 in Brant C.
Short, Ronald Reagan and the Public Lands: America's Conservation Debate, 1979-1984. College Station: Texas
A&M University, 1979.

January 23 Environmentalism and Preservation

Reading required:

Kline, Chapters 5-6.


Reading suggested:

Nash, Roderick. Wilderness and the American Mind. New Haven: Yale University Press, 1967.

Chester, Charles C. "Controversy over Yellowstone's Resources: People, Property, and Bioprospecting."
Environmental History 38(8):10-35, 1996.

Cronon, William. "The Trouble with Wilderness; or, Getting Back to the Wrong Nature." Environmental History 1(1):8-27, 1996.

January 26 National Parks and National Forests in American Environmentalism

Reading required:

Kline, Chapter 7.

Morgan, Mark J. "Resources, Recreationists, and Revenues: A Policy Dilemma for Today's State Park Systems."
Environmental Ethics 18(3):279-291, 1996.

Watkins, T.H "National Parks, National Paradox." Audubon 99(4):40-43, 1997.

Reading suggested:

Anderson, H. Michael. "Reforming National-Forest Policy." Issues in Science and Technology 10(2):40-47, 1994.

Sax, Joseph L. "Parks, Wilderness, and Recreation." Pp. 114-140 in Michael J. Lacey (ed.), Governments and
Environmental Politics: Essays on Historical Developments Since World War Two. Washington, D.C.: The
Woodrow Wilson Center Press, 1993.

Kelly, John R. "Wildland Recreation and Urban Society: Critical Perspectives." Pp. 33-50 in Alan W. Ewert,
Deborah J. Chavez and Arthur W. Magill, Culture, Conflict, and Communication in the Wildland-Urban Interface.
Boulder: Westview Press, 1993.

National Research Council. Science and the National Parks. Washington, D.C.: National Academy Press, 1992.

Rosenbaum, Walter A., "Our 700 Million Acres: The Battle for Public Lands." Pp. 270-299 in Walter A.
Rosenbaum, Environmental Politics and Policy. Washington, D.C.: CQ Press, 1991.

January 28 Contemporary Environmentalism

Reading required:

Harper, Chapter 8.

Kline, Chapters 8-9.

Reading suggested:

Dunlap, Riley E. and Angela G. Mertig. "The Evolution of the U.S. Environmental Movement from 1970 to 1990:
An Overview." Pp. 1-10 in Riley E. Dunlap and Angela G. Mertig (eds.), American Environmentalism: The U.S.
Environmental Movement, 1970-1990. Philadelphia: Taylor and Francis, 1992.

McCloskey, Michael. "Twenty Years of Change in the Environmental Movement: An Insider's View." Pp. 77-88 in
Riley E. Dunlap and Angela G. Mertig (eds.), American Environmentalism: The U.S. Environmental Movement,
1970-1990. Philadelphia: Taylor and Francis, 1992.

Sheail, John. "Green History: The Evolving Agenda." Rural History: Economy, Society, Culture 4(2):209-223,
1993.

Hayes, Denis. "Earth Day 1990: Threshold of a Green Decade." Natural History April:55-58, 67-70, 1993.

[ Back to Top | Back to Syllabi | Back to Environmental Sociology ]

January 30 Domestic Environmental Policy Alternatives: The Power to Protect

Reading required:

Hockenstein, Jeremy B., Robert N. Stavins, and Bradley W. Whitehead. "Crafting the Next Generation of Market
Based Environmental Tools." Environment 39(4):12-33, 1997.

Jasanoff, Sheila. "The Dilemma of Environmental Democracy." Issues in Science and Technology 13(1):63-70,
1996.

Reading suggested:

Gottlieb, Robert. "Beyond NEPA and Earth Day: Reconstructing the Past and Envisioning a Future for
Environmentalism." Environmental History Review 19(4):1-14, 1995.

Karliner, Joshua. "The Environment Industry: Profiting from Pollution." The Ecologist 24(2):59-63, 1994.

Hahn, Robert W. "United States Environmental Policy: Past, Present and Future." Natural Resources Journal 34
(2):305-348, 1994.

Freeman, A. Myrick II. "Economies, Incentives, and Environmental Regulation." Pp. 145-167 in Norman J. Vig
and Michael Kraft (eds.), Environmental Policy in the 1990s: Toward a New Agenda. Washington, D.C.:
Congressional Quarterly, 1994.

John, Dewitt. "Civic Environmentalism." Issues in Science and Technology 10(4):30-34, 1994.

Mitchell, Robert Cameron, Angela G. Mertig, and Riley E. Dunlap. "Twenty Years of Environmental Mobilization:
Trends Among National Environmental Organizations." Pp. 11-26 in Riley E. Dunlap and Angela G. Mertig (eds.),
American Environmentalism: The U.S. Environmental Movement, 1970-1990. Philadelphia: Taylor and Francis,
1992.

Ingram, Helen M. and Dean E. Mann. "Interest Groups and Environmental Policy." Pp. 135-157 in James P. Lester
(ed.), Environmental Politics and Policy: Theories and Evidence. Durham: Duke University Press, 1989.

Rosenbaum, Walter A. "The Bureaucracy and Environmental Policy." Pp. 212-237 in James P. Lester (ed.),
Environmental Politics and Policy: Theories and Evidence. Durham: Duke University Press, 1989.

February 2 Clean Air Policies

Reading required:

Harper, Chapter 9.

Switzer, Jacqueline Vaughn. "Urban Air Quality: Getting Better or Getting Worse?" Pp. 191-213, in
Environmental Politics: Domestic and Global Dimensions, 1994.

Reading suggested:

Nixon, Will. "The Air Down There." The Amicus Journal Summer:42-45, 1994.

Popper, Frank J. "Thinking Globally, Acting Locally." Technology Review April:46-53, 1992.

Schroeder, Christopher. "The Evolution of Federal Regulations of Toxic Substances." Pp. 263-313 in Michael J.
Lacey (ed.), Government and Environmental Politics: Essays on Historical Developments Since World War Two.
Washington, D.C.: The Woodrow Wilson Center Press, 1993.

Rosenbaum, Walter A., "The Unfinished Agenda: Air and Water." Pp. 169-211 in Walter A. Rosenbaum,
Environmental Politics and Policy. Washington, D.C.: CQ Press, 1991.

February 4 Clean Water Policies

Reading required:

Harper, pp. 71-80.

Naiman, Robert J., John J. Magnuson, Diane M. McKnight, Jack A. Stanford, James R. Karr. "Freshwater
Ecosystems and Their Management: A National Initiative." Science 270(5236):584-585, 1995

Foran, Jeffery A. and Robert W. Adler. "Cleaner Water, But Not Clean Enough." Issues in Science and Technology
10 (2):33-39, 1994.

Reading suggested:

Moore, Susan, Margot Murphy and Ray Watson. "A Longitudinal Study of Domestic Water Conservation
Behavior." Population and Environment 16(2):175-189, 1994.

February 6 The Regulation of Toxic Substances

Reading required:

Harper, pp. 89-107.

Russell, Edmund P. III. "Lost Among the Parts Per Billion: Ecological Protection at the United States Environmental
Protection Agency, 1970-1993." Environmental History 2(1):29-51, 1997.

Reading suggested:

Hearne, Shelley A. "Tracking Toxics: Chemical Use and the Public's 'Right to Know.'" Environment 38(6):5-33,
1996.

Graham, John D. and March Sadowitz. "Superfund Reform: Reducing Risk through Community Choice." Issues in
Science and Technology 10(4):35-40, 1994.

Szasz, Andrew. Ecopopulism: Toxic Waste and the Movement for Environmental Justice. Minneapolis: University
of Minnesota Press, 1994.

Rosenbaum, Walter A. "A Chemical Plague: Toxic and Hazardous Substances." Pp. 212-239 in Walter A.
Rosenbaum, Environmental Politics and Policy. Washington, D.C.: CQ Press, 1991.

February 9 New Directions in Environmental Management:

Environmental Destruction and Restoration

Reading required:

Williams, Ted. "Deregulating the Wild." Audubon 99(4):56-63, 92-94, 1997.

Dobson, Andy P., A.D. Bradshaw, and A.J.M. Baker. "Hopes for the Future: Restoration Ecology and Conservation
Biology." Science 277(July):515-522, 1997.

Reading suggested:

Cairns, John Jr. "Ecosocietal Restoration: Reestablishing Humanity's Relationship with Natural Systems."
Environment 37(5):4-9, 1995.

Derr, Mark. "Redeeming the Everglades." Audubon October:48-56, 128-131, 1993.

Teclaff, Ludwik A. "Beyond Restoration--The Case of Ecocide." Natural Resources Journal 34(4):933-956, 1994.

Holloway, Marguerite. "Nurturing Nature." Scientific American April:98-108, 1994.

National Research Council. Restoration of Aquatic Ecosystems. Washington, D.C.: National Academy Press, 1992.

February 11 Deep Ecology

Reading required:

Devall, Bill. "Deep Ecology and Radical Environmentalism." Pp. 51-62 in Riley E. Dunlap and Angela G. Mertig
(eds.), American Environmentalism: The U.S. Environmental Movement, 1970-1990. Philadelphia: Taylor and
Francis, 1992.

Reading suggested:

Borelli, Peter. "The Ecophilosophers." Amicus Journal Spring:30-39, 1988.

Russell, Dick. "The Monkey Wrenchers." Pp. 27-49 in Peter Borelli (ed.), Crossroads: Environmental Priorities for
the Future. Washington, D.C.: Island Press, 1989.

February 13 Eco-Feminism

Reading required:

Birkeland, Janis. "The Relevance of Ecofeminism to the Environmental Professions." The Environmental
Professional 17(1):55-71, 1995.

Reading suggested:

Slicer, Deborah. "Is There an Ecofeminism--Deep Ecology 'Debate'?" Environmental Ethics 17:151-169, 1995.

Birkeland, Janis. "Ecofeminism: Linking Theory and Practice." Pp. 13-59 in Greta Gaard (ed.), Ecofeminism.
Philadelphia: Temple University Press, 1993.

Kheel, Marti. "Ecofeminism and Deep Ecology: Reflections on Identity and Difference." Trumpeter 8(2):62-72,
1991.

Plumwood, Val. "Nature, Self, and Gender: Feminism, Environmental Philosophy, and the Critique of
Rationalism." Hypatia 6(1):3-27, 1991.

February 16 Social Ecology

Reading required:

Bookchin, Murray. "The Concept of Social Ecology." Pp. 152-162 in Carolyn Merchant (ed.), Ecology. Atlantic
Highlands: Humanities Press, 1994.

Reading suggested:

Sale, Kirkpatrick. "Silent Spring and After: The U.S. Movement Today." The Nation 257(3):92-95, 1993.

Bookchin, Murray. Remaking Society. New York: Black Rose Books, 1989.

Tokar, Brian. "Social Ecology, Deep Ecology, and the Future of Green Political Thought." The Ecologist 18
(4/5):132-142, 1988.

February 18 Discussion and Review

February 20 PRELIMINARY EXAMINATION 1

[ Back to Top | Back to Syllabi | Back to Environmental Sociology ]

February 23 Science, Politics and Environmental Conflict

Reading required:

Hattis, Dale. "Drawing the Line: Quantitative Criteria for Risk Management." Environment 38(6):10-39, 1996.

Reading suggested:

Barnard, Neal D. and Stephen R. Kaufman. "Animal Research is Wasteful and Misleading." Scientific American
276(2):80-82, 1997.

Botting, Jack H. and Adrian R. Morrison. "Animal Research is Vital to Medicine." Scientific American 276(2):83-
85, 1997.

Susskind, Lawrence and Sarah McKearnan. "Enlightened Conflict Resolution." Technology Review April:70-72,
1995.

Oppenheimer, Michael. "Context, Connection, and Opportunity in Environmental Problem Solving." Environment
37(5):10-15, 1995.

National Research Council. Understanding Risk. Washington, D.C.: National Academy Press, 1996.

Andrews, Richard. "Risk Assessment: Regulation and Beyond." Pp. 167-186 in Norman J. Vig and Michael E.
Kraft (eds.), Environmental Policy in the 1990s: Toward a New Agenda. Washington, D.C.: Congressional
Quarterly Press, 1994.

Dietz, Thomas, Paul Stern and Robert W. Rycroft. "Definitions of Conflict and the Legitimation of Environmental
Risk." Sociological Forum 4(1):47-70, 1989.

Freudenberg, Nicholas. "Science and Politics: The Limitations of Environmental Health Research." Pp. 42-59 in
Nicholas Freudenberg (ed.), Not in Our Backyards! Community Action for Health and the Environment. New York:
Monthly Review Press, 1984.

Brickman, Ronald and Sheila Jasanoff. "Concept of Risk and Safety in Toxic-Substance Regulation: A
Comparison of France and the United States." Pp. 203-213 in Dean E. Mann (ed.), Environmental Policy
Formation: The Impact of Values, Ideology, and Standards. Lexington: Lexington Books, 1981.

February 25 Grassroots Environmentalism

Reading required:

Lichterman, Paul. "Piecing Together Multicultural Community: Cultural Differences in Community Building
Among Grass-Roots Environmentalists." Social Problems 42(4):513-524, 1995.

Reading suggested:

Freudenberg, Nicholas and Carol Steinsapir. "Not in Our Backyards: The Grassroots Environmental Movement."
Pp. 27-35 in Riley E. Dunlap and Angela G. Mertig (eds.), American Environmentalism: The U.S. Environmental
Movement, 1970-1990. Philadelphia: Taylor and Francis, 1992.

Glance, Natalie S. and Bernardo A. Huberman. "The Dynamics of Social Dilemmas." Scientific American
March:76-81, 1994.

Mazmanian, Daniel and David Morell. "The 'Nimby' Syndrome: Facility Siting and the Failure of Democratic
Discourse." Pp. 125-143 in Norman J. Vig and Michael E. Kraft (eds.), Environmental Policy in the 1990s: Toward
a New Agenda. Washington, D.C.: Congressional Quarterly Press, 1994.

Taylor, Dorceta E. "Environmentalism and the Politics of Inclusion." Pp. 53-63 in Robert D. Bullard, Confronting
Environmental Racism: Voices from the Grassroots. Boston: South End Press, 1993.

Walsh, Edward, Rex Warland, and D. Clayton Smith. "Backyards, NIMBYs, and Incinerator Sitings: Implications
for Social Movement Theory." Social Problems 40(1):25-38, 1993.

February 27 The Geographic Distribution of Environmental Hazards

Reading required:

Melosi, Martin V. "Equity, Eco-Racism and Environmental History." Environmental History Review. 19(3):1-16,
1995.

Reading suggested:

Vollers, Maryanne. "Everyone Has Got to Breathe." Audubon March-April:65-73, 1995.

Collin, Robert W. and William Harris, Sr. "Race and Waste in Two Virginia Communities." Pp. 93-106 in Robert
D. Bullard, Confronting Environmental Racism: Voices from the Grassroots. Boston: South End Press, 1993.

Bailey, Conner, Charles E. Faupel, and James H. Gundlach. "Environmental Politics in Alabama's Blackbelt." Pp.
107-122 in Robert D. Bullard, Confronting Environmental Racism: Voices from the Grassroots. Boston: South End
Press, 1993.

Greenberg, Michael. "Proving Environmental Inequity in Siting Locally Unwanted Land Uses." Risk: Issues in
Health and Safety 4:235-252, 1993.

Hamilton, Cynthia. "Coping with Industrial Exploitation." Pp. 63-76 in Robert D. Bullard, Confronting
Environmental Racism: Voices from the Grassroots. Boston: South End Press, 1993.

Pulido, Laura. "Sustainable Development at Ganados del Valle." Pp. 123-140 in Robert D. Bullard, Confronting
Environmental Racism: Voices from the Grassroots. Boston: South End Press, 1993.

March 2 Chemical Valley

Reading required:

Lewis, Susan and Kathy James. "Whose Voice Sets the Agenda for Environmental Education? Misconceptions
Inhibiting Racial and Cultural Diversity." The Journal of Environmental Education. 26(3):5-12.

Reading suggested:
Sheppard, Judi Anne Caron. "The Black-White Environmental Concern Gap: An Examination of Environmental
Paradigms." Journal of Environmental Education 26(2):24-35, 1995.

Phoenix, Janet. "Getting the Lead out of the Community." Pp. 77-92 in Robert D. Bullard, Confronting
Environmental Racism: Voices from the Grassroots. Boston: South End Press, 1993.

Peña, Devon and Joseph Gallegos. "Nature and Chicanos in Southern Colorado." Pp. 141-160 in Robert D. Bullard,
Confronting Environmental Racism: Voices from the Grassroots. Boston: South End Press, 1993.

Moses, Marion. "Farmworkers and Pesticides." Pp. 161-178 in Robert D. Bullard, Confronting Environmental
Racism: Voices from the Grassroots. Boston: South End Press, 1993.

March 4 Environmental Discrimination and Social Inequality

Reading required:

Hampson, Fen Osler and Judith Reppy. "Environmental Change and Social Justice." Environment 39(3):12-39,
1997.

Spence, Mark David. "Crown of the Continent, Backbone of the World: The American Wilderness and Blackfeet
Exclusion from Glacier National Park." Environmental History 1(3):29-49, 1996.

Reading suggested:

Taylor, Dorceta. "Environmentalism and the Politics of Inclusion." Pp. 53-62 in Robert D. Bullard, Confronting
Environmental Racism: Voices from the Grassroots. Boston: South End Press, 1993.

Capek, Stella M. "The 'Environmental Justice' Frame: A Conceptual Discussion and an Application." Social
Problems 40(1):5-24, 1993.

March 6 Biodiversity: Conservation or Preservation?

Reading required:

Harper, pp. 80-88.

Raustiala, Kal and David G. Victor. "The Future of the Convention on Biological Diversity." Environment 38
(4):17-44, 1996.

Reading suggested:
Hurlbut, David. "Fixing the Biodiversity Convention: Toward a Special Protocol for Related Intellectual Property."
Natural Resources Journal 34(2):379-409, 1994.

Fowler, Carey and Pat Mooney. Shattering: Food, Politics, and the Loss of Genetic Diversity. Tucson: The
University of Arizona Press, 1990.

Murphy, Dennis. "Challenges to Biological Diversity in Urban Areas." Pp. 71-76 in E.O. Wilson and Francis Peter
(eds.), Biodiversity. Washington, D.C.: National Academy Press, 1988.

March 9 Biodiversity or Biotechnology?

Reading required:

Posey, Darrell A. "Protecting Indigenous People's Rights to Biodiversity." Environment 38(3):6-45, 1996.

Paoletti, Maurizio and David Pimentel. "Genetic Engineering in Agriculture and the Environment: Assessing Risks
and Benefits." Bioscience 46(9):665-673, 1996.

Reading suggested:

Kloppenburg, Jack, Jr. and Beth Burrows. "Biotechnology to the Rescue? Twelve Reasons Why Biotechnology is
Incompatible with Sustainable Agriculture." The Ecologist 26(2):61-67, 1996.

Reid, Walter V. "The Economic Realities of Biodiversity." Issues in Science and Technology 10(2):48-55, 1994.

Fowler, Carey and Pat Mooney. Shattering: Food, Politics, and the Loss of Genetic Diversity. Tucson: The
University of Arizona Press, 1990.

Sagoff, Mark. "Biotechnology and the Environment: What is the Risk?" Agriculture and Human Values 137
(Summer):26-35, 1988.

March 11 Tropical Deforestation and the Human Population

Secrets of the Chuco

Reading required:

Batisse, Michel. "Biosphere Reserves: A Challenge for Bio-Regional Development." Environment 39(5):7-33, 1997.

Guha, Ramachandra. "The Authoritarian Biologist and the Arrogance of Anti-Humanism: Wildlife Conservation in the Third World."
The Ecologist 27(1):14-20, 1997.
Reading suggested:

Hecht, Susana B. "The Logic of Livestock and Deforestation in Amazonia." Bioscience 43(November):687-695, 1993.

Rudel, Thomas K. Tropical Deforestation: Small Farmers and Land Clearing in the Ecuadorian Amazon. New York: Columbia
University Press, 1993.

Tucker, Richard P. "The Depletion of India's Forests Under British Imperialism: Planters, Foresters, and Peasants in Assam and
Kerala." Pp. 128-140 in Donald Worster (ed.), The Ends of the Earth: Perspectives on Modern Environmental History. Cambridge:
Cambridge University Press, 1988.

Fearnside, Phillip M. "Deforestation in Brazilian Amazonia: The Rate and Causes of Forest Destruction." The Ecologist. 19(6):214-
218, 1988.

March 13 Tropical Deforestation and Biodiversity

Reading required:

Jepma, CJ. "Tropical Rainforests." Pp. 5-21 in CJ Jepma, Tropical Deforestation: A Socio-Economic Approach.
London: Earthscan Publications Ltd., 1995.

Reading suggested:

Schmink, Marianne. "The Socioeconomic Matrix of Deforestation." Pp. 253-276 in Lourdes Arizpe, M. Priscilla
Stone and David C. Major (eds.), Population and Environment: Rethinking the Debate. Boulder: Westview, 1994.

Fowler, Carey and Pat Mooney. Shattering: Food, Politics, and the Loss of Genetic Diversity. Tucson: The
University of Arizona Press, 1990, Chapter 5.

Uhl, Christopher. "Restoration of Degraded Land in the Amazon Basin." Pp. 326-332 in E.O. Wilson and Francis
Peter (eds.), Biodiversity. Washington, D.C.: National Academy Press, 1988.

Lugo, Ariel. "Estimating Reductions in the Diversity of Tropical Forest Species." Pp. 58-69 in E.O. Wilson and
Francis Peter (eds.), Biodiversity. Washington, D.C.: National Academy Press, 1988.

March 23 Environment and Population: Neo-Malthusianism

Reading required:

Harper, pp. 151-175.

Reading suggested:
Princen, Thomas. "Toward a Theory of Restraint." Population and Environment 18(3):233-255, 1997.

Sagoff, Mark. "Carrying Capacity and Ecological Economics." BioScience 45(9):610-618, 1995.

Daly, Herman E. "Reply to Mark Sagoff's 'Carrying Capacity and Ecological Economics." BioScience 45(9):621-
624, 1995.

Bongaarts, John. "Can the Growing Human Population Feed Itself?" Scientific American March:36-42, 1994.

Meffe, Gary K., Anne H. Ehrlich, and David Ehrenfeld. "Human Population Control: The Missing Agenda."
Conservation Biology 7(1):1-3, 1993.

Ehrlich, Paul R. and Anne H. Ehrlich. The Population Explosion. New York: Simon and Schuster, 1990.

March 25 Environment and Population: Conservatives and Liberals

Reading required:

Harper, pp. 176-195.

Reading suggested:

Banks, R. Darryl and George R. Heaton, Jr. "An Innovation-Driven Environmental Policy." Issues in Science and
Technology 12(1):43-51, 1995.

Reilly, William K., Phillip Shabecoff and Devra Lee Davis. "Is There Cause for 'Environmental Optimism'?"
Environmental Science and Technology 29(8):366-369, 1995.

Bartlett, Albert A. "Reflections on Sustainability, Population Growth, and the Environment." Population and
Environment 16(1):5-35, 1994.

Dunlap, Riley. "Ecologist vs. Exemptionalist: The Ehrlich-Simon Debate." Social Science Quarterly 64:200-203,
1983.

March 27 Environment and Population: Striking a Balance

The Bomb Under the World

Reading required:

Connolly, Barbara and Robert O. Keohane. "Institutions for Environmental Aid: Politics, Lessons and
Opportunities." Environment 38(5):12-42, 1996.
Moffat, Anne Simon. "Ecologists Look at the Big Picture." Science 273(September):1480, 1996.

Reading suggested:

Olson, Molly Harriss. "Charting a Course for Sustainability." Environment 38(4):11-36, 1996.

Henson, Paul. "Population Growth, Environmental Awareness, and Policy Direction." Population and Environment
15(4):265-277, 1994.

Meadows, Donella, Dennis L. Meadows, and Jørgen Randers. Beyond the Limits: Confronting Global Collapse,
Envisioning a Sustainable Future. Post Mills: Chelsea Green Publishing Company, 1992.

March 30 Environment, Poverty and Development

Reading required:

Rosegrant, Mark W. and Robert Livernash. "Growing More Food, and Doing Less Damage." Environment 38(4):6-
31, 1996.

Mellor, John W. "Environmental Problems and Poverty." Environment 30(9):9-13, 28-30, 1988.

Reading suggested:

Nagpal, Tanvi. "Voices From the Developing World: Progress Toward Sustainable Development." Environment 37
(8):10-35, 1995.

Keyfitz, Nathan. "The Growing Human Population." Pp. 61-72 in Managing Planet Earth: Readings from Scientific
American Magazine. New York: W. H. Freeman and Company, 1990.

April 1 Environment, Technology and Development

Reading required:

Kammen, Daniel M. and Michael R. Dove. "The Virtues of Mundane Science." Environment 39(6):10-41, 1997.

Matson, P.A., W.J. Parton, A.G. Power, A.J. Swift. "Agricultural Intensification and Ecosystem Properties."
Science 277(July):504-509, 1997.

Reading suggested:

Plucknett, Donald L. "Technology for Sustainable Agriculture." Scientific American September:182-186, 1995.
Crosson, Pierre and Norman J. Rosenberg. "Strategies for Agriculture." Pp. 73-83 in Managing Planet Earth:
Readings from Scientific American Magazine. New York: W. H. Freeman and Company, 1990.

MacNeill, Jim. "Strategies for Sustainable Economic Development." Pp. 109-123 in Managing Planet Earth:
Readings from Scientific American Magazine. New York: W. H. Freeman and Company, 1990.

Senanayake, Ranil. "The Ecological, Energetic, and Agronomic Systems of Ancient and Modern Sri Lanka." Pp.
227-307 in Gordon Douglass (ed.), Agricultural Sustainability in a Changing World Order. Boulder: Westview
Press, 1984.

April 3 Discussion and Review

April 6 PRELIMINARY EXAMINATION 2

[ Back to Top | Back to Syllabi | Back to Environmental Sociology ]

April 8 Human Dimensions of Global Ecosystems

Reading required:

Bloom, David. "International Public Opinion on the Environment." Science 269(5222):354-357, 1995.

Dunlap, Riley E., George H. Gallup and Alec M. Gallup. "Of Global Concern: Results of the Health of the Planet Survey."
Environment 35(9):7-39, 1993.

Reading suggested:

Caldwell, Lynton. "Globalizing Environmentalism: Threshold of a New Phase in International Relations." Pp. 63-
76 in Riley E. Dunlap and Angela G. Mertig (eds.), American Environmentalism: The U.S. Environmental
Movement, 1970-1990. Philadelphia: Taylor and Francis, 1992.

April 10 Human Dimensions of Ozone Depletion

Reading required:

Harper, pp. 109-115.

Reading suggested:
Dowie, Mark. "A Sky Full of Holes." The Nation 263(2):11-16, 1996.

Stewart, Richard B. "Comprehensive and Market-Based Approaches to Global Change Policy." Pp. 24-37 in John
M. Reilly and Margot Anderson (eds.), Economic Issues in Global Climate Change: Agriculture, Forestry, and
Natural Resources. Boulder: Westview Press, 1992.

Morrisette, Peter M. "The Evolution of Policy Responses to Stratospheric Ozone Depletion." Natural Resources
Journal 29:794-820, 1989.

Young, Oran. "The Politics of International Regime Formation: Managing Natural Resources and the
Environment." International Organization 43(3):349-375, 1989.

April 13 Science, Technology and Climate Change

Reading required:

Harper, pp. 115-147

Reading suggested:

Muller, Frank. "Mitigating Climate Change: The Case for Energy Taxes." Environment 38(2):13-43, 1995.

Schneider, Stephen H. "Detecting Climatic Change Signals: Are There Any 'Fingerprints'?" Science 263:341-347,
1994.

Victor, David G. and Julian E. Salt. "Managing Climate Change." Environment 36(10):7-15, 1994.

Broecker, Wallace S. "Global Warming on Trial." Natural History April:6-14, 1992.

Singer, Fred S. "Warming Theories Need Warning Label." The Bulletin of the Atomic Scientists 48(5):34-39,
1992.

White, Robert M. "The Great Climate Debate." Scientific American 263(1):36-42, 1990.

Kasprzyk, Leszek. "Science and Technology Policy and Global Change." Social Science Journal 41:433-439, 1989.

April 15 Environmental Implications of Free Trade

Reading required:

Harper, Chapter 10.


Reading suggested:

Bhagwati, Jagdish. "The Case for Free Trade." Scientific American November:42-49, 1993.

Daly, Herman E. "The Perils of Free Trade." Scientific American November:50-57, 1993.

Costanza, Robert, John Audley, Richard Borden, Paul Ekins, Carl Folke, Silvio O. Funtowicz, and Jonathan Harris.
"Sustainable Trade: A New Paradigm for World Welfare." Environment 37(9):16-39, 1995.

Zaelke, Durwood, Paul Orbuch, and Robert F. Housman (eds.), Trade and the Environment: Law, Economics, and
Policy. Washington, D.C.: Island Press, 1993.

April 17 Borderline Cases:

Environmental Matters at the United States-Mexico Border

Required reading:

Vogel, David. "Reconciling Free Trade with Responsible Regulation." Issues in Science and Technology 12(1):73-
79, 1995.

April 20 Policies for Global Environmental Protection

Reading required:

Porter, Gareth and Janet Welsh Brown. "The Emergence of Global Environmental Politics." Pp. 1-30 in Gareth
Porter and Janet Welsh Brown, Global Environmental Politics. Boulder: Westview, 1996.

Reading suggested:

Cruz, Wilfrido, Mohan Munasinghe and Jerry Warford. "Greening Development: Environmental Implications of
Economic Policies." Environment 38(5):6-38, 1996.

O'Riordan, Timothy, William C. Clark, Robert W. Kates, and Alan McGowan. "The Legacy of Earth Day:
Reflections at a Turning Point." Environment 37(3):7-15, 1995.

Schneider, Stephen H. "The Whole Earth Dialogue." Issues in Science and Technology 4(3):93-99, 1988.

Tucker, Richard P. and John F. Richards. "The Global Economy and Forest Clearance in the Nineteenth Century."
Pp. 577-585 in Kendall E. Bailes (ed.), Environmental History: Critical Issues in Comparative Perspective.
Lanham: University Press of America, 1985.
Hardin, Garrett. "The Tragedy of the Commons." Science 162:1243-1248, 1968.

April 22 Environmentalism and the Sociology of Values

Reading required:

Harper, Chapter 2.

Reading suggested:

Elliot, Herschel. "A General Statement of the Tragedy of the Commons." Population and Environment 18(6):515-
531, 1997.

O'Neill, John. "Cost-Benefit Analysis, Rationality and the Plurality of Values." The Ecologist 26(3):98-103, 1996.

Gowdy, John M. "Progress and Environmental Sustainability." Environmental Ethics 16:41-55, 1994.

Dunlap, Riley E. and William R. Catton, Jr. "Struggling with Human Exemptionalism: The Rise, Decline and
Revitalization of Environmental Sociology." American Sociologist 25(1):5-30, 1994.

Stern, Paul C., Thomas Dietz and J. Stanley Black. "Support for Environmental Protection: The Role of Moral
Norms." Population and Environment 8(3/4):204-207, 1986.

Giddens, Anthony. "Fundamental Concepts of Sociology." Pp. 145-168 in Anthony Giddens, Capitalism and
Modern Social Theory: An Analysis of the Writings of Marx, Durkheim and Max Weber. Cambridge: Cambridge
University Press, 1971.

White, Lynn Jr. "The Historical Roots of Our Ecological Crisis." Science 155(3767):1203-1207, 1967.

April 24 The Dominant Social Paradigm

Reading required:

Rees, William E. "The Ecology of Sustainable Development." The Ecologist 20(1):18-23, 1990.

Reading suggested:

Dorfman, Robert. "An Economist's View of Natural Resource and Environmental Problems." Pp. 67-95 in Robert
Repetto (ed.), The Global Possible. New Haven: Yale University Press, 1984.

Smith, Robert J. "Privatizing the Environment." Journal of Labor Research 3:11-50, 1982.
Catton, William R., Jr. and Riley E. Dunlap. "A New Ecological Paradigm for Post-Exuberant Sociology."
American Behavioral Scientist 24(1):15-47, 1980.

April 27 The New Environmental Paradigm

Reading required:

Harper, Chapter 7.

Reading suggested:

Gigliotti, Larry M. "Environmental Issues: Cornell Students' Willingness to Take Action, 1990." The Journal of
Environmental Education 26(1):34-42, 1994.

Dunlap, Riley E., "Trends in Public Opinion Toward Environmental Issues: 1965-1990." Pp. 89-116 in Riley E.
Dunlap and Angela G. Mertig (eds.), American Environmentalism: The U.S. Environmental Movement, 1970-
1990. Philadelphia: Taylor and Francis, 1992.

Paehlke, Robert C. "Environmental Values and Democracy: The Challenges of the Next Century." Pp. 349-367 in
Norman J. Vig and Michael E. Kraft (eds.), Environmental Policy in the 1990's: Toward a New Agenda.
Washington, D.C.: Congressional Quarterly Press, 1994.

Watts, Nicholas and Geoffrey Wandesforde-Smith. "Postmaterial Values and Environmental Policy Change." Pp.
29-42 in Dean E. Mann (ed.), Environmental Policy Formation: The Impacts of Values, Ideology, and Standards.
Lexington: Lexington Books, 1981.

Dunlap, Riley E. and Kent Van Liere. "The New Environmental Paradigm." Journal of Environmental Education
9(4):10-19, 1978.

April 29 Toward an Alternative Paradigm

Reading required:

Colwell, Thomas. "The Nature-Culture Distinction and the Future of Environmental Education." The Journal of
Environmental Education 28(4):4-8, 1997.

Reading suggested:

Kates, Robert W. "Sustaining Life on the Earth." Scientific American October:114-122, 1994.
May 1 Discussion and Review

To be announced FINAL EXAMINATION

[ Back to Top | Back to Syllabi | Back to Environmental Sociology ]


"The prophesies of the Ancestors of the Lakota Nations have important meaning for the future of
Mother Earth. Through the generations, these prophesies have been maintained courageously,
methodically and accurately. Now, we are in a critical stage of our spiritual, moral, and
technological developments as nations. All Life is precariously balanced. We must remember
that all things on Mother Earth have spirit and are intricately related. The Lakota prophesy of
"Mending the Sacred Hoop of all Nations" has begun. May we find, in the ancient wisdom of
Indigenous Nations, the spirit and courage to mend and heal."

Arvol Lookinghorse
19th Generation Keeper of the Sacred
Pipe
Lakota Nation

In recent years there has been a major movement initiated by Native Peoples to address the fundamental question
of how to safeguard the earth for future generations while caring for the needs of present peoples. Many feel
intense urgency to address environmental concerns because we see, hear, and feel the wounds of Mother Earth;
lands that have been harmed through exploitation of water, timber, soil and wildlife resources; soils and waters
contaminated by industrial waste and leaking landfills. All living things are embraced by the web of life; injury in
one place injures us all. Breaking the intimate ties that connect people and the earth threatens our cultural and
temporal survival. There is an immediate need to address these concerns, to live in harmony with Mother Earth, the
Four Grandfathers, and the World Above.

The purpose of this webpage is to detail a current project at the Cornell American Indian Program
evaluating existing indigenous environmental initiatives in the United States. Data collection is
ongoing and the research is expected to be completed during the summer of 1996.

Background

The North American Agreement for Environmental Cooperation (NAAEC) was negotiated as a side agreement of
the North American Free Trade Agreement (NAFTA) to ensure environmental protection as a result of "increased
economic growth through trade." A Commission for Environmental Cooperation (CEC) was established to facilitate
the globalization of ecosystem management and environmental protection with the governments of Canada, the
United Mexican States and the United States of America.
CEC recognized the traditional and cultural position that indigenous peoples, as First Nations, have historically
maintained in addressing environmental issues. An immediate goal of this non-profit, international organization
was to facilitate an exchange of information and experiences among all governments and organizations concerned
with these issues and native populations. Consequently, the promotion and communication of aboriginal
sustainable development efforts in the three countries defined the present research objectives. These objectives are
as follows:

1. To identify the mechanisms, initiatives and projects currently in place or proposed in North
America for facilitating the involvement of indigenous people in environmental issues and efforts

2. To prepare an overview of all the mechanisms, initiatives and projects identified

3. To provide a brief statistical analysis of the United States indigenous population describing
demographics, cultural distributions, and ecogeographical regions.

The Commission for Environmental Cooperation solicited the United States component participant by directly
contacting Cornell's American Indian Program (AIP). Jane Mt. Pleasant, the director of this program, has an
extensive background in agronomy and environmental issues and administers the AIP Agriculture
Project. Dr. Mt. Pleasant agreed that Cornell University, as a land grant institution, had a responsibility to respond
to this invitation and accepted the role of the US component for this research evaluation project.

Research Design

Research Design for Program Evaluation (HSS 691), a course taught at Cornell University by WM Trochim,
provided the design foundation for this study. To ensure construct and internal validity, the American Indian
Program developed a research design clearly defining the operationalized constructs of "indigenous groups",
"sustainable efforts", and "environmental issues." Integrating the pre-existing objectives and participant criteria of
CEC, the operationalized definitions of "issues" and a review of active projects, mechanisms and initiatives
relevant to environmental protection led to the development of a purposive survey within a summative evaluation
research plan. The methodology included selecting a minimum of twenty (20) case studies and conducting
qualitative interviews to extract specific descriptive program data. Data triangulation included a review of relevant
literature, existing program documents and technological resources. Participatory Action Research methods were
also incorporated in the collection of data.

Resources

A review of relevant literature began with Cornell's Mann Library, which directs an individual to a "quick title
search". This search returned approximately 400 titles covering the environment and indigenous peoples and pointed to additional
electronic network resources. Native Americans and the Environment, All Links provides an alphabetized listing of
information on events, speeches, projects, reports and informal literature. Also, the Environmental Bibliography
and ENVIRONLINE, accessible through Mann Library's Gateway, are groups of databases managed by DIALOG
Information Services which provide indexing and abstracting coverage of more than 1000 international primary
and secondary publications reporting on all environmental aspects. In addition, the Cornell University Center for
the Environment maintains a current projects summary listing.

Simultaneously, a subject search was used to identify more than 400 articles with references to Native Americans
or American Indians. Several electronic resources are available which provide historical and contemporary
information about indigenous cultures. The following links include demographic, cultural and environmental
sources:
The American Indian College Fund
The Indigenous Knowledge and Development Monitor
The Lakota Home Page
List of Federally Recognized Tribes
The Native Net
Natural Resources; Forests and Indigenous People

More than seventy-five (75) active projects and initiatives have been identified through primary and secondary
sources. Of these, twenty will be selected for more extensive interviews. Selections are based on geographical
balance and broad representation of the key issues identified by the survey team and CEC staff. The projects
address resource management, conservation, preservation, environmental protection, and education or capacity
building. Interestingly, this research has identified the use of technological resources for the transmission of
information, which was not an original CEC objective. However, there are numerous variables to explore in order
to determine the extent and use of such resources among indigenous populations. A final note: the Cornell
American Indian Program is presently constructing a homepage for this very reason, and it should be online before
the end of this semester.

Html submitted to WM Trochim


HSS 691, Research Design for Program Evaluation
April 1, 1996
Submitted by R. Maldonado

Name of the Organization and Environmental Initiative

American Indian Program, Agriculture Project
Jane Mt. Pleasant, Director
300 Caldwell Hall, Cornell University, Ithaca, NY 14850
Office: 607-255-6587; Fax: 607-255-4246
Area: Northeast; Issue: Agriculture

First Environment Project
Katsi Cook, Program Director
227 Blackman Hill Road, Berkshire, NY 13736
607-255-1605
Area: Northeast; Issue: Education/Capacity Building

First Nations Development Institute
Rebecca Adamson, President
11917 Main Street, The Stores Building, Fredericksburg, VA 22408
Office: 540-371-5615; Fax: 540-371-3505
Area: Southeast; Issue: Education/Capacity Building

Gila River Indian Community, Juvenile Justice Center and Rehabilitation Farm Project / O'Otham Oidak Farm Project
Sandi Dass, Project Director
P.O. Box 219, Sacaton, Arizona 85247
Office: 602-899-1012; Fax: 602-899-9856
Area: Southwest; Issue: Education/Capacity Building

Native American Fish and Wildlife
Ken Poynter, Executive Director
750 Burbank Street, Broomfield, Colorado 80020
Office: 303-466-1725; Fax: 303-466-5414
Area: Southwest; Issue: Resource Management

Native Seeds/SEARCH
Kevin Dahl, Education Director
2509 N. Campbell Ave. #235, Tucson, AZ 85719
Office: 602-327-9123; Fax: 602-327-5821
Area: Southwest; Issue: Conservation

Navajo Natural Heritage Program
Jack Meyer, Director
P.O. Box 308, Window Rock, AZ 85515
Office: 520-871-7060; Fax: 520-871-7069
E-mail: ANYSTEDT@TNC.ORG
Area: Southwest; Issue: Resource Management

Traditional Native American Farmers Association
Clayton Brascoupe, Director
P.O. Box 170, Tesuque Pueblo, New Mexico 87574-0170
Office: 505-983-4047; Fax: 505-983-2172
Area: Southwest; Issue: Resource Management

Zuni Conservation Project
Jim Enote, Project Leader
P.O. Drawer 630, Zuni, New Mexico 87327
Office: 505-782-5852; Fax: 505-782-2726
Area: Southwest; Issue: Resource Management
Agroforestry research for woodfuel development

Mugo F.W.

INTRODUCTION

Agroforestry is a land use system that involves growing crops and woody perennials and keeping livestock on the same land
unit in space or time. Active agroforestry research has been going on for the last two decades, but most of it has been on the interaction
of crops and trees, especially as it relates to crop yield and biomass yield, particularly for mulching or fodder for animals.

One of the main reasons for introducing trees on the farm through agroforestry was to increase the availability of woody resources to
households, since the traditional sources, which were indigenous forests, were getting depleted very fast. Apparently, this is not being
achieved as anticipated, mainly because adequate attention has not been given to research that can generate information that directly
addresses the problem of woodfuel availability at the farm level.

With this realization, a number of individuals are now getting into this field to try to address the question of woodfuel availability at the
farm level. But information on how to do this kind of research is scarce and scattered. This page is specifically prepared to assist
those interested in the socio-economic and cultural aspects of agroforestry research for woodfuel production. The traditional research
steps, i.e., research question identification, design, implementation, data analysis, and reporting, have been adopted for ease of
understanding.

RESEARCH METHODOLOGY

An introduction to evaluation methods can assist you in finding somewhere to start. Research problem formulation is generally time
demanding, and the temptation to despair can be quite high. Research and problem formulation require an in-depth
review and understanding of what has been done in the subject of interest. A review of international agroforestry policies and
priorities, their evolution and future in both the developed and the developing countries, will assist you in shaping your research, that is,
in putting your research within the policy context. Some examples of what has been done and what is going on can also give you an idea of
what other researchers are doing. For example, studies on multipurpose tree species, intercropping, root characteristics and other
general areas like ecology can be very useful if your interest is in tree physiology, phenology and other general tree-soil-crop
relationships. Even with this, there is a need to look at the general status of agroforestry research in order to develop a good
conceptual understanding of the field. Research on cultivation systems, like what has been done in the northern highlands of Zambia,
can be very useful, especially if your interest is in the farming systems process. The Diagnosis and Design method developed by
ICRAF, situation analysis and participatory rural appraisal have also been used successfully in identifying community-related research
problems. These three are all similar, and they involve a quick survey of the community of interest to identify the problems and design
the appropriate research. General ideas on the emic and etic methods may help, especially if your focus is on socio-economic aspects.
Other general information on forestry, agroforestry and livestock will widen your scope of understanding. Some specimens may also be
useful if you would like to look at the details of a particular tree.
RESEARCH DESIGN

As is the case for any research, the researcher is interested in getting results that are as accurate as possible. This is what will reflect the
quality of the research, most of the time referred to as validity. To attain this, there is a need to be aware of the most common areas of research
that can limit the accuracy or validity of the research findings, especially in woodfuel supply studies where one thing can have many
meanings at the same time. For example, we know that firewood comes from a tree, but not every tree on the farm is used for firewood. It
may have been planted for poles, timber, shade, fencing, etc. If you counted all the trees and grouped them together to stand for fuel, your
findings would be false. To understand better how to escape such threats, you can read the details on construct, external, internal, and
conclusion validity. The design you choose for your research should be one that will produce the highest quality data possible given
your constraints.
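
To make the firewood-counting point concrete, here is a minimal sketch in Python (the field names and example records are hypothetical, not drawn from any actual survey) of tallying only those trees whose recorded primary use is firewood, rather than every tree on the farm:

# Minimal sketch: guard against a construct-validity threat by counting only
# trees the household actually designates for firewood, not every tree on the farm.
# The species names and "primary_use" field are illustrative assumptions.

farm_trees = [
    {"species": "Grevillea robusta", "primary_use": "poles"},
    {"species": "Eucalyptus saligna", "primary_use": "firewood"},
    {"species": "Markhamia lutea", "primary_use": "shade"},
    {"species": "Acacia mearnsii", "primary_use": "firewood"},
]

# Naive (invalid) measure of woodfuel supply: every tree counts.
naive_count = len(farm_trees)

# Construct-valid measure: only trees the household reports as firewood trees.
woodfuel_count = sum(1 for t in farm_trees if t["primary_use"] == "firewood")

print(f"All trees on farm: {naive_count}")
print(f"Trees actually available as woodfuel: {woodfuel_count}")

The point is simply that the measure should match the construct: woodfuel supply is the stock of trees households actually treat as fuel, not the total tree count.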

SAMPLING

Sampling is only necessary if one cannot study the whole population of interest at one go. There are various methods of sampling. The
choice of a sampling method will depend on the type of research being conducted, but in most cases several methods are combined.
Since a community always consists of different categories of households, i.e., the old and the young, widows and widowers, and absentee
husbands or wives (where one of the spouses works and stays away from home), there is a need to consider a fair representation of the
community by sampling from each category, unless the intention is to focus on just one of these subgroups. This will improve the
external validity of your study.
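
As one possible way to operationalize such category-by-category sampling, the sketch below (Python; the household categories, identifiers, and sampling fraction are illustrative assumptions only) draws a simple random sample within each category of the frame:

import random

# Illustrative sampling frame grouped by household category (hypothetical data).
households_by_category = {
    "elderly-headed": ["HH001", "HH014", "HH022", "HH035", "HH041"],
    "young couples": ["HH002", "HH009", "HH017", "HH028", "HH033", "HH040"],
    "widowed": ["HH005", "HH011", "HH026"],
    "absentee spouse": ["HH007", "HH019", "HH030", "HH038"],
}

def stratified_sample(frame, fraction=0.5, seed=42):
    """Draw a simple random sample of roughly `fraction` of each category."""
    rng = random.Random(seed)
    sample = {}
    for category, units in frame.items():
        n = max(1, round(len(units) * fraction))  # at least one household per stratum
        sample[category] = rng.sample(units, n)
    return sample

print(stratified_sample(households_by_category))

Sampling within each category guarantees that small subgroups, such as widowed households, are represented, which is the point made above about external validity.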

DATA COLLECTION ORGANIZATION AND ANALYSIS

This stage of research demands hard work, and occasionally you
may require help, depending on what you are doing in particular. At the design stage of your research process, it is good to think about
how you would like to analyze your research data. Look at what researchers in your field use and decide early enough so that you
don't end up with data which cannot be analyzed. There are various data analysis tools which can be used for the analysis. The choice
of a statistical tool will depend on the type of data that you intend to generate. SAS can do most of the statistical analysis that you may
want, though you can look for additional help, especially when you are dealing with qualitative analysis. Selecting Statistics can assist you in
choosing the statistical operations you should employ when analyzing your data.
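
For example, if your survey produces counts of households cross-classified by category and by main cooking fuel, a chi-square test of independence is one common choice; the sketch below uses Python's scipy purely as a stand-in for SAS, and the counts are made up for illustration:

from scipy.stats import chi2_contingency

# Hypothetical cross-tabulation: household category (rows) by main cooking fuel (columns).
#                 firewood  charcoal  kerosene
observed = [
    [34, 12, 4],   # male-headed households
    [28, 20, 7],   # female-headed households
]

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p_value:.3f}")
# A small p-value would suggest fuel choice is associated with household category.

Whatever tool you use, deciding on the test at the design stage tells you exactly which counts or measurements your field work must produce.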

DOCUMENTATION OF RESEARCH FINDINGS AND PUBLISHING

For the documentation of your research findings, contact ICRAF and get details on how you can publish your research findings in their
quarterly newsletter, "Agroforestry Today". You can use additional sources of information on agroforestry to improve on your
research and also identify opportunities for further research. If you have done a good job at your research and you are 99% confident
that nobody can successfully challenge you, as soon as your research data is published, you become King.

Doesn't it feel nice? Enjoy your research. If you get stuck on the way, consult the WEB.
EVALUATION TOOLS FOR INTEGRATED PEST MANAGEMENT PROJECTS

By Alfredo Rueda

This home page is the result of a search for tools for the evaluation of Integrated Pest Management (IPM) projects on the Internet. The goal of IPM
projects is for farmers to implement new pest management tactics at the field level to reduce the use of pesticides. I am disappointed with the result of
the search because there is not much information, nor many examples, of IPM project evaluation. For technical information on pest management there are
several excellent resources. Other information available on this page relates to evaluation and research methods that may provide inspiration to
IPM specialists to construct similar resources on the Web. Finally, I include some links to development and educational organizations working on
agricultural projects in third world countries which may utilize IPM information.

IPM EVALUATION PROJECTS

A good place to start, to introduce the reader to the concept of Integrated Pest Management, is the electronic publication entitled "IPM: The Quiet
Evolution". In this publication there are some case studies of the pseudo-evaluation of good IPM programs in California. I refer to these case studies
as pseudo-evaluations because they speak more to the technical component of the program than to a real evaluation of the implementation by farmers.
Another example of the type of evaluation that scientists make of their programs is Impacts of the New York State Integrated Pest Management
Program 1986-1994.

There are only a few sites on the web that deal directly with the evaluation of IPM projects. In the U.S.A.,
Argonne National Laboratory is developing a Decision Support System (DSS) for the U.S. Department of
Agriculture, to find alternatives to pesticides under regulatory review. The purpose of the system is to help
improve pest management decision making, including at the policy level. The program is still under
development, but you can find a slide presentation of the whole project. This project originated with the
Clinton administration initiative to implement IPM on 75% of the agricultural land in this country by the year 2000.

In Australia, the Cooperative Research Centre for Tropical Pest Management
(CRC) is targeting the crucial issues involved in designing, developing, and
implementing improved pest management strategies. The Decision Analysis
and Implementation Program projects within this program aim to determine
how social, economic, community and policy issues influence pest management
practice, and to use this information to increase IPM implementation. The Centre
also offers an international short course on this topic, IPM: Tools for
Implementation, which draws on the world-class, multi-disciplinary expertise within
the Centre to provide training in a range of approaches, principles and
techniques aimed at improving research and implementation of integrated pest
management.

IPM TECHNICAL RESOURCES

● IPM SERVERS

In the U.S.A. there are several universities with entomological servers. Some of the best, or those with the most links, are the
Entomology Department at Colorado State University and the College of Agriculture at The University of Florida. Colorado
State University has the best collection of references in entomology and IPM in the country. On this server the links are
organized by topic, making it easier to search. AgriGator, the server at The University of Florida, has a collection of sites and
sources that provide agricultural and biological related information. This is a much larger server because it covers all
agricultural topics. A good feature of this server is that it has a geographical index for the servers. The AGRALIN World Wide
Web server is a project of the Wageningen Agricultural University. From this server you can access several interesting
agricultural and entomological places in Europe.

● IPM NEWS AND PUBLICATIONS

There are several places to find current information on IPM. Information local to the state of New York
can be found at the New York State IPM Program News and Information page, which provides news to the
agricultural and scientific community on topics related to IPM within the state. A larger organization is the
Consortium for International Crop Protection, formed by 13 different universities to collaborate on IPM
programs. IPMnet is the Consortium for International Crop Protection's response to the critical need for an
international IPM information/communication network. IPMnet offers free access to databases, IPMnet
NEWS, as well as other agricultural publications, a message board, and communication with others interested in IPM in developing countries.
While IPMnet will remain responsive to any queries involving pest management, emphasis is placed on developing countries, with other networks
concentrating on the more developed-world segments.

One of the networks that provides news at the national level is the Center for Integrated Pest
Management (CIPM) in North Carolina, which serves a lead role in technology development, program
implementation, training, and public awareness for IPM at the state, regional, and national level. The
CIPM is an organizational unit within the College of Agriculture and Life Sciences at North
Carolina State University. It is composed of faculty members from all academic departments in the
College and involves all relevant disciplines impacting on IPM. The American Crop Protection Association is the channel for the chemical industry
to inform the public about their role in the implementation of IPM. This site provides limited information because membership in
the association is required to enter most of its areas.

There are some specialist news groups, like The Resistant Pest Management Newsletter, that provide information on pesticide resistance
problems and resistance management mechanisms.

● PEST IDENTIFICATION AND CONTROL

The first step in solving pest problems is proper pest identification. Pest identification manuals are
available in some places on the Internet. The Tree of Life is a project designed to contain information about the
phylogenetic relationships of organisms, to link biological information available on the Internet in the form of
a phylogenetic navigator, and to illustrate the diversity and unity of living organisms. The Tree of Life is an
excellent place to start with the identification of your problem because it covers all living species.

Al Rueda and Anthony M. Shelton developed the CIIFAD / Global Crop Pest Identification and Information Services in IPM. This is a multilingual
prototype with the goal of helping to increase crop pest diagnosis and IPM information capability among extensionists and farmers of developing
countries. University of California Statewide Integrated Pest Management provides an excellent search facility to look for pests sorted by commodity.
The University of Kentucky Department of Entomology also has some information, with some of the best pest pictures I have found on the web so
far. Both places provide a general description of the pest and some recommendations on its control, but they fall short of explaining what
the IPM program for the pest should be. The NYS IPM home page is another source of pest information specific to the state of New York. The NY IPM
program holds the best electronic information on Biological Control in North America for the quality of its content and presentation.

PROGRAM EVALUATION AND RESEARCH RESOURCES


The first stop in the electronic tour of evaluation and research servers must be The Knowledge Base. This is an on-line textbook
on research methods with several examples and some applications for research, such as Concept Mapping and Computer
Simulations for Research Design. For the computer simulation page you must have the statistical package MINITAB on your
computer. If you own a copy of MINITAB, you can download some macros from their page to facilitate your data analysis. Another interesting site
for information on research methods is CAM, the Cultural Anthropology Methods Journal. CAM publishes articles on the real "how to" of qualitative
and quantitative research methods. Unfortunately, in the on-line version there are only a few articles you can read, but they have the index of the
other articles and an order form to obtain them.

The Abstracts of HHS Evaluation/Research Studies and Other Relevant Reports database contains descriptions of HHS
Program Evaluation Studies divided into several categories. This is a good place to see real examples of program
evaluations in the U.S.A. The Evaluation of Family Planning Program Impact is a recent and major five-year initiative
funded by USAID (U.S. Agency for International Development) to support technical and methodological advancement
of population program evaluation. The purpose of the EVALUATION Project is to strengthen the capacity of USAID
and host-country institutions to evaluate the impact of population programs on fertility.

An interesting evaluation project with many good search applications is the User Needs Assessment and Evaluation for
the UC Berkeley Electronic Environmental Library. The goal of the needs assessment and evaluation component of this
project is to maximize the usability of digital libraries.

The Center for Indigenous Knowledge for Agriculture and Rural Development (CIKARD) at Iowa State University
focuses its activities on preserving and using the local knowledge of farmers and other rural people around the globe. CIKARD's goal is to collect
indigenous knowledge and make it available to development professionals and scientists. CIKARD concentrates on indigenous knowledge systems
(such as local soil taxonomies), decision-making, organizational structures, and innovations.

DEVELOPMENT AND EDUCATIONAL ORGANIZATIONS LOCATOR

The following servers are places where you can find the organization that you want to learn more about. ONE WORLD is a
European server where most NGOs are located. It provides interesting news on development, educational resources and a
directory of several NGOs. To locate your European colleagues, a good place to start is EUROPE FLAGS, which has indexes of
organizations and universities for most European countries.

For government organizations in the US, The Department of Health and Human Services provides links to several agencies.

I know that it is educational and fun to discover all these new places around the globe, but I need to remind you that the tax
deadline is in a few days. I am providing the IRS Forms and Documentation Page, where you can download any tax form and its instructions to complete your
taxes in a few hours and come back to the web again.

For any comment please write to: Alfredo Rueda, Dept. of Entomology, Cornell University
Breast Feeding and HIV in Developing Countries

There is plenty of evidence that breast-feeding is beneficial for both the mother and the child, and thus the practice has been
encouraged by numerous health-promoting organizations. For the mother, the benefits include lower rates of breast and ovarian
cancer. For the child, they include lower rates of diarrhea and respiratory infections and better growth in terms of both physical and mental
development. And for the mother-child pair, it helps in establishing a good bonding relationship, which probably has deeper
implications for later social interactions and the stability of society as a whole.

There has been a recent uproar about the appropriateness of this practice in view of the emergence of AIDS in the developing
countries. Following is a discussion about the different aspects related to this issue.

● Importance of Breast Feeding in Developing Countries


● Introduction of the new scourge - AIDS
● AIDS and Mother To Child Transmission during Breast Feeding
● The Dilemma
● Hidden Agenda
● A Crucial Question - How much is the risk?
● Research so far
● Some views across different groups of people

Importance of Breast Feeding in Developing Countries

In a developing country situation, breast feeding is even more important due to its immediate impact on child and maternal
health, both in terms of mortality and morbidity. Infectious diseases, including diarrhoeal and respiratory ones, are the major
killers, and breast feeding reduces both of them significantly.

Introduction of the new scourge - AIDS

One of the biggest scourges, at least from the health point of view, which we have carried over into the new millennium is
AIDS - Acquired Immuno Deficiency Syndrome. Controlling HIV (Human Immunodeficiency Virus) infection, the
causative factor of AIDS, has become one of the major challenges to providers of health care today. Ever since its
emergence as a public health problem a little more than a decade ago, there has been a continual increase in its
prevalence in different parts of the world. No effective curative treatment has been invented yet. The only method to counter
it seems to be primarily preventive in nature. Also, the treatments available to control the disease and/or its consequences
are very costly for both the individual and health care institutions. This cost factor is one of the important issues in
developing countries, in contrast to developed nations.

AIDS and Mother To Child Transmission during Breast Feeding

In many of these developing countries HIV infection is as common in females as in males. Also it is known that HIV
infection can be transmitted from infected mothers to uninfected infants by various methods including breast feeding. This
evidence has become the basis for suggestions that HIV-infected mothers should not breastfeed their babies. This
intervention works well in a developed country towards decreasing the mother to child HIV transmission rate, though not
eliminating it completely.
But in a developing country situation, the scenario is a little different and more complex, if not more interesting, from the overall
health care point of view.

The Dilemma

The biggest question now is - "What should HIV-infected mothers in a developing country do - breastfeed their babies or
not?"

The apparent answer suggested by an average person in a developed country, with all its amenities, would tend to be
negative - "After all, why infect the child?" But what should an impoverished, ill-equipped, often homeless mother in a less
privileged country do with her little, helpless child who has barely seen the light of day? Not breastfeed, and leave it
to the mercy of barely existent or often non-existent child-care institutions, or leave it in the reluctant hands of other family
members - already panicked by the fear of contracting the disease in some way from the mother or the baby, they themselves
already ostracized and segregated by society? Or should she take care of the baby all by herself, providing the
baby with the overdiluted formula that she can barely manage with the little means she is left with? Should she provide her
baby with this panacea, the overdiluted milk, which has been prepared unhygienically with contaminated water, to some
extent due to her ignorance but largely because she does not have access to a better source? Should she do all this, and yet
deprive the baby of the precious, nutrient-rich breast-milk that she could otherwise easily feed her child with?

Hidden Agenda

The advancement of medical science after World War II and the recent projections of genetic engineering engendered a
view that we would soon be able to win the race against different diseases. But with few practical solutions in sight so
far, the rapid rate of spread of AIDS (or rather the detection of its spread) became the cause of a hysteria leading
organizations like the UN (United Nations) toward an anti-breast-feeding campaign. And it is always easier to spread a fear
than to enlighten people on the relative merits and demerits of a controversial issue. Again, even if no clear alternative
suggestion is made, people of necessity will adopt alternatives, especially the existing ones known to them, rather than
try any new ones suggested. In this case, the alternative is the use of infant formula. Interestingly, the marketing of infant
formula in developing countries was, for numerous reasons, a questionable and highly debated issue for quite a while
before AIDS was discovered. Following the recent anti-breast-feeding campaign there is a chance of re-emergence, or rather
re-establishment, of the same old problem.

A Crucial Question - How much is the risk?

In the absence of cheap, effective treatment, the most important questions are: What are the relative risks, for the overall health of the community or (developing) country, of an HIV-infected mother breastfeeding versus not breastfeeding her baby? And how much of the mother-to-child transmission (MTCT) of HIV is caused by breastfeeding alone, compared with the other routes of transmission from the same infected mother? It is known that MTCT occurs even without breastfeeding, and that MTCT through breastfeeding occurs in only a certain percentage of breastfed babies. Unfortunately, there are few quantitative data on these risks to help resolve the issue. Some evidence suggests the risk of MTCT via breastfeeding is small, if not insignificant.
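
To make the comparison concrete, here is a minimal back-of-envelope sketch in Python. The two transmission rates below are purely hypothetical placeholders (as noted above, reliable quantitative estimates are scarce); the sketch only shows how the share of MTCT attributable to breastfeeding would be computed once such rates are known.

    # Hypothetical illustration only: these rates are placeholders, not published estimates.
    p_without_breastfeeding = 0.20    # assumed MTCT risk from pregnancy and delivery alone
    p_added_by_breastfeeding = 0.10   # assumed additional risk contributed by breastfeeding

    p_total = p_without_breastfeeding + p_added_by_breastfeeding

    # Share of all mother-to-child transmission attributable to breastfeeding
    attributable_fraction = p_added_by_breastfeeding / p_total

    print(f"Total MTCT risk (hypothetical): {p_total:.0%}")
    print(f"Share of MTCT attributable to breastfeeding: {attributable_fraction:.0%}")

With these placeholder numbers, breastfeeding would account for about one third of MTCT; the policy question is whether that share outweighs the risks of unsafe formula feeding described above.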

Research so far

Numerous studies have been carried out on this issue. The website linked here provides a good summary of some of them.

Some views across different groups of people


Here is another website that will lead you to a collection of views on this issue.
AIDS in Sub-Saharan Africa


Lydia M. Mensah-Dartey

This page provides information on the state of AIDS in Sub-Saharan Africa. It is intended as a World Wide Web resource for readers with little or no prior knowledge of the subject. The page includes links to other sites where viewers can find further information.

Contents:
● Overview
● Impact
● Intervention
● Remarks

Overview:

Since its recognition in 1981, AIDS has become a global epidemic. It affects all ages, races, genders, classes, and nationalities. AIDS poses a tremendous problem for underdeveloped countries, which often lack the education and the funds to halt its spread. Its effect on the world is now so widespread that there are calls for an international regime to rise up and serve as a helping hand to underdeveloped countries, as well as to the rest of the world.

More than 50 million people have now been infected with HIV, according to the latest UN and World Health Organization figures. Deaths from AIDS reached a record 2.6 million in the past year (compared with 2.2 million in 1998), and an estimated 5.6 million adults and children were newly infected. Ninety percent of all people living with HIV are in developing countries.

Sub-Saharan Africa, which includes countries such as Botswana, Zimbabwe and Namibia, still accounts for the majority of all new infections. Seventy percent of the world's people living with HIV and AIDS are in Sub-Saharan Africa, which has less than 10 percent of the world's population. This figure is not easy to take in. Eastern and Southern Africa have been particularly hard hit. AIDS is now seen as a development problem more than a health issue.

There are now more women carrying the virus than men. African girls aged 15 to 19 are five
to six times more likely to be HIV-positive than boys the same age.

Life expectancy in southern Africa is expected to fall from 59 in the early 1990s to 45 between 2005 and 2010, roughly the same level as in the early 1950s. Factors contributing to the spread of AIDS in Africa include poverty, ignorance, the prohibitive cost of AIDS drugs, a reluctance to discuss sex openly, and promiscuity. One major factor is prostitution. Poverty makes young girls drop out of school and move to the city to work as child minders, where most of them earn very little money. Because this money is not enough to provide for their families at home, many of these girls are driven to prostitution at an early age, believing they can earn more money that way.

Treatment of AIDS is a big challenge in Sub-Saharan Africa because of scarce money, few drugs and little hope.

Click here for more information on an overview of AIDS in Sub-Saharan Africa. You can also watch the CNN International documentary "The New Face of AIDS".

The following site, Aids and Africa, presents a few basic notions concerning HIV/AIDS and focuses on the biological, epidemiological and socioeconomic characteristics of the African epidemic. The site also discusses what AIDS is, how it is transmitted, where the disease comes from, its impact, and the prevention programs and treatment methods currently in use. You also get the chance to watch a 23-slide show on STD/AIDS in Africa.

Back to top

Impact:
The AIDS epidemic killed 2.2 million people in Africa in 1998. Peter Piot, executive director of UNAIDS, emphasized that AIDS is the single greatest threat to development in many countries. He stated: "With an epidemic of this scale, every new infection adds to the ripple effect, impacting families, communities, households and, increasingly, businesses and economies." (Source: United Nations AIDS program)

As the HIV epidemic deepens in Africa, it is leaving an economically devastated continent in its wake. It is important to note that the impact of AIDS goes far beyond the immediate family of those who fall ill and die. Whole communities and nations are affected, and the effects can even be international. Family structures dissolve: children become orphans, elderly relatives are left without support, and households and communities are impoverished. In particular, when AIDS claims the lives of people in their most productive years, grieving orphans and the elderly must contend with the sudden loss of financial support, and communities must bear the burden of caring for those left behind. The national and global economy loses leaders, managers, producers, and even consumers. Countries must draw on a diminishing pool of trained and talented workers. AIDS leaves Africa's economic future in doubt.


Back to top

Intervention:

A UNAIDS "Epidemic Update" revealed that, despite concerted prevention efforts in


developing countries, the increase in infections continues.

There are numerous activities and organizations helping to combat the AIDS epidemic in Africa.
Enter here to read about efforts by the US government and the UN to combat the AIDS problem in Africa.

The World Bank maintains a web site that contains information on HIV/AIDS prevention best practices and epidemiological facts by country. You can find AIDS contact persons in Africa, key World Bank documents for each country in Sub-Saharan Africa, statistics, and links to additional information.
(Picture: A 26-year-old woman, her arm covered with a
skin infection, suffers from AIDS at the government
hospital in Gulu, Uganda.)

Back to top

Remarks:

“Time for a blunt message to Africa!” says a CNN report (author: George Ayittey).
“Refugees make easy targets for sexual predators, many of whom carry HIV, the AIDS virus
… Africa cannot continue in the new millennium preoccupied with violence, war and political
instability. Sustainable development cannot occur in such an environment. Nor can control
of the AIDS epidemic.”

It is worth noting that HIV/AIDS is a global development issue, as a number of researchers have rightly noted. One question that remains to be investigated further is: "Does the epidemic cause developmental delays, or does it result from them?" For some work on this, see the article "AIDS and Development - What is the Link?" by Dr. Josef Decosas.

Back to top

For any comment please write to: Lydia

Last revised: 4/17/00


AIDS Orphans in Sub-Saharan Africa
The Scale of the Problem
African AIDS Crisis
Defining "Orphan"
AIDS and Orphans
Household and Family
Conspiracy of Silence
Related Links
References and Bibliography

Send comments to the author via email.


Back to first page

THE SCALE OF THE PROBLEM


The global death toll from AIDS was 2.6 million last year alone. By the end of 1999, according to estimates from the Joint United Nations
Programme on HIV/AIDS (UNAIDS) and the World Health Organization (WHO), the number of people living with HIV grew to 33.6 million
worldwide.
Cyclones, floods, famine, malaria, tuberculosis, and wars take many lives and leave many vulnerable children behind. However, the magnitude of the AIDS pandemic exceeds any other cataclysm. On one hand, HIV (the virus that causes AIDS) does not spare children: at the end of 1999, 3.6 million children under 15 years of age around the world had died due to HIV/AIDS, and another 1.2 million were living with HIV/AIDS.

On the other hand, the number of AIDS orphans is believed to far outstrip the number of children orphaned by other causes. It is estimated
that AIDS has already orphaned 11.2 million children since the epidemic was recognized in 1981. According to UNAIDS, by the end of the
year 2000, 13 million children will have lost their mother or both parents to AIDS. This cumulative figure includes orphans who have since
died, as well as those who are no longer under age 15. There may be 30 million more by 2010. "For every adult that dies, four or five
children are left behind," UNAIDS director Peter Piot says. More children have been orphaned by AIDS than people who have developed
AIDS. Orphans are one of the most tragic consequences of the AIDS epidemic in highly affected countries.
Already, 95 percent of all AIDS orphans are in sub-Saharan Africa, yet only a tenth of the world's population lives south of the Sahara. In
some countries, AIDS has orphaned 10 percent of children under 15. In a continent already ravaged by wars and mired in poverty, AIDS is
wiping out much of a generation, cutting down skilled workers, and destroying families.
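
To see how disproportionate this burden is, one can combine the two figures just cited (95 percent of AIDS orphans, roughly 10 percent of the world's population). The short Python sketch below does that arithmetic; it assumes only the rounded figures from the paragraph above.

    # Rounded figures from the paragraph above
    orphan_share_ssa = 0.95       # share of the world's AIDS orphans in sub-Saharan Africa
    population_share_ssa = 0.10   # share of the world's population in sub-Saharan Africa

    # Orphans per head of population, in relative units
    rate_ssa = orphan_share_ssa / population_share_ssa
    rate_rest = (1 - orphan_share_ssa) / (1 - population_share_ssa)

    print(f"Region vs. world average: {rate_ssa:.1f}x")
    print(f"Region vs. rest of the world: {rate_ssa / rate_rest:.0f}x")

On these rounded shares, a child in the region is roughly ten times more likely than the world average, and well over a hundred times more likely than a child elsewhere, to be an AIDS orphan.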

Back to first page

AFRICAN AIDS CRISIS


Two in every three persons in the world living with HIV/AIDS, and nine in every ten children born with HIV infection are in
the sub-Saharan Africa region.
In 1998, wars in Africa killed 200,000 people. AIDS killed 2 million people on the continent. AIDS is the leading killer in sub-
Saharan Africa. What is more, in Africa, HIV and AIDS pose a far more serious threat to soldiers than their dangerous
profession. In most countries, infection rates in the military from sexually transmitted diseases are generally two to five
times higher than the rates in comparable civilian populations. Overcrowding, violence, rape, despair and the need to sell or
give away sex to survive contribute to a significant increase in HIV infection among refugees in the camps. People are six
times more likely to contract HIV in a refugee camp than in the general population. Africa is home to more than 4 million
refugees.

Life expectancy at birth in southern Africa, which rose from 44 in the early 1950s to 59 in the early 1990s, is set to drop back
to 45 in the next 10 years because of AIDS. In some countries, it is even lower. Using different modeling methodologies,
researchers estimate life expectancy in Zimbabwe to be between 30 and 35 years in the 21st century (US Bureau of the
Census; Gregson et al., 1994). Significant reductions in life expectancy due to AIDS imply an increase in the numbers of
orphaned children.

East and, primarily, Southern Africa present the most alarming statistics. Botswana, Kenya, Malawi, Mozambique, Namibia, Rwanda, South Africa, Uganda, Zambia, and Zimbabwe have the hardest-hit populations.

AIDS impacts children directly and indirectly. According to UNAIDS, by the end of 1999, 570,000 children were infected. Ninety percent of infections in children are acquired through vertical transmission of HIV, which may occur during pregnancy, at delivery, or post-partum through breastfeeding. The main patterns of HIV transmission in Sub-Saharan Africa are heterosexual contact and mother-to-infant transmission. On the other hand, HIV also has indirect effects on children who are affected by, but not infected with, the virus.

Back to first page

WHAT IS YOUR DEFINITION OF ORPHAN?
Differences in orphan definition have program and policy implications. It is thus very important that researchers, epidemiologists and policy-makers explicitly state their understanding and usage of the term "orphan". Not only does the definition vary depending on whether we approach it from an epidemiological or a legal point of view, but ordinary language usage also varies among people of different cultures and ethnic groups.

Michaels & Levine (1992) of the Orphan Project use the term "orphan" to
describe children who have lost one or both parents. They indicate that a
majority of the women with AIDS are single mothers and, thus, their children
will lose their only parent. This rationale may not be accurate in all contexts
and geographic locations, though.

From the Greek orphanos and Late Latin orphanus, the American Heritage Dictionary of the English Language defines orphan as a "child whose parents are dead". According to the on-line Encyclopaedia Britannica and the Merriam-Webster Dictionary, an orphan is: (1) a child deprived by death of one or usually both parents; (2) a young animal that has lost its mother; and (3) one deprived of some protection or advantage, e.g., orphans of the storm.

These definitions contain several important elements and distinctions: on one hand, there is a child who may have lost one or both parents; on the other, there is an emphasis on maternal orphanhood, as it leaves the young animal (also true of infants) in a particularly vulnerable situation; finally, there is a figurative use of the word, which puts on the same level parentless children and people who are alone, solitary, abandoned, cast-off, forsaken, lost, disregarded, ignored, neglected, or slighted.

The definition of 'AIDS orphan' used by UNAIDS, WHO and UNICEF is of a child who loses his or her mother to AIDS before reaching the age of 15 years. Some of these children have also lost, or will later lose, their father to AIDS. A child whose father dies typically experiences serious psychological, emotional, social and economic loss. However, because reliable data on the number of paternal orphans are not available in many countries, the orphan statistics used by UNAIDS and UNICEF do not include children who have lost only their fathers.

Overall, there are variations in the age up to which children are considered orphans (14, 15, 18 or 21 years old) and in the patterns of parental death considered (both parents have died, either parent has died, or only the mother has died). As I said before, it is important that we state our definition. Here, we use UNAIDS' understanding of the term.

Back to first page

IMPACT OF AIDS ON ORPHANS


Not only do children carry the emotional burden of watching a loved one suffer and die, but they also experience the trauma of the family unit collapsing, the stigma of AIDS associated with parental death, and a severe decrease in the family's economic power. Several focus group discussions and interviews held with orphaned children and community members from a rural area near Mutare, Zimbabwe, shed light on orphans' concerns, including feeling different from other children, stress, stigmatization, exploitation, schooling, lack of visits, and neglect of support by relatives (Foster et al., 1997b). Frequently, a social and self-imposed silence reinforces the feelings of grief, loss, and failure, since it prevents children and adolescent carers from preparing themselves for the death of a loved one, or from acknowledging that it will inevitably come no matter how much effort they put into their care (UNAIDS, 1999).

Other research conducted in sub-Saharan Africa found that AIDS orphans experience interruption in their education (Ankrah, 1993; Foster et al., 1997b) and more self-reported depression (Sengendo & Nambi, 1997) than children orphaned for other reasons or children in intact families. Even though some studies found only a limited effect of orphanhood on school attendance (Kamali et al., 1996), most countries' current indicators show that, when resources are limited, orphans are the first to drop out of school. In rural areas of Zambia, 64 percent of orphans are not enrolled in formal school, compared with 48 percent of non-orphans (UNAIDS, 1999). An orphan enumeration survey of 570 households in and around Mutare, Zimbabwe, in 1992 found that heads of orphan households are likely to be older and less well-educated than heads of non-orphan households (Foster et al., 1995).
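
As a simple illustration of how such figures can be compared, the sketch below (Python) turns the Zambia enrollment percentages just quoted into a risk difference and a risk ratio; the percentages come from the text above (UNAIDS, 1999), and the computation is shown only to make the comparison explicit.

    # Percentages quoted above for rural Zambia (UNAIDS, 1999)
    not_enrolled_orphans = 0.64
    not_enrolled_non_orphans = 0.48

    risk_difference = not_enrolled_orphans - not_enrolled_non_orphans
    risk_ratio = not_enrolled_orphans / not_enrolled_non_orphans

    print(f"Risk difference (orphans minus non-orphans): {risk_difference:.0%}")
    print(f"Risk ratio (orphans vs. non-orphans): {risk_ratio:.2f}")

On these figures, orphans face a non-enrollment rate 16 percentage points higher, about 1.3 times that of non-orphans.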

Orphaned children experience extreme or increased poverty (Ntozi, 1997) and live in fear that they have the disease themselves. Many do. Oftentimes, all this results in abuse and in the overworked, underpaid employment of young children, as well as in school dropout and a related loss of future prospects. To survive, many children become sex workers and, by so doing, expose themselves, once again, to risk.

Orphanhood has serious implications for child health. Frequently, elderly and adolescent caregivers are uninformed about good nutritional practices, oral rehydration, immunization, and diagnosing serious illness, and are unable to travel with infants to seek immunization or other health services (Foster, 1998).

Back to first page

CHANGES IN HOUSEHOLD COMPOSITION AND FAMILY STRUCTURE
Orphans are one of the main factors in societal instability caused by AIDS.
Parents worry about who will take care of their children when they die.
Certainly, in the past, traditional networks of immediate and extended
families would have assumed the care of orphans. Today, the epidemic is
breaking down and overburdening extended families and community
resources.

Throughout Sub-Saharan Africa, the extended family (primarily maternal aunts and uncles) has traditionally helped cope with child survivors. Sometimes, relatives neglect or exploit orphaned relatives. Other times, unable to provide for the children, they cast them out. Oftentimes, both the very elderly and very young children struggle to care for children orphaned by AIDS (Ntozi & Zirimenya, 1999). Many children are raised by other children. The emergence of orphan households headed by siblings is an indication that the extended family is under stress (Foster et al., 1995). A study of 300 orphan households in Zimbabwe in 1995 found that nearly half of the caregivers of orphans were grandparents (Foster et al., 1996), and another study of child-headed households, some headed by children as young as 11 years of age, showed that in 86 percent of households both parents had died and in 93 percent the mother had died (Foster et al., 1997a).

Back to first page

CHALLENGING THE CONSPIRACY OF SILENCE
Capacity and resources are stretched to breaking points, both at the private
and the public level. National budgets are completely strained. By 2005, the
health sector costs related to HIV/AIDS are expected to account for more
than a third of all government health-spending in Ethiopia, more than half in
Kenya and nearly two-thirds in Zimbabwe (UNAIDS, 1999). It is important to
put pressure on governments and pharmaceutical companies to ensure
greater African access to AIDS treatments, and to health infrastructure.

Drug treatments, however, will not solve this crisis. There are still many obstacles and challenges to overcome. A serious commitment from African leaders and high-level government officials has to occur for these trends to change. It is essential that they speak about the problem and make it a national priority, helping create a climate in which AIDS victims decrease in numbers and are not stigmatized. Their role is critical in setting priorities straight and putting policies and programs in place. For example, public education campaigns, condom distribution, voluntary testing, counseling and support services are some of the strategies upon which Ugandan President Yoweri Museveni embarked his country. These policies have paid off: HIV infection has dropped from 15 percent of the population to 9.7 percent. South Africa's former and current presidents, Nelson Mandela and Thabo Mbeki, are good examples as well.

One of governments' main strategies in many countries has been to promote and support community-based programs. It is important to support orphans in their own communities, and for communities to meet soon to discuss problems arising from the impact of AIDS. Governments encourage communities to provide care for orphans and to rely on institutional care only as a last resort, both because communities are in the best position to assess their needs and because of governments' increasingly limited ability to provide basic services at the very time demand for them is increasing.

Care for orphaned children should be provided in culturally appropriate ways. In Africa, for example, the first line of approach in orphan care must be community-based. Formal foster care and institutional care should be provided only as secondary and last-resort options, respectively, and should play a transitional role. These provisions are, for example, part of Zimbabwe's policy on the care and protection of orphans, adopted by the Cabinet in May 1999.

It is important to build upon communities' existing capabilities, and to channel material support and assistance in developing income-generating activities through community groups to strengthen family coping mechanisms at critical times. Outside organizations should help those communities help themselves by providing material resources and assistance with planning, monitoring, and evaluation. Priority should be given to establishing programs in high-risk communities such as low-income urban areas and the surrounding rural areas.

Orphan enumeration is essential for planning and for mobilizing support by and from policymakers (Foster et al., 1995). Currently, there are systems for surveillance of the size of the orphan population, yet it is also necessary to establish monitoring and evaluation systems to assess the impact of AIDS on children, as well as the impact that pilot family- and community-support interventions have upon orphaned children, other children in the community, families, and communities in general. Identifying those who are more vulnerable and exposed to higher risks and deprivations is important in order to plan their assistance.

Government commitment to AIDS education and prevention efforts for young people is crucial. Shame and denial, stigma and discrimination must be fought through education and support, as well as through national policies that protect the property rights of orphans and their rights to health care and education, and that ensure services are provided to all orphans regardless of the cause of death of the parent or parents, or of the orphan's gender or religion.

Finally, whenever possible, children should be allowed to speak for themselves. Foster et al.'s study of orphaned children and community members in Mutare, Zimbabwe, showed that a first step in enabling children to self-advocate was to listen to their concerns (Foster, 1997c).

Back to first page

RELATED LINKS
● AEGIS

Since its inception in 1990, AEGIS, using a combination of FidoNet(r) (connecting over 32,000 electronic bulletin boards in 66 countries) and Internet communication tools, has sought to relieve some of the suffering and isolation caused by HIV/AIDS, and to foster the understanding and knowledge that will lead to better care, prevention, and a cure. It offers a vast range of information, from clinical and legal information to late-breaking news, with chat facilities.

● Centers for Disease Control and Prevention (CDC): Division of HIV/AIDS Prevention

CDC, located in Atlanta, Georgia, USA, is an agency of the Department of Health and Human Services. CDC's HIV mission is to prevent HIV infection and reduce the incidence of HIV-related illness and death, in collaboration with community, state, national, and international partners. This site contains information on intervention, research, support, surveillance & epidemiology.

● Children in Distress (CINDI)

CINDI, established in mid-1996, is an informal consortium of non-government organisations (NGOs) based in Pietermaritzburg, the capital of the province of KwaZulu-Natal, South Africa's most populous province. This site contains links to the Nelson Mandela Children's Fund and other institutions and programs in the area.

● Children and AIDS International Non-Government Organisation Network (CAINN)

CAINN was established in 1996 by NGOs and community-based organizations to promote the rights and needs of children and young people infected by and affected by HIV/AIDS. Its main objective is to promote the implementation of the UN Convention on the Rights of the Child and other relevant international declarations and agreements.

● The Rights of Children Living with HIV/AIDS (CRIN)

CRIN is a global network of organizations sharing their experiences of information on children's rights. Its web site includes the full text of briefing and position papers on the rights of children living with HIV/AIDS and links to a list of relevant publications.

● EU HIV/AIDS programme in developing countries (EU)

The European Union (EU) HIV/AIDS Programme in Developing Countries aims to reduce the spread of HIV/AIDS. It cooperates with governments and non-governmental organisations (NGOs), international agencies and the United Nations, the private sector and people living with HIV/AIDS. The programme was started in 1987 and has since implemented HIV/AIDS interventions at national, regional and international level in at least 90 developing countries.

● Southern Africa AIDS Information Dissemination Service (SAfAIDS)

SAfAIDS, established in 1994, is a subregional non-governmental organisation aimed at promoting policy, research, planning and programme development around HIV/AIDS in the southern Africa region. Of significance to SAfAIDS' aim is promoting in all sectors an understanding of the epidemic as a crucial issue for development, rather than the view that AIDS is primarily a health problem to be dealt with by health sectors in isolation.

● UNAIDS: The Joint United Nations Programme on HIV/AIDS

UNAIDS provides an expanded, multisectoral response to HIV/AIDS. The six co-sponsoring organizations are the World Bank, the United Nations Children's Fund (UNICEF), the United Nations Development Programme (UNDP), the United Nations Population Fund (UNFPA), the United Nations Educational, Scientific and Cultural Organization (UNESCO), and the World Health Organization (WHO).

● United Nations Children's Fund (UNICEF)

UNICEF's web site contains the Convention on the Rights of the Child, which celebrates its tenth anniversary this year, as well as The State of the World's Children 2000 and The Progress of Nations 1999 reports, which issue an urgent call to leadership on behalf of children and discuss the most daunting obstacles to development, with special attention to the AIDS emergency and orphanhood.

● UN Development Programme: HIV and Development Programme

The United Nations Development Programme HIV and Development Programme draws together UNDP's headquarters, regional and country programming and other initiatives designed to strengthen the capacity of nations and organizations to respond effectively to the HIV epidemic. This site contains information on program activities and initiatives worldwide, policies, publications, and related links.

● U.S. Agency for International Development (USAID): Office of Population, Health & Nutrition - USAID'S STRATEGY

USAID is a federal government agency that conducts foreign assistance and humanitarian aid to advance the political and economic interests of the USA. Its site includes information on the work of its Office of Population, Health & Nutrition, its extensive HIV/AIDS programmes, and its series of AIDS Briefs, publications that discuss ways in which HIV/AIDS can be integrated into projects, plans and programmes and so affect sectors and target groups. USAID has recently published the 1999 U.S. International Response to HIV/AIDS, which presents a global analysis and assessment of US foreign policy interests as well as a review of the objectives and accomplishments of the international strategy on HIV/AIDS developed in 1995.

● U.S. Census Bureau

This site is full of information on the AIDS epidemic, both domestic and international. The HIV/AIDS Surveillance Data Base is a compilation of information from studies appearing in the medical and scientific literature, presented at international conferences, and appearing in the press. The International Programs Center (IPC), part of the Population Division of the U.S. Bureau of the Census, conducts demographic and socioeconomic studies and strengthens statistical development around the world through technical assistance, training, and software products. It maintains the International Data Base (IDB), a computerized data bank containing statistical tables of demographic and socio-economic data for 227 countries and areas of the world. This site also contains The World Population Profile: 1998, which includes a special chapter focusing on HIV/AIDS in the developing world.

● World Bank: HIV/AIDS and the World Bank

The World Bank is one of the leading financiers of HIV/AIDS activities in the world. This site contains information on World Bank HIV/AIDS activities and links to other related sites.

● World Health Organization (WHO): Office of HIV/AIDS and Sexually Transmitted Diseases (ASD)

WHO, a specialized agency of the United Nations with headquarters in Geneva, was established in 1948. The services of the agency are both advisory and technical. This site contains AIDS-related statistics, as well as other useful information on the state of the epidemic worldwide.

● University of Zambia Medical Library

The University of Zambia Medical Library Guide to Medical Resources contains a special section on HIV/AIDS in Zambia. Included are a compilation of information from the Zambia Health Information Digest, and material gathered by scanning the web sites of Zambian newspapers. There is also a bibliography on HIV/AIDS in Zambia (current up to 1995), abstracts of recent research by students and faculty from the University of Zambia College of Medicine and the University Teaching Hospital, and links to other HIV/AIDS resources accessible on the Internet.

Back to first page

REFERENCES & BIBLIOGRAPHY


Ankrah, E.M. (1993). The impact of HIV/AIDS on the family and other significant relationships: the African clan revisited. AIDS Care, 5, 5-22, cited in Forehand et al., 1999.

Forehand, R., Pelton, J., Chance, M., Armistead, L. et al. (1999). AIDS Care, 11, 6, 715-722.

Foster, G., Shakespeare, R., Chinemana, F., Jackson, H., Gregson, S., Marange, C. & Mashumba, S. (1995). Orphan prevalence and extended family care in a peri-urban community in Zimbabwe. AIDS Care, 7, 3-17.

Foster, G., Makufa, C., Drew, R., Kambeu, S. & Saurombe, K. (1996). Supporting children in need through a community-based orphan visiting programme. AIDS Care, 8, 389-403.

Foster, G., Makufa, C., Drew, R. & Kralovec, E. (1997a). Factors leading to the establishment of child-headed households: the case of Zimbabwe. Health Transition Review, 7 (Suppl. 2), 157-170.

Foster, G., Makufa, C., Drew, R., Mashumba, S. & Kambeu, S. (1997b). Perceptions of children and community members concerning the circumstances of orphans in rural Zimbabwe. AIDS Care, 9, 4, 391-405.

Foster, G. (1997c). Orphans. AIDS Care, 9, 1, 82-87.

Foster, G. (1998). Today's children: challenges to child health promotion in countries with severe AIDS epidemics. AIDS Care, 10 (Suppl. 1), S17-S23.

Gregson, S., Garnett, G.P., Shakespeare, R., Foster, G. & Anderson, R. (1994). Determinants of the demographic impact of HIV-1 in sub-Saharan Africa: the effect of a shorter mean adult incubation period on trends in orphanhood. Health Transition Review, 4 (Suppl.), 65-92.

Hunter, S. (1992). Orphans and AIDS in Africa. Africa Notes (Cornell University Institute for African Development), April 1992, 5-7.

Kamali, A., Seeley, J.A., Nunn, A.J., Kengeya-Kayondo, J.F., Ruberantwari, A. & Mulder, D.W. (1996). The orphan problem: experience of a sub-Saharan Africa rural population in the AIDS epidemic.

Michaels, D. & Levine, C. (1992). Estimates of the number of motherless youth orphaned by AIDS in the United States. Journal of the American Medical Association, 268, 3456-3461, cited in Forehand et al., 1999.

Ntozi, J.P. (1997). Effect of AIDS on children: the problem of orphans in Uganda. Health Transition Review, 7 (Suppl.), 23-40.

Ntozi, J.P., Ahimbisibwe, F.E., Odwee, J.O., Ayiga, N. & Okurut, F.N. (1999). Orphan care: the role of the extended family in northern Uganda. In The Continuing HIV/AIDS Epidemic in Africa: Responses and Coping Strategies. I.O. Orubuloye, J. Caldwell & J.P. Ntozi, eds. Canberra: Health Transition Centre, pp. 225-236.

Ntozi, J.P. & Zirimenya, S. (1999). Changes in household composition and family structure during the AIDS epidemic in Uganda. In The Continuing HIV/AIDS Epidemic in Africa: Responses and Coping Strategies. I.O. Orubuloye, J. Caldwell & J.P. Ntozi, eds. Canberra: Health Transition Centre, pp. 193-209.

Sengendo, J. & Nambi, J. (1997). The psychological effect of orphanhood: a study of orphans in Rakai District. Health Transition Review, 7 (Suppl.), 105-124, cited in Forehand et al., 1999.

Southern Africa AIDS Information Dissemination Service (SAfAIDS) and Commercial Farmers Union (CFU). (1996). Orphans on Farms: Who Cares? An Exploratory Study into Foster Care for Orphaned Children on Commercial Farms in Zimbabwe. Harare, Zimbabwe: SAfAIDS & CFU.

U.S. Bureau of the Census. (1997). The demographic impacts of HIV/AIDS: perspectives from the world population profile: 1996. Washington, DC: International Programs Center, Population Division.

Joint United Nations Programme on HIV/AIDS (UNAIDS). (1998). Report on the Global HIV/AIDS Epidemic. Geneva, Switzerland: Joint United Nations Programme on HIV/AIDS.

Joint United Nations Programme on HIV/AIDS (UNAIDS). (1999). Children orphaned by AIDS: Front-line responses from eastern and southern Africa. Geneva, Switzerland: Joint United Nations Programme on HIV/AIDS. http://www.unaids.org/publications/documents/children/

Back to first page


Today's Vision of Public Health
The perception of public health has changed: we now worry about, and take care of, things that years ago were considered of secondary impact to human health. One of the most significant examples is public awareness of smoking and smoke-free environments, which is now higher than ever. Similar examples are the actively enforced laws against driving while intoxicated, and the requirements to wear a bike helmet while riding or a seat belt while driving.

All of this work in the prevention, protection and promotion of public health comes as a result of the good work of public health professionals. This is why April 1st through April 7th has been proclaimed National Public Health Week by President Bill Clinton and Governor Parris Glendening. The people who work in public health offices have a key role in providing innovative strategies for protecting and improving our health.

Public Health Mission Today

- Prevention, protection, and promotion of health in cases of epidemics, environmental hazards, injuries, disasters, etc.

Services provided by Public Health Offices

Monitor health status

Diagnose health problems

Inform and educate the public about health problems

Enforce laws that protect health

Research new, innovative solutions to health problems

PUBLIC HEALTH OFFICES

Their Major Duties

Public health offices serve as the focal point for leadership and coordination across the public health sciences. They provide direction to program offices and offer advice and counsel on public health issues. Below is more information on the activities of different public health offices. The offices presented here are typical of those that usually operate within public health departments.

Office of Disease Prevention and Health Promotion Mission:

To provide state/national leadership in improving the health of the population through prevention of premature death, disease, and
disability
Major Functions:

Establishing and managing state/national health promotion and disease prevention goals and objectives as a framework for
multisectoral action to improve health and reduce risks to health

Convening and coordinating Operating Divisions and Agencies of the Department, other Federal agencies, national nonprofit
voluntary and professional associations, State and local agencies and organizations to achieve measurable improvements in health
and to reduce risks to health

Providing a source of expertise on public health and science, especially in areas related to prevention, public health, and primary health care

Serving as a one-stop-shopping source for consumer health information resources within the public and private sectors

Defining significant areas of opportunity for innovative initiatives to improve the health of the public, involving multiple agencies,
and providing leadership to address those opportunities

Examples of Principal Areas of Activity:

Tracking progress toward the objectives of Healthy People 2000, disseminating information about the status of objectives and initiatives to achieve them to a broad constituency, and developing plans and revising the framework for a prevention agenda for the first decade of the 21st century

Coordinating nutrition policy, environmental health policy, primary and preventive health care initiatives, school health, worksite
health promotion, Healthy Cities/Healthy Communities, health information/communication resources, and public health functions on
behalf of the Operating Divisions that compose the Public Health Service and, as appropriate, the entire Department; and serving as
the principal coordinator/liaison with related work in other Federal departments and in the private sector

Reviewing and improving consumer health information resources available through the National Health Information Center, with
special emphasis on decentralized approaches that use community-level resources

Pursuing ground-breaking innovations in cost-effectiveness analysis methodology and use of advanced communication technology
for consumer health information

Office of Emergency Preparedness Mission:

To manage and coordinate Federal health, medical and health-related social service response and recovery for Federally declared disasters under the Federal Response Plan

Major Functions:

Coordination and delivery of Department-wide emergency preparedness activities, including continuity of government, continuity of operations, and emergency assistance during disasters and other emergencies

Coordination of the health and medical response of the Federal government, in support of State and local governments, in the aftermath of terrorist acts involving chemical and biologic agents

Direction and maintenance of the medical response component of the National Disaster Medical System, including the development and operational readiness capability of Disaster Medical Assistance Teams and other special teams that can be deployed as the primary medical response teams in case of disasters

Examples of Principal Areas of Activity:

Response to the medical and health-related needs of communities affected by hurricanes, flooding, fires, etc.

Assistance with recovery efforts in communities damaged by natural or manmade disasters

Strengthening the national capacity to respond to other natural or terrorist disasters requiring medical or health-related emergency services

Working with State and local governments to enhance the readiness of metropolitan area medical strike teams to respond to acts of terrorism

Development of plans for continuity of operations and continuity of government necessitated by the Departmental reorganization

Completion of the interdepartmental plan to respond to the health and medical consequences of nuclear, biologic and chemical terrorism

Response to the medical and health-related needs of victims of other natural disasters that occur during the year

Office of Population Affairs Mission:

To assist in making comprehensive voluntary family planning services readily available to all persons desiring such services

To coordinate domestic population and family planning research with the present and future needs of family planning programs

Major Functions:

Providing analysis, policy advice, and administrative leadership on topics related to family planning and population research, such as
reproductive health, teen pregnancy, unintended pregnancy, and contraceptive technology

Serving as the primary focal point within the government, acting as a clearinghouse of information, and providing liaison with other agencies in areas of domestic and international population and family planning research and programs

Principal Areas of Activity:

Administering new adolescent pregnancy prevention initiatives through new demonstrations, with the first national cross-site evaluation

Outreach efforts beyond family planning clinics to provide health education services for young people through community organizations

Technical assistance for Title X providers to function effectively in managed care environments

Establishment of a pilot demonstration project to employ and provide on-the-job training in allied health skills for young, low-income men recruited from communities served by Title X clinics

Analysis of the status of family planning nurse practitioner training and certification in relation to current trends in nursing education
and development of policy and program recommendations

Implementation of 076 guidelines for HIV screening and education at family planning service sites in conjunction with pregnancy
testing services
Collaboration with relevant PHS OpDivs on contraception development and utilization patterns

Office on Women's Health Mission:

To serve as the focal point for women's health activities within the U.S. Department of Health and Human Services

To improve the health of women across the lifespan by stimulating and coordinating crosscutting activities in women's health
research, health care service delivery, and public and health professional education and training within the Department of Health and
Human Services; and with other government agencies, public and private organizations, and consumer and health care professional
groups

To advise the Assistant Secretary for Health and serve as a critical Federal resource on scientific, medical, ethical, and policy issues
related to the advancement of women's health

Major Functions:

Stimulating and coordinating effective women's health programs and policies within communities and in partnership with other
agencies and public and private organizations at the national, state, and local levels

Promoting improved access for women to comprehensive, culturally appropriate disease prevention, health promotion, diagnosis, and
treatment services

Educating the public and health care professionals about women's health issues

Fostering the recruitment, retention, and promotion of women in health professions and in research careers

Principal Areas of Activity:

Implementation and coordination of the National Action Plan on Breast Cancer, a major public-private partnership to eradicate the
disease as a threat from the lives of American women

Stimulation of the transfer of imaging technologies from the defense, space, and intelligence communities to improve the early
detection of breast cancer

Coordination of a women's health communications strategy

Establishment of a National Women's Health Information Center with an "800" toll-free number and Internet capacity to disseminate
a broad range of women's health information developed across the agencies of the Federal government to consumers, health care
professionals, policymakers, and private sector organizations

Implementation of initiatives to improve health professionals' training in women's health, including collaboration in the development
of a model women's health curriculum for medical student education, residency, and post-residency fellowship training, and the
development of resource materials to promote careers in women's health

Continued implementation of an educational campaign for the public, health care professionals and policymakers -- Healthy Women 2000 -- focusing on ways to provide women with the tools and knowledge needed to lead long, healthy lives

Promotion of a Public Health Service regional women's health agenda by supporting regional women's health coordinators, programs, projects, conferences and other activities; and fostering collaborative women's health initiatives across the regions

Useful Links to Public Health Subjects

❍ PUBLIC HEALTH IN AMERICA


❍ World Health Organization
❍ About Work:Career Guide-Public Health Administrator
❍ General Public Health--HealthWeb
❍ Faculty of Public Health Medicine - Introduction to Public Health Medicine
❍ National Institutes of Health
❍ Epidemiology and Public Health
❍ The American Public Health Association(APHA)
❍ SPH Home page
❍ Basic Library Resources for Public Health: Health Services Administration
❍ Public Health forum of Women's Health
❍ Making a Powerful Connection:Health of the Public and the NII
❍ Conferences
❍ Yahoo! Health:Public Health and Safety:Organizations: Professional
❍ Priority Healthcare Wearside Health Promotion
❍ Welcome to Priority Healthcare Wearside Health Promotion
❍ Yahoo!-Health:Public Health and Safety: Institutes
❍ Information Network for Public Health Officials
❍ Women-S-Health
❍ Children-S-Health
❍ http://www.healthopedia.com/ [Healthopedia.com]
❍ http://health.allrefer.com/ [AllRefer Health Encyclopedia]
WORKSITE WELLNESS PROGRAMS: A HEALTH PROMOTION INITIATIVE
Claudia Nieves Velasquez
Division of Nutritional Sciences
Cornell University
Copyright © 1997, Claudia Nieves Velasquez

This page was created as part of a project for the course HSS 691 Program Evaluation and Research Design (Professor Bill Trochim) Spring
1997, Cornell University

Back to Bill Trochim's Knowledge Base Home Page

Back to Project Gallery page

Introduction:

Worksite health promotion refers to a systematic approach, endorsed by an organization, designed to enhance the health of its employees through initiatives based in the worksite as well as in the employee's community, clinic and home. Activities usually promote awareness, education, behavior and lifestyle change, and the creation of supportive environments. The ultimate goal of worksite health promotion is to create a culture that values and meets both individual and organizational needs for health improvement.

Interest in worksite health promotion and disease prevention has grown in the past decade, enhanced in part by the growing scientific evidence supporting the role of prevention in reducing premature death and disability. Employers are interested in the potential for worksite health promotion and disease prevention efforts to help contain health care costs while improving productivity and morale and reducing absenteeism and employee turnover (1992 National Survey of Worksite Health Promotion Activities).

The level of resources devoted to worksite health promotion initiatives has increased in both the public and private sectors. Several agencies, such as the Office of Disease Prevention and Health Promotion (ODPHP), the Centers for Disease Control and Prevention (CDC), and others, play an important role in stimulating and coordinating efforts to promote health and reduce the risk of disease.

This website is a comprehensive summary of the art and science of workplace health promotion programs, and provides important links to related information on the web.

INTRODUCTION/ WHY HEALTH PROMOTION AT THE WORKSITE?/ THE INITIATIVE: WORKSITE


WELLNESS PROGRAMS / WHAT IS A WORKSITE WELLNESS PROGRAM/ US TRENDS IN WORKSITE
WELLNESS PROGRAMS/ TO START YOUR OWN WWP / THANK YOU AND FEEDBACK

WHY HEALTH PROMOTION AT THE WORKSITE?

Health promotion includes environmental and social support for healthy behaviors and conditions, in addition to building awareness, knowledge, skills, and interpersonal support for personal behavior change. These programs hold the promise of reducing the burden of ill health, moderating medical care costs, and improving positive health in all its dimensions. Therefore, it is important to review what support and conditions can be provided by health promotion programs at the worksite. In many respects, worksites are opportune settings for delivering risk factor interventions because they provide ready access to working populations, the opportunity to promote environmental supports for behavior change, and natural structures for social support. In addition, health-related policies can be made at the organizational level to influence lifestyle changes (Glasgow et al., 1995).
North American worksites have been providing a variety of health-related programs and activities to workers, in response to various efforts to control health care costs as well as to provide worker benefits that have the potential to improve productivity (Weisbrod et al., 1991). Nationally, employers devote approximately 95 percent of health benefit expenses to treating illness. Overall health benefit expenses could be significantly decreased by shifting more resources to preventive care and health promotion efforts. Much of this activity has also been a response to the United States Public Health Service's Healthy People 2000 campaign and the formation of the 1990 National Goals and Objectives, which specifically set worksite objectives.

The evolution of the concept of health promotion in the worksite has been accompanied by an increase in research initiatives conducted to evaluate particular interventions that deliver appropriate risk factor information and achieve changes in health-related behaviors and lifestyle. Scientific publications and articles related to such studies can be found in the American Journal of Health Promotion, American Journal of Public Health, Journal of Community Health, and Journal of the American Medical Association (JAMA).

The California Wellness Foundation is one of many organizations that provide funding for research to improve the health and well-being of people and communities through health promotion and disease prevention programs. An example of this type of study is the current research on a Small Business Workplace Wellness Program conducted by the University of California, Irvine Health Promotion Center.

Visit the following links on the internet for more information on:
Why Health Promotion at the Worksite ?
How Does Worksite Health Promotion Contribute to an Organization’s Fiscal Health?

INTRODUCTION/ WHY HEALTH PROMOTION AT THE WORKSITE?/ THE INITIATIVE: WORKSITE


WELLNESS PROGRAMS / WHAT IS A WORKSITE WELLNESS PROGRAM/ US TRENDS IN WORKSITE
WELLNESS PROGRAMS/ TO START YOUR OWN WWP / THANK YOU AND FEEDBACK

The Initiative: Worksite Wellness Programs


The World Health Organization (WHO) formulated a definition of health in 1970 that has had a great effect on the medical model of health care. Health was defined as "a state of complete physical, mental, and social well being, not merely the absence of disease or infirmity" (Shillingford J. & Shillingford A., 1991). This definition paved the way for wellness.

Wellness is a constant and deliberate effort to stay healthy and achieve the highest potential for total well-being; it is what some call the latest appeal to Americans' belief in human perfectibility, but wellness is something one has to work at. The concept is based on the assumption that only individuals can make the lifestyle changes that improve their health. Even though lifestyle is determined by each individual, personal responses to change and behavior should be enabled by the community through the provision of facilities and education. Philosophically, the wellness movement could have had its roots in Greek civilization, for the concept includes wholeness of mind and body; in Greek culture, programs of physical education became a systematic part of overall education. The factors responsible for the emergence of the movement, however, came not from education but from corporate business (Goldsmith, 1986).

The remarkable growth of worksite health promotion and wellness programs in industrialized nations since the late 1970s has been influenced by four phenomena:

(1) changing demographic profiles of workplaces

(2) rising medical care costs and the costs of lost productivity among unhealthy workers

(3) recognition of the greater influence of behavioral and environmental change on health, and

(4) emerging evidence that health education and health promotion strategies have been effective in altering the behavioral and environmental precursors of health (Green and Kreuter, 1991).

Employers can offer health promotion programs at, or associated with, the workplace in the hope of maintaining or improving the various dimensions of health, for both humanitarian and business reasons. These reasons have to do with worker availability and cost, as well as with altruistic or paternalistic interest in employee well-being. With increasing competitive pressure from abroad, employer interest in methods to improve creativity and productivity has increased as well. Interest in prevention on the part of employers has risen rapidly, as mentioned before.

INTRODUCTION/ WHY HEALTH PROMOTION AT THE WORKSITE?/ THE INITIATIVE: WORKSITE


WELLNESS PROGRAMS / WHAT IS A WORKSITE WELLNESS PROGRAM/ US TRENDS IN WORKSITE
WELLNESS PROGRAMS/ TO START YOUR OWN WWP / THANK YOU AND FEEDBACK
What is a Worksite Wellness Program?

Worksite Wellness Programs are systematic and integrated programs that include educational, motivational and practical activities in the areas of:

PHYSICAL FITNESS
HEALTH EDUCATION
NUTRITION
CARDIOVASCULAR RISK REDUCTION
CANCER PROTECTION
SMOKING CESSATION
STRESS MANAGEMENT
SPIRITUALITY
SUBSTANCE-ABUSE CONTROL
SEXUALITY
INDUSTRIAL SAFETY

IF YOU BROWSE THROUGH SOME OF THE LINKS MENTIONED ON THIS PAGE, YOU MAY FIND THAT THEY IN TURN DIRECT YOU TO INTERNET SITES THAT HAVE INFORMATION ON EACH OF THE AREAS COVERED IN WORKSITE HEALTH PROMOTION PROGRAMS.

The programs aim to change employees' lifestyles and reduce disease risks. The way different companies go about encouraging these changes varies in detail, but almost all programs contain the elements mentioned above. Different strategies for the promotion of health may be used depending on the number of employees and the budget and facilities of each worksite.
Most programs include individually focused strategies, environmental issues, and organizational and community efforts; some even go further and coordinate activities with community programs and resources, especially when communities do not have adequate disease prevention programs.

INTRODUCTION/ WHY HEALTH PROMOTION AT THE WORKSITE?/ THE INITIATIVE: WORKSITE


WELLNESS PROGRAMS / WHAT IS A WORKSITE WELLNESS PROGRAM/ US TRENDS IN WORKSITE
WELLNESS PROGRAMS/ TO START YOUR OWN WWP / THANK YOU AND FEEDBACK

US TRENDS IN WORKSITE
WELLNESS PROGRAMS ...

The Office of Disease Prevention and Health Promotion (ODPHP) funded a national survey in 1985 to assess the level of integration of health promotion activities in private worksites with 50 or more employees. In 1992, ODPHP commissioned a second national survey to quantify and characterize evolving trends in the nature and extent of worksite health promotion programs since the 1985 study and, in some cases, to establish baseline data points for national health objectives.

The 1992 National Survey of Worksite Health Promotion Activities describes the characteristics of worksite health promotion activities in the private sector and compares them across industries and worksites of different sizes; it also describes aspects of worksite health promotion administration, evaluation and benefits. Overall, the 1992 survey shows an increase in worksite health promotion activities since 1985, particularly notable in the areas of nutrition, weight control, physical fitness, high blood pressure and stress management. A greater proportion of worksites in the services industry and the transportation/communications/utilities industry offer health status/health risk questionnaires and screenings, as well as related information or activities, than worksites in other industries.
INTRODUCTION/ WHY HEALTH PROMOTION AT THE WORKSITE?/ THE INITIATIVE: WORKSITE
WELLNESS PROGRAMS / WHAT IS A WORKSITE WELLNESS PROGRAM/ US TRENDS IN WORKSITE
WELLNESS PROGRAMS/ TO START YOUR OWN WWP / THANK YOU AND FEEDBACK

TO START YOUR OWN WWP:

If you or your company are thinking of starting a WWP, keep on reading ...

Before an organization starts the design process of a worksite wellness program, it should prepare by answering four basic
questions:

1. How ready is it to develop a health promotion program?
2. Is it setting realistic goals for the program?
3. How participative a process does it want to follow in designing the program?
4. How extensive a design process does it wish to follow?

Organizations that know the answers to these questions ahead of time, or that are planning a very simple program, do not
need an extensive design process; those that work with external vendors can sometimes rely on the vendors' expertise to
initiate the design.

CLICK HERE for some good guidelines, developed by the Center for Health Promotion at the University of California at
Irvine, to follow when starting your worksite wellness program.

The following chart gives several links to private and public agencies that provide important information and
services that may help you create what you need:

PUBLIC INSTITUTIONS
US Department of Public Health Services
ODPHP
UCI Health Promotion Research Center
National Center for Chronic Disease Prevention

PRIVATE COMPANIES
Pacific Care Industries
Therafit Network
Association of Worksite Health Promotion (online)
Wellness Council of America

Click Here for....

OUTSTANDING WORKSITE WELLNESS PROGRAMS!

The Wellness Council of West Virginia follows Worksite Wellness Programs that have been successful in the
USA and presents them with a special award. These companies have well-developed programs that can serve as
examples for future initiatives. Look for them!

Also take a look at the Wellness Council of America's list of Healthiest Companies.

THANK YOU
HOPE YOU HAD A GREAT TIME VISITING THIS WEB PAGE

NOW YOU ARE READY TO START WORKING ON YOUR HEALTH; CLICK ON THE IMAGE FOR SOME INTERESTING
WELLNESS RECOMMENDATIONS!

Click on the mailbox to send e-mail ...

Back to Main Page


(1) Changing demographic profiles in most workplaces:

Health promotion tends to be organized around worksites for adults because that's where the people are. Worksites are
to many adults what schools are to children and youth: places where most of the daylight hours are spent, where friendships
are made, where many of the rewards that make one feel worthy are dealt, and where one can be reinforced by peers; it is also a place
where one feels pressure to perform and deliver. The workplace has replaced the neighborhood as the community of reference and
social identity for many urban and suburban North Americans and Europeans. These demographic and social trends, combined with
the pervasive influence of occupational environments on adult health, quality of life, behavior, and lifestyle, make worksites logical if not
ideal settings for health promotion programs.

BACK TO WHY THE INITIATIVE: WORKSITE WELLNESS PROGRAMS

(2) Containing Health Care Costs:

The burden of increases in health-care costs has fallen more heavily on industry than on
individuals or governments. In 1985, employers in the USA paid $400 billion, or approximately 80% of private
insurance premiums, which accounted for approximately one quarter of all the personal health care services in the country. The
Health Net page on Worksite Wellness Programs presents numerical data on the rising costs of health care in the US.

BACK TO WHY THE INITIATIVE: WORKSITE WELLNESS PROGRAMS


(3) Recognizing the relationships between behavior, the environment,
and health:

The trend in adoption of health promotion programs by employers parallels the progressive steps in stages-
of-change theory. In the early 1980s, most employers were in either a pre-contemplation or a not-ready-
for-action stage. The earlier adopters (CEOs who, because of health problems, were true believers in a
healthy lifestyle) usually installed health promotion programs for personal more than for economic
reasons. During the 1980s, employers began to initiate these programs based on a growing awareness of their potential health and
economic benefits. Through repeated exposures to health messages, the general public, including employers, began to see the
relevance of health promotion.

Many companies have found relief in Worksite Wellness Programs, which reduce employees' health risks and lessen their need for
health care. Employers also promote these types of programs because they enhance morale, increase employee retention, and
boost a company's image among workers and the community (Chenoweth, 1991). Employee wellness programs are initiated for
different reasons (maybe the CEO just had a heart attack!), but whatever the reason, wellness programs are worth a company's
investment.

BACK TO WHY THE INITIATIVE: WORKSITE WELLNESS PROGRAMS


Self-Efficacy
in Health Related Behavior Change

Rosemary H. Walkley

Division of Nutritional Sciences

Cornell University

The purpose of this web page is to enable you to link with research on self-efficacy in health related behavior change, to find a
book on the topic, or to find other information on self-efficacy. If you believe you can do it, you can, so follow me for a tour. If you
don't believe you can do it, you really need to follow me to learn more about self-efficacy. This document is divided into four parts:

What is self-efficacy?

Research on Self-Efficacy and Health Related Behavior Change

Books on Self-Efficacy

More General Information on Self-Efficacy

Choose your link and come with me, or scroll down to read the whole document.

Return to Trochim Home Page


Return to Project Gallery Home Page

Return to Title Page

What is self-efficacy?

The Random House Dictionary of the English Language (1987) defines efficacy as "the capacity for producing a desired
result or effect" and self as "a person ... referred to with respect to complete individuality." Self-efficacy, then, can be defined as "the
perception or judgement of one's ability to perform a certain action successfully or to control one's circumstances," according to the
Survey of Social Science (Psychology Series, F. N. Magill, ed., Salem Press, Englewood Cliffs, N.J., 1993).

Albert Bandura defined self-efficacy as a judgement of one's capability to accomplish a certain level of performance. He stated that
studies have shown that "perceived self-efficacy is a significant determinant of performance that operates partially independently of
underlying skills".

The Social Cognitive Theory, as outlined by Bandura, is based on a triad including both environmental and internal forces: behavior,
cognitive and other personal factors, and environmental factors. These factors influence each other. Bandura calls this triadic
reciprocal determinism, where reciprocal refers to the mutual action between the factors and determinism refers to the production of
effects by certain factors. The strength of the reciprocity between factors can vary by person and situation and can take place over time.
Because of the complexity of human behavior, thought processes, and environments, it is not possible to study every interaction at the
same time. By studying subsystems, researchers can come to understand how the factors within them interact. Self-efficacy is one of
those cognitive and other personal factors.

For more information on self-efficacy and the Social Cognitive Theory, visit Lynn Ann Rampoldi Hnilo's paper on The
Hierarchy of Self-Efficacy and the Development of an Internet Self-Efficacy Scale or Self-Efficacy and Health Behaviors by Ralf
Schwarzer and Reinhard Fuchs.

Return to Trochim Home Page

Return to Project Gallery Home Page


Return to Title Page

Research on Self-Efficacy and Health Related Behavior Change.

The Naval Health Research Center's mission is to support fleet operational readiness through research, development, testing, and
evaluation on the biomedical and psychological aspects of Navy and Marine Corps personnel health and performance. You can find
an abstract of an article called "Efficacy of Health Promotion Videotapes in the U.S. Navy: A Lesson for Health Educators,"
published in the Journal of Health Education. This study evaluated six health promotion tapes in terms of changes in knowledge, self-
efficacy, behavioral intentions, and self-report of behavior.

Also presented is the abstract of an article, "The Importance of Self-Efficacy Expectations in Elderly Patients Recovering from Coronary
Artery Bypass Surgery" by D. L. Carroll, published in Heart & Lung, January-February 1995;24(1):50-59. This
article is in Women's Health Weekly, which includes other journal articles and news stories. This is a subscription newsletter that
you can reach through Newsfile.

The Ad Referendum site allows you to browse the contents lists of journals on diabetes. For instance, the May/June 1996 issue
of the Diabetes Educator contained an article called "Self-efficacy theory as a framework for community pharmacy-based diabetes
education programs" by J. A. Johnson. In the January/February 1995 issue of the same journal there was an article on "Self-Efficacy
and Confidence in Outcomes as Determinants of Self-Care Practices in Inner-City, African-American Women with Non-Insulin-
Dependent Diabetes" by A. H. Skelly, J. R. Marshall, B. P. Haughey, et al. You need to go to a hard copy of the journal to actually read
the articles.

One project of NERI is "Strategies to Promote Adaptive Self-Conceptions for Enhanced Physical Activity". The overall
goal of this project is to adapt cognitive restructuring strategies, developed initially by Dr. Margie Lachman for memory training, to
improve the exercise training of older persons and to address older persons' fear of falling. The researchers hope to enhance self-efficacy
and sense of control for use in exercise training and in an intervention to reduce fear of falling. New England Research Institutes, Inc. is
a small business devoted to social epidemiologic and public health research, the results of which can inform patient care and health
care policy.

Modeling Health Behavior Change: The Health Action Process Approach (HAPA) by Ralf Schwarzer is a brief article
describing a new health behavior model, the Health Action Process Approach (HAPA). Its basic notion is that the adoption,
initiation, and maintenance of health behaviours must be explicitly conceived as a process that consists of at least a motivation phase
and a volition phase. Self-efficacy plays a crucial role.

Health World archives articles of interest. According to them “HealthWorld Online IS the most comprehensive global health
network on the Internet – but we are so much more. HealthWorld is the only Internet health network that integrates both natural and
conventional health information into a synergistic whole. HealthWorld ...opens the door to a new approach: Self-Managed Care™.” I
have never seen the journals they archive but you can visit their Home Page . One article in their Medical Self-care Archives is The
Anatomy of Empowerment which talks about Kate Lorig’s research in self-efficacy’s role in coping with arthritis.

Return to Trochim Home Page

Return to Project Gallery Home Page

Return to Title Page

Books on Self-Efficacy

Self-Efficacy and Health Behaviours is a chapter of a book written by Ralf Schwarzer & Reinhard Fuchs from Freie
Universität Berlin. The chapter is to appear in: Conner, M., & Norman, P. (1995). Predicting Health Behaviour: Research and
Practice with Social Cognition Models. Buckingham: Open University Press. This is quite a long chapter with information on
Social Cognitive Theory and much more. It also has a long reference list at the end. You can find out more about Ralf Schwarzer, who
is Professor of Psychology at the Freie Universität Berlin, Germany, and Visiting Professor at York University, Toronto, Canada.

Cambridge University Press offers the book Self-Efficacy in Changing Societies which is edited by Albert Bandura. The
book analyzes the diverse ways in which beliefs of personal efficacy operate within a network of sociocultural influences to shape
life paths. The chapters, by internationally known experts, cover such concepts as infancy and personal agency, competency through
the life span, the role of family, and cross-cultural factors. The volume addresses important issues of human adaptation and change
that will be of considerable interest to people in the fields of developmental psychology, education, health and sociology.

Hogrefe & Huber Publishers advertise the book Power Therapy, Maximizing Health Through Self-Efficacy, by M. Aleksiuk
from the Department of Psychology, at the University of Alberta, Edmonton, Canada. This book explains a unique set of new
techniques which can be used as both defenses and therapies with regard to our most common psychological problems. Using
concepts from evolutionary psychology, the author explains: (1) Why having a clear sense of control over our own lives is vitally
important for our mental health. (2) What completely natural mechanisms are available for us to achieve this result. (3) How we can
recover a sense of control if we happen for whatever reason to have lost it.
Working With Self-Esteem in Psychotherapy is a book on the WEB by Nathaniel Branden, Ph.D. He includes self-efficacy
as one of the two components of self-esteem, along with self-respect. He goes on to explain more about self-respect and how to help
patients increase their self-esteem.

Return to Trochim Home Page

Return to Project Gallery Home Page

Return to Title Page

More General Information on Self-Efficacy

The Generalized Self-Efficacy Scale is available at this site. According to the authors, "it is a 10-item psychometric scale that
is designed to assess optimistic self-beliefs to cope with a variety of difficult demands in life. The scale has been originally developed
in German by Matthias Jerusalem and Ralf Schwarzer in the early 80ies and has been used in many studies with thousands of
participants. In contrast to other scales that were designed to assess dispositional optimism, this one explicitly refers to personal
agency, i.e., the belief that one's actions are responsible for successful outcomes. The scale is now available in 20 language versions.”
These can be downloaded from the Web by selecting the language you are interested in and pressing Enter. Not all languages appeared
when I tried.
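
As an aside for readers who want to work with such scales, here is a minimal, hypothetical Python sketch of how a summed Likert scale like this one is commonly scored (each of the 10 items answered on a 1-4 scale and the responses summed, giving totals from 10 to 40); the function name and the example answers are invented for illustration and are not part of the scale authors' materials.

    # Illustrative sketch only: scoring a 10-item summed Likert scale such as
    # the Generalized Self-Efficacy Scale. The 1-4 response format and the
    # sum-scoring rule are assumptions stated in the text above.

    def score_summed_likert(responses, n_items=10, valid=(1, 2, 3, 4)):
        """Return the total scale score (the sum of the item responses)."""
        if len(responses) != n_items:
            raise ValueError("Expected responses to all %d items." % n_items)
        if any(r not in valid for r in responses):
            raise ValueError("Each response must be one of %s." % (valid,))
        return sum(responses)

    # Example: one (made-up) respondent's answers to the 10 items.
    answers = [3, 4, 2, 3, 3, 4, 3, 2, 3, 4]
    print(score_summed_likert(answers))  # prints 31; possible totals run from 10 to 40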

Cambridge University Press offers Self-Efficacy in Changing Societies, edited by Albert Bandura and described above under
Books on Self-Efficacy.

You can find a Self-Efficacy Reference List from the University of Houston.

Other WEB sites contain information on self-efficacy in other areas besides health-related behavior change. For example, try
the following sites:
Smoking -Predictors of Smoking Abstinence Self-Efficacy.

Using computers - The Development of the Computer User Self-Efficacy Scale and The Hierarchy of Self-Efficacy
and the Development of an Internet Self-Efficacy Scale

Self-efficacy of students - Self-Efficacy In College Students and The Development of the Health Student Self-
Efficacy Scale

Return to Trochim Home Page

Return to Project Gallery Home Page

Return to Title Page

Copyright © 1997, Rosemary Walkley, All Rights Reserved


MANAGED CARE

By:

Anne Blakeslee

Cornell University

BIBLIOGRAPHY

WEB LINKS:

Intro to Managed Care (InfoXpress: Under Construction)
Medicaid Managed Care (InfoXpress: Under Construction)
Managed Care Research (InfoXpress: Under Construction)
Managed Care Terminology (InfoXpress: Under Construction)

RELATED WEB SITES:

Mental Health Net (see Mental Health Administration)
Policy Resource Center, Inc.
Health Insurance Weekly
More Great Managed Care Resources
Managed Care Bibliography

Aluise, John J., Konrad, Thomas R., and Buckner, Bates. Impact of IPAs on fee-for-service medical groups. Health Care Manage
Review, 1989, 14 (1), 55-63.

American Nurse. The official publication of the American Nurses Association, June 1995. Managed care: does the promise meet the
potential?

Appelbaum, Paul S. Legal Liability and Managed Care. American Psychologist, March 1993. p. 251-257.

Aronow, David B. Management information systems: Their role in the marketing activities of HMOs. Health Care Manage Review,
1988, 13 (4), 59-64.

Boggs, H. Glenn. Health maintenance organizations: improvements in the regulatory environment. Health Care Manage Review,
1986, 11 (2), 56-59.

Boland, P. (1991). Making managed healthcare work: A practical guide to strategies and solutions. New York: McGraw Hill.

Brennan, Caron Primas, R.N., B.S.N. Managed care and health information networks. Journal of Health Care Finance 1995; 21 (4):
1-5.

Brown, Montague and McCool, Barbara P. Vertical integration: exploration of a popular strategic concept. Health Care Manage
Review, 1986, 11 (4), 7-19.

Caper, Philip. Managed competition that works. The Journal of the American Medical Association v. 269 (May 19, 1993) p. 2524-6.

Casalino, Larry. An interview with David Himmelstein and Steffie Woolhandler. Socialist Review. p. 135-153.

Clancy, Carolyn M., Brody, Howard. Managed care: Jekyll or Hyde? Journal of American Medical Association, v. 273 (Jan. 25,
1995) p. 338-9.

Conrad, Douglas A. and Dowling, William L. Vertical integration in health services: Theory and managerial implications. Health
Care Manage Review, 1990, 15 (4), 9-22.

Cummings, Nicholas A., Impact of managed care on employment and training: a primer for survival. Professional Psychology,
Research and Practice v. 26 (Feb. 1995) p. 10-15.

DeMuro, Paul R. The financial manager's guide to managed care & integrated delivery systems: strategies for contracting,
compensation & reimbursement. Conference Board, 1994.

Dolinsky, Arthur L., and Caputo, Richard K. An assessment of employers' experiences with HMOs: Factors that make a difference.
Health Manage Review, 1991, 16 (1), 25-31.
Doner, Kathy S., Rosner, Fred, La Puma, John. Managed care: ethical issues. Journal of American Medical Association v. 274
(August 23/30, 1995) p. 609-11.

EBRI Issue Brief. August 1992. p. 543-571.

Feldman, Roger, Kralewski, John, Shapiro, Janet and Chan, and Hung-Ching. Contracts between hospitals and health maintenance
organizations. Health Care Manage Review, 1990, 15 (1), 47-60.

Feldman, S. (Ed.). (1991). Managed mental health services. Springfield, IL: Charles B. Thomas.

Freeborn, Donald K. Promise and performance in managed care: the prepaid group practice model. Johns Hopkins University Press,
1994.

Garnick, D. W., et al. (1990). Services and charges by PPO physicians for PPO and indemnity patients: An episode of care
comparison. Medical Care. 28. p. 894-906.

Ghouldice, R. G. (1988). Antitrust and managed care. Medical Group Management. 35 (4). p. 12-13, 33.

Goldman, Robert. Managed service restructuring in health care: a strategic approach in a competitive environment. Haworth Press,
1995.

Goldstein, D. E., & McKell, D. C. (1990). Medical staff alliances: How to build successful partnerships with your physicians.
Chicago: American Hospital Publishing.

Guay, Albert H. Understanding managed care. Journal of the American Dental Association v. 126 (April 1995) p. 425-33.

Hanson, Russell L. Health-care reform, managed competition, and subnational politics. Publius v. 24 (Summer 1994) p. 49-68.

Higgins, C. Wayne and Meyers Eugene D. The economic transformation of American health insurance: implications for the hospital
industry. Health Care Manage Review, 1986, 11 (4), 21-27.

Hillman, A. L., Pauly, M. V., & Kerstein, J. J. (1989). How do financial incentives affect physicians' clinical decisions and the
financial performance of health maintenance organizations? New England Journal of Medicine. 321 (2). p. 86-92.

Hospital & Health Networks, September 20, 1995. Heavy managed care means slower rise in hospital costs.

Iglehart, John K. Health policy report: the struggle between managed care and fee-for-service practice. The New England Journal of
Medicine v. 331 (July 7, 1994) p. 63-7.

Iglehart, John K. Physicians and the growth of managed care. The New England Journal of Medicine v. 331 (Oct. 27, 1994) p. 1167-
71.

John, Joby and Miaoulis, George. A model for understanding benefit segmentation in preventive health care. Health Manage Review,
1992, 17 (2), 21-32.

Kassirer, Jerome P. Managed care and the morality of the marketplace. The New England Journal of Medicine v. 333 (July 6, 1995)
p. 50-2.
Kralewski, John E., Feldman, Roger, Dowd, Bryan, and Shapiro, Janet. Strategies employed by HMOs to achieve hospital discounts:
A case study of seven HMOs. Health Care Management Review, 1991, 16 (1), 9-16.

Mays, Huey L., Katzoff, Jerald, Rivo, Marc L. Managed health care: implications for the physician workforce and medical
education. Journal of American Medical Association v. 274 (Sept. 6, 1995) p. 712-15.

McCullough, Laurence B. Should we create a health care system in the United States? The Journal of Medicine and Philosophy v. 19
(Oct. 1994) p. 483-90.

McCurren, J. Kevin. Factors for success: Capitated primary physicians in Medicare HMOs. Health Care Manage Review, 1991, 16
(2), 49-53.

Melnick, Glenn A., Zwanziger, Jack and Verity-Guerra, Alicia. The growth and effects of hospital selective contracting. Health Care
Manage Review, 1989, 14 (3), 57-64.

Miller, Robert H., Luft, Harold S. Estimating health expenditure growth under managed competition: science, simulations and
scenarios. Journal of American Medical Association, v. 273 (Feb. 22, 1995) p. 656-62.

Morrison, Richard D., Rimler, George W. The ethical impacts of managed care. Journal of Business Ethics v. 12 (June 1993) p. 493-
501.

Pauly, M. V., Hillman, A. L., & Kerstein, J. (1990). Managing physician incentives in managed care: The role of for-profit ownership.
Medical Care. 28. p. 1013-1026.

Reid, Richard A., Fulcher, Julie H., and Smith, Howard L. Hospital-health care plan affiliations: considerations for strategy design.
Health Care Manage Review, 1986, 11 (4), 53-61.

Rodwin, Marc A. Conflicts in managed care. The New England Journal of Medicine, v. 332 (Mar. 2, 1995) p. 604-607.

Rohrer, James E. The secret of medical management. Health Care Manage Review, 14 (3), 7- 13.

Sederer, Lloyd I., M.D., and St. Clair, R. Lawrence, M.D. Managed care and the Massachusetts Experience. American Journal of
Psychiatry 146:9, September 1989. p. 1142-1148.

Shimshak, Daniel G., DeFuria, Maureen C., and Getson, Jacob. Controlling disenrollment in health maintenance organizations.
Health Manage Review, 1988, 13 (1), 47-55.

Spragins, Ellyn. Beware your HMO. Newsweek October 23, 1995. p. 54-56.

Studin, Ira. Strategic health care management: applying the lessons of today's top management experts to the business of managed
care. Irwin Professional Publishers, 1995.

Vladeck, Bruce C. Managed care and quality. Journal of American Medical Association v. 273 (May 17, 1995) p. 1483.

Waitzkin, Howard. The strange career of managed competition: from military failure to medical success? American Journal of
Public Health v. 84 (March 1994) p. 482-89.
Weintraub, Michael I., Helsel, Eugene V., Marino, Joseph T. Physicians and managed care. The New England Journal of Medicine
v. 332 (April 27, 1995) p. 1173-4.

Wells, K. B., Manning, W. G., Jr., & Valdez, R. B. (1990). The effects of a prepaid group practice on mental health outcomes. Health
Services Research. 25 (4). p. 615-625.

Widra, Linda S. and Fottler, Myron D. Determinants of HMO success: The case of Complete Health. Health Care Manage Review,
1992, 17 (2), 33-44.

Wolford, G. Rodney, Brown, Montague, and McCool, Barbara P. Getting to go in managed care. Health Care Manage Review, 1993,
18 (1), 7-19.

Wolinsky, Howard. Ethics in managed care. Lancet (North American Edition) v. 345 (June 10, 1995) p. 1499.

Wrightson, W., Jr. (1990). HMO rate setting & financial strategy. Ann Arbor, MI: Health Administration Press.

Zelman, William N. and McLaughlin, Curtis P. Product lines in a complex marketplace: Matching organizational strategy to buyer
behavior. Health Care Manage Review, 1990, 15 (2), 9-14.

Return to Managed Care Home Page


MANAGED CARE

Probably the most dramatic realignment of the nation's health care system in recent years has been the development of managed care
plans. Managed Care is a generic term that has evolved over the past few years to encompass a variety of forms of prepaid and
managed fee-for-service health care. It is also often referred to as "Managed Competition." The number of managed health care
enrollees has increased dramatically over the past 20 years. Total enrollment in HMOs is over 40 million, and PPO enrollment is about
the same. Estimates of managed care enrollees run as high as 100 million or more. To explain this growth, we need to look at the
progression of corporate practice in America.

The 20th century in the United States witnessed the transformation of society from rural to urban, from an individual to institutional
domination, from an agricultural to a manufacturing economy, and from self-employment to employee status in increasingly larger
corporations.

During the same period, medical practice made the transition from generalist to specialist, from solo to group practice, from direct
payment for health care to group insurance, and from a predominantly cottage industry to increasing emphasis on the corporate
management of medical care.

In commenting on the development of corporate management, Peter Drucker states:

During the last fifty years, society in every developed country has become a society of institutions. Every major social task, whether economic
performance or health care, education or the protection of the environment, the pursuit of new knowledge or defense, is today being entrusted to
big organizations, designed for perpetuity and managed by their own managements.

He goes on to warn that we are increasingly dependent on these institutions. His comments are particularly relevant to managed
health care plans as we approach the 21st century.

In the United States until the early 20th Century, physicians in private practice almost always billed patients directly on a fee-for-
service basis.

Physicians introduced pre-paid group practice into the American medical care system during the second quarter of the 20th Century,
which then offered them a choice between two systems of patient care. However, early growth of managed medical care delivery was
relatively slow.

The corporate practice of medicine in the traditional health care system, whether investor owned or not-for-profit, was advanced by
action of the federal government through the Medicare and Medicaid laws of 1965.

Medicare and Medicaid legislation prompted the development of investor-owned hospital chains and stimulated the growth of
university medical centers -- both of which furthered the corporate practice of medicine by increasing the number of management
personnel and physicians employed by hospitals and medical schools.

Clearly, it was the new health financing laws for the elderly and the poor that laid the groundwork for increased corporate
continuance of medical care delivery by third-party payers through government mandated regulation of fee-for-service and indemnity
payments for health services.

After years of unchecked health care inflation, third-party payers were authorized by government to impose additional
corporate cost controls on hospitals, physicians, and patients, such as DRGs, prospective pricing, and a resource-based relative value
scale.

Further federal support for the corporate practice of medicine resulted from the passage of the 1973 HMO Act, which encouraged
expansion through government grants, contracts, and loans.

After the passage of the HMO Act, strong support for the HMO concept came from business, the executive, legislative and judicial
branches of government and managed care proliferated in several states, such as California.

Bipartisan support for managed care was based on the concept that HMOs can decrease costs and encourage free-market
competition in the medical care arena, limited only by government intervention.

Perhaps one measure of success is the virtual disappearance of some 17 national health insurance bills introduced into Congress in
the early 1970s.

Managed care is strictly an outgrowth of the private sector, dating back some 60 years. The year 1929 was a signal year for
medical care organizations in the United States, when it witnessed the establishment of a rural farmers' cooperative health plan by Dr.
Michael Shadid in Elk City, Oklahoma -- a community of about 6,000 without any medical specialists. Despite opposition from the
county medical society, Shadid formed a lay organization composed of leading farmers; they sold shares at $50 each to raise money
for a new hospital and then provided each shareholder with medical care at a discount rate.

That same year, two California physicians (Donald Ross and H. Clifford Loos) in Los Angeles entered into a prepaid contract to
provide comprehensive health services to about 2000 water company employees. These two plans were the beginnings of managed
care now serving over 40% of the American public.

Several prepaid plans started between 1930 and 1960. In 1954, a variant of the prepaid plan appeared, the IPA (Individual Practice
Association). A relative value fee schedule for guaranteeing payment was adopted, all grievances were heard by a voluntary board of
physicians, and a sincere attempt was made to monitor the quality of care.

Since then, the IPA/HMO has grown much faster than either the group practice or the staff model HMO. These models led to
widespread dissemination of managed care plans. Under managed care programs the fundamental incentive structure of most fee-for-
service medicine is dramatically altered to encourage greater control over the use and costs of health care services.

A major factor in the overall success of HMOs was the willingness of physicians to accept financial risk. If the HMO physician
incurred expenses exceeding budgeted costs, part or all of the shortfall would be absorbed by the physician. On the other hand,
any excess of revenue over expenses could be shared by the physician. In addition, enrollees achieved considerable savings in health
premiums -- mainly by reducing the number of unnecessary hospital admissions and long stays.
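
To make the risk-sharing arithmetic above concrete, here is a small, purely hypothetical Python sketch; the panel size, the budgeted per-member-per-month amount, the actual costs, and the 50/50 sharing rate are all invented figures, not data from the text.

    # Hypothetical illustration of HMO physician risk sharing against a budget.
    # Every number here (panel size, per-member-per-month budget, actual costs,
    # and the sharing rate) is an assumption made up for this example.

    PANEL_SIZE = 1000          # members assigned to the physician
    BUDGET_PMPM = 40.0         # budgeted cost per member per month
    ACTUAL_COSTS = 510000.0    # actual annual cost of care for the panel
    SHARE_RATE = 0.5           # fraction of any surplus/shortfall the physician bears

    annual_budget = PANEL_SIZE * BUDGET_PMPM * 12   # 480,000 in this example
    variance = annual_budget - ACTUAL_COSTS          # negative means overspent

    if variance >= 0:
        print("Surplus of %.0f; physician's share: %.0f" % (variance, variance * SHARE_RATE))
    else:
        print("Shortfall of %.0f; physician absorbs: %.0f" % (-variance, -variance * SHARE_RATE))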

By combining coverage for outpatient and inpatient care in a single premium, HMOs were also clearly able to reduce hospital
utilization. The question is whether this reduces overall cost or simply shifts some services, appropriately, to a less costly ambulatory
setting.

HMO physicians share the risk of overutilization of medical care whether they work on a contract basis (group model), on salary
(staff model), on capitation (some IPAs, networks, and direct contract capitated models), or on fee-for-service (some IPAs and direct
contract fee-for-service models). They are also subject to corporate influences, such as those arising from mergers, budget cuts, and
diversification, as well as to having their services marketed to the public. In actuality, managed care is consigned to institutional
sponsorship.

Clearly, managed care has become a continuum, with a number of plan types offering an array of features that vary in their abilities
to balance access to care, cost, quality control, benefit design, and flexibility. There is not one single definition of the term managed
care that has endured in the past or will survive in the future.

Return to Managed Care Home Page

Other General Managed Care Sites:

● Managed Health Care Today & Chiropractic Online Today: Managed Health Care -- An Introduction
● Prevention and Managed Care: Opportunities For Managed Care Organizations, Purchasers of Health Care, and Public
Health Agencies
● University Consortium For Clinical Research: UCCR: Nutrition Support For Managed Care
● Your Money & Your Life: America's Managed Care Revolution (Facts About Managed Care)
● NewsPage
● Your Money & Your Life: Do Managed Care and Competition Really Save Money?
● Chironet: Defining Managed Care: "The Devil You Know"
● Catholic Health Association of Wisconsin: Ethical Considerations in Managed Care
InfoXpress

This page is under construction. Please return at a later date.

Return to Managed Care Home Page


Medicaid Managed Care

Medicaid is operated by the states but uses both federal and state funds to support it. Medicaid began as a fee-for-service program in
1966 and is overseen by state agencies and the Health Care Financing Administration (HCFA). It has incurred considerable
cost inflation over the past couple of decades. In fact, Medicaid expenditures are often cited as the fastest growing segment of
state budgets. Medicaid payment systems have been slow to adapt to recent health care cost-containment methods whose
effectiveness seems to have been demonstrated in the private sector. However, managed care approaches for Medicaid recipients
have often been encouraged as a means of controlling costs. A few states enrolled Medicaid recipients in managed care programs as
early as the 1970s, with mixed results. However, the conversion of Medicaid to a managed care model of service delivery has grown
rapidly in the 1990s. In fact, it has been estimated that Medicaid Managed Care participants have nearly quadrupled since 1990 -- to
more than 10 million today.

This change in Medicaid raises many important policy and management questions. And even though there is some experience
with Medicaid Managed Care models, important recent developments make this scrutiny all the more necessary.

Specifically, many states have begun mandatory enrollment of certain Medicaid populations. This may be complicated by the fact
that many HMOs providing services to the Medicaid population have had little or no prior experience serving this market.
Additionally, some providers and patients may be adversely affected, which means the ethics of Medicaid Managed Care, too, must
be scrutinized.

Return to Managed Care Home Page

Other Medicaid Managed Care related sites:

● University of Chicago / Center for Health Administration Studies: Medicaid Enrollees In HMO's

● Texas Department of Health / Bureau of Managed Care: Expansion of Medicaid Managed Care in Texas

● Henry J. Kaiser Family Foundation: The Kaiser Commission on The Future of Medicaid

● National Cancer Institute: How Might Managed Care Affect the Indigent?
Managed Care Research

Research is to see what everybody has seen and to think what nobody has thought.

-- Albert Szent-Györgyi

In any domain, research plays an important role. But in health care, research is vital. It is responsible for the way in which
discoveries are made, new ideas are perpetuated or discarded, events are controlled or predicted, and theory is developed or refined.
The amount of social and behavioral research in health care has increased dramatically over the last couple of decades, accompanying
major shifts in the nature of health and illness as well as in the management of health problems. Today researchers are looking much
more closely at the relationship of risk to disease and at methods for prevention. Clearly, preventing or managing disease and the
individual's behavior and lifestyle are now recognized as critical factors.

Managed care has taken an active role in these changes. Therefore, considerable emphasis is being given by researchers to looking at
managed care as it operates within our health care system as a whole. Managed care has a dual impact within our health care system
-- it affects health care both directly and indirectly. Managed care affects health care directly in terms of the type of care provided,
the amount of care given, the quality of care, and the access to care. Indirectly, it affects health care costs. Clearly managed care is
ripe for research and a continued emphasis will be placed in this area.

Managed care will continue to be the target of evaluation research -- the systematic application of social research procedures for
assessing the design, implementation, and utility of managed care programs in the United States. This is imperative, as policy
decisions will be made with respect to funding, planning, and so on.

Return to Managed Care Home Page

Other Managed Care Research Links:

● Medscape: Managed Care
● Sterling Healthcare Outcomes
● THE Best Place to Find Research Information (Trochim's Knowledge Base)

InfoXpress

Managed Care Terminology

Capitation - A set payment per member for a defined scope of services, regardless of the volume of services actually used. It implies
that the plans bear the costs of inefficiency in the production and delivery of services and reap the rewards of efficiency.

Case management - Also referred to as Large Case Management. A method of managing the provision of health care to members
with catastrophic or high cost medical conditions. The goal is to coordinate the care so as to both improve continuity and quality of
care as well as lower costs. This generally is a dedicated function in the utilization management department.

Churning - The practice of a provider seeing a patient more often than is medically necessary, primarily to increase revenue through
an increased number of services. Churning may also apply to any performance-based reimbursement system where there is a heavy
emphasis on productivity (in other words, rewarding a provider for seeing a high volume of patients whether through fee-for-service
or through an appraisal system that pays a bonus for productivity).

Closed panel - A managed care plan that contracts with physicians on an exclusive basis for services and does not allow those
physicians to see patients for another managed care organization. Examples include staff and group model HMOs. Could apply to a
large private medical group that contracts with an HMO.

Community rating - The rating methodology required of federally qualified HMOs and of HMOs under the laws of many states, and
occasionally indemnity plans under certain circumstances. The HMO must obtain the same amount of money per member for all
members in the plan. Community rating does allow for variability by allowing the HMO to factor in differences for age, sex, mix
(average contract size), and industry factors; not all factors are necessarily allowed under state laws.
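
By way of illustration only, the following Python sketch shows the basic arithmetic of a community rate with a single class adjustment; the cost projection, membership count, and age/sex factor are invented numbers, and real rating methodologies are considerably more detailed.

    # Hypothetical illustration of community rating with a class adjustment
    # ("community rating by class"). All figures are invented for this example.

    projected_annual_costs = 24000000.0   # projected costs for the whole book of business
    total_members = 10000                 # all members across all groups

    # Base community rate, expressed per member per month (PMPM).
    community_rate = projected_annual_costs / total_members / 12
    print("Base community rate: %.2f PMPM" % community_rate)      # 200.00

    # A group's rate may then be adjusted by allowed class factors (age, sex,
    # average contract size, industry), applied the same way to every group.
    age_sex_factor = 1.10   # this hypothetical group is older than the book average
    print("Adjusted group rate: %.2f PMPM" % (community_rate * age_sex_factor))  # 220.00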

Coordinated care - The federal government's term for managed care. Presumably a "kinder and gentler" way of saying it.

Direct contracting - A term describing a provider of integrated health care. A delivery system contracting directly with employers
rather than through an insurance company or managed care organization. A superficially attractive option that occasionally works
when the employer is large enough. Not to be confused with direct contract model.

Direct contract model - A managed care health plan that contracts directly with private practice physicians in the community, rather
than through an intermediary such as an IPA or medical group. A common type of model in open panel HMOs.

Disenrollment - The process of termination of coverage. Voluntary termination would include a member quitting because he or she
simply wants out. Involuntary termination would include leaving the plan because of changing jobs. A rare and serious form of
involuntary disenrollment is when the plan terminates a member's coverage against the member's will. This is usually only allowed
(under state and federal laws) for gross offenses such as fraud, abuse, nonpayment of premium or copayments, or a demonstrated
inability to comply with recommended treatment plans.

EPO - Exclusive provider organization. An EPO is similar to an HMO in that it often uses primary physicians as gatekeepers, often
capitates providers, has a limited provider panel, and uses an authorization system, etc.. It is referred to as exclusive because the
member must remain within the network to receive benefits. The main difference is that EPOs are generally regulated under
insurance statutes rather than HMO regulations. Not allowed in many states that maintain that EPOs are really HMOs.

Gatekeeper - An informal, though widely used term that refers to a primary care case management model health plan. In this model,
all care from providers other than the primary care physician, except for true emergencies, must be authorized by the primary care
physician before care is rendered. This is a predominant feature of almost all HMOs.
Group model - An HMO that contracts with a medical group for the provision of health care services. The relationship between the
HMO and the medical group is generally very close, although there are wide variations in the relative independence of the group
from the HMO. A form of closed panel health plan.

HMO - Health Maintenance Organization - The definition of an HMO has changed substantially. Originally, an HMO was defined
as a prepaid organization that provided health care to voluntarily enrolled members in return for a preset amount of money on a per-
member per-month basis. With the increase in self-insured business, or with financial arrangements that do not rely on prepayment,
that definition is no longer accurate. Now the definition needs to encompass two possibilities: a health plan that utilizes primary care
physicians as gatekeepers (although there are some HMOs that do not).

IDS - Integrated Delivery System; also referred to as an Integrated Health Care Delivery System. An organized system of health
care providers spanning a broad range of health care services. See: IPA, PHO, MSO, Equity model, Staff model, Foundation model.

IPA - Independent practice association. An organization that has a contract with a managed care plan to deliver services in return for
a single capitation rate. The IPA in turn contracts with individual providers to provide the services either on a capitation basis or on a
fee-for-service basis.

Managed health care - A regrettably nebulous term. At the very least, it is a system of health care delivery that tries to manage the
cost of health care, the quality of that health care, and access to that care. Common denominators include a panel of contracted
providers that is less than the entire universe of available providers, some type of limitations on benefits to subscribers who use
noncontracted providers (unless authorized to do so), and some type of authorization system. Managed health care is actually a
spectrum of systems, ranging from so-called managed indemnity, through PPOs, POS, open panel HMOs, and closed panel HMOs.

Open panel - A managed care plan that contracts (either directly or indirectly) with private physicians to deliver care in their own
offices. Examples would include a direct contract HMO and IPA.

POS - Point of service. A plan where members do not have to choose how to receive services until they need them. The most
common use of the term applies to a plan that enrolls each member in both an HMO (HMO-like) system and an indemnity plan.
Occasionally referred to as an HMO swing-out plan, an out-of-plan benefits rider to an HMO, or a primary care PPO. These plans
provide a difference in benefits (eg, 100% coverage rather than 70%) depending on whether the member chooses to use the plan
(including its providers and in compliance with authorization system) or go outside the plan for services. Dual choice refers to an
HMO-like plan with an indemnity plan, and triple choice refers to the addition of a PPO to the dual choice. An archaic but still valid
definition applies to a simple PPO, where members receive coverage at a greater level if they use preferred providers (albeit without
a gatekeeper system) than if they choose not to do so.
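
As a rough illustration of the benefit differential mentioned in this definition (e.g., 100% coverage in plan versus 70% out of plan), here is a small hypothetical Python calculation; the 1,200 allowed charge and the two coverage rates are assumptions for the example, and real plans add deductibles, copayments, and allowed-amount rules.

    # Hypothetical out-of-pocket comparison under a point-of-service (POS) plan.
    # The 100% in-plan and 70% out-of-plan coverage rates echo the example in the
    # definition above; the 1,200 charge is an invented figure.

    def member_share(allowed_charge, coverage_rate):
        """Member's share of an allowed charge at a given coverage rate."""
        return allowed_charge * (1.0 - coverage_rate)

    charge = 1200.0
    print(member_share(charge, 1.00))  # stays in plan: 0.0
    print(member_share(charge, 0.70))  # goes out of plan: 360.0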

PPA - Preferred provider arrangement. Same as a PPO but sometimes used to refer to a somewhat looser type of plan in which the
payer (ie, the employer) makes the arrangement rather than the providers.

PPO - Preferred provider organization. A plan that contracts with independent providers at a discount for services. The panel is
limited in size and usually has some type of utilization review system associated with it. A PPO may be risk bearing, like an
insurance company, or may be nonrisk bearing, like a physician sponsored PPO that markets itself to insurance companies or self-
insured companies via an access fee.

PPS - Prospective payment system. A generic term applied to a reimbursement system that pays prospectively rather than on the
basis of charges. Generally is used only to refer to hospital reimbursement and applied only to DRGs, but it may encompass other
methodologies as well.

Return to Managed Care Home Page


Other Glossary links:

● Colorado Health Net
● Cybergate: All About Managed Care and the New Medicine
● Your Money & Your Life: What Is Managed Care? (A Glossary)
MANAGED CARE: GENERAL INFORMATION

JUST THE FACTS, MA'AM. NOTHING BUT THE FACTS, MA'AM

Well, as you encounter the wide array of alleged facts contained in these WEB pages, you begin to comprehend how bewildering this
rapidly growing entity, managed care, is. The old joke, "Soon you'll need a Ph.D. to be a sanitation engineer!", is rapidly becoming non-
comedic in its applicability to the level of knowledge needed to understand the many intricacies of managed care.

The traditional fee-for-service health care system has become a financial siphon in the pockets of third-party insurers, a burden that
has been passed with alacrity on to the consumers of health care. To compensate for the escalating costs of health care today, the
private and public sectors have turned to cost-cutting strategists who dangle the proposal of cost-containment practices in harmonious
coupling with quality health care.

● Cost Cutting Ideas In Healthcare

● TV Program Transcript: Your Money & Your Life

● Managed Healthcare: Quality Ratings

The array of questions and dissonant answers sparked by the cost-containment stratagems is indicative of the disjuncture in managed
care produced by efforts to attain this ideal.

● Will The Very Best Doctor Treat Me And My Family?

● Is My Doctor Trying To Save Me, Or Save Money?

● Quality Ratings

● Must I Fight To Get Routine Care From My HMO?

● My Doctor Is Being Pushed Around By An HMO---Should I Care?

● How Well Will My HMO Take Care Of My Chronic Disease?

● Can Medicare HMO Adequately Care For The Elderly And Save Money?

● Do Managed Care And Competition Really Save Money?

● Does Managed Care Provide Higher Quality Care Than Fee-For-Service Medicine?
● Is Managed Care Good For My Patients?

● Does Managed Care Provide The Quality Of Nursing Care?

● What Is The Impact Of Managed Care On Patients In Hospitals?

● Do Members Have Adequate Rights Inside HMOs?

● Should The Government Encourage Americans To Enroll In Medicare Managed Care?


The proliferation of managed care glossaries with less than uniform definitions of managed care terms and concepts attests to the
kind and degree of confusion that awaits the unwary shopper for health care services.

● What Is Managed Care?

● Managed Care Glossary

● Glossary Of Managed Care And Health Insurance Terms

● How Do HMOs work?

● How Do PPOs work?

● Definition Of Terms: As Used In The Managed Healthcare Industry Today

Into this mix looms the specter of a proliferation of legislative and civic healthcare reform efforts.

● Healthcare Reform And Legislation

● The Health Policy Page

● Proposed law: California Health Security Act

MANAGED CARE CONTROVERSY: THE DEBATE HEATS UP


When entering into debate, the question of where to initially engage your opponent, usually referred to as your "learned colleague",
becomes strategically important. The same is true for the individual who reports the debate issues and the methods employed
in the collegial sparring. This is especially so for social scientists committed to accurate reconstruction of their observations.

As a budding social scientist who shares that same commitment, and who is heavily influenced by the mixed-methods approach to
researching social phenomena, the effort to achieve a functional balance of the "I" present in my inquiry process is a constant
endeavor. As a health care professional, and in service of this effort, I have decided to start my tour of the W.W.W.
discussions of the controversy surrounding health care delivery in a managed care environment cautiously.

Charles A. Simpson, D.C., in his paper Defining Managed Care, warns the medical community to tread cautiously when
demonizing managed care as a whole. He reminds us that what constitutes managed care often depends on whom you are talking to,
and that the differences are mostly those of emphasis.

The debate issues seem to be centered principally around the following topics.

Access to care

● HSR 10-Year Index: Access To Care

Capitation

● Capitation: An Open Forum

● Capitation: Forum Questions & Answers

● Topic: Public Health Education

● Topic: The Real Answer To Capitation

● Topic: Capitation Not Affect Decision Making?

● Topic: Health Care Reform From The Consumer's Point Of View

Cost

● The Cost Of Medical Care: Public Expectations

Denial of care

● Managed Care vs. The Gold Standard: An Editorial

An area of medicine that has had a relatively low profile is now gaining nationwide attention, as the activities of bioethicists are
treated to the type of microscopic investigation that has mushroomed in the wake of criticism about the potential (and some
would say actual) conflict of interest physicians face in managed care when their fee is tied to incentives to contain the cost of
providing quality health care. Plugging into the On-Line Debate gives you an opportunity to receive and send
information to an interested and motivated forum (most of the participants are physicians, a few of whom present some novel
thoughts and innovative strategies for cost containment).

Ethics

● Center For Clinical Ethics and Humanities in Health Care

● Topic: What Can Society Do To Reduce The Public's Demands And Unreasonable
Expectations In Medical Care? What Are The Ethics Involved?

● Topic: What Can We Do? ... A Response

● Ethics Of Managed Care

Liability

● Topic: Ultimate Liability Issue Of Being In A Capitated Arrangement

● Australian Health Outcomes Clearing House

Physician Incentives

● Topic: Incentive System For Physicians

● Topic: Doctors Are Financially Rewarded For Keeping A Lid On The Number Of Tests

● Topic: Susan Dorr Gould's 12-Point List Of Incentives Principles

The central organizations in managed care are Health Maintenance Organizations (HMOs), which broker health care services between
providers and consumers. Increasingly, HMOs are seizing every opportunity to explain and justify their organizational
approach to health care delivery. With the most to lose, the huge HMOs are in the vanguard of efforts to assuage negative public
opinion. Several sites on the WEB read like solicitous advertisements, but contain useful information about the issues in managed care:

Role of HMOs

● Humana, Inc.

● How Your HMO Could Hurt You


Role of Nurses

● Pt Advocacy Role Of Nurses Evolving In Managed Health Care

Physicians/Specialists

● Topic: Issues Specialty Physicians Are Facing

● Topic: Physicians Have An Opportunity To Gain Control Back. Risk?

● American Medical Specialty Organization, Inc.

Some see managed care as giving rise to unusual opportunity.

● Topic: Physicians have An Opportunity To Gain Control Back. Risk?

Others see managed care as a vehicle to drive forth their commercial interests. The numbers and kinds of companies that provide
support services to the managed care industry are only surprising because so many have sprung up so recently. The support services
include:

Informatics

● Succeed With Patient-Centered Outcomes And Velocity

● Health Care And The NII

● Databases

MANAGED CARE RESEARCH: WHERE WE ARE & WHERE WE ARE GOING

The health care consumer (i.e., the patient) is rhetorically centered in the designs for improving health care delivery in a managed
care environment. Pragmatically, the patient is pushed aside in favor of the real central issue of cost-effectiveness and cost-containment
strategies. However, if managed care is to be a survivable system of health care delivery, the patient must gain actual prominence in
the assessment, design, and utilization of health care services. Providers of health care services survive at the will of the consumer (an
oft-ignored fact), and, consequently, providers need to begin systematic inquiry into sustainable methods for delivering health care
that meet the needs of the consumer -- that is, providers need contextual understanding of the meaning of health and disease for the
consumer and of how these meanings influence health choices.

The categories of research into the delivery of managed health care services must broaden past the constrictive and inadequate
emphasis on cost-related phenomena. Michael S. Lundy, M.D., in recognition of the importance documentation has for responsive
clinical care, has cogently detailed some major issues and concerns for outcomes research in an environment of managed care. Dr.
Lundy presents this information in his paper, The Computer-Based Patient Record [CPR], Managed Care and the Fate of
Clinical Outcomes Research (Abstract).

In this work, he also presents strong advocacy for meaningful patient (consumer) involvement in the defining of outcomes measures
and in the construction of the CPR for use as a clinical-management and a research tool. A caveat to the indiscriminate embrace of
automated data collection capability is given by Donald Berwick, M.D. in his article "Quality Comes Home", discussed by David A.
Mackoul, M.D. in Medical Grant Takes Aim at Managed Care, Shouldn't You?

The pharmaceutical industry has special and focused concerns about the issues related to pharmacological treatment efficacy in the
environment of Managed Pharmaceuticals. It has responded to these concerns through Outcomes Research

● Outcomes: Demonstrating The Value Of Our Medicines Through Outcomes Research

● Managed Pharmaceutical Care---Expanding Our Leadership Role In Managing Drug Care


And Patient Outcomes

● site name

Most commonly, the types of pharmacoeconomic evaluations funded by the drug companies are cost-effectiveness related

● Pharmacoeconomic Evaluations

Public expenditures for health care have escalated alarmingly. Consequent to this is the increasing enrollment of Medicaid and
Medicare recipients in Health Maintenance Organizations. This innovative approach to providing health care to recipients of public
aid has been added to the debate agenda: advocates argue that managed care provides higher quality care; critics argue that
managed care restricts access to care. Opportunistic "bilking" of Medicare and Medicaid funds continues apace with the switching
process, and consequently, impact studies are becoming more numerous. For example, Claire Kohrman, James Hughes, and Ronald
Andersen cite the results of a study they undertook to look at the impact of HMO enrollment on obstetrical care in their paper,

Medicaid Enrollees in HMO's: A Comparative Analysis of Perinatal Outcomes for Mothers and
Newborns in a Large Chicago HMO

Several other WEB sites address different aspects of the research that is occurring, pending, or needed in managed care:
Nutrition

● Nutrition Support For Managed Care


University-based

● UAB Research

The Association for Health Services Research (HSR) publishes a journal that covers all aspects of health care delivery. Their scope is
comprehensive, and the WEB pages contain HSR 10-year indexes by subject: Access to Care

HSR 10-Year Index: Access To Care

Utilization State

Articles

● Related Articles

● Key Articles

● Publications Of HCA Faculty Members

Books

● Books

● The Health Policy Institute

● Center For Clinical Ethics And Humanities In Health Care

● Guide To Organizations With Interest In Health Care Issues

Research Series

Research series

ON-LINE RESOURCES

● Your Money & Your Life


● Organizations and Government
● Federal Online Databases
Women in Agriculture in Sub-Saharan Africa
By Marieme Lo

Content
Introduction
The role of women in agriculture
Obstacles to women’s empowerment
Women’s empowerment strategies
Favorite links
Bibliography

Introduction
Women’s role in agriculture and food security is critical in sub-Saharan Africa.
However, many studies point to the lack of visibility of their participation and
contribution in agriculture and in development generally. The impediments to women's
empowerment encompass their lack of access to decision-making processes, their low
participation in local governance, and their limited access to technology inputs
and credit. Land tenure is another stumbling block to women’s full access to and
control of land and agricultural output. Although many projects endeavor to
address rural women’s needs, their empowerment should go beyond the efficiency-driven,
functionalist approach that values only their productive and reproductive roles. It is
a matter of equity to empower women in a key sector where they are the major
contributors to household and community subsistence and to food security.

Back to top

The role of women in agriculture


In Sub-Saharan Africa, studies have shown that women play a crucial role in many
aspects of crop production. The gender division of labor is clearly defined according
to cash or subsistence agriculture. While men are often responsible for land clearing,
burning and ploughing, women specialize in weeding, transplanting, post-harvest work
and, in some areas, land preparation. Both take part in seeding and harvesting.
The division of labor and the male farming system in shifting cultivation confine women to
subsistence food production and men to cash crop cultivation, which contributes to
the economic disempowerment of women. Moreover, Sahelian women in particular
play a major role in household animal-production enterprises. They tend to have the
primary responsibility for the husbandry of small animals and ruminants, and they also
take care of large animal systems - herding, providing water and feed, cleaning stalls
and milking. In all types of animal production systems, women have a predominant
role in processing, particularly of milk products, and are commonly responsible for
marketing.
The Sahel is a zone of contrasts, and women assume different roles in food
production, often as staple-crop traders in the market. In peri-urban areas, women
control the market, especially the vegetable market, and contribute significantly to the
informal sector, the booming and most vibrant economic sector. Greater access to
economic power and resources provides women with income, a fall-back position
and bargaining power in the household, especially when women are the shock absorbers
for whole communities in situations of crisis.

A FAO report confirms that in Sub- Saharan countries women provide:


“70% of the agricultural workers
60-80% of the labor to produce food for household consumption and sale
100% of the processing for basic foodstuffs
90% of household water and fuel wood
80% of food storage and transport from farm to village
90% of the hoeing and weeding work
60% of the harvesting and marketing activities” [1]

Disaggregated data on the gender division of labor confirm men's and women's different
roles in food and cash crop production. They also reveal men's and women's differential
managerial and financial control over production, storage and marketing of agricultural
products, as well as their unequal access to land, credit, and productivity-enhancing
inputs. Data selection and interpretation, if not informed by gender awareness,
do little justice to women’s critical contribution to agricultural subsistence,
especially when “subsistence production does not represent a large share of GDP in
monetary terms.” [2]
Back to top

Obstacles to women’s empowerment


Despite their role as the backbone of food production and provision for family
consumption in the Sahel, women have limited access to critical resources, inputs
and support services. Their access is further limited by cultural, traditional
and sociological factors and by the gender division of labor that confines women to
subsistence food-crop cultivation and men to cash crop production.

Another lens for understanding rural women's predicaments is the household, seen not only
as a unit of conflict, subordination and negotiation, but also as a manifestation of the
deep-seated inequality that is embedded in, and interpreted as culturally determined
within, a set of socially constructed roles, rights and acceptance. “Households/
families are recognizably constituted of multiple actors, with varying (often
conflicting) preferences and interests, and differential abilities to pursue and realize
those interests. They are arenas of (albeit not the determinants of) consumption,
production and investment, within which both labor and resource allocation are
made.” [3]

Women have little access to the benefits of research and innovation, especially in the
domain of food crops, which - in spite of ensuring food security at the household and
community level - have a low priority in crop improvement research. Beyond their
demographic representation, they constitute an important constituency for research.

A World Bank report, “Rural Women in the Sahel and their Access to Agricultural
Extension” (Sector Study), assessed the impact of the extension program on women in
five countries. Using surveys, interviews and country reports, the authors infer
that “women's access to technological inputs such as improved seeds, fertilisers and
pesticides is limited.” Their findings pinpoint that “women are not frequently
reached by extension services and are rarely members of co-operatives, which often
distribute government subsidized inputs to small farmers.”
Their analysis also pinpoints some of the impediments to productivity, among them the
lack of access to appropriate technology: “women's low labor productivity arises
from, among other things, difficulties in obtaining the water and fuel necessary for
many value-adding and income-generating activities, and from the lack of appropriate
and affordable tools, equipment and technologies to save labor and conserve produce.”
Back to top

Women’s empowerment strategies


Data selection and interpretation, if not informed by gender awareness or gender
analysis, do little justice to women’s critical contribution to agricultural subsistence,
especially when “subsistence production does not represent a large share of GDP in
monetary terms.” [4]

Rural appraisal, participatory needs assessment and issue mapping should take
women's representation into account and bear in mind potential gender biases in order
to obtain a true picture of the existing conditions and context of rural settings. A real
effort of excavation should be made in research and project implementation to identify
and address women’s gendered needs.
Women have little access to the benefits of research and innovation, especially in the
domain of food crops, which - in spite of ensuring food security at the household and
community level - have a low priority in crop improvement research. Beyond their
demographic representation, they constitute an important constituency for research,
yet they have not been reached extensively by the research output that could improve
their conditions.

Research on appropriate technologies, microcredit and markets should be tailored to and
take into account women's specific needs. Women's farming systems and needs are often
ignored when devising technology intervention strategies. Although gender
mainstreaming and analysis have made significant headway in project design and
implementation, gender mainstreaming often remains little more than a concept.

Strategies to address the impediments to women’s empowerment at the societal and
economic levels should go beyond the welfare, poverty-alleviation and efficiency models,
and beyond the WID and GAD theory-laden debates. The power of decision and the choice
of alternatives should be entrusted to rural women so that they can make the decisions
regarding their self-fulfilment and realisation, within their complex socio-cultural
environment and determinants.

Back to top

Favorite links
http://www.fao.org/sd/default.htm
http://www.fao.org/wfs/resource/english/arc96-4.htm
http://www.worldbank.org/afr/findings/english/find51.htm
http://www.un.org/womenwatch/
http://www.fao.org/sd/fsdirect/fbdirect/fsp001.htm
http://www.thp.org/prize/99/prospectus.html
http://www.web.apc.org/~econews/
http://www.unifem.undp.org/s&tech6.htm
http://genderstats.worldbank.org/menu.asp
http://www.worldbank.org/gender/know/researchcover.htm
http://www.worldbank.org/gender/know/projc.htm
http://www.worldbank.org/gender/tensteps.htm
http://www.sidint.org/publications/development/vol41/no4/41-4f.htm
http://www.worldbank.org/gender/know/checkag.htm
http://www.fao.org/WAICENT/FAOINFO/SUSTDEV/WPdirect/WPan0002.htm
http://wbln0018.worldbank.org/essd/essd.nsf/gender/home

Back to top
Bibliography
Picard, Teresa. 1995. “Listening to and Learning from African Women Farmers.” In
Valentine Udoh James (ed.), Women and Sustainable Development in Africa, p. 35.
United Nations. 1995. The World’s Women: Trends and Statistics. New York: UN Social
Statistics and Indicators.
Harcourt, Wendy (ed.). 1994. Feminist Perspectives on Sustainable Development.
London and New Jersey: Zed Books.
Miller, C. and Razavi, S. 1998. Gender Analysis: Alternative Paradigms. UNDP, Gender in
Development. April 1998.
Agarwal, B. 1997. Feminist Economics, Vol. 3, No. 1 (Spring).
World Bank. 1995. Rural Women in the Sahel and their Access to Agricultural
Extension. Sector Study: Overview of Five Country Studies. June 1995.
http://www.worldbank.org/gender/know/eswafr.htm
Kabeer, Naila. 1994. Reversed Realities. London and New York: Verso, p. 291.
World Bank. 1995. Findings No. 46: Rural Women in the Sahel and Their Access to
Agricultural Extension - Overview of Five Country Studies. August 1995.
http://www.worldbank.org/afr/findings/english/find51.htm
James, Valentine U. (ed.). 1995. Women and Sustainable Development in Africa.
Westport, Connecticut: Praeger Publishers.
Sanders, J., Nagy, J. and Ramaswamy, S. 1990. “Developing New Agricultural Technologies
for the Sahelian Countries: the Burkina Faso Case,” EDCC: 1-22.
Jahan, Rounaq. 1995. The Elusive Agenda: Mainstreaming Women in Development.
London: Zed Books Ltd.
Jackson, C. and Pearson, R. 1998. Feminist Visions of Development: Gender Analysis
and Policy. London: Routledge.
Back to top

Last revised: 4/17/00

Contact: Marieme Lo

E-mail address: ML242@cornell.edu

[1]
FAO Report 1995
[2]
United Nations. The World’s Women. Trends and Statistics. New York. United Nations.
Social Statistics and Indicators. 1995 p.114.
[3]
Agarwal, B. Feminist Economics, Vol. 3, No. 1 (Spring 1997), p. 3.
[4]
United Nations. The World’s Women. Trends and Statistics. New York. United Nations.
Social Statistics and Indicators. 1995 p.114.
BEEKEEPING AND SUSTAINABLE DEVELOPMENT

BY: Jael Ojwaya


Beekeeping is fun. Bees sting, but honey is sweet. Welcome... Let us go honey hunting.

What is beekeeping?

Beekeeping refers to the art of keeping bees either for pleasure or for commercial purposes. This may sound rather silly, but it is what draws the distinction between
beekeeping as a hobby and beekeeping as a business. My interest is in commercial beekeeping as a means to sustainable development. Hence sustainable beekeeping
seeks to address the importance of beekeeping in terms of its ecological, social and economic benefits. In the ecological dimension, bees are pollinators
that help increase crop yields. The economic benefits lie in bee products such as honey, royal jelly, propolis, bee pollen and beeswax, which are highly valuable and
command high market prices, and most importantly in honey as a food of high nutritional value. In communities where beekeeping is done for commercial
purposes, it has led to self-reliance through the creation of local industries associated with the production of beekeeping equipment and bee products.

Beekeeping Methods
Broadly speaking, beekeeping methods can be divided into two categories: traditional and modern. The former is mainly practiced in societies in
which beekeeping is done on a small scale and, in most cases, as a hobby. Although this definition may not capture who traditional beekeepers are in real life,
one distinction between the two groups is the equipment used. For complete information on traditional and modern beekeeping equipment click here.

Importance of beekeeping
Whenever I disclose to some of my friends that my research interest is beekeeping and honey production, I can tell from the look on their faces that they think this is the
craziest choice I have ever made. Some questions I have been asked include what on earth I intend to do with bees (rather than the honey), aren't bees
dangerous, and where will you ever get a job as a beekeeper? This reaction is not unusual, as beekeeping has been a marginalized activity within most
developing countries. The benefits associated with beekeeping remain a mystery to many who have not ventured into the field. Beekeeping for
grassroots socio-economic development is one option available to developing countries as a means of meeting the local needs of their people, yet this
area has hardly been exploited. This paper points out strategies and methods of beekeeping that traditional beekeepers in Africa and other developing countries need to
adopt in order to improve the quality of beekeeping and honey production. Some experiences in modern beekeeping in the developed countries are referred to. We

all learn from past experiences! And you? Click here.

Need for beekeeping in Africa.


Due to economic marginalization and the frequent failure of past development interventions, the need for local communities in rural Africa to secure economic survival
has encouraged a focus on Indigenous Knowledge (IK) and self-reliance strategies. Where IK and local self-reliance strategies exist and, in some instances, where they
have been supported by external agencies such as WIPO, WTO, FAO and UNCED, to name but a few, these often constitute the only means of survival for local
communities. Governments and international organizations have recently seen the need to revive local self-reliance and to restructure regional development along
more indigenous and self-reliant lines. Beekeeping is an ideal method for achieving such goals. It is an ecologically and technically appropriate form of income
generation for communities in most economically and environmentally poor areas of Africa. Its role in promoting economic self-reliance, and the need to enhance this
role, were identified in the Banjul Bee Declaration of 1991 (Bees for Development 1991).

Beekeeping and self reliance

Although beekeeping can only rarely become the sole source of income and livelihood for people in the Third World, its role as a source of supplementary earnings,
food, and employment should not be underestimated. Key points in the argument that beekeeping is a key element in promoting rural self-reliance are that:

Beekeeping promotes rural diversification and hence is an alternative source of income and employment, particularly in areas where arable land is restricted
and demographic growth is resulting in insufficiently profitable land holdings.

Beekeeping is an activity that can successfully be adopted by women in many parts of the continent.

Beekeeping allows for a degree of risk avoidance by providing a reliable, high value product that enables rural farmers to survive in times of economic crisis.

Beekeeping is a low cost, sustainable undertaking with a low environmental impact.

For more information on Honey bees and beekeeping CLICK ON the links below.

How to begin beekeeping Commercial beekeeping sites Beekeeping and development


Bees and pollination Africanized Bees Modern beekeeping methods
Research on beekeeping Beekeeping journals Beekeeping in Africa
Beekeeping projects worldwide FAQ's for Beekeepers Directory for Beekeeping

For more information and comments on this web page send e-mail to jao23@cornell.edu
Participation in International Development

Shuzo Katsumoto

This page is designed to provide those with little or no knowledge of international development with a brief guide to World Wide
Web resources associated with participation in the field of international development. It aims to help viewers locate sites of interest
for further information.

Contents

● Participation as the Key Word


❍ What is participation in international development?

❍ Why is participation important?

● Methods for Participatory Projects


● Resources on Participatory Development
❍ Multilateral Organizations: WB, IDB, ADB, UNDP, UNEP, UNICEF

❍ Governmental Organizations: USAID, JICA, CIDA, GTZ

❍ Non-Governmental Organizations: CARE, Save the Children, Oxfam

Participation as the Key Word

Clearly, "participation" has been given increasing attention in the international
development arena in recent years. Until recently, local residents who were the intended beneficiaries of development projects were
often marginalized within the projects themselves. Professional development workers have now become aware of the significance of
local participation.
What is participation in international development?

Participation is a process through which stakeholders influence and share control over development initiatives and
the decisions and resources which affect them.

This is the definition of "participation" given in the Final Report of the Participation Learning Group within the World Bank. To interpret
this more casually, participation in the context of international development means involving the people with the greatest stake
in a development project. Ideally, people take part in a project at every phase of the project cycle; these phases are,
roughly, project identification, implementation, and evaluation. Local people, who have the greatest interest in projects which affect
them, are those who know the real needs, the appropriate size and form of projects, and effective ways to give feedback. Therefore it is
necessary for local residents to be involved as major actors in the development of their own project. If you wish to know more
details, see the World Bank Participation Sourcebook, where the Bank clarifies its stance with respect to participation, or refer to the
Inter-American Development Bank Participation Framework.

Why is participation important?

Participation is important because it makes projects efficient, effective, and sustainable in a variety of ways. The main points are the
following:

● Participation clarifies project goals, essentially the promotion of the social and economic development of local
communities.
● Participation reduces project cost, by identifying site-specific data crucial for determining the most effective size, form and
means of execution for projects.
● Participation prevents or reduces management conflicts that may arise between development workers and local people,
by negotiating and sharing the development process.
● Participation promotes technology transfer to people in need, which is often necessary for projects to have lasting impact.
● Participation encourages a culture of self help and a commitment among the people to the development of their own
communities. This is one of the most significant goals of participatory projects.

The Inter-American Development Bank Participation Framework introduced above also neatly describes the effect of participation.

Return to Contents

Methods for Participatory Projects

It is often hard for those involved in development projects, both professional development workers and local people, to share their
ideas about project goals and the effective means to accomplish them. This may not be surprising because usually they are from
different cultures and occupational backgrounds. In order to implement an effective participatory project, and to be able to evaluate it, a
specially systematized methodology is needed. In the late 1960s, the US Agency for International Development (USAID) proposed a
method called the Logical Framework (Log Frame for short) to help project management, and in the 1970s many agencies associated
with international development cooperation introduced this concept into their project management systems. In 1983, the
Deutsche Gesellschaft fur Technische Zusammenarbeit (GTZ), the German governmental organization for development assistance,
integrated the idea of participation into the Log Frame and developed a project planning method called ZOPP, which translates into
English as "Objectives-Oriented Project Planning." Since then, other European governmental and non-governmental agencies have
introduced ZOPP into their projects.

Based on the theories of the Log Frame and ZOPP, Project Cycle Management (PCM) was developed by the Foundation for
Advanced Studies on International Development, Japan (FASID). PCM is characterized by its participatory approach and its application
of the Log Frame. It is a method for project management in which the project cycle is managed using a conceptual framework for
projects called the Project Design Matrix (PDM). With its graphical layout, the PDM helps people clarify logical relationships among
project components such as goals, activities, inputs, and external conditions. Both professional development workers and local
residents are expected to participate in workshops whose objective is PDM formulation. An appropriate application of
PCM enables those involved in projects to clarify and share project goals as well as data on projects and their environments. The
Japan International Cooperation Agency (JICA) has recently started implementing PCM in its development projects, and people in
the development field have become increasingly interested in PCM as an effective method for achieving participatory projects.
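
To make the idea of a PDM more concrete, here is a minimal sketch, written in Python, of the matrix as a simple data structure. It assumes the generic logical-framework columns of narrative summary, indicators, means of verification and assumptions; the field names and example rows are hypothetical illustrations, not any agency's official format.

from dataclasses import dataclass, field
from typing import List

@dataclass
class PDMRow:
    level: str                     # e.g. "Overall Goal", "Purpose", "Output", "Activity"
    narrative_summary: str         # what the project intends to achieve at this level
    indicators: List[str] = field(default_factory=list)            # objectively verifiable indicators
    means_of_verification: List[str] = field(default_factory=list)
    assumptions: List[str] = field(default_factory=list)           # external conditions beyond project control

def render_pdm(rows: List[PDMRow]) -> str:
    """Return a plain-text view of the matrix, one block per level."""
    blocks = []
    for r in rows:
        blocks.append(
            f"{r.level}: {r.narrative_summary}\n"
            f"  Indicators: {', '.join(r.indicators) or '-'}\n"
            f"  Verification: {', '.join(r.means_of_verification) or '-'}\n"
            f"  Assumptions: {', '.join(r.assumptions) or '-'}"
        )
    return "\n".join(blocks)

if __name__ == "__main__":
    # Hypothetical two-row matrix for a village water project.
    pdm = [
        PDMRow("Purpose", "Safe drinking water available in the target villages",
               ["% of households using improved sources"], ["household survey"],
               ["local authorities maintain the pumps"]),
        PDMRow("Output", "20 wells rehabilitated",
               ["number of functioning wells"], ["site inspection reports"],
               ["spare parts remain available locally"]),
    ]
    print(render_pdm(pdm))

Workshops built around filling in such a matrix are where, in the PCM approach described above, development workers and local residents are meant to negotiate and share the project's logic.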

The World Bank introduces an array of resources on participatory methods and tools, including a typology of methods.

You can download Logical Framework described above at the Team Technologies Inc. home page.

In addition, as a method related to project formulation, you may find the idea of Concept Mapping helpful. Several articles on
concept mapping by William Trochim, a professor at Cornell University, appear on his home page, including "Concept Mapping" and
"Introduction to Concept Mapping".

Also, you can find a description of concept mapping at the Concept Systems Inc. website, which introduces their software products
to help groups or organizations think and work effectively and efficiently. You can order software from them through this site.

Return to Contents

Resources on Participatory Development

Below are resources on participatory development from three different levels of agencies. Unfortunately, there do not seem to be many
sites directly covering the topic of participation. Therefore, you will see some agencies providing only general information
on their activities. Nonetheless, you will see how these major organizations currently view development and how they are conducting
their projects. Some differences between the stances of these agencies may emerge by comparing them:
● Multilateral Organizations

The principle of multilateral organizations is to deal with regional or worldwide issues beyond a single nation's interest.
Here are some major multilateral organizations working in the international development arena.

The World Bank (WB) : The World Bank Group comprises five organizations: the International Bank for
Reconstruction and Development (IBRD), the International Development Association (IDA), the International Finance
Corporation (IFC), the Multilateral Investment Guarantee Agency (MIGA), and the International Centre for the Settlement
of Investment Disputes (ICSID). If one says just "the World Bank," it usually means the IBRD, the World Bank Group's
main lending organization. It lends money to developing countries for various development projects and structural
adjustment programs within a macroeconomic framework.

Here are Examples of Participatory Approaches the Bank used in their operations. This site encompasses the Bank-
supported projects from a variety of countries, sectors, and types of activities.

You can also see Examples of Bank Best Practice Projects in the Africa Region Participation Home Page within the WB
page.

The Inter-American Development Bank (IDB) : IDB was established in 1959 as the oldest and largest regional
multilateral development institution. It is engaged in promoting economic and social development in Latin America and the
Caribbean through lending.

You can check out Participation in Development: The Evolution of the Concept and Practices within the Bank. This is a
brief description laying out the Bank's stance with respect to participation in development.

The Asian Development Bank (ADB) : ADB is a development finance institution which consists of 56 member
countries. The purpose of its operation is to help accelerate the economic and social development of its member countries in
the Asian and Pacific region.

ADB does not have any sites which directly present participation in development issues. But you can see ADB Project
Profiles listed according to country.

United Nations Development Programme (UNDP) : UNDP, an organization within the United Nations system,
is the world's largest multilateral source of grant funding for development. It stresses its role of promoting sustainable
human development.
Regretfully, there are no resources on participation available in the UNDP website. For their activities, see UNDP
Programme Activities.

United Nations Environment Programme (UNEP) : UNEP is an organization within the United Nations system,
whose mission is to create a basis for comprehensive consideration and coordinated action on the problems of the human
environment.

It is a pity the UNEP website does not provide any resources on participation. Nevertheless, you can see a general
introduction to UNEP programmes.

United Nations Children's Fund (UNICEF) : UNICEF is the only organization in the United Nations system
which is dedicated exclusively to children's issues. It advocates and works for the protection of children's rights.

The UNICEF Namibia home page includes a brief description on Rights of Participation and Child Participation in Namibia
Today. It also includes the idea of participation in its Programme Highlights for the Programme of Cooperation 1997-2001.

Return to Contents

● Governmental Organizations

Governmental organizations necessarily have to consider the interests of their own countries as well as the benefits to
developing countries. In other words, their general stance is to implement development projects which meet their own
economic and political interests and concerns abroad.

US Agency for International Development (USAID) : USAID is a governmental agency established in 1961 by
then President John F. Kennedy. It provides economic development and humanitarian assistance in order to enhance U.S.
economic and political interests overseas.

USAID has considerable resources on participatory projects. The Participatory Development page encompasses The
Administrator's Statement of Principles on Participatory Development and a series called The Participation Forum including
Participation in Policy Reform, Participation and Gender, Participation When There is No Time, and What Participation
Means in Disasters and Conflicts.
In addition, USAID introduces its new strategy for its global response to the HIV/AIDS pandemic, which is being
developed through a participatory approach outlined in Participatory Strategy.

You may also be interested in their activity profile on Private Participation in Environmental Services.

The Japan International Cooperation Agency (JICA) : JICA, a Japanese governmental agency, implements the
technical cooperation aspect of Japan's Official Development Assistance (ODA) programs. As a part of its activities, it
dispatches experts and Japan Overseas Cooperation Volunteers (JOCV) to developing countries.

JICA's home page holds the Participatory Development and Good Governance Report of the Aid Study Committee, which comprises
the following three chapters: 1) Debate Over Participatory Development and Good Governance: Background and
Present Situation, 2) Approaches to Japan's Aid for Participatory Development and Good Governance, and 3) Framework
for Japan's Assistance for Participatory Development and Good Governance and Future Tasks.

The Canadian International Development Agency (CIDA) : CIDA is the Canadian governmental organization
carrying out programs to help people in developing countries achieve self-sustainable economic and social development,
and thereby contribute to Canada's political and economic interests abroad.

CIDA does not have any direct resources on participation in its website. You can look at their program list, which presents a
variety of programs they have done.

The Deutsche Gesellschaft fur Technische Zusammenarbeit (GTZ) : GTZ, the German agency, is one of the
largest state service organizations for international development cooperation in the world.

GTZ does not provide any information on participation in its site. Nevertheless, their Activity Areas and Projects website
may be of interest.

Return to Contents

● Non-Governmental Organizations

Non-Governmental Organizations (NGOs) are grass-roots organizations and tend to carry out small- or medium-sized
projects based on their humanitarian ideals. Taking their stance at the citizen's level, they work closely with those in need
of help.
Here I introduce three major NGOs. Unfortunately, they do not have any resources on participation.

CARE : CARE is a worldwide organization which conducts its projects in the areas of health, water,
sanitation and population, emergency aid, food security, agriculture and natural resource management, small business
support, and education. In 1996 it had projects ongoing in more than 60 countries of the world.

Save the Children : Save the Children is the largest international children's charity in the United Kingdom.
It has programmes in over 50 countries and endeavors to defend children's rights and to free children and their families from
poverty.

Oxfam : Oxfam is a global development and relief organization working in more than 120 countries. It was
set up in the UK and began its activity during the Second World War with the purpose of relieving Greek people in urgent need
of food and medical supplies under Nazi occupation. It has been carrying out a variety of programmes to tackle poverty in
general.

Return to Contents

Trochim Home Page

Project Gallery Home Page

Please send comments/suggestions to sk106@cornell.edu

created in April, 1997


Project Cycle
Shuzo Katsumoto

How a Development Project is Formed

Development projects are implemented with the aim of promoting economic and social development in less developed countries.
There are various kinds of projects: dam construction, road paving, increased literacy, improved nutrition, environmental protection
and so forth. There are three types of agencies which are involved in a development project: 1) international organizations
implementing multi-lateral cooperation; 2) states' governments implementing bi-lateral cooperation; and 3) non-governmental
organizations (NGOs) implementing grass-roots cooperation. In spite of these differences in kind and agency, development
projects display a common structure, which is called the project cycle. A project cycle has four major phases:

1. Project finding

2. Feasibility study

3. Implementation

4. Evaluation

1. Project Finding

Information on the needed projects, greatly influenced by the national development master plan of the developing country, is
collected. Many potential projects which may be effective in promoting the development of the country are listed and then
prioritized. For example, infrastructure building may be the primary work to be done immediately in some countries, but may not be
in other countries. Here, at the first phase of a project cycle, it becomes clear what kind of project is urgently needed and where the
appropriate place for the project is.

2. Feasibility Study

When it is clear that a project is potentially beneficial, factors critical to implementing the project are looked into and considered.
The research includes a cost-benefit analysis; if the cost is estimated to be greater than the benefit, the project is turned down. If the
budget turns out to be insufficient to carry out the project, the project is refused. If it becomes obvious that appropriate techniques are
not available, the project is regarded as not feasible. In this way, a potential project is examined from many different aspects. Only
when everything seems likely to be successful is the project appraised and then carried out.
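
As an illustration only, here is a minimal sketch in Python of the first two screening steps described above, a cost-benefit test and a budget check. The discount rate, cash flows and budget figure are hypothetical, and a real appraisal would of course weigh many more criteria (technical feasibility, environmental and social impacts, and so on).

def net_present_value(cash_flows, discount_rate):
    """Discount a list of yearly net benefits (benefit minus cost) back to year 0."""
    return sum(cf / (1 + discount_rate) ** t for t, cf in enumerate(cash_flows))

def is_feasible(yearly_benefits, yearly_costs, budget, discount_rate=0.1):
    """Screen a candidate project: positive net present value and total cost within budget."""
    net_flows = [b - c for b, c in zip(yearly_benefits, yearly_costs)]
    within_budget = sum(yearly_costs) <= budget
    return net_present_value(net_flows, discount_rate) > 0 and within_budget

if __name__ == "__main__":
    # Hypothetical five-year project: heavy cost in year 0, benefits in later years.
    benefits = [0, 50, 70, 70, 70]
    costs = [150, 10, 10, 10, 10]
    print(is_feasible(benefits, costs, budget=200))   # True with these figures
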
3. Implementation

Once it has been considered both beneficial and feasible, the project is carried out according to the project plan in cooperation with
many different people. For example, if the case is a dam construction project implemented by a state's government as a part of its
Official Development Assistance (ODA), a large construction company that won the contract takes part in the project as builder, the
local workers participate in it as labor force, some development consultants specializing in water resources support it as technical
advisors, and government officers take the roles of supervisors. Without a good combination of the work of these people, the project
will not be implemented successfully.

4. Evaluation

The completion of the project does not mean the end of the project. It does not end until it has been evaluated from various aspects
and has provided some implications or suggestions for future projects; a feedback loop operates here. An evaluation covers the
financial, technical, economic, political, and human-welfare aspects of the project. Some projects, such as a dam construction, may
take a long time to evaluate, since the utility of a dam can be judged only with the passing of time. If a typhoon comes and
damages a part of the dam two years after the construction is completed, the dam is considered to have had some technical problem
in its construction. This kind of fault rarely shows up as soon as the construction ends.

A project cycle represents a fundamental process of development projects. Whatever the projects are, or whoever the actors are,
every development project basically follows the cycle.

You can also refer to the description on project cycle within the Inter-American Development Bank home page.

Return to Home

Trochim Home Page

Project Gallery Home Page

Please send comments/suggestions to sk106@cornell.edu

created in April, 1997


POVERTY AND HUNGER IN SUB-SAHARAN AFRICA (SSA)

Ladies and Gentlemen: The green leaves signify freedom from hunger and poverty.

Surveys undertaken in much of Africa show that most people derive their livelihood from agriculture and
reside mostly in rural areas. Surveys also show that most poor people are rural people, even though
there is also a significant number of urban poor. The Economist of November 16th, 1996 argued that
"people are hungry because they are poor, not because the earth is running out of food." At the recently
concluded world food summit in Rome, the concern was the fate of 800 million people who are chronically
malnourished, most of whom are in SSA. This web page provides exploratory sites which give an insight into
the degree of poverty and hunger in the world, particularly in SSA. The nature and dimensions of the
problems of hunger and poverty are well outlined in the Overview Paper presented at the International Fund for
Agricultural Development (IFAD) conference on Hunger and Poverty, held in Brussels in November 1995. The focus is on the role
of women, since women constitute 60 percent of the world's one billion rural poor (mostly in Africa); the issue is detailed in the
Women and Poverty: Beijing '95 conference report.

This web page therefore addresses three sub-topics for your perusal. The topics pose a challenge for researchers and evaluators
trying to address the scope of problems related to hunger and poverty in their research. A variety of methods can be used to define
the problem, ranging from brainstorming and concept mapping to "needs assessment," as you try to link the role of women to the
problem.

1. POVERTY
2. HUNGER
3. ROLE OF WOMEN

POVERTY

Poverty is a welfare concept of inequality, and is defined here by the
ratio of total household expenditure per adult to the poverty
line. The poverty line is a government's estimate of the income (or
expenditure) threshold below which a person is considered poor.
A Poverty Measure helps to show relative levels of income; a minimal
worked sketch of one such measure follows the list below. Who are the
poor?

● THE POOR.
Stiglitz (1993) defines the poor as those who do not
work and those with low wages. However, in SSA the
majority of people work in agricultural fields and are
poor because their wage rates are too low to enable
them to earn enough income to meet the basic needs of
life.
● THE LANDLESS. These constitute the poorest of the
poor in most of SSA because they have no access to
resources to grow their own food for consumption, and most of them belong to the agricultural labor pool, which is highly
underpaid.
● Some hope for uplifting the status of the poor has been provided through the IMF/World Bank economic reforms. There has been
some evidence that economic reforms did not hurt the poor as had been alleged; click here for some evidence from a recent
survey by Prof. David Sahn of Cornell University.
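
As promised above, here is a minimal worked sketch in Python of the poverty measure just defined: each household's expenditure per adult is compared with the poverty line, and the headcount ratio is simply the share of households falling below that line. The survey figures and the poverty line are hypothetical and purely illustrative.

def expenditure_per_adult(total_expenditure, n_adults):
    return total_expenditure / n_adults

def headcount_ratio(households, poverty_line):
    """households: list of (total_expenditure, number_of_adults) tuples."""
    below = sum(
        1 for total, adults in households
        if expenditure_per_adult(total, adults) < poverty_line
    )
    return below / len(households)

if __name__ == "__main__":
    # Hypothetical survey of five households (annual expenditure, number of adults).
    survey = [(900, 3), (1500, 2), (400, 2), (2400, 4), (600, 1)]
    print(headcount_ratio(survey, poverty_line=350))   # 0.4 with these numbers
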
HUNGER

Hunger is a symptom of inadequate food availability, resulting
in a daily calorie intake too low to meet the basic bodily
requirement. WHO recommends a standard intake of
2,200 calories per person per day, but the average for SSA is
2,096 calories per capita, which falls short of this requirement.
Some hope is being provided through international non-
governmental organizations like the Hunger Project. There
are some countries with per capita intakes higher than this,
but they are fewer than those at or below the average.
With increasing population growth and urbanization, and the
limited potential to increase production through expansion of
the cultivated area, food needs will have to be met
through increased yields and income. As such, countries have
shifted gears from promoting growth per se to Poverty
Alleviation and Food Security Strategies. The following site
shows some case studies of countries in Africa that have
implemented successful reforms to combat poverty and hunger using the latter approach:

ROLE OF WOMEN

There are three pillars of Food Security, namely:

● FOOD AVAILABILITY, OR ADEQUATE FOOD PRODUCTION


● ECONOMIC ACCESS TO AVAILABLE FOOD
● NUTRITIONAL SECURITY, which depends on the availability of non-
food resources such as childcare, health care, clean water and sanitation.

Women play a significant, if not dominant, role in supplying all three pillars
necessary to achieve food security in SSA. But women play these roles in the face of
enormous social, cultural and economic constraints. Their key roles can be
categorized as follows:

Child bearing and rearing

Family and household maintenance

Production and Income earning activities

Women produce 75 percent of the food in households in most of SSA, yet there has
been a failure to recognize their centrality in food production, as well as biases in many
institutional arrangements. A Working Group on Women, Food and Agriculture,
set up during the Beijing conference, continues to address issues regarding women's centrality in food production, and
seeks to increase understanding and support of women's role in achieving agricultural sustainability, food security and rural
development, among other things. The habitual lack of support for women reflected in many national policies, local institutional
arrangements and development agencies' policies and projects is regarded by many as a cause of low food production in Africa.
This gender bias has been well documented by the WorldWatch Institute. Hunger has been perpetuated as women strive
to meet their household needs, among other roles, through their own COPING MECHANISMS.

Lastly, for those of you interested in learning more about people faced with hunger, poverty, disease, environmental degradation and
over-population in the third world, here is a link to a course.

Trochim Home Page Project Gallery Home Page

Revised March 28, 1997

Rosern K. Rwampororo

Cornell University.
Sampling in Developing Countries (and elsewhere)

A Guide to Web Resources

Introduction

This page is intended to provide visitors with an overview of how surveys are conducted in developing countries using established
probability sampling techniques. For a good introduction to the principles of probability samples, see Trochim's Knowledge Base
special section on Sampling.

Another handy resource for applied social researchers is found on the Florida Agricultural Information Retrieval System
(FAIRS) Web site. The module titled "Extension Agent Resources" includes a section on Program Development with a thorough
discussion of sampling for impact evaluations, as well as other research-related topics. This is definitely worth checking out. Also,
for an unrelated (but fun) side trip, get lost in the site's Leaders and Group Management module.

Since both the quality and quantity of resources on the Web about sampling in developing countries are extremely limited, look at
the following resources with a creative eye. Some resources may seem only tangentially related, but might spark your imagination,
such as the "Sampling non-human populations" heading below!

Development institutions

Major organizations provide a good starting place for finding information about sampling. Two of the largest are the U.S.
Agency for International Development (USAID) and the World Bank. Another good source is Princeton University's Office of
Population Research.

USAID's Demographic and Health Survey program is a multi-national effort to measure aspects of women's health. It provides
information on several surveys of women's health, including:

● A newsletter about recent surveys


● A selection of press releases with country flags. Each country page links to more information, including
maps of the sampling plan in some cases.
● Publications you can order, including a sampling manual.

The World Bank's Living Standards Measurement Study (LSMS) is similar to the USAID effort, in that this program is also designed
to provide technical assistance to a variety of countries so that the results of various studies will be comparable and provide reliable
data. The description of the LSMS is an example of a good methods write-up.

The Princeton Site includes data archives, with data sets and summaries of various studies in which Princeton has collaborated. Most
concern demographic and health data, like the Chinese Fertility Studies, which are well documented here. The Phase I description
gives an especially good overview of the sampling plan.

Sampling in industrialized countries

In industrialized countries, there are more types of sampling strategies typically available to the researcher,
given the availability of telecommunication and funding. Here are some examples:

● A survey of truck drivers conducted while they were on the road. The sampling plan was complex, using a "multi-stage area
sample with measured response probabilities." This strategy, of catching people in the middle of their usual routine, could
be useful in developing countries, as well. But see your statistics prof to make sure you know what you are doing!

● The Canada Family Violence Survey used a stratified sampling plan that incorporated the random digit dialling technique,
used in telephone surveys.

● The Missouri School of Journalism's Radio News Survey uses a systematic sampling technique (a minimal sketch combining stratification with systematic selection appears just below this list).
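
The following is a minimal sketch in Python, not drawn from any of the surveys above, of how a stratified plan with systematic selection within each stratum might look. The strata and household identifiers are hypothetical.

import random

def systematic_sample(frame, n):
    """Take every k-th unit after a random start within the first interval."""
    k = max(len(frame) // n, 1)
    start = random.randrange(k)
    return frame[start::k][:n]

def stratified_systematic_sample(strata, n_per_stratum):
    """strata: dict mapping a stratum name to the list of units in its frame."""
    return {name: systematic_sample(frame, n_per_stratum)
            for name, frame in strata.items()}

if __name__ == "__main__":
    # Hypothetical frame: household IDs grouped into urban and rural strata.
    strata = {
        "urban": [f"U-{i:04d}" for i in range(1, 401)],
        "rural": [f"R-{i:04d}" for i in range(1, 1201)],
    }
    for name, units in stratified_systematic_sample(strata, n_per_stratum=20).items():
        print(name, len(units), units[:3])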

Sampling on the Web About the Web

This heading is certainly a bit removed from the idea of sampling in developing countries; however, it provides some
thought-provoking ideas, given that few of us know who is out there using the Web. Questions of the external validity of such surveys
should be addressed, since we have little on which to base our assessment of generalizability. Here are a couple of interesting sites:

● SRI International conducted a survey of Web users. Check out the results.

● A WWW User Inventory: This is a survey of Web users that purports to conduct a personality inventory of users. The
researchers want to know about locus of control, since they believe this will better enable internet services to address user
needs. An interesting idea. Take the survey and see what you think.

● Also, look into the abstracts from the Workshop on Internet Survey Methodology and Web Demographics that was held
at MIT in January, 1996 for some updates on sampling issues on the Web.
Sampling non-human populations

Usually in applied social research we sample people; however, this is not always the case. The studies listed below include sampling
of:

● Documents: The National Highway Traffic Safety Administration collects data about accidents and
estimates frequency and severity of crashes using its National Accident Sampling System. Since
documents often contain a wealth of information, using such a system could save a researcher in a
developing country a lot of time and money if reliable documentary information on the topic of interest is
available.

● Physical Items: The Environmental Working Group did a study of the level of pesticides in baby food.
The items sampled were jars of baby food, and a multi-stage area sampling plan was used.

● Fish: The Leetown Science Center of the National Biological Service works on maintaining and
supporting aquatic resources -- like fish. The fish sampling may have relevance to those who want to
sample people on the move. Well, not much to do with social research, perhaps, but the pictures are good.
I especially like the one captioned "Collecting fish using battery powered backpack electroshocker." We
certainly can't use that method in social research!

Send me your comments and suggestions


Colleen Flynn Thapalia
EVALUATION OF AGRICULTURE EDUCATION PROGRAMS

Introduction

The intention of this web site is to search the World Wide Web for information, resources and research papers about program
value and effectiveness in agriculture education in secondary schools, and more specifically for agriculture education
programs in developing countries. Information found, whether directly or indirectly related, is summarized on this web site, which also
provides links to other related web sites for use by graduate students, researchers, academics, government departments, donor
agencies, private or non-governmental organizations (NGOs), and multilateral organizations.

Additionally, and importantly, this web site endeavors to identify and provide links to methodological issues related to the
Knowledge Base on evaluation research and its application to agriculture education curriculum programs. The area of special
interest in the Knowledge Base is the piece on Research Problem Formulation, especially its application to evaluation research in
agriculture education. This is necessary so that relevant and appropriate social research methodologies can be studied which are
pertinent to the evaluation of agriculture education programs and agriculture curriculum evaluation research in secondary schools in
developing countries.

Availability of Pertinent information on Internet Resources

Research methods and research information on the topic of interest are, not surprisingly, limited, because the topic is specifically vocational and
technically oriented. Searches on the World Wide Web indicate that such research information is not available at this time, probably
because electronic information technology is in the formative stages of development, although research information is
known to exist in other media in libraries and in the literature around the world.

For example, searches with the Yahoo search engine by key words or combinations of key words such as Secondary
Agriculture Curriculum, Agriculture Curriculum, Agriculture Curriculum Evaluation, Agriculture Education Evaluation, Agriculture
Curriculum in Developing Countries, Agriculture Curriculum for K-12 Grades, Adult Education Programs and the like yielded no
matches on the World Wide Web for evaluation research on the topic.

Types of Information Available on the topic-Evaluation of Agriculture Education Programs

It appears, however, that much of the information available on the Internet regarding agriculture education and
agriculture curriculum matters covers areas such as course catalogues, program information, enrollments,
resources for women's studies in curriculum, and other very general information on topics of the like.

For example, the Californian PolyTech Institute offers Academic Programs available through the Internet, reflecting the growing
diversity of choices and skills in agriculture education, when searching under the same name. Another, perhaps more useful,
location is the National Network for Curriculum Coordination, which has a Curriculum Coordination Center with a network of
representatives on curriculum-related issues in that subject area.
It may also be very useful to explore and share with people on the CCI Agriculture mailing list, which comprises people who share
messages with each other on things that pertain to a particular topic. It is a site where messages are sent and received through a
central computer facility.

Relevant Evaluation Research in other Subject areas related to the topic-Evaluation of Agriculture Education Programs

Since there is little or no research information, or sources of information, on the topic on the World Wide Web, it is necessary to
identify relevant and adaptable research information so that a knowledge base and an understanding can be established of the nature,
scope and relevance of evaluation work in agriculture curriculum program evaluation. Such information, if available, will help researchers
conceptualize, design, plan, implement and draw inferences from evaluation research in agriculture education. Relevant subject areas
of interest used to search for possible web sites include: Vocational Education, Teacher Education, Classroom Techniques,
Science Education and Technology Education.

Sites on Vocational Education

The only site that has quite a bit of information on vocational education is the National Centre for Vocational Education Research
(NCVER), based in Australia. It is a centre of research excellence and a key knowledge centre whose core business is
analytical research and knowledge communication. The NCVER has some very useful functions which are highly recommended for
web users interested in the topic. Another interesting web site related to Vocational Education and Training is Internet
Resources in Vocational Education, which can be found by linking to its base at the University of Dakota. It has links to a number of
other vocational education locations. SCOTVEC, the Scottish Vocational Education Council, a national body for vocational
education qualifications, is another useful web site worth visiting. Perhaps a more significant site is one related to Training and Development. The site
contains non-commercial, training-related Internet resources. Further, a list of conferences held worldwide is also on the distance
education web site.

Sites on Teacher Education

The best web site to visit for general information on Teacher Education, though it has nothing on evaluation
research, is the Journal of Industrial Teacher Education. It has lengthy articles on the topic spread over four years. The Michigan State
Dept. of Education has a web site for Classroom & Teacher Resources and Information that is also of some value.

Sites on Classroom Techniques

A search on this topic revealed one interesting site, the great cartoonist online service, which can be of use for curriculum development
purposes. A good source of cartoons is available there, with more being added over time.

Sites on Science and Technology Education


These particular subject areas are interesting, emerging fields. Exploration in Education is one such site. The
University of Nebraska-Lincoln Research in Physics Education Group performs research at a variety of levels of schooling in physics
education and focuses on aspects of teaching physics. These web sites and others on the topics of science and technology are available,
and searching with those key words and topics will reveal a good range of web sites.

Summary: Evaluation of Agriculture Education Programs

It is now clear that vocational subject areas like agriculture education have little or no presence on web sites locally or
internationally. This medium of communication is probably not being used at this time and may not be until some years after the
innovation is more extensively used in other subjects and fields like the arts and sciences.

A web presence may exist for more technical research matters related to agricultural production and agribusiness, but certainly not in
the field of education, at least insofar as using this medium is concerned for the time being.

If such information is not available in the developed world, there is even less chance of its being available in developing
countries, where finance, expertise and resources are in even greater demand for other purposes.

In sum, there are no web sites at this point in time based on curriculum evaluation research in agriculture education.

Efforts should be made by concerned researchers and academics to establish web sites for academic, research and teaching purposes,
and also for the purpose of sharing. Those who are contemplating such an endeavor should be encouraged to do so before we in the
field of agriculture education, and evaluation research in agriculture education in particular, fall further and further behind in the
opportunity to take advantage of the technology. As more and more homes are equipped with computers with access to the
electronic information system throughout the world, such a medium will become more and more useful.

please forward comments/suggestions/additions to e-mail address: acp10@cornell.edu Created in May, 1996


Arnold C. Parapi, Graduate Student, Agriculture Education, Kennedy Hall, Cornell University, Ithaca, NY.
14853 Ph/Fax #(607)266 8253
Spotlight on Evaluation Techniques in International
Development Programs

The purpose of this page is to direct web-users to resources on the Internet that
are involved with International Development and the Evaluation of those programs. Individuals who will find this site useful may be
part of government agencies, non-profit organizations, or multi-lateral organizations.

Trends in International Development

Sustainable Development

As the end of the century approaches, development workers are beginning to evaluate their efforts and are subsequently turning to
better methods of development. One recent trend has been the Sustainable Development movement. The phrase originated at the
Brundtland Commission meetings in 1987 and has recently been addressed by the creation of the UN Commission on Sustainable
Development, which meets yearly to discuss international sustainable development issues. The International Institute for Sustainable
Development home page provides a good place to start exploring the principles of sustainable development. The concept of
Sustainable Development within the "Four Views of Nature" is presented by the Green Cross as a critical view of the applicability of
the ideal in practice.

Included here are links to:

International Development Resources

❍ Virtual Library of International Development compiled by the Official Development Assistance agency of Canada.
Includes user friendly format to access development organizations, or browse subject areas and world regions.

❍ "Resources for Social and Economic Development" by Richard Estes at the University of Pennsylvania. This is an
excellent starting point for learning about development.
❍ The Institute for Development Studies in Sussex hosts an online catalog Devline for searching directories,
bibliographies and Internet resources.

❍ International Development related sites compiled by the International Development Research Center.
For other useful resources such as videos, slides, documents, and speeches: see the IDRC library.

❍ Another list provides links to alternative international organizations (not necessarily involved in development).

❍ A comprehensive list of non-profit organizations in North America has been compiled in this meta-index.

❍ An extensive page of resources for everything relating to International and Area Studies with a good section in
country specific profiles.

❍ International and Area Studies page at Yale University. This page acts as a pointer to educational resources and
organizations at Yale in the area of international studies.

❍ The African Studies department at University of Pennsylvania has set up this extensive page with links to
development organizations and topic oriented groups.

Evaluation in Development

Development programs of all types are designed with some method of evaluation in mind; unfortunately, at this time there is not much
information available on the various methods of evaluation currently used in the field of international development. This
may be due to the non-traditional methods of evaluation employed, or to a lack of proper evaluation methodology. Certainly, the
"Evaluation Culture" called for by Bill Trochim is not yet visible in social research or development research.
In the absence of support for development evaluation, field workers must rely on assessments of other types of projects to draw
parallels and methodological equivalents.

Below are some resources available to the practitioner of evaluation in development projects.

Evaluation Resources

❍ The British Library for Development Studies has compiled an extensive listing of development research directories
and organizations.

❍ The International Development Research Center is an agency of the Canadian government researching "sustainable
and equitable development."

❍ The Intermediate Technology Development Group provides free consultations for development organizations.

❍ Alternatively, evaluators can use internet resources for monitoring of development projects:

■ The Decision Support System is a computer based guide to decision making that is focused on
sustainability.

■ The Manaaki Whenua Landcare Research group in New Zealand presents a framework for Participatory
Research in development.

■ The University of Guelph maintains proposals of frameworks for evaluations that were presented during
the Electronic Conference on Sustainable Indicators.

■ A step by step model of evaluation of educational systems may also prove useful in methodology and
systems of evaluation.

■ The Earth Times Gopher can retrieve publications related to international development and evaluation.

■ The searchable index at the International Institute for Sustainable Development provides information on
policies that reflect sustainable development.

Evaluation in Action

The United States Agency for International Development posts the only description (that I have found) of an evaluation of a
family planning project.

please send comments/additions to alv4@cornell.edu

created April 1,1996


by Amy Volz
HICHANG's

Internet Advertising Page

Research Measure Spending News Institutions

Suggestions? Talk to me
Research Archives || Research || Case Studies || Operational Articles ||

Since Internet advertising is still new to many advertisers and marketers, many people have
questioned the effectiveness of this new marketing channel. As we know, evaluating any kind of
program effectiveness is a challenging task, and this is especially true for Internet advertising, since it is
very hard to establish a direct correlation between advertising exposure and purchasing behavior.
Below are some links where you can find out how researchers and practitioners in this field have
tackled this issue.

Research Papers

● IAB Advertising Effectiveness Study

One of the most comprehensive and projectable tests of online advertising effectiveness. With twelve major Web sites and
over 16,000 individual users taking part, the study reports the results of a rigorous test of advertising effectiveness
issues such as brand awareness, consumer acceptance of online advertising, product attribute communication, and purchase
intent. The study claims to be based on a classical experimental design (a control group with random assignment), and the
authors therefore assume that the causal inference made in the study (for example, that Internet advertising increases brand
awareness) has internal validity. While that may be true, it is still questionable whether the study's conclusion (that
Internet advertising is effective) is valid for all stakeholders in the market. For advertisers, the concept of
advertising effectiveness implies "cost effectiveness"; if we apply that conceptual definition to the study, the conclusion
becomes questionable. There is also a potential problem of demand characteristics. (A minimal sketch of this kind of
exposed-versus-control comparison appears after this list.)

● A New Marketing Paradigm for Electronic Commerce

By Donna L. Hoffman & Thomas P. Novak, February 19, 1996. The paper argues that the traditional one-to-many model,
with its attendant implications and consequences for marketing theory and practice, has only limited utility in emerging
many-to-many media like the World Wide Web, and that a new marketing paradigm is required for this communication
medium.

● What makes women click?

A study of the demographics, self-perceptions/perceived benefits, and motivations of female Internet users, explaining their
implications for Internet advertising. A good example of factor analysis.

● Marketing in Hypermedia Computer-Mediated Environments: Conceptual Foundations

An excellent theoretical analysis of the Web as a medium for marketing communications. In this working paper by Donna
Hoffman and Thomas Novak, the communication characteristics of hypermedia CMEs (such as the Web) are analyzed, and a
structural model of consumer behavior incorporating the notion of flow is developed.

● Commercial Scenarios for the Web: Opportunities and Challenges

This paper by Hoffman, Novak and Chatterjee explores the role of the web as a distribution channel and a medium of
marketing communication, the benefits of using the Web in a commercial context, and the barriers to commercial growth of
the Web.
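
As a rough illustration of the exposed-versus-control logic behind the IAB study described at the top of this list, the short
Python sketch below simulates a randomized two-group brand-awareness comparison. The awareness rates and sample sizes are
invented for illustration only and are not drawn from the actual study.

    # Hypothetical sketch of a randomized exposed-vs-control comparison of
    # brand awareness (the design the IAB study claims to use).
    # The awareness rates below are assumptions, not the study's data.
    import random

    random.seed(42)
    exposed = [random.random() < 0.34 for _ in range(1000)]   # assumed 34% aware
    control = [random.random() < 0.27 for _ in range(1000)]   # assumed 27% aware

    lift = sum(exposed) / len(exposed) - sum(control) / len(control)
    print(f"Estimated awareness lift from ad exposure: {lift:.1%}")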

top

CASE STUDIES

● Building Effective Web Presence for Tide

This paper examines the implications of Tide's changing demographics, the ability of the Web to deliver that demographic,
and how best to communicate Tide's message to consumers. The research question is: how can Tide effectively use an
integrated Web presence to gain product awareness and loyalty among the young female consumers who are vital to its
future? Topics include online marketing strategies, measuring the value of Tide advertising, and metrics for measurement on
the Web. A good summary of measurement issues is included.

top

OPERATIONAL ARTICLES ABOUT INTERNET ADVERTISING

● 20 Reasons to Put Your Business on the World Wide Web. Marketing reasons for using WWW technology; comes with
French and Spanish translations.

● The Business of the Internet. A basic but well-written introduction to the Internet as a business tool.

● Why Internet Advertising? A brief summary of a study that lists the strengths of WWW advertising and the rationale for
them. A good source for persuading clients to consider WWW advertising as a marketing channel.

top
Measuring Internet Advertising Effectiveness

As we have learned in our research methods class, measuring or evaluating any kind of
program effect is difficult, since people tend to use different concepts with different
operationalizations to measure a given construct. Because Internet advertising is so new, the industry
has suffered from a lack of standardization in measurement. Until now, there have
been different definitions for the same terms, a lack of comparability, and completely
unique systems that do not allow for scalable auditing. The effectiveness of Internet
advertising can be measured in terms of awareness, product/service recall, attitude change,
and purchasing behavior. Above all, quality audience measurement of Internet
advertising is the first step toward ensuring the long-term viability of the Internet advertising
market. Below are some links where you can find more detailed information about how
players in this market have tried to build consensus on this issue. As you can see, it
is a good example of how hard it is to create a solid construct and measurement
scheme that can satisfy multiple stakeholders.

|| Studies || Organizations||

Studies about Internet Advertising Measurement

● CASIE Guidelines

CASIE (a joint project of the Association of National Advertisers, Inc., and the American Association of Advertising
Agencies), with the support of the Advertising Research Foundation (ARF), has created the Guiding Principles of
Interactive Media Audience Measurement. This book focuses on supplying guidelines for providing quality audience
measurement of interactive media, both cyberspace and interactive television, at the levels of vehicle exposure as defined in
the ARF document "Toward Better Media Comparisons".

● New Metrics for New Media: Toward the Development of Web Measurement Standards

A comprehensive study about Internet advertising measurement standards conducted by Novak, T.P. and D.L. Hoffman.
Topics include in-depth review of various measurement standards and suggestion of new metrics developed by the authors.

● Metrics and Methodology for Internet Advertising


A recent proposal released by the Internet Advertising Bureau's media measurement task force. Topics include definitions of
various measurement terms and techniques, discussion, a glossary, and so on.

● WWW advertising CPM (Cost per thousand)

AdKnowledge, the leading provider of Internet marketing solutions for advertisers, agencies and web publishers, reports
statistics showing that the average gross web ad rate for March 1998 was $36.63, a decrease of 6% over the previous 12
months and of 2% during the first quarter of 1998.
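
CPM simply means the price charged per thousand ad impressions, so a campaign's cost is the number of impressions divided by
1,000 and multiplied by the CPM rate. A minimal Python sketch of the arithmetic, using the AdKnowledge average quoted above
and a purely hypothetical impression count:

    # CPM (cost per thousand impressions) arithmetic. The rate is the
    # AdKnowledge average cited above; the impression count is hypothetical.
    cpm_rate = 36.63          # dollars per 1,000 impressions
    impressions = 250_000     # hypothetical number of ad views purchased

    campaign_cost = impressions / 1000 * cpm_rate
    print(f"Estimated campaign cost: ${campaign_cost:,.2f}")   # about $9,157.50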

Web Usage Measurement/Auditing Companies

● ABC (Audit Bureau of Circulations)


● Art Technology Group, Inc.
● BPA International
● The Delahaye Group, Inc.
● Group Cortex, Inc.
● Huntana
● Internet Profiles Corporation (I/PRO)
● Interse Corporation
● Logical Design Solutions
● Market Arts Web Track LLC
● Marketwave
● Media Metrix, Inc.
● NetCount, LLC
● Nielsen Interactive Services
● Open Market, Inc.
● Streams Online Media Development
● Interband Communications Corp.
● W3.COM
● WebTrack

Online Advertising Spending

Forecasting is extremely difficult. This is especially true for an exploding market such as
Internet and online advertising; historically, the market size has roughly doubled every year.
Below are some figures and links where you can find more detailed market updates.

Ad. Revenue Updates

Internet/online advertising revenues ("revenues") surpassed $1.0 billion in a single quarter for the first time,
totaling $1.2 billion for the third quarter of 1999, an increase of $726 million, or 148 percent, over the 1998 third-
quarter total of $491 million. Revenues totaled $2.8 billion for the first nine months of 1999, more than
double the figure for the same period of 1998, and averaged over $400 million per month during the third quarter of
1999. Based on historical trends, revenues are on pace to exceed $4 billion in 1999.
(Source: IAB report)
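
The growth figures quoted above are straightforward percentage calculations. A minimal sketch of the arithmetic in Python,
taking the Q3 1999 total as the reported $491 million base plus the reported $726 million increase:

    # Recomputing the quarter-over-quarter growth reported by the IAB.
    q3_1998 = 491.0               # Q3 1998 revenues, millions of dollars
    q3_1999 = q3_1998 + 726.0     # Q3 1999 revenues (reported increase added)

    growth_pct = (q3_1999 - q3_1998) / q3_1998 * 100
    print(f"Q3 1999 total: ${q3_1999:.0f}M, growth: {growth_pct:.0f}%")   # -> $1217M, 148%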

Links for industry updates

● Cyber Atlas

● IAB

● Advertising Age



Resources for Latest Information

There are many sites on the Web that serve as great resources for learning more about this ever-
changing industry. There is still much to learn about marketing on the Web. Here is a list of
resources that can help you locate useful information.

Journals & Discussion List

● Advertising Age - Interactive Daily


● Adweek Online
● Media Central
● ClickZ
● The Online Advertising Discussion List

Organizations Conducting Research on Interactive Media Usage

● @plan. Inc.
● Arbitron NewMedia
● Communications Industry Researchers, Inc.
● Cyber Atlas
● Cowles/SIMBA Information
● Datamonitor
● Dataquest
● Decision Analyst
● Find/SVP
● Forrester Research
● Frost & Sullivan
● Gartner Group
● Georgia Institute of Technology Graphics, Visualization & Usability Center
● Killen & Associates, Inc.
● Louis Harris & Associates
● Netsmart
● Next Century Media
● A. C. Nielsen
● Nielsen Media Research
● Project 2000
● Telecommunications Research Inc.
● Veronis, Suhler & Associates
● VisionQuest 2000
● The Yankee Group
● Yankelovich Partners, Inc.
Research groups and institutions

Many institutions are involved in the Internet advertising business, working to solve problems in the
field and to provide guidelines for using the Internet as an advertising channel.

● Internet Advertising Bureau

The Internet Advertising Bureau is the global association devoted exclusively to maximizing the use and effectiveness of
advertising on the Internet. On this site readers will find research information and the latest news about WWW advertising.

● CASIE

The Coalition for Advertising Supported Information and Entertainment (CASIE), a group of advertising industry
organizations, provides guidelines for using interactive media as an advertising channel. CASIE members include the
American Association of Advertising Agencies (AAAA), the Advertising Research Foundation (ARF), and the Association
of National Advertisers (ANA). Check this page for recent research reports and other information.

● World Wide Web Consortium (W3C)

The W3C is an open consortium of over 100 member companies involved in web software, services and content. The
companies include America Online, AT&T, CompuServe, IBM, Microsoft, Netscape, Open Market and Sun Microsystems,
among others. W3C exists to develop common standards for the evolution of the Web. Their pages on demographics are a
great source for getting up to speed on the technical challenges of audience tracking. W3C also provides some free
public services, including prototype and sample applications demonstrating Web technology, and a repository of
information about Web specifications for developers and users. W3C also runs a mailing list (www-logging@w3c.org)
devoted to discussing the issues of usage tracking.



Contingent Valuation
sk234@cornell.edu

Content
This web site provides an orientation to environmental economics, especially its growing subdiscipline of contingent valuation
of the non-market benefits of environmental improvements. It also provides hyperlinks to some case studies in which the method has
been used, and to several other web sites where you can find related topics. This page is periodically updated, so your
questions, comments, and ideas are most welcome.

I. Introduction
1.1 Definition
1.2 Rationale

II. Types of valuation technique


2.1 Direct cost
2.2 Revealed demand
2.3 Bidding game

III. Hypothetical question modes

IV. Application and case studies

4.1 Case study one: Salmon restoration


4.2 Case study two: Water allocation in Mono lake, California
4.3 Forest preservation in Australia
4.4 Coastal Water Quality: A contingent approach
4.5 The Referendum Format Contingent Valuation Method

V. Conclusion

VI. References

I. Introduction

1.1 Definition
Contingent valuation is a method of estimating the non-market value of environmental attributes or amenities, such as
the Grand Canyon, endangered species, or recreational and scenic resources. These values are generally measured in terms of the
willingness to pay for an improved environment, or the willingness to accept compensation for a damaged environment or for being
deprived of the improvement. The most appealing aspect of the contingent valuation method is that it
allows us to estimate total value rather than only components of that total value (Frykblom, 1997). For more information, please click
here!

The non-market value of environmental goods is categorized into three components: existence, option and bequest values. An
existence value is the amount the public is willing to pay for specific environmental amenities or scenic resources in order to
keep them from being destroyed or damaged. Valuers are not concerned about whether they will see or use them in the future;
they just want to know that they exist. For instance, some people may never have seen rain forests or whales, but they are very
concerned about their current status. They want them to exist even though they do not know whether they will ever use or see them
(Knight and Bates, 1995).

A bequest value, on the other hand, is the value the public places on preserving a quality environment so that their children or
future generations can enjoy it as they do now. The preservation of endangered species illustrates this value. For
instance, people have enjoyed sighting the African elephant, which is now threatened with extinction. The recreational experience and
enjoyment of seeing the elephants has encouraged them, in one way or another, to help preserve the
animals. If the elephants went extinct within our generation, we would no longer have a chance to see them; more importantly, future
generations would be completely deprived of that enjoyment, and would see the animal only in pictures.

In this case, contingent valuation will ask how much you are willing to pay to preserve them from extinction. The amount
you would be willing to pay depends on how strongly you feel about your encounters with the animal, and how strongly you want your
children or grandchildren to enjoy them. The value you give to preserving the elephants, based on your recreational experience, is in
this case the bequest value, because you are concerned less about your own opportunity than about your children and future
generations.

Finally, the option value of an environmental amenity is the amount the public is willing to pay to preserve it for possible future
use, even though they are not sure when they will use it. Suppose you have heard people say how intelligent and friendly whales are,
and you strongly want to see one sometime in the future. Imagine that only two whales reportedly remain, one in the Pacific Ocean and
one elsewhere, and that you are told both will be killed in compulsory experimentation aimed at some scientific breakthrough for
future economic development. How would you feel about this? You probably do not want them killed for any reason; there are so few
left, and the possibility that they could repopulate the oceans is essentially nil, since they are in different locations and will
never meet. While you are keen to see the animal sometime in the near future, a survey asks how much you would be willing to pay to
reserve your future opportunity to see them rather than be deprived of that chance forever. Whatever answer you give to this
question is the option value that you are personally willing to pay to save the animal.

1.2 Rationale

The method is well known in planning and decision making related to environmental conservation and sustainable natural resource
management. The old resource management paradigm has produced social consequences and externalities. For instance, traditional
resource management based purely on market benefits as the main value component in benefit-cost analysis for policy making has
proved unsustainable in the long run. As a result, the integrity of surrounding ecosystems has been severely damaged: ozone
depletion, global warming, acid rain and other hazardous pollutants, the extinction of species, and so on have resulted from the
negative impacts of human activities on the natural environment. These consequences, some of them irreversible, have prompted
scientists, economists and management authorities to explore alternative approaches to better manage our remaining resources
before it is too late. Several ecological and environmental economic philosophies have subsequently emerged for consideration in
decision making about resource management, especially alternative ways of valuing natural resources or environmental attributes,
that is, the contingent valuation method.

Robert Costanza, in his article about ecological economics, said: "there is an increasing awareness that our global ecological life
support system is endangered, and that decisions made on the basis of local, narrow, short term criteria can produce disastrous
results globally. Societies are beginning to realize as well that traditional economic and ecological models and concepts fall short
in their ability to deal with these problems." (in Knight and Bates, 1995, p. 323)

In his article about shifting and broadening the economic paradigm toward natural resources, Loomis points out the following:

... the principles and concerns of natural resources economics [have evolved] from one that was primarily concerned about
scarcity of commodity natural resources to a discipline that accords preservation of natural environments equal importance to
development. While the same broad principles that guide efficient use of marketed natural resources apply to natural environments
(fish, wildlife, water quality and so on), there has also been much innovation in the conceptual and empirical foundations of modern
natural resources economics (in Knight and Bates, 1995, p. 221).

Contingent valuation has, then, been practiced widely in providing information for resource management policy. Among its
popular roles, contingent valuation has been used in benefit-cost analyses of environmental regulations or programs whose
implementation might have significant impacts on the environment. It allows a new value component of the resources, the non-
market value that was neglected in the old management paradigm, to be incorporated into the decision-making
process.

II. Types of valuation technique

A number of techniques have been used in the contingent valuation approach to estimate the non-marketed value of
a specific environmental amenity or scenic resource. For the purposes of this paper, I will discuss three valuation techniques
currently practiced in the non-market valuation of environmental goods, in other words, public goods. These are direct
cost, revealed demand, and the bidding game.

2.1 Direct cost

This is a method of estimating the non-marketed benefit of reduced environmental damage based on a direct estimate of the costs
projected from that damage. The costing technique is challenging in practice: the unavailability of information, and the pricing and
accounting problems inherent in the analysis, have made it difficult to put the method into wide use. It is even more difficult
to apply this technique to the valuation of aesthetic environmental improvements, since the cost of aesthetic damage
is not explicitly reflected in the market (Randall et al., 1974).
Originally, the direct costing technique was practiced in flood control projects, where benefit-cost analysis was required before
building any physical structure, such as a dam or reservoir, to protect residential and agricultural land
from flooding. Before deciding whether to implement flood control projects, which would be very costly to society on the one hand
and would affect the ecosystem on the other, a benefit-cost analysis was conducted. In calculating the benefits and
costs of the projects, the direct cost technique was used. The technique allowed for the estimation of the cost of damage to crops
and residential properties, and of other inherent costs such as clean-up, rehabilitation, and resettlement costs (Kneese,
1984). The benefits of the projects were then extrapolated from the damage averted: the estimated flood damage is
translated into the benefits of the flood control project.
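
As a rough illustration of the direct-cost logic just described, the sketch below (Python, with entirely hypothetical damage
and cost figures) simply sums the avoided damages and compares them with the project's cost:

    # Hypothetical direct-cost illustration for a flood control project.
    # The avoided damages stand in for the project's non-marketed benefit.
    avoided_damages = {
        "crop losses":            1_200_000,
        "residential property":     800_000,
        "clean-up and repair":      300_000,
        "resettlement":             150_000,
    }
    project_cost = 2_000_000

    benefit = sum(avoided_damages.values())    # 2,450,000
    print(f"Estimated benefit: ${benefit:,}; benefit exceeds cost: {benefit > project_cost}")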

2.2 Revealed demand

This valuation technique tries to infer the non-marketed benefit from the revealed demand for some appropriate proxy. In
the case of reduced air pollution, the revealed demand for residential land is related to the concentration of air pollution
(Randall et al., 1974). To understand the concept, consider a simple example of noise pollution from a nearby airport. Assume
there are two houses that are very similar in every respect except location: one is close to the airport, and the other is
in a very peaceful and quiet environment. Which one would be more expensive? The price of the house close
to the airport will, of course, be lower than that of the other, unless buyers enjoy a noisy environment or enjoy watching
airplanes take off and land. The difference in price between these otherwise similar houses reflects the noise: if the house were
not located close to the airport, it would have the same price. In other words, if the noise pollution is abated, the benefit of
that abatement is the price difference.
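
A minimal sketch of that inference, with hypothetical prices for the two otherwise identical houses:

    # Hypothetical revealed-demand (hedonic) comparison: two houses alike in
    # every respect except proximity to the airport. Prices are made up.
    price_quiet_location = 180_000
    price_near_airport   = 160_000

    # The price gap is taken as the implied benefit of abating the noise.
    noise_abatement_benefit = price_quiet_location - price_near_airport
    print(f"Implied benefit of noise abatement: ${noise_abatement_benefit:,}")   # $20,000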

2.3 Bidding game

The bidding game is also a technique for estimating the non-market benefit of improved environmental quality or of establishing
recreation sites. Through this method, respondents are asked "to answer yes or no to the question: would you continue to use this
recreation area if the cost to you was increased by X dollars? The amount is varied up and down in repetitive questions, and the
highest response is recorded. Individual responses may be aggregated to generate a demand curve for recreation services provided by
the area" (Randall et al., 1974, p. 443).
In a simple sense, the bidding game asks respondents to react to varying bids. The bids are raised or lowered until
the respondents switch their reaction from inclusion to exclusion. To make the responses more reliable and stable,
the respondents should be actual consumers of the product rather than potential ones. The technique becomes more dependable if the
survey is conducted at recreation sites where respondents are currently engaged in the activity (Knetsch and Davis, 1966).
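
The iterative logic of the bidding game can be sketched as follows. This is only an illustration: the respondent is simulated
here by a hidden, hypothetical maximum willingness to pay, which a real survey would of course not know in advance.

    # Sketch of a bidding game: the bid is raised in fixed steps until the
    # (simulated) respondent switches from "yes" to "no"; the last accepted
    # bid is recorded as the elicited willingness to pay.
    def run_bidding_game(true_max_wtp, start_bid=5.0, step=5.0):
        bid = start_bid
        last_accepted = 0.0
        while bid <= true_max_wtp:        # respondent keeps saying "yes"
            last_accepted = bid
            bid += step
        return last_accepted              # highest bid the respondent accepted

    print(run_bidding_game(true_max_wtp=23.0))   # -> 20.0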

III. Hypothetical question modes

Several question formats have been used, both in controlled experiments and in practice. Among them are the dichotomous choice,
open-ended, payment card, and bidding game formats.
In the dichotomous choice format, a bid is offered to the respondent, who can accept or reject it, while in the open-ended format
the respondent is asked for his or her maximum willingness to pay for the good of interest,
in this case the improved environment.
The payment card is another question mode used in contingent valuation. A card with several bids
printed on it is shown to the respondent, who is asked whether any of those bids is close to his or her maximum willingness to pay.
Finally, the bidding game refers to a sequence of bids offered to the respondent so that his or her maximum willingness to pay can
be elicited (Frykblom, 1997).

IV. Application and case studies

The contingent valuation method must be applied in a clear context, and the services provided by any improved environmental
attribute have to be made known to the respondents involved in the valuation process. Emphasis must also be placed on the design of
the survey, and aspects that could potentially affect the outcome of the method must be taken into
consideration. The educational attainment and income level of respondents have a great impact on the willingness to pay for improved
environmental quality. This is most evident in developing countries, where people are often too poor to think about environmental
protection or improvement.

4.1 Case study one: Salmon restoration

This case study illustrates the benefits and costs of demolishing hydro-electric dams to allow the free passage of salmon. Salmon
are an anadromous species that require a long reproductive migration. Since the construction of the dams, migratory routes have
been blocked, and as a result salmon populations have declined drastically. Click here for the case study!

4.2 Case study two: Water allocation in Mono lake, California


This case study highlights the use of the contingent valuation method in estimating local residents' willingness to pay to
preserve endangered species threatened by negative changes in their habitat. Mono Lake is one of many
water bodies that have suffered ecological change due to economic development over the past decades. In the case of Mono
Lake, the main issue is the allocation of water use. Redistribution of the water sources that feed Mono Lake to a new user, in
this case Los Angeles, has affected the in-stream flow, resulting in changes to the aquatic habitat such as a lower flow rate,
rising water temperature, and a polluted river bed. As a result, the food supplies of migratory birds have decreased, threatening
the birds' existence. Click here for the case study.

4.3 Forest preservation in Australia

This case study illustrates the use of contingent valuation in providing information for deciding whether
to keep an unprotected forest area out of timber production and establish it as a forest reserve. Contingent valuation was used to
quantify the non-marketed benefit of the potential forest reserve. After the valuation process, it was found that the benefit of
precluding the unprotected forest from timber exploitation outweighs the cost and the benefit from the timber; the net present
value of preserving the forest is a positive $543 million. This case study can be accessed through the following fee-based web
address:
http://www.idealibrary.com/links/artid/jema.1993.1042
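
The net-present-value comparison underlying a result like this can be sketched generically. The discount rate and cash flows
below are hypothetical and are not the figures from the East Gippsland study:

    # Generic net-present-value sketch for a preserve-versus-log decision.
    # Cash flows are hypothetical annual net benefits of preservation
    # (preservation benefit minus forgone timber revenue), in $ millions.
    def npv(cash_flows, discount_rate):
        return sum(cf / (1 + discount_rate) ** t
                   for t, cf in enumerate(cash_flows, start=1))

    net_benefits = [60, 60, 60, 60, 60]          # five hypothetical years
    print(round(npv(net_benefits, 0.05), 1))     # -> 259.8; positive, so preserve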

4.4 Coastal Water Quality: A contingent approach

In this case study, contingent valuation was used to estimate the non-marketed benefit of two environmental goods: (1)
improved water salubrity and (2) preservation of the ecosystem against eutrophication.
The study found that for both environmental goods, the willingness to pay rose with the income of the
respondents. This pattern is very evident in most developing countries, where people are most concerned with their daily struggle
to sustain their lives and are too poor to think about environmental quality. For the rural poor in most developing nations,
environmental protection takes a back seat while they struggle to find enough food just to survive for
the day (Kolstad, 2000). If asked about their willingness to pay for environmental protection or pollution abatement, they
would say that those are not their immediate concerns. The result will therefore be underestimated, or perhaps a zero value will be
placed on environmental protection.
The study also found that the willingness to pay for salubrity was affected by environmental sensibility and
awareness. Respondents' willingness to pay depends very much on information about negative effects and the current
level of pollution; if they are not aware of being currently affected by the pollution, they tend to undervalue the non-
marketed benefit of environmental improvement. Finally, the willingness to pay for the preservation of the ecosystem was closely
related to the respondents' level of educational attainment: the more educated the respondent, the
higher the value they tended to place on preserving the ecosystem, and vice versa. This case study can be accessed through the
following fee-based web address:

http://www.idealibrary.com/links/artid/jema.1995.0078

4.5 The Referendum Format Contingent Valuation Method

This case study on households' valuation of alternative levels of hazardous waste risk reduction illustrates the use of
another form of contingent valuation, the mail questionnaire, to estimate households' willingness to pay to reduce the risk of
premature death from hazardous waste in the environment. The results are similar to those from contingent valuation using
in-person interviews (Pierre du Vair and John Loomis, 1999). This case study can be accessed through the following fee-based web
address: http://www.idealibrary.com/links/artid/jema.1993.1060

V. Conclusion

In conclusion, newly emerging issues in the discipline of natural resource and environmental economics require an alternative
system of decision making that is no longer based purely on the marketed values of natural resources but includes the non-marketed
value in the resources' total value. To this end, contingent valuation ranks as the most promising and exhaustive
method of estimating this non-marketed benefit. When the non-market value of the environment, or of natural resources in
particular, is included as a main part of the total value component in benefit-cost analysis, the information on which the decision
is based supports a more comprehensive and sustainable management plan. Unlike traditional, purely monetary benefit-cost analysis,
contingent valuation involves affected communities in the process of valuation. The result is, therefore, comparatively more
politically feasible.

VI. References

1. Alan Randall, Berry Ives and Clyde Eastman (1974). Bidding Games for Valuation of Aesthetic Environmental Improvement.
Journal of Environmental Economics and Management, 1 (August), 132-149.

2. Charles D. Kolstad (2000). Environmental Economics. Oxford University Press.

3. Jack L. Knetsch and Robert K. Davis (1966). Comparison of Methods for Recreation Evaluation. In Allen V. Kneese and Stephen
C. Smith (eds.), Water Research. Baltimore: Johns Hopkins Press for Resources for the Future, 125-142.

4. John Loomis (1996). Measuring the Benefits of Removing Dams and Restoring the Elwha River: Results of a Contingent Valuation
Survey. Water Resources Research, 32(2), 441-447.

5. Michael Lockwood, J. Loomis and T. Delacy (1993). A Contingent Valuation Survey and Benefit-Cost Analysis of Forest Preservation
in East Gippsland, Australia. Journal of Environmental Management, 38, 233-243.

6. Ph. Le Goffe (1995). The Benefit of Improvement in Coastal Water Quality: A Contingent Approach. Journal of Environmental
Management, 45, 305-317.

7. Peter Frykblom (1997). Hypothetical Question Modes and Real Willingness to Pay. Journal of Environmental Economics and
Management, 34, 275-287.

8. Richard L. Knight and Sarah F. Bates (1995). A New Century for Natural Resources Management. Washington, DC: Island Press.

9. Wallace E. Oates (1994). The Economics of the Environment. Edward Elgar.


Organizational Learning and Memory
Hee-Jae Cho

This page introduces the resources available for the area of organizational learning and memory on the World Wide Web.

Contents

● Organizational Learning
❍ What is organizational learning?

❍ Who has interest in organizational learning?

■ Business Pioneers in System Dynamics

■ Learning and Educational Psychologists in Psychology and Education

■ Consultants and Practitioners in Human Resource Management

■ Miscellaneous Group

❍ Locus of Learning

■ Community of Practice

● Organizational Memory
❍ What is organizational memory?

❍ Storage

❍ Retrieval

● Methods of Organizational Learning


● References

Organizational Learning

What is organizational learning?

The concept of organizational learning was proposed in the late seventies by Argyris and Schon (1978), the pioneers of
OL research. Organizational learning refers to learning behavior in an organization; it means the organization itself learns as
an independent learning organism. A large number of articles and books on organizational learning (OL) have been published
since Senge's Fifth Discipline (1990) attracted people's interest. The Center for Organizational Learning at MIT is the leader
in this area. If you are interested in the content of The Fifth Discipline, visit the Fifth Discipline Web page written by Flemming
Funch.

Who has interest in organizational learning?

Learning has been receiving a spotlight in the business world as a new management concept. Unlike past learning
research, which was usually done in psychology and education, learning research in the nineties takes a different
perspective: organizations that learn and create knowledge, or organizational learning. Many people outside psychology
and education have shown interest in learning processes. They can be categorized into four groups based on
their orientations and approaches to learning.

Business Pioneers in System Dynamics

According to Forrester (1994), founder of system dynamics and the mentor of Senge, system dynamics is a profession that
(1) integrates knowledge about the real world with concepts of how feedback structures cause change through time, and
with computer simulation for dealing with systems that are too complex for mathematical analysis; (2) can lead to an
improved framework for understanding complexity; (3) can unify the diverse aspects of society and nature by combining the
interactions between science, psychology, politics, biology, environment, economics, and management.

For more information, see The Center of Organizational Learning or INFORMS (Institute for Operations Research
and the Management Sciences).

Learning and Educational Psychologists in Psychology and Education

The second group comes from psychology and education. Researchers from psychology are trained to develop learning
theories, while those from education are usually interested in applying theories and models to diverse learning
institutions such as schools and corporations. The knowledge of these areas helps people develop tools and manuals for
the learning organization.

For more information, see Institute of Learning Science at Northwestern University and Learning Research and
Development Center at University of Pittsburgh.

Consultants and Practitioners in Human Resource Management

The third group includes consultants and field practitioners in human resource areas such as training, human resource
strategic planning, and so on. The characteristic of this group is that it depends on experience rather than theory. The
contribution of this group is the concept of knowledge management.

For more information, visit the Center for Creativity and Innovation in England and Metanoia to see what
practitioners' approaches to organizational learning look like.

Miscellaneous Group

The fourth group includes people from areas such as sociology, anthropology, public policy, and communication. They are
usually interested in how policy, communication networks, and historical characteristics affect learning processes in the
organization; in other words, their research focuses on the effects of learning environments (policy, networks,
demographics, culture, etc.) on learning. The knowledge of these areas helps us understand the effect of learners' environments
on organizational learning.

Locus of Learning

Community of Practice
Work by the Institute for Research on Learning (IRL), set up by Xerox in Palo Alto, shows that the real locus of learning, in
corporations and in life, is a community of practice--an informal group of people who have a shared interest in a subject
(Stewart, 1996).

If you want to learn more about it, refer to the Invisible Key to Success and Company Values that Add Value published in
Fortune magazine.

Go to contents

Go to Trochim Home Page

Go to Project Gallery Home Page

Organizational Memory

What is organizational memory?

Organizational memory refers to stored information from an organization's history that can be brought to bear on present decisions
(Walsh & Ungson, 1991). Walsh and Ungson also note that organizational memory is not centrally stored, but distributed across
different retention facilities.

Storage

Storage is the retention of experiences and information in memory. It is important for the organization to retain its
experiences and information for future use.

Walsh and Ungson (1991) suggested the existence of five storage bins or retention facilities that compose the structure of
memory within organizations, plus one source outside the organization. The five internal structures are individuals, culture,
transformations, structures, and ecology; the outside source is external archives. According to their model of organizational
memory structure, all six facilitate organizational memory.

Retrieval

Retrieval is the recovery of previous experiences stored in memory. It is important for organizations to
increase their ability to retrieve their previous experiences whenever they are needed.
Huber (1995) suggests computer-based organizational memory. He argues that automated capturing and sophisticated
retrieval of information can result in computer-resident organizational memories with greater completeness and precision.
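
Huber's notion of a computer-based organizational memory can be illustrated with a very small Python sketch, loosely organized
around Walsh and Ungson's retention bins. The structure and entries are purely illustrative and are not drawn from any system
described on this page.

    # Toy organizational-memory store: experiences are filed under retention
    # "bins" (after Walsh & Ungson) and later retrieved by bin and keyword.
    from collections import defaultdict

    memory = defaultdict(list)      # bin name -> list of stored experiences

    def store(bin_name, experience):
        memory[bin_name].append(experience)

    def retrieve(bin_name, keyword):
        return [e for e in memory[bin_name] if keyword.lower() in e.lower()]

    store("structures", "1995 product launch: phased rollout reduced support load")
    store("culture", "Post-mortems are written for every failed project")
    print(retrieve("structures", "rollout"))    # -> the 1995 launch entry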

Visit the Winthrop Group to see how the concept of organizational memory is applied in a real-world situation.

Go to contents

Go to Trochim Home Page

Go to Project Gallery Home Page

Methods of Organizational Learning and Memory

The methods used for researching organizational learning and memory are usually case studies, simulations, experiments, and interviews.

The Center for Organizational Learning at MIT leads organizational learning research. On their page, they present examples of
learning methods and tools.

Go to contents

Go to Trochim Home Page

Go to Project Gallery Home Page

References

Argyris, C., & Schon, D. (1978). Organizational Learning: A Theory of Action Perspective. Reading, MA: Addison-Wesley.
Forrester, J.W. (1994). Policies, decisions, and information sources for modeling. In J.D.W. Morecroft & J.D. Sterman (Eds.),
Modeling for Learning Organizations (pp. 51-84). Portland, Oregon: Productivity Press.

Huber (1995). Organizational Learning: The Contributing Processes and the Literature. In Cohen & Sproull (ed.), Organizational
Learning. CA: Sage.

Senge, P.M. (1990). The Fifth Discipline. NY: Doubleday/Currency.

Stewart, T.A. (1996). Company values that add value. Fortune, July 8: 145-148.

Stewart, T.A. (1996). The invisible key to success. Fortune, August 5: 173-176.

Walsh & Ungson (1991). Organizational Memory. Academy of Management Review, 16:57-91.

Go to contents

Go to Trochim Home Page

Go to Project Gallery Home Page

Copyright © 1997, Hee-Jae Cho. All rights reserved.

Email to hc35@cornell.edu

Revised: April 4th, 1997


Research in the Global Marketplace of Ideas

"...whoever undertakes to set himself up as a judge in the field of Truth and Knowledge is
shipwrecked by the laughter of the gods"

-- Albert Einstein

Researchers can find information, as well as publish their own home pages, on the World Wide Web. Access to and from worldwide sources requires that information be presented as text,
in a comprehensive fashion, within a web media world that embodies a wide range of sophistication in both the presentation of a home page and its content.
The medium puts a resource of vast proportions at our fingertips, but the content of each page, and its categorized interconnections with other web sites and home
pages, requires analysis to determine its value for a particular line of research.

One way to envision access on the WWW is to view a resource web page such as the PRAXIS web site. The page is a connection to other web sites on a global scale.
The "gateways," as evident in the Praxis site, include a host of resources for "Social and Economic Development" spanning categories from throughout the world.
These also include Web search engines to guide one to arenas of resources, country reports on comparative social research, and sites for examining careers in
social development work.

For purposes of this discussion, an analysis of the global implications of the web will focus on certain web resources available for use by communities in the United
States. The WWW is a form of media presentation and, as such, it may be helpful for this analysis to understand United States culture as defined by media analysts.
According to some media analysts, mass media discourse, through its prevalence in the United States, has constructed the national audience as consumers in a market-
consumer relationship. In this case, it may be helpful to examine each web page with an initial question: "What is being sold on this page?"
For the purpose of disclosing the market strategy of each web page, or any combination of web pages, it may be helpful to use a method of media discourse analysis by
answering the following questions:

● How is the text presented?


● Who is the audience for this text?
● What is the order of discourse?

In many instances, such as the Praxis site, the answers to these questions are not clear cut. However, other pages express more specific
aims. At face value, a text on the web can make a direct appeal to the audience to purchase a product: the text of the page is designed to sell products, the audience is
the consuming public with access to the WWW, and the order of discourse facilitates the page's intent to sell things.

Blatant advertising such as on The National Trading Post web site can be regarded as a cause-effect relationship which clearly could be measured based on audience
responses. However, other market strategies can be far more subtle.

One such subtle marketing strategy is something I will call the selling of "categorical assumptions." In this analysis the term "categorical assumption" refers
to the responses or "matches" that come up under a particular keyword search. The first categorical assumption is made when the individual assumes that a topic to be
searched falls under a particular subject word or heading. We are convinced or "sold" on a particular topic heading for a search. From the web search, "matches" are
found to provide gateways to other sites. At this point, the matches can imply another categorical assumption not intended in the initial search: matches can imply that
an additional category is assumed.

For instance, in YAHOO, a topic search under "economic development, women" calls up one United States site. From this particular site, one could assume that women
and economic development are topics belonging exclusively to the domain of business. In a similar search, using the words "economic development, feminism," there is only one
match; the categorical assumption here is that feminism and economic development coincide with ecological concerns. In each case the text implies another
category that was not intended in the search. In a sense, the text of the search results is designed to appeal to an audience interested in women and feminism in economic
development, but in the discourse, in each case, another concept is "marketed" along with the original topic heading. If these additional concepts remain unquestioned
by a researcher, then the search does not discriminate between the two concepts and is therefore unreliable.

As researchers we need to take web searches into careful consideration given the text analysis above. If we accept one concept and simply assume another throughout a
study then the results could be misleading. The World Wide Web offers much material for research purposes. However, categorical assumptions are a tremendous
threat to validity since there is a particular cause-effect relationship in the "selling" of an additional concept which reduces the researcher to a market consumer without
a critical awareness of the material at hand. In this way, the text can manipulate the researcher within the research design. I will make my argument clearer as I
demonstrate the "categorical assumptions" in my own research on the World Wide Web.

Increasingly, as a brief illustration, the world is becoming an arena for the selling of economic zones. The North American Free Trade Agreement (NAFTA), as one
example, has its own web site for information on business and opportunity. Emerging countries such as Estonia have web pages as well for their economic promotion,
as do arch-rival United States neighbors such as Cuba, and even states within countries seeking to promote themselves within the NAFTA arena, such as Morelos in
Mexico. The list goes on and on. Web sites are a trend offering more accessible marketing strategies for persons and entities researching opportunity. In
the United States, again as an example, municipalities are also addressing the need to be visible in a global economy through the development of their own unique
home pages on the WWW, such as those now available from St. Louis, Chicago, and Los Angeles, as well as states such as New York State and smaller, more local
communities such as Ithaca, New York.

The proliferation of United States state and municipal home pages on the web coincides with a national agenda which has emphasized the need for community
revitalization through empowerment zones. The purpose of empowerment zones (EZ/EC) is stated in the research priorities of the Department of Housing and Urban
Development (HUD), "Empowerment zones will be a primary tool for central-city economic revitalization ... the department will foster and support community based
initiatives for economic development, social service delivery, and housing renovation". In the interest of developing empowerment zone communities, the U.S.
government has offered a web site to facilitate a community's own construction of a home page through a toolbox the government offers on the WWW.

According to the text of the HUD economic plan, the federal government's agenda for community empowerment contains at least two categories: economic development and social service delivery. The revitalization plan offers opportunities for communities to market themselves on the WWW, since "revitalization" will encompass attracting new jobs to address the needs of people who are now on public assistance. Indeed, community development is an important agenda, according to Robert Reich, Secretary of Labor. The economy is now a competitive "global web," and global visibility on the World Wide Web is designed to market the community to enhance economic development. In addition, according to Reich, the competitive business of seeking out jobs in the global marketplace has required that taxpayers pay corporations to move to U.S. communities. In New York State, the cost to taxpayers is between $9,000 and $53,000 per job per year, year after year (Lynch, 1996, p. 16). The question here is: what are the categorical assumptions of "economic development" as marketed by communities on the World Wide Web if corporate welfare is required to attract private industry to an area? Also, if public money is funding private industry as public programs decline, what does this mean for research on human services and social service delivery? (For information on present efforts for welfare reform see the content of the current welfare reform bill, HR 4, the "Personal Responsibility and Work Opportunity Act of 1995," now before Congress.) These issues -- the categorical assumptions of economic development through corporate welfare and of social service delivery as part of public social programs -- do not appear to be clearly stated on the government's empowerment zone pages on the web.

Research and research methods in the global marketplace of ideas will be challenging. The argument above is stated to initiate a particular dialogue toward an analysis
of the use of the WWW for global agendas that impact on local communities. As an important medium for access to the world in research, the WWW is also an
important instrument that can be used to question, investigate and interrogate its own uses.
If you wish to express your thoughts, you may send a message by e-mail to Eileen Robertson-Rehberg, this page's author, at ear5@cornell.edu.

Bibliography:
Backhouse, Roger E. Economists and the Economy: The Evolution of Economic Ideas, 2nd Edition. London: Transaction Publishers, 1994.
Fairclough, Norman. Media Discourse. New York: Edward Arnold, 1995.
Lynch, Robert G. Do State and Local Tax Incentives Work?. Washington, DC: Economic Policy Institute, 1996.
Reich, Robert B. The Work of Nations: Preparing Ourselves for 21st-Century Capitalism. New York: Vintage Books, 1991.
Evaluating Nutrition Programs

Nutritious diets are related to reduction of disease and the improvement of health and well-being:

    35% of cancer deaths are linked to diet (U.S. National Cancer Institute estimate, 1997). ... The best way to help lower your blood
    cholesterol level is to eat less saturated fatty acids and cholesterol, and control your weight (The American Heart Association, 1997).

As a result, many nutrition education programs have been developed. Goals of these programs are often to provide information
and ultimately to change participants' dietary behaviors.

How do we know if nutrition education programs "work?" What evaluation methods are used?

This page provides information and links to other web sources on the topics of:

Nutrition Program Evaluations

Evaluation Methods

Nutrition Resources on the Web.



Nutrition Program Evaluations

The following links are to full-text research articles on nutrition evaluation topics available online. The main source that I found was the Journal of Extension (about), accessible via gopher. The journal has the following copyright message: "Single copies of articles may be reproduced in electronic or print form for use in educational or training activities." The aim of this page is primarily educational, so the articles are included as links.

Evaluating Evaluation--What We've Learned (Research in Brief). "This paper reports the results of telephone surveys with Extension nutrition specialists and field staff who participated in IIP. Although two separate surveys and analyses were used, both examined the use of nationally developed evaluation instruments focused on nutrition impact indicators and the perceived importance of the program evaluation process in general" (from abstract). By K. Chapman-Novakofski, Extension Nutrition Specialist, University of Illinois, Urbana, Illinois, et al. Journal of Extension, Volume 35, Number 1, February 1997.

The Effect of Nutrition Education on Improving Fruit and Vegetable Consumption of Youth (Feature Article). "The high rate of cancer deaths in two rural towns in northeast Colorado prompted community action and Extension intervention. Citizens, aware that nutrition and eating practices could lower cancer risk, contacted their Colorado State University Cooperative Extension agent for program possibilities. A team was formed to work in these remote small towns to improve nutrition, diet, and health using the 5 A Day message" (from abstract). By Linda Ryan, M.S., R.D., Ph.D., et al., Food Science and Human Nutrition, Colorado State University, Fort Collins, Colorado. Journal of Extension, Volume 33, Number 5, October 1995.

Extension Programming to Educate the Elderly about Nutrition (Research in Brief). "An important issue for the Cooperative Extension System is to assist families with responsibility for dependent elderly by promoting use of community-based programs and resources. This study compared self-perceived problems of life quality and diets of participants in Title III congregate and home-delivered meal programs. The purpose was to generate ideas for collaborative education efforts between Extension and government-sponsored elderly nutrition programs using self-assessed nutrition and life quality issues of the clientele" (from abstract). By Georgia C. Lauritzen, Ph.D., R.D., et al., Department of Nutrition & Food Sciences, Utah State University, Logan, Utah. Journal of Extension, Volume 32, Number 2, August 1994.

Expectations May Be Too High for Changing Diets of Pregnant Teens (Research in Brief). "Expecting to change the diets of low income pregnant teenagers through less than nine months of nutrition education may be as unrealistic as it is desirable. Instead, we may need to look at other positive changes resulting from nutrition education and hope for dietary changes to follow in the future" (from abstract). By Holly Alley, M.S., R.D., L.D., et al., University of Georgia Cooperative Extension Service, Athens. Journal of Extension, Volume 33, Number 1, February 1995.

Evaluating Extension Program Effectiveness: Food Safety Education in Texas (Feature Article). "Budget constraints have caused federal and state legislators to carefully examine funding requests in terms of the effects or outcomes of particular programs. Those programs most likely to be funded are ones which give the greatest return for the dollars spent. Thus, in order to receive funding in the future, Extension personnel will need to continue developing and improving strategies to inform legislators about the value of their programming efforts" (from abstract). By Peggy Gentry-Van Laanen, Nutrition Specialist, Texas Agricultural Extension Service, Texas A&M University, College Station. Journal of Extension, Volume 33, Number 5, October 1995.

Post-Then-Pre Evaluation: Measuring Behavior Change More Accurately (Feature Article). "What's an easy, simple, reliable, and valid way to measure whether a program has impact? This question is asked frequently by Extension agents and specialists as they respond to accountability needs within the Extension organization. The "post-then-pre" method of self-report evaluation offers one solution" (from abstract). By S. Kay Rockwell and Harriet Kohn. Journal of Extension, Volume 27, Number 2, Summer 1989.
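As a rough illustration of the post-then-pre idea, the sketch below uses invented ratings: at the end of a program each participant rates the behavior as it is now (post) and, looking back, as it was before the program (the retrospective "then" rating), and the difference between the two is the self-reported change.

```python
# A rough sketch, with invented ratings, of post-then-pre scoring: each participant
# rates the behavior as it is now (post) and retrospectively as it was before the
# program ("then"); the difference is the self-reported change.
post = [4, 5, 3, 4, 5]   # hypothetical 1-5 ratings at the end of the program
then = [2, 3, 2, 3, 3]   # hypothetical retrospective ratings of pre-program behavior
changes = [p - t for p, t in zip(post, then)]
print("mean self-reported change:", sum(changes) / len(changes))   # 1.6
```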

Integrating Evaluation Into Teaching (Ideas at Work). "One traditional Extension evaluation tool has been pre- and post-tests. Knowledge increase is traditionally less important than a positive behavior change. The characteristics of learners attending classes can make traditional evaluation inappropriate" (from abstract). By Sonia W. Butler, Extension Home Economist, Rutgers Cooperative Extension of Ocean County, Toms River, New Jersey. Journal of Extension, Volume 29, Number 1, Spring 1991.


Evaluation Methods

The following are links to sites with background and methodological information about evaluation:
General:
Bill Trochim's Center for Social Research Methods
Quantitative Evaluation:
Qualitative Research and Evaluation:
Q.D.A. Resources
Why do qualitative research?
Qualitative Research Text Resources
Using Hierarchical Categories in Qualitative Data Analysis


Nutrition Resources
The following links are to nutrition-related sites (which often contain further links to specific nutrition related issues):
General Nutrition:
ADA Nutrition Links
Index of Food and Nutrition Internet Resources
Center for Nutrition Policy Promotion
World Health Organization Headquarters' Major Programmes Page
Fruits and Vegetables:
Eat More Fruits and Vegetables!
Fruits and Vegetables
Fruits and Vegetables in Disease Prevention
FDA Consumer--Fruits and Vegetables: Eating Your Way to 5 A Day


Send comments to: srd2@cornell.edu

Copyright ©: 1997. Sally R. Durgan. All Rights Reserved


Web Resources of Nutrition for the Elderly

Current and projected demographic increases in the elderly population contribute to growing interest in the nutritional health of older adults and to the urgency of a number of research questions. This page provides some links to various internet sources related to questions of nutrition for the elderly.

■ Introduction
■ What is Aging?
■ Effect of Aging on Nutrition
■ Goals of nutrition for the elderly
■ Nutrition evaluation of the elderly
❍ Limitations of Nutritional Evaluation

❍ Nutritional Screening

■ Nutrition program for the elderly


■ Reference

Thanks for coming!


Jung Sun Lee
School of Medicine, Division of Geriatric Medicine
University of Pittsburgh
Introduction

The recent Census Bureau report contained some sobering statistics and graphics concerning future demographic trends of the elderly population in the United States. As the increasing lifespan places greater responsibility on older people, those in the nutrition and health care field will face the challenge of providing health and nutrition education that will promote wellness and assist older persons in maintaining their independence. At the same time, it will be necessary to plan appropriate nutrition services for those individuals in declining health who require specialized care.
These projections are more than just numbers. This information concerns the lives of real people with particular needs and resources who will require health and nutrition services beyond those now practiced or even understood in our society today. Professionals and others who deal with older Americans need a view of nutrition that focuses on living, quality of life, and function rather than solely on the clinical aspects of treatment. More attention should be given to achieving greater integration of preventive and curative health services and to assuring that the social, emotional, and economic supports all of us need as we grow older are provided.
This Web site tries to integrate some electronic resources that link these kinds of issues.



What is Aging?

The aging process is multifactorial and results from the combined effects of inherited (genetic) and acquired factors, including lifestyle, food habits, physical activity, and diseases. These factors give rise to the various approaches in aging research.
The following are diverse aging resources on the web. Most of them aim to give a general perspective on aging.

❍ Andrus Gerontology Library Web Resources on Aging

This site provides extensive links related to aging. It also provides useful research bibliographies on aging.

❍ Directory of WEB and Gopher Sites on Aging

This is a government site on aging. It includes important information about governmental programs related to the elderly.

❍ Administration on Aging (AoA)
Information is provided about the Administration on Aging and its programs for the elderly, along with statistical information on aging. The site also includes a link to AoA's National Aging Information Center and extensive links to other aging-related Web resources.

❍ The Aging Research Centre (ARC)

This site is dedicated to the study of the aging process. In particular, it includes RealAudio and radio programs on the biological aging process.

❍ Wellness--Health Care Information Resources

This site gives a list of aging links in not only the United States but also Canada.

❍ Nutritional Health of Elderly New Yorkers

This site presents selected findings from the New York State Elderly Nutrition Survey, a statewide telephone survey for assessing the extent of nutritional health risk among elderly New Yorkers.

❍ Glossary of Aging Terms

A basic vocabulary list related to aging.



Effect of Aging on Nutrition

Aging produces physiologic changes that affect the need for several essential nutrients. While the impact of age-
related alterations in physiology and metabolism has been extensively assessed in pharmacological studies, it
has only been within the last two decades that much research has been conducted to define the impact of these
changes on human nutritional requirements.

The nutrition status of the elderly is also dependent on social conditions and is influenced by the long-term
effects of chronic disease and the intake of medication, which can sometimes generate undesired interactions
with nutrients.
The physiologic changes of aging, including perceptual, endocrine, gastrointestinal, renal, and muscular
changes, may affect nutrition needs.

1. Energy needs decrease with age as the result of changes in basal metabolism and physical activity.
Basal needs are not under the individual's control, but energy expenditure in physical exercise varies
according to individual activity patterns. Physical activity can play a major role in maintaining energy
balance.

2. Sensory impairments that occur with aging may result in consumption of a more monotonous diet. Taste and smell function in older adults shows a progressive decline with age.



Goals of nutrition for the elderly

Good nutritional status contributes immeasurably to the quality of life of our nation's elderly population. There are several web sites dealing with this topic:

■ Importance of Good Nutrition of the Elderly

■ Better Eating for Better Aging

■ In addition, USDA provides several web sites about dietary recommendations, such as:

❍ Dietary Guidelines for Americans

❍ Food Guide pyramid

❍ Healthy Eating Index


These sites, however, are directed at the general population rather than at the elderly.

America's latest seminal document on health promotion and disease prevention goals for the year 2000, Healthy People 2000, presents worthy health goals for aging Americans.

The overall goal is to prevent older Americans from becoming disabled and to help those with disabilities prevent further declines and preserve function. More specifically, the hope is to increase the span of healthy life, that is, life that permits independent function, not just a longer life. Several of the year 2000 goals for maintaining vitality and independence in older people are particularly relevant with respect to nutrition. The year 2000 nutrition goals chiefly address health status and risk reduction.



Nutrition Evaluation of the Elderly

The nutritional status of elderly persons is determined by their nutrient requirements and intake, which are themselves influenced by other factors, such as physical activity, lifestyle, family and social networks, mental and intellectual activity, disease states, and socioeconomic constraints.
Any meaningful evaluation of nutritional status must therefore include information on these factors to help in understanding the etiology of possible deficiencies, designing corrective interventions, and evaluating their effectiveness.
Frequently asked questions regarding nutritional evaluation of the elderly can be summarized as follows:

1. Do the elderly differ from the rest of the population with respect to nutritional parameters?

2. What factors increase the potential for malnutrition among the elderly within institutions?

3. How valid and reliable are present techniques in diagnosing malnutrition?

4. Are tests more accurate than "global assessment" in predicting outcomes?

5. Does appropriate nutritional evaluation and management affect the outcome of concurrent disease?

One prerequisite for answering such questions is the development of a body of knowledge describing the nutritional status of healthy elderly people--their health condition as influenced by the intake and utilization of nutrients. Nutritional status information is collected by integrating nutritional biochemistry and physiology with the applied sciences of human nutrition and gerontology.

Unrecognized undernutrition is a frequent Achilles' heel among the elderly that can cause the condition known
as failure to thrive and can trigger a domino effect to further decrease physical health and psychological
function in the elderly.



Limitations of Nutritional Evaluation

Global evaluation of physical, mental, and social states before treatment and readaptation is fundamental to the care of the elderly, both to assess health problems and to restore autonomy. Management after geriatric assessment helps to improve the survival and functional status of the elderly.

Simple and rapid screening tests for functional evaluation in geriatric evaluation programs are in use for testing mental faculties (Mini-Mental State Examination), autonomy (activities of daily living and instrumental activities of daily living), gait and balance (Tinetti gait and balance scale), and emotional state (Geriatric Depression Scale). Nutrition evaluation, however, is usually absent. This is partially explained by the lack of a specifically validated scale to assess the risk of malnutrition in the elderly.

In addition, consideration of the new evidence of changed nutrient requirements associated with aging raises
the important issue of defining appropriate criteria for the selection of RDAs, because

1. Studies of older individuals have been few;

2. The RDA for persons over 50 has been extrapolated from that developed for young adults. For the majority of nutrients, recommended intakes are the same for younger (age 25 to 50) and older (age 51 and over) adults;

3. There are no adjustments in the recommended intakes of any nutrients beyond age 51. Many of the criteria employed by the Food and Nutrition Board of the US National Academy of Sciences lack the sensitivity to detect subtle nutrition-sensitive alterations in metabolism which have significant consequences for the aging process, or place little weight on the risk factors for chronic diseases common among the elderly.

Thus, one critical issue is whether nutritional requirements should be adjusted on the basis of observed age-
associated changes in body composition and physiologic function or whether optimal body composition and
levels of function should be determined for different age groups and nutrient intakes designed to achieve them.
The absence of validated age-adjusted values for anthropometric, biochemical, and clinical standards has always raised measurement issues; in particular, nutrition evaluation for the elderly is bound up with the construct validity of measurement, and there are several kinds of threats to this validity.

It is obvious that only those evaluations that use appropriate standards for age and sex are likely to be meaningful. At the same time, the caregiver must be aware of the intrinsic limitations of most tests. While repetition of tests may improve reliability, sensitivity and specificity are best enhanced by a more holistic approach to evaluation and an awareness that factors other than malnutrition may affect nutritional indicators, because the general condition of the elderly and the practices of the examiner also can have a critical effect on nutritional status.
Every effort must be made to identify potentially detrimental practices that might compromise the achievement of successful outcomes.



Nutritional Screening

Nutritional status has traditionally been assessed through the use of anthropometric measurements, biochemical indexes, clinical evaluation, and dietary analyses. These methods may be invalid and unreliable for elderly persons for four reasons:

1. Data from elders are compared with normal values for young persons;

2. Physical conditions make some measurements difficult or skew results;

3. Recommended dietary allowances (RDAs) are indiscriminate for those over 50 years of age;

4. Nutrient deficiency signs and symptoms are inappropriately attributed to normal aging.

Moreover, standardized normal values and measurements do not consider the effects of ethnicity, income, and education on nutritional risk, and little is known about nutritional risk in the oldest-old. Nonetheless, identifying an elder who is at nutritional risk, and the factors that contribute to that risk, makes it possible to reduce existing nutritional risk, to prevent health complications, to avoid the associated health care costs that are due to poor nutrition, and to improve quality of life.

Recently, efforts have been directed toward finding and using inexpensive, efficient, and comprehensive methods of evaluating nutritional risk across all ages of elders, specifically noninstitutionalized elders. Evaluation of nutrition status is important for any nutrition or dietary intervention. Global evaluation of nutrition status is composed of a synthesis of information, including a clinical evaluation, a dietary history, an anthropometric evaluation, and a biochemical evaluation. Until recently, no nutrition screening tests were available.

Lately, three different types of nutrition screening have been developed. It is not easy, however, to find many web sites that give information about these methods beyond a few simple documents. Following is a list of nutrition screening methods and their related web resources.

■ First is the Public Awareness Checklist of the Nutrition Screening Initiative. This simple test is aimed at increasing the nutrition awareness of elderly people but is not used to diagnose malnutrition. The Screening Initiative was formed in early 1990 as a 5-year multifaceted campaign to promote nutrition screening and better nutritional care in the United States.
The Initiative proceeds from the premise that better nutritional care can improve quality of life, facilitate aging in place, promote health, and improve outcomes when people are ill or injured. Good nutritional status can shorten hospital stays and delay entry into nursing homes.
The Initiative developed and widely distributed three screening tools--the Determine Your Nutritional Health Checklist and the Level I and Level II nutrition screens--along with the Nutrition Screening Manual for Professionals Caring for Older Americans. The Initiative validated the usefulness of the Checklist as a public awareness tool. Public-service announcements and posters were distributed to encourage the general public both to pay more attention to their nutritional health and to expect the health-care system to assess their nutritional status more routinely. The Initiative believes the interventions represent the strongest interdisciplinary actions that can be taken based on the current state of nutrition and medical science and practice relating to nutrition screening, assessment, and care. (A minimal sketch of checklist-style scoring appears after this list.)
❍ "You are what you eat"

❍ "Food Nutrition Newsletter"

■ Second, the Subjective Global Assessment (SGA) and the Prognostic Nutrition Index are aimed at evaluating the nutrition risks of hospitalized patients. These tests are essentially evaluations of the risk of complications. The Nutritional Risk Index was developed in the early 1980s, when the traditional methods of assessment were expensive, inefficient, and inappropriate for measuring the nutritional status of elders. Prospective analysis of the hospital courses of general medicine patients with varied diagnoses indicated that patients with adverse outcomes were malnourished upon admission or were at risk for developing malnutrition. In light of this finding, the 16-item Nutritional Risk Index was developed. It was found to be valid and reliable with urban community-dwelling elderly 65 years old or older and with ambulatory, elderly male veterans. However, the validation studies did not examine nutritional risk among minority elders or among the oldest-old, particularly centenarians.

❍ "What is SGA of nutritional status"

❍ "SGA of Nutritional status:further validation"

■ These tools, however, are not aimed at screening for risk of malnutrition in the elderly within general practice, at admission to nursing homes, or for the frail elderly. To complement these screening tools, Yves et al. developed a simple tool to assess the risk of malnutrition in the elderly, called the Mini Nutritional Assessment (MNA). The MNA is validated as a practical, noninvasive tool allowing rapid evaluation of the nutrition status of the frail elderly.

❍ "Assessing the Nutritional status of the Elderly: The MNA"



Nutrition Program for the Elderly

There are several web sites that introduce and evaluate nutrition programs for the elderly. These are usually produced by government agencies and institutions concerned with aging.

■ Food Assistance Program

❍ Food Stamp

❍ Food Distribution Program

❍ Nutrition Program for the Elderly

■ National Nutrition Program for the Elderly

■ Home-Delivered Meals
Evaluation published by US Administration of Aging

■ Nutritional Health of Elderly New Yorkers


This site presents selected findings from the New York State Elderly Nutrition Survey.

■ Program Evaluation
In this site, program evaluation abstracts and documents can be obtained by searching the document
index page using key words.



Reference

1. H. James Armbrecht, John M. Prendergast, and Rodney M. Coe. Nutritional Intervention in the Aging Process. New York: Springer-Verlag, 1984.

2. Sue R. Williams and Bonnie S. Worthington-Roberts. Nutrition Throughout the Life Cycle (2nd ed.). Mosby Year Book Inc., 1992.

3. Reva T. Frankle and Anita L. Owen. Nutrition in the Community: The Art of Delivering Services. Mosby Year Book Inc., 1993.

4. Margaret D. Simko, Catherine Cowell, and Judith A. Gilbride. Nutrition Assessment: A Comprehensive Guide for Planning Intervention (2nd ed.). ASPEN Publishers Inc., 1995.

5. Hamish Munro and Gunter Schlierf. Nutrition of the Elderly. New York: Raven Press, 1992.

6. U.S. Administration on Aging Symposium. Nutrition Research and the Elderly. Nutrition Reviews 1994;52(8, part II).

7. U.S. Administration on Aging Symposium. Nutrition Research and the Elderly II. Nutrition Reviews 1996;54(1, part II).

8. Barbara M. Posner, Kevin W. Smith, and Donald R. Miller. Nutrition and health risks in the elderly: The Nutrition Screening Initiative. American Journal of Public Health 1993;83(7):972-978.

9. Jane V. White, Barbara M. Posner, and David A. Lipschitz. Nutrition Screening Initiative: Development and implementation of the public awareness checklist and screening tools. Journal of the American Dietetic Association 1992;92(2):163-167.

10. Sharon M. Nichols-Richardson, Mary Ann Johnson, and Leonard W. Poon. Demographic predictors of nutritional risk in elderly persons. The Southern Gerontological Society 1996;15(3):361-375.

11. Karen M. Chapman, Joan O. Ham, and Robert A. Pearlman. Longitudinal assessment of the nutritional status of elderly veterans. Journal of Gerontology: Biological Sciences 1996;51A(4):B261-B269.



Welcome to a collection of nutrition education resources

In this document you will find links to various aspects of nutrition intervention programs. In the 1990's, people's awareness and understanding of wellness have increased.

The U.S. Government has become involved by creating Healthy People 2000, a set of national health promotion and disease prevention objectives for the nation. Health promotion includes numerous aspects, one of which is nutrition.

In order to reach a great number of people using limited resources, many governmental and organizational nutrition programs are being carried out in communities and worksites. The effectiveness of programs delivered to such populations is often difficult to evaluate because of confounding factors such as societal trends, as well as process evaluation issues. Although there currently is limited information about specific nutrition intervention methods on the World Wide Web, there is a wealth of nutrition information that can be incorporated into intervention designs.

There are several abstracts that deal with specific evaluation issues of nutrition intervention studies, which are listed below. The full
text of these documents can be received by mail.

● Evaluation techniques for performing studies:


❍ in Public Health.

❍ Sampling and variables in nutrition studies

❍ Survey methods in health research

❍ Nutritional surveying targeted to Native Americans

❍ Health and Human Services evaluation studies

● Review of Worksite Intervention Results:


❍ In the cafeteria - the Healthy Menu Program

❍ In overall health -1992 National Survey of Worksite Health Promotion Activities

❍ With regards to implementation - Healthy Difference Report

❍ Ways to successfully integrate programs. - Unified Approaches to Health Communication

❍ Delivery of programs up to 1979 National Conference on Health Promotion Programs in Occupational Settings

● Review the effectiveness of:


❍ Health and Human Services Cancer Program

❍ Center for Disease Control's Program Wonder

❍ Prevention Programs

❍ Past trials

● Using failings for future research:


❍ Failure to meet people's needs

❍ Problems in information dissemination

The previous abstracts are a sampling of ones that can be obtained by searching the document index page using key words such as
Health Promotion, Nutrition Education, Information Dissemination, and Food and Nutrition.

Another site for evaluation abstracts is the Health and Human Services Evaluation and Research Page in which searches can be done
on issues of evaluation. The Dietary Guidelines Advisory Committee has a great amount of information outlining current dietary
guidelines and ways to deliver nutrition messages. The U.S. Department of Agriculture provides dietary recommendations as well as
a large list of references dealing with the effectiveness of past interventions. This is a good starting place to collect references of
specific studies and intervention techniques.

In determining what areas to target interventions toward, it is often helpful to determine the current status of the population to be studied. There are several measurements, such as the 1995 Gallup Poll on what people are eating, and recommendations based on those results.

With regard to implementing intervention programs, there are numerous educational materials and information available through the WWW. General searches can be made through Yahoo by typing key word searches. This takes time and perseverance in order to determine where reliable, accurate, and good information is located.

Some specific educational materials that can be used to deliver nutrition intervention messages include:

● Food Pyramid information and a vast amount of other educational nutrition material.
● Graphic Food Pyramid and other information with a large amount of graphics.
● Programs for K-12 school children and specific programs for diet analysis, as well as links to other sites
● Food Safety issues and links to other sites.

Another great source provides links for various international, national, and state organizations dealing with health. An extensive
listing of electronic nutrition resources provides information on obtaining these databases and connections but provides no direct
links itself. There is also a good starting place to get information from food biotechnology to dietary tips.

For access to governmental agencies that deal with nutrition issues:

● Department of Health and Human Services


● Food and Drug Administration
● Food and Drug Administration's Center for Food Safety and Applied Nutrition
● Food and Nutrition Information Center
● Library of Congress
● National Institutes of Health
● US Department of Agriculture
Participatory Evaluation
information compiled by L. Coffin

● What is Participatory Evaluation?


● How is it done?
● What have other organizations done?
● Read more!

In international rural development circles, participation is one of the words heard most often. People are talking about Participatory Action Research, Participatory Rural Appraisal, Participatory Monitoring and Evaluation, and the list goes on and on. This site is dedicated to Participatory Evaluation.

Most development organizations have evaluation built into their projects. Participatory Evaluation is one method of performing this task, and it can be a very effective way of getting at the "real" results of a program.

What is Participatory Evaluation?

Many organizations and individuals have defined participatory evaluation. The Institute of Development Studies (IDS) has compiled information about what it is and much more. The Harvard Family Research Institute has a very informative article on the theory and its roots. The Harvard Family Research Institute has also published other informative articles explaining the theory of participatory evaluation. The Action Evaluation Project has called it empowerment evaluation and has described this variation on the theme.


How is it done?

There are many guides and ideas on how to perform participatory evaluations with groups and communities. Organizations such as The Food and Agriculture Organization of the United Nations (FAO) and the United Nations Development Program (UNDP) have guides (UNDP-guide 1) (UNDP-guide 2). Even Cornell professors have this type of information out there in cyberspace. Another great site is Eldis.


What have other organizations done?

In addition to all the information on what participatory evaluation is and how to do it, there is also information on the results
of undertaking this method. Here are a few case studies:

❍ Haiti
❍ El Salvador
❍ a Canadian school
❍ Argentina


Read more!

And of course there are mountains of resources to use!

❍ MORE

DECISION AND JUDGEMENT
ANALYSIS

Decisions and Uncertainty!?!


"An approximate answer to the right question is
worth a great deal more than a precise answer
to the wrong question"- John Tukey.

Please e-mail me with any links, information, or comments about this web page.

Welcome to my WEB site on the decision and judgment sciences. I hope that you
will find the information on books, courses, and other web sites useful.
Judgment can be defined as the application of knowledge to the process of
forming an opinion or evaluation by discerning and comparing. A decision can
be defined as a determination arrived at after consideration. Judgments and
decisions are made in various types of research, as well as in everyday life.

There are numerous schools of thought, theories, and methodological approaches to judgment and decision analysis. Some theories take root in economic utility theory, game theory, or probability theory, assuming a rational, fully informed decision/judgment maker. Other theories address irreducible uncertainty, risk, and uninformed decision/judgment makers. The theories are endless (hopefully the links listed below will address the theories and methodologies you are interested in). Decision and judgment analysis has over the years become a crucial part of the medical, business, computing, psychological, and policy and program making/evaluating fields. There are now computer packages and research centers dedicated to the decision sciences. My area of interest is in a methodology called Judgment Analysis (more specifically, Social Judgment Theory, or SJT). The theory behind SJT can be traced to psychologist Egon Brunswik's Probabilistic Functionalism in conjunction with regression-based statistical analysis. SJT, more specifically, is a methodology that compares a person's judgment to distal criteria (i.e., a standard) or to the judgment of another person. The judgments are made from a set of cues. The proper cues can be determined by experts in the area. Concept mapping is one way for the experts to gather ideas and decide on the appropriate cues. Social Judgment Theory can be applied to many fields: educational and medical decision making, accounting, and program evaluation, to name a few. There are many other decision approaches based on different theories and methodologies.
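As an informal illustration of the regression step behind Judgment Analysis, the sketch below uses synthetic data: a judge's ratings are regressed on the cue values for each case to estimate the weight the judge places on each cue. The cue names, weights, and data are all hypothetical.

```python
# A minimal sketch, on synthetic data, of the regression step behind Judgment
# Analysis: a judge's ratings are regressed on the cue values for each case to
# estimate the weight the judge places on each cue ("policy capturing").
import numpy as np

rng = np.random.default_rng(0)
n_cases = 40
cue_names = ["symptom severity", "patient age", "social support"]

cues = rng.normal(size=(n_cases, len(cue_names)))        # cue values for each case
true_policy = np.array([0.7, 0.2, 0.1])                   # the judge's (unknown) cue weights
judgments = cues @ true_policy + rng.normal(scale=0.3, size=n_cases)

# Ordinary least squares recovers the judge's cue weights from the judgments
X = np.column_stack([np.ones(n_cases), cues])
beta, *_ = np.linalg.lstsq(X, judgments, rcond=None)
for name, weight in zip(cue_names, beta[1:]):
    print(f"estimated weight on {name}: {weight:.2f}")
```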

There are some journals that are largely devoted to research on the judgment and decision sciences: Medical Decision Making is a good source. Many other prominent medical journals (such as The American Journal of Medicine) and other types of journals also carry articles that pertain to SJT and other decision approaches, owing to the popularity of and need for decision making. I have found some articles that pertain specifically to Social Judgment Theory.

Some of my favorite books on the decision sciences are: The Theory of Choice by Heap, Hollis, Lyons, Sugden, and Weale is a great introduction to decision science terminology from an 'economic' point of view, and provides a general overview of 'rational choice theory'. Judgment Analysis by R.W. Cooksey provides a detailed description of the theory and applications of judgment theory and analysis. Human Judgment and Social Policy by K.R. Hammond provides an overview of uncertainty and the need for human judgment; I found this book very informative and easy to read. Finally, there is another book that I just heard about, Making Hard Decisions: An Introduction to Decision Analysis (2nd edition) by R.T. Clemen; click here for an outline Table of Contents for Making Hard Decisions. There is also a web site, Principia Cybernetica Web, that has a great selection of decision theory terminology (defined) to get you started.

Summer programs with Decision/Judgment related courses

● Harvard School of Public Health

Has classes in medical decision making and health policy issues.

● Erasmus Summer Programme 1997-Netherlands

Has classes in health services research, including medical decision making and health policy issues.
● Society for Judgment and Decision Making Homepage

● Society for Medical Decision Making

The home page for the Society for Medical Decision Making--a great source!

● Decision Analysis Society Home Page.

● Decision Analysis Books.

A List of books recommended by the Decision Analysis Society.

● Aliah

Aliah's comprehensive web site on the decision and judgment sciences.

● Medical Decision Making at University of Illinois Chicago


● Brunswik Society

A WEB Page on some aspects of social judgment theory.

● Georgia State Decision Sciences Home Page

Information about their graduate program, and research.

● Dartmouth Shared Decision Making Programs

● Decisioneering Corp.

Home page for Decisioneering Corp.; information on computing packages and other decision-making tools.

● Lumina Decision Systems

Good information on artificial intelligence, Bayesian networks, and other decision science research facilities.

● Decision Analysis and Support Systems

Methods for the approximate specification of preferences.

● Decision Precision Home Page

Courses on decision analysis.

● Social Judgment Theory and the Lens Model

Modelling factors which influence the judgment of amount of risk.

● Lawrence Livermore National Laboratories Decision Science and Research Systems

This web page was done as a project for Cornell University's Human Service Studies 691, Research Methods in Program Evaluation. To look at other web pages from this class, see the Project Gallery Home Page.

Rhonda BeLue

Cornell University
Department of Policy Analysis and Management
Ithaca, New York 14850.

3/31/97
A Sample of Decision Approaches

Now Let's Decide on a Decision Approach

I learned about a variety of decision approaches by reading Judgment Analysis by Ray Cooksey and The Theory of Choice by Hargreaves-Heap et al. I will describe a few different kinds of decision approaches, so that you can compare and contrast them with Social Judgment Theory.

Let's start with Signal to Noise Theory: Signal to Noise theory is a methodology that models a person's ability to detect a signal or event against any noise or other events going on. The response is usually binary (yes/no) when using this method. The point where the noise probability distribution and the signal-plus-noise distribution intersect defines the decision to be made. Decision Theory is a type of approach that has its roots in economic utility theory. Utility theory assumes that people are rational and will make the decision that maximizes their own utility. This approach is based on knowing the circumstances surrounding the judgment and knowing, or estimating, the probability of each circumstance occurring.
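To make the utility-theory piece concrete, here is a minimal sketch with made-up probabilities and utilities: each action's utility under each circumstance is weighted by the probability of that circumstance, and the "rational" decision maker chooses the action with the highest expected utility.

```python
# A minimal sketch, with made-up numbers, of expected-utility maximization: weight
# each action's utility under each circumstance by the probability of that
# circumstance and choose the action with the highest expected utility.
p = {"rain": 0.3, "no rain": 0.7}                  # assumed probabilities of circumstances

utility = {                                         # assumed utilities for each action/outcome
    "carry umbrella": {"rain": 8, "no rain": 6},
    "leave it home": {"rain": 1, "no rain": 10},
}

expected = {action: sum(p[c] * u for c, u in outcomes.items())
            for action, outcomes in utility.items()}
best = max(expected, key=expected.get)
print(expected, "-> choose:", best)   # expected utilities 6.6 vs 7.3 -> leave it home
```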

Game theory: game theory is, in a sense, just a mathematical technique. This technique involves knowing the basic rules of the game: who is making the decisions (who is playing the game), what strategies and information are available to the players, the ability to form coalitions among players, the pay-offs to the players, and finally, what is known by all players.

Fuzzy decision theory: this approach deals with making decisions from imprecise information and measures. It deals with situations that might happen, as opposed to assuming situations will happen.

Decision theory, game theory, and fuzzy decision theory all have their own sets of axioms and guidelines that govern the decision process.
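As a toy illustration of those game-theoretic ingredients, the sketch below uses an entirely hypothetical two-player game with known strategies and payoffs, and simply checks each strategy pair for a Nash equilibrium (a pair from which neither player gains by unilaterally deviating); it is not drawn from any of the books cited here.

```python
# A toy sketch of the game-theoretic ingredients listed above: two players, a known
# strategy set, and known (hypothetical) payoffs. The code checks each strategy
# pair for a Nash equilibrium, i.e., a pair from which neither player gains by
# unilaterally deviating.
payoffs = {  # (row strategy, column strategy) -> (row payoff, column payoff)
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"): (0, 5),
    ("defect", "cooperate"): (5, 0),
    ("defect", "defect"): (1, 1),
}
strategies = ["cooperate", "defect"]

def is_equilibrium(row, col):
    row_best = all(payoffs[(row, col)][0] >= payoffs[(alt, col)][0] for alt in strategies)
    col_best = all(payoffs[(row, col)][1] >= payoffs[(row, alt)][1] for alt in strategies)
    return row_best and col_best

print([pair for pair in payoffs if is_equilibrium(*pair)])   # [('defect', 'defect')]
```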

As you can see, there is quite a broad spectrum of decision and judgment
approaches, and those represent only a few. I guess there is a theory out there
for everyone. Now someone needs to make up a theory on how to decide
which theory to use.
SOCIAL JUDGEMENT ANALYSIS
PAPERS

Here's some stuff to keep you thinking!?!

Here is a list of papers on applications of SJT that I found to be interesting. I found reading these papers very helpful to my understanding of the scope and applications of the judgment and decision sciences. I would highly recommend reading these articles, or any other articles on the topic, to enhance your understanding of this field.

1) Use of Clinical Judgment Analysis to Explain Regional Variations in Physicians' Accuracy in Diagnosing Pneumonia, by Tape, Heckerling, Ornato, and Wigton, from Medical Decision Making, 1991;11:189-197.

2) Exploring Physicians' Responses to Patients' Extramedical Characteristics: The Decision to Hospitalize, by Kuder, Vilman, and Delmo, from Medical Care, 1987, vol. 25, no. 9.

3) Social Judgment Theory and Medical Judgment, by Wigton, from Thinking and Reasoning, 1996, 2(2/3), 175-190.

4) A Social Judgment Theory Perspective on Clinical Problem Solving, by Engel, Wigton, LaDuca, and Blacklow, from Evaluation and the Health Professions.

5) Drug Therapy Decisions: A Social Judgment Analysis, by Gillis and Lipkin, from Journal of Nervous and Mental Disease, 1981, 169(7).

I have a more extensive list of journal names. Please e-mail me if you are interested in more.

The PopStop

Topics on Population and Demography

Welcome to the PopStop. The purpose of this on-line bibliography is to provide a brief overview of issues and trends in the area of
population and demography. The site includes current research and analysis, statistics and notes of interest on population.

What is Demography?

Currently, the world's population is fast approaching 6 billion people, an increase of roughly one-third over its level in 1980. Annual population gains are projected to be above 86 million until the year 2015 (United Nations Population Fund). For up-to-the-minute statistics on the US and world population from the US Census Bureau, view the PopClocks.

As nations and transnationals grow increasingly aware of the importance of demographics and its impact on social, economic, and political planning strategies, the use of population studies will also increase. Demography is the study of the size, distribution, structure, characteristics, and processes of human populations. Population and demography cover a wide range of issues and topics, including the following:

● Migration and Immigration


● Marriage in the Post-Industrial Age
● Infant Mortality Rates
● Family Planning
● Life Expectancy and Public Health
● Census Data and Statistics
● Ageing and Pensions
● Agricultural Technology and Urbanization

Why is Demography Important?

"The everyday activities of all human beings, communities and countries are interrelated with population change, patterns and levels
of use of natural resources, the state of the environment, and the pace and quality of economic and social development." (Programme
of Action of the United Nations International Conference on Population & Development,Cairo 1994).

Population growth is a major factor in energy consumption, housing shortages, inflation, food security, unemployment, and environmental degradation. Many scientists warn that unchecked population growth threatens to consume an already compromised store of the world's resources. In an effort to curtail the vagaries of population growth and its impact on societies, we must understand patterns of fertility, mortality, immigration, urbanization, development, and aging in order to begin to develop population policies for the future.

Population growth has been and will continue to be an integral part of our lives. Over the past two decades in countries of the South, we have seen a steady decline in the standard of living and quality of life caused by political instability, increased unemployment, mounting external indebtedness, and a decline in economic growth. Demographic differentials between developing and developed countries remain striking. The average life expectancy in countries of the North is 73 years, whereas in countries of the South it hovers around 57 years. It is also a fact that families of the South tend to be larger. These realities contribute to the continuation of the social and economic disparity between North and South.

Population issues have been recognized as a fundamental element in development planning. Therefore, development policies, plans, and programs must reflect the inextricable links between population, education, health, and the environment. Population and development policies reinforce each other when they are responsive to individual, family, and community needs.

Methods for Researching Population and Demography

Demography utilizes methods and theoretical perspectives of economics, sociology, statistics, geography, anthropology and
other fields, as well as having its own core of analytical techniques. The Social Research Update is published quarterly by the
Department of Sociology, University of Surrey, Guildford in England. An interesting article on Social Survey Methodology
by Roy Sainsbury, John Ditch and Sandra Hutton entitled Computer Assisted Personal Interviewing (Jan 96) might be
helpful. Also, archiving is essential to demographers; it addresses questions associated with preserving qualitative data and facilitates secondary analysis of archived material. Qualidata is a good resource for learning more about archiving. In addition, The Population Council has developed software for researchers that is used to make population projections and that can be downloaded from the internet.
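As a rough illustration of what such projection software computes, the sketch below applies the standard geometric growth formula P(t) = P0 * (1 + r)^t; the starting population and growth rate are assumed round numbers, not official figures or output of the Population Council's package.

```python
# A rough sketch of a simple population projection using the standard geometric
# growth formula P(t) = P0 * (1 + r)**t. The starting population and growth rate
# are assumed round numbers, not official figures.
p0 = 5.8e9    # assumed world population in the base year (persons)
r = 0.014     # assumed constant annual growth rate (1.4%)
for years_ahead in range(0, 21, 5):
    projected = p0 * (1 + r) ** years_ahead
    print(f"year {1997 + years_ahead}: {projected / 1e9:.2f} billion")
```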

The Australia National University has the most comprehensive internet guide to major census data and demographic centers compiling research and population studies, via the World Wide Web Virtual Library. Currently the page contains 155 links. In addition to its site on Population and Demography, the WWWVL has useful links in fields that are closely related to population and demography. Here is a sampling of those and some other population-related sites.

● WWWVL Anthropology
● WWWVL International Affairs
● WWWVL International Development Co-operation
● WWWVL Political Science
● WWWVL Politics & Economics
● WWWVL Social Sciences
● United Nations Population Fund
● African-American Population Statistics
● Applied Population Laboratory
● University of Michigan - Population Studies Center
● Penn State - Population Research Center
● Netherlands Interdisciplinary Demographic Institute
● Planet Earth Homepage
● The World Village Project
● The World Bank

Copyright © 1997, Paul A. Burns


The Web: this word sends shivers down the spines of some corporations and individuals. Pictures of spiders, cobwebs, miles of knotted and gnarled wires, or a mushroom cloud may be brought up at the mere mention of the "Web." The fears associated with it are understandable, especially when one doesn't know what the Web can offer.

The Web is and has been impacting the lives of mere mortals and multi-national companies every day. With the multitude of sites available on the Web to meet the needs and wants of each and every individual, the number of people using the Web has been increasing every day. For businesses, the Web represents a means to reach customers all over the world, 24 hours a day, seven days a week, all year round (unless the server decides to do otherwise and take a "break").

Sales on the Internet, in terms of the amount of merchandise sold, were stated to be about $500 million in 1996 and have been projected to reach about $8 billion by the year 2000 (source: Internet World). As to the number of people using the Internet in place of their local shopping malls, about 10% of the 25 million users are doing just that. The Internet is fast becoming a commerce platform rather than just a publishing platform, and an increasing number of companies are introducing Internet-related infrastructure services and retail outlets which make internet marketing easier and more efficient.

Not convinced? Need some more evidence and hard facts? Well, have no fear, here are some links to some cool sites (some with
graphs that will blow your mind away) that offer statistics about the size of the Internet and the Web to satisfy your information
needs.

General Magic - Internet Trends of worldwide domain hosts 1989-1997

Network Wizards - Internet Domain Survey

NUA Internet Marketing - Internet Surveys (Index)

Netcraft - State of the Web; Web Server Survey

Matthew Gray's site - Internet Stats on growth and usage

MIDS (Matrix Information and Directory Service) - Composition, content and users of the net

GVU (Graphics, Visualization & Usability Center) User Surveys - WWW survey, growth and demographics

Engineering Workstations (UIUC) - Latest browser stats

IBM Infomarket - WWW stats, usage, popular pages & top 20 domains

Interse Corporation - A company that provides web analysis and looks at Web business trends

And if you are really interested in doing a specialized survey and have the means and funds to do so, there are also companies on the
Net that offer such services.

Forrester Research - A company that offer Internet business/advertising services to other businesses

Jupiter Communications - A company that offers research and analysis on interactive services/media

Nielsen Media Research Interactive Services - A company that provides Internet demographic information and research
services

ActivMedia - A company that does research and studies of the Internet

Killen & Associates - Consultants to financial services and telecommunication industries

Zona Research - A research company that provides information and advice on the Internet industry

BBM Bureau of Measurement - A non-profit broadcast research company for TV, Radio and Interactive media

So now that you have a Web page up, what do you do? That question has always plagued me when I set up a Web page: "How will they know that I am HERE?" Well, hopefully this site will be a step forward to the next stage.

The Internet has a variety of directories, catalog services, and search engines that allow you to advertise your site on the Web. Some of these even allow businesses to list their Web sites for free, while others require a fee. There are even sites that offer information and advice on advertising, marketing, and business opportunities on the Web.

Here are some sites that offer you a variety of services and I only chose the ones that gave them for free! (And trust me, these sites
will give you TONS of information!)

Advertising Age - A Web version of a magazine that offers information on advertising, marketing and the media

Web Site Banner Advertising - A Website that offers information and links on banner advertising

Multimedia Marketing Group's WebStep Top 100 - A Website that gives links to the top 100 sites that will allow you to list your site for FREE!
Internet Link Exchange - A site that provides links to and information on banner services for FREE!

By now, I hope that this site has managed to whet your appetite to do business and advertise on the Web. The incentives are many
and don't worry, I am not earning any commission in providing all this information :-) All the best in your Web endeavours!!


Copyright © 1997
Created by Lynette P Cheng (lpc4@cornell.edu)
March 1997
Measuring Environmental Attitudes

The New Environmental Paradigm

Lisa Pelstring 3/25/97

This web site provides an overview of the concept of environmental attitude focusing primarily on one well-known measure of
environmental attitude, as well as hyperlinks to related sites the viewer may find interesting.

● What is Environmental Attitude


● Importance of Measuring Environmental Attitudes
● The New Environmental Paradigm
● Other Environmental Attitudes Measures
● Environmental Attitude-Related Web Sites
● References

What is Environmental Attitude

What is Attitude? is perhaps the first question to ask. Fishbein and Ajzen (1975) define attitude as "a learned predisposition to
respond in a consistently favorable or unfavorable manner with respect to a given object" (p.6). The researchers break the definition
into three components: attitude is learned; it predisposes action; and such action or behavior is generally consistent. Attitude is
evaluative in nature--evaluative toward, for instance, pollution or wildlife--and such evaluations are based on beliefs.

With the above elements in mind, "environmental" attitude can then be defined as a learned predisposition to respond in a consistently favorable or unfavorable manner with respect to the environment. As this web site will demonstrate, there are many scales available that attempt to measure different aspects of people's attitudes toward the environment--attitudes toward wildlife, pollution, and habitat are just a few examples. To view an example of an environmental attitude scale, click here.

Importance of Understanding Environmental Attitudes


Understanding and measuring environmental attitudes has become increasingly important as the number of environmental conflicts
has increased throughout the United States and the world. Paralleling the increase in environmental conflicts is a growing trend
toward greater public involvement in how these conflicts are handled.

In the past few decades, government, nongovernmental organizations, and the private sector have responded as initially embryonic
communication and education efforts have evolved into programs at the local, state, and federal levels. Communication programs in
government, for example, exist in such diverse areas as federal and state wildlife management agencies, state cooperative extension
services (Peyton and Decker 1987), state and federal hazardous waste agencies, and local health agencies. These educational and
outreach programs attempt to involve the public in decisionmaking as well as influence perceptions and behaviors toward issues such
as wildlife resource management (Peyton and Decker 1987).

Even with communication programs in place, environmental conflicts among stakeholders are inevitable (Fazio 1987). However,
there are numerous reasons for many of these conflicts--poor understanding of stakeholder concerns and attitudes is a prime example.
Communication planners working in all sectors--private, nongovernmental, and government--often design education and outreach
programs without first attempting to understand audience attitudes. Without better understanding of such attitudes, many of these
communication programs "miss their mark"--failing to garner audience attention because audience values and attitudes were ignored.

If communication programs are to attain their objectives--whether these objectives be persuasive or educational in nature--
understanding target audience attitudes is critical. Many conflicts over environmental and natural resource management issues are
impossible to avoid. But knowing more about environmental attitudes may at the very least help guide communication and education
efforts and hopefully lead to more thoughtful, informed, and effective discussion.

The New Environmental Paradigm

Dunlap and Van Liere (1978) were the original researchers to posit that a new world-view was emerging--one that differed
dramatically from the Dominant Social Paradigm (the public's belief in progress and development, science and technology, a laissez-
faire economy, etc.). Calling it the New Environmental Paradigm (NEP), the authors asserted that this emerging outlook comprised such
concepts as limits to growth, a steady-state economy, and natural resource preservation. Dunlap and Van Liere developed an
instrument to measure public acceptance of the NEP that has subsequently been tested by other researchers and is still being used
today. The original testing instrument comprised the 12 items listed below:

● We are approaching the limit of the number of people the earth can support.
● The balance of nature is very delicate and easily upset.
● Humans have the right to modify the natural environment.
● Humankind was created to rule over the rest of nature.
● When humans interfere with nature it often produces disastrous consequences.
● Plants and animals exist primarily to be used by humans.
● To maintain a healthy economy we will have to develop a "steady state" economy where industrial growth is controlled.
● Humans must live in harmony with nature in order to survive.
● The earth is like a spaceship with only limited room and resources.
● Humans need not adapt to the natural environment because they can remake it to suit their needs.
● There are limits to growth beyond which our industrialized society cannot expand.
● Mankind is severely abusing the environment.

Problems with the NEP


One of the more significant issues researchers have raised concerning the NEP scale is whether it is unidimensional or
multidimensional. In other words, is the concept of environmental attitude really only one concept, or does it comprise several
concepts or dimensions? Albrecht et al. (1982) conducted a replicative study to assess the NEP scale's reliability, validity, and
unidimensionality. The authors surveyed two groups--a farm sample and an urban sample--and found that the NEP scale had three
dimensions (unlike Dunlap and Van Liere, who found the scale to be unidimensional). Factor analysis showed that these dimensions
included: the balance of nature; limits to growth; and man over nature.

Geller and Lasley (1985) also tested the dimensionality of the 12-item NEP scale using samples from the Van Liere study and from
the 1982 Albrecht study. While Dunlap found the original 12-item scale to be unidimensional and Albrecht found it to consist of
three factors, Geller and Lasley were unable to confirm either configuration using confirmatory factor analysis. Only by
reducing the scale to nine items were Geller and Lasley able to "cautiously accept" Albrecht's finding of three factors: balance of
nature, limits to growth, and man over nature.

Researchers Scott and Willits (1994) conducted a 1990 statewide survey examining Pennsylvania residents' opinions about the NEP
and pro-environment behaviors. Confirming past research results, they found that while most respondents expressed support for the
NEP, most did not participate in pro-environment behaviors--in other words, environmental attitudes were not predictors of pro-
environmental behavior. But more importantly, they too raised questions over the NEP scale's possible unidimensionality. Scott and
Willits found only two underlying dimensions: a humans-with-nature factor and a balance of nature/limits to growth factor.

In several National Park Service studies, Noe and Snow (1990) applied the NEP scale in surveys of visitors to five Southwestern
parks. The authors hypothesized that park visitors would support a more ecological view of man and nature, and that the scale items
would show consistency and unidimensionality. Results indicated that the scale was multidimensional with only two factors--
balance of nature and man over nature--and that some items used in the original NEP scale could be dropped.

Even in this preliminary review, it is apparent that the NEP scale is not unidimensional. Gray (1985) questions "whether any measure
of environmental paradigms can be unidimensional" and asserts that "beliefs in such a complex domain as ecology are not likely to
be simple...but complex and multidimensional."
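
For readers who want to probe scale dimensionality themselves, the following is a minimal sketch in Python (using simulated item responses, not the datasets from the studies cited above) of the simplest version of the check these researchers perform: compute the eigenvalues of the item correlation matrix and count how many exceed 1.0 (the Kaiser criterion) to get a rough sense of how many factors a 12-item scale such as the NEP might contain.

    # A minimal sketch with simulated data -- not the NEP datasets cited above.
    import numpy as np

    rng = np.random.default_rng(0)
    n_respondents, n_items = 300, 12

    # Simulate responses driven by three latent factors (e.g., balance of nature,
    # limits to growth, man over nature), purely for illustration.
    loadings = np.zeros((n_items, 3))
    loadings[0:4, 0] = 0.7   # items 1-4 load on factor 1
    loadings[4:8, 1] = 0.7   # items 5-8 load on factor 2
    loadings[8:12, 2] = 0.7  # items 9-12 load on factor 3
    factors = rng.standard_normal((n_respondents, 3))
    responses = factors @ loadings.T + 0.6 * rng.standard_normal((n_respondents, n_items))

    # Eigenvalues of the 12 x 12 item correlation matrix, largest first.
    eigenvalues = np.linalg.eigvalsh(np.corrcoef(responses, rowvar=False))[::-1]
    print("Eigenvalues:", np.round(eigenvalues, 2))
    print("Factors retained by the Kaiser criterion:", int((eigenvalues > 1.0).sum()))

A full factor analysis would go on to rotate the retained factors and inspect which items load on each, but even this quick eigenvalue check illustrates how researchers decide whether a scale is unidimensional or multidimensional.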

Other Environmental Attitude Measures

While I found the NEP cited most often in the literature, I did also find other measures that looked at more general environmental
attitudes as well as specific environmental issues--such as wildlife-- and specific populations--such as children.

At about the same time that Dunlap and Van Liere were developing the NEP, Weigel and Weigel (1978) produced the
Environmental Concern Scale. This scale is similar to the NEP in that it examines attitudes toward more general environmental/
ecological issues. The 16-item scale was used in four separate studies conducted by the researchers with the goal of predicting
environmental behavior. This measure includes such items as "[t]he currently active anti-pollution organizations are really more
interested in disrupting society than they are in fighting pollution" and "[t]he federal government will have to introduce harsh
measures to halt pollution since few people will regulate themselves." Click here for an example of one study using the
Environmental Concern Scale.

In the early 1970s, Maloney and Ward (1973) developed the Ecology Scale, which measured attitudes as well as knowledge,
emotions, and behavior. Their scale comprised four subscales: the Verbal Commitment Subscale, the Actual Commitment Subscale,
the Affect Subscale, and the Knowledge Subscale. Together the subscales totaled 130 items, which were tested on
environmental group members, college students, and residents of Los Angeles. Results from this study indicate that most people (not
surprisingly) scored higher in terms of verbal commitment and affect, and lower in actual commitment and knowledge.
Leeming et al. (1995) used a modification of Maloney and Ward's scale to produce the Children's Environmental Attitude and
Knowledge Scale (CHEAKS). The items on CHEAKS were very different but the overall structure of the scale was similar. Leeming
et al. broke the scale into four areas--Verbal Commitment, Actual Commitment, Affect, and Knowledge subscales. The measure was
tested on a total of 2,642 children.

Environmental Attitude-Related Web Sites

The University of Michigan has an Environmental Education and Communication Resources page. It contains 2500 items that
include curricula, monographs, reports, case studies, periodicals, and newsletters. By using the Search Form and typing
"environmental attitudes" in the keywords box, you will receive a list of article abstracts and other references relating to this topic.
One report, Environmental Attitudes and Behaviors of American Youth, is provided in full text.

One report that references an environmental attitude scale first developed by Weigel and Weigel (1978) is available online. A Study
of Environmental Attitudes and Concepts of Environment and Environmental Education of Geography Student Teachers provides
scale items and study results.

The University of Wisconsin web site provides a Data and Program Library Service. Again, doing a search with "environmental
attitudes" as keywords will connect you with a 1996 report, International Social Survey Program: Environment, focusing on
international environmental perceptions. The report includes survey items, codebook, and results.

Cornell University's Human Dimensions Research Unit within the Natural Resources Department also provides an index of
publications, many of which are related to attitudes toward specific environmental issues, such as wildlife or natural resources.

The Human Dimensions of Wildlife Journal web site provides a listing of the Table of Contents for each issue. After scrolling
through several issues, I found several articles relating to wildlife and environmental attitudes.

The State Education and Environmental Roundtable web site provides a list of studies that measure environmental attitudes and
knowledge. It unfortunately does not provide full texts of these studies online.

The University of Toronto has a research database called the Environmental Grey Literature Search. I typed in "environmental attitude"
and received a list of seven reports. While the reports were not online, the title, author, and call number were provided for each report.

References

Albrecht, Don; Bultena, Gordon; Hoiberg, Eric; Nowak, Peter. The New Environmental Paradigm Scale, Journal of Environmental
Education, 13(3): 39-43, Spring 1982.

Atkin, CK. Mass media information campaign effectiveness, in Rice RE, Paisley, WJ (eds): Public Communication Campaigns.
Beverly Hills, Sage Publications, 1981.

Cushman, Donald P., and McPhee, Robert D. Message-Attitude-Behavior Relationship. New York, Academic Press, 1980.
Fishbein, Martin, and Ajzen, Icek. Belief, Attitude, Intention, and Behavior: An Introduction to Theory and Research. Addison-
Wesley Publishing, Reading, MA, 1975.

Geller, Jack M. and Lasley, Paul. The New Environmental Paradigm Scale: A Reexamination, Journal of Environmental Education,
17(3):9-12, Fall 1985.

Leeming, Frank C.; Dwyer, William O., and Bracken, Bruce A. Children's Environmental Attitude and Knowledge Scale:
Construction and Validation. Environmental Education, 26(3):22-31, 1995.

Maloney, Michael, P. and Ward, Michael P. Ecology: Let's Hear from the People. American Psychologist, 583-586, July 1973.

Noe, Francis, P. and Snow, Rob. The New Environmental Paradigm and Further Scale Analysis, Journal of Environmental
Education. 21(4):20-26, 1990.

Weigel, Russell, and Weigel, Joan. Environmental Concern: The Development of a Measure. Environment and Behavior, 10(1):3-15,
1978.

Copyright © 1997 lmp23@cornell.edu


The Sin of Omission - Punishable by Death to Internal Validity: An Argument for
Integration of Qualitative and Quantitative Research Methods to Strengthen
Internal Validity

Kathryn A. Bowen PhD Student

Cornell University

The above geometrical diagram demonstrates my belief that as the social science researcher merges Qualitative
and Quantitative methodologies, the Internal Validity of the research design is strengthened. Metaphorically, the
triangle is known as the strongest geometric shape. The diagram demonstrates in a geometric fashion how the
triangle, denoting research methodology, is enhanced and Internal Validity strengthened as the two research
methods merge.

Program evaluation of social services sets the stage for a variety of research opportunities. One of the primary goals of evaluators
conducting this type of social research is to construct research designs that are reliable and valid so that high quality evaluations can be
conducted while enhancing scientific knowledge.

A concern for most evaluators of human service programs is the complex nature of the phenomena under study, the human
experience. Multiple perspectives are required in order to reflect the richness of these complexities. In addition, due to the fluid
nature of human behavior, rigorous attention must be directed toward threats to Internal Validity in social science research endeavors.
This WEB site will examine the merits of integrating Qualitative and Quantitative research methodologies in the form of
Triangulation, in order to strengthen the Internal Validity of social scientists' research.

Resources for Social Science Research

● Research Engine for the Social Sciences

Social Science Data Archives and Data Libraries

Social Science Resource Center

Princeton Social Science Library

Bolder University Social Science Resource Center

The Sin of Omission

Some academicians claim that the heated debates between the bi-polar quantitative and qualitative methodological encampments are
passe; however, it appears that the literature continues to contain many works by social science researchers that accept one
epistemological perspective to the exclusion of others. In addition, few WEB sites have been constructed discussing integration of
qualitative and quantitative methodologies via Triangulation.

From the perspective of a fledgling social researcher, I think it is time that we realize the complex nature of the context in which we
aspire to conduct our research. Human phenomena cannot be completely controlled or isolated in a sterile environment.
Quantitative research designs including measurement, prediction and causal inference do not always fit in isolation with the world of
social science, where perceptions, feelings, values, and participation are frequently the variables we are attempting to measure.

By omitting qualitative methods, the social science researcher may overlook many phenomena that occur within the context of the
setting. Campbell notes that quantitative measurement rests on qualitative assumptions about which constructs are worth measuring
and how constructs are conceptualized (Shaddish & Cook, 1991).

By omitting quantitative methods, causal relationships between variables, as well as quantification and analysis of those variables to
determine statistical probabilities and the certainty of a particular outcome, will be flagrantly absent.

I propose that the inherent differences between Quantitative and Qualitative research methodologies be used to the advantage of the
social science researcher. By combining the different perspectives a more comprehensive research design can be constructed.
The following suggestions were adapted from an article by Dr. Mary Duffy in 1987. Dr. Duffy was actually outlining differences in
methodologies, however I will combine the perspectives in order to illustrate the benefits of multiple methods in the study of human
phenomenon.

Benefits of Combining Qualitative and Quantitative Methods

While the Quantitative design strives to control for bias so that facts can be understood in an objective way, the Qualitative
approach is striving to understand the perspective of the program stakeholders, looking to firsthand experience to provide meaningful
data.

The accumulation of facts and causes of behavior are addressed by quantitative methodology as the qualitative methodology
addresses concerns with the changing and dynamic nature of reality.

Quantitative research designs strive to identify and isolate specific variables within the context (seeking correlation,
relationships, causality) of the study as the Qualitative design focuses on a holistic view of what is being studied (via documents,
case histories, observations and interviews).

Quantitative data is collected under controlled conditions in order to rule out the possibility that variables other than the one
under study can account for the relationships identified while the Qualitative data are collected within the context of their natural
occurrence.

Both Quantitative and Qualitative research designs seek reliable and valid results. Data that are consistent or stable as indicated
by the researcher's ability to replicate the findings is of major concern in the Quantitative arena while validity of the Qualitative
findings are paramount so that data are representative of a true and full picture of constructs under investigation.

By combining methods, advantages of each methodology complements the other making a stronger research design with resulting
more valid and reliable findings. The inadequacies of individual methods are minimized and more threats to Internal Validity are
realized and addressed.

Links to Illuminate Concepts

Qualitative Methods: An interesting argument for the utilization of qualitative methods and how this methodology relates to
reality as experienced by human beings. The author offers a provocative paper based on social research as discovery. He draws on
the philosopher Kleining, who questions the issue of how we, as human beings, find out about the world we live in. The
author talks generally about how research methods are a matter of development from the every-day strategies of openness
and discovery rather than from closure and interpretation. This is an excellent paper in that it assists the researcher in
developing skills in thinking about reality and the importance of incorporating qualitative methods into research designs in
order to better capture the reality of human beings.

The Human Side of Being Human: Harvey Jackins provides background insights into reality and how humans function in this
reality. This site provides text which incites a broader view of reality and the human being's position within it.

Pitfalls of Data Analysis: This is an excellent site on pitfalls of data analysis. Clay Helberg of the University of Wisconsin
informs readers how to avoid "Lies and Damned Lies". The Web Site is written in an informal, chatty fashion. Trickier
aspects of applied data analysis are reviewed as Clay talks about sources of bias, errors in methodology, and problems with
interpretation. The graphics are quite good, although black and white only. Included is a great list of areas of potential
problems when using statistical analysis of data. Clay offers a good discussion on the importance of the researcher
understanding the conditions for causal inference. If a causal inference needs to be made, a random sample should be drawn.
As is the case in some social research, random samples are not always possible. In this instance added effort is needed to discover
causal relationships by using a variety of approaches.

Qualitative Methods: This site is a qualitative methods course description created by Professor Peggy Beranek. Its utility is
the extensive reference list, which provides the WEB user with literature documenting the utilization of qualitative methods.

Introduction to Validity: This site provides an in-depth look at validity in social science research. The Knowledge Base serves
to organize the different types of validity in a clear manner that makes complex concepts more tangible. Superb graphics,
well constructed and organized. A must for any WEB user with questions or areas of confusion relating to validity. Some
sections are under construction, so even greater information benefit is forthcoming.

Internal Validity, Toward a Stay of Execution

Once the social science researcher has identified variables that covary, the next major step is to determine whether or not there is any
causal relationship between the two. If causality is established, the researcher must then decide whether the direction of causality runs
from the independent variable to the dependent variable or vice versa. In social research this is often a major challenge since human
phenomena typically do not occur in neat little boxes. We cannot make an epistemological assumption that the social world
behaves consistently so that objective forms of measurement can be used in isolation. Further, we should not strip the data
completely from their natural context but rather strive to understand human behavior from the stakeholder's own frame of reference.

Knowledge of a time sequence is of vital importance in order to ascertain the direction of causality. The social science researcher
might use a preprogram questionnaire, an in-depth interview, or a combination of both methods prior to program interventions {Time 1},
then once again among the same participants after a predetermined length of time in the program {Time 2}. In a perfect world, the
social science researcher could establish a causal relationship between program intervention and changes in participant behavior from
the above described method; however, there might very well be a third variable lurking in the shadows that can cause the researcher
to assume incorrectly that the program has no effect {false negative/Type II Error} or that the program has an effect when it actually
does not {false positive/Type I Error}.

"Accounting for third-variable alternative interpretations of presumed A-B relationships is the essence of internal validity".
(Campbell & Stanley, 1963, p. 50). A strategy that can be used to illuminate third-variable alternative interpretations is Triangulation.
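
For illustration, here is a minimal sketch in Python (simulated data, not drawn from any cited study) of the third-variable problem: a lurking variable Z drives both A and B, so A and B covary even though neither causes the other, and the apparent relationship disappears once Z is controlled through partial correlation.

    # A minimal sketch with simulated data -- illustrative only.
    import numpy as np

    rng = np.random.default_rng(1)
    n = 1000
    z = rng.standard_normal(n)             # the lurking third variable
    a = 0.8 * z + rng.standard_normal(n)   # A is caused by Z
    b = 0.8 * z + rng.standard_normal(n)   # B is caused by Z, not by A

    r_ab = np.corrcoef(a, b)[0, 1]

    # Partial correlation of A and B controlling for Z: correlate the residuals
    # left over after regressing each variable on Z.
    resid_a = a - np.polyval(np.polyfit(z, a, 1), z)
    resid_b = b - np.polyval(np.polyfit(z, b, 1), z)
    r_ab_given_z = np.corrcoef(resid_a, resid_b)[0, 1]

    print("r(A, B)     = %.2f" % r_ab)           # a sizable spurious correlation
    print("r(A, B | Z) = %.2f" % r_ab_given_z)   # near zero once Z is controlled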

Triangulation is a term used in navigation to describe a technique whereby two known or visible points are used to plot a third.
Campbell (1956) was the first to apply the term "triangulation" to research methodology. (Breitmayer, 1993). "Triangulation
combines independent yet complementary research methods to:

enhance the description of a process or processes under study

identify a chronology of events


provide evidence for internal validity estimates

serve as a corroborating or validating process for study findings. Thus, an expanded understanding and contextual representation
of the studies phenomena result". (Hinds and Young, 1987, p. 195).

Methodological triangulation can be classified as simultaneous or sequential. "Simultaneous triangulation is the use of the qualitative
and quantitative methods at the same time. In this case, there is limited interaction between the two datasets during the data
collection, but the findings complement one another at the end of the study. Sequential triangulation is used if the results of one
method are essential for planning the next method. The qualitative method is completed before the quantitative method is
implemented or vice versa". (Morse, 1991, p. 120).

Determination of the specific research problem is the first step in qualitative-quantitative triangulation. This can be accomplished by
identifying whether the theory that drives the research is developed inductively from the social science researcher her/himself or
deductively as is characteristic in quantitative inquiry. Mitchell (1986) suggests that triangulation offers flexibility and an in-depth
approach that single-method designs cannot provide. Four principles, however, must be adhered to, according to Mitchell, in order for
triangulation to be used effectively:

"1. the research question must be clearly focused

2. the strengths and weaknesses of each chosen method must complement each other

3. the methods should be selected according to their relevance to the nature of the phenomenon being studied, and

4. continual evaluation of the approach should be under-taken during the study." (Corner, 1990, p. 721).

Benefits of triangulation have been identified by several social science researchers. Madey (1982) discusses using exploratory
interviews and/or observations in improving the sampling framework. Data collection using observation and exploratory interviews
can provide information about the receptivity and frames of reference of program participants prior to the construction of quantitative
survey instruments. As a result, better instruments are created as well as improved methods of instrument administration.

Mary Duffy, (1987), cites nine benefits associated with Triangulation:

● The conceptual framework, which provides the theoretical base of the study, can be developed in whole or in part from
qualitative methods.

● In areas where methods produce information overlap, certain quantitative results can be verified by results obtained
through qualitative methods.

● Qualitative data gained from interviews and/or observations can be used as the basis for selecting survey items to be
used in instrument construction.

● External validation of empirically generated constructs can be obtained by comparison with interview and/or
observation data: where discrepancies exist, additional probing can be done to determine whether the mismatch was because
of a weakness in the instrument or misinterpretation by the individuals taking the test.

● Case studies can be used to illustrate statistically derived models.

● Clarification of ambiguous and provocative replies to individual questionnaires can be obtained by reexamining field
notes.

● Quantitative data can provide information about program stakeholders who were overlooked initially.

● The use of a survey instrument that collects data from all program stakeholders in the study may serve to correct the
qualitative research problem of collecting data only from an elite group within the system being studied.

● Using quantitative assessment can correct for the "holistic fallacy" (the perception by the researcher that all aspects of a
given situation are congruent, when in fact only those persons interviewed by the researcher may have held that particular
view). Also, the use of quantitative instruments can verify observations collected during informal field observations. (p.
132).

Although triangulation moves the social science researcher closer to convergence, corroboration and correspondence of results across
different method types, threats to Internal Validity must be recognized and minimized.

Decision Matrix: This site provides a superb analogy using the proceedings from the OJ Trial and the possibility of making a
Type I or Type II error. While this site may seem to use an issue that is quite controversial, the example assists the WEB
user in thinking about making inferences based upon research findings and the potential errors that are inherent in most social
science research projects.

Internal Validity

Donald Campbell thought very deeply about construct validity and actually came up with a complex technique that involves the use
of a multitrait-multimethod matrix. The multitrait-multimethod matrix requires convergent and discriminant validity as conditions
for naming something. Campbell espouses multiple operationalism, the belief that many measures are needed to triangulate on a
single construct.

While the MTMM is a systematic approach to assessing construct validity, no such approach has been developed to assist the researcher
in thinking about threats to internal validity. Threats vary depending upon the context of each individual research environment. Specific
threats to internal validity have been identified by many and totally conquered by few.
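
To make the MTMM logic concrete, here is a minimal sketch in Python using a small hypothetical correlation matrix (the numbers are invented, not Campbell and Fiske's): convergent validities (the same trait measured by different methods) should exceed the discriminant correlations (different traits, whether measured by the same or different methods).

    # A minimal sketch with invented correlations for two traits and two methods.
    import numpy as np

    # Measures ordered as: (trait1, method1), (trait2, method1),
    #                      (trait1, method2), (trait2, method2)
    mtmm = np.array([
        [1.00, 0.30, 0.65, 0.20],
        [0.30, 1.00, 0.25, 0.60],
        [0.65, 0.25, 1.00, 0.35],
        [0.20, 0.60, 0.35, 1.00],
    ])

    convergent = [mtmm[0, 2], mtmm[1, 3]]          # same trait, different methods
    discriminant = [mtmm[0, 1], mtmm[2, 3],        # different traits, same method
                    mtmm[0, 3], mtmm[1, 2]]        # different traits, different methods

    print("Mean convergent validity: %.2f" % np.mean(convergent))
    print("Mean discriminant r:      %.2f" % np.mean(discriminant))
    print("Convergent > discriminant:", np.mean(convergent) > np.mean(discriminant))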

Designs and Methodology Links


Multi-Method Matrix: Dr. Trochim from Cornell University once again provides very readable text with vivid graphics that discuss
and illustrate the concepts of the Multi-Method Matrix. Trochim describes the validational process of utilizing a matrix of
intercorrelations among tests representing at least two traits, each measured by at least two methods.

Designs: A brief illustration of various types of experimental and quasi-experimental designs is provided in an abridged form.
Specific threats to internal validity are numbered 1-8. With each design, potential threats are listed. Very useful during the planning
process of a research project; the social science researcher can troubleshoot depending on the particular design she/he decides to
use.

Design Should Meet Certain Criteria: A succinct definition of internal validity is provided by the Florida Agriculture Information
Retrieval service. Examples are given outside the context of Social Science (e.g., environment, plant growth); however, the illustrations are
straightforward and provide a different context in which to think about design issues. The page appears to be under development, so
further discussions concerning design might be forthcoming.

Introduction to Internal and External Validity: This site does a very nice job discussing internal validity along with the various threats that
can interfere. Dr. William Huitts of Valdosta University has established this site for his students, thus explaining the psychological
bent noted in the writings. The discussion of internal validity is in clear terms, with good discussions of 8 threats to internal
validity. The language of the text is helpful because Dr. Huitt links independent and dependent variables in his discussion of internal
validity and potential threats. Good as a secondary resource.

In order to provide a point of reference, I will offer a hybrid research design that illustrates the use of Triangulation as a means of
addressing several common threats to Internal Validity.

Hybrid Social Service Program

Let us assume that we wish to study the effects of a prenatal program called Healthy Beginnings on pregnant adolescent females' self-
care abilities. Healthy Beginnings is a program consisting of nursing interventions that occur at the same time as the prenatal visits.
The intervention is by and large counseling, referral to social services such as WIC, home visits before and after delivery, and
contraceptive information and prescriptions.

Initially, concept mapping would be conducted among program planners, program implementers, teens, nurse midwives involved in
seeing participants during prenatal visits, and social workers working with teens. The goal of creating this concept map is to assist the
Healthy Beginnings stakeholders to work collectively as a group while maintaining their own individual perceptions of the program.
Concept mapping is a structured process, focused on the construct of interest (the self-care perceptions of adolescent mothers). Input from
a range of program stakeholders is required in order to create a conceptual map of ideas and meanings of the program. This would be
particularly important among the teen participants, who are readily 'put-off' by vacuous, pretentious terminology.

Two groups of teens would be randomly selected, half participating in the Healthy Beginnings program and the other half seen
by the obstetric physician for prenatal care only. The two groups would be similar demographically and developmentally. Dissimilar
strategies and methods would be used within the same research design. Triangulation of methods that are different would:

reflect the theorized multidimensionality of the construct of self-care


provide more detail about the meaning teenagers attach to the phenomenon of self-care as it relates to pregnancy, parturition, and
motherhood

index a process of change in the perception of self-care abilities as was theoretically predicted

methods will be diverse and independent of each other

methods will be suitable for use in the field setting.

Dissimilar strategies including observations, structured interviews, self-report questionnaires, and document review would be
employed. These methods could be viewed as compensatory, as the limitations of one are offset by the strengths of the other. For
example, review of documents (patient chart, prenatal information sheets, postpartum discharge summary) could be used to
counterbalance the reactive influence of the social science researcher's presence on the adolescent female's self-report data.
Observational data, which can become contaminated by the researcher's bias, could be compared with or checked against an
adolescent's questionnaire (measuring perceptions regarding self-care) and interview responses. Interviews could include questions
that were open-ended so that the predetermined, defined and limited foci of the questionnaire would be offset. The questionnaire
could be administered prior to participating in the Healthy Beginnings Program, with a follow-up questionnaire at the 6-week
post-partum visit. In addition, a random sample of teens could be interviewed post-partum from both the treatment and non-
treatment groups.

This selection of methods would hopefully have dissimilar biases and therefore result in less systematic effects of participant- and
investigator-based errors that lead to problems with the internal validity of the research.
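
As a companion to the qualitative strands described above, the quantitative strand of this hybrid design could be analyzed along the following lines. This is a minimal sketch in Python with simulated scores and an assumed program effect; the numbers are illustrative only, not predictions about Healthy Beginnings.

    # A minimal sketch: compare pre-to-post change in self-care scores between
    # the randomly assigned program and comparison groups (simulated data).
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    n_per_group = 60

    # Pretest scores should be comparable because of random assignment.
    pre_program = rng.normal(50, 10, n_per_group)
    pre_control = rng.normal(50, 10, n_per_group)

    # Posttest: assume, for illustration only, the program adds about 5 points.
    post_program = pre_program + rng.normal(5, 8, n_per_group)
    post_control = pre_control + rng.normal(0, 8, n_per_group)

    gain_program = post_program - pre_program
    gain_control = post_control - pre_control

    t, p = stats.ttest_ind(gain_program, gain_control)
    print("Mean gain (Healthy Beginnings): %.1f" % gain_program.mean())
    print("Mean gain (comparison group):   %.1f" % gain_control.mean())
    print("t = %.2f, p = %.4f" % (t, p))

The interview and observational data would then be used to interpret whatever difference, or lack of difference, the gain-score comparison turns up.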

Links: Teen Pregnancy and Concept Mapping

Teen Pregnancy: This site is very provocative in that it shares actual stories of teenage women who are pregnant and how their lives
are changed as a result. The text is fairly simplistic; however, the case study approach in describing the teens' situations (along with
color pictures of the young women) is an effective mechanism for keeping the reader interested and heightening awareness concerning
the social problem of teenage pregnancy.

Concept Mapping: This site provides excellent graphics with understandable text and examples. Included are additional links to
specific examples of concept mapping in use and extensive general information. This process would express the conceptual framework
in the language of the participants rather than in the rhetoric of the social science researcher.

Hung Jury: Caveats for Using Methodological Triangulation

In theory the use of Triangulation seems like a logical way to strengthen the Internal Validity of social science research. Researchers,
however, must never rest on their laurels and depend on the methodology alone to ensure solid, internally valid work. The following
caveats are important to keep in the frontal lobe as we endeavor to conduct social science research in a dynamic, never static,
environment.
● Unit of Analysis: a common unit of analysis is required to guide the data collection and analysis. The common unit
needs to be a part of all aspects of the triangulation.

● Time and money constraints: the time and money required to combine different approaches of data collection and
analysis is likely to be considerable.

● Investigator demands: the investigator who wants to use multiple triangulation successfully needs a broad theoretical
perspective and a broad knowledge base of research methodology, including both quantitative and qualitative methods. Also
required is the ability and desire to deal with complicated design, measurement and analysis issues.

● Data Analysis: analysis of data generated by multiple triangulation is a difficult problem that has yet to be solved.

The literature provides few guidelines. Numerous questions are generated by the analysis issue such as:

How to combine numerical, linguistic, and textual data

How to interpret divergent results between numerical and linguistic data

What to do with overlapping concepts that emerge from the data and are not clearly differentiated from each other

Whether and how to weight data sources

Whether each different method used should be considered equally sensitive and weighted equally

Conclusion

Methodological triangulation is not the panacea for every social science research project. We as social science researchers should be
mindful however, that one methodology can narrow a researcher's perspective and can deprive him/her of the benefits of building on
the strengths inherent in a variety of research methodologies. Triangulation can maximize the strengths and minimize the weaknesses
of each individual approach while strengthening research results and contributions to theory and knowledge development. The
benefits of triangulation also serve to enrich and deepen our understanding of the research environment while seeking convergence,
corroboration, and correspondence of results across the different method types. This framework highlights the integrative potential of
these strategies, and underscores their potential power not only to incorporate qualitative and quantitative analyses, but also vice
versa, and, even beyond, to spiral iteratively around the different data sets, adding depth of understanding with each cycle. (Caracelli
& Greene, 1993). Through this process threats to internal validity can be recognized and addressed.

While methodological triangulation can enhance, illustrate, and clarify research findings, the researcher should keep in mind that the use
of multiple methods can also lead to the discovery of paradox and contradiction. In addition to considering the caveats listed above,
at the outset of the research project the social science researcher must meticulously develop a comprehensive conceptual framework
for methodological triangulation which includes planning for data analysis along with planning the design of the study. The analysis
of research findings from one methodology can then provide a set of substantive categories that is used as a framework for
analyzing the remaining research findings (e.g., in-depth interviews or concept mapping to inform questionnaire development).

Reference List

Breitmayer, B.J. (1993). Triangulation in Qualitative Research: Evaluation of Completeness and Confirmation Purposes. IMAGE:
Journal of Nursing Scholarship 25(3), 237.

Campbell, D.T. & Fiske, D.W. (1959). Convergent and discriminant validation by the multi-trait-multi-method matrix. Psychological
Bulletin, 56, 81-105.

Caracelli, V. & Greene, J. (1993). Data Analysis Strategies for Mixed-Method Evaluation Designs. Educational Evaluation and
Policy Analysis, 15(2), 196.

Corner, J. (1990). In search of more complete answers to research questions. Quantitative versus qualitative research methods: is there
a way forward? Journal of Advanced Nursing, 16, 718-727.

Duffy, M.E. (1987). Methodological Triangulation: A Vehicle for Merging Quantitative and Qualitative Research Methods. IMAGE:
Journal of Nursing Scholarship, 19(3), 130-133.

Hinds, P. & Young, K. (1987). A Triangulation of Methods and Paradigms to Study Nurse-Given Wellness Care. Nursing Research,
36(3), 195.

Madey, D. (1982). Benefits of Qualitative and Quantitative methods in program evaluation, with illustrations. Educational Evaluation
and Policy Analysis, 4, 223-236.

Mitchell, E. (1986). Multiple Triangulation: A methodology for nursing science. Advances in Nursing Science, 8, 18-26.

Morse, J. (1991). Approaches to Qualitative-Quantitative Methodological Triangulation. Nursing Research, 40(1), 120.

Patton, M. (1990). Qualitative Evaluation and Research Methods. SAGE Publications, Newbury Park. 464.

Shaddish, W., Cook, T., & Leviton, L. (1991). Foundations of Program Evaluation: Theories of Practice. SAGE Publishing Company,
Newbury Park.

Trochim, W. (1982). Designing Designs for Research. The Researcher, 1(1), 195-200.
Trochim, W. (1989). An Introduction to Concept Mapping for Planning and Evaluation. Evaluation and Program Planning, 12, 1-16.

Kathryn A. Bowen
E-mail:kab19@cornell.edu

Copyright ©: 1996, Kathryn A. Bowen, Revised April 1, 1996


Welcome to Surveys On-line!
This digital domicile is designed to give you an overview of the kinds of surveys on-line. This page lists and
describes numerous surveys you may choose to participate in or browse. Examples of the absurd, the humorous,
and the academic are all here!
Contents:
Surveys For Fun: The Humorous and Absurd
Surveys that ask you what kinds of (usually strange) things you may intellectualize. Some examples of Surveys For Fun are:
❍ The Big Nose Survey

❍ Paranormal Belief Survey

❍ What Was the Coolest Ship in the Star Wars Trilogy?

Surveys on the Serious Side


Surveys that really want to know something about who you are! Many of these surveys are conducted by university students. Some examples of
Serious Surveys are:
❍ What is Missing on the Web?

❍ Intercultural Psychological Survey

❍ Internet User Demographics Survey

Places to go to Find on-line User Demographics and Statistics


Numerous On-line Surveys already conducted have results displayed on their Web Pages. This section will point you in the direction to find
information and tell you what you'll see once you get there.
What is the Current State of On-line Survey Measurement?
My opinions on where we are and where we are going.

Surveys for Fun: The Humorous and Absurd

First, if you are from another planet and wish to tell us about your mother planet, please contact The Planet Information Survey. Information regarding special
abilities and the celebration of earthling holidays is requested. If you are an earthling, but you have experienced an alien encounter, perhaps the Paranormal
Belief Survey is for you. Or try the Do You Believe Survey. Answer questions about UFOs, Government Cover-ups, and controversial events in society.

Moving into reality...Do you have a question you are dying to ask hundreds of strangers? Post your question on the Web! You can add a question with up to 15
responses. Your question will rest among others such as What is Your Definition of Stupid, What was the Coolest Ship From the Star Wars Trilogy, and Do
Women Think About Sex as Much as Men.

So, you would like to share your own opinion. Are you interested in media coverage of the Royal Family? This four-question survey wants to know. Results
will be sent to People Magazine! Do you have an opinion about the Net Community? SurveyNet lists a variety of opinion surveys asking about your opinions of
the net. These surveys change periodically. You can also view the results of past surveys SurveyNet has conducted.

Thinking about your appearance? Describe your nose in the Big Nose Survey. If you have come this far you probably should take the Wierdness Test. Your
score will be calculated for you immediately following completion. While at the same site, also participate in the Purity Test For Non-Virgins (Digital Sex
Doesn't Count!).

Starting to feel guilty? Survey your Bible Knowledge.

Now if you have participated in all of these surveys you are probably spending too much time on the Net. Take the Internet Use Survey. This survey is being
conducted by a Ph.D. candidate in clinical psychology.

Return to Contents
Surveys on the Serious Side

Here is a listing of mainly academic surveys being conducted on the Web. Be kind, participate in these surveys (they help students!).

A great survey being conducted by a Ph.D. student in instructional technology actually teaches you how to construct Web pages!

An Intercultural Psychological Survey is being conducted by a psychology master's student at the University of Freiburg in Germany. There are three
questionnaires. All participants may receive a summary of their results after completion of the investigation.

A Presidential Politics on the Web Survey aims to find how technology has changed politics. Participate in this survey if you have visited Web sites dedicated
to the 1996 presidential campaign. These UC Berkeley students want to know about your experience!

Another interesting survey is The Role of the Internet in Health Care. How do you think the Net will change communication between doctors, hospitals, and
patients?

The Gallup Organization also has numerous surveys posted on their Web site. Take their quick Internet survey!

And if you still haven't had enough, visit InfoQuest!, a great link to Internet surveys that will fill your heart's content.

Return to Contents

Places to Find On-line User Demographics and Statistics.

There are three very good links to Internet statistics. Most of the research findings reported were gathered from surveys conducted on-line or at least partially
on-line.

● Resources on Internet Survey Methodology


● InfoQuest! Internet Surveys and Statistics
● Dr. K's Links to WWW Demographics

One of the largest on-line surveys is the GVU User Survey. The GVU User Survey was conducted by the Graphics, Visualization, and Usability (GVU) Center at
Georgia Institute of Technology. Summary reports about user demographics, Web and Internet usage, and preferences for Web service providers are available
through their Home Page.

Return to Contents

What is the Current State of On-line Survey Measurement?

The Net is in its infancy. The quality of reputable surveys conducted on the World Wide Web and the Internet reflects this state. Clearly, finding surveys on the
Net is a difficult task. One is drowned by the large numbers of them! Although it is frustrating, one must remember that this technology represents freedom of
speech at its best! You can post a survey or participate in a survey on just about any topic you can think of. The only problem is what to do with all of this
information. What are people currently doing? It appears that most of the surveys on-line are entertainment oriented. The academic and research oriented
surveys, however, present a problem. The academic and research surveys need to begin addressing issues of external validity. Few surveys comment on the
fact that they are drawing a nonrandom and nonrepresentative sample. First, the samples are nonrandom because people self-select to participate in these
surveys. Usually, the participants who are most interested in your topic take the time to answer your survey. Second, the sample of people that on-line
surveys draw from is not representative of the population. Currently, Caucasian, middle-aged, educated, higher-income males are the most prevalent group
on-line. On-line surveys should not generalize to the population based on findings from samples of self-selecting middle-aged males!
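
One partial remedy, sketched below in Python with invented proportions, is post-stratification weighting: weight respondents so that the sample's demographic mix matches known population figures. This cannot undo self-selection, but it does correct the most obvious demographic imbalances in an on-line sample.

    # A minimal sketch with invented proportions -- illustrative only.
    sample_share = {"male": 0.80, "female": 0.20}      # who answered the on-line survey
    population_share = {"male": 0.49, "female": 0.51}  # assumed census benchmark

    # Post-stratification weight for each group.
    weights = {g: population_share[g] / sample_share[g] for g in sample_share}

    # Example: share answering "yes" to some item, by group.
    yes_rate = {"male": 0.60, "female": 0.40}
    unweighted = sum(sample_share[g] * yes_rate[g] for g in sample_share)
    weighted = sum(sample_share[g] * weights[g] * yes_rate[g] for g in sample_share)

    print("Unweighted estimate: %.2f" % unweighted)  # dominated by male respondents
    print("Weighted estimate:   %.2f" % weighted)    # rebalanced to the population mix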

The growth of this digital world has brought about concerns regarding World Wide Web and Internet surveys. The proliferation of advertisements on the Net requires
new and improved ways of assessing audience desires, beliefs, and opinions. For instance, the Resources on Survey Methodology page is a link to various resources
on Internet survey methodology. It leans more toward the concerns of academic and research-type methodology. Guiding principles of interactive media
audience measurement is a good page to learn about innovations in audience measurement within the advertising industry. Moreover, a discussion about the
needs and the future of audience measurement is included at this site. Finally, more information regarding audience measurement is at the W3C Home Page.
You will find numerous links to other sites dealing with media measurement.

Return to Contents
Thanks for coming!
Laura Brown
Masters Student
Cornell University, Department of Communication
Comments to author: LAB19@Cornell.Edu
Children's Rights Website

"Children in need of protection

are children

in need of having their rights respected."

- Michelle, age 14

This webpage is intended for child protection professionals and laypersons who have an interest in learning about children's rights and
how a children's rights approach can become part of child protection efforts. It offers an introductory overview of children's rights,
useful links and my 12-step children's rights guideline for child protection.
What are Human Rights?

"A right is something to which one has a just claim; something that one may properly claim as due." (Webster's dictionary)

Human rights are generally defined as rights which are inherent in our nature and without which we cannot live fully as human
beings. Human rights are called universal because they apply to all human beings, without exception.

The purpose of human rights is to foster the conditions which allow us to fully develop and use our human qualities, our
intelligence, our talents and our conscience and to satisfy our spiritual and other needs. Human rights are based on people's
increasing demand for a life in which the inherent dignity and worth of each human being will receive respect and protection. The
notion of human dignity is central to all pursuits of human rights.

United Nations High Commissioner for Human Rights: http://www.unhchr.ch/

UNIVERSAL DECLARATION OF HUMAN RIGHTS: http://www.unhchr.ch/udhr/index.htm

Human Rights Watch: http://www.hrw.org/

What are Children's Rights?

When we talk about the rights of children, we mean human rights as they relate specifically to children. The U.N.
Convention on the Rights of the Child, which is the broadest and most widely endorsed children's rights instrument
worldwide, defines children as all persons aged 18 and under. Hence, children's rights apply universally to all persons 18 and
under (with the exception of countries where the age of majority is obtained earlier).

While human rights certainly apply to all human beings, children need human rights tailored to their special needs and
vulnerabilities. That is why children have children's rights, in addition to human rights.

Some children's rights are in fact the same as the rights every person has regardless of age: protection from torture, access to
health care and freedom of expression and association. Most of the children's rights recognized in the U.N. Convention,
however, address the specific needs of children: rights regarding primary education, opportunities for play, and
adoption. Others take into account the inherent vulnerability of children, such as the right to protection from abuse and neglect
(Article 19), from economic exploitation, i.e. child labor (Article 32), and from all forms of sexual exploitation and
abuse (Article 34).

Just as human rights are universal in that they apply to all human beings simply for being human, children's rights are
also universal and therefore apply to all children everywhere in the world, regardless of nationality, creed, religion, or
circumstances.

Defense for Children International (DCI) is a leading organization in the area of children's rights. It played a major
role in the creation of the UN Convention. It is set up to ensure that on-going practical, systematic and concerted international
action is specifically directed towards promoting and protecting the rights of the child.

In order to help all those individual organisations working for the legal and social defence of children, DCI has produced a kit
on the Rights of the Child to help readers better understand the legal texts. An electronic version of the kit is available.

Child Rights Information Network: http://www.crin.org/

International Save the Children Alliance: http://www.savethechildren.net/

Defense for Children International: dci-hq@pingnet.ch

The Right to Protection from Armed Conflict and Displacement

Desk on Children in Armed Conflict and Displacement: http://www.crin.org/war.htm

In a world where wars are constantly being fought between countries and internally, what happens to the children of war?
Initiatives are needed to stop the use of child soldiers, campaigns are needed to ban land mines, and people are needed to
advocate for the rights of refugee and displaced children and promote the optional protocol to the Convention on the Rights
of the Child on Involvement of Children in Armed Conflict.

There are organisations whose sole purpose is working to enforce the rights of children involved in armed conflict and
displacement.

The Right to Protection from the Internet

Childnet International: http://www.childnet.mdx.ac.uk/childnet/

Another right which is not specifically covered in the UN Convention but which has risen to the forefront of our
consciousness is the need to protect children from the negative potential of the internet: protection against internet pedophiles,
pornographic materials, violent sites, etc. Some individuals and organizations are beginning to organize around the
child's right to "cyber" protection.

Childnet International is one such group. It is a non-profit organisation, concerned to enable children to benefit from all
the changes in international communications, and to protect them from any negative influences.

The Right to Vote?

Association for Children's Suffrage: http://www.brown.edu/Students/Association_for_Childrens_Suffrage/

KidsVote: http://www.childwelfare.com/kids/kidsvote.htm

An issue which is not covered in the UN Convention on the Rights of the Child is that of children's suffrage. Should children
be allowed to vote? A number of child rights advocates think so. If children should vote, at what age? And, more fundamentally,
why?

Proponents of children's right to vote contend that those who do not vote (i.e., children) are unlikely to have their interests
protected because their needs will be seriously overshadowed by political decision-makers. After all, politicians place great
importance on their constituents' lobbying efforts, and children, well, don't have the political clout to compete.

"The fundamental problem is that our political system fails to provide a mechanism that lets the interests of children be
represented. In modern democratic societies like the United States, political power derives from the vote. Those who can vote
are able to assure that their needs and interests are protected. Yet, children are unable to vote." (Assoc. for Children's Suffrage)

"Obviously before they develop the cognitive skills and emotional maturity necessary for making difficult political judgments,
children cannot be expected to vote. Perhaps these children should have their right to vote exercised by proxy. We could assign
their proxy to their principal care giver. If children were given the franchise, then their interests and needs would receive
attention equal to other groups in democratic society." (Duncan Lindsay, U. of Calif.)

The Right to Both Parents

Children's Rights Council: http://www.vix.com/crc/
The Alliance for Non-Custodial Parents Rights: http://www.ancpr.org/

Some children's rights advocates spend their time and energy fighting for children to be able to have both parents, as well as their extended families, involved in their lives.

This is an issue which becomes especially important when parents separate, divorce, re-marry, or move away, or when one parent dies.

Formed in 1985, the Children's Rights Council (CRC) is a national non-profit organization based in Washington, DC, that works to assure children meaningful and continuing contact with both their parents and extended family regardless of the parents' marital status.

Unicef: http://www.unicef.org/

All children have the right to health and decent living conditions, including access to shelter, drinking water, and sanitary conditions.

Educate the Children: ETC-Ithaca@aol.com

All children have the right to education.
20th Century Landmarks in the Children's Rights Movement:

1923 The Save the Children International Union draws up and approves the Declaration of the Rights of the Child - commonly known as the Declaration of Geneva - a five-point statement of basic child welfare and protection principles.

1924 The Fifth Assembly of the League of Nations in Geneva endorses the Declaration of Geneva, inviting members to be guided by its principles.

1948 The newly formed United Nations approves a slightly expanded text of the Declaration.

1959 A new 10-point Declaration of the Rights of the Child is promulgated by the United Nations General Assembly on November 20th.

1978 The Government of Poland tables the first draft text of a Convention on the Rights of the Child.

1979 The International Year of the Child. The United Nations Commission on Human Rights considers the Polish proposal, and sets up a Working Group to review it and produce a final draft.

1983 Several non-governmental organisations form the NGO Ad Hoc Group to maximize their contribution to the UN Working Group's efforts, and over the years have an unprecedented impact on the formulation of the Convention.

1989 The Working Group submits the final draft text to the Commission on Human Rights, and on November 20th -- 30 years to the day after the approval of the 1959 Declaration -- the United Nations adopts the Convention on the Rights of the Child.

1990 The Convention on the Rights of the Child enters into force on September 2nd.

1995 The U.S.A. signs the Convention on the Rights of the Child on February 16th.

??? (still to come) The U.S.A. will ratify the Convention on the Rights of the Child.

[Adapted from Defense for Children International's pamphlet "Children's Rights AND YOU".]
Website created by: Claire Bedard

Direct all enquiries to:

cb53@cornell.edu
Information Systems Technology
for Human Services

As you are surfing the World Wide Web (WWW) trying to find
information on a topic of interest, what is it about a web site that
keeps you returning time after time, or sends you away never to
return? Information. The web site either has the information you
are looking for or it doesn't. This information could be on how to
evaluate and measure program effectiveness, how to develop an
information system, or research sites for an article you are writing.
The point is you want that information and you want to find it easily.
The World Wide Web has an enormous amount of information; some
of it is very good and some is not.

The purpose of this Web Site is to serve as a resource center for
anyone interested in Information Systems Technology. Whether you
are a seasoned professional, researcher, student, or a novice trying to
learn more about what this technology is and how it is being used,
this Web Site will be of interest to you.

Home | Information Superhighway | Community Information Systems Technology | TeleMedicine |

Return to Project Gallery | Trochim's Social Research Home Page

Updated:March 28, 1997


Copyright © 1997 David Abrahams. All rights reserved.
Comments and questions: daa9@cornell.edu
WHAT IS
THE INFORMATION SUPERHIGHWAY?

The information age and the information superhighway would not be possible without the
evolution of the personal computer and telecommunications technology that we have seen
over the past two decades. During that time, we have gained the ability to convert larger
and larger volumes of data into information, to integrate voice, graphics, video and text
into a single medium, and then to transmit them economically between people and machines.

A commonly asked question is "What is the Information Superhighway?" or "What is the
Internet?" Many people have heard the terms Information Superhighway and Internet, but few
really have an unambiguous definition of them. The Information Superhighway is a
popular name for the Internet that has become an umbrella concept. A lot of people think that
the Internet is the same thing as the Information Superhighway; however, the Internet and the
Information Superhighway are not the same entity. As a term, it is used by different people to
describe different information systems developments that utilize telecommunications as a method
of communication. The Information Superhighway is an international value-added system made
up of commercial and individual application service providers that utilizes the Internet as the
backbone of an ever-expanding communication system. It is an international "network of
networks" that links more than 3 million computers around the world.

The Internet originated in the U.S. as ARPANet. It was set up by the Department of Defense to
support military research on maintaining communication between its computers during an
emergency and, in addition, to allow the military and its contractors to share software and other
information. As it expanded during the 1970s and 1980s, the network became dominated by
university researchers and spread around the world. It was and still is a cooperative venture,
governed by the "Internet Society".

We have also seen the convergence of these computing and communications technologies,
making possible an astonishing array of innovative information products and services. Today,
easily accessible networks connect people all over the world. But the information people
exchange on them has been limited to voice, or more recently, data and faxes. Even these
different media are usually handled by separate transmissions.

The '90s have brought the Internet (or the Net) out of the world of business and academia, and
thrust it fully into the consumer sector (known as the World Wide Web or WWW), where it has
burgeoned into a virtual community without borders, with a population of close to 20 million
worldwide. However, despite military, institutional, and corporate involvement in bringing the
"information superhighway" to fruition, no one really owns the Internet. It's merely a giant
network of computers, owned or funded by a variety of entities or nations, all willing to be
connected to each other for their mutual benefit. It has been described as an anarchy, where
anyone can say or do almost anything at anytime, anywhere. Today, almost anyone with even a
modest income can reap the benefits of instantaneous communication for their own purposes
with a computer, a modem, and a telephone line.

Eventually, people will be able to reach out from any location they choose and seamlessly exchange information in the form of voice,
video, data and images in any combination they need, with no more effort than it takes to dial a phone call today.

Home | Information Superhighway | Community Information Systems Technology | TeleMedicine |


Return to Project Gallery | Trochim's Social Research Home Page

Updated:March 28, 1997


Copyright © 1997 David Abrahams. All rights reserved.
Comments and questions: daa9@cornell.edu
Community Information Systems Technology

Home | Information Superhighway | Community Information Systems Technology | TeleMedicine |


Return to Project Gallery | Trochim's Social Research Home Page

What is a community information system?

A community information system is a computer-based information system that is becoming increasingly common in
the U.S. and around the world. Such systems are also called community networks, civic networks, or free-nets. The goal of most
community information systems is to strengthen community life for local residents by fostering economic
development, educational opportunities, health and well-being. Community information systems enable residents
of the community and its schools to access information and communication services; governments can provide
citizens with information; residents can communicate with elected officials quickly and inexpensively; and small- and
medium-sized businesses have access to information and communications that normally would only be available to
large firms. Examples of community information systems include information and referral services in public
libraries, community bulletin boards, local job placement services, and program databases in social service agencies.

Additional information on Community Information Systems is listed below:

Communities On-Line: Community-Based Computer Networks by Anne Beamish

This is an excellent resource if you want to learn about community networks. It is a full-text research paper written
by Anne Beamish of MIT. Her paper defines community networks and describes their distinguishing
characteristics, goals, and history. It also gives a tour of several community networks and the services they offer.
(Anne Beamish, 1995.)

"Community-Building Communications Technologies and Decision Support Systems" by Michael Shiffer

This paper describes the elements of a planning and community development process. It discusses the relationship of
multimedia information technologies such as community information networks, collaborative planning tools, and
advocacy/evaluative applications to planning and community development. Finally, it closes by identifying issues
and questions for further research. (Michael Shiffer, 1996.)

The Virtual Community: By Howard Rheingold

“The late 1990s may eventually be seen in retrospect as a narrow window of historical opportunity, when people
either acted or failed to act effectively to regain control over communications technologies. Armed with knowledge,
guided by a clear, human-centered vision, governed by a commitment to civil discourse, we the citizens hold the key
levers at a pivotal time. What happens next is largely up to us.” Howard Rheingold

The Virtual Community by Howard Rheingold is an interesting book about computers, technology, and the
potential impacts this medium has on social behaviors and personal liberties - a good book to read. (Howard
Rheingold, 1993.)

The Community Information Exchange

The Community Information Exchange is a national, nonprofit information service located in Washington, DC,
that provides community-based organizations and their partners with the information they need to successfully
revitalize their communities. The Exchange provides comprehensive information about strategies and resources for
affordable housing, economic and community development, customizes this information for individualized
inquiries, and offers technical assistance.

Kids Web

A World Wide Web digital library for school kids in grades K-12. If you have children or you are
interested in application links for your community, this is an excellent one to add to your links.
The Green Future Foundation Information Technology and Community

This is normally a great site for finding information on information technology and the community, but please
be aware that a number of links have changed their addresses and do not work from this site. There are over 70
reference links at this site, with plenty that work.

The San Marino Network

The San Marino Network (SMnet) was launched just over a year ago as a public-private collaboration linking The
Huntington Library, the San Marino Unified School District, the City of San Marino and the San Marino Public
Library. Their objective is to develop innovative educational, research, and civic applications involving schools,
the public library, research institutions, local government, and individual citizens.

Home Page for City of Seaside, California

City of Seaside, California Home Page. Located on Monterey Bay, this is an example of a community information
system whose primary focus is economic development. It also includes municipal information and many links to
local sites.

Blacksburg Electronic Village

Blacksburg Electronic Village is considered the most wired community in the United States. The concept of the
BEV came about in early 1991. Virginia Tech boasted a campus-wide CBX data network and began looking into
ways to extend access to personnel in off-campus homes and offices. A decision was made to join forces with Bell
Atlantic of Virginia (then C & P Telephone) and the Town of Blacksburg to offer Internet access to every citizen in
town.

The next two years were spent preparing the town's information infrastructure -- installing digital switching
equipment and a fiber backbone. In the spring of 1993, a group of citizens beta tested the first distribution of the
BEV software which included Internet e-mail and gopher clients. The BEV officially opened its doors for business
in October, 1993. Initially, only dial-up access was offered; ISDN and Ethernet were made available later. The
software package was enlarged to include a full suite of Internet tools.

It is difficult to get an exact number of citizens using the BEV and the Internet. A recent town poll indicated that
62% of Blacksburg's 36,000 citizens (about 22,000) use Internet e-mail. Statistics based on IP addresses show that
roughly 18,000 citizens have access to the BEV and the Internet. It is estimated that BEV's membership is
somewhere between 18,000 and 22,000. For more information about the Blacksburg Electronic Village, visit their site
by clicking on the underlined title: Blacksburg Electronic Village
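
For readers who want to see where the 18,000-22,000 range comes from, here is a minimal sketch in Python that reproduces the two bounding estimates from the figures quoted above; the variable names are ours, not the BEV's.

# Rough bounds on BEV membership, using the figures cited in the paragraph above.
town_population = 36_000          # Blacksburg citizens
poll_email_share = 0.62           # share reporting Internet e-mail use in the town poll
ip_based_estimate = 18_000        # citizens with access, estimated from IP addresses

poll_based_estimate = round(town_population * poll_email_share)  # ~22,320, i.e. "about 22,000"

low, high = sorted([ip_based_estimate, poll_based_estimate])
print(f"Estimated BEV membership: between {low:,} and {high:,}")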
Home | Information Superhighway | Community Information Systems Technology | TeleMedicine |
Return to Project Gallery | Trochim's Social Research Home Page

Updated:March 28, 1996


Copyright © 1995 David Abrahams. All rights reserved.
Comments and questions: daa9@cornell.edu
Telemedicine

Home | Information Superhighway | Community Information Systems Technology |


Return to Project Gallery | Trochim's Research Method's Home Page
Related Sites

What is Telemedicine?

All over the world, people living in urban, rural and remote areas have limited access to timely, quality medical care and
minimal access to specialty medical care. Many illnesses that are treatable and/or curable go untreated. The
result for these individuals is higher critical care incidence, longer hospital stays, or death. For health care providers it
means high costs to deliver medical care and tougher medical decisions regarding who can and cannot receive treatment,
because of fiscal considerations and ethical issues centered around the principle of utility, "the greatest good for the greatest
number".

Telemedicine is a high-tech solution to the problem of access to general and specialty medical care. Telemedicine uses
telecommunications to transfer medical data and information from one location to another. It has the ability to send
patient records, video conferencing, high resolution photos, radiological images and sound to any location equipped with
a PC with a modem, an Internet or satellite connection, video conferencing equipment and a phone. Because telemedicine
can bridge the geographical problem and link hospitals, medical schools, research institutes, and regulatory agencies like the
Centers for Disease Control to physicians anywhere, the ability to diagnose and treat illnesses in a timely and quality
manner is a reality.

The following are two examples of telemedicine technology applications in practice. The first involves the National
Jewish Center for Immunology and Respiratory Medicine and Los Alamos National Laboratory. Los Alamos National
Laboratory developed a telemedicine system called TeleMed, which is based on a nationally distributed radiographic
repository for the National Jewish Center for Immunology and Respiratory Medicine in Denver, CO. Without leaving
their offices, participating doctors can view radiographic data via a sophisticated multimedia interface. With the new
system, a doctor can match a patient's radiographic information with the data in the repository, review treatment history
and success, and then determine the best treatment. This capability of creating a patient record "on-the-fly" from
multiple databases is called "The Virtual Patient Record". An excellent telemedicine white paper on this topic, "The
Virtual Patient Record: A Key to Distributed Healthcare and Telemedicine," was written by David Forslund and David
Kilman of Los Alamos National Laboratory.
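
To make the "on-the-fly" idea concrete, here is a minimal, hypothetical Python sketch (not the actual TeleMed design, whose internals are not described here) of how a virtual patient record might be assembled at request time from several independent databases; the source names and fields are illustrative assumptions only.

# Hypothetical illustration of a "virtual patient record": nothing is stored centrally;
# the record is assembled on the fly from whatever independent sources respond.
# The source names and fields below are invented for illustration.

def fetch_radiology(patient_id):
    return {"radiographs": ["chest_1995_03.img"], "source": "radiology_repository"}

def fetch_treatment_history(patient_id):
    return {"treatments": ["antibiotic course, 1995"], "source": "clinic_database"}

def build_virtual_record(patient_id, sources):
    """Merge partial records from each source into one view of the patient."""
    record = {"patient_id": patient_id, "contributing_sources": []}
    for fetch in sources:
        partial = fetch(patient_id)
        record["contributing_sources"].append(partial.pop("source"))
        record.update(partial)   # later sources may add new fields
    return record

if __name__ == "__main__":
    print(build_virtual_record("demo-001", [fetch_radiology, fetch_treatment_history]))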

The second example of telemedicine in action is the case of ZHU Ling: the first international telemedicine trial to
China. On April 10, 1995, a student from Beijing University, BEI Zhicheng, and his fellow students sent an SOS e-mail message
through the Internet to ask for international help for a young female university student, ZHU Ling, suffering from an
unknown but severe disease.

The message received over 2,000 e-mail replies from 18 countries and regions, and the Internet played a very
important but complicated role in ZHU Ling's life. The challenge was to see just how far telemedicine by the Internet could
be taken to bridge the cultural, linguistic, and even political gulfs between China and the Western world. To find out
more about this case, see: The First International Telemedicine Trial to China: ZHU Ling's Case.

Issues Raised by the Use of Telemedicine

It is cases like ZHU Ling's that scratch the surface of the potential of telemedicine, but that also raise some very
important questions and legal and ethical issues. The issues raised by the use of telemedicine listed below are
summarized from the legislative records of the 104th US Congress (you can access any of the telemedicine proceedings by
selecting 104th Congress and, once on that web page, typing TeleMedicine to retrieve the Congressional Records),
the "Peer Reviewed Articles" section of TIE's Legal and Ethical Issues in Telemedicine, and Arent Fox's
"Telemedicine and the Law":

1. Confidentiality and security


❍ The confidentiality of medical records is the biggest barrier to the full realization of telemedicine.
❍ Instantaneous access, though extremely beneficial to the medical community, requires the institution of
large medical record databases containing information about all patients cared for by a particular
hospital, HMO, etc.
❍ Large databases of medical records, left unprotected, jeopardize the integrity of those records as
confidential.

2. Dispersion of liability
❍ Use of tele-consultation, teleassistance, and telemedicine, in general, disperses liability for damage
between many parties.
❍ The question of who is responsible when several parties are involved becomes an important one.
❍ Statutes for malpractice become muddled when jurisdictions are mixed.
3. Licensure and accreditation
❍ Telemedicine creates conflicts between states, or countries, over the rules and regulations of accreditation
and licensure.
❍ Medical licenses are issued by state and thus it is effectively illegal to practice medicine in another state,
since rules for accreditation may differ between them.
❍ The question then becomes, "Is it illegal for a patient in North Carolina to receive treatment advice from a
doctor in South Carolina?"

4. Fraud
❍ Accuracy of information. Is the information people receive online correct?
❍ The source and currency of information must be considered.
❍ Fraud also relates to medical records: is their privacy being compromised?

5. Tele-diagnosis may be ethically unsound


❍ As much help as telemedicine may be, there is no substitute for being in the room with a patient.
❍ Too much reliance on remote diagnosis or assistance can result in inaccurate diagnosis and is therefore
unethical.

Telemedicine Index of Additional Sources

Home | Information Superhighway | Community Information Systems Technology


Return to Project Gallery | Trochim's Research Method's Home Page |
For additional Telemedicine Information
Telemedicine Index of Additional Sources

Home | Information Superhighway | Community Information Systems Technology | TeleMedicine |


Return to Project Gallery | Trochim's Research Method's Home Page

● Alaska Telemedicine Project - The program was developed specifically for the purpose of enabling family physicians to
practice in rural parts of the state.
● California Telehealth/Telemedicine Coordination Project - to support the emergence of telehealth/ telemedicine networks
to benefit Californians.
● City of Bits from MIT Press, by Mitchell, W.J. - The network is the urban site before us, an invitation to design and
construct the City of Bits (capital of the twenty-first century), just as, so long ago, a narrow peninsula beside the
Maeander became the place for Miletos. But this new settlement will turn classical categories inside out and will
reconstruct the discourse in which architects have engaged from classical times until now.......... . Its places will be
constructed virtually by software instead of physically from stones and timbers, and they will be connected by logical
linkages rather than by doors, passageways, and streets.
● Health Care and the NII DRAFT FOR PUBLIC COMMENT PART I: What Is the Application Arena? Description of a
Health Care Information Infrastructure
● First International Telemedicine Trial to China - chronology of the case of a Zhu Ling, a female university student
suffering from an unknown but severe disease.
● The Health Law Resource -This page is for those interested in health care law. Primarily, this page is intended as a
resource for health care practitioners, professionals or anyone interested in learning more about the dynamic field of
health care law, and more specifically, the regulatory and transactional aspects of health care law practice.
● Telehealth Industry Project - McGill University-Industry Canada Telehealth project is studying the telehealth industry in
Canada, which includes telemedicine, health networks (such as CHINs), and telecare.
● Telemedicine and Telehealth Networks - a magazine for professionals in the field of telemedicine.
● Telemedicine and the Law
● Telemedicine Information Exchange -The TIE attempts to provide an all-inclusive platform without bias for all
information on telemedicine. Inclusion of items in the TIE does not necessarily infer endorsement by the Telemedicine
Research Center.
● Telemedicine Resources and Services - information on various telemedicine projects and services internationally.
● Index - PACSpage: Telemedicine / PACS Resource - dedicated to the medical imaging industry: PACS, teleradiology,
DICOM, telemedicine, radiology, informatics, and vendors.
● Usenet - sci.med.telemedicine - If you have questions regarding any aspect of telemedicine, submit them to the
sci.med.telemedicine newsgroup, or read what other people have asked.

To the top of the page | Home | Information Superhighway


Community Information Systems Technology | Trochim's Research Method's Home Page
Copyright ©1997 David Abrahams
Homeless Children in Mexico

The following page will present an overview of the present situation of homeless children in Mexico City. This
problem is of great concern to the parties involved: the local government, the rest of the population, and the
children themselves. Every year the number of children on the streets continues to increase at an alarming rate,
which is a concern for the government. A "lost generation" continues to grow, without any solution to the
problem in the near future. The need to address the problem of homeless children is clear for both direct and indirect
reasons: crime, prostitution, and drug trafficking have also begun to rise, pointing to a clear relationship between the
two. Public hygiene also demands immediate action to resolve this problem permanently, as it is costing the
nation and its citizens as well. Mexico is not benefiting from a future adult population that is neither being
productive nor meeting the minimum needs of Mexican citizenship. This growing population of children will
become adults one day; the city's future gang members and criminal elite.

Mexico's suffering economy has increased the signs of abject poverty. The crisis brings greater unemployment,
and unemployment leads to greater poverty. Poverty expels children onto the streets. Many parents can no
longer support their growing families.

These homeless children tend to become users of drugs, which are bought with the day's earnings from begging or
any other simple job. Drug use serves as an escape from their reality; it helps them to forget. Every day more and more
children join the ranks of homelessness, mainly from the southern states of Veracruz, Oaxaca, Michoacan, and
Chiapas, which have been hardest hit by rising unemployment and poverty.
Street youth are not the problem but rather a symptom of the real problem. These children are evidence of how
the recent economic adjustments have damaged the weakest members of the country.

In Mexico City alone, there are over 13,000 children who have no home, no one to love and care for them.
Because of poverty, many have been turned out into the streets to fend for themselves by parents who could
not care for them. They scrounge food from garbage cans, earn what little money they can from odd jobs (when
available) and often turn to crime. Mexico has no welfare system to care for these children. There is very
detailed information on this theme worth looking into; visit the related sites, which I think would be very beneficial in
understanding and learning more about homelessness.

The Convention on the Rights of the Child is the most widely ratified human rights convention in history. About
96 per cent of the world's children live in States that have recognized their rights and are legally obliged to
fulfill them. The Convention on the Rights of the Child is helping to shape legal codes and school curricula, as
well as public and official attitudes about children and their rights.

From babies sold for adoption to youngsters stolen for sexual abuse or for begging and selling on the streets,
the numbers of kidnapped children are on the rise. There are even cases of youngsters mutilated to provoke pity
so they can fetch more money. Others are kidnapped to help smuggle drugs across the U.S. border. Most of the
victims are from working class and poor families. These sorts of incidents are hard to track because the authorities
find it difficult to maintain an exact count of the children on the streets. Mexico City's legislature is
debating ways to increase penalties to deter child trafficking. But the real deterrent, an effective and sensitive law
enforcement system, requires more than just another law.

According to UNICEF's Final Report of the II Census of Minors on the Streets in Mexico City, there are 13,373
minors (68.5% male and the rest female), of which barely 1,850 (13.84%) live on the streets and have broken off
all relations with their families, while the rest (11,523) only work there. UNICEF makes a clear distinction between
children who only work on the streets and those who also live on them. According to UNICEF there were 515
location points for these "children on the streets" in 1992; the new census of 1995 found 1,214 location points:
135.73% more in only three years. Of these, 85 are used for sleeping, 100 for working and sleeping, and the
rest for working. The greatest number are located in avenues and crossroads (386), markets (323) and metro
stations (148), although there are also some in public parks and gardens, commercial corridors, tourist zones,
bus terminals, parking lots, trash dumps, and cemeteries.
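
The census percentages above can be checked directly from the raw counts; here is a minimal Python sketch, using only the figures quoted in the paragraph above, that reproduces them.

# Reproduce the census figures quoted above from the raw counts.
total_minors = 13_373
live_on_streets = 1_850
work_only = total_minors - live_on_streets          # 11,523, as reported

points_1992 = 515
points_1995 = 1_214
growth = (points_1995 - points_1992) / points_1992  # increase in location points

print(f"Work only: {work_only:,}")
print(f"Share living on the streets: {live_on_streets / total_minors:.2%}")  # ~13.8% (reported as 13.84%)
print(f"Growth in location points, 1992-1995: {growth:.2%}")                 # ~135.73%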

According to further studies, in 1992, children under the age of twelve represented 25% of the total children on
the streets, but in 1995, 6,323 children between 0 and eleven years old were registered, an equivalent of 47.2% of
the total.

Migration contributed to the increase, accounting for 65% of the total population; Oaxaca, Puebla, and the State of
Mexico were the states most of these children came from. Other states contributing to the increase, to a lesser degree, are
Michoacan, Morelos, Queretaro, Guerrero, Hidalgo, Chiapas, Yucatan, Jalisco, Veracruz, and Nuevo Leon. Only
35% of these street children were born in Mexico City.

The living conditions of these children are brutal: only 5.38% eat meat, while 61.29% eat tacos, tortas, and
tamales (all mainly made up of flour or wheat), and 23.65% eat "junk food". In the last six months, 90.3%
reported having been sick, mainly with respiratory or gastrointestinal problems.

According to more data obtained, seven out of ten of these children use drugs, such as thinner, cement,
marijuana, alcohol, and tablets; 24.7% of the total have been using them for over three years.

Their school level is not registered, but it is assumed through several analyses that 88.1% know how to read and
write, while 11.9% are illiterate. These children beg on the streets (60%), sell some product (10.06%), or do
some very simple work (29.94%). Of all these, 37.63% earn 20 pesos a day, 43.01% up to 50 pesos a day, and
19.36% up to 100 pesos; often a lot more than the minimum wage for Mexico (approximately 25 pesos per day).

Many programs have been implemented in an attempt to eliminate this growing problem, but all have failed due
to the lack of appropriate ex-ante and ex-post program evaluation and research design.
There have been several recent successful approaches to effective and efficient program implementation based
on strong evaluation concepts. Among the research professors in the area of program evaluation
and research design, one stands out from the rest: Prof. William Trochim, whose knowledge of the subject can greatly
increase the understanding of such an important field. It is strongly recommended to visit his Home Page, where
Prof. Trochim goes into great detail on the subject.

Back to Top

Contact Information

Electronic mail address


eb32@cornell.edu

Back to Top

Comments and Suggestions


Please tell me what you think about this page.


Back to Top

Last revised: August 23, 2004.


Understanding Welfare Reform

"When an individual is no longer a true participant, when he no longer feels a sense of responsibility to his society, the content of democracy is emptied. When
culture is degraded and vulgarity enthroned, when the social system does not build security but induces peril, inexorably the individual is impelled to pull away
from a soulless society. This process produces alienation -- perhaps the most pervasive and insidious development in contemporary society." Martin Luther King,
Jr.

INTRODUCTION:
The recent passage of the Personal Responsibility and Work Opportunity Reconciliation Act of 1996 (P.L. 104-193), an extensive and historic reform of the
nation's welfare system, is one illustration of a general shift towards devolution of our government's functions. This legislation restructures the delivery of many
important social programs with effects that are still unimagined, hence creating a critical role for evaluation and other forms of applied research (Corbett 1996).
Translating this federal legislation into action, monitoring its progress, assessing its human and financial consequences and exploring its assumptions offer varied
opportunities to make use of many different types of evaluation research in policy and program settings.

Welfare reform was largely the result of political pressure for cultural change, the devolution shift, and fiscal pressures on federal spending produced by demands
for tax cuts, the rising cost of senior citizen entitlements, and the budget deficits that accumulated during the 1980s (Peterson 1995). State governments, too,
face similar pressures, and reducing costs has become a common public policy goal. Indeed, some would suggest that the states' motive for requesting
greater authority is solely to reduce the costs of delivering such services (Chisman 1995). There are many different issues and questions that result from the
general devolution shift and the specific legislative changes now enacted under the welfare reform political movement. This web page is intended to be a
beginning guide to the basics of welfare reform.

CONTENTS:
This page contains several parts including some brief discussion of the basics of welfare reform, a list of several resources available on the web, a bibliography of
many useful reference materials, as well as a list of several related web sites. In addition, I have provided several sites specific to social science research and
methodology to augment the information on welfare reform. These sites will provide one with the tools that may be needed to design and implement an effective
evaluation of welfare reform efforts.

MENU:

● The ABC's of Welfare Reform


● Resources Available
● Related WWW Sites

CONCLUSIONS:
The web can be a useful research tool for any area of interest. The ability to get information quickly (and in many cases for free) is one excellent feature of
conducting research on the web. The other great thing about web research is the enormous amount of information available to anyone. This abundance, however,
can lead to a long, tedious process of "sifting" through all that your search yields to get to information that is relevant to your specific interest. This web page is
one attempt to help others with that sifting process - in relation to welfare reform, evaluation methodologies and the general area of social science research. One
final point is that often one can obtain some quality information on a topic, but beware of references that are in essence the opinions of certain types of
organizations (such as advocacy groups or political lobbyists). This is not to dismiss that type of information as a resource, as long as it is backed up by more
substantiated references from other types of organizations. So, I hope this page is useful to you. I have included two good search engines so you may begin your
own research -- good luck...

Search Engines:
Alta-Vista
Yahoo

...and have fun searching the web! If you have any questions or comments on this web page, please contact Laura Colosi at Cornell University.
The ABC's of Welfare Reform

The Personal Responsibility and Work Opportunity Reconciliation Act of 1996

Prior to the Personal Responsibility and Work Opportunity Reconciliation Act of 1996, AFDC benefits were jointly funded by both the federal and state
governments, but benefit amounts were set by the states (Peterson 1995). The new welfare reform bill eliminates Aid to Families with Dependent Children (AFDC)
and replaces it with a fixed block grant program known as Temporary Assistance for Needy Families (TANF), which allows states to set their own eligibility
standards and benefit levels for individuals. Consequently, social welfare and job training programs will allow
for substantial state discretion in determining eligibility and designing delivery systems. This situation will result in greater variation in benefit levels across state
lines and an increase in experimentation by states - potentially in implementing punitive measures to reduce dependency. States are now in a race to reduce welfare
benefit levels to avoid attracting higher numbers of poor persons to their state. This has become known as the "race to the bottom" to see which states can
eliminate the most families from their welfare rolls.

Evaluation of Welfare Reform

Evaluation research involving the impact of welfare reform is currently conducted by non-profit organizations and government agencies. These efforts are aimed at
creating common outcome measures to assess individual well-being and program or policy success. These efforts are extensive, for example, the Urban Institute's
New Federalism project is one of its largest undertakings to date - involving total funds of $30 million. The project will span many years and includes several
components, including the tracking of welfare programs' changing eligibility criteria, benefit levels, time limits and behavioral incentives. The hope is to provide
accurate descriptions of the "evolution of state policy choices and broad trends in outcomes affecting children and families" (Urban Institute, 1996). The project also
involves extensive case studies of twelve states' development and implementation of new welfare policies in an effort to understand the ways in which policies are
implemented, and the extent of variation between localities in both the implementation of their policies and the effect new policies have on service delivery and
coordination.

The Administration on Children and Families (ACF) at the Department of Health and Human Services is the primary office responsible for the federal
government's evaluation efforts of welfare reform. ACF has recently begun a new initiative with the Office of the Assistant Secretary for Planning and Evaluation
(ASPE) to work with states to include measuring child outcomes in the course of conducting their state welfare evaluations. The goal is to reach a consensus
among participating states on a set of core measurements which will allow future evaluations of welfare waivers to show the effects of welfare reform on child
well-being and also to increase the state's capacity to track trends in child outcomes (ACF, 1996).

Evaluation research of welfare reform includes assessments of employment programs, TEA programs, child care programs and child support enforcement efforts
and many types of organizations contribute to these efforts including, but not limited to, the National Governor's Association, Center for Law and Social Policy,
Center on Budget and Policy Priorities, Congressional Research Service, General Accounting Office, and Child Trends, Inc. Thus it is apparent that these efforts
are both extensive and varied in approach, and it is my contention that evaluators can contribute to these efforts by providing expertise in evaluation within both
the policy and program management settings. This expertise, coupled with an understanding of the different uses for evaluation, will only strengthen the
contribution provided by evaluation research to the assessment of the impact of welfare reform.

Back to the Home Page


Printed Materials

Papers Available Online

The JOBS Evaluation: How Well Are They Faring?-- http://aspe.os.dhhs.gov/HSP/cyp/jobchdxs.htm


This paper was written by Kristin Moore of Child Trends, Inc. in Washington, D.C. and provides a good summary of evaluation efforts of the
JOBS (Job Opportunity and Basic Skills) Program prior to the new legislation (February 1996).

Welfare Reform in an Uncertain Environment-- http://epn.org/clasp/clunce.html


Written by Mark Greenberg of the Center for Law and Social Policy in Washington, D.C. (February 1996).

Creating a Work Based Welfare System Under TANF-- http://epn.org/clasp/clunce.html


This paper is also found at the Center for Law and Social Policy's Web Site, and is written by Steve Savner (November 1996). It addresses the
new block grant program Temporary Assistance to Needy Families (TANF) that replaces AFDC and its impact on work incentives for welfare
recipients.

Problems in the Evaluation of Community-Wide Initiatives-- http://epn.org/sage/rsholl.html


This paper provides the reader with a general overview of the problems incurred when undertaking evaluations of community-wide initiatives.

Bibliography of Selected Materials

Welfare Reform & Devolution

Bane, Mary Jo and Ellwood, David T. Welfare Realities: From Rhetoric to Reform. Harvard University Press, Cambridge, Ma., 1994.

Chisman, Forrest P. "Can the States Do Any Better?" The Nation. May 1995, pages 600-602.

Clinton, Bill and Gore, Al. Putting People First: How We Can All Change America. Times Books, New York, 1992.

Cook, Gareth C. "Devolution Chic: Why sending power to the states could make a monkey out of Uncle Sam." The Washington Monthly. April
1995, pages 9-16.

Danziger, Sheldon H., Sandefur, Gary D., and Weinberg, Daniel H., eds. Confronting Poverty. Harvard University Press, Cambridge, Ma., 1994.

Dunn, W. and Kelley, R., eds. Advances in Policy Studies since 1950. Policy Studies Review Annual, vol. 10. Transaction Publishers, New
Brunswick, N.J.: 1992.

Gillespie, Ed and Schellhas, Bob, eds. Contract With America. Times Books, New York, 1994.

Handler, Joel. The Poverty of Welfare Reform. Yale University Press, New Haven, Ct., 1995.

Peterson, Paul. "Who Should Do What? Divided Responsibility in the Federal System." The Brookings Review. Spring 1995, pages 6-11.

Reuben, Richard C. "The New Federalism." ABA Journal. April 1995, pages 76-81.

Rivlin, Alice M. Reviving the American Dream: the Economy, the States & the Federal Government. The Brookings Institution, Washington,
DC, 1992.
St. George, James R. "Unfunded Mandates: Balancing State and National Needs." The Brookings Review. Spring 1995, pages 12-15.

Program Evaluation

Chambers, Donald E., et. al. Evaluating Social Programs. Boston, Ma. Allyn and Bacon Publishers, 1992.

Chelimsky, Eleanor. "Linking Program Evaluation to User's Needs." In the Politics of Program Evaluation. Sage Publications, Newbury Park:
1987. (edited by Palumbo, Dennis.)

Chelimsky, Eleanor. "On the Social Science Contribution to Governmental Decision-Making." Science, Vol. 254; October 11, 1991.

Chelimsky, Eleanor. "The Political Environment of Evaluation and What it Means for the Development of the Field." Evaluation Practice, Vol.
16, No. 3, 1995.

Greene, Jennifer and McClintock, C. "The Evolution of Evaluation Methodology." Theory into Practice, XXX. Ohio State University, 1991.

House, Ernest. "Evaluation and Social Justice: Where Are We?" In McLaughlin and Phillips, eds. 1990.

House, Ernest. "Putting Things Together Coherently: Logic and Justice." New Directions for Program Evaluation , no. 68, Winter 1995: Jossey-
Bass Publishers.

Palumbo, Dennis. "Politics and Evaluation." In the Politics of Program Evaluation. Sage Publications, Newbury Park: 1987. (edited by
Palumbo, Dennis.)

Patton, Michael Q. "The Evaluator's Responsibility for Utilization." Evaluation Practice, vol. 9; May 1988, Sage Publications.

Shadish, William R., Jr., Cook, Thomas C., and Leviton, Laura C. Foundations of Program Evaluation. Sage Publications, Newbury Park,
1995.

Weiss, Carol H. "Evaluation for Decisions: Is Anybody There? Does Anybody Care?" Evaluation Practice, vol. 9, February 1988; Sage
Publications.

Weiss, Carol H. "Evaluation Research in the Political Context: Sixteen Years and Four Administrations Later." In McLaughlin, M.W. and
Phillips, D.R. Evaluation and Education: At Quarter Century. 1991.

Back to the Home Page


Related Web Sites

Web Sites: Welfare Reform, Children and Families

Non-profit organizations

American Public Welfare Association-- http://www.apwa.org

Brookings Institute-- http://www.brook.edu

Center for Law and Social Policy-- http://epn.org/clasp/

Children's Defense Fund-- http://www.tmn.com/cdf/index.html

Child Trends, Inc.-- http://www.childtrends.org

Child Welfare League-- http://www.handsnet.org/handsnet2/cwla

Economic Policy Network-- http://epn.org


This site is fantastic -- it provides links to dozens of organizations involved in welfare reform.

Manpower Demonstration Research Corporation-- http://www.stw.ed.gov/RFI/manpower.htm

Government Resources

U.S. Department of Health and Human Services-- http://www.os.dhhs.gov

Administration for Children and Families-- http://www.acf.dhhs.gov

Assistant Secretary of Planning and Evaluation-- http://aspe.os.dhhs.gov/hsp/isphome.htm

U.S. Census--http://www.census.gov

U.S. General Accounting Office-- http://www.gao.gov/index.htm

University Sites

Children Youth and Family Consortium (UMN)-- http://www.cyfc.umn.edu

Welfare Reform References (UMN - College of Human Ecology)-- http://www.cyfernet.mes.umn.edu:2400/welfare_reform


This site provides a great deal of written references, as well as links to other relevant organizations involved in welfare reform.
Institute for Research on Poverty-- http://www.ssc.wisc.edu/irp
This site is at the University of Wisconsin, a prominent institute on poverty and welfare reform. It provides a great deal of summary materials, as well as full text of
many good articles written by their scholars on welfare reform.

Legislative Resources

Thomas--http://thomas.loc.gov/home/thomas.html
A great site to obtain full text legislation, as well as summaries and legislative histories of each bill as it is proposed or passed.

LEGI-SLATE-- http://www.legislate.com

Supreme Court-- http://www.usscplus.com/index.shtml


This site provides full text of most Supreme Court decisions, as well as a search mechanism to find any decisions related to your topic.

News Organizations

Politics Now-- http://www.politicsnow.com


This site was created by ABC news, and has search mechanisms by topic or by date.

Washington Post-- http://www.washingtonpost.com


My hometown favorite! Allows you to read each day's newspaper, search by topic or date -- as well as look through the Washington Post archives online.

Web sites: Social Science Research & Methodology

Bill Trochim's Center for Social Research Methods-- http://trochim.cornell.edu

Project Gallery: Program Evaluation and Research Design-- http://trochim.cornell.edu/gallery/gallery.htm

Academic Press-- http://www.apnet.com/www/catalog/so.htm

Social Science Research Council-- http://www.ssrc.org/index.htm

Institute for Research in Social Science (IRSS), UNC - Chapel Hill-- http://www.unc.edu/depts/irss/ssassoc.htm

Back to the Home Page


Child Welfare Resources on the Web
Melody R. Johnson

Welcome to my first web page! Until about two months ago, I hadn't even browsed the web, let
alone written a web page. Thanks to the course I'm currently taking in Program Evaluation and
Research Design, I have the opportunity to create my own web page and provide you with helpful
information. All of the research for this web page was conducted using the various search engines I
came across while perusing the WWW Learning Center.

If you are a child advocate, or if you are interested in any area of child welfare, this is the page for
you. On this page, I have attempted to provide you with a sample listing of some general child
welfare resources and organizations along with a brief description of each. This page contains links
to resources at both the state and national level. The information provided is not listed according to
topic (i.e., poverty, homelessness, child abuse, foster care, etc.) because many of the organizations
deal with more than one of these issues, and it would be redundant to list them more than once. For
your information, I have included an annotated list of printed child welfare resources (i.e., books,
publications, and journals). In addition, there is a section on evaluation of child welfare programs. I
have concluded by sharing my experience as a first-time user of the web.

CONTENTS:

● State Organizations

● National Organizations

● Books, Publications, and Journals

● Evaluation of Child Welfare Programs

● Conclusion: My Experience Using the Web


STATE ORGANIZATIONS

Boys Town USA


Officially known as Father Flanagan's Boys Home (located in Boys Town, Nebraska), this
organization serves as home to hundreds of abused, abandoned, and neglected children. Also
located in over 17 other states and in Washington, DC, including California, Texas, Louisiana, New York,
Georgia, and Pennsylvania, Boys Town provides long-term residential care to at-risk youth.
There is a 24-hour national, toll-free hotline that offers crisis, resource, and referral service for at-
risk children and parents. In addition, there is a resource and training center that offers specialized
workshops, program development service and educational training.

Child Advocates of Santa Clara and San Mateo Counties (CASA)


This non-profit organization utilizes over 500 volunteers to provide one-on-one service to abused
and neglected children who are in foster care systems in California's Santa Clara and San Mateo
Counties.

Child Trends
Child Trends is a non-profit research firm that focuses on children and families. It was established
in January 1979, with initial support from the Foundation for Child Development. The primary goal
of this organization is to improve the quality, scope, and use of research and statistical
information concerning America's children.

Children's Home Society of Florida


This organization provides adoption services to disadvantaged children. Its mission is to respond
to the unique needs of children and fulfill every child's right to a stable living environment and the
opportunity for healthy family development.

Georgia Association on Homes and Services for Children (GAHSC)


This organization represents numerous welfare agencies and professionals who serve at-risk
children and families by providing leadership and resources.

The Judith Granger Birmingham Center for Child Welfare


Located at the University of Texas at Arlington School of Social Work, this organization seeks to
help equip child welfare practitioners with current, detailed, and scientific knowledge about
effective practice models, ways to support the adequate development of children and families, and
strategies to preserve families. The Center also serves as a research and resource center for Texas,
the Southwest, and the nation, disseminating knowledge to improve the conditions of vulnerable
children and their families.
North Carolina Child Advocacy Institute
This is one of the key child advocacy organizations in North Carolina. In addition to providing
information on children and children's issues, this organization provides information on policies
and legislation affecting children.
NATIONAL ORGANIZATIONS

The Annie E. Casey Foundation


This private charitable organization seeks to help build better futures for disadvantaged children by
fostering public policies, human service reforms, and community supports that more effectively
meet the needs of these vulnerable children. This is accomplished through grants made by the
foundation to help states, cities, and neighborhoods develop more innovative, cost-effective
responses to the needs of today's most vulnerable children.

Children's Defense Fund


This private non-profit organization provides a strong and effective voice for children of America,
who cannot vote, lobby, or speak for themselves. Although CDF advocates for all children,
particular attention is paid to the needs of poor, minority, and disabled children. The Children's
Defense Fund believes that no child should be left behind, and that every child needs and deserves
a Healthy Start, a Head Start, a Fair Start, a Safe Start, and a Moral Start in life.

Children's House
An interactive resource center, Children's House is a meeting place for the exchange of information
that serves the well being of children. It generates and disseminates knowledge by translating
research and programming into policy and practice.

Children Now
This organization promotes pioneering solutions to improve the lives of America's children.
Through innovative research and communications strategies, Children Now reaches and builds
partnerships with parents, lawmakers, concerned citizens, business, media, and community leaders
to effect positive change. Although the focus is on children who are poor or at risk, this
organization seeks to improve the conditions for all children by making them a top priority.

Children's Partnership
This nonprofit, nonpartisan policy and strategy center informs leaders and the public about the
needs of America's children. This is accomplished by undertaking research and policy analysis,
publishing reports and materials, developing multimedia campaigns, and forging new alliances
among parents, policymakers and the private sector to achieve tangible gains for children.

Child Welfare League of America


The nation's oldest and largest organization devoted entirely to the well-being of America's
vulnerable children and families, the Child Welfare League of America provides a wide range of
services, along with its member agencies, to protect abused, neglected, and otherwise vulnerable
children. In addition, the CWLA strives to strengthen and support families.
Coalition for America's Children
The 350+ members of this nonpartisan organization have been pressing candidates in their states
and communities to announce a platform of positions and proposals on vital children's issues.
They have been posting questions to candidates about key children's issues, highlighting kids'
concerns in their communities, conducting public opinion surveys, and sharing among each other
-- and with every citizen -- their progress in bringing kids' issues to the forefront of the political
debate.

Comprehensive Child Development Program


This national organization seeks to provide comprehensive support services that will enhance the
physical, emotional, and intellectual development of low-income children. The goal is also to
address the needs of low-income children and families by promoting school readiness and
economic and social self-sufficiency.

UNICEF
The United Nations Children's Fund (UNICEF) works worldwide to protect children's rights and to
improve children's health, education, and well-being.
BOOKS, PUBLICATIONS, AND JOURNALS

Boys Town Press


Boys Town Press provides resources to professionals, educators, and parents who serve youth.
Topics range from common sense parenting to the well-managed classroom. Other topics include
teaching basic and social skills to youth, building skills in high-risk families, caring for youth in
shelters, and working with aggressive youth.

Child Welfare Review


An electronic journal, Child Welfare Review (CWR) covers various issues related to the well-being of
children. CWR contains over 100 articles organized into six categories: child abuse, child advocacy,
child poverty, foster care and adoption, welfare reform and children, and values and children.

Children and Youth Services Review


This journal is an interdisciplinary forum dedicated to the development of a scientific and scholarly
knowledge base on social service programs for children and youth. The goal of the Review is to
provide in-depth coverage of adoptions, child abuse and neglect, child welfare, foster care, income
support, mental health services, and social policy. The Review, which provides full-length articles,
current research and policy notes, and book reviews, is published eight times per year.

The Future of Children


The Future of Children is a publication of The Center for the Future of Children, The David and
Lucille Packard Foundation. This free publication disseminates information on major issues related
to children's well-being. Special emphasis is on providing objective analysis and evaluation,
translating existing knowledge into effective programs and policies, and promoting constructive
institutional change.

Prevention Yellow Pages


This site provides a worldwide directory of programs, research, references and resources dedicated
to the prevention of youth problems and the promotion of nurturing children. The site provides
information on over 35 different topics, including abuse, crime, delinquency, education, high risk,
resilience, and violence.

Policy Review
Policy Review: The Journal of American Citizenship, is the flagship magazine of the Heritage
Foundation dedicated to rebuilding the institutions of American citizenship (i.e., families,
neighborhoods, public and private schools, and voluntary civic organizations). Although its focus is
not exclusively on children, Policy Review publishes articles on topics directly related to children,
such as adoption and foster care, crime, and education.
The State of the World's Children
Marking the 50th anniversary of UNICEF, this report summarizes the efforts of UNICEF to support
children of war who suffer from armed conflict. The report also provides statistical data on the
current state of the world's children and advocates improving the lives of children.

Trends in the Well-Being of America's Children and Youth: 1996


This report is a publication of the U.S. Department of Health and Human Services, Office of the
Assistant Secretary for Planning and Evaluation (ASPE). Most studies on children and youth issues
originate from the Division of Children and Youth Policy, a component of the Office of Human
Services Policy. This report provides statistical data on child population characteristics, family
structure, poverty and income, and other issues affecting children and youth.

The Welfare of Children


In this book, Duncan Lindsey, a leading authority on child welfare, critically examines the current
child welfare system. He focuses on the transformation of child welfare into child protective
services and notes that there is no evidence that the transformation into protective services has
reduced child abuse fatalities or provided a safer environment for children. He further stresses the
dire need for the child welfare system to address the well-being of disadvantaged and impoverished
children.
EVALUATION OF CHILD WELFARE PROGRAMS

Evaluation is paramount in determining the effectiveness of any program. There are several
research methods available for evaluating child welfare programs. Following are just some of the
various methods employed by different organizations.

The Annie E. Casey Foundation issues an annual statistics report on the status and well-being of
America's children.

Boys Town has established a Program Planning, Research and Evaluation Department (PPRE), which assists in the design and planning of children and family programs. PPRE also functions to
measure program effectiveness and disseminate knowledge about program innovations. Click here
for a list of published articles and papers written by PPRE researchers.

The Center for the Future of Children provides information on the long-term outcomes and effects
of early childhood programs.

The Child Welfare Partnership has a page that specifically describes methods for conducting
evaluation of child welfare programs.

The General Accounting Office (GAO) compiles statistics on child welfare services in online
research reports.

The National Committee to Prevent Child Abuse utilizes a 50-state survey as well as national
statistics on child abuse to summarize the number of child abuse reports and fatalities, and
changes in the funding and scope of child welfare services.

Human Resources Development Canada (HRDC) also serves as a resource for program evaluation.
While the scope of HRDC is the evaluation of programs and policies in general, the page provides
useful information on analyzing programs and policies and their effectiveness.

An evaluation of North Carolina's Child Welfare System identified both the strengths and
weaknesses of the system, while offering suggestions on how to improve service delivery.

In addition to conducting evaluation and policy research, the Office of the Assistant Secretary for Planning and Evaluation (ASPE) provides policy analysis and advice, policy development, and
strategic and implementation planning. Some of the work conducted by ASPE includes
Performance Improvement 1995, the first annual report on evaluation activities of the U.S. Public
Health Service. Within ASPE, there is a Division of Children and Youth Policy, which conducts
research and evaluation on a variety of issues affecting children and youth. The Division also
provides studies in progress on numerous topics as well as a complete listing of program
evaluation abstracts.

Robinson G. Hollister and Jennifer Hill of the Russell Sage Foundation wrote an excellent (and quite extensive) paper on Problems in the Evaluation of Community-Wide Initiatives. They highlight both the
importance of and the difficulty of meeting random assignment standards, discuss the nature of the
unit of analysis, and explain the problem of boundaries. They also provide the basic requirements
for statistical inference in evaluations as well as discuss problems with selection bias, types of
comparison groups, statistical modeling of community-level outcomes, and types of hypotheses
which could be tested. They even recommend steps to developing better methods for evaluating
child welfare initiatives, and conclude with an appendix containing annotated examples of studies
using various evaluation strategies.

Finally, there is a Youth Indicator Survey offered by the Department of Education. This site
describes trends in the well-being of American youth.

NOTE: For your information, the Knowledge Base Home Page provides detailed
information on evaluation, research methods, and related topics.
MY EXPERIENCE USING THE WEB

While many of my colleagues and professors rave about the World Wide Web, I have yet to
share their enthusiasm. It literally takes HOURS to surf through the different search engines, and
there are SO MANY of them. I found myself in a zombie-like trance from the countless hours I spent
staring at the computer screen trying to find child welfare resources. I often became frustrated and
discouraged. I was tempted to just go to the library, but that would have defeated the purpose of
this assignment.

As I've mentioned, this was my first time using the Web, so I am willing to give it another
chance. I'm confident that with more frequent usage, I will become a master at this, and the Web will
serve as a useful research tool. I may even consider using the Web this summer as I think about my
dissertation topic!

Check out this site for web pages on other interesting topics. Contact me to make comments and/or suggestions.

Copyright © 1997, Melody R. Johnson. All Rights Reserved


Solution Focused Therapy for Child Welfare
Gretchen K. Rymarchyk

The purpose of this page is to provide readers with information regarding Solution Focused Therapy, Child Abuse Prevention and Treatment, and evaluation and research relating to each of these. Interested readers will find numerous worthwhile links to other sources of information on these topics as well. This page was developed as a part of the learning required by a graduate course in Quantitative Social Research Methods at Cornell University. Other pages of similar purpose but differing topics can be found on the course home page in the Project Gallery.

In the field of mental health, brief therapy techniques are becoming increasingly popular. Traditionally, mental health counseling has meant spending years going to a therapist to dissect personal problems ad infinitum, and combing through childhood memories to pinpoint hidden sources of present-day misery. This required vast financial and time commitments without much guarantee of relief -- elements that most people in today's world cannot or will not tolerate. Mental health professionals have responded creatively by developing several brief therapies that not only provide a significant reduction in time commitments (and therefore financial commitments), but are much more effective than the traditional methods.

Solution Focused Therapy (SFT) is one of these brief therapies. SFT is theorized to be almost immediately effective, to be effective in the long term, and to require on average only six to twelve sessions to complete. This technique focuses on solutions rather than problems. Clients are encouraged to think about times when their problem did not exist, how these times contributed to the absence of the problem (read: solution), and how to recreate such circumstances in their present situations. Focus is on the clients' strengths and abilities rather than their weaknesses. Solutions are derived by clients themselves, and therefore not only are clients more involved in their success, but the solutions fit their unique lifestyles. Finally, because clients find their own solutions that work, self-esteem is often increased. SFT has been applied very successfully in a variety of situations, including addictions counseling, marriage counseling, pastoral counseling, mediation, and groups of school children.

Notable personalities in the development of and training for SFT include Insoo Kim Berg and Steve de Shazer of the Brief Family Therapy Center in Milwaukee, Wisconsin. They frequently run workshops around the country on SFT. Berg in particular will ask an agency for its "worst" client -- the one who seems to be in crisis most frequently -- to work with during these workshops, and is successful with him or her in this short time. Scott Miller and Michael White (?) also give workshops worldwide.

Insoo Kim Berg authored a book called Family Based Services: A Solution-Focused Approach (1994) in which she outlines the
application of SFT to child welfare services. Due to the crisis-oriented nature of child abuse and neglect, the seriousness of the
problem, the transience of the majority of the affected population, and the limited time child protective workers have to work with
the families, SFT is clearly one of the most appropriate options available to child welfare workers.

The following are further links to information on training and education in the methods and techniques of SFT:

Wounded Healer provides a book list on treatment and prevention for families that are at risk of abuse and neglect, or that have already experienced it in some form.

Solution-Focused Counseling Groups: A Key for School Counselors gives an excellent description of SFT, its application to group
work, and its application to school children. It follows with a case example of SFT as applied to a group of school children, and ends
with a long list of references as well as relevant details for those who wish to replicate this work.

Solution Focused Therapy in the Managed Care Environment is the title of a training session offered by Community Program
Innovations. According to the training summary, SFT is essential for achieving the efficiency and effectiveness required by managed care.
Although this particular training has already taken place, Community Program Innovations offers many trainings in the areas of
children and families. CPI sends trainers all around the country to conduct training programs and workshops.

Formal education is offered at a college in Vancouver, which specializes in the techniques of Milton Erickson. This college offers
courses so that one can become a Registered Professional Clinical Counselor, and a member of the Solution Focused Counseling
Association.

Change-Works Inc. is a company whose business is the use of Solution Focused Counseling for many situations, including athletes
and businesses. This company also provides training and workshops in the area of SFT.

To join an SFT list serve, click here. Some of the most up-to-date information in the field can be obtained by subscribing to the
appropriate list serve!

Advanced SFT training will be held this May in Philadelphia by Scott Miller, Ph.D. This training is primarily for pastoral counseling; however, the concept of SFT can be applied to many different areas of counseling, so child welfare workers should not be put off by this.

The following are further links to child abuse and neglect related information:

Considering Children summarizes an evaluation performed by a private agency on North Carolina's Child Welfare Service; the resulting recommendations indicate that the protective services for children are themselves in a crisis state.

1995 Audit: Child Welfare Outcome Measures is a summary of the state of Utah's struggle to find appropriate evaluation measures for its child welfare services. One of these measures will include a client satisfaction survey. It is likely that clients who have to
spend more time in "the system" with fewer changes for the better will report less satisfaction than those who spend less time but get
better results. SFT would be an excellent intervention on behalf of clients.

National Data Archive on Child Abuse and Neglect provides analysis of research, and makes data available to researchers in the area
of child abuse and neglect. There is also a summer research institute and a discussion group for researchers that can be accessed at
this site.

To join the International Society for Prevention of Child Abuse and Neglect click here. This site provides information on world wide
membership, sponsors, a journal, a newsletter, international partners, and conferences. This is an interdisciplinary and international
organization.

The American Professional Society on the Abuse of Children provides training to professionals, "provides research guidelines to
inform professional practice," educates the public on children's issues, and is active in public policy.

The National Committee to Prevent Child Abuse home page has an index listing information about the organization. If you click on
Research you can read about the Committee's efforts in the areas of program evaluation and development.

It is clear that the problem of child maltreatment is serious and pervasive, and needs a remedy that is quick, effective, and reliable. Child Welfare Workers, as well as
Mental Health Professionals in other areas should be encouraged to use Solution Focused
Therapy in work with children and families, with special emphasis on families with at-risk
children.

Top of Page
Welcome to the World of Service Initiatives!

"All that is necessary for the forces of evil to win the world is for enough good men to do nothing." Edmund Burke

INTRODUCTION:
The purpose of this page is to provide a resource base of GOOD PEOPLE WHO ARE DOING SOMETHING! With the plethora of
problems facing society (including homelessness, AIDS, child abuse, the destruction of the environment, etc.) and the instability and
uncertainty of government support, many initiatives have emerged to creatively and effectively address a broad range of issues. The
initiatives on this page tend to tackle problems from a human resources perspective. In other words, these are links to organizations,
universities and other sites where the focus is on training service-oriented leaders. Problems related to doing research on the web,
particularly in regard to service initiatives, are also addressed.

This page contains several parts. There are links to national organizations, state organizations, university programs, and resource
centers that are actively engaged in service initiatives. These sites have been carefully chosen from hundreds of options that emerged
through searches on several different engines. Following these groupings there is a section on the merits of these links for research.
Finally, some conclusions are made.

MENU:

● NATIONAL ORGANIZATIONS
● STATE ORGANIZATIONS
● UNIVERSITY PROGRAMS
● RESOURCES
● RESEARCH ON THE WEB AND CONCLUSIONS
NATIONAL ORGANIZATIONS:

These links are a sampling of national service initiatives. They include the more well-known programs, and oftentimes
they provide links to lesser-known, high-quality projects.

The Corporation for National Service -- http://www.cns.gov/


This page describes the various programs under the Corporation for National Service -- including AmeriCorps, Learn and
Serve America and the National Senior Service Corps.

Campus Outreach Opportunity League (COOL) -- http://www.cool2serve.org/cool/home.html


COOL is a national organization that helps college students start and strengthen community service programs on
campuses. This page explains the organization and its various programs. As well, it has a page of over 25 links to
campus-based community service programs.

The Peace Corps-- http://www.peacecorps.gov/


This page describes the Peace Corps, tells how to become a volunteer and is a link to other government departments and
programs.

Association for Experiential Education (AEE) -- http://www.princeton.edu/~rcurtis/aee.html


AEE is a non-profit organization that is committed to the development, practice and evaluation of experiential education. This page describes the organization and its membership, and gives a very helpful bibliography of related materials.

The Student Coalition for Action in Literacy (SCALE) -- http://www.unc.edu/depts/scale/what.html


SCALE is a national service organization that helps college students develop literacy programs in their communities.

The Council of Chief State School Officers -- http://www.ccsso.org/


The Council of Chief State School Officers is a nationwide, non-profit organization composed of public officials who lead
the departments responsible for elementary and secondary education in the US. This site provides links to other state
agencies, includes legislative position papers on education issues, and policy statements on different issues (including
service).

Back to the home page


STATE ORGANIZATIONS:

Here are several state organizations, similar to those at the national level, that are engaged in service initiatives. While neither of these pages is particularly useful in and of itself, they provide examples of state projects.

Youth Options -- http://ccwf.cc.utexas.edu/~csp/


This organization works with youth and families in Austin, Texas.

Maryland Governor's Commission on Service - http://sailor.lib.md.us:80/mgcos/


This is Maryland's organization to administer programs under the National and Community Service Trust Act of 1993.

Back to the home page


UNIVERSITY PROGRAMS:

Many universities and colleges have service programs. Some are more extensive and well-known than others. Here is a
sampling of the leading initiatives.

Cornell University -- http://www.cornell.edu/student/Organizations/ORG15.html


This page lists the service organizations on Cornell's campus, ranging from The Public Service Center to service
fraternities.

Stanford University -- http://www-portfolio.Stanford.edu/104468


This page introduces The Haas Center for Public Service and links to other campus organizations.

Brown University -- http://www.brown.edu/Departments/Swearer_Center/


This page introduces the Howard R. Swearer Center for Public Service. It discusses Brown's commitment to service and
learning and links to other campus resources.

Gettysburg College -- http://www.gettysburg.edu/project/sl/top.html


This page looks at Gettysburg's Center for Public Service and has links to other service learning projects, both on
campus and nationally.

Back to the home page


RESOURCES:

There are several pages on the web that act as clearinghouses for service materials. Some of the best resource
centers are as follows:

The National Service-Learning Cooperative Clearinghouse-- http://www.nicsl.coled.umn.edu


This clearinghouse includes information on service-learning programs, organizations, people, calendar events, literature
and information specialists.

The National Service Resource Center -- http://www.etr-assocaiates.org/NSRC/


This resource center supports the Americorps initiative through training and technical assistance. It includes a lending
library, newsletters, a listserve to which you can subscribe, and a link to other useful sites.

WhoCares: A Journal of Service and Action -- http://www.whocares.org/


WhoCares is a national journal which looks at issues of community service and social activism. From this page one can
link to previous issues of the magazine, subscribe to the magazine and link to many other web pages.

US Department of Education -- http://www.ed.gov/


This page is an introduction to the US Department of Education, including its programs, services and publications, as
well as links to other pages.

CIVITAS - http://www.primenet.com/%7Ecce/civitasexec.html
CIVITAS is a curriculum framework whose purpose is to bring civic education into the schools. This page describes the rationale behind this, discusses civic virtue, civic participation, and civic knowledge, and links to other related organizations.

Center for Civic Education Programs and Publications -- http://www.primenet.com/%7Ecce/catalog.html


The Center creates and disseminates curricula on civic education, teacher training, teacher education, and research
and evaluation in civic education.

Back to the home page


RESEARCH ON THE WEB:

The WWW is constantly changing and growing, a characteristic that makes its use both exciting and frustrating. This is especially true for researchers, as different search engines tend to turn up completely different things, and methodological issues may be difficult to address based on currently available information.

In creating this page on service, many hours were spent surfing the web looking for pertinent sources. The process I went through
and the material I found are comparable to a preliminary literature review. In other words, the search process turned up materials that
would help a researcher who was at square one. The search helped identify the big names in service and several of the "hot spots"
around the country for these issues.

Beyond its utility as a literature review, the web, in its current state, is not especially useful for research, for several reasons. First, the web is a biased source of information. In an area such as service, grass-roots organizations are very important. Unfortunately, grass-roots operations often don't have the resources to create a web page. As well, other organizations that may have the resources don't necessarily have the technical ability or may not realize the importance of a web page. Therefore, WHO is on the web is still limited.

Additionally, WHAT is on the web is also limited. The information in the above hot links is primarily explanatory. The pages consist of program and organization descriptions, information on how to become a member of an organization, and links to other related items. Conspicuously missing are research indicating why these programs and organizations are necessary, evaluations of the programs, and discussions of "where to go from here." As well, I found no full-text articles available through these links.

On a final note, and in defense of the web as a research tool, these pages could potentially be used for drawing samples (at least pilot samples). Several of the links provided include an area where one can subscribe to a list-serve. When something is posted on a list-serve, it is distributed to hundreds, maybe thousands, of people. This could potentially be useful for distributing questionnaires. However, as with any convenience sample, there is a strong built-in bias.

CONCLUSIONS:
If you were looking for a place to "get started" with service issues -- you are in luck! However, if research was on your agenda, that may be a bit more difficult -- if not impossible. In the future, we need access to more articles and to the "behind the scenes" aspects of these programs (pilot tests, the basis for doing the program, etc.) in order to consider the web an effective tool for research.

In the meantime, keep surfing the web with these useful engines:

Webcrawler Search Engine Lycos Search Engine Yahoo Search Engine

and GOOD LUCK!!!!

Send any questions or comments regarding this page to nmd1@cornell.edu (Nicole M. Driebe). 3/31/96

Back to the home page


VIOLENCE AGAINST WOMEN:
RESEARCH ON THE WEB

EDUCATION SITES
GOVERNMENT SITES
ORGANIZATIONS AND INDIVIDUALS

In lieu of a more traditional (i.e., written) assignment examining methodological topics in a particular substantive area,
graduate students in Research Design for Program Evaluation (HSS 691) were instructed to explore the resources of the
Internet and offer a critique -- in the form of a web page -- of its current usefulness to prospective researchers.

My focus area is violence against women -- more specifically, domestic violence -- and what follows is my attempt at
drawing an admittedly rudimentary roadmap for use by those of you interested in "traveling the electronic highway" in
search of information on this topic. I found well over 50 sites that were, to a greater or lesser degree, related to violence
against women. The number addressing research or methodological issues in any important way were, however, relatively
few and far between. Unfortunately, at this point the Web is as lacking in depth as it is impressive in width -- at least as a
source for scholarly research in this particular field.

Again, there is a sizeable difference between what's out there on violence against women and what's out there related to
research on violence against women. My intention here is to steer you to the more substantive sites to save you considerable
time and effort in your future journeys. Of course, as they say, one man's shack is another man's castle (is that how the
saying goes?) Anyway, my point is that depending on what you're looking for, my review may be helpful or not. In any case,
I have provided sufficient links throughout to at least get you headed in the right direction.

Many sites have been created by individuals and contain general information and statistics, personal stories and experiences
and, in many cases, lists of annotated links to related pages. Sites developed by various associations and organizations have
more "face validity" in terms of their seeming legitimacy and authority to speak on the subject of violence against women.
For the most part, these pages, too, offer the same general background information and lack in-depth material pertaining to
research in the field.

The road gets a little less bumpy from here on out. I've included several government links and consider these more valuable
destinations on our route. Although still scant, information contained on these sites is a tad more substantial in terms of
research, reliable statistics, and project funding than anything discussed thus far.

The final and, in my opinion, most worthwhile destination on our whirlwind tour leads us to several education sites around
the country. These pages house some discussion on prevailing research topics and certain methodological issues in the field,
as well as abstracts and, at times, full text of recent scholarly articles and reports.

The Internet is a vast resource. I spent many hours surfing, scanning, crawling, spinning, and tromping through
the web. I used several search engines as well as different keyword searches and techniques, and I am positive that
there are many things -- possibly some extremely valuable sites -- that I have missed. Still, it is my opinion that
the World Wide Web has simply not evolved to the point at which it is capable of providing sufficient and credible
information for scholarly research on domestic abuse.

Having said that, we must recognize the astounding potential of the internet as a tool for research in all fields in
the future and consider how to mold this new resource to its most advantageous form for that purpose. In doing so,
it is critical to consider a wide array of issues, including sampling, potential bias, access issues, and the like. For
now, I believe we can appreciate the Web for its diversity and utilize it to whatever extent possible. Like I said, it will be a
more or less helpful instrument depending on the context in which it is being used.

I'll let you be on your way, finally, but allow me to say a few short words in conclusion: If you are in a rush to get
quality research material for that final project, take a detour to your nearest university library. If you've got nothing
but time on your hands, spend some of it poking around out there. But proceed with caution. An expedition through the Web
can be like a three-hour cruise you never come back from!

I have organized the remainder of this project into three subsections corresponding to the categories of sites mentioned
above. In each subsection I have provided links and a brief (and hopefully helpful) explanation of what you can expect to
find at each one. I have paid closer attention to detail in the first subsection, Education Sites, because it has the most to offer
in terms of research on the subject of violence against women.

BON VOYAGE!

EDUCATION SITES
GOVERNMENT SITES
ORGANIZATIONS AND INDIVIDUALS

Kristin Ward
(kjw11@cornell.edu)
EDUCATION SITES
If you are interested in examining current debates and areas of research in the field of domestic violence, this is where the
good stuff is. As I mentioned earlier, you will be able to print out full text of recent scholarly articles from some of these
pages. The following links will guide you to articles ranging from critiques of the Conflict Tactics Scale to a call for
increased collaboration between advocates for battered women and researchers in the field.

WIFE BATTERING AND CHILD ABUSE

Several studies have indicated that women who are battered often have kids who are beaten in the home. Moreover, there is
evidence to suggest that children who are not themselves abused physically are nonetheless at an increased risk of
experiencing learning, behavioral and psychological problems as a result of bearing witness to the routine beating of "Mom."

Susan Schechter and Jeffrey L. Edleson prepared a briefing paper entitled In the Best Interest of Women and Children: A
Call for Collaboration Between Child Welfare and Domestic Violence Constituencies addressing this issue. Drawing from
that work, Dr. Edleson authored a second paper, Mothers and Children: Understanding the Links Between Battering and
Child Abuse, in which he discusses the overlap between wife assaults and child abuse and points to five areas in which
supporting studies are needed to further our present understanding of the connection.

RESEARCHERS AND ADVOCATES

There is often a tension between advocates for battered women and researchers in the field. The need of the researcher to
adhere to stringent methodological standards, implement contrived experimental or quasi-experimental designs and maintain
an objective distance from this highly emotional issue often flies in the face of advocates concerned with the privacy, safety
and healthy recovery of battered women. Edward Gondolf, Kersti Yllo and Jacquelyn Campbell have written on the need for
increased collaboration between the two camps and why it is an especially timely issue. The paper, Collaboration Between
Researchers and Advocates, is a must-read.

MALE BATTERERS: RESEARCH & PROGRAM EVALUATION

The efficacy of intervention programs for men who batter is another important issue and has not been clearly established.
Intervention for Men Who Batter: A Review of Research is an effort on the part of Richard M. Tolman and Jeffrey L.
Edleson to summarize and critique existing research in that area.

In his paper, Expanding Batterer Program Evaluation, Edward W. Gondolf addresses methodological problems inherent in
the evaluation of batterer programs and underscores the importance of conceptualizing "effectiveness." In so doing, he
emphasizes the need for alternative evaluation approaches and the advantages they may offer.
SURVEYS, SCALES, QUESTIONNAIRES

Richard Tolman's Psychological Maltreatment of Women Inventory is available for access on his Web-site. The PMWI is a
58-item instrument designed to measure the level of psychological abuse of women by their intimate male partners. Partial
text from Tolman's PMWI validation Study (1995) is available, but his homepage -- at least at the time of this writing -- is
still under construction.

The Canadian government, as part of its Family Violence Initiative, conducted a Violence Against Women Survey from
February to June 1993. This site contains results as well as information on the objectives, methodology and estimation
procedures of that survey and guidelines for releasing estimates based on it. (NOTE: The full text is 55 pages, so you might
want to consider whether or not you really need a copy before printing it out.) In addition to the text, there are introductory
instructions for using the VAW data set with SAS and SPSS programs on a Unix system.

The Conflict Tactics Scale (CTS), created by Straus and Gelles, tabulates the number of violent incidents reported by both the
female and male in an intimate partnership. It has been used, much to the dismay of battered women's advocates, to promote
the notion that men and women are equally abusive in relationships. Critics charge that the CTS fails to take into account
psychological abuse, threats and other forms of intimidation, the reason for violence by women (i.e., self-defense), and/or the
severity of injury resulting from the violent behavior. There is quite a bit of debate on the Web regarding this issue. If you're
interested, you'll want to hit the following spots:

The Conflict Tactics Scale and Domestic Violence

Wife Beating

Wife Beating -- Reply

Intimidation vs. CTS: What Really Constitutes Abuse?

MISC: Related Statistics

Manitoba, Canada Study

NOTE: I found some of the above information on David Throop's Men's Issues Page-- not on an "Education Site." His page has a lot of stuff on
the issue of abused men and assertions that wife battering is blown out of proportion, if you're interested.

Gelles investigates the sources and legitimacy of oft-cited violence against women statistics in Domestic Violence Factoids.
I think this is an important and informative "fact sheet" and recommend looking at it.

The University of Minnesota's Higher Education Center Against Violence and Abuse is the most informative Web-site on
domestic abuse and research that I was able to find. It has links to some of the sites and articles I have already listed, plus
many more resources and reports on violence and abuse. You'll also want to check out the University of Maryland's
Women's Studies Home Page. It is broader in scope and includes conference announcements, calls for papers,
employment opportunities and government documents. Resources for Feminist Research is a Canadian scholarly journal at
the Ontario Institute for Studies in Education. This page allows a visitor to print abstracts and tables of contents from back
issues of the Journal. There is a lot of interesting feminist research available here. I did not find anything specific to violence
against women, but that does not mean that it won't be included in future issues. You might want to give it a quick glance.

Finally, Tom Turner and Lydia Potthoff from the University of Michigan's School of Information and Library Studies
created a site which includes General Sources of Women's and Feminist Information. It is a lengthy and potentially very
helpful annotated listing of and links to organizations, publications, electronic mailing lists, on-line newsletters, political
resources and more. Again, this site addresses women's issues more broadly.
GOVERNMENT SITES
Information contained on various government Web-pages may be useful to your research, although there is little
reference to violence against women specifically. Pages accessible through the following links contain little
scholarly research.

The Department of Justice has one of the few pages with specific information on violence against women. You can
access text from the Federal Register regarding the Grants to Combat Violence Against Women Program created
by the 1994 Violence Against Women Act. In addition, the DOJ's Bureau of Justice Statistics offers full abstracts
of its publications, including "Violence Against Women: Estimates from the Redesigned National Crime
Victimization Survey," "Violence Between Intimates," and "Child Rape Victims." Each of these contains valuable
data and statistics.

The Centers for Disease Control and Prevention page also contains information specific to our issue. At this site,
you will see listings of CDC-sponsored/funded programs related to violence of all kinds, including domestic
violence. These listings include descriptions of specific programs and their research objectives.

The Department of Health & Human Services is an additional resource for researchers. One section of the HHS
Home Page, Policy & Research/Data Information, may be especially helpful, as it provides program evaluation
abstracts of studies done on HHS programs. Information on government-wide and HHS grants is also available here.

For information concerning any federal legislation, Congressional debates or hearing testimony, you'll want to
access the U.S. Congressional home page. THOMAS is another, perhaps more useful, tool for these purposes. It
can be accessed from the Congressional home page.
ASSOCIATIONS, ORGANIZATIONS & INDIVIDUALS

Many associations and organizations exist that are concerned about the prevalence of violence against women in
our society. Individuals with access to the internet and knowledge of HTML also have a
presence on the Web. The following links represent only a very few of these types of pages, most of which provide
general background information, promote specific advocacy and awareness campaigns, provide hotline numbers,
describe shelter and other programs and offer links to similar sites.

The Justice Information Center is a service of the National Criminal Justice Reference Service, a collection of
clearinghouses which supports all bureaus of the U.S. Department of Justice. Its Web-site includes a Sexual Assault
Recovery Service, a list of full-text documents on domestic violence and its victims and information on Post-
Traumatic Stress Disorder from the American Psychological Association.

Here is a sampling of these kinds of pages. I do not think they are useful for scholarly research, so I have not taken
time to describe them in detail. They may, however, provide you with background information, contact names and
numbers or ideas that you might find helpful. So if you have time, you may want to check them out.

Model Code on Domestic Violence

The Family Violence Prevention Fund

The American Medical Association

Domestic Violence Information Pages (by E-Magazine)

Kate Korman's Home Page


This page is meant to be used as a resource to help
begin your exploration into the question of who Jesus is. Since arbitrarily searching for web information about Jesus can lead to
many tangents far removed from the topic of interest, it is my hope that this page can make the process of answering this question
easier by organizing various links that provide thoughtful information and comprehensive viewpoints. Links about Jesus have been
organized by subject below.

We begin with the question of 'why Jesus' -- why it is important to better understand this fascinating figure. We then proceed to the 'Overview' of his life according to the Gospels to obtain a general understanding of who Jesus is, then look deeper at the major points of his life in the section entitled 'Looking Deeper.' I should note here that the overview and looking deeper sections are based on the Gospel accounts of Jesus, the reliability of which is under debate. (For discussion about the accuracy of these sources, please click here.) Some of the 'looking deeper' links approach the life of Jesus through a non-gospel lens, and some with a mix of both gospel and non-gospel sources.

Given a general understanding of his life as told from gospel and non-gospel sources, we will then explore how those from different religions perceive him. From this, we see a glimpse of how the person of Jesus has influenced people's perceptions of life in strikingly different ways.

Why Jesus?
Overview
Looking Deeper
His family and heritage
His Ministry and Teaching
His Arrest and Execution
Perspectives
Other Links

The name 'Jesus' can spark a variety of reactions, ranging from impassioned anger to awe-inspired adoration. He is a figure who is not only the cornerstone of Christianity, but a person who continues to affect the viewpoints of other major religions of the world, such as Hinduism, Buddhism, and Islam.

Even though the great philosopher Nietzsche said that God is dead, religion -- or belief in the divine -- is an integral part of today's society. In the United States alone, 97% believe in God, 90% identify as part of a specific religious group, and approximately 75 million attend a religious function weekly (statistics taken from sociologist Doctor Becker at Cornell University). Given these
statistics, it is not unreasonable to expect that the values taught by these religions have directly and indirectly affected the way we
think. Since Jesus appears to be the originator of many teachings that have, in a sense, shaped who we are, understanding the person
of Jesus will better enable us to understand ourselves and the world around us. Therefore, before we can begin to see how Jesus has
affected us and those around us, we must have a basic grasp of his life course.

According to the Gospel accounts, Jesus was born in a Bethlehem animal barn, an unlikely place for 'the Messiah' to make his initial appearance on earth. In Judaism, the messiah is foretold to come and establish Israel's sovereignty.

Jesus' conception was immaculate, but throughout his life he undoubtedly faced ridicule and disapproval from others as being the product of an out-of-wedlock affair. Even today, some contend that Jesus was Mary's bastard child, although the Gospel accounts specify otherwise. Later in his childhood, the Magi came to pay their respects, believing him to be the one God had sent to save His people. As the popular Christmas song depicts with some accuracy, the Magi brought gold, incense, and myrrh -- gold symbolizing Jesus' divinity, incense signifying his anointing as the Son of God, and myrrh for his future suffering.

Jesus' ministry began in his early thirties; after being rejected by his hometown of Nazareth, Jesus traveled to Capernaum and gathered the first of his 12 disciples. Although Jesus' following grew, his disciples were his 'inner circle,' so to speak, the ones who worked most intimately with him. Jesus' teachings were radical for his time. He publicly forgave people's sins, which it was believed only God could do, and embraced the outcasts of society. His band of followers included tax collectors, prostitutes, and the poor. On one occasion, a respected teacher of the law had Jesus over for dinner. In the middle of the meal, a woman who had lived a sinful life -- a prostitute -- came to the Pharisee's house and stood weeping behind Jesus. The members of the table objected, but Jesus rebuked the host and told the woman that her sins were forgiven. Jesus not only forgave the woman of her sins, which was blasphemous in the culture to begin with, but treated those considered to be less -- the 'unclean,' sinners, women -- with equal respect to those who were revered.

As Jesus' popularity grew, the priests and scribes grew more afraid that Jesus would usurp their authority. Therefore, they
planned a conspiracy to have him arrested and executed. With the cooperation of one of Jesus' disciples, Judas Iscariot, the Jewish
authorities had him arrested and brought before the high priest for a preliminary trial. At the trial, Jesus claimed that he was the Son
of God; claiming to be the Christ, Jesus was claiming to be God incarnate in human form to bring deliverance to the people. The
gospel records that after Jesus stated his believed identity, the high priest ripped his garment in half, signifying horror and grief. Jesus
was immediately charged with blasphemy and was brought before higher authorities for capital punishment.

Jesus was crucified at Golgotha. Above his cross hung the sign 'king of the Jews', a claim Jesus had confirmed earlier. Ironically, 'the king' died next to two thieves who were crucified alongside him.

Jesus’ body was placed in a tomb and guarded by a squadron of soldiers because during his ministry, Jesus had foretold of
his death and subsequent resurrection. The priests and scribes wanted to prevent Jesus’ followers from stealing his body to later claim
his resurrection. However, after three days the gospels speak of Jesus’ disappearance from the tomb and later visits to his disciples
and others. Interestingly, Jesus first appeared to Mary Magdalene, who was doubly marginalized in society because of her past as a
prostitute and her status as a woman. Jesus then ascended back to heaven with the promise that he would return to complete the
establishment of God’s kingdom on earth.

Other Links

Life summary of Jesus


http://encarta.msn.com/find/Concise.asp?ti=035C4000
http://www.pantheon.org/mythica/articles/j/jesus_christ.html

Timeline of Jesus' life


http://www.lifeofchrist.com/history/timeline/index.html

Given that we know the basic overview of Jesus' life, let us now look at
some aspects of his life in greater detail.

Jesus' family and heritage


According to the Gospel accounts, Jesus was the son of a carpenter and a peasant woman, and was thus unfamiliar with
material wealth or high social status. Jesus' parents, Mary and Joseph, were devout Jews; early in his life, they presented Jesus to the Lord in Jerusalem, as was customary for those who honored 'the Law of the Lord,' the law of the Jewish people. Therefore, Jesus was
well versed in the Torah.

Other Links about Jesus' Family and Heritage


http://www.britannica.com/bcom/eb/article/0/0,5716,109560+2+106456,00.html
http://www.pbs.org/wgbh/pages/frontline/shows/religion/jesus/bornliveddied.html
http://www.pbs.org/wgbh/pages/frontline/shows/religion/jesus/socialclass.html

Jesus' Ministry and Teaching

It is difficult to summarize Jesus' Ministry and Teachings in a few paragraphs or a few pages, for that matter, because some
of his statements recorded in the gospels range from the straightforward to the enigmatic.

For example, Jesus talked about love in a way that the world in his time expected: "The most important [commandment]. . .
is this. . . Love the Lord your God with all of your heart and with all your soul and with all of your mind and with all of your
strength." He also talked about love in a way that was not expected: "For God so loved the world that he gave his one and only Son
(to be killed). . . ."(John 3:16) Here, he emphasizes love as an act of obedience rather than an emotion.

In general, Jesus' teachings were in the form of parables, and not always popular; the gospels say that many initial followers
deserted him because they thought "this is hard teaching. Who can accept it?" (John 6:60)

Jesus' ministry was also accompanied by many miracles- the blind saw, the lame walked, and the dead were resurrected.
However, Jesus seemed reluctant at the time to make himself known: After he healed two blind men, Jesus "warned them sternly,
'See that no one knows about this.'" (Matthew 9:30)

Yet despite Jesus' many miracles, his message was beyond physical healing, and focused more on the healing within. He
talked of giving rest to the weary and cleansing from sin through repentance. He taught about the heart's intentions as a truer measure
of goodness than positive appearances. What made him such a controversial figure at the time, though, were the claims he made of
himself as the source of rest, a pure heart, and immortality: "I am the resurrection and the life. He who believes in me will live, even
though he dies; and whoever lives and believes in me will never die." (John 11:25-26) Therefore it is understandable that some
believe Jesus to be a good man and others, a lunatic.

Other Links about Jesus' Ministry and Teaching

http://www.britannica.com/bcom/eb/article/0/0,5716,109560+4+106456,00.html
http://www.pbs.org/wgbh/pages/frontline/shows/religion/jesus/ministry.html
Message.html

Jesus' Arrest and Execution

Jesus was taken by the chief priests, officers and elders to be arrested; the group that came to arrest Jesus was led by one of his disciples, who identified Jesus with a kiss -- a common greeting of welcome. After Jesus was taken away to be presented before Pilate, the Roman governor, his disciples scattered and hid for fear that they too would be arrested.

Pilate questioned Jesus and found no basis for a charge against him, but the Jews insisted that . . . "he must die because he
claimed to be the Son of God." (John 19:7). Ironically, the very people who followed Jesus during his ministry either deserted him or
demanded his death. He was then taken to Golgotha-the place of the Skull-and crucified.

Other Links about his arrest and execution


suffering.html
http://www.pbs.org/wgbh/pages/frontline/shows/religion/jesus/arrest.html

People view the person of Jesus in a variety of ways depending on how they interpret the gospel and non-gospel accounts of him. Out of all the major religions of the world, Christianity bases its beliefs entirely on the gospels. Thus, the summaries given above about his life give a fairly accurate view of Christian belief.

Other religions, such as Islam, Hinduism, and Buddhism, draw upon their oral traditions and choose to believe certain aspects of the gospel. Though there are no historical records of this, in India there is a strong tradition that Jesus went to India as a teenager to bring Hindu values of realizing 'god-consciousness' back to the Jews. In Hindu belief, everyone is capable of realizing their universal divine-consciousness, a spiritual experience that brings the awareness that all of us are gods -- the same beings.

Unlike Hindus, Muslims believe that Jesus was a prophet from the true god Allah, and Buddhists see Jesus as Buddha's 'brother' in the sense that he taught about the ability of love to bridge differences between people. In contrast to the religions mentioned thus far, Judaism initially did not admire the person of Jesus, and for most of Jewish history Jesus was seen as a blasphemer, someone who tried to elevate himself to the same level as God. However, today there seems to be a distinction between the Jesus who claimed to be God and the Jesus who taught truths from the Jewish scriptures. According to theologian John Cobb, "They see Jesus as an admirable Jew, but they don't believe any Jew could be God."

So what can we gather from these drastically different perspectives on one man? From just a very superficial look at Jesus' life and how others view him, we can see the sanctity we place on love, the admiration we have for someone who teaches us to value each other, and the yearning of our hearts for something bigger than ourselves that lasts forever. Thus the question of who Jesus is proves not only an interesting and revealing one, but a question that has only begun to be answered.

The Christian perspective


http://www.gospelcom.net/rzim/jt/wjtoe.htm
http://www.whoisjesus.com/whois.html

The Buddhist perspective


http://newsweek.com/nw-srv/printed/us/so/a17552-2000mar19.htm

The Jewish perspective


jewishperspective.html

The Islamic perspective


islamicperspective.html

The Hindu perspective


hinduperspective.html

http://www.britannica.com/bcom/eb/article/0/0,5716,109559+1,00.html
http://www.newsweek.com/nw-srv/issue/13_99a/printed/us/so/rl0113_1.htm
http://www.pbs.org/wgbh/pages/frontline/shows/religion/
Are the Gospels Reliable?
http://www.britannica.com/bcom/eb/article/9/0,5716,109559+4+106456,00.html
http://www.pbs.org/wgbh/pages/frontline/shows/religion/story/gospels.html
http://reliable.html
Newsweek Live Talk: Cover Talk -- The Other Jesus

To Christians, he is the Son of God. But the world's other great religions have their own visions of a legendary figure. Weary in body but ecstatic in spirit, Pope John Paul II this week makes his long-anticipated pilgrimage to the Holy Land. But the land of his heart's desire is holy to Jews and Muslims as well. Newsweek Religion Editor Kenneth L. Woodward joined us Wednesday, March 22, for a Live Talk to discuss Jesus as seen through the eyes of other religious traditions.

Kenneth L. Woodward: This is Ken Woodward, religion editor and author of the current cover story, which examines the figure of Jesus in world religions other than Christianity.

Edison, NJ: Why does the Bible contradict itself in so many places? Why does the Bible claim that Jesus was the only Son of God when we all are sons of God? If it is so, who are we? Why do Christians believe there is no God other than Jesus?

Kenneth L. Woodward: First, the Bible is the work of many hands compiled and edited over several centuries and differs slightly in what is left in and what is left out in the Catholic, Orthodox and Protestant traditions. It is Christianity which is responsible for the New Testament, and it is in the New Testament that the belief is expressed that Jesus is the Son of God. This assertion means that while we are all created of God, Jesus is his only begotten son born of the Holy Spirit. It is therefore a monotheism whose nature is expressed as relational between Father, Son and Holy Spirit. It is not compatible with essentially Asian notions that all of us are inherently divine. In all monotheistic faiths, it is sinful to equate a creature with the creator. Christians believe there is no God but God. And they simply do not belong to the world view which holds that there are many divinities, as Hindus do. But even in Hinduism, one finds a hierarchy usually among gods, so that Krishna or Shiva tends to be higher than the other gods or goddesses.

New Holland, PA: In discussing who Jesus is and what he has done, what historical sources -- i.e., authorities -- should we rely on? After all, each religion has a different perspective on Jesus -- which is the right one?

Kenneth L. Woodward: The purpose of our cover story was to show readers that there is a figure of Jesus in all world religions, not just Christianity. I would suppose that people will accept the figure of Jesus that is most congruent with the religion to which they belong. For those who are outside of all religious communities, it is essential to ask on what grounds each religion puts forward its knowledge of Jesus. I would add that after more than 200 years there is not a lot we can know about Jesus apart from the sacred scriptures in which his story is told.

Washington DC: Could you speak about the evidence of the actual existence of Jesus? I am consistently amazed at the number of people who think that he was not a real person, and yet other religions besides Christianity accept him as being real.

Kenneth L. Woodward: To suppose that there was no human being called Jesus and that the story is a fabrication requires flights of fancy far more perilous than accepting him as real.

Kenneth L. Woodward: I thank you all for your interesting and provocative questions.

Laura Fording: Join us every Wednesday at noon EST for Live Talk.

Related articles in the cover package: The Other Jesus; The Pilgrimage; The Karma of the Gospel; A Rabbi Argues With Jesus; The Long Road to Reconciliation.

© 2000 Newsweek, Inc.


Jesus' Parables
http://www.pbs.org/wgbh/pages/frontline/shows/religion/jesus/parables.html
http://www.lifeofchrist.com/teachings/parables/seaside/index.html
ENCYCLOPÆDIA BRITANNICA

Jesus Christ

The message of Jesus: The Kingdom of God

Jesus announced the approaching Kingdom of God and therefore called people to repentance. The first two Gospels have set this at the beginning in a programmatic saying as a summary of his preaching and have thus characterized the central and dominant theme of his mission as a whole (Mark 1:15; Matt. 4:17). Thus, the Kingdom of God, or Kingdom of Heaven (a Jewish circumlocution for God preferred by Matthew), does not just denote a final chapter of his "system of doctrine" (a concept that cannot be
applied to Jesus, in any case). The underlying Jewish word (malkhuta) means God's kingship, and not
primarily his domain. This meaning prevails in the New Testament texts. But Kingdom of God or Heaven is
also used in a spatial sense ("Enter . . ."). The burning expectation of the Kingdom of God was widely
spread in contemporary Judaism in manifold form, based on the Old Testament faith in the God of the
fathers, the Creator and Lord of the world, who had chosen Israel to be his people. But with this faith
there had united itself the contradictory experience that the present condition of the world was ungodly,
that Satanic powers reigned in it, and that God's kingship would only manifest itself in the future. In wide
circles, this expectation had the form of a national, political hope in the Davidic Messiah, though it had
expanded this hope in apocalyptic speculation to a universal expectation. In each case it was directed
toward the Last Days. Likewise, in Jesus' message, the expression Kingdom of God has a purely
eschatological--i.e., future--sense and means an event suddenly breaking into this world from the outside,
through which the time of this present world is ended and overcome.

These traditional motifs of the end of the world, the Last Judgment, and the new world of God are not
lacking in the sayings of Jesus preserved in the Gospel tradition. Thus, Jesus has not by any means
changed the Kingdom of Heaven into a purely religious experience of the individual human soul or given
the Jewish eschatological expectation the sense of an evolutionary process immanent in the world or of a
goal attainable by human effort. Some of his parables have given rise to such misunderstanding (e.g., the
stories of the seed and harvest, the leaven, and the mustard seed). In such cases, the modern thought of
an organic process has been wrongly introduced into the texts. People of classical and biblical times,
however, heard in them connotations of the surprising and the miraculous. The Kingdom of God, thus, is
not yet here. Hence the prayer, "Thy kingdom come!" (Matt. 6:10; Luke 11:2), and the tenses, for
example, in Jesus' Beatitudes and predictions of woe (Luke 6:21-26). The poor, the hungry, and the
weeping are not yet in heaven. The petitions of the Lord's Prayer presuppose the deeply distressing
circumstance that God's name and will are abused, that his Kingdom is not yet come, and that men are
threatened by the temptation to fall away.

In regard to Jesus' preaching, one cannot, therefore, speak of a realized eschatology--i.e., the Last Times
are now here (according to the view of C.H. Dodd, a British biblical scholar)--but of an eschatology "in
process of realizing itself" (according to the view of Joachim Jeremias, a German biblical scholar); for
God's Kingdom is very close. It is on the threshold, already casts its light into the present world, and is
seen in Jesus' own ministry through word and deed. In this, his message differs from the eschatology of
his time and breaks through all of its conceptions. He neither shared nor encouraged the hope in a
national messiah from the family of David, let alone proclaimed himself as such a messiah, nor did he
support the efforts of the Zealots to accelerate the coming of the Kingdom of God. He also did not tolerate
turning the Kingdom of God into the preserve of the pious adherents of the Law (Pharisees; Qumran
sect), and he did not participate in the fantastic attempts of the apocalyptic visionaries of his time to
calculate and thus depict in detail the end of the present world and the dawn of the new "aeon," or age
(Luke 12:56). Nor did he undertake a direct continuation of the Baptist's preaching.

All the ideas and images in Jesus' preaching converge with united force in the one thought, namely, that
God himself as Lord is at hand and already making his appearance, in order to establish his rule. Jesus did
not want to introduce a new idea of God and develop a new theory about the end of the world. It would
therefore be incorrect to understand his preaching in the Jewish apocalyptic sense of immediate
expectancy, coming, as it were, to a boiling point. The proximity of the Kingdom of God actually means
that God himself is at hand in a liberating attack upon the world and in a saving approach to those in
bondage in the world; he is coming and yet is already present in the midst of the still-existing world. In
Jesus' message, God is no longer the prisoner of his own majesty in a sacral sphere into which pious
tradition had exiled him. He breaks forth in sovereign power as Father, Helper, and Liberator and is
already now at work, as is indicated by Jesus' proclaiming of his nearness and by Jesus' actions in
entering the field of battle himself, to erect the signs of God's victory over Satan: "But if it is by the finger
of God that I cast out demons, then the kingdom of God has come upon you" (Luke 11:20). For this
reason, Jesus called out: the shift in the aeons is here; now is the hour of which the prophets' promises
told (Matt. 11:5; Isa. 35:5). This "here and now" carries all the weight in Jesus' message: "Blessed are
the eyes which see what you see! For I tell you that many prophets and kings desired to see what you
see, and did not see it, and to hear what you hear, and did not hear it" (Luke 10:23-24). In answer to the
Pharisees' question about when the Kingdom of God is coming, Jesus therefore said, "The Kingdom of God
does not come in an observable way, nor will they say, 'Look, here it is!' or 'There!' For look, the Kingdom
of God is within your reach" (Luke 17:20-21; another translation: "in the midst of you").

The dominant feature of Jesus' preaching is the Heavenly Father's turning in mercy and love to the
suffering, guilty, outcast, and to those who, according to the prejudices of the "pious," have no right to
receive a share in the final salvation. Numerous parables described how God behaves toward them and
shows himself as Lord and King (e.g., Luke 15; Matt. 18:23ff.; 20:1ff.). They all speak of God's action in
images drawn from daily life, so that everyone can understand. They belong to the uncontestedly oldest
stock of the Jesus tradition. But Jesus did not only teach this, he practiced and illustrated it himself by his
own behaviour and thereby offended the pious, who claimed the Kingdom of Heaven for themselves.

In this message of the approaching Kingdom of God, Jesus' call to repentance is grounded. He called on
all not to miss the hour of salvation (Luke 14:16ff.; 13:6ff.), to sacrifice everything for the Kingdom of
God (Matt. 13:44ff.), and to receive it like a child (Mark 10:15), without the presumptuous and desperate
conceit that one might win it and realize it by one's own works (Mark 4:26ff.; Matt. 13:24ff.). Jesus'
summons to be wise, to be on the watch (Luke 16:1ff.; 12:35ff.; Mark 13:33ff.; Matt. 24:45ff.), and to
surrender the fiction of one's own righteousness (Luke 18:10ff.) belongs here, too. In Jesus' preaching,
repentance does not mean a prerequisite or precondition or even a penitent contemplation of oneself but,
rather, a consequence of the proximity of the Kingdom of God (Matt. 4:17) and an opening of oneself for
his future, a movement not backward, but forward. Jesus in this way binds future and present insolubly
together. The apocalyptic's question about how much time still has to elapse before the new world of God
is here is thus rendered meaningless. He who asks this only proves that he understands neither the future
nor the present properly; namely, God's future as the salvation that is already dawning and one's own
present in the light of the coming Kingdom of God.

Jesus therefore rejected the demand that he produce "signs" as proof of the dawning of the time of
salvation (Matt. 12:38ff.; Mark 8:11). He himself is to be viewed as the "sign," just as once Jonah, the
prophet of repentance, was the only sign given to the people in Nineveh (Luke 11:29ff.). The sign is not
identical with the thing signified, but it is a valid indication of it.

According to the Synoptics, Jesus never made his "messiahship" the subject of his teaching or used it as
legitimation for his message. It is significant that the "I am" sayings of John, which bear the stamp of
Christology throughout, are not found in the Synoptic tradition. That does not in any way affect the fact
that Jesus in a decisive way included his own person as eschatological prophet and charismatic miracle
worker in the event of the Kingdom of God: "And blessed is he who takes no offense at me" (Matt. 11:6).

© 1999-2000 Britannica.com Inc.

ENCYCLOPÆDIA BRITANNICA
Jesus Christ

The sufferings and death of Jesus in Jerusalem

Jesus' decision to go to Jerusalem is the turning point in his story. The events it set in motion soon came
to have decisive significance for the faith of his followers. It is not coincidental that the Gospels narrate
this period of his life in disproportionate breadth. Despite the many points of agreement among the
Gospels, there also are considerable discrepancies within the tradition of the Passion. Thus, one cannot
expect the tradition of the Passion to provide historically accurate reports, for it has been formed from the
viewpoint of the church and its faith in Christ. The most important theological motifs in the narratives
include the intention of presenting Jesus' sufferings and death as the fulfillment of God's will, and the decision,
in conformity with the words of the Old Testament Prophets and Psalms, to proclaim him as Messiah and
Son of God, despite his brutal end. Nevertheless, important historical facts may be inferred from the texts.

Jesus probably went to Jerusalem with his disciples for the Passover in order to call the people of Israel
gathered there to a final decision in view of the dawning Kingdom of God. He must have been aware of
the heavy conflicts with the Jewish rulers that lay ahead of him. The story of the cleansing of the Temple,
in particular, shows that Jesus did not avoid these conflicts. The later tradition, stylizing the story, gives
as Jesus' sole motive for going to Jerusalem his desire to die there and to rise again in accordance with
the will of God (Mark 8:31; 9:31; 10:32ff.). The best clue for a reconstruction of the outward course of
Jesus' Passion is given by his Crucifixion. It proves that he was condemned and executed under Roman
law as a political rebel. All reports agree that he died on Friday (Mark 15:42; Matt. 27:62; Luke 23:54;
John 19:31). They differ, however, in that, according to the Synoptics, this was the 15th of Nisan (March/
April); i.e., the first day of the Passover. But, according to John, it was the previous day; i.e., the one on
which the Passover lambs were slaughtered and on which the festival was begun in the evening (in
accordance with the Jewish division of days) with a common meal. Thus, according to John, Jesus' last
meal with the disciples was not itself a Passover meal but took place earlier. Each of these datings may be
theologically motivated, whether it be that the Eucharist is to be represented as the Passover meal
(Synoptics) or whether Jesus himself is to be shown as the true Passover lamb, who died at the hour
when the lambs were slaughtered (John). Historically, the Johannine dating is to be preferred, and the
14th Nisan (April 7) is to be regarded as the day of Jesus' death. The question of the occasion for Jesus'
execution and the role that the Jews played is thereby more difficult and more important.

The way the Gospels present the facts of the case, Jesus was actually condemned to death by the
supreme Jewish tribunal (Mark 14:55ff.). Pilate, on the other hand, was convinced of Jesus' innocence and
made vain attempts to release him but finally yielded to the Jews' pressure against his better judgment
(Mark 15:22ff.). The historical reliability of this account has rightly been questioned. First, the Synoptic
reports differ among themselves. According to Mark and Matthew, the Jewish supreme court had already
gathered in the home of the High Priest after Jesus' arrest in the night of Holy Thursday to Friday and
condemned him to death as a blasphemer at that point (Mark 14:64). Thereafter, they resolved to hand
Jesus over to Pilate in a new session in the early morning (Mark 15:1). Luke knows of only one session
and has the interrogation take place in the morning (Luke 22:66), but he says nothing about Jesus'
condemnation (Luke 22:71). John deviates even more; here, only the high priests Annas and Caiaphas
are involved in the interrogation of Jesus (John 18:13ff.). Secondly, with regard to all the Gospel
accounts, the question arises, what earwitness can be supposed later to have given the disciples an exact
report? Thirdly, the jurisdictional competency of the Jewish Sanhedrin is disputed. In the opinion of some
scholars, the Jewish authorities were permitted to pronounce sentence of death and to carry it out by
stoning in the case of serious religious offenses (blasphemy). In the opinion of others, though, this
required the confirmation of the Roman procurator. Also, trials of this kind were not to be conducted
during the period of the festival.

The strongest argument against the Synoptic presentation is, however, that it is styled throughout in a
Christian, and not in a Jewish, way; i.e., on the basis of scriptural proof and the Christian confession to
the messiahship and divine Sonship of Jesus. The High Priest's question, "Are you the Christ, the Son of
the Blessed?" (Mark 14:61), is unthinkable from the viewpoint of Jewish premises, because Son of God
was not a Jewish title for the Messiah. Thus, the account reflects the controversies of the later church with
the Judaism of its day.
There also is in the Gospels a tendency to exonerate Pilate at the Jews' expense. His behaviour, however,
does not match the picture that nonbiblical sources have handed down about him. But everything speaks
for Jesus' having been arrested as a troublemaker, informally interrogated, and handed over to Pilate as
the leader of a political revolt by the pro-Roman priestly and Sadducean members of the Sanhedrin, who
were dominant in Jerusalem society in those days. The cleansing of the Temple and a prophetic,
apocalyptic saying of Jesus (John 2:19; cf. Mark 14:58; Acts 6:14) about the destruction of the Temple
may thereby have played a role. It can hardly be assumed that each and all of the Pharisees, who were
without political influence at that time, were involved in the plot. Nor are they mentioned as a separate
group in the Passion narratives alongside the priests, elders, and scribes.

The other scenes in the Passion story do not need to be listed here separately. They relate more to the
theological meaning of Jesus' Passion and are, to a large measure, formed in an edifying cultic manner,
even though they refer to events that are certainly historical; e.g., Judas' betrayal, Jesus' last meal with
his disciples, and Peter's denial of Jesus. The traces of an eyewitness account are perhaps still
recognizable at certain points (Mark 14:52; 15:21).

The accounts differ in their presentation of Jesus' death, especially in their rendering of his last words. It
is only in Mark and Matthew that Jesus dies crying out the prayer from Psalm 22: "My God, my God, why
hast thou forsaken me?" The distinction between the repentant and the defiant thief is only found in Luke.
Jesus' last words are given differently in Luke ("Father, into thy hands I commit my spirit!") and John ("It
is finished"). Each of these accounts, as also the testimony of the Roman centurion ("Truly this man was
the Son of God!"; Mark 15:39), gives expression to the significance of Jesus and his story.

© 1999-2000 Britannica.com Inc.

The Jewish Perspective
(From Newsweek.com; March 27, 2000)

That Jesus was a Jew would seem to be self-evident from the Gospels. But before the first Christian century was out, faith in Jesus as
universal Lord and Savior eclipsed his early identity as a Jewish prophet and wonder worker. For long stretches of Western history,
Jesus was pictured as a Greek, a Roman, a Dutchman—even, in the Germany of the 1930s, as a blond and burly Aryan made in the
image of Nazi anti-Semitism. But for most of Jewish history as well, Jesus was also a deracinated figure: he was the apostate, whose
name a pious Jew should never utter.

Indeed, the lack of extra-Biblical evidence for the existence of Jesus has led more than one critic to conclude that he is a Christian
fiction created by the early church. There were in fact a half dozen brief passages, later excised from Talmudic texts, that some
scholars consider indirect references to Jesus. One alludes to a heresy trial of someone named Yeshu (Jesus) but none of them has
any independent value for historians of Jesus. The only significant early text of real historical value is a short passage from Flavius
Josephus, the first-century Jewish historian. Josephus describes Jesus as a "wise man," a "doer of startling deeds" and a "teacher"
who was crucified and attracted a posthumous following called Christians. In short, argues Biblical scholar John P. Meier of Notre
Dame, the historical Jesus was "a marginal Jew in a marginal province of the Roman Empire"—and thus unworthy of serious notice
by contemporary Roman chroniclers.

Christian persecution of the Jews made dialogue about Jesus impossible in the Middle Ages. Jews were not inclined to contemplate
the cross on the Crusaders' shields, nor did they enjoy the forced theological disputations Christians staged for Jewish conversions.
To them, the Christian statues and pictures of Jesus represented the idol worship forbidden by the Torah. Some Jews did compile
their own versions of a "History of Jesus" ("Toledoth Yeshu") as a parody of the Gospel story. In it, Jesus is depicted as a seduced
Mary's bastard child who later gains magical powers and works sorcery. Eventually, he is hanged, his body hidden for three days and
then discovered. It was subversive literature culled from the excised Talmudic texts. "Jews were impotent in force of arms," observes
Rabbi Michael Meyer, a professor at Hebrew Union Seminary in Cincinnati, "so they reacted with words."

When skeptical scholars began to search for the "historical Jesus" behind the Gospel accounts in the 18th century, few Jewish
intellectuals felt secure enough to join the quest. One who did was Abraham Geiger, a German rabbi and early exponent of the
Reform Jewish movement. He saw that liberal Protestant intellectuals were anxious to get beyond the supernatural Christ of Christian
dogma and find the enlightened teacher of morality hidden behind the Gospel texts. From his own research, Geiger concluded that
what Jesus believed and taught was actually the Judaism of liberal Pharisees, an important first-century Jewish sect. "Geiger argued
that Jesus was a reformist Pharisee whose teachings had been corrupted by his followers and mixed with pagan elements to produce
the dogmas of Christianity," says Susannah Heschel, professor of Jewish studies at Dartmouth. Thus, far from being a unique
religious genius—as the liberal Protestants claimed—Geiger's Jesus was a democratizer of his own inherited tradition. It was, he
argued, the Pharisees' opponents, the Sadducees, who became the first Christians and produced the negative picture of the Pharisees
as legalistic hypocrites found in the later Gospel texts. In sum, Geiger—and after him, other Jewish scholars—distinguished between
the faith of Jesus, which they saw as liberal Judaism, and the faith in Jesus, which became Christianity.

The implications of this "Jewish Jesus" were obvious, and quickly put to polemical use. Jews who might be attracted by the figure of
Jesus needn't convert to Christianity. Rather, they could find his real teachings faithfully recovered in the burgeoning Reform Jewish
movement. Christians, on the other hand, could no longer claim that Jesus was a unique religious figure who inspired a new and
universal religion. Indeed, if any religion could claim universality, it was monotheistic Judaism as the progenitor of both Christianity
and Islam.

The Holocaust occasioned yet another way of imagining Jesus. If some Jews blamed Christians—or God himself—for allowing the
ovens of Auschwitz, a few Jewish artists found a different way to deal with the horror of genocide: they applied the theme of the
crucified Christ to the Nazis' Jewish victims. This is particularly evident in harrowing paintings of Marc Chagall, where the dying
Jesus is marked by Jewish symbols. And in "Night," his haunting stories of the death camps, Elie Wiesel adopted the Crucifixion
motif for his wrenching scene of three Jews hanged from a tree, like Jesus and the two thieves on Golgotha. The central figure is an
innocent boy dangling in protracted agony because his body is too light to allow the noose its swift reprieve. When Wiesel hears a
fellow inmate cry, "Where is God?" the author says to himself: "Here He is. He has been hanged here, on these gallows." "There's no
lack of suffering in Judaism," says Alan Segal, professor of Jewish Studies at Barnard College and Columbia University, "and no
reason why Jews shouldn't pick up an image central to Christianity."

Today, the Jewishness of Jesus is no longer a question among scholars. That much of what he taught can be found in the Jewish
Scriptures is widely accepted by Christian as well as Jewish students of the Bible. At some seminaries, like Hebrew Union, a course
in the New Testament is now required of rabbinical candidates. Outside scholarly circles, there is less focus on Jesus, and most Jews
will never read the Christian Bible. And, of course, Jews do not accept the Christ of faith. "They see Jesus as an admirable Jew," says
theologian John Cobb, "but they don't believe that any Jew could be God."
The Islamic Perspective
(From Newsweek.com; March 27, 2000)

At the onset of Ramadan last year, Vatican officials sent greetings to the world's Muslims, inviting them to reflect on Jesus as "a
model and permanent message for humanity." But for Muslims, the Prophet Muhammad is the perfect model for humankind and in
the Qur'an (in Arabic only), they believe, the very Word of God dwells among us. Even so, Muslims recognize Jesus as a great
prophet and revere him as Isa ibn Maryam—Jesus, the son of Mary, the only woman mentioned by name in the Qur'an. At a time
when many Christians deny Jesus' birth to a virgin, Muslims find the story in the Qur'an and affirm that it is true. "It's a very strange
situation, where Muslims are defending the miraculous birth of Jesus against Western deniers," says Seyyed Hossein Nasr, professor
of Islamic studies at George Washington University. "Many Westerners also do not believe that Jesus ascended into heaven. Muslims
do." Indeed, many Muslims see themselves as Christ's true followers.

What Muslims believe about Jesus comes from the Qur'an—not the New Testament, which they consider tainted by human error.
They also draw upon their own oral traditions, called hadith, and on experts' commentaries. In these sources, Jesus is born of Mary
under a palm tree by a direct act of God. From the cradle, the infant Jesus announces that he is God's prophet, though not God's son,
since Allah is "above having a son" according to the Qur'an.

Nonetheless, the Muslim Jesus enjoys unique spiritual prerogatives that other prophets, including Muhammad, lack. Only Jesus and
his mother were born untouched by Satan. Even Muhammad had to be purified by angels before receiving prophethood. Again, in the
Qur'an Muhammad is not presented as a miracle worker, but Jesus miraculously heals the blind, cures lepers and "brings forth the
dead by [Allah's] leave." In this way Jesus manifests himself as the Messiah, or "the anointed one." Muslims are not supposed to pray
to anyone but Allah. But in popular devotions many ask Jesus or Mary or John the Baptist for favors. (According to one recent
estimate, visions of Jesus or Mary have occurred some 70 times in Muslim countries since 1985.)

Although Muhammad supersedes Jesus as the last and greatest of the prophets, he still must die. But in the Qur'an, Jesus does not die,
nor is he resurrected. Muslims believe that Jesus asked God to save him from crucifixion, as the Gospels record, and that God
answered his prayer by taking him directly up to heaven. "God would not allow one of his prophets to be killed," says Martin Palmer,
director of the International Consultancy on Religion, Education and Culture in Manchester, England. "If Jesus had been crucified, it
would have meant that God had failed his prophet."

When the end of the world approaches, Muslims believe that Jesus will descend to defeat the antichrist—and, incidentally, to set the
record straight. His presence will prove the Crucifixion was a myth and eventually he will die a natural death. "Jesus will return as a
Muslim," says Nasr, "in the sense that he will unite all believers in total submission to the one God."
The Hindu Perspective
(From Newsweek.com; March 27, 2000)

The gospels are silent about the life of Jesus between his boyhood visit to the Jerusalem Temple with his parents, and the beginning
of his public ministry at the age of 30. But in India there is a strong tradition that the teenage Jesus slipped away from his parents,
journeyed across Southeast Asia learning yogic meditation and returned home to become a guru to the Jews. This legend reveals just
how easily Hinduism absorbs any figure whom others worship as divine. To Hindus, India is the Holy Land, its sacred mountains and
rivers enlivened by more than 300,000 local deities. It is only natural, then, that Jesus would come to India to learn the secrets of
unlocking his own inherent divinity.

Like Gandhi, many Hindus are drawn to the figure of Jesus by his compassion and nonviolence—virtues taught in their own
sacred Scriptures. But also like Gandhi, Hindus find the notion of a single god unnecessarily restrictive. In their perspective, all
human beings are sons of God with the innate ability to become divine themselves. Those Hindus who read the Gospels are drawn to
the passage in John in which Jesus proclaims that "the Father and I are one." This confirms the basic Hindu belief that everyone is
capable through rigorous spiritual practice of realizing his or her own universal "god-consciousness." The great modern Hindu saint
Ramakrishna recorded that he meditated on a picture of the Madonna with child and was transported into a state of samadhi, a
consciousness in which the divine is all that really exists. For that kind of spiritual experience, appeal to any god will do. "Christ-
consciousness, God-consciousness, Krishna-consciousness, Buddha-consciousness—it's all the same thing," says Deepak Chopra, an
Indian popularizer of Hindu philosophy for New Age Westerners. "Rather than 'love thy neighbor,' this consciousness says, 'You and
I are the same beings.'"
VALIDITY
By Gretchen K. Rymarchyk

Social science research differs from research in fields such as physics and chemistry for many reasons. One reason is that
the things social science researchers are trying to measure are intangible, such as attitudes, behaviors, emotions, and
personalities. Whereas in physics you can use a ruler to measure distance, and in chemistry you can use a graduated cylinder
to measure volume, in social science research you cannot pour emotions into a graduated cylinder or use a ruler to measure
how big someone's attitude is (no puns intended).

As a result, social scientists have developed their own means of measuring such concepts as attitudes, behaviors, emotions,
and personalities. Some of these techniques include surveys, interviews, assessments, ink blots, drawings, dream
interpretations, and many more. A difficulty in using any method to measure a phenomenon of social science is that you
never know for certain whether you are measuring what you want to measure.

Validity is an element of social science research which addresses the issue of whether the researcher is actually measuring
what s/he says s/he is. As an example, let us pretend we want to measure attitude. A psychologist by the name of Kurt
Goldstein developed a way to measure "abstract attitude" by assessing several different abilities in brain injury patients,
such as the ability to separate their internal experience from the external world, the ability to shift from one task to another, and the
ability to recognize an organized whole, to break it into component parts, and then reorganize it as before. Carl Jung defined
attitude as introversion and extraversion. Raymond Cattell defined attitude in three components: intensity of interest, interest
in an action, and interest in action toward an object (Hall & Lindzey, 1978).

Are any of these things what you think of when someone mentions the word "attitude?"
Do any of these definitions of attitude seem like they are defining the same thing? Do
they seem valid to you?

A definition of attitude that would seem to possess more validity to you might be the definition provided in the American
Heritage Dictionary: "A state of mind or feeling with regard to some matter; disposition" (1987, p. 140). This definition of
attitude may appear to you to be the most valid.

Validity in social science research has several different components - some people feel there are only three components of
validity, and others feel there are four. All four will be addressed on this page. Ideally, you would want to establish all of these
facets of validity for your research measure prior to administering it for your actual research project.

Face validity requires that your measure appears relevant to your construct to an innocent bystander, or more specifically,
to those you wish to measure. Face validity can be established by your Mom - just ask her if she thinks your survey could
adequately and completely assess someone's attitude. If Mom says yes, then you have face validity. However, you may want
to take this one step further and ask individuals similar to those you wish to study if they feel the same way your Mom does
about your survey. The reason for asking these people is that people can sometimes become resentful and uncooperative if
they think they are being misrepresented to others, or worse, if they think you are misrepresenting yourself to them. For
instance, if you tell people you are measuring their attitudes, but your survey asks them how much money they spend on
alcohol, they may think you have lied to them about your study. Or if your survey only asks how they feel about negative
things (i.e., whether their car was stolen, whether they were beaten up, etc.), they may think that you are going to find that these
people all have negative attitudes, when that may not be true. So, it is important to establish face validity with your population of
interest.

In order to have a valid measure of a social construct, one should never stop at achieving only face validity, as this is not
sufficient. However, one should never skip establishing face validity, because if you do not have it, you cannot achieve the
other components of validity.

Content validity is very similar to face validity, except instead of asking your Mom or target members of your
population of interest, you must ask experts in the field (unless your Mom is an expert on attitude). The theory behind
content validity, as opposed to face validity, is that experts are aware of nuances in the construct that may be rare or elusive
and of which the layperson may not be aware. For example, if you submitted your attitude survey to Kurt Goldstein for a content
validity check, he might say you need to have something to assess whether your respondents can break something down into
component parts and then resynthesize it, as this is an important aspect of attitude, and otherwise you have no content validity.
For an example of a study where a content validity check was used for an attitude assessment, click here. Another example
measures influences, and another measures impacts.

Many studies proceed after content validity has been achieved; however, this does not necessarily mean the measures used
are entirely valid. Criterion validity is a more rigorous test than face or content validity. Criterion validity means your
attitude assessment can predict or agree with constructs external to attitude. Two types of criterion validity
exist:

Predictive validity - Can your attitude survey predict? For example, if someone scores high, indicating that they
have a positive attitude, can high attitude scores also be predictive of job promotion? If you administer your
attitude survey to someone and s/he rates high, indicating a positive attitude, then later that week s/he is fired from
his/her job and his/her spouse divorces him/her, you may not have predictive validity.

Concurrent validity - Does your attitude survey give scores that agree with other things that go along with
attitude? For example, if someone scores low, indicating that they have a negative attitude, are low attitude scores
concurrent with (happen at the same time as) negative remarks from that person? High blood pressure? If you
administer your attitude survey to someone who is cheerful and smiling a lot, but they rate low, indicating a
negative attitude, your survey may not have concurrent validity. For an extremely thorough example of research on
the use of solution-focused group therapy with school children, which includes a concurrent validity check, click
here.

Finally, the most rigorous validity test you can put your attitude survey through is the construct validity check. Do
the scores your survey produces correlate with other related constructs in the anticipated manner? For example, if
your attitude survey has construct validity, attitude scores should correlate positively with life satisfaction survey
scores (people with more negative attitudes report lower satisfaction) and negatively with life stress scores. These
other constructs do not necessarily have to be predictive or concurrent; however, oftentimes they are. For an in-depth
discussion of construct validity, click here. To see what some of the threats are to construct validity, click here.

If your attitude survey has made it this far, you have a most valid construct! Congratulations!

For more information on validity, here are some helpful sources:

For a discussion of validity issues in the field of industrial and organizational psychology, this link discusses, in highly
technical terms, issues relating to content validity in particular that come up when trying to measure for promotion or hiring.

For a general and easy-to-read discussion of all types of measurement validity in social science research, this link is very
helpful in clarifying differences between validity components.

For a discussion of validity issues in the field of psychology, this link discusses specific problems that come up when dealing
in the area of Emotional Intelligence.
In search of truth through quantitative reasoning

Introduction
Quantitative reasoning becomes self-evident when actively participating in the overall learning experience. A research
oriented learning experience includes a formal and informal process of gaining, utilizing and systematically applying
knowledge to an area of interest in order to make sense of the interrelationships between what one knows and what one
learns. With quantitative reasoning skills, one can integrate deductive logic aspects from multiple knowledge dimensions into
program evaluation and research. There is a beginning, middle and an end to this cyclical process which allows for the
adjustment of additional information. When approaching evaluation questions, within a particular context, it is important to
keep in mind that a scientific, linear model is but one method of organizing information. The purpose of this home page
assignment is to provide a framework for constructing relevant quantitative interrelationships grounded on William
Trochim's Program Evaluation and Research Design course offered through Human Service Studies (HSS), Planning and
Evaluation. Technological advances in computer use will allow you, the new research student, to follow and apply this
information to your specific research project. This html assignment will introduce you to the world of quantitative research
methods, as well as open your eyes to the possibilities that exist for applying similar concepts to qualitative studies.

There is tremendous value in understanding the plural dimensions of both quantitative and qualitative approaches to
evaluation methodologies. Wolcott, Guba, and Lincoln advocate the necessity of becoming familiar with all other methods in
order to appropriately select the method that best fits your area of research and design. The context, purpose, and types of
research questions asked will define the methodological foundation of a study. Keeping this caveat in mind will eliminate
mismatched efforts and results that can only frustrate a beginning student in research.

The quality of one's research will establish the foundation for the entire inquiry process and is based on qualitative
judgments. Judgments can be applied or transferred to quantitative terms with both inductive and deductive reasoning
abilities. "Quantitative reasoning...refers to a wide range of mental abilities" (Wolfe, p. 3) that facilitates deductive reasoning
in a variety of settings. This article demonstrates the value of quantitative reasoning across many disciplines using learned
and intuitive skills that most individuals in graduate programs already possess. However, the burden is on the researcher to
justify the methodology and validity of their particular evaluation or research area of interest. As a student in both
quantitative and qualitative courses in the field of HSS, I find it necessary to emphasize the importance of gaining a working
knowledge of both methodologies within meaningful contexts. Throughout these collective courses, I have been provided
opportunities to apply theoretical concepts to field assignments that made more sense than simply reading about experiences.

After considering the context and nature of your project, select the appropriate method of inquiry to help direct the
development of specific research questions. The objective of your inquiry is to ask questions in order to retrieve the data or
information that is salient to your project. Collecting and analyzing data with quantitative strategies includes understanding
the relationships among variables utilizing descriptive and inferential statistics. This process will require a serious research
student to gain a fuller knowledge base by undertaking courses in statistics or regression analysis.

Briefly, descriptive statistics summarize the information contained in a sample. Measures of central tendency and
dispersion are usually reported in summary form, such as distributions and graphical and/or numerical displays (Applied
Regression Analysis for Business and Economics, 1996). Inferential statistics build on descriptive statistics and on
assumptions that allow one to generalize from a selected sample to the population and to estimate the parameters of that
population. These assumptions focus on the use of continuous data and on the sample being a random representation of the
population. Inferences are made using probabilities and probability distributions. Statistical evidence is especially important
to policy makers or other stakeholders who have a vested interest in research/evaluation projects. Patton, Guba & Lincoln
concur that stakeholders use extrapolated information as the basis of decision making.

Again, as a reminder, consider the context of your own research and focus on the hypothesis generated by your interests.
Asking empirical questions in testable form involves the traditional use of the null hypothesis versus the alternative
hypothesis. A test statistic and its significance level are used to determine whether the null hypothesis should be rejected in
favor of the alternative. In a program evaluation, the null hypothesis typically states that there is no difference between
population means (for example, no difference between the program and comparison groups). Inferential logic will establish
the standards of your study based on theory and application to reality.
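
As a hedged illustration of this logic, the sketch below runs a two-sample t-test on invented outcome scores for a hypothetical program group and comparison group. The null hypothesis is that the two population means are equal; the resulting p-value is then compared against a conventional alpha of .05.

# Null vs. alternative hypothesis: two-sample t-test on invented data.
from scipy import stats

program_group    = [72, 78, 75, 81, 69, 77, 80, 74]  # hypothetical outcome scores
comparison_group = [68, 71, 65, 73, 70, 66, 72, 69]

# H0: the population means are equal (no program effect).
# H1: the population means differ.
t_stat, p_value = stats.ttest_ind(program_group, comparison_group)

alpha = 0.05
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
if p_value < alpha:
    print("Reject the null hypothesis: the groups differ at the .05 level.")
else:
    print("Fail to reject the null hypothesis.")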

To effectively express quantitative concepts requires familiarity with the language. Wolfe (1993, p. 3) emphasizes the
importance of understanding and participating in the entire process at all levels of cognitive reasoning. Reading, writing, and
interacting with the research process will promote true learning by integrating critical thinking skills. With quantitative
analysis, it is especially important to present the units of measurement in comprehensible formats, such as visual
representations. Graphs, charts, plots, and histograms display raw data clearly for a given context, and the chances of
remembering visuals are greater than the chances of remembering numbers or text. Acquiring this working knowledge also includes skills
in understanding scales and distributions. I highly recommend students enroll in the HSS Measurement and Design course
for an in-depth "awakening" to the world of nominal, ordinal, interval and ratio scales.
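
As a brief sketch of this point, the snippet below generates a hypothetical interval-scale variable and displays it as a histogram with matplotlib; the variable and its values are invented, and any real display would of course be built from your own data.

# Displaying raw data visually: histogram of an invented interval-scale variable.
import matplotlib.pyplot as plt
import numpy as np

scores = np.random.default_rng(42).normal(loc=50, scale=10, size=200)  # hypothetical test scores

plt.hist(scores, bins=15, edgecolor="black")
plt.xlabel("Score (interval scale)")
plt.ylabel("Frequency")
plt.title("Distribution of hypothetical test scores")
plt.show()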

Now, the big question is how does one arrive at the "approximation to the Truth based on conclusions from
research?" (Trochim, Home Page, Knowledge Base; Validity). Truth and inquiry are a process related to logic, evidence, and
argument as Trochim will elaborate on in his course presentations and Knowledge Base. Human beings interpret raw data
and there are general guidelines presented to evaluate assertions and to assess validity. Quantitative strategies use normal
distributions based on statistical or regression analysis. This approximation to normality (truth) is tangible evidence that
assertions may be true. However, if the constructs used to establish causality are not clearly operationalized to begin with,
then those inferences about relationships and variables may not be valid or reliable. It is logical to improve construct validity
in order to strengthen internal and conclusion validity.

A resource to address conclusion and internal validity issues for quantitative methods is Trochim's paper on Statistical Power
and Statistical Tests (1984). This article describes "four interrelated components that influence conclusions you might reach
from a statistical test in a research project" (p. 1) utilizing a 2 x 2 decision matrix. The null and alternative hypotheses
"describe all possible outcomes with respect to the inference," and the setup depends upon the researcher determining which
hypothesis "allows the maximum level of power to detect an effect if one exists" (Ibid., p. 1). Type I and Type II errors are the
two ways, well established in probability theory, in which a statistical conclusion can be wrong: rejecting a null hypothesis
that is actually true (Type I) or failing to reject a null hypothesis that is actually false (Type II). Basically, a researcher wants
to statistically demonstrate that their program did have an effect in order to accept the alternative hypothesis.
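
The same 2 x 2 logic underlies statistical power, the probability of detecting an effect that truly exists. As a rough sketch, the snippet below uses the statsmodels power routines to ask how many participants per group a two-sample t-test would need to reach 80% power for an assumed medium effect size; the effect size, alpha, and power values are illustrative choices, not figures taken from Trochim's paper.

# Statistical power sketch: sample size needed for a two-sample t-test.
# Effect size, alpha, and power below are illustrative assumptions.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(
    effect_size=0.5,   # hypothetical "medium" standardized difference (Cohen's d)
    alpha=0.05,        # Type I error rate
    power=0.80,        # 1 - Type II error rate
    alternative="two-sided",
)
print(f"Approximately {n_per_group:.0f} participants per group are needed.")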

Simultaneously, understanding the theoretical background within the research construct can help eliminate systematic bias
early on. Post-Positivism constructs support the logic of reasoning or decision-making that parallel claims for evidence of
relationships. When constructing arguments for validity, it becomes increasingly evident that one should attempt to control
for or anticipate as many threats to validity as is humanly possible. Working with an experienced research advisor as a
mentor or facilitator in your research area will be of tremendous value. Seek the advice of those experienced in your field of
interest and invite them to join your team. There is a continuum of research perspectives available that can provide variety,
clarity, or vision to your inquiry.

On this note, design construction for quantitative or qualitative research projects should be routinely considered throughout
the research cycle. Good design construction (Trochim & Land, 1982) has several characteristics that are applicable within
general and specific contexts. Effective research strategies should focus on individualized designs that are theory grounded,
are situational, feasible, redundant, and efficient. As part of the introduction to the HSS 691 course, I would recommend
reading the article "Designing designs for research" early in the semester to begin understanding where the above issues fit
in the overall scheme of research.

Summary of quantitative reasoning issues

To summarize the material presented, I have offered a brief outline of interrelated concepts supporting quantitative
reasoning within the methodology. There is value in learning strategies to systematically strengthen the overall design and
conclusions of research projects. This approach has some transferable attributes that may also address qualitative studies
through mixed methods. I can only speak from my personal experience that research, in general, is a very rigorous exercise in
critical thinking. Developing plausible arguments for inferences, data collection and analysis strategies, and actually writing
or presenting research findings is a major effort in any field that will require a commitment and compassion for your
particular area of interest. The evolving study one chooses to engage in can provide inestimable results to countless, unseen
others.

There are a few issues regarding this outline that have not been addressed that merit some attention. Often I am reminded
that "social research occurs in social contexts" and that not all human beings enter social research with the same level of
quantitative or qualitative knowledge. Knowing personal strengths and limitations can prepare a student in research methods
for the amount of work required to gain competency in quantitative research. As a student masters elementary concepts, an
expansion of this knowledge can be applied to broader discussions in assertions and the development of evidence to support
those assertions. One must learn to think deductively and inductively in order to view the similarities and differences of both
methods which can enhance your methodological approach to research questions. Abstract concepts from both methods are
not mutually exclusive, as I have observed in the course of my study.

Present research utilizing qualitative methods

As mentioned earlier, I am presently researching environmental efforts among indigenous populations in the United States
across three ecogeographical regions. A brief contextual description of the Cornell American Indian Program (AIP) research
deserves attention. Originally, the Commission for Environmental Cooperation (CEC) solicited the participation of AIP in
assessing the extent of environmental initiatives within Native American communities across the country. This entailed
defining the constructs of "indigenous groups," "sustainable efforts," and "cultural or indigenous knowledge systems." The
research design included a purposive survey within a summative evaluation research plan. Twenty (20) descriptive case
studies were selected that best represented nine ecogeographical regions of the country.

Data triangulation methods were integrated throughout the research process involving a review of relevant literature and
existing program documents, as well as utilizing technological resources. A preliminary draft of ten of the twenty case studies
revealed several inferences regarding motivations for initiating and maintaining environmental efforts within indigenous
communities. The predominant theme was that indigenous knowledge systems were the impetus for community-wide
responses to environmental concerns.

Within this context, I am beginning to piece together the elements of research and design strategies I have acquired during
the course of my graduate program. Grounded theory supports those validity issues I had the most concern with. The
internal debate I had about qualitative studies included questions regarding the validity and reliability of my project and the
value of such a study in a larger context. I realized that I was seeking evidence to support a relationship between category
and environmental initiatives.

An indigenous knowledge system (IK), as a major categorical concept, was identified through each interviewee. This concept
encompassed those values that had historical or cultural significance in the continuation of tradition which included the
development of an environmental initiative that addressed the needs of that population within a specific region of the
country. Specifically, for the northeast region, where the Iroquoia nations reside, corn as a traditional food, was viewed as
"an extremely unifying concept and way of life...which defined (native) culture... "(Interview 9). This integrative category
extended into family, ceremonial, and community levels which facilitated the continuation of established values.

Patton (1990), Miles & Huberman (1994), Strauss & Corbin (1990), and Guba (1989) provided "trustworthiness"
information that validated my research findings. Just as data speaks for itself and emerges into themes and patterns, so did
my understanding that "trustworthiness" emerges through the efforts of the researcher to provide credible, confirmable and
dependable findings. Using an audit trail offered visible support that the (IK) category was integrated into the overall
research findings of my project. Technical literature, comparable interviews, and other existing linkages in the environmental
network confirmed that traditional values within specific communities supplied the basis for pursuing sustainable efforts.
These multiple sources addressed the internal and external validity issues of qualitative analysis that were my initial
concerns.

Wolcott's (1990) article "On seeking - and rejecting - validity in qualitative research" proposed that when one lives through
an experience, the experience is validated. The synthesis of personal, professional, and spiritual elements of an experience
provides an image of what is Real (capital "R"). These elements are not compartmentalized in order to have a fuller
understanding of the complexity of relationships within systems. So it is with the traditional values of native peoples.

Conclusion

In conclusion, I foresee mixing methodological approaches in establishing relationships between the above described
variables. To statistically provide evidence of a relationship would require examining separate native populations, based on
comparable characteristics. The feasibility of increasing the number of Native communities involved in this study may limit
my scope; however, I also recognize the importance of determining whether there are other factors influencing a community's
environmental efforts. Hey, anything is possible when you have a commitment to searching for an approximation to the truth!

Comments/Questions: R.Maldonado rmm8@cornell.edu

Bibliography

Denzin, N.K., & Lincoln, Y.S. (Eds.) (1994). Handbook of qualitative research. Thousand Oaks, CA: Sage. Chapter 27:
Huberman, A.M., & Miles, M.B., "Data management and analysis methods."

Dielman, T.E. (1996). Applied Regression Analysis for Business and Economics.
International Thompson Publishing, Inc.: Wadsworth Publishing Company.

Guba, E.G., & Lincoln, Y.S. (1989). Fourth generation evaluation. Newbury Park,
CA: Sage

Miles, M.B., & Huberman, A.M. (1994). Qualitative data analysis: An expanded sourcebook (2nd ed.). Newbury Park, CA: Sage.

Patton, M.Q. (1990). Qualitative evaluation and research methods. (2nd ed.). Newbury Park, CA: Sage.

Trochim, W. (1984). Statistical power and statistical tests. Unpublished manuscript, Cornell University, Ithaca, New York.

Trochim, W., & Land, D. (1982). Designing designs for research. The Researcher, 1(1), 1-6.

Trochim, W. (1995). Home page for HSS 691, Program Evaluation & Research Design [World Wide Web site]. http://trochim.human.cornell.edu (Producer and Distributor).

Strauss, A., & Corbin, J. (1990). Basics of qualitative research: Grounded theory procedures and techniques. Beverly Hills, CA:
Sage.

Wolcott, H.F. (1990). "On seeking - and rejecting - validity in qualitative research." In E.W. Eisner and A. Peshkin (Eds.),
Qualitative inquiry in education: The continuing debate, (pp. 121-152). NY: Teachers College Press.

Wolfe, C. (1993). Quantitative Reasoning Across a Curriculum. College Teaching, 41, 2-8.
Ethical Dilemmas in Program Evaluation and Research Design
Elizabeth K. LaPolt

As you browse through any college course description catalog you will inevitably come across courses in ethics. While many of these
classes are offered through philosophy departments, you will sometimes see them listed under biology departments or political science
departments; however, ethical issues should not be of concern only to future philosophers, doctors and politicians. Evaluators of social programs
must be prepared to face moral and ethical dilemmas at all stages of their work.

Trochim defines evaluation as "the systematic acquisition and assessment of information to provide useful feedback about some object" (1991,
29). At first glance this statement does not seem all that ethically daunting. What makes evaluation fraught with moral and ethical
complications is the fact that it is people who design and perform the "systematic acquisition and assessment of information to provide" other
people "useful feedback" about programs which are meant to, in some way, affect yet another group of people. When you give these people the
titles of evaluators, evaluation audiences (which could be funders, administrators or even politicians), and program stakeholders, the potential
for ethical complications to arise becomes evident.

As an evaluator approaches a project, that person wants to ensure that the quality of their work is deemed acceptable by other social scientists
and the evaluation community. The "theory of validity" is an approach to evaluation by which many evaluators set their standards. Four types of
validity cumulatively contribute to this theory: conclusion validity; internal validity; construct validity; and external validity. Ethical questions
may arise as the researcher tackles each of these dimensions of validity.

Conclusion validity addresses the question of whether or not a relationship exists between two items. In terms of evaluation, this would be an
analysis of whether or not a relationship exists between your program and the results you observed. Where are the ethical concerns here? What
could be wrong in trying to determine whether or not a nutrition education program taught people to eat healthier or whether a support group
helped women cope with breast cancer? Ethical issues first arise not in determining whether there is a relationship, but whether you want to be a
contributor to researching that relationship.
OK. You still don't see the big deal. Well, what if someone were to approach you and ask you to evaluate the effectiveness of a program which convinces minority women to have hysterectomies? Or perhaps someone approaches you about doing an evaluation of how much torture a prisoner can withstand; would you take the job? (Maybe these scenarios seem a little far-fetched to you, but if you examine your 20th Century history books, I am sure you will find evidence of people calling for this type of research.) If those situations still seem a little ridiculous to you, think
about some of the controversial social programs that are in place today. Would you participate in the evaluation of a program that distributes
clean hypodermic needles to drug addicts in the hopes of preventing the spread of HIV? Would you evaluate the success of a program which
actively hands out condoms to teenagers? Perhaps your values are not in alignment with the point the evaluation wants to prove. Before
undertaking an evaluation you need to consider the ethical and moral implications of the research you are about to conduct. Are these
evaluations that you would want to conduct? Would you do it and then say it was wrong or would you choose not to even be associated with the
research? Perhaps you could establish conclusion validity, showing that there is a relationship, but what will the implications of your findings be? Would you want to contribute to establishing the validity of a program that does not meet your personal ethics and that you feel does not contribute to the greater good of society? These are questions that you, as a social researcher, should stop to ponder.

In the American Evaluation Association's Guiding Principles for Evaluators, principle III.
E states, "Evaluators articulate and take into account the diversity of interests and values
that may be related to the general and public welfare"(19). Ernest House adds that,
"evaluators should serve the interests not only of the sponsor, but of the larger society, and
of various groups within society, particularly those most affected by the program under
review"(32). He continues, "recognizing that there are interests to be served ties
evaluation to the larger society and to issues of social justice"(32).

Well, once you've determined that you can undertake an evaluation, your mission is to actually demonstrate that there is a relationship between your program and its outcomes, thereby establishing conclusion validity. There are three ways in which you can improve the likelihood of conclusion validity: ensure reliability; properly implement all testing procedures; and establish good statistical power.

While ethical concerns could present themselves when you are addressing reliability and instrumentation, it is the concept of statistical power that I would like to focus on here.

There are four components of statistical power: sample size, effect size, alpha level, and power. Power is exactly what we are looking for in most cases; we want to increase the odds of saying that there is a relationship between our program and the outcomes when, in fact, there is one. Unfortunately, one way of buying more power is to raise the alpha level, and by doing so you also increase the odds of saying there is a relationship when, in fact, there is no relationship. This is called a Type I error, and its probability is the alpha level. Alpha is a value which can be set by you, the evaluator (and in this situation, the statistician). You can consider the value to be reflective of the level of risk you are willing to take in being wrong. This is where it is up to you to make a decision, and yes, it may turn into a situation where you will have to reflect upon your ethics. Determining which is worse, a Type I error or a Type II error, forces you to make a moral judgment, answering the question of what is right and what is wrong.
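To make the trade-off concrete, here is a minimal simulation sketch in Python (not part of the original paper; the sample size of 50 per group, the effect size of 0.3 standard deviations, the alpha of .05, and the 2,000 replications are all arbitrary assumptions). It estimates the Type I error rate when the program has no effect and the power when it does; raising alpha would push both numbers up, which is exactly the ethical trade-off described above.

# Illustrative sketch only: all settings below are invented for demonstration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, alpha, reps = 50, 0.05, 2000   # assumed group size, alpha level, replications

def rejection_rate(true_effect):
    # Fraction of simulated evaluations in which the t-test says "there is a relationship."
    rejections = 0
    for _ in range(reps):
        control = rng.normal(0.0, 1.0, n)
        program = rng.normal(true_effect, 1.0, n)
        if stats.ttest_ind(program, control).pvalue < alpha:
            rejections += 1
    return rejections / reps

# With no real effect, the rejection rate estimates the Type I error (roughly alpha).
print("Estimated Type I error:", rejection_rate(0.0))
# With a real effect of 0.3 standard deviations, it estimates power (1 - Type II error).
print("Estimated power:", rejection_rate(0.3))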

Perhaps it will help you to think about this in terms of the American justice system. Would you consider it worse to let a guilty person go free (a Type II error), or to punish an innocent person (a Type I error)? (I highly recommend that you visit the OJ pages on this web site to follow up on this concept.) How you answer these questions will depend on the nature of your evaluation and what the implications are for your conclusions. So, I guess you'd like an example? Suppose you are evaluating a multi-million dollar government-funded program which is supposed to help children from limited-resource families do better in school. Which scenario would be worse: canceling the program and saving taxpayers millions of dollars by mistakenly determining that the program was ineffective, or concluding that the program does work and helps children, even though the program does not actually succeed in assisting them? Who do you want to put at greater "risk": the taxpayers, whose tax dollars could be wasted, or the children, who could lose out on a valuable program? Your alpha level will reflect this, and it is your call as an evaluator to determine that level.

OK, once you've established conclusion validity and grappled with the issue of power, it is time to move on and deal with internal validity,
proving that it's actually your program that's making a difference. There are many threats to internal validity, all of which need to be addressed.
One of the single group threats is history. The simple fact that life brings new things into people's lives daily can affect the internal validity of
your study. People do not stop existing outside of the constructs of your study. How many limitations can you put on a person you are studying
in order to control for the possibilities of history threats? Would you tell a child that he or she can't watch Sesame Street over the course of your
study because you want to prove that it was your program, not Big Bird or Grover contributing to that child's growth?

Of particular relevance to ethical issues are multiple group social interaction threats. You're trying to prove that it's your program that's making
a difference, so you establish control or comparison groups. This has the potential to bring about compensatory equalization of treatment.
Perhaps the teacher of a class which has been chosen as a comparison group sees what's going on in the program classroom and decides to do
something extra for her class because she feels her class is missing out on something. How can you argue, on moral grounds, that she should
deny her class that growth for the sake of research?

Perhaps resentful demoralization occurs and the control group does worse, because they're upset about not getting the program. What can you
do to prevent this from happening? You could keep the program a secret from the control group. Ah, but is that ethical? Is it correct not to tell a group of people about the benefits of a program because you need to use them as a control? This is a dilemma, particularly if there is significant evidence that the program you are evaluating is beneficial. Is it correct to deny the comparison group a treatment for the sake of research validity? Rossi "find[s] it hard to envisage the circumstances under which doing so would not endanger the integrity of an evaluation. Giving out such information to a comparison or control group is the equivalent of shooting oneself in the foot, potentially narrowing the differences between them and treatment groups, correspondingly lowering the power of the evaluation" (57). What kind of tradeoffs are you willing to make
for the sake of social science research?

OK, so you've muddled your way through some of the ethical dimensions of internal validity. Now you have to face construct validity and
determine if it is your program, all dimensions of your program and nothing but your program which is influencing the stakeholders. There are
many threats to construct validity, one of which is "Restricted Generalizability Across Constructs." In other words, what do you do when your program does work, but it is causing side effects, unanticipated consequences? How will you address this dilemma? I guess you could use another example. OK. Suppose you've devised a program which is intended to enable children to resolve their conflicts without violence. Perhaps you've determined that for most of the children the program works: they talk out their problems more often, rather than pick fights. However, a certain portion of the children seem to be worse off from the program. They react even more violently than they had previously; they learn to like starting trouble with others and resorting to fist fights. Does your program work or doesn't it? Can you wholly establish construct validity?

In addition, there are several other threats to construct validity, some of which are social threats. These social threats include evaluation
apprehension, hypothesis guessing, and experimenter expectancies. These three threats all relate to whether or not you have told the study participants what you are studying. However, not telling the study participants what you are studying violates the ethical and legal dimensions of voluntary participation and informed consent. The people you are studying should not be forced into participating. Additionally,
the people you are studying must give consent to participate, fully understanding any risks that your study puts them under. But, what do you do
if this affects the validity of your study?

OK, I can tell you're waiting for another example. Well, perhaps you've heard about this study. (I swear this was an actual study, but I am
recalling it from memory. I do not have a reference for it, but I couldn't resist using it as an example here.) A study was done on how physical proximity affects people's level of comfort. In other words, how much personal space do you need between you and another person without feeling uncomfortable? Well, the investigators studied the concept by hiding in the stalls of men's bathrooms and recording the amount of time it took for men to urinate, depending upon how near or far another man stood at the urinals. The amount of time was used as the indicator of level of comfort. Well, if you ask me, this is not only an invasion of privacy, but by no means did the researchers get voluntary participation or
informed consent. What can I say, all for the sake of research?

Once you have established construct validity, it's time to deal with external validity. How generalizable are your study findings? Can the
conclusions of your study be generalized to a larger population of people? Are the results representative of only the people who participated in
the evaluation or is the information you collected applicable to a larger part of society?

"Formally speaking the most representative samples will be those that are randomly chosen from the population, and it is possible for these
randomly selected units to be randomly assigned to various experimental groups" (Cook and Campbell, 75). But this is not necessarily feasible for all studies; while this method "can be followed for some issues where it is important to generalize to particular target populations of persons, it is less clear whether it is often feasible to generalize to target settings, except where these are highly restricted" (75). Perhaps one of the best ways to address the issue of ethical dilemmas in relation to external validity is by commenting on the debate between Regression Discontinuity (RD) designs and Randomized Clinical Trials (RCT).

RD is a research design in which people are placed into a group based on a cutoff score. For example, if you've developed a math tutoring
program, you would place all of the students into the program who have scored below a specified score on some type of math ability test. This
design, "intend[s] to balance ethical and scientific concerns when it is deemed unethical or infeasible to randomize all patients into study
treatments" (Trochim and Cappelleri 387). This design enables researchers to get their program to those who need its services the most. Trochim
suggests that it forces politicians to use "accountable methods for assigning program recipients on the basis of need or merit" (1990, 126).
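As a rough illustration of the mechanics of an RD design (this sketch is not taken from Trochim and Cappelleri; the data, the cutoff of 50, and the five-point program effect are all invented), the Python fragment below assigns a hypothetical tutoring program to everyone scoring below the cutoff on a pretest and then estimates the program effect by regressing the posttest on the cutoff-centered pretest plus a treatment indicator.

# Minimal regression-discontinuity sketch on simulated data (all numbers invented).
import numpy as np

rng = np.random.default_rng(1)
n, cutoff, true_effect = 200, 50.0, 5.0

pretest = rng.uniform(20, 80, n)
program = (pretest < cutoff).astype(float)            # assignment strictly by the cutoff score
posttest = 10 + 0.8 * pretest + true_effect * program + rng.normal(0, 5, n)

# Regress the posttest on the pretest (centered at the cutoff) and a program dummy;
# the coefficient on the dummy estimates the program effect at the cutoff.
X = np.column_stack([np.ones(n), pretest - cutoff, program])
coef, _, _, _ = np.linalg.lstsq(X, posttest, rcond=None)
print("Estimated program effect at the cutoff:", round(coef[2], 2))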

However, there are drawbacks to the design. "The lower power and efficiency of cutoff-based designs could increase rather than decrease the
complexity, duration, or expense of controlled clinical trials" (Trochim and Cappelleri, 1992, 392). This means involving more people, more
time, and more money. This presents some serious ethical dilemmas. Using a medical example, "if the drug is eventually found safe and
effective, more patients will have been denied optimal care in an RD design than in a randomized clinical trial. If the drug is found to have
unacceptable side effects for the level of effectiveness, more patients will have been exposed to the risk of side effects in RD design than in a
randomized clinical trial. Either way, more patients will be given the wrong therapy in an RD design than in a randomized clinical
trial"(Williams 148). The benefits and drawbacks to RD need to be examined carefully if you choose to use this research design in your
evaluation or study.

In order to successfully design your evaluation, you must closely examine how ethics and moral decisions complicate the theory of validity.

This web paper has been written to complement Bill Trochim's Knowledge Base. The format of my discussion follows the general outline of the
theory of validity as it is presented in his web site. By no means is this an exhaustive discussion of where ethical dilemmas can occur in program
evaluations. Rather, I prepared this paper with the intention of helping you prepare for some of the moral issues and decisions you will have to make as you stage your program evaluation and as you attempt to maintain validity throughout your research. At times it may be difficult, and you will have to compromise between your personal ethics and the research standards you want to adhere to.

As a social science researcher, you will have to translate your personal ethics into your professional ethics, and both codes of ethics should
reflect the fact that you are a part of a larger society. "The role of the evaluator as member of society at large reflects our presence in a democratic society where common citizenship carries with it certain expectations of duty, responsibility and practice" (Newman 100). Social science research
usually intends to contribute beneficial information to society. This concern for the well being of others should be present throughout all stages
of your work, "an underlying tenet of Western democracy is that every citizen has the responsibility to protect and defend the common
good" (Newman 102).

References

American Evaluation Association, 1994. Guiding principles for evaluators. New Directions for Evaluation, 66, 19-26.

Cook, T.D. and Campbell, D.T. 1979. Validity. Chapter 2 of Quasi-Experimentation: Design and Analysis Issues for Field Settings. Jossey-
Bass, pps. 1-7.

House, E.R. 1994. Principled Evaluation: A critique of the AEA guiding principles. New Directions for Evaluation, 66, 27-34.

Luft, H. The applicability of the regression discontinuity design in health evaluation. In L. Sechrest, E. Perrin, and J. Bunker (Eds.), 1990.
Research Methodology: Strengthening Causal Interpretations of Nonexperimental Data, Washington, DC: U.S. Dept. of HHS, DHHS, Number
(PHS) 90-3454, pps. 141-143.

Newman, D.L. 1994 The future of ethics in evaluation: developing the dialogue. New Directions for Evaluation, 66, 55-60.

Rossi, P.H. 1994. Doing good and getting it right. New Directions for Evaluation, 66, 55-60.

Trochim, W. and Cappelleri, J. 1992. Cutoff assignment strategies for enhancing randomized clinical trials. Controlled Clinical Trials, 13, 190-
212.

Trochim, W.M. 1991. Developing an evaluation culture for international agricultural research. In D.R. Lee, S. Kearl, and N. Uphoff (Eds.).
Assessing the Impact of International Agricultural Research for Sustainable Development: Proceedings from a Symposium at Cornell
University, Ithaca, NY, June 16-19, the Cornell Institute for Food, Agriculture and Development, Ithaca, NY.

Trochim, W.M. The regression-discontinuity design. In L. Sechrest, E. Perrin, and J. Bunker (Eds.), 1990. Research Methodology:
Strengthening Causal Interpretations of Nonexperimental Data, Washington, DC: U.S. Dept. of HHS, DHHS, Number (PHS) 90-3454, pps. 119-
139.

Williams, S. Regression discontinuity design in health evaluation. In L. Sechrest, E. Perrin, and J. Bunker (Eds.), 1990. Research Methodology:
Strengthening Causal Interpretations of Nonexperimental Data, Washington, DC: U.S. Dept. of HHS, DHHS, Number (PHS) 90-3454, pps. 145-
149.

This page was written and designed by Elizabeth K. LaPolt for HSS 691-Program Evaluation and Research Design, a
graduate course at Cornell University.

Please e-mail any comments or questions to ekl5@cornell.edu.

Copyright © 1997, Elizabeth K. LaPolt, All Rights Reserved


IDENTIFYING AND FOCUSING RESEARCH QUESTIONS:
FIRST STEP IN THE DESIGN PROCESS

Do you have a paper to write for one of your classes? Have you been commissioned to conduct an evaluation or some
research project for an agency? Do you want to help address issues that are of concern to a specific community or
institution? Oftentimes, research manuals assume that you--the reader, the researcher--already have identified the questions
of your study. However, this is not always the case. The purpose of this site is to help you in the process of generating and
focusing your research questions.

It is important that you define the object of your study clearly. The way in which a problem is defined has significant implications for the research design--the variables that are examined, the kinds and sources of information that are gathered, the analysis strategies that are used, and the conclusions that are drawn.

Oftentimes, when deadlines approach, people jump into the execution of vaguely defined studies. I encourage you to fight this impulse, and to take some time to think about your research and the implications of your choices. Front-end attention to design takes time and may generate some costs; however, the investment can save you time and costs later. You will avoid, for instance, duplicating data collection and wasting time on deciding what to do next, or on data gathering and analysis that are irrelevant to the questions. If you are working for some institution or individual, you will also increase the usefulness of the product of your work to the intended recipient. Remember: taking time to design your research carefully will pay off in the end. Not only will you avoid using time and resources ineffectively, but, what is more important, the quality of your work will be enhanced.

The first step in the research design process is that of defining the questions for study. Once the questions for study have
been posed, you will be able to develop the methodology and formulate the data collection and analysis plans for answering
them. You probably agree on the relevance of asking the correct question or questions, but are wondering what exactly is
the "right question". This is not necessarily an easy task, yet it is not impossible either. Following are some tips that will
help you clarify the issues, decide what questions are feasible, and ultimately formulate your research questions and assess
your research design.

According to Lee Cronbach (Designing Evaluations of Educational and Social Programs. San Francisco: Jossey-Bass,
1982, pp.210-244), formulating the right question(s) has two phases, which we will refer to as Question Generation and
Question Focusing.
ASSESSING THE DESIGN

Once you have selected your design, and before you start implementing your study, it is important that you take some time
to think about your research and the implications of your choices. To help you in this task, following are a list of questions
developed and used by the United States General Accounting Office (GAO). GAO's Program Evaluation and Methodology
Division uses a review system in order to judge the appropriateness of moving into implementation. This review system
consists basically of the following five key questions:

1. How appropriate is the design for answering the questions posed for the study?

2. How adequate is the design for answering the questions posed for the study?

3. How feasible is the execution of the design within the required time and proposed resources?

4. How appropriate is the design with regard to the user's needs for information, conclusiveness, and timeliness?

5. How adequate were the negotiations with the user regarding the relationship between the information need and
the study design?

For further information on this review system, see USGAO, Designing Evaluations PEMD-10.1.4, March 1991 or click
here.
QUESTION GENERATION

According to Cronbach (Designing Evaluations of Educational and Social Programs. San Francisco: Jossey-Bass, 1982, pp.210-244), in the first phase, researcher(s) [and
sponsor(s)] should consider a wide range of potential questions (and methods by which to answer them), even if they do not seem very plausible.

Let's have a look at specific ways in which you can identify research questions. We can distinguish between questions that you develop on your own, and questions that you
generate with others. You may also want to read other perspectives on problem formulation.

Develop, if possible, descriptive, normative, and impact (cause-and-effect) questions, and think about the implications of each one of them for the development of a design.
This phase helps identify and agree on which questions have the highest priority, as well as on which ones can be easily answered, which ones are more difficult, expensive and/or time-consuming, and which ones cannot be answered at all, and why.
GENERATING QUESTIONS ON YOUR OWN

Whether it is something that you read in a newspaper or a scientific publication, or a conversation that you have with a friend, it will raise questions in your mind that you will not be able to answer or that will make you want to learn more about the subject. Oftentimes, people make you think, challenge your assumptions, and trigger your interest in knowing more about specific topics, populations, etc. Sometimes you may want to find answers to controversial social questions; at other times, you may want to test someone else's theories or prove certain postulates. If you think problems through critically, you will constantly find new subjects for investigation.

PERSONAL EXPERIENCE & INTERESTS

Your personal experience and interests are a rich source of inspiration and a powerful thrust to pursue research. When you
can choose the subject to explore, you should ask yourself "what subject do I want to explore?" and also "what subject do I
want to be an explorer of?" Think of questions that are of direct personal interest to you. In Jean Johnson words, "searching
seems to be a result of our natural curiosity, our desire to find answers to problems, our urge to question what other s have
told us, or perhaps just our need to know more about the unknown…. The result will be not only an additional paper but also
an addition to your personal store of knowledge--each helps to define you as a person." (The Bedford Guide to the Research
Process, 2nd ed. Boston: Bedford Books of St. Martin's Press, 1992, p.3 & 7).

Before deciding on a specific topic, explore one or more subjects, or general areas of study. These subjects may interest you
because of what you have read, heard, or seen, or because of what has happened to you or to someone you know. Previous
coursework, your hobbies, and previous experiences can be sources of research topics. Even when you are somewhat limited
in your subject choice (for example, if your professor requires that you write about Welfare Reform), you can approach it
from many angles, and find a part of that subject that relates to something you are interested in (in our example, you may
want to focus on the impact of new legislation on unemployment or children's health status).

Remember that it is good to start with a long list of subjects that interest you. Then reread it carefully and pick two that
appeal to you most, and try to further focus your topic by considering the information that is available (you may not want to
choose a topic if all the information is contained in journals written in a language that you don't understand), as well as other
constraints that you may face.
GENERATING QUESTIONS WITH OTHERS

STAKEHOLDERS

If you are doing research for somebody else, whether you are working on your own or you are part of an action research
project, it is essential that you attend to the needs and concerns of clients and coresearchers. Do they need to know the
impact of a nation-wide income generation project? Or is their concern limited to factors that contribute to the sustainability
of the project in one particular city?

It is important that you focus your research on stakeholders' questions, issues, and intended uses. When you are
commissioned to do a research or evaluation task, it is critical that you find out how stakeholders define your work, the kinds of
things that they are interested in learning, and what they expect from you. Oftentimes, the questions you will focus on are
the questions they want answered, the ones they need information to answer, and your job will be to provide that information.

In order to help a group generate a list of questions, Michael Patton suggests an interesting exercise. Get together a group of
key stakeholders, and ask them to complete ten times the blank in the sentence "I would like to know ______________
about (name of program or object of study)." Then, divide them into groups of three or four people each and ask them to
combine their lists together into a single list of ten things each group wants to know. Finally, get back together and generate
a single list of ten basic things the group does not have the information on but they would like to have information on,
information that could make a difference in what they are doing. Further sessions should take place afterwards in order to
refine and prioritize these questions (Utilization-Focused Evaluation, 2nd ed. Newbury Park: SAGE, 1986, pp.75-77).
QUESTION FOCUSING

The second phase in the formulation of the right question is to match possible questions with available resources. In this
phase, attention focuses on whether the questions are answerable, given money, staff, and time (Utilization-Focused Evaluation, 2nd ed. Newbury Park: SAGE, 1986, p.81).

Generally, it is more difficult to establish focus and priorities than to generate potential questions, because it involves trade-
offs between breadth and depth, between truth (internal and external validity) and utility, between quantitative and
qualitative methods, as well as within each of these methods. "The extent to which a research question is broad or narrow,"
says Patton, "depends on the resources available, the time available, and the needs of decision makers. In brief, these are
choices not between good and bad but among alternatives, all of which have merit" (p.234).
Q & A: What is Concept Mapping?

Shuzo Katsumoto

This page is designed to provide an audience with no prior research knowledge with basic ideas of Concept Mapping. For the
purposes of simplicity and clarity, the text is presented in Q & A form. This tutorial can be followed from beginning to end, or you
can use the table of contents to jump to the specific questions you want answered.

Table of Contents

● What is concept mapping?

● What sort of ideas are to be dealt with in concept mapping?

● Why is concept mapping useful?

● When is concept mapping useful?

● Are there several different types of concept mapping?

● What are the features of the concept mapping developed by Prof. Trochim?

● What steps are involved in concept mapping?

❍ Step 1: Preparation

■ What is a facilitator?

■ How can a facilitator select participants?

■ Is there any limit on the number of participants?

■ What should be the focus of a brainstorming?


■ How do we decide by what criteria we will rate brainstormed items?

❍ Step 2: Generation of Statements

■ What is the facilitator's role when statements are being generated?

■ Is there a limit to the number of statements that will be generated?

❍ Step 3: Structuring of Statements

■ Is there a unique correct answer in the arrangement of the statements in the sorting procedures?

■ Is there any limit on the number of piles in the sorting procedures?

❍ Step 4: Representation of Statements

■ Is there any good reference for studying multidimensional scaling?

■ Is there any desirable or appropriate number of clusters?

❍ Step 5: Interpretation of Maps

■ Which of the four maps is the concept map?

■ What is the main point in interpreting the concept map?

❍ Step 6: Utilization of Maps

■ What is important in utilizing the concept map?

● What computer programs are able to accomplish the process of concept mapping?

● Is there any other effect of concept mapping than the conceptualization?

● Further Information

What is concept mapping?

Concept mapping is a general method with which you can clarify and describe people's ideas about some topic in a
graphical form. By mapping out concepts in pictorial form, you can get a better understanding of the relationships among
them.

What sort of ideas are to be dealt with in concept mapping?

For example, in a planning process, major goals, needs, available resources and capabilities, externalities, and any other
dimensions constituting a plan can be represented in a concept map. In evaluating social research projects, such dimensions
as programs, samples, settings, measures and outcomes would be included in the process of concept mapping.

Why is concept mapping useful?

In a planning or evaluation project, it is often very difficult for people involved in the project to make the situation clear and
recognizable and to have a common idea of the project among them. This is because things are usually intangible and complex
with lots of different environmental and human factors involved in a project. Concept mapping encourages the participant
group to stay on task, and the conceptual framework is expressed in the language of the participants rather than in that of the
planner or evaluator. With its pictorial representation and its participant-oriented features concept mapping can be a
powerful method to organize complex problems.

When is concept mapping useful?

As stated above, concept mapping is a general method for conceptualization, and it is utilized for many projects. Project
formulation, strategic planning, product development, market analysis, decision making, and measurement development are
probably the main ones.

Are there several different types of concept mapping?

Yes. Experts in planning and evaluation such as Trochim, W., Novak, J.D, Rico, G.L., Chen, H.T., and Rossi, P.H. suggest
concept mapping as a useful method for developing a conceptual framework. All of these approaches to concept mapping are similar in that they result in a graphic representation, but their approaches or emphases are different. Here, I will tell you about the one developed by Professor William Trochim at Cornell University.

What are the features of the concept mapping developed by Prof. Trochim?

There are several significant features: First, it stresses a group process. So it is especially useful when some stakeholder
groups get involved in certain work. Second, it takes a structured facilitated approach. It basically consists of six steps, and a
trained facilitator is expected to manage the process. Third, it uses multivariate statistical methods. These effectively
analyze the input from participants and depict the results in pictorial form. And fourth, it takes advantage of specialized
computer programs. These efficiently handle the data and accomplish the procedures.

What steps are involved in concept mapping?


OK. Here is an overview of the concept mapping process. Let me explain the steps one by one:

Step 1: Preparation

Prior to commencement of the actual group process, two major things have to be done. First, the facilitator works with
people involved to decide on who will participate in the concept mapping process. It is good to encompass a broad range of
relevant people in order to ensure that a wide variety of viewpoints is taken into consideration.

What is a facilitator?

A facilitator manages and guides the concept mapping process. The facilitator could be an outside person or an
internal member of the group responsible for the project. Because a group process is stressed, it is the entire group
that determines the content, interpretation, and utilization of the concept map.

How can a facilitator select participants?

Usually we use some random sampling method to select participants from a broadly defined population, since this increases the generalizability of the resulting concept map.

Is there any limit on the number of participants?

No. There is no strict limit on the number of people. However, a relatively small group of between 10 and 20 people is typically workable.

The second thing that should be done by the facilitator in the preparation step is to work with the participants to develop the focus for the project. There are two major aspects to be focused: the focus for brainstorming and the focus for rating. Both are worded as statements, and participants are expected to agree to them in advance.

What should be the focus of a brainstorming?

For example, participants may focus on the goals of organization, the program or treatment, or the outcomes they
expect to see as a result of the program or treatment.

How do we decide by what criteria we will rate brainstormed items?

The stakeholder group needs to think what kind of information will be most important or useful for them or their
project. For instance, the facilitator may ask the participants to rate how important each brainstormed item is, or
how much each outcome is likely to be affected by the program.

Step 2: Generation of Statements

After the participants and focus statements have been decided, the participants develop a large set of statements describing
the focus from a number of different aspects. Although an array of methods such as brainstorming, brainwriting, and
nominal group techniques are available to generate statements, brainstorming is typically used. In the brainstorming session,
participants are encouraged to express what they want to say as well as to ask for clarification of any unfamiliar terms so
that all participants understand well the meanings of the statements.

What is the facilitator's role when statements are being generated?

The facilitator usually records the statements by writing them on a blackboard, on sheets of newsprint, or directly
entering them into a computer so that all participants can see the set of statements as it is generated. Also, the
facilitator is expected to take care of participants who are reluctant to state publicly their ideas for various reasons.
The facilitator in this case may give them anonymous comment sheets.

Is there a limit to the number of statements that will be generated?

No. There is no theoretical limit. But in practice, a hundred or fewer is desirable. If more than a hundred statements are generated, it is advisable to reduce the set by examining it for redundancies, selecting representative statements, or taking other steps.

Step 3: Structuring of Statements

Once the statements of the issue have been generated, the participants organize the statements to see how they are related to
each other. They do two things here: sorting and rating of the statements. First, each participant is encouraged to sort the
statements into piles of similar ones according to their own judgments, usually using a deck of cards that has one statement
on each card. Then, the results of all participants will be combined and examined carefully. Second, the participants rate
each of the statements on some dimension, whose focus should be decided in the preparation step. For instance, a rating may
be accomplished by using a 1-to-5 scale to show their priorities, where 1 indicates the lowest priority and 5 stands for the
highest priority.

Is there a unique correct answer in the arrangement of the statements in the sorting procedures?

No. The participants are instructed to pile the cards of the statements in any way that they think makes sense. In
other words, any arrangement the participants make is regarded and treated as a sensible one. But note that there are some restrictions on this procedure. Each statement can only be placed in one pile. Also, the statements cannot all be put into a single pile. Nor can every statement be placed in its own pile.

Is there any limit on the number of piles in the sorting procedures?

No. The participants can have as few or as many piles as they want. Each pile is labeled with some short
descriptive name.

Step 4: Representation of Statements

Once the sorting and rating of the statements have been done, the stakeholder group is ready to make a concept map. Two
major statistical analyses are used here: multidimensional scaling and cluster analysis. By the first analysis,
multidimensional scaling, each statement is represented as a single separate point on the map, and statements which are
piled together by more participants are put closer to each other on the map. This means that distance between points
(statements) on the map stands for the degree of interrelationships among the statements, namely, the closer the distance is,
the more interrelated to each other the statements are. By the second analysis, cluster analysis, the outcomes of the
multidimensional scaling (the points on the map) are partitioned into groups or clusters. Statements which seem to be
strongly interrelated to each other, or to reflect similar ideas and concepts, are grouped into a cluster. Therefore, we can say
a cluster represents some conceptual domain. (By the way, the picture shown at the top of this Web page shows what the clusters look like.)
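As a rough computational sketch of Steps 3 and 4 (this is not the Concept System software, just an illustration with generic Python libraries, and the six statements and three participants' sorts are made up), you could pool the individual card sorts into a matrix of how often each pair of statements was piled together, convert those counts to dissimilarities, place the statements with multidimensional scaling, and then group the resulting points with a hierarchical cluster analysis:

# Illustrative sketch of Steps 3-4 with made-up sort data.
import numpy as np
from sklearn.manifold import MDS
from scipy.cluster.hierarchy import linkage, fcluster

# Three participants sorted six statements (numbered 0-5) into piles.
sorts = [
    [[0, 1, 2], [3, 4, 5]],
    [[0, 1], [2, 3], [4, 5]],
    [[0, 1, 2, 3], [4, 5]],
]
n_statements = 6

# Step 3 output: count how often each pair of statements was piled together.
together = np.zeros((n_statements, n_statements))
for piles in sorts:
    for pile in piles:
        for i in pile:
            for j in pile:
                together[i, j] += 1

# Pairs sorted together often are "close"; convert the counts to dissimilarities.
dissimilarity = len(sorts) - together
np.fill_diagonal(dissimilarity, 0)

# Step 4a: multidimensional scaling places each statement as a point on a 2-D map.
points = MDS(n_components=2, dissimilarity="precomputed",
             random_state=0).fit_transform(dissimilarity)

# Step 4b: hierarchical cluster analysis partitions the points into (here) two clusters.
clusters = fcluster(linkage(points, method="ward"), t=2, criterion="maxclust")
print("Cluster assigned to each statement:", clusters)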
Is there any good reference for studying multidimensional scaling?

Here are two good works on multidimensional scaling:

Kruskal, J.B. and Wish, M. (1978). Multidimensional Scaling. Beverly Hills, CA: Sage Publication.

Davidson, M.T. (1983). Multidimensional Scaling. New York, NY: John Wiley and Sons.

Is there any desirable or appropriate number of clusters?

It is a hard task to decide how many clusters the statements should be partitioned into. There are no strict criteria, but assuming there are a hundred or fewer statements, the cluster solution usually falls between 3 and 20 clusters. For further information, see the references below:

Anderberg, M.R. (1973). Cluster Analysis for Application. New York, NY: Academic Press.

Everitt, B. (1980). Cluster Analysis (2nd Edition). New York, NY: Halsted Press, a Division of John Wiley and Sons.

Step 5: Interpretation of Maps

In the step of interpretation of the conceptualization established in map form, several materials are to be used. There are two
lists and four maps:

1) The Statement List: The initial list of statements generated by brainstorming, each of which is identified with a number.

2) The Cluster List: The list of clusters grouped by the cluster analysis.

3) The Point Map: The map on which the statements are shown as numbered points. They are placed by multidimensional
scaling.

4) The Cluster Map: The map showing the clusters partitioned by the cluster analysis.

5) The Point Rating Map: The numbered point map with average statement ratings overlaid.

6) The Cluster Rating Map: The cluster map with average cluster ratings overlaid.

Using these materials, the facilitator works with the participants to help them understand the various maps.
Which of the four maps is the concept map?

Actually, we can say all of them are concept maps! Each of them emphasizes a different aspect of the conceptual information, and they are all related to one another. If someone says "concept map" without specifying the type of map, then usually the cluster map (4) is the one under discussion, because it is the most directly interpretable of the four types.

What is the main point in interpreting the concept map?

The main point of interpreting the map is that all participants come to understand the interrelationships among the clustered statements. The aim is that everyone in the stakeholder group has a clear picture of the project through the visual device, the concept map. Furthermore, it is essential that everybody shares the sense that the concept map is their own product, a result of their collaboration -- it is an achievement based on statements that they generated in their own words and that they grouped, and the labels on the map were chosen by them all.

Step 6: Utilization of Maps

The final step is the utilization of maps. The stakeholder group uses the concept map to address their original focus. The
concept map is useful in both planning and evaluation in a wide variety of ways: In planning, for instance, the map may
show you action plans, planning group structure, needs assessment, or program development. In evaluation, it may display
the basis for developing measures, sampling, or outcome assessment.

What is important in utilizing the concept map?

Note that the uses of the concept map are all up to the people who use it. That is, it is indispensable that the people
who use the concept map have strong creativity and motivation. Concept mapping is a method to clarify and
describe the complex situation and people's ideas. Concept mapping is a tool to think effectively. And, it is people
who use the method or tool.

The flow chart below summarizes the six steps of concept mapping:
What computer programs are able to accomplish the process of concept mapping?

There are essentially two options: a combination of standard general-purpose word processing and statistics packages, or The Concept System, which was designed by Prof. Trochim specifically for concept mapping. With the former option, your computer programs must meet several requirements, and you may encounter some inconvenience. If you are interested in the latter option, you can ask for further information by contacting him directly. You can get a more detailed sense here.

Is there any other effect of concept mapping than the conceptualization?


Yes. Finally, I should not forget to tell you about one major effect of concept mapping that I have not mentioned so far. It appears that the process of concept mapping increases group cohesiveness and morale. So we can say that concept mapping benefits the group itself, in addition to the conceptualization I have explained above.

Further Information

Trochim's Knowledge Base

You can find more detailed information on concept mapping in Professor Trochim's Knowledge Base. For a very basic introduction
to concept mapping, you can see Concept Mapping. If you need a more detailed description, An Introduction to Concept Mapping for Planning and Evaluation will provide you with good material. Concept Mapping: Soft Science or Hard Art? illustrates a number of examples of final concept maps from a variety of subject areas such as University Health Services, Student Life, Employment, Senior Citizens, and Music and Arts in Daycare. In addition, Using Concept Mapping to Develop a Conceptual Framework of Staff's Views of a Supported Employment Program for Persons with Severe Mental Illness gives you an in-depth sense of the actual use of concept
mapping.

Trochim Home Page


Research Methods Tutorials

Please send comments/suggestions to sk106@cornell.edu

created in May, 1997


Hello! Welcome to the Epidemiology home page. The purpose of this website is to give you a very simple
introduction to epidemiology, particularly to the basic types of epidemiological study designs.

Epidemiology:
● Measures the frequency of diseases and related events, such as mortality, morbidity, hospital stays, and health behaviors
● Looks for causes and risk factors of disease to propose effective strategies for disease prevention and control

Epidemiological studies are either observational or experimental. Within observational studies are the subsets of case control and cohort studies; within the cohort subset there can be retrospective or prospective cohort studies. An example of an experimental epidemiological study is a randomized controlled trial. Please see the pictorial representation of the different types of epidemiological studies below.

We will now explore the different types of epidemiological studies in depth.


Observational
Case Control
Cohort: Retrospective, Prospective
Experimental
Randomized Controlled Trial
Other Links

Observational Studies
In observational studies, the epidemiologist does not assign a treatment but rather observes. For example, if the
epidemiologist wanted to see if smoking is related to lung cancer, she would not be able to ethically assign people to smoke
and not smoke, but rather would observe how often each group (smokers vs. non-smokers) develops cancer.

Case Control Studies


● Looks for causes of disease
● Looks at those with disease (case) vs. those without disease (control)
● The case group and control group should share the same potential causes/factors that are not to be studied (e.g. age, socioeconomic status, sex, etc.), but should not share factors that the epidemiologist suspects are causal factors
● The sample should be large enough so the findings can be extrapolated to a larger population
● Tests to see whether a given causal factor is significant through the odds ratio; the odds ratio is best explained through an example

Cohort Studies
● Looks for causes of disease
● Can have a control group or not
● There are two types: retrospective and prospective

Experimental Studies
In experimental studies, the epidemiologist assigns subjects treatments. This is in contrast to the observational study,
where the researcher observes subjects and, in a sense, 'waits' for the 'treatment' or results to happen. One type of
experimental study is the Randomized Control Trial.

Randomized Control Trial


● Subjects are randomly given the treatment or control
● Since subjects are randomly assigned to either the treatment or control group, the two groups are likely to be comparable at the start because of probabilistic equivalence (see the small simulation sketch after this list)
● Example: To see whether St. John's Wort (a medicinal herb) decreases the incidences of depression, subjects
were randomly given either St. John's Wort or a placebo. Depending on what treatment they received (herb or
placebo), they were in either group 1 or group 2.
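The idea of probabilistic equivalence can be illustrated with a small, made-up simulation (a sketch, not part of the original page; the 200 subjects and their baseline ages are invented): when subjects are assigned to groups at random, a baseline characteristic such as age ends up with nearly the same average in both groups, differing only by chance.

# Illustrative sketch: random assignment tends to balance baseline characteristics.
import numpy as np

rng = np.random.default_rng(2)
ages = rng.normal(40, 12, 200)           # made-up baseline ages for 200 subjects

treated = rng.permutation(200) < 100     # randomly assign exactly half to treatment
print("Mean baseline age, treatment group:", round(ages[treated].mean(), 1))
print("Mean baseline age, control group:  ", round(ages[~treated].mean(), 1))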

Other Links
Epidemiology General Lectures (under the topic 'Supercourse lectures', please select epidemiology and biostatistics)
Guidelines for Good Epidemiology Practices
Four Types of Epidemiological Studies
Cohort Studies: Retrospective Studies

In retrospective studies, subjects are divided into groups and the individual groups are categorized by certain
characteristics. The outcomes are then measured in the present time.

Example: An epidemiologist wants to know whether people of the Christian faith have lower incidences of cancer than people of the Muslim faith, so she conducts a retrospective cohort study. From the sample, she takes two groups, each characterized by having either the Christian or the Muslim faith. She then measures the incidence of cancer in each group.
Cohort Studies: Prospective Studies

In prospective studies, group/s (cohort/s) of healthy people are followed through time and the epidemiologist
observes whether lifestyle factors are related to the occurrence of disease. A prospective cohort study starts in the present time
and waits for outcomes in the future.

Example: An epidemiologist wants to know whether people of the Christian faith have lower incidences of cancer
than people of the Muslim faith, so she conducts a prospective cohort study. After obtaining two groups, Christians and Muslims, she takes initial cancer incidence measures, then does a follow-up measure after some time has passed.
Odds Ratio Example

An epidemiologist decides to conduct a case control study by comparing 50 people with lung cancer (cases) to 50 people without lung cancer (controls). She investigates the potential causal factor of smoking and finds that 45 of the 50 cases were smokers (5 were not), while only 3 of the 50 controls were smokers (47 were not). Laid out as a 2 x 2 table:

                Smokers   Non-smokers
Cases              45           5
Controls            3          47

The odds ratio = (exposed cases x unexposed controls) / (unexposed cases x exposed controls)

The odds ratio for the example above = (45 x 47) / (5 x 3) = 141
Conclusion: Since the odds ratio is far greater than 1, the epidemiologist concludes that smoking is strongly associated with lung cancer.
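The same arithmetic can be checked in a few lines of Python (a sketch added here, not part of the original page); scipy's Fisher exact test reports the identical sample odds ratio along with a p-value for the table.

# Odds ratio for the made-up 2 x 2 table in the example above.
from scipy.stats import fisher_exact

cases_smokers, cases_nonsmokers = 45, 5
controls_smokers, controls_nonsmokers = 3, 47

odds_ratio = (cases_smokers * controls_nonsmokers) / (cases_nonsmokers * controls_smokers)
print("Odds ratio:", odds_ratio)         # (45 * 47) / (5 * 3) = 141.0

# scipy's Fisher exact test reports the same sample odds ratio plus a p-value.
statistic, p_value = fisher_exact([[cases_smokers, cases_nonsmokers],
                                   [controls_smokers, controls_nonsmokers]])
print("scipy odds ratio:", statistic, "p-value:", p_value)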
Action Research
What Is Action Research?
Action Research (AR) has its academic roots in sociology, social psychology, psychology,
organizational studies, and education. Action research can be described as a family of research
methodologies which pursue action (or change) and research (or understanding) at the same time.
In most of its forms it does this by using a cyclic or spiral process which alternates between
action and critical reflection. In the later cycles, it alternates between data collection and
interpretation in the light of the understanding developed in the earlier cycles. It is thus an
emergent process, which takes shape as understanding increases; it is an iterative process, which
converges towards a better understanding of what happens. In most of its forms it is also
participative and qualitative.

Action Research is a methodology, which is intended to have both action outcomes and research
outcomes. The action is primary. In distinction, there are some forms of action research where
research is the main emphasis and the action is almost a fringe benefit. The responsiveness of
action research allows it to be used to develop hypotheses from the data, "on the run" as it were.
It can therefore also be used as a research tool for investigative or pilot research, and generally
for diagnosis or evaluation. (B, Dick, Action Research Resources)

Action Research:
• Is educative
• Deals with individuals as members of social groups
• Is problem-focused, context-specific and future-orientated
• Involves a change intervention
• Involves a cyclic process in which research, action and evaluation are interlinked
• Aims at improvement and involvement
• Is founded on a research relationship in which those involved are participants in the
change process
(Hart E and Bond M 1995)

Definition of Action Research:

Good action research is developmental; namely, it is a form of reflective inquiry which enables practitioners to better realize such qualities in their practice. The tests for good action research are very pragmatic ones. Does it improve the professional quality of the transactions between practitioners and clients/colleagues?
Mission Statement, CARN

Action research can be a practical way to test new ideas and participate in
professional development. It allows individuals or teams to follow their interests,
their needs, and their problems, while expanding their repertoire of available skills.
It puts a theoretical concept, teaching practice, or totally new hypothesis to the test in
a classroom or school. Action Research is done by and for the people taking the action
and relates to the action they are taking. Its purpose can be improving the practice of
an individual researcher, or it can be collaborative and focus on school goals.
Richard Sagor
Characteristics of Action Research
Action Research tends to be....
• Cyclical: similar steps tend to recur, in a similar sequence
• Participative: the stakeholders and informants are involved as partners, or at least active
participants, in the research process
• Qualitative: it deals more with language than with numbers
• Reflective: critical reflection upon the process and the outcomes is an important part of each cycle.
To achieve action, action research is responsive. It has to be able to respond to the emerging
needs of the situation. It must be flexible in a way that some research methods cannot be. To
increase rigor, it is usually cyclic. The early cycles are used to help decide how to conduct the
later cycles. In the later cycles, the interpretations developed in the early cycles can be tested and
challenged and refined. In most instances the use of qualitative information increases
responsiveness. It is possible to work in natural language, which is easier for informants.

Bargal, et al, have identified six features of Action Research:


1. A cyclic process of planning, action, and evaluation;
2. A continuous feedback of the research results to all parties involved, including clients;
3. Co-operation between researchers, practitioners, and clients from the start and throughout
the entire process;
4. Application of the principles that govern social life and group decision making;
5. Taking into account differences in value systems and power structures of all the parties
involved in the research;
6. Using action research concurrently to solve a problem and to generate new knowledge.

(Bargal, D., Gold, M., and Lewin, M. (1992). Introduction: The Heritage of Kurt Lewin. Journal
of Social Issues, 48, 2.)

One crucial step in each cycle consists of critical reflection. The researcher and others involved
first recollect and then critique what has already happened. The increased understanding that
emerges from the critical reflection is then put to good use in designing the later steps. The cycle
best known in Australia is probably that of Stephen Kemmis and his colleagues at Deakin
University. The steps are:

plan → act → observe → reflect → plan


The reflection leads on to the next stage of planning. The "planning" isn't a separate and prior
step; it is embedded in the action and reflection. Short, multiple cycles allow greater rigor to be
achieved. As change is intended to result, effective action research depends upon the agreement
and commitment of those affected by it. This is usually generated by involving them directly in
the research process. In many instances, researchers try to involve them as equal partners.

Most writers on the topic state or assume that action research is cyclic, or at least spiral in
structure. To put this differently, certain steps tend to recur, in more-or-less similar order, at
different phases of an action research study. At the same time, or so the action researcher hopes,
progress is made towards appropriate action and research outcomes.

Using a cyclic process in most circumstances enhances responsiveness. It makes sense to design
the later stages of an action research activity in such a way that you capitalize on the
understanding developed in the early stages.

A cyclic process is important. It gives more chances to learn from experience provided that there
is real reflection on the process and on the outcomes, intended and unintended. Again, a cyclic
process allows this to happen more easily. If each step is preceded by planning and followed by
review, learning by researcher and stakeholder is greater. The quality of evidence can also be
increased by the use of multiple sources of evidence within all or most cycles. Differences
between data sources, used critically, can lead the researchers and the participants towards a
deeper and more accurate understanding.

Most action research is qualitative. Sometimes a mix of qualitative and quantitative methods is
used. All else being equal, numbers do offer advantages. In field settings, though, one often has
to make other sacrifices to be able to use them. Most importantly, sometimes numbers are not
easily applied to some features of a study. In addition, developing a suitable quantitative measure
is often difficult and time-consuming. It may be more time-efficient to use qualitative data. As I
mentioned before, it is also easier to be flexible and responsive to the situation if you are using
qualitative methods.
When to Use Action Research?
In many field settings it is not possible to use more traditional research methods because they
can't readily be adjusted to the demands of the situation. If you do alter them in midstream you
may have to abandon the data collected up to that point. (This is because you have probably
altered the odds under the null hypothesis.)
I think that the major justification for action research methods is that they can be responsive to
the situation in a way that many other research methods cannot, at least in the short term. On
these grounds I think action research will usually, though perhaps not always, be cyclic in nature.
In the interests of rigor, each cycle will include critical reflection. In most instances it will also be
qualitative and participative to some extent.
When you wish to find out about a few variables and the causal relationships between them, experimental or quasi-experimental research will serve you much better than action research. Alternatively, you may wish to explore some organization or group or culture in depth.
For this, you may do better to use ethnographic methods.
Action research methods are most likely to be appropriate when you do not know where to start,
and do not have a lot of time to invest in the study. It is useful for exploratory research, where
you do not yet have a very precise research question. But it is most valuable when you have to be
responsive to the changing demands of a situation, as when you wish to build a research
component into some change program or the like. Good research, it can be argued, is research that uses a methodology which fits the situation and the goals you are pursuing.
AR is most commonly used for research in Education and Community Development.

Action Research in Education


It puts a theoretical concept, teaching practice, or totally new hypothesis to the test in a classroom
or school. Action research is done by and for the people taking the action and relates to the action
they are taking. Its purpose can be to improve the practice of an individual researcher, or it can be collaborative and focus on school goals. Teachers raise questions about classroom
practice, carefully document procedures and gather data on student performance, then reflect on
that data and practical experience to determine what to do next. Action research cycles often start
with a question (e.g., Can I accelerate student learning by using cooperative learning groups?).
The steps that often follow problem formulation are theory development, design intervention,
data collection, and data analysis. The process is not a lock-step regime. More often than not, data
collection and analysis lead to new questions or further data collection for the same problem/
hypothesis much like peeling away the layers of an onion.
Perhaps the most important part of the process is the reflection on collected data. Having asked a
question that begs an answer, and designed a plan for collecting that information, teachers reflect
on their experiences and ask questions like:
• What were the anticipated effects?
• Were there some unanticipated effects?
• What have we learned from this?
• What might we have to relearn or unlearn in our work?
• What are our next steps? Should we stop doing this because it doesn't work?
• Continue doing this because it is getting results we find desirable?
• Start doing something else that may be more likely to succeed?
Participatory Action Research (PAR)
It is also generally held that action research is participative, though writers differ on how
participative it is. So, the extent of participation may vary. In some instances there may be a
genuine partnership between the researcher and the researched group. In such a case the
distinction between researcher and the stakeholders will disappear. On other occasions the
researcher may choose for whatever reason to maintain a separate role. Participation may be
limited to being involved as an informant. The degree of participation varies with every research
design.

PAR is by far the most popular form of Action Research. In using this research method it is assumed that the eventual
interpretation of information is richer if the involvement of the stakeholders is greater. Action outcomes can usually be achieved only
with some commitment from those most affected. Participation is one means to that commitment. There may well be other reasons,
too. For a detailed paper on PAR, click here.
Participatory Action Research in Rural Development is Useful For:
• Listening to rural people
• Involving beneficiaries in design, implementation and operation
• Providing the vehicle for continuous dialogue
• Earning their trust
• Delivering useful and useable applications
• Staying on
Problems
Arguments justifying AR or PAR are often persuasive. But, is AR a problem-free research
perspective? The day a research perspective becomes problem-free, it also becomes a dogma.
Therefore, we must always ask the sort of question we have just asked. To answer this question,
we may like to look at a number of PAR experiences, ponder over the dynamics of the process,
and contemplate about the contribution of 'research' in every instance. PAR literature talks about
liberating 'subjugated knowledges' (following Michel Foucault), but does not comment
methodologically on the 'what' and the 'how' of either the 'subjugation' or the 'liberation'. Can
'subjugation' and 'liberation' be recognized? Further, is there a notion of accumulation in this type
of research? What method allows local accumulation of knowledge, as PAR would expect to
effect? Besides, if PAR involves changes in the lives of people, then is not the researcher
accountable in some sense? To whom is the researcher accountable?
The role of researchers is yet another issue. Action Research seeks to give a new meaning to the
notion of research, in which, one may say, what enters the research process is subject to judgment
and negotiation. Further, what is done with what enters is also not pre-determined. And, perhaps,
what is done with the product of the process is also subject to the forces of individual and
collective will. In this, the researcher-role is potentially nebulous. It is quite likely that the person
playing the role is also one of the interested parties. In this situation, can the usual sanctity of the
research act be maintained? Besides, does the researcher have to be accountable to a wider
scientific forum? Role dilemmas are also considered by Warmington (1980, pp. 36ff) and
Rapoport (1970).
Resources and Recommended Reading
Action Research Resources maintained by Bob Dick, at the Southern Cross University, Australia.
Action Research International online Journal for Action Research

Cornell's very own Participatory Action Research Network, linked to other PAR resources and
archives, includes calendar of PAR activities near you and PAR bookshop. Click here

Bill Trochim's Social Research Methods Web-Base contains the Research Methods Knowledge Base and is a rich source of web tutorials on a variety of research topics.

Yoland Wadsworth, What is PAR?

Contributed papers on action research by a variety of people; this is a mirror site of the action
research reader at Ian's "Action Research on the Web", site hosted by the University of Sydney.

Introduction to Action Research: Democratizing the Research Process, by Davydd J. Greenwood and Morten Levin (paperback, 1998). It is an excellent book for anyone venturing out into the world of Action Research. To purchase or to know more, click here.

Participatory Rural Appraisal (PRA) Bibliography: compiled at the Institute of Development Studies, Brighton. It includes abstracts of 3000 unpublished articles by practitioners and researchers, as well as notes from field studies and critical reflections. The website is hosted by Britain's premier institute of development studies. Click here to view.

Website compiled and created by:


Ami Sengupta
Qualitative Research Methods
Introduction:

This website presents a tutorial on qualitative research methods. It is designed to help readers
with little or no knowledge on the subject. There are several types and classifications of
qualitative research methods, but here only five of them are discussed (Creswell, 1998).

Qualitative research may be generally defined as a study conducted in a natural setting where the researcher, as an instrument of data collection, gathers words or pictures, analyzes them inductively, focuses on the meaning of participants, and describes a process that is both expressive and persuasive in language.

Creswell (1998) defines qualitative study as:

“Qualitative research is an inquiry process of understanding based on distinct methodological traditions of inquiry that explore a social or human problem. The researcher builds a complex, holistic picture, analyzes words, reports detailed views of informants, and conducts the study in a natural setting.”

Qualitative research should not be viewed as an easy substitute for a “statistical” or quantitative study. It demands a commitment to extensive time in the field, engagement in the complex, time-consuming process of data analysis, writing of long passages, and participation in a form of social and human science research that does not have firm guidelines or specific procedures and is constantly evolving and changing. For reasons why one would conduct qualitative research, click here.

Comments? Send mail to: Lydia


Reasons For Conducting Qualitative Research

To engage in qualitative enquiry, there is a need to first determine whether a strong rationale
exists for choosing a qualitative approach. The following reasons could call for a qualitative
inquiry:

● Topics that need to be explored: This is a situation where variables cannot be easily
identified, theories are not available to explain behavior of participants or their population
of study;
● Need to present a detailed view of the topic: This is the case where the distant panoramic
view is not enough to present answers to the problem;
● Need to study individuals in their natural setting: This is the case where, if participants are
removed from their natural setting, it leads to contrived findings that are out of context;
● Need to write in a literary style: This is where the writer engages a story telling form of
narration and the personal pronoun “I” is used;
● Where there is sufficient time and resources to spend on extensive data collection in the
field and detailed data analysis of “text” information;
● The nature of the research question: In a qualitative study, the research question often starts with a how or a what; and
● Audiences are receptive to qualitative research.

To go to a discussion on designing a qualitative research, click here.


Designing A Study

Generally, the format for the design of a qualitative study follows the traditional research approach of presenting a problem, asking a question, collecting data to answer the question, analyzing the data, and answering the question.

The following format can serve as a guide for planning a study:


● Introduction
❍ Statement of the Problem
❍ Purpose of the Study
❍ The Grand Tour Question and Sub Questions
❍ Definitions
❍ Delimitations and Limitations
❍ Significance of the Study
● Procedure
❍ Assumptions and Rationale for a Qualitative Design
❍ The Type of Design Used
❍ The Role of the Researcher
❍ Data Collection Procedures
❍ Methods for Verification
❍ Outcomes of the Study and Its Relation to Theory and Literature
● Appendixes

More details of each one of these are discussed under the traditions.
For information on characteristics of a “good” qualitative research, click here.
Types/Traditions of Qualitative Research

Five types/traditions of qualitative research are identified here.

The following are specified concerning each tradition:


● Definition

● Procedures involved in conducting a study

● Potential problems that exist in using it

Click on each of the five traditions listed below for details of the issues mentioned above on each
of them:

● Biography
● Phenomenology
● Grounded Theory
● Ethnography
● Case Study
Biography

Definition:
A biographical study is the study of an individual and his/her experiences as told to the researcher
or found in documents and archival records (Creswell, 1998).

Procedures Involved In Conducting A Study:


• It begins with an objective set of experiences in the subject’s life, noting life course stages
and experiences. The life course stages may be childhood, adulthood, or old age, written in a
chronology, or experiences such as education, marriage, and employment.
• Next, the researcher gathers concrete contextual biographical material using interviewing.
Here, the researcher focuses on gathering stories as the subject recounts a set of life
experiences in the form of a story or narrative.
• The researcher then organizes the stories around themes that indicate epiphanies (i.e.,
pivotal events) in the subject’s life.
• The researcher explores the meanings of these stories. However, the researcher relies on
the individual to provide explanations and then searches for multiple meanings.
• In addition, the researcher looks for larger structures to explain the meanings, and
provides an interpretation for the life experiences of the individual. The larger structures
could be social interactions in groups, cultural issues, ideologies and historical context. If
more than one individual is studied, cross-interpretation can be done.

Challenges:
• Gathering information from and about the subject is usually very extensive and demanding.
• There is the need to have a clear understanding of the history and context to enable one to
position the subject within the larger trends in society or in the culture.
• It takes a keen eye to determine the particular stories, slant, or angle that “works” in
writing a biography and to uncover the “figure under the carpet” (Edel, 1994) that explains the
multilayered context of a life.
• The researcher needs to be able to bring himself/herself into the narrative and to
acknowledge his or her standpoint, since this is an interpretive research.

Click on any of the other traditions or the comparison below to read on them.

Phenomenology Grounded Theory Ethnography Case Study


Phenomenology

Definition:
A phenomenological study describes the meaning of the lived experiences of several individuals concerning a concept or a phenomenon (Creswell, 1998). As noted by Polkinghorne (1989), phenomenology explores the structures of consciousness in human experiences.

Procedures Involved In Conducting A Study:


• The researcher writes research questions that explore the meaning of lived experiences for
individuals, and asks individuals to describe these experiences.
• The researcher then collects data, typically via long interviews, from individuals who have
experienced the phenomenon under investigation.
• The data analysis involves horizontalization (i.e., extracting significant statements from
transcribed interviews). The significant statements are then transformed into clusters of
meanings according to how each statement falls under specific psychological and
phenomenological concepts. Finally, these transformations are tied together to make a general
description of the experience – both the textural description (of what was experienced) and the
structural description (of how it was experienced). The researcher can incorporate his/her
personal meaning of the experience here.
• Finally, the report is written such that readers better understand the essential, invariant structure (or essence) of the experience. The reader should come away with the feeling that, “I understand better what it is like for someone to experience that.” (Polkinghorne, 1989).

Challenges:
• The researcher requires a solid grounding in the philosophical precepts of phenomenology.
• The subjects selected into the study should be individuals who have actually experienced
the phenomenon.
• The researcher needs to bracket his/her own experiences, which is difficult to do.
• The researcher needs to decide as to how and when his/her personal experiences will be
incorporated into the study.

Click on any of the other traditions or the comparison below to read on them.

Biography Grounded Theory Ethnography Case Study


Grounded Theory

Definition:
The intent of grounded theory is to generate or discover a theory – an abstract analytical schema of a phenomenon – that relates to a particular situation. This situation could be one in which
individuals interact, take actions, or engage in a process in response to a phenomenon (Creswell,
1998).

Procedures Involved In Conducting A Study:


• In open coding, the researcher forms initial categories of information about the
phenomenon being studied by segmenting information. Within each category (a category
represents a unit of information composed of events, happenings and instances), the researcher
finds several properties, or subcategories, and looks for data to dimensionalize, or show the
extreme possibilities on a continuum of, the property.
• In axial coding, the researcher assembles the data in new ways after open coding. The
researcher presents this using a coding paradigm or logic diagram in which he/she identifies a
central phenomenon, explores causal conditions (i.e., categories of conditions that influence
the phenomenon), specifies strategies (i.e., the actions or interactions that result from the
central phenomenon), identifies the context and intervening conditions (i.e., the narrow and
broad conditions that influence the strategies), and delineates the consequences (i.e., the
outcomes of the strategies) for this phenomenon.
• In selective coding, the researcher identifies a “story line” and writes a story that integrates
the categories in the axial coding model. In this phase, conditional propositions (or
hypotheses) are typically presented.
• Finally, the researcher develops and visually portrays a conditional matrix that elucidates
the social, historical, and economic conditions influencing the central phenomenon.
This process results in a theory, written by the researchers close to a specific problem or
population of people.

Challenges:
• The researcher needs to set aside, as much as possible, theoretical ideas or notions so that
the analytical, substantive theory can emerge.
• Despite the evolving, inductive nature of this form of a qualitative inquiry, the researcher
must recognize that this is a systematic approach to research with specific steps in data
analysis.
• The researcher faces the difficulty of determining when the categories are saturated or when the theory is sufficiently detailed.

Click on any of the other traditions or the comparison below to read on them.
Biography Phenomenology Ethnography Case Study
Ethnography

Definition:
An ethnography is a description and interpretation of a cultural or social group or system
(Creswell, 1998). In such a study, the researcher examines the group’s observable and learned
patterns of behavior, customs, and ways of life (Harris, 1968). Here, the researcher becomes a participant observer, immersed in the day-to-day lives of the people, or conducts one-on-one interviews with members of the group. The researcher focuses on the meanings of behavior, language, and interactions of the culture-sharing group.

Procedures Involved In Conducting A Study:


• The research begins with the researcher looking at people in interaction in ordinary
settings and attempting to discern pervasive patterns such as life cycles, events, and cultural
themes.
• To establish patterns, the ethnographer engages in extensive work in the field (field work),
gathering information through observations, interviews, and materials helpful in developing a
portrait and establishing “cultural rules” of the culture-sharing group.
• The researcher is sensitive to gaining access to the field through gatekeepers. The
ethnographer locates key informants, i.e., individuals who provide useful insights into the
group and can steer the researcher to information and contacts. The researcher is also
sensitive about reciprocity between the investigator and the subjects being studied, so that
something will be returned to the subjects being studied in exchange for their information.
Lastly, the researcher is also sensitive to reactivity, the impact of the researcher on the site and
the people being studied. The researcher also makes every effort to make his/her intent known
from the start to avoid any trace of deception.
• The researcher then does a detailed description of the culture-sharing group or individual,
an analysis by themes or perspectives and some interpretation for meanings of social
interaction and generalizations about human social life.

Challenges:
• The researcher needs to have a grounding in cultural anthropology and the meanings of
social-cultural systems as well as the concepts typically explored by ethnographers.
• The time to collect data is extensive, involving prolonged time in the field.
• The style of writing, literary (almost story telling approach), may limit audience and may
be challenging for some authors who are used to traditional approaches of writing social
science research.
• There is the possibility that the researcher would “go native” and be unable to complete the
study or be compromised in the study.
Click on any of the other traditions or the comparison below to read on them.

Biography Phenomenology Grounded Theory Case Study


Case Study

Definition:
Creswell (1998) defines a case study as an exploration of a “bounded system” or a case (or
multiple cases) over time through detailed, in-depth data collection involving multiple sources of
information rich in context. Some consider “the case” as an object of study (e.g., Stake, 1995)
while others consider it a methodology (e.g., Merriam, 1998). According to Creswell, the
bounded system is bounded by time and place, and it is the case being studied – a program, an
event, an activity, or individuals.

Procedures Involved In Conducting A Study:


• The researcher needs to situate the case in a context or setting. The setting may be physical, social, historical, and/or economic.
• The researcher needs to identify the focus of the study. It could be either on the case
(intrinsic study), because of its uniqueness, or it may be on an issue or issues (instrumental
study), with the case used instrumentally to illustrate the issue. A case study could involve
more than one case (collective case study).
• In choosing what case to study, a researcher may choose a case because it shows different
perspectives on the problem, process, or event of interest, or it may be just an ordinary case,
accessible, or unusual.
• The data collection is extensive, drawing on multiple sources of information such as
observations, interviews, documents, and audio-visual materials.
• The data analysis can be either a holistic analysis of the entire case or an embedded
analysis of a specific aspect of the case.
• From the data collection, a detailed description of the case is done. Themes or issues are
formulated and then the researcher makes an interpretation or assertions about the case.
• When multiple cases are chosen, a typical format is to provide a detailed description of
each case and themes within the case (called within-case analysis), followed by a thematic
analysis across the cases (called a cross-case analysis), as well as assertions or an
interpretation of the meaning of the case.
• In the final stage, the researcher reports the “lessons learned” from the case (Lincoln and
Guba, 1985).

Challenges:
• The researcher needs to identify his/her case among a host of possible candidates.
• The researcher needs to decide whether to study a single case or multiple cases. The
motivation for considering many cases is the issue of generalizability, which is not so much of
a pressing issue in qualitative inquiry. Studying more than one case, however, runs the risk of a diluted study that lacks the “depth” of a single case; deciding “how many” cases then becomes a challenge.
• Getting enough information to get a good depth for the case is a challenge.
• Deciding on the boundaries in terms of time, events and processes may be challenging.
Some cases have no clean beginning and ending points.

Click on any of the other traditions or the comparison below to read on them.

Biography Phenomenology Grounded Theory Ethnography


Characteristics of a “Good” Qualitative Research

There are standards for assessing the quality of qualitative studies (Creswell, 1998; Howe &
Eisenhardt, 1990; Lincoln, 1995; Marshall & Rossman, 1995). The following short list of
characteristics of a “good” qualitative research is presented by Creswell (1998):

• It entails rigorous data collection: the researcher collects multiple forms of data, summarizes them adequately, and spends adequate time in the field.
• The study is framed within the assumptions and characteristics of the qualitative approach
to research.
• The researcher identifies, studies and employs one or more traditions of inquiry.
• The researcher starts with a single idea or problem that s/he seeks to understand, not a
causal relationship of variables.
• The study involves detailed methods, a rigorous approach to data collection, data analysis,
and report writing.
• The writing is so persuasive that the reader experiences “being there.”
• Data is analyzed using multiple levels of abstraction. That is, the researcher’s work is
presented in a way that moves from particulars to general levels of abstraction.
• The writing is clear, engaging, and full of unexpected ideas. The story and findings become believable and realistic, accurately reflecting all the complexities that exist in real situations.

To take a look at the different types/traditions of qualitative research as outlined by Creswell


(1998), click here.
References

Creswell, J. W. (1998). Qualitative Inquiry and Research Design: Choosing Among Five Traditions. Sage Publications, London and New Delhi.

Dabbs, J. M., Faulkner, R. R., and Van Maanen, J. (1982). Varieties of Qualitative Research. Sage Publications.

Flinders, D. J. and Mills, G. E. (1993). Theory and Concepts in Qualitative Research: Perspectives from the Field. Teachers College Press, Columbia University, New York and London.

Lewin, K., Stephens, D. and Vulliamy, G. (1990). Doing Educational Research in Developing Countries: Qualitative Strategies. The Falmer Press, London, New York, Philadelphia.

Patton, M. Q. (1987). How to Use Qualitative Methods in Evaluation. Sage Publications.

Shaffir, W. B. and Stebbins, R. A. (1991). Experiencing Fieldwork: An Inside View of Qualitative Research. Sage Publications.
Links

Nova Southeastern University has a page maintained by Ron Chenail that has resources on qualitative research (papers, abstracts, and a workshop on qualitative methods in mental health research).

Qualpage, by Judy Norris, has information on conferences, workshops, conference


proceedings, discussion forums, electronic journals, interest groups, software resources,
etc.

Social Science Information Gateway (SOSIG) has a lot of Internet resources on


qualitative methods.

York University’s graduate program in education maintains a course page (Dr. Ron Owston) on qualitative methods with links to other resources on the web.

Heskes & Partners is a Qualitative Market Research company in The Netherlands for
applications of Concept development research, Product development research, Image
and Positioning research, and Customer Satisfaction research.

Social Research Update is published quarterly by the Department of Sociology,


University of Surrey, Guildford GU2 5XH, England
The Use of Pattern Matching and Concept Mapping in
Community Organizing Projects

How can Pattern Matching and Concept Mapping be applied to Community Organizing Projects?

What is Pattern Matching?

What is Concept Mapping?

Community Organizing and Development Evaluation Sites

Who's Who in Community Organizing and Activism

Community Development Home Page Trochim Home Page Gallery Home Page

The Use of Pattern Matching and Concept Mapping in Community Organizing Projects Copyright © 1997, Georgette M. King gmk5@cornell.edu. All rights reserved.

DATE LAST UPDATED: 5/9/97


The Use of Pattern Matching and Concept Mapping in
Community Organizing Projects

How can Pattern Matching and Concept Mapping be applied to Community Organizing Projects?

One of the applications of Pattern Matching and Concept Mapping in Community Organizing Projects is the ability to use technology to show graphically the priority associated with the areas of community need or concern that community members want to address, and then to show either the perceived matches or the perceived disconnects in the levels of resources that community members believe are devoted to each area. As the Neighborhood Association Manual of the Community Involvement Division of Salem Neighborhoods, Inc. points out, having "... visual aids prepared in advance..." can be an effective tool when community groups use their power to effect change. It is not enough, therefore, to simply state that the people in a given community feel that the levels of resources devoted to a specific area of need are insufficient.

Simply talking at power brokers can sometimes not only be ineffective; it may actually serve to keep the balance of power tipped away from community groups. Using a combination of technology and visual presentations is a more powerful way of delivering a message and balancing power.

It is much more effective to be able to show the community's perception of its resource needs and the community's perception of the current response from resource providers,
local, state or federal government, the local school district, or any other body that can either be held accountable for resource deficits and/or work with the community
organizers to effect change. This process will serve as a powerful planning tool also, allowing community groups to focus their energies on areas that are perceived as neglected,
or to alter publicity or marketing of an existing, but overlooked, resource.

Quoted from the Community Involvement Division of the Salem Neighborhoods, Inc. Neighborhood Association Manual: The "How To"s of Citizen Action.

Enhance Your Group's Clout

". . . Identify and challenge decision-makers' assumptions about problems. . ."


". . . Understand concerns, viewpoints and priorities of your enemies and allies. . ."
". . . Define your ideas, your vision of a society where problems which concern you have been resolved. . ."

An example of Pattern Matching in use to address community concerns

Often community organizing projects begin by surveying the community and compiling a list of community concerns. That list may then be prioritized to give the community
organizers information about what concerns should be addressed first.

Is there more that could be done with that list of concerns to make a more powerful impact?

One of the first questions that may come to mind is whether there is a common understanding of the concepts that are expressed on the list. Some community members may envision the concept of 'quality childcare', let's say, as more 24-hour childcare centers that will accommodate the work schedules of parents assigned to swing shifts, while others define 'quality childcare' as after-school programs for latch-key children, or as in-home daycare for toddlers. It is easy to see that they may then also feel that the community's work should focus in totally different directions. Establishing construct validity, through the development of clear definitions of each concept and then prioritizing the elements of the concepts' definitions, is a vital element of the process.

Once a list of concepts has been prioritized and the elements of the concepts have been defined and prioritized, the third step in the process would consist of getting an accurate
feel for the community's perception about how the enumerated resource needs are currently being addressed by various levels of resource providers. Sticking with our childcare
example, let's suppose that the community has prioritized 'quality childcare' as the top item on their list. They have also defined the key elements of the concept of 'quality
childcare' and have prioritized the elements based on community needs.
● Priority # 1: Quality Childcare

24 hour childcare centers


After-school programs for latch key children
In-home daycare for toddlers
Expanded full-day kindergarten
Assistance in locating open, licensed childcare slots

That process alone, however, would not give a clear picture of how the community perceives that the issue of 'quality childcare' is being addressed. It is possible that when they examine their perception of the response to the vital need for 'quality childcare', they are completely satisfied with the energy and money being focused on that issue. Perhaps the
community can readily identify projects that are underway to expand the local 24 hour childcare center, or know that the area elementary schools and community centers will
have expanded after school and summer programs for latch key children. Perhaps they are very happy about the joint efforts between community members and the area daycare
provider's organization to recruit, train and provide start-up funds for community members that are interested in setting up in-home daycare centers.

If such a connection exists between the community's perception of the importance and definition of the concept 'quality childcare' and their perception of the level and quality of resources devoted to it, the rung of the Pattern Match Ladder Graph that represents childcare would be essentially level.

But it is also likely that a community may feel there is no such connection. They may perceive that little is happening to address the need for 'quality childcare', or that the actions being taken are not appropriate. (A community that is isolated by poor public transportation from a wonderful new childcare center that is strictly fee-for-service, and that also lacks a sliding-scale fee option, may understandably give a very low ranking to the resource level associated with the concept 'quality childcare'.)
The slope of the line graph clearly indicates a disconnect between the community's prioritization of the concept 'quality childcare' and the community's perception of the energies being expended by the various levels of resource providers to meet that need.

Now it should be stated that the action that the community organizers take may vary quite a bit when such a disconnect has been found, just as the response from resource
providers may vary, also. Perhaps the community organizers may want to lobby the school board to shift funds to expanding an after school program. They may also, however,
want to begin a volunteer based program at a local church, and/or encourage community members to consider starting home-based daycare services by promoting, or creating,
entrepreneurial training and support efforts.

The resource community may in fact be defined as not just the typical service providers (notice how I avoid that phrase completely) but may be made up of a pool of
organizations, individuals, community groups, religious groups, funding providers, and all levels of government. A response from some of those resource providers may be to just
shift the focus of their efforts to make sure that they get the most 'bang for their bucks'; a shift in focus could be the difference between spending themselves into a deficit position
or actually doing something positive and productive for their target population. Other groups may be able to identify an unmet need and use the documentation of that unmet
need to seek new funding.

Having such a neatly wrapped bundle of needs, measured community interests, and well-defined and prioritized concepts handed to a potential resource provider may be an agency's dream. Instead of community organizers having to convince a powerbroker to take on a project, the powerbroker now has a strong incentive to collaborate with them and to gain the community's continued support for a new funding proposal or initiative.

A Pattern Match Ladder Graph that addresses all of the top priorities of a community along with the community's corresponding perception of the levels of effort that are being
devoted to addressing those priorities may look something like this:
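Since the original graphs are not reproduced here, the following is a minimal sketch, in Python with matplotlib, of how such a pattern match ladder graph might be drawn. The concerns and both sets of rankings are hypothetical, invented purely for illustration: a nearly level rung suggests a match between priority and perceived resources, while a steeply sloped rung suggests a disconnect.

# A minimal sketch of a pattern match "ladder" graph: each concern is a rung
# connecting its community-priority rank (left) to its perceived-resource rank
# (right). All data below are hypothetical.
import matplotlib.pyplot as plt

concerns      = ["Quality childcare", "Public transportation", "Street lighting", "Job training"]
priority_rank = [1, 2, 3, 4]   # 1 = highest community priority (hypothetical)
resource_rank = [4, 2, 1, 3]   # 1 = most perceived resources (hypothetical)

fig, ax = plt.subplots(figsize=(6, 4))
for label, left, right in zip(concerns, priority_rank, resource_rank):
    ax.plot([0, 1], [left, right], marker="o")
    ax.annotate(label, (0, left), xytext=(-5, 0), textcoords="offset points",
                ha="right", va="center")

ax.set_xticks([0, 1])
ax.set_xticklabels(["Community priority", "Perceived resources"])
ax.invert_yaxis()              # rank 1 at the top
ax.set_ylabel("Rank")
ax.set_title("Pattern match ladder graph (hypothetical data)")
plt.tight_layout()
plt.show()

In this sketch the 'quality childcare' rung runs from rank 1 on the left down to rank 4 on the right, which is exactly the kind of disconnect described above.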
But that still leaves open the question of how the definitions and priorities of the elements that comprise the community's definitions of the concepts can be shown visually. For that we will move on and look at an example of Concept Mapping in use to address community concerns.

An example of Concept Mapping in use to address community concerns

Resources describing the process of concept mapping can be found by going to What is Concept Mapping?. Creating a concept map of the elements that describe the major areas of community concern allows an opportunity to describe the concepts themselves and to provide a visual representation of the priorities of the constructs and how they relate to each other.

Using the information that was gathered from the index card sorting and prioritizing of the concerns and then listing and prioritizing the elements of the individual constructs
to create a concept map will result in another powerful tool.
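As a rough illustration of the analysis that sits behind a concept map, the sketch below (Python, using numpy and scipy; the statements and card sorts are hypothetical) turns each participant's pile sort into a statement-by-statement similarity matrix and then clusters the statements. This is only a simplified sketch of the first analytic steps, not the full concept mapping procedure, which would also place the statements on a map (for example with multidimensional scaling).

# A minimal sketch: card sorts -> co-occurrence (similarity) matrix -> clusters.
# Statements and sorts are hypothetical.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

statements = ["24-hour childcare", "after-school programs", "in-home daycare",
              "bus routes", "sidewalk repair"]

# Each participant's sort is a list of piles; each pile is a set of statement indices.
sorts = [
    [{0, 1, 2}, {3, 4}],        # participant 1
    [{0, 2}, {1}, {3, 4}],      # participant 2
    [{0, 1, 2, 3}, {4}],        # participant 3
]

n = len(statements)
similarity = np.zeros((n, n))
for piles in sorts:
    for pile in piles:
        for i in pile:
            for j in pile:
                similarity[i, j] += 1          # sorted together => more similar

distance = similarity.max() - similarity       # turn similarity into distance
np.fill_diagonal(distance, 0)
condensed = distance[np.triu_indices(n, k=1)]  # condensed form expected by linkage()
clusters = fcluster(linkage(condensed, method="average"), t=2, criterion="maxclust")

for label, cluster_id in zip(statements, clusters):
    print(f"cluster {cluster_id}: {label}")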

For more information please also see:

Home Page

What is Pattern Matching?

What is Concept Mapping?

Community Organizing and Development Evaluation Sites

Who's Who in Community Organizing and Activism



What is Concept Mapping?

For information about Concept Mapping and assistance in research design and
implementation try looking at the following web sites.

Concept Mapping

Concept Mapping Projects Page

Concept Systems Incorporated

How can Pattern Matching and Concept Mapping be applied to Community Organizing Projects?

Research Pointers Page

Research Problem Formulation

Articles about Concept Mapping

Concept Systems Incorporated: Ideas in Action

Research Center

Using Concept Mapping to Develop a Conceptual Framework of Staff's Views of a Supported Employment
Program for Persons with Severe Mental Illness by William M.K. Trochim, Cornell University, Judith A. Cook,
Thresholds National Research and Training Center, Chicago, IL and Rose J. Setze, Cornell University. (Published in
the Journal of Consulting and Clinical Psychology, 1994, Vol. 62, No. 4, 766-775)



What is Pattern Matching?

For information about Pattern Matching and assistance in research design and
implementation try looking at the following web sites.

How can Pattern Matching and Concept Mapping be applied to Community Organizing Projects?

Pattern Matching for Construct Validity

The Pattern Matching NEDV Design.

Research Pointers Page

Research Problem Formulation



Procedures in Sampling

Introduction
This tutorial attempts to identify the major procedures used in research sampling. Why is this an important step to consider in undertaking any research? It is because in sampling we want to obtain a representative sample, that is, a sample that looks like the population within an acceptable margin of error. In order to understand this better, we will start off by defining some sampling terms that will constantly be used within the discussion. Emphasis will be placed on the distinction between the two major types of sampling procedures, probability and non-probability sampling, and how they apply in different research situations. Last, we will take a short test at the end of this lesson to refresh our minds on the concepts discussed.
Definitions
Most of these definitions are taken from Francis Dane's Research Methods, published in 1990. Links are made to relevant sites on other web pages.

Sampling is the process of selecting participants for a research project.

Sampling unit or element: This refers to a single thing selected for inclusion in a research project. For
example if you sample students from a college, one student would be your sample unit or element.

Population: All the possible units or elements

Sample: A portion of the elements in a population is considered a sample. Any given sample can be part of more than one population. Does this sound like jargon again? Let us unpack this statement by looking at the following example. You and four of your classmates can be a sample of your class, a sample of your university's students, a sample of your country, and so on. We can therefore think of a sample as a more concrete portion of a population (or of several populations).

Sampling frame: A listing of the elements in a population. In a university's admissions office, the list of names may include all the students who have been admitted into the college, even the ones who never show up or those who have decided to quit. The sampling frame is therefore the largest possible sample of a population; that is, everything that can be selected from.

Parameter: A value associated with a population; it can only be estimated, in inferential statistical terms, using a sample. Why? Because a parameter is usually an abstract value: we cannot calculate the mean of a population if we cannot measure, and do not know, the exact number of units in the population. Since a sample is more concrete, however, we can use statistics to estimate parameters. Because these estimates may contain various amounts of error, this is usually referred to as inferential statistics.

Sampling error: The extent to which a sample statistic incorrectly estimates a population parameter.

Confidence level: The probability associated with the accuracy of an inferential statistic, since we do not know exactly what was and was not included in a given population.
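To make the distinction between a parameter and a statistic concrete, here is a small Python sketch using a simulated, purely hypothetical population of test scores. The population mean plays the role of the parameter, the sample mean is the statistic used to estimate it, and the confidence interval expresses the confidence level attached to that estimate.

# Parameter vs. statistic: the population mean is the (normally unknowable)
# parameter; the sample mean is the statistic we use to estimate it.
# The population here is simulated purely for illustration.
import math
import random
import statistics

random.seed(1)
population = [random.gauss(70, 10) for _ in range(100_000)]   # hypothetical scores
parameter = statistics.mean(population)                       # usually unknown in practice

sample = random.sample(population, 100)                        # a simple random sample
statistic = statistics.mean(sample)
std_error = statistics.stdev(sample) / math.sqrt(len(sample))

# 95% confidence interval using the normal approximation (z = 1.96)
low, high = statistic - 1.96 * std_error, statistic + 1.96 * std_error
print(f"population mean (parameter): {parameter:.2f}")
print(f"sample mean (statistic):     {statistic:.2f}")
print(f"95% confidence interval:     ({low:.2f}, {high:.2f})")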
Probability sampling:
Probability sampling is a technique used to ensure that every element in a sample frame has an equal chance of
being incorporated into the sample. If this statement does not make so much sense yet, let us try to recall some
of the terms we defined earlier on. Remember that sampling frame is the largest sample that can be obtained
from a population, and that a sample is only a portion of the elements in the population. How can we then ensure
that we have a chance of including every element in a sample frame into our sample? Not an easy task… ah!
Below are some ways by which we can try to accomplish probability sampling. Each has its own advantages and limitations. However, you can combine different probability sampling procedures in
order to obtain the sample you want as long as you limit yourself to random selection procedures.

a) Simple random sampling

In simple random sampling, we use an unsystematic random selection process: we identify every element in the sampling frame and then choose elements in a way that ensures every element has the same opportunity of being selected. What if the sampling frame is too large to draw names from a hat? We can use a random number table, which is a table containing randomly generated digits. In order to use the table, you need to know the size of your sampling frame; any number drawn from the table that is larger than the largest number in your sampling frame is simply skipped. Let us look at the table below, with nine columns and three rows of five-digit numbers.

24356 46724 25641 67514 98257 98165 35678 87192 98173


17625 78256 71522 98127 09161 01823 91728 56720 09765
28937 97628 09152 61723 91873 91723 87542 19782 25637

Let us create a fictitious class of 45 students and suppose we want the average math score of seven of them. How do we go about choosing the seven students using the random number table above? The first step is to decide how to move through the columns and rows: up, down, left or right, but the movement has to be systematic. Second, choose any starting point from which to select a sample of seven from the sampling frame of 45. Let us assume that we are using the first two digits of each number. Suppose we start at column seven, row two: the first two digits are 9 and 1. We are already in a problem, because 91 is larger than 45, so we skip it. Moving on to column three, row one, the first two digits are 2 and 5; 25 is within the range of 45, so student number 25 is the first element drawn from the sampling frame. We then continue in the same way until we have the seven students we need.
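The same idea can be expressed in a few lines of Python. In the sketch below the 45 student IDs and their math scores are hypothetical, and random.sample plays the role of working down the random number table.

# Simple random sampling: draw 7 of 45 students so that every student has the
# same chance of selection, then average their (hypothetical) math scores.
import random

random.seed(42)
math_scores = {student_id: random.randint(40, 100) for student_id in range(1, 46)}

chosen = random.sample(sorted(math_scores), k=7)   # 7 distinct student IDs, all equally likely
average = sum(math_scores[s] for s in chosen) / len(chosen)

print("students selected:", chosen)
print("average math score of the sample:", round(average, 1))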

b) Systematic random sampling

Systematic random sampling is done through some ordered criterion, by choosing elements at a fixed interval from a randomly arranged sampling frame. You choose every "nth" element in the sampling frame, e.g. every 10th, 15th or 20th. What are the procedures involved? You have to decide on your sample size. Say your sample will be made up of 30 students from a sampling frame of 400 students: the sampling proportion is 30/400 = 0.075, so the sampling interval is 1/0.075 ≈ 13, that is, roughly every 13th person.
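A short Python sketch of the same procedure, using a hypothetical frame of 400 student IDs:

# Systematic random sampling: a random start, then every k-th element of a
# randomly ordered sampling frame. The frame below is hypothetical.
import random

random.seed(0)
frame = list(range(1, 401))      # 400 student IDs
random.shuffle(frame)            # the frame should be in random order

n = 30
k = len(frame) // n              # sampling interval: 400 // 30 = 13
start = random.randrange(k)      # random starting point within the first interval
sample = frame[start::k][:n]     # every 13th element, capped at the sample size

print(len(sample), "students selected, e.g.:", sample[:5])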

Both simple random sampling and systematic sampling require you to generate a list from the sampling frame. They also work best when the elements within the sampling frame are homogeneous.
c) Stratified random sampling

When dealing with a sample frame that is not homogeneous and contains subgroups such as
freshmen, juniors, and so on in a listing of university students for instance, you will need to represent
those subgroups in your sample. In order to achieve this, random selection from each subgroup in
the sampling frame has to be considered using the same procedures. The subgroups within the
sample frame will have to be treated as though they were separate sampling frames themselves. Does this sound like yet another tongue twister? Not after we have equipped ourselves with the sampling terms identified earlier in this lesson. Just to refresh our minds on these, click here.
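Here is a minimal Python sketch of stratified random sampling with proportional allocation; the class-year strata and their sizes are hypothetical.

# Stratified random sampling: treat each subgroup (stratum) as its own sampling
# frame and sample randomly within it, here in proportion to stratum size.
import random

random.seed(0)
strata = {
    "freshmen":   [f"F{i}" for i in range(200)],
    "sophomores": [f"S{i}" for i in range(150)],
    "juniors":    [f"J{i}" for i in range(100)],
    "seniors":    [f"R{i}" for i in range(50)],
}

total = sum(len(members) for members in strata.values())
sample_size = 50
sample = []
for name, members in strata.items():
    n_stratum = round(sample_size * len(members) / total)   # proportional allocation
    sample.extend(random.sample(members, n_stratum))
    print(f"{name}: {n_stratum} selected")

print("total sample size:", len(sample))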

d) Cluster sampling

Have you ever imagined a situation whereby a sampling frame list is unavailable and as a researcher
you have to continue with your work? For instance, how would you go about obtaining a sample
frame list of all college students in Canada? Hard! Would you abandon carrying your research on this
basis? If I were you I would say a big NO! Why? Through cluster sampling, we can randomly select
hierarchical groups from the sampling frame by creating clusters that can be further sampled into
finer gradations of clusters until we can obtain a list of elements.

Let us create a scenario that illustrates how cluster sampling is achieved. To select a random sample of 600 college students in Canada, for instance, we first create a sampling frame consisting of a list of colleges. Using simple random sampling, we select six colleges from the list. Then, within those clusters, we randomly select 100 students from each of the six colleges to obtain 600 students residing in Canada.
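A small Python sketch of this two-stage cluster sampling scenario; the colleges and their enrolments are hypothetical.

# Two-stage cluster sampling: randomly select 6 colleges (clusters), then
# randomly select 100 students from each selected college. Data are hypothetical.
import random

random.seed(0)
colleges = {f"College {c}": [f"{c}-{i}" for i in range(random.randint(500, 2000))]
            for c in "ABCDEFGHIJKLMNOPQRST"}          # a frame of 20 colleges

chosen_colleges = random.sample(sorted(colleges), k=6)           # stage 1: clusters
students = []
for college in chosen_colleges:
    students.extend(random.sample(colleges[college], k=100))     # stage 2: elements

print("colleges selected:", chosen_colleges)
print("total students sampled:", len(students))                  # 600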

Non probability Sampling


Non-probability sampling is any procedure in which elements do not have equal opportunities of being included in a sample. In non-probability sampling, you set criteria for elements to be included in the sample, e.g. on the basis of region, appearance and so forth, hence limiting the chances of representation in the sample. The simple question one would be obliged to ask is: why bother with non-probability sampling when we already know that there is bias in the sample selection? Non-probability sampling nevertheless plays a major role, for example in exploratory research. One major feature of non-probability sampling, however, is that it does not use random selection, and therefore you cannot estimate sampling error.

We will now look at some examples of non-probability sampling used in research.

a) Accidental sampling

This is selection based on availability or ease of inclusion. Assume that you were walking down the
street and an interviewer chose to videotape you for the evening local news broadcast. Can this be
termed accidental or random sampling? I would argue strongly for accidental sampling because this
was a mere selection based on your availability and willingness to talk. Accidental sampling can lead
to misinterpretation of results. Do you recall the outcome of the 1936 U.S. presidential election poll published in The Literary Digest? Maybe you were not yet born at that time. Try to find this publication and see what results accidental sampling can produce.

b) Purposive sampling

In exploratory or pilot projects you may be purposely inclined to obtain data from specific individuals.
Such data may give you internal validity of your project, but you may not be able to generalize it to
other places and people.
c) Quota sampling

In quota sampling, you select sampling elements on the basis of categories that are assumed to exist
within a population. How is quota sampling different from stratified random sampling discussed
earlier on? In stratified random sampling, elements are randomly selected from the stratified groups, while in quota sampling a presumed subdivision is used as the basis of the (non-random) selection procedure. Although in quota sampling
the results may almost reflect similarities with the population, there is difficulty in determining the
amount of sample error. Do you remember the famous 1948 photograph of Harry Truman holding a
newspaper with the following infamous headline…DEWEY DEFEATS TRUMAN? (Dane, 1990). Find
out more about this story and reflect on how quota sampling may lead to our inability to generalize to
a population.
Questions

1. What does sampling in research mean?

A: A process of selecting participants for a research project


B: A portion of the elements in a population
C: Assigning participants to various roles in a project
D: All the above

2. What is meant by a sampling error?

A: When you fail to get information for your research


B: When a portion of the population fails to participate in research
C: The extent to which a sample statistic incorrectly estimates a population parameter.
D: The accuracy in research finding

3: A sample frame is…

A: A portion of the elements in a population


B: A concrete listing of all the elements in a population
C: A value associated with a population
D: A concrete listing of some elements in a population

4: Random selection is…

A: A technique that provides each population element an equal opportunity of being


included in the sample.
B: Is when you select elements on the basis of categories assumed to exist within a
population.
C: Selection based on the availability or ease of inclusion
D: Any procedure in which elements have unequal chances of being included into the
sample.

5: Which one of the following best describes probability sampling?

A: When all elements in a population are included in a sample


B: When a portion of the sample is included in a sample.
C: When hierarchical groups are selected from a sample frame.
D: Random sampling

6: Which one of the following is not a non-probability sampling method?

A: Accidental sampling
B: Quota sampling
C: Purposive sampling
D: Cluster sampling
7: What is the main distinction between probability and non- probability sampling?

A: In probability sampling, sample error can be estimated.


B: In non-probability sampling, sample error can be estimated.
C: Probability sampling has no sample error
D: None of the above is true

8: When do we use cluster sampling in research?

A: When doing non-probability sampling.


B: Whenever a list of the entire frame does not exist
C: When there are cyclic changes that repeat throughout the sample frame
D: Whenever we want to include all subgroups into our sample.

9: What is meant by confidence level in research?

A: The probability associated with sampling error in a population


B: The probability associated with the accuracy of an inferential statistic
C: The probability to which a sample statistic incorrectly estimates a population
parameter.
D: The probability associated with confidence interval in a population

10: The primary goal of any sampling procedure is to obtain a representative sample. What does this mean?

A: A sample that includes all the elements in a population


B: A sample that follows all the sampling procedures
C: A sample that resembles the population within an acceptable error.
D: A sample that avoids accidental sampling.
Answers:

1: A  2: C  3: B  4: A  5: D  6: D  7: A  8: B  9: B  10: C

For more information e-mail jao23@cornell.edu


Sampling In Research
Mugo Fridah W.

INTRODUCTION

This tutorial is a discussion of sampling in research. It is mainly designed to equip beginners with knowledge of the general issues of
sampling: the purpose of sampling in research, the dangers of sampling and how to minimize them, the types of sampling, and guides
for deciding on sample size. For a clear flow of ideas, a few definitions of the terms used are given.

What is research?

According to Webster (1985), to research is to search or investigate exhaustively. It is a careful or diligent search, a studious inquiry or
examination, especially investigation or experimentation aimed at the discovery and interpretation of facts, the revision of accepted
theories or laws in the light of new facts, or the practical application of such new or revised theories or laws. It can also be the collection
of information about a particular subject.

What is a sample?

A sample is a finite part of a statistical population whose properties are studied to gain information about the whole (Webster, 1985).
When dealing with people, it can be defined as a set of respondents (people) selected from a larger population for the purpose of a
survey.

A population is a group of individuals, persons, objects, or items from which samples are taken for measurement, for example a
population of presidents or professors, books or students.

What is sampling?

Sampling is the act, process, or technique of selecting a suitable sample, or a representative part of a population,
for the purpose of determining parameters or characteristics of the whole population.

What is the purpose of sampling?

To draw conclusions about populations from samples, we must use inferential statistics, which enables us to determine a
population's characteristics by directly observing only a portion (or sample) of the population. We obtain a sample rather
than a complete enumeration (a census) of the population for many reasons. Obviously, it is cheaper to observe a part rather
than the whole, but we should prepare ourselves to cope with the dangers of using samples. In this tutorial, we will investigate
various kinds of sampling procedures. Some are better than others, but all may yield samples that are inaccurate and unreliable.
We will learn how to minimize these dangers, but some potential error is the price we must pay for the convenience and savings
that samples provide.

There would be no need for statistical theory if a census rather than a sample were always used to obtain information about
populations. But a census may not be practical and is almost never economical. There are six main reasons for sampling instead of
doing a census:
- Economy
- Timeliness
- The large size of many populations
- Inaccessibility of some populations
- Destructiveness of the observation
- Accuracy

The economic advantage of using a sample in research

Obviously, taking a sample requires fewer resources than a census. For example, let us assume that you are one of the very
curious students around. You have heard so much about the famous Cornell, and now that you are there, you want to hear from
the insiders. You want to know what all the students at Cornell think about the quality of teaching they receive. You know that
all the students are different, so they are likely to have different perceptions, and you believe you must get all these perceptions.
Because you want an in-depth view from every student, you decide to conduct personal interviews with each one of them, and
you want the results in only 20 days. Let us assume that at this particular time Cornell has only 20,000 students, and that those
who are helping you are so fast at the interviewing art that together you can complete at least 10 interviews per person per day,
in addition to your 18 credit hours of course work. You will require 100 research assistants for 20 days, and since you are paying
them a minimum wage of $5.00 per hour for ten hours ($50.00) per person per day, you will require $100,000.00 just to complete
the interviews; the analysis will simply be impossible. You may decide to hire additional assistants to help with the analysis at
another $100,000.00, and so on, assuming you have that amount in your account.
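For readers who like to check the arithmetic, here is a small back-of-the-envelope calculation in Python. This is only a sketch: the population size, interview rate, and wage are the hypothetical figures from the example above, not real Cornell data.

# Back-of-the-envelope cost of interviewing every student (hypothetical figures from the example above).
students = 20000            # assumed population size
interviews_per_day = 10     # interviews one assistant can complete per day
days_available = 20         # deadline in days
pay_per_day = 5.00 * 10     # $5.00/hour for a ten-hour day = $50.00 per assistant per day

assistants_needed = students / (interviews_per_day * days_available)   # 100 assistants
interview_cost = assistants_needed * pay_per_day * days_available      # $100,000 for the interviews alone

print(f"Assistants needed: {assistants_needed:.0f}")
print(f"Cost of the interviews alone: ${interview_cost:,.2f}")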

As unrealistic as this example is, it does illustrate the very high cost of a census. For the type of information desired, a small, wisely
selected sample of Cornell students can serve the purpose. You don't even have to hire a single assistant. You can complete the
interviews and analysis on your own. Rarely does a circumstance require a census of the population, and even more rarely does one
justify the expense.

The time factor.

A sample may provide you with needed information quickly. For example, suppose you are a doctor and a disease has
broken out in a village within your area of jurisdiction. The disease is contagious, it is killing within hours, and
nobody knows what it is. You are required to conduct quick tests to help save the situation. If you try a census
of those affected, they may be long dead when you arrive with your results. In such a case, just a few of those
already infected could be used to provide the required information.

The very large populations

Many populations about which inferences must be made are quite large. For example, consider the population of high school seniors
in the United States of America, a group numbering 4,000,000. The responsible agency in the government has to plan for how they will
be absorbed into the different departments and even the private sector. Employers would like to have specific knowledge about the
students' plans in order to make compatible plans to absorb them during the coming year. But the large size of the population makes it
physically impossible to conduct a census. In such a case, selecting a representative sample may be the only way to get the
information required from high school seniors.

The partly accessible populations

There are some populations that are so difficult to get access to that only a sample can be used, such as people in prison, crashed
aeroplanes in the deep seas, presidents, etc. The inaccessibility may be economic or time related. A particular study population
may be so costly to reach, like the population of planets, that only a sample can be used. In other cases, a population of events
may take so long to occur that only sample information can be relied on. Consider, for example, natural disasters like a flood that occurs
every 100 years, or take the example of the flood that occurred in Noah's days; it has never occurred again.

The destructive nature of the observation

Sometimes the very act of observing the desired characteristic of a unit of the population destroys it for the intended use.
Good examples of this occur in quality control. For example, to test the quality of a fuse, to determine whether it is defective,
it must be destroyed. To obtain a census of the quality of a lorry load of fuses, you would have to destroy all of them. This is
contrary to the purpose served by quality-control testing. In this case, only a sample should be used to assess the quality of the fuses.

Accuracy and sampling

A sample may be more accurate than a census. A sloppily conducted census can provide less reliable information than a carefully
obtained sample.

BIAS AND ERROR IN SAMPLING

A sample is expected to mirror the population from which it comes; however, there is no guarantee that any sample will be
precisely representative of the population from which it comes. Chance may dictate that a disproportionate number of untypical
observations will be made, as in the case of testing fuses, where the sample may contain more or fewer faulty fuses than the real
population proportion of faulty ones. In practice, it is rarely known when a sample is unrepresentative and should be discarded.

Sampling error

What can make a sample unrepresentative of its population? One of the most frequent causes is sampling error.

Sampling error comprises the differences between the sample and the population that are due solely to the particular units that happen
to have been selected.

For example, suppose that a sample of 100 American women are measured and are all found to be taller than six feet. It is very clear,
even without any statistical proof, that this would be a highly unrepresentative sample leading to invalid conclusions. This is a very
unlikely occurrence because naturally such rare cases are widely distributed among the population. But it can occur. Luckily, this is a
very obvious error and can be detected very easily.

The more dangerous error is the less obvious sampling error against which nature offers very little protection. An example would be
a sample in which the average height is overstated by only an inch or two, rather than by one foot, which would be far more obvious. It is the
unobvious error that is of much concern.

There are two basic causes of sampling error. One is chance: that is, the error that occurs just because of bad luck. This may result in
untypical choices. Unusual units in a population do exist, and there is always a possibility that an abnormally large number of them
will be chosen. For example, in a recent study in which I was looking at the number of trees, I selected a sample of households
randomly, but strangely enough, the two households in the whole population which had the highest number of trees (10,018 and 6,345)
were both selected, making the sample average higher than it should be. The average with these two extremes removed was 828 trees.
The main protection against this kind of error is to use a large enough sample. The second cause of sampling error is sampling bias.

Sampling bias is a tendency to favour the selection of units that have particular characteristics.

Sampling bias is usually the result of a poor sampling plan. The most notable is the bias of non-response, when for some reason some
units have no chance of appearing in the sample. For example, take a hypothetical case where a survey was conducted recently by the
Cornell Graduate School to find out the level of stress that graduate students were going through. A mail questionnaire was sent to
100 randomly selected graduate students. Only 52 responded, and the results were that students were not under stress at that time,
when the actual case was that it was the most stressful time for all students except those who were writing their theses at their own
pace. Apparently, this was the group that had the time to respond. The researcher who was conducting the study went back to the
questionnaire to find out what the problem was and found that all those who had responded were third- and fourth-year PhD students. Bias
can be very costly and has to be guarded against as much as possible. In this case, $2,000.00 had been spent and there were no
reliable results; in addition, it cost the researcher his job, since his employer thought that if he were qualified, he should have known that
beforehand and planned how to avoid it. A means of selecting the units of analysis must be designed to avoid the more obvious
forms of bias. Another example would be where you would like to know the average income of some community and you decide to
use telephone numbers to select a sample of the total population in a locality where only the rich and middle-class households
have telephone lines. You will end up with a high average income, which will lead to the wrong policy decisions.

Non sampling error (measurement error)

The other main cause of unrepresentative samples is non-sampling error. This type of error can occur whether a census or a sample is
being used. Like sampling error, non-sampling error may either be produced by participants in the statistical study or be an innocent
by-product of the sampling plans and procedures.

A non-sampling error is an error that results solely from the manner in which the observations are made.

The simplest example of non-sampling error is inaccurate measurement due to malfunctioning instruments or poor procedures. For
example, consider the observation of human weights. If persons are asked to state their own weights, no two answers will
be of equal reliability. The people will have weighed themselves on different scales in various states of poor calibration. An
individual's weight fluctuates diurnally by several pounds, so the time of weighing will affect the answer. The scale reading will
also vary with the person's state of undress. Responses therefore will not be of comparable validity unless all persons are weighed
under the same circumstances.

Biased observations due to inaccurate measurement can be innocent but very devastating. A story is told of a French astronomer who
once proposed a new theory based on spectroscopic measurements of light emitted by a particular star. When his colleagues
discovered that the measuring instrument had been contaminated by cigarette smoke, they rejected his findings.

In surveys of personal characteristics, unintended errors may result from: the manner in which the response is elicited; the social
desirability of the persons surveyed; the purpose of the study; and the personal biases of the interviewer or survey writer.

The interviewer effect

No two interviewers are alike, and the same person may provide different answers to different interviewers. The manner in which a
question is formulated can also result in inaccurate responses. Individuals tend to provide false answers to particular questions. For
example, some people want to feel younger or older for some reason known to themselves. If you ask such a person their age in
years, it is easier for the individual to lie to you by overstating their age by one or more years than it would be if you asked which year
they were born, since giving a false date requires a bit of quick arithmetic; a date of birth will therefore be more accurate.

The respondent effect

Respondents might also give incorrect answers to impress the interviewer. This type of error is the most difficult to prevent because it
results from outright deceit on the part of the respondent. An example of this is what I witnessed in a recent study in which I was
asking farmers how much maize they had harvested the previous year (1995). In most cases, the men tended to lie by giving the
recommended expected yield, that is, 25 bags per acre. The responses from the men looked so uniform that I became suspicious. I
compared them with the responses of the wives of these men, and their responses were all different. To decide which one was right,
whenever possible I would tactfully verify with an older son or daughter. It is important to acknowledge that certain
psychological factors induce incorrect responses, and great care must be taken to design a study that minimizes their effect.

Knowing the study purpose

Knowing why a study is being conducted may create incorrect responses. A classic example is the question: What is your income? If
a government agency is asking, a different figure may be provided than the respondent would give on an application for a home
mortgage. One way to guard against such bias is to camouflage the study's goals; another remedy is to make the questions very
specific, allowing no room for personal interpretation. For example, "Where are you employed?" could be followed by "What is your
salary?" and "Do you have any extra jobs?" A sequence of such questions may produce more accurate information.

Induced bias

Finally, it should be noted that the personal prejudices of either the designer of the study or the data collector may tend to induce
bias. In designing a questionnaire, questions may be slanted in such a way that a particular response will be obtained even though it is
inaccurate. For example, an agronomist may apply fertilizer to certain key plots, knowing that they will provide more favourable
yields than others. To protect against induced bias, the advice of an individual trained in statistics should be sought in the design, and
someone else aware of such pitfalls should serve in an auditing capacity.

SELECTING THE SAMPLE


The preceding section has covered the most common problems associated with statistical studies. The desirability of a sampling
procedure depends on both its vulnerability to error and its cost. However, economy and reliability are competing ends, because
reducing error often requires an increased expenditure of resources. Of the two types of statistical errors, only sampling error can be
controlled by exercising care in determining the method for choosing the sample. The previous section has shown that sampling error
may be due to either bias or chance. The chance component (sometimes called random error) exists no matter how carefully the
selection procedures are implemented, and the only way to minimize chance sampling errors is to select a sufficiently large sample
(sample size is discussed towards the end of this tutorial). Sampling bias, on the other hand, may be minimized by the wise choice of a
sampling procedure.

TYPES OF SAMPLES

There are three primary kinds of samples: the convenience sample, the judgement sample, and the random sample. They differ in the manner
in which the elementary units are chosen.

The convenience sample

A convenience sample results when the more convenient elementary units are chosen from a population for observation.

The judgement sample

A judgement sample is obtained according to the discretion of someone who is familiar with the relevant characteristics of the
population.

The random sample

This may be the most important type of sample. A random sample allows a known probability that each elementary unit will be
chosen. For this reason, it is sometimes referred to as a probability sample. This is the type of sampling that is used in lotteries and
raffles. For example, if you want to select 10 players randomly from a population of 100, you can write their names on slips of paper,
fold them up, mix them thoroughly, then pick ten. In this case, every name has an equal chance of being picked. Random numbers can also be
used (see Lapin, page 81).
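As a quick illustration of the lottery idea, here is a minimal Python sketch. The player names are invented for illustration; any list of population elements would do.

import random

# Hypothetical population of 100 players, labelled Player 1 .. Player 100.
population = [f"Player {i}" for i in range(1, 101)]

# Draw 10 names without replacement; every player has the same chance of being picked.
selected = random.sample(population, k=10)
print(selected)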

TYPES OF RANDOM SAMPLES

A simple random sample

A simple random sample is obtained by choosing elementary units in such a way that each unit in the population has an equal
chance of being selected. A simple random sample is free from sampling bias. However, using a random number table to choose the
elementary units can be cumbersome. If the sample is to be collected by a person untrained in statistics, then instructions may be
misinterpreted and selections may be made improperly. Instead of using a list of random numbers, data collection can be simplified
by selecting, say, every 10th or 100th unit after the first unit has been chosen randomly, as discussed below. Such a procedure is called
systematic random sampling.

A systematic random sample

A systematic random sample is obtained by selecting one unit on a random basis and choosing additional elementary units at evenly
spaced intervals until the desired number of units is obtained. For example, suppose there are 100 students in your class. You want a sample
of 20 from these 100, and you have their names listed on a piece of paper, maybe in alphabetical order. If you choose to use
systematic random sampling, divide 100 by 20; you will get 5. Randomly select any number between 1 and 5. Suppose the number
you have picked is 4; that will be your starting number, so student number 4 has been selected. From there you will select every 5th
name until you reach the last one, number one hundred. You will end up with 20 selected students.
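The same procedure can be sketched in a few lines of Python. This is only a rough illustration under the assumptions of the example: a class list of 100, a desired sample of 20, and a random start between 1 and 5.

import random

# Hypothetical class list of 100 students, in alphabetical order.
students = [f"Student {i}" for i in range(1, 101)]

population_size = len(students)              # 100
sample_size = 20
interval = population_size // sample_size    # 100 / 20 = 5

start = random.randint(1, interval)          # random starting point between 1 and 5, e.g. 4
# Take every 5th student from the starting point: e.g. 4, 9, 14, ..., 99 (20 students in all).
sample = [students[i - 1] for i in range(start, population_size + 1, interval)]
print(len(sample), sample[:3])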

A stratified sample

A stratified sample is obtained by independently selecting a separate simple random sample from each population stratum. A
population can be divided into different groups based on some characteristic or variable, like income or education. For example,
anybody with up to ten years of education is in group A, between 10 and 20 years in group B, and between 20 and 30 years in group C. These groups are
referred to as strata. You can then randomly select from each stratum a given number of units, which may be based on proportion: for example,
if group A has 100 persons while group B has 50 and C has 30, you may decide to take 10% of each. So you end up with 10
from group A, 5 from group B and 3 from group C.
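Here is a minimal Python sketch of proportional stratified selection, assuming the hypothetical strata sizes from the example (100, 50 and 30 persons) and a 10% sampling fraction:

import random

# Hypothetical strata from the example: group A has 100 persons, B has 50, C has 30.
strata = {
    "A": [f"A-{i}" for i in range(1, 101)],
    "B": [f"B-{i}" for i in range(1, 51)],
    "C": [f"C-{i}" for i in range(1, 31)],
}

fraction = 0.10  # take 10% of each stratum
sample = {name: random.sample(units, k=round(len(units) * fraction))
          for name, units in strata.items()}

for name, units in sample.items():
    print(name, len(units))   # A: 10, B: 5, C: 3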

A cluster sample

A cluster sample is obtained by selecting clusters from the population on the basis of simple random sampling. The sample comprises
a census of each random cluster selected. For example, a cluster may be something like a village, a school, or a state. Suppose you decide
that all the elementary schools in New York State are clusters and you want 20 schools selected. You can use simple or systematic random
sampling to select the schools, and every school selected becomes a cluster. If your interest is to interview teachers on their opinion of
some new program which has been introduced, then all the teachers in each selected cluster must be interviewed. Though very economical, cluster
sampling is very susceptible to sampling bias. In the above case, for instance, you are likely to get similar responses from teachers in one
school because they interact with one another.
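A rough Python sketch of the two-stage idea, selecting clusters at random and then taking a census within each, is shown below. The number of schools and teachers per school is invented for illustration.

import random

# Hypothetical sampling frame: 500 elementary schools, each a cluster of 20 teachers.
schools = {f"School {i}": [f"Teacher {i}-{j}" for j in range(1, 21)] for i in range(1, 501)}

# Randomly select 20 schools (clusters)...
selected_schools = random.sample(list(schools), k=20)

# ...then take a census of every teacher within each selected cluster.
respondents = [teacher for school in selected_schools for teacher in schools[school]]
print(len(selected_schools), "schools,", len(respondents), "teachers to interview")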

PURPOSEFUL SAMPLING

Purposeful sampling selects information-rich cases for in-depth study. The size and specific cases depend on the study purpose.

There are about 16 different types of purposeful sampling. They are briefly described below so that you are aware of them. The details
can be found in Patton (1990), pp. 169-186.

Extreme and deviant case sampling This involves learning from highly unusual manifestations of the
phenomenon of interest, such as outstanding successes, notable failures, top of the class, dropouts, exotic
events, and crises.

Intensity sampling This selects information-rich cases that manifest the phenomenon intensely, but not extremely,
such as good students, poor students, above average/below average.

Maximum variation sampling This involves purposefully picking a wide range of variation on dimensions of
interest. It documents unique or diverse variations that have emerged in adapting to different conditions. It
also identifies important common patterns that cut across variations. In the example of interviewing Cornell
students, you may want to get students of different nationalities, professional backgrounds, cultures, work
experience and the like.

Homogeneous sampling This reduces variation, simplifies analysis, and facilitates group interviewing. Instead of
having the maximum number of nationalities as in the above case of maximum variation, it may focus
on one nationality, say Americans only.

Typical case sampling This involves taking a sample of what one would call typical, normal or average for a
particular phenomenon.

Stratified purposeful sampling This illustrates characteristics of particular subgroups of interest and facilitates
comparisons between the different groups.

Critical case sampling This permits logical generalization and maximum application of information to other
cases, along the lines of "If it is true for this one case, it is likely to be true of all other cases." You must have heard statements
like "if it happened to so-and-so, then it can happen to anybody", or "if so-and-so passed that exam, then anybody
can pass."

Snowball or chain sampling This identifies cases of interest from people who know people who
know what cases are information-rich, that is, good examples for study, good interview subjects. It is
commonly used in studies that may be looking at issues like homeless households. What you do is get
hold of one, and he/she will tell you where the others are or can be found. When you find those others, they will
tell you where you can get more, and the chain continues.

Criterion sampling Here, you set a criterion and pick all cases that meet it, for example, all ladies six feet
tall, all white cars, or all farmers that have planted onions. This method of sampling is very strong in quality
assurance.

Theory-based or operational construct sampling This involves finding manifestations of a theoretical construct of interest so
as to elaborate and examine the construct.

Confirming and disconfirming cases This involves elaborating and deepening an initial analysis: if you had already started
some study, you seek further information, confirm emerging issues which are not yet clear, seek
exceptions, and test variation.

Opportunistic sampling This involves following new leads during fieldwork and taking advantage of the
unexpected; it offers flexibility.

Random purposeful sampling This adds credibility when the purposeful sample is larger than one can handle. It
reduces judgement within a purposeful category, but it is not for generalization or representativeness.

Sampling politically important cases This type of sampling attracts attention to the study, or avoids attracting undesired
attention, by purposefully including or eliminating politically sensitive cases from the sample. These may be individuals or localities.

Convenience sampling This is useful in getting general ideas about the phenomenon of interest. For example, you
decide you will interview the first ten people you meet tomorrow morning. It saves time, money and effort, but it is
the poorest way of getting samples, has the lowest credibility and yields information-poor cases.

Combination or mixed purposeful sampling This combines various sampling strategies to achieve the desired
sample. It helps in triangulation, allows for flexibility, and meets multiple interests and needs. When selecting
a sampling strategy it is necessary that it fit the purpose of the study, the resources available, the question
being asked and the constraints being faced. This holds true for the sampling strategy as well as the sample size.

SAMPLE SIZE

Before deciding how large a sample should be, you have to define your study population, for example, all children below age three
in Tompkins County. Then determine your sampling frame, which could be a list of all the children below three as recorded by
Tompkins County. You can then struggle with the sample size.

The question of how large a sample should be is a difficult one. Sample size can be determined by various constraints. For example,
the available funding may prespecify the sample size. When research costs are fixed, a useful rule of thumb is to spend about one half
of the total amount on data collection and the other half on data analysis. This constraint influences the sample size as well as the
sample design and data collection procedures.

In general, sample size depends on the nature of the analysis to be performed, the desired precision of the estimates one wishes to
achieve, the kind and number of comparisons that will be made, the number of variables that have to be examined simultaneously,
and how heterogeneous a universe is sampled. For example, if the key analysis of a randomized experiment consists of computing
averages for experimentals and controls in a project and comparing the differences, then a sample under 100 might be adequate,
assuming that other statistical assumptions hold.

In non-experimental research, most often, relevant variables have to be controlled statistically because groups differ by factors other
than chance.

More technical considerations suggest that the required sample size is a function of the precision of the estimates one wishes to
achieve, the variability (or variance) one expects to find in the population, and the statistical level of confidence one wishes to use. The
sample size N required to estimate a population mean (average) with a given level of precision is given by:

    square root of N = (1.96 × σ) / precision,   that is,   N = [(1.96 × σ) / precision] squared

where σ is the population standard deviation of the variable whose mean one is interested in estimating. Precision refers to the
width of the interval one is willing to tolerate, and 1.96 reflects the 95% confidence level. For details on this please see Salant and
Dillman (1994).

For example, to estimate mean earnings in a population with an accuracy of $100 per year, using a 95% confidence interval and
assuming that the standard deviation of earnings in the population is $1,600, the required sample size is about 983:
[(1.96)(1600/100)] squared.
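The calculation can be sketched in a few lines of Python. This is only a minimal illustration of the formula above, not a general-purpose sample size routine; finite-population corrections and other refinements are ignored.

def required_sample_size(std_dev, precision, z=1.96):
    """Sample size needed to estimate a population mean to within +/- precision,
    using N = (z * sigma / precision) ** 2; z = 1.96 corresponds to a 95% confidence level."""
    return (z * std_dev / precision) ** 2

# Worked example from the text: sigma = $1,600, desired precision = $100 per year.
n = required_sample_size(std_dev=1600, precision=100)
print(round(n))   # about 983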

Deciding on a sample size for qualitative inquiry can be even more difficult than for quantitative inquiry because there are no definite rules to be
followed. It will depend on what you want to know, the purpose of the inquiry, what is at stake, what will be useful, what will have
credibility, and what can be done with the available time and resources. With fixed resources, which is always the case, you can choose to
study one specific phenomenon in depth with a smaller sample size, or use a bigger sample size when seeking breadth. In purposeful
sampling, the sample should be judged on the basis of the purpose and rationale of each study and the sampling strategy used to
achieve the study's purpose. The validity, meaningfulness, and insights generated from qualitative inquiry have more to do with the
information-richness of the cases selected and the observational/analytical capabilities of the researcher than with sample size.

CONCLUSION

In conclusion, it can be said that using a sample in research saves mainly money and time. If a suitable sampling strategy is used,
an appropriate sample size selected, and the necessary precautions taken to reduce sampling and measurement errors, then a sample
should yield valid and reliable information. Details on sampling can be obtained from the references included below and from many other
books on statistics or qualitative research which can be found in libraries.

References

Webster, M. (1985). Webster's Ninth New Collegiate Dictionary. Merriam-Webster Inc.

Salant, P. and D. A. Dillman (1994). How to conduct your own survey. John Wiley & Sons, Inc.

Patton, M. Q. (1990). Qualitative evaluation and research methods. SAGE Publications, Newbury Park, London, New Delhi.

Lapin, L. L. (1987). Statistics for modern business decisions. Harcourt Brace Jovanovich, Inc.
PHENOMENOLOGISTS BEHIND BARS:

how to do QUALITATIVE research

INTERVIEWS WITH

incarcerated YOUTHs
This tutorial is a first "crash course" in conducting qualitative research interviews with youths in jail.

It is not an exhaustive or complete guide, but a good start. For further information and additional tips, please contact
cb53@cornell.edu

Whether you are a student, a researcher, a writer, or a member of the juvenile justice system, if you have it in your heart to try to
understand incarcerated kids, you have come to the right webpage.

If the idea of interviewing kids in jail repels or horrifies you, you are not alone, and you are also on the wrong webpage...

If you are still reading, let's begin:

The information you are about to explore stems from a project called "Making sense of senseless youth violence". It was
conducted in New York State, by researchers at Cornell University, in Ithaca, NY. An important part of this project consisted of
trying to make sense of youth violence by going directly to violent youths, to ask them to tell us about youth violence in their
own words, from their life perspectives. After all, if we want to know about something, what better way than to start by going to
the source?

"You have not converted a man

because you have silenced him"

(anonymous)

Our project turns to the direct experience of incarcerated youth and asks: How do children become teenagers involved in
committing the serious violent offenses society is becoming increasingly familiar with, such as murder, attempted murder,
manslaughter and assault with weapons?

But wait...Before running to the jail, let's stop and think. What is it I'm about to do at the jail? And, why?

I wanted to start by confessing that I have very strong feelings about the need for directly involving violent kids in research on
youth violence, and I have strong feelings about the kids locked behind jail walls.

For one, I LIKE these kids, no matter what they have done. I respect them because of the journey they have walked. I know that
some people have little or no respect for these kids at all, mostly because of what the kids have done, and for what they
represent. But contrary to those people, I have a lot of respect for these kids. I don't like what they did -- Many have committed
horrible acts unto other human beings. But as a qualitative researcher and as a person, I look at these kids as being more than the
crimes they have done. There are life stories behind what has led them to violence. There are human stories of evolution towards
crime.

An important goal of social research is to make this world a better place. In youth violence research, that motivation translates
into the question: "Are we going to stop at blaming, or are we going to take it a step further and try to understand, in order to
possibly identify ways to prevent other youths from evolving towards crime?" That is the framework, the philosophy behind this
kind of research. The practice is the next logical step: I want to make sure that I do my best to hear what it is about this kid's life
that landed him in here-- I want to make sure that I get his own account of how he got here, in his own words, from his own
perspective. The method to help this practice bear fruit is a branch of qualitative research called phenomenology.

Phenomenology focuses on the question:"What is the structure and essence of experience of this phenomenon for these
people?" (Patton, M.Q., 1990). The phenomenon here is acts of violence committed by youths. Acts serious enough to land them
in jail. Husserl (1962) explained it as "the study of how people describe things and experience them through their senses." The
idea being that we can only really know what we experience first-hand. So, how youths understand the problem of youth
violence, and its solutions (don't forget we're trying to make the world better!), seen from their lenses: that is phenomenological
research.

Now that you are clearer on how to explain to the jail staff what your role is as a phenomenological researcher, proceed through
the following steps:

● write a project description, explain what information you want to collect;


● get a written agreement from the youth jail;
● have the jail prepare and mail parental consent forms to get permission for incarcerated minors to participate (some of
your respondents will be under 18. By law, all of them will be under 21);
● work out an arrangement with the jail director and with the guards to find a location, a place where interviews can be
conducted in privacy. Make sure the room is far enough away to avoid eavesdropping, but in sight to ensure safety regulations
are respected;
● buy a tape recorder (one with a microphone jack), a very small good quality microphone, several tapes, several batteries.
● Prepare your "sales pitch" on why the jail should let you have access to the kids (see below):

Sales Pitch:

"I don't want to read his story from a file, written by an understandably overworked social worker who maybe spent a brief
amount of time gathering a case history. I don't want to read the police reports, or the judge's reports, or even the previous
facility's reports. I don't believe you can really get to know kids from a file, anymore than you would really get to know me if
you read my school files, my personnel files, or even my psychologist's notes. My basic rule is: If I don't think it's going to work
with me, why do I think it's going to work for someone else? We all know that there is no better way to get to know someone
than to sit face to face and talk..."

When all the points above have been checked (X) on your to-do list, you can go on to organize and begin your interviews. The
interviews will provide all the phenomenological data you will need to begin piecing your research together. This data can then
be complemented by a good review of the literature on youth violence, and if you have a lot of time and resources, it can even
eventually be complemented by field trips to the areas your interviewees grew up in, to meet their families, teachers, and so on.
By then you will love this so much, that you will want to devote long chunks of time to immerse yourself in the youth's pre-jail
environment to conduct on-site research! At that time, you will move from phenomenology to ethnography. As a great practical
guide to making that jump, I highly recommend reading The Professional Stranger, by Michael Agar. Academic Press.

Ready? Get Set, Go!

CAN WE TALK?

INTERVIEWING INCARCERATED YOUTH AS HUMAN BEINGS

"Research Subjects have also been known to be people."

-From Halcom's Evaluation Laws

Time to prepare for your interviews:

What are we going to talk about???

Your interview format will be semi-structured, more an "agenda of topics" than a schedule of questions, and it will build upon
the information offered by the youth during the interview process.

Be prepared to spread your interviews over several weeks:

● People need time to get comfortable,


● not all the information will come out in one interview,
● repeated interviews facilitate recall!

Some of the key issues to be explored are these:

a. memories and family stories about birth and early childhood;

b. relationships and life events focusing on family (parents, siblings, extended family, divorce/separation, incarceration of family
members) and residential moves (schools, neighborhoods);

c. experience with violence arriving at the point of killing, to document "violence history" (earliest memories of being hurt, first
memories of hurting someone else, critical events);

d. exposure to "environmental poisons" (violent television and movies, gang exposure and involvement);

e. involvement in the child welfare and juvenile justice systems


f. moments, decisions, or missed opportunities that could have become "turning points";

g. religious and spiritual orientation, experiences, and activities, past, present, and projected future;

h. hopes for the future and experiences of meaningfulness in day-to-day life.

Suggested Themes to Cover during the interview, based on the Cornell youth violence project:

You can of course come up with your own themes, based on your personal research interests!

Eight themes:

1. Exposure to physical violence: Based upon the well-documented role of exposure to physical violence, including physical
abuse and exposure to violence in the media, as a threat to development (Terr, 1994) and as a linked factor in the development of
juvenile delinquency. The range is from -2 (severe exposure to physical violence) to +2 (minimal exposure to physical violence).

2. Experience of psychological maltreatment: Based upon repeated findings that psychological maltreatment in the form of
rejection, isolation, terrorizing, ignoring, and corrupting is the core issue in child maltreatment (Garbarino, Guttman, and
Seeley, 1986). The range will be from -2 (experience of multiple forms of severe psychological maltreatment) to +2 (absence of
any form of psychological maltreatment).

3. Existential crisis of shame and the threat of self annihilation: Based upon J. Gilligan's (1996) analysis of shame as the origin
of violence and violence as the only perceived means of avoiding the destruction of one's self by another person. Our range will
be from -2 (extreme shame and perceived threat of self annihilation) to +2 (positive identity based upon affirmed identity).

4. Narrative coherence: Based upon Cohler's (1991) finding that narrative coherence (i.e. one's ability to tell his or her own life
story as a way to make sense of how one's life unfolds) itself mediates the impact of life events on subsequent development and
coping. The range is from -2 (primitive and incoherent narrative account) to +2 (coherent and positive account).

5. Meaningfulness: Based upon van der Kolk's (1994) finding that trauma leading to unresolved crises of meaning (i.e. meaning
of one's life, meaning of one's purpose in life) is associated with impaired functioning and reduced resilience.

6. Planful Competence: Based upon Rutter's (1989) finding that the presence of planfulness (in the sense of making conscious,
future-oriented decisions about life events) differentiates between youth who show resilience in the face of difficult life
circumstances and youth who succumb to the negative potential of these life circumstances.

7. Spiritual Orientation: Based upon research relating spiritual orientation (i.e. a focus on one's inner life and a belief in a higher
power) to coping with adversity and trauma (Garbarino and Bedard, 1996).

8. Optimism: Based upon Bettelheim and Rosenfeld's (1994) analysis of the key role played by optimism as an orientation to
life events that supports resilience.
Time to meet your first kid!

Are we going to click? Are they going to talk?

Now that you know what information you want to gather, you have your agreement with the jail, the parental forms have come
back signed, and you have your tape recorder, tapes, batteries, note pad and pens, you are now ready to meet your first kid!

Very Important "DO's & DON'Ts" for Building Rapport

with Incarcerated Youths :

DO:

● SHOW RESPECT
● SHOW EMPATHY
● CREATE A CLIMATE OF POSITIVE REGARD
● PRACTICE NON-JUDGEMENT
● BE GENUINE, REAL, SINCERE

DON'T:

● TRY TO COME ACROSS AS 'COOL'


● TRY TO SPEAK SLANG
● AVOID EYE CONTACT
● OVERDO EYE CONTACT
● JUDGE

Always Remember:

I- CONNECT AS A HUMAN BEING

II- TALK AS A HUMAN BEING

III- ASK QUESTIONS AS A HUMAN BEING

IV- LISTEN LIKE A HUMAN BEING

V- RESPOND LIKE A HUMAN BEING

YOU DID IT!

Repeat all of the above steps over several months and dozens of incarcerated kids, and you will be on your way to creating your
very own website to teach others how to improve on this methodology!

If you have made it all the way down this page, congratulations! You have the patience and energy to give this type of research
your best try.

For further information, comments, and insight, please feel free to contact us!

cb53@cornell.edu

Last updated: May 16th, 2000


Observational Field Research

This web page is designed as an introduction to the basic issues and design options in observational research within natural settings.
Observational research techniques solely involve the researcher or researchers making observations. There are many positive aspects
of the observational research approach. Namely, observations are usually flexible and do not necessarily need to be structured around
a hypothesis (remember a hypothesis is a statement about what you expect to observe). For instance, before undertaking more
structured research a researcher may conduct observations in order to form a research question. This is called descriptive research. In
terms of validity, observational research findings are considered to be strong. Trochim states that validity is the best available
approximation to the truth of a given proposition, inference, or conclusion. Observational research findings are considered strong in
validity because the researcher is able to collect a depth of information about a particular behavior. However, there are negative
aspects. There are problems with reliability and generalizability. Reliability refers to the extent to which observations can be replicated.
Seeing behaviors occur over and over again may be a time consuming task. Generalizability, or external validity, is described by
Trochim as the extent that the study's findings would also be true for other people, in other places, and at other times. In
observational research, findings may only reflect a unique population and therefore cannot be generalized to others. There are also
problems with researcher bias. Often it is assumed that the researcher may "see what they want to see." Bias, however, can often be
overcome with training or electronically recording observations. Hence, overall, observations are a valuable tool for researchers.

First, this web page will discuss the appropriate situations in which to use observational field research. Second, the various types of
observational research methods are explained. Finally, observational variables are discussed. This page's emphasis is on the collection
rather than the analysis of data.

After reading this web page, you should be able to

1. Understand the advantages and disadvantages of observational research compared to other research methods.
2. Understand the strengths and weaknesses in the validity of observational research findings.
3. Know what Direct Observation is and some of the main concerns of using this method.
4. Know what Continuous Monitoring is and what types of research it is appropriate for.
5. Understand Time Allocation research and why you would want to use it.
6. Know why unobtrusive research is a sticky proposition.
7. Understand the validity issues when discussing unobtrusive observation.
8. Know what to do in a behavior trace study.
9. Consider when to conduct a disguised field experiment.
10. Know the observational variables.

Should you or shouldn't you collect your data through observation?

Questions to consider:

Is the topic sensitive?


Are people uncomfortable or unwilling to answer questions about a particular subject? For instance, many people are
uncomfortable when asked about prejudice. Self-reports of prejudice often bring biased answers. Instead, a researcher may
choose to observe black and white students interactions. In this case, observations are more likely to bring about more
accurate data. Thus, sensitive social issues are better suited for observational research.
Can you observe the Phenomena?
You must be able to observe what is relevant to your study. Let's face it, you could observe and observe, but if you never see
what you're studying, you're wasting your time. You can't see attitudes, although you can observe behaviors and make
inferences about attitudes. Also, you can't be everywhere. There are certain things you can't observe. For example, questions
regarding sexual behavior are better left to a survey.
Do you have a lot of time?
Many people don't realize that observational research may be time consuming. In order to obtain reliability, behaviors must
be observed several times. In addition, there is also a concern that the observer's presence may change the behaviors being
observed. As time goes on, however, the subjects are more likely to grow accustomed to your presence and act normally. It
is in the researcher's best interest to observe for a long period of time.
Are you not sure what you're looking for?
That's okay! Known as descriptive research, observations are a great way to start a research project. Let's say you are
interested in male and female behavior in bars. You have no idea what theory to use or what behavior you are interested in
looking for. So, you watch, and, wow, you see something. Like the amount of touching is related to alcohol consumption.
So you run to the library, gather your research, and maybe decide to do more observations or supplement your study with
surveys. Then, these observations turn into a theory once they are replicated (well, it's not quite that simple). So you see,
observations are a good place to start.

Types of Observations

Okay, so you've decided that you think observational research is for you. Now you only have to pick which kind of observation to do.

● Direct (Reactive) Observation


In direct observations, people know that you are watching them. The only danger is that they are reacting to you. As stated
earlier, there is a concern that individuals will change their actions rather than showing you what they're REALLY like. This
is not necessarily bad, however. For example, the contrived behavior may reveal aspects of social desirability, how they feel
about sharing their feelings in front of others, or privacy in a relationship. Even the most contrived behavior is difficult to
maintain over time. A long term observational study will often catch a glimpse of the natural behavior. Other problems
concern the generalizability of findings. The sample of individuals may not be representative of the population or the
behaviors observed are not representative of the individual (you caught the person on a bad day). Again, long-term
observational studies will often overcome the problem of external validity. What about ethical problems you say? Ethically,
people see you, they know you are watching them (sounds spooky, I know) and they can ask you to stop.

Now here are two commonly used types of direct observations:

1. Continuous Monitoring:
Continuous monitoring (CM) involves observing a subject or subjects and recording (either manually, electronically,
or both) as much of their behavior as possible. Continuous monitoring is often used in organizational settings, such
as evaluating performance. Yet this may be problematic due to the Hawthorne Effect. The Hawthorne Effect states
that workers react to the attention they are getting from the researchers and, in turn, productivity increases.
Observers should be aware of this reaction. Other CM research is used in education, such as watching teacher-
student interactions, and in nutrition, where researchers record how much an individual eats. CM is relatively easy
but a time-consuming endeavor. You will be sure to acquire a lot of data.
2. Time Allocation:
Time Allocation (TA) involves a researcher randomly selecting a place and time and then recording what people
are doing when they are first seen and before they see you. This may sound rather bizarre, but it is a useful tool
when you want to find out the percentage of time people spend doing things (i.e. playing with their kids, working, eating,
etc.). There are several sampling problems with this approach. First, in order to make generalizations about how
people are spending their time, the researcher needs a large representative sample. Sneaking up on people all over
town is a tough way to spend your days. In addition, questions such as when, how often, and where you should
observe are often a concern. Many researchers have overcome these problems by using nonrandom locations but
randomly visiting them at different times.

● Unobtrusive Observation:
Unobtrusive measures involve any method for studying behavior where individuals do NOT know they are being observed
(don't you hate to think that this could have happened to you!). Here, there is no concern that the observer may change
the subject's behavior. When conducting unobtrusive observations, issues of validity need to be considered. Numerous
observations of a representative sample need to take place in order to generalize the findings. This is especially difficult
when looking at a particular group. Many groups possess unique characteristics which make them interesting studies. Hence,
such findings are often not strong in external validity. Also, replication is difficult when using non-conventional measures
(non-conventional meaning unobtrusive observation). Observations of very specific behaviors are difficult to replicate in
studies, especially if the researcher is a group participant (we'll talk more about this later). The main problem with
unobtrusive measures, however, is ethical. Issues involving informed consent and invasion of privacy are paramount here.
An institutional review board may frown upon your study if it is not really necessary to leave your subjects uninformed.

Here is a description of two types of unobtrusive research measures you may decide to undertake in the
field:

1. Behavior Trace studies:


Behavior trace studies involve finding things people leave behind and interpreting what they mean. This can be
anything from vandalism to garbage. The University of Arizona Garbage Project is one of the most well-known trace
studies. Anthropologists and students dug through household garbage to find out about such things as food
preferences, waste behavior, and alcohol consumption. Again, remember that in unobtrusive research individuals
do not know they are being studied. How would you feel about someone going through your garbage? Surprisingly,
Tucson residents supported the research as long as their identities were kept confidential. As you might imagine,
trace studies may yield enormous amounts of data.
2. Disguised Field Observations:
Okay, this gets a little sticky. In disguised field analysis the researcher pretends to join, or actually is a member of, a
group and records data about that group. The group does not know they are being observed for research purposes.
Here, the observer may take on a number of roles. First, the observer may decide to become a complete-participant
in which they are studying something they are already a member of. For instance, if you are a member of a sorority
and study female conflict within sororities you would be considered a complete-participant observer. On the other
hand you may decide to only participate casually in the group while collecting observations. In this case, any
contact with group members is by acquaintance only. Here you would be considered an observer-participant.
Finally, if you develop an identity with the group members but do not engage in important group activities consider
yourself a participant-observer. An example would be joining a cult but not participating in any of their important
rituals (such as sacrificing animals). You are, however, considered a member of the cult and trusted by all of the
members. Ethically, participant-observers have the most problems. Certainly there are degrees of deception at
work. The sensitivity of the topic and the degree of confidentiality are important issues to consider. Watching
classmates struggle with test-anxiety is a lot different than joining Alcoholics Anonymous. In all, disguised field
experiments are likely to yield reliable data but the ethical dilemmas are a trade-off.
An Interesting Side Note:

The protection of human rights from unethical research practices was heightened as a
consequence of the Nazi regime in Germany. The Nuremberg Code was adopted
following the trials of the twenty-three Nazi physicians convicted of crimes against
humanity. This Code provided a statement concerning the rights of human participants to
be informed and freely choose to participate in research. The Nuremberg Code has since
influenced policies of ethical research practices in several countries.

Federal Register (1991). Federal policy for the protection of human subjects; notices and rules, part II. Federal
register, 56, 28001-28032.

Observational Variables

Before you start on a research project, make sure you know how you are going to interpret your observations.

1. Descriptive:
Descriptive observational variables require no inference making on the part of the researcher. You see something and write
it down.
2. Inferential:
Inferential observational variables require the researcher to make inferences about what is observed and the underlying
emotion. For example, you may observe a girl banging on her keyboard. From this observation you may assume (correctly)
that she is frustrated with the computer.
3. Evaluative:
Evaluative observational variables require the researcher to make an inference and a judgment from the behavior. For
example, you may question whether computers and humans have a positive relationship. "Positive" is an evaluative
judgment. You observe the girl banging on her keyboard and conclude that humans and computers do not have a positive
relationship (you know you must replicate these findings!).

When writing field notes, the researcher should include descriptive as well as inferential data. It is important to describe the setting
and the mood in a detailed manner. Anything that may change behavior needs to be noted. In particular, reflect upon your own
presence: do you think that you changed the behavior noticeably?

Okay, so this is a lot to remember. Go back up to the check-list of "things you should be able
to..." and ask yourself some questions. Remember, observations are a great way to start and add
to a research project.

Good luck observing!

References
and Suggested Reading

Babbie, E. (1992). The practice of social research. (6th ed.). Chapter 11. California: Wadsworth.

Bernard, R. (1994). Research methods in anthropology. (2nd ed.) Chapters 14-15. California: AltaMira.
Gall, M., Borg, W. R., & Gall, J. (1996). Educational research. (6th ed.). Chapter 9. New York: Longman.

Montgomery, B. & Duck, S. (1991). Studying interpersonal interaction. Chapter 11. New York: Guilford.

And HIGHLY RECOMMENDED is Trochim's Knowledge Base which is packed with information about validity and research
design.

Laura Brown
Comments: LAB19@Cornell.Edu
Thanks for Coming!
A Plethora of Threats: A Mildly Amusing Guide for the Weary Student and Anyone
Else Encountering the How To's and What If's of Construct Validity

*Warning: This web page may cause severe gastrointestinal disorders, bloodshot eyes and various other stress-related pains -- particularly for those
who are just about to engage in their thesis research (and thought they had thought of everything!). Anyone planning on finishing graduate school in
less than 10 years should consult Dr. Daniels (Jack, of course) before reading further.

**Also note: The events and characters portrayed here are purely fictional. If anyone or any situation resembles you or your own situation in any way --
join the club.

FINALLY....

On to the topic at hand. Picture this if you will:

A bubbly, over-eager graduate student, we'll call her Susie, comes bouncing into her first committee meeting. Enthusiastically, she recites her ideas for her master's thesis:

"I am planning on doing a study which looks at increasing the level of parents' involvement in their children's' lives, particularly parents of Head Start students. I figure I will take
a couple of Head Start classrooms in the area and provide the parents with a series of parenting classes. The topics of the parenting classes could include things like: practicing
positive discipline, cooking with kids, how to handle problematic peer relationships, and multicultural curriculum. I could measure their involvement with their children both
before and after the classes and see if my program made a difference. Sounds great, Eh?!"

The committee members glanced at each other with furrowed brows. Dr. Doolittle was the first to comment: "What about validity Susie? How do you plan on
addressing validity issues in your study?"

"Validity? What do you mean exactly?", said Susie, slumping down like a deflated balloon.

"Well," said Dr. Doolittle, "Maybe we should start at the beginning. Validity is the best available approximation to the truth of a given proposition, inference or conclusion (click
here for a quick overview of validity -- this will be necessary if you have little exposure to the topic of validity). Generally, validity is subdivided into four parts: construct validity,
conclusion validity, internal validity, and external validity. Now, ...".

"Um, excuse me Dr.D," interjected Dr. Muffy, a sweet, older woman who felt as if she should slow down the good doctor before he gave her favorite graduate student an
information overload. "Why don't we just start with construct validity today. In fact why don't we even narrow it down further to the THREATS TO CONSTRUCT VALIDITY?"

"Sure, sure," replied Dr. Doolittle. "Where was I? Construct validity refers to generalizing from your program or measures to the concept of your program or measures; it is an
issue of labeling. In other words, when you deal with the concept of construct validity you can ask yourself the question -- DID I MEASURE WHAT I THOUGHT I MEASURED?

There are many issues and topics to consider when addressing construct validity. There are the types of measurement validity, the nomological network, the multitrait-multimethod
matrix, and then of course there is pattern matching...."

"But we agreed that we are going to focus on the THREATS to construct validity today --RIGHT DR. DOOLITTLE????" Dr. Muffy was getting a bit impatient at this point. She
knew what a complicated topic validity was, and how overwhelming it could be for someone who was first starting out.

"Right, threats."

"And only hit the major threats, please," Dr. Muffy continued, "I would like to go to lunch sometime this year." Susie shot her a grateful look and settled back to listen to Dr. D
spew about these "threats".
"Threats refer to questions and issues that may be raised by critics (both of the friendly and non-friendly variety) of your
research. There are 10 threats we will address today. The first one is INADEQUATE PREOPERATIONAL EXPLICATION OF
CONSTRUCTS...."

"I'm going to operate on who, why?" Susie interrupted.

"No, dear," said Dr.M, "that's just a fancy way of saying that you didn't do a very good job of operationally defining your
constructs. In your case, you will need to think through concepts such as 'parent involvement'. What do you mean by parent
involvement? Do others share your view? In order to address this threat, it often helps to elicit expert opinions and use specific
methods (such as concept mapping) to better define the construct."

"Right, as I was saying," interjected Dr.D, always insisting on being the center of attention, "the next two threats are MONO-OPERATION bias and MONO-METHOD bias..."

"Basically, the mono-operation threat refers to using only one version of your treatment. In the case of your proposed study this would refer to having only one version of
parenting classes; critics may argue that any conclusions drawn may refer to your particular version of the program, not the actual construct. In order to avoid this threat, you
could employ multiple versions of your parenting program.

On the other hand, a mono-method threat is basically the same thing as a mono-operation threat, except that it refers to the inadequacy of using a single measure to look at a
particular concept. Your critics may ask you how you can be sure you are measuring parent involvement, if you are only using one measure. The answer? Use several methods!"

"That doesn't sound too bad, " said Susie, "but I have a feeling that's not the whole story..."

"That's not even the half of it!" exclaimed Dr. Doolittle. "Oh no indeed, next there are the INTERACTION THREATS. The first of these refers to interactions between different
treatments. You need to be sure that the results you are obtaining are a result of your parenting classes and not some combination of activities in which your parents are involved.
For example, maybe the parents in your study are also participating in activities at church or through a neighborhood organization. These activities may encourage more
involvement with their children as well. It may be that the reinforcement of both programs working in conjunction with each other is what prompts them to increase their
involvement with their children.

The other type of interaction that may occur is an interaction of the testing and the treatment. Simply, by giving your parents a pretest you may heighten their awareness and
sensitivity to parent involvement. Because you have made them aware of the time they spend with their children, it may cause parents to reflect and increase involvement levels.
When you label your parenting classes the program, you are leaving out the pre-test, which also may be influencing the construct."

"Very well said Dr. Doolittle, "commented Dr.M, "Take a break and I will continue with RESTRICTED GENERALIZIBILITY and CONFOUNDING CONSTRUCTS."

"I think I should have stayed in bed this morning..." replied Susie. "Am I ever going to get the hang of all this???? "

(If at this point YOU are considering bagging the whole thesis, dial 1-800-IWANT-MOM)

"Of course you are and since I have been in this business awhile, I even have a 'cheat sheet' for you to take home with you (Click here for a copy of Dr.M's Cheat Sheet). In the
meantime, sit back and try to catch the basics. The next threat is restricted generalizability across constructs. This refers to unanticipated consequences. For instance, your
parenting classes may in fact increase the level of parents' involvement with their children, but they may also increase the number of arguments between spouses because one
spouse (the one involved in the classes) may accuse the other spouse of not spending enough time with their children. This threat reminds you to be careful about whether the
observed effects ("parenting classes are a good thing") could be generalized to other outcomes (marital satisfaction).

Finally, there is the confounding constructs and levels of constructs threat. Imagine that you carried out your project, and the data analysis revealed that the parenting classes
really did nothing to increase involvement. You'd be bummed and your first reaction would probably be to chalk it up to another dissertation that proved nothing. However, it may
not be that parenting classes are useless, it may be that you didn't conduct ENOUGH parenting classes to see the desired effect. Therefore, it is not appropriate to label parenting
classes as a "waste". Get it?"

"I got it, or at least I think I am starting to get it. More importantly, did I hear you say FINALLY and did that mean we are close to the end of this madness for today?" Susie
queried.

"Close," laughed Dr. Muffy. "The last three threats we will cover are known as 'social threats'. These threats all represent the joy of doing research with human beings. The first
of these is HYPOTHESIS GUESSING."

"Let me guess, " joked Susie. "It is when the people in your study guess what you are looking at and their actions reflect their guess. In my case, that would mean that parents
guess I am trying to measure involvement with their children and they purposefully get more involved, not because of my parenting classes, but because of their inference."

"Excellent!" cried Dr.Doolittle, who had been uncharacteristically quiet. "Why don't I finish up with the last two threats." Without waiting for an answer, he proceeded.

"EVALUATION APPREHENSION is next. Many people get really anxious about being in a study -- they are afraid they won't look good or smart, or in this case that they won't
appear to be good parents. In their desire to look like the model participant their behavior and actions may not reflect reality. Again, this is a labeling problem because you label
"increased participation" as a program effect (when really it is not).

FINALLY, for real this time, one must address the RESEARCHER EXPECTANCIES threat. Without knowing it, you may bias your study. For example, you may become really
enthusiastic when you discuss parent involvement. This may send a message to parents that you think involvement is a "good thing" and may prompt them to act accordingly.
Again, this means you will label the involvement as a program effect, when in truth it is your overwhelming enthusiasm. SO...what do you think Susie?"

"I think I am overwhelmed."

"Overwhelmed is understandable, but don't panic!" cautioned Dr. Muffy.

"Understanding validity is a long process and there are many sources out there to help you get a grip on things. As a matter of fact, another professor here at Cornell, Bill
Trochim, has quite a bit of information on his web site that will be useful to you (click here to go to Bill's Page). In the meantime, I will leave you with my Top 5 Golden Rules of
Validity:

Dr. Muffy's 5 Golden Rules of Addressing Threats to Construct Validity:

1. Validity is something you argue -- not prove. You can't please all the people all the time.

2. Do the best you can with what you have, where you are. In other words, with limited time and resources (the constraints of grad school and the real
world) you aren't going to eliminate every single threat to construct validity. Try your best, and be able to back up what you did.

3. An ounce of prevention is worth a pound of Tylenol at your defense. In other words, think about threats to construct validity (and other kinds of
validity) NOW rather than later.

4. Two heads are better than one when brainstorming threats to validity -- talk to others, get help!

5. There is no "end of the road" when learning about validity -- it's the process, not the final amount of knowledge -- that counts.

Send any questions or comments to Nicole M. Driebe. Good luck!


A QUICK AND DIRTY GUIDE TO VALIDITY:

Here are some of the basics to get you up to speed on validity. For a more detailed look at the topic, click here: A more in-depth look at validity.

VALIDITY:

The best approximation to the truth of a given proposition, inference or conclusion.

GOOD THINGS TO REMEMBER:

● Measures, samples, and designs DON'T have validity -- they may lead to valid conclusions, but in and of themselves
they are not valid
● There are four types of validity and they build on each other

● Validity is something to be argued, not proven

● Validity can be thought of as a set of standards by which you judge things

FOUR SUBDIVISIONS OF VALIDITY:

● CONCLUSION VALIDITY: This type of validity asks the question -- Is there a relationship between two variables
in a given study? In Susie's project we would ask: Is there a relationship between participation in parenting classes and
parent involvement?

● INTERNAL VALIDITY: Internal validity refers to the question -- Assuming that there is a relationship between variables, is the
relationship a causal one? This would lead us to ask: Do parenting classes cause parent involvement?

● CONSTRUCT VALIDITY: Construct validity looks at the question -- Assuming there is a relationship in this study, and it is causal,
did our program reflect our idea of the construct of program and did our measure reflect the idea of our construct? Or, in our
example, Did the parenting classes, or treatment, reflect our construct of parenting classes and did what we measured as "parent
involvement" reflect our construct of "parent involvement"?

● EXTERNAL VALIDITY: External validity is concerned with GENERALIZABILITY and asks -- Assuming that there is a relationship,
and it is causal, and our constructs are adequately represented, can we generalize to other situations and people? Here, we would ask:
Would parenting classes increase parent involvement in other classrooms with other parents?

THREATS TO VALIDITY:

Threats to validity raise the what about? and what if? questions that people often ask in terms of a research study. They are also related to the
"how come you didn't think of this?" questions that committee members ask at a defense. More generally, threats to validity are the possible
reasons that the inferences made from a research project may be wrong. These threats come in many shapes and sizes; threats to construct
validity are the focus of the previous page (click here to return to the previous page: Threats to Construct Validity).
CHEAT SHEET: A QUICK REFERENCE GUIDE TO THE THREATS TO CONSTRUCT VALIDITY

● INADEQUATE PREOPERATIONAL EXPLICATION OF CONSTRUCTS:


In English: You didn't do such a hot job of operationally defining your constructs.
So What? Your critics could accuse you of not clearly thinking through your constructs.
Our Example: Susie needs to think through her constructs of parent involvement and parenting classes; among other things
she needs to consult with experts so she is looking at what she thinks she is looking at.

● MONO-OPERATION BIAS:
In English: You only applied one version of your program.
So What? One could argue that the results of your study only reflect the particular version of the program that you used.
Our example: Susie needs to think about employing more than one version of her parenting program to strengthen her argument
against this threat.

● MONO-METHOD BIAS:
In English: You only used one measure to look at your construct of interest (for example, one questionnaire, one observation, etc.).
So What? Maybe you didn't measure what you thought you measured...
Our example: In this case, Susie needs to measure parent involvement in more than one way -- perhaps through a
questionnaire and an observation.

● INTERACTION OF DIFFERENT TREATMENTS:


In English: It was really a combination of things -- your program and other factors -- that produced the desired effect.
So What? You think you did more than you really did (stop patting yourself on the back so much!).
Our example: It may have been a combination of things, including parents' participation in other organizations, in
conjunction with Susie's program, that yielded the desired effect.

● INTERACTION OF TESTING AND TREATMENT:


In English: Your testing, in combination with the actual program, produced the desired effects.
So What? You are labeling incorrectly -- when you say program you mean just the program, when in actuality the program
includes the testing.
Our Example: By pretesting her parents, Susie may raise their level of awareness of parent involvement. It may be because
of this heightened awareness, and not necessarily Susie's program, that the desired effects emerge.

● RESTRICTED GENERALIZABILITY ACROSS CONSTRUCTS:


In English: Things happened that you didn't necessarily predict, and although your program had a positive effect on the
construct of interest, that "positiveness" may not be generalizable across constructs.
So What? BE CAREFUL!!
Our Example: Susie's parenting classes may increase the level of parent involvement, but they may also increase the amount
of marital distress (as one spouse may accuse the other of not spending enough time with the kids). This threat reminds us to
be careful about whether the desired "goodness" or "badness" of an outcome can be generalizable across constructs.

● CONFOUNDING CONSTRUCTS AND LEVELS OF CONSTRUCTS:


In English: It wasn't necessarily that the program or treatment you applied did or didn't work, but rather that the AMOUNT
or LEVEL of the treatment didn't work.
So What? Maybe you can start patting yourself on the back again -- it may be better than you think!
Our Example: If Susie's data analysis showed that her program had no effect, she may become unnecessarily discouraged. It
may have simply been that she did not offer ENOUGH parenting classes, but the concept of parenting classes is still a good
one.

● HYPOTHESIS GUESSING:
In English: The members of your study guessed what you were trying to examine and based their behavior on this guess.
So What? You think your program "did the trick", when in actuality just KNOWING what your program was SUPPOSED
to do "did the trick".
Our Example: If the parents guessed that Susie was looking at parent involvement, they may alter their level of involvement
accordingly.

● EVALUATION APPREHENSION:
In English: People have this burning desire to look cool, smart, knowledgeable, etc.
So What? Because people are trying to look socially acceptable, they may not be acting the way they really do when
they are not participating in a research project.
Our Example: The parents in the study may say or act like they are highly involved with their kids, just to "look good",
regardless of the effect of the program.

● EXPERIMENTER EXPECTANCIES:
In English: Without knowing it, you encourage your subjects to respond in a certain way.
So What? Again, participants are not acting or responding as they would in reality, distorting your data.
Our Example: Susie may act overly enthusiastic in regards to parent involvement and therefore influence the behavior of her
parents.

If you have had enough fun on this page for one day, and you want to continue with the engaging saga of Susie, click here
Threats to Validity .
How Stable and Consistent Is Your Instrument?
A Brief Look at Reliability

This web page was designed to provide you with basic information on an important characteristic of a good measurement instrument: reliability. Prior to starting any
research project, it is important to determine how you are going to measure a particular phenomenon. This process of measurement is important because it allows you to
know whether you are on the right track and whether you are measuring what you intend to measure. Both reliability and validity are essential for good measurement,
because they are your first line of defense against forming inaccurate conclusions (i.e., incorrectly accepting or rejecting your research hypotheses). Although this tutorial
will only address general issues of reliability, you can access more detailed information by clicking on the words or titles that are highlighted.

What is Reliability?

I am sure you are familiar with terms such as consistency, predictability, dependability, stability, and repeatability. Well, these are the terms that come to mind when we
talk about reliability. Broadly defined, the reliability of a measurement refers to the consistency or repeatability of the measurement of some phenomenon. If a measurement
instrument is reliable, that means the instrument can measure the same thing more than once or using more than one method and yield the same result. When we speak of
reliability, we are not speaking of individuals, we are actually talking about scores.

The observed score is one of the major components of reliability. The observed score is just that: the score you would observe in a research setting. The observed score
is composed of a true score and an error score. The true score is a theoretical concept. Why is it theoretical? Because there is no way to really know what the true score is
(unless you're God). The true score reflects the true value of a variable. The error score is the reason why the observed score is different from the true score. The error score is
further broken down into method (or systematic) error and trait (or random) error. Method error refers to anything that causes a difference between the observed score and
true score due to the testing situation. For example, any type of disruption (loud music, talking, traffic) that occurs while students are taking a test may cause the students to
become distracted and may affect their scores on the test. On the other hand, trait error is caused by any factors related to the characteristic of the person taking the test
that may randomly affect measurement. An example of trait error at work is when individuals are tired, hungry, or unmotivated. These characteristics can affect their
performance on a test, making the scores seem worse than they would be if the individuals were alert, well-fed, or motivated.

Reliability can be viewed as the ratio of the true score over the true score plus the error score, or:

reliability = true score / (true score + error score)

Okay, now that you know what reliability is and what its components are, you're probably wondering how to achieve reliability. Simply put, the degree of reliability can be
increased by decreasing the error score. So, if you want a reliable instrument, you must decrease the error.

As previously stated, you can never know the actual true score of a measurement. Therefore, it is important to note that reliability cannot be calculated; it can only be
estimated. The best way to estimate reliability is to measure the degree of correlation between the different forms of a measurement. The higher the correlation, the higher
the reliability.
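To make both the ratio and the correlation-based estimate concrete, here is a minimal sketch (not part of the original tutorial) that simulates observed scores as true score plus random error. The specific numbers, the variable names, and the use of NumPy are illustrative assumptions; the point is simply that the correlation between two parallel measurements of the same people approximates true-score variance divided by total variance.

```python
# Minimal sketch: observed score = true score + random error.
# The correlation between two "parallel" measurements of the same people
# approximates true-score variance / (true-score variance + error variance).
import numpy as np

rng = np.random.default_rng(0)
n_people = 10_000

true_scores = rng.normal(loc=50, scale=10, size=n_people)  # unobservable in practice
error_1 = rng.normal(loc=0, scale=5, size=n_people)        # random error, measurement 1
error_2 = rng.normal(loc=0, scale=5, size=n_people)        # random error, measurement 2

observed_1 = true_scores + error_1
observed_2 = true_scores + error_2

# Theoretical reliability: true variance / (true variance + error variance)
theoretical = 10**2 / (10**2 + 5**2)   # = 0.80

# Estimated reliability: correlation between the two parallel measurements
estimated = np.corrcoef(observed_1, observed_2)[0, 1]

print(f"theoretical reliability = {theoretical:.2f}")
print(f"estimated reliability   = {estimated:.2f}")
```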

3 Aspects of Reliability

Before going on to the types of reliability, I must briefly review three major aspects of reliability: equivalence, stability, and homogeneity. Equivalence refers to the degree of
agreement between two or more measures administered at nearly the same time. For stability, a distinction must be made between the repeatability of the measurement and
the repeatability of the phenomenon being measured; one way this is achieved is by employing two raters. Lastly, homogeneity deals with assessing how well the different items
in a measure seem to reflect the attribute one is trying to measure. The emphasis here is on internal relationships, or internal consistency.

Types of Reliability

Now back to the different types of reliability. The first type of reliability is parallel forms reliability. This is a measure of equivalence, and it involves administering two
different forms to the same group of people and obtaining a correlation between the two forms. The higher the correlation between the two forms, the more equivalent the
forms.

The second type of reliability, test-retest reliability, is a measure of stability which examines reliability over time. The easiest way to measure stability is to administer the
same test at two different points in time (to the same group of people, of course) and obtain a correlation between the two tests. The problem with test-retest reliability is
the amount of time you wait between testings. The longer you wait, the lower your estimation of reliability.

Finally, the third type of reliability is inter-rater reliability, a measure of homogeneity. With inter-rater reliability, two people rate a behavior, object, or phenomenon and
determine the amount of agreement between them. To determine inter-rater reliability, you take the number of agreements and divide them by the number of total
observations.
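As a quick illustration of the agreements-divided-by-total-observations calculation just described, here is a small sketch with invented ratings (the category labels and the two raters' data are hypothetical, not taken from this page).

```python
# Inter-rater reliability as the proportion of agreements between two raters.
rater_a = ["on-task", "off-task", "on-task", "on-task", "off-task", "on-task"]
rater_b = ["on-task", "off-task", "off-task", "on-task", "off-task", "on-task"]

agreements = sum(a == b for a, b in zip(rater_a, rater_b))
inter_rater_reliability = agreements / len(rater_a)

print(f"{agreements} agreements out of {len(rater_a)} observations "
      f"-> inter-rater reliability = {inter_rater_reliability:.2f}")
```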

The Relationship Between Reliability and Validity

The relationship between reliability and validity is a simple one to understand: a measurement can be reliable, but not valid. However, a measurement must first be reliable
before it can be valid. Thus reliability is a necessary, but not sufficient, condition of validity. In other words, a measurement may consistently assess a phenomenon (or
outcome), but unless that measurement tests what you want it to, it is not valid.

Remember: When designing a research project, it is important that your measurements are both reliable and valid. If they aren't, then your instruments are basically
useless and you decrease your chances of accurately measuring what you intended to measure.

For more detailed information on reliability and validity, try perusing the Knowledge Base Home Page. Or, click here if you're interested in
looking at other research methods tutorials.
The NEP and Measurement Validity

by Lisa Pelstring
Spring Semester 1997

This web site was designed for a course in Research Methods at Cornell University's Health and Human Services Department. For a
more detailed description of the New Environmental Paradigm, environmental attitudes, and various scales used to measure
environmental attitudes, please see the web site Measuring Environmental Attitudes: The NEP . This page builds on that web site and
examines measurement validity addressed in a study on the NEP by Dunlap and Van Liere.

● An Introduction to Measurement
● The New Environmental Paradigm
● The NEP Scale
● So What is a Valid Scale?
● Construct Validity
● Predictive Validity
● Face Validity
● Summary
● Return to Trochim Home Page

An Introduction to Measurement

In general, scales are meant to "weigh" an object. In social science, scales are used to "weigh" or gauge a behavior or a personality
quality like self-esteem, for example. In the late 1970s, many researchers began to examine environmental attitude and potential
ways to "gauge" this concept. Think about it for a moment, if you wanted to measure environmental attitude...

● How would you go about this?


● How would you even define environmental attitude?
● What would be some of the questions you might ask an individual to measure this concept?
● How do you know those questions are the "right" ones to ask to get at this concept of environmental attitude?

As you can see, it is very tricky to measure--let alone define--something like environmental attitude. Getting the "right" answers to
the questions above means operationalizing the construct of environmental attitude accurately: defining exactly what you mean by
environmental attitude and developing a scale that captures this concept. In other words, it means developing a scale that is "valid" and
accurately able to measure the concept of interest--in this case, environmental attitude.

When we think about measurement validity we are essentially talking about construct validity--"the approximate truth of the
conclusion that your operationalization accurately reflects its construct" (Trochim web site). Clearly, if we want to measure
environmental attitude, we first need to operationalize it or define exactly what we think an individual's environmental attitude might
be. Luckily, after a literature search on environmental attitude, we have found one study that has operationalized environmental
attitude and developed a scale to measure it.

The New Environmental Paradigm

In the 1960s and 1970s, social scientists' interest in the concept of environmental attitude increased. There was a great deal of concern
relating to the environment during this period: Ohio's Cuyahoga River caught fire in 1969, capturing national attention; the first
Earth Day was held in 1970; the National Environmental Policy Act was signed that same year; and energy conservation became a
primary goal in the mid and late 1970s as oil embargoes severely impacted the nation. As a result of these and many other incidents,
funding for research directed at the environment and human interaction with the environment became more of a priority.

In 1978, social scientists Dunlap and Van Liere published an article in The Journal of Environmental Education that summarized
their efforts to measure a fairly new environmental mind-set they and other researchers believed was becoming a predominant
influence. At the time, many social scientists believed that a "paradigmatic" shift--a change in many people's way of thinking--was
occurring. People were becoming disenchanted with the so-called "Dominant Social Paradigm," which emphasized human ability to
control and manage the environment, limitless natural resources, private property rights, and unlimited industrial growth.

The New Environmental Paradigm, on the other hand, emphasized environmental protection, limited industrial growth, and
population control, among other issues. The two social scientists developed the New Environmental Paradigm scale to measure this
mind-set. Since its development, the scale has been used in many other studies--both replicating as well as modifying the scale. Many
of the studies conducted since then have questioned whether in fact a paradigmatic shift is occurring or has occurred. But most
researchers agree that the scale developed by Dunlap and Van Liere is one valid measure of environmental attitude; it comprises the
12 items listed below. Agreement or disagreement with these statements constitutes acceptance or rejection of the
NEP.

The New Environmental Paradigm Scale

● We are approaching the limit of the number of people the earth can support.
● The balance of nature is very delicate and easily upset.
● Humans have the right to modify the natural environment.
● Humankind was created to rule over the rest of nature.
● When humans interfere with nature it often produces disastrous consequences.
● Plants and animals exist primarily to be used by humans.
● To maintain a healthy economy we will have to develop a "steady state" economy where industrial growth is controlled.
● Humans must live in harmony with nature in order to survive.
● The earth is like a spaceship with only limited room and resources.
● Humans need not adapt to the natural environment because they can remake it to suit their needs.
● There are limits to growth beyond which our industrialized society cannot expand.
● Mankind is severely abusing the environment.

So What is a Valid Scale?

Validity is "a set of standards by which research can be judged" or "the best available approximation to the truth or falsity of a given
inference, proposition, or conclusion" (Trochim web site). Validity can be divided into the following areas: Conclusion Validity,
Internal Validity, Construct Validity, and External Validity.

I will not attempt to define all of the above kinds of validity. For a detailed explanation of validity types, please see the Knowledge
Base constructed by Professor William Trochim. The kind of validity this web site is concerned with is Measurement Validity--and
falls mostly under the domain of Construct Validity. Dunlap and Van Liere attempted to prove that their NEP scale was valid by
addressing three elements of measurement validity: construct, predictive, and face validity.

Construct Validity

Construct validity is often considered the most difficult kind of validity to achieve--it essentially comprises both predictive and face
validity. I will do my best to differentiate different types of validity, but be aware that many kinds of validity overlap and are
sometimes difficult to distinguish. By construct validity we mean assessing how well an idea or concept is translated from the "land
of theory" in your head into the "land of reality"--an actual measure or scale. In terms of the NEP, achieving construct validity (and
thus achieving measurement validity) meant that Dunlap and Van Liere had to translate exactly what they meant by the new
environmental paradigm, as well as develop an actual scale that could accurately measure whether or not this paradigm was in fact
part of an individual's attitude-makeup.

There are essentially three conditions that must be met to ensure construct validity:

1. The concept requires a specific theoretical framework--in other words, you must explicitly state what you mean by the NEP.
2. You must be able to show that your operationalization acts the way it theoretically should.
3. The data you gather must support these theoretical views.

How did Dunlap and Van Liere achieve construct validity for their scale? Well, in order to meet the first condition, they reviewed the
literature to find out more about how others had defined the construct. In addition, they consulted scientists and ecologists to
determine if their definition and development of the NEP and scale items met with agreement among experts. [By the way, this
discussion on establishing construct validity also overlaps with Content Validity--see Trochim's web site for more information on this
kind of validity.] What I have just outlined above falls under Face Validity below, so skip to that section for more detailed
information. They met the second condition by predicting results they might achieve with their scale [skip to Predictive Validity for
more information]. By achieving predictive validity, they also essentially achieved the third condition--their data from two samples
supported their theoretical views.
Predictive Validity (and Concurrent Validity)

What is predictive validity? Recall from above that validity means "the approximate truth of the conclusion that your
operationalization accurately reflects its construct." The definition of predictive validity can be found in the name--PREDICTIVE.
How well does the NEP Scale predict what it theoretically should predict? Well, Dunlap and Van Liere were able to test their scale
on two samples--a sample of the general public as well as a sample of environmental group members. They theorized that the
environmental group members would score BETTER than the general public on the NEP scale. And they were right--the mean total
scale score for environmental group members was 43.8. This compares with a mean scale score of 36.3 for the general public.
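To make this comparison concrete, here is a hypothetical scoring sketch, not taken from Dunlap and Van Liere's article. It assumes, purely for illustration, a 4-point agreement scale (1 = strongly disagree to 4 = strongly agree) and assumes that the items worded against the NEP are reverse-coded so that higher totals always indicate stronger endorsement of the paradigm; the choice of reversed items and both respondents' answers are invented.

```python
# Hypothetical NEP scoring sketch (assumptions: 4-point scale, reverse-coded
# anti-NEP items, invented responses).
N_ITEMS = 12
# 0-based indices of items assumed to be worded against the NEP,
# e.g. "Humans have the right to modify the natural environment."
REVERSED = {2, 3, 5, 9}

def nep_total(responses):
    """Total NEP score for one respondent's 12 answers on a 1-4 scale."""
    assert len(responses) == N_ITEMS
    return sum(5 - r if i in REVERSED else r for i, r in enumerate(responses))

# Invented responses: one environmental-group member, one general-public member
environmental_member = [4, 4, 1, 1, 4, 1, 3, 4, 4, 1, 4, 4]
general_public_member = [3, 3, 2, 3, 3, 2, 2, 3, 3, 2, 3, 3]

print("environmental group member:", nep_total(environmental_member))
print("general public member:     ", nep_total(general_public_member))
```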

What exactly did the two researchers accomplish with two distinct samples and an explicit statement that the environmental group
members would score better than the general public? First, they were able to demonstrate meeting the second condition for
establishing construct validity. They offered a theory about the results they would receive from these two samples--one sample would
score better than another sample based on environmental attitudes. The two researchers were correct in their prediction--the
environmental groups did score higher than the general public.

They not only established predictive validity--meeting the second condition for construct validity--they were also able to demonstrate
concurrent validity. Concurrent Validity is established when a scale is able to "distinguish between two groups that it theoretically
should be able to distinguish between" (Trochim web site). We would expect, as did Dunlap and Van Liere, that environmental group
members would care MORE about the environment--and thus score higher on the NEP scale--than the members of the general public.

Additional Tests of Predictive Validity

Dunlap and Van Liere also tested the NEP scale against other measures of environmental attitude and behavior. Their additional
scales contained a list of environmental activities and lists of state and federal environmental programs. Respondents were asked to
report how often they performed the behaviors and how much they supported the various state and federal programs. The data from
these scales was compared with the NEP scale data to see if respondents who performed environmental behaviors and supported
government environmental programs also scored well on the NEP scale. Again, Dunlap and Van Liere found that their NEP scale did
in fact correlate well with their other measures of environmental attitude and behavior.

Face Validity

Now let's look at face validity--the most subjective and weakest method of establishing measurement validity. Face Validity
essentially looks at whether the scale appears to be a good measure of the construct "on its face." As mentioned earlier, Dunlap and
Van Liere established face validity by conducting a literature review of what they considered to be crucial aspects of this new
environmental paradigm and developing a list of scale items that constitute the paradigm. In addition to the literature review,
environmental scientists and ecologists also aided in developing and writing scale questions. By submitting the NEP scale for review
by experts in environmental issues, the two researchers were able to bolster face validity.
Summary

Let's take a moment to recap exactly what Dunlap and Van Liere did to ensure measurement validity of their New Environmental
Paradigm Scale. First, the researchers operationalized--or explicitly defined--what they meant by the NEP and how they were going
to measure the NEP. Second, they worked with a panel of experts who approved of the content of their scale. Third, they used two
separate population samples--an environmental group sample and a general public sample. Fourth, the researchers also used several
scales to measure environmental attitude. Fifth, they theorized how their scale would work in relation to the other scales and with the
different population samples--in other words, what kind of data they would expect to get. And last but not least, their data actually
supported their theory and predictions.

Remember what the three conditions are to ensure construct validity?

1. The concept requires a specific theoretical framework--in other words, you must explicitly state what you mean by the NEP.
2. You must be able to show that your operationalization acts the way it theoretically should.
3. The data you gather must support these theoretical views.

What do you think? Did Dunlap and Van Liere meet these conditions? I would say yes. They provided a theoretical framework for
their concept, demonstrated that their operationalization acted the way they predicted it would, and produced data to support their
theoretical views.

Copyright © 1997 lmp23@cornell.edu


THE TRO-VELL LEARNING MAZE

Welcome to the Tro-Vell Learning Maze

We invite you to try your skill at completing the Reliability Test-Maze, the first in a series of exercises that will assist you toward a greater
understanding of the research methods available to make your work as valid and reliable as possible. We have a responsibility,
as researchers, to ensure and defend the credibility of our work. As students, you need to begin acquiring now the skills for
accomplishing these essential research goals.

Classical test theory is foundational to the assessment of measurement reliability (Carmines & Zeller, 1979). As social scientists
strive to develop methods of measuring for "true scores" (hypothetical and unobserved quantities that cannot be directly measured),
they must eliminate as many potential sources of random error as possible. Of course, it is impossible to eliminate all random
error. So, settling for the least amount of random error in our measurement instruments is the best that we can expect to achieve.
Infinite measurement of the same phenomenon, if this were possible, would eventually yield a "true score" with all random error
eliminated. Clearly, this is not possible. The next-best approach is a finite and manageable number of repetitive measurements of the
phenomenon under study. This more practical process is the method of reliability assessment.

Reliability is defined by Carmines and Zeller (1979) as "the tendency toward consistency found in repeated measurements of the same
phenomenon" (p. 12). Reliability is revealed by "the extent to which an experiment, test, or any measuring procedure yields the same
results on repeated trials" (p. 11). Reliability, therefore, is related to measurement. Measurement reliability is rooted in the degree to
which measurement of any phenomenon is confounded by factors that are designated either random error or systematic error/bias.
Carmines and Zeller explain further that "a highly reliable indicator of a theoretical concept is one that leads to consistent results on
repeated measurements because it does not fluctuate greatly due to random error" (p. 13). Random error is present when chance
factors occur in ways that cannot be predicted, i.e., "neither the direction nor the magnitude of these errors can be predicted" (Chase,
1978, p. 78). Systematic error is present when confounding factors occur as a result of faulty instrumentation that has a predictable
effect on the measurement outcomes. The major concern of reliability is controlling for error.

There are several methods of controlling for error and increasing reliability:

● Pilot-Tested Instruments - a strategy to get feedback from the respondents that can be used to refine the instrument
● Standardized Interviewer Training - a strategy to eliminate bias related to how the instrument is administered
● Unobtrusive Observations - a strategy to reduce "noise" by controlling for extraneous circumstances
● Triangulation - the use of several measures that may have different biases
● Double-Checking Data - a strategy to eliminate "human" error

There are four methods of estimating reliability:

● Test-Retest Method (Retest Method)
● Alternative Forms Method (Equivalent Forms Method)
● Split-Halves Method
● Internal Consistency Method

Reliability is not a goal we seek as separate from validity. Reliability is inextricably linked to validity as we seek to authenticate the
quality of our work. The Reliability Test-Maze will treat these as separate endeavors so that you do not become lost in a learning
maze that is too complicated to afford a fruitful experience.

Travel through The Reliability Test-Maze is a journey of discovery---discovery of the meaning of reliability. The trip will provide
some interesting twists and false paths that you will need to anticipate and circumvent. So, be nimble!

The Reliability Test-Maze contains 10 questions that can be answered True or False. You must click on the word "True" or "False"
to answer the question and progress through the maze. Each time you guess correctly, you will receive 10 points. If you are able to
complete the maze by answering all the questions correctly the first time, you will have earned 100 points. But, we need to make this
interesting, don't we? So, each time you guess incorrectly, 5 points will be deducted.

With each answer, whether correct or incorrect, you will receive information about that question that will help you to understand
which aspect of reliability the question addresses. At the end of the discussion of that question, instructions will appear that direct
you to continue your journey through the maze. If you answered correctly, you will continue through the maze to the next question. If
you answered incorrectly, you will be returned to the same question and given another opportunity to answer. To continue, you will
need to click on the link to the next question.

When you are ready, click on Question #1 below to begin your journey. Oh, and try to have some fun with this. You know what
reliability is, and these questions are just a way of confirming that knowledge.

Good luck!

Question #1

Multiple measurements over time increase measurement reliability.


Congratulations! You avoided that little twist very nicely.

The correct answer is TRUE

Rationale: The general rule that applies here is that you obtain better measurements of the estimates of net
effects of a phenomenon when you make more measurements before and after the phenomenon. Since the
phenomenon is expected to have an effect over time, multiple periodic measurements can more reliably
represent how the individual or group is reacting to the phenomenon.

Now, you can continue through the maze to Question #2


Question #2

The difference between the true score and an obtained score is called an error of measurement. The errors of
measurement occur because we cannot precisely measure the impact of a particular phenomenon on an
individual or group. We can approximate a true score for an individual or group only through the use of the
averaged scores of various measurements that produce positive and negative errors.

We may assume that the average of a small number of obtained scores for the same person allows the errors of
measurement to cancel each other so that we can closely approximate the true score for that person.

Sorry!

That's an easy error to make.

Here's a hint....the number of scores for an individual or group is important!!!!

You must return to Question #2 and answer it correctly before you can continue through the maze. I feel certain you will get it this
time.
Congratulations! You took that turn very smoothly.

The correct answer is FALSE

Rationale: A reliability coefficient is an estimate of the statistic representing the correlation between one set of
scores for an individual and/or group and an independent second set of scores for the same individual and/or
group on equivalent measurements. The hypothetical true score of an individual or group is the average of a
very large number of scores that are obtained for the same individual or group on equivalent measurements
under equivalent conditions.

Therefore the statement is false because it refers to a small number of obtained scores and does not specify the
conditions of equivalency.
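As an aside, here is a small simulation (with invented numbers, not part of the maze) of why the rationale stresses a very large number of equivalent measurements: averaging more and more error-laden measurements of the same person drives the average ever closer to that person's true score.

```python
# Hypothetical simulation: the mean of repeated noisy measurements of one
# person converges toward that person's true score as the number of
# measurements grows.
import numpy as np

rng = np.random.default_rng(2)
true_score = 75.0

for n in (2, 10, 100, 10_000):
    observed = true_score + rng.normal(0, 8, size=n)   # true score + random error
    print(f"{n:>6} measurements -> average observed score = {observed.mean():.2f}")
```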

Now, you can continue through the maze to Question #3



Question #3

There has been a series of elderly-patient accidents resulting in falls. A questionnaire is designed to measure the behaviors of
elderly patients that put them at risk of falling and injuring themselves during their hospital admission. The questionnaire is
being pilot tested for reliability. Because of inadequate staffing, it is not possible to administer the questionnaire twice during
the patients' hospitalization. Consequently, it is decided to administer the questionnaire to the next 100 patients over 50 who
sustain falls while hospitalized. The Split-Halves Method is used to assess the reliability of the questionnaire for measuring at-risk behaviors.

The Split-Halves Method of determining the reliability of your measurement requires that you give the same questionnaire to
the 100 patients, and that you split the items on the answered questionnaires into two groups of 50 each. The only method of
assuring that questionnaire items are split into equivalent groups is to randomly assign the items into two groups.
Sorry!

Here's a hint....ask yourself if this method is really so limited!!!!

You must return to Question #3 and answer it correctly before you can continue through the maze. Hang in
there! I feel certain you will get it this time.
Congratulations! You avoided that limitation very smoothly.

The correct answer is FALSE

Rationale: Deciding which of the hundred items will go into the two groups is an important consideration.
Randomly assigning items into the two groups is one effective method. However, it is not the only method.
Counting the odd-numbered items as one group and the even-numbered items as the second group effectively
splits the items into two groups that contain items drawn equally from each section of the questionnaire. In
this way, if the questions have been arranged according to some criterion (for example, ordered from least to
most intrusive), each half will contain an equal sampling of these ordered items.

Now, you can continue through the maze to Question #4



Question #4

An analysis of the data from the Split-Halves Method, using the Spearman-Brown formula, yields a reliability coefficient (r)
equal to .79. This statistic tells us that the answers to the questions in one group are not consistent with the answers to the
questions in the second group.
Sorry!

Here's a hint....more is better!!!!

You must return to Question #4 and answer it correctly before you can continue through the maze. You will get
it this time, just don't split on me.
Congratulations! You didn't split apart!!!

The correct answer is FALSE

Rationale: Split-halves reliability tells us about the relationship between the scores of the two subgroups. More
specifically, it tells us to what degree the subgroups are consistent with each other. If the consistency between
subgroups is substantial, i.e., there is a high r value that is close to 1.00, then we can have confidence that the
total questionnaire has consistency---that it is measuring the same concept.
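For readers who want to see the mechanics, here is a minimal sketch (with invented item data; the variable names and the use of NumPy are assumptions, not part of the maze) of the split-halves idea: correlate the odd-item and even-item half scores, then step the half-test correlation up to a full-length estimate with the Spearman-Brown formula r_full = 2r / (1 + r).

```python
# Split-halves reliability with the Spearman-Brown correction (invented data).
import numpy as np

rng = np.random.default_rng(1)

# Invented item responses: 100 respondents x 20 items scored 0/1,
# all items driven by a shared underlying trait so they correlate.
ability = rng.normal(size=(100, 1))
items = (rng.normal(size=(100, 20)) + ability > 0).astype(int)

odd_half = items[:, 0::2].sum(axis=1)    # items 1, 3, 5, ...
even_half = items[:, 1::2].sum(axis=1)   # items 2, 4, 6, ...

r_half = np.corrcoef(odd_half, even_half)[0, 1]
r_full = 2 * r_half / (1 + r_half)        # Spearman-Brown correction

print(f"split-half correlation:        {r_half:.2f}")
print(f"Spearman-Brown full-test est.: {r_full:.2f}")
```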

Now, you can continue through the maze to Question #5



Question #5

The freshmen students in an advanced math class are tested on their base knowledge of calculus before the professor proceeds to
more advanced theory. The test is a recent addition to the university's test-database and is administered to the students on the first day
of class. The test results are unusually high. The professor, who has taught this course successfully for several years, eliminates the
usual review sessions and proceeds with the course syllabus. The professor becomes aware that the students seem unable to follow
class discussions and the homework assignments show that the students are confused on several key concepts. The professor decides
to give the same exam two weeks later. On the second exam, the student scores are unusually high again. Although there is high
correlation between the two tests (r=.83), the professor sees that the completed homework assignments and the students' classroom
responses do not support the conclusions about the students' level of knowledge indicated by the two tests.

The professor is correct to conclude that the calculus test scores are reliably testing base calculus knowledge, but that his
expectations of the students are too high.
Sorry!

That was a hard one.

Here's a hint....what is being measured?!!!!

You must return to Question #5 and answer it correctly before you can continue through the maze. I am sure
you will get it this time.
Congratulations! You are becoming very adept at navigating the maze.

The correct answer is FALSE

Rationale: We may assume that the test is consistently testing some concept because of the high correlation
between test scores. However, other measurements of the students' level of knowledge contradict the
supposition that the test is reliably measuring the base calculus knowledge necessary for this course. The
current syllabus has been used successfully by the professor for several years to structure the math course. On
the other hand, the test is newly introduced into the curriculum. The logical conclusion to be drawn is that there
may be some problem in the reliability of the test that requires that it be further tested before using it as a
determinant of students' base knowledge of calculus.

Now, you can continue through the maze to Question #6


Question #6

The student health service has had a 15% increase in the number of senior students diagnosed with sexually
transmitted diseases this Fall semester when compared with the number similarly diagnosed the previous
spring semester. This increased incidence of disease heightens the concern of the campus community that
students may not be receiving important health information that may influence their sexual behaviors. With this
in mind, the health service nurses design a questionnaire to determine what attitudes senior students
have about abstaining from sexual activity as a healthy choice. The questionnaire is administered at the
beginning of the Spring semester to the senior students (n=2,000) with the result that only 17% of the completed
questionnaires (n=1,980) indicate abstinence as a choice they would make to ensure continued good health. One
month prior to completion of the spring semester, the senior students are again administered the questionnaire
(n=2,000) with the result that 40% of the completed questionnaires (n=2,000) indicate abstinence as a choice
they would make to ensure continued good health. During the intervening three months, the star basketball
player on the school team goes public with the information that he is HIV positive as a result of unprotected sex
with multiple partners. The analysis of the questionnaire results supports the interpretation of measurement
instability when assessing the retest reliability of the questionnaire.
Sorry!

That was especially challenging.

Here's a hint....the impact of history!!!!

You must return to Question #6 and answer it correctly before you can continue through the maze. You will
definitely rebound on this one.
Congratulations! You recognize a red herring when you see one.

The correct answer is FALSE

Rationale: The Test-Retest/Retest Method, although the most instinctively appealing method of assessing
reliability, has problems that seriously limit its utility. In addition to being expensive (obtaining repeated
measures over time is costly), the interpretation of the correlation information does not accommodate true
change---i.e., that the theoretical concept that was initially tested may change due to some event that the test
subjects jointly experience. When this occurs, true change is erroneously interpreted as measurement
instability.

This is illustrated by the example in the question. Attitudes toward abstinence changed substantially as a result
of senior students being confronted by the devastating consequence of the unhealthy choices made by the star
basketball player. There was an increase of approximately 23 percentage points in the proportion of students who subsequently viewed
abstinence as a healthy choice. Consequently, the instrument was reliably assessing attitudes toward
abstinence in the senior students---but it was also reliably recording true change in those attitudes.

Now, you can continue through the maze to Question #7


Question #7

University professors, as with all teachers, are continually faced with the problem of devising new exams each
semester that are sufficiently different from the exams of last semester, but equally capable of measuring the
degree to which students have learned the course materials. Constructing Alternative/Equivalent Forms of the
test that reliably test the same information is a challenge. One way of meeting the challenge is for the professor
to develop, for example, 200 exam questions. The exam questions are then randomly distributed into two sets of
100 questions each. To assess the reliability of the two sets of questions, students from several classes that
cover the same material are first tested with the 100 questions from set #1, and two weeks later they are tested
with the 100 questions from set #2. The degree to which the two sets of questions correlate is an estimate of the
reliability of both sets of questions.

The primary advantage of this method of assessing reliability over the test-retest method is that the
phenomenon that is being tested (in this instance student learning) is easier to interpret if it is subject to rapid
and/or radical change.
Sorry!

Just being a little devious with this one---this was an alternative form of the previous question.

Here's a hint...."a rose by any other name"!!!!

You must return to Question #7 and answer it correctly before you can continue through the maze. You will
definitely rebound on this one.
Congratulations! You recognize the same red herring that was present in question #6 when you see it again,
don't you?

The correct answer is FALSE

Rationale: The Alternative/Equivalent Form Method, like the Test-Retest Method, cannot distinguish true change
from measurement instability. Therefore, when the phenomenon being tested undergoes rapid and/or radical
change, both methods run the risk of true change being interpreted as unreliable measurement. The primary
advantage of the Alternative/Equivalent Form Method over the Test-Retest Method is that the same test is
not given on repeat measurement. Therefore, memory is reduced as a factor that can unduly inflate the
reliability estimate.

The basic limitation of the Alternative/Equivalent Form Method in assessing reliability is the difficulty of
constructing two forms of a test that have parallel measurement properties.
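As a rough illustration of the professor's procedure, here is a small Python sketch using simulated, hypothetical exam data: the item pool is randomly split into two forms, and the correlation between students' totals on the two forms serves as the reliability estimate for both.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: each of 50 students has an underlying ability, and each
# of the 200 pool items is answered correctly (1) or incorrectly (0).
ability = rng.normal(0, 1, size=(50, 1))
responses = (ability + rng.normal(0, 1, size=(50, 200)) > 0).astype(int)

# Randomly distribute the 200 items into two forms of 100 items each.
items = rng.permutation(200)
form1_score = responses[:, items[:100]].sum(axis=1)
form2_score = responses[:, items[100:]].sum(axis=1)

# The correlation between the two forms estimates the reliability of both.
r = np.corrcoef(form1_score, form2_score)[0, 1]
print(f"Alternative-forms reliability estimate: {r:.2f}")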

Now, you can continue through the maze to Question #8


Question #8

The Red Cross has a continual need for blood donations. However, in times of crisis, there is an increased need
for blood donors. Meeting this increased need must be accomplished within the strict medical, legal and ethical
boundaries that circumscribe the activities of this humane organization. Finding the best balance between
meeting a critical need and respecting humane tenets is not always easy. To illustrate, the Red Cross will offer
financial incentives to blood donors in times of crisis because of the critical need for blood donors. When these
incentives are offered, the Red Cross maintains criteria for donors that reduce any risk to them that may result
from their blood donation. However, individuals often attempt to circumvent the strictures the Red Cross
imposes on potential blood donors. Frequently, the reasons for these attempts are economically based. So, it
happens that individuals who knowingly are not eligible to donate blood nonetheless try to do so because of
their need for cash. To assess physiologic suitability for blood donation, the Red Cross must reliably assess
the nutritional level and the degree of illness of potential donors. These assessments are measured by
questionnaire in combination with personal interviews. Developing measurement scales that accurately
measure nutritional level and degree of illness is a critically important, but not easy, task. A great deal of time
has been devoted to developing reliable measures of these concept domains.

The degree to which the Red Cross has been successful in developing scales that reliably measure these
concept domains is evidence of how much internal consistency exists within each scale.
Congratulations! You are right on target.

The correct answer is TRUE

Rationale: When all items in a scale presumably test the same concept or trait, then the results of testing should
show high correlation between each item within the scale. When, in fact, high correlation is present, the
measurement scale is said to have internal consistency. The most frequently cited example of this is the
measurement of an object using a ruler with standardized units of measurement. If you measure the length of a
desk with a ruler that is made up of 36 standardized inches, then each subsequent individual who measures that
same desk with a standardized 36-inch ruler will obtain the same measurement result. Consequently, you can
be reliably certain that the standardized 36-inch ruler provides measurements that are internally consistent, i.e.,
that it measures the length of the desk in exactly the same way each time.
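To see what "items correlating with each other" looks like in practice, here is a brief Python sketch using hypothetical responses to a five-item scale; a real analysis would also report a summary coefficient such as Cronbach's alpha.

import numpy as np

rng = np.random.default_rng(1)

# Hypothetical responses: 100 people answering 5 items (scored 1-5) that are
# all intended to tap the same concept.
true_level = rng.normal(3, 1, size=(100, 1))
items = np.clip(np.round(true_level + rng.normal(0, 0.7, size=(100, 5))), 1, 5)

# Internal consistency shows up as high correlations among the items.
corr = np.corrcoef(items, rowvar=False)
avg_r = corr[np.triu_indices(5, k=1)].mean()
print(np.round(corr, 2))
print(f"Average inter-item correlation: {avg_r:.2f}")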

Now, you can continue through the maze to Question #9


Question #9

Think back to the example used in the last question. The Red Cross measures the concept domains of
nutritional level and degree of illness. These concepts, although different, are correlated. Nutritional status can
have positive or negative consequences for the degree of illness in an individual; and the degree of illness in an
individual can affect her/his nutritional status.

Therefore, it is correct to say that the correlations of the items between these different concepts are the same as
the correlations between items within the same concepts.
Sorry!

If we are not splitting them up, we are doubling them up, huh? Well, you now see through that.

Here's a hint....like-to-like!!!!

But, you must return to Question #9 and answer it correctly before you can continue through the maze. You can
do it!
Congratulations!

The correct answer is FALSE


Rationale: The correlation that exists between items that measure the same concept or trait will always be
greater than the correlation that exists between items that measure different concepts or traits. This is logical,
and can be exemplified by the following: two identical twins who look exactly alike will demonstrate differences in
symmetrical measurements (i.e., right hand compared to left hand, right foot compared to left foot, etc.) when
you compare the measurements across twins; whereas, the symmetrical measurements will be more nearly
exactly the same when you compare them for each twin. In other words, the right-compared-to-left sides of each
twin correlate more highly than the right-compared-to-right and the left-compared-to-left sides across twins.
Now, you can continue through the maze to Question #10
Question #10

Obtaining a reliable measurement of a phenomenon is difficult even in the best of circumstances.
Obtaining reliable results becomes even more difficult if the threats to reliability are not addressed when
planning, designing and implementing the measurement instrument. The length of the measurement, the time it
takes to complete the measurement instrument, and the composition of the targeted group are all factors that
can decrease reliability.

The more heterogeneous the targeted group, the higher the correlation of items that measure the
phenomenon.
Sorry!

Just thought I would mix things up a bit. After all, we are about to exit the maze.

Here's a hint....different isn't always better!!!! But, you must return to Question #10 and answer it correctly
before you can continue through the maze. You can do it!
Congratulations! You certainly know that "different" is not always "better"

The correct answer is FALSE

Rationale: Heterogeneity in a targeted group will produce greater variance in measurement results than
homogeneity. With increased variance, the reliability estimate decreases (an indication of decreased
correlation).
You have successfully completed the Reliability Test-Maze. Your scores are recorded. If you develop questions
later, you can always take another journey through the maze. Click here to reach the REFERENCES. The
references are comprehensive readings in simple language. They will provide good (read that, useful)
information about reliability.

CONGRATULATIONS!

If you want to give the Reliability Test-Maze another try, click on the TRO-VELL
LEARNING MAZE link below and give it another whirl. I'm betting you'll ace it this
time.

TRO-VELL LEARNING MAZE


REFERENCES

Carmines, E.G. & Zeller, R.A. (1979). Reliability and Validity Assessment. Beverly Hills, CA: Sage Publications.

Chase, C.I. (1978). Measurement for Educational Evaluation (2nd ed.). Philippines: Addison-Wesley Publishing.

Ebel, R.L. (1979). Essentials of Educational Measurement (3rd ed.). New Jersey: Prentice-Hall.

Trochim, W.M.K. & Linton, R. (Eds.). (1984). Essays on Evaluation: Validity, Design and Analysis. New York: Cornell University.

Trochim, W.M.K. (Ed.). (1995). HSS691: Research Design for Program Evaluation. New York: Cornell University.

Copyright 1996, William M. Trochim & Betty G. Levell


Sorry!

That one was a little tricky.

Here's a hint....replication is important to reliability!!!!

You must return to Question #1 and answer it correctly before you can continue through the maze. But you
know, the second time's a charm.

You must return to Question #1


Sorry!

We have progressed to a different reliability type.

Here's a hint....think about how your desk ruler measures things.

You must return to Question #8 and answer it correctly before you can continue through the maze. Take heart,
you are getting close to the end.
Measuring a psychological construct like emotional intelligence is as much an art as it is a science. Because such psychological
constructs are latent and not directly observable, issues of construct validity are paramount, but are, unfortunately, often
glossed over in the methodology sections of research papers. In an effort to increase the validity of conclusions reached using
paper-and-pencil measures of psychological constructs like emotional intelligence, this web page was constructed. This page
covers the major validity issues involved in measuring psychological constructs, using examples from measuring emotional
intelligence. The information gathered here will provide insight regarding the construct of emotional intelligence and how one
would attempt to clarify its meaning and measure it (as well as any other psychological construct for that matter).

As of yet, no one has created a measure of emotional intelligence. However, due to the appeal and
applicability of such a construct, it is almost certain that someone will attempt such an endeavor soon. As
with measuring any psychological construct, one must not rush to make conclusions based on the
results of a poorly constructed measuring instrument.

The following is a table of contents of this page:

● Emotional intelligence
  ❍ Why emotional intelligence is important
  ❍ Definition and dimensions of emotional intelligence
● Measurement Issues
  ❍ Psychological constructs
  ❍ Problems with measurement
  ❍ Validity and reliability
    ■ Construct validity
    ■ Face and content validity
    ■ Criterion-related validity
    ■ Internal consistency
● Creating a measure of emotional intelligence
  ❍ Step 1: Item development
  ❍ Step 2: Scale development
  ❍ Step 3: Scale evaluation
● Conclusions
● Comments appreciated
● References

EMOTIONAL INTELLIGENCE

Why emotional intelligence is important

Researchers investigated dimensions of emotional intelligence (EI) by measuring related concepts, such as social skills,
interpersonal competence, psychological maturity and emotional awareness, long before the term "emotional intelligence"
came into use. Grade school teachers have been teaching the rudiments of emotional intelligence since 1978, with the
development of the Self Science Curriculum and the teaching of classes such as "social development," "social and emotional
learning," and "personal intelligence," all aimed at "raise[ing] the level of social and emotional competence" (Goleman, 1995:
262). Social scientists are just beginning to uncover the relationship of EI to other phenomenon, e.g., leadership (Ashforth and
Humphrey, 1995), group performance (Williams & Sternberg, 1988), individual performance, interpersonal/social exchange,
managing change, and conducting performance evaluations (Goleman, 1995). According to Goleman (1995: 160), "Emotional
intelligence, the skills that help people harmonize, should become increasingly valued as a workplace asset in the years to
come." And Shoshona Zuboff, a psychologist at Harvard Business School, points out, "corporations have gone through a
radical revolution within this century, and with this has come a corresponding transformation of the emotional landscape.
There was a long period of managerial domination of the corporate hierarchy when the manipulative, jungle-fighter boss was
rewarded. But that rigid hierarchy started breaking down in the 1980s under the twin pressures of globalization and
information technology. The jungle fighter symbolizes where the corporation has been; the virtuoso in interpersonal skills is
the corporate future" (Goleman, 1995: 149). If these predictions are true, then the interest in emotional intelligence, if there is
such a thing, is sure to increase, and with this increase in interest comes a corresponding increase in trying to measure
emotional intelligence. Two such measures purport to measure emotional intelligence: one is from USA Weekend and the
other is from Utne Reader. However, neither of these tests provides any evidence that its results are reliable or valid.

Definition and dimensions of emotional intelligence

Recent discussions of EI proliferate across the American landscape -- from the cover of Time, to a best selling book by Daniel
Goleman, to an episode of the Oprah Winfrey show. But EI is not some easily dismissed "neopsycho-babble."
EI has its roots in the concept of "social intelligence," first identified by E.L. Thorndike in 1920.
Psychologists have been uncovering other intelligences for some time now, and grouping them mainly into
three clusters: abstract intelligence (the ability to understand and manipulate verbal and mathematical
symbols), concrete intelligence (the ability to understand and manipulate objects), and social intelligence
(the ability to understand and relate to people) (Ruisel, 1992). Thorndike (1920: 228) defined social
intelligence as "the ability to understand and manage men and women, boys and girls -- to act wisely in
human relations." And Gardner (1983) includes inter- and intrapersonal intelligences in his theory of
multiple intelligences. These two intelligences comprise social intelligence. He defines them as follows:

Interpersonal intelligence is the ability to understand other people: what motivates them, how they work, how
to work cooperatively with them. Successful salespeople, politicians, teachers, clinicians, and religious leaders
are all likely to be individuals with high degrees of interpersonal intelligence. Intrapersonal intelligence ... is a
correlative ability, turned inward. It is a capacity to form an accurate, veridical model of oneself and to be
able to use that model to operate effectively in life.

Emotional intelligence, on the other hand, "is a type of social intelligence that involves the ability to monitor one's own and
others' emotions, to discriminate among them, and to use the information to guide one's thinking and actions" (Mayer &
Salovey, 1993: 433). According to Salovey & Mayer (1990), the originators of the concept of emotional intelligence, EI
subsumes Gardner's inter- and intrapersonal intelligences, and involves abilities that may be categorized into five domains:

Self-awareness:
Observing yourself and recognizing a feeling as it happens.

Managing emotions:
Handling feelings so that they are appropriate; realizing what is behind a feeling; finding ways to handle
fears and anxieties, anger, and sadness.

Motivating oneself:
Channeling emotions in the service of a goal; emotional self control; delaying gratification and stifling
impulses.

Empathy:
Sensitivity to others' feelings and concerns and taking their perspective; appreciating the differences in how
people feel about things.

Handling relationships:
Managing emotions in others; social competence and social skills.

Self-awareness (intrapersonal intelligence), empathy and handling relationships (interpersonal intelligence) are essentially
dimensions of social intelligence.

MEASUREMENT ISSUES
Psychological constructs

Emotional intelligence is a psychological construct, an abstract theoretical variable that is invented to explain some
phenomenon which is of interest to scientists. Salovey and Mayer invented (made up) the idea of emotional intelligence to
explain why some people seem to be more "emotionally competent" than other people. It may just be that they are better
listeners and this explains the variability in people's "emotional competence." Or it may be that these people differ in
emotional intelligence, and this is what explains the difference. Salovey and Mayer believed it was necessary to develop the
construct of emotional intelligence in order to explain this difference in people. Examples of other psychological constructs,
just to name a few, include organizational commitment, self esteem, job satisfaction, tolerance for ambiguity, optimism, and
intention to turnover.

Problems with Measurement

So imagine for the moment that you are a social scientist and you want to measure emotional intelligence using a paper-and-
pencil instrument, or in other words, a questionnaire (also referred to as a scale or measure as well). A questionnaire can
include more than one measure or scale (a measure of self-esteem and a measure of depression). Questionnaires are the most
commonly used procedure of data acquisition in field research (Stone, 1978), and many researchers have questioned how good
these questionnaires really are. Field research involves investigating something out in the "real world" rather than in a
laboratory. Problems with the reliability and validity of some of these questionnaires have often led to difficulties in interpreting
the results of field research (Cook, Hepworth, Wall & Warr, 1981; Schriesheim, Powers, Scandura, Gardiner & Lankau, 1993;
Hinkin, 1995). Unfortunately, researchers begin using these measures or questionnaires before knowing if they are any good or
not, and often reach significant conclusions only to be contradicted by other researchers later on who are able to measure the
constructs more accurately and precisely (Hinkin, 1995). Thus, before you go ahead and add another lousy measure of a
psychological construct to the already growing pile of them, take a few minutes now to learn about the process of creating valid
and reliable instruments that measure psychological constructs.

Validity and Reliability

Developing a measure of a psychological construct is a difficult and extremely time-consuming process if it is to be done
correctly (Schmitt & Klimoski, 1991). However, if you don't take the time to do it right, then any conclusions you reach using
your questionnaire may be dubious. Many organizational researchers believe that the legitimacy of organizational research as
a scientific endeavor is dependent upon how well the measuring instruments measure the intended constructs (Schoenfeldt,
1984). The management field needs measures that provide results that are valid and reliable if the field is to advance (cf.
American Educational Research Association, American Psychological Association, & National Council on Measurement in
Education, 1985). The American Psychological Association (1985) states that measures of psychological constructs should
demonstrate content validity, criterion-related validity and internal consistency, or reliability, which in turn provide evidence
of construct validity. Reliability refers to the extent to which the question responses correlate with the overall score on the
questionnaire. In other words, do all the questions "hang together," all attempting to measure the same thing, whatever that
thing is? What that "thing" is involves the issue of validity. Validity is basically "the best available approximation to the truth
or falsity of a given inference, proposition, or conclusion" (Trochim, 1991: 33). In this particular case where a measure is being
constructed, validity refers to how well the questionnaire measures what it is supposed to be measuring. There are different
types of validity, and each will be discussed below. What needs to be stressed at this point is that the key word here is
demonstrating, not proving, validity of our questionnaires. We can never prove that our instruments measure what they are
supposed to measure. There is no one person or statistical test that can prove or give approval of your measure. That's why it
is suggested that one use the modifier "approximately" when referring to validity because "one can never know what is true.
At best, one can know what has not yet been ruled out as false" (Cook & Campbell, 1979: 37). Only through time and lots of
testing will the approximate "validity and reliability" of your measure be established. I use quotes around the words validity
and reliability because the measure itself is not reliable and valid, only the conclusions reached using the measure are reliable
and valid.

1. Construct validity

Construct validity is concerned with the relationship of the measure to the underlying attributes it is
attempting to assess. A law analogy sums it up nicely: construct validity refers to measuring the
construct of interest, the whole construct, and nothing but the construct. The goal is to measure
emotional intelligence, fully and exclusively. To what degree is your questionnaire measuring the
theoretical construct of emotional intelligence (only and completely)? Answering this question will
demonstrate the construct validity of your instrument. What might be happening instead of
emotional intelligence being measured is that the measure might be measuring something else, may
be measuring only part of emotional intelligence and part of something else, or may be measuring
only part of emotional intelligence and not the full construct.

Construct validity is an overarching type of validity, and includes face, content, criterion-related,
predictive and concurrent validity (described below) and convergent and discriminant validity.
Convergent validity is demonstrated by the extent to which the measure correlates with other
measures designed to assess similar constructs. Discriminant validity refers to the degree to which
the scale does not correlate with other measures designed to assess dissimilar constructs. Basically,
by providing evidence of all these variations of construct validity (content, criterion-related,
convergent and discriminant), you are establishing that your scale measures what it was intended to
measure. Construct validity is often examined using the multitrait-multimethod matrix developed by
Campbell and Fiske (1959). See two other terrific web pages for a thorough description of this
method: one by Trochim and one by Jabs.

2. Face and content validity

Face validity refers to whether a measure appears "valid on the face." In plain English, it means that
just by looking at it, one would declare that the measure has face validity. It is a judgment call, and
one would look at say a measure of emotional intelligence and say, "Yes, it looks to me like it
measures emotional intelligence." Obviously, this is the weakest form of construct validity. Content
validity is established by showing that the questionnaire items (questions) are a sample of a universe
or domain in which the researcher is interested (Cronbach & Meehl, 1955). Again, this is a judgment
call, but more systematic means can be used (such as concept mapping and factor analysis, both
described below). This means that, like in the case of emotional intelligence, a questionnaire would
have to tap or ask questions about all dimensions of the construct. If our questionnaire of emotional
intelligence only asked about how well you engage in conversation at a party, then the content
adequacy of our measure is suspect. Our focus is too narrow and our questions are not a
representative sample of the entire domain or "world of" emotional intelligence. The problem here is
that we don't really know what the domain entails. We have only the educated guesses of two guys
and a few other researchers who say the domain of emotional intelligence consists of five dimensions.
As will be discussed later on, concept mapping is a useful tool for developing and gaining consensus
on the domain of a construct. See Schriesheim, Powers, Scandura, Gardiner, and Lankau (1993) for
a very thorough review of content adequacy of paper-and-pencil survey type instruments.

3. Criterion-related validity

This refers to the relationship between your measure and other independent measures (Hinkin,
1995). It is the degree to which your measure uncovers relationships that are in keeping with the
theory underlying the construct. Criterion-related validity is an indicator that reflects to what extent
scores on our measure of emotional intelligence can be related to a criterion. A criterion is some
behavior or cognitive skill of interest that we want to predict using our test scores of emotional
intelligence. For instance, we would predict that people scoring higher in emotional intelligence
on our test would demonstrate more sensitivity to others' problems, be better able to control their
impulses, and be able to label their emotions more easily than someone who scores lower on
our test of emotional intelligence. Evidence of criterion-related validity would usually be
demonstrated by the correlation between the test scores and the scores of a criterion performance.

Criterion-related validity has two sub-components: predictive validity and concurrent validity
(Cronbach & Meehl, 1955). Predictive validity refers to the correlation between the test scores and
the scores of a criterion performance given at a later date. Concurrent validity refers to the
correlation between the test scores and the scores of a criterion performance when both tests are
given at the same time. An example will help clarify the two types of validity.

Perhaps we want to predict the performance of front desk clerks at a hotel. This will be our criterion
that we want to predict using some test. The test we will use in this case is a measure of emotional
intelligence. The predictive validity of the emotional intelligence test can be estimated by correlating
an employee's score on a test of emotional intelligence with his/her performance evaluation a year
after taking the test. If there is a high positive correlation, then we can predict performance using the
emotional intelligence measure and have demonstrated the predictive validity of the emotional
intelligence measure. To demonstrate concurrent validity, we would have to correlate emotional
intelligence test scores and criterion scores (current performance evaluations). If the correlation is
large and positive, this would provide evidence of concurrent validity. Because the concurrent
validity correlation coefficient tends to underestimate the corresponding predictive validity
correlation coefficient, predictive validity tends to be preferred to concurrent validity.
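A minimal Python sketch of the two estimates, using entirely hypothetical scores for the front desk clerks (the relationship between the test and the ratings is simulated rather than real):

import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data for 40 front desk clerks.
ei_score = rng.normal(100, 15, size=40)                       # EI test at time of hire
current_rating = 0.4 * ei_score + rng.normal(0, 12, size=40)  # performance rating now
later_rating = 0.5 * ei_score + rng.normal(0, 12, size=40)    # rating one year later

# Concurrent validity: test scores vs. criterion scores gathered at the same time.
concurrent_r = np.corrcoef(ei_score, current_rating)[0, 1]
# Predictive validity: test scores vs. criterion scores gathered later.
predictive_r = np.corrcoef(ei_score, later_rating)[0, 1]

print(f"Concurrent validity estimate: {concurrent_r:.2f}")
print(f"Predictive validity estimate: {predictive_r:.2f}")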

4. Internal consistency

Also known as internal consistency reliability, this refers to how well the questions correlate to each
other and to the total test score. Basically what internal consistency reliability measures is whether
the items are all measuring the same thing, whatever that "thing" might be. There are several
different statistical procedures for estimating this reliability. The most common estimates a
coefficient alpha, or Cronbach coefficient alpha. If a scale is multi-dimensional, consisting of
numerous subscales, then coefficient alphas must be estimated for each subscale.
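For readers who want to see the computation, here is a small Python function for coefficient alpha; the responses and the four-item "subscale" are hypothetical, and for a multi-dimensional measure the function would be called once per subscale.

import numpy as np

def cronbach_alpha(items):
    """Coefficient alpha for an (n_respondents, n_items) array of item scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    sum_of_item_variances = items.var(axis=0, ddof=1).sum()
    variance_of_totals = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - sum_of_item_variances / variance_of_totals)

# Hypothetical responses (1-5) from six people to a four-item subscale.
subscale = np.array([
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 2, 3],
    [4, 4, 4, 5],
    [1, 2, 1, 2],
])
print(f"Cronbach's alpha: {cronbach_alpha(subscale):.2f}")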
CREATING A MEASURE OF EMOTIONAL INTELLIGENCE

Now that we know what we are up against, let's begin developing a measure of emotional intelligence (or any other construct
you wish to measure). The basic steps for developing measures, as suggested by Schwab (1980) are as follows:

Step 1: Item Development


The generation of individual items or questions.

Step 2: Scale Development


The manner in which items are combined to form scales.

Step 3: Scale Evaluation


The examination of the scale in light of reliability and validity issues.

The following discussion will be presented in the order of steps suggested by Schwab (1980), with modifications and additions
made as necessary. At each step, the issues relating to validity and reliability will be addressed.

Step 1: Item Generation

The first step in creating a measure of a psychological construct is creating test questions or items. For example, in the case of
emotional intelligence, you may create a group of 20 questions, the answers to which would provide evidence of a person's
emotional intelligence. But how do you know what to ask? And how many questions are needed? The answer is that you have
to ask questions that sample the construct domain and you have to ask enough questions to adequately sample the domain to
ensure that the entire domain has been covered, but not too many extraneous questions. According to Hinkin (1995: 969), the
"measure must adequately capture the specific domain of interest yet contain no extraneous content." This has to do with
content validity and there is no statistical or quantitative index of content validity. It is a matter of judgment and of collecting
evidence to prove the content validity of the measure.

However, first things first. You have to define the construct you are interested in measuring. It may be already defined by the
existing literature or it may need to be defined based on a review of the literature. In the case of emotional intelligence, Salovey
and Mayer have provided a theoretical universe of emotional intelligence. They suggest that emotional intelligence consists of 5
dimensions as noted above. One way of generating items for your measure would be to create questions that tap these five
dimensions, utilizing the classification schema defined by them. This is called the deductive approach to item development
(Hinkin, 1995). So, you say, now we're getting somewhere. All I have to do is write questions that get at all 5 dimensions of
emotional intelligence. And if I can't do it alone, I can ask experts to help generate questions within the conceptual definition of
emotional intelligence. But how does one know if Salovey and Mayer are right? How does one know that emotional intelligence
is comprised of 5 dimensions and not 6 or 3? And how do you know if the dimensions they mentioned are right? Maybe
emotional intelligence consists of five dimensions, but just not the dimensions as they defined them.
If little literature or theory exists concerning a construct, then an inductive approach to item development must be undertaken
(Hinkin, 1995). Basically the researcher is left to determine the domain or dimensions of the construct. The researcher can
gather qualitative data, such as interviews, and categorize the content of the interviews in order to generate the dimensions of
the construct. One method of data gathering that is quite useful in developing a conceptual domain of a construct is
concept mapping.

Developed by William Trochim (1989), concept mapping is a "type of structured conceptualization" that allows a group of
people to conceptualize, in the form of a "concept map" (a visual display), the domain of a construct. The group of people can
consist of just about anyone and is typically best when a "wide variety of relevant people" are included (Trochim, 1989: 2). In
the case of emotional intelligence, in order to develop the domain of the construct, one might wish to gather a group of experts,
such as psychologists, or human resources managers, or a group of employees. The groups are then asked to brainstorm about
the construct. For emotional intelligence, the brainstorming focus statement may be something like: "Generate statements
which describe the ways in which a person high in emotional intelligence is distinct from someone low in emotional
intelligence" or "What is emotional intelligence?" The entire process of concept mapping is described in Trochim (1989).

What concept mapping does, as well as what can be done with data collected via qualitative methods such as interviews, is
factor analyze, or sort, the items into groups which then provide a foundation for defining a construct as multi-dimensional. If
we were to gather a bunch of experts and conduct a concept mapping session, we would hope that their conceptualization of
emotional intelligence would consist of the five dimensions suggested by Mayer and Salovey, thus lending support to Mayer &
Salovey's theoretical dimensions.

Regardless of whether a deductive or inductive approach to item generation is undertaken, the main issue is content validity,
specifically domain sampling. In the case of a deductive procedure, items are generated theoretically from the literature. These
items may be assessed by experts in the area as to their content validity. In the case of emotional intelligence, we
could develop items to cover the five dimensions. Then we could ask a group of psychologists to sort the items into six
categories: the five dimensions plus an "other" category. Items assigned to the proper category by more than
80% or 85% of the sorters would be retained for use in the questionnaire. The "other" category and those items not meeting the cutoff for
the proper category would be discarded. This procedure is described as a best practice in Hinkin (1995). Another way of
tackling this would be, rather than giving the five dimensions to the experts, to simply ask them to sort the items into as many
categories as they see fit. The results can be analyzed in the same manner used in concept mapping. If the experts come up
with five dimensions like those theorized, then the researcher can be more confident in those dimensions. Just because some
people theorize what the domain of a construct is, there is no reason to rely uncritically on their theoretical conceptualization of the
construct. By giving the experts the categories up front, you are, in essence, assuming that those categories, dimensions, or
conceptualization of the construct are correct and are limiting the experts within those boundaries. Allowing the experts to sort
into as many categories as they see fit allows the data to speak for themselves, and if the categories coincide with the theorized
categories, this is confirmatory evidence of the conceptualization of the domain.
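The 80% retention rule described above is easy to apply mechanically. Here is a hedged Python sketch with hypothetical sorting results, where each entry is the proportion of expert judges who placed an item into its intended dimension:

# Hypothetical agreement: proportion of judges assigning each item to its
# intended dimension.
agreement = {
    "item01": 0.95, "item02": 0.60, "item03": 0.88,
    "item04": 0.82, "item05": 0.45, "item06": 0.90,
}

CUTOFF = 0.80  # the 80% criterion discussed above (0.85 is also used)

retained = [item for item, p in agreement.items() if p >= CUTOFF]
discarded = [item for item, p in agreement.items() if p < CUTOFF]

print("Retain:", retained)
print("Discard or revise:", discarded)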

If an inductive approach was taken, the same process can be undertaken. Experts may be used to sort the data. If interviews
were conducted, the raw, qualitative data may be sorted, from which items are generated for each category. Another way of
sorting involves generating items from the raw data, using as much of the wording provided by the interviewees as possible,
and then sorting the items. The raw data or items may be sorted by either telling the sorters the number of categories to sort
into or by allowing the sorters to categorize into as many categories as they see fit (and each sorter may sort into a different
number of categories!). Once again, by allowing the sorters to determine the number of categories, it allows the data to speak
rather than forcing the data into some preconceived notion as to how many categories there should be.

The main concern in generating items for a measure is with content validity -- that is, assessing the adequacy
with which the measure assesses the domain of interest.

The content validity of a measure should be assessed as soon as the items have been developed. This way, if items need revision,
this can be done before the researcher has large investments in the preparation and administration of the questionnaire
(Schriesheim, et al., 1993).
Step 2: Scale Development

There are three stages within this step: design of the developmental study, scale construction, and reliability assessment
(Hinkin, 1995).

A. Developmental study

At this stage in the process, the researcher has a potential set of items for the questionnaire measuring the intended construct.
However, at this point, we don't know if the items measure the construct. We only know that they seem to break down (via the
sorting) into categories that seem to reflect the underlying dimensions of the construct. Next, the researcher has to administer
the items or questionnaire to see how well the items conform to the expected and theorized structure of the construct. There
are five important issues in measurement that need to be addressed in the developmental study phase of scale development.

The Sample

Who the questionnaire or items are given to makes a difference. The sample of individuals chosen should be
selected to reflect or represent the population of individuals the researcher intends to study in the future
and make inferences about.

Reverse-scored Items

Negatively worded items (items that are worded so a positive response indicates a "lack" of the
construct) are mainly used to eliminate or attenuate response pattern bias, or response set. Response pattern
bias is where the respondent simply goes down the page without really reading the questions thoroughly and
circles all "4"s as a response to all the questions. With reverse-scored items, the thought is that the
respondent will have to think about the response because the answer is "reversed." However, in recent years,
reverse-scored items have come under attack because these items were found to reduce the validity of
questionnaire responses (Schriesheim & Hill, 1981) and in fact may introduce systematic error to the scale
(Jackson, Wall, Martin, & Davids, 1993). In a factor analysis (a sorting of the items into underlying
categories or dimensions) of negatively worded and positively worded items, the negatively worded item
loadings were lower than those of the positively worded items that loaded on the same factor (Hinkin, 1995).
Alternatives for attenuating response pattern bias should be sought before automatically turning to reverse-
scored items. Keeping scales shorter rather than longer can help reduce response pattern bias.

Number of Items

The measure of a construct should include enough items to adequately sample the domain, but at the same
time be as parsimonious as possible, in order to obtain content and construct validity (Cronbach and Meehl,
1955). The number of items in a scale can affect responses in different ways. Scales with too many items, or
that are excessively lengthy, can induce fatigue and response pattern bias (Anastasi, 1976). By keeping the number of
items to a minimum, response pattern bias can be reduced (Schmitt & Stults, 1985). However, if too few items
are used, then the content and construct validity and reliability of the measure may be at risk (Kenny, 1979;
Nunnally, 1976). Single-item scales (those scales that ask just one question to measure a construct) are most
susceptible to these problems (Hinkin & Schriesheim, 1989). Adequate internal consistency reliability can be
obtained with as few as three items (Cook, Hepworth, Wall, & Warr, 1981), and each item added beyond that has
progressively less impact on the scale's reliability (Carmines & Zeller, 1979).

Scaling of Items

The scaling of items refers to the choice of responses given for each item. Examples include Likert-type scales,
such as choosing from 1 to 5, which refer to strongly agree, agree, neither agree nor disagree, disagree, and
strongly disagree, respectively. Semantic differential scales refer to the use of words such as "happy" and
"sad" and the respondent chooses a response on a scale of 1 to 7 or 1 to 5, with "1" referring to "happy" and
"5" or "7" referring to "sad" and the numbers in between referring to states between being happy and sad.
The important issue to contend with at this point is achieving sufficient variance or variability among
respondents. A researcher would not want a measure with a Likert-type scale with responses 1 to 3, and
most of the respondents choosing response "3." This measure is not capable of differentiating different types
of responses, and perhaps giving choices from 1 to 5 would alleviate this problem. The reliability of Likert-
type scales increases with the increase in the number of response choices up to five, but then levels off (Lissitz
& Green, 1975).
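A tiny Python check of the "sufficient variance" concern, using hypothetical responses to one item offered with a 1-3 scale and the same item offered with a 1-5 scale:

import numpy as np

# Hypothetical responses from the same ten people to two versions of an item.
item_3pt = np.array([3, 3, 2, 3, 3, 3, 2, 3, 3, 3])   # 1-3 response scale
item_5pt = np.array([4, 5, 3, 4, 2, 5, 3, 4, 5, 2])   # 1-5 response scale

# A restricted response scale can leave little variance to work with.
print(f"Variance with a 1-3 scale: {item_3pt.var(ddof=1):.2f}")
print(f"Variance with a 1-5 scale: {item_5pt.var(ddof=1):.2f}")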

Sample Size

In terms of confidence in the results, the larger the sample size the better. That is, if the researcher has
generated items and is looking to conduct a developmental study to check the validity and reliability of the
items, then the larger sample of individuals administered the items, the better. The larger the sample, the
more likely the results will be statistically significant. When conducting factor analysis of the items to check
the underlying structure of the construct, the results may be susceptible to sample size effects (Hinkin, 1995).
Rummel (1970) recommends an item-to-response ratio range of 1:4, and Schwab (1980) recommends a ratio
of 1:10. For example, if a researcher has 20 items he/she is analyzing, then the sample size should be
anywhere from 80 to 200 respondents. New research in this area has found that a sample size of 150
respondents should be adequate to obtain an accurate exploratory factor analysis solution given that the
internal consistency reliability is reasonably strong (Guadagnoli & Velicer, 1988). An exploratory factor
analysis is when there is no a priori conceptualization of the construct. A confirmatory factor analysis is when
the researcher is attempting to confirm the theoretical conceptualization put forth in the literature. In the
case of emotional intelligence, a confirmatory factor analysis would be conducted to see if the items
"breakdown" or "sort" into five factors or "dimensions" similar to those suggested by Mayer and Salovey.
Recent research suggests that a minimum sample size of 200 is necessary for an accurate confirmatory factor
solution (Hoelter, 1983).
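The ratio-based rules of thumb cited above translate directly into arithmetic; the sketch below simply multiplies out the 1:4 and 1:10 ratios for a hypothetical 20-item pool.

def sample_size_range(n_items, low_ratio=4, high_ratio=10):
    """Rough respondent range implied by the item-to-response ratios cited above."""
    return n_items * low_ratio, n_items * high_ratio

low, high = sample_size_range(20)
print(f"For 20 items: roughly {low} to {high} respondents")
print("Cited minimums: about 150 for an exploratory and 200 for a confirmatory factor analysis")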

B. Scale construction

At this point in the process, the researcher has generated items and administered them to a sample (hopefully representative of
the population of interest). The researcher has taken into consideration reverse-scored items, the number of items to both
adequately sample the domain and be parsimonious, the scaling of the items to ensure sufficient variance among the
respondents, and has used an adequate sample size. Now comes the process of constructing the scale or measure of the
construct, through a process of reduction of the number of items and the refinement of the construct. The most common
technique for doing this is factor analysis (Ford, MacCallum & Tait, 1986). Items that do not load sufficiently on a factor
should be discarded or revised. A minimum item loading of .40 is the most commonly mentioned criterion (Hinkin, 1995).
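As an illustration of the .40 criterion, here is a hedged Python sketch using scikit-learn's exploratory FactorAnalysis on simulated, hypothetical item data; the confirmatory approach recommended later in this section would normally be carried out with structural equation modeling software instead.

import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(3)

# Hypothetical responses: 300 people, 20 items built to tap 5 dimensions
# (items 0-3 load on factor 1, items 4-7 on factor 2, and so on).
latent = rng.normal(size=(300, 5))
true_loadings = np.repeat(np.eye(5), 4, axis=1) * rng.uniform(0.5, 0.9, size=(5, 20))
responses = latent @ true_loadings + rng.normal(scale=0.5, size=(300, 20))

fa = FactorAnalysis(n_components=5)
fa.fit(responses)
loadings = fa.components_.T  # rows = items, columns = factors

# Apply the .40 rule: keep items whose largest absolute loading is at least .40.
max_loading = np.abs(loadings).max(axis=1)
retained_items = np.where(max_loading >= 0.40)[0]
print("Items retained:", retained_items.tolist())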

The purpose of the factor analysis in the construction of the scale is to "examine the stability of the factor structure and
provide information that will facilitate the refinement of a new measure" (Hinkin, 1995: 977). The researcher is trying to
establish the factor structure or dimensionality of the construct. Using a couple of different independent samples for
administering the items and then factor analyzing the results of each sample will help provide evidence (or lack of evidence!) of
a stable factor structure. If the researcher finds a different factor structure for each sample, then the researcher has some
work to do to uncover a stable (the same for all samples) factor structure. Although either an exploratory or confirmatory factor
analysis can be conducted, Hinkin (1995: 977) recommends using a confirmatory approach at this point in scale development
"...because of the objective of the task of scale development, it is recommended that a confirmatory approach be utilized ...
[because] it allows the researcher more precision in evaluating the measurement model." And although the confirmatory
factor analysis will tell the researcher if the items are loading on the same factor, it does not tell the researcher if the factor is
measuring the intended construct. For example, in the case of emotional intelligence, if I administered the items to a sample
and the items loaded on five factors, I might want to jump to conclusions and say my items measure the same five dimensions
as outlined by Mayer and Salovey. This would be a big mistake. All I really know at this point is that the items appear to
measure five factors or dimensions of "something." I still don't know what that something is. I'm hoping that it is emotional
intelligence, but I won't gather evidence until Step 3: Scale Evaluation (see below).

C. Reliability assessment

Two basic issues are to be dealt with at this point: internal consistency and the stability of the scale over time. As mentioned
previously, the internal consistency reliability measures whether or not the items "hang together" -- that is, whether the items
all measure the same phenomenon. The internal consistency reliability of measures is commonly assessed using Cronbach's
Alpha. The stability of the measure over time will be assessed by the test-retest reliability of the measure since emotional
intelligence is not expected to change over time (Stone, 1978). An alpha of .70 will be considered the minimum acceptable level
for this measure.

Step 3: Scale Evaluation

At this point in the process, a measure of a psychological construct has been developed that is both reliable and valid.
Construct validity was demonstrated via concept mapping, factor analysis, internal consistency, and test-retest reliability.
However, as suggested by Hinkin (1995: 979, 980),

Demonstrating the existence of a nomological network of relationships with other variables through criterion-
related validity, assessing two groups who would be expected to differ on the measure, and the demonstrating
discriminant and convergent validity using a method such as the multitrait-multimethod matrix developed by
Campbell and Fiske (1959) would provide further evidence of the construct validity of the new measure.

Criterion-related validity
Criterion-related validity is an indicator that reflects to what extent scores on the measure of the construct of interest can be
related to a criterion. A criterion is some behavior or cognitive skill of interest that one wants to predict using the test scores of
the construct of interest. For instance, in the case of emotional intelligence, people who score higher in emotional intelligence
according to the measure would be predicted to demonstrate more sensitivity to others' problems, be able to control their
impulses, and be able to label their emotions more easily than someone who scores lower on the test of emotional intelligence.
Evidence of criterion-related validity would usually be demonstrated by the correlation between the test scores and the scores
of a criterion performance. For emotional intelligence, the criterion performance could be showing sensitivity to others'
problems, being able to label one's feelings, etc., as judged by an expert. One way of doing this would be to have the facilitators of
a sensitivity training group (T-group) judge a sample of T-group participants on the performance of the criteria. "The training
or T-group is an approach to human relations training which, broadly speaking, provides participants with an opportunity to
learn more about themselves and their impact on others and, in particular, to learn how to function more effectively in face-to-
face situations" (Cooper & Mangham, 1971: v). As such, it is a rich environment for seeing the display of emotional
intelligence. The facilitators of each T-group will supply subjective measures of each group member's level of emotional
intelligence and these will be correlated with the observed scores of each group member on the emotional intelligence
instrument, providing further evidence for the measure's validity.

Construct validity
Construct validity includes face, content, criterion-related, predictive, concurrent, convergent and discriminant validity, as
well as internal consistency. Issues concerning face, content, predictive and concurrent validity have already been addressed in
previous sections. As mentioned previously, construct validity is often examined using the multitrait-multimethod matrix, which
is a wonderful method for addressing issues of convergent and discriminant validity (see Campbell and Fiske (1959) or the web
pages by Trochim and Jabs for details on this method). Convergent validity is demonstrated by the extent to which the
measure correlates with other measures designed to assess similar constructs. Discriminant validity refers to the degree to
which the scale does not correlate with other measures designed to assess dissimilar constructs.

In the case of emotional intelligence, the newly developed measure could be correlated with Gist's (1995) Social Intelligence
measure, Riggio's (1986) Social Skills Inventory, Hogan's (1969) Empathy Scale, Snyder's (1986) Self-monitoring Scale,
Eysenck's (1977) I.7 Impulsiveness Questionnaire and Watson and Greer's (1983) Courtauld Emotional Control Scale. Such
correlations with specific dimensions of the emotional intelligence measure would provide evidence for convergent validity.
Specifically,

● Hogan's Empathy Scale should converge with the empathy subscale of the emotional intelligence instrument;

● Eysenck's I.7 Impulsiveness Questionnaire should negatively correlate and Watson and Greer's Courtauld Emotional
Control Scale should positively correlate with the motivating oneself subscale of the emotional intelligence instrument;

● Riggio's Social Skills Inventory should converge with the handling relationships subscale of the emotional intelligence
instrument; and

● Gist's Social Intelligence measure should positively correlate with the self-awareness and handling relationships subscales of
the emotional intelligence instrument.

The correlations of these other scales with specific subscales of the measure of emotional intelligence would be predicted to be
stronger than the correlations of any of these other scales with the entire measure of emotional intelligence, thus providing
evidence of discriminant validity. In addition, discriminant validity of any measure of emotional intelligence would have to
address how emotional intelligence differs from other intelligences.
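A brief, hypothetical Python sketch of how the predictions listed above might be checked once data were in hand; the arrays stand in for subscale and external-scale scores from the same respondents and are simulated here.

import numpy as np

rng = np.random.default_rng(4)
n = 120  # hypothetical respondents who completed both instruments

# Simulated scores: the EI empathy subscale and Hogan's Empathy Scale are
# built to share variance; the unrelated scale is independent.
shared = rng.normal(size=n)
ei_empathy_subscale = shared + rng.normal(scale=0.6, size=n)
hogan_empathy = shared + rng.normal(scale=0.6, size=n)
unrelated_scale = rng.normal(size=n)

convergent_r = np.corrcoef(ei_empathy_subscale, hogan_empathy)[0, 1]
discriminant_r = np.corrcoef(ei_empathy_subscale, unrelated_scale)[0, 1]

# Convergent validity: a substantial correlation with a similar construct.
# Discriminant validity: a much weaker correlation with a dissimilar one.
print(f"Empathy subscale vs. Hogan's Empathy Scale: {convergent_r:.2f}")
print(f"Empathy subscale vs. unrelated scale:       {discriminant_r:.2f}")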

In addition, as with any measure of a psychological construct, social desirability should be assessed. One of the most popular
measures of social desirability is the Crowne and Marlowe (1964) measure. Another point to be mentioned is that a different
independent sample should be used at each stage in the development of any psychological construct, thus attenuating the
possibility of "sample specific" findings and increasing the generalizability of the measure.

CONCLUSIONS

Creating a paper-and-pencil measure of a psychological construct is a lengthy and difficult process, and poor measures continue
to be created and used. Some researchers may not understand or appreciate the importance of reliability and validity to
proper measurement. Many researchers create measures and never validate them, instead relying on "the face validity if a
measure appears to capture the construct of interest" (Hinkin, 1995: 981). In addition, because developing sound measures is an
arduous and lengthy process, many researchers take shortcuts or simply avoid the process altogether. Schmitt (1989) believes
that the behavioral science field may overlook the importance of validity and reliability, instead emphasizing statistical
analysis. Statistical procedures and analysis are of little importance if the data is collected with measures that have not been
proven to provide reliable and valid data (Nunnally, 1978). And without sound measurement, the theoretical progress of the
field is in jeopardy (Schwab, 1980).

COMMENTS APPRECIATED

Thank you for visiting this page. It was created by Cheri A. Young, Ph.D. student in micro-organizational behavior at Cornell
University. Comments are greatly appreciated and can be addressed to Cheri at cay1@cornell.edu.

Copyright © 1996, Cheri A. Young. All rights reserved.

REFERENCES

Anastasi, A. (1976). Psychological testing, 4th ed. New York: Macmillan.

Ashforth, B.E. & Humphrey, R.H. (1995). Emotion in the workplace: A reappraisal. Human Relations, 48(2), 97-125.

Campbell, D.T. & Fiske, D.W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix.
Psychological Bulletin, 56: 81-105.

Carmines, E.G. & Zeller, R.A. (1979). Reliability and validity assessment. Beverly Hills: Sage.

Cook, T.D. & Campbell, D.T. (1979). Quasi-experimentation. Boston: Houghton Mifflin Company.

Cook, J.D., Hepworth, S.J., Wall, T.D. & Warr, P.B. (1981). The experience of work. San Diego: Academic Press.

Cooper, C.L. & Mangham, I.L. (1971). T-groups: A Survey of Research. London: Wiley-Interscience.

Cronbach, L.J. & Meehl, P.C. (1955). Construct validity in psychological tests. Psychological Bulletin, 52: 281-302.

Crowne, D. & Marlowe, D. (1964). The approval motive: Studies in evaluative dependence. New York: Wiley.

Eysenck, S.B., Pearson, P.R., Easting, G. & Allsopp, J.F. (1985). Age norms for impulsiveness, venturesomeness and empathy
in adults. Personality and Individual Differences, 6(5), 613-619.

Ford, J.K., MacCallum, R.C. & Tait, M. (1986). The application of exploratory factor analysis in applied psychology: A critical
review and analysis. Personnel Psychology, 39: 291-314.

Gardner, H. (1993). Multiple Intelligences. New York: BasicBooks.


Gist, M.E. (1995). The Social Intelligence measure.

Goleman, D. (1995). Emotional intelligence. New York: Bantam Books.

Guadagnoli, E. & Velicer, W.F. (1988). Relation of sample size to the stability of component patterns. Psychological Bulletin,
103: 265-275.

Hinkin, T.R. (1995). A review of scale development practices in the study of organizations. Journal of Management, 21(5), 967-
988.

Hinkin, T.R. & Schriesheim, C.A. (1989). Development and application of new scales to measure the French and Raven (1959)
bases of social power. Journal of Applied Psychology, 74(4): 561-567.

Hoelter, J.W. (1983). The analysis of covariance structures: Goodness-of-fit indices. Sociological Methods and Research, 11:
325-344.

Hogan, R. (1969). Development of an empathy scale. Journal of Consulting and Clinical Psychology, 33, 307-316.

Jackson, P.R., Wall, T.D., Martin, R. & Davids, K. (1993). New measures of job control, cognitive demand and production
responsibility. Journal of Applied Psychology, 78: 753-762.

Kenny, D.A. (1979). Correlations and causality. New York: Wiley.

Lissitz, R.W. & Green, S.B. (1975). Effect of the number of scale points on reliability: A Monte Carlo approach. Journal of
Applied Psychology, 60: 10-13.

Mayer, J.D. & Salovey, P. (1993). The intelligence of emotional intelligence. Intelligence, 17, 433-442.

Nunnally, J.C. (1976). Psychometric theory, 2nd ed. New York: McGraw-Hill.

Riggio, R. (1986). Assessment of basic social skills. Journal of Personality and Social Psychology, 51(3), 649-660.

Ruisel, I. (1992). Social intelligence: Conception and methodological problems. Studia Psychologica, 34(4-5), 281-296.

Rummel, R.J. (1970). Applied factor analysis. Evanston, IL: Northwestern University Press.

Salovey, P. & Mayer, J.D. (1990). Emotional intelligence. Imagination, Cognition, and Personality, 9(1990), 185-211.

Schmitt, N.W. & Klimoski, R.J. (1991). Research methods in human resources management. Cincinnati: South-Western
Publishing.

Schmitt, N.W. & Stults, D.M. (1985). Factors defined by negatively keyed items: The results of careless respondents? Applied
Psychological Measurement, 9: 367-373.

Schoenfeldt, L.F. (1984). Psychometric properties of organizational research instruments. In T.S. Bateman & G.R. Ferris
(Eds.), Method and analysis in organizational research. Reston, VA: Reston Publishing.
Schriesheim, C.A. & Hill, K. (1981). Controlling acquiescence response bias by item reversal: The effect on questionnaire
validity. Educational and psychological measurement, 41: 1101-1114.

Schriesheim, C.A., Powers, K.J., Scandura, T.A., Gardiner, C.C. & Lankau, M.J. (1993). Improving construct measurement in
management research: Comments and a quantitative approach for assessing the theoretical content adequacy of paper-and-
pencil survey-type instruments. Journal of Management, 19: 385-417.

Schwab, D.P. (1980). Construct validity in organization behavior. In B.M. Staw & L.L. Cummings (Eds.), Research in
organizational behavior, Vol. 2. Greenwich, CT: JAI Press.

Snyder, M. (1986). On the nature of self-monitoring: Matters of assessment, matters of validity. Journal of Personality and
Social Psychology, 51(1), 125-139.

Stone, E. (1978). Research methods in organizational behavior. Glenview, IL: Scott, Foresman.

Thorndike, E.L. (1920). Intelligence and its uses. Harper's Magazine, 140, 227-235.

Trochim, W.M. (1991). Developing an evaluation culture for international agricultural research. In D.R. Lee, S. Kearl, and N.
Uphoff (Eds.). Assessing the Impact of International Agricultural Research for Sustainable Development: Proceedings from a
Symposium at Cornell University, Ithaca, NY, June 16-19, the Cornell Institute for Food, Agriculture and Development,
Ithaca, NY.

Trochim, W.M. (1989). An introduction to concept mapping for planning and evaluation. Evaluation and Program Planning,
12, 1-16.

Trochim, W.M. (1985). Pattern matching, validity, and conceptualization in program evaluation. Evaluation Review, 9(5), 575-
604.

Watson, M. & Greer, S. (1983). Development of a questionnaire measure of emotional control. Journal of Psychosomatic
Research, 27(4), 299-305.

Williams, W.M. & Sternberg, R.J. (1988). Group intelligence: Why some groups are better than others. Intelligence, 12, 351-
377.
Introductory Social Research Methods: Introduction to Program Evaluation
Copyright © 1997 by David Abrahams. All rights reserved.
Updated: May 8, 1997

Introduction to Program Evaluation
by David Abrahams

Introduction to Research Design


"It is the theory that decides what can be observed."
Albert Einstein

Experiments, if conducted correctly, can enable a better understanding of the relationship between a causal hypothesis and a particular
phenomenon of theoretical or practical interest. One of the biggest challenges is deciding which research methodology to use.
“Research that tests the adequacy of research methods does not prove which technique is better; it simply provides evidence relating
to the potential strengths and limitations of each approach” (Howard, 1985).

In research and evaluation, a true experimental design (also known as a randomized experimental design) is the preferred method of
research. It provides the highest degree of control over an experiment, enabling the researcher to draw causal inferences with a high
degree of confidence.

A true experimental design is a design in which subjects are randomly assigned to program and control groups. With this technique,
every subject has an equal chance of being assigned to either group, which makes this design the strongest method for establishing
equivalence between a program and a control group.

Quasi-experimental group design differs from true experimental group design by the omission of random assignment of subjects to
program and control groups. As a result, you cannot be sure that the program and control groups are equivalent.

Using a randomized experimental design to assign subjects to program and control groups controls for most threats to internal
validity. Issues of internal validity arise when the groups in a study are nonequivalent: your ability as a researcher to say that your
treatment caused the effect is compromised.

In most causal hypothesis tests, the central inferential question is whether any observed outcome differences between groups are
attributable to the program or instead to some other factor. In order to argue for the internal validity of an inference, the analyst must
attempt to demonstrate that the program, and not some plausible alternative explanation, is responsible for the effect. In the literature
on internal validity, these plausible alternative explanations or factors are often termed "threats to internal validity" (Trochim, 1997).

Let us consider an instance in which an investigator wishes to determine if a program designed to reduce prejudice is effective. In
this instance, the independent variable is a lecture on prejudice for grammar school students. For the dependent measure, the
researcher will use a standard self-report test of prejudice. To conduct the study, the researcher selects a group of students from a
local grammar school and administers the prejudice questionnaire to all of them. A week later, all the students receive the lecture on
prejudice and, after the lecture, again are tested. The next step is to find out whether the prejudice scores collected before the
intervention (call them the pretest scores) are substantially higher than scores obtained following the lecture (the posttest scores). The
researcher might conclude that, if the posttest responses are lower than the pretest responses, the intervention has reduced subjects'
prejudice. As you can see, what the researcher has done is assume that changes in the dependent variable were caused by the
introduction of the independent variable. But what possibilities other than the operation of the independent variable on the dependent
variable might explain the observed relationship (Campbell & Stanley, 1963)? The section on experimental design explains several
such threats to internal validity .

This is an important point to note. The research designs and methods used in an evaluation have a direct effect on whether or not a
program is perceived as effective. Did the cause really produce the effect, or was it some other plausible explanation? If the cause
produced the effect, can it be generalized to a different group in another location? These are questions of validity. "The first thing we
have to ask is: "validity of what?" When we think about validity in research, most of us think about research components. We might
say that a measure is a valid one, or that a valid sample was drawn, or that the design had strong validity. All of those statements are
technically incorrect. Measures, samples and designs don't 'have' validity -- only propositions can be said to be valid. Technically, we
should say that a measure leads to valid conclusions or that a sample enables valid inferences, and so on. It is a proposition, inference
or conclusion that can 'have' validity" (Trochim, 1997).



Random Assignment
by David Abrahams

Random Assignment

What is necessary for an assignment to be considered random? The most important requirement is that each participant or subject in
the study has an equal chance of being assigned to either group. With two groups, "equal chance" is expressed statistically as ".5" or
"50%." One simple method is to assign the subjects to the two groups by flipping a coin: if the result of the coin toss is heads, the
subject is assigned to one group; if the result is tails, the subject is assigned to the other group. A more practical method for larger
studies is to use a personal computer with a statistical program such as MINITAB or SPSS to randomly assign the subjects to groups.
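
As an editorial illustration only (not part of the original tutorial), here is a minimal Python sketch of both approaches; the variable names and the group size of 20 are arbitrary assumptions:

import random

subjects = [f"subject_{i}" for i in range(1, 21)]   # 20 hypothetical subjects

# Coin-flip assignment: each subject has a 50% chance of either group.
program, control = [], []
for s in subjects:
    (program if random.random() < 0.5 else control).append(s)

# Alternative: shuffle and split, which also guarantees equal group sizes.
random.shuffle(subjects)
half = len(subjects) // 2
program_equal, control_equal = subjects[:half], subjects[half:]

print(len(program), len(control))               # sizes vary by chance
print(len(program_equal), len(control_equal))   # always 10 and 10

The shuffle-and-split variant is what most statistical packages effectively do when they randomize a subject list into two groups of fixed size.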



Experimental Designs
by David Abrahams

Experimental Designs and Their Meaning

R O X O

"To understand is to perceive patterns."


Isaiah Berlin, Historical Inevitability

There are three types of experimental designs. They are:

* Preexperimental designs
* True experimental designs and
* Quasi-experimental designs.

Preexperimental designs lack random assignment of subjects to the program group and the control group. These designs have some
inherent weaknesses in terms of establishing internal validity. The better designs are called true experimental designs and
quasi-experimental designs. True experimental designs are more complex and use randomization and other techniques to
control the threats to internal validity. Quasi-experimental designs (Trochim, 1996) are special designs for approximating true
experimental control in settings where randomization is not possible. The closer your nonequivalent groups come to the equivalence
produced by random assignment, the stronger the internal validity.



Preexperimental Designs
by David Abrahams

Preexperimental Designs and Their Meaning

O X O

Preexperimental Designs

On the surface, the design below appears to be an adequate design. The subjects are pretested, exposed to a treatment, and then
posttested. It would seem that any differences between the pretest measures and posttest measures would be due to the program
treatment.

The One-Group Pretest-Posttest Design


Experimental Group: O X O

However, there are serious weaknesses in this design. With the exception of the selection and mortality threats to internal validity, which
are not factors because there is no comparison group, this design is subject to five other threats to internal validity. If a historical event
related to the dependent variable intervenes between the pretest and the posttest, its effects could be confused with those of the
independent variable. Maturation changes in the subjects could also produce differences between pretest and posttest scores. If simply
taking the pretest affects how subjects respond on the posttest (through practice or sensitization, for example), a shift of scores from
pretest to posttest could occur, resulting in a testing threat. If the measurement instrument or procedure changes between the pretest
and the posttest, instrumentation changes could produce variation in the pretest and posttest scores. Finally, if the subjects were selected
because they possessed some extreme characteristic, differences between pretest and posttest scores could be due to regression toward the mean.

In all of these cases, variation on the dependent variable produced by one or more of the validity threats could easily be mistaken for
variation due to the independent variable. The fact that plausible alternative explanations cannot be ruled out makes it very difficult to
say with any confidence that the treatment caused the observed effect.

The next preexperimental design involves comparing one group that experiences the treatment with another group that does not.

Experimental group: X O
Control group: O

In considering this design, it is important to recognize that the comparison group that appears to be a control group is not, in the true
sense, a control group. The major validity threat to this design is selection. Note that the omission of the letter "R" (no random
assignment) indicates that the comparison group is nonequivalent. In the above design, the comparison group is picked up only for the
purpose of comparison; there is no assurance of comparability between it and the experimental group. For example, we might wish
to test the impact of a new math program by comparing a school in which the program exists with one that does not have the
program. Any conclusions we might reach about the effects of the program might be inaccurate because of other differences between
the two schools.

Despite their weaknesses, preexperimental designs are used when resources do not permit the development of true experimental
designs. The conclusions reached from this type of design should be regarded with the utmost caution and the results viewed as
suggestive at best (Dooley, 1990).


True Experimental Designs
by David Abrahams

True Experimental Designs and Their Meaning

R O X O

True Experimental Designs

Probably the most common design is the Pretest-Posttest Group Design with random assignment. This design is used so often that it
is frequently referred to by its popular name: the "classic" experimental design. In a true experimental design, the proper test of a
hypothesis is the comparison of the posttests between the treatment group and the control group.

Experimental group: R O X O
Control group: R O O

This design utilizes a control group, using random assignment to equalize the comparison groups, which eliminates all the threats to
internal validity except mortality. Because of this, we can have considerable confidence that any differences between treatment group
and control group are due to the treatment.

Why are internal threats to validity removed by this design? History is removed as a rival explanation of differences between the
groups on the posttest because both groups would experience the same events. Maturation effects are removed because the same
amount of time passes for both groups. Instrumentation threats are controlled by this design because, although any unreliability in the
measurement could cause a shift in scores from pretest to posttest, both groups would experience the same effect. By maintaining
equivalence between the groups, the design removes these threats to internal validity. This enables you to conclude with a high degree of
confidence that your treatment, and not some alternative plausible explanation, caused the observed effect.

With respect to regression, the classic experimental design can control for regression through random assignment of subjects with
extreme characteristics. This ensures that whenever regression does take place both groups will equally experience its effect.
Regression toward the mean should not, therefore, account for any differences between the groups on the posttest. Randomization
also controls for selection threat to internal validity by making sure that the comparison groups are equivalent.

Another true experimental design is the Solomon Four-Group Design which is more sophisticated in that four different comparison
groups are used.

Experimental group 1: R O X O
Control group 1: R O O
Experimental group 2: R X O
Control group 2: R O

The major advantage of the Solomon design is that it can tell us whether changes in the dependent variable are due to some
interaction effect between the pretest and the treatment. For example, let's say we wanted to assess the effect on attitude about police
officers (the dependent variable) after receiving positive information about a group of police officers' community service work (the
independent variable). During the pretest, the groups are asked questions regarding their attitudes toward police officers. Next, they
are exposed to the experimental treatment: newspaper articles reporting on civic deeds and rescue efforts of members of the police
department.

If treatment group 1 scores lower on the attitude test than control group 1, it might be due to the independent variable. But it could
also be that filling out a pretest questionnaire has sensitized people to the difficulties of being a police officer. The people in
treatment group 1 are alerted to the issues, so they react more strongly to the experimental treatment than they would have without
such pretesting. If this is true, then experimental group 2 should show less change than experimental group 1. If the independent
variable has an effect separate from its interaction with the pretest, then experimental group 2 should show more change than
control group 1. If control group 1 and experimental group 2 show no change but experimental group 1 does show a change, then the
change is produced only by the interaction of pretesting and treatment.

The Solomon design enables us to make a more complex assessment of the cause of changes in the dependent variable. The combined
effects of maturation and history can not only be controlled but also measured: by comparing the posttest of control group 2 with the
pretests of experimental group 1 and control group 1, these effects can be assessed. In practice, however, our concern with history and
maturation effects is usually only in terms of controlling them, not measuring them.

The Solomon design is often bypassed because it requires twice as many groups. This effectively doubles the time and cost of
conducting the experiment. Many researchers decide that the advantages are not worth the added cost and complexity (Graziano and
Raulin, 1996).



Design Notation
by David Abrahams

Design Notation

R O X O

"The greatest challenge to any thinker is stating


the problem in a way that will allow a solution."
Bertrand Russell

Design notation is a compact way to describe the structure of a research design: how each group was assigned, and the sequence of
observations and treatments each group receives. This system provides the researcher with all the information necessary for describing
an experimental design. Be aware that design notation is a very flexible system with respect to symbol definition; in order to accurately
interpret the results of another researcher's work that uses design notation, that researcher must provide a legend that defines the
meaning of the symbols. The elements of design notation are:

· "O" is the symbol for an observation or measurement.

· "X" is the symbol for the administration of a program or treatment.

· The groups (program and control) are each given their own line of notation symbols. For example, if there are three lines, there are
three groups.

· Assignment: the first letter at the beginning of each line describes how that group was assigned:

R = random assignment
N = nonequivalent group
C = cutoff point for assignment

Note: If a line does not begin with one of these three letters and begins with an "O", that line also indicates a nonequivalent
group.
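
As a short worked example (added here for illustration, not part of the original page), the nonequivalent groups design discussed later in these tutorials would be written:

N  O  X  O
N  O     O

Reading each line from left to right: two groups assigned nonequivalently (N), each given a pretest observation (O); the first group then receives the program (X), and both groups are given a posttest observation (O).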



Social Researchers

We cut nature up, organize it into concepts, and ascribe significances as we do, largely because we are parties
to a "contentious group of truth seekers" that agrees to organize it in this way - an agreement that holds
through the research community and is codified in the patterns of our language of research.

Our challenge as "truth seekers" is to see things as they are and ask why? Hopefully our solutions will have a
positive effect on change and not be adversely affected by it.
David Abrahams



Bibliography

Campbell, Donald T. and Stanley, Julian C. Experimental and Quasi-Experimental Designs for Research. Rand McNally, 1966.

Graziano, Anthony and Raulin, Michael. Research Methods: A Process of Inquiry. Longman, Inc., 1996.

Howard, George. Basic Research Methods in the Social Sciences. Scott, Foresman and Company, 1985.

Trochim, William M.K. Social Research Methods. Available at: http://trochim.human.cornell.edu/



CHOOSING A RESEARCH DESIGN

What Kind of Experiment Should You Do?!?

This Web site will serve as a tutorial on how to choose a research design. The tutorial focuses on experimental designs (as
opposed to observational studies, which involve observing an already existing situation). To begin with, let's define what
an experiment is. An experiment is a study of a cohort (group of subjects) in which the investigator manipulates the predictor
variable, otherwise called the treatment, program, or intervention, and then observes the outcome. This tutorial focuses on
experiments with two groups: one that receives the treatment and a comparison group that does not. We have briefly
discussed the structure of a generic experimental study. What we haven't discussed is that there are varying types of
experimental designs. These types depend on how your treatment and comparison groups are selected. So, depending on how these
groups are selected, you may have more or less ability to establish a causal inference; that is, different types of designs have,
respectively, greater or lesser ability to determine whether your treatment or intervention is in fact the cause of the difference in your
outcome measurement between your two groups.

Now, let's go through some basic steps on how to choose a research design appropriate for the issue you
want to address.

The first question is: Are you concerned with establishing a causal inference? (YES / NO)

Or: is it causality combined with a concern for the ethical treatment of your subjects?

Some Other Thoughts and Caveats

*Please note that this tutorial does not address monetary or feasibility considerations of research design. Although a certain type of
experiment may be the best for what you want to study, it may not be feasible or affordable. These issues can be addressed
once you decide on the goal, manpower, and budget for your study.*

*This tutorial is meant to be an introduction to research design, so the designs mentioned are of the simplest form. Research
design can become quite complicated, and the designs can become quite intricate. I hope you find this tutorial helpful, a
good introduction to research design, and that it sparks your interest in the topic.*

*There are also study designs that involve only one group (a group to which you address your issue); these will not be
dealt with in this tutorial. Good luck!*

Rhonda BeLue

Cornell University
Department of Policy Analysis and Management
Ithaca, New York 14850.

4/9/97
Randomized Experiments

So You've Decided to Do A Randomized Experimental Design

First, let's define what a randomized experiment is. A randomized experiment is one in which subjects are assigned to the treatment and
comparison groups randomly. That is, once you determine what or who your target population is going to be, your subjects are then
randomly assigned to the comparison and treatment groups. This random assignment gives you probabilistic equivalence, which means
that any differences between the treatment and comparison groups are due to chance: after randomized assignment, the groups are
equivalent in probability. This notion of probabilistic equivalence is the reason that randomized experiments ("true experiments") are the
best for establishing cause, that is, for establishing that your treatment or program works. If the randomization was done properly, you know
that your groups were (in probability) the same before your treatment (program or intervention), and that any difference is due to your
treatment and most likely not to preexisting differences between your two groups introduced by the assignment process.

PRE and POST Tests

Now that you have decided to do a randomized design, will you be taking a pre-treatment measurement (a pre-test) as well as a
measurement to determine your treatment outcome (a post-test)? If you have the resources, an appropriate pre-test measure, and
you want to get an idea of where your subjects stand before your treatment, I suggest that you do a pre-test measure.

So, will you do both a pretest and a post-test measure, or just a post-test? (PRE & POST / POST ONLY)

Designs Involving a pre and post test

So you've decided to do a randomized design with a pre-test and a post-test?

Here is model of the design:

--------------------

Rt -> Pr -> I -> Po

--------------------

Rc -> Pr -> Po

--------------------

Rt=Randomized treatment group


Rc=Randomized comparison group
Pr=Pre-test
I=Intervention
Po=Post-test

Analysis of Designs with a Pre and Post Test.

Designs with both pre- and post-tests are typically analyzed using Analysis of Covariance (ANCOVA). What is ANCOVA? ANCOVA is
a statistical method that allows you to covary the pretest measurement with the outcome measurement. ANCOVA is basically a linear
regression model that adjusts for the pretest measure. In other words, we remove the effect of the pretest
measure so that we can look at the difference between the post-test measurements of the treatment and comparison groups.

For example, the ANCOVA model looks like this:

Outcome = intercept + (constant1)(Pretest) + (constant2)(Z) + error term

Where Z indicates your treatment versus your comparison group (Z = 1 for treatment, Z = 0 for comparison).
Here is an example of an ANCOVA analysis in MINITAB statistical Package

(I made up this data!)

The regression equation is


posttest = 0.107 - 0.053 pretest + 0.069 Z

Predictor Coef StDev T P


Constant 0.1071 0.1466 0.73 0.230
pretest -0.0530 0.1009 -0.53 0.100
Z 0.0687 0.1995 0.34 0.042

S = 0.9942 R-Sq = 0.57 R-Sq(adj) = 56.2%

We can see from the P-values that P < .05 for the Z predictor, so we know that Z is significant and, therefore, that there is a
difference between the treatment and comparison groups.

If you would like more information on ANCOVA, Applied Linear Statistical Models, by Neter, Kutner, Nachtsheim, and
Wasserman is a great reference. This book also discusses P-values and significance levels if you need a refresher.
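
For readers who prefer a general-purpose language to MINITAB, here is a minimal sketch of the same ANCOVA-style regression using Python's statsmodels package. This is an editorial illustration: the simulated data, the variable names, and the built-in treatment effect of 3.0 are assumptions, not taken from the original page.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 100

# Simulated data: Z = 1 for the treatment group, 0 for the comparison group.
z = np.repeat([1, 0], n // 2)
pretest = rng.normal(50, 10, n)
posttest = 5 + 0.8 * pretest + 3.0 * z + rng.normal(0, 5, n)  # true effect = 3.0

df = pd.DataFrame({"posttest": posttest, "pretest": pretest, "Z": z})

# ANCOVA as a linear regression: posttest on the pretest (covariate) and Z (group).
model = smf.ols("posttest ~ pretest + Z", data=df).fit()
print(model.summary())          # the coefficient on Z estimates the treatment effect
print(model.pvalues["Z"])       # significance test for the group difference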

__________________________________________________________

ANCOVA can be done in a variety of statistical packages

The easiest and/or most versatile packages that do ANCOVA are:

1) MINITAB - very user friendly, point & click
2) SAS - sorry, you have to write your own code for this one!
3) SPSS - again, point & click


Designs Involving Post Test Only

So you've decided to do a randomized design with a post test only

Here is a simple model of the design:

-------------------------

Rt -> I -> Po

-------------------------

Rc -> -> Po

-------------------------

Rt=Randomized treatment group


Rc=Randomized comparison group
I=Intervention
Po=Post-test

Analysis of Designs with Post Test Only

The easiest way to understand and analyze designs with a post-test only is:

I. T-Test:

A t-test is a ratio that calculates [difference between the group means] / [variability of the groups]. It is a signal-to-noise ratio.

t = [MEAN(treatment) - MEAN(comparison)] / Standard Error(treatment - comparison)

Here is an example of a T-Test shown in MINITAB :


Two-sample T for Comparison group vs Treatment group

                      N      Mean     StDev    SE Mean
Comparison group    100     0.143     0.986      0.099
Treatment group     100     0.030     0.990      0.099

*95% CI for mu Comp - mu Treat: (-0.163, 0.388)
T-Test mu Comp = mu Treat (vs not =): T = 0.80  P = 0.42  DF = 197

The results of this t-test are not significant. We know this because the confidence interval (indicated by * above) spans zero: CI
(-0.163, 0.388). This is the easiest way to tell whether your test is significant or not; equivalently, the P-value of 0.42 is greater than .05.
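
The same two-sample comparison can be reproduced in Python with scipy. This is an editorial sketch with simulated data (the means and standard deviations are borrowed from the made-up MINITAB example above), not part of the original page.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Simulated post-test scores for 100 comparison and 100 treatment subjects.
comparison = rng.normal(0.143, 0.99, 100)
treatment = rng.normal(0.030, 0.99, 100)

# Two-sample t-test: a signal-to-noise ratio of the mean difference.
t_stat, p_value = stats.ttest_ind(comparison, treatment)
print(f"t = {t_stat:.2f}, p = {p_value:.2f}")
# If p > .05 (equivalently, if the CI for the difference spans zero),
# the groups are not significantly different.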

For your information, designs with a post-test only can also be analyzed using a:

2) Linear Regression

MODEL:

Outcome = Constant1 + (constant2)(Zi) + error

Z = 0 for the comparison group, Z = 1 for the treatment group

OR

3) Analysis of Variance (ANOVA)

MODEL:

outcome = U + Ai + error

U = population mean of all observations
Ai = effect of group i (treatment or no treatment)

All of these methods will yield the same results.

More information on ANOVA and linear regression can be found in Applied Linear Statistical Models, by Neter, Kutner, Nachtsheim,
and Wasserman.
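
As a quick check of the claim above that all of these methods yield the same results, here is an editorial sketch (with simulated data and arbitrary variable names) showing that an OLS regression on a 0/1 group dummy reproduces the equal-variance two-sample t-test:

import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(2)
comparison = rng.normal(0.0, 1.0, 100)   # Z = 0
treatment = rng.normal(0.3, 1.0, 100)    # Z = 1

y = np.concatenate([comparison, treatment])
z = np.concatenate([np.zeros(100), np.ones(100)])

# Regression of the outcome on the group dummy.
ols = sm.OLS(y, sm.add_constant(z)).fit()

# Equal-variance two-sample t-test on the same data.
t_stat, p_value = stats.ttest_ind(treatment, comparison)

print(ols.params[1], ols.pvalues[1])   # coefficient on Z and its p-value
print(t_stat, p_value)                 # same t and p as the regression slope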

The easiest and/or most versatile packages for t-tests, ANOVA, and linear regression are:

1) MINITAB - very user friendly, point & click
2) SAS - sorry, you have to write your own code for this one
3) SPSS - again, point & click

Quasi Experimental Designs

Causal Inference: Not so important or not feasible?

A quasi-experimental design is one in which the treatment and comparison groups are not assigned by randomization. The groups might
be, for example, two different 6th-grade classes (or two different grades altogether) in an education study. Another example would be a
cholesterol study where the treatment group is everyone with LDL > 150 and the comparison group is everyone with LDL < 150. The
latter example is actually indicative of cut-off assignment, which is very good at inferring causality, but it is still technically a
quasi-experimental design because the groups were not assigned randomly.

As we can see, it is possible that pre-existing groups (e.g., the 6th-grade example) are dissimilar in one way or another before the
treatment or intervention even occurs. That is why these types of experiments are not as good as randomized (or cut-off) experiments
for inferring causation. The problem is that you may not be able to tell whether it is your treatment/program/intervention, or dissimilarities
between the two groups, that causes the difference in your outcome.

The one advantage of these types of experiments is that they are often cheaper and more feasible than a randomized
experiment, and the differences between comparison and treatment groups can be accounted for in the analysis (see the analysis
section on the non-equivalent groups page below).

If you feel that you do want a greater ability to infer cause, you might want to investigate whether it is feasible to do your
experiment as a randomized design.

non-equiv. groups

Regression Discontinuity Design (RDD)

So You Are Concerned With the Ethical Treatment of Your Subjects?

A regression discontinuity design is a design in which the two groups are distinguished by a cut-off point. This is considered the most
ethical of the study designs.

If you assign your groups by a cut-off point, you are able to give your treatment or program to those who really need it. Take, for
example, a study on cholesterol. If we give the treatment (say, some drug) to those with LDL > 150, and no treatment to those with
LDL < 150, those who need the treatment actually get it, as opposed to groups that were assigned randomly or that were pre-existing
(i.e., non-equivalent) groups.

Here is the model for the design:

----------------------

Ct-> Pr->I->Po

---------------------

Cc-> Pr-> Po

-------------------

Ct=Cut-off Assigned treatment group

Cc=Cut-off assigned comparison group

I=intervention

Pr=pre-test (baseline measurement)

Po=Post-test measurement.

If you have looked at the randomized (causal inference) section, you will notice that that design can be done with or without a
pre-test. The standard regression discontinuity design, however, always has a pre-test, so as to get an idea of the differences between
the cut-off-assigned groups before the intervention.

Statistical Analysis of RDD:

The statistical analysis of the regression discontinuity design is the standard pre-post test design analysis. The ANCOVA analysis
(see below) is similar in method to the randomized pre-post analysis and the non-equivalent groups analysis. The differences lie in how
you account for the way the treatment and comparison groups were assigned (i.e., randomly, non-equivalently, or by cut-off point).

Analysis of RDD with a Pre and Post Test.

Designs with both pre- and post-tests are analyzed using Analysis of Covariance (ANCOVA), exactly as described for the randomized
pre-post design above; see that section for the ANCOVA model and a MINITAB example of the output.


The information above describes a basic ANCOVA model. When the groups are assigned by cut-off, there are other things you must
consider.

A good reference on the specifics of the RDD is: Trochim, W. 1990. "The regression-discontinuity design." In L. Sechrest, E. Perrin,
and J. Bunker (Eds.), Research Methodology: Strengthening Causal Interpretations of Nonexperimental Data. U.S. Dept. of HHS,
DHHS No. (PHS) 90-3454, pp. 119-139.

If you would like more information on ANCOVA, Applied Linear Statistical Models, by Neter, Kutner, Nachtsheim, and
Wasserman is a great reference. This book also discusses P-values and significance levels if you need a refresher.
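
One of the cut-off-specific considerations is that the pretest is commonly re-centered at the cut-off value before fitting the ANCOVA-style model, so that the coefficient on the group dummy estimates the treatment effect at the cut-off. Here is a minimal Python sketch of that idea; it is an editorial illustration, and the cut-off of 150, the simulated data, and the variable names are assumptions, not from the original page.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 200
cutoff = 150

pretest = rng.normal(150, 20, n)
treated = (pretest > cutoff).astype(int)          # assignment strictly by the cut-off
posttest = 10 + 0.7 * pretest + 5.0 * treated + rng.normal(0, 5, n)

df = pd.DataFrame({
    "posttest": posttest,
    "pre_centered": pretest - cutoff,             # center the pretest at the cut-off
    "treated": treated,
})

# Regression-discontinuity analysis: the coefficient on `treated` is the
# estimated jump (treatment effect) at the cut-off point.
model = smf.ols("posttest ~ pre_centered + treated", data=df).fit()
print(model.params["treated"], model.pvalues["treated"])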

NON-EQUIVALENT GROUP DESIGN(NEGD)

So you have two non-equivalent groups?

A non-equivalent group design is a design in which your two groups are not probabilistically equivalent. An example is a study
that looks at reading comprehension in two existing classes of junior high students.

Here is the model for the design:

----------------------

Nt-> Pr->I->Po

---------------------

Nc-> Pr-> Po

-------------------

Nt=Nonequivalent treatment group

Nc=nonequivalent comparison group

I=intervention

Pr=pre-test (baseline measurement)

Po=Post-test measurement.

If you have looked at the randomized (causal inference) section, you will notice that that design can be done with or without a
pre-test.

The standard NEGD, however, always has a pre-test, so as to get an idea of the differences between the non-equivalent groups before the
intervention. This is not as big an issue with a randomized experiment, due to probabilistic equivalence (see the link above).

Statistical Analysis of NEGD:


The statistical analysis of the NEGD is the standard pre-post test design analysis. The ANCOVA analysis (see below) is similar in
method to the randomized pre-post analysis and the cut-off point analysis. The differences lie in how
you account for the way the treatment and comparison groups were assigned (i.e., randomly, non-equivalently, or
by cut-off point).

Analysis of the NEGD with a Pre and Post Test.

Designs with both pre- and post-tests are analyzed using Analysis of Covariance (ANCOVA), exactly as described for the randomized
pre-post design above; see that section for the ANCOVA model and a MINITAB example of the output.

The information above describes a basic ANCOVA model. When the groups are non-equivalent, you must also adjust for error in the
pretest measurement by adjusting the scores with a reliability estimate (reliability = how much of your measurement is true score,
and how much is due to error).

How to adjust the pre-test:

Adjusted pretest score = average of all pretest scores + (original pretest score - average of all pretest scores) * (reliability)

Reliability can be calculated in a number of ways, or it may already be known for your measurements. This is something you will
have to estimate for each study you do.

The reason this adjustment is necessary in the NEGD is that measurement error in the pretest can bias the ANCOVA estimate of the
treatment effect when the groups already differ on the pretest, as opposed to a design where the groups are considered more similar.

If you would like more information on ANCOVA, Applied Linear Statistical Models, by Neter, Kutner, Nachtsheim, and
Wasserman is a great reference. This book also discusses P-values and significance levels if you need a refresher.
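
As an editorial sketch of the adjustment just described (the reliability value of 0.8, the simulated data, and the variable names are assumptions, not from the original page), the reliability-corrected pretest can be computed and used in place of the raw pretest in the ANCOVA regression:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n = 200
reliability = 0.8                      # assumed known or estimated elsewhere

group = np.repeat([1, 0], n // 2)      # 1 = treatment, 0 = comparison (non-equivalent)
pretest = rng.normal(50 + 3 * group, 10)          # the groups differ at pretest
posttest = 5 + 0.7 * pretest + 4.0 * group + rng.normal(0, 5, n)

# Reliability-corrected pretest: mean + reliability * (score - mean).
adj_pretest = pretest.mean() + reliability * (pretest - pretest.mean())

df = pd.DataFrame({"posttest": posttest, "adj_pretest": adj_pretest, "Z": group})
model = smf.ols("posttest ~ adj_pretest + Z", data=df).fit()
print(model.params["Z"], model.pvalues["Z"])      # adjusted estimate of the program effect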

Research Design and Mixed-Method Approach

A Hands-on Experience
John Sydenstricker-Neto

This tutorial web site provides hands-on experience integrating different research methods into a research strategy. The use of mixed
methods is likely to increase the quality of the final results and to provide a more comprehensive understanding of the phenomena
analyzed. Hyperlinks will connect you to other sites where you can learn more about research design.

● Getting Started
● Case Study: "Bolsa-Escola"
● Evaluation: Audience, Purpose, and Issues
● Research Design
● Why Mixed-Method
● References
● Read More About

Getting Started

The definition of what is a good evaluation varies according to the initial assumptions, values, and philosophical positions held by the
evaluator, and according to the intended uses of the results of an evaluation. One dimension that unites evaluators, however, is a
particular concern regarding the quality of their work. In some sense, that might explain why research methodology is a topic to
which evaluators pay such close attention, and even fight over.

Within the so-called quantitative tradition, quality standards have been defined using the concept of validity (Cook and Campbell,
1979). Establishing validity is a cumulative process with four steps. The initial steps are to assess whether a relationship exists between two
variables (conclusion validity) and to determine if this relationship is causal (internal validity). The third examines whether the theoretical
model is well depicted by the means through which it was operationalized (construct validity). Finally, external validity examines if,
and to what extent, findings can be generalized to other groups, places, and times.

This conceptualization of validity has been very influential even within the so-called qualitative tradition, wherein a solid approach to
assessing the quality of interpretative inquiry is the trustworthiness criteria (Lincoln and Guba, 1985; Guba and Lincoln, 1989). Besides
critiques of the classical approach to validity, these criteria include the notions of credibility and transferability, which parallel
the concepts of internal validity and external validity, respectively.

These parallels suggest that the dichotomy, quantitative versus qualitative, might not be so incompatible as purists from both sides
have argued. More than that, studies using mixed methods have shown that integration of these traditions within the same study can
be seen as complementary (Greene and Caracelli, 1997; Caracelli and Greene, 1997).
So, to give you hands-on experience, let us examine a case study and a research strategy to evaluate it. This strategy is placed within
a mixed-method approach, and potential benefits of such an approach are highlighted. This will put into perspective your knowledge
of research design.

Case Study: "Bolsa-Escola"

Four years ago, the Secretary of Education of Brasília, the federal capital of Brazil, started an innovative educational program called
"Bolsa-Escola" (Basic School Fellowship). According to specialists, "Bolsa-Escola" is not a simple educational or welfare program for
poor families. It is a program that addresses children's education. It also helps their families, whose difficulties are at the root of
childhood neediness. The program provides an additional monthly income of approximately $100 U.S. (the current minimum monthly
wage in Brazil) for poor families that have all their 7-14 year-old children enrolled and regularly attending classes in the nearest
public school.

To be eligible, children in the family must attend a public school, the family must have lived in Brasília for at least five years before
enrollment in the program, and the family must be considered "poor" according to a scale. This scale takes into account per
capita income, type of dwelling (owned or rented), number of children in the household, whether the father and/or mother are employed,
and the number and type of electric/electronic devices in the house. Instead of food stamps or benefits in goods or services (e.g., clothes,
shelter), the program gives mothers, but not fathers, money to spend the way they want. They can buy groceries, pay bills, or drink
cachaça (the Brazilian vodka or tequila) in a bar. Two unjustified absences of a child from school in a month are sufficient to cancel
the benefit for that month (Policarpo Jr. and Brasil, 1997).

Evaluation: Audience, Purpose, and Issues


Within an evaluation, we must be selective in terms of the audience, purpose, and issues to be addressed. In this particular case, the
audience comprises policy makers, professionals of the Secretary of Education, and managers directly involved with the program.
The main purpose of the evaluation is to assess the efficacy of the program in order to make decisions about its future.

The main issues the evaluation will address are as follows:

● Enrollment/Withdrawal

❍ What was/is the difference in school enrollment/withdrawal in Brasília before and after the program started?

❍ What is the difference in the number of children abandoning school since the program started (in general and
within the program)?

❍ How does program participants' class attendance compare to that of nonparticipants?

● Performance

❍ What was/is the difference in students' performance at the end of the school year before and after the program
started?

❍ How does program participants' performance compare to that of nonparticipants?

Research Design

Research design refers to the strategy to integrate the different components of the research project in a cohesive and coherent way.
Rather than a "cookbook" from which you choose the best recipe, it is a means to structure a research project in order to address a
defined set of questions (Trochim and Land, 1982).

Considering the nature of the "Bolsa-Escola" program, types of research design, and specific strengths of the major quasi-
experimental designs, we decided to adopt the regression-discontinuity design. The major assumptions of this design and its relation
to our case study are:

The Cutoff Criterion. Children and their families are assigned to the program based on a defined socioeconomic
scale, creating two distinct groups: a) children belonging to low-income families (program group) who, therefore,
will receive financial support (treatment); and b) children belonging to families above this income level who,
therefore, will not receive any additional benefit (control).

The Pre-Postprogram Measures. The major sources of information for both issues-- enrollment/withdrawal/
attendance and students' performance--are official school records. Complementary data come from application
forms and initial interviews with parents before the child/family is formally enrolled in the program. For both
issues, two dimensions are considered before and after the program was implemented, as well as during
implementation of the program (program group versus control group).

Statistical Issues. We will assume that the requirements regarding the statistical model are fully met, including
statistical power (42,000 children enrolled in the program).

Program Implementation. We will assume that the program is implemented according to the guidelines and there
is no major delivery discrepancy.

The figure expresses the regression-discontinuity design in notation form. The letter "C"
indicates that groups are assigned by means of cutoff (not randomly); the "O" indicates "pre-
postprogram" measures, and the "X" indicates the treatment. The first line refers to the
program group and the second to the control group.
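
Reconstructing that figure in plain text (the arrangement below follows the description above):

C  O  X  O
C  O     O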

Why Mixed-Method

Though regression-discontinuity is strong in internal validity and parallels other non-equivalent designs in terms of validity
threats, interpretation of results might be difficult. Outcomes might be the result of the combined effects of factors (e.g., better training of
teachers, improvement in school facilities) that are not exactly related to the program per se. Depending on the statistical results, it
might also be difficult to assess the efficacy of the program. Adding qualitative flesh to the quantitative bones is a good strategy for
overcoming some of these problems.

Among the purposes for mixed-method evaluation design, Greene et al. (1989) highlight five major ones that might enhance the
evaluation, as follows:

Triangulation tests the consistency of findings obtained through different instruments. In the case study,
triangulation will increase chances to control, or at least assess, some of the threats or multiple causes influencing
our results.

Complementarity clarifies and illustrates results from one method with the use of another method. In our case, in-
class observation will add information about the learning process and will qualify the scores and statistics.

Development uses results from one method to shape subsequent methods or steps in the research process. In our case,
partial results from the preprogram measures might suggest that other assessments should be incorporated.

Initiation stimulates new research questions or challenges results obtained through one method. In our case, in-
depth interviews with teachers and principals will provide new insights on how the program has been perceived
and valued across sites.

Expansion provides richness and detail to the study by exploring specific features of each method. In our case,
integration of the procedures mentioned above will expand the breadth of the study and likely enlighten the more
general debate on social change, social justice, and equity in Brazil and the role of the public and private sector in
this process.

In sum, the examination of this case study helps us see that a research strategy integrating different methods is likely to produce
better results in terms of quality and scope. In addition, it encourages us to probe the underlying requirements assumed by mixed-method
work; two of them are professionals with broader technical skills and financial resources to cover "extra" activities.

Mixed-method is a way to come up with creative alternatives to traditional or more monolithic ways to conceive and implement
evaluation. It is likely that these alternatives will not be able to represent radical shifts in the short run. However, they are a genuine
effort to be reflexive and more critical of the evaluation practice and, ideally, more useful and accountable to broader audiences.

Cited References

Caracelli, Valerie J. and Greene, Jennifer C. 1997. "Crafting mixed-method evaluation design." In J. C. Greene and V. J. Caracelli
(eds.), Advances in mixed-method evaluation: The challenges and benefits of integrating diverse paradigms. New Directions for
Program Evaluation, No. 74. San Francisco, CA: Jossey-Bass, pp. 19-32.

Cook, Thomas D. and Campbell, Donald T. 1979. "Validity." In T.D. Cook and D.T. Campbell. Quasi-experimentation: Design and
analysis for field settings. Boston, MA: Houghton Mifflin, pp. 37-94.

Guba, Egon G. and Lincoln, Yvonna S. 1989. "Judging the quality of fourth generation evaluation." In E.G. Guba and Y.S.
Lincoln. Fourth generation evaluation. Newbury Park, CA: Sage, pp. 228-51.

Greene, Jennifer C. and Caracelli, Valerie J. 1997. "Defining and describing the paradigm issue in mixed-method evaluation." In J. C.
Greene and V. J. Caracelli (eds.). Advances in mixed-method evaluation: The challenges and benefits of integrating diverse
paradigms. New Directions for Program Evaluation, No. 74. San Francisco, CA: Jossey-Bass, pp. 5-18.

Greene, Jennifer C., Caracelli, Valerie J. and Graham, Wendy F. 1989. "Toward a conceptual framework for mixed-method
evaluation design." Educational Evaluation and Policy Analysis, 11(3), pp. 255-74.

Lincoln, Yvonna S. and Guba, Egon G. 1985. Naturalistic inquiry. Beverly Hills, CA: Sage.

Policarpo Jr. and Sandra Brasil. 1997. "Casa e escola: Governo do Distrito Federal ajuda crianças pobres - e também os seus pais."
Veja, edition 1516 (October, 8, 1997), 30 (40), São Paulo, SP: Editora Abril, pp. 74-6.

Trochim, William M. K. and Land, Douglas A. 1982. "Designing designs for research." The Researcher, 1(1), pp. 1-6.

Read More About

If you are interested in going further, here are some suggestions to get you moving.

Mixed-Method

Greene, Jennifer C. and Caracelli, Valerie J. (eds.). 1997. Advances in mixed-method evaluation: The challenges
and benefits of integrating diverse paradigms. New Directions for Program Evaluation, No. 74, San Francisco:
Jossey-Bass.

Rossman, Gretchen B. and Wilson, Bruce L. 1994. "Numbers and words revisited: being 'shamelessly eclectic'."
Quality and Quantity, 28, pp. 315-327.
Qualitative Research

Denzin, Norman K. and Lincoln, Yvonna S. (eds.). 1994. Handbook of qualitative research. Thousand Oaks, CA:
Sage.

Miles, Matthew B. and A. Michael Huberman. 1994. Qualitative data analysis: An expanded sourcebook (2nd ed.).
Thousand Oaks, CA: Sage.

Patton, Michael Q. 1990. Qualitative evaluation and research methods. (2nd ed.). Newbury Park, CA: Sage.

Web Page - Methods of Qualitative Analysis

Web Page - QualPage

Regression-Discontinuity Design

Luft, Harold S. 1990. "The applicability of the regression-discontinuity design in health evaluation." In L. Sechrest,
E. Perrin, and J. Bunker (eds.). Research methodology: Strengthening causal interpretations of nonexperimental
data. Washington, DC: U.S. Dept. of HHS, DHHS, No. (PHS) 90-3454, pp. 141-3.

Trochim, William. 1990. "The regression-discontinuity design." In L. Sechrest, E. Perrin, and J. Bunker (eds.).
Research methodology: Strengthening causal interpretations of nonexperimental data. Washington, DC: U.S.
Dept. of HHS, DHHS, No. (PHS) 90-3454, pp. 119-39.

Trochim, William and Cappelleri, Joseph. 1992. "Cutoff assignment strategies for enhancing randomized clinical
trials." Controlled Clinical Trials, 13, pp. 190-212.

Williams, Sankey V. 1990. "Regression-discontinuity design in health evaluation." In L. Sechrest, E. Perrin, and J.
Bunker (eds.). Research methodology: Strengthening causal interpretations of nonexperimental data. Washington,
DC: U.S. Dept. of HHS, DHHS, No. (PHS) 90-3454, pp. 145-9.

Research Strategy and Design

Cook, Thomas D. 1985. "Postpositivist critical multiplism." In R. L. Shotland and M.M. Mark (eds.), Social science
and social policy. Beverly Hills, CA: Sage, pp. 21-62.

Cook, Thomas D. and Campbell, Donald T. 1979. Quasi-experimentation: Design and analysis for field settings.
Boston, MA: Houghton Mifflin.

Trochim, William. (ed.). 1986. Advances in Quasi-experimental design and analysis. New Directions for Program
Evaluation, No. 31. San Francisco, CA: Jossey-Bass.

Trochim, William. 1997. Knowledge Base Home Page



Copyright © 1997 John Sydenstricker-Neto


Do We Have Internal Validity?
For those who are reading this web page, I am in need of your help. You see, I was appointed by
the Department of Education to evaluate a program. This program was designed to keep students in
high school. The program is called "Keep Me In School". In this program, every ninth grader
entering high school was assigned a mentor from the surrounding community. The
mentors were to stay involved in the student's life for at least four years to ensure that the student
graduated from high school. The program was first implemented in a high school in Washington,
DC and also in a high school in Ithaca, New York.

Now that you know a little about the program, our (yes, I said our) job is to find out if the
program works. In other words, does the "Keep Me in School" program have internal validity? I
know exactly what you are thinking. I thought the same thing when I first read the proposal sent
to me by the Department of Education: "what in the world is internal validity?" So I guess before
we can evaluate the "Keep Me In School" program to see if it has internal validity, we should first
find out what in the world internal validity is.

Now, you have to ask yourself, are you bold enough to take this journey toward understanding
the meaning of internal validity? Before you make up your mind, I want you to know that internal
validity is a relative of the statistical family. Yes, it is true; I believe they are first cousins. So, are
you still brave enough to continue on this journey? If you are not ready to face the
statistical family, I do understand. Just remember that one day you will have to confront this
family before they come after you.

When you are ready, just follow the outline below. First, I will define internal validity. Then, I
will discuss the threats to internal validity. So let's begin. Just follow the outline and feel free to
ask questions if you don't understand something.

Table of Contents

What is Internal Validity

Threats to Internal Validity


Copyright © 2000 Arlean Wells


Internal Validity

What is Internal Validity

What is internal validity?

A research study or experiment has internal validity if the outcome is a result of the variables that are measured,
controlled or manipulated.

" So you are saying, if the program "keep me in school" reduce the amount of students that drops out of school,
than the program has internal validity"?

Yes, you are right. If the "Keep Me In School" program prevent students from dropping out of school and there are
no other alternative explanations for the outcome , than the program has internal validity, .

"No alternative explanation, what do you mean by alternative explanation"?

I mean any other possible causes that could have manipulate the outcome of the program.

"I don't understand. What else could have cause the students to stay in school besides the program"?

You just asked a good question. To answer you question, there are many elements that could effect a program
outcome. We call these elements threats to internal validity.

"Threats to Internal validity"

Yes threats to internal validity, the other alternative explanations that could be raised when you try to conclude that
your program caused the outcome. To make it easier for you to understand the threats to internal validity, I divided
them into three categories. The three categories are, single group threats to internal validity, multiple group threats
to internal validity and social interaction threats to internal validity. Now, if you will return back to the main page,
and continue to follow the outline, I will begin the discussion on single group threats to internal validity.



Copyright © 2000 Arlean Wells, Ithaca, New York. All rights reserved.
August 23, 2004
Threats to Internal Validity
Single Group threats to Internal Validity

Single group threats to internal validity only apply when you are studying a single group that receives your program. The following
are considered single group threats.

History Threat:

History threats to internal validity are historical events that occur during the course of the study. These experiences function
like extra, unplanned independent variables. The historical experiences are likely to vary across subjects, which has a differential
effect on the subjects' responses. For example, in the "Keep Me In School" program, students who participated in the program could
have had other mentors outside of the program who influenced their decision to stay in school.

Maturation Threat:

A maturation threat to internal validity refers to natural (rather than experimenter-imposed) changes that occur as a result of the normal
passage of time. In the "Keep Me In School" program, there may be a possibility that the program did not prevent students from
dropping out of school; instead, the students' natural maturation allowed them to understand the value of education, resulting in
their decision to stay in school.

Testing Threat:

A testing threat to internal validity occurs only in the pre-post design. A consequence of pretesting program
participants is that the pretest can change the subjects' performance on later tests. In the case of "Keep Me In School", if you give a pretest to
the students to evaluate their attitude about school before they start the program, some of the children might become aware of the
goals of the program and behave in a way to accomplish those goals.

Instrumentation Threat:

An instrumentation threat to internal validity is similar to a testing threat because instrumentation threats also occur only in
the pre-post design. In this situation, changing the measurement methods during a study affects what is measured. In the case of the
"Keep Me In School" program, there could only be an instrumentation threat to internal validity if the students were measured before
they participated in the program and again sometime during the program with different instruments.

Mortality Threat:

A mortality threat to internal validity occurs when subjects drop out of the study. If more students in the Washington, DC high school
drop out of the program than students in the Ithaca high school, the observed differences between the groups become questionable.

Statistical Regression Threat:

Statistical regression becomes a threat to internal validity when subjects in a study are selected as participants because they scored
extremely high or extremely low on the pretest. Retesting these subjects will produce a distribution of scores that is closer to the
population mean. In the "Keep Me In School" program, statistical regression would not be a threat to internal validity because all
incoming freshmen receive the program.

We have finished discussing the single group threats to internal validity. Do you have any questions?

"Yes, I have a question. Single group threats to internal validity only occur when you are studying a single group that receives
your program or treatment. Are there any threats to internal validity when there are more then one group receiving your program
or treatment? For example, the "Keep Me In School" program was implemented in two high schools, A high school in
Washington D.C., and a high school in Ithaca, New York.".

Yes, there are threats to internal validity when there are two or more groups receiving the same program or treatment. Those threats
are called multiple group threats to internal validity. If you look at your outline, our next discussion is on multiple group threats to
internal validity.

Multiple Group threats to Internal Validity

The multiple-group threats typically involve at least two groups and before-after measurement (Trochim 2000). The main multiple-
group threats to internal validity are selection threats, which are any factors other than the program that lead to
posttest differences between groups. Because the multiple group threats to internal validity parallel the single group threats
to internal validity, I will only state each of the multiple-group threats instead of discussing them in detail.

Selection-History Threat to internal validity

Selection-Maturation Threat to internal validity

Selection-Testing Threat to internal validity

Selection-Instrumentation Threat to internal validity

Selection-Mortality Threat to internal validity

Selection-Regression Threat to internal validity

We are now finished with the multiple group threats to internal validity. If you are not sure about multiple group threats to internal
validity, please see the Knowledge Base for a fuller discussion.

"I have another question. In order to have multiple group threats to internal validity you have to have at least two groups or
more, in which one group receives the program, while the other group don't. What will happen when the group that don't receive
your program on treatment find out about the group that does?"

You know, you are always asking the right questions. What you are referring to is the third category of threats to internal validity.
Those threats are called social interaction threats to internal validity.

Social Interaction threats to Internal Validity

Social interaction threats to internal validity refer to the social pressures in the research context that lead to post test differences that
are not directly caused by the treatment itself (Trochim 2000).

Diffusion or Imitation of Treatment:


Diffusion or imitation of treatment occurs when a comparison group learns about the program from program participants. The
comparison group might then set up its own program by imitating that of the program group. This threat to internal validity will
equalize the outcomes between the groups, making it harder to tell whether the program under study actually works.

Compensatory Rivalry:

Compensatory rivalry occurs when the comparison group knows what the program group is getting and therefore develops a
competitive attitude toward it. Similar to the diffusion or imitation threat, the compensatory
rivalry threat also makes it harder to see an effect if one is there.

Resentful Demoralization:

Resentful demoralization occurs when the comparison group knows what the program group is getting, gets discouraged or angry,
and withdraws from the study.

Compensatory Equalization of Treatment:

Compensatory equalization of treatment occurs when the people who run or control the program are pressured to give the comparison
group the same or an equivalent program, so that the difference between the groups is erased.

O.K., we have just finished discussing the three categories of threats to internal validity, the alternative explanations for an outcome.
We are also at the end of the journey. It wasn't that bad, was it? Now that you have an understanding of internal validity and its
threats, can you find any threats to internal validity in the case of the "Keep Me In School" program? What would you tell the
Department of Education about the validity of this program? If you are still not sure about internal validity, please see the Knowledge
Base (Trochim 2000).


Copyright © 2000, Arlean Wells, Ithaca, New York.


Designs to rule out threats to internal validity

Welcome to the PAM 613 tutorial web page! The page was put up to help contribute to an in-depth understanding of
the strategies commonly used to rule out plausible alternative explanations, or what we call the selection threats to internal validity,
in a causal relationship study. Among other things, this tutorial page focuses on the strategy of ruling out threats through research
design. In addition to its exploratory role, the design can play a detective and defensive role in ruling out threats to internal validity:
adding treatment or control groups to the basic design, extending the waves of measurement, expanding the study in time, and the like.

The page will introduce you to three main designs that work with almost all types of selection threats to internal validity: the
Double Pretest, Switching Replication, and Solomon Four Group designs. Please be reminded, however, that there is by and large
no cut-and-dried design that solves every problem; what is important is the logic behind each design. If you
understand this logic, then no matter what type of threat to internal validity you encounter in your research or experiment, you
can intuitively craft a design that helps you rule it out. Be optimistic, you will win!

Table of contents

I. Validity in social research

What is validity
What is internal validity

II. Threats to internal validity

A single group threat
A multiple group threat
A social threat

III. Designs to rule out threats to internal validity

A double pretest design
A switching replication design
A Solomon Four Group design

IV. Conclusion


I. Validity in social research

What is validity? "Validity is the best available approximation to the truth of a given proposition, inference or
conclusion" (Trochim, 1999, p. 29). We make conclusions and inferences in our everyday lives.
As students, we are frequently involved in making inferences or drawing conclusions in our
daily academic activity, such as doing research or conducting an experiment from which a conclusion or
inference will be drawn and written up as a research paper, thesis, or dissertation. For instance, after conducting
research on whether a math improvement program for 7th graders has really made a change in the posttest, one
might conclude that the program elevated the average score on the posttest. How valid is this
conclusion? How close to the truth is it? Was the program really responsible for the change in the students' posttest
scores? All these questions are concerned with what is called VALIDITY.

There are four types of validity in social research: construct validity, conclusion validity, external
validity and internal validity. In this assignment, only internal validity is considered, in a more explanatory and,
hopefully, pedagogical way.

What is internal validity? Internal validity is the approximate truth of an inference made about a causal relationship
in a study (Trochim, 1999).

In the earlier example, the math improvement program tried to elevate the 7th graders' average posttest score.
Internal validity in this example has to do with how close to the truth your inference is that your program alone
produced the improved grades the students received on the posttest. There might be other factors that are instead
responsible for making or contributing to such improvement. The students might get good grades because they felt
very good on the day they took the test, or they might get poor grades because they were interrupted by outside
traffic during the test or had not eaten breakfast or lunch (I've experienced this myself; I lost 50% of my
concentration).

The set of factors that prevents us from being sure that the improved average posttest score is due to our
program is called the set of plausible alternative explanations. Put differently, they are frequently
termed threats to internal validity. These threats inhibit researchers from drawing internally
valid inferences or conclusions from their research or experiments unless their studies are designed
in a way that rules the threats out.

II. Threats to Internal Validity

What are threats to internal validity? Threats to internal validity are all the alternative causes, other than
the program or treatment, that could be responsible for a difference in the posttest. These threats prevent
researchers who are trying to study a causal relationship from
detecting the real effect of their program. In other words, the threats to internal validity prevent researchers
from establishing the real causal relationship in their program (Trochim, 1999). For instance, suppose you are
implementing a program designed to improve the low scorers at one specific high school, let's say the 7th
graders. In order to know if your program makes a difference, you conduct a pretest to get a baseline
average score. After the program is completed, you administer a posttest in order to measure how much the
students in the treatment have gained in terms of their average posttest score.

From the example above, can you guess what other possible causes, or threats to internal validity, there might be?
What could be wrong with quickly inferring that the gain in average score is due to your program? Your
critics might come and say that your program has not made any difference at all. They might point to other
historical events, such as earlier training in subjects highly correlated with those in your
program, that have continuing effects on the students' performance and thereby produce the difference in posttest
scores.

Threats to internal validity are categorized into three groups depending on the nature of the research and how
it is designed: single group threats, multiple group threats, and social threats to internal validity.
A single group threat to internal validity occurs when an experiment or treatment involves a single group; that is,
researchers or experimenters are not using a comparison group in their causal relationship study. Single group
threats include history, maturation, testing, instrumentation, mortality and regression to the mean.

A multiple group design refers to a research design that involves two groups in an experiment or treatment,
in which one group receives the treatment and the other does not. The former is named the treatment group, and the
latter the comparison group. The multiple group threats to internal validity refer to conditions in which
the two groups are not comparable before the study. These threats are called selection biases or
selection threats. They include selection history, selection maturation, selection testing, selection instrumentation,
selection mortality and selection regression threats (Trochim, 1999).

A final type of threat to internal validity is a social threat. "The social threat to internal validity refers to the
social pressures in the research context that can lead to post test differences that are not directly caused by the
treatment itself" (Trochim, 1997). Social threats to internal validity include imitation of treatment, compensatory
rivalry, resentful demoralization, and compensatory equalization.

Knowing what the threats to internal validity are is one thing; knowing how to rule them out is
another. The following section gives you several possibilities for achieving this end.

III. Designs to rule out threats to internal validity

There are five main approaches to ruling out threats to internal validity: by argument, by measurement or
observation, by preventive action, by analysis, and finally by research design. In this assignment I am pleased to
introduce you to perhaps the most powerful approach for dealing with threats to internal validity: research
design.

3.1 Double pretest design

The design notation is as follows (N marks nonequivalent group assignment, O an observation or measure, and X the program or treatment):

N  O  O  X  O
N  O  O     O

This design is very strong against threats to internal validity. The design includes two measures, denoted by the
two "O"s, prior to the program. This design can rule out a selection maturation threat. With this design, if the
treatment and comparison groups are, for instance, maturing at different rates, we can detect this maturation
difference between pretest 1 and pretest 2. If there is no detectable difference in maturation rate between the
two pretest measures of the two groups, we can be much more confident that the two groups are comparable before
receiving the treatment. Therefore, a difference between them on the posttest can be attributed to the program
effects.

You might remember that when there are two groups in the experiment, there can be a selection
threat to internal validity. A selection regression threat might lead us to misjudge the treatment effect. If
each group, treatment and comparison, regresses differently, they are no longer
comparable. If they are not comparable, it is of little use to involve them in the experiment, because they will
produce a confusing picture of the treatment effect: we cannot be sure whether the difference in the posttest is
due to the treatment, to selection regression, or to a combination of both.

The double pretest design also works against a selection regression threat. It helps make sure that the two
groups are comparable before the treatment. How? Remember that if a regression threat is operating, it will show up
between pretest 1 and pretest 2. If it has not happened between pretests 1 and 2, it is unlikely to happen between
pretest 2 and the posttest either. Therefore, the difference in the treatment group between pretest 2 and the posttest
can be attributed with much more confidence to the treatment.

The double pretest design can also rule out a selection history threat. You might recall that selection
history refers to the two groups, the treatment and the comparison, not being
comparable before the program because they react differently to historical events. It might be that the program
group reacts to a historical event while the comparison group does not, or vice versa. If this happens, it
prevents us from clearly attributing the difference at the posttest to the treatment effect of the program.

How can the double pretest design handle selection history? It is simple to see how it works in ruling out this
threat. The design involves two groups, each subject to two pretests, pretest 1 and pretest 2. If the
two groups react differently to a historical event, this will show up as a difference between pretest 1 and pretest 2
in one group but not the other, indicating that they are not comparable in how they react to historical events. If they
are not affected by a history threat, there will be no such difference between pretests 1 and 2. The two groups are
thus shown to be comparable before the treatment is administered, and selection history is thereby ruled out.
Therefore, the difference at the posttest in the treatment group can be attributed with much more confidence to the
program.
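
To make this logic concrete, here is a small illustrative sketch in Python with invented pretest means; it simply compares how much each group drifts between pretest 1 and pretest 2, which is exactly the check the double pretest design makes possible.

# Illustrative sketch with invented data: checking whether the treatment and
# comparison groups change at similar rates between the two pretests.
treatment_pre1 = [52, 55, 49, 61, 58, 50, 57, 54]
treatment_pre2 = [54, 56, 50, 63, 60, 51, 59, 55]
comparison_pre1 = [51, 53, 48, 60, 59, 49, 56, 52]
comparison_pre2 = [53, 54, 49, 62, 61, 50, 58, 53]

def mean(xs):
    return sum(xs) / len(xs)

# Average change from pretest 1 to pretest 2 in each group.
treatment_drift = mean(treatment_pre2) - mean(treatment_pre1)
comparison_drift = mean(comparison_pre2) - mean(comparison_pre1)

print("Pre-program drift, treatment group: %.2f" % treatment_drift)
print("Pre-program drift, comparison group: %.2f" % comparison_drift)

# If the two drifts are close, the groups appear to be maturing (or regressing)
# at similar rates before the program, so a later posttest difference is harder
# to attribute to selection maturation or selection regression.
if abs(treatment_drift - comparison_drift) < 1.0:
    print("Similar pre-program drift: selection maturation/regression less plausible.")
else:
    print("Different pre-program drift: the groups may not be comparable.")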

3.2 Switching replication design

Design notation (N marks nonequivalent group assignment, O an observation, and X the treatment, given to each group in a different wave):

N  O  X  O     O
N  O     O  X  O

This design is good at handling the social threats to internal validity. Since it involves multiple groups, in this case
two groups of people, there is usually social interaction between the groups. Some people in different
groups may, for instance, know each other and pass along the benefits of the treatment to others
in the other group, which could be termed a spillover effect. If this happens in the treatment or experiment, it will
prevent us from detecting the real effects of the program on the treatment group.

The social threats to internal validity most frequently encountered in social research are
compensatory rivalry, compensatory equalization, and resentful demoralization.

The switching replication design works well with these issues. In this design, the two groups, the treatment
and the comparison, act alternately as either the treatment or the comparison group in different waves of
measurement. In the first wave of measurement, the first group receives the treatment, denoted by "X", and
the second group acts as the comparison. In the second wave of measurement, the second group receives
the treatment in turn, while the first group becomes the comparison group.

How does this design deal with these social threats? You might recall that the root cause of social threats
is the difference between the two groups: the program. If the program is beneficial, jealousy can be created.
Those who are in the program group will be happy, and those who are not will be unhappy. As both groups are in our
experiment, this social friction can affect the outcomes of the experiment or treatment.

Fortunately, we have a design that can handle the issue. What is nice about this design is that each group
in the experiment receives the program one after the other, as implied by the name of the design, the switching
replication. Because both groups eventually receive the benefits of the program, the social threats spawned
by inequity in the program assignment are largely ruled out.
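
As a rough illustration with invented group means, the analysis of a switching replication design compares the gain of whichever group is being treated in a given wave with the gain of the group serving as the comparison in that wave; a program effect that shows up in both waves is hard to attribute to social threats.

# Illustrative sketch with invented data for a switching replication design.
# Group 1 is treated in wave 1; group 2 is treated in wave 2.
group1 = {"pre": 50.0, "mid": 58.0, "post": 59.0}   # means at the three measurements
group2 = {"pre": 51.0, "mid": 52.0, "post": 60.0}

# Wave 1: group 1 receives the program, group 2 is the comparison.
wave1_treated_gain = group1["mid"] - group1["pre"]
wave1_comparison_gain = group2["mid"] - group2["pre"]

# Wave 2: the roles switch.
wave2_treated_gain = group2["post"] - group2["mid"]
wave2_comparison_gain = group1["post"] - group1["mid"]

print("Wave 1 gains  treated: %.1f  comparison: %.1f" % (wave1_treated_gain, wave1_comparison_gain))
print("Wave 2 gains  treated: %.1f  comparison: %.1f" % (wave2_treated_gain, wave2_comparison_gain))

# A program effect that replicates in both waves (treated gain larger than the
# comparison gain each time) is hard to explain away with social threats, since
# both groups eventually receive the program.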

3.3 The Solomon Four Group Design

The design notation is the following (R marks random assignment, O an observation, and X the treatment):

R  O  X  O
R  O     O
R     X  O
R        O

This design is strong against the testing threat to internal validity, which occurs when the act of taking a test affects
the posttest score. The design consists of four randomly assigned groups. Two of them receive the treatment, as
denoted by "X", and the other two do not. Another important characteristic of the design is that the first two
groups have both a pretest and a posttest, while the last two have only a posttest.

A testing threat could operate in the first two groups, which are subject to the pretest. The design therefore
includes two additional randomly assigned groups that are not pretested, one receiving the treatment and
one not. These last two groups allow us to check whether a testing threat is in fact present in
the experiment.

If a testing threat occurs, it will be reflected in a posttest difference between the two treatment groups,
one of which was exposed to the pretest and one of which was not, and likewise between the two comparison groups. If the groups
with the pretests are not affected by a testing threat, they should produce the same result. That is, the treatment
group with the pretest produces the same result as the treatment group without the pretest, and the same holds for the
comparison groups. If no such difference occurs, the testing threat is therefore ruled out.
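
A small sketch with invented posttest means shows how the four groups can be compared to check for a testing effect and to estimate the program effect from the unpretested groups.

# Illustrative sketch with invented posttest means for a Solomon four group design.
posttest_means = {
    "treatment_with_pretest":    71.5,
    "treatment_without_pretest": 71.0,
    "control_with_pretest":      60.8,
    "control_without_pretest":   60.2,
}

# Testing effect: does taking the pretest by itself change posttest scores?
testing_effect_treatment = (posttest_means["treatment_with_pretest"]
                            - posttest_means["treatment_without_pretest"])
testing_effect_control = (posttest_means["control_with_pretest"]
                          - posttest_means["control_without_pretest"])

# Program effect estimated from the unpretested groups, free of any testing threat.
program_effect = (posttest_means["treatment_without_pretest"]
                  - posttest_means["control_without_pretest"])

print("Testing effect among treated groups:  %.1f" % testing_effect_treatment)
print("Testing effect among control groups:  %.1f" % testing_effect_control)
print("Program effect (unpretested groups):  %.1f" % program_effect)
# Near-zero testing effects mean the pretest itself did not drive the posttest
# difference, so the testing threat can be ruled out.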

IV. Conclusion

The strategy of ruling out selection threats to internal validity by research design has proven technically
promising, but attention has to be paid to logistical concerns. Although it can help us rule out almost all selection
threats, we also have to bear in mind that the cost-effectiveness of each design deserves careful
consideration. Different experimental designs cost the experimenter or researcher different amounts. Expanding
the study in time or in waves of measurement and the like requires an expansion in effort and, most importantly,
in the cost you are going to incur. Therefore, before deciding to use any research design to rule out the threats to
internal validity in your research, you have to do a benefit-cost analysis to see how much you
will gain in terms of internal validity and how much it will cost you to achieve that gain. If the
benefit exceeds the cost, it is fine; please go do it!
Any comments? Please e-mail me: sk234@cornell.edu
Securing Internal Validity - How to Avoid the Threats

Paul A. Burns

The following web page is designed to provide an introduction to internal validity and to
familiarize you with the potential threats or obstacles to achieving internal validity. To begin, we
should have a clear idea of what validity means.

Table of Contents

● What is Validity?
● What is Internal Validity?
● What are the Criteria for Causality?
● What are the threats to Internal Validity?
❍ History

❍ Maturation

❍ Testing

❍ Instrumentation

❍ Regression

❍ Mortality

❍ Selection

● What are the four Social Threats?


● Save I-V-Y (Internal Validity - Yes!)

What is Validity?

Validity is the best available approximation to the truth of a given proposition, inference or conclusion. In other words, when we
make some claim, does the evidence support our conclusion? Validity is divided into four categories. They are as follows:
conclusion, construct, external and internal.

What is Internal Validity?

Internal validity focuses on cause and effect relationships. The notion of one thing leading to another is applicable here (Event A
leads to Event B). For example, it is believed that listening to Selena twice a day causes adolescent girls to crave fajitas and develop
an uncontrollable urge to attack piñatas. However, if we want to test such a claim for internal validity, then we must be able to
assess this unusual phenomenon for causality.

What are the Criteria for Causality?

As we stated earlier, causality is critical to internal validity. Now, let's see how we might be able to help our Selena fans. There are
three components necessary to establish causality.

1) Temporal Precedence

We must have evidence that the cause precedes the effect.

2) Covariation of the Cause and Effect

Here simply we must note a change that demonstrates cause and effect. If the treatment or program is effective then there is a change.
On the other hand, if the program is withheld then there is no change observed.

3) No Plausible Alternative Explanation

On occasion a treatment plan is implemented and a change is noted, but the change is not the result of the treatment plan. In social
science, this is referred to as the third variable or missing variable problem. Despite the fact that there is a relationship, it is not clear
that there is cause and effect. This brings us to the focus of our study: the threats to internal validity.

Now that we understand the notion of cause and effect, do you think our Selena hypothesis meets the criteria? You do the math. For
more information on the criteria for establishing a cause-effect relationship, check out Trochim's home page.

What are the threats to internal validity?

Threats can be defined as issues or concerns that will arise when there are inconsistencies in the data that compromise the causal
relationship. These threats fall into seven major categories:

1. History

I'm sure you are wondering what the Magna Carta has to do with cause and effect. Well, you're right - nothing! It's not that
sort of history we're dealing with, but a more garden-variety type. In order to achieve internal validity, we must be careful to account
for events that occur during the course of the program that might impact the final outcome. For example, suppose you are
implementing a health education program in the public schools to raise awareness and reduce the incidence of teenage pregnancy.
During the program, one of the students becomes pregnant; she eventually drops out of school and commits suicide. Clearly, this
event would have a profound effect on the students' awareness. So when we measure the effectiveness of the program and note an
increase in awareness, we must be sure to factor in the impact of the suicide on the students' awareness when assessing program
effectiveness.
2. Maturation

This is simply what it sounds like. The everyday human activities that lead to growth and that occur naturally while the treatment program is
being implemented are potential threats. These include maturation due to age, experience, physical development, or anything else that
leads to increased knowledge and understanding of the world, which can affect program results.

3. Testing

As the old adage goes, "Experience is the best teacher." This is precisely what happens in programs that utilize pre- and post-test
measures. For example, just by virtue of having had the experience of taking the GRE once, without any additional preparation, you
are more likely to improve your score on a retake.

4. Instrumentation

The instrumentation threat is caused by inconsistencies in the testing instrument, i.e., the interviewer, the grader, or the test itself. Although
a student's math performance may remain unchanged during the course of the program, if the measurement tool is altered, it can
show an improvement that is attributable to the revised test and not the program.

5. Regression

The phenomenon of regression reflects a natural tendency for individuals who score at the outer extremes (either very
high or very low) of the grade continuum to score closer to the mean when retested. For example, Sheena, a low scorer on the high
school achievement exam, is likely to improve her score upon retaking the exam, even without the intervention of any study program,
due to the regression effect.
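
A quick simulation with made-up numbers shows the regression effect at work: if we select the lowest scorers on a noisy test and simply retest them, their average rises even though nothing was done for them.

import random

random.seed(1)

# Each student has a stable "true" ability; each test adds independent noise.
true_ability = [random.gauss(500, 80) for _ in range(10000)]
pretest = [a + random.gauss(0, 50) for a in true_ability]
retest = [a + random.gauss(0, 50) for a in true_ability]

# Select the bottom 10% of pretest scorers, as a remedial program might.
cutoff = sorted(pretest)[len(pretest) // 10]
low_scorers = [i for i, score in enumerate(pretest) if score <= cutoff]

pre_mean = sum(pretest[i] for i in low_scorers) / len(low_scorers)
post_mean = sum(retest[i] for i in low_scorers) / len(low_scorers)

print("Low scorers' pretest mean: %.1f" % pre_mean)
print("Low scorers' retest mean:  %.1f (no program given)" % post_mean)
# The retest mean moves back toward the overall mean of 500 purely because the
# extreme pretest scores were partly bad luck. An evaluator who ignores this
# would credit the "improvement" to the program.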

6. Mortality

Participants dying in your program would definitely not achieve the internal validity you seek. Luckily, however, you need not worry
about that, because we are not referring to a death that leads to loss of life, but instead to a common program problem: the issue of
participants dropping out of the program. Unfortunately, the loss of program participants can create a false treatment effect that
appears to be the result of your program. For example, suppose you want to evaluate a pre-school readiness program for 3-4 year olds. To
determine the effectiveness of the program, you use a pre-post measurement, but during the course of the study several of the low
scorers drop out. The loss of low-end scorers will artificially raise the post-test measures, creating the impression of a program effect
when in fact there is none.
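
Here is a small simulation with invented scores of how dropout alone can manufacture an apparent gain: the program below has no effect at all, yet the posttest mean of the completers looks higher than the original pretest mean.

import random

random.seed(2)

# Invented example: 100 preschoolers, readiness scores out of 100.
pretest = [random.gauss(60, 12) for _ in range(100)]

# Suppose the program has no effect at all: posttest = pretest plus noise.
posttest = [score + random.gauss(0, 4) for score in pretest]

# Several of the lowest pretest scorers drop out before the posttest.
completers = [i for i, score in enumerate(pretest) if score > 45]

pre_mean_all = sum(pretest) / len(pretest)
post_mean_completers = sum(posttest[i] for i in completers) / len(completers)

print("Pretest mean (everyone enrolled):  %.1f" % pre_mean_all)
print("Posttest mean (completers only):   %.1f" % post_mean_completers)
# The posttest mean looks higher simply because the low scorers are gone,
# creating the impression of a program effect where none exists.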

7. Selection

The above internal validity threats are considered single group threats because there is no control group. However, multiple group
designs are also common and statistically stronger, and they are subject to analogues of the same threats. The difference is simply
that the threat now operates through differences between the groups. A selection threat arises when the comparison groups are not
equivalent to begin with. I will not discuss the selection threats in detail, as they are very similar to the single group threats, but will
again refer you to the Trochim home page for more details about multiple group threats. Below you will find a list of the selection
(multiple group) threats.

Selection-History

Selection-Maturation

Selection-Testing

Selection-Instrumentation

Selection-Mortality

Selection-Regression

What are the four Social Threats?

In addition to single and multiple group threats there is another category of threats known as social interaction threats. Because social
research does not take place in a vacuum and the human element underpins all research activity, the occurrence of social interaction
between researcher and subject, or among subjects, is inevitable. It is this social interaction that can lead to misinterpretation of the
cause-effect relationship. The following are the major social interaction threats.

Let's assume for the sake of this exercise that you are studying the impact of art education on student achievement. You have
randomly selected two sixth grade classrooms for study in Boise, Idaho. One class receives arts instruction once a week and the
comparison group receives no arts instruction. And since your research is interesting and valuable, you have a large budget from the
National Endowment for the Arts and the Dept. of Education that includes a trip to New York City for the program group.
Now, let's see what could happen in such a scenario.

1) Diffusion or Imitation of Treatment

This threat occurs if the control or comparison group finds out about the program. Keeping a program like this quiet will be next to
impossible so there will be some sharing of information and experience. Also, the students and teachers in an effort to minimize
differences may try to implement a similar curriculum for the other group.

2) Compensatory Rivalry

Undoubtedly such a program will produce some jealousy (who wouldn't want to spend their afternoons painting and get a free trip to
NYC?), and this may translate into the comparison group developing a competitive spirit. The comparison group may work harder than
usual to show up the lucky artsy-fartsy types.

3) Resentful Demoralization

The reverse of the compensatory rivalry threat may also occur. Here rather than competing with the NYC group the comparison
group may become angry and upset, and out of frustration stop trying altogether.

4) Compensatory Equalization of Treatment

Now, this is where administrators and parents get involved in the process. Naturally, in the above scenario there will be some
hurt feelings. Parents will call to complain and demand to know why their child was not included in the new program. In an effort to
conduct damage control, a similar or compensatory program is implemented to appease the angry parents.

Finally, since social science research is not a perfect science and there are externalities that can affect the outcomes of programs, the
ability to design a project that limits outside factors is crucial. Having a clear understanding of the threats during the design
phase of your program can help you strengthen internal validity and avoid nasty comments from critics who review your work.

Save I-V-Y (Internal Validity - Yes!)


Next, to test your knowledge of internal validity threats, play this exciting game to see if you can save IVY. IVY is a together career
girl. She's a graduate of Cornell's School of Human Ecology and is a Program Evaluator working for a Washington, DC - based
Public Policy Institute. Currently, she is involved in a project that is studying the impact of an after school computer program in
Baltimore. When evaluating this project, IVY must be sure to be on the lookout for threats - that is internal validity threats. Help IVY
identify the threats and secure Internal Validity. GO!!

1. After reviewing some documentation, IVY learns a test was administered both before and after the program. But the tests were
different. Find the threat.

a. History

b. Mortality

c. Testing

d. Instrumentation

2. About halfway through the program, the three most at-risk students, who were consequently the lowest scorers on the pre-test, are
arrested and subsequently leave the program. Find the threat!

a. Maturation

b. Testing

c. Mortality

d. Compensatory Rivalry

3. Since all of the students are at-risk and the primary criterion for participation in the program is poor academic performance, IVY
must be careful of what threat to internal validity when assessing the effectiveness of the program based on pre- and post-test
measures? Find the threat.

a. Mortality

b. Regression

c. Testing

d. Diffusion of Treatment

4. Since these are the only computers on campus, other students want access as well, so they tell their parents. Parents begin calling,
complaining and demanding to know why their child cannot use the computers. Bowing to pressure from the parents and the School
Board, the principal provides access to other students during school hours. Find the threat.

a. Diffusion or Imitation of Treatment

b. Compensatory Equalization of Treatment

c. Mortality

d. Compensatory Rivalry

5. The program is a one-year project. Over the course of the program the students are naturally developing, growing, and
increasing their knowledge of the world. Find the threat.

a. Regression

b. Mortality

c. Maturation

d. Resentful Demoralization


Copyright © 1997, Paul A. Burns


Single Group Threats to Internal Validity

What is Internal Validity?


Before you can discuss threats to internal validity, you must understand what it is. As the name
suggests, internal validity is the kind of validity that only pertains to the specific implementation of
the treatment or program that is being evaluated. Internal validity allows a researcher to claim that
it is, in fact, the treatment or program that caused a change in the group that was treated.

In order for a researcher to make a causal claim about a treatment or program, he or she must be
able to show that:
1. there is a relationship between the treatment and the effect (in other words, the researcher
must establish conclusion validity);
2. the observed effect occurred after the treatment was implemented; and
3. there are no plausible alternative hypotheses.

Criterion 3 is usually the most difficult for a researcher to meet. Especially in social research, where
it is often impossible to insulate groups from exposure to their social environment or regulate their
reaction to the experience of being evaluated, events, conditions and responses unrelated to the
treatment or program under study can have an impact on the group. This can cause a researcher to
mistakenly attribute an observable difference between the pre- and posttests to the treatment under
study when in fact the unrelated event or response caused the change. In other words, a researcher
will claim he or she has established a causal relationship when none exists.

What is a Single Group?


Well, obviously a single group is one group. In research terms, this means that no control group (a
group that doesn't receive the program or treatment, also called a comparison group) is being used
to serve as a standard against which a researcher can compare the results of the treated group.
Adding a control group can enable a researcher to eliminate many threats to internal validity, but it
can often introduce new threats, such as those related to selection biases and social threats. Adding
a control group can also be difficult, expensive or inefficient for a field researcher. For this reason,
researchers have to learn to be aware of and guard against single group threats to internal validity.

What kind of threats are there?


There are six threats to single group internal validity:
1. History
2. Maturation
3. Testing
4. Instrumentation
5. Mortality
6. Regression to the Mean

Read up on these descriptions. Once you think you've gained an understanding of how the threats
to internal validity work, you might want to try playing the Name That Threat game.

Protecting against Threats to Internal Validity
How can a researcher protect his or her evaluation against these threats? Well, one way, as stated
above is to add a control group. This can enable a researcher to show that a threat such as testing is
not as plausible if it had no effect on the control group. A threat like regression could be controlled
for because a researcher could measure the amount of regression in the control group and compare
that to the amount of regression in the treatment group.

Another way to protect against these threats would be to insulate the group under study. A threat
such as history could be controlled if the researcher could make sure that the group was not
exposed to any outside events that could make an impact on the study. Of course, this is rarely a
practical option in field research.

A third way to protect against these threats is to be aware of them and take measurements of
factors that might show if these threats are, indeed, likely to have had an impact on the study. For
example, a researcher who is evaluating an AIDS awareness program could monitor the media
during the period in which the program is being implemented to see if there are any major stories
about AIDS that might influence the treatment group. This could help rule out a history threat. A
researcher examining a math education program could measure how students are improving in
their science and vocabulary knowledge at the same time. If they appear to have improved in math
at a much greater rate than the other subjects, this could help the researcher to make a case against
a maturation threat.
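
As a rough sketch of that strategy, with invented gain scores, one can compare the average improvement in math with the improvement in subjects the program never touched; a much larger math gain weakens the maturation explanation.

# Illustrative check with invented gain scores (posttest minus pretest) for a
# math education program: compare gains in the targeted subject with gains in
# untargeted subjects measured over the same period.
math_gains = [12, 9, 15, 11, 8, 14, 10, 13]
science_gains = [3, 2, 4, 1, 3, 2, 5, 2]
vocabulary_gains = [2, 4, 1, 3, 2, 3, 2, 4]

def mean(xs):
    return sum(xs) / len(xs)

print("Average gain in math:       %.1f" % mean(math_gains))
print("Average gain in science:    %.1f" % mean(science_gains))
print("Average gain in vocabulary: %.1f" % mean(vocabulary_gains))
# If students improved far more in math than in subjects the program never
# touched, general maturation is a weaker explanation for the math gain.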

Randomized assignment to a group is a way to control for certain threats, such as regression, and
possibly maturation (if the randomization is across age and experience, for example), but it can't
really protect against history, testing, instrumentation or mortality.

Careful analysis of the data can guard against threats such as regression and mortality. For
example, if you look at the pretest scores of participants who drop out of a program and find they
are evenly distributed among the population, you can make a better argument against a mortality
threat.
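
A simple check along those lines, sketched here with invented pretest scores, is to compare the pretest mean and spread of the dropouts with those of the participants who stayed.

# Illustrative check with invented pretest scores: are the participants who
# dropped out similar at pretest to those who stayed?
stayers_pretest = [62, 58, 71, 55, 66, 60, 69, 64, 57, 63]
dropouts_pretest = [61, 59, 65, 56, 68]

def mean(xs):
    return sum(xs) / len(xs)

def spread(xs):
    m = mean(xs)
    return (sum((x - m) ** 2 for x in xs) / (len(xs) - 1)) ** 0.5

print("Stayers:  mean %.1f, sd %.1f" % (mean(stayers_pretest), spread(stayers_pretest)))
print("Dropouts: mean %.1f, sd %.1f" % (mean(dropouts_pretest), spread(dropouts_pretest)))
# If the dropouts' pretest scores look like a random slice of the whole group
# (similar mean and spread), a mortality threat is a much less plausible
# explanation for pre-post change than if the dropouts were mostly low scorers.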

Of course, a researcher can guard against an instrumentation threat by using experienced
observers and consistent, established instruments. However, this may not always be feasible or
appropriate. A testing threat can be avoided simply by not administering a pretest, but then it is
more difficult to establish that a change has occurred in the group. However, unobtrusive pre- and
posttests will make the threat of testing less plausible.

Researchers try to guard against a variety of threats to internal validity by carefully designing their
research. See Bill Trochim's site on Research Design for an explanation of how to protect your
evaluation from falling prey to these potential research pitfalls.
References

Cook, T.D. & Campbell, D.T. (1979). Quasi-Experimentation: Design and Analysis Issues for Field
Settings. Boston: Houghton-Mifflin.

Huck, S.W. & Sandler, H.M. (1979). Rival Hypotheses: Alternative Interpretations of Data Based
Conclusions. New York: Harper & Row, Publishers.

Judd, C.M. & Kenny, D.A. (1981). Estimating the Effects of Social Interventions. Cambridge:
Cambridge University Press.

Trochim, W. (1996). "Knowledge Base" in Bill Trochim's Center for Social Research Methods.
http://trochim.human.cornell.edu/kb/kbhome.htm




Copyright © 1997 Wendy Martin. All rights reserved.


History Threats

History can pose a threat to internal validity when the group under study experiences an event--
unrelated to the treatment--which has an impact on their performance on the posttest. For
example, if a researcher is evaluating a drug prevention program, and during the time of the
study a popular celebrity dies from a drug overdose, the effect (or lack of effect) that is noticed
may have been caused by the group's reaction to the celebrity's death rather than the program that
was implemented.

How do you think a comparison group can alleviate this threat to internal validity?


Copyright © 1997 Wendy Martin. All rights reserved.


(All of the examples provided on this page come from the book Rival Hypotheses: Alternative
Interpretations of Data Based Conclusions by Schuyler W. Huck and Howard M. Sandler.)

1. Miss America

In an attempt to prove that billboard advertising is the best medium for advertising a product in
order to increase brand name recognition, which would hopefully lead to increased sales of the
product, the Institute of Outdoor Advertising (IOA) conducted a research study. Huck and
Sandler state, "The goals and results of this empirical investigation were reported in two IOA
promotional brochures. " The text of the brochures goes as follows:

We've long believed that Outdoor can outperform other media in getting across
a message to the public. What we needed, here at the Institute of Outdoor
Advertising, was a way to prove it. Last year we thought of a way. We would
see if our medium, by itself, could increase public awareness of the name of
Miss America, 1975.

We approached the Outdoor companies with our plan. We asked them to
donate space not already sold or earmarked for public service announcements.
They gave us 10,000 panels -- or about $1.5 million worth of Outdoor at the
going rate.

Our poster [which showed a large picture of Miss America with her crown and
the message "Shirley Cothran, Miss America, 1975"] was to go on display for
two months beginning January 1, 1975. But before it did, the Outdoor
companies sponsored a series of studies to determine public awareness of Miss
America's name prior to posting. Random sample surveys were conducted
during November and December, 1974 in 44 metropolitan markets by 25
colleges and universities and 12 independent research organizations. Over
15,000 adults were questioned.

Despite all the exposure Miss America had received (previously) on TV and
radio and in print, only 1.6% of the respondents (italics theirs) gave the correct
answer when asked, "What is the name of Miss America 1975?"

Then our posters went up. And in February and March 1975, a second wave of
over 15,000 interviews was conducted by the same research teams. This time,
16.3% of the respondents (italics theirs)-- about one of every six -- knew who
Miss America was. That's a 10-fold increase in awareness. Projected nationally
it would mean that Outdoor had communicated a new and difficult name to
more than 20 million adults. Through a two-month posting, Outdoor made
Shirley Cothran the best-known Miss America in history.

Huck and Sandler say that, "the promotional material put out by the IOA gives the distinct
impression that outdoor advertising was the causal agent that brought about the tenfold increase
in public awareness of her name." Which of the following threats do you believe could provide
an alternative hypothesis for the results found in this study?

● Instrumentation
● Regression to the Mean
● History
● Testing


2. Air Force Officer School and Dogmatism

Huck and Sandler write, "Many people believe that the structure of the armed services and the
inherent chain-of-command basis of communication attract highly dogmatic volunteers. Military
commanders, of course, disagree; they claim that they have a need for officers who are open-
minded, tolerant, and able to win the respect and loyal cooperation of the personnel they direct.
Aside from this question as to the type of person attracted to a military career, a researcher
recently wondered whether a 14-week stint at officer training school would cause the junior
officers to become more or less dogmatic, or would it have no effect on dogmatism? And would
the influence of the 14-week program be the same for those participants who began with high
levels of dogmatism as it was for those who began with low levels?

"The subjects in this investigation came from a pool of 764 officers who completed the three-and-
a-half-month Squadron Officer School (SOS) at Maxwell Air Force Base in Alabama. As the
researcher saw it, there were several facets of the SOS program that might have made for a
change in dogmatism. For example, each student was given extensive feedback from peers, the
opportunity to discuss the personality characteristics of other trainees, a chance to deal with
unstructured situations, and experience in planning military strategy in areas divorced from his
field of expertise. These and other similar activities might, according to the researcher's
hypothesis, cause the students' dogmatism levels to decrease over the 14-week time interval.

"During the first and last weeks of training, all students in the SOS program were given a copy of
the Rokeach Dogmatism Scale, Form E. (In this study, it was titled the Rokeach Opinion Scale.)
This measuring instrument is made up of 40 statements, each of which is rated on a -3 to +3 scale
so as to indicate the extent of one's agreement or disagreement. Two of the statements go as
follows: 'Most people just don't know what's good for them,' and 'A group which tolerates too
much difference of opinion among its members cannot exist for long.' From among the SOS
students who completed and returned the Rokeach Scale at both the pretest and post-test periods,
250 were randomly selected. Then, based upon an examination of the pretest scores, the 250
subjects were subdivided into five groups of 50 subjects each. In terms of the dogmatism
continuum, these subgroups were described as high, above average, average, below average, and
low.

"The data were statistically analyzed in two ways. First, the pretest mean for all 250 subjects was
compared to the overall post-test mean. Results indicated no significant difference. Next, a two-
way analysis of variance was used to see whether the five subgroups were changing in a similar
fashion between the beginning and end of the SOS program, or possibly not changing at all. The
pre- and post-test means for the subgroups turned out as follows:

Subgroup                          Pretest    Post-test

High pretest scorers              170.04     161.14
Above average pretest scorers     151.02     145.42
Average pretest scorers           138.12     137.26
Below average pretest scorers     126.12     128.78
Low pretest scorers               108.24     117.32

"The statistical analysis indicated a significant interaction between subgroups and pre-post trials.
(Such an interaction simply means that the change from pretest to post-test is not the same for all
subgroups.)

"Based on the subgroup means presented in the preceding table and the significant statistical
finding, the researcher stated that 'Subjects high in dogmatism on the pretest tended to become
less dogmatic by the last week of training while those scoring below the mean tended to become
more dogmatic'.... In a way, it looks as if the SOS training program causes the participants to
become more homogenous in terms of dogmatism." Can you think of any alternative
explanations to explain this result?

● Maturation
● Regression to the Mean
● Mortality
● Instrumentation



3. Camping Out

Huck and Sandler write, "...you are undoubtedly aware of the fact that the daily routine in mental
institutions is extremely stifling for the patients....Fortunately, there are some staff members who
are sensitive to the ill effects brought about by the same routine day in and day out -- and who
care enough to try to do something about it. Usually, these attempts at breaking the monotony are
not evaluated by means of any sort of formal research...But on occasion, data are collected in an
attempt to verify scientifically the worth of the innovative program. One such research study...
was conducted in Utah, and the new activity -- camping out -- was about as different from the
daily institutional routine as you could imagine.

"The subjects in this investigation were 25 male and female adults aged 19 to 62, who were
randomly selected from a state mental hospital located in an urban area of Utah. These
individuals were taken to an isolated camp site in the mountains near Flaming Gorge. The
patients and staff were on this camping trip for five days...and while on this excursion the staff
maintained a very low profile. The patients had the responsibility of forming teams for cooking
and clean-up, of arranging sleeping accommodations, and of structuring their own free-time
activities. Other than busing the group to and from the campsite, the staff took charge on only
two occasions -- when the group went on a raft ride down the river and when they visited a
nearby trading post for Cokes and candy.

"The researchers expected this week-long camping retreat to serve as a therapeutic tool, and in
particular they hypothesized that the activities would bring about increased social interaction
among the patients. To test this hypothesis, two types of data were collected on both the first and
final days of the camp-out. According to a prearranged random time sampling scheme, five-
minute sessions of group interaction were taped, unobtrusively, on an audio recorder. In addition,
photographs were taken of the patients.

"One week after returning, all staff members and five of the patients who had gone on the camp-
out used the audiotapes and pictures to rate the 25 patient campers in terms of social interaction.
These ratings were obtained by using a modified version of the Bales Interaction Matrix. For
each of the 25 patients, average ratings from the staff judges were computed. Then, the ratings
within each group of judges were averaged across the 25 patients to obtain overall Monday and
Friday ratings for the entire group. Since the Bales Interaction Matrix yields 12 subscale scores,
there were two sets of 12 pretest and post-test composite ratings on the 25 campers, one set from
the staff judges and the other set from the five patient judges.
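
The aggregation described in this paragraph -- average each patient's ratings over the judges in a group, then average over the 25 patients to get one Monday and one Friday composite per subscale -- reduces to two successive means. A hypothetical sketch (the array shapes and values are invented; the number of staff judges, for instance, is assumed):

# Invented data illustrating the rating aggregation described above.
# Array shape: (judges, patients, subscales, days), days = [Monday, Friday].
import numpy as np

rng = np.random.default_rng(1)
staff_ratings = rng.uniform(1, 5, size=(8, 25, 12, 2))    # 8 staff judges is an assumption
patient_ratings = rng.uniform(1, 5, size=(5, 25, 12, 2))  # the five patient judges

def composites(ratings):
    per_patient = ratings.mean(axis=0)  # average over judges -> (25 patients, 12 subscales, 2 days)
    return per_patient.mean(axis=0)     # average over patients -> (12 subscales, 2 days)

staff_comp = composites(staff_ratings)      # 12 Monday and 12 Friday composites from the staff
patient_comp = composites(patient_ratings)  # the same from the patient judges
print(staff_comp.shape, patient_comp.shape)  # (12, 2) (12, 2)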

"When the prestest and post-test data were tested statistically, the researchers found that there
was significantly more social interaction at the end of the five-day camping excursion than there
had been at the beginning. The ratings from the patient group of judges showed increases on 11
of the 12 subscales of the Bales instrument, while the ratings from the staff members indicated
significant improvement on all 12 subscales. One possible interpretation of these results is that
the camping activities and unique environment brought about increased social interaction. Might
there be other plausible explanations for the observed differences between the Monday and
Friday ratings?"

● History
● Testing
● Maturation
● Instrumentation


4. Groups for Parents

Huck and Sandler write, "Groups for Parents is a packaged method that offers parents both a
support group of other parents and didactic information on an integrated humanistic behavior
modification approach. The authors of 'Groups for Parents'...published a study evaluating the
effectiveness of their approach in 'improving both general child behaviors [and] individually
targeted ones.' They also reported success in increasing the parents' rates of positive
reinforcement along with the rates of compliance in their children.

"The method of evaluation was quite simple. Thirteen groups of parents (a total of 277) met once
a week for two and one-half hours over an eight-week period. About one-half of the parents were
referred by various community agencies; the rest had heard about the program from friends or
other informed sources. The pre- and post-test measures used included a problem behavior
checklist, positive reinforcement rates (measured by the parents), compliance rates (also
measured by the parents), and client satisfaction (self-report). Approximately two-thirds (180) of
those enrolled completed the entire eight-week course.

"The data analyses were equally straightforward, consisting of analyses of the differences
between pre- and post-test means. Significant results that concern us were reported on the
problem behavior checklist, reinforcement rates, and compliance rates. In addition, a very high
rate of client satisfaction at the end of the study was reported." Can you think of any plausible
explanations, other than that the program was successful, for the results the researchers found?

● Mortality
● Instrumentation
● Regression to the Mean
● History
● Testing
● Maturation


Copyright © 1997 Wendy Martin. All rights reserved.


1. Miss America
Instrumentation

While the authors note that one might attribute the extraordinary rise in name
recognition to a change in the way the interviews were conducted, this is unlikely: the
same researchers conducted both the pre- and post-tests, they were probably fairly skilled
at interviewing, and the stated interview question was quite simple. It is therefore not
likely that there was any difference between the way the pre- and posttests were
conducted, so instrumentation, while a possible threat, is not a plausible threat.

Return

1. Miss America
Regression to the Mean

Because there is no reason to assume that the subjects who were interviewed were at either
extreme for recognizing the name of Miss America, there is no reason to believe that
regression to the mean posed a threat in this study. Furthermore, it is likely that the people
interviewed in the pretest were not the same as the people interviewed in the posttest.
Therefore, regression is even less of a threat, because there is no consistent degree above
or below any population mean by which to measure the respondents.

Return

1. Miss America
History

Correct! History poses the greatest threat in this study. The authors give two reasons for
this. First, the study seems to assume that during the time between their pre- and posttests,
no other media existed. Of course, this is not the case. At any time during the two months
the billboards were up, other forms of media could have reported on Miss America, and the
posttest respondents could have learned her name through any one of those media. Second,
the authors note that the unique nature of the billboard, in fact, caused other forms of
media to cover the story of the research study! In other words, Miss America's name was
being discussed more than usual in newspapers, on the radio and on television. The study
itself had become a historical event which probably caused more people to hear and
remember Miss America's name. It is very plausible that it is because of this event, rather
than simply that people saw the billboards, that the results were so extraordinary.

Return

1. Miss America
Testing

It does not say explicitly in the text whether or not the same people who were interviewed
in the pretest were interviewed in the posttest. As the authors state, "If they were the same,
then clearly the pre-post percentage change could be attributable to the stimulus of the first
interview rather than to the intervening outdoor advertisements. However, we...feel that
this is an improbable competing explanation. We strongly suspect that the people
interviewed during the pretest were not included in the post-test sample." If this is the case,
then this study would be an example of the Separate Pre-Post Samples Design.

Return
2. Air Force Officer School and Dogmatism
Maturation

Because the officers were all adults, and the course was only 14 weeks long, there is no real
reason to believe that, without the treatment, in a 14-week period of the group members'
normal lives they would have matured in any way that would have a particular effect on
their levels of dogmatism. The subjects were randomly selected from an overall group. This
selection was an attempt by the researchers to improvise a kind of randomization of
assignment for their study, and this kind of randomization is meant to control for threats
such as maturation. Finally, the results of the study are not characteristic of a maturation
threat, since some of the subjects have higher scores on the posttest and some have lower
scores on the posttest. Maturation would generally be indicated by a consistent, though
probably differential, increase or decrease across all the participants' scores.

Return
2. Air Force Officer School and Dogmatism
Regression to the Mean

Correct! The results of this study fall into the classic pattern of regression to the mean,
with the people at the high and low ends of the scale regressing more toward the mean than
those people only slightly above or below, while those who scored at the mean barely
change at all. The authors state, "the observed correlation between the Rokeach pretest
and post-test dogmatism scores for the 250 participants in this particular study was +.71.
Based on this correlation, the researcher made a prediction about how high each of the five
subgroups would have scored on the post-test, assuming that the SOS training program
had absolutely no impact on the participants' dogmatism levels and that any pre-post
changes were caused entirely by the phenomenon of regression. These 'estimated' post-test
means for the five subgroups turned out to be almost identical to the actual post-test
means." In other words, even the researchers suspected that regression caused the pre-post
differences.
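
The researchers' 'estimated' post-test means can be reproduced, at least approximately, from the standard regression prediction: a subgroup's expected post-test mean equals the overall post-test mean plus r times the subgroup's pretest deviation from the overall pretest mean. (This simple form assumes the pretest and post-test standard deviations are roughly equal, which is an assumption here, not something reported in the excerpt.) A quick sketch in Python using the subgroup means from the table above:

# Regression-to-the-mean prediction for the five subgroups, assuming roughly
# equal pretest and post-test standard deviations (otherwise the deviation
# would be scaled by SD_post / SD_pre).
pre = {"high": 170.04, "above average": 151.02, "average": 138.12,
       "below average": 126.12, "low": 108.24}
post = {"high": 161.14, "above average": 145.42, "average": 137.26,
        "below average": 128.78, "low": 117.32}
r = 0.71                            # reported pre-post correlation
pre_mean = sum(pre.values()) / 5    # about 138.7
post_mean = sum(post.values()) / 5  # about 138.0

for g in pre:
    predicted = post_mean + r * (pre[g] - pre_mean)
    print(f"{g:>13}: predicted {predicted:6.1f}   actual {post[g]:6.2f}")

Under these assumptions the predicted means land within roughly a point of the actual post-test means, which is the pattern the authors describe.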

Return

2. Air Force Officer School and Dogmatism


Mortality

The researchers chose to study only the scores of participants who had completed both a
pretest and a posttest. In other words, of the people from whom the study draws its results,
none dropped out. Therefore, mortality is not a threat.

Return

2. Air Force Officer School and Dogmatism


Instrumentation

First, the instrument used in this study was an accepted and previously-used test. It also
did not rely on human observers who might grow more experienced over time or who
might not be the same people for both tests. And the researchers do not say that the test
was changed in any way from pre- to posttest, so we can assume it was the same test. This
leads us to conclude that instrumentation did not pose a threat in this study.

Return
3. Camping Out
History

Because the group was out in an isolated camp site, it seems unlikely that there would have
been some historical event, unrelated to the program -- i.e. the camping expedition -- under
study, that the group would have been exposed to. The short amount of time between the
pre- and posttest makes history an unlikely threat as well. Of course, it could have been
something unplanned in the camping trip -- an accident, a chance encounter -- that actually
caused the observed effect, but it would be hard to say whether such an event could be
considered a history effect or just a part of the treatment of taking the patients away from
the hospital routine.

Return

3. Camping Out
Testing

The researchers stated that the data were collected by unobtrusively recording on
audiotape five minutes of group interactions using a random time selection scheme. Of
course, it is difficult to know just how unobtrusive the recording was, but assuming the
researchers were fairly careful, they could have turned the audiotape recorder on without
the group members realizing it. If the group members were aware that they were being
recorded, it would seem that they would feel self-conscious during both recording sessions.
In other words, the data collection during the first session would still not have alerted or
primed the group members to behave differently at the next session. If, however, the group
members realized after the first recording what the purpose of the recording was, and
therefore tried to behave differently the next time, testing could pose a threat. So, while
testing is not a highly plausible threat, it is a possible one. There is, however, a more
plausible threat.

Return

3. Camping Out
Maturation

Because the group members were all adults and the treatment took place over only 5 days,
it is doubtful that the group members would have otherwise, with no treatment, matured in
any measurable way. In addition, the group members were chosen at random, which is a
selection method that is used to balance out possible differential rates of maturation.

Return
3. Camping Out
Instrumentation

Correct! The authors feel that instrumentation is the most plausible threat in this study.
First of all, the researchers employed the Bales Interaction Matrix for examining group
interactions, and they had the hospital staff and five patients do the ratings, using this
instrument, of the group interactions. The Bales instrument is a very complex
measurement tool, and it is not likely that many or any of the staff or patients had
experience using it. Therefore, the learning curve for using this scale would have been
large, and it is likely that the raters' ability improved with their second round of rating the
group interactions.

Another alternative hypothesis, while not exactly instrumentation, relates to the aspect of
the instrumentation threat that deals with the characteristics of the human observers, in
this case, the staff and patients who served as the raters or judges of the group interaction
tapes. This threat is called the Rosenthal effect. The authors say, "The staff and patients
probably expected the camp-out to facilitate social interaction. And this expectancy could
very well have distorted the judges' perceptions when they listened to the tapes and looked
at the second set of pictures. It is not at all unlikely that they selectively heard and saw
things that confirmed their expectations, while not noticing occurrences that ran contrary
to their hopes." In other words, because the staff and patients had a personal investment in
this program, they could have consciously or unconsciously tried to prove that the program
was effective. The authors believe the data analysis would have been much stronger if
outside raters had been used to evaluate the tapes and photographs.

Return
4. Groups for Parents
Correct! Actually, the reason there are six choices for this example is that all six are threats
to the internal validity of this study. The authors say, "First, we must consider the rival
hypothesis of experimental mortality....This raises the possibility that those who were not
finding the approach helpful were more likely to drop out than those for whom the
approach was succeeding. This would also account for the high rate of client satisfaction
reported in the study.

"Second,instrumentation could account for the increases in both reinforcement rates and
compliance rates; that is, since in both cases the parents were the measuring instruments, it
is likely that they were better able to identify both types of behaviors even when the rate
was unchanging. We must also consider the rival hypothesis of history; other events outside
the study may have been taking place during the eight-week experimental period which
affected the dependent variables. It is also possible for the problems to have eased up on
their own during this time, thus giving rise to the rival hypothesis of maturation.

"Other rival hypotheses to be considered include both testing (perhaps the pretest
influenced the parents' responses to the post-test) and statistical regression....in this study
parents were referred as a consequence of having extreme problems -- we would not expect
their problems to be as extreme on a second measurement.

"Needless to say, although the program may well be effective (no data to the contrary are
reported), we would not enroll on the basis of the data presented in this study."
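
The first threat the authors mention, experimental mortality, is easy to see in a small simulation. The sketch below is hypothetical -- the scores and the dropout mechanism are invented, and only the enrollment of 277 and the roughly two-thirds completion rate echo the study -- but it shows how selective dropout alone can manufacture an apparent pre-to-post improvement even when nothing changes.

# Hypothetical simulation of a mortality (attrition) artifact; none of these
# numbers come from the study except the enrollment of 277.
import numpy as np

rng = np.random.default_rng(2)
n = 277
pre = rng.normal(50, 10, n)        # problem-behavior scores at pretest (higher = worse)
post = pre + rng.normal(0, 5, n)   # no true change, only noise
# Assume parents whose problems are not improving are more likely to drop out.
p_stay = 1 / (1 + np.exp((post - pre) / 3 - 0.8))
stay = rng.random(n) < p_stay
print(f"completers: {stay.sum()} of {n}")
print(f"pretest mean (completers):   {pre[stay].mean():.1f}")
print(f"post-test mean (completers): {post[stay].mean():.1f}  <- looks like improvement")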

Return
Complete Bibliography

A complete bibliography of Trochim Publications.

Concept Mapping Literature

Publications and selected papers on concept mapping (structured conceptualization).

The Regression-Discontinuity Design Page

Everything you ever wanted to know about the regression-discontinuity quasi-experimental research design but
were afraid to ask.

The Evaluator as Cartographer: Technology for Mapping Where We're Going and Where We've Been
(available also in Spanish)

Trochim, W. (1999) Paper Presented at the 1999 Conference of the Oregon Program Evaluators Network,
"Evaluation and Technology: Tools for the 21st Century", Portland, Oregon, October 8, 1999.

Measuring Organizational Performance as a Result of Installing a New Information System: Using Concept
Mapping as the Basis for Performance Measurement

Trochim, W. (1999). Paper presented at the Annual Conference of the American Evaluation Association, Orlando,
FL, November 3-6, 1999.

This is an outline of the soon-to-be-drafted paper describing a performance measurement project in connection
with an SAP installation at CITGO Petroleum Corporation.

The Regression-Discontinuity Design in Health Evaluation

Trochim, W. (1990). Regression-discontinuity design in health evaluation. In L. Sechrest, E. Perrin and J. Bunker
(Eds.). Research Methodology: Strengthening Causal Interpretations of Nonexperimental Data. U.S. Dept. of HHS,
Agency for Health Care Policy and Research, Washington, D.C.
The Regression-Discontinuity Design: An Introduction

Trochim, W. (1994). The Regression-Discontinuity Design: An Introduction. Research Methods Paper Series,
Number 1, Thresholds National Research and Training Center on Rehabilitation and Mental Illness, Chicago, IL.

Developing an Evaluation Culture for International Agricultural Research

Trochim, W. (1992). Developing an Evaluation Culture in International Agriculture Research. In David R. Lee,
Steven Kearl, and Norman Uphoff (Eds.) Assessing the Impact of International Agricultural Research for
Sustainable Development, Cornell Institute on International Food, Agriculture and Development's (CIIFAD),
Cornell University, Ithaca NY. Includes both a regression-discontinuity and a concept mapping case study.

Characteristics of a Chief of Police: A Concept Mapping Case Study

This website documents a community-wide concept mapping project to identify the characteristics citizens in
Ithaca, New York wanted to see in their next police chief. The results were used by the selection committee to help
evaluate candidates. The project was conducted in 1996-1997.

The Regression Point Displacement Design for Evaluating Community-Based Pilot Programs and Demonstration
Projects

This is an unpublished paper that I have been working on with Donald T. Campbell who passed away recently.
Although we have been going at this paper for about four years and have been through about five previous drafts,
we felt it still needed some editing and sharpening. Our current plan was to submit it for publication some time this
millennium. I thought my fellow Campbell fans would enjoy seeing what may be the last original Campbell quasi-
experimental design. Comments and suggestions would be appreciated.

Introduction to Concept Mapping

Trochim, W. (1989). An introduction to concept mapping for planning and evaluation. In W. Trochim (Ed.) A
Special Issue of Evaluation and Program Planning, 12, 1-16.

Pattern Matching, Validity, and Conceptualization in Program Evaluation

Trochim, W. (1985). Pattern Matching, Validity, and Conceptualization in Program Evaluation. Evaluation
Review, 9, 575-604.

Concept Mapping: Hard Art or Soft Science?

Trochim, W. (1989). Concept mapping: Soft science or hard art? In W. Trochim (Ed.) A Special Issue of
Evaluation and Program Planning, 12, 87-110.

The Reliability of Concept Mapping


Trochim, W. Reliability of Concept Mapping. Paper presented at the Annual Conference of the American
Evaluation Association, Dallas, Texas, November, 1993.

Concept Mapping in Mental Health

Trochim, W., Cook, J. and Setze, R. (1994). Using concept mapping to develop a conceptual framework of staff's
views of a supported employment program for persons with severe mental illness. Journal of Consulting and
Clinical Psychology, 62, 4, 766-775.

Workforce Competencies for Psychosocial Rehabilitation Workers

Trochim, W. and Cook, J. (1993). Workforce Competencies for Psychosocial Rehabilitation Workers: A Concept
Mapping Project, Final Report for the conference of The International Association of Psychosocial Rehabilitation
Services, Albuquerque, New Mexico, November 11-12, 1993.


Copyright © 1996, William M.K. Trochim



Five Big Words
Types of Questions
Time in Research
Types of Relationships
Variables
Hypotheses
Types of Data
Unit of Analysis
Two Research Fallacies

Learning about research is a lot like learning about anything else. To start, you need to learn the jargon people use, the big controversies they fight over, and the different factions that define the major players. We'll start by considering five really big multi-syllable words that researchers sometimes use to describe what they do. We'll only do a few for now, to give you an idea of just how esoteric the discussion can get (but not enough to cause you to give up in total despair). We can then take on some of the major issues in research like the types of questions we can ask in a project, the role of time in research, and the different types of relationships we can estimate. Then we have to consider defining some basic terms like variable, hypothesis, data, and unit of analysis. If you're like me, you hate learning vocabulary, so we'll quickly move along to consideration of two of the major fallacies of research, just to give you an idea of how wrong even researchers can be if they're not careful (of course, there's always a certain probability that they'll be wrong even if they're extremely careful).


Structure of Research
Deduction & Induction
Positivism & Post-Positivism
Introduction to Validity

You probably think of research as something very abstract and complicated. It can be, but you'll see (I hope) that if you understand the different parts or phases of a research project and how these fit together, it's not nearly as complicated as it may seem at first glance. A research project has a well-known structure -- a beginning, middle and end. We introduce the basic phases of a research project in The Structure of Research. In that section, we also introduce some important distinctions in research: the different types of questions you can ask in a research project; and, the major components or parts of a research project.

Before the modern idea of research emerged, we had a term for what philosophers
used to call research -- logical reasoning. So, it should come as no surprise that some
of the basic distinctions in logic have carried over into contemporary research. In
Systems of Logic we discuss how two major logical systems, the inductive and
deductive methods of reasoning, are related to modern research.

OK, you knew that no introduction would be complete without considering something
having to do with assumptions and philosophy. (I thought I very cleverly snuck in the
stuff about logic in the last paragraph). All research is based on assumptions about
how the world is perceived and how we can best come to understand it. Of course,
nobody really knows how we can best understand the world, and philosophers have
been arguing about that very question for at least two millennia now, so all we're
going to do is look at how most contemporary social scientists approach the question
of how we know about the world around us. We consider two major philosophical
schools of thought -- Positivism and Post-Positivism -- that are especially important
perspectives for contemporary social research (OK, I'm only considering positivism
and post-positivism here because these are the major schools of thought. Forgive me
for not considering the hotly debated alternatives like relativism, subjectivism,
hermeneutics, deconstructivism, constructivism, feminism, etc. If you really want to
cover that stuff, start your own Web site and send me your URL to stick in here).

Quality is one of the most important issues in research. We introduce the idea of
validity to refer to the quality of various conclusions you might reach based on a
research project. Here's where I've got to give you the pitch about validity. When I
mention validity, most students roll their eyes, curl up into a fetal position or go to
sleep. They think validity is just something abstract and philosophical (and I guess it is
at some level). But I think if you can understand validity -- the principles that we use to
judge the quality of research -- you'll be able to do much more than just complete a
research project. You'll be able to be a virtuoso at research, because you'll have an
understanding of why we need to do certain things in order to assure quality. You
won't just be plugging in standard procedures you learned in school -- sampling
method X, measurement tool Y -- you'll be able to help create the next generation of
research technology. Enough for now -- more on this later.


We are going through a time of profound change in our understanding of the ethics of applied social
research. From the time immediately after World War II until the early 1990s, there was a gradually
developing consensus about the key ethical principles that should underlie the research endeavor. Two
marker events stand out (among many others) as symbolic of this consensus. The Nuremberg War Crimes
Trial following World War II brought to public view the ways German scientists had used captive humans as
subjects in oftentimes gruesome experiments. In the 1950s and 1960s, the Tuskegee Syphilis
Study involved the withholding of known effective treatment for syphilis from African-American participants
who were infected. Events like these forced the reexamination of ethical standards and the gradual
development of a consensus that potential human subjects needed to be protected from being used as
'guinea pigs' in scientific research.

By the 1990s, the dynamics of the situation changed. Cancer patients and persons with AIDS fought
publicly with the medical research establishment about the long time needed to get approval for and
complete research into potential cures for fatal diseases. In many cases, it is the ethical assumptions of
the previous thirty years that drive this 'go-slow' mentality. After all, we would rather risk denying treatment
for a while until we achieve enough confidence in a treatment, rather than run the risk of harming innocent
people (as in the Nuremberg and Tuskegee events). But now, those who were threatened with fatal illness
were saying to the research establishment that they wanted to be test subjects, even under experimental
conditions of considerable risk. You had several very vocal and articulate patient groups who wanted to be
experimented on coming up against an ethical review system that was designed to protect them from
being experimented on.

Although the last few years in the ethics of research have been tumultuous ones, it is beginning to appear
that a new consensus is evolving that involves the stakeholder groups most affected by a problem
participating more actively in the formulation of guidelines for research. While it's not entirely clear, at
present, what the new consensus will be, it is almost certain that it will not fall at either extreme: protecting
against human experimentation at all costs vs. allowing anyone who is willing to be experimented on.

Ethical Issues

There are a number of key phrases that describe the system of ethical protections that the contemporary
social and medical research establishment has created to try to better protect the rights of their research
participants. The principle of voluntary participation requires that people not be coerced into participating
in research. This is especially relevant where researchers had previously relied on 'captive audiences' for
their subjects -- prisons, universities, and places like that. Closely related to the notion of voluntary
participation is the requirement of informed consent. Essentially, this means that prospective research
participants must be fully informed about the procedures and risks involved in research and must give their
consent to participate. Ethical standards also require that researchers not put participants in a situation
where they might be at risk of harm as a result of their participation. Harm can be defined as both
physical and psychological. There are two standards that are applied in order to help protect the privacy of
research participants. Almost all research guarantees the participants confidentiality -- they are assured
that identifying information will not be made available to anyone who is not directly involved in the study.
The stricter standard is the principle of anonymity which essentially means that the participant will remain
anonymous throughout the study -- even to the researchers themselves. Clearly, the anonymity standard
is a stronger guarantee of privacy, but it is sometimes difficult to accomplish, especially in situations where
participants have to be measured at multiple time points (e.g., a pre-post study). Increasingly, researchers
have had to deal with the ethical issue of a person's right to service. Good research practice often
requires the use of a no-treatment control group -- a group of participants who do not get the treatment or
program that is being studied. But when that treatment or program may have beneficial effects, persons
assigned to the no-treatment control may feel their rights to equal access to services are being curtailed.

Even when clear ethical standards and principles exist, there will be times when the need to do accurate
research runs up against the rights of potential participants. No set of standards can possibly anticipate
every ethical circumstance. Furthermore, there needs to be a procedure that assures that researchers will
consider all relevant ethical issues in formulating research plans. To address such needs, most institutions
and organizations have formulated an Institutional Review Board (IRB), a panel of persons that reviews
grant proposals with respect to ethical implications and decides whether additional actions need to be
taken to assure the safety and rights of participants. By reviewing proposals for research, IRBs also help to
protect both the organization and the researcher against potential legal implications of neglecting to
address important ethical issues of participants.


Problem Formulation
Concept Mapping

One of the most difficult aspects of research -- and one of the least discussed -- is how to develop the idea for the research project in the first place. In training students, most faculty just assume that if you read enough of the research in an area of interest, you will somehow magically be able to produce sensible ideas for further research. Now, that may be true. And heaven knows that's the way we've been doing this higher education thing for some time now. But it troubles me that we haven't been able to do a better job of helping our students learn how to formulate good research problems. One thing we can do (and some texts at least cover this at a surface level) is to give students a better idea of how professional researchers typically generate research ideas. Some of this is introduced in the discussion of problem formulation in applied social research.

But maybe we can do even better than that. Why can't we turn some of our expertise in
developing methods into methods that students and researchers can use to help them
formulate ideas for research? I've been working on that area pretty intensively for over a
decade now -- I came up with a structured approach that groups can use to map out their
ideas on any topic. This approach, called concept mapping, can be used by research teams
to help them clarify and map out the key research issues in an area, to help them
operationalize the programs or interventions or the outcome measures for their study. The
concept mapping method isn't the only method around that might help researchers
formulate good research problems and projects. Virtually any method that's used to help
individuals and groups to think more effectively would probably be useful in research
formulation. Some of the methods that might be included in our toolkit for research
formulation might be: brainstorming, brainwriting, nominal group technique, focus groups,
Delphi methods, and facet theory. And then, of course, there are all of the methods for
identifying relevant literature and previous research work. If you know of any techniques or
methods that you think might be useful when formulating the research problem, please feel
free to add a notation -- if there's a relevant Website, please point to it in the notation.


Introduction to Evaluation
The Planning-Evaluation Cycle
An Evaluation Culture

One specific form of social research -- evaluation research -- is of particular interest here. The Introduction to Evaluation Research presents an overview of what evaluation is and how it differs from social research generally. We also introduce several evaluation models to give you some perspective on the evaluation endeavor. Evaluation should not be considered in a vacuum. Here, we consider evaluation as embedded within a larger Planning-Evaluation Cycle.

Evaluation can be a threatening activity. Many groups and organizations struggle with how to build a good evaluation capability into their everyday activities and procedures. This is essentially an organizational culture issue. Here we consider some of the issues a group or organization needs to address in order to develop an evaluation culture that works in their context.


Evaluation is a methodological area that is closely related to, but distinguishable from more traditional
social research. Evaluation utilizes many of the same methodologies used in traditional social research,
but because evaluation takes place within a political and organizational context, it requires group skills,
management ability, political dexterity, sensitivity to multiple stakeholders and other skills that social
research in general does not rely on as much. Here we introduce the idea of evaluation and some of the
major terms and issues in the field.

Definitions of Evaluation

Probably the most frequently given definition is:

Evaluation is the systematic assessment of the worth or merit of some object

This definition is hardly perfect. There are many types of evaluations that do not necessarily result in an
assessment of worth or merit -- descriptive studies, implementation analyses, and formative evaluations, to
name a few. Better perhaps is a definition that emphasizes the information-processing and feedback
functions of evaluation. For instance, one might say:

Evaluation is the systematic acquisition and assessment of information to provide useful feedback about some object

Both definitions agree that evaluation is a systematic endeavor and both use the deliberately ambiguous
term 'object' which could refer to a program, policy, technology, person, need, activity, and so on. The
latter definition emphasizes acquiring and assessing information rather than assessing worth or merit
because all evaluation work involves collecting and sifting through data, making judgements about the
validity of the information and of inferences we derive from it, whether or not an assessment of worth or
merit results.

The Goals of Evaluation

The generic goal of most evaluations is to provide "useful feedback" to a variety of audiences including
sponsors, donors, client-groups, administrators, staff, and other relevant constituencies. Most often,
feedback is perceived as "useful" if it aids in decision-making. But the relationship between an evaluation
and its impact is not a simple one -- studies that seem critical sometimes fail to influence short-term
decisions, and studies that initially seem to have no influence can have a delayed impact when more
congenial conditions arise. Despite this, there is broad consensus that the major goal of evaluation should
be to influence decision-making or policy formulation through the provision of empirically-driven feedback.

Evaluation Strategies
'Evaluation strategies' means broad, overarching perspectives on evaluation. They encompass the most
general groups or "camps" of evaluators; although, at its best, evaluation work borrows eclectically from
the perspectives of all these camps. Four major groups of evaluation strategies are discussed here.

Scientific-experimental models are probably the most historically dominant evaluation strategies. Taking
their values and methods from the sciences -- especially the social sciences -- they prioritize the
desirability of impartiality, accuracy, objectivity and the validity of the information generated. Included
under scientific-experimental models would be: the tradition of experimental and quasi-experimental
designs; objectives-based research that comes from education; econometrically-oriented perspectives
including cost-effectiveness and cost-benefit analysis; and the recent articulation of theory-driven
evaluation.

The second class of strategies are management-oriented systems models. Two of the most common of
these are PERT, the Program Evaluation and Review Technique, and CPM, the Critical Path Method. Both
have been widely used in business and government in this country. It would also be legitimate to include
the Logical Framework or "Logframe" model developed at the U.S. Agency for International Development and
general systems theory and operations research approaches in this category. Two management-oriented
systems models were originated by evaluators: the UTOS model where U stands for Units, T for
Treatments, O for Observing Operations, and S for Settings; and the CIPP model where the C stands for
Context, the I for Input, the first P for Process and the second P for Product. These management-oriented
systems models emphasize comprehensiveness in evaluation, placing evaluation within a larger
framework of organizational activities.

The third class of strategies are the qualitative/anthropological models. They emphasize the importance
of observation, the need to retain the phenomenological quality of the evaluation context, and the value of
subjective human interpretation in the evaluation process. Included in this category are the approaches
known in evaluation as naturalistic or 'Fourth Generation' evaluation; the various qualitative schools;
critical theory and art criticism approaches; and, the 'grounded theory' approach of Glaser and Strauss
among others.

Finally, a fourth class of strategies is termed participant-oriented models. As the term suggests, they
emphasize the central importance of the evaluation participants, especially clients and users of the
program or technology. Client-centered and stakeholder approaches are examples of participant-oriented
models, as are consumer-oriented evaluation systems.

With all of these strategies to choose from, how to decide? Debates that rage within the evaluation
profession -- and they do rage -- are generally battles between these different strategists, with each
claiming the superiority of their position. In reality, most good evaluators are familiar with all four categories
and borrow from each as the need arises. There is no inherent incompatibility between these broad
strategies -- each of them brings something valuable to the evaluation table. In fact, in recent years
attention has increasingly turned to how one might integrate results from evaluations that use different
strategies, carried out from different perspectives, and using different methods. Clearly, there are no
simple answers here. The problems are complex and the methodologies needed will and should be varied.

Types of Evaluation

There are many different types of evaluations depending on the object being evaluated and the purpose of
the evaluation. Perhaps the most important basic distinction in evaluation types is that between formative
and summative evaluation. Formative evaluations strengthen or improve the object being evaluated --
they help form it by examining the delivery of the program or technology, the quality of its implementation,
and the assessment of the organizational context, personnel, procedures, inputs, and so on. Summative
evaluations, in contrast, examine the effects or outcomes of some object -- they summarize it by
describing what happens subsequent to delivery of the program or technology; assessing whether the
object can be said to have caused the outcome; determining the overall impact of the causal factor beyond
only the immediate target outcomes; and, estimating the relative costs associated with the object.

Formative evaluation includes several evaluation types:

● needs assessment determines who needs the program, how great the need is, and what might work to meet the need
● evaluability assessment determines whether an evaluation is feasible and how stakeholders can help shape its usefulness
● structured conceptualization helps stakeholders define the program or technology, the target population, and the possible outcomes
● implementation evaluation monitors the fidelity of the program or technology delivery
● process evaluation investigates the process of delivering the program or technology, including alternative delivery procedures

Summative evaluation can also be subdivided:

● outcome evaluations investigate whether the program or technology caused demonstrable effects on specifically defined target outcomes
● impact evaluation is broader and assesses the overall or net effects -- intended or unintended -- of the program or technology as a whole
● cost-effectiveness and cost-benefit analysis address questions of efficiency by standardizing outcomes in terms of their dollar costs and values
● secondary analysis reexamines existing data to address new questions or use methods not previously employed
● meta-analysis integrates the outcome estimates from multiple studies to arrive at an overall or summary judgement on an evaluation question

Evaluation Questions and Methods

Evaluators ask many different kinds of questions and use a variety of methods to address them. These are
considered within the framework of formative and summative evaluation as presented above.

In formative research the major questions and methodologies are:

What is the definition and scope of the problem or issue, or what's the question?

Formulating and conceptualizing methods might be used, including brainstorming, focus groups, nominal group techniques, Delphi methods, brainwriting, stakeholder analysis, synectics, lateral thinking, input-output analysis, and concept mapping.

Where is the problem and how big or serious is it?

The most common method used here is "needs assessment" which can include: analysis of existing data sources, and the use of sample surveys, interviews of constituent populations, qualitative research, expert testimony, and focus groups.

How should the program or technology be delivered to address the problem?

Some of the methods already listed apply here, as do detailing methodologies like simulation techniques, or multivariate methods like multiattribute utility theory or exploratory causal modeling; decision-making methods; and project planning and implementation methods like flow charting, PERT/CPM, and project scheduling.

How well is the program or technology delivered?

Qualitative and quantitative monitoring techniques, the use of management information systems, and implementation assessment would be appropriate methodologies here.

The questions and methods addressed under summative evaluation include:

What type of evaluation is feasible?

Evaluability assessment can be used here, as well as standard approaches for selecting an
appropriate evaluation design.

What was the effectiveness of the program or technology?

One would choose from observational and correlational methods for demonstrating whether
desired effects occurred, and quasi-experimental and experimental designs for determining
whether observed effects can reasonably be attributed to the intervention and not to other
sources.

What is the net impact of the program?

Econometric methods for assessing cost effectiveness and cost/benefits would apply here,
along with qualitative methods that enable us to summarize the full range of intended and
unintended impacts.

Clearly, this introduction is not meant to be exhaustive. Each of these methods, and the many not
mentioned, are supported by an extensive methodological research literature. This is a formidable set of
tools. But the need to improve, update and adapt these methods to changing circumstances means that
methodological research and development needs to have a major place in evaluation work.

Often, evaluation is construed as part of a larger managerial or administrative process. Sometimes this is
referred to as the planning-evaluation cycle. The distinctions between planning and evaluation are not
always clear; this cycle is described in many different ways with various phases claimed by both planners
and evaluators. Usually, the first stage of such a cycle -- the planning phase -- is designed to elaborate a
set of potential actions, programs, or technologies, and select the best for implementation. Depending on the
organization and the problem being addressed, a planning process could involve any or all of these stages: the
formulation of the problem, issue, or concern; the broad conceptualization of the major alternatives that might be considered; the
detailing of these alternatives and their potential implications; the evaluation of the alternatives and the
selection of the best one; and the implementation of the selected alternative. Although these stages are
traditionally considered planning, there is a lot of evaluation work involved. Evaluators are trained in needs
assessment, they use methodologies -- like the concept mapping one presented later -- that help in
conceptualization and detailing, and they have the skills to help assess alternatives and make a choice of
the best one.

The evaluation phase also involves a sequence of stages that typically includes: the formulation of the
major objectives, goals, and hypotheses of the program or technology; the conceptualization and
operationalization of the major components of the evaluation -- the program, participants, setting, and
measures; the design of the evaluation, detailing how these components will be coordinated; the
analysis of the information, both qualitative and quantitative; and the utilization of the evaluation results.


I took the idea of an evaluation culture from a wonderful paper written by Donald Campbell in 1969 entitled
'Methods for an Experimenting Society.' Following in the footsteps of that paper, this one is considerably
more naive and utopian. And, I have changed the name of this idealized society to reflect terminology that
is perhaps more amenable to the climate of the 1990s. For the term experimenting, I have substituted the
softer and broader term evaluating. And for the term society, I have substituted the more internationally-
flavored term culture. With these shifts in emphasis duly noted, I want you to know that I see the
evaluation culture as one that a member of the experimenting society would feel comfortable visiting, and
perhaps even thinking of taking as a permanent residence.

What would an evaluation culture look like? What should its values be? You should know at the outset that
I fully hope that some version of this fantasy will become an integral part of twenty-first century thought.
There is no particular order of importance to the way these ideas are presented -- I'll leave that ordering to
subsequent efforts.

First, our evaluation culture will embrace an action-oriented perspective that actively seeks solutions to
problems, trying out tentative ones, weighing the results and consequences of actions, all within an
endless cycle of supposition-action-evidence-revision that characterizes good science and good
management. In this activist evaluation culture, we will encourage innovative approaches at all levels. But
well-intentioned activism by itself is not enough, and may at times be risky, dangerous, and lead to
detrimental consequences. In an evaluation culture, we won't act for action's sake -- we'll always attempt
to assess the effects of our actions.

This evaluation culture will be an accessible, teaching-oriented one that emphasizes the unity of formal
evaluation and everyday thought. Most of our evaluations will be simple, informal, efficient, practical, low-
cost and easily carried out and understood by nontechnicians. Evaluations won't just be delegated to one
person or department -- we will encourage everyone in our organizations to become involved in evaluating
what they and their organizations do. Where technical expertise is needed we will encourage the experts
to also educate us about the technical side of what they do, demanding that they try to find ways to explain
their techniques and methods adequately for nontechnicians. We will devote considerable resources to
teaching others about evaluation principles.

Our evaluation culture will be diverse, inclusive, participatory, responsive and fundamentally non-
hierarchical. World problems cannot be solved by simple "silver bullet" solutions. There is growing
recognition in many arenas that our most fundamental problems are systemic, interconnected, and
inextricably linked to social and economic issues and factors. Solutions will involve husbanding the
resources, talents and insights of a wide range of people. The formulation of problems and potential
solutions needs to involve a broad range of constituencies. More than just "research" skills will be needed.
Especially important will be skills in negotiation and consensus-building processes. Evaluators are familiar
with arguments for greater diversity and inclusiveness -- we've been talking about stakeholder,
participative, multiple-constituency research for nearly two decades. No one that I know is seriously
debating anymore whether we should move to more inclusive participatory approaches. The real question
seems to be how such work might best be accomplished, and despite all the rhetoric about the importance
of participatory methods, we have a long way to go in learning how to do them effectively.

Our evaluation culture will be a humble, self-critical one. We will openly acknowledge our limitations and
recognize that what we learn from a single evaluation study, however well designed, will almost always be
equivocal and tentative. In this regard, I believe we too often undervalue cowardice in research. I find it
wholly appropriate that evaluators resist being drawn into making decisions for others, although certainly
the results of our work should help inform the decision makers. A cowardly approach saves the evaluator
from being drawn into the political context, helping assure the impartiality needed for objective
assessment, and it protects the evaluator from taking responsibility for making decisions that should be left
to those who have been duly-authorized -- and who have to live with the consequences. Most program
decisions, especially decisions about whether to continue a program or close it down, must include more
input than an evaluation alone can ever provide. While evaluators can help to elucidate what has
happened in the past or might happen under certain circumstances, it is the responsibility of the
organization and society as a whole to determine what ought to happen. The debate about the appropriate
role of an evaluator in the decision-making process is an extremely intense one right now in evaluation
circles, and my position advocating a cowardly reluctance of the evaluator to undertake a decision-making
role may very well be in the minority. We will need to debate this issue vigorously, especially for politically-
complex, international-evaluation contexts.

Our evaluation culture will need to be an interdisciplinary one, doing more than just grafting one
discipline onto another through constructing multi-discipline research teams. We'll need such teams, of
course, but I mean to imply something deeper, more personally internalized -- we need to move toward
being nondisciplinary, consciously putting aside the blinders of our respective specialties in an attempt to
foster a more whole view of the phenomena we study. As we consider the programs we are evaluating, we
each should be able to speculate about a broad range of implementation factors or potential
consequences. We should be able to anticipate some of the organizational and systems-related features
of these programs, the economic factors that might enhance or reduce implementation, their social and
psychological dimensions, and especially whether the ultimate utilizers can understand or know how to
utilize and be willing to utilize the results of our evaluation work. We should also be able to anticipate a
broad spectrum of potential consequences -- system-related, production-related, economic, nutritional,
social, environmental.

This evaluation culture will also be an honest, truth-seeking one that stresses accountability and scientific
credibility. In many quarters in contemporary society, it appears that many people have given up on the
ideas of truth and validity. Our evaluation culture needs to hold to the goal of getting at the truth while at
the same time honestly acknowledging the revisability of all scientific knowledge. We need to be critical of
those who have given up on the goal of "getting it right" about reality, especially those among the
humanities and social sciences who argue that truth is entirely relative to the knower, objectivity an
impossibility, and reality nothing more than a construction or illusion that cannot be examined publicly. For
them, the goal of seeking the truth is inappropriate and unacceptable, and science a tool of oppression
rather than a road to greater enlightenment. Philosophers have, of course, debated such issues for
thousands of years and will undoubtedly do so for thousands more. We in the evaluation culture need to
check in on their thinking from time to time, but until they settle these debates, we need to hold steadfastly
to the goal of getting at the truth -- the goal of getting it right about reality.

Our evaluation culture will be prospective and forward-looking, anticipating where evaluation feedback
will be needed rather than just reacting to situations as they arise. We will construct simple, low-cost
evaluation and monitoring information systems when we first initiate a new program or technology -- we
cannot wait until a program is complete or a technology is in the field before we turn our attention to its
evaluation.

Finally, the evaluation culture I envision is one that will emphasize fair, open, ethical and democratic
processes. We should move away from private ownership of and exclusive access to data. The data from
all of our evaluations needs to be accessible to all interested groups allowing more extensive independent
secondary analyses and opportunities for replication or refutation of original results. We should encourage
open commentary and debate regarding the results of specific evaluations. Especially when there are
multiple parties who have a stake in such results, it is important for our reporting procedures to include
formal opportunities for competitive review and response. Our evaluation culture must continually strive for
greater understanding of the ethical dilemmas posed by our research. Our desire for valid, scientific
inference will at times put us in conflict with ethical principles. The situation is likely to be especially
complex in international-evaluation contexts where we will often be dealing with multiple cultures and
countries that are at different stages of economic development and have different value systems and
morals. We need to be ready to deal with potential ethical and political issues posed by our methodologies
in an open, direct, and democratic manner.

Do you agree with the values I'm describing here? What other characteristics might this evaluation culture
have? You tell me. There are many more values and characteristics that ought to be considered. For now,
the ones mentioned above, and others in the literature, provide us with a starting point at which we can all
join the discussion. I hope you will add to the list, and I encourage each of you to criticize these tentative
statements I've offered about the extraordinary potential of the evaluation culture that we are all in the
process of creating today.


Navigating the Knowledge Base

The Yin-Yang Map

The Yin and the Yang of Research

You can use the figure above to find your way through the material in the Knowledge Base. Click on any
part of the figure to move to that topic.

The figure shows one way of structuring the material in the Knowledge Base. The left side of the figure
refers to the theory of research. The right side of the figure refers to the practice of research.

The yin-yang figure in the center links you to a theoretical introduction to research on the left and to the
practical issue of how we formulate research projects on the right.

The four arrow links on the left describe the four types of validity in research. The idea of validity provides
us with a unifying theory for understanding the criteria for good research. The four arrow links on the right
point to the research practice areas that correspond with each validity type. For instance, external validity
is related to the theory of how we generalize research results. Its corresponding practice area is sampling
methodology, which is concerned with how to draw representative samples so that generalizations are
possible.

Navigating the Knowledge Base

The Road Map

The Road to Research

Remember all those Bob Hope and Bing Crosby films? The Road to Singapore? Of course you don't -- you're much
too young! Well, I thought it might be useful to visualize the research endeavor sequentially, like taking a trip, like
moving down a road -- the Road to Research. The figure above shows a very applied way to view the content of a
research methods course that helps you consider the research process practically. You might visualize a research
project as a journey where you must stop at certain points along your way. Every research project needs to start with
a clear problem formulation. As you develop your project, you will find critical junctions where you will make choices
about how you will proceed. Consider issues of sampling, measurement, design, and analysis - as well as the
theories of validity behind each step. In the end, you will need to think about the whole picture, or "What can we
conclude?" Then you might write-up your findings or report your evaluation. You even might find yourself backtracking
and evaluating your previous decisions! Don't forget that this is a two-way road; planning and evaluation are critical
and interdependent. The asphalt of the road is the foundation of research philosophy and practice. Without
consideration of the basics in research, you'll find yourself bogged down in the mud!

Types of Surveys

Surveys can be divided into two broad categories: the questionnaire and the interview. Questionnaires
are usually paper-and-pencil instruments that the respondent completes. Interviews are completed by the
interviewer based on what the respondent says. Sometimes, it's hard to tell the difference between a
questionnaire and an interview. For instance, some people think that questionnaires always ask short
closed-ended questions while interviews always ask broad open-ended ones. But you will see
questionnaires with open-ended questions (although they do tend to be shorter than in interviews) and
there will often be a series of closed-ended questions asked in an interview.

Survey research has changed dramatically in the last ten years. We have automated telephone surveys
that use random dialing methods. There are computerized kiosks in public places that allow people to provide
input. A whole new variation of the group interview has evolved as focus group methodology. Increasingly,
survey research is tightly integrated with the delivery of service. Your hotel room has a survey on the desk.
Your waiter presents a short customer satisfaction survey with your check. You get a call for an interview
several days after your last call to a computer company for technical assistance. You're asked to complete
a short survey when you visit a web site. Here, I'll describe the major types of questionnaires and
interviews, keeping in mind that technology is leading to rapid evolution of methods. We'll discuss the
relative advantages and disadvantages of these different survey types in Advantages and Disadvantages
of Survey Methods.

Questionnaires

When most people think of questionnaires, they think of the mail survey.
All of us have, at one time or another, received a questionnaire in the mail.
There are many advantages to mail surveys. They are relatively
inexpensive to administer. You can send the exact same instrument to a
large number of people. They allow the respondent to fill it out at their own
convenience. But there are some disadvantages as well. Response rates from mail surveys are often very
low. And, mail questionnaires are not the best vehicles for asking for detailed written responses.

A second type is the group administered questionnaire. A sample of
respondents is brought together and asked to respond to a structured
sequence of questions. Traditionally, questionnaires were administered in
group settings for convenience. The researcher could give the
questionnaire to those who were present and be fairly sure that there
would be a high response rate. If the respondents were unclear about the
meaning of a question they could ask for clarification. And, there were
often organizational settings where it was relatively easy to assemble the
group (in a company or business, for instance).

What's the difference between a group administered questionnaire and a group interview or focus group?
In the group administered questionnaire, each respondent is handed an instrument and asked to complete
it while in the room. Each respondent completes an instrument. In the group interview or focus group, the
interviewer facilitates the session. People work as a group, listening to each other's comments and
answering the questions. Someone takes notes for the entire group -- people don't complete an interview
individually.

A less familiar type of questionnaire is the household drop-off survey. In
this approach, a researcher goes to the respondent's home or business
and hands the respondent the instrument. In some cases, the respondent
is asked to mail it back or the interviewer returns to pick it up. This approach
attempts to blend the advantages of the mail survey and the group
administered questionnaire. Like the mail survey, the respondent can work
on the instrument in private, when it's convenient. Like the group
administered questionnaire, the interviewer makes personal contact with
the respondent -- they don't just send an impersonal survey instrument. And, the respondent can ask
questions about the study and get clarification on what is to be done. Generally, this would be expected to
increase the percent of people who are willing to respond.

Interviews

Interviews are a far more personal form of research than questionnaires. In the
personal interview, the interviewer works directly with the respondent. Unlike
with mail surveys, the interviewer has the opportunity to probe or ask follow-up
questions. And, interviews are generally easier for the respondent, especially if
what is sought is opinions or impressions. Interviews can be very time
consuming and they are resource intensive. The interviewer is considered a part
of the measurement instrument and interviewers have to be well trained in how
to respond to any contingency.

Almost everyone is familiar with the telephone
interview. Telephone interviews enable a researcher to gather
information rapidly. Most of the major public opinion polls that are
reported are based on telephone interviews. Like personal interviews,
they allow for some personal contact between the interviewer and the
respondent. And, they allow the interviewer to ask follow-up questions.
But they also have some major disadvantages. Many people don't have
publicly-listed telephone numbers. Some don't have telephones. People
often don't like the intrusion of a call to their homes. And, telephone interviews have to be relatively short
or people will feel imposed upon.

Constructing the Survey

Constructing a survey instrument is an art in itself. There are numerous small decisions that
must be made -- about content, wording, format, placement -- that can have important
consequences for your entire study. While there's no one perfect way to accomplish this job,
we do have lots of advice to offer that might increase your chances of developing a better
final product.

First of all, you'll learn about the two major types of surveys that exist, the questionnaire and
the interview, and the different varieties of each. Then you'll see how to write questions for
surveys. There are three areas involved in writing a question:

determining the question content, scope and purpose
choosing the response format that you use for collecting information from the respondent
figuring out how to word the question to get at the issue of interest

Finally, once you have your questions written, there is the issue of how best to place them
in your survey.

You'll see that although there are many aspects of survey construction that are just common
sense, if you are not careful you can make critical errors that have dramatic effects on your
results.

Types of Questions

Survey questions can be divided into two broad types: structured and unstructured. From an instrument design
point of view, the structured questions pose the greater difficulties (see Decisions About the Response Format).
From a content perspective, it may actually be more difficult to write good unstructured questions. Here, I'll
discuss the variety of structured questions you can consider for your survey (we'll discuss unstructured
questioning more under Interviews).

Dichotomous Questions

When a question has two possible responses, we consider it dichotomous. Surveys often use dichotomous
questions that ask for a Yes/No, True/False or Agree/Disagree response. There are a variety of ways to lay these
questions out on a questionnaire:

Questions Based on Level Of Measurement

We can also classify questions in terms of their level of measurement. For instance, we might measure
occupation using a nominal question. Here, the number next to each response has no meaning except as a
placeholder for that response. The choice of a "2" for a lawyer and a "1" for a truck driver is arbitrary -- from the
numbering system used we can't infer that a lawyer is "twice" something that a truck driver is.
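To make the coding idea concrete, here's a minimal sketch in Python (the occupation codes and the responses are hypothetical, not drawn from an actual instrument) of how a nominal variable is usually treated in analysis: the numbers serve only as labels, so you can count them, but arithmetic on them is meaningless.

# Nominal item: the codes are placeholders for categories, nothing more.
from collections import Counter

occupation_labels = {1: "truck driver", 2: "lawyer", 3: "teacher"}
responses = [2, 1, 2, 3, 1, 2]   # coded answers from six hypothetical respondents

print(Counter(occupation_labels[r] for r in responses))
# Counter({'lawyer': 3, 'truck driver': 2, 'teacher': 1})
# An "average occupation" computed from the codes (here, about 1.83) would be meaningless.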
We might ask respondents to rank order their preferences for presidential candidates using an ordinal question:

We want the respondent to put a 1, 2, 3 or 4 next to the candidate, where 1 is the respondent's first choice. Note
that this could get confusing. We might want to state the prompt more explicitly so the respondent knows we want
a number from 1 to 4 (the respondent might check their favorite candidate, or assign higher numbers to
candidates they prefer more instead of understanding that we want rank ordering).
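One way a computerized survey can catch this kind of confusion is to validate the response before accepting it. Here's a minimal, hypothetical sketch (the function and the sample responses are illustrative only): a valid rank ordering of four candidates must use each of the ranks 1 through 4 exactly once.

# Check that a rank-order response is a permutation of 1..n_items.
def valid_ranking(ranks, n_items=4):
    return sorted(ranks) == list(range(1, n_items + 1))

print(valid_ranking([2, 1, 4, 3]))   # True: a proper rank ordering
print(valid_ranking([1, 1, 0, 0]))   # False: respondent just checked favorites
print(valid_ranking([5, 3, 9, 1]))   # False: ratings rather than ranks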

We can also construct survey questions that attempt to measure on an interval level. One of the most common of
these types is the traditional 1-to-5 rating (or 1-to-7, or 1-to-9, etc.). This is sometimes referred to as a Likert
response scale (see Likert Scaling). Here, we see how we might ask an opinion question on a 1-to-5 bipolar
scale (it's called bipolar because there is a neutral point and the two ends of the scale are at opposite positions of
the opinion):

Another interval question uses an approach called the semantic differential. Here, an object is assessed by the
respondent on a set of bipolar adjective pairs (using a 5-point rating scale):
Finally, we can also get at interval measures by using what is called a cumulative or Guttman scale (see
Guttman Scaling). Here, the respondent checks each item with which they agree. The items themselves are
constructed so that they are cumulative -- if you agree to one, you probably agree to all of the ones above it in the
list:

Filter or Contingency Questions

Sometimes you have to ask the respondent one question in order to determine if they are qualified or experienced
enough to answer a subsequent one. This requires using a filter or contingency question. For instance, you
may want to ask one question if the respondent has ever smoked marijuana and a different question if they have
not. In this case, you would have to construct a filter question to determine whether they've ever smoked
marijuana:
Filter questions can get very complex. Sometimes, you have to have multiple filter questions in order to direct your
respondents to the correct subsequent questions. There are a few conventions you should keep in mind when
using filters:

try to avoid having more than three levels (two jumps) for any question

Too many jumps will confuse the respondent and may discourage them from continuing with the
survey.

if only two levels, use graphic to jump (e.g., arrow and box)

The example above shows how you can make effective use of an arrow and box to help direct the
respondent to the correct subsequent question.

if possible, jump to a new page

If you can't fit the response to a filter on a single page, it's probably best to be able to say
something like "If YES, please turn to page 4" rather that "If YES, please go to Question 38"
because the respondent will generally have an easier time finding a page than a specific question.
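In a computerized or web survey, the same jump logic is handled by the program instead of by arrows, boxes, and page references. Here's a minimal sketch of how a filter question and its contingent follow-up might be scripted; the wording, the function names, and the console-style prompts are hypothetical and are only meant to show the branching.

# Hypothetical skip logic for a filter (contingency) question.
def ask(prompt, choices=("yes", "no")):
    answer = input(f"{prompt} ({'/'.join(choices)}): ").strip().lower()
    while answer not in choices:
        answer = input(f"Please answer {' or '.join(choices)}: ").strip().lower()
    return answer

def administer():
    responses = {}
    # Filter question: determines which branch the respondent follows.
    responses["ever_smoked"] = ask("Have you ever smoked marijuana?")
    if responses["ever_smoked"] == "yes":
        # Contingent question, asked only of those who answered Yes.
        responses["past_month"] = ask("Have you smoked marijuana in the past month?")
    else:
        # Everyone else skips ahead, the programmatic version of the arrow-and-box jump.
        responses["past_month"] = None
    return responses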
Question Content

For each question in your survey, you should ask yourself how well it addresses the content you are trying
to get at. Here are some content-related questions you can ask about your survey questions.

Is the Question Necessary/Useful?

Examine each question to see if you need to ask it at all and if you need to ask it at the level of detail you
currently have.

Do you need the age of each child or just the number of children under 16?
Do you need to ask income or can you estimate?

Are Several Questions Needed?

This is the classic problem of the double-barreled question. You should think about splitting each of the
following questions into two separate ones. You can often spot these kinds of problems by looking for the
conjunction "and" in your question.

What are your feelings towards African-Americans and Hispanic-Americans?


What do you think of proposed changes in benefits and hours?

Another reason you might need more than one question is that the question you ask does not cover all
possibilities. For instance, if you ask about earnings, the respondent might not mention all income (e.g.,
dividends, gifts). Or, if you ask the respondents if they're in favor of public TV, they might not understand
that you're asking generally. They may not be in favor of public TV for themselves (they never watch it),
but might favor it very much for their children (who watch Sesame Street regularly). You might be better off
asking two questions, one for their own viewing and one for other members of their household.

Sometimes you need to ask additional questions because your question does not give you enough
context to interpret the answer. For instance, if you ask about attitudes towards Catholics, can you
interpret this without finding out about their attitudes towards religion in general, or other religious groups?

At times, you need to ask additional questions because your question does not determine the intensity
of the respondent's attitude or belief. For example, if they say they support public TV, you probably should
also ask them whether they ever watch it or if they would be willing to have their tax dollars spent on it. It's
one thing for a respondent to tell you they support something. But the intensity of that response is greater
if they are willing to back their sentiment of support with their behavior.
Do Respondents Have the Needed Information?

Look at each question in your survey to see whether the respondent is likely to have the necessary
information to be able to answer the question. For example, let's say you want to ask the question:

Do you think Dean Rusk acted correctly in the Bay of Pigs crisis?

The respondent won't be able to answer this question if they have no idea who Dean Rusk was or what the
Bay of Pigs crisis was. In surveys of television viewing, you cannot expect that the respondent can answer
questions about shows they have never watched. You should ask a filter question first (e.g., Have you ever
watched the show ER?) before asking them their opinions about it.

Does the Question Need to be More Specific?

Sometimes we ask our questions too generally and the information we obtain is more difficult to interpret.
For example, let's say you want to find out respondents' opinions about a specific book. You could ask
them

How well did you like the book?

on some scale ranging from "Not At All" to "Extremely Well." But what would their response mean? What
does it mean to say you liked a book very well? Instead, you might ask questions designed to be more
specific like:

Did you recommend the book to others?

or

Did you look for other books by that author?

Is Question Sufficiently General?

You can err in the other direction as well by being too specific. For instance, if you ask someone to list the
television program they liked best in the past week, you could get a very different answer than if you
asked them which show they've enjoyed most over the past year. Perhaps a show they don't usually like
had a great episode in the past week, or their show was preempted by another program.

Is Question Biased or Loaded?

One danger in question-writing is that your own biases and blind-spots may affect the wording (see
Decisions About Question Wording). For instance, you might generally be in favor of tax cuts. If you ask a
question like:

What do you see as the benefits of a tax cut?


you're only asking about one side of the issue. You might get a very different picture of the respondents'
positions if you also asked about the disadvantages of tax cuts. The same thing could occur if you are in
favor of public welfare and you ask:

What do you see as the disadvantages of eliminating welfare?

without also asking about the potential benefits.

Will Respondent Answer Truthfully?

For each question on your survey, ask yourself whether the respondent will have any difficulty answering
the question truthfully. If there is some reason why they may not, consider rewording the question. For
instance, some people are sensitive about answering questions about their exact age or income. In this
case, you might give them response brackets to choose from (e.g., between 30 and 40 years old,
between $50,000 and $100,000 annual income). Sometimes even bracketed responses won't be enough.
Some people do not like to share how much money they give to charitable causes (they may be afraid of
being solicited even more). No matter how you word the question, they would not be likely to tell you their
contribution rate. But sometimes you can get at the information by posing the question in terms of a hypothetical
projective respondent (a little bit like a projective test). In this case, you might get reasonable estimates if
you ask the respondent how much money "people you know" typically give in a year to charitable causes.
Finally, you can sometimes dispense with asking a question at all if you can obtain the answer
unobtrusively (see Unobtrusive Measures). If you are interested in finding out what magazines the
respondent reads, you might instead tell them you are collecting magazines for a recycling drive and ask if
they have any old ones to donate (of course, you have to consider the ethical implications of such
deception!).

Response Format

The response format is how you collect the answer from the respondent. Let's start with a simple distinction
between what we'll call unstructured response formats and structured response formats. [On this page,
I'll use standard web-based form fields to show you how various response formats might look on the web. If
you want to see how these are generated, select the View Source option on your web browser.]

Structured Response Formats

Structured formats help the respondent to respond more easily and help the researcher to accumulate and
summarize responses more efficiently. But, they can also constrain the respondent and limit the researcher's
ability to understand what the respondent really means. There are many different structured response
formats, each with its own strengths and weaknesses. We'll review the major ones here.

Fill-In-The-Blank. One of the simplest response formats is a blank line. A blank line can be
used for a number of different response types. For instance:

Please enter your gender:

_____ Male

_____ Female

Here, the respondent would probably put a check mark or an X next to the response. This is
also an example of a dichotomous response, because it only has two possible values. Other
common dichotomous responses are True/False and Yes/No. Here's another common use of
a fill-in-the-blank response format:

Please enter your preference for the following candidates where '1' =
your first choice, '2' = your second choice, and so on.

_____ Robert Dole

_____ Colin Powell

_____ Bill Clinton

_____ Al Gore

In this example, the respondent writes a number in each blank. Notice that here, we expect
the respondent to place a number on every blank, whereas in the previous example, we
expect the respondent to choose only one. Then, of course, there's the classic:

NAME: ________________________

And here's the same fill-in-the-blank response item in web format:

NAME:

Of course, there's always the classic fill-in-the-blank test item:

One of President Lincoln's most famous speeches, the _______________ Address, only lasted a few minutes when delivered.

Check The Answer. The respondent places a check next to the response(s). The simplest
form would be the example given above where we ask the person to indicate their gender.
Sometimes, we supply a box that the person can fill in with an 'X' (which is sort of a variation
on the check mark). Here's a web version of the checkbox:

Please check if you have the following item on the computer you use
most:

modem
printer
CD-ROM drive
joystick
scanner

Notice that in this example, it is possible for you to check more than one response. By
convention, we usually use the checkmark format when we want to allow the respondent to
select multiple items.

We sometimes refer to this as a multi-option variable. You have to be careful when you
analyze data from a multi-option variable. Because the respondent can select any of the
options, you have to treat this type of variable in your analysis as though each option is a
separate variable. For instance, for each option we would normally enter either a '0' if the
respondent did not check it or a '1' if the respondent did check it. For the example above, if
the respondent had only a modem and CD-ROM drive, we would enter the sequence 1, 0, 1,
0, 0. There is a very important reason why you should code this variable as either 0 or 1 when
you enter the data. If you do, and you want to determine what percent of your sample has a
modem, all you have to do is compute the average of the 0's and 1's for the modem variable.
For instance, if you have 10 respondents and only 3 have a modem, the average would be
3/10 = .30 or 30%, which is the percent who checked that item.

The example above is also a good example of a checklist item. Whenever you use a checklist,
you want to be sure that you ask the following questions:

Are all of the alternatives covered?


Is the list of reasonable length?
Is the wording impartial?
Is the form of the response easy, uniform?

Sometimes you may not be sure that you have covered all of the possible responses in a
checklist. If that is the case, you should probably allow the respondent to write in any other
options that may apply.
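To see how the 0-and-1 coding described above plays out, here's a minimal sketch with hypothetical data (ten made-up respondents, three of whom checked "modem"); the option list matches the example, but the responses are invented.

# Multi-option (checklist) item: each option becomes its own 0/1 variable.
options = ["modem", "printer", "CD-ROM drive", "joystick", "scanner"]

raw = [                       # what each of ten respondents checked
    {"modem", "CD-ROM drive"}, {"printer"}, {"modem"}, set(), {"scanner"},
    {"printer", "joystick"}, set(), {"modem", "printer"}, {"CD-ROM drive"}, set(),
]

coded = [{opt: int(opt in answers) for opt in options} for answers in raw]
share_with_modem = sum(row["modem"] for row in coded) / len(coded)
print(f"{share_with_modem:.0%} of respondents have a modem")   # 30%
# Because each option is coded 0 or 1, its mean is the proportion of
# respondents who checked that option.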

Circle The Answer. Sometimes the respondent is asked to circle an item to indicate their
response. Usually we are asking them to circle a number. For instance, we might have the
following:

In computer contexts, it's not feasible to have respondents circle a response. In this case, we
tend to use an option button:

Capital punishment is the best way to deal with convicted murderers.

Strongly Disagree     Disagree     Neutral     Agree     Strongly Agree

Notice that you can only check one option at a time. The rule of thumb is that you ask
someone to circle an item or click on a button when you only want them to be able to select
one of the options. In contrast to the multi-option variable described above, we refer to this
type of item as a single-option variable -- even though the respondent has multiple choices,
they can only select one of them. We would analyze this as a single variable that can take the
integer values from 1 to 5.
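By contrast, a single-option item like the one above yields just one variable per respondent. A minimal sketch with hypothetical ratings (1 = Strongly Disagree through 5 = Strongly Agree) shows the usual summaries:

# Single-option (1-to-5) responses from ten hypothetical respondents.
from collections import Counter
from statistics import mean

ratings = [5, 4, 4, 2, 1, 3, 4, 5, 2, 4]

print(Counter(ratings))                 # frequency of each response value
print("mean rating:", mean(ratings))    # 3.4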

Unstructured Response Formats

While there is a wide variety of structured response formats, there are relatively few unstructured ones. What
is an unstructured response format? Generally, it's written text. If the respondent (or interviewer) writes down
text as the response, you've got an unstructured response format. These can vary from short comment
boxes to the transcript of an interview.

In almost every short questionnaire, there's one or more short text field questions. One of the most frequent
goes something like this:

Please add any other comments:

Actually, there's really not much more to text-based response formats of this type than writing the prompt and
allowing enough space for a reasonable response.

Transcripts are an entirely different matter. There, the transcriber has to decide whether to transcribe every
word or only record major ideas, thoughts, quotes, etc. In detailed transcriptions, you may also need to
distinguish different speakers (e.g., the interviewer and respondent) and have a standard convention for
indicating comments about what's going on in the interview, including non-conversational events that take
place and thoughts of the interviewer.

Question Wording

One of the major difficulties in writing good survey questions is getting the wording right. Even slight wording
differences can confuse the respondent or lead to incorrect interpretations of the question. Here, I outline
some questions you can ask about how you worded each of your survey questions.

Can the Question be Misunderstood?

The survey author has to always be on the lookout for questions that could be misunderstood or confusing.
For instance, if you ask a person for their nationality, it might not be clear what you want (Do you want
someone from Malaysia to say Malaysian, Asian, or Pacific Islander?). Or, if you ask for marital status, do
you want someone to say simply that they are either married or not married? Or, do you want more detail
(like divorced, widow/widower, etc.)?

Some terms are just too vague to be useful. For instance, if you ask a question about the "mass media,"
what do you mean? The newspapers? Radio? Television?

Here's one of my favorites. Let's say you want to know the following:

What kind of headache remedy do you use?

Do you want to know what brand name medicine they take? Do you want to know about "home"
remedies? Are you asking whether they prefer a pill, capsule or caplet?

What Assumptions Does the Question Make?

Sometimes we don't stop to consider how a question will appear from the respondent's point-of-view. We
don't think about the assumptions behind our questions. For instance, if you ask what social class
someone's in, you assume that they know what social class is and that they think of themselves as being
in one. In this kind of case, you may need to use a filter question first to determine whether either of these
assumptions is true.

Is the time frame specified?

Whenever you use the words "will", "could", "might", or "may" in a question, you might suspect that the
question is a time-related one. Be sure that, if it is, you have specified the time frame precisely.
For instance, you might ask:

Do you think Congress will cut taxes?


or something like

Do you think Congress could successfully resist tax cuts?

Neither of these questions specifies a time frame.

How personal is the wording?

With a change of just a few words, a question can go from being relatively impersonal to probing into your
private perspectives. Consider the following three questions, each of which asks about the respondent's
satisfaction with working conditions:

Are working conditions satisfactory or not satisfactory in the plant where you work?
Do you feel that working conditions are satisfactory or not satisfactory in the plant where you work?
Are you personally satisfied with working conditions in the plant where you work?

The first question is stated from a fairly detached, objective viewpoint. The second asks how you "feel."
The last asks whether you are "personally satisfied." Be sure the questions in your survey are at an
appropriate level for your context. And, be sure there is consistency in this across questions in your survey.

Is the wording too direct?

There are times when asking a question too directly may be too threatening or disturbing for respondents.
For instance, consider a study where you want to discuss battlefield experiences with former soldiers who
experienced trauma. Examine the following three question options:

How did you feel about being in the war?


How well did the equipment hold up in the field?
How well were new recruits trained?

The first question may be too direct. For this population it may elicit powerful negative emotions based on
their recollections. The second question is a less direct one. It asks about equipment in the field, but, for
this population, may also lead the discussion toward more difficult issues to discuss directly. The last
question is probably the least direct and least threatening. Bashing the new recruits is standard protocol in
almost any social context. The question is likely to get the respondent talking, recounting anecdotes,
without eliciting much stress. Of course, all of this may simply be begging the question. If you are doing a
study where the respondents may experience high levels of stress because of the questions you ask, you
should reconsider the ethics of doing the study.

Other Wording Issues

The nuances of language guarantee that the task of the question writer will be endlessly complex. Without
trying to generate an exhaustive list, here are a few other questions to keep in mind:
Does the question contain difficult or unclear terminology?
Does the question make each alternative explicit?
Is the wording objectionable?
Is the wording loaded or slanted?

Question Placement

Decisions About Placement

One of the most difficult tasks facing the survey designer involves the ordering of questions. Which topics
should be introduced early in the survey, and which later? If you leave your most important questions until
the end, you may find that your respondents are too tired to give them the kind of attention you would like.
If you introduce them too early, they may not yet be ready to address the topic, especially if it is a difficult
or disturbing one. There are no easy answers to these problems - you have to use your judgment.
Whenever you think about question placement, consider the following questions:

Is the answer influenced by prior questions?


Does question come too early or too late to arouse interest?
Does the question receive sufficient attention?

The Opening Questions

Just as in other aspects of life, first impressions are important in survey work. The first few questions you
ask will determine the tone for the survey, and can help put your respondent at ease. With that in mind, the
opening few questions should, in general, be easy to answer. You might start with some simple descriptive
questions that will get the respondent rolling. You should never begin your survey with sensitive or
threatening questions.

Sensitive Questions

In much of our social research, we have to ask respondents about difficult or uncomfortable subjects.
Before asking such questions, you should attempt to develop some trust or rapport with the respondent.
Often, preceding the sensitive questions with some easier warm-up ones will help. But, you have to make
sure that the sensitive material does not come up abruptly or appear unconnected with the rest of the
survey. It is often helpful to have a transition sentence between sections of your instrument to give the
respondent some idea of the kinds of questions that are coming. For instance, you might lead into a
section on personal material with the transition:

In this next section of the survey, we'd like to ask you about your personal relationships.
Remember, we do not want you to answer any questions if you are uncomfortable doing so.

A Checklist of Considerations

There are lots of conventions or rules-of-thumb in the survey design business. Here's a checklist of some
of the most important items. You can use this checklist to review your instrument:
start with easy, nonthreatening questions
put more difficult, threatening questions near end
never start a mail survey with an open-ended question
for historical demographics, follow chronological order
ask about one topic at a time
when switching topics, use a transition
reduce response set (the tendency of the respondent to just keep checking the same
response)
for filter or contingency questions, make a flowchart

The Golden Rule

You are imposing in the life of your respondent. You are asking for their time, their attention, their trust,
and often, for personal information. Therefore, you should always keep in mind the "golden rule" of survey
research (and, I hope, for the rest of your life as well!):

Do unto your respondents as you would have them do unto you!

To put this in more practical terms, you should keep the following in mind:

Thank the respondent at the beginning for allowing you to conduct your study
Keep your survey as short as possible -- only include what is absolutely necessary
Be sensitive to the needs of the respondent
Be alert for any sign that the respondent is uncomfortable
Thank the respondent at the end for participating
Assure the respondent that you will send a copy of the final results
Interviews

Interviews are among the most challenging and rewarding forms of measurement. They require a personal
sensitivity and adaptability as well as the ability to stay within the bounds of the designed protocol. Here, I
describe the preparation you need to do for an interview study and the process of conducting the interview
itself.

Preparation

The Role of the Interviewer

The interviewer is really the "jack-of-all-trades" in survey research. The interviewer's role is complex and
multifaceted. It includes the following tasks:

Locate and enlist cooperation of respondents

The interviewer has to find the respondent. In door-to-door surveys, this means being able
to locate specific addresses. Often, the interviewer has to work at the least desirable times
(like immediately after dinner or on weekends) because that's when respondents are most
readily available.

Motivate respondents to do a good job

If the interviewer does not take the work seriously, why would the respondent? The
interviewer has to be motivated and has to be able to communicate that motivation to the
respondent. Often, this means that the interviewer has to be convinced of the importance of
the research.

Clarify any confusion/concerns

Interviewers have to be able to think on their feet. Respondents may raise objections or
concerns that were not anticipated. The interviewer has to be able to respond candidly and
informatively.

Observe quality of responses

Whether the interview is personal or over the phone, the interviewer is in the best position
to judge the quality of the information that is being received. Even a verbatim transcript will
not adequately convey how seriously the respondent took the task, or any gestures or body
language that were evident.

Conduct a good interview

Last, and certainly not least, the interviewer has to conduct a good interview! Every
interview has a life of its own. Some respondents are motivated and attentive, others are
distracted or disinterested. The interviewer also has good or bad days. Assuring a
consistently high-quality interview is a challenge that requires constant effort.

Training the Interviewers

One of the most important aspects of any interview study is the training of the interviewers themselves. In
many ways the interviewers are your measures, and the quality of the results is totally in their hands. Even
in small studies involving only a single researcher-interviewer, it is important to organize in detail and
rehearse the interviewing process before beginning the formal study.

Here are some of the major topics that should be included in interviewer training:

Describe the entire study

Interviewers need to know more than simply how to conduct the interview itself. They
should learn about the background for the study, previous work that has been done, and
why the study is important.

State who is sponsor of research

Interviewers need to know who they are working for. They -- and their respondents -- have
a right to know not just what agency or company is conducting the research, but also, who
is paying for the research.

Teach enough about survey research

While you seldom have the time to teach a full course on survey research methods, the
interviewers need to know enough that they respect the survey method and are motivated.
Sometimes it may not be apparent why a question or set of questions was asked in a
particular way. The interviewers will need to understand the rationale for how the
instrument was constructed.

Explain the sampling logic and process

Naive interviewers may not understand why sampling is so important. They may wonder
why you go through all the difficulties of selecting the sample so carefully. You will have to
explain that sampling is the basis for the conclusions that will be reached and for the
degree to which your study will be useful.
Explain interviewer bias

Interviewers need to know the many ways that they can inadvertently bias the results. And,
they need to understand why it is important that they not bias the study. This is especially a
problem when you are investigating political or moral issues on which people have strongly
held convictions. While the interviewer may think they are doing good for society by slanting
results in favor of what they believe, they need to recognize that doing so could jeopardize
the entire study in the eyes of others.

"Walk through" the interview

When you first introduce the interview, it's a good idea to walk through the entire protocol
so the interviewers can get an idea of the various parts or phases and how they interrelate.

Explain respondent selection procedures, including

reading maps

It's astonishing how many adults don't know how to follow directions on a
map. In personal interviews, the interviewer may need to locate respondents
who are spread over a wide geographic area. And, they often have to
navigate by night (respondents tend to be most available in evening hours)
in neighborhoods they're not familiar with. Teaching basic map reading skills
and confirming that the interviewers can follow maps is essential.

identifying households

In many studies it is impossible in advance to say whether every sample
household meets the sampling requirements for the study. In your study,
you may want to interview only people who live in single family homes. It
may be impossible to distinguish townhouses and apartment buildings in
your sampling frame. The interviewer must know how to identify the
appropriate target household.

identifying respondents

Just as with households, many studies require respondents who meet
specific criteria. For instance, your study may require that you speak with a
male head-of-household between the ages of 30 and 40 who has children
under 18 living in the same household. It may be impossible to obtain
statistics in advance to target such respondents. The interviewer may have
to ask a series of filtering questions before determining whether the
respondent meets the sampling needs.
Rehearse interview

You should probably have several rehearsal sessions with the interviewer team. You might
even videotape rehearsal interviews to discuss how the trainees responded in difficult
situations. The interviewers should be very familiar with the entire interview before ever
facing a respondent.

Explain supervision

In most interview studies, the interviewers will work under the direction of a supervisor. In
some contexts, the supervisor may be a faculty advisor; in others, they may be the "boss."
In order to assure the quality of the responses, the supervisor may have to observe a
subsample of interviews, listen in on phone interviews, or conduct follow-up assessments of
interviews with the respondents. This can be very threatening to the interviewers. You need
to develop an atmosphere where everyone on the research team -- interviewers and
supervisors -- feels like they're working together towards a common end.

Explain scheduling

The interviewers have to understand the demands being made on their schedules and why
these are important to the study. In some studies it will be imperative to conduct the entire
set of interviews within a certain time period. In most studies, it's important to have the
interviewers available when it's convenient for the respondents, not necessarily the
interviewer.

The Interviewer's Kit

It's important that interviewers have all of the materials they need to do a professional job. Usually, you will
want to assemble an interviewer kit that can be easily carried and includes all of the important materials
such as:

a "professional-looking" 3-ring notebook (this might even have the logo of the company or
organization conducting the interviews)
maps
sufficient copies of the survey instrument
official identification (preferably a picture ID)
a cover letter from the Principal Investigator or Sponsor
a phone number the respondent can call to verify the interviewer's authenticity

The Interview

So all the preparation is complete, the training done, the interviewers ready to proceed, their "kits" in hand.
It's finally time to do an actual interview. Each interview is unique, like a small work of art (and sometimes
the art may not be very good). Each interview has its own ebb and flow -- its own pace. To the outsider, an
interview looks like a fairly standard, simple, prosaic effort. But to the interviewer, it can be filled with
special nuances and interpretations that aren't often immediately apparent. Every interview includes some
common components. There's the opening, where the interviewer gains entry and establishes the rapport
and tone for what follows. There's the middle game, the heart of the process, that consists of the protocol
of questions and the improvisations of the probe. And finally, there's the endgame, the wrap-up, where the
interviewer and respondent establish a sense of closure. Whether it's a two-minute phone interview or a
personal interview that spans hours, the interview is a bit of theater, a mini-drama that involves real lives in
real time.

Opening Remarks

In many ways, the interviewer has the same initial problem that a salesperson has. You have to get the
respondent's attention initially for a long enough period that you can sell them on the idea of participating
in the study. Many of the remarks here assume an interview that is being conducted at a respondent's
residence. But the analogies to other interview contexts should be straightforward.

Gaining entry

The first thing the interviewer must do is gain entry. Several factors can enhance the
prospects. Probably the most important factor is your initial appearance. The interviewer
needs to dress professionally and in a manner that will be comfortable to the respondent. In
some contexts a business suit and briefcase may be appropriate. In others, it may
intimidate. The way the interviewer appears initially to the respondent has to communicate
some simple messages -- that you're trustworthy, honest, and non-threatening. Cultivating
a manner of professional confidence -- the sense that the respondent has nothing to worry
about because you know what you're doing -- is a difficult skill to teach and an
indispensable skill for achieving initial entry.

Doorstep technique

You're standing on the doorstep and someone has opened the door, even if only halfway.
You need to smile. You need to be brief. State why you are there and suggest what you
would like the respondent to do. Don't ask -- suggest what you want. Instead of saying
"May I come in to do an interview?", you might try a more imperative approach like " I'd like
to take a few minutes of your time to interview you for a very important study."

Introduction

If you've gotten this far without having the door slammed in your face, chances are you will
be able to get an interview. Without waiting for the respondent to ask questions, you should
move to introducing yourself. You should have this part of the process memorized so you
can deliver the essential information in 20-30 seconds at most. State your name and the
name of the organization you represent. Show your identification badge and the letter that
introduces you. You want to have as legitimate an appearance as possible. If you have a
three-ring binder or clipboard with the logo of your organization, you should have it out and
visible. You should assume that the respondent will be interested in participating in your
important study -- assume that you will be doing an interview here.
Explaining the study

At this point, you've been invited to come in (After all, you're standing there in the cold,
holding an assortment of materials, clearly displaying your credentials, and offering the
respondent the chance to participate in an interview -- to many respondents, it's a rare and
exciting event. They hardly ever get asked their views about anything, and yet they know
that important decisions are made all the time based on input from others.). Or, the
respondent has continued to listen long enough that you need to move onto explaining the
study. There are three rules to this critical explanation: 1) Keep it short; 2) Keep it short;
and 3) Keep it short! The respondent doesn't have to or want to know all of the neat
nuances of this study, how it came about, how you convinced your thesis committee to buy
into it, and so on. You should have a one or two sentence description of the study
memorized. No big words. No jargon. No detail. There will be more than enough time for
that later (and you should bring some written materials you can leave at the end for that
purpose). This is the "25 words or less" description. What you should spend some time on
is assuring the respondent that you are interviewing them confidentially, and that their
participation is voluntary.

Asking the Questions

You've gotten in. The respondent has asked you to sit down and make yourself comfortable. It may be that
the respondent was in the middle of doing something when you arrived and you may need to allow them a
few minutes to finish the phone call or send the kids off to do homework. Now, you're ready to begin the
interview itself.

Use questionnaire carefully, but informally

The questionnaire is your friend. It was developed with a lot of care and thoughtfulness.
While you have to be ready to adapt to the needs of the setting, your first instinct should
always be to trust the instrument that was designed. But you also need to establish a
rapport with the respondent. If you have your face in the instrument and you read the
questions, you'll appear unprofessional and disinterested. Even though you may be
nervous, you need to recognize that your respondent is most likely even more nervous. If
you memorize the first few questions, you can refer to the instrument only occasionally,
using eye contact and a confident manner to set the tone for the interview and help the
respondent get comfortable.

Ask questions exactly as written

Sometimes an interviewer will think that they could improve on the tone of a question by
altering a few words to make it simpler or more "friendly." DON'T. You should ask the
questions as they are on the instrument. If you had a problem with a question, the time to
raise it was during the training and rehearsals, not during the actual interview. It is
important that the interview be as standardized as possible across respondents (this is true
except in certain types of exploratory or interpretivist research where the explicit goal is to
avoid any standardizing). You may think the change you made was inconsequential when,
in fact, it may change the entire meaning of the question or response.
Follow the order given

Once you know an interview well, you may see a respondent bring up a topic that you know
will come up later in the interview. You may be tempted to jump to that section of the
interview while you're on the topic. DON'T. You are more likely to lose your place. You may
omit questions that build a foundation for later questions.

Ask every question

Sometimes you'll be tempted to omit a question because you thought you already heard
what the respondent will say. Don't assume that. For example, let's say you were
conducting an interview with college age women about the topic of date rape. In an earlier
question, the respondent mentioned that she knew of a woman on her dormitory floor who
had been raped on a date within the past year. A few questions later, you are supposed to
ask "Do you know of anyone personally who was raped on a date?" You figure you already
know that the answer is yes, so you decide to skip the question. Instead, you might say
something like "I know you may have already mentioned this, but do you know of anyone
personally who was raped on a date?" At this point, the respondent may say something like
"Well, in addition to the woman who lived down the hall in my dorm, I know of a friend from
high school who experienced date rape." If you hadn't asked the question, you would never
have discovered this detail.

Don't finish sentences

I don't know about you, but I'm one of those people who just hates to be left hanging. I like
to keep a conversation moving. Once I know where a sentence seems to be heading, I'm
aching to get to the next sentence. I finish people's sentences all the time. If you're like me,
you should practice the art of patience (and silence) before doing any interviewing. As you'll
see below, silence is one of the most effective devices for encouraging a respondent to
talk. If you finish their sentence for them, you imply that what they had to say is transparent
or obvious, or that you don't want to give them the time to express themselves in their own
language.

Obtaining Adequate Responses - The Probe

OK, you've asked a question. The respondent gives a brief, cursory answer. How do you elicit a more
thoughtful, thorough response? You probe.

Silent probe

The most effective way to encourage someone to elaborate is to do nothing at all - just
pause and wait. This is referred to as the "silent" probe. It works (at least in certain cultures)
because the respondent is uncomfortable with pauses or silence. It suggests to the
respondent that you are waiting, listening for what they will say next.
Overt encouragement

At times, you can encourage the respondent directly. Try to do so in a way that does not
imply approval or disapproval of what they said (that could bias their subsequent results).
Overt encouragement could be as simple as saying "Uh-huh" or "OK" after the respondent
completes a thought.

Elaboration

You can encourage more information by asking for elaboration. For instance, it is
appropriate to ask questions like "Would you like to elaborate on that?" or "Is there anything
else you would like to add?"

Ask for clarification

Sometimes, you can elicit greater detail by asking the respondent to clarify something that
was said earlier. You might say, "A minute ago you were talking about the experience you
had in high school. Could you tell me more about that?"

Repetition

This is the old psychotherapist trick. You say something without really saying anything new.
For instance, the respondent just described a traumatic experience they had in childhood.
You might say "What I'm hearing you say is that you found that experience very traumatic."
Then, you should pause. The respondent is likely to say something like "Well, yes, and it
affected the rest of my family as well. In fact, my younger sister..."

Recording the Response

Although we have the capability to record a respondent in audio and/or video, most interview
methodologists don't think it's a good idea. Respondents are often uncomfortable when they know their
remarks will be recorded word-for-word. They may strain to only say things in a socially acceptable way.
Although you would get a more detailed and accurate record, it is likely to be distorted by the very process
of obtaining it. This may be more of a problem in some situations than in others. It is increasingly common
to be told that your conversation may be recorded during a phone interview. And most focus group
methodologies use unobtrusive recording equipment to capture what's being said. But, in general,
personal interviews are still best when recorded by the interviewer using pen and paper. Here, I assume
the paper-and-pencil approach.

Record responses immediately

The interviewer should record responses as they are being stated. This conveys the idea
that you are interested enough in what the respondent is saying to write it down. You don't
have to write down every single word -- you're not taking stenography. But you may want to
record certain key phrases or quotes verbatim. You need to develop a system for
distinguishing what the respondent says verbatim from what you are characterizing (how
about quotation marks, for instance!).

Include all probes

You need to indicate every single probe that you use. Develop a shorthand for different
standard probes. Use a clear form for writing them in (e.g., place probes in the left margin).

Use abbreviations where possible

Abbreviations will help you to capture more of the discussion. Develop a standardized
system (e.g., R=respondent; DK=don't know). If you create an abbreviation on the fly, have
a way of indicating its origin. For instance, if you decide to abbreviate Spouse with an 'S',
you might make a notation in the right margin saying "S=Spouse."

Concluding the Interview

When you've gone through the entire interview, you need to bring the interview to closure. Some important
things to remember:

Thank the respondent

Don't forget to do this. Even if the respondent was troublesome or uninformative, it is
important for you to be polite and thank them for their time.

Tell them when you expect to send results

I hate it when people conduct interviews and then don't send results and summaries to the
people they got the information from. You owe it to your respondent to show them
what you learned. Now, they may not want your entire 300-page dissertation. It's common
practice to prepare a short, readable, jargon-free summary of interviews that you can send
to the respondents.

Don't be brusque or hasty

Allow for a few minutes of winding down conversation. The respondent may want to know a
little bit about you or how much you like doing this kind of work. They may be interested in
how the results will be used. Use these kinds of interests as a way to wrap up the
conversation. As you're putting away your materials and packing up to go, engage the
respondent. You don't want the respondent to feel as though you completed the interview
and then rushed out on them -- they may wonder what they said that was wrong. On the
other hand, you have to be careful here. Some respondents may want to keep on talking
long after the interview is over. You have to find a way to politely cut off the conversation
and make your exit.
Immediately after leaving -- write down any notes about how the interview went

Sometimes you will have observations about the interview that you didn't want to write
down while you were with the respondent. You may have noticed them get upset at a
question, or you may have detected hostility in a response. Immediately after the interview
you should go over your notes and make any other comments and observations -- but be
sure to distinguish these from the notes made during the interview (you might use a
different color pen, for instance).

Plus & Minus of Survey Methods

It's hard to compare the advantages and disadvantages of the major different survey types. Even though
each type has some general advantages and disadvantages, there are exceptions to almost every rule.
Here's my general assessment. Perhaps you would differ in your ratings here or there, but I think you'll
generally agree.

                                                  ----- Questionnaire -----   --- Interview ---
Issue                                             Group    Mail    Drop-Off   Personal    Phone

Are Visual Presentations Possible?                Yes      Yes     Yes        Yes         No
Are Long Response Categories Possible?            Yes      Yes     Yes        ???         No
Is Privacy A Feature?                             No       Yes     No         Yes         ???
Is the Method Flexible?                           No       No      No         Yes         Yes
Are Open-ended Questions Feasible?                No       No      No         Yes         Yes
Is Reading & Writing Needed?                      ???      Yes     Yes        No          No
Can You Judge Quality of Response?                Yes      No      ???        Yes         ???
Are High Response Rates Likely?                   Yes      No      Yes        Yes         No
Can You Explain Study in Person?                  Yes      No      Yes        Yes         ???
Is It Low Cost?                                   Yes      Yes     No         No          No
Are Staff & Facilities Needs Low?                 Yes      Yes     No         No          No
Does It Give Access to Dispersed Samples?         No       Yes     No         No          No
Does Respondent Have Time to Formulate Answers?   No       Yes     Yes        No          No
Is There Personal Contact?                        Yes      No      Yes        Yes         No
Is A Long Survey Feasible?                        No       No      No         Yes         No
Is There Quick Turnaround?                        No       Yes     No         No          Yes

Selecting the Survey Method

Selecting the type of survey you are going to use is one of the most critical decisions in many social
research contexts. You'll see that there are very few simple rules that will make the decision for you -- you
have to use your judgment to balance the advantages and disadvantages of different survey types. Here,
all I want to do is give you a number of questions you might ask that can help guide your decision.

Population Issues

The first set of considerations has to do with the population and its accessibility.

Can the population be enumerated?

For some populations, you have a complete listing of the units that will be sampled. For
others, such a list is difficult or impossible to compile. For instance, there are complete
listings of registered voters or persons with active driver's licenses. But no one keeps a
complete list of homeless people. If you are doing a study that requires input from
homeless persons, you are very likely going to need to go and find the respondents
personally. In such contexts, you can pretty much rule out the idea of mail surveys or
telephone interviews.

Is the population literate?

Questionnaires require that your respondents can read. While this might seem initially like a
reasonable assumption for many adult populations, we know from recent research that the
incidence of adult illiteracy is alarmingly high. And, even if your respondents can read to
some degree, your questionnaire may contain difficult or technical vocabulary. Clearly,
there are some populations that you would expect to be illiterate. Young children would not
be good targets for questionnaires.

Are there language issues?

We live in a multilingual world. Virtually every society has members who speak a language
other than the predominant one. Some countries (like Canada) are officially multilingual. And, our
increasingly global economy requires us to do research that spans countries and language
groups. Can you produce multiple versions of your questionnaire? For mail instruments,
can you know in advance the language your respondent speaks, or do you send multiple
translations of your instrument? Can you be confident that important connotations in your
instrument are not culturally specific? Could some of the important nuances get lost in the
process of translating your questions?

Will the population cooperate?

People who do research on illegal immigration have a difficult methodological problem.
They often need to speak with illegal immigrants or people who may be able to identify
others who are. Why would we expect those respondents to cooperate? Although the
researcher may mean no harm, the respondents are at considerable risk legally if
information they divulge should get into the hands of the authorities. The same can be said
for any target group that is engaging in illegal or unpopular activities.

What are the geographic restrictions?

Is your population of interest dispersed over too broad a geographic range for you to study
feasibly with a personal interview? It may be possible for you to send a mail instrument to a
nationwide sample. You may be able to conduct phone interviews with them. But it will
almost certainly be less feasible to do research that requires interviewers to visit directly
with respondents if they are widely dispersed.

Sampling Issues
The sample is the actual group you will have to contact in some way. There are several important
sampling issues you need to consider when doing survey research.

What data is available?

What information do you have about your sample? Do you know their current addresses?
Their current phone numbers? Are your contact lists up to date?

Can respondents be found?

Can your respondents be located? Some people are very busy. Some travel a lot. Some
work the night shift. Even if you have an accurate phone or address, you may not be able to
locate or make contact with your sample.

Who is the respondent?

Who is the respondent in your study? Let's say you draw a sample of households in a small
city. A household is not a respondent. Do you want to interview a specific individual? Do
you want to talk only to the "head of household" (and how is that person defined)? Are you
willing to talk to any member of the household? Do you state that you will speak to the first
adult member of the household who opens the door? What if that person is unwilling to be
interviewed but someone else in the house is willing? How do you deal with multi-family
households? Similar problems arise when you sample groups, agencies, or companies.
Can you survey any member of the organization? Or, do you only want to speak to the
Director of Human Resources? What if the person you would like to interview is unwilling or
unable to participate? Do you use another member of the organization?

Can all members of the population be sampled?

If you have an incomplete list of the population (i.e., sampling frame) you may not be able
to sample every member of the population. Lists of various groups are extremely hard to
keep up to date. People move or change their names. Even though they are on your
sampling frame listing, you may not be able to get to them. And, it's possible they are not
even on the list.

Are response rates likely to be a problem?

Even if you are able to solve all of the other population and sampling problems, you still
have to deal with the issue of response rates. Some members of your sample will simply
refuse to respond. Others have the best of intentions, but can't seem to find the time to
send in your questionnaire by the due date. Still others misplace the instrument or forget
about the appointment for an interview. Low response rates are among the most difficult of
problems in survey research. They can ruin an otherwise well-designed survey effort.

Question Issues

Sometimes the nature of what you want to ask respondents will determine the type of survey you select.

What types of questions can be asked?

Are you going to be asking personal questions? Are you going to need to get lots of detail
in the responses? Can you anticipate the most frequent or important types of responses
and develop reasonable closed-ended questions?

How complex will the questions be?


Sometimes you are dealing with a complex subject or topic. The questions you want to ask
are going to have multiple parts. You may need to branch to sub-questions.

Will screening questions be needed?

A screening question may be needed to determine whether the respondent is qualified to
answer your question of interest. For instance, you wouldn't want to ask someone their
opinions about a specific computer program without first "screening" them to find out
whether they have any experience using the program. Sometimes you have to screen on
several variables (e.g., age, gender, experience). The more complicated the screening, the
less likely it is that you can rely on paper-and-pencil instruments without confusing the
respondent.

Can question sequence be controlled?

Is your survey one where you can construct in advance a reasonable sequence of
questions? Or, are you doing an initial exploratory study where you may need to ask lots of
follow-up questions that you can't easily anticipate?

Will lengthy questions be asked?

If your subject matter is complicated, you may need to give the respondent some detailed
background for a question. Can you reasonably expect your respondent to sit still long
enough in a phone interview to ask your question?

Will long response scales be used?

If you are asking people about the different computer equipment they use, you may have to
have a lengthy response list (CD-ROM drive, floppy drive, mouse, touch pad, modem,
network connection, external speakers, etc.). Clearly, it may be difficult to ask about each of
these in a short phone interview.

Content Issues

The content of your study can also pose challenges for the different survey types you might utilize.

Can the respondents be expected to know about the issue?

If the respondent does not keep up with the news (e.g., by reading the newspaper,
watching television news, or talking with others), they may not even know about the news
issue you want to ask them about. Or, if you want to do a study of family finances and you
are talking to the spouse who doesn't pay the bills on a regular basis, they may not have
the information to answer your questions.
Will respondent need to consult records?

Even if the respondent understands what you're asking about, you may need to allow them
to consult their records in order to get an accurate answer. For instance, if you ask them
how much money they spent on food in the past month, they may need to look up their
personal check and credit card records. In this case, you don't want to be involved in an
interview where they would have to go look things up while they keep you waiting (they
wouldn't be comfortable with that).

Bias Issues

People come to the research endeavor with their own sets of biases and prejudices. Sometimes, these
biases will be less of a problem with certain types of survey approaches.

Can social desirability be avoided?

Respondents generally want to "look good" in the eyes of others. None of us likes to look
like we don't know an answer. We don't want to say anything that would be embarrassing. If
you ask people about information that may put them in this kind of position, they may not
tell you the truth, or they may "spin" the response so that it makes them look better. This
may be more of a problem in an interview situation where they are face-to-face or on the
phone with a live interviewer.

Can interviewer distortion and subversion be controlled?

Interviewers may distort an interview as well. They may not ask questions that make them
uncomfortable. They may not listen carefully to respondents on topics for which they have
strong opinions. They may make the judgment that they already know what the respondent
would say to a question based on their prior responses, even though that may not be true.

Can false respondents be avoided?

With mail surveys it may be difficult to know who actually responded. Did the head of
household complete the survey or someone else? Did the CEO actually give the responses
or instead pass the task off to a subordinate? Is the person you're speaking with on the
phone actually who they say they are? At least with personal interviews, you have a
reasonable chance of knowing who you are speaking with. In mail surveys or phone
interviews, this may not be the case.

Administrative Issues

Last, but certainly not least, you have to consider the feasibility of the survey method for your study.

costs

Cost is often the major determining factor in selecting survey type. You might prefer to do
personal interviews, but can't justify the high cost of training and paying for the interviewers.
You may prefer to send out an extensive mailing but can't afford the postage to do so.

facilities

Do you have the facilities (or access to them) to process and manage your study? In phone
interviews, do you have well-equipped phone surveying facilities? For focus groups, do you
have a comfortable and accessible room to host the group? Do you have the equipment
needed to record and transcribe responses?

time

Some types of surveys take longer than others. Do you need responses immediately (as in
an overnight public opinion poll)? Have you budgeted enough time for your study to send
out mail surveys and follow-up reminders, and to get the responses back by mail? Have
you allowed for enough time to get enough personal interviews to justify that approach?

personnel

Different types of surveys make different demands of personnel. Interviews require
interviewers who are motivated and well-trained. Group administered surveys require
people who are trained in group facilitation. Some studies may be in a technical area that
requires some degree of expertise in the interviewer.

Clearly, there are lots of issues to consider when you are selecting which type of survey you wish to use in
your study. And there is no clear and easy way to make this decision in many contexts. There may not be
one approach which is clearly the best. You may have to make tradeoffs of advantages and
disadvantages. There is judgment involved. Two expert researchers may, for the very same problem or
issue, select entirely different survey methods. But, if you select a method that isn't appropriate or doesn't
fit the context, you can doom a study before you even begin designing the instruments or questions
themselves.

General Issues in Scaling

S.S. Stevens came up with what I think is the simplest and most straightforward definition of scaling. He
said:

Scaling is the assignment of objects to numbers according to a rule.

But what does that mean? In most scaling, the objects are text statements, usually statements of attitude
or belief. The figure shows an example. There are three statements describing attitudes towards
immigration. To scale these statements, we have to assign numbers to them. Usually, we would like the
result to be on at least an interval scale (see Levels of Measurement) as indicated by the ruler in the
figure. And what does "according to a rule" mean? If you look at the statements, you can see that as you
read down, the attitude towards immigration becomes more restrictive -- if a person agrees with a
statement on the list, it's likely that they will also agree with all of the statements higher on the list. In this
case, the "rule" is a cumulative one. So what is scaling? It's how we get numbers that can be meaningfully
assigned to objects -- it's a set of procedures. We'll present several different approaches below.

But first, I have to clear up one of my pet peeves. People often confuse the idea of a scale and a response
scale. A response scale is the way you collect responses from people on an instrument. You might use a
dichotomous response scale like Agree/Disagree, True/False, or Yes/No. Or, you might use an interval
response scale like a 1-to-5 or 1-to-7 rating. But, if all you are doing is attaching a response scale to an
object or statement, you can't call that scaling. As you will see, scaling involves procedures that you do
independent of the respondent so that you can come up with a numerical value for the object. In true
scaling research, you use a scaling procedure to develop your instrument (scale) and you also use a
response scale to collect the responses from participants. But just assigning a 1-to-5 response scale for an
item is not scaling! The differences are illustrated in the table below.
Scale                                       Response Scale

results from a process                      is used to collect the response for an item
each item on scale has a scale value        item not associated with a scale value
refers to a set of items                    used for a single item

Purposes of Scaling

Why do we do scaling? Why not just create text statements or questions and use response formats to
collect the answers? First, sometimes we do scaling to test a hypothesis. We might want to know whether
the construct or concept is a unidimensional or multidimensional one (more about dimensionality later).
Sometimes, we do scaling as part of exploratory research. We want to know what dimensions underlie a
set of ratings. For instance, if you create a set of questions, you can use scaling to determine how well
they "hang together" and whether they measure one concept or multiple concepts. But probably the most
common reason for doing scaling is for scoring purposes. When a participant gives their responses to a
set of items, we often would like to assign a single number that represents that person's overall attitude
or belief. For the figure above, we would like to be able to give a single number that describes a person's
attitudes towards immigration, for example.

Dimensionality

A scale can have any number of dimensions in it. Most scales that we develop have only a few
dimensions. What's a dimension? Think of a dimension as a number line. If we want to measure a
construct, we have to decide whether the construct can be measured well with one number line or whether
it may need more. For instance, height is a concept that is unidimensional or one-dimensional. We can
measure the concept of height very well with only a single number line (e.g., a ruler). Weight is also
unidimensional -- we can measure it with a scale. Thirst might also be considered a unidimensional
concept -- you are either more or less thirsty at any given time. It's easy to see that height and weight are
unidimensional. But what about a concept like self esteem? If you think you can measure a person's self
esteem well with a single ruler that goes from low to high, then you probably have a unidimensional
construct.

What would a two-dimensional concept be? Many models of intelligence or achievement postulate two
major dimensions -- mathematical and verbal ability. In this type of two-dimensional model, a person can
be said to possess two types of achievement. Some people will be high in verbal skills and lower in math.
For others, it will be the reverse. But, if a concept is truly two-dimensional, it is not possible to depict a
person's level on it using only a single number line. In other words, in order to describe achievement you
would need to locate a person as a point in two-dimensional (x,y) space.

OK, let's push this one step further: how about a three-dimensional concept? Psychologists who study
the idea of meaning theorized that the meaning of a term could be well described in three dimensions. Put
in other terms, any object can be distinguished or differentiated from any other along three dimensions.
They labeled these three dimensions activity, evaluation, and potency. They called this general theory of
meaning the semantic differential. Their theory essentially states that you can rate any object along those
three dimensions. For instance, think of the idea of "ballet." If you like the ballet, you would probably rate it
high on activity, favorable on evaluation, and powerful on potency. On the other hand, think about the
concept of a "book" like a novel. You might rate it low on activity (it's passive), favorable on evaluation
(assuming you like it), and about average on potency. Now, think of the idea of "going to the dentist." Most
people would rate it low on activity (it's a passive activity), unfavorable on evaluation, and powerless on
potency (there are few routine activities that make you feel as powerless!). The theorists who came up
with the idea of the semantic differential thought that the meaning of any concept could be described well
by rating the concept on these three dimensions. In other words, in order to describe the meaning of an
object you have to locate it as a dot somewhere within the cube (three-dimensional space).

Unidimensional or Multidimensional?

What are the advantages of using a unidimensional model? Unidimensional concepts are generally easier
to understand. You have either more or less of it, and that's all. You're either taller or shorter, heavier or
lighter. It's also important to understand what a unidimensional scale is as a foundation for comprehending
the more complex multidimensional concepts. But the best reason to use unidimensional scaling is
because you believe the concept you are measuring really is unidimensional in reality. As you've seen,
many familiar concepts (height, weight, temperature) are actually unidimensional. But, if the concept you
are studying is in fact multidimensional in nature, a unidimensional scale or number line won't describe it
well. If you try to measure academic achievement on a single dimension, you would place every person on
a single line ranging from low to high achievers. But how do you score someone who is a high math
achiever and terrible verbally, or vice versa? A unidimensional scale can't capture that type of achievement.

The Major Unidimensional Scale Types

There are three major types of unidimensional scaling methods. They are similar in that they each
measure the concept of interest on a number line. But they differ considerably in how they arrive at scale
values for different items. The three methods are Thurstone or Equal-Appearing Interval Scaling, Likert or
"Summative" Scaling, and Guttman or "Cumulative" Scaling.

Thurstone Scaling

Thurstone was one of the first and most productive scaling theorists. He actually invented three different
methods for developing a unidimensional scale: the method of equal-appearing intervals; the method
of successive intervals; and, the method of paired comparisons. The three methods differed in how
the scale values for items were constructed, but in all three cases, the resulting scale was rated the same
way by respondents. To illustrate Thurstone's approach, I'll show you the easiest method of the three to
implement, the method of equal-appearing intervals.

The Method of Equal-Appearing Intervals

Developing the Focus. The Method of Equal-Appearing Intervals starts like almost every other scaling
method -- with a large set of statements. Oops! I did it again! You can't start with the set of statements --
you have to first define the focus for the scale you're trying to develop. Let this be a warning to all of you:
methodologists like me often start our descriptions with the first objective methodological step (in this case,
developing a set of statements) and forget to mention critical foundational issues like the development of
the focus for a project. So, let's try this again...

The Method of Equal-Appearing Intervals starts like almost every other scaling method -- with the
development of the focus for the scaling project. Because this is a unidimensional scaling method, we
assume that the concept you are trying to scale is reasonably thought of as one-dimensional. The
description of this concept should be as clear as possible so that the person(s) who are going to create the
statements have a clear idea of what you are trying to measure. I like to state the focus for a scaling
project in the form of a command -- the command you will give to the people who will create the
statements. For instance, you might start with the focus command:

Generate statements that describe specific attitudes that people might have towards
persons with AIDS.

You want to be sure that everyone who is generating statements has some idea of what you are after in
this focus command. You especially want to be sure that technical language and acronyms are spelled out
and understood (e.g., what is AIDS?).

Generating Potential Scale Items. Now, you're ready to create statements. You want a large set of
candidate statements (e.g., 80 -- 100) because you are going to select your final scale items from this
pool. You also want to be sure that all of the statements are worded similarly -- that they don't differ in
grammar or structure. For instance, you might want them each to be worded as a statement which you
could agree or disagree with. You don't want some of them to be statements while others are questions.

For our example focus on developing an AIDS attitude scale, we might generate statements like the
following (these statements came from a class exercise I did in my Spring 1997 undergrad class):

people get AIDS by engaging in immoral behavior
you can get AIDS from toilet seats
AIDS is the wrath of God
anybody with AIDS is either gay or a junkie
AIDS is an epidemic that affects us all
people with AIDS are bad
people with AIDS are real people
AIDS is a cure, not a disease
you can get AIDS from heterosexual sex
people with AIDS are like my parents
you can get AIDS from public toilets
women don’t get AIDS
I treat everyone the same, regardless of whether or not they have AIDS
AIDS costs the public too much
AIDS is something the other guy gets
living with AIDS is impossible
children cannot catch AIDS
AIDS is a death sentence
because AIDS is preventable, we should focus our resources on prevention instead of curing
People who contract AIDS deserve it
AIDS doesn't have a preference, anyone can get it.
AIDS is the worst thing that could happen to you.
AIDS is good because it will help control the population.
If you have AIDS, you can still live a normal life.
People with AIDS do not need or deserve our help
By the time I would get sick from AIDS, there will be a cure
AIDS will never happen to me
you can't get AIDS from oral sex
AIDS is spread the same way colds are
AIDS does not discriminate
You can get AIDS from kissing
AIDS is spread through the air
Condoms will always prevent the spread of AIDS
People with AIDS deserve what they got
If you get AIDS you will die within a year
Bad people get AIDS and since I am a good person I will never get AIDS
I don't care if I get AIDS because researchers will soon find a cure for it.
AIDS distracts from other diseases that deserve our attention more
bringing AIDS into my family would be the worst thing I could do
very few people have AIDS, so it's unlikely that I'll ever come into contact with a sufferer
if my brother caught AIDS I'd never talk to him again
People with AIDS deserve our understanding, but not necessarily special treatment
AIDS is an omnipresent, ruthless killer that lurks around dark alleys, silently waiting for naive victims
to wander past so that it might pounce.
I can't get AIDS if I'm in a monogamous relationship
the nation's blood supply is safe
universal precautions are infallible
people with AIDS should be quarantined to protect the rest of society
because I don't live in a big city, the threat of AIDS is very small
I know enough about the spread of the disease that I would have no problem working in a health
care setting with patients with AIDS
the AIDS virus will not ever affect me
Everyone affected with AIDS deserves it due to their lifestyle
Someone with AIDS could be just like me
People infected with AIDS did not have safe sex
Aids affects us all.
People with AIDS should be treated just like everybody else.
AIDS is a disease that anyone can get if they are not careful.
It's easy to get AIDS.
The likelihood of contracting AIDS is very low.
The AIDS quilt is an emotional reminder to remember those who did not deserve to die painfully or
in vain
The number of individuals with AIDS in Hollywood is higher than the general public thinks
It is not the AIDS virus that kills people, it is complications from other illnesses (because the
immune system isn't functioning) that cause death
AIDS is becoming more a problem for heterosexual women and their offsprings than IV drug users
or homosexuals
A cure for AIDS is on the horizon
Mandatory HIV testing should be established for all pregnant women

Rating the Scale Items. OK, so now you have a set of statements. The next step is to have your
participants (i.e., judges) rate each statement on a 1-to-11 scale in terms of how much each statement
indicates a favorable attitude towards people with AIDS. Pay close attention here! You DON'T want the
participants to tell you what their attitudes towards AIDS are, or whether they would agree with the
statements. You want them to rate the "favorableness" of each statement in terms of an attitude towards
AIDS, where 1 = "extremely unfavorable attitude towards people with AIDS" and 11 = "extremely favorable
attitude towards people with AIDS." (Note that I could just as easily have had the judges rate how much each
statement represents a negative attitude towards AIDS. If I did, the scale I developed would have higher
scale values for people with more negative attitudes).
Computing Scale Score Values for Each Item. The next step is to analyze the rating data. For each
statement, you need to compute the Median and the Interquartile Range. The median is the value above
and below which 50% of the ratings fall. The first quartile (Q1) is the value below which 25% of the cases
fall and above which 75% of the cases fall -- in other words, the 25th percentile. The median is the 50th
percentile. The third quartile, Q3, is the 75th percentile. The Interquartile Range is the difference between
third and first quartile, or Q3 - Q1. The figure above shows a histogram for a single item and indicates the
median and Interquartile Range. You can compute these values easily with any introductory statistics
program or with most spreadsheet programs. To facilitate the final selection of items for your scale, you
might want to sort the table of medians and Interquartile Range in ascending order by Median and, within
that, in descending order by Interquartile Range. For the items in this example, we got a table like the
following:

Statement Number Median Q1 Q3 Interquartile Range


23 1 1 2.5 1.5
8 1 1 2 1
12 1 1 2 1
34 1 1 2 1
39 1 1 2 1
54 1 1 2 1
56 1 1 2 1
57 1 1 2 1
18 1 1 1 0
25 1 1 1 0
51 1 1 1 0
27 2 1 5 4
45 2 1 4 3
16 2 1 3.5 2.5
42 2 1 3.5 2.5
24 2 1 3 2
44 2 2 4 2
36 2 1 2.5 1.5
43 2 1 2.5 1.5
33 3 1 5 4
48 3 1 5 4
20 3 1.5 5 3.5
28 3 1.5 5 3.5
31 3 1.5 5 3.5
19 3 1 4 3
22 3 1 4 3
37 3 1 4 3
41 3 2 5 3
6 3 1.5 4 2.5
21 3 1.5 4 2.5
32 3 2 4.5 2.5
9 3 2 3.5 1.5
1 4 3 7 4
26 4 1 5 4
47 4 1 5 4
30 4 1.5 5 3.5
13 4 2 5 3
11 4 2 4.5 2.5
15 4 3 5 2
40 5 4.5 8 3.5
2 5 4 6.5 2.5
14 5 4 6 2
17 5.5 4 8 4
49 6 5 9.75 4.75
50 8 5.5 11 5.5
35 8 6.25 10 3.75
29 9 5.5 11 5.5
38 9 5.5 10.5 5
3 9 6 10 4
55 9 7 11 4
10 10 6 10.5 4.5
7 10 7.5 11 3.5
46 10 8 11 3
5 10 8.5 11 2.5
53 11 9.5 11 1.5
4 11 10 11 1
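
If you want to let software do this step, the medians and interquartile ranges are easy to compute. Here is a minimal sketch in Python; the statement numbers and judge ratings in it are made-up placeholders rather than the class data above, and any spreadsheet or statistics package will produce the same summary:

    # Sketch: compute the median and interquartile range (Q3 - Q1) of the judges'
    # 1-to-11 favorableness ratings for each candidate statement.
    # The ratings below are hypothetical placeholders, not the class exercise data.
    import statistics

    ratings = {
        8:  [1, 1, 1, 2, 2, 1, 1, 2],     # statement number -> list of judge ratings
        23: [1, 1, 2, 3, 1, 2, 1, 3],
        4:  [11, 10, 11, 11, 10, 11, 9, 11],
    }

    def quartiles(values):
        """Return (Q1, median, Q3) using a simple median-of-halves convention."""
        vals = sorted(values)
        mid = len(vals) // 2
        lower = vals[:mid]
        upper = vals[mid + 1:] if len(vals) % 2 else vals[mid:]
        return statistics.median(lower), statistics.median(vals), statistics.median(upper)

    summary = []
    for stmt, vals in ratings.items():
        q1, med, q3 = quartiles(vals)
        summary.append((stmt, med, q1, q3, q3 - q1))

    # Sort by median and, within each median, by interquartile range so the
    # least variable (smallest-IQR) candidates at each scale value stand out.
    summary.sort(key=lambda row: (row[1], row[4]))
    for stmt, med, q1, q3, iqr in summary:
        print(f"Statement {stmt}: median={med}, Q1={q1}, Q3={q3}, IQR={iqr}")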

Selecting the Final Scale Items. Now, you have to select the final statements for your scale. You should
select statements that are at equal intervals across the range of medians. In our example, we might select
one statement for each of the eleven median values. Within each value, you should try to select the
statement that has the smallest Interquartile Range. This is the statement with the least amount of
variability across judges. You don't want the statistical analysis to be the only deciding factor here. Look
over the candidate statements at each level and select the statement that makes the most sense. If you
find that the best statistical choice is a confusing statement, select the next best choice.

When we went through our statements, we came up with the following set of items for our scale:

People with AIDS are like my parents (6)
Because AIDS is preventable, we should focus our resources on prevention instead of curing (5)
People with AIDS deserve what they got. (1)
Aids affects us all (10)
People with AIDS should be treated just like everybody else. (11)
AIDS will never happen to me. (3)
It's easy to get AIDS (5)
AIDS doesn't have a preference, anyone can get it (9)
AIDS is a disease that anyone can get if they are not careful (9)
If you have AIDS, you can still lead a normal life (8)
AIDS is good because it helps control the population. (2)
I can't get AIDS if I'm in a monogamous relationship. (4)

The value in parentheses after each statement is its scale value. Items with higher scale values should, in
general, indicate a more favorable attitude towards people with AIDS. Notice that we have randomly
scrambled the order of the statements with respect to scale values. Also, notice that we do not have an
item with scale value of 7 and that we have two with values of 5 and of 9 (one of these pairs will average
out to a 7).

Administering the Scale. You now have a scale -- a yardstick you can use for measuring attitudes
towards people with AIDS. You can give it to a participant and ask them to agree or disagree with each
statement. To get that person's total scale score, you average the scale scores of all the items that person
agreed with. For instance, let's say a respondent completed the scale as follows:

[Example response form: the twelve scale statements listed above, each with an Agree/Disagree checkbox; this respondent checked Agree for eight of the twelve items.]

If you're following along with the example, you should see that the respondent checked eight items as
Agree. When we take the average scale values for these eight items, we get a final value for this
respondent of 7.75. This is where this particular respondent would fall on our "yardstick" that measures
attitudes towards persons with AIDS. Now, let's look at the responses for another individual:

[Example response form for a second respondent: the same twelve statements with Agree/Disagree checkboxes; this respondent checked Agree for only four items, all at the unfavorable end of the scale.]
In this example, the respondent only checked four items, all of which are on the negative end of the scale.
When we average the scale items for the statements with which the respondent agreed we get an average
score of 2.5, considerably lower or more negative in attitude than the first respondent.
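
If you score responses by computer, the calculation is simply the average scale value of the items a respondent agreed with. Here is a minimal sketch that uses the scale values reported above but a hypothetical agreement pattern:

    # Sketch: Thurstone scoring -- average the scale values of the checked items.
    # Scale values are the ones reported above; the respondent is hypothetical.
    scale_values = {
        "People with AIDS are like my parents.": 6,
        "Because AIDS is preventable, we should focus our resources on prevention instead of curing.": 5,
        "People with AIDS deserve what they got.": 1,
        "Aids affects us all.": 10,
        "People with AIDS should be treated just like everybody else.": 11,
        "AIDS will never happen to me.": 3,
        "It's easy to get AIDS.": 5,
        "AIDS doesn't have a preference, anyone can get it.": 9,
        "AIDS is a disease that anyone can get if they are not careful.": 9,
        "If you have AIDS, you can still lead a normal life.": 8,
        "AIDS is good because it helps control the population.": 2,
        "I can't get AIDS if I'm in a monogamous relationship.": 4,
    }

    def thurstone_score(agreed_items):
        """Average scale value of the items the respondent checked 'Agree'."""
        values = [scale_values[item] for item in agreed_items]
        return sum(values) / len(values)

    # A hypothetical respondent who agreed with three of the twelve statements:
    agreed = ["Aids affects us all.",
              "People with AIDS should be treated just like everybody else.",
              "If you have AIDS, you can still lead a normal life."]
    print(thurstone_score(agreed))   # (10 + 11 + 8) / 3, about 9.67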

The Other Thurstone Methods

The other Thurstone scaling methods are similar to the Method of Equal-Appearing Intervals. All of them
begin by focusing on a concept that is assumed to be unidimensional and involve generating a large set of
potential scale items. All of them result in a scale consisting of relatively few items which the respondent
rates on Agree/Disagree basis. The major differences are in how the data from the judges is collected. For
instance, the method of paired comparisons requires each judge to make a judgement about each pair of
statements. With lots of statements, this can become very time consuming indeed. With 57 statements in
the original set, there are 1,596 unique pairs of statements that would have to be compared! Clearly, the
paired comparison method would be too time consuming when there are lots of statements initially.
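
That count is just the number of unordered pairs among n statements:

    \frac{n(n-1)}{2} = \frac{57 \times 56}{2} = 1596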

Thurstone methods illustrate well how a simple unidimensional scale might be constructed. There are
other approaches, most notably Likert or Summative Scales and Guttman or Cumulative Scales.

Likert Scaling

Like Thurstone or Guttman Scaling, Likert Scaling is a unidimensional scaling method. Here, I'll explain the basic steps in developing a Likert or
"Summative" scale.

Defining the Focus. As in all scaling methods, the first step is to define what it is you are trying to measure. Because this is a unidimensional
scaling method, it is assumed that the concept you want to measure is one-dimensional in nature. You might operationalize the definition as an
instruction to the people who are going to create or generate the initial set of candidate items for your scale.

Generating the Items. Next, you have to create the set of potential scale items. These should be items
that can be rated on a 1-to-5 or 1-to-7 Disagree-Agree response scale. Sometimes you can create the
items by yourself based on your intimate understanding of the subject matter. But, more often than not, it's
helpful to engage a number of people in the item creation step. For instance, you might use some form of
brainstorming to create the items. It's desirable to have as large a set of potential items as possible at this
stage, about 80-100 would be best.

Rating the Items. The next step is to have a group of judges rate the items. Usually you would use a
1-to-5 rating scale where:

1. = strongly unfavorable to the concept
2. = somewhat unfavorable to the concept
3. = undecided
4. = somewhat favorable to the concept
5. = strongly favorable to the concept

Notice that, as in other scaling methods, the judges are not telling you what they believe -- they
are judging how favorable each item is with respect to the construct of interest.

Selecting the Items. The next step is to compute the intercorrelations between all pairs of items, based on the ratings of the judges. In making
judgements about which items to retain for the final scale there are several analyses you can do:

Throw out any items that have a low correlation with the total (summed) score across all items

In most statistics packages it is relatively easy to compute this type of Item-Total correlation. First, you create a new variable which is the
sum of all of the individual items for each respondent. Then, you include this variable in the correlation matrix computation (if you include
it as the last variable in the list, the resulting Item-Total correlations will all be the last line of the correlation matrix and will be easy to
spot). How low should the correlation be for you to throw out the item? There is no fixed rule here -- you might eliminate all items with a
correlation with the total score less than .6, for example.
For each item, get the average rating for the top quarter of judges and the bottom quarter. Then, do a t-test of the differences between the
mean value for the item for the top and bottom quarter judges.

Higher t-values mean that there is a greater difference between the highest and lowest judges. In more practical terms, items with higher t-
values are better discriminators, so you want to keep these items. In the end, you will have to use your judgement about which items are
most sensibly retained. You want a relatively small number of items on your final scale (e.g., 10-15) and you want them to have high Item-
Total correlations and high discrimination (e.g., high t-values).
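
Here is a minimal sketch of these two item-analysis steps, assuming the judges' ratings sit in a simple rows-by-columns layout (one row per judge, one column per candidate item). The ratings are placeholders, and a statistics package will give you the same numbers with far less typing:

    # Sketch: Likert item analysis -- item-total correlations plus a t-value
    # comparing top-quarter and bottom-quarter judges for each candidate item.
    # The ratings matrix is a placeholder (rows = judges, columns = items) and
    # assumes at least eight judges so each quarter holds two or more people.
    import statistics
    from math import sqrt

    ratings = [
        [5, 4, 2, 5, 3],
        [4, 4, 1, 5, 2],
        [2, 3, 4, 2, 5],
        [1, 2, 5, 1, 4],
        [5, 5, 1, 4, 2],
        [2, 1, 4, 2, 5],
        [4, 5, 2, 4, 1],
        [1, 2, 5, 2, 4],
    ]

    def pearson_r(x, y):
        n = len(x)
        sx, sy, sxy = sum(x), sum(y), sum(a * b for a, b in zip(x, y))
        sxx, syy = sum(a * a for a in x), sum(b * b for b in y)
        return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

    totals = [sum(row) for row in ratings]        # total (summed) score per judge
    quarter = len(ratings) // 4
    order = sorted(range(len(ratings)), key=totals.__getitem__)
    bottom, top = order[:quarter], order[-quarter:]

    for i in range(len(ratings[0])):
        col = [row[i] for row in ratings]
        r_item_total = pearson_r(col, totals)     # item-total correlation
        top_vals = [ratings[j][i] for j in top]
        bot_vals = [ratings[j][i] for j in bottom]
        # Pooled-variance t for the top-quarter vs. bottom-quarter mean difference.
        pooled = ((quarter - 1) * statistics.variance(top_vals) +
                  (quarter - 1) * statistics.variance(bot_vals)) / (2 * quarter - 2)
        t = (statistics.mean(top_vals) - statistics.mean(bot_vals)) / sqrt(pooled * 2 / quarter)
        print(f"item {i + 1}: item-total r = {r_item_total:.2f}, t = {t:.2f}")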

Administering the Scale. You're now ready to use your Likert scale. Each respondent is asked to rate each item on some response scale. For
instance, they could rate each item on a 1-to-5 response scale where:

1. = strongly disagree
2. = disagree
3. = undecided
4. = agree
5. = strongly agree

There are a variety of possible response scales (1-to-7, 1-to-9, 0-to-4). All of these odd-numbered scales have a middle value that is often labeled
Neutral or Undecided. It is also possible to use a forced-choice response scale with an even number of responses and no middle neutral or
undecided choice. In this situation, the respondent is forced to decide whether they lean more towards the agree or disagree end of the scale for
each item.

The final score for the respondent on the scale is the sum of their ratings for all of the items (this is why this is sometimes called a "summated"
scale). On some scales, you will have items that are reversed in meaning from the overall direction of the scale. These are called reversal items.
You will need to reverse the response value for each of these items before summing for the total. That is, if the respondent gave a 1, you make it
a 5; if they gave a 2 you make it a 4; 3 = 3; 4 = 2; and, 5 = 1.
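
In code, the scoring step (including the flipping of reversal items on a 1-to-5 response scale) might look like the following; the item names and responses are hypothetical:

    # Sketch: summated (Likert) scoring on a 1-to-5 response scale.
    # Reversal items are flipped (1->5, 2->4, 3->3, 4->2, 5->1) before summing.
    responses = {"item1": 4, "item2": 5, "item3": 2, "item4": 4, "item5": 1}
    reversal_items = {"item3", "item5"}   # items worded opposite to the scale's direction

    def likert_score(responses, reversal_items, max_value=5):
        total = 0
        for item, value in responses.items():
            if item in reversal_items:
                value = (max_value + 1) - value   # e.g., 6 - 1 = 5 on a 1-to-5 scale
            total += value
        return total

    print(likert_score(responses, reversal_items))   # 4 + 5 + 4 + 4 + 5 = 22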

Example: The Employment Self Esteem Scale

Here's an example of a ten-item Likert Scale that attempts to estimate the level of self esteem a person has on the job. Notice that this instrument
has no center or neutral point -- the respondent has to declare whether he/she is in agreement or disagreement with the item.

INSTRUCTIONS: Please rate how strongly you agree or disagree with each of the following statements by placing a check mark in the
appropriate box.

1. I feel good about my work on the job.
   Strongly Disagree   Somewhat Disagree   Somewhat Agree   Strongly Agree

2. On the whole, I get along well with others at work.
   Strongly Disagree   Somewhat Disagree   Somewhat Agree   Strongly Agree

3. I am proud of my ability to cope with difficulties at work.
   Strongly Disagree   Somewhat Disagree   Somewhat Agree   Strongly Agree

4. When I feel uncomfortable at work, I know how to handle it.
   Strongly Disagree   Somewhat Disagree   Somewhat Agree   Strongly Agree

5. I can tell that other people at work are glad to have me there.
   Strongly Disagree   Somewhat Disagree   Somewhat Agree   Strongly Agree

6. I know I'll be able to cope with work for as long as I want.
   Strongly Disagree   Somewhat Disagree   Somewhat Agree   Strongly Agree

7. I am proud of my relationship with my supervisor at work.
   Strongly Disagree   Somewhat Disagree   Somewhat Agree   Strongly Agree

8. I am confident that I can handle my job without constant assistance.
   Strongly Disagree   Somewhat Disagree   Somewhat Agree   Strongly Agree

9. I feel like I make a useful contribution at work.
   Strongly Disagree   Somewhat Disagree   Somewhat Agree   Strongly Agree

10. I can tell that my coworkers respect me.
    Strongly Disagree   Somewhat Disagree   Somewhat Agree   Strongly Agree

Guttman Scaling

Guttman scaling is also sometimes known as cumulative scaling or scalogram analysis. The purpose of
Guttman scaling is to establish a one-dimensional continuum for a concept you wish to measure. What
does that mean? Essentially, we would like a set of items or statements so that a respondent who agrees
with any specific question in the list will also agree with all previous questions. Put more formally, we
would like to be able to predict item responses perfectly knowing only the total score for the respondent.
For example, imagine a ten-item cumulative scale. If the respondent scores a four, it should mean that he/
she agreed with the first four statements. If the respondent scores an eight, it should mean they agreed
with the first eight. The object is to find a set of items that perfectly matches this pattern. In practice, we
would seldom expect to find this cumulative pattern perfectly. So, we use scalogram analysis to examine
how closely a set of items corresponds with this idea of cumulativeness. Here, I'll explain how we develop
a Guttman scale.

Define the Focus. As in all of the scaling methods, we begin by defining the focus for our scale. Let's
imagine that you wish to develop a cumulative scale that measures U.S. citizen attitudes towards
immigration. You would want to be sure to specify in your definition whether you are talking about any type
of immigration (legal and illegal) from anywhere (Europe, Asia, Latin and South America, Africa).

Develop the Items. Next, as in all scaling methods, you would develop a large set of items that reflect the
concept. You might do this yourself or you might engage a knowledgeable group to help. Let's say you
came up with the following statements:

I would permit a child of mine to marry an immigrant.
I believe that this country should allow more immigrants in.
I would be comfortable if a new immigrant moved next door to me.
I would be comfortable with new immigrants moving into my community.
It would be fine with me if new immigrants moved onto my block.
I would be comfortable if my child dated a new immigrant.

Of course, we would want to come up with many more statements (about 80-100 would be desirable).

Rate the Items. Next, we would want to have a group of judges rate the statements or items in terms of
how favorable they are to the concept of immigration. They would give a Yes if the item was favorable
toward immigration and a No if it is not. Notice that we are not asking the judges whether they personally
agree with the statement. Instead, we're asking them to make a judgment about how the statement is
related to the construct of interest.

Develop the Cumulative Scale. The key to Guttman scaling is in the analysis. We construct a matrix or
table that shows the responses of all the respondents on all of the items. We then sort this matrix so that
respondents who agree with more statements are listed at the top and those agreeing with fewer are at
the bottom. For respondents with the same number of agreements, we sort the statements from left to
right, from those that most agreed to, to those that fewest agreed to. We might get a table something like
the figure. Notice that the scale is very nearly cumulative when you read from left to right across the
columns (items). Specifically, if someone agreed with Item 7, they always agreed with Item 2. And, if
someone agreed with Item 5, they always agreed with Items 7 and 2. The matrix shows that the
cumulativeness of the scale is not perfect, however. While in general, a person agreeing with Item 3
tended to also agree with 5, 7 and 2, there are several exceptions to that rule.

While we can examine the matrix if there are only a few items in it, if there are lots of items, we need to
use a data analysis called scalogram analysis to determine the subsets of items from our pool that best
approximate the cumulative property. Then, we review these items and select our final scale elements.
There are several statistical techniques for examining the table to find a cumulative scale. Because there
is seldom a perfectly cumulative scale we usually have to test how good it is. These statistics also estimate
a scale score value for each item. This scale score is used in the final calculation of a respondent's score.
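
Here is a minimal sketch of the sorting step just described, assuming the agreements are stored as 1s and 0s. It does not compute the goodness-of-fit statistics or scale score values that a full scalogram analysis would, and the data in it are placeholders:

    # Sketch: arrange a respondent-by-item agreement matrix the way a scalogram
    # analysis starts -- respondents sorted by how many items they agreed with,
    # items sorted by how many respondents agreed with them.  Placeholder data.
    data = {
        "R1": {"item1": 1, "item2": 1, "item3": 1, "item4": 0},
        "R2": {"item1": 1, "item2": 1, "item3": 0, "item4": 0},
        "R3": {"item1": 1, "item2": 0, "item3": 0, "item4": 0},
        "R4": {"item1": 1, "item2": 1, "item3": 1, "item4": 1},
        "R5": {"item1": 1, "item2": 0, "item3": 1, "item4": 0},
    }

    items = sorted(data["R1"], key=lambda i: -sum(resp[i] for resp in data.values()))
    respondents = sorted(data, key=lambda r: -sum(data[r].values()))

    print("      " + "  ".join(items))
    for r in respondents:
        print(r, "  ", "      ".join(str(data[r][i]) for i in items))
    # A perfectly cumulative scale shows a clean "staircase" of 1s; deviations
    # from that pattern (like R5 here) are what the scalogram statistics quantify.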

Administering the Scale. Once you've selected the final scale items, it's relatively simple to administer
the scale. You simply present the items and ask the respondent to check items with which they agree. For
our hypothetical immigration scale, the items might be listed in cumulative order as:

I believe that this country should allow more immigrants in.
I would be comfortable with new immigrants moving into my community.
It would be fine with me if new immigrants moved onto my block.
I would be comfortable if a new immigrant moved next door to me.
I would be comfortable if my child dated a new immigrant.
I would permit a child of mine to marry an immigrant.

Of course, when we give the items to the respondent, we would probably want to mix up the order. Our
final scale might look like:

INSTRUCTIONS: Place a check next to each statement you agree with.

_____ I would permit a child of mine to marry an immigrant.
_____ I believe that this country should allow more immigrants in.
_____ I would be comfortable if a new immigrant moved next door to me.
_____ I would be comfortable with new immigrants moving into my community.
_____ It would be fine with me if new immigrants moved onto my block.
_____ I would be comfortable if my child dated a new immigrant.

Each scale item has a scale value associated with it (obtained from the scalogram analysis). To compute a
respondent's scale score we simply sum the scale values of every item they agree with. In our example,
their final value should be an indication of their attitude towards immigration.
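
In code, that last step is just a sum over the checked items; both the scale values and the agreement pattern below are hypothetical placeholders:

    # Sketch: Guttman scoring -- sum the scale values of the items checked.
    # The scale values would come from the scalogram analysis; these are made up.
    scale_values = {"more immigrants": 1.2, "my community": 2.1, "my block": 2.8,
                    "next door": 3.5, "child dated": 4.1, "child married": 4.9}
    checked = ["more immigrants", "my community", "my block"]
    score = sum(scale_values[item] for item in checked)
    print(score)   # 1.2 + 2.1 + 2.8 = 6.1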

Correlation

The correlation is one of the most common and most useful statistics. A correlation is a single number that describes the degree of relationship
between two variables. Let's work through an example to show you how this statistic is computed.

Correlation Example

Let's assume that we want to look at the relationship between two variables, height (in inches) and self esteem. Perhaps we have a hypothesis that
how tall you are affects your self esteem (incidentally, I don't think we have to worry about the direction of causality here -- it's not likely that self
esteem causes your height!). Let's say we collect some information on twenty individuals (all male -- we know that the average height differs for
males and females so, to keep this example simple we'll just use males). Height is measured in inches. Self esteem is measured based on the
average of 10 1-to-5 rating items (where higher scores mean higher self esteem). Here's the data for the 20 cases (don't take this too seriously -- I
made this data up to illustrate what a correlation is):

Person Height Self Esteem


1 68 4.1
2 71 4.6
3 62 3.8
4 75 4.4
5 58 3.2
6 60 3.1
7 67 3.8
8 68 4.1
9 71 4.3
10 69 3.7
11 68 3.5
12 67 3.2
13 63 3.7
14 62 3.3
15 60 3.4
16 63 4.0
17 65 4.1
18 67 3.8
19 63 3.4
20 61 3.6

Now, let's take a quick look at the histogram for each variable:
And, here are the descriptive statistics:

Variable Mean StDev Variance Sum Minimum Maximum Range


Height 65.4 4.40574 19.4105 1308 58 75 17
Self Esteem 3.755 0.426090 0.181553 75.1 3.1 4.6 1.5

Finally, we'll look at the simple bivariate (i.e., two-variable) plot:


You should immediately see in the bivariate plot that the relationship between the variables is a positive one (if you can't see that, review the
section on types of relationships) because if you were to fit a single straight line through the dots it would have a positive slope or move up from left
to right. Since the correlation is nothing more than a quantitative estimate of the relationship, we would expect a positive correlation.

What does a "positive relationship" mean in this context? It means that, in general, higher scores on one variable tend to be paired with higher
scores on the other and that lower scores on one variable tend to be paired with lower scores on the other. You should confirm visually that this is
generally true in the plot above.

Calculating the Correlation

Now we're ready to compute the correlation value. The formula for the correlation is:
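
In symbols, using the column sums that appear in the worksheet table below (this is the standard Pearson product-moment form):

    r = \frac{N\sum xy - \sum x \sum y}{\sqrt{\left[N\sum x^2 - \left(\sum x\right)^2\right]\left[N\sum y^2 - \left(\sum y\right)^2\right]}}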

We use the symbol r to stand for the correlation. Through the magic of mathematics it turns out that r will always be between -1.0 and +1.0. If the
correlation is negative, we have a negative relationship; if it's positive, the relationship is positive. You don't need to know how we came up with this
formula unless you want to be a statistician. But you probably will need to know how the formula relates to real data -- how you can use the formula
to compute the correlation. Let's look at the data we need for the formula. Here's the original data with the other necessary columns:

Person Height (x) Self Esteem (y) x*y x*x y*y


1 68 4.1 278.8 4624 16.81
2 71 4.6 326.6 5041 21.16
3 62 3.8 235.6 3844 14.44
4 75 4.4 330 5625 19.36
5 58 3.2 185.6 3364 10.24
6 60 3.1 186 3600 9.61
7 67 3.8 254.6 4489 14.44
8 68 4.1 278.8 4624 16.81
9 71 4.3 305.3 5041 18.49
10 69 3.7 255.3 4761 13.69
11 68 3.5 238 4624 12.25
12 67 3.2 214.4 4489 10.24
13 63 3.7 233.1 3969 13.69
14 62 3.3 204.6 3844 10.89
15 60 3.4 204 3600 11.56
16 63 4 252 3969 16
17 65 4.1 266.5 4225 16.81
18 67 3.8 254.6 4489 14.44
19 63 3.4 214.2 3969 11.56
20 61 3.6 219.6 3721 12.96
Sum = 1308 75.1 4937.6 85912 285.45

The first three columns are the same as in the table above. The next three columns are simple computations based on the height and self esteem
data. The bottom row consists of the sum of each column. This is all the information we need to compute the correlation. Here are the values from
the bottom row of the table (where N is 20 people) as they are related to the symbols in the formula:

N = 20    Σxy = 4937.6    Σx = 1308    Σy = 75.1    Σx² = 85912    Σy² = 285.45

Now, when we plug these values into the formula given above, we get the following (I show it here tediously, one step at a time):

r = [20(4937.6) - (1308)(75.1)] / sqrt{[20(85912) - (1308)²][20(285.45) - (75.1)²]}
  = (98752 - 98230.8) / sqrt[(1718240 - 1710864)(5709 - 5640.01)]
  = 521.2 / sqrt[(7376)(68.99)]
  = 521.2 / sqrt(508870.24)
  = 521.2 / 713.35
  = .73

So, the correlation for our twenty cases is .73, which is a fairly strong positive relationship. I guess there is a relationship between height and self
esteem, at least in this made up data!
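
If you would rather let a computer do the arithmetic, here is a minimal sketch in Python that reproduces the raw-score computation above; the height and self esteem lists are copied from the data table, and the printed result should match the hand calculation:

# Minimal sketch of the raw-score correlation computation.
from math import sqrt

height = [68, 71, 62, 75, 58, 60, 67, 68, 71, 69,
          68, 67, 63, 62, 60, 63, 65, 67, 63, 61]
esteem = [4.1, 4.6, 3.8, 4.4, 3.2, 3.1, 3.8, 4.1, 4.3, 3.7,
          3.5, 3.2, 3.7, 3.3, 3.4, 4.0, 4.1, 3.8, 3.4, 3.6]

n = len(height)
sum_x, sum_y = sum(height), sum(esteem)
sum_xy = sum(x * y for x, y in zip(height, esteem))
sum_x2 = sum(x * x for x in height)
sum_y2 = sum(y * y for y in esteem)

r = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
print(round(r, 2))  # 0.73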

Testing the Significance of a Correlation

Once you've computed a correlation, you can determine the probability that the observed correlation occurred by chance. That is, you can conduct
a significance test. Most often you are interested in determining the probability that the correlation is a real one and not a chance occurrence. In
this case, you are testing the mutually exclusive hypotheses:

Null Hypothesis: r = 0
Alternative Hypothesis: r ≠ 0

The easiest way to test this hypothesis is to find a statistics book that has a table of critical values of r. Most introductory statistics texts would have
a table like this. As in all hypothesis testing, you need to first determine the significance level. Here, I'll use the common significance level of alpha
= .05. This means that I am conducting a test where the odds that the correlation is a chance occurrence are no more than 5 out of 100. Before I
look up the critical value in a table I also have to compute the degrees of freedom or df. The df is simply equal to N-2 or, in this example, 20-2 =
18. Finally, I have to decide whether I am doing a one-tailed or two-tailed test. In this example, since I have no strong prior theory to suggest
whether the relationship between height and self esteem would be positive or negative, I'll opt for the two-tailed test. With these three pieces of
information -- the significance level (alpha = .05), degrees of freedom (df = 18), and type of test (two-tailed) -- I can now test the significance of the
correlation I found. When I look up this value in the handy little table at the back of my statistics book I find that the critical value is .4438. This
means that if my correlation is greater than .4438 or less than -.4438 (remember, this is a two-tailed test) I can conclude that the odds are less than
5 out of 100 that this is a chance occurrence. Since my correlation of .73 is actually quite a bit higher, I conclude that it is not a chance finding and
that the correlation is "statistically significant" (given the parameters of the test). I can reject the null hypothesis and accept the alternative.
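
If you have a statistics package handy, you can also get the p-value directly instead of looking up a critical value. Here is a minimal sketch that assumes SciPy is installed and reuses the same made-up height and self esteem data:

# Minimal sketch (assumes SciPy is installed). pearsonr returns the
# correlation and its two-tailed p-value, so no table of critical values
# is needed.
from scipy import stats

height = [68, 71, 62, 75, 58, 60, 67, 68, 71, 69,
          68, 67, 63, 62, 60, 63, 65, 67, 63, 61]
esteem = [4.1, 4.6, 3.8, 4.4, 3.2, 3.1, 3.8, 4.1, 4.3, 3.7,
          3.5, 3.2, 3.7, 3.3, 3.4, 4.0, 4.1, 3.8, 3.4, 3.6]

r, p = stats.pearsonr(height, esteem)
print(round(r, 2), p < .05)   # 0.73 True -- reject the null hypothesis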

The Correlation Matrix

All I've shown you so far is how to compute a correlation between two variables. In most studies we have considerably more than two variables.
Let's say we have a study with 10 interval-level variables and we want to estimate the relationships among all of them (i.e., between all possible
pairs of variables). In this instance, we have 45 unique correlations to estimate (more later on how I knew that!). We could do the above
computations 45 times to obtain the correlations. Or we could use just about any statistics program to automatically compute all 45 with a simple
click of the mouse.
I used a simple statistics program to generate random data for 10 variables with 20 cases (i.e., persons) for each variable. Then, I told the program
to compute the correlations among these variables. Here's the result:

C1 C2 C3 C4 C5 C6 C7 C8 C9 C10
C1 1.000
C2 0.274 1.000
C3 -0.134 -0.269 1.000
C4 0.201 -0.153 0.075 1.000
C5 -0.129 -0.166 0.278 -0.011 1.000
C6 -0.095 0.280 -0.348 -0.378 -0.009 1.000
C7 0.171 -0.122 0.288 0.086 0.193 0.002 1.000
C8 0.219 0.242 -0.380 -0.227 -0.551 0.324 -0.082 1.000
C9 0.518 0.238 0.002 0.082 -0.015 0.304 0.347 -0.013 1.000
C10 0.299 0.568 0.165 -0.122 -0.106 -0.169 0.243 0.014 0.352 1.000

This type of table is called a correlation matrix. It lists the variable names (C1-C10) down the first column and across the first row. The diagonal of
a correlation matrix (i.e., the numbers that go from the upper left corner to the lower right) always consists of ones. That's because these are the
correlations between each variable and itself (and a variable is always perfectly correlated with itself). This statistical program only shows the lower
triangle of the correlation matrix. In every correlation matrix there are two triangles that are the values below and to the left of the diagonal (lower
triangle) and above and to the right of the diagonal (upper triangle). There is no reason to print both triangles because the two triangles of a
correlation matrix are always mirror images of each other (the correlation of variable x with variable y is always equal to the correlation of variable y
with variable x). When a matrix has this mirror-image quality above and below the diagonal we refer to it as a symmetric matrix. A correlation
matrix is always a symmetric matrix.

To locate the correlation for any pair of variables, find the value in the table for the row and column intersection for those two variables. For
instance, to find the correlation between variables C5 and C2, I look for where row C2 and column C5 intersect (in this case it's blank because it falls in the
upper triangle area) and where row C5 and column C2 intersect; in the second case, I find that the correlation is -.166.

OK, so how did I know that there are 45 unique correlations when we have 10 variables? There's a handy simple little formula that tells how many
pairs (e.g., correlations) there are for any number of variables:

Number of pairs = N(N-1)/2

where N is the number of variables. In the example, I had 10 variables, so I know I have (10 * 9)/2 = 90/2 = 45 pairs.
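
Here is a minimal sketch (assuming NumPy is available) of the same idea: random data for 10 variables and 20 cases, the full 10-by-10 correlation matrix, the count of unique pairs, and a printout of just the lower triangle. The random numbers are only stand-ins, like the made-up data above:

# Minimal sketch (assumes NumPy): a correlation matrix for several variables.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(20, 10))            # 20 cases x 10 variables

corr = np.corrcoef(data, rowvar=False)      # 10 x 10 symmetric matrix, 1s on the diagonal
n_vars = corr.shape[0]
print(n_vars * (n_vars - 1) // 2)           # 45 unique pairs

# Print only the lower triangle, as many statistics programs do:
for i in range(n_vars):
    print(" ".join(f"{corr[i, j]:6.3f}" for j in range(i + 1)))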

Other Correlations

The specific type of correlation I've illustrated here is known as the Pearson Product Moment Correlation. It is appropriate when both variables are
measured at an interval level. However, there are a wide variety of other types of correlations for other circumstances. For instance, if you have two
ordinal variables, you could use the Spearman Rank Order Correlation (rho) or the Kendall Rank Order Correlation (tau). When one measure is a
continuous interval-level one and the other is dichotomous (i.e., two-category) you can use the Point-Biserial Correlation. For other situations,
consult the web-based statistics selection program, Selecting Statistics, at http://trochim.human.cornell.edu/selstat/ssstart.htm.
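
As a rough illustration (assuming SciPy is available), here is how these alternative correlations can be computed; the tiny data sets below are invented solely to show the function calls:

# Minimal sketch (assumes SciPy): correlations for other measurement levels.
from scipy import stats

# Two ordinal variables (ranks):
rho, p_rho = stats.spearmanr([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
tau, p_tau = stats.kendalltau([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])

# One dichotomous (0/1) variable and one continuous variable:
rpb, p_rpb = stats.pointbiserialr([0, 0, 1, 1, 1], [2.1, 2.5, 3.4, 3.9, 4.2])

print(rho, tau, rpb)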

Copyright ©2002, William M.K. Trochim, All Rights Reserved


Purchase a printed copy of the Research Methods Knowledge Base
Last Revised: 01/16/2005
Concept Systems, Inc.
Engages organizations to develop common understanding
Enables organizations to build consensus
Empowers organizations to turn ideas into action


401 East State Street, Suite 402, Ithaca, NY 14850; Tel: 607.272.1206
Concept Systems Software Version 4

Version 4: New features, enhanced data capability, easy to use. Register and try it FREE!

The Concept System® Version 4's integrated reporting capabilities, more intuitive navigation and new features like Go-Zones make this release a new generation of software.

Some new and improved features:

Reporting and Exporting

● Improved and integrated reporting capabilities
● More reports, improved layouts and legends

Importing from Concept Systems Global

● Now integrated with our new Internet data collection system 'Concept Systems Global'
● Contact us for a demo

System Requirements:
Microsoft Windows 2000 or greater
128 MB RAM
20 MB Hard Drive Space
Concept Mapping

One of the most daunting tasks facing any working group is how to create and implement a common framework that will guide their actions. We believe that Concept Mapping is the solution to this dilemma, as our Concept Maps visually organize the ideas of a group or organization.

Whether you are conducting strategic planning, trying to create a new training program, or attempting to improve the performance of your workforce, you will need a way to bring together the different stakeholders in your organization and help them rapidly form a shared vision for future actions.

How do most organizations do this today?

They conduct seemingly endless committee meetings and focus groups that end up with walls covered with newsprint. Concept Mapping solves this problem effectively and efficiently by using state-of-the-art technology and processes to enable participants to have their say individually, and rapidly integrate their input into a common strategic vision.

The Concept System has been used successfully by major management consulting companies, corporations, government agencies and not-for-profit organizations for strategic planning, needs assessment, training curriculum development and evaluation, performance and program evaluation, and focus groups.

Creating A Concept Map

It takes only a few simple steps to create a concept map.

1) Facilitators and participants identify the focus of the project, such as the components of a new training program or the strategic performance objectives for their organization;

2) The group brainstorms ideas over the web using our proprietary Internet software or in a facilitated group session using the Concept System Core Program;

3) Selected participants organize the ideas by sorting them into groups of related items and then rate them according to priority or relative importance (or on other scales as necessary);

4) The Concept System Core Program uses state-of-the-art analytic methods to map the ideas for the entire group, providing a single graphic that acts as a roadmap or blueprint for subsequent work;

5) Participants interpret the maps, discuss how the ideas are organized, and identify the critical high-priority areas (called Go-Zones);

6) The organization uses the results to organize for action, examine consensus, and evaluate subsequent actions.

Reading A Concept Map

Concept Maps take an enormous amount of information and consolidate it into a concise, readable graphic. By working with concept maps, a group of people can rapidly explore the relative importance (or other factors) of different ideas and use this shared vision as the basis for further action.

The whole process generally takes less time than typical focus group exercises, and the results are much more reliable and easy to compare.

Concept Systems Partial Client List

Our powerful software and consulting services have been used in hundreds of projects for government agencies, national associations, non-profit organizations, and private businesses. The types of projects we've consulted on are just as varied as our clients themselves. A sample listing of our diverse client base is as follows:

Centers For Disease Control and Prevention
American Cancer Society
National Cancer Institute
Delta Air Lines
United States Department of Labor
Cornell University
Massachusetts Cancer Coalition
The Office on Women's Health
University of North Carolina
Office of Behavioral and Social Science Research
DB Consulting Group
Hawaii Department of Health
State of Georgia
Roswell Park Cancer Institute
Price Waterhouse
Nortel Networks
Netlan
Nationwide
Motorola
State of Delaware
Hallmark
Albany County (NY) Government Strategic
Amtrak
Citgo
Southern Tier Health Care System Inc.

Have you completed a successful Concept Mapping project, with or without our software? Concept Systems wants to hear about it! Please complete our quick info form. Thanks!
Certification Training

Training provides Concept System users with all the tools and support they need to apply The Concept System® effectively throughout their organizations for planning, development, decision-making and evaluation.

Concept Systems assists organizations to build consensus and commitment by capturing and illustrating multiple perspectives on complex issues.

CSI offers two levels of training:

· Core Training for Facilitators and Researchers
· Advanced Training for Facilitators and Researchers

Registration: Simply call our office or send an email if you're interested in this training. We'll discuss costs, help you find lodging, and get you set up. Email csiinfo@conceptsystems.com or call (607) 272-1206.

2006 Certification Training Schedule - CS Core Training

Ithaca, NY, January 18-20, 2006, CSI offices
Ithaca, NY, April 26-28, 2006, CSI offices
Ithaca, NY, July 19-21, 2006, CSI offices
Ithaca, NY, October 4-6, 2006, CSI offices

Learn how to help your organization, business or agency organize for success!

As a certified facilitator of The Concept System®, you can:

● Increase your efficiency by structuring your group decision-making processes
● Ensure buy-in and group consensus with a conceptual framework defined by the entire group
● Eliminate the inefficiencies of traditional "group-think"
● Optimize understanding with easy-to-read pictorial maps
● Pinpoint disconnects before significant investments are made
● Ensure accurate follow-up with measures that track the project from start to finish
● Develop Action Plans from your unique concept maps and pattern matches

Register by email: csiinfo@conceptsystems.com or call (607) 272-1206.

Concept Systems Research Association

CSRA is an exciting new membership association to increase your research capabilities and methodology knowledge as a social researcher. CSRA's services and support are tailored to researchers who are using or plan to use The Concept System® software, a unique tool using rigorous statistical analyses to produce accurate and understandable graphical results.

The CSRA is committed to:

● Contributing to the public good, by encouraging high quality research.
● Creating a network of colleagues to enhance the quality of social research.
● Advancing the concept mapping methodology in design and application.

Member Benefits

The benefits of membership in CSRA will be invaluable to you in your research. Already a member? Log in now.

As a member you can:

● Expand your research, ask methodological application questions and learn from colleagues through a members-only bulletin board.
● Search a directory of concept mapping researchers to connect with colleagues with shared research interests.
● Participate in quarterly Concept Mapping Seminars, addressing questions and topics submitted by members. Whether the facilitator is William Trochim, Ph.D., creator of The Concept System®, a CSI consultant, or a CSRA member, you will enjoy a thought-provoking and educational seminar via conference call.
● Access a special members-only web page to find seminar dates and topics, links to the bulletin board and the directory, and project management tips.
● Be assured that your process and methodological questions will be answered quickly.
● Enjoy a complimentary half-hour of project support from knowledgeable and experienced CSI staff.
● Have opportunities for pre-publication review of your research.

Membership Terms

● 12-month rolling membership
● Members are graduate students, academics and one-person consulting firms whose work is in the public sector.
● Annual dues: Graduate students $75, all others $150.

Become a Member

To become a member, call us at 607-272-1206 or fax the Membership Form to us at 607-272-1215.

CSI uses the Concept System in requirements analysis as 'engagement insurance'--the client has committed to a course of action, and the delivery and satisfaction can be measured on what the client has already agreed to.
Library and Resources Center

The CSI Web Library/Resource Center is a comprehensive center for all types of information relating to Concept Mapping. Here are documents and links relating to concept mapping, related research, case studies, white papers, software documentation, project reports, presentations, book chapters, dissertations, and more!

Published articles about Concept Mapping by Bill Trochim (founder of the concept mapping methodology)

Published articles about Concept Mapping by CSI Staff

Case Studies - See how our solutions can empower your business.

White Papers: about Concept Mapping projects that have helped define and push the limits of the methodology.

The Concept System Software Guide, Version 1.75 (Adobe PDF format).

Knowledge Base: the complete textbook of the process and technology behind The Concept System.

William Trochim at Cornell (click to launch his website in a new browser)
Welcome to Ideas in Action - CSI's online newsletter. Here you can find
all of the latest information about Concept Systems, Incorporated and
learn about our Consulting, Software, Clients, Case Studies and Staff.

Best Regards,
Mary Kane, President & CEO


CSI Newsletter September/October 2005 Issue


CSI Announces:
CDC Project Team Wins 2005 Peavy Workforce Development Award; Concept Systems Inc. Key Team Member

The Project Officer of the Future program at the Centers for Disease Control & Prevention (CDC) was awarded this year's prestigious James Virgil Peavy Workforce Development Award. The Project Officer of the Future program focuses on understanding and responding to the specific learning and development needs of project officers, public health program specialists who support the work of state, local and territory public health agencies. CDC makes this award once a year to recognize the work of individuals or teams who promote public health workforce learning and development.

Concept Systems Inc. has been involved with the initiative since its inception in 2002,
performing the key functions of training and performance needs analysis, research and
data aggregation. CSI is also the primary contractor for all instructional design for Project
Officer of the Future courses, and evaluation of the program. CSI has developed three
courses so far, including: Understanding Your Partner's Context and Being a Program
Champion; Consulting, Leadership and Resource Linkage for Effective Partnerships and
Sustainable Programs; and CDC Stewardship: Navigating Partners through Cooperative
Agreements. CSI is also looking forward to rolling out the following two remaining courses
over the next year: Getting to Results: Planning, Implementation & Evaluation and The Art
and Science of Public Health: Perspectives, Theory & Practice.

Featured Affiliate

Bud Nicola, MD:

Bud is a Senior Consultant and CDC assignee to The Robert Wood Johnson Foundation's Turning Point National Program Office, and has practiced public health at the local and federal level for 26 years. His past experience includes directing the Division of Public Health Systems at the Centers for Disease Control and Prevention, the Seattle King County Health Department, and the Tacoma Pierce County Health Department. He is an Affiliate Associate Professor with the Department of Health Services. His interests include the improvement of the public health system, the use of information technology in public health, and the mobilization of communities to action on health issues.

Bud's Research:
Academic Alliance for Disease Detection in Asia-Pacific
CSI software was used for a project called "Academic Alliance for Disease Detection in Asia-Pacific." The principal aim of this project is to build laboratory and epidemiologic capacity including effective organizational infrastructure to support Emerging Infectious Disease surveillance and response in selected countries of the Asia-Pacific region in partnership with academic centers and others throughout the Pacific Rim.

The Focus Statement used in the project was: "A specific action that universities and foundations working together can do or should do to support global disease detection is..." Ratings of importance and feasibility were used. Demographic information collected on participants included institutional affiliation, geographic location, years working, and major focus of work.

The brainstorming was done by a group of 29 public health practitioners and academicians from the U.S. and Pacific Island jurisdictions. I then facilitated several planning meetings of a subset of the brainstorm group during which I presented the sorting, rating, and analysis. This entire planning process took place over an eight-month period. The CSI software helped us meet our goals by focusing the group on the most important (and feasible) areas for action and by giving us planning data for future funding. CSI's technical support was particularly helpful in thinking through the nuances of the Focus Statement.

CSI Project Spotlight


CSI Project Spotlight describes work that the CSI consulting team is engaged in, and
features unique applications of the concept mapping methodology and new uses of project
results.

Office of Cancer Survivorship


Concept Systems Inc. (CSI) has collaborated on several projects with Dr. Jon Kerner,
Deputy Director of Research Dissemination and Diffusion, and Cynthia Vinson,
Dissemination and Diffusion Coordinator, in the Office of Dissemination and Diffusion at
the National Cancer Institute. Through this work, CSI was introduced to several other vital
public health agencies, including the Office of Cancer Survivorship (OCS) at the National
Cancer Institute.

CSI is currently working with the OCS to learn about how research on cancer survivorship
might be better integrated and coordinated across the Institute. OCS asked 270 NCI
colleagues to respond to the focus prompt: "One specific aspect of the work that I am doing, or would like to do, that relates to cancer survivorship is…"

The Concept Mapping methodology is uniquely suited to building awareness and consensus on this important topic, because it is multifaceted and multi-disciplinary. Cancer biology scientists, social and behavioral scientists, and epidemiologists must create a common vocabulary and understand one another's priorities if they are to work together on this issue.

Because these groups typically belong to different research spheres with different cultures and ways of viewing their work, this can be a challenging task. With its mix of individual input and group interpretation of results, the Concept Mapping methodology is making it possible for different types of cancer researchers to contribute their unique perspectives, and still create a shared understanding of the issues. This makes strategic planning and priority setting possible.

The ability to participate in the process online, without a series of meetings, made the
process efficient and doable.

The OCS project team is currently reviewing the concept map and discussing new ways to
approach their work, to allow for more interdisciplinary collaborations across the Institute.
The staff is hopeful that these collaborations will help them accelerate the pace of
discovery, development, and delivery of interventions and support that cancer survivors
need most.
Recent Publications

Mary Kane, in collaboration with other authors, has published a peer-reviewed article in CDC's journal "Preventing Chronic Disease". The article is based on a recent concept mapping project and discusses the role that state public health agencies have in addressing less prevalent chronic conditions. The full text of the article is available at:
http://www.cdc.gov/pcd/issues/2005/jul/04_0129.htm

The Journal of the National Medical Association recently printed an article outlining the
results of a concept-mapping project. The project identified barriers to racial/ethnic
minority application and competition for NIH research funding. The full text of the article is
available at:
http://www.nmanet.org/JMNA_Journal_Articles/August-05_JNMA/OC1063.pdf

Earlier this year, William M. K. Trochim and Mary Kane had their work published in the
International Journal for Quality in Health Care. The article outlines the basic steps and
analysis sequence in the concept mapping method and presents a brief example of results
from a recent public health planning project. The full text of the article is available at:
http://conceptsystems.com/papers/publications/Structured%20Conceptualization%20in%
20Health%20Care.pdf

William M. K. Trochim and Derek Cabrera authored an article for "Emergence". The
article introduces concept mapping as a participatory mixed methodology and discusses
how it is currently used in policy analysis and management contexts. The full text of the
article is available at: http://www.socialresearchmethods.net/research/
The_Complexity_of_Concept_Mapping.pdf

CSI Training
CSI had a total of 22 participants in the 5 core trainings that were conducted this year, at
the Ithaca offices. A sampling of the trainees is listed below, along with their expected
project focus.

Participants
Participant / Organization / Project Focus

Bradshaw, Catherine / Johns Hopkins Bloomberg School of Public Health / "What factors contribute to (are important for) successful implementation of evidence-based child mental health interventions in schools?"

Freese (Gibson), Laura; McGuire, Peggy / University of Louisville / "Characteristics of effective organizations."

Jasek, Kirsten / University CA Stanislaus / "A core group of employees and board members will determine the focus question."

Jones, Nora; Sankar, Pamela / University of Pennsylvania Center for Bioethics / "We have a group of researchers who are coming together to create a new field – human pharmacogenomic epidemiology (HPE) – comprised of geneticists, bioethicists, epidemiologists, biostatisticians, bioinformaticians, and we are interested in how this disparate group can come to a common understanding/definition of what HPE is."

Lubker, Disa (student of Anne Wallis) / University of Iowa / "The barriers to getting healthcare in my community are …"

Mpofu, Dr. Elias / Penn State University / "What components would be needed for a STEM program for students with disabilities."

Nicola, Bud / University of Washington / "A specific action that universities and foundations working together can do or should do to support global disease detection is..."

Parker, Frank; Hermitte, Gene; Whitworth, Ling / Johnson C. Smith University / "In order to enhance the quality of the freshman year experience, within the next five years, a specific action Johnson C. Smith University should take is…"

Vishwanath, Arun / Buffalo Academy of Medicine / "What are the barriers to the adoption of electronic medical records?"

Ward, Kristin / Casey Family Services / Reducing disproportionality and disparate outcomes for children and families of color in the child welfare system

Browse our past newsletters:


Jul/Aug 2004
Apr/May 2004
Jan/Feb 2004
May/June 2003

CSI Consulting Services

CSI performs consulting and facilitation services in the areas of organizational planning, project design, development and evaluation, and strategic stakeholder-based issues assessment and decision-making. Additionally, the company works with clients to develop training curriculum and corresponding materials and frequently delivers training programs. CSI utilizes its proprietary software in many consulting projects and requires a use license fee when it is employed.

Our clients and users range from some of the biggest organizations in the world to individual researchers and consultants.

The company serves clients in public health, education, academe and the private sector. A rigorous but customizable consulting and facilitation approach is employed that rests on a three-pronged foundation:

● A responsive, time-effective and client-focused process methodology;
● A proprietary set of technologies that engage stakeholders, elicit buy-in and awareness, and lead to concrete, usable results;
● Experience and content expertise ranging from corporate communications and training development to large public health planning initiatives.

The process methodology includes client-centered planning, issues identification, strategies for high inclusion, and detailed timeframes and deliverables that are conscientiously adhered to. CSI:

● Involves stakeholders and participants in the creation of solutions.
● Creates tools and deliverables that are understandable and usable for "operationalizing" needed change for any number of stakeholders.
● Assists with the change initiative that results from a successful stakeholder-driven process that is affordable, timely and effective.
● Provides evaluation services to ensure adherence to program goals.

The Concept System®, commercially available proprietary technology, provides a framework for high-quality involvement, enabling issues identification, focus group results, conceptual framework creation and ratings survey results to be accomplished and integrated in an extremely time-effective manner. The deliverables include common frameworks of agreement on the issues at hand; a strategy for implementing change that will have a high degree of likely success; action planning results; and evaluation plans.

Contact us today to find out more!

Copyright © 2005 Concept Systems Incorporated
Trochim Publications
PUBLICATIONS

Trochim, W. and Cabrera, D. (2005). The complexity of concept mapping. Emergence: Complexity
and Organization, 7, 1, 11-22.

Trochim, W. and Kane, M (2005). Concept mapping: An introduction to structured conceptualization


in health care. International Journal for Quality in Health Care, 17, 3, June 2005, 187-191.

Dunifon, R., Duttweiler, M., Pillemer, K., Tobias, D., and Trochim, W. (2004). Evidence-Based Extension,
Journal of Extension, April 2004, Volume 42, Number 2. http://www.joe.org/joe/2004april/a2.shtml.

Baldwin, C.M., Kroesen, K., Trochim, W.M. and Bell, I.R. (2004). Complementary and conventional
medicine: A concept map. BMC Complementary and Alternative Medicine, 4:2, http://www.
biomedcentral.com/1472-6882/4/2.

Trochim, W., Milstein, B. Wood, B., Jackson, S. and Pressler, V. (2004). Setting Objectives for
Community and Systems Change: An Application of Concept Mapping for Planning a Statewide
Health Improvement Initiative, Health Promotion Practice, 5, 1, 8-19.

Stokols, D. Fuqua, J. Gress, J., Harvey, R., Phillips, K. Baezconde-Garbanati, L., Unger, J., Palmer, P.,
Clark, M., Colby, S., Morgan, G. and Trochim, W. Evaluating Transdisciplinary Science. In Abrams D.B.,
& Leslie, F. (Eds.), in press, December 2003, Nicotine and Tobacco Research, 5 Suppl 1:S21-39.

Cappelleri J.C. and Trochim W. (2003). Cutoff Designs. In: Chow, Shein-Chung (Ed.) Encyclopedia of
Biopharmaceutical Statistics, Second Edition, 263-269. New York, NY: Marcel Dekker.

Trochim, W. Stillman, F., Clark, P., and Schmitt, C. (2003). Development of a Model of the Tobacco
Industry’s Interference with Tobacco Control Programs. Tobacco Control, 12, 140-147.

Jackson, K. and Trochim, W. (2002). Concept mapping as an alternative approach for the analysis of
open-ended survey responses. Organizational Research Methods, Vol. 5 No. 4, October, 307-336.

Cappelleri J.C. and Trochim W. (2000). Cutoff Designs. In: Chow, Shein-Chung (Ed.) Encyclopedia of
Biopharmaceutical Statistics, 149-156. New York, NY: Marcel Dekker.

Nanda, S.K., Rivas, A.L., Trochim, W.M. and Deshler, J.D. (2000). Emphasis on validation in research: A
Meta-analysis. Scientometrics, 48, 1, 45-64

Trochim, W. and Campbell, D.T. (1999). Designs for Community-Based Demonstration Projects. In
Campbell, D.T. & Russo, M.J. (Eds.). Social Experimentation, Sage Publications, Thousand Oaks, CA.,
309-337.

Trochim, W. (1999) The Research Methods Knowledge Base, 2nd Edition. Cornell Custom Publishing,
Cornell University, Ithaca, New York. Also published as a World Wide Web text at URL http://www.
socialresearchmethods.net/kb.

Trochim, W. (1998) Donald T. Campbell and Research Design. American Journal of Evaluation, 19, 3,
407-409.

Trochim, W. (1998) An "Evaluation" of Michael Scriven's "Minimalist Theory: The Least Theory that
Practice Requires". American Journal of Evaluation, 19, 2, 243-249.

McLinden, D. J. & Trochim, W.M.K. (1998). From Puzzles to Problems: Assessing the Impact of
Education in a Business Context with Concept Mapping and Pattern Matching. In J. Phillips (Ed.),
Evaluation Systems and Processes. Alexandria, VA: American Society for Training and Development.

McLinden, D. J. & Trochim, W.M.K. (1998). Getting to parallel: Assessing the return on expectations of
training. Performance Improvement, 37, 8, 21-26.

Witkin, B. and Trochim, W. A (1997) Toward a synthesis of listening constructs: a concept map analysis.
International Journal of Listening, 11, 69-87.

Rivas, A.L., Wilson, D.J. Gonzalez, R.N., Mohammed, H.O., Quimby, F.W., Lein, D.H., Milligan, R.A.,
Colle, R.D., Deshler, J.D., and Trochim, W. (1997). An interdisciplinary and systems-based evaluation of
academic programs: Bovine mastitis-related veterinary research, education and outreach.
Scientometrics, 40, 2, 195-213.

Trochim, W. (1996). Criteria for evaluating graduate programs in evaluation. Evaluation News and
Comment: The Magazine of the Australasian Evaluation Society, 5, 2, 54-57.

Cappelleri, J.C. and Trochim, W. (1995). Ethical and scientific features of cutoff-based designs of
clinical trials: A simulation study. Medical Decision Making, 15, 4, 387-394.

Shern, D.L., Trochim, W. and LaComb, C.A. (1995). The use of concept mapping for assessing fidelity
of model transfer: An example from psychiatric rehabilitation. Evaluation and Program Planning, 18,
2.

Reichardt, C.S., Trochim, W. and Cappelleri, J. (1995). Reports of the death of the regression-
discontinuity design are greatly exaggerated. Evaluation Review, 19, 1, 39-63.

Cappelleri, J. and Trochim, W. (1994). An illustrative statistical analysis of cutoff-based randomized


clinical trials. Journal of Clinical Epidemiology, 47, 261-270.

Cappelleri, J., Darlington, R.B. and Trochim, W. (1994). Power analysis of cutoff-based randomized
clinical trials. Evaluation Review, 18, 141-152.

Trochim, W., Cook, J. and Setze, R. (1994). Using concept mapping to develop a conceptual
framework of staff's views of a supported employment program for persons with severe mental illness.
Consulting and Clinical Psychology, 62, 4, 766-775.

Trochim, W. (1994). The regression-discontinuity design: An introduction. Research Methods Paper


Series, Number 1, Thresholds National Research and Training Center on Rehabilitation and Mental
Illness, Chicago, IL.

Trochim, W., Dumont, J. and Campbell, J. (1993). Mapping mental health outcomes from the
perspective of consumers/survivors. NASMHPD Technical Reports Series, National Association of
Mental Health Program Directors, Alexandria VA.

Trochim, W. and Cappelleri, J.C. (1992). Cutoff assignment strategies for enhancing randomized
clinical trials. Controlled Clinical Trials, 13, 190-212.

Trochim, W. and Cook, J. (1992). Pattern matching in theory-driven evaluation: A field example from
psychiatric rehabilitation. in H. Chen and P.H. Rossi (Eds.) Using Theory to Improve Program and Policy
Evaluations, Greenwood Press, New York, 49-69.

Cappelleri, J.C., Trochim, W., Stanley, T.D., and Reichardt, C.S. (1991). Random measurement error
doesn't bias the treatment effect estimate in the regression-discontinuity design: I. The case of no
interaction. Evaluation Review, 15, 4, 395-419.

Trochim, W., Cappelleri, J.C. and Reichardt, C.S. (1991). Random measurement error doesn't bias the
treatment effect estimate in the regression-discontinuity design: II. When an interaction effect is
present. Evaluation Review, 15,5, 571-604.

Trochim, W. (1990). Regression-discontinuity design in health evaluation. In L. Sechrest, E. Perrin and J.


Bunker (Eds.). Research Methodology: Strengthening Causal Interpretations of Nonexperimental
Data. U.S. Dept. of HHS, Agency for Health Care Policy and Research, Washington, D.C.

Trochim, W. (1989). Outcome pattern matching and program theory. Evaluation and Program
Planning, 12, 4, 355-366.

Trochim, W. (Ed.) (1989). A Special Issue of Evaluation and Program Planning on Concept Mapping
for Planning and Evaluation, 12.

Trochim, W. (1989). An introduction to concept mapping for planning and evaluation. In W. Trochim
(Ed.) A Special Issue of Evaluation and Program Planning, 12, 1-16.

Trochim, W. (1989). Concept mapping: Soft science or hard art? In W. Trochim (Ed.) A Special Issue of
Evaluation and Program Planning, 12, 87-110.

Gurowitz, W.D., Trochim, W. and Kramer, H. (1988). A process for planning. The Journal of the National
Association of Student Personnel Administrators, 25, 4, 226-235.

Trochim, W. (1987) Pattern matching and program theory. In P.H. Rossi and H. Chen (Eds.), Special
Issue on Theory-Driven Evaluation. Evaluation and Program Planning.

Trochim, W. (1986)(Ed.) Advances in quasi-experimental design and analysis. New Directions for
Program Evaluation Series, Number 31, San Francisco, CA: Jossey-Bass.

Trochim, W. and Davis, J. (1986). Computer simulation for program evaluation. Evaluation Review, 10,
5, 609-634.

Trochim, W. and Linton, R. (1986). Conceptualization for evaluation and planning. Evaluation and
Program Planning, 9, 289-308.

Trochim, W. and Davis, J. (1986). Computer simulation of human service program evaluations.
Computers in Human Services, 1, 4, 17-38.

Trochim, W. and Visco, R. (1986). Assuring quality in educational evaluation. Educational Evaluation
and Policy Analysis, 8, 3, 267-276.

Trochim, W. (1985). Pattern matching, validity and conceptualization in program evaluation.


Evaluation Review, 9, 5, 575-604.

Trochim, W. and Visco, R. (1985). Quality control in evaluation. In D.J. Cordray (Ed.), Designing Impact
Evaluations Capitalizing on Prior Research. New Directions for Program Evaluation Series, Number 27,
San Francisco, CA: Jossey-Bass.

Trochim, W. (1984). Research Design for Program Evaluation: The Regression-Discontinuity Approach.
Beverly Hills, CA: Sage Publications. Also, please note: Missing pages 68-87.

Trochim, W. (1983). Methodologically-based discrepancies in compensatory education evaluations.


Evaluation Studies Review Annual, Volume 8, Beverly Hills, CA: Sage Publications.

Trochim, W. (1982). Methodologically-based discrepancies in compensatory education evaluations.


Evaluation Review, 6, 4, 443-480.

Trochim, W. and Land, D. (1982). Designing designs for research. The Researcher, 1, 1, 1-6.
Trochim, W. (1981). Resources for locating public and private data. In R.F. Boruch, P.M. Wortman and
D.S. Cordray (Eds.). Reanalyzing Program Evaluations. San Francisco, CA: Jossey-Bass.

Trochim, W. and Spiegelman, C. (1980). The relative assignment variable approach to selection bias
in pretest-posttest group designs. Proceedings of the Social Statistical Section, American Statistical
Association.

Cook, T.D., DelRosario, M., Hennigan, K., Mark, M. and Trochim, W. (1978). (Eds.). Evaluation Studies
Review Annual, Beverly Hills, CA: Sage Publications.

Trochim, W. (1976). The three-dimensional graphic method for quantifying body position. Behavioral
Research Methods and Instrumentation, 8, 1-4.

INVITED ADDRESSES

The Evaluator as Cartographer: Technology for Mapping Where We're Going and Where We've
Been. Keynote presentation to the 1999 Conference of the Oregon Program Evaluators Network,
"Evaluation and Technology: Tools for the 21st Cenury", Portland, Oregon, October 8, 1999. Now also
available in Spanish.

Trochim, W. An "Evaluation" of Michael Scriven's "Minimalist Theory: The Least Theory that Practice
Requires". . Invited paper presented at the Annual Conference of the American Evaluation
Association, San Diego, California, November, 1997.

Trochim, W. Donald T. Campbell and Research Design. Invited paper presented at the Annual
Conference of the American Evaluation Association, Atlanta, Georgia, November, 1996.

Trochim, W. and Trochim, M. Concept mapping and pattern matching for planning and evaluating
business training programs. Presented at the Annual Benchmarking Forum of the American Society
for Training and Development, Massachusetts, November, 1995.

Trochim, W. Concept mapping in mental health services research. Invited presentation at the Annual
Conference of the American Psychological Association, Toronto, August, 1993.

Trochim, W. Concept mapping in mental health services research. Invited presentation at the
National Institute of Mental Health and American Public Health Association Continuing Education
Institute, November 7-8, Washington DC, 1992.

Trochim, W. Advances in Quasi-Experimental Design and Analysis for Mental Health. Pre-Conference
Workshop, The Third Annual National Conference on State Mental Health Agency Services Research,
The National Association of State Mental Health Program Directors, October 21, 1992.

Trochim, W. An Introduction to Concept Mapping. Presentation at the Annual Conference of the


Transition Research Institute, Washington DC, 1992.

Trochim, W. An Introduction to Computerized Group Concept Mapping. Presentation to the U.S. G.A.
O. Annual Technical Conference, Washington DC, 1991.

Trochim, W. Developing an Evaluation Culture in International Agriculture Research. Invited address


presented at the Cornell Institute on International Food, Agriculture and Development's (CIIFAD)
workshop on the Assessment of International Agricultural Research Impact for Sustainable
development, Cornell University, Ithaca NY, June 16-19, 1991.

Trochim, W. The Regression-Discontinuity Design in Health Evaluation. Invited presentation at the


conference on Strengthening Causal Interpretations of Non-Experimental Data sponsored by the
National Center for Health Services Research and Technology Assessment, Tucson, April 8-10, 1987.

PAPER AND PANEL PRESENTATIONS

Trochim, William. Measuring Organizational Performance as a Result of Installing a New Information


System: Using Concept Mapping as the Basis for Performance Measurement. Paper presented at the
Annual Conference of the American Evaluation Association, Orlando, Florida, November, 1999.

Tobias, Donald; Trochim, William. Development of the American Distance Education Consortium
(A*DEC): Using Concept Mapping to Plan and Evaluate Organizational Expansion. Paper presented
at the Annual Conference of the American Evaluation Association, Orlando, Florida, November,
1999.

Kane, Mary; Trochim, William; Tobias, Donald. Multi-Level Collaborations to Create Meaningful
Employment for Disabled Adults: Using Concept Mapping to Plan, Communicate, Design and
Evaluate. Paper presented at the Annual Conference of the American Evaluation Association,
Orlando, Florida, November, 1999.

Trochim, W. Developing a performance measurement system at CITGO to facilitate an SAP


implementation. Presented at the 8th Annual Conference and Expo on Assessment, Measurement &
Evaluation for Accelerated Human Development, Arlington, VA, May 12, 1999.

Warzynski, C. and Trochim, W. A systems approach to socio-technical integration and knowledge


management. A working paper presented at BPR Europe '98: Practical Process and Knowledge
Management. London, U.K., September 15, 1998.

Trochim, W. Mapping Organizational Improvement: From Strategy to Evaluation. Paper Presented at


the 1998 International Conference of the International Society for Performance Improvement,
Chicago, Illinois, March 25, 1998.

Trochim, W. Evaluating the Web-based classroom: From practice to theory. Panel presented at the
Annual Conference of the American Evaluation Association, San Diego, California, November, 1997.
Trochim, W. An Internet-Based Concept Mapping of Accreditation Standards for Evaluation. Paper
presented at the Annual Conference of the American Evaluation Association, Atlanta, Georgia,
November, 1996.

Trochim, W. Evaluating Websites. Paper presented at the Annual Conference of the American
Evaluation Association, Atlanta, Georgia, November, 1996.

Trochim, W. View from the Trenches: Implementing Concept Mapping and Pattern Matching
Technology in a Business Environment. Paper presented at the Annual Conference of the American
Evaluation Association, Vancouver, Canada, November, 1995.

Trochim, W. A Concept Mapping and Pattern Matching Approach to Assessing Business Training.
Paper presented at the Annual Conference of the American Evaluation Association, Vancouver,
Canada, November, 1995.

Trochim, W. and Mclinden, D. On Relabeling "Concept Mapping and Pattern Matching" as the CPM
Model. Paper presented at the Annual Conference of the American Evaluation Association,
Vancouver, Canada, November, 1995.

Silvey, L., McLinden, D. and Trochim, W. A Concept Mapping and Pattern Matching Approach to
Evaluating Business Training. Paper presented at the Performance Measurements for Training
Conference, International Quality and Productivity Center, Atlanta, Georgia, January 24, 1994.

Trochim, W. and McLinden, D. A Concept Mapping and Pattern Matching Approach to Evaluating
Business Training. Paper presented at the Annual Conference of the American Evaluation
Association, Boston, Massachusetts, November, 1994.

Trochim, W. Reliability of Concept Mapping. Paper presented at the Annual Conference of the
American Evaluation Association, Dallas, Texas, November, 1993.

Trochim, W. Concept Mapping in Mental Health Services Research. Panel discussion organized for
the National Conference on Mental Health Statistics, Washington, D.C., June, 1993.

Trochim, W. Concept Mapping in Mental Health. Panel discussion organized for the Annual
Conference of the American Evaluation Association, Seattle, Washington, November, 1992.

Trochim, W. Practical Issues in Concept Mapping. Panel discussion organized for the Annual
Conference of the American Evaluation Association, Seattle, Washington, November, 1992.

Trochim, W. Statistical Analyses for Identifying Point, Planar and Circular Patterns in Two-Dimensional
Map Configurations. Paper presented at the Annual Conference of the American Evaluation
Association, Seattle, Washington, November, 1992.

Trochim, W. Using structured concept mapping to synthesize results from mixed qualitative and
quantitative methods. Paper presented at the Annual Conference of the American Evaluation
Association, Chicago, IL, October, 1991.
Trochim, W., Freeman, R.A. and Siegel, M.L. On the Use of Regression-Discontinuity in the Evaluation of
Pharmaceuticals. Paper presented at the 7th Annual Meeting of the International Society for
Technology Assessment in Health Care, Helsinki, Finland, July, 1991.

Tasch, R.F., Trochim, W., and Freeman, R. The Use of Clinical Trials in Economic Assessments of
Pharmaceuticals. Paper presented at the 7th Annual Meeting of the International Society for
Technology Assessment in Health Care, Helsinki, Finland, July, 1991.

Trochim, W and Cappelleri J. Cutoff Assignment Strategies for Enhancing Randomized Clinical Trials
(RCTs). Paper presented at the Annual Conference of the American Evaluation Association, Wash.,
DC, October, 1990.

Trochim, W and Cappelleri J. Why Stanley and Robinson are wrong again about the regression-
discontinuity design. Paper presented at the Annual Conference of the American Evaluation
Association, Wash., DC, October, 1990.

Cappelleri, J and Trochim, W. Random measurement error in regression-discontinuity designs. Paper


presented at the Annual Conference of the American Evaluation Association, Wash., DC, October,
1990.

Trochim, W. The effect of Alprazolam on panic: Patterns across symptoms. Paper presented at the
annual conference of the American Evaluation Association, New Orleans, October, 1988.

Trochim, W. Conceptualization for planning and evaluation. Paper presented at the annual
conference of the American Evaluation Association, Kansas City, October, 1986.

Trochim, W. and Dumont, J. Combining multidimensional scaling, cluster analysis and cognitive
mapping for conceptualizing evaluations. Paper presented at the annual conference of the
Evaluation Research Society, Toronto, 1985.

Trochim, W. and Davis, J. Hierarchical structures in the development of statistical simulations. Paper
presented at the annual conference of the Evaluation Research Society, Toronto, 1985.

Trochim, W. Advances in quasi-experimental design and analysis. Paper presented at the annual
conference of the Evaluation Research Society, Toronto, 1985.

Trochim, W. Evaluator roles: The 'theorist' versus the 'methodologist'. Paper presented at the annual
conference of the Evaluation Research Society, Toronto, 1985.

Trochim, W. and Linton, R. Conceptualization for evaluation and planning. Paper presented at the
annual conference of the Evaluation Research Society, Chicago, 1984.

Trochim, W. and Linton, R. Evaluation and planning in a university health service organization: A
structured conceptualization. Paper presented at the annual conference of the Northeastern
Educational Research Association, October, 1984.
Trochim, W. and Linton, R. Framing the evaluation question: Some useful strategies. Paper presented
at the annual conference of the Evaluation Research Society, San Francisco, 1983.

Trochim, W. and Visco, R. Quality control in educational evaluation. Paper presented at the annual
conference of the American Educational Research Association, April, 1983.

Trochim, W. The statistical analysis of data from the regression-discontinuity design. Versions of this
paper were presented at: the annual conference of the American Sociological Association, San
Francisco, 1982; the annual conference of the American Educational Research Association, New
York City, 1982; and, the annual conference of the Northeastern Educational Research Association,
November, 1981.

Trochim, W. Research implementation. Paper presented at the annual conference of the Evaluation
Research Society, Austin, Texas, 1981.

UNPUBLISHED PAPERS, MONOGRAPHS & REPORTS

Trochim, W. and Campbell, D.T. (1996). The regression point displacement design for evaluating
community-based pilot programs and demonstration projects. Unpublished manuscript. Cornell
University, Ithaca, NY.

Trochim, W. Reliability of Concept Mapping. Paper presented at the Annual Conference of the
American Evaluation Association, Dallas, Texas, November, 1993.

Trochim, W. (1993). Workforce Competencies for Psychosocial Rehabilitation Workers: A Concept


Mapping Project. Final report for the conference of the International Association of Psychosocial
Rehabilitation Services, Albuquerque, New Mexico, November 11-12, 1993.
Concept Mapping
PLEASE NOTE: This page is for educational use only. Articles provided here are not to be reproduced or
distributed without permission of the copyright holder.

Published Literature

● Anderson, L. A., Gwaltney, M. K., Sundra, D. L., Brownson, R. C., Kane, M., Cross, A. W., et al.
(2006). Using concept mapping to develop a logic model for the prevention research centers
program. Preventing Chronic Disease: Public Health Research, Practice and Policy, 3(1), 1-9.

● Baldwin, C.M., Kroesen, K., Trochim, W.M. and Bell, I.R. (2004). Complementary and
conventional medicine: A concept map. BMC Complementary and Alternative Medicine, 4:2,
http://www.biomedcentral.com/1472-6882/4/2.

● Batterham, R. W., Southern, D. M., Appleby, N. J., Elsworth, G., Fabris, S., Dunt, D., et al. (2002).
Construction of a GP integration model. Social Science & Medicine, 54, 1225-1241.

● Biegel, D. E., Johnsen, J. A., & Shafran, R. (1997). Overcoming barriers faced by African-
American families with a family member with mental illness. Family Relations, 46(2), 163-178.

● Brown, J., & Calder, P. (1999). Concept-mapping the challenges faced by foster parents.
Children And Youth Services Review, 21(6), 481-495.

● Caracelli, V. (1989). Structured conceptualization: A framework for interpreting evaluation


results. Trochim, William (ed.), Evaluation and Program Planning, 12 (1) p. 45-52, a special issue
on Concept Mapping for Evaluation and Planning.

● Carpenter, B. D., Van Haitsma, K., Ruckdeschel, K., & Lawton, M. P. (2000). The psychosocial
preferences of older adults: A pilot examination of content and structure. Gerontologist, 40(3),
335-348.

● Cooksy, L. (1989). In the eye of the beholder: Relational and hierarchical structures in
conceptualization. Evaluation and Program Planning, 12 (1) p. 59-66.

● Cousins, J. B., & MacDonald, C. J. (1998). Conceptualizing the successful product
development project as a basis for evaluating management training in technology-based
companies: A participatory concept mapping application. Evaluation And Program Planning,
21(3), 333-344.

● Daughtry, D., & Kunkel, M. A. (1993). Experience of depression in college students: A concept
map. Journal Of Counseling Psychology, 40(3), 316-323.

● Davis, J. (1989). Construct validity in measurement: A pattern matching approach. Trochim,
William, Evaluation and Program Planning, 12 (1) p. 31-36, a special issue on Concept
Mapping for Evaluation and Planning.

● DeRidder, D., Depla, M., Severens, P., & Malsch, M. (1997). Beliefs on coping with illness: A
consumer's perspective. Social Science & Medicine, 44(5), 553-559.

● Donnelly, J. P., Huff, S. M., Lindsey, M. L., McMahon, K. A., & Schumacher, J. D. (2005). The
needs of children with life-limiting conditions: A healthcare-provider-based model. American
Journal of Hospice & Palliative Care, 22(4), 259-267.

● Donnelly, J. P., Donnelly, K., & Grohman, K. J. (2005). A multi-perspective concept mapping
study of problems associated with traumatic brain injury. Brain Injury, 19(13), 1077-1085.

● Donnelly, K. Z., Donnelly, J. P., & Grohman, K. J. (2000). Cognitive, emotional, and behavioral
problems associated with traumatic brain injury: A concept map of patient, family, and
provider perspectives. Brain And Cognition, 44(1), 21-25.

● Dumont, J. (1989). Validity of multidimensional scaling in the context of structured
conceptualization. Evaluation and Program Planning, 12 (1) p.81-86.

● Galvin, P. F. (1989). Concept mapping for planning and evaluation of a big brother big sister
program. Evaluation And Program Planning, 12(1), 53-57.

● Gurowitz, W. D., Trochim, W., & Kramer, H. (1988). A process for planning. The Journal of the
National Association of Student Personnel Administrators, 25 (4) p.226-235.

● Hurt, L. E., Wiener, R. L., Russell, B. L., & Mannen, R. K. (1999). Gender differences in evaluating
social-sexual conduct in the workplace. Behavioral Sciences & The Law, 17(4), 413-433.

● Jackson, K. M., & Trochim, W. M. K. (2002). Concept mapping as an alternative approach for
the analysis of open-ended survey responses. Organizational Research Methods, 5(4), 307-336.

● Johnsen, J. A., Biegel, D. E., & Shafran, R. (2000). Concept mapping in mental health: Uses and
adaptations. Evaluation And Program Planning, 23(1), 67-75.

● Keith, D. (1989). Refining concept maps: Methodological issues and an example. Evaluation
and Program Planning, 12 (1) p.75-80.

● Linton, R. (1989). Toward a feminist research method. In Jagger, A.M. and Bordo, S.R. (Eds.).
Gender/Body/Knowledge: Feminist Reconstructions of Being and Knowing. Rutgers University
Press, New Brunswick, NJ, 1989.
● Linton, R. (1989). Conceptualizing feminism: Clarifying social science concepts. Trochim,
William (ed.), Evaluation and Program Planning, 12 (1) p.25-30, a special issue on Concept
Mapping for Evaluation and Planning.
● Mannes, M. (1989). Using concept mapping for planning the implementation of a social
technology. Evaluation and Program Planning, 12 (1) p.67-74.

● Marquart, J. M. (1989). A pattern matching approach to assess the construct validity of an
evaluation instrument. Trochim, William (ed.), Evaluation and Program Planning, 12 (1) p.37-44,
a special issue on Concept Mapping for Evaluation and Planning.

● McLinden, D., & Trochim, W. (1998). Getting to parallel: Assessing the return on expectations of
training. Performance Improvement, 37(8), 21-25.

● McLinden, D., & Trochim, W. (1998). From puzzles to problems: Assessing the impact of education
in a business context with concept mapping and pattern matching. Implementing Evaluation
Systems and Processes, The American Society for Training and Development, Vol. 18, pp285-
304.

● Mercier, C., Piat, M., Peladeau, N., & Dagenais, C. (2000). An application of theory-driven
evaluation to a drop-in youth center. Evaluation Review, 24(1), 73-91.

● Michalski, G. V., & Cousins, J. B. (2000). Differences in stakeholder perceptions about training
evaluation: A concept mapping/pattern matching investigation. Evaluation And Program
Planning, 23(2), 211-230.

● Pammer, W., Haney, M., Wood, B. M., Brooks, R. G., Morse, K., Hicks, P., et al. (2001). Use of
telehealth to extend child protection team services. Pediatrics, 108(3), 584-590.

● Paulson, B. L., Truscott, D., & Stuart, J. (1999). Clients' perceptions of helpful experiences in
counseling. Journal Of Counseling Psychology, 46(3), 317-324.

● Rao, J. K., Alongi, J., Anderson, L. A., Jenkins, L., Stokes, G., & Kane, M. (2005). Development of
public health priorities for end-of-life initiatives. Am J Prev Med, 29(5), 453-460.

● Rosas, S.R. (2005). Concept mapping as a technique for program theory development: An
illustration using family support programs. American Journal of Evaluation, 26, 3, 389-401.

● Shern, D., Trochim, W., & LaComb, C. (1995). The use of concept mapping for assessing fidelity
of model transfer: An example from psychiatric rehabilitation. Evaluation and Program
Planning, 18 (2) p. 143-153.

● Shavers, V. L., Fagan, P., Lawrence, D., McCaskill-Stevens, W., McDonald, P., Browne, D.,
McLinden, D., Christian, M., & Trimble, E. Barriers to racial/ethnic minority application and
competition for NIH research funding. Journal of the National Medical Association, 1063-1077.

● Southern, D. M., Young, D., Dunt, D., Appleby, N. J., & Batterham, R. W. (2002). Integration of
primary health care services: Perceptions of Australian general practitioners, non-general
practitioner health service providers and consumers at the general practice-primary care
interface. Evaluation And Program Planning, 25(1), 47-59.

● Stokols, D., Fuqua, J., Gress, J., Harvey, R., Phillips, K., Baezconde-Garbanati, L., et al. (2003).
Evaluating transdisciplinary science. Nicotine and Tobacco Research, 5 Suppl 1, S21-39.

● Trochim, W. (1985). Pattern matching, validity, and conceptualization in program evaluation.
Evaluation Review, 9 (5) p.575-604. October.

● Trochim, W. (1989). Concept mapping: Soft science or hard art? Evaluation and Program
Planning, 12 (1) p.87-110.

● Trochim, W. (1989). An introduction to concept mapping for planning and evaluation.
Evaluation and Program Planning, 12(1), 1-16.

● Trochim, W. (1989). Outcome pattern matching and program theory. Evaluation and Program
Planning, 12 (1) p.355-366.

● Trochim, W. (1989). Special issue: Concept mapping for evaluation and planning. William
Trochim, Guest Editor, Evaluation and Program Planning. Vol. 12 (1).

● Trochim, W. (1996). Criteria for evaluating graduate programs in evaluation. Evaluation News
and Comment: The Magazine of the Australasian Evaluation Society, 5(2), 54-57.

● Trochim, W. and Cabrera, D. (2005). The complexity of concept mapping. Emergence:
Complexity and Organization, 7, 1, 11-22.

● Trochim, W. and Kane, M (2005). Concept mapping: An introduction to structured
conceptualization in health care. International Journal for Quality in Health Care, 17, 3, June
2005, 187-191.

● Trochim, W., Cook, J., & Setze, R. (1994). Using concept mapping to develop a conceptual
framework of staff’s views of a supported employment program for persons with severe
mental illness. Journal of Consulting and Clinical Psychology, 62 (4) p.766-775.

● Trochim, W., & Cook, J. (1992). Pattern matching in theory-driven evaluation: A field example
from psychiatric rehabilitation. Chapter 3, Research Strategies and Methods, pp. 49-69, New
York: Greenwood Press.

● Trochim, W., & Linton, R. (1986). Conceptualization for planning and evaluation. Evaluation
and Program Planning, 289-308.

● Trochim, W., Milstein, B., Wood, B., Jackson, S., & Pressler, V. (2004). Setting objectives for
community and systems change: An application of concept mapping for planning a
statewide health improvement initiative. Health Promotion Practice, 5(1), 8-19.
● Trochim, W., Stillman, F., Clark, P., & Schmitt, C. (2003). Development of a model of the
tobacco industry's interference with tobacco control programs. Tobacco Control, 12, 140-147.

● Valentine, K. (1989). Contributions to the theory of care. Trochim, William (ed.), Evaluation and
Program Planning, 12 (1) p.17-24, a special issue on Concept Mapping for Evaluation and
Planning.

● van Nieuwenhuizen, C., Schene, A. H., Koeter, M. W. J., & Huxley, P. J. (2001). The Lancashire
quality of life profile: Modification and psychometric evaluation. Social Psychiatry And
Psychiatric Epidemiology, 36(1), 36-44.

● VanderWaal, M. A. E., Casparie, A. F., & Lako, C. J. (1996). Quality of care: A comparison of
preferences between medical specialists and patients with chronic diseases. Social Science &
Medicine, 42(5), 643-649.

● White, K. S., & Farrell, A. D. (2001). Structure of anxiety symptoms in urban children: Competing
factor models of the revised children's manifest anxiety scale. Journal Of Consulting And
Clinical Psychology, 69(2), 333-337.

● Witkin, B., & Trochim, W. (1997). Toward a synthesis of listening constructs: A concept map
analysis of the construct of listening. International Journal of Listening, 11, 69-87.

Selected Unpublished Papers

● The Evaluator as Cartographer: Technology for Mapping Where We're Going and Where
We've Been. Keynote presentation to the 1999 Conference of the Oregon Program Evaluators
Network, "Evaluation and Technology: Tools for the 21st Cenury", Portland, Oregon, October 8,
1999. Now also available in Spanish.

● Trochim, W. Developing an Evaluation Culture in International Agriculture Research. Invited
address presented at the Cornell Institute on International Food, Agriculture and
Development's (CIIFAD) workshop on the Assessment of International Agricultural Research
Impact for Sustainable Development, Cornell University, Ithaca NY, June 16-19, 1991.

● Trochim, W. An Internet-Based Concept Mapping of Accreditation Standards for Evaluation.
Paper presented at the Annual Conference of the American Evaluation Association, Atlanta,
Georgia, November, 1996.

● Trochim, W. Reliability of Concept Mapping. Paper presented at the Annual Conference of
the American Evaluation Association, Dallas, Texas, November, 1993.

● Trochim, W. (1993). Workforce Competencies for Psychosocial Rehabilitation Workers: A
Concept Mapping Project. Final report for the conference of the International Association of
Psychosocial Rehabilitation Services, Albuquerque, New Mexico, November 11-12, 1993.

Theses and Dissertations

● Abstracts of Theses and Dissertations that used concept mapping


Regression-Discontinuity Design
Publications

In Reverse Chronological Order

Cappelleri, J.C. and Trochim, W. (2000). Cutoff Designs. In: Chow, Shein-Chung (Ed.) Encyclopedia of
Biopharmaceutical Statistics, 149-156. New York, NY: Marcel Dekker.

Cappelleri, J.C. and Trochim, W. (1995). Ethical and scientific features of cutoff-based designs of clinical trials:
A simulation study. Medical Decision Making, 15, 4, 387-394.

Reichardt, C.S., Trochim, W. and Cappelleri, J. (1995). Reports of the death of the regression-discontinuity
design are greatly exaggerated. Evaluation Review, 19, 1, 39-63.

Cappelleri, J. and Trochim, W. (1994). An illustrative statistical analysis of cutoff-based randomized clinical
trials. Journal of Clinical Epidemiology, 47, 261-270.

Cappelleri, J., Darlington, R.B. and Trochim, W. (1994). Power analysis of cutoff-based randomized clinical
trials. Evaluation Review, 18, 141-152.

Trochim, W. (1994). The regression-discontinuity design: An introduction. Research Methods Paper Series,
Number 1, Thresholds National Research and Training Center on Rehabilitation and Mental Illness, Chicago, IL.

Trochim, W. and Cappelleri, J.C. (1992). Cutoff assignment strategies for enhancing randomized clinical trials.
Controlled Clinical Trials, 13, 190-212.

Cappelleri, J.C., Trochim, W., Stanley, T.D., and Reichardt, C.S. (1991). Random measurement error doesn't
bias the treatment effect estimate in the regression-discontinuity design: I. The case of no interaction.
Evaluation Review, 15, 4, 395-419.

Trochim, W., Cappelleri, J.C. and Reichardt, C.S. (1991). Random measurement error doesn't bias the
treatment effect estimate in the regression-discontinuity design: II. When an interaction effect is present.
Evaluation Review, 15, 5, 571-604.

Trochim, W. (1990). Regression-discontinuity design in health evaluation. In L. Sechrest, E. Perrin and J. Bunker
(Eds.). Research Methodology: Strengthening Causal Interpretations of Nonexperimental Data. U.S. Dept. of
HHS, Agency for Health Care Policy and Research, Washington, D.C.

Trochim, W. The Regression-Discontinuity Design in Health Evaluation. Invited presentation at the conference
on Strengthening Causal Interpretations of Non-Experimental Data sponsored by the National Center for
Health Services Research and Technology Assessment, Tucson, April 8-10, 1987.
Trochim, W. (1984). Research Design for Program Evaluation: The Regression-Discontinuity Approach. Beverly
Hills, CA: Sage Publications. Also, please note: Missing pages 68-87.

Trochim, W. (1983). Methodologically-based discrepancies in compensatory education evaluations.
Evaluation Studies Review Annual, Volume 8, Beverly Hills, CA: Sage Publications.

Trochim, W. (1982). Methodologically-based discrepancies in compensatory education evaluations.
Evaluation Review, 6, 4, 443-480.

The Evaluator as Cartographer:
Technology for Mapping Where We're Going and Where We've Been

William M.K. Trochim
Cornell University

Paper Presented at the 1999 Conference of the Oregon Program Evaluators Network, "Evaluation and
Technology: Tools for the 21st Century", Portland, Oregon, October 8, 1999.

DRAFT: Not for quotation, citation or dissemination

© 1999, William M.K. Trochim, All Rights Reserved


Introduction
The renowned organizational psychologist Karl Weick used to tell an anecdote that illustrates the
critical importance of maps to a group or organization.[1]

A group of mountain climbers was in the process of ascending one of the most daunting peaks in the
Alps when they were engulfed by a sudden snow squall. All were experienced climbers and each
had their own idea of the direction they should go in to get back to the base camp. They wandered
around for some time, arguing about which way to go, while their circumstances became more dire
and threatening with each moment of indecision. Finally, one of the climbers dug around in their
backpack and found a map. Everyone huddled around the map, studied it, and quickly determined
their direction. Several hours later, they arrived safely at the camp. While they were warming
themselves around the fire, regaling each other with the story of their near misadventure, one of the
climbers picked up the map they had used to descend the Alps. On looking at it more carefully, they
realized it was actually a map of the Pyrenees!

The map provided the group with an externalized organizing device around which
they could reach consensus. It gave them an apparent sense of direction and
enabled them to formulate a coherent plan. It led them to take action in a concerted
manner. But it was the wrong map. Perhaps this group was simply lucky. The way
in which their map was wrong happened to provide one possible "correct" path for
them. But the story makes it clear that the group was going nowhere anyway.
Without some type of device to enable them to coalesce as a group they would
almost certainly have perished in the confusion and conflict. Of course, we would
prefer that our mountain climbers have the "right" map. But, in a storm, even the
wrong map can sometimes lead to better decision making than no map at all.

In an intriguing and stimulating volume several decades ago, Nick Smith (1981)
described a number of interesting metaphors for thinking about evaluation work. He
showed that it was useful to construe what evaluators did as analogous to what other
professions and fields engaged in, sometimes seemingly even very unrelated fields.
For instance, he likened aspects of evaluation to what lawyers do when they engage
in a trial or what art critics do when they review artwork. He did not mean to suggest
that evaluators really were lawyers or art critics, only that some aspects of evaluators'
work shared important characteristics with these others.

This paper seeks to extend that notion to a field he did not consider then:
cartography. It suggests that one way to think about evaluation is to view it as akin to
mapmaking. Like the cartographer, the evaluator gathers information, albeit not geographical in
nature.[2] Like the cartographer, evaluators analyze and represent that
information, making decisions about how best to draw it, minimize biases and depict
perspectives. Like the cartographer, evaluators hope that their representations are
useful for guiding others and helping them make more informed decisions. And, in
the two decades since Smith's work, we have even seen a form of analysis and
representation emerge in evaluation that is called "mapping." These are not
geographic maps, but maps of ideas and maps of data. The evaluator facilitates the
creation of these maps as an extension of and alternative to more traditional tabular,
numerical and textual forms of data representation. And, the evaluator hopes that,
like geographic maps, their concept and data maps will help to inform and guide
others and enable them to make better decisions.

[1] I recall hearing this story but do not have an exact citation. I have done my best to be accurate to the
original details. If anyone has a citation for this, please contact me.
[2] The reader should note that this paper does not address in any way the important and growing use of
geographic information systems in evaluation research. The "mapping" that the metaphor refers to is the
data and concept maps of the type described here.

As with Smith's metaphors, there is no intent here to suggest that evaluation and
cartography are the same thing or that one is in any way a subset of the other. There
are many ways in which the two fields are distinct. But, this paper intends to show
that, especially when it comes to the newer mapping techniques emerging in
evaluation, the evaluator will benefit by considering where the cartographers have
been and where they appear to be going. Evaluation-cartographers will be better
informed and prepared if they learn about the ways cartographers consider issues of
context, purpose, methodology, bias, and representation in their own field. This paper
hopes to make a beginning in this interdisciplinary endeavor.

Maps and Concept Maps


The idea of a "map" goes back into the farthest reaches of human history, predating
even historical records (Wilford, 1982). The impulse to depict visually the distinction
between here and there or to direct one in getting from place to place is a
fundamental human trait. The historical record suggests that "…the map evolved
independently among many peoples in many separate parts of Earth" (Wilford, 1982).
Throughout history maps have played important roles in providing direction,
demarcating property, establishing claims, and demonstrating the powers of states.

When most people think of maps, they think of geographical depictions. The naïve
assumption is that a map is as accurate a representation of some external physical
reality as the cartographer could make it. We will see that this is not the view of most
contemporary cartographers, but it is one that persists in the general public.

We can distinguish between several fundamental types of maps, both to illustrate the evolution of
thinking about map structure and to show how far we have moved from the idea of the geographical
fundamentalism usually associated with cartography. To do so, we will develop a simple
classification system that will probably not outlive this paper, but that might help orient us to some
meaningful distinctions.

Let's begin by making a simple distinction between the "base" of a map and any
additional data that is represented upon this base. Furthermore, let's distinguish
between whether the base is a geographic one or whether it represents some other
relational arrangement. We then have four broad types of maps:

                      No Data            Data
Geographical Base     geographic map     geographic data map
Relational Base       relational map     relational data map

Any geographical map -- a highway map, geological survey map, and most maps
from a standard atlas -- can be considered geographically "isomorphic". That is,
there is a presumed one-to-one relationship between the information depicted on the
map and some external physical reality. This isomorphism can never be exact -- it is
always an errorful signification. For instance, the star symbol on the highway map
shown in Figure 1 indicates where my house is located geographically. The star itself
is only a sign that represents the house -- I don't live in a star-shaped house, and my
house is not nearly as large as depicted on this map -- but the map assumes that
there is a one-to-one correspondence (i.e., isomorphism) between the objects on the
map and something in geographical reality. If you navigate to the place indicated by
the map, you will find my house. The correspondence is "star" = "house" and "line" =
"road" and "blue line" = "water" and so on.

Figure 1. A first-level map with geographic isomorphism.

A second-level map is one that retains a geographical base but depicts data upon it.
We might term this type of map a "geographical data map." The idea is to
represent some characteristic in reference to the geographical base. For instance,
consider the map in Figure 2 that depicts earthquakes in the contiguous 48 states. If
you were to use this to navigate geographically to a location with white on the map
(indicating a high-risk area), you would not be likely to "find" an earthquake in the
same sense that you would "find" my house if you follow the geography implicit on
Figure 1. In most data maps, the "data" is not in itself geographical. It can simply be
represented well with respect to geography.

Figure 2. 1989 Computer-generated map showing earthquake-prone areas. High-risk areas appear as
white peaks. Courtesy of Melvin L. Prueitt, Los Alamos National Laboratory. Data from the U.S.
Geological Survey.

The data can be even more abstract than earthquakes. For instance, we might map
specific crimes onto a geographical base as shown in the map for Evansville, Indiana
in Figure 3. Again, while there is a geographical base to the data depicted on the
map, one would not expect to be able to go to a specific dot on this map and find the
indicated event. The geographic base is used as a frame for the depiction of the
crime data in a visually meaningful manner.

Figure 3. Crime map for Evansville, Indiana, representing crimes for the week ending September 22,
1999.

One more level of abstraction. In Figure 4 we see crime data aggregated by state.
Here, the data is even more abstract and highly aggregated than in Figure 3. We
would not expect that by going to the state of Florida we would "see" in any direct
manner the higher crime reflected in the dark fill for that state. The geographic base
is used as a framework for representing information on another variable or measure.
It is important to recognize that the data being represented on this map represents
considerable judgment. What is meant by a "serious" crime for purposes of the map?
How accurately can we assume such crimes are actually measured? How are the
levels of crime divided into the five categories depicted in the different shades? How
might the map look different if the data was shown by county?

Figure 4. Serious crime by state (darker areas indicate higher serious crime rates).

The type of geographical data map shown in Figure 4 is an invention of the past few
centuries. The idea of using geographical maps as a basis for depicting other non-
geographic information is a distinctly "modern" one. Note that we can move to
increasingly abstract characteristics as the objects of representation. For instance,
instead of measuring crime, we could measure individuals' perceptions of safety. We
could conduct a public opinion survey asking people to rate subjectively the degree to
which they feel safe in their home communities and graph that information onto a
geographic base. In this case, the respondent's perception of safety may bear little or
no resemblance to the pattern of reported crime for the same geographical area.

If we move away from the geographical base, we come to the third type of map which
will be termed here a "relational map." Here, the base is not a geographical one. For
instance, if we asked a person to "draw" a map of different types of crimes, placing
crimes that they think are more similar to each other closer together on the map, we
would have a relational representation. The "base" for this map would consist not of
geographical entities but, in this case, of crimes. These crimes would be located in
proximity to one another based on a person's judgment of their similarity. The implicit
"landscape" for such a map is the person's subjective perceptual relational
arrangement of crimes and crime types.

Any map that uses non-geographic relationship information as the base can be
classified as a relational map. If the intent of the map is to represent accurately some
implicit cognitive reality, we can refer to it as a relationally isomorphic map. Here,
each symbol has a one-to-one relationship to an idea or construct, and the
arrangement of symbols on the map shows how the ideas are thought to be
interrelated.

In the past several decades we have seen a large number of these types of maps
evolve. They range from the "mind maps" recommended by Tony Buzan (Buzan,
1993) and illustrated in Figure 5 to the concept maps developed by Joe Novak
(Novak and Gowin, 1985, Novak, 1993) and exemplified in Figure 6. In both types of
maps, the intent is to represent a person's thinking pictorially. The relational structure
is based on lines that depict relationships between ideas. No attempt is made to
develop a meaningful Euclidean framework on which to depict relationships -- it is the
connecting lines that carry the relational information.

In most cases, these types of maps are constructions of individuals. Where groups
are involved, there is no algorithm for aggregating -- the group must do that
interactively.

Figure 5. A stylized mind map in the style of Tony Buzan (Buzan, 1993) showing the linkages among
researchers' ideas for a research project.

Figure 6. A relationally isomorphic concept map in the style of Novak and Gowin (1985) on the topic of St.
Nicolas.

We come finally to the types of maps described in this paper as concept maps.
Illustrations of these types of maps are provided later. These are also relational
maps. But in these maps the relational base provides the structure for carrying or
representing additional data. In this sense, we can describe these as relational data
maps. These concept maps are usually constructions of groups. They utilize
mathematical algorithms that aggregate individuals' judgments of the similarity among
ideas and represent the ideas in symbols arrayed in Euclidean space. In this sense,
these maps are like geographical maps in that the distance between symbols is
meaningfully interpreted as an empirical estimate of the semantic distance between
ideas.

The point of this classification system, other than providing us with useful terms to
distinguish different types of maps, is to show that there is a continuum between the
traditional geographic maps and the more recently evolved conceptual ones. They
are less distinct than might at first appear to be the case. Given this relationship
between traditional geographical maps and the concept maps that are the subject of
this paper, we might look to the cartographic profession and their current
methodological discussions for insights that help us understand better the issues of
quality, interpretability and validity involved in concept mapping.

Before doing so, the basic procedure for producing a concept map is described, and
a case study example presented in some detail. With this prologue, we can examine
some of the issues of relevance that contemporary cartography is grappling with.

Concept Mapping
Concept mapping is a process that can be used to help a group describe its ideas on
any topic of interest (Trochim, 1989a) and represent these ideas visually in the form
of a map. The process typically requires the participants to brainstorm a large set of
statements relevant to the topic of interest, individually sort these statements into
piles of similar ones and rate each statement on some scale, and interpret the maps
that result from the data analyses. The analyses typically include a two-dimensional
multidimensional scaling (MDS) of the unstructured sort data, a hierarchical cluster
analysis of the MDS coordinates, and the computation of average ratings for each
statement and cluster of statements. The maps that result show the individual
statements in two-dimensional (x,y) space with more similar statements located
nearer each other, and show how the statements are grouped into clusters that
partition the space on the map. Participants are led through a structured
interpretation session designed to help them understand the maps and label them in
a substantively meaningful way.

The concept mapping process discussed here was first described by Trochim and
Linton (1986). Trochim (1989a) delineates the process in detail and Trochim (1989b)
presents a wide range of example projects. Concept mapping has received
considerable use and has been used to address substantive issues in social services
(Galvin, 1989; Mannes, 1989), mental health (Cook, 1992; Kane, 1992; Lassegard,
1993; Marquart, 1988; Marquart, 1992; Marquart et al, 1993; Penney, 1992; Ryan and
Pursley, 1992; Shern, 1992; Trochim, 1989a; Trochim and Cook, 1992; Trochim et al,
in press; Valentine, 1992), health care (Valentine, 1989), education (Grayson, 1993;
Kohler, 1992; Kohler, 1993), educational administration (Gurowitz et al, 1988),
training development (McLinden and Trochim, in press) and theory development
(Linton, 1989, Witkin and Trochim, 1996). Considerable methodological work on the
concept mapping process and its potential utility has also been accomplished (Bragg
and Grayson, 1993; Caracelli, 1989; Cooksy, 1989; Davis, 1989; Dumont, 1989;
Grayson, 1992; Keith, 1989; Lassegard, 1992; Marquart, 1989; Mead and Bowers,
1992; Mercer, 1992; SenGupta, 1993; Trochim, 1985, 1989c, 1990, 1993).

How a Concept Map is Produced
Concept mapping combines a group process (brainstorming, unstructured sorting and
rating of the brainstormed items) with several multivariate statistical analyses
(multidimensional scaling and hierarchical cluster analysis) and concludes with a
group interpretation of the conceptual maps that result.

In the typical situation, concept mapping begins with the formulation of a focus
statement that guides and delimits the scope of the map. A set of statements that
address this focus statement is then produced, usually through some form of
brainstorming.[3] Two types of data are typically collected with respect to these
statements. First, each participant is asked to sort the statements into piles of similar
ones, an unstructured similarity sort. The sort is required -- a concept map cannot be
produced without sorting data. Second, each participant is usually (although this is
not a requirement) asked to rate each statement on one or more variables. Most
typically, each statement is rated for its relative importance, usually on a 1 (Relatively
Unimportant) to 5 (Extremely Important) scale. The rating information is not used to
produce the base of the map itself -- it is only used as an overlay on a map that was
constructed from sort data.

The first step in the analysis involves transforming each participant's sort into
quantitative information. The challenge here is to find a way to "aggregate" or
combine information across participants given that different individuals will have
different numbers of sort groups or piles. The solution is to place each sort result into
the same-sized matrix. Figure 7 illustrates this for the simple example of a single
participant and a ten-statement sort. This person sorted the 10 statements into 5
piles or groups. Other participants may have had more or fewer groups, but all
sorted the same number of statements, in this example, ten. So, we construct a
10x10 matrix or table of numbers. For each individual, the table is a binary one
consisting only of 0s and 1s. If two statements were placed together in a pile, their
corresponding row and column numbers would have a 1. If they weren't placed
together, their joint row-column value would be a 0. Because a statement is always
sorted into the same pile as itself, the diagonal of the matrix always consists of 1s.
The matrix is symmetric because, for example, if statement 5 is sorted with statement
8 then it will always be the case that statement 8 is sorted with 5. Thus, the concept
mapping analysis begins with construction from the sort information of an NxN binary
(where N = the number of statements), symmetric matrix of similarities, Xij. For any
two items i and j, a 1 is placed in Xij if the two items were placed in the same pile by
the participant, otherwise a 0 is entered (Weller and Romney, 1988, p. 22).
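To make this transformation concrete, here is a minimal Python sketch (not part of the original paper
or of the Concept System software; the function name is hypothetical) that builds one participant's
binary similarity matrix from a list of sort piles. The piles used match the ten-statement example in
Figure 7.

import numpy as np

def binary_similarity_matrix(piles, n_statements):
    # Return an NxN binary matrix with X[i, j] = 1 if statements i+1 and j+1
    # were placed in the same pile (symmetric, with 1s on the diagonal).
    X = np.zeros((n_statements, n_statements), dtype=int)
    for pile in piles:
        for i in pile:
            for j in pile:
                X[i - 1, j - 1] = 1   # statements are numbered from 1
    return X

# The five piles from the ten-statement example in Figure 7
piles = [[1, 2, 6, 9], [3, 4], [5, 8], [7], [10]]
X = binary_similarity_matrix(piles, 10)
print(X)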

[3] The concept mapping methodology doesn't know or care how the statements are generated. They could
be abstracted from existing documents, generated by an individual, developed from an interview
transcript, and so on. All that the method requires is that there is a set of statements. Of course, one's
interpretation of the maps would depend critically on how the statements were generated. The current
version of the Concept System software allows for up to 200 statements in a map, although it would be
rare that a participant group would be comfortable dealing with any more than 100 or so.

Sort cards for 10 statements, sorted by one participant into five piles:
{5, 8}   {6, 2, 1, 9}   {3, 4}   {10}   {7}

Binary square similarity matrix for one person:

      1  2  3  4  5  6  7  8  9  10
  1   1  1  0  0  0  1  0  0  1  0
  2   1  1  0  0  0  1  0  0  1  0
  3   0  0  1  1  0  0  0  0  0  0
  4   0  0  1  1  0  0  0  0  0  0
  5   0  0  0  0  1  0  0  1  0  0
  6   1  1  0  0  0  1  0  0  1  0
  7   0  0  0  0  0  0  1  0  0  0
  8   0  0  0  0  1  0  0  1  0  0
  9   1  1  0  0  0  1  0  0  1  0
 10   0  0  0  0  0  0  0  0  0  1

Figure 7. Transforming sort data into a binary square similarity matrix.

With this simple transformation of the sort into matrix form, we now have a common
data structure that is the same size for all participants. This enables us to aggregate
across participants. Figure 8 shows how this might look when aggregating sort
results from five participants who each sorted a ten-statement set. In effect, the
individual matrices are "stacked" on top of each other and added. Thus, any cell in
this matrix could take integer values between 0 and 5 (i.e., the number of people who
sorted the statements); the value indicates the number of people who placed the i,j
pair in the same pile. Thus, in this second stage, the total NxN similarity matrix, Tij
was obtained by summing across the individual Xij matrices.
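Continuing the hypothetical sketch above, the aggregation step is simply a cell-by-cell sum of the
participants' individual binary matrices (binary_similarity_matrix is the helper defined in the previous
sketch, and all_sorts is an assumed list holding each participant's piles):

import numpy as np

def total_similarity_matrix(all_sorts, n_statements):
    # T[i, j] counts how many participants placed statements i+1 and j+1
    # in the same pile; the diagonal equals the number of participants.
    T = np.zeros((n_statements, n_statements), dtype=int)
    for piles in all_sorts:
        T += binary_similarity_matrix(piles, n_statements)
    return T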

[Figure 8 shows the five participants' individual 10x10 binary matrices "stacked" on top of one another
and summed into a single total square similarity matrix; the resulting total matrix is the one shown as
the input in Figure 9.]

Figure 8. Aggregating sort data across participants.

It is this total similarity matrix Tij that is analyzed using nonmetric multidimensional
scaling (MDS) analysis with a two-dimensional solution. The solution is limited to two
dimensions because, as Kruskal and Wish (1978) point out:

Since it is generally easier to work with two-dimensional configurations
than with those involving more dimensions, ease of use considerations
are also important for decisions about dimensionality. For example,
when an MDS configuration is desired primarily as the foundation on
which to display clustering results, then a two-dimensional configuration
is far more useful than one involving three or more dimensions (p. 58).

The analysis yields a two-dimensional (x,y) configuration of the set of statements
based on the criterion that statements piled together by more people are located
closer to each other in two-dimensional space while those piled together less
frequently are further apart. The similarity matrix input and the most basic "point
map" output are shown in Figure 9.

Input: a square matrix of relationships among a set of entities

 5 0 2 5 0 0 2 3 0 0
 0 5 0 0 0 1 0 0 2 0
 2 0 5 3 0 0 0 0 0 0
 5 0 3 5 0 0 0 0 0 0
 0 0 0 0 5 0 0 2 0 0
 0 1 0 0 0 5 0 0 4 0
 2 0 0 0 0 0 5 0 0 0
 3 0 0 0 2 0 0 5 0 0
 0 2 0 0 0 4 0 0 5 0
 0 0 0 0 0 0 0 0 0 5

Output: an n-dimensional mapping of the entities [a two-dimensional point map of the numbered
statements; not reproduced here]

Figure 9. Input and output of the map analysis.

Multidimensional scaling (MDS) is the analytic procedure that accomplishes the basic
mapping shown in Figure 9. How does it do this? There are numerous mathematical
descriptions of the process (Davison, 1983, Kruskal and Wish, 1978) that will not be
repeated here. Instead, we attempt to provide a nonmathematical explanation that
will hopefully provide some insight regarding how MDS accomplishes its work.

One way to accomplish this intuitively is to think about the opposite of what MDS
accomplishes. As described above in Figure 9, MDS takes a square matrix of
(dis)similarities[4] for a set of items/objects as input and produces a map[5] as output.
To see how it does this, think about the much more intuitive task of going in the
opposite direction -- starting with a map and, from it, producing a table of
(dis)similarities. Figure 10 shows a map of the United States with three major cities
indicated. The cities are the "objects", the points on this map that are analogous to
the statements on a concept map. How would you produce, from a two-dimensional
map like this, a table of (dis)similarities? The simplest way would be to use a ruler to
measure the distances (in whatever units) between all possible pairings of the three
points. The figure shows the distances obtained (in inches) and the table of
dissimilarities. This is the opposite of what MDS does, but it is a common task and
easily understood by anyone who has ever worked with a map.
[4] The term (dis)similarity is used in the MDS literature to indicate that the data can consist of either
dissimilarities or similarities. In concept mapping, the data is always the square symmetric similarity
matrix that is generated from the sorting data.
[5] The "map" is the distribution of points that represent the location of objects in N-dimensional space. In
concept mapping, the objects are the brainstormed (or otherwise generated) statements and the map that
MDS produces is the point map in two dimensions.

[Figure 10 shows a map of the United States with New York, Chicago, and Los Angeles marked and the
distances between them measured with a ruler: NY-CH 1.89", CH-LA 4.28", NY-LA 6.12".]

To get a matrix of dissimilarities from a map:

        NY      CH      LA
NY      --      1.89"   6.12"
CH      1.89"   --      4.28"
LA      6.12"   4.28"   --

Figure 10. Doing the "opposite" of MDS -- moving from a map to a table of (dis)similarities.
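This "opposite of MDS" step can also be written down directly: given (x, y) positions on a map,
measure the distance between every pair of points. The coordinates below are hypothetical, placed
only so that the pairwise distances come out near the inch-scaled airline distances used in Figure 11.

import numpy as np

coords = {"LA": (0.00, 0.00), "CH": (3.49, 0.19), "NY": (4.90, 0.00)}
for a in coords:
    for b in coords:
        if a < b:   # each pair once
            d = np.hypot(coords[a][0] - coords[b][0], coords[a][1] - coords[b][1])
            print(f"{a}-{b}: {d:.2f} inches")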

Now, consider how MDS works in the context of this example. MDS would start with
a matrix of distances between the three cities and would produce a map that shows
the three cities as points. Figure 11 shows how this would work. We begin with a
common table, easily obtained from an almanac or atlas. The table shows the
straight airline distances (in miles) between the three cities. The goal of MDS would
be to convert this information into a map. We'll limit the result to two dimensions
because that is the solution we typically use in concept mapping. How might you do
this manually? The airline distances in the table range from 713 to 2451 and those
units are most likely inconvenient for drawing a map on a piece of paper. As a first
step, you might convert the airline distances into a unit of measurement that is more
convenient for graphing. In the figure, we convert to inches using a scale of 1 inch for
every 500 airline miles. Now that we are operating in inches, we can more easily
work on a sheet of paper. The object is to place three points on the piece of paper so
that the distances between the points best represents the distances in inches in the
table. You would start by placing one point. In this case, let's assume you place a
point on the paper to represent Los Angeles. Next, you place the second point. It
doesn't matter whether you choose New York or Chicago, so let's assume you
arbitrarily select New York. Where do you place it? According to the table of
distances, you need to place New York 4.90" from Los Angeles (note that it doesn't
matter in which direction you go, only that New York is exactly that distance from Los
Angeles). The figure shows a ruler that indicates where you would place the NY
point.

Airline distances between cities (miles):

        NY      CH      LA
NY      --      713     2451
CH      713     --      1745
LA      2451    1745    --

Airline distances converted to inches (500 miles = 1 inch):

        NY      CH      LA
NY      --      1.43"   4.90"
CH      1.43"   --      3.49"
LA      4.90"   3.49"   --

[A ruler diagram shows the LA, CH, and NY points being placed along a 0-6 inch scale, to get a map
from a matrix of dissimilarities.]

Figure 11. How to manually obtain a two-dimensional map from a matrix of dissimilarities.

All that remains is to place the point for the third city, in this case, Chicago. Where
should it go? It must simultaneously meet the condition of being both 3.49" from Los
Angeles and 1.43" from New York. Figure 11 shows how you might meet these
conditions using a compass to draw a semi-circle from Los Angeles at 3.49" and one
from New York at 1.43". The semi-circles intersect at two points and either of these
would be equally good locations for the Chicago point if the object is to represent the
distances.

With only three cities, it's a pretty simple matter to construct a map from a table of
distances. But what if you had four, or ten, or even 100? The process would rapidly
become tedious and, after a few points, would become well nigh impossible to
accomplish manually. In concept mapping we usually have lots of ideas, sometimes
as many as 100 or even 150 that need to be mapped. The input for the mapping --
the analogy to the airline distance table -- is the matrix of similarities among
statements that is obtained from the sort task as described in Figure 8. The output is
the point map of the statements. MDS accomplishes mathematically a process
analogous to what you would do manually in our simple three-city example, except
that it could do so for a table that includes a hundred cities or more.

There are several important insights from this simple description of what MDS does.
MDS does not know compass directions. It does not know North from South. In the
example in Figure 11, MDS would just as happily place Chicago in either of the two
locations. It might just as easily place Los Angeles on the right as on the left. This
means that when you look at a concept map generated by MDS, direction on the map
is entirely arbitrary. You could take a concept map and flip it horizontally or vertically
and/or rotate it clockwise or counterclockwise any amount and this would have no
effect on the distances among the points. The simple exercise shows that MDS
yields a relational picture and is indifferent to directional orientation.

In our three-city example, there will always be a two-dimensional solution that will
represent the table exactly, with no error. When we move to larger numbers of points
this will no longer be the case -- we will not be likely to be able to represent the
(dis)similarities exactly. Some (dis)similarity matrices will be able to be represented
more exactly in two dimensions than others will. In MDS, we estimate the overall
degree of correspondence between the input (i.e., the (dis)similarity matrix) and the
output (i.e., distances between points on the map) using a value called the Stress
Value. A lower stress value indicates a better fit, higher stress means the fit is less
exact. In general, you want lower stress, although it is not always clear whether slight
differences in stress can be translated into meaningful differences in the
interpretability of a map. The normative range for judging stress values in a particular
study should be determined from comparisons with similar types of data collected
under similar circumstances. We would never expect to get a stress value in a
concept mapping study of 100 ideas/statements that is anywhere near as low as what
we would obtain from a map of distances among 100 cities! We would also not
expect to get as low a stress value if we mapped cities not in terms of their airline
distances, but rather in terms of measures that have more variability like crime rates,
annual rainfall, or even perceived quality of life. In a study of the reliability of concept
mapping, Trochim (1993) reported that the average Stress Value across 33 concept
map projects was .285 with a range from .155 to .352. While the stress value has
some use in an interpretation, giving you some idea of how accurately your map
represents the input data relative to other maps, it is not clear that maps with lower
stress are more interpretable or useful than ones with considerably higher stress.
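For readers who want the formula, one common definition of the stress value is Kruskal's Stress
Formula 1; the sketch below is hedged, since the concept mapping software may differ in detail (for
example, in exactly how the input (dis)similarities are transformed into "disparities" before the
comparison).

import numpy as np

def kruskal_stress_1(disparities, distances):
    # Stress-1 = sqrt( sum (d_hat - d)^2 / sum d^2 ), where d_hat are the
    # transformed input (dis)similarities and d are the inter-point map distances.
    disparities = np.asarray(disparities, dtype=float)
    distances = np.asarray(distances, dtype=float)
    return np.sqrt(((disparities - distances) ** 2).sum() / (distances ** 2).sum())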

The discussion to this point shows how the concept mapping analysis uses sorting
results and MDS to produce the basic "point map" that is the foundation for all other
maps. While this is useful in itself, it is helpful to be able to view a concept map at
different levels of detail. Just as in geographic mapping, there are times when you
want considerable detail (e.g., when hiking or mountain climbing) and other times
where grosser-level maps (e.g., when driving on interstate highways) are more useful.
The point map generated by MDS is a fairly detailed map. To get to higher-level
maps that summarize across some of that detail a procedure known as hierarchical
cluster analysis is used. The input to the cluster analysis is the point map, specifically
the x,y values for all of the points on the MDS map. Using the MDS configuration as
input to the cluster analysis forces the cluster analysis to partition the MDS
configuration into non-overlapping clusters in two-dimensional space. Unfortunately,
mathematicians do not agree on what constitutes a cluster mathematically and,
consequently, there is a wide variety of algorithms for conducting cluster analysis,
each of them yielding different results. In concept mapping we usually conduct
hierarchical cluster analysis utilizing Ward's algorithm (Everitt, 1980) as the basis for
defining a cluster. Ward's algorithm has the advantage of being especially
appropriate with the type of distance data that comes from the MDS analysis. The
hierarchical cluster analysis takes the point map and constructs a "tree" that at one
point has all points together (in the trunk of the tree) and at another has all points as
their own end points of the "branches". All hierarchical cluster analysis approaches
are divided into two broad types, agglomerative and divisive. In agglomerative, the
procedure starts with each point as its own branch end-point and decides which two
points to merge first. In each successive clustering iteration, the algorithm uses a
mathematical rule to determine which two points and/or clusters to combine next.
Thus, the procedure agglomerates the points together until they are all in one cluster.
Divisive hierarchical cluster analysis works in the opposite manner, beginning with all
points together and deciding based on a mathematical rule how to divide them into
groups until each point is its own group. Ward's method is an agglomerative one.
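A rough sketch of this step, using SciPy's Ward linkage as a stand-in for the software described in the
paper; a small random point configuration takes the place of the real MDS coordinates so the snippet
runs on its own.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
coords = rng.normal(size=(56, 2))        # stand-in for the (x, y) point map from MDS

Z = linkage(coords, method="ward")               # agglomerative cluster tree over the points
labels = fcluster(Z, t=9, criterion="maxclust")  # "slice" the tree into nine clusters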

[Figure 12 shows the statement points of an MDS point map with a suggestive cluster "tree" drawn over
them; a vertical slice through the tree yields nine clusters.]

Figure 12. A suggestive hierarchical cluster tree for a point map, showing how a nine-cluster solution
would be obtained as a "slice" from the tree.

The kind of thing that is going on in hierarchical cluster analysis is suggested in
Figure 12.[6] The numbers show the locations of statements on a point map that was
generated by MDS. Each statement is the end-point of a branch. The tree shows
how statements get agglomerated and eventually combined onto a single trunk -- a
one-cluster solution. By taking vertical slices at different heights of the tree, one can
look at different numbers of clusters. The figure shows a vertical slice that would
result in nine separate clusters of statements. The resulting nine-cluster solution is
shown in two-dimensional concept map form as a "point-cluster" map in Figure 13.

[6] The figure is labeled "suggestive" because it is only meant to convey the idea of a cluster tree visually.
There is some deliberate distortion of dimensionality here for graphic purposes.

[Figure 13 shows the same statement points partitioned into nine clusters on the two-dimensional map.]

Figure 13. Two-dimensional nine-cluster point-cluster map.

In most concept mapping application contexts it is not useful in interpreting results to
show the entire cluster analysis tree. Just as in geographic mapping, it is not feasible
to show all levels of detail simultaneously. The geographic cartographer makes
decisions about scale and detail depending on the intended uses of the map.
Similarly, in concept mapping, the facilitator, often in collaboration with a small group
of participants, decides on the number of clusters to use in maps that are
reported/presented. There is no simple mathematical criterion by which a final
number of clusters can be selected. The procedure typically followed in concept
mapping is to examine an initial cluster solution that is the maximum thought
desirable for interpretation in this context. Then, successively lower cluster solutions
(i.e., successive slices moving down the tree) are examined, with a judgment made at
each level about whether the merger seemed substantively reasonable, defensible or
desirable. The pattern of judgments of the suitability of different cluster solutions is
examined and results in a decision of a specific cluster solution that seems
appropriate for the intended purpose of the project. In some projects several such
cluster maps are produced to illustrate different levels of aggregation.

All that remains in the core of the concept mapping analysis is the incorporation of
any rating data or other measures that may have been gathered. Note that the only
data needed to produce the base point and cluster maps described above is the
sorting data. Rating data is always used in concept mapping to provide a third
dimension, a vertical overlay that graphically depicts the "height" of various areas of the
map. For instance, if, as part of the data organizing phase, a simple rating of the
relative importance of each statement was obtained from participants, one could
graph[7] the average importance rating for each statement by extruding a third
dimension on each point of the point map. Similarly, the importance of each cluster
could be indicated using "layers" for the clusters to indicate the average importance

[7] After years of playing around with this stuff, it seems to be a consensus in concept mapping that a
pseudo-three-dimensional rendering is more interpretable than the technically more accurate true three-
dimensional representation.

rating of all statements in the cluster. Examples of variations of these maps will
be provided below.
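A minimal sketch of the rating overlay, assuming a participants-by-statements matrix of 1-5
importance ratings and a cluster assignment per statement (both hypothetical stand-ins here): the
overlay values are simply the mean rating per statement and then per cluster.

import numpy as np

rng = np.random.default_rng(1)
ratings = rng.integers(1, 6, size=(18, 83))   # 18 raters x 83 statements, 1-5 scale
clusters = rng.integers(1, 9, size=83)        # hypothetical cluster number for each statement

statement_means = ratings.mean(axis=0)        # the "height" of each point on a point rating map
cluster_means = {c: statement_means[clusters == c].mean() for c in np.unique(clusters)}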

A Concept Mapping Example


This case study illustrates the use of concept mapping as the foundation for strategic
planning for the Oregon Program Evaluators Network (OPEN). It was undertaken as
an informal demonstration project and done in advance of the annual OPEN
Conference where the results were to be presented and discussed. OPEN is an
organization for Oregonians and Washingtonians involved with, or interested in,
program evaluation. It allows members to exchange ideas and information which
promotes and encourages high-quality evaluation practices. The founding members
of OPEN represent government agencies, universities, and private consulting firms.
OPEN's mission is:

To provide a regional, interdisciplinary forum for professional


development, networking, and exchange of practical, methodological, and
theoretical knowledge in the field of evaluation.

Participants
There were approximately 325 OPEN members on an electronic mailing list who were
contacted by e-mail and asked to participate in the study. Since all participation up to
the interpretation of maps was done over the World Wide Web, participants self-
selected to go to the website and take part. Because brainstorming is anonymous,
there is no way of knowing exactly how many participants generated the 83
brainstormed statements. The organizing phase of the process requires that
participants log on and, consequently, it is possible to determine sample size exactly.
There were 23 people who successfully logged onto the web site, and 22 of these
answered at least some of the demographics questions (which only take a few
seconds to complete). The sorting and rating tasks are more demanding. Sorting
can take as long as 45 minutes to an hour and rating typically requires 10-15 minutes
for a set of 83 statements. Of the 23 who logged on, 17 successfully completed the
sort and 18 completed the rating task.
Procedure
The general procedure for concept mapping is described in detail in Trochim (1989a).
Examples of results of numerous concept mapping projects are given in Trochim
(1989b). The process implemented here was accomplished over the World Wide
Web using the Concept System© Global web program. This program can be used
from any terminal that can access the World Wide Web with any reasonably
contemporary web browser such as Internet Explorer or Netscape Navigator
(versions 2 or higher). No software needs to be downloaded to the client machine
and no applets or other programs are automatically downloaded.

The data collection for the project took place in two phases between September 4,
1999 and October 1, 1999. In the first phase (September 4 - 17), participants were
asked to brainstorm statements to a specific prompt. In the second phase
(September 20 - October 1), participants were asked to sort and rate the statements
and provide basic demographics data.

Phase I: Generation of Conceptual Domain. In the first phase, participants
generated statements using a structured brainstorming process (Osborn, 1948)
guided by a specific focus prompt that limits the types of statements that are
acceptable. The focus statement or criterion for generating statements was
operationalized in the form of the complete-the-sentence focus prompt to the
participants:

One specific thing I think the Oregon Program Evaluators Network should
do over the next five years is…

The brainstorming interface is illustrated in Figure 14. Participants were only required
to point their browsers to the project web page -- no software was required on the
participant's machine other than a traditional web browser and web access.

The participants brainstormed 83 statements. The complete set of statements is
shown in Table 1. When the brainstorming period ended, two of the principals from
the client group were asked to edit the brainstormed statements to assure that
spelling and grammar were correct. The website was then set up for Phase II of the
project.

Figure 14. Web-based brainstorming interface for statement generation using the Concept System Global
software.

Phase II: Organizing the Brainstormed Statements. As in Phase I, this phase was
accomplished entirely over the web. The organizing phase involved three distinct
tasks: the sorting and rating of the brainstormed statements, and the collection of
basic demographic variables. For the sorting (Rosenberg and Kim, 1975; Weller and
Romney, 1988), each participant grouped the statements into piles of similar ones.
They were instructed to group statements together if they seemed similar in meaning.
Each group or pile of statements was given a name by the participant. The only
restrictions in this sorting task were that there could not be: (a) N piles (every pile
having one item each); (b) one pile consisting of all items; or (c) a "miscellaneous"
pile (any item thought to be unique was to be put in its own separate pile). Weller and
Romney (1988) point out why unstructured sorting (in their terms, the pile sort
method) is appropriate in this context:

The outstanding strength of the pile sort task is the fact that it can
accommodate a large number of items. We know of no other data
collection method that will allow the collection of judged similarity data
among over 100 items. This makes it the method of choice when large
numbers are necessary. Other methods that might be used to collect
similarity data, such as triads and paired comparison ratings, become
impractical with a large number of items (p. 25).
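
As a concrete illustration of the sorting restrictions described above, the following sketch checks a single participant's sort against rules (a) and (b). It is a hypothetical helper written for this discussion only, not a feature of the Concept System software, and the function name and pile format are assumptions.

```python
# Illustrative check of the sorting restrictions (hypothetical helper, not part
# of the Concept System): piles may not all be singletons, and there may not be
# a single pile containing every statement. Rule (c), the "miscellaneous" pile,
# cannot be detected mechanically; participants are simply instructed to give
# each unique item its own pile instead.
def sort_is_acceptable(piles, n_statements):
    """piles: one participant's sort, as a list of lists of statement indices."""
    every_pile_is_a_singleton = all(len(pile) == 1 for pile in piles)
    one_pile_holds_everything = len(piles) == 1
    every_statement_sorted_once = (
        sorted(i for pile in piles for i in pile) == list(range(n_statements))
    )
    return (not every_pile_is_a_singleton
            and not one_pile_holds_everything
            and every_statement_sorted_once)

print(sort_is_acceptable([[0, 1], [2], [3, 4]], 5))      # True
print(sort_is_acceptable([[0], [1], [2], [3], [4]], 5))  # False: all singletons
```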

For the rating task, each participant was asked to rate each statement on a 5-point
Likert-type response scale in terms of how important the statement is with respect to
the future of OPEN. The specific rating instruction was:

Please rate each statement for how important you think it is for the future of OPEN, where

1=Relatively unimportant
2=Somewhat important
3=Moderately important
4=Very important
5=Extremely important

The demographics that were collected are shown in Table 2. Seventeen participants
had complete sorting data and eighteen had complete rating data.
Results
The first and most elemental map that is produced in the analysis is the point map
that shows the brainstormed statements in a two-dimensional space. Each point on
the map represents a single brainstormed idea. Points that are closer to each other
physically are more similar to each other cognitively, that is, they tended to be placed
together in the same sort piles by more of the participants. Hierarchical cluster
analysis is used to "partition" the points into graphically adjacent and proximal groups
or clusters. The point map overlaid on the eight-cluster solution is shown in Figure
15.

[Point map: the 83 numbered statement points arrayed in two dimensions and partitioned into eight clusters]
Figure 15. Point-cluster map for the OPEN project.
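
The sketch below illustrates the general sequence just described -- aggregate the individual sorts into a statement-by-statement co-occurrence matrix, scale it to two dimensions, and partition the points with hierarchical cluster analysis. It is a minimal illustration using toy data and standard open-source routines (scikit-learn and SciPy), not the Concept System software; the pile data and variable names are hypothetical.

```python
import numpy as np
from sklearn.manifold import MDS
from scipy.cluster.hierarchy import linkage, fcluster

# Toy sort data for 8 statements from 2 participants; the OPEN project had
# 83 statements and 17 completed sorts.
sorts = [
    [[0, 1, 4], [2, 3], [5, 6, 7]],
    [[0, 4], [1, 2, 3], [5], [6, 7]],
]
n = 8

# Co-occurrence: how many participants placed each pair of statements together.
co = np.zeros((n, n))
for piles in sorts:
    for pile in piles:
        for i in pile:
            for j in pile:
                co[i, j] += 1

# Turn similarities into dissimilarities and scale them to two dimensions.
dissim = co.max() - co
np.fill_diagonal(dissim, 0)
coords = MDS(n_components=2, dissimilarity="precomputed",
             random_state=0).fit_transform(dissim)

# Hierarchical (Ward's) clustering of the 2-D coordinates into a chosen number
# of clusters (eight were used for the OPEN map; three for this toy example).
labels = fcluster(linkage(coords, method="ward"), t=3, criterion="maxclust")
print(coords.round(2), labels)
```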

Each point on the map is shown with an accompanying statement identification
number that enables one to identify the contents of the statements as shown in Table
1. We can also see the text of statements within each cluster, where the statements
are ordered in descending order by average importance across all participants as
shown in Table 3.

While the map shows a considerable amount of detail, in this form it is not very useful
as a graphic entity. In cartographic terms, it may be that the scale of the map is too
detailed to be of much use in typical decision-making contexts.[8] To move to a higher
level of abstraction or scale, we might drop off the detail of the points and include
meaningful text labels for each of the clusters as shown in Figure 16.

How do we obtain the labels? First, the analysis uses an algorithm that
mathematically determines the "best-fitting" label for each cluster based on the sort
pile labels the participants developed. In fact, the algorithm enables one to select the
top-10 best-fitting labels for each cluster in descending order of goodness-of-fit.
Second, the participants examine the statements in each cluster and the top-10
labels, and from this information determine a cluster label that makes the most
sense.[9] Finally, the labeled cluster map can be graphed as shown in Figure 16.
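
One plausible way to implement a "best-fitting label" heuristic is sketched below: score each participant-supplied pile label by how close the centroid of that pile's statements lies to the centroid of the cluster being labeled. This is an assumption made for illustration -- the Concept System's actual algorithm is not reproduced here -- and the function and data structures are hypothetical.

```python
# Hypothetical sketch of a "best-fitting label" heuristic (an assumption for
# illustration): a pile label fits a cluster well if the centroid of that
# pile's statements lies close to the centroid of the cluster on the point map.
import numpy as np

def best_fitting_labels(coords, cluster_members, labeled_piles, top_k=10):
    """coords: (n_statements, 2) point-map coordinates.
    cluster_members: indices of the statements in the cluster being labeled.
    labeled_piles: list of (label, statement_indices) pairs from all sorters."""
    cluster_centroid = coords[cluster_members].mean(axis=0)
    scored = []
    for label, members in labeled_piles:
        pile_centroid = coords[members].mean(axis=0)
        scored.append((np.linalg.norm(pile_centroid - cluster_centroid), label))
    scored.sort()                       # smallest distance = best fit
    return [label for _, label in scored[:top_k]]
```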

This cluster map is often both interpretable and useful. In addition to its role as a
summarizing and signifying device for the ideas of the group, it also often enables
and encourages the participants to begin developing theories about the
interrelationships among the idea clusters. For example, it is often sensible to
"regionalize" the map by grouping clusters of clusters. In the map in Figure 16, it
makes sense to group the four clusters on the top and left into one region that has to
do with OPEN's relationship with the world external to it. The four clusters in this
region -- Recruitment, Linkages, Outreach, and PR & Community Service -- describe
the major areas of interface with this outside environment. On the other hand, the
clusters on the lower and right side of the map pertain more to issues internal to
OPEN -- Programs & Events, Services, Communication, and
Mentoring/Scholarships.

[8] This level of detail will be much more important when the organization actually begins to work on the
detailed design of a strategic plan with specific tasks or action steps. The point map is analogous to a
detailed contour map or to a geological survey map -- it doesn't give one a very good overall sense of the
geography, but it is absolutely essential if you are going to hike through the area or are thinking of building
there.
[9] In this demonstration project the author developed the final cluster labels by examining the statements in
each cluster and looking at the top-10 labels. For instance, consider the cluster labeled 'Recruitment' in
Figure 16. This cluster had as the top-10 labels (in descending order): Recruitment ideas, building
membership, Scope, recruiting, Increase Diversity, Extend membership, Members-who, Outreach-Non-
Evals, Membership, Member recruitment. After reviewing these labels and the statements in that cluster,
the term 'Recruitment' was selected as the final label that best summarized the ideas.

[Labeled cluster map showing eight clusters: Linkages, Recruitment, Outreach, PR & Community Service, Services, Programs & Events, Mentoring/Scholarships, and Communication]
Figure 16. Labeled cluster map for the OPEN project.

The labeled cluster map in Figure 16 can be considered the "base" in this relational
map.[10] Just as with the individual points, distance between clusters is interpretable.
Typically, we use this base as the foundation for displaying or representing other
information. For instance, consider the map in Figure 17.

[10] Dimensional orientation is irrelevant -- the map does not know north from south or east from west.
One could rotate the map and/or flip it horizontally or vertically and the points/clusters would remain in the
same relational configuration. The shape of the cluster is also irrelevant -- it is determined solely by
joining the outer points in each cluster. However, the size of the cluster does have some interpretability:
in general, larger clusters are "broader" in meaning and smaller ones are more "narrowly" defined.

[Cluster rating map showing the eight labeled clusters rendered in layers. Legend: Layer 1 = 2.70 to 2.84; Layer 2 = 2.84 to 2.98; Layer 3 = 2.98 to 3.12; Layer 4 = 3.12 to 3.26; Layer 5 = 3.26 to 3.40]

Figure 17. Cluster rating map showing average importance across all participants.

In this map, we use layers on the base cluster shapes to represent a separate
variable, in this case the average importance rating across all participants.[11] This
type of map is analogous to the map of crime rates in the United States that is shown
in Figure 4, except that in this map the base is conceptual rather than geographical.
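
The layer values themselves come from a simple double aggregation, as footnote 11 explains: statements are averaged across participants, clusters are averaged across their statements, and the cluster means are bucketed into equal-width layers. A minimal sketch with toy data (not the OPEN ratings) follows; the array names are hypothetical.

```python
# Minimal sketch with toy data: cluster averages are aggregated twice --
# statements averaged across participants, then clusters averaged across their
# statements -- and bucketed into five equal-width layers.
import numpy as np

rng = np.random.default_rng(0)
ratings = rng.integers(1, 6, size=(18, 83))     # 18 raters x 83 statements, values 1-5
cluster_of = np.arange(83) % 8                  # toy assignment of statements to 8 clusters

statement_means = ratings.mean(axis=0)          # first aggregation: across participants
cluster_means = np.array([statement_means[cluster_of == c].mean() for c in range(8)])

# Split the observed range into five equal-width layers, as in Figure 17's legend.
edges = np.linspace(cluster_means.min(), cluster_means.max(), 6)
layers = np.digitize(cluster_means, edges[1:-1]) + 1    # layer 1..5 per cluster
print(cluster_means.round(2), layers)
```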

As with any geographical mapping system, it is often useful to compare two different
variables as represented on a relational base. This is illustrated in Figure 18, which
shows on the left side the pattern of ratings for participants who were government
employees and on the right side the pattern of ratings for participants who were not
government employees. This might be analogous to two maps of the United States
that show felonies and misdemeanors.

The problem with these types of maps is that it is difficult to compare the information
visually. Such a comparison requires the eyes to flit back and forth between the two
figures, noting what is high in one and how high it is in the other.

[11] The map also has a legend showing that the cluster averages range from a low of 2.70 to a high of
3.40, even though the rating was measured on a 1-to-5 scale. This narrow range often surprises people
who are tempted to conclude that it means there is essentially no difference between the highest and
lowest rated cluster. However, it is important to remember that the cluster averages are aggregated twice:
for each statement the average is determined across participants, and for each cluster the average is
determined across statements in the cluster. Because of this, it is expected that the range would be
considerably restricted. It also means that even relatively small differences between cluster averages are
more likely to be interpretable than might appear at first glance.

[Two cluster rating maps displayed side by side, each showing the eight labeled clusters in layers: Government Employees (left) and Not Government Employees (right)]

Figure 18. Comparison of the patterns of ratings of government employee participants and non-
government employee participants.

Another way to accomplish the same "pattern match" comparison more effectively is
to array the two patterns side-by-side in a "ladder graph" form as shown in Figure 19.
The ladder graph has two vertical axes, one for each group. Each cluster from the
map is represented as its own line on the ladder and is listed vertically as a label.
The ordering of the labels from top-to-bottom depicts the rank ordering of the
importance of the clusters and links to the order in which each cluster's line hits the
axis on that side. The point where each line hits the axis depicts the interval-level
average of the importance ratings. The Pearson Product Moment Correlation (r) is
an estimate of the overall pattern match and is shown at the bottom of the figure.
This type of figure is called a "ladder graph" because, when there is strong agreement
between the groups, the lines tend to be horizontal and look like a funny type of multi-
color uneven-step ladder.
[Ladder graph data: Government Employees (axis 2.68 to 3.33) and Not Government Employees (axis 2.72 to 3.56), clusters listed in descending order of average importance for each group:
Government Employees -- Linkages, Services, Outreach, Recruitment, Programs & Events, Communication, Mentoring/Scholarships, PR & Community Service
Not Government Employees -- Communication, Services, Outreach, Programs & Events, Linkages, PR & Community Service, Recruitment, Mentoring/Scholarships
r = .58]

Figure 19. A ladder graph representation of the pattern match data in Figure 18.
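
As a minimal sketch of the pattern match computation: the overall match is the Pearson correlation between the two groups' vectors of cluster-level average importance ratings, and each side of the ladder is simply those averages in descending order. The values below are illustrative, not the OPEN data.

```python
# Minimal sketch with illustrative values (not the OPEN data): the pattern match
# is the Pearson correlation between two groups' cluster-level average ratings.
import numpy as np

clusters = ["Linkages", "Recruitment", "Outreach", "PR & Community Service",
            "Services", "Programs & Events", "Communication", "Mentoring/Scholarships"]
group_a = np.array([3.3, 3.1, 3.2, 2.7, 3.2, 3.1, 3.0, 2.7])
group_b = np.array([3.0, 2.8, 3.2, 2.9, 3.3, 3.1, 3.6, 2.7])

r = np.corrcoef(group_a, group_b)[0, 1]                 # overall pattern match
order_a = [clusters[i] for i in np.argsort(-group_a)]   # left axis, top to bottom
order_b = [clusters[i] for i in np.argsort(-group_b)]   # right axis, top to bottom
print(f"r = {r:.2f}")
```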

The pattern matching ladder graph is especially useful for examining visually the
degree to which two vertical pattern representations on a concept map base are
similar to or different from each other. For example, the patterns in Figure 19 suggest
that while Linkages is the most important cluster for government employees, it is
slightly below half-way down in importance for non-government employees. We can
also see in the figure that government employees have almost dichotomous or
bimodal ratings -- Mentoring/Scholarships and PR & Community Service are rated
almost equally low and everything else is almost equally high.

[Ladder graph data: Ph.D. (axis 2.50 to 3.54) and Not Ph.D. (axis 2.65 to 3.34), clusters in descending order of average importance:
Ph.D. -- Communication, Outreach, Services, Linkages, Programs & Events, PR & Community Service, Recruitment, Mentoring/Scholarships
Not Ph.D. -- Services, Recruitment, Communication, Linkages, Programs & Events, Outreach, Mentoring/Scholarships, PR & Community Service
r = .37]

Figure 20. Pattern match of participants having a Ph.D. versus those who do not.

Figure 20 shows the pattern match of respondents with a Ph.D. compared to those
without one. Several disconnects seem interesting. Non-Ph.D.s seem to rate
Recruitment and possibly Mentoring/Scholarships more highly than Ph.D.s do. On
the other hand, they rate PR & Community Service lower, and it is their lowest
category.

[Ladder graph data: More than 10 years in research (axis 2.56 to 3.48) and 10 years or less in research (axis 2.85 to 3.47), clusters in descending order of average importance:
More than 10 years -- Communication, Outreach, Linkages, Programs & Events, Services, Recruitment, PR & Community Service, Mentoring/Scholarships
10 years or less -- Services, Communication, Recruitment, Linkages, Outreach, Programs & Events, PR & Community Service, Mentoring/Scholarships
r = .68]

Figure 21. Pattern match of those with more versus less than ten years of experience with research.

In Figure 21, we see that the less experienced researchers seem more interested in
both Services and Recruitment than their more experienced counterparts. This
makes sense. They are more likely to be in the process of establishing their careers
and in need of assistance/guidance (e.g., Services) and interested in networking and
making contact with others who are like them (e.g., Recruitment).

[Ladder graph data: AEA Member (axis 2.58 to 3.47) and Not AEA Member (axis 2.80 to 3.48), clusters in descending order of average importance:
AEA Member -- Communication, Linkages, Programs & Events, Outreach, Services, Recruitment, PR & Community Service, Mentoring/Scholarships
Not AEA Member -- Services, Communication, Outreach, Recruitment, Linkages, Programs & Events, PR & Community Service, Mentoring/Scholarships
r = .75]

Figure 22. Pattern match of AEA versus non-AEA Members.

The national association of evaluators in the United States is the American Evaluation
Association (AEA). It stands to reason that OPEN members would be interested in
participating in this national association in addition to OPEN. Figure 22 shows that
OPEN members who are not members of AEA view Services as the most important
cluster, most likely because they rely on OPEN as their primary or sole source for
such services while AEA members have at least that alternative.
[Ladder graph data: Females (axis 2.76 to 3.47) and Males (axis 2.20 to 3.33), clusters in descending order of average importance:
Females -- Communication, Services, Programs & Events, Linkages, Outreach, Recruitment, Mentoring/Scholarships, PR & Community Service
Males -- Outreach, Services, Linkages, Communication, Recruitment, PR & Community Service, Programs & Events, Mentoring/Scholarships
r = .47]

Figure 23. Pattern match of females versus males.

Gender was also compared as shown in Figure 23. Here, the most salient feature
appears to be the degree to which males rated Mentoring/Scholarships lower than all
other categories -- their ratings are almost bimodal in this sense. Can this be
reflective of a general tendency of males to be less interested in mentoring and
sponsoring activities?

Pattern matches can be portrayed either relatively or absolutely, and there are distinct
advantages and disadvantages to each. For instance, consider the two pattern
matches shown in Figure 24. Both ladder graphs portray exactly the same pattern
matching data. Both compare the average importance of participants living in
Portland, Oregon with those not living in Portland. The only difference between them
is that the ladder graph on the left uses the actual minimum and maximum average
ratings for each group to determine the axis high and low values, whereas the graph
on the right fixes the axes for both groups to the same high and low values.
[Ladder graph data: Portland vs. Not Portland shown twice -- relative axes on the left (Portland 2.63 to 3.25; Not Portland 2.86 to 3.70) and absolute axes on the right (both 2.60 to 3.75). Clusters in descending order of average importance:
Portland -- Communication, Services, Programs & Events, Linkages, Outreach, Recruitment, PR & Community Service, Mentoring/Scholarships
Not Portland -- Communication, Services, Outreach, Linkages, Programs & Events, Recruitment, PR & Community Service, Mentoring/Scholarships
r = .85 (both panels)]

Figure 24. A relative (left) and absolute (right) pattern match for the same comparison.

Both pattern matches show that Communication is important to both groups, but the
absolute match shows more clearly that it is more important to those outside of
Portland. On the other hand, the relative pattern match shows that, relatively
speaking, Programs & Events is rated higher by those in Portland than by those
outside it (perhaps because those farther from Portland find the events predominantly
in Portland less accessible to them).
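
The relative/absolute distinction amounts to nothing more than a choice of axis endpoints. The small sketch below (with hypothetical values, not the actual Portland data) shows how the same cluster averages map onto the two kinds of axes.

```python
# Minimal sketch with hypothetical values: the only difference between a
# relative and an absolute pattern match is how the axis endpoints are chosen.
import numpy as np

portland = np.array([3.25, 3.10, 2.95, 2.90, 2.85, 2.80, 2.70, 2.63])
not_portland = np.array([3.70, 3.40, 3.20, 3.10, 3.05, 2.95, 2.90, 2.86])

def axis_positions(values, lo, hi):
    """Map cluster averages onto a 0-1 vertical axis with the given endpoints."""
    return (values - lo) / (hi - lo)

# Relative: each axis runs from that group's own minimum to its own maximum.
rel_left = axis_positions(portland, portland.min(), portland.max())
rel_right = axis_positions(not_portland, not_portland.min(), not_portland.max())

# Absolute: both axes share the same endpoints, so level differences stay visible.
lo = min(portland.min(), not_portland.min())
hi = max(portland.max(), not_portland.max())
abs_left = axis_positions(portland, lo, hi)
abs_right = axis_positions(not_portland, lo, hi)
print(abs_left.round(2), abs_right.round(2))
```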

What can we make of these results? We need to recognize, of course, that they are
based on very few respondents. This project was undertaken as a demonstration, not
as a more formal evaluation. Inferences from so small and unrepresentative a
sample should be made very cautiously. In addition, all interpretations made here are
from the perspective of the author, who was also the facilitator of the project. There
was a brief opportunity to engage with OPEN members at their conference and present
the results, but no attempt comparable to what we typically do in concept mapping
was made to engage the group in the process of interpretation.

On the other hand, we can look at these maps and matches in the spirit of
"suggestive" devices designed to stimulate members of OPEN to think about their
organization and its issues in different ways. With this in mind, and not taking any of
the results too seriously, a few inferences do "suggest" themselves. It seems clear
from the map that there is an internal-external division in how people view OPEN and
its issues. The most salient internal issues relate to the categories Services,
Programs & Events, Communication, and Mentoring/Scholarships. The external
issues are categorized as Recruitment, Linkages, Outreach, and PR & Community
Service. It's interesting in reference to the maps shown in Figure 16 and Figure 17
that Recruitment (external) is located near Programs & Events (internal) while PR &
Community Service (external) is closer to Services and Mentoring/Scholarships
(internal). In terms of planning, this may suggest that the organization look for
internal-external synergies in these areas. For instance, this may suggest that
Programs & Events be viewed as a major device to help achieve Recruitment.
Conversely, one might conclude that more Recruitment is important in order to
enhance Programs & Events or offer a greater variety of these. Analogously, one
might look at the confluence of PR & Community Service (external), Services
(internal) and Mentoring/Scholarships (internal) and conclude that the
Mentoring/Scholarships category might very well be a "bridge" area -- it can help
enhance PR & Community Service for OPEN while at the same time providing a
Service to OPEN members.

The other major interpretation from this brief pilot project is related to the pattern
matches. These suggest that OPEN has a number of important constituencies who
have some discernible differences in their preferences. They look to the organization
for different things, and participate in different ways. This is no surprise given the
diversity of members, but the pattern matches do help suggest which groups have
which preferences.

Finally, it's important to recognize the stated purpose for this pilot study (outside of its
obvious use as a demonstration of concept mapping). The focus of the study was on
strategic planning for OPEN -- what OPEN should do over the next five years. We
would hope that the map -- like all maps -- would have some utility in guiding the
organization, in this case, in their strategic planning efforts. How could it be used in
this way? First, the maps act as a communications device. They summarize an
enormous amount of information into eight categories, providing a common language
that can shape subsequent planning. That is, they help to organize and simplify the
discussion. Second, the maps provide an implicit structure that can be used to
organize strategic planning efforts. For instance, OPEN might decide to subdivide the
next phase of its strategic planning effort by forming several committees or task
forces. Without the map, they would need to create these intuitively. The map,
however, suggests eight categories that might form the basis for several committees.
If they only want a few committees, they might use the internal-external distinction. If
they want more, they could have one per cluster or group adjacent clusters (e.g., fold
Mentoring/Scholarships into the adjacent Services cluster and form a Services
committee). Each such committee already has a good head start. In their designated
clusters they have a set of issues (i.e., the statements as shown in Table 3) that have
been rated for importance. They might start with the ones rated most important by all
participants and begin to develop potential action statements, deadlines,
assignments, and so on.[12] Or, the group might prefer to work as a whole and might
look across the entire map to find the statements that were rated most important
across all participants as shown in Table 4. For these they might then decide on
actions and the details associated with implementing them. These are simply initial
suggestions for how the organization might use the maps to guide their strategic
planning. Potential uses are limited primarily by the creativity of the organization.

It should be clear that concept maps, like their geographic counterparts, are intended
to provide guidance, a sense of direction. Instead of geographical terrain, the
concept maps attempt to describe the conceptual and semantic terrain. With this
case study in mind, we turn now to look at what cartographers are saying about
mapping, to see how their insights and experiences might illuminate the evaluator's
endeavors.

The Cartography of Concept Mapping


A central premise of this paper is that evaluators can, both figuratively and literally, be
cartographers of the social reality within which they work. When understood in the
broader classification described above, it should be clear that the evaluator-
cartographer is not meant to be limited to geographic mapping (such as geographic
information systems). In fact, geographic mapping may in the end be the lesser
application of the evaluator-cartographer. Relational maps, especially ones that
involve additional representation of data, are potentially valuable both for prospective,
formative evaluation and retrospective, summative work.

[12] Concept Systems Incorporated, the producers of The Concept System, have a number of post-mapping
modules that help participants take the results of maps and use them for various purposes. These
modules, termed the Concept System Application Suite, include (among many): a program for strategic or
action planning that enables an organization to develop tasks for any statement/clusters, set deadlines,
assign resources, set costs, and so on; and a program for evaluation or performance measurement that
enables the development of multiple measures for each statement/cluster, specification of targets or
objectives, and ongoing monitoring over time, analysis, and reporting.

If one accepts the premise that mapping has a useful place in evaluation, and that the
type of non-geographical mapping that might be undertaken by evaluators is
conceptually related to traditional geographic mapping, it makes sense that we might
examine how contemporary cartographers think about quality in their work to see if
there may be some useful lessons and/or warnings there for evaluator-cartographers.

The naïve view of most laypersons is that geographic maps -- those that we might call
geographically isomorphic -- more-or-less correspond to external reality. A map,
viewed from that perspective, can be judged in terms of its accuracy, the degree to
which it correctly reflects geographical reality.

Those of us who have been engaged in evaluation work are familiar with this
perspective with respect to evaluation. Just as with maps, our evaluations are
presumed by many laypersons to be isomorphic with reality. An evaluation in this
perspective can be judged in terms of its validity or the degree to which it "correctly"
represents reality. In Harley's words:

The usual perception of the nature of maps is that they are a mirror, a graphical
representation, of some aspect of the real world. …the role of a map is to present a factual
statement about geographic reality. (Harley, 1990)

The positivist and naïve realist perspectives in evaluation have been successfully
challenged by a variety of post-positivist stances that include critical realism,
emergent realism, constructivism, and interpretivism, among others. All of these
perspectives share the premise that reality is not an immediately sensible
phenomenon that can easily be represented acontextually. All are critical of our ability
to observe and measure "accurately" in the naïve positivist or realist sense. All of
them view the political and interpersonal environment as critical components in
evaluation contexts.

It is somewhat heartening as an evaluator to read, as an uninitiated outsider, some of
the recent literature on cartography. Many of the themes raised there, and many of
the conclusions that result, have direct analogues in evaluation. The cartographers,
like evaluators, have struggled with the positivist and naïve realist paradigms. They,
like we, have recognized the critical importance of context, the frailty of measurement
and observation, the degree to which our work is constructed based on our
perspectives and experiences.

In this section some of the current thinking in cartography is reviewed. No attempt is
made to represent accurately the full range of current debate among cartographers.
It is almost certainly as divided and contentious as current debate among evaluators.
This discussion also does not pretend to represent the most discerning reaches of
academic discourse in cartography. The primary sources used here are readily
available in the popular press and can be easily located in most large bookstores.

Although several sources are included, this discussion draws heavily from the
fascinating work by Denis Wood entitled The Power of Maps (Wood, 1992). Wood
clearly comes from a constructivist/interpretivist perspective in cartography that
challenges even the assumptions of post-positivist critical realism. His insights are
challenging and have great potential value for evaluator-cartographers, whether one
agrees with him or not. The discussion that follows is based on a set of themes, each
of which is stated directly from the contents of Wood's book.[13] Within each theme is
some brief discussion of the theme within cartography along with consideration of the
implication of the theme for the concept mapping case study in particular and concept
mapping in general.

Every Map Has an Author, a Subject, a Theme

…maps, all maps, inevitably, unavoidably, necessarily embody their authors' prejudices,
biases and partialities … There can be no description of the world not shackled … by these
and other attributes of the describer. Even to point is always to point…somewhere; and
this not only marks a place but makes it the subject of the particular attention that pointed
there instead of … somewhere else. The one who points: author, mapmaker; the place
pointed: subject, location; the particular attention: the aspect attended to, the theme --
nothing more is involved (and nothing less) in any map. (Wood, 1992, p. 24)

Who is the "author" of a concept map? We usually describe three types of people
who are directly involved in developing a map -- initiator(s), facilitator(s) and
participants. The initiator(s) is the person(s) who called for the map, who set the
process in motion. In most mapping contexts, the initiator is someone from a client
group or organization who contacts the facilitator(s) who will oversee the process
itself. The initiator is usually fairly powerful within the context of the map interest.
They might be the director or managers in an organization. By selecting a concept
mapping process itself, they are "pointing" in a sense, choosing one method or
approach over another. But they have a more important influence in shaping the
project because they are usually the key persons working with the facilitator to decide
on the focus for the map, the demographics and ratings, the schedule and who will be
selected to participate. The facilitator also plays a key role in shaping the nature of
the map. As the person most likely to be knowledgeable about the mapping process
and how it works, the facilitator has numerous opportunities to shape the content. For
instance, the facilitator plays a key determinative role in:

• shaping the planning (selection of focus, schedule, participants, etc.)
• shaping the content of the brainstormed statements through direct
management of the group during brainstorming and any editing of the
statement set
• shaping the ratings by helping phrase the rating instructions
• shaping the analysis through choices in analysis parameters
• shaping the results, especially through the selection of the number of
clusters and in the rearrangement of any points or cluster boundaries
on the map
• shaping the interpretation, through what is presented to the
participants, in what order, and through management of the
interpretation discussion

Obviously, it's also important to recognize that the participants shape the content of
the map. In one sense, they are considered to be the primary authors. However, it
should be clear that their contribution is shaped significantly by the initiator(s) and
facilitator(s).

[13] The contents alone are a stimulating read. The notion of stating contents headings in so direct and
accessible a manner adds greatly to the readability and impact of the volume. However, the themes used
here are only a partial representation of the contents. Many of the themes that are omitted relate to the
iconographic nature of many geographical maps -- a feature not shared by the type of concept mapping
described here.

One of the most dangerous and potentially deceptive aspects of concept mapping is
related to the apparent and real roles of the facilitator. On the surface, to many
participants and stakeholders the facilitator appears to be a disinterested party who
objectively helps the group surface its content. In reality, the facilitator plays a
major determinative role, influencing the process in many subtle ways. However, it may not be
clear in what directions or for what interests the facilitator works. As facilitator of this
project I was not a member of OPEN, only knew professionally a few members
(including the initiators) and had no real interests of my own other than wanting to do
as nice a demonstration of the method as possible (which is not to say this isn't a
powerful influencer). But I am a white, middle-aged professor from an Ivy League
school with extensive experience in the evaluation profession. These factors
certainly influence decisions I make or encourage others to make as part of concept
mapping. I hasten to add that I don't think this is a problem in itself -- any facilitator
will have biases and unique perspectives that they bring to a project. The danger is
not in the bias, it is in not recognizing that there is that influence. To pretend that the
facilitator is an objective outsider who only facilitates and does not influence the
results which, instead, stem entirely from objective and rigorous statistical and
computer-based processes -- now that is a potential danger.

In the OPEN case study, I was the only facilitator. I was contacted by the current
President of OPEN who knew of my mapping work and asked me to be the keynote
speaker for their first conference. In our early discussions I suggested that we do an
actual concept mapping project for OPEN that could then be incorporated into my
talk. Subsequently another associate of the President's joined our discussions
because he was both a member of OPEN and experienced with the concept mapping
methodology. Therefore, in this context we had one facilitator and two initiators from
the client organization. All of the planning decisions for the project were made by the
three of us. We decided to do the process (up to interpretation of the maps) using
the web version of The Concept System. We decided on the focus statement for
brainstorming, and we decided to invite all OPEN members on their e-mail list
(evidently the vast majority of members have e-mail, and OPEN uses it routinely for
member communications, although it's not clear how much or how often members use it).
All e-mail communications about the project came to members directly from the
OPEN President, although I actually did all the management of the technology and
process over the web.

There were several points at which I or the initiators influenced the process, in
addition to the decisions described above. During the brainstorming period very few
statements were being generated over the web. A few days before the brainstorming
was due to close, I sent this e-mail to the initiators:

It's Tuesday afternoon and we only have 24 brainstormed statements! I'm shutting this
thing off this weekend and I REALLY, REALLY hope you guys can drum up a bit more
involvement than that. I'm going to spend a fair amount of this Sunday generating 300 or
so datasets for your participants and it wouldn't hardly be worth it unless we get 70-80
statements. Of course, you two can "pad" the list when you do the editing this weekend
(and you may have to), but it would be so much nicer if others participated.

Of course, this was an artificial example among friends and done entirely for
purposes of demonstration. But it's clear from the message that I'm concerned both
about the low level of participation and the degree to which it might reflect poorly on
my methodology (not to mention the fear that I would not obtain a good enough result
to present and would have to come up with another example!). My initiator buddy
wrote me back:

only 30 statements: What a bummer! Especially since I did 15 of them already. So yes, I
will start padding --also will call friends and solicit their statements and add them as well.

He evidently did a great job because, in the end, we had 83 statements. Was I
concerned about this participation? Not in this case, because it was clearly only a
demonstration. Could stuff like this happen in "real" projects? Absolutely. The
problem is not particularly that this kind of thing occurs -- it's that it doesn't get
presented. The impressive-looking and authoritative maps that are presented look as
though they are based on extensive participation. This issue manifests itself in
different ways depending on how the brainstormed statements are gathered. For
instance, when brainstorming is done over the web as in this case study, it is virtually
impossible to tell how broad-based the participation is because it is anonymous.[14]
Different and equally vexing problems occur in a face-to-face brainstorming session.

When brainstorming was shut down, I instructed the initiators to "edit" the
brainstormed statement set. Here's part of the e-mail interaction:

FROM Initiator:
Just to make certain I'm clear about the goal-we want statements that are conceptually
different from each other as opposed to reiterations of similar themes? What about
statements that are more detailed aspects of broader concepts (e.g., "Hold more events
outside the Portland area" vs. "Schedule meetings in Salem (or Eugene, or Bend...")? Is it
legitimate to include all of those statements about the "meeting venue" concept, or should
we cull out the more detailed statements? Or should we include the more detailed
statements, and leave out the broader concept?
My Response:

Technically, as long as the statement completes the focus prompt it is legitimate.
However, you want to use some judgment when you revise. If someone says, "hold more
meetings outside Portland" I guess I would be inclined to leave it alone. However, if you
get a ton of statements that say "hold a meeting in Portland", "hold a meeting in Eugene",
"hold a meeting in Podunck", I would probably collapse these to "hold meetings in a variety
of locations throughout the state."

Here, both the facilitator and the initiators are influencing the map results directly in
the numerous editing choices. One might argue that we are trying to be fairly even-
handed in our judgments. However, if you were a member from an area of the state
that had never had an OPEN meeting, having your statement lumped in with others
who have had prior meetings in their areas might legitimately lead you to feel we were
biased against you in our editing judgment.

In the end, the statements themselves should reflect legitimate content from the
participants. Thus, in concept mapping there are multiple "authors" who influence the
subject and theme of a map in numerous often unconscious ways. Like our
cartographer colleagues, the evaluator-facilitator in this context often tries to distance
themselves from the result. We are only facilitators. It's the map that is an objective
reflection of reality. The cartographers remind us that that isn't even true in most
geographic mapping! Why would we expect it to be in concept mapping?

[14] Actually, for those who know about how the web works, you should recognize that while we don't
require login for brainstorming, every web server can trace every hit back to a specific IP number. In many
organizations, IP numbers are assigned permanently to specific machines, which means we could identify
which machines hit the brainstorming site. Of course, we don't know who is on that machine, but in many
companies a machine is assigned to an individual, so we might infer who it was that hit the brainstorming
site. Of course, the log won't tell us what statement was brainstormed in which hit, although it's possible to
reconstruct the sequence. So, theoretically, it is possible that one could identify who brainstormed what
statement even with "anonymous" brainstorming. I wouldn't want to try figuring it out, however.

Every Map Shows This…But Not That

…the cartographer makes a selection, classifies, standardizes; he undertakes intellectual
and graphical simplifications and combinations; he emphasizes, enlarges, subdues or
suppresses visual phenomena according to their significance to the map. In short, he
generalizes, standardizes, and makes selections and he recognizes the many elements
which interfere with one another, lie in opposition and overlap, thus coordinating the
content to clarify geographical patterns of the region. (Imhoff, E., 1982)

What is the analogue in the type of concept mapping described here? Perhaps it is
simply the choice of this form of concept mapping, this specific methodology, that, like
a choice of projection method in geographic mapping, is a key selective factor
(although it is difficult to see how that choice itself determines the content of a map).
In another sense, the methodology itself -- the algorithms used, the wording of the
focus statement -- limits the nature of the resulting map. Perhaps most obvious, the
concept map may be most limited in its perspective by the selection of the
participants and by the contextual circumstances that might lead them to participate
actively and offer what they really think and believe. In this sense, anonymous
participation rather than face-to-face mapping may open up the process and help to
mitigate the narrowing of ideas. On the other hand, as this case study shows,
anonymous participation has its own problems.

One of the most delimiting aspects of concept mapping is the brainstorming focus
statement. It determines what content is allowed and what is excluded. Here, again,
is the one we used in the case study:

One specific thing I think the Oregon Program Evaluators Network should
do over the next five years is…

At first glance, this isn't a very limiting statement. But let's examine it a bit more
critically. It limits the statement to a five-year horizon. That's pretty broad, but it
doesn't encompass more than a portion of the time span of most members' careers
as evaluators. The focus calls for things that can be done -- for actions. It is neutral
on whether they should ever be fully accomplished, although that it suggested. The
focus is also neutral about why the suggestions should be done. To improve OPEN?
To help members? To help control the evaluation profession in Oregon? Perhaps
most important, there are numerous potentially critical things that are outside the
scope of the focus. The focus doesn't direct you to brainstorm about the current
problems OPEN has, or the things you like about the organization. In this sense, the
focus makes a choice, one that influences everything that follows in the process.

Another place where choices are made is in the selection of the number of clusters
on the map. There is no easy way to do this mathematically, and it's not clear that it
would be desirable to do so even if there were. This is also not an easy thing to
accomplish in a group process -- most groups have neither the interest nor the patience
to examine multiple cluster solutions and make a determination of which seems best. This
judgment is usually left to the facilitator. In this case study, I followed the standard
procedure for selecting clusters (see above) and the process was fairly
straightforward. However, in other mapping processes there are times when the
maps are not as clear as in this case study, or where there are multiple cluster
solutions that would be of value. This is a bit like the cartographer who recognizes
that drawing a map at one scale has value for one purpose while drawing it at another
has value for some other. It is not that one is "right" or "wrong", "better" or "worse" in
some abstract sense. It is that the context is in part determinative of the value of
such choices.

Maps Construct -- Not Reproduce -- the World


The naïve view is that a geographical map is a window on reality.

Were it not reality, why then it would just be…opinion, somebody's idea of where your
property began and ended, a good guess at where the border was, a notion of the location
of the hundred-year flood line, but not the flood line itself. What is elided in this way is
precisely the social construction of the property line, the social construction of the border,
the social construction of the hundred-year flood line, which -- like everything else we map
-- is not a line you can see, not a high water mark drawn in mud on a wall or in debris along
a bank, but no more than a more-or-less careful extrapolation from a statistical storm to a
whorl of contour lines. As long as the map is accepted as a window on the world, these
lines must be accepted as representing things in it with the ontological status of streams
and hills. (Wood, 1992, p. 18-19)

But this is more than simply a philosophical conundrum, an abstract academic
argument. As in evaluation, one's position on this issue has dramatic real-world
consequences:

But no sooner are maps acknowledged as social constructions than their contingent, their
conditional, their…arbitrary character is unveiled. Suddenly the things represented by
these lines are opened to discussion and debate, the interest in them of owner, state,
insurance company is made apparent. (Wood, 1992, p. 19)

In concept mapping, the constructivist nature is perhaps even more apparent than
with geographical maps. But, as we utilize more and more sophisticated equipment
and rely on more advanced mathematical algorithms, the temptation increases to see
the results of all this sophistication as something accurate, a valid representation of
an implicit cognitive world.

This is a perfectly reasonable impulse. And, by making this argument we don't
presume that we should give up on the goal of making as accurate a map as
possible. We won't contend that a random concept map, thrown together with
randomly generated mathematical data is a reasonable goal any more than Wood is
likely to contend that random sketches might qualify as geographically useful guides.
The point -- for both geographical and concept maps -- is to recognize that no single
map can possibly reproduce reality. Any single map is but one of an infinity of other
potential reasonable and accurate reflections of some aspect of reality. Rather than
being a constraint, this recognition actually acts as a catalyst. For the map user, the
map becomes one view, suggests other views, should be critically viewed. The
question that every map raises, at least implicitly, is "Why was this question asked?"
or "Why was this approach taken?" or "Why was this perspective used?"

Not that accuracy is not worth achieving, but it was never really the issue, only the cover.
It is not precision that is at stake, but precision with respect to what? What is the
significance of getting the area of a state to a square millimeter when we can't count its
population? Who cares if we can fix the location of Trump's Taj Mahal with centimeter
accuracy when what would be interesting would be the dollar value of the flows from the
communities in which its profits originate? What is the point of worrying about the
generalization of roads on a transportation map when what is required are bus routes?
Each of these windows is socially selected, the view through them socially constrained no
matter how transparent the glass, the accuracy not in doubt, just…not an issue. (Wood,
1992, p. 21)

The OPEN map is clearly a construction. It is founded on an unrepresentative
sample of OPEN members and based on algorithms that inevitably introduce
distortions.[15] It addresses a very broad topic -- what OPEN might do as an
organization over the next five years -- from the perspectives of a few self-selected
individuals. These participants may have a longer-term interest in OPEN than most.
They are likely to differ in numerous ways from members who chose not to
participate. It raises a question about the organization within a specific time frame.
Others might be interested in different questions, different time frames. This doesn't
mean the map is not "accurate," only that it has a point of view from which it is
constructed. We might obtain any number of other maps that are relevant,
interpretable, or useful, from the same participants or from others. We might find
that, even if we replicate the study on different OPEN members, the results are
substantially unchanged. Or not. That issue is ultimately empirical. But the choice of
perspective, the questions asked, the experience base, are reflections of decisions
made consciously or otherwise, in the course of producing this map. Even a larger
sample (or complete participation from every member of the population of interest)
would do nothing to diminish the fact that the map is inherently a construction based
on an infinity of choices.

Maps Make the Past and Future Present

The world we take for granted -- the real world -- is made like this, out of the accumulated
thought and labor of the past. It is presented to us on the platter of the map, presented,
that is, made present, so that whatever is invisible, unattainable, erasable past or future
can become part of our living…now…here (Wood, 1992, p. 7)

This is what it means to use a map. It may look like wayfinding or a legal action over
property or an analysis of the causes of cancer, but always it is this incorporation into the
here and now of actions carried out in the past. This is no less true when those actions are
carried out…entirely in our heads: the maps we make in our minds embody experience
exactly as paper maps do, accumulated as we have made our way through the world in the
activity of our living. (Wood, 1992, p. 14)

Here, the link between geographic and conceptual maps is almost explicitly drawn.
All concept maps, like geographical ones, incorporate our past into an imperfect
agglomeration of our experiences and thoughts. The OPEN concept map represents
the past through the experiences of those who participated. It anticipates the future in
the sense that it is focused on things OPEN could do. Thus, this concept map, like
every other, is historically indexed, a snapshot in time that presents the past and the
future at one moment.

Maps Link the Territory with What Comes with It

It is this ability to link the territory with what comes with it that has made maps so valuable
to so many for so long. Maps link the territory with taxes, with military service or a certain
rate of precipitation, with the likelihood that an earthquake will strike or a flood will rise…
(Wood, 1992, p. 10)

[15] For instance, the multidimensional scaling is restricted to two dimensions when, in reality, we have no
idea how many dimensions might be needed to represent the information accurately.

Analogously, it is the ability of a concept map to link the concepts or ideas with action
that makes it so valuable. In a planning endeavor, we often use the clusters as the
basis for action. Statements in the cluster become the basis for action statements
that have specific persons assigned to carry them out, resources attached, deadlines
set. In evaluation contexts, we often use the map as the basis for measurement
construction. The cluster acts as the measurement construct. The statements are
potential or actual measurement items, operationalizations of the construct. In both
contexts, the map provides a vehicle or base for representation of the consequences
or results. It is the framework for showing progress on the strategic plan or change in
the measurement.

In thinking about geographic maps, the cartographer has reference to the implicit
geographical maps we carry around in our minds -- the mental maps -- of the
geographical landscape that we traverse. Analogously we carry around mental maps
that extend beyond simple geography, for our mental maps incorporate various
modes of our experience, the sights, smells, sounds and, especially the language and
meaning that we integrate with that experience.

Of course a mental map is … clearly related to the way we use paper maps to make
decisions. Certainly the similarities increase once we begin to externalize these maps, to
share them with each other. "What? Why would you go that way?" "Because it's shorter."
"No, No, it's shorter if you take St. Mary's to Lassiter Mill -- " "Oh, and then go out Six
forks to Sandy Forks?" "Yeah." Here the maps, in separate heads, are being consulted
almost as if they were paper maps open on the table, linking knowledge individually
constructed in the past to a shared living unfolding in the present. (Wood, 1992, p. 15-16)

Concept maps also enable individuals to negotiate through a terrain, in this case a
cognitive terrain of semantic relationships and personal experiences. The discussion
of a concept map is a useful underpinning for the exploration of shared meaning and
consensus or the identification of diversity and divergence of view. In the OPEN case
study, this discussion has not occurred (and may never occur). There was some brief
discussion of the maps produced here at the conference. The major purpose of
constructing these maps -- to illustrate the mapping process on an example of
relevance to OPEN members -- was accomplished at the conference itself.
Nevertheless, OPEN could decide to embark on strategic planning using the concept
map produced here (and, one would hope, augmenting it with broader involvement
and input).

The Interest the Map Serves is Masked


Why do so many people bristle at the idea that every map is biased, perspective-
laden, a necessary distortion?

It is because the map is powerful precisely to the extent that this author…disappears, for it
is only to the extent that this author escapes notice that the real world the map struggles to
bring into being is enabled to materialize (that is, to be taken for the world). As long as the
author -- and the interest he or she unfailingly embodies -- is in plain view, it is hard to
overlook him, hard to see around her, to the world described, hard to see it…as the world.
…As author -- and interest -- become marginalized (or done away with altogether), the
represented world is enabled to…fill our vision. Soon enough we have forgotten this is a
picture someone has arranged for us (chopped and manipulated, selected and coded).
Soon enough…it is the world, it is real, it is…reality. …When the authors have rendered
themselves transparent, suddenly we have…fresh respect. And it is astonishing how
easily this happens, how readily we take for granted -- as natural -- what is never more
than the social construction of a map. All that is required is the disappearance of the
author, the invisibility of the interest. (Wood, 1992, p. 70-71).

Concept mapping benefits from the illusion of objectivity, the appearance that there is
no one who is driving the machine, that it is a dispassionate and scientific algorithm
that determines the arrangement. But there are many places where the person
behind the algorithm enters in, and even more where the participants themselves
shape the result.

Again and again we have seen a similar vagueness of content and form, a similar diffusion
of ends and means. The map will show everything (and therefore claim innocence about
the choice of anything) and will show it as it is (ignoring the "white lies" the map must tell in
order to be accurate and truthful). The map will be seen to serve so many purposes that
none can predominate, or its means will be so widely spread in so many social institutions
that it can be claimed by none. Responsibility for the map will be shuffled off onto layered
and competing interest groups, or its authorship will be shown to be fragmented among so
many specialists as to be impossible to establish. Lying and lost, vague and confused, the
map will therefore show the world the way it really is. (Wood, 1992, p. 73)

I find this to be a particularly apt insight. There is something compelling about
concept maps. Despite the countless decisions and judgments that go into them, the
multiple perspectives and distortions, it is hard to look at a final map and refrain from
treating it as "accurate" in some sense. Our species has evolved an extremely
powerful desire to "make sense" of our environment. After all, we're able to do
elaborate interpretations even of random ink blots! A concept map shows a picture
that always appears to be very orderly and sensible. The computer generates very
crisp looking polygons to enclose groups of ideas. The layers suggest relative
importance and other characteristics. In the end, the map is often just plain
compelling. We have to remind ourselves of its frailty, its uniqueness. We have to be
honest with ourselves and the participants about how it masks various interests and
encompasses biases.

The Interest the Map Serves Can Be Yours


Some might see the perspective offered here on concept mapping and be depressed
or discouraged by it. To recognize that an apparently very powerful tool is (like all
other tools, I hasten to add) fallible and imperfect is sometimes difficult to take. I'm
sure that the recognition in the cartography community that this is so for geographic
maps has been a source of some contention and consternation there for the past few
decades.

Once we accept the inherent perspective-ladenness of every map, once we see that
each map serves interests, does the map itself lose its value?

Once the map is accepted for the interested representation it is, once its historical
contingency is fully acknowledged, it is no longer necessary to mask it. Freed from this
burden of…dissimulation…the map will be able to assume its truest character, that of
instrument for…data processing, that of instrument for…reasoning about quantitative
information, that of instrument for…persuasive argument (Wood, 1992, p. 182)

If this is the case for geographic maps, then perhaps it is even more so for concept
maps. Over the past several decades as I developed and worked with the idea of
concept mapping described here, I frequently heard from my colleagues questions
about the "validity" of the maps that were being generated. As a methodologist, I too
was interested in this question and in several instances (Dumont, 1989; Trochim,
1993) attempted to address it, with varying degrees of success. It has not been clear
how, exactly, one might methodologically best determine the accuracy or validity of
these maps.

This brief sojourn into the land of the cartographer suggests that the validity question
may be misplaced. It argues that every map is inherently inaccurate, and perhaps
even worse (or is it better?) that the issue of accuracy is not even a very interesting or
exciting one. In some sense, this recognition doesn't diminish the power of maps,
geographic or otherwise. Perhaps instead it frees the maps up, in Wood's language,
to do what they do best. The map is a powerful suggestive device. No one map can
ever carry the weight of accuracy. Multiple maps based on multiple and diverse
biases, perspectives and interests, aren't diminished because they are different.
Potentially it is those differences that add value, that deepen insight, that increase our
knowledge of the world. They assume their "… truest character, that of instrument
for…data processing, that of instrument for…reasoning about quantitative
information, that of instrument for…persuasive argument."

References
Anderberg, M.R. (1973). Cluster analysis for applications. New York, NY: Academic
Press.
Bragg, L.R. and Grayson, T.E. (1993). Reaching consensus on outcomes: Lessons
learned about concept mapping. Paper presented at the Annual Conference of
the American Evaluation Association, Dallas, TX.
Buzan, T. with Buzan, B. (1993). The Mindmap Book: Radiant Thinking, The Major
Evolution in Human Thought. BBC Books, London.
Caracelli, V. (1989). Structured conceptualization: A framework for interpreting evaluation
results. Evaluation and Program Planning. 12, 1, 45-52.
Cook, J. (1992). Modeling staff perceptions of a mobile job support program for persons
with severe mental illness. Paper presented at the Annual Conference of the
American Evaluation Association, Seattle, WA.
Cooksy, L. (1989). In the eye of the beholder: Relational and hierarchical structures in
conceptualization. Evaluation and Program Planning. 12, 1, 59-66.
Davis, J. (1989). Construct validity in measurement: A pattern matching approach.
Evaluation and Program Planning. 12, 1, 31-36.
Davison, M.L. (1983). Multidimensional scaling. New York, John Wiley and Sons.
Dumont, J. (1989). Validity of multidimensional scaling in the context of structured
conceptualization. Evaluation and Program Planning. 12, 1, 81-86.
Everitt, B. (1980). Cluster Analysis. 2nd Edition, New York, NY: Halsted Press, A
Division of John Wiley and Sons.
Galvin, P.F. (1989). Concept mapping for planning and evaluation of a Big Brother/Big
Sister program. Evaluation and Program Planning. 12, 1, 53-58.
Grayson, T.E. (1992). Practical issues in implementing and utilizing concept mapping.
Paper presented at the Annual Conference of the American Evaluation
Association, Seattle, WA.
Grayson, T.E. (1993). Empowering key stakeholders in the strategic planning and
development of an alternative school program for youth at risk of school behavior.
Paper presented at the Annual Conference of the American Evaluation
Association, Dallas, TX.
Gurowitz, W.D., Trochim, W. and Kramer, H. (1988). A process for planning. The
Journal of the National Association of Student Personnel Administrators, 25, 4,
226-235.
Harley, J.B. (1990). Text and contexts in the interpretation of early maps. In David
Buisseret (Ed.), From Sea Charts to Satellite Images: Interpreting North American
History Through Maps. Chicago: University of Chicago Press.
Imhof, E. (1982). Cartographic Relief Presentation. Berlin: Walter de Gruyter.
Kane, T.J. (1992). Using concept mapping to identify provider and consumer issues
regarding housing for persons with severe mental illness. Paper presented at the
Annual Conference of the American Evaluation Association, Seattle, WA.
Keith, D. (1989). Refining concept maps: Methodological issues and an example.
Evaluation and Program Planning. 12, 1, 75-80.
Kohler, P.D. (1992). Services to students with disabilities in postsecondary education
settings: Identifying program outcomes. Paper presented at the Annual
Conference of the American Evaluation Association, Seattle, WA.
Kohler, P.D. (1993). Serving students with disabilities in postsecondary education
settings: Using program outcomes for planning, evaluation and
empowerment. Paper presented at the Annual Conference of the American
Evaluation Association, Dallas, TX.
Kruskal, J.B. and Wish, M. (1978). Multidimensional Scaling. Beverly Hills, CA: Sage
Publications.
Lassegard, E. (1992). Assessing the reliability of the concept mapping process. Paper
presented at the Annual Conference of the American Evaluation Association,
Seattle, WA.

Lassegard, E. (1993). Conceptualization of consumer needs for mental health
services. Paper presented at the Annual Conference of the American Evaluation
Association, Dallas, TX.
Linton, R. (1989). Conceptualizing feminism: Clarifying social science concepts.
Evaluation and Program Planning. 12, 1, 25-30.
Mannes, M. (1989). Using concept mapping for planning the implementation of a social
technology. Evaluation and Program Planning. 12, 1, 67-74.
Marquart, J.M. (1988). A pattern matching approach to link program theory and
evaluation data: The case of employer-sponsored child care. Unpublished
doctoral dissertation, Cornell University, Ithaca, New York.
Marquart, J.M. (1989). A pattern matching approach to assess the construct validity of an
evaluation instrument. Evaluation and Program Planning. 12, 1, 37-44.
Marquart, J.M. (1992). Developing quality in mental health services: Perspectives of
administrators, clinicians, and consumers. Paper presented at the Annual
Conference of the American Evaluation Association, Seattle, WA.
Marquart, J.M., Pollak, L. and Bickman, L. (1993). Quality in intake assessment and case
management: Perspectives of administrators, clinicians and consumers. In R.
Friedman et al. (Eds.), A system of care for children's mental health: Organizing
the research base. Tampa: Florida Mental Health Institute, University of South
Florida.
McLinden, D. J. & Trochim, W.M.K. (In Press). From Puzzles to Problems: Assessing the
Impact of Education in a Business Context with Concept Mapping and Pattern
Matching. In J. Phillips (Ed.), Return on investment in human resource
development: Cases on the economic benefits of HRD - Volume 2. Alexandria,
VA: American Society for Training and Development.
Mead, J.P. and Bowers, T.J. (1992). Using concept mapping in formative evaluations.
Paper presented at the Annual Conference of the American Evaluation
Association, Seattle, WA.
Mercer, M.L. (1992). Brainstorming issues in the concept mapping process. Paper
presented at the Annual Conference of the American Evaluation Association,
Seattle, WA.
Novak, J.D. (1993). How do we learn our lesson? Taking students through the process.
The Science Teacher, 60, 3, 50-55.
Novak, J.D. and Gowin, D.B. (1985). Learning How To Learn. Cambridge, Cambridge
University Press.
Nunnally, J.C. (1978). Psychometric Theory. (2nd. Ed.). New York, McGraw Hill.
Osborn, A.F. (1948). Your Creative Power. New York, NY: Charles Scribner.
Penney, N.E. (1992). Mapping the conceptual domain of provider and consumer
expectations of inpatient mental health treatment: New York Results. Paper
presented at the Annual Conference of the American Evaluation Association,
Seattle, WA.
Romney, A.K., Weller, S.C. and Batchelder, W.H. (1986). Culture as consensus: A theory
of culture and informant accuracy. American Anthropologist, 88, 2, 313-338.
Rosenberg, S. and Kim, M.P. (1975). The method of sorting as a data gathering
procedure in multivariate research. Multivariate Behavioral Research, 10, 489-
502.
Ryan, L. and Pursley, L. (1992). Using concept mapping to compare organizational
visions of multiple stakeholders. Paper presented at the Annual Conference of
the American Evaluation Association, Seattle, WA.
SenGupta, S. (1993). A mixed-method design for practical purposes: Combination of
questionnaire(s), interviews, and concept mapping. Paper presented at the Annual
Conference of the American Evaluation Association, Dallas, TX.
Shern, D.L. (1992). Documenting the adaptation of rehabilitation technology to a core
urban, homeless population with psychiatric disabilities: A concept mapping
approach. Paper presented at the Annual Conference of the American Evaluation
Association, Seattle, WA.

Shern, D.L., Trochim, W. and LaComb, C.A. (1995). The use of concept mapping for
assessing fidelity of model transfer: An example from psychiatric rehabilitation.
Evaluation and Program Planning, 18, 2.
Trochim, W. (1985). Pattern matching, validity, and conceptualization in program
evaluation. Evaluation Review, 9, 5, 575-604.
Trochim, W. (Ed.) (1989). A Special Issue of Evaluation and Program Planning on
Concept Mapping for Planning and Evaluation, 12.
Trochim, W. (1989a). An introduction to concept mapping for planning and evaluation.
Evaluation and Program Planning, 12, 1, 1-16.
Trochim, W. (1989b). Concept mapping: Soft science or hard art? Evaluation and
Program Planning, 12, 1, 87-110.
Trochim, W. (1989c). Outcome pattern matching and program theory. Evaluation and
Program Planning, 12, 4, 355-366.
Trochim, W. (1990). Pattern matching and program theory. In H.C. Chen (Ed.), Theory-
Driven Evaluation. New Directions for Program Evaluation, San Francisco, CA:
Jossey-Bass.
Trochim, W. and Cook, J. (1992). Pattern matching in theory-driven evaluation: A field
example from psychiatric rehabilitation. In H. Chen and P.H. Rossi (Eds.), Using
Theory to Improve Program and Policy Evaluations. Greenwood Press, New
York, 49-69.
Trochim, W. (1993). Reliability of Concept Mapping. Paper presented at the Annual
Conference of the American Evaluation Association, Dallas, Texas, November.
Trochim, W. (1996). Criteria for evaluating graduate programs in evaluation. Evaluation
News and Comment: The Magazine of the Australasian Evaluation Society, 5, 2,
54-57.
Trochim, W. and Linton, R. (1986). Conceptualization for evaluation and planning.
Evaluation and Program Planning, 9, 289-308.
Trochim, W., Cook, J. and Setze, R. (1994). Using concept mapping to develop a
conceptual framework of staff's views of a supported employment program for
persons with severe mental illness. Journal of Consulting and Clinical Psychology, 62, 4,
766-775.
Valentine, K. (1989). Contributions to the theory of care. Evaluation and Program
Planning. 12, 1, 17-24.
Valentine, K. (1992). Mapping the conceptual domain of provider and consumer
expectations of inpatient mental health treatment: Wisconsin results. Paper
presented at the Annual Conference of the American Evaluation Association,
Seattle, WA.
Weller S.C. and Romney, A.K. (1988). Systematic Data Collection. Newbury Park, CA,
Sage Publications.
Wilford, J.N. (1982). The Mapmakers: The Story of the Great Pioneers in Cartography from
Antiquity to the Space Age. New York: Vintage Books.
Wood, D. (1992) . The Power of Maps. The Guilford Press. New York.

Table 1. Brainstormed Statements
1. Increase membership of program staff and non-researchers.
2. Develop a logo.
3. Include public-trust businesses in organization, such as public utilities or waste management,
which use GIF.
4. Recruit people working on the fringe of evaluation, e.g., people in assessment, training,
technical assistance, etc.
5. Have skill development sessions more than once every other month.
6. Increase the diversity of membership.
7. Involve other local governments (counties, city, metro).
8. Meet with Board of Directors of agencies to promote evaluation.
9. Have annual conferences.
10. Have meetings during the day to accommodate the working researcher with family
responsibilities.
11. Have committee to produce "press releases" about research in the area.
12. Provide a variety of for-fee events.
13. Involve evaluators in other parts of the state.
14. Provide information about available grants.
15. Create linkage with our neighbors to the north, and see what is going on regionally.
16. Have more facilitated roundtables.
17. Reach out to organizations receiving federal funds where evaluation is required.
18. Have follow-up information in newsletter about research opportunities originally announced via
OPEN.
19. Provide feedback on the annual AEA meeting for those of us not able to attend.
20. Include "panels" as a program format.
21. Organize mini-conference.
22. Have no-host social hour at a variety of locations.
23. Enhance linkage with other research organizations.
24. Develop a network to include programs.
25. Develop linkages with police department to promote evaluation.
26. Work on ways to include evaluators from other parts of the state.
27. Enhance linkage with American Evaluation Association.
28. Include a brief bio on each OPEN member.
29. Hold events outside the downtown Portland area.
30. Develop a newsletter, email or otherwise.
31. Provide information about available contracts.
32. Continue the bi-monthly presentations.
33. Do outreach to promote evaluation in the public and private sectors.
34. Develop special interest subgroups within the membership.
35. Use research knowledge for policy change.
36. Do more outreach.
37. Include interactive discussion with audience at program sessions.
38. Join state and county advisory boards as representative of OPEN.
39. Increase membership of auditors and accountants.
40. Continue email announcements of employment opportunity.
41. Distribute the results of research to members and agencies.
42. Circulate a list of members who are expert in various techniques and who are willing to do
brief consulting with others.
43. Have "special interest" or "special topic" brown bags.
44. Hear more from local evaluators about what their workload is like, including information about
on-going projects.
45. Have a "best practices" model available at conferences or workshops.
46. Develop student scholarship fund for conferences.
47. Develop a pool of volunteers to help agencies needing evaluation but without funding.
48. Provide a summary of the skill-building sessions in the newsletter.
49. Develop a better communication system for members and nonmembers.
50. Develop student scholarship fund for seed money for research (e.g., dissertations).
51. Continue to offer information/knowledge development sessions.
52. Add linkage with national think tanks.
53. Provide community service by offering evaluation to agencies with no funding for such
activities.
54. Develop a speakers committee.
55. Do a better job of advertising the benefits of membership in OPEN.
56. Include students and other disenfranchised groups by offering free membership.
57. Have an annual picnic for members and nonmembers.
58. Increase membership and participation of people of color.
59. Coordinate communication between email, webpage and newsletter.
60. Broaden scope of organization to include all research, not just program evaluation.
61. Increase diversity of membership to include business sectors.
62. Develop a mentor program for students and recent graduates.
63. Have a student division.
64. Develop a mentoring program for recent graduates.
65. Use "lecture" format for the program topics.
66. Develop way to monitor social health in the tri-county area and sell idea to Commissioners to
collect data annually.
67. Develop technical assistance program for members to help other members.
68. Arrange better access to libraries for OPEN members.
69. Work to develop collaboration between OPEN and agencies needing research and
evaluation.
70. Create an interactive chat-room on the website.
71. Develop scholarship for internships.
72. Obtain non-profit status.
73. Work to foster collaboration between members.
74. Plan and schedule the bi-monthly meetings an entire year in advance.
75. Get the membership more active and participatory.
76. Have meetings someplace other than the Lucky Lab.
77. Include "debate" as a format of the program.
78. Provide student scholarships to AEA conference.
79. Arrange for Continuing Education Credit for the programs.
80. Membership directory listings include areas in which members would like to develop their
skills.
81. Double the size of the membership.
82. Provide a variety of no-charge events.
83. Do outreach to higher education and public schools.

Table 2. Demographics

Employment. Where is your primary place of employment? Please select one: government; education; non-profit agency; research consultant; student; other.

Degree. What is the highest academic degree earned? Please select one: BA/BS degree; Master's degree; Ph.D./Doctorate; Other.

Discipline. What is your primary discipline? Please select one: Agriculture; Business; Economy; Education; Evaluation; Health; Human Services; Political Sciences; Psychology; Sociology; Social Work; Other.

Years in Research. How many years of experience have you had in research or related activities? Enter a value between 0 and 75.

OPEN Member. Are you a member of OPEN? Please select one: Yes; No.

AEA Member. Are you a member of the American Evaluation Association? Please select one: Yes; No.

Gender. What is your gender? Please select one: Female; Male.

Residence. In what city do you reside? Please select one: Portland; Portland east suburb; Portland west suburb; Corvallis; Salem; Eugene; Oregon - Other; Washington State; Other.
Table 3. Statements by Cluster in Descending Order by Average Rating
Recruitment

58) Increase membership and participation of people of color. 3.83


6) Increase the diversity of membership. 3.78
13) Involve evaluators in other parts of the state. 3.72
75) Get the membership more active and participatory. 3.61
55) Do a better job of advertising the benefits of membership in OPEN. 3.17
81) Double the size of the membership. 3.11
60) Broaden scope of organization to include all research, not just program evaluation. 2.94
4) Recruit people working on the fringe of evaluation, e.g., people in assessment, training, technical assistance, etc. 2.78
63) Have a student division. 2.78
1) Increase membership of program staff and non-researchers. 2.67
56) Include students and other disenfranchised groups by offering free membership. 2.67
61) Increase diversity of membership to include business sectors. 2.61
39) Increase membership of auditors and accountants. 2.22

Average Rating: 3.07

Linkages

7) Involve other local governments (counties, city, metro). 3.61


26) Work on ways to include evaluators from other parts of the state. 3.61
15) Create linkage with our neighbors to the north, and see what is going on regionally. 3.56
24) Develop a network to include programs. 2.72
3) Include public-trust businesses in organization, such as public utilities or waste management, which use GIF. 2.41

Average Rating: 3.18

PR & Community Service

35) Use research knowledge for policy change. 3.22


47) Develop a pool of volunteers to help agencies needing evaluation but without funding. 3.11
53) Provide community service by offering evaluation to agencies with no funding for such activities. 3.00
66) Develop way to monitor social health in the tri-county area and sell idea to Commissioners to collect data annually. 2.89
54) Develop a speakers committee. 2.83
11) Have committee to produce "press releases" about research in the area. 2.72
2) Develop a logo. 2.17

Average Rating: 2.85

Outreach

69) Work to develop collaboration between OPEN and agencies needing research and evaluation. 3.89
33) Do outreach to promote evaluation in the public and private sectors. 3.44
23) Enhance linkage with other research organizations. 3.33
27) Enhance linkage with American Evaluation Association. 3.33
17) Reach out to organizations receiving federal funds where evaluation is required. 3.28
36) Do more outreach. 3.28
52) Add linkage with national think tanks. 3.28
83) Do outreach to higher education and public schools. 3.11
8) Meet with Board of Directors of agencies to promote evaluation. 3.06
38) Join state and county advisory boards as representative of OPEN. 2.61
25) Develop linkages with police department to promote evaluation. 2.56

Average Rating: 3.20

Programs & Events

51) Continue to offer information/knowledge development sessions. 4.56


32) Continue the bi-monthly presentations. 4.06
43) Have "special interest" or "special topic" brown bags. 3.89
9) Have annual conferences. 3.83
10) Have meetings during the day to accommodate the working researcher with family responsibilities. 3.44
21) Organize mini-conference. 3.44
45) Have a "best practices" model available at conferences or workshops. 3.41
37) Include interactive discussion with audience at program sessions. 3.22
44) Hear more from local evaluators about what their workload is like, including information about on-going projects. 3.22
72) Obtain non-profit status. 3.17
20) Include "panels" as a program format. 3.11
16) Have more facilitated roundtables. 3.06
77) Include "debate" as a format of the program. 3.00
76) Have meetings someplace other than the Lucky Lab. 2.94
82) Provide a variety of no-charge events. 2.89
29) Hold events outside the downtown Portland area. 2.83
74) Plan and schedule the bi-monthly meetings an entire year in advance. 2.78
65) Use "lecture" format for the program topics. 2.67
12) Provide a variety of for-fee events. 2.61
22) Have no-host social hour at a variety of locations. 2.61
5) Have skill development sessions more than once every other month. 2.56
57) Have an annual picnic for members and nonmembers. 2.22

Average Rating: 3.16

Communication

40) Continue email announcements of employment opportunity. 4.56


48) Provide a summary of the skill-building sessions in the newsletter. 3.83
31) Provide information about available contracts. 3.78
14) Provide information about available grants. 3.67
19) Provide feedback on the annual AEA meeting for those of us not able to attend. 3.61
30) Develop a newsletter, email or otherwise. 3.44

59) Coordinate communication between email, webpage and newsletter. 3.11
18) Have follow-up information in newsletter about research opportunities originally announced via OPEN. 2.61
70) Create an interactive chat-room on the website. 2.00

Average Rating: 3.40

Services

42) Circulate a list of members who are expert in various techniques and who are willing to do brief consulting with others. 3.89
73) Work to foster collaboration between members. 3.78
41) Distribute the results of research to members and agencies. 3.72
67) Develop technical assistance program for members to help other members. 3.72
34) Develop special interest subgroups within the membership. 3.44
28) Include a brief bio on each OPEN member. 3.06
49) Develop a better communication system for members and nonmembers. 3.06
68) Arrange better access to libraries for OPEN members. 2.94
79) Arrange for Continuing Education Credit for the programs. 2.89
80) Membership directory listings include areas in which members would like to develop their skills. 2.56

Average Rating: 3.31

Mentoring/Scholarships

64) Develop a mentoring program for recent graduates. 3.22


62) Develop a mentor program for students and recent graduates. 3.11
46) Develop student scholarship fund for conferences. 2.94
71) Develop scholarship for internships. 2.39
50) Develop student scholarship fund for seed money for research (e.g., dissertations). 2.28
78) Provide student scholarships to AEA conference. 2.28

Average Rating: 2.70

Table 4. Statements in Descending Order by Average Rating
40) Continue email announcements of employment opportunity. 4.56
51) Continue to offer information/knowledge development sessions. 4.56
32) Continue the bi-monthly presentations. 4.06
42) Circulate a list of members who are expert in various techniques and who are willing to do brief consulting with others. 3.89
43) Have "special interest" or "special topic" brown bags. 3.89
69) Work to develop collaboration between OPEN and agencies needing research and evaluation. 3.89
9) Have annual conferences. 3.83
48) Provide a summary of the skill-building sessions in the newsletter. 3.83
58) Increase membership and participation of people of color. 3.83
6) Increase the diversity of membership. 3.78
31) Provide information about available contracts. 3.78
73) Work to foster collaboration between members. 3.78
13) Involve evaluators in other parts of the state. 3.72
41) Distribute the results of research to members and agencies. 3.72
67) Develop technical assistance program for members to help other members. 3.72
14) Provide information about available grants. 3.67
7) Involve other local governments (counties, city, metro). 3.61
19) Provide feedback on the annual AEA meeting for those of us not able to attend. 3.61
26) Work on ways to include evaluators from other parts of the state. 3.61
75) Get the membership more active and participatory. 3.61
15) Create linkage with our neighbors to the north, and see what is going on regionally. 3.56
10) Have meetings during the day to accommodate the working researcher with family responsibilities. 3.44
21) Organize mini-conference. 3.44
30) Develop a newsletter, email or otherwise. 3.44
33) Do outreach to promote evaluation in the public and private sectors. 3.44
34) Develop special interest subgroups within the membership. 3.44
45) Have a "best practices" model available at conferences or workshops. 3.41
23) Enhance linkage with other research organizations. 3.33
27) Enhance linkage with American Evaluation Association. 3.33
17) Reach out to organizations receiving federal funds where evaluation is required. 3.28
36) Do more outreach. 3.28
52) Add linkage with national think tanks. 3.28
35) Use research knowledge for policy change. 3.22
37) Include interactive discussion with audience at program sessions. 3.22
44) Hear more from local evaluators about what their workload is like, including information about on-going projects. 3.22
64) Develop a mentoring program for recent graduates. 3.22
55) Do a better job of advertising the benefits of membership in OPEN. 3.17
72) Obtain non-profit status. 3.17
20) Include "panels" as a program format. 3.11
47) Develop a pool of volunteers to help agencies needing evaluation but without funding. 3.11
59) Coordinate communication between email, webpage and newsletter. 3.11
62) Develop a mentor program for students and recent graduates. 3.11
81) Double the size of the membership. 3.11
83) Do outreach to higher education and public schools. 3.11
8) Meet with Board of Directors of agencies to promote evaluation. 3.06
16) Have more facilitated roundtables. 3.06
28) Include a brief bio on each OPEN member. 3.06
49) Develop a better communication system for members and nonmembers. 3.06
53) Provide community service by offering evaluation to agencies with no funding for 3.00
such activities.
77) Include "debate" as a format of the program. 3.00
46) Develop student scholarship fund for conferences. 2.94
60) Broaden scope of organization to include all research, not just program evaluation. 2.94
68) Arrange better access to libraries for OPEN members. 2.94
76) Have meetings someplace other than the Lucky Lab. 2.94
66) Develop way to monitor social health in the tri-county area and sell idea to Commissioners to collect data annually. 2.89
79) Arrange for Continuing Education Credit for the programs. 2.89
82) Provide a variety of no-charge events. 2.89
29) Hold events outside the downtown Portland area. 2.83
54) Develop a speakers committee. 2.83
4) Recruit people working on the fringe of evaluation, e.g., people in assessment, training, technical assistance, etc. 2.78
63) Have a student division. 2.78
74) Plan and schedule the bi-monthly meetings an entire year in advance. 2.78
11) Have committee to produce "press releases" about research in the area. 2.72
24) Develop a network to include programs. 2.72
1) Increase membership of program staff and non-researchers. 2.67
56) Include students and other disenfranchised groups by offering free membership. 2.67
65) Use "lecture" format for the program topics. 2.67
12) Provide a variety of for-fee events. 2.61
18) Have follow-up information in newsletter about research opportunities originally announced via OPEN. 2.61
22) Have no-host social hour at a variety of locations. 2.61
38) Join state and county advisory boards as representative of OPEN. 2.61
61) Increase diversity of membership to include business sectors. 2.61
5) Have skill development sessions more than once every other month. 2.56
25) Develop linkages with police department to promote evaluation. 2.56
80) Membership directory listings include areas in which members would like to develop their skills. 2.56
3) Include public-trust businesses in organization, such as public utilities or waste management, which use GIF. 2.41
71) Develop scholarship for internships. 2.39
50) Develop student scholarship fund for seed money for research (e.g., dissertations). 2.28
78) Provide student scholarships to AEA conference. 2.28
39) Increase membership of auditors and accountants. 2.22
57) Have an annual picnic for members and nonmembers. 2.22
2) Develop a logo. 2.17
70) Create an interactive chat-room on the website. 2.00

Spanish translation prepared by Roxy Silva (laossi@hotmail.com), supervised by Prof. Ricardo Aldazoro (Psychologist) and Prof. Oscar Guillermo (Licentiate in Languages), as an undergraduate thesis project for the School of Psychology of the Universidad Central de Venezuela.

The Evaluator as Cartographer

Mapping technology for showing where we are going and where we have been

William M.K. Trochim

Cornell University

Paper presented at the 1999 conference of the Oregon Program Evaluators Network, "Evaluation and Technology: Tools for the 21st Century," Portland, Oregon, October 8, 1999.

This DRAFT is not intended for citation, reference, or distribution.
© 1999, William M.K. Trochim. All rights reserved.

Introduction

The noted organizational psychologist Karl Weick used to tell a story that illustrates how critically important maps can be for groups and organizations.[1]

A group of mountain climbers was ascending one of the more challenging peaks in the Alps when they were suddenly caught in a severe snowstorm. All of them were experienced climbers, and each had his own ideas about the direction they should take to get back to base camp. They wandered around the mountain for some time, arguing about which route to follow, and every time they fell into indecision the situation grew more critical and dangerous. Finally, one of the climbers dug through his pack and found a map. They all gathered around it, began to study it, and were quickly able to determine the direction they should take. A few hours later they arrived safe and sound at the camp. As they gathered around the fire to warm themselves and talk over what had almost turned into a disaster, one of the climbers picked up the map they had used to come down out of the Alps and, looking at it more closely, realized that it was actually a map of the Pyrenees!

The map had served as an arbitration device that allowed them to get organized and reach a consensus. It gave them an apparent sense of direction and enabled them to formulate a coherent plan that led them to act in a concerted way. And yet they had been navigating with the wrong map. Perhaps the group simply got lucky, and it was mere chance that following the wrong map led them onto the "right" route; the story makes clear, after all, that they wandered around the mountain. But if they had not had some device that allowed them to pull together as a team, they might well have perished amid the confusion and conflict. Of course we would have preferred that our climbers had the "right" map; but in the middle of a storm, even the wrong map can often lead to better decisions than no map at all.

[1] I remember hearing this story but do not have the exact citation for it. I have tried to be as accurate as possible in preserving the original details. If anyone does have the citation, please contact me.
In an intriguing and fascinating work published nearly two decades ago, Nick Smith (1981) described some interesting metaphors that invite reflection on the task of evaluation. He showed that it is useful to think of what evaluators do as analogous to the work done in other professions and fields with similar functions, and even in some that would seem very different. For example, he compared aspects of evaluation to what lawyers do when they argue a case, or to what art critics do when they analyze a work of art. His intention was not to suggest that evaluators really are like lawyers or art critics, but only that some aspects of evaluators' work share relevant features with those disciplines.

This paper attempts to apply that notion to a field Smith did not consider: cartography. It suggests that one way to approach evaluation is to think of it as an activity much like mapmaking. Like the cartographer, the evaluator gathers information, even though it is not geographic in nature.[2] Like the cartographer, the evaluator also analyzes and represents information, decides how best to represent it, works to minimize bias, and depicts perspectives graphically. Like the cartographer, evaluators hope that their representations will be useful in guiding others and in helping them make better-informed decisions. Moreover, during the two decades since Smith's work was published, we have seen a form of analysis and representation emerge within evaluation that is itself called "mapping." These are not geographic maps but maps of ideas and of data. The evaluator facilitates the creation of these maps as a complement and an alternative to the more traditional tabular, numerical, and textual ways of representing information. And, just as with geographic maps, the evaluator hopes that concept and data maps will help inform and guide others and allow them to make better decisions.

[2] The reader should note that this paper does not address the important and growing use of geographic information systems in evaluation. The "mapping" of the metaphor refers to maps of data and concepts of the kind described in this paper.

As with Smith's metaphors, this paper does not mean to suggest that evaluation and cartography are the same thing, or that one is somehow a subset of the other. The two fields differ in many respects. Nevertheless, this paper tries to show that, especially given the emergence of mapping techniques in evaluation, evaluators will benefit from considering what cartographers have done and where they appear to be heading. Evaluators who use mapping will be better informed and better prepared as they learn how cartographers deal with problems of context, purpose, methodology, bias, and representation in their own field. This paper is meant to help launch that interdisciplinary effort.

Ordinary maps and concept maps

The idea of "maps" goes back to the beginnings of human history, and probably even predates recorded history (Wilford, 1982). The tendency to depict graphically the difference between here and there, or to show people how to get from one place to another, is a fundamental human characteristic. The historical record suggests that "...the map evolved independently among many peoples in many separate parts of the earth" (Wilford, 1982). Throughout history, maps have played important roles: giving directions, marking property boundaries, staking claims, and demonstrating the power of states.

When most people think of maps, what first comes to mind is geographic imagery, along with the rather naive assumption that what the map represents is an external physical reality depicted more or less exactly as the cartographer drew it. We will see that this is not the view that prevails among most cartographers, although it persists among the general public.

We can distinguish several basic types of maps, both to appreciate how the conception of what a map is has evolved and to see how far we have moved beyond the geographic fundamentalism usually attributed to cartography. To that end, we will develop a simple classification system that will probably be superseded in the final version of this paper, but that can help point to some meaningful distinctions.

Let us begin with a simple distinction between the "base" of a map and any additional information that is laid over that base. In addition, we will distinguish between a base that is geographic and one that represents some arrangement of interrelated elements. This gives us, roughly, four types of maps:

                     Without information        With information
Geographic base      Geographic map             Geographic information map
Relational base      Relational map             Relational information map

Any geographic map (whether a road map, a geological survey map, or most of the maps in an ordinary atlas) can be considered geographically "isomorphic." That is, there is presumed to be a one-to-one relationship between the information represented on the map and some external physical reality. The isomorphism can never be exact (the representation is always in some respect in error). For example, the star symbol on the road map shown in Figure 1 indicates where my house is located geographically. The star itself is only a symbol that stands for the house (it does not mean that my house is shaped like a star, or that it is anywhere near as large as the symbol on the map); nevertheless, the map presumes a one-to-one correspondence (that is, an isomorphism) between the objects on the map and some element of geographic reality. If you go to the place indicated on the map, you will find my house. The correspondence is: "star" = "house," "line" = "road," "blue line" = "water," and so on.

Figure 1. A first-level map with geographic isomorphism.

A second-level map is one that keeps its geographic base but also displays other elements; we might call this type of map a "geographic information map." The idea is to represent some characteristic with reference to the geographic base. For example, consider the map in Figure 2, which graphically depicts earthquakes in the 48 contiguous United States. If you used this map to navigate geographically to a location shown in white (indicating a high-risk area), you probably would not "find" an earthquake there in the same sense that you would "find" my house by following the geographic cues in Figure 1. In most information maps, the "information" is not itself geographic; it is simply something that can usefully be represented with respect to geography.

Figure 2. A 1989 computer-generated map showing earthquake-prone areas; high-risk areas appear as white peaks. Courtesy of Melvin L. Prueitt, Los Alamos National Laboratory. Data from the U.S. Geological Survey.

The information can be even more abstract than earthquakes. For example, we might plot specific crimes on a geographic base, as in the map of Evansville, Indiana, shown in Figure 3. Again, although there is a geographic basis for the information represented on the map, no one would expect to go to a specific point on this map and come upon the event marked there. The geographic base is used as a frame of reference for displaying the crime information in a way that makes sense visually.

Figure 3. Crime map of Evansville, Indiana, showing crimes for the weekend of September 22, 1999.

Let us move up another level of abstraction. In Figure 4 we see crime information broken down by state. Here the information is even more abstract, and far more aggregated, than in Figure 3. We could not expect to travel to the state of Florida and directly "see" the higher crime rate indicated by the darker shade assigned to that state. The geographic base is used as a frame of reference for representing information about some other variable or measure. It is important to recognize how much judgment is embedded in the information shown on this map. What do we mean by "serious" crime for the purposes of the map? How accurately can we assume these crimes were actually measured? How were crime levels divided into five categories and assigned their different shadings? And, finally, how different might the map look if the information were displayed by county?

Figure 4. Serious crimes committed in each state (darker areas indicate a higher rate of serious crime).

The kind of geographic information map shown in Figure 4 is an invention of only the last few centuries. The idea of using geographic maps as a base for representing other, non-geographic information is a remarkably "modern" one. Notice that we can take ever more abstract characteristics as the object of representation. For example, instead of measuring crime we could measure people's perceptions of safety. We could conduct a public opinion survey asking people to rate subjectively how safe they feel in their communities, and then draw a geographic map of that information. In this case, respondents' perceptions of safety might or might not bear any resemblance to the statistical pattern of crime in the same geographic area.
If we move away from the geographic base, we arrive at a third type of map, which will be called here a "relational map." In this kind of map the base is not geographic. For example, if we asked a person to "map" different types of crimes, grouping together the crimes they consider most alike and placing them on the map accordingly, we would obtain a relational representation. The "base" of this map would consist not of geographic entities but of crime entities. The crimes would be located nearer to or farther from one another according to that person's judgments about how similar the crimes are. The implicit "landscape" of such a map is the person's relational, perceptual, and subjective arrangement of crimes and types of crime.

Any map that uses non-geographic, relational information as its base can be classified as a relational map. If the intent of the map is to represent accurately some implicit cognitive reality, we can call it a relationally isomorphic map. In that case, each symbol has a one-to-one relationship with an idea or construct, and the arrangement of symbols on the map shows how the ideas are thought to be interrelated.

Over the past few decades we have seen a great many maps of this kind developed, from the "mind maps" advocated by Tony Buzan (Buzan, 1993), illustrated in Figure 5, to the concept maps developed by Joe Novak (Novak and Gowin, 1985; Novak, 1993), exemplified in Figure 6. In both types of map, the intent is to represent a person's thinking graphically. The relational structure is based on lines that represent relationships between ideas. No attempt is made to develop a meaningful Euclidean layout in which relationships are represented geographically; instead, the connecting lines carry the relational information.

In most cases, these kinds of maps are individual renderings. For groups, there is no algorithm for aggregating them; the group must do that work interactively.

Figure 5. A stylized example of a mind map in the style of Tony Buzan (Buzan, 1993), showing associations among researchers' ideas for a research project.

Figure 6. An example of a relationally isomorphic concept map in the style of Novak and Gowin (1985), on the topic of Saint Nicholas.

Finally we come to the type of maps described in this paper as concept maps. Illustrations of these maps are presented later. They too are relational maps, but here the relational base provides the structure on which additional information can be carried or represented. In this sense we can describe them as relational information maps. These concept maps are typically group products. They use mathematical algorithms that aggregate individual judgments about the similarity of ideas and represent the ideas as symbols arranged in a Euclidean space. In this respect they resemble geographic maps: the distance between symbols is meaningfully interpretable as an empirical estimate of the semantic distance between the ideas.

The purpose of this classification system is not so much to supply useful terms for distinguishing types of maps as to show that there is a continuum between traditional geographic maps and the more recently devised concept maps; seen this way, the two may appear less different than they do at first glance. Given this relationship between traditional geographic maps and the concept maps that are the subject of this paper, we should take an interest in cartography and its more recent methodological thinking, in order to gain insights that can help us understand more deeply the quality, interpretability, and validity of concept mapping.

First, however, it is necessary to describe the basic procedure for constructing a concept map and to present some of the details of a case-study example. With that as prologue, we can examine some of the major issues that contemporary cartography is grappling with.
Concept Mapping

Concept mapping is a process that can be used to help people describe their ideas about any topic of interest (Trochim, 1989a) and to represent those ideas visually in the form of a map. The process typically requires participants to brainstorm a large set of statements relevant to the topic of interest, to sort those statements into piles according to their similarity, to rate each statement on some scale, and then to interpret the maps that result from the analysis of this information. The analyses typically include a two-dimensional multidimensional scaling (MDS) of the unstructured sort data, a hierarchical cluster analysis of the MDS coordinates, and the computation of average ratings for each statement and for each cluster of statements. The resulting maps display the individual statements in two-dimensional (x,y) space, with more similar statements located nearer to each other, and show how the statements are grouped into clusters that partition the space of the map. Participants are led through a structured series of interpretation sessions designed to help them understand the maps and to label them in a substantively meaningful way.
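The paper itself contains no code, but the statement and cluster averages just mentioned (the kind of values shown in Tables 3 and 4) are easy to illustrate. The following is a minimal sketch in Python, not part of the original paper; the rating values and cluster assignments are invented purely for illustration.

    import numpy as np

    # Hypothetical data: 4 participants rate 6 brainstormed statements on a
    # 1 (not at all important) to 5 (extremely important) scale.
    ratings = np.array([
        [4, 3, 5, 2, 4, 1],
        [5, 3, 4, 2, 3, 2],
        [4, 4, 5, 1, 4, 2],
        [5, 2, 4, 2, 3, 1],
    ])

    # Hypothetical cluster assignment for each statement (in a real project
    # this would come from the hierarchical cluster analysis of the MDS
    # coordinates).
    clusters = np.array([0, 1, 0, 2, 1, 2])

    # Average rating per statement (the kind of value listed in Table 4).
    statement_means = ratings.mean(axis=0)

    # Average rating per cluster: the mean of its statements' averages
    # (the kind of value listed in Table 3).
    cluster_means = {c: statement_means[clusters == c].mean()
                     for c in np.unique(clusters)}

    print(statement_means)
    print(cluster_means)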
The concept mapping process described in this paper was first outlined by Trochim and Linton (1986). Trochim (1989a) describes the process in detail, and Trochim (1989b) presents a wide range of example projects. Concept mapping has been used widely to address important problems in social services (Galvin, 1989; Mannes, 1989), mental health (Marquart, 1992; Marquart et al., 1993; Penney, 1992; Ryan and Pursley, 1992; Shern, 1992; Trochim, 1989a; Trochim and Cook, 1992; Trochim et al., in press; Valentine, 1992), health care (Valentine, 1989), education (Grayson, 1993; Kohler, 1992; Kohler, 1993), educational administration (Gurowitz et al., 1988), training development (McLinden and Trochim, in press), and theory development (Linton, 1989; Witkin and Trochim, 1996). A considerable amount of methodological work has been done on the concept mapping process and its potential utility (Bragg and Grayson, 1993; Keith, 1989; Lassegard, 1992; Marquart, 1989; Mead and Bowers, 1992; Mercer, 1992; SenGupta, 1993; Trochim, 1985, 1989c, 1990, 1993).

How a Concept Map Is Made

Concept mapping combines a set of group processes (brainstorming, and the unstructured sorting and rating of the brainstormed items) with several multivariate statistical analyses (multidimensional scaling and hierarchical cluster analysis), and it concludes with the participants' own interpretation of the resulting concept maps.

In a typical situation, concept mapping begins with the formulation of a focus statement that guides and bounds the scope of the map. A set of statements that address that focus is then produced, usually through some brainstorming mechanism.[3] Two kinds of data are normally collected about these statements. First, each participant is asked to sort the statements into piles of similar ones; that is, to perform an unstructured similarity sort. This sorting is essential, because a concept map cannot be constructed without sort data. Second, and usually although not necessarily, each participant is asked to rate each statement on one or more variables. Most often each statement is rated for its relative importance, typically on a scale from 1 (not at all important) to 5 (extremely important). The rating information is not used to produce the base map itself; it is used only as a layer superimposed on a map that was constructed from the sort data.

[3] The concept mapping methodology knows, or cares, little about how the statements are generated. They can be extracted from existing documents, generated by a single individual, developed from an interview transcript, and so on. All the method requires is that there be a set of statements. Of course, the interpretation a person gives the maps will depend crucially on how those statements were generated. The current version of the Concept System software accepts up to 200 statements on a map, although a group of participants would be unlikely to be comfortable working with much more than 100 or so.

The first step in the analysis is to transform each participant's sort into quantitative information. The challenge at this stage is to find a way to "aggregate" or combine the information from the different participants, because different individuals will have different numbers of sort piles. The solution is to cast each person's sort into a matrix of the same size. Figure 7 illustrates this with a simple example involving a single participant and a sort of ten statements. In this example, the person sorted the ten statements into five piles. Other participants might have divided them into more or fewer piles, but in every case they all sorted the same number of statements, namely ten. We therefore construct a 10 x 10 matrix, or table of numbers, for each person. Each individual's table is binary: it consists only of zeroes (0) and ones (1). If two statements were placed together in a pile, the cell at their corresponding row and column gets a 1. If they were not placed together, the value where their row and column meet is a 0. Because a statement is always sorted with itself, the diagonal of the matrix always consists of ones. The matrix is symmetric because, for example, if statement 5 is sorted with statement 8 then it is always also true that statement 8 is sorted with statement 5. Thus, the concept mapping analysis begins by coding each person's sort data as an N x N (where N = the number of statements) binary, symmetric matrix of similarities, Xij. For any two items i and j, a 1 is placed in Xij if the participant put the two items in the same pile; otherwise a 0 is entered (Weller and Romney, 1988, p. 22).
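As a concrete illustration of this recoding step, here is a short sketch (mine, not the paper's) that turns one participant's sort, given as a hypothetical list of piles of statement indices, into the binary N x N similarity matrix just described.

    import numpy as np

    def sort_to_binary_matrix(piles, n_statements):
        """Recode one participant's unstructured sort as an N x N binary
        similarity matrix X: X[i, j] = 1 if statements i and j were placed in
        the same pile. The diagonal is always 1 and the matrix is symmetric."""
        X = np.zeros((n_statements, n_statements), dtype=int)
        np.fill_diagonal(X, 1)
        for pile in piles:
            for i in pile:
                for j in pile:
                    X[i, j] = 1
        return X

    # Hypothetical sort of ten statements (numbered 0-9) into five piles.
    piles = [[0, 2, 5], [1, 8], [3, 4, 9], [6], [7]]
    print(sort_to_binary_matrix(piles, 10))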

Figure 7. Transforming sort data into a binary, square similarity matrix.

Now, with this simple transformation of the sorts into matrix form, we have a common data structure with the same dimensions for every participant, and this matrix allows us to aggregate the sorts of all the participants numerically. Figure 8 shows what this can look like when the sorts of five participants, each of whom sorted the same set of ten statements, are combined. In effect, the individual matrices are "stacked" on top of one another and added together. Any cell in the combined matrix can therefore take integer values between 0 and 5 (the number of people who did the sorting); the value indicates how many people placed the pair of statements i,j in the same pile. Thus, in this second stage we obtain the total N x N similarity matrix, Tij, by summing the individual Xij matrices.
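A minimal sketch of this aggregation step (again mine, not the paper's, with made-up sorts): the individual binary matrices are summed to give the total similarity matrix T.

    import numpy as np

    def sort_to_binary_matrix(piles, n_statements):
        # Same helper as in the previous sketch.
        X = np.zeros((n_statements, n_statements), dtype=int)
        np.fill_diagonal(X, 1)
        for pile in piles:
            for i in pile:
                for j in pile:
                    X[i, j] = 1
        return X

    # Hypothetical sorts from three participants over the same ten statements.
    sorts = [
        [[0, 2, 5], [1, 8], [3, 4, 9], [6], [7]],
        [[0, 2], [1, 5, 8], [3, 4], [6, 7, 9]],
        [[0, 2, 5, 8], [1], [3, 9], [4, 6, 7]],
    ]

    # Total similarity matrix: T[i, j] is the number of participants who put
    # statements i and j in the same pile (from 0 up to the number of sorters).
    T = sum(sort_to_binary_matrix(p, 10) for p in sorts)
    print(T)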

Figure 8. Aggregating the sort data from five participants.

The figure above shows precisely this total similarity matrix Tij, which is analyzed with nonmetric multidimensional scaling (MDS) using a two-dimensional solution. The solution is limited to two dimensions for the reasons given by Kruskal and Wish (1978):
Since it is generally easier to work with two-dimensional configurations than with those involving more dimensions, ease of use considerations are also paramount for decisions about dimensionality. For example, when an MDS configuration is desired primarily as the foundation on which to display clustering results, then a two-dimensional configuration is far more useful than one involving three or more dimensions (p. 58).
This analysis yields a two-dimensional (x,y) configuration of the set of statements based on the criterion that statements piled together by more people are located closer to one another in two-dimensional space, while those piled together less frequently are farther apart. Figure 9 shows the similarity matrix that goes into the analysis and the most basic output, the "point map."

Figure 9. Input to and output of the mapping analysis.
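A rough sketch of this analysis step, assuming scikit-learn's nonmetric MDS and a simple similarity-to-dissimilarity conversion (the conversion rule is an assumption made for illustration, not the published procedure):

```python
import numpy as np
from sklearn.manifold import MDS

def similarity_to_dissimilarity(T):
    """Convert an aggregate similarity matrix to dissimilarities.

    Assumed conversion: subtract each count from the maximum possible
    co-occurrence count and zero the diagonal.
    """
    D = T.max() - T.astype(float)
    np.fill_diagonal(D, 0.0)
    return D

D = similarity_to_dissimilarity(T)   # T from the previous sketch

# Nonmetric MDS, constrained to two dimensions as in concept mapping
mds = MDS(n_components=2, metric=False, dissimilarity="precomputed",
          random_state=0)
points = mds.fit_transform(D)        # (N, 2) x,y coordinates for the point map
print(points)
print("stress:", mds.stress_)
```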

Multidimensional scaling (MDS) is the analytic procedure that produced the basic map shown in Figure 9. How is this accomplished? There are a number of mathematical descriptions of the process (Davison, 1983; Kruskal and Wish, 1978) that will not be repeated in this paper. Instead, we offer a nonmathematical explanation that we hope gives readers more insight into how MDS does its job.
An intuitive way to understand what MDS does is to consider the reverse of what it accomplishes. As described in Figure 9, MDS takes a square matrix of (di)similarities4 among a set of items/objects as input and produces a map as output5. To see how this is done, think of the much more intuitive task of going in the opposite direction: starting from a map and producing a table of (di)similarities from it. Figure 10 shows a map of the United States on which three of its major cities can be seen. The cities are the "objects"; that is, the points on this map are analogous to the statements on a concept map. How could one construct a table of (di)similarities from a two-dimensional map like this? The simplest way would be to use a ruler to measure the distances (in any unit) between all possible pairs of the three points. The figure shows the distances obtained (in inches) and the resulting table of dissimilarities. This is the reverse of what MDS does, but it is a familiar task, easily understood by anyone who has ever worked with a map.

Figure 10. The reverse of MDS: going from a map to a table of (di)similarities

4
The term (di)similarities is used in the MDS literature to indicate that the data can consist of either similarities or dissimilarities (differences). In concept mapping, the data are always the square, symmetric similarity matrix typically obtained from the sort data.
5
The "map" is the arrangement of points representing the locations of the objects in N-dimensional space; in concept mapping the objects are the statements generated through brainstorming or other means, and the map MDS produces is the two-dimensional point map.

Now consider how MDS works in the context of this example. MDS would begin with a matrix of distances between the three cities and produce a map showing the three cities as points. Figure 11 shows how this would work. We begin with an ordinary table, easily obtained from an almanac or atlas, showing the straight-line air distances (in miles) between three cities. The goal of MDS is to turn this information into a map. We will limit the result to two dimensions because that is the typical solution used in concept mapping. How could this be done by hand? The air distances in the table range from 713 to 2,451 miles. These units are not very convenient for drawing a map on a piece of paper, so as a first step the air distances can be converted into a unit of measurement more practical for plotting. In the figure, the conversion to inches uses a scale of 1 inch for every 500 air miles. Working in inches, we can now manage with paper and pencil. The objective is to place three points on the paper so that the distances between the points best represent the distances in inches in the table. One could begin by placing a single point; suppose a point is placed on the paper to represent Los Angeles. Next, a second point is placed. It does not matter whether New York or Chicago is selected; suppose New York is chosen arbitrarily. Where should the point go? According to the distance table, New York needs to be placed 4.90" from Los Angeles (note that it does not matter in which direction, only that the distance from New York to Los Angeles is exactly that). The figure shows a ruler indicating where the point representing New York would be placed.

Figure 11. Obtaining a two-dimensional map by hand from a matrix of similarities

All that remains is to place the point for the third city, in this case Chicago. Where should it go? The point must simultaneously be 3.49" from Los Angeles and 1.43" from New York. Figure 11 shows how these conditions can be met by using a compass to draw an arc at 3.49" from Los Angeles and another at 1.43" from New York. The arcs intersect at two points, and either of these would be an equally acceptable location for the point representing Chicago if the goal is simply to represent the distances.
With three cities, constructing a map from a table of distances is quite easy. But what if there were four, ten, or even 100 cities? The process would quickly become tedious and, after the first few points were placed, virtually impossible to do by hand. In concept mapping we usually generate many ideas; sometimes there are as many as 100 or even 150 ideas that need to be represented on a map. The input needed for the mapping (the analogue of the air-distance table) is the matrix of similarities among the statements obtained from the sorting task, as described in Figure 8. The output is the point map of the statements. MDS accomplishes mathematically a process analogous to the one you would carry out by hand in our simple three-city example, except that it can do so for a table involving a hundred cities or more.
Several important insights emerge from this simple description of what MDS does. MDS does not handle direction the way a compass would; it does not distinguish north from south. In the example in Figure 11, MDS could just as well locate Chicago at either of the two intersection points, and could place Los Angeles on the right as easily as on the left. This means that when you look at a concept map produced by MDS, the directions on the map are entirely arbitrary. You can flip a concept map horizontally or vertically and/or rotate it as much as you like, clockwise or counterclockwise, without any effect on the distances between the points. This simple exercise shows that MDS produces a relational picture that is independent of directional orientation.
In our three-city example there will always be a two-dimensional solution that reproduces the table exactly, with no error. With more points the situation changes: it will probably not be possible to represent the (di)similarities exactly, without any error. Working in two dimensions, some (di)similarity matrices can be represented more accurately than others. In MDS, the overall degree of correspondence between the input (i.e., the (di)similarity matrix) and the output (i.e., the distances between points on the map) is estimated with a value called the stress value. A lower stress value indicates a better fit, while a higher one means the fit is less exact. In general, lower stress is desirable, although it is not always clear whether small differences in stress translate into meaningful differences in the interpretability of a map. The normative range of stress values for a particular study should be judged through comparisons with similar kinds of data collected under similar conditions. One could never expect the stress value in a concept mapping study of 100 ideas or statements to come anywhere near the low value that would be obtained when mapping distances among 100 cities! Nor would we expect stress to be as low if we mapped cities not in terms of air distances but in terms of measures with more variability, such as crime rates, annual rainfall, or even perceived quality of life. In a study of the reliability of concept maps, Trochim (1993) reported that the average stress value across 33 concept mapping projects was 0.285, with a range from 0.155 to 0.352. Even though the stress value has some interpretive utility, in that it gives a sense of how precisely a map represents its input relative to other maps, it is not clear that maps with lower stress values can be interpreted or used any better than those with considerably higher stress.
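For readers who want to see what such a fit measure looks like, here is a simplified sketch of Kruskal's stress-1 computed from an input dissimilarity matrix and an output configuration. It substitutes the raw dissimilarities for the monotonically transformed disparities of full nonmetric MDS, so it is only an assumed approximation of a reported stress value:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def kruskal_stress_1(D, points):
    """Stress-1: sqrt( sum (d_in - d_out)^2 / sum d_in^2 ).

    D      : NxN input dissimilarity matrix
    points : Nx2 configuration returned by MDS
    """
    d_in = squareform(D, checks=False)   # upper-triangle input values
    d_out = pdist(points)                # distances between points on the map
    return np.sqrt(np.sum((d_in - d_out) ** 2) / np.sum(d_in ** 2))

# Example: stress of the configuration from the earlier MDS sketch
# print(kruskal_stress_1(D, points))
```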
The discussion to this point has shown how the concept mapping analysis uses the sort data and MDS to produce the basic "point map" that underlies all the other maps. As useful as this analysis is by itself, it also helps to be able to view a concept map at different levels of detail. Just as with geographic maps, there are times when considerable detail is desired (e.g., when hiking or climbing) and others (e.g., when driving across several states) when a higher-level map is more useful. The point map produced by MDS is a fairly detailed map. To obtain a higher-level map that summarizes much of that detail, a procedure known as hierarchical cluster analysis is applied. The input to the cluster analysis is the point map, that is, the specific X,Y values for every point on the MDS map. Using the MDS configuration as input forces the cluster analysis to partition the MDS configuration into non-overlapping clusters in two-dimensional space. Unfortunately, mathematicians do not agree on what mathematically constitutes a cluster and, as a consequence, there is a wide variety of cluster analysis algorithms that yield different results. In concept mapping, a hierarchical cluster analysis is usually performed using Ward's algorithm (Everitt, 1980) as the basis for defining a cluster. This algorithm has the advantage of being especially well suited to the kind of distance data obtained from an MDS analysis. Hierarchical cluster analysis takes the point map and builds a "tree" that at one end has all the points together (at the trunk) and at the other has every point at the tip of its own "branch." All hierarchical cluster analysis approaches fall into two basic types: agglomerative and divisive. In the agglomerative type, the procedure begins with each point at the tip of its own branch and determines which two points will be the first to be merged. At each successive clustering iteration the algorithm applies a mathematical rule to determine which two points and/or clusters will be combined next. In this way the procedure agglomerates the points until they all form a single cluster. Divisive hierarchical cluster analysis works in the opposite direction: it begins with all the points together and then decides, based on a mathematical rule, how to split them into clusters until every point forms its own cluster. Ward's method is an agglomerative one.
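A minimal sketch of this step, assuming SciPy's implementation of Ward's method applied to the MDS coordinates from the earlier sketch:

```python
from scipy.cluster.hierarchy import linkage, fcluster

# `points` is the (N, 2) MDS configuration from the earlier sketch.
# Ward's method on the X,Y coordinates builds the full agglomerative tree.
Z = linkage(points, method="ward")

# Cut the tree to obtain, say, a nine-cluster solution for the cluster map.
cluster_ids = fcluster(Z, t=9, criterion="maxclust")
print(cluster_ids)    # cluster membership (1..9) for each statement
```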

Figure 12. A suggestive hierarchical cluster tree for a point map, showing how a nine-cluster solution can be obtained by slicing across the tree.

Figure 126 suggests what happens in a hierarchical cluster analysis. The numbers show the locations of the statements on a point map produced by MDS. Each statement sits at the tip of a branch. The tree shows how the statements are progressively merged until they finally combine into a single trunk (a one-cluster solution). Different cluster solutions can be seen by slicing across the tree at different heights. The figure shows a cut from which nine distinct clusters of statements are obtained. Figure 13 shows the resulting nine-cluster solution in the form of the two-dimensional concept map we call a "point-cluster" map.

6
This figure is called "suggestive" because it conveys the idea of a cluster tree only visually; some dimensionality is deliberately distorted for graphic purposes.

Figure 13. Two-dimensional point-cluster map showing a total of nine clusters.

In most contexts where concept mapping is applied, it is not useful to show the entire cluster tree when interpreting the results. Just as in geographic mapping, it is not feasible to show every level of detail at once. The cartographer makes decisions about the scale and detail to display according to the use the map is intended to serve. Likewise, in concept mapping the facilitator, often together with a small group of participants, decides how many clusters will be used in the maps that are produced and presented. There is no mathematical criterion by which a specific number of clusters can be selected. The procedure generally used in concept mapping is to begin by examining an initial cluster solution that appears, on inspection, to be at the upper end of what would be desirable for interpretation in the given context. Successively smaller cluster solutions are then examined (i.e., successive cuts moving toward the trunk of the tree), applying a judgment at each level about whether the merger that occurs seems reasonable, defensible, or desirable. The pattern of these judgments about the suitability of the different cluster solutions, together with their results, is weighed in deciding on a specific cluster solution that seems appropriate for the purposes of the project. In some projects several such cluster maps are produced in order to illustrate different levels of aggregation.
All that remains in the core concept mapping analysis is the incorporation of any rating data or other measures that may have been agreed upon. Note that the only information needed to produce the basic point and cluster maps described above is the sort data. Rating data are typically used in concept mapping to provide a third dimension, a vertical impression that graphically depicts the "height" of different areas of the map. For example, if a simple rating of the relative importance of each statement was obtained from the participants as part of the structuring phase, then the average importance assigned to each statement can be plotted by raising7 a third dimension at each point on the point map. Similarly, the importance of each cluster can be shown by using "layers" on the clusters to indicate the average importance assigned to all the statements in the cluster. Examples of several of these maps are presented below.

7
After several years of experimenting with this approach, there seems to be a consensus in concept mapping that a pseudo-three-dimensional rendering is easier to interpret than a true, technically more accurate, three-dimensional representation.
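A small sketch of how these "heights" might be computed, with an invented ratings matrix and cluster assignment used purely for illustration:

```python
import numpy as np

# Hypothetical ratings matrix: rows = participants, columns = statements,
# values = importance on a 1-5 Likert-type scale.
ratings = np.array([
    [3, 5, 4, 2, 1, 4, 5, 3, 2, 4],
    [4, 4, 3, 2, 2, 5, 5, 2, 3, 4],
    [2, 5, 4, 3, 1, 4, 4, 3, 2, 5],
])

# "Height" of each point: average importance across participants.
statement_means = ratings.mean(axis=0)

# "Height" of each cluster layer: average of its statements' means,
# using a cluster assignment such as the one produced by the Ward sketch.
cluster_ids = np.array([1, 1, 1, 2, 2, 3, 3, 3, 2, 1])
cluster_means = {c: statement_means[cluster_ids == c].mean()
                 for c in np.unique(cluster_ids)}
print(statement_means)
print(cluster_means)
```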

An Example of Concept Mapping

This case study illustrates the use of concept mapping as the basis for strategic planning for the Oregon Program Evaluators Network (OPEN). The study was undertaken as an informal demonstration project and was carried out prior to the annual OPEN conference, where the results were to be presented and discussed. OPEN is an organization for researchers from Oregon and Washington who are involved or interested in program evaluation. It enables its members to exchange ideas and information that facilitates and promotes the practice of high-quality evaluation. OPEN's founding members represent government agencies, universities, and private consulting firms. OPEN's mission is to:

Promote an interdisciplinary regional forum for professional development, networking, and the exchange of practical, methodological, and theoretical knowledge in the field of evaluation.

Participants
Approximately 325 OPEN members on an e-mail list were contacted by e-mail and invited to participate in the study. Because all activities up to the map-interpretation phase were conducted over the Internet, participants were self-selected. Since the brainstorming was anonymous, there was no way of knowing exactly how many participants generated the 83 brainstormed statements. The structuring phase of the process requires participants to log into the system, so the sample size for that phase can be determined precisely. Twenty-three people successfully registered on the website, and 22 of them answered at least some of the demographic questions, which take only a few seconds to complete. The sorting and rating tasks are more demanding: the sorting task can take from 45 minutes to an hour, and rating a set of 83 statements typically requires 10 to 15 minutes. Of the 23 people who registered, 17 successfully completed the sorting task and 18 completed the rating task.

Procedure
The general concept mapping procedure is described in detail in Trochim (1989a), and Trochim (1989b) provides examples of results from numerous concept mapping projects. The process in this case was carried out over the World Wide Web using the Concept System© Global web application. The program can be used from any machine with Internet access and any reasonably current web browser, such as Internet Explorer or Netscape Navigator (version 2 or higher). No software needs to be downloaded to the client computer, nor are temporary applets or other programs transferred automatically.
Data collection for the project took place in two phases between September 4 and October 1, 1999. In the first phase (September 4-17), participants were asked to brainstorm statements in response to a specific prompt. In the second phase (September 20 - October 1), participants were asked to sort and rate the statements and to provide basic demographic information.

Phase I: Generation of the Conceptual Domain. In the first phase, participants generated statements using a structured brainstorming process (Osborn, 1948) guided by a specific focus prompt that limits the kinds of statements that are acceptable. The focus prompt for generating statements asked participants to complete the following sentence:

I think that over the next five years the Oregon Program Evaluators Network should specifically address the task of...

The brainstorming interface is illustrated in Figure 14. Participants simply had to point their browsers to the project web page (no software other than the user's usual web browser and Internet access was required on their computers).
Participants generated 83 brainstormed statements; Table 1 shows the complete set. When the brainstorming period ended, two leaders of the client group were asked to review the brainstormed statements to make sure they contained no grammatical or spelling errors. The web page for Phase II of the project was then prepared.

Figure 14. Web-based brainstorming interface for statement generation using the Concept System Global software

Phase II: Structuring the brainstormed statements. As in Phase I, this second phase was conducted entirely over the Internet. The structuring phase involved three distinct tasks: sorting the brainstormed statements, rating them, and collecting basic demographic variables. For the sorting of the statements (Rosenberg and Kim, 1975; Weller and Romney, 1988), each participant grouped the statements into piles, placing statements together when they seemed similar in meaning. Participants gave each resulting pile a name. The only restrictions imposed on the sorting task were that there could not be: (a) N piles (each pile containing a single item); (b) one pile consisting of all the items; or (c) a "miscellaneous" pile (any item considered unique was to be placed in its own separate pile). (A minimal check of these restrictions is sketched after the quotation below.) Weller and Romney (1988) explain why unstructured sorting (or the pile-sort method, as they call it) is appropriate in this context:

The outstanding strength of the pile-sort task is the fact that it can accommodate a large number of items. We know of no other data collection method that allows the collection of judged similarity data among a set of over 100 items. This makes it the method of choice when large numbers of items must be handled. Other methods that might be used to collect similarity data, such as triads and paired comparison ratings, become impractical with a large number of items (p. 25).
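The first two restrictions can be checked mechanically (the "miscellaneous pile" rule is a matter of instruction rather than something detectable from the data alone); a minimal sketch, with a hypothetical helper name:

```python
def check_sort(piles, n_statements):
    """Return a list of violations of the sorting restrictions described above.

    `piles` is a list of piles, each a list of statement indices; the pile
    names that participants assign are not needed for this check.
    """
    problems = []
    if len(piles) == n_statements:
        problems.append("every statement is in its own pile (N piles)")
    if any(len(p) == n_statements for p in piles):
        problems.append("one pile contains all the items")
    sorted_items = [i for p in piles for i in p]
    if sorted(sorted_items) != list(range(n_statements)):
        problems.append("each statement must appear in exactly one pile")
    return problems

print(check_sort([[0, 1], [2, 3, 4, 5], [6, 7, 8, 9]], 10))   # -> []
```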
For the rating task, each participant was asked to rate each statement on a 5-point Likert-type response scale indicating how important the statement was for the future of OPEN. The specific rating instruction was:
Please rate each statement according to how important you think it is for the future of OPEN:
1 = Not at all important
2 = Slightly important
3 = Moderately important
4 = Very important
5 = Extremely important

The demographic information collected is shown in Table 2. Seventeen participants completed the sorting data and eighteen completed the rating data.

Results
The first and most basic map produced by the analysis is the point map, which shows the brainstormed statements in two-dimensional space. Each point on the map represents a single brainstormed idea. Points that are physically closer to one another are more similar cognitively; that is, they tended to be placed in the same sorting piles by more participants. Hierarchical cluster analysis is used to partition the points into clusters of graphically adjacent, proximal points. Figure 15 shows the point map with a six-cluster solution overlaid on it.

Figure 15. Point-cluster map for the OPEN project.

Each point on the map is accompanied by a statement identification number that allows its content to be looked up in Table 1. The text of the statements within each cluster, ordered in descending order of average importance across all participants, is shown in Table 3.
Although the map shows a considerable amount of detail, in this form it is not very useful as a graphic. In cartographic terms, the scale of the map may be too detailed for typical decision-making contexts.8 To move to a higher level of abstraction or scale, the point-level detail can be set aside and meaningful labels assigned to each of the clusters, as shown in Figure 16.
How are these labels assigned? First, the analysis applies an algorithm that mathematically determines the "best fitting" label for each cluster from the pile labels that the participants themselves supplied. In fact, the algorithm identifies the ten best candidate labels for each cluster, in descending order of fit. Second, the participants review the statements in each cluster along with the ten best labels and, from that information, decide on the cluster label that best matches a given cluster.9 Finally, a cluster map with its labels can be drawn, as shown in Figure 16.
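The exact best-fit algorithm used by the Concept System is not described here; purely as an illustrative stand-in, one could score each participant-supplied pile label by how concentrated its pile's statements are in a given cluster:

```python
from collections import defaultdict

def candidate_labels(piles_with_labels, cluster_ids, cluster, top=10):
    """Rank participants' pile labels as candidate names for one cluster.

    Illustrative stand-in only; not the published Concept System algorithm.
    `piles_with_labels` is a list of (label, [statement indices]) pairs pooled
    across participants; `cluster_ids[i]` gives the cluster of statement i.
    A label scores high when most of its pile's statements fall in the cluster.
    """
    scores = defaultdict(float)
    for label, items in piles_with_labels:
        in_cluster = sum(1 for i in items if cluster_ids[i] == cluster)
        scores[label] = max(scores[label], in_cluster / len(items))
    return sorted(scores, key=scores.get, reverse=True)[:top]
```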
This cluster map can often be interpreted on its own and is therefore useful in itself. Besides serving as a way of summarizing and labeling the clustered ideas, it also invites and encourages participants to begin developing theories about the interrelationships among the clusters of ideas. For instance, it is often practical to "regionalize" the map by forming clusters of clusters. On the map in Figure 16 it would make sense to group the four clusters in the upper left into a region that corresponds roughly to OPEN's relationships with the outside world. The four clusters in this region, Recruitment, Linkages, Outreach, and PR & Community Service, describe the main areas of interface with that environment. The clusters in the lower right of the map, on the other hand, correspond more to OPEN's internal affairs (Programs & Events, Services, Communication, and Mentoring/Scholarships).

8
This level of detail will matter more once the organization actually begins working on the precise design of a strategic plan with specific tasks or action steps. The point map is like a detailed contour or geological survey map: it does not give a good overall sense of the geography, but it is absolutely essential if you want to hike through an area or are thinking of building there.
9
In this demonstration project, the author arrived at the final cluster labels by reviewing the statements in each cluster and the ten best-fitting labels. For example, consider the cluster labeled 'Recruitment' in Figure 16. The ten best names (in descending order) for this cluster were: Recruitment Ideas, Membership Composition, Outreach, Recruitment Process, Increased Diversity, Expanded Membership, Membership Makeup, Non-Evaluation Outreach, Membership, and Member Recruitment. After reviewing these names and the statements in that cluster, 'Recruitment' was selected as the final name that best summarized the ideas.

Figure 16. Labeled cluster map for the OPEN project.

The labeled cluster map shown in Figure 16 can be considered the "base" relational map. Just as with the individual points, the distances between clusters can be interpreted.10 This base map is generally used as the foundation for displaying or overlaying other data. For example, consider the map in Figure 17.

10
Dimensional orientation is not relevant (the map has no notion of north-south or east-west). The map can be rotated and/or flipped horizontally or vertically and the points/clusters retain the same relational configuration. Cluster shape is also not meaningful (it is determined simply by connecting the outermost points of each cluster). Cluster size, however, does have some interpretability: in general, larger clusters are "broader" in meaning and smaller ones are more sharply defined, with a narrower range of meaning.

1
The map also includes a legend showing that the cluster averages range from a low of 2.70 to a high of 3.40, even though the ratings were made on a 1-to-5 scale. This narrow range often surprises people, who are tempted to conclude that there is no essential difference between the highest and lowest cluster. It is important to remember, however, that the cluster averages are averaged twice: for each statement the average is taken across participants, and for each cluster the average is taken across the statements in the cluster. Because of this, the range of values is expected to be considerably restricted. It also means that even relatively small differences between cluster averages are probably more interpretable than they appear at first glance.

The top-to-bottom ordering of the labels graphically represents the rank order of cluster importance for each group, and each cluster's line connects its positions on the two axes; the point at which each line touches an axis graphically represents that group's average importance rating on the interval scale. The Pearson product-moment correlation (r), an estimate of the overall correspondence between the two patterns, is shown at the bottom of the figure. This type of figure is called a "ladder graph" because, when there is substantial agreement between the patterns, the lines tend to be horizontal and the display looks like an odd sort of multicolored ladder with uneven rungs.
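A minimal sketch of such a pattern match, with invented subgroup ratings: the per-cluster averages correspond to the two axes of the ladder graph, and the Pearson r to the value printed at its foot.

```python
import numpy as np
from scipy.stats import pearsonr

def cluster_pattern(ratings, cluster_ids):
    """Average importance per cluster: per-statement means across the
    subgroup's participants, then averaged over each cluster's statements."""
    statement_means = ratings.mean(axis=0)
    return np.array([statement_means[cluster_ids == c].mean()
                     for c in np.unique(cluster_ids)])

# Hypothetical data: 10 statements in 3 clusters, two subgroups of raters.
cluster_ids = np.array([1, 1, 1, 2, 2, 3, 3, 3, 2, 1])
group_a = np.array([[3, 5, 4, 2, 1, 4, 5, 3, 2, 4],
                    [4, 4, 3, 2, 2, 5, 5, 2, 3, 4]])
group_b = np.array([[2, 3, 3, 4, 4, 3, 2, 3, 5, 2],
                    [3, 3, 2, 5, 4, 2, 3, 3, 4, 3]])

pattern_a = cluster_pattern(group_a, cluster_ids)
pattern_b = cluster_pattern(group_b, cluster_ids)

# Pearson product-moment correlation reported at the bottom of a ladder graph.
r, _ = pearsonr(pattern_a, pattern_b)
print(pattern_a, pattern_b, round(r, 2))
```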

Figure 19. Ladder-graph display of the pattern match shown in Figure 18.

The ladder graph used for pattern matching is especially useful for visually examining how similar or different two vertical patterns are against the base of a concept map. For example, the patterns shown in Figure 19 suggest that while Linkages is the most important cluster for government employees, for non-government employees it falls slightly below the middle of the importance ranking. We can also see in this figure that the government employees produce nearly dichotomous or bimodal ratings: for example, Mentoring/Scholarships and PR & Community Service receive almost equally low values, while the remaining clusters are placed at a similarly higher level.

Figure 20. Pattern match for participants with doctoral training versus those without.

Figure 20 shows the pattern match for participants with a Ph.D. compared with those who do not hold that degree. Some of the discrepancies are worth noting. Those without doctoral degrees appear to rate Recruitment, and possibly Mentoring/Scholarships, higher than those with doctoral training do. On the other hand, they assign relatively lower values to PR & Community Service, which is in fact their lowest-rated category.

Figure 21. Pattern match for participants with more than ten years of experience compared with those who have worked in research for less time.

Figure 21 shows that the less experienced researchers appear to be more interested in both Services and Recruitment than their more experienced colleagues. This makes sense: they are more likely to be building their careers, to need assistance and/or mentoring (i.e., Services), and to be interested in networking and making contacts with others in the same situation (i.e., Recruitment).

Figure 22. Pattern match for AEA members compared with non-members.

The national association of evaluators in the United States is the American Evaluation Association (AEA), and OPEN members might well be interested in participating in this national association in addition to OPEN. Figure 22 shows that OPEN members who are not AEA members consider Services the most important cluster, most likely because OPEN is their primary or only source of such services, whereas for AEA members it is just one alternative.

Figure 23. Pattern match by gender (female-male).

A comparison by gender was also made, as shown in Figure 23. Here the most striking feature appears to be that male participants rated Mentoring/Scholarships well below the other categories; in this sense their ratings are nearly bimodal. Might this result reflect a more general tendency for men to be less interested in mentoring and sponsoring activities?
Pattern matches can be displayed in either relative or absolute terms, and each has advantages and disadvantages. Consider, for example, the two pattern matches shown in Figure 24. Both ladder graphs display exactly the same pattern-match data: both compare the average importance for participants living in Portland, Oregon, with that for participants who do not live in that city. The only difference between them is that the ladder graph on the left uses the actual minimum and maximum average ratings for each group to set the bottom and top of its axes, whereas the graph on the right fixes both axes to the same high and low values.

Figure 24. Relative (right) and absolute (left) pattern matching for the same comparison.

Both pattern matches show that Communications is important for both groups; however, the absolute match shows more clearly that it is more important to those living outside Portland. The relative pattern match, on the other hand, reveals that, in relative terms, Portland residents rated Programs & Events higher than non-residents did (probably because, given the distances involved, the latter group finds it harder to attend events that are mostly held in that city).
What can we do with these results? We must of course recognize that they are based on a very small sample. This project was conducted as a demonstration, not as a formal evaluation. Inferences drawn from such a small and unrepresentative sample must be made with great caution. Moreover, all the interpretations offered in this study were made from the perspective of the author, who was also the project facilitator. There was a brief opportunity to interact with OPEN members at their conference and present the results to them, but we refrained from engaging them in the interpretation process with the intensity that usually characterizes our concept mapping work.

On the other hand, we can view these maps and the observed pattern matches as "suggestive" devices designed to prompt OPEN members to examine, from various perspectives, issues that concern their organization. From this standpoint, and without taking any single result too seriously, some inferences do "leap out." The map seems to make clear that the way people think about OPEN and its issues divides into an internal and an external view. The most salient internal issues concerned Services, Programs & Events, Communications, and Mentoring/Scholarships, while the external issues included Recruitment, Linkages, Outreach, and PR & Community Service. Looking at the maps shown in Figures 16 and 17, it is interesting that Recruitment, judged an external issue, sits close to Programs & Events, judged an internal one, while PR & Community Service (external) lies closest to Services and Mentoring/Scholarships (internal). In planning terms, this result might suggest that the organization look for ways to connect these internal and external synergies. For example, it might suggest that Programs & Events be viewed as a key mechanism for making progress on Recruitment and, conversely, that greater Recruitment efforts are needed in order to broaden the content or variety of Programs & Events. Similarly, one might look at the confluence of PR & Community Service (external), Services (internal), and Mentoring/Scholarships (internal) and conclude that Mentoring/Scholarships could well serve as a "bridge" between these areas, since it can help strengthen OPEN's PR & Community Service while at the same time providing a Service to its members.
The other major interpretation that can be drawn from this small pilot project concerns the pattern matches. They suggest that OPEN has several important constituencies whose preferences appear to differ somewhat. These groups come to the organization for different purposes and participate in different ways, which is hardly surprising given the diversity of the membership. Nevertheless, the pattern matches do help us sense which groups hold which particular preferences.
Finally, it is important to recall the stated purpose of this pilot study (aside, obviously, from serving as a demonstration of the usefulness of concept mapping). The study focused on OPEN's strategic planning, specifically for the next five years. We would hope that this map (like all maps) serves as a guide for the organization, in this case in its strategic planning efforts. How might it be used that way? First, the maps serve as a communication device that organizes an enormous amount of information into eight categories and provides a common language that can shape subsequent planning efforts; that is, they help organize and simplify the discussion. Second, the maps provide an implicit structure that can be used to organize the strategic planning effort itself. For example, OPEN could decide to subdivide the next phase of its strategic planning into several committees or working groups. Without the map, they would have to invent such groupings intuitively; the map, however, suggests eight categories that could form the basis for the committees. If only a few committees are wanted, the internal-external distinction could be used; if more are wanted, there could be one per cluster or per set of adjacent clusters (for example, folding Mentoring/Scholarships into the adjacent Services cluster to form a Services committee). Each committee would start with an advantage, since its assigned cluster already contains a set of issues, or statements (as shown in Table 3), that have been rated for importance. Work could begin with the statements rated most important by all participants, moving on to potential action steps, timelines, assignments, and so on11. Alternatively, if the group prefers, it could start by working as a whole, examining the entire map in detail to identify the statements rated most important by all participants, as shown in Table 4, and then decide on the actions and implementation details related to those statements. These are offered simply as initial suggestions to the organization about how it might use the maps to guide its strategic planning; the potential uses are limited mainly by the organization's creativity.
It should be clear that concept maps, like geographic maps, provide guidance, a sense of direction. Instead of describing geographic terrain, concept maps attempt to describe conceptual and semantic "terrain." With this case study in mind, we now turn to what cartographers themselves currently say about map making, to see how their insights and experience might illuminate the evaluator's efforts.

The Cartography of Concept Mapping

A fundamental premise of this paper is that the evaluator can be, both figuratively and literally, a cartographer of the social reality in which he or she works. When conceived within the broader classification described earlier, it should be clear that being an evaluator-cartographer does not mean being limited to making geographic maps (such as those based on geographic information systems). In fact, geographic mapping should ultimately be a rather ancillary task for the evaluator-cartographer. Relational maps, especially those onto which additional data are overlaid, are potentially valuable for prospective, formative evaluation as well as for summative, retrospective work.

11
Concept Systems Incorporated, producers of The Concept System, offers a number of post-mapping modules that help participants take the map results and use them for various purposes. These modules, which make up the Concept System Application Suite, include, among many others: a strategic or action planning module that allows the organization to develop tasks related to any statement or cluster, along with timelines, assigned resources, costs, and so on; and a measurement and evaluation module that allows multiple measures to be developed for each statement or cluster, goals or targets to be specified, and ongoing monitoring, analysis, and reporting over time.
If we accept the premise that maps are a useful tool in evaluation, and that the kind of non-geographic mapping an evaluator might do is conceptually related to traditional geographic map making, then it is certainly worth examining how contemporary cartographers think about quality in their own work, to see whether there are lessons and/or cautions useful for the evaluator-cartographer.
The layperson typically assumes that geographic maps (those we might call geographically isomorphic) correspond more or less directly to external reality. A map, on this view, can be judged by its accuracy, that is, by how precisely it reflects geographic reality.
Anyone who has been involved in evaluation work is familiar with this perspective. Just as with maps, many people unfamiliar with the field assume that our evaluations are isomorphic with reality. On this view, an evaluation can be judged by its validity, the degree of "accuracy" with which it represents reality. Harley comments on this point:
Maps are generally regarded as mirrors, graphic representations of some aspect of the real world. ...the role of a map is to present a truthful picture of geographic reality. (Harley, 1990)

Naive-realist and positivist perspectives in evaluation have been forcefully challenged by a range of post-positivist positions, including critical realism, emergent realism, constructivism, and interpretivism. All of these perspectives share the premise that reality is not an immediately perceptible phenomenon that can easily be represented apart from its context. All are critical of our ability to observe and measure reality "accurately" in the naive positivist or realist sense, and all regard the political and interpersonal environment as a crucially important component of evaluation contexts.
In some ways it is heartening for an evaluator to read some of the recent cartography literature as a newcomer to that field. Many of the themes discussed here were encountered in just that way, and many of the conclusions cartographers reach bear directly on those of evaluation. Cartographers and evaluators alike have struggled with naive-realist and positivist paradigms. Both they and we have come to recognize the great importance of context, the imperfection of measurement and observation, and the weight our own perspectives and experiences carry in grounding our work.
This section examines some current positions in cartography. No attempt is made to represent every aspect of the current debate among cartographers, which is certainly no less divided and contentious than the current debate among evaluators. Nor does this discussion claim to represent the latest developments in scholarly discourse on cartography. The main sources used in this paper are available in the popular press and can easily be obtained in most large bookstores.
Although several sources are drawn upon, this discussion is taken mainly from Denis Wood's fascinating book The Power of Maps (Wood, 1992). Wood writes clearly from a constructivist and interpretivist perspective on cartography, one that challenges even the assumptions of post-positivist critical realism. His arguments challenge current practice and have great potential value for evaluator-cartographers whether or not they agree with his positions. The discussion that follows covers a series of themes, each taken directly from Wood's book.12 For each theme there is a brief explanation of an aspect of cartography and a consideration of its relevance for this concept mapping study in particular and for concept mapping in general.

12
Simply reading the book is stimulating in itself. The chapter titles are so direct and accessible that they make the text very easy and powerful reading. The themes selected for this paper, however, represent only part of the book's content; many of the omitted themes concern the iconographic nature of many geographic maps (a characteristic not shared by the kind of concept mapping described in this paper).

Every map has an author, a subject, and a theme

... every map, without exception, inevitably, unavoidably, necessarily embodies its author's prejudices, biases and partialities... There is no description of the world not shackled... by these and other attributes of the one doing the describing. Even pointing is always pointing... somewhere, and this act not only indicates a place but makes it the particular center of attention rather than... any other place. The one who points is the author, the cartographer; the place pointed to is the subject; and the particular aspect attended to is what is addressed, the theme. Nothing more (and nothing less) is involved in any map (Wood, 1992, p. 24)
Who is the "author" of a concept map? We generally describe three kinds of people directly involved in producing a map: the initiator(s), the facilitator(s), and the participants. The initiator(s) is the person (or persons) who requested the map, who set the process in motion. In most mapping contexts the initiator is someone in the client group or organization who makes contact with the facilitator(s), who in turn oversees the process itself. The initiator is usually quite influential in the context the map is about; he or she may be the director or one of the managers of an organization. In selecting a particular concept mapping process, initiators are in a sense already "pointing," giving preference to one method or approach over another. The initiator exerts an even more important influence in defining the project, however, because he or she is usually the key person who works with the facilitator to decide what the map will emphasize, which demographics and ratings will be collected, the schedule of work, and who will be invited to participate. The facilitator also plays a key role in defining the nature of the map. As the person who probably knows the most about the mapping process and how it works, the facilitator has numerous opportunities to shape the content of the map. For example, the facilitator plays a key and determining role in the following tasks:
- defining the planning task (choice of focus, schedule, participants, etc.)
- shaping the content of the brainstormed statements, through direct work with the group during brainstorming and through editing of the statement set
- determining the rating criteria, by helping to word the rating instructions
- designing the analysis, through the choice of its parameters
- managing the results, especially through the selection of the number of clusters and any redrawing of point or cluster boundaries on the map
- framing the interpretation, through what is presented to the participants, the order of presentation, and the way the interpretation discussion is led.

Obviously, it is also important to recognize that the participants define the content of the map and, in a sense, are its principal authors. It should be clear, however, that their contribution is significantly constrained by the initiator(s) and facilitator(s).
One of the most delicate and potentially misleading aspects of concept mapping concerns the apparent versus the actual roles of the facilitator. On the surface, to many direct and indirect participants, the facilitator appears to be a disinterested person providing objective help to the group in determining the content. In reality, the facilitator plays a determining role, subtly exerting influence in a variety of ways, without necessarily making the intentions and interests behind that work explicit. As facilitator of this project, I was not a member of OPEN; I knew only a few members professionally (including the initiators) and had no real interest in anything other than doing the best possible demonstration of the method (which is not to say that this was not itself a powerful influence). Nevertheless, I am a white, middle-aged professor from an Ivy League university in the eastern United States, with extensive experience in the evaluation profession. These factors certainly influence the decisions I make, and urge others to make, when producing concept maps. I hasten to add that I do not consider this a problem in itself, because every facilitator has his or her own biases and very personal views that shape their projects. The danger lies not in the bias itself but in failing to recognize that such influence exists. What is potentially dangerous is to pretend that the facilitator is a completely objective outsider who merely facilitates the process without influencing the results, and to pretend instead that those results emerge entirely from objective, rigorous, computer-based statistical procedures.
En el estudio del caso de la OPEN, yo fui el único facilitador. La persona que
estableció contacto conmigo fue el actual Presidente de la OPEN, quien conocía mi
trabajo en la elaboración de mapas y me pidió que fuera el orador del tema central de
su primera conferencia. En nuestras primeras discusiones sugerí que hiciéramos un
proyecto real de elaboración de mapas para la OPEN, que luego pudiese ser
incorporado a mi discurso. Posteriormente, se incorporó otro socio del Presidente a
nuestras discusiones debido a que ambos eran miembros de la OPEN y tenían
experiencia en cuanto a metodología de elaboración de mapas conceptuales. Por ello,
en este contexto contamos con un facilitador y dos iniciadores que formaban parte de
la organización cliente. Todas las decisiones sobre planificación del proyecto fueron
tomadas por nosotros tres. Decidimos llevar a cabo el proceso (hasta la interpretación
de los mapas) utilizando la versión web del Concept System, centrarnos en el
planteamiento principal para la lluvia de ideas y contar con una lista obtenida por
correo electrónico de todos los miembros de la OPEN (evidentemente, la gran
mayoría de los miembros cuentan con un correo electrónico, y la OPEN la utiliza de
manera rutinaria para enviar comunicados a sus miembros, aunque no se sabe con
precisión con cuánta frecuencia los miembros usan sus correos). Todos los
comunicados acerca del proyecto enviados a los miembros por correo electrónico
fueron redactados directamente por el Presidente de la OPEN, aunque, en realidad, todo el trabajo de administración y de procesamiento en Internet estuvo a mi cargo.
Hubo diversos aspectos del proceso en los que yo o el iniciador ejercimos
influencia, además de las decisiones descritas anteriormente. Durante el período de la
lluvia de ideas, se generaron muy pocos planteamientos en la red. Pocos días antes
de que se cerrara el proceso de la lluvia de ideas, envié a los iniciadores el siguiente
mensaje por correo:
¡Ya estamos en la tarde del jueves y la lluvia de ideas sólo ha arrojado 24
planteamientos! Yo voy a terminar con esto este fin de semana y DE
VERDAD, VERDAD espero que ustedes, muchachos, le pongan más corazón
al trabajo. En lo que a mi respecta, este domingo me meto de lleno a generar
más o menos 300 conjuntos de datos para sus participantes y prácticamente no
valdría la pena a menos que saquemos de 70 a 80 planteamientos. Por supuesto que ustedes dos pueden “rellenar” la lista si hacen una revisión este fin de semana (que bien pudiera ser el caso), pero sería mucho mejor si otros participaran.
Por supuesto que esto era un intercambio en broma entre amigos, realizado exclusivamente con fines de demostración. Sin embargo, en
el mensaje se evidencia que estoy preocupado tanto por el bajo nivel de participación
en la actividad como por el grado en el que pudiera reflejar deficiencia en mi
metodología (¡sin mencionar el temor de que no pudiera obtener un resultado
suficientemente bueno como para ser presentado y tuviera que comenzar de nuevo
con otra muestra!). Mi compañero iniciador respondió con el siguiente mensaje:
Sólo contamos con 30 planteamientos: ¡Qué sinvergüenzura la nuestra!
Especialmente porque yo ya hice 15 de ellos. Así que está bien, voy a
comenzar a rellenar la lista (también voy a llamar a algunos amigos para
pedirles que me hagan unos planteamientos para incluirlos).
Evidentemente, mi compañero hizo un gran trabajo porque, al final,
conseguimos los 83 planteamientos. ¿Realmente estaba yo preocupado por la
participación? No en este caso, porque estaba claro que sólo se trataba de una
demostración. ¿Estas cosas pasan en los proyectos “reales”? Claro que sí. El
problema no es precisamente que se dé este tipo de situaciones, sino evitar que se
noten en el trabajo final. Esos mapas tan imponentes y sobrios, lucen como si
hubieran sido hechos contando con una amplia participación. Este asunto se
manifiesta de diversas maneras dependiendo de la forma en que se hayan recopilado
los planteamientos de lluvias de ideas. Por ejemplo, cuando la lluvia de ideas se
realiza por medio de la red, como en este estudio de caso, es casi imposible calcular
su nivel de participación porque tiene carácter anónimo13. En una sesión de lluvia de
ideas presencial ocurren problemas que, aunque diferentes, generan la misma
incomodidad.
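A modo de ilustración de lo que se señala en la nota 13, el siguiente boceto en Python es puramente hipotético (el nombre del archivo de bitácora y la ruta de la página son supuestos, no datos del estudio de caso) y muestra cómo, en principio, una bitácora de servidor en formato de registro común permitiría contar las visitas de cada dirección IP a una página de lluvia de ideas “anónima”:

import re
from collections import Counter

ARCHIVO_BITACORA = "access.log"             # nombre hipotético del archivo de bitácora
PAGINA_LLUVIA = "/brainstorm/entrada.html"  # ruta hipotética de la página de lluvia de ideas

# Formato de registro común: IP - - [fecha] "GET /ruta HTTP/1.0" ...
patron = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "GET (\S+)')

visitas = []
with open(ARCHIVO_BITACORA, encoding="utf-8") as bitacora:
    for linea in bitacora:
        coincidencia = patron.match(linea)
        if coincidencia and coincidencia.group(3) == PAGINA_LLUVIA:
            visitas.append((coincidencia.group(1), coincidencia.group(2)))  # (IP, fecha y hora)

# Si cada máquina tiene asignada una IP fija y un usuario habitual, este conteo
# bastaría para inferir quién visitó la página y cuántas veces lo hizo.
for ip, conteo in Counter(ip for ip, _ in visitas).most_common():
    print(ip, conteo)

El punto no es recomendar esta práctica, sino mostrar lo poco que haría falta para que el anonimato de la participación en la red dejara de ser tal.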
Cuando culminó la lluvia de ideas, solicité a los iniciadores que “editaran” el
conjunto de planteamientos generados. La siguiente es una muestra de cómo nos
comunicamos por correo electrónico:
El iniciador:
Sólo para verificar si estoy claro con lo de la meta, le quiero preguntar si lo
que buscamos son planteamientos que sean diferentes conceptualmente entre
sí y no repeticiones de temas parecidos. ¿Qué pasaría con los planteamientos
que traten detalles de conceptos más amplios (por ejemplo, “organizar más
eventos fuera del área de Pórtland” frente a “programar reuniones en Salem (o
en Eugene, o Bend...”)? ¿Se puede incluir todos los planteamientos dentro del
concepto “lugar de reunión”, o debemos seleccionar planteamientos que sean
más detallados? O ¿incluir planteamientos más detallados y obviar los
conceptos más amplios?
Mi respuesta fue la siguiente:
Técnicamente, siempre y cuando el planteamiento satisfaga a plenitud la
solicitud para concentrarse en el tema central, sí es legítimo. Sin embargo,
necesitarás aplicar algún criterio a la hora de efectuar la revisión. Si alguien
propone “realizar más reuniones fuera de Pórtland”, yo me inclinaría por dejar ese planteamiento tal como está. No obstante, si obtienes una inmensa cantidad de planteamientos que proponen “realizar una reunión en Pórtland”, “realizar una reunión en Eugene” o “realizar una reunión en Podunck”, probablemente los fusionaría en la propuesta de “realizar reuniones en diversos lugares de todo el estado”.

13 En realidad, quienes conocen el funcionamiento de la red deberían recordar que, aunque no se cuente con un código de acceso para la lluvia de ideas, todos los servidores de la red pueden rastrear las visitas a los sitios web e incluso el número IP específico del usuario. En muchas organizaciones, a cada máquina se le asigna un número IP permanente, de modo que podríamos identificar cuáles máquinas visitaron la sección de lluvia de ideas. Por supuesto, no se sabe quién está en esa máquina, pero en muchas compañías cada máquina se asigna a un individuo diferente, de manera que pudiéramos inferir quién visitó el sitio web de la lluvia de ideas. Desde luego, la bitácora no puede decirnos qué planteamiento se introdujo en cuál visita; sin embargo, es posible reconstruir la secuencia. Así que, teóricamente, sí es posible identificar quién participó en la lluvia de ideas y con qué planteamiento, incluso en un proceso de lluvia de ideas “anónimo”. Claro está, a mí nunca se me ocurriría averiguar estos asuntos.
En este caso, tanto el facilitador como los iniciadores están ejerciendo
influencia sobre los resultados del mapa, específicamente cuando eligen entre una
multitud de formas de edición. Se pudiera sostener que estamos tratando de ser
bastante imparciales con respecto a nuestros criterios. Sin embargo, si usted fuera un miembro proveniente de alguna zona del estado en donde nunca se haya realizado una reunión de la OPEN, y su planteamiento fuera agrupado con los de otras personas que sí han contado con reuniones previas en sus áreas, usted pudiera legítimamente sentir que hemos actuado con prejuicios en su contra, lo cual se manifiesta en nuestro criterio de revisión.
A fin de cuentas, los planteamientos mismos deben reflejar el contenido
legítimo manifestado por los participantes. De esta manera, en la elaboración de
mapas conceptuales, existe una gran cantidad de “autores” que ejercen influencia
sobre el sujeto y el tema de un mapa de muchas maneras, con frecuencia de forma
inconsciente. En este contexto, así como ocurre con nuestros colegas cartógrafos, el
evaluador-facilitador trata con frecuencia de mantener distancia ante los resultados.
Nosotros sólo somos facilitadores y lo que constituye un reflejo objetivo de la
realidad, es el mapa. ¡Los cartógrafos nos recuerdan que incluso esa afirmación no se
cumple en la mayoría de los mapas geográficos! ¿Por qué habríamos de esperar que
el caso de los mapas conceptuales sea diferente?
Cada mapa muestra esto... pero no aquello
...El cartógrafo realiza una selección, clasifica y estandariza; se encarga de
realizar simplificaciones y combinaciones intelectuales y gráficas. Además
enfatiza, amplía, atenúa u omite fenómenos visuales de acuerdo con la
importancia que estos representen para el mapa. En pocas palabras, el
cartógrafo generaliza, estandariza, selecciona y reconoce aquellos elementos
que interfieren entre sí, que se oponen o superponen, y de esta manera
coordina el contenido del mapa para mostrar con claridad los patrones
geográficos de una región. (Imhoff, E., 1982)
¿Qué analogía puede establecerse para el tipo de mapas conceptuales descritos
en este ensayo? Quizás sea simplemente la escogencia de esta forma de elaboración
de mapas conceptuales, es decir, de esta metodología específica, la cual, al igual que
la escogencia de un método de proyección para la elaboración de mapas geográficos,
representa un factor clave de selección (aunque es difícil determinar de qué manera
esa escogencia propiamente dicha determine el contenido del mapa). En otro sentido,
la metodología misma, es decir, los algoritmos aplicados y la redacción del
planteamiento central, limita la naturaleza del mapa producido. Quizás resulte muy
obvio que la perspectiva del mapa conceptual pueda quedar muy restringida por la
selección de los participantes, y las circunstancias específicas que pudieran
motivarles a participar de forma activa y ofrecer lo que éstos realmente piensan y
creen. En este sentido, a diferencia de la participación presencial, la participación
anónima pudiera apuntalar el proceso y ayudar a mitigar la escasez de ideas. Por otro
lado, como lo muestra este estudio de caso, la participación anónima presenta sus
problemas particulares.
Uno de los aspectos que permite delimitar más la tarea de la elaboración de
mapas conceptuales es el planteamiento principal para la lluvia de ideas, puesto que
determina los contenidos que serán permitidos y los que serán excluidos u obviados.
A continuación presentamos una vez más el planteamiento central utilizado en el estudio de caso:

Pienso que, dentro de los próximos cinco años, la Red de Evaluadores de Programas de Oregón debería dedicarse específicamente a la tarea de...
A primera vista, este planteamiento no parece delimitar bien el problema. No
obstante, procedamos a analizarlo de una forma un poco más crítica. Éste nos permite
primeramente limitarnos a un período de cinco años, que aún parecen bastante, pero
que no abarca sino sólo una parte del tiempo que ha invertido la mayoría de los
miembros en sus carreras como evaluadores. El enfoque requiere tareas que puedan
realizarse, es decir, acciones concretas, y es neutro porque no especifica si estas
actividades deban ejecutarse completamente, aunque eso es lo que sugiere. De igual
manera, el enfoque es neutro porque no especifica las razones por las cuales las
sugerencias deberán ser tomadas en cuenta. ¿El objetivo será mejorar la OPEN,
ayudar a sus miembros o contribuir a mantener un control en la profesión de la evaluación en el estado de Oregón? Quizás lo más importante sea que numerosos aspectos potencialmente críticos quedan fuera del alcance del enfoque. En consecuencia, el enfoque no exige la realización de una lluvia de ideas acerca de los verdaderos problemas de la OPEN o de los aspectos de la organización que se quisieran tratar; en este sentido, impone una selección que, a su vez, ejerce influencia sobre todo lo que sigue en el proceso.
Otra área en la que se debe realizar una selección es en la determinación del
número de conjuntos del mapa. No existe una manera sencilla de realizar este
procedimiento matemáticamente, y si la hubiere, tampoco está claro si este tipo de
operaciones sea conveniente. Tampoco es fácil lograrlo en un proceso en que se
trabaja con grupos de participantes, puesto que la mayoría de ellos no tienen interés
en examinar las múltiples soluciones de conjuntos ni en determinar cuál pudiera ser la
mejor. Por lo general, este tipo de decisiones quedan en manos del facilitador. En
este estudio de caso, seguí el procedimiento estándar para seleccionar los conjuntos
(ver arriba) y el proceso fue bastante expedito. Sin embargo, en el caso de otros
procesos para elaboración de mapas hay veces en que los mapas no son tan claros
como en este estudio de caso, o en donde existen múltiples soluciones de conjuntos
que pudieran ser valiosas. Esta situación se parece un poco al hecho de que el
cartógrafo reconoce que trazar un mapa a cierta escala tendría valor para
determinados propósitos, mientras que utilizar una escala diferente tendría valor pero
para otros fines, sin querer decir que una forma sea la “correcta” y la otra la
“incorrecta”, o que una fuera la “mejor” y la otra la “peor”, hablando en sentido
abstracto. Lo que sucede es que el contexto es un factor determinante a la hora de
evaluar una selección.
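Como simple boceto de este punto (no se trata del algoritmo exacto del Concept System, y las coordenadas son simuladas), el siguiente fragmento en Python muestra cómo podrían compararse soluciones con distinto número de conjuntos aplicando el método de Ward sobre coordenadas bidimensionales de los planteamientos:

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
coordenadas = rng.random((83, 2))   # coordenadas simuladas (p. ej., de un escalamiento) de 83 planteamientos

# Agrupamiento jerárquico con el método de Ward sobre las coordenadas.
arbol = linkage(coordenadas, method="ward")

# Comparar soluciones de 6 a 15 conjuntos: el "mejor" número no surge de la
# matemática por sí sola; sigue siendo una decisión de quien interpreta.
for k in range(6, 16):
    etiquetas = fcluster(arbol, t=k, criterion="maxclust")
    tamanos = sorted(np.bincount(etiquetas)[1:], reverse=True)
    print(f"{k} conjuntos -> tamaños: {tamanos}")

El fragmento sólo hace visible la elección: cada valor de k produce un mapa defendible, y el contexto decide cuál conviene.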
Un mapa representa una idea del mundo, mas no lo reproduce
Existe la ingenua tendencia a creer que un mapa geográfico es una ventana
hacia la realidad.
¿Acaso no se trataba de una realidad? Entonces ¿cómo es que sólo representa
la... opinión, la idea de alguien acerca de dónde comienza y termina la
propiedad de alguna persona, o una buena conjetura sobre dónde estaban las
fronteras, o una idea aproximada de la ubicación de una inundación ocurrida
en cien años, mas no es la inundación propiamente dicha? En este sentido, lo
que se omite es precisamente la interpretación social de la línea que demarca
la propiedad o de la zona en la que ocurrió la peor inundación en cien años, la
cual, al igual que en todos los mapas, no es una línea que se pueda ver ni una
marca que indique el nivel máximo de las aguas trazada con barro sobre una
pared o con sedimentos a lo largo de una ribera, sino acaso una extrapolación
más o menos cuidadosa que parte de un inmenso despliegue de operaciones
estadísticas hasta dar como resultado una espira de líneas de relieve. Siempre
que se acepte un mapa como una ventana al mundo de la realidad, estas líneas
también deberán ser aceptadas como elementos de representación con la
categoría ontológica de corrientes y colinas. (Wood, 1992, p. 18-19)
Sin embargo, esta reflexión trasciende el plano de las divagaciones filosóficas y de los argumentos académicos abstractos. Como ocurre con la evaluación, la postura que
asumamos sobre este tema provoca un gran efecto sobre el mundo real:
No obstante, los mapas no serán reconocidos como interpretaciones sociales
hasta que no sea develado su carácter contingente, condicional y... arbitrario.
Es precisamente en este instante cuando los objetos o conceptos representados
por estas líneas se ven sujeto a la discusión y el debate y se manifiesta el
interés que en ellas puedan tener propietarios, estados y compañías de seguros.
(Wood, 1992, p.19)
En los mapas conceptuales, la naturaleza constructivista es incluso más
evidente que en el caso de los mapas geográficos. Sin embargo, a medida que nos
hagamos dependientes de equipos cada vez más sofisticados y se cuente con
algoritmos matemáticos más avanzados, aumenta la tentación de ver los resultados de
todo este despliegue de alardes tecnológicos como algo muy preciso y como una
representación válida de un mundo cognitivo implícito.
Esta reacción es perfectamente comprensible y, con este argumento, no estamos sugiriendo que debamos renunciar a la meta de elaborar mapas tan precisos como podamos. Afirmar que basta con un mapa conceptual improvisado a partir de datos generados en forma aleatoria sería tan insostenible como que Wood afirmara, cosa que difícilmente cabría esperar, que los bosquejos hechos al azar pudieran calificar como guías útiles para elaborar mapas geográficos. En todo caso,
tanto para los mapas geográficos como los conceptuales, se trata de que
reconozcamos que no hay manera de que con un solo mapa podamos representar la
realidad. Un mapa es, si acaso, una entre una infinita gama de posibilidades
admisibles y precisas con que podemos reflejar la realidad. En vez de ser una
restricción, el reconocer esta situación actúa como un catalizador. Para el usuario de
un mapa, esta herramienta se convierte en una perspectiva, sugiere otras, y debe ser
vista con ojo crítico. La pregunta que surge de todos los mapas, al
menos implícitamente, es “¿por qué se formuló tal o cual pregunta?” o “¿por qué se
seleccionó tal o cual enfoque?” o “¿por qué se aplicó una perspectiva específica?”
No es que no valga la pena alcanzar la precisión, sino que en realidad nunca
fue el problema, sino sólo una excusa. La precisión no es lo que está en juego,
sino saber con respecto a qué criterio se define tal precisión ¿Qué importancia
tiene calcular un área de un estado con un margen de milímetros cuadrados
cuando no podemos contar su población? ¿A quién le interesa que
determinemos la ubicación del Taj Mahal de Donald Trump con una precisión
de centímetros cuando lo que pudiera interesar sería la magnitud de las
operaciones monetarias en dólares de las comunidades en las que se originan
sus ganancias? ¿Qué sentido tiene preocuparse por generalidades sobre
carreteras en un mapa de rutas de transporte cuando lo que hace falta son más
rutas de unidades de transporte? Cada uno de estos puntos de vista se elige
con criterios e importancia social, es decir, según la perspectiva restringida
por el factor social y no importa la transparencia del lente con que se mire. La
precisión no se pone en duda, simplemente…no es materia de discusión.
(Wood, 1992, p.21)
Partiendo de esa premisa, queda claro que el mapa de la OPEN es una
interpretación, basada en una muestra no representativa de los miembros de la OPEN
y en algoritmos que, inevitablemente, dan cabida a distorsiones.14 Además, el mapa
aborda un tópico bastante amplio (las medidas que la OPEN como organización debe
tomar dentro de los siguientes cinco años) visto desde las perspectivas de un pequeño
grupo de individuos que se han seleccionado a sí mismos. Estos participantes
pudieran mostrar un interés en la OPEN por un plazo más largo que la mayoría.
Probablemente difieren en muchos aspectos de aquellos miembros que escogen no
participar. El enfoque plantea una pregunta acerca de la organización dentro de un lapso
específico de tiempo. Otros pudieran interesarse en otras interrogantes y
cronogramas, lo cual no significa que el mapa no sea “preciso”, sino sólo que es
elaborado desde un punto de vista particular. Pudiéramos obtener un sinnúmero de
mapas distintos que sean pertinentes, interpretables o útiles, utilizando a los mismos
participantes como muestra o a otros. Pudiéramos percatarnos de que, aún si
repetimos el estudio exactamente de la misma manera pero con otros miembros de la OPEN, los resultados no sufrirían cambios sustanciales, como también puede que sí. En última instancia, se trata de una cuestión completamente empírica. Sin embargo, la selección de la perspectiva, las preguntas formuladas y la experiencia acumulada hasta el momento son producto directo de decisiones tomadas, consciente o inconscientemente, en el transcurso de la elaboración de este mapa. Incluso una muestra más amplia (o una participación masiva de los miembros de la población de interés) no serviría para atenuar el hecho de que el mapa es, inherentemente, una interpretación basada en un sinfín de elecciones.

14 Por ejemplo, la escala multidimensional está restringida a dos dimensiones cuando, en realidad, no tenemos idea de cuántas dimensiones pudieran necesitarse para representar la información con precisión.
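Como ilustración de la nota 14, y sólo a manera de boceto con una matriz de disimilitud simulada (no la del caso OPEN), el siguiente fragmento en Python calcula el estrés de la escala multidimensional para distintas dimensionalidades; fijar la solución en dos dimensiones es una elección del método, no una propiedad de los datos:

import numpy as np
from sklearn.manifold import MDS

rng = np.random.default_rng(1)
bruta = rng.random((83, 83))
disimilitud = (bruta + bruta.T) / 2      # matriz de disimilitud simétrica simulada
np.fill_diagonal(disimilitud, 0.0)

# El estrés suele disminuir al aumentar el número de dimensiones; la solución
# bidimensional que se dibuja en el mapa es sólo una de las posibles.
for dimensiones in (1, 2, 3, 4, 5):
    modelo = MDS(n_components=dimensiones, dissimilarity="precomputed", random_state=0)
    modelo.fit(disimilitud)
    print(f"{dimensiones} dimensiones -> estrés: {modelo.stress_:.1f}")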
Los mapas convierten en presente el pasado y el futuro
El mundo al que estamos acostumbrados, el mundo real, está hecho así, por
los pensamientos y esfuerzos acumulados en el pasado. Se presenta ante
nosotros en forma de mapa, y decir que “se presenta” es decir que “se hace
presente”, de manera que todo lo que forme parte del pasado o futuro
invisible, inasequible o borrable pueda convertirse en parte de nuestra
vivencia…en este mismo momento…y en este mismo lugar (Wood, 1992, p.
7)

Esto es lo que significa utilizar un mapa. Pudiera parecer como la búsqueda


de un camino o una acción legal sobre una propiedad o un análisis de las
causas de un cáncer, pero siempre se trata de ese hecho de incorporar en el
aquí y el ahora las acciones realizadas en el pasado. Y no es menos cierto
cuando esas acciones ocurren completamente… en nuestra mente; los mapas
que construimos en nuestra mente representan experiencias exactamente de la
misma manera en que lo hacen los mapas en un papel, acumuladas de la forma
en que hemos andado nuestro camino por el mundo en la actividad de nuestro
diario vivir (Word, 1992, p. 14)
En este sentido, el vínculo entre los mapas geográficos y conceptuales es casi
explícito. Todos los mapas conceptuales, así como los geográficos, incorporan
nuestro pasado en una acumulación imperfecta de nuestras experiencias y
pensamientos. El mapa conceptual de la OPEN representa el pasado a través de las
experiencias de los participantes. Se anticipa al futuro en el sentido de que se
concentra en acciones concretas que la OPEN pudiera llevar a cabo. De esta manera,
este mapa conceptual, como todos los demás, está circunscrito en un contexto
histórico, es decir, se asemeja a una fotografía instantánea que presenta el pasado y el
futuro respecto de un momento dado.
Los mapas establecen relaciones entre el territorio y lo que contiene
La capacidad de relacionar el territorio con lo que contiene es lo que ha hecho
de los mapas un instrumento valioso para tantas personas durante tanto
tiempo. Los mapas relacionan un territorio con sus impuestos, con el servicio
militar o cierta tasa de precipitaciones y con la probabilidad de que ocurra un
terremoto o una inundación… (Wood, 1992, p. 10)
De esa misma manera, la capacidad que posee un mapa conceptual de
relacionar conceptos e ideas con acciones, lo convierte en una herramienta muy
valiosa. En un esfuerzo de planificación, con frecuencia utilizamos los conjuntos de
planteamientos como la justificación para las actividades futuras. Los planteamientos
de cada conjunto se convierten a su vez en la justificación para aquellos enunciados
de acciones en los que se tienen personas específicas asignadas para ejecutarlos,
recursos concedidos y plazos establecidos. En el contexto de la evaluación, por lo
general utilizamos los mapas como base para dar sentido práctico a las mediciones
efectuadas. En este sentido, el conjunto actúa como el constructo de las mediciones,
y los planteamientos como ítems de mediciones potenciales o reales, es decir, como
las operacionalizaciones del constructo. En ambos contextos, el mapa sirve de vehículo o
fundamento para la representación de consecuencias o resultados, es decir, constituye
el marco referencial que permite apreciar avances en el plan estratégico o cambios en
la medición.
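Como boceto mínimo de esta idea, con las clasificaciones promedio del conjunto “Comunicación” tal como aparecen más adelante en la Tabla 3, el siguiente fragmento en Python calcula la puntuación del “constructo” como el promedio de sus ítems (los identificadores corresponden a los números de los planteamientos):

from statistics import mean

# Clasificación promedio de cada planteamiento del conjunto "Comunicación" (Tabla 3).
clasificaciones = {40: 4.56, 48: 3.83, 31: 3.78, 14: 3.67, 19: 3.61,
                   30: 3.44, 59: 3.11, 18: 2.61, 70: 2.00}

conjuntos = {"Comunicación": [40, 48, 31, 14, 19, 30, 59, 18, 70]}

# El conjunto funciona como constructo y sus planteamientos como ítems:
# la puntuación del constructo es el promedio de las clasificaciones de sus ítems.
for nombre, items in conjuntos.items():
    puntuacion = mean(clasificaciones[i] for i in items)
    print(f"{nombre}: {puntuacion:.2f}")   # imprime 3.40, la "Clasificación promedio" de la Tabla 3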
Cuando se piensa en mapas geográficos, el cartógrafo tiene como referencia
los mapas geográficos implícitos que elaboramos en nuestras mentes (o mapas
mentales) del paisaje geográfico que atravesamos. De igual manera, elaboramos
mapas mentales que van más allá de la simple geografía, puesto que nuestros mapas
mentales incorporan varios modos de nuestra experiencia como lo son las imágenes,
los olores, los sonidos y, especialmente el lenguaje y el significado que le atribuimos
a estas experiencias.
Por supuesto, un mapa mental está claramente relacionado con la manera en
que utilizamos los mapas impresos para tomar decisiones... Ciertamente, las
similitudes aumentan una vez que comenzamos a exteriorizar estos mapas
para compartirlos con otros. ¿Qué podemos decir? ¿Por qué escogemos un
camino? “Porque es el más corto”. “No, no, el camino más corto es el de St.
Mary hasta Lassiter Mill”- “Ah, entonces salga de Six Forks hasta Sandy
Forks” “Claro”. En este caso, en la mente de cada individuo los mapas son
consultados como si fueran planos vistos sobre una mesa, en los que se
relacionan los conocimientos individualmente interpretados en el pasado para
resolver una vivencia compartida que se desenlaza en el presente. (Wood,
1992, p. 15-16)
Los mapas conceptuales también permiten a los individuos negociar en un
terreno específico, en este caso, un terreno cognitivo de relaciones semánticas y
experiencias personales. La discusión de un mapa conceptual es una justificación
válida para explorar significados y consensos compartidos o identificar la diversidad
y divergencia de perspectivas. En el estudio de caso de la OPEN, no se ha dado esta
discusión (y pudiera ser que nunca se dé). Hubo algunas discusiones breves sobre los
mapas producidos en esta conferencia, durante la cual se cumplió el propósito
principal de construir estos mapas (ilustrar el proceso de elaboración de mapas con un
ejemplo relevante para los miembros de la OPEN). No obstante, la OPEN podría decidir abocarse a la planificación estratégica utilizando el mapa conceptual producido en la conferencia (y, esperaríamos, ampliarlo con una mayor participación e información de entrada).
Los intereses a los que responde el mapa están ocultos
¿Por qué tantas personas se desconciertan ante la idea de que los mapas
están prejuiciados, inclinados a una perspectiva o son necesariamente una distorsión?
Ello se debe a que un mapa es fidedigno precisamente en la medida en que la
huella de su autor… desaparece, porque sólo cuando éste pasa inadvertido,
puede llegar a materializarse el mundo real que el mapa trata de representar
(es decir, a ser asumido como el mundo). En la medida en que se evidencia la
presencia del autor y el interés que ineludiblemente representa, se hace difícil
pasarlo por alto y ver el mapa sin tomarlo en cuenta, asumiendo que el mundo
descrito… sea el mundo en verdad.
…Así como el autor y su interés vienen a ser marginados (o apartados
completamente del medio), el mundo representado es capaz de deslumbrarnos
tan rápidamente que habremos olvidado que se trataba de un cuadro que
alguien había elaborado para nosotros, es decir, olvidamos que ha sido
modificado, manipulado, seleccionado y codificado. Muy pronto…se
convierte en el mundo, en el real… en la realidad…Cuando los autores no se
presentan a sí mismos de forma manifiesta... en seguida sentimos que surge
cierto respeto. Es sorprendente cómo ocurre esto tan fácilmente, cómo
enseguida tomamos en cuenta, de forma natural, lo que no es más que la
interpretación social de un mapa. Todo lo que se requiere es que el autor
desaparezca y que su interés se vuelva invisible. (Wood, 1992, p. 70-71).
La elaboración de los mapas conceptuales se beneficia de la ilusión de la
objetividad, es decir, de la apariencia de que no existe nadie detrás de la máquina, y
de que se trata de un algoritmo científico y desapasionado que determina la
disposición de sus elementos. No obstante, muchas veces podemos percatarnos de la
persona que se encuentra detrás del algoritmo, e inclusive sucede con más frecuencia
cuando los participantes mismos determinan los resultados.
Una y otra vez hemos notado una vaguedad similar de contenidos y formas,
una deficiente diferenciación entre fines y medios. El mapa mostrará todo (y,
por consiguiente, se declarará inocente de cualquier elección). También
mostrará todo tal como es (sin tomar en cuenta las “mentiras blancas” que el
mapa deba decir para ser preciso y valedero). El mapa será visto como un
instrumento aplicable a muchos propósitos, pero ninguno puede predominar,
porque, de lo contrario, sus medios intentarán abarcar una gama tan amplia de
instituciones sociales que ninguna de ellas podrá reclamar el mapa para sí.
Por consiguiente, los diferentes grupos con intereses opuestos y estratificados
eludirán su responsabilidad en la elaboración del mapa, o su autoría lucirá tan
fragmentada entre tantos especialistas que se vuelve imposible determinarla.
Precisamente por mentir, por su carácter incierto y confuso, los mapas
parecerán mostrar el mundo tal como es en realidad. (Wood, 1992, p. 73)
Me parece que esta perspectiva es particularmente acertada. Si hay algo
cierto con respecto a los mapas conceptuales es que, a pesar de las incontables
decisiones y juicios aplicados a ellos, y a pesar de las múltiples perspectivas y
distorsiones, es difícil observar el mapa final sin considerarlo “preciso” en cierto
sentido. El ser humano ha cultivado un deseo muy fuerte de “buscar sentido” a su
entorno. ¡De cualquier modo, incluso somos capaces de realizar elaboradas
interpretaciones de manchas de tinta dispuestas al azar! Un mapa conceptual muestra
un cuadro que siempre parece ser muy ordenado y razonable. La computadora genera
muchos polígonos que parecen categóricos para encerrar grupos e ideas. Los estratos
que aparecen en los mapas sugieren relaciones de importancia y otras características.
Al fin de cuentas, con frecuencia el mapa termina siendo simplemente convincente.
Por ello, debemos recordarnos a nosotros mismos que los mapas son frágiles y
únicos, y ser honestos con nosotros mismos y con los participantes acerca de la forma
en que el mapa oculta diversos intereses y prejuicios.
El interés del mapa puede ser el suyo
Alguien pudiera considerar la perspectiva ofrecida en este ensayo sobre
elaboración de mapas conceptuales y sentirse deprimido o desalentado. Algunas
veces resulta difícil reconocer que una herramienta, aparentemente tan poderosa, sea
falible e imperfecta (como todas las demás, me atrevo a decir). Estoy seguro de que reconocer que esto también sucede con los mapas geográficos no ha sido fácil para la comunidad cartográfica, y de que ha sido motivo de algunas polémicas y de consternación durante las últimas décadas.
Una vez que aceptamos la inherente propensión de los mapas a incorporar
perspectivas particulares y que nos percatamos de que cada mapa responde a intereses
particulares, ¿diríamos entonces que el mapa ha perdido su valor?
Una vez que se reconoce que el mapa es una representación sesgada por
intereses particulares y se reconoce a plenitud su contingencia histórica, ya no
es necesario disfrazarla. Una vez liberada de su carga de…disimulo… el
mapa podrá asumir su carácter más genuino, que es el de ser un instrumento
para… procesar datos, analizar información cuantitativa y servir como
argumento persuasivo (Wood, 1992, p. 182)
Si este es el caso de los mapas geográficos, entonces quizás con más razón
sea el caso de los mapas conceptuales. Durante las últimas décadas, y en virtud de
que me dediqué a desarrollar la idea de la elaboración de los mapas conceptuales
descritos en este ensayo, con frecuencia he escuchado interrogantes por parte de mis
colegas acerca de la “validez” de los mapas que se estaban generando. Como
especialista en metodología, también me interesé en esta clase de interrogantes y en
diversos casos (Dumont, 1989; Trochim, 1993) intenté abordarlas, con diferentes grados de éxito entre un trabajo y otro. No se ha dilucidado con exactitud de qué manera se pudiera determinar mejor, metodológicamente, la precisión o validez de estos mapas.
Esta breve estadía en el mundo de la cartografía nos hace concluir que el
tema de la validez pudiera estar mal enfocado. Esta experiencia nos hace ver que
todos los mapas son inherentemente imprecisos, y quizás, me atrevo a decir, que
incluso el asunto de la precisión no sea muy interesante ni emocionante. En cierto
sentido, este descubrimiento no resta importancia al papel de los mapas, sean geográficos o de otra naturaleza; quizás, en vez de ello, deja absueltos a los mapas, como diría Wood, para que puedan desempeñarse en aquello que hacen mejor. El mapa es una poderosa herramienta sugerente. Ningún mapa puede cargar con el peso de ser preciso. Muchos mapas, basados en prejuicios, perspectivas e intereses múltiples y diversos, no valen menos por ser diferentes. Potencialmente, son estas diferencias las que les añaden valor, las que profundizan la perspectiva e incrementan nuestro conocimiento del mundo. Los mapas asumen entonces su “…carácter más genuino,
que es el de ser un instrumento para… procesar datos, analizar información
cuantitativa y servir como argumento persuasivo.”
Referencias
Anderberg, M.R. (1973). Cluster analysis for applications. New York, NY: Academic Press.
Bragg, L.R. and Grayson, T.E. (1993). Reaching consensus on outcomes: Lessons learned about concept mapping. Paper presented at the Annual Conference of the American Evaluation Association, Dallas, TX.
Buzan, T. with Buzan, B. (1993). The Mindmap Book: Radiant Thinking, The Major Evolution in Human Thought. London: BBC Books.
Caracelli, V. (1989). Structured conceptualization: A framework for interpreting evaluation results. Evaluation and Program Planning, 12, 1, 45-52.
Cook, J. (1992). Modeling staff perceptions of a mobile job support program for persons with severe mental illness. Paper presented at the Annual Conference of the American Evaluation Association, Seattle, WA.
Cooksy, L. (1989). In the eye of the beholder: Relational and hierarchical structures in conceptualization. Evaluation and Program Planning, 12, 1, 59-66.
Davis, J. (1989). Construct validity in measurement: A pattern matching approach. Evaluation and Program Planning, 12, 1, 31-36.
Davison, M.L. (1983). Multidimensional scaling. New York: John Wiley and Sons.
Dumont, J. (1989). Validity of multidimensional scaling in the context of structured conceptualization. Evaluation and Program Planning, 12, 1, 81-86.
Everitt, B. (1980). Cluster Analysis (2nd ed.). New York, NY: Halsted Press, A Division of John Wiley and Sons.
Galvin, P.F. (1989). Concept mapping for planning and evaluation of a Big Brother/Big Sister program. Evaluation and Program Planning, 12, 1, 53-58.
Grayson, T.E. (1992). Practical issues in implementing and utilizing concept mapping. Paper presented at the Annual Conference of the American Evaluation Association, Seattle, WA.
Grayson, T.E. (1993). Empowering key stakeholders in the strategic planning and development of an alternative school program for youth at risk of school behavior. Paper presented at the Annual Conference of the American Evaluation Association, Dallas, TX.
Gurowitz, W.D., Trochim, W. and Kramer, H. (1988). A process for planning. The Journal of the National Association of Student Personnel Administrators, 25, 4, 226-235.
Harley, J.B. (1990). Text and contexts in the interpretation of early maps. In David Buisseret (Ed.), From Sea Charts to Satellite Images: Interpreting North American History Through Maps. Chicago: University of Chicago Press.
Imhoff, E. (1982). Cartographic Relief Representation. Berlin: de Gruyter Press.
Kane, T.J. (1992). Using concept mapping to identify provider and consumer issues regarding housing for persons with severe mental illness. Paper presented at the Annual Conference of the American Evaluation Association, Seattle, WA.
Keith, D. (1989). Refining concept maps: Methodological issues and an example. Evaluation and Program Planning, 12, 1, 75-80.
Kohler, P.D. (1992). Services to students with disabilities in postsecondary education settings: Identifying program outcomes. Paper presented at the Annual Conference of the American Evaluation Association, Seattle, WA.
Kohler, P.D. (1993). Serving students with disabilities in postsecondary education settings: Using program outcomes for planning, evaluation and empowerment. Paper presented at the Annual Conference of the American Evaluation Association, Dallas, TX.
Kruskal, J.B. and Wish, M. (1978). Multidimensional Scaling. Beverly Hills, CA: Sage Publications.
Lassegard, E. (1992). Assessing the reliability of the concept mapping process. Paper presented at the Annual Conference of the American Evaluation Association, Seattle, WA.
Lassegard, E. (1993). Conceptualization of consumer needs for mental health services. Paper presented at the Annual Conference of the American Evaluation Association, Dallas, TX.
Linton, R. (1989). Conceptualizing feminism: Clarifying social science concepts. Evaluation and Program Planning, 12, 1, 25-30.
Mannes, M. (1989). Using concept mapping for planning the implementation of a social technology. Evaluation and Program Planning, 12, 1, 67-74.
Marquart, J.M. (1988). A pattern matching approach to link program theory and evaluation data: The case of employer-sponsored child care. Unpublished doctoral dissertation, Cornell University, Ithaca, New York.
Marquart, J.M. (1989). A pattern matching approach to assess the construct validity of an evaluation instrument. Evaluation and Program Planning, 12, 1, 37-44.
Marquart, J.M. (1992). Developing quality in mental health services: Perspectives of administrators, clinicians, and consumers. Paper presented at the Annual Conference of the American Evaluation Association, Seattle, WA.
Marquart, J.M., Pollak, L. and Bickman, L. (1993). Quality in intake assessment and case management: Perspectives of administrators, clinicians and consumers. In R. Friedman et al. (Eds.), A system of care for children's mental health: Organizing the research base. Tampa: Florida Mental Health Institute, University of South Florida.
McLinden, D.J. and Trochim, W.M.K. (In press). From puzzles to problems: Assessing the impact of education in a business context with concept mapping and pattern matching. In J. Phillips (Ed.), Return on investment in human resource development: Cases on the economic benefits of HRD, Volume 2. Alexandria, VA: American Society for Training and Development.
Mead, J.P. and Bowers, T.J. (1992). Using concept mapping in formative evaluations. Paper presented at the Annual Conference of the American Evaluation Association, Seattle, WA.
Mercer, M.L. (1992). Brainstorming issues in the concept mapping process. Paper presented at the Annual Conference of the American Evaluation Association, Seattle, WA.
Novak, J.D. (1993). How do we learn our lesson? Taking students through the process. The Science Teacher, 60, 3, 50-55.
Novak, J.D. and Gowin, D.B. (1985). Learning How To Learn. Cambridge: Cambridge University Press.
Nunnally, J.C. (1978). Psychometric Theory (2nd ed.). New York: McGraw Hill.
Osborn, A.F. (1948). Your Creative Power. New York, NY: Charles Scribner.
Penney, N.E. (1992). Mapping the conceptual domain of provider and consumer expectations of inpatient mental health treatment: New York results. Paper presented at the Annual Conference of the American Evaluation Association, Seattle, WA.
Romney, A.K., Weller, S.C. and Batchelder, W.H. (1986). Culture as consensus: A theory of culture and informant accuracy. American Anthropologist, 88, 2, 313-338.
Rosenberg, S. and Kim, M.P. (1975). The method of sorting as a data gathering procedure in multivariate research. Multivariate Behavioral Research, 10, 489-502.
Ryan, L. and Pursley, L. (1992). Using concept mapping to compare organizational visions of multiple stakeholders. Paper presented at the Annual Conference of the American Evaluation Association, Seattle, WA.
SenGupta, S. (1993). A mixed-method design for practical purposes: Combination of questionnaire(s), interviews, and concept mapping. Paper presented at the Annual Conference of the American Evaluation Association, Dallas, TX.
Shern, D.L. (1992). Documenting the adaptation of rehabilitation technology to a core urban, homeless population with psychiatric disabilities: A concept mapping approach. Paper presented at the Annual Conference of the American Evaluation Association, Seattle, WA.
Shern, D.L., Trochim, W. and LaComb, C.A. (1995). The use of concept mapping for assessing fidelity of model transfer: An example from psychiatric rehabilitation. Evaluation and Program Planning, 18, 2.
Trochim, W. (1985). Pattern matching, validity, and conceptualization in program evaluation. Evaluation Review, 9, 5, 575-604.
Trochim, W. (Ed.) (1989). A special issue of Evaluation and Program Planning on concept mapping for planning and evaluation, 12.
Trochim, W. (1989a). An introduction to concept mapping for planning and evaluation. Evaluation and Program Planning, 12, 1, 1-16.
Trochim, W. (1989b). Concept mapping: Soft science or hard art? Evaluation and Program Planning, 12, 1, 87-110.
Trochim, W. (1989c). Outcome pattern matching and program theory. Evaluation and Program Planning, 12, 4, 355-366.
Trochim, W. (1990). Pattern matching and program theory. In H.C. Chen (Ed.), Theory-Driven Evaluation. New Directions for Program Evaluation. San Francisco, CA: Jossey-Bass.
Trochim, W. and Cook, J. (1992). Pattern matching in theory-driven evaluation: A field example from psychiatric rehabilitation. In H. Chen and P.H. Rossi (Eds.), Using Theory to Improve Program and Policy Evaluations. New York: Greenwood Press, 49-69.
Trochim, W. (1993). Reliability of concept mapping. Paper presented at the Annual Conference of the American Evaluation Association, Dallas, TX, November.
Trochim, W. (1996). Criteria for evaluating graduate programs in evaluation. Evaluation News and Comment: The Magazine of the Australasian Evaluation Society, 5, 2, 54-57.
Trochim, W. and Linton, R. (1986). Conceptualization for evaluation and planning. Evaluation and Program Planning, 9, 289-308.
Trochim, W., Cook, J. and Setze, R. (1994). Using concept mapping to develop a conceptual framework of staff's views of a supported employment program for persons with severe mental illness. Consulting and Clinical Psychology, 62, 4, 766-775.
Valentine, K. (1989). Contributions to the theory of care. Evaluation and Program Planning, 12, 1, 17-24.
Valentine, K. (1992). Mapping the conceptual domain of provider and consumer expectations of inpatient mental health treatment: Wisconsin results. Paper presented at the Annual Conference of the American Evaluation Association, Seattle, WA.
Weller, S.C. and Romney, A.K. (1988). Systematic Data Collection. Newbury Park, CA: Sage Publications.
Wilford, J.N. (1982). The Mapmakers: The Story of the Great Pioneers in Cartography from Antiquity to the Space Age. New York: Vintage Books.
Wood, D. (1992). The Power of Maps. New York: The Guilford Press.
Tabla 1. Planteamientos sometidos a lluvia de ideas
1. Aumentar el número de miembros del personal encargado de los
programas y de los que no son investigadores.
2. Crear un logotipo.
3. Incluir en la organización empresas de fideicomiso público, tales como
empresas de servicios básicos o control de gasto público, que utilicen
el programa de animación GIF.
4. Reclutar recursos humanos relacionados con la evaluación, por
ejemplo, personas en el área de la evaluación de proyectos,
entrenamiento, asistencia técnica, etc.
5. Contar con sesiones de adquisición de destrezas cuya frecuencia sea
mayor a una vez cada dos meses.
6. Incrementar la diversidad de las áreas de especialización de los
miembros.
7. Procurar la participación de otras entidades político-administrativas
locales.
8. Convocar la Junta Directiva de los entes gubernamentales con el fin de
promover la evaluación.
9. Celebrar conferencias anuales.
10. Convocar reuniones durante el día para hacer compatible las tareas de
los investigadores con sus responsabilidades familiares.
11. Solicitar la publicación de las investigaciones llevadas a cabo en el
área.
12. Promover diversos eventos para recabar fondos.
13. Promover la participación de los evaluadores con otras partes del
estado.
14. Proporcionar información sobre las subvenciones disponibles.
15. Crear un enlace con nuestros vecinos del norte, y observar lo que está
pasando a nivel regional.
16. Contar con mesas redondas que cuenten con el apoyo de facilitadores.
17. Extenderse a organizaciones que reciban ayuda federal, donde se requiera la evaluación.
18. Ofrecer información actualizada a través de boletines informativos
acerca de las oportunidades de investigación anunciadas originalmente
por medio de la OPEN.
19. Ofrecer un feedback en la reunión anual de la AEA para los miembros
de nuestra organización que no puedan asistir.
20. Incluir “paneles” como parte del formato del programa.
21. Organizar mini conferencias.
22. Promover encuentros informales en diversos lugares entre los que no
son anfitriones.
23. Incrementar vínculos con otras organizaciones de investigación.
24. Desarrollar una red que incluya programas.
25. Establecer vínculos con cuerpos policiales con el fin de promover la
evaluación.
26. Trabajar en las formas de incluir a evaluadores provenientes de otras
partes del estado.
27. Incrementar los vínculos con la Asociación Americana de Evaluación
(AEA).
28. Incluir una pequeña biografía de cada miembro de la OPEN.
29. Celebrar eventos fuera del centro de la ciudad de Pórtland.
30. Elaborar boletines informativos, correos electrónicos o
comunicaciones mediante otros medios de comunicación.
31. Proporcionar información sobre los contratos disponibles.
32. Mantener presentaciones bimensuales.
33. Aumentar el alcance de la organización a fin de promover la
evaluación en los sectores públicos y privados.
34. Formar subgrupos de intereses especiales dentro del conjunto de
miembros de la organización.
35. Aplicar los conocimientos sobre investigación para realizar cambios de
política.
36. Aumentar aún más el alcance de la organización.
37. Incluir discusiones interactivas con presencia de un auditorio en las
sesiones de programas.
38. Unificar la junta consultora estatal y regional como representantes de
la OPEN.
39. Aumentar el número de miembros de auditores y contadores.
40. Continuar enviando anuncios por correo electrónico sobre
oportunidades de empleo.
41. Distribuir los resultados de las investigaciones a miembros y
organizaciones.
42. Hacer circular una lista de miembros que sean expertos en varias
técnicas y que deseen realizar breves consultas con otros miembros.
43. Elaborar bolsos color marrón con los rótulos “tópicos” o “intereses
especiales”.
44. Obtener más conocimiento de los evaluadores locales sobre su carga
de trabajo, incluyendo información acerca de proyectos en marcha.
45. Contar con un modelo de “las mejores prácticas” que puedan utilizarse
en conferencias y talleres.
46. Crear un fondo de becas estudiantiles para las conferencias.
47. Conformar un grupo de voluntarios con el fin de colaborar con las
agencias que requieran alguna evaluación pero que no cuenten con los
fondos necesarios.
48. Realizar en los boletines informativos un resumen de las sesiones de
adquisición de destrezas.
49. Crear un mejor sistema de comunicación para miembros y no
miembros.
50. Conformar un fondo de becas estudiantiles con el fin de recabar
dinero para las investigaciones (ejemplo, trabajos de grado)
51. Seguir ofreciendo sesiones destinadas al desarrollo del conocimiento y
la información.
52. Aumentar los vínculos con grupos de expertos de los EEUU.
53. Ofrecer servicios a la comunidad en el área de la evaluación, destinados a las organizaciones que no cuenten con los fondos necesarios para tales
fines.
54. Formar un comité de conferencistas.
55. Mejorar la publicidad de los beneficios de los miembros de la OPEN.
56. Incorporar estudiantes y otros grupos privados del derecho al voto,
ofreciéndoles membresía gratuita.
57. Organizar almuerzos campestres para miembros y no miembros cada
año.
58. Aumentar el número de miembros así como la participación de
personas de tez oscura.
59. Coordinar la comunicación enviada a través de correos electrónicos,
páginas web y boletines informativos.
60. Aumentar el alcance de la organización de modo que incorpore todas
las tareas de investigación, y no se limite a los programas de
evaluación.
61. Incrementar la diversidad de los miembros de modo que incluya
sectores comerciales.
62. Desarrollar un programa de ayudas para estudiantes y recién
graduados.
63. Contar con un departamento de estudiantes.
64. Desarrollar un programa de ayuda para recién graduados.
65. Utilizar el formato de “disertación académica” para los temas
abordados por el programa.
66. Hallar una forma de supervisar la salud pública en el área conformada
por los tres condados participantes y persuadir a los miembros de la
Comisión a recabar datos anualmente.
67. Desarrollar programas de asistencia técnica para miembros con el fin
de ayudar a otros miembros.
68. Tomar medidas que faciliten el acceso de los miembros de las OPEN a
las bibliotecas.
69. Incentivar la colaboración entre la OPEN y las organizaciones que
requieran investigaciones y evaluaciones.
70. Crear una sala de chateo interactiva en el sitio web.
71. Crear la figura de pasantías remuneradas.
72. Constituirse como una organización sin fines de lucro.
73. Trabajar en función de promover la colaboración entre los miembros.
74. Planificar y programar las reuniones bimensuales con un año de
anticipación.
75. Exhortar a los miembros a que sean más activos y participativos.
76. Convocar reuniones realizadas fuera de Lucky Lab.
77. Incluir el “debate” como formato del programa.
78. Ofrecer becas estudiantiles en la conferencia de la AEA.
79. Promover un Crédito de Educación Continuo para los programas.
80. Incluir en los directorios aquellas áreas en las que los miembros les
gustaría desarrollar sus habilidades.
81. Duplicar el número de miembros.
82. Promover diversos eventos gratuitos.
83. Ampliar el alcance para mejorar las escuelas públicas y la educación
superior.
Tabla 2. Datos demográficos

Nombre: Empleo
Pregunta: ¿Cuál fue el primer lugar donde lo emplearon?
Respuesta (seleccione una opción): Gobierno; Educación; Organizaciones sin fines de lucro; Asesor de investigación; Estudiante; Otros

Nombre: Grado académico
Pregunta: ¿Cuál es el grado académico más alto obtenido?
Respuesta (seleccione una opción): Licenciatura/Ingeniería; Maestría; Ph.D./Doctorado; Otros

Nombre: Disciplina
Pregunta: ¿Cuál es su disciplina principal?
Respuesta (seleccione una opción): Agricultura; Comercio; Economía; Educación; Evaluación; Salud; Servicios Comunitarios; Ciencias Políticas; Psicología; Sociología; Trabajo Social; Otra

Nombre: Años en el campo de la investigación
Pregunta: ¿Cuántos años de experiencia ha tenido en el campo de la investigación o en actividades afines?
Respuesta: coloque un valor entre 0 y 75

Nombre: Miembro de la OPEN
Pregunta: ¿Es Ud. miembro de la OPEN?
Respuesta (seleccione una opción): Sí; No

Nombre: Miembro de la AEA
Pregunta: ¿Es Ud. miembro de la Asociación Americana de Evaluación (AEA)?
Respuesta (seleccione una opción): Sí; No

Nombre: Género
Pregunta: ¿Cuál es su género?
Respuesta (seleccione una opción): Femenino; Masculino

Nombre: Lugar de residencia
Pregunta: ¿En qué ciudad se encuentra ubicada su residencia?
Respuesta (seleccione una opción): Pórtland; Extremo oeste de Pórtland; Extremo este de Pórtland; Corvallis; Salem; Eugene; Oregon – Otro; Estado de Washington; Otro
Tabla 3. Planteamientos por conjunto en orden descendente según el
criterio de clasificación promedio.
Reclutamiento
58) Aumentar el número de miembros y la participación de las personas de tez oscura. 3,83
6) Incrementar la diversidad de las áreas de especialización de los miembros.
3,78
13) Promover la participación de los evaluadores con otras partes del estado.
3,72
75) Exhortar a los miembros a que sean más activos y participativos. 3,61
55) Mejorar la publicidad de los beneficios de los miembros de la OPEN. 3,17
81) Duplicar el número de miembros. 3,11
60) Aumentar el alcance de la organización de modo que incorpore todas las
tareas de investigación y no se limite a los programas de evaluación. 2,94
4) Reclutar recursos humanos relacionados con la evaluación, por ejemplo,
personas en el área de la evaluación de proyectos, entrenamiento, asistencia
técnica, etc. 2,78
63) Contar con una división de estudiantes. 2,78
1) Aumentar el número de miembros del personal encargado de los programas
y de los que no son investigadores. 2,67
56) Incorporar estudiantes y otros grupos privados del derecho al voto
ofreciéndoles membresía gratuita. 2,67
61) Incrementar la diversidad de los miembros de modo que incluya sectores
comerciales. 2,61
39) Aumentar el número de miembros de auditores y contadores. 2,22

Clasificación promedio: 3,07
Vínculos
7) Procurar la participación de otras entidades político-administrativas locales.
3,61
26) Trabajar en las formas de incluir a evaluadores provenientes de otras
partes del estado. 3,61
15) Crear un enlace con nuestros vecinos del norte, y observar lo que está
pasando a nivel regional. 3,56
24) Desarrollar una red que incluya programas. 2,72
3) Incluir en la organización empresas de fideicomiso público, tales como
empresas de servicios básicos o de control del gasto público, que utilice el
programa de animación GIF. 2,41

Clasificación promedio: 3,18

RRPP & Servicios a la Comunidad


35) Utilizar los conocimientos sobre investigación para realizar cambios de
políticas. 3,22
47) Conformar un grupo de voluntarios con el fin de colaborar con las
organizaciones que requieran alguna evaluación pero que no cuenten con los
fondos necesarios. 3,11
53) Ofrecer servicios a la comunidad en el área de la evaluación destinada a
organizaciones que no cuenten con los fondos necesarios para tales fines. 3,00
66) Hallar una forma de supervisar la salud pública en el área conformada por
los tres condados y persuadir a los miembros de la Comisión de recabar datos
anualmente. 2,89
54) Formar un comité de conferencias. 2,83
11) Solicitar la publicación de las investigaciones llevadas a cabo en el área.
2,72
2) Crear un logotipo. 2,17

Clasificación promedio: 2,85

Alcance
69) Incentivar la colaboración entre la OPEN y las organizaciones que
requieran investigaciones y evaluaciones. 3,44
33) Aumentar el alcance de la organización a fin de promover la evaluación en
los sectores públicos y privados. 3,44
23) Incrementar los vínculos con otras organizaciones destinadas a la
investigación. 3,33
27) Incrementar los vínculos con la Asociación Americana de Evaluación
(AEA). 3,33
17) Extenderse a organizaciones que reciben ayuda federal, donde se requiera
la evaluación. 3,28
36) Aumentar aún más el alcance de la organización. 3,28
52) Aumentar los vínculos con grupos de expertos nacionales. 3,28
83) Ampliar el alcance para mejorar las escuelas públicas y de educación
superior. 3,11
8) Convocar la Junta Directiva de los entes gubernamentales con el fin de
promover la evaluación. 3,04
38) Unificar la junta consultora estadal y regional como representante de la
OPEN. 2,61
25) Establecer vínculos con cuerpos policiales con el fin de promover la
evaluación. 2,56

Clasificación promedio: 3,20
Programas & Eventos
51) Seguir ofreciendo sesiones para el desarrollo del conocimiento y la
información. 4,56
32) Mantener las presentaciones bimensuales. 4,06
43) Elaborar bolsos color marrón con los rótulos “tópico” o “intereses
especiales”. 3,89
9) Celebrar conferencias anuales. 3,83
10) Convocar reuniones durante el día para hacer compatible la tarea de los
investigadores con sus responsabilidades familiares. 3,44
21) Organizar mini conferencias. 3,44
45) Contar con un modelo de las “mejores prácticas” que puedan utilizarse en
conferencias y talleres. 3,41
37) Incluir discusiones interactivas con presencia de un auditorio en las
sesiones de programas. 3,22
44) Obtener más conocimiento de los evaluadores locales sobre su carga de
trabajo, incluyendo información acerca de proyectos en marcha. 3,22
72) Constituirse como una organización sin fines de lucro. 3,17
20) Incluir “paneles” como parte del formato del programa. 3,11
16) Contar con mesas redondas que cuenten con el apoyo de facilitadores.
3,06
77) Incluir el “debate” como formato del programa. 3,00
76) Convocar reuniones realizadas fuera de Lucky Lab. 2,94
82) Promover diversos eventos gratuitos. 2,89
29) Celebrar eventos fuera del centro de la ciudad de Pórtland. 2,83
74) Planificar y programar las reuniones bimensuales con un año de anticipación. 2,78
65) Utilizar el formato de “disertación académica” para los temas abordados
por el programa. 2,67

81
12) Promover diversos eventos para recabar fondos. 2,61
22) Promover encuentros informales en diversos lugares entre los que no son
anfitriones. 2,61
5) Contar con sesiones de adquisición de destrezas cuya frecuencia sea mayor
a una vez cada dos meses. 2,56
57) Organizar almuerzos campestres para miembros y no miembros cada año.
2,22

Clasificación promedio: 3,16

Communication
40) Continue sending e-mail announcements about job opportunities. 4.56
48) Summarize the skill-building sessions in the newsletters. 3.83
31) Provide information about available contracts. 3.78
14) Have roundtables with facilitators present. 3.67
19) Get feedback from the AEA annual meeting for members of the
organization who cannot attend. 3.61
30) Produce newsletters, e-mails, or other communications through other
media. 3.44
59) Coordinate the communication sent via e-mail, web pages, and
newsletters. 3.11
18) Provide updated information in newsletters about research opportunities
originally announced through OPEN. 2.61
70) Create an interactive chat room on the website. 2.00

Average rating: 3.40

Services
42) Circulate a list of members who are experts in various techniques and are
willing to hold brief consultations with other members. 3.89
73) Work to promote collaboration among members. 3.78
41) Distribute research results to members and organizations. 3.72
67) Develop technical assistance programs in which members help other
members. 3.72
34) Form special-interest subgroups within the organization's membership. 3.44
28) Include a short biography of OPEN members. 3.06
49) Develop a better communication system for members and nonmembers. 3.06
68) Take steps to facilitate OPEN members' access to libraries. 2.94
79) Propose Continuing Education Credit for the programs. 2.89
80) Include in the directories the areas in which members would like to
develop their skills. 2.56

Average rating: 3.31

Scholarships & Assistantships
64) Develop an assistance program for recent graduates. 3.22
62) Develop an assistance program for students and recent graduates. 3.11
46) Create a student scholarship fund for conferences. 2.94
71) Create paid internships. 2.39
50) Establish a student scholarship fund to raise money for research
(e.g., theses). 2.28
78) Offer student scholarships at the AEA conference. 2.28

Average rating: 2.70

Table 4. Statements in descending order by average rating
40) Continue sending e-mail announcements about job opportunities. 4.56
51) Continue offering sessions for knowledge and information
development. 4.56
32) Maintain the bimonthly presentations. 4.06
42) Circulate a list of members who are experts in various techniques and are
willing to hold brief consultations with other members. 3.89
43) Hold "brown bag" sessions labeled by "topic" or "special
interest". 3.89
69) Encourage collaboration between OPEN and organizations that need
research and evaluation. 3.89
9) Hold annual conferences. 3.83
48) Summarize the skill-building sessions in the newsletters. 3.83
58) Increase membership as well as the participation of people of color. 3.83
6) Increase the diversity of members' areas of specialization. 3.78
31) Provide information about available contracts. 3.78
73) Work to promote collaboration among members. 3.78
13) Promote the participation of evaluators in other parts of the state. 3.72

41) Distribute research results to members and organizations. 3.72
67) Develop technical assistance programs in which members help other
members. 3.72
14) Have roundtables supported by facilitators. 3.67
7) Seek the participation of other local governmental entities. 3.61
19) Get feedback from the OPEN annual meeting for members of our
organization who cannot attend. 3.61
26) Work on ways to include evaluators from other parts of the state. 3.61
75) Encourage members to be more active and participatory. 3.61
15) Create a link with our neighbors to the north, and watch what is
happening at the regional level. 3.56
10) Hold daytime meetings so that researchers can balance their work with
their family responsibilities. 3.44
21) Organize mini-conferences. 3.44
30) Produce newsletters, e-mails, or other communications through other
media. 3.44
33) Increase the organization's outreach in order to promote evaluation in the
public and private sectors. 3.44
34) Form special-interest subgroups within the organization's membership. 3.44
45) Have a "best practices" model that can be used in conferences and
workshops. 3.41
23) Increase ties with other research organizations. 3.33

27) Increase ties with the American Evaluation Association (AEA). 3.33
17) Reach out to organizations that receive federal aid, where evaluation is
required. 3.28
36) Further increase the organization's outreach. 3.28
52) Increase ties with national expert groups. 3.28
35) Use research knowledge to make policy changes. 3.22
37) Include interactive discussions with an audience in the program
sessions. 3.22
44) Learn more from local evaluators about their workloads, including
information about ongoing projects. 3.22
64) Develop an assistance program for recent graduates. 3.22
55) Better publicize the benefits of OPEN membership. 3.17
72) Incorporate as a nonprofit organization. 3.17
20) Include "panels" as part of the program format. 3.11
47) Form a volunteer group to collaborate with organizations that need an
evaluation but do not have the necessary funds. 3.11
59) Coordinate the communication sent via e-mail, web pages, and
newsletters. 3.11
62) Develop an assistance program for students and recent graduates. 3.11
81) Double the membership. 3.11
83) Expand outreach to improve public schools and higher education. 3.11
8) Convene the boards of government agencies in order to promote
evaluation. 3.06
16) Have roundtables with facilitators. 3.06

28) Include a short biography of each OPEN member. 3.06
49) Create a better communication system for members and nonmembers. 3.06
53) Offer community services in the area of evaluation to organizations that do
not have the funds for that purpose. 3.00
77) Include "debate" as a program format. 3.00
46) Create a student scholarship fund for conferences. 2.94
60) Increase the organization's outreach so that it encompasses all research
tasks and is not limited to program evaluation. 2.94
68) Take steps to facilitate OPEN members' access to libraries. 2.94
76) Hold meetings outside the Lucky Lab. 2.94
66) Find a way to monitor public health in the tri-county area and persuade the
members of the Commission to collect data annually. 2.89
79) Promote Continuing Education Credit for the programs. 2.89
82) Promote various free events. 2.89
29) Hold events outside the city of Portland. 2.83
54) Form a conference committee. 2.83
4) Recruit people who work in areas related to evaluation, for example,
project evaluation, training, technical assistance, etc. 2.78
63) Have a student division. 2.78
74) Plan and schedule the bimonthly meetings a year in advance. 2.78
11) Request publication of the research carried out in the area. 2.72
24) Develop a network that includes programs. 2.72

1) Increase membership among program staff and non-researchers. 2.67
56) Bring in students and other disenfranchised groups by offering them free
membership. 2.67
65) Use an "academic lecture" format for program topics. 2.67
12) Promote various fundraising events. 2.61
18) Provide updated information in newsletters about research opportunities
originally announced through OPEN. 2.61
22) Promote informal gatherings at various locations among non-hosts. 2.61
38) Unify the state and regional advisory board as a representative of
OPEN. 2.61
61) Increase member diversity to include business sectors. 2.61
5) Hold skill-building sessions more often than once every two months. 2.56
25) Establish ties with law enforcement agencies in order to promote
evaluation. 2.56
80) Include in the directories the areas in which members would like to
develop their skills. 2.56
3) Include in the organization public-trust entities, such as utility companies
or public spending oversight agencies, that use the GIF animation
program. 2.41
71) Create paid internships. 2.39
50) Establish a student scholarship fund to raise money for research
(e.g., theses). 2.28
78) Offer student scholarships at the AEA conference. 2.28

39) Increase membership among auditors and accountants. 2.22
57) Organize annual picnics for members and nonmembers. 2.22
2) Create a logo. 2.17
70) Create an interactive chat room on the website. 2.00

Measuring Organizational Performance as a
Result of Installing a New Information
System: Using Concept Mapping as the
Basis for Performance Measurement

William M.K. Trochim


Cornell University

Paper presented to the Annual Conference of the American Evaluation


Association, Orlando, FL, November 3-6, 1999

DRAFT: Not for quotation or citation


For additional information contact:
William Trochim
132 MVR Hall
Department of Policy Analysis &
Management
Cornell University
Ithaca NY 14853
wmt1@cornell.edu

The Client: CITGO


● Key Statistics: Revenues: $13 Billion;
Employees: 4,000; Refinery Capacity:
1,073,000 BPD.
● Products & Services:
• produces and sells transportation fuels throughout the U.S.
through branded marketers and distributors
• produces and sells petrochemicals and industrial products in
bulk to a variety of U.S. manufacturers as raw materials for
finished goods
• markets many different types, grades and container sizes of
lubricant and wax products
• produces and markets high quality asphalt
• owns and operates a 959 mile crude oil pipeline system and
three product pipelines with a combined total of
approximately 1,100 miles

The Context
● Project E2000: Implementation of SAP computer
system in all major divisions at CITGO
– began in 1995
– first project (Asphalt) “go live” date: 1/1/98
● The role of performance measurement
– monitor post-implementation problems
– identify areas for improvement in implementation
● Initial measurement system problems (Asphalt
project)
– focused on developing metrics rather than desired
performance outcomes
– inconsistent results across locations
– too heavily based on perceptions
– the measurement system wasn’t designed to meet all of the
business requirements

The Light Oils Project


● Purpose: identify measures that would help CITGO
track their “Go Live” readiness in the Light Oils
division and their performance during the first 90
days of implementation
– Minimize the performance dip that would inevitably occur
with the introduction of a new system
● Start early enough to:
– develop the measurement system
– assign accountability for gathering measures
– let users know how they would be measured before go live
● Start with business problem and performance
objectives, not with metrics

Steps and Tools

1. Map business performance objectives: Concept mapping (The Concept System®)
2. Prioritize performance objectives: Importance ratings (CS Global®)
3. Develop and detail metrics: The CS Performer®
4. Collect data and utilize results: The CS Performer®

1. Map Performance Objectives


● Focus: Describe specific human and business
performance objectives for your area (or another
area) of the business that will ensure a successful
Light Oils implementation for CITGO.
● Participants
– 16 core participants brainstorm and sort objectives
– 42 participants rate objectives for relative importance
● Objectives: 124 performance objectives generated
● Tools
– Brainstorming: The Concept System (group facilitated)
– Sorting: The Concept System Remote on LAN
– Rating: CS Global over Internet


Sample Performance Objectives
● Complete successful system test & validation
● Train people only on those tasks which they will be responsible for
● Collect what you bill in a timely manner
● Ensure there is no impact or disruption to the current level of customer support
● Communicate changes to the invoice to customers
● Ensure correct customer master data (pricing, credit, tax, etc.)
● Individuals must understand the implications of not performing tasks correctly
● Communicate what roles individuals will be expected to perform
● Determine who has responsibility for performing each role
● Define any role changes that may be required
● Bill customers accurately
● Bill on timely basis
● Ensure help desk staff are sufficiently trained (SAP)
● Ensure business unit Power Users are fully trained
● Identify the Power Users in each business area
● Power Users need to demonstrate an understanding of the entire business process in their area of
responsibility
● Power Users need to demonstrate expertise in SAP that supports their area of responsibility
● Give Power Users the appropriate time to be involved in testing
● Power Users need to demonstrate an understanding of special exceptions or unique situations that
may arise
● ...

Performance Objectives Map

[Cluster map figure: the performance objectives grouped into ten clusters: Human Concerns, Roles & Communication, Training & Support, General Business Processes, Accounting/Tax, Customer Service, Testing & QC, Billing, Data Interface & Conversion, and Hydrocarbons.]


Pre-Post Implementation

[Cluster map figure: the legend shows the proportion of statements in each cluster that were considered relevant to pre-implementation. Layers: 1 = 0.00 to 0.19; 2 = 0.19 to 0.38; 3 = 0.38 to 0.58; 4 = 0.58 to 0.77; 5 = 0.77 to 0.96. Clusters shown: Human Concerns, Roles & Communication, Training & Support, General Business Processes, Customer Service, Accounting/Tax, Testing & QC, Billing, Hydrocarbons, and Data Interface & Conversion.]

Hydrocarbons Cluster
Report accurate bulk movements from carriers (e.g. pipelines, marine) (46)
Make sure suppliers know how to invoice us to get paid in a timely manner (53)
Make sure exchange balances are correct (58)
Make sure terminals don't run out of product (63)
Properly schedule production out of the refineries (64)
Properly schedule feed stocks into the refineries (65)
Properly schedule products into the terminals (66)
Guarantee sales dollars equal volume delivered times accurate price (70)
Make sure that billing delays do not impact demand planning (71)
Make sure that billing delays do not impact the allocations system (72)
Ensure vessels are available for charter (89)
Send out exchange statements in a timely manner (netting) (90)
Be able to keep CITGO inventory and 3rd party inventory separate (terminaling agreements, refinery processing
agreements) (91)
Ensure that refinery production schedules come into TSW properly (96)


2. Prioritize Performance Objectives
● Instruction: Rate each objective for how important it is for ensuring a
successful Light Oils implementation for CITGO, where:

1 = relatively unimportant
2 = somewhat important
3 = moderately important
4 = very important
5 = extremely important

● 42 participants provided importance ratings (an illustrative averaging sketch follows)
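
The deck itself does not show the computation, but the importance layers on the map are simply averages of these 1-to-5 ratings. As a purely illustrative sketch with made-up numbers (not the CITGO data), the per-statement and per-cluster averages could be computed like this:

```python
import numpy as np

# Made-up example data, not the CITGO ratings: ratings[r, s] is rater r's 1-5
# importance rating of statement s; cluster_of[s] is the cluster containing statement s.
ratings = np.array([[4, 5, 3, 2],
                    [5, 4, 4, 3],
                    [4, 4, 3, 3]])        # 3 raters x 4 statements
cluster_of = np.array([0, 0, 1, 1])       # statements 0-1 in cluster 0, 2-3 in cluster 1

statement_means = ratings.mean(axis=0)    # average importance per statement
cluster_means = np.array([statement_means[cluster_of == c].mean()
                          for c in np.unique(cluster_of)])

print(statement_means)   # approx. [4.33, 4.33, 3.33, 2.67]
print(cluster_means)     # approx. [4.33, 3.00] -> drives the layers on the importance map
```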


Importance Map

[Cluster map figure: layers show the average importance rating for each cluster. Layers: 1 = 3.83 to 3.97; 2 = 3.97 to 4.10; 3 = 4.10 to 4.23; 4 = 4.23 to 4.36; 5 = 4.36 to 4.50. Clusters shown: Human Concerns, Roles & Communication, Training & Support, General Business Processes, Customer Service, Testing & QC, Accounting/Tax, Billing, Hydrocarbons, and Data Interface & Conversion.]


3. Develop and Detail Metrics
● Five teams for metric development
– based on performance objectives map
– relevant knowledgeable stakeholders
● Process
– review performance objectives in relevant clusters
– brainstorm metric candidates
– select final metrics
– detail
» operational instruction
» target value
» low-high case
» frequency
» person/unit responsible
» follow-up metrics


Metric Development Teams

[Cluster map figure: the five metric development teams (numbered 1 through 5) shown as regions overlaid on the performance objectives map clusters: Human Concerns, Roles & Communication, Training & Support, General Business Processes, Accounting/Tax, Customer Service, Testing & QC, Billing, Data Interface & Conversion, and Hydrocarbons.]


Entering Metric Data in CS Performer


Final Performance Measures

[Cluster map figure: the final pre- and post-implementation measures linked to clusters on the performance objectives map. Measures include: SAP Employee Turnover, User Knowledge of SAP, Effect on Job Performance, Understanding of Role, Power User Proficiency, Goods Movement Timeliness, SAP User Proficiency, Help Desk Tickets, AR Delinquency, Disputed Invoices, Billing Correction Efficiency, Pass Rate on Testing Scenarios, Client Dependent Change Requests, Billing Process Rejects, Invoice Fulfillment Rate, Conversion Accuracy, Movements Not Actualized, Cutover Task Success %, Physical/Book Inventory (Light Oils; Petrochem & Solvents; Crude & Feedstocks), % Book/Target Inventory (Light Oils; Petrochem & Solvents; Crude & Feedstocks), Client Independent Change Requests, Tax Discrepancies, Goods Movement Corrections, Transaction Processing Efficiency, and Weekly Pulse Rating.]

4. Collect Data and Utilize Results

● Enter data into CS Performer


● Compute standardized performance (an illustrative scaling sketch follows this list)
● Display data
– over time for each metric
– at one time across metrics
● Identify performance problems
● Use follow-up metrics where needed to diagnose
performance problems
● Intervene to improve performance
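
The slides do not document how CS Performer standardizes performance, so the following Python sketch is only one plausible approach: scale each metric against the low case and target values detailed in step 3 so that all metrics can be displayed together on a common 0-100 scale. The function name and scaling rule are assumptions for illustration, not the product's documented method.

```python
# Illustrative only: not CS Performer's documented algorithm. Each metric is scaled
# against the low case and target defined when the metric was detailed (step 3), so that
# heterogeneous metrics can be displayed together on a common 0-100 scale.

def standardized_performance(value, low_case, target):
    """Map a raw metric value to 0-100, where 0 = low case and 100 = target met or exceeded.

    Assumes 'higher is better'; invert the inputs first for 'lower is better' metrics.
    """
    if target == low_case:
        raise ValueError("target and low case must differ")
    score = 100.0 * (value - low_case) / (target - low_case)
    return max(0.0, min(100.0, score))

# Example: an invoice fulfillment rate of 92% against a low case of 80% and a target of 98%.
print(standardized_performance(0.92, low_case=0.80, target=0.98))  # approx. 66.7
```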


Sample Time Series Results


Sample Performance Results


Conclusions
♦ Gathers information from a virtually unlimited number of
participants
♦ Graphically matches the opinions of various groups of
stakeholders
♦ Provides a framework to articulate and discuss the measurement
process
♦ Guides the organization through the measurement process
♦ Instills discipline to follow the process
♦ Provides a clearer picture of what people think
♦ Communicates ideas and builds consensus
♦ Measures performance over time
♦ Quantifies qualitative concepts
♦ Uses processes based on a statistical engine to produce results
and generate a group map from the combined input of individuals
in the group

City of Ithaca
Chief of Police Search Process

Welcome to the City of Ithaca Chief of Police Search Page. From November 1996 to June 1997, the City of
Ithaca, New York searched for a new Chief of Police. This page describes the activities and provides
continuously updated information about the entire search process.

The Characteristics the Community Wants to See in the Next Chief

The results of several months of effort to identify the characteristics the Ithaca community would like to see
in its next police chief show that there are nine major categories of characteristics the community is looking
for:

❍ Commitment to Diversity
❍ Dedicated to Community Collaboration
❍ Community Issue Oriented
❍ Open-Minded, Accessible Communicator
❍ Preventive/Alternative Approaches
❍ Personable
❍ Innovative/Visionary
❍ Leadership/Management
❍ Uncompromising Honesty
The 74 specific characteristics that came out of the community brainstorming effort are divided among these
nine broader categories. You can view:

❍ The Basic Results


❍ How the Results Were Obtained
❍ How the Results Can Be Used

On many of the pages that show results, you can also add your own comments and observations.

Project Background Information

❍ Job Announcement
❍ Major Events
❍ Selection Process Timeline
❍ Screening Committee Members
❍ Final Selection Press Release, June 25, 1997

Newspaper Articles

❍ How Ithaca's New Police Chief was Screened, Hired, Ithaca Journal, 7/16/97
❍ EDITORIAL: At Last, a Police Chief, Ithaca Journal, 6/26/97
❍ Chief-To-Be Described as Involved, Sensitive, Ithaca Journal, 6/26/97
❍ City Names Basile Chief: Ellenville Cop Starts on July 21, Ithaca Journal, 6/26/97
❍ City Police Chief Pick a Week Off, Ithaca Journal, 6/16/97
❍ EDITORIAL: The Rush to Judgment, Ithaca Times, 6/12/97
❍ EDITORIAL: What Now in City Police Chief Pick?, Ithaca Journal, 6/6/97
❍ More Chief Candidates Might Be Considered, Ithaca Journal, 6/6/97
❍ Only 1 Chief Candidate May Be Left, Ithaca Journal, 6/5/97
❍ EDITORIAL: Ithacans Cheated in Chief Selection, Ithaca Journal, 6/2/97
❍ Police Chief Pick Down to Two Names, Ithaca Journal, 6/2/97
❍ EDITORIAL: City Shortchanged in Police Chief Pick, Ithaca Journal, 4/28/97
❍ Owego's Top Cop Interviews in Ithaca, Ithaca Journal, 4/23/97
❍ 45 Apply To Be Ithaca Police Chief: Candidate Deadline Today, Ithaca Journal, 2/28/97
❍ The Ithaca Chief: Letter to the Editor, Ithaca Journal, 2/19/97
❍ Committee Resolves Debate on IPD Chief Search Input, Cornell Daily Sun, 2/14/97
❍ Police Chief Selection Misses Public Input, 2/14/97
❍ Group Says Chief Survey Not Skewed, Ithaca Journal, 2/13/97
❍ Alum Protests City's Treatment of Police, Cornell Daily Sun, 2/12/97
❍ Community Says Top Cop Criteria Shaped By Police, Ithaca Journal, 2/12/97
❍ Ithacans Accuse IPD of Tainting Search, Cornell Daily Sun, 2/11/97
❍ Next Meeting On Chief About Ideal Candidate, Ithaca Journal, 1/23/97
❍ Next Chief Will Meet Panel's 74 Criteria, Ithaca Journal, 1/23/97
❍ West End Breakfast Probes Top Cop Search, Ithaca Journal, 1/15/97
❍ Breakfast Club Returns for Year, Ithaca Journal, 1/13/97
❍ A Group Effort, Ithaca Journal, 12/31/96
❍ Forum: Hire talkative top cop, Ithaca Journal, 12/11/96
❍ EDITORIAL: New interest in police chief, Ithaca Journal, 12/10/96
❍ Police Chief Forum Set For Dec. 10, Ithaca Journal, 12/02/96
❍ Ithacans Brainstorm Traits for Chief, Cornell Daily Sun, 11/19/96
❍ Ithaca cop dies in city knifing: Woman killed in attack on officer, Ithaca Journal, 11/18/96
❍ EDITORIAL: What the chief means to the city, Ithaca Journal, 11/18/96
❍ An Invitation To The Ithaca Community, Guest Opinion, Ithaca Journal, 9/16/96
❍ Get Involved: Community input is sought in the search for a new police chief, Ithaca Times,
11/14/96
❍ Process to choose IPD head moves on, Ithaca Journal, 11/13/96
❍ Community Process Forums, Press Release, 11/12/96
❍ Brainstorming Forum, Press Release, 11/12/96
❍ EDITORIAL: Where’s the chief?, Ithaca Journal, 10/15/96
❍ City starts search for police chief, Ithaca Journal, 10/15/96
❍ C.U. Prof Assists City With IPD Chief Search, Cornell Daily Sun, 10/4/96
❍ Council turns back mayor’s vetoes, Ithaca Journal, 10/3/96
❍ Police chief selection process is proposed by Cohen, Ithaca Times, 10/3/96
❍ A chief-choosing plan: Council hears idea on how to get top cop, Ithaca Journal, 9/26/96
❍ Mayor stops 3 ballot-bound laws, Ithaca Journal, 9/12/96
❍ Cohen and Council engage in a power struggle, Ithaca Times, 9/9/96

This project is sponsored in part by:

Concept Systems Incorporated
118 Prospect Street, Suite 309
Ithaca NY 14850
(607) 272-1206
(607) 272-1215 FAX
E-mail: concepthelp@conceptsystems.com
Website: Concept Systems Incorporated Website

Copyright © 1996, William M.K. Trochim


The Reliability of Concept Mapping

William M.K. Trochim

Cornell University

DRAFT: Not for quotation or citation. Comments would be greatly appreciated.

Paper presented at the Annual Conference of the American Evaluation Association, Dallas, Texas, November 6, 1993. This research was supported in part through
NIMH Grant R01MH46712-01A1, William M.K. Trochim, Principal Investigator.

Abstract

Because of the growing interest in and use of the concept mapping methodology, it is important to define rigorous and feasible standards of quality. This paper
addresses the issue of the reliability of concept mapping. Six different reliability coefficients that can easily be estimated from the data typically available from any
concept mapping project were defined and estimated for 38 different concept mapping projects. Results indicate that the concept mapping process can be considered
reliable according to generally-recognized standards for acceptable reliability levels. It is recommended that the reliabilities estimated here be routinely reported with
concept mapping project results.

The Reliability of Concept Mapping

Concept mapping is a process that can be used to help a group describe its ideas on any topic of interest (Trochim, 1989a). The process typically requires the
participants to brainstorm a large set of statements relevant to the topic of interest, individually sort these statements into piles of similar ones and rate each statement
on some scale, and interpret the maps that result from the data analyses. The analyses typically include a two-dimensional multidimensional scaling (MDS) of the
unstructured sort data, a hierarchical cluster analysis of the MDS coordinates, and the computation of average ratings for each statement and cluster of statements.
The maps that result show the individual statements in two-dimensional (x,y) space with more similar statements located nearer each other, and show how the
statements are grouped into clusters that partition the space on the map. Participants are led through a structured interpretation session designed to help them
understand the maps and label them in a substantively meaningful way.

The concept mapping process as discussed here was first described by Trochim and Linton (1986). Trochim (1989a) delineates the process in detail and Trochim
(1989b) presents a wide range of example projects. Concept mapping has received considerable use and appears to be growing in popularity. It has been used to
address substantive issues in the social services (Galvin, 1989; Mannes, 1989), mental health (Cook, 1992; Kane, 1992; Lassegard, 1993; Marquart, 1988; Marquart,
1992; Marquart et al, 1993; Penney, 1992; Ryan and Pursley, 1992; Shern, 1992; Trochim, 1989a; Trochim and Cook, 1992; Trochim et al, in press; Valentine,
1992), health care (Valentine, 1989), education (Grayson, 1993; Kohler, 1992; Kohler, 1993), educational administration (Gurowitz et al, 1988), and theory
development (Linton, 1989). Considerable methodological work on the concept mapping process and its potential utility has also been accomplished (Bragg and
Grayson, 1993; Caracelli, 1989; Cooksy, 1989; Davis, 1989; Dumont, 1989; Grayson, 1992; Keith, 1989; Lassegard, 1992; Marquart, 1989; Mead and Bowers, 1992;
Mercer, 1992; SenGupta, 1993; Trochim, 1985, 1989c, 1990).

Given the broad and apparently increasing utilization of the concept mapping method, it is increasingly important that issues related to the quality of the process be
investigated. In most social science research, the quality of the measurement is assessed through estimation of reliability and validity. This paper considers only the
reliability of concept mapping.

The traditional theory of reliability typically applied in social research does not fit the concept mapping model well. That theory assumes that for each test item there
is a correct answer that is known a priori. The performance of each individual is measured on each question and coded correct or incorrect. Data are typically stored
in a rectangular matrix with the rows being persons and the columns test items. Reliability assessment focuses on the test questions or on the total score of the test.
That is, we can meaningfully estimate the reliability of each test item, or of the total score.

Concept mapping involves a different emphasis altogether. There is no assumed correct answer or correct sort. Instead, it is assumed that there may be some
normatively typical arrangement of the statements that is reflected imperfectly in the sorts of all members who come from the same relatively homogeneous (with
respect to the construct of interest) cultural group. The emphasis in reliability assessment shifts from the item to the person. For purposes of reliability assessment, the
structure of the data matrix is reversed, with persons as the columns and items (or pairs of items) as the rows. Reliability assessment focuses on the consistency
across the assumed relatively homogeneous set of participants. In this sense, it is meaningful to speak of the reliability of the similarity matrix or the reliability of the
map in concept mapping, but not of the reliability of individual statements.

This paper presents several ways of estimating the reliability or consistency of concept mapping. The various estimates of reliability are illustrated on data from a
large heterogeneous group of prior concept mapping projects. The distributions of reliability estimates across many projects provide realistic estimates of the level of
reliability one might expect in typical field applications of concept mapping.

The Concept Mapping Process


Traditional presentations of concept mapping describe it as a six-step process as depicted in Figure 1.

Figure 1. The six steps in the concept mapping process.

During the preparation step, the focus for the concept mapping is operationalized, participants are selected, and a schedule is developed. The generation step is
usually accomplished through a simple brainstorming (Osborn, 1948) of a large set of statements related to the focus. In the structuring step, each participant
completes an unstructured sorting (Rosenberg and Kim, 1975; Weller and Romney, 1988) of the statements into piles of similar ones, and rates each statement on
some dimension of relevance. The representation step consists of the major statistical analyses. The analysis begins with construction from the sort information of an
NxN (where N is the total number of statements) binary, symmetric matrix of similarities, SNxN for each participant. For any two items i and j, a 1 is placed in Sij if
the two items were placed in the same pile by the participant, otherwise a 0 is entered (Weller and Romney, 1988, p. 22). The construction of this individual matrix is
illustrated in Figure 2.

Figure 2. The construction of the binary 0,1 similarity matrix, SNxN, for each sort in concept mapping.
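
As a minimal sketch of this construction (illustrative Python, not the code used in The Concept System; the function and variable names are invented here), one participant's sort can be turned into its binary similarity matrix, and the individual matrices summed into the total matrix described below:

```python
import numpy as np

def sort_to_similarity(piles, n_statements):
    """Build the binary N x N similarity matrix S for one participant's sort.

    piles: list of piles, each a list of 0-based statement indices.
    S[i, j] = 1 if statements i and j were placed in the same pile, 0 otherwise.
    """
    S = np.zeros((n_statements, n_statements), dtype=int)
    for pile in piles:
        for i in pile:
            for j in pile:
                S[i, j] = 1
    return S

# One participant sorted five statements into two piles; a second sorted them differently.
S1 = sort_to_similarity([[0, 1, 4], [2, 3]], n_statements=5)
S2 = sort_to_similarity([[0, 1], [2, 3, 4]], n_statements=5)

# The total similarity matrix T is the sum of the individual matrices across all sorters,
# so each cell counts how many people placed that pair of statements together.
T = S1 + S2
```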
The total NxN similarity matrix, TNxN is obtained by summing across the individual SNxN matrices. Thus, any cell in this matrix can take integer values between 0
and M (where M is the total number of people who sorted the statements); the value indicates the number of people who placed the i,j pair in the same pile. The total
similarity matrix TNxN is analyzed using nonmetric multidimensional scaling (MDS) analysis (Kruskal and Wish, 1978; Davison, 1983) with a two-dimensional
solution. The solution is limited to two dimensions because, as Kruskal and Wish (1978) point out:

Since it is generally easier to work with two-dimensional configurations than with those involving more dimensions, ease of use considerations are
also important for decisions about dimensionality. For example, when an MDS configuration is desired primarily as the foundation on which to
display clustering results, then a two-dimensional configuration is far more useful than one involving three or more dimensions (p. 58).

The analysis yields a two-dimensional XNx2 configuration of the set of N statements based on the criterion that statements piled together most often are located more
proximately in two-dimensional space while those piled together less frequently are further apart.

This two-dimensional configuration is the input for the hierarchical cluster analysis utilizing Ward's algorithm (Everitt, 1980) as the basis for defining a cluster.
Using the MDS configuration as input to the cluster analysis in effect forces the cluster analysis to partition the MDS configuration into non-overlapping clusters in
two-dimensional space. In the interpretation step, the participant group is guided by the facilitator through a structured process that familiarizes them with the various
maps and enables them to attach meaningful substantive labels to various locations on the map. Finally, in the utilization step, the participants discuss specific ways
the maps can be used to help address the original focus of the project.
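
To make the representation step concrete, here is a minimal sketch of the same pipeline using generic open-source tools (scikit-learn for nonmetric MDS and SciPy for Ward clustering) rather than the author's software; the fabricated input matrix, the choice of eight clusters, and all names here are assumptions for illustration only.

```python
import numpy as np
from sklearn.manifold import MDS
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist, squareform

# T is the total N x N similarity matrix (cell = number of sorters who paired the items)
# and M is the number of sorters; a small fabricated matrix stands in for real sort data.
N, M = 20, 10
rng = np.random.default_rng(0)
upper = np.triu(rng.integers(0, M + 1, size=(N, N)), 1)
T = upper + upper.T + np.diag(np.full(N, M))

# Convert similarities to dissimilarities and compute a two-dimensional nonmetric MDS map.
mds = MDS(n_components=2, metric=False, dissimilarity="precomputed", random_state=0)
X = mds.fit_transform(M - T)        # N x 2 point configuration
print("stress:", mds.stress_)

# Euclidean distances between all pairs of points on the map (the D matrix used later).
D = squareform(pdist(X))

# Ward's hierarchical clustering of the MDS coordinates, cut here into eight clusters.
clusters = fcluster(linkage(X, method="ward"), t=8, criterion="maxclust")
```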

Because the concept mapping process is so complex, it is difficult to conceive of a single overall reliability coefficient. For instance, it would be theoretically feasible
to ask about the reliability of any of the six phases of the process independent of the others. Nevertheless, it is clear that the central product of the concept mapping
process is the two-dimensional map itself and, consequently, efforts to address reliability are well-directed to the central phases of the analysis, the structuring and
representation steps. In this paper, the focus is on methods for estimating the reliability of the sort data and of the two-dimensional MDS map that results.

Estimates of the Reliability of Concept Mapping

The key components of the concept mapping process available for estimating various reliabilities are shown in Figure 3.

Figure 3. The key components in the concept mapping data and the related reliability estimates.
The figure assumes a hypothetical project involving ten participants (M=10), each of whom sorted and rated the set of N statements. We see that for each sort, there
is a corresponding binary symmetric similarity matrix, SNxN. These are aggregated into the total matrix, TNxN. This total matrix is the input to the MDS analysis
which yields a two-dimensional XNx2 configuration. The Euclidean distances (in two dimensions) between all pairs of statements can be computed directly from the
two-dimensional matrix, yielding a distance matrix DNxN where, for any two statements i and j:

dij = sqrt[ (xi1 - xj1)^2 + (xi2 - xj2)^2 ]

The ratings are analyzed separately from the sort data, as indicated in the figure.

Although the assumptions underlying reliability theory for concept mapping are different from traditional reliability theory, the methods for estimating reliability
would be familiar to traditionalists. Several common reliability estimators are considered below. From these, a subset of estimators is selected that can readily be
obtained from the data for any typical concept mapping project.

One common estimator of reliability is the test-retest correlation. Typically, respondent scores on successive administrations of a test are correlated to estimate the
degree of consistency in repeated testings. In concept mapping, this could be accomplished by asking the same participants to sort the statements on two separate
occasions. Two reliability coefficients could be computed. One would involve the correlation between the aggregated similarity matrix, TNxN (the input to MDS) on
both occasions. The other would be the correlation between the two MDS maps that result (specifically, the correlation between the distances between all pairs of
points on the two maps, DNxN). The test-retest correlation has several disadvantages as a reliability estimator in a concept mapping context. It assumes that
participants do not change with respect to what is being measured (or change only in a linear fashion) between testings and that the first testing does not affect the
response on the second. More practically, the test-retest method requires twice the data collection. Participants would usually need to be assembled on separate days,
significantly increasing the costs and reducing the feasibility of a project. Although the test-retest reliability estimate should be used where practicable, it is not used here to
estimate reliability.

A second traditional way to estimate reliability would be to divide the set of test items into two random subtests and compute the correlation for these across the
participants. This "split half" reliability can also be accomplished for the concept mapping case. Here, one would divide the participant group randomly into two
subgroups, labeled A and B. Separate similarity matrices (TA and TB) and MDS maps (XA and XB) can be computed for each subgroup, as shown in Figure 3. By
correlating them, one can then estimate the split half reliability of the similarity matrix and of the map that results. The split half reliability has the advantage of being
relatively easy to compute from any concept mapping data. Both split half reliabilities are studied here.

In traditional reliability estimation, Cronbach's alpha is often used and is considered equivalent to computing all possible split half reliabilities. This would clearly be
superior to the simple split half estimator, but there is no known way to estimate alpha for the matrix data used in concept mapping. Even if one could accomplish
this for the sort data, one would need to compute MDS maps for each potential split half in order to estimate the equivalent to Cronbach's alpha -- clearly a
prohibitively time consuming proposition. For this reason, no Cronbach's alpha estimate of reliability is considered here.

Another traditional reliability estimate involves the degree to which each test item correlates with the total score across all items on the test. This average item-total
reliability has an analogue in concept mapping. One can compute the correlation between each person's binary sort matrix, SNxN, and the total similarity matrix,
TNxN, and between each person's binary sort matrix, SNxN, and the distances on the final map, DNxN. These will be labeled here the Average Individual-to-Total
reliability and the Average Individual-to-Map reliability.

A final traditional reliability estimate is based on the average of the correlations among items on a scale, or the average interitem correlation. It is possible to perform
an analogous analysis with concept mapping data, on both the sorting and rating data. These will be termed here the average Individual-to-Individual sort and the
average Rating-to-Rating reliabilities.

Most of the estimation methods described above (except for test-retest and Cronbach's alpha) rely on calculations that are based on only part of the total available
sample of participants. For instance, the split-half reliability has an effective sample size of one-half the total number of participants. The three averaged estimates are
even worse off, relying on only a single individual or pair as the effective sample size for each element entered into the average. Since we know that reliability is
affected by the number of items on a test (or persons in a concept mapping project), these correlations based on only part of the participant sample do not accurately
reflect the correlational value we would expect for the entire participant sample. This is traditionally corrected for in reliability estimation by applying the Spearman-
Brown Prophecy Formula (Nunnally, 1978, p. 211):

rkk = (k * rij) / [1 + (k - 1) * rij]

where:

rij = the correlation estimated from the data

k = N/n where N is the total sample size and n is the sample size on which rij is based

rkk = the estimated Spearman-Brown corrected reliability

In sum, there appear to be several reliability estimates that can be routinely constructed from any concept mapping data. All of them require use of the Spearman-
Brown correction. They are:

1. The Split-Half Total matrix reliability, rSHT

2. The Split-Half Map reliability, rSHM

3. The Average Individual-to-Total Reliability, rIT (k = N/1)

4. The Average Individual-to-Map Reliability, rIM (k = N/1)

5. The Average Individual-to-Individual Sort Reliability, rII (k = N/1)

6. The Average Rating-to-Rating Reliability, rRR (k = N/1)
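
As an illustrative computational sketch (assuming each sort has already been converted to a binary matrix as in Figure 2; the helper names are invented for this example), the split-half total-matrix reliability rSHT and the Spearman-Brown correction might be computed like this:

```python
import numpy as np

def upper_triangle(mat):
    """Return the strictly upper-triangular cells of a square matrix as a vector."""
    i, j = np.triu_indices(mat.shape[0], k=1)
    return mat[i, j].astype(float)

def spearman_brown(r, k):
    """Spearman-Brown prophecy formula: project a correlation r up to k times the units."""
    return (k * r) / (1.0 + (k - 1.0) * r)

def split_half_total_reliability(S_list, rng=None):
    """rSHT: randomly split the sorters into halves, sum each half into a total matrix,
    correlate the two totals over statement pairs, and apply Spearman-Brown with k = 2."""
    rng = rng if rng is not None else np.random.default_rng()
    order = rng.permutation(len(S_list))
    half = len(S_list) // 2
    T_A = sum(S_list[i] for i in order[:half])
    T_B = sum(S_list[i] for i in order[half:])
    r = np.corrcoef(upper_triangle(T_A), upper_triangle(T_B))[0, 1]
    return spearman_brown(r, k=2)
```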

Method

Sample

Thirty-eight separate concept mapping projects conducted over the past two years constituted the sample for this reliability study. This is essentially exhaustive of the
universe of all concept mapping projects conducted by the author over that time period. Almost all of the projects could be classified generally as in the area of social
services research. Most (N=18) were in the field of mental health. Three were related to arts organization administration. There were two each in health and
agriculture. Three were primarily focused on research methodology issues (such as the conceptualization of what is meant by measurement). There were 10 other
studies that were classified generally as social services in nature.

Procedure

All of the reliabilities calculated here are depicted graphically in Figure 3 above.

Split-Half Reliabilities. The set of sorts from each project was randomly divided into two halves (for odd-numbered participant groups, one group was randomly
assigned one more person than the other). Separate concept maps were computed for each group. The total matrices, TA and TB, for each group were correlated and
the Spearman-Brown correction applied to obtain rSHT. The Euclidean distances between all pairs of points on the two maps, DA and DB, were correlated and the
Spearman-Brown correction applied to obtain rSHM.

Individual-to-Individual Sort Reliability. The SNxN matrices were correlated for all pairs of individuals. These correlations were averaged and the Spearman-Brown
correction applied to yield rII.
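
A corresponding illustrative sketch (again assuming binary sort matrices as in Figure 2, with invented helper names, rather than the author's original code) for this average individual-to-individual estimate:

```python
import numpy as np
from itertools import combinations

def average_individual_to_individual(S_list):
    """rII: average the correlations between all pairs of individual binary sort matrices
    (computed over statement pairs), then apply Spearman-Brown with k = N sorters."""
    n = S_list[0].shape[0]
    iu = np.triu_indices(n, k=1)
    vectors = [S[iu].astype(float) for S in S_list]
    pair_rs = [np.corrcoef(a, b)[0, 1] for a, b in combinations(vectors, 2)]
    r = float(np.mean(pair_rs))
    k = len(S_list)                                   # k = N/1, as in the text above
    return (k * r) / (1.0 + (k - 1.0) * r)
```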

Individual-to-Total Matrix Reliability. The SNxN sort matrix for each individual was correlated with the total matrix, TNxN. These correlations were averaged and the
Spearman-Brown correction applied to yield rIT.
Individual-to-Map Reliability. The SNxN sort matrix for each individual was correlated with the Euclidean distances, DNxN. These correlations were averaged and the
Spearman-Brown correction applied to yield rIM.

Rating-to-Rating Reliability. The correlation between the ratings for each pair of persons was computed. These correlations were averaged and the Spearman-
Brown correction applied to yield rRR.

Results

Descriptive statistics for the thirty-eight concept mapping projects are shown in Table 1.

Table 1. Descriptive statistics for the number of statements, number of sorters, number of raters, and stress values for 38 concept mapping projects.

                     Number of     Number of   Number of   Stress
                     Statements    Sorters     Raters      Value

Number of Projects   38            37          37          33
Mean                 83.84211      14.62162    13.94595    0.28527
Median               93.00000      14.00000    14.00000    0.29702
Minimum              39.00000       7.00000     6.00000    0.15526
Maximum              99.00000      32.00000    33.00000    0.35201
SD                   17.99478       5.77038     5.69086    0.04360

On average, 83.8 statements were brainstormed across all projects, with a range from 39 to 99. Most projects achieved over 90 statements (median=93). There were
an average of 14.62 sorters per project, very close to the typically recommended sample size of fifteen. The reason there are only 37 projects for the sorting is that
two of the projects were related and used the same sort statements, with one of those doing only the ratings. Similarly, in one of the projects, no ratings were done.
The last column shows that the average stress value across the projects was .285 (SD=.04). Stress is a statistic routinely reported for multidimensional scaling that
reflects the goodness of fit of the map to the original dissimilarity matrix that served as input. A lower stress value implies a better fit. The multidimensional scaling
literature suggests that a lower stress value is desired than is typically obtained in concept mapping. However, it must be remembered that the recommendations in
the literature are typically based on experience with much more stable phenomena (e.g., physiological perception of color similarities), fewer entities, and more
precise measurement methods (e.g., paired comparisons). The data summarized in Table 1 are important benchmarks that can act as reasonable standards for the level
of stress that should be expected in typical field-based concept mapping projects.

Table 2 shows the descriptive statistics for the stress values for the sample projects and for their split half samples.

Table 2. Descriptive statistics for the stress values for the entire project and the split-half samples for 38 concept mapping projects.

                     Stress Value   Stress 1 -   Stress 2 -
                                    Split Half   Split Half

Number of Projects   33             33           33
Mean                 0.28527        0.30013      0.29987
Median               0.29702        0.31421      0.31082
Minimum              0.15526        0.19962      0.14875
Maximum              0.35201        0.34437      0.36855
SD                   0.04360        0.03772      0.04654

The major value of this table is that it gives some indication of the effect of sample size (i.e., number of sorters) on final stress values. Somewhat surprisingly, the
table suggests that stress values based on sample sizes half as large are nearly as good as the full-sample values, suggesting that even smaller samples of sorters may
produce maps that fit almost as well as samples twice as large.

The estimates of reliability are reported in Table 3.

Table 3. Descriptive statistics for reliability estimates for 38 concept mapping projects.

                     rII        rRR        rIT        rIM        rSHT       rSHM

Number of Projects   33         37         33         33         33         33
Mean                 0.81507    0.78374    0.92965    0.86371    0.83330    0.55172
Median               0.82060    0.82120    0.93070    0.86280    0.84888    0.55881
Minimum              0.67040    0.42700    0.88230    0.74030    0.72493    0.25948
Maximum              0.93400    0.93540    0.97370    0.95490    0.93269    0.90722
SD                   0.07016    0.12125    0.02207    0.04771    0.05485    0.15579

Three of the coefficients (i.e., rII, rIT, and rSHT) utilize only the individual sort matrices and the sum of these. The average individual-to-individual sort reliability
value (rII) was .815, the average individual-to-total matrix value (rIT) was .929, and the average split-half total matrix reliability (rSHT) was .833. The only reliability
estimate that involved rating values (rRR) yielded an average of .78. It is worth noting that one would typically not expect that there would be as much consistency
across a group of persons on the ratings as on sortings. Finally, there were only two reliability estimates that included information from the final map. The average
value of the relationship between individuals' sorts and the final map configuration (rIM) was .863. The split-half reliability of the final maps (rSHM) had an average
value of .55.

Table 4 shows the relationship between the number of statements and sorters, and the various reliabilities.

Table 4. Correlations between number of statements and number of sorters and the various reliabilities.

Number of Statements Number of Sorts


rII -0.06232 0.54577
rIT -0.00714 0.59122
rIM -0.15390 0.61201
rSHT -0.15483 0.54921
rSHM -0.07697 0.21373

The number of statements is largely uncorrelated with reliability, although all coefficients are slightly negative. On the other hand, the number of sorters is positively
correlated with reliabilities. This suggests that having more sorters in a concept mapping project can improve the overall reliability of the results.

Finally, the intercorrelations among the five sort-related reliability estimates are shown in Table 5.

Table 5. Correlations between different reliability estimates.

rII rIT rIM rSHT

rIT 0.94030
rIM 0.78976 0.90937
rSHT 0.91925 0.90301 0.77444
rSHM 0.48329 0.59313 0.68700 0.57188

The correlations are all significantly positive, with the lowest correlations between the split-half map reliabilities and all others.

Discussion

The results indicate that the concept mapping method, when examined across a wide range of projects, yields reliable results as estimated by a number of acceptable
reliability indicators.

While all reliability estimates were strongly positive, the split half estimate of the relationship between maps was clearly lower than the rest. It is not surprising that
this value is lower. A simple analogy might explain why this is so. Imagine that we had data on a multi-item scale and that we divided the sample of respondents
randomly into two halves. If we compute any estimate of reliability of the raw data, it is bound to be higher than estimates of reliability based on analyses that
process that data. For instance, if we applied the same regression model or factor analytic model to both random split half samples and correlated the results of these
analyses (e.g., the predicted regression values or factor analysis inter-item matrix), they would almost certainly be lower than any reliability based only on the
original raw data.
The reliability estimates reported here are all easily calculable from the raw data available from any concept mapping project. These estimates should be reported
routinely in write-ups of concept mapping results.

References

Bragg, L.R. and Grayson, T.E. (1993). Reaching consensus on outcomes: Lessons learned about concept mapping. Paper presented at the Annual Conference of the
American Evaluation Association, Dallas, TX.

Caracelli, V. (1989). Structured conceptualization: A framework for interpreting evaluation results. Evaluation and Program Planning. 12, 1, 45-52.

Cook, J. (1992). Modeling staff perceptions of a mobile job support program for persons with severe mental illness. Paper presented at the Annual Conference of the
American Evaluation Association, Seattle, WA.

Cooksy, L. (1989). In the eye of the beholder: Relational and hierarchical structures in conceptualization. Evaluation and Program Planning. 12, 1, 59-66.

Davis, J. (1989). Construct validity in measurement: A pattern matching approach. Evaluation and Program Planning. 12, 1, 31-36.

Dumont, J. (1989). Validity of multidimensional scaling in the context of structured conceptualization. Evaluation and Program Planning. 12, 1, 81-86.

Davison, M.L. (1983). Multidimensional scaling. New York, John Wiley and Sons.

Everitt, B. (1980). Cluster Analysis. 2nd Edition, New York, NY: Halsted Press, A Division of John Wiley and Sons.

Galvin, P.F. (1989). Concept mapping for planning and evaluation of a Big Brother/Big Sister program. Evaluation and Program Planning. 12, 1, 53-58.

Grayson, T.E. (1993). Empowering key stakeholders in the strategic planning and development of an alternative school program for youth at risk of school behavior.
Paper presented at the Annual Conference of the American Evaluation Association, Dallas, TX.

Grayson, T.E. (1992). Practical issues in implementing and utilizing concept mapping. Paper presented at the Annual Conference of the American Evaluation
Association, Seattle, WA.

Gurowitz, W.D., Trochim, W. and Kramer, H. (1988). A process for planning. The Journal of the National Association of Student Personnel Administrators, 25, 4,
226-235.

Kane, T.J. (1992). Using concept mapping to identify provider and consumer issues regarding housing for persons with severe mental illness. Paper presented at the
Annual Conference of the American Evaluation Association, Seattle, WA.

Keith, D. (1989). Refining concept maps: Methodological issues and an example. Evaluation and Program Planning. 12, 1, 75-80.

Kohler, P.D. (1992). Services to students with disabilities in postsecondary education settings: Identifying program outcomes. Paper presented at the Annual
Conference of the American Evaluation Association, Seattle, WA.

Kohler, P.D. (1993). Serving students with disabilities in postsecondary education settings: Using program outcomes for planning, evaluation and empowerment.
Paper presented at the Annual Conference of the American Evaluation Association, Dallas, TX.

Kruskal, J.B. and Wish, M. (1978). Multidimensional Scaling. Beverly Hills, CA: Sage Publications.

Lassegard, E. (1992). Assessing the reliability of the concept mapping process. Paper presented at the Annual Conference of the American Evaluation Association,
Seattle, WA.

Lassegard, E. (1993). Conceptualization of consumer needs for mental health services. Paper presented at the Annual Conference of the American Evaluation
Association, Dallas, TX.

Linton, R. (1989). Conceptualizing feminism: Clarifying social science concepts. Evaluation and Program Planning. 12, 1, 25-30.

Mannes, M. (1989). Using concept mapping for planning the implementation of a social technology. Evaluation and Program Planning. 12, 1, 67-74.

Marquart, J.M. (1988). A pattern matching approach to link program theory and evaluation data: The case of employer-sponsored child care. Unpublished doctoral
dissertation, Cornell University, Ithaca, New York.

Marquart, J.M. (1989). A pattern matching approach to assess the construct validity of an evaluation instrument. Evaluation and Program Planning. 12, 1, 37-44.

Marquart, J.M. (1992). Developing quality in mental health services: Perspectives of administrators, clinicians, and consumers. Paper presented at the Annual
Conference of the American Evaluation Association, Seattle, WA.

Marquart, J.M., Pollak, L. and Bickman, L. (1993). Quality in intake assessment and case management: Perspectives of administrators, clinicians and consumers. In
R. Friedman et al. (Eds.), A system of care for children's mental health: Organizing the research base. Tampa: Florida Mental Health Institute, University of South
Florida.

Mead, J.P. and Bowers, T.J. (1992). Using concept mapping in formative evaluations. Paper presented at the Annual Conference of the American Evaluation
Association, Seattle, WA.

Mercer, M.L. (1992). Brainstorming issues in the concept mapping process. Paper presented at the Annual Conference of the American Evaluation Association,
Seattle, WA.

Nunnally, J.C. (1978). Psychometric Theory. (2nd. Ed.). New York, McGraw Hill.

Osborn, A.F. (1948). Your Creative Power. New York, NY: Charles Scribner.

Penney, N.E. (1992). Mapping the conceptual domain of provider and consumer expectations of inpatient mental health treatment: New York Results. Paper
presented at the Annual Conference of the American Evaluation Association, Seattle, WA.

Romney, A.K., Weller, S.C. and Batchelder, W.H. (1986). Culture as consensus: A theory of culture and informant accuracy. American Anthropologist, 88, 2, 313-
338.

Rosenberg, S. and Kim, M.P. (1975). The method of sorting as a data gathering procedure in multivariate research. Multivariate Behavioral Research, 10, 489-502.

Ryan, L. and Pursley, L. (1992). Using concept mapping to compare organizational visions of multiple stakeholders. Paper presented at the Annual Conference of the
American Evaluation Association, Seattle, WA.

SenGupta, S. (1993). A mixed-method design for practical purposes: Combination of questionnaire(s), interviews, and concept mapping. Paper presented at the
Annual Conference of the American Evaluation Association, Dallas, TX.

Shern, D.L. (1992). Documenting the adaptation of rehabilitation technology to a core urban, homeless population with psychiatric disabilities: A concept mapping
approach. Paper presented at the Annual Conference of the American Evaluation Association, Seattle, WA.

Trochim, W. (1985). Pattern matching, validity, and conceptualization in program evaluation. Evaluation Review, 9, 5, 575-604.

Trochim, W. (1989a). An introduction to concept mapping for planning and evaluation. Evaluation and Program Planning, 12, 1, 1-16.

Trochim, W. (1989b). Concept mapping: Soft science or hard art? Evaluation and Program Planning, 12, 1, 87-110.

Trochim, W. (1989c). Outcome pattern matching and program theory. Evaluation and Program Planning, 12, 4, 355-366.

Trochim, W. (1990). Pattern matching and program theory. In H.C. Chen (Ed.), Theory-Driven Evaluation. New Directions for Program Evaluation, San Francisco,
CA: Jossey-Bass.

Trochim, W. and Cook, J. (1992). Pattern matching in theory-driven evaluation: A field example from psychiatric rehabilitation. in H. Chen and P.H. Rossi (Eds.)
Using Theory to Improve Program and Policy Evaluations. Greenwood Press, New York, 49-69.

Trochim, W. and Linton, R. (1986). Conceptualization for evaluation and planning. Evaluation and Program Planning, 9, 289-308.

Trochim, W., Cook, J. and Setze, R. (in press). Using concept mapping to develop a conceptual framework of staff's views of a supported employment program for
persons with severe mental illness. Consulting and Clinical Psychology.

Valentine, K. (1989). Contributions to the theory of care. Evaluation and Program Planning. 12, 1, 17-24.

Valentine, K. (1992). Mapping the conceptual domain of provider and consumer expectations of inpatient mental health treatment: Wisconsin results. Paper presented
at the Annual Conference of the American Evaluation Association, Seattle, WA.
Weller, S.C. and Romney, A.K. (1988). Systematic Data Collection. Sage Publications, Newbury Park, CA.
Final Report

Workforce Competencies for Psychosocial Rehabilitation Workers:

A Concept Mapping Project

Complete Document in PDF Format

Complete Document in MS Word97 Format

Contents

1. Introduction
2. Preparation
3. Generation
4. Structuring
5. Representation
6. Interpretation
7. Utilization
8. References
Project conducted for

The International Association of Psychosocial Rehabilitation Services


Albuquerque, New Mexico
November 11-12, 1993

William M.K. Trochim


Cornell University

Judith Cook
Thresholds National Research and Training Center on Rehabilitation and Mental Illness


Copyright © 1996, William M.K. Trochim


Final Report

Workforce Competencies for Psychosocial Rehabilitation Workers:
A Concept Mapping Project
[Cover figure: the final concept map, showing the five regions (Consumer-Centered Competencies, Rehabilitation
Methodology Competencies, System Competencies, Knowledge Base Competencies, and Practitioner Competencies) and the
competency clusters located within them.]

Project conducted for

The International Association of Psychosocial Rehabilitation Services


Albuquerque, New Mexico
November 11-12, 1993

William M.K. Trochim


Cornell University

Judith Cook
Thresholds National Research and Training Center on Rehabilitation and Mental Illness
Contents

Introduction
Preparation
     The Focus for the Concept Mapping
     The Participants
     The Schedule
Generation
Structuring
Representation
     Representation Results
Interpretation
     Discussion of Skills versus Values
     Discussion of What was Missing on the Map
Utilization
     Review and Feedback on the Map's Clusters and Regions
     Discussion of Other Competency Documents
     Small Group Sessions
     Small Group Operationalizations of Five Clusters
     Small Group Map Revision
     Next Steps
References
Introduction

The International Association of Psychosocial Rehabilitation Services (IAPSRS) has as one of its primary missions
the task of developing Psychosocial Rehabilitation (PSR) as a professional discipline. To that end, it has for
several years been working towards the development of a comprehensive set of workforce competencies that could
be utilized as standards in the certification of PSR workers. This task has become even more pressing in view of the
national efforts to develop comprehensive health insurance coverage in the United States (The White House
Domestic Policy Council, 1993). It is essential that professional standards for PSR be clearly delineated if PSR is to
be included as a service that is covered under national health insurance.

In recent years, there have been several efforts to elucidate PSR workforce competencies or competencies for
related endeavors that might be relevant (Curtis, 1993; Friday and McPheeters, 1985; Jonikas, 1993; IAPSRS
Ontario Chapter, 1992). To move the process along, IAPSRS contracted with the Thresholds Research and Training
Center on Rehabilitation and Mental Illness to: a) review the literature on PSR competencies and develop a paper
that integrated that literature; and b) conduct a concept mapping project with a selected national group of PSR
experts designed to elucidate a comprehensive framework of competencies. The Jonikas (1993) document
constituted the literature review. This report describes the concept mapping project that was undertaken.

Concept mapping is a process that can be used to help a group describe its ideas on any topic of interest (Trochim,
1989a). The process typically requires the participants to brainstorm a large set of statements relevant to the topic of
interest, individually sort these statements into piles of similar ones and rate each statement on some scale, and
interpret the maps that result from the data analyses. The analyses typically include a two-dimensional
multidimensional scaling (MDS) of the unstructured sort data, a hierarchical cluster analysis of the MDS
coordinates, and the computation of average ratings for each statement and cluster of statements. The maps that
result show the individual statements in two-dimensional (x,y) space with more similar statements located nearer
each other, and show how the statements are grouped into clusters that partition the space on the map. Participants
are led through a structured interpretation session designed to help them understand the maps and label them in a
substantively meaningful way.

The concept mapping process as conducted here was first described by Trochim and Linton (1986). Trochim
(1989a) delineates the process in detail and Trochim (1989b) presents a wide range of example projects. Concept
mapping has received considerable use and appears to be growing in popularity. It has been used to address
substantive issues in the social services (Galvin, 1989; Mannes, 1989), mental health (Cook, 1992; Kane, 1992;
Lassegard, 1993; Marquart, 1988; Marquart, 1992; Marquart et al, 1993; Penney, 1992; Ryan and Pursley, 1992;
Shern, 1992; Trochim, 1989a; Trochim and Cook, 1992; Trochim et al, in press; Valentine, 1992), health care
(Valentine, 1989), education (Grayson, 1993; Kohler, 1992; Kohler, 1993), educational administration (Gurowitz et
al, 1988), and theory development (Linton, 1989). Considerable methodological work on the concept mapping
process and its potential utility has also been accomplished (Bragg and Grayson, 1993; Caracelli, 1989; Cooksy,
1989; Davis, 1989; Dumont, 1989; Grayson, 1992; Keith, 1989; Lassegard, 1992; Marquart, 1989; Mead and
Bowers, 1992; Mercer, 1992; SenGupta, 1993; Trochim, 1985 , 1989c, 1990).

The concept mapping process involves six major steps:

1 Preparation
2 Generation
3 Structuring
4 Representation
5 Interpretation
6 Utilization

This report presents the results of the project in sequential order according to the six steps in the process.

Preparation

The preparation step involves three major tasks. First, the focus for the concept mapping project must be stated
operationally. Second, the participants must be selected. And, third, the schedule for the project must be set.

The Focus for the Concept Mapping

In concept mapping, the focus for the project is stated in the form of the instruction to the brainstorming participant
group. For this project this instruction was operationalized as:

Generate statements (short phrases or sentences) that describe specific workforce competencies for
psychosocial rehabilitation practitioners.

In most projects there is a secondary focus that relates to the ratings of the brainstormed statements. This focus is
also stated in its operational form and, for this project, was:

Using the following scale, rate each competency for its relative importance for high-quality service delivery.

1 = relatively less important
2 = somewhat important
3 = moderately important
4 = very important
5 = extremely important

The Participants

Twenty-one people participated in the concept mapping process. They were purposively selected to represent a
broad range of PSR experiences and schools of thought. They included the Director of IAPSRS, the Chair of the
committee responsible for developing competencies and several members of the IAPSRS Board of Directors.
Several participants were affiliated with the leading national centers for PSR. There were several consumers of PSR
services. [Judith -- what else could I say here?]

The Schedule

The concept mapping project was scheduled for two consecutive days. It began on Thursday, November 11th at
2pm. Between 2 and 6 pm the generation and structuring steps were accomplished. The representation step (i.e., the
data entry, analysis and production of materials for interpretation) was completed by the co-facilitators (Trochim and
Cook) during the evening of November 11th. The Interpretation step was accomplished from 9 a.m. to noon on Friday,
November 12th. Participants were given a two-hour lunch during which they could skim four documents that
attempted to delineate competencies in PSR or related areas (Curtis, 1993; Friday and McPheeters, 1985; Jonikas,
1993; IAPSRS Ontario Chapter, 1992). The Utilization step was accomplished on Friday afternoon from 2 to 5 pm.

Generation

The generation step essentially consists of a structured brainstorming session (Osborn, 1948) guided by a specific
focus prompt that limits the types of statements that are acceptable. The focus statement or criterion for generating
statements was operationalized in the form of the instruction to the participants given above. The general rules of
brainstorming applied. Participants were encouraged to generate as many statements as possible (with an upper limit
of 100); no criticism or discussion of others' statements was allowed (except for purposes of clarification); and all
participants were encouraged to take part. The group brainstormed ninety-six statements in approximately forty-five
minutes.

The complete listing of brainstormed statements is given in Table 1. Participants were given a short break while the
statements were printed and duplicated for use in the structuring stage.

Structuring

Structuring involved two distinct tasks, the sorting and rating of the brainstormed statements. For the sorting
(Rosenberg and Kim, 1975; Weller and Romney, 1988), each participant was given a listing of the statements laid
out in mailing label format with twelve to a page and asked to cut the listing into slips with one statement (and its
identifying number) on each slip. They were instructed to group the ninety-six statement slips into piles "in a way
that makes sense to you." The only restrictions in this sorting task were that there could not be: (a) N piles (in this
case 96 piles of one item each); (b) one pile consisting of all 96 items; or (c) a "miscellaneous" pile (any item
thought to be unique was to be put in its own separate pile). Weller and Romney (1988) point out why unstructured
sorting (in their terms, the pile sort method) is appropriate in this context:

The outstanding strength of the pile sort task is the fact that it can accommodate a large number of items.
We know of no other data collection method that will allow the collection of judged similarity data among
over 100 items. This makes it the method of choice when large numbers are necessary. Other methods that
might be used to collect similarity data, such as triads and paired comparison ratings, become impractical
with a large number of items (p. 25).

After sorting the statements, each participant recorded the contents of each pile by listing a short pile label and the
statement identifying numbers on a sheet that was provided. For the rating task, the brainstormed statements were
listed in questionnaire form and each participant was asked to rate each statement on a 5-point Likert-type response
scale in terms of the relative importance of each competency as stated above. Because participants were unlikely to
brainstorm statements that were totally unimportant with respect to PSR, it was stressed that the rating should be
treated as a judgment of the importance of each item relative to all the other items brainstormed.

This concluded the structuring session.
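
For readers who want to work with data of this kind, the following is a minimal sketch (in Python) of how one
participant's sort piles and importance ratings might be recorded and checked against the restrictions described
above. The data structures, function name, and example values are illustrative assumptions, not the project's
actual software.

# Minimal sketch (illustrative only) of recording and checking one participant's
# sort and rating data for a 96-statement concept mapping project.

N_STATEMENTS = 96

def check_sort(piles, n_statements=N_STATEMENTS):
    # Each statement must appear in exactly one pile.
    items = sorted(s for pile in piles for s in pile)
    assert items == list(range(1, n_statements + 1)), "every statement must be sorted exactly once"
    # The sort may not be a single pile of all items, nor N piles of one item each.
    assert 1 < len(piles) < n_statements, "disallowed pile structure"
    # (A 'miscellaneous' pile cannot be detected automatically; the facilitator checks for it.)

# Hypothetical example: three piles covering statements 1-96, and a flat set of 1-5 ratings.
piles = [[1, 5, 6], [10, 36], [s for s in range(1, 97) if s not in (1, 5, 6, 10, 36)]]
ratings = {statement: 3 for statement in range(1, N_STATEMENTS + 1)}

check_sort(piles)
assert all(1 <= r <= 5 for r in ratings.values())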

Representation

In the representation step, the sorting and rating data were entered into the computer, the MDS and cluster analysis
were conducted, and materials were produced for the interpretation step.

The concept mapping analysis begins with construction from the sort information of an NxN binary, symmetric
matrix of similarities, Xij. For any two items i and j, a 1 was placed in Xij if the two items were placed in the same
pile by the participant, otherwise a 0 was entered (Weller and Romney, 1988, p. 22). The total NxN similarity
matrix, Tij was obtained by summing across the individual Xij matrices. Thus, any cell in this matrix could take
integer values between 0 and 11 (i.e., the 11 people who sorted the statements); the value indicates the number of
people who placed the i,j pair in the same pile.
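
The construction just described can be sketched in a few lines of Python; the function names and the toy data
below are illustrative assumptions, not the software actually used for the project.

import numpy as np

def binary_similarity(piles, n_statements):
    # Individual matrix Xij: 1 if this participant placed statements i and j
    # in the same pile, 0 otherwise (the diagonal is 1 by construction).
    X = np.zeros((n_statements, n_statements), dtype=int)
    for pile in piles:
        for i in pile:
            for j in pile:
                X[i - 1, j - 1] = 1
    return X

def total_similarity(all_sorts, n_statements):
    # Total matrix Tij: the number of participants who placed the i,j pair
    # in the same pile, i.e. the sum of the individual Xij matrices.
    return sum(binary_similarity(piles, n_statements) for piles in all_sorts)

# Hypothetical data: two participants sorting five statements.
all_sorts = [
    [[1, 2, 3], [4, 5]],
    [[1, 2], [3, 4, 5]],
]
T = total_similarity(all_sorts, n_statements=5)
# T[0, 1] == 2 because both participants placed statements 1 and 2 in the same pile.
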
The total similarity matrix Tij was analyzed using nonmetric multidimensional scaling (MDS) analysis with a two-
dimensional solution. The solution was limited to two dimensions because, as Kruskal and Wish (1978) point out:

Since it is generally easier to work with two-dimensional configurations than with those involving more
dimensions, ease of use considerations are also important for decisions about dimensionality. For example,
when an MDS configuration is desired primarily as the foundation on which to display clustering results,
then a two-dimensional configuration is far more useful than one involving three or more dimensions (p.
58).

The analysis yielded a two-dimensional (x,y) configuration of the set of statements based on the criterion that
statements piled together most often are located more proximately in two-dimensional space while those piled
together less frequently are further apart.
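
A rough reconstruction of this step using scikit-learn is sketched below. The conversion from co-sort counts to
dissimilarities (number of sorters minus the count) is one common and simple choice, offered here as an assumption
rather than the exact transformation used in the original analysis, and the tiny similarity matrix is invented for
illustration.

import numpy as np
from sklearn.manifold import MDS

# A tiny hypothetical total similarity matrix for five statements sorted by
# eleven participants (counts of how often each pair was piled together).
T = np.array([
    [11,  9,  8,  1,  0],
    [ 9, 11,  7,  2,  1],
    [ 8,  7, 11,  3,  2],
    [ 1,  2,  3, 11, 10],
    [ 0,  1,  2, 10, 11],
])

# Nonmetric MDS expects dissimilarities: pairs sorted together more often should
# be less dissimilar, so subtract the counts from the number of sorters.
D = 11 - T
np.fill_diagonal(D, 0)

mds = MDS(n_components=2, metric=False, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(D)   # (n_statements, 2) x,y coordinates for the point map
print(mds.stress_)              # stress value of the two-dimensional solution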

This configuration was the input for the hierarchical cluster analysis utilizing Ward's algorithm (Everitt, 1980) as the
basis for defining a cluster. Using the MDS configuration as input to the cluster analysis in effect forces the cluster
analysis to partition the MDS configuration into non-overlapping clusters in two-dimensional space. There is no
simple mathematical criterion by which a final number of clusters can be selected. The procedure followed here was
to examine an initial cluster solution that on average placed five statements in each cluster. Then, successively lower
and higher cluster solutions were examined, with a judgment made at each level about whether the merger/split
seemed substantively reasonable. The pattern of judgments of the suitability of different cluster solutions was
examined and resulted in acceptance of the fifteen cluster solution as the one that preserved the most detail and
yielded substantively interpretable clusters of statements.
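
The clustering and the per-statement and per-cluster importance averages can be sketched as follows with SciPy
and NumPy; the coordinates, ratings, and candidate cluster counts below are invented stand-ins used only to show
the mechanics, not the project's actual data or software.

import numpy as np
from scipy.cluster.hierarchy import ward, fcluster

rng = np.random.default_rng(0)
coords = rng.normal(size=(96, 2))            # stand-in for the two-dimensional MDS coordinates

# Ward's hierarchical clustering of the MDS coordinates, so that each cluster
# occupies a contiguous patch of the two-dimensional map.
Z = ward(coords)

# There is no simple mathematical criterion for the number of clusters, so
# several candidate solutions are cut from the tree and inspected substantively.
for k in (20, 18, 15, 12, 10):
    labels = fcluster(Z, t=k, criterion="maxclust")
    print(k, np.bincount(labels)[1:])        # cluster sizes for each candidate solution

# Average 1-to-5 importance rating per statement and per cluster for one solution
# (ratings here are random stand-ins averaged across eleven hypothetical raters).
labels = fcluster(Z, t=15, criterion="maxclust")
item_means = rng.integers(1, 6, size=(11, 96)).mean(axis=0)
cluster_means = {c: item_means[labels == c].mean() for c in np.unique(labels)}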

The MDS configuration of the ninety-six points was graphed in two dimensions and is shown in Figure 1. This
"point map" displayed the location of all the brainstormed statements with statements closer to each other generally
expected to be more similar in meaning. A "cluster map" was also generated and is shown in Figure 2. It displayed
the original ninety-six points enclosed by boundaries for the fifteen clusters.

The 1-to-5 rating data was averaged across persons for each item and each cluster. This rating information was
depicted graphically in a "point rating map" (Figure 3) showing the original point map with average rating per item
displayed as vertical columns in the third dimension, and in a "cluster rating map" which showed the cluster average
rating using the third dimension. The following materials were prepared for use in the second session:

(1) the list of the brainstormed statements grouped by cluster


(2) the point map showing the MDS placement of the brainstormed statements and their identifying
numbers (Figure 1)
(3) the cluster map showing the fifteen cluster solution (Figure 2)
(4) the point rating map showing the MDS placement of the brainstormed statements and their
identifying numbers, with average statement ratings overlaid (Figure 3)
(5) the cluster rating map showing the fifteen cluster solution, with average cluster ratings overlaid

Representation Results

The final stress value for the multidimensional scaling analysis was .2980101.

Methods for estimating the reliability of concept maps are described in detail in Trochim (1993). Here, six
reliability coefficients were estimated. The first is analogous to an average item-to-item reliability. The second and
third are analogous to the average item-to-total reliability (correlation between each participant's sort and the total
matrix and map distances respectively). The fourth and fifth are analogous to the traditional split-half reliability.
The sixth is the only reliability that examines the ratings, and is analogous to an inter-rater reliability. All average
correlations were corrected using the Spearman-Brown Prophecy Formula (Weller and Romney, 1988) to yield final
reliability estimates. The results are given in Table 2.
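
As an illustration of the general logic of one of these estimates, the sketch below computes a split-half style
reliability: the individual sort matrices are divided into two random halves, the two half-sample total matrices
are correlated over their off-diagonal cells, and the correlation is stepped up with the Spearman-Brown formula.
This is an assumed reconstruction for illustration, not the exact computations behind Table 2.

import numpy as np

def spearman_brown(r, k=2):
    # Standard Spearman-Brown correction; k = 2 for a split-half estimate.
    return k * r / (1 + (k - 1) * r)

def split_half_reliability(individual_matrices, seed=0):
    # Correlate the total similarity matrices of two random halves of the sorters.
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(individual_matrices))
    half = len(order) // 2
    A = sum(individual_matrices[i] for i in order[:half])
    B = sum(individual_matrices[i] for i in order[half:])
    iu = np.triu_indices(A.shape[0], k=1)          # off-diagonal cells only
    r = np.corrcoef(A[iu], B[iu])[0, 1]
    return spearman_brown(r)

# Hypothetical data: ten sorters, twenty statements, random pile assignments.
rng = np.random.default_rng(1)
mats = []
for _ in range(10):
    pile_of = rng.integers(0, 4, size=20)          # assign each statement to one of four piles
    mats.append((pile_of[:, None] == pile_of[None, :]).astype(int))
print(split_half_reliability(mats))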

Interpretation

The interpretation session convened on Friday morning to interpret the results of the concept mapping analysis. This
session followed a structured process described in detail in Trochim (1989a). The facilitator began the session by
giving the participants the listing of clustered statements and reminding them of the brainstorming, sorting and rating
tasks performed the previous evening. The participants were asked to read through the set of statements in each
cluster and generate a short phrase or word to describe or label the set of statements as a cluster. The facilitator led
the group in a discussion where they worked cluster-by-cluster to achieve group consensus on an acceptable label for
each cluster. In most cases, when persons suggested labels for a specific cluster, the group readily came to a
consensus. Where the group had difficulty achieving a consensus, the facilitator suggested they use a hybrid name,
combining key terms or phrases from several individuals' labels.

Once the clusters were labeled, the group was given the point map (Figure 1) and told that the analysis placed the
statements on the map so that statements frequently piled together are generally closer to each other on the map than
statements infrequently piled together. To reinforce the notion that the analysis placed the statements sensibly,
participants were given a few minutes to identify statements close together on the map and examine the contents of
those statements. After becoming familiar with the numbered point map, they were told that the analysis also
organized the points (i.e., statements) into groups as shown on the list of clustered statements they had already
labeled. The cluster map was presented (Figure 2) and participants were told that it was simply a visual portrayal of
the cluster list. Each participant wrote the cluster labels next to the appropriate cluster on their cluster map. This
labeled cluster map is shown in Figure 4.

Participants then examined the labeled cluster map to see whether it made sense to them. The facilitator reminded
participants that in general, clusters closer together on the map should be conceptually more similar than clusters
farther apart and asked them to assess whether this seemed to be true or not. Participants were asked to think of a
geographic map, and "take a trip" across the map reading each cluster in turn to see whether or not the visual
structure seemed sensible. They were then asked to identify any interpretable groups of clusters or "regions." These
were discussed and partitions drawn on the map to indicate the different regions. Just as in labeling the clusters, the
group then arrived at a consensus label for each of the identified regions. Five regions were identified and are shown
in capital letters in Figure 4. No boundaries were drawn to distinguish these five regions.

The facilitator noted that all of the material presented to this point used only the sorting data. The results of the
rating task were then presented through the point rating (Figure 3) and cluster rating (Figure 5) maps. It was
explained that the height of a point or cluster represented the average importance rating for that statement or cluster
of statements. Again, participants were encouraged to examine these maps to determine whether they made intuitive
sense and to discuss what the maps might imply about the ideas that underlie their conceptualization. The final
original labeled cluster rating map is shown in Figure 5.

Table 3 shows the complete cluster listing with the cluster labels the participants assigned and the average
importance rating for each statement and cluster.

Discussion of Skills versus Values

The pattern of ratings on the map suggested that participants attached more importance to the clusters that had
"value" statements than to those made up of skills. This can perhaps be seen most clearly in Table 4 which shows
the ninety-six competency statements sorted from highest to lowest average importance rating. It is clear from the
table that the statements near the top of the table tend to be more general in nature and more related to values while
the statements near the bottom of the table tend to be more specific, operationalized, skill or knowledge-based ones.
Some of the participants felt that the value statements can't be considered competencies per se because they are not
sufficiently operationalized. Others felt that the value statements have actually been holding IAPSRS back in their
development of competencies because they place too much importance on these generic values and not on a more
specific skill base. Still others felt that the value statements are at the heart of what PSR represents and that they can
and should be operationalized as competencies. The facilitator characterized the discussion as a choice between two
alternatives:

A) Pull the value statements out of the competencies, perhaps putting them in a section up front describing
the kinds of values and characteristics expected of psychosocial rehabilitation workers.

B) Operationalize the value statements so they can be included as formal competencies.

The consensus of the group was that option B was preferable. As a result, the group decided that a major portion of
the afternoon utilization session would involve taking the value-oriented clusters (Clusters 1-5) and attempting to
draft operationalized competency statements for the statements in these clusters.

Discussion of What was Missing on the Map

The group also discussed what concepts seemed to be missing (primarily at the cluster level) from the map. The
following potentially missing labels were generated:

1 Advocacy
2 Systems Change
3 Vocational-Employment
4 Spiritual
5 Housing
6 Education
7 Health
8 Social/Recreational
9 Outcome Evaluation
10 Client Budgeting/Finances
11 Program Management
12 Health and Safety

The group then discussed whether the eventual competencies should have subject-specific categories (such as
housing, education, employment) or whether competencies related to such areas should be spread across the types of
headings already on the map (for instance, consumer outcomes related to employment). The consensus of the group
was that the competencies should not be grouped by subject.

Utilization

The utilization step took place on Friday afternoon from 2-5pm. The following schedule was explained to the
participants when they returned from lunch.

Time      Activity                                                        Facilitator

2-3       Review progress and where we stand                              BT
          Review and feedback on the map's clusters and regions           BT
          Discuss the competency documents                                JC
          Present the two small group tasks and have participants         JC
          select their group/task
3-4       Small group sessions
4-4:50    Presentation of results of small groups
          Summary of map revisions                                        BT
          Summary of operationalizing of the five clusters                Group Leaders
4:50-5    Discussion of next steps and wrap-up                            Anita Pernell-Arnold

Review and Feedback on the Map's Clusters and Regions

The first part of the utilization discussion involved suggestions from participants regarding changes that could be
made to the final map in order to make it more interpretable, cohesive and usable. The discussion which took place
raised the following points.

Reactions to the Five Regions


1 Doesn't matter which five labels we use.
2 Change the name "Techniques."
3 What is the meaning of "consumer" (consumer involvement issues).
4 "Practitioner" is very broad.
5 Change titles by adding "competencies" to the labels.
6 Some consumer competencies are knowledge-based, others are techniques, others are system issues.

7 View (regions) as "key ingredients."

Reactions to Clusters
1 People did some categories according to the specific words in titles (e.g., "ability to...", or "knowledge
of..."). Was this wise?
2 Family relationships is lacking key intervention skills--want to add more?
3 Reconsider the two consumer clusters -- are labels OK?
4 Take another look at Friday and McPheeters broad classification -- better than ours? (Some said they
lose the values; do they exclude the consumers?).
5 Rename cluster 9 (Assessment) or think of dividing it up.
6 Revisit the cluster name "Personality Characteristics."
7 Consider combining "Interpersonal Social Skills" and "Supportive Behaviors."

Discussion of Other Competency Documents

The group then discussed the four competency statement documents (Curtis, 1993; Friday and McPheeters, 1985;
Jonikas, 1993; IAPSRS Ontario Chapter, 1992) that they skimmed over lunch and compared these to the map. The
following comments were made:

1. Current group has defined a set of competencies that is impressive. Need to be clear that we shouldn't
come up with competencies that are unrealistic or over-skilled; the set should characterize a broad range of
competencies.
2. Curtis (1993) was not intended to specify competencies limited to PSR.
3. Curtis (1993) is good in its specificity.
4. Jonikas (1993) document has a totality that will be useful in deciding what to put where.
5. Eighty percent of all documents (including the concept map) were similar.
6. Friday and McPheeters (1985) shows earlier development of the field.
7. There is more in the literature of competencies than we thought.
8. Competencies related to knowledge of principles may not capture the centrality of safety, spirituality,
work, decent place to live, social life, education, and physical health in PSR. Don't want to lose the
essentials. Also want to emphasize high quality outcomes in these areas.
9. IAPSRS Ontario Chapter (1992) is impressive in its succinctness and specificity. Could help guide us
in our document. Action verbs were good in this document.
10. Curtis (1993) document emphasizes the importance of creation of environments, social situations. Not
just changing the individual, but creating contexts. Good use of respect as a concept/process.

Small Group Sessions

In the middle of the afternoon utilization step, the participants were divided into small groups in order to accomplish
some more detailed work. Five groups of 2-3 participants each took one of the first five clusters and attempted to
operationalize the statements in the cluster into ones that better approximated competency statements. One small
group of six participants discussed and made slight revisions to the final concept map. The results of these two types
of small group exercises are described in separate sections below.

Small Group Operationalizations of Five Clusters

Based on the interpretation discussion in the morning session, it was clear that the participants thought that many of
the statements in the first five clusters were better described as "values" than as operationalized competency
statements. The group thought that these value statements could be operationalized and that this would be a central
task for IAPSRS to accomplish as it developed competencies. The central utilization task of the afternoon therefore
was to have small groups of participants, each assigned one of the first five clusters, take the statements in the
clusters and develop draft operational competency statements. The summaries of these discussions (taken from the
newsprint sheets used at the presentation of the results) are reproduced below.

Cluster 1: Interpersonal Skills

This group took each statement in the cluster and generated several more operationalized statements.
Where appropriate, they chose statements from several of the other competency documents and these are
cited. This listing shows each brainstormed statement in Cluster 1 and the draft competency statements that
the small group generated.

1. ability to listen to consumers


• not interrupt the consumer
• able to repeat back what was said with the consumer affirming the correctness
• not imposing your agenda on them

10. ability to motivate clients to change behavior


• to be able to identify reasons for changing the behavior
• to be able to help them identify consequences
• willingness to serve as role model for desired change
• willingness to reinforce behavior that has been changed

36. ability to use the helping relationship to facilitate change


• use one's own experiences to encourage and guide the consumer
• ability to demonstrate approval and pride in their accomplishments

87. ability to interact and provide support in a non-judgmental fashion


• do not demean or patronize consumers
• give feedback on behavior and not the person (Friday and McPheeters, 1985)
• use language and behavior which reflects and perpetuates the dignity of the individual (Curtis,
1993)

5. ability to offer hope to others


• truly believe that there is hope and verbalize it to the consumer
• share examples of change that was possible in a seemingly hopeless situation
• have a healthy sense of humor and minimize the adversity (Friday and McPheeters, 1985)
• focus on consumer successes and help consumer see their own personal growth

6. belief in the recovery process


• the worker has to demonstrate that he/she believes in the recovery process
• to express the belief to the consumers that it's possible for them to live productive satisfying lives
in the community (Jonikas, 1993)
• help the consumer believe in his/her inherent capacity to improve or grow, given the opportunity
and resources, as it's true for all persons (adapted from Jonikas, 1993)

39. ability to build on successes and minimize failures


• point out and celebrate their successes
• help them to see their failure as a learning experience
• supporting risk-taking behaviors to move one step beyond
• ability to have the consumer feel good and acknowledge own success no matter how small
(adapted from Friday and McPheeters, 1985)

31. connecting (interpersonal) skills


• demonstrate behaviors that accept the consumer where he/she is at
• ability to establish a caring but not a consuming or possessive relationship
• demonstrate behaviors that show interest in the consumer and his/her interpretation of needs

78. ability to work with consumer colleagues


• to show sensitivity to the difficulties that they may encounter in their dual role

• avoid labeling persons (either consumers or consumer colleagues) with stereotypes or derogatory
terms (Friday and McPheeters, 1985)
• be straight with consumer colleagues
• have the same expectations as you do for all other colleagues

89. ability to normalize interactions and program practices


• ability to generalize program experiences to activities in the broader community
• have expectations within the program that are consistent with community expectations (with leeway in
terms of enforcement)
• set reasonable limits on bizarre behavior with explanations as to why you are doing it

Cluster 2: Supportive Behaviors

This group generated the following draft competency statements to cover the material listed in Cluster 2.

• ability to maintain ongoing productive relationship based on client satisfaction


• demonstrate high level of interaction (i.e., amount of time, interests, excitement, energy level)
• communicates belief in growth potential
• communicates understanding of thoughts/feelings of others in a non-judgmental manner
• demonstrates holistic understanding of the individual
• able to focus on the consumer's here and now needs/desires (there was some disagreement on the
wording of this one)
• ability to respond in a normalizing manner to the individual's diverse needs and strengths

The following were suggestions from the group about what statements might be "borrowed" from existing
lists:

from Curtis (1993):


4. Demonstrates basic communication and support skills
A1. Exhibits supportive interpersonal skills (i.e., ...)
A2. Establishes and maintains productive relationships with service recipients
• All of 4A--some areas to "negotiate"
1. especially A and B (language, behavior and holistic understanding)

from Friday and McPheeters (1985):


• III. Interpersonal - especially 2, 4, 6, 7, 8

Their group also listed some ways to measure competencies in this area:

• amount of time spent with client


• client satisfaction with the relationship (amount of support perceived)
• peer feedback/input
• share and use own life experience
• reciprocity of relationship
• genuineness

Cluster 3: Professional Role

For each statement in Cluster 3, the group generated one or two potential competency statements.

14. ability to negotiate


• to demonstrate communication skills between stakeholders for the purpose of goal attainment
which is satisfactory to all parties

58. ability to set limits

• to identify personal skills and resources, and expectations held by stakeholders in order to achieve
realistic/attainable goals

17. willingness to have fun


• to actively participate in "activities"

82. ability to use self as a role model


• to mutually share experiences and ideas
• to achieve goals through partnership

47. ability to ask for help and receive constructive feedback from colleagues and consumers

51. ability to let go


• to assist consumers to identify their skills/resources and promote a belief in efficacy of their skills
in order for consumers to take charge

88. ability to overcome personal prejudices when providing services


• to identify personal values/beliefs and evaluate their potential impact on all interactions

Cluster 4: Personality Characteristics

For each statement in Cluster 4, the group generated one or two potential competency statements.

16. self awareness


• be able to describe and explain one's own actions

56. good personal stability but not ego-centric


• respond consistently and congruently to social and environmental demands

50. ability to handle personal stress


• separate personal needs and behaviors from job performance needs and behaviors

18. flexibility
• be able to change behaviors when situations, expectations and requirements are different

25. patience
• to calmly wait until the objective is reached

28. sense of humor


• to laugh at what is funny, to laugh at oneself, and to laugh with others

93. ability to know own limits


• to be able to stop when necessary; to be able to ask for help; to be able to ask for information

Cluster 5: Self Management

24. ability to read and write


1. person must meet high school equivalency level of reading and writing
2. must include accommodations for disabilities like blindness
3. ability to write in behavioral language
4. ability to write with clarity
5. reading comprehension skills must include ability to look up words in the dictionary, comprehend
language(s) used in service settings

29. ability to partialize tasks

41. ability to handle multiple tasks
69. ability to prioritize and manage time
• recognition of total number of tasks inherent in responsibilities
• identify critical tasks by applying an agreed-upon standard for what is most important
• ability to gauge the level of effort and amount of time necessary to complete discrete tasks
• ability to use organizational tools (calendars, to-do lists, tickler file) to keep track of tasks
• ability to engage consumers in assisting with provider's task and time management
• ability to recognize* and deal effectively** with personal stress resulting from multiple tasks

33. tolerance for ambiguity and enjoying diversity

Tolerating Ambiguity
1. Ability to problem-solve ambiguous situations through involvement of others in identification of
problem, generation of a number of potential solutions, evaluating candidate solutions, seeking
staff/consumer/family/network feedback re: viability of solutions, selection of solutions,
implementation and evaluation of solutions.
2. Ability to recognize and accept unresolvable ambiguities through letting-go, acceptance, humor
and other strategies.
3. Ability to distinguish between truly ambiguous situations and situations based on lack of: info,
training, feedback from others. Also, ability to address lacking areas by obtaining info, furthering
education/training, seeking feedback.

Enjoying Diversity
1. Ability to identify the opportunities presented by diversity and to incorporate them positively into
the rehabilitation process through providing alternatives for behavior, problem solution,
identification of opportunities.

91. willingness to take risks


1. demonstration of creative approaches
2. allowing/assisting consumers to exercise options not endorsed by practitioner, after applying
standards of reasonable judgment (safety, etc.)
3. demonstration of willingness to try new or untested approaches and interventions

45. ability to be pragmatic and do hands-on sorts of work


1. Recognition that PSR rehabilitation involves the doing of hands-on tasks for role modeling,
relationship building, etc.
2. Willingness to accept and perform well on hands-on, practical tasks.
3. Ability to develop and implement rehabilitation situations in which behavior or doing leads to
insight rather than vice versa.

94. never-ending willingness to develop oneself


1. NOTE: The group suggested that this item be moved to the Professional Development cluster.
This suggestion was adopted.
2. Development of one's personal growth through hobbies, therapy, education, and to share that
growth with consumers/peers for role modeling and motivation.
3. Willingness to seek help appropriately with one's own problems.

* recognize: increased feelings of anger and frustration about job, procrastination, blaming others
** deal effectively: use of humor, sharing with colleagues and consumers, stress management techniques (deep
breathing, serenity prayer, mantras)

Small Group Map Revision

The small group that considered the revisions to the map began by working with the suggestions generated earlier by
the entire participant group. The following shows these suggestions along with the actions taken, if any, by the small
group:

The large group suggestions are listed below, each followed by the corresponding small group action.

Reactions to the Five Regions

1. Doesn't matter which five labels we use.
   Action: Two changes were made to the original five labels. The label "Techniques" was changed to
   "Rehabilitation Methodology Competencies" and the original label "Consumer" was changed to "Consumer-Centered
   Competencies". In addition, the term "Competencies" was appended to all five labels.

2. Change the name "Techniques."
   Action: The label "Techniques" was changed to "Rehabilitation Methodology Competencies".

3. What is the meaning of "consumer" (consumer involvement issues).
   Action: The original label "Consumer" was changed to "Consumer-Centered Competencies".

4. "Practitioner" is very broad.
   Action: The group decided that the term "Practitioner" would be left as is because it was an appropriately
   broad label for a region name.

5. Change titles by adding "competencies" to the labels.
   Action: This was done for all region and cluster labels.

6. Some consumer competencies are knowledge-based, others are techniques, others are system issues.
   Action: The small group agreed but made no changes to the map in response to this.

7. View (regions) as "key ingredients."
   Action: The small group agreed but made no changes to the map in response to this.

Reactions to Clusters

1. People did some categories according to the specific words in titles (e.g., "ability to...", or
   "knowledge of..."). Was this wise?
   Action: The small group agreed but made no changes to the map in response to this.

2. Family relationships is lacking key intervention skills--want to add more?
   Action: The cluster label "Family Relationships" was changed to "Family-Focused." No intervention items were
   added.

3. Reconsider the two consumer clusters -- are labels OK?
   Action: Changed the original cluster label "Consumer Goal Attainment" to "Consumer Outcome Competencies."

4. Take another look at Friday and McPheeters broad classification -- better than ours? (Some said they lose
   the values; do they exclude the consumers?).
   Action: The small group felt that there was considerable cross-classifiability across the different competency
   documents and the map. No changes were made to the map in response to this.

5. Rename cluster 9 (Assessment) or think of dividing it up.
   Action: The group retained the name for the cluster, only changing it to "Assessment Competencies." See below
   for the specific statements moved into and out of this cluster.

6. Revisit the cluster name "Personality Characteristics."
   Action: The group changed the original cluster label "Personality Characteristics" to "Intrapersonal
   Competencies."

7. Consider combining "Interpersonal Social Skills" and "Supportive Behaviors."
   Action: These clusters (original clusters 1 and 2) were combined into one cluster labeled "Interpersonal
   Competencies."

Two further changes were made that were not tied to a specific suggestion: the original cluster label "Cultural
Competence" was changed to "Multicultural Competencies," and the positions of the original clusters "Family
Relationships" and "Mental Health Knowledge Base" were switched on the map.

In addition to the above changes, several specific statements were shifted from one cluster to another. These changes
are shown in Figure 6 and listed below:

Statement 43, knowledge of a wide variety of approaches to mental health services: moved from Family Relationships
to Mental Health Knowledge Base Competencies.
Statement 40, ability to establish alliances with providers, professionals, families, consumers (partnership model):
moved from Family Relationships to Community Resources Competencies.
Statement 12, skills in advocacy: moved from Assessment to Community Resources Competencies.
Statement 15, strong crisis intervention skills: moved from Assessment to Intervention Skills Competencies.
Statement 85, early identification and intervention skills to deal with relapse: moved from Assessment to
Intervention Skills Competencies.
Statement 94, never-ending willingness to develop oneself: moved from Personality Characteristics to Professional
Development Competencies.
Statement 53, ability to assess behavior in specific environments: moved from Intervention Skills to Assessment
Competencies.
Statement 55, functional assessment: moved from Intervention Skills to Assessment Competencies.
Statement 64, ability to assess active addiction and co-dependency: moved from Intervention Skills to Assessment
Competencies.

In all of the nine statement shifts described above, the shift was from one cluster into an adjacent one on the map.
The revised cluster listing showing the new cluster labels and the average importance ratings is given in Table 4.

The small group also drew explicit lines dividing the five regions. These are shown in Figure 7. They felt that
several of the clusters actually overlapped multiple regions and, consequently, the region lines cut through a cluster
shape rather than only going between clusters. For instance, they felt that the cluster "Interpersonal Competencies"
should fall simultaneously and partially into the three regions of "Consumer-Centered Competencies", "Practitioner
Competencies" and "Rehabilitation Methodology Competencies." Similarly, they felt that the cluster "Professional
Development Competencies" should fall into both the "Practitioner Competencies" and "Knowledge Base
Competencies" regions. The regional lines were drawn on the final map to show these multi-regional clusters.

Figure 8 constitutes the final map for this project. It shows the clusters and regions and includes the average
importance ratings for each cluster. There was considerable consensus across the participant group that it was a
good and fair representation of their ideas regarding competencies for psychosocial rehabilitation workers.
Next Steps

The final discussion of the project involved consideration of the next steps in the competency development process.
The following points were made:

1. Print up list of competencies and survey PSR workers.


2. Review and comment on Trochim concept mapping report.
3. Circulate regions, clusters and individual competencies to various constituencies: consumers,
families, PSR workers, other stakeholders.
4. Further operationalize remaining competencies.
5. Distinguish between entry-level and second-level competencies.
6. Edit and make language consistent on materials sent out for review.

7. Clarify the intent of the present process re: the use to which the final product will be put.
8. Inform a wide range of stakeholders of IAPSRS's intentions in this area.
9. Bring in an expert in credentialing to clarify legal risks, probable results, etc.
10. Involve Training and Certification Committee in this process.
11. Don't send document for review prematurely. Use simple format that helps potential reviewers.
Perhaps include a glossary to aid potential reviewers.
12. Be aware of other lists of competencies so review process doesn't become confused.
13. Include feedback from IAPSRS chapter presidents.
14. Certification conference.
15. Further literature review.
16. Hire someone to draft standards from competencies.
17. Develop an ethics statement based on already-held ethics forum.
18. Requirements of an "arm's length" certification organization.
19. Need to consider the voluntary nature of CARF accreditation for organizations parallel to possible
implementation of standards for practitioners.
20. Conduct a cost/benefit analysis of certification.

References

Bragg, L.R. and Grayson, T.E. (1993). Reaching consensus on outcomes: Lessons learned about concept mapping.
Paper presented at the Annual Conference of the American Evaluation Association, Dallas, TX.
Caracelli, V. (1989). Structured conceptualization: A framework for interpreting evaluation results. Evaluation and
Program Planning. 12, 1, 45-52.
Cook, J. (1992). Modeling staff perceptions of a mobile job support program for persons with severe mental illness.
Paper presented at the Annual Conference of the American Evaluation Association, Seattle, WA.
Cooksy, L. (1989). In the eye of the beholder: Relational and hierarchical structures in conceptualization.
Evaluation and Program Planning. 12, 1, 59-66.
Curtis, L. (1993). Workforce competencies for direct service staff to support adults with psychiatric disabilities in
community mental health services. The Center for Community Change through Housing and Support,
Burlington, VT.
Davis, J. (1989). Construct validity in measurement: A pattern matching approach. Evaluation and Program
Planning. 12, 1, 31-36.
Davison, M.L. (1983). Multidimensional scaling. New York, John Wiley and Sons.
Dumont, J. (1989). Validity of multidimensional scaling in the context of structured conceptualization. Evaluation
and Program Planning. 12, 1, 81-86.
Everitt, B. (1980). Cluster Analysis. 2nd Edition, New York, NY: Halsted Press, A Division of John Wiley and
Sons.
Friday, J.C. and McPheeters, H.L. (1985). Assessing and improving the performance of psychosocial rehabilitation
staff. Southern Regional Education Board, Atlanta, GA.
Galvin, P.F. (1989). Concept mapping for planning and evaluation of a Big Brother/Big Sister program. Evaluation
and Program Planning. 12, 1, 53-58.
Grayson, T.E. (1992). Practical issues in implementing and utilizing concept mapping. Paper presented at the Annual
Conference of the American Evaluation Association, Seattle, WA.
Grayson, T.E. (1993). Empowering key stakeholders in the strategic planning and development of an alternative
school program for youth at risk of school behavior. Paper presented at the Annual Conference of the American
Evaluation Association, Dallas, TX.
Gurowitz, W.D., Trochim, W. and Kramer, H. (1988). A process for planning. The Journal of the National
Association of Student Personnel Administrators, 25, 4, 226-235.
International Association of Psychosocial Rehabilitation Services, Ontario Chapter. (1992). Competencies for Post-
Diploma Certificate Programs in Psychosocial Rehabilitation, Ontario, Canada.
Jonikas, J.A. (1993). Staff competencies for service-delivery staff in psychosocial rehabilitation programs.
Thresholds National Research and Training Center on Rehabilitation and Mental Illness, Chicago, IL.
Kane, T.J. (1992). Using concept mapping to identify provider and consumer issues regarding housing for persons
with severe mental illness. Paper presented at the Annual Conference of the American Evaluation Association,
Seattle, WA.
Keith, D. (1989). Refining concept maps: Methodological issues and an example. Evaluation and Program
Planning. 12, 1, 75-80.
Kohler, P.D. (1992). Services to students with disabilities in postsecondary education settings: Identifying program
outcomes. Paper presented at the Annual Conference of the American Evaluation Association, Seattle, WA.
Kohler, P.D. (1993). Serving students with disabilities in postsecondary education settings: Using program outcomes
for planning, evaluation and empowerment. Paper presented at the Annual Conference of the American
Evaluation Association, Dallas, TX.
Kruskal, J.B. and Wish, M. (1978). Multidimensional Scaling. Beverly Hills, CA: Sage Publications.
Lassegard, E. (1992). Assessing the reliability of the concept mapping process. Paper presented at the Annual
Conference of the American Evaluation Association, Seattle, WA.
Lassegard, E. (1993). Conceptualization of consumer needs for mental health services. Paper presented at the Annual
Conference of the American Evaluation Association, Dallas, TX.
Linton, R. (1989). Conceptualizing feminism: Clarifying social science concepts. Evaluation and Program Planning.
12, 1, 25-30.

Mannes, M. (1989). Using concept mapping for planning the implementation of a social technology. Evaluation and
Program Planning. 12, 1, 67-74.
Marquart, J.M. (1988). A pattern matching approach to link program theory and evaluation data: The case of
employer-sponsored child care. Unpublished doctoral dissertation, Cornell University, Ithaca, New York.
Marquart, J.M. (1989). A pattern matching approach to assess the construct validity of an evaluation instrument.
Evaluation and Program Planning. 12, 1, 37-44.
Marquart, J.M. (1992). Developing quality in mental health services: Perspectives of administrators, clinicians, and
consumers. Paper presented at the Annual Conference of the American Evaluation Association, Seattle, WA.
Marquart, J.M., Pollak, L. and Bickman, L. (1993). Quality in intake assessment and case management: Perspectives
of administrators, clinicians and consumers. In R. Friedman et al. (Eds.), A system of care for children's mental
health: Organizing the research base. Tampa: Florida Mental Health Institute, University of South Florida.
McNemar, Q. (1955). Psychological Statistics (2nd edition). New York, NY: Wiley.
Mead, J.P. and Bowers, T.J. (1992). Using concept mapping in formative evaluations. Paper presented at the Annual
Conference of the American Evaluation Association, Seattle, WA.
Mercer, M.L. (1992). Brainstorming issues in the concept mapping process. Paper presented at the Annual
Conference of the American Evaluation Association, Seattle, WA.
Nunnally, J.C. (1978). Psychometric Theory. (2nd. Ed.). New York, McGraw Hill.
Osborn, A.F. (1948). Your Creative Power. New York, NY: Charles Scribner.
Penney, N.E. (1992). Mapping the conceptual domain of provider and consumer expectations of inpatient mental
health treatment: New York Results. Paper presented at the Annual Conference of the American Evaluation
Association, Seattle, WA.
Romney, A.K., Weller, S.C. and Batchelder, W.H. (1986). Culture as consensus: A theory of culture and informant
accuracy. American Anthropologist, 88, 2, 313-338.
Rosenberg, S. and Kim, M.P. (1975). The method of sorting as a data gathering procedure in multivariate research.
Multivariate Behavioral Research, 10, 489-502.
Ryan, L. and Pursley, L. (1992). Using concept mapping to compare organizational visions of multiple stakeholders.
Paper presented at the Annual Conference of the American Evaluation Association, Seattle, WA.
SenGupta, S. (1993). A mixed-method design for practical purposes: Combination of questionnaire(s), interviews,
and concept mapping. Paper presented at the Annual Conference of the American Evaluation Association,
Dallas, TX.
Shern, D.L. (1992). Documenting the adaptation of rehabilitation technology to a core urban, homeless population
with psychiatric disabilities: A concept mapping approach. Paper presented at the Annual Conference of the
American Evaluation Association, Seattle, WA.
Shern, D.L., Trochim, W. and LaComb, C.A. (1993). The use of concept mapping for assessing fidelity of model
transfer: An example from psychiatric rehabilitation. Unpublished manuscript. New York State Office of Mental
Health, Albany, NY.
Siegel, S. (1956). Nonparametric Statistics for the Behavioral Sciences. New York, NY: McGraw-Hill.
Trochim, W. (1985). Pattern matching, validity, and conceptualization in program evaluation. Evaluation Review, 9,
5, 575-604.
Trochim, W. (1989a). An introduction to concept mapping for planning and evaluation. Evaluation and Program
Planning, 12, 1, 1-16.
Trochim, W. (1989b). Concept mapping: Soft science or hard art? Evaluation and Program Planning, 12, 1, 87-
110.
Trochim, W. (1989b). Concept mapping: Soft science or hard art? Evaluation and Program Planning, 12, 1, 87-
110.
Trochim, W. (1989c). Outcome pattern matching and program theory. Evaluation and Program Planning, 12, 4,
355-366.
Trochim, W. (1990). Pattern matching and program theory. In H.C. Chen (Ed.), Theory-Driven Evaluation. New
Directions for Program Evaluation, San Francisco, CA: Jossey-Bass.
Trochim, W. (1993). The reliability of concept mapping. Paper presented at the Annual Conference of the American
Evaluation Association, Dallas, Texas, November 6, 1993.

Trochim, W. and Cook, J. (1992). Pattern matching in theory-driven evaluation: A field example from psychiatric rehabilitation. In H. Chen and P.H. Rossi (Eds.), Using Theory to Improve Program and Policy Evaluations. New York: Greenwood Press, 49-69.
Trochim, W. and Linton, R. (1986). Conceptualization for evaluation and planning. Evaluation and Program
Planning, 9, 289-308.
Trochim, W., Cook, J. and Setze, R. (in press). Using concept mapping to develop a conceptual framework of staff's views of a supported employment program for persons with severe mental illness. Journal of Consulting and Clinical Psychology.
Valentine, K. (1989). Contributions to the theory of care. Evaluation and Program Planning. 12, 1, 17-24.
Valentine, K. (1992). Mapping the conceptual domain of provider and consumer expectations of inpatient mental
health treatment: Wisconsin results. Paper presented at the Annual Conference of the American Evaluation
Association, Seattle, WA.
Weller, S.C. and Romney, A.K. (1988). Systematic Data Collection. Newbury Park, CA: Sage Publications.
White House Domestic Policy Council. (1993). Health Security: The President's Report to the American People. New York, NY: Simon and Schuster.

Table 1. Complete listing of the ninety-six brainstormed statements for the IAPSRS Project.

1 ability to listen to consumers


2 ability to relate to others
3 knowledge of mental illness
4 knowledge of side effects of medications and alternatives
5 ability to offer hope to others
6 belief in the recovery process
7 ability to emphasize client choices and strengths
8 knowledge of human services network in community
9 knowledge of community resources beyond human services
10 ability to motivate clients to change behavior
11 knowledge of family networks
12 skills in advocacy
13 view consumer as the director of the process
14 ability to negotiate
15 strong crisis intervention skills
16 self awareness
17 willingness to have fun with others
18 flexibility
19 knowledge of appropriate or applicable mental health acts (legislation)
20 knowledge of eligibility benefits
21 social group-work skills
22 ability to see consumers as equal partners
23 teaching ability
24 ability to read and write
25 patience
26 ability to empathize
27 ability to develop structured learning experiences
28 sense of humor
29 ability to partialize tasks
30 demonstration of respect and understanding for family members
31 connecting (interpersonal) skills
32 cultural competence and ability to deliver culturally relevant services
33 tolerance for ambiguity and enjoying diversity
34 value consumer's ability to seek and sustain employment opportunities
35 value consumer's ability to pursue educational goals
36 ability to use the helping relationship to facilitate change
37 ability to develop alliances/partnerships with family members
38 knowledge of ethnic-based familial role definitions
39 ability to build on successes and minimize failures
40 ability to establish alliances with providers, professionals, families, consumers (partnership model)
41 ability to handle multiple tasks
42 ability to replace self with naturally-occurring resources
43 knowledge of a wide variety of approaches to mental health services
44 knowledge of the community you serve and its environment
45 ability to be pragmatic and do hands-on sorts of work
46 ability to set goals
47 ability to ask for help and receive constructive feedback from consumers, peers, stakeholders
48 ability to work with employers
49 ability to generate enthusiasm
50 ability to handle personal stress
51 ability to let go
52 ability to understand the impact of culture and ethnicity on mental illness
53 ability to assess behavior in specific environments

54 knowledge of legal issues (e.g., civil commitment, guardianship) and the ethical context
55 functional assessment
56 good personal stability but not ego-centric
57 knowledge of relationship between health status and mental illness
58 ability to set limits
59 being able to help client set measurable goals
60 able to nurture
61 ability to assess resources
62 ability to encourage
63 ability to assess role of peer support
64 ability to assess active addiction and co-dependency
65 ability to assess and access decent housing
66 routinely solicits and incorporates consumer preferences
67 ability to explain illness to consumer
68 commitment to ongoing education and training
69 ability to prioritize and manage time
70 knowledge of history of psychosocial rehabilitation
71 knowledge of principles and values of psychosocial rehabilitation
72 ability to use and develop innovative approaches
73 knowledge of and respect for multi-lingual skills
74 ability to foster inter-dependence
75 belief in the value of self-help
76 ability to help consumers choose, get, keep jobs
77 understand the availability of alternatives
78 ability to work with consumer colleagues
79 ability to help consumer learn to manage own mental illness
80 ability to help consumers develop cohesive groups
81 ability and comfort in helping consumers in recreational pursuits
82 ability to use self as a role model
83 ability to design, deliver and ensure highly-individualized services and supports
84 ability to maintain consumer records
85 early identification and intervention skills to deal with relapse
86 ability to conduct skills training in a manner to help overcome cognitive deficits
87 ability to interact and provide support in a non-judgmental fashion
88 ability to overcome personal prejudices when providing services
89 ability to normalize interactions and program practices
90 commitment to furthering the methods and technologies in PSR through research and sharing of best
practices
91 willingness to take risks
92 belief in the effectiveness of psychosocial methods
93 ability to know own limits
94 never-ending willingness to develop oneself
95 ability or willingness to consider alternative paradigms
96 ability to empower consumers

Table 2. Reliability Estimates for IAPSRS Concept Mapping Project

Reliability Estimator Reliability


Average Sort-to-Sort Reliability .9124
Average Sort-to-Total Matrix Reliability .9607
Average Sort-to-Map Reliability .9117
Split-Half Total Matrix Reliability .9332
Split-Half Map Reliability .8882
Average Rating-to-Rating Reliability .8446

Table 3. Cluster listing for original map interpretation showing cluster labels, and statement and cluster average
importance ratings.

Cluster 1: Interpersonal Skills

1 ability to listen to consumers 4.71


10 ability to motivate clients to change behavior 3.62
36 ability to use the helping relationship to facilitate change 3.76
87 ability to interact and provide support in a non-judgmental fashion 4.33
5 ability to offer hope to others 4.52
6 belief in the recovery process 4.33
39 ability to build on successes and minimize failures 4.10
31 connecting (interpersonal) skills 3.76
78 ability to work with consumer colleagues 3.52
89 ability to normalize interactions and program practices 3.71

Cluster 1 Average = 4.04

Cluster 2: Supportive Behaviors

2 ability to relate to others 4.33


49 ability to generate enthusiasm 3.48
62 ability to encourage 4.14
60 able to nurture 3.43
26 ability to empathize 4.14

Cluster 2 Average = 3.90

Cluster 3: Professional Role

14 ability to negotiate 3.14


58 ability to set limits 3.14
17 willingness to have fun with others 3.00
82 ability to use self as a role model 3.48
47 ability to ask for help and receive constructive feedback from consumers, peers, stakeholders 3.86
51 ability to let go 2.95
88 ability to overcome personal prejudices when providing services 4.48

Cluster 3 Average = 3.44

Cluster 4: Personality Characteristics

16 self awareness 4.00


56 good personal stability but not ego-centric 3.43
50 ability to handle personal stress 3.52
18 flexibility 4.10
25 patience 3.62
28 sense of humor 3.48
93 ability to know own limits 3.57

Cluster 4 Average = 3.67

Cluster 5: Self Management

24 ability to read and write 3.52


29 ability to partialize tasks 3.14
45 ability to be pragmatic and do hands-on sorts of work 4.24
33 tolerance for ambiguity and enjoying diversity 3.71
91 willingness to take risks 3.57
41 ability to handle multiple tasks 3.05
69 ability to prioritize and manage time 3.29
94 never-ending willingness to develop oneself 3.57

Cluster 5 Average = 3.51

Cluster 6: Mental Health Knowledge Base

3 knowledge of mental illness 3.76


57 knowledge of relationship between health status and mental illness 2.86
4 knowledge of side effects of medications and alternatives 3.43
19 knowledge of appropriate or applicable mental health acts (legislation) 2.05
54 knowledge of legal issues (e.g., civil commitment, guardianship) and the ethical context 2.43

Cluster 6 Average = 2.91

Cluster 7: Family Relationships

11 knowledge of family networks 2.76


30 demonstration of respect and understanding for family members 3.38
37 ability to develop alliances/partnerships with family members 3.10
40 ability to establish alliances with providers, professionals, families, consumers (partnership 3.71
model)
43 knowledge of a wide variety of approaches to mental health services 2.86

Cluster 7 Average = 3.16

Cluster 8: Community Resources

8 knowledge of human services network in community 3.33


20 knowledge of eligibility benefits 2.81
9 knowledge of community resources beyond human services 2.76
44 knowledge of the community you serve and its environment 3.14
48 ability to work with employers 3.24

Cluster 8 Average = 3.06

Cluster 9: Assessment

12 skills in advocacy 3.38


63 ability to assess role of peer support 2.95
61 ability to assess resources 3.29
65 ability to assess and access decent housing 3.48
15 strong crisis intervention skills 3.29
85 early identification and intervention skills to deal with relapse 3.81

Cluster 9 Average = 3.37

Cluster 10: Cultural Competence

32 cultural competence and ability to deliver culturally relevant services 3.71


38 knowledge of ethnic-based familial role definitions 3.10
52 ability to understand the impact of culture and ethnicity on mental illness 3.76
73 knowledge of and respect for multi-lingual skills 3.05

Cluster 10 Average = 3.41

Cluster 11: Professional Development

68 commitment to ongoing education and training 3.10


72 ability to use and develop innovative approaches 3.76
95 ability or willingness to consider alternative paradigms 3.43

Cluster 11 Average = 3.43

Cluster 12: Psychosocial Rehabilitation Knowledge Base

70 knowledge of history of psychosocial rehabilitation 2.76


71 knowledge of principles and values of psychosocial rehabilitation 4.14
77 understand the availability of alternatives 2.95
90 commitment to furthering the methods and technologies in PSR through research and sharing of 3.00
best practices
92 belief in the effectiveness of psychosocial methods 4.14

Cluster 12 Average = 3.40

Cluster 13: Consumer Empowerment

7 ability to emphasize client choices and strengths 4.48


96 ability to empower consumers 4.62
13 view consumer as the director of the process 4.05
22 ability to see consumers as equal partners 4.00
66 routinely solicits and incorporates consumer preferences 4.24
42 ability to replace self with naturally-occurring resources 3.19
74 ability to foster inter-dependence 3.24

Cluster 13 Average = 3.97

Cluster 14: Consumer Goal Attainment

34 value consumer's ability to seek and sustain employment opportunities 4.24


76 ability to help consumers choose, get, keep jobs 4.10
35 value consumer's ability to pursue educational goals 3.71
80 ability to help consumers develop cohesive groups 2.90
75 belief in the value of self-help 3.76
59 being able to help client set measurable goals 3.86
79 ability to help consumer learn to manage own mental illness 4.24
67 ability to explain illness to consumer 3.00
81 ability and comfort in helping consumers in recreational pursuits 2.86

Cluster 14 Average = 3.63

Cluster 15: Intervention Skills

21 social group-work skills 2.52


27 ability to develop structured learning experiences 2.62
86 ability to conduct skills training in a manner to help overcome cognitive deficits 3.00
46 ability to set goals 3.76
23 teaching ability 3.24
83 ability to design, deliver and ensure highly-individualized services and supports 3.62
84 ability to maintain consumer records 2.95
53 ability to assess behavior in specific environments 3.19
55 functional assessment 3.05
64 ability to assess active addiction and co-dependency 3.29

Cluster 15 Average = 3.12

Table 4. Listing of brainstormed statements sorted from highest to lowest average importance rating.

1 ability to listen to consumers 4.71


96 ability to empower consumers 4.62
5 ability to offer hope to others 4.52
7 ability to emphasize client choices and strengths 4.48
88 ability to overcome personal prejudices when providing services 4.48
2 ability to relate to others 4.33
6 belief in the recovery process 4.33
87 ability to interact and provide support in a non-judgmental fashion 4.33
34 value consumer's ability to seek and sustain employment opportunities 4.24
45 ability to be pragmatic and do hands-on sorts of work 4.24
66 routinely solicits and incorporates consumer preferences 4.24
79 ability to help consumer learn to manage own mental illness 4.24
26 ability to empathize 4.14
62 ability to encourage 4.14
71 knowledge of principles and values of psychosocial rehabilitation 4.14
92 belief in the effectiveness of psychosocial methods 4.14
18 flexibility 4.10
39 ability to build on successes and minimize failures 4.10
76 ability to help consumers choose, get, keep jobs 4.10
13 view consumer as the director of the process 4.05
16 self awareness 4.00
22 ability to see consumers as equal partners 4.00
47 ability to ask for help and receive constructive feedback from consumers, peers, stakeholders 3.86
59 being able to help client set measurable goals 3.86
85 early identification and intervention skills to deal with relapse 3.81
3 knowledge of mental illness 3.76
31 connecting (interpersonal) skills 3.76
36 ability to use the helping relationship to facilitate change 3.76
46 ability to set goals 3.76
52 ability to understand the impact of culture and ethnicity on mental illness 3.76
72 ability to use and develop innovative approaches 3.76
75 belief in the value of self-help 3.76
32 cultural competence and ability to deliver culturally relevant services 3.71
33 tolerance for ambiguity and enjoying diversity 3.71
35 value consumer's ability to pursue educational goals 3.71
40 ability to establish alliances with providers, professionals, families, consumers (partnership 3.71
model)
89 ability to normalize interactions and program practices 3.71
10 ability to motivate clients to change behavior 3.62
25 patience 3.62
83 ability to design, deliver and ensure highly-individualized services and supports 3.62
91 willingness to take risks 3.57
93 ability to know own limits 3.57
94 never-ending willingness to develop oneself 3.57
24 ability to read and write 3.52
50 ability to handle personal stress 3.52
78 ability to work with consumer colleagues 3.52
28 sense of humor 3.48
49 ability to generate enthusiasm 3.48
65 ability to assess and access decent housing 3.48
82 ability to use self as a role model 3.48
4 knowledge of side effects of medications and alternatives 3.43

56 good personal stability but not ego-centric 3.43
60 able to nurture 3.43
95 ability or willingness to consider alternative paradigms 3.43
12 skills in advocacy 3.38
30 demonstration of respect and understanding for family members 3.38
8 knowledge of human services network in community 3.33
15 strong crisis intervention skills 3.29
61 ability to assess resources 3.29
64 ability to assess active addiction and co-dependency 3.29
69 ability to prioritize and manage time 3.29
23 teaching ability 3.24
48 ability to work with employers 3.24
74 ability to foster inter-dependence 3.24
42 ability to replace self with naturally-occurring resources 3.19
53 ability to assess behavior in specific environments 3.19
14 ability to negotiate 3.14
29 ability to partialize tasks 3.14
44 knowledge of the community you serve and its environment 3.14
58 ability to set limits 3.14
37 ability to develop alliances/partnerships with family members 3.10
38 knowledge of ethnic-based familial role definitions 3.10
68 commitment to ongoing education and training 3.10
41 ability to handle multiple tasks 3.05
55 functional assessment 3.05
73 knowledge of and respect for multi-lingual skills 3.05
17 willingness to have fun with others 3.00
67 ability to explain illness to consumer 3.00
86 ability to conduct skills training in a manner to help overcome cognitive deficits 3.00
90 commitment to furthering the methods and technologies in PSR through research and sharing of 3.00
best practices
51 ability to let go 2.95
63 ability to assess role of peer support 2.95
77 understand the availability of alternatives 2.95
84 ability to maintain consumer records 2.95
80 ability to help consumers develop cohesive groups 2.90
43 knowledge of a wide variety of approaches to mental health services 2.86
57 knowledge of relationship between health status and mental illness 2.86
81 ability and comfort in helping consumers in recreational pursuits 2.86
20 knowledge of eligibility benefits 2.81
9 knowledge of community resources beyond human services 2.76
11 knowledge of family networks 2.76
70 knowledge of history of psychosocial rehabilitation 2.76
27 ability to develop structured learning experiences 2.62
21 social group-work skills 2.52
54 knowledge of legal issues (e.g., civil commitment, guardianship) and the ethical context 2.43
19 knowledge of appropriate or applicable mental health acts (legislation) 2.05

Table 5. Revised cluster listing showing cluster labels and statement and cluster average importance ratings.

Cluster 1: Interpersonal Competencies

1 ability to listen to consumers 4.71


10 ability to motivate clients to change behavior 3.62
36 ability to use the helping relationship to facilitate change 3.76
87 ability to interact and provide support in a non-judgmental fashion 4.33
5 ability to offer hope to others 4.52
6 belief in the recovery process 4.33
39 ability to build on successes and minimize failures 4.10
31 connecting (interpersonal) skills 3.76
78 ability to work with consumer colleagues 3.52
89 ability to normalize interactions and program practices 3.71
2 ability to relate to others 4.33
49 ability to generate enthusiasm 3.48
62 ability to encourage 4.14
60 able to nurture 3.43
26 ability to empathize 4.14

Cluster 1 Average = 3.99

Cluster 2: Professional Role Competencies

14 ability to negotiate 3.14


58 ability to set limits 3.14
17 willingness to have fun with others 3.00
82 ability to use self as a role model 3.48
47 ability to ask for help and receive constructive feedback from consumers, peers, stakeholders 3.86
51 ability to let go 2.95
88 ability to overcome personal prejudices when providing services 4.48

Cluster 2 Average = 3.44

Cluster 3: Intrapersonal Competencies

16 self awareness 4.00


56 good personal stability but not ego-centric 3.43
50 ability to handle personal stress 3.52
18 flexibility 4.10
25 patience 3.62
28 sense of humor 3.48
93 ability to know own limits 3.57

Cluster 3 Average = 3.67

Cluster 4: Self Management Competencies

24 ability to read and write 3.52


29 ability to partialize tasks 3.14
45 ability to be pragmatic and do hands-on sorts of work 4.24
33 tolerance for ambiguity and enjoying diversity 3.71
91 willingness to take risks 3.57
41 ability to handle multiple tasks 3.05
69 ability to prioritize and manage time 3.29

Cluster 4 Average = 3.50

Cluster 5: Mental Health Knowledge Base Competencies

3 knowledge of mental illness 3.76


57 knowledge of relationship between health status and mental illness 2.86
4 knowledge of side effects of medications and alternatives 3.43
19 knowledge of appropriate or applicable mental health acts (legislation) 2.05
54 knowledge of legal issues (e.g., civil commitment, guardianship) and the ethical context 2.43
43 knowledge of a wide variety of approaches to mental health services 2.86

Cluster 5 Average = 2.90

Cluster 6: Family-Focused Competencies

11 knowledge of family networks 2.76


30 demonstration of respect and understanding for family members 3.38
37 ability to develop alliances/partnerships with family members 3.10

Cluster 6 Average = 3.08

Cluster 7: Community Resources Competencies

40 ability to establish alliances with providers, professionals, families, consumers (partnership 3.71
model)
8 knowledge of human services network in community 3.33
20 knowledge of eligibility benefits 2.81
9 knowledge of community resources beyond human services 2.76
44 knowledge of the community you serve and its environment 3.14
48 ability to work with employers 3.24
12 skills in advocacy 3.38

Cluster 7 Average = 3.20

Cluster 8: Assessment Competencies

63 ability to assess role of peer support 2.95


61 ability to assess resources 3.29
65 ability to assess and access decent housing 3.48
53 ability to assess behavior in specific environments 3.19
55 functional assessment 3.05
64 ability to assess active addiction and co-dependency 3.29

Cluster 8 Average = 3.21

Cluster 9: Multicultural Competencies

32 cultural competence and ability to deliver culturally relevant services 3.71


38 knowledge of ethnic-based familial role definitions 3.10
52 ability to understand the impact of culture and ethnicity on mental illness 3.76
73 knowledge of and respect for multi-lingual skills 3.05

Cluster 9 Average = 3.41

Cluster 10: Professional Development Competencies

94 never-ending willingness to develop oneself 3.57


68 commitment to ongoing education and training 3.10
72 ability to use and develop innovative approaches 3.76
95 ability or willingness to consider alternative paradigms 3.43

Cluster 10 Average = 3.47

Cluster 11: Psychosocial Rehabilitation Knowledge Base Competencies

70 knowledge of history of psychosocial rehabilitation 2.76


71 knowledge of principles and values of psychosocial rehabilitation 4.14
77 understand the availability of alternatives 2.95
90 commitment to furthering the methods and technologies in PSR through research and sharing of 3.00
best practices
92 belief in the effectiveness of psychosocial methods 4.14

Cluster 11 Average = 3.40

Cluster 12: Consumer Empowerment Competencies

7 ability to emphasize client choices and strengths 4.48


96 ability to empower consumers 4.62
13 view consumer as the director of the process 4.05
22 ability to see consumers as equal partners 4.00
66 routinely solicits and incorporates consumer preferences 4.24
42 ability to replace self with naturally-occurring resources 3.19
74 ability to foster inter-dependence 3.24

Cluster 12 Average = 3.97

Cluster 13: Consumer Outcome Competencies

34 value consumer's ability to seek and sustain employment opportunities 4.24


76 ability to help consumers choose, get, keep jobs 4.10
35 value consumer's ability to pursue educational goals 3.71
80 ability to help consumers develop cohesive groups 2.90
75 belief in the value of self-help 3.76
59 being able to help client set measurable goals 3.86
79 ability to help consumer learn to manage own mental illness 4.24
67 ability to explain illness to consumer 3.00
81 ability and comfort in helping consumers in recreational pursuits 2.86

Cluster 13 Average = 3.63

Cluster 14: Intervention Competencies

15 strong crisis intervention skills 3.29


85 early identification and intervention skills to deal with relapse 3.81
21 social group-work skills 2.52
27 ability to develop structured learning experiences 2.62
86 ability to conduct skills training in a manner to help overcome cognitive deficits 3.00
46 ability to set goals 3.76
23 teaching ability 3.24
83 ability to design, deliver and ensure highly-individualized services and supports 3.62
84 ability to maintain consumer records 2.95

Cluster 14 Average = 3.20

Figure 1. Point map: the two-dimensional multidimensional scaling configuration of the ninety-six brainstormed statements.

Figure 2. Cluster map: the point map with boundaries enclosing the fifteen-cluster solution.

Figure 3. Point rating map: the point map with average statement importance ratings overlaid in five layers (2.05 - 4.71).

Figure 4. Labeled cluster map showing the original cluster labels and the five regions identified by the participants (CONSUMER, PRACTITIONER, TECHNIQUES, SYSTEM, KNOWLEDGE BASE).

Figure 5. Cluster rating map: the labeled cluster map with average cluster importance ratings overlaid in five layers (2.90 - 4.04).

Figure 6. Revised concept map: point placements with the revised cluster structure from the utilization session.

Figure 7. Revised labeled cluster map showing the revised cluster labels and regions (CONSUMER-CENTERED COMPETENCIES, REHABILITATION METHODOLOGY COMPETENCIES, SYSTEM COMPETENCIES, KNOWLEDGE BASE COMPETENCIES, PRACTITIONER COMPETENCIES).

Figure 8. Revised cluster rating map with average cluster importance ratings overlaid in five layers (2.90 - 4.05).
Introduction
The International Association of Psychosocial Rehabilitation Services (IAPSRS) has as one of its primary missions the
task of developing Psychosocial Rehabilitation (PSR) as a professional discipline. To that end, the association has for several years
been working towards the development of a comprehensive set of workforce competencies that could be utilized as
standards in the certification of PSR workers. This task has become even more pressing in view of the national efforts to
develop comprehensive health insurance coverage in the United States (White House Domestic Policy Council,
1993). It is essential that professional standards for PSR be clearly delineated if PSR is to be included as a service
covered under national health insurance.

In recent years, there have been several efforts to elucidate PSR workforce competencies or competencies for related
endeavors that might be relevant (Curtis, 1993; Friday and McPheeters, 1985; Jonikas, 1993; IAPSRS Ontario Chapter,
1992). To move the process along, IAPSRS contracted with the Thresholds Research and Training Center on
Rehabilitation and Mental Illness to: a) review the literature on PSR competencies and develop a paper that integrated that
literature; and b) conduct a concept mapping project with a selected national group of PSR experts designed to elucidate a
comprehensive framework of competencies. The Jonikas (1993) document constituted the literature review. This report
describes the concept mapping project that was undertaken.

Concept mapping is a process that can be used to help a group describe its ideas on any topic of interest (Trochim, 1989a).
The process typically requires the participants to brainstorm a large set of statements relevant to the topic of interest,
individually sort these statements into piles of similar ones and rate each statement on some scale, and interpret the maps
that result from the data analyses. The analyses typically include a two-dimensional multidimensional scaling (MDS) of
the unstructured sort data, a hierarchical cluster analysis of the MDS coordinates, and the computation of average ratings
for each statement and cluster of statements. The maps that result show the individual statements in two-dimensional (x,y)
space with more similar statements located nearer each other, and show how the statements are grouped into clusters that
partition the space on the map. Participants are led through a structured interpretation session designed to help them
understand the maps and label them in a substantively meaningful way.

The concept mapping process as conducted here was first described by Trochim and Linton (1986). Trochim (1989a)
delineates the process in detail and Trochim (1989b) presents a wide range of example projects. Concept mapping has
received considerable use and appears to be growing in popularity. It has been used to address substantive issues in the
social services (Galvin, 1989; Mannes, 1989), mental health (Cook, 1992; Kane, 1992; Lassegard, 1993; Marquart, 1988;
Marquart, 1992; Marquart et al, 1993; Penney, 1992; Ryan and Pursley, 1992; Shern, 1992; Trochim, 1989a; Trochim and
Cook, 1992; Trochim et al, in press; Valentine, 1992), health care (Valentine, 1989), education (Grayson, 1993; Kohler,
1992; Kohler, 1993), educational administration (Gurowitz et al, 1988), and theory development (Linton, 1989).
Considerable methodological work on the concept mapping process and its potential utility has also been accomplished
(Bragg and Grayson, 1993; Caracelli, 1989; Cooksy, 1989; Davis, 1989; Dumont, 1989; Grayson, 1992; Keith, 1989;
Lassegard, 1992; Marquart, 1989; Mead and Bowers, 1992; Mercer, 1992; SenGupta, 1993; Trochim, 1985 , 1989c,
1990).

The concept mapping process involves six major steps:

1 Preparation

2 Generation
3 Structuring

4 Representation

5 Interpretation

6 Utilization

This report presents the results of the project in sequential order according to the six steps in the process.



Preparation
The preparation step involves three major tasks. First, the focus for the concept mapping project must be stated operationally. Second, the
participants must be selected. And, third, the schedule for the project must be set.

The Focus for the Concept Mapping

In concept mapping, the focus for the project is stated in the form of the instruction to the brainstorming participant group. For this project this
instruction was operationalized as:

Generate statements (short phrases or sentences) that describe specific workforce competencies for psychosocial rehabilitation
practitioners.

In most projects there is a secondary focus that relates to the ratings of the brainstormed statements. This focus is also stated in its operational
form and, for this project, was:

Using the following scale, rate each competency for its relative importance for high-quality service delivery.

1 = relatively less important
2 = somewhat important
3 = moderately important
4 = very important
5 = extremely important

The Participants

Twenty-one people participated in the concept mapping process. They were purposively selected to represent a broad range of PSR experiences
and schools of thought. They included the Director of IAPSRS, the Chair of the committee responsible for developing competencies and several
members of the IAPSRS Board of Directors. Several participants were affiliated with the leading national centers for PSR, and several
were consumers of PSR services.

The Schedule

The concept mapping project was scheduled for two consecutive days. It began on Thursday, November 11th at 2pm. Between 2 and 6 pm the
generation and structuring steps were accomplished. The representation step (i.e., the data entry, analysis and production of materials for
interpretation) was completed by the co-facilitators (Trochim and Cook) during the evening of November 11th. The Interpretation step was
accomplished from 9 am to noon on Friday, November 12th. Participants were given a two-hour lunch during which they could skim four
documents that attempted to delineate competencies in PSR or related areas (Curtis, 1993; Friday and McPheeters, 1985; Jonikas, 1993; IAPSRS
Ontario Chapter, 1992). The Utilization step was accomplished on Friday afternoon from 2 to 5 pm.



Generation
The generation step essentially consists of a structured brainstorming session (Osborn, 1948) guided by a specific focus
prompt that limits the types of statements that are acceptable. The focus statement or criterion for generating statements
was operationalized in the form of the instruction to the participants given above. The general rules of brainstorming
applied. Participants were encouraged to generate as many statements as possible (with an upper limit of 100); no
criticism or discussion of other's statements was allowed (except for purposes of clarification); and all participants were
encouraged to take part. The group brainstormed ninety-six statements in approximately forty-five minutes.

The complete listing of brainstormed statements is given in Table 1. Participants were given a short break while the
statements were printed and duplicated for use in the structuring stage.



Structuring
Structuring involved two distinct tasks, the sorting and rating of the brainstormed statements. For the sorting (Rosenberg
and Kim, 1975; Weller and Romney, 1988), each participant was given a listing of the statements laid out in mailing label
format with twelve to a page and asked to cut the listing into slips with one statement (and its identifying number) on each
slip. They were instructed to group the ninety-six statement slips into piles "in a way that makes sense to you." The only
restrictions in this sorting task were that there could not be: (a) N piles (in this case 96 piles of one item each); (b) one pile
consisting of all 96 items; or (c) a "miscellaneous" pile (any item thought to be unique was to be put in its own separate
pile). Weller and Romney (1988) point out why unstructured sorting (in their terms, the pile sort method) is appropriate in
this context:

The outstanding strength of the pile sort task is the fact that it can accommodate a large number of items. We know of no
other data collection method that will allow the collection of judged similarity data among over 100 items. This makes it
the method of choice when large numbers are necessary. Other methods that might be used to collect similarity data, such
as triads and paired comparison ratings, become impractical with a large number of items (p. 25).

After sorting the statements, each participant recorded the contents of each pile by listing a short pile label and the
statement identifying numbers on a sheet that was provided. For the rating task, the brainstormed statements were listed in
questionnaire form and each participant was asked to rate each statement on a 5-point Likert-type response scale in terms
of the relative importance of each competency as stated above. Because participants were unlikely to brainstorm
statements that were totally unimportant with respect to PSR, it was stressed that the rating should be considered a relative
judgment of the importance of each item to all the other items brainstormed.
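To make the data recording concrete, the following sketch (in Python, purely illustrative and not part of the original project) shows one way a single participant's pile sort and importance ratings might be stored for later analysis; the pile labels, statement assignments, and field names are hypothetical:

# One participant's sort and rating data, in an assumed format.
participant = {
    "piles": {
        "listening and relating": [1, 2, 26, 31, 62],   # illustrative piles only
        "knowledge of the field": [3, 4, 70, 71],
        "community resources": [8, 9, 20, 44, 48],
        # ... one entry per pile; every statement 1-96 appears in exactly one pile
    },
    # importance ratings: 1 = relatively less important ... 5 = extremely important
    "ratings": {1: 5, 2: 4, 3: 4, 4: 3, 8: 3},          # ... and so on for all 96
}

# Checks implied by the sorting instructions:
all_items = [s for pile in participant["piles"].values() for s in pile]
assert len(all_items) == len(set(all_items))   # no statement in more than one pile
assert 1 < len(participant["piles"]) < 96      # neither one pile nor 96 piles of one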

This concluded the structuring session.



Representation
In the representation step, the sorting and rating data were entered into the computer, the MDS and cluster analysis were
conducted, and materials were produced for the interpretation step.

The concept mapping analysis begins with construction from the sort information of an NxN binary, symmetric matrix of
similarities, Xij. For any two items i and j, a 1 was placed in Xij if the two items were placed in the same pile by the
participant, otherwise a 0 was entered (Weller and Romney, 1988, p. 22). The total NxN similarity matrix, Tij was
obtained by summing across the individual Xij matrices. Thus, any cell in this matrix could take integer values between 0
and 11 (i.e., the 11 people who sorted the statements); the value indicates the number of people who placed the i,j pair in
the same pile.
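A minimal sketch of this matrix construction, assuming each participant's sort is available as a list of piles of statement numbers (illustrative Python, not the software used for the project):

import numpy as np

def similarity_matrices(sorts, n_statements=96):
    # Sum each participant's binary co-occurrence matrix X_ij into the total
    # similarity matrix T_ij. `sorts` has one entry per participant; each entry
    # is a list of piles of 1-based statement numbers (an assumed input format).
    total = np.zeros((n_statements, n_statements), dtype=int)
    for piles in sorts:
        x = np.zeros((n_statements, n_statements), dtype=int)
        for pile in piles:
            for i in pile:
                for j in pile:
                    x[i - 1, j - 1] = 1        # 1 if i and j share a pile, else 0
        total += x                             # T_ij is the sum of the X_ij
    return total

Each cell of the resulting matrix can range from 0 up to the number of people who sorted the statements.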

The total similarity matrix Tij was analyzed using nonmetric multidimensional scaling (MDS) analysis with a two-
dimensional solution. The solution was limited to two dimensions because, as Kruskal and Wish (1978) point out:

Since it is generally easier to work with two-dimensional configurations than with those involving more dimensions, ease
of use considerations are also important for decisions about dimensionality. For example, when an MDS configuration is
desired primarily as the foundation on which to display clustering results, then a two-dimensional configuration is far
more useful than one involving three or more dimensions (p. 58).

The analysis yielded a two-dimensional (x,y) configuration of the set of statements based on the criterion that statements
piled together most often are located more proximately in two-dimensional space while those piled together less
frequently are further apart.
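The scaling step can be approximated as follows. This sketch uses scikit-learn's nonmetric MDS and assumes dissimilarities are formed by subtracting the total similarity matrix from the number of sorters; the software used for the original analysis, and its exact transform, are not specified here:

import numpy as np
from sklearn.manifold import MDS

def point_map(T, n_sorters):
    # Frequently co-sorted pairs get small dissimilarities and so end up
    # close together on the two-dimensional map.
    dissim = (n_sorters - T).astype(float)
    np.fill_diagonal(dissim, 0.0)
    mds = MDS(n_components=2, metric=False, dissimilarity="precomputed",
              random_state=0)
    coords = mds.fit_transform(dissim)   # one (x, y) row per statement
    # Note: scikit-learn reports raw stress, which is not directly comparable
    # to the Kruskal stress value reported below.
    return coords, mds.stress_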

This configuration was the input for the hierarchical cluster analysis utilizing Ward's algorithm (Everitt, 1980) as the basis
for defining a cluster. Using the MDS configuration as input to the cluster analysis in effect forces the cluster analysis to
partition the MDS configuration into non-overlapping clusters in two-dimensional space. There is no simple mathematical
criterion by which a final number of clusters can be selected. The procedure followed here was to examine an initial
cluster solution that on average placed five statements in each cluster. Then, successively lower and higher cluster
solutions were examined, with a judgment made at each level about whether the merger/split seemed substantively
reasonable. The pattern of judgments of the suitability of different cluster solutions was examined and resulted in
acceptance of the fifteen cluster solution as the one that preserved the most detail and yielded substantively interpretable
clusters of statements.
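A sketch of this clustering step, written so that successive cluster solutions can be examined as described above (illustrative Python; function and variable names are not from the original software):

import numpy as np
from scipy.cluster.hierarchy import ward, fcluster

def cluster_solution(coords, ratings, n_clusters):
    # Ward's hierarchical clustering on the two-dimensional MDS coordinates,
    # cut so that the map is partitioned into n_clusters groups.
    linkage = ward(coords)
    labels = fcluster(linkage, t=n_clusters, criterion="maxclust")
    # Average the 1-to-5 ratings across persons for each item and each cluster.
    item_means = ratings.mean(axis=0)              # ratings: persons x statements
    cluster_means = {int(c): float(item_means[labels == c].mean())
                     for c in np.unique(labels)}
    return labels, item_means, cluster_means

Examining, say, solutions from twenty clusters down to ten and judging each merger or split substantively mirrors the procedure used here to settle on the fifteen-cluster solution.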

The MDS configuration of the ninety-six points was graphed in two dimensions and is shown in Figure 1. This "point
map" displayed the location of all the brainstormed statements with statements closer to each other generally expected to
be more similar in meaning. A "cluster map" was also generated and is shown in Figure 2. It displayed the original ninety-
six points enclosed by boundaries for the fifteen clusters.

The 1-to-5 rating data was averaged across persons for each item and each cluster. This rating information was depicted
graphically in a "point rating map" (Figure 3) showing the original point map with average rating per item displayed as
vertical columns in the third dimension, and in a "cluster rating map" which showed the cluster average rating using the
third dimension. The following materials were prepared for use in the second session:
(1) the list of the brainstormed statements grouped by cluster

(2) the point map showing the MDS placement of the brainstormed statements and their identifying numbers (Figure 1)

(3) the cluster map showing the fifteen cluster solution (Figure 2)

(4) the point rating map showing the MDS placement of the brainstormed statements and their identifying numbers, with
average statement ratings overlaid (Figure 3)

(5) the cluster rating map showing the fifteen cluster solution, with average cluster ratings overlaid
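As one illustration of how such a display could be produced, the following matplotlib sketch draws a point rating map: each statement at its MDS location, with its average rating as a vertical column in the third dimension. It is a hedged example, not a reproduction of the figures prepared for the project:

import numpy as np
import matplotlib.pyplot as plt

def plot_point_rating_map(coords, item_means):
    fig = plt.figure()
    ax = fig.add_subplot(projection="3d")
    # Column base at z = 0; column height equals the statement's average rating.
    ax.bar3d(coords[:, 0], coords[:, 1], np.zeros_like(item_means),
             dx=0.02, dy=0.02, dz=item_means)
    for k, (x, y) in enumerate(coords, start=1):
        ax.text(x, y, item_means[k - 1], str(k), fontsize=6)  # statement number
    ax.set_zlabel("average importance (1 to 5)")
    plt.show()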

Representation Results

The final stress value for the multidimensional scaling analysis was .2980101.

Methods for estimating the reliability of concept maps are described in detail in Trochim (1993). Here, six reliability
coefficients were estimated. The first is analogous to an average item-to-item reliability. The second and third are
analogous to the average item-to-total reliability (correlation between each participant's sort and the total matrix and map
distances respectively). The fourth and fifth are analogous to the traditional split-half reliability. The sixth is the only
reliability that examines the ratings, and is analogous to an inter-rater reliability. All average correlations were corrected
using the Spearman-Brown Prophecy Formula (Weller and Romney, 1988) to yield final reliability estimates. The results
are given in Table 2.
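Two of these estimators might be computed along the following lines; the sketch (illustrative Python, not the original analysis code) vectorizes each participant's binary sort matrix, averages the relevant correlations, and applies the Spearman-Brown correction with N equal to the number of sorters, which is an assumption about how the correction was applied:

import numpy as np
from itertools import combinations

def spearman_brown(r_avg, n):
    # Step an average correlation up to an estimate for the full set of n sorters.
    return (n * r_avg) / (1 + (n - 1) * r_avg)

def sort_reliabilities(individual_matrices):
    # individual_matrices: one binary X_ij matrix per sorter.
    n = len(individual_matrices)
    iu = np.triu_indices_from(individual_matrices[0], k=1)  # off-diagonal cells
    vectors = [m[iu].astype(float) for m in individual_matrices]
    total = np.sum(vectors, axis=0)
    sort_to_sort = np.mean([np.corrcoef(a, b)[0, 1]
                            for a, b in combinations(vectors, 2)])
    sort_to_total = np.mean([np.corrcoef(v, total)[0, 1] for v in vectors])
    return {"sort-to-sort": spearman_brown(sort_to_sort, n),
            "sort-to-total": spearman_brown(sort_to_total, n)}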



Interpretation
The interpretation session convened on Friday morning to interpret the results of the concept mapping analysis. This session followed a structured process
described in detail in Trochim (1989a). The facilitator began the session by giving the participants the listing of clustered statements and reminding them of the
brainstorming, sorting and rating tasks performed the previous evening. The participants were asked to read through the set of statements in each cluster and
generate a short phrase or word to describe or label the set of statements as a cluster. The facilitator led the group in a discussion where they worked cluster-by-
cluster to achieve group consensus on an acceptable label for each cluster. In most cases, when persons suggested labels for a specific cluster, the group readily
came to a consensus. Where the group had difficulty achieving a consensus, the facilitator suggested they use a hybrid name, combining key terms or phrases
from several individuals' labels.

Once the clusters were labeled, the group was given the point map and told that the analysis placed the statements on the map so that statements frequently piled
together are generally closer to each other on the map than statements infrequently piled together. To reinforce the notion that the analysis placed the statements
sensibly, participants were given a few minutes to identify statements close together on the map and examine the contents of those statements. After becoming
familiar with the numbered point map, they were told that the analysis also organized the points (i.e., statements) into groups as shown on the list of clustered
statements they had already labeled.
The cluster map was presented and participants were told that it was simply a visual portrayal of the cluster list. Each participant wrote the cluster labels next to
the appropriate cluster on their cluster map.
Participants then examined the labeled cluster map to see whether it made sense to them. The facilitator reminded participants that in general, clusters closer
together on the map should be conceptually more similar than clusters farther apart and asked them to assess whether this seemed to be true or not. Participants
were asked to think of a geographic map, and "take a trip" across the map reading each cluster in turn to see whether or not the visual structure seemed sensible.
They were then asked to identify any interpretable groups of clusters or "regions." These were discussed and partitions drawn on the map to indicate the different
regions. Just as in labeling the clusters, the group then arrived at a consensus label for each of the identified regions. Five regions were identified and are shown
in capital letters. No boundaries were drawn to distinguish these five regions.
The facilitator noted that all of the material presented to this point used only the sorting data. The results of the rating task were then presented through the point
rating (Figure 3) and cluster rating (Figure 5) maps. It was explained that the height of a point or cluster represented the average importance rating for that
statement or cluster of statements. Again, participants were encouraged to examine these maps to determine whether they made intuitive sense and to discuss
what the maps might imply about the ideas that underlie their conceptualization.
Table 3 shows the complete cluster listing with the cluster labels the participants assigned and the average importance rating for each statement and cluster.

Discussion of Skills versus Values

The pattern of ratings on the map suggested that participants attached more importance to the clusters that had "value" statements than to those made up of skills.
This can perhaps be seen most clearly in Table 4 which shows the ninety-six competency statements sorted from highest to lowest average importance rating. It is
clear from the table that the statements near the top of the table tend to be more general in nature and more related to values while the statements near the bottom
of the table tend to be more specific, operationalized, skill or knowledge-based ones. Some of the participants felt that the value statements can't be considered
competencies per se because they are not sufficiently operationalized. Others felt that the value statements have actually been holding IAPSRS back in their
development of competencies because they place too much importance on these generic values and not on a more specific skill base. Still others felt that the
value statements are at the heart of what PSR represents and that they can and should be operationalized as competencies. The facilitator characterized the
discussion as a choice between two alternatives:

A) Pull the value statements out of the competencies, perhaps putting them in a section up front describing the kinds of values and characteristics expected of
psychosocial rehabilitation workers.

B) Operationalize the value statements so they can be included as formal competencies.

The consensus of the group was that option B was preferable. As a result, the group decided that a major portion of the afternoon utilization session would
involve taking the value-oriented clusters (Clusters 1-5) and attempting to draft operationalized competency statements for the statements in these clusters.

Discussion of What was Missing on the Map

The group also discussed what concepts seemed to be missing (primarily at the cluster level) from the map. The following potentially missing labels were
generated:

1 Advocacy

2 Systems Change

3 Vocational-Employment

4 Spiritual

5 Housing

6 Education

7 Health

8 Social/Recreational

9 Outcome Evaluation

10 Client Budgeting/Finances

11 Program Management

12 Health and Safety


The group then discussed whether the eventual competencies should have subject-specific categories (such as housing, education, employment) or whether
competencies related to such areas should be spread across the types of headings already on the map (for instance, consumer outcomes related to employment).
The consensus of the group was that the competencies should not be grouped by subject.



Utilization
The utilization step took place on Friday afternoon from 2-5pm. The following schedule was explained to the participants when they returned from lunch.

Time      Activity                                                      Facilitator

2-3       Review progress and where we stand                            BT
          Review and feedback on the map's clusters and regions         BT
          Discuss the competency documents                              JC
          Present the two small group tasks and have participants
          select their group/task                                       JC
3-4       Small group sessions
4-4:50    Presentation of results of small groups
          Summary of map revisions                                      BT
          Summary of operationalizing of the five clusters              Group Leaders
4:50-5    Discussion of next steps and wrap-up                          Anita Pernell-Arnold

Review and Feedback on the Map's Clusters and Regions

The first part of the utilization discussion involved suggestions from participants regarding changes that could be made to the final map in order to make it more
interpretable, cohesive and usable. The discussion which took place raised the following points.

Reactions to the Five Regions

1 Doesn't matter which five labels we use.

2 Change the name "Techniques."

3 What is the meaning of "consumer" (consumer involvement issues).

4 "Practitioner" is very broad.

5 Change titles by adding "competencies" to the labels.

6 Some consumer competencies are knowledge-based, others are techniques, others are system issues.

7 View (regions) as "key ingredients."

Reactions to Clusters

1 Some people formed categories according to the specific words in the statement titles (e.g., "ability to..." or "knowledge of..."). Was this wise?

2 Family relationships is lacking key intervention skills--want to add more?

3 Reconsider the two consumer clusters -- are labels OK?

4 Take another look at Friday and McPheeters' broad classification -- is it better than ours? (Some said they lose the values; do they exclude the consumers?).

5 Rename cluster 9 (Assessment) or think of dividing it up.

6 Revisit the cluster name "Personality Characteristics."

7 Consider combining "Interpersonal Social Skills" and "Supportive Behaviors."

Discussion of Other Competency Documents

The group then discussed the four competency statement documents (Curtis, 1993; Friday and McPheeters, 1985; Jonikas, 1993; IAPSRS Ontario Chapter, 1992) that
they skimmed over lunch and compared these to the map. The following comments were made:
1. The current group has defined an impressive set of competencies. We need to be clear that we shouldn't come up with competencies that are unrealistic or over-skilled, and that the set should characterize a broad range of competencies.

2. Curtis (1993) was not intended to specify competencies limited to PSR.

3. Curtis (1993) is good in its specificity.

4. Jonikas (1993) document has a totality that will be useful in deciding what to put where.

5. Eighty percent of all documents (including the concept map) were similar.

6. Friday and McPheeters (1985) shows earlier development of the field.

7. There is more in the literature of competencies than we thought.

8. Competencies related to knowledge of principles may not capture the centrality of safety, spirituality, work, decent place to live, social life, education, and physical
health in PSR. Don't want to lose the essentials. Also want to emphasize high quality outcomes in these areas.

9. IAPSRS Ontario Chapter (1992) is impressive in its succinctness and specificity. Could help guide us in our document. Action verbs were good in this document.

10.Curtis (1993) document emphasizes the importance of creation of environments, social situations. Not just changing the individual, but creating contexts. Good use
of respect as a concept/process.

Small Group Sessions

In the middle of the afternoon utilization step, the participants were divided into small groups in order to accomplish some more detailed work. Five groups of 2-3
participants each took one of the first five clusters and attempted to operationalize the statements in the cluster into ones that better approximated competency
statements. One small group of six participants discussed and made slight revisions to the final concept map. The results of these two types of small group exercises are
described in separate sections below.

Small Group Operationalizations of Five Clusters

Based on the interpretation discussion in the morning session, it was clear that the participants thought that many of the statements in the first five clusters were better
described as "values" than as operationalized competency statements. The group thought that these value statements could be operationalized and that this would be a
central task for IAPSRS to accomplish as it developed competencies. The central utilization task of the afternoon therefore was to have small groups of participants,
each assigned one of the first five clusters, take the statements in the clusters and develop draft operational competency statements. The summaries of these discussions
(taken from the newsprint sheets used at the presentation of the results) are reproduced below.

Cluster 1: Interpersonal Skills

This group took each statement in the cluster and generated several more operationalized statements. Where appropriate, they chose statements from several of the other
competency documents and these are cited. This listing shows each brainstormed statement in Cluster 1 and the draft competency statements that the small group
generated.

1. ability to listen to consumers

• not interrupt the consumer

• able to repeat back what was said with the consumer affirming the correctness

• not imposing your agenda on them

10. ability to motivate clients to change behavior

• to be able to identify reasons for changing the behavior

• to be able to help them identify consequences

• willingness to serve as role model for desired change

• willingness to reinforce behavior that has been changed


36. ability to use the helping relationship to facilitate change

• use one's own experiences to encourage and guide the consumer

• ability to demonstrate approval and pride in their accomplishments

87. ability to interact and provide support in a non-judgmental fashion

• do not demean or patronize consumers

• give feedback on behavior and not the person (Friday and McPheeters, 1985)

• use language and behavior which reflects and perpetuates the dignity of the individual (Curtis, 1993)

5. ability to offer hope to others

• truly believe that there is hope and verbalize it to the consumer

• share examples of change that was possible in a seemingly hopeless situation

• have a healthy sense of humor and minimize the adversity (Friday and McPheeters, 1985)

• focus on consumer successes and help consumer see their own personal growth

6. belief in the recovery process

• the worker has to demonstrate that he/she believes in the recovery process

• to express the belief to the consumers that it's possible for them to live productive, satisfying lives in the community (Jonikas, 1993)

• help the consumer believe in his/her inherent capacity to improve or grow, given the opportunity and resources, as it's true for all persons (adapted from Jonikas, 1993)

39. ability to build on successes and minimize failures

• point out and celebrate their successes

• help them to see their failure as a learning experience

• supporting risk-taking behaviors to move one step beyond

• ability to have the consumer feel good and acknowledge own success no matter how small (adapted from Friday and McPheeters, 1985)

31. connecting (interpersonal) skills

• demonstrate behaviors that accept the consumer where he/she is at

• ability to establish a caring but not a consuming or possessive relationship

• demonstrate behaviors that show interest in the consumer and his/her interpretation of needs

78. ability to work with consumer colleagues

• to show sensitivity to the difficulties that they may encounter in their dual role

• avoid labeling persons (either consumers or consumer colleagues) with stereotypes or derogatory terms (Friday and McPheeters, 1985)

• be straight with consumer colleagues

• have the same expectations as you do for all other colleagues


89. ability to normalize interactions and program practices

• ability to generalize program experiences to activities in the broader community

• have expectations within the program that are consistent with community expectations (with leeway in terms of enforcement)

• set reasonable limits on bizarre behavior with explanations as to why you are doing it

Cluster 2: Supportive Behaviors

This group generated the following draft competency statements to cover the material listed in Cluster 2.

• ability to maintain ongoing productive relationship based on client satisfaction
• demonstrate high level of interaction (i.e., amount of time, interests, excitement, energy level)
• communicates belief in growth potential
• communicates understanding of thoughts/feelings of others in a non-judgmental manner
• demonstrates holistic understanding of the individual
• able to focus on the consumer's here and now needs/desires (there was some disagreement on the wording of this one)
• ability to respond in a normalizing manner to the individual's diverse needs and strengths

The following were suggestions from the group about what statements might be "borrowed" from existing lists:

from Curtis (1993):

4. Demonstrates basic communication and support skills
A1. Exhibits supportive interpersonal skills (i.e., ...)
A2. Establishes and maintains productive relationships with service recipients
• All of 4A -- some areas to "negotiate"
1. especially A and B (language, behavior and holistic understanding)

from Friday and McPheeters (1985):

• III. Interpersonal - especially 2, 4, 6, 7, 8

The group also listed some ways to measure competencies in this area:

• amount of time spent with client
• client satisfaction with the relationship (amount of support perceived)
• peer feedback/input
• share and use own life experience
• reciprocity of relationship
• genuineness

Cluster 3: Professional Role

For each statement in Cluster 3, the group generated one or two potential competency statements.

14. ability to negotiate
• to demonstrate communication skills between stakeholders for the purpose of goal attainment which is satisfactory to all parties

58. ability to set limits
• to identify personal skills and resources, and expectations held by stakeholders in order to achieve realistic/attainable goals

17. willingness to have fun
• to actively participate in "activities"

82. ability to use self as a role model
• to mutually share experiences and ideas
• to achieve goals through partnership

47. ability to ask for help and receive constructive feedback from colleagues and consumers

51. ability to let go
• to assist consumers to identify their skills/resources and promote a belief in efficacy of their skills in order for consumers to take charge

88. ability to overcome personal prejudices when providing services
• to identify personal values/beliefs and evaluate their potential impact on all interactions

Cluster 4: Personality Characteristics

For each statement in Cluster 4, the group generated one or two potential competency statements.

16. self awareness
• be able to describe and explain one's own actions

56. good personal stability but not ego-centric
• respond consistently and congruently to social and environmental demands

50. ability to handle personal stress
• separate personal needs and behaviors from job performance needs and behaviors

18. flexibility
• be able to change behaviors when situations, expectations and requirements are different

25. patience
• to calmly wait until the objective is reached

28. sense of humor
• to laugh at what is funny, to laugh at oneself, and to laugh with others

93. ability to know own limits
• to be able to stop when necessary; to be able to ask for help; to be able to ask for information

Cluster 5: Self Management

24. ability to read and write

1. person must meet high school equivalency level of reading and writing

2. must include accommodations for disabilities like blindness

3. ability to write in behavioral language

4. ability to write with clarity

5. reading comprehension skills must include ability to look up words in the dictionary, comprehend language(s) used in service settings

29. ability to partialize tasks

41. ability to handle multiple tasks

69. ability to prioritize and manage time

• recognition of total number of tasks inherent in responsibilities
• identify critical tasks by applying an agreed-upon standard for what is most important
• ability to gauge the level of effort and amount of time necessary to complete discrete tasks
• ability to use organizational tools (calendars, to-do lists, tickler file) to keep track of tasks
• ability to engage consumers in assisting with provider's task and time management
• ability to recognize and deal effectively with personal stress resulting from multiple tasks

33. tolerance for ambiguity and enjoying diversity

Tolerating Ambiguity

1. Ability to problem-solve ambiguous situations through involvement of others in identification of problem, generation of a number of potential solutions, evaluating
candidate solutions, seeking staff/consumer/family/network feedback re: viability of solutions, selection of solutions, implementation and evaluation of solutions.

2. Ability to recognize and accept unresolvable ambiguities through letting-go, acceptance, humor and other strategies.

3. Ability to distinguish between truly ambiguous situations and situations based on lack of: info, training, feedback from others. Also, ability to address lacking areas
by obtaining info, furthering education/training, seeking feedback.

Enjoying Diversity

1. Ability to identify the opportunities presented by diversity and to incorporate them positively into the rehabilitation process through providing alternatives for
behavior, problem solution, identification of opportunities.

91. willingness to take risks

1. demonstration of creative approaches

2. allowing/assisting consumers to exercise options not endorsed by practitioner, after applying standards of reasonable judgment (safety, etc.)

3. demonstration of willingness to try new or untested approaches and interventions

45. ability to be pragmatic and do hands-on sorts of work

1. Recognition that PSR rehabilitation involves the doing of hands-on tasks for role modeling, relationship building, etc.
2. Willingness to accept and perform well on hands-on, practical tasks.

3. Ability to develop and implement rehabilitation situations in which behavior or doing leads to insight rather than vice versa.

94. never-ending willingness to develop oneself

1. NOTE: The group suggested that this item be moved to the Professional Development cluster. This suggestion was adopted.

2. Development of one's personal growth through hobbies, therapy, education, and to share that growth with consumers/peers for role modeling and motivation.

3. Willingness to seek help appropriately with one's own problems.

Small Group Map Revision

The small group that considered the revisions to the map began by working with the suggestions generated earlier by the entire participant group. The following shows
these suggestions along with the actions taken, if any, by the small group:

Large Group Suggestions and Small Group Actions

Reactions to the Five Regions

1. Doesn't matter which five labels we use.
   Action: Two changes were made to the original five labels. The label "Techniques" was changed to "Rehabilitation Methodology Competencies" and the original label "Consumer" was changed to "Consumer-Centered Competencies". In addition, all five labels had the term "Competencies" appended to the end.
2. Change the name "Techniques."
   Action: The label "Techniques" was changed to "Rehabilitation Methodology Competencies".
3. What is the meaning of "consumer" (consumer involvement issues).
   Action: The original label "Consumer" was changed to "Consumer-Centered Competencies".
4. "Practitioner" is very broad.
   Action: The group decided that the term "Practitioner" would be left as is because it was an appropriately broad label for a region name.
5. Change titles by adding "competencies" to the labels.
   Action: This was done for all region and cluster labels.
6. Some consumer competencies are knowledge-based, others are techniques, others are system issues.
   Action: The small group agreed but made no changes to the map in response to this.
7. View (regions) as "key ingredients."
   Action: The small group agreed but made no changes to the map in response to this.

Reactions to Clusters

1. People did some categories according to the specific words in titles (e.g., "ability to...", or "knowledge of..."). Was this wise?
   Action: The small group agreed but made no changes to the map in response to this.
2. Family relationships is lacking key intervention skills -- want to add more?
   Action: The cluster label "Family Relationships" was changed to "Family-Focused." No intervention items were added.
3. Reconsider the two consumer clusters -- are labels OK?
   Action: Changed the original cluster label "Consumer Goal Attainment" to "Consumer Outcome Competencies."
4. Take another look at Friday and McPheeters broad classification -- better than ours? (Some said they lose the values; do they exclude the consumers?)
   Action: The small group felt that there was considerable cross-classifiability across the different competency documents and the map. No changes were made to the map in response to this.
5. Rename cluster 9 (Assessment) or think of dividing it up.
   Action: The group retained the name for the cluster, only changing it to "Assessment Competencies." See the table below for specific statements moved into and out of this cluster.
6. Revisit the cluster name "Personality Characteristics."
   Action: The group changed the original cluster label "Personality Characteristics" to "Intrapersonal Competencies."
7. Consider combining "Interpersonal Social Skills" and "Supportive Behaviors."
   Action: These clusters (original clusters 1 and 2) were combined into one cluster labeled "Interpersonal Competencies."

Additional changes made by the small group:
• The original cluster label "Cultural Competence" was changed to "Multicultural Competencies."
• The positions of the original clusters "Family Relationships" and "Mental Health Knowledge Base" were switched on the map.

In addition to the above changes, several specific statements were shifted from one cluster to another. These changes are shown in Figure 6 and listed in the table below:
• 43. knowledge of a wide variety of approaches to mental health services: moved from Family Relationships to Mental Health Knowledge Base Competencies
• 40. ability to establish alliances with providers, professionals, families, consumers (partnership model): moved from Family Relationships to Community Resources Competencies
• 12. skills in advocacy: moved from Assessment to Community Resources Competencies
• 15. strong crisis intervention skills: moved from Assessment to Intervention Skills Competencies
• 85. early identification and intervention skills to deal with relapse: moved from Assessment to Intervention Skills Competencies
• 94. never-ending willingness to develop oneself: moved from Personality Characteristics to Professional Development Competencies
• 53. ability to assess behavior in specific environments: moved from Intervention Skills to Assessment Competencies
• 55. functional assessment: moved from Intervention Skills to Assessment Competencies
• 64. ability to assess active addiction and co-dependency: moved from Intervention Skills to Assessment Competencies

In all of the nine statement shifts described above, the shift was from one cluster into an adjacent one on the map. The revised cluster listing showing the new cluster
labels and the average importance ratings is given in Table 5.
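
Because several statements changed clusters, the averages in Table 5 are simply recomputed over each cluster's revised membership. A minimal sketch of that arithmetic in Python, using made-up ratings (the statement numbers follow the table above, but the values are illustrative, not the project's data):

```python
# Hypothetical importance ratings (1 = low, 5 = high) already averaged across raters,
# keyed by statement number. Values are illustrative only.
statement_rating = {43: 4.2, 40: 3.9, 12: 4.5, 15: 4.8, 85: 4.4, 53: 4.1, 55: 4.0, 64: 3.7}

# Revised cluster membership after the shifts listed in the table above (subset only).
cluster_members = {
    "Mental Health Knowledge Base Competencies": [43],
    "Community Resources Competencies": [40, 12],
    "Intervention Skills Competencies": [15, 85],
    "Assessment Competencies": [53, 55, 64],
}

# A cluster's average importance is just the mean rating of the statements it now contains.
for cluster, members in cluster_members.items():
    average = sum(statement_rating[s] for s in members) / len(members)
    print(f"{cluster}: {average:.2f}")
```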

The small group also drew explicit lines dividing the five regions. These are shown in Figure 7. They felt that several of the clusters actually overlapped multiple regions and, consequently, the region lines cut through a cluster shape rather than only going between clusters. For instance, they felt that the cluster "Interpersonal Competencies" should fall simultaneously and partially into the three regions of "Consumer-Centered Competencies", "Practitioner Competencies" and "Rehabilitation Methodology Competencies." Similarly, they felt that the cluster "Professional Development Competencies" should fall into both the "Practitioner Competencies" and "Knowledge Base Competencies" regions. The regional lines were drawn on the final map to show these multi-regional clusters.

Figure 8 constitutes the final map for this project. It shows the clusters and regions and includes the average importance ratings for each cluster. There was considerable
consensus across the participant group that it was a good and fair representation of their ideas regarding competencies for psychosocial rehabilitation workers.
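
The map itself comes from the analysis sequence described in the concept mapping literature cited in this report (e.g., Trochim, 1989a): each participant's sort is aggregated into a statement-by-statement co-occurrence matrix, the matrix is scaled into two dimensions with multidimensional scaling, and the resulting coordinates are grouped with hierarchical cluster analysis. A minimal sketch of that sequence, using made-up sort data and assuming numpy, scipy, and scikit-learn are available (an illustration of the general technique, not the Concept System's actual code):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.manifold import MDS

# Made-up sort data: each participant grouped six statements (numbered 0-5) into piles.
sorts = [
    [{0, 1, 2}, {3, 4}, {5}],
    [{0, 1}, {2, 3, 4}, {5}],
    [{0, 1, 2}, {3, 4, 5}],
]
n_statements = 6

# Co-occurrence matrix: how many participants placed each pair of statements together.
co = np.zeros((n_statements, n_statements))
for piles in sorts:
    for pile in piles:
        for i in pile:
            for j in pile:
                co[i, j] += 1

# Convert similarity to dissimilarity and scale to two dimensions (the point map).
dissimilarity = len(sorts) - co
coords = MDS(n_components=2, dissimilarity="precomputed", random_state=0).fit_transform(dissimilarity)

# Hierarchical cluster analysis of the MDS coordinates yields the cluster map.
tree = linkage(coords, method="ward")
clusters = fcluster(tree, t=3, criterion="maxclust")
print(clusters)  # cluster number assigned to each statement
```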

Next Steps

The final discussion of the project involved consideration of the next steps in the competency development process. The following points were made:

1. Print up list of competencies and survey PSR workers.

2. Review and comment on Trochim concept mapping report.

3. Circulate regions, clusters and individual competencies to various constituencies: consumers, families, PSR workers, other stakeholders.

4. Further operationalize remaining competencies.

5. Distinguish between entry-level and second-level competencies.

6. Edit and make language consistent on materials sent out for review.

7. Clarify the intent of the present process re: the use to which the final product will be put.

8. Inform a wide range of stakeholders of IAPSRS's intentions in this area.

9. Bring in an expert in credentialing to clarify legal risks, probable results, etc.

10. Involve Training and Certification Committee in this process.

11. Don't send document for review prematurely. Use simple format that helps potential reviewers. Perhaps include a glossary to aid potential reviewers.

12. Be aware of other lists of competencies so review process doesn't become confused.

13. Include feedback from IAPSRS chapter presidents.

14. Certification conference.


15. Further literature review.

16. Hire someone to draft standards from competencies.

17. Develop an ethics statement based on already-held ethics forum.

18. Requirements of an "arm's length" certification organization.

19. Need to consider the voluntary nature of CARF accreditation for organizations parallel to possible implementation of standards for practitioners.

20. Conduct a cost/benefit analysis of certification.

Back to Contents

Copyright © 1996, William M.K. Trochim


References
Bragg, L.R. and Grayson, T.E. (1993). Reaching consensus on outcomes: Lessons learned about concept mapping. Paper
presented at the Annual Conference of the American Evaluation Association, Dallas, TX.

Caracelli, V. (1989). Structured conceptualization: A framework for interpreting evaluation results. Evaluation and
Program Planning. 12, 1, 45-52.

Cook, J. (1992). Modeling staff perceptions of a mobile job support program for persons with severe mental illness. Paper
presented at the Annual Conference of the American Evaluation Association, Seattle, WA.

Cooksy, L. (1989). In the eye of the beholder: Relational and hierarchical structures in conceptualization. Evaluation and
Program Planning. 12, 1, 59-66.

Curtis, L. (1993). Workforce competencies for direct service staff to support adults with psychiatric disabilities in
community mental health services. The Center for Community Change through Housing and Support, Burlington, VT.

Davis, J. (1989). Construct validity in measurement: A pattern matching approach. Evaluation and Program Planning. 12,
1, 31-36.

Davison, M.L. (1983). Multidimensional scaling. New York, John Wiley and Sons.

Dumont, J. (1989). Validity of multidimensional scaling in the context of structured conceptualization. Evaluation and
Program Planning. 12, 1, 81-86.

Everitt, B. (1980). Cluster Analysis. 2nd Edition, New York, NY: Halsted Press, A Division of John Wiley and Sons.

Friday, J.C. and McPheeters, H.L. (1985). Assessing and improving the performance of psychosocial rehabilitation staff.
Southern Regional Education Board, Atlanta, GA.

Galvin, P.F. (1989). Concept mapping for planning and evaluation of a Big Brother/Big Sister program. Evaluation and
Program Planning. 12, 1, 53-58.

Grayson, T.E. (1992). Practical issues in implementing and utilizing concept mapping. Paper presented at the Annual
Conference of the American Evaluation Association, Seattle, WA.

Grayson, T.E. (1993). Empowering key stakeholders in the strategic planning and development of an alternative school
program for youth at risk of school behavior. Paper presented at the Annual Conference of the American Evaluation
Association, Dallas, TX.

Gurowitz, W.D., Trochim, W. and Kramer, H. (1988). A process for planning. The Journal of the National Association of
Student Personnel Administrators, 25, 4, 226-235.
International Association of Psychosocial Rehabilitation Services, Ontario Chapter. (1992). Competencies for Post-
Diploma Certificate Programs in Psychosocial Rehabilitation, Ontario, Canada.

Jonikas, J.A. (1993). Staff competencies for service-delivery staff in psychosocial rehabilitation programs. Thresholds
National Research and Training Center on Rehabilitation and Mental Illness, Chicago, IL.

Kane, T.J. (1992). Using concept mapping to identify provider and consumer issues regarding housing for persons with
severe mental illness. Paper presented at the Annual Conference of the American Evaluation Association, Seattle, WA.

Keith, D. (1989). Refining concept maps: Methodological issues and an example. Evaluation and Program Planning. 12,
1, 75-80.

Kohler, P.D. (1992). Services to students with disabilities in postsecondary education settings: Identifying program
outcomes. Paper presented at the Annual Conference of the American Evaluation Association, Seattle, WA.

Kohler, P.D. (1993). Serving students with disabilities in postsecondary education settings: Using program outcomes for
planning, evaluation and empowerment. Paper presented at the Annual Conference of the American Evaluation
Association, Dallas, TX.

Kruskal, J.B. and Wish, M. (1978). Multidimensional Scaling. Beverly Hills, CA: Sage Publications.

Lassegard, E. (1992). Assessing the reliability of the concept mapping process. Paper presented at the Annual Conference
of the American Evaluation Association, Seattle, WA.

Lassegard, E. (1993). Conceptualization of consumer needs for mental health services. Paper presented at the Annual
Conference of the American Evaluation Association, Dallas, TX.

Linton, R. (1989). Conceptualizing feminism: Clarifying social science concepts. Evaluation and Program Planning. 12, 1,
25-30.

Mannes, M. (1989). Using concept mapping for planning the implementation of a social technology. Evaluation and
Program Planning. 12, 1, 67-74.

Marquart, J.M. (1988). A pattern matching approach to link program theory and evaluation data: The case of employer-
sponsored child care. Unpublished doctoral dissertation, Cornell University, Ithaca, New York.

Marquart, J.M. (1989). A pattern matching approach to assess the construct validity of an evaluation instrument.
Evaluation and Program Planning. 12, 1, 37-44.

Marquart, J.M. (1992). Developing quality in mental health services: Perspectives of administrators, clinicians, and
consumers. Paper presented at the Annual Conference of the American Evaluation Association, Seattle, WA.

Marquart, J.M., Pollak, L. and Bickman, L. (1993). Quality in intake assessment and case management: Perspectives of
administrators, clinicians and consumers. In R. Friedman et al. (Eds.), A system of care for children's mental health:
Organizing the research base. Tampa: Florida Mental Health Institute, University of South Florida.
McNemar, Q. (1955). Psychological Statistics (2nd edition). New York, NY: Wiley.

Mead, J.P. and Bowers, T.J. (1992). Using concept mapping in formative evaluations. Paper presented at the Annual
Conference of the American Evaluation Association, Seattle, WA.

Mercer, M.L. (1992). Brainstorming issues in the concept mapping process. Paper presented at the Annual Conference of
the American Evaluation Association, Seattle, WA.

Nunnally, J.C. (1978). Psychometric Theory. (2nd. Ed.). New York, McGraw Hill.

Osborn, A.F. (1948). Your Creative Power. New York, NY: Charles Scribner.

Penney, N.E. (1992). Mapping the conceptual domain of provider and consumer expectations of inpatient mental health
treatment: New York Results. Paper presented at the Annual Conference of the American Evaluation Association, Seattle,
WA.

Romney, A.K., Weller, S.C. and Batchelder, W.H. (1986). Culture as consensus: A theory of culture and informant
accuracy. American Anthropologist, 88, 2, 313-338.

Rosenberg, S. and Kim, M.P. (1975). The method of sorting as a data gathering procedure in multivariate research.
Multivariate Behavioral Research, 10, 489-502.

Ryan, L. and Pursley, L. (1992). Using concept mapping to compare organizational visions of multiple stakeholders.
Paper presented at the Annual Conference of the American Evaluation Association, Seattle, WA.

SenGupta, S. (1993). A mixed-method design for practical purposes: Combination of questionnaire(s), interviews, and
concept mapping. Paper presented at the Annual Conference of the American Evaluation Association, Dallas, TX.

Shern, D.L. (1992). Documenting the adaptation of rehabilitation technology to a core urban, homeless population with
psychiatric disabilities: A concept mapping approach. Paper presented at the Annual Conference of the American
Evaluation Association, Seattle, WA.

Shern, D.L., Trochim, W. and LaComb, C.A. (1993). The use of concept mapping for assessing fidelity of model transfer:
An example from psychiatric rehabilitation. Unpublished manuscript. New York State Office of Mental Health, Albany,
NY.

Siegel, S. (1956). Nonparametric Statistics for the Behavioral Sciences. New York, NY: McGraw-Hill.

Trochim, W. (1985). Pattern matching, validity, and conceptualization in program evaluation. Evaluation Review, 9, 5,
575-604.

Trochim, W. (1989a). An introduction to concept mapping for planning and evaluation. Evaluation and Program Planning,
12, 1, 1-16.

Trochim, W. (1989b). Concept mapping: Soft science or hard art? Evaluation and Program Planning, 12, 1, 87-110.

Trochim, W. (1989c). Outcome pattern matching and program theory. Evaluation and Program Planning, 12, 4, 355-366.

Trochim, W. (1990). Pattern matching and program theory. In H.C. Chen (Ed.), Theory-Driven Evaluation. New
Directions for Program Evaluation, San Francisco, CA: Jossey-Bass.

Trochim, W. (1993). The reliability of concept mapping. Paper presented at the Annual Conference of the American
Evaluation Association, Dallas, Texas, November 6, 1993.

Trochim, W. and Cook, J. (1992). Pattern matching in theory-driven evaluation: A field example from psychiatric
rehabilitation. in H. Chen and P.H. Rossi (Eds.) Using Theory to Improve Program and Policy Evaluations. Greenwood
Press, New York, 49-69.

Trochim, W. and Linton, R. (1986). Conceptualization for evaluation and planning. Evaluation and Program Planning, 9,
289-308.

Trochim, W., Cook, J. and Setze, R. (in press). Using concept mapping to develop a conceptual framework of staff's
views of a supported employment program for persons with severe mental illness. Consulting and Clinical Psychology.

Valentine, K. (1989). Contributions to the theory of care. Evaluation and Program Planning. 12, 1, 17-24.

Valentine, K. (1992). Mapping the conceptual domain of provider and consumer expectations of inpatient mental health
treatment: Wisconsin results. Paper presented at the Annual Conference of the American Evaluation Association, Seattle,
WA.

Weller, S.C. and Romney, A.K. (1988). Systematic Data Collection. Newbury Park, CA: Sage Publications.

White House Domestic Policy Council. (1993). Health Security: The President's Report to the American People. Simon
and Schuster: New York.

Back to Contents
Copyright © 1996, William M.K. Trochim
Concept Mapping Resource Guide
This page is the central resource guide for learning about structured conceptual
mapping. It includes links to general introductory materials, research and case studies
illustrating the use of the method, and comprehensive information about the Concept
System software including how to obtain and license it.

INTRODUCTORY MATERIALS

General Reading. A good place to start learning about concept mapping is by doing some reading of the primary background articles. Here are some of the "classic" articles that are a good starting point:

• Trochim, W. (1989). An introduction to concept mapping for planning and evaluation. In W. Trochim (Ed.) A Special Issue of Evaluation and Program Planning, 12, 1-16.
• Trochim, W. (1989). Concept mapping: Soft science or hard art? In W. Trochim (Ed.) A Special Issue of Evaluation and Program Planning, 12, 87-110.
• Trochim, W. Reliability of Concept Mapping. Paper presented at the Annual Conference of the American Evaluation Association, Dallas, Texas, November, 1993.

Web Page Introductions. There are also several good introductory web pages available for the beginner:

• What is concept mapping?
• What is pattern matching?
• Concept Mapping from Trochim, W. (1999), The Research Methods Knowledge Base.

There are two major websites that act as major resources on concept mapping:

• Concept Systems Incorporated. This is, of course, the primary website and the only one from which you can download the software and obtain information about licensing and training. There are numerous papers, case studies, and a comprehensive Knowledge Base that covers both the group process and the software. The site emphasizes applications and examples from corporate and for-profit contexts.
• Center for Social Research Methods. A good general resource in applied social research and evaluation, the site includes numerous concept mapping papers and projects, primarily emphasizing the more scholarly basis of concept mapping and its application in basic research, public and not-for-profit contexts.

Presentations. You can view the Concept System Introductory Tour. The tour provides an overview of the Concept System software and briefly describes the steps in the process. Also available: Introduction to Concept Mapping, a presentation in San Diego, January 13, 2005.

RESEARCH

Web Papers. Many of the major papers are available on the web. The major references for concept mapping and pattern matching by the presenting author (with web location, where appropriate) are:

• Trochim, W., Milstein, B., Wood, B., Jackson, S. and Pressler, V. (2004). Setting Objectives for Community and Systems Change: An Application of Concept Mapping for Planning a Statewide Health Improvement Initiative. Health Promotion Practice, 5, 1, 8-19.
• Trochim, W., Stillman, F., Clark, P., and Schmitt, C. (2003). Development of a Model of the Tobacco Industry's Interference with Tobacco Control Programs. Tobacco Control, 12, 140-147.
• Jackson, K. and Trochim, W. (2002). Concept mapping as an alternative approach for the analysis of open-ended survey responses. Organizational Research Methods, 5, 4, 307-336.
• McLinden, D.J. & Trochim, W.M.K. (1998). From Puzzles to Problems: Assessing the Impact of Education in a Business Context with Concept Mapping and Pattern Matching. In J. Phillips (Ed.), Evaluation Systems and Processes. Alexandria, VA: American Society for Training and Development.
• McLinden, D.J. & Trochim, W.M.K. (1998). Getting to parallel: Assessing the return on expectations of training. Performance Improvement, 37, 8, 21-26.
• Witkin, B. and Trochim, W. (1997). Toward a synthesis of listening constructs: a concept map analysis. International Journal of Listening, 11, 69-87.
• Kolb, D. & Shepherd, D. (1997). Concept Mapping Organizational Cultures. Journal of Management Inquiry, 6, 4, 282-295.
• Michalski, G. (1997). Stakeholder Variation in Perceptions About Training Program Results and Evaluation: A Concept Mapping Investigation. Paper presented at the Annual Conference of the American Evaluation Association, San Diego, November, 1997.
• SenGupta, S. (1996). Concept Mapping and Pattern Matching in Integrated Mental Health Service Delivery. Paper presented at the Annual Conference of the American Evaluation Association, Atlanta, Georgia, November, 1996.
• Shern, D.L., Trochim, W. and LaComb, C.A. (1995). The use of concept mapping for assessing fidelity of model transfer: An example from psychiatric rehabilitation. Evaluation and Program Planning, 18, 2.
• Trochim, W. (1996). Criteria for evaluating graduate programs in evaluation. Evaluation News and Comment: The Magazine of the Australasian Evaluation Society, 5, 2, 54-57.
• Trochim, W., Cook, J. and Setze, R. (1994). Using concept mapping to develop a conceptual framework of staff's views of a supported employment program for persons with severe mental illness. Consulting and Clinical Psychology, 62, 4, 766-775.
• Trochim, W. Reliability of Concept Mapping. Paper presented at the Annual Conference of the American Evaluation Association, Dallas, Texas, November, 1993.
• Trochim, W. and Cook, J. (1992). Pattern matching in theory-driven evaluation: A field example from psychiatric rehabilitation. In H. Chen and P.H. Rossi (Eds.) Using Theory to Improve Program and Policy Evaluations. Greenwood Press, New York, 49-69.
• Trochim, W. (Ed.) (1989). A Special Issue of Evaluation and Program Planning on Concept Mapping for Planning and Evaluation, 12. This is a full volume devoted to concept mapping. It includes the following articles:
  - Trochim, W. (1989). An introduction to concept mapping for planning and evaluation. Evaluation and Program Planning, 12, 1-16.
  - Caracelli, V. (1989). Structured conceptualization: A framework for interpreting evaluation results. Evaluation and Program Planning, 12, 1, 45-52.
  - Cooksy, L. (1989). In the eye of the beholder: Relational and hierarchical structures in conceptualization. Evaluation and Program Planning, 12, 1, 59-66.
  - Davis, J. (1989). Construct validity in measurement: A pattern matching approach. Evaluation and Program Planning, 12, 1, 31-36.
  - Dumont, J. (1989). Validity of multidimensional scaling in the context of structured conceptualization. Evaluation and Program Planning, 12, 1, 81-86.
  - Galvin, P.F. (1989). Concept mapping for planning and evaluation of a Big Brother/Big Sister program. Evaluation and Program Planning, 12, 1, 53-58.
  - Keith, D. (1989). Refining concept maps: Methodological issues and an example. Evaluation and Program Planning, 12, 1, 75-80.
  - Linton, R. (1989). Conceptualizing feminism: Clarifying social science concepts. Evaluation and Program Planning, 12, 1, 25-30.
  - Mannes, M. (1989). Using concept mapping for planning the implementation of a social technology. Evaluation and Program Planning, 12, 1, 67-74.
  - Marquart, J.M. (1989). A pattern matching approach to assess the construct validity of an evaluation instrument. Evaluation and Program Planning, 12, 1, 37-44.
  - Trochim, W. (1989). Concept mapping: Soft science or hard art? Evaluation and Program Planning, 12, 87-110.
• Trochim, W. (1989). Outcome pattern matching and program theory. Evaluation and Program Planning, 12, 4, 355-366.
• Gurowitz, W.D., Trochim, W. and Kramer, H. (1988). A process for planning. The Journal of the National Association of Student Personnel Administrators, 25, 4, 226-235.
• Trochim, W. (1987). Pattern matching and program theory. In P.H. Rossi and H. Chen (Eds.), Special Issue on Theory-Driven Evaluation. Evaluation and Program Planning.
• Trochim, W. and Linton, R. (1986). Conceptualization for evaluation and planning. Evaluation and Program Planning, 9, 289-308.
• Trochim, W. (1985). Pattern matching, validity and conceptualization in program evaluation. Evaluation Review, 9, 5, 575-604.

Several of the more useful references for the statistical analysis in concept mapping include:

• Anderberg, M.R. (1973). Cluster analysis for applications. New York, NY: Academic Press.
• Davison, M.L. (1983). Multidimensional Scaling. New York, NY: John Wiley and Sons.
• Everitt, B. (1980). Cluster Analysis (2nd Edition). New York, NY: Halsted Press, A Division of John Wiley and Sons.
• Kruskal, J.B. and Wish, M. (1978). Multidimensional scaling. Beverly Hills, CA: Sage Publications.
• Rosenberg, S. and Kim, M.P. (1975). The method of sorting as a data-gathering procedure in multivariate research. Multivariate Behavioral Research, 10, 489-502.
• Weller, S.C. and Romney, A.K. (1988). Systematic Data Collection. Beverly Hills, CA: Sage Publications.

Whitepapers. These are unpublished lengthier pieces on technical or theoretical aspects of the Concept System and its process. A number of these can be accessed off the Concept Systems Incorporated website at: http://www.conceptsystems.com/library/whitepapers.cfm

In addition, there are also the following whitepapers, special reports and unpublished monographs:

• The Evaluator as Cartographer: Technology for Mapping Where We're Going and Where We've Been. Keynote presentation to the 1999 Conference of the Oregon Program Evaluators Network, "Evaluation and Technology: Tools for the 21st Century", Portland, Oregon, October 8, 1999.
• Trochim, W. Developing an Evaluation Culture in International Agriculture Research. Invited address presented at the Cornell Institute on International Food, Agriculture and Development's (CIIFAD) workshop on the Assessment of International Agricultural Research Impact for Sustainable Development, Cornell University, Ithaca, NY, June 16-19, 1991.
• Trochim, W. An Internet-Based Concept Mapping of Accreditation Standards for Evaluation. Paper presented at the Annual Conference of the American Evaluation Association, Atlanta, Georgia, November, 1996.
• Trochim, W. Reliability of Concept Mapping. Paper presented at the Annual Conference of the American Evaluation Association, Dallas, Texas, November, 1993.
• Trochim, W., Dumont, J. and Campbell, J. (1993). Mapping mental health outcomes from the perspective of consumers/survivors. NASMHPD Technical Reports Series, National Association of Mental Health Program Directors, Alexandria, VA.
• Trochim, W. (1993). Workforce Competencies for Psychosocial Rehabilitation Workers: A Concept Mapping Project. Final report for the conference of the International Association of Psychosocial Rehabilitation Services, Albuquerque, New Mexico, November 11-12, 1993.

Case Studies. You can view a wide variety of case studies of concept mapping applications at: http://www.conceptsystems.com/Consult/CaseStudies/All.cfm

PRACTICE

Knowledge Base and Online Help. The entire Concept System help file is on the web as the Concept System Knowledge Base. This web version is identical to the full context-sensitive help system that ships with the Concept System program. It is the equivalent of two entire books of information hyperlinked together, one describing the concept mapping process issues and the other explaining how to use the software. You should examine this resource and, once you have downloaded and installed the program, you should get into the habit of hitting the F1 key when you're in the Concept System and need or want context-sensitive assistance. The web version is at: http://www.conceptsystems.com/kb/cshelp.cfm

Tutorial. From the software download page (see below) you can also download a detailed tutorial that has hands-on lessons showing you how to use the software. The tutorial uses the Strategic Planning Example that is automatically installed with the program.

Training. Concept Systems Incorporated regularly offers comprehensive introductory facilitator training as well as a variety of advanced training programs. You can find out about training at: http://www.conceptsystems.com/About/training.cfm

Worksheets. Here are an assortment of worksheets you can use in preparing and managing a Concept System project. These are all in Microsoft Word for Windows, v6 format: http://www.conceptsystems.com/download/download.cfm

Downloading. The full Concept System program can be downloaded at no charge from the Concept Systems Incorporated website. In order to download, you are required first to fill out a brief online form requesting basic identification information. The form is at: http://www.conceptsystems.com/About/contact.cfm

When you complete the form, the download page will be displayed. You can download the program as a single 4.7M file or as four separate disk images. From that page you can also download the one-disk Concept System Remote program. This is also included in the full download. The only reason you might want to download the Remote program separately is if you intend to have your users/participants enter their data using the Remote and you would like to have the separate install (so your users don't need the full 4-disk installation or large single file).

Installing. The Concept System can be installed on any Windows 3.1, 95, or NT machine. Please note: The Concept System does not work with any Macintosh machines.

To install the program, follow the usual software installation procedures. If you downloaded the single file, you now have a file on your machine named csinstal.exe. If you navigate to this file and double-click on it, the installation program will start. If you downloaded the four disk images, you should be sure you put them into the same directory. You can either copy each image to its own initialized disk or leave them on your hard disk and install directly from them. The four files are setup.exe, setup.w02, setup.w03 and setup.w04. If you navigate to setup.exe and double-click on it, the setup program will start. As long as all four files are in the same directory, the setup will run correctly. Of course, if you transfer the four files to their own floppy disks, you should start by inserting the disk with setup.exe on it. You will then be prompted for the remaining three disks.

The install program automatically installs one already-completed example project, the Strategic Planning Example. You can explore this example and learn how to use the program on it. In order to access the example, you need to start the Concept System program and select the Strategic Planning Example (it will be the only project listed if you just installed the program). You will be presented with a logon box. In order to log onto the project you will need to use the following username and password:

Username: Admin
Password: concept

Please note that the login is case sensitive, so you must type the username with a capital 'A' and lowercase 'dmin', and the password must be in all lowercase.

Starting. When you are ready to use The Concept System on a project of your own, you will need to obtain a license. You can find information about licensing at: http://www.conceptsystems.com/software/license.cfm

Look over the various licensing options and decide which one is right for you.

To actually start the project, start the New Project program that you installed when you installed The Concept System. On the start-up screen you will see a Code Entry Number and next to it will be a Save button. Click on the Save button (to make sure that the next time you start the program you'll have the same Code Entry Number!) and write down your Code Entry Number. Then, contact Concept Systems Inc. to arrange for payment and obtain an Unlock Code for the license you want. You can call Concept Systems Inc. with your Code Entry Number at (607) 272-1206 between 9am and 5pm Eastern Standard Time, or e-mail us with your Code Entry Number and payment information at Concept Systems (infodesk@conceptsystems.com) anytime. Once you've arranged for payment, you'll be given an Unlock Code that you enter into the appropriate box in the New Project program that will allow you to start your own concept mapping project.
Pattern Matching

Pattern matching allows you to compare, both visually and statistically, two ratings from a concept map in order to explore consensus, track consistency over time or evaluate outcomes relative to expectations.

Creating A Pattern Match

Pattern matches are based on the information in a concept map. To create a pattern match, you simply decide what you want to measure and select the groups you want to compare. The Concept System does the rest for you.

Pattern matching can be used to:

• assess consensus by comparing the views of different stakeholder groups, e.g., managers vs. line staff or one department versus another
• relate the importance of various program elements to the financial support given to them
• match expectations for a project with the work accomplished to date
• track the consistency of performance over time
• assess how well outcomes or results meet the group's expectations.

The Concept System has been used successfully by major management consulting companies, corporations, government agencies and not-for-profit organizations for strategic planning, needs assessment, training curriculum development and evaluation, performance and program evaluation, and focus groups.
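
Computationally, a pattern match reduces to two sets of cluster-level averages computed from the same map and then compared. A rough sketch of one way to do that arithmetic in Python (hypothetical ratings and cluster assignments; the correlation is one common summary statistic, not necessarily the product's exact computation):

```python
from statistics import mean, correlation  # statistics.correlation requires Python 3.10+

# Hypothetical data: two stakeholder groups rated the same six statements 1-5.
cluster_of = {1: "A", 2: "A", 3: "B", 4: "B", 5: "C", 6: "C"}   # statement -> cluster
managers   = {1: 4.5, 2: 4.0, 3: 3.2, 4: 3.6, 5: 2.8, 6: 3.0}   # statement -> rating
line_staff = {1: 3.9, 2: 4.2, 3: 4.1, 4: 3.8, 5: 2.5, 6: 2.9}

def cluster_averages(ratings):
    """Average the statement ratings within each cluster, in a fixed cluster order."""
    clusters = sorted(set(cluster_of.values()))
    return [mean(r for s, r in ratings.items() if cluster_of[s] == c) for c in clusters]

left, right = cluster_averages(managers), cluster_averages(line_staff)
# The pattern match compares the two profiles; the correlation summarizes their consensus.
print(left, right, correlation(left, right))
```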

401 East State Street, Suite 402, Ithaca, NY 14850; Tel: 607.272.1206

Copyright © 2005 Concept Systems Incorporated
ARTICLE
HEALTH PROMOTION PRACTICE /MONTH YEAR

Policy and Politics

Setting Objectives for Community and


Systems Change: An Application of Concept
Mapping for Planning a Statewide
Health Improvement Initiative
William M.K. Trochim, PhD
Bobby Milstein, MPH
Betty J. Wood, PhD, MPH
Susan Jackson, BA
Virginia Pressler, MD, MBA, FACS

The Hawaii Department of Health comprehensive health improvement government leaders to devise sound
(HDOH) used concept mapping initiatives. strategies for putting the allocated
techniques to engage local stake- funds to good use. Indeed, many
holders and national subject area Keywords: community and systems governors and legislators required
experts in defining the community change; health plan- detailed, outcome-oriented plans
and system factors that affect indi- ning; tobacco settle- before agreeing to release the money.
viduals’ behaviors related to ment; concept mapping; Relatively little is known about
tobacco, nutrition, and physical multidimensional the processes that citizens, elected
activity. Over eight working days, scaling representatives, and agency officials
project participants brainstormed used to negotiate priorities for their
496 statements (edited to a final set windfall funding. There also has not
of 90), which were then sorted and been a review of the specific out-
rated for their importance and feasi- comes that constituents in each state

T
he 1998 master settlement
bility. A sequence of multivariate agreement between the U.S. sought to achieve through these
statistical analyses, including multi- Attorneys General and the investments. This article describes
dimensional scaling and hierarchi- tobacco industry created an historic how officials of the Hawaii Depart-
cal cluster analysis, generated maps opportunity for protecting the pub- ment of Health (HDOH), even while
and figures that were then inter- lic’s health. A total of 46 states working under intense time pres-
preted by project stakeholders. The received a share of the $206 billion sure, were able to meaningfully
results were immediately incorpo- settlement to use at their discretion. involve multiple stakeholders in set-
rated into an official plan, approved Some states invested these resources ting outcome objectives for their por-
by the Governor and state legisla- in tobacco prevention and control, tion of the tobacco settlement fund.
ture, recommending how Hawaii’s
> BACKGROUND
others paid for the medical care
tobacco settlement resources could expenses of individuals with
be used to create sustainable tobacco-related illnesses, still others
changes in population health. The directed money outside of the health
results also provide empirical sup- sector entirely, and many chose to The Tobacco Settlement in Hawaii
port for the premise that both com- fund a variety of activities (Centers Hawaii’s share of the master set-
munity and systems factors ought to for Disease Control and Prevention, tlement agreement is approximately
be considered when planning 2001; National Conference of State $1.3 billion to be paid over 25 years.
Legislatures, 2001). A unique mix of The first installment of $14.8 million
perspectives shaped the allocation (approximately 2% of the HDOH
Health Promotion Practice
decisions in each state. The rapid annual operating budget) was paid
2003 Vol. , No. , 1-12 influx of money came with high on December 14, 1999. Only 5
DOI: 10.1177/1524839903258020 expectations for what could be months earlier, the Hawaii Tobacco
©2003 Society for Public Health Education accomplished, creating pressure for Settlement Special Fund (Act 304),
1
was signed into law on August 4, Oversight responsibility went to ment of large organizational endeav-
1999, making Hawaii the first state the HDOH for planning and manag- ors both in the public and private
in the nation to declare how they ing both the tobacco trust fund and sectors (Allison & Kaye, 1997;
would allocate tobacco settlement the community-based health promo- Koteen, 1997; Lorange, 1994). The
funds. Thus, Hawaii was one of only tion activities; however, a strategic HDOH began its strategic planning
a few states to dedicate a majority of plan had to be delivered to the gover- with commitments to engage multi-
the settlement revenue for public nor and the state legislature within 3 ple stakeholders and incorporate the
health priorities (National Confer- months outlining recommended latest prevention science. Its first
ence of State Legislatures, 2001). program priorities and expendi- step was to develop a “plan for the
Hawaii’s legislation mandated that tures. The law mandated that the plan,” which assembled background
60% of the funds be used for tobacco HDOH draft a strategic plan for information and set forth the broad
prevention and control as well as review by the governor and state goals, scope, assumptions,
health promotion and chronic dis- legislature within 5 months. approaches, and responsibilities of
ease prevention. The intent of the staff. Next, HDOH planners con-
law was to invest a significant pro-
portion of the funds in primary pre- > METHODS sulted with scientists from the Cen-
ters for Disease Control and Preven-
vention initiatives that would even- tion to learn about state-of-the-art
tually reduce the need for direct Planning the Healthy Hawaii techniques in health promotion and
service programs, create a lasting Initiative (HHI) chronic disease prevention. In addi-
impact on community conditions for tion, HDOH and Centers for Disease
Strategic planning has become an
health and wellness, and leverage Control and Prevention staff mem-
essential component of the manage-
the power of partnerships. By main- bers worked together to develop a
taining a consistent focus on pro- system for engaging stakeholders
moting health and preventing chronic and ensuring accountability through
disease, leaders in Hawaii aimed to the Authors ongoing evaluation and program
leave a legacy of healthier people as William M.K. Trochim, PhD, is a improvement. Throughout the plan-
well as communities that support professor of policy analysis and ning process, the focus was on
healthier living. A total of priorities management at Cornell University in developing a clear and convincing
were identified and funded: Ithaca, New York. plan that could create meaningful
Bobby Milstein, MPH, is an evalua- health improvements and be practi-
• Establish an emergency and bud- tion coordinator for the Division of cal to implement and evaluate. The
get reserve fund (i.e., “rainy day Adult and Community Health at the program, now known as the HHI, is
fund”) as a supplemental source Centers for Disease Control and Pre- what emerged from that process.
of funding to guard against eco- vention in Atlanta.
nomic instability, provide for
disaster recovery, and improve Betty J. Wood, PhD, MPH, is the Pre- A Focus on Changing Community
the state’s bond rating (40% of ventive Health and Health Services Conditions and Systems
the settlement); block grant epidemiologist at the
• establish a trust fund to ensure Hawaii Department of Health in During the past decade, health
that resources are available in Honolulu. departments have been encouraged
perpetuity for tobacco-related to forge new partnerships and
Susan Jackson, BA, is the project
prevention and control activities increase their focus on environmen-
manager for the Healthy Hawaii Ini-
(25%); tal and policy change as a means of
tiative and the Tobacco Settlement
• Provide funds to guarantee that improving population health (Insti-
Fund at the Hawaii Department of
every child in Hawaii has health tute of Medicine, 1996; U.S. Depart-
Health in Honolulu.
insurance (up to 10%); and
ment of Health and Human Services,
• support community-based health Virginia Pressler, MD, MBA, FACS,
promotion initiatives aimed at 2000a, 2000b). From the outset, the
is vice president of Womens Health
reducing risk behaviors that are Services at Kapiolani Medical Center
HHI strategy emphasized the impor-
responsible for the greatest bur- and Hawaii Pacific Health in Hono- tance of changing community condi-
den of chronic disease in the lulu. She was deputy director of the tions and systems (i.e., those pro-
state (i.e., tobacco use, poor Hawaii State Department of Health grams, policies, practices,
nutrition, and physical inactiv- at the time of this work. infrastructures, norms, and other
ity) (25%). factors that shape health-related

2 HEALTH PROMOTION PRACTICE / 2003


Policy and Politics

behavior and affect health status). tobacco use, diet, and physical The Healthy People 2010 objec-
Building on expanded definitions of activity. tives (U.S. Department of Health and
health promotion (O’Donnell, 1989), According to this reasoning, posi- Human Services, 2000b) along with
advancements in ecological theory tive changes in community condi- surveillance data from Hawaii were
(Breslow, 1996; Green, Richard, & tions and systems are an important available as guides for setting objec-
Potvin, 1996), as well as findings mechanism for achieving lasting tives regarding health status
from prior evaluations of commu- health effects. As such, they provide improvements as well as risk and
nity interventions, the HHI planners an early indication of whether inter- protective behavior change. But it
understood that reductions in risk ventions are likely to be successful. was difficult to find information that
behavior as well as corresponding To emphasize the full spectrum of defined expected changes in com-
improvements in health status expected effects and enhance the munity conditions and systems,
would be neither widespread nor potential for evaluation, the HHI especially considering Hawaii’s
sustainable without changing the planners decided that their strategy social diversity. Moreover, the short
context within which those behav- should specify three levels of out- time frame enforced by Act 304 ren-
iors occur (i.e., the predisposing, come objectives phased over time as dered infeasible traditional tech-
enabling, and reinforcing factors in follows: niques for eliciting community
the social, physical, and organiza- input.
tional environment) (Green & • improvements in population The planners faced two unappeal-
health status (most distant),
Kreuter, 1999). To be truly effective, ing options: either set objectives for
• widespread changes in risk and
interventions supported by the set- protective behaviors (intermedi- community and systems change
tlement fund would have to alter the ate), and without the benefit of stakeholder
community conditions and systems • changing community conditions consultation or omit from the writ-
that shape behaviors related to and systems (most immediate). ten plan this critical element of their

> EDITORS’ COMMENTARY


State budgets are one of the most important factors in assuring that communities have the resources they
need for health promotion. The budget process is complicated and often highly contentions. Several factors
have influenced the ways these decisions are made at the state level, including open meeting policies,
improved systems of communication, advocacy and lobbying efforts, media access and reporting, the con-
sumer health revolution, and the Internet. Still, it has been hard to include input from everyone who has a
stake in the process. Although community health practitioners agree that a need exists to systematically con-
sider opinions of key stakeholders when planning budgets for health improvement initiatives, little has been
written about successful strategies to accomplish that goal. This article describes an effort to use concept map-
ping to identify priorities and common objectives for state-level improvement in conditions and systems
related to tobacco use, nutrition, and physical activity. The results had a clear impact on the development of
Hawaii’s resource allocation plan, which ultimately was approved by the legislature and signed into law by the
governor. Health promotion research and practice continues to move forward in innovative ways to provide
evidence-based support for engaging stakeholders in policy approaches whose most immediate effect may well
be seen as shifts in conditions and systems in the communities where people live.

Ellen Jones, MS, CHES, is a health research consultant and


is based in Madison, Mississippi.

Lori Dorfman, DrPH, is director of the Berkeley Media Studies Group,


a project of the Public Health Institute in Berkeley, California.

Trochim et al. / COMMUNITY AND SYSTEMS CHANGE 3


program philosophy. The concept mapping techniques described as follows offered a sound and practical solution to this dilemma.

Defining the Domain of Community and Systems Factors

The following concept mapping project was designed to develop an initial understanding of the community and systems factors that affect individuals' behaviors related to tobacco, nutrition, and physical activity. The goal was to better define the boundaries and elements in this complex domain by synthesizing input from local stakeholders as well as national subject area experts.

The project began with a review of existing knowledge about the topic. Public health practitioners, researchers, and theorists have long understood that to change the behavior of a large number of people—and ensure that those changes are sustainable—it is necessary to change the context in which they behave (Fawcett et al., 2000). This is because the environment within a community strongly influences what people do and how they feel (Green & Kreuter, 1999). Unfortunately, there is relatively little consensus among scholars or health professionals about precisely which contextual factors are most important for changing particular behaviors. This project was designed to gather participants' ideas about factors that would support widespread, sustained change in tobacco use, nutrition, and physical activity.

Concept Mapping

Concept mapping (Trochim, 1989a; Trochim & Linton, 1986) is a mixed-methods (Greene & Caracelli, 1997) planning and evaluation approach that integrates familiar qualitative group processes (brainstorming and pile sorting) with multivariate statistical analyses to help a group describe its ideas on any topic of interest and represent these ideas visually through a map. The process typically requires the participants to brainstorm a large set of statements relevant to the topic of interest, individually sort these statements into piles of similar ones, rate each statement on one or more dimensions, and interpret the maps that result from the data analyses. The analyses typically include multidimensional scaling (MDS) of the sort data, hierarchical cluster analysis of the MDS coordinates, and computation of average ratings for each statement and cluster of statements. The maps that result show the individual statements in two-dimensional (x, y) space with more similar statements located nearer each other and grouped into clusters. Participants are actively involved in interpreting the results to ensure that the maps are understandable and labeled in a meaningful way. Concept mapping has been used effectively to address substantive issues across a wide range of fields (McLinden & Trochim, 1998; Shern, Trochim, & LaComb, 1995; Trochim, 1989b; Trochim, Cook, & Setze, 1994; Witkin & Trochim, 1997).

Participants

Two groups were invited to participate in this project. First, within Hawaii, were health professionals and leaders from community agencies and coalitions (N = 34). In addition to various grassroots leaders, this group included members of the Tobacco Health and Wellness Advisory Group, a panel created by Act 304 to assist in developing statewide community health programs. Second, colleagues outside of Hawaii with special expertise in community and systems change (N = 46) were invited to participate. These participants were identified by representatives from the Centers for Disease Control and Prevention and the American Evaluation Association and included leading scholars and practitioners in comprehensive community change. Both groups were selected to include people whose interests encompassed the subject areas of tobacco, nutrition, and physical activity.

Procedure

The developer of this concept mapping methodology (Trochim, 1989a) facilitated the process, which took place during 8 working days (November 23 through December 3, 1999). The Concept System computer software1 (Concept Systems, 2000) was used to perform all analyses and produce all of the maps and statistical results. Most of the data were collected over the World Wide Web using the Concept System Global software to allow for participation across the Hawaiian Islands and beyond.

Stakeholder input was accomplished in two phases. The first involved generating a list of community and systems factors related to tobacco use, nutrition, and physical activity. Phase 2 consisted of organizing and prioritizing those factors followed by interpretation of the results.

The initial definition provided to participants of community and systems factors was deliberately broad. It stated that these are "characteristics of a community's social, physical, and organizational environment that might influence health behavior and health status either directly or indirectly."

Phase 1: Generating Community and Systems Factors

Participants responded to the following focus statement: "Generate statements that describe specific community or systems factors that affect individuals' behaviors related to tobacco, nutrition, and physical activity."

Both Web-based and live brainstorming sessions were used to gather responses. Participants visited the project Web site and brainstormed (Osborn, 1948) their initial ideas between November 23 and December 1. This yielded 448 statements in 53 user sessions. Second, five HDOH managers participated in a live brainstorming session at the HDOH offices on December 1, 1999, resulting in 48 additional statements. Analysts pooled statements from both methods, yielding 496 statements, many of which were conceptually similar or redundant.

A total of three HDOH staff members, with guidance from the concept mapping facilitator, consolidated the list into the final set of 90 unique statements. This involved doing a rapid sort of the statements into more than 100 piles based on similarity, merger of similar piles, consolidation of nearly identical statements, and integration of detailed statements into broader ones. For instance, three of the original brainstormed statements were as follows: "Providing more pedestrian-friendly environments to encourage walking short distances for increased physical activity," "Percentage of primary and secondary roads with median wide enough to accommodate pedestrians and bicyclists," and "Percentage of pedestrian underpasses featuring adequate lighting." These and other similar statements were consolidated into the final "pedestrian-friendly environments" descriptor (34).2

Phase 2: Organizing Community and Systems Factors

Each participant used the Concept System Global program via the World Wide Web to

• record demographic characteristics,
• sort brainstormed statements, and
• rate brainstormed statements.

Demographics. Each participant answered the following two demographic questions: one on primary areas of interest (i.e., physical activity, nutrition, tobacco, Hawaiian Health, or other) and one on organizational location (i.e., HDOH, Hawaii not Department of Health, or not in Hawaii).

Sorting. Each participant conducted an unstructured sorting (Coxon, 1999; Rosenberg & Kim, 1975; Weller & Romney, 1988) of the statements by grouping them into piles. The only restrictions in sorting the 90 statements were that participants could not (a) have 90 piles with one item in each, (b) have one pile consisting of all 90 items, or (c) have any piles that grouped conceptually dissimilar items (e.g., a "miscellaneous" pile).

Ratings. Participants rated each of the 90 statements on two dimensions—importance (compared with other factors) and feasibility (during the next 2 to 5 years)—on a 5-point scale with 5 indicating extremely important or extremely feasible.

A total of 25 participants logged onto the Web site during the 2-day organizing phase (December 1-3). Of these, 11 completed the sorting task, 19 completed the importance rating, and 14 completed the feasibility rating. These numbers are not unusual in this methodology, which is often used as an alternative to traditional focus group interview procedures that frequently involve even fewer participants. Trochim (1993), in summarizing meta-analyses of 38 projects, reports an average of approximately 14 sorters and raters in each project with a standard deviation of approximately 6.

Concept Mapping Analysis

The concept mapping analysis uses the sort information to construct an N × N binary, symmetric matrix of similarities, Xij. For any two items i and j, a 1 was placed in Xij if the two items were placed in the same pile by the participant; otherwise, a 0 was entered (Weller & Romney, 1988). The total N × N similarity matrix, Tij, was obtained by summing across the individual Xij matrices. Thus, any cell in this matrix could take integer values between 0 and 11 (i.e., the number of people who sorted the statements); the value indicates the number of people who placed the i,j pair in the same pile.
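To make the sequence of computations concrete, the following is a minimal sketch of the kind of pipeline described here and in the paragraphs that follow (co-occurrence matrices from the sorts, nonmetric MDS, Ward clustering of the coordinates, and rating averages). It is written with NumPy, SciPy, and scikit-learn purely for illustration; the authors used the Concept System software, and the example sorts, ratings, and parameter choices below are invented assumptions, not the project data.

```python
import numpy as np
from sklearn.manifold import MDS
from scipy.cluster.hierarchy import linkage, fcluster

# Each participant's sort: a list of piles, each pile a list of statement indices.
# These two toy sorts are made up; the Hawaii project had 90 statements and 11 sorters.
sorts = [
    [[0, 1, 2], [3, 4], [5, 6, 7, 8]],
    [[0, 2], [1, 3, 4], [5, 6], [7, 8]],
]
N = 9  # number of statements in this toy example

# Build each participant's binary co-occurrence matrix X_ij and sum them into T_ij.
T = np.zeros((N, N), dtype=int)
for piles in sorts:
    X = np.zeros((N, N), dtype=int)
    for pile in piles:
        for i in pile:
            for j in pile:
                X[i, j] = 1          # 1 if i and j share a pile for this sorter
    T += X                           # T_ij counts how many sorters paired i and j

# Nonmetric MDS expects dissimilarities, so convert (more co-sorting = smaller distance).
D = (T.max() - T).astype(float)
mds = MDS(n_components=2, metric=False, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(D)        # two-dimensional (x, y) configuration
print("stress (lower is better):", mds.stress_)

# Hierarchical cluster analysis (Ward's algorithm) on the MDS coordinates,
# cut at a chosen number of clusters (seven in the Hawaii project).
Z = linkage(coords, method="ward")
clusters = fcluster(Z, t=7, criterion="maxclust")

# Average rating per statement across raters, then per cluster (placeholder 1-5 ratings).
ratings = np.random.randint(1, 6, size=(5, N))
statement_means = ratings.mean(axis=0)
cluster_means = {c: statement_means[clusters == c].mean() for c in np.unique(clusters)}
print(cluster_means)
```

The point of the sketch is only to show how each step feeds the next; any implementation that produces a summed co-occurrence matrix, a two-dimensional nonmetric MDS configuration, and a Ward partition of that configuration follows the same logic.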



The total similarity matrix Tij was analyzed using nonmetric MDS analysis (Davison, 1983) with a two-dimensional solution as recommended by Kruskal and Wish (1978). The two-dimensional solution yields a configuration in which statements piled together most often are located more closely in two-dimensional space than are those piled together less frequently. The usual statistic reported in MDS analyses to indicate the goodness of fit of the configuration is called the stress value. A lower stress value indicates a better fit. In a study of the reliability of concept mapping, Trochim (1993) reported that the average stress value across 33 projects was .285 with a range from .155 to .352. The stress value in this analysis was .257, which is better (i.e., lower) than average.

The x,y configuration was the input for the hierarchical cluster analysis using Ward's algorithm (Everitt, 1980) as the basis for defining a cluster. Using the MDS configuration as input to the cluster analysis in effect forces the cluster analysis to partition the MDS configuration into nonoverlapping clusters in two-dimensional space. No simple mathematical criterion is available by which a final number of clusters can be selected. The analysts examined an initial cluster solution that was the maximum thought desirable for interpretation in this context (i.e., 20 clusters). Then, the analysis team (comprising concept mapping experts and community health professionals in Hawaii) examined successively lower cluster solutions, making a judgment at each stage about whether the merger seemed substantively reasonable. Judgments about the suitability of each solution were discussed among the analysts and resulted in acceptance of the seven-cluster solution, as this preserved the most detail while yielding interpretable clusters.

FIGURE 1 Point-Cluster Map Showing the Multidimensional Scaling Arrangement of the 90 Statements With the Seven-Cluster Solution and Labels Superimposed (cluster labels: Policies and Laws, Environment Infrastructure, Access, Children & School, Coalitions/Collaborations, Community Infrastructure, Information/Communication)

> RESULTS

Map Results

In concept mapping, several different maps are typically generated based on the same underlying structure, the arrangement of the statements by MDS. The foundation for all maps is the labeled point-cluster map (see Figure 1), which shows all the community and system factors (points) in relation to each other as arranged by MDS. Points are located closer to each other if more people sorted them together into a group. In general, points that are closer together are more similar in meaning. The analysis groups these points into clusters as shown. In this project, the seven-cluster solution best fits the data.

The analysis also mathematically selects the best-fitting label for each cluster from all of the pile labels generated by all of the sorters. These were examined in relation to the statements in each cluster, and if the analysts determined that the suggested label did not appropriately cover the content, the next best fitting was examined until an appropriate cluster label was identified.

The three clusters across the top of the map (i.e., policies and laws, environment infrastructure, and access) refer to systems factors that are often associated with government. The three clusters on the bottom (i.e., coalitions and/or collaborations, community infrastructure, and information and/or communication) all refer in some way to local community conditions. The central location of the children and school cluster suggests that the educational system might be an especially useful link between the systems clusters on the top and the community ones below. In other words, the educational system might have special strategic importance in addressing tobacco use, nutrition, and physical activity.

Ratings

Table 1 shows the statements in each cluster that had the highest average importance or feasibility.

Maps can also display rating results. For instance, the importance rating map (see Figure 2) shows the average relative importance of each cluster for the entire group of participants.


The number of layers indicates the average importance rating. The average represented by the layers is actually a double averaging—across all of the participants and all of the factors in each cluster. Consequently, even slight differences in averages between clusters are likely to be significant. The map clearly shows that the environment infrastructure, policies and laws, and community infrastructure clusters were judged by participants to be most important.

The map of perceived feasibility (see Figure 3) shows that policies and laws, community infrastructure, and information and/or communication were rated as most feasible for achieving change during the next 2 to 5 years.

TABLE 1
Top Two Statements in Each Cluster in Importance or Feasibility Showing Statement ID Number, Statement Text, Average Importance Rating, and Average Feasibility Rating, Organized Alphabetically by Cluster and Within Cluster in Descending Order by Average Importance

ID  Statement  Importance  Feasibility

Access
2   Easy, affordable access to healthy food, safe places for physical activity, and strict antismoking policies  4.47  3.14
12  Availability of healthy food choices at a wide variety of retail, institutional, and educational locations  4.16  3.71
11  Expanded hours for recreation centers and pools  4.11  4.07
58  Availability of school sites for after-school and community health activities (low cost or no cost)  3.89  4.14

Children and school
46  Amount and quality of physical education and physical fitness training in schools  4.37  3.57
10  Literacy  4.21  2.86
17  Joint school-community activities and/or programs to promote health  4.00  3.86
79  Encourage innovative use of space for physical activity  3.74  3.86

Coalitions and/or collaborations
8   A caring, nurturing parent or surrogate parent in early childhood  4.53  3.14
88  Health care provider adherence to counseling for tobacco cessation, physical activity, and nutrition  4.11  3.50
71  Professional and organizational coalitions and partnerships  3.95  3.86
3   Involvement of faith communities in health promotion  3.53  3.86

Community infrastructure
84  Focus on lifelong physical activity  4.58  4.50
75  Engaging target populations in promoting health  4.39  3.93
85  Community recognition of good health role models  3.95  4.29

Environment infrastructure
63  Equal opportunities for participation in physical activity programs regardless of age, gender, or disability  4.47  3.64
34  Pedestrian-friendly environments  4.37  3.79
47  Well maintained equipment in recreational facilities  3.95  3.71

Information and/or communication
54  Media-supported health promotion campaigns  4.58  4.14
68  Information that is culturally sensitive and appropriate  4.32  4.07
28  A report card on legislators' actions on health issues  3.68  4.21
19  Collateral material on healthy lifestyles (e.g., print materials, posters, visuals, public displays)  3.00  4.29

Policies and laws
31  School policy promoting physical activity, healthy diet, and tobacco control  4.53  3.93
39  Policies that promote healthy transportation alternatives (cycling, walking, public transportation, and so forth)  4.37  3.79
43  Public and worksite policy that supports tobacco control  4.32  4.29
45  Restricted access to tobacco products for youth  4.26  4.36
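A summary like Table 1 can be generated directly from the statement-level rating averages once cluster membership is known. The fragment below is one possible way to do it with pandas; the column names are assumptions, and the handful of rows shown are simply a few of the Table 1 values reused for illustration rather than the full 90-statement data set.

```python
import pandas as pd

# Statement-level averages (a few rows copied from Table 1, for illustration only).
stmts = pd.DataFrame([
    (84, "Community infrastructure", "Focus on lifelong physical activity", 4.58, 4.50),
    (75, "Community infrastructure", "Engaging target populations in promoting health", 4.39, 3.93),
    (85, "Community infrastructure", "Community recognition of good health role models", 3.95, 4.29),
    (54, "Information and/or communication", "Media-supported health promotion campaigns", 4.58, 4.14),
    (68, "Information and/or communication", "Information that is culturally sensitive and appropriate", 4.32, 4.07),
], columns=["id", "cluster", "statement", "importance", "feasibility"])

# Top two statements per cluster by average importance, mirroring Table 1's organization:
# alphabetical by cluster, descending importance within each cluster.
top_by_importance = (stmts.sort_values("importance", ascending=False)
                          .groupby("cluster", sort=True)
                          .head(2)
                          .sort_values(["cluster", "importance"], ascending=[True, False]))
print(top_by_importance.to_string(index=False))
```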



FIGURE 2 Importance Rating Map Showing the Average Cluster Rating for the Importance Variable

FIGURE 3 Feasibility Rating Map Showing the Average Cluster Rating for the Feasibility Variable

Pattern Matching

Pattern matching is used when comparing patterns of variables across two maps. For instance, the relationship between average importance and average feasibility across all participants is shown in Figure 4. Although environment infrastructure was rated most important, it was the second lowest in feasibility, whereas information and/or communication was lowest in importance and highest in feasibility. In general, the areas that would be most fruitful to pursue are those judged both important and feasible. According to Figure 4, the policies and laws and community infrastructure clusters best meet that requirement.

FIGURE 4 Ladder Graph Pattern Match of Importance and Feasibility (r = .07)

Item Analysis

To examine the relationship between importance and feasibility, we plotted the two variables against one another. This analysis revealed that Statement 84 (i.e., focus on lifelong physical activity), for example, was rated highest in both importance and feasibility. The bivariate plot of feasibility and importance for all 90 statements is shown in Figure 5. The statement identification numbers in the figure can be linked to the 90 specific statements that were brainstormed by participants.

FIGURE 5 Bivariate Plot of Feasibility Versus Importance for the 90 System and Community Factors
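The pattern match and the bivariate item analysis described above are both simple computations on the averaged ratings. The sketch below shows one plausible way to reproduce them in Python with pandas; the column names are assumptions, the five statements are taken from Table 1 only as examples, and any correlation computed on this tiny subset will differ from the r = .07 reported for the full data in Figure 4.

```python
import pandas as pd

# Averaged ratings for a few statements (values copied from Table 1, for illustration).
df = pd.DataFrame({
    "statement_id": [84, 54, 31, 75, 8],
    "cluster": ["Community infrastructure", "Information and/or communication",
                "Policies and laws", "Community infrastructure",
                "Coalitions and/or collaborations"],
    "importance": [4.58, 4.58, 4.53, 4.39, 4.53],
    "feasibility": [4.50, 4.14, 3.93, 3.93, 3.14],
})

# Cluster-level pattern match: average each rating within clusters and correlate the
# two cluster profiles (this is the quantity the ladder graph in Figure 4 summarizes).
cluster_means = df.groupby("cluster")[["importance", "feasibility"]].mean()
r = cluster_means["importance"].corr(cluster_means["feasibility"])
print(cluster_means.sort_values("importance", ascending=False))
print(f"pattern match r = {r:.2f}")

# Statement-level item analysis: statements at or above the mean on both dimensions
# are the candidates Figure 5 is used to identify for action planning.
both_high = df[(df["importance"] >= df["importance"].mean()) &
               (df["feasibility"] >= df["feasibility"].mean())]
print(both_high[["statement_id", "importance", "feasibility"]])
```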


The final cluster map with the major interpreted features overlaid upon the clusters is presented in Figure 6. The figure provides evidence that the initial conceptual distinction between community and systems factors has some empirical corroboration (i.e., the clusters on the top describe systems factors whereas those across the bottom depict community factors). The map can also be interpreted from left to right in terms of regions or clusters of clusters. The two clusters on the left define the structure region that includes the system factors policies and laws and the community factors coalitions and/or collaborations. In the center is the infrastructure region that includes both environmental infrastructure and community infrastructure and the cluster children and school that bridges between these. Finally, on the right of the map is the transmission region that includes the system factors access and the community factors information and/or communication.

> DISCUSSION

The mapping process had several immediate positive consequences. First, it provided the HDOH with a systematic process that was perceived by multiple stakeholders to have a high degree of credibility. Second, the concept mapping process and its results reached influential stakeholders throughout Hawaii, including leaders from community agencies and coalitions across the state, enabling broader stakeholder engagement than might otherwise have been possible and generating results that were fed back to those stakeholders in a timely manner. Third, the results were translated directly into specific objectives that were incorporated into the HHI plan and subsequently implemented.

For example, consider how just the five statements rated highest in both importance and feasibility were enacted. To address focus on lifelong physical activity (84), the HHI adopted a policy to promote activities of daily living (as opposed to enhancing skills of athletes). For media-supported health promotion campaigns (54), the HHI developed the Start. Living. Healthy. campaign to raise awareness and knowledge of healthy behaviors through radio, television, print, and Internet media. For school policy promoting PA, diet, and tobacco control (31), the HHI provided funding to 24 schools statewide to initiate healthy school workgroups to address school health policies. To address engaging target populations in promoting health (75), the HHI provided funding to 26 communities to conduct community needs assessments and plan community-based health promotion activities. And to address standardized and consistent messages and/or information about risks and recommended behaviors (80), the HHI contracted with the University of Hawaii School of Medicine to provide continuing education for health professionals to increase the number of health workers who make appropriate recommendations and referrals.



FIGURE 6 Final Concept Map Showing Clusters and Relationships to Theoretical Constructs (regions from left to right: Structure, Infrastructure, Transmission; System Factors across the top, Community Factors across the bottom)

Overall, the mapping process enabled the HDOH to move rapidly and develop a statewide health improvement plan in a timely fashion. Using the concept mapping approach, the HDOH succeeded in obtaining outside input in a matter of weeks from broad-based, voluntary, and anonymous stakeholders.

The method was cost-effective and capitalized on the benefits of Web technology. Hawaii imposes significant barriers on stakeholder gatherings because of its geography, as air travel is the only practical means of transportation between the islands. Using the Web made it possible to involve people from throughout the state without necessitating face-to-face meetings.

The concept mapping process provided a solid, credible foundation to support the HDOH proposal. It helped fulfill the legal mandate contained in Act 304 as well as the ethical standards to involve stakeholders in shaping decisions that would affect them. The mapping results validated the HDOH's recommended intervention strategy and provided the opportunity to translate public health theory into a grounded action plan.

The results also suggest several conclusions that extend beyond the immediate context of this study. First, the theoretical distinction between community and systems factors was clearly recovered in the maps, with all community-related clusters arrayed across the bottom of the map and system-related clusters grouped across the top. Second, the results illustrate how a hierarchical, visual display of stakeholder perceptions can have general utility for health planning. Depending on the specificity needed, when considering issues, planners can easily move between the different levels of generality from the broad community-system view, to the three-category structure-infrastructure-access scheme, to the seven categories depicted by the clusters, or to the 90 specific brainstormed statements. In one hierarchical graphic device, the map provides a high-level organizing structure and considerable operational detail that together can be used to guide action planning as well as evaluation design and measurement.

Although concept mapping has many benefits, as previously noted, the tight deadline and other factors made it difficult to use the method in an optimal way. For instance, participation rates among those in Hawaii as well as national public health professionals were lower than expected. This is likely due, at least in part, to the study's timing because the process took place during the Thanksgiving holiday. The brief duration of the study most likely also affected participation. Extending the process for several more weeks and setting up a more extensive system to prompt and remind participants would improve participation. Consolidating the large number of brainstormed statements into a manageable subset was also challenging, especially given the short deadline. Staff members had only a single evening to reduce 496 brainstormed statements to the final set of 90.

Furthermore, the process used here did not examine how particular community or systems factors might relate differently to the three specific health behaviors (i.e., tobacco use, nutrition, and physical activity).


This limitation is a consequence of the deliberate decision to elicit statements that pertain to all three behaviors in one project. Although it would have been possible to conduct three parallel concept mapping studies, this would have increased the time and cost; it would also have reinforced categorical distinctions that might overly constrain the aims of a comprehensive community health initiative. Although there was not sufficient time to address this issue in this project, one alternative approach for future consideration would be to classify each statement with respect to which of the behaviors it addresses, either through direct coding or through an additional rating. The advantage of this (over three separate maps) would be to show the relative emphases in each of the three areas while still preserving the integrated perspective across them.

In spite of these limitations, the concept mapping technique provided an effective way of (a) engaging geographically dispersed stakeholders, including local constituents and subject area experts across the country; (b) generating valid findings that are understandable for nonscientists and have clear implications for policy and practice; and (c) delivering useful results in a brief period of time at relatively low cost.

Finally, it is worth noting that the map results provide a contextual framework that can be useful for subsequent evaluation of the HDOH plan. At regular intervals, perhaps annually, the HDOH might review each of the clusters and assess the degree to which relevant actions have been taken and behavioral outcomes affected. This can be done either qualitatively as part of an overall program review or quantitatively through a surveillance system designed to capture changes in community conditions and systems. Results from such evaluations can be linked either qualitatively or statistically to the original planning ratings of importance and feasibility. In this way, the map structure can act as a unifying device that helps integrate initial planning with ongoing assessment and close the loop on the traditional planning-evaluation cycle.

The concept mapping technique proved to be a cost-effective and successful way of identifying statewide objectives for changes in community conditions and systems relating to tobacco use, nutrition, and physical activity. This process enabled the HDOH to develop its HHI in a timely way, and their experience shows how a hierarchical, visual display of stakeholder perceptions can be useful for health planning.

NOTES

1. The Concept System and Concept System Global software are licensed through Concept Systems Incorporated, Ithaca, New York (http://www.conceptsystems.com).
2. Throughout this article, identification numbers associated with the final brainstormed statements are shown in parentheses to enable one to locate the statements in tables and on maps.

REFERENCES

Allison, M., & Kaye, J. (1997). Strategic planning for nonprofit organizations: A practical guide and workbook. New York: John Wiley.

Breslow, L. (1996). Social ecological strategies for promoting healthy lifestyles. American Journal of Health Promotion, 10, 253-257.

Centers for Disease Control and Prevention. (2001). Investment in tobacco control: State highlights 2001. Atlanta: U.S. Department of Health and Human Services, Centers for Disease Control and Prevention, National Center for Chronic Disease Prevention and Health Promotion, Office on Smoking and Health.

Concept Systems. (2000). The concept system. Ithaca, NY: Concept Systems Incorporated. Available from http://www.conceptsystems.com

Coxon, A. P. M. (1999). Sorting data: Collection and analysis (Sage University Papers on Quantitative Applications in the Social Sciences, 07-127). Thousand Oaks, CA: Sage.

Davison, M. L. (1983). Multidimensional scaling. New York: John Wiley.

Everitt, B. (1980). Cluster analysis (2nd ed.). New York: Halsted.

Fawcett, S. B., Francisco, V. T., Hyra, D., Paine-Andrews, A., Schultz, J. A., Russos, S., et al. (2000). Building healthy communities. In A. Tarlov & R. St. Peter (Eds.), The society and population health reader: A state and community perspective (pp. 75-93). New York: New Press.

Green, L. W., & Kreuter, M. W. (1999). Health promotion planning: An educational and ecological approach (3rd ed.). Mountain View, CA: Mayfield.

Green, L. W., Richard, L., & Potvin, L. (1996). Ecological foundations of health promotion. American Journal of Health Promotion, 10, 270-281.

Greene, J. C., & Caracelli, V. J. (1997). Advances in mixed-method evaluation: The challenges and benefits of integrating diverse paradigms. New Directions for Evaluation, 74.

Institute of Medicine. (1996). Healthy communities: New partnerships for the future of public health (M. A. Stoto, C. Abel, & A. Dievler, Eds.). Washington, DC: National Academy Press.

Koteen, J. (1997). Strategic management in public and nonprofit organizations: Thinking and acting strategically on public concerns. New York: Praeger.

Kruskal, J. B., & Wish, M. (1978). Multidimensional scaling. Beverly Hills, CA: Sage.

Lorange, P. (1994). Strategic planning process. Brookfield, VT: Dartmouth Publishing Company.

McLinden, D. J., & Trochim, W. M. K. (1998). Getting to parallel: Assessing the return on expectations of training. Performance Improvement, 37(8), 21-26.

National Conference of State Legislatures. (2001). State allocation of tobacco settlement funds FY 2000 and FY 2001. Washington, DC: Health Policy Tracking Service, National Council of State Legislatures.



O'Donnell, M. P. (1989). Definition of health promotion: Part III: Expanding the definition. American Journal of Health Promotion, 3(3), 5.

Osborn, A. F. (1948). Your creative power. New York: Scribner.

Rosenberg, S., & Kim, M. P. (1975). The method of sorting as a data gathering procedure in multivariate research. Multivariate Behavioral Research, 10, 489-502.

Shern, D. L., Trochim, W., & LaComb, C. A. (1995). The use of concept mapping for assessing fidelity of model transfer: An example from psychiatric rehabilitation. Evaluation and Program Planning, 18, 2.

Trochim, W. (1989a). An introduction to concept mapping for planning and evaluation. Evaluation and Program Planning, 12, 1-16.

Trochim, W. (1989b). Concept mapping: Soft science or hard art? Evaluation and Program Planning, 12, 87-110.

Trochim, W. (1993, November). Reliability of concept mapping. Paper presented at the annual conference of the American Evaluation Association, Dallas.

Trochim, W., Cook, J., & Setze, R. (1994). Using concept mapping to develop a conceptual framework of staff's views of a supported employment program for persons with severe mental illness. Consulting and Clinical Psychology, 62, 766-775.

Trochim, W., & Linton, R. (1986). Conceptualization for evaluation and planning. Evaluation and Program Planning, 9, 289-308.

U.S. Department of Health and Human Services. (2000a). Reducing tobacco use: A report of the surgeon general. Atlanta: U.S. Department of Health and Human Services, Centers for Disease Control and Prevention, National Center for Chronic Disease Prevention and Health Promotion, Office on Smoking and Health.

U.S. Department of Health and Human Services. (2000b). Healthy People 2010 (2nd ed., Vols. 1-2). Washington, DC: Government Printing Office.

Weller, S. C., & Romney, A. K. (1988). Systematic data collection. Newbury Park, CA: Sage.

Witkin, B., & Trochim, W. (1997). Toward a synthesis of listening constructs: A concept map analysis of the construct of listening. International Journal of Listening, 11, 69-87.




RESEARCH PAPER

Development of a model of the tobacco industry's interference with tobacco control programmes

W M K Trochim, F A Stillman, P I Clark, C L Schmitt

Tobacco Control 2003;12:140–147

Objective: To construct a conceptual model of tobacco industry tactics to undermine tobacco control programmes for the purposes of: (1) developing measures to evaluate industry tactics, (2) improving tobacco control planning, and (3) supplementing current or future frameworks used to classify and analyse tobacco industry documents.
Design: Web based concept mapping was conducted, including expert brainstorming, sorting, and rating of statements describing industry tactics. Statistical analyses used multidimensional scaling and cluster analysis. Interpretation of the resulting maps was accomplished by an expert panel during a face-to-face meeting.
Subjects: 34 experts, selected because of their previous encounters with industry resistance or because of their research into industry tactics, took part in some or all phases of the project.
Results: Maps with eight non-overlapping clusters in two dimensional space were developed, with importance ratings of the statements and clusters. Cluster and quadrant labels were agreed upon by the experts.
Conclusions: The conceptual maps summarise the tactics used by the industry and their relationships to each other, and suggest a possible hierarchy for measures that can be used in statistical modelling of industry tactics and for review of industry documents. Finally, the maps enable hypothesis of a likely progression of industry reactions as public health programmes become more successful, and therefore more threatening to industry profits.

See end of article for authors' affiliations.

Correspondence to: William M Trochim, 249 MVR Hall, Department of Policy Analysis & Management, Cornell University, Ithaca, NY 14853, USA; wmt1@cornell.edu

Received 2 January 2003. Accepted 23 January 2003.

A substantial peer reviewed literature exists describing the great variety of strategies and tactics the tobacco industry uses to undermine public health. A good deal of this work has documented, at least qualitatively, the tobacco industry's specific actions to prevent or undermine tobacco control programmes and organisations.1–10 The tobacco industry has been concerned that large scale, comprehensive tobacco control programmes would reduce smoking and thus reduce profits.11 12

A prime example of a programme that the industry perceived as a threat was the American Stop Smoking Intervention Study (ASSIST),13 14 which was the first, large multi-state initiative (1991 to 1999) that sought to reduce tobacco use by changing the sociopolitical environment through media and policy advocacy, and the development of state infrastructure to deliver tobacco control.15 Given its scope, it is not surprising that ASSIST caught the attention of the tobacco industry. For example, Andrew H Tisch, then chairman and CEO of Lorillard Tobacco Company, delivered a speech in 1992 that described how threatening the ASSIST programme was to the industry.16

A major purpose of the ASSIST project was the evaluation of its effects. Detailed measures were collected on both the programmes (including the capacity, resources, and efforts involved in implementing the various programme components) and outcomes (both intermediate and long term). However, because of the presence of the industry, tobacco control programmes cannot be evaluated like most other programmes. While local, state, and federal governments are expending resources to reduce smoking rates and promote tobacco control, the tobacco industry is expending significant resources to promote sales of their product, influence governments, and undermine these programmes. The industry's anti-tobacco control efforts constitute a countervailing force to tobacco control programmes that needs to be considered when evaluating programme effectiveness since industry efforts could actually swamp any impact coming from these programmes, reduce measurable outcomes, and lead to an underestimation and devaluation of the impact and effectiveness of tobacco control efforts.

The ASSIST evaluation was the first major tobacco control evaluation to hypothesise a relationship between the industry's anti-tobacco control efforts and the programme.17 ASSIST included the construct of pro-tobacco efforts in the overall evaluation model (fig 1). However, before this construct can be operationalised, it needs to be conceptualised well. Categorising the dimensions of anti-tobacco control tactics and building a comprehensive model of these actions is a necessary first step toward development of measurable components and indices that can be used in programme evaluation. While originating in connection with the ASSIST initiative, this problem of accounting for industry counter-efforts is not limited to that context alone, but is of relevance in the evaluation of any tobacco control programme.

Currently, there is no overarching conceptual model that could guide operationalisation of measures of industry tactics that might be useful for evaluation. Outside of the informal and anecdotal literature on specific industry tactics, about the closest thing to a current standardised framework that might be applicable for describing industry tactics is the UCSF/ANRF Tobacco Documents Thesaurus, a detailed glossary of terms used to index tobacco industry documents.18 However, the Thesaurus was not designed to provide a conceptual framework for tobacco industry tactics. It is essentially a vocabulary of standard subject terms, or keywords, used to index and describe all documents in the tobacco control field.19 20 While essential for document research, it has little utility for operationalising measures of industry tactics.

Abbreviations: ASSIST, American Stop Smoking Intervention Study; MDS, multidimensional scaling; UCSF/ANRF, University of California San Francisco/American Nonsmokers Rights Foundation


Figure 1 General conceptual model for the ASSIST evaluation.17

This paper describes the development of a comprehensive conceptual map of the tactics that the tobacco industry uses to undermine tobacco control efforts. The resulting conceptual map, developed in the context of the ASSIST evaluation, has utility beyond that context for the development of measures for programme evaluation, for improving strategic level tobacco control programme planning, and for informing current or future frameworks used to classify and analyse tobacco industry documents.

METHODS
The concept mapping methodology21 was used to develop the conceptual model of pro-tobacco tactics. Concept mapping is a participatory mixed methods approach that integrates group process activities (brainstorming, unstructured pile sorting, and rating of the brainstormed items) with several multivariate statistical analyses (multidimensional scaling and hierarchical cluster analysis) to yield both statistical and graphic representations of a conceptual domain.

Participants
The participants were selected because they had previously encountered overt industry resistance to tobacco control programming and/or research, had published research arising from searches of the industry documents, or had otherwise demonstrated understanding of industry challenges to tobacco control. Among those represented, all were from the USA; 15 were academics, seven represented advocacy organisations, seven contract research organisations, four government agencies, and five were from tobacco control funding organisations (classifications not mutually exclusive). All participants (n=34) utilised a web based program* to participate in the mapping process (brainstorming, or sorting and rating, or both). A subset of this group (n=13) participated in a face-to-face expert panel to interpret the results of the electronic mapping process.

Procedures
The general procedure for concept mapping is described in detail elsewhere.21 There were four distinct phases in the process: brainstorming, sorting and rating, data analyses and generation of the maps, and expert panel interpretation of the maps.

Brainstorming
The experts logged on to a private web page over a four week period. Each brainstormed statement was generated in response to the prompt: "One specific activity/tactic the tobacco industry uses to oppose tobacco control is . . ." They entered the statements in a list without regard to structure, hierarchy, or clustering of statements. The process resulted in generation of 226 statements.

In preparation for the sorting and rating task, the 226 statements were edited and consolidated. The process used was one of grouping statements that were similar, then constructing one statement that captured the content of the group of statements. The goal was to have a set of mutually exclusive statements, with only one main idea in each, and with no loss of content from the original list. In this manner, the original 226 statements were consolidated into the final set of 88 statements.†

Sorting and rating
Twenty one of the experts were asked to log on to another web page for the sorting and rating tasks. Each conducted an unstructured sorting of the statements.22–24 They grouped the brainstormed statements into piles "in a way that makes sense to you". The only restrictions in this sorting task were that each statement could not be its own pile, there could not be a pile consisting of all the statements, and there could be no "miscellaneous" pile (any item thought to be unique was to be put into its own pile). Each expert was asked to supply a brief label that summarised the contents of each of their groups/piles.

Each participant was then asked to rate the 88 statements with these instructions: "Rate each statement on a 1 to 5 scale for its relative importance in undermining tobacco control efforts. Use a 1 if the statement is relatively unimportant (compared to the rest of the statements) in undermining tobacco control efforts; use a 5 if it is extremely important. Although every statement probably has some importance (or it wouldn't have been brainstormed), try to spread out your ratings and use each of the five rating values at least several times."

Data analyses and generation of the maps
The analyses‡ began with construction from the sort information of a binary, symmetric matrix of similarities. For any two items, a 1 was assigned if the two items were placed in the same pile by the participant, otherwise a 0 was entered.23

*The Concept System Global© web software was used for all web processes on this project. Further information on the software may be obtained from Concept Systems Inc, http://www.conceptsystems.com/
†Detailed and intermediate results, including the original list of 226 brainstormed statements, can be obtained at http://omni.cornell.edu/tactics/
‡All analyses were accomplished and results produced using the Concept System software, version 1.75. Further information on the software may be obtained from Concept Systems Inc, http://www.conceptsystems.com/


Table 1 Statement numbers, statements within clusters listed in descending order of average importance, and
importance rating mean and standard deviation (SD)
Number Statement Mean SD

Lobbying and legislative strategy 3.71 0.94


85 Writing and pushing pre-emptive legislation at state level 4.67 0.58
8 Creating loopholes in laws and agreements (e.g. the MSA) to allow business as usual 4.57 0.68
26 Contributing funds to political groups at federal, state and local level, to support industry goals 4.43 0.98
53 Using clout to influence introduction, advancement, modification, or suppression of bills in legislative bodies 4.38 0.74
87 Lobbying to assure that funds directed to tobacco control are diverted to non-tobacco control initiatives 4.33 0.73
27 Using clout to limit powers of regulatory agencies (jurisdiction, procedures, budgets) 4.29 0.78
63 Providing legislators with contributions, gifts, and other perks 4.10 0.77
44 Promoting partial or weak measures as an alternative to effective measures 4.10 0.77
52 Inserting limiting language in legislation, such as “knowingly” sell tobacco to minors 4.05 0.74
13 Writing weak tobacco control legislation then arguing that tobacco control measures are ineffective 3.86 0.85
17 Ghost writing non-tobacco bills (e.g. sewage) with clauses that if enacted, would bring pre-emption via the backdoor 3.71 0.90
7 Lobbying government officials to set unrealistic tobacco control goals to ensure programme failure 3.67 1.20
61 Using political and/or monetary clout to delay funding of tobacco control programmes 3.67 1.06
36 Lobby to assure that funds are diverted to ineffective tobacco control activities 3.67 1.06
62 Working against campaign finance reform to maintain influence 3.62 1.12
21 Working against strengthening campaign and lobbying disclosure laws 3.57 1.08
19 Promoting tort reform 3.38 1.24
41 Using clout to assign tobacco control programmes to hostile/apathetic agencies for implementation 3.19 1.08
76 Conducting “briefings” of members of Congress, allies, and consultants to sway opinion on an issue 3.14 1.06
1 Promoting smokers’ rights legislation 3.05 1.02
29 Use of tobacco companies subsidiaries (i.e. Miller and Kraft) in political opposition to tobacco control legislation 3.05 1.12
10 Ensuring supportive legislators will lob soft questions during testimony 2.38 0.92
2 Using tobacco employees to lobby against legislation with the excuse that it threatens their job security 2.38 1.16

Legal and economic intimidation 3.46 1.04


16 Devoting considerable resources to legal fights 4.76 0.44
65 Create and fund front groups 3.81 1.12
46 Assuring that court battles are fought in favourable jurisdictions 3.76 0.83
64 Infiltrating official and de facto regulatory organisations (like ASHRAE) 3.43 1.16
58 Filtering documentation through their attorneys in order to hide behind attorney work product 3.29 1.35
9 Encourage (or fail to discourage) smuggling as a way to counter tax hikes. 3.10 1.26
4 Counter tax increases with promotions and cents off 3.05 1.20
48 Threatening to withdraw support from credible groups to control 2.48 0.98

Usurping the agenda 3.39 1.12


42 Developing alliances with retailers, vendors, and the hospitality industry in opposition to public health policies 3.90 0.89
40 Usurping the public health process, such as creating their own youth tobacco prevention programmes 3.33 1.20
22 Avoiding regulatory and legislative interventions by establishing their own programmes such, as “We Card” 3.24 1.04
66 Promoting a tobacco control focus that is limited to youth issues 3.24 1.26
35 Shifting blame to the victims (e.g. passing youth possession laws to punish youths) 3.24 1.22

Creating illusion of support 3.27 1.09


54 Using legal and constitutional challenges to undermine federal, state, and local legislative and regulatory initiatives 4.52 0.75
81 Using anti-lobbying legislation to suppress tobacco control advocacy 3.57 1.16
68 Flying in cadre of “experts” to fight local/state legislation 3.43 0.98
39 Creating the illusion of a pro-tobacco grassroots movement through direct mail database and paid-for petition names 3.19 1.21
60 Using international activities to avoid domestic rules on ads, taxation, etc 3.05 1.02
33 Entering false testimony and false data into the public record 2.95 1.20
75 Tying states’ MSA money to increases/decreases of smoking prevalence 2.95 1.32
59 Using employees and their families to make campaign contributions that are difficult to track 2.52 1.08

Harassment 3.26 1.19


43 Intimidating opponents with overwhelming resources 4.38 0.74
32 Using the courts, and threats of legal action to silence opponents 4.19 0.93
37 Harassing tobacco control workers via letters, FOIAs, and legal action. 3.43 1.43
56 Silencing industry insiders 3.19 1.36
23 Hassling prominent tobacco control scientists for their advocacy work 3.00 1.45
3 Infiltrating tobacco prevention and control groups 2.81 1.17
25 Trying to undermine those selling effective cessation products 1.81 1.25

Undermining science 3.26 1.09


11 Creating doubt about the credibility of science by paying scientists to disseminate pro-tobacco information 3.76 0.77
18 Sowing confusion about the meaning of statistical significance and research methods 3.57 1.12
38 Creating scientific forums to get pro-tobacco information into the scientific literature 3.33 1.24
5 Influencing scientific publication by paying journal editors to write editorials opposing tobacco restrictions 3.10 1.09
71 Creating doubt about the credibility of science by paying scientists to provide expert testimony 3.10 1.22
80 Creating doubt about the credibility of legitimate science by paying scientists to conduct research 3.05 1.16
86 Conducting studies that, by design, cannot achieve a significant result 2.90 1.04

Media manipulation 2.91 1.13


77 Using advertising dollars to control content of media 3.71 0.96
34 Putting own “spin” on the issues by manufacturing information sources 3.43 1.12
67 Taking advantage of the “balanced reporting” concept to get equal time for junk science 2.86 1.20
69 Ghost writing pro-tobacco articles 2.76 1.22
6 Avoiding the key health questions by saying they are not experts and then not agreeing with the experts 2.71 1.27
84 Misrepresenting facts in situations where there is no time to verify 2.67 0.97
74 Publicly acknowledging the risk of tobacco use, but minimising the magnitude 2.67 1.20
30 Publicising research into “safe cigarettes” 2.48 1.12


Public relations 2.85 1.10


12 Using philanthropy to link their public image with positive causes 4.00 0.89
28 Using philanthropy to build a constituency of support among credible groups 3.62 0.80
73 Diverting attention from the health issues by focusing attention on the economic issues 3.48 0.98
51 Distracting attention from the real issues with alternative stances such as accommodation and ventilation 3.38 1.40
88 Asserting that restrictions on tobacco could lead to restrictions on other industries and products 3.38 0.92
14 Minimising importance of misdeeds in the past by claiming they have changed 3.24 1.41
20 Argue that tobacco control policies are anti-business 3.19 1.03
72 Maintaining that the tobacco industry is of critical importance to the economy 3.19 1.08
45 Portraying themselves “responsible”, “reasonable” and willing to engage in a “dialogue” 2.90 1.34
78 Misrepresenting legal issues to naive reporters and stock analysts 2.86 1.20
79 Feeding pro-tobacco information to market analysts who are predisposed to accepting and transmitting it 2.86 1.20
15 Representing people as “anti-smoker” instead of anti-smoking 2.81 1.03
82 Developing pro-tobacco media content, such as videos and press releases 2.67 0.97
83 Painting tobacco control activists as extremists 2.67 1.15
55 Pretending that the “real” tobacco control agenda is prohibition 2.57 1.08
57 Casting tobacco control as a civil rights threat 2.52 1.25
49 Portraying tobacco control as a class struggle against poor and minority groups 2.48 0.98
24 Extensive media training for executives who will be in the public eye 2.43 1.12
70 Shifting attention toward lawyers’ monetary gains and away from tobacco litigation 2.38 1.20
47 Avoiding losing public debates by overcomplicating simple issues 2.29 1.15
31 Blaming it on “fall-guys” (past or rogue employees) when the industry is caught misbehaving 2.00 1.22
50 Refusing or avoiding media debates where they think they will do poorly 1.71 0.72

ASHRAE, American Society of Heating, Refrigerating and Air-Conditioning Engineers; FOIA, Freedom of Information Act; MSA, Master Settlement Agreement.

entered.23 The total similarity matrix was obtained by summing across the individual matrices. Thus, any cell in this matrix could take integer values between 0 and 22 (the number of people who sorted the statements); the value indicates the number of people who placed the pair in the same pile. In addition, in this analysis the final matrix was filtered by changing any matrix values of 1 to a 0. In effect, this means that there needed to be at least two participants who place any two statements together in order for them to be considered at all similar. This filtering helps minimise the effects of any errors or spuriousness in sorting on the final results.

The total similarity matrix was analysed using non-metric multidimensional scaling (MDS) analysis25 with a two dimensional solution. The solution was limited to two dimensions because of ease of use considerations.26

The x,y configuration output from MDS was the input for the hierarchical cluster analysis utilising Ward’s algorithm27 as the basis for defining a cluster. Using the MDS configuration as input to the cluster analysis in effect forces the cluster analysis to partition the MDS configuration into non-overlapping clusters in two dimensional space. There is no simple mathematical criterion by which a final number of clusters can be selected. The procedure followed here was to examine an initial cluster solution that was the maximum thought desirable for interpretation in this context. Then, successively lower cluster solutions were examined, with a judgment made at each level about whether the merger seemed substantively reasonable.

The MDS configuration of the statement points was graphed in two dimensions. This “point map” displayed the location of all the brainstormed statements, with statements closer to each other generally expected to be more similar in meaning. A “cluster map” was also generated that displayed the original statement points enclosed by polygon shaped boundaries that depict the clusters.

The 1 to 5 importance rating variable was averaged across persons for each item and each cluster. This rating information was first depicted graphically in a “point rating map” showing the original point map with the average rating per item displayed as vertical columns in the third dimension and, second, in a “cluster rating map” that showed the cluster average rating using the third dimension.

Expert panel interpretation of the maps
A panel of 13 tobacco control experts who were members of the larger group was convened for a face-to-face meeting to review and interpret the results of the mapping process. The interpretation session followed a structured process described in detail in Trochim.21 Participants examined the maps to determine whether they made intuitive sense and to discuss what the maps might imply about the ideas that underlie their conceptualisation. They discussed each cluster until a consensus was reached on an acceptable cluster label. Participants then examined the labelled cluster map to identify any interpretable groups of clusters or “regions”. These were discussed and partitions drawn on the map to indicate the different regions. Just as in labelling the clusters, the group then arrived at a consensus label for each of the identified regions. This step-by-step interpretation culminated in a discussion of the overall meaning of the various maps and representations, and in the articulation of a conceptual model of pro-tobacco tactics.

RESULTS
The usual statistic that is reported in MDS analyses to indicate the goodness-of-fit of the two dimensional configuration to the original similarity matrix is called the “stress value”. A lower stress value indicates a better fit. In a study of the reliability of concept mapping, Trochim reported that the average stress value across 33 projects was 0.285, with a range from 0.155 to 0.352.28 The stress value in this analysis was 0.237, which is better (that is, lower) than average.

The pattern of judgments of the suitability of different cluster solutions was examined and resulted in acceptance of an eight cluster solution as the one that both preserved the most detail and yielded substantively interpretable clusters of statements. The 88 statements are shown in table 1 in descending order by average importance within the eight clusters, along with their standard deviations. The point cluster map in fig 2 shows all of the pro-tobacco tactics statements (points) in relation to each other.
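The analysis steps just described are easy to approximate with standard open-source tools. The sketch below is illustrative only (the study itself used the Concept System software): the `sorts` data are randomly generated placeholders standing in for the 22 participants' pile sorts, and subtracting the similarities from the number of sorters is just one simple way to obtain the dissimilarities that MDS expects.

import numpy as np
from sklearn.manifold import MDS
from scipy.cluster.hierarchy import linkage, fcluster

N_STATEMENTS, N_SORTERS = 88, 22

# Placeholder sorts for illustration: each sorter's piles of statement indices (0..87).
rng = np.random.default_rng(0)
sorts = {s: np.array_split(rng.permutation(N_STATEMENTS), rng.integers(5, 15))
         for s in range(N_SORTERS)}

def total_similarity(sorts):
    """Sum the individual co-occurrence matrices; cell values range from 0 to 22."""
    total = np.zeros((N_STATEMENTS, N_STATEMENTS), dtype=int)
    for piles in sorts.values():
        for pile in piles:
            for i in pile:
                for j in pile:
                    total[i, j] += 1
    return total

sim = total_similarity(sorts)
sim[sim == 1] = 0                        # filter: require at least two co-sorts

dissim = N_SORTERS - sim                 # one simple similarity-to-distance choice
np.fill_diagonal(dissim, 0)

# Two dimensional non-metric MDS; mds.stress_ is scikit-learn's stress measure,
# not necessarily on the same scale as the stress values cited in the text.
mds = MDS(n_components=2, metric=False, dissimilarity="precomputed", random_state=0)
xy = mds.fit_transform(dissim)

# Ward's hierarchical clustering of the MDS coordinates, cut at eight clusters.
clusters = fcluster(linkage(xy, method="ward"), t=8, criterion="maxclust")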


Figure 2 Point cluster map showing the multidimensional scaling arrangement of the 88 statements with the eight cluster solution
superimposed.

Figure 3 shows the cluster rating map where the layers of each cluster depict the average importance rating, with more layers equivalent to higher importance. Note that the average represented by the layers in the map is actually a double averaging—across all of the participants and across all of the factors in each cluster. Consequently, even slight differences in averages between clusters are likely to be meaningfully interpretable. The map shows that clusters along the bottom are judged more important in undermining anti-tobacco efforts.

Expert panel interpretation
The expert panel interpreted the map and table in terms of several interesting patterns. The four clusters across the top were thought to describe the messages that the tobacco industry issues or tries to control—what the tobacco industry says. This includes attempts to undermine science and legitimate messages from scientific quarters (Undermining science), the manipulation of the media (Media manipulation), the industry’s public relations efforts (Public relations), and the tactics they use to gain control of the public agenda (Usurping the agenda). The four clusters across the bottom describe industry actions—what the tobacco industry does. This includes lobbying efforts (Lobbying and legislative strategy), the use of front groups and artificially created “grassroots” movements (Creating the illusion of support), intimidation (Legal and economic intimidation), and harassment of tobacco control professionals (Harassment).

The participants also interpreted a horizontal dimensionality. Toward the left on the map are clusters that represent tactics that are more hidden or covert in nature. On the right are tactics that tend to be more overt or public in nature. The dimensional interpretation is not meant to suggest that any cluster would be exclusively classifiable into one or the other extreme on a dimension. Undermining science is not exclusively Covert, while Lobbying and legislative strategy is not exclusively public. The relational nature of the map suggests that the clusters vary along the public-covert and message-action dimensions with varying levels of each end point present in each cluster.

Members of the expert panel then suggested that the two dimensions can be viewed as forming four quadrants based on the 2 × 2 combination of these dimensions and provided a short label for each quadrant: Public + Messages = Issue framing; Public + Action = Lobbying tactics; Covert + Messages = Science PR (public relations); and Covert + Action = Harassment.

Finally, the expert panel discussed these dimensionalities and agreed upon a final labelling for all areas of the map. These features are all depicted in fig 3.

DISCUSSION
The primary purpose of this project was the development of a conceptual framework that describes the tactics the tobacco industry uses to undermine tobacco control programmes. Such a framework may be used in a variety of ways. Here, we discuss the potential utility of the framework for evaluation measurement development, strategic planning, and to support efforts to classify and analyse tobacco industry documents.

Use in measurement development
Figure 3 could be used as the basis for the development of an index of tobacco tactics. To do so would require that each of the clusters be operationalised. The statements within each cluster suggest potential elements that might be measured as part of the index. For instance, one statement in the cluster Lobbying and legislative strategy was “Promoting smokers’ rights legislation”. This could be operationalised at the state level as the number of proposed bills or a measure of the amount of relevant legislative committee activity. Another statement was “Lobby to assure that funds are diverted to ineffective tobacco control activities”. Here, measures of tobacco control programme funding and evidence of lobbying activities might be utilised. In this manner, the statements in each cluster can act as prompts or suggestions for potential operationalisations.


Figure 3 Concept map showing clusters, cluster labels, relative importance ratings, and experts’ interpretations of dimensions and regions.

In addition, the overall structure of the map suggests how such an index might be aggregated. For instance, sub-index scores for the clusters Public relations and Usurping the agenda can be aggregated into a total score that represents Issue framing. Moving one level up the hierarchy, the four sub-index scores that represent the quadrants can be aggregated into an overall index of Pro-tobacco tactics.

We know from the map results that the expert panel did not view all of the tactics as equally important. This importance rating information can be incorporated into the development of an index such that sub-index scores for each cluster are weighted by the average importance and the final index aggregation weighted by quadrant importance averages.

Use in tobacco control planning
The conceptual map can provide a high level strategic view of industry tactics that can help tobacco control planners better anticipate the tactics that the industry might use in certain circumstances. For instance, a potentially useful aspect of the map that surfaced in the interpretation can be seen as one moves from the right to the left side. The overt public industry tactics on the right of the map tend to be ongoing activities that the industry does routinely. Like virtually all other major industries, the tobacco industry has ongoing public relations and lobbying efforts as suggested in the clusters on the right. But, how does the industry change its tactics in response to the perceived threat of increasing tobacco control efforts? The map and the expert panel suggested that they probably do so by moving from upper right toward lower left. Initially they most likely augment their public relations and lobbying efforts. If the tobacco control efforts become salient enough, the map suggests that the industry will be pressured increasingly to the more covert activities on the left side that include undermining science, legal and economic intimidation, and harassment.

The map thus provides a high level strategic model of the industry’s response to increased tobacco control efforts. This model can be used in tobacco control planning to better anticipate what the industry may do next.

Use in tobacco document analysis
The conceptual framework can be used for classifying industry documents specifically with respect to industry tactics and, as such, would augment and extend existing document classification and indexing procedures like the UCSF/ANRF Tobacco Documents Thesaurus. For example, each document could be classified for its relevance to the eight cluster areas. Once done, it would be immediately possible to retrieve all documents that provide evidence for a particular type of tactic (for example, cluster), or display all documents that reflect a broader cross-cutting (for example, a column category like “covert”) activity on the part of the industry.

For example, consider the cluster Creating illusion of support in the lower part of the map. The statements in that cluster (table 1) indicate several key sub-topics that are relevant and could help guide both the searching and classifying of documents.

For instance, for the statement “Creating the illusion of a pro-tobacco grassroots movement through direct mail database and paid-for petition names”, one document identified as relevant is a 1994 Philip Morris presentation that described their efforts to create the illusion of support: “We also are mobilising support among our consumers. Consumers who respond to our brand promotions receive an insert with their fulfillment packages . . . so far, more than 400,000 consumers have responded, and the programme has generated some 80,000 letters to Capital Hill, about 10,000 per month.”29
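To make the index construction suggested under “Use in measurement development” concrete, the aggregation might be sketched as below. Only the Issue framing grouping is stated explicitly above; the remaining quadrant memberships are assumed here for illustration, and the cluster scores and importance weights would be supplied by the analyst (for example, from the average ratings in table 1), not by this sketch.

quadrants = {
    "Issue framing":    ["Public relations", "Usurping the agenda"],
    # The three groupings below are assumed for illustration only.
    "Lobbying tactics": ["Lobbying and legislative strategy",
                         "Creating the illusion of support"],
    "Science PR":       ["Undermining science", "Media manipulation"],
    "Harassment":       ["Legal and economic intimidation", "Harassment"],
}

def pro_tobacco_index(cluster_scores, cluster_importance, quadrant_importance):
    """Roll cluster sub-index scores up to quadrant scores and an overall index,
    weighting each level by its average importance rating."""
    quad_scores = {}
    for quad, members in quadrants.items():
        weights = [cluster_importance[c] for c in members]
        scores = [cluster_scores[c] for c in members]
        quad_scores[quad] = sum(w * s for w, s in zip(weights, scores)) / sum(weights)
    overall = (sum(quadrant_importance[q] * s for q, s in quad_scores.items())
               / sum(quadrant_importance.values()))
    return quad_scores, overall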


Similarly, for the statement “Using employees and their families to make campaign contributions that are difficult to track”, one relevant document is a 1997 Brown and Williamson letter which states that “as a Brown & Williamson employee, you can play a major role in influencing elections, the future of our business and, of course, our respective jobs” by “making contributions to the B&W Employee Political Action Committee”. The letter discusses previous contribution levels for 1996 and options for method of contributing (payroll deductions or personal checks) and asks for a $200 contribution from each eligible participant.30 Or, for the statement “Flying in cadre of ‘experts’ to fight local/state legislation”, a 1993 Philip Morris document describes the objective “to support the defeat of unwarranted smoking restrictions and to discourage unfair discrimination against smokers”. Goals and tactics were: “promotion of ETS in the context of indoor air quality and use of experts to directly and indirectly influence legislation, rule-making and standards in relation to ETS and workplace smoking issues.”31 These examples are meant to illustrate how the conceptual map can be used both as a suggestive device when searching the documents and as an expert derived hierarchical thematic taxonomy of pro-tobacco tactics that can be useful in coding and organising the documents subsequently identified.

Another document related application would be to develop a cross referencing between the map categories and other classification systems such as the UCSF/ANRF Tobacco Documents Thesaurus. For example, the Thesaurus includes the terms “lobbying”, “industry front group”, and “industry sponsored research” which could be linked with the map categories Lobbying and legislative strategy, Creating illusion of support, and Undermining science, respectively. This type of cross referencing would enable the tobacco documents to be accessed immediately through different conceptual schema that were devised for different purposes, without having to reclassify all documents from scratch.

In addition to its use in addressing the three issues described above, the conceptual framework can act as an organising device that encourages greater synergy between the three activities. For example, if in a local context, tobacco control planners determine that the industry is likely to increase its efforts in creating the illusion of support in the immediate future, the planners could examine that cluster on the map to help determine the specific tactics the industry might use, to think about how to measure or track the industry’s effort in this area, and to access the tobacco document evidence relevant to that cluster that describes the history of similar activities in other contexts.

Additional work could enhance the utility of this framework for document analysis. In this study, participants were asked to brainstorm industry tactics from their point of view and in their own language. This creates, in effect, a map that is decidedly anti-tobacco in its perspective. But the tobacco documents themselves are generated from an opposing perspective, using euphemisms and industry code terms designed to portray their pro-tobacco efforts in a good public light. Where anti-tobacco researchers might, for instance, talk about the industry “paying scientists to conduct research to create doubt about legitimate science” (statement 80), it is unlikely that industry documents would describe their activities in a similar manner. Document searches that rely directly on the language of the map are unlikely to be fruitful or get at the desired topics. This suggests that it would be useful to develop the type of cross referencing to the Thesaurus that was discussed above.

Finally, there were activities of the tobacco industry, such as manipulating product chemistry or price, that were not included in this map because the focus in this project was on specific activities/tactics the industry uses to undermine tobacco control programmes. The manipulation of chemistry or price were not perceived by participants as “tactics” for undermining tobacco control per se. Despite not being considered industry tactics for undermining tobacco control programmes, the importance of these issues is undeniable and they need to be addressed in comprehensive evaluations of tobacco control programmes.

Regardless of the real world potential uses for the conceptual map, the structure is an intriguing one in its own right. It summarises a very complex area concisely and provides a compelling theoretical model that needs to be tested and extended empirically in follow up work. Replications of this study could be used to determine the reliability and generalisability of the model. In addition, the model is general enough at its highest level to be a potential framework that might be applied to understanding the tactics of other industries that attempt to undermine the legitimate work of public health programmes.

What this paper adds
A major challenge in evaluating tobacco control efforts is the need to measure tobacco industry counter-efforts and their effects. Currently, no overarching conceptual model exists to guide operationalisation of measures of industry tactics that might be useful for evaluation. This study used a web based multivariate concept mapping methodology with a panel of tobacco control experts to develop a conceptual model of the tobacco industry interference with tobacco control programmes.
The resulting conceptual maps summarise the tactics used by the industry and their relationships to each other, and suggest a possible hierarchy for measures that can be used in statistical modelling of industry tactics and for review of industry documents. Finally, the maps enable hypothesis of a likely progression of industry reactions as public health programmes become more successful, and therefore more threatening to industry profits.

ACKNOWLEDGEMENTS
This article was supported by contract number N01-CP-95030 from the National Cancer Institute. Its contents are solely the responsibility of the authors and do not necessarily represent the official view of the National Cancer Institute.

Authors’ affiliations
W M K Trochim, Department of Policy Analysis & Management, Cornell University, Ithaca, New York, USA
F A Stillman, Institute of Global Tobacco Control, Johns Hopkins University, Baltimore, Maryland, USA
P I Clark, C L Schmitt, Battelle Memorial Institute, Centers for Public Health Research and Evaluation, Baltimore, Maryland, USA

REFERENCES
1 Saloojee Y, Dagli E. Tobacco industry tactics for resisting public policy on health. Bull World Health Organ 2000;78:902–10.
2 Glantz SA, Barnes DE, Bero L, et al. Looking through a keyhole at the tobacco industry. JAMA 1995;274:219–24.
3 Cummings KM, Sciandra R, Gingrass A, et al. What scientists funded by the tobacco industry believe about the hazards of cigarette smoking. Am J Public Health 1991;81:894–6.
4 Sweda ELJ, Daynard RA. Tobacco industry tactics. Br Med Bull 1996;52:183–92.
5 Zeltner T, Kessler D, Martiny A, et al. Tobacco company strategies to undermine tobacco control activities at the World Health Organization. Geneva: World Health Organization, 2000. URL: http://filestore.who.int∼who/home/tobacco/tobacco.pdf
6 Samuels B, Glantz SA. The politics of local tobacco control. JAMA 1991;266:2110–7.
7 Goldstein AO, Bearman NS. State tobacco lobbyists and organizations in the United States: crossed lines. Am J Public Health 1996;86:1137–42.
8 Jacobson PD, Wasserman J. The implementation and enforcement of tobacco control laws: policy implications for activists and the industry. Journal of Health Politics, Policy & Law 1999;24:567–98.
9 Givel MS, Glantz SA. Tobacco lobby political influence on US state legislatures in the 1990s. Tobacco Control 2001;10:124–34.
10 Traynor MP, Begay ME, Glantz SA. New tobacco industry strategy to prevent local tobacco control. JAMA 1993;270:479–86.
11 Aguinaga S, Glantz S. The use of public records acts to interfere with tobacco control. Tobacco Control 1995;4:222–30.


12 Bialous SA, Fox BJ, Glantz SA. Tobacco industry allegations of “illegal lobbying” and state tobacco control. Am J Public Health 2001;91:62–7.
13 Author unknown. Synar/ASSIST Task Force. Philip Morris. 1993. Bates No. 2023961347-1359. Accessed 1 November 2001. URL: www.pmdocs.com. Merlo E. Vendor Conference Draft. Philip Morris Corporate Affairs. December, 1993. Draft speech. Bates No. 2040863440-3463.
14 Author unknown. (ASSIST program and Synar Amendment to ADAMHA). July 1992. Philip Morris. Bates No. 2048621152-1175. Accessed 5 November 2001. URL: www.pmdocs.com
15 Manley M, Lynn W, Epps R, et al. The American Stop Smoking Intervention Study for cancer prevention: an overview. Tobacco Control 1998;6(suppl 2):S5–11.
16 Tisch AH. Lorillard article for NY Assoc. of Tobacco and Candy Distributors. Lorillard Tobacco Company. (1992) Bates No. 92761408-9. Accessed 5 August 2001. URL: http://www.lorillarddocs.com
17 Stillman F, Hartman A, Graubard B, et al. The American Stop Smoking Intervention Study: conceptual framework and evaluation design. Evaluation Review 1999;23:259–80.
18 University of California San Francisco. URL: http://www.library.ucsf.edu/tobacco/thesaurus.html
19 Malone RE, Balbach ED. Tobacco industry documents: treasure trove or quagmire? Tobacco Control 2000;9:334–8.
20 Glantz S. The truth about big tobacco in its own words. BMJ 2000;321:313–4.
21 Trochim W. An introduction to concept mapping for planning and evaluation. Evaluation Program Planning 1989;12:1–16.
22 Rosenberg S, Kim MP. The method of sorting as a data gathering procedure in multivariate research. Multivariate Behavioral Research 1975;10:489–502.
23 Weller SC, Romney AK. Systematic data collection. Newbury Park, California: Sage Publications, 1988.
24 Coxon APM. Sorting data: collection and analysis. Sage University Papers on Quantitative Applications in the Social Sciences, 07-127. Thousand Oaks, California: Sage Publications, 1999.
25 Davison ML. Multidimensional scaling. New York: John Wiley and Sons, 1983.
26 Kruskal JB, Wish M. Multidimensional scaling. Beverly Hills, California: Sage Publications, 1978.
27 Everitt B. Cluster analysis, 2nd ed. New York: Halsted Press, a division of John Wiley and Sons, 1980.
28 Trochim W. Reliability of concept mapping. Paper presented at the Annual Conference of the American Evaluation Association, Dallas, Texas, November 1993. Accessed 15 August 2001. URL: http://trochim.human.cornell.edu/research/reliable/reliable.htm
29 Philip Morris. Final draft: EM presentation 30 March 1994. Philip Morris. Bates No. 2024007084-7109. Accessed 21 August 2001. URL: http://www.pmdocs.com/
30 Brown & Williamson. [Memo: Dear Fellow B&W Employee] Brown & Williamson, 13 February 1997. Bates No. 621960913. Accessed 15 August 2001. URL: http://www.bw.aalatg.com/public.asp
31 Philip Morris. Public Smoking. Philip Morris, 1993. Bates No. 2024234063–4075. Accessed 21 August 2001. URL: http://www.pmdocs.com


Concept Mapping as an Alternative Approach for the Analysis of Open-Ended Survey Responses

KRISTIN M. JACKSON
WILLIAM M. K. TROCHIM
Cornell University

This article presents concept mapping as an alternative method to existing code-based and word-based text analysis techniques for one type of qualitative text data—open-ended survey questions. It is argued that the concept mapping method offers a unique blending of the strengths of these approaches while minimizing some of their weaknesses. This method appears to be especially well suited for the type of text generated by open-ended questions as well as for organizational research questions that are exploratory in nature and aimed at scale or interview question development and/or developing conceptual coding schemes. A detailed example of concept mapping on open-ended survey data is presented. Reliability and validity issues associated with concept mapping are also discussed.

Qualitative text data in the form of brief, open-ended survey responses are often elic-
ited in organizational research to gather new information about an experience or topic,
to explain or clarify quantitative findings, and to explore different dimensions of
respondents’ experiences (Sproull, 1988). For example, they can provide details in the
employees’ “own words” as to why they feel stress on the job, why there may be resis-
tance to an organizational change effort, or why employee perceptions have changed
toward an organization policy. The appeal of this type of data is that it can provide a
somewhat rich description of respondent reality at a relatively low cost to the
researcher. In comparison to interviews or focus groups, open-ended survey questions
can offer greater anonymity to respondents and often elicit more honest responses
(Erickson & Kaplan, 2000). They can also capture diversity in responses and provide
alternative explanations to those that closed-ended survey questions are able to capture

Authors’Note: The authors wish to thank Randall Peterson for his thoughtful suggestions on this manu-
script and Alex Susskind for granting us access to the subject pool. This article was previously presented at
the American Evaluation Association Conference, Honolulu, Hawaii, November 2000. Correspondence
concerning this article should be addressed to Kristin M. Jackson, Johnson Graduate School of Manage-
ment, Cornell University, 301 Sage Hall, Ithaca, NY 14853-6201; e-mail: kmj13@cornell.edu.
Organizational Research Methods, Vol. 5 No. 4, October 2002 307-336
DOI: 10.1177/109442802237114
© 2002 Sage Publications

(Miles & Huberman, 1994; Pothas, Andries, & DeWet, 2001; Tashakkori & Teddlie,
1998). Open-ended questions are used in organizational research to explore, explain,
and/or reconfirm existing ideas.
However, the drawbacks are that open-ended survey data are often time-consuming
to analyze, some respondents do not answer the questions, and coding decisions made
by researchers can pose threats to the reliability and validity of the results (Krippendorff,
1980; Seidel & Kelle, 1995). Depending on the method chosen for analysis, there are
different trade-offs that limit the type of inference we can draw and the strength of the-
ory we can build from this type of data (Fine & Elsbach, 2000). In this article, we pres-
ent concept mapping as an alternative method to existing text analysis techniques that
is particularly well suited to the type of text generated by open-ended questions as well
as to the exploratory nature of these types of questions. By blending the strengths of
existing text analysis techniques and coupling them with the use of advanced
multivariate statistical methods, concept mapping offers organizational researchers a
way to code and represent meaning in text data based on respondent input with consid-
erable savings in analysis time and improvement in analytic rigor. Concept mapping
can be used to develop coding schemes and/or reexamine existing theoretical coding
schemes, to develop follow-up interview questions and closed-ended scale items and
to represent the diversity and dimensionality in meaning through analysis of the entire
sample as well as assessment of subgroup differences. Concept mapping offers organi-
zational researchers a chance to make better use of open-ended text data.

Characteristics and Analysis of


Open-Ended Survey Question Text
Open-ended survey responses are extremely useful in helping to explain or gain
insight into organizational issues but at the same time generate both an interesting
and challenging type of text to analyze. This type of text contains characteristics of
shorter “free list” types of text as well as more “narrative” characteristics of longer text
documents. The limited response length of the survey format forces respondents to
express themselves in more of a concise “list” format while at the same time giving
them the opportunity to “vent” or explain themselves in a short narrative form.
Responses typically vary from a few phrases to a couple of paragraphs and represent a
wide variety of concepts with varying frequency and detail—a “free list in context”
type of text.
The analysis of this type of text poses several challenges. The “free list in context”
nature of the data can make it difficult to choose an appropriate methodology. There
has been considerable debate about which methods give the greatest reliability and
validity in representing content in text (Gerbner, Holsti, Krippendorff, Paisley, &
Stone, 1969; Pool, 1959). Open-ended survey responses are challenging because brief
responses (as compared to interview transcripts or journals) are typically sparse, and
the removal of context from concepts is problematic for coder understanding. The sur-
vey format does not allow the opportunity for immediate follow-up questions to
improve understanding. Also, some respondents are more willing or able to express
their answers, respondents typically produce many different kinds of responses, and
responses can generate frequent or infrequent mention of topics that may have differ-
ent importance to the respondents (Geer, 1991; Rea & Parker, 1997; Sproull, 1988).

This type of data makes standardization and reduction into codes very difficult, can
make the reporting of frequencies or co-occurrences less meaningful, and requires
careful justification of analysis decisions.
Ryan and Bernard (2000) have suggested that for analyzing free-flowing text, there
are two broad methodological approaches that can be classified as (a) words as units of
analysis (e.g., keywords in context [KWIC], semantic networks, cognitive maps) ver-
sus (b) codes as units of analysis (grounded theory, traditional content analysis,
schema analysis, etc.). This distinction—between word-based and code-based meth-
odologies—is the starting point for the methodological considerations here.
The central contention of this article is that concept mapping methodology is par-
ticularly well suited for open-ended survey text data because it combines the strengths
of word-based and code-based methodologies while mitigating some of their weak-
nesses. As described here, concept mapping is a type of participatory text analysis that
directly involves respondents or their proxies in the coding of the text. It is a multistep,
hybrid method that uses original intact respondent statements as units of analysis,
solicits the actual survey respondents or respondent proxies who use pile sorting to
“code” the data, aggregates quantitatively across individual conceptual schemes, and
enables data structure to emerge through use of multidimensional scaling and cluster
analysis of the aggregated individual coding data. Because it is based on the coding
schemes of the original survey respondents (or their proxies), it avoids some of the
problems associated with researcher-generated coding schemes. Depending on deci-
sions made at each step, the analysis can vary in the degree to which it is grounded in
existing theory. The result is a visual representation—a map—of thematic clusters.
This article discusses word-based and code-based approaches to text analysis and
argues that concept mapping offers a unique blending of the strengths of each. A
detailed example of the use of concept mapping on open-ended text response data is
presented.

Background

Word-Based Analysis Methods

Methods using words as units of analysis have been applied in organizational


research primarily in inductive qualitative studies seeking to allow data structure to
emerge or to validate a thematic content analysis (e.g., Jehn, 1995). They have several
strengths (see Ryan & Bernard [2000] and Mohammed, Klimoski, & Rentsch [2000]
for a detailed discussion of different methods). Because they use the natural meaning
embedded in language structures to represent meaning in text (Carley & Kaufer, 1993;
Carley & Palmquist, 1992), they can be used to analyze both dense and sparse types of
text. These methods typically employ computer-assisted coding, which has the advan-
tages of time-saving automation, improved reliability of coding, and expanded possi-
bilities for units of analysis—for example, the ability to map (and quantify) the rela-
tional patterns among symbols along a series of dimensions (Carley, 1993; Stone,
1997).
For example, semantic network representations count the co-occurrence of word
units to identify clusters of concepts as well as the attributes (strength, direction) of
relationships between them (Doerfel & Barnett, 1999; Roberts, 1997; Young, 1996).

Consider the following statements: “Tom loves working with Tim. Tim likes working
with Tom but loves working with Joe.” There is a strength difference between the
words like and love, and there is a difference in direction between the strengths in
coworker preference. Text mapping techniques can capture these attributes (Carley,
1993). They can also map the relationships of concepts both within a respondent’s
statement and between respondents along a series of dimensions (e.g., grammatical
patterns or centrality) (Carley, 1997). Cognitive mapping is another word-based tech-
nique that aims to elicit individuals’ judgments about relationships between a set of
important concepts about a topic in a map form that represents a mental model
(Axelrod, 1976). This is useful for comparing cognitive structures about a topic
between respondents (Carley, 1993).
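As a toy illustration (not any particular package) of the word co-occurrence counting that underlies such semantic network representations, consider the two statements above:

from collections import Counter
from itertools import combinations

statements = ["Tom loves working with Tim",
              "Tim likes working with Tom but loves working with Joe"]

cooccurrence = Counter()
for s in statements:
    words = set(s.lower().split())
    for pair in combinations(sorted(words), 2):
        cooccurrence[pair] += 1

# cooccurrence[("tim", "tom")] == 2: the two names co-occur in both statements.
# A full semantic network analysis would additionally weight ties (loves versus
# likes) and record their direction, which this simple count ignores.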
These methods have great strength in that they use words (created by the respon-
dents) for units of analysis, capture relationships between concepts, and allow struc-
ture in the data to emerge based on co-occurrences of words or relational similarities
rather than imposing researcher bias in the form of preconceived thematic categories.
Word analysis techniques often are able to represent relationships that thematic code
methods cannot.
However, although convenience, reliability, and number of coding options offered
by these computer-aided analyses are improved, there are two common validity criti-
cisms. The first is that computers are unable to interpret meaning in symbols, so they
do not add validity to inference (Shapiro, 1997). They do not add an understanding or
explanation of the word unit in its social or psychological context—a human, often the
researcher, is still required to interpret map outputs. Consider the example of coding a
statement that reads, “The employee received the urgent memo and put it in the trash.”
Human judgment might thematically classify or deduce, for example, that either the
memo was not addressed to the recipient or the recipient did not think the memo was
urgent or important. Word-unit analysis can only identify concepts or actions (e.g.,
received the memo, put in trash) and/or the direction of the action (e.g. memo to
employee, memo to trash). These methods are useful in identifying similarities in
responses between individuals but less useful in drawing conclusions about the con-
text of the concepts or about the sample’s responses as a whole. The second criticism is
that even when the analysis does identify these relationships, they continue to be based
on an initial researcher judgment in selecting concepts for analysis, choosing fre-
quency cutoffs for selection, or creating exception dictionaries for the computer to run
analyses (Ryan & Bernard, 2000).

Code-Based Analysis Methods

Code-based analyses, or thematic coding methods, are often used for reducing text
data into manageable summary categories or themes for making inference about a
sample (Krippendorff, 1980; Weber, 1990). These methods are most commonly used
with denser types of text, such as in-depth interview transcripts or employee journals,
in which richer context can lead to the identification of reoccurring themes or meta-
phors. Although they differ in the end result they produce (e.g., grounded theory
approaches seek to build theory through systematic inquiry techniques to discover
themes, whereas content analysis seeks to test theory with preestablished themes
[Denzin & Lincoln, 2000; Ryan & Bernard, 2000]), they share strength in making

clear links between theory and data and in drawing conclusions across (rather than
between, as with word-based approaches) subjects or text blocks in a sample. Because
open-ended survey responses are typically a sparse, list-like type of text, content anal-
ysis has typically been applied to it over other types of code-based approaches. There-
fore, we will focus on criticisms of content analysis in this context.
Content analysis has been criticized for three main reasons: (a) It relies on
researcher-driven classification schemes; (b) it allows interdependence between cod-
ers; and (c) as a methodology, it offers weak reliability and validity assessments (Kelle &
Laurie, 1998; Krippendorff, 1980; Weber, 1990). Preconceived categorical coding
schemes have been criticized for two reasons. First, relying on coding schemes that are
created a priori or through a process of induction by the researcher can create a biased
method of classification that forces meaning into a framework that may or may not
accurately represent the respondent’s meaning. Second, because meaning is not inter-
preted uniformly across individuals, training coders to understand and agree on the
meaning of preestablished categories often leads to intercoder discussion about certain
units or categories to increase interreliability (to force “fit” between the data and the
theoretical framework) of the analysis. In many contexts, content analysis will be an
overly deterministic approach to finding structure in open-ended survey responses.
Finally, results from this type of analysis are often reported in frequency tables, cross-
tabulations, or correlations. The tendency for sporadic mention and wide variety of
concepts typically generated by open-ended responses makes the validity of this kind
of reporting suspect. Nonexhaustive categorical coding schemes pose a common
threat to validity in content analysis (Seidel & Kelle, 1995). This problem can be com-
pounded by the fact that respondents who are more interested in the topic of an open-
ended question are more likely to answer than those who are not as interested (Geer,
1991). Therefore, frequency counts may overrepresent the interested or disgruntled
and leave a proportion of the sample with different impressions of reality
underrepresented in the results. If coding categories are not exhaustive or statements
are coded into a category that is only semirepresentative of the respondent’s reality,
frequency counts and cross-tabs may underrepresent or overrepresent the distribution
of meaning in the sample. It has been suggested that one way to avoid this is to calcu-
late frequencies on the basis of the number of respondents rather than the number of
comments (Kraut, 1996). However, this does not overcome the issue of preconceived
and/or nonexhaustive coding schemes.

Concept Mapping as a Methodological Blend


of Word-Based and Code-Based Approaches

The “free list in context” nature of open-ended survey responses makes it difficult
to choose between the two approaches. On one hand, the free list characteristics of the
data lend themselves nicely to word-based approaches. They can easily recognize
reoccurring words or patterns of words. On the other hand, retaining the context of
those concepts and a desire to analyze the responses as a set representing the whole
sample make code-based analyses more appropriate.
Given the mixed strengths and weaknesses in thematic and word-mapping
approaches, there are likely to be benefits from using both in concert. This could per-
haps most easily be accomplished by analyzing the same data twice, once with each

approach. But there are likely to be efficiencies, and perhaps even new synergies, from
combining features of both approaches into new integrated methods for text analysis.
This article argues that concept mapping is such an integrated approach.
There are several specific methodologies that share the name concept mapping, but
they differ considerably both methodologically and in terms of results. One form of
concept mapping (Novak, 1998; Novak & Gowin, 1997) widely used in education is
essentially an informal process whereby an individual draws a picture of all the ideas
related to some general theme or question and shows how these are related. The result-
ing map usually has each idea in a separate box or oval with lines connecting related
ideas and often labeled with “connective” terms (e.g., leads to, results from, is a part
of, etc.). This has been done in “free form,” where the respondents record whatever
comes to their minds, and also in a more “fixed form,” where respondents construct
meaning among a given set of concepts (Novak, 1998). The cognitive mapping
approach described above (Carley & Kaufer, 1993) is a more statistical variant of this
type of concept mapping. These methods are aimed at representing the mental models
of individuals.
Another form of concept mapping (Trochim, 1989) is a more formal group process
tool that includes a sequence of structured group activities linked to a series of
multivariate statistical analyses that process the group input and generate maps.
Instead of representing the mental models of individual respondents, it depicts an
aggregate representation of the text (across respondents) in the form of thematic clusters
as generated by respondents. The process typically involves participants in brainstorm-
ing a large set of statements relevant to the topic of interest and then in individually sort-
ing these statements into piles based on conceptual similarity (a free or single-pile sort
technique; Weller & Romney, 1988). The individual sort matrices are aggregated sim-
ply by adding them together. The analysis includes a two-dimensional multidimen-
sional scaling (MDS) of the sort data and a hierarchical cluster analysis of the MDS
coordinates. The resulting maps represent a “structured conceptualization” or a multi-
dimensional graphic representation of the group’s set of ideas. Each idea is represented
as a dot or point, with ideas that are more similar (as determined by the multivariate
analysis of the participants’ input) located more proximally. Ideas (i.e., points on the
map) are clustered statistically into larger categories that are overlaid on the base maps.
Thus, the methods referred to as concept mapping range from informal, individual-
oriented approaches to formalized, statistical group processes.
This article concentrates solely on the latter form of more formalized group-oriented
concept mapping, and for the remainder of this article the term concept mapping will
be used to refer only to this variant. Although it has typically been used in group pro-
cess or evaluation applications, it has potential to analyze and represent meaning in
open-ended survey responses. It is similar to word-based approaches in that it allows
for visual representation of conceptual similarities through statistical mapping, but
different in that it retains context by using intact respondent statements as unit of anal-
ysis instead of words. It is similar to code-based approaches because it allows human
judgment to cluster these similarities thematically, but different in that it uses statisti-
cal analysis based on respondent judgments (rather than being researcher-driven) as a
basis for those decisions. The role that theory plays in informing (or biasing, as it may
be) the concept mapping analysis depends on decisions made by the researcher at each
stage of the analysis (e.g., in creating units, choosing sorters, and finishing the cluster
analysis solution).

The Concept Mapping Analysis: An Example


In considering why this method is a good match for the “free list in context” type of
data, it is useful to discuss each step of the analysis through an extended example.
There are five steps in the concept mapping process: (a) Create units of analysis, (b)
sort units of analysis into piles of similar concepts, (c) run the MDS analysis of the
pile-sort data, (d) run the cluster analyses on the MDS coordinates to decide on a final
cluster solution, and (e) label the clusters. The decisions made at each stage of the anal-
ysis (e.g., about how to unitize, how to choose sorters, whom to include in the cluster
replay analysis) have reliability and validity implications. After the example is pre-
sented, the reliability and validity issues associated with the concept mapping analysis
will be discussed.
Content for the concept mapping analysis is generated by the survey responses. The
data presented here were gathered from an open-ended question at the end of a two-
page Likert-type scale questionnaire about group process. The closed-ended questions
were primarily about group conflict, group knowledge, and expectations about the
group outcomes. The open-ended question was intentionally placed at the end of the
questionnaire to make sure the respondents had thought about the way their groups
worked together. The open-ended question was intended to explore what different
types or categories of group norms were operating in a sample of 22 work teams at the
time of measurement. The intent of the analysis was to explore what categories or
themes would emerge from the sample as a whole, not to assess which particular norms
were operating in specific individual teams. These data represent the team members’
answers to the following question, which was part of a larger survey:

What are the norms in your team? (e.g. Group norms have been defined as “written or
unwritten patterns of beliefs, attitudes, communication, and behaviors that become
established among team members.”)

Participants

The survey sample consisted of 22 teams with 76 respondents (a 74% response rate)
from an undergraduate hotel administration course at Cornell University. Each class
member was assigned to a work team of 4 or 5 people at the beginning of the semester.
Each team was then given the task of conceptualizing, opening, and managing a res-
taurant using a restaurant management computer simulation program. The teams
worked together on decision tasks such as marketing, setting menus, facilities
upgrades, staffing, and so on. Final grades were based on restaurant success, a business
plan, and teammate evaluations. The responses to this survey were gathered 1 month
into the semester, after the teams had completed several group assignments and estab-
lished some degree of working history with or understanding of each other. Respon-
dents were instructed that they were to answer all questions on the survey with their
group in mind and were given class time to complete it.

Procedure

Step 1: Creating Units of Analysis. The list-like format of open-ended survey ques-
tion text lends itself to relatively easy creation of units of analysis. A unit of analysis
consists of a sentence or phrase containing only one concept—units can often be lifted

intact from the response because respondents tend to express one idea for each concern
or opinion they list. Otherwise, unitizing is done by breaking sentences into single-
concept phrases. In this way, the context of each concept is retained and is readily
available to the sorters. It is important that each unit only contain one concept so that it
can be considered distinct from other units—for similar reasons that double-barreled
survey questions pose problems. There are two options for making unitizing decisions:
They can be made (a) by two or more researchers together (researchers can also make unitizing
decisions separately, then perform an interrater reliability check) or (b) by a group of
respondents (typically three to four) who work together to create units. The result of
the unitizing process is a set of single-concept statements that are placed on cards for
sorting. The benefit to having the researcher do the unitizing is that involving partici-
pants can be both time-consuming and costly if it is necessary to pay them. The draw-
back is that the way units are created may not reflect the original intent of the respon-
dents. But with this type of text, creating units of analysis is relatively easy. If trade-off
decisions have to be made concerning the amount of access to respondents, it is recom-
mended that respondents be involved in the sorting and cluster-solution stages of the
analysis over the unitizing process.
In this example, the researchers did the unitizing. The respondents’ answers to the
group norms question were, on average, a short paragraph of one to three sentences
and contained different ideas ranging from “don’t know” to statements about commu-
nication, group roles, personalities, and so on. Each answer was broken down into sep-
arate statements containing one idea about a group norm. For example, one response
was, “We have a solid belief that we all want to do well on this project and will work as
hard as possible to achieve a good grade and learn a lot from this project.”
This response was broken down into three separate statements: (a) We have a solid
belief that we all want to do well on this project, (b) we will work as hard as possible to
achieve a good grade, and (c) to learn a lot from this project. This was done for the
entire data set and resulted in 156 statements. To ensure that each unit of analysis
would be considered independently of the others, each statement was given a random
number generated by a random number function and placed on a 2- by 4-inch card.
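The bookkeeping for this step is trivial; a small sketch (with the single-concept split done by hand, as in the study) might look like the following, where the random card numbers simply ensure that each unit is considered independently of its source response.

import random

units = [
    "We have a solid belief that we all want to do well on this project",
    "We will work as hard as possible to achieve a good grade",
    "To learn a lot from this project",
]

random.seed(0)
card_numbers = random.sample(range(1, 157), k=len(units))  # 156 cards in total
cards = dict(zip(card_numbers, units))  # card number -> statement text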
Step 2: Sorting. The next step in the concept mapping process is to have a group of at
least 10 sorters code these units by sorting them into piles of similar statements.1
Sorters are given instructions to put each card in a pile with other cards that contain
statements they think are similar. There is no limit to the number of piles they can cre-
ate. Their only limitation is that they cannot create a “miscellaneous” pile. Any state-
ment they do not judge to be similar to any other statement should be left in its own
pile. This improves the fidelity of the data by excluding the possibility of a “junk” clus-
ter after the final analysis. Finally, they are asked to give each pile a name that they
think most accurately represents the statements in it.
In general, it is recommended that the original respondents do the sorting to ensure
maximum representativeness of the structure that emerges from the MDS analysis.
Using the original respondents eliminates the possibility that the researcher will
impose his or her interpretation of meaning on the data (as in thematic coding schemes
or concept selection in word-analysis methods). However, there are times and circum-
stances that may make using respondents very difficult. For example, there may be
limited access to a sample (e.g., permission to administer the survey only); the respon-
dents may have very limited time to spare (e.g., CEOs); or using the original respon-

dents may be a source of contamination for a follow-up survey (e.g., prime them to dis-
cuss certain issues that will be measured again in the future). In such cases, proxy
sorters can be acceptably substituted based on careful consideration of the following
criteria: (a) how their backgrounds and experiences are similar/different to the respon-
dents and how that might influence their interpretation of units, (b) any theoretical
background/understanding underlying the research topic that they have in common
with the respondents and how a deeper/lesser understanding of that theory may influ-
ence interpretation, and (c) the degree to which existing theoretical frameworks can
provide a basis for comparison in gauging the degree of difference between respondent
content and proxy sorter groupings. Obviously, using the original respondents will
allow for less bias from existing theory or research frameworks. When the original
respondents cannot be used, it is important that it be made publicly known, that the
proxy sorters be carefully selected, and that caution be used in drawing inference from
the final maps—as the case would be with content analysis.
In this example, graduate students were used as proxy sorters instead of the original
respondents. This trade-off was made to eliminate contamination of the respondent
sample for a second time measurement about group norms. The graduate student sort-
ers were selected as appropriate based on their familiarity with the content of the
research question—in this case, teamwork and group processes. Each had also been
involved in multiple classroom team experiences. Even though the reality of the survey
respondents was not technically available to the sorters, the general social experience
was. In addition, both the sample and the proxy sorters had similar “theoretical” under-
standings about what group norms are from taking courses in the same school. Finally,
because there is an extensive literature about group norms, if there had been a major
mismatch between the content of the respondents’ statements and the way the proxy
sorters grouped them, it would have stood out to the researchers. For example, previ-
ous research has shown that groups generally develop norms that govern their “task”-
related interaction (e.g., work strategies) as well as their “social”-related interaction
(e.g., how they handle personality clashes) (Guzzo & Shea, 1992). If the proxy sorters
were consistently placing statements exclusively about personality clashes in clusters
labeled work strategies, there would be reason to further investigate why the proxies
were doing this. Each of the 10 sorters was given a packet with the stack of 156 cards,
Post-it Notes to put labels on their piles, and rubber bands to bind their piles and labels.
Step 3: The Multidimensional Scaling Analysis. Using respondents or respondent
proxies to code the data allows structure to emerge from the MDS analysis based on
aggregated individual understanding (in the form of similarity judgments) of original
responses. The first step is to create a matrix for each sorter. In this example, a 156 ×
156 binary square matrix (rows and columns represent statements) was created for
each coder. Cell values represented whether (1) or not (0) a pair of statements was
sorted by that coder into the same pile. The second step is to aggregate the similarity
judgments of the sorters by adding all 10 of the individual matrices together. From that
aggregated matrix, MDS created coordinate estimates and a two-dimensional map2 of
distances between the statements based on the aggregate sorts of the 10 coders as
shown in Figure 1. Each statement on the map is represented by a point (accompanied
by the statement number). The distance between the points represents the estimates
from MDS of how similar the statements are judged to be by the sorters. Points that are
farther apart on the map were sorted together less often than those that are closer


Figure 1: Multidimensional Scaling Point Map of Statements


Note. Similar statements are closer together.

together. The position of each point on the map (e.g., top, bottom, right, left) is not
important—only the distance or spatial relationship between the points.
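A compact sketch of the matrix construction just described is given below; `piles_by_sorter` is an assumed input holding each of the 10 coders' piles of statement numbers (1 to 156), and the aggregated matrix then feeds the two dimensional non-metric MDS exactly as in the sketch given earlier for the tobacco tactics study.

import numpy as np

N = 156  # unitized statements

def sort_matrix(piles):
    """156 x 156 binary matrix: 1 where a pair of statements shares a pile."""
    m = np.zeros((N, N), dtype=int)
    for pile in piles:
        for i in pile:
            for j in pile:
                m[i - 1, j - 1] = 1
    return m

# Aggregate similarity matrix: each cell counts how many of the 10 sorters
# placed that pair of statements in the same pile (values 0 to 10).
aggregate = sum(sort_matrix(piles) for piles in piles_by_sorter.values())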
Step 4: Choosing a Final Cluster Solution. The next step in this analysis is to deter-
mine the appropriate number of clusters that represent a final solution for the data. In
this example, hierarchical agglomerative cluster analysis using Ward’s algorithm was
used on the MDS map coordinates to determine how the statements cluster together
based on similarity. This type of cluster analysis is most helpful in identifying catego-
ries when the structure of categories is not already known (Afifi & Clark, 1996). A 20-
to-8 cluster replay analysis (Concept-Systems, 1999) was done to decide on the appro-
priate cluster solution. This analysis begins with each statement as its own cluster and
tracks the merging of the statements into clusters up to a 20-cluster solution. The out-
put from this analysis generates two decision tools3: (a) a list of the statements in the
20-cluster solution with their bridging values,4 presented in Table 1; and (b) the merg-
ing of clusters for each cluster solution (a list version of a dendrogram), presented in
Table 2. The two decision tools together provide a statistical basis to guide human
judgment about the goodness of fit for the final cluster solution.
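A rough open-source approximation of that replay (the study used the Concept System software; its bridging values are not reproduced here) could be built on SciPy, assuming `xy` holds the 156 x 2 MDS coordinates from the previous step:

from collections import defaultdict
from scipy.cluster.hierarchy import linkage, fcluster

tree = linkage(xy, method="ward")          # Ward's algorithm on MDS coordinates
solutions = {k: fcluster(tree, t=k, criterion="maxclust") for k in range(20, 7, -1)}

def statements_by_cluster(labels):
    """Group 1-based statement numbers by cluster label."""
    groups = defaultdict(list)
    for statement, label in enumerate(labels, start=1):
        groups[label].append(statement)
    return dict(groups)

# Stepping down from 20 clusters toward 8, the analyst inspects which statement
# groups merge at each level and judges whether the merger is substantively
# reasonable before settling on a final solution.
replay = {k: statements_by_cluster(solutions[k]) for k in solutions}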
Each proposed cluster solution is then examined to determine how appropriate the
merging or splitting of statement groups is. A final cluster solution is chosen by exam-
ining all of the cluster solutions within a certain range.5 It is important to note that the central
decision being made here is on the number of clusters to select—the hierarchical clus-
ter tree structure is entirely determined by the analysis and is not the subject of
researcher discretion or judgment. The reason such judgment is required with cluster
analysis is that there is no sensible mathematical criterion that can be used to select the
number of clusters. This is because the “best” number of clusters depends on the level
of specificity desired and the context at hand, factors that can only be judged subjec-

Table 1
20-to-8 Cluster Replay Solution Output
Cluster Number    Statement (With Statement Number)

1 (153) We have not had any opportunity to establish group norms.


(155) We haven’t spent that much time together.
(64) We haven’t actually worked together yet (on the project) but through a few
interactions.
(102) We haven’t done that much together.
(114) We have done little group work so far.
(118) We do not have any established yet.
(133) Did not meet enough to figure out.
(20) Not sure yet.
(48) We have not met enough to determine.
(76) I’m not sure I can answer yet?
(1) Don’t know yet.
(11) Don’t know yet.
(55) Don’t know yet.
(66) I don’t know yet.
(135) Don’t know yet.
2 (14) None yet.
(41) None yet.
(7) N/A
(34) N/A
(86) N/A
(146) N/A
(16) We haven’t had that much time to work together.
(26) There is not really a “group” goal yet.
(138) Our group has limited work time; I’ll get back to ya!
(105) We haven’t discussed anything regarding beliefs, attitudes.
3 (120) To communicate our beliefs until everyone is satisfied.
(23) Open.
(39) Openness.
(40) Open.
(43) Openness.
(101) To listen to each other.
(78) Everyone is not afraid to voice their opinion.
(57) Discussion.
(117) Norms include: group discussions.
(122) Hearing one another’s opinions.
(35) Consideration for opinion of others.
(38) Make sure everyone understands what is going on.
(10) We are able to discuss our ideas, nobody holds back.
4 (12) No lack of communication.
(13) Communicative.
(21) So far, we seem to communicate well.
(27) Everyone communicates everything.
(60) Communicating among the group about our tasks.
(71) Communication.
(129) Communicative.
(151) The norms of my team: our strong communication skills.
(50) We want to have a channel of open communication.
5 (32) We let everyone have a say in decisions.
(70) Everyone should agree on a final decision.
(18) We are pretty much a democratic group.
(130) We must make key decisions as a group.
(24) Majority rules.
6 (112) Everyone should contribute.


(111) We review work all together.
(131) We are all contributing members of the team.
7 (85) Splitting up the work fairly.
(89) Dividing up the work as fairly as possible.
(113) Work should be distributed evenly.
(96) We divide work.
(99) Norms include: divvying up the work.
(109) Everyone agrees to split up work evenly.
(68) Each person must contribute.
(59) We are all expected to contribute equally.
(143) To all contribute equally.
(29) Dividing up the tasks as fairly as possible.
(147) Doing your portion of the work.
(149) Make sure to be a part of the group.
8 (58) Complete all work that is assigned to you.
(46) Do what’s asked of you.
(83) That everyone will help when they are needed.
(62) The group norm: take responsibility.
(100) Jeremy and I seem to argue a lot (compromising in the end) and take charge.
(154) Responsibility.
(123) Just that everyone works hard.
(22) We are all expected to do our work on time.
(139) The group norm: do your work.
9 (90) There will probably end up a certain structure—like a leader, secretary, caller,
etc.
(88) There is one member that volunteers to collect everyone’s papers and e-mail
lists.
(144) Larissa—organizer/recorder. Lisa & Christopher—implementers.
Me (Drew)—idea person
(128) The girl offers to do the grunt work and contributes but isn’t too forceful.
(73) Who the leaders are.
(127) Probably will have one leader that leads w/o meaning to, someone who is
an organizer.
(75) Only a few take initiative to write or do stuff.
(63) One person or two assign the work and the others follow.
10 (91) Everyone respects each other’s ideas.
(65) We are always respectful (well . . . so far).
(121) Respect.
(142) Mutual respect for each other.
(150) Cooperative.
(97) Actually, everyone seems to work very well together.
(126) We all seem willing to work together.
(2) To work together.
(51) To always help each other w/patience.
(5) Norms include: lots of interactive helping.
11 (4) Must compromise.
(84) So far, we are all willing to compromise.
(61) Compromising is sometimes hard.
12 (17) If everyone does not agree, then there should be no unnecessary hostility.
(42) No one is better than another person.
(81) So far, we are all willing to work together.
13 (156) Good meshing of attitudes.
(54) We all seem willing to meet.
(148) I think a norm in our team is honesty.
(56) Everyone is open-minded.


(53) Showing up to meetings.
(44) We are all expected to do a quality job.
14 (92) Attendance.
(74) To be there.
(19) We all believe in well thought out decisions.
(104) To help get all the work done.
(15) We all know we have a job to do.
15 (25) We all believe in hard work.
(115) The desire to achieve.
(72) Hard work.
(37) Must work hard.
(103) Get the job done.
(141) We all want what is best for the group.
(125) We try to get things done as quickly as possible.
16 (31) The desire to be the best.
(110) Positive attitudes.
(119) The norms of my team: our positive attitude.
(98) Everyone keeps a positive attitude.
(33) My group norm is, and I am very happy about it, professionalism.
(47) The group norm: do a good job.
(49) Joking.
(134) Humor is a key.
(106) I think a norm in our team is fun.
(108) Positive outlook.
17 (69) Our team wants to learn.
(36) Wants to exceed.
(94) We want to learn a lot from this project.
(6) We have a solid belief that we all want to do well on this project.
(3) We will work as hard as possible to achieve a good grade.
(79) We want to run an upscale restaurant.
(95) We all want to do our work well, and get it behind us.
(28) Our team and project will have our own style and attitude.
18 (8) Our group seems fairly diverse with a multitude of attitudes and backgrounds.
(137) We are a good blend of different people.
(116) Seemingly compatible.
(80) We get along well.
(52) Our group members have these characteristics: quiet, unsure, persuadable,
leader, intelligence, optimistic.
19 (9) Kindness.
(77) Our team seems to contain dedicated individuals who want to do well in
the class.
(30) Must be calm.
(67) Very outgoing.
(136) Must be personable.
(145) Regular meetings will be held.
20 (124) I think there are cliques starting already.
(93) Male-based.
(82) Everyone has their own personal objectives it seems like.
(152) Intelligent.
(140) For some reason we all agree we are the team with the lowest cumulative IQ.
(45) We are all transfers.
(107) 2 kids offer very little input and say “whatever” a lot.
(132) We seem to be divided on most issues.
(87) Some keep talking and do not listen to others when they talk.

Table 2
Cluster Replay Solutions: From 20 to 8
At Cluster Solution Clusters Merged

19 14, 15
18 10, 11
17 5, 6
16 12, 13
15 1, 2
14 18, 19
13 8, 9
12 3, 4
11 14, 15, 16
10 10, 11, 12, 13
9 18, 19, 20
8 5, 6, 7

So this issue of cluster number selection illustrates how concept mapping blends human
judgment with the more objective mathematical algorithm of cluster analysis.
The researchers (see Note 6) decided that a 15-cluster solution was most appropriate, based on
the desirability of not splitting Clusters 1 and 2 (the “don’t know” clusters); all previous splits
were deemed reasonable. Original respondents or proxies were not used as decision makers
primarily because of resource constraints and because the purpose of the analysis was merely to
create a heuristic-like representation of how the class described the norms of their team. The
final cluster solution map is presented in Figure 2.
Step 5: Labeling the Clusters. The final step in the analysis is to identify the sort-
pile label (i.e., the labels each sorter assigns to the piles of sort cards) that best repre-
sents each cluster. A centroid analysis is used to select a label for each cluster from the
pile names generated by the sorters. A centroid is defined as “the point whose coordi-
nates are the means of all the observations in that cluster” (Afifi & Clark, 1996, p. 392).
Three steps are involved in the computation. First, a centroid is computed for each
of the clusters on the map. For each cluster, this is the average x and the average y value
of the MDS coordinates for each point in the cluster. Second, a centroid value is com-
puted for every sort-pile label for every sorter. For each sort-pile label, this is the aver-
age x and the average y value of the MDS coordinates for each statement point that the
sorter placed in that pile. Finally, for each cluster, the Euclidean distance is computed
between the cluster’s centroid and the centroid of each pile label. The pile label with
the smallest Euclidean distance is considered the best fitting one. The closest 10 pile
labels constitute a “top-10” list of pile names that offers the best choice and the 9 most
reasonable alternative choices. It is then up to the decision makers to examine the list of
possible pile labels and decide if any of them is more appropriate to the statements in
the pile than the label that was statistically chosen by the software. If none of the pile
labels completely captures the theme of the cluster, a label can also be manually
entered. This decision process again illustrates the blending of objective statistical algorithm
and human judgment that places concept mapping between word-based and code-based
approaches.
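A compact sketch of those three steps is shown below, assuming hypothetical inputs: the MDS coordinates, a cluster assignment per statement, and the sort-pile labels (with the statement indices each pile contained) pooled across sorters. It is meant to illustrate the computation, not the software's exact output.

import numpy as np

def suggest_labels(mds_xy, cluster_of, pile_labels, top_n=10):
    # mds_xy: (n, 2) array of MDS coordinates; cluster_of: cluster id per statement;
    # pile_labels: list of (label_text, statement_indices) pooled over all sorters.
    pile_centroids = [(text, mds_xy[list(idx)].mean(axis=0))      # step 2
                      for text, idx in pile_labels]
    suggestions = {}
    for c in np.unique(cluster_of):
        c_centroid = mds_xy[cluster_of == c].mean(axis=0)         # step 1
        # Step 3: rank pile labels by Euclidean distance to the cluster centroid;
        # the closest is the best-fitting label and the next nine are alternatives.
        ranked = sorted(pile_centroids,
                        key=lambda p: float(np.linalg.norm(p[1] - c_centroid)))
        suggestions[c] = [text for text, _ in ranked[:top_n]]
    return suggestions
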

Figure 2: Final Cluster Solution With Statement Points

The decision about what to label each cluster was made by the researchers. The
resulting 15 categories of group norms that emerged from this analysis were Role Def-
inition, All Do Their Part, Equal Distribution of Work, Consensual Decision Making,
Communication, Open to Ideas/Listening, Respect/Compromise, Cooperation, Work
Hard, Compatibility, Keeping Positive Attitude, Personality Characteristics, Achieve-
ment, Team Characteristics, and Don’t Know. The final map with labels is presented in
Figure 3, and the corresponding cluster statements are presented in Table 3. In inter-
preting the final map, keep in mind that each statement on the map is represented as a
point that is included in a cluster. The proximity of the clusters represents how similar
the statements in them were judged to be by the coders/sorters. Clusters that are farther
apart on the map contain, in general, statements that were sorted together less often
than those that are closer together. The position of each cluster on the map (e.g., top,
bottom, right, left) is not meaningful—only the distance or spatial relationship
between them. The breadth or tightness (i.e., shape and size) of a cluster generally rep-
resents whether it is a broader or narrower conceptual area.

Interpretation
These results can be interpreted in several ways. The most basic interpretation is
that through this analysis, there has emerged a theory-based representation of 15 cate-
gories, including the classification of text content within these categories, which repre-
sents the range of norms in the teams of this sample. Similar to traditional content anal-
ysis, the concepts have been coded into themes or categories (based on the clusters).
Similar to word-mapping analyses, the concepts’ positions on the maps represent rela-
tionships of similarity—both at the cluster and the unit-of-analysis level.
[Figure 3 shows the 15 labeled clusters arranged on the point map, with overlaid lines marking the "Task Related" and "Interpersonal Related" regions.]
Figure 3: Final Map With Cluster Labels and Global Interpretation

Based on this relational positioning or “structured conceptualization” of the data, it


is also possible to take this analysis one step further by examining the map for
“regions” of meaning. A region on the map represents clusters that can be meaning-
fully grouped together more tightly than they are with other regional groups of clus-
ters. This is apparent by more separation, or white space, between regions of the map.
Decisions about regional distinctions can be driven by theoretical preconceptions or
simply through discussion.
For example, the solid and dotted lines overlaid on Figure 3 represent one interpre-
tation of how group norms might be conceptualized at more of a “global” level. From
the literature on group norms, we know that group work is composed of task-process
activities related to the division and coordination of labor but also depends on interper-
sonal interaction among its members (Guzzo & Shea, 1992). There is a clear division
along these lines between the “east” (more interpersonal) and “west” (more task-
related) sides of the map, such that if it is folded on the diagonal, the two regions fall on
either side. Interestingly, there are two clusters in the “northwest” corner of the map
that seem to bridge task-process- and interpersonal-process-related norms: “commu-
nication” and “openness to ideas/listening.” There is also a clear separation between
the “don’t know” concept and the rest of the concepts on the map. As mentioned above,
the interpretation of these results is constrained by the amount of involvement by the
researchers. However, our involvement illustrates how this analysis can be grounded
quite heavily in existing theory if the research objective allows it.

Table 3
Final Cluster Solution
Cluster Name / Statement    Bridging Value

Cluster 1: Don’t know


(153) We have not had any opportunity to establish group norms. .00
(155) We haven’t spent that much time together. .00
(64) We haven’t actually worked together yet (on the project)
but through a few interactions. .00
(102) We haven’t done that much together. .00
(114) We have done little group work so far. .00
(118) We do not have any established yet. .00
(133) Did not meet enough to figure out. .00
(20) Not sure yet. .00
(48) We have not met enough to determine. .00
(76) I’m not sure I can answer yet? .00
(1) Don’t know yet. .02
(11) Don’t know yet. .02
(55) Don’t know yet. .02
(66) I don’t know yet. .02
(135) Don’t know yet. .02
(14) None yet. .03
(41) None yet. .04
(7) N/A .09
(34) N/A .09
(86) N/A .09
(146) N/A .09
(16) We haven’t had that much time to work together. .10
(26) There is not really a “group” goal yet. .16
(138) Our group has limited work time; I’ll get back to ya! .22
(105) We haven’t discussed anything regarding beliefs, attitudes. .32

Average bridging .05

Cluster 2: Open to ideas/listening


(120) To communicate our beliefs until everyone is satisfied. .29
(23) Open. .30
(39) Openness. .30
(40) Open. .30
(43) Openness. .30
(101) To listen to each other. .33
(78) Everyone is not afraid to voice their opinion. .34
(57) Discussion. .35
(117) Norms include: group discussions. .35
(122) Hearing one another’s opinions. .36
(35) Consideration for opinion of others. .37
(38) Make sure everyone understands what is going on. .40
(10) We are able to discuss our ideas, nobody holds back. .44

Average bridging .34

Cluster 3: Communication
(12) No lack of communication. .18
(13) Communicative. .18
(21) So far, we seem to communicate well. .18
(27) Everyone communicates everything. .18
(60) Communicating among the group about our tasks. .18
(71) Communication. .18
(129) Communicative. .18
(151) The norms of my team: our strong communication skills. .18
(50) We want to have a channel of open communication. .21

Average bridging .19

Cluster 4: Consensual decision making


(32) We let everyone have a say in decisions. .36
(70) Everyone should agree on a final decision. .36
(18) We are pretty much a democratic group. .37
(130) We must make key decisions as a group. .39
(112) Everyone should contribute. .40
(24) Majority rules. .42
(111) We review work all together. .44
(131) We are all contributing members of the team. .45

Average bridging .40

Cluster 5: Equal distribution of work


(85) Splitting up the work fairly. .22
(89) Dividing up the work as fairly as possible. .22
(113) Work should be distributed evenly. .22
(96) We divide work. .22
(99) Norms include: divvying up the work. .22
(109) Everyone agrees to split up work evenly. .22
(68) Each person must contribute. .25
(59) We are all expected to contribute equally. .30
(143) To all contribute equally. .33
(29) Dividing up the tasks as fairly as possible. .39
(147) Doing your portion of the work. .40
(149) Make sure to be a part of the group. .47

Average bridging .29

Cluster 6: All do their part


(58) Complete all work that is assigned to you. .49
(46) Do what’s asked of you. .49
(83) That everyone will help when they are needed. .50
(62) The group norm: take responsibility. .53
(100) Jeremy and I seem to argue a lot (compromising in the end)
and take charge. .57
(154) Responsibility. .57
(123) Just that everyone works hard. .59
(22) We are all expected to do our work on time. .60
(139) The group norm: do your work. .67

Average bridging .56


Cluster 7: Role definition


(90) There will probably end up a certain structure—like a leader,
secretary, caller, etc. .36
(88) There is one member that volunteers to collect everyone’s
papers and e-mail lists. .37
(144) Larissa—organizer/recorder. Lisa & Christopher—implementers.
Me (Drew)—idea person. .37
(128) The girl offers to do the grunt work and contributes but isn’t too forceful. .41
(73) Who the leaders are. .43
(127) Probably will have one leader that leads w/o meaning to, someone
who is an organizer. .44
(75) Only a few take initiative to write or do stuff. .47
(63) One person or two assign the work and the others follow. .50

Average bridging .42

Cluster 8: Respect/compromise
(91) Everyone respects each other’s ideas. .35
(65) We are always respectful (well...so far). .38
(121) Respect. .38
(142) Mutual respect for each other. .38
(150) Cooperative. .44
(4) Must compromise. .44
(84) So far, we are all willing to compromise. .45
(97) Actually, everyone seems to work very well together. .46
(126) We all seem willing to work together. .47
(2) To work together. .49
(61) Compromising is sometimes hard. .52
(51) To always help each other w/patience. .63
(5) Norms include: lots of interactive helping. .63

Average bridging .46

Cluster 9: Cooperation
(17) If everyone does not agree, then there should be no
unnecessary hostility. .44
(156) Good meshing of attitudes. .46
(42) No one is better than another person. .50
(54) We all seem willing to meet. .50
(81) So far, we are all willing to work together. .50
(148) I think a norm in our team is honesty. .52
(56) Everyone is open-minded. .53
(53) Showing up to meetings. .57
(44) We are all expected to do a quality job. .62

Average bridging .52

Cluster 10: Work hard


(25) We all believe in hard work. .43
(115) The desire to achieve. .45

(72) Hard work. .47
(37) Must work hard. .48
(103) Get the job done. .49
(141) We all want what is best for the group. .49
(125) We try to get things done as quickly as possible. .51
(92) Attendance. .53
(74) To be there. .54
(19) We all believe in well thought out decisions. .62
(104) To help get all the work done. .64
(15) We all know we have a job to do. .67

Average bridging .53

Cluster 11: Keeping positive attitude


(31) The desire to be the best. .39
(110) Positive attitudes. .43
(119) The norms of my team: our positive attitude. .43
(98) Everyone keeps a positive attitude. .45
(33) My group norm is, and I am very happy about it, professionalism. .48
(47) The group norm: do a good job. .50
(49) Joking. .52
(134) Humor is a key. .52
(106) I think a norm in our team is fun. .54
(108) Positive outlook. .60

Average bridging .48

Cluster 12: Achievement


(69) Our team wants to learn. .38
(36) Wants to exceed. .39
(94) We want to learn a lot from this project. .40
(6) We have a solid belief that we all want to do well on this project. .41
(3) We will work as hard as possible to achieve a good grade. .44
(79) We want to run an upscale restaurant. .47
(95) We all want to do our work well, and get it behind us. .57
(28) Our team and project will have our own style and attitude. .63

Average bridging .46

Cluster 13: Compatibility


(8) Our group seems fairly diverse with a multitude of attitudes
and backgrounds. .52
(137) We are a good blend of different people. .52
(116) Seemingly compatible. .54
(80) We get along well. .54
(52) Our group members have these characteristics: quiet, unsure,
persuadable, leader, intelligence, optimistic. .58

Average bridging .54


Cluster 14: Personality characteristics


(9) Kindness. .51
(77) Our team seems to contain dedicated individuals who want
to do well in HA 136 .51
(30) Must be calm. .53
(67) Very outgoing. .57
(136) Must be personable. .57
(145) Regular meetings will be held. .83

Average bridging .59

Cluster 15: Team characteristics


(124) I think there are cliques starting already. .46
(93) Male-based. .51
(82) Everyone has their own personal objectives it seems like. .52
(152) Intelligent. .53
(140) For some reason we all agree we are the team with the
lowest cumulative IQ. .54
(45) We are all transfers. .54
(107) 2 kids offer very little input and say “whatever” a lot. .61
(132) We seem to be divided on most issues. .71
(87) Some keep talking and do not listen to others when they talk. 1.00

Average bridging .60

Reliability and Validity


Concept mapping presents a visually appealing classification of text data, but more
important, it also offers several advantages over existing word-based and code-based
methods in terms of reliability and validity. Decisions made at each stage of the analy-
sis can either increase or decrease how representative results are of the sample versus
how much they are informed by existing theory. Krippendorff (1980) has outlined a
useful framework for discussion of the reliability and validity of content analysis that
will be applied here.

Reliability

Reliability has been defined as obtaining data from research that represent “varia-
tions in real phenomena rather than the extraneous circumstances of measurement”
(Krippendorff, 1980). Krippendorff (1980) discusses three types of reliability in con-
tent analysis: stability, reproducibility, and accuracy. Stability refers to the degree to
which the same coder at different times codes the same data in a similar manner.
Reproducibility refers to the extent to which similar results can be reproduced in differ-
ent times and locations and with different coders. Accuracy refers to the amount of
error (intraobserver inconsistencies, interobserver disagreements, and systematic
deviations from the standard).
The reliability of concept mapping can be assessed in several ways (Trochim,
1993). The stability of the method can be addressed, for example, by having each
sorter repeat his or her sort at a later time, then assess the correlation between the two
sort matrices. Reproducibility refers to intercoder reliability and can be assessed by
correlating each individual sorter’s matrix against the entire sample of sorters (a form
of item-total reliability assessment that in effect treats the aggregate as the “errorless
solution”). This has been discussed in detail by Trochim (1993). Intercoder reliability
is especially important to consider when making decisions about whom to choose as
sorters because it has implications for the validity of results. In the analysis of text data,
meaning is constructed in readers’ minds through an interaction of their interpretation
of the text and their own experiences or reality (Findahl & Hoijer, 1981; Lindkvist,
1981). Therefore, if sorters with a different experience or background sort the
responses, they may interpret them differently than the original respondents intended.
The most obvious way to minimize the potential for misunderstanding is to have the
original respondents serve as sorters. We highly recommend this. However, there are
times and circumstances that may make that difficult. If so, we recommend careful
selection of proxies per the guidelines described above.
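A minimal sketch of this kind of reproducibility check follows, assuming each sort is recorded as a list of piles of statement indices (an assumed input format, not necessarily the one used in Trochim, 1993): each sorter's binary co-occurrence matrix is correlated with the aggregate across all sorters, and unusually low correlations flag sorts worth reviewing.

import numpy as np

def cooccurrence(piles, n_statements):
    # Binary matrix with a 1 wherever two statements were placed in the same pile.
    m = np.zeros((n_statements, n_statements))
    for pile in piles:
        for i in pile:
            for j in pile:
                m[i, j] = 1.0
    return m

def sorter_total_correlations(all_sorts, n_statements):
    mats = [cooccurrence(piles, n_statements) for piles in all_sorts]
    total = np.sum(mats, axis=0)             # aggregate, treated as the "errorless solution"
    iu = np.triu_indices(n_statements, k=1)  # off-diagonal statement pairs only
    return [float(np.corrcoef(m[iu], total[iu])[0, 1]) for m in mats]
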
One major reliability benefit of the concept mapping method is that the accuracy of
each coder is not a problem compared to more traditional notions of intercoder reli-
ability. There is no preestablished category structure to which to conform. Each sorter
makes his or her own judgments about how many categories to create, what each cate-
gory should contain, and what each category should be called. Therefore, intersorter
error or disagreement is taken into account through statistical aggregation of the simi-
larity judgments of the individual coders. Occasionally one coder will generate a sort
that is radically different from the other coders’ (for example, one coder does a sort
with only 2 piles, whereas the rest of the coders generated 10 to 12 piles). As discussed
above in regard to stability, when individual sort matrices are correlated against the
aggregate, any outliers will be identified by very low correlations. At that point, the
researcher must make a judgment as to whether the sorter followed instructions and
can do a reproducible sort. A radically different sort may represent a legitimate inter-
pretation of similarity between concepts, but often it represents that the coder did not
understand the purpose or instructions of the sort. These situations must be carefully
considered, and decisions about including or excluding the sort must be well justified.
In addition to the reliability issues discussed above, Krippendorff (1980) identified
four more common reliability concerns: (a) Some units are harder to code than others;
(b) some categories are harder to understand than others; (c) subsets of categories can
sometimes be confused with larger categories; and (d) individual coders may be care-
less, inconsistent, or interdependent. Each of these will be discussed.

Some Units Are Harder to Code Than Others and Some Categories Are Harder to
Understand Than Others. A strength of the concept mapping method is that it offers a
nonforced, systematic way for sorters to create categories that they understand from
their unique perspectives. Sorters are given instructions to create as many or as few
piles as it takes to group the statements (i.e., units) according to how similar they are to
each other. Therefore, sorters can create their own categories and place “hard to cate-
gorize” statements in whichever pile they feel is appropriate. If they feel they do not
understand how to categorize a particular statement, they have the option of leaving it
in its own pile instead of forcing it into a preestablished category.
The software also generates a useful statistic called a “bridging value” that helps the
researcher to identify the degree to which any given statement is related to ones that are
similar in meaning or tend to “bridge” a more diverse set of statements. Statements that
are difficult to sort will show up as having a high bridging value (Concept-Systems,
1999). Bridging values can be used at several points in the analysis. For example, while
choosing the final cluster solution, the decision makers can examine bridging values of
each statement as a guide to whether that statement should be included in a different
cluster. Bridging values are also available for each cluster (see Table 3). Cluster bridg-
ing values are an indicator of how cohesive the statements are with the other statements
around them—it is the average bridging value of all statements in a cluster (Concept-
Systems, 1999).
Subsets of Categories Can Sometimes Be Confused With Larger Categories. The
emergence of subcategories in the concept mapping methodology is not an issue. Each
statement (or unit) is placed on a separate card and represents only one concept (in this
example, one group norm). Clusters of points on the map represent patterns of shared
similarity in meaning but do not necessarily represent exact agreement on the mean-
ings. The results from all of the sorts are aggregated to give us the most accurate
“model” of reality based on the sorters’ perspectives. Therefore,
instead of subcategories, statements that are understood as categorically/thematically
similar but conceptually different will be sorted into separate piles as understood by
the sorters and will most likely emerge as proximally located on the map through the
MDS analysis of the aggregated sort results.

Individual Coders May Be Careless, Inconsistent, or Interdependent. Another


strength that concept mapping brings to reliability is that sorters (coders) in this
methodology are always independent of each other. There is no need for sorters to
discuss how to conceptualize problematic concepts or reach a greater degree of
interrater agreement. In our experience and as mentioned by others (Boster, 1994),
because sorters are conceptualizing their own similarity judgments, their attention
level and enthusiasm for the task tends to be high—unless they are given too many
statements to sort. As mentioned above in the discussion of stability, reproducibility,
and accuracy, carelessness or inconsistencies can easily be identified by low correla-
tions between matrices.

Validity

Qualitative data pose an interesting obstacle to validity. If we know nothing about


the subject, we cannot capture meaning effectively—conversely, if we know a lot
about the subject, our own biases might interfere (Krippendorff, 1980; Miles &
Huberman, 1994; Patton, 1990). Concept mapping helps to ease this tension somewhat
by combining statistical analysis and human judgment. The degree to which theory
guides the concept mapping analysis is introduced through choices about whom to
include as decision makers in the analysis. The more respondents are used at each
stage of the analysis, the greater the resulting map represents their collective under-
standing of the topic at hand. Because concepts are social constructions, there is really
no way to establish a standard by which to judge the degree of error (Krippendorff,
1980). The main strength that concept mapping offers to validity is that by using multi-
dimensional scaling and cluster analysis to represent the similarity judgments of mul-
tiple coders, it allows meaning and relationships to emerge by aggregating the “biases”
or “constructions” of many. Instead of arbitrarily imposing the investigator’s biases and values
through a priori categories or semantic encoding choices, sorting concepts
allows for a web of concept relationships to be represented by sorters immersed in the
context of their own social reality.
An additional value of concept mapping is that by having multiple sorters create
their own categories, we can help ensure that the categories are exhaustive—an espe-
cially important validity concern considering the variability of concepts produced in
open-ended survey responses. Nonexhaustive categorical coding schemes pose a com-
mon threat to validity in code-based content analysis (Seidel & Kelle, 1995). A more
word-based analysis of encoding the co-occurrence of concepts would be below the
level of interest in this study because the context of the concepts (in relation to each
other and to each cluster) is more important than mere co-occurrence.
Construct Validity. Krippendorff (1980) has identified two types of construct valid-
ity in content analysis. First, semantical validity is the “degree to which a method is
sensitive to the symbolic meanings that are relevant within a given context” (p. 157). It
has an internal component in terms of how the units of analysis are broken down as
well as an external component in terms of how well coders understand the symbolic
meaning of respondents’ language.
Internal semantical validity refers to the process of unitizing—the reduction of the
original text to individual phrases. Open-ended survey responses usually generate list-
like phrases, which lend themselves well to unitizing. A unit should consist of a sen-
tence or phrase containing one single concept. This is the most time-consuming step of
the analysis. Although it is not likely that the researcher would be able to bias decisions
about how to break up responses based on the result he or she hoped to obtain from the
analysis, it is our experience that involving two or three of the original respondents in
this process is useful. Most of the units are created by separating complex sentences
into simple sentences (see the example above). The important issue in unitizing is to
retain the original language and meaning of each statement. Discus-
sion with collaborators and/or original survey respondents can be used to reduce any
uncertainty about a decision. Although this step in the analysis can potentially intro-
duce threats to validity, the sparseness and list-like nature of open-ended survey
responses usually does not create very much uncertainty in creating units (this would
not be the case in trying to unitize denser texts).
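As a purely illustrative first pass (the judgment calls described above, ideally made with original respondents, still apply), list-like responses can be pre-split at sentence boundaries and simple conjunctions while keeping the respondents' own wording:

import re

def unitize(response):
    # Split at sentence boundaries, then at ";" and "and" joins; keep original wording.
    sentences = re.split(r"(?<=[.!?])\s+", response.strip())
    units = []
    for s in sentences:
        units.extend(part.strip() for part in re.split(r";|\band\b", s)
                     if part.strip())
    return units

print(unitize("We divide the work evenly and we respect each other. Majority rules."))
# ['We divide the work evenly', 'we respect each other.', 'Majority rules.']
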
In terms of external semantical validity, if unitizing is done well and enough of the
context of the statements was preserved, the meaning should be clear to the sorters
(this is why choosing sorters based on well-justified criteria is important). Statements
that are hard to interpret can easily be identified in this analysis because they will have
high bridging values and/or often appear more toward the center of the map (they are
pulled in several directions by the MDS analysis). The researcher can then use this
information to revisit how the units of analysis were created.
The second type of construct validity, sampling validity, “assesses the degree to
which available data are either an unbiased sample from a universe of interest or suffi-
ciently similar to another sample from the same universe so that data can be taken as
statistically representative of that universe” (Krippendorff, 1980, p. 157). The survey
data presented in this example were taken from a census rather than a sample and there-
fore were representative of the population of interest (that semester’s class). This is
likely the case in most analyses of open-ended survey data. Random sample selection,
sample size guidelines, and how to handle missing data are issues that apply to this
method as much as they do to other survey methods.
A final validity consideration is the external validity of concept mapping results/
classifications. By using human judgment and statistical analyses in concert, the cate-
gories in concept maps are more data-driven than they are in traditional content analy-
sis (where they are instead typically picked by the researcher). They also do not depend
on researcher judgments about which concepts to encode or include in exclusion dic-
tionaries, as do word-unit approaches. Concept mapping is a systematic way of for-
malizing a choice in syntax/context relationship. That being said, the final judgment
about this representation is based on human interpretation. Having the actual respon-
dents participate in determining the final cluster solution (the cluster replay analysis)
serves as a check for validity of meaning.

Limitations
The method proposed here has only addressed the analysis of a relatively simple
and sparse type of qualitative data. More dense or complex textual data, such as long
interview transcripts or board meeting minutes, pose a different series of methodologi-
cal, reliability, and validity questions, which will be the subject of future research efforts.
In this example, it was relatively straightforward to reduce three- to five-sentence
responses into units of analysis containing only one concept. Each sentence did not
rely on the context of those preceding or following it for meaning. For example, they
were not part of a complex argument or reasoning process.
The answers here were also all stated in one direction. They did not contain “if,
then” statements; positive or negative qualifications; or conditional judgments. Con-
sider the following statement: “I would hire her if she had more education and different
job experience.” This statement contains two different concepts (more education and
different experience) that qualify a decision (to hire or not). This poses problems for
unit-of-analysis reduction. This also causes problems for sorters who might be faced
with statements in the same set such as, “I would hire her if she had a college degree
and a better GPA”; “I won’t hire her”; and “She doesn’t have enough experience, but I
think job training will help her.” In the case of this type of data, semantical or semantic
network text analysis would probably be more effective.
Another limitation of this methodology is that of resource restriction and/or sorter
burden. This data set contained 156 statements, which is at the upper end of what can
reasonably be processed by sorters. More than 200 state-
ments tend to overwhelm sorters and greatly reduce their willingness to remain
engaged and finish the task. For the statement set presented in this example, it took
each sorter about 30 to 40 minutes to finish the sort and give each of their piles a label.
This is somewhat quick compared to traditional content analysis coding, but the con-
cept mapping method requires at least 10 to 12 carefully selected sorters to produce a
reliable map. It is always possible to reduce a large list of statements by combining or
eliminating redundant or near-redundant ones, and it is always possible to achieve a
lower number of discrete statements by broadening the criteria of what constitutes
redundancy. It should be noted that in this example, repeat units of analysis were
allowed. For example, if 15 people said, “I don’t know,” then 15 “I don’t know” state-
ments were sorted. Ordinarily, in a larger data set, 15 repetitions of “I don’t know” are
not necessary. If the decision to eliminate redundant statements is made, caution must
be used in drawing inference from the results (e.g., if redundant statements are elimi-
nated, no inference about importance or frequency can be made).

Future Directions
There are several directions in which this methodology can be developed and
extended in future studies to build theory. The identification of regions on the maps can
lead to theorizing about scale subdimensions or uncover theoretical areas that need
more investigation. Concept mapping can also be used to generate items and identify
dimensions in the process of scale development (e.g., see Jackson, Mannix, Peterson, &
Trochim, 2002) and can also be used to develop coding schemes and content for inter-
view and/or follow-up interview questions (Penney & Trochim, 2000). Another inter-
esting application of this methodology is to compare how different groups of people
(e.g., from different disciplines, industries, or levels of organizational hierarchy)
might generate different concept mapping solutions depending on their experiences
and understanding of the same phenomena. This can potentially guide researchers in
identifying boundary conditions for theory.
An extension of the core concept mapping results may also be of use in organiza-
tional research. For example, once the final map has been produced, comparisons
among different stakeholder groups can be made by gathering Likert-type scale rat-
ings of each statement on any dimension (e.g., importance or relevance to job or group)
and according to any demographic characteristic of interest (e.g., management vs. line
workers, engineers vs. marketing, or new employee vs. employee with long tenure). In
this way, there is a map that represents the entire population of interest that also allows
differences among participants to be identified within the clusters. Intergroup agree-
ment or differences can be statistically assessed. Group membership can also be dummy coded
and used in a regression to predict performance.
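A small, hypothetical sketch of this extension (the variable names and numbers below are invented for illustration) could compare Likert-type importance ratings across two stakeholder groups with dummy-coded predictors:

import pandas as pd
import statsmodels.formula.api as smf

ratings = pd.DataFrame({
    "importance": [4, 5, 3, 2, 5, 4, 1, 3],
    "group": ["management", "management", "line", "line",
              "management", "line", "line", "management"],
    "cluster": ["Communication", "Work Hard", "Communication", "Work Hard",
                "Work Hard", "Communication", "Work Hard", "Communication"],
})

# C(...) expands the categorical predictors into dummy codes; the group
# coefficient tests whether the two groups rate the statements differently.
model = smf.ols("importance ~ C(group) + C(cluster)", data=ratings).fit()
print(model.summary())
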

Conclusions
Concept mapping is one alternative for the analysis of open-ended survey
responses. It has several notable strengths or advantages over alternative approaches.
Unlike word-analysis approaches, it does not rely on precoded, computer-recognized
semantic relationships or frequency counts, therefore retaining the context of the origi-
nal concept. Unlike code-analysis approaches, it does not use forced category classifi-
cations that are laden with researcher bias. Instead, it enables estimation of the similar-
ity between concepts and clusters of concept categories that are representative of a
combination of human judgment/respondent experience and statistical analysis. Con-
cept mapping cuts analysis time down significantly (the above analysis was completed
in 3 hours, including sort-material preparation and data entry) while at the same time
offering improvements to some of the reliability and validity challenges of word-based
and code-based analysis methods.
Concept mapping of open-ended survey questions appears to be especially well
suited for the following types of organizational research questions: (a) when the
researcher does not want to impose bias or suggest relationships by forcing the data
into a preconceived coding scheme, (b) when coding schemes or theoretical
frameworks do not already exist or when the purpose of the research is to explore pos-
sibilities for conceptual categories, and (c) when there are competing theoretical
explanations or frameworks. The degree to which these three objectives can be accom-
plished, of course, depends on decisions made at each stage of the analysis. Concept
mapping is a promising alternative for the analysis of open-ended survey questions in
organizational research and for building stronger theory from their results. By involv-
ing human judgment at various steps of a statistical mapping analysis, it combines the
best of interpretive and representative techniques. It appears to be a rather promising
addition to the analysis techniques currently available.

Notes
1. This process can also be done on the computer or over the Web.
2. One approach in studies using multidimensional scaling (MDS) analysis is to run analyses
for multidimensional solutions and then pick the dimension that accounts for the greatest
amount of variance as a goodness-of-fit test (e.g., see Pinkley, 1990). The concept mapping
method does not do this for two reasons. First, two dimensions are easier to interpret and under-
stand in the final maps. Second, as Kruskal and Wish (1978) point out, “when an MDS config-
uration is desired primarily as the foundation on which to display clustering results, then a
two-dimensional configuration is far more useful than one involving three or more dimen-
sions” (p. 58).
3. The software we used generates these decision tools. We would like to point out that the
concept mapping analysis can be conducted using most commercial statistical packages.
However, some of the output that is generated by the software we used would require more
postprocessing.
4. The bridging value, ranging from 0 to 1, tells how often a statement was sorted with others
that are close to it on the map or whether it was sorted with items that are farther away on the map
(Concept-Systems, 1999). Lower bridging values indicate a “tighter” relationship with other
statements in the cluster. This information can be used as a “backup” to human judgment about
the appropriateness of a cluster solution.
5. Depending on the level of detail desired, this range may increase or decrease. A range of 8
to 20 is recommended for most data sets of this size.
6. To ensure maximum validity of how the structure is represented by thematic clusters, it is
recommended that a group of original respondents make the final cluster solution decisions. In
this example, the researchers made the final cluster solution decisions because the purpose of the
analysis was merely to explore what kinds of norms the students would mention. There was no
intention of making predictions; generalizing the results; or drawing conclusions about the
agreement, usefulness, similarities, or most frequent type of norm. We felt that imposing our
theoretical understanding of group norms, drawing from a vast literature on group norms, was
acceptable in this case. However, in additional projects in which we have used this methodology
(e.g., Jackson, Mannix, Peterson, & Trochim, 2002), when the intention of inference was high
and the respondents’ reality was totally unknown to us, we used the original respondents in every
step of the analysis (sorting, cluster replay, and labeling). If proxy sorters are chosen to make the
final cluster solution, the same guidelines for choosing them should apply and be justified.

References
Afifi, A., & Clark, V. (1996). Computer-aided multivariate analysis (3rd ed.). Boca Raton, FL:
Chapman & Hall/CRC.
Axelrod, R. (Ed.). (1976). Structure of decision: The cognitive maps of political elites. Prince-
ton, NJ: Princeton University Press.
Boster, J. (1994, June). The successive pile sort. Cultural Anthropology Methods, pp. 11-12.
Carley, K. (1993). Coding choices for textual analysis: A comparison of content analysis and
map analysis. In P. Marsden (Ed.), Sociological methodology (Vol. 23, pp. 75-126). Wash-
ington, DC: Blackwell for the American Sociological Association.
Carley, K. (1997). Network text analysis: The network position of concepts. In C. Roberts (Ed.),
Text analysis for the social sciences: Methods for drawing statistical inference from texts
and transcripts (pp. 79-100). Mahwah, NJ: Lawrence Erlbaum.
Carley, K., & Kaufer, D. (1993). Semantic connectivity: An approach for analyzing symbols in
semantic networks. Communication Theory, 3, 183-213.
Carley, K., & Palmquist, M. (1992). Extracting, representing, and analyzing mental models. So-
cial Forces, 70, 601-636.
Concept-Systems. (1999). The Concept System facilitator training manual. Ithaca, NY: Con-
cept Systems Inc. Available from http://www.conceptsystems.com
Denzin, N., & Lincoln, Y. (Eds.). (2000). Handbook of qualitative research (2nd ed.). Thousand
Oaks, CA: Sage.
Doerfel, M., & Barnett, G. (1999). A semantic network analysis of the international communi-
cation association. Human Communication Research, 25, 589-603.
Erickson, P. I., & Kaplan, C. P. (2000). Maximizing qualitative responses about smoking in
structured interviews. Qualitative Health Research, 10, 829-840.
Findahl, O., & Hoijer, B. (1981). Media content and human comprehension. In K. Rosengren
(Ed.), Advances in content analysis (pp. 111-132). Beverly Hills, CA: Sage.
Fine, G., & Elsbach, K. (2000). Ethnography and experiment in social psychological theory
building: Tactics for integrating qualitative field data with quantitative lab data. Journal of
Experimental Social Psychology, 36, 51-76.
Geer, J. G. (1991). Do open-ended questions measure “salient” issues? Public Opinion Quar-
terly, 55, 360-370.
Gerbner, G., Holsti, O., Krippendorff, K., Paisley, W., & Stone, P. (Eds.). (1969). The analysis of
communication content: Development in scientific theories and computer techniques. New
York: John Wiley.
Guzzo, R., & Shea, G. (1992). Group performance and intergroup relations in organizations. In
M. Dunnette & L. Hough (Eds.), Handbook of industrial and organizational psychology
(2nd ed., Vol. 3, pp. 269-313). Palo Alto, CA: Consulting Psychologists Press.
Jackson, K., Mannix, E., Peterson, R., & Trochim, W. (2002, June 15-18). A multi-faceted ap-
proach to process conflict. Paper presented at the International Association for Conflict
Management, Salt Lake City, UT.
Jehn, K. (1995). A multimethod examination of the benefits and detriments of intragroup con-
flict. Administrative Science Quarterly, 40, 256-282.
Kelle, U., & Laurie, H. (1998). Computer use in qualitative research and issues of validity. In
U. Kelle (Ed.), Computer-aided qualitative data analysis: Theory, methods, and practice
(pp. 19-28). Thousand Oaks, CA: Sage.
Kraut, A. (Ed.). (1996). Organizational surveys: Tools for assessment and change. San Fran-
cisco: Jossey-Bass.
Krippendorff, K. (1980). Content analysis: An introduction to its methodology (Vol. 5).
Newbury Park, CA: Sage.
Kruskal, J., & Wish, M. (1978). Multidimensional scaling. Beverly Hills, CA: Sage.
Lindkvist, K. (1981). Approaches to textual analysis. In K. Rosengren (Ed.), Advances in con-
tent analysis (pp. 23-41). Beverly Hills, CA: Sage.
Miles, M., & Huberman, M. (1994). Qualitative data analysis: An expanded sourcebook (2nd
ed.). Thousand Oaks, CA: Sage.
Mohammed, S., Klimoski, R., & Rentsch, J. (2000). The measurement of team mental models:
We have no shared schema. Organizational Research Methods, 3, 123-165.
Novak, J. (1998). Learning, creating, and using knowledge: Concept maps as facilitative tools
in schools and corporations. Mahwah, NJ: Lawrence Erlbaum.
Novak, J., & Gowin, D. B. (1997). Learning how to learn. New York: Cambridge University
Press.
Patton, M. (1990). Qualitative evaluation and research methods (2nd ed.). Newbury Park, CA:
Sage.
Penney, N., & Trochim, W. (2000, November 1-5). Concept mapping data as a guide for devel-
oping qualitative interview questions. Paper presented at the American Evaluation Associ-
ation: Increasing Evaluation Capacity, Honolulu, HI.
Pinkley, R. (1990). Dimensions of conflict frame: Disputant interpretations of conflict. Journal
of Applied Psychology, 75, 117-126.
Pool, I. d. S. (Ed.). (1959). Trends in content analysis. Urbana-Champaign: University of Illi-
nois Press.
Pothas, A.-M., Andries, D., & DeWet, J. (2001). Customer satisfaction: Keeping tabs on the is-
sues that matter. Total Quality Management, 12, 83-94.
Rea, L., & Parker, R. (1997). Designing and conducting survey research: A comprehensive
guide. San Francisco: Jossey-Bass.
Roberts, C. (1997). A theoretical map for selecting among text analysis methods. In C. Roberts
(Ed.), Text analysis for the social sciences: Methods for drawing statistical inferences from
texts and transcripts (pp. 275-283). Mahwah, NJ: Lawrence Erlbaum.
Ryan, G., & Bernard, R. (2000). Data management and analysis methods. In N. Denzin & Y.
Lincoln (Eds.), Handbook of qualitative research (2nd ed., pp. 769-802). Thousand Oaks,
CA: Sage.
Seidel, J., & Kelle, U. (1995). Different functions of coding in the analysis of textual data. In U.
Kelle (Ed.), Computer-aided qualitative data analysis: Theory, methods, and practice
(pp. 52-61). Thousand Oaks, CA: Sage.
Shapiro, G. (1997). The future of coders: Human judgments in a world of sophisticated soft-
ware. In C. Roberts (Ed.), Text analysis for the social sciences: Methods for drawing statis-
tical inferences from texts and transcripts (pp. 225-238). Mahwah, NJ: Lawrence Erlbaum.
Sproull, N. (1988). Handbook of research methods: A guide for practitioners and students in the
social sciences (2nd ed.). Lanham, MD: Scarecrow Press.
Stone, P. (1997). Thematic text analysis: New agendas for analyzing text content. In C. Robert
(Ed.), Text analysis for the social sciences: Methods for drawing statistical inferences from
texts and transcripts (pp. 35-54). Mahwah, NJ: Lawrence Erlbaum.
Tashakkori, A., & Teddlie, C. (1998). Mixed methodology: Combining qualitative and quantita-
tive approaches. Thousand Oaks, CA: Sage.
Trochim, W. (1989). An introduction to concept mapping for planning and evaluation. Evalua-
tion and Program Planning, 12, 1-16.
Trochim, W. (1993, November 6-11). Reliability of concept mapping. Paper presented at the
American Evaluation Association, Dallas, TX.
Weber, R. (1990). Basic content analysis (2nd ed.). Newbury Park, CA: Sage.
Weller, S., & Romney, A. K. (1988). Systematic data collection (Vol. 10). Newbury Park, CA:
Sage.
Young, M. (1996). Cognitive mapping meets semantic networks. Journal of Conflict Resolu-
tion, 40, 395-414.

Kristin M. Jackson is a Ph.D. candidate at the Johnson Graduate School of Management at Cornell Univer-
sity. Her research interests include social research methods, leadership in groups and teams, conflict and
work strategies in autonomous groups, and group decision making.
William M. K. Trochim is a professor of policy analysis and management in human ecology at Cornell Uni-
versity. His research interests include social research methods, group decision support systems, concept
mapping, pattern matching, and decision analysis.
Sign our guestbook

Complete the Guestbook below to gain access to a trial version of version 4.

Please sign in and tell us about your interests. When you have completed this information,
you will be able to access the download page. Required fields have names shown in red.

Download Page
This page allows you to download the current version of the software, version 4.

Please bookmark this page so that you can return to it without having to re-register in the
guestbook.

The Concept System Setup Program

Click this link, and choose 'save to disk'.


If you have a previous version of the Concept System, please uninstall that first.
Next, double-click C4Installer.msi to install the Concept System; we recommend you use the
default installation options.
Once you have completed the installation process, you can open up the Concept System and
add the Strategic Planning Example to the Project List for evaluation purposes.

To check in the Strategic Planning Example:

Start The Concept System
Click the Check In button
Navigate to the folder the Concept System resides in (probably: C:\program files\Concept System\)
Open the Data folder
Click on example.mdb
Click the OK button

To open the Strategic Planning Example use:
Username: admin
Password: concept

Please print this page for reference if you need it.
Using the Concept System Knowledge Base

How the knowledge base is structured and what you should do if you need assistance

Concept Mapping Overview

Basic overview of Concept Mapping and how you can learn more about it

Concept Mapping Process Guide

Explanations and examples to help you understand each step of the Concept Mapping
Process
The Concept System Software Guide

Step-by-step instructions to help you complete tasks using The Concept System

The Concept System Word Processor Guide

Detailed online documentation for qualitative analysis, text abstraction and report generation

BACK TO TOP Home - About - News - Services - Software - Clients - Library - Search

401 East State Street, Suite 402, Ithaca, NY 14850; Tel: 607.272.1206

Copyright © 2005 Concept Systems Incorporated; Comments or Questions? Email CSI Webmaster
