February 2003
This publication has been produced by the National Food Service Management Institute-Applied
Research Division, located at The University of Southern Mississippi with headquarters at The
University of Mississippi. Funding for the Institute has been provided with Federal funds from
the U.S. Department of Agriculture, Food and Nutrition Service, to The University of
Mississippi. The contents of this publication do not necessarily reflect the views or policies of
The University of Mississippi or the U.S. Department of Agriculture, nor does mention of trade
names, commercial products, or organizations imply endorsement by the U.S. Government. The
National Food Service Management Institute complies with all applicable laws regarding
affirmative action and equal opportunity in all its activities and programs and does not
discriminate against anyone protected by law because of age, color, disability, national origin,
race, religion, sex, or status as a veteran or disabled veteran.
National Food Service Management Institute
The University of Mississippi
Building the Future Through Child Nutrition
Location
The National Food Service Management Institute (NFSMI) was established by Congress in 1989
at The University of Mississippi in Oxford as the resource center for Child Nutrition Programs.
The Institute operates under a grant agreement with the United States Department of Agriculture,
Food and Nutrition Service. The NFSMI Applied Research Division is located at The University
of Southern Mississippi in Hattiesburg.
Mission
The mission of the NFSMI is to provide information and services that promote the continuous
improvement of Child Nutrition Programs.
Vision
The vision of the NFSMI is to be the leader in providing education, research, and resources to
promote excellence in Child Nutrition Programs.
Administrative Offices
Education Division
The University of Mississippi
P.O. Drawer 188
University, MS 38677-0188
Phone: 800-321-3054

Applied Research Division
The University of Southern Mississippi
Box 10077
Hattiesburg, MS 39406-0077
Phone: 601-266-5773
http://www.nfsmi.org
Training Evaluation 1
Abstract
tools achieve the purpose for which they were intended. Cronbach described evaluation
systematic and comparatively objective (Torres, Preskill, & Piontek, 1996). In this
review, evaluation is defined as a study designed and conducted to assist some audience
to assess an object’s merit and worth (Stufflebeam, 2001). One major model of
evaluation was identified. This model, developed by Kirkpatrick in 1952, remains widely
used today (ASTD, 1997). The model includes four levels of measurement to assess
evaluation strategies based on the Kirkpatrick Model holds the greatest promise for
• Phillips (1991) defined evaluation as a systematic process to determine the worth,
of forming value judgments about the quality of programs, products, and goals.
collecting and analyzing data in order to determine whether and to what degree
extent to which a program has met its stated performance goals and objectives.
evaluation found in this literature review. The reason for selecting Stufflebeam’s
definition was based on the applicability of the definition across multiple disciplines.
Based on this definition of evaluation, the Kirkpatrick Model was the most frequently
Phillips (1991) stated the Kirkpatrick Model was probably the most well known
framework for classifying areas of evaluation. This was confirmed in 1997 when the
American Society for Training and Development (ASTD) assessed the nationwide
variety of types of U.S. organizations. Survey results indicated the majority (81%) of
HRD executives attached some level of importance to evaluation and over half (67%)
used the Kirkpatrick Model. The most frequently reported challenge was determining the
impact of the training (ASTD, 1997). Lookatch (1991) and ASTD (2002) reported that
reaction to the program, the amount of learning that took place, the extent of behavior
change after participants returned to their jobs, and any final results from a change in
doctoral research, the concept of the four Kirkpatrick measurement levels of evaluation
emerged. While writing an article about training in 1959, Kirkpatrick (1996) referred to
these four measurement levels as the four steps of a training evaluation. It is unclear
even to Kirkpatrick how these four steps became known as the Kirkpatrick Model, but
this description persists today (Kirkpatrick, 1998). As reported in the literature, this
Kirkpatrick’s first level of measurement, reaction, is defined as how well the trainees
liked the training program. The second measurement level, learning, is designated as the
determination of what knowledge, attitudes, and skills were learned in the training. The
recognized a big difference between knowing principles and techniques and using those
principles and techniques on the job. The fourth measurement level, results, is the
expected outcomes of most educational training programs such as reduced costs, reduced
turnover and absenteeism, reduced grievances, improved profits or morale, and increased
Numerous studies reported use of components of the Kirkpatrick Model; however,
no study was found that applied all four levels of the model. Although level one is the
found that reported use of level one as a sole measure of training. One application of the
second level of evaluation, knowledge, was reported by Alliger and Horowitz (1989). In
this study the IBM Corporation incorporated knowledge tests into internally developed
training. To ensure the best design, IBM conducted a study to identify the optimal test
for internally developed courses. Four separate tests composed of 25 questions each were
developed based on ten key learning components. Four scoring methods were evaluated
including one that used a unique measure of confidence. The confidence measurement
assessed how confident the trainee was with answers given. Tests were administered
both before and after training. Indices from the study assisted the organization to
evaluate the course design, effectiveness of the training, and effectiveness of the course
instructors. The development of the confidence index was the most valuable aspect of
the study. Alliger and Horowitz stated that behavior in the workplace was not only a
function of knowledge, but also of how certain the employee was of that knowledge.
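The article reports that one of IBM's four scoring methods incorporated a confidence rating, but it does not give the exact formula. The sketch below is therefore an illustration only, assuming a simple scheme in which confident correct answers score highest and confident wrong answers are penalized:

```python
# Hypothetical confidence-weighted scoring. Alliger and Horowitz's actual
# formula is not reported in the article; this weighting is an assumption.

def confidence_weighted_score(answers: list[tuple[bool, float]]) -> float:
    """Each answer is (correct?, confidence in [0, 1]).
    Confident correct answers add to the score; confident wrong answers subtract."""
    total = 0.0
    for correct, confidence in answers:
        total += confidence if correct else -confidence
    return total / len(answers)
```

Administered before and after training, such an index would distinguish trainees who merely guess correctly from those who both know the material and know that they know it.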
Two studies were found that measured job application and changes in behavior (level
three of the Kirkpatrick Model). British Airways assessed the effectiveness of the
Managing People First (MPF) training by measuring the value shift, commitment, and
empowerment of the trainees (Paulet & Moult, 1987). An in-depth interview was used to
measure the action potential (energy generated in the participants by the course) and level
of action as a result of the course. A want level was used to measure the action potential
and a do level for the action. Each measurement was assigned a value of high, medium,
or low. However, high, medium, and low were not defined. The study showed that 27%
of all participants (high want level and high do level) were committed to MPF values and
pursued the program's aims/philosophy. Nearly 30% of participants were fully committed
to the aims/philosophy of MPF although they did not fully convert commitment to action
(high want level and medium and low do level). Approximately one-third of the
participants (29%) moderately converted enthusiasm into committed action (medium and
low want level and medium and low do level). But 13% remained truly uncommitted.
Behavioral changes (level three of the Kirkpatrick Model) were measured following
low impact Outdoor-Based Experiential Training with the goal of team building
(OBERT) (Wagner & Roland, 1992). Over 20 organizations and 5,000 participants were
studied. Three measures were used to determine behavioral changes. Measure one was a
questionnaire completed by participants both before and after training. The second
measure was supervisory reports completed on the functioning of work groups before and
after training. The third measure was interviews with managers, other than the
return on investment (ROI), was used by companies because of the pressures placed on
management (TQM) and continuous quality improvements (CQI) and the threat of
outsourcing due to downsizing. Great debate was found in the training and development
literature about the use of ROI measures of training programs. Many training and
development professionals believed that ROI was too difficult and unreliable a measure
to use for training evaluation (Barron, 1997). One study was found in which a major
corporation measured change in productivity and ROI of a training program (Paquet,
Kasl, Weinstein, & Waite, 1987).
development and training department, which provides training for employees of CIGNA
management training made a business contribution. The research question posed was,
workplace?” The team conducting the research identified that data collection needed to
be built into the training program for optimal data gathering. If managers could use the
evaluation data for their own benefit as part of their training, they would be more likely
to cooperate.
training.
feedback.
action plan.
A case study approach was used to measure results. All fourteen case studies
reported in the article showed improvement; however, no specific measures were given.
and corporate overhead were used to calculate a cost of $1,600 per participant. Dollar
values were attached to action plans to determine the benefit value to the corporation.
However, these figures were not reported. The author stated that the ROI left little doubt
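ROI of this kind is conventionally computed as net program benefits divided by program costs, expressed as a percentage. A minimal sketch of that arithmetic follows; the $1,600 per-participant cost comes from the article, but since the benefit values were not reported, the benefit figure used here is hypothetical:

```python
def roi_percent(benefits: float, costs: float) -> float:
    """Classic training ROI: net program benefits over program costs, as a percent."""
    return (benefits - costs) / costs * 100

# Illustrative only: $1,600 per participant is the reported cost; the $4,800
# benefit value is invented because the article did not disclose its figures.
print(roi_percent(4800, 1600))  # 200.0
```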
After forty years of using the classic Kirkpatrick Model, several authors have
suggested that adaptations should be made to the model. Warr, Allan, and Birdi (1999)
over a seven-month period in a longitudinal study using a variation of the Kirkpatrick
Model. The main objective of this study was to demonstrate that training improved
performance, thereby justifying the investment in the training as appropriate. Warr et al.
(1999) suggested that the levels in the Kirkpatrick Model may be interrelated. They
investigated six trainee features and one organizational characteristic that might predict
outcomes at each measurement level. The six trainee features studied were learning
qualifications, tenure, and age. The one organizational feature evaluated was transfer
climate which was defined as the extent to which the learning from the training was
Warr et al. (1999) examined associations between three of the four measurement
levels in a modified Kirkpatrick framework. Warr et al. combined the two higher
Kirkpatrick measurement levels, behavior and results, into one measurement level called
job behavior. The three levels of measurement included were reactions, learning, and job
behavior. Trainees (all men) completed a knowledge test and a questionnaire on arrival
at the course prior to training. A questionnaire was also completed after the training. A
third questionnaire was mailed one month later. All questionnaire data were converted
into a measurement level score. The reaction level was assessed using the data gathered
after the training that asked about enjoyment of the training, perceptions of the usefulness
of the training, and the perceptions of the difficulty of the training. The learning level
was assessed using all three questionnaires. Since a training objective was to improve
trainee attitude towards the new electronic equipment, the perception of the value of the
equipment was measured at the second level of learning. Because experience or the
passage of time impacts performance, these researchers measured the amount of learning
that occurred during the course. Change scores were examined between before training
and after training data. Warr et al. derived a correlation between change scores and six
correlated with both pretest and posttest scores and could predict change in training. Job
behavior, the third measurement level, was evaluated using the before training
questionnaire results as compared to the data gathered one month after training. Multiple
regression analyses of the different level scores were used to identify unique
relationships.
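The change-score step of this analysis can be sketched as follows. The data layout is assumed for illustration; Warr et al.'s actual variables, coding, and regression models are not reproduced here:

```python
# Sketch of a change-score analysis: learning change is the posttest score
# minus the pretest score, which can then be correlated with trainee features.
import statistics

def change_scores(pre: list[float], post: list[float]) -> list[float]:
    """Learning change per trainee: posttest minus pretest score."""
    return [b - a for a, b in zip(pre, post)]

def pearson_r(x: list[float], y: list[float]) -> float:
    """Correlation between a trainee feature and the change scores."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)
```

A feature whose scores correlate with both the pretest and posttest scores, as the review notes, may also predict the change that training produces.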
Warr et al. (1999) reported the relationship of the six individual trainee features and
one organizational feature as predictors of each evaluation level. At level one, all
measures of learning change. Learning level scores that reflected changes were strongly
predicted by reaction level scores. Findings suggested a possible link between reactions
and learning that could be identified with the use of more differentiated indicators at the
reaction level. At level three, trainee confidence and transfer support significantly
predicted job behavior. Transfer support was a part of the organizational feature of
transfer climate. Transfer support was the amount of support given by supervisors and
colleagues for the application of the training material. Warr et al. suggested that an
investigation into the pretest scores might explain reasons for the behavior and generate
organizational improvements.
Belfield, Hywell, Bullock, Eynon, and Wall (2001) considered the question of how
using an adaptation of the Kirkpatrick Model with five levels. The five levels were
within the medical profession, Belfield et al. indicated that a limited number (less than
2%) evaluated healthcare outcomes. The majority of the abstracts reviewed (70%)
reported within the articles because of incorrect term usage. Of those examined, Belfield
et al. indicated the authors needed to focus on clear communication of the design of
evaluation methods.
Abernathy (1999) admitted quantifying the value of training was no easy task and
presented two additional variations of the Kirkpatrick Model; one developed by Kevin
Oake and another developed by Julie Tamminen. However, no application of the two
• A smile sheet that asked the trainee if they liked the training.
• Job improvement that measured whether the training helped the trainee do
the human resources trainer for the Motley Fool investment Web site. Her version of the
enriched. The component educated answered the question, “Did they receive the
learning they needed to do their jobs better?” The component amused was assessed with,
“Was it an enjoyable experience, such that they were motivated to learn and inspired to
go apply the learning?” The component enriched was evaluated with, “Was the
company, and the company's customers enriched by the learning?” and “Did improved
skills and performance on an individual and company level result from the training?”
Schriver (1994) in work with Martin Marietta Energy Systems. Marshall and Schriver
suggested that many trainers misinterpreted the Kirkpatrick Model and believed that an
evaluation for knowledge was the same as testing for skills. Because skills and
knowledge were both included in level two of the Kirkpatrick Model, evaluators assumed
skills were tested when only knowledge was tested. As a result, Marshall and Schriver
recommended a five-step model that separated level two of the Kirkpatrick Model into
Only the theory of the model was presented in the article; no application of this model
was found.
training from the development through the delivery and impact. Step one involved the
analysis of the System Performance Indicators that included the trainee’s qualifications,
instructor abilities, instructional materials, facilities, and training dollars. Step two
involved the evaluation of the development process that included the plan, design,
development, and delivery. Step three was defined as output which equated to the first
three levels of the Kirkpatrick Model. Step three involved trainees' reactions, knowledge
and skills gained, and improved job performance. Bushnell separated outcomes or results
of the training into the fourth step. Outcomes were defined as profits, customer
satisfaction, and productivity. This model was applied by IBM’s global education
With the advancement of training into the electronic age and the presentation of
programs formation rather than evaluation of the training effectiveness. Two applications
Although the Kirkpatrick Model has been applied for years to the traditional face-
to-face educational and technical training, recently the model was applied to non-
Applications of all levels of the Kirkpatrick Model are presented along with sample tools.
However, no data were presented demonstrating use of the model in an electronic training
application.
Lippert, Radhakrishna, Plank, and Mitchell (2001) used a learning style instrument (LSI) and a
demographic profile in addition to reaction measures and learning measures. The three
describe both the demographic profile and the learning style of the participants. The
evaluation of the training began with an on-line pretest and an on-line LSI. The pretest
included seven demographic questions. The LSI, pretest and posttest, and LSI
questionnaire were paired by the agents' social security numbers. Fifty-five of the 106
available agents completed all four instruments and were included in this study.
Lippert et al. (2001) reported training assessment results in five separate sections.
The demographic profile of the 55 agents was reported in the first section. The majority
were male (90%), aged 36-50 (71%), and possessed both prior Internet training (72%)
and intermediate computer skills (82%). The second result section described the LSI
responses. The top two ranked learning styles were convergers (38%) and assimilators
(35%). Convergers did best in activities requiring practical application of ideas and
preferred working with things instead of people. Lippert et al. reported no significant
relationships between the learning style results and the pretest and posttest results. The
research team suggested the strongest learner voice was the converger style and perhaps
that personality type (converger) was unintentionally attracted to participate in this study.
The third result section described the pretest and posttest scores. Both the number and
percent of correct and incorrect responses were identified as knowledge gain scores
(stated as percentages). The significance levels for differences in pretest and posttest
knowledge scores were categorized into substantial gain (30% or greater), moderate gain
(20-29%), little gain (10-19%) and negligible gain (0-9%). Overall, the reported
knowledge score gain ranged from a low of 7% to a high of 25%. Based on the data,
Lippert et al. reported that it was possible to increase participant knowledge via Web-
based training.
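The gain categories reported above reduce to a simple bucketing of pretest-to-posttest differences. A minimal sketch using the cutoffs Lippert et al. reported:

```python
def categorize_gain(gain_pct: float) -> str:
    """Bucket a pretest-to-posttest knowledge gain (in percentage points)
    using the cutoffs reported by Lippert et al. (2001)."""
    if gain_pct >= 30:
        return "substantial"
    if gain_pct >= 20:
        return "moderate"
    if gain_pct >= 10:
        return "little"
    return "negligible"
```

By these cutoffs, the study's reported range of 7% to 25% spans the negligible through moderate categories, with no topic reaching a substantial gain.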
The fourth result section discussed the agent interaction in the Listserv discussions.
Interaction was lower than expected by Lippert and colleagues (2001). The agents
commented they were reluctant to communicate on the Listserv after experiencing some
of the discussions by the higher level educators (Ph.D.). The fifth result section
examined the questionnaire responses following the training. Almost half of the agents
(47%) responded that the use of the Internet could provide a learning experience as
effective as face-to-face classes. Lippert et al. concluded the two Internet software tools,
the Web and the Listserv, were effective in accomplishing the training objectives.
Conclusion
many authors. The Kirkpatrick Model was assessed as a valuable framework designed
Value was based on the foundational ideas of Kirkpatrick and the longitudinal strength of
the model. The popularity of the Kirkpatrick Model was demonstrated by the 1997
ASTD survey results; however, few studies showing the full use of the model were
found. In addition to the Kirkpatrick Model, six adaptations were found; but no
time, money, materials, space, equipment, and manpower, continued efforts are needed to
assess all levels of effectiveness of training programs. Trainers from all disciplines
should develop evaluation plans for training and share the results of these initiatives.
References
Abernathy, D. (1999). Thinking outside the evaluation box. Training and Development,
53(2), 18-24.
Alliger, G.M., & Horowitz, H.M. (1989). IBM takes the guessing out of testing.
American Society for Training and Development. (1997). National HRD executive
http://www.astd.org/virtual_community/research/nhrd/nhrd_executive
survey_97me.htm
American Society for Training and Development. (2002). Training for the next economy:
http://store.astd.org/default.asp
Belfield, C., Hywell, T., Bullock, A., Eynon, R., & Wall, D. (2001). Measuring
Boulmetis, J., & Dutwin, P. (2000). The abc's of evaluation: Timeless techniques for
Holli, B., & Calabrese, R. (1998). Communication and education skills for dietetics
Kirkpatrick, D. (1996). Great ideas revisited. Training and Development, 50(1), 54-60.
Lippert, R., Radhakrishna, R., Plank, O., & Mitchell, C. (2001). Using different
Lookatch, R. P. (1991). HRD's failure to sell itself. Training and Development, 45(7),
35-40.
Paquet, B., Kasl, E., Weinstein, L., & Waite, W. (1987). The bottom line. In D.
Paulet, R., & Moult, G. (1987). Putting value into evaluation. Training and
Phillips, J. (1991). Handbook of training evaluation and measurement methods (2nd ed.).
Phillips, J., & Pulliam, P. (2000). Level 5 evaluation: Measuring ROI. Alexandria, VA:
Academic/Plenum.
Torres, R. T., Preskill, H.S., & Piontek, M.E. (1996). Evaluation strategies for
Wagner, R. J., & Roland, C. C. (1992). How effective is outdoor training? Training and
Warr, P., Allan, C., & Birdi, K. (1999). Predicting three levels of training outcome.