
Computers & Education 51 (2008) 448–462

www.elsevier.com/locate/compedu

Designing a Web-based assessment environment
for improving pre-service teacher assessment literacy

Tzu-Hua Wang a,*, Kuo-Hua Wang b, Shih-Chieh Huang c

a Department of Education, National Hsinchu University of Education, No. 521, Nanda Rd., Hsinchu City 300, Taiwan
b Graduate Institute of Science Education, National Changhua University of Education, No. 1, Jinde Rd., Changhua City, Changhua County 500, Taiwan
c Biology Department, National Changhua University of Education, No. 1, Jinde Rd., Changhua City, Changhua County 500, Taiwan
Received 19 March 2007; received in revised form 16 May 2007; accepted 14 June 2007

Abstract

Teacher assessment literacy is a key factor in the success of teaching, but several studies have concluded that teachers lack it.
The aim of this research is to propose the ‘‘Practicing, Reflecting and Revising with WATA system (P2R-WATA) Assessment
Literacy Development Model’’ for improving pre-service teacher assessment literacy. The WATA system offers personalized
learning resources and opportunities for pre-service teachers to assemble tests and administer them to students on-line.
Furthermore, the WATA system facilitates test analysis and item analysis, and enables pre-service teachers to review
statistical information from these analyses to revise test items. Sixty pre-service teachers participated in this research.
The research results indicate that pre-service teachers using the P2R-WATA Assessment Literacy Development Model improve
their assessment knowledge and assessment perspectives more effectively.
© 2007 Elsevier Ltd. All rights reserved.

Keywords: Teacher assessment literacy; Teacher education; WATA system; P2R-WATA Assessment Literacy Development Model

1. Introduction

The National Science Education Standards (NSES) of the United States emphasize the importance of
assessment in the process of teaching. Assessment is also important for students: some researchers
have suggested that proper assessment can enhance student learning effectiveness (Campbell & Collins, 2007;
Mertler, 2004; Mertler & Campbell, 2005). Unfortunately, numerous studies have revealed that teachers lack
the assessment literacy required to administer proper and effective assessment in a classroom (e.g. Arter, 2001;
Brookhart, 2001; Mertler, 2004; Popham, 2006; Wang, Wang, Wang, Huang, & Chen, 2004). The literature
has concluded that the major reasons are ‘‘faults in teacher education program (Mertler, 2004; Stiggins,
2004)’’ and ‘‘faults in teacher certification policy (Stiggins, 1999, 2004)’’. This research attempts to address

* Corresponding author.
E-mail address: tzuhuawang@gmail.com (T.H. Wang).

0360-1315/$ - see front matter © 2007 Elsevier Ltd. All rights reserved.
doi:10.1016/j.compedu.2007.06.010

the problem of ‘‘faults in teacher education program’’. Based on the key elements of four successful training
models mentioned in the literature, this research constructs a Web-based training model, named P2R-WATA,
to enhance the development of teacher assessment literacy. It also investigates the effectiveness of the
P2R-WATA in improving the assessment literacy of pre-service Biology teachers.

2. Literature review

2.1. Teacher assessment literacy

Assessment literacy is part of the package of pedagogical content knowledge (PCK), which has been intro-
duced as an element of the knowledge base for teaching (Shulman, 1986). PCK has been described as a ‘‘special
amalgam of content and pedagogy that is uniquely the province of teachers, their own special form of profes-
sional understanding (Shulman, 1987, p. 8)’’. In recent years, research in the domain of teacher education
has significantly advanced the understanding of the content of PCK. Magnusson, Krajcik, and Borko
(1999) conclude that ‘‘knowledge and beliefs about assessment in science’’ is a component of teachers’ PCK. Magnusson
et al. further explain that ‘‘knowledge and beliefs about assessment in science’’ includes ‘‘knowledge of
dimensions of science learning to assess’’ and ‘‘knowledge of methods of assessment’’. The former refers to
teacher knowledge of the aspects of student learning that are important to assess within a particular unit
of study. The latter refers to their knowledge of the possible methods of assessing the specific aspects of student
learning that are important to a particular unit of study. That is to say, teacher assessment literacy is included
in ‘‘knowledge and beliefs about assessment in science’’.
In addition to Magnusson et al. (1999), Stiggins, in the paper ‘‘Assessment Literacy for the 21st Century’’ (Stiggins,
1995), integrates earlier arguments and develops a brief definition of those equipped
with assessment literacy:
‘‘. . .know the difference between sound and unsound assessments. . . .are not intimidated by the sometimes
mysterious and always daunting technical world of assessment . . . knowing what they are assessing, why
they are doing so, how best to assess the achievement of interest, how to generate sound samples of per-
formance, what can go wrong, and how to prevent those problems before they occur. Most important,
those who are truly sensitive to the potential negative consequences of inaccurate assessment never permit
students to be put in a situation where their achievement might be mismeasured. . .’’ (Stiggins, 1995).

Table 1
Definition of teacher assessment literacy

Center for School Improvement and Policy Studies, Boise State University (2007):
Assessment literate educators recognize sound assessment, evaluation, and communication practices; they
- Understand which assessment methods to use to gather dependable information about student achievement
- Communicate assessment results effectively, whether using report card grades, test scores, portfolios, or conferences
- Can use assessment to maximize student motivation and learning by involving students as full partners in assessment, record keeping, and communication

Mertler (2004): Assessment literate educators
1. Recognize sound assessment, evaluation, and communication practices
2. Understand which assessment methods to use to gather dependable information about student achievement
3. Communicate assessment results effectively, whether using report card grades, test scores, portfolios, or conferences
4. Can use assessment to maximize student motivation and learning by involving students as full partners in assessment, record keeping, and communication

North Central Regional Educational Laboratory (2007):
The possession of knowledge about the basic principles of sound assessment practice, including terminology, the development and use of assessment methodologies and techniques, familiarity with standards of quality in assessment . . . and familiarity with alternatives to traditional measurements of learning

Moreover, some academic organizations and researchers interpret assessment literacy in their own
ways, as shown in Table 1.
This research adopts the definition given by Mertler (2004) and divides teacher assessment literacy into
two aspects: assessment knowledge and perspectives on assessment. Within the scope of this
research, the former includes assessment knowledge about multiple-choice tests, such as the construction of
test items, the assembling and administering of tests, test analysis and item analysis. The latter includes
teacher perspectives on the assessment functions and assessment procedures of multiple-choice tests.

2.2. Development of teacher assessment literacy

Research shows that assessment in the classroom has become an important activity in the instructional
process (Mertler, 2004; Stiggins & Chappuis, 2005) because a well-designed assessment and its scoring carry
quite positive educational significance for both teachers and students (AFT, NCME, & NEA, 1990; Stiggins,
2004; Stiggins & Chappuis, 2005). Popham (2006) indicated that assessment plays a pivotal role in the educa-
tion of students. Campbell and Collins (2007) further suggested that when assessment and instruction work in
tandem, and assessment is implemented effectively, improvement in student achievement is likely to occur.
Moreover, according to NSES of the United States, ‘assessment is an important tool for good inquiry into
teaching . . .skilled teachers of science are diagnosticians who understand students’ ideas, beliefs, and reason-
ing (NRC, 1996, p. 63)’. In brief, assessment plays an important role for both students and teachers.
In recent years, the ‘‘No Child Left Behind (NCLB)’’ project carried out in the United States puts emphasis
on assessments in schools because much evidence shows that performance on these assessments correlates to
student scores in standardized achievement examinations (Campbell, Murphy, & Holt, 2002; Mertler, 2004).
Campbell et al. (2002) argue that this tendency requires teachers to develop assessment-related ability accord-
ing to the curriculum standards set by each state and further to improve their own teaching and student learn-
ing effectiveness. Therefore, the promotion of professional knowledge of assessment among teachers has
become an important issue, and in turn assessment literacy is greatly valued (Mertler, 2004). In addition to
the growing importance of improving teacher assessment literacy resulting from the NCLB, Lukin, Bandalos,
Eckhout, and Mickelson (2004) further indicate that in the ‘‘Standards-based, Teacher-led Assess-
ment and Reporting System (STARS)’’ project devised by the state of Nebraska in 2000, district assessment is
adopted instead to manage teaching materials with greater precision and optimize student learning efficiency
(Bandalos, 2004; Plake, Impara, & Buckendahl, 2004). Lukin et al. point out that the STARS education
reform can be successful only when the district assessment is used to improve the teaching in schools and
for school improvement and accountability purposes. Lukin et al. further observe that another element affect-
ing the result of the STARS education reform is the improvement of classroom assessment. As a de-centralized
model of education reform, STARS can be successful only with teacher participation and teacher shouldering
of responsibility for student learning effectiveness. In this situation, classroom assessment quality and teacher
assessment literacy are issues that need to be seriously considered. Since assessment in the classroom can be
used to aid both instruction and learning, Nebraska State law LB 812 regulates teacher assessment ability and
its development (Buckendahl, Impara, & Plake, 2004; Lukin et al., 2004; Plake et al., 2004). All in all, teacher
assessment literacy has been more and more emphasized in the United States recently.
The literature review above shows that in recent decades, scholars have seen teaching, assessment and learn-
ing effectiveness as closely related to one another. Further, in educational reform projects carried out in
recent years, assessment has played an important role in assisting teaching and learning and in promoting learn-
ing effectiveness. However, many researchers have claimed that both pre-service and in-service teachers are not
equipped with appropriate classroom assessment ability. There were extensive discussions of this issue in
the 1990s (e.g. Plake, Impara, & Fager, 1993), and it is seen as a problem that will persist for years to come
(e.g. Campbell & Collins, 2007; Campbell et al., 2002; Mertler, 2004; Mertler & Campbell, 2005).
Thus, in recent decades, scholars have emphasized the importance of teacher assessment literacy. They
believe that the courses related to the development of teacher assessment literacy should be included in both
pre-service and in-service teacher education (AFT, NCME, & NEA, 1990; Brookhart, 2001; Campbell et al.,
2002; Lukin et al., 2004; Mertler, 2004; Stiggins, 1995, 1999, 2004). Four effective models of developing teacher
assessment literacy are introduced below:

• Plake and Impara (1993): In addition to ‘‘course training materials’’, the design of ‘‘parent-teacher vignettes’’
helps construct the ‘‘simulated parent-teacher conference’’. In this conference, teachers can simulate
explaining the significance of assessment scores to parents, which helps them put their assessment
knowledge into practice.
• NAC (Lukin et al., 2004): The NAC program consists of 18 credit hours of graduate-level courses and targets
experienced, practicing teachers and/or administrators. The 18 credit hours consist of six-hour courses
offered in each of two consecutive summers, with six hours of ‘‘practicum’’ during the intervening school
year. The six-hour course in the first summer covers basic assessment concepts as they are applied in both
classroom and large-scale settings. The six-hour course in the second summer focuses on analyzing and
interpreting assessment data and data-based decision-making. The distinguishing features of the NAC pro-
gram are a great increase in assessment course credits and a ‘‘practicum’’ of six credit hours that provides
teachers with opportunities to use their 12 credit hours of assessment knowledge and techniques
in a realistic situation.
• ALLT (Arter, 2001), PALS & IPALS (Lukin et al., 2004): The training is done through a ‘‘learning team’’.
Team members study assessment materials together and share what they learn. Putting the assessment
knowledge and assessment techniques they learn into practice is also emphasized.

In the four models mentioned above, assessment-related knowledge and techniques are all taught in a tra-
ditional way. An important characteristic shared by them is that they all provide opportunities for teachers to
apply what they have learned to realistic classroom situations. This combination of classroom experience and
assessment literacy training is affirmed in much of the related literature (Brookhart, 2001; Lukin et al., 2004; Mer-
tler, 2004; Stiggins, 1999; Taylor & Nolen, 1996). Taylor and Nolen (1996) even argue that only when taught
in realistic teaching situations can concepts of assessment be meaningful to the teachers taking courses on
assessment literacy. For example, in the model of Plake and Impara (1993), both ‘‘practice exercise’’ and
‘‘hands-on experience’’ are included. In the NAC model, the ‘‘practicum’’ course held in the spring and fall semesters
requires course-takers to apply the assessment concepts and techniques they have learned in the summer
semester to a realistic classroom situation. The ALLT, PALS and IPALS are similar in that they all include
a realistic classroom assessment component. In ALLT, administrative staff and teachers are both involved in
assisting with the assessment done in the classroom. PALS and IPALS also include the ‘‘student teaching semester’’,
in which course-takers can apply the assessment knowledge and techniques they have learned to a realistic
education environment.
This research takes the four models mentioned above as references. Drawing on their shared character-
istic of integrating classroom experiences into the development of teacher assessment literacy, it
aims to construct a teacher assessment literacy development model in an e-Learning environment.

2.3. Web-based assessment system and teacher assessment literacy development

With the improvement of Internet communication technology and database technology in recent years,
Web-Based Testing (WBT) has become a common and effective type of assessment and has been employed in
different educational settings (He & Tymms, 2005; Sheader, Gouldsborough, & Grady, 2006; Wang et al.,
2004; Wang, 2007). With WBT, teachers can construct test items, correct test papers and record scores on-line.
Moreover, WBT can present a test paper for testees in the form of hypermedia, including audio,
video and even virtual or dynamic images designed in Java. These characteristics enable WBT to take
the place of traditional paper tests to some extent. Below we investigate how to develop a Web-based assess-
ment system that helps develop teacher assessment literacy, from the twin angles of designing a Web-based assess-
ment system and developing teacher assessment literacy.
Scholars have offered a great deal of advice on the design of computer-assisted and Web-based assessment systems.
Bonham, Beichner, Titus, and Martin (2000) point out that a good assessment system should be equipped with
the following characteristics and functions: accessible through common Web browser software,
able to identify users by secret codes, able to grade automatically, and able to collect and record the information
related to student scores. Gardner, Sheridan, and White (2002) conclude that the system should be equipped with the
ability to construct a test item bank, in which teachers can store the items they already have or which have been
provided by the publishers of textbooks. Teachers can assemble tests on-line at any time and have students take
them on-line. In addition, the system should include a score bulletin board, which is constantly
updated with the most recent student scores and enables students to query their scores at any time and monitor
their own learning. He and Tymms (2005) conclude that the outstanding ability of computers to collect and pro-
cess information makes it easy to collect information on examinees’ examination tracks. Therefore, an assess-
ment system should not only passively collect information about examination tracks but also actively provide
feedback, guide learning, and assist in detecting learning misconceptions.
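The basic functions listed above (automatic grading, score recording, and score queries) can be sketched in a few lines of Python. The item IDs, answer key, and in-memory score store below are illustrative assumptions for exposition only, not code from any of the systems cited here:

```python
# Minimal sketch of auto-grading and score recording for a WBT system.
# The answer key and the in-memory "score table" are hypothetical.

ANSWER_KEY = {"Q1": "B", "Q2": "D", "Q3": "A"}   # hypothetical item key

def grade(responses):
    """Return the number of correct answers in one testee's submission."""
    return sum(1 for item, ans in responses.items()
               if ANSWER_KEY.get(item) == ans)

score_records = {}                                # stands in for a database table

def record_score(testee_id, responses):
    """Grade a submission and store the result for later score queries."""
    score_records[testee_id] = grade(responses)
    return score_records[testee_id]

# One testee answers two of three items correctly; the score is stored
# so a bulletin-board page could later query it by testee ID.
print(record_score("s001", {"Q1": "B", "Q2": "D", "Q3": "C"}))
```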
In addition to the above design suggestions, the Web-based assessment system in this research
should also assist in developing teacher assessment literacy. This research therefore adopts the Web-based
Assessment and Test Analysis system (WATA system) (Wang et al., 2004), which is equipped with the com-
plete assessment process described by Gronlund and Linn (1990, pp. 109–141, 228). From the perspective of
Gronlund and Linn, the similarities and differences between the WATA system and six other Web-based
assessment systems are shown in Table 2.
The ‘Step 1–Step 7’ in Table 2 are the ‘‘basic steps in classroom testing’’ proposed by Gronlund and Linn
(1990, pp. 109–141, 228). Table 2 shows that the WATA system is the only system that allows the construction
of a Two-Way Chart and the assembly of tests based on it. In ‘‘appraising the test’’, there are also many differ-
ences. For example, Gateway Testing System, Lon-CAPA, Mallard, Question Mark, Top-Class and
Web@ssessor can present raw scores, statistical charts and test results, but most of them do not perform item anal-
ysis. Question Mark is able to record the answer history of each student and perform item analysis; however,
it does not analyse student scores in more detail (such as the T score) or perform a test analysis (such as KR20
reliability). Based on the analysis above, the WATA system appears to satisfy the needs of this research most
completely, as it provides a comprehensive assessment process. That is why it is used in this
research as the Web-based assessment system for the construction of an e-Learning environment to develop
teacher assessment literacy.
The WATA system is equipped with a scaffold of the complete assessment process, the ‘‘Triple-A
Model (Wang et al., 2004)’’. The Triple-A Model is developed from the idea of the ‘‘basic steps in classroom test-
ing’’ described by Gronlund and Linn (1990, pp. 109–141, 228), which include ‘‘determining the purpose of testing’’,
‘‘constructing the Two-Way Chart’’, ‘‘selecting appropriate items according to the Two-Way Chart’’,
‘‘preparing relevant items’’, ‘‘assembling the test’’, and ‘‘appraising the test’’, along with the results
of questionnaires and interviews done with in-service teachers (Wang et al., 2004). The content of the
Triple-A Model includes:

Assembling: Teachers can construct the question database by themselves, arrange a Two-Way Chart and
assemble tests based on it.
Administering: Teachers arrange and administer a multi-examination schedule.
Appraising: After tests are taken, teachers can perform test analysis and item analysis.

Table 2
Comparing functions of seven different WBT systems (Wang et al., 2004)

                        Assembling                           Administering  Appraising
WBT systems             Step 1 & 2  Step 3  Step 4  Step 5   Step 6         Step 7
Gateway Testing System  X           X       V       V        V              D
Lon-CAPA                X           X       V       V        V              D
Mallard                 X           X       V       V        V              D
Question Mark           X           X       V       V        V              D
Top-Class               X           X       V       V        V              D
Web@ssessor             X           X       V       V        V              D
WATA                    V           V       V       V        V              V

X = not available; V = available; D = partially available. Step 1: Determining the purpose of testing; Step 2: Constructing the Two-Way
Chart; Step 3: Selecting appropriate items according to the Two-Way Chart; Step 4: Preparing relevant items; Step 5: Assembling the test;
Step 6: Administering the test; Step 7: Appraising the test, including ‘‘variance’’, ‘‘standard deviation’’, ‘‘testee T score, z score and Z
score’’, ‘‘mean of all grades’’, ‘‘average difficulty of the test’’, ‘‘KR20 reliability’’, ‘‘DP’’, ‘‘ID’’, ‘‘answers on each item in upper group and
lower group’’, ‘‘wrong answers of students for each item’’, ‘‘distracter analyses of all testees, upper group and lower group’’, and so on.
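To make some of the Step 7 statistics concrete, the sketch below computes three of them (KR20 reliability, T scores, and an upper-minus-lower discrimination index) over an invented 0/1 response matrix. It illustrates the standard formulas only; it is not code from the WATA system, and the data and the hard-coded upper/lower group size of two testees are assumptions:

```python
# Sketch of three classical "appraising" statistics on dichotomous data.
# Rows of the (invented) matrix are testees, columns are items; 1 = correct.
import statistics

responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
totals = [sum(row) for row in responses]
k = len(responses[0])                      # number of items

# KR20 = k/(k-1) * (1 - sum(p*q) / variance of total scores)
p = [sum(col) / len(responses) for col in zip(*responses)]
pq = sum(pi * (1 - pi) for pi in p)
var_total = statistics.pvariance(totals)
kr20 = k / (k - 1) * (1 - pq / var_total)

# T score = 50 + 10z for each testee's total score
mean, sd = statistics.mean(totals), statistics.pstdev(totals)
t_scores = [50 + 10 * (x - mean) / sd for x in totals]

# Discrimination (DP) for item 0: pass rate of the top two testees
# minus pass rate of the bottom two testees (by total score)
ranked = sorted(range(len(totals)), key=lambda i: totals[i], reverse=True)
upper, lower = ranked[:2], ranked[-2:]
dp_item0 = (sum(responses[i][0] for i in upper) / 2
            - sum(responses[i][0] for i in lower) / 2)

print(round(kr20, 3), [round(t, 1) for t in t_scores], dp_item0)
```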

In addition to the Triple-A Model taken from the literature review, ‘‘personalized design (Passig, 2001)’’
and ‘‘situated design (Clarke & Hollingsworth, 2002; Wiske, Sick, & Wirsig, 2001)’’ are two important ele-
ments in designing an environment for teacher education in this research. These two designs include:

• Personalized design
Passig (2001) observes that in an e-Learning environment for teacher education, personalized design is quite
important. Egbert and Thomas (2001) also point out that an e-Learning environment requires teachers to
participate and learn quickly, instead of passively waiting to be crammed with information. The e-Learning
system should vary with the individual progress and needs of each teacher. Moreover, it should be able to
provide a wide variety of personalized feedback whenever required. In other words, teachers are given
more time to reflect and should be able to decide their own learning procedures. They can search for and
gain feedback from the system depending on their learning progress, the questions they encounter, and the
situation they are in.
• Situated design
Clarke and Hollingsworth (2002) and Wiske et al. (2001) both point out that situated design is important for
a teacher education environment. One advantage of e-Learning is that situations that cannot be easily
simulated in a traditional classroom may be simulated with the help of information technology. Therefore,
taking situated design as its essence, the Web-based assessment system constructed in this research is
provided with a high degree of contextualization, which enables learners to experience a whole set of class-
room assessment procedures in simulation.

Personalized and situated design are realized concretely in the WATA system as follows:

• Personalized design in the WATA system
The WATA system provides each user with an individual interface to the Triple-A Model. Each user can con-
duct personalized ‘‘assembling of test papers’’, ‘‘management of test schedules’’ and ‘‘test analysis and
item analysis’’. Moreover, in each section and page of the Triple-A Model, the WATA system provides elec-
tronic learning resources related to that section or page.
• Situated design in the WATA system
The WATA system is equipped with the Triple-A Model. It provides a framework for the complete
assessment process, helping users to simulate the complete assessment process on-line. The complete assess-
ment process stated by Gronlund and Linn (1990, pp. 109–141, 228) includes ‘‘determining the purpose of
testing’’, ‘‘constructing the Two-Way Chart’’, ‘‘selecting appropriate items according to the Two-Way
Chart’’, ‘‘preparing relevant items’’, ‘‘assembling the test’’, and ‘‘appraising the test’’.

In addition to adopting the WATA system to construct the e-Learning environment for teacher assessment
literacy development, this research also takes as references the four teacher assessment literacy development
models referred to in the literature: the model of Plake and Impara (1993), the NAC model (Lukin et al.,
2004), the ALLT model (Arter, 2001), and the PALS and IPALS models (Lukin et al., 2004). Based
on the view, held by Brookhart (2001), Lukin et al. (2004), and Taylor and Nolen (1996), that classroom
experiences should be combined with the development of assessment literacy, this research develops the ‘‘P2R-WATA Assess-
ment Literacy Development Model (P2R-WATA)’’ (Fig. 1) to provide teachers with a better model for
training and promoting their assessment literacy. All participating pre-service teachers can simulate
assembling tests according to the scaffold of the complete assessment process (the Triple-A Model) in the
WATA system, and ‘‘practice’’ administering and appraising tests on-line with real students. After the stu-
dents finish the tests, the WATA system assists pre-service teachers in test analysis and item analysis, and
then provides immediate statistical feedback. Using this feedback,
pre-service teachers can ‘‘reflect’’ on the faults of the items they constructed. Based on what they have
learned before, the electronic learning resources, and the statistical information provided by the WATA system,
they can ‘‘revise’’ their items and re-administer the revised version of their tests. This helps pre-service
teachers test whether their revision strategies are effective. This research explores the effectiveness of the
P2R-WATA in promoting teacher assessment literacy.

Fig. 1. P2R-WATA Assessment Literacy Development Model.

3. Methodology

3.1. Participants

Participants in this research consisted of 30 third-year Biology pre-service teachers (male: 20, female: 10)
and 30 fourth-year Biology pre-service teachers (male: 15, female: 15). The third-year class was assigned to the
control group and the fourth-year class to the experimental group. The average age of the control
group is 21.47 (SD = .67); that of the experimental group is 22.53 (SD = .67). There is no significant difference
between the control group and the experimental group in the entry behaviour of assessment knowledge (F(1,57) = .101,
p = .752) or assessment perspectives (F(1,57) = .341, p = .562).
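The entry-behaviour comparison above is a two-group one-way ANOVA. As a hedged sketch of that computation (the actual pre-test scores are not published here, so the numbers below are invented for illustration):

```python
# One-way ANOVA F statistic for two independent groups,
# F = MS_between / MS_within with df = (1, n_a + n_b - 2).
def one_way_f(group_a, group_b):
    """Return the F statistic for a two-group one-way ANOVA."""
    n_a, n_b = len(group_a), len(group_b)
    grand = (sum(group_a) + sum(group_b)) / (n_a + n_b)
    mean_a, mean_b = sum(group_a) / n_a, sum(group_b) / n_b
    ss_between = n_a * (mean_a - grand) ** 2 + n_b * (mean_b - grand) ** 2
    ss_within = (sum((x - mean_a) ** 2 for x in group_a)
                 + sum((x - mean_b) ** 2 for x in group_b))
    df_between, df_within = 1, n_a + n_b - 2
    return (ss_between / df_between) / (ss_within / df_within)

control = [14, 16, 15, 17, 13]       # hypothetical pre-test scores
experimental = [15, 14, 16, 18, 13]
print(round(one_way_f(control, experimental), 3))
```

A small F (with a large p) is what indicates the groups did not differ on entry behaviour.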

3.2. Instruments

3.2.1. Assessment Knowledge Test (AKT) & Survey of Assessment Perspectives (SAP)
The AKT is mainly used to evaluate the assessment knowledge of pre-service teachers. Before designing the
AKT, all the assessment concepts slated for inclusion in the AKT were established and coded. Based on the Stan-
dards for Teacher Competence in Educational Assessment of Students (STCEAS) (AFT, NCME, & NEA,
1990) and the basic assessment concepts of Gronlund and Linn (1990), three experts on Biology education
and assessment chose the assessment concepts important to Biology teachers (Table 3). The 50 items in
the first version of the AKT covered all assessment concepts listed in Table 3. After the pilot test of the
AKT, items whose discrimination index was below .250 were removed (Noll, Scannell, & Craig, 1979). Forty
items were retained in the final version of the AKT. Cronbach’s α for the AKT is .993, and its average dif-
ficulty is .505.
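The pruning rule just described (dropping pilot items whose discrimination index falls below .250) can be sketched as follows. The upper/lower-group definition (top and bottom 27% by total score) and the 0/1 response matrix are illustrative assumptions, since the pilot data are not published here:

```python
# Sketch of discrimination-based item retention for a pilot test.
# responses: rows are testees, columns are items; 1 = correct (invented data).
def discrimination_index(responses, item, frac=0.27):
    """Upper-group minus lower-group proportion correct for one item."""
    totals = [(sum(row), i) for i, row in enumerate(responses)]
    ranked = [i for _, i in sorted(totals, reverse=True)]
    n = max(1, round(frac * len(responses)))          # group size
    upper, lower = ranked[:n], ranked[-n:]
    return (sum(responses[i][item] for i in upper) / n
            - sum(responses[i][item] for i in lower) / n)

responses = [
    [1, 1, 0],
    [1, 1, 1],
    [1, 0, 0],
    [0, 1, 1],
    [0, 1, 0],
    [0, 0, 0],
]

# Keep only items whose discrimination index reaches .250
retained = [j for j in range(3)
            if discrimination_index(responses, j) >= 0.250]
print(retained)
```

Here item 0 is answered equally often by strong and weak testees, so it fails the .250 criterion and is dropped.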
The Survey of Assessment Perspectives (SAP) (Wang et al., 2004) is used to assess pre-service teachers’
perspectives on assessment functions and procedures. The SAP is a nominal-categorical measure-
ment; each item groups participants into categories based on agreement (Yes = 1 or No = 0). There
are two subscales in the SAP: ‘‘perspectives about assessment functions’’, consisting of seven items (Cron-
bach’s α = .71), and ‘‘perspectives about assessment steps’’, consisting of eight items (Cronbach’s α = .78).
We also adopt the Multitrait-Multimethod Matrix (MTMM) to examine the construct validity of the
instruments in this research, and find that the MTMM analysis supports both convergent and discriminant
validity (Campbell & Fiske, 1959) (see Table 4).

3.2.2. Web-based Assessment and Test Analysis (WATA) system (Wang et al., 2004)
This research uses the WATA system to construct the ‘‘P2R-WATA Assessment Literacy Development
Model’’. The WATA system was developed on the basis of the personalized Triple-A Model, which in turn

Table 3
Assessment concepts covered in AKT
1. Constructing items, assembling test papers and administering tests
1.1. Principles of constructing a multiple-choice item
1.2. Characteristics of multiple-choice item
1.3. Bloom’s Taxonomy
1.4. Difference between summative assessment and formative assessment
1.5. Functions of summative assessment and formative assessment in teaching activity
1.6. General steps of administering tests
1.7. Administering tests
2. Analysis and appraising of testing data
2.1. Test analysis
2.1.1. Variance
2.1.2. Average
2.1.3. Standard deviation
2.1.4. KR20 and Cronbach’s α reliability
2.1.5. Test difficulty
2.1.6. Validity (content validity – Two-Way Chart)
2.1.7. Analysis of students’ scores distribution
2.1.8. T-scores
2.1.9. z-scores
2.1.10. Normal distribution
2.2. Item analysis
2.2.1. Options distractor power analysis
2.2.2. Item discrimination analysis
2.2.3. Students’ error-conception analysis
2.2.4. Item difficulty analysis

Table 4
MTMM analysis of AKT and SAP
AKT SAP
AKT (0.993)
SAP 0.441 (0.660)

is based on the ‘‘basic steps in classroom testing’’ proposed by Gronlund and Linn (1990, pp. 109–141, 228)
and on interviews with 17 in-service teachers and assessment experts. The Triple-A Model comprises (Wang et al.,
2004):

Assembling: Construction of item pools and test items; assembly of tests based on the Two-Way Chart
(Fig. 2).
Administering: Assignment of test items and item choices randomly to testees; provision of personal identifi-
cation numbers (PINs) and examination passwords for testees to take the test over the Internet;
collection, recording, and processing the scores and other data from the tests.
Appraising: Analysis of the process data collected from the tests; generation of the test analysis and item analysis
(Fig. 3) statistical reports.
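As a rough illustration of the Assembling step, the sketch below fills each cell of a hypothetical Two-Way Chart (content topic × cognitive level → number of items required) from an item pool. The topics, Bloom levels, and pool layout are assumptions for exposition; the WATA system's actual data model is not published here:

```python
# Sketch of "assembly of tests based on the Two-Way Chart": each chart
# cell specifies how many items to draw from the pool. All data invented.
chart = {                       # (topic, Bloom level) -> items required
    ("photosynthesis", "knowledge"): 2,
    ("photosynthesis", "application"): 1,
    ("cell division", "comprehension"): 1,
}
item_pool = [
    {"id": "B01", "topic": "photosynthesis", "level": "knowledge"},
    {"id": "B02", "topic": "photosynthesis", "level": "knowledge"},
    {"id": "B03", "topic": "photosynthesis", "level": "application"},
    {"id": "B04", "topic": "cell division", "level": "comprehension"},
    {"id": "B05", "topic": "cell division", "level": "knowledge"},
]

def assemble(chart, pool):
    """Fill each chart cell from the pool; raise if a cell cannot be filled."""
    test = []
    for (topic, level), needed in chart.items():
        matches = [it for it in pool
                   if it["topic"] == topic and it["level"] == level]
        if len(matches) < needed:
            raise ValueError(f"not enough items for {topic}/{level}")
        test.extend(matches[:needed])
    return [it["id"] for it in test]

print(assemble(chart, item_pool))
```

Because the chart doubles as a table of specifications, a test assembled this way carries its content validity evidence with it.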

3.3. Research design

This research was implemented in the course ‘‘Biology Teaching Theory and Practice’’, an important
required course for Biology pre-service teachers. The experimental group used the P2R-WATA Assessment
Literacy Development Model (P2R-WATA). The control group used the P2R Assessment Literacy Develop-
ment Model (P2R). The P2R-WATA is constructed based on the WATA system, whose primary framework is

Fig. 2. Assembling a test according to the Two-Way Chart.

Fig. 3. Test analysis and item analysis.

the aforementioned Triple-A Model. The Triple-A Model provides experimental group pre-service teachers
with the ‘‘framework of complete assessment steps’’, which facilitates improvement of assessment knowledge
and assessment perspectives among experimental group pre-service teachers. In addition to the Triple-A
Model, the P2R-WATA includes two essential components:

• Personalized design
This design is based on the suggestions of Passig (2001) and Egbert and Thomas (2001) that an e-Learn-
ing system for teacher education should vary with the individual progress and needs of each teacher. All
T.H. Wang et al. / Computers & Education 51 (2008) 448–462 457

experimental group pre-service teachers can improve their assessment knowledge and assessment perspec-
tives using the personalized Triple-A Model interface. In addition, the WATA system supports individual
pre-service teachers by providing personalized electronic learning resources.
• Situated design
This design is based on an important characteristic shared by the four assessment literacy development
models introduced earlier: providing opportunities for learners to apply what they have learned in realistic
classroom situations. All experimental group pre-service teachers can practice assembling, administering and
appraising tests on-line. Using the P2R-WATA, pre-service teachers can ''practice'' assembling and admin-
istering tests on-line, and testees can take the test over the Internet. Process data from the test is then ana-
lysed and a set of test-related statistics is generated by the WATA system. Pre-service teachers can use these
statistics to ''reflect'' on their own mistakes in test construction and on testee learning. After reflection,
pre-service teachers ''revise'' their mistakes based on the statistics provided by the system. They
may also draw on the personalized electronic learning resources provided by the WATA system. After revi-
sion, the revised test can be administered to the same testees again to check whether the revision strategies
have been effective.
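The ''reflect'' and ''revise'' steps above amount to a flagging rule over the item statistics. The sketch below illustrates one such rule; the thresholds (difficulty between 0.3 and 0.8, discrimination of at least 0.3) are common rules of thumb assumed here for illustration, not values taken from the WATA system.

```python
def flag_items(stats, p_range=(0.3, 0.8), min_d=0.3):
    """Return the names of items whose indices suggest revision.

    stats -- mapping of item name to (difficulty, discrimination)
    """
    flagged = []
    for item, (p, d) in stats.items():
        # Flag items that are too easy/hard or that fail to separate
        # high scorers from low scorers.
        if not (p_range[0] <= p <= p_range[1]) or d < min_d:
            flagged.append(item)
    return flagged

stats = {"item1": (0.55, 0.45),   # acceptable on both indices
         "item2": (0.95, 0.10),   # too easy and poorly discriminating
         "item3": (0.40, 0.05)}   # acceptable difficulty, poor discrimination
# flag_items(stats) -> ["item2", "item3"]
```

A pre-service teacher would then revise only the flagged items and re-administer the test, as the cycle above describes.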

The P2R is similar to the P2R-WATA. The major difference is that the P2R-WATA provides experimental
group pre-service teachers with an e-Learning environment using the WATA system and personalized elec-
tronic learning resources, but the P2R provides control group pre-service teachers with a traditional learning
environment and printed rather than personalized electronic learning resources. The content of the printed
learning resources was identical to that of the personalized electronic learning resources. Neither group received lectures
from their professors during this research.

3.4. Research procedure

The research procedure consisted of the following steps:

1. All the pre-service teachers were divided into an experimental group and a control group. The pre-tests of the AKT
and SAP were administered to the experimental group and control group pre-service teachers to ascertain
their entry behaviour in assessment knowledge and assessment perspectives.
2. After administering the two pre-tests, the experimental group and control group pre-service teachers were
asked to practice assembling tests. All pre-service teachers were provided with teaching materials on the
topics of 'Evolution' and 'Classification', and a Two-Way Chart. These two topics were selected from the
textbook for students in the first grade of junior high school. Based on the provided teaching materials and
the Two-Way Chart, the pre-service teachers constructed their own test papers. Each test paper contained ten
multiple-choice items. Throughout the process of constructing the items, the pre-service teachers in the
experimental group relied on the WATA system, while those in the control group used Microsoft Word.
3. After the pre-service teachers finished designing the first version of the test papers, first-graders from sixteen
classes in a junior high school in central Taiwan took the tests. All of these junior high school students had
previously been taught the topics of 'Evolution' and 'Classification'. Whether a class completed the test
papers designed by the experimental group or the control group pre-service teachers was randomly
assigned. In the former case, the class took the test online with the WATA system in a computer classroom
at the same time; in the latter case, the class took the test in paper-and-pencil format in a traditional
classroom at the same time.
4. After the first version of the test papers was administered, all the process data from the tests was sent back to
the individual pre-service teachers. The WATA system automatically scored and analysed all test papers of
the experimental group pre-service teachers, while the control group pre-service teachers manually scored
their test papers and did the analysis. After finishing the analysis, both groups had to compose test and
item analysis reports and revise the items they themselves considered problematic, such as those with a
poor discrimination index, distraction index or difficulty index. The pre-service teachers in both groups
were not allowed to add or delete any items in the first version of the test papers; they were restricted to
revising items only. The experimental group pre-service teachers revised the items directly in the WATA
system, while the control group pre-service teachers did so in Microsoft Word. After the revised test papers
were constructed, both groups were given the mid-test of AKT and SAP.
5. The classes of junior high school students that had completed the first version of the test papers designed by
the experimental group pre-service teachers were also asked to complete the revised version by the same
group. The same procedure was used with the control group. After the revised test papers were administered,
both groups again did the analysis, composed the test and item analysis reports and revised the items. Finally,
all pre-service teachers were given the post-test of AKT and SAP, which was meant to evaluate the effectiveness
of their learning.
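The distraction index mentioned in step 4 rests on a simple idea: a distractor (wrong option) is working when it attracts low scorers more often than high scorers. The option-count sketch below illustrates this; it is a minimal example, not the WATA system's actual computation.

```python
from collections import Counter

def distractor_table(choices_high, choices_low):
    """Tally how often each option was chosen by the high- and low-scoring groups.

    choices_high / choices_low -- the option ('A'-'D') chosen by each testee
    in the high- or low-scoring group for one multiple-choice item.
    """
    high = Counter(choices_high)
    low = Counter(choices_low)
    options = sorted(set(choices_high) | set(choices_low))
    # For each option: (count in high group, count in low group).
    return {opt: (high[opt], low[opt]) for opt in options}

# Correct answer 'B': high scorers mostly pick it; low scorers scatter
# across the distractors, which is the pattern a good item shows.
table = distractor_table(["B", "B", "B", "A"], ["C", "A", "A", "B"])
# table -> {'A': (1, 2), 'B': (3, 1), 'C': (0, 1)}
```

A distractor chosen by high scorers more often than low scorers would be one of the ''problematic'' items the pre-service teachers were asked to revise.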

3.5. Data collection and analysis

All the data collected are quantitative, including the pre-test, mid-test and post-test scores of the AKT and SAP.
The pre-test scores are used to test the differences in entry behaviour between the control and experimental
groups. In addition, repeated measures analysis, using the age of the pre-service teachers as a covariate to
remove its effect, is used to test the differences in the improvement of assessment knowledge and assessment
perspectives between the control and experimental groups.
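The covariate adjustment described above can be illustrated in miniature: the linear effect of age is estimated and subtracted from the scores before group comparisons. This residualization sketch is a simplification for illustration, not the full repeated measures model used in the research.

```python
import numpy as np

def remove_covariate(scores, age):
    """Subtract the fitted linear effect of age from the scores."""
    slope, intercept = np.polyfit(age, scores, 1)
    return np.asarray(scores) - (slope * np.asarray(age) + intercept)

# Toy data: scores that rise with age plus individual variation.
age = np.array([21.0, 22.0, 22.0, 23.0, 24.0, 25.0])
scores = 2.0 * age + np.array([1.0, -1.0, 0.5, -0.5, 1.0, -1.0])
adjusted = remove_covariate(scores, age)
# After adjustment the scores no longer correlate with age, so group
# differences in `adjusted` cannot be attributed to age.
```

Group means of the adjusted scores can then be compared across the pre-, mid- and post-tests without age confounding the comparison.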

4. Results

4.1. Development of assessment knowledge

The results of the repeated measures analysis are shown in Table 5. There is a significant difference between the
experimental and control groups (F(1,57) = 4.588, p < .05) on the pre-test, mid-test and post-test scores of the
AKT. Furthermore, the experimental group performs significantly better than the control group, as shown by
the Post Hoc test (Table 6; p < .05).
In addition, Table 5 also indicates that there is no significant difference among the pre-test, mid-test and
post-test scores for all pre-service teachers on the AKT. However, the ‘‘Group’’ factor significantly interacts

Table 5
Summary table of repeated measures on AKT (n = 60)
Source SS DF MS F
Between
Age 129.976 1 129.976 1.102
Group 541.082 1 541.082 4.588*
Error 6722.739 57 117.943
Within
AKT 75.031 2 37.515 .586
AKT × Age 113.187 2 56.593 .811
AKT × Group 833.740 2 416.870 5.972**
Error 7957.435 114 69.802
Age: the age of pre-service teachers is used as a covariate to remove its effect; Group: experimental group and control group.
* p < .05.
** p < .01.

Table 6
Post Hoc test of experimental and control groups on AKT
Treatment                                                  AKT mean difference a    Standard error
Experimental group (n = 30) vs. control group (n = 30)     7.677*                   3.584
a Adjustments for multiple comparisons: Bonferroni method.
* p < .05.

Table 7
Cross analysis for Group by AKT scores
Group                         AKT         Mean scores a    Standard error
Control group (n = 30)       Pre-test     49.781           3.045
                             Mid-test     55.176           2.378
                             Post-test    62.062           2.594
Experimental group (n = 30)  Pre-test     51.303           3.045
                             Mid-test     61.807           2.378
                             Post-test    76.938           2.594
a Evaluated at the covariate value appearing in the model: Age = 22.000. The age of pre-service teachers is used as a covariate to remove its effect.

Fig. 4. Graph of interaction between Group and AKT scores.

with the pre-test, mid-test and post-test scores of AKT (F(2,114) = 5.972, p < .01). The findings of cross analysis
for ‘‘Group’’ by AKT scores are shown in Table 7 and Fig. 4.
Fig. 4 shows that there is no significant difference between the experimental group and the control group in the
pre-test scores of AKT (see Section 3.1). However, the trend of the experimental group diverges significantly
from that of the control group (Table 6; p < .05).

Table 8
Summary table of repeated measures on SAP (n = 60)
Source SS DF MS F
Between
Age .458 1 .458 .260
Group 26.013 1 26.013 14.791**
Error 100.246 57 1.759
Within
SAP .525 2 .262 .141
SAP × Age .103 2 .052 .028
SAP × Group 23.566 2 11.783 6.311**
Error 212.852 114 1.867
Age: the age of pre-service teachers is used as a covariate to remove its effect; Group: experimental group and control group.
** p < .01.

Table 9
Post Hoc test of experimental and control groups on SAP
Treatment                                                  SAP mean difference a    Standard error
Experimental group (n = 30) vs. control group (n = 30)     1.683*                   .438
a Adjustments for multiple comparisons: Bonferroni method.
* p < .05.

Table 10
Cross analysis for Group by SAP scores
Group                         SAP         Mean scores a    Standard error
Control group (n = 30)       Pre-test     11.296           .479
                             Mid-test     11.921           .321
                             Post-test    12.958           .250
Experimental group (n = 30)  Pre-test     11.737           .479
                             Mid-test     14.579           .321
                             Post-test    14.909           .250
a Evaluated at the covariate value appearing in the model: Age = 22.000. The age of pre-service teachers is used as a covariate to remove its effect.

Fig. 5. Graph of interaction between Group and SAP scores.

4.2. Development of assessment perspectives

The results of the repeated measures analysis are shown in Table 8. There is a significant difference between the
experimental and control groups (F(1,57) = 14.791, p < .01) on the pre-test, mid-test and post-test scores of
the SAP. The Post Hoc test shows that the experimental group performs significantly better than the control
group (Table 9; p < .05).
In addition, Table 8 also indicates that there is no significant difference among all pre-service teachers on
the pre-test, mid-test and post-test scores of SAP. However, the ‘‘Group’’ factor significantly interacts with the
pre-test, mid-test and post-test scores of SAP (F(2,114) = 6.311, p < .01). The findings of cross analysis for
‘‘Group’’ by SAP scores are shown in Table 10 and Fig. 5.
Fig. 5 shows that there is no significant difference between the experimental group and the control group in
their SAP pre-test scores (see Section 3.1). However, the trend of the experimental group diverges significantly
from that of the control group (Table 9; p < .05).

5. Concluding remarks

Positive findings on the effectiveness of the P2R-WATA in this research show that the P2R-WATA can
more effectively improve Biology pre-service teachers' assessment knowledge and assessment perspectives than
the P2R. The research results stated above can be explained by the characteristics of the P2R-WATA. Its
design is based on the suggestions of personalized design (Passig, 2001) and situated design (Clarke & Hol-
lingsworth, 2002; Wiske et al., 2001). The pre-service teachers in the experimental group (P2R-WATA)
can determine their own learning pace and process in the WATA system, and use the information provided
by the WATA system for feedback and reflection. The situated design of the P2R-WATA permits the pre-ser-
vice teachers in the experimental group to combine assessment knowledge with assessment practice and in turn
improve their own assessment literacy.
In addition, the results also suggest that a well-designed Web-based assessment system is suitable for teacher
education aimed at improving teacher assessment literacy. Moreover, it is suggested that the Triple-A Model
be included in the design of such a Web-based assessment system. The Triple-A Model offers a comprehensive
framework of assessment steps, providing a scaffold to facilitate the development of assessment literacy. The
pre-service teachers in the P2R-WATA have the opportunity to practice assembling, administering and
appraising tests on-line. Furthermore, they can exploit the array of statistical data generated by the WATA
system to perform test and item analysis, compose test and item analysis reports, and revise the tests they
have made. The P2R-WATA also provides experimental group pre-service teachers with opportunities to test
their item revision strategies. Though the P2R provides control group pre-service teachers with opportunities
to practice assembling, administering and appraising tests, it provides no personalized e-Learning environ-
ment, no personalized electronic learning resources, no scaffolding to develop assessment literacy, and no
automatically generated statistical data. Pre-service teachers in the P2R may encounter problems when per-
forming item and test analysis because the statistics are complex and difficult to compute without the aid of
a dedicated system. Moreover, when faced with problems, the control group pre-service teachers are limited
to the support of printed learning resources and cannot take advantage of the personalized feedback and
support of personalized electronic learning resources.
Preliminary results on the effectiveness of the P2R-WATA are promising. This research suggests that the
P2R-WATA should be considered as a model for an assessment literacy development program. However,
because the instruments in this research were developed primarily to assess Biology teachers' assessment literacy
about multiple-choice tests in a traditional classroom, this research suggests that more instruments should be
developed to assess the assessment literacy of teachers with other subject matter backgrounds, and to assess
teachers’ assessment literacy about alternative assessments (e.g. performance assessment, authentic assessment
and portfolio assessment). In addition, further research should explore the effectiveness of the P2R-WATA in
in-service teacher education. Moreover, longitudinal research should also be conducted to investigate whether
teachers in the P2R-WATA will apply their increased assessment literacy to their classroom teaching to
improve student learning effectiveness.

Acknowledgements

This paper is a part of Dr. Tzu-Hua Wang’s unpublished doctoral dissertation from Graduate Institute of
Science Education, National Changhua University of Education, Taiwan. Dr. Tzu-Hua Wang would like to
express the deepest gratitude to his dissertation advisors, Prof. Shih-Chieh Huang and Prof. Kuo-Hua Wang,
for their continuous support and invaluable guidance. Dr. Tzu-Hua Wang is also grateful to his dissertation
committee members, Prof. Ching-Kuch Chang, Prof. Mei-Hung Chiu, Prof. Hsiang-Chuan Liu, Prof. Whe-
Dar Lin, Prof. Wei-Lung Wang and Prof. Chih-Chiang Yang, for their thoughtful comments and suggestions.
The authors are also grateful for the insightful comments from the referees.

References

American Federation of Teachers, National Council on Measurement in Education, & National Education Association (AFT, NCME, &
NEA). (1990). The standards for teacher competence in the educational assessment of students. Educational Measurement: Issues and
Practice, 9(4), 30–32.
Arter, J. (2001). Learning teams for classroom assessment literacy. NASSP Bulletin, 85(621), 53–65.
Bandalos, D. (2004). Introduction to the special issue on the Nebraska Standards-based, Teacher-led assessment and Reporting System
(STARS). Educational Assessment: Issues and Practice, 23(2), 4–6.
Bonham, S. W., Beichner, R. J., Titus, A., & Martin, L. (2000). Education research using Web-based assessment systems. Journal of
Research on Computing in Education, 33, 28–45.
Brookhart, S. M. (2001). The standards and classroom assessment research. Paper presented at the annual meeting of the American
Association of Colleges for Teacher Education, Dallas, TX (ERIC Document Reproduction Service No. ED451189).
Buckendahl, C. W., Impara, J. C., & Plake, B. S. (2004). A strategy for evaluating district developed assessments for state accountability.
Educational Measurement: Issues and Practice, 23(2), 15–23.
Campbell, C., & Collins, V. L. (2007). Identifying essential topics in general and special education introductory assessment textbooks.
Educational Measurement, Issues and Practice, 26(1), 9–18.
Campbell, C., Murphy, J. A., & Holt, J. K. (2002). Psychometric analysis of an assessment literacy instrument: Applicability to preservice
teachers. Paper presented at the annual meeting of the Mid-Western Educational Research Association, Columbus, OH.
Campbell, D., & Fiske, D. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin,
56, 81–105.
Center for School Improvement and Policy Studies, Boise State University. (2007). What is assessment literacy? Assessment literacy.
Retrieved May 16, 2007 (available http://csi.boisestate.edu/al/).
Clarke, D., & Hollingsworth, H. (2002). Elaborating a model of teacher professional growth. Teaching and Teacher Education, 18(8),
913–1059.
Egbert, J., & Thomas, M. (2001). The new frontier: A case study in applying instructional design for distance teacher education. Journal of
Technology and Teacher Education, 9(3), 391–405.
Gardner, L., Sheridan, D., & White, D. (2002). A Web-based learning and assessment system to support flexible education. Journal of
Computer Assisted Learning, 18, 125–136.
Gronlund, N. E., & Linn, R. L. (1990). Measurement and evaluation in teaching (6th ed.). New York: MacMillan.
He, Q., & Tymms, P. (2005). A computer-assisted test design and diagnosis system for use by classroom teachers. Journal of Computer
Assisted Learning, 21(6), 419–429.
Lukin, L. E., Bandalos, D. L., Eckhout, T. J., & Mickelson, K. (2004). Facilitating the development of assessment literacy. Educational
Measurement: Issues and Practice, 23(2), 26–32.
Magnusson, S., Krajcik, J., & Borko, H. (1999). Nature, sources and development of pedagogical content knowledge. In J. Gess-Newsome & N.
G. Lederman (Eds.), Examining pedagogical content knowledge (pp. 95–132). Dordrecht, The Netherlands: Kluwer Academic Publishers.
Mertler, C. A. (2004). Secondary teachers' assessment literacy: Does classroom experience make a difference? American Secondary
Education, 32(3), 49–64.
Mertler, C. A. & Campbell, C. (2005). Measuring teachers’ knowledge & application of classroom assessment concepts: Development of the
assessment literacy inventory. Paper presented at the annual meeting of the American Educational Research Association, Quebec, Canada.
Noll, V., Scannell, D., & Craig, R. (1979). Introduction to educational measurement (4th ed.). Boston: Houghton Mifflin Company.
North Central Regional Educational Laboratory. (2007). Indicator. Assessment. Retrieved May 16, 2007 (available www.ncrel.org/
engauge/framewk/pro/literacy/prolitin.htm).
National Research Council (NRC). (1996). National science education standards. Washington, DC: National Academy Press.
Passig, D. (2001). Future online teachers’ scaffolding: What kind of advanced technological innovations would teachers like to see in future
distance training projects? Journal of Technology and Teacher Education, 9(4), 599–606.
Plake, B. S. & Impara, J. C. (1993). Teacher assessment literacy: Development of training modules (ERIC Document Reproduction Service
No. ED358131).
Plake, B. S., Impara, J. C., & Buckendahl, C. W. (2004). Technical quality criteria for evaluating district assessment portfolios used in the
Nebraska STARS. Educational Measurement: Issues and Practice, 23(2), 10–14.
Plake, B. S., Impara, J. C., & Fager, J. J. (1993). Assessment competencies of teachers: A national survey. Educational Measurement: Issues
and Practice, 12(4), 10–12, 39.
Popham, W. J. (2006). Needed: A dose of assessment literacy. Educational Leadership, 63(6), 84–85.
Sheader, E., Gouldsborough, I., & Grady, R. (2006). Staff and student perceptions of computer-assisted assessment for physiology
practical classes. American Journal of Physiology-Advances in Physiology Education, 30(4), 174–180.
Shulman, L. S. (1986). Those who understand: Knowledge growth in teaching. Educational Researcher, 15(1), 4–14.
Shulman, L. S. (1987). Knowledge and teaching: Foundations of the new reform. Harvard Educational Review, 57, 1–22.
Stiggins, R. J. (1995). Assessment literacy for the 21st century. Phi Delta Kappan, 77(3), 238–245.
Stiggins, R. J. (1999). Evaluating classroom assessment training in teacher education programs. Educational Measurement: Issues and
Practice, 18(1), 23–27.
Stiggins, R. J. (2004). New assessment beliefs for a new school mission. Phi Delta Kappan, 86(1), 22–27.
Stiggins, R. J., & Chappuis, J. (2005). Using student-involved classroom assessment to close achievement gaps. Theory Into Practice, 44(1),
11–18.
Taylor, C. S., & Nolen, S. B. (1996). What does the psychometrician’s classroom look like? Reframing assessment concepts in the context
of learning. Education Policy Analysis Archives, 4(17) (available olam.ed.asu.edu/epaa/v4n17.html).
Wang, T. H. (2007). What strategies are effective for formative assessment in an e-learning environment? Journal of Computer Assisted
Learning, 23, 171–186.
Wang, T. H., Wang, K. H., Wang, W. L., Huang, S. C., & Chen, S. Y. (2004). Web-based Assessment and Test Analyses (WATA)
system: Development and evaluation. Journal of Computer Assisted Learning, 20(1), 59–71.
Wiske, M. S., Sick, M., & Wirsig, S. (2001). New technologies to support teaching for understanding. International Journal of Educational
Research, 35, 483–501.