
Report on Assessment Project in AL 6730 Assessment in TESOL

Introduction
Throughout AL 6730 Language Assessment in TESOL, I learned that language testing is
one of the crucial keys to help students acquire a second language, and I discovered many
concepts and techniques to develop effective assessment tools. In this course, I also had an
opportunity to work with other TESOL students to design assessment materials and administer
them in a language class offered by Hawaii Literacy. We strived to use authentic-like materials
and tasks, and in our project described here, we employed Google Maps in order to prepare
students for English use outside the classroom. Hawaii Literacy offers many classes around
Honolulu, one of which is a drop-in ESL class. I have been volunteering there for about a year,
and this connection afforded us the opportunity to pilot our test project. Early on, we decided to use Google Maps because it is widely used around the world as a route planner for travel on foot, by car, and by public transportation, and is therefore a valuable resource. Hence, our project was designed to teach the vocabulary and language
structures for students to be able to use Google Maps in English and to measure their
achievement of this learning in our assessment product. As a result of this project, we saw
accomplishments but also encountered challenges and possible improvements for our future
success in authentic assessment. This paper first introduces how our project developed the
Google Maps-based materials for language testing. It also reports students' outcomes and our
analysis of their results. Finally, it discusses authenticity in language assessment by reflecting on
this project and exploring other studies that have debated the same issues.
Project Description
Background information

Host class
The English Language Learner (ELL) class, also known as Hawaii Literacy's Adult Drop-In
Center, was the target class for our assessment project. This class meets on Mondays,
Wednesdays, Thursdays, Fridays, and Saturdays. Drop-in means students can attend whenever
they like. Students register for the class on the first day, and the registration is good for one year.
One class serves students of all levels. Therefore, their proficiency level is fairly mixed, and it
may range from low beginning to mid-intermediate. The class aims to assist adult learners
wanting to improve their English communication and literacy skills, so they can become active
members of the community. Samuel Skeist, the host teacher of the class, implements a variety of
activities designed to help students manage real-world tasks in English. He recognizes that reading development cannot be achieved through reading activities alone; it should be supplemented by activities targeting other skills. Mr. Skeist manages this mixed-proficiency environment by encouraging higher-proficiency students to help lower-proficiency students. Furthermore, he balances his activities so that every student is challenged regardless of proficiency level. No formal assessment is conducted in this course. The majority of students are from Korea and China, and there are some students from Vietnam and Japan. Students' ages range from forty to sixty; most have already retired from their permanent jobs, while others work part time or are unemployed. Each
lesson is fairly well balanced with controlled activities such as reading comprehension,
vocabulary reinforcement, and less structured activities such as group discussions.
Host institution

Hawaii Literacy is a non-profit organization which provides literacy programs including
family literacy, adult literacy, English language learning programs, and a Bookmobile. The English Language Learning class, one of the institution's programs, is held at Kaumakapili Church in Honolulu, Hawaii. The overall goal of the institution is to help adult learners gain literacy and/or English knowledge and skills. Hawaii Literacy's other programs also offer literacy learning services in a variety of tutoring settings. All programs are supported by volunteers dedicated to meeting students' various literacy needs. The non-credit program does not
formally assess students to determine if they pass or fail.
Assessment Project Group Members
This assessment project was conducted by three graduate students: Sara Fowler, William
Jackson, and Yuhei Yamamoto. This section introduces these test developers' backgrounds in teaching English.
Sara Fowler worked as an ESL tutor at the middle school level on a one-on-one basis
with low-proficiency students for three years. In addition, she worked as a writing and Spanish
tutor for college-level students for two years, which included helping ESL students. Ms. Fowler also has experience teaching at a drop-in school similar to Hawaii Literacy: the Esperanza Center, an ESL school in Baltimore, where she taught English mainly to Hispanic adult students. Students' ages there ranged from seventeen to seventy, and their English literacy skills ranged from complete illiteracy to university-educated. Since teaching at the Esperanza Center, she has worked at Hawaii Pacific University's Center for Academic Success as a writing and Spanish tutor for college-level students.

William Jackson has taught ESL classes in a variety of international settings. He taught in Taiwan for one year and in Korea for another year. Mr. Jackson also has teaching experience in various locations throughout Latin America.
I, Yuhei Yamamoto, am an international student from Japan and have been in the MA
TESOL program at Hawaii Pacific University since the fall semester of 2012. Throughout the
past two years, I participated in many short-term tutoring opportunities such as teaching ESL
students and helping Korean elementary school teachers build their English fluency. I also
developed professionally through academic conferences and workshops. For example, aside from attending conferences, I presented my study entitled "Implementing Extensive Reading Activities" at the 2014 Hawaii TESOL conference. In addition, during summer vacation,
I went back to Japan and I worked as a workshop instructor for Japanese elementary school
teachers who needed training to improve English teaching at their schools. Finally, our project
site, Hawaii Literacy, is the school where I mainly teach English in Hawaii; I have taught there for a year, both for practice teaching and as a volunteer.
Language Assessment Instrument
The Google Maps-based assessment was administered on April 2, 2014. We stressed authenticity and its usefulness in our alternative assessment. Our assessment was a quiz designed to measure students' understanding of the day's lesson on getting and giving directions, built around the online tool Google Maps. Since the program is drop-in, we needed to teach some basic vocabulary as a strategic rehearsal to make our materials a valid measure of what we had taught.
Our test was integrative because each section required students to answer questions with
different language elements. Although we did not aim to measure students' production of grammar and vocabulary, they needed to recognize the relevant grammar and vocabulary to complete each section. Reading, listening, and writing skills were measured, and this
combination of multiple language skills in the test showed that our test was direct. In the reading
section, students drew on the maps based on the directions they read. In the writing section,
students wrote directions for their friends. In the listening section, students listened to the
directions orally given by the test providers, and filled in each blank with words. Our test was
criterion-referenced, so each item was worth a set number of points, students earned points based on the correctness of their answers, and responses were summed for a total score. A benchmark was set to determine whether students had performed satisfactorily. We used objective right-or-wrong scoring for items A through D. Item E was scored subjectively, with one point awarded per step the student had written. Spelling and grammar were not penalized unless they made a written sentence impossible to comprehend. Finally, we constrained how students' directions were scored: only understandable steps were counted as correct.
Objectives
The objectives for this assessment were to:
- Measure students' comprehension of vocabulary items
- Measure students' reading, listening, and writing skills
- Provide assessment materials relevant to real-life situations and useful for both frequent and occasional attendees of the class
- Design tasks and activities at an appropriate level for the students
- Create test items that can be managed by all students in the day's class
Specification
1. Specifications of content:
a. Operations:
i. Follow printed directions on a blank map
ii. Comprehension of written directions for navigation on a map

iii. Comprehension of oral directions for navigation on a map
iv. Give written directions using a map

b. Types of text:
i. Maps of Honolulu
ii. Step-by-step written instructions adapted from Google Maps
iii. Step-by-step verbal instructions given by another person

c. Length of text:
i. Three sets of written instructions, approximately 35 to 75 words each
ii. One set of verbal instructions, approximately 75 words (repeated three
times)
iii. One set of written instructions produced by the student, approximately
5 sentences

d. Addressees of text: The candidate is expected to be able to write or speak to
i. Adult native and non-native speakers of English
ii. The learner
iii. General audience

e. Topics:
i. Navigation by bus and on foot in a city setting in scenarios that are related
by a storyline
ii. Giving and receiving directions

f. Readability (Flesch-Kincaid grade level): 3rd grade reading level
g. Structural range:
i. Imperative
ii. Future Tense

h. Vocabulary range:
i. Directions for navigation: turn, right, left, straight, continue, road splits /
fork in the road, corner, get on/off (the bus)
ii. Prepositions, especially toward and onto

i. Dialect and style:
i. Standard American English
ii. Casual style
iii. List style (Google Maps form)

j. Speed of processing: 30 seconds for each item, approximately 10-20 words each

2. Structure, timing, medium, and techniques:
a. Test structure:
i. Google Maps Walking

ii. Note taking on written bus directions
iii. Google Maps Walking
iv. Note taking on oral bus directions
v. Writing directions Walking

b. Number of items:
i. Approximately 5-10 steps in sections A-D
ii. Writing approximately 5 steps in section E

c. Number of passages:
i. 3 reading
ii. 1 listening (repeated 3 times)

d. Medium:
i. Paper and pencil
ii. Blank maps of the areas in the directions
iii. Lists of directions
iv. Blank lined handwriting sheet

e. Testing techniques:
i. Information transfer by drawing on a map or taking notes
ii. Extended answer writing directions based on a map
iii. Gap filling

3. Criterial level of performance:
a. Satisfactory performance for A, C: correctly following 80% of directions
b. Satisfactory performance for B, D: correctly filling in 80% of blanks
c. Satisfactory performance for E: correctly indicating 80% of directions
4. Scoring procedure:
a. Following directions: Each step within an item is worth 1 point - fulfilling it
correctly earns 1 point, fulfilling it incorrectly earns 0 points

b. Taking notes: Each blank within an item is worth 1 point - filling it correctly
earns 1 point, filling it incorrectly earns 0 points

c. Writing directions: Each written step is worth 1 point (up to a total of 5 points) -
a step must be coherent, contain relevant vocabulary, and be recognizable as a step
in a set of directions; steps not meeting these criteria, or steps left unwritten,
receive no credit

d. All group members collaboratively scored the written directions to ensure that all
scores were agreed upon


e. 30 points possible, total
i. Item A = 5 points
ii. Item B = 7 points
iii. Item C = 6 points
iv. Item D = 7 points
v. Item E = 5 points

f. Grammar and spelling were not taken into account in scoring, so long as they did
not interfere with the meaning

5. Sampling:
a. Lesson plan for the day
b. Google Maps
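As an illustration of the scoring procedure in the specification above, the short Python sketch below computes per-item scores, the 30-point total, and the 80% criterial check. It is only a sketch: the per-step marks for this one imaginary student are hypothetical, and the data structures are invented for the example rather than taken from the actual test materials.

```python
# Maximum points per item, as given in the test specification (30 points total).
MAX_POINTS = {"A": 5, "B": 7, "C": 6, "D": 7, "E": 5}

# Hypothetical per-step marks for one student: 1 = step fulfilled correctly,
# 0 = step fulfilled incorrectly. Items A-D were scored right-or-wrong;
# item E steps were judged subjectively (coherent, relevant, recognizable).
steps = {
    "A": [1, 1, 1, 1, 1],
    "B": [1, 1, 0, 1, 1, 1, 1],
    "C": [1, 1, 1, 0, 1, 1],
    "D": [1, 1, 1, 1, 0, 1, 1],
    "E": [1, 1, 1, 0, 0],
}

def item_scores(step_marks):
    """Sum the 1-point steps within each item."""
    return {item: sum(marks) for item, marks in step_marks.items()}

def satisfactory(scores, criterion=0.80):
    """Apply the 80% criterial level of performance to each item."""
    return {item: score / MAX_POINTS[item] >= criterion
            for item, score in scores.items()}

scores = item_scores(steps)
total = sum(scores.values())   # total out of 30 points
passed = satisfactory(scores)
```

For these hypothetical marks, the total is 25 of 30 points, and items A through D meet the 80% benchmark while item E (3 of 5 steps) falls short.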

Student results
The student results are summarized below in (1). The mean overall score on the test was 22.75 out of a possible 30 points. Seven outliers were detected in the results: five extremely high scorers and two extremely low scorers, which skewed the distribution. The median was 26, falling exactly in the middle of all the scores, and the mode was the highest possible score, 30 points. This indicates that some high-proficiency students took part in this test and lesson, although the expected range was between beginning and mid-intermediate. The range was 30, which also reflects the wide spread of the students' levels of performance.

(1)
Mean  22.75
Standard Error  1.68
Median  26
Mode  30
Standard Deviation  8.25
Sample Variance  68.11
Kurtosis  1.63
Skewness  -1.43
Range  30
Minimum  0
Maximum  30
Sum  546
Count  24

Statistical analysis of item A is shown below in (2). Results showed a mean score of 4.75 out of 5 possible points and a median, mode, and range of 5. In addition, item facility (I.F.) was 0.912, and item discrimination (I.D.) was 0.66. These data show that only a few students answered incorrectly, and the item may have been too easy for the rest of the students.

(2)
Mean  4.75
Standard Error  0.21
Median  5
Mode  5
Standard Deviation  1.03
Sample Variance  1.07
Kurtosis  21.92
Skewness  -4.63
Range  5
Minimum  0
Maximum  5
Sum  114
Count  24

Statistical analysis of item B is shown below in (3). Results showed a mean score of 5.96 out of 7 possible points and a median, mode, and range of 7. I.F. and I.D. were 0.62 and 0.83, respectively. These data indicate that item B followed a pattern similar to item A, so it may also have been too easy for most students. Interestingly, one low scorer answered this item correctly, which raised the I.D. relative to item A.

(3)
Mean  5.96
Standard Error  0.41
Median  7
Mode  7
Standard Deviation  1.99
Sample Variance  3.95
Kurtosis  5.74
Skewness  -2.47
Range  7
Minimum  0
Maximum  7
Sum  143
Count  24

Statistical analysis of item C is shown below in (4). Results showed a mean score of 4.33 out of 6 possible points and a median, mode, and range of 6. The I.F. and I.D. were 0.58 and 0.83, respectively. The top 12 scorers answered correctly, and one low scorer also earned the points, yielding the same I.D. as item B.

(4)
Mean  4.33
Standard Error  0.49
Median  6
Mode  6
Standard Deviation  2.41
Sample Variance  5.80
Kurtosis  -0.62
Skewness  -1.06
Range  6
Minimum  0
Maximum  6
Sum  104
Count  24

Statistical analysis of item D is shown below in (5). Results showed a mean score of 5.13 out of 7 possible points and a median, mode, and range of 7. The I.F. and I.D. were 0.54 and 1.0, respectively; the I.D. reached the highest possible value. Interestingly, a high number of intermediate scorers, as well as all the low scorers, made errors. Together with the median, mode, and range, these data show that the item was not difficult for advanced learners and somewhat distinguished intermediate from low scorers.

(5)
Mean  5.13
Standard Error  0.52
Median  7
Mode  7
Standard Deviation  2.54
Sample Variance  6.46
Kurtosis  0.00
Skewness  -1.16
Range  7
Minimum  0
Maximum  7
Sum  123
Count  24

Statistical analysis of item E is shown below in (6). Results showed a mean score of 2.5 out of 5 possible points, a mode of 0, a median of 3, and a range of 5. The I.F. and I.D. were 0.25 and 0.833, respectively. This item required students to write directions and produced the most distinctive results among the items. The mode indicates that this was the most difficult section. Although an I.D. value of 1 had been expected, some low scorers performed well on this item. With this exception, the results showed a pattern of descending values from top to bottom.

(6)
Mean  2.5
Standard Error  0.41
Median  3
Mode  0
Standard Deviation  2.02
Sample Variance  4.09
Kurtosis  -1.63
Skewness  -0.07
Range  5
Minimum  0
Maximum  5
Sum  60
Count  24
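The item statistics reported in this section follow standard formulas. The Python sketch below is illustrative only: the student scores are invented, and item discrimination is computed with the common upper-group/lower-group method (top and bottom thirds by total test score), which may differ from the exact grouping used in our analysis.

```python
from statistics import mean, median, mode

def item_facility(item_points, max_points):
    """Item facility (I.F.): mean proportion of the item's points earned."""
    return mean(s / max_points for s in item_points)

def item_discrimination(item_points, total_scores, max_points, fraction=1/3):
    """Item discrimination (I.D.) by the upper/lower-group method:
    mean item proportion in the top group minus that in the bottom group,
    with groups taken as the top and bottom thirds by total test score."""
    n = max(1, round(len(item_points) * fraction))
    ranked = sorted(range(len(item_points)),
                    key=lambda i: total_scores[i], reverse=True)
    upper = mean(item_points[i] / max_points for i in ranked[:n])
    lower = mean(item_points[i] / max_points for i in ranked[-n:])
    return upper - lower

# Invented data: eight students' scores on one 5-point item,
# paired with their total test scores.
item = [5, 5, 5, 4, 5, 3, 2, 0]
totals = [30, 29, 27, 25, 22, 15, 10, 4]

facility = item_facility(item, 5)                 # ~0.725
discrimination = item_discrimination(item, totals, 5)
central = (mean(item), median(item), mode(item))  # mean, median, mode
```

With this invented data, the item looks easy for the upper group (all item points earned) and hard for the lower group, so the discrimination value is high, mirroring the pattern we saw on items B through D.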


Reflection and Discussion
In AL 6730, our group sought to conduct an alternative assessment project using Google Maps in our test items. Our original motivation for employing Google Maps was that we wished to measure students' language performance using authentic-like materials, and we
hoped that students would feel our assessment was useful for their English use outside the
classroom. This project helped me learn practical aspects of test design and administration. In
addition, for my future teaching in Japan, I would like to introduce authentic materials and non-
traditional test tasks for formative assessments in my classes. However, my students will
probably be required to take traditional tests for summative or high stakes exams, so for the
foreseeable future, authentic assessment will still be rare in Japan. Reflecting on the results of the
project, I encountered some considerations that might be useful for my future attempts to use
authentic materials. Hence, the aim of this section is to examine other studies about authenticity
in language testing and reflect on my group project based on findings from the literature review.
This paper first discusses the necessity of authentic materials and tasks in language testing, and
then shows the challenges of defining authenticity in classroom assessment. Finally, this paper
considers whether authentic materials increase students' motivation.
Necessity of Authentic Assessment

It is important to explore theories that strive to define the necessity of authenticity in
language testing. Newmann, Brandt, and Wiggins (1998) claimed that assessment is authentic when the meaning and value measured in students' tasks extend beyond their success in school. Palm (2008) also pointed out that authentic tests are true to students' lives beyond their school, curriculum, and classroom practice. Wiggins (1989) was an early proponent of authentic assessment, claiming that any exhibition of mastery should be the students' "opportunity to show off what they know and are able to do rather than a trial by question" (p. 43). He also
believed that the use of intellectual performances stressed authenticity in assessment and led to
ideal outcomes in authentic contexts. His position did not insist that traditional tests were
inauthentic but indicated that they were less meaningful and less direct, which meant that
students would not receive faithful assessment. Bachman and Palmer (1996) viewed authenticity
as an essential component in language tests, suggesting that test developers carefully consider it
in designing materials.
I am proud of our assessment materials because Google Maps enabled us to incorporate
some authenticity into the test items, and it could be useful for students when they use English
outside the classroom. Most students were familiar with Google Maps, and some told me that they often used it to check routes to places they wanted to go, so we considered the materials
to be meaningful and relevant for them. Therefore, I agree with studies insisting that authentic
assessment enables test developers to measure what students know and what they are able to do
in tasks instead of measuring their performance in traditional questions. Students at Hawaii
Literacy come to school with the desire to acquire English for their everyday life. Hence, our
group project aimed to emphasize the relevance of the test items to students' lives.
Limitations of Authenticity in Language Testing

Currently, authenticity is considered a major goal in language test design, but there are
also challenges to how realistically it can be achieved. Brown (2001) argued that because tests are fundamentally structured around artificial contexts for language use, authenticity is difficult to attain in language testing. Spolsky (1985) asserted that test interaction
cannot be authentic as students are required to display knowledge and skills that are not seen in
their non-test interactions. Brown conducted an assessment task in which students in a university-level Japanese course interviewed native speakers outside the classroom, and he argued that no matter how similar assessment tasks are to real-world tasks, students do not engage with them in the same way they interact in non-test situations. This threatens validity: even though the assessment targets the abilities needed for real-world tasks, students are not engaged in the same way they use the language outside the classroom. Arnold
(1991) also stressed that measuring students' success on authentic tasks depends on their responding authentically, arguing that it is difficult for students to act out authentic-like roles in the classroom. Authentic assessment is often situational and involves interaction with classmates. However, Arnold observed that in language classrooms, role-play activities and simulations involve interaction among non-native peers, and he pointed out that such interaction is hardly authentic if it is intended to prepare students for conversations with native speakers. Henk (1993) agreed that the use of authentic text is valuable in reading assessment, but he also warned of complications of which test developers need to be aware. First, to keep reading tests fair, prior knowledge should be assessed, because authentic content may already be familiar to some students but not others. Second, since the length of reading materials must suit the limited testing time, only a small number of authentic sources are appropriate. Finally, poor readers are often threatened by the more complicated independent reading that authentic materials demand.
Our project used visual images and words directly from Google Maps, as well as tasks
such as drawing steps on a map and writing directions for friends. We believed that these
attempts made our assessment more authentic. However, our test also included non-authentic tasks such as gap filling. Reflecting on this, I found that despite our efforts toward authenticity, it was often hard to introduce real-world tasks, since some traditional task types remained useful for measuring students' responses. In addition, even though our project did not include student interaction in the assessment, I learned that interaction may enhance authenticity in language testing. I will need to consider how to make test interaction more authentic so that what students do on test tasks correlates with the real-world skills the test aims to measure. Finally, the test results indicated that some students were overwhelmed by our test items. As many studies have discussed, authentic materials need to be designed carefully with students' proficiency levels in mind. Our assessment project used closed-ended tasks, and the problem of some students struggling with our items might have been eased by open-ended items in which students could show their differing degrees of fluency.
Authentic Assessment to Increase Students' Motivation
Many authors have investigated whether authentic assessment increases students' motivation. Peacock (1997) conducted a classroom research project with beginning EFL students to investigate the effect of authentic materials on students' motivation. He found that whereas authentic materials increased on-task behavior and overall class motivation, students reported that authentic materials were less interesting than artificial materials and that the topic and language were too difficult for them. Based on these results, Peacock suggested that teachers carefully select or adapt authentic materials, because this choice can affect students' motivation for learning in authentic contexts.
We did not formally collect data on students' motivation in this project. Yet, because students were curious and willing to talk about how they did on our assessment, I was able to hear their reflections on the test items in person. I asked them what they thought of the test items, and most answered either that the test was difficult or that it was easy. Our project aimed to increase students' motivation by making test items relevant to their lives; as a result, some students perceived the materials as useful tools for use outside the classroom, but others did not. I concluded that proficiency level and content relevance are two significant factors in selecting authentic materials that will be valuable for students.
Conclusion
This paper explored the effectiveness as well as the limitations of the use of authentic
materials in assessment activities. Overall, although designing authentic test items seems
challenging, I believe this approach is important for measuring students' English abilities outside the classroom. Thus, I would like to continue emphasizing authenticity in my future test materials. Moreover, I learned that more data collection is necessary to establish the effectiveness of our assessment in this project. In particular, I would like to conduct research to investigate whether authentic assessment increases students' motivation. This literature review
and AL 6730 project helped me gain knowledge of theories, practical issues, and research
procedures in language assessment.
Future Inquiries
- How does proficiency level affect students' outcomes in authentic assessment?
- What are the issues with using alternative assessment in EFL settings in which students learn English for school entrance examinations?
- What are the considerations for selecting authentic materials that increase students' motivation in language assessment?

References
Arnold, E. (1991). Authenticity revisited: How real is real? English for Specific Purposes, 10(3),
237-244.
Bachman, L., & Palmer, A. (1996). Language testing in practice: Designing and developing
useful language tests. Oxford, England: Oxford University Press.
Brown, R. (2001). The eye of the beholder: Authenticity in an embedded assessment task.
Language Testing, 18(4), 463-481.
Henk, W. (1993). New directions in reading assessment. Reading & Writing Quarterly, 9(1),
103-120.
Newmann, F., Brandt, R., & Wiggins, G. (1998). Research news and comment: An exchange of
views on "Semantics, psychometrics, and assessment reform: A close look at
'authentic' assessments". Educational Researcher, 27(6), 19-22.
Palm, T. (2008). Performance assessment and authentic assessment: A conceptual analysis of the
literature. Practical Assessment, Research & Evaluation, 13(4), 1-10.
Peacock, M. (1997). The effect of authentic materials on the motivation of EFL learners. ELT
Journal, 51(2), 144-156.

Spolsky, B. (1985). The limit of authenticity in language testing. Language Testing, 2(1), 31-40.
Wiggins, G. (1989). Teaching to the (authentic) test. Educational Leadership, 46(7), 41-47.
