Peer Self and Tutor Assessment Relative Reliabilities

Studies in Higher Education
ISSN: 0307-5079 (Print) 1470-174X (Online) Journal homepage: http://www.tandfonline.com/loi/cshe20
Peer, self and tutor assessment: Relative reliabilities
Lorraine A.J. Stefani
To cite this article: Lorraine A.J. Stefani (1994) Peer, self and tutor assessment: Relative
reliabilities, Studies in Higher Education, 19:1, 69-75, DOI: 10.1080/03075079412331382153
To link to this article: http://dx.doi.org/10.1080/03075079412331382153
Published online: 05 Aug 2006.
Studies in Higher Education
Volume 19, No. 1, 1994
RESEARCH NOTE
Peer, Self and Tutor Assessment: relative reliabilities
LORRAINE A. J. STEFANI
The Queen's University of Belfast
ABSTRACT A collaborative peer, self and tutor assessment scheme in which the students themselves
defined the marking schedule for a scientific report of a laboratory experiment within the biological
sciences, is evaluated in terms of correlations between sets of marks. The issues addressed in this report
include: (1) the reliability of student-derived marks, with particular emphasis on perceived tendencies
of high achieving students to underestimate their performance and low achieving students to
overestimate their performance; (2) the use of student-derived marks in formal grading procedures;
and (3) the learning benefits which accrue for students participating in peer and self-assessment
procedures. The results of this study undertaken within the context of a clearly defined, carefully
monitored assignment indicate that students have a realistic perception of their own abilities and can
make rational judgements on the achievements of their peers. The positive implications of introducing
peer and self-assessment schemes into undergraduate courses are discussed.
Introduction
Over the past few years the United Kingdom Enterprise Higher Education (EHE) initiative
has provided an impetus for change in the way lecturers and tutors interact with and
communicate knowledge to large groups of students. There is a growing awareness amongst
university lecturers of the pressure to increase student participation in the learning process
and to provide a skills-based education as well as one based on academic achievement.
Many of the new teaching and learning strategies developed within EHE institutions
have been designed to ensure that students become more aware of the demands of future
employers for graduates who are able to display a range of personal transferable skills.
Communication and presentation skills, problem-solving and organisational skills, team-work
and leadership skills have all been incorporated into degree courses. In addition, in many
fields of professional training there has been a concern for developing students' ability to
assess and evaluate their own work in ways which are applicable to their future profession
(Magin & Churches, 1989). According to Boud & Lublin (1983), "one of the most important
processes that can occur in undergraduate education is the growth in students of the ability
to be realistic judges of their own performance and the ability to monitor their own learning".
The prevalent model for assessment throughout the education system has been one in
which "students have little or no input, are often unaware of the assessment criteria and have
little recourse regarding the judgements made of them" (Falchikov, 1986). Within the
context of the changing climate of higher education, development of the skill of self-assessment is becoming an increasingly important issue in many EHE institutions, and many
self-assessment devices are being introduced as aids to learning. However, an issue which is
still an obstacle to wider introduction of self-assessment over a range of courses and learning
formats, is the summative use of self-assessment for grading purposes.
As discussed by Boud (1989), many people believe that student-derived marks could not
be used in formal grading procedures because they would not be accurate enough. In a
comprehensive review of the available literature on self-assessment procedures, Boud &
Falchikov (1989) reported that the general trend in the studies was that high achieving students
underestimate their performance and low achieving students overestimate their performance.
However, a close examination of many of the studies which were included in Boud &
Falchikov's review highlights the fact that much of the work which has been carried out on
peer and self-assessment is reported in an inconsistent manner and it can be difficult to define
the parameters of many of the studies. More extensive analyses of the reliability of student-derived marks across a range of subject areas are required to determine the extent to which
peer and self-assessments could be used in formal grading procedures.
This paper reports the results of a study undertaken with a large class of students
self-assessing a biochemistry laboratory practical report and a second class peer-assessing the
same laboratory report using a student-derived marking schedule. The questions addressed
are: (1) Do students of lower ability overmark themselves and students of higher ability
undermark themselves? (2) Could self- and peer-assessments be used summatively in formal
grading procedures as well as formatively in contributing to the learning process by assisting
learners to direct their energies to areas of improvement? (3) Is there any correlation between
self, peer and tutor assessment of an assigned piece of work and the end of term ranking of
students after traditionally assessed examinations? (4) Is self- and peer-assessment beneficial
as a learning experience for students?
Methodology
A peer and self-assessment procedure was presented to two first-year undergraduate classes
within the context of writing a report of a laboratory practical project which constitutes part
of the student training in biochemical techniques. The students themselves drew up the
marking schedules which they felt were appropriate for the task. This was done by a class
representative negotiating with the rest of the students until the class was satisfied with the
scheme. No modification of the schemes was made by the tutor on the basis that engendering
high levels of communication and negotiation within large classes of students was considered
to be an important contribution to the success of this innovation. Student ownership of the
work was also considered to be a high priority. As it can be extremely difficult to obtain
agreement between lecturers and tutors on appropriate marking criteria, it seemed unfair to
thwart the student efforts by introducing modifications.
One class of 87 students agreed to self-assessment of the laboratory reports and another
class of 67 students agreed to peer assessment of the reports using the student generated
marking schedules. An ideal situation would be self, peer and tutor assessment occurring
within the same class, but timetable constraints did not allow for this. The student marking
schedules are shown in Table I.
When the laboratory work was completed, the students were given 7 days to hand in
their reports. All the reports were assessed by the tutor, but these marks were not initially
released to the students. The reports to be self-assessed were handed back to the students
who were then given 7 days to assess their own work before handing in the report to receive
the tutor assessment. The reports to be peer-assessed were handed back to the class randomly
and this class was also given 7 days to complete the assessment. This project was carried out
with first year undergraduate students and with such a large class (67), the students did not
know many of their peers. Although the projects were not coded in any way, it turned out that
no student peer-assessed a friend's laboratory report.
TABLE I. Student-derived marking schedules

(a) Self-assessment Schedule
Aims and hypothesis             15
Methods and apparatus           20
Results (calculations etc.)     25
Interpretation of results       25
Discussion                      15
Total                          100

(b) Peer Assessment Schedule
Introduction                    15
Aims                            15
Method                          10
Results                         10
Discussion of results           25
Conclusions                     25
Total                          100

It had been agreed with the class that the reports would be marked out of 100 and that
we would have a ±10 mark 'acceptance range'. The agreed mark would be the average of
the tutor and the student mark within this constraint. For marks which fell outside of the
'acceptance range' a discussion meeting would be set up between students and tutor to
decide upon an appropriate final mark. It was also agreed within the School of Biology and
Biochemistry that the marks from this project would be used summatively as part of the
overall continuous assessment component of the end of year marks. The contribution of this
project was 2% of the final mark. Full details of the above procedure have previously been
published (Stefani, 1992).
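The agreement rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the acceptance range is read as plus or minus 10 marks, the boundary is treated as inclusive, and the function name `agreed_mark` is hypothetical rather than anything used in the study.

```python
from typing import Optional

# Assumption: the acceptance range is symmetric (plus or minus 10 marks)
# and a difference of exactly 10 still counts as acceptable.
ACCEPTANCE_RANGE = 10


def agreed_mark(tutor_mark: float, student_mark: float) -> Optional[float]:
    """Return the averaged mark, or None when a discussion meeting is needed."""
    if abs(tutor_mark - student_mark) <= ACCEPTANCE_RANGE:
        return (tutor_mark + student_mark) / 2
    # Outside the acceptance range: the final mark is settled in discussion,
    # so no mark is computed here.
    return None


# Hypothetical marks, not data from the study.
within = agreed_mark(72, 68)
outside = agreed_mark(80, 65)
print(within, outside)
```

Returning `None` rather than a number keeps the out-of-range case visible, since the text leaves that mark to be decided between students and tutor.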
In the first instance, the data from this experiment in collaborative self, peer and tutor
assessment were analysed to determine: the levels of student undermarking or overmarking
in comparison with tutor marking, undermarking or overmarking based on gender comparisons and the accuracy of the marking relative to student age (Stefani, 1992). The data
obtained from this experiment can further be used to examine two crucial questions relating
to the use of peer and self-assessment procedures in a summative as well as formative
manner: (1) Do students of lower ability overmark themselves and students of higher ability
undermark themselves? (2) What is the correlation between tutor, self and peer assessment
of a course assignment and end of term student marking? In the current paper, these
questions are now addressed and statistical analyses of the data derived from the experiment
are presented and discussed with such issues in mind.
Results
Mark Analysis
Out of a class of 87 students participating in the self-assessment exercise, 80 students
presented their reports for tutor assessment. Table II provides information on the averages
obtained from tutor assessment and self-assessment. The self-assessment marks appear more
stringent than the tutor marks and there is slight indication that students mark within a more
restricted range as indicated by the lower standard deviation.
TABLE II. Tutor versus self assessment: comparison of means and standard deviations

                     N     Mean    Standard deviation
Tutor mark           80    75.3    10.1
Student self mark    80    72.7    9.3

N.B. For various reasons seven students did not participate in the self-assessment exercise.

These results in themselves give no indication as to whether students with high marks
from the tutor tended to award themselves a lower mark and students with low marks from
the tutor award a higher self mark. This can be examined by categorising the student marks
into quartiles based on the scores obtained from lecturer marks. Table III shows the outcome
of this analysis.
TABLE III. Tutor versus self-assessment: differences in means based on performance quartiles

Quartile group    Number in quartile    Tutor mark    Self mark    Difference of means
(tutor marks)     Tutor    Student      (mean)        (mean)       (T - S)
45-62             12       20           56.2          54.9         +1.3
63-74             42       41           70.9          67.9         +3
75-87             21       16           81.5          78.2         +3.3
88-100             5        3           92.4          89.7         +2.7
The group in the lowest quartile (receiving tutor marks between 45 and 62) provided self
marks which were on average 1.3 marks lower than the marks awarded by the tutor. Students
in the highest quartile (receiving tutor marks between 88 and 100) provided self marks which
were on average 2.7 marks below the tutor mark. The highest discrepancies occur where
there is greatest clustering of the marks, between 63 and 87. The important points to note
are that these figures run counter to the notion that students receiving lower marks from the
tutors award themselves higher marks, and could be interpreted to confirm to some extent the
belief that higher achievers may mark themselves down. However, on this last point it is just
as feasible to argue that the discrepancies are no greater than might be found as a result of
inter-examiner variability in double marking procedures.
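The differences of means quoted from Table III can be re-derived directly from the reported quartile means. The short Python sketch below does only that arithmetic; the variable and function names are illustrative, and the figures are the means printed in Table III, not the underlying student marks.

```python
# (quartile band, tutor mean, self mean) as reported in Table III
TABLE_III = [
    ("45-62", 56.2, 54.9),
    ("63-74", 70.9, 67.9),
    ("75-87", 81.5, 78.2),
    ("88-100", 92.4, 89.7),
]


def mean_differences(rows):
    """Return tutor mean minus self mean for each band, to one decimal place."""
    return {band: round(tutor - self_mark, 1) for band, tutor, self_mark in rows}


diffs = mean_differences(TABLE_III)
for band, diff in diffs.items():
    # A positive value means the self mark was lower than the tutor mark.
    print(f"{band}: {diff:+.1f}")
```

Every difference is positive, which is the basis for the observation that no quartile, including the lowest, overmarked itself on average.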
Analysis of the correlation between the self-assessments and the tutor assessments gives
an r value of 0.93. From these results it can be inferred that use of the student marks in place
of tutor marks would result in a similar ordering of individual performance with only the
slightest tendency towards undermarking, particularly with high achievers, but no corresponding overmarking with low achievers.
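For readers who wish to compute a comparable r value on their own paired marks, a minimal sketch of the product-moment (Pearson) correlation follows. The text does not state which coefficient was used for this comparison, so Pearson's r is an assumption here, and the marks in the example are hypothetical, not the study data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical tutor and self marks for six reports (illustration only).
tutor_marks = [56, 64, 71, 75, 82, 90]
self_marks = [55, 60, 70, 71, 80, 86]
r = pearson_r(tutor_marks, self_marks)
print(f"r = {r:.2f}")
```

A high r indicates a similar ordering of students, but it says nothing about a constant offset between the two sets of marks, which is why the quartile means are examined separately.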
During the course of this experiment, and whether the students were engaged in
self-assessment (as reported above) or peer assessment (yet to be reported), it was noted that
the students were highly motivated and more interested in the task than is generally observed
during large practical classes. This may have been due to a greater sense of involvement in
all aspects of the project than is usually the case. One consequence of this was a very high
standard of work produced and overall higher achievement in interpreting experimental data
and presenting a scientific report.
The role of self-assessment in the development of professional competence has been
recognised, and Boud (1989) argues that one of the characteristics of effective learners is that
they have a realistic sense of their own strengths and weaknesses. To examine this issue, the
extent to which an exercise of this nature could be used as a student performance indicator
was addressed. The student ranking determined by the self-assessment marks was correlated
with the end of term student ranking after traditionally assessed examinations. This was then
compared with the correlation between ranking of the students according to tutor derived
marks for this exercise and the end of term ranking. The correlation between self-assessment
of a scientific report and the outcome of examinations gives an r value of 0.71, and the
correlation between tutor assessment of the scientific report and the outcome of end of term
examinations gives an r value of 0.58.
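Because the comparison above is between two rankings of the same students, one way to compute such a coefficient is Spearman's rank correlation. The text does not say which statistic produced the reported r values, so the choice of Spearman's formula is an assumption, the sketch handles only rankings without ties, and the rankings shown are hypothetical.

```python
def spearman_rho(rank_a, rank_b):
    """Spearman rank correlation for two tie-free rankings of the same students."""
    n = len(rank_a)
    if n != len(rank_b) or n < 2:
        raise ValueError("need two rankings of equal length with at least 2 items")
    expected = list(range(1, n + 1))
    if sorted(rank_a) != expected or sorted(rank_b) != expected:
        raise ValueError("each ranking must use the ranks 1..n exactly once")
    # Sum of squared rank differences, then the standard no-ties formula.
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical ranks of five students (1 = best): report mark versus examination.
report_rank = [1, 2, 3, 4, 5]
exam_rank = [2, 1, 3, 5, 4]
rho = spearman_rho(report_rank, exam_rank)
print(f"rho = {rho:.2f}")
```

With real class data, tied marks are likely, in which case average ranks and a correlation computed on those ranks would be needed instead of this shortcut formula.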
This is an interesting result because it suggests that if self-assessment marks alone were
to be used as the determinant of examination results, the result would be a moderately similar
ordering of individual performance to that obtained from traditionally assessed examinations.
Such a sweeping generalisation based on one exercise in self-assessment with one class is of
course quite unacceptable, but these results do provide encouragement for continued
introduction of self-assessment exercises in different courses at different stages of undergraduate training. Similar analyses were performed with the data obtained from the peer
assessment class. Out of a total of 67, four students failed to participate in the exercise due
to absence.
Table IV shows the averages and standard deviations of the peer and tutor assessments
of the laboratory report and Table V shows the assessments categorised into quartiles to
determine whether the peer marking of the higher and lower achievers is more or less
stringent than the tutor marking. As with the self-assessments, the peer assessment figures
suggest that the students mark within a more restricted range than tutors.
TABLE IV. Tutor versus peer assessment: comparison of means and standard deviations

               N     Mean    Standard deviation
Tutor mark     63    74      12.01
Peer mark      57    74.4    10.7
This set of results indicates that peer assessment is more stringent than tutor assessment
within the lower mark range and slightly less stringent throughout the rest of the range.
However, the small differences in the means and the reasonable agreement between the
numbers of students within each quartile indicate as with the self-assessed scripts that the
general ranking within the class shows good agreement between the peer and tutor assessments. This is further highlighted with a correlation coefficient between peer assessment and
tutor assessment of r = 0.89.
As with the self-assessments, the correlation between the peer and tutor assessments in
this exercise and the student ranking after traditionally assessed examinations was calculated.
The correlation between peer assessment of a scientific report and the outcome of traditionally assessed examinations gives an r value of 0.47 and the correlation between the tutor
assessments of this exercise and the outcome of the examinations gives an r value of 0.58.
TABLE V. Tutor versus peer assessment: differences in means based on performance quartiles

Quartile group    Number in quartile    Tutor mark    Peer mark    Difference of means
(tutor marks)     Tutor    Student      (mean)        (mean)       (T - P)
45-62              8        7           56.8          53.5         +3.3
63-74             25       20           67.8          70.15        -2.4
75-87             19       26           79.6          81.0         -1.4
88-100             5        4           91.6          93.0         -1.4

The discrepancies between the assessments of this exercise and the outcome of examinations are not particularly surprising. A questionnaire which was designed to determine the
perceived benefits of peer and self-assessment procedures was given to all the students.
Almost 100% of the students said that peer and self-assessment procedures made them think
more and 85% said that it made them learn more than traditionally assessed work. Therefore,
it must be accepted that the results observed here relate to a highly motivated activity
compared to end of term examinations, and no firm conclusions can be made on the basis
of this one exercise. Nevertheless, the observations made during the course of this exercise
are highly encouraging with respect to the reliability of learners' peer and self-assessments.
Discussion
This study has shown that student assessment can be as reliable as that of lecturers and goes
some way to dispelling fears that lower achievers award themselves higher marks and higher
achievers mark themselves down relative to tutor marking. Although this paper reports the
results of one exercise in peer and self-assessment, the introduction of similar procedures to
other groups of students in different contexts and subject areas has shown remarkably similar
results (Fitzgerald & Stefani, in preparation).
With respect to the validity of student marks, the differences in correlations between the
assessments of a scientific report and end of term ranking of students raise interesting points.
Student motivation in this exercise was perceived to be very high and it is likely that the
students worked harder to reach higher levels of achievement than might normally be the case
with respect to end of term examinations. Given the lack of training in peer and self-assessment experienced by the two classes of students in this study, no firm conclusions should be
drawn regarding these results. However, long-term monitoring of these groups of students is
under way regarding their abilities and attitudes towards alternative assessment methods.
Procedures have been introduced to give students formative and constructive feedback on
their achievements in this area.
Many lecturers/tutors express great fear of handing any of the power of assessment over
to students. This fear generally stems from the possibility that the student marks will differ
significantly from lecturer marks. To counteract this fear, it can be argued that introducing
students to self and peer assessment early in their academic career and using the mark
summatively as well as formatively will engender a sense of responsibility in students such
that by the time that the grading and ranking of students becomes a crucial matter, for
example in the final year of undergraduate training, students will be well accustomed to the
procedures. Gradually, within this university student derived marks are contributing to end
of term student ranking.
Cowan (1988) has argued that the benefits of self-assessment are so great that we should
trust students to act appropriately even when there is a risk that there could be differences
between the student mark and the tutor mark. In support of this idea, when the students
participating in this peer and self-assessment exercise were given a questionnaire to evaluate
the learning benefits, almost 100% of the students said that the scheme made them think
more, 85% said it made them learn more and 97% said that it was challenging. These
responses were given despite the fact that 100% of the students said that it was more time
consuming and over 75% said that it was hard (Stefani, 1992). Since introducing this
assessment strategy, many students have asked if this procedure will operate in any subsequent courses. Perhaps student demand will take over and lecturers will be forced to
respond to student needs and introduce peer and self-assessment procedures more widely. In
the writer's opinion this would be no bad thing.
Correspondence: Lorraine A. J. Stefani, School of Biology and Biochemistry, The Queen's
University of Belfast, Belfast BT9 7BL, United Kingdom.
REFERENCES
BOUD, D. (1989) The role of self-assessment in student grading, Assessment and Evaluation in Higher
Education, 14, pp. 20-30.
BOUD, D. & FALCHIKOV, N. (1989) Quantitative studies of student self-assessment in higher education: a
critical analysis of findings, Higher Education, 18, pp. 529-549.
BOUD, D. & LUBLIN, J. (1983) Self-assessment in Professional Education. A Report to the Commonwealth Research
and Development Committee (Tertiary Education Research Centre, University of New South Wales).
COWAN, J. (1988) Struggling with student self-assessment, in: D. J. BOUD (Ed.) Developing Student Autonomy
in Learning, 2nd edn, pp. 192-210 (London, Kogan Page).
FALCHIKOV, N. (1986) Product comparisons and process benefits of collaborative peer and self-assessment,
Assessment and Evaluation in Higher Education, 11 (4), pp. 146-166.
MAGIN, D. J. & CHURCHES, A. E. (1989) What do Students Learn from Self and Peer Assessment in Designing
for Learning in Industry and Education, pp. 224-233 (Canberra, Australian Society for Educational
Technology).
STEFANI, L. A. J. (1992) Comparison of collaborative self, peer and tutor assessment in a biochemistry
practical, Biochemical Education, 20, pp. 148-151.
