Preparing for a Renaissance in Assessment
Peter Hill and Michael Barber
December 2014
ABOUT PEARSON
Pearson is the world's leading learning company. Our education business combines 150 years of experience in publishing with the latest learning technology and online support. We serve learners of all ages around the globe, employing 45,000 people in more than seventy countries, helping people to learn whatever, whenever and however they choose. Whether it's designing qualifications in the UK, supporting colleges in the US, training school leaders in the Middle East or helping students in China learn English, we aim to help people make progress in their lives through learning.

INTRODUCTION TO THE SERIES
The Chief Education Advisor, Sir Michael Barber, on behalf of Pearson, is commissioning a series of independent, open and practical publications containing new ideas and evidence about what works in education. The publications contribute to the global discussion about education and debate the big unanswered questions in education by focusing on the following eight themes: Learning Science, Knowledge and Skills, Pedagogy and Educator Effectiveness, Measurement and Assessment, Digital and Adaptive Learning, Institutional Improvement, System Reform and Innovation, and Access for All. We hope the series will be useful to policy-makers, educators and all those interested in learning.

CREATIVE COMMONS
Permission is granted under a Creative Commons Attribution 3.0 Unported (CC by 3.0) licence to replicate, copy, distribute, transmit or adapt all content freely provided that attribution is provided as illustrated in the reference below. To view a copy of this licence, visit http://creativecommons.org/licenses/by/3.0 or send a letter to Creative Commons, 559 Nathan Abbott Way, Stanford, California 94305, USA.

Sample reference: Hill, P. and M. Barber (2014) Preparing for a Renaissance in Assessment, London: Pearson.

ABOUT THE AUTHORS
Dr Peter Hill has held senior positions in education in Australia, the USA and Hong Kong, including as Chief Executive of the Victorian Curriculum and Assessment Board, Chief General Manager of the Department of School Education in Victoria, Australia, Professor of Leadership and Management at the University of Melbourne, Director of Research and Development at the National Center on Education and the Economy in Washington DC, Secretary General of the Hong Kong Examinations and Assessment Authority and Chief Executive of the Australian Curriculum, Assessment and Reporting Authority.

He is currently a consultant advising on system reform in the areas of curriculum, assessment and certification. He has published numerous research articles and co-authored with Michael Fullan and Carmel Crévola the award-winning book, Breakthrough, published by Corwin Press.
Sir Michael Barber is a leading authority on education systems and education reform. Over the past two decades his research and advisory work have focused on school improvement, standards and performance; system-wide reform; effective implementation; access, success and funding in higher education; and access and quality in schools in developing countries.

Michael joined Pearson in 2011 as Chief Education Advisor, leading Pearson's worldwide programme of research into education policy and efficacy, advising on and supporting the development of products and services that build on the research findings and playing a particular role in Pearson's strategy for education in the poorest sectors of the world, particularly in fast-growing developing economies.

Prior to Pearson, Michael was a Partner at McKinsey & Company and Head of McKinsey's global education practice. He co-authored two major McKinsey education reports: How the World's Most Improved School Systems Keep Getting Better (2010) and How the World's Best-Performing Schools Come Out on Top (2007). He is also Distinguished Visiting Fellow at the Harvard Graduate School of Education and holds an honorary doctorate from the University of Exeter.

Michael previously served the UK government as Head of the Prime Minister's Delivery Unit (from 2001 to 2005) and as Chief Adviser to the Secretary of State for Education on School Standards (from 1997 to 2001). Before joining government he was a professor at the Institute of Education at the University of London. He is the author of several books including Instruction to Deliver, The Learning Game: Arguments for an Education Revolution and How to Do the Impossible: A Guide for Politicians with a Passion for Education.

Michael has recently been appointed as Chair of the World Economic Global Advisory Forum.

ACKNOWLEDGEMENTS
We would like to recognise the significant contribution of Simon Breakspear to the conceptualisation of this paper and Jacqueline Cheng for working with us to develop the paper. We would also like to put on the record our gratitude to Carmel Crévola, Michael Fullan, Doug Kubach and many colleagues within the Pearson North America assessment community; to Maria Langworthy, Tony Mackay, Geoff Masters, Roger Murphy and Jim Tognolini, for the time they took to read drafts and for their many valuable suggestions for improving the text; and Lee Sing Kong for writing the foreword. Finally, thanks to Peter Jackson and Tanya Kreisky for their editorial work; to Olivia Simmons and Liz Hudson for managing production of the final version; and to Splinter for the design.

Pearson 2014. The contents and opinions expressed in this report are those of the authors only. Figures reprinted with permission.

ISBN: 9780992422653
CONTENTS
FOREWORD by Lee Sing Kong
EXECUTIVE SUMMARY
Setting the scene
Assessment: a field in need of reform
Transforming assessment
A framework for action
3. TRANSFORMING ASSESSMENT
REFERENCES
FOREWORD

Assessment is a very complex topic. As this essay articulates, it is meant to monitor or to measure what students have learnt. For validity and reliability, and to minimise subjectivity, standardised tests are often adopted and marks are awarded, followed by a process in which test scores are converted into grades. The grades are then recognised as measures of students' learning attainment. But what assessment actually means is seldom articulated. Is it a measure of the body of knowledge that a student has acquired, or is it also a measure of other attributes?

Institutes of higher education have often found such assessment grades to be so lacking in substance for admission purposes that many of these institutes have introduced other modes of assessment so as to gauge the other desired attributes of their candidates. The complexity of assessment is further compounded by the way in which test scores are utilised. Apart from being considered for entry into further education, they are also used for the purpose of accountability of schools or the system, as well as the performance of teachers.

In the twentieth century, the standardised test approach could be valid and reliable, though never perfect. However, in the twenty-first-century landscape, where the demands go beyond just knowledge and technical skills, there is, indeed, a need for an assessment renaissance so that the desired attributes can be meaningfully monitored or measured. However, in this new world, where there are so many drivers that are impacting education systems, a fundamental issue that must be first clearly articulated is 'What is the purpose of education in this new world that we live and work in?' Only when we can articulate with clarity the purpose of education in terms of the learning outcomes that the education process aims to achieve can we then articulate what an assessment renaissance implies so that the 'what' and 'how' of assessment can be crystallised.

For an assessment renaissance to be meaningful, it also needs a total cultural shift within society to accept the different 'what' and 'how' of assessment. The current mindset of assessment is all about test scores, irrespective of whether the meaning of the test scores is well clarified. In realising the outcomes of the assessment renaissance, there may not always be a test score to contend with. It may just be a series of qualitative descriptions of the extent to which a student may have demonstrated various attributes that cannot be quantified. Can society accept such assessment outcomes?

Going forward, assessment will remain a complex issue, no matter what form the assessment renaissance may take. It is here that the importance of research and development into assessment issues cannot be overemphasised. If the 'what' and the 'how' can be conducted with clarity of meaning, and considered valid and reliable with minimal subjectivity, and if society at large can be educated about the need for such a renaissance, then there will be light at the end of the tunnel. I believe this will take
EXECUTIVE SUMMARY
Table ES.1 Key features of the education revolution.

Key element | Overthrown and repudiated | Replaced by
1. Capacity to learn | Practices reflecting an assumption that students commence school tabula rasa and with an innate and fixed capacity to learn and profit from formal education | Practices that build on prior learning and reflect a belief in the potential for all students to learn and achieve high standards, given high expectations, motivation and sufficient time and support
3. Education policy | The school as the focus of educational policy | The student as the focus of educational policy and concerted attention to personalising learning
4. Opportunity to learn | Current age and time-bound parameters: age-grade progression; 9.00-4.00 school hours; open 200/365 days a year | Students able to progress at different rates and with time and support varied to meet individual needs; significantly increased access to care and education to better align with the realities of modern living and working; greater use of the home, the community and other settings as contexts for 24/7 learning
Assessments that can accommodate the full range of student abilities | Assessments unable to assess accurately at either end of the ability distribution, or away from critical cut-scores; assessments within tiered credentials or tiered assessments, with resulting problems of cost, logistics, cross-tier comparability and capping of student aspirations
Assessments that provide meaningful information on learning outcomes | Over-reliance on grades or levels that reveal little about what the student can do; feedback to schools on student performance typically provided too late and too broad-brush to be of value in improving learning and teaching; assessments used to generate a single score for each student which is then further summarised at the school or system level either as a percentage meeting a nominated cut-score (a volatile statistic, hiding more than it reveals about performance, particularly shifts in performance on either side of the cut-score) or as a mean score unadjusted for intake and other characteristics beyond the control of the teacher or school
Assessments that support students and teachers in making use of ongoing feedback to personalise instruction and improve learning and teaching | Assessment policies that pay little or no attention to formative assessment and to providing teachers with the tools and the capacity to use it on a daily basis; an absence of validated learning progressions, efficient processes for collecting and analysing data and easy-to-use assessment tools
Assessments that have integrity and that are used in ways that motivate improvement efforts and minimise opportunities for cheating and gaming the system | Assessments that carry undue weight in high-stakes decision-making, increasing the risks of cheating and gaming the system
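To illustrate why the table above describes the percentage meeting a cut-score as a volatile statistic, the following minimal Python sketch compares it with the mean score for a single small school. It is an illustration only: the scores, the cut-score of 50 and the function name percent_meeting_cut are hypothetical and are not drawn from any real assessment.

```python
from statistics import mean


def percent_meeting_cut(scores, cut_score):
    """Return the percentage of scores at or above the cut-score."""
    meeting = sum(1 for score in scores if score >= cut_score)
    return 100.0 * meeting / len(scores)


# Hypothetical data: ten students in two successive years.
# Four students sit just below the cut-score in year 1 and each gains
# two points in year 2; every other score is unchanged.
CUT_SCORE = 50
year_1 = [30, 35, 40, 49, 49, 49, 49, 55, 60, 70]
year_2 = [30, 35, 40, 51, 51, 51, 51, 55, 60, 70]

for label, scores in (("year 1", year_1), ("year 2", year_2)):
    print(
        label,
        "mean score:", round(mean(scores), 1),
        "percent meeting cut-score:", percent_meeting_cut(scores, CUT_SCORE),
    )
```

In this made-up example the mean rises by less than one point (48.6 to 49.4), while the percentage meeting the cut-score jumps from 30 to 70, because the only students who moved were clustered just below the cut. A much larger gain by a student far from the cut-score would leave the percentage unchanged.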
[Figure ES.1. Diagram labels: next-generation learning; curriculum; assessment; personalised instruction; professional learning; resources; data management and analysis.]
was contained within teachers' books of marks, attendance registers, student record cards and student reports. Next-generation learning systems will create an explosion in data because they track learning and teaching at the individual student and lesson level every day in order to personalise and thus optimise learning. Moreover, they will incorporate algorithms that interrogate assessment data on an ongoing basis and provide instant and detailed feedback into the learning and teaching process.

Professional learning
In next-generation learning systems, the teacher retains the key role in fostering the learning for each student, but the job itself changes. Learning systems of the future will free up teacher time currently spent on preparation, marking and record-keeping and allow a greater focus on the professional roles of diagnosis, personalised instruction, scaffolding deep learning, motivation, guidance and care. This is the combination of activities that John Hattie describes as 'teacher as activator' (2009: 17).

Personalised instruction
With all the above in place, it is then possible to talk confidently about personalised instruction, which is the final and most crucial component of Figure ES.1. By personalised instruction, we mean instruction that is adjusted on a daily basis to the readiness of each student and that adapts to each student's specific learning needs, interests and aspirations. The fundamental premises of personalised learning have been a part of the writings of educators for decades but have, in recent years, become a realisable dream, thanks to the advent of new digital technologies.

Table ES.3 Transforming assessment.

Assessments that can accommodate the full range of student abilities | Use of adaptive testing to generate more accurate estimates of student abilities across the full range of achievement while reducing testing time
Assessments that have integrity and are used in ways that motivate improvement efforts and that minimise opportunities for cheating and gaming the system | The adoption of (1) more cumulative approaches to assessment for selection purposes, with opportunities to re-sit; and (2) intelligent accountability systems that utilise multiple indicators of performance, that are designed to incentivise improvement and that avoid the creation of win-lose consequences for stakeholders for outcomes not fully under their control
Assessments that support students and teachers in making use of ongoing feedback to personalise instruction and improve learning and teaching | Sophisticated online intelligent learning systems to integrate the key components involved in effective instruction and to support a new generation of empowered teachers in reliably assessing a much wider range of outcomes, using instant and powerful feedback on learning and teaching to deliver truly personalised instruction

1 the teacher-student interface (traditionally the classroom);
2 the school; and
3 the system.

The most important level is the teacher-student interface, because this is where learning takes place and where there is the greatest need for assessment data to enable a truly personalised approach to learning and teaching. We would argue that the other levels and purposes of assessment should be built on the assessment carried out at this level.

The challenge for awarding bodies
In considering the future of assessment for certification purposes, the challenge facing awarding bodies is to work out how they can take greater advantage of new technologies to deliver examinations online and, by so doing, enhance their capacity to:
1. SETTING THE SCENE

Three core processes lie at the heart of schooling:

order thinking and interpersonal skills vital for living and learning in the twenty-first century.

1. See, for example, Gordon Commission on the Future of Assessment in Education (2013) and Global Education Leaders' Program (2014).
As we started to write this essay, we realised that we could not discuss changes in the field of assessment without relating them to a much wider set of revolutionary changes taking place in education. So, in order to understand the whys and hows of the coming renaissance in assessment, we will begin with a brief overview of the more fundamental changes happening more broadly in education, of which assessment is but one vital part.

THE EDUCATIONAL REVOLUTION

Change is a constant in the modern world, and we certainly witness it in education (although when the dust settles we often remark on how the fundamentals seem to stay the same). In many areas of educational policy and practice, we simply see pendulum swings. Every now and then, however, radical change occurs that completely upsets the old ways of doing things. Such change is revolutionary in character since it overthrows and repudiates established methods and replaces them with an entirely new order.

One hesitates to use the term 'revolution' when talking about fundamental changes in education: after all, no parent welcomes the notion of their children being caught up in anything revolutionary. Furthermore, schools have been among the most stable institutions of society and are not prone to radical change.

Looking back, we can see that formal education's basic structures and modes of delivery have barely changed over the past 140 years. That is something one cannot say of health care, public transport or policing.

Despite many recent innovations, schools continue to provide the same kinds of functions and are recognisably similar to what they have long been, consisting of classrooms, halls, libraries, staffrooms and school grounds for recreation and sport. Instruction continues to be delivered by a teacher, who teaches a class of students of the same age, all progressing through a standard curriculum at the same pace, with new teachers each year. Despite considerable experimentation with new arrangements and new technology, rows of tables and chairs and students working with paper, pen and printed texts continue to predominate. The school year and the school day reflect the demands of an agrarian society that has long since disappeared, with teachers and students enjoying long holidays and short hours that are out of alignment with the working days and hours of their parents and guardians, who face challenges in organising child care. In brief, school education has been characterised by constant surface-level change and periodic calls for a thorough overhaul, but the fundamentals have remained surprisingly constant.

So, not for the first time, we need to take stock and ask the question, 'Are we currently witnessing changes that have more fundamental and far-reaching consequences and that will lead to a reconceptualisation of school education?' We have concluded, as have many other commentators, that this time things are different. In particular, we believe that two game-changers are at work that will shake the very foundations of the current paradigm of school education. The first is the push of globalisation and new digital technologies, which are sweeping all before them. As Hannon and colleagues observe, this is an argument that has been exhaustively rehearsed, but is no less valid for that (2011: 2). The second is the pull inherent in the realisation that the current paradigm is no longer working as well as it should.
The new world order brought about by globalisation and the emergence of the

Figure 1.1 Actual and projected development in digital jobs in the EU: vacancy and graduate numbers. [Chart: vertical axis 0 to 1,000,000; horizontal axis 2011 to 2015.]

How should we prepare young people for such a world? There are those who argue that knowledge of the fundamentals of the disciplines that have long formed the core of traditional school subjects remains vital. At the same time, there are those who call for:

- less emphasis on memorisation of unrelated facts and a greater emphasis on deep learning of big ideas and organising principles (the least obsolescent aspects of knowledge);
- more explicit and systematic attention to a set of skills, capabilities, understandings and dispositions that run right across the traditional subject-based curricula and that facilitate response to change and the rapid acquisition of new knowledge;
- a greater emphasis on doing in addition to the acquisition of knowledge and on allowing living, learning and action to come together in our conceptions of the educated person.

We agree with these points and don't believe they are in conflict.

Discussing knowledge of the core disciplines, Daniel Willingham has observed (2006: 1):

'research literature from cognitive science shows that knowledge does much more than just help students hone their thinking skills: it actually makes learning easier. Knowledge is not only cumulative, it grows exponentially. Those with a rich base of factual knowledge find it easier to learn more - the rich get richer. In addition, factual knowledge enhances cognitive processes like problem solving and reasoning. The richer the knowledge base, the more smoothly and effectively these cognitive processes - the very ones that teachers target - operate. So, the more knowledge students accumulate, the smarter they become.'
In other words, what we are really asking for report of the Committee on Defining Deeper
is more. Yes, we need to be careful to avoid Learning and 21st Century Skills represents
an overloaded curriculum.Yes, we must ensure a significant step towards clarifying the
there is space for deeper learning of the fundamental definition and research-related
more important content, which does imply questions (see Pellegrino et al. 2012).
acquiring a rich base of factual knowledge and,
beyond that, the ability to understand and In addition, progress has been made on scoping
apply it. But yes, we also want to ensure, in a and sequencing these skills or competencies
more systematic, conscious and explicit way, within the context of the overall curriculum.
that, as students learn in specific areas of the For example, the online Australian Curriculum
curriculum, they are also acquiring key cross- for K-10 students gives prominence to seven
curricular skills, capabilities and dispositions general capabilities:
through direct engagement with a curriculum
that blends living, learning and action. A 1 literacy;
number of systems have undertaken major 2 n umeracy;
revisions of curricula to address the need to 3 information and communication tech-
reduce content coverage in order to promote nology capability;
deeper learning, with Singapore one of the 4 c ritical and creative thinking;
first to take decisive action (Ng 2008). 5 p ersonal and social capability;
6 e thical understanding;
Embedding so-called twenty-first-century 7 intercultural understanding.3
skills or next-generation learning into the
curriculum has proved much more challenging. Each has been scoped in terms of the key
These learning outcomes are increasingly seen outcomes relevant to each capability and
as critical to equip young people with the sequenced into six levels spanning years
skills required to be ongoing learners who can K-10. Examples are given, with hyperlinks to
navigate an ever-changing world of work and specific content areas within mainstream
find fulfilment in their lives. Learning outcomes curriculum subjects where these capabilities
include the well-understood basics of literacy are particularly relevant and can be developed.
and numeracy but also involve an education
characterised by deep learning and the ability However, the task is not one of simply adding
to think, learn, inquire, problem-solve, create, a new set of skills to the curriculum but of
relate and also to manage oneself and ones continually challenging our concepts of what it
learning. means to be an educated person. Here, again,
it is a matter of more, not less. In addition
Discussion of these higher-order thinking, inter- to knowledge of the disciplines and cross-
and intra-personal skills has often taken place curricular skills and understandings, schools
without any real agreement on meanings and are being expected to provide young people
definitions, and with little research evidence of with an appreciation of, and engagement with,
their importance or even whether they can the big challenges of the modern world, such as
be taught successfully. The publication of the sustainability, peace and conflict, the widening
See http://www.australiancurriculum.edu.au/generalcapabilities/overview/general-capabilities-in-the-australian-curriculum
3.
15
PREPARING FOR A RENAISSANCE IN ASSESSMENT
gap between rich and poor, population long been recognised as important, they have
and resources.4 In other words, schools are often fallen outside the scope of what has
expected to prepare young people to be been mandated, made explicit, assessed or
informed and actively engaged citizens.5 certificated. As a consequence, it has been all
too easy for them to remain at the level of
One example of where this has been taken rhetoric rather than at that of deliberate policy.
seriously is Hong Kongs new credential for
students at the end of Year 12, the Diploma New models of learning and teaching
of Secondary Education, which requires
all students to study, in addition to Chinese
are evolving that make traditional
classroom, teacher and textbook modes
language, English language, mathematics and of formal learning obsolete
between two and four other subjects of their
choosing, a subject called Liberal Studies. The Globalisation and the new technologies have
aim is to ensure that all students develop an fundamental implications, not only for what
understanding of the major issues confronting students need to know and be able to do but
society in the twenty-first century and are also for how it will be taught. Thanks to high-
equipped with the critical thinking skills they speed internet access, the low cost of devices
need to make informed, critical judgements such as smartphones and tablet personal
about these issues. computers, social media and the evolution
of the semantic web, users can find, share
Beyond skills or competencies and new and combine information more easily. New
understandings, there are calls for schools models of learning and teaching are evolving
to pay more attention to developing the that make traditional classroom, teacher and
character traits and dispositions in young textbook modes of formal learning obsolete.
people that will support them in confronting
the unprecedented changes taking place in Some form of blended learning, in which a
the world around them, such as resilience, part of what students learn is through online
adaptability, entrepreneurialism, sensitivity delivery of content and instruction with
to cultural and personal differences and the elements of personalisation for when, where
disposition to think and act ethically. Cultivating and at what pace, is increasingly becoming
such outcomes is quite a different matter to the norm, although the form it takes varies
imparting skills and understandings, because it enormously, as does the quality.
means engaging students in situations where
these qualities matter and can be experienced, But deeper, technology-enabled transform
reflected upon and nurtured. ations are on the horizon. Big publishing
and information technology companies, in
Whatever name we give to the disparate conjunction with universities and foundations,
set of learning outcomes that constitute are embarking on the design of new, fully
next-generation learning, it is clear that they integrated online learning systems that use
are central to education in the twenty-first detailed learning progressions and continuous
century. While many of these outcomes have monitoring of progress and responses to
4.
A comprehensive framework for considering fifteen global challenges of the early twenty-first century has been developed by
the Millennium Project. See http://www.millennium-project.org/millennium/challeng.html (accessed 15 November 2014).
5.
Regarding the importance of education for citizenship, see in particular, Feith (2011).
16
SETTING THE SCENE
deliver finely calibrated instruction that reflects been reached in the delivery of learning
students learning styles, needs and aspirations. outcomes and in closing achievement gaps.
A key motivation behind the development of Investment in school education is no longer
these more personalised learning systems is yielding the returns it once did, when the focus
the expectation that they will make learning was on access rather than outcomes.
more engaging and more efficient. It is hoped,
too, that they will accelerate progress for In the USA, which has extensive longitudinal
students who have fallen behind. They have data on performance, NAEP (National
significant implications for the role of teachers, Assessment of Educational Progress) survey
especially their knowledge and skillset. results indicate that overall performance has
improved very little since the 1970s.6
Glimpses into the future can be had now
in pioneering schools across the world. But the USA is not alone. Figure 1.2 shows
Significantly, the new digital technologies are annualised changes in performance in reading
not just an option for advanced economies, and mathematics across PISA (Programme
they also offer affordable options for countries for International Student Assessment) assess
in the developing world, particularly through ments for the top nine countries between the
the use of mobile phones (m-learning) to first survey results (either 2000 or 2003) and
reach places where there are no schools, the most recent 2012 survey. (The error bars
teachers or libraries. are 95-per-cent confidence intervals around
each change score.) In the case of reading,
In summary, the increasing availability of only two of the top nine performing countries
powerful and transformative interactive digital in the first survey (Japan and Korea) recorded
technologies is redefining how learning takes a statistically significant improvement, and
place in schools and all other settings.They are in the case of mathematics, none did. This
key ingredients of the education revolution. was despite significant efforts and additional
resources directed at improving outcomes in
The performance ceiling each of these countries.
Digital technologies and the new Knowledge
Society that they are creating, of themselves, In addition, some of the high-performing
would probably be sufficient to fuel the countries (notably Australia, New Zealand
education revolution, but, as we indicated and Finland) have experienced a statistically
earlier, there is another game-changer at work, significant decline in performance levels rather
namely the pull factor inherent in the growing than an improvement. In short, patterns
realisation that the current paradigm of school of results from longitudinal surveys of
education is no longer working as it should. achievement such as NAEP and PISA would
suggest that there are limits as to how much
For many advanced nations, there are clear more productivity can be squeezed out of
indications from longitudinal surveys of school systems operating within the current
achievement that a performance ceiling has paradigm.7
6. For a commentary on this phenomenon, see Tucker (2013b).
7. It should be noted, however, that there are those who argue that tests such as PISA, which seek to provide a common yardstick across nations, are not sensitive to improvements in teaching and learning. PISA does not assess how well students have learned a specific curriculum but rather their ability to apply understandings in reading, mathematics and science to everyday problems and situations.
Figure 1.2 Annualised change across PISA assessments of reading and mathematics for top nine performing countries. Source: OECD (2013b).
[Figure not reproduced: two panels, Reading and Mathematics, each with a vertical axis running from -4.00 to 3.00. Countries plotted for Reading: Korea, Canada, Japan, Australia, Belgium, NZ, Ireland, Finland, Sweden. Countries plotted for Mathematics: Korea, Japan, Switzerland, Belgium, NZ, Finland, Canada, Netherlands, Australia.]
Much of the attention given to improving learning outcomes has been directed at the school level. Analyses of the 2009 PISA data indicate that in the participating countries, after adjustments for demographic and socio-economic characteristics, around 20 per cent of the variance in reading performance could be attributed to differences between schools (OECD 2011: Table IV.2.2a). The same analyses of 2012 data indicated that around 15 per cent of the variance in mathematics performance could be attributed to differences between schools (OECD 2013c: Table IV.1.12a). In other words, there are substantial differences between schools even when their intake characteristics have been taken into account. Research into school effectiveness, much of which was undertaken in the 1980s and early
1990s, has provided us with a good knowledge of the more powerful school-level levers for improvement. Strong educational leadership, a small number of strategic priorities and a climate of high expectations of student behaviour and learning are among the factors that have delivered remarkable and rapid turnarounds.

However, estimates of school effects can be misleading. Analyses that take into account the fact that students are not only taught within a given school but are also in a particular class within that school, result in much lower estimates of the variance in outcomes at the school level but high proportions of variance at the class level. For example, in one such study conducted by Hill and Rowe in Australia in the 1990s, it was found that fitting a two-level model (students within schools) to local assessment data resulted in estimates of school effects of 17.6 per cent for English and 16.6 per cent for mathematics (very similar to the OECD two-level model outcomes). However, three-level modelling (students within classes, within schools) resulted in estimates of 8.2 per cent for English and 5.4 per cent for mathematics at school level, but 43.7 per cent for English and 56.4 per cent for mathematics at class level (Hill and Rowe 1996).

In other words, it matters more which class a student is assigned to than which school they attend. This is not an altogether surprising conclusion when one considers that learning takes place in classrooms with a specific teacher and a class of students with particular backgrounds, but it points to the fact that, in order to improve learning, it is important to focus on what is happening in individual classrooms and on the quality of teaching received by each student.

There is now a wide consensus that quality of teaching is the key to unlocking significant improvements in outcomes. In 2007, Barber and Mourshed, in How the World's Best-Performing School Systems Come Out on Top, concluded that three things matter most:

1 getting the right people to become teachers;
2 developing them into effective instructors; and
3 ensuring that the system is able to deliver the best possible instruction for every child.

In response to the call for a greater focus on teaching quality, many nations have initiated work on clarifying teacher roles and expectations, improving the quality of recruits into teaching, ensuring that pre-service teacher training includes a solid foundation of professional practice and systematically building opportunities to reflect on and enhance their practice into teachers' daily lives. In a few countries, but particularly in the USA, a key part of the solution is seen as the implementation of systems of teacher accountability for student learning, with direct links between individual teachers and their students' test scores.

However, a succession of other commentators, beginning with Dan Lortie in 1975 and most recently Jal Mehta (2013), have reached a more fundamental conclusion.8 They believe that, in many nations, improvements to the quality of teaching can only come through the transformation of teaching from a largely
8. Lortie is quoted in the insightful and scholarly review of the field by Grossman and McDonald (2008).
under-qualified and trained, heavily unionised, bureaucratically controlled semi-profession into a true profession with a distinctive knowledge base, a framework for teaching, well-defined common terms for describing and analysing teaching at a level of specificity and strict control, by the profession itself, on entry into the profession. Broadly, we agree with this analysis (noting that this characterisation of teaching is less applicable in many Asian countries) and believe that the performance ceiling will remain until the full professionalisation of teaching, in this sense, has become a reality. This is what Michael Barber has called 'informed professionalism' (2014: slide 3).

Whatever the precise contribution of teacher effects (quality of teaching) or the optimum strategies for maximising them, it is unquestionably the case that the greatest proportion of variance in learning outcomes is at student level. Using data from a study by Hauser, Professor Geoff Masters presents a dramatic depiction of the extent of the overlap in performance of more than a quarter of a million mathematics students in different grades in the USA (2013: Fig. 2.3; see Figure 1.3). Much of the overlap seems to be a consequence of the fact that high-achieving students make steady progress, but low-achieving students make very little progress over time.

The phenomenon of wide variations in performance of students of the same age is observed in almost all studies where vertically equated test data (across age grades) are available. These variations indicate that the greatest opportunities for improvement exist at the student level, but, so far, few systems have been able to significantly narrow achievement gaps within grades.

We would suggest that this is in no small part due to the way in which school education
Figure 1.3 Distributions of students' mathematics achievements (Years 2-7, USA, 2003). Source: Masters (2013).
[Figure not reproduced: achievement distributions for each of Year 2 to Year 7, plotted against Bands 1 to 6.]
is delivered. The current system has been described as an industrial mass-production model where:

• students are organised into grades, based primarily on age rather than readiness to learn;
• there are discrete curricula and standards for each grade;
• each grade is taught over a single school year by the same teacher; and
• almost all students move to the next grade, new curriculum objectives and standards and new teachers, regardless of how well they mastered the objectives of the preceding grade.

This is a model that could only be effective if one assumed equal starting points and equal readiness to learn. In the real world, this is highly improbable.

The age-grade progression model made sense when it was first invented as a means of educating the masses for a world in which most work required low levels of education and automation had not begun to take over routine tasks. The system efficiently filtered out those who were not able to succeed and directed them to early employment while giving continued access to more and better quality education to the successful few, enabling them to move into professions requiring high levels of education.

It was developed at a time when the accepted view was that the ability to learn and to profit from education was a fixed characteristic of individuals and when students arrived at school with relatively little exposure to formal knowledge. We now have a more positive set of beliefs and understandings about human learning capacity and know through direct experience, supported by research from a number of fields (particularly cognitive science), that potentially everyone can achieve high standards when expectations are high and when the individual is motivated to learn and given sufficient time and support to succeed.9 In addition, students increasingly come to school having already had significant exposure and access to knowledge, courtesy of television and the internet.

The age-grade progression model is a barrier to realising the new goal of high standards for all because its very structure has an inbuilt assumption of equal time and support for each student. It was never designed to deal with the wide variation in readiness to learn, or to educate all to high standards, or to equip students to live and work in the Knowledge Society of the twenty-first century. It has thwarted at least a decade of intensive reform efforts that have delivered, at best, only the most meagre returns (Fullan et al. 2006).

Instead of putting schools at the centre of improvement efforts, the new paradigm starts with individual students, taking their starting points, motivations and readiness to learn and working back from those to design what is needed to deliver truly personalised learning (Leadbeater 2002). It makes the assumption that systems capable of achieving universally high standards are those that can personalise the programme of learning and progression offered to the needs and motivations of each learner (OECD 2008: 4). In the process, current conceptions of learning and teaching, and of the school itself as the place in which formal education takes place, are being challenged.
| Key element | Overthrown and repudiated | Replaced by |
| --- | --- | --- |
| 1. Capacity to learn | Practices reflecting an assumption that students commence school tabula rasa and with an innate and fixed capacity to learn and profit from formal education | Practices that build on prior learning and reflect a belief in the potential for all students to learn and achieve high standards, given high expectations, motivation and sufficient time and support |
| 3. Education policy | The school as the focus of educational policy | The student as the focus of educational policy and concerted attention to personalising learning |
| 4. Opportunity to learn | Current age and time-bound parameters: age-grade progression; 9.00-4.00 school hours; open 200/365 days a year | Students able to progress at different rates and with time and support varied to meet individual needs; significantly increased access to care and education to better align with the realities of modern living and working; greater use of the home, the community and other settings as contexts for 24/7 learning |
innovators. What is more, this will be far from impersonal and will provide for increased person-to-person interaction, guidance, instruction and networking. Educator roles will become more differentiated, with a new class of professionals providing high-quality care, direction, guidance, coaching, motivation and management of individual student learning and development. Teachers will focus less on being providers of knowledge and more on assisting students to apply their knowledge, enabling them to overcome barriers to progress and helping them to discern what is important and true.

The sixth and final key change involves the gradual emergence of teaching as a true profession with a distinctive knowledge base, a framework for teaching with well-defined common terms for describing and analysing teaching at a level of specificity and strict control by the profession itself on entry into the profession. This last change is likely to be closely linked to the aforementioned changes in how students learn in the future and to the new roles that educators in schools will perform.

schooling. As Barber et al. observed in Oceans of Innovation (2012: 58),

'The challenge is that while education reformers are seeking to design a system for 20 years ahead, teachers struggle with the present and parents remember the system of 20 years ago: the conceptual gap is therefore 40 years, a major communications challenge which governments and educators often underestimate. You could argue that the gap is even bigger than this, given that school students of today will still be part of the global workforce 50 years from now.'

Certainly, an enterprise such as school education cannot and should not be changed lightly or in ways that generate confusion and disarray. Change needs to be managed carefully. At the same time, the stakes are high, and the underlying forces for fundamental change are compelling and irresistible. We do no favours to future generations if we do not respond to these changes with the urgency required.
2. ASSESSMENT: A FIELD IN NEED OF REFORM
Assessment, when used in an educational context, is a broad term referring to any appraisal (or judgement or evaluation) of a student's work or performance (Sadler 1989: 120). It can be done informally, through direct observation and questioning, or more systematically, through the use of rubrics to analyse performance, including classroom activities and tests, or it can be done formally, through system-wide testing programmes and public examinations. In principle, virtually any educational outcome is assessable, although not all can or need to be measured with the same power.

The primary purpose of educational assessment is to seek to determine what students know, understand and can do. While that would seem a relatively straightforward intention, in the real world of policy and practice, educational assessment is complex and frequently controversial. In a recent review of the field, Professor Geoff Masters, CEO of the Australian Council for Educational Research, an organisation that played a leading role in the implementation of OECD's PISA programme, states (2013: 12):

'The field of educational assessment is currently divided and in disarray. Fault lines fragment the field into differing, and often competing philosophies, methods and approaches. The resulting dichotomies have become the default basis for conceptualising and describing the field: quantitative versus qualitative; formative versus summative; norm-referenced versus criterion/standards-referenced; tests versus assessments; internal versus external; continuous versus terminal; measurement versus judgement; assessment of learning versus assessment for learning; and so on.'

Professor Paul Newton pointed out some years ago that much of the confusion and division in the field of educational assessment is not caused by the assessments themselves but by the uses to which they are put. In particular, tensions arise when assessments designed for one purpose are assumed to be fit for another or when the impact of a secondary use of assessment on core instructional activities is ignored (Newton 2007). Newton provided comments on a non-exhaustive list of more than a dozen uses, each supporting a particular set of decisions and having different assessment design implications, and illustrated how readily disarray can arise in the field of assessment when important distinctions are ignored and false dichotomies are perpetuated.

In order to better understand the significance of the radical changes in thinking and practice
on assessment that we and others have foreshadowed, this chapter:

• reviews some key purposes of assessment, including its use in formal programmes for the purposes of certification, selection and accountability and its formative use in classrooms and schools for improving learning and teaching;
• identifies why assessment, when used for these purposes, has often been controversial, difficult and a barrier to change.

ASSESSMENT FOR CERTIFICATION AND SELECTION PURPOSES

In the school education context, the primary purpose of certification is to attest to a student's educational attainments in individual subjects or areas or across a whole programme of study. Certification is typically carried out on completion of high school, although in many systems (such as the UK, Bangladesh, India, Indonesia, Pakistan, Singapore and Thailand), it continues to be a two-step process. Here, the first set of examinations in several subjects is taken at the end of the period of junior secondary education (usually the tenth or eleventh year of schooling) and the second two years later, in a smaller number of subjects studied in depth.

The selection function involves the use of assessment information by admissions staff and employers choosing applicants for positions. This often entails manipulating information generated by the certification process and sometimes supplementing it with further information, including the outcomes of interviews, evidence of achievements, participation in other relevant activities and referrals or testimonials. The use of certification for selection purposes has high-stakes consequences for students, and, in some countries, where results are used for accountability purposes, for teachers, school leaders and schools too.

The certification/selection functions of educational assessment have a very long and interconnected history. It could be claimed that their origins lie in the national system of examinations created for the purpose of selection into the Chinese Imperial Civil Service some 1,300 years ago. It was the Chinese who invented written examinations based on a set curriculum, leading to the award of degrees and used explicitly for the purposes of selection by merit, principles not taken up in the West until more than a millennium later.

As for the certification/selection of students at the end of their secondary education, the German and Finnish Abitur can be traced back to Prussian law introduced in 1788. The French Baccalaureate was created in 1808 under Napoleon. The British Higher School Certificate Examinations (the forerunner of the present-day GCE A-Level examinations) were established in 1918.

All of these examination systems were conceived initially for the purpose of selection into university. They continue to serve this function today, but in a very different context of expanded access and retention, as well as the more general purposes of certification of performance, high-school graduation and selection, regardless of whether students proceed to university, work or other forms of education and training.
In the USA, the use of examinations to certify and select students can be traced back to the New York state legislature's creation of the Regents examination system. These high-school, end-of-course exams were first administered after the Civil War in 1878. Twenty-three US states run graduation/exit examinations that require a certain standard of attainment in order to receive a high-school diploma. In most states, these high-school examinations are first taken in the tenth grade although students typically complete high school at the end of grade 12.

Selection into universities in the USA has traditionally depended on the use of high-school grade-point averages and scores on standardised scholastic aptitude tests, such as the SAT.1 The SAT evolved in the 1920s, from the IQ tests developed for the Army during the First World War. Some 1.9 million men were tested on the Army Alpha test of intelligence for literates, and the Army Beta test of intelligence for illiterates and non-English speakers, especially new immigrants (Wigdor and Green 1991). These were aptitude rather than attainment tests, associated with the new science of intelligence testing, new theories of psychometrics and the invention of the multiple-choice question, allowing fast and efficient testing of large numbers of candidates. They have had an enormous impact on a wide range of other school testing programmes and, indeed, on the more traditional school curriculum-based examination systems typical of Europe, Australasia and some Asian countries such as India, Pakistan, Malaysia, Hong Kong and Singapore.

In all parts of the world, assessment for certification of students at the end of high school generates ongoing controversy, much of which gets aired annually in the media, while other issues cause internal dilemmas for awarding bodies.

ASSESSMENT FOR ACCOUNTABILITY PURPOSES

Another long-standing use of assessment, and one that has gained huge prominence in recent years, is for the purpose of holding providers (systems, schools and teachers) directly accountable for the performance of their students. In education, as in almost all areas of public and corporate life, ever more complex formal systems of accountability have been created that variously consider compliance with regulations, adherence to professional norms and educational outcomes. It is the last of these which we will focus on here, as it involves a very specific and often controversial use of assessment information.

Making use of assessment information for accountability purposes has a long history. In 1863, the British government, as part of new funding arrangements for elementary education, implemented a system in which funds received by individual schools depended in part on students' performance in examinations administered by school inspectors. This system, referred to as 'payment by results', was highly controversial but, nevertheless, a key part of the drive in Victorian England to establish a system of public elementary education for all. This system remained in place for just over thirty years, and, at its height in the 1870s and 1880s, on average around half of the national-level funding an elementary school received depended on the outcome of student examinations. From then on, it was considered inadvisable to use assessment data to hold teachers accountable for student
1. Most high schools in the USA use a system of five grades in assessing student performance and assign points to these grades as follows: A = 4; B = 3; C = 2; D = 1; F = 0. The average of a student's grade points is referred to as a GPA.
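The arithmetic in this footnote can be illustrated with a minimal Python sketch. The names GRADE_POINTS and gpa are hypothetical and chosen only for illustration, and the sketch assumes a simple unweighted average in which every course counts equally; schools that weight courses by credit hours or difficulty would need a different calculation.

```python
# Grade-to-point mapping exactly as listed in the footnote.
GRADE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}


def gpa(grades):
    """Return the unweighted mean of the grade points for a list of letter grades.

    Assumption for illustration: every course carries equal weight.
    """
    if not grades:
        raise ValueError("at least one grade is required")
    points = [GRADE_POINTS[grade] for grade in grades]
    return sum(points) / len(points)


# A student with one A, two Bs and one C: (4 + 3 + 3 + 2) / 4
print(gpa(["A", "B", "B", "C"]))  # 3.0
```

Because the result is a single average, two students with very different grade profiles can share the same GPA; for example, grades of A and F average to the same 2.0 as two Cs.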
learning. Instead, the emphasis shifted to performance at the school level.

In the UK, the next stage came when the post-Second World War settlement was overthrown by the 1988 Education Reform Act, which at one and the same time introduced market-style reform (devolution of resources to schools, open enrolment and new school models) and sharper accountability, including England's first National Curriculum and national testing of children at ages seven, eleven, fourteen and sixteen.

Implementation of the new assessment arrangements took the best part of a decade, with implementation errors and significant controversy at every step. By 1995, however, national assessment of seven-, eleven- and fourteen-year-olds in mathematics and English (and science for eleven- and fourteen-year-olds) had been introduced. The General Certificate of Secondary Education (GCSE) exam, new in 1988, was reformed and adapted and became the main means of assessment for sixteen-year-olds.

Moreover, by the mid-1990s, transparency had become a major theme of the reforms, and the results of the tests at eleven and fourteen and exams at sixteen and eighteen were published in performance tables, which the media promptly turned into rankings.

The Blair government, first elected in 1997, stood by both accountability and transparency, indicating that it would publish more information, including data on a school's progress over time and value-added indicators. Crucially too, its critique was that the previous government had increased the pressure on schools to perform but had not increased the support to do so. The Blair governments brought major increases in teachers' pay and growth in the numbers of teachers and sought improvements in teacher training and high-quality professional development for all primary teachers in the teaching of mathematics and English.

Importantly, the Blair government argued that only if the system demonstrated its impact, through accountability and transparency, could increased investment in education over the years 1998-2008 be justified, revealing the connection between assessment policy and overall strategy.

In the USA, which has had a long history of assessment for accountability purposes, the No Child Left Behind (NCLB) legislation enacted in January 2002, with cross-party support, introduced what might be regarded as the most ambitious attempt ever to seek to use accountability testing as a means of raising standards.2 It required states to:

• establish standards for academic proficiency in reading, mathematics and science;
• establish measures for assessing all students in public schools each year in English and mathematics in grades 3-8 and in one of grades 10-12, and later on in science;
• develop a definition of what would constitute adequate yearly progress (AYP) towards the standard that has been set for academic proficiency;
• set targets for schools to enable them to achieve 100 per cent academic proficiency over twelve years; and
2. No Child Left Behind is a United States Act of Congress that was a reauthorisation of the Elementary and Secondary Education Act.
• set measurable objectives for improved achievement for each of the following subgroups: economically disadvantaged students, students with disabilities and students with limited English proficiency.

In addition, the NCLB legislation incorporated the requirement that states implement high-stakes consequences for schools and districts that failed to demonstrate AYP.

It soon became evident that NCLB targets were unrealisable for many schools. Implementation of the legislation has generated much debate and controversy, with many criticising NCLB for its punitive approach to school accountability and its over-reliance on test scores when making judgements about schools. Without doubt, NCLB made a major contribution to putting achievement gaps firmly on the national agenda. However, no consensus has emerged on how it could be modified or, indeed, whether it should be scrapped in the context of the reauthorisation of the Elementary and Secondary Education Act.

Since 2012, most states have applied for and have been granted waivers from NCLB requirements and, in particular, from exclusive reliance on test scores, in exchange for rigorous and comprehensive plans to improve educational outcomes for all students, close achievement gaps, increase equity and improve the quality of instruction in the classroom.3 However, recent research indicates that some NCLB waivers allow the flawed accountability practices of the original law to continue and have missed the opportunity to design more effective school accountability systems that might consider non-test-based indicators, student growth, student demographics or results from subjects other than reading and mathematics (Polokoff et al. 2014).

Common to the UK, the USA and almost all other countries that have adopted accountability testing has been a consensus that outcomes matter, that they should be measured and that schools and systems should be held accountable for them. From a social-democratic perspective, accountability testing has been seen as a way to promote greater equality of opportunity by focusing on groups who have traditionally achieved low educational outcomes and using the data to target interventions. From a neo-liberal perspective, it has been seen as creating an informed public who are better able to exercise choice in where they send their children to school (Hursh 2007), which, in turn, is seen as leading to ongoing improvements in the quality of educational provision as schools compete with one another for students.

Accountability testing certainly resonates with electorates that have come to believe that justice and progress can occur only under conditions of transparency and full knowledge of the facts.4 Parents believe that they are entitled to know how their child is progressing and how the child's school and school system is performing. They also believe that there is a corresponding entitlement to remediation when their child is not making adequate progress or when the child's school or school system is not performing to expectation.

It is thus no surprise that accountability testing has become common across the world. It may take the form of specially developed standardised tests, particularly to measure basic literacy and numeracy, or use standards-
3. See http://www2.ed.gov/policy/elsec/guid/esea-flexibility/index.html (accessed 18 November 2014).
4. For a compelling discussion of why transparency rules in the modern world, see Fullan (2008).
based external examinations of school subjects originally designed for certification purposes. Evidence is mixed regarding the effectiveness of accountability testing as a policy to improve outcomes. Analyses of PISA 2009 data on factors that might explain differences between countries in student performance revealed that, across OECD countries, the use of standards-based external examinations of school subjects for accountability purposes was associated with higher levels of student performance, but no measurable relationship was found between the prevalence of standardised tests and the performance of school systems (OECD 2010).

In terms of the challenges associated with the use of formal assessment programmes when used for certification, selection and accountability, there are four that have been universal:

1 accommodating the full range of student abilities;
2 providing meaningful information on learning outcomes;
3 assessing the full range of valued outcomes;
4 maintaining the integrity of assessments.

We will discuss each of these in turn.

Accommodating the full range of student abilities

In the case of assessment for certification and selection purposes, most examination systems were designed initially for the most academically able of the age cohort but have since been modified or redesigned in an attempt to accommodate the expanded range of student aptitudes that have accompanied increased retention rates.

One response has been to offer tiered credentials. For example, in England and Wales, students sitting the GCSE examinations may sit either for Foundation papers, graded G-C, or for Higher papers, graded E-A*, according to their ability and expected performance. Criticism of such arrangements has focused on the potential for tiering to place a cap on the aspirations of students who may have been guided into sitting for lower-tier papers. On the other hand, tiered papers have the advantage of creating a better match between the demands of the assessment and the assumed ability level of candidates and therefore leading to a more efficient assessment. The case for tiering is stronger for subjects such as mathematics and science, which differentiate through the specific content of questions posed, than for subjects such as English and history, which differentiate through the quality of responses to less content-specific questions.5

Another response has been to expand the range of subjects available within a mainstream credential, with the intention of better catering for those not suited to or interested in studying traditional academic subjects. But this sets up a hierarchy of esteem among subjects that are manifestly not of equal challenge, even though the subjects in themselves may be equally valuable and worthy of study. If awarding bodies seek to maintain some comparability in the standards of these very different (in terms of demand) subjects, they risk discouraging the very students they wish to encourage. If they decide to award grades that reflect the candidature of each subject, then they generate a problem for users of the credential, particularly for those who require an overall indicator of performance, such as admissions officers.
5. For a discussion of tiering in the context of the GCSE, see Oates (2013).
ASSESSMENT: A FIELD IN NEED OF REFORM
In the case of standardised testing for accountability purposes, an ongoing challenge
has been to design tests that can be administered within the time allowed and
yet provide accurate measures across the full spectrum of abilities within a given age/grade
cohort. As we will indicate in the next section, technical solutions for better assessing the full
range of abilities have existed for some time, but test developers have not always been in
a position to implement them. The fall-back position has often been to design tests that
have maximum reliability around critical cut-scores associated with one or more defined
standards of performance, which is perfectly reasonable if what matters is the standard itself.
Of course, if one is interested in performance across the full spectrum of abilities, then the
number of items and/or score points required to obtain accurate measures rises dramatically.
The problem can be appreciated by looking at Figure 2.1, which shows, for PISA 2009
mathematics, both student ability measures and item difficulties on the same scale, allowing
the distribution of student ability measures to be compared to the distribution of item
difficulties.6

It can be seen that the distribution of item difficulties closely follows the distribution
of student abilities, which indicates a well-targeted test, but it is also evident that there
is only one item appropriate for students in the ability range -3 to -2. To get an accurate
estimate of students in this ability range, more items would be needed of matching difficulties.

In brief, tests and examinations are now being required to be more sensitive to performance
across a much wider spectrum of student abilities than can be satisfactorily assessed
within the confines of a single fixed-item test. As we will see in the next section, however,
the problem is now being addressed through various forms of dynamic, adaptive test
delivery that can be facilitated by the adoption of onscreen assessment.

Providing meaningful information on learning outcomes
Another big challenge has been that of providing assessment information in ways that
are meaningful and facilitate decision-making. This, of course, may have little or nothing to
do with the assessments themselves but rather with how assessment information is used.

In the case of assessment for certification, where the primary use has been for selection
purposes, many systems have provided some form of ranking statistic, such as a standardised
score, a percentile rank or a grade determined by fixed percentages. Normative information
can indeed facilitate selection decisions but by itself provides no indication as to what
students actually know and can do and can conceal changes in performance levels over
time.

As a consequence, most awarding bodies have moved away from normative reporting
in favour of a form of standards-referenced reporting in which psychometric methods
are used to develop an achievement scale along which cut-scores are identified in order
to create a number of hierarchically ordered levels which are then given labels (e.g., grades
A-F), accompanied by descriptions of what a typical student achieving a given level/grade is
able to do.
6. In PISA and most other tests these days, ability measures are estimated on a scale of logits (the logarithm of the probability of
Figure 2.1 PISA 2009 mathematics: plot of student abilities and item difficulties.
Source: OECD (2012).
[Plot not reproduced: numbered test items positioned on a common vertical scale running from 3 down to -4.]
For such standards to have meaning, it is necessary to ensure that they remain
comparable over time. For testing situations, the answer is to equate successive tests by
embedding a set of anchor test items into the live test. This is routinely done in longitudinal
surveys of achievement such as PISA, or literacy and numeracy standardised tests
typically administered at state and national levels for accountability purposes.

Ensuring standards are meaningful and remain constant over time is less straightforward in
the case of public examinations for which, in the interests of transparency, all examination
questions enter the public domain immediately after they are administered and thus cannot be
used again for equating purposes. In many parts of the world, reliance is placed on professional
judgement to set and maintain standards, including examination of scripts at grade
boundaries and comparisons with scripts from previous years. In England, Northern Ireland
and Wales, so-called prediction matrices are used to guide examiners in setting boundaries
for grades. These matrices make use of students' performance at a previous stage of schooling
to predict their performance in GCSE and GCE examinations. Examining bodies have
to report and justify significant disparities between predicted grades and actual grades
awarded.

and unambiguous information, such as the percentage of students of a given age/grade
meeting a given standard across the system and within a given school. However, the
complexity of schooling makes it difficult to capture the performance of a school using
simple statistics like this.

To begin with, the information is never as unambiguous as it might seem, thanks to the
existence of measurement error, which is a feature of all assessments. This unavoidable fact
does not sit well with the average layperson, who typically sees any error as inexcusable and
believes all assessments should be completely accurate.

Compounding this problem is the fact that the easy-to-understand 'percent meeting the
standard' index is particularly unreliable when it comes to summarising the performance
of a school. For small schools, the degree of uncertainty over the percentage of their
students meeting a given standard may be greater than the percentage change which
the system has declared necessary to demonstrate adequate progress. This unreliability of
percentages above a given cut-score statistic also leads to zigzag patterns of performance
over time, with some schools erroneously believing that they did very well one year but
poorly the next, when in fact the differences may not have been statistically different but
7. For a review of some of these difficulties in the context of the ubiquitous use of the 'percent meeting the standard' index, see Ho (2008).
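
A rough illustration of the small-school problem described above, under assumptions that are not taken from the report: each year's cohort is treated as a random sample from the students the school could plausibly enrol, the normal approximation to the binomial is used, and the 5-percentage-point progress target and the 60 per cent pass rate are purely hypothetical figures. The function names are likewise illustrative.

```python
import math


def standard_error(n_students, proportion):
    """Sampling standard error of a proportion (normal approximation)."""
    return math.sqrt(proportion * (1.0 - proportion) / n_students)


def margin_95(n_students, proportion):
    """Half-width of an approximate 95% interval around the proportion."""
    return 1.96 * standard_error(n_students, proportion)


# Hypothetical progress target: a rise of 5 percentage points in the
# percentage of students meeting the standard.
REQUIRED_GAIN = 0.05

for n in (30, 120, 600):
    margin = margin_95(n, 0.60)
    verdict = "exceeds" if margin > REQUIRED_GAIN else "is within"
    print(f"{n:4d} students tested: 60% +/- {100 * margin:.1f} points "
          f"({verdict} the 5-point target)")
```

Under these assumptions the uncertainty shrinks only with the square root of the cohort size, which is one way of seeing why year-to-year changes in a small school's percentage can look dramatic without reflecting any real change.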
improvement in the performance of the least able who are well below the standard.

Mean scores are more informative because they take into account the actual scores
of each student, but they don't take into account the backgrounds of the students
taking the test and other factors beyond the control of the school. Researchers have
long advocated greater reliance on so-called value-added measures that seek to adjust for
prior achievement, intake and other school and student factors, but there has been a
reluctance to embrace them, first because of a commitment to the notion that all should be
assessed against the same standard, and second because value-added indices are inherently
complex and difficult to grasp for those lacking an understanding of the underlying statistical
manipulations.

A claim that has often been made about accountability-assessment programmes is
that, in addition to providing information of general public interest, they provide schools
and teachers with valuable information for guiding and improving learning and teaching.
In other words, an important rationale for administering the tests is that the feedback
they provide can enhance teachers' professional practice and give pointers on where
to focus school improvement efforts. Often, schools and teachers are given access to
detailed breakdowns of the performance of different groups of students on individual
test items or on subsets of items assessing specific aspects of the curriculum. Better still,
some systems publish detailed analyses of the performances of students on test questions,
identifying common difficulties encountered and providing suggestions and identifying
resources to teachers on ways in which these can be overcome.

Certainly it is important for schools and teachers to have access to objective
information on both the absolute and relative levels of performance of their students. But the
potential of test results to improve learning and teaching can be overstated. Results typically
reach schools many weeks or even months after students take the tests, by which time
they may be in another grade, in another class and with another teacher, so the information is
too late to inform practice. Even where there is timely feedback to schools, the information
may not be specific or precise enough to inform practice or improve learning in any but
a very general way. This is particularly the case in testing programmes in which the test items
represent a very broad and light sample from the target domain.

In seeking to use assessments designed for broad system and school accountability
purposes to inform daily teaching, it is as well to recall Newton's warnings of the tensions
that can arise when assessments designed for one purpose are assumed to be fit for another,
or when the impact of a secondary use of assessment on core instructional activities is
ignored.

Assessing the full range of valued outcomes
A long-standing challenge in assessment for certification and accountability purposes,
and one on which we are now beginning to see significant progress, has been how
to assess the full range of valued outcomes. Recent systematic quantitative analyses and
benchmarking of curriculum documents with corresponding examination papers have
revealed imbalances, with a preponderance of questions relying on relatively low-level
cognitive processes such as memorisation, comprehension and problem-solving of a
predictable and formulaic nature and few
questions assessing the kinds of thinking skills that result from deep learning.8

In some jurisdictions, the traditional essay question, which can often tap into higher-order
cognitive skills, has been discarded in favour of multiple-choice or short-response
structured question formats, in an effort to improve marking reliability and efficiency and
to provide greater access to the full range of student abilities.

There have also been significant gaps in assessment, particularly when it comes to
laboratory, field and practical work, oral language and presentations and almost
all the inter- and intra-personal skills and competences discussed earlier, which are now
seen as vitally important for learning, living and working in the twenty-first century.

Such imbalances and gaps make it impossible to have a complete picture of a student's
learning and, more seriously, mean that outcomes not assessed in the examination will
receive little or no attention in the classroom. Thus, assessment is dictating and constraining,
rather than reflecting, the curriculum.

and re-marking of samples by external examiners, tightly defining the nature of the
assessment and how it will be scored and statistical moderation using results on the
examination paper as the moderating variable.

Faced with the costs of an effective system of moderation, pressure from teachers to
relieve them of the burdens that such systems often impose and the difficulties of managing
widespread distrust in the integrity of such assessments, many awarding bodies have felt
obliged to eliminate the use of school-based assessment or to restrict it only to those
instances where it is deemed absolutely essential (such as in the case of orals to assess
second-language acquisition). Such trends run counter to the directions emerging in the
development of modern curricula that will prepare students for a globalised world and
life within the emerging Knowledge Society.

The problem of assessing only a limited range of valued outcomes is, of course, even
more acute in most accountability-testing programmes, which typically assess only a
small part of the intended curriculum (literacy and numeracy and sometimes core science
8. An example of research that has investigated the level of cognitive demand in examinations is Clesham (2013).
on other forms of assessment. The Gordon Commission summarised this problem in the
context of the USA as follows (2013: 78):

[assessment] has been seen by policymakers as a means of enforcing
accountability for the performance of teachers and schools. Accountability
is not the problem. The problem is that other purposes of assessment, such
as providing instructionally relevant feedback to teachers and students,
get lost when the sole goal of states is to use them to obtain an estimate of
how much students have learned in the course of a year.

Avoiding this danger calls for a rethink not just of what should be assessed within
accountability programmes but also of the fundamental premises underpinning them. This
is something we will return to in more detail in the next section, where we identify some
of the more promising developments under way to enable assessment of a wider range of
learning outcomes.

Maintaining the integrity of assessments
Those responsible for running examinations have always had to cope with the threats
posed by those (typically the small minority) who seek to beat the system. As Steven Levitt
and Stephen Dubner cleverly illustrated in their best-seller Freakonomics (2007), if the
incentive is there, some people will do what it takes to get what they want, so perhaps we
should not be surprised that people will do all they can to exploit the vulnerabilities of
examination systems.

Cheating and corruption were a notorious and well-documented problem throughout the long
history of the Chinese Imperial Examinations, including bribery, paying someone else to
sit one's examination (identity fraud) and cribbing (concealing notes). In the modern
digital age, smartphones and purpose-built concealed microelectronic devices which can
communicate with an outside collaborator or post exam questions live on social-networking
websites have introduced a whole new level of complexity and challenge to the task of
maintaining the integrity of examinations. This integrity must be maintained without negatively
affecting the validity of the examination, infringing on individuals' liberties or otherwise
causing undue expense, personal stress or inconvenience to all. Of course, cheating and
corruption enter into many aspects of everyday life, so it is no surprise that they should enter
into assessment for certification and selection purposes. On the other hand, any system that
allows such behaviour to become widespread will inevitably fall into complete disrepute, so
this issue needs the closest attention.

If maintaining the integrity of assessments is a challenge in assessment for certification
purposes, where problems tend to involve isolated students, it is perhaps an even more
serious challenge in accountability testing, in which the stakes are often high for teachers,
principals and system officials. Assessments can be compromised by behaviours ranging from
excessive drilling to the test to more serious but much rarer instances of professional
misconduct. Moreover, the line between right and wrong is not always clear-cut, at least to
some. For example, there are documented cases in which a school administrator who
had deliberately altered students' responses to give them higher scores declared this
behaviour morally defensible as it guarded against potential closure of the local school
and its attendant consequences.
In both the USA and the UK, there is evidence that improvements in the performance of
schools and students as assessed through high-stakes testing programmes are typically higher
than those indicated by performance on parallel low-stakes programmes, giving credence
to the view that test-based accountability improvements in learning outcomes reflect, in
part, drilling to the test and various strategies to game the system.9

This suggests a problem that goes well beyond isolated cases of cheating or manipulating
outcomes, and which has little to do with concerns over the nature of the assessments
used in accountability testing. Instead, it relates to a clash in values and to underlying faults in

[Pull quote: a clash in values and underlying faults in the accountability arrangements generate widespread attempts to game the system]

ASSESSMENT FOR IMPROVING LEARNING AND TEACHING

As an integral part of the three core processes referred to at the beginning of the first chapter,
the most critical role of assessment is that of monitoring student progress. This provides
feedback, which can inform decisions about what to teach next (the curriculum) and provide
evidence of the outcomes of learning and teaching. This feedback is most powerful
when used by students to adjust their learning strategies and by teachers to make daily,
micro-level adjustments to their teaching. When used to inform, guide and personalise learning and
teaching, this is known as formative assessment (Popham 2008).

[Pull quote: assessment, when used formatively, is one of the most powerful interventions found in the educational research literature]
9. See, for example, Chadowsky and Chadowsky (2010) and Statistics Commission (2005).
When we visit the doctor, we are in a one-on-one situation, and we receive individual
attention. If unsure of the diagnosis or treatment, our doctor refers us for further
tests or to a specialist. This is routine practice in healthcare. (That is not to suggest that the
more personalised approach in healthcare always results in accurate diagnoses but rather
that it has a much greater likelihood of doing so.)

When we go to school, we join a class of twenty-five or more students assigned to a
teacher who is expected to be able to cope with all but the most extreme learning or
behavioural difficulties. Most assessment is informal, unsystematic and takes two forms:
(1) ongoing observations of and reflections on students at work; and (2) the posing of
questions to monitor responses to instruction. When teachers do assess more systematically,
it is invariably for the purpose of making judgements and generating evidence to
support a final set of assessment grades. These then appear on students' end-of-term or
end-of-year report cards and may subsequently be used for various internal guidance and
selection purposes.

To tap fully into the power of formative assessment, particularly for the more critical
parts of the curriculum (such as learning to read), it is necessary for teachers to:

• have a clear notion about which aspects or qualities of learning they wish students to
  develop, in the form of validated maps of the sequence in which students typically learn a
  given curriculum outcome (variously known as learning progressions or critical
  learning instructional paths [Fullan et al. 2006: 54]);
• have a simple and efficient process for real-time collection, storage and analysis
  of large amounts of data about their students;
• monitor students and their progress on a daily basis using a set of structured
  observations and assessment tools linked to the objectives of each lesson
  and integrated into learning activities to minimise interruption to normal
  classroom routines;
• use the data as a starting point for both immediate and longer-term planning
  and adjustment of instruction explicitly linked to curriculum objectives and
  tailored to the needs of individual students.

Much of the above has simply not been available, and this has made formative
assessment too onerous for the majority of teachers to implement and sustain. But without
such a systematic, data-driven approach to instruction, teaching remains an imprecise
and somewhat idiosyncratic process that is too dependent on the personal intuition and
competence of individual teachers.

This may sound a brutal claim and is certainly not meant as an attack on teachers but rather
of the paradigm within which they operate and the impossibility of personalising learning
given current conceptions and practices. The issue to be explored in the next chapter is the
extent to which new thinking and new digital technologies can remove many of the barriers
to full adoption of formative assessment.
Assessments that can accommodate the full range of student abilities
• Assessments unable to assess accurately at either end of the ability distribution, or away
  from critical cut-scores.
• Assessments within tiered credentials or tiered assessments, with resulting problems of cost,
  logistics, cross-tier comparability and capping of student aspirations

Assessments that provide meaningful information on learning outcomes
• Over-reliance on grades or levels that reveal little about what the student can do
• Feedback to schools on student performance typically provided too late and too broad-brush
  to be of value in improving learning and teaching
• Assessments used to generate a single score for each student which is then further summarised
  at the school or system level as a percentage meeting a nominated cut-score, a volatile statistic,
  hiding more than it reveals about performance, particularly shifts in performance on either
  side of the cut-score. Alternatively, summarised as a mean score unadjusted for intake and other
  characteristics beyond the control of the teacher or school

Assessments that support students and teachers in making use of ongoing feedback to personalise
instruction and improve learning and teaching
• Assessment policies that pay little or no attention to formative assessment and to providing
  teachers with the tools and the capacity to use it on a daily basis
• An absence of validated learning progressions, efficient processes for collecting and analysing
  data and easy-to-use assessment tools

Assessments that have integrity and that are used in ways that motivate improvement efforts and
minimise opportunities for cheating and gaming the system
• Assessments that carry undue weight in high-stakes decision-making, increasing the risks of
  cheating and gaming the system
3. TRANSFORMING ASSESSMENT

Let's briefly review what has been suggested so far. At its core, school education is about
deciding what students need to learn (the curriculum), about learning and teaching
and about assessment (monitoring student progress). Of the three, assessment is the
lagging factor and often sits uncomfortably with the other two, for the reasons we have
just identified, many of which have to do not with the assessments themselves but with the
uses to which they are put.

However, we are on the verge of an education revolution as a result of irresistible external
pressures generated by globalisation, new digital technologies and the emergence of the
Knowledge Society. Added to which, there are internal pressures in many high-performing
countries brought about by a performance ceiling in terms of the improvements in learning
outcomes that can be delivered within the current paradigm of school education.

those higher-order thinking and inter- and intra-personal skills vital for living and learning
in the twenty-first century. In this chapter we outline the key elements of these changes.

As we noted earlier, this future is in many respects already with us and can be viewed
at the margins of current practice (which is so often where one encounters the new)
or is being created by bringing together components that already exist but which have
never before been made to work together.

This chapter describes ways in which new thinking and new digital technologies are
transforming assessment and overcoming current barriers and limitations. We begin by
considering how these changes affect formal assessment programmes, such as those used
for certification/selection and accountability, and then go on to consider assessment as part
of the ongoing process of learning and teaching.
online assessment environment offers a number of major advances once the technical
problems of access have been addressed and the reluctance to abandon tried and tested
traditional approaches has been overcome.1

Assessing the full range of abilities
We referred earlier to the dilemma of examiners and test constructors in assessing
the full range of abilities in many assessment contexts. Test developers find it difficult,
if not impossible, to design paper-based examinations and standardised tests that
can be administered within the limited time allowed and yet provide accurate measures
across the full spectrum of abilities for a given age/grade cohort. Many tests have both floor
and ceiling effects. (There are insufficient items to properly assess the highest and lowest
achieving students.)

One response has been to develop tiered credentials, while another has been to design
tests that maximise reliability around cut-scores associated with one or more defined
standards of performance, while accepting greater imprecision of measurement above
and below these cut-scores.

Yet another approach, and one that has been known about for decades, involves the
use of computer adaptive testing (CAT) and the application of psychometric methods
to calibrate a bank of questions of known difficulty. If students perform well on an item
of intermediate difficulty, they are presented with a more difficult question. If they perform
poorly, a simpler question is presented. Testing proceeds until an estimate of sufficient
precision is achieved, which, for most students, will require many fewer items than had they
sat a standard, fixed-item test.

Implementing CAT requires significant upfront and ongoing investment in the required
infrastructure, particularly for schools in providing computers and online access, but also
in item development, maintenance and the creation of sophisticated software to deliver
valid, individually tailored tests while ensuring the accuracy and comparability of ability
estimates. Moreover, its use is confined to assessment tasks that can be scored in real
time, making it unsuitable for assessing a range of outcomes, including certain higher-order
cognitive skills.

A number of states in the USA have implemented CAT programmes, although their
use has been constrained by requirements that accountability testing should assess only
grade-specific content. Only one state, Oregon, has thus far implemented a CAT system that is
part of state accountability arrangements and aligned with grade-level content standards.

In the future, more states will adopt CAT. The Smarter Balanced Assessment Consortium,
one of the two state-led consortia working to develop next-generation assessments aligned
to the Common Core State Standards (CCSS), is making use of CAT and a bank of more than
21,000 items to deliver online, high-stakes accountability tests.2

In the case of public examinations, a further major impediment to CAT is the requirement
that all items be released into the public domain after the examination is concluded. Doing so
would compromise the integrity of any CAT-
1. For a wide-ranging, in-depth review of the potential for computers to impact on assessment, see, in particular, the collection of papers in Lissitz and Jiao (2012).
2. See http://www.smarterbalanced.org/smarter-balanced-assessments/computer-adaptive-testing/ (accessed 15 November 2014).
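
The adaptive procedure described on this page (a calibrated bank of items, a harder item after a correct answer, an easier one after an incorrect answer, and stopping once the estimate is precise enough) can be sketched in a few lines. This is only a minimal illustration under stated assumptions, not a description of any operational system such as the one cited in note 2: it assumes a Rasch model on the logit scale, estimates ability as a posterior mean over a coarse grid with a standard normal prior, and uses a made-up 25-item bank and a deterministic simulated candidate. The names `run_cat`, `estimate` and `p_correct` and the stopping threshold are hypothetical.

```python
import math

GRID = [g / 10.0 for g in range(-40, 41)]  # candidate abilities, -4.0 to 4.0 logits


def p_correct(ability, difficulty):
    """Rasch model: chance of a correct answer, both arguments in logits."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def estimate(responses):
    """Posterior mean and SD of ability over GRID (standard normal prior)."""
    weights = []
    for theta in GRID:
        weight = math.exp(-0.5 * theta * theta)  # prior weight at this ability
        for difficulty, correct in responses:
            p = p_correct(theta, difficulty)
            weight *= p if correct else 1.0 - p  # likelihood of each response
        weights.append(weight)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    variance = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(variance)


def run_cat(bank, answer, target_sd=0.5, max_items=30):
    """Give the unused item closest in difficulty to the current estimate,
    re-estimate, and stop when the estimate is precise enough or items run out.
    `answer(difficulty)` must return True for a correct response."""
    remaining = list(bank)
    responses = []
    mean, sd = estimate(responses)
    while remaining and len(responses) < max_items and sd > target_sd:
        item = min(remaining, key=lambda d: abs(d - mean))
        remaining.remove(item)
        responses.append((item, answer(item)))
        mean, sd = estimate(responses)
    return mean, sd, responses


# Made-up bank of 25 item difficulties from -3.0 to 3.0 logits, and a simulated
# candidate who answers correctly whenever the item is easier than 1.0 logits.
bank = [d / 4.0 for d in range(-12, 13)]
ability, sd, administered = run_cat(bank, lambda difficulty: difficulty < 1.0)
print(f"estimate {ability:.2f} logits, SD {sd:.2f}, "
      f"items used {len(administered)} of {len(bank)}")
```

Because each item is chosen close to the current estimate, it is neither trivially easy nor impossibly hard for the candidate, which is why a given level of precision can usually be reached with fewer items than in a fixed test.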
the same test in order to improve coverage of the curriculum and to provide more
meaningful information on performance across the curriculum.

Currently, most formal assessment programmes focus on generating a single score
to summarise attainment. They are conceived within what Robert Mislevy and colleagues
(2012: 12-13) refer to as the 'standard assessment paradigm':

Data from each student are sparse, typically discrete responses to perhaps
30 to 80 test items. The items are predefined. The target of inference is
a student's level of proficiency in a domain framed in trait or behaviourist
psychology and defined operationally by the items. Learning during the
course of assessment is assumed to be negligible.

But in an online environment it is possible to not be so constrained, and one might
think of assessment as involving continuous performances in interactive environments, for
example; richer data that encompass many aspects of activity at any level of detail; interest
in multiple aspects of proficiency, evoked in different combinations in different situations;
learning may occur, and may indeed be an aim of the experience (Mislevy et al. 2012: 13).

For some purposes, the current paradigm, which involves a predetermined test that
seeks to make inference to a single underlying trait, such as literacy or mathematics, may
continue to make sense, at least for the time being. But there is a price to pay.

Achievement is inherently multidimensional, and, while there will be contexts in which
the overall score is what users want to have, there is growing demand for more accurate
knowledge of the specific strengths of students across a range of outcomes. This applies
particularly to some of the so-called twenty-first-century skills that are clearly discrete and
that do not lend themselves to traditional forms of assessment and reporting.

In the future, we can expect online assessment to collect a wide range of information on
multiple dimensions of outcomes, and data analytics to mine far more information from
students' responses, thus enabling a more rounded and complete picture of a student's
achievements and capabilities. This requires new kinds of assessment, as we have mentioned
earlier, but also new kinds of metrics to summarise achievement and performance in
those domains that require separate forms of reporting.

Looking even further into the future, more dramatic changes in the ways of assessing and
characterising individuals may become possible: ways that personalise the assessment by
looking not just at multidimensional aspects of performance but that also take into account
the particular situation and context in which individuals were observed and other
person-specific information about the performance, challenging the sufficiency of what Mislevy
refers to as the 'one-size-fits-all' presumption of standard assessment, which defines the
target of inference in terms of an assessor-specified domain of tasks, to be administered,
scored, and interpreted in the same way for all students (2013: 89).

Finally, online environments open up possibilities for more immediate, detailed and
meaningful reporting of formal assessment data that is tailored to the needs of specific
stakeholder groups, including parents, teachers, Assessing the full range of valued outcomes
school administrators, employers, tertiary As noted earlier, there is evidence that many
institutions and the general public, using the formal assessment programmes are characterised
internet and smartphone/tablet devices. In by a preponderance of questions relying
addition, online environments offer richer ways on relatively low-level cognitive processes
to record the achievements and significant such as memorisation, comprehension and
experiences of individual students, particularly problem-solving of a predictable and formulaic
via lifelong personalised student e-portfolios. nature; few questions assess the kinds of
thinking skills resulting from deep learning and
For example, the Hong Kong Education Bureau the capacity to apply what one has learned
coordinates an online Student Learning Profile to new situations; and no questions address
system for its high schools, providing a range of the inter-personal and intra-personal skills and
online templates that schools can use or adapt competences now seen as vitally important.
to capture information to supplement the
Hong Kong Diploma of Secondary Education To a large extent, this situation reflects an
Examination results, including: over-reliance on multiple-choice questions
which came about thanks to inflated concerns
academic performance in school (other for reliability, at the expense of validity, and by
than examination results); economic considerations over the costs of
other learning experiences; marking essays and open-ended questions.4
performance/awards gained outside But it also reflects the absence of established
school; and ways to assess these outcomes rigorously.
students' self-accounts of their learning
experiences and career goal setting.3 In some situations, a partial answer may be
to both reduce the frequency of testing and
The system has gained wide acceptance to increase the proportion of questions in
among universities in Hong Kong, mainland tests and examinations that assess higher-
China and overseas. order cognitive processes. In the USA, the
widely adopted CCSS have presented the two
More meaningful information on learning assessment consortia charged with developing
is what assessment reform is ultimately all aligned assessment systems a significant
about, as it is the key to better choices, lifting challenge in assessing a range of higher-order
performance and the motivation to improve. cognitive processes and problem-solving
New thinking and new technologies offer the capabilities. Sample items published on their
prospect of much progress in the quality of respective websites indicate that significant
information on students' achievements and progress has been made in meeting this
capabilities. challenge.5
3. See http://cd1.edb.hkedcity.net/cd/lwl/ole/SLP/SLP_01_intro_01.asp (accessed 15 November 2014).
4. When questioned recently on their views about the current state of testing in the USA, Howard Everson, Vice President for
Research at the College Board, said he thought the importance of reliability had been overblown in the USA. Professor Robert
Linn, one of the country's leading assessment experts, agreed, adding that reliability was less important than comparability and
validity and fairness. See the full interview at Tucker (2013a).
5. See, for example, http://parcc.pearson.com/sample-items (accessed 15 November 2014); and
http://www.smarterbalanced.org/sample-items-and-performance-tasks (accessed 15 November 2014).
TRANSFORMING ASSESSMENT
For other outcomes, the way forward may be Pearson's Center for NextGen Learning and
to learn from systems that have succeeded Assessment has published a Framework of
in assessing hard-to-test outcomes through Approaches to Performance Assessment that
the use of performance assessments. The sets out different approaches to assessing a
Colorado Department of Education defines wide range of valued learning outcomes that
performance assessment as assessment are not easily assessed using traditional testing
based on observation and judgement. It has approaches.7
two parts: the task itself and the criteria for
judging quality. Students complete a task (give Both of these consortia in the USA have
a demonstration or create a product), which developed performance tasks that assess
is evaluated by judging its level of quality using higher-order thinking and problem-solving
a rubric. Examples of demonstrations include capabilities, in many cases making use
playing a musical instrument, carrying out the of technology-enhanced item formats
steps in a scientific experiment, speaking a and detailed scoring rubrics that require
foreign language, reading aloud with fluency, professional judgements of the quality of
repairing an engine or working productively students' responses. One example (Deer in
in a group. Examples of products can include the Park), developed by PARCC (Partnership
writing an essay, producing a work of art, for Assessment of Readiness for College and
writing a lab report, etc.6 David Conley and Careers) for its prototyping project, was the
Linda Darling-Hammond describe a number of fourth-grade sample question shown in Figure
performance assessments in Creating Systems 3.2.8
of Assessment for Deeper Learning (2013).
[Figure 3.2: diagram labelled "State Park", with the label "8 miles"]
A ranger estimates that there are 9 deer in each square mile of the park.
If this estimate is correct, how many total deer are in the park? Explain your answer
using numbers, symbols and words.
6. See http://www.cde.state.co.us/contentcollaboratives/phase2performance (accessed 15 November 2014).
7. See http://paframework.csprojecthub.com/?page=home (accessed 15 November 2014).
8. Available at http://www.ccsstoolbox.com (accessed 15 November 2014).
This is clearly a challenging problem for fourth In the USA, the two federally funded assessment
graders, involving the operations of addition, consortia, PARCC and Smarter Balanced, both
subtraction, multiplication and perhaps also intend to incorporate automated scoring into
division (up to multi-digit) and requiring their common core state assessments, planned
knowledge of areas, perimeters, rectangles and for implementation in 2014. This indicates
how to solve for an unknown in a perimeter. a growing confidence in automated essay-scoring
Moreover, students might choose to tackle as a means of enabling the assessment
the problem in different ways and arrive at of a wider range of outcomes in the context
the correct answer. The accompanying rubrics of large-scale, high-stakes testing programmes.
allow for a possible 6 points of credit for their
response. A more fundamental solution lies in using
digital technologies to support the adoption
A barrier to the use of such assessments has of a new generation of assessment tasks
been the difficulty and costs of objectively specifically designed to assess deep learning
rating open-ended student responses. and other key outcomes not amenable to
However, advances in artificial intelligence in assessment via traditional tests and examinations.
combination with online delivery are helping Computerised assessment opens
to overcome some of these barriers. While up the prospect of presenting students with
it might at first seem implausible that a tasks that are interactive, that make use of
machine could mark an essay, several studies simulations in which students manipulate
have indicated that automated essay-scoring variables to achieve a desired result, that are
systems employing artificial intelligence are dynamic, with the task itself subject to new
capable of achieving levels of reliability equal information and changing circumstances,
to or exceeding that of trained human raters.9 and that generate a detailed log of students'
Some widely used systems include Project interactions with the task. Furthermore, it
Essay Grader, Intelligent Essay Assessor, offers solutions to the age-old problems of
E-rater, IntelliMetric and Bayesian Essay validity and reliability across those assessing, by
Test Scoring System. allowing not only automated scoring of keyed
responses but also rating of a wider range
All systems developed thus far have certain of response types, including performances
limitations, but so too does human rating.10 captured using video and sound recordings,
Currently, automated scoring of extended by multiple professional assessors in different
response questions is usually deployed in locations and at different times.
high-stakes testing contexts in conjunction with
human rating (to provide a second rating or to Jim Soland, Laura Hamilton and Brian Stecher,
quality-assure the human ratings, for example). in Measuring 21st-Century Skills: Guidance for
As automated essay-scoring technologies Educators (2013), provide (in addition to a
improve, they can be expected to play a much review of the issues involved) interesting case
more prominent role. studies of new measures that indicate what is
possible right now. One example they highlight,
10. For a comparison of strengths and weaknesses of automated and human scoring, see Zhang (2013).
which does not require overly sophisticated In the case of assessment for certification and
technology, is Mission Skills Assessment, a selection purposes, one-shot examinations
scientifically based assessment of six character can place students under great pressure to
traits (teamwork, creativity, ethics, resilience, perform, particularly in some Asian countries
curiosity and time management), which has where academic expectations are high and
been developed by the Independent Schools failure to excel can cause great loss of face for
Data Exchange and ETS (Educational Testing students and their families. These pressures can
Service) in the USA. For each trait, an overall be reduced through more cumulative forms of
assessment is achieved by combining multiple assessment and/or a system in which students
indicators of the relevant construct, including have opportunities to take examinations when
student self-reports, teacher observations they are ready to sit them and to re-sit them
and situational judgement tests. In this way, in order to improve grades.
it has proven possible to achieve high levels
of reliability (as measured by both internal In the case of assessment for accountability
consistency and test-retest reliability) and purposes, undue pressures on teachers and
of validity (in terms of predicting student school and system administrators can be
academic outcomes). reduced through the use of multiple indicators
of performance, as opposed to exclusive
A more high-tech example is the OECD's reliance on test scores, and on accountability
proposal for assessing collaborative problem- for implementing policies and practices aimed
solving as part of PISA 2015 (OECD 2013a). at improving student progress, as opposed to
This will be a fully computer-based assessment student attainment data that takes little or no
in which a student interacts with a simulated account of the circumstances and influences
collaborator or avatar in order to solve a affecting attainment.
complex problem.
The quality of assessments is also a factor
Both Mission Skills Assessment and PISA's to consider. Questions with one correct
assessment of collaborative problem-solving answer (such as multiple-choice questions)
represent examples of the first tentative steps are particularly vulnerable to cheating, but
in the unfolding of next-generation digital questions that require higher-order thinking,
assessment. open responses and demonstration of a
student's underlying thinking in arriving at an
Maintaining the integrity of assessments answer are less vulnerable (assuming one can
When the stakes for individuals are high, risks authenticate that it is the work of the student,
to the integrity of assessments will also be perhaps with the help of voice-recognition
high. That is human nature, and something software, secure browsers and equipment to
technology cannot change. Accountability detect unauthorised use of cellphones and
is vital, but if it is implemented in ways that other devices).
provoke fear rather than motivation and the
capacity to improve, then the accountability That said, new developments in technology can
system itself is the problem and should be nevertheless be of assistance. One of the greatest
adjusted. fears for administrators of examinations and
tests is security prior to administration. Papers
the integration of these core activities. Dr At the largest scale, one might view the entire
Ramona Pierson, one of the leaders in the curriculum in broad detail. At the smallest scale,
development of new software to drive more it could be a small segment of the curriculum,
personalised approaches to learning and broken down into a sequence of step-by-
teaching, summarises the challenge as follows step items of skill and knowledge required in
(2011: 1): order to attain more generalised curriculum
outcomes. These are what are known as
The goal of the Learning Ecosystem student learning progressions and are the
(LE) is to bring critical resources into basic units on which learning ecosystems are
the hands of teachers to transform built (Popham 2008: 83). They are much more
the teaching and learning moment. By granular than one finds in most articulations
leveraging a fully integrated learning of curriculum or core standards and, in the
ecosystem, education will finally be able context of learning ecosystems, are not static
to fulfil the goal of developing a mass but are continually refined on the basis of
customised, personal learning solution system feedback on how students are learning.
at scale for all students and educators.
Most of us are familiar with the way in which so
much of the content of learning and teaching
that formerly existed in print form (such as
curricula, lesson plans, student and teacher
texts and resources, assessments and teacher In addition, at each scale, one would be
professional-development materials) has able to view the curriculum according to
migrated online in recent years. Developers one's particular focus. In next-generation
of next-generation learning systems such as learning systems, the teacher can construct
Pearson don't start with preconceived notions and deconstruct the curriculum in ways
of any of these components but completely uniquely relevant to students, building upon
rethink the whole delivery process and how local curriculum standards and content
to best assist teachers to connect all of the and supplemented with other content, but
elements so that they operate seamlessly. always within a common framework and
using a consistent set of terminology and
We can follow the logic of these systems with codes, allowing easy identification and cross-
the aid of the diagram in Figure 3.3. referencing. In this way, they will be able to
connect more readily with students' interests
Curriculum and aspirations and engage them more deeply
At the top of the diagram is the curriculum, in the learning.
but one looking quite different to curriculum
documents of the past, consisting of online Assessment
interactive multidimensional maps at several Going clockwise around Figure 3.3, the next
different scales that can be interrogated in element is assessment. Personalised learning
different ways, depending on one's focus or systems move straight from the curriculum
query. (deciding what students need to learn) to
assessment, because effective learning and
[Figure 3.3: diagram of next-generation learning, with six components arranged clockwise from the top: Curriculum; Assessment; Resources; Data management and analysis; Professional learning; Personalised instruction]
teaching require that one begins with the psychology to just one principle, I would say this:
individual student and their starting points. The most important single factor influencing
learning is what the learner already knows.
Geoff Masters quotes David Ausubel, the Ascertain this and teach him accordingly
American psychologist renowned for his (Ausubel 1968: vi, quoted in Masters 2013: 10).
ground-breaking research into the role of
advance organisers in learning, as having So the primary role of assessment is to work
declared: If I had to reduce all of educational out whether the student is ready to learn
the next segment of the curriculum, and, if that can be interrogated for patterns and
not, where the gaps are so that these can used to generate individualised and pictorial
be attended to first. As instruction proceeds, achievement maps or profiles.
assessment is both backward-looking as a
check on what has been learnt and on the Within next-generation learning systems,
quality of that learning and forward-looking assessment will occur at all scales, from the
in terms of readiness to tackle new content. most granular to the most synoptic. While its
Whereas in the past assessment has typically primary function will be formative, directed at
been looked upon as a discrete activity that proximal learning objectives and concerned
follows teaching and learning, in the future it with immediate feedback to improve learning
will be seen more as an aspect of ongoing and teaching, there will be a seamless transition
instruction. to summative assessment of progress towards,
and achievement of, wider curricular goals.
Assessment might take the form of a series What is more, these summative assessments
of stand-alone mini-tests or quizzes, but, will be demonstrably reliable, comparable
increasingly, it will be embedded naturally and valid for incorporation into reporting
into learning activities so that assessment is systems, which can then support a range of
continuous and unobtrusive, making use of uses including certification, selection and
the student's digital learning footprint to track accountability. In other words, we see a new
progress, thereby encouraging immediate generation of assessments that will blur
attention to learning obstacles if and when current distinctions and unhelpful dichotomies
they are encountered and breaking down the such as internal/external, formative/summative
barriers between learning and assessment. and qualitative/quantitative.
Furthermore, such assessment will not always Much of the routine work in collecting, marking
or even mainly be about assigning scores. As and extracting information from student
Sadler, one of the first to articulate the concept responses will be automated, thus freeing up
of formative assessment, observed many years the teacher to focus on making use of the
ago: Qualitative [personalised] judgments are feedback obtained from daily observations and
invariably involved in appraising a student's feedback obtained from daily observations and
performance ... Growth takes place on many improve instruction. An example of the kind
interrelated fronts at once and is continuous of tool that makes this possible is Assistments,
rather than lock-step (1989: 123). developed at the Worcester Polytechnic
Institute.11 Where professional judgement is
Through the use of rubrics, which will define involved in assessing work, multiple graders
performance in terms of a hierarchically may be involved to ensure consistency of
ordered set of levels representing increasing standards and to maximise the reliability of
quality of responses to specific tasks, and a assessments.
common set of curriculum identifiers, it will
be possible to not only provide immediate While learning systems will embed a
feedback to guide learning and teaching but comprehensive range of assessments, authoring
also to build a digital record of achievement tools will also enable teachers to
generate their own and upload them into the The days of hard-copy textbooks, textbook-
system for review and analysis as part of an adoption regimes and the domination of
overall development and quality-assurance the multibillion-dollar textbook market by
process. a handful of publishers may be numbered.
Many textbooks have been converted into
The assessment systems developed by PARCC digital format and made more interactive, thus
and Smarter Balanced represent a significant bringing down costs, allowing more frequent
milestone in the creation of large-scale, updating of their contents and also opening up
integrated online learning-assessment systems the field to smaller players.14
that incorporate assessments and tools to
support formative classroom assessment A plethora of interactive online resources is
practices, monitor student progress and meet emerging, developed both commercially and
mandatory accountability measures. by the profession itself. Much of this is being
made available at low cost or free of charge.
A further example of a more developmental Examples of providers include KQED, a San
initiative is New Pedagogies for Deep Learning Francisco-based public media outlet offering
(NPDL), a global partnership of clusters of educators free resources for integrating
100 schools in each of ten countries that are media and new-media tools into teaching
committed to mobilising deep learning across and learning, and CK12, a not-for-profit
systems.12 One component of NPDL is a foundation that creates and aggregates high-
research-and-development effort to create a quality resources aligned to state curriculum
new generation of instruments and protocols standards and offers its FlexBook System, an
to assess deep learning. The starting point will online platform for assembling, authoring and
be setting out of competencies for learning distributing interactive, multi-modal content
tasks and assessing student progress. This will for schools.15
begin with adaptation of rubrics from the ITL
Research/21CLD programme that defined Through meta-tagging of resources to the
levels and broad indicators of various deep- curriculum (facilitated by common terms
learning competencies.13 and definitions) and also to other pertinent
dimensions relevant to teaching, next-generation
Resources learning systems will tap into this much
In generating instructional sequences, learning richer pool. For example, in Australia, education
tasks and associated assessment activities, ministers have established Education Services
next-generation learning systems will embed Australia (ESA) as a not-for-profit company to
or search out the resources that most closely support national priorities and initiatives and,
match students' learning needs, accessing both in particular, to create, publish, disseminate
purpose-built, commercially available materials and market curriculum and assessment
and the rapidly expanding collections of public- materials, ICT-based solutions, products and
domain/creative-commons resources. services that support learning in the context
12. See http://www.newpedagogies.info (accessed 15 November 2014).
13. See http://www.itlresearch.com/itl-leap21 (accessed 15 November 2014).
14. See for example Boundless, with its online interactive textbook alternative that makes use of open-source content
(www.boundless.com) and edSurge (www.edsurge.com/products/curriculum-products).
15. http://blogs.kqed.org/education/ and http://www.ck12.org/about/ (accessed 15 November 2014).
Next-generation learning systems will changes. Learning systems of the future will
incorporate algorithms that interrogate free up teacher time currently spent on
assessment data on an ongoing basis and preparation, marking and record-keeping
provide instant and detailed feedback into the and allow a greater focus on the professional
learning and teaching process. Moreover, the roles of diagnosis, personalised instruction,
information generated by learning systems will scaffolding deep learning, motivation, guidance
have value well beyond the individual learner: and care. This is the combination of activities
it will provide a source of generalisable new that John Hattie describes as teacher as
knowledge, paving the way for a design activator (2009: 17).
science approach, in which the primary focus
of educational research is on evidence-based Teachers will need to constantly update and
strategies for improving learning and teaching.17 acquire knowledge in order to perform this
role effectively. They will need the kind of
This will become increasingly viable through specific knowledge base characteristic of any
the application of data-mining and data true profession. Next-generation learning
analytics to discover patterns and relationships systems will therefore build in both formal and
within the vast number of transactions that informal personalised professional learning for
occur on a daily basis within classrooms.18 teachers, connecting them to instructional
materials, resources and networks that
For so long, much of what happened provide timely, point-of-need professional
inside classrooms has remained hidden in development and support directly related to
a black box, making it difficult to pursue the task in hand, together with opportunities
a deliberate and continuous approach to to gain recognition and credit for their learning
the improvement of learning and teaching. and development.
Next-generation learning systems offer the
prospect of revolutionising learning research Personalised instruction
and development by incorporating internal With all the above in place, it is then possible to
data-driven processes for improvement and talk confidently about personalised instruction,
by creating a design-focused concept of which is the final and most crucial component
the role of research in shaping practice. In of Figure 3.3. By personalised instruction, we
other words, we will see the development mean instruction that is adjusted on a daily
of learning systems consciously created as basis to the readiness of each student and
evolving products of ongoing research and that adapts to their specific learning needs,
development, aimed at achieving continuous interests and aspirations. The fundamental
improvement.19 premises of personalised learning have been a
part of the writings of educators for decades
Professional learning but have become a realisable dream in recent
In next-generation learning systems, the years, thanks to the advent of new digital
teacher retains the key role in fostering the technologies.
learning for each student, but the job itself
17. One of the first and most persuasive to advocate a shift of the whole educational research enterprise towards improvement
by design was Thomas Sergiovanni (see Sergiovanni 2000).
18. For a summary of this emerging field, see US Department of Education (2012).
19. This was foreseen by a number of writers a decade or more ago, notably by Professor David Cohen and colleagues (2003).
Assessments that can accommodate the full range of student abilities | Use of adaptive testing to generate more accurate estimates of student abilities across the full range of achievement while reducing testing time
Assessments that have integrity and are used in ways that motivate improvement efforts and that minimise opportunities for cheating and gaming the system | The adoption of (1) more cumulative approaches to assessment for selection purposes, with opportunities to re-sit; and (2) intelligent accountability systems that utilise multiple indicators of performance, that are designed to incentivise improvement and that avoid the creation of win-lose consequences for stakeholders for outcomes not fully under their control
Assessments that support students and teachers in making use of ongoing feedback to personalise instruction and improve learning and teaching | Sophisticated online intelligent learning systems to integrate the key components involved in effective instruction and to support a new generation of empowered teachers in reliably assessing a much wider range of outcomes, using instant and powerful feedback on learning and teaching to deliver truly personalised instruction
That's quite an impressive list, but does it add Rather than focusing on discrete assessment
up to an assessment renaissance? We believe programmes, we would suggest that it is more
that it does, but only if we are prepared to productive to view assessment as serving
rethink some of the purposes of assessment, to distinct data needs at three levels:
seek a better alignment of assessment
with curriculum and teaching and to rebalance 1 the teacher-student interface (traditionally
assessment priorities. the classroom);
2 the school; and
An integrated, multi-level view of 3 the system.
assessment
Perhaps the most urgent need right now in the The most important level is the teacher-student
field of assessment is for an overall conceptual interface, because this is where
framework and longer-term vision for its place learning takes place and where there is the
and purpose in relation to the core processes greatest need for assessment data to enable
of curriculum and of learning and teaching. We a truly personalised approach to learning
believe that the starting point is to think of and teaching. We would argue that the other
assessment in an integrated, multi-level way, two levels should be built on the assessment
which, building upon the work of Rick Stiggins carried out at this first level.
and Dale Duke (2008), and drawing upon
earlier work by Peter Hill (2010), we represent Next is the school level, where education is
as a three-level pyramid (see Figure 3.4). managed and delivered. Schools need to draw
[Figure 3.4: three-level pyramid, from top to bottom: System; School; Teacher-student]
upon assessment data, collected at all three interface, are fully aligned with the curriculum
levels, to evaluate their performance, to be and with pedagogies adapted to twenty-first-
accountable to parents for the progress of century learning and support new and more
their students and to manage learning and sophisticated forms of certification and multi-
teaching within the school. This involves using level accountability. It requires close attention
assessment for both summative and formative to the design not just of discrete assessments
purposes in addressing key questions such as: but to what Masters refers to as learning
assessment systems (2013: 3256).
How are we doing relative to other
schools? The challenge for awarding bodies
Are we improving? In considering the future of assessment for
How successful are we in teaching the certification purposes, the challenge facing
intended curriculum? awarding bodies is to work out how they can
Which students, classrooms and take greater advantage of new technologies to
teachers need extra support? deliver examinations online and thus improve
their capacity to:
At the top of the pyramid is the system that
provides the policy and resourcing context for assess a wider range of valued outcomes;
the schools it serves. Systems need assessment create more authentic assessment tasks;
data for macro-level formative and summative assess the full range of student abilities
purposes, including the evaluation of policies more accurately and speed up the
and programmes, identifying priorities and marking process, particularly for
support needs, certifying student achievement, extended response questions;
holding others to account and, in turn, being extend the window of time in which
accountable for the performance of the examinations may be taken and work
system as a whole. towards the longer-term goal of
examinations on demand; and
Within this tri-level assessment model, we use the potential of online assessment
envisage much greater vertical and horizontal and developments in psychometric
flows of information among and within methods to more rigorously maintain
the three levels than currently occurs. We and constantly benchmark standards to
also predict greater reliance by systems on ensure they are world-class.
assessment carried out at the lower levels, as
the availability and quality of assessment data To date, many awarding bodies, while
collected at the teacherstudent interface embracing onscreen marking, have moved
improves. only cautiously towards the adoption of online
assessment, primarily due to constraints of
New developments in assessment, online connectivity and hardware availability. As these
assessment environments and next-generation constraints are removed and solutions found
learning systems provide the opportunity to to security and integrity issues, schools and
rebalance assessment policies and practices students will increasingly opt for credentials
so that they build on high-quality assessment offered via online assessment, noting that
of student progress at the teacherlearner these credentials are less geographically
are multiple purposes of assessment and that a better balance must be struck among them. The country must invest in the development of new types of assessments that work together in synergistic ways to effectively accomplish these different purposes – in essence, systems of assessment. Those systems must include tools that provide teachers with actionable information about their students and their practice in real time. We must also assure that, in serving accountability purposes, assessments external to the classroom will be designed and used to support high-quality education.

In other words, balance and alignment are critical when it comes to uses of assessment. The answer is not to abandon the search for rigorous systems of accountability but rather to engage the teaching profession in the design and implementation of systems that deserve their support.

An important avenue for building the profession's trust in accountability systems is through embracing the concept of reciprocal accountability, which Elmore states as implying that, 'For each unit of performance I demand of you, I have equal and reciprocal responsibility to provide you with a unit of capacity to produce that performance, if you do not already have that capacity' (2004: 2445). The implications of reciprocal accountability for how systems and schools operate are substantial. Accountability is best thought of as a multi-level, shared, reciprocal process that all parties embrace.

Designing an effective accountability system involves clarifying who can and should be held to account for what at each level of the system and establishing accountability arrangements that are reasonable, effective and promote a shared trust in the system. This means being sure, as far as possible, that accountabilities are within the power of the person or organisation being held to account.

In the school educational context, this typically means holding systems, schools and teachers responsible for:

• student growth or progress, rather than absolute levels of performance;20 and
• doing those things that the evidence shows lead to improved outcomes – not just for achievement of the outcomes themselves (which may be only partly attributable to the specific person or organisation being held to account).

Direct accountability for outcomes is only appropriate where it is possible to separate out the impact of those being held to account. Having achieved agreement on accountability at different levels, one can then begin to align it with a multi-level system of assessment that balances out and aligns the claims of different purposes of assessment.

Equally important in the design of accountability systems is the need to take into account capacity-building requirements, particularly those related to teachers' assessment literacy and their capacity to make full use of the potential of assessment data, so that they can in turn provide feedback and enhance their own capacity to deliver more effective and personalised forms of learning and teaching.

The challenge for learning and teaching
This takes us to the challenges inherent in seeking to transform assessment undertaken
as part of the ongoing process of learning and teaching. Earlier, we noted the prospect of addressing the limitations of the age–grade progression model and of realising the potential of formative assessment in generating powerful feedback to optimise learning and teaching on a day-to-day basis. We suggested that this transformation would increasingly mean that formative assessment is an integral and vital part of learning systems designed to deliver personalised learning. We also proposed that this kind of assessment should provide the primary building block for all other kinds of assessment.

the systems required to collect and analyse the data such assessment provides. It also raises big issues about teacher development and teacher capacity in order to operate in a digital classroom in which the goal is personalised learning, with increasing integration of classroom activity into learning systems, and in which the teacher's role changes significantly – potentially in the direction of becoming more professional.

we are on the verge of a radical change in thinking and practice regarding assessment in school education;
4. A FRAMEWORK FOR ACTION

In this chapter, we propose a way for policy-makers, schools, school-system leaders and other key players to prepare for the assessment renaissance, to ensure that they maximise the benefits of new developments and changes in thinking whilst avoiding the potential downsides. We present a framework for action that allows change to be implemented in ways and timeframes suited to the starting points, capacity and readiness of schools and systems.

In the previous chapter, we focused on the potential benefits of the impending assessment renaissance, but it cannot be assumed that these benefits will always be realised. The path ahead is likely to be rocky. There are many examples of systems and schools that have had their fingers burnt by the over-hasty adoption of early and untried versions of next-generation assessment that failed to live up to expectations.

There are also examples of systems that have used assessment reform in ways that reinforce problematic practices and work against the more important, longer-term goals of personalising learning, enhancing teacher engagement and professionalism, incentivising students, teachers and school administrators and better aligning assessment with curriculum, learning and teaching.

As we indicated at the outset, while we may well be on the verge of a radical change in thinking and practice regarding assessment in school education, the exact form these changes will take depends very much on how we anticipate, envision, plan for and shape them. If the changes are poorly executed, we could run into difficulties that take years to rectify.

In addition, we need always to be conscious of the wider context and of the fundamental changes that are happening in education more broadly, of which assessment is but one, albeit vital, part. That wider context will influence both the nature and the pace of change.

As we have indicated throughout, much of the innovation in the area of assessment will occur at the fringes of the system and perhaps outside it altogether, in the realm currently thought of as computer gaming. In addition, ideas and innovations will be shared laterally between schools and indeed across national boundaries. This process of innovation is to be welcomed, but an inevitable consequence, without intervention, would be haphazard adoption and potentially a growing gap between the 'haves' of the assessment renaissance and the 'have nots'. If there is to be universal benefit across a system, governments will need to act. Moreover, some of what is required to make the renaissance universal, such as the technological infrastructure, cannot be provided by individual schools.

The realisation of the assessment renaissance and its benefits depends, therefore, on governments, systems and schools playing a powerful strategic role. Here we set out what the key features of that role might be.
by the uncertainty. It is particularly damaging if qualifications, which need to be a currency in the labour market, become politically contested. For these reasons, as far as possible, governments should strive to gain cross-party consensus for assessment strategy and thus enable it to be pursued systematically over an extended period. Much the same issues apply at the level of system and school leaders, although generally with less force, meaning that it is important to create a shared vision owned by all rather than by the current leadership.

7. COMMUNICATE CONSISTENTLY

Earlier we identified the forty-year communication gap. There are many misconceptions in the assessment debate, especially (but not only) among parents and the public. There is a strong attachment to traditional assessment in many countries, including China, Korea and the UK. Government and educators often add to the confusion by engaging in loud and sometimes wilfully misleading debate. If the assessment renaissance is to come about and its benefits for learners are to be realised, then there will need to be consistent communication, ideally with government and leading educators working together on the messages, and with school principals and teachers communicating to parents the significance of the changes for their children.

8. APPLY THE CHANGE KNOWLEDGE

In approaching the task of change management, our starting point needs to be our knowledge base of what it takes to achieve successful, system-wide change. We summarise this knowledge base below, adapted from a set of conclusions that we previously published (Barber and Fullan 2005) and supplemented with additional conclusions of specific relevance to assessment reform. We call these conclusions a 'Tri-Level Reform Solution', because we consider them relevant to the aforementioned three levels of teacher–learner, school and system.

Moral purpose
The overwhelming majority of educators are motivated by a sense of moral purpose. This applies particularly to the role of assessment. Moral purpose is heightened when assessment is seen as the key to improving learning, especially for those who are falling behind, or to providing recognition of student achievement.

Positive experiences
People frequently change their behaviours before they change their beliefs. New, positive experiences with next-generation assessment will be a powerful motivator, especially when they relate to fulfilling moral purpose. Moreover, they will differ from individual to individual, depending on their starting point.

Shared vision and ownership
Motivation is further enhanced when there is a shared vision and ownership of change. Successful systems and schools don't simply demand change; they build a shared vision and ownership and engage all stakeholders in its creation and realisation. Next-generation assessment must be willingly embraced by the profession rather than imposed from above.

Learning in context is key
Even the best professional development workshops are only input for success. Actual success occurs in the context of daily learning. The most fundamental feature of next-generation assessment – its use to improve learning and teaching – can only be
Professional learning communities at the school and school-network levels are crucial in establishing purposeful and collaborative learning cultures in which teachers learn from each other and school leaders and teachers collaborate for continuous improvement. For next-generation assessment to become a reality, teachers will need to adopt, over time, a different and more professional role than the one currently demanded by one-size-fits-all instructional approaches. Professional learning communities are the key to bringing about this role transformation. In addition, as Michael Fullan would argue, professional learning in a purposeful and collaborative learning culture can be a powerful way to reduce ineffective teaching and unwanted variation and maximise effective teaching and positive variation.

System support
Schools, their leaders and the professional learning communities within them will not be sustained unless the system actively supports and encourages them and fosters and maintains their development. While some systems are still struggling with the infrastructure issues of interconnectivity and hardware, others are grappling with problems such as identifying open platforms for next-generation learning systems, accessing quality online content, designing new assessments and so on.

Lateral capacity is vital for spreading knowledge and increasing commitment. Lateral capacity-building consists of strategies that enable teachers, schools and school systems to learn from each other. This implies systematic and purposeful networking to connect with those who are on the same journey, but perhaps in a different place on the path.

Leadership is the key to system transformation
Leaders must work with a vision, goals and more proximal objectives and do so with and through the development of other leaders as they go. It also means having leaders with specialist knowledge of the field, such as a full-time chief information officer whose role is to attend to digital needs and the use of technology to improve learning and teaching.

Better value for money
The logistical complexity and costs of most current formal assessment programmes are formidable. Apart from test development, they include:

• printing the tests;
• maintaining the security of printed tests;
• secure distribution and collection of papers;
• labour-intensive marking of scripts;
• data entry and cleaning;
• psychometric work to calibrate test items, equate tests and generate results;
• preparing results for publication and making them available to schools and a wider public along with relevant advice; and
• providing support materials to assist stakeholders in making use of the data.

Once schools and homes have connectivity and the relevant hardware to support online assessment, and once systems invest in more sophisticated test-delivery systems, the burden of a number of these logistical and cost issues can be reduced significantly.

Fullan and Langworthy (2014) provide a compelling argument that, while costs are coming down every day, even at current prices, the costs per student per year can be offset through reprioritisation and savings in other areas. In the case of assessment, there are specific additional upfront costs in developing relevant software, creating quality banks of items and creating new kinds of tests or examinations. However, considerable savings are possible through work with other systems that have already done or are about to do this developmental work and are prepared to share it at little or no cost.

But these costs need to be considered alongside the expected benefits and, in particular, the significantly higher learning outcomes achievable by using online assessment to facilitate formative assessment and generate instructionally valuable feedback. Professor John Hattie's meta-analysis of the research literature (2009) indicates that their effect is sizeable (effect sizes in excess of 0.7 of a standard deviation). In other words, the level of investment in online assessment and in building teacher capacity required to facilitate and realise the benefits of formative assessment and feedback is small relative to the potential pay-off in learning outcomes.

DRAWING TOGETHER THE THREADS

Our argument has been that the push factor of globalisation and the pull factor of the performance ceiling are together giving rise to an educational revolution in which certain long-held beliefs and ways of doing things are repudiated and replaced by a new set of beliefs and practices.

The seeds of each of these key changes can be seen all around us, but full adoption will take some time to achieve. And for the education revolution to happen, we will have to change our views on the following factors:

• A student's capacity to learn and profit from formal education.
• What students need to learn. There has to be a greater emphasis on the deeper understanding of big ideas, the organising principles of disciplines and explicit and systematic attention to twenty-first-century skills.
• The focus of educational policy. We need a shift from focusing on the school to focusing on the individual student.
• The basic organisation of schooling, in particular a repudiation of the age–grade progression model in favour of access and progression more aligned to a student's readiness to learn.
• How students will learn and how teachers will teach, in particular, a shift towards much of learning time spent within an online learning environment, with teachers focused less on providing knowledge and more on assisting students to apply their knowledge,
• assess the full range of student abilities;
• provide more meaningful information on learning outcomes;
• assess the full range of valued outcomes;

Above all, we believe it is vital not to underestimate the significance of what is taking place in this field. We see these changes in thinking on assessment leading to a veritable
REFERENCES
Ausubel, D. P. (1968) Educational Psychology, A Cognitive View, New York: Holt, Rinehart & Winston.
Barber, M. (2014) Consistent Quality Plus Innovation: Not One or the Other, Both,
Education Reform Summit, 10 July. Available at http://blog.pearson.com/wp-content/
uploads/2014/07/20140710-National-Education-Reform-Summit-2.pdf (accessed 12
November 2014).
Barber, M., K. Donnelly and S. Rizvi (2012) Oceans of Innovation: The Atlantic, the Pacific, Global
Leadership and the Future of Education, London: IPPR. Available at http://www.ippr.org/
publication/55/9543/oceans-of-innovation-the-atlantic-the-pacific-global-leadership-and-the-
future-of-education (accessed 12 November 2014).
Barber, M. and M. Fullan (2005) Tri-level Development: It's the System, Education Week, 2 March.
Barber, M., with A. Moffit and P. Kihn (2011) Deliverology 101: A Field Guide for Educational Leaders,
Thousand Oaks, Calif.: Corwin.
Barber, M. and M. Mourshed (2007) How the World's Best-Performing School Systems Come Out on
Top, London: McKinsey & Company. Available at http://mckinseyonsociety.com/how-the-
worlds-best-performing-schools-come-out-on-top (accessed 12 November 2014).
Black, P. and D. Wiliam (1998) Assessment and Classroom Learning, Assessment in Education,
5 (1): 7–74.
Chudowsky, N. and V. Chudowsky (2010) State Test Score Trends through 2008–09, Part 1: Rising
Scores on State Tests and NAEP, Washington, DC: Center on Education Policy. Available at
http://www.cep-dc.org/publications/index.cfm?selectedYear=2010 (accessed 18 November
2014).
Cohen, D. K., S. W. Raudenbush and D. L. Ball (2003) Resources, Instruction and Research,
Educational Evaluation and Policy Analysis, 25 (2): 119–42.
Conley, D. T. and L. Darling-Hammond (2013) Creating Systems of Assessment for Deeper Learning,
Palo Alto, Calif.: Stanford Center for Opportunity Policy in Education.
DiCerbo, K. E. and J. T. Behrens (2014) Impacts of the Digital Ocean on Education, London:
Pearson. Available at https://research.pearson.com/content/plc/prkc/uk/open-ideas/en/articles/
a-tidal-wave-of-data/_jcr_content/par/articledownloadcompo/file.res/3897.Digital_Ocean_
web.pdf (accessed 12 November 2014).
Dikli, S. (2006) An Overview of Automated Essay Scoring, Journal of Technology, Learning and
Assessment, 5 (1). Available at http://ejournals.bc.edu/ojs/index.php/jtla/article/view/1640
(accessed 12 November 2014).
Dweck, C. S. (2006) Mindset: The New Psychology of Success, New York: Random House.
Elmore, R. F. (2004) School Reform from the Inside Out: Policy, Practice and Performance, Cambridge,
Mass.: Harvard Education Press.
European Commission (2013) Commission Staff Working Document: Digital Agenda Scoreboards
2013, available at http://ec.europa.eu/digital-agenda/sites/digital-agenda/files/DAE%20
SCOREBOARD%202013%20-%20SWD%202013%20217%20FINAL.pdf (accessed 15
November 2014).
Feith, D. (2011) Teaching America: The Case for Civic Education, Lanham, Md.: Rowman & Littlefield.
Fullan, M. (2008) The Six Secrets of Change: What the Best Leaders Do to Help Their Organizations
Survive and Thrive, San Francisco, Calif.: Jossey-Bass.
Fullan, M. and K. Donnelly (2013) Alive in the Swamp: Assessing Digital Innovations in Education,
London: Nesta. Available at http://www.nesta.org.uk/sites/default/files/alive_in_the_swamp.pdf
(accessed 12 November 2014).
Fullan, M., P. Hill and C. Crévola (2006) Breakthrough, Thousand Oaks, Calif.: Corwin Press.
Fullan, M. and M. Langworthy (2014) A Rich Seam: How New Pedagogies Find Deep Learning,
London: Pearson. Available at http://www.michaelfullan.ca/wp-content/uploads/2014/01/3897.
Rich_Seam_web.pdf (accessed 12 November 2014).
Global Education Leaders Program (2014) Transforming Global Education with New Metrics:
Statement by the Global Education Leaders Program, available at http://gelponline.org/sites/
default/files/resource-files/gelp_statement_june_2014.pdf (accessed 15 November 2014).
Gordon Commission on the Future of Assessment in Education (2013) A Public Policy Statement,
Princeton, NJ: The Gordon Commission. Available at http://www.gordoncommission.org/rsc/
pdfs/gordon_commission_public_policy_report.pdf (accessed 12 November 2014).
Grossman, P. and M. McDonald (2008) Back to the Future: Directions for Research in Teaching
and Teacher Education, American Educational Research Journal, 45 (1): 184–205.
Hannon, V., A. Patton and J. Temperley (2011) Developing an Innovation Ecosystem for Education,
San Jose, Calif.: Cisco Systems. Available at http://www.cisco.com/web/strategy/docs/education/
ecosystem_for_edu.pdf (accessed 12 November 2014).
Hattie, J. (2009) Visible Learning: A Synthesis of over 800 Meta-Analyses Relating to Achievement,
London and New York: Routledge.
Hattie, J. (2012) Visible Learning for Teachers: Maximizing Impact on Learning, London and New
York: Routledge.
Hattie, J. and H. Timperley (2007) The Power of Feedback, Review of Educational Research, 77
(1): 81–112.
Hill, P. W. (2010) Using Assessment Data to Lead Teaching and Learning, in A. M. Blankstein, P. D.
Houston and R. W. Cole (eds.), Data-Enhanced Leadership: Using What You Know to Be a More
Effective Leader, Thousand Oaks, Calif.: Corwin Press, pp. 31–50.
Hill, P. W. and K. J. Rowe (1996) Multilevel Modeling in School Effectiveness Research, School
Effectiveness and School Improvement, 7 (1): 1–34.
Ho, A. D. (2008) The Problem with Proficiency: Limitations of Statistics and Policy under No
Child Left Behind, Educational Researcher, 37 (6): 351–60.
Hursh, D. (2007) Assessing No Child Left Behind and the Rise of Neoliberal Education,
American Educational Research Journal, 44 (3): 493–518.
Leadbeater, C. (2002) Learning about Personalization, London: Innovation Unit, Department for
Education and Skills.
Levitt, S. D. and S. J. Dubner (2007) Freakonomics: A Rogue Economist Explores the Hidden Side of
Everything, London: Penguin Books.
Lissitz, R. W. and H. Jiao (eds.) (2012) Computers and Their Impact on State Assessments, Charlotte,
NC: Information Age Publishing, Inc.
Lortie, D. C. (1975) Schoolteacher: A Sociological Study, Chicago, Ill.: University of Chicago Press.
Massachusetts Business Alliance for Education (2014) The New Opportunity to Lead: A Vision for
Education in Massachusetts in the Next 20 Years. Available at http://www.mbae.org/wp-content/
uploads/2014/03/New-Opportunity-To-Lead.pdf (accessed 13 November 2014).
Mehta, J. (2013) The Allure of Order: High Hopes and Dashed Expectations and the Troubled Quest
to Remake American Schooling, Oxford: Oxford University Press.
Mislevy, R. (2013) Postmodern Test Theory, in The Gordon Commission, To Assess, To Teach,
To Learn: A Vision for the Future of Assessment: Technical Report. Available at http://www.
gordoncommission.org/rsc/pdfs/gordon_commission_technical_report.pdf (accessed 13
November 2014).
Mislevy, R. J., J. T. Behrens, K. E. DiCerbo and R. Levy (2012) Design and Discovery in Educational
Assessment: Evidence-Centered Design, Psychometrics, and Educational Data Mining, Journal
of Educational Data Mining, 4 (1): Article 2. Available at http://researchnetwork.pearson.com/
wp-content/uploads/mislevyetalvol4issue1p11_48.pdf (accessed 13 November 2014).
Oates, T. (2013) Tiering in GCSE: Which Structure Holds Most Promise? Available at
http://www.cambridgeassessment.org.uk/Images/138921-tiering-in-gcse-which-structure-holds-
most-promise-.pdf (accessed 13 November 2014).
OECD (2008) 21st Century Learning: Research, Innovation and Policy Directions from Recent
OECD Analyses. Available at http://www.oecd.org/dataoecd/39/8/40554299.pdf (accessed 13
November 2014).
OECD (2010) PISA 2009 Results: What Makes a School Successful? Resources, Policies and Practices
(Volume IV). Available at http://dx.doi.org/10.1787/9789264091559-en (accessed 13
November 2014).
OECD (2011) PISA 2009 Results: What Makes a School Successful? Resources, Policies and Practices,
vol. IV. Available at http://dx.doi.org/10.1787/888932343285 (accessed 24 November 2014).
OECD (2013b) PISA 2012 Results: What Students Know and Can Do: Student Performance in
Mathematics, Reading and Science, vol. I, Tables 1.4.3b and 1.2.3b, Annex B1. Available at http://
www.oecd.org/pisa/keyfindings/pisa-2012-results-volume-i.htm (accessed 13 November
2014).
OECD (2013c) PISA 2012 Results: What Makes Schools Successful? Resources, Policies and
Practices, vol. IV. Available at http://www.oecd.org/pisa/keyfindings/pisa-2012-results-volume-iv.
htm (accessed 18 November 2014).
Pellegrino, J. W., M. L. Hilton, Committee on Defining Deeper Learning and 21st Century Skills
(2012) Education for Life and Work: Developing Transferable Knowledge and Skills in the 21st
Century, Washington, DC: National Research Council of the National Academies. Available
at http://www.leg.state.vt.us/WorkGroups/EdOp/Education%20for%20Life%20and%20
Work-%20National%20Academy%20of%20Sciences.pdf (accessed 18 November 2014).
Phillips, T. (2013) Bras Begone: China Clamps Down on Cheating in University Entrance Exams
by Banning Brassieres, The Telegraph, available at http://news.nationalpost.com/2013/06/06/
bras-begone-china-clamps-down-on-cheating-in-university-entrance-exams-by-banning-
brassieres (accessed 15 November 2014).
Pierson, R. (2011) Learning Ecosystem Brief Design Paper: Mass Customized, Personalized Learning.
Available at http://www.innovationunit.org/sites/default/files/Pierson paper.pdf (accessed 13
November 2014).
Polikoff, M. S., A. J. McEachin, S. L. Wrabel and M. Duque (2014) The Waive of the Future?
School Accountability in the Waiver Era, Educational Researcher, 43 (1): 45–54. Available at
http://edr.sagepub.com/content/43/1/45.full.pdf+html?ijkey=LoPEgefArEO0M&keytype=
ref&siteid=spedr (accessed 13 November 2014).
Popham, W. J. (2008) Transformative Assessment, Alexandria, Va.: Association for Supervision and
Curriculum Development.
Sergiovanni, T. J. (2000) Changing Change: Towards a Design Science and Art, Journal of
Educational Change, 1 (1): 57–75.
Soland, J., L. Hamilton and B. M. Stecher (2013) Measuring 21st-Century Skills: Guidance for
Educators, Working Paper, Rand Education. Available at http://www.rand.org/pubs/external-
publications/EP50463.html (accessed 13 November 2014).
Statistics Commission (2005) Measuring Standards in English Primary Schools, London: Statistics
Commission.
Tucker, M. (2013a) Linn and Everson on Testing, Standards and Accountability, available at http://
blogs.edweek.org/edweek/top_performers/2013/09/linn_and_everson_on_testing_standards_
and_accountability.html (accessed 15 November 2014).
Wigdor, A. K. and B. F. Green Jr. (eds.) (1991) Performance Assessment for the Workplace, vol. I.
Committee on the Performance of Military Personnel, Commission on Behavioral and Social
Sciences and Education, National Research Council, Washington, DC: National Academy Press.
Zhang, M. (2013) Contrasting Automated and Human Scoring, R&D Connections, 21 (March).
Available at http://www.ets.org/Media/Research/pdf/RD_Connections_21.pdf (accessed 13
November 2014).
Pearson
80 Strand
London
WC2R 0RL
T +44 (0)20 7010 2000
F +44 (0)20 7010 6060
www.pearson.com
@Pearson #PearsonResearch