Compulsory Readings:
1. To obtain some hints about contextualizing assessments:
a) Three wishes for PISA for Development
In the story of Aladdin, the hero finds a magic lamp, which, when rubbed, releases a genie
who grants him his every wish. Not surprisingly, this soon gets him into trouble. Recent
announcements by the OECD suggest that the genie is out of the lamp for international
assessments of educational achievement, and like Aladdin, we should choose our wishes
wisely.
Barbara Ischinger and Andreas Schleicher of the OECD recently came to the World Bank to
discuss the OECD's new initiative called PISA for Development. Most of us are already
familiar with the OECD's PISA exercise, which assesses the reading, mathematics, and
science competencies of 15-year-olds around the world. The aim of PISA for Development is
to identify how PISA can best support evidence-based policy making in emerging and
developing economies that, until now, have been unable or unwilling to participate in the
main PISA survey. This will be done through the following adaptations to the PISA design:
The expected outcome is a set of enhanced PISA instruments that are more relevant for the
contexts found in emerging and developing economies, but which produce scores that are on
the same scale as the main PISA assessment.
Many of these adaptations to PISA have been on the wish list of those working with emerging
and developing economies for some time, and have the potential, if handled well, to be
game changers in terms of the policy utility of PISA for these economies. This leads to my
three wishes for PISA for Development.
Wish #1 - Test questions that 15-year-olds in emerging and developing economies can actually answer
The success of PISA for Development rests on the extent to which the OECD can come up
with a test that is capable of capturing the range of achievement among 15-year-olds in
participating economies. However, based on what I've read, the OECD is planning to draw
primarily on existing test questions from the PISA pool. This could be problematic because
we already know from experience that emerging economies that have participated in the
regular PISA survey sometimes end up with so many zero scores on the achievement tests
that they are unable to do much with the data. Ensuring sufficient questions that tap into
lower-than-usual levels of the achievement scale will allow economies participating in PISA
for Development to better harness the policy value of the enhanced background
questionnaires as well as to better understand the achievement levels of their out-of-school
populations.
Wish #2 - A test that emerging and developing economies can afford
Several emerging and developing economies, from Latin America to Africa to Asia, are on
the list of potential participants in the PISA for Development pilot. The unit cost of their
participation, around US$800K-US$900K, is higher than that of a regular PISA exercise
because the pilot will include considerable developmental work on instruments and a
methodology for including out-of-school children. While a variety of funding sources will
cover the costs for this round, looking ahead, it will be important to come up with a model
that is financially feasible for emerging and developing economies to handle on their own, or
with limited external help.
Wish #3 - A test that contributes to learning for all
A PISA for Development test that is affordable, and that yields detailed information on
achievement levels at the lower end of the scale, would be a valuable contribution to the post-2015 development agenda, which almost certainly will include a global learning goal. In
order for the test to actually contribute to the attainment of such a goal, however, it will need
to incorporate into its design a laser-like focus on how the results can most effectively be
used to support and improve learning in the participating economies. At the end of the day, a
test is just a test. The real power, the magic, comes from what we do with the results.
The lamp takes Aladdin on a series of adventures, which, after some twists and turns, lead to
a happy ending. Let's choose our own wishes for PISA for Development carefully to ensure
we do the same.
As we work in an international organization, we often get a question such as: "What is the
right assessment for our country: an international assessment, a regional assessment, or
our own national assessment?" This is probably due to the tremendous growth in the
number of assessments.
Since the International Association for the Evaluation of Educational Achievement (IEA)
implemented the first international mathematics study in the late 1950s, many large-scale
assessments have appeared in the field, and many more countries have been participating
in these assessment studies. For example, when the OECD first implemented the Programme
for International Student Assessment (PISA) in 2000, 28 countries
participated, while in 2012 this number increased to some 65. At the same time,
a number of regional assessments have collected assessment data since the
1990s. These include the Southern and Eastern Africa Consortium for Monitoring Educational Quality
(SACMEQ), the Programme d'Analyse des Systèmes Éducatifs des Pays de la CONFEMEN
(PASEC), and the Laboratorio Latinoamericano de Evaluación de la Calidad de la Educación
(LLECE). Many countries have also established national assessments; it has been
reported that most countries have conducted at least one national assessment since
2000, the year of the Dakar EFA conference.
There is no straight answer to the question, but certain issues must be considered in order
to decide which assessment might be most appropriate for your country.
1. Subject matter: What subjects are considered important in your country? They could be
basic cognitive skills such as mathematics, reading, and science. How about life skills, such
as health knowledge, global citizenship, sustainable development knowledge,
entrepreneurship? Most of the international and regional assessments deal with the basic
cognitive skills. This was the case for the initial SACMEQ study; however, during the
SACMEQ Assembly of Ministers in 2005, Ministers wished to have information on the level
of HIV and AIDS knowledge of the Grade 6 pupils. A new initiative, the South East Asia Primary
Learning Metrics (SEA-PLM), organized by UNICEF and the Southeast Asian
Ministers of Education Organization (SEAMEO), is planning to measure global citizenship
based on regionally common values.
2. Test framework: Do you want to see if the students have acquired the knowledge taught
at school, or see if the schools prepared students to be ready for the world of work? Most of
the assessments, such as IEA's TIMSS and PIRLS, as well as regional assessments such as
SACMEQ, PASEC, and LLECE, are based on the school curriculum. For example, in the
1990s when Zimbabwe launched its first national assessment, the concern was to see if the
intended school curriculum was actually implemented by the teachers and achieved by the
pupils, and therefore, this became the model for SACMEQ. On the other hand, PISA and
PISA for Development are exceptions, where the test framework is rather forward looking
and assesses the students' competencies to be demonstrated in a totally novel situation.
3. Target population: Which grade level or age level needs to be assessed? Are you
concerned about the early grade pupils who need to be diagnosed? Then what language
should be used for the tests? If the local mother tongue is used as the language of instruction,
then standardizing such a variety of test forms could be a big challenge. On the other hand,
you may be challenged if you administer the tests in a single language at this early level.
This was the reason why SACMEQ countries opted for Grade 6. While PASEC also tests
at Grade 6, it also tests Grade 2, and therefore careful test construction and test operation
are required. IEA's TIMSS is administered at Grade 4 as well as Grade 8, to see the
progress between these two grades. Even if you are concerned about the comprehension of
school subjects at the end of the cycle, the question may not be about which grade level, but
which age level. The grade-based target population is the norm when the test framework is
based on the school curriculum, prescribed for different grade levels. However, PISA and
PISA for Development use the age-level target population, 15 years of age. The challenge for
many developing countries using the age-based target population is the large proportion of
the students who may belong to different grade levels or different cycles due to repetitions or
late entries to schools, or who have already dropped out of the system.
4. What to do with the assessment data: What is the purpose of measuring the learning
achievement? If it is for a certification of a cycle or selection into a higher cycle, then public
examinations would be sufficient. If the countries would like to connect the achievement data
with other background information such as teacher characteristics, teacher performance,
school resources, home background, etc., then background questionnaires are required. You
may also be interested in evaluating the competency of teachers. In this case, a teacher test
must be considered, as was the case for SACMEQ studies since 2000. This allowed
SACMEQ policy makers to realize that not only the teachers' pedagogical skills but also their
subject knowledge was critical for learning improvement. Does the international and/or
regional assessment that you are about to join include the kind of information that is a
priority for the Ministry?
5. Research capacity needs: The Ministries of Education will have to ensure that they have
the technical capacity to carry out the assessment. This includes the sampling of schools and
students, test construction and item analyses, data processing, data analyses, and report
writing. The advantage of belonging to regional or international assessments would be that
the same methodology is shared and a common capacity building is usually provided by the
technical teams for the participating members. In certain cases, participating member
countries could collaborate and help each other, as experienced by some SACMEQ countries.
6. Ownership: Capacity building should lead to the countries' ownership of the assessment
study. This includes ownership of the data, instruments, and all the methodologies. This
would allow countries to do more autonomous secondary data analyses using the data
collected, after the international and/or regional reporting, so that the assessment results and
policy recommendations could be also linked back to the original policy concerns. Both
SACMEQ and PASEC use this strategy where country teams are responsible for producing
more focused analyses.
The Ministry may also be interested in joining all of the existing international and regional
assessment networks, as well as conducting its own national assessment. But we need to
remember that to improve quality of education, it's not the level of participation in assessment
studies that matters; it's the way assessment results are used. After all, the assessment is
merely one small part within the policy reform.
c) http://unesdoc.unesco.org/images/0021/002193/219349E.pdf
(The PDF will be found in the folder)
Individual assessments
Individual assessments for pupils can be formative and give feedback to pupils and teachers
on their skills and progression, or they can be summative, in the form of final grades or
examination results.
In classrooms, teachers may design formative or summative tests to evaluate whether
students are following the curriculum. Formative tests are diagnostic in nature: teachers want
to know if learning is taking place and, if it is not, to provide appropriate interventions.
Formative tests are also important for feedback to pupils and parents about the pupils'
progression. Tests can also be summative, conducted at the end of the unit, term, or year, to
determine whether students have acquired the required knowledge and skills. Teacher-designed tests are generally used as an assessment tool within a classroom or grade. They do
not compare student learning across schools.
Public examinations have different aims from class-based tests. The results are typically used to
certify that individual students have attained a certain level in their studies. An examination
can also be used to assess whether or not schools are implementing the curriculum and
whether teachers are delivering appropriate instruction. They may also be used to select
students for further education. When the student's educational or professional future is
dependent on their performance in an examination, it is referred to as a high-stakes
assessment.
These aspects are typically defined within a national assessment framework. Following are
some further considerations regarding each issue:
A national assessment program can serve multiple purposes, and the main purpose should
determine the design of the assessment. It is therefore very important to be clear from the
beginning about the main purpose. The use of one single test for several purposes might be
inappropriate as the information ideally required in each case is not the same. Therefore,
education authorities are advised to rank the different purposes in order of priority and adjust
test designs accordingly. (See Standards, Accountability, and Student Assessment Systems,
Canadian Education Association (CEA))
There are three general purposes for most national assessments. The first group consists of
tests which summarize the achievement of individual pupils at the end of a school year or at
the end of a particular educational stage, and which have a significant impact on their
educational careers. These are high stakes tests, which are often referred to as summative.
Second are assessments intended to monitor and evaluate schools and/or the education system
as a whole. In this case, test results are used as indicators of the quality of teaching and the
performance of teachers, but also to gauge the overall effectiveness of education policies and
practices. A third category is composed of assessments that are mainly for the purpose of
assisting the learning process of individual pupils by identifying their specific learning needs
and adapting teaching accordingly.
Competencies to be tested
The assessment domains can either be based on particular subjects in the curriculum, or can
test key competencies for learning across subjects, such as numeracy, literacy, problem
solving, or information and communication skills. The assessment of key competencies will
be most relevant for formative assessment programs designed to monitor education systems
and/or identify individual learning needs. All national assessments measure cognitive skills in
the areas of language/literacy and mathematics/numeracy, a reflection of the importance of
those outcomes for basic education. In some countries, knowledge of other areas, such as
science, social studies, particular languages or other domains, is also included in an
assessment.
Whatever the domain of the assessment, it is important to develop an appropriate framework
that clearly defines the competencies and skills to be tested, and a test specification. This is
necessary both for constructing assessment instruments and afterward for interpreting results.
Examinations and tests to monitor schools are often compulsory for all pupils, while tests that
concentrate on evaluation of the educational system as a whole are often administered to a
representative sample. If a test is sample based, it is necessary to consider how the results are
to be reported when the sample is defined. If results are to be broken down by regions, school
types, gender, language of instruction etc., one has to make sure that the sample is
representative at all those levels.
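
The point about reporting levels can be sketched in code. The example below is only an illustration, assuming a hypothetical sampling frame of 100 schools in three regions and an equal number of sampled schools per region; the function name, field names, and numbers are invented for the sketch. Real designs are usually more elaborate (for example, sampling with probability proportional to school size and applying sampling weights).

```python
import random
from collections import Counter, defaultdict


def stratified_school_sample(schools, stratum_key, per_stratum, seed=0):
    """Draw `per_stratum` schools at random from every stratum."""
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    by_stratum = defaultdict(list)
    for school in schools:
        by_stratum[school[stratum_key]].append(school)

    sample = []
    for stratum in sorted(by_stratum):
        members = by_stratum[stratum]
        if len(members) < per_stratum:
            raise ValueError(f"stratum {stratum!r} has only {len(members)} schools")
        # Sampling inside each stratum guarantees every reporting level is covered.
        sample.extend(rng.sample(members, per_stratum))
    return sample


# Hypothetical sampling frame: 100 schools spread unevenly over three regions.
regions = ["North"] * 50 + ["South"] * 30 + ["East"] * 20
schools = [{"id": i, "region": region} for i, region in enumerate(regions)]

sample = stratified_school_sample(schools, "region", per_stratum=5)
per_region = Counter(school["region"] for school in sample)
print(per_region)
```

Because the draw is made separately within each region, even the smallest region contributes enough schools to be reported on its own, which a simple random draw from the whole frame would not guarantee.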
Test design
To ensure the validity of the test, it must consist of test items representing the whole range of
the test domain described in the framework. The test must also contain enough items for each
proficiency level. Items can be either multiple-choice or open ended, or a combination of
both. Open-ended questions, however, require a very strict scoring manual and training of scorers.
In many countries there is now a rapid movement from paper-based towards
computer-based testing. This opens up the possibility of adaptive testing, where the test is
automatically adjusted to the student's proficiency level. This method allows for more precise
measurement of the whole competency and more targeted testing.
A rotated test design (matrix sampling) is often used for sample based tests to monitor a
whole education system. In a rotated design, the test is made up of a set of booklets or
blocks, each representing only a part of the whole test. Each student only answers one
booklet, which can contain different blocks of material. This enables testing of a large set of
items without making the test too long for each student. However, with this method it is not
possible to deliver individual results for students.
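
A rotated design can be illustrated with a small sketch. It assumes four hypothetical item blocks (A to D) and a simple cyclic layout in which each booklet holds two neighbouring blocks; the function names and the layout are assumptions made for illustration, not the design of any particular assessment.

```python
from collections import Counter


def build_booklets(blocks):
    """Pair each block with the next one, wrapping around at the end."""
    n = len(blocks)
    return [[blocks[i], blocks[(i + 1) % n]] for i in range(n)]


def assign_booklets(student_ids, n_booklets):
    """Hand out booklets in rotation so each one is used about equally often."""
    return {sid: position % n_booklets for position, sid in enumerate(student_ids)}


blocks = ["A", "B", "C", "D"]          # hypothetical blocks of test items
booklets = build_booklets(blocks)      # four booklets of two blocks each
assignment = assign_booklets(range(1, 9), len(booklets))

# Every block appears in two different booklets, which links the booklets together.
block_usage = Counter(block for booklet in booklets for block in booklet)
print(booklets)
print(block_usage)
```

Each student sees only half of the item pool, yet all four blocks are covered across the sample, and the shared blocks allow results from different booklets to be placed on a common scale.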
All test items for any of these types of tests must be piloted and analysed using psychometric
methods before they are used in the final test, in order to make sure that the test meets all
requirements for validity and reliability.
Measuring trends
To measure development of learning achievement over time, the test must contain a set of
anchoring items, which are repeated every cycle. The anchoring items will be used to make
sure the reported proficiency levels represent the same level of difficulty over time; in other
words, that the numerical results always represent the same level of competency. Anchoring
items must be kept confidential to ensure the same test conditions over time. Only by using a
test design of this type can trends be monitored.
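
One simple way to see how anchoring items keep the scale stable is a mean-shift linking, sketched below. This is an assumption-laden illustration rather than the procedure of any specific assessment: it presumes item difficulties have already been estimated separately in each cycle on a Rasch-type scale, and all item names and values are hypothetical.

```python
from statistics import mean


def linking_constant(old_difficulty, new_difficulty, anchor_ids):
    """Shift that makes the anchors' mean difficulty equal in both cycles."""
    old_mean = mean([old_difficulty[i] for i in anchor_ids])
    new_mean = mean([new_difficulty[i] for i in anchor_ids])
    return old_mean - new_mean


def to_old_scale(values, constant):
    """Express new-cycle values on the scale of the earlier cycle."""
    return {key: value + constant for key, value in values.items()}


# Hypothetical difficulty estimates; a1-a3 are the anchoring items.
old_cycle = {"a1": -0.5, "a2": 0.0, "a3": 0.5, "x1": 1.2}
new_cycle = {"a1": -0.2, "a2": 0.3, "a3": 0.8, "n1": 1.0}
anchors = ["a1", "a2", "a3"]

shift = linking_constant(old_cycle, new_cycle, anchors)
linked = to_old_scale(new_cycle, shift)
print(shift, linked)
```

In this invented example the anchors look about 0.3 units harder in the new cycle, so every new-cycle value is shifted down by that amount before results are compared with the earlier cycle. If the anchoring items had leaked, their difficulties would drift and the shift would no longer reflect a pure change of scale.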
Test expertise
Development of national tests requires a high level of expertise: both curricular and content-specific
expertise and strong psychometric competence. An important consideration is how to ensure
relevant expertise during the whole process. In some countries there are national institutes or
test centres that contribute the necessary expertise, but often this is not the case. There are,
however, a number of national and international test institutes that will be able to support
countries and provide important capacity building.
Useful Links:
e1. http://www.uis.unesco.org/Education/Documents/assessing_national_achievement_level_Edu.pdf
e2. http://eacea.ec.europa.eu/education/eurydice/documents/thematic_reports/109en.pdf
e3. http://www.ets.org/k12?WT.ac=clkf
e4. http://www.cito.com/
e5. http://www.washington.edu/assessment/testing-center/
e6. http://www.nap.edu.au/naplan/naplan.html
e7. https://www.nfer.ac.uk/shadomx/apps/fms/fmsdownload.cfm?file_uuid=67EAAF91C29E-AD4D-07F1-A9373EA17105&siteName=nfer
e8. http://nmssa.otago.ac.nz/index.htm
e9. https://www.gov.uk/government/organisations/standards-and-testing-agency
Enabling context refers to the broader context in which the assessment activity takes
place and the extent to which that context is conducive to, or supportive of, the
assessment. It covers such issues as the legislative or policy framework for
assessment activities, institutional and organizational structures, the availability of
sufficient and stable sources of funding, and the presence of trained assessment staff.
System alignment refers to the extent to which the assessment is aligned with the rest
of the education system. This includes the degree of congruence between assessment
activities and system learning goals, standards, curriculum, and pre- and in-service
teacher training.
Crossing the quality drivers with the different assessment types/purposes provides the
framework and broad indicator areas shown in Table 1. This framework is a starting point for
identifying indicators that can be used to review assessment systems and plan for their
improvement.
Table 1: Framework for building an effective assessment system, with indicator area
http://www.oecd.org/general/thecasefor21st-centurylearning.htm
Anyone wondering why knowledge and skills are important to the future of our economies
should consider two facts.
First, jobs: employment rates are higher among people with more education than among
those with less. This has continued to be the case during the crisis. Also, in those OECD
countries where college education has expanded most over recent decades, earnings
differentials for college graduates have continued to rise compared with school leavers, for
instance. Their pay did not decrease, unlike that of low-skilled workers. So from a jobs
perspective, it pays to study.
This is a good, concrete argument for skilling up. But the case for 21st century learning
goes deeper than this and is more abstract. It is about how knowledge is generated and
applied, about shifts in ways of doing business, of managing the workplace or linking
producers and consumers, and becoming quite a different student from the kind that
dominated the 20th century. What we learn, the way we learn it, and how we are taught are
changing. This has implications for schools and higher level education, as well as for
lifelong learning.
For most of the last century, the widespread belief among policymakers was that you had to
get the basics right in education before you could turn to broader skills. It's as though
schools needed to be boring and dominated by rote learning before deeper, more
invigorating learning could flourish.
Those who hold on to this view should not be surprised if students lose interest or drop out
of schools because they cannot relate what is going on in school to their real lives.
If you were running a supermarket instead of a school and saw that 30 out of 100
customers each day left your shop without buying anything, you would think about
changing your inventory. But that does not happen easily in schools because of deeply
rooted, even if scientifically unsupported, beliefs that learning can only occur in a
particular way.
In 2010, the world is now more indifferent to tradition and past reputations of educational
establishments. It is unforgiving to frailty and ignorant of custom or practice.
We live in a fast-changing world, and producing more of the same knowledge and skills
b) UIS and Brookings. 2013. Toward universal learning: A global framework for measuring
learning. Montreal and Washington DC: UIS and Brookings. Page 8 Table 1 illustrates
suggested learning domains to be acquired (and measured). (The PDF version is available in
the folder)