
Performance Assessment for Assessing Processes

By: Markus Simeon K. Maubuthy

Definition | Performance Assessment

Definition
Target
Component
+/-
Validity & Reliability
Examples

Thorndike (1971:238)
... one in which some criterion situation is simulated to a much greater degree than is represented by the usual paper-and-pencil test.

James H. McMillan (2007:229)
A performance assessment is one in which the teacher observes and makes a judgment about the student's demonstration of a skill or competency in creating a product, constructing a response, or making a presentation.

Nitko & Brookhart (2007:244)


A performance assessment (a) presents a task requiring
students to do an activity (make something, produce a
report, or demonstrate a process) that requires applying
their knowledge and skill and (b) uses clearly defined
criteria to evaluate how well the student has achieved this
application.

Definition | Performance Assessment

Pedoman MP IPA (Appendix to Permendikbud No. 58 of 2014, p. 465)
Performance (or practical) assessment is carried out by observing students' activities as they do something.

Definition | Performance Assessment

How to Design a Performance Assessment
1. Clarifying performance
- Behavior to be demonstrated or product to be created?
- Individual performance or group performance?
- Performance criteria
2. Developing performance exercises
- The ways to cause students to perform in a manner that will reveal their level of proficiency
- Structured assignment or naturally occurring events
- Defines targets, conditions, and standards
- Number of exercises
3. Scoring and recording results

Definition | Performance Assessment

How to Design a Performance Assessment
3. Scoring and recording results
- Level of detail of results (holistic or analytical)
- Recording procedures:
  Checklist: key attributes of good performance are checked present or absent. Quick and useful with a large number of criteria, but results can lack depth.
  Rating scale: the performance continuum is mapped on a several-point numerical scale ranging from low to high. Can record judgment and rationale with one rating, but can demand extensive development.
  Anecdotal record: student performance is described in detail in writing. Can provide rich portraits of achievement, but is time-consuming.
  Mental record: the assessor stores judgments and/or descriptions of performance in memory. A quick and easy way to record, but difficult to retain accurately.
- Identify the rater

Definition | Performance Assessment

How to Design a Performance Assessment (Ana Ratna Wulan)
1. Determine the indicators
2. Select the assessment focus
3. Select the level of realism
4. Select the observation, recording, and scoring methods
5. Pilot-test the task and rubric
6. Revise the task and rubric for subsequent instruction

Definition | Performance Assessment

When should I choose to focus on process?

1. The steps in a procedure can be specified and have been explicitly taught.
2. The extent to which an individual deviates from accepted procedure can be accurately and objectively measured.
3. Much or all of the evidence needed to evaluate performance is to be found in the way the performance is carried out, and/or little or none of the evidence needed to evaluate performance is present at the end of the performance.
4. An ample number of persons are available to observe, record, and score the procedures used during performance.

Targets | Performance Assessment

Thorndike (1971:249)
Scientific Method of Thinking: ideas logically organized; a well-planned, complete, and concise story; thorough and accurate research; evidence that the student knows the subject thoroughly.
Effective, Dramatic Presentation of Scientific Truths: the idea of the exhibit should be clearly and dramatically conveyed. Explanatory lettering should be large, neat, brief.
Creative Ability: the exhibit should be original in plan and execution. Clever use of salvage materials is important. New applications of scientific ideas are best of all.
Technical Skills: workmanship, craftsmanship, rugged construction, neatness, safety, lettering, etc.

Targets | Performance Assessment

Gronlund (1985:384)
Skills: speaking, writing, listening, oral reading, performing laboratory experiments.
Work habits: effectiveness in planning, use of time, use of equipment, use of resources.
Social attitudes: concern for the welfare of others, respect for laws, respect for the property of others.
Scientific attitudes: open-mindedness, willingness to suspend judgment, sensitivity to cause-effect relations, an inquiring mind.
Interests: expressed feelings toward various educational activities.
Appreciation: feelings of satisfaction and enjoyment expressed toward an instruction.
Adjustment: relationship to peers, reaction to praise and criticism, reaction to authority, emotional stability, social adaptability.

Targets | Performance Assessment

Stiggins (1994:171-174)
Knowledge: use of reference materials to acquire knowledge. Determine whether students have gained control over a body of knowledge through the proper and efficient use of reference materials.
Reasoning: application of that knowledge in a variety of problem-solving contexts.
Skill: proficiency in a range of skill arenas.
Affective: feelings, attitudes, values, and other affective characteristics.

Caution!
The only target for which performance assessment is
not recommended is the assessment of simple
elements or complex components of subject matter
knowledge to be mastered through memorization.

Targets | Performance Assessment

Marzano, et al. (Nitko & Brookhart, 2007:244)
Complex thinking learning target
a. Effectively translates issues and situations into meaningful tasks that have a clear purpose.
b. Effectively uses a variety of complex reasoning strategies:
1. Comparison
2. Classification
3. Induction
4. Deduction
5. Error Analysis
6. Constructing Support
7. Abstracting
8. Analyzing Perspectives
9. Decision Making
10. Investigation
11. Problem Solving
12. Experimental Inquiry

Targets | Performance Assessment

Marzano, et al. (Nitko & Brookhart, 2007:244)
Information-processing learning target: your ability to review and evaluate how valuable each source of information is to the parts of your project.
a. Effectively interprets and synthesizes information.
b. Effectively uses a variety of information-gathering techniques and resources.
c. Accurately assesses the value of information.
d. Recognizes where and how projects would benefit from additional information.

Targets | Performance Assessment

Marzano, et al. (Nitko & Brookhart, 2007:244)
Habits-of-mind learning target: your ability to effectively define your goal in the assignment and to explain your plan for attaining the goal.
1. Is aware of own thinking.
2. Makes effective plans.
3. Is aware of and uses necessary resources.
4. Evaluates the effectiveness of own actions.
5. Is sensitive to feedback.
6. Is accurate and seeks accuracy.
7. Is clear and seeks clarity.
8. Is open-minded.
9. Restrains impulsivity.
10. Takes a position when the situation warrants it.
11. Is sensitive to the feelings and level of knowledge of others.
12. Engages intensively in tasks even when answers or solutions are not immediately apparent.
13. Pushes the limits of own knowledge and ability.
14. Generates, trusts, and maintains own standards of evaluation.
15. Generates new ways of viewing a situation outside the boundaries of standard convention.

Targets | Performance Assessment

Marzano, et al. (Nitko & Brookhart, 2007:244)
Effective communication learning target: your ability to communicate your conclusions and findings.
1. Expresses ideas clearly.
2. Effectively communicates with diverse audiences.
3. Effectively communicates in a variety of ways.
4. Effectively communicates for a variety of purposes.
5. Creates quality products.

Targets | Performance Assessment

Marzano, et al. (Nitko & Brookhart, 2007:244)
Collaboration/Cooperation learning target.
1. Works toward the achievement of group goals.
2. Demonstrates effective interpersonal skills.
3. Contributes to group maintenance.
4. Effectively performs a variety of roles within a group.

Content learning target: your understanding of the subject.

Components | Performance Assessment

Components:
- Performance Task
  - Types: restricted, extended
  - Task description
  - Task question
- Scoring Rubric
  - Performance criteria
  - Rating scale: numerical, qualitative, graphic, descriptive graphic
  - Types: holistic vs. analytic; general vs. specific
Components | Performance Task

Definition
A performance task is an assessment activity that requires a student to demonstrate her/his achievement by producing an extended written or spoken answer, by engaging in group or individual activities, or by creating a specific product. The performance task you administer may be used to assess the product the student produces and/or the process a student uses to complete the product. (Nitko & Brookhart, 2007:244)
A performance task is what students are required to do in the performance assessment, either individually or in groups. (McMillan, 2007:239)

Components | Performance Task

Types of Task
Restricted-type tasks target a narrowly defined skill and require relatively brief responses. The task is structured and specific.
Extended-type tasks are more complex, elaborate, and time-consuming. They often include collaborative work with small groups of students, and the assignment usually requires that students use a variety of sources of information.

Components | Performance Task

Task Description
A task description is used to provide a blueprint or listing of specifications to ensure that essential criteria are met, that the task is reasonable, and that it will elicit the desired student performance.
The task description should include the following:
- Content and skill targets to be assessed
- Description of student activities
- Group or individual
- Help allowed
- Resources needed
- Teacher role
- Administrative process
- Scoring procedures

Components | Performance Task

Task Question or Prompt
The actual question, problem, or prompt that is given to students, based on the task description.
It needs to be stated so that it clearly identifies what the outcome is, outlines what students are allowed and encouraged to do, and explains the criteria that will be used to judge the performance.
It also provides a context that helps students understand the meaningfulness and relevance of the task.

Components | Performance Task

How to Craft a Task (McMillan, 2007:239)
1. Generate or identify an idea for a task.
2. Develop the task and context description.
3. Develop the task question or prompt.

Marzano, et al. (1993), as cited in Nitko & Brookhart (2007:265), describe crafting the task through successive drafts (Draft 1 through Draft 4, then a final draft), checking each draft against the content standards and the complex reasoning, information processing, habits of mind, effective communication, and collaboration/cooperation standards.

Components | Performance Task

Criteria for Performance Task
1. Essential
The task fits into the core of the curriculum. It represents a big idea.
2. Authentic
The task should be authentic. Wiggins (1998), as cited in McMillan (2007:244), suggests standards for judging the degree of authenticity in an assessment task as follows:
1) Realistic. The task replicates the ways in which a person's knowledge and ability are tested in real-world situations.
2) Requires judgment and innovation. The student has to use knowledge and skills wisely and effectively to solve unstructured problems, and the solution involves more than following a set routine or procedure or plugging in knowledge.
3) Asks the student to "do" the subject. The student has to carry out exploration and work within the discipline of the subject area, rather than restating what is already known or what was taught.
4) Replicates or simulates the contexts in which adults are tested in the workplace, in civic life, and in personal life. Contexts involve specific situations that have particular constraints, purposes, and audiences.
5) Assesses the student's ability to efficiently and effectively use a repertoire of knowledge and skills to negotiate a complex task. Students should be required to integrate all knowledge and skills needed.

Components | Performance Task

Criteria for Performance Task
6) Allows appropriate opportunities to rehearse, practice, consult resources, and get feedback on and refine performances and products. Rather than relying on secure tests as an audit of performance, learning should be focused through cycles of performance-feedback-revision-performance, on the production of known high-quality products and standards, and on learning in context.

Standards developed by Fred Newmann (1997), cited in McMillan (2007:245), state that authentic tasks require the following:
a) Construction of meaning (use of reasoning and higher-order thinking skills to produce meaning or knowledge)
1) Organization of information
2) Consideration of alternatives
b) Disciplined inquiry (thinking like experts searching for in-depth understanding)
3) Disciplinary content
4) Disciplinary process
5) Elaborated written communication
c) Value beyond school (aesthetic, utilitarian, or personal value apart from documenting the competence of the learner)
6) Problem connected to the world
7) Audience beyond the school

Components | Performance Task

Criteria for Performance Task
3. Structure the task to assess multiple learning targets.
4. Structure the task so that you can help students succeed.
5. Feasible. Think through what students will do to be sure that the task is feasible:
- It is developmentally appropriate for students. It should be realistic for students to implement the task (consider resources, time, costs, and the opportunity to be successful).
- It is safe.
6. The task should allow for multiple solutions.
7. The task should be clear.
8. Engaging. The task should be challenging and stimulating to students:
- The task is thought provoking.
- It fosters persistence.
9. Include explicitly stated scoring criteria as part of the task.
10. Include constraints for completing the task.

Components | Scoring Rubric

Definition
A coherent set of rules used to assess the quality of a student's performance. The rules guide your judgments and ensure that you apply your judgments consistently. The rules may be in the form of a rating scale or a checklist. (Nitko & Brookhart, 2007:244)
A rubric contains scoring criteria/performance criteria and a rating scale. A rubric, or scoring rubric, is a scoring guide that uses criteria to differentiate between levels of student proficiency. (McMillan, 2007:252)

Components | Scoring Rubric

Performance Criteria
Scoring criteria/criteria/performance criteria are what you look for in student responses to evaluate their progress toward meeting the learning target. (McMillan, 2007)
The specific behaviors a student should display when properly carrying out a performance or creating a product. (Russell & Airasian, 2012:209)

Components | Scoring Rubric

Developing Performance Criteria (Stiggins, 1994:181-186)
Step 1: Reflective brainstorming
Step 2: Categorize the many elements
Step 3: Define each key dimension in clear, simple language
Step 4: Contrasting
Step 5: Describing success
Step 6: Revising and refining

Components | Scoring Rubric

Developing Performance Criteria
1. Select the performance to be assessed and either perform it yourself or imagine yourself performing it.
2. List the important aspects of the performance.
3. Try to limit the number of performance criteria, so they can all be observed during a student's performance.
4. If possible, have groups of teachers think through the important criteria included in a task.
5. Express the performance criteria in terms of observable student behaviors.
6. Do not use ambiguous words that cloud the meaning of the performance criteria.
7. Arrange the performance criteria in the order in which they are likely to be observed.
8. Check for existing performance criteria before defining your own.
(Russell & Airasian, 2012:214-215)

Components | Scoring Rubric

Rating Scale
A rating scale is used to indicate the degree to which a particular dimension is present. It provides a way to record and communicate qualitatively different levels of performance.
Types of rating scales:
Numerical: uses numbers on a continuum to indicate different levels of proficiency in terms of frequency or quality.
  Complete understanding 5 4 3 2 1 No understanding
Qualitative: uses verbal descriptions to indicate different levels.
  Never, Seldom, Occasionally, Frequently, Always
  Excellent, good, fair, poor
(McMillan, 2007:250-252)

Components | Scoring Rubric

Rating Scale
Types of rating scales:
Numerical: a series of numbers indicates the degree to which a characteristic is present. Typically, each number is given a verbal description that remains constant from one characteristic to another.
  Directions: Indicate the degree to which this pupil contributes to class discussion by circling the appropriate number. The numbers represent the following values: 5 outstanding, 4 above average, 3 average, 2 below average, and 1 unsatisfactory.
  1. To what extent does the pupil participate in discussions?  1 2 3 4 5
Graphic: the distinguishing feature of the graphic rating scale is that each characteristic is followed by a horizontal line.

Components | Scoring Rubric

Rating Scale
Types of rating scales:
Descriptive graphic: uses descriptive phrases to identify the points on a graphic scale.
(Gronlund, 1985:391-392)

Components | Scoring Rubric

How to Craft a Rubric
General Steps in Preparing and Using Rubrics
Step 1: Select a performance/process to be assessed.
Step 2: State performance criteria for the process.
Step 3: Decide on the number of scoring levels for the rubric, usually three to five.
Step 4: State the description of performance criteria at the highest level of student performance.
Step 5: State the description of performance criteria at the remaining scoring levels.
Step 6: Compare each student's performance with each scoring level.
Step 7: Select the scoring level closest to a student's actual performance.
Step 8: Grade the student.
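The steps above end with an analytic rubric you can apply criterion by criterion. A minimal sketch of the resulting structure, using a hypothetical oral-presentation rubric with two criteria and four scoring levels (all names and level descriptions are invented for illustration):

```python
# Hypothetical analytic rubric: criterion -> {level: description}, 4 = highest.
RUBRIC = {
    "organization": {
        4: "Ideas logically ordered with clear transitions throughout",
        3: "Ideas mostly ordered; some transitions missing",
        2: "Order hard to follow; few transitions",
        1: "No discernible organization",
    },
    "delivery": {
        4: "Speaks clearly, paces well, engages the audience",
        3: "Speaks clearly with minor pacing problems",
        2: "Often unclear or poorly paced",
        1: "Inaudible or monotone throughout",
    },
}

def analytic_score(ratings):
    """Total the level chosen for each criterion (analytic scoring)."""
    for criterion, level in ratings.items():
        if criterion not in RUBRIC or level not in RUBRIC[criterion]:
            raise ValueError(f"invalid rating: {criterion}={level}")
    return sum(ratings.values())

print(analytic_score({"organization": 3, "delivery": 4}))  # 7
```

Keeping one description per level per criterion mirrors Steps 4-5; the rater's job in Steps 6-7 is to pick the closest description, not to compute anything.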

Components | Scoring Rubric

How to Craft a Rubric
Top-Down Approach
The top-down approach begins with a conceptual framework that you can use to evaluate students' performance and develop scoring rubrics. Follow these steps:
1. Adapt or create a conceptual framework of achievement dimensions that describes the content and performance that you should assess.
2. Develop a detailed outline that arranges the content and performance from step 1 in a way that identifies what you should include in the general rubric.
3. Craft a general scoring rubric that conforms to this detailed outline and focuses on the important aspects of content and process to be assessed across different tasks. It can be used as is to score student work, or it can be used to craft specific rubrics.
4. Craft a specific scoring rubric for the specific performance task you are going to use.
5. Use the specific scoring rubric to assess the performances of several students; use this experience to revise the rubric as necessary.

Components | Scoring Rubric

How to Craft a Rubric
Bottom-Up Approach
With the bottom-up approach you begin with samples of students' work, using actual responses to create your own framework. Use examples of different quality levels to help you identify the dimensions along which students can be assessed. Follow these steps:
1. Obtain about 10 to 12 students' actual responses to a performance item. Be sure the responses you select illustrate various levels of quality of the general achievement you are assessing.
2. Read the responses and sort all of them into three groups: high-quality, medium-quality, and low-quality responses.
3. After sorting, carefully study each student's responses within the groups, and write very specific reasons why you put those responses into a particular group.
4. Look at your comments across all categories and identify the emerging dimensions.
5. Separately for each of the three quality levels of each achievement dimension you identified in step 4, write a specific student-centered description of what the responses at that level are typically like.

Components | Scoring Rubric

How to Craft a Rubric
Suggestions for developing rubrics
1. Be sure the criteria focus on important aspects of the performance.
2. Match the type of rating with the purpose of the assessment. If your purpose is more global and you need an overall judgment, a holistic scale should be used. If the major reason for the assessment is to provide feedback about different aspects of the performance, an analytical approach would be best.
3. The descriptions of the criteria should be directly observable.
4. The criteria should be written so that students, parents, and others understand them. Recall that the criteria should be shared with students so they can incorporate the descriptions as standards in doing their work.
5. The characteristics and traits used in the scale should be clearly and specifically defined. You need sufficient detail in your descriptions so that the criteria are not vague. If only a few general terms are used, observed behaviors are open to different interpretations. The wording needs to be clear and unambiguous.
6. Take appropriate steps to minimize scoring error.
7. The scoring system needs to be feasible.

Components | Scoring Rubric

Types of Scoring Rubrics
Analytic Scoring Rubric
- Each criterion is evaluated separately
- Gives diagnostic information to the teacher
- Gives formative feedback to students
- Easier to link to instruction than holistic rubrics
- Good for formative assessment; adaptable for summative assessment
- Takes more time to score than holistic rubrics
- Takes more time to achieve inter-rater reliability than holistic rubrics
Holistic Scoring Rubric
- All criteria are evaluated simultaneously
- Scoring is faster than with analytic rubrics
- Requires less time to achieve inter-rater reliability
- Good for summative assessment
- A single overall score does not communicate information about what to do to improve
- Not good for formative assessment

Components | Scoring Rubric

Types of Scoring Rubrics
Generic Scoring Rubric
- Description of work gives characteristics that apply to a whole family of tasks
- Can be shared with students, explicitly linking assessment and instruction
- Reuse the same rubric with several tasks or assignments
- Supports learning by helping students see good work as bigger than one task
- Supports student self-evaluation
- Students can help construct generic rubrics
- Lower reliability at first than with task-specific rubrics
- Requires practice to apply well
Task-Specific Rubric
- Description of work refers to the specific content of a particular task
- Teachers sometimes say using these makes scoring easier
- Requires less time to achieve inter-rater reliability
- Cannot be shared with students
- Need to write new rubrics for each task
- For open-ended tasks, good answers not listed in the rubric may be evaluated poorly

Components | Scoring Rubric

How to Craft a Checklist
1. List and describe clearly each specific subperformance or step in the procedure you want the student to follow.
2. Add to the list specific errors that students commonly make (avoid unwieldy lists, however).
3. Order the correct steps and the errors in the approximate sequence in which they should occur.
4. Make sure you include a way either to check the steps as the student performs them or to number the sequence in which the student performs them.
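The four steps above can be sketched as a small data structure. This is a minimal illustration using a hypothetical procedure (focusing a microscope); the step wordings and common errors are invented, not taken from any cited source:

```python
# (behavior description, True = correct step, False = common error),
# listed in the approximate sequence in which they tend to occur (steps 1-3).
CHECKLIST = [
    ("Places slide on the stage", True),
    ("Turns on light before placing slide", False),   # common error
    ("Selects low-power objective first", True),
    ("Starts with high-power objective", False),      # common error
    ("Uses coarse focus, then fine focus", True),
]

def record_observation(observed_behaviors):
    """Number each checklist behavior in the order it was observed (step 4);
    behaviors never observed are left unmarked (None)."""
    marks = {}
    for order, behavior in enumerate(observed_behaviors, start=1):
        marks[behavior] = order
    return [(desc, correct, marks.get(desc)) for desc, correct in CHECKLIST]

result = record_observation([
    "Places slide on the stage",
    "Starts with high-power objective",
    "Uses coarse focus, then fine focus",
])
for desc, correct, order in result:
    tag = "step" if correct else "error"
    print(f"{order or '-'}: [{tag}] {desc}")
```

Recording the observation sequence (rather than only presence/absence) preserves whether the student performed the correct steps in the intended order.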

Components | Scoring Rubric

Common Errors in Rating
- Leniency error occurs when a rater tends to make almost all ratings toward the high end of the scale, avoiding the low end.
- Severity error is the opposite of leniency error.
- Central tendency error occurs when a rater hesitates to use extremes and uses only the middle part of the scale.
- Halo effect occurs when a rater's general impression of a person influences the rating of individual characteristics.
- Personal bias occurs when a rater tends to rate based on inappropriate or irrelevant stereotypes favoring boys over girls, whites over blacks, etc.
- A logical error occurs when a rater gives similar ratings on two or more dimensions of performance that the rater believes are logically related but that are in fact unrelated.
- Rater drift occurs when raters, whose ratings originally agreed, begin to redefine the rubrics for themselves.
- Reliability decay is a related error: immediately after training, raters apply the rubrics consistently across students and mark consistently with one another. As time passes, however, the ratings become less consistent.

Strengths | Performance Assessment

- Integrates assessment with instruction.
- Learning occurs during assessment.
- Provides agreement between teachers and students about assessment criteria and the given tasks.
- Emphasizes having pupils demonstrate a process that can be directly observed (provides an additional way for students to show what they know and can do).
- Performance tasks require the integration of knowledge, reasoning, skills, and abilities.
- Emphasizes the application of knowledge (real-world situations).
- Performance tasks clarify the meaning of complex learning targets.
- Tends to be more authentic than other types of assessment.
- More engaging; active involvement of students.
- Provides opportunities for formative assessment.
- Performance tasks let teachers assess the processes students use as well as the products they produce.
- Forces the teacher to establish specific criteria to identify successful performance.
- Encourages student self-assessment.

Weaknesses | Performance Assessment

- High-quality performance tasks and scoring rubrics are difficult to craft.
- Requires considerable teacher time to prepare and student time to complete.
- Scores from performance tasks may have lower scorer reliability (reliability may be difficult to establish).
- Students' performance on one task provides little information about their performance on other tasks.
- Limited ability to generalize to a larger domain of knowledge.
- Performance tasks do not assess all learning targets well.
- Completing performance tasks may be discouraging to less able students.
- Performance assessments may underrepresent the learning of some cultural groups.
- Performance assessments may be corruptible (measurement error due to the subjective nature of scoring may be significant).

Validity | Performance Assessment

Principles of a good measuring instrument
All good measuring instruments have certain primary qualities:
a. Validity
b. Reliability
c. Objectivity
d. Ease of administering
e. Ease of scoring
f. Ease of interpreting
g. Adequate norms
h. Equivalent forms
i. Economy
(Noll, et al., 1979:90)

Validity | Performance Assessment

Validity means the degree to which a test is relevant to its purpose. In the case of a performance test, validity is the degree of correspondence between performance on the test and the ability to perform the criterion activity. (Thorndike, 1971:240)
Validity is the effectiveness of a test for the purposes for which it is used. (Noll, et al., 1979)

Validity | Performance Assessment

Sources of Information for Validity
Content-related evidence: the extent to which the assessment is representative of the domain of interest; for example, when a teacher gives a test that appropriately measures the content and behavior that are the objectives of instruction.
Criterion-related evidence: the relationship between an assessment and another measure of the same trait. Generally based on agreement between the scores on a test and some outside measure, called the criterion.
Construct-related evidence: concerned with how well differences in test scores conform to predictions about characteristics that are based on an underlying theory or construct. Judgments of construct validity are in fact most often based on a combination of logical analyses and an accumulation of empirical studies.

Validity | Performance Assessment

To ensure valid performance assessment, students should be instructed on the desired performance criteria before being assessed.
To improve the validity of performance assessment:
a. Set performance criteria at an appropriate difficulty level for students.
b. Limit the number of performance criteria.
c. Maintain a written record of student performance, and check to determine whether extraneous factors influenced a student's performance.
(Russell & Airasian, 2012)
1) Be sure that what you require students to do in your performance activity matches the learning targets and that your scoring rubrics evaluate those same learning targets.
2) Be sure the performance tasks you craft require students to use curriculum-specified thinking processes.
3) Be sure to use many different types of assessment procedures to sample the breadth of your state's standards and your local curriculum's learning targets.
(Nitko & Brookhart, 2007:245)

Reliability | Performance Assessment

Definition
- Reliability refers to the consistency of measurement.
- Reliability is concerned with the consistency, stability, and dependability of scores.
- Reliability is the degree to which students' results remain consistent over replications of an assessment procedure.
- A reliable measure is one that provides a consistent and stable indication of the characteristic being investigated.

Reliability | Performance Assessment

Estimating Reliability
The reliability of ratings is an important criterion for evaluating performance assessments.
Estimating reliability over time:
- Test-retest
- Alternate forms on different occasions
Estimating reliability on a single occasion:
- Alternate forms
- Coefficient alpha
- Split-halves coefficient
Estimating scorer reliability:
- Correlation of two scorers' results
- Percentage agreement
- Kappa coefficient
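Two of the scorer-reliability estimates listed above (percentage agreement and the kappa coefficient) can be computed directly from two raters' scores. A minimal sketch, using hypothetical rubric-level ratings for eight students:

```python
from collections import Counter

def percent_agreement(rater_a, rater_b):
    """Proportion of students who received the same rating from both raters."""
    matches = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return matches / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Agreement corrected for the agreement expected by chance."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    # chance agreement: product of each category's marginal proportions
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

a = [3, 2, 3, 1, 2, 3, 2, 1]   # rater A's rubric levels (hypothetical)
b = [3, 2, 2, 1, 2, 3, 1, 1]   # rater B's rubric levels (hypothetical)
print(percent_agreement(a, b))          # 0.75
print(round(cohens_kappa(a, b), 3))     # 0.628
```

Kappa is lower than raw agreement because some of the matches would be expected from the raters' marginal rating frequencies alone.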

Reliability | Performance Assessment

Estimating Reliability
- Product-moment correlation
- Spearman-Brown
- Flanagan
- Rulon
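The first two methods listed above combine naturally in a split-half estimate: correlate the two half-test scores with the product-moment formula, then step the half-test correlation up with the Spearman-Brown correction. A sketch using hypothetical item scores (Flanagan's and Rulon's alternative split-half formulas are not shown):

```python
def pearson_r(x, y):
    """Product-moment correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman_brown(half_r):
    """Estimate full-test reliability from the half-test correlation."""
    return 2 * half_r / (1 + half_r)

# each row = one student's scores on 6 items (hypothetical data)
scores = [
    [4, 3, 4, 4, 3, 4],
    [2, 2, 1, 2, 2, 1],
    [3, 3, 3, 2, 3, 3],
    [1, 2, 1, 1, 1, 2],
    [4, 4, 3, 4, 4, 4],
]
odd  = [sum(row[0::2]) for row in scores]   # items 1, 3, 5
even = [sum(row[1::2]) for row in scores]   # items 2, 4, 6
r_half = pearson_r(odd, even)
print(round(spearman_brown(r_half), 3))
```

The correction is needed because the half-test correlation underestimates the reliability of the full-length test.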

Reliability | Performance Assessment

Estimating Reliability
- KR-20
- KR-21
- Alpha coefficient
- Kappa coefficient
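KR-20 and coefficient (Cronbach's) alpha, both listed above, can be sketched in a few lines. KR-20 applies to dichotomous (right/wrong) items; alpha has the same form with item variances in place of p*q, so on 0/1 data the two agree. The item matrix below is hypothetical:

```python
def variance(xs):
    """Population variance of a list of scores."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def kr20(item_matrix):
    """Kuder-Richardson 20: rows = students, columns = 0/1 item scores."""
    k = len(item_matrix[0])
    totals = [sum(row) for row in item_matrix]
    pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in item_matrix) / len(item_matrix)
        pq += p * (1 - p)
    return (k / (k - 1)) * (1 - pq / variance(totals))

def cronbach_alpha(item_matrix):
    """Coefficient alpha: sum of item variances replaces the p*q term."""
    k = len(item_matrix[0])
    totals = [sum(row) for row in item_matrix]
    item_var = sum(variance([row[j] for row in item_matrix]) for j in range(k))
    return (k / (k - 1)) * (1 - item_var / variance(totals))

data = [   # hypothetical: 6 students x 4 dichotomous items
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]
print(round(kr20(data), 3))             # 0.833
print(round(cronbach_alpha(data), 3))   # 0.833 (equal for 0/1 items)
```

KR-21 is a simplification of KR-20 that assumes all items are of equal difficulty, so it needs only the number of items, the mean, and the variance of the total scores.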

Reliability | Performance Assessment

How to improve the reliability of ratings
- Organize the achievement dimensions within a scoring rubric into logical groups that match the content and process framework of the curriculum.
- For each achievement dimension, use behavioral descriptors to define each level of performance.
- Provide specimens or examples of students' work to help define each level of an achievement dimension.
- Have several teachers work together to develop a scoring rubric or rating scale.
- Have several teachers review and critique the draft of a scoring rubric or rating scale.
- Provide training and supervised practice for all persons who will use the scoring rubric or rating scale.
- Have more than one rater rate each student's performance on the task.
- Monitor raters by periodically sampling their ratings, checking on the accuracy and consistency with which they are applying the scoring rubrics and rating scales. Retrain those persons whose ratings are inaccurate or inconsistent.

References
Arikunto, S. (2012). Dasar-dasar evaluasi pendidikan (2nd ed.). Jakarta: Bumi Aksara.
Gronlund, N. E. (1985). Measurement and evaluation in teaching. New York: Macmillan Publishing Company.
McMillan, J. H. (2007). Classroom assessment: Principles and practice for effective standards-based instruction (4th ed.). Boston: Pearson Education, Inc.
Miller, M. D., Linn, R. L., & Gronlund, N. E. (2009). Measurement and assessment in teaching. New Jersey: Pearson Education, Inc.
Nitko, A. J., & Brookhart, S. M. (2007). Educational assessment of students (5th ed.). New Jersey: Pearson Education, Inc.
Noll, V. H., Scannell, D. P., & Craig, R. C. (1979). Introduction to educational measurement. Boston: Houghton Mifflin Company.
Russell, M. K., & Airasian, P. W. (2012). Classroom assessment (7th ed.). New York: McGraw-Hill.
Thorndike, R. L. (1971). Educational measurement (2nd ed.). Washington:
Wulan, A. R. Penilaian kinerja dan portofolio pada pembelajaran Biologi (Handout Penilaian Kinerja dan Portofolio). Bandung: FMIPA UPI.