Assessment in the Applied Studio

Bill Clemmons, Mary Jo Clemmons, and Christine Phillips

Point Loma Nazarene University

Nothing is more central or more basic to a music department than playing and

singing music. Although no one would wish to diminish the critical importance of

music theory, music history, music education, entrepreneurship and other vital

components of a healthy and vibrant music department, it is unlikely that students will

get very far in their discipline unless they are actively engaged with making music. Yet,

as basic as these activities are to a healthy music program, it is often difficult to

accurately measure and assess students' progress in the applied studio and the applied

jury, the standard tool used to assess private instruction. What exactly does the private

lesson jury measure? Does it measure whether a student has met the expectations of the

syllabus in the manner of every other class in the University, met the subjective

approval of the instructor, or does it measure something larger such as “professional

quality tone,” or “mature interpretation?” Each of these would resist objective

measurement, often generating disagreement even among experts, frequently hinging

on matters of taste and preference. Or, consider the difficulty in assessing varying

levels of experience. Can a first-semester student who is just beginning instruction on

her instrument earn the same grade as another first-year student who is performing a

more difficult concerto movement? Questions like this highlight the need for a

deliberate assessment process.

Statement of the Problem

This article presents one approach to tackling some of these issues through a

process known as calibrating or benchmarking. In the calibration exercise developed at

our institution, private-lesson instructors were asked to agree on grading standards

while watching and grading video-recorded juries. We felt that this process would

create meaningful dialogue among our performance faculty, develop common standards

across applied areas, help to develop a shared vocabulary about measuring standards,

and generate useful assessment data that would help our programs.

A Review of the Literature

Higher education has traditionally struggled with finding an objective

assessment to measure juried student performances, even though performance is an

essential outcome of music education. Assessment practices in higher education have

remained clouded by professor bias on one end and, on the other, by uncertainty about

what is being measured: student achievement, student progress, or both.

How do we remove these confounding issues and move toward objectivity and

precision? The question remains whether there exists a calibration process that can

remove professor bias and accurately measure student performance achievement while

at the same time providing a measure of individual student progress.

Performance Assessment

Music performance is considered a foundational content standard for the

National Association for Music Education. The National Association of Schools of

Music (1999) suggested that competence on at least one major performing medium

should be expected for students pursuing any baccalaureate degree in music. Since the

dominant method for scoring music performance in higher education is assessment by

juries (Lebler, Carey & Scott, 2014), the challenge is to create a method that reliably

assesses both the performance and student achievement. The potential for

subjectivity confounds the reliability of any performance assessment (Ciorba & Smith,

2009) and must be taken into account.

Fiske (1983) noted the need to establish reliable evaluation criteria, and Bergee (2003)

observed twenty years later that the topic was still under-investigated. Additionally,

Asmus (2003) established that performance scoring must be reliable to be useful to both

student and teacher in improving instructional strategies. Wesolowski (2012) also

emphasized the importance of clearly defining

the criteria to both judges and students, while Lebler, Carey and Scott added the

dimension of aligning program objectives to improve the educational experience (2014).

There has been an increase in applying assessment criteria in the area of fine arts,

and through the use of rubrics, those assessing musical performance are moving closer

to more equitable and useful results. Parkes (2010) and Mintz (2015)

supported the importance of criteria assessment as a critical link in the teaching and

learning process throughout a variety of music settings. According to Asmus (1999),

rubrics provide specific advantages when used to assess music performances and the

author contended that the key elements of a music performance rubric are the

descriptors for what a performance is like within the full range of proficiency levels.

Gordon (2002) supported the claim that the more descriptors included for each

dimension, the more reliable the rubric will become, as long as that number does not

exceed five.

Bergee tested the reliability and validity of specific "criteria rating scales," or

rubrics (2003). His findings supported the concept that the criteria helped applied music

faculty grade more consistently in the jury setting, especially when they had access to a

specific tool rather than just commenting on a broader impression (Bergee, 2003).

He found that when faculty used a rating scale, the feedback provided to performers

was more accurate and balanced. Ciorba and Smith also suggested that strong evidence

of student achievement can be provided by "combining several performance

dimensions into one standardized multidimensional assessment rubric" (2009, p. 14).

Rubric Creation

Rubrics have proven effective across content areas for the purpose of assessment

in higher education to measure student progress and establish clear learning

expectations for students (Ciorba & Smith, 2009). A rubric can define the difference

between a formative progress assessment and a summative performance assessment.

These two aspects often stand in conflict with one another and need to be made visible

so that the jury process can be more fair and useful to students (Fiske, 1983). It is also

important that rubric criteria are understandable to faculty. A study by Fiske (1983)

showed reliability was increased when department-wide criterion scales were

developed and shared amongst faculty. A study by Ciorba and Smith (2009) showed a

high inter-judge reliability when a faculty panel developed and shared a common

rubric across all performance areas. This was especially true when the rubric

development process was spread out over a period of at least a semester.

Professor bias must also be minimized for student scores to be reliable. Chase,

Ferguson, and Hoey (2014) encouraged the use of highly detailed rubrics with

underlying definitions to aid precision and achieve a higher degree of consistency

among raters. As reported by Wesolowski, Wind, and Engelhard, a combination of

rater training, the development of exemplars, and a clear benchmarking system proved

optimal for performance assessment reliability (2016).

Assessment Models

Music research within the last twenty years demonstrates increased reliability

when using rubrics (Asmus, 1999; Bergee, 2003; Ciorba & Smith, 2009; Wesolowski,

2012; Fuller, 2014; Mintz, 2015). Researchers agree that consistent, clear criteria

embedded within the rubrics provide the highest rate of reliability in performance

assessment. The reliability of rubrics can be verified in multiple media using both

individual performance assessment and group performance assessment.

Vocal Progress Score Metric (BYU)

The exemplary assessment process used for voice students at Brigham Young

University (BYU) is described in detail by Clayne Robison in his book, Beautiful Singing

(2001). Robison described the systematic way the voice faculty track student progress

from one semester to another, from one year to another, from one teacher to another. It

is based around the "Voice Progress Score Chart" (VPS) - a simple 1-5 rubric with the

additional option to add a "+" or "-" to the score. At every audition, every jury, and

every solo performance, students are given a VPS that is tallied each semester and

plotted into a graph (see Figure 1).

Figure 1—The Voice Progress Score Chart from Beautiful Singing (2001) by Clayne Robison

Rating  Explanation

5  In a fully professional setting (e.g., a leading role in a regional professional company), this performance would have received favorable press reviews and a significant 'bravo' response from the audience (None of us has yet given a 5 in an audition).

4  In a featured university setting (e.g., a leading role in a major opera, oratorio, or music theatre production with orchestra), this performance would have been completely successful. I would enjoy hearing this student sing for an hour-long senior recital.

3  In a modest university public performance setting (e.g., a secondary role in an opera, oratorio or musical theater production with orchestra), this performance would have been successful. I would enjoy hearing this student sing for half an hour in a junior recital.

2  In a university classroom performance setting (e.g., in an opera scenes class or a short recital with piano) this performance would have been satisfactory. This student's technique is sufficiently solid to permit concentration on character projection. I would remain comfortable during a 15-minute recital.

1  Preliminary vocal technical work is still needed before attempting any significant public singing. This student however shows promise as a voice major at BYU.

0  Not yet ready to be considered as a voice major.

Robison cites multiple positive outcomes at BYU from the implementation of

the VPS system. One of them was helping to alleviate bias and potential conflicts of

interest since the voice faculty began attending opera auditions and giving each student

a Vocal Progress Score. Robison then based his casting on an overall score instead of his

own personal choice (2001).

"The best assessment methods are those that enable us to connect the dots,

recognize patterns of achievement or non-achievement across our students, and make

adjustments to facilitate their learning" (Chase, Ferguson, & Hoey, 2014, p. 70). The

VPS allows faculty to "connect the dots" from semester to semester, assessing students

on a professional scale that is not based on their progress from the previous semester,

but based on a professional standard, separate from the jury rubric criteria.

Kansas State High School

For the purpose of testing the reliability of using a rubric to score choral music

festival assessment, Latimer, Bergee and Cohen (2010) evaluated a large-group festival

given by the Kansas State High School Activities Association. They used metrics to

score responses to four questions of reliability: 1) Did the rubric result in internally

consistent ratings? 2) How reliable were adjudicators in assigning performance

dimension scores, global scores, and ratings when using the rubric? 3) What is the

correlation between performance dimension scores, global scores, and ratings? 4) What

was the rubric's perceived level of pedagogical utility, as reported by adjudicators and

directors, and what changes did they recommend? Their conclusions demonstrated

good internal consistency when compared to other, non-rubric forms (Latimer, Bergee,

& Cohen, 2010).

The Assessment Activity and our Goals

As with many music programs, our music department at XXX University is

actively engaged in collecting assessment data to generate useful feedback while

measuring progress toward our goals. We have collected data for years in academic

areas within the music field (e.g. history, theory, composition, etc.), but struggled to

find a way to generate meaningful assessment data about applied studies. For this

reason, we structured an assessment day focused on applied instruction. Our four

assessment-day goals were to 1) promote meaningful discussion about performance

standards, 2) find common ground across applied areas about measuring these

standards, 3) develop a standardized vocabulary, expressed in our shared rubrics, that

we can employ across applied areas, and 4) generate useful assessment data that would

inform departmental decisions. To accomplish these goals we structured our

assessment activity around videos of student juries held a few days before. Our faculty

watched the videos, trying to justify the scoring of a previous jury. The entire day was

structured with an eye towards examining standards, evaluating the jury process, and

creating conversation across performing areas.

Pre-Workshop Planning

Our assessment activity was relatively compact, lasting four hours on a single

day. Advance planning was needed to maintain efficiency and ensure the success of

the day’s activities. The department agreed to video record the juries in a standardized

manner, develop a set of scoring rubrics for each applied area, store these two

documents (the video and the marked rubric) for the duration of the student’s tenure in

the department, and make both of these documents available for quick retrieval by the

members of each group on the assessment day.

We wanted a standardized recording solution to avoid the possibility that recordings of

unequal quality would bias the outcome. We selected some small, inexpensive video

cameras that could record in HD and stereo sound onto large SD cards. The cameras

were extremely simple to operate and the setup and use of the cameras was quick,

efficient and non-threatening. The only responsibility at the end of the jury session was

for one instructor to return the camera to the music office where an assistant would

remove the cards and upload the videos to a shared drive.

Of critical importance to the entire process was ensuring that all videos and

rubrics went into secure storage that guaranteed the student’s privacy and allowed

sharing between the teacher and student. The storage also needed the capacity to securely

store vast amounts of data for multiple semesters. Our University recently moved to an

online storage solution that provided us with almost unlimited storage capacity. The

caveat with online storage is that video files in HD are quite large and require time to

upload to cloud storage over HTTP. We surmounted this obstacle by setting up a queue

and letting the files upload at night. The following morning, the departmental assistant

emailed links to the jury video and the scored and scanned rubrics to the students so

that they could review and discuss them with their applied instructors.

In order to eliminate dead time, we selected inexpensive 8 GB flash drives and

copied onto them all of the videos and scanned jury rubrics that the group would need.

The videos and rubrics were put on flash drives ahead of time by the department chair,

who also made sure that every group had a variety of juries to assess: some that

demonstrated very high-level performance, some at a middle level, and also a few that

demonstrated entry-level performance. Each faculty group met separately in a

classroom with a video projector and a flash drive, and watched and scored at the same


Grading Rubrics

A crucial component of the assessment process was getting the music faculty to

start grading the applied lesson jury from a rubric instead of simply providing

qualitative comments and a grade. This shift in process and thinking required both the

consent of and participation by the faculty, and a willingness to change habits and

process in favor of a new approach that creates shared accountability. We started the

shift towards grading rubrics a year in advance of actually implementing their use.

Faculty members in each performance area were responsible for developing a basic

outline of what they envisioned, what they valued in applied instruction, and

what they wanted the student to demonstrate at the applied jury. We did not attempt

to standardize or drill down the language of the rubrics at this early stage but allowed

each area to freely explore format, language and scoring. What was important was

participation across the department, thinking about the rubrics, and engaging with

language used in measuring performance standards. By the time we ran our

assessment day and calibration exercise, all of the faculty had scored applied lesson

juries twice using paper and pencil copies of the jury rubrics they had developed, and

were familiar enough with the process that they were ready to look at it with a critical

eye towards improvement.

Structuring the Assessment Day

We began our assessment workshop after a light breakfast to minimize or

eliminate stress and to ensure that everyone was comfortable and ready to work. We

broke the faculty participants into groups of three to four in such a way that no one

graded the same jury that they had scored a few days earlier at the end-of-semester

juries. We also ensured that no two faculty members from the same musical

discipline were in a group.
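
The grouping rules above (three to four members, no two members from the same discipline, and no one re-scoring a jury they had already sat on) can be checked mechanically. The following Python sketch is an illustration only and was not part of our workshop; the function name `check_group`, the argument names, and the sample faculty and video identifiers are all hypothetical.

```python
def check_group(members: list[str],
                discipline: dict[str, str],
                original_jurors: dict[str, set[str]],
                videos: list[str]) -> list[str]:
    """Return a list of rule violations for one calibration group (empty if none).

    members         -- faculty assigned to the group (hypothetical names)
    discipline      -- maps each faculty member to an applied area
    original_jurors -- maps each jury video to the faculty who scored it live
    videos          -- jury videos assigned to this group
    """
    problems = []

    # Rule 1: groups of three to four.
    if not 3 <= len(members) <= 4:
        problems.append(f"group has {len(members)} members; expected 3 or 4")

    # Rule 2: no two members from the same musical discipline.
    areas = [discipline[m] for m in members]
    if len(set(areas)) != len(areas):
        problems.append("two members share a discipline")

    # Rule 3: no member re-scores a jury they scored a few days earlier.
    for video in videos:
        repeat = set(members) & original_jurors.get(video, set())
        if repeat:
            problems.append(f"{video} was already scored by {sorted(repeat)}")

    return problems
```

An empty list means the proposed group satisfies all three rules; any returned strings describe what would need to be rearranged.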

After watching the video, the group scored the performance against the area

rubric while discussing the scoring as a group. Once the group scored the performance,

they were then asked to look at the score they assigned to the performance and compare

it to the actual jury score, discussing how the scores differed. This step was important to

justify their score, or benchmark the manner in which the jury scored the performance

in order to predict and replicate the scores of subsequent performances.

The next step in the process was to score as many videos as possible without

discussion, in a manner similar to that of a jury where every member scores

individually. We limited the scoring activity to an hour and a half in order to alleviate

fatigue. The process took about ten minutes per performance, so in the time allotted for

this activity, the groups were able to score seven to nine videos. The groups were next

instructed to open the original scored rubrics created a few days earlier in the actual

jury. The members of the group then compared their scores to each other as well as to

the original jury scores, keeping note of discrepancies. They were instructed to note if

they were able to accurately predict and replicate the jury grades.
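
The comparison step described above amounts to tabulating, video by video, how far a rater's calibration score fell from the original jury score. A minimal Python sketch of one way to tabulate this follows. It is illustrative only: the function name `compare_to_jury`, the video identifiers, and the sample totals are hypothetical and are not drawn from our data.

```python
from statistics import mean


def compare_to_jury(calibration: dict[str, int], jury: dict[str, int]) -> dict:
    """Compare one rater's calibration scores with the original jury scores.

    Both arguments map a (hypothetical) video identifier to a rubric total.
    Only videos present in both mappings are compared.
    """
    shared = sorted(set(calibration) & set(jury))
    if not shared:
        raise ValueError("no videos in common to compare")

    # Positive difference: the calibration rater scored higher than the jury.
    diffs = {video: calibration[video] - jury[video] for video in shared}

    return {
        "differences": diffs,
        "mean_abs_diff": mean(abs(d) for d in diffs.values()),
        "exact_matches": sum(1 for d in diffs.values() if d == 0),
    }


# Hypothetical example: three videos scored by one calibration rater.
summary = compare_to_jury(
    {"video_a": 34, "video_b": 28, "video_c": 20},
    {"video_a": 36, "video_b": 28, "video_c": 21},
)
```

The sign of each difference indicates whether the rater's standards ran higher or lower than the jury's, which is the discrepancy the groups were asked to note.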

The final component of our assessment day was a set of post-assessment

discussion questions (see Figure 2). There were nine questions that were to be filled out

by each group as a whole and submitted by one person in the group. The groups'

responses then formed the basis for the post-workshop debrief held over lunch. We

purposefully arranged the setting of our lunch to promote dialogue, and moved

through the questionnaire group by group as we ate. In fact, everything throughout the

day was purposefully designed to communicate to the faculty that their time was

valued and their voices were important.

Figure 2—The post assessment day discussion questions.

1. How would you rate the overall quality of the juries that you observed today? Do our
students seem to be doing good work?
2. If you were going to offer one piece of advice that would improve the jury process (not
necessarily the rubric) as you either observed it or experienced it, what would it be?
3. Of the juries that you were assigned, how many were you actually able to assess? How
many juries are actually practical in the time that we have available?
4. When you looked at the jury sheets/grades, were you able to consistently match the
jury’s assessment? Were your standards higher or lower? Was the scoring consistent?
5. Most of us also listen to/serve on juries during finals week. How did the rubric that you
looked at today differ from the one that you use in your own area?
6. What did you see on the rubric that you used today that you thought was a good idea
and would consider adopting for your own area?
7. If you were going to offer one improvement for the rubric that you used today what would
it be?
8. Did the rubric that you used today actually assess all of the areas that you felt needed to
be assessed? Are there missing areas? Are there redundancies, unnecessary items or
9. If you were going to improve the assessment process that we did today, what would you
improve? What worked, what did not? How could we improve what we did today?

Closing the Assessment Loop and Follow Up

Our Assessment Day de-brief created a vast amount of discussion, generating

dozens of ideas for improving our work and process. In fact, there were far more ideas

generated than could be reasonably developed and implemented over the course of the

following year. However, an important part of the assessment cycle, and one of our

stated goals, was generating useful data about the program, and both finding areas for

improvement and implementing the improvements in a reasonable timeframe. To this

end we identified five areas of focus that we felt we could

accomplish over the course of the following year:

1. Standardize the look and feel of the rubrics

2. Standardize how the rubrics score

3. Implement an online version of the rubrics

4. Create a separate, non-graded rubric that measures the student's performance

level against professional-level standards

5. Change the culture of the jury to function more like a performance and less like
an exam

Generally, our faculty felt that our students were doing good work and were

pleased with the performances that they observed on the videos. However, our

assessment day workshop was the first time that our faculty had examined the jury

process itself and the mechanics of grading. Faculty also had strong feelings about how

standards differed across disciplines. To this end we agreed to work on our rubrics and

devoted discussion time over the next year towards improving them.

We selected two rubrics whose format we especially appreciated (one originally

created for academic papers and a second from winds and brass) and combined them

into a new, standardized rubric (see Appendix). We liked the larger format of a legal-

size rubric with wide columns and lots of room for comments. From the winds/brass

rubric we liked the scoring columns and totals at the ends of rows, arranged so that

the jury grade was calculated from the rubric instead of assigned separately. Our new,

standardized rubric combined these two features and provided generic language that

described what we were hearing instead of simply offering qualitative statements such

as "excellent, good or poor."

The look and feel of the rubric provided a starting point for designing an online

interface that implements the same behavior as the paper copy. That is, by selecting a

box in the rubric, the instructor generates a numerical score. The columns in the

rubric are labeled "Exemplary," "Proficient," "Developing," and "Initial" and have below

them a range of 3-4 numbers, so that a score must be selected within the rubric box.

Unfortunately, our faculty could not agree on a single way to score the rubrics. A

portion of our faculty wanted their area rubrics to score to 100, a move that runs

counter to best-practices research, which encourages scoring to be as simple as possible

and to contain no more than five categories. In spite of a great deal of discussion devoted

to scoring, we still wound up with several different scoring methods, and continue to

discuss and work towards a department-wide solution. The standardized rubric that

we include in the appendix offers one approach along with one scoring solution (see

Figure 3) that is a snapshot of where we are now. Our work in this area is ongoing.

Figure 3—One scoring solution for the new, standardized rubric.

              Description           Grade  Points  Range  Spread

Highest (40)  All 4                 A+     40-39   2
              Mostly 4 and some 3   A      38-36   3      7
              Split 4 and 3         A-     35-34   2
              Mostly 3 and some 4   B+     33-32   2
Middle (30)   All 3                 B      31-28   4      9
              Mostly 3 and some 2   B-     27-25   3
              Mostly 2 and some 3   C+     24-22   3
Middle (20)   All 2                 C      21-18   4      10
              Mostly 2 and some 1   C-     17-15   3
              Mostly 1 and some 2   D+     14-12   3
Middle (10)   All 1                 D      11-8    4      10
              Mostly 1 and some 0   D-     7-5     3
              Mostly or all 0       F      4-0     5      5
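
The scoring solution in Figure 3 can be read as a lookup from a rubric total to a letter grade. The following Python sketch shows one possible way to express that lookup. It is an illustration, not our department's implementation: it assumes ten criteria scored from 0 to 4 each (inferred from the 40-point maximum), and the names `GRADE_BANDS`, `total_to_grade`, and `grade_from_scores` are hypothetical.

```python
# Grade bands transcribed from Figure 3 as (lowest total in band, letter grade),
# ordered from the highest band to the lowest.
GRADE_BANDS = [
    (39, "A+"), (36, "A"), (34, "A-"),
    (32, "B+"), (28, "B"), (25, "B-"),
    (22, "C+"), (18, "C"), (15, "C-"),
    (12, "D+"), (8, "D"), (5, "D-"),
    (0, "F"),
]


def total_to_grade(total: int) -> str:
    """Return the letter grade for a rubric total between 0 and 40."""
    if not 0 <= total <= 40:
        raise ValueError(f"total must be between 0 and 40, got {total}")
    for floor, grade in GRADE_BANDS:
        if total >= floor:
            return grade
    raise AssertionError("unreachable: the F band starts at 0")


def grade_from_scores(scores: list[int]) -> str:
    """Sum per-criterion scores (each 0-4) and look up the letter grade.

    Assumes, hypothetically, that the rubric has ten criteria; the number of
    scores is not enforced here, only the range of each score and of the total.
    """
    if any(not 0 <= s <= 4 for s in scores):
        raise ValueError("each criterion score must be between 0 and 4")
    return total_to_grade(sum(scores))
```

Under these assumptions, ten scores of 3 total 30 and fall in the B band, and an even split of 4s and 3s totals 35 and falls in the A- band, matching the "All 3" and "Split 4 and 3" rows of the figure.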

The final change to our rubrics was separating the jury grade from an assessment

of the student's performance level against professional performance standards. We

imagined a scale that ranged from "0", representing an absolute entry-level student who

should not be performing publicly, to a "13", representing a young professional

embarking on a career as a performer. The intent was to avoid having the jury grade

convey an unrealistic career expectation. Thus, an inexperienced performer, such as a

first-semester freshman, who works hard and meets all of the expectations of her

teacher and her teacher's syllabus, might still earn an "A" for the semester but would

place low on the Applied Performance Assessment Rubric.

We adapted our new, separate rubric (see Figure 4) from the model developed by

Clayne Robison for the vocalists at Brigham Young University that was given as Figure

1. Dr. Robison's model (2001) was specifically tailored for vocalists who are planning

on a professional singing career, but we felt that we could adapt it for our own needs.

This rubric is only scored after the jury rubric has been graded. In this way, we try to

ensure that the jury and semester grade are based on the syllabus and the studio

teacher's stated requirements.

Figure 4—Our new, Applied Performance Assessment Rubric, based on a model by Clayne Robison (2001)

0  Preliminary technical work is still needed before attempting any significant public performance. Not convinced that this student should pursue performance.

1  Preliminary technical work is still needed before attempting any significant public performance. However, this student shows potential as a performer.

2  Preliminary technical work is still needed before attempting any significant public performance. However, this student should consider performance.

3  In a university classroom setting (e.g., a studio class or short, non-public recital with piano) this performance would have been almost satisfactory.

4  In a university classroom performance setting (e.g., a studio class or short, non-public recital with piano), this performance would have been satisfactory. I would enjoy a 15-minute recital with this performer.

5  In a university classroom performance setting (e.g., a studio class or short, non-public recital with piano) this performance would have been very satisfactory. I would enjoy a 20-minute recital.

6  In a modest university public performance setting (a Monday-afternoon recital, a public, shared recital program) this performance would have been mostly successful. I would enjoy a 25-minute recital.

7  In a modest university public performance setting (a Monday-afternoon recital, a public, shared recital program) this performance would have been successful. I would enjoy a 30-minute recital.

8  In a modest university public performance setting (a Monday-afternoon recital, a public, shared recital program) this performance would have been completely successful. I would enjoy a 40-minute recital.

9  In a featured university setting (opera, oratorio, full recital, solo with the orchestra) this performance would have been mostly successful. I would enjoy a 50-minute recital with this performer.

10  In a featured university setting (opera, oratorio, full recital, solo with the orchestra) this performance would have been successful. I would enjoy a one-hour recital with this performer.

11  In a featured university setting (opera, oratorio, full recital, solo with the orchestra) this performance would have been completely successful. I would enjoy a one-and-a-half-hour recital with this performer.

12  In an apprenticeship program or as an emerging artist in opera, oratorio, musical theatre, or orchestral program, this performance would be successful.

13  In a professional opera, oratorio, recital, or concert this performance would be completely successful.

The final observation offered by our faculty was to replace the paper copy and

roll it into a technology-based solution. A first attempt used Google Docs, specifically a

Google Form that tallies the group's responses into a Google Sheet. This solution

presented all of the functionality of the paper rubric but in a different format that

looked more like a questionnaire than a scoring rubric. The advantages of a

Google Docs solution were obvious: it is simple, free, and collaborative, and it

can be set up relatively quickly, requiring a minimal amount of behind-the-scenes

tweaking. The disadvantages of a Google Docs solution are the minimal error checking and

security features, the different look and feel (it no longer looks like a rubric), and the

amount of manual data entry required. Our long-term goal is to move our rubrics into

our CMS (such as Canvas, Blackboard, or Moodle), a project that has already


Music Survey Results

In April 2016, we sent out a short survey to poll opinions about the calibration

process and the common rubric that was developed for music juries (see Figure 5). The

survey was sent out electronically to all music faculty and adjuncts, 23 people in all. Of

those 23, six were full-time faculty and 17 were adjuncts, two of whom are classified as

“part-time.” Eleven took the survey, giving a 52% response rate.

Figure 5—The post-assessment day survey that was completed by the faculty.

In May 2015, the music professors spent several hours working to calibrate jury
expectations and results for the purpose of equalizing and norming the jury process. They
also worked to develop a common rubric to be used in all juries by all instruments. In
December, the common rubric and new process were used for the first time. Please answer
the following questions about your experience and opinion about that process and the new
common rubric.
1. BRANCH to 6--Were you involved in the calibration process in May 2015?
Yes No

2. I think the calibration process helped create a shared vocabulary across applied areas.
Strongly disagree Disagree Undecided Agree Strongly agree

3. I think it was valuable to calibrate

a. the common rubric.
b. performance standards across applied areas.
c. the jury process across applied areas.
d. with someone outside my applied area.

4. I think the calibration process clarified performance standards across applied areas.
Strongly disagree Disagree Undecided Agree Strongly agree

5. Since participating in the calibration process, I feel more ownership of

a. the jury process.
Strongly disagree Disagree Undecided Agree Strongly agree

b. the common rubric

Strongly disagree Disagree Undecided Agree Strongly agree

6. I think that having a common rubric …

a. changed the dynamics of the jury between faculty members.
b. clarifies performance standards.
c. makes my jury feedback to students more valid and reliable.
d. makes the jury process more equalized across all applied areas.
e. makes me better able to express to students how they will be evaluated.
Strongly disagree Disagree Undecided Agree Strongly agree

Please give specific feedback in as much detail as you like.

7. What changes, if any, have resulted from the calibration process and common rubric?

8. What changes, if any, resulted from the new rubric in the way you prepare students
for juries?

9. Any other comments?

The survey used a branch flow to allow respondents to answer only those

questions that pertained to them. All respondents answered a series of eight questions

addressing their experience and opinions about using a common jury rubric, with

additional space to write comments. Since only the full-time faculty and two part-time

faculty participated in the calibration exercise, the branch flow in the survey directed

those respondents to a series of seven additional questions about the calibration

process. Of the eight full-time and part-time faculty who participated in calibration,

seven responded, giving us an 88% response rate on the calibration questions.
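
The response rates reported here are simple ratios of respondents to the people invited. As a minimal sketch, they can be recomputed from the counts given in the text; the function name `response_rate` is our own label for illustration and is not part of any survey tool used in the study.

```python
def response_rate(responded: int, invited: int) -> float:
    """Return the response rate as a percentage of those invited."""
    if invited <= 0:
        raise ValueError("invited must be a positive number")
    return 100.0 * responded / invited


# Counts reported in the text: 11 of 23 for the full survey,
# and 7 of 8 for the calibration branch.
overall = response_rate(11, 23)
calibration = response_rate(7, 8)

print(f"Full survey: {overall:.1f}%")
print(f"Calibration branch: {calibration:.1f}%")
```

With groups this small, a single additional respondent moves the rate by several percentage points, so the rates are best read alongside the raw counts.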

The responses to the survey questions about the calibration process suggest that

faculty appreciated this process for a number of reasons. Respondents indicated that

they valued having colleagues from different applied areas participating together in the

calibration process. They felt that calibrating across applied areas helped to create a

shared vocabulary, standardize the jury process, and clarify performance standards,

meeting three of our four goals (see Figure 6).

Figure 6—Responses to the post-assessment day survey, questions 1-4.

Statement                                        Rated “Agree” or      N
                                                 “Strongly Agree”
I think the calibration process helped create         100%             7
a shared vocabulary across applied areas.
I think it was valuable to calibrate:
    the common rubric.                                100%             7
    performance standards across applied areas.       100%             7
    with someone outside my applied area.             100%             7
I think it was valuable to calibrate the jury         86%              7
process across applied areas.
I think the calibration process clarified             86%              7
performance standards across applied areas.

When asked what changes faculty experienced as a result of the calibration

process and common rubric, respondents’ comments were positive overall, such as “We

made our rubric more clear”; “Better rubric, better discussion and shared vocabulary.

We can now talk about how we want the jury to work and whether or not we are

successfully meeting our goals”; and the new rubric “can give students a strong

understanding of where they are compared to the rest of the world.”

In spite of these positive comments, the survey results indicated that participants

did not feel a strong sense of ownership of either the jury process or the common rubric

even after they had gone through the calibration process. This is not surprising

considering that faculty had used the new rubric for juries only once.

Figure 7—Responses to the post-assessment survey, questions 5-8.

Statement                                          Rated “Agree” or      N
                                                   “Strongly Agree”
Since participating in the calibration process,         29%              7
I feel more ownership of the jury process.
Since participating in the calibration process,         57%              7
I feel more ownership of the common rubric.

The next series of questions was answered by all respondents, some of whom went

through the calibration process and some of whom did not. These questions focused on

the common rubric and indicated that the majority of respondents found that the common

rubric improved the jury process and performance standards and has the potential to

validate jury feedback to students. These findings supported the responses from those

who went through calibration, discussed earlier, namely that faculty value dialoguing

with each other across applied areas about juries and the jury process. One comment

emphasized this, saying, “It was beneficial to see what other areas were using to assess

progress. This type of collaboration allowed for growth in the assessment of our

students department-wide.”

Figure 8—Responses to the post-assessment survey, questions 7-9

Statement                                          Rated “Agree” or      N
                                                   “Strongly Agree”
Having a common rubric makes the jury process           82%             11
more equalized across all applied areas.
Having a common rubric clarifies performance            73%             11
standards.
Having a common rubric makes my jury                    73%             11
feedback to students more valid and reliable.
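
Because N is small in Figures 6 through 8, each reported percentage corresponds to a whole number of respondents. The sketch below recovers those counts by rounding. It assumes the percentages were rounded to the nearest whole percent, and the helper name `respondents_agreeing` is hypothetical.

```python
def respondents_agreeing(percent: float, n: int) -> int:
    """Recover the whole-number count behind a rounded percentage."""
    return round(percent / 100 * n)


# (label, reported percentage, N) as given in Figures 6 through 8.
reported = [
    ("Shared vocabulary", 100, 7),
    ("Calibrating the jury process", 86, 7),
    ("Ownership of the jury process", 29, 7),
    ("Ownership of the common rubric", 57, 7),
    ("Rubric equalizes the jury process", 82, 11),
    ("Rubric clarifies performance standards", 73, 11),
]

for label, percent, n in reported:
    count = respondents_agreeing(percent, n)
    print(f"{label}: {count} of {n} respondents")
```

Read this way, the difference between 100% and 86% in Figure 6 is a single respondent, which is worth keeping in mind when comparing items.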

There were additional comments indicating mixed feelings about the rubric.

Some respondents felt the rubric was too complicated and that it distracted them from

focusing on students’ performances during juries, saying, “The entire jury was spent

attending to the… rubric.” In contrast, another commented that because of the rubric,

“students now have a clearer understanding of what is expected and what the jury saw

in the students' performance.” Again, these varying comments are understandable

given that faculty were only one semester into using the new rubric, and they emphasize

the need to continue refining and simplifying the common rubric.

Though our sample size was small, it seems reasonable to conclude from the

collected data that faculty found value in the discussions and calibration across

applied areas, which have led to a shared vocabulary and to standardization of jury

expectations and process. Respondents who did not participate in the calibration

process indicated that they valued the common rubric that was developed even while

admitting the need to simplify it and continue the discussion across applied areas.

There is every reason to think that these benefits will grow over time as the faculty use

the common rubric and go through more calibration exercises across disciplines.


Our Music Department implemented the assessment day exercise outlined in this

article with the goals of creating meaningful dialogue among our performance faculty,

developing common standards across applied areas, developing a shared vocabulary

about measuring these standards, and generating useful assessment data. Throughout

the day, our music faculty watched, scored, and discussed multiple video-recorded

student juries, then compared their scores against the original jury scores. Through this

process our faculty were able to get a feel for how the jury process worked in our

department and how scores and standards compared across performance disciplines.

They also were able to outline areas for improvement across our applied juries.

Although the discussions generated through the calibration process proved valuable to

our department, we did not find agreement in all areas and are still working on some of

the key components of our jury process, specifically the format of the rubrics and how

they are scored.

We have presented a snapshot of where we are in our process with one approach

to a common rubric, a simple scoring system that scores from 0 to 5, and the grading

scale that could be used to score the rubric. We believe the jury assessment process that

we have outlined is both simple and universal enough that it can be implemented in a

wide range of music departments. The calibration process should prove equally

useful and should generate meaningful discussion and useful data for other

departments. We also feel that there is room for further development of our ideas, and

are already in the process of developing our own online solution that we hope to roll into our

campus LMS. In addition, we feel that a logical next step would be to encourage

students to use the assessment data to reflect and journal about their jury performance

and scores, and to use this data to map out a learning plan. In this way the assessment

data comes full circle and informs the work and direction of study for both student and


Assessment in the Applied Studio 26


Asmus, E. P. (January, 1999). Rubrics: Definitions, benefits, history and types. Music
Educators Journal, 85(4).

Asmus, E. P. (March, 2003). Music assessment concepts. Music Educators Journal, 86(2),
pp. 19-24. doi: 10.2307/3399585 .

Bergee, M.J. (Summer, 2003). Faculty interjudge reliability of music performance

evaluation. Journal of Research in Music Education, 51(2). doi: 10.2307/3345847.

Chase, D. M., Ferguson, J. L., & Hoey IV, J. J. (2014). Assessment in Creative Disciplines;
Quantifying the Aesthetic. Common Ground Publishing. Champaign, Il.

Ciorba, C. R., & Smith, N. Y. (April, 2009). Measurement of instrumental and vocal
undergraduate performance juries using a multidimensional assessment
rubric. Journal of Research in Music Education. doi: 10.1177/0022429409333405.

Fiske, H. E. (1983). Judging musical performances: Method or madness? The Applications

of Research in Music Education, 1(3), 7-10.

Fuller, J. A. (June 2014). Music assessment in higher education. Open Journal of Social
Sciences, (2) 476-484. doi: 10.4236/jss.2014.26056.

Gordon, E. (2002). Rating scales and their uses for evaluating achievement in music
performance. Chicago: GIA.

Latimer, M. E., Bergee, M. J., & Cohen, M. L. (2010). Reliability and perceived
pedagogical utility of a weighted music performance assessment rubric. The
National Association for Music Education. doi: 10.1177/0022429410369836.

Lebler, D., Carey, G, & Scott, D. (2014) Assessment in music education: from policy to
practice. doi: 10.1007/978-3-319-10274-0.

Mintz, S. (April 2015). Performance-based Assessment. Higher – ed. doi:


Assessment in the Applied Studio 27

National Association of Schools of Music (1999). National Association of Schools of Music
handbook 1999-2000. Reston, VA: Author.

Parkes, K. A. (2010). Performance assessment: Lessons from performers. International

Journal of Teaching and Learning in Higher Education. 22 (1), 98-106.

Robison, C. W. (2001). Beautiful singing: "Mind warp" moments (Ed.1). Provo, UT: Clayne
W. Robison.

Wesolowski, Brian C., (March 2012). Understanding and developing rubrics for music
performance assessment. Music Educators Journal, doi: 00274321, 98(3) 35-42.

Wesolowski, B. C., Wind, S. A., & Engelhard, G. (2015). Rater fairness in music
performance assessment: Evaluating model-data fit and differential rater
functioning. Musicae Scientiae 19(2) 147-170. DOI: 10.1177/1029864915589014.

Assessment in the Applied Studio 28

Appendix--Revised Assessment Rubric for Applied Juries
Columns: Item; Exemplary (5); Proficient (4-3); Developing (2-1); Initial (0); Comments

Repertory and Style

Repertory and Selection
    Exemplary (5): Repertory is exceptional, creative and innovative.
    Proficient (4-3): Selections are appropriate to course level and provide musical and technical challenges that demonstrate growth.
    Developing (2-1): Selections demonstrate essential musical skills and offer some opportunities for the student to display progress.
    Initial (0): Repertory is either well below or well beyond the student’s ability and displays minimal evidence of progress.
    Score: 5 4 3 2 1 0

Tone Quality
    Exemplary (5): Professional, full and characteristically mature tone. Exceptional support, depth and volume throughout selections.
    Proficient (4-3): Tone is characteristic, secure, and supported. The improvement and growth is evident.
    Developing (2-1): Tone can tend to feel unsecure and tenuous at times. Tone is not always centered or characteristic. Some improvement is visible but more is needed.
    Initial (0): Tone often loses focus and/or support and is uncharacteristic. Little or no improvement from previous semester.
    Score: 5 4 3 2 1 0

Pitch Accuracy and Intonation
    Exemplary (5): Intonation is secure and professional technique evidenced even in technically difficult and awkward passages.
    Proficient (4-3): Accuracy and intonation are secure and contribute to the musical presentation. Technical passages are secure and improvement is evident.
    Developing (2-1): Intonation suffers at times and missed notes interfere with the performance. Some improvement has been made but more is needed.
    Initial (0): Incorrect pitches and/or serious intonation problems visibly mar the performance and make the listener uncomfortable. No visible improvement from previous semesters.
    Score: 5 4 3 2 1 0

Technical Progress

Technical Facility
    Exemplary (5): Smooth, natural, and seemingly effortless throughout selections. Professional technique is impressive and technically brilliant.
    Proficient (4-3): Fluid technique and technical growth is evident throughout selections. Technical passages are secure and the performance demonstrates a wide range of technical work.
    Developing (2-1): Technique is improving although difficulties are still evident. Technical passages are limited. Improvement is visible but more is needed.
    Initial (0): Technique is awkward and noticeably hampers the performance. Technical difficulties from previous semesters are still evident, unchanged and unaddressed.
    Score: 5 4 3 2 1 0

Articulation
    Exemplary (5): Full range of articulations are accurate and effortless throughout selections and communicate a sophisticated and professional understanding of playing style.
    Proficient (4-3): Wide range of articulations demonstrates an understanding of playing style. Musical style changes appropriately from piece to piece.
    Developing (2-1): Inaccuracies and muddiness communicate a lack of knowledge of or inability to engage playing styles. Some improvement is visible but more is needed.
    Initial (0): Inaccuracies and muddiness visibly mar the performance. Little or no evidence of knowledge of playing style and little or no improvement from previous semesters.
    Score: 5 4 3 2 1 0

Metrical and Rhythmical Accuracy
    Exemplary (5): Nuanced use of tempo and rhythm is used to communicate at a high level. Tempos are technically brilliant.
    Proficient (4-3): Tempos are secure and convey a strong grasp of playing style. Rhythmic nuance is used to communicate lines and emotional
    Developing (2-1): Tempo is significantly slower/faster than suggested tempo. Misplaced rhythms and/or discrepancies in rhythm are uncomfortable. Limited use of rhythmic nuance.
    Initial (0): Misplaced rhythms and rhythmic discrepancies mar the performance. Tempos are inappropriate. Technical limitations prohibit the use of rhythmic nuance.
    Score: 5 4 3 2 1 0

Dynamics and Contrast
    Exemplary (5): Exceptional use of dynamic contrasts to richly communicate full range of dynamic possibilities.
    Proficient (4-3): Played as written and observed dynamic contrasts. Dynamics creatively communicated an appropriate level of musical understanding.
    Developing (2-1): Observed most of the written dynamics and at times used dynamics in a creative manner to fashion the line. Some improvement is visible.
    Initial (0): Dynamic markings are not communicated and performance does not engage the full dynamic, performing range. Little or no progress from previous semesters.
    Score: 5 4 3 2 1 0

Musicality and Performance

Phrasing
    Exemplary (5): Exceptionally planned and executed phrasing communicates mature and professional musicality.
    Proficient (4-3): Phrasing clearly used to communicate the musical line. Strong evidence of musical growth from previous semesters.
    Developing (2-1): The musical line suffers at times from unclear, poorly executed or missing phrasing. Improvement from previous semesters is evident but more is needed.
    Initial (0): Performance visibly suffers from phrasing that is either inconsistent or completely missing. The musical line is not communicated and no improvement is evident.
    Score: 5 4 3 2 1 0

Musicianship/Communication
    Exemplary (5): Exceptionally high level of emotional involvement conveys a deep understanding of the music and a desire to communicate an emotional connection with the
    Proficient (4-3): Appropriate style is maintained throughout the selections and emotional involvement is readily visible. Strong growth from previous semesters.
    Developing (2-1): Communicates appropriate style and emotional connection is evident at times. Some growth is visible but more is needed.
    Initial (0): Incorrect style or lack of any stylistic change from piece to piece. Performer is emotionally detached from the music. No growth from previous semesters.
    Score: 5 4 3 2 1 0

Appearance and Performance
    Exemplary (5): Appearance and deportment are professional, sophisticated and contribute to an impressive and well-planned performance.
    Proficient (4-3): Appearance and deportment are appropriate and thoughtfully planned.
    Developing (2-1): Appearance and deportment are acceptable and do not detract from the performance.
    Initial (0): Appearance and/or deportment are noticeably inappropriate and are visually uncomfortable.
    Score: 5 4 3 2 1 0
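
A completed rubric can be tallied mechanically. The following is a minimal sketch, not the department's actual scoring tool. It assumes the ten items above are weighted equally and summed, which the rubric itself does not state, and the names `RUBRIC_ITEMS`, `band`, and `summarize`, as well as the sample scores, are hypothetical.

```python
from typing import Dict, Tuple

# The ten items of the revised rubric, each scored from 0 to 5.
RUBRIC_ITEMS = [
    "Repertory and Selection",
    "Tone Quality",
    "Pitch Accuracy and Intonation",
    "Technical Facility",
    "Articulation",
    "Metrical and Rhythmical Accuracy",
    "Dynamics and Contrast",
    "Phrasing",
    "Musicianship/Communication",
    "Appearance and Performance",
]


def band(score: int) -> str:
    """Map a 0-5 score to the rubric's column label."""
    if score == 5:
        return "Exemplary"
    if score in (3, 4):
        return "Proficient"
    if score in (1, 2):
        return "Developing"
    if score == 0:
        return "Initial"
    raise ValueError(f"score must be between 0 and 5, got {score}")


def summarize(scores: Dict[str, int]) -> Tuple[int, float]:
    """Return (total, percent of maximum), assuming equal weighting."""
    missing = [item for item in RUBRIC_ITEMS if item not in scores]
    if missing:
        raise ValueError(f"missing scores for: {missing}")
    for item in RUBRIC_ITEMS:
        band(scores[item])  # raises if a score is out of range
    total = sum(scores[item] for item in RUBRIC_ITEMS)
    return total, 100.0 * total / (5 * len(RUBRIC_ITEMS))


# Hypothetical jury sheet: mostly 4s, with one 5 and one 3.
sample = {item: 4 for item in RUBRIC_ITEMS}
sample["Tone Quality"] = 5
sample["Articulation"] = 3

total, percent = summarize(sample)
print(f"Total: {total} of {5 * len(RUBRIC_ITEMS)} ({percent:.1f}%)")
for item in RUBRIC_ITEMS:
    print(f"{item}: {sample[item]} ({band(sample[item])})")
```

Reporting the band label next to each score keeps the feedback tied to the rubric's descriptors rather than to the number alone.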