
Self-Reported Metrics
Mustika Febrillia 13414076
Atikah Arysolia Taifur 14414056
Self-Reported Data
One way to learn about the usability of something is to ask the participants to tell you about their experience with it.

How exactly do we ask participants so that we get good data?

Self-reported data is one of the best ways to describe that experience.

Self-reported data includes the verbatim comments made by participants while using a product.
Kinds of Self-Reported Data
Subjective data: used as a counterpart to "objective," which is often used to describe performance data from a usability study. But "subjective" implies a lack of objectivity in the data you are collecting.

Preference data: often used as a counterpart to "performance." Preference implies a choice of one option over another, which is often not the case in UX studies.
Importance of Self-Reported Data
This data gives the most important information about users' perception of the system and their interaction with it.

The data tells you how users feel about the system.

Fact: users will not remember how long a website took to use or how many clicks it required, but if the experience made them happy, that is the only thing that matters.

Subjective reactions may be the best predictor of users' likelihood to return or make a purchase in the future.
Rating Scales
One of the most common ways to capture self-reported data in a UX study is with some type of rating scale.
Likert Scales
Use a statement to which respondents rate their level of agreement.

Characteristics: (1) express a degree of agreement with a statement, (2) give an odd number of response options, allowing a neutral response.

In designing statements, avoid adverbs such as "very," "extremely," or "absolutely," and use unmodified versions of adjectives.
Semantic Differential Scales
The semantic differential technique involves presenting pairs of bipolar, or opposite, adjectives.

A five- or seven-point scale (an odd number) is commonly used.

Be aware of the connotations of different pairings of words.
When to collect self-reported data?
Post-task ratings

A quick rating immediately after each task can help pinpoint tasks and parts of the interface that are particularly problematic.

Post-study ratings

Can provide an overall evaluation after the participant has had a chance to interact with the product more fully, and allow more in-depth ratings.
How to collect ratings
There are three ways:

Answer questions or provide ratings orally
The easiest method, but it can introduce bias; e.g., participants may feel uncomfortable verbally stating poor ratings.

Record responses on a paper form
Manual data entry of the responses can introduce errors or bias.

Provide responses using some type of online tool
Requires tablets or laptops.
Biases in collecting self-reported data
Social desirability bias: respondents tend to give answers they believe will make them look better in the eyes of others.

Studies have shown (Dillman et al., 2008) that people who are asked directly for self-reported data provide more positive feedback than when asked through an anonymous web survey.
General guidelines for rating scales
Multiple scales help triangulate: you can get more reliable data if you can think of different ways to ask participants.

Odd or even number of values: an odd number allows a neutral response.

Total number of points: five or seven points is typical, but more is not always better.
Analyzing Rating-Scale Data
The most common technique for analyzing data from rating scales is to assign a numeric value to each of the scale positions and then compute the averages.

Descriptive statistics such as the mean, mode, etc. can be used (see the sketch below).
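As an illustration, here is a minimal sketch of computing such descriptive statistics in Python with pandas; the task names and 1-5 ratings are hypothetical example data, not taken from any study cited here.

```python
import pandas as pd

# Each column: one task; each row: one participant's 1-5 rating
ratings = pd.DataFrame({
    "Task 1": [4, 5, 4, 3, 5],
    "Task 2": [2, 3, 2, 1, 3],
    "Task 3": [5, 4, 5, 5, 4],
})

summary = pd.DataFrame({
    "mean": ratings.mean(),          # average rating per task
    "mode": ratings.mode().iloc[0],  # most frequent rating per task
    "std":  ratings.std(),           # spread of the ratings
})
print(summary)
```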
Post-Task Ratings
The main goal of ratings associated with each task is to give you some insight into which tasks the participants thought were the most difficult. Next, we will examine some specific techniques.
Ease of Use
Usually involves asking users to rate how easy or how difficult each task was.

Some UX professionals prefer to use a traditional Likert scale.

When compared to several other post-task ratings, this approach was found to be among the most effective.
After-Scenario Questionnaire (ASQ)
Touches on three fundamental areas of usability: (1) effectiveness, (2) efficiency, and (3) satisfaction.

Three rating scales were developed, one for each area.
Expectation Measure
The most important thing about each task is how easy or difficult it was in comparison to how easy or difficult the user thought it was going to be.

Expectation measure: ask the respondents, before the tasks, to rate how easy or difficult they expect each task to be.

The results can be interpreted in quadrants, comparing expected and experienced difficulty (see the sketch below).
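A minimal sketch of that quadrant comparison, assuming 1-5 ratings where higher means easier; the task names, ratings, cutoff, and quadrant labels are illustrative assumptions, not taken from the original slides.

```python
# Hypothetical mean ratings (1 = very difficult ... 5 = very easy)
mean_expected = {"Task 1": 4.2, "Task 2": 4.5, "Task 3": 2.1}
mean_actual   = {"Task 1": 4.4, "Task 2": 2.3, "Task 3": 4.0}

MIDPOINT = 3.0  # assumed neutral point of the 5-point scale

for task in mean_expected:
    expected_easy = mean_expected[task] >= MIDPOINT
    actually_easy = mean_actual[task] >= MIDPOINT
    if expected_easy and actually_easy:
        quadrant = "expected easy, was easy (leave it alone)"
    elif expected_easy and not actually_easy:
        quadrant = "expected easy, was hard (fix it fast)"
    elif not expected_easy and actually_easy:
        quadrant = "expected hard, was easy (promote it)"
    else:
        quadrant = "expected hard, was hard (big opportunity)"
    print(f"{task}: {quadrant}")
```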
A Comparison of Post-task Self-Reported Metrics
Goal: to see if these rating techniques are sensitive enough to detect differences in the perceived difficulty of the tasks.

Also wanted to see how the perceived difficulty of the tasks corresponded to the task performance data.
Post-session Ratings
These can be used as an overall barometer of the usability of the product.
Aggregating Individual Task Ratings
Take an average of the individual task-based ratings, using either a simple or a weighted average.

Simply take an average of the data; if some tasks are more important than others, take a weighted average instead (see the sketch below).

By looking at this aggregated data, we can track the average perception as it changes over time.
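A minimal sketch of both aggregations; the task names, mean ratings, and weights are hypothetical.

```python
# Hypothetical mean 1-5 ratings per task
mean_task_ratings = {"Task 1": 4.3, "Task 2": 2.6, "Task 3": 4.5}

# Simple average: every task counts equally
simple_avg = sum(mean_task_ratings.values()) / len(mean_task_ratings)

# Weighted average: weights reflect how important each task is (they sum to 1.0)
weights = {"Task 1": 0.5, "Task 2": 0.3, "Task 3": 0.2}
weighted_avg = sum(mean_task_ratings[t] * weights[t] for t in weights)

print(f"Simple average:   {simple_avg:.2f}")
print(f"Weighted average: {weighted_avg:.2f}")
```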
System Usability Scale (SUS)
One of the most widely used tools for assessing the perceived usability of a system or product.

Consists of 10 statements to which users rate their level of agreement (a scoring sketch follows).

Interpretation of the SUS score:
<50 : Not acceptable
50-70 : Marginal
>70 : Acceptable
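Here is a minimal sketch of the standard SUS scoring calculation (odd-numbered items contribute rating - 1, even-numbered items contribute 5 - rating, and the sum is multiplied by 2.5); the example responses are hypothetical.

```python
def sus_score(responses):
    """Return a 0-100 SUS score from ten 1-5 agreement ratings, in statement order."""
    assert len(responses) == 10
    total = 0
    for i, r in enumerate(responses, start=1):
        # Odd-numbered (positively worded) items: rating - 1
        # Even-numbered (negatively worded) items: 5 - rating
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5

score = sus_score([4, 2, 5, 1, 4, 2, 5, 2, 4, 1])  # hypothetical responses
verdict = "Acceptable" if score > 70 else "Marginal" if score >= 50 else "Not acceptable"
print(f"SUS = {score:.1f} ({verdict})")
```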
Computer System Usability Questionnaire (CSUQ)
The CSUQ was designed to be administered by mail or online.

Consists of 19 statements; users rate their agreement on a seven-point scale.

Statement example: "It was simple to use this system."

Results are viewed in four categories: System Usefulness, Information Quality, Interface Quality, and Overall Satisfaction.
Questionnaire for User Interface Satisfaction (QUIS)
Developed by the HCIL at the University of Maryland (1988).

Consists of 27 rating scales divided into five categories: Overall Reaction, Screen, Terminology/System Information, Learning, and System Capabilities.

The anchors of the 10-point scales change depending on the statement.
USE Questionnaire
Proposed by Arnie Lund (2001).

Results are summarized in a radar chart.
Product Reaction Cards
Proposed by Benedek and Miner (2002).

118 cards of adjectives
Pick the top 5 cards
Explain why!
Present the results using a word cloud
Comparison of Post-session Self-Reported Metrics
Study conducted by Tullis and Stetson (2004).

Used SUS, QUIS, CSUQ, Words, and their own questionnaire ("Ours") to evaluate two web portals; 123 participants, each using one questionnaire.
Net Promoter Score (NPS)
Originated by Fred Reichheld (2003).

"How likely is it that you would recommend this to a friend or colleague?" (rated 0-10)

Three categories of respondents:
o Detractors: gave ratings of 0-6
o Passives: gave ratings of 7 or 8
o Promoters: gave ratings of 9 or 10

The NPS is the percentage of promoters minus the percentage of detractors (see the sketch below).
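A minimal sketch of that calculation; the ratings are hypothetical.

```python
# Hypothetical 0-10 likelihood-to-recommend ratings
ratings = [10, 9, 7, 6, 9, 3, 8, 10, 5, 9]

promoters  = sum(1 for r in ratings if r >= 9)        # ratings 9-10
detractors = sum(1 for r in ratings if r <= 6)        # ratings 0-6
nps = 100 * (promoters - detractors) / len(ratings)   # % promoters - % detractors

print(f"NPS = {nps:.0f}")  # ranges from -100 to +100
```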
Using SUS to Compare Designs
Traci Hart (2004): compared three different websites for adults. After attempting the tasks, participants filled out the SUS questionnaire.

The American Institutes for Research (2001): compared Windows ME and Windows XP; 36 experts attempted tasks, then filled out the SUS questionnaire.

Sarah (2006): compared three types of paper ballots; participants used the ballots in a simulation, then filled out the SUS questionnaire.
Online Services
Voice of Customer (VoC) studies are typically done on live websites.

Common approach: pop-up surveys.

Another approach: a standard mechanism for getting feedback.

1. Website Analysis and Measurement Inventory (WAMMI)
2. American Customer Satisfaction Index (ACSI)
3. OpinionLab
1. WAMMI
www.wammi.com
Composed of 20 statements on a five-point Likert scale.

2. ACSI

3. OpinionLab
Page-level feedback from users.
Issues with Live-Site Surveys
Number of questions

Self-selection of respondents

Number of respondents

Nonduplication of respondents
Other Types of Self-reported Metrics
Assessing Specific Attributes

Assessing Specific Elements

Open-Ended Questions

Awareness and Comprehension

Awareness and Usefulness Gaps


Assessing Specific Attributes
Some attributes of a product or website that might be assessed:
- Visual Appeal
- Perceived Efficiency
- Confidence
- Usefulness
- Credibility
- Appropriateness of terminology
- Ease of navigation
- Responsiveness
Assessing Specific Elements
Such as:

- Instructions

- FAQs or Online help

- Homepage

- Search function

- Site map

- etc.
Open-Ended Questions
Allow users to add comments related to any of the individual rating scales.

Ask users to list three to five things they like the most about the product and three to five things they like the least.

Ask users to describe anything they found confusing about the interface.

A simple analysis method: word clouds (see the sketch below).
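A minimal sketch of the word-frequency counting that underlies a word cloud; the comments and stop-word list are hypothetical.

```python
from collections import Counter
import re

# Hypothetical open-ended comments from participants
comments = [
    "The search was confusing and slow",
    "Loved the clean layout, but search results were confusing",
    "Checkout was fast and the layout was clean",
]

stop_words = {"the", "and", "was", "were", "but", "a"}

words = []
for comment in comments:
    words += [w for w in re.findall(r"[a-z']+", comment.lower())
              if w not in stop_words]

# Most frequent words; higher counts would be drawn larger in a word cloud
print(Counter(words).most_common(5))
```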
Awareness and Comprehension
Test users' learning and comprehension of website content by giving a quiz on the information.

If necessary, administer a pretest to determine what they already knew and compare it with the post-test.
Awareness and Usefulness Gaps
Typically ask users about awareness as a yes/no question, e.g., "Were you aware of this functionality?", and about its usefulness as a 1-5 rating.

Convert the usefulness rating-scale data into a top-2-box score.

Plot the % of users who were aware of each feature against the % of users who found the functionality useful (see the sketch below).
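A minimal sketch of that conversion and comparison, assuming "top-2-box" means a 4 or 5 on the 1-5 usefulness scale; the feature names and responses are hypothetical.

```python
features = {
    # feature: (awareness yes/no per user, usefulness 1-5 per user)
    "Saved searches": ([True, True, False, False, False],
                       [5, 4, 3, 5, 4]),
    "Price alerts":   ([True, True, True, True, False],
                       [4, 5, 5, 4, 3]),
}

for name, (aware, usefulness) in features.items():
    pct_aware = 100 * sum(aware) / len(aware)
    pct_top2  = 100 * sum(1 for u in usefulness if u >= 4) / len(usefulness)
    gap = pct_top2 - pct_aware  # large positive gap = useful but little-known feature
    print(f"{name}: aware {pct_aware:.0f}%, useful (top-2-box) {pct_top2:.0f}%, gap {gap:+.0f}")
```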
