Three types of attitude scales:
  Guttman
  Thurstone
  Likert

A Likert scale (/ˈlɪk.ərt/ LIK-ərt, though more commonly pronounced /ˈlaɪ.kərt/ LY-kərt) is a psychometric rating scale widely used in questionnaire research.
Below is a sample 50-item Big Five questionnaire taken from the web site of the International Personality
Item Pool (IPIP) (http://ipip.ori.org/ipip/).
The 5 constructs measured by Big Five questionnaires are often called domains.
The items on the web site have been modified so that each is a complete sentence.
For example, item 1 on the web site is “Am the life of the party.” Here it is “I am the life of the party.”
Even-numbered items have been shaded. I have no evidence that such shading is beneficial.
The IPIP web site recommends a 5-point response scale. I prefer a 7-point response scale.
If you need a 50-item Big Five questionnaire, you may copy and use what follows.
E: 1,6,11,16,21,26,31,36,41,46
A: 2,7,12,17,22,27,32,37,42,47
C: 3,8,13,18,23,28,33,38,43,48
S: 4,9,14,19,24,29,34,39,44,49
O: 5,10,15,20,25,30,35,40,45,50
Note the periodicity in the placement of items – every 5th item is from the same domain.
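A sketch of SPSS scoring syntax for this key. The item variable names q1 to q50 are assumptions, and negatively keyed items (which the key above does not identify) would need to be reverse-scored first:

* Score each domain as the mean of its ten items (names q1-q50 assumed).
COMPUTE E = MEAN(q1, q6, q11, q16, q21, q26, q31, q36, q41, q46).
COMPUTE A = MEAN(q2, q7, q12, q17, q22, q27, q32, q37, q42, q47).
COMPUTE C = MEAN(q3, q8, q13, q18, q23, q28, q33, q38, q43, q48).
COMPUTE S = MEAN(q4, q9, q14, q19, q24, q29, q34, q39, q44, q49).
COMPUTE O = MEAN(q5, q10, q15, q20, q25, q30, q35, q40, q45, q50).
EXECUTE.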
7 = Completely Accurate
6 = Very Accurate
5 = Probably Accurate
4 = Sometimes Accurate, Sometimes Inaccurate
3 = Probably Inaccurate
2 = Very Inaccurate
1 = Completely Inaccurate
The items in this scale are presented as questions. In other instances, they are presented as statements.
If presented as statements, the responses would represent amount of agreement.
If you need an overall job satisfaction scale, you may use this.
For each statement please put a check ( ) in the space showing how you feel about the following aspects
of your job. This time, indicate how satisfied you are with the following things about your job.
VD   MD   SD   N   SS   MS   VS
 1    2    3   4    5    6    7
(VD = Very Dissatisfied, MD = Moderately Dissatisfied, SD = Slightly Dissatisfied, N = Neutral, SS = Slightly Satisfied, MS = Moderately Satisfied, VS = Very Satisfied)

Overall Satisfaction
Consider a single item in a scale, "I am satisfied with my job." Now consider the true positions of
several respondents, represented by the positions of the top arrows:

[Figure: arrows marking respondents' true positions above the VD-to-VS (1-7) continuum; all but the rightmost fall nearest MS, with the rightmost just across the boundary toward VS.]
The response labels put there by the test constructor represent points on a continuum.
They're like 1-foot marks on a scale of height.
So, in the above situation each respondent except the rightmost one would respond MS, which would be
assigned the value 6. But the rightmost respondent, whose true position on the dimension is close to
his/her nearest neighbor's, would pick VS, creating a considerable "response" distance from that neighbor.
Since each respondent can pick only one of the response CATEGORIES, any response made may miss
the respondent's true amount of satisfaction by up to half a category width: about 7 percent of the
continuum on a 7-point scale, about 10 percent on a 5-point scale.
Note the wide range of actual feelings which would be represented by a 6 above.
Consider that two persons very close in their actual feelings about the job could get scores a full
response category apart. E.g., a person whose actual feeling is 6.55 would check 7, but a person whose
actual feeling was 6.45 would check 6. The difference of 1 in scores would be much greater than the
difference of .1 in actual feeling.

[Figure: red arrows at true positions 6.45 and 6.55, mapping to responses 6 and 7.]
This situation is analogous to one that most students have strong feelings about – the use of 5 grades to
represent performance in a course. We all remember those instances in which we missed the next higher
grade mark by a 10th of a point. The use of a single item with just a few response categories is
analogous.
Solution: Use multiple items. While each one may miss its mark considerably, some of the misses will
be positive and some will be negative, so they tend to cancel each other out, and the average of the
responses will be very close to the respondent's true position on the continuum.
Conclusion: Having multiple items and averaging responses to the multiple items increases
accuracy of identification of the respondent’s true position on a dimension.
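A way to see why (a standard result, not part of the original notes): suppose each response equals the true position plus an independent error,

$$x_i = T + e_i, \qquad \mathrm{Var}(e_i) = \sigma^2 \;\Rightarrow\; \bar{x} = T + \frac{1}{k}\sum_{i=1}^{k} e_i, \qquad \mathrm{Var}(\bar{x} - T) = \frac{\sigma^2}{k}$$

so averaging k items shrinks the expected miss by a factor of the square root of k.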
Since a single categorical item response is only a gross approximation to the actual (true) feeling, a
person might get a very different score (6 vs. 7, for example) on repeated measurement with a single
item. This reduces reliability. Reduced reliability reduces estimated validity, and reduced estimated
validity reduces your chances of getting published.
It is possible to assess the reliability of a multiple-item scale in a single administration of the scale to a
group by computing coefficient alpha. That is not possible with a single-item scale.
Conclusion: Using multiple items and basing the scale score on the sum or mean greatly facilitates
our ability to estimate the reliability of the scale score.
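For reference (the standard formula, not shown in the original notes), coefficient alpha for a scale of k items is

$$\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma^2_i}{\sigma^2_X}\right)$$

where the sigma-squared-i are the item variances and sigma-squared-X is the variance of the total score.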
Sometimes, a respondent will have a unique reaction to the wording of a single item. This reaction may
be based on the respondent’s history or understanding of that item. If that item is the only item in the
scale, then the respondent’s position on the dimension will be greatly distorted by that reaction.
Conclusion: Including multiple items and using the sum or mean of responses to them diminishes
the influence of any one idiosyncratic item.
1. Test length.
2. How many response categories: 5 or 7? Some data:

5-point response scales                 7-point response scales
Nhung faking study: r = -.43            Incentive Faking study: r = +.08
Vikus study: r = -.49                   FOR Study Gen: r = -.04
Bias Study IPIP: r = -.32               FOR Study FOR: r = +.08
Bias Study NEO-FFI: r = -.34            Worthy Thesis: r = -.18
3. Should there be a neutral middle response category?
I am not familiar with a clear-cut, strong argument either way. I prefer to include one.
If you analyze the data using Confirmatory Factor Analysis or Structural Equation Modeling, it doesn’t
matter.
My guess (and it’s just a guess) is that you’ll get a few more failures to respond without one, from
people who just can’t make up their minds.
And variability of responses might be slightly smaller with one, from those same people responding in
the middle.
But, I’m not aware of a meta-analysis on this issue.
4. What numeric values should be assigned to each response possibility for analyses based on
sums or means?
Although at one time there were arguments for scaling the various response alternatives, now almost
everyone who analyzes the data traditionally uses successive, equally spaced integers. The integers need
not be successive, but in practice everyone uses successive ones (as opposed to, say, every other integer).
For example

Strongly                                         Strongly
Disagree    Disagree    Neutral    Agree    Agree
    1           2          3         4        5

Or

Strongly                                         Strongly
Disagree    Disagree    Neutral    Agree    Agree
   -2          -1          0        +1       +2
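In SPSS, the verbal labels are attached to the assigned integers with value labels; a minimal sketch, assuming an item named q1 scored 1 to 5:

VALUE LABELS q1
  1 'Strongly Disagree'
  2 'Disagree'
  3 'Neutral'
  4 'Agree'
  5 'Strongly Agree'.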
Newer Confirmatory Factor Analysis and Structural Equation Modeling based analyses, which treat the
data as "Ordered Categorical," require simply that the response categories be ordered. No numeric
assignment is required.
5. If the analyses are based on sums or means, which integers should be used?
Answer: Any set of successive integers will do.
1 to 5 or 1 to 7
0 to 4 or 0 to 6
-2 to +2 or -3 to +3.
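Moving between such sets is just a linear shift, which leaves correlations and reliability unchanged. A one-line SPSS sketch, assuming an item q1 scored 1 to 5:

* Shift a 1-to-5 scoring to -2-to-+2.
COMPUTE q1c = q1 - 3.
EXECUTE.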
Yes, the God of statistics will strike you down if you make small numbers indicate more of ANY
construct. Being a golfer will not save you.
I strongly prefer assigning numbers so that a bigger response value represents more of the construct as
it is named. I’m sure it’s what the God of Statistics intended.
Negatively worded items may be included, although there is no guarantee that responses to negatively
worded items will be the actual negation of what the responses to a positively worded counterpart would
have been.
Responses to a positively worded item and its negatively worded counterpart should be perfectly
negatively correlated, but often they are not.
Many studies have found that negatively worded items are responded to similarly to other negatively
worded items, regardless of content or dimension, presumably just because of the negative wording.
We have found this in seven datasets.
We've also found that positively worded items are responded to similarly regardless of content, just
because they're positively worded.
Recommendation:
Best: Design your questionnaire, and analyze it using factor analysis, so that it permits
estimation of the bias tendencies. Estimate a general factor, a positively-worded item factor,
and a negatively-worded item factor. Treat these three factors as separate indicators of the
construct. Nobody does this now, because such wording-related response tendencies have only
recently been discovered and are still being investigated.
Typically, negatively worded items are reverse-scored and then they’re treated as if they had been
positively worded.
Original Reversed
1 5
2 4
3 3
4 2
5 1
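In general, reversed = (minimum + maximum) - original, so 6 - original for a 1-to-5 scale. A one-line SPSS sketch, assuming an item named q3:

* Reverse-score a 1-to-5 item.
COMPUTE q3r = 6 - q3.
EXECUTE.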
If there are no missing values, the sum and the mean will be perfectly correlated – they’re
mathematically equivalent, so you can use either.
The mean is more easily related to the questionnaire items if they all have the same response format.
If there are missing values, use the mean of the available items, or use the imputation techniques
described below to fill in the missing values, after which it won't matter whether you use the mean or the sum.
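A sketch of the mean-of-available-items approach in SPSS, assuming items q1 through q10 are adjacent in the data file; MEAN.8 returns the mean only when at least 8 of the 10 responses are valid:

* Scale score as the mean of available items, requiring at least 8 valid.
COMPUTE scale_score = MEAN.8(q1 TO q10).
EXECUTE.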
e. The conventional wisdom is changing on issues of missing values. Many modern statistical techniques
are designed to work with all available data. These techniques do not include REGRESSION and GLM,
which by default drop cases with any missing values.
11. Writing the items. Spector, p. 23...
a. Each item should involve only one idea.
   E.g., "The death penalty should be abolished because it's against religious law." involves two.
b. Avoid colloquialisms and jargon.
   "I am the life of the party." "I shirk my duties."
c. Consider the reading level of the respondent.
d. Avoid using "not" to create negatively worded items.
   Good: Communication in my organization is poor.
   Bad: Communication in my organization is not good.
e. Avoid items that might trigger emotional responses in certain samples.
Case   Q1   Q2   Q3   Q4
1 2 2 2 2
2 2 2 3 3
3 2 2 4 4
4 2 2 5 5
5 3 3 2 2
6 3 3 3 3
7 3 3 4 4
8 3 3 5 5
9 4 4 2 2
10 4 4 3 3
11 4 4 4 4
12 4 4 5 5
13 5 5 2 2
14 5 5 3 3
15 5 5 4 4
16 5 5 5 5
For these hypothetical data, Q1 and Q2 are perfectly correlated, as are Q3 and Q4. Obviously, items
within the same scale are not perfectly correlated in real life.
But Q1+Q2 is uncorrelated with Q3+Q4: the two constructs are independent.
* Syntax to create construct scale scores.
compute C1=mean(Q1,Q2).
compute C2=mean(Q3,Q4).
correlate c1 with c2.

Correlation of C1 with C2: Pearson r = .000, Sig. (2-tailed) = 1.000, N = 16.
But when between-person differences in response tendencies are added to the data, the correlation
between Q1+Q2 and Q3+Q4 becomes .555, a value that is statistically significant.
The point of this is that differences in participants' response tendencies (e.g., the tendency of some to
use only the upper part of a response scale while others use the lower part) can result in positive
correlations between constructs that are, in fact, uncorrelated.
This problem has been referred to as the method bias problem. The term refers to the fact that
correlations between constructs obtained using the same method are biased upward. It plagues the use
of summated rating scales. Many journals will not accept research in which the independent and the
dependent variables are measured using the same method.
2. See if someone else has already created a scale measuring that construct. If so, and if it appears OK,
don’t re-invent the wheel. Faculty. Buros Institute. IPIP web site. Google.
http://buros.org/mental-measurements-yearbook
Remember . . .
4. Have a sample of SMEs rate the extent to which each item represents the construct. Keep only the
best.
a. Assess reliability.
b. Identify bad items, those that reduce reliability, and eliminate them.
c. Assess dimensionality using exploratory factor analysis.
All items in the same scale should represent the same dimension and no other dimension.
7. Perform a validation study assessing convergent and discriminant validity using the population of
interest, perhaps using the pilot sample.
a. Administer other similar scales.
b. Administer other, conceptually distinct scales (for discriminant validity).
Kayitesi Wilt's thesis was a validation study of the Cultural Intelligence Scale (CQS).
She compared the mean CQS scores of persons who've been abroad and enjoyed that travel vs.
those who've been abroad and not enjoyed it. That's evidence of convergent validity.
She also administered the CQS along with an Emotional Intelligence Scale, a Social Intelligence Scale,
and a Big Five questionnaire, and assessed the discriminant validity of the CQS with respect to the
other scales: it should not be highly correlated with any of them. That's discriminant validity.
8. Administer to a sample from the population of interest along with the other scales that are part of
your research project.
This example is taken from an independent study project conducted by Lyndsay Wrensen examining
factors related to faking of the Big 5 personality inventory. She administered the IPIP Big 5
inventory twice – once under instructions to respond honestly and again (counterbalanced) under
instructions to respond as if seeking a customer service job.
The data here are the honest condition responses to the Extroversion scale. Participants read each item
and indicate how accurately it described them using 1=Very inaccurate to 5=Very accurate. Some of the
items were negatively worded. We now would use a 7-point response scale. This project was done
almost 10 years ago.
* Reverse-score the negatively worded items into new variables.
recode he2 he4 he6 he8 he10 (1=5)(2=4)(3=3)(4=2)(5=1) into he2r he4r he6r he8r he10r.
execute.
However you do it, put the reverse-scored values in columns that are different from the originals.
For example, use SPSS’s imputation features. Set up a time with me, and I’ll walk you through the
process.
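A minimal sketch of multiple imputation using SPSS's Missing Values module; the dataset name and variable list are assumptions:

DATASET DECLARE imputed.
MULTIPLE IMPUTATION he1 he2 he3 he4 he5 he6 he7 he8 he9 he10
  /IMPUTE METHOD=AUTO NIMPUTATIONS=5
  /OUTFILE IMPUTATIONS=imputed.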
Reliability

Case Processing Summary
Cases: Valid 179 (90.4%), Excluded(a) 19 (9.6%), Total 198 (100.0%)
a. Listwise deletion based on all variables in the procedure.

Reliability Statistics
Cronbach's Alpha: .859
Cronbach's Alpha Based on Standardized Items: .860
N of Items: 10

Item Statistics
          Mean   Std. Deviation     N
he1       3.13        1.122       179
he3       3.97         .908       179
he5       3.72        1.093       179
he7       3.34        1.277       179
he9       3.41        1.216       179
he2r      3.56        1.254       179
he4r      3.27        1.136       179
he6r      3.79        1.110       179
he8r      2.74        1.224       179
he10r     2.70        1.285       179

Inter-Item Correlation Matrix
         he1    he3    he5    he7    he9    he2r   he4r   he6r   he8r   he10r
he1     1.000   .334   .245   .518   .542   .367   .435   .211   .336   .296
he3      .334  1.000   .473   .372   .331   .354   .367   .368   .170   .383
he5      .245   .473  1.000   .553   .329   .421   .407   .488   .242   .519
he7      .518   .372   .553  1.000   .427   .391   .404   .446   .198   .375
he9      .542   .331   .329   .427  1.000   .382   .496   .258   .461   .295
he2r     .367   .354   .421   .391   .382  1.000   .550   .572   .331   .448
he4r     .435   .367   .407   .404   .496   .550  1.000   .375   .371   .450
he6r     .211   .368   .488   .446   .258   .572   .375  1.000   .241   .417
he8r     .336   .170   .242   .198   .461   .331   .371   .241  1.000   .210
he10r    .296   .383   .519   .375   .295   .448   .450   .417   .210  1.000

Summary Item Statistics
                           Mean   Minimum   Maximum   Range   Maximum/Minimum   Variance   N of Items
Item Means                3.363     2.698     3.972   1.274             1.472       .180           10
Item Variances            1.363      .825     1.650    .825             2.000       .065           10
Inter-Item Correlations    .381      .170      .572    .402             3.363       .010           10

Item-Total Statistics
         Scale Mean if   Scale Variance if   Corrected Item-     Squared Multiple   Cronbach's Alpha
         Item Deleted    Item Deleted        Total Correlation   Correlation        if Item Deleted
he1          30.50            50.162               .547               .456                .848
he3          29.66            52.496               .516               .312                .851
he5          29.92            49.504               .612               .504                .842
he7          30.29            47.724               .610               .501                .842
he9          30.22            48.691               .586               .449                .844
he2r         30.07            47.501               .638               .489                .839
he4r         30.36            48.546               .649               .455                .839
he6r         29.84            50.080               .560               .443                .847
he8r         30.89            51.309               .417               .273                .859
he10r        30.93            48.501               .557               .375                .847
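For reference, a sketch of SPSS syntax that would produce output like the above; the scale label and the particular subcommand choices are assumptions:

RELIABILITY
  /VARIABLES=he1 he3 he5 he7 he9 he2r he4r he6r he8r he10r
  /SCALE('Extroversion - honest condition') ALL
  /MODEL=ALPHA
  /STATISTICS=DESCRIPTIVE CORR
  /SUMMARY=TOTAL MEANS VARIANCE CORR.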
Factor scores are computed by differentially weighting each item according to its contribution to the
indication of the dimension. Items which are not highly correlated with the dimension are given little
weight; those which are highly correlated with the dimension are given more weight.
Note that summated scale scores are computed by equally weighting each item that is thought to
be relevant. So a summated score is a crude factor score.
The loadings of the items on the factor are used to determine the weights.
Advantages: Factor scores probably better capture the dimension of interest; they're probably more
highly correlated with the dimension than the simple sum of the items. They can also be computed
taking into account other factors that might influence the items, and thus may be uncontaminated
by those other factors.
Disadvantage: The weights will differ from sample to sample, so a weighting scheme based on your
sample will differ from a weighting scheme based on my sample.
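A sketch of SPSS syntax for saving regression-based factor scores, using the Extroversion items from the example above; the one-factor specification is an assumption:

FACTOR
  /VARIABLES=he1 he3 he5 he7 he9 he2r he4r he6r he8r he10r
  /MISSING=LISTWISE
  /CRITERIA=FACTORS(1)
  /EXTRACTION=PAF
  /ROTATION=NOROTATE
  /SAVE=REG(ALL).

This appends a new variable (FAC1_1 by default) holding each person's factor score.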
Item Response Theory (IRT) is a statistical theory of how people respond to items and of how to score
the items. It is kind of like factor analysis, but the underlying theory is different.
IRT methods are used by most large-scale test publishers, such as ETS, ACT, Pearson. IRT methods
routinely incorporate ideas that are not usually considered by persons using summated scales.
If you’re serious about measurement, you’ll have to learn a lot about both factor analytic methods and
IRT methods.
Virtually all scales are scored to represent the level of responses to items representing a dimension.
So, a Conscientiousness score is the average level, the mean, of a person's responses to the
Conscientiousness items in a questionnaire.
We've been exploring inconsistency of responding, measured as the standard deviation of a person's
responses to items from the same dimension.
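A sketch of how such an inconsistency score could be computed in SPSS, using the Extroversion item names from the earlier example:

* Within-person inconsistency: the SD of one person's responses to the
* items of a single dimension.
COMPUTE incons = SD(he1, he3, he5, he7, he9, he2r, he4r, he6r, he8r, he10r).
EXECUTE.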
Overall UGPA was the criterion. Conscientiousness and Variability (Inconsistency) were predictors.

[Path diagram: Conscientiousness and Inconsistency predicting UGPA; multiple R = .315; the path from Inconsistency to UGPA is -.219.]
These data suggest that Inconsistency of Responding may be a valid predictor of certain types of
performance.
References
Reddock, C. M., Biderman, M. D., & Nguyen, N. T. (2011). The relationship of reliability and validity
of personality tests to frame-of-reference instructions and within-person inconsistency.
International Journal of Selection and Assessment, 19, 119-131.
For example: If you give a Big 5 questionnaire to a group of respondents, you can measure the
following 11 attributes:
Extraversion, Agreeableness, Conscientiousness, Stability, Openness
General Affect, Positive Wording Bias, Negative Wording Bias
Inconsistency, Extreme Response Tendency, Acquiescent Response Tendency